Multi-modal RAG: Chat with Docs containing Images
Science & Technology
Learn how to build a multimodal RAG system using the CLIP model.
LINKS:
Notebook: tinyurl.com/pfc64874
Flow charts in the paper:
tinyurl.com/4pp78xuf
tinyurl.com/5yeww5py
tinyurl.com/4un6y6x5
tinyurl.com/2jkbb3ma
💻 RAG Beyond Basics Course:
prompt-s-site.thinkific.com/c...
Let's Connect:
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Sign up for the localGPT newsletter:
tally.so/r/3y9bb0
00:00 Introduction to Multimodal RAG Systems
01:24 First Approach: Unified Vector Space
02:23 Second Approach: Grounding Modalities to Text
03:57 Third Approach: Separate Vector Stores
06:26 Code Implementation: Setting Up
09:05 Code Implementation: Downloading Data
11:13 Code Implementation: Creating Vector Stores
14:00 Querying the Vector Store
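The third approach from the chapters above (separate vector stores per modality, with results merged at query time) can be sketched as below. This is a minimal illustration: the 4-d dummy vectors and the `VectorStore` class are stand-ins for real CLIP embeddings and a real database such as Qdrant.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class VectorStore:
    """Minimal in-memory vector store holding (embedding, payload) pairs."""
    def __init__(self):
        self.items = []

    def add(self, vector, payload):
        self.items.append((np.asarray(vector, dtype=float), payload))

    def search(self, query_vector, top_k=2):
        q = np.asarray(query_vector, dtype=float)
        scored = [(cosine_sim(q, v), p) for v, p in self.items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:top_k]

# Separate stores for each modality, queried independently
text_store, image_store = VectorStore(), VectorStore()

# Dummy 4-d embeddings stand in for real CLIP text/image vectors
text_store.add([1, 0, 0, 0], "paragraph about transformers")
text_store.add([0, 1, 0, 0], "paragraph about diffusion")
image_store.add([0.9, 0.1, 0, 0], "figure: transformer architecture")

# Query both stores, then merge and re-rank across modalities
query = [1, 0, 0, 0]
hits = text_store.search(query, top_k=1) + image_store.search(query, top_k=1)
hits.sort(key=lambda s: s[0], reverse=True)
for score, payload in hits:
    print(round(score, 2), payload)
```

The merge-and-re-rank step is the key design choice of this approach: each modality keeps its own index, and only the final ranking mixes text and image hits.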
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...
Comments: 31
If you want to learn RAG Beyond Basics, checkout this course: prompt-s-site.thinkific.com/courses/rag
a nice open source and self hosted version would be great
Such insightful information. Eagerly waiting for more multimodal approaches.
We need more videos on this topic
Keep going with this approach, it is something I have been struggling with.
@waju3234
12 days ago
Me too. In my case, the answer is usually hidden behind the data, context, and the images.
Lots of good info, thanks
What about doing the same, but using LLaMA 3 or a smaller local LLM?
Thanks, your videos are very helpful. I have several GBs of PDF ebooks that I would like to process with RAG. Which approach do you think would be best, this one or GraphRAG? In my case I'm looking only at local models, as the costs would otherwise be very high. What if I convert all PDF pages into images first, process them with a local vision model like Phi-3 Vision, and then run the output through GraphRAG, would that work?
It is essential to conduct thorough preprocessing of the documents before entering them into the RAG pipeline. This involves extracting the text, tables, and images, and processing the latter through a vision module. Additionally, it is crucial to maintain content coherence by ensuring that references to tables and images are correctly preserved in the text. Only after this processing should the documents be passed to an LLM.
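A minimal sketch of the preprocessing flow this comment describes: split the text into chunks, detect figure references, and inline the vision-module caption next to each reference so image context travels with the text into the store. The `Chunk` structure, the `Figure N` reference pattern, and the `captions` dict are illustrative assumptions, not from the video.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    figure_refs: list = field(default_factory=list)  # preserved for coherence

def preprocess(page_text: str, captions: dict) -> list:
    """Split page text into paragraph chunks and attach the figures each
    chunk references, inlining vision-generated captions into the text."""
    chunks = []
    for para in page_text.split("\n\n"):
        refs = re.findall(r"Figure \d+", para)
        enriched = para
        for ref in refs:
            if ref in captions:
                # Inline the caption right where the figure is cited
                enriched += f" [{ref}: {captions[ref]}]"
        chunks.append(Chunk(text=enriched, figure_refs=refs))
    return chunks

page = ("The model architecture is shown in Figure 1.\n\n"
        "Training loss decreases steadily.")
captions = {"Figure 1": "encoder-decoder diagram with attention"}
for c in preprocess(page, captions):
    print(c.figure_refs, "->", c.text)
```

In a real pipeline the captions would come from a vision model (e.g. a local Phi-3 Vision, as another commenter suggests) rather than a hand-written dict.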
@engineerprompt
13 days ago
Agree!
@jtjames79
12 days ago
That's a lot of work. Can an AI do this?
@engineerprompt
12 days ago
@@jtjames79 Yup :)
Hi, your videos are very helpful, thank you.
@engineerprompt
13 days ago
Glad you like them!
I appreciate your effort. Please create one on fine-tuning the model for efficient retrieval, if possible with LangChain.
Where's the code used?
Can you please dive deeper into why Qdrant was used, and into the limitations of other vector DBs for storing both text and image embeddings? Thanks.
@engineerprompt
13 days ago
Will see if I can create a video on it.
Great job! Thanks!
@engineerprompt
13 days ago
thanks :)
Can we do this method using LangChain?
@engineerprompt
13 days ago
Yes, will be creating a video on it.
Is it better than GraphRAG? How does the output quality compare to it?
@engineerprompt
10 days ago
You could potentially create a graphRAG on top of it.
Thanks
Very nice video, but if you could do it with an open-source embedding model it would be very cool. Thank you for the video.
This approach is not good enough to add value. The pictures and text need to be referenced and linked across both vector stores to produce better similarities.
@engineerprompt
10 days ago
Watch my latest video :)
I expect image generation will evolve into another kind of breed... image generation based on image understanding grounded in facts.