Multi-modal RAG: Chat with Docs containing Images
Science & Technology
Learn how to build a multimodal RAG system using the CLIP model.
LINKS:
Notebook: tinyurl.com/pfc64874
Flow charts in the paper:
tinyurl.com/4pp78xuf
tinyurl.com/5yeww5py
tinyurl.com/4un6y6x5
tinyurl.com/2jkbb3ma
💻 RAG Beyond Basics Course:
prompt-s-site.thinkific.com/c...
Let's Connect:
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Sign up for the localGPT newsletter:
tally.so/r/3y9bb0
00:00 Introduction to Multimodal RAG Systems
01:24 First Approach: Unified Vector Space
02:23 Second Approach: Grounding Modalities to Text
03:57 Third Approach: Separate Vector Stores
06:26 Code Implementation: Setting Up
09:05 Code Implementation: Downloading Data
11:13 Code Implementation: Creating Vector Stores
14:00 Querying the Vector Store
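The third approach from the chapters above (separate vector stores per modality, with results merged at query time) can be sketched as below. This is a minimal illustration: the 4-d dummy vectors and the `VectorStore` class are stand-ins for real CLIP embeddings and a real database such as Qdrant.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class VectorStore:
    """Minimal in-memory vector store holding (embedding, payload) pairs."""
    def __init__(self):
        self.items = []

    def add(self, vector, payload):
        self.items.append((np.asarray(vector, dtype=float), payload))

    def search(self, query_vector, top_k=2):
        q = np.asarray(query_vector, dtype=float)
        scored = [(cosine_sim(q, v), p) for v, p in self.items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:top_k]

# Separate stores for each modality, queried independently
text_store, image_store = VectorStore(), VectorStore()

# Dummy 4-d embeddings stand in for real CLIP text/image vectors
text_store.add([1, 0, 0, 0], "paragraph about transformers")
text_store.add([0, 1, 0, 0], "paragraph about diffusion")
image_store.add([0.9, 0.1, 0, 0], "figure: transformer architecture")

# Query both stores, then merge and re-rank across modalities
query = [1, 0, 0, 0]
hits = text_store.search(query, top_k=1) + image_store.search(query, top_k=1)
hits.sort(key=lambda s: s[0], reverse=True)
for score, payload in hits:
    print(round(score, 2), payload)
```

The merge-and-re-rank step is the key design choice of this approach: each modality keeps its own index, and only the final ranking mixes text and image hits.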
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...
Comments: 31
If you want to learn RAG Beyond Basics, checkout this course: prompt-s-site.thinkific.com/courses/rag
a nice open source and self hosted version would be great
Such insightful information. Eagerly waiting for more multimodal approaches.
We need more videos on this topic
Keep going with this approach, it is something I have been struggling with.
@waju3234
12 days ago
Me too. In my case, the answer is usually hidden behind the data, context, and the images.
Lots of good info, thanks
What about doing the same, but using LLaMA 3 or a smaller local LLM?
Thanks, your videos are very helpful. I have several GBs of PDF ebooks that I would like to process with RAG. Which approach do you think would be best, this one or GraphRAG? In my case I'm looking only at local models, as the costs would otherwise be very high. What if I convert all PDF pages into images first, process them with a local vision model like Phi-3 Vision, and then run the output through GraphRAG, would that work?
It is essential to conduct thorough preprocessing of the documents before entering them into the RAG pipeline. This involves extracting the text, tables, and images, and processing the latter through a vision module. Additionally, it is crucial to maintain content coherence by ensuring that references to tables and images are correctly preserved in the text. Only after this processing should the documents be passed to an LLM.
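A minimal sketch of the preprocessing flow this comment describes: split the text into chunks, detect figure references, and inline the vision-module caption next to each reference so image context travels with the text into the store. The `Chunk` structure, the `Figure N` reference pattern, and the `captions` dict are illustrative assumptions, not from the video.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    figure_refs: list = field(default_factory=list)  # preserved for coherence

def preprocess(page_text: str, captions: dict) -> list:
    """Split page text into paragraph chunks and attach the figures each
    chunk references, inlining vision-generated captions into the text."""
    chunks = []
    for para in page_text.split("\n\n"):
        refs = re.findall(r"Figure \d+", para)
        enriched = para
        for ref in refs:
            if ref in captions:
                # Inline the caption right where the figure is cited
                enriched += f" [{ref}: {captions[ref]}]"
        chunks.append(Chunk(text=enriched, figure_refs=refs))
    return chunks

page = ("The model architecture is shown in Figure 1.\n\n"
        "Training loss decreases steadily.")
captions = {"Figure 1": "encoder-decoder diagram with attention"}
for c in preprocess(page, captions):
    print(c.figure_refs, "->", c.text)
```

In a real pipeline the captions would come from a vision model (e.g. a local Phi-3 Vision, as another commenter suggests) rather than a hand-written dict.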
@engineerprompt
13 days ago
Agree!
@jtjames79
12 days ago
That's a lot of work. Can an AI do this?
@engineerprompt
12 days ago
@@jtjames79 Yup :)
Hi, your videos are very helpful, thank you.
@engineerprompt
13 days ago
Glad you like them!
I appreciate your effort. Please create one on fine-tuning the model for efficient retrieval, if possible with LangChain.
Where's the code used?
Can you please dive deeper into why Qdrant was used, and into the limitations of other vector DBs for storing both text and image embeddings? Thanks.
@engineerprompt
13 days ago
Will see if I can create a video on it.
Great job! Thanks!
@engineerprompt
13 days ago
thanks :)
Can we do this method using LangChain?
@engineerprompt
13 days ago
Yes, will be creating a video on it.
Is it better than GraphRAG? How does the output quality compare to it?
@engineerprompt
10 days ago
You could potentially create a graphRAG on top of it.
Thanks
Very nice video, but if you could do it with an open-source embedding model it would be very cool. Thank you for the video.
This approach is not good enough to add value. The pictures and text need to be referenced and linked across both vector stores to produce better similarities.
@engineerprompt
10 days ago
Watch my latest video :)
I expect image generation will evolve into another kind of breed... image generation based on image understanding grounded in facts.