Multi-modal RAG: Chat with Docs containing Images

Science and Technology

Learn how to build a multimodal RAG system using the CLIP model.
LINKS:
Notebook: tinyurl.com/pfc64874
Flow charts in the paper:
tinyurl.com/4pp78xuf
tinyurl.com/5yeww5py
tinyurl.com/4un6y6x5
tinyurl.com/2jkbb3ma
💻 RAG Beyond Basics Course:
prompt-s-site.thinkific.com/c...
Let's Connect:
🦾 Discord: / discord
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼Consulting: calendly.com/engineerprompt/c...
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Sign up for the localGPT newsletter:
tally.so/r/3y9bb0
00:00 Introduction to Multimodal RAG Systems
01:24 First Approach: Unified Vector Space
02:23 Second Approach: Grounding Modalities to Text
03:57 Third Approach: Separate Vector Stores
06:26 Code Implementation: Setting Up
09:05 Code Implementation: Downloading Data
11:13 Code Implementation: Creating Vector Stores
14:00 Querying the Vector Store
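
The third approach in the chapter list (separate vector stores) can be sketched roughly as below. This is a minimal illustration, not the notebook's code: the `VectorStore` class, the `retrieve` helper, the three-dimensional vectors, and the document names are all hypothetical stand-ins. In a real pipeline the vectors would come from a text embedding model and an image model such as CLIP, and the stores would live in a vector database.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class VectorStore:
    """Tiny in-memory store (hypothetical): unit vectors paired with payloads."""

    def __init__(self):
        self.items = []

    def add(self, vector, payload):
        self.items.append((normalize(vector), payload))

    def search(self, query, k=1):
        """Return the k payloads most similar to the query, with their scores."""
        q = normalize(query)
        scored = [
            (payload, sum(a * b for a, b in zip(vec, q)))
            for vec, payload in self.items
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Hand-made toy embeddings (assumption): real ones would come from models.
text_store = VectorStore()
text_store.add([0.9, 0.1, 0.0], "Section 2: encoder architecture")
text_store.add([0.0, 0.2, 0.9], "Section 5: training cost")

image_store = VectorStore()
image_store.add([0.8, 0.3, 0.1], "figure_1_architecture.png")
image_store.add([0.1, 0.1, 0.9], "figure_4_cost_chart.png")


def retrieve(text_query_vec, image_query_vec, k=1):
    """Query each store separately and return both result lists side by side."""
    return {
        "text": text_store.search(text_query_vec, k),
        "images": image_store.search(image_query_vec, k),
    }


# The same user question is embedded once per modality, then sent to each store.
result = retrieve([1.0, 0.0, 0.0], [1.0, 0.2, 0.0])
print(result["text"][0][0])
print(result["images"][0][0])
```

Keeping the two stores apart means each one can use the embedding model that suits its modality. The retrieved text chunks and images are then passed together to a multimodal LLM to generate the answer.
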
All Interesting Videos:
Everything LangChain: • LangChain
Everything LLM: • Large Language Models
Everything Midjourney: • MidJourney Tutorials
AI Image Generation: • AI Image Generation Tu...

Comments: 31

  • @engineerprompt · 13 days ago

    If you want to learn RAG Beyond Basics, check out this course: prompt-s-site.thinkific.com/courses/rag

  • @ilaydelrey3122 · 13 days ago

    A nice open-source, self-hosted version would be great.

  • @AI-Teamone · 11 days ago

    Such insightful information. Eagerly waiting for more multimodal approaches.

  • @tasfiulhedayet · 13 days ago

    We need more videos on this topic

  • @aerotheory · 13 days ago

    Keep going with this approach, it is something I have been struggling with.

  • @waju3234 · 12 days ago

    Me too. For my case, the answer is normally hidden behind the data, context and the images.

  • @RolandoLopezNieto · 13 days ago

    Lots of good info, thanks

  • @Techn0man1ac · 8 days ago

    What about doing the same thing, but using Llama 3 or a smaller local LLM?

  • @BACA01 · 13 days ago

    Thanks, your videos are very helpful. I have several gigabytes of PDF ebooks that I would like to process with RAG. Which approach do you think would be best, this one or GraphRAG? In my case I'm looking only at local models, as the costs would otherwise be very high. What if I converted all PDF pages into images first, processed them with a local model like Phi-3 Vision, and then processed the result with GraphRAG? Would that work?

  • @ignaciopincheira23 · 13 days ago

    It is essential to preprocess the documents thoroughly before entering them into the RAG system. This involves extracting the text, tables, and images, and processing the images through a vision module. It is also crucial to maintain content coherence by ensuring that references to tables and images are correctly preserved in the text. Only after this processing should the documents be passed to an LLM.

  • @engineerprompt · 13 days ago

    agree!

  • @jtjames79 · 12 days ago

    That's a lot of work. Can an AI do this?

  • @engineerprompt · 12 days ago

    @jtjames79 Yup :)

  • @ArdeniusYT · 13 days ago

    Hi, your videos are very helpful, thank you.

  • @engineerprompt · 13 days ago

    Glad you like them!

  • @pradeepmallampalli4541 · 13 days ago

    I appreciate your effort. Please create a video on fine-tuning the model for efficient retrieval, if possible with LangChain.

  • @JNET_Reloaded · 13 days ago

    Where's the code used?

  • @vinayakaholla · 13 days ago

    Can you please dive deeper into why Qdrant was used, and into the limitations other vector DBs have in storing both text and image embeddings? Thanks.

  • @engineerprompt · 13 days ago

    will see if I can create a video on it.

  • @mohsenghafari7652 · 13 days ago

    Great job! Thanks

  • @engineerprompt · 13 days ago

    thanks :)

  • @amanharis1845 · 13 days ago

    Can we do this using LangChain?

  • @engineerprompt · 13 days ago

    Yes, will be creating a video on it.

  • @codelucky · 10 days ago

    Is it better than GraphRAG? How does the output quality compare to it?

  • @engineerprompt · 10 days ago

    You could potentially create a graphRAG on top of it.

  • @garfield584 · 12 days ago

    Thanks

  • @legendchdou9578 · 13 days ago

    Very nice video, but if you could do it with an open-source embedding model, that would be very cool. Thank you for the video.

  • @redbaron3555 · 13 days ago

    This approach is not good enough to add value. The pictures and text need to be referenced and linked in both vector stores to produce better similarity matches.

  • @engineerprompt · 10 days ago

    watch my latest video :)

  • @RickySupriyadi · 13 days ago

    I expect image generation will become another kind of breed: image generation based on image understanding, grounded in facts.
