LangChain - Parent-Document Retriever Deep Dive with Custom PgVector Store

In this video we take a deep dive into the Parent-Document Retriever. We not only use the LangChain docstore, but also create our own custom docstore. This is quite an advanced video, and probably the most advanced one you will find on this topic on YouTube.
Code: github.com/Coding-Crashkurse/...
Timestamps:
0:00 Introduction to the Parent-Document Retriever
1:55 PD-Retriever with InMemory Store
6:37 PD-Retriever with Postgres-based Store
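
A minimal sketch of the ParentDocumentRetriever pattern the video covers, assuming OpenAI embeddings; the chunk sizes, file name, collection name, and connection string are illustrative, not taken from the repo:

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

docs = TextLoader("data.txt").load()  # hypothetical input file

# Small child chunks get embedded; the full parent chunks live in the docstore.
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

vectorstore = PGVector(
    collection_name="child_chunks",  # assumed collection name
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectordb",
    embedding_function=OpenAIEmbeddings(),
)
docstore = InMemoryStore()  # parents are lost when the process exits

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
retriever.add_documents(docs)

# Similarity search runs over the small chunks,
# but the retriever returns the larger parents.
results = retriever.get_relevant_documents("my query")
```

Replacing the InMemoryStore with a persistent, Postgres-backed docstore is the custom part the second half of the video builds.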

Comments: 23

  • @jarekmor · 10 days ago

    Hi! I like your videos and I have learned a lot from them. Your approach is really production-ready, and I am implementing some of your ideas in my PoC for one of my customers. There will be more stuff in the PoC - MS SharePoint On-Premise integration, AD and LDAP authorization, Neo4j, multi-vector-store retrieval, etc. But your ideas were the foundation for my project. Thank you very much and keep going! :-)

  • @varruktalalle4090 · a month ago

    Can you explain how to reload the pg ParentDocumentRetriever? E.g. first create the retriever as you showed, then reload it in a different script.
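
One way to do this, sketched under the assumption that the child chunks went into PGVector and the parents into a persistent LocalFileStore: both stores survive the process, so the second script only re-instantiates them with the same collection name and path and skips add_documents. All names below are illustrative.

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import LocalFileStore, create_kv_docstore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

# Point at the same collection and docstore path as the indexing script.
vectorstore = PGVector(
    collection_name="child_chunks",
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectordb",
    embedding_function=OpenAIEmbeddings(),
)
docstore = create_kv_docstore(LocalFileStore("./parent_docs"))

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=200),  # same as before
)

# No add_documents call here: both stores already hold the indexed data.
docs = retriever.get_relevant_documents("my query")
```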

  • @maxlgemeinderat9202 · a month ago

    Working on exactly this at the moment! My eval showed that the ParentDocumentRetriever works best for my use case. What do you think of my idea of adding a reranker (e.g. ColBERT) after retrieving the small chunks, and then only getting the parent chunks of the reranked child chunks? At the moment I am trying to implement this, but I think I have to change the MultiVectorRetriever class in LangChain. Or how would you add this to your solution (e.g. reranking with LangChain's ContextualCompressionRetriever)? I can't rerank the results at the end as usual, since the parent chunks will probably be too large for a reranking model with 512 max_tokens.
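
A sketch of that idea which leaves the MultiVectorRetriever class untouched: search the child chunks directly, rerank only those (they fit a 512-token model), then fetch the parents by id. A sentence-transformers cross-encoder stands in for ColBERT here; `retriever` and `vectorstore` are the objects from the video, everything else is an assumption.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_then_fetch_parents(query: str, k_children: int = 20, k_parents: int = 4):
    # 1. Similarity search over the small child chunks only.
    children = vectorstore.similarity_search(query, k=k_children)

    # 2. Rerank the children; small chunks stay under the 512-token limit.
    scores = reranker.predict([(query, c.page_content) for c in children])
    ranked = [c for _, c in sorted(zip(scores, children), key=lambda p: p[0], reverse=True)]

    # 3. Collect parent ids in rank order, deduplicated
    #    (ParentDocumentRetriever stores the link under id_key, "doc_id" by default).
    parent_ids: list[str] = []
    for child in ranked:
        pid = child.metadata[retriever.id_key]
        if pid not in parent_ids:
            parent_ids.append(pid)

    # 4. Fetch only the top parents from the docstore.
    return [d for d in retriever.docstore.mget(parent_ids[:k_parents]) if d is not None]
```

Because only the child chunks go through the reranker, the token limit is never hit; the large parents are looked up afterwards.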

  • @M10n8 · a month ago

    This can be extended nicely with the MultiVectorRetriever, which pairs well with the 'unstructured' library: you can build RAG over PDF files where unstructured extracts tables, images, and text separately, then ask a model to caption the images (base64 passed to OpenAI), summarize the tables (and, if you like, the text too), store all of that, and retrieve it with a MultiVectorRetriever using PGVector as the DB ;-) Can I request a video? ++
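
For reference, the storage side of that pattern in rough outline; the unstructured extraction and the OpenAI captioning/summarization steps are elided, and the placeholder data, collection name, and connection string are assumptions:

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import PGVector
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Stand-ins for what unstructured + a captioning model would produce.
raw_elements = [Document(page_content="<table html>"), Document(page_content="<image ref>")]
summaries = ["summary of the table", "caption of the image"]

vectorstore = PGVector(
    collection_name="element_summaries",
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectordb",
    embedding_function=OpenAIEmbeddings(),
)
retriever = MultiVectorRetriever(
    vectorstore=vectorstore, docstore=InMemoryStore(), id_key="doc_id"
)

# Embed the summaries, but link each one to its raw element via a shared id.
ids = [str(uuid.uuid4()) for _ in raw_elements]
summary_docs = [
    Document(page_content=summary, metadata={"doc_id": ids[i]})
    for i, summary in enumerate(summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(ids, raw_elements)))

# A query matches a summary, but the raw table/image element is returned.
results = retriever.get_relevant_documents("what does the table show?")
```

The shared doc_id is the whole trick: the summary is what gets embedded, the raw element is what comes back.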

  • @codingcrashcourses8533 · a month ago

    The next videos will be about LangGraph, but maybe after that :)

  • @thawab85 · a month ago

    You had a few videos on RAPTOR; it would be great if you could compare the indexing methods and the use cases each one is recommended for.

  • @codingcrashcourses8533 · a month ago

    Great idea!

  • @AngelWhite007 · a month ago

    Please make a video on creating a sidebar like ChatGPT's using ReactJS and LangChain in Python

  • @codingcrashcourses8533 · a month ago

    Man, this is 90 percent frontend work; you will find better people to build this.

  • @Emmit-hv5pw · a month ago

    Thanks!! Any plans for a tutorial on a custom agent with memory, with custom tools to retrieve information from a SQL DB, a vector store (PDFs), and tool calling (real-time info), with eval on LangSmith, in a real business-case environment?

  • @codingcrashcourses8533 · a month ago

    Probably too difficult to do all that stuff at once in a tutorial. Maybe an easier use case with RAG and memory.

  • @yazanrisheh5127 · a month ago

    First

  • @angelmoreno3383 · 15 days ago

    That is a really interesting implementation! I wonder if this could help reduce the time of the retriever.add_documents operation. I'm trying to build a RAG over around 100 PDFs, and when testing the ParentDocumentRetriever this step takes far too long. Do you know of any solution for this?

  • @codingcrashcourses8533 · 15 days ago

    Hm, how do you preprocess your PDFs? How many chunks do you have at the end?

  • @angelmoreno3383 · 15 days ago

    @codingcrashcourses8533 In my vectorstore they are split with a chunk size of 800. Into my store I'm loading them using PyPDF loaders and a kv docstore

  • @angelmoreno3383 · 15 days ago

    @codingcrashcourses8533 I'm using the PyPDF loader and then storing the documents in a LocalFileStore via create_kv_docstore. At the end my docstore has around 350 chunks
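
A hedged sketch of two things that often help with slow indexing: load the PDFs concurrently and feed add_documents in batches. The folder name, batch size, and worker count are illustrative; `retriever` is a ParentDocumentRetriever built as in the video.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader

pdf_paths = sorted(Path("./pdfs").glob("*.pdf"))  # assumed folder

def load_pdf(path: Path):
    return PyPDFLoader(str(path)).load()  # one Document per page

# Threads overlap the file IO across the ~100 PDFs;
# heavy parsing can still be CPU-bound.
with ThreadPoolExecutor(max_workers=8) as pool:
    docs = [page for pages in pool.map(load_pdf, pdf_paths) for page in pages]

# add_documents splits and embeds every child chunk; smaller batches keep each
# call short and make progress visible.
batch_size = 50
for i in range(0, len(docs), batch_size):
    retriever.add_documents(docs[i : i + batch_size])
    print(f"indexed {min(i + batch_size, len(docs))}/{len(docs)} pages")
```

The embedding calls for the child chunks usually dominate the runtime, so batching buys restartability and visibility more than raw speed.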

  • @andreypetrunin5702 · a month ago

    Markus, hi. Can you share the code for this video? I want to adapt it to the Xata database.

  • @codingcrashcourses8533 · a month ago

    I added the notebook

  • @andreypetrunin5702 · a month ago

    @codingcrashcourses8533 Thank you!!!!

  • @andreypetrunin5702 · 20 days ago

    @codingcrashcourses8533 The code only creates and saves the database, but how do I load it when I want to reuse it? If I just missed it, I apologize.

  • @codingcrashcourses8533 · 20 days ago

    @andreypetrunin5702 You don't have to "reload" anything when you use PgVector; the service runs permanently inside a container. The get_relevant_documents method already uses it.

  • @andreypetrunin5702 · 20 days ago

    @codingcrashcourses8533 I was confusing it with the local FAISS and Chroma databases. ))))