LangChain - Parent-Document Retriever Deep Dive with Custom PgVector Store

In this video we take a deep dive into the Parent-Document Retriever. We not only use the LangChain docstore, but also create our own custom docstore. This is quite an advanced video, and probably the most advanced one you will find on this topic on YouTube.
Code: github.com/Coding-Crashkurse/...
Timestamps:
0:00 Introduction to the Parent-Document Retriever
1:55 PD-Retriever with InMemory Store
6:37 PD-Retriever with Postgres-based Store
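
A minimal sketch of the ParentDocumentRetriever pattern the video covers, assuming OpenAI embeddings; the chunk sizes, file name, collection name, and connection string are illustrative, not taken from the repo:

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

docs = TextLoader("data.txt").load()  # hypothetical input file

# Small child chunks get embedded; the full parent chunks live in the docstore.
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

vectorstore = PGVector(
    collection_name="child_chunks",  # assumed collection name
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectordb",
    embedding_function=OpenAIEmbeddings(),
)
docstore = InMemoryStore()  # parents are lost when the process exits

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
retriever.add_documents(docs)

# Similarity search runs over the small chunks,
# but the retriever returns the larger parents.
results = retriever.get_relevant_documents("my query")
```

Replacing the InMemoryStore with a persistent, Postgres-backed docstore is the custom part the second half of the video builds.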

Comments: 23

  • @jarekmor · 10 days ago

    Hi! I like your videos and I have learned a lot from them. Your approach is really production-ready, and I am implementing some of your ideas in my PoC for one of my customers. There will be more stuff in the PoC - MS SharePoint On-Premise integration, AD and LDAP authorization, Neo4j, multi-vector-store retrieval, etc. But your ideas were the foundation for my project. Thank you very much and keep going! :-)

  • @varruktalalle4090 · a month ago

    Can you explain how to reload the pg ParentDocumentRetriever? E.g. first create the retriever as you showed, then reload it in a different script.
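
One way to do this, sketched under the assumption that the child chunks went into PGVector and the parents into a persistent LocalFileStore: both stores survive the process, so the second script only re-instantiates them with the same collection name and path and skips add_documents. All names below are illustrative.

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import LocalFileStore, create_kv_docstore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

# Point at the same collection and docstore path as the indexing script.
vectorstore = PGVector(
    collection_name="child_chunks",
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectordb",
    embedding_function=OpenAIEmbeddings(),
)
docstore = create_kv_docstore(LocalFileStore("./parent_docs"))

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=200),  # same as before
)

# No add_documents call here: both stores already hold the indexed data.
docs = retriever.get_relevant_documents("my query")
```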

  • @maxlgemeinderat9202 · a month ago

    Working on exactly this at the moment! My eval showed that the ParentDocumentRetriever works best for my use case. What do you think of my idea of adding a reranker (e.g. ColBERT) after retrieving the small chunks, and then only getting the parent chunks of the reranked child chunks? At the moment I am trying to implement this, but I think I have to change the MultiVectorRetriever class in LangChain. Or how would you add this to your solution (e.g. reranking with LangChain's ContextualCompressionRetriever)? I can't rerank the results at the end as usual, since the parent chunks will probably be too large for a reranking model with 512 max_tokens.
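
A sketch of that idea which leaves the MultiVectorRetriever class untouched: search the child chunks directly, rerank only those (they fit a 512-token model), then fetch the parents by id. A sentence-transformers cross-encoder stands in for ColBERT here; `retriever` and `vectorstore` are the objects from the video, everything else is an assumption.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank_then_fetch_parents(query: str, k_children: int = 20, k_parents: int = 4):
    # 1. Similarity search over the small child chunks only.
    children = vectorstore.similarity_search(query, k=k_children)

    # 2. Rerank the children; small chunks stay under the 512-token limit.
    scores = reranker.predict([(query, c.page_content) for c in children])
    ranked = [c for _, c in sorted(zip(scores, children), key=lambda p: p[0], reverse=True)]

    # 3. Collect parent ids in rank order, deduplicated
    #    (ParentDocumentRetriever stores the link under id_key, "doc_id" by default).
    parent_ids: list[str] = []
    for child in ranked:
        pid = child.metadata[retriever.id_key]
        if pid not in parent_ids:
            parent_ids.append(pid)

    # 4. Fetch only the top parents from the docstore.
    return [d for d in retriever.docstore.mget(parent_ids[:k_parents]) if d is not None]
```

Because only the child chunks go through the reranker, the token limit is never hit; the large parents are looked up afterwards.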

  • @M10n8 · a month ago

    This can be extended nicely with the MultiVectorRetriever, which pairs well with the 'unstructured' library: you can build RAG over PDF files where unstructured extracts tables, images, and text separately, then ask a model to caption the images (base64 passed to OpenAI), summarize the tables (and, if you like, the text too), store all of that, and retrieve it with a MultiVectorRetriever using PGVector as the DB ;-) Can I request a video? ++
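
For reference, the storage side of that pattern in rough outline; the unstructured extraction and the OpenAI captioning/summarization steps are elided, and the placeholder data, collection name, and connection string are assumptions:

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import PGVector
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

# Stand-ins for what unstructured + a captioning model would produce.
raw_elements = [Document(page_content="<table html>"), Document(page_content="<image ref>")]
summaries = ["summary of the table", "caption of the image"]

vectorstore = PGVector(
    collection_name="element_summaries",
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/vectordb",
    embedding_function=OpenAIEmbeddings(),
)
retriever = MultiVectorRetriever(
    vectorstore=vectorstore, docstore=InMemoryStore(), id_key="doc_id"
)

# Embed the summaries, but link each one to its raw element via a shared id.
ids = [str(uuid.uuid4()) for _ in raw_elements]
summary_docs = [
    Document(page_content=summary, metadata={"doc_id": ids[i]})
    for i, summary in enumerate(summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(ids, raw_elements)))

# A query matches a summary, but the raw table/image element is returned.
results = retriever.get_relevant_documents("what does the table show?")
```

The shared doc_id is the whole trick: the summary is what gets embedded, the raw element is what comes back.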

  • @codingcrashcourses8533 · a month ago

    The next videos will be about LangGraph, but maybe after that :)

  • @thawab85 · a month ago

    You had a few videos on RAPTOR; it would be great if you could compare the indexing methods and the use cases each one is recommended for.

  • @codingcrashcourses8533 · a month ago

    Great idea!

  • @AngelWhite007 · a month ago

    Please make a video on creating a sidebar like ChatGPT's using ReactJS and LangChain in Python

  • @codingcrashcourses8533 · a month ago

    Man, this is 90 percent frontend work; you will find better people to build this.

  • @Emmit-hv5pw · a month ago

    Thanks!! Any plans for a tutorial on a custom agent with memory, with custom tools to retrieve information from a SQL DB, a vector store (PDFs), and tool calling (real-time info), with eval on LangSmith, in a real business-case environment?

  • @codingcrashcourses8533 · a month ago

    Probably too difficult to do all that stuff at once in a tutorial. Maybe an easier use case with RAG and memory.

  • @yazanrisheh5127 · a month ago

    First

  • @angelmoreno3383 · 15 days ago

    That is a really interesting implementation! I wonder if this could help reduce the time of the retriever.add_documents operation. I'm trying to build a RAG over around 100 PDFs, and when testing the ParentDocumentRetriever this step takes far too long. Do you know of any solution for this?

  • @codingcrashcourses8533 · 15 days ago

    Hm, how do you preprocess your PDFs? How many chunks do you have at the end?

  • @angelmoreno3383 · 15 days ago

    @codingcrashcourses8533 In my vectorstore they are split with a chunk size of 800. Into my store I'm loading them using PyPDF loaders and a kv docstore

  • @angelmoreno3383 · 15 days ago

    @codingcrashcourses8533 I'm using the PyPDF loader and then storing the documents in a LocalFileStore via create_kv_docstore. At the end my docstore has around 350 chunks
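
A hedged sketch of two things that often help with slow indexing: load the PDFs concurrently and feed add_documents in batches. The folder name, batch size, and worker count are illustrative; `retriever` is a ParentDocumentRetriever built as in the video.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader

pdf_paths = sorted(Path("./pdfs").glob("*.pdf"))  # assumed folder

def load_pdf(path: Path):
    return PyPDFLoader(str(path)).load()  # one Document per page

# Threads overlap the file IO across the ~100 PDFs;
# heavy parsing can still be CPU-bound.
with ThreadPoolExecutor(max_workers=8) as pool:
    docs = [page for pages in pool.map(load_pdf, pdf_paths) for page in pages]

# add_documents splits and embeds every child chunk; smaller batches keep each
# call short and make progress visible.
batch_size = 50
for i in range(0, len(docs), batch_size):
    retriever.add_documents(docs[i : i + batch_size])
    print(f"indexed {min(i + batch_size, len(docs))}/{len(docs)} pages")
```

The embedding calls for the child chunks usually dominate the runtime, so batching buys restartability and visibility more than raw speed.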

  • @andreypetrunin5702 · a month ago

    Markus, hi. Can you share the code for this video? I want to adapt it to the Xata database.

  • @codingcrashcourses8533 · a month ago

    I added the notebook

  • @andreypetrunin5702 · a month ago

    @codingcrashcourses8533 Thank you!!!!

  • @andreypetrunin5702 · 20 days ago

    @codingcrashcourses8533 The code only creates and saves the database, but how do I load it when I want to reuse it? If I just missed it, I apologize.

  • @codingcrashcourses8533 · 20 days ago

    @andreypetrunin5702 You don't have to "reload" anything when you use PgVector; the service runs permanently inside a container. The get_relevant_documents method already uses it.

  • @andreypetrunin5702 · 20 days ago

    @codingcrashcourses8533 I was confusing it with the local FAISS and Chroma databases. ))))