Advanced RAG with Knowledge Graphs (Neo4J demo)
I recently created a demo for some prospective clients of mine, demonstrating how to use Large Language Models (LLMs) together with graph databases like Neo4J.
The two have a lot of interesting interactions: you can now create knowledge graphs more easily than ever before by having an LLM find the graph's entities and relationships in your unstructured data, rather than having to do all of that manually.
On top of that, graph databases also have some advantages for Retrieval Augmented Generation (RAG) applications compared to vector search, which is currently the prevailing approach to RAG.
Connect with me on LinkedIn: / johannesjolkkonen
▬▬▬▬▬▬ T I M E S T A M P S ▬▬▬▬▬▬
0:00 - Intro
2:16 - Demo starts
2:55 - Creating graph from unstructured data
4:23 - Chatting with the knowledge graph
5:55 - Advantages of Graphs vs Vector Search
Comments: 77
Hey everybody, thanks for the great comments! Finally got around to making a more detailed tutorial for this demo, with code available on Github. You can check it out here: kzread.info/dash/bejne/ppd8q6Z8d9icido.html
this is incredible! I see so many use cases opening up. thank you for sharing this!
This is a great video. It clearly explained to me the difference between vector databases and graph databases, and the new features we can build using graph databases. Thank you.
Super nice food for thought. Thanks for sharing an alternative. Would love a deeper dive with some clear examples confirming the 3 advantages 😊 But might experiment myself for fun too!
This concept & this video are truly amazing. I have a specific idea for how to apply this. I think this might change my whole project & I will explore this graph-based approach!! Great work, thank you.
Very nice demo. It showed why and how to use the graph database for RAG and answered questions that I came up with while watching.
great content and delivery - love your work
Really neat demo! I think this works so well because graphs help LLMs approximate the sort of clear relationships humans have in their brain about the world.
Great video! I wanted to explore the graph dbs exactly for this use case. Imagine also adding work pieces to this. Jiras, code reviews, comments, etc. P.S. the music is great 😂
Well illustrated! Thanks
I love when data engineers make videos, it's so easy to understand. Even the description is structured 👍
Wow. This is amazing
I had to subscribe based on this idea alone! I'm trying to think of another way I could implement this with standard RAG for those that use LangChain/Flowise, and Mermaid code to hold the node information.
Gotta look at decentralized knowledge graphs. Those are the future of RAG databases.
Good presentation. Thank you!
That's exactly what I am looking for! Apart from the tutorials, are you also considering starting a Discord channel where people can chat? I think there is growing interest in KG + LLMs but nowhere to discuss it.
very interesting thanks !
thank you for the video
Excellent job Johannes! After watching the video "Knowledge Graph Construction Demo from raw text using an LLM" by Neo4j, I came across your video and found that you addressed the crucially important question some of us are thinking about: "How can we improve the way we do RAG?" I agree with your assessment that using KGs provides very significant benefits that would compel us to use this approach over vector embeddings. However, am I correct in understanding that we need better workflows/pipelines to get all the kinds of data we need into a KG to take fuller advantage of these benefits? Sounds like you may have listened to Denny Vrandecic discussing The Future of Knowledge Graphs in a World of Large Language Models.
@johannesjolkkonen
6 months ago
Hey Steve, thank you! You are correct: using a KG will almost certainly involve more pre-processing/workflows compared to just having an unstructured text/vector database. LLMs can be very useful in the process of extracting entities and relationships for your graph, but it's still a serious undertaking, with a lot of quality checks needed to make it production-ready. It's all still pretty experimental and niche, but I think this approach will become increasingly mainstream over the next 1-2 years. I haven't checked out Denny's video, but I definitely will now! I can also recommend going through the content that the Neo4J team has been creating around LLMs.
@evetsnilrac9689
6 months ago
@@johannesjolkkonen Here's my summary of the key points of Denny's presentation:
• LLMs are expensive to train.
• LLMs are expensive to run at inference time.
• LLMs can't be trusted to correctly output accurate facts.
 ◦ Answers are just guesses based on stochastic probability, even if the model has inferred a different answer in a different language; i.e., it does not "know" what it "knows" because it does not maintain a list of all the things it knows, it just generates outputs at inference runtime.
• Knowledge in ChatGPT seems to be stored not in a language-independent way, but within each individual language.
• They are not very good at math, and it would be economically inappropriate to use them for math computation.
• Autoregressive transformer models such as ChatGPT are supposed to be Turing complete, but they are a very expensive reiteration of Turing's tarpit: you could do everything with them, but that doesn't mean you should.
• It is economically inappropriate to try to improve an LLM's ability to internalize knowledge (know what it knows), because it will always be cheaper, faster, and more accurate(?) to externalize it in a graph store and look it up when needed. In a world where language models can generate infinite content, "knowledge" (vs. content) becomes valuable.
• We don't want to machine-learn Obama's place of birth every time we need it.
• We want to store it once and for all, and that's what knowledge graphs are good for: keeping your valuable knowledge safe. The knowledge graph provides the ground truth for your LLMs.
• LLMs are probably the best tool for knowledge extraction we have seen in a decade or two.
• They can be an amazing tool to speed up the creation of a knowledge graph.
• We want to extract knowledge into a symbolic form. We want the system to overfit for truth.
• And this is why it makes so much sense to store the knowledge in a symbolic system that can be edited, audited, curated, and understood, where we can cover the long tail by simply adding new nodes to the knowledge graph that can be looked up, instead of systems that need to be trained to return knowledge with a certain probability and may make things up on the fly.
@wdonno
6 months ago
@@evetsnilrac9689, such a helpful summary! Thank you!
A really awesome video Johannes, wondering if there is a github repo for this? Thanks.
Fabulous video, thanks! Would be even better with no music, or at least if it was very much lower volume :)
Thanks for sharing! Can you also share how you are dealing with consolidation of output nodes? Some project descriptions might generate "Graph Neural Nets", others "Graph Neural Network" or "GNN".
@johannesjolkkonen
6 months ago
Hey Djan! Consolidation/entity resolution is definitely one of the most interesting challenges with these kinds of applications, but in this demo there's nothing implemented for that yet
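Nothing in the demo implements this, but as an illustration of what a very first step at entity resolution could look like, here is a minimal, hypothetical sketch: extracted names are normalized through an alias table before being merged into the graph. The alias table and entity names are made-up examples, and real entity resolution would need fuzzier matching than this.

```python
# Hypothetical sketch: collapse entity aliases before inserting nodes
# into the graph, so "GNN" and "Graph Neural Nets" become one node.

def canonicalize(name: str, aliases: dict) -> str:
    """Map an extracted entity name to its canonical form (case-insensitive)."""
    key = name.strip().lower()
    return aliases.get(key, name.strip())

# Example alias table (would be curated or learned in practice).
ALIASES = {
    "graph neural nets": "Graph Neural Network",
    "graph neural networks": "Graph Neural Network",
    "gnn": "Graph Neural Network",
}

extracted = ["Graph Neural Nets", "GNN", "Neo4j"]
resolved = [canonicalize(e, ALIASES) for e in extracted]
print(resolved)  # ['Graph Neural Network', 'Graph Neural Network', 'Neo4j']
```

A dictionary lookup like this only catches known aliases; embedding similarity or string-distance matching is a common next step for unseen variants.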
Nice!
How does the chat interface communicate with the database? Is it based on prompts that generate Cypher queries?
I'm working on something similar, but you make it look easy! Would love to chat and see if we could collaborate on something to get in front of clients :)
I would be curious about your view on when vector seach is better suited than graph search for RAG. Thanks for this great video! It helps a lot
@johannesjolkkonen
5 months ago
Thank you! Vector search is still great for a lot of situations where answers can be found directly in the unstructured text. Where graphs (or really any other more "structured" databases) start to shine is when you need to understand concepts and their relationships beyond what's explicitly said in the text. But this is a lot more demanding too, and often not necessary. Also, the two aren't mutually exclusive, with Neo4j (and recently AWS Neptune, another graph DB) supporting vector search to also find nodes by their similarity. This combination is super exciting!
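To make "searching nodes by their similarity" concrete: conceptually it means ranking node embeddings by cosine similarity to a query embedding. The sketch below uses tiny made-up 3-dimensional vectors purely for illustration; in Neo4j this ranking is done by a native vector index, not by hand.

```python
# Illustrative sketch of vector similarity over graph nodes:
# rank node embeddings by cosine similarity to a query embedding.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Made-up node embeddings (real ones would come from an embedding model).
nodes = {
    "ProjectA": [0.9, 0.1, 0.0],
    "ProjectB": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.0]

best = max(nodes, key=lambda n: cosine(nodes[n], query))
print(best)  # ProjectA
```

The appeal of the hybrid approach is that a hit found this way is a graph node, so you can immediately traverse its relationships rather than stopping at the retrieved text chunk.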
Is it better in some way than using SQL db and relations based on for example sql schemas etc. which also can be easily used when doing retrieval?
Did you use attributes to add more characteristics to the nodes and edges, for example to score the strength of a relationship? I have tried asking LLMs to create graphs from their native knowledge using various prompts, and they do poorly, which is interesting. Does it indicate a lack of understanding of relationships, or more of a fine-tuning issue? What do you think?
@johannesjolkkonen
5 months ago
Hey, I haven't added such metadata, but that's a great idea! For your problem, I'd say the most important thing is to make sure you tell the LLM what kinds of entities and relationships you are looking for. In other words, you should have a pre-defined schema in mind for your graph. Some pre-processing might also be useful if your data is very messy.
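One hypothetical way to express that pre-defined schema in practice is to bake the allowed entity and relationship types into the extraction prompt, so the model can't invent its own. This is a sketch of the idea, not the demo's actual prompt; the type names and output format are example choices.

```python
# Hypothetical sketch: constrain LLM entity/relationship extraction
# with a pre-defined graph schema embedded in the prompt.

ENTITY_TYPES = ["Person", "Project", "Technology"]      # example schema
RELATION_TYPES = ["WORKS_ON", "USES"]                   # example schema

def extraction_prompt(text: str) -> str:
    """Build an extraction prompt that restricts the model to the schema."""
    return (
        "Extract entities and relationships from the text below.\n"
        f"Only use these entity types: {', '.join(ENTITY_TYPES)}\n"
        f"Only use these relationship types: {', '.join(RELATION_TYPES)}\n"
        "Return JSON triples of the form [subject, relation, object].\n\n"
        f"Text:\n{text}"
    )

print(extraction_prompt("Alice built the billing service with Python."))
```

The returned string would then be sent to the LLM of your choice; validating that the model's output actually sticks to the listed types is a separate (and necessary) step.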
Very nice and inspiring. Quick question: if GPT-4 creates incorrect Cypher, do we try to detect it and auto-fix/retry?
@johannesjolkkonen
5 months ago
Thank you! You can see the details in my latest video, but in this setup we aren't doing that. That's definitely one of the simplest and most impactful ways this could be improved.
I like the music
I am very excited to see how your code works. Please share your solution.
Around 5:45, how does the LLM combine the graph search with "normal" LLM generation? What happens behind the scenes?
@johannesjolkkonen
5 months ago
Hey! I show that part in detail in my latest video, here: kzread.info/dash/bejne/faCVk8WYoJjcYNo.html
When the text to cypher conversion happens, how does the LLM know how the nodes/edges are labeled and therefore able to accurately write the query?
@johannesjolkkonen
6 months ago
Hey Jeremy! If you are referring to the chat interaction, we pass the schema of the graph onto the LLM, alongside the user's query. For other questions, I just released a detailed breakdown of how to generate the graph which you can find on my channel. All the code is available as well.
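As a rough illustration of "passing the schema of the graph to the LLM alongside the user's query": the prompt for the text-to-Cypher step can simply include a textual description of the node labels and relationship types. The schema string and wording below are made up for illustration; the demo's actual prompt may differ.

```python
# Hypothetical sketch: a text-to-Cypher prompt that includes the graph
# schema, so the model knows the real labels and relationship types.

# Example schema description (in practice this can be fetched from the DB).
GRAPH_SCHEMA = (
    "Node labels: Person, Project, Technology\n"
    "Relationships: (Person)-[:WORKS_ON]->(Project), "
    "(Project)-[:USES]->(Technology)"
)

def cypher_prompt(question: str) -> str:
    """Build a prompt asking the model to translate a question into Cypher."""
    return (
        "You translate natural-language questions into Cypher queries.\n"
        f"Graph schema:\n{GRAPH_SCHEMA}\n"
        "Return only the Cypher query, nothing else.\n"
        f"Question: {question}"
    )

print(cypher_prompt("Which projects use Azure?"))
```

Without the schema in the prompt, the model has to guess label and relationship names, which is the usual source of invalid queries.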
@Jeremy-bd2yx
6 months ago
@@johannesjolkkonen thank you! watching now!
What's the added value for a company in using a tool like this? Is it to save time? What's their ROI if they invest in such a solution? Thank you for the video and the great work ;) It would be awesome if you also talked about the business side of this, thanks.
@johannesjolkkonen
5 months ago
Thank you! I'm sure I'll be talking more about some concrete business cases around this in the future 🙂
Why not use both, KGs together with vector embeddings?
More info please.
How can that generate useful relationship triples when you can only give small subsets of the data to the LLM at a time?
@johannesjolkkonen
5 months ago
Hey, good question. Two points:
- We can add nodes and relationships to the graph incrementally, so we don't need to identify all the relationships at once.
- The subsets can also be quite large: with the 16k-32k context window models, that would be ~15-30 pages of content at a time.
And so while there can be some relationships that only become apparent when looking at the "full picture" of all the data, I think most of the relationships can be identified within the subsets, in isolation. For example, if a paragraph mentions that some technologies were used for one project, that's all we need to know about those tech->project relationships. Then if we find more relationships or attributes for that project or those technologies later in the data, we can just add them to the graph. This can differ case by case, of course 🙂
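The incremental aspect is what Cypher's MERGE clause is for: re-running the same statement only creates what is missing, so triples extracted from later chunks just extend the graph. Below is a hedged sketch of building such statements; the generic `Entity` label and the example triples are assumptions, not the demo's code, and in real use node names should go in as query parameters (only the relationship type, which Cypher cannot parameterize, is interpolated).

```python
# Hypothetical sketch: turn extracted (subject, relation, object) triples
# into idempotent Cypher MERGE statements for incremental graph building.

def merge_statement(rel: str) -> str:
    """Build a MERGE statement; node names are supplied as parameters."""
    # Relationship types cannot be parameterized in Cypher, so the type
    # is interpolated into the query text.
    return (
        "MERGE (a:Entity {name: $subj}) "
        "MERGE (b:Entity {name: $obj}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )

# Example triples, as they might come out of an extraction step.
triples = [
    ("BillingService", "USES", "Python"),
    ("Alice", "WORKS_ON", "BillingService"),
]

for subj, rel, obj in triples:
    query, params = merge_statement(rel), {"subj": subj, "obj": obj}
    # session.run(query, params)  # would execute against a live Neo4j session
    print(query, params)
```

Because MERGE matches before it creates, processing a later chunk that mentions `BillingService` again attaches new relationships to the existing node instead of duplicating it (assuming the names were consolidated first).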
Please teach us how to do it
Hey, great video! Do you have the code in a repo?
@johannesjolkkonen
2 months ago
Thanks! Yes I do, you can find a more detailed tutorial on my channel which also has the link to the repo (:
Will you be able to share the prompts and code snippets?
@johannesjolkkonen
6 months ago
The repo is still a work in progress, but I'm planning to make a video soon where I share and walk through the code in more detail!
@mtprovasti
6 months ago
Awesome, I was thinking of applying something like this to a plain traditional hierarchical taxonomy. Looking forward to it.
@shaunjohann
6 months ago
@@johannesjolkkonen that's great to hear! I'm working on a project that needed to hear some of what you said
@johannesjolkkonen
6 months ago
A full video-walkthrough is now live here: kzread.info/dash/bejne/ppd8q6Z8d9icido.html Repository link included (:
I think there are learners who find music essential for concentration and understanding and would go as far as advocating for music in classrooms. But there are others who find the background music being noise and therefore distracting and annoying. I am assuming you listened to the video after adding the music and found it better with the background music than without. To cater for both groups of learners, perhaps you could upload two versions of your videos, one version without the addition of the music and the other with the music. You may include a label such as "without music" and "with music" respectively.
I really don't see how this is any different from a typical database with more columns. For example:
Sort by company
Look up Azure
Next, sort by number of projects
Look up employee
Please do not use music when creating future videos.
@johannesjolkkonen
7 months ago
Hey, thanks for the feedback. I'll keep that in mind!
@UlrikStreetPoulsen
6 months ago
Agreed, that's really off-putting
@infinit854
6 months ago
I enjoyed the music 👍
@NLPprompter
6 months ago
Agreed, but you could use music during the pauses, not while you're talking.
@itslordquas
6 months ago
bro what about a "thank you for the amazing info" before nitpicking? 😂
Presentation about nothing. How to build it is what's required.
@johannesjolkkonen
3 months ago
Hey, I also have a full tutorial on this here: kzread.info/dash/bejne/ppd8q6Z8d9icido.html&lc=UgyOfLtgIOQyEu2zmMF4AaABAg 🙂
Yes the background music is distracting and annoying.
excellent video - but the music ...... please no.........