Advanced RAG with Knowledge Graphs (Neo4J demo)

I recently built a demo for some prospective clients, showing how to use Large Language Models (LLMs) together with graph databases like Neo4j.
The two interact in interesting ways. Most notably, you can now create knowledge graphs more easily than ever before by having an LLM find the entities and relationships in your unstructured data, rather than doing all of that manually.
On top of that, graph databases have some advantages for Retrieval Augmented Generation (RAG) applications over vector search, which is currently the prevailing approach to RAG.
Connect with me on LinkedIn: / johannesjolkkonen
▬▬▬▬▬▬ T I M E S T A M P S ▬▬▬▬▬▬
0:00 - Intro
2:16 - Demo starts
2:55 - Creating graph from unstructured data
4:23 - Chatting with the knowledge graph
5:55 - Advantages of Graphs vs Vector Search

Comments: 77

  • @johannesjolkkonen
    @johannesjolkkonen · 6 months ago

    Hey everybody, thanks for the great comments! Finally got around to making a more detailed tutorial for this demo, with code available on GitHub. You can check it out here: kzread.info/dash/bejne/ppd8q6Z8d9icido.html

  • @w_chadly
    @w_chadly · 5 months ago

    this is incredible! I see so many use cases opening up. thank you for sharing this!

  • @SafetyLabsInc_ca
    @SafetyLabsInc_ca · 6 months ago

    This is a great video. It clearly explained to me the difference between vector databases and graph databases, and the new features we can build using graph databases. Thank you.

  • @alchemication
    @alchemication · 6 months ago

    Super nice food for thought. Thanks for sharing an alternative. Would love a deeper dive with some clear examples confirming the 3 advantages 😊 But might experiment myself for fun too!

  • @agentDueDiligence
    @agentDueDiligence · 3 months ago

    This concept & this video are truly amazing. I have a specific idea how to apply this - i think this might change my whole project & I will explore this graph based approach!! Great work - thank you.

  • @jonathancooper7068
    @jonathancooper7068 · 3 months ago

    Very nice demo. It showed why and how to use the graph database for RAG and answered questions that I came up with while watching.

  • @michaeldoyle4222
    @michaeldoyle4222 · 3 months ago

    great content and delivery - love your work

  • @itsdavidmora
    @itsdavidmora · 19 days ago

    Really neat demo! I think this works so well because graphs help LLMs approximate the sort of clear relationships humans have in their brain about the world.

  • @WisherTheKing
    @WisherTheKing · 6 months ago

    Great video! I wanted to explore the graph dbs exactly for this use case. Imagine also adding work pieces to this. Jiras, code reviews, comments, etc. P.S. the music is great 😂

  • @chrisogonas
    @chrisogonas · a month ago

    Well illustrated! Thanks

  • @NLPprompter
    @NLPprompter · 6 months ago

    I love it when data engineers make videos, it's so easy to understand. Side note: even the description is structured 👍

  • @MuhammedBasil
    @MuhammedBasil · 5 months ago

    Wow. This is amazing

  • @AssassinUK
    @AssassinUK · 6 months ago

    I had to subscribe based on this idea alone! I'm trying to think of another way I could implement this with standard RAG for those that use LangChain/Flowise, and Mermaid code to hold the node information.

  • @AEVMU
    @AEVMU · 3 months ago

    Gotta look at decentralized knowledge graphs. Those are the future of RAG databases.

  • @engage-meta
    @engage-meta · 6 months ago

    Good presentation. Thank you!

  • @chenzhong1182
    @chenzhong1182 · 6 months ago

    That's exactly what I am looking for! Apart from the tutorials, are you also considering starting a Discord channel where people can chat? I think there is growing interest in KG + LLMs but nowhere to discuss it.

  • @antoninleroy3863
    @antoninleroy3863 · 6 months ago

    very interesting thanks !

  • @mostlazydisciplinedperson
    @mostlazydisciplinedperson · 6 months ago

    thank you for the video

  • @evetsnilrac9689
    @evetsnilrac9689 · 6 months ago

    Excellent job Johannes! After watching the video "Knowledge Graph Construction Demo from raw text using an LLM" by Neo4j, I came across your video and found that you addressed the crucially important question some of us are thinking about: "How can we improve the way we do RAG?" I agree with your assessment that using KGs provides very significant benefits that would compel us to want to use this approach vs using vector embeddings. However, am I correct in understanding that we need better workflows / pipelines to get all the kinds of data we need to work with into a KG to take more advantage of these benefits? Sounds like you may have listened to Denny Vrandecic discuss "The Future of Knowledge Graphs in a World of Large Language Models".

  • @johannesjolkkonen

    @johannesjolkkonen

    6 months ago

    Hey Steve, thank you! You are correct, using a KG will almost certainly involve more pre-processing/workflows compared to just having an unstructured text/vector database. LLMs can be very useful in the process of extracting entities and relationships for your graph, but it's still a serious undertaking, with a lot of quality checks needed to make it production-ready. It's all still pretty experimental and niche, but I think this approach will become increasingly mainstream over the next 1-2 years. I haven't checked out Denny's video, but I definitely will now! I can also recommend going through the content that the Neo4J team has been creating around LLMs.

  • @evetsnilrac9689

    @evetsnilrac9689

    6 months ago

    @@johannesjolkkonen Here's my summary of the key points of Denny's presentation.
    • LLMs are expensive to train.
    • LLMs are expensive to run at inference time.
    • LLMs can't be trusted to output accurate facts.
      o Answers are just guesses based on stochastic probability, even if the model has inferred a different answer in a different language. That is, it does not "know" what it "knows" because it does not maintain a list of all the things it knows; it just generates outputs at inference time.
    • Knowledge in ChatGPT seems not to be stored in a language-independent way, but within each individual language.
    • They are not very good at math, and it would be economically inappropriate to use them for math computation.
    • Autoregressive Transformer models such as ChatGPT are supposed to be Turing complete, but they are a very expensive reiteration of Turing's tarpit. You could do everything with them, but it doesn't mean you should.
    • It is economically inappropriate to attempt to improve an LLM's ability to internalize knowledge (know what it knows) because it will always be cheaper, faster, and more accurate(?) to externalize it in a graph store and look it up when needed. In a world where language models can generate infinite content, "knowledge" (vs content) becomes valuable.
    • We don't want to machine learn Obama's place of birth every time we need it.
    • We want to store it once and for all, and that's what knowledge graphs are good for: to keep your valuable knowledge safe. The knowledge graph provides you with the ground truth for your LLMs.
    • LLMs are probably the best tool for knowledge extraction we have seen developed in a decade or two.
    • They can be an amazing tool to speed up the creation of a knowledge graph.
    • We want to extract knowledge into a symbolic form. We want the system to overfit for truth.
    • And this is why it makes so much sense to store the knowledge in a symbolic system that can be edited, audited, curated, and understood, where we can cover the long tail by simply adding new nodes to the knowledge graph that can be looked up, instead of systems that need to be trained to return knowledge with a certain probability and that may make stuff up on the fly.

  • @wdonno

    @wdonno

    6 months ago

    @@evetsnilrac9689, such a helpful summary! Thank you!

  • @quansun8245
    @quansun8245 · 6 months ago

    A really awesome video Johannes, wondering if there is a GitHub repo for this? Thanks.

  • @Epistemophilos
    @Epistemophilos · 5 months ago

    Fabulous video, thanks! Would be even better with no music, or at least if it was very much lower volume :)

  • @_jen_z_
    @_jen_z_ · 6 months ago

    Thanks for sharing! Can you also share how you are dealing with the consolidation of output nodes? Some project descriptions might generate "Graph Neural Nets", others "Graph Neural Network" or "GNN".

  • @johannesjolkkonen

    @johannesjolkkonen

    6 months ago

    Hey Djan! Consolidation/entity resolution is definitely one of the most interesting challenges with these kinds of applications, but in this demo there's nothing implemented for that yet
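
The demo has no entity resolution, as the reply says. Purely as an illustration of the simplest possible approach to the "Graph Neural Nets" / "GNN" problem, a hand-maintained alias table could look like the sketch below. The alias table and the `canonical_name` helper are assumptions made up for this example; real entity resolution usually needs more than a lookup (fuzzy matching, embeddings, human review).

```python
import re

# Hypothetical alias table: lowercase variant -> canonical node name.
ALIASES = {
    "gnn": "Graph Neural Network",
    "graph neural net": "Graph Neural Network",
    "graph neural nets": "Graph Neural Network",
    "graph neural networks": "Graph Neural Network",
}


def canonical_name(raw: str) -> str:
    """Map a raw entity name to its canonical form, if an alias is known."""
    cleaned = raw.strip()
    # Normalise case and collapse repeated whitespace before the lookup.
    key = re.sub(r"\s+", " ", cleaned.lower())
    return ALIASES.get(key, cleaned)


names = [canonical_name(n) for n in ["GNN", "Graph Neural Nets", "Graph Neural Network"]]
```

Running every extracted name through such a function before writing to the graph means the three variants end up as one node instead of three.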

  • @pierrebonnet2026
    @pierrebonnet2026 · 6 months ago

    Nice!

  • @infinit854
    @infinit854 · 6 months ago

    How does the chat interface communicate with the database? Is it based on prompts that create Cypher queries?

  • @AdamLorentzen
    @AdamLorentzen · 6 months ago

    I'm working on something similar, but you make it look easy! Would love to chat and see if we could collaborate on something to get in front of clients :)

  • @inflationking1271
    @inflationking1271 · 5 months ago

    I would be curious about your view on when vector search is better suited than graph search for RAG. Thanks for this great video! It helps a lot

  • @johannesjolkkonen

    @johannesjolkkonen

    5 months ago

    Thank you! Vector search is still great for a lot of situations, when answers can be found directly in the unstructured text. Where graphs (or really any other more "structured" databases) start to shine is when you need to understand concepts and their relationships beyond what's explicitly said in the text. But this is a lot more demanding too, and often not necessary. Also the two aren't mutually exclusive, with neo4j (and recently AWS Neptune, another graph db) supporting vector search to also search nodes by their similarity. This combination is super exciting!

  • @bartoszko4028
    @bartoszko4028 · 27 days ago

    Is it better in some way than using SQL db and relations based on for example sql schemas etc. which also can be easily used when doing retrieval?

  • @tomgiannulli911
    @tomgiannulli911 · 5 months ago

    Did you use attributes to add more characteristics to the nodes and edges, for example to score the strength of a relationship? I have tried asking LLMs to create graphs from their native knowledge using various prompts, and they do poorly. That's interesting: does it indicate a lack of understanding of relationships, or is it more of a fine-tuning issue? What do you think?

  • @johannesjolkkonen

    @johannesjolkkonen

    5 months ago

    Hey, I haven't added such metadata but that's a great idea! For your problem, I'd say the most important thing is to make sure you tell the LLM what kinds of entities and relationships you are looking for. In other words, you should have a pre-defined schema in mind for your graph. Some pre-processing might also be useful if your data is very messy.
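
To make the "pre-defined schema" advice concrete, here is a minimal sketch of an extraction prompt that names the allowed entity and relationship types, plus a filter that drops anything outside them. The type lists, prompt wording and helper names are hypothetical, chosen only for illustration; they are not taken from the demo.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (source, relationship type, target)

# Hypothetical schema for illustration; a real project would define its own.
ENTITY_TYPES = ["Person", "Project", "Technology"]
RELATION_TYPES = ["WORKED_ON", "USES_TECH"]


def build_extraction_prompt(text: str) -> str:
    """Tell the LLM exactly which entity and relationship types to look for."""
    return (
        "Extract entities and relationships from the text below.\n"
        f"Allowed entity types: {', '.join(ENTITY_TYPES)}\n"
        f"Allowed relationship types: {', '.join(RELATION_TYPES)}\n"
        "Ignore anything that does not fit these types.\n\n"
        f"Text:\n{text}"
    )


def keep_schema_relations(triples: List[Triple]) -> List[Triple]:
    """Drop extracted triples whose relationship type is not in the schema."""
    return [t for t in triples if t[1] in RELATION_TYPES]


# Pretend this came back from the LLM; the second triple is off-schema.
extracted = [("Alice", "WORKED_ON", "Project A"), ("Alice", "LIKES", "Coffee")]
valid = keep_schema_relations(extracted)
```

The filter is a cheap safety net: even with a clear prompt, a model can return relationship types that were never asked for.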

  • @jingqiwu2865
    @jingqiwu2865 · 5 months ago

    very nice and inspiring. QQ: if gpt4 created incorrect Cypher, do we try to detect and auto fix/retry?

  • @johannesjolkkonen

    @johannesjolkkonen

    5 months ago

    Thank you! You can see the details in my latest video, but in this setup we aren't doing that. That's definitely one of the best and simplest ways this could be improved.
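
The demo does not retry failed queries. As a sketch of what a detect-and-retry loop might look like, the function below regenerates the query and feeds the previous error back in. Everything here is an assumption for illustration: `generate_cypher` and `run_query` stand in for an LLM call and a database call, and the two `fake_*` functions are stubs so the example runs without either.

```python
from typing import Callable, List, Optional


def query_with_retry(
    question: str,
    generate_cypher: Callable[[str, Optional[str]], str],
    run_query: Callable[[str], List[dict]],
    max_attempts: int = 3,
) -> List[dict]:
    """Generate a query, run it, and retry with the error message on failure."""
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        cypher = generate_cypher(question, last_error)
        try:
            return run_query(cypher)
        except Exception as exc:
            # Keep the error so the next generation attempt can see it.
            last_error = str(exc)
    raise RuntimeError(f"No valid query after {max_attempts} attempts: {last_error}")


# Stubs: the first generated query is broken, the second one is fixed.
def fake_generate(question: str, last_error: Optional[str]) -> str:
    if last_error:
        return "MATCH (p:Project) RETURN p.name AS name"
    return "MATCH (p:Project RETURN p"


def fake_run(cypher: str) -> List[dict]:
    if "(p:Project)" not in cypher:
        raise ValueError("syntax error")
    return [{"name": "Project A"}]


rows = query_with_retry("Which projects exist?", fake_generate, fake_run)
```

Capping the number of attempts matters, since each retry is another LLM call and a model can keep producing the same broken query.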

  • @phmfthacim
    @phmfthacim · 23 days ago

    I like the music

  • @sakinamosavi1104
    @sakinamosavi1104 · 6 months ago

    I am very excited to see how your code works. Please share your solution.

  • @Epistemophilos
    @Epistemophilos · 5 months ago

    Around 5:45, how does the LLM combine the graph search with "normal" LLM generation? What happens behind the scenes?

  • @johannesjolkkonen

    @johannesjolkkonen

    5 months ago

    Hey! I show that part in detail in my latest video, here: kzread.info/dash/bejne/faCVk8WYoJjcYNo.html

  • @Jeremy-bd2yx
    @Jeremy-bd2yx · 6 months ago

    When the text-to-Cypher conversion happens, how does the LLM know how the nodes/edges are labeled, so that it can write the query accurately?

  • @johannesjolkkonen

    @johannesjolkkonen

    6 months ago

    Hey Jeremy! If you are referring to the chat interaction, we pass the schema of the graph onto the LLM, alongside the user's query. For other questions, I just released a detailed breakdown of how to generate the graph which you can find on my channel. All the code is available as well.
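
As a rough illustration of the reply above, passing the graph schema to the LLM alongside the user's question could look something like this. This is a sketch, not the demo's actual code: the function name `build_cypher_prompt`, the prompt wording and the example schema are all assumptions made up for illustration.

```python
def build_cypher_prompt(schema: str, question: str) -> str:
    """Combine the graph schema and the user's question into one prompt."""
    return (
        "You translate questions into Cypher queries for a Neo4j graph.\n"
        "Use only the node labels, relationship types and properties "
        "listed in the schema.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Question: {question}\n"
        "Cypher:"
    )


# Hypothetical schema, written by hand for illustration.
EXAMPLE_SCHEMA = (
    "(:Person {name})-[:WORKED_ON]->(:Project {name})\n"
    "(:Project {name})-[:USES_TECH]->(:Technology {name})"
)

prompt = build_cypher_prompt(
    EXAMPLE_SCHEMA, "Who has worked on projects that use Azure?"
)
```

Because the labels and relationship types are spelled out in the prompt, the model does not have to guess whether the edge is called `WORKED_ON` or something else.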

  • @Jeremy-bd2yx

    @Jeremy-bd2yx

    6 months ago

    @@johannesjolkkonen thank you! watching now!

  • @MrDonald911
    @MrDonald911 · 5 months ago

    What's the added value for a company to use a tool like that? Is it to save time? What's their ROI if they invest in such a solution? Thank you for the video and the great work ;) It would be awesome if you also talked about the business side of this, thanks

  • @johannesjolkkonen

    @johannesjolkkonen

    5 months ago

    Thank you! I'm sure I'll be talking more about some concrete business cases around this in the future 🙂

  • @thehappycookiehour
    @thehappycookiehour · 3 months ago

    Why not use both, KGs together with vector embeddings?

  • @SDGwynn
    @SDGwynn · 6 months ago

    More info please.

  • @98f5
    @98f5 · 5 months ago

    How can that generate useful relationship triples when you can only give small subsets of the data to the LLM at a time?

  • @johannesjolkkonen

    @johannesjolkkonen

    5 months ago

    Hey, good question. Two points:
    - We can add nodes and relationships to the graph incrementally, so we don't need to identify all the relationships at once.
    - The subsets can also be quite large. With the 16k-32k context window models, that would be ~15-30 pages of content at a time.
    So while there can be some cases where relationships only become apparent when looking at the "full picture" of all the data, I think most of the relationships can be identified within the subsets, in isolation. For example, if a paragraph mentions that some technologies were used for one project, that's all we need to know about these tech->project relationships. Then if we find more relationships or attributes for that project or those technologies later in the data, we can just add them to the graph. This can be different case-by-case, of course 🙂
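
The incremental idea in this reply can be sketched with a small in-memory graph: each chunk contributes its own triples, and a relationship seen again in a later chunk does not create a duplicate. The `IncrementalGraph` class and the sample triples are made up for illustration and are not the demo's code. In Neo4j, Cypher's `MERGE` clause (match the pattern, or create it if missing) plays a comparable role.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (source node, relationship type, target node)


class IncrementalGraph:
    """Tiny in-memory stand-in for a graph database."""

    def __init__(self) -> None:
        self.nodes: Set[str] = set()
        self.edges: Set[Triple] = set()

    def add_triples(self, triples: Iterable[Triple]) -> None:
        """Add triples from one chunk; sets make repeated facts harmless."""
        for source, relation, target in triples:
            self.nodes.update((source, target))
            self.edges.add((source, relation, target))


# Pretend each list is what the LLM extracted from one chunk of text.
chunk_1 = [("Project A", "USES_TECH", "Azure")]
chunk_2 = [
    ("Alice", "WORKED_ON", "Project A"),
    ("Project A", "USES_TECH", "Azure"),  # already known from chunk 1
]

graph = IncrementalGraph()
for chunk_triples in (chunk_1, chunk_2):
    graph.add_triples(chunk_triples)
```

After both chunks the graph holds three nodes and two edges, even though the Azure relationship was extracted twice.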

  • @alinakhaee4935
    @alinakhaee4935 · 6 months ago

    Please teach us how to do it

  • @CreativityCourse
    @CreativityCourse · 2 months ago

    Hey, great video, do you have the code in a repo?

  • @johannesjolkkonen

    @johannesjolkkonen

    2 months ago

    Thanks! Yes I do, you can find a more detailed tutorial on my channel which also has the link to the repo (:

  • @zaursamedov8906
    @zaursamedov8906 · 6 months ago

    will u be able to share the prompts and code snippets?

  • @johannesjolkkonen

    @johannesjolkkonen

    6 months ago

    The repo is still a work in progress, but I'm planning to make a video soon where I share and walk through the code in more detail!

  • @mtprovasti

    @mtprovasti

    6 months ago

    Awesome, I was thinking of applying something like this to a plain, traditional hierarchical taxonomy. Looking forward to it.

  • @shaunjohann

    @shaunjohann

    6 months ago

    @@johannesjolkkonen that's great to hear! i'm working on a project that needed to hear some of what you said

  • @johannesjolkkonen

    @johannesjolkkonen

    6 months ago

    A full video-walkthrough is now live here: kzread.info/dash/bejne/ppd8q6Z8d9icido.html Repository link included (:

  • @openyard
    @openyard · 5 days ago

    I think there are learners who find music essential for concentration and understanding and would go as far as advocating for music in classrooms. But there are others who find the background music being noise and therefore distracting and annoying. I am assuming you listened to the video after adding the music and found it better with the background music than without. To cater for both groups of learners, perhaps you could upload two versions of your videos, one version without the addition of the music and the other with the music. You may include a label such as "without music" and "with music" respectively.

  • @Noneofyourbusiness2000
    @Noneofyourbusiness2000 · 18 days ago

    I really don't see how this is any different than a typical database with more columns. For example: sort by company, look up Azure, then sort by number of projects and look up the employee.

  • @labloke5020
    @labloke5020 · 7 months ago

    Please do not use music when creating future videos.

  • @johannesjolkkonen

    @johannesjolkkonen

    7 months ago

    Hey, thanks for the feedback. I'll keep that in mind!

  • @UlrikStreetPoulsen

    @UlrikStreetPoulsen

    6 months ago

    Agreed, that's really off-putting

  • @infinit854

    @infinit854

    6 months ago

    I enjoyed the music 👍

  • @NLPprompter

    @NLPprompter

    6 months ago

    Agree, but you can use music in the pauses in between, just not when you're talking.

  • @itslordquas

    @itslordquas

    6 months ago

    bro what about a "thank you for the amazing info" before nitpicking? 😂

  • @podunkman2709
    @podunkman2709 · 3 months ago

    Presentation about nothing. How to build that required

  • @johannesjolkkonen

    @johannesjolkkonen

    3 months ago

    Hey, I also have a full tutorial on this here: kzread.info/dash/bejne/ppd8q6Z8d9icido.html&lc=UgyOfLtgIOQyEu2zmMF4AaABAg 🙂

  • @openyard
    @openyard · 5 days ago

    Yes the background music is distracting and annoying.

  • @mcpduk
    @mcpduk · 3 months ago

    excellent video - but the music ...... please no.........