Lessons Learned on LLM RAG Solutions

Science & Technology

We’re going to do a technical deep dive into Retrieval Augmented Generation, or RAG, one of the most popular Generative AI projects. There is a ton of content about RAG applications with LLMs, but very little addresses the challenges associated with building practical applications. Today you’re going to get the inside scoop from some engineers with that experience.
LLMs can be used to convert documents like emails or contracts into sets of vectors called embeddings. Embeddings make it possible to find passages of text that are similar in meaning. The most common business applications are semantic search (searching by meaning rather than by keywords) and document Q&A. Each step presents unique challenges, and we're going to address them today.
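The chunk-embed-retrieve pipeline described above can be sketched roughly as follows. This is a toy illustration: the `embed` function below is a bag-of-words stand-in for a real embedding model, and the chunker splits naively on word count.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc: str, size: int = 6) -> list[str]:
    # Naive fixed-size chunking by word count.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and return the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

policy = "Dogs are not allowed in the office except registered service animals."
chunks = chunk(policy, size=6)
print(retrieve("are dogs allowed at work", chunks, k=1))
```

In a real application the retrieved chunks would then be injected into the LLM prompt as context for answer generation.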

Comments: 20

  • @humbertomejia3374
    7 months ago

    🎯 Key Takeaways for quick navigation:
    00:02 🧭 *RAG Understanding and Challenges* - RAG, or Retrieval Augmented Generation, is a popular Generative AI project facing challenges like echo and the need for context in responses.
    02:10 🔄 *RAG in Action and Improvement* - Demonstrates RAG with an example of an employee querying about bringing a dog to work, highlighting improved responses with retrieved policies.
    03:05 📚 *Automatic Retrieval Application* - Explains the application process of automatically retrieving relevant information for RAG, covering obtaining documents, chunking, embedding, and generating responses.
    04:29 🛠️ *RAG as a Customization Tool* - Discusses RAG as a practical way to customize language models, emphasizing its use cases and significance in adapting to diverse datasets.
    07:39 📄 *Parsing Documents in Real-world Applications* - Emphasizes the need to parse various document types, discussing challenges with tables and messy real-world data.
    09:44 🧩 *Importance of Document Hierarchy* - Explores maintaining document hierarchy for meaningful embeddings, stressing the need for a flat representation.
    15:26 🕵️ *Ensuring Relevant Retrieval* - Emphasizes the retrieval step's importance in RAG applications and discusses the impact of incorrect retrieval on response accuracy.
    18:54 🎯 *Investment Priority: Retrieval Over Generation* - Advocates investing time in perfecting the retrieval step, acknowledging the complexities beyond choosing the right embedding model.
    21:23 🤷‍♂️ *Challenges in Evaluating RAG Applications* - Explores difficulties in evaluating RAG applications due to diverse implementation methods and emphasizes the need for comprehensive evaluation metrics.
    21:57 📚 *Evaluation of RAG Applications* - Evaluation involves assessing faithfulness, measuring alignment with evidence, and avoiding hallucinations. Challenges lie in nuanced evaluation given diverse user queries.
    27:13 🤖 *Challenges in Evaluating RAG Applications* - Multiple choice evaluations simplify the process but may introduce biases. The variation in user queries requires adaptive systems, highlighting the ambiguity in assessing intelligent systems.
    28:17 🚀 *Techniques for Improving RAG Performance* - Enhancing search capabilities involves using embeddings, metadata, rules, or heuristics. Summarization during retrieval, diversifying queries, and addressing varied inputs improve efficacy.
    32:27 🔄 *Fine-tuning and Summarization in RAG* - Fine-tuning components like the embedding model or using adapters tailors RAG for different applications. Summarization techniques enhance summaries by coalescing information into fewer sentences, emphasizing the need for specific directions in summarization requests.
    Made with HARPA AI

  • @sprobertson
    8 months ago

    Very refreshing to see something about RAG that goes beyond surface level

  • @prolegoinc

    8 months ago

    Justin will be sharing some very good advice at our next live event this Thursday: kzread.infoY_Nr9-IWF8o?si=s8G_FR1MJ_ejdd56 If you are interested in RAG, you won't be disappointed.

  • @joeaccent2027
    @joeaccent20277 ай бұрын

    Thank you. More useful than most conversations on the topic. Heuristics are clearly still a major part of this next wave of AI.

  • @alexmolyneux816
    5 months ago

    Just to mention: the summarisation technique they describe at the end is 'Chain of Density', which iteratively makes the summary denser and denser.
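The Chain of Density idea can be sketched as a sequence of prompts, each asking the model to rewrite the previous summary at the same length while folding in entities it missed. The prompt wording below is an illustrative paraphrase, not the exact prompt from the paper, and each prompt would be sent to an LLM in turn with the previous answer fed back in.

```python
def chain_of_density_prompts(article: str, rounds: int = 3) -> list[str]:
    """Build the prompt sequence for an iterative densification loop."""
    prompts = [f"Summarize the following article in 4-5 sentences:\n\n{article}"]
    for _ in range(rounds - 1):
        prompts.append(
            "Rewrite the previous summary at the same length, but fold in "
            "1-3 specific entities from the article that it missed. "
            "Do not drop any entities already mentioned."
        )
    return prompts

for p in chain_of_density_prompts("...article text...", rounds=3):
    print(p[:60])
```

The key constraint is that the summary length stays fixed while the entity count grows, which is what forces the increasing density.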

  • @prolegoinc
    8 months ago

    My first YouTube Live! The beginning was a bit rough because I started hearing my own voice in the background. It turns out I had another Chrome tab with this page open, and it started playing automatically in the background. I paused because I couldn't figure out what was happening. Lesson learned for next time: close your other tabs.

  • @IamMarcusTurner

    8 months ago

    For your first, really good job. If you do more, it's always good to have someone else listen to the stream to confirm the audio for you if that's a concern. Hard to do all that solo.

  • @prolegoinc

    8 months ago

    Thanks. Good idea, I'll do that next time.

  • @sprobertson

    8 months ago

    Haha that's the worst feeling, quick recovery though

  • @Shishiranshoku
    2 months ago

    I was wondering if you have some suggestions on optimizing the documentation being used for RAG. We're using RAG linked to our Notion 'wiki', and I want to implement guidelines for the info being added, to ensure it is 'AI friendly'.

  • @AlonAvramson
    4 months ago

    Thank you! Very practical and up-to-date discussion.

  • @paultruax9664
    3 months ago

    Great information! Thank you guys.

  • @rickrischter9631
    8 months ago

    Fantastic presentation! A question directed at Justin: when executing multiple queries that have slight variations, what method do you employ to aggregate or coalesce the responses into a unified result? Do you use an LLM to serve as a judge for this aggregation?

  • @justinpounders

    8 months ago

    Glad you enjoyed the discussion! Yes, when you generate multiple variations of the question you can use an LLM to summarize the responses into the "final answer." Another, faster approach I've been experimenting with recently is to have an LLM "agent" look at the retrieved documents and decide if it has enough information to respond. If so, then great. If not, then it can ask follow-up questions until it is able to answer. This usually saves quite a few LLM calls.
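The agent loop described here might look roughly like the sketch below. The `llm` and `search` parameters are injectable stand-ins (hypothetical, not from any specific library) for a real model call and retriever; the `llm` is assumed to return either an answer or a follow-up query.

```python
def agentic_rag(question, llm, search, max_hops=3):
    """Retrieve, then let the model either answer or issue a follow-up query."""
    context = search(question)
    for _ in range(max_hops):
        # The model returns {"answer": ...} when context suffices,
        # or {"query": ...} to request another retrieval hop.
        decision = llm(question, context)
        if "answer" in decision:
            return decision["answer"]
        context += search(decision["query"])  # follow-up retrieval
    return "I don't have enough information to answer."

# Toy stubs just to show the control flow:
def fake_search(q):
    return ["pet policy: service dogs allowed"] if "policy" in q else []

def fake_llm(q, ctx):
    return {"answer": "Service dogs are allowed."} if ctx else {"query": "pet policy"}

print(agentic_rag("Can I bring my dog?", fake_llm, fake_search))
```

Capping the loop with `max_hops` is what bounds the number of LLM calls, which is where the savings over summarizing many query variants would come from.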

  • @rickrischter9631

    8 months ago

    @@justinpounders I see, thanks. Regarding this new approach you're testing: does this other agent formulate new queries for the similarity search?

  • @justinpounders

    8 months ago

    @@rickrischter9631 Yep. The original query plus the search results go to an LLM that can either respond to the user or run a new query through the search function.

  • @humbledev-mp4zz
    4 months ago

    Great information, some useful takeaways, thanks. Have you experimented much with hybrid retrieval (vector search + keyword search) to retrieve accurate chunks?
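One common way to implement the hybrid retrieval asked about here is a weighted sum of the two scores (reciprocal rank fusion is another popular option). A minimal weighted-sum sketch, with bag-of-words vectors standing in for real embeddings:

```python
import math
from collections import Counter

def vec_score(query: str, doc: str) -> float:
    # Cosine similarity over bag-of-words vectors (embedding stand-in).
    a, b = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query: str, doc: str) -> float:
    # Fraction of query terms that appear exactly in the document.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_rank(query: str, docs: list[str], alpha: float = 0.5) -> list[str]:
    # alpha blends semantic similarity with exact keyword overlap.
    score = lambda d: alpha * vec_score(query, d) + (1 - alpha) * keyword_score(query, d)
    return sorted(docs, key=score, reverse=True)

docs = [
    "refund policy for damaged items",
    "shipping times by region",
    "refund timelines and exceptions",
]
print(hybrid_rank("refund policy", docs)[0])
```

In production the keyword side is usually BM25 and the vector side a real embedding index; the blending logic stays the same.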

  • @carvalhoribeiro
    2 months ago

    Great discussion. Thanks for sharing this

  • @LandingBusiness
    7 months ago

    Please link the video mentioned in the description and tag me when you get a chance. I'm just learning the RAG aspect but have already envisioned the application case I'd like to focus on. Thank you very much for the informative discussion!

  • @prolegoinc

    7 months ago

    We do a deep dive in this live session: kzread.infoY_Nr9-IWF8o?feature=share The "Intro to RAG" video will be released next Wednesday along with the source code. You can get notified via email here: www.prolego.com/newsletter
