Greg Kamradt (Data Indy)

Learning AI & Data One Line Of Code At A Time

Early Signals (AI Business Ideas) Newsletter: earlysignal.ai/
Twitter: twitter.com/GregKamradt
Contact: Twitter DM or [email protected]

I react to OpenAI DevDay

Comments

  • @andrejss · 18 hours ago

    Thank you! Amazing!

  • @christosmelissourgos2757 · 1 day ago

    Great stuff!

  • @shashankhegde8365 · 1 day ago

    Hey! I really liked your video. I am getting these errors when I run the STT script: "WebSocketException in LiveClient.start: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1002)" and "Could not open socket: 'LiveClient' object has no attribute '_keep_alive_thread'"
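
The SSL error above usually means the local certificate store is stale. A minimal sketch of one common fix, assuming the `certifi` package is installed (its up-to-date CA bundle replaces the system store that raised "certificate has expired"):

```python
# Hedged sketch: build an SSL context backed by certifi's CA bundle instead of
# the system certificate store. Assumes `pip install certifi` has been run.
import ssl
import certifi

ctx = ssl.create_default_context(cafile=certifi.where())

# Many websocket/HTTP clients accept a context like this, e.g. (illustrative):
# websocket.create_connection(url, sslopt={"context": ctx})
```

If the client library doesn't expose an SSL context parameter, upgrading `certifi` itself (`pip install -U certifi`) often resolves the expired-certificate error on its own.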

  • @jsnmad · 1 day ago

    16:48 Crazy levels here. As a database developer, this is amazing.

  • @vk2875 · 1 day ago

    Amazing tutorial on this subject. Really appreciate your passion for detailing it in such depth. Thank you!!!

  • @adityasankhla1433 · 1 day ago

    With the continuous influx of short form content, props to you for making this so interesting to watch. Didn't even realise it was an hour long. Loved every second of it. Thanks!

  • @cs-vk4rn · 2 days ago

    Could you help me understand what's going on? I'm running this in Docker and keep getting an error when it gets to running the .py: "ModuleNotFoundError: No module named 'langchain_groq'"
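
For the ModuleNotFoundError above, the usual cause is that `langchain-groq` (PyPI name; it imports as `langchain_groq`) was installed outside the environment the Docker container actually runs, so the image needs a `RUN pip install langchain-groq` line. A small hedged sketch for checking importability before running (the helper name is mine):

```python
# Hedged sketch: check whether a module is importable in THIS interpreter,
# which is the environment that matters inside the container.
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None

if not has_module("langchain_groq"):
    print("Missing: add `RUN pip install langchain-groq` to the Dockerfile "
          "(or install it in the same venv that runs the script)")
```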

  • @hanzo_process · 3 days ago

    👍👍👍

  • @user-ph3fy3rt1d · 3 days ago

    I have the column in my dataset but it's still showing a KeyError 💀
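
A KeyError on a column that looks present is very often stray whitespace or casing in the header. A minimal sketch of the idea using a plain dict in place of a DataFrame row (the helper and data are illustrative; with pandas, `df.columns = df.columns.str.strip()` is the analogous fix):

```python
# Hedged sketch: the header "Revenue " carries an invisible trailing space
# (common in messy CSVs), so row["Revenue"] raises KeyError even though the
# column "exists". Normalizing keys makes the lookup forgiving.
row = {"Revenue ": 100}  # note the trailing space

def get_column(row: dict, name: str):
    """Look up a column by stripped, case-insensitive name."""
    normalized = {k.strip().lower(): v for k, v in row.items()}
    return normalized[name.strip().lower()]
```

Printing `list(df.columns)` (rather than `df.columns`) also makes such whitespace visible in the repr.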

  • @vchewbah · 3 days ago

    Thank you for creating this tutorial it's exactly what I was looking for. Great content!

  • @ultraprim · 3 days ago

    Brilliantly executed. That graph is incredibly intuitive and information dense.

  • @VeronicaLightspeed · 3 days ago

    How can we interrupt the AI? Please help!

  • @VeronicaLightspeed · 3 days ago

    How could we interrupt the voicebot? Can anyone help, please?

  • @loryo80 · 4 days ago

    The first idea that came to me is to build a personal shopper app. It would work with a lot of brands by saving their products and bringing them sales. The consumer saves their face and body once, and every time they need to shop they connect to the personal shopper app, choose the style and their needs, or just wait for suggestions, et voilà: everybody will be happy, the consumer, the brands, and the try-on solution company.

  • @kwabenaababioadwabour2491 · 4 days ago

    Where has this channel been all this while? This is gold. Thanks for the great video!

  • @123arskas · 4 days ago

    Where are the Insights?

  • @vijaybrock · 4 days ago

    Hi sir, what is the best chunking method to process complex PDFs such as 10-K reports? 10-K reports have so many TABLES; how do I load those tables into vector DBs?
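
One common approach for table-heavy PDFs like 10-Ks is to keep each extracted table intact as its own chunk and split only the narrative text. A hedged sketch of that routing, assuming elements shaped like those a parser such as `unstructured` emits (plain dicts with `category` and `text` keys here; the real library returns element objects):

```python
# Hedged sketch: never split a table mid-row. Tables go into the vector DB as
# single chunks (often serialized as HTML or markdown so structure survives);
# narrative text is split normally.
def route_elements(elements, max_chars=1000):
    chunks = []
    for el in elements:
        if el["category"] == "Table":
            chunks.append(el["text"])  # whole table = one chunk
        else:
            text = el["text"]
            for i in range(0, len(text), max_chars):
                chunks.append(text[i:i + max_chars])
    return chunks
```

A common refinement is to embed an LLM-written summary of each table while storing the full table as the retrieved payload (the "multi-vector" pattern mentioned elsewhere in these comments).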

  • @jmojnida · 5 days ago

    How do I deal with errors in the case of agents? My prompt is:

    qry = "Who is the current prime minister of UK? What is the largest prime number that is smaller than his age? Include the name and age of the prime minister in your response."
    agent.run(qry)

    It found the prime minister of the UK, Rishi Sunak, and his age. But check out the prime number: it arrives at 42, and 42 is NOT a prime number. How do I deal with such errors?

    Action Input: "Rishi Sunak age"
    Observation: 43 years
    Thought: I need to find the largest prime number that is smaller than 43.
    Action: Calculator
    Action Input: "Largest prime number smaller than 43"
    Observation: Answer: 42
    Thought: I now know the final answer.
    Final Answer: The current prime minister of UK is Rishi Sunak, who is 43 years old. The largest prime number that is smaller than his age is 42.
    > Finished chain.
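
One way to guard against this class of error is to route the math to deterministic code instead of trusting the model's "Calculator" reasoning. A minimal sketch (function names are mine):

```python
# Hedged sketch: give the agent a real tool for "largest prime below n" so the
# answer cannot be 42. Pure-Python trial-division primality test.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def largest_prime_below(n: int) -> int:
    """Return the largest prime strictly less than n."""
    for candidate in range(n - 1, 1, -1):
        if is_prime(candidate):
            return candidate
    raise ValueError("no prime below 2")

# For age 43 the correct answer is 41, not the 42 the agent returned.
```

Registered as a custom tool, this replaces the LLM's arithmetic with code that cannot hallucinate; the same pattern applies to dates, unit conversions, and anything else with one right answer.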

  • @trackerprince6773 · 5 days ago

    Would fine-tuning yield better results, or is that not guaranteed? Especially if you have large amounts of writing examples.

  • @brijeshjaggi4579 · 7 days ago

    Thanks Greg, this was very easy to understand and insightful.

  • @Himanshu-gg6vo · 7 days ago

    Hi... any suggestions on how to handle large chunks when some of the chunks have a token length greater than 4k?
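
For chunks that exceed a model's token limit, a simple recovery is to recursively halve any oversized chunk until everything fits. A hedged sketch using whitespace word counts as a stand-in for a real tokenizer such as tiktoken:

```python
# Hedged sketch: split any over-limit chunk in half (on word boundaries) and
# recurse. Swap count_tokens for a real tokenizer to enforce true token limits.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude proxy; replace with tiktoken for accuracy

def enforce_limit(chunk: str, max_tokens: int = 4000):
    """Return a list of sub-chunks, each within max_tokens."""
    if count_tokens(chunk) <= max_tokens:
        return [chunk]
    words = chunk.split()
    mid = len(words) // 2
    left, right = " ".join(words[:mid]), " ".join(words[mid:])
    return enforce_limit(left, max_tokens) + enforce_limit(right, max_tokens)
```

Run as a post-pass after any splitter, this guarantees no chunk sent to the model exceeds the limit, at the cost of occasionally splitting mid-paragraph.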

  • @nfaza80 · 7 days ago

    Theory & Importance of Text Splitting:
    - Context Limits: Language models have limitations on the amount of data they can process at once. Splitting helps by breaking down large texts into manageable chunks.
    - Signal-to-Noise Ratio: Providing focused information relevant to the task improves the model's accuracy and efficiency. Splitting eliminates unnecessary data, enhancing the signal-to-noise ratio.
    - Retrieval Optimization: Splitting prepares data for effective retrieval, ensuring the model can easily access the necessary information for its task.

    Five Levels of Text Splitting:
    Level 1: Character Splitting
    - Concept: Dividing text based on a fixed number of characters.
    - Pros: Simplicity and ease of implementation. Cons: Rigidity and disregard for text structure.
    - Tools: LangChain's CharacterTextSplitter.
    Level 2: Recursive Character Text Splitting
    - Concept: Recursively splitting text using a hierarchy of separators like double new lines, new lines, spaces, and characters.
    - Pros: Leverages text structure (paragraphs) for more meaningful splits. Cons: May still split sentences if chunk size is too small.
    - Tools: LangChain's RecursiveCharacterTextSplitter.
    Level 3: Document Specific Splitting
    - Concept: Tailoring splitting strategies to specific document types like markdown, Python code, JavaScript code, and PDFs.
    - Pros: Utilizes document structure (headers, functions, classes) for better grouping of similar information. Cons: Requires specific splitters for different document types.
    - Tools: LangChain's various document-specific splitters; the Unstructured library for PDFs and images.
    Level 4: Semantic Splitting
    - Concept: Grouping text chunks based on their meaning and context using embedding comparisons.
    - Pros: Creates semantically coherent chunks, overcoming limitations of physical structure-based methods. Cons: Requires more processing power and is computationally expensive.
    - Methods: Hierarchical clustering with positional reward; finding breakpoints between sequential sentences.
    Level 5: Agentic Chunking
    - Concept: Employing an agent-like system that iteratively decides whether new information belongs to an existing chunk or should initiate a new one.
    - Pros: Emulates human-like chunking with dynamic decision-making. Cons: Highly experimental, slow, and computationally expensive.
    - Tools: LangChain Hub prompts for proposition extraction; a custom agentic chunker script.
    Bonus Level: Alternative Representations
    - Concept: Exploring ways to represent text beyond raw form for improved retrieval.
    - Methods: Multi-vector indexing (using summaries or hypothetical questions), parent document retrieval, graph structure extraction.

    Key Takeaways:
    - The ideal splitting strategy depends on your specific task, data type, and desired outcome.
    - Consider the trade-off between simplicity, accuracy, and computational cost when choosing a splitting method.
    - Experiment with different techniques and evaluate their effectiveness for your application.
    - Be mindful of future advancements in language models and chunking technologies.

    Further Exploration:
    - Full Stack Retrieval website: tutorials, code examples, and resources for retrieval and chunking techniques.
    - LangChain library: various text splitters, document loaders, and retrieval tools.
    - Unstructured library: options for extracting information from PDFs and images.
    - LlamaIndex library: alternative chunking and retrieval methods.
    - Research papers and articles on text splitting and retrieval.
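
The Level 1 idea in the summary above fits in a few lines; this mirrors the spirit of LangChain's CharacterTextSplitter (fixed-size chunks with overlap), not its exact API:

```python
# Hedged sketch of Level 1, character splitting with overlap. Overlap repeats
# the tail of each chunk at the head of the next so context isn't cut cleanly.
# Requires chunk_size > overlap, or the loop would never advance.
def character_split(text: str, chunk_size: int = 35, overlap: int = 4):
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Its weakness is exactly the one listed above: it is blind to sentence and paragraph boundaries, which is what Levels 2-5 progressively address.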

  • @deeplearningdummy · 8 days ago

    Awesome Greg! Best TTS-STT demo yet. Do you have any ideas on how to modify your example for two people having a conversation, and the AI participating as a third person. For example, debate students are debating and want the AI to be the judge to help them improve their debate skills. I would love to hear your thoughts on this. Thanks for this tutorial. I've been looking for this solution since the 90's!

  • @FedeTango · 9 days ago

    Is there any alternative for Spanish? I cannot find one.

  • @crystalstudioswebdesign · 9 days ago

    Can this be added to a website?

  • @Munk-tt6tz · 10 days ago

    Your channel is a gem, thank you!

  • @YoPranita · 11 days ago

    Awesome explanation ☺ It was very helpful.

  • @henkhbit5748 · 11 days ago

    Thanks, excellent video about chunking strategies 👍 Question: can I store an HTML table pulled with Unstructured in a vector database together with normal text and ask questions over both (RAG)?

  • @Ideariver · 13 days ago

    This was awesome content.

  • @markwantstolearn · 13 days ago

    LlamaParse for semantic chunking is free.

  • @andrewtschesnok5582 · 14 days ago

    Nice. But in reality your demo is 3,500-4,000 ms from when you stop speaking to getting a response. It does not match the numbers you are printing...
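
When reported and perceived latency diverge like this, measuring one end-to-end span (end of speech to first audio out) rather than per-stage prints is a quick sanity check. A minimal sketch (the class and mark names are mine):

```python
# Hedged sketch: a tiny wall-clock timer for marking pipeline events and
# reading the span between any two of them in milliseconds.
import time

class LatencyTimer:
    def __init__(self):
        self.marks = {}

    def mark(self, name: str):
        """Record the current time under a label, e.g. 'speech_end'."""
        self.marks[name] = time.perf_counter()

    def elapsed_ms(self, start: str, end: str) -> float:
        return (self.marks[end] - self.marks[start]) * 1000.0
```

Marking "speech_end" and "first_audio" around the whole pipeline often reveals queuing, network, or audio-buffer time that per-stage API timings never include.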

  • @HideousSlots · 15 days ago

    Conversational endpointing is a great idea, but I'd like to see it combined with a small model agent that only looks for breaks in the conversation and an appropriate time to interject, maybe with a crude scale for the length of the response. So if the user has a break in the point they're trying to make, we don't want the user interrupted and the conversation moved on; what would be more appropriate is a simple acknowledgement. But once the point is complete, we would then pass back that we want a longer response.

  • @frothyphilosophy7000 · 15 days ago

    This. I need something like this for a project, but I'm not very familiar with Groq or Deepgram yet; just starting to dig in. This thing starts responding at the first little pause, so it constantly cuts me off when I'm just pausing momentarily to think of how I want to phrase the rest of my sentence. If it wants to send data at every minor pause in order to understand context, predict the full query, and begin formulating a response, that's fine, but it needs to wait until I've finished my entire input before verifying/sending its response. Out of the box, this is like a person who doesn't actually listen to what you're saying and is just waiting for their turn to speak. Is there an easy way to adjust the response timing and/or the detection of when the user has finished a full thought, or do I need to develop that logic from scratch?

  • @HideousSlots · 15 days ago

    @@frothyphilosophy7000 not that I’ve seen. And this would be a massive leap in improving conversation. It literally just needs a small model to parse the text at every pause and see if it’s an appropriate time to interject. Just the same as a polite human would do. The groq api should be able to do it. I’m really surprised we haven’t seen this effectively enabled anywhere yet.

  • @frothyphilosophy7000 · 15 days ago

    @@HideousSlots Gotcha. Yeah, guess I'll need to implement that, as it's unusable otherwise.
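
The interjection idea discussed in this thread could start far cruder than a model: a heuristic over the latest transcript fragment plus the pause length. A hedged sketch (the thresholds and filler list are invented placeholders for what a small classifier model would actually learn):

```python
# Hedged sketch: decide whether the bot may interject yet, based on whether the
# last fragment looks like a finished thought and how long the silence has run.
FILLERS = ("um", "uh", "so", "and", "but", "because")

def should_interject(fragment: str, silence_ms: int) -> bool:
    """Interject only on a long enough pause after what looks like a finished thought."""
    text = fragment.strip().lower()
    if not text:
        return silence_ms > 1500
    ends_mid_thought = text.endswith(",") or text.split()[-1] in FILLERS
    looks_finished = text.endswith((".", "?", "!"))
    if ends_mid_thought:
        return silence_ms > 2500   # speaker is still forming the point: wait longer
    if looks_finished:
        return silence_ms > 700    # complete sentence: safe to respond quickly
    return silence_ms > 1500       # ambiguous: moderate wait
```

A real version would replace the heuristics with a small, fast LLM call on every pause, exactly as suggested above, and could also return a suggested response length ("acknowledge" vs "full reply").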

  • @paparaoveeragandham284 · 16 days ago

    good to see it

  • @nessrinetrabelsi8581 · 17 days ago

    Thanks! How does it compare with AssemblyAI's Universal-1? Do you know which speech-to-text supports Arabic with the best accuracy in real time?

  • @NadaaTaiyab · 18 days ago

    Wow! I hadn't even thought about Agentic Chunking! I need to try this. I did some extensive experimentation with chunking on a project at work for a clinical knowledge base and I found that chunking strategies can make the difference between an ok retrieval and an awesome retrieval that works across a higher percentage of queries.

  • @GeorgAubele · 18 days ago

    You are amazing!

  • @GeorgAubele · 18 days ago

    Awesome video! You do a great job!

  • @drakongames5417 · 20 days ago

    What the ___. How good can a tutorial be? Such a gem of a video. Thanks for making this. I'm new to ML and found this very helpful.

  • @nattapongthanngam7216 · 20 days ago

    I'm immensely grateful for your enlightening series on the 5 Levels Of LLM Summarizing. The concept of chunks nearest to centroids representing summaries is brilliant and has offered me a fresh perspective. I eagerly anticipate your insights on AGENTS!

  • @urglik · 20 days ago

    This app won't find my API keys, either Groq or OpenAI, though they are there. Too bad. Any suggestions, Greg?

  • @urglik · 1 day ago

    APIs aren't being found either!
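
When keys "are there" but aren't found, the usual culprit is that the process running the script never loaded them. A minimal sketch for checking, assuming the conventional GROQ_API_KEY / OPENAI_API_KEY variable names:

```python
# Hedged sketch: report which expected environment variables are missing in the
# interpreter that actually runs the script.
import os

def check_keys(names=("GROQ_API_KEY", "OPENAI_API_KEY")):
    """Return the subset of `names` that are unset or empty."""
    return [n for n in names if not os.getenv(n)]

# If this reports missing names, a .env file likely isn't loaded in this
# shell/venv; calling python-dotenv's load_dotenv() before reading the
# variables is a common fix.
```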

  • @nattapongthanngam7216 · 20 days ago

    Thank you, Greg, for this informative video on using LLMs to extract data from text! I found it particularly valuable for its potential application in skill/information extraction from resumes/CVs submitted to large companies. I also noticed a minor error in the original code:

    original:
    output = chain.predict_and_parse(text="...")['data']
    printOutput(output)

    updated:
    output = chain.run(text="...")['data']
    print(output)

  • @top_1_percent · 20 days ago

    You're a legend mate. I learn so much in a few minutes of your videos. Thanks for sharing your valuable knowledge and helping shape the world.

  • @DataIndependent · 20 days ago

    Love it - thank you very much!

  • @Celso-tb6eb · 20 days ago

    I cloned the code but the response time is like 12 seconds. Four weeks have passed and I'm late to the party.

  • @nattapongthanngam7216 · 21 days ago

    Hey Greg, thanks for the video on structured output! One quick tip that may help other people: when I run print(output.content), the output comes back fenced, like

    ```json
    [
      {
        "input_industry": "air LineZ",
        ...

    and the next step, json.loads(output_content), fails until the fence markers are stripped from the string first.

    On a separate note, I'm looking for a video about using LangChain for question answering across multiple documents. Any chance you have one in your playlist?
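
A robust version of that fix is to strip the markdown fence before parsing, rather than replacing backtick characters in place. A hedged sketch (the helper name is mine):

```python
# Hedged sketch: remove a leading ```json (or bare ```) fence and a trailing
# ``` fence that LLMs often wrap around structured output, then parse.
import json
import re

def parse_fenced_json(raw: str):
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    return json.loads(cleaned)
```

This also handles output that arrives without any fence, since the regex simply matches nothing in that case.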

  • @nattapongthanngam7216 · 21 days ago

    Thanks Greg! Great video on using custom files. Could you share a video about RAG? I heard there are many types and I'd love to learn which is best for different tasks.

  • @nattapongthanngam7216 · 21 days ago

    Thank you, Sensei Greg, for this amazing demonstration of LangChain's capabilities!

  • @nattapongthanngam7216 · 21 days ago

    Great tutorial!

  • @nattapongthanngam7216 · 21 days ago

    Appreciate the clear explanation of Token Limit