Using LangChain Output Parsers to get what you want out of LLMs
Science & Technology
Output Parsers Colab: drp.li/bzNQ8
In this video I go through what output parsers are and how to use them in LangChain to improve the results you get out of your models.
My Links:
Twitter - / sam_witteveen
Linkedin - / samwitteveen
Github:
github.com/samwit/langchain-t...
github.com/samwit/llm-tutorials
00:00 Intro
04:56 Structured Output Parser
12:26 CommaSeparatedList OutputParser
14:13 Pydantic OutputParser
19:00 Output FixingParser
21:26 Retry OutputParser
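All the parsers in the chapters above follow the same two-step pattern: inject format instructions into the prompt, then parse the structured text the model sends back. A plain-Python sketch of that idea (this is not the actual LangChain API; the function names and schema are illustrative):

```python
import json
import re

def format_instructions(schema: dict) -> str:
    """Build format instructions to append to the prompt, similar in
    spirit to StructuredOutputParser.get_format_instructions()."""
    fields = ",\n".join(f'  "{k}": "{v}"' for k, v in schema.items())
    return ("Return only a JSON object formatted exactly as:\n{\n"
            + fields + "\n}")

def parse(llm_output: str) -> dict:
    """Pull the JSON object out of the model's reply, tolerating any
    chatter the model wraps around it."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)
    payload = match.group(0) if match else llm_output
    return json.loads(payload)

schema = {"bad_string": "the misspelled input",
          "good_string": "the corrected input"}
prompt = ("Fix the spelling: welcom to califonya!\n\n"
          + format_instructions(schema))

# A reply shaped like what a model typically sends back:
reply = ('Sure! Here you go:\n'
         '{"bad_string": "welcom to califonya!", '
         '"good_string": "Welcome to California!"}')
print(parse(reply)["good_string"])  # → Welcome to California!
```

The real LangChain parsers do the same thing with more robustness; this just shows why the format instructions and the parsing step belong together.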
Comments: 90
Thanks a lot Sam, I really like the way you went deep into explaining all the different types of parsers with examples. This is definitely one of the most top-notch videos you've released, keep it up 😊
Dude, I was exactly having output parser problems just last night. This is exactly what I needed. Thanks.
Thanks Sam, I was struggling with output parsers and you helped right on time.
Great video. I’m using core python elements to parse right now, but I’ll incorporate output parsers in my next rebuild.
Thanks a lot mate. This is as invaluable as it gets: a code walkthrough with all the explanations. Not to mention that the code itself is well documented.
Thank you very much! This is super helpful and something I’ve struggled with
I used to do input and output in YAML. It's more human-readable than JSON, hence it works better with LLMs. No missing colons or stuff like that.
Another variation I've been using is to have a separate JSON repair method. I usually use a similar technique of showing the example JSON and immediately call my validation routine afterwards. If there is an error, send the JSON error and line number it's on as a separate call and try up to 3x to repair the output. The nice thing is you can use a lot fewer tokens on the repair call and potentially call a more specific or faster model that is tailored towards just fixing JSON (rather than wasting an expensive call to GPT4 etc...).
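The repair loop described above can be sketched roughly like this (the function names are made up for illustration, and a stub lambda stands in for the cheaper repair model):

```python
import json

def validate(text: str):
    """Return (ok, error_message); the message feeds the repair prompt."""
    try:
        json.loads(text)
        return True, ""
    except json.JSONDecodeError as e:
        return False, f"line {e.lineno}: {e.msg}"

def repair_json(text: str, repair_llm, max_attempts: int = 3):
    """Validate, and on failure send the error plus the broken JSON to a
    (cheaper) model, retrying up to max_attempts times."""
    for _ in range(max_attempts):
        ok, err = validate(text)
        if ok:
            return json.loads(text)
        text = repair_llm(f"Fix this JSON. Error at {err}:\n{text}")
    raise ValueError("could not repair JSON")

# Stub standing in for the repair model: it returns corrected JSON.
fixer = lambda prompt: '{"name": "task1", "done": true}'
result = repair_json('{"name": "task1", "done": true', fixer)  # missing brace
print(result)
```

The repair prompt is much shorter than the original call, which is exactly the token saving the comment describes.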
Thank you. There are pros and cons with LangChain. It is a powerful framework, but sometimes it is (imho) a little too verbose when it comes to prompt templates. This adds up if you make a lot of requests to the model and costs unnecessary tokens (in production-ready applications). Therefore I use my own written prompts; the crucial thing is finishing the prompt with the final instruction. Here is an example:

---
The result type 2 should be provided in the following JSON data structure:
{
  "name": "name of the task",
  "number": number of the task as integer, // integer value
  "tool": "name of the tool to solve the task", // omitted
  "isfinal": true if this is the last task, // boolean value
}
Respond only with the output in the exact format specified, with no explanation or conversation.
---

So far, this has always worked reliably (over 500 calls). I found this in a lot of papers, so the credit for this goes to some other intelligent guys. From my experience, the names of the fields and also the order of the fields can make a difference.
@MichaelNagyTeodoro
11 months ago
I did the same thing, it's better to do the parsing outside of langchain methods.
@thatryanp
11 months ago
From what I see of langchain examples, it seems that for someone with development experience, they would be better served with some basic utility functions rather than taking on langchain's design assumptions.
@toddnedd2138
11 months ago
@@thatryanp & @Michael Nagy Teodoro There could be some disadvantages if you write your own solutions when it comes to updating to a newer underlying model. Maybe not critical today, but one day it might be a topic. My guess is the LangChain community will be fast to provide updates.
Wonderful content! I'm sending it to my team
Super content...explained well. Thank you
Awesome and thank you for teaching
Great explanation thank you!
Sam you are my unsung Hero of AI. THANKS!
Thank you Sam
I've been playing around with LangChain for a couple of days and this is really helpful! Output parsers would be great while dealing with tools that need to interpret the response. I hope this gets integrated into SimpleSequentialChains too, because currently SimpleSequentialChains only accept prompt templates which have a single input.
This is great, thanks
That is something new, I did not know this... as always, you did your job at its best.
Can output parsers be used with the CSV agent?
I have an agent which goes through a list of tasks. I want the output structure to be different depending on the question asked. Maybe in one instance I just want to return JSON, but in another instance I want to return markdown. I tried to do this with prompts but it is not consistent. Is it possible to do this?
Hello Sam, I have a question: I aim to send a sizable text to the OpenAI API and subsequently ask it to return a few select sections from the text I've dispatched. The text I intend to send consists of approximately 15k tokens, but the token limit for gpt-3.5-turbo is merely 4k. How might I circumvent this limitation and send this text to OpenAI using the API? This is not for the purpose of summarization, as there are ample examples of that on YouTube. My goal is to send a substantial amount of text to OpenAI within the same context, and for the model to retain what I previously sent. Following this, I would like it to return a few parts from the original text, preserving the integrity of the context throughout these operations. Thank you in advance for your guidance!
Thanks for this, Sam. Your videos on Langchain have been incredibly informative and helpful. Here's a request: Can you please do a video on creating Langchain agents with open source/local LLMs? The agents seem to require specific kind of output from the LLMs and I think that can be a nice follow up to this video. In my brief experience open source LLMs are not easy to work with when it comes to creating agents. Your take on this will be very helpful.
@samwitteveenai
11 months ago
The big challenge is most of the released Open Source models can't return the right format. I have a new OpenAI one coming and will try to convert that to open source to show people.
@anindyab
11 months ago
@@samwitteveenai This is great news. Thank you!
How does giving a specific role (in this example, a master branding consultant) improve (or impact, in general) the outcome of a prompt? LLMs make predictions based on sequences of words, and I am trying to connect role-playing with the model's output.
As you also mentioned in the video, the CommaSeparatedListOutputParser does not really work well (for example, there was a dot at the end of the LLM's response). Is there any other way to get the model to output only a list?
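One workaround for the trailing-dot problem is to post-process the reply yourself rather than relying on the parser alone; a minimal sketch (the stripping rules here are illustrative, not LangChain's):

```python
def parse_comma_list(text: str) -> list:
    """Split a comma-separated reply into clean items, stripping the
    whitespace and stray trailing full stop the model sometimes adds."""
    items = [item.strip() for item in text.split(",")]
    return [item.rstrip(".").strip() for item in items if item]

# A reply with the trailing-dot problem mentioned above:
print(parse_comma_list("Vanilla, Chocolate, Strawberry, Mint."))
# → ['Vanilla', 'Chocolate', 'Strawberry', 'Mint']
```

A few lines of cleanup like this are often more reliable than trying to prompt the trailing punctuation away.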
Hello, thank you so much for such valuable and creative content that helps us a lot. Please, I have a question: I am using the Pydantic output parser on structured PDF documents to generate a dataset (where I will select only specific fields). I used OpenAI as the LLM, but the problem I faced is that I am working with a folder of 100 PDFs, so the code is suddenly interrupted due to OpenAI's daily request rate limit. How do I handle this, is there a trick? Or another alternative?
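A common way to handle rate limits like this is to wrap each call in an exponential-backoff retry; a minimal stdlib sketch (RuntimeError stands in for the real API's rate-limit exception, and the delays are shortened for the demo):

```python
import time

def call_with_backoff(fn, max_retries=5, base_delay=0.01):
    """Retry fn with exponential backoff whenever it raises.
    RuntimeError stands in for the real rate-limit exception."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:
            time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, 0.04s, ...
    raise RuntimeError("gave up after repeated rate limits")

# Stub "extract fields from one PDF" call that is rate-limited twice:
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "parsed fields"

result = call_with_backoff(flaky_extract)
print(result)  # → parsed fields
```

In a real batch job you would also checkpoint which PDFs are already done, so an interrupted run can resume instead of restarting.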
Thanks for the explanations. What I still miss from this tutorial (and some others of yours) is how to go beyond what LangChain's API provides at the moment. For instance, a simple question raised immediately after watching this would be: how to implement a custom output parser, for a custom format that is not JSON or lists? Is it possible to make something for tables? Thanks anyway, that was still great!
@samwitteveenai
11 months ago
Almost all customization is done at the prompt level. If you are doing something for a table, you would want to think through first what the LLM would return as a string. A CSV? How would it represent a table, etc.? Then work on the prompt that gets that, and lastly think about an output parser. You raise an interesting issue; maybe I should make a video walking through how I test the prompts first and get that worked out. If you have a good use case, please let me know. One issue I have is I can't show any projects I work on for clients etc.
I've noticed that using turbo 3.5 recently there are quite often issues with the model being overloaded. Using LangChain (at least I assume that is where this comes from), the chain will usually retry the LLM query. Is there a way to control the number of retries and the interval between retries? And thanks for the awesome content, super useful stuff! 👍
@samwitteveenai
11 months ago
I think there is a PR submitted to control the number of retries but don't think it is there yet.
@sethhavens1574
11 months ago
@@samwitteveenai cool thanks for the feedback dude
Love your content Sam. I was wondering, have you ever got classification/data extraction working with an open-source LLM such as Llama 2? Would love to see a video on this if you have. Thanks, keep up the great work.
@samwitteveenai
10 months ago
I have been working on this with mixed results; hopefully I can show something with LLaMA 2.
@Aroma_of_a_Roamer
10 months ago
@@samwitteveenai You are an absolute champion. I think all app development is done exclusively with ChatGPT, since it is a) superior to open-source LLMs and b) app and library developers such as LangChain have geared their development towards it, using their own prompt templates. Each LLM has its own way and nuance as to how to format the prompt in order to make it work correctly.
How does the output parser actually parse the output that it gets back, though? Is it just regular code, or is it something more? As an example, what if the model forgets the second " to end a string?
How can we use output parsers with RetrievalQA?
Hey @Sam, thanks again for all these useful videos. I was wondering, would it be possible to use the same output parser to get a JSON file that later we would be able to use as a dataset to train our language model? If yes, would it be possible to bypass OpenAI in this process and maybe use another LLM, from a privacy perspective? Thanks a lot.
@samwitteveenai
11 months ago
Yes, absolutely you can use it to make datasets. Lots of people are doing this. It will work with other LLMs, but most of the open-source ones won't have good outputs, so they often fail etc.
@MrOldz67
11 months ago
@@samwitteveenai Thanks for the answer. I will try to find a way to do that, but in the meantime, if you would like to make a video I'll be really interested :) Thanks in advance.
Thanks Sam! Would using an Output Parser in combination with Kor make sense? Is that worth a video on its own?
@samwitteveenai
10 months ago
At the moment all of this is changing with the OpenAI functions (if you haven't seen them, I have a few vids about this). Currently LangChain also seems to be rethinking this. I will revisit some of these. One issue is going to be whether we are going to have 2 very different ecosystems, i.e. OpenAI vs everything else. I am testing some of the new things in some commercial projects, so let's see how they go and then I will make some new vids.
the problem with trying to do the formatting in the same prompt that does the reasoning is that it impacts the result
@samwitteveenai
11 months ago
You can get the model to give reasoning before, as part of the output. Ideally you want reasoning instructions earlier than output instructions.
Great vids, Sam! So, is this awesome pydantic output parser available for node? I'm finding shaky info in the JS docs, I'm currently using the StructuredOutputParser, but I'm creating some agents that I want to output in Markdown. Is it best in javascript to just post-process and convert to markdown? Any pointers or thoughts would be greatly appreciated!
@samwitteveenai
11 months ago
Pydantic is a Python thing so maybe not in the JS version, but my guess is they will have something like this soon. Technically you could just make it yourself as it is all just setting a prompt. I have a new vid coming out in an hour which shows another way to do the same thing.
@askcoachmarty
11 months ago
@@samwitteveenai cool. I'll look for that video!
When I try to apply this parsing with models called via LangChain, sometimes it works and sometimes it doesn't. Same with LangChain's pydantic parser.
Thanks a lot for the great explanation!!
There is no LangChain plugin in the ChatGPT plugin store. Did they remove it?
By chance, does LangChain have an implementation of Tree of Thought?
@samwitteveenai
11 months ago
Not yet but I have been playing around with it. I want to make sure it works for things not just in the paper before I make a video.
Sam, I love your videos, I'm a huge fan. The only feedback I would have to make your channel better is to fix your typos. Both in your template strings (not a big deal since the LLM will understand regardless), but also your video titles (eg. at 4:57) "Ouput", may affect your credibility. All the best and keep up the good work!
@samwitteveenai
2 months ago
Thanks & Sorry about that. I have tended to record these on the fly and put them out. I have someone editing now who will hopefully catch them as well.
With regards the Pydantic Output Parser, when it gets the badly formatted output - do you get that as your prompt result or does the parser feed that error back to itself to correct it until it has a well formatted output to return to the user?
@samwitteveenai
11 months ago
It will give an error and you can set that to trigger an auto retry etc.
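That error-then-retry flow works roughly like this sketch, which mimics what the Output Fixing / Retry parsers do: on a parse or validation failure, the bad output is sent back to a model and parsing is attempted again (the dataclass and stub fixer are illustrative, not LangChain code):

```python
import json
from dataclasses import dataclass

@dataclass
class Joke:
    setup: str
    punchline: str

def parse_with_retry(raw: str, retry_llm, max_retries: int = 2) -> Joke:
    """On a parse/validation error, send the bad output back to a model
    and try parsing again, up to max_retries extra attempts."""
    for _ in range(max_retries + 1):
        try:
            return Joke(**json.loads(raw))
        except (json.JSONDecodeError, TypeError):
            raw = retry_llm("This was not valid JSON for Joke; fix it:\n" + raw)
    raise ValueError("still malformed after retries")

# Stub fixer returning valid JSON; the single-quoted input fails json.loads:
fixer = lambda prompt: ('{"setup": "Why did the chicken cross the road?", '
                        '"punchline": "To get to the other side."}')
joke = parse_with_retry("{'setup': 'Why did the chicken cross the road?'}", fixer)
print(joke.punchline)  # → To get to the other side.
```

The difference between the fixing and retry parsers in the video is essentially what goes into that correction prompt: just the bad output, or the bad output plus the original prompt.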
Thanks Sam! Is it just me, or do you also feel that the API is slower to return JSON 'code' than plaintext? I'm getting upwards of 30 seconds per API call to parse a PDF table into 250 tokens of JSON.
@samwitteveenai
11 months ago
Interesting. I haven't noticed that. It shouldn't be any slower
Hi Sam, thanks for the video. You should set up a patreon or something. Your videos have helped a lot. thanks and keep up the good work!
@samwitteveenai
11 months ago
Thanks for the kind words.
Can you do a video on ReWOO (Reasoning WithOut Observation)?
@samwitteveenai
11 months ago
Yeah it looks pretty cool. I will take a proper look.
Is this not eating up a lot of tokens, especially the Pydantic case?
@samwitteveenai
11 months ago
Yes, it does eat up some more tokens, but the pydantic model really allows you to use the outputs in an API etc. much more easily. Regarding price, it all depends on how much you value the interaction. I see some customers are happy to pay a dollar++ for each conversation, which is a lot of tokens. Usually that is a lot cheaper than having a real human involved etc.
This is what you call.. more than a party-trick.
Is it possible to use another LLM for the output parser?
@samwitteveenai
9 months ago
Yeah, certainly; then it becomes a bit like an RCI chain, which I made a video about.
@ivanlee7450
9 months ago
How about a Hugging Face model?
Can we extract code from the response?
@samwitteveenai
11 months ago
Yes, take a look at the PAL chain; it does this kind of thing.
New to LangChain, Sam, and I appreciate this video. Really looking for how to tune this properly with open-source Hugging Face models rather than OpenAI's paid API.
I don't get it, maybe I missed something or don't know some important element. Why is the language model supposed to do the parsing as some form of formatting? Why isn't this just done in code with the response from the model?
@samwitteveenai
11 months ago
Getting the model to output it like that makes it much easier than trying to write regex expressions for every possible case the model might output.
But what if we want to do that with an open-source LLM (Hugging Face)?
@samwitteveenai
1 month ago
You can certainly do the same with something like a Mistral fine tune etc
@orhandag540
1 month ago
@@samwitteveenai But somehow the prompt template of Mistral is not compatible with LangChain models; I was trying to build exactly this with Mistral.
Can you post these videos using open-source LLMs rather than OpenAI APIs? Thank you.
@samwitteveenai
6 months ago
I have posted quite a few videos that use open-source models. One challenge is that, up until recently, the OSS models weren't good enough to do a lot of the tasks.
@ashvathnarayananns6320
6 months ago
@@samwitteveenai Okay and Thanks a lot for your reply!
I disagree! This is one of the more sexy parts! It's the hocus-pocus of "Prompt Engineering". Great video!
Honestly the more I watch about LangChain the less value I see in using it vs. just coding your own interactions with the model. It seems to be doing trivial things at a very high level of text processing and obscuring what it does. While you still have to learn the API and be limited by it.
Practically speaking, isn't Guidance so much easier and better to use? For practical reasons, this doesn't seem to add more value.
@samwitteveenai
11 months ago
I am planning to do a video on Guidance and Guardrails as well.
@MadhavanSureshRobos
11 months ago
That'll be wonderful!
I've managed to get GPT-3.5 to return JSON for 100k prompts, and it always returned JSON. It took me a few hours to get the right prompt though!
What parser or other method do you use in chains? For example:

memory = ConversationBufferMemory(memory_key="chat_history")
tools = load_tools(["google-search", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION, handle_parsing_errors=_handle_error, memory=memory, verbose=True)

I am getting output parsing errors:

Thought: Could not parse LLM output: `Do I need to use a calculator or google search for this conversation? Yes, it's about Leo DiCaprio girlfriend current age raised 0.43 power.`
Action: google_search
Observation: Could not parse LLM output: `Do I need to use a ca
Thought: Could not parse LLM output: `Could not parse LLM output: `` Do you want me to look up more information about Leo DiCaprio girlfriend's current age raised 0.43 power?``
Action: google_search``
Observation: Could not parse LLM output: `Could not parse LLM o
Thought: Could not parse LLM output: `` Is there anything else you would like me to do for today?``
AI: Thank you!

> Finished chain.
'Thank you!'
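For errors like the trace above, the handle_parsing_errors argument can, as far as I know, also be given a callable that receives the parse error and returns the string fed back to the model as the observation; a minimal sketch of such a handler (the retry wording is illustrative):

```python
def _handle_error(error) -> str:
    """Intended for handle_parsing_errors=_handle_error: the returned
    string is sent back to the model as the observation so it can
    correct its formatting on the next step."""
    return ("Your last reply could not be parsed. Answer again using exactly:\n"
            "Thought: <your reasoning>\n"
            "Action: <tool name>\n"
            "Action Input: <tool input>\n"
            "Parse error was: " + str(error)[:200])

msg = _handle_error(ValueError("Could not parse LLM output"))
print(msg)
```

Restating the expected Thought/Action/Action Input format in the correction message usually helps the conversational ReAct agent recover instead of looping on the same error.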