Tagging and Extraction - Classification using OpenAI Functions

Science & Technology

Colab: drp.li/Ys6hc
My Links:
Twitter - / sam_witteveen
Linkedin - / samwitteveen
Github:
github.com/samwit/langchain-t...
github.com/samwit/llm-tutorials
00:00 Intro
00:52 Classification/Tagging
06:15 Using Pydantic
08:02 Extraction, NER

Comments: 63

  • @daffertube
    @daffertube · 7 months ago

    Thank you for making these tutorials. They are very helpful!

  • @jaimemelon2621
    @jaimemelon2621 · 11 months ago

    Incredible channel on LangChain and AI in general.

  • @djpremier333
    @djpremier333 · 11 months ago

    You have the best channel about LangChain; I love your content.

  • @samwitteveenai
    @samwitteveenai · 11 months ago

    Thanks.

  • @qingdong801
    @qingdong801 · 5 months ago

    Thank you so much for sharing the code in colab and github!

  • @jasonlosser8141
    @jasonlosser8141 · 1 year ago

    Hi Sam, once again the quality of your videos is amazing. I built an extraction pipeline using OpenAI functions this weekend and got excellent JSON returned. I have a few scripts that attempt to do this with basic prompting, but they can hallucinate occasionally. So far, this function concept is working great. Your tip on enums is phenomenal; I hadn't thought of that. One criticism: turbo isn't quite as solid as davinci-003 on its returns. I don't have API GPT-4 yet; I will try that once OpenAI grants access. Last thing: do you feel running through LangChain is even necessary? I feel the OpenAI function implementation can eliminate LangChain for a great deal of what I do -- perhaps a bit more scripting on my part, but it eliminates a friction/failure point.
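    The enum tip mentioned above can be sketched as a raw function definition. Everything here (the function name, fields, and labels) is illustrative, not taken from the video:

    ```python
    import json

    # Hypothetical function schema illustrating the enum tip: constraining a
    # tagging field to a fixed set of labels so the model can't invent new ones.
    tagging_function = {
        "name": "tag_text",  # assumed name, not from the video
        "description": "Tag a piece of text with sentiment and language.",
        "parameters": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    # The enum is the key trick: the model must pick one of these.
                    "enum": ["positive", "negative", "neutral"],
                },
                "language": {"type": "string", "description": "ISO 639-1 code"},
            },
            "required": ["sentiment"],
        },
    }

    # This dict is what you would pass as `functions=[tagging_function]` in a
    # chat-completion request; here we just serialize it to check the shape.
    payload = json.dumps(tagging_function, indent=2)
    print(payload)
    ```

    Constraining the field this way is what keeps the JSON stable compared to plain prompting.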

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    You raise a number of key issues here; let me try to address each one. 1. I agree the turbo model is often not as good as davinci-003 etc. I personally think that is because turbo is a distilled, smaller model (but I have no inside knowledge on that). 2. GPT-4 is coming to more people soon. 3. I think for some things LangChain is the right solution and for others not. This week I have worked on a number of things that ping the OpenAI APIs directly just because it was easier for what I was doing. LangChain and LlamaIndex are still very cool and probably the best ways to go for using data and tools with LLMs. If you can get away without them, then that is fine to do.

  • @sethhavens1574
    @sethhavens1574 · 1 year ago

    now that’s some clever stuff 👌

  • @toddnedd2138
    @toddnedd2138 · 1 year ago

    Thanks for the explanation. The new features (functions and a larger context window) of the OpenAI models are a bit like buying a technical device with a lot of buttons where the vendor only gives you an example usage instead of a detailed manual. It would be very helpful if OpenAI would publish some of the models' training data. On the other hand, maybe this trial-and-error approach creates the new jobs that everybody is talking about ;-)

  • @borisw1166
    @borisw1166 · 1 year ago

    Thank you for the video. I did some testing today with Kor, and Kor seems to work better in my cases. I tested it on a bill with instructions for the customer reference for the bank transfer. With Kor, not only do I get the right customer reference (based on the instructions on the bill), but it also calculates the right amount (since no total amount is on the bill). With functions, it only works with tagging (instead of extraction), and it does not calculate the amount. This is also a good example of a "positive" prompt injection, since the instruction on how to use the right customer reference was on the bill and got "injected" into the prompt :D

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    I like Kor and made a video about it in the past, so I'm not surprised it may work better for certain use cases. Be really careful relying on any of the models to calculate correctly; that sounds like the kind of thing that could break easily.

  • @borisw1166
    @borisw1166 · 1 year ago

    @samwitteveenai Sure, we are not relying on that. I was actually surprised it calculated the amount. I am currently testing different approaches to data extraction, and that bill actually failed in my banking app (with the photo transfer feature), so I just gave it a try.

  • @anubiseyeproductions2921
    @anubiseyeproductions2921 · 1 year ago

    You didn't start the video with "Okay…". I always look forward to that.

  • @sethhavens1574
    @sethhavens1574 · 1 year ago

    also this ^^ 👍

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    lol

  • @shraey2021
    @shraey2021 · 10 months ago

    Hi Sam, just came across your channel. Pretty cool stuff. I had a query: I saw your video (maybe a couple back) where we convert OpenAI function calling into Tools and then call it as an agent. Here we call it more like a chain; I'm thinking multiple functions are like multiple tools, and the chain calls them. Are both techniques equal, or is one way better than the other? Cheers

  • @rkenne1391
    @rkenne1391 · 1 year ago

    Thank you so much, insightful plus a notebook. Awesome. Quick question: how do you combine it with few-shot examples / ICL? FewShotPromptTemplate?

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    You can just add the examples into the prompt template, or yes, look at FewShotPromptTemplate.
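    Folding examples straight into the prompt, as suggested above, can be sketched without any library at all; LangChain's FewShotPromptTemplate formalizes the same pattern. The example texts here are made up:

    ```python
    # A minimal few-shot prompt assembled by hand: render each example, then
    # append the new input to be tagged. All examples are illustrative.
    examples = [
        {"text": "I love this phone", "sentiment": "positive"},
        {"text": "Worst service ever", "sentiment": "negative"},
    ]

    def build_prompt(examples, query):
        """Render each example as a Text/Sentiment pair, then add the query."""
        shots = "\n".join(
            f"Text: {ex['text']}\nSentiment: {ex['sentiment']}" for ex in examples
        )
        return f"{shots}\nText: {query}\nSentiment:"

    prompt = build_prompt(examples, "The food was okay, I guess")
    print(prompt)
    ```

    The rendered string would then go into the system or user message alongside the function definitions.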

  • @MadhavanSureshRobos
    @MadhavanSureshRobos · 1 year ago

    Looking forward to more projects with open LLMs too. I feel OpenAI is the best, no doubt, but it's a supercomputer vs PC fight now, and we'd rather have PCs. Just my opinion. Anyway, always love your content.

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    Don't worry, I haven't given up on open-source LLMs. I will make some more vids soon.

  • @Xaddre
    @Xaddre · 1 year ago

    I created a terminal using it where it returns the command necessary to do whatever the user asks in plain English. It's pretty prototype-like, but I actually use it a lot when I forget a terminal command for something.

  • @dare2dream148
    @dare2dream148 · 10 months ago

    Thanks again for sharing, Sam! I've got two questions. 1. Are these optimizations why GPT-4 seems to be getting worse at some other tasks over time? Does it imply OpenAI is focusing more on API usage than chat usage going forward? 2. On applying few-shot ICL to these functions: what are some ways/ideas to implement it?

  • @MrOldz67
    @MrOldz67 · 1 year ago

    Hey Sam, thanks as always for your great video; that's a big service you're doing for the community. I am curious to get your thoughts on a potential usage for document summarization and QA. I am currently building a solution that will allow you to query your documents and ask them questions, or ask the chatbot for a summary, or even to generate a copy using a document template, etc. I was looking to build that with unstructured data, using FAISS or a vector DB solution as the database. Is there any benefit to using tagging and extraction, or could that even be a solution on its own? Thanks in advance for your answer, as always.

  • @samwitteveenai
    @samwitteveenai · 11 months ago

    Yes, often you will extract metadata and then do the search with both vectors and metadata.
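    A toy sketch of that combined search, with made-up documents and similarity scores standing in for a real vector store:

    ```python
    # Hybrid retrieval in miniature: filter candidates by extracted metadata
    # first, then rank the survivors by vector-similarity score. The documents
    # and scores below are fabricated for illustration.
    docs = [
        {"text": "Q2 earnings report", "year": 2023, "score": 0.91},
        {"text": "Q2 earnings report", "year": 2021, "score": 0.95},
        {"text": "Holiday schedule",   "year": 2023, "score": 0.40},
    ]

    def search(docs, year):
        """Metadata filter first, then sort by similarity score, best first."""
        hits = [d for d in docs if d["year"] == year]
        return sorted(hits, key=lambda d: d["score"], reverse=True)

    top = search(docs, year=2023)
    print(top[0]["text"], top[0]["score"])
    ```

    Note the 2021 document scores highest overall but is excluded by the metadata filter, which is exactly the behavior the combination buys you.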

  • @kesavanr5341
    @kesavanr5341 · 4 months ago

    Your videos are awesome; thank you for the LangChain series. I was wondering if there are any tagging chains with open-source LLMs like PaLM or LLaMA.

  • @samwitteveenai
    @samwitteveenai · 4 months ago

    Yeah, there are some with some of the new Mistral models. I will try to make some new vids over time.

  • @kunalmundada8754
    @kunalmundada8754 · 11 months ago

    Nice video! Just wondering if I could set the type as a list or array and define what I want in the array in the description?

  • @samwitteveenai
    @samwitteveenai · 11 months ago

    Yes, especially in the Pydantic classes you should be able to do that.
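    A minimal sketch of a list-typed field in a Pydantic class; the class and field names are hypothetical, and the generated JSON schema is the shape the functions API would receive:

    ```python
    from typing import List
    from pydantic import BaseModel, Field

    # Hypothetical tagging schema with a list-typed field; the description
    # tells the model what each array element should contain.
    class Tags(BaseModel):
        keywords: List[str] = Field(
            description="3-5 short keywords summarizing the text"
        )

    # List[str] maps to an array-of-strings in the generated JSON schema,
    # which is what gets sent to the model as the function's parameters.
    schema = Tags.schema()
    print(schema["properties"]["keywords"])
    ```

    The field description ends up inside the schema, so it doubles as an instruction to the model about what belongs in the array.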

  • @andy111007
    @andy111007 · 9 months ago

    Hi Sam, thanks for the amazing video. Any plans for any follow-ups? This is an interesting concept that is underutilized.

  • @Arjay2186
    @Arjay2186 · 1 year ago

    Thanks. Awesome as usual. Is there a way to combine this with Retrieval QA for long documents, with the information to be extracted changing each time?

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    Probably not simply, if you want to change the schema each time. There is a new Retrieval QA using functions too; I may make a video on that.

  • @Arjay2186
    @Arjay2186 · 1 year ago

    @samwitteveenai Big thanks. That would be great. I need to find time to play around with LangChain again; the whole thing is changing way too fast.

  • @sskarimirelandsskarimirela8750
    @sskarimirelandsskarimirela8750 · 1 year ago

    Dear Sam, I'm really quite worried that almost all of the integration is with OpenAI, which is not open source, or is limited. Does that mean LangChain can only be used with OpenAI? Thanks

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    LangChain really seems to be OpenAI-first and then other models later (though it supports the other models fine as well). The problem is most of the open-source models just can't do this kind of task, meaning they don't have the reasoning skills for it.

  • @sskarimirelandsskarimirela8750
    @sskarimirelandsskarimirela8750 · 1 year ago

    @samwitteveenai Many thanks for your great effort. I feel like a small company can't run a business with dominance from a few big companies, paying for each token…

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    Trust me, paying for tokens is often much cheaper than running 4-8 GPUs to host your own model, etc.

  • @sskarimirelandsskarimirela8750
    @sskarimirelandsskarimirela8750 · 1 year ago

    @@samwitteveenai many thanks 👍👍👍

  • @dangerous235
    @dangerous235 · 1 year ago

    Thank you for your great video. Just a question: why is it runnable when we don't have a local function named information_extraction?

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    Good question. It's because we don't have anything in the output parser that tries to run a function, like we did in the other ones.

  • @dangerous235
    @dangerous235 · 1 year ago

    Could you explain how we could check (print) the output parser, for example, in the case of the agent with the stock price tool in your previous video? I want to better understand the differences between the two cases (running a local function vs. these).

  • @dangerous235
    @dangerous235 · 1 year ago

    I realize that in these cases, instead of requiring a function to be executed, the parameters themselves are our expected output. Correct?
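    That reading can be sketched with a hand-written stand-in for an API reply; note that nothing ever looks up or calls a Python function by name:

    ```python
    import json

    # With tagging/extraction the chain never executes a local function; the
    # arguments the model produces ARE the output. This response dict is a
    # fabricated stand-in for a real chat-completion reply.
    mock_response = {
        "function_call": {
            "name": "information_extraction",
            "arguments": '{"people": [{"name": "Sam", "role": "presenter"}]}',
        }
    }

    # An output parser for this use case only needs to JSON-decode the
    # arguments; no function named information_extraction is ever invoked.
    extracted = json.loads(mock_response["function_call"]["arguments"])
    print(extracted)
    ```

    An agent, by contrast, would take that same `name`/`arguments` pair and dispatch it to a real tool, which is the difference being discussed here.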

  • @TomMathews
    @TomMathews · 1 year ago

    Hi Sam. First off, your content is simply amazing: detailed and very informative. I have been motivated and experimenting with LangChain quite a lot, especially since I started watching your content. Could you possibly also create a video on the latest Hugging Face Transformers Agents?

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    I made a video a while back; have they added anything new?

  • @TomMathews
    @TomMathews · 11 months ago

    @samwitteveenai My bad. I had been going through your LangChain playlist and somehow missed that video. Thanks again.

  • @user-vw4rq5sm2y
    @user-vw4rq5sm2y · 1 year ago

    I was working on this and trying it on some of my use cases. This feels amazing. The problem I want to solve now is running these chains on a large amount of data (documents). Is there any solution/hack that you found using LangChain?

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    Actually, just a for loop is fine; monitor for API calls that don't work, etc.
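    A minimal sketch of that loop, with `run_chain` as a stand-in for the real chain/API call:

    ```python
    # Run the chain per document and record failures, instead of letting one
    # bad API call abort the whole batch. `run_chain` is a stub that fails on
    # empty input, standing in for a real call that might error or time out.
    def run_chain(doc):
        if not doc.strip():
            raise ValueError("empty document")
        return {"length": len(doc)}

    documents = ["first doc", "", "third doc"]  # hypothetical inputs

    results, failures = [], []
    for i, doc in enumerate(documents):
        try:
            results.append(run_chain(doc))
        except Exception as err:  # monitor the calls that don't work
            failures.append((i, str(err)))

    print(f"{len(results)} succeeded, {len(failures)} failed")
    ```

    The failure list can then be retried or inspected separately, which is usually all the "hack" a batch job needs.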

  • @user-le4cv1mm4h
    @user-le4cv1mm4h · 7 months ago

    Hi Sam, can we use an LLM for detecting whether images are fake or real? Thank you.

  • @RyanScottForReal
    @RyanScottForReal · 11 months ago

    Hmm, I guess since I need to do both extraction and tagging, I need to run two steps. Is there a way to do both in a single shot?

  • @samwitteveenai
    @samwitteveenai · 11 months ago

    Yes, just put it all in one Pydantic class.
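    A sketch of such a combined class, with hypothetical field names: the `Literal` field behaves like a tag (pick one label) while the list field behaves like extraction (pull items out), and both live in one schema so a single call covers both.

    ```python
    from typing import List, Literal
    from pydantic import BaseModel, Field

    # Hypothetical combined schema: tagging and extraction in one class.
    class TagAndExtract(BaseModel):
        sentiment: Literal["positive", "negative", "neutral"] = Field(
            description="Overall sentiment of the text"
        )
        people: List[str] = Field(description="Names of people mentioned")

    # One generated schema, so one function call returns both kinds of output.
    schema = TagAndExtract.schema()
    print(sorted(schema["properties"]))
    ```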

  • @yurijmikhassiak7342
    @yurijmikhassiak7342 · 1 year ago

    Thanks. Do you think GPT-4 would do better?

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    Yes, totally. GPT-4 does better on pretty much most things, especially reasoning.

  • @youssefsalah5265
    @youssefsalah5265 · 3 months ago

    How do you add examples to the prompt?

  • @vinsi90184
    @vinsi90184 · 11 months ago

    Hey Sam, I follow and love your videos. I am trying to create a bot where I need to extract an entity and pass it on to some APIs depending on the question. For example, "What's the weather in my area?" should prompt "Which area do you reside in?", and if the user gives a name that matches my list, I can call the API and return the answer. So problem 1 is to recursively ask the user until I have a complete question, and in that process also solve problem 2, which is to extract the "entity" while matching it against what I have in my docs. So if the person responds "my home", the bot should ask "But which city is your home?", and if they give a city name that doesn't exist, the bot says it doesn't have the information. I have to do this over multiple types of entities, not just one: location, date / date range, etc. What would be the best way to approach this problem?

  • @samwitteveenai
    @samwitteveenai · 11 months ago

    This would be more about the prompt, and a serious amount of customizing the prompt. You could also look at filling a set of slots: when a slot is still empty, the program asks a direct question ("Which city do you live in?") and then filters for that slot. Don't think of the OpenAI functions as being static; you can call with different functions based on what you get back.
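    The slot-filling idea above can be sketched without any LLM at all; the slot names, questions, and city list below are all illustrative:

    ```python
    # Slot filling in miniature: keep a dict of required slots, ask a direct
    # question for the first empty one, and validate answers against a known
    # list before accepting them. Real code would route the answers through
    # the model/function calls; here everything is stubbed.
    KNOWN_CITIES = {"london", "singapore", "tokyo"}
    questions = {"city": "Which city do you live in?", "date": "For which date?"}

    def next_question(slots):
        """Return the question for the first unfilled slot, or None if done."""
        for name, value in slots.items():
            if value is None:
                return questions[name]
        return None

    def fill_city(slots, answer):
        """Accept the answer only if it matches a city we actually know."""
        if answer.lower() in KNOWN_CITIES:
            slots["city"] = answer.lower()
            return True
        return False

    slots = {"city": None, "date": None}
    print(next_question(slots))     # asks for the city first
    fill_city(slots, "Atlantis")    # unknown city -> rejected, slot stays empty
    fill_city(slots, "Tokyo")       # known city -> accepted
    print(slots)
    ```

    With OpenAI functions, each unfilled slot could get its own function schema on the next call, matching the point that the function list doesn't have to be static.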

  • @csharpner
    @csharpner · 1 year ago

    Is this available for locally run models yet?

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    Not yet, especially not the way they structure the input like this. But there might be some things coming soon.

  • @dtkincaid
    @dtkincaid · 11 months ago

    I must be missing something. The use cases you're showing here were already possible using output parsers. What's new here? I've been doing these things for a while now.

  • @samwitteveenai
    @samwitteveenai · 11 months ago

    This is far more stable than output parsers because the model is fine-tuned to use these functions.

  • @JOHNSMITH-ve3rq
    @JOHNSMITH-ve3rq · 1 year ago

    Bro, your mic seems to max out at a certain frequency or something. At 13:26 or so, the word "extract" gets flattened or cut off. It seems to be a sound setting.

  • @samwitteveenai
    @samwitteveenai · 1 year ago

    It's because I record in a room with a lot of reverb, and then I run it through a denoiser to remove the reverb. Certainly not ideal, and I'm open to any suggested solutions you have.

  • @gramothy_taylor
    @gramothy_taylor · 1 year ago

    Have you tried a microphone isolation shield? Cheaper than doing a whole room of acoustics, and should help a lot with that.

  • @mattizzle81
    @mattizzle81 · 1 year ago

    I found the functions a really cool idea, but not ready for prime time. They break too much still. ChatGPT sometimes ignores them altogether and claims it can't do anything, or even worse, it "pretends" to use the function: it calls it, but the API doesn't actually pick it up as a function call.
