Create Your Own ChatGPT with PDF Data in 5 Minutes (LangChain Tutorial)

Howto & Style

📚 My Free Resource Hub & Skool Community: bit.ly/3uRIRB3 (Check “KZread Resources” tab for any mentioned resources!)
🤝 Need AI Solutions Built? Work with me: bit.ly/3K3L4gN
📈 Find out how we help industry experts sign their first 5 AI Agency clients, guaranteed: bit.ly/skoolmain
In this video I show you how to train ChatGPT on your own data in 5 minutes using LangChain so you can chat with your PDFs! This is a super beginner friendly guide that explains how these custom knowledge chatbots can be created in a few minutes using LangChain. This is similar to tools like ChatPDF which allow you to chat to your docs (chatpdf.com/).
If you've ever wanted to know how to chat with your PDFs or train ChatGPT on your own data, this is the video for you! Code available below.
Create a copy of my notebook (code):
colab.research.google.com/dri...
Timestamps:
0:00 - What we're building
1:10 - System Explained
2:48 - Creating the chatbot
8:18 - Steal my code!

Comments: 309

@LiamOttley 11 months ago
Leave your questions below! 😎 📚 My Free Skool Community: bit.ly/3uRIRB3 🤝 Work With Me: www.morningside.ai/ 📈 My AI Agency Accelerator: bit.ly/3wxLubP
@moses5407 1 year ago
Golden! Clear, concise info and a notebook! If it's too fast for some viewers, I'll remind them that they can always slow down the playback speed.
@borisbadinoff1291 1 year ago
👏👏 Hey Liam, your five-minute tutorial is fantastic! Kudos and thanks for putting the effort to produce it. Your app is exactly what any knowledge worker is craving for: We all have gigabytes of pdf files in some folder named "READ", "TO READ" or "__TO READ" (so it stays on top of the root :), but never get to it (probably distracted by all these tutorials to become more productive we love to watch). A bot that can read that stuff for us, so we can continue to wing it is a true godsend. :D
@guilhermeveiga9345 1 year ago
Thought it would be just another video on the subject, but you summarize in an awesome way! Great vid! Congrats
@naturallydope247 1 year ago
This was definitely one of your better videos. You explained LangChain well and I’m glad you used the Colab notebook instead of Jupyter or repl.
@ryanjames3907 1 year ago
Thank you for your time, effort and generosity. I wish very good things for you.
@chandrachoodR 1 year ago
That's a fantastic video and to the point, and thanks for the code as well
@user-tm1jp7fk7n 1 year ago
You're awesome, Liam !!
@stefano94103 1 year ago
Excellent! Thank you for your hard work to put these together.
@LiamOttley
1 year ago
My pleasure! Thanks for watching
@AlbyTheMovieCreator
1 year ago
This video was copied from the beginning to the end from the channel Prompt Engineering
@stefano94103
1 year ago
@@AlbyTheMovieCreator Oh wow I totally didn't know that. Thanks for the heads up! SMH😒
@CK-ho7gj 1 year ago
Awesome tutorial. Cheers Liam
@gabijazza1220 1 year ago
Cheers, this is a brilliant video. Looking forward to making a bespoke AI.
@konstantinrebrov675 1 year ago
Wonderful tutorial. Thank you!
@LiamOttley
1 year ago
No worries 🤙🏼
@bendaniels8677 1 year ago
Appreciate your hustle bro
@sganesh07 1 year ago
Thanks Liam ... neat and fast as always; could you post another similar video doing the same thing with LlamaIndex pls? I thought that was easier.
@justingu9541 1 year ago
Thank you for your excellent sharing. This is great guidance, and I hope you can continue to share more! If there's anything I can do, please let me know~
@AndrewSheves 1 year ago
Liam, this is a great tutorial, thank you. What I really liked was the explanation of what is happening behind the scenes. Anyone (even a non-developer like me) can cut and paste the code, but knowing what the commands are doing is super helpful. The explanations in the Colab are great and I took your advice and stole your code. The chatbot was up and running in a few hours (remember: non-developer) but that included building a separate UI. Great work, thank you
@aradinac
1 year ago
Can I ask whether you paid for the OpenAI key or did it with the free trial? Because I'm encountering this error: RateLimitError: You exceeded your current quota, please check your plan and billing details.
@AndrewSheves
1 year ago
@@aradinac I used the paid-for OpenAI key
@csss142
7 months ago
@@AndrewSheves which one did you buy?
@miguelmunoz4135
6 months ago
@@aradinac I have the same error because my account is not a paid one. If you found another solution, pls let us know
@antonpictures 1 year ago
Hi there! As a fellow filmmaker, I find the concept of regenerative agents fascinating. I'm curious, what specific types of agents are you interested in exploring in your video? Additionally, have you thought about incorporating some real-world examples of sim city-like models, such as the ones developed by Stanford, to help illustrate the concept to your audience? Looking forward to hearing more about your project! George Anton
@omountassir 1 year ago
Freaking Great Content! Keep Rocking 💯
@coinhawk 1 year ago
Great job... will run this on my writings/book collection and my code snippets, and build an awesome MeKnowledgeBase 😎
@luigiseven 1 year ago
Awesome work
@vicentesoto1628 11 months ago
Liam, your content is unreal. Some of the best I've seen so far. This is hard knowledge. You are brilliant. What do you mean by 512 tokens on every chunk? Characters? I'll be waiting for a detailed masterclass. Vicente
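On the "512 tokens" question above: tokens are the sub-word units a model's tokenizer produces, not characters; OpenAI's rule of thumb is roughly four characters of English text per token. The sketch below shows the general idea of fixed-size chunking with overlap. It is not the notebook's code: `chunk_tokens` is a hypothetical helper, whitespace-separated words stand in for real tokens, and the overlap of 24 is an assumed value.

```python
def chunk_tokens(text, chunk_size=512, overlap=24):
    """Split text into chunks of at most chunk_size tokens.

    Assumption: whitespace-separated words stand in for real model tokens,
    so the counts only approximate what a GPT tokenizer would report.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # each chunk starts this many tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.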
@tspang1977 1 year ago
Hi Liam, great video. I do have a question: in the following code, I notice that we don't have to explicitly turn the "query" into embeddings before it performs a search against the vector db. Is it because the function "similarity_search" internally calls the OpenAI embedding to embed the query?
query = "Who created transformers?"
docs = db.similarity_search(query)
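As far as I know, that is how it works: a LangChain vector store is built with an embedding function and reuses it on the query inside `similarity_search`. The sketch below imitates that behaviour without any API calls. `ToyVectorStore`, `toy_embed` and `cosine` are hypothetical names, and the bag-of-words "embedding" is only a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self, texts, embed_fn):
        self.embed_fn = embed_fn  # kept so queries can be embedded later
        self.texts = list(texts)
        self.vectors = [embed_fn(t) for t in self.texts]

    def similarity_search(self, query, k=4):
        query_vec = self.embed_fn(query)  # the query is embedded here, not by the caller
        ranked = sorted(
            zip(self.vectors, self.texts),
            key=lambda pair: cosine(query_vec, pair[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]
```

The caller passes a plain string; the store turns it into a vector with the same function it used for the documents, which is why the two sets of vectors are comparable.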
@zoumanakeita8016 11 months ago
Straightforward and concise! Great explanation. How do you extract the exact page number where the answer was found?
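One common way to recover the page number is to store it as metadata next to each chunk when the PDF is split, then read it back from whichever chunks are retrieved. The sketch below is an illustration under assumptions, not the notebook's method: `chunk_pages` and `find_pages` are hypothetical helpers, the input is a list of already-extracted page texts, and a keyword match stands in for the similarity search.

```python
def chunk_pages(pages, max_words=50):
    """Split each page into chunks and remember which page each chunk came from.

    pages: list of page texts, where index 0 is page 1.
    """
    records = []
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), max_words):
            records.append({
                "text": " ".join(words[start:start + max_words]),
                "page": page_number,
            })
    return records


def find_pages(records, keyword):
    # Keyword match is a stand-in for vector similarity; returns sorted page numbers.
    needle = keyword.lower()
    return sorted({r["page"] for r in records if needle in r["text"].lower()})
```

With a real vector store the same idea applies: attach the page number to each stored chunk and print it alongside the answer.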
@SimonStJohn 1 year ago
Hey Liam! Awesome...could you do one that scrapes data from a blog/website for an embedded chatbot for a blog?
@user-rc6ik9gz6g 11 months ago
Thank you, keep going.
@MichielVermandel 1 year ago
Thanks for the great video! One question: which OpenAI model is used to retrieve the answer? Is it gpt-35-turbo or ada or...? Where is it defined?
@andym9565 1 year ago
Great video! Would be cool to create a similar video with Apify and LangChain.
@user-by3xv9kv4s 1 year ago
This is great Liam, thank you for sharing. What's the simple automated way to deploy this code to a basic online application/chat page?
@chatbotsvideochatbotsforwe1207 1 year ago
Very Good👍
@tuwayne3624 1 year ago
Thank you, I've learned a lot from your channel. I'm curious about the differences between LlamaIndex and LangChain. Maybe I'm still a beginner in AI and don't quite understand.
@chrispac6264
1 year ago
Ask ChatGPT4
@joepbaks 1 year ago
Thanks a lot man, been trying to get this to work via other ways for days. This was so easy, great tutorial. How would you transfer something like this to a user-friendly UX/UI?
@chrispac6264
1 year ago
Ask ChatGPT4
@noteniceu 1 year ago
Can you feed it multiple PDFs at the same time, like a group of 300, or would you have to run each line individually?
@1Esteband 1 year ago
Thank you it worked perfectly despite generating an error on the pip install. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.
@mic9657 2 months ago
Those biceps too! 💪
@ganashayoutube 1 year ago
awesome bro
@vverboX 1 year ago
I would love to see a video which helps me to deploy such a chatbot (created on colab) on a webpage.
@SedhuujGorem 5 months ago
The Best tool for this is kzread.info/dash/bejne/lJd_ma6llKWZlM4.html I like some of the transitions, but sometimes they're a bit too much and are seemingly random. Since we use these persistent elements that transition across pages to indicate some kind of relationship between the previous and the next states, some of your transitions confuse me because I can't immediately see what the relationship is. For example 1:23 of the selectable tiles (which weren't selected) transition into being two switches... does that mean anything? are they related in some way? I see this as random and a bad use of the design language. However, at 3:14 I like the transition from switches to the ticks on a paper, that makes sense to me. Epic presentation tho
@JerryTrade28 10 months ago
Will you be sharing the Marcus Aurelius database you created previously? I was really looking forward to that
@johnjoesafatso 11 months ago
Amazing content. Thank you! Is there a way to do this with PDFs that have graphics and images?
@minhe9008 11 months ago
Great tutorial! I have hundreds of research papers in PDF format. Can I use this approach to build a vector db and then chat with ChatGPT? Is there a limit to the size of the db? Any pitfalls to avoid? Thanks!
@quangdinhdota2388 1 year ago
Amazing video. I have a question: can your notebook (code) run with multiple PDF files?
@Ramp_cat_7 1 year ago
This is amazing! Can you teach us mindai?
@qwerto-ye5pe 1 year ago
Hi! I just wanted to ask what are the licenses used in this project? Are they commercial-friendly?
@user-we3qo9kj4q 1 year ago
Is there a way to also store the questions from the user and the answers to them for monitoring, data analysis and other ideas?
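One simple option for keeping a record of questions and answers is to append each exchange to a JSON Lines file, which is easy to load later for analysis. This is a minimal sketch under assumptions: `log_exchange` and `load_exchanges` are hypothetical helpers, and the file path and field names are made up for illustration.

```python
import json
import time


def log_exchange(path, question, answer):
    # Append one question/answer pair as a single JSON line.
    record = {"timestamp": time.time(), "question": question, "answer": answer}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_exchanges(path):
    # Read every logged exchange back as a list of dicts.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In a chatbot loop, `log_exchange` would be called once per turn, right after the model's answer comes back.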
@JohnAlexanderEcheverryOcampo 1 year ago
Thanks
@bene88597 1 year ago
You got my mail buddy GJ
@Pppljssbs 1 year ago
As a beginner coding their first ever plug-in, how long would it take to develop a high quality plug-in?
@Iatalksbrasil 1 year ago
Great video! It helped me complete my knowledge of best practices in prompting!
@juliamarsh2077 10 months ago
Can you also use it to write content, e.g. web articles, based on the PDF or PDFs you have uploaded?
@armandocapogrossi6689 1 year ago
Thanks, very good content. Just a question to understand the market better: did I misinterpret your hourly rate at $997/45 mins?
@vukradovic172 2 months ago
Excellent
@InnocenceVVX 9 months ago
So essentially you calculate semantic similarity of the stored vectors and the asked question, then provide the 4 most similar vectors as context in the prompt?
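That summary is close, with one refinement: what goes into the prompt is the text of the best-matching chunks, not the vectors themselves. The sketch below shows the "stuff the top chunks into the prompt" step. `build_prompt` is a hypothetical helper, the prompt wording is made up, and `k=4` mirrors what I believe is LangChain's default for `similarity_search`.

```python
def build_prompt(question, ranked_chunks, k=4):
    """Assemble a prompt from the k best-matching text chunks.

    ranked_chunks: chunk texts already sorted from most to least similar.
    """
    context = "\n\n".join(ranked_chunks[:k])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The resulting string is what gets sent to the language model, so the model only ever sees a handful of chunks rather than the whole PDF.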
@yiyuanzhang6335 1 year ago
Will need a video on how to do this for multiple PDFs
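For multiple PDFs, the usual pattern is to chunk every file in a loop and put all the chunks into one index, tagging each chunk with the file it came from. The sketch below shows only that loop and is not the notebook's code: `build_corpus` is a hypothetical helper, and it assumes the text has already been extracted from each PDF into a plain string.

```python
def build_corpus(documents, chunk_words=100):
    """Chunk several documents into one list, keeping the source of each chunk.

    documents: dict mapping a file name to its already-extracted text.
    """
    corpus = []
    for source, text in documents.items():
        words = text.split()
        for start in range(0, len(words), chunk_words):
            corpus.append({
                "source": source,
                "text": " ".join(words[start:start + chunk_words]),
            })
    return corpus
```

The combined list can then be embedded and stored in a single vector database, and the `source` field tells you which PDF an answer came from.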
@user-rf9dl1bl6s 1 year ago
Hi Liam! Which version of GPT does the chatbot use? Can I use it with GPT-4?
@marcosemeria97 1 year ago
Can you suggest alternatives to OpenAI in terms of embeddings and LLM? Their APIs are too expensive
@JJBoi8708 1 year ago
What is a good way to split text in a textbook PDF? On one page it has 2 columns, text on the left and right side.
@georgekokkinakis7288 1 year ago
Can you explain how we could use LLMs other than OpenAI's? For example, can we use Mosaic MPT-7B?
@navigatingsideways 8 months ago
It’s convenient because I just completed a Data Analysis course via IBM, and a Vanderbilt Prompt Engineering course. I created my first Smart Bot for my Dad’s website on Sunday. I’d like to dump RFP contractor documents to easily take the 88 pages to question parts of a bid
@Finalform77 1 year ago
Hi Liam, I am getting an 'authentication Error' when running section 2 of the code, "Embed text and store embeddings". I have not changed anything yet, just running it as is. Any suggestions?
@frosti7 1 year ago
What solution can dynamically add or extract database for an LLM? Like your company information that can be accessible by employees
@featherly4267 1 year ago
Brother, can you make a video on how to use AutoGPT for beginners 😊
@paulp6752 9 months ago
Great stuff. Is there any good model to perform the embeddings calculation (and then semantic search) on my server as opposed to using the OpenAI API?
@flyinonminds6415 1 year ago
Great video! Is it possible to add more than 1 PDF with that code? Would it be possible to provide code for multiple PDFs? Thank you
@kiranhipparagi543 10 months ago
Hello. Thanks for a great video. I have a financial statement PDF file and it contains tables. How can I achieve the best results out of it? Any suggestions or help would be appreciated. Thanks😊
@timtensor6994 8 months ago
Great tutorial, can it be modified to support multiple PDFs?
@tibz11c 2 months ago
Great work! Can we do this with a local or a smaller language model?
@TheSacredGrove 1 year ago
Cool AF!
@LiamOttley
1 year ago
💪🏼
@tommycondon1918 1 year ago
Could you do it using the Gradio interface and importing the openai module?
@TheSimoncio 10 months ago
What about using any other open source LLM instead of GPT? Thank you!
@siddhantmohanty1578 4 months ago
Hi, THANK YOU for sharing your knowledge. Could you please let me know how many PDFs we can train on using this technique? And does this LLM remember which PDFs it has been trained on, or do we have to train the LLM before running each query?
@derrickwong3114 1 year ago
Can the chatbot incorporate website links or app deeplinks in the chat results?
@maxdranitsa 3 months ago
Liam, is there an option to make the assistant always use the data that has been uploaded to the knowledge base? It doesn't read the KB files every time and uses links that don't even exist
@angel1st007 1 year ago
Hey @LiamOttley - I copied your code lab project, however, on the very 1st, I bumped into "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. yfinance 0.2.21 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.". Any thoughts?
@denizkapteina2151 1 year ago
Thanks for the super video. I have a question: in the overview you show that ChatGPT3.5 is used, or that the query is last processed by 3.5. But in the code I can't find any reference to it. Where is my mistake?
@LiamOttley
1 year ago
The default LLM for LangChain's "OpenAI()" is text-davinci-003 and for "ChatOpenAI()" it is gpt-3.5-turbo, I believe
@willyjauregui6541 14 days ago
Out of complete ignorance, is using LangChain the best method currently available to increase the performance of our LLM chatbots? If not, what is, or what other methods are out there that I may be missing? Thanks for answering.
@mr.pantherpanther1013 6 months ago
Hey Liam, at 03:22 you said we can upload PDF data by entering the PDF name. But what if we have more PDFs? For example, I have 5 PDFs.
@sayamkhan4209 1 year ago
Where does he describe the model to use for output? Is he using Da Vinci 003?
@TheUselessgeneration 1 year ago
I can't wait until we can expand this to all documents. I assume that is what Microsoft 365 Copilot will do.
@rishabpoddar3866 1 year ago
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.
@suriyakrishnan5177 1 year ago
Can you explain this same example using expressJS? Because no other tutorial has used expressJS to illustrate this example
@markbrown1609 11 months ago
Good show chap, can I use ChatGPT 3.5?
@Miya-ub5qn 1 year ago
Thank you very much for this great video!!! One question. On the part of Create chat bot with chat memory (OPTIONAL), I received the following message "DeprecationWarning: on_submit is deprecated. Instead, set the .continuous_update attribute to False and observe the value changing with: mywidget.observe(callback, 'value'). input_box.on_submit(on_submit)" Why? Would you be able to fix it?
@ranjitherusa7139
1 year ago
I am having the same issue. Should the optional segment be in the same .py program?
@michaeldblake 9 months ago
I'd love to figure out how to do this.
@stefan-ls7yd 11 months ago
So when I store text in a Vector DB, this method retrieves the raw text to input to the LLM again? Is this the Ada encoder?
@vrynstudios 9 months ago
I am a noob here. Is it possible to embed it on a site? If I embed it, is it standalone, or does it still use GPT API calls and incur costs?
@themotivationhub1355 1 year ago
Yo I’ve made plugins but don't know how to test them, so can you give some ideas? (I don't have access to the plugins yet. I’m on the waitlist.)
@ameynaik2743 1 year ago
Is there an alternative to the OpenAI embedding engine which is competitive and free?
@Essential-Self 9 months ago
Hi and thanks for your work. I am totally new at this but I would like to be able to chat with my whole archive, like a second brain. Is this possible with this method?
@yosta3826 1 year ago
Do I pay for OpenAI API tokens when using the code, or do I use a local GPT-2 model?
@moses5407 1 year ago
Can this be expanded to read from multiple PDFs ... or can this be done by combining PDFs into a single file?
@sahansathsara7106 1 year ago
This is great! But how much does it cost?
@aipy5147 9 months ago
Great video! I was wondering why it is a private chatbot when you're using an OpenAI key and sending the information to the GPT-3.5 LLM? How can you secure sensitive data with your method? Thank you for sharing your knowledge.
@lubeckable
6 months ago
By self-hosting a custom open source LLM like Llama or Mistral
@ian5629 1 year ago
How to use this in my business or website? How to embed it, for example, in a better UI?
@ronan815 4 months ago
Hi, sorry, there is an issue in colab, first script: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. pydrive2 1.6.3 requires six>=1.13.0, but you have six 1.12.0 which is incompatible. yfinance 0.2.36 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.. By the way, do you plan to make an adaptation for Mistral AI?
@GiovaDuarte 1 year ago
This is great. Is it possible to retrieve images from the PDF? I have a PDF with many graphics that help understand the content. Do you have any ideas as to how I can provide images as part of the conversation?
@MCroppered
1 year ago
What type of graphics are you talking about?
@quinnherden
1 year ago
You could leverage LangChain's agent feature set to use computer vision to analyze your images.
@GiovaDuarte
1 year ago
@@MCroppered the PDF I have has images embedded and I was wondering how I could recall these during a conversation
@GiovaDuarte
1 year ago
@@quinnherden I will research this. Thanks!
@gaben7
1 year ago
@@GiovaDuarte if you figure out how to bring images along with the conversation, let us know how please
@kingarthur0407 1 year ago
I've written a prompt for GPT-4 that I use with chatGPT in Macromancy formatting to transform it into a legal assistant, and the results have been stellar. Is it possible to encode this prompt into the system you describe so that the bot operates with it in mind?
@kingarthur0407
1 year ago
bump
@igortrindade-dev 5 months ago
I'm sorry about the silly question: if I use this script in a separate Python module and call it with other documents, will it mix the sources of documents, or will this instance of the vector db live only at runtime?
@harshavardhan7097 9 months ago
Can I do it in a Jupyter notebook rather than using Colab?
@saurabhagarwal9253 10 months ago
That was very helpful. How can I add more PDFs to the knowledge base?
@katemariageorge7396
3 months ago
were you able to do it?
@we-hb4ni 1 year ago
Is there a limit to the number of PDF chunks you can add to the vector DB?
@LiamOttley
1 year ago
Not necessarily. If you cram it full of thousands of chunks, I'd assume the recall just gets slower and slower and uses more resources on your system. Best to set up different indexes for different information or use namespaces (Pinecone feature)
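The "different indexes or namespaces" idea can be pictured as keeping a separate bucket of chunks per topic and searching only the relevant bucket. The sketch below is a toy illustration and not Pinecone's API: `NamespacedIndex` is a hypothetical class, and a keyword match stands in for vector similarity.

```python
from collections import defaultdict


class NamespacedIndex:
    """Toy index that keeps the chunks of each namespace separate."""

    def __init__(self):
        self._chunks = defaultdict(list)

    def add(self, namespace, chunk):
        self._chunks[namespace].append(chunk)

    def search(self, namespace, keyword, k=4):
        # Only the requested namespace is scanned, so unrelated chunks are never compared.
        needle = keyword.lower()
        hits = [c for c in self._chunks.get(namespace, []) if needle in c.lower()]
        return hits[:k]
```

Because each search touches one namespace only, the amount of work per query depends on the size of that namespace rather than on everything stored.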