Create Your Own ChatGPT with PDF Data in 5 Minutes (LangChain Tutorial)
Howto & Style
📚 My Free Resource Hub & Skool Community: bit.ly/3uRIRB3 (Check “KZread Resources” tab for any mentioned resources!)
🤝 Need AI Solutions Built? Work with me: bit.ly/3K3L4gN
📈 Find out how we help industry experts sign their first 5 AI Agency clients, guaranteed: bit.ly/skoolmain
In this video I show you how to train ChatGPT on your own data in 5 minutes using LangChain so you can chat with your PDFs! This is a super beginner-friendly guide that explains how these custom knowledge chatbots can be created in a few minutes using LangChain. This is similar to tools like ChatPDF, which allow you to chat with your docs (chatpdf.com/).
If you've ever wanted to know how to chat with your PDFs or train ChatGPT on your own data, this is the video for you! Code available below.
Create a copy of my notebook (code):
colab.research.google.com/dri...
Timestamps:
0:00 - What we're building
1:10 - System Explained
2:48 - Creating the chatbot
8:18 - Steal my code!
Comments: 309
Leave your questions below! 😎 📚 My Free Skool Community: bit.ly/3uRIRB3 🤝 Work With Me: www.morningside.ai/ 📈 My AI Agency Accelerator: bit.ly/3wxLubP
Golden! Clear, concise info and a notebook! If it's too fast for some viewers, I'll remind them that they can always slow down the playback speed.
👏👏 Hey Liam, your five-minute tutorial is fantastic! Kudos and thanks for putting the effort to produce it. Your app is exactly what any knowledge worker is craving for: We all have gigabytes of pdf files in some folder named "READ", "TO READ" or "__TO READ" (so it stays on top of the root :), but never get to it (probably distracted by all these tutorials to become more productive we love to watch). A bot that can read that stuff for us, so we can continue to wing it is a true godsend. :D
Thought it would be just another video on the subject, but you summarize in an awesome way! Great vid! Congrats
This was definitely one of your better videos. You explained Langchain well and I’m glad you used the colab notebook instead of Jupyter or repl.
Thank you for your time, effort and generosity. I wish very good things for you.
That's a fantastic video, straight to the point, and thanks for the code as well.
You're awesome, Liam !!
Excellent! Thank you for your hard work to put these together.
@LiamOttley
1 year ago
My pleasure! Thanks for watching
@AlbyTheMovieCreator
1 year ago
This video was copied from the beginning to the end from the channel Prompt Engineering
@stefano94103
1 year ago
@@AlbyTheMovieCreator Oh wow I totally didn't know that. Thanks for the heads up! SMH😒
Awesome tutorial. Cheers Liam
Cheers, this is a brilliant video. Looking forward to making a bespoke AI.
Wonderful tutorial. Thank you!
@LiamOttley
1 year ago
No worries 🤙🏼
Appreciate your hustle bro
Thanks Liam... neat and fast as always. Could you post another similar video doing the same thing with LlamaIndex, please? I thought that was easier.
Thank you for your excellent sharing. This is great guidance, and I hope you can continue to share more! If there's anything I can do, please let me know~
Liam, this is a great tutorial, thank you. What I really liked was the explanation of what is happening behind the scenes. Anyone, even a non-developer like me, can cut and paste the code, but knowing what the commands are doing is super helpful. The explanations in the Colab are great, and I took your advice and stole your code. The chatbot was up and running in a few hours (remember: non-developer), but that included building a separate UI. Great work, thank you
@aradinac
1 year ago
Can I ask whether you paid for the OpenAI key or used the free trial? I'm encountering this error: RateLimitError: You exceeded your current quota, please check your plan and billing details.
@AndrewSheves
1 year ago
@@aradinac I used a paid OpenAI key.
@csss142
7 months ago
@@AndrewSheves which one did you buy?
@miguelmunoz4135
6 months ago
@@aradinac I have the same error because my account isn't paid. If you found another solution, please let us know.
Hi there! As a fellow filmmaker, I find the concept of regenerative agents fascinating. I'm curious, what specific types of agents are you interested in exploring in your video? Additionally, have you thought about incorporating some real-world examples of SimCity-like models, such as the ones developed by Stanford, to help illustrate the concept to your audience? Looking forward to hearing more about your project! George Anton
Freaking Great Content! Keep Rocking 💯
Great job... will run this on my writings/ book collection and my code snippets, and build an awesome, MeKnowledgeBase 😎
Awesome work
Liam, your content is unreal. Some of the best I've seen so far. This is hard knowledge. You are brilliant. What do you mean by 512 tokens on every chunk? Characters? I'll be waiting for a detailed masterclass. Vicente
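On the tokens-vs-characters question: LangChain's text splitters measure `chunk_size` with whatever `length_function` they are given, and that defaults to `len`, i.e. characters. A chunk size of 512 only means 512 tokens if the splitter is handed a token-counting function; the comments don't confirm which one the notebook uses. The stdlib sketch below shows the difference. `whitespace_token_count` is a hypothetical stand-in for a real tokenizer (a GPT-2 or tiktoken tokenizer would count differently), and `split_into_chunks` is an illustrative greedy splitter, not LangChain's algorithm.

```python
from typing import Callable, List


def whitespace_token_count(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def split_into_chunks(text: str, chunk_size: int,
                      length_function: Callable[[str], int] = len) -> List[str]:
    """Greedily pack words into chunks whose measured length stays <= chunk_size
    (unless a single word is already longer than chunk_size).

    The unit of chunk_size is whatever length_function measures:
    characters with len, tokens with a token counter.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and length_function(candidate) > chunk_size:
            # Adding this word would overflow the chunk, so close it and start a new one.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "attention is all you need " * 4  # 20 words, 25 characters per sentence

# Same text, two different units for chunk_size.
by_chars = split_into_chunks(text, chunk_size=30)
by_tokens = split_into_chunks(text, chunk_size=10, length_function=whitespace_token_count)
print(len(by_chars), len(by_tokens))
```

The same number produces very different chunk counts depending on the unit, which is why it matters whether the splitter counts characters or tokens.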
Hi Liam, great video. I do have a question: from the following code, I notice that we don't have to explicitly turn the "query" into an embedding before it performs a search against the vector DB. Is it because the function "similarity_search" internally calls the OpenAI embedding model to embed the query?
query = "Who created transformers?"
docs = db.similarity_search(query)
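As far as I understand LangChain's vector stores (the FAISS wrapper, for example), the answer is yes: the store keeps a reference to the embedding object it was built with and embeds the raw query string inside `similarity_search`, so you pass plain text. The toy sketch below shows that pattern. `embed` is a hypothetical bag-of-words stand-in for a real embedding model such as OpenAI's, and `ToyVectorStore` is illustrative, not LangChain's class. The default `k=4` mirrors the default used by LangChain's `similarity_search`.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return dict(Counter(text.lower().replace("?", "").split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self, texts: List[str]):
        # Documents are embedded once, when the index is built.
        self.texts = texts
        self.vectors = [embed(t) for t in texts]

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # The raw query string is embedded here, with the same function used for the documents.
        q = embed(query)
        ranked = sorted(range(len(self.texts)),
                        key=lambda i: cosine(q, self.vectors[i]),
                        reverse=True)
        return [self.texts[i] for i in ranked[:k]]


db = ToyVectorStore([
    "Transformers were created by researchers at Google",
    "Bananas are rich in potassium",
    "The Eiffel Tower is in Paris",
])
docs = db.similarity_search("Who created transformers?", k=1)
print(docs[0])
```

The key design point is that documents and queries must go through the same embedding function, otherwise their vectors are not comparable.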
Straightforward and concise! Great explanation. How do you extract the exact page number where the answer was found?
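One common way to get page numbers is to split each page separately and attach the page number as metadata to every chunk, so whatever chunk the search returns already carries its page. LangChain's `PyPDFLoader`, for instance, records a page number in each Document's metadata. The stdlib sketch below assumes the text has already been extracted page by page. `Chunk` and `chunk_pages` are hypothetical names, and the page numbers here are 1-based.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Chunk:
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


def chunk_pages(pages: List[str], source: str, words_per_chunk: int = 50) -> List[Chunk]:
    """Split each page separately so every chunk knows which page it came from."""
    chunks: List[Chunk] = []
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), words_per_chunk):
            chunks.append(Chunk(
                text=" ".join(words[start:start + words_per_chunk]),
                metadata={"source": source, "page": page_number},
            ))
    return chunks


pages = ["Intro text about the paper.", "Transformers were introduced in 2017."]
chunks = chunk_pages(pages, source="attention.pdf")

# Stand-in for retrieval: pick the chunk that mentions the answer, then cite its stored page.
hit = next(c for c in chunks if "Transformers" in c.text)
print(f"Found on page {hit.metadata['page']} of {hit.metadata['source']}")
```

Because chunks never cross a page boundary in this sketch, each one maps to exactly one page; chunks that span pages would need a list of pages instead.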
Hey Liam! Awesome... could you do one that scrapes data from a blog/website for a chatbot embedded in the blog?
Thank you, keep going.
Thanks for the great video! One question: which OpenAI model is used to retrieve the answer? Is it gpt-35-turbo or ada or...? Where is it defined?
Great video! It would be cool to make a similar video with Apify and LangChain.
This is great Liam, thank you for sharing. What's the simplest automated way to deploy this code to a basic online application/chat page?
Very Good👍
Thank you, I've learned a lot from your channel. I'm curious about the differences between LlamaIndex and LangChain. Maybe I'm still a beginner in AI and don't quite understand.
@chrispac6264
1 year ago
Ask ChatGPT4
Thanks a lot man, I've been trying to get this to work other ways for days. This was so easy, great tutorial. How would you transfer something like this to a user-friendly UX/UI?
@chrispac6264
1 year ago
Ask ChatGPT4
Can you feed it multiple PDFs at the same time, like a batch of 300, or would you have to run each one individually?
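Many of the questions here ask about multiple PDFs. A minimal sketch, assuming you only need one combined index: loop over the folder, chunk every file, and tag each chunk with its source filename, then embed the whole list in one vector-store call (in LangChain that could be something like `FAISS.from_texts` with a `metadatas` list, though check the signature for your version). `load_folder`, `build_corpus` and the `extract_text` callable are hypothetical; the usage below reads `.txt` files with `Path.read_text` purely as a stand-in for a real PDF text extractor.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def load_folder(folder: Path, extract_text: Callable[[Path], str],
                pattern: str = "*.pdf") -> List[Tuple[str, str]]:
    """Return (source, text) pairs for every matching file, in a stable order."""
    return [(path.name, extract_text(path)) for path in sorted(folder.glob(pattern))]


def build_corpus(docs: List[Tuple[str, str]], words_per_chunk: int = 200) -> List[Dict[str, str]]:
    """Chunk every document into one shared list, tagging each chunk with its source file."""
    corpus: List[Dict[str, str]] = []
    for source, text in docs:
        words = text.split()
        for start in range(0, len(words), words_per_chunk):
            corpus.append({"source": source,
                           "text": " ".join(words[start:start + words_per_chunk])})
    return corpus


# Usage: two small text files stand in for extracted PDFs.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.txt").write_text("first document text")
    (folder / "b.txt").write_text("second document text")
    corpus = build_corpus(load_folder(folder, Path.read_text, pattern="*.txt"))

print(len(corpus), sorted({c["source"] for c in corpus}))
```

Keeping the source name on every chunk means answers can still be traced back to the right file after 300 PDFs share one index.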
Thank you it worked perfectly despite generating an error on the pip install. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.
Those biceps too! 💪
awesome bro
I would love to see a video which helps me to deploy such a chatbot (created on colab) on a webpage.
The Best tool for this is kzread.info/dash/bejne/lJd_ma6llKWZlM4.html I like some of the transitions, but sometimes they're a bit too much and are seemingly random. Since we use these persistent elements that transition across pages to indicate some kind of relationship between the previous and the next states, some of your transitions confuse me because I can't immediately see what the relationship is. For example 1:23 of the selectable tiles (which weren't selected) transition into being two switches... does that mean anything? are they related in some way? I see this as random and a bad use of the design language. However, at 3:14 I like the transition from switches to the ticks on a paper, that makes sense to me. Epic presentation tho
Will you be sharing the Marcus Aurelius database you created previously? I was really looking forward to that.
Amazing content. Thank you! Is there a way to do this with PDFs that have graphics and images?
Great tutorial! I have hundreds of research papers in PDF format. Can I use this approach to build a vector DB and then chat with ChatGPT? Is there a limit to the size of the DB? Any pitfalls to avoid? Thanks!
Amazing video. I have a question: can your notebook (code) run with multiple PDF files?
This is amazing! Can you teach us mindai?
Hi! I just wanted to ask what are the licenses used in this project? Are they commercial-friendly?
Is there a way to also store the users' questions and the answers to them for monitoring, data analysis and other ideas?
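Yes, one simple option is to log every exchange to a small database wherever the notebook gets the chain's answer back. The sketch below uses Python's built-in `sqlite3`; the table layout and the `open_log`/`log_exchange` helpers are hypothetical, and the question/answer pair is a placeholder rather than real chatbot output.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite log; use a file path instead of :memory: to keep it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " asked_at TEXT NOT NULL,"
        " question TEXT NOT NULL,"
        " answer TEXT NOT NULL)"
    )
    return conn


def log_exchange(conn: sqlite3.Connection, question: str, answer: str) -> None:
    """Record one question/answer pair with a UTC timestamp."""
    with conn:  # commits the insert, or rolls back on error
        conn.execute(
            "INSERT INTO chat_log (asked_at, question, answer) VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), question, answer),
        )


conn = open_log()
# Placeholder exchange; in the chatbot you would pass the user's query and the chain's answer.
log_exchange(conn, "Who created transformers?", "Researchers at Google.")
rows = conn.execute("SELECT question, answer FROM chat_log").fetchall()
print(rows)
```

Parameterized `?` placeholders keep user text from being interpreted as SQL, and the timestamp column makes later analysis (questions per day, common topics) straightforward.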
Thanks
You got my mail buddy GJ
As a beginner coding their first ever plug-in, how long would it take to develop a high quality plug-in?
Great video! It helped me round out my knowledge of prompting best practices!
Can you also use it to write content, e.g. web articles, based on the PDF or PDFs you have uploaded?
Thanks, very good content. Just a question to understand the market better: did I misinterpret your hourly rate at $997/45 mins?
Excellent
So essentially you calculate semantic similarity of the stored vectors and the asked question, then provide the 4 most similar vectors as context in the prompt?
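That matches how a "stuff"-style question-answering chain works, as far as I understand it: the vector store returns the top-k most similar chunks (LangChain's `similarity_search` defaults to `k=4`), and those chunks are pasted into the prompt as context for the LLM. The sketch below shows only the prompt-assembly step. The template wording and `build_stuff_prompt` are hypothetical, not LangChain's actual prompt, and the chunks are assumed to be already sorted by similarity.

```python
from typing import List


def build_stuff_prompt(question: str, retrieved_chunks: List[str], k: int = 4) -> str:
    """Concatenate the top-k retrieved chunks into a single prompt (hypothetical template)."""
    context = "\n\n".join(retrieved_chunks[:k])
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [f"chunk {i}" for i in range(1, 7)]  # assume already sorted by similarity
prompt = build_stuff_prompt("What is attention?", chunks)
print(prompt)
```

Only the first k chunks make it into the prompt, which keeps it within the model's context window; the trade-off is that relevant text ranked below k is never seen by the LLM.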
We'll need a video on how to do this for multiple PDFs.
Hi Liam! Which version of GPT does the chatbot use? Can I use it with GPT-4?
Can you suggest alternatives to OpenAI for embeddings and LLMs? Their APIs are too expensive.
What is a good way to split the text of a textbook PDF when each page has two columns, with text on both the left and right sides?
Can you explain how we could use LLMs other than OpenAI's? For example, can we use MosaicML's MPT-7B?
It's convenient timing because I just completed a Data Analysis course via IBM and the Vanderbilt Prompt Engineering course. I created my first smart bot for my dad's website on Sunday. I'd like to load RFP contractor documents so I can easily query the 88 pages about parts of a bid.
Hi Liam, I am getting an 'Authentication Error' when running section 2 of the code, "Embed text and store embeddings". I haven't changed anything yet; I'm just running it as is. Any suggestions?
What solution can dynamically add or extract database for an LLM? Like your company information that can be accessible by employees
Brother, can you make a video on how to use AutoGPT for beginners? 😊
Great stuff. Is there any good model to perform the embedding calculation (and then semantic search) on my own server, as opposed to using the OpenAI API?
Great video! Is it possible to add more than one PDF with that code? Would it be possible to provide code for multiple PDFs? Thank you
Hello. Thanks for a great video. But I have a financial statement PDF file that contains tables. How can I get the best results out of it? Any suggestions or help would be appreciated. Thanks😊
Great tutorial! Can it be modified to support multiple PDFs?
Great Work! Can we do this with a local or a smaller language model ?
Cool AF!
@LiamOttley
1 year ago
💪🏼
Could you do it using Gradio interface and importing openai module?
What about using any other open source LLM instead of GPT? thank you!
Hi, THANK YOU for sharing your knowledge. Could you please let me know how many PDFs we can train using this technique? Does the LLM remember which PDFs it has been trained on, or do we have to train it again before running each query?
Can the chatbot incorporate website links or app deeplink as the chat results?
Liam, is there an option to make the assistant always use the data that has been uploaded to the knowledge base? It doesn't read the KB files every time and cites links that don't even exist.
Hey @LiamOttley - I copied your Colab project; however, on the very first cell I bumped into "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. yfinance 0.2.21 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.". Any thoughts?
Thanks for the super video. I have a question: in the overview you show that ChatGPT3.5 is used, or that the query is last processed by 3.5. But in the code I can't find any reference to it. Where is my mistake?
@LiamOttley
1 year ago
The default LLM for LangChain's "OpenAI()" is text-davinci-003, and for "ChatOpenAI()" it is gpt-3.5-turbo, I believe.
Out of complete ignorance: is LangChain currently the best method available to improve the performance of our LLM chatbots? If not, what is, or what other methods might I be missing? Thanks for answering.
Hey Liam, at 3:22 you said we can upload PDF data by entering the PDF name. But what if we have more PDFs, for example 5?
Where does he describe the model to use for output? Is he using Da Vinci 003?
I cant wait until we can expand this to all documents. I assume that is what Microsoft 365 Copilot will do.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. yfinance 0.2.18 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.
Can you explain this same example using ExpressJS? No other tutorial has used ExpressJS to illustrate this example.
Good show, chap! Can I use ChatGPT 3.5?
Thank you very much for this great video!!! One question. On the part of Create chat bot with chat memory (OPTIONAL), I received the following message "DeprecationWarning: on_submit is deprecated. Instead, set the .continuous_update attribute to False and observe the value changing with: mywidget.observe(callback, 'value'). input_box.on_submit(on_submit)" Why? Would you be able to fix it?
@ranjitherusa7139
1 year ago
I am having the same issue. Should the optional segment be in the same .py program?
I'd love to figure out how to do this.
So when I store text in a Vector DB, this method retrieves the raw text to input to the LLM again? Is this the Ada encoder?
I'm a noob here. Is it possible to embed it on a site? If I embed it, is it standalone, or does it still use GPT API calls and incur costs?
Yo, I've made plugins but don't know how to test them, so can you give some ideas? (I don't have access to plugins yet; I'm on the waitlist.)
Is there an alternative to the OpenAI embedding engine that is competitive and free?
Hi, and thanks for your work. I am totally new at this, but I would like to be able to chat with my whole archive, like a second brain. Is this possible with this method?
Do I pay for OpenAI API tokens when using the code, or does it use a local GPT-2 model?
Can this be expanded to read from multiple PDFs? Or could this be done by combining the PDFs into a single file?
This is great! But how much does it cost
Great video! I was wondering why it's a private chatbot when you're using an OpenAI key and sending the information to GPT-3.5. How can you secure sensitive data with your method? Thank you for sharing your knowledge.
@lubeckable
6 months ago
Using and hosting by yourself a custom open source LLM like llama or mistral
How can I use this in my business or on my website? How could I embed it, for example, in a better UI?
Hi, sorry, there is an issue in colab, first script: ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. pydrive2 1.6.3 requires six>=1.13.0, but you have six 1.12.0 which is incompatible. yfinance 0.2.36 requires beautifulsoup4>=4.11.1, but you have beautifulsoup4 4.8.2 which is incompatible.. By the way, do you plan to make an adaptation for Mistral AI?
This is great. Is it possible to retrieve images from the PDF? I have a PDF with many graphics that help understand the content. Do you have any ideas as to how I can provide images as part of the conversation?
@MCroppered
1 year ago
What type of graphics are you talking about?
@quinnherden
1 year ago
You could leverage LangChain's agent feature set to use computer vision to analyze your images.
@GiovaDuarte
1 year ago
@@MCroppered the PDF I have has images embedded, and I was wondering how I could recall these during a conversation.
@GiovaDuarte
1 year ago
@@quinnherden I will research this. Thanks!
@gaben7
1 year ago
@@GiovaDuarte if you figure out how to bring images along with the conversation, let us know how please
I've written a prompt for GPT-4 that I use with chatGPT in Macromancy formatting to transform it into a legal assistant, and the results have been stellar. Is it possible to encode this prompt into the system you describe so that the bot operates with it in mind?
@kingarthur0407
1 year ago
bump
I'm sorry about the silly question: if I use this script in a separate Python module and call it with other documents, will it mix the sources of the documents, or will this instance of the vector DB live only at runtime?
Can I do it in a Jupyter notebook rather than using Colab?
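For an in-memory store, each instance you build lives only as long as the Python process and only contains what you added to it, so separate instances don't mix sources unless you merge them. To keep an index between runs you have to save it explicitly; as far as I know, LangChain's FAISS wrapper provides `save_local` and `load_local` for this. The sketch below illustrates the idea with a hypothetical toy `InMemoryIndex` persisted to JSON; it is not LangChain's storage format.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class InMemoryIndex:
    """Hypothetical toy index: lives only as long as the Python process unless saved."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, object]] = []

    def add(self, text: str, vector: List[float], source: str) -> None:
        self.entries.append({"text": text, "vector": vector, "source": source})

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(self.entries))

    @classmethod
    def load(cls, path: Path) -> "InMemoryIndex":
        index = cls()
        index.entries = json.loads(path.read_text())
        return index


index_a = InMemoryIndex()
index_a.add("text from doc A", [0.1, 0.2], source="a.pdf")

index_b = InMemoryIndex()  # a separate instance starts empty, so sources do not mix
index_b.add("text from doc B", [0.3, 0.4], source="b.pdf")

with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "index_a.json"
    index_a.save(saved)
    restored = InMemoryIndex.load(saved)  # survives only because it was written to disk

print([e["source"] for e in restored.entries], [e["source"] for e in index_b.entries])
```

If you do want one index across several documents, add them all to the same instance (and save it); if you want them kept apart, build and save one index per document set.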
that was very helpful. how can I add more PDFs to the knowledge base?
@katemariageorge7396
3 months ago
were you able to do it?
Is there a limit to the number of PDF chunks you can add to the vector DB?
@LiamOttley
1 year ago
Not necessarily. If you cram it full of thousands of chunks, I'd assume recall just gets slower and slower and uses more resources on your system. Best to set up different indexes for different information or use namespaces (a Pinecone feature).
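The namespace idea in the reply above can be sketched as one logical index partitioned by a key, where each query scans only the requested partition. This is a hypothetical in-memory illustration in the spirit of Pinecone namespaces, not Pinecone's client API; the dot-product scoring and the tiny 2-dimensional vectors are placeholders for real embeddings.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


class NamespacedIndex:
    """Hypothetical sketch: one logical index partitioned by namespace."""

    def __init__(self) -> None:
        self._spaces: Dict[str, List[Tuple[str, Vector]]] = defaultdict(list)

    def upsert(self, namespace: str, text: str, vector: Vector) -> None:
        self._spaces[namespace].append((text, vector))

    def query(self, namespace: str, vector: Vector, top_k: int = 4) -> List[str]:
        # Only the requested namespace is scanned, so search cost grows with that partition alone.
        candidates = self._spaces.get(namespace, [])
        scored = sorted(candidates,
                        key=lambda item: sum(a * b for a, b in zip(item[1], vector)),
                        reverse=True)
        return [text for text, _ in scored[:top_k]]


index = NamespacedIndex()
index.upsert("hr-policies", "Vacation policy: 25 days", [1.0, 0.0])
index.upsert("engineering", "Deploys happen on Tuesdays", [0.9, 0.1])
index.upsert("engineering", "Use feature flags", [0.1, 0.9])
print(index.query("engineering", [1.0, 0.0], top_k=1))
```

Partitioning this way also keeps unrelated documents from crowding each other out of the top-k results.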