Search Your PDF App using Langchain, ChromaDB, and Open Source LLM: No OpenAI API (Runs on CPU)

Science & Technology

Welcome to this tutorial video where we introduce an innovative approach to searching your PDF application using the power of Langchain, ChromaDB, and Open Source LLM, all running on your CPU.
Langchain is a powerful library designed for generative AI tasks, providing a range of capabilities that enhance language generation and understanding.
ChromaDB, on the other hand, acts as a vector store and database, enabling us to store and retrieve vectors efficiently. By integrating ChromaDB into our search tool, we can create a robust and scalable solution for managing the vector representations of PDF documents, allowing for faster and more accurate searches.
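To make the vector-store idea concrete, here is a minimal sketch in plain Python. It is not the ChromaDB or Langchain API: `embed`, `cosine`, and `ToyVectorStore` are hypothetical names, and the bag-of-words "embedding" is only a stand-in for a real sentence-embedding model. It just illustrates the store-then-search-by-similarity pattern that a vector database provides.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store; a real vector database adds persistence and indexing."""

    def __init__(self):
        self._items = []  # list of (chunk_text, vector) pairs

    def add(self, chunk):
        self._items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = ToyVectorStore()
store.add("Invoices must be paid within 30 days of receipt.")
store.add("The warranty covers hardware defects for two years.")
store.add("Refunds are issued to the original payment method.")
top = store.search("How long is the warranty?", k=1)
print(top)
```

In the actual project the chunks would come from the PDF text and the vectors from an embedding model, but the add/search flow is the same shape.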
Finally, we use an open-source LLM (large language model) to add question answering to the search tool. The LLM takes the user's query together with the relevant passages retrieved from the PDF documents and produces a precise, context-aware answer.
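As a rough illustration of the question-answering step, the sketch below shows how retrieved passages and a user question might be combined into a single prompt for a local model. The function names (`build_prompt`, `answer`, `fake_llm`) and the prompt wording are assumptions made for this example, not the code from the video; `fake_llm` is a placeholder where a real model call would go.

```python
def build_prompt(question, passages):
    """Join retrieved passages into a numbered context block and append the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, passages, generate):
    """Run any text-generation callable on the assembled prompt."""
    return generate(build_prompt(question, passages))


def fake_llm(prompt):
    # Placeholder for a local model; it only reports how much text it received.
    return f"(model output for a {len(prompt)}-character prompt)"


passages = ["The warranty covers hardware defects for two years."]
prompt = build_prompt("How long is the warranty?", passages)
print(prompt)
print(answer("How long is the warranty?", passages, fake_llm))
```

Passing the model in as a callable keeps the sketch independent of any particular library; swapping in a different open-source model would only change the `generate` argument.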
The unique aspect of this tutorial is that we do not rely on the OpenAI API, meaning you can run this entire system on your CPU without the need for external services. This ensures greater control, privacy, and accessibility for your PDF search needs.
Join us in this tutorial video as we guide you through the process of building your own PDF search tool using Langchain, ChromaDB, and Open Source LLM. Discover how to harness the potential of these technologies to create a powerful and efficient search system tailored to your requirements. Let's unlock the potential of your PDF application and revolutionize the way you search for information.
AI Anytime's GitHub: github.com/AIAnytime
LaMini-LM model: huggingface.co/MBZUAI/LaMini-...
ChromaDB: www.trychroma.com/
Langchain: python.langchain.com/docs/get...
LLM Playlist: • Large Language Models
Join WhatsApp: chat.whatsapp.com/EDnAeyBL18G..
#langchain #python #ai

Comments: 192

@jorgerios4091 1 year ago
Awesome, this is what real people need: a free alternative to OpenAI for custom-knowledge Q&A. I've tried privateGPT, but it is too slow, taking 2 minutes to provide answers while consuming 16 GB of RAM and working in the command prompt. This model looks much better; I'll be looking forward to the chatbot variant. Thank you!
@AIAnytime
1 year ago
Thanks for your kind words! I agree with you. The chatbot video will be released by tomorrow.
@sandilemfazi 1 month ago
You are a blessing, my guy. Amazing, thanks for taking your time and teaching this. Truly appreciate your efforts.
@ditchtech 1 year ago
Instructive and thorough, appreciate your efforts!
@AIAnytime
1 year ago
Thank you Sir, please consider subscribing to the channel.
@master86967 months ago
You are doing a superb job! These videos not only give knowledge but also motivate us to learn GenAI and start writing a few pieces of code for the use cases. Keep doing such use-case-driven videos; trust me, the community will appreciate and embrace folks like you. Keep rocking!
@AIAnytime
7 months ago
So nice of you
@vivekraj933310 months ago
I'm glad that I found your channel brother 🙌❤
@AIAnytime
10 months ago
Welcome aboard! Thanks. Plz keep supporting.
5 months ago
Very well explained video, and nice content. Congratulations!
@AIAnytime
5 months ago
Thanks a lot!
@roberty.agyekumaddo60718 months ago
Awesome tutorial. Is there a way of embedding this into a webpage? Also, is there a way to customize the appearance of the Streamlit GUI?
@shruti280611 days ago
13:30 why vectorstore?
17:18 safe tensors
20:12 project ideas
40:00 device_map
45:12 streamlit cache resource (decorator)
49:00 chaintype
1:01:47 token size (how it affects answers)
@taison00724 months ago
Hello bro, since my PC is low-end, is it possible to run the LLM models on Azure, access them from there, build the same application, and also deploy it online?
@saumyajaiswal65856 months ago
Thank you for the awesome video. With the source citation, does it also give images from the PDF in the answer?
@ROKKor-hs8tg6 months ago
This is without any subscription to any form. Does this code have a Google Colab page for testing? Once the libraries are downloaded, will the code work?
@fabsync6 months ago
Super awesome tutorial! I wonder, if you want to search PDFs in folders and subfolders, what would be the code for that?
@khalidal-reemi33618 months ago
Thanks a lot, I learned a lot. I will try doing this tutorial.
@AIAnytime
8 months ago
Glad it was helpful!
@sneharoy356611 months ago
Superb video. So easy to follow...
@AIAnytime
11 months ago
Thanks a lot 😊
@geojames42369 months ago
Awesome video... can you pls suggest any model for querying a Portuguese PDF document?
@PavanKumar-yk5mq6 months ago
How to deploy this RAG model on AWS? I mean, what services can we use to deploy other than EC2?
@FootyFunniesSS11 months ago
Great work man.. this really helps
@AIAnytime
11 months ago
Glad it helped
@FootyFunniesSS
11 months ago
@@AIAnytime getting error - ValueError: weight is on the meta device, we need a `value` to put in on cpu.
5 months ago
Is it possible to swap "all-MiniLM-L6-v2" for a Watson LLM or any other paid LLM?
@avijit_barua11 months ago
Too much great working
@_yurisales3 months ago
How can I make my Streamlit + ChromaDB application faster? I'm loading 30 PDF files locally, and when I run the application it takes about 40 minutes to load the documents before it loads the Streamlit interface. Is there a way to reduce this time? Is there a way to use multithreading or parallelism in ChromaDB?
@mohamedkeddache42024 months ago
It happens to me every time 😭 I follow the video and do everything right, but I still have problems. I installed Python 3.10, created a new environment, installed the requirements, and then downloaded the model. I think the errors are from incompatible versions? Need help please.
@ukcp26526 days ago
How to handle and maintain ChromaDB for multiple user requests, and how do we know which directory belongs to which user request in a 2-tier architecture?
@ilaydelrey31227 months ago
Thank you for putting this tutorial together. It would be great if you could also include the versions of the packages you use in your requirements.txt, because the packages change so fast and many things don't work anymore.
@253_r.asidharth8
7 months ago
Did u manage to run the project??
@AIAnytime
7 months ago
Let me update the GitHub repo with the versions. Thanks
@ilaydelrey3122
7 months ago
@@253_r.asidharth8 no, not yet, due to newer package versions
@adityapatel_00
7 months ago
@@AIAnytime May I know when you will update the requirements.txt file with the versions you used?
@pratikchatterjee599211 months ago
This is great! Thanks a lot! I love the way you explain every bit and piece. I am facing an error. The app works for the 1st question, but whenever I ask the 2nd question I get the error below: NotImplementedError: Cannot copy out of meta tensor; no data! Any idea?
@AIAnytime
11 months ago
Thanks for your comments. Probably, you don't have much compute power. What are your laptop specs? And is device_map auto, CPU, or CUDA?
@qvenmisakais3 months ago
Hi, this is a perfect tutorial, great work! I have a question. Even though I ask questions in Spanish, it answers me in English. Where can you define the language?
@anannyachamat63664 months ago
Hi Sir, can you please tell me if the Text Summarization using LaMiniT5 248M and this Search your PDF using LaMiniT5 738M can be integrated and made into one single project? Pls answer sir.
@user-qi4jw1lf9i7 months ago
PLEASE TELL ME THE SYSTEM CONFIGURATION REQUIRED FOR THIS.... I HAVE DONE IT BUT IT IS SHOWING A Load_weight PROBLEM
@dchuguashvili10 months ago
Is it possible to deploy a chatbot that has been fine-tuned using a custom knowledge base and the Llama2 framework on a live production website? My plan is to fine-tune the chatbot with data derived from 100 pages of PDF documents. The aim is for the chatbot to interact with online users and generate responses based on this material. If the chatbot is anticipated to engage with approximately 2,000 users per month and accommodate at least 20 users simultaneously, could you offer a rough estimate of the projected costs?
@junaidiqbal4104
9 months ago
Hi, did you get any idea about that? I hope you will answer it.
@jilanikashif11 months ago
Hi, this is a great tutorial and really helped me; shifting from Machine Learning to Generative AI is really amazing. It would be great if you created an app for the same using Flask and Docker.
@AIAnytime
11 months ago
I have an app that I have containerised. Plz watch those 2 videos as well.
@jilanikashif
11 months ago
@@AIAnytime Could you please share the link. Thanks for quick reply
@BharatVarsh47
9 months ago
can u share link please? @@AIAnytime
@vishusupersonic27089 months ago
Sir, I made an offload folder because it asked me to, but when I run 1 query it creates about 2.5 GB of files in the offload folder. How do I solve this? Please help.
@RahulGupta-ub1op3 months ago
What if we ask a question that is not from the PDF?
@vrynstudios9 months ago
What will it cost to host it on a server? Suppose I have 1000 users daily; how much would I need to pay for such a PDF searching feature? *Please reply*. I am a noob on AI server-side hosting.
@raghu077011 months ago
I think if we ask general questions like “who is Narendra Modi?” it will answer from outside the PDF files.
@MikelBaghdasarian11 months ago
Really interesting. I was wondering how to mix it with the oobabooga repo, add some options like loading various PDFs, CSV, XML, and other types of documents (PowerPoint, .txt, and others) with some databases behind for user access... that would be awesome!
@AIAnytime
11 months ago
Cool idea! Maybe I can see if I can create a video soon.
@MikelBaghdasarian
11 months ago
Great! If you need a use-case concept let me know. I already tried mixing both git repos and they are working fine; the best would be to be able to add LoRAs trained on oobabooga to be used under langchain! @@AIAnytime
@keeperofthelight96814 months ago
Doesn't work, getting crazy errors with embeddings, both with Hugging Face and sentence transformers
@JahangeerRathore9 months ago
Awesome, but I want to use the LaMini model online, not offline, because my RAM is 8 GB and it crashed after running. How to achieve that? Thanks in advance...
@associatedbiblestudentsofs53088 months ago
'Chromadb' is not compatible with Python 3.11. I'm trying to find a workaround, but a very well-developed course. Thank you.
@Collegemitra-Official
7 months ago
Did you find the solution?
@user-py8qx6th8p8 months ago
I have a few questions. What other models besides LaMini can I use? I am trying to use Llama 2 or BLOOM. Also, what API should I use if I don't want to download the LLM?
@AIAnytime
8 months ago
Look at my latest video.... Using Zephyr and Mistral LLMs.
@sandedom339 1 year ago
Very nice! Can you load multiple PDF files for Q&A?
@AIAnytime
1 year ago
Yes of course! Make sure your machine has enough compute power for inference. You can ingest multiple files to create embeddings on any machine.
@raghu077011 months ago
Many people build things like this, but there is no solution to restrict the model to answering only from the PDF files
@dingowhiz481 1 year ago
Excellent video! Exactly what I need for a POC. I found that Chroma installation is a challenge: 'pip install chroma-migrate' and running the `chroma-migrate` command crashed my Linux. Do you know of an LTS version of Chroma?
@AIAnytime
1 year ago
Thank you for your comment! Can you let me know your Python version? Can you try Python 3.10 and then do a pip install chromadb?
@dingowhiz481
1 year ago
@@AIAnytime I'm running Python 3.10.6 on Ubuntu 22.04.2
@nandanhegde344411 months ago
great video
@mlloving8 months ago
Awesome demo. Would you please let me know where to download the repo of this demo? I did not find it on your GitHub. Thanks.
@AIAnytime
8 months ago
It's on my GitHub. Please check the repositories.
@anuyogesh89795 months ago
I am getting this error - " AttributeError: 'Collection' object has no attribute '__pydantic_extra__' " why?
@vivekpatel2736 1 month ago
@AIAnytime can we also get the images from the PDF in the answer?
@I3lor11 months ago
great video, if i replace the checkpoint with any other model (eg. google/mt5), will the project still work as intended?
@AIAnytime
11 months ago
Yes, absolutely! It should work if you have a decent machine that can load the model in memory.
@I3lor
11 months ago
@@AIAnytime thank you, you have been very helpful
@John-jx4ho11 months ago
Awesome!
@AIAnytime
11 months ago
Thank you! Cheers!
@mohitkapoor437411 months ago
Very nice tutorial. It helped me solve an issue I was working on. Could you please help with how we can reduce the latency of answers from the chatbot? Also, what if the PDF has more than 100 pages?
@AIAnytime
11 months ago
Thanks for the comment. Infrastructure is the key, and of course some tweaking of the preprocessing and algorithms. Get more compute power and you will see the improvement.
@mohitkapoor4374
11 months ago
@@AIAnytime Thank you so much. Is there any way I can connect with you, or any tutorial I should follow to scale things up after following your tutorial?
@stephennfernandes11 months ago
Hey, great work man, this really helps. Could you please explain in brief, or if possible make a video about, how vector DBs work internally? What are they exactly: word embeddings like StarSpace or fastText, or sentence transformer embeddings with similarity search? How do technologies like langchain and llama_index work internally?
@AIAnytime
11 months ago
Hi Stephen, thanks for your comment. Maybe I can try doing that. But I feel there are many such videos available on YouTube. But yes, I can explain in simpler terms. My focus is to help my subscribers build projects in Generative AI... But stay tuned 🔜
@stephennfernandes
11 months ago
@@AIAnytime thanks a ton
@mcodetsh18 days ago
Many of the settings and imports have been deprecated and you will get many errors. I recommend not using this code but just learning the workflow and the thinking process. Thank you still for this video.
@sarojapulipaka297225 days ago
Can we also give large files as input (1000 pages)?
@truckfinanceaustralia1335 1 year ago
great vid!
@AIAnytime
1 year ago
Thank you.
@madhupatel670711 months ago
Great explanation! I did the same as you did in the video but got some errors, so is there any way to reach you? Really need your help.
@AIAnytime
11 months ago
How can I help you? My contact details are in the channel's About section or on the YouTube banner.
@user-yd3zk4hb1o11 months ago
Since chromadb was updated, the code is throwing some errors related to chromadb. Can you please update the code and push it to your repo?
@AIAnytime
11 months ago
Just a request, can you open a PR on the GitHub repo? I will just merge that PR. Let me know... It's just a few lines of code. They have migrated from DuckDB to SQLite.
@ashishanand42332 months ago
How to solve this: AttributeError: 'Client' object has no attribute 'chroma_api_impl'?
@DeviGoneMad4 months ago
Can you mention the version of Python you are using here?
@yanayana-cm5qg 1 month ago
what python version are you using?
@deepudeepak139010 months ago
Can I use Falcon 40B in place of the LLM you are using?
@AIAnytime
10 months ago
Of course you can. Make sure you have enough compute power.
@mainakmukhrjee63288 months ago
Hello sir, can you please help me with an error: ModuleNotFoundError: No module named 'langchain'? I have installed langchain and have checked it with pip show langchain.
@pagadishyam704910 months ago
Hi, your videos are really very impressive. I am trying to recreate this but receive the error below when executing ingest.py. Error message: " duckdb.InvalidInputException: Invalid Input Error: Required module 'pandas.core.arrays.arrow.dtype' failed to import, due to the following Python exception: ModuleNotFoundError: No module named 'pandas.core.arrays.arrow.dtype' "
@AIAnytime
10 months ago
Can you look at the chromadb version you are using? ChromaDB has recently migrated from DuckDB to SQLite. In that case, you need to make changes in Constants.py... Do you mind looking at the GitHub issues of this repo on my GitHub?
@pagadishyam7049
10 months ago
@@AIAnytime can I use an old version of chromadb, will it work?
@pagadishyam7049
10 months ago
@@AIAnytime Downgrading chromadb and pandas to chromadb==0.3.26 and pandas==2.0.3 worked for me; hope this helps others.
@ehteshamnehal702411 months ago
Hi. Sometimes while running the model I get the following error: Cannot copy out of meta tensor; no data! Any idea how to solve this? Also, I'm using a FAISS DB instead of Chroma. Thanks.
@AIAnytime
11 months ago
Can you check if you are offloading some weights to CPU? Are you using cuda or cpu as the device map? Or auto?
@ehteshamnehal7024
11 months ago
@@AIAnytime I'm not offloading any weights to CPU. Also I'm using auto.
@talhaabdulqayyum19310 months ago
I am getting this error NotImplementedError: Cannot copy out of meta tensor; no data! Any workarounds?
@rngwrngw6612
5 months ago
Did you find a solution?
@nandanhegde344411 months ago
How many GB is the LaMini file?
@user-zj7cp8dg9f3 months ago
How can we do this with a knowledge graph?
@overrideguilarte 1 year ago
Is there any model in Spanish similar to this one that can be integrated?
@AIAnytime
1 year ago
Yes, you can try something like 'GPT-2 SMALL SPANISH '... Explore the models on Hugging Face. Please subscribe to the channel if this helps. Thanks
@overrideguilarte
1 year ago
@@AIAnytime thanks
@abhishekpasalkar66808 months ago
Showing this error even after updating Chroma and also migrating it: "ValueError: You are using a deprecated configuration of Chroma."
@ritamchatterjee8785
3 months ago
ya getting the same error
@adityapatel_007 months ago
Hello Brother, appreciate your work. But can you please update the requirements.txt with the version numbers? The versions have changed and we are facing problems running it. Can you hurry? Thank you.
@ShadyPencil
6 months ago
try this...
pydantic==1.10.13
chromadb==0.3.26
langchain==0.0.267
streamlit==1.25.0
transformers==4.31.0
torch==2.0.1
einops==0.6.1
bitsandbytes==0.41.1
accelerate==0.21.0
pdfminer.six==20221105
beautifulsoup4==4.12.2
sentence-transformers
duckdb==0.7.1
sentencepiece==0.1.99
six==1.16.0
requests==2.31.0
uvicorn==0.18.3
torchvision==0.15.2
streamlit-chat
@bhautikin11 months ago
Does it mean I only add data into the vector DB for a new PDF, with no need to train again?
@AIAnytime
11 months ago
You have to create embeddings for the new files.
@bhautikin
11 months ago
@AIAnytime got it. Thanks
@shivamthaman70814 months ago
Please consider investing in a microphone that will enhance the quality of audio in the videos
@AIAnytime
4 months ago
Sure sir
@leehenriques66614 months ago
AttributeError: chroma_api_impl can you help me fix this please
@tapanpati94529 months ago
How can I connect with you? The WhatsApp link is not working...
@deepak12915 months ago
Has anyone run it on an 8 GB RAM (CPU) Windows laptop?
@nayankarpe496110 months ago
Which extension are you using for code auto-completion?
@AIAnytime
10 months ago
Tabnine.
@nayankarpe4961
10 months ago
@@AIAnytime Thank you ✌️
@mythzing711 months ago
Getting this error when I ran the code. Searched online, couldn't find a solution. Could you please help? NotImplementedError: Cannot copy out of meta tensor; no data!
@AIAnytime
11 months ago
Are you running on CPU or CUDA?
@mythzing7
11 months ago
@@AIAnytime cpu
@MrTabishMehdi10 months ago
Can you please update the versions of all libraries? I am getting an error in ChromaDB because of the version. Kindly do the needful
@RameshPatil28592
3 months ago
Hi, did you update the libraries and resolve the chromadb error?
@mohlabo39173 months ago
hi cannot install chromadb --error failed
@yashsrivastava487810 months ago
Could you please do a video on the same with LMQL, Langchain and Chainlit together, which takes multiple files of different formats?
@AIAnytime
10 months ago
Sure Yash. Soon. Thanks for the idea.
@yashsrivastava4878
10 months ago
@@AIAnytime thank you sir 😊
@yashsrivastava4878
10 months ago
@@AIAnytime sir please make video on this as soon as you can 🙏
@AIAnytime
10 months ago
By Sunday. Currently in a family emergency. Apologies for the delay!
@yashsrivastava4878
10 months ago
@@AIAnytime ok sir 🙂
@AMITSINGH-hu4es7 months ago
PDF file of resume is not output.
@akshay_raut 1 year ago
Great tutorial, waiting for the chatbot... and the WhatsApp link is not working, can you please share the group link again? Thank you!
@AIAnytime
1 year ago
Thanks Akshay! Please find it here: chat.whatsapp.com/EDnAeyBL18GB9xxcnyTW3Y The chatbot video will be posted by tomorrow.
@sscoder170
1 month ago
@@AIAnytime can you please share the link to that chatbot video here?
@Hope1GamingCSGODota2more6 months ago
what about supabase for vector store?
@AIAnytime
6 months ago
That's a good choice.
@rajesh1906 1 month ago
can we run with 8gb RAM ?
@foodfashionmasti82979 months ago
Chroma DB error: you are using a deprecated configuration of Chroma, some migrate
@abhishekpasalkar6680
8 months ago
Same error for me, how did you solve this?
@epictetus__8 months ago
Bookmark: 21:00
@adithyas64284 months ago
I am getting an error: NotImplementedError: Cannot copy out of meta tensor; no data! Did anyone face this error, and is there any solution?
@DeviGoneMad
4 months ago
yep facing the same! did u fix it?
@adithyas6428
1 month ago
@@DeviGoneMad no could not find a solution
@abhisycvirat3 months ago
The kid smoking in the background distracted me 😂
@mort-ai 1 year ago
does this work on any language?
@AIAnytime
1 year ago
Thanks for your comment! No it doesn't work for any language.
@user-qi4jw1lf9i7 months ago
Giving error ModuleNotFoundError: No module named 'pandas.core.arrays.arrow.dtype'. Please correct this code in the ingest.py file, please help
@rngwrngw6612
5 months ago
Did you find a solution?
@tech4tomorrow
5 months ago
Yes I got it..
@moralstorieskids3884 1 year ago
Code please; I went through your GitHub but couldn't find it
@AIAnytime
1 year ago
Please find it here: github.com/AIAnytime/Search-Your-PDF-App . Can you please subscribe to the channel?
@deepjyotibaishya75766 months ago
This repo link please
@JonathanLyon2 months ago
are you available for hire?
@Rider-jn6zh4 months ago
Hello brother, can you please upload videos on how to evaluate an LLM and which evaluation metrics can be used for a specific use case? I am getting this question in every interview and am not able to answer it.
@user-qi4jw1lf9i7 months ago
please make it 10 millions pages for lawyer use case
@amosmaru254211 months ago
Really awesome. How do I reach you?
@AIAnytime
11 months ago
Thank you! Look at the YouTube banner on my channel. All social media are listed. Or the About section of the channel.
@manofsteel6173 1 month ago
where is code link??
@Anna007228 months ago
WhatsApp link not working
@dr.aravindacvnmamit37705 months ago
ValueError: You are using a deprecated configuration of Chroma.
@samarth-joseph
5 months ago
Downgrade chromadb version
pip uninstall chromadb
pip install chromadb==0.3.29
@083-cse-sameerkhan39 months ago
sir will it work on 8GB RAM
@AIAnytime
9 months ago
Difficult but it will for a few questions
@shivamkumar-qp1jm 1 year ago
When I use any app, the first thing I check is whether it suffers from hallucinations or not, but this is good, no hallucinations
@AIAnytime
1 year ago
I agree with you! Thank you.
@Tenly2009
1 year ago
To my knowledge, all large language models are susceptible to hallucination. Your methodology seems flawed.
@AIAnytime
1 year ago
Hi Tenly, thanks for your message. I have used a language model. It's not that large. It generalises well on the embeddings that we create! When you use LLMs which are too large, they don't generalise well on the documents that you provide, so they hallucinate from the base model. Can you try this and let me know if you get a high hallucination rate? You can't remove that characteristic, but LaMini really helps with the hallucination rate.
@Tenly2009
1 year ago
@@AIAnytime I’m only 13 minutes into your video (and still watching), but my comment was directed to the person who said they “test for hallucinations” before anything else - but I can’t imagine what kind of tests he could perform to conclude that a model doesn’t hallucinate.
@QuitandoCaretas3 months ago
Very, very poorly explained... you open the IDE with a lot of things and we don't know where they come from.
@ksreenivas39338 months ago
Half knowledge