Prompt Engineering
Күн бұрын
166,041
1

Llama-2 with LocalGPT: Chat with YOUR Documents

Ғылым және технология

Пікірлер: 233

@engineerprompt10 ай бұрын
Want to connect? 💼Consulting: calendly.com/engineerprompt/consulting-call 🦾 Discord: discord.com/invite/t4eYQRUcXB ☕ Buy me a Coffee: ko-fi.com/promptengineering |🔴 Join Patreon: Patreon.com/PromptEngineering
@electricsheep2305 Жыл бұрын
This is the most important KZread video I have ever watched. Thank you and all the contributors. Looking forward to the prompt template video!
@brianhauk8136 Жыл бұрын
I look forward to seeing your different prompt templates based on the chosen model. 🙂
@simonmoyajimenez2045 Жыл бұрын
I found this video really compelling. I believe it would be incredibly fascinating to leverage a CSV connection to answer data-specific questions. It reminds me of the article I read titled 'Talk To Your CSV: How To Visualize Your Data With Langchain And Streamlit.
@isajoha99629 ай бұрын
Thanks for the overview explaining the differences between the models in the context. 👍
@Swanidhi9 ай бұрын
Great content! I look forward to your videos. Pleae also create a video to guide people who are new to the world of deep learning so that they know what to learn from where and things they need to learn to start contributing to projects such as localGPT. Also, another video on how to determine which quantized model can be run efficiently on a local system by specifying parameters for this assessment. Thanks again!
@adriantang5811 Жыл бұрын
Thank you for your sharing!
@awu878 Жыл бұрын
really like your video!! Looking forward for the video about the local GPT api😃
@mauryaanurag6622
7 ай бұрын
On which system did u did this?
@wtfisthishandlebs11 ай бұрын
Great work! Cheers
@BigAura Жыл бұрын
Great job! Thank you!!
@deepalisharma1327 Жыл бұрын
Very nicely explained, thank you!!
@mauryaanurag6622
7 ай бұрын
On which system did u did this?
@RabyRoah Жыл бұрын
Awesome video! Would next love to see a video on how to install LLaMA2 + LocalGPT with GUI in the cloud (Azure). Thank you.
@REALTIES
11 ай бұрын
Waiting for this as well. Hope we get this soon. 🙂
@user-uo5po4vg1m Жыл бұрын
Ultimate, one of the best video in youtube.
@mauryaanurag6622
7 ай бұрын
On which system did u did this?
@zohaibsiddiqui1420 Жыл бұрын
It would be great if these massive beast models could run on low-end machines so that we can increase contributions
@absar66 Жыл бұрын
Many thanks..carry on the good work mate 👍
@engineerprompt
Жыл бұрын
Thank you for kind words.
@caiyu53810 ай бұрын
Thank you for providing such great AI tool for use.
@synaestesia-bg3ew Жыл бұрын
Love you, you the best KZreadr Ai expert. I am learning so much because you don't just talk, you are making things work.
@engineerprompt
Жыл бұрын
Thanks for the kind words.
@synaestesia-bg3ew
Жыл бұрын
@@engineerprompt No it's not just kind words, i really admire and recognize when someone have a true passion and put the effort. Further more I really love your emphasis on localisation of Ai projects, I think it's the future,no one wants to be a blind slave of a central corporation's Ai, censoring and monitoring every words . Even Elon said it, Ai should be democratized. The only problem, we might not achieved it because all the odds and money are against it, but I believe it worth trying.
@SadeghShahmohammadi Жыл бұрын
Nice work. Very well done. When are you going to explain the API version?
@1989arrvind Жыл бұрын
Great👍
@MikeEbrahimi Жыл бұрын
A video with API access would be great
@nikow1060 Жыл бұрын
BTW if someone has issues setting up CUDA and conflict with bits and bytes: after checking CUDA in visual studio and checking that Cuda version matches torch requirements you can try the following for windows installation: pip install bitsandbytes-windows . Someone has provided a corrected bitsandbytes version for windows ... it worked for me after 24 hours of bitesandbytes errors compalining about Cuda installation
@richardjfinn13 Жыл бұрын
Worth pointing out - you got it when you asked to show sources, but the presidentail term limit is NOT in Article 2. It's in the 22nd Ammendment, passed in 1951 after FDR was elected to a 3rd term.
@Socrataclysm Жыл бұрын
This looks wonderful. Going to get this setup tonight. Anyone familiar with differences between privateGPT and LocalGPT? Seems like they give you mostly the same functionality.
@hish.b
11 ай бұрын
I don’t think privategpt has gpu support tbh.
@ypsehlig Жыл бұрын
Same question as shogun-c, looking for hardware spec recommendations
@chengqian573711 ай бұрын
hello, thanks for the video. However, what's the meaning of having chunk size as 1000, while the sentence embedding model can only take a maximum of 128 tokens at a time? I would suggest to reduce the chunk size to 128 if you prefer the sentence transformer.
@kamilnwa8020 Жыл бұрын
Awesome video. QQ: how to add multiple “cuda”? the original code specifies `device="cuda:0"`. How to modify this line to use 2 or more GPUs?
@ToMtheVth10 ай бұрын
Works quite well! Unfortunatelly performance on my M1 MB Pro is a bit of an issue. Ingested 30 Dokuments, Prompt eval time is 20 Minutes... I need better Hardware XD
@aketo8082 Жыл бұрын
Thank you for this video. One thing keeps me busy. How does the "intelligenz" work in GPT, LLM...? Because I can't see, that this model "understand" relationship, can identify locations and difference between three person with same first name. I know LLM is a database for the words. I use GTP4All, tested all available LLM's, but always the same problem. Also correction via chat are not possible, and so on. So I guess, I don't understand that "AI" right or miss some basic information. Also how to train that. Same problems with ChatGPT, Bing, Bard, etc. Thank you for any hint, links and suggestions.
@user-yr8sg8vc2m11 ай бұрын
great video helped a lot but there is an issue that when passing parameters with --show_sources and also asking the llama2 a question outside the data which was ingested , it provides answer for that and states an incorrect source document which has nothing to do with the actual question, and why provide all data which were used cant we directly get line number and page number with the document name only.
@kittyketan Жыл бұрын
Great Work !! shot for the moon!
@engineerprompt
Жыл бұрын
Thanks 😊
@intuitivej932710 ай бұрын
Hi thankful for you, I am experimenting llama2 13b ggml in my notebook. But a few days ago, I was realized that llama was not saved in my local but it was like.. i don't know.. snap shot...?I want to save the model and load it from my local path. I tried some codes but failed.. could you please guide me? Thank you again for your sharing ❤
@paulparadise605911 ай бұрын
It's a great app! Thank you! Suggestion, can you redo the requirements.txt to install torch with cuda support? It is irritating to have to uninstall torch and install it with conda and the cuda flag set. And then troubleshoot the issues that causes.
@user-hf3fu2xt2j Жыл бұрын
took someone this short to create a video about it)
@caiyu53810 ай бұрын
I have asked this question in your other video "This localgpt works greatly for my file, I use T4 GPU with 16GB cuda memory, it will take 2-4 minutes to answer my questions for a file with 4-5 pages. Is it expected to take so long to answer the question using T4 GPU?". After watching this video, I think I can use quantized version model instead of LLAMA 2 vicuna 7B model
@boscocorrea1895 Жыл бұрын
Waiting for the api video.. :)
@Elshaibi Жыл бұрын
Great video as usual, it would be great to have one click install file for those who are not expert
@engineerprompt
Жыл бұрын
Let me see what I can put together
@user-fh4kd5sl7l11 ай бұрын
HI Thanks for sharing , great content and very useful, one question can we create a prompt like chatbot , like when we want to read research publications
@engineerprompt
11 ай бұрын
Yes!
@asepmulyana9085 Жыл бұрын
Have you created localgpt api? For example to connect to whatsapp bot. I am still not have an idea how to do that. Thanks.
@nadavel8510 ай бұрын
Thanks a lot for this! Did anyone encounter this error and resolved it? "certificate verify failed: unable to get local issuer certificate"
@vkarasik Жыл бұрын
Many thanks - I was able to install localGPT and ingest my docs. Two questions (vanilla install by cloning your repo as is): 1) on 16CPUs/64GB RAM x86 instance it takes 1-1.5 minutes for getting answer, 2) "what is the term limit of the us president?" I'm getting "The President of the United States has a term limit of four years as specified in Article II, Section 1 of the US Constitution." answer :-(
@RameshBaburbabu Жыл бұрын
Thanks Man !! , it was very useful. Here is one use case , for evolving user data embedding . in RDBMS DB we update a row and we can add more data in other tables with associated Foreign keys. can we embed say patient data day after day and ask questions for 1 year of particular patient ..? bottom line , I am looking for `incrementally embed` ....
@mmdls602
10 ай бұрын
Use llama index in that case. It has bunch of libraries that you can use to “refresh” the embedding when you add or delete the data. Super useful.
@ihebakermi9432 ай бұрын
thank
@DhruvJoshiDJ Жыл бұрын
one click installer for the project would be great.
@nattyzaddy65559 ай бұрын
Is the larger embeddings model much better than the smaller one?
@giovanith Жыл бұрын
hello, this run in Windows 10 ? trying a lot here but no success (W10, 40 Gb Ram, RTX4070 12 Gb VRam). Thanks
@aldisgailis99016 ай бұрын
@engineerprompt I would join in development, more on front end side, but at the moment seems this repo will mainly orient it self to be a doc reading tool, so idk. However, if this GPT could just be injesting files for knowledge, have access to internet to find out stuff, summarize. Sure id be up for it
@touristtam11 ай бұрын
How is it that this type of LLM generator project have usually no test what so ever and heavily lean on Conda?
@lshadowSFX Жыл бұрын
what if i already have all the files of the model downloaded somewhere? how do i make it use those files?
@ssvfx. Жыл бұрын
after 10 hours straight... WE GOT ENTER A QUERY: LETSGOOOOOOOO
@fabulatetra865011 ай бұрын
Is this using Fine Tuning on Llama-2 Model? thanks
@abdalrhmanalkabani8784 Жыл бұрын
when I run the code do i install the model locally ?
@bowenchen490811 ай бұрын
Is it very slow if we run locally? thank you in advance
@SamirDamle Жыл бұрын
Is there a way we can have a Docker image of this that I can run as a container with everything configured by default?
@StarfilmerOne
3 ай бұрын
I thought same, at least config presets for models or smtng I don't understand
@tk-tt5bw10 ай бұрын
Nice video. But can we make some videos for a M1 silicon MacBook
@adivramasalvayer35374 ай бұрын
Is it possible to use elastic as a vector database? If yes, may I get the tutorial link? Thank You
@TheCopernicus1 Жыл бұрын
Amazing project mate, any recommendations on running the UI version? I have followed the documentation and changed to the GGML versions of Llama2 however keeps erroring out. Could you perhaps recommend any additional instructions? I am running on M1 many thanks!
@prestonmccauley43
Жыл бұрын
I had the same issue on PAC llama errored out....
@IbrahimAkar
10 ай бұрын
Use the GGUF models. I had the same issue.
@Gingeey2311 ай бұрын
Great video and impressive project. I'm having errors when ingesting .txt files, but .PDFs seem to work fine! Has anyone run a WireShark report or similar to monitor the packets being transmitted externally to ensure that this is 100% local and no data is being leaked? would be great to get that assurance! again, great work and thanks
@Psychopatz
8 ай бұрын
i mean you could just turn off the data for it to verify that its purely local
@Psychopatz
8 ай бұрын
i mean you could just turn off the data for it to verify that its purely local
@pratik1762006s Жыл бұрын
Thank you for the video. Would you be able to share how to work with llama for sentiment analysis. I downloaded the model, but it somehow doesnt work from local or from the transformers version.
@mauryaanurag6622
7 ай бұрын
On which system did u did this?
@pratik1762006s
7 ай бұрын
@@mauryaanurag6622 rtx 4090
@littlesomethingforyou4 ай бұрын
hi i an unable to get back a response after i "enter query". vs code just gets stuck. is it because im running it on my cpu?
@md.ashrafulislamfahim310610 ай бұрын
Can you please tell me how can I use this project for implementing it using django for a chatbot?
@anushaaladakatti41456 ай бұрын
Can we extract and show images from this, as it responds with text content? how to do it
@abeechr Жыл бұрын
Seems like setup and installation went well, but when I enter a query, nothing happens. No error, no nothing. Many attempts… Any ideas?
@intuitivej932711 ай бұрын
Hi, thankful for your sharing, i am experimenting with it. By the way, how can i train the model properly? Would it remember the doc and the conversations we had after restarting the computer? I want to fine-tune it so it can learn the context.
@mauryaanurag6622
7 ай бұрын
On which system did u did this?
@intuitivej9327
7 ай бұрын
@mauryaanurag6622 system..? Window11..? I am a beginner, so I am not sure if this is the right answer..
@olegpopov3180 Жыл бұрын
Where is the model weights update (finetune)? Or am i missing something in the video... So, you created embeddings and store them in the DB. How are they connected to the model without finetuing process?
@engineerprompt
Жыл бұрын
I would recommend to watch the localgpt video, link is in the description. That will clarify alot of things to you. In this case, we are not doing any fine-tuning of the model search. Rather we do a semantic search on most relevant parts of the document and then give those parts along with the prompt to the LLM to generate an answer
@teleprint-me Жыл бұрын
Pro Tip: Don't set max_tokens to the maximum value. It won't work out the way you hope. It's better to use a fractional value that represents a percentage of the maximum sequence length.
@Ericzon
Жыл бұрын
please, could you elaborate more this answer?
@bobo32756
11 ай бұрын
@@Ericzon yes, this would be interesting !
@antdok9573
11 ай бұрын
Has NO idea why
@teleprint-me
11 ай бұрын
@Ericzon The sequence length represents the context window for a model. The max token parameter dictates the maximum sequence length the model can generate as output. When you input a text sequence, it becomes a part of the context window. The model will generate a continuation sequence based on the input. What do you think will happen as a result if the model is allowed to generate an output sequence that is as long as its context window while incorporating your input? For those that don't know, it generates an error in the best-case scenarios. The worst-case scenario is a bug of your own making that leaves you with absolute frustration.
@teleprint-me
11 ай бұрын
@antdok9573 I was in the ER and then working on PyGPTPrompt while recovering at home. So 🤷🏽‍♂️. Also, if you think the Reactance Response is a clever mental hack, I would like to disagree with you. I find statements like this as rude as they are pretentious.
@brianrowe115211 ай бұрын
Mine says bitsandbytes is deprecated, but then when I try to do pip install says its already met, but when I run again it says deprecated.. please install.
@mibaatwork Жыл бұрын
You talk about cpu and Nvidia, what’s about AMD? Can you add it also?
@gold-junge91 Жыл бұрын
It would be grateful to add it to my paperless-ngx
@jennilthiyam12617 ай бұрын
Hi. I have tried to chat with two CSV files. the thing is the model is not performing well. it is not even giving the correct answer when I ask about a particular value in a row given the key words. It is not good at all. I am using 70B. Does anyone have any idea how to make it more relatable? It does not even able to understand the data presented in CSV files.
@trobinsun985110 ай бұрын
Does it needs a powerful machine ? a lot of RAM ?
@kashishvarshney2225 Жыл бұрын
I run it on cuda but it's still giving ans in 3 mins how can I improve it's speed please someone reply
@ignacio371411 ай бұрын
hey friend. it seems that you know a lot!! could you give me a hand with this? I want to know what's the best way of creating your own "chatgpt" but giving a specific amount of X files. Let's say a small book, and then be able to ask questions about it, but running it locally or in the cloud (not using any third party basically). what's the best way of doing it? is it even possible? like running something like that in google colab with files from google drive or something like that? thanks in advance man!
@ElectroRestore11 ай бұрын
Thank you so much for this video! It has 2 of the three components I am in need of: privateGPT and Llama 2 7B (or 13B) Chat model. My third requirement is to have this running in on my server Windows 11 machine, such as Text-Gen Web UI so I can access it over my network with remote wifi browsers. Can you explain how to get this in the web ui?
@engineerprompt
11 ай бұрын
Wait for my api video :)
@victoradegbite4819
11 ай бұрын
Hi, @@engineerprompt I need your email for discussion on a project.
@engineerprompt
11 ай бұрын
@@victoradegbite4819 check out the video description :)
@Bamseficationify10 ай бұрын
Im running intel 7 cpu Best i could get was like 570000ms which is like almost 10 minutes for a reply. How do i get this number down without getting a gpu
@bhavikpatel761211 ай бұрын
Getting error for Nonetype object is not subscriptable. Can you please help.
@jerkoviskov537910 ай бұрын
Do i need GPT4 API to run this on my machine?
@ihydrocarbon11 ай бұрын
Seems to work on my Fedora 38 ThinkPad i5, but am confused as to how to train it on other data. I removed the constituion.pdf file and it still responds to queries with answer that refer to that document...
@engineerprompt
11 ай бұрын
There is a DB folder, delete that and rerun the ingest.py
@user-jh8od5jl1c11 ай бұрын
Which python version is used in this video?
@swapnilmahure4802 Жыл бұрын
where the model save locally ?
@kunzelbunt10 ай бұрын
Did i need a specific structure in my .pdf Document, so that the language model could read the data cleanly?
@engineerprompt
10 ай бұрын
This will work well on text, figures and tables is still an issue’s
@MrBorkori11 ай бұрын
Hi, thank you for your content it's very useful! Now I'm trying to run with TheBloke/Llama-2-70B-Chat-GGML, but gives me an error - error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024 llama_load_model_from_file: failed to load model Any help will be appreciated
@b326yr Жыл бұрын
Welldone, but nah I'm just gonna wait for things to become simplier and have better interface. Waiting for Microsoft Copilot.
@RealEnigmaEngine11 ай бұрын
Is there a support for AMD GPUs?
@pedroavex11 ай бұрын
Hi bro! Thanks for the video. I have a question: you set the chunks at 1000 and overlap 200 in your ingest file. Would I have a higher chance to have a more complete answer if i raise both parameters, say chunks of 4000 and overlap 1000? Would it impact too much the performance? Thanks!!
@dhananjaywithme
10 ай бұрын
Bigger Chunk Size is slower to process hence will require a more powerful processing Unit. Also, these numbers depend on the purpose for which we are ingesting the data. Consider the length of the documents in your dataset. If the documents are short, you may want to use a smaller chunk size to avoid splitting the documents into too many small pieces. Consider the complexity of the task. If the task is complex, such as generating a summary of a long document, you may want to use a larger chunk size to provide the model with more context. Consider the performance of your model. If your model is slow, you may want to use a smaller chunk size to improve performance.
@test123826 ай бұрын
Does document type include html?
@rahuldayal8406 Жыл бұрын
Great video, I am using M2 pro it answers my query really slow, how can I make it quick?
@gamerwalkers
11 ай бұрын
same problem using m1
@user-el7ju9vw6g11 ай бұрын
how can I run it with rocM and AMD GPUs? I'm a noob here and want to explore this project.
@IbrahimAkar10 ай бұрын
Does it download/load the model every time you run the app? I think there should be a check if the model is already downloaded if this is the case
@engineerprompt
10 ай бұрын
It downloads the model only once
@Shogun-C Жыл бұрын
What would the optimum hardware set up be for all of this?
@Yakibackk
Жыл бұрын
With petal you can run from any hardware even mobile
@HedgeHawking
Жыл бұрын
@@Yakibackk Tell me more about pertal please
@FaridShahidinejad
11 ай бұрын
I'm running llama2 on LM studio which doesn't require all these convoluted steps on my Ryzen 9, 48GB of ram, and a 1080 card and it runs like mollases
@brookyu2 Жыл бұрын
on macbook pro with m1 pro, after inputting the prompt, nothing is returned. the program is running forever. anything to do with my pytorch installation?
@gamerwalkers
11 ай бұрын
worked ok for me. but result was slow. took 3-5 minutes for me
@manu05329911 ай бұрын
is there a way for locat gtp to read a pdf file and extract the data to a specified JSON structure
@engineerprompt
11 ай бұрын
Yes, you will have to write a code for that.
@alaamohammad642211 ай бұрын
Please i get error tbe repo liama2 not existing !!!!😢😢😢😢
@hassentangier389110 ай бұрын
I downloaded the code ,but it's already changed how to integrate the llama downlaoded,please
@VoltVandal Жыл бұрын
Thank you ! working really great, even on a M1. Just one question, i compiled/ran llama_cpp on my M1 and it is using GPU, can this be somehow work also for your project ? [I'm just a beginner, THX]
@engineerprompt
Жыл бұрын
Yes, under the hood, it's running the models on llama_cpp.
@VoltVandal
Жыл бұрын
Ha, got it, maybe no problem for pro's, but had just to: CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir And now it is using GPU !
@gamerwalkers
11 ай бұрын
@@VoltVandal Hi Martin. Could you help with me on steps how you made it work? I am using m1 but model is running very slow. How fast does it give you a response. For me ranging between 3-5 minutes.
@VoltVandal
11 ай бұрын
@@gamerwalkers Well, this is due to the endless swapping (16GB mbp with 7B model). I got it running directly with the main command from the llama-cpp. But if i start embedding etc. i'm ending with the same 5min swap. I'm not a python pro or ai pro, so can not give you an answer, why this happens. I returned to my old nvidia machine, unfortunately. So at least it is working, not getting GPU errors, but still no fun. 😞 Maybe some ai pro can tell more on that issue.
@gamerwalkers
11 ай бұрын
@@VoltVandal when you say directly from the main command from llama-cpp are you able to run that command trained on our own pdf like how video has done? Or you are just referring to default llama prompts that answers queries but not trained on our own pdf data?
@DarylatDarylCrouse11 ай бұрын
The code shown in the video is not the same as what is at the repo right now. at 13:17 for example, I can't find any of that in the current repo. Pleaser help.
@engineerprompt
11 ай бұрын
Yes, it's changing as I am trying to add more features and make it better. Model definition has been moved to constants.py. I created a new video on the localGPT API, the changes are outlined in that video: kzread.info/dash/bejne/fXamtpKcqtXacdY.html
@Weltraumaff311 ай бұрын
Maybe a stupid question and I'm just missing what --device_type I should enter but: I'm struggling a bit using my AMD 6900XT. I don't want to use my CPU and PyTorch doesn't seem to work with OpenCL for example. Has anyone got an idea? Cheers in advance
@engineerprompt
11 ай бұрын
in this case, cpu :)
@manu05329911 ай бұрын
also is there local gtp that could convert the word doc to some other language
@engineerprompt
11 ай бұрын
Depending on the model, some will support other languages.
@varunms8506 ай бұрын
Can you give your system setting , RAM, VRAM, GPU and Processor ?
@tvbox653311 ай бұрын
why not AMD gpu with ROCM?
@extempore667 ай бұрын
Hello everyone, using a local (cached) llama2 model with LanceDB as an index store. Ingesting many PDF files and using a fairly standdard prompt template ("given the context below {context_str}. Based on the context and not prior knowledge, answer the query. {query_str} ..."). It is very frustrating getting different answers for the same question. Aany similar experiences? Thhanks
@engineerprompt
7 ай бұрын
Have you looked at the source documents returned in each case? That will be a good starting point. Also potentially reduce the temperature of the LLM
@extempore66
7 ай бұрын
@@engineerprompt Thank you for the prompt response. I have looked at the documents. Also ran a Retriever evaluator (Faithfulness, relevancy, Correctness ...) Sometimes it passes sometimes it does not. Still trying to wrap my brain around why the answers are so different while the nodes (vectors) returned are obviously consistently the same