Llama-2 with LocalGPT: Chat with YOUR Documents

Science and technology

Comments: 233

  • @engineerprompt · 10 months ago

    Want to connect?
    💼 Consulting: calendly.com/engineerprompt/consulting-call
    🦾 Discord: discord.com/invite/t4eYQRUcXB
    ☕ Buy me a Coffee: ko-fi.com/promptengineering
    🔴 Join Patreon: Patreon.com/PromptEngineering

  • @electricsheep2305 · 1 year ago

    This is the most important YouTube video I have ever watched. Thank you and all the contributors. Looking forward to the prompt-template video!

  • @brianhauk8136 · 1 year ago

    I look forward to seeing your different prompt templates based on the chosen model. 🙂

  • @simonmoyajimenez2045 · 1 year ago

    I found this video really compelling. I believe it would be incredibly fascinating to leverage a CSV connection to answer data-specific questions. It reminds me of an article I read titled 'Talk To Your CSV: How To Visualize Your Data With Langchain And Streamlit'.

  • @isajoha9962 · 9 months ago

    Thanks for the overview explaining the differences between the models in the context. 👍

  • @Swanidhi · 9 months ago

    Great content! I look forward to your videos. Please also create a video to guide people who are new to the world of deep learning, so they know what to learn, where to learn it, and what they need in order to start contributing to projects such as localGPT. Also, another video on how to determine which quantized model can run efficiently on a local system, by specifying the parameters for this assessment. Thanks again!

  • @adriantang5811 · 1 year ago

    Thank you for sharing!

  • @awu878 · 1 year ago

    Really like your video!! Looking forward to the video about the localGPT API 😃

  • @mauryaanurag6622 · 7 months ago

    On which system did you do this?

  • @wtfisthishandlebs · 11 months ago

    Great work! Cheers

  • @BigAura · 1 year ago

    Great job! Thank you!!

  • @deepalisharma1327 · 1 year ago

    Very nicely explained, thank you!!

  • @mauryaanurag6622 · 7 months ago

    On which system did you do this?

  • @RabyRoah · 1 year ago

    Awesome video! Would love to see a video next on how to install LLaMA2 + LocalGPT with a GUI in the cloud (Azure). Thank you.

  • @REALTIES · 11 months ago

    Waiting for this as well. Hope we get this soon. 🙂

  • @user-uo5po4vg1m · 1 year ago

    Ultimate, one of the best videos on YouTube.

  • @mauryaanurag6622 · 7 months ago

    On which system did you do this?

  • @zohaibsiddiqui1420 · 1 year ago

    It would be great if these massive beast models could run on low-end machines, so that we could increase contributions.

  • @absar66 · 1 year ago

    Many thanks.. carry on the good work, mate 👍

  • @engineerprompt · 1 year ago

    Thank you for the kind words.

  • @caiyu538 · 10 months ago

    Thank you for providing such a great AI tool.

  • @synaestesia-bg3ew · 1 year ago

    Love you, you're the best YouTuber AI expert. I am learning so much because you don't just talk; you make things work.

  • @engineerprompt · 1 year ago

    Thanks for the kind words.

  • @synaestesia-bg3ew · 1 year ago

    @@engineerprompt No, they're not just kind words; I really admire and recognize when someone has true passion and puts in the effort. Furthermore, I really love your emphasis on the localization of AI projects; I think it's the future. No one wants to be a blind slave to a central corporation's AI, censoring and monitoring every word. Even Elon said it: AI should be democratized. The only problem is that we might not achieve it, because all the odds and money are against it, but I believe it's worth trying.

  • @SadeghShahmohammadi · 1 year ago

    Nice work. Very well done. When are you going to explain the API version?

  • @1989arrvind · 1 year ago

    Great👍

  • @MikeEbrahimi · 1 year ago

    A video with API access would be great

  • @nikow1060 · 1 year ago

    BTW, if someone has issues setting up CUDA and conflicts with bitsandbytes: after checking CUDA in Visual Studio and confirming that the CUDA version matches the torch requirements, you can try the following for a Windows installation: pip install bitsandbytes-windows. Someone has provided a corrected bitsandbytes build for Windows... it worked for me after 24 hours of bitsandbytes errors complaining about the CUDA installation.
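
    A quick sanity check for this kind of CUDA/torch mismatch (editor's note; a minimal sketch, assuming only that PyTorch is installed):

      import torch

      # If bitsandbytes complains about CUDA, first confirm torch itself
      # sees the GPU and which CUDA build it was compiled against.
      print(torch.__version__)          # e.g. 2.0.1+cu118
      print(torch.version.cuda)         # CUDA version torch was built with
      print(torch.cuda.is_available())  # must be True for GPU inference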

  • @richardjfinn13 · 1 year ago

    Worth pointing out: you got it when you asked it to show sources, but the presidential term limit is NOT in Article 2. It's in the 22nd Amendment, passed in 1951 after FDR was elected to a third term.

  • @Socrataclysm · 1 year ago

    This looks wonderful. Going to get this set up tonight. Is anyone familiar with the differences between privateGPT and LocalGPT? They seem to give you mostly the same functionality.

  • @hish.b · 11 months ago

    I don't think privateGPT has GPU support, tbh.

  • @ypsehlig · 1 year ago

    Same question as Shogun-C: looking for hardware spec recommendations.

  • @chengqian5737 · 11 months ago

    Hello, thanks for the video. However, what's the point of a chunk size of 1000 when the sentence-embedding model can only take a maximum of 128 tokens at a time? I would suggest reducing the chunk size to 128 if you prefer the sentence transformer.
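
    A minimal sketch of matching the chunk size to the embedder's limit (editor's note; assumes LangChain's RecursiveCharacterTextSplitter, which localGPT's ingest.py uses; the 500-character figure is a rough assumption for ~128 tokens):

      from langchain.text_splitter import RecursiveCharacterTextSplitter
      from langchain.docstore.document import Document

      # chunk_size here is in characters, not tokens; ~128 tokens of English
      # is on the order of 500 characters, so stay well under the default
      # 1000 if the embedding model truncates at 128 tokens.
      splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
      docs = [Document(page_content="Your ingested document text ...")]
      chunks = splitter.split_documents(docs)
      print(len(chunks))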

  • @kamilnwa8020 · 1 year ago

    Awesome video. QQ: how do I add multiple CUDA devices? The original code specifies `device="cuda:0"`. How do I modify this line to use 2 or more GPUs?
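
    A minimal sketch of sharding across several GPUs instead of pinning to cuda:0 (editor's note; assumes the model is loaded through Hugging Face transformers with accelerate installed; the model id is a placeholder):

      from transformers import AutoModelForCausalLM, AutoTokenizer

      model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      # device_map="auto" lets accelerate split the weights across all
      # visible GPUs (and spill to CPU RAM if needed), replacing device="cuda:0".
      model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")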

  • @ToMtheVth · 10 months ago

    Works quite well! Unfortunately, performance on my M1 MacBook Pro is a bit of an issue. I ingested 30 documents, and prompt eval time is 20 minutes... I need better hardware XD

  • @aketo8082 · 1 year ago

    Thank you for this video. One thing keeps me busy: how does the "intelligence" in GPT/LLMs work? I can't see how this model "understands" relationships, identifies locations, or distinguishes between three people with the same first name. I know an LLM is a database of words. I use GPT4All and have tested all the available LLMs, but I always hit the same problem. Corrections via chat are not possible either, and so on. So I guess I don't understand this "AI" right, or I'm missing some basic information, including how to train it. Same problems with ChatGPT, Bing, Bard, etc. Thank you for any hints, links, and suggestions.

  • @user-yr8sg8vc2m · 11 months ago

    Great video, helped a lot, but there is an issue: when passing --show_sources and asking llama2 a question outside the ingested data, it still provides an answer and cites an incorrect source document that has nothing to do with the actual question. And why return all the source chunks? Can't we directly get the line number and page number along with just the document name?

  • @kittyketan · 1 year ago

    Great work!! Shoot for the moon!

  • @engineerprompt · 1 year ago

    Thanks 😊

  • @intuitivej9327 · 10 months ago

    Hi, thankful for you. I am experimenting with llama2 13B GGML on my notebook. A few days ago I realized that llama was not saved locally; it was, I don't know, a snapshot...? I want to save the model and load it from my local path. I tried some code but failed... could you please guide me? Thank you again for sharing ❤
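
    A minimal sketch of copying the weights out of the hidden snapshot cache into a path you control (editor's note; assumes huggingface_hub is installed; the repo id is a placeholder):

      from huggingface_hub import snapshot_download

      # By default, transformers keeps weights as snapshots under
      # ~/.cache/huggingface; local_dir materializes them at an explicit path.
      path = snapshot_download(
          repo_id="TheBloke/Llama-2-13B-chat-GGML",  # placeholder
          local_dir="./models/llama-2-13b-chat-ggml",
      )
      print(path)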

  • @paulparadise6059 · 11 months ago

    It's a great app! Thank you! Suggestion: can you redo the requirements.txt to install torch with CUDA support? It is irritating to have to uninstall torch, reinstall it with conda and the CUDA flag set, and then troubleshoot the issues that causes.

  • @user-hf3fu2xt2j · 1 year ago

    It took someone this short a time to create a video about it )

  • @caiyu538 · 10 months ago

    I asked this question on your other video: "This localGPT works great for my file. I use a T4 GPU with 16GB of CUDA memory, and it takes 2-4 minutes to answer questions about a 4-5 page file. Is it expected to take so long using a T4 GPU?" After watching this video, I think I can use a quantized model instead of the LLaMA 2 / Vicuna 7B model.

  • @boscocorrea1895 · 1 year ago

    Waiting for the API video.. :)

  • @Elshaibi · 1 year ago

    Great video as usual. It would be great to have a one-click install file for those who are not experts.

  • @engineerprompt · 1 year ago

    Let me see what I can put together.

  • @user-fh4kd5sl7l · 11 months ago

    Hi, thanks for sharing, great content and very useful. One question: can we create a chatbot-like prompt, for example for when we want to read research publications?

  • @engineerprompt · 11 months ago

    Yes!

  • @asepmulyana9085 · 1 year ago

    Have you created a localGPT API, for example to connect to a WhatsApp bot? I still have no idea how to do that. Thanks.

  • @nadavel85 · 10 months ago

    Thanks a lot for this! Did anyone encounter this error and resolve it? "certificate verify failed: unable to get local issuer certificate"

  • @vkarasik · 1 year ago

    Many thanks! I was able to install localGPT and ingest my docs. Two questions (vanilla install by cloning your repo as-is): 1) on a 16-CPU/64GB-RAM x86 instance it takes 1-1.5 minutes to get an answer; 2) for "what is the term limit of the us president?" I'm getting "The President of the United States has a term limit of four years as specified in Article II, Section 1 of the US Constitution." :-(

  • @RameshBaburbabu · 1 year ago

    Thanks, man!! It was very useful. Here is one use case: evolving user-data embeddings. In an RDBMS we update a row and can add more data in other tables with associated foreign keys. Can we embed, say, patient data day after day and then ask questions over one year for a particular patient? Bottom line: I am looking for `incremental embedding`...

  • @mmdls602 · 10 months ago

    Use LlamaIndex in that case. It has a bunch of utilities that you can use to "refresh" the embeddings when you add or delete data. Super useful.
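
    A minimal sketch of that "refresh" pattern (editor's note; assumes llama-index, whose import paths and constructor arguments vary by version, plus an embedding backend configured elsewhere):

      from llama_index.core import Document, VectorStoreIndex

      # Stable doc ids let refresh_ref_docs re-embed only new or changed
      # records instead of rebuilding the whole index.
      docs = [Document(text="Patient A, day 1: ...", doc_id="patient-a-1")]
      index = VectorStoreIndex.from_documents(docs)

      docs.append(Document(text="Patient A, day 2: ...", doc_id="patient-a-2"))
      index.refresh_ref_docs(docs)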

  • @ihebakermi943 · 2 months ago

    Thanks!

  • @DhruvJoshiDJ · 1 year ago

    A one-click installer for the project would be great.

  • @nattyzaddy6555 · 9 months ago

    Is the larger embeddings model much better than the smaller one?

  • @giovanith · 1 year ago

    Hello, does this run on Windows 10? Trying a lot here but no success (W10, 40 GB RAM, RTX 4070 12 GB VRAM). Thanks.

  • @aldisgailis9901 · 6 months ago

    @engineerprompt I would join the development, more on the front-end side, but at the moment it seems this repo will mainly orient itself toward being a doc-reading tool, so idk. However, if this GPT could just be ingesting files for knowledge, with access to the internet to find things out and summarize, sure, I'd be up for it.

  • @touristtam · 11 months ago

    How is it that these LLM projects usually have no tests whatsoever and lean heavily on Conda?

  • @lshadowSFX · 1 year ago

    What if I already have all the model files downloaded somewhere? How do I make it use those files?

  • @ssvfx. · 1 year ago

    after 10 hours straight... WE GOT ENTER A QUERY: LETSGOOOOOOOO

  • @fabulatetra8650 · 11 months ago

    Is this using fine-tuning on the Llama-2 model? Thanks

  • @abdalrhmanalkabani8784 · 1 year ago

    When I run the code, does it install the model locally?

  • @bowenchen4908 · 11 months ago

    Is it very slow if we run it locally? Thank you in advance.

  • @SamirDamle · 1 year ago

    Is there a way we can have a Docker image of this that I can run as a container with everything configured by default?

  • @StarfilmerOne · 3 months ago

    I thought the same; at least config presets for models or something. I don't understand.

  • @tk-tt5bw · 10 months ago

    Nice video. But can we get some videos for an M1 silicon MacBook?

  • @adivramasalvayer3537 · 4 months ago

    Is it possible to use Elastic as a vector database? If yes, may I get a tutorial link? Thank you.

  • @TheCopernicus1 · 1 year ago

    Amazing project, mate. Any recommendations on running the UI version? I followed the documentation and changed to the GGML versions of Llama2, but it keeps erroring out. Could you perhaps recommend any additional instructions? I am running on M1. Many thanks!

  • @prestonmccauley43 · 1 year ago

    I had the same issue on PAC; llama errored out....

  • @IbrahimAkar · 10 months ago

    Use the GGUF models. I had the same issue.

  • @Gingeey23 · 11 months ago

    Great video and impressive project. I'm having errors when ingesting .txt files, but PDFs seem to work fine! Has anyone run a Wireshark capture or similar to monitor the packets being transmitted externally, to ensure that this is 100% local and no data is being leaked? Would be great to get that assurance! Again, great work and thanks.

  • @Psychopatz · 8 months ago

    I mean, you could just turn off your data connection to verify that it's purely local.

  • @pratik1762006s · 1 year ago

    Thank you for the video. Could you share how to use llama for sentiment analysis? I downloaded the model, but somehow it doesn't work, either locally or via the transformers version.

  • @mauryaanurag6622 · 7 months ago

    On which system did you do this?

  • @pratik1762006s · 7 months ago

    @@mauryaanurag6622 RTX 4090

  • @littlesomethingforyou · 4 months ago

    Hi, I am unable to get back a response after I enter a query; VS Code just gets stuck. Is it because I'm running it on my CPU?

  • @md.ashrafulislamfahim3106 · 10 months ago

    Can you please tell me how I can use this project to implement a chatbot with Django?

  • @anushaaladakatti4145 · 6 months ago

    Can we extract and show images with this, given that it responds with text content? How would we do that?

  • @abeechr · 1 year ago

    Seems like setup and installation went well, but when I enter a query, nothing happens. No error, no nothing. Many attempts… Any ideas?

  • @intuitivej9327 · 11 months ago

    Hi, thankful for your sharing; I am experimenting with it. By the way, how can I train the model properly? Would it remember the doc and the conversations we had after restarting the computer? I want to fine-tune it so it can learn the context.

  • @mauryaanurag6622 · 7 months ago

    On which system did you do this?

  • @intuitivej9327 · 7 months ago

    @mauryaanurag6622 System..? Windows 11..? I am a beginner, so I am not sure if this is the right answer..

  • @olegpopov3180 · 1 year ago

    Where is the model-weights update (fine-tune)? Or am I missing something in the video... So you created embeddings and stored them in the DB. How are they connected to the model without a fine-tuning process?

  • @engineerprompt · 1 year ago

    I would recommend watching the localGPT video; the link is in the description. That will clarify a lot of things. In this case, we are not doing any fine-tuning of the model. Rather, we do a semantic search for the most relevant parts of the documents and then give those parts, along with the prompt, to the LLM to generate an answer.
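
    A minimal sketch of that retrieve-then-generate flow (editor's note; mirrors the description above using LangChain pieces that localGPT also uses, but it is not the repo's exact code; the model path is a placeholder):

      from langchain.chains import RetrievalQA
      from langchain.embeddings import HuggingFaceInstructEmbeddings
      from langchain.llms import LlamaCpp
      from langchain.vectorstores import Chroma

      embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
      db = Chroma(persist_directory="DB", embedding_function=embeddings)
      llm = LlamaCpp(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin")  # placeholder

      # The retriever pulls the most relevant chunks; the chain stuffs them
      # into the prompt, so the LLM answers from the docs with no fine-tuning.
      qa = RetrievalQA.from_chain_type(
          llm=llm,
          retriever=db.as_retriever(search_kwargs={"k": 4}),
          return_source_documents=True,
      )
      result = qa("What is the term limit of the US president?")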

  • @teleprint-me · 1 year ago

    Pro tip: don't set max_tokens to the maximum value. It won't work out the way you hope. It's better to use a fractional value that represents a percentage of the maximum sequence length.

  • @Ericzon · 1 year ago

    Please, could you elaborate on this answer?

  • @bobo32756 · 11 months ago

    @@Ericzon Yes, this would be interesting!

  • @antdok9573 · 11 months ago

    Has NO idea why

  • @teleprint-me · 11 months ago

    @Ericzon The sequence length is the model's context window. The max_tokens parameter dictates the maximum sequence length the model can generate as output. When you input a text sequence, it becomes part of the context window, and the model generates a continuation based on that input. What do you think happens if the model is allowed to generate an output sequence as long as its entire context window while also incorporating your input? For those who don't know: in the best case it raises an error; in the worst case it's a bug of your own making that leaves you utterly frustrated.
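
    A minimal sketch of the budgeting the commenter describes (editor's note; the numbers are made up for illustration and not tied to any specific API):

      CONTEXT_WINDOW = 4096  # e.g. Llama-2's sequence length

      def max_new_tokens(prompt_tokens: int, fraction: float = 0.5) -> int:
          """Cap generation at a fraction of the window, and never at more
          than the space actually left after the prompt."""
          remaining = CONTEXT_WINDOW - prompt_tokens
          return max(0, min(int(CONTEXT_WINDOW * fraction), remaining))

      print(max_new_tokens(prompt_tokens=1500))  # -> 2048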

  • @teleprint-me · 11 months ago

    @antdok9573 I was in the ER and then working on PyGPTPrompt while recovering at home. So 🤷🏽‍♂️. Also, if you think the reactance response is a clever mental hack, I would like to disagree with you. I find statements like this as rude as they are pretentious.

  • @brianrowe1152 · 11 months ago

    Mine says bitsandbytes is deprecated, but when I try pip install it says the requirement is already satisfied; then when I run again it says deprecated.. please install.

  • @mibaatwork · 1 year ago

    You talk about CPU and Nvidia; what about AMD? Can you add it as well?

  • @gold-junge91 · 1 year ago

    It would be great to add this to my paperless-ngx.

  • @jennilthiyam1261 · 7 months ago

    Hi. I have tried to chat with two CSV files. The thing is, the model is not performing well; it doesn't even give the correct answer when I ask about a particular value in a row given the keywords. It is not good at all. I am using 70B. Does anyone have any idea how to make it more reliable? It isn't even able to understand the data presented in CSV files.

  • @trobinsun9851 · 10 months ago

    Does it need a powerful machine? A lot of RAM?

  • @kashishvarshney2225 · 1 year ago

    I run it on CUDA, but it still takes 3 minutes to answer. How can I improve its speed? Please, someone reply.

  • @ignacio3714 · 11 months ago

    Hey friend, it seems that you know a lot!! Could you give me a hand with this? I want to know the best way to create your own "chatgpt", but feeding it a specific set of X files, say a small book, and then being able to ask questions about it, while running it locally or in the cloud (not using any third party, basically). What's the best way of doing it? Is it even possible? Like running something like that in Google Colab with files from Google Drive? Thanks in advance, man!

  • @ElectroRestore · 11 months ago

    Thank you so much for this video! It has two of the three components I need: privateGPT and the Llama 2 7B (or 13B) Chat model. My third requirement is to have this running on my Windows 11 server machine, like Text-Gen Web UI, so I can access it over my network from remote Wi-Fi browsers. Can you explain how to get this into the web UI?

  • @engineerprompt · 11 months ago

    Wait for my API video :)

  • @victoradegbite4819 · 11 months ago

    Hi @@engineerprompt, I need your email to discuss a project.

  • @engineerprompt · 11 months ago

    @@victoradegbite4819 Check out the video description :)

  • @Bamseficationify · 10 months ago

    I'm running an Intel i7 CPU. The best I could get was about 570,000 ms, which is almost 10 minutes for a reply. How do I get this number down without getting a GPU?

  • @bhavikpatel7612 · 11 months ago

    Getting an error: 'NoneType' object is not subscriptable. Can you please help?

  • @jerkoviskov5379 · 10 months ago

    Do I need the GPT-4 API to run this on my machine?

  • @ihydrocarbon · 11 months ago

    Seems to work on my Fedora 38 ThinkPad i5, but I am confused as to how to train it on other data. I removed the constitution.pdf file and it still responds to queries with answers that refer to that document...

  • @engineerprompt · 11 months ago

    There is a DB folder; delete it and rerun ingest.py.
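
    A minimal sketch of that reset (editor's note; "DB" is the persist directory name the localGPT repo uses):

      import shutil
      from pathlib import Path

      # Drop the persisted vector store so ingest.py rebuilds it from the
      # current source documents; otherwise embeddings of deleted files linger.
      db_dir = Path("DB")
      if db_dir.exists():
          shutil.rmtree(db_dir)
      # then: python ingest.py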

  • @user-jh8od5jl1c · 11 months ago

    Which python version is used in this video?

  • @swapnilmahure4802 · 1 year ago

    Where is the model saved locally?

  • @kunzelbunt · 10 months ago

    Do I need a specific structure in my .pdf document so that the language model can read the data cleanly?

  • @engineerprompt · 10 months ago

    This works well on text; figures and tables are still an issue.

  • @MrBorkori · 11 months ago

    Hi, thank you for your content, it's very useful! Now I'm trying to run TheBloke/Llama-2-70B-Chat-GGML, but it gives me an error: "error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024. llama_load_model_from_file: failed to load model". Any help will be appreciated.
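
    Editor's note: this particular wk.weight shape mismatch is characteristic of Llama-2-70B's grouped-query attention, which GGML-era llama-cpp-python needed an extra flag for; a sketch under that assumption, with a placeholder model path:

      from llama_cpp import Llama

      # n_gqa=8 tells llama.cpp about 70B's grouped-query attention heads;
      # without it the K/V weight tensors fail the shape check above.
      llm = Llama(model_path="./models/llama-2-70b-chat.ggmlv3.q4_0.bin", n_gqa=8)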

  • @b326yr · 1 year ago

    Well done, but nah, I'm just gonna wait for things to become simpler and have a better interface. Waiting for Microsoft Copilot.

  • @RealEnigmaEngine · 11 months ago

    Is there support for AMD GPUs?

  • @pedroavex · 11 months ago

    Hi bro! Thanks for the video. I have a question: you set the chunks to 1000 and the overlap to 200 in your ingest file. Would I have a higher chance of getting a more complete answer if I raised both parameters, say chunks of 4000 and overlap of 1000? Would it impact performance too much? Thanks!!

  • @dhananjaywithme · 10 months ago

    A bigger chunk size is slower to process and hence requires a more powerful processing unit. These numbers also depend on the purpose of the ingestion. Consider the length of the documents in your dataset: if they are short, use a smaller chunk size to avoid splitting them into too many small pieces. Consider the complexity of the task: if it is complex, such as summarizing a long document, a larger chunk size gives the model more context. And consider the performance of your model: if it is slow, a smaller chunk size improves performance.

  • @test12382 · 6 months ago

    Do the supported document types include HTML?

  • @rahuldayal8406 · 1 year ago

    Great video. I am using an M2 Pro and it answers my queries really slowly; how can I make it quicker?

  • @gamerwalkers · 11 months ago

    Same problem using M1.

  • @user-el7ju9vw6g · 11 months ago

    How can I run it with ROCm and AMD GPUs? I'm a noob here and want to explore this project.

  • @IbrahimAkar · 10 months ago

    Does it download/load the model every time you run the app? If so, there should be a check for whether the model is already downloaded.

  • @engineerprompt · 10 months ago

    It downloads the model only once.

  • @Shogun-C · 1 year ago

    What would the optimum hardware set up be for all of this?

  • @Yakibackk · 1 year ago

    With Petals you can run it on any hardware, even mobile.

  • @HedgeHawking · 1 year ago

    @@Yakibackk Tell me more about Petals, please.

  • @FaridShahidinejad · 11 months ago

    I'm running llama2 on LM Studio, which doesn't require all these convoluted steps, on my Ryzen 9 with 48GB of RAM and a 1080 card, and it runs like molasses.

  • @brookyu2 · 1 year ago

    On a MacBook Pro with M1 Pro, after inputting the prompt nothing is returned; the program runs forever. Anything to do with my PyTorch installation?

  • @gamerwalkers · 11 months ago

    It worked OK for me, but the result was slow; it took 3-5 minutes.

  • @manu053299 · 11 months ago

    Is there a way for localGPT to read a PDF file and extract the data into a specified JSON structure?

  • @engineerprompt · 11 months ago

    Yes, you will have to write code for that.
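
    A minimal sketch of one way to write that code (editor's note; a generic pattern, not localGPT code; the schema, file name, and model path are all illustrative assumptions):

      import json
      from llama_cpp import Llama
      from pypdf import PdfReader

      # Pull raw text from the PDF, then ask the model to fill a fixed schema.
      text = "".join(p.extract_text() or "" for p in PdfReader("invoice.pdf").pages)

      llm = Llama(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin")  # placeholder
      prompt = ('Extract "title", "date", and "total" from the document below '
                "and reply with JSON only.\n\n" + text[:3000] + "\n\nJSON:")
      out = llm(prompt, max_tokens=256)["choices"][0]["text"]
      data = json.loads(out)  # LLM output is not guaranteed valid JSON; validate/retry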

  • @alaamohammad6422 · 11 months ago

    Please, I get an error that the repo llama2 does not exist !!!! 😢😢😢😢

  • @hassentangier3891 · 10 months ago

    I downloaded the code, but it has already changed. How do I integrate the llama model I downloaded, please?

  • @VoltVandal · 1 year ago

    Thank you! Working really great, even on an M1. Just one question: I compiled/ran llama_cpp on my M1 and it is using the GPU; can this somehow also work for your project? [I'm just a beginner, THX]

  • @engineerprompt · 1 year ago

    Yes, under the hood it's running the models on llama_cpp.

  • @VoltVandal · 1 year ago

    Ha, got it. Maybe no problem for pros, but I just had to run: CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir. And now it is using the GPU!

  • @gamerwalkers · 11 months ago

    @@VoltVandal Hi Martin. Could you help me with the steps for how you made it work? I am using an M1 but the model runs very slowly. How fast does it give you a response? For me it ranges between 3-5 minutes.

  • @VoltVandal · 11 months ago

    @@gamerwalkers Well, this is due to the endless swapping (16GB MBP with the 7B model). I got it running directly with the main command from llama.cpp, but if I start embedding etc. I end up with the same 5-minute swap. I'm not a Python or AI pro, so I can't tell you why this happens. I returned to my old Nvidia machine, unfortunately. So at least it is working and not getting GPU errors, but still no fun. 😞 Maybe some AI pro can say more about this issue.

  • @gamerwalkers · 11 months ago

    @@VoltVandal When you say directly with the main command from llama.cpp, were you able to run that command trained on our own PDFs like the video does? Or are you just referring to the default llama prompts that answer queries without our own PDF data?

  • @DarylatDarylCrouse · 11 months ago

    The code shown in the video is not the same as what is in the repo right now. At 13:17, for example, I can't find any of that in the current repo. Please help.

  • @engineerprompt · 11 months ago

    Yes, it's changing as I try to add more features and make it better. The model definition has been moved to constants.py. I created a new video on the localGPT API; the changes are outlined in that video: kzread.info/dash/bejne/fXamtpKcqtXacdY.html

  • @Weltraumaff3 · 11 months ago

    Maybe a stupid question and I'm just missing what --device_type I should enter, but I'm struggling a bit with my AMD 6900XT. I don't want to use my CPU, and PyTorch doesn't seem to work with OpenCL, for example. Has anyone got an idea? Cheers in advance.

  • @engineerprompt · 11 months ago

    In this case, cpu :)

  • @manu053299 · 11 months ago

    Also, is there a localGPT that could translate a Word doc into another language?

  • @engineerprompt · 11 months ago

    Depending on the model, some will support other languages.

  • @varunms850 · 6 months ago

    Can you give your system specs: RAM, VRAM, GPU, and processor?

  • @tvbox6533 · 11 months ago

    Why not an AMD GPU with ROCm?

  • @extempore66 · 7 months ago

    Hello everyone. I'm using a local (cached) llama2 model with LanceDB as an index store, ingesting many PDF files and using a fairly standard prompt template ("Given the context below {context_str}. Based on the context and not prior knowledge, answer the query. {query_str} ..."). It is very frustrating getting different answers for the same question. Any similar experiences? Thanks.

  • @engineerprompt · 7 months ago

    Have you looked at the source documents returned in each case? That would be a good starting point. Also, potentially reduce the temperature of the LLM.

  • @extempore66 · 7 months ago

    @@engineerprompt Thank you for the prompt response. I have looked at the documents. I also ran a retriever evaluator (faithfulness, relevancy, correctness...); sometimes it passes, sometimes it does not. Still trying to wrap my brain around why the answers are so different while the nodes (vectors) returned are consistently the same.
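
    A minimal sketch of the temperature suggestion above (editor's note; assumes llama-cpp-python; the model path is a placeholder):

      from llama_cpp import Llama

      llm = Llama(model_path="./models/llama-2-7b-chat.ggmlv3.q4_0.bin")
      # temperature=0 makes decoding greedy, so identical retrieved context
      # yields identical answers; remaining variation must come from retrieval.
      out = llm("Given the context ... answer the query ...",
                temperature=0.0, max_tokens=256)
      print(out["choices"][0]["text"])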
