Running a Hugging Face LLM on your laptop

Ғылым және технология

In this video, we'll learn how to run a Large Language Model (LLM) from Hugging Face on our own machine.
Blog post: www.markhneedham.com/blog/202...
Notebook: github.com/mneedham/LearnData...
Other videos showing how to run LLMs on your own machine
• Running Mixtral on you...
• LLMs on your own machi...
• Running Mistral AI on ...
• Hugging Face GGUF Mode...

Пікірлер: 82

@elmino1910 ай бұрын
You explained completely and perfectly without wasting the audience's time! well done
@learndatawithmark
8 ай бұрын
Thanks!
@headshorts_YT2 ай бұрын
Awesome! Thanks for this video.
@NappoAvanti4 ай бұрын
Thanks for this video!!
@alexandrerodtchenko60992 ай бұрын
Super video!
@armantech5926Ай бұрын
That's Great! Thank you!
@piyushharsh019 ай бұрын
Super helpful and easy to understand!
@learndatawithmark
8 ай бұрын
Glad it was helpful :)
@shivamroy177510 ай бұрын
This was an extremely informative video. Really appreciate it.
@learndatawithmark
10 ай бұрын
Thanks, glad you enjoyed it!
@flaviocorreia44623 ай бұрын
Thank you very much, you helped me a lot
@enceladus966 ай бұрын
this video saved my day
@MitulGarg3Ай бұрын
Absolutely wonderful video! to the point and well explianed! way to go! thanks a lot!
@learndatawithmark
16 күн бұрын
Thanks - very kind of you :D
@diln516 сағат бұрын
i personally found disabling your wifi from a jupyter notebook to be bad ass
@knotfoursail640410 ай бұрын
Super helpful 👍
@knotfoursail6404
10 ай бұрын
Random idea, but a video on how to run an embeddings model on a laptop would be really cool 😀 Could even combine embeddings + text2text for more specific answers. Or even t5_3b + selenium to create something similar to bing chat. Anyway, wish you luck on KZread 😊
@learndatawithmark
9 ай бұрын
Sorry, I didn't see this reply! I've got a notebook with that idea sketched out, so I'll create a video for that soon. On holiday at the moment, but will do it when I get back home!
@wasgeht240912 күн бұрын
thx
@user-ph5is3hi9c4 ай бұрын
thanks Mark, very nice video, super clearly put! could you please suggest, what could be the reason if (when trying to set the wifi off) the output of those lines of code is "ModuleNotFoundError: No module named utils"?
@learndatawithmark
4 ай бұрын
utils should be referring to this file - github.com/mneedham/LearnDataWithMark/blob/main/llm-own-laptop/notebooks/utils.py - so in theory that's independent of WiFi connectivity. If it can't find that module you could copy/paste those functions into the notebook and use them like that.
@viniciustsugi80079 ай бұрын
Awesome content, love your channel! Video is very informative and concise, thanks. As a friendly suggestion, you might want to give a couple of secs at the end for the video for slow people like me to hit that well deserved like button :)
@learndatawithmark
9 ай бұрын
Thanks for your kind words! Let me see if I can figure out a good way to implement your suggestion 🙂
@user-du8hf3he7r4 ай бұрын
An API key is not needed if the model is downloaded and run locally.
@itspaintosee
3 ай бұрын
So long as you have a behemoth of a machine. 16GB Ram = 100% memory usage 😭😂
@jayo3074
2 ай бұрын
I don't think anyone can afford an expensive laptop lol
@nikhilesh255
Ай бұрын
Do you know how to run it on live servers!! How to get?
@mikiallen77333 ай бұрын
thanks sir , however I want to know 1- how one can integrate specific set of models (pre-trained) ones in to Rstudio ? so that one can simply run examples on data "proprietary in my case " locally within R 2- is there a way to ask the inference API for tasks different from the typical sentiment classification of text for example "multi-entity tagging" , "modalities" ....etc your input is highly appreciated
@dimitripetrenko4386 ай бұрын
Hi Mark! This video is very helpful, may I ask do you think fastchat can be used in combination with Qdrant for RAG? Thank you in advance
@learndatawithmark
6 ай бұрын
Yeh you can could combine it with any database to do RAG.
@l501l501l7 ай бұрын
Hi Mark, great video. May I know your notebook and the configuration? I’m thinking switching to MacOS to play around with Gen AI.
@learndatawithmark
6 ай бұрын
I'm using the latest version of Jupyter Lab and I have it set to dark mode with pretty much every one of the views hidden so that I can use as much of the screen as I can. Not sure if that answered your question, so feel free to follow up!
@MarxTech_DIY6 ай бұрын
Hey, great tutorial! I also found your blog on this and followed that, but I always get this error: Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. This is my first time experimenting with LLMs, so any assistance would be greatly appreciated.
@learndatawithmark
6 ай бұрын
Oh I'm not sure about that error - I haven't seen that one before. Since I made this video I've been playing around with another tool called Ollama which I found easier to use. It might be worth giving that a try to see if that works for you? kzread.info/dash/bejne/gHqbp8mqpcSTlso.html
@user-lx1th5gr5z5 ай бұрын
Thank you! I finally downloaded a big llama model.. lol 😹
@learndatawithmark
5 ай бұрын
Winning!
@wadejohnson45425 ай бұрын
What is the configuration of your local environment
@Shivam-bi5uo4 ай бұрын
i want to work with a model that is tagged as 'text-generation' how do i run it?
@RedPythonАй бұрын
What is the editor you are using on localhost ?
@learndatawithmark
Ай бұрын
I'm using a Jupyter notebook in the video
@radoslavkoynov3223 ай бұрын
I am getting an error/ info log from transformers (twice) stating "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained." The model then generates only a bunch of whitespace, no matter the input. I have followed through your steps and made sure the files were downloaded at the expected location. The behavior occurrs both with and without setting legacy=False.
@learndatawithmark
3 ай бұрын
Does it work after you see that message?
@kyledavelaar455
2 ай бұрын
@@learndatawithmark getting the same error and hang when running in colab or locally. seems like the pipeline("my query") never resolves
@mokh16113 ай бұрын
I'm probably missing something, but where are you using the downloaded files? you are entering model_id in .from_pretrained(), how is it finding/using the downloaded model?
@learndatawithmark
3 ай бұрын
It's reading from the ~/.cache directory. So it constructs a file path based on that directory & the model id
@SudhakarVyasКүн бұрын
Thanks Mark for this video. A quick question- Is this safe to pass some PII data to one of the open source hugging face models that require the hugging face API token ? If No, how can this be resolved in deployment so that there is no risk of data leakage ? Please guide through this.
@learndatawithmark
20 сағат бұрын
It depends. If you are passing your HF API token because you're using the HF inference endpoint then your data is getting sent to the HF API. If you're passing it because you're downloading a model that requires token auth then your data will only be local to where you run the model that you download.
@Cynadyde7 ай бұрын
If you're getting a wacky error trying to perform `AutoTokenizer.from_pretrained(model_id, legacy=False)`, do pip install protobuf==3.20.1 and restart the jupyter kernel
@learndatawithmark
6 ай бұрын
Good tip! I get that error somewhat randomly but never quite figured out the combination steps that result in it happening!
@Haui1985m6 ай бұрын
Hi, wich webinterface you use for python scripts? I want to use it to :)
@learndatawithmark
6 ай бұрын
This is Jupyter Lab - jupyter.org/
@InderasteinАй бұрын
hey um, i don't know if you'll read this in time, but I have a problem: pytorch_model.bin: 0%| | 0.00/13.5G [00:00
@learndatawithmark
Ай бұрын
Hard to know exactly why - maybe connectivity with Hugging Face or maybe your internet or maybe the download tool?! You could try going to Hugging Face directly and click through to files and download them directly to see if it helps.
@OmarAli195917 ай бұрын
sorry i'm just starting with this, the code you're writing in the beginning, what is the website called?
@learndatawithmark
7 ай бұрын
Do you mean this one? huggingface.co/
@rodriguezmj114 ай бұрын
Has anyone built a GUI for this?
@darylallen24857 ай бұрын
1:03 - Thanks for this clarification. I'd done quite a bit of Google searching and scouring the Hugging Face website for this information. I found nothing of value. I'm a computer enthusiast / gamer and not a professional machine learning engineer. Since embarking on running an LLM locally on my previous daily use desktop, I've noticed its near impossible to find a model's resource needs. GPT4 says a 7b parameter model would consume about 48 GB memory. I asked it what size model would fit in my 12 GB Nvidia 3060, it said about 3.2 billion. My question for you is, why is it that everyone in this space who seems to offer a model (or talk about them) never includes something like a system requirements descriptor? Is it one of those situations where, if you need to ask, you probably don't have enough resources? Thanks for any insight you can give on this phenomenon.
@learndatawithmark
7 ай бұрын
My impression is that most of the models being created are assuming that you have insanely good GPUs to run them on! Since I created this video, there's been a lot of work done by a guy called TheBloke on Hugging Face to 'quantise' the models, which effectively means that the amount of resources required is reduced, but the quality of the model is slightly reduced too. I've found those models work a lot better on my laptop. The Bloke is using a format called GGUF, which is kind of a defact format for LLM models. I made a video showing how to run one of his models on my machine - kzread.info/dash/bejne/aXZ8lqVvXau2YZc.html. That video uses a tool called Ollama which works on Linux/Mac - kzread.info/dash/bejne/gHqbp8mqpcSTlso.html There is also another library called CTransformers which lets you choose whether to run models on the GPU or CPU. I've found the 7B parameter quantised models work reasonably well even on the CPU. I should probably create a video about that I guess! But in the mean time, this is the link - github.com/marella/ctransformers
@darylallen2485
5 ай бұрын
@@learndatawithmark thanks!
@sillystuff62473 ай бұрын
Huggingface is a single point of failure. An index of alternative download URLs for LLMs is needed for when Huggingface is down, such as now (Feb 29, 0200 GMT)
@user-sm1re8xm5p25 күн бұрын
under the "..." in huggingface there is a "clone this repo" which copies all stuff onto your PC. seems simpler to me.
@learndatawithmark
25 күн бұрын
Probably works for this one but sometimes there will be multiple different versions of the same model and it'll take up all your free space if you do that!
@insideworld41222 ай бұрын
sir if wifi is on then they model is working properly or not?
@learndatawithmark
2 ай бұрын
Yes it should work without wifi - but you will need a connection to the internet to download the model.
@timjx36758 ай бұрын
Great vid, however I’m getting a value error, failure to import transformers error even though I used pip to do that, wondering if it’s a python version issue, I’m using 3.10, wonder if anyone has any ideas ? Thx
@learndatawithmark
8 ай бұрын
Can you share a script with all the code you ran and I'll try to reproduce?
@CGATTMUSIC
6 ай бұрын
use 3.9 its more stable
@The_Little_One_Of_Darkness4 ай бұрын
Hi, i try to find someone who uses GGUF directly and locally without using a .bin to launch it because I would like to launch it under python, is this possible? Or should I do something else?
@learndatawithmark
4 ай бұрын
You can do this using CTransformers like I did in this video - kzread.info/dash/bejne/hWaoys-wlNW_oqw.html I think you might even be able to do it with HuggingFace transformers, but I haven't tried it myself.
@The_Little_One_Of_Darkness
4 ай бұрын
@@learndatawithmark if one day you make a video on this, I would like to see it, in fact what I would have liked was to discuss with the model directly with python without going through any interface and to give it a personality with json like we have could do it with webui (but without webui) I tried various methods and honestly I find so little explanation. I had the idea of making my own bot as I saw in "wifu" mode in the sense that it is totally customizable and we give it a personality with a long term memory. The basic idea was to have a small model just for me. I'm just frustrated to see bots that don't even remember talking to us 2 seconds before. xD
@learndatawithmark
4 ай бұрын
@@The_Little_One_Of_Darkness it sounds like you want to keep the history of the chat messages between you and the LLM? I showed how to do this in memory on this video using Ollama, but it can be adapted to another approach - kzread.info/dash/bejne/f51-s8GznNGoldI.html. I can across a tool called MemGPT which I think attempts to solve this problem, but I haven't tried it yet - memgpt.ai/
@marufakamallabonno1468 ай бұрын
How can I use this downloaded model next time ?
@learndatawithmark
7 ай бұрын
It will already be there so if you try to use it again there won't be any need to download it
@mohsenghafari76523 ай бұрын
hi. please help me. how to create custom model from many pdfs in Persian language? tank you.
@paulohss22 ай бұрын
So many steps missing in this video...
@AwkwardTruths2 ай бұрын
Pinned
@artusanctus9973 ай бұрын
Seems unnecessarily complex... isn't there like an online space to use this stuff without having to write a bunch of stuff just to download it?
@learndatawithmark
3 ай бұрын
Yeh I think with a bunch of the models you're able to run them on the Hugging Face website on the right hand side of the page. And then in general there are many services that offer APIs that you can call. The approach describe in this video is only for if you don't want to use those services.
@TheFrankyguitar7 ай бұрын
When I run this: "os.environ.get("HUGGING_FACE_API_KEY")" I get"None". Is it normal?
@TheFrankyguitar
7 ай бұрын
I guess I need to set the HUGGING_FACE_API_KEY variable to my token beforehand.
@learndatawithmark
7 ай бұрын
You would need to set that environment variable before running your Python environment otherwise yeh it'll be none