Fine-Tuning Meta's Llama 3 8B for IMPRESSIVE Deployment on Edge Devices - OUTSTANDING Results!

Science & Technology

This video demonstrates an innovative workflow that combines Meta's open-weight Llama 3 8B model with efficient fine-tuning techniques (LoRA and PEFT) to deploy highly capable AI on resource-constrained devices.
We start with a 4-bit quantized version of the Llama 3 8B model and fine-tune it on a custom dataset. The fine-tuned model is then exported to the GGUF format, optimized for efficient deployment and inference on edge devices via the GGML library.
Impressively, the fine-tuned Llama 3 8B model accurately recalls and generates responses based on our custom dataset when run locally on a MacBook. This demo highlights the effectiveness of combining quantization, efficient fine-tuning, and optimized inference formats to deploy advanced language AI on everyday devices.
Join us as we explore the potential of fine-tuning and efficiently deploying the Llama 3 8B model on edge devices, making AI more accessible and opening up new possibilities for natural language processing applications.
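
For reference, here is a minimal sketch of the fine-tune-and-export workflow in the spirit of the Unsloth Colab linked below. The dataset file, hyperparameters, and output names are placeholders, not the exact values used in the video:

    import torch
    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer
    from unsloth import FastLanguageModel

    # Load a pre-quantized 4-bit Llama 3 8B checkpoint.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Attach LoRA adapters (PEFT) so only a small set of weights trains.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )

    # "train.jsonl" is a placeholder custom dataset with a "text" field
    # containing formatted prompt/response pairs.
    dataset = load_dataset("json", data_files="train.jsonl", split="train")

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            max_steps=60,
            learning_rate=2e-4,
            bf16=torch.cuda.is_bf16_supported(),
            fp16=not torch.cuda.is_bf16_supported(),
            output_dir="outputs",
        ),
    )
    trainer.train()

    # Merge the adapters and export to 4-bit GGUF for llama.cpp / Ollama.
    model.save_pretrained_gguf("llama3-custom", tokenizer, quantization_method="q4_k_m")

The resulting GGUF file can then be loaded by GGML-based runtimes such as llama.cpp, or registered with Ollama via a Modelfile, for local inference.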
Be sure to subscribe to stay up-to-date on the latest advances in AI.
My Links
Subscribe: @scott_ingram
X.com: @scott4ai
GitHub: github.com/scott4ai
Hugging Face: huggingface.co/scott4ai
Links:
Colab Demo: colab.research.google.com/dri...
Dataset: github.com/scott4ai/llama3-8b...
Unsloth Colab: colab.research.google.com/dri...
Unsloth Wiki: github.com/unslothai/unsloth/...
Unsloth Web: unsloth.ai/

Comments: 24

  • @israelcohen4412 · 26 days ago

    So I never post comments, but the way you explained this was by far the best I have seen online. I wish I had found your channel 8 months ago :) Please keep posting videos; your explanations are very well thought out and put together.

  • @andrew.derevo · 5 days ago

    Good stuff sir, thanks a lot 🙌

  • @petroff_ss · 11 days ago

    Thank you! You have a talent for explaining and planning a workshop! Thank you for your work!

  • @ratsock · 25 days ago

    Absolutely fantastic! Really appreciate the detailed, clear breakdown of concrete steps that let us drive value, rather than the clickbait hype train that everyone else is on.

  • @tal7atal7a66 · 25 days ago

    I like the thumbnails, the choice of topics, the way things are explained, and the man who explains them. Nice channel, very valuable info ❤

  • @gustavomarquez2269 · 25 days ago

    You are amazing! This is the best explanation of this topic. I liked it and just subscribed. Thank you very much!!!

  • @scott_ingram · 25 days ago

    Thank you so much for the kind words and for subscribing, I really appreciate it! I'm so glad you found the video helpful in explaining how to fine-tune LLaMA 3 and run it on your own device. It's a fascinating topic and technology with a lot of potential. I'm looking forward to sharing more content on large language models and AI that you'll hopefully find just as valuable. Stay tuned!

  • @andrepamplona9993 · 25 days ago

    Super, hyper fantastic! Thank you.

  • @Danishkhan-ni5qf · 15 days ago

    Wow!

  • @EuSouAnonimoCara · 18 days ago

    Awesome content!

  • @RameshBaburbabu · 26 days ago

    Thank you so much for sharing that fantastic clip! It was really informative. I'm currently looking into fine-tuning a model with my ERP system, which handles some pretty complex data. Right now, I'm creating dataframes and using panda-ai for analytics. Could you guide me on how to train and make inferences with this row/column data? I really appreciate your time and help!

  • @scott_ingram · 25 days ago

    Thanks for your question and for watching the video. I'm glad you found it informative! Your approach largely depends on your use case and the kind of insights you're looking to derive from your data. Generally, you'll want to follow these steps to train a model with complex data:

    1. Decide how you plan to interact with the model. For instance, maybe you're doing text generation; natural language understanding tasks like sentiment analysis, named entity recognition, and question answering; text summarization; or domain-specific queries (legal, medical, corporate).

    2. Choose a model with strong benchmarks for the specific requirements of your task, the nature of your data, and the desired output format. A model is more likely to fine-tune well if the base model's capabilities are already strong for your intended task. Consider factors like model performance, computational resources, and the availability of pre-trained weights for your specific domain or language.

    3. Prepare and preprocess your dataframes: remove or fill missing values, encode variables numerically, and normalize the data. The cleaner the data, the better the training.

    4. Split the data into a training set and a validation set. The validation set holds data the model hasn't trained on, so you can see how it performs on unseen data.

    5. Fine-tune with your dataset, test the model, then iterate: tweak the data, add more data, try different training parameters, even try different models.

    Hope this helps guide you in your endeavor!
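
    As a concrete illustration of steps 3 and 4 above (a minimal pandas sketch, not code from the video; the file name erp_export.csv and the columns item, category, amount, and notes are hypothetical placeholders):

        import pandas as pd
        from sklearn.model_selection import train_test_split

        # Hypothetical ERP export; adjust the path and column names.
        df = pd.read_csv("erp_export.csv")

        # Step 3: clean the data. Drop rows missing required fields,
        # fill remaining gaps, and normalize a numeric column.
        df = df.dropna(subset=["item", "category"])
        df["notes"] = df["notes"].fillna("")
        df["amount"] = (df["amount"] - df["amount"].mean()) / df["amount"].std()

        # Turn each row into a prompt/response pair so a language model
        # can be fine-tuned on it as plain text.
        df["text"] = (
            "### Question: What category is " + df["item"] + "?\n"
            "### Answer: " + df["category"]
        )

        # Step 4: hold out a validation set of unseen rows.
        train_df, val_df = train_test_split(df, test_size=0.1, random_state=42)
        train_df[["text"]].to_json("train.jsonl", orient="records", lines=True)
        val_df[["text"]].to_json("val.jsonl", orient="records", lines=True)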

  • @15ky3 · 24 days ago

    Amazing video, thanks for the best explanation I've ever seen on YouTube. Could you also please make a video on how to fine-tune the Phi-3 model? 🙏

  • @scott_ingram · 24 days ago

    Great suggestion! I will look into that.

  • @andrew.derevo · 5 days ago

    Do you have any experience with fine-tuning this model on non-English data? Any suggestions for good multilingual open-source models? 🙏

  • @15ky3 · 23 days ago

    Is the output from Ollama on your MacBook in real time, or did you speed it up in the video? On my 2014 iMac it is significantly slower. It's about time for a new one. What are the technical specifications of your Mac?

  • @scott_ingram · 23 days ago

    Except for the download, which I sped up significantly, everything in the terminal was shown in real time. The demo was done on a MacBook Pro M3 Pro Max. YMMV with other hardware.

  • @azkarathore4355 · 14 days ago

    Hi, I want to fine-tune Llama 3 for English-to-Urdu machine translation. Can you guide me regarding this? The dataset is OPUS-100.

  • @madhudson1 · 26 days ago

    Rather than using Google Colab + compute for training, what are your thoughts on using a local machine + GPU?

  • @guyvandenberg9297 · 26 days ago

    Good question. I am about to try that. I think you need an Ampere-architecture GPU (A100 or RTX 3090). Scott, thanks for a great video.

  • @guyvandenberg9297 · 26 days ago

    Ampere architecture is needed for BF16 as opposed to FP16, as per Scott's explanation in the video.

  • @scott_ingram · 25 days ago

    Thanks for your question! The notebook is designed to do the training on Colab, but you can run it locally if you have compatible hardware; I haven't tested it locally, though. The RTX 3090 does support brain float. Install Python, then set up a virtual environment:

        python3 -m venv venv
        source venv/bin/activate

    Next, install and start the Jupyter notebook service:

        pip install jupyter
        jupyter notebook

    That runs a local Jupyter notebook service; select the Python 3 kernel when you open the notebook. Then test GPU availability:

        import torch
        print(torch.cuda.is_available())
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        print(f"Using device: {device}")

    Here's how you would create a tensor with PyTorch on the RTX 3090 and tell it to use brain float:

        tensor = torch.randn(1024, 1024, dtype=torch.bfloat16, device=device)

    Some cells in the notebook won't run correctly, such as the first cell that sets up text wrapping (not relevant for training); that's designed specifically for Colab. There may be other compatibility issues, but I haven't tested it running locally. This should get you started to see whether your GPU could potentially work. Let me know how it works out!
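
    As a quick related check (a small PyTorch addition, not from the video): Ampere-class GPUs such as the RTX 3090 and A100 report bfloat16 support directly, so you can verify before training:

        import torch
        # True on Ampere and newer GPUs; False on older cards.
        print(torch.cuda.is_bf16_supported())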

  • @PreparelikeJoseph · 19 days ago

    @scott_ingram I'd really like to get some AI agents running locally on a self-hosted model. I'm hoping two RTX 3090s can combine just via PCIe and load a full 70B model.
