How to Run LLAMA 3 on your PC or Raspberry Pi 5

Science & Technology

Meta (i.e. Facebook) has released Llama 3, its latest Large Language Model. It comes in two sizes, an 8 billion parameter model and a 70 billion parameter model. In this video I look at how you can run the 8 billion parameter model on your PC or even on a Raspberry Pi 5.
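
For readers who want to jump straight in, a minimal command-line sketch of one route (assuming Ollama, one of the tools that comes up in the comments below; LM Studio is a GUI alternative) on Linux or a Raspberry Pi 5:

    # install Ollama, then pull and chat with the 8 billion parameter Llama 3 model
    curl -fsSL https://ollama.com/install.sh | sh
    ollama run llama3:8b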
---
Let Me Explain T-shirt: teespring.com/gary-explains-l...
Twitter: / garyexplains
Instagram: / garyexplains
#garyexplains

Comments: 52

  • @sonofabippi · 26 days ago

    This was such a pleasant, easy-to-digest video about Llama 3 that the next video started playing and I was like, "wait, that dude was awesome. I need to go back, do all the things (like, subscribe, bell)!"

  • @maxdiamond55 · a month ago

    Thanks Gary, great video, nice and direct. Exactly what's needed to get up and going; can't wait to test this.

  • @EndaFarrell · a month ago

    I _did_ get set up after watching this: 15 minutes later I was up and running with Llama 3 8B in LM Studio. Thanks Gary!

  • @coolParadigmes · a month ago

    Amazing that such a huge LLM can run on a Raspberry Pi 5! By the way, for the towel question (3 towels take 3 hours to dry; how long for 9 towels?), it correctly describes the reasoning to use [it depends on the towels' characteristics (assumed identical) and the drying process], but it still gets it wrong 🙂 It assumes 1 hour/towel, which is only valid in a dryer paradigm with a drying capacity of precisely one towel per hour; in the sun, with plenty of space, the time is independent of the number of towels, so 3 hours.

  • @user-yp2ps3gn3x · a month ago

    So why are you answering 3 hours? Is it because you batched the towels into 3-towel batches? But why do that? Lay them all out together on a sunny day, and ONE hour later they will be dry.

  • @coolParadigmes · a month ago

    @@user-yp2ps3gn3x I assumed that the original question ("3 hours for 3 towels to dry") implied a standard outdoor sunny garden, not least because 3 hours inside a dryer for 3 towels is incredibly long. But of course the original setting (3 hours for 3 towels) could involve a malfunctioning dryer, super-wet towels, etc. So unless we assume the most probable setting for the two situations (outside in both cases), we don't have enough information to extrapolate the time for 9 towels.

  • @NordicSemi · a month ago

    Love your t-shirt, Gary!

  • @mikeburke7028 · a month ago

    Thanks, definitely going to give it a try.

  • @GaryExplains · a month ago

    Have fun!

  • @MRLPZ359 · 5 days ago

    Interesting video. Any experience with other models on the Raspberry Pi, maybe SLMs like Phi-3? Also on smaller Raspis: is there a chance to run this on a Raspi Zero 2?

  • @andueskitzoidneversolo2823 · a month ago

    Never heard of the towel question. I like it. Like asking if a pound of feathers on Jupiter also weighs the same on Earth.

  • @chrisarmstrong8198 · a month ago

    Did it give the correct answer for the towels question? It looked like it was about to say 9 hours.

  • @peanutnutter1 · a month ago

    Looks like it

  • @JoelJosephReji · a month ago

    yup, it got derailed towards the end

  • @An.Individual · a month ago

    You cannot say 9 hours is wrong. You need more information before you start telling LLMs they are wrong.

  • @Norman_Fleming · a month ago

    @@An.Individual or anyone actually. The question is incomplete.

  • @aleksandardjurovic9203 · a month ago

    Thank you!

  • @peterfrisch8373 · a month ago

    Gary, would a Raspberry Pi run Llama 3 faster with a Google Coral TPU?

  • @xuldevelopers · a month ago

    When recommending software to install on users' computers, how do you verify its source? Have you read the terms and conditions? What do you know about that company?

  • @ManthaarJanyaro · a month ago

    Can I feed my own data to it, and then ask questions about that data?

  • @Wobbothe3rd · a month ago

    Could a Mac with sufficient RAM run the full model?

  • @vasudevmenon2496 · a month ago

    Any reason why llamafile wasn't used, since you can swap out the GGUF file and it works on Nvidia and M1, with experimental support for ROCm? DuckDuckGo AI Chat is quite good and didn't need a login. Claude felt a bit more natural than OpenAI's GPT-4. Is Llama 3 a new foundation model that can perform on par with higher-parameter models in a condensed form?

  • @GaryExplains · a month ago

    Having to swap out one file for another defeats the whole purpose of llamafile.

  • @vasudevmenon2496 · a month ago

    @@GaryExplains But you can use different models with the same executable.

  • @GaryExplains · a month ago

    Yes, but the whole point of llamafile is that you just download one file; that is its unique feature. Otherwise use LM Studio or Ollama. Llamafile is based on llama.cpp, as is LM Studio, so you gain nothing by fiddling around with the internals of llamafile. Just use an app that is designed to support multiple models. For example, LM Studio also has experimental support for ROCm, because they use the same underlying tech.

  • @vasudevmenon2496 · a month ago

    @@GaryExplains Oh wait, I think you're mistaking it for the generic 4GB file covered in your video. I'm just using their small executable, which needed to be renamed to .exe, with the proper GGUF file and other parameters. So

  • @GaryExplains · a month ago

    OK, but I still don't see how that is better than LM Studio or Ollama?
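
For context, the single-file flow being discussed in this thread looks roughly like this; a sketch only, and the filename is illustrative (actual llamafile releases use their own names):

    # hypothetical filename; a llamafile bundles the model weights and the llama.cpp runtime into one executable
    chmod +x Meta-Llama-3-8B-Instruct.Q4_0.llamafile
    ./Meta-Llama-3-8B-Instruct.Q4_0.llamafile    # runs locally and serves a chat UI in the browser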

  • @JNET_Reloaded · a month ago

    ollama run dolphin-llama3

  • @AlwaysCensored-xp1be · a month ago

    Thanks, will try that one. I don't like some of the answers from Llama 3.

  • @ex2uply · a month ago

    Hey, which RAM version of the Raspberry Pi are you using?

  • @GaryExplains · a month ago

    Pi 5 with 8GB.

  • @MrStevemur · a month ago

    Sounds fun, but I'd like it to be able to analyze input from sources other than what I type into a chat window, e.g., "Read report_final.docx and generate a summary." Are we there yet?

  • @Vinnye9 · 27 days ago

    GPT-4 can do it, but I haven't seen any local options. Sorry.

  • @G4GUO · a month ago

    Are those examples using a quantized version of the model?

  • @GaryExplains · a month ago

    Yes

  • @Nik.leonard · a month ago

    Ollama by default uses q4_0. I'm not sure why it doesn't use the arguably better, and only slightly larger, q4_K_M.

  • @G4GUO · a month ago

    @@GaryExplains I thought that might be the case. I am running the 8B model on an RTX4090 and wondered how you managed to get it to fit!
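
If you want a specific quantization rather than Ollama's default q4_0 mentioned above, the model library publishes tagged variants. A minimal sketch, with the tag name being illustrative (check the Ollama library listing for the exact tags available):

    # illustrative tag; pulls a q4_K_M build of the 8B instruct model instead of the default q4_0
    ollama run llama3:8b-instruct-q4_K_M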

  • @timr.2257 · a month ago

    What would be the specs required for the 70 billion parameter model?

  • @tonysheerness2427 · a month ago

    Money.

  • @Nik.leonard · a month ago

    Quantized to 4-bit it requires approx. 40GB of VRAM, putting it only in the realm of dual RTX 3090/4090s or a top-tier Mac M series (Max or Ultra with 64GB of RAM). Sure, you can run it in CPU/GPU mixed mode if you have 64GB of RAM, but it will generate only 1 to 2 tokens per second. The full unquantized model weighs 141GB, so it's nearly impossible to run in a home lab. For testing the model, you can use the Groq beta.

  • @Wobbothe3rd · a month ago

    @@Nik.leonard Thanks, was wondering exactly this.
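
As a back-of-envelope check of the ~40GB figure above: 4-bit weights are half a byte per parameter, so 70 billion parameters come to roughly 35GB before the KV cache and runtime overhead are added. The same arithmetic as a shell one-liner:

    # 70 (billion params) * 4 (bits) / 8 (bits per byte) = 35 GB of weights alone
    echo "$((70 * 4 / 8)) GB"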

  • @iainmalcolm1 · a month ago

    hehe works a treat

  • @DryRoastedNutz · a month ago

    Will a dual-CPU, 40-core Xeon server with 128GB of RAM run the 70 billion parameter version?

  • @DryRoastedNutz · a month ago

    Also, has the extreme liberal bias been removed?

  • @moldytexas · a month ago

    AFAIK yes. It takes around 20-40GB of storage; it's just that inference will be stupidly slow. If you have access to some beefy GPUs, offload onto those, depending on VRAM of course.
