Gemma 2 - Google's New 9B and 27B Open Weights Models

Science & Technology

Colab Gemma 2 9B: drp.li/6LuJt
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: drp.li/dIMes
👨‍💻 GitHub:
github.com/samwit/langchain-t... (updated)
github.com/samwit/llm-tutorials

Comments: 40

  • @olimiemma · 4 days ago

    I'm so glad I found your channel, by the way.

  • @jbsiddall · 1 day ago

    Great video, Sam! This is my first look at Gemma 2.

  • @SwapperTheFirst · 4 days ago

    Fantastic news and a great overview, Sam.

  • @Nick_With_A_Stick · 4 days ago

    27B takes up 15GB at 4 bits ❤. Although Llama 3 8B beats both of them on HumanEval with 62.2, which Google conveniently left out of the chart in the paper. Then again, Llama 8B's HumanEval drops to 40 at 4 bits, and there was a new code benchmark (BigCodeBench, I think) I saw on Twitter that showed Llama 3 70B actually sucks at coding and was *potentially* *allegedly* trained on HumanEval, probably by accident given an 8T-token dataset 🤷‍♂️. Elo doesn't really lie, with the exception of GPT-4o: people just like it because it makes pretty outputs. The way it formulates its outputs is really visually appealing (for example, it uses a ton of markdown, like lines separating the title, and big and small fonts at certain times), which 100% launched its scores to the moon, because Claude Sonnet 3.5 is significantly better, given that my main use case is coding.

  • @blisphul8084 · 3 days ago

    The IQ1_S quant is only 6GB, so it can fit on an 8GB GPU like the RTX 3060 Ti. No need for H100s here. Though based on HumanEval, I'll stick with Dolphin Qwen2 for now.
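
    A quick sanity check on the size figures in the two comments above (a back-of-the-envelope sketch; the bits-per-weight values are rough averages for llama.cpp-style quants, not exact numbers):

    ```python
    # Rough file size of a quantized model: params * bits_per_weight / 8 bytes.
    # KV cache and runtime buffers add more on top, growing with context length.
    def quant_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
        """Back-of-the-envelope size in GB for a quantized model."""
        return n_params_billion * bits_per_weight / 8

    print(quant_size_gb(27, 4.5))   # ~15.2 GB -> the "15GB in 4 bits" claim
    print(quant_size_gb(27, 1.56))  # ~5.3 GB  -> close to the quoted 6GB IQ1_S figure
    print(quant_size_gb(9, 4.5))    # ~5.1 GB  -> roughly Gemma 2 9B's 4-bit download
    ```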

  • @Nick_With_A_Stick · 3 days ago

    @@blisphul8084 For coding I like Codestral with Continue.dev, a VS Code extension, but then again every model sucks in comparison to Sonnet 3.5's one-shot coding ability. For some reason it actually kind of loses performance at multi-shot: if you are in a conversation with it editing long code, it occasionally messes up, but it rarely ever makes an error if you start a new conversation and re-ask the question with the code. Side note: I wish Eric had done his fine-tuning on top of the Qwen instruct model using LoRA; it would help combine the strengths of both datasets.
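
    For readers wondering what that LoRA-on-an-instruct-model idea looks like, here is a minimal sketch with Hugging Face PEFT; the model id and hyperparameters are illustrative, not anyone's actual recipe:

    ```python
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "Qwen/Qwen2-7B-Instruct"  # assumed starting checkpoint
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    lora_config = LoraConfig(
        r=16,                  # adapter rank
        lora_alpha=32,         # scaling factor
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the small adapter matrices train
    ```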

  • @unclecode · 4 days ago

    Thanks, Sam! You know, it started with 7B as a trend, then Meta made it 8B, and now Google has 9B. I wish they'd compete in the opposite direction. 😄 Btw, I have an opinion; let me share it and hear yours. I've noticed recently that proprietary models often train with a chain-of-thought style, to the point where it becomes annoying because it's hard to get the model to do otherwise. This approach ensures the model passes benchmarks but gives it one personality that's hard to change. For instance, GPT-4o became a headache for me! It always follows one pattern, regenerating entire previous answers, even if the answer is just one word! It's annoying, especially in coding. Imagine you want a small change, but it regenerates the whole code. I constantly have to remind it not to regenerate everything, just show a part of it, and it's frustrating. This is clearly due to the training data. I don't see this issue with open-source models.

    One proprietary model I like, Anthropic's, still feels controllable. I can shape its responses and keep it consistent. To me, this technique hides model weaknesses: it's easier to train a model to stick to one style, especially if the data is synthetically generated. Language models need a word distribution that's not overly adjusted, or they become biased. When they release a model as an instruct model with one level of fine-tuning, you still expect it to be unbiased; fine-tuning it to take on another behavior would be tough.

  • @longboardfella5306 · 3 days ago

    Interesting. I've noticed the same thing with getting stuck: when it kept producing an incorrect word table, I couldn't get it to stop repeating that table each time.

  • @samwitteveenai · 3 days ago

    Definitely post-training (SFT, IT, RLHF, RLAIF, etc.) has changed a lot in the last 9 months. All the big proprietary models and big-company open-weights models are now using synthetic data heavily. A big challenge with synthetic data is creating the right amount of diversity, which could explain some of what you are seeing. You might also be seeing models that have been overly aligned with reward models, etc. Anthropic has "Ant thinking" for their version of CoT, and it is wrapped in XML tags; I think a lot of that gets filtered out in their UI. The Gemma models clearly show Google has gone down the path of baking CoT into the models. For following system prompts well, I think Llama is much better. I test models by asking them to be a drunk assistant; for some reason Llama can do that very well.
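
    A sketch of that kind of system-prompt adherence test against a local Ollama server; the endpoint and payload follow Ollama's documented chat API, but treat the model tags and prompt as illustrative rather than Sam's exact setup:

    ```python
    import requests

    def chat(model: str, system: str, user: str) -> str:
        """Send one system+user exchange to a local Ollama server."""
        resp = requests.post(
            "http://localhost:11434/api/chat",
            json={
                "model": model,
                "messages": [
                    {"role": "system", "content": system},
                    {"role": "user", "content": user},
                ],
                "stream": False,
            },
        )
        return resp.json()["message"]["content"]

    # Compare how well each model stays in character.
    for model in ["llama3", "gemma2"]:
        print(model, "->", chat(model, "You are a drunk assistant.", "Explain transformers."))
    ```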

  • @supercurioTube · 4 days ago

    All quantizations are available for Gemma 2 9B and 27B in Ollama, but the 27B has an issue: a tendency to never stop its output.
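
    Until the underlying bug is fixed, one possible workaround is to cap generation and pass explicit stop strings via Ollama's options (a sketch; `<end_of_turn>` is Gemma's turn delimiter):

    ```python
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "gemma2:27b",
            "prompt": "Summarise what Gemma 2 is in two sentences.",
            "stream": False,
            "options": {
                "num_predict": 256,          # hard cap on generated tokens
                "stop": ["<end_of_turn>"],   # halt at Gemma's end-of-turn marker
            },
        },
    )
    print(resp.json()["response"])
    ```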

  • @falmanna · 3 days ago

    The same thing happened to me with the 9B 4-bit ksm.

  • @volkovolko5778 · 3 days ago

    I got the same issue in the video on my channel.

  • @TreeYogaSchool · 3 days ago

    Thank you for making this video!

  • @SirajFlorida · 3 days ago

    Well, if Gemma 2 is just barely beating Llama 3 8B with an additional billion parameters, then I would go as far as to say that Llama is the higher-quality model. Not to mention the outstanding support for Llama. I get the commercial license angle, but I kind of see the Llama license as just not allowing big tech to plagiarize the models. If only we had a Llama 30-ish B. Oh dear Zuck, if you can ever hear these words: please give us the 30B. We love and appreciate all that you do!!!

  • @SirajFlorida · 3 days ago

    Actually, I think he did... and it's a multimodal model called Chameleon. :-D

  • @onlyms4693 · 3 days ago

    So can we freely use Llama 3 for our enterprise's chat-bot support?

  • @imadsaddik · 4 days ago

    Thanks for the video

  • @jondo7680 · 4 days ago

    The 9b one is very interesting and promising.

  • @dahahaka · 4 days ago

    Damn, feels like Gemma came out last month

  • @toadlguy · 4 days ago

    Running Gemma 2 9B with Ollama on an 8GB M1 Mac, even though it is only 5.5GB (for the 4-bit quantized model), it immediately starts running into swap problems and outputs at about 1 token/sec. Llama 3 8B (which is 4.7GB for the 4-bit quantized model) runs fine entirely in working memory, even with lots of other processes running. So there must be something different about how the inference code is running (or Ollama's implementation).
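
    Some rough arithmetic on why ~0.8GB of extra weights can tip an 8GB unified-memory machine into swap; the working-set cap and overhead figures below are assumptions for illustration, not measured values:

    ```python
    # macOS caps the GPU working set below total unified memory, and the KV
    # cache plus runtime buffers sit on top of the weights (both assumed here).
    total_ram_gb = 8.0
    gpu_budget_gb = total_ram_gb * 0.66  # assumed ~2/3 working-set cap -> ~5.3GB
    overhead_gb = 0.5                    # assumed KV cache + buffers

    for name, weights_gb in [("gemma2 9B q4", 5.5), ("llama3 8B q4", 4.7)]:
        need = weights_gb + overhead_gb
        verdict = "fits" if need <= gpu_budget_gb else "spills into swap"
        print(f"{name}: ~{need:.1f}GB needed vs ~{gpu_budget_gb:.1f}GB budget -> {verdict}")
    ```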

  • @NoSubsWithContent · 4 days ago

    Are you sure that 8GB isn't being partially used to run the system? It could also just be that the hardware is too old; I had 16GB and still couldn't run Qwen2 0.5B.

  • @samwitteveenai · 3 days ago

    Apparently they had issues with it and are fixing them. For me, I just made a vid in the last few hours and it seemed fine. Maybe uninstall and try again.

  • @micbab-vg2mu · 4 days ago

    Thank you - I am waiting for Gemini 2.0 Pro :-)

  • @samwitteveenai · 4 days ago

    Give it a bit of time.

  • @strangelyproton5188 · 4 days ago

    Hello, can you please tell me what's the best hardware to buy for running up to 70B models, not just for inference but also for instruction tuning?

  • @NoSubsWithContent · 4 days ago

    With quantization I think you can get away with a single H100 (80GB). Using QDoRA will achieve nearly the same performance as full fine-tuning while still fitting within this constraint. For cost-effectiveness you could try multi-GPU training with older generations of cards; this is just harder to set up and takes a lot more understanding of the specs.
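
    For reference, a sketch of the 4-bit loading step for adapter-based fine-tuning with Transformers and bitsandbytes (QLoRA-style; QDoRA swaps in a different adapter parameterization). The rough memory math: 70B params at ~0.5 bytes each is ~35GB of weights, leaving headroom on an 80GB H100 for adapters, optimizer state, and activations. The model id and settings are illustrative:

    ```python
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
        bnb_4bit_compute_dtype=torch.bfloat16,  # do the matmuls in bf16
        bnb_4bit_use_double_quant=True,         # also quantize the quant constants
    )

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-70B-Instruct",  # assumed (gated) model id
        quantization_config=bnb_config,
        device_map="auto",
    )
    ```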

  • @user-bd8jb7ln5g · 4 days ago

    What I really wanted from Gemma is at least a 100k context window. It looks like that is not forthcoming.

  • @samwitteveenai · 3 days ago

    Someone may do a fine-tune to get it out to that length. Let's see.

  • @AdrienSales · 3 days ago

    I gave gemma:9b vs llama3:7b a try on function calling... and I got much better results with Llama 3. Did you give function calling a try? Maybe there will be specific tuning for function calling.

  • @samwitteveenai · 3 days ago

    AFAIK Google doesn't do any fine-tuning for function calling on the open-weights models. I have been told it could be due to legal issues, which doesn't make a lot of sense to me. The base models can be tuned to do this, though.
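
    In the meantime, function calling can often be approximated by prompting for JSON and parsing the reply. A hedged sketch against Ollama's chat API (the get_weather tool here is hypothetical); reliability varies a lot by model, which may explain the Gemma-vs-Llama gap reported above:

    ```python
    import json
    import requests

    SYSTEM = (
        "You can call one tool: get_weather(city: str). "
        'Reply ONLY with JSON like {"tool": "get_weather", "args": {"city": "..."}}.'
    )

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma2",
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": "What's the weather in Paris?"},
            ],
            "stream": False,
        },
    )
    # This raises if the model strays from pure JSON - the usual failure mode
    # for models without function-calling fine-tuning.
    call = json.loads(resp.json()["message"]["content"])
    print(call["tool"], call["args"])
    ```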

  • @AdrienSales · 1 day ago

    @@samwitteveenai I'll keep an eye on the Ollama library for fine-tunes over the next few days.

  • @MeinDeutschkurs · 4 days ago

    Dolphin, dolphin, dolphin!!!! ❤❤❤❤ I hope this is being read!!!! 🎉🎉🎉🎉 Gemma 2 seems like a cool model for a Dolphin tune. Have I already mentioned it? Dolphin. Just in case! 😆😆🤩🤩

  • @Outplayedqt · 3 days ago

    Dolphin-3.0-Gemma2 🙏🏼

  • @MeinDeutschkurs · 3 days ago

    @@Outplayedqt 🥰

  • @AudiovisuelleDroge · 4 days ago

    Neither the 9B nor the 27B Instruct supports a system prompt, so what were you testing?

  • @samwitteveenai · 4 days ago

    You can just append them together, which is what I did there. You can see it in the notebook.

  • @flat-line · 3 days ago

    @@samwitteveenai What is system-prompt support for, if we can just do it like this?

  • @samwitteveenai · 3 days ago

    So on models that support a system prompt, it normally gets fed into the model with a special token added. If the model is trained for that, it can respond better (like the Llama models). If it doesn't have it, like Gemma, prepending the way I did can still work well, but it is just part of the overall context.
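
    Concretely, here is what that prepending can look like with Gemma's chat template (a sketch: the template has no system role, so the system text is folded into the first user turn):

    ```python
    def gemma_prompt(system: str, user: str) -> str:
        """Fold a pseudo system prompt into Gemma's first user turn."""
        return (
            "<start_of_turn>user\n"
            f"{system}\n\n{user}<end_of_turn>\n"
            "<start_of_turn>model\n"
        )

    print(gemma_prompt("You are a terse assistant.", "What is Gemma 2?"))
    ```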

  • @sammcj2000 · 4 days ago

    Tiny little 8k context. Pretty useless for many things.
