Qwen 2 - For Reasoning or Creativity?

Science & Technology

In this video I go through the new releases from the Qwen family of models and look at where they excel and where they perhaps aren't as good as other models out there.
Colab: drp.li/ADevp
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: drp.li/dIMes
👨‍💻Github:
github.com/samwit/langchain-t... (updated)
github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:29 Qwen 1.5
01:06 Qwen 2 Blog
01:22 Qwen 2 Model Information
01:58 Qwen 2 Multilingual
03:22 Qwen 2 Model Performance
04:37 Qwen-Agent
05:48 License
06:19 Qwen 2 in Ollama
06:53 Code Time
07:07 Qwen2 7B Demo
11:35 Qwen 2 InstructChat Demo

Comments: 35

  • @MeinDeutschkurs
    15 days ago

    It is really good at creative writing if you prompt it to plan the story first. I got amazing results with the phrase “rich and enhanced narration”. It was also fun to play with atypical, pattern-based text strings as an output format.

  • @samwitteveenai
    15 days ago

    Interesting, I will try that out.

  • @Viewable11
    16 days ago

    Qwen 1.5 and 2.0 models appear to be optimized for tasks in the STEM areas, whereas other models appear to be optimized for creative writing and conversation. Qwen 2 7B Instruct reached 79.9 on HumanEval, which is very impressive for a non-coding-specific model of that size. Can't wait for a coding-optimized version of Qwen 2. The strongest open-source coding model is currently Codestral 22B, ahead of Llama 3 70B Instruct and Mixtral 8x22B Instruct.

  • @Viewable11
    8 days ago

    Update with new benchmark data: the Aider LLM leaderboard shows the following list of open-source LLMs best suited for use with the Aider coding assistant: DeepSeekCoder V2 236B, DeepSeekChat V2 236B, Qwen2 72B Instruct, DeepSeekCoder 33B, Codestral 22B, Llama3 70B Instruct, WizardLM2 8x22B, CodeQwen1.5 7B Chat. DeepSeekCoder V2 236B is now the best coding LLM overall. For the first time, an open-source LLM is better than all paid models in one domain. It is challenging to run DeepSeekCoder V2 236B, since you need hundreds of GB of fast memory, which rules out local hosting. The best open-source coding LLM for local hosting is DeepSeekCoder 33B, which can run on an RTX 3090.

  • @ShawnThuris
    16 days ago

    Hey Sam, thanks again for putting out these videos. I think you may have a noise gate set too high on your voice, as we occasionally lose trailing syllables in your videos. If you have a lot of background noise to exclude, look for a hysteresis adjustment on the gate; then you can set the level needed to keep the gate open once it's open. If there's no hysteresis setting, the next best option is to set a slower release time.

  • @samwitteveenai
    16 days ago

    Yeah, for some reason the noise reduction went crazy on this recording, even though it is the same setup I normally use. I think this could be due to Descript shipping a new version. Looking for alternatives today.

  • @patricktang3377
    14 days ago

    "Q" is an abbreviation for "Question", and "Wen" is the pinyin for 问, the character for "question" in Chinese. This LLM was trained by the Chinese tech giant Alibaba (similar to AWS), and Simplified Chinese is the core of this model's multilingual base. It is interesting that Simplified Chinese is not included in the language chart in your video. 🤔

  • @unclecode
    16 days ago

    Amazing review. All the cool features and improvements in math and coding aside, I'm extremely happy to see we have an Apache model that sounds like all of us! Covering all main language regions has always been my number one concern. AI models were becoming extremely selective, and we know that when we lose genetic diversity, we face extinction. No matter our cultural background, our existence is deeply interconnected with all other cultures on planet Earth, even those we haven't yet heard of. Languages are like musical notes; losing even one changes the entire symphony. In my opinion, language defines us, and if AI is to preserve our essence, language models must reflect this diversity. The lack of linguistic diversity means a lack of humanity and, eventually, a form of cultural extinction. For me, this development is amazing, and I agree, Groq should really bring this into the game. I'm going to share your video on my X and draft a post about it. Thx for your video that motivated me to write about this topic :)

  • @7hunderbird
    16 days ago

    I would humbly ask: please don't put in long sections of black video. It makes me think YT has bugged out. I would suggest a neutral-tone color other than black. Thanks for your work on these informative videos! Keep it up. ❤

  • @MudroZvon
    16 days ago

    scotophobia?

  • @7hunderbird
    16 days ago

    @@MudroZvon No. I was just listening to the video while I took notes, and the long black pause threw me off my rhythm. It's dark for a substantial time, almost 30 seconds. I've had YT simply stop sending video before while continuing to play audio. And it was optional feedback overall.

  • @alenjosesr3160
    14 days ago

    Hi, can you do an Ollama function-calling video?

  • @GoldenBeholden
    16 days ago

    I wonder how much performance you can get out of a 7B model like this using an approach similar to Anthropic's recent monosemanticity paper. Are the answers these benchmarks look for already encoded somewhere in the model, given the right biases during inference, or do we really need those very large models after all?

  • @samwitteveenai
    15 days ago

    I certainly think a lot of tricks are showing that these small models can do a lot more than originally thought. The game is not just about size alone.

  • @davidw8668
    16 days ago

    So we can presume it was trained on the GSM8K dataset, can't we? What I noticed anecdotally for previous versions is that it did pretty well in various languages in terms of the style and tone of the particular language.

  • @samwitteveenai
    15 days ago

    I was wondering the same. I suspect they may have made a similar dataset rather than actually training on GSM8K.

  • @Nick_With_A_Stick
    16 days ago

    I used it on my custom JSON-mode benchmark, and I got it down to a 0.3% failure rate (after changing the multi-turn retry prompt to “json objects not detect, please try again differently”). It really likes to follow JSON mode; I would recommend it if you need a model with consistent JSON output. I think it only ever failed due to running out of context. Once LM Studio supports llama.cpp's quantized KV cache, I can finally use the 128k context length.

  • @MukulTripathi
    16 days ago

    I do need a model to do that. Can you share a Git repo with your code showing how you achieved this?

  • @Nick_With_A_Stick
    16 days ago

    @@MukulTripathi Sorry, I can't; it's an active project I'm writing a paper on. However, if you use LM Studio, you can use the OpenAI API, which lets you set the response format to “json_object” and, at the end of the prompt, include “Return response in json format with the following json objects ‘whatever you want:’ ‘just an example:’ ‘as is this one:’”. Then, to reproduce my results, make the script parse the JSON output into a new JSON file; if parsing fails, resend it to the model with the retry prompt from my original comment, with a limit of 3 attempts. Also, if you want a GGUF with more than 32k max context, you can use the 128k 5-bit Qwen 2 7B GGUF I made; it's under my HF profile “Vezora”.
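
    A minimal sketch of the retry loop described in this comment, assuming LM Studio's OpenAI-compatible server on localhost:1234; the model id, prompts, output file name, and retry limit are placeholders, not the commenter's actual code:

    ```python
    # Retry-on-invalid-JSON flow: ask for JSON mode, parse the reply,
    # and feed a retry prompt back to the model if parsing fails.
    import json
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

    PROMPT = (
        "Summarize the text below. Return response in json format with the "
        "following json objects: 'title:' 'summary:' 'keywords:'\n\n"
        "<your text here>"
    )
    RETRY_PROMPT = "json objects not detected, please try again differently"

    messages = [{"role": "user", "content": PROMPT}]
    result = None

    for attempt in range(3):  # limit of 3 attempts, as described above
        response = client.chat.completions.create(
            model="qwen2-7b-instruct",                 # placeholder model id
            messages=messages,
            response_format={"type": "json_object"},   # JSON mode
        )
        content = response.choices[0].message.content
        try:
            result = json.loads(content)               # parse the model's JSON
            break
        except json.JSONDecodeError:
            # feed the failure back as a new turn and ask the model to retry
            messages.append({"role": "assistant", "content": content})
            messages.append({"role": "user", "content": RETRY_PROMPT})

    if result is not None:
        with open("output.json", "w") as f:            # write to a new JSON file
            json.dump(result, f, indent=2)
    ```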

  • @8ballAI
    15 days ago

    I'm working on JSON and structured-text LLM outputs. You're much further ahead with that failure rate. Please share your GitHub if possible, thanks.

  • @samwitteveenai
    15 days ago

    Good comment. It makes sense that it would be good at JSON given its reasoning strength. Thanks for pointing this out.

  • @Nick_With_A_Stick
    15 days ago

    @@samwitteveenai Awesome :)! YT deleted my second comment, but if you want to use a GGUF with more than 32k context, I uploaded Qwen 2 7B 128k to my HF page: Vezora. I also learned that if you put the code first and the system prompt at the end, it tends to work even better (presumably due to needle-in-a-haystack effects). My benchmark scored 6800/6860 with the prompt at the end. Very, very impressive, and it handled itself well at higher context.
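
    A hypothetical illustration of the ordering this comment describes, with the long code context first and the instruction at the very end of the prompt; the file name and instruction text are made up for the example:

    ```python
    # Prompt ordering sketch: the long code context goes first and the
    # instruction goes last, so the request sits closest to where
    # generation begins.
    with open("repo_dump.py") as f:       # placeholder: the code to reason over
        long_code_context = f.read()

    prompt = (
        f"{long_code_context}\n\n"
        "Using only the code above, list every function that can raise an "
        "exception. Return the answer as a JSON array of function names."
    )
    ```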

  • @xhan3674
    16 days ago

    Pronounced "qian wen". "Q" stands for "Qian" (meaning "thousands") and "wen" means "ask", so Qwen in Chinese means "thousands of questions".

  • @samwitteveenai
    15 days ago

    Thanks !!

  • @eightrice
    16 days ago

    "democratizing AI" does not mean multi-lingual

  • @samwitteveenai
    16 days ago

    I agree it shouldn't only mean multilingual. For me it should mean making AI (models, frameworks, and ideally hardware) more accessible to people, which is hard to do without making things more multilingual when the majority of the world doesn't speak English. I am curious: what do you take it to mean?

  • @eightrice
    15 days ago

    @@samwitteveenai Decentralized training and inference so that the people at large can own the weights. We do that using a smart consensus protocol along with an architecture of incentives (a.k.a. economy). That way, the people can retain economic relevance. It also helps with safety and alignment.

  • @tjchatgptgoat
    16 days ago

    The production for this model started in China - I'm out.

  • @samwitteveenai
    16 days ago

    Genuinely curious: why? This is made by Alibaba Cloud, which is a very capitalist org.

  • @tjchatgptgoat
    15 days ago

    @@samwitteveenai Alibaba is no longer controlled by its original founder; it is controlled by the Chinese government. When you install this model on your computer, you're basically telling every hacker in the Chinese government to come steal your stuff. I love your channel, but I'm sitting this one out.

  • @husanaaulia4717
    14 days ago

    @@tjchatgptgoat It shouldn't be possible; the model doesn't have access to the computer. A model isn't executable 🤔. Or I might be wrong.

  • @tjchatgptgoat
    14 days ago

    Just think about it for a second: they're giving us all of these large language models for free? It's because we're the product. They have basically open-sourced you and I, not the model; we're the ones providing the use cases. Now throw China into the mix and you're hacked to death. Hacking is what they do.

  • @drsamhuygens
    14 days ago

    @@husanaaulia4717 You are right. TJ is being paranoid.

  • @TomGally
    14 days ago

    Note that the model has been censored for topics that are sensitive to the Chinese government. If you try asking it about the sovereignty status of Tibet, the independence of Taiwan, the Tiananmen incident, etc., it will throw an error in most cases. I observed this when testing the 72B instruct chat version on HuggingFace.
