Phi-2, Imagen-2, Optimus-Gen-2: Small New Models to Change the World?

Science & Technology

Phi-2 is a tiny model that could fit on a phone, but it outperforms huge language models like Llama 2. I explain more about how it was made and what it means. Then we see Imagen-2, the most stunning text-to-image model yet, at least according to Google's own images. We then glimpse Optimus Gen-2, smaller in the sense that it's 10 kg lighter! With more degrees of freedom, its movements look a lot more humanoid. And then the full launch of AI Insiders, plus a recap of why we shouldn't quote the MMLU to 2 decimal places!
/ aiexplained
phi2 now on HuggingFace: huggingface.co/microsoft/phi-2
Bubeck Video, min 19: • Textbooks Are All You ...
Phi 2: www.microsoft.com/en-us/resea...
Shital Shah: / 1734882570603753814
Shoggoth: / 1702488701782352097
Mamba 3B: www.together.ai/blog/mamba-3b...
Phi 1.5B: arxiv.org/abs/2309.05463
Phi 1: arxiv.org/abs/2306.11644
Microsoft Prompting: www.microsoft.com/en-us/resea...
SmartGPT Video: • SmartGPT: Major Benchm...
The Information: www.theinformation.com/articl...
Imagen 2: / 1734954295655534780
deepmind.google/technologies/...
/ 1734763060244386074
Greg Technology: / 1734544659953623509
Swyx: www.latent.space/
AI Engineer: youtube.com/@aiDotEngineer?si...
Shawn Wang: x.com/swyx?s=09
/ aiexplained Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/
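
For anyone tempted by the Hugging Face link above, here is a minimal sketch of how one might load and prompt Phi-2 with the transformers library. This is my own illustrative example, not code from the video; check the microsoft/phi-2 model card for the currently recommended settings.

    # Minimal sketch for trying microsoft/phi-2 locally (illustrative only;
    # check the model card on Hugging Face for current recommended settings).
    # device_map="auto" assumes the `accelerate` package is installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/phi-2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # ~2.7B params, so roughly 5-6 GB in fp16
        device_map="auto",
    )

    prompt = "Explain why small language models can rival much larger ones:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))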

Comments: 432

  • @SebastienBubeck · 6 months ago

    Yet another amazing video! I really enjoyed your critical take on benchmarks like MMLU, this is much needed.

  • @aiexplained-official · 6 months ago

    Thanks so much Sebastien, Phi-2 is an incredible model - have been testing it for many hours - congratulations to you and the team! And yes, am looking forward to new benchmarking standards for 2024. Thank you again for speaking yesterday.

  • @Diabloto96 · 6 months ago

    Philip doing public work by fact-checking the MMLU WHILE creating all this content?? Impressive work; you're one-of-a-kind in the AI popularization field, congrats!

  • @aiexplained-official · 6 months ago

    Thanks Diabloto, I am very LLM-curious

  • @gabrote42 · 6 months ago

    ​@@aiexplained-official major credits!!! Hope you sent a link to this to all those companies!

  • @sumanthbalaji1768 · 6 months ago

    @@aiexplained-official Hey, these MMLU flaws are crazy. Could you share the doc of inaccuracies for others to go through?

  • @skierpage · 6 months ago

    @@sumanthbalaji1768 I found a Medium post from August, "Errors in the MMLU: The Deep Learning Benchmark is Wrong Surprisingly Often," but that seems to be independent work by a Daniel Erenrich.

  • @sumanthbalaji1768 · 6 months ago

    @@skierpage Yes, I went through that blog too; it doesn't have this document of errors.

  • @Megneous · 6 months ago

    You honestly need to publish a paper on the errors in the MMLU. This needs to be seen by academia.

  • @KP-sg9fm · 6 months ago

    100%

  • @maxm1555 · 6 months ago

    No paper needed, they should watch this video and immediately build a new test from the ground up!

  • @StevenAkinyemi · 6 months ago

    They know lol

  • @onoff5604 · 6 months ago

    please please publish, but please prepare to be attacked for your honesty

  • @raphaelsoltero8805 · 6 months ago

    I feel as though it is slightly ironic that the AI's intelligence was held back not by its own way of learning, but by our inaccurate datasets.

  • @KibberShuriq · 6 months ago

    It makes a lot of sense though. We tried to make it equally good at predicting experts AS WELL as predicting average Joes AND raging lunatics. Of course that task is much harder than just predicting experts.

  • @rantmarket · 6 months ago

    I still can't believe the MMLU isn't being called out by people, at least. It's been so long since you found those problems, that I won't accept that people don't know about the issue enough to have it thrown out by every benchmark set using it. Thank you again for your great work. Cheers.

  • @aiexplained-official · 6 months ago

    Thanks rant. I thought so too, and then up it pops with Gemini, front and centre.

  • @skierpage · 6 months ago

    ​@@aiexplained-official what did the authors of "Measuring Massive Multitask Language Understanding," Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt, say when you contacted them?

  • @DaveShap · 6 months ago

    Increase efficiency!

  • @ClayFarrisNaff · 6 months ago

    I love that you're an informed AI enthusiast, yet you're not afraid to criticize -- and to do so forcefully -- where you see the need. It's a mark of integrity.

  • @aiexplained-official · 6 months ago

    Thanks Clay

  • @kylemorris5338 · 6 months ago

    Having seen your previous work on the MMLU, that graph that declared a .06 PERCENT breakthrough made me burst out laughing. We need an MMLU 2 or something to that effect yesterday, and I'm starting to suspect the only reason we don't have it yet is that nobody wants their fancy benchmark numbers to go down, even if they would be more accurate. Re: Phi-2, I am happy to see that synthetic data is getting more love, as opposed to the models that just use mass scrapes of any data that isn't tied down properly.

  • @randfur · 6 months ago

    Thanks for looking into the benchmark data, they were too opaque up until now. Whenever a model scores impressively on one we should dig into it to know whether it really means it's good at X subject or if it's just good at making the same mistakes.

  • @swyxTV · 6 months ago

    thanks for having me as your first Insiders speaker Philip!

  • @aiexplained-official · 6 months ago

    And thank you so much swyx. It was a great talk, and I laughed at the intro!

  • @L1AM · 6 months ago

    Well, at this rate this time next year we'll have a locally runnable AGI.

  • @Feel_theagi · 6 months ago

    I'm more excited about how much better the largest cloud ones will be

  • @Boufonamong · 6 months ago

    Imagine that 😂, I'm calling mine hal

  • @Karearearea · 6 months ago

    5 years from now we probably will

  • @aN0nyMas · 6 months ago

    @@Boufonamong I'm calling mine Meg. Short for Megatron.

  • @aiexplained-official · 6 months ago

    AG-phi?

  • @CalConrad · 6 months ago

    For the record, you have the best thumbnails in the game.

  • @aiexplained-official · 6 months ago

    Aw thanks Cal. I often get criticised for them, and get hundreds of offers to pay for thumbnail services, but I love them too. Minimalist.

  • @skippy6086 · 6 months ago

    The pace of the race toward the edge of the cliff quickens.

  • @MachineLearningZuu · 6 months ago

    Gemini Nano hit the punch line 🥊

  • @GrindThisGame · 6 months ago

    Time to fly.

  • @aiexplained-official · 6 months ago

    Hmm, but synthetic data is good for safety no?

  • @Igor_lvanov · 6 months ago

    @@aiexplained-official Maybe this model won't be a Shoggoth, but there are a lot of ways things may go wrong, e.g. because we will get extremely powerful systems without proper defence mechanisms against misuse, or things like instrumental convergence.

  • @consultantnigel-projectman7274 · 6 months ago

    As a new Patreon member, I'm here to tell you how amazing AI Insiders is. Phillip's research is impeccable. The Insider info is priceless. Those of you who make your living with AI - do yourself a favor & budget the $30 each month to support Phillip. Everyone will eventually be making their living with AI; if not today, very soon. You will need quality, authoritative information upon which you can make important decisions. AI Insiders will provide you with AI news that is second to none. If you have not already, join. Completely worth the money.

  • @aiexplained-official · 6 months ago

    Such high praise, thank you so much. If you like what's there, in 2024 you will be even more impressed!

  • @heinkle1 · 6 months ago

    I’ll be honest, I stopped watching your videos for a while because they caused me too much anxiety - but when I then look at some of the other things going on in the world, it is actually comforting to hear about AI. Congrats on your meteoric growth in 2023.

  • @aiexplained-official · 6 months ago

    Thanks for coming back heinkle! Great to have you here

  • @H1kari_1 · 6 months ago

    The big issue most people are currently overlooking is that all those benchmarks are in English. The data is in English. The models are heavily optimized for English. GPT-3.5 and GPT-4? They speak just about any language they have gotten some data for, and also provide excellent results for tasks in those languages.

  • @aiexplained-official · 6 months ago

    Great point

  • @twosaibackbot · 6 months ago

    Yeah, I am Swedish and will be truly scared of an automated workforce when these LLMs speak and understand smaller, more local languages fluently. GPT-4 is decent at it but not yet good enough for professional use.

  • @jokmenen_ · 6 months ago

    Very true. I haven't seen a model with less than 70b params yet that really impressed me with its performance in my language

  • @ryzikx · 6 months ago

    Though that is a very big problem, I'd argue the larger problem is 'poisoned' models, basically trained to tackle the benchmarks rather than being actual general-purpose models.

  • @KyriosHeptagrammaton · 6 months ago

    Given that multi-modality seemed to boost performance I wonder if multilingual models would also be boosted.

  • @jawadur_ · 6 months ago

    The most value delivered per minute on YouTube.

  • @aiexplained-official · 6 months ago

    Thanks so much jawadur!

  • @_ptoni_ · 6 months ago

    I was impressed by the phi-2 code perf

  • @alphahurricane7957 · 6 months ago

    I think that smaller models giving out 100% accurate information to a general, bigger AI capable of understanding and finding anomalies in the process, or being critical of the result, is the real AGI. I saw "Teslabot 2" today; I'm very much interested in seeing AI and robotics in everyday life. A lot of insights as always, thanks!

  • @MCA0090 · 6 months ago

    Maybe the way to go is finding ways to make models smaller and more efficient, to the point that they could run on local devices instead of big datacenters relying on the cloud, an internet connection and higher latencies (cloud would never work to make robots work properly). Yesterday I was reading about liquid neural networks and how they can do the work with just a few neurons. It seems promising for shrinking really large NNs into much smaller and faster ones, especially for vision, video, images and audio/speech recognition. For robotics, LNNs can handle vision better than current neural networks and run fast even on small devices such as a Raspberry Pi, because they need just a few neurons to do the same task as a really big NN based on other architectures. LNNs are very small and have the plasticity to adapt to new situations without needing a new training process.

  • @harnageaa · 6 months ago

    TL;DR: If the datasets these GPTs are trained on were actually accurate, we'd have even more impressive models overall. Without even changing the training method, just by improving the data you can get way better models.

  • @skierpage · 6 months ago

    I wonder if a large model with a big context window would be able to spot inconsistencies and mistakes in training data. I saw a documentary where an AI presented with logical inconsistencies went into a garbled "Does not compute" loop and eventually caught fire, so maybe it's too dangerous!

  • @harnageaa · 6 months ago

    Idk, how can you determine if something is right or wrong if you learn the wrong thing in the dataset? I think the best would be "smaller models" used by a bigger model, where the smaller models are used to detect inconsistencies within the dataset. You train the small models with 100% accurate data and teach them to spot right/wrong answers, and that's their sole purpose, and they will find every mistake in any dataset. So a model for math, one for chemistry, one for biology, etc. Then the bigger model could access these mini models through an API, get the results from them and recreate PDFs with a "correct dataset". I think it's safer that way; when you have a big model it's harder to "control" and know what it actually knows. A model that has perfect data for code, math, physics, etc. is basically the final product we want, but to obtain that we need to curate the data we have, and the fastest way to do that is a smaller model. Then once all data is curated, we use that for a bigger model. Oops, I spammed; you get the point. @@skierpage

  • @skierpage · 6 months ago

    @@harnageaa symbolic AI tried to develop AI by teaching systems only the right answers, and it's utterly failed to keep up with transformers. One of the great things about LLMs is they can handle inconsistency and exceptions: "Water is wet" (ice), "Palestine is a state" (disputed), "An asteroid killed the dinosaurs" (generally accepted), etc. Learning everything includes ingesting megabytes of the "wrong" things; again, I want to know if an LLM can be aware of discrepancies while or after it trains.

  • @skippersthepenguin3591 · 6 months ago

    They should make Phi-3 a 7B model. If Phi-2 is a quarter of it then increasing by double should make it even better, and 7B models are runnable on 90% of computer hardware.

  • @berkertaskiran · 6 months ago

    Their priority is probably phones.

  • @zygote396 · 6 months ago

    Yeah, I think it's especially important for decent models to be able to run on low-end phones so that LLM access isn't restricted to the first world. @@berkertaskiran

  • @noone-ld7pt · 6 months ago

    @@zygote396 Oh wow, that's an incredibly important argument, had not thought about it like that and I really appreciate you sharing that perspective!

  • @QH96 · 6 months ago

    Don't quote me, but a 7-billion-parameter model would probably use about 6 GB of RAM.
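
    A quick back-of-envelope check of that guess, as a rough Python sketch. These are my own illustrative numbers (weights only, ignoring KV cache and runtime overhead), not figures from the video or the comments:

    # Rough RAM estimate for the weights of a 7B-parameter model at
    # different precisions (illustrative only).
    PARAMS = 7e9

    for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        gb = PARAMS * bytes_per_param / 1024**3
        print(f"{name}: ~{gb:.1f} GB for the weights alone")

    # fp16: ~13.0 GB, int8: ~6.5 GB, int4: ~3.3 GB -- so "about 6 GB" roughly
    # corresponds to an 8-bit-ish quantisation, before any runtime overhead.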

  • @carkawalakhatulistiwa · 6 months ago

    @@QH96 And the iPhone 15 Pro Max only has 8 GB of RAM, and the iOS system already uses 2 GB of it.

  • @3dVisualist · 6 months ago

    With AI Insider, you really are creating a lot of content. I do hope it turns out you were AI all along!

  • @aiexplained-official · 6 months ago

    Haha not quite, a hardworking human!

  • @iamcoreymcclain · 6 months ago

    @@aiexplained-official The way you pronounced "Imagen" made me question whether this was an AI voice as well lol, but I think you've left enough small clues to prove your humanity 😂

  • @SBImNotWritingMyNameHere · 6 months ago

    @@aiexplained-official thats what you think

  • @3dVisualist · 6 months ago

    @@aiexplained-official certainly hardworking! Thanks for all your explainers, they really help stay on top of the fast moving world of AI.

  • @OLBAPPOAWECBRKLFK · 6 months ago

    Amazing as always! Wish regular media was half as good at explaining complicated topics; this channel is gold.

  • @aiexplained-official · 6 months ago

    Wow thanks pablo

  • @baychaoz · 6 months ago

    7:06 such a legend

  • @BTFranklin · 6 months ago

    Is there any effort to actually correct the MMLU? If not, why not? What would be required to get these corrected? I feel that this is a serious problem and it's disturbing that the MMLU is continuing to be used without correction.

  • @Olack87 · 6 months ago

    Amazing video, as always. Have you contacted any of the people in the field about the erroneous benchmarks? Do we know if anyone is working on it to create new ones or fix them? I can't believe they don't know or care about it but the problem is still there it seems.

  • @aiexplained-official · 6 months ago

    Yes, and people are. There are better benchmarks coming out all the time, hence my surprise at this MMLU d-measuring

  • @MindFieldMusic · 6 months ago

    Billy Madison to the MMLU, "I choose: Business Ethics." 😉

  • @educated_guesst · 6 months ago

    Hi Philip, just wanted to say thank you for still pumping out so many videos despite your Patreon content probably also being a ton of work. Thank you so much for keeping us informed!

  • @aiexplained-official · 6 months ago

    Haha, no, thank you for supporting on Insiders. It's what keeps the main channel going!

  • @Dylan-zg2jl · 6 months ago

    As usual, a fascinating video with revealing insights that are seldom if ever found anywhere else. Great job mate and look forward to more

  • @aiexplained-official · 6 months ago

    Thanks so much Dylan!

  • @ryzikx · 6 months ago

    always a good day when phillip ai uploads

  • @sharkeys · 6 months ago

    You know they are flexing their ability when they show hands :D

  • @aiexplained-official · 6 months ago

    Haha

  • @schemage2210 · 6 months ago

    We have all seen the Boston Dynamics robots doing incredible things, but the scripting and trial and error involved to make those incredible videos is insane. And let's not forget that the Atlas robot is huge. Are we actually meant to believe that Musk's Optimus robot is "as described"? AI powered, and physically capable of all the actions it's shown doing?

  • @whiterottenrabbit · 6 months ago

    This time next year

  • @McDonaldsCalifornia · 6 months ago

    Anything Musk hypes up should be taken with a laaarge grain of sand

  • @k225 · 5 months ago

    AIs are experiencing the real world of academic exams. I remember several times in college where textbooks were wrong, exam questions were ambiguous, or we were told to give outdated or blatantly wrong answers to pass tests and get good grades.

  • @eburgwedel · 6 months ago

    Can’t wait to see Mixtral in the mix!

  • @DreamOfFlying · 6 months ago

    I absolutely love your videos! They deserve each and every view and like!

  • @aiexplained-official · 6 months ago

    Thanks so much Dream!

  • @HonestyLies · 6 months ago

    great vid as always, strapping in for next year's craziness

  • @stephenrodwell · 6 months ago

    Thanks! Fantastic content, as always. 🙏🏼

  • @onoff5604 · 6 months ago

    Thank you so much for investigating problems with testing.

  • @Q1w1N · 6 months ago

    I don't know what's more concerning: the fact that those models did so well on a flawed test, or that they might be much more capable than we think.

  • @Shaunmcdonogh-shaunsurfing · 6 months ago

    Sounds great for general chat conversation

  • @doctormaddix2143 · 6 months ago

    Can’t appreciate your work enough! Thank you.❤

  • @aiexplained-official · 6 months ago

    Needed to hear that, thank you

  • @MrSchweppes · 6 months ago

    As always great video! Very informative! Many thanks to you!

  • @aiexplained-official · 6 months ago

    Thanks so much, as always!

  • @freek633 · 6 months ago

    Phi-2 is a tiny model that could fit on a phone, but it outputs huge language models like Llama 2. (from the caption) outputs should be outperforms!

  • @aiexplained-official · 6 months ago

    Thanks freek, missed that!

  • @Just4Games2011 · 6 months ago

    Great video, but why not mention Mixtral? Are you still experimenting with it?

  • @aiexplained-official · 6 months ago

    First, I think Phi-2 is more significant, but also covering it properly would be a lot more work; there's only so much time in the day!

  • @Just4Games2011 · 6 months ago

    @@aiexplained-official Fair point, can't wait to see your video on it.

  • @clearpupil · 6 months ago

    This explains why I did so badly in my medical exams. The college has all the wrong answers :-)

  • @aiexplained-official · 6 months ago

    True true

  • @miker99 · 6 months ago

    When will they learn? Rubbish in, rubbish out. Thanks for all your efforts to bring awareness to this issue of testing quality.

  • @CrueMusic · 6 months ago

    Thank you! I hope you don't reduce the amount of great content here on your channel. It's invaluable.

  • @aiexplained-official · 6 months ago

    Hopefully this video is evidence that I won't!

  • @JohnLeMayDragon · 6 months ago

    Thanks for another informative video!

  • @aiexplained-official · 6 months ago

    Thanks so much John, means a lot

  • @davidbutler9323 · 6 months ago

    By this time next year, I expect to see a continuous stream of AI Explained content generated by Phillip-2 or I'll be really disappointed.

  • @aiexplained-official · 6 months ago

    Haha, I will be human-generated to the end

  • @williamjmccartan8879 · 6 months ago

    Thank you for sharing your time and work, Phillip. I responded to one of your tweets by asking if you knew what is going on over at Liquid AI. The new year is fine, and by the looks of it you're going to be really busy, but if you get a chance I'm curious, as that's where Joscha Bach is working right now. Merry Christmas to you and your family and all the other family helping you with this great work, and a Happy New Year.

  • @aiexplained-official · 6 months ago

    Merry Christmas Bill, will check it out, cool name at the very least

  • @Y3llowMustang · 6 months ago

    Wow, that was a surprisingly sudden end to the video.

  • @aiexplained-official · 6 months ago

    The wonderful day came earlier!

  • @nacho7872 · 6 months ago

    Amazing video as usual, thanks for the fast update

  • @aiexplained-official · 6 months ago

    Thanks nacho!

  • @covle9180 · 6 months ago

    Small models ftw! If I can't run it on my phone or self-host it (without really expensive GPUs) then 90% of use cases just don't work. Models are flaky enough as they are. Add to that the unreliability of some companies' APIs, and we need self-hosted solutions we can fine-tune. (Not to mention privacy issues.)

  • @youssefanajar4061 · 6 months ago

    Best yt channel

  • @matusstiller4219 · 6 months ago

    Great video, like always.

  • @aiexplained-official · 6 months ago

    Thanks matus

  • @GrindThisGame · 6 months ago

    Better data, better models...makes sense.

  • @kevinli3767 · 6 months ago

    I'll ask the question that everyone's curious about - how are you able to 1) access, 2) digest, 3) synthesize, and 4) produce so productively???

  • @aiexplained-official · 6 months ago

    Will do a video on that someday! And don't forget the hours of content (researched, narrated and edited by me) for AI Insiders at the same time, plus sourcing and conducting interviews! And comment replying!

  • @kevinli3767 · 6 months ago

    AGI must be helping you with the details :D @@aiexplained-official

  • @ok373737 · 6 months ago

    Brilliant!

  • @patronspatron7681 · 6 months ago

    Me thinks the Phi models were named after you. :-)

  • @aiexplained-official · 6 months ago

    Haha, very kind of you to think it!

  • @YoussefMohamed-er6zy · 6 months ago

    Finally a new video!!!🎉🎉🎉

  • @atom1496 · 6 months ago

    For the benchmark, it is common to include wrong or ambiguous questions to catch training leakage. It should not be possible to get 100%.

  • @yw1971 · 6 months ago

    I think if we can find a formula, no matter how long and complex, that can be the 'engine' for such training, it will change the field.

  • @bobtivnan · 6 months ago

    Tesla robot walking like "I just sharted"

  • @jamesatotago · 6 months ago

    Great video again! Please do a video on synthetic data. I get that this will likely decrease toxicity, but what else will it do? If, for example, Microsoft is building the synthetic data, does that mean that we are training the AI on Microsoft's view of the world? One can imagine how this could be influenced by all sorts of commercial imperatives. Will synthetic data make models more and more similar to one another and perhaps less interesting?

  • @aiexplained-official · 6 months ago

    I don't think less interesting if you ensure diversity - see original phi1 vid

  • @Veileihi · 6 months ago

    Feels like we're a part of those vaguely touched upon histories in AI movies 😅

  • @aiexplained-official · 6 months ago

    Haha, nice way of putting the strangeness

  • @aaronnewman2 · 6 months ago

    You are beautiful sir. Thanks as always.

  • @aiexplained-official · 6 months ago

    Wow thanks Aaron, that's cheered my spirits

  • @KP-sg9fm · 6 months ago

    TOP FRICKEN NOTCH MY FRIEND, THANK YOU!!!

  • @aiexplained-official · 6 months ago

    So kind KP!

  • @muhammedkoroglu6544 · 6 months ago

    Amazing content! Don’t get how you don’t have a million subs

  • @aiexplained-official · 6 months ago

    Aw thanks Muhammed

  • @carterellsworth7844 · 6 months ago

    Is it rational to say that if Google and OpenAI are using the MMLU benchmarks in this way, without acknowledging the benchmark's problems, then they are behaving too naively to deserve public trust to try and solve the alignment problem? It's so blatant once you point it out that I find it very disturbing no one else talks about it.

  • @skierpage · 6 months ago

    The two issues seem unrelated. The numbers game to two decimal digits is stupid when the benchmarks are 1% flawed, and training to the test when the test is bad may degrade models' real-world abilities, but what does that have to do with alignment?
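
    To put rough numbers on the decimal-digits point: treating an MMLU score as a binomial proportion over roughly 14,000 test questions (my own back-of-envelope figures, not from the video), the sampling noise alone dwarfs a 0.06-point lead, before even counting the mislabelled or ambiguous questions discussed above:

    import math

    # Rough sanity check: standard error of an accuracy estimate over
    # ~14,000 questions (approximate size of the MMLU test split).
    n_questions = 14_000
    accuracy = 0.90  # e.g. a reported ~90% score

    standard_error = math.sqrt(accuracy * (1 - accuracy) / n_questions)
    print(f"standard error ~ {standard_error * 100:.2f} percentage points")
    # ~0.25 points, so differences of 0.06 points sit well inside the noise.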

  • @user-hk8jt6so3l · 6 months ago

    I cannot thank you enough! I will definitely support you on Patreon when my finances allow it! THANK YOU FOR GUIDING US THROUGH ALL OF THIS, YOU ARE THE BEST!❤

  • @aiexplained-official · 6 months ago

    Thanks so much, no worries on Patreon, your kindness here is enough!

  • @cjgoeson · 6 months ago

    0:00 “You my have thor”

  • @aiexplained-official · 6 months ago

    Subtle t in there somewhere

  • @tomski2671 · 6 months ago

    By my estimate it cost about $70k to train. However the real cost is preparing the data.

  • @lorenzoblz799 · 6 months ago

    It would be interesting to take a few LLMs and ask them to evaluate the questions: are they clear, are they ambiguous, do they make sense?

  • @jessedbrown1980 · 6 months ago

    Jesus Christ. So many implications from this, and it will result in massive improvements. Thank you so much for pointing this out, as it will really slap AI into hyperdrive.

  • @onoff5604 · 6 months ago

    Great video!! many thanks. On the topic of generated images of human faces: Look at the shirt collar (and ears and ear-rings if you can see them), instant give-away. The face is phenomenal...but textile manufacturing is apparently a harder problem.

  • @aiexplained-official · 6 months ago

    Nice spot

  • @mugwortofz3lb1za · 6 months ago

    Always the best videos!! Have you considered making a Patreon tier where some of the funds go towards a Google Colab backend for the patrons to use, depending on their subscription amount and time? Given how few resources were used training Phi-2, it could be a good idea to let people experiment with the concepts shown in your videos, as well as more exotic variations in model architecture such as cyclic attention heads, sub-networks etc.

  • @user-pf9jv1fl2n · 6 months ago

    Great video, just one question: Do you feel the AGI?

  • @ekstrajohn · 6 months ago

    The others think you should no longer be on the board. It's not my decision, really.

  • @beaumac · 6 months ago

    AGI coming to a mobile device near you in 2024 thanks to synthetic data. Is there any safety checking done on this data?

  • @aiexplained-official · 6 months ago

    Well it's synthetic so shouldn't be as bad but I was still surprised that there was toxicity at all, maybe I shouldn't be

  • @tomaszkarwik6357 · 6 months ago

    If this was SDXL, I'd give the image a 9/10. The problems are:
    - the eyes (they are not pointing in the same direction)
    - the ear (it is just weird)
    - the lighting is wrong (the leaves are lit from behind the subject while the subject is lit from the front)
    - her whole right side is a bit wonky
    - the one strand of hair in the back is weird. 7:33
    PS: if you want to see the best SDXL models, use the ones over at Civitai and not Stability AI's (the 1.0 model is still the best you can get from there). Just pick "JuggernautXL" or "DreamshaperXL", as they are SotA for XL.
    PSPS: Other than the part about Imagen-2 this was a very good video. Love your dedication to the craft of making AI news without the hype.

  • @aiexplained-official · 6 months ago

    Thanks tomas, your professional eye caught much more than me, apologies

  • @tomaszkarwik6357 · 6 months ago

    @@aiexplained-official I ain't a professional, but I use SD. These things are just what you train your eye for.

  • @maciejbala477 · 6 months ago

    Really? Dreamshaper is SotA? I knew about Juggernaut, but I remembered Dreamshaper's earlier non-SDXL versions as kinda worse than some alternatives. Will have to try it out. WyvernMix was another that impressed me.

  • @tomaszkarwik6357 · 6 months ago

    @@maciejbala477 The XL version is at least close to the SotA for turbo. Or at least it was late last week.

  • @anywallsocket · 6 months ago

    Soon we’ll have to get the LLMs to not only generate the next update’s training data, but to prove to us the labels are correct, because otherwise we are limited by what we think we know.

  • @onil2301 · 6 months ago

    Is there a way to access the document you've compiled of the errors that you found in the MMLU Benchmark? i would like to source it for my bachelor thesis, if that's possible.

  • @zonas7915 · 6 months ago

    Welcome HAL9000 and Skynet !

  • @andybrice2711 · 6 months ago

    I wonder if it's a good idea to remove all "toxicity" from datasets though. I'd imagine it's necessary to hear plenty of bad ideas in order to understand them. It might result in a model which is naïve.

  • @aiexplained-official · 6 months ago

    Interesting point, nice one

  • @supertetleman · 6 months ago

    Just wait until Jan 8th. No meetings in the last week of December or the first week of Jan. All the AI researchers have extra time to work on their pet projects and keep the compute farms running over the holiday; I expect to see some interesting results, it's always the most productive time of year.

  • @dcgamer1027 · 6 months ago

    Appreciate the updates as always. I was wanting to look more into the MMLU since you mentioned people still using it, and thought I'd go back and watch your video on it, but it's not in the description; might be a good idea to put it there since you played a bunch of it at the end here. I assume I'm not the only one who might want to look more at that part. Anyways, ty and have a good day :) Edit: also just a thought, has anyone compiled an exhaustive list of the issues in the MMLU test? And if so, does anyone have a link to that list?

  • @aiexplained-official · 6 months ago

    Hey dc, thought I put it in there somewhere! Can search "MMLU broken benchmark" too. And no, to the best of my knowledge this channel has shown the biggest repository of mistakes in that benchmark.

  • @DavidsKanal · 6 months ago

    Hey Philip, dunno if it's me watching this at 6am but this video felt a little fast and stressful. Do you think you could integrate a 1 to 2-second pause before switching to a new topic to give us time to digest the information?

  • @aiexplained-official · 6 months ago

    Thanks for the feedback David, will bear it in mind!

  • @be2eo502 · 6 months ago

    Agreed. We poor biological intelligences need a longer pause between concepts to integrate the information.

  • @aiexplained-official · 6 months ago

    It's like we need a smidgen of noise between the signal

  • @humunu · 6 months ago

    MMLU...WTF? (Merry Christmas/Happy Holidays)

  • @BradleyZS · 6 months ago

    The errors with the MMLU make me think a good test for AI should have trick questions (questions without actual answers or lacking the appropriate option) to test the AI's ability to recognise when it doesn't know or can't find the answer.

  • @skierpage · 6 months ago

    I think the video showed that GPT-4 would give a better answer than any of the garbled multiple choice answers. I think you could engineer a different test-taking prompt where you prompt the AI to pick the best multiple choice answer but also point out when there's a problem with the Q&A. One problem is these technical reports are drowning in a sea of benchmark numbers, so I'm sure the person cranking out all the scores to two decimal digits has no time for nuance or evaluation.

  • @BradleyZS · 6 months ago

    @@skierpage While it is useful to let it answer freely, in terms of serving people AI should be able to work within constraints. Otherwise it will likely become just an advertising tool, always telling you to buy the industry tool to get the job done. In an example specific to me, I do a lot of Python programming on my phone, and ChatGPT often gives coding examples for libraries that don't work on my phone. So it's handy if we can give it a constraint, asking it to solve the coding problem with a specific library, since we may want the best solution we can get right now rather than the theoretical perfect solution.

  • @skierpage · 6 months ago

    @@BradleyZS Make up your mind. Do you want to constrain the AI to answer a multiple choice question, or point out that it's flawed? What should the AI do in response to a sleazy lawyer: "Have you stopped beating your wife? Answer yes or no!"

  • @BradleyZS · 6 months ago

    @@skierpage The ideal would be if the AI could recognise the intent of such questions. That it could understand that a leading question is intended to ascribe undue guilt to it, or that a trick test question exists to test the AI's ability to react to an impossible task. Such an ability, I believe, is crucial for the progression of AI beyond the simple LLM. An AI should be able to understand the desire of the user, and in the context whether it should give the best answer or admit the inability to answer.

  • @AICodingAdventures · 6 months ago

    Awesome video! You did a great job exposing MMLU and how shady it is. I agree that people should stop trusting it as a measure of capabilities. What about MoE and Mistral?

  • @aiexplained-official · 6 months ago

    Thanks AICA, still investigating !

  • @UncleJoeLITE · 6 months ago

    I'll speak only to what I know. That project sounds amazing, I wish I was into VC, I'd buy in! Tbh, most GenX weren't taught ANY entrepreneurship if we went the corporate/govt career. I'm sure you have even bigger plans, depending on what sticks. _Putting decimal places on data with ~? confidence intervals is how we manipulate ofc._

  • @KolTregaskes · 6 months ago

    4:30 Not many people are talking about the flaws in these benchmarks, e.g. MMLU. Perhaps we need another video on this?

  • @KolTregaskes · 6 months ago

    I've read, heard and watched a lot of content for Gemini and very few mentioned any issues with MMLU. For once I think a more clickbaity title is needed.

  • @aiexplained-official · 6 months ago

    Haha, more so than the original 'Broken Benchmark Smartgpt' one!

  • @KolTregaskes · 6 months ago

    @@aiexplained-official Hehe, indeed. Perhaps it needs spelling out more, including words like "MMLU" and not "SmartGPT". BTW, how is SmartGPT going?

  • @aiexplained-official · 6 months ago

    @@KolTregaskes more to come on that front in 2024...:)

  • @zockermarlon5183 · 6 months ago

    comment for the algo. keep up the great videos :D

  • @aiexplained-official · 6 months ago

    Thanks zocker!

  • @Woodchuckization · 6 months ago

    Is it time for Philip to create a benchmarking test for AI systems himself?

  • @lhomme_bepis · 6 months ago

    Could you add timeline sections to your videos? I'd like to see an outline of what topics exactly are being covered at a quick glance

  • @aiexplained-official · 6 months ago

    When I add timestamps YT doesn’t automatically segment the video, wondering what I am missing

  • @Truizify · 6 months ago

    Is anybody working on anything like an MMLU 2.0? Something that addresses all the factual errors, lack of multilingual questions, etc? Seems crazy this isn't in the works given A. the popularity of MMLU as a benchmark (Gemini👀) and B. the flaws inherent to MMLU that should be widely known at this point.

  • @OccultDemonCassette · 6 months ago

    Gemini Pro has terrible context memory retention. 5 replies in and it doesn't even remember the first prompt from the conversation.

  • @haileycollet4147 · 6 months ago

    Please make a cleaned (remove or fix questions) MMLU bench as a PR to EleutherAI's evaluation benchmark :)

  • @haileycollet4147 · 6 months ago

    Some fixes better than none...

  • @Houshalter · 6 months ago

    You can't just change a benchmark that is already widely used. It would create confusion when different models are tested at different times. And produce results that aren't comparable to each other. It needs to be a new benchmark like "MMLU 2"

  • @haileycollet4147 · 6 months ago

    @@Houshalter I mean, arguably it's pretty worthless in its current state ... I suppose it could be its own bench, or v2 or 1.5 or whatever, but seems better to fix it somewhere than to just say it's bad, since it's gonna get used anyway...

  • @procactus9109 · 6 months ago

    I'm so glad you're not saying the words Sam Altman every chance you get :). I'll give you some slack now LoL

  • @aiexplained-official · 6 months ago

    Haha, thank you

  • @procactus9109 · 6 months ago

    @@aiexplained-official lol... Cheers mate

  • @mukulishwar2737 · 6 months ago

    Can you also talk about the newly released Mixtral 8x7b?

  • @HakuCell · 6 months ago

    will you also make youtube shorts for those who don't have time for all the details?

  • @aiexplained-official · 6 months ago

    Maybe one day! Do you find the videos too long?
