LLMs will hit the data wall if they can’t generalize - OpenAI cofounder John Schulman
Science & Technology
Full Episode: • John Schulman (OpenAI ...
Apple Podcasts: podcasts.apple.com/us/podcast...
Spotify: open.spotify.com/episode/1ivz...
Transcript: www.dwarkeshpatel.com/p/john-...
Me on Twitter: / dwarkesh_sp
Comments: 44
co-founder ?? i swear they're just writing new characters into this OAI plot-line
@MatRuizMat
16 days ago
this guy is a scientific legend in the AI/RL field bro
@user-jf5uv9ir5k
16 days ago
Exactly, he must be the 10th person to claim cofounder status
@kevinamiri909
15 days ago
Bro this is the real person behind all OpenAI innovations I swear.
this guy AIs.
Dwarkesh pls stop uploading teasers before the actual show.. seeing shortform content suggests that the episode exists, but there is no way to know until visiting your channel, only to then get disappointed.
@daniellawson9894
16 days ago
Could keep it but put teaser / preview in the title
@radekwarowny
16 days ago
Yeah I hate that too
@aazzrwadrf
16 days ago
The full ep is probably not done editing yet. I don’t mind it tbh.
@forthehomies7043
16 days ago
such an entitled take bro. just sub and keep notis on
@noone-ld7pt
16 days ago
@@forthehomies7043 not an entitled take at all, he shared his opinion and a lot of people agreed. that's useful constructive feedback.
When will the link to the full podcast be uploaded?
what is up with the sound mixing, something is off
@BadWithNames123
15 days ago
they use ai to "clean" the audio track.. I hate it
@user-bp2ol4wi1c
15 days ago
@@BadWithNames123 it sounds shit, raw would do better I think
Didn't really age well with Claude 3 and GPT 4o
Wow, great topic
Another legend. This is definitely my go-to AI podcast.
They will never run out of data. What they will likely run out of is captured data. Humans collectively produce massive amounts of _text data_ just by talking to each other every day; the question is how to capture it in a voluntary manner. Even if LLMs on their own can't get us to AGI, they can serve as a sophisticated foundation on which to train other modalities.
@dovekie3437
16 days ago
How much of the human corpus of knowledge, history, science, and literature are LLMs actually trained on? I would guess less than 1/50th of existing books, comparing the training-set size against the total terabytes of text all those books would require.
@squamish4244
8 days ago
@@dovekie3437 Not to mention the five million scientific papers produced every year, a number that has soared in recent years.
@dovekie3437
7 days ago
@@squamish4244 Hopefully the LLMs put information gained from "scientific" papers from the humanities in the same place in its memory that it puts religious texts.
and the game is back to algorithms and compute, again!
My hypothesis (though I don't have data to support it): current generative AI technologies (LLMs) will reach a plateau soon, for at least three reasons. Reason 1: underlying models zero in on a single value, which makes cross-domain generation of text (or images, videos, or data points) very limited and sometimes awkward. Reason 2: post-2022/23, the distinction between naturally occurring data and synthetic data is blurring very fast, which puts training data into a downward self-spiral. Reason 3: limited labeled data availability within niches, for example images of various tree species vs. images of a tree.
@Hexanitrobenzene
14 days ago
You might be right. Mike Pound on Computerphile discusses a new paper: kzread.info/dash/bejne/lniJpY-FobnYgLg.html
it's not the data but the ARCHITECTURE that is a dead end
@kraithaywire
16 days ago
What do you mean by dead? Will we not see any more progress for quite some time or what? I would really love to know. Thank you.
@IcySpicy3
16 days ago
You mean x86?
@JackLawrence-dn2jb
15 days ago
@@kraithaywire People have been saying the ARCHITECTURE IS A DEAD END for years, but that continues to be disproven time and time again. Don't listen to the doomers and naysayers.
@egor.okhterov
14 days ago
@JackLawrence-dn2jb how is it disproven? By fancy UI? 😂
@JackLawrence-dn2jb
14 days ago
@@egor.okhterov The fact that the models are getting better year by year. Elo scores going up, and now we have multi-modality, improved text-to-video, improved text-to-image. People like you have been saying these are a dead end for years. Clowns lmao
I found someone who makes sense, please release the full interview, I can't wait to watch it.
"uhm"
REALLY don't like seeing clips of a full length interview that doesn't exist. Please stop doing this.
Well this aged like milk.
@assgoblin3981
16 days ago
what the fuck happened
@aloysius_music
16 days ago
Did it? GPT-4o is super impressive (and uncanny), but the core reasoning isn't a massive step up. There's a reason they didn't call it GPT-5.
@squamish4244
8 days ago
@@aloysius_music It reveals the potential of where we can go from here, though. LLMs have a limit, but GPT-4 isn't it.
uhhhhhh ummmmm uhhhhh
He’s not a good public speaker.
@Derick99
7 days ago
I think he's having trouble answering without saying too much