An Exactly Solvable Model for Emergence and Scaling Laws

The paper: arxiv.org/abs/2404.17563
Support my learning journey either by clicking the Join button above or becoming a Patreon member: / tunadorable
Discuss this stuff with other Tunadorks on Discord: / discord
All my other links: linktr.ee/tunadorable

Comments: 35

  • @Crawdaddy_Ro · 7 days ago

    Emergence is one of the concepts I enjoy researching most! Complexity science is, without a doubt, a truly futuristic science! This paper really pulls my cord, dude!

    Edit: The paper is interesting but feels pretty basic when it comes to explaining emergence in deep learning models. They used a simplified model with specific tasks designed just for this research, and while it's cool to see skills following a power law and showing up as a sigmoid curve, I'm not sure how relevant it is to real-world applications. The models seem too tailored to this experiment to draw any solid conclusions about how skills emerge in more complex, practical scenarios.
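    The power-law-plus-sigmoid picture this comment refers to can be sketched numerically. This is a toy illustration, not the paper's actual model: the Zipf exponent, the sigmoid form, and every constant below are assumptions chosen just to show the qualitative effect.

```python
import math

# Toy model: skill k appears in training data with Zipf-like frequency
# f_k ~ k^(-ALPHA); per-skill accuracy rises as a sigmoid in the number
# of examples of that skill seen so far.
ALPHA = 1.5          # assumed power-law exponent
N_SKILLS = 50
THRESHOLD = 100.0    # assumed examples needed to reach 50% accuracy

def skill_accuracy(k: int, total_tokens: float) -> float:
    """Sigmoid accuracy of skill k after training on `total_tokens` examples."""
    freq = k ** (-ALPHA)                # relative frequency of skill k
    seen = total_tokens * freq          # examples of skill k seen so far
    return 1.0 / (1.0 + math.exp(-(seen - THRESHOLD) / (THRESHOLD / 4)))

def mean_error(total_tokens: float) -> float:
    """Average error across skills: individually sigmoidal, smooth in aggregate."""
    accs = [skill_accuracy(k, total_tokens) for k in range(1, N_SKILLS + 1)]
    return 1.0 - sum(accs) / len(accs)

for tokens in [1e2, 1e3, 1e4, 1e5, 1e6]:
    # Each individual skill flips from ~0 to ~1 sharply ("emergence"),
    # but the mean error over all skills declines smoothly, scaling-law style.
    print(f"{tokens:>9.0e}  mean error = {mean_error(tokens):.3f}")
```

    The point of the sketch: nothing in the per-skill curve is smooth, yet the aggregate loss is, which is the reconciliation of "emergent abilities" with smooth scaling laws that the comment is describing.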

  • @loganlawrence1476 · 7 days ago

    Parameter count limit in the bottleneck table might also be a proxy for inference costs or product latency, e.g. a company sets aside a fixed budget for a deployed model but has lots of time until go-live and is willing to spend money on training to find the best performer within that speed constraint. Just an idea, great video btw!

  • @AndreRSilva-oz1nd · 7 days ago

    Man, amazing vids, keep the good work going!

  • @marcfruchtman9473 · 7 days ago

    The title of this paper is super interesting. I do find the choice of "skills" as being basis functions within the model somewhat difficult to wrap my head around. It would be immeasurably more useful if they were able to demonstrate that it also modeled some real-world example, such as using the MNIST data and applying some basis functions such as detecting horizontal lines, vertical lines, diagonals, loops, etc., and then evaluating the result to see if it matched their findings when using the mathematically derived basis functions. I look forward to any future updates.

  • @whemmakatatt5311 · 7 days ago

    NICE content. S tier

  • @wwkk4964 · 7 days ago

    Top tier content!

  • @netherportals · 7 days ago

    Pretty cool new ability

  • @joe_limon · 7 days ago

    I think to advance future models, we are going to have to figure out how to increase training efficiency.

  • @kimcosmos · 6 days ago

    Is it possible to separate the simple skills by filtering out all data that does not assume those simple skills? I.e. filter out the obvious data once it becomes obvious, to avoid repeatedly reinventing the wheel. It means identifying the obvious once it becomes obvious, i.e. looking for nonobvious or counterintuitive data: running a prediction filter on (what has become) the obvious. It means testing generalising circuits (4 layers to find, +4 to test) and using them as retrieval heads to filter the data stream. Q* is relatively compute-inefficient but useful with sparse data because of its improved accuracy, and this would be a good use case. Filtered data, fewer shots. Maybe fewer parameters after filtering, and maybe fewer layers if fast-grokking retrieval heads with 8 layers.

  • @Tunadorable · 6 days ago

    interesting. recently the fineweb-edu dataset was created as a filtered down version of fineweb where they asked llama 70b whether each document had educational value or not. i imagine that may be a conceptually easier method (albeit potentially more computationally intensive). a question like “is this document relatively mundane, or does it contain unusually rare/complex facts/reasoning?”. alternatively some sort of rating by perplexity or some other quantitative measure might work.
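    The perplexity-rating idea at the end of this reply could be prototyped roughly like this. A minimal sketch under stated assumptions: the smoothed unigram background model, the tiny corpus, and the cutoff value are all placeholders, and real pipelines (fineweb-edu uses an LLM-based classifier, per the reply above) score documents with far stronger models.

```python
import math
from collections import Counter

def unigram_perplexity(doc: str, background: Counter, total: int) -> float:
    """Per-token perplexity of `doc` under a background unigram model.
    Rare/unusual tokens -> higher perplexity."""
    tokens = doc.lower().split()
    log_prob = 0.0
    for tok in tokens:
        # add-one smoothing so unseen tokens don't zero out the probability
        p = (background[tok] + 1) / (total + len(background) + 1)
        log_prob += math.log(p)
    return math.exp(-log_prob / max(len(tokens), 1))

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "eigenvalues of the covariance matrix concentrate under subgaussian noise",
]
background = Counter(tok for doc in corpus for tok in doc.lower().split())
total = sum(background.values())

# Keep only documents that are surprising relative to the mundane
# background distribution, i.e. high-perplexity documents.
CUTOFF = 15.0  # placeholder threshold
kept = [d for d in corpus if unigram_perplexity(d, background, total) > CUTOFF]
```

    With this toy corpus only the technical sentence survives the filter; the two mundane sentences score low because their tokens dominate the background counts.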

  • @kimcosmos · 6 days ago

    @@Tunadorable RAG-like retrieval heads can use a more focused subset for local learning, especially few-shot sparse-data methodical analytics, i.e. "What am I missing here?" Fineweb extracts data pairs (ER?), with one of its 5 reward prompts for creating artificial data being: "Add another point if the extract addresses certain elements pertinent to education but does not align closely with educational standards. It might mix educational content with non-educational material, offering a superficial overview of potentially useful topics, or presenting information in a disorganized manner and incoherent writing style."

  • @kimcosmos · 6 days ago

    @@Tunadorable "Add another point if the extract addresses certain elements pertinent to education but does not align closely with educational standards. It might mix educational content with non-educational material, offering a superficial overview of potentially useful topics, or presenting information in a disorganized manner and incoherent writing style." That's 1 out of 5 points in their artificial-generator prompt. It's not using Q* to find optimum paths. Fineweb is getting the low-hanging fruit; Q* shakes the tree and is good for ticker feeds.

  • @RoulDukeGonzo · 7 days ago

    How does this relate to the whole "measurement creates emergence" thing?

  • @JGLambourne · 6 days ago

    Re: orthogonality of real-world skills. Feels a little bit of a stretch to think of such complex things in this linear way, but I guess one could imagine some "basis" skills from which others are composed.

  • @andrewsilber · 7 days ago

    Maybe Congress should authorize a full digitization of the Library of Congress if what we need is trillions of tokens of quality data. Presumably they could justify it on the grounds of national security, if the goal is to stay ahead in the “AI arms race”.

  • @Tunadorable · 7 days ago

    interesting

  • @phpn99 · 7 days ago

    It's a descriptive model. It has no predictive power.

  • @JehovahsaysNetworth · 7 days ago

    ChatGPT can’t write PHP the way I showed it how to. I tried and it failed to understand. If you know a better bot to try out, direct me to one to work with.

  • @SoFukinDope24 · 7 days ago

    easy solution: use Anthropic

  • @JehovahsaysNetworth · 7 days ago

    @@SoFukinDope24 I will search for it and try it, thanks

  • @RoulDukeGonzo · 7 days ago

    Easier solution: learn Python

  • @JehovahsaysNetworth · 7 days ago

    @@RoulDukeGonzo I know some Python; I used to use a Pywikibot on my MediaWiki

  • @ricosrealm · 7 days ago

    Claude is the best for coding.

  • @RoulDukeGonzo · 7 days ago

    From the comments I think I got the answer, but just to clarify, this is theoretical, right? Why would skill data be so uniform for real skills?

  • @Tunadorable · 7 days ago

    yes, it's theoretical. on real skills it's likely not as uniform, but it's very possible the general theme still holds true in aggregate. The idea that some skills are common while rare skills are very very very (orders of magnitude, or exponentially, more) rare seems reasonable; the alternative, that rare skills are only slightly (geometrically? linearly?) rarer, would be a good thing. However, the fact that so far we've had to increase LLM training compute by orders of magnitude in order to get linear returns on benchmarks would imply the former
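    A back-of-the-envelope version of the "orders of magnitude of compute for linear returns" point, assuming Zipf-distributed skill frequencies and a fixed number of examples of a skill needed to learn it (both toy assumptions, not figures from the paper):

```python
# If skill k has Zipf-like frequency f_k ~ k^(-ALPHA) and mastering a skill
# takes a fixed number of examples OF THAT SKILL, then seeing enough examples
# of the k-th most common skill requires ~ k^ALPHA total training tokens.
# Each additional skill mastered (one more "unit" of benchmark score) costs
# multiplicatively more data/compute.
ALPHA = 1.5                # assumed power-law exponent
EXAMPLES_PER_SKILL = 1000  # assumed examples of a skill needed to learn it

def tokens_to_learn(k: int) -> float:
    """Total training tokens needed so the k-th most common skill
    (unnormalized frequency k^-ALPHA) has been seen EXAMPLES_PER_SKILL times."""
    return EXAMPLES_PER_SKILL * k ** ALPHA

for k in [1, 10, 100, 1000]:
    print(f"skills mastered: {k:>5}  tokens needed: {tokens_to_learn(k):.1e}")
```

    Under these assumptions, going from 10 mastered skills to 100 costs roughly 30x more tokens, i.e. log-linear benchmark returns on compute, which is the regime the reply above is pointing at.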

  • @sikunowlol · 7 days ago

    oi

  • @jacksaunders1929 · 7 days ago

    Have you thought about doing a PhD?

  • @Tunadorable · 7 days ago

    oof, during undergrad I considered doing one in economics, but back then, after going through the legit publication process, talking with professors, looking at the way the system works, etc., it sounded more restrictive than freeing. I considered it again when I decided I wanted to pivot into AI, but I was blessed to chance upon a short conversation with Paul Christiano and he told me it wasn't necessary for this field, just self-publish then go work at a company. Right now I'm hoping I can become self-sufficient off YouTube and do a combo of research & science education without any boss/restrictions

  • @danielmartinmonge4054 · 3 days ago

    This paper seems to miss the point about emergent capabilities. From my understanding, the model is learning to solve a specific problem only because it appears in the dataset and is solved in an exact way. The more frequently this exact problem appears, the faster the model learns it.

    However, true logic, abstraction, and understanding are about finding broader connections between concepts and solving new problems that are not present in the dataset. My intuition suggests that this approach is not suitable for learning natural language. Human knowledge cannot be reduced to a finite set of easily solvable problems. This method overlooks the critical strength of large language models: symbolic abstraction, where specific problems are merely examples of broader categories.

    It seems to me that the paper fails to address the core aspects of these new architectures. It applies mathematical models designed for narrow, purpose-specific AI rather than for this broader kind of intelligence.

  • @waveFunction25 · 7 days ago

    Oi

  • @GNARGNARHEAD · 7 days ago

    oi, this is a comment