Exploring Learning Dynamics in Concept Space

arxiv.org/abs/2406.19370v1
Support my learning journey by clicking the Join button above, becoming a Patreon member, or sending a one-time Venmo: account.venmo.com/u/tunadorable
Discuss this stuff with other Tunadorks on Discord.
All my other links: linktr.ee/tunadorable

Comments: 18

  • @spencerfunk6697 · 22 days ago

    oi pretty graphs

  • @kimcosmos · 21 days ago

    It's like telling frustrated infants to use their words. Overfitting is over-rewarded. Using their words (concepts) is risky because there was a lot of frustration before reward was reached. It's learning aversion. And yes, very like grokking, but somehow they are prompting it to guess based on stable concepts; not clear on how. Necessity is the mother of invention, so right-sizing the number of layers (effort scope) might help. When we get super AGI it's going to spend a lot of time telling us what's within our learning scope and telling us to grow up.

  • @spencerfunk6697 · 22 days ago

    holy shit this is interesting. given 2 different things, it can infer that something equal or opposite may also exist. weird but im fuckin with it

  • @wwkk4964 · 21 days ago

    Thanks for sharing this earlier in the week. This paper's concept space was what I was reminded of when you were showing yesterday's paper on orthogonal bases for different learned skills. Stephen Wolfram also conducted some experiments with what he called "inter-concept space", where he was trying to explore the inaccessible regions of image diffusion models that we couldn't reach because we didn't have a word for them. Had some weird images 😅

  • @TylerLemke · 21 days ago

    You are growing really fast. Great work. I am inspired by what you are doing to start an AI channel of my own. Not sure what angle to take but I really like how you are using this as a way to document your journey and now help others. Keep up the hard work.

  • @Tunadorable · 21 days ago

    🙏 thanks and you should! once u do send a video link in my discord

  • @Crawdaddy_Ro · 21 days ago

    Excellent research. This one is much better than the last paper on emergent abilities. The concept space framework is especially fascinating, as I see it being very useful in further emergence research. If we can utilize a diverse range of datasets and establish clear metrics for measuring the emergent abilities, this could be the first step in a long and winding staircase.

  • @marcusrobbins854 · 22 days ago

    The way that it cannot separate the concepts of color and fruit in the case of strawberry is interesting, but only where the color remains unspecified for strawberry in the dataset. Where a thing is shown to have the ability to come in different colors, the capability exists to abstract to a new color; but where it does not, the essence of strawberry becomes entangled with the essence of red. That implies an inappropriate division made between different kinds of fruit. Is it learning to generalise the color of different fruit for each individual fruit? With a larger model, would it learn to generalise the concept of a thing of which strawberry is a kind, and then associate the color-change property with the object rather than each thing? Does this point to a fundamental limitation in learning capabilities, and if so, is there a way of carving out this particular kind of learning, or limitation?

  • @Tunadorable · 22 days ago

    my impression based on other papers is that a larger model trained on more diverse data would in fact abstract out the concept of color enough to be able to change the color of anything, including strawberries that never get labeled with a color. imagine a dataset with every fruit where some percentage of fruits do come in multiple colors; such a dataset would likely be large enough to ensure differentiation of the concepts despite the lack of color labels on some fruits. the point of this model is more so a low-level mechanistic understanding
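
    For illustration, here is a minimal Python sketch of the kind of dataset being described: every fruit appears, but only an assumed fraction of fruit classes ever carry an explicit color word in their captions. The fruit list, color list, and the 50% split are made up for the example, not taken from the paper.

    import random

    # Toy caption generator: some fruit classes are color-labeled,
    # others (e.g. strawberry) never mention a color at all.
    FRUITS = ["apple", "banana", "grape", "lemon", "strawberry", "pear"]
    COLORS = ["red", "yellow", "green", "purple"]
    COLOR_LABELED_FRACTION = 0.5  # assumed fraction, not from the paper

    random.seed(0)
    color_labeled = set(random.sample(FRUITS, int(len(FRUITS) * COLOR_LABELED_FRACTION)))

    def make_prompts(n_per_fruit: int = 4) -> list:
        """Return captions; unlabeled fruits never mention a color."""
        prompts = []
        for fruit in FRUITS:
            for _ in range(n_per_fruit):
                if fruit in color_labeled:
                    prompts.append(f"a {random.choice(COLORS)} {fruit}")
                else:
                    prompts.append(f"a {fruit}")  # color left implicit
        return prompts

    if __name__ == "__main__":
        for prompt in make_prompts(2):
            print(prompt)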

  • @seriousbusiness2293 · 17 days ago

    I'm highly curious about these types of papers! It feels more philosophical and closer to base ideas. It's so boring seeing new models just being better through some specialized training or by feeding more data into a bigger model. A human can see two images of a new fantasy fruit and draw a purple version of it. The solution for AI can't be to train on billions of variations or to label every detail of the image and prompt. We need to force emergent properties. I'm highly curious how we should adjust neurons for that; I feel backprop is overkill and needs a complementary buddy. We probably need to adjust only a few neurons for concept-space adjustments (like in the Claude Golden Gate Bridge paper example).

  • @mrpocock · 22 days ago

    The test data in grokking are out of distribution for the training data if you have strong enough over-fitting priors on the training data. In the extreme case, where your definition of being out-of-distribution is a distribution that is peaked with P=1 for things that are exactly training examples and P=0 for things not in the training set, the test sets are absolutely out of distribution. When you train past the initial memorisation phase to the generalisation phase, it is still learning the P=1 distribution, but because sparsity takes over from data fit, it does so with algorithmically simpler representations. These carry no penalty, since there is no cost for modelling the P=0 examples it never sees during training, so they happen by accident to capture rules that generalise beyond this overfit distribution. It's kind of the opposite intuition from traditional ML. I am wondering if we've been going about things wrongly during training, and should absolutely push models to grossly memorise data as early as possible: e.g. take a minibatch, run gradient descent until that minibatch is memorised, then move on to the next batch, and only later start to do less work per batch once you aren't seeing gradient decay.
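
    Here is a minimal PyTorch-style sketch of that "memorise each minibatch before moving on" schedule. The model, data loader, loss, learning rate, and memorisation threshold are all placeholder assumptions for illustration, not anything from the paper or the comment.

    import torch
    from torch import nn

    def memorize_then_move_on(model: nn.Module,
                              loader,                        # yields (inputs, targets) tensors
                              loss_threshold: float = 1e-3,  # "memorised" cutoff (assumed)
                              max_inner_steps: int = 200,    # safety cap per minibatch
                              lr: float = 1e-3):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()  # placeholder loss; swap in whatever the task needs
        for inputs, targets in loader:
            # Keep stepping on the SAME minibatch until it is (nearly) memorised,
            # then move on to the next one.
            for _ in range(max_inner_steps):
                optimizer.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()
                optimizer.step()
                if loss.item() < loss_threshold:
                    break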

  • @hjups · 22 days ago

    Very interesting paper, and the proof of the failure to disentangle is notable. Unfortunately, the authors did not explore the converse: what happens if you train with "red apple" and "yellow apple" but only prompt "apple"? Typically diffusion models give you some combination based on the statistical presence in the training data (almost like a quantum superposition until the observation of sampling collapses the distribution). Seeing that result proven empirically would have been nice. Also, the experiments in the paper have limited generalization, since diffusion models are typically able to latch onto strong concepts early in training but can still fail to establish coherent structural detail. The choice to use simpler shapes (especially circles) doesn't really help show the distinction between "concept" and "performance". So while you might be able to prompt for things like "purple train", it's going to be malformed with incoherent class details (e.g. tracks, windows, etc.) much further into training than the OOD departure point (100k gradient updates vs 20k). Long-tail concepts are obviously slower to learn as well, which the authors didn't seem to explore either (e.g. use partial masking but train for 1/alpha times longer).
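
    A rough sketch of how that converse experiment could be checked empirically: sample the bare "apple" prompt many times, tally which color each sample collapses to, and compare the mixture against the red/yellow frequencies in the training captions. Here sample_image and classify_color are hypothetical stand-ins for a trained toy diffusion model's sampler and a simple pixel-space color check; neither comes from the paper.

    from collections import Counter

    def color_mixture(sample_image, classify_color, n_samples: int = 500) -> dict:
        """Empirical color distribution when prompting only the class name."""
        counts = Counter()
        for _ in range(n_samples):
            image = sample_image(prompt="apple")  # under-specified prompt
            counts[classify_color(image)] += 1    # e.g. "red" or "yellow"
        return {color: count / n_samples for color, count in counts.items()}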

  • @banalMinuta · 21 days ago

    hey man, I'm having trouble trying to figure out how they actually manipulate context length. do you know how they come up with the token context window? everything I read about it just seems to make me more confused

  • @Tunadorable · 21 days ago

    could be wrong, but I'm not sure how context length would be relevant here; they're messing with diffusion models, not LLMs. Try looking up the three models they did experiments on to learn more about their architectures. I believe at least one, if not two, of the three were ViTs, and the other I would assume is probably a U-Net, given that it was an older version of one of the ViTs.

  • @banalMinuta · 20 days ago

    honestly, it was just a random curiosity, one that sparked an entire day of research. I am no credentialed academic, but I think these models are actually metafunctional semiotic systems. I think they know much more than we realize

  • @banalMinuta · 22 days ago

    obviously anecdotal evidence isn't evidence. however, I will say that spending countless hours modulating the minutiae of prompts yields results that boggle the mind

  • @sikunowlol · 22 days ago

    oi

  • @dadsonworldwide3238 · 22 days ago

    They don't know things; it is analytical learning that separates us from animals, just like how our ancestors created English and taught serfs and slaves alike the z-y vertical axis of faith (mosaic commandments), i.e. quantum physics ✝️, and the horizontal axis, the cross ✝️, x works, physical lawisms, etc. Newton's 3 lines of measure = the truest known standard; the flattest surface tunes all precision instruments and pragmatic common-sense Christian objectivism. It is a repeating puritan movement encoded in English longitude and latitude, built on the pilgrimage-confirmed history of nations, people, places and things on the alphabetical exodus, Indo-European language symbols on objects. I'd argue convergences of neural nodes in objects, both image or word, have to cross paths; if it's tuned wrong, cursed or blessed, concave or deformity will set in. Self-drivers, robots, Twitter: all systems tuned by pragmatic common-sense objectivism proper will be in line with precision instruments, but it will call out when we prescribe realism over anti-realism to further over-time lines of measure. Evolutionary time will not work under that tuned line of philosophy. The specific way to get shit done is a very literal thread of life & technology.