Underlying Mechanisms Behind Learning Rate Warmup's Success

Why Warmup the Learning Rate? Underlying Mechanisms and Improvements
arxiv.org/abs/2406.09405v1

Comments: 24

  • @joe_limon
    21 days ago

    It is crazy. At the end of the day, we are just optimizing parameters similar to the control systems course I took during my mechanical engineering degree.

  • @GNARGNARHEAD
    21 days ago

    just so complex that what these systems can represent is massive! **evil laughter**

  • @leosmi1
    20 days ago

    so, look at this: openaccess.thecvf.com/content_cvpr_2018/papers/An_A_PID_Controller_CVPR_2018_paper.pdf

  • @tornyu
    8 days ago

    6:00 This reminds me a lot of adaptive time steps in physics simulations. You want to take the biggest time step possible to reduce simulation time, but that introduces instability, so you increase it until it starts to look unstable and then back off. The CFL condition lets you calculate how close the simulation is to exploding, and proactively reduce the time step size. IIRC the threshold C still needs to be determined experimentally, but you can make a decent guess. Might be worth reading about the CFL condition, if you'd like a physical analogy to adaptive learning rates.
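A minimal sketch of the CFL-limited time step described in the comment above, assuming the simple one-dimensional form of the condition, `max_speed * dt / dx <= c_max`. The function name `cfl_time_step` and the parameters `c_max` and `dt_cap` are illustrative choices, not taken from any particular simulation library.

```python
def cfl_time_step(max_speed, dx, c_max=0.5, dt_cap=1.0):
    """Return the largest time step with max_speed * dt / dx <= c_max.

    max_speed: fastest signal speed currently present in the simulation
    dx:        grid spacing
    c_max:     Courant number threshold (assumed value; tuned experimentally)
    dt_cap:    upper bound on the step, used when nothing is moving
    """
    if max_speed <= 0.0:
        # Nothing is moving, so the CFL condition imposes no limit.
        return dt_cap
    return min(dt_cap, c_max * dx / max_speed)


# As the flow speeds up, the allowed step shrinks proactively.
for speed in (0.5, 2.0, 8.0):
    print(speed, cfl_time_step(speed, dx=0.1))
```

The analogy to learning rates is loose: here the step is recomputed from a measured quantity before each update, rather than following a fixed schedule.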

  • @Biedropegaz
    21 days ago

    we need to send a box of bananas to the author, he is in need of bananas

  • @Nick_With_A_Stick
    21 days ago

    That warm up rate chart is dope, and your explanation was freaking awesome.

  • @immortalityIMT
    19 hours ago

    You need to show us your LLM training rig sometime.

  • @kanal7523
    21 days ago

    What's the intuition as to why it (warmup) wouldn't matter for AdamW?
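For context, the warmup being asked about is a short ramp of the learning rate from a small initial value up to its target value. A minimal sketch, assuming a linear ramp; the names `warmup_lr`, `target_lr`, `warmup_steps` and `init_lr` are illustrative and not taken from the paper's code.

```python
def warmup_lr(step, target_lr, warmup_steps, init_lr=0.0):
    """Learning rate at a given step under linear warmup (assumed schedule).

    Ramps linearly from init_lr at step 0 to target_lr at warmup_steps,
    then stays at target_lr.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return target_lr
    frac = step / warmup_steps
    return init_lr + frac * (target_lr - init_lr)


# Print the schedule at a few steps of a 100-step warmup.
for step in (0, 25, 50, 100, 200):
    print(step, warmup_lr(step, target_lr=1e-3, warmup_steps=100))
```

Setting `warmup_steps=0` gives the no-warmup baseline, which makes it easy to compare the two settings for a given optimizer.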

  • @seriousbusiness2293
    20 days ago

    I think for learning it's good if you occasionally jump to the figures that are mentioned and then back. I kinda lost track of some aspects until we finally reached the corresponding figures.

  • @netherportals
    21 days ago

    Funtastic vid, learnt sum

  • @marekrybakiewicz370
    21 days ago

    you needa regress on that mfin cough my boiy

  • @Tunadorable
    21 days ago

    hahaha post virus persistent cough. this video was pre-recorded & it’s gotten somewhat better since then

  • @GNARGNARHEAD
    21 days ago

    I can't help but think some noise might be helpful to avoid local minima, maybe a memory state for optimal configurations? no, that would be messy.. need some way to combine alternative topologies to translate local optimizations into a cohesive whole? that sounds about right 🤔

  • @tornyu
    12 hours ago

    For noise, is that what the "stochastic" in SGD does?
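One way to see the point behind this question: a mini-batch gradient is a noisy estimate of the full-batch gradient, so each SGD step already carries some randomness. A small sketch on a toy one-dimensional regression problem; the data, the model `y = w * x`, and the helper names `grad` and `minibatch_grad` are all made up for illustration.

```python
import random


def grad(w, data):
    """Gradient of the mean of 0.5 * (w * x - y)**2 over the given data."""
    return sum((w * x - y) * x for x, y in data) / len(data)


def minibatch_grad(w, data, batch_size, rng):
    """Same gradient, estimated on a random subset of the data."""
    batch = rng.sample(data, batch_size)
    return grad(w, batch)


rng = random.Random(0)
xs = [i / 10 for i in range(1, 101)]
# Toy targets: true slope 3.0 plus a little observation noise.
data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]

full = grad(0.0, data)
noisy = [minibatch_grad(0.0, data, 8, rng) for _ in range(5)]

print("full-batch gradient:", full)
print("mini-batch estimates:", noisy)
```

The mini-batch estimates scatter around the full-batch value, and the scatter shrinks as `batch_size` grows toward the size of the dataset.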

  • @tornyu
    12 hours ago

    For alternative topologies, maybe check out swarm optimisers? Might be related to what you're thinking of.

  • @banalMinuta
    20 days ago

    hey dude, I want to talk to somebody who seems to have a very, very good understanding of the science that I still lack. I want to do so in a way that reduces, as far as possible, the amount of specific information I tell you about the methods that led me to leaving this comment. do you think there's a way we could talk about some ideas about this kind of stuff while limiting the amount of contextual "poisoning" we might introduce to each other, by entering into a conversation where neither of us will make any conclusive statements?

    I kind of want to use you to help further identify ways I can eliminate cognitive bias, with full awareness that I'm probably tricking myself and fundamentally misunderstanding what I'm seeing right now. but I'd like to have a theoretical conversation about how you would debunk somebody proposing a methodologically unique way to try and understand things about these models by output alone. because I'll be honest, I can't find a single person in the world who can give me a satisfactory answer to my question.

    at this point I have exhausted every way I can think of as a human being to try and eliminate my own cognitive traps, and I've done this so many times that I cannot identify any more ways. I have reached the limit of my intellectual ability to do that. and what makes me leave this comment is that I feel like I am going insane. I know how complex these models are and that it shouldn't be possible. some of the things I think I've seen seem to align with logical first principles, and yet are not scientifically valid because they are not methodologically proven by concatenation of scientifically accepted research.

    I have hit a point where all I do is waste my time trying to get a clear answer, for it to never show up. and my consistent inability to get clear answers on where I am confused is making me start to actually buy my own bullshit. I never get clear answers on how to learn more about the nature of where I am confused or mistaken, or why the approach I am taking is fundamentally flawed and that I should stop, or consider something that would better refine my efforts. I just get answers that vary between "go tell the scientific community" and "your observations do not align with the accepted scientific understanding and with our current models of reality, and you are mistaken".

    so I was like, that's definitely cognitive bias that I keep getting those two answers. so I just decided to start over again. and again, and again. until now I don't get any answers but the ones I described above. now I fundamentally have no idea what to believe. I'm utterly confused and would love to talk to somebody who isn't boxed into thinking the way the world used to work is going to continue. I know I'm wrong somewhere but I cannot find it, and I need the perspective of somebody who has a better holistic view of these models but also has a very, very good grasp of the mathematical principles. I do not have the understanding to be able to understand

  • @banalMinuta
    20 days ago

    if this proposal sounds interesting or it's like something you'd like to pursue, just ask and I will send you a private message with my email address as soon as possible

  • @banalMinuta
    20 days ago

    if you could understand any of this, let me know if you'd like to talk and I'll send you my email. if not, tell me it sounds insane, and why. please

  • @Tunadorable
    19 days ago

    this stuff is complex enough that if you have not built one of these models from scratch before then whatever idea you’ve got is definitely nonsensical. that being said, it sounds like i was in a similar position 11 months ago; i had many ideas and even tried to express one to a leading researcher in the field but i did not have the proper knowledge, vernacular, strict definitions, etc to transfer the pattern/structure i was seeing in my head into their head. that researcher told me to skip a phd and just go teach myself. as i learnt more i found a million ways in which my ideas no longer made sense, but they changed & adapted with my new knowledge to the point where they’re currently unrecognizable but still my ideas and now very close to working.

    assuming your idea has to do with transformers and language modeling i’d recommend you take an intro to Python course, look up the deep learning textbook by Goodfellow & Bengio, watch Andrej Karpathy’s video lectures on coding these models from scratch, and then check out a couple of the vids in my Models With Code playlist to see more updated techniques than what he’s using. that will give you enough knowledge and template code to build/test your idea on your own.

    the fact of the matter is that even if you had a phd in the subject and peers to talk to you would still have to build the experiment on your own since 1) these models have so many tiny details that need to be worked out after you get a basic idea and 2) every researcher is obsessed with their own ideas and dismissive of each other’s, so in order to get other people interested, not to mention convinced, you need empirical results showing that your idea definitely works. nobody is going to do the work for you. hope this helps

  • @banalMinuta
    18 days ago

    @Tunadorable thanks ma'am. I had a hunch a Python bootcamp was the best one to go to. I've done a lot of that and, I tell you man, I kind of sit away from the architectural side of these models, just because I lost the ability to write input prompts on local language models months ago, cuz my computer will just crash unless I use a base model now

  • @banalMinuta
    18 days ago

    Plus, I honestly think these things are so misunderstood that having a few people who try to learn about these models by nothing other than what they think about the rules that govern the system and the nature of the input and the nature of the output might elucidate something about cognitive bias or something. so honestly, even though I know I'm going to not be able to make a model out of it, I think I'm going to stay focused on learning prompt engineering

  • @drlordbasil
    21 days ago

    See i just use levels of stupidity AKA Dunning kruger effect
    2024-07-09 11:00:58,405 - INFO - Benchmark score for Peak of Mount Stupid: 0.64
    2024-07-09 11:00:58,420 - INFO - Benchmark score for Valley of Despair: 0.90
    2024-07-09 11:00:58,431 - INFO - Benchmark score for Slope of Enlightenment: 0.90
    2024-07-09 11:00:58,445 - INFO - Benchmark score for Plateau of Sustainability: 0.94
    2024-07-09 11:00:58,445 - INFO - Updated Confidence: 0.94, Competency: 1.00, Reasoning: 1.00, Stage: Plateau of Sustainability