Andrej Karpathy

FAQ
Q: How can I pay you? Do you have a Patreon or something similar?
A: As a YouTube partner I share in a small amount of the ad revenue on the videos, but I don't maintain any other payment channels. I would prefer that people "pay me back" by using the knowledge to build something great.

Comments

  • @oleksandrasaskia (7 hours ago)

    Thank you so much for democratizing education and this technology for all of us! AMAZING! Much much love!

  • @monocles.IcedPeaksOfFire (7 hours ago)

    ❤ > It's (a) pleasure

  • @akhilphilnat (9 hours ago)

    Fine-tuning next, please!

  • @user-cb3pf6qf2z (10 hours ago)

    Is it possible to calculate the gradients just once and then reuse them, instead of resetting and recalculating every step? What am I missing?
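
    Short answer: no. The gradient is a function of the current parameter values, so it changes after every update; a gradient held over from the previous step describes the old point, not the new one. A minimal sketch with a made-up scalar example, not code from the lecture:

        import torch

        # The gradient depends on the current value of w, so it must be
        # recomputed (and the stale one cleared) after every update.
        w = torch.tensor([2.0], requires_grad=True)
        for step in range(3):
            loss = (w ** 2).sum()    # dloss/dw = 2*w, which depends on w
            w.grad = None            # clear the stale gradient
            loss.backward()
            print(step, w.item(), w.grad.item())  # the gradient shrinks each step
            with torch.no_grad():
                w -= 0.1 * w.grad    # the update changes w, so the next gradient differs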

  • @lielbn0 (15 hours ago)

    Thanks!! I have never understood how a neural network works better than I do now!

  • @Clammer999 (16 hours ago)

    One of the best under-the-hood looks at LLMs. Love the clarity and patience Andrej imparts, considering he’s such a legend in AI.

  • @anthonyjackson7644 (17 hours ago)

    Was I the only one invested in the leaf story 😭

  • @mikezhao1838 (23 hours ago)

    Hi, not sure what's happening, but I can't join the Discord; it says "unable to accept invite". I also have an issue getting the "m" from the multinomial distribution: I get 3 instead of 13 with num_samples=1; with num_samples>1 I get 13, but with num_samples=2 I get "mi.". People online suggest using torch 1.13.1, but I can't get that old version on my macOS.
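
    A note on reproducibility: PyTorch does not guarantee identical random number streams across releases or platforms, so a seeded torch.multinomial can legitimately return different samples on different torch versions; that alone can explain getting 3 instead of 13. A minimal sketch of the seeded call (made-up probabilities, not the lecture's bigram matrix):

        import torch

        g = torch.Generator().manual_seed(2147483647)
        p = torch.rand(27, generator=g)
        p = p / p.sum()   # normalize into a probability distribution
        # num_samples=1 vs num_samples=2 also consumes the RNG stream
        # differently, which changes every subsequent draw.
        ix = torch.multinomial(p, num_samples=1, replacement=True, generator=g)
        print(ix.item())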

  • @AsmaKhan-lk3wb (1 day ago)

    Wow!!! This is extremely helpful and well-made. Thank you so much!

  • @AdityaAVG (1 day ago)

    This guy has become my favorite tutor.

  • @ezekwu77 (1 day ago)

    I love this learning resource and the simplicity of the tutorial style. Thanks to Andrej Karpathy.

  • @switchwithSagar (1 day ago)

    In the case of the simple bigram model @32:38 we sample only one character, but while calculating the loss we consider the character with the highest probability. The sampled character is unlikely to be the same as the character with the highest probability in the row unless we sample a large number of characters from the multinomial distribution. So my question is: does the loss function reflect the correct loss? Can anyone help me understand this?
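
    One possible source of confusion: the loss never looks at the sampled character or at the argmax. It is the average negative log likelihood the model assigns to the character that actually follows in the training data; sampling is a separate, purely generative step. A minimal sketch with made-up numbers:

        import torch

        probs = torch.tensor([0.1, 0.7, 0.2])  # model's distribution over 3 characters
        target = 2                             # the character that actually came next
        loss = -torch.log(probs[target])       # NLL of the true target, not the argmax
        print(loss.item())                     # -log(0.2) ≈ 1.609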

  • @meow-mi333 (1 day ago)

    Thanks, I really need this level of detail to understand what’s going on. ❤

  • @MagicBoterham (2 days ago)

    1:25:23 Why is the last layer made "less confident like we saw", and where did we see this?
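
    For anyone else wondering: "less confident" refers to scaling down the last layer's weights (and zeroing its bias) at initialization, so the initial logits sit near zero, the softmax is near uniform, and the starting loss is close to -log(1/27). A minimal sketch of the idea (the 0.01 scale here is illustrative, not the exact factor from the video):

        import torch

        vocab = 27
        W2 = torch.randn((100, vocab)) * 0.01   # scaled-down last layer
        b2 = torch.zeros(vocab)                 # zero bias
        h = torch.randn(32, 100)                # stand-in hidden activations
        logits = h @ W2 + b2                    # near zero -> near-uniform softmax
        loss = torch.nn.functional.cross_entropy(logits, torch.randint(0, vocab, (32,)))
        print(loss.item(), -torch.log(torch.tensor(1 / vocab)).item())  # both ≈ 3.30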

  • @marcelomenezes3796 (2 days ago)

    This is the best intro-to-LLMs video ever.

  • @MrManlify (2 days ago)

    How were you able to run the loop without adding requires_grad in the "implementing the training loop, overfitting one batch" section of the video? For me it only worked when I changed the lines to:

        g = torch.Generator().manual_seed(2147483647)  # for reproducibility
        C  = torch.randn((27, 2),   generator=g, requires_grad=True)
        W1 = torch.randn((6, 100),  generator=g, requires_grad=True)
        b1 = torch.randn(100,       generator=g, requires_grad=True)
        W2 = torch.randn((100, 27), generator=g, requires_grad=True)
        b2 = torch.randn(27,        generator=g, requires_grad=True)
        parameters = [C, W1, b1, W2, b2]
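
    That per-tensor flag works, but if memory serves, the video enables gradients in a separate loop after creating the tensors, which is why the constructors there carry no requires_grad=True. A sketch of that pattern:

        import torch

        g = torch.Generator().manual_seed(2147483647)
        C  = torch.randn((27, 2),   generator=g)
        W1 = torch.randn((6, 100),  generator=g)
        b1 = torch.randn(100,       generator=g)
        W2 = torch.randn((100, 27), generator=g)
        b2 = torch.randn(27,        generator=g)
        parameters = [C, W1, b1, W2, b2]
        for p in parameters:
            p.requires_grad = True   # one switch for all parameters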

  • @AlexTang99 (2 days ago)

    This is the most amazing video on the mathematics of neural networks I've ever seen; thank you very much, Andrej!

  • @adirmashiach4639 (2 days ago)

    Something you didn't explain at 51:30: if we want L to go up, we simply need to nudge the variables in the direction of the gradient? How can that be if some gradients are negative?

  • @yourxylitol (2 days ago)

    First question: yes. Second question: because that's the definition of a gradient. If the gradient is negative, making the variable smaller is what increases the loss.
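
    A small numeric check of that answer (made-up values, not the lecture's expression): stepping each variable by a multiple of its own gradient, whatever its sign, increases L.

        import torch

        a = torch.tensor(-3.0, requires_grad=True)
        b = torch.tensor(2.0, requires_grad=True)
        L = a * b                # L = -6
        L.backward()             # dL/da = b = 2, dL/db = a = -3
        with torch.no_grad():
            a += 0.01 * a.grad   # positive gradient: a goes up
            b += 0.01 * b.grad   # negative gradient: b goes down
        print((a * b).item())    # -5.8706 > -6, so L increased either way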

  • @howardbaek5413 (2 days ago)

    This is the single best explanation of backpropagation in code that I've seen so far. Thanks Andrej.

  • @ThefirstrobloxCEO989 (3 days ago)

    Thanks a lot for the insight and demonstration. I really look forward to more videos from you, Andrej!

  • @debdeepsanyal9030 (3 days ago)

    Just a random fun fact: with gen = torch.Generator().manual_seed(2147483647), the bigram-generated name I got was `c e x z e .`, amazing.

  • @mehulchopra1517 (3 days ago)

    Thanks a ton for this, Andrej! Explained and presented in such simple and relatable terms. Gives me confidence to get into the weeds now.

  • @wangcwy (4 days ago)

    The best ML tutorial video I have watched this year. I really like the detailed examples and how these difficult concepts are explained in a simple manner. What a treat to watch and learn!

  • @hotshot-te9xw (4 days ago)

    What methods of alignment do you personally feel are extremely promising for ensuring future AGI doesn't kill us all?

  • @a000000j (4 days ago)

    One of the best explanations of LLMs...

  • @ced1401 (4 days ago)

    Thank you very much

  • @quentinquarantino8261 (4 days ago)

    Is this the real, one and only Andrej Karpathy? Or is this a deep fake?

  • @soumilbinhani8803 (4 days ago)

    Hello sir, it would be great if you could make a video on how exactly these tokens are converted into embedding vectors, e.g., how to build word2vec. Thank you!

  • @soblueskyzll (4 days ago)

    I am following along exactly (I believe) to calculate all the gradients, but beginning with dhpreact the results show "exact: False, approximate: True", with maxdiff on the order of 1e-9 to 1e-10. Is it just a numerical issue, or did I do something wrong? Has anyone had the same issue?
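
    That pattern is almost certainly fine: "exact: False" with "approximate: True" and a maxdiff around 1e-9 means floating-point round-off (e.g., a different order of operations), not a wrong derivative. From memory, the notebook's cmp helper is roughly:

        import torch

        def cmp(s, dt, t):
            ex = torch.all(dt == t.grad).item()   # bit-for-bit equality
            app = torch.allclose(dt, t.grad)      # equality up to a small tolerance
            maxdiff = (dt - t.grad).abs().max().item()
            print(f'{s:15s} | exact: {ex} | approximate: {app} | maxdiff: {maxdiff}')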

  • @Kevin.Kawchak (4 days ago)

    Thank you for the discussion

  • @josephmathew4667 (5 days ago)

    Thank you so much, Andrej. As many have already commented, this was by far one of the best lectures I have ever listened to.

  • @chadlinden6912 (5 days ago)

    Learning the math is really interesting; it helps to build a mental image of a plane or block of vectors shifting while training. I'd be curious to know whether, in the history/evolution of ML and AI, hardware drove the intense matrix-math software solutions to AI, or whether improving hardware made this math possible.

  • @ehudklein (5 days ago)

    wow

  • @zbaktube (6 days ago)

    About makemore: Did you show it to Elon? 😀

  • @zbaktube (6 days ago)

    Hehe, the bloopers at the end are priceless! I almost switched off the video when you said goodbye 😀

  • @Aliced3645 (6 days ago)

    Finished!

  • @andreasfraunberger5169 (6 days ago)

    🎯 Key Takeaways for quick navigation:
    00:00 📝 Andrej Karpathy re-presents a popular talk on large language models for YouTube.
    00:29 💾 LLaMA 2 70B by Meta AI is an open-weights large language model, unlike ChatGPT.
    04:09 💰 LLaMA 2 70B's training involves high costs, utilizing 6,000 GPUs over 12 days.
    06:56 🧠 Large language models predict the next word, gaining context from extensive training.
    09:03 📚 Pre-training compresses internet data into a model's parameters, similar to a lossy zip file.
    14:29 🤖 Fine-tuning shapes base models into assistants via quality Q&A pairs.
    22:21 📚 Human trainers follow complex instructions emphasizing helpful and truthful AI behavior.
    23:46 🏆 Closed language models presently outclass open-source ones on performance leaderboards.
    33:16 🖼️ Large language models are moving toward multimodality, with the ability to use and generate multimedia.
    42:11 🤖 Language models are evolving to act like a computational OS, coordinating tools for solving problems.
    46:18 🛡️ Safety measures in language models can be circumvented with inventive prompt crafting.
    52:08 💉 Covert instructions within media can manipulate language models, leading to undesired actions.
    56:18 ⚠️ Language models face threats like data poisoning, which can embed trigger phrases causing harmful behavior.
    Made with HARPA AI

  • @steveh572 (3 days ago)

    Super useful

  • @rgonzo66 (6 days ago)

    If you give it time to think, it will always come up with the same answer: 42.

  • @him12March (6 days ago)

    Amazing insights - wonderful video and great slide deck

  • @aojing (6 days ago)

    @1:16:48 the `byte_encoder` mapping shifts the ordinals of non-printable ASCII characters up by 256, e.g., replacing SPACE (byte 32) with "Ġ" (U+0120).
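
    A tiny sketch of why SPACE lands on U+0120 in that scheme: every byte below 33 is non-printable and gets remapped, in order, to codepoints starting at 256, so byte 32 ends up at 32 + 256 = 288. This is a simplified sketch of the remapping idea, not the full GPT-2 table:

        # Remap non-printable bytes to codepoints >= 256 so every byte has
        # a visible stand-in character.
        printable = set(range(ord('!'), ord('~') + 1))
        mapping, n = {}, 0
        for b in range(128):
            if b in printable:
                mapping[b] = chr(b)
            else:
                mapping[b] = chr(256 + n)  # shift non-printables up past 255
                n += 1
        print(mapping[32])  # 'Ġ' (U+0120), the stand-in for SPACE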

  • @avishakeadhikary (7 days ago)

    It's amazing how polite Andrej is. Thanks for sharing this amazing content. :)

  • @user-se8wy3mi5g (7 days ago)

    It watches just fine at 0.75x speed :)

  • @RohitSharma-qm8hv (7 days ago)

    Can ChatGPT browse the internet? I am really confused; I used to think it cannot. As of now, ChatGPT can't browse the internet: it generates responses from a mixture of licensed data, data created by human trainers, and publicly available information, and it was last trained on new data in 2023, so its knowledge is current only up to then. ChatGPT also can't plot graphs like he showed. Was his ChatGPT integrated with other tools or APIs that enable real-time data retrieval, internet browsing, and graph plotting? So what was Andrej demonstrating? Could someone help me understand?

  • @waytolegacy (8 days ago)

    Did anyone see the end credits? So funny 😅😅🤣🤣

  • @waytolegacy (8 days ago)

    He literally made me smile at 1:27:41 (ahh swiftie!)

  • @waytolegacy (8 days ago)

    Our variable naming was really good (1:16:20)

  • @franzbischoff (8 days ago)

    Current GPT-4:
        In circuits, whispers,
        Wisdom blooms from silicon,
        Worlds shaped by our hands.

  • @zalzalahbuttsaab (8 days ago)

    Got here through the Micrograd song on Udio. Dang! That thing's addictive!!! 🤣🤣🤣 Excellent learning tool!

  • @Zack-yx5nl (8 days ago)

    I wonder whether certain cases, like repetitive runs of spaces and punctuation, could either be manually trained (or explicitly defined) and then excluded from general tokenization.
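
    Something along those lines already happens before BPE runs: GPT-2 pre-splits text with a regex so merges never cross letter/number/punctuation boundaries, and runs of whitespace stay in their own chunks. A sketch using that published pattern (requires the third-party regex module for the \p classes):

        import regex as re

        # GPT-2's pre-tokenization pattern: contractions, letters, numbers,
        # punctuation, and whitespace become separate chunks before BPE.
        gpt2pat = re.compile(
            r"""'s|'t|'re|'ve|'m|'ll|'d| ?\p{L}+| ?\p{N}+| ?[^\s\p{L}\p{N}]+|\s+(?!\S)|\s+"""
        )
        print(gpt2pat.findall("Hello world!!    how's it going???"))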

  • @shawncortexiphan2367 (8 days ago)

    Great video! Thanks a lot, Andrej!! I would only ask two simple questions: why do you "average" the values to set up "attention" back and forth among words, and what does it mean from a language point of view?
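
    On the first question: the averaging is how each token aggregates information from the tokens before it, as a weighted mean of their value vectors; real attention learns those weights from the data, so linguistically a token pulls in most strongly whatever earlier context it finds relevant. The uniform-average warm-up from the lecture looks roughly like:

        import torch

        torch.manual_seed(42)
        T, C = 4, 2                            # sequence length, channels
        x = torch.randn(T, C)
        wei = torch.tril(torch.ones(T, T))     # each position only sees the past
        wei = wei / wei.sum(1, keepdim=True)   # rows sum to 1: a uniform average
        xbow = wei @ x                         # (T,T) @ (T,C) -> (T,C)
        print(xbow)                            # row t is the mean of x[0..t]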