FAQ
Q: How can I pay you? Do you have a Patreon or etc?
A: As a YouTube partner I do share in a small amount of the ad revenue on the videos, but I don't maintain any other payment channels. I would prefer that people "pay me back" by using the knowledge to build something great.
Comments
Thank you so much for democratizing education and this technology for all of us! AMAZING! Much, much love!
❤ > It's a pleasure
fine tuning next please
Is it possible to just calculate the gradients once and then you know them? Not resetting and recalculating. What am I missing?
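A toy sketch (plain Python, made-up loss function) of why the gradients can't be computed just once: the gradient depends on the *current* parameter values, so after every update the gradient itself changes and must be recalculated.

```python
# Toy loss, minimized at w = 3 (an illustrative example, not from the lecture)
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)   # dL/dw, derived by hand for this toy loss

w = 0.0
g_first = grad(w)            # gradient at the starting point: -6.0
w -= 0.1 * g_first           # one gradient-descent step moves w to 0.6
g_second = grad(w)           # gradient at the new point: -4.8

print(g_first, g_second)     # the two gradients are not the same
```

Because the gradient is a function of the weights, and the weights move every step, resetting and recomputing it each iteration is unavoidable.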
Thanks!! I have never understood better how a neural network works!
One of the best under-the-hood looks at LLMs. Love the clarity and patience Andrej imparts, considering he's such a legend in AI.
Was I the only one invested in the leaf story 😭
Hi, not sure what happened, but I can't join the Discord; it says "unable to accept invite". I also have an issue reproducing the "m" from the multinomial distribution: I get 3 instead of 13 if I use num_samples=1; with num_samples>1 I get 13, but I get "mi." if num_samples=2. People online suggest using torch 1.13.1, but I can't get that old version on my macOS.
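A plain-Python analogy (using `random`, not torch, and a made-up 3-way distribution) for what a seeded generator guarantees: the same seed consumed the same way gives the same sample, but a different sampling algorithm (e.g. a different torch version) can consume the seeded stream differently and produce a different, still valid, sample.

```python
import random

# Toy categorical distribution (illustrative values, not from the lecture)
probs = [0.1, 0.6, 0.3]

rng = random.Random(2147483647)
one = rng.choices(range(3), weights=probs, k=1)  # draw a single sample

rng = random.Random(2147483647)                  # reset to the same seed
two = rng.choices(range(3), weights=probs, k=2)  # draw two samples at once

# In CPython, choices() consumes one uniform per sample in order,
# so the first sample is identical either way:
print(one[0] == two[0])   # True
```

torch.multinomial makes the analogous promise only within one torch version; across versions the algorithm (and hence the characters you sample) can legitimately change, which is why pinning torch 1.13.1 reproduces the lecture's exact output.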
Wow!!! This is extremely helpful and well-made. Thank you so much!
This guy has become my favorite tutor.
I love this learning resource and the simplicity of the tutorial style. Thanks, Andrej Karpathy.
In the case of the simple bigram model @32:38 we sample only one character, yet while calculating the loss we consider the character with the highest probability. The sampled character is unlikely to be the same as the highest-probability character in the row unless we draw many samples from the multinomial distribution. So my question is: does the loss function reflect the correct loss? Can anyone help me understand this?
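A small sketch (with a made-up probability row) addressing the question above: the bigram loss is the average negative log-likelihood of the *actual* next character from the training data; neither sampling nor the argmax character enters the loss at all.

```python
import math

# Made-up next-character probabilities for some context (illustrative only)
probs = {'a': 0.1, 'b': 0.7, 'c': 0.2}

actual_next = 'c'                     # what the training data says comes next
loss = -math.log(probs[actual_next])  # NLL of the observed character

# Even though 'b' has the highest probability, the loss only measures
# how much probability mass the model assigned to the observed 'c'.
print(round(loss, 4))
```

Sampling is used only at generation time; during training the loss asks "how surprised was the model by what actually came next?", so it is the correct loss regardless of what you would sample.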
Thanks I really need this level of details to understand what’s going on. ❤
1:25:23 Why is the last layer made "less confident like we saw" and where did we see this?
This is the best video ever about Intro to LLM.
How were you able to run the loop without adding a requires_grad flag in the "implementing the training loop, overfitting one batch" section of the video? For me it only worked when I changed the lines to:

g = torch.Generator().manual_seed(2147483647) # for reproducibility
C = torch.randn((27, 2), generator=g, requires_grad=True)
W1 = torch.randn((6, 100), generator=g, requires_grad=True)
b1 = torch.randn(100, generator=g, requires_grad=True)
W2 = torch.randn((100, 27), generator=g, requires_grad=True)
b2 = torch.randn(27, generator=g, requires_grad=True)
parameters = [C, W1, b1, W2, b2]
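A sketch of the equivalent pattern used in the lecture: the tensors are created without the flag, and requires_grad is then switched on for all of them in one loop, which is why no per-tensor requires_grad=True appears at creation time.

```python
import torch

g = torch.Generator().manual_seed(2147483647)
C  = torch.randn((27, 2),   generator=g)
W1 = torch.randn((6, 100),  generator=g)
b1 = torch.randn(100,       generator=g)
W2 = torch.randn((100, 27), generator=g)
b2 = torch.randn(27,        generator=g)
parameters = [C, W1, b1, W2, b2]

for p in parameters:
    p.requires_grad = True   # equivalent to passing requires_grad=True above

print(all(p.requires_grad for p in parameters))  # True
```

Both styles work; the loop just keeps the tensor-creation lines uncluttered and makes it harder to forget the flag on one parameter.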
This is the most amazing video on neural network mathematics knowledge I've ever seen; thank you very much, Andrej!
Something you didn't explain (51:30): if we want L to go up, we simply need to nudge the variables in the direction of the gradient? How can that be so if some gradients are negative?
First question: yes. Second question: because that's the definition of the gradient. If a gradient is negative, making that variable smaller will increase the loss, and a step "in the direction of the gradient" for that variable is exactly a decrease.
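A numeric sketch (toy loss, made-up starting point) of the exchange above: stepping in the direction of the gradient increases the loss even when the gradient is negative, because a negative gradient makes the step itself a decrease of the variable.

```python
def L(x):
    return x * x           # toy loss (illustrative, not from the lecture)

x = -3.0
grad = 2 * x               # dL/dx = -6.0 (negative)
step = 0.01
x_up = x + step * grad     # move in the gradient's direction: x gets smaller

print(L(x), L(x_up))       # the loss went up, as claimed
```

The sign of the gradient is baked into the step, so "follow the gradient" is an ascent direction for every variable regardless of sign; gradient *descent* subtracts it for the same reason.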
This is the single best explanation of backpropagation in code that I've seen so far. Thanks Andrej.
Thanks a lot for the insight and demonstration. I really look forward to more videos from you, Andrej!
Just a random fun fact: with gen = torch.Generator().manual_seed(2147483647), the bigram-generated name I got was `c e x z e .`, amazing.
Thanks a ton for this Andrej! Explained and presented in such simple and relatable terms. Gives confidence to get into the weeds now.
The best ML tutorial video I have watched this year. I really like the detailed examples, and how these difficult concepts are explained in a simple manner. What a treat to watch and learn!
What methods of alignment do you personally feel are extremely promising for ensuring future AGI doesn't kill us all?
One of the best explanations of LLMs...
Thank you very much
Is this the real, one and only Andrej Karpathy? Or is this a deep fake?
Hello sir, it would be great if you could make a video on how exactly these tokens are converted into embedding vectors, e.g. how word2vec works. Thank you!
I am following along exactly (I believe) to calculate all the gradients, but beginning from dhpreact the results show "exact: False, approximate: True", with maxdiff on the order of 1e-9 to 1e-10. Is it just a numerical issue, or did I do something wrong? Has anyone had the same issue?
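A minimal sketch of why this is expected: two mathematically identical computations can differ by tiny amounts in floating point purely from the order of operations, so a bit-for-bit "exact" check fails while a tolerance-based "approximate" check passes. (The `0.1 + 0.2` values are a classic illustration, not related to the lecture's tensors.)

```python
import math

a = 0.1 + 0.2   # one order of floating-point operations
b = 0.3         # the mathematically equal value

print(a == b)                             # False: bit-for-bit comparison fails
print(math.isclose(a, b, abs_tol=1e-8))   # True: within tolerance
```

A maxdiff around 1e-9 with "approximate: True" means your manual gradients match; the discrepancy is just autograd summing the same terms in a different order.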
Thank you for the discussion
Thank you so much, Andrej. As many have already commented, this was by far one of the best lectures I have ever listened to.
Learning the math is really interesting; it helps to build a mental image of a plane or block of vectors shifting during training. I'd be curious to know whether, in the history of ML and AI, hardware drove the matrix-math-heavy software approaches to AI, or whether improving hardware made this math practical.
wow
About makemore: Did you show it to Elon? 😀
Hehe, the bloopers at the end are priceless! I almost switched off the video when you said goodbye 😀
Finished!
🎯 Key Takeaways for quick navigation:
00:00 📝 Andrej Karpathy re-presents a popular talk on large language models for YouTube.
00:29 💾 LLaMA 2 70B by Meta AI is an open-weights large language model, unlike ChatGPT.
04:09 💰 LLaMA 2 70B's training involves high costs, utilizing 6,000 GPUs over 12 days.
06:56 🧠 Large language models predict the next word, gaining context from extensive training.
09:03 📚 Pre-training compresses internet data into a model's parameters, similar to a lossy zip file.
14:29 🤖 Fine-tuning shapes base models into assistants via quality Q&A pairs.
22:21 📚 Human trainers follow complex instructions emphasizing helpful and truthful AI behavior.
23:46 🏆 Closed language models presently outclass open-source ones on performance leaderboards.
33:16 🖼️ Large language models are moving toward multimodality, with the ability to use and generate multimedia.
42:11 🤖 Language models are evolving to act like a computational OS, coordinating tools to solve problems.
46:18 🛡️ Safety measures in language models can be circumvented with inventive prompt crafting.
52:08 💉 Covert instructions within media can manipulate language models, leading to undesired actions.
56:18 ⚠️ Language models face threats like data poisoning, which can embed trigger phrases causing harmful behavior.
Made with HARPA AI
Super useful
If you give it time to think, it will always come up with the same answer: 42.
Amazing insights - wonderful video and great slide deck
@1:16:48 the `byte_encoder` mapping shifts the code points of non-printable ASCII bytes up by 256, e.g. replacing SPACE with "Ġ" (U+0120).
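A condensed re-derivation of GPT-2's bytes-to-unicode mapping that the comment above describes: printable bytes map to themselves, and every other byte b gets chr(256 + n) for an increasing counter n; for the low bytes 0..32 this works out to chr(256 + b), so space (byte 32) becomes "Ġ" (U+0120).

```python
# Bytes GPT-2 treats as printable (three ASCII/Latin-1 ranges)
printable = (list(range(ord('!'), ord('~') + 1))
             + list(range(ord('¡'), ord('¬') + 1))
             + list(range(ord('®'), ord('ÿ') + 1)))

mapping, n = {}, 0
for b in range(256):
    if b in printable:
        mapping[b] = chr(b)        # printable bytes map to themselves
    else:
        mapping[b] = chr(256 + n)  # others shifted into U+0100.. territory
        n += 1

print(mapping[32])   # 'Ġ' — the visible stand-in for SPACE
```

The shift exists so every byte has a visible, unambiguous character, letting the BPE merges operate on ordinary strings.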
It's amazing how Andrej is such a polite guy. Thanks for sharing this amazing content. :)
It watches just fine at 0.75x speed :)
Can ChatGPT browse the internet? I am really confused; I used to think that it cannot. As of now, ChatGPT cannot browse the internet: it generates responses from a mixture of licensed data, data created by human trainers, and publicly available information, and its knowledge is current only up to its 2023 training cutoff. Also, ChatGPT can't plot graphs like he showed. Was his ChatGPT integrated with other tools or APIs that facilitate real-time data retrieval, internet browsing, and graph plotting? So, what was Andrej demonstrating? Could someone help me understand?
Anyone saw the *end credits*? So funny 😅😅🤣🤣
He literally made me smile at 1:27:41 (ahh swiftie!)
Our variable naming was really good (1:16:20)
Current GPT-4:
In circuits, whispers,
Wisdom blooms from silicon,
Worlds shaped by our hands.
Yeah, got here through the Micrograd song on Udio. Dang, that thing's addictive!!! 🤣🤣🤣 Excellent learning tool!
I wonder if certain cases, like repetitive runs of spaces and punctuation, could either be manually trained (or absolutely defined) and then excluded from general tokenization.
Great video! Thanks a lot, Andrej!! I have just two simple questions: why do you "average" the values when passing "attention" back and forth among words, and what does that mean from a language point of view?