FAQ
Q: How can I pay you? Do you have a Patreon or etc?
A: As a YouTube partner I do share in a small amount of the ad revenue on the videos, but I don't maintain any other payment channels. I would prefer that people "pay me back" by using the knowledge to build something great.
Comments
Thank you so much for democratizing education and this technology for all of us! AMAZING! Much, much love!
❤ > It's a pleasure
fine tuning next please
Is it possible to just calculate the gradients once and then you know them? Not resetting and recalculating. What am I missing?
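A toy sketch (plain Python, made-up loss function) of why the gradients can't be computed just once: the gradient depends on the *current* parameter values, so after every update the gradient itself changes and must be recalculated.

```python
# Toy loss, minimized at w = 3 (an illustrative example, not from the lecture)
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)   # dL/dw, derived by hand for this toy loss

w = 0.0
g_first = grad(w)            # gradient at the starting point: -6.0
w -= 0.1 * g_first           # one gradient-descent step moves w to 0.6
g_second = grad(w)           # gradient at the new point: -4.8

print(g_first, g_second)     # the two gradients are not the same
```

Because the gradient is a function of the weights, and the weights move every step, resetting and recomputing it each iteration is unavoidable.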
Thanks!! I have never understood better how a neural network works!
One of the best under-the-hood looks at LLMs. Love the clarity and patience Andrej imparts, considering he's such a legend in AI.
Was I the only one invested in the leaf story 😭
Hi, not sure what happened, but I can't join the Discord; it says "unable to accept invite". I also have an issue reproducing the "m" from the multinomial distribution: I get 3 instead of 13 if I use num_samples=1; with num_samples>1 I get 13, but I get "mi." if num_samples=2. People online suggest using torch 1.13.1, but I can't get that old version on my macOS.
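A plain-Python analogy (using `random`, not torch, and a made-up 3-way distribution) for what a seeded generator guarantees: the same seed consumed the same way gives the same sample, but a different sampling algorithm (e.g. a different torch version) can consume the seeded stream differently and produce a different, still valid, sample.

```python
import random

# Toy categorical distribution (illustrative values, not from the lecture)
probs = [0.1, 0.6, 0.3]

rng = random.Random(2147483647)
one = rng.choices(range(3), weights=probs, k=1)  # draw a single sample

rng = random.Random(2147483647)                  # reset to the same seed
two = rng.choices(range(3), weights=probs, k=2)  # draw two samples at once

# In CPython, choices() consumes one uniform per sample in order,
# so the first sample is identical either way:
print(one[0] == two[0])   # True
```

torch.multinomial makes the analogous promise only within one torch version; across versions the algorithm (and hence the characters you sample) can legitimately change, which is why pinning torch 1.13.1 reproduces the lecture's exact output.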
Wow!!! This is extremely helpful and well-made. Thank you so much!
This guy has become my favorite tutor.
I love this learning resource and the simplicity of the tutorial style. Thanks, Andrej Karpathy.
In the case of the simple bigram model @32:38 we sample only one character, yet while calculating the loss we consider the character with the highest probability. The sampled character is unlikely to be the same as the highest-probability character in the row unless we draw many samples from the multinomial distribution. So my question is: does the loss function reflect the correct loss? Can anyone help me understand this?
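A small sketch (with a made-up probability row) addressing the question above: the bigram loss is the average negative log-likelihood of the *actual* next character from the training data; neither sampling nor the argmax character enters the loss at all.

```python
import math

# Made-up next-character probabilities for some context (illustrative only)
probs = {'a': 0.1, 'b': 0.7, 'c': 0.2}

actual_next = 'c'                     # what the training data says comes next
loss = -math.log(probs[actual_next])  # NLL of the observed character

# Even though 'b' has the highest probability, the loss only measures
# how much probability mass the model assigned to the observed 'c'.
print(round(loss, 4))
```

Sampling is used only at generation time; during training the loss asks "how surprised was the model by what actually came next?", so it is the correct loss regardless of what you would sample.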
Thanks I really need this level of details to understand what’s going on. ❤
1:25:23 Why is the last layer made "less confident like we saw" and where did we see this?
This is the best video ever about Intro to LLM.
How were you able to run the loop without adding a requires_grad flag in the "implementing the training loop, overfitting one batch" section of the video? For me it only worked when I changed the lines to:

g = torch.Generator().manual_seed(2147483647) # for reproducibility
C = torch.randn((27, 2), generator=g, requires_grad=True)
W1 = torch.randn((6, 100), generator=g, requires_grad=True)
b1 = torch.randn(100, generator=g, requires_grad=True)
W2 = torch.randn((100, 27), generator=g, requires_grad=True)
b2 = torch.randn(27, generator=g, requires_grad=True)
parameters = [C, W1, b1, W2, b2]
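A sketch of the equivalent pattern used in the lecture: the tensors are created without the flag, and requires_grad is then switched on for all of them in one loop, which is why no per-tensor requires_grad=True appears at creation time.

```python
import torch

g = torch.Generator().manual_seed(2147483647)
C  = torch.randn((27, 2),   generator=g)
W1 = torch.randn((6, 100),  generator=g)
b1 = torch.randn(100,       generator=g)
W2 = torch.randn((100, 27), generator=g)
b2 = torch.randn(27,        generator=g)
parameters = [C, W1, b1, W2, b2]

for p in parameters:
    p.requires_grad = True   # equivalent to passing requires_grad=True above

print(all(p.requires_grad for p in parameters))  # True
```

Both styles work; the loop just keeps the tensor-creation lines uncluttered and makes it harder to forget the flag on one parameter.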
This is the most amazing video on neural network mathematics knowledge I've ever seen; thank you very much, Andrej!
Something you didn't explain (51:30): if we want L to go up, we simply need to nudge the variables in the direction of the gradient? How can that be so if some gradients are negative?
First question: yes. Second question: because that's the definition of the gradient. If a gradient is negative, making that variable smaller will increase the loss, and a step "in the direction of the gradient" for that variable is exactly a decrease.
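A numeric sketch (toy loss, made-up starting point) of the exchange above: stepping in the direction of the gradient increases the loss even when the gradient is negative, because a negative gradient makes the step itself a decrease of the variable.

```python
def L(x):
    return x * x           # toy loss (illustrative, not from the lecture)

x = -3.0
grad = 2 * x               # dL/dx = -6.0 (negative)
step = 0.01
x_up = x + step * grad     # move in the gradient's direction: x gets smaller

print(L(x), L(x_up))       # the loss went up, as claimed
```

The sign of the gradient is baked into the step, so "follow the gradient" is an ascent direction for every variable regardless of sign; gradient *descent* subtracts it for the same reason.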
This is the single best explanation of backpropagation in code that I've seen so far. Thanks Andrej.
Thanks a lot for the insight and demonstration. I really look forward to more videos from you, Andrej!
Just a random fun fact: with gen = torch.Generator().manual_seed(2147483647), the bigram-generated name I got was `c e x z e .`, amazing.
Thanks a ton for this Andrej! Explained and presented in such simple and relatable terms. Gives confidence to get into the weeds now.
The best ML tutorial video I have watched this year. I really like the detailed examples, and how these difficult concepts are explained in a simple manner. What a treat to watch and learn!
What methods of alignment do you personally feel are extremely promising for ensuring future AGI doesn't kill us all?
One of the best explanations of LLMs...
Thank you very much
Is this the real, one and only Andrej Karpathy? Or is this a deep fake?
Hello sir, it would be great if you could make a video on how exactly these tokens are converted into embedding vectors, e.g. how word2vec works. Thank you!
I am following along exactly (I believe) to calculate all the gradients, but beginning from dhpreact the results show "exact: False, approximate: True", with maxdiff on the order of 1e-9 to 1e-10. Is it just a numerical issue, or did I do something wrong? Has anyone had the same issue?
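A minimal sketch of why this is expected: two mathematically identical computations can differ by tiny amounts in floating point purely from the order of operations, so a bit-for-bit "exact" check fails while a tolerance-based "approximate" check passes. (The `0.1 + 0.2` values are a classic illustration, not related to the lecture's tensors.)

```python
import math

a = 0.1 + 0.2   # one order of floating-point operations
b = 0.3         # the mathematically equal value

print(a == b)                             # False: bit-for-bit comparison fails
print(math.isclose(a, b, abs_tol=1e-8))   # True: within tolerance
```

A maxdiff around 1e-9 with "approximate: True" means your manual gradients match; the discrepancy is just autograd summing the same terms in a different order.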
Thank you for the discussion
Thank you so much, Andrej. As many have already commented, this was by far one of the best lectures I have ever listened to.
Learning the math is really interesting; it helps to build a mental image of a plane or block of vectors shifting during training. I'd be curious to know whether, in the history of ML and AI, hardware drove the matrix-math-heavy software approaches to AI, or whether improving hardware made this math practical.
wow
About makemore: Did you show it to Elon? 😀
Hehe, the bloopers at the end are priceless! I almost switched off the video when you said goodbye 😀
Finished!
🎯 Key Takeaways for quick navigation:
00:00 📝 Andrej Karpathy re-presents a popular talk on large language models for YouTube.
00:29 💾 LLaMA 2 70B by Meta AI is an open-weights large language model, unlike ChatGPT.
04:09 💰 LLaMA 2 70B's training involves high costs, utilizing 6,000 GPUs over 12 days.
06:56 🧠 Large language models predict the next word, gaining context from extensive training.
09:03 📚 Pre-training compresses internet data into a model's parameters, similar to a lossy zip file.
14:29 🤖 Fine-tuning shapes base models into assistants via quality Q&A pairs.
22:21 📚 Human trainers follow complex instructions emphasizing helpful and truthful AI behavior.
23:46 🏆 Closed language models presently outclass open-source ones on performance leaderboards.
33:16 🖼️ Large language models are moving toward multimodality, with the ability to use and generate multimedia.
42:11 🤖 Language models are evolving to act like a computational OS, coordinating tools to solve problems.
46:18 🛡️ Safety measures in language models can be circumvented with inventive prompt crafting.
52:08 💉 Covert instructions within media can manipulate language models, leading to undesired actions.
56:18 ⚠️ Language models face threats like data poisoning, which can embed trigger phrases causing harmful behavior.
Made with HARPA AI
Super useful
If you give it time to think, it will always come up with the same answer: 42.
Amazing insights - wonderful video and great slide deck
@1:16:48 the `byte_encoder` mapping shifts the code points of non-printable ASCII bytes up by 256, e.g. replacing SPACE with "Ġ" (U+0120).
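A condensed re-derivation of GPT-2's bytes-to-unicode mapping that the comment above describes: printable bytes map to themselves, and every other byte b gets chr(256 + n) for an increasing counter n; for the low bytes 0..32 this works out to chr(256 + b), so space (byte 32) becomes "Ġ" (U+0120).

```python
# Bytes GPT-2 treats as printable (three ASCII/Latin-1 ranges)
printable = (list(range(ord('!'), ord('~') + 1))
             + list(range(ord('¡'), ord('¬') + 1))
             + list(range(ord('®'), ord('ÿ') + 1)))

mapping, n = {}, 0
for b in range(256):
    if b in printable:
        mapping[b] = chr(b)        # printable bytes map to themselves
    else:
        mapping[b] = chr(256 + n)  # others shifted into U+0100.. territory
        n += 1

print(mapping[32])   # 'Ġ' — the visible stand-in for SPACE
```

The shift exists so every byte has a visible, unambiguous character, letting the BPE merges operate on ordinary strings.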
It's amazing how Andrej is such a polite guy. Thanks for sharing this amazing content. :)
It watches just fine at 0.75x speed :)
Can ChatGPT browse the internet? I am really confused; I used to think that it cannot. As of now, ChatGPT cannot browse the internet: it generates responses from a mixture of licensed data, data created by human trainers, and publicly available information, and its knowledge is current only up to its 2023 training cutoff. Also, ChatGPT can't plot graphs like he showed. Was his ChatGPT integrated with other tools or APIs that facilitate real-time data retrieval, internet browsing, and graph plotting? So, what was Andrej demonstrating? Could someone help me understand?
Anyone saw the *end credits*? So funny 😅😅🤣🤣
He literally made me smile at 1:27:41 (ahh swiftie!)
Our variable naming was really good (1:16:20)
Current GPT-4:
In circuits, whispers,
Wisdom blooms from silicon,
Worlds shaped by our hands.
Yeah, got here through the Micrograd song on Udio. Dang, that thing's addictive!!! 🤣🤣🤣 Excellent learning tool!
I wonder if certain cases, like repetitive runs of spaces and punctuation, could either be manually trained (or absolutely defined) and then excluded from general tokenization.
Great video! Thanks a lot, Andrej!! I have just two simple questions: why do you "average" the values when passing "attention" back and forth among words, and what does that mean from a language point of view?