The math behind Attention: Keys, Queries, and Values matrices

Science & Technology

This is the second in a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly examples.
Video 1: The attention mechanism at a high level • The Attention Mechanis...
Video 2: The attention mechanism with math (this one)
Video 3: Transformer models • What are Transformer M...
If you like this material, check out LLM University from Cohere!
llm.university
00:00 Introduction
01:18 Recap: Embeddings and Context
04:46 Similarity
11:09 Attention
20:46 The Keys and Queries Matrices
25:02 The Values Matrix
28:41 Self and Multi-head attention
33:54 Conclusion

Comments: 277

  • @SerranoAcademy · 7 months ago

    Hello all! In the video I made a comment about how the Key and Query matrices capture low and high level properties of the text. After reading some of your comments, I've realized that this is not true (or at least there's no clear reason for it to be true), and probably something I misunderstood while reading in different places in the literature and threads. Apologies for the error, and thank you to all who pointed it out! I've removed that part of the video.

  • @tantzer6113 · 7 months ago

    No worries. It might help to pin this comment to the top. Thanks a lot for the video.

  • @chrisw4562 · 2 months ago

    Thanks for the note. That comment actually sounds very reasonable to me. If I understand this right, keys and queries help to determine the context.

  • @JTedam · 4 months ago

    I have watched more than 10 videos trying to wrap my head around the paper "Attention Is All You Need". This video is by far the best. I have been trying to assess why it is so effective at explaining such a complex concept, and why the concept is hard to understand in the first place. Serrano explains the concepts step by step, without making any assumptions, which helps a great deal. He also uses diagrams, showing animations along the way as he explains. As for the architecture, there are so many layers condensed into it, and it has obviously evolved over the years, with multiple concepts interlaced into the attention mechanism. So it is important to break it down into its various pieces and take each one at a time: positional encoding, tokenization, embedding, feed-forward layers, normalization, neural networks, the math behind it, vectors, query-key-values, etc. Each of these needs explaining, perhaps in a video of its own, before putting them all together. I am not quite there yet, but this has improved my understanding a great deal. Serrano, keep up your approach. I would like to see you cover other areas such as Transformers with human feedback and the new Qstar architecture. You break it down so well.

  • @SerranoAcademy · 4 months ago

    Thank you for such a thorough analysis! I do enjoy making the videos a lot, so I'm glad you find them useful. And thank you for the suggestions! RLHF and QStar are definitely topics I'm interested in, so hopefully there will soon be videos on those!

  • @blahblahsaurus2458 · 1 month ago

    Did you also try reading the original "Attention Is All You Need" paper, and if so, what was your experience? Was there too much jargon and math to understand?

  • @visahonkanen7291 · 26 days ago

    Agree, an excellent video.

  • @JTedam · 21 days ago

    @blahblahsaurus2458 Too much jargon, obviously intended for those already familiar with the concepts. The diagram appears upside down and is not intuitive at all. Nobody has attempted to redraw the architecture diagram in the paper, and it follows no particular convention at all.

  • @Rish__01 · 8 months ago

    This might be the best video on attention mechanisms on YouTube right now. I really liked the fact that you explained matrix multiplications with linear transformations. It brings a whole new level of understanding with respect to the embedding space. Thanks a lot!!

  • @SerranoAcademy · 8 months ago

    Thank you so much! I enjoy seeing things pictorially, especially matrices, and I'm glad that you do too!

  • @maethu · 4 months ago

    This is really great, thanks a lot!

  • @JosueHuaman-oz4fk · 1 month ago

    That is what many science communicators lack: explaining things with the mathematical foundations. I understand that it is difficult to do. However, you did it, and in an amazing way. The way you explained the linear transformation was epic. Thank you.

  • @fcx1439 · 2 months ago

    This is definitely the best-explained video on the attention model. The original paper sucks because there is no intuition at all, just plain words and crazy math equations with no explanation of what they are doing.

  • @user-tl3ix3xf3j · 7 months ago

    This is unequivocally the best introduction to Transformers and Attention Mechanisms on the entire internet. Luis Serrano has guided me all the way from Machine Learning to Deep Learning and onto Large Language Models, maximizing the entropy of my AI thinking, allowing for limitless possibilities.

  • @JonMasters · 1 month ago

    💯 agree. Everything else is utter BS by comparison. I’ve never tipped someone $10 for a video before this one ❤

  • @computersciencelearningina7382 · 2 months ago

    This is the best description of Keys, Queries, and Values I have ever seen across the internet. Thank you.

  • @__redacted__ · 5 months ago

    I really like how you're using these concrete examples and combining them with visuals. These really help build an intuition on what's actually happening. It's definitely a lot easier for people to consume than struggling with reading academic papers, constantly looking things up, and feeling frustrated and unsure. Please keep creating content like this!

  • @23232323rdurian · 8 months ago

    You explain things very well, Luis. Thank you. It's HARD to explain complicated topics in a way people can easily understand, and you do it very well.

  • @SerranoAcademy · 8 months ago

    Thank you! :)

  • @rohitchan007 · 6 months ago

    Please continue making videos. You're the best teacher on this planet.

  • @channel8048 · 8 months ago

    Just the Keys and Queries section is worth the watch! I have been scratching my head on this for an entire month!

  • @SerranoAcademy · 8 months ago

    Thank you! :)

  • @joelegger2570 · 5 months ago

    These are the best videos I have seen so far for understanding how Transformers / LLMs work. Thank you. I really like math, but it is good that you keep the math simple so that one doesn't lose the overview. You really have a talent for explaining complex things in a simple way. Greetings from Switzerland

  • @WhatsAI · 8 months ago

    The best explanation I've seen so far! Really cool to see the field getting closer to understanding these models, rather than leaving them so abstract, thanks to people like you, Luis! :)

  • @ganapathysubramaniam · 5 months ago

    Absolutely the best set of videos explaining the most discussed topic. Thank you!!

  • @aravind_selvam · 7 months ago

    This video is, without a doubt, the best video on transformers and attention that I have ever seen.

  • @guitarcrax127 · 8 months ago

    Amazing video. It pushed my understanding of attention forward by quite a few steps and helped me build an intuition for what's happening under the hood. Eagerly waiting for the next one.

  • @dekasthiti · 1 month ago

    This really is one of the best videos explaining the purpose of K, Q, V. The illustrations provide a window into the math behind the concepts.

  • @ChujiOlinze · 8 months ago

    Thanks for sharing your knowledge freely. I have been waiting patiently. You add a different perspective that we appreciate. Looking forward to the 3rd video. Thank you!

  • @SerranoAcademy · 8 months ago

    Thank you! So glad you like the videos!

  • @alexrypun · 6 months ago

    Finally! This is the best of the tons of videos/articles I've watched/read. Thank you for your work!

  • @snehotoshbanerjee1938 · 5 months ago

    One of the best videos on attention. Such a complex subject taught in a simple manner. Thank you!

  • @Chill_Magma · 7 months ago

    Honestly, you are the best content creator for learning machine learning and deep learning in a visual and intuitive way.

  • @johnschut164 · 4 months ago

    Your explanations are truly great! You have even understood that you sometimes have to ‘lie’ first to be able to explain things better. My sincere compliments! 👊

  • @MrMacaroonable · 4 months ago

    This is absolutely the best video that clearly illustrates and explains why we need V, K, Q in attention. Bravo!

  • @chiboreache · 8 months ago

    very nice and easy explanation, thanks!

  • @SeyyedMohammadLoghmanDastgheyb · 7 months ago

    This is the best video that I have seen about the concept of attention! (I have seen more than 10 videos but none of them was like this.) Thank you so much! I am waiting for the next videos that you have promised! You are doing a great job!

  • @lijunzhang2788 · 7 months ago

    Great explanation. I was waiting for this after your first video on the attention mechanism! You are so talented at explaining things in easily understandable ways! Thank you for the effort put into this, and keep up the great work!

  • @kranthikumar4397 · 1 month ago

    This is one of the best videos on attention and Q, K, V so far. Thank you for the detailed explanation.

  • @RoyBassTube · 5 hours ago

    Thanks! This is one of the best explanations of Q, K & V I've heard!

  • @lengooi6125 · 3 months ago

    Simply the best explanation of this subject. Crystal clear. Thank you.

  • @TheMotorJokers · 7 months ago

    Thank you, really good job on the visualizations! They make the process really understandable.

  • @shannawallace7855 · 5 months ago

    I had to read this research paper for my Intro to AI class, and it's obviously written for people who already have a lot of background knowledge in this field. So, being a newbie, I was so lost lol. Thanks for breaking it down and making it easy to understand!

  • @user-zq8bd7iz4e · 7 months ago

    The best explanation I've ever seen of the attention mechanism, amazing.

  • @redmond2582 · 4 months ago

    Amazing explanation of very difficult concepts. The best explanation I have found on the topic so far.

  • @leilanifrost771 · 1 month ago

    Math is not my strong suit, but you made these mathematical concepts so clear with all the visual animations and your concise descriptions. Thank you so much for the hard work and making this content freely accessible to us!

  • @MrSikesben · 3 months ago

    This is truly the best video explaining each stage of a transformer, thanks man

  • @etienneboutet7193 · 8 months ago

    Great video as always! Thank you so much for this quality content.

  • @joshuaohara7704 · 7 months ago

    Amazing video! Took my intuition to the next level.

  • @chrisw4562 · 2 months ago

    Thank you for the great tutorial. This is the clearest explanation I have found so far.

  • @bzaruk · 5 months ago

    MAN! I have no words! Your channel is priceless! thank you for everything!!!

  • @brainxyz · 8 months ago

    Amazing explanation. Thanks a lot for your efforts.

  • @antraprakash2562 · 3 months ago

    This is one of the best videos I've come across for understanding embeddings and attention. Looking forward to more such explanations that simplify complex mechanisms in the AI world. Thanks for your efforts.

  • @awinashjha · 7 months ago

    This probably is "the best video" on this topic.

  • @deniz517 · 7 months ago

    The best video I have ever watched about this!

  • @alnouralharin · 1 month ago

    One of the best explanations I have ever watched

  • @0xSingletOnly · 3 months ago

    I'm going to try to implement self-attention and multi-head attention myself. Thanks so much for making this guide!

  • @danielmoore4311 · 5 months ago

    Excellent job! Please continue making videos that break down the math.

  • @MarkusEicher70 · 5 months ago

    Hi Luis. Thank you for this video. I'm sure this is a very good way to explain this complex topic, but I just can't get it into my brain yet. I'm currently doing the Math for Machine Learning specialization on Coursera, brushing up my algebra and calculus skills, which are way too low. In any case, you got me involved in this, and now I will grind through it till I make it. I'm sure the pain will lessen and the fog will lift. 😊

  • @vasanthakumarg4538 · 4 months ago

    This is the best video I have seen explaining the attention mechanism. Keep up the good work!

  • @BABA-oi2cl · 4 months ago

    Thanks a lot for this. I was always terrified of the math that might be involved, but the way you explained it all made it seem really easy ❤

  • @EkShunya · 8 months ago

    Thank you, it was a superb explanation 🤩

  • @kylelau1329 · 4 months ago

    I've been watching over 10 Transformer architecture tutorial videos, and this one is so far the most intuitive way to understand it! Really good work! Natural language processing is a hard topic, and this tutorial kind of opens up the black box of the large language model.

  • @brandonheaton6197 · 8 months ago

    Amazing explanation. I am a professional pedagogue and this is stellar work

  • @Chill_Magma · 7 months ago

    Excellent explanation.

  • @januaymagori4642 · 7 months ago

    Today I understood the attention mechanism better than ever before.

  • @sheiphanshaijan1249 · 8 months ago

    Brilliant Explanation.

  • @SerranoAcademy · 8 months ago

    Thank you! :)

  • @devmum2008 · 2 months ago

    This is a great video, with clarity on Keys, Queries, and Values! Thank you.

  • @saintcodded2918 · 3 months ago

    This is powerful yet so simple. Thanks

  • @PeterGodek2 · 5 months ago

    Best video so far on this topic

  • @_ncduy_ · 1 month ago

    This is the best video for people trying to understand the basics of Transformers, thank you so much ^^

  • @joehannes23 · 4 months ago

    Great video, I finally understood all the concepts in their context.

  • @sreelakshminarayanan.m6609 · 17 days ago

    Best video for getting a clear understanding of Transformers.

  • @panagiotiskyriakis795 · 2 months ago

    Great and intuitive explanations! Well done!

  • @knobbytrails577 · 4 months ago

    Best video on this topic so far!

  • @user-ff7fu3ky1v · 6 months ago

    Great explanation. I just really needed the third video. Hope you will post it soon.

  • @EkShunya · 8 months ago

    The scaling factor in scaled dot-product attention can be understood as roughly the distance between points: in higher dimensions, the expected distance between two random points is roughly sqrt(dimensions).
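
A quick numerical check of that intuition (a minimal sketch assuming NumPy; the setup is illustrative, not from the video): raw dot products of random vectors spread out like sqrt(d_k), which is why attention divides the scores by sqrt(d_k) before the softmax.

```python
import numpy as np

# Dot products of random d_k-dimensional vectors spread out like sqrt(d_k);
# dividing by sqrt(d_k) keeps the scores at a stable scale before softmax.
rng = np.random.default_rng(0)
for d_k in (4, 64, 1024):
    q = rng.standard_normal((10_000, d_k))
    k = rng.standard_normal((10_000, d_k))
    scores = (q * k).sum(axis=1)            # 10,000 raw dot products
    print(d_k, scores.std(), (scores / np.sqrt(d_k)).std())
    # std of raw scores grows ~sqrt(d_k); std of scaled scores stays ~1
```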

  • @deveshnandan323 · 2 months ago

    Sir, you are a blessing to new learners like me. Thank you, big respect. ❤

  • @sadiaafrinpurba9179 · 8 months ago

    Thank you for the explanation.

  • @aldotanca9430 · 5 months ago

    Thanks, very useful. I love the way you explain things here and on Coursera.

  • @user-eg8mt4im1i · 5 months ago

    Amazing video and explanations, thank you!!

  • @pavangupta6112 · 5 months ago

    Very well explained. Got a bit closer to understanding attention models.

  • @celilylmaz4426 · 4 months ago

    This video has the best explanations of the QKV matrices and linear layers among the resources I've come across. I don't know why, but people seem uninterested in explaining what's really happening at each step, which leads to loads of vague points. Still, the video could have been further improved with more concrete examples and numbers. Thank you.

  • @davidking545 · 5 months ago

    Thank you so much! The image at 24:29 made this whole concept click immediately.

  • @tankado_ndakota · 2 days ago

    Amazing video, that's what I was looking for. I needed the mathematical background to understand what is happening underneath. Thank you, sir!

  • @SulkyRain · 3 months ago

    Love the simplification you brought!!! Super.

  • @Wise_Man_on_YouTube · 2 months ago

    "This step is called softmax." 😮😮😮 Today I understood why softmax is used. Such a beautiful function, and such a great way to demonstrate it.

  • @OpenAITutor · 7 months ago

    You are a master!

  • @user-jz8hr5fo9e · 2 days ago

    Great explanation. Thank you so much!

  • @danherman212nyc · 1 month ago

    I study linear algebra during the day on Coursera and watch YouTube videos at night on state-of-the-art machine learning. I'm amazed by how fast you learn with Luis. I've learned everything I was curious about. Thank you!

  • @SerranoAcademy · 1 month ago

    Thank you, it’s an honor to be part of your learning journey! :)

  • @cooperwu38 · 2 months ago

    Super clear! Great video!!

  • @alieskandarian5258 · 3 months ago

    It was fascinating to me. I searched a lot for an explanation of the math and couldn't find one, so thanks for this. Please do more 😅 with more complex ones.

  • @MrMehrd · 8 months ago

    Watched it on fast-forward, seems to be good. Thanks, will watch properly.

  • @ThinAirElon · 7 months ago

    This is great! In the next video, can you please include why we need sine and cosine functions for positional encoding? What's the intuition behind it? What happens when we add this vector to the embedding vector?
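
The video series doesn't cover this, but for reference, here is a minimal sketch (assuming NumPy) of the sinusoidal positional encoding from "Attention Is All You Need": each position gets a unique, smoothly varying pattern of sines and cosines, and adding it to the word embeddings gives the otherwise order-blind attention layers access to word order.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # PE[pos, 2i] = sin(pos / 10000**(2i/d_model)), PE[pos, 2i+1] = cos(...)
    pos = np.arange(max_len)[:, None]         # positions 0..max_len-1
    i = np.arange(0, d_model, 2)[None, :]     # even embedding dimensions
    angles = pos / 10000 ** (i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

embeddings = np.zeros((4, 8))                    # stand-in word embeddings
inputs = embeddings + positional_encoding(4, 8)  # added, not concatenated
```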

  • @DanteNoguez · 4 months ago

    Amazing. Thanks a lot for this!

  • @bonadio60 · 2 months ago

    As always, great content! Thanks

  • @Hiyori___ · 2 months ago

    God-sent video. So incredibly well put.

  • @wiktormigaszewski8684 · 3 months ago

    Yep, a truly terrific video. Congrats!

  • @rollingstone1784 · 10 days ago

    @SerranoAcademy At 13:23, you show a matrix-vector multiplication with a column vector (rows of the table times columns of the vector), i.e. a right-multiplication. On the right side, maybe you could use, in addition to "is sent to", the icon orange' (orange prime); this would show the multiplication more clearly. Remark: you use a matrix-vector multiplication here (a row of the matrix times the word as a column vector on the right of the matrix). If you use row vectors instead, the word vector should be placed horizontally on the left of the matrix, and in the explanation a column of the matrix has to be used; the result is then a row vector again (maybe a bit hard to sketch).
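
A minimal sketch of the two conventions the comment contrasts (assuming NumPy; the matrix and the "orange" embedding are made-up toy values):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 3.0]])   # toy linear transformation of the embedding plane
x = np.array([1.0, 2.0])     # toy 2-D embedding, e.g. "orange"

print(A @ x)    # column-vector convention (right-multiplication, Ax): [2. 7.]
print(x @ A)    # row-vector convention (left-multiplication, xA):     [4. 6.]
print(x @ A.T)  # using the transposed matrix reproduces Ax:           [2. 7.]
```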

  • @BrikeshKumar987 · 4 months ago

    Thank you so much!! I watched several videos and none could explain the concept so well.

  • @SerranoAcademy · 4 months ago

    Thanks, I'm so glad you enjoyed it! Lemme know if you have suggestions for more topics to cover!

  • @healthyhappy7487 · 5 days ago

    Best video. Great explanation.

  • @naveensubramanian4876 · 7 months ago

    Are these slides available somewhere for reference? It would be a great help. Thanks.

  • @rollingstone1784 · 10 days ago

    @SerranoAcademy If you want to arrive at the same notation as in the mentioned paper, Q times K_transpose, then the orange is the query and the phone is the key here. You then calculate query times Q times K_transpose times key_transpose (as mentioned in the paper). Remark: the paper uses "sequences", described as row vectors. However, one usually uses column vectors. With row vectors, the linear transformation is a left-multiplication, aA, and the dot product is written as a b_transpose. With column vectors, the linear transformation is Aa and the dot product is written as a_transpose b. This, in my opinion, is the standard notation, e.g. one writes Ax = b and not xA = b.
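
In symbols, a sketch of the two conventions (here x_q and x_k stand for the query and key word embeddings, and W_Q and W_K for the learned query and key matrices; the names are illustrative):

```latex
% Row-vector convention (as in the paper): embeddings are rows,
% transformations multiply on the right.
\[
q = x_q W_Q, \qquad k = x_k W_K, \qquad
\mathrm{score} = q k^{\top} = x_q W_Q W_K^{\top} x_k^{\top}
\]
% Column-vector convention: embeddings are columns,
% transformations multiply on the left.
\[
q = W_Q x_q, \qquad k = W_K x_k, \qquad
\mathrm{score} = q^{\top} k = x_q^{\top} W_Q^{\top} W_K x_k
\]
```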

  • @cool12345687 · 4 months ago

    This is awesome. Thanks a ton for this video. May God bless you.

  • @manojkalyan94 · 1 month ago

    Loved it, want to go through it again and again ❤

  • @user-hf3fu2xt2j · 2 months ago

    Best explanation I've seen.

  • @Ludwighaffen1 · 5 months ago

    Great video series! Thank you! That helped a ton 🙂 One small remark: the concept of the "length" of a vector that you use here confused me. Here, I guess you take the point of view of a programmer: len(vector) outputs the number of dimensions of the vector. However, for a mathematician, the length of a vector is its norm, also called its magnitude (the square root of x^2 + y^2).
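
A minimal sketch (assuming NumPy) of the two meanings of "length" being contrasted here:

```python
import numpy as np

v = np.array([3.0, 4.0])
print(len(v))             # 2   -> number of dimensions (the programmer's "length")
print(np.linalg.norm(v))  # 5.0 -> magnitude, sqrt(3**2 + 4**2) (the mathematician's)
```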

  • @mattmurdock3868 · 2 months ago

    Best video on this topic 🙌🏻

  • @gemini_537 · 2 months ago

    Summary by Gemini: This video is about the math behind attention mechanisms in large language models. The speaker first gives a brief overview of what attention mechanisms are and how they are used in large language models. Then, he dives into the details of the math behind attention mechanisms, including the concepts of keys, queries, and values matrices. Here are the key points from the video:
    * Attention mechanisms are a way for large language models to focus on the most relevant parts of an input sentence when generating text.
    * Keys, queries, and values matrices are all used to calculate the attention weights, which determine how much weight to give to each word in the input sentence.
    * The keys and queries matrices are used to find the similarity between words in the input sentence.
    * The values matrix is used to combine the information from the relevant words to generate the output text.
    The speaker also mentions that he will be going into more detail about how attention mechanisms are used in Transformer models in the next video in this series.
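
Tying the summary's points together, a minimal self-attention sketch (assuming NumPy; the random weights are stand-ins for learned parameters): queries and keys score similarity, softmax turns the scores into weights, and the weights average the value vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 3, 4, 4                  # e.g. a 3-word sentence

X = rng.standard_normal((seq_len, d_model))      # one embedding per word (rows)
W_Q = rng.standard_normal((d_model, d_k))        # stand-in learned query matrix
W_K = rng.standard_normal((d_model, d_k))        # stand-in learned key matrix
W_V = rng.standard_normal((d_model, d_k))        # stand-in learned value matrix

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
scores = Q @ K.T / np.sqrt(d_k)                  # scaled word-to-word similarity
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
output = weights @ V                             # weighted average of the values

print(weights.round(2))  # each row sums to 1: how much each word attends to the others
print(output.shape)      # (3, 4): one context-aware vector per word
```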

  • @ayoubelmhamdi7920 · 7 months ago

    Such a great video.

  • @o.k.4599 · 2 months ago

    I didn't blink for a second. 👏🏼🙏🏼

  • @MSGMSUSA · 4 months ago

    Wow!!! Now I understand the attention mechanism. I didn't understand a bit of it when learning about this in an expensive AI course.

  • @tariqkhan1518 · 10 days ago

    Thank you so much for the video.
