L19.4.1 Using Attention Without the RNN -- A Basic Form of Self-Attention
Science and technology
Slides: sebastianraschka.com/pdf/lect...
-------
This video is part of my Introduction to Deep Learning course.
Next video: • L19.4.2 Self-Attention...
The complete playlist: • Intro to Deep Learning...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka
Comments: 26
Phenomenal explanation. Thank you for your devotion to open and free education!
Best intro to self-attention I have seen so far! Thank you a lot!
@SebastianRaschka
2 years ago
wow, thanks! glad to hear it was clear!
@vatsalpatel6330
A year ago
agreed
your videos are super underrated. You deserve a lot more views!!
@SebastianRaschka
A year ago
Wow thanks for the compliment. Maybe that's because I don't do any SEO, haha
I think this is the best and most basic explanation of self-attention so far.
Thank you very much for these videos. They make complicated things seem much simpler and much more fun. And you do a great job explaining the intuition behind these sometimes quite confusing topics. So thanks again it's a massive help!
@SebastianRaschka
2 years ago
Thanks so much for saying this, I am glad to hear!
Love your videos. You are really good at breaking things down into clear steps. Most other videos on YouTube either don't make any sense or don't explain things at a deep enough level.
Amazing and simple explanation!
@SebastianRaschka
A year ago
Thanks!!
only 902 views? this is a great resource!
@SebastianRaschka
2 years ago
Hah, I take this as a compliment! Thanks!
how do we create the word embeddings? Also, what is x_i at 12:38?
The dot(x_i, x_j) doesn't make sense to me. It seems I am comparing the similarity between words instead of comparing a key and a query. Could you explain this better, please?
@SebastianRaschka
2 years ago
This is a good point. Essentially, it is the same thing as computing the similarity between the query and a key, just in its simple form without parameters. Instead of dot(x_i, x_j), the key-query computation would be dot(q_i, k_j), where the query is computed as q_i = Q x_i and the key as k_j = K x_j. So, if you don't use the weight matrices Q and K, this reduces to the plain similarity between the word embeddings.
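The contrast in this reply can be sketched in a few lines of NumPy. This is only an illustrative toy, not code from the lecture: the sentence length (4), embedding size (3), and random Q/K matrices are made-up assumptions.

```python
import numpy as np

# Hypothetical toy input: 4 "words", each embedded in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # rows are the word embeddings x_1 ... x_4

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Parameter-free self-attention: the score is just dot(x_i, x_j).
scores = X @ X.T            # (4, 4) matrix of pairwise dot products
weights = softmax(scores)   # attention weights; each row sums to 1
context = weights @ X       # weighted sum of embeddings per word

# With learned projections, the score becomes dot(q_i, k_j),
# where q_i = Q x_i and k_j = K x_j (Q and K are random stand-ins here).
Q = rng.normal(size=(3, 3))
K = rng.normal(size=(3, 3))
scores_qk = (X @ Q.T) @ (X @ K.T).T  # (4, 4), now parameterized
weights_qk = softmax(scores_qk)
```

Setting Q and K to the identity matrix would make `scores_qk` equal to `scores`, which is the sense in which the parameter-free version is a special case.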
What is the intuition behind taking the dot products between the word embeddings? As I understand it, not every embedding method can be used to compute attention via dot products -- only those that represent term importance from the start (so TF-IDF and BM25 couldn't be used then).
great lesson. I am just starting to learn DL and I may be asking something silly, but this self-attention looks to me the same as what we have in graph NNs
@SebastianRaschka
A year ago
yeah I think it's somewhat related. Btw there are actually graph attention networks as well 😅 arxiv.org/abs/1710.10903
@davidlearnforus
A year ago
@@SebastianRaschka many thanks for the answer and the paper
Isn't the dot product here called the attention score, and the whole sum the attention vector?
Why geometric deep learning is not included in the playlist?
@SebastianRaschka
2 years ago
The semester is only so long ... But my new book (coming out next month) will have a chapter on graph neural nets!
you underline / highlight too much, almost every word / sentence: your slides are overloaded and unreadable
@SebastianRaschka
2 years ago
Oh oh. I think I got better recently when I switched from an iPad to a pen tablet -- the iPad makes the annotation just too easy lol