L19.4.1 Using Attention Without the RNN -- A Basic Form of Self-Attention
Science and technology
Slides: sebastianraschka.com/pdf/lect...
-------
This video is part of my Introduction to Deep Learning course.
Next video: • L19.4.2 Self-Attention...
The complete playlist: • Intro to Deep Learning...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka
Comments: 26
Phenomenal explanation. Thank you for your devotion to open and free education!
Best intro to self-attention I have seen so far! Thank you a lot!
@SebastianRaschka
2 years ago
wow, thanks! glad to hear it was clear!
@vatsalpatel6330
A year ago
agreed
your videos are super underrated. You deserve a lot more views!!
@SebastianRaschka
A year ago
Wow thanks for the compliment. Maybe that's because I don't do any SEO, haha
I think this is the best and most basic explanation of self-attention so far.
Thank you very much for these videos. They make complicated things seem much simpler and much more fun. And you do a great job explaining the intuition behind these sometimes quite confusing topics. So thanks again it's a massive help!
@SebastianRaschka
2 years ago
Thanks so much for saying this, I am glad to hear!
Love your videos. You are really good at breaking things down into clear steps. Most other videos on YouTube either don't make any sense or don't explain things at a deep enough level.
Amazing and simple explanation!
@SebastianRaschka
A year ago
Thanks!!
only 902 views? this is a great resource!
@SebastianRaschka
2 years ago
Hah, I take this as a compliment! Thanks!
how do we create the word embeddings? Also, what is x_i at 12:38?
The dot(x_i, x_j) doesn't make sense to me. It seems I am comparing the similarity between words instead of comparing a key and a query. Could you explain this better, please?
@SebastianRaschka
2 years ago
This is a good point. Essentially, it is the same thing as computing the similarity between the query and a key, just in its simple form without parameters. Instead of dot(x_i, x_j), the key-query computation would be dot(q_i, k_j), where the query is computed as q_i = Q x_i and the key as k_j = K x_j. So, if you don't use the weight matrices Q and K, this reduces to the plain similarity between the word embeddings.
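The contrast in this reply can be sketched in a few lines of NumPy. This is only an illustrative toy, not code from the lecture: the sentence length (4), embedding size (3), and random Q/K matrices are made-up assumptions.

```python
import numpy as np

# Hypothetical toy input: 4 "words", each embedded in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # rows are the word embeddings x_1 ... x_4

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Parameter-free self-attention: the score is just dot(x_i, x_j).
scores = X @ X.T            # (4, 4) matrix of pairwise dot products
weights = softmax(scores)   # attention weights; each row sums to 1
context = weights @ X       # weighted sum of embeddings per word

# With learned projections, the score becomes dot(q_i, k_j),
# where q_i = Q x_i and k_j = K x_j (Q and K are random stand-ins here).
Q = rng.normal(size=(3, 3))
K = rng.normal(size=(3, 3))
scores_qk = (X @ Q.T) @ (X @ K.T).T  # (4, 4), now parameterized
weights_qk = softmax(scores_qk)
```

Setting Q and K to the identity matrix would make `scores_qk` equal to `scores`, which is the sense in which the parameter-free version is a special case.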
What is the intuition behind taking the dot products between the word embeddings? As I understand it, not every embedding method can be used to compute attention via dot products -- only those that represent term importance from the start (so TF-IDF and BM25 couldn't be used then).
great lesson. I am just starting to learn DL and I may be asking something silly, but this self-attention looks to me the same as what we have in graph NNs
@SebastianRaschka
A year ago
yeah I think it's somewhat related. Btw there are actually graph attention networks as well 😅 arxiv.org/abs/1710.10903
@davidlearnforus
A year ago
@@SebastianRaschka many thanks for the answer and the paper
Isn't the dot product here called the attention score, and the whole sum the attention vector?
Why geometric deep learning is not included in the playlist?
@SebastianRaschka
2 years ago
The semester is only so long ... But my new book (coming out next month) will have a chapter on graph neural nets!
you underline / highlight too much, almost every word / sentence: your slides are overloaded and unreadable
@SebastianRaschka
2 years ago
Oh oh. I think I got better recently when I switched from an iPad to a pen tablet -- the iPad makes the annotation just too easy lol