L19.3 RNNs with an Attention Mechanism

Science & Technology

Slides: sebastianraschka.com/pdf/lect...
-------
This video is part of my Introduction to Deep Learning course.
Next video: • L19.4.1 Using Attentio...
The complete playlist: • Intro to Deep Learning...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka

Comments: 16

  • @mahaaljarrah3236 · 2 years ago

    Thank you very much, it was really helpful.

  • @SOFTWAREMASTER · 8 months ago

    Thanks. The video was clear.

  • @Amapramaadhy · 1 year ago

    Thanks for the great content. I find the "time step" terminology confusing. Might we call it "next item in the sequence" instead?

  • @LoveinPortofino1 · 2 years ago

    Thanks for the very detailed explanation. In the graph, you show that S_{t-1} and c_{t} go into the calculation of S_{t}. However, we also need y_{t-1}. In the original paper, the formula is S_{t} = f(S_{t-1}, y_{t-1}, c_{t}), which is why I did not quite understand the computation graph for S_{t}. Is the formula below correct? S_{t} = sigmoid(Weight_{hidden state} * S_{t-1} + Weight_{context} * c_{t} + Weight_{input} * y_{t-1})
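
For illustration, a minimal PyTorch sketch of the simplified single-layer update proposed in the comment above. Layer names and sizes are illustrative assumptions; in the original paper, f is a gated recurrent unit rather than a single layer, and tanh is the more common choice than sigmoid for a state update.

import torch
import torch.nn as nn

# Illustrative sizes, not taken from the lecture
hidden_dim, embed_dim, context_dim = 128, 64, 256

W_hidden  = nn.Linear(hidden_dim,  hidden_dim, bias=False)  # acts on S_{t-1}
W_context = nn.Linear(context_dim, hidden_dim, bias=False)  # acts on c_{t}
W_input   = nn.Linear(embed_dim,   hidden_dim, bias=False)  # acts on y_{t-1} (embedded previous output)

def decoder_step(s_prev, y_prev, c_t):
    # Simplified single-layer version of S_{t} = f(S_{t-1}, y_{t-1}, c_{t})
    return torch.tanh(W_hidden(s_prev) + W_context(c_t) + W_input(y_prev))

s_t = decoder_step(torch.zeros(1, hidden_dim),
                   torch.zeros(1, embed_dim),
                   torch.zeros(1, context_dim))   # shape: (1, hidden_dim)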

  • @Mvkv4L · 1 year ago

    Hi Sebastian. I hope you're doing well. I have a question about the attention weights (alpha) and the energies (e) and I was hoping you would help. What are the shapes of alpha and e? Are they vectors or scalars?

  • @abubakarali6399 · 2 years ago

    When you sum up all the attention weights, the aggregation function gives a single result instead of indicating which word is more important. How does this aggregation function remember the attention weight of every word?

  • @SebastianRaschka · 2 years ago

    Good point. The attention weights basically give you the importance of a word. By the way, here I am using "word" as a loose term that also means its representation as a real-valued vector. You weight the "relevant" words more strongly via these weights when you aggregate. Like you hinted at, you squash the attention-weighted words into a single one, but this is still more "powerful" than a regular RNN. In a regular RNN, you carry the words forward iteration by iteration, so information from early words might be forgotten. Say you are at word 10 in an input sentence. With the attention-weighted version, you can still have a high weight on word 1, and then, via the aggregation function, this word will have a high influence at time step 10.
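
As a minimal sketch of this aggregation step, assuming the standard weighted-sum form of the context vector (sizes are illustrative):

import torch

T, hidden_dim = 10, 128                            # encoder length and hidden size (illustrative)
encoder_states = torch.randn(T, hidden_dim)        # one vector per input word
alphas = torch.softmax(torch.randn(T), dim=0)      # attention weights for one decoder step, sum to 1

# Weighted sum: a word with a high alpha influences the current step directly,
# even if it appeared many steps earlier in the input.
context = (alphas.unsqueeze(1) * encoder_states).sum(dim=0)   # shape: (hidden_dim,)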

  • @koiRitwikHai · 1 year ago

    At 17:42, I think the two items (that are going into the pink "Neural Net" box) should be S_{t'-1} and h_{t}, because otherwise e_{t,t'} would depend solely on t; then why even call it e_{t,t'}? Just call it e_{t}.

  • @ricardogomes9528 · 1 year ago

    I think it should be S_{t-1} (as it is) and h_{t'}, because in the formula below, t' ranges all the way through T, which is the maximum index of the encoder time steps. Am I wrong?

  • @borutsvara7245 · 7 months ago

    Yes, I think it should be h_{t'}, otherwise the t' dependency does not make sense. But then you would also need to run the yellow RNN on the entire sentence to get S_{t-1} and compute attention, which does not make much sense. Furthermore, on the next slide there is an h_{t'}, which may indicate a correction attempt.

  • @Prithviization · 3 months ago

    Slide 29 here : sebastianraschka.com/pdf/lecture-notes/stat453ss21/L19_seq2seq_rnn-transformers__slides.pdf
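
Regarding the e_{t,t'} discussion above, here is a minimal sketch of the additive alignment score from Bahdanau et al., which makes explicit that the energy depends on both the decoder step t (via S_{t-1}) and the encoder step t' (via h_{t'}). Layer names and sizes are illustrative assumptions.

import torch
import torch.nn as nn

hidden_dim = 128                                      # illustrative size
W_s = nn.Linear(hidden_dim, hidden_dim, bias=False)   # acts on the decoder state S_{t-1}
W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)   # acts on the encoder state h_{t'}
v   = nn.Linear(hidden_dim, 1, bias=False)            # maps to a scalar energy

def energy(s_prev, h_tprime):
    # e_{t,t'} = v^T tanh(W_s S_{t-1} + W_h h_{t'})
    return v(torch.tanh(W_s(s_prev) + W_h(h_tprime)))

T = 10
s_prev = torch.randn(1, hidden_dim)                   # decoder state at step t-1
encoder_states = torch.randn(T, hidden_dim)           # h_1 ... h_T

energies = torch.cat([energy(s_prev, h.unsqueeze(0)) for h in encoder_states])  # shape: (T, 1)
alphas = torch.softmax(energies, dim=0)               # attention weights over the T encoder steps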

  • @736939 · 2 years ago

    What is the meaning of a bidirectional RNN? Why exactly is this type used for attention?

  • @SebastianRaschka · 2 years ago

    A bidirectional RNN sounds fancier than it really is. You can think of a standard RNN that you run on the input sentence as usual. Then, you run it again on the sentence with the words in reversed order. Finally, you concatenate the two representations: the one from the forward sentence and the one from the reversed sentence. Why? I guess that's because you want to capture more of the context: for some words, the relevant words come before, for others they come after.

  • @736939 · 2 years ago

    @@SebastianRaschka Thank you professor.
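
A minimal sketch of the bidirectional encoder described above, using PyTorch's built-in bidirectional GRU (vocabulary and layer sizes are illustrative assumptions):

import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128     # illustrative sizes

embedding = nn.Embedding(vocab_size, embed_dim)
encoder = nn.GRU(embed_dim, hidden_dim, bidirectional=True, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, 12))        # one sentence of 12 token ids
outputs, _ = encoder(embedding(tokens))

# For each word, the forward-direction state (context from earlier words) and the
# backward-direction state (context from later words) are concatenated.
print(outputs.shape)                                  # torch.Size([1, 12, 256])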

  • @yosimadsu2189 · 1 year ago

    Thanks. More detailed, but not the best. It lacks actual values to be calculated. Still confusing, though.

  • @nayanrainakaul9813 · 2 months ago

    Thanks kween
