Deep Learning Lecture 12: Recurrent Neural Nets and LSTMs

Slides available at: www.cs.ox.ac.uk/people/nando....
Course taught in 2015 at the University of Oxford by Nando de Freitas with great help from Brendan Shillingford.

Comments: 48

  • @autripat · 8 years ago

    Key bookmarks: LSTM explanation starts at 25:30, LSTM implementation at 31:49, Torch code at 34:15.

  • @nikolatanev3293 · 8 years ago

    Thank you :)

  • @rajupowers · 7 years ago

    46:40 image captioning

  • @WahranRai · 6 years ago

    I totally disagree with your approach!!! All things are related, and we have to understand how we get from one concept to another!!!

  • @kingpopaul · 8 years ago

    Thanks for publishing those videos!

  • @jovanyagathe7790 · 8 years ago

    I am really curious and looking forward to the next parts.

  • @aliahmadvand4135 · 8 years ago

    Thank you for your great explanations!

  • @alexandra-stefaniamoloiu2431 · 8 years ago

    Great explanations! Thank you!

  • @SebastianSchwank · 9 years ago

    Love it! Just genius!

  • @AlexanderBollbach · 7 years ago

    Can somebody use a neural network to filter out those low frequencies?

  • @ganujha6586 · 7 years ago

    Alexander Bollbach

  • @yannisran7312 · 6 years ago

    You can speed up the playback to trim the high-frequency noise.

  • @malvinnyahwai8582 · 5 years ago

    Thank you for such a useful video

  • @hellMode · 4 years ago

    23:54 Why explode? You have an upper bound: if the upper bound goes to zero then the gradient vanishes, but if the upper bound goes to infinity, it bounds nothing.
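
    A rough sketch of the norm argument under discussion (assuming the
    lecture's recurrence h_t = \theta \phi(h_{t-1}) + \theta_x x_t, as in the
    threads further down, and following the analysis of Pascanu et al., 2013):

        \frac{\partial h_t}{\partial h_{t-1}} = \theta \, \mathrm{diag}(\phi'(h_{t-1}))

        \left\| \frac{\partial h_T}{\partial h_1} \right\|
          \le \prod_{t=2}^{T} \|\theta\| \, \|\mathrm{diag}(\phi'(h_{t-1}))\|
          \le (\gamma_\theta \gamma_\phi)^{T-1}

    If \gamma_\theta \gamma_\phi < 1 the bound forces the gradient to vanish;
    if it is > 1 the bound constrains nothing, which is exactly the point made
    here: explosion then becomes possible (along directions where the Jacobian
    product has singular values above 1), but the bound by itself only proves
    the vanishing case.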

  • @MrBBGROUP · 6 years ago

    The lecture explains RNNs, LSTMs and their applications at a high conceptual level. Thanks very much for that. I would appreciate it even more if it went into a bit more detail about how the LSTM's gates solve the vanishing/exploding gradient problem of RNNs, and about how backpropagation works in this case to minimize the error. I will probably go learn more about it. Anyway, thanks a lot.

  • @ahmedmazari9279 · 7 years ago

    How does backpropagation work in a bidirectional LSTM?

  • @markszlazak · 9 years ago

    When will lectures 13 and 14 become available?

  • @arturodeza3816 · 8 years ago

    12:48: In the RNN cartoon, x_t and x_(t-1) should not be connected, should they?

  • @SteveRowe · 7 years ago

    Thank you for making your lecture available! Why is the attribution for LSTM given as Alex Graves instead of Juergen Schmidhuber?

  • @sudhaannangi8143 · 7 years ago

    Steve Rowe

  • @chrisanderson1513 · 7 years ago

    It might be that he used Alex Graves' slides?

  • @TheDeatheater3 · 8 months ago

    Super good

  • @hypnoticpoisons · 7 years ago

    At 36:57, what is the variable 'opt'?

  • @vtn6 · 8 years ago

    I think your explanation at the beginning about the coolness of the convnet that Yann LeCun demoed in class is missing something. Specifically: taking a picture of something (in your talk, a picture of the crowd), pointing the camera away, and having a program signal "high" when the crowd is back in the field of view isn't that exciting; you could do this with just a dot product and a threshold. Does the convnet provide scale and rotational invariance? Based on your explanation alone, I don't see how the convnet provides advantages over much simpler methods.

  • @isaamthalhath4359 · 4 years ago

    Convnets can capture image features better than an RNN; that's why we mainly use convnets instead of RNNs for image processing. They can downsample or upscale images using the captured features.

  • @jurelecnik · 7 years ago

    At 19:40... why is there Theta^T (that is, Theta transpose) in the derivative and not just Theta?

  • @user-pg4bq1wo7t · 7 years ago

    I think this is a widespread error. Typically people don't write BPTT down explicitly: instead, they define intermediate variables \delta_t and use them to express BPTT. In 2012, a paper ("On the difficulty of training recurrent neural networks") tried to write the derivatives out directly. In that paper the matrix is transposed, which I think is an error. The error doesn't compromise the paper's correctness, though: it was aimed at illustrating why gradient explosion/vanishing occurs, which is not affected by the extra transpose. But the transpose does affect how BPTT is implemented (this may be hidden by the automatic differentiation of modern deep learning frameworks such as TensorFlow). Since then many slides have cited that work, including several famous courses (e.g. Stanford's CS224d), which has spread the confusion widely. If you search Google for "RNN Jacobian transpose", you will see many Stanford students asking about this! It's really strange that the course instructors don't correct the error and keep asking students to prove it.
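
    One way to reconcile the two conventions (a sketch, again assuming the
    recurrence h_t = \theta \phi(h_{t-1}) + \theta_x x_t): the forward Jacobian
    contains \theta, while the backward recursion, written as a vector-Jacobian
    product on column gradient vectors, contains \theta^T:

        \frac{\partial h_t}{\partial h_{t-1}} = \theta \, \mathrm{diag}(\phi'(h_{t-1}))

        \frac{\partial L}{\partial h_{t-1}}
          = \left( \frac{\partial h_t}{\partial h_{t-1}} \right)^{T} \frac{\partial L}{\partial h_t}
          = \mathrm{diag}(\phi'(h_{t-1})) \, \theta^T \, \frac{\partial L}{\partial h_t}

    So whether \theta or \theta^T appears depends on whether one writes the
    forward Jacobian or the backward gradient recursion; the two forms are
    consistent with each other.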

  • @calmnessduan3243 · 7 years ago

    Thx

  • @PushkarTripathi1 · 8 years ago

    At 38:19: why is it 2000 x 4 dimensions per sentence? The hidden state is 1000 numbers and we have 4 layers of LSTMs, so it should be 4 x 1000. Where are the other 4000 coming from? I noticed the same in the original paper as well (arxiv.org/pdf/1409.3215.pdf), so I am surely missing something.

  • @zeus1082 · 6 years ago

    I just realized that this LSTM mimics a PLC ladder logic diagram.

  • @JoePist0ne · 7 years ago

    What does he mean when he states "(...) recurrence is essential for Turing Completeness"?

  • @chrisanderson1513 · 7 years ago

    I think he's talking about the requirements for something being Turing complete. This might help: cs.stackexchange.com/questions/991/are-there-minimum-criteria-for-a-programming-language-being-turing-complete

  • 8 years ago

    Why is it like nn.Sigmoid()(...) in the Torch code?

  • 8 years ago

    +Gökçen Eraslan Ah, it's something like a = nn.Sigmoid(); c = a:forward(b)
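
    A small runnable sketch of what this reply describes, assuming Torch7 with
    the nn package installed (the variable names here are just illustrative):

        require 'nn'

        local x = torch.randn(5)

        -- Explicit two-step version: construct the module, then call :forward().
        local sigmoid = nn.Sigmoid()
        local y1 = sigmoid:forward(x)

        -- One-expression version seen in the lecture code: calling a module
        -- like a function also runs its forward pass and returns the output.
        local y2 = nn.Sigmoid()(x)

        print(torch.norm(y1 - y2))  -- prints 0: both forms compute the same thing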

  • @diodin8587 · 7 years ago

    Shouldn't the recurrent part of the RNN be h_t = φ(θ h_{t-1} + θ_x x_t)? The activation should take all the input, including h and x.

  • @FariborzGhavamian · 7 years ago

    I don't think so. See φ(h_{t-1}) as the output at time step t-1, which is fed back as input for time step t.

  • @rutapetra8795 · 7 years ago

    That's the part that confused me too. I have checked a couple more papers on RNNs and all of them include both h and x inside the activation.

  • @emadwilliam45 · 5 years ago

    I am also confused about it. Did you come up with an explanation?
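
    For what it's worth, the two formulations in this thread differ only in
    where the nonlinearity is applied; a sketch, using the symbols from the
    comments above:

        Lecture's form (squash the previous state, then mix linearly):
            h_t = \theta \, \phi(h_{t-1}) + \theta_x x_t

        More common textbook form (mix linearly, then squash):
            h_t = \phi(\theta h_{t-1} + \theta_x x_t)

    Both define a recurrent map from (h_{t-1}, x_t) to h_t with learnable
    \theta and \theta_x; they are different parameterizations of essentially
    the same idea rather than one of them being wrong.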

  • @antoniowyldernandofreitaso8796 · 5 years ago

    Hello, I'm from Brazil

  • @WahranRai · 6 years ago

    Why don't you use the standard notation for the RNN recursive formula?! h(t) = phi(W*h(t-1) + U*x(t)) and y(t) = psi(V*h(t)), possibly plus biases.

  • @gogopie64 · 7 years ago

    Somewhat confused about this. Is h a scalar or a vector? And if h is a vector, then what is the product of vectors?

  • @chrisanderson1513 · 7 years ago

    I think h is a vector. It could be point-wise multiplication: en.wikipedia.org/wiki/Hadamard_product_(matrices)
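
    A tiny Torch example of the element-wise (Hadamard) product suggested here,
    with illustrative values:

        require 'torch'

        -- Element-wise (Hadamard) product of two vectors: multiply entry by entry.
        local a = torch.Tensor{1, 2, 3}
        local b = torch.Tensor{4, 5, 6}
        print(torch.cmul(a, b))  -- 4, 10, 18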

  • @riccardoandreetta9520 · 7 years ago

    There's a lot of magic there. It looks like you can't REALLY explain in which way sentences are generated. I wonder how you can design systems if you can't control the parameters, because you don't know exactly what they do. Moreover, how are the parameters learnt by the system? Is there no cost function being minimized here, or is there one? It's not clear, at least to me.

  • @IgorAherne · 6 years ago

    As far as I know, the gates (input, forget, output) are one-layer "mini neural nets" themselves, so their weights get tweaked through backpropagation as well. This increases the processing cost considerably, however. Still, I don't see how the exploding/vanishing gradient (during training) is solved by these complex LSTM systems...
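
    A sketch of the usual answer to the vanishing-gradient question above,
    using the standard LSTM equations (the common notation, not necessarily
    the lecture's):

        i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          (input gate)
        f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          (forget gate)
        o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          (output gate)
        \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   (candidate cell)

        c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
        h_t = o_t \odot \tanh(c_t)

    The key is the additive cell update: ignoring the gates' own dependence on
    h_{t-1}, \partial c_t / \partial c_{t-1} = \mathrm{diag}(f_t), so gradients
    flowing along the cell path are rescaled by a learned forget gate instead
    of being repeatedly multiplied by the same weight matrix and a squashing
    derivative, which is what makes them vanish in a plain RNN.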

  • @pattiknuth4822 · 3 years ago

    Guy loves to hear himself talk. The actual lecture doesn't begin until about 7:50.

  • @AlqGo · 7 years ago

    This kind of "condensed" lecture is only suitable for people who already have a solid background in NNs.
