What is Layer Normalization? | Deep Learning Fundamentals

You might have heard about Batch Normalization before. It is a great way to make your networks train faster and perform better, but Batch Norm also has some shortcomings. That's why researchers came up with an improvement over Batch Norm called Layer Normalization.
In this video, we learn how Layer Normalization works, how it compares to Batch Normalization, and in which cases it works best.
👇 Get your free AssemblyAI token here
www.assemblyai.com/?...
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: www.assemblyai.com
🐦 Twitter: / assemblyai
🦾 Discord: / discord
▶️ Subscribe: kzread.info?...
🔥 We're hiring! Check our open roles: www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#MachineLearning #DeepLearning

Comments: 43

  • @solidsnake013579
    4 months ago

    hands down best and fastest explanation on youtube

  • @samtj3524
    a year ago

    for some reason, I have always had doubts about whether I truly understand this concept. But after watching your video, I can confidently say I fully understand. Thank you for your efforts!

  • @AssemblyAI
    a year ago

    Happy to help

  • @ujjalkrdutta7854
    a year ago

    Commenting here again: great series of videos. I already know all the concepts, but I'm still going through them again to get a second perspective on the topics. And indeed I am able to view things differently. Really good way of explaining!

  • @AssemblyAI
    a year ago

    That's great to hear Ujjal!

  • @suicidaldonut
    2 years ago

    really excellent explainer -- thanks for making this!

  • @AssemblyAI
    2 years ago

    You're very welcome!

  • @whyisitnowhuh8691
    a year ago

    thank you!! this really helps

  • @AssemblyAI
    a year ago

    Great to hear! - Mısra

  • @user-wz6fh5kl6l
    3 months ago

    concise but clear

  • @Bookerer
    2 years ago

    Very helpful!

  • @AssemblyAI
    2 years ago

    Glad to hear! - Mısra

  • @jiehu3634
    2 years ago

    This is the best explanation I've ever seen!!!! Thanks

  • @AssemblyAI
    2 years ago

    Thank you Jie! - Mısra

  • @shivkrishnajaiswal8394
    a year ago

    Good Video!!

  • @dainispolis3550
    a year ago

    Thank you, because of your model I was able to train a Latvian-language ASR.

  • @mesutt.2442
    a year ago

    Thank you very much! This was a great explanation! By any chance, does anyone have an idea why the Transformer architecture uses layer normalization? In the video you mention that layer normalization works better with RNNs, but the Transformer model does not use any recurrence, and yet it still uses layer normalization.

  • @AssemblyAI
    a year ago

    Really good question Mesut. My guess is that layer norm lends itself better to tasks that need to be parallelized. That's why they use Layer norm instead of batch norm.

  • @ujjalkrdutta7854
    a year ago

    I think it is well suited because, in a Transformer, each token can still be processed independently; the fact that we then refine them with attention is a separate matter.
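
A minimal sketch of that idea, assuming PyTorch's nn.LayerNorm (the sizes below are made up for illustration, not taken from the video): a Transformer-style layer norm normalizes each token independently over the model dimension, so no statistics are shared across positions or examples.

import torch
import torch.nn as nn

batch, seq_len, d_model = 2, 5, 8          # toy sizes for illustration
x = torch.randn(batch, seq_len, d_model)

ln = nn.LayerNorm(d_model)                 # normalizes over the last dimension only
y = ln(x)

# Every token vector is standardized on its own: mean ~0 and variance ~1
# per (example, position), independent of the rest of the batch.
print(y.mean(dim=-1))
print(y.var(dim=-1, unbiased=False))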

  • @FazeLyndon
    a year ago

    Layer Normalization requires a number for normalized_shape; can you please advise what would be a good value for this? Is it the same as the number of layers?
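
For what it's worth, in PyTorch normalized_shape is the shape of the trailing feature dimension(s) to normalize over (typically the hidden or embedding size), not the number of layers. A small sketch; the sizes below are made up for illustration:

import torch
import torch.nn as nn

hidden_size = 64                          # example value, not from the video
x = torch.randn(32, 10, hidden_size)      # (batch, sequence length, features)

ln = nn.LayerNorm(normalized_shape=hidden_size)
y = ln(x)                                 # mean/variance taken over the last dim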

  • @atchutram9894
    2 years ago

    I think the working of LN is well explained, and the comparison with BN is also well presented. But I didn't understand why LN is better for sequences.

  • @AssemblyAI
    2 years ago

    Hey atchut, thank you! The reason LN works with sequences is that the mean and std are calculated per example and not per batch.
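
A rough sketch of that difference, assuming a plain (batch, features) activation matrix; the tensor shapes and epsilon are illustrative, not taken from the video:

import torch

x = torch.randn(4, 3)   # 4 examples in the batch, 3 features (neurons)
eps = 1e-5

# Batch norm: one mean/std per feature, computed across the batch dimension.
bn_mean = x.mean(dim=0, keepdim=True)                  # shape (1, 3)
bn_std  = x.std(dim=0, unbiased=False, keepdim=True)
x_bn = (x - bn_mean) / (bn_std + eps)

# Layer norm: one mean/std per example, computed across the feature dimension,
# so it behaves the same way whatever the batch (or sequence length) looks like.
ln_mean = x.mean(dim=1, keepdim=True)                  # shape (4, 1)
ln_std  = x.std(dim=1, unbiased=False, keepdim=True)
x_ln = (x - ln_mean) / (ln_std + eps)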

  • @jiehu3634
    2 years ago

    QQ: does layer norm require the features to have a similar scale? Otherwise each one is normalized with a different scale, and a feature with small values is likely to get biased results?

  • @AssemblyAI
    2 years ago

    Hey Jie, Layer Norm makes sure that the features are on the same scale. So you do not need to do any prior normalization.
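
A tiny illustration of that reply, with made-up numbers: layer norm standardizes each example's activation vector on its own, so a separate feature-scaling step is not required beforehand.

import torch
import torch.nn as nn

# Two examples whose raw features live on wildly different scales.
x = torch.tensor([[0.001, 5.0, 3000.0],
                  [0.002, 7.0, 1000.0]])

ln = nn.LayerNorm(3)
print(ln(x))   # each row now has mean ~0 and unit variance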

  • @user-yw4kq3ds8h
    4 months ago

    At 02:30, you said that in normalization you calculate the average and the mean for each neuron. I suppose you meant the average and the SD there.

  • @josemuarnapoleon
    9 months ago

    She is my dream girl. The content is super concise and sweet, thanks!

  • @jiehu3634
    2 years ago

    And another question: if we can guarantee the sequences all have the same length, then BN should work?

  • @jreinhart
    2 years ago

    If you have a small batch size then batch norm still may not work because the batch would not be representative of the data set. She explains this at 0:45 of this video.
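
A small numerical illustration of that point (synthetic data, not from the video): the per-batch mean that batch norm relies on fluctuates a lot when the batch is tiny and settles down as the batch grows.

import torch

torch.manual_seed(0)
data = torch.randn(10_000)    # stand-in for one feature's activations, true mean ~0

for batch_size in (2, 32, 512):
    usable = (len(data) // batch_size) * batch_size
    batch_means = data[:usable].view(-1, batch_size).mean(dim=1)
    # The spread of the per-batch means shrinks roughly like 1/sqrt(batch_size).
    print(batch_size, batch_means.std().item())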

  • @jiehu3634
    2 years ago

    @jreinhart I got this; then if the batch size is large, we won't have this issue.

  • @paaabl0.
    a year ago

    They are just reading what's already on Wikipedia to have some content for marketing purposes. Don't expect deeper insight, it's not a science channel.

  • @themightyquinn100
    a year ago

    Is it normalization or standardization? You mention mean and standard deviation which is not normalization.

  • @checkxfile
    a year ago

    what would happen if we don't use any normalization?

  • @fakhriddintojiboev7252
    2 years ago

    Simple and best explanation! By the way, you are very beautiful!

  • @adekoyasamuel8788
    5 months ago

    For some reason I don't understand your calculation in the batch normalization aspect

  • @Omsip123
    5 months ago

    Ok

  • @maryamaghili1148
    a month ago

    You are confusing the definitions. At 2:15 you say a batch with 3 data points is coming into the layer, and then, instead of having 1 mean and variance for the entire batch, you calculate 3 means and variances, one for each neuron, which does not look right. Please revisit your video.
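
For readers following this thread: in the standard formulation of batch norm, the statistics are indeed per neuron (feature), each computed across the examples in the batch, so with 3 examples and, say, 4 neurons you get 4 mean/variance pairs, each estimated from the 3 examples. A quick sketch with hypothetical sizes:

import torch

activations = torch.randn(3, 4)   # 3 examples in the batch, 4 neurons (hypothetical)

per_neuron_mean = activations.mean(dim=0)                # shape (4,): one per neuron
per_neuron_var  = activations.var(dim=0, unbiased=False) # shape (4,): one per neuron
normalized = (activations - per_neuron_mean) / torch.sqrt(per_neuron_var + 1e-5)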

  • @endlesswu
    5 months ago

    if you have a really small batch number,

  • @popescutrandafir
    2 months ago

    i am deep in love

  • @konstantinkurlayev9242
    a year ago

    So it is just simple math, gosh...

  • @ILManent
    a year ago

    Actually, I don't think this video is entirely accurate. If you read the original paper (arxiv.org/pdf/1607.06450.pdf), in section 3.1 you can see that the vectors don't only get normalized as you described; they also get rescaled with learnable parameters! Half of the story is missing!

  • @robertbracco8321
    a year ago

    Thanks for your comment. I didn't know this; it made me check the PyTorch LayerNorm docs, and you're correct. I agree this should have been mentioned in the video. I'm guessing the reason it wasn't is that batch norm has the same gamma/beta scale and shift parameters and they were mainly focused on how BN and LN differ, but it would have been better if it were included.

  • @yoctometer2045
    a year ago

    While you are right, these learnable parameters are present in every normalization technique: batch norm, layer norm, group norm, etc. So if we are reflecting on the intuitive differences between BN and LN, this video does a splendid job.
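
To make the point in this sub-thread concrete, here is a minimal sketch of layer norm including the learnable gain and bias from section 3.1 of the paper (PyTorch calls these weight and bias and enables them via elementwise_affine=True); the class name and sizes below are made up for illustration.

import torch
import torch.nn as nn

class SimpleLayerNorm(nn.Module):
    """Per-example normalization followed by a learnable rescale and shift."""

    def __init__(self, features, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(features))    # learnable gain
        self.beta = nn.Parameter(torch.zeros(features))    # learnable bias
        self.eps = eps

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Built-in equivalent: nn.LayerNorm(features, elementwise_affine=True)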