What is Layer Normalization? | Deep Learning Fundamentals

You might have heard about Batch Normalization before. It is a great way to make your networks train faster and perform better, but Batch Norm also has some shortcomings. That's why researchers came up with an improvement over Batch Norm called Layer Normalization.
In this video, we learn how Layer Normalization works, how it compares to Batch Normalization, and in which cases it works best.
👇 Get your free AssemblyAI token here
www.assemblyai.com/?...
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT ▬▬▬▬▬▬▬▬▬▬▬▬
🖥️ Website: www.assemblyai.com
🐦 Twitter: / assemblyai
🦾 Discord: / discord
▶️ Subscribe: kzread.info?...
🔥 We're hiring! Check our open roles: www.assemblyai.com/careers
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#MachineLearning #DeepLearning

Comments: 43

  • @solidsnake013579
    4 months ago

    hands down best and fastest explanation on youtube

  • @samtj3524
    a year ago

    for some reason, I have always had doubts about whether I truly understand this concept. But after watching your video, I can confidently say I fully understand. Thank you for your efforts!

  • @AssemblyAI
    a year ago

    Happy to help

  • @ujjalkrdutta7854
    a year ago

    Commenting here again: great series of videos. I already know all the concepts, but I'm still going through them again to get a second perspective on the topics. And indeed I am able to view things differently. Really good way of explaining!

  • @AssemblyAI
    a year ago

    That's great to hear Ujjal!

  • @suicidaldonut
    2 years ago

    really excellent explainer -- thanks for making this!

  • @AssemblyAI
    2 years ago

    You're very welcome!

  • @whyisitnowhuh8691
    a year ago

    thank you!! this really helps

  • @AssemblyAI
    a year ago

    Great to hear! - Mısra

  • @user-wz6fh5kl6l
    3 months ago

    concise but clear

  • @Bookerer
    2 years ago

    Very helpful!

  • @AssemblyAI
    2 years ago

    Glad to hear! - Mısra

  • @jiehu3634
    2 years ago

    This is the best explanation I've ever seen!!!! Thanks

  • @AssemblyAI
    2 years ago

    Thank you Jie! - Mısra

  • @shivkrishnajaiswal8394
    a year ago

    Good Video!!

  • @dainispolis3550
    a year ago

    Thank you, because of your model I was able to train a Latvian-language ASR.

  • @mesutt.2442
    a year ago

    Thank you very much! This was a great explanation! By any chance, does anyone have an idea why the Transformer architecture uses layer normalization? In the video you mention that layer normalization works better with RNNs, but the Transformer model does not use any recurrence, and yet it still uses layer normalization.

  • @AssemblyAI
    a year ago

    Really good question Mesut. My guess is that layer norm lends itself better to tasks that need to be parallelized. That's why they use Layer norm instead of batch norm.

  • @ujjalkrdutta7854
    a year ago

    I think it is well suited because, in a Transformer, each token can still be processed independently; the fact that we then refine them with attention is a separate matter.
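
A minimal sketch of that idea, assuming PyTorch's nn.LayerNorm (the sizes below are made up for illustration, not taken from the video): a Transformer-style layer norm normalizes each token independently over the model dimension, so no statistics are shared across positions or examples.

import torch
import torch.nn as nn

batch, seq_len, d_model = 2, 5, 8          # toy sizes for illustration
x = torch.randn(batch, seq_len, d_model)

ln = nn.LayerNorm(d_model)                 # normalizes over the last dimension only
y = ln(x)

# Every token vector is standardized on its own: mean ~0 and variance ~1
# per (example, position), independent of the rest of the batch.
print(y.mean(dim=-1))
print(y.var(dim=-1, unbiased=False))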

  • @FazeLyndon
    a year ago

    Layer Normalization requires a number for normalized_shape; can you please advise what would be a good value for this? Is it the same as the number of layers?
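
For what it's worth, in PyTorch normalized_shape is the shape of the trailing feature dimension(s) to normalize over (typically the hidden or embedding size), not the number of layers. A small sketch; the sizes below are made up for illustration:

import torch
import torch.nn as nn

hidden_size = 64                          # example value, not from the video
x = torch.randn(32, 10, hidden_size)      # (batch, sequence length, features)

ln = nn.LayerNorm(normalized_shape=hidden_size)
y = ln(x)                                 # mean/variance taken over the last dim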

  • @atchutram9894
    2 years ago

    I think the working of LN is well explained, and the comparison with BN is also well presented. But I didn't understand why LN is better for sequences.

  • @AssemblyAI
    2 years ago

    Hey atchut, thank you! The reason LN works with sequences is that the mean and std are calculated per example and not per batch.
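
A rough sketch of that difference, assuming a plain (batch, features) activation matrix; the tensor shapes and epsilon are illustrative, not taken from the video:

import torch

x = torch.randn(4, 3)   # 4 examples in the batch, 3 features (neurons)
eps = 1e-5

# Batch norm: one mean/std per feature, computed across the batch dimension.
bn_mean = x.mean(dim=0, keepdim=True)                  # shape (1, 3)
bn_std  = x.std(dim=0, unbiased=False, keepdim=True)
x_bn = (x - bn_mean) / (bn_std + eps)

# Layer norm: one mean/std per example, computed across the feature dimension,
# so it behaves the same way whatever the batch (or sequence length) looks like.
ln_mean = x.mean(dim=1, keepdim=True)                  # shape (4, 1)
ln_std  = x.std(dim=1, unbiased=False, keepdim=True)
x_ln = (x - ln_mean) / (ln_std + eps)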

  • @jiehu3634
    2 years ago

    QQ: does layer norm require the features to have a similar scale? Otherwise each one is normalized with a different scale, and a feature with small values is likely to get biased results?

  • @AssemblyAI
    2 years ago

    Hey Jie, Layer Norm makes sure that the features are on the same scale. So you do not need to do any prior normalization.
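
A tiny illustration of that reply, with made-up numbers: layer norm standardizes each example's activation vector on its own, so a separate feature-scaling step is not required beforehand.

import torch
import torch.nn as nn

# Two examples whose raw features live on wildly different scales.
x = torch.tensor([[0.001, 5.0, 3000.0],
                  [0.002, 7.0, 1000.0]])

ln = nn.LayerNorm(3)
print(ln(x))   # each row now has mean ~0 and unit variance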

  • @user-yw4kq3ds8h
    4 months ago

    At 02:30, you said that in normalization you calculate the average and the mean for each neuron. I suppose you meant the average and the SD there.

  • @josemuarnapoleon
    9 months ago

    She is my dream girl. The content is super concise and sweet, thanks!

  • @jiehu3634
    2 years ago

    And another question: if we can guarantee the sequences all have the same length, then BN should work?

  • @jreinhart
    2 years ago

    If you have a small batch size then batch norm still may not work because the batch would not be representative of the data set. She explains this at 0:45 of this video.
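
A small numerical illustration of that point (synthetic data, not from the video): the per-batch mean that batch norm relies on fluctuates a lot when the batch is tiny and settles down as the batch grows.

import torch

torch.manual_seed(0)
data = torch.randn(10_000)    # stand-in for one feature's activations, true mean ~0

for batch_size in (2, 32, 512):
    usable = (len(data) // batch_size) * batch_size
    batch_means = data[:usable].view(-1, batch_size).mean(dim=1)
    # The spread of the per-batch means shrinks roughly like 1/sqrt(batch_size).
    print(batch_size, batch_means.std().item())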

  • @jiehu3634
    2 years ago

    @jreinhart I got this; then if the batch size is large, we won't have this issue.

  • @paaabl0.
    a year ago

    They are just reading what's already on Wikipedia to have some content for marketing purposes. Don't expect deeper insight, it's not a science channel.

  • @themightyquinn100
    a year ago

    Is it normalization or standardization? You mention mean and standard deviation which is not normalization.

  • @checkxfile
    a year ago

    what would happen if we don't use any normalization?

  • @fakhriddintojiboev7252
    2 years ago

    Simple and best explanation! By the way, you are very beautiful!

  • @adekoyasamuel8788
    5 months ago

    For some reason I don't understand your calculation in the batch normalization aspect

  • @Omsip123
    5 months ago

    Ok

  • @maryamaghili1148
    a month ago

    You are confusing the definitions. At 2:15 you say a batch with 3 data points is coming into the layer, and then, instead of having 1 mean and variance for the entire batch, you calculate 3 means and variances, one for each neuron, which does not look right. Please revisit your video.
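
For readers following this thread: in the standard formulation of batch norm, the statistics are indeed per neuron (feature), each computed across the examples in the batch, so with 3 examples and, say, 4 neurons you get 4 mean/variance pairs, each estimated from the 3 examples. A quick sketch with hypothetical sizes:

import torch

activations = torch.randn(3, 4)   # 3 examples in the batch, 4 neurons (hypothetical)

per_neuron_mean = activations.mean(dim=0)                # shape (4,): one per neuron
per_neuron_var  = activations.var(dim=0, unbiased=False) # shape (4,): one per neuron
normalized = (activations - per_neuron_mean) / torch.sqrt(per_neuron_var + 1e-5)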

  • @endlesswu
    5 months ago

    if you have a really small batch number,

  • @popescutrandafir
    2 months ago

    i am deep in love

  • @konstantinkurlayev9242
    a year ago

    So it is just simple math, gosh...

  • @ILManent
    a year ago

    Actually, I don't think this video is entirely accurate. If you read the original paper (arxiv.org/pdf/1607.06450.pdf), in section 3.1 you can see that the vectors don't only get normalized as you described; they also get rescaled with learnable parameters! Half of the story is missing!

  • @robertbracco8321
    a year ago

    Thanks for your comment. I didn't know this; it made me check the PyTorch LayerNorm docs, and you're correct. I agree this should have been mentioned in the video. I'm guessing the reason it wasn't is that batch norm has the same gamma/beta scale and shift parameters and they were mainly focused on how BN and LN differ, but it would have been better if it were included.

  • @yoctometer2045
    a year ago

    While you are right, these learnable parameters are present in every normalization technique: batch norm, layer norm, group norm, etc. So if we are reflecting on the intuitive differences between BN and LN, this video does a splendid job.
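
To make the point in this sub-thread concrete, here is a minimal sketch of layer norm including the learnable gain and bias from section 3.1 of the paper (PyTorch calls these weight and bias and enables them via elementwise_affine=True); the class name and sizes below are made up for illustration.

import torch
import torch.nn as nn

class SimpleLayerNorm(nn.Module):
    """Per-example normalization followed by a learnable rescale and shift."""

    def __init__(self, features, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(features))    # learnable gain
        self.beta = nn.Parameter(torch.zeros(features))    # learnable bias
        self.eps = eps

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

# Built-in equivalent: nn.LayerNorm(features, elementwise_affine=True)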