Swin Transformer - Paper Explained
A brief explanation of the Swin Transformer paper.
Paper link: arxiv.org/abs/2103.14030
Table of Contents:
00:00 Intro
00:13 Patch Embedding
02:56 Swin transformer block
03:57 W-MSA
05:14 SW-MSA
08:56 Masked MSA implementation
14:58 Patch Merging
16:12 Stages
18:28 Image classification result
19:12 Relative position bias
Icon made by Freepik from flaticon.com
Comments: 24
By far one of the best and most complete Swin Transformer explanations on the entire Internet.
@soroushmehraban
29 days ago
Thanks!
@FinalProject-rw1yf
27 days ago
@@soroushmehraban Hi sir, could you also explain the FasterViT and GCViT papers?
Thorough! Very comprehensible, thank you.
Thanks for the good explanation!
Really informative. It helped me a lot to understand many concepts here. Keep up the good work!
@soroushmehraban
A year ago
Thanks! I’ll try my best.
Very well explained, thank you Soroush.
@soroushmehraban
10 months ago
Glad you liked it
Great video! Thanks
@soroushmehraban
A year ago
Thanks for the feedback 🙂
I enjoyed it very much.
That's The Most Illustrative Video Of Swin-Transformers on The Internet!
@soroushmehraban
11 months ago
Glad you enjoyed it 😃
@omarabubakr6408
11 months ago
@@soroushmehraban Yes, thanks so much! I have a quick question more related to PyTorch: at 12:49, in line 239 of the code, first, what does the -1 mean and what exactly does it do to the tensor? Second, where does the [4, 16] come from? The 4 isn't mentioned in the reshaping. Thanks in advance.
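For the first part of the question, PyTorch's reshape semantics are well defined: -1 tells PyTorch to infer that dimension from the tensor's total element count. A general illustration (not the video's exact code, whose input shapes are only an assumption here):

```python
import torch

# -1 asks PyTorch to infer the missing dimension from the total
# number of elements; only one -1 is allowed per reshape call.
x = torch.arange(64)       # 64 elements in a flat tensor
y = x.reshape(4, -1)       # 4 rows requested -> -1 is inferred as 16

print(y.shape)             # torch.Size([4, 16])
```

So a [4, 16] result simply means the fixed dimensions multiplied out leave 16 elements per row; the 4 comes from wherever the other dimensions were fixed earlier in the code.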
perfect description.
@soroushmehraban
11 months ago
Glad it was helpful 🙂
Amazing video !
@soroushmehraban
8 months ago
Thanks!
You deserve more likes and subscribers
@soroushmehraban
6 months ago
Thanks man🙂 appreciated
Why does the channel dimension increase from C to 4C after merging?
@soroushmehraban
6 months ago
Because we downsample the width by 2 and the height by 2. That's a 4x downsampling in spatial resolution, and those values are moved into the channel dimension. It's just a simple tensor reshaping. For example, 10x10x2 = 200 values; after merging, it's 5x5x8 = 200.
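The reshaping described above can be sketched in PyTorch. This is a minimal illustration with the small hypothetical shapes from the reply (10x10x2), gathering each 2x2 neighborhood of patches into the channel dimension:

```python
import torch

# Patch merging sketch: each 2x2 block of patches is concatenated
# along channels, so H x W x C becomes (H/2) x (W/2) x 4C.
# The total number of values is unchanged: 10*10*2 == 5*5*8 == 200.
H, W, C = 10, 10, 2
x = torch.randn(1, H, W, C)               # (B, H, W, C)

x0 = x[:, 0::2, 0::2, :]                  # top-left of each 2x2 block
x1 = x[:, 1::2, 0::2, :]                  # bottom-left
x2 = x[:, 0::2, 1::2, :]                  # top-right
x3 = x[:, 1::2, 1::2, :]                  # bottom-right
merged = torch.cat([x0, x1, x2, x3], -1)  # (B, H/2, W/2, 4C)

print(merged.shape)                       # torch.Size([1, 5, 5, 8])
```

In the paper, a linear layer then projects 4C down to 2C, so each stage halves the spatial resolution while doubling the channels.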
2:43 C would be equal to the number of filters, not the number of kernels. In the torch.nn.Conv2d operation being performed, there are C filters, and each filter has 3 kernels (one per input channel), not C kernels.
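The filter/kernel distinction above shows up directly in the weight shape of the patch-embedding convolution. A sketch assuming Swin-T's values (embedding dim C = 96, 4x4 patches) and a standard 224x224 RGB input:

```python
import torch

# Conv2d weight has shape (out_channels, in_channels, kH, kW):
# C filters, each composed of 3 kernels, one per RGB input channel.
C = 96
conv = torch.nn.Conv2d(in_channels=3, out_channels=C,
                       kernel_size=4, stride=4)
print(conv.weight.shape)           # torch.Size([96, 3, 4, 4])

img = torch.randn(1, 3, 224, 224)  # (B, 3, H, W)
out = conv(img)                    # (B, C, H/4, W/4)
print(out.shape)                   # torch.Size([1, 96, 56, 56])
```

Each of the 96 filters produces one output channel by summing its 3 per-channel kernel responses, which is why the output channel count equals the number of filters.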