Swin Transformer - Paper Explained

A brief explanation of the Swin Transformer paper.
Paper link: arxiv.org/abs/2103.14030
Table of Contents:
00:00 Intro
00:13 Patch Embedding
02:56 Swin Transformer block
03:57 W-MSA
05:14 SW-MSA
08:56 Masked MSA implementation
14:58 Patch Merging
16:12 Stages
18:28 Image classification result
19:12 Relative position bias
Icon made by Freepik from flaticon.com

Comments: 24

  • @VedantJoshi-mr2us
    29 days ago

    By far one of the best and most complete Swin Transformer explanations on the entire Internet.

  • @soroushmehraban
    29 days ago

    Thanks!

  • @FinalProject-rw1yf
    27 days ago

    @soroushmehraban Hi sir, could you also explain the FasterViT and GCViT papers...

  • @SizzleSan
    a year ago

    Thorough! Very comprehensible, thank you.

  • @antonioperezvelasco3297
    8 months ago

    Thanks for the good explanation!

  • @yehanwasura
    a year ago

    Really informative, it helped me a lot to understand many concepts here. Keep up the good work!

  • @soroushmehraban
    a year ago

    Thanks! I’ll try my best.

  • @rohollahhosseyni8564
    10 months ago

    Very well explained, thank you Soroush.

  • @soroushmehraban
    10 months ago

    Glad you liked it

  • @siarez
    a year ago

    Great video! Thanks

  • @soroushmehraban
    a year ago

    Thanks for the feedback 🙂

  • @akbarmehraban5007
    a year ago

    I enjoyed it very much.

  • @omarabubakr6408
    11 months ago

    That's the most illustrative video of Swin Transformers on the Internet!

  • @soroushmehraban
    11 months ago

    Glad you enjoyed it 😃

  • @omarabubakr6408
    11 months ago

    @soroushmehraban Yes, thanks so much! Although I have a quick question more related to PyTorch: at 12:49, in line 239 of the code, first, what does the -1 mean and what exactly does it do to the tensor? Second, where does the [4, 16] come from? The 4 isn't mentioned in the reshaping. Thanks in advance.
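
    The `-1` the commenter asks about is PyTorch's shape-inference placeholder: `reshape` computes that one dimension from the total element count. A minimal sketch, using an illustrative 64-element tensor rather than the actual mask tensor from the video's code:

    ```python
    import torch

    # Illustrative stand-in tensor (64 elements); the video's exact shapes,
    # including where its [4, 16] comes from, are not reproduced here.
    x = torch.arange(64)

    # -1 tells reshape to infer that dimension from the element count:
    # 64 elements / 16 columns = 4 rows, so the result has shape (4, 16).
    y = x.reshape(-1, 16)
    print(y.shape)  # torch.Size([4, 16])
    ```

    At most one dimension passed to `reshape` may be `-1`; if the remaining dimensions don't divide the element count evenly, PyTorch raises an error.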

  • @user-sw4hm4hh6h
    11 months ago

    Perfect description.

  • @soroushmehraban
    11 months ago

    Glad it was helpful 🙂

  • @proteus333
    8 months ago

    Amazing video!

  • @soroushmehraban
    8 months ago

    Thanks!

  • @kundankumarmandal6804
    6 months ago

    You deserve more likes and subscribers

  • @soroushmehraban
    6 months ago

    Thanks, man 🙂 Appreciated!

  • @EngineerXYZ.
    6 months ago

    Why does the channel dimension increase from C to 4C after merging?

  • @soroushmehraban
    6 months ago

    Because we downsample the width by 2 and the height by 2. That means a 4x downsampling in spatial resolution, which we move into the channel dimension. It's just a simple tensor reshaping: for example, 10x10x2 = 200 elements, and after merging it's 5x5x8 = 200.

  • @dslkgjsdlkfjd
    @dslkgjsdlkfjd5 күн бұрын

    2:43 C would be equal to the number of filters, not the number of kernels. In the torch.nn.Conv2d operation being performed, we have 3 kernels for each filter (one per input channel) and C filters. Each filter has 3 kernels, not C kernels.
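
    The filters-vs-kernels distinction can be checked directly from the weight shape of `torch.nn.Conv2d`. The values below (C = 96, a 4x4 kernel with stride 4) are an assumption matching Swin-T's patch embedding, not taken from the video's code:

    ```python
    import torch

    # C filters over a 3-channel input: the weight tensor is laid out as
    # (out_channels, in_channels, kH, kW) = (filters, kernels per filter, kH, kW).
    C = 96
    conv = torch.nn.Conv2d(in_channels=3, out_channels=C, kernel_size=4, stride=4)
    print(conv.weight.shape)  # torch.Size([96, 3, 4, 4])
    ```

    So there are 96 filters, and each filter holds 3 kernels, one per input channel, exactly as the comment describes.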