XLNet: Generalized Autoregressive Pretraining for Language Understanding

Science & Technology

Abstract:
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
arxiv.org/abs/1906.08237
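
To make point (1) of the abstract concrete, here is a minimal sketch of the permutation language modeling objective: sample a factorization order, then maximize the ordinary autoregressive likelihood of the sequence under that order. This is only an illustration, not the authors' implementation; toy_log_prob and permutation_lm_loss are hypothetical names, and a real model (XLNet's two-stream attention Transformer-XL) would replace the toy scorer.

    import math
    import random

    def toy_log_prob(target, visible_context):
        # Hypothetical stand-in for a real model's log p(target | visible_context).
        vocab_size = 10_000
        return -math.log(vocab_size)  # uniform "model", for illustration only

    def permutation_lm_loss(tokens, num_orders=1):
        """Negative log-likelihood averaged over sampled factorization orders."""
        total = 0.0
        for _ in range(num_orders):
            order = list(range(len(tokens)))
            random.shuffle(order)  # one sampled factorization order
            for step, pos in enumerate(order):
                # The token at `pos` is predicted from the tokens that come earlier
                # in the permutation, regardless of where they sit in the original
                # sentence, so both left and right context can be used without
                # ever corrupting the input with [MASK] tokens.
                visible = [tokens[p] for p in order[:step]]
                total -= toy_log_prob(tokens[pos], visible)
        return total / num_orders

    print(permutation_lm_loss(["New", "York", "is", "a", "city"]))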

Comments: 43

  • @jackleesv · 4 years ago

    Please keep making these videos. Your work is amazing:))

  • @connorshorten6311 · 5 years ago

    Really cool! The "New York is a city" example helped a lot with my understanding of this!

  • @deeplearner2634 · 3 years ago

    I didn't really understand the random permutation idea from other sources, but this video made it clear how the shuffled permutation allows combining the AR and BERT's AE ideas. Thanks!

  • @abcdxx1059 · 4 years ago

    After a point, searching on the internet gives you nothing; this channel is the only place where I find explanations for very complex things in a way a newbie can understand. Please don't stop.

  • @nikeshnaik5516 · 4 years ago

    I was not getting the core idea behind XLNet and you made it look like a piece of cake. Subscribed!! Thank you.

  • @helloadventureworld · 3 years ago

    You are genuinely changing the way I read and understand papers. Your work is amazing; do more NLP papers, please.

  • @aayatrubab · 5 years ago

    I was eagerly waiting for it... Thanks, Yannic :)

  • @darkmythos4457 · 5 years ago

    Was actually waiting for you to post this, thanks.

  • @yuchengcho7471 · 4 years ago

    Thanks Yannic, this explanation is super helpful!!

  • @vedantwalke1789 · 4 years ago

    Great video. The explanation made it very simple to understand and was very helpful!!

  • @rpcruz · 5 years ago

    I liked the quick digression into language modeling before getting into the meat of the paper. Awesome video!

  • @hemichael2111 · 5 years ago

    So did I.

  • @limynet · 2 years ago

    This is a really nice rundown compared to me half reading and half sleeping through the long paper, thank you so much.

  • @fahadqurashi7103 · 4 years ago

    Excellent explanation, easy to understand and to the point 👌👌

  • @kaenovama · a year ago

    7 minutes in and I finally get what I didn't understand! Thank you!

  • @venkatalv7014 · 4 years ago

    very clear explanation, thanks for the video

  • @aleksandrbazanov3866 · 4 years ago

    Yannic is the best guy on the internet

  • @neilteng4161 · 3 years ago

    Thank you So Much!

  • @nenadsubat9489 · 4 months ago

    This is so enlightening!!!

  • @thepresistence5935 · 2 years ago

    It took me 2.20 hours to understand this, but it was worth it; I won't forget it anymore.

  • @srikanthkoraveni8210 · 5 years ago

    Thank you

  • @BSelm05 · 4 years ago

    Thank you for a very clear explanation. I wonder how many samples they draw for each sentence; I couldn't find it in the paper.

  • @AlphaMoury · 2 years ago

    Thank you man

  • @aj-kl7de · 3 years ago

    Thanks, you are doing god's work!

  • @aqibfayyaz1619 · 2 years ago

    Great effort.

  • @keerthanajaganathan · 4 years ago

    Thanks for the video - it is very helpful. Could you please make a video on Cross-lingual Language Model Pretraining (XLM)?

  • @supertramp_og · 4 years ago

    "Hmmmm " :P Great video.

  • @narendraparmar1631 · 3 years ago

    Thanks

  • @prabhikthapa4671 · 4 years ago

    Hi, could you also clarify why the embeddings are multiplied with the representation produced by the network in the equation 1 and 2 formulation? My understanding was that you could directly apply a softmax to the representation to train.

  • @RAZZKIRAN · 3 years ago

    Thank you.

  • @prateethnayak8422 · 3 years ago

    @12:40 is what the model is listening to! :D

  • @Rednivrug · 4 years ago

    In language modeling, autoregressive training predicts the next word using a window of previous words, and autoencoding predicts the missing words in a window of words. Aren't these two techniques the same ones we used to train word embeddings for Word2Vec, where CBOW (continuous bag of words) predicted the next word from the previous window of words, and the n-gram method predicted the missing word using the previous and next words? What's the difference? Am I missing something?

  • @YannicKilcher · 4 years ago

    The difference is that in autoregressive decoding you do it again and again in a sequence.
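
    A minimal sketch of that difference (not code from the video or the paper; score below is a hypothetical stand-in for a trained model's log-probability): autoregressive modeling predicts the next token again and again along the sequence, while BERT-style denoising autoencoding masks a few positions and predicts them from the full bidirectional context.

    import random

    def score(target, context):
        # Placeholder for a trained model's log p(target | context).
        return 0.0

    def ar_log_likelihood(tokens):
        # Autoregressive: predict each token from everything before it,
        # again and again, strictly left to right.
        return sum(score(tokens[t], tokens[:t]) for t in range(len(tokens)))

    def ae_log_likelihood(tokens, mask_rate=0.15):
        # Denoising autoencoding (BERT-style): corrupt a few positions with [MASK]
        # and predict each masked token from the remaining bidirectional context.
        masked = [i for i in range(len(tokens)) if random.random() < mask_rate]
        corrupted = ["[MASK]" if i in masked else tok for i, tok in enumerate(tokens)]
        return sum(score(tokens[i], corrupted) for i in masked)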

  • @RajeshSharma-bd5zo · 2 years ago

    Cool video!! Thanks for it. However, the voice quality was not that great, and there is clearly scope for improvement there.

  • @jingciwang587 · 4 years ago

    Now all my mind is like New Hmm is a Hmm, New York is a Hmm Hmm and Hmm~ Hmm~ Hmm~ Hmm~~~

  • @jwstolk · 4 years ago

    2 out of 5 words is closer to 40%

  • @emuccino · 3 years ago

    18:23 😳😂

  • @robinranabhat3125 · 5 years ago

    In this AI journey, I find that some explain the papers but leave behind the code, and some explain the code (hopelessly, though) and leave out the theory. Can't we have a paper explanation followed by an explanation of the code in TensorFlow or PyTorch? Or maybe everyone just knows the high-level overview and thus ignores that part, although it is greatly needed. Please upvote, guys.

  • @YannicKilcher · 5 years ago

    If I were to also review the code, the videos would be 2+ hours 😁 but thanks for the feedback, I will consider doing separate code reviews.

  • @robinranabhat3125 · 5 years ago

    @@YannicKilcher If you do code reviews as well, trust me, your channel will be one of its kind. Anyone sturdy enough to learn these papers would want to see the implementation details.

  • @abcdxx1059 · 4 years ago

    @@YannicKilcher damn you would do that for us 🤗🤗🤗

  • @tanny411 · 4 years ago

    I swear I'll sit through the 2+ hour videos. This channel is life!

  • @wongmikeho · 5 years ago

    Hmm..hmm...hmm...hmmm
