Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Science & Technology

This video explores T5, a large-scale study of transfer learning. The paper dissects many factors of the pre-train-then-fine-tune pipeline for NLP: auto-regressive language modeling vs. BERT-style masked language modeling and XLNet-style shuffling, the impact of dataset composition and size, and how best to use more computation. Thanks for watching, and please check out Machine Learning Street Talk, where Tim Scarfe, Yannic Kilcher and I discuss this paper!
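For anyone wondering how raw C4 text becomes a text-to-text training example, the span-corruption objective covered in the video (8:47) can be sketched in a few lines of Python. This is an illustrative sketch, not the official T5 code: the `span_corrupt` helper is hypothetical, the corrupted spans are fixed here for clarity whereas T5 samples them randomly over SentencePiece tokens, and the `<extra_id_N>` sentinel names follow the Hugging Face convention.

```python
def span_corrupt(tokens, spans):
    """Turn a raw token sequence into an (input, target) pair via
    T5-style span corruption. `spans` is a sorted list of non-overlapping
    (start, end) index pairs marking the tokens to drop. Each dropped
    span is replaced in the input by a unique sentinel token; the target
    lists each sentinel followed by the tokens it replaced."""
    inp, tgt = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:start])   # keep the uncorrupted tokens
        inp.append(sentinel)             # stand-in for the dropped span
        tgt.append(sentinel)
        tgt.extend(tokens[start:end])    # the model must reconstruct these
        prev = end
    inp.extend(tokens[prev:])
    tgt.append(f"<extra_id_{len(spans)}>")  # final sentinel terminates the target
    return " ".join(inp), " ".join(tgt)

# The example sentence from the T5 paper, dropping "for inviting" and "last":
tokens = "Thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(tokens, [(2, 4), (8, 9)])
print(inp)  # Thank you <extra_id_0> me to your party <extra_id_1> week
print(tgt)  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The task-specific datasets (SQuAD etc.) are cast into the same text-in, text-out format with task prefixes, which is what lets one model train on all of them.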
Machine Learning Street Talk: / @machinelearningstreet...
Paper Links:
T5: arxiv.org/abs/1910.10683
Google AI Blog Post on T5: ai.googleblog.com/2020/02/exp...
Train Large, Then Compress: arxiv.org/pdf/2002.11794.pdf
Scaling Laws for Neural Language Models: arxiv.org/pdf/2001.08361.pdf
The Illustrated Transformer: jalammar.github.io/illustrated...
ELECTRA: arxiv.org/pdf/2003.10555.pdf
Transformer-XL: arxiv.org/pdf/1901.02860.pdf
Reformer: The Efficient Transformer: openreview.net/pdf?id=rkgNKkHtvB
The Evolved Transformer: arxiv.org/pdf/1901.11117.pdf
DistilBERT: arxiv.org/pdf/1910.01108.pdf
How to generate text (HIGHLY RECOMMEND): huggingface.co/blog/how-to-ge...
Tokenizers: blog.floydhub.com/tokenizatio...
Thanks for watching! Please Subscribe!

Comments: 27

  • @connorshorten6311 · 4 years ago

    2:00 Pushing the NLP State-of-the-Art 2:40 Text-to-Text Framework 3:28 Factors of Variation Explored 5:00 Value of Pre-Training 5:25 Attention Masking 6:18 Architecture Results 7:02 Denoising Objectives 8:47 Span Corruption Strategy 9:45 Self-Supervised Learning Study Overview 11:14 Datasets 12:24 Dataset Size 12:56 Fine-Tuning Strategy 14:25 Task Imbalance 15:20 Pre-Train, then Fine-Tune 16:26 How should we use extra computation? 18:47 Scaling up to 11B parameters 19:30 What Didn’t Make the List 22:08 Context-Free Question Answering

  • @vatsalkrishna5627 · 1 year ago

    I never expected to learn so much from one single video. Amazing work presenting the paper in such a nuanced way!

  • @emanuelgerber · 3 years ago

    Thank you! This helped me a lot to understand all the different aspects of T5

  • @connorshorten6311 · 3 years ago

    Thanks Emanuel, really glad to hear that!

  • @SantoshGupta-jn1wn · 4 years ago

    These videos are amazing, thanks Henry

  • @BiancaAguglia · 4 years ago

    You're getting better and better at explaining these papers, Connor. Great job. Also, I enjoyed the conversation on the Machine Learning Street Talk channel. Looking forward to seeing more videos there too. 😊 I've decided to start studying NLP in a more organized manner (right now I have some intuition about how it works, but not much theoretical or practical knowledge.) I'll be watching your NLP videos when I need a productive break from my studies. 😊 P.S. I'm embarrassed to admit that only today I found out your first name was Connor. For some reason I thought it was Henry.

  • @---kt8cs · 3 years ago

    Thank you, sir, your videos are gold!

  • @MakerBen · 3 years ago

    Thanks for posting this! This is super helpful!

  • @connorshorten6311 · 3 years ago

    Thank you so much! Glad you found this useful!

  • @TimScarfe · 4 years ago

    Amazing job Connor!

  • @connorshorten6311 · 4 years ago

    Thanks Tim!

  • @taku8751 · 3 years ago

    amazing

  • @L33TNINJA51 · 4 years ago

    A little hard to follow as someone who hasn't learned much about AI, but I still enjoy your videos!

  • @hoangnhatpham8076 · 4 years ago

    The target audience of these kinds of videos isn't supposed to be someone who hasn't learned much about AI anyway.

  • @L33TNINJA51 · 4 years ago

    @@hoangnhatpham8076 I guess I just need to get to learning and stop being just a fanboy :)

  • @bikideka7880 · 3 years ago

    I think the videos are hard to follow for someone who doesn't regularly read research papers (like me).

  • @dislike__button · 2 years ago

    I still don't understand how they combined training on the C4 dataset with all the task-specific datasets (SQuAD etc.). What role did the C4 dataset play? How did they turn the raw text of C4 into an input-output task to train on? Would be grateful if someone could explain, thanks.

  • @heinsaar · 4 years ago

    Thanks for sharing! It would be wonderful if you could get a better mic though. The laptop mic has a very unpleasant echo.

  • @justinmilner8 · 1 year ago

    Is 'deshuffling' really an accurate description of the XLNet pre-training objective? To me, deshuffling implies predicting the order of tokens within the text, which doesn't match my understanding of XLNet's pre-training objective.

  • @justinmilner8 · 1 year ago

    Yes, I can confirm now that the deshuffling objective referred to in the T5 paper is not XLNet's permutation masking objective. (In the T5 paper, deshuffling is cited to SummAE.)

  • @tommykelly6840 · 1 year ago

    What is the difference between i.i.d. mask tokens and BERT-style mask tokens?

  • @forcedlevy · 3 years ago

    Watch at 0.75 speed

  • @salimbo4577 · 3 years ago

    How much time does it take you guys to read a research paper, and which parts do you read? Every time I try to read one I start losing focus. Any tips? Please help.

  • @zeinramadan · 3 years ago

    Read the paper in three passes:
    1) First pass: read the title, abstract, and figures.
    2) Second pass: read the introduction and conclusion, take another pass through the figures, and scan the rest of the content. The introduction and conclusion contain clear, concise information on the content of the paper and a summary of its findings; supplementary details are left out and only the key information is included, which gives you what you need to proceed to the other sections.
    3) Third pass: read all the sections of the paper, but skip any complicated math or technique formulations that might be alien to you. During this pass you can also skip any terms and terminology you don't understand or aren't familiar with.
    Check out this article by Andrew Ng about how to efficiently read papers: towardsdatascience.com/how-you-should-read-research-papers-according-to-andrew-ng-stanford-deep-learning-lectures-98ecbd3ccfb3

  • @rohitghule9437 · 4 years ago

    Why so fast?
