REALM: Retrieval-Augmented Language Model Pre-Training (Paper Explained)

Science & Technology

#ai #tech #science
Open-domain question answering is one of the most challenging tasks in NLP. When answering a question, the model is able to retrieve arbitrary documents from an indexed corpus to gather more information. REALM shows how masked language modeling (MLM) pretraining can be used to train a retriever for relevant documents in an end-to-end fashion, and it improves over the state of the art by a significant margin.
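REALM makes the retrieval step trainable by treating the retrieved document z as a latent variable and marginalizing over it: p(y|x) = Σ_z p(y|z, x) · p(z|x), where p(z|x) is a softmax over inner products of learned query and document embeddings. Below is a minimal PyTorch sketch of that marginalization; the function and variable names are illustrative, not taken from the paper's codebase.

```python
import torch
import torch.nn.functional as F

def realm_marginal_log_prob(query_emb, doc_embs, reader_log_probs):
    """log p(y|x) = log sum_z p(z|x) * p(y|z,x), marginalized over top-k docs.

    query_emb:        (d,)   embedding of the masked input x
    doc_embs:         (k, d) embeddings of the top-k retrieved documents z
    reader_log_probs: (k,)   log p(y | z, x) from the reader, one per document
    """
    # Retrieval distribution p(z|x): a softmax over inner products,
    # restricted to the k documents returned by the retrieval index.
    retrieval_log_probs = F.log_softmax(doc_embs @ query_emb, dim=0)  # (k,)
    # Marginalize in log space for numerical stability.
    return torch.logsumexp(retrieval_log_probs + reader_log_probs, dim=0)
```

Because the softmax is differentiable, the MLM loss backpropagates into both the reader and the retriever's embedding networks; only the choice of which k documents enter the sum is non-differentiable.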
OUTLINE:
0:00 - Introduction & Overview
4:30 - World Knowledge in Language Models
8:15 - Masked Language Modeling for Latent Document Retrieval
14:50 - Problem Formulation
17:30 - Knowledge Retriever Model using MIPS
23:50 - Question Answering Model
27:50 - Architecture Recap
29:55 - Analysis of the Loss Gradient
34:15 - Initialization using the Inverse Cloze Task
41:40 - Prohibiting Trivial Retrievals
44:05 - Null Document
45:00 - Salient Span Masking
50:15 - My Idea on Salient Span Masking
51:50 - Experimental Results and Ablations
57:30 - Concrete Example from the Model
Paper: arxiv.org/abs/2002.08909
Code: github.com/google-research/la...
My Video on GPT-3: • GPT-3: Language Models...
My Video on BERT: • BERT: Pre-training of ...
My Video on Word2Vec: • [Classic] Word2Vec: Di...
Abstract:
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts.
To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents.
We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
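Scoring every document in the corpus at every training step would be intractable, so REALM keeps a precomputed embedding for each passage and uses Maximum Inner Product Search (MIPS) to fetch the top-k candidates among millions of passages, periodically refreshing the index during training. A brute-force stand-in for that lookup, assuming a precomputed document embedding matrix (names are illustrative):

```python
import numpy as np

def retrieve_top_k(query_emb, doc_emb_matrix, k=5):
    """Exact maximum inner product search (a real MIPS index approximates this).

    query_emb:      (d,)          query embedding
    doc_emb_matrix: (num_docs, d) precomputed passage embeddings
    """
    scores = doc_emb_matrix @ query_emb        # inner product per passage
    top_k = np.argpartition(-scores, k)[:k]    # unordered indices of k best
    return top_k[np.argsort(-scores[top_k])]   # indices sorted best-first
```

At the scale of millions of passages, an approximate MIPS library (e.g., FAISS or ScaNN) replaces this exact scan.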
Authors: Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang
Links:
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: / discord
BitChute: www.bitchute.com/channel/yann...
Minds: www.minds.com/ykilcher
Parler: parler.com/profile/YannicKilcher
LinkedIn: / yannic-kilcher-488534136
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 32

  • @shaz7163 · 3 years ago

    Can you please do "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"?

  • @wentianbao5368 · 3 years ago

    Quite detailed and clear explanation. And I like the brief overview of the paper's idea given at the beginning of the video.

  • @selfhelp119 · 3 years ago

    Using marginalized probability is a good idea. Brilliant!

  • @jenishah9825 · 3 years ago

    This content here is gold!

  • @kan_drio · 9 months ago

    Excellent video man! Thank you so much!

  • @veedrac · 3 years ago

    I'd imagine REALM is pronounced like the word ‘realm’ (sounds like ‘relm’), given it seems to be a pun on the definition of realm, ‘a field or domain of activity or interest.’

  • @quebono100 · 3 years ago

    Love it :)

  • @DistortedV12 · 3 years ago

    This seems on the surface similar to the idea Yannic had in the GPT-3 ML Street talk video.

  • @vimostan269 · 3 years ago

    For "Salient span masking", "BERT-wwm" "SpanBERT" "ERNIE" "RoBERTa" all adopted mask-based modifications in improving BERT.

  • @LNJP13579 · 3 years ago

    Yannic, how do we get to know which research papers (RPs) are most relevant? Only a minuscule fraction of published RPs make an impact. In earlier comments I asked whether you could somehow map each RP video to its citations or something similar; that would be great. Otherwise it is difficult to choose among so many videos :)

  • @herp_derpingson · 3 years ago

    Check out Arxiv Sanity Preserver, made by Karpathy. It's intended to serve exactly this purpose: www.arxiv-sanity.com/

  • @MrAlextorex · 3 years ago

    Use www.arxiv-sanity.com/. The paper basically must be accepted to conferences and must have many citations.

  • @shayanroychoudhury9066 · 3 years ago

    Could you do a video on ORQA?

  • @moon-zm8mx · 2 years ago

    Thank you for sharing your clear paper explanation! Even so, one thing is still unclear to me, at 34:17. I think the equation means that the more relevant the document the retriever returns, the higher the r(z) value, and so the gradient goes up too. But in my understanding, it is natural that the more the model learns, the smaller the gradient should become. So I'm confused right now. Is my understanding wrong? Please help.

  • @bdennyw1 · 3 years ago

    Great explainer! One thing I'm not sure about: how are the three models connected? Is this end-to-end? How does the retrieval work in that case?

  • @YannicKilcher · 3 years ago

    Yes, this is end to end.

  • @bdennyw1 · 3 years ago

    @@YannicKilcher Thanks, I'll have to dig into the paper. The retrieval step doesn't seem like it's differentiable, so there is something I'm missing.

  • @user-nc5cq9yu2c · 2 years ago

    @@bdennyw1 I don't see how the retrieval step is differentiable, either😂
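The selection of the top-k documents is indeed non-differentiable, but gradients are only needed through the probability mass assigned to those documents: p(z|x) is a softmax over inner products, which is differentiable in the retriever's parameters, while the MIPS index is treated as fixed and only periodically refreshed during training. A toy sketch of where the gradient flows (all names and shapes illustrative):

```python
import torch
import torch.nn.functional as F

# Trainable query embedding (in REALM this comes from a BERT encoder).
query_emb = torch.randn(128, requires_grad=True)
doc_embs = torch.randn(8, 128)     # top-8 documents from the (frozen) index
reader_log_probs = torch.randn(8)  # log p(y|z,x) per retrieved document

log_p_z = F.log_softmax(doc_embs @ query_emb, dim=0)        # p(z|x): differentiable
loss = -torch.logsumexp(log_p_z + reader_log_probs, dim=0)  # -log p(y|x)
loss.backward()
print(query_emb.grad is not None)  # True: the retriever receives a training signal
```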

  • @SuperMotzfeldt · 3 years ago

    Can you do one with ColBERT as well?

  • @corgirun7892 · 1 year ago

    nice

  • @tarunpaparaju5382 · 3 years ago

    Hey Yannic! Great video! I really appreciate the work you are doing to make research more accessible to everyone! By the way, I don't see a 1080p (HD) option for this video. Is it possible to watch this video in 1080p? Thank you! :)

  • @YannicKilcher · 3 years ago

    Thanks for telling me, I didn't even see that. No idea why this happens

  • @tarunpaparaju5382 · 3 years ago

    @@YannicKilcher Thanks for your reply! Keep making awesome videos, it really helps me a lot :)

  • @abhilashnandy · 3 years ago

    Yannic, instead of using a neural retriever, wouldn't a probabilistic retrieval framework such as BM25 give a similar result?

  • @MrAlextorex · 3 years ago

    BM25 is too simplistic. BERT also has some understanding of the meaning of words in the context of a document.

  • @abhilashnandy · 3 years ago

    @@MrAlextorex I have seen BM25 augmented with T5-generated questions perform really well on MS MARCO.

  • @MrAlextorex · 3 years ago

    @@abhilashnandy MS MARCO is based on questions sampled from Bing searches, which are quite factual and simplistic.
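What makes BM25 "simplistic" is that it scores a document purely from lexical term statistics, with no notion of context or synonymy. For reference, a compact sketch of the standard Okapi BM25 score, assuming a precomputed index of document frequencies (names are illustrative):

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.5, b=0.75):
    """Okapi BM25: purely lexical scoring from term statistics, no embeddings."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        # Inverse document frequency: rarer terms count more.
        idf = math.log(1 + (num_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        # Saturating, length-normalized term frequency.
        norm = (tf[term] * (k1 + 1)) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_doc_len))
        score += idf * norm
    return score
```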

  • @ziquaftynny9285 · 3 years ago

    60 degrees or 300 degrees?

  • @sandeepunnikrishnan8806 · 11 months ago

    How would it be 300?

  • @Leon-pn6rb · 3 years ago

    What does marginalizing mean in this context?

  • @YannicKilcher · 3 years ago

    I'm not sure what context you mean, could you clarify?

  • @DistortedV12 · 3 years ago

    It is pronounced "relm". Ah, when knowing only English is actually helpful ;)
