
PaLM (Pathways Language Model) explained | 540 billion parameters can explain jokes!?

It’s time to explain PaLM, Google AI’s Pathways Language Model, in a coffee break! ☕ Are you ready for this dense 540-billion-parameter model to explain jokes to you? And what about chain-of-thought reasoning? Or any other cool NLP task, like the crazy ones listed in BIG-bench?
SPONSOR: Weights & Biases 👉 wandb.me/ai-co...
📺 Diffusion models and GLIDE explained: • Diffusion models expla...
Check out our daily #MachineLearning Quiz Questions: / aicoffeebreak
➡️ AI Coffee Break Merch! 🛍️ aicoffeebreak....
Paper 📜: Chowdhery, Aakanksha, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham et al. "PaLM: Scaling Language Modeling with Pathways." arXiv preprint arXiv:2204.02311 (2022). arxiv.org/abs/...
🔗 PaLM blog: ai.googleblog....
🔗 Large language models are slightly boring: / 1511642282788896769
Outline:
00:00 DALL-E 2 or PaLM?
01:14 Weights & Biases (Sponsor)
02:25 A brief history of boring large language models
03:43 What is PaLM?
05:11 Training PaLM on all TPUs
08:11 PaLM training data
08:49 What it can do
10:31 Few-shot learning explained
13:20 Explaining jokes and Outlook
Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Don Rosenthal, Dres. Trost GbR, banana.dev -- Kyle Morris, Julián Salazar, Edvard Grødem, Vignesh Valliappan, Kevin Tsai, Mike Ton
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: / aicoffeebreak
Ko-fi: ko-fi.com/aico...
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: / aicoffeebreak
Twitter: / aicoffeebreak
Reddit: / aicoffeebreak
KZread: / aicoffeebreak
#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research​
Music 🎵 : That's What It Takes (Instrumental) - NEFFEX
✍️ Arabic Subtitles by Ali Haidar Ahmad / ali-ahmad-0706a51bb.

Comments: 39

  • @Ma2rten (2 years ago)

    I am a coauthor of the PaLM paper. Thanks for choosing to cover it!

  • @AICoffeeBreak (2 years ago)

    Thanks for the visit! And congrats on the cool work. 👏 I'm eager to see what you have lined up next.

  • @michaelfischer841 (2 years ago)

    Thank you for your brilliant work.

  • @michaelfischer841 (2 years ago)

    When you are training these things, are you also using the contents of university-level reference materials in PDF format, which can be converted to text with pdftotext on the command line?
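
(A minimal sketch of the kind of PDF-to-text conversion mentioned above, assuming poppler's pdftotext binary is installed; the folder and file names are only placeholders, not anything from the paper.)

import subprocess
from pathlib import Path

# Convert every PDF in a folder to plain text with poppler's pdftotext.
# "-layout" tries to preserve the original column layout of each page.
for pdf in Path("reference_pdfs").glob("*.pdf"):
    txt = pdf.with_suffix(".txt")
    subprocess.run(["pdftotext", "-layout", str(pdf), str(txt)], check=True)
    print(f"converted {pdf.name} -> {txt.name}")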

  • @sabofx (2 years ago)

    @Maarten Bosma: I've viewed several videos on PaLM like this one and one by Dr Alan D Thompson. Is there any way that I could have a conversation/chat with the PaLM AI? I would love to test its reasoning capabilities myself. Is there somewhere where I can sign up for access? Looking forward to your reply! Cheers, Joost.

  • @Mutual_Information (2 years ago)

    I'm glad you chose PaLM. It felt like DALL-E was sucking up all the attention while PaLM was doing some seriously impressive things we haven't seen before. Very nice video, as always :)

  • @anthonyrepetto3474 (2 years ago)

    In regard to PaLM developing certain capabilities only once it reaches a threshold: we now know that even random graphs of sufficient size and connectivity undergo a 'phase change' into states of higher order, as explained in Quanta's recent article, "Elegant Six-Page Proof Reveals the Emergence of Random Structure". So even though the model is not an innovation, it does provide a potential insight: making models bigger can cross *thresholds* into sudden new abilities!

  • @bazejmarciniak5682 (1 year ago)

    Your channel is a gem! Thanks for your great work!

  • @fedelopez77 (1 year ago)

    "Few-shot learning, as we see it from GPT-3 onwards, is just glorified pattern completion" --> Standing ovation, just awesome

  • @iliemihai949 (2 years ago)

    Very cool, Letitia, one of the best channels to follow for NLP. In the coming months we will release a GPT2-780M model for Romanian, trained on 40 GB of text.

  • @AICoffeeBreak (2 years ago)

    Wow, I can't wait to see it. 👀

  • @HoriaNeagu (2 years ago)

    Hi! Did this project ever materialize?

  • @hannesstark5024 (2 years ago)

    Fantastic! Thank you for this summary which prevents me from having to read slightly boring papers :7

  • @michaelfischer841 (2 years ago)

    Your commentary and insight are top notch.

  • @JuliusSmith (2 years ago)

    I have to watch all your videos now! Your style is perfect for me - thanks for making them!

  • @AICoffeeBreak (2 years ago)

    Glad you found us! 😁

  • @mrshankj5101 (2 years ago)

    I don't think AI language models are boring! PaLM and GPT-3 are awesome!

  • @JuliusSmith (2 years ago)

    Maybe "few shot orientation" would be a better term

  • @AICoffeeBreak (2 years ago)

    🤣

  • @JM-ln2zm (1 year ago)

    Great video, Letitia! I have a question. So PaLM was trained on 6100 TPUs. Let's say you created a language translator using PaLM; in order for me to use this newly created language translator, do I still need access to the 6100 TPUs, or can it be run on fewer TPUs once the model has been trained?

  • @AICoffeeBreak (1 year ago)

    Hi, thanks for the question. Maybe someone knows this more thoroughly than I do, but no: the parallelization across more than 6k TPUs is for speeding up training and for storing the gradients. For inference, they do not need the gradients; they just need to load the parameters. Because there are so many of them, inference surely still needs more than one TPU, since it requires so much memory. If you are happy to wait a bit (I do not know how long "a bit" is for such enormous models), you could even load the model on a CPU with enough RAM for inference. 😅
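
(A back-of-the-envelope sketch of the memory argument in the reply above, assuming bfloat16 weights and roughly 32 GB of memory per accelerator; activations and other overhead are ignored, so this is only an illustrative lower bound, not a figure from the paper.)

# Rough memory estimate for serving a 540B-parameter model.
params = 540e9
bytes_per_param = 2                        # assumption: bfloat16 weights
weights_gb = params * bytes_per_param / 1e9
chips_needed = weights_gb / 32             # assumption: ~32 GB usable per chip

print(f"weights alone: ~{weights_gb:.0f} GB")                 # ~1080 GB
print(f"memory-only lower bound: ~{chips_needed:.0f} chips")  # ~34 chips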

  • @tildarusso (2 years ago)

    Nice wrap-up. As you said, it is XXXL-sized but nothing new, boring as usual imho. Thank you for saving a lot of people the 87-page reading time!

  • @federicolusiani7753 (2 years ago)

    Thank you so much for these videos!! The quality of the explanations and insights you provide is unmatched.

  • @AICoffeeBreak (2 years ago)

    Thanks, so nice of you! :)

  • @Skinishh (2 years ago)

    Thank you for the great video, as always! I wonder why these large LMs are all decoder-only like GPT and not encoder-decoder like T5? 🤔

  • @Skinishh (2 years ago)

    Answering my own question: these kinds of models are only interested in generating the next piece of text, not in fine-tuning tasks or mask completion like T5. Therefore, only a decoder is needed for text generation.

  • @Ma2rten (2 years ago)

    @Skinishh Google Research has also done work on large encoder-decoder models, most recently ST-MoE-32B. Decoder-only models tend to work best for open-ended text generation and few-shot prompting; encoder-decoder models work best for classification and closed-ended text generation (e.g. machine translation).
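
(A toy illustration of the distinction discussed in this thread: a decoder-only model applies one causal attention mask over the whole sequence, while an encoder-decoder model combines a bidirectional encoder mask with a causal decoder mask plus cross-attention. NumPy sketch for intuition only, not code from the paper.)

import numpy as np

def causal_mask(n):
    # Decoder-only: token i may attend only to tokens 0..i (lower triangle).
    return np.tril(np.ones((n, n), dtype=bool))

def encoder_decoder_masks(n_src, n_tgt):
    # Encoder: every source token sees every other source token.
    enc_self = np.ones((n_src, n_src), dtype=bool)
    # Decoder: causal over the target tokens...
    dec_self = np.tril(np.ones((n_tgt, n_tgt), dtype=bool))
    # ...plus full cross-attention from target to all source tokens.
    cross = np.ones((n_tgt, n_src), dtype=bool)
    return enc_self, dec_self, cross

print(causal_mask(4).astype(int))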

  • @DerPylz (2 years ago)

    I wonder what the output would be without the "few-shots", so not giving the 2 examples of correctly solved tasks before the prompt. Do you think there would be no answer at all, or just a very bad one?

  • @odysseashlap (2 years ago)

    There would be an irrelevant answer

  • @scottpulver (2 years ago)

    Irrelevant, followed by one perfect answer.

  • @Abdulazizab2 (2 years ago)

    Check out the GPT-3 paper "Language Models are Few-Shot Learners": they evaluate 'few-shot' as well as 'zero-shot', where you don't provide any solved examples for a given task. For some tasks zero-shot does well, while for other tasks the model needs to be driven by at least one example, i.e. 'one-shot'.
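
(To make the zero-shot vs. few-shot distinction concrete, here is an illustrative prompt pair loosely following the translation example from the GPT-3 paper; the model is expected to complete the last line.)

# Few-shot: the two solved examples are the "shots" the model can imitate.
few_shot_prompt = """Translate English to French.
sea otter => loutre de mer
cheese => fromage
peppermint =>"""

# Zero-shot: the task description only, no solved examples.
zero_shot_prompt = """Translate English to French.
peppermint =>"""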

  • @jeanpicard1844 (2 years ago)

    I’m confused as to what you mean by toxicity, and why or how the model is being toxic. Is there an example of something you can point me to? Maybe I’m just missing the meaning of a term as it is used in the AI/language space.

  • @AICoffeeBreak (2 years ago)

    Maybe you can read more about it here. 🔗 GPT-3 examples of toxic behaviour: venturebeat.com/2022/01/27/openai-rolls-out-new-text-generating-models-that-it-claims-are-less-toxic/

  • @micknamens8659 (1 year ago)

    "toxic" means it's an unwanted fact - i.e. denied & forbidden by cancel culture

  • @wilfredomartel7781 (1 year ago)

    How can I test it?

  • @AICoffeeBreak (1 year ago)

    Sadly, PaLM was not made available to the public. We can read about it in the paper.

  • @chriswondyrland73 (2 years ago)

    ... Is she real, or an AI, or just imitating an AI?!

  • @Micetticat (2 years ago)

    "Boring models" with new exciting hardware tricks!

  • @diegorusso2315 (1 year ago)

    How can I try this AI?