L19.5.2.5 GPT-v3: Language Models are Few-Shot Learners

Science and technology

Slides: sebastianraschka.com/pdf/lect...
-------
This video is part of my Introduction to Deep Learning course.
Next video: • L19.5.2.6 BART: Combi...
The complete playlist: • Intro to Deep Learning...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka

Comments: 4

  • @niranjansitapure4740 (1 year ago)

    This series has been super informative to get a semi-deep dive into the concepts of different transformer and NLP models. Super helpful!

  • @jonathansum9084 (3 years ago)

    Hi. Thank you for uploading the video. I have a question: during the zero-shot, one-shot, or few-shot "fine-tuning" of GPT, does GPT perform any gradient updates? If not, does that mean this fine-tuning doesn't actually learn anything and is only a way of testing the model's performance?

  • @SebastianRaschka (3 years ago)
    Good question. I think you are referring to the few-shot examples during testing/inference? I am 99% sure that GPT-3 does not do any gradient descent updates when it uses the examples in the context.
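
    To make the "no gradient updates" point concrete, below is a minimal, illustrative Python sketch of how zero-shot and few-shot prompts can be assembled purely as text that a frozen model conditions on at inference time. The `build_prompt` helper and the commented-out `query_frozen_model` call are hypothetical stand-ins, not the actual GPT-3 API, and the English-to-French pairs loosely mirror the illustration in the GPT-3 paper.

    # Minimal sketch: zero-/one-/few-shot "learning" in GPT-3 is just prompt
    # construction. The examples are placed in the context and the frozen model
    # conditions on them at inference time; no parameters are updated.
    # `build_prompt` and `query_frozen_model` are hypothetical names, and the
    # English-to-French pairs are illustrative only.

    def build_prompt(task_description, examples, query):
        """Concatenate a task description, k in-context examples, and the query."""
        lines = [task_description]
        for source, target in examples:   # k = 0 -> zero-shot, k = 1 -> one-shot, ...
            lines.append(f"{source} => {target}")
        lines.append(f"{query} =>")       # the model completes the text after '=>'
        return "\n".join(lines)

    pairs = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

    zero_shot_prompt = build_prompt("Translate English to French:", [], "peppermint")
    few_shot_prompt = build_prompt("Translate English to French:", pairs, "peppermint")

    print(zero_shot_prompt)
    print("---")
    print(few_shot_prompt)

    # Both prompts would be sent to the same frozen model, e.g.
    # answer = query_frozen_model(few_shot_prompt)   # placeholder; no gradient updates

    In other words, the only difference between the zero-shot and few-shot settings is what text appears in the context window; the model itself is identical in both cases.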

  • @davidlearnforus (1 year ago)

    Thanks for the great video! I struggle to understand how 0-, 1-, or few-shot training (or is it just testing?) lets the model make better predictions in the few-shot setting versus zero-shot if the weights are not updated. The only mechanism I can think of would be somehow combining a backpropagated gradient with every new shot. Otherwise, how would the 0-shot and few-shot settings even be comparable? Wouldn't everything just be 0-shot?
