Self-generated data @ DLCT


This is a talk delivered at the (usually not recorded) weekly journal club "Deep Learning: Classics and Trends" (mlcollective.org/dlct).
Speaker: Rishabh Agarwal
Title: Improving LLMs using self-generated data
Abstract: This talk is about some of our recent work on improving LLMs using their self-generated data with access to external feedback. I cover how we can go beyond human data on problem-solving tasks (math, coding) via a simple self-improvement approach based on expectation-maximization-based RL, and how training verifiers on self-generated data can augment this self-improvement approach. Finally, I discuss an in-context learning extension of this self-improvement approach.
Speaker bio: Rishabh Agarwal is a research scientist on the Google DeepMind team in Montréal. Rishabh finished his PhD at Mila under the guidance of Aaron Courville and Marc Bellemare. Previously, Rishabh spent a year on Geoffrey Hinton's team at Google Brain, Toronto. Earlier, Rishabh graduated in Computer Science and Engineering from IIT Bombay. Rishabh's research mainly revolves around deep reinforcement learning (RL), often with the goal of making RL methods suitable for real-world problems, and includes an outstanding paper award at NeurIPS.
Paper links:
arxiv.org/abs/2312.06585
arxiv.org/abs/2402.06457
arxiv.org/abs/2404.11018
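
The EM-style self-improvement loop the abstract describes (sample solutions, keep those that pass external feedback, fine-tune on the kept data, repeat) can be sketched roughly as follows. This is a toy illustration under assumptions, not the papers' actual setup: `ToyModel`, `check`, and `self_improve` are hypothetical stand-ins, with arithmetic problems standing in for math/coding tasks and an answer checker standing in for external feedback.

```python
import random

class ToyModel:
    """Toy stand-in for an LLM: answers addition problems, sometimes wrongly."""
    def __init__(self, accuracy=0.3):
        self.accuracy = accuracy  # chance of emitting a correct answer

    def generate(self, problem):
        a, b = problem
        if random.random() < self.accuracy:
            return a + b                          # correct solution
        return a + b + random.randint(1, 5)       # plausible-looking mistake

    def finetune(self, dataset):
        # Toy proxy for fine-tuning: verified data nudges accuracy upward.
        boost = min(0.2, 0.05 * len(dataset))
        return ToyModel(min(1.0, self.accuracy + boost))

def check(problem, solution):
    """External feedback, e.g. an answer checker (math) or unit tests (code)."""
    a, b = problem
    return solution == a + b

def self_improve(model, problems, iterations=3, samples_per_problem=4):
    for _ in range(iterations):
        # E-step: sample candidate solutions; keep only verified ones.
        dataset = [(p, s)
                   for p in problems
                   for s in (model.generate(p) for _ in range(samples_per_problem))
                   if check(p, s)]
        # M-step: fine-tune on the self-generated, verified data.
        model = model.finetune(dataset)
    return model

random.seed(0)
problems = [(1, 2), (3, 4), (5, 6)]
improved = self_improve(ToyModel(), problems)
print(improved.accuracy >= 0.3)
```

The key design point mirrored here is that only externally verified samples enter the fine-tuning set, so the model is never trained on its own unchecked mistakes.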
