Augmented Language Models @ DLCT

Science & Technology

This is a talk delivered at the (usually not recorded) weekly journal club "Deep Learning: Classics and Trends" (mlcollective.org/dlct/).
Speaker: Gargi Balasubramaniam
Title: Augmented Language Models: A Survey
Abstract: This talk focuses on recent work in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks, while the latter consists of calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demonstrations. While adhering to a standard missing-token prediction objective, such augmented LMs can use various, possibly non-parametric, external modules to expand their context-processing ability, thus departing from the pure language modeling paradigm; they are referred to as Augmented Language Models (ALMs). The missing-token objective allows ALMs to learn to reason, use tools, and even act, while still performing standard natural language tasks and even outperforming most regular LMs on several benchmarks.
Speaker's bio: Gargi Balasubramaniam is a 2nd year MS student at UIUC, advised by Han Zhao. Her research focuses on reliable machine learning through generalization and robustness.
Paper link: arxiv.org/abs/2302.07842
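The tool-use pattern the abstract describes can be sketched in a few lines: the LM emits a special call token in its output, an external module executes it, and the result is spliced back into the text. The `[CALC(...)]` syntax and the `fake_lm` stand-in below are illustrative assumptions, not the survey's actual interface.

```python
import re

# Matches a hypothetical tool-call token of the form [CALC(expression)].
CALL = re.compile(r"\[CALC\((.*?)\)\]")

def fake_lm(prompt: str) -> str:
    """Stand-in for a language model that has learned to emit tool calls."""
    return "The total is [CALC(12*7)]."

def run_with_tools(prompt: str) -> str:
    """Run the LM, then replace each tool call with the external module's output."""
    text = fake_lm(prompt)
    # Here a bare arithmetic eval (with builtins disabled) stands in for an
    # external code interpreter; a real ALM would dispatch to a richer module.
    return CALL.sub(lambda m: str(eval(m.group(1), {"__builtins__": {}})), text)

print(run_with_tools("What is 12 times 7?"))  # -> The total is 84.
```

The same loop generalizes: swap the calculator for a retriever, search engine, or code interpreter, and the LM only needs to learn when to emit the call token.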

Comments: 1

  • @captainamerica6031
    A year ago

    This video is really helpful. I expect more videos to come.
