Synthetic Data @ DLCT

Science & Technology

This is a talk delivered at the (usually not recorded) weekly journal club "Deep Learning: Classics and Trends" (mlcollective.org/dlct).
Speaker: Diganta Misra
Title: Synthetic Data: The New Frontier
Abstract: In real-world scenarios, extensive manual annotation for continual learning is impractical due to prohibitive costs. Although prior work, influenced by large-scale webly supervised training, suggests leveraging web-scraped data in continual learning, this poses challenges such as data imbalance, usage restrictions, and privacy concerns. Addressing the risks of continual webly supervised training, we present an online continual learning framework: Generative Name-only Continual Learning (G-NoCL). The proposed G-NoCL uses a set of generators G along with the learner. When encountering new concepts (i.e., classes), G-NoCL employs the novel sample-complexity-guided data ensembling technique DIverSity and COmplexity enhancing ensemBlER (DISCOBER) to optimally sample training data from the generated data. Through extensive experimentation, we demonstrate the superior performance of DISCOBER on G-NoCL online CL benchmarks, covering both In-Distribution (ID) and Out-of-Distribution (OOD) generalization evaluations, compared to naive generator ensembling, web-supervised, and manually annotated data.
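To make the generator-ensembling idea concrete, here is a minimal sketch of how one might allocate a sampling budget across several generators' output pools using a diversity proxy (mean pairwise distance between sample embeddings). This is a hypothetical illustration, not the paper's actual DISCOBER algorithm, and all function names (`pairwise_mean_distance`, `allocate_budget`, `ensemble_sample`) are assumptions; DISCOBER also incorporates sample complexity, which is omitted here for brevity.

```python
import random

def pairwise_mean_distance(points):
    """Mean Euclidean distance over all pairs: a crude diversity proxy."""
    n = len(points)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
            total += d
            pairs += 1
    return total / pairs

def allocate_budget(pools, budget):
    """Split a total sampling budget across generator pools,
    proportionally to each pool's diversity score."""
    scores = [pairwise_mean_distance(p) for p in pools]
    z = sum(scores) or 1.0
    alloc = [round(budget * s / z) for s in scores]
    # absorb rounding drift so the allocations sum to the budget
    alloc[0] += budget - sum(alloc)
    return alloc

def ensemble_sample(pools, budget, seed=0):
    """Draw the allocated number of samples from each pool."""
    rng = random.Random(seed)
    out = []
    for pool, k in zip(pools, allocate_budget(pools, budget)):
        out.extend(rng.sample(pool, min(k, len(pool))))
    return out
```

Under this toy scoring, a generator whose samples are spread widely in embedding space receives a larger share of the training budget than one producing near-duplicates.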
Speaker bio: Diganta is a UNIQUE Scholar Research MSc student at Mila, Montreal, supervised by Prof. Irina Rish within the CERC AAI Lab, and a Visiting Researcher at the Human Sensing Lab at Carnegie Mellon University, Pittsburgh. His current research interests span topics in constrained learning: sparsity, mixture of experts, lifelong learning, generative models, learning dynamics, and code generation. Diganta is also an incoming ELLIS PhD Fellow at the Max Planck Institute in Tübingen. Beyond research, he is an avid football, F1, and e-sports follower.
Paper link: arxiv.org/abs/2403.10853
