Git Re-Basin @ DLCT

Science & Technology

This is a talk delivered at the (usually not recorded) weekly journal club "Deep Learning: Classics and Trends" (mlcollective.org/dlct/).
Speaker: Samuel Ainsworth
Title: Git Re-Basin: Merging Models modulo Permutation Symmetries
Abstract: The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. (2021). We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity. Finally, we discuss shortcomings of the linear mode connectivity hypothesis, including a counterexample to the single basin theory.
Speaker bio: Samuel Ainsworth is a Senior Research Scientist at Cruise AI Research, where he studies imitation learning, robustness, and efficiency. He completed his undergraduate degree in Computer Science and Applied Mathematics at Brown University and received his PhD from the School of Computer Science and Engineering at the University of Washington. His research interests span reinforcement learning, deep learning, programming languages, and drug discovery. He has previously worked on recommender systems, Bayesian optimization, and variational inference at organizations such as The New York Times and Google.
Paper link: arxiv.org/abs/2209.04836
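
For anyone curious what the "permute, then merge" step described in the abstract might look like in practice, below is a minimal sketch for a toy one-hidden-layer MLP. It is illustrative only and is not the authors' released implementation (the paper's algorithms also handle deep networks and other architectures); the function names align_hidden_units and merge are made up for this example. The idea: match B's hidden units to A's by solving a linear assignment problem over weight similarities, permute B accordingly, then average the two weight vectors.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def align_hidden_units(params_a, params_b):
        """Permute model B's hidden units to best match model A's.

        Each params_* is (W1, b1, W2) for an MLP with one hidden layer:
        W1 has shape (hidden, in), b1 has shape (hidden,), W2 has shape (out, hidden).
        """
        W1a, b1a, W2a = params_a
        W1b, b1b, W2b = params_b

        # Similarity between hidden unit i of A and hidden unit j of B,
        # measured by agreement of their incoming and outgoing weights.
        sim = W1a @ W1b.T + np.outer(b1a, b1b) + W2a.T @ W2b

        # Best one-to-one matching of units (linear assignment problem);
        # negate because linear_sum_assignment minimizes total cost.
        _, perm = linear_sum_assignment(-sim)

        # Apply the permutation: rows of W1 and b1, columns of W2.
        return W1b[perm], b1b[perm], W2b[:, perm]

    def merge(params_a, params_b, lam=0.5):
        """Interpolate the two models in weight space after aligning B to A."""
        aligned_b = align_hidden_units(params_a, params_b)
        return [lam * wa + (1 - lam) * wb for wa, wb in zip(params_a, aligned_b)]

    # Purely illustrative usage with random weights:
    # rng = np.random.default_rng(0)
    # A = (rng.normal(size=(64, 32)), rng.normal(size=(64,)), rng.normal(size=(10, 64)))
    # B = (rng.normal(size=(64, 32)), rng.normal(size=(64,)), rng.normal(size=(10, 64)))
    # merged = merge(A, B)

The "zero-barrier linear mode connectivity" mentioned in the abstract refers to sweeping lam from 0 to 1 after such an alignment and finding that the loss along the straight line between the two weight vectors stays roughly as low as at the endpoints.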

Comments: 2

  • @ibraheemmoosa · a year ago

    This was a great talk! I missed the live talk. Thanks for recording this one.

  • @kazz811 · a year ago

    Great talk! One point: the argument for why lambda appears to sit at 0.5 doesn't seem quite right. Since these model pairs come from random seeds, all you can expect is that the distribution of lambda (over many seeds) is peaked at 0.5; it doesn't follow by symmetry that any particular pair lands at exactly 0.5. That seems to warrant an explanation.
