Improving Reasoning in Language Models with LASER: Layer-Selective Rank Reduction

Science and Technology

Microsoft Research Forum, January 30, 2024
Dipendra Misra, Senior Researcher at Microsoft Research New York City and AI Frontiers, gives a lightning talk at the Microsoft Research Forum.
See more at aka.ms/ResearchForum-Jan2024

Comments: 8

  • @MicrosoftResearch · 2 months ago

    Join us for a continuous exchange of ideas about research in the era of general AI. *Register for the Microsoft Research Forum series:* aka.ms/ForumRegYT

  • @wolpumba4099 · 4 months ago

    *Summary*

    *Introduction to LASER Method*
    - 00:17 Introduction of the new method called LASER (Layer-Selective Rank Reduction) for improving pre-trained large language models (LLMs).
    - 00:26 Discussion of the impact of LLMs on machine learning and of how incompletely they are understood.
    - 00:34 Explanation that LLMs are trained on diverse internet-collected data using transformer architectures.
    - 00:49 Suggestion that understanding LLMs could be aided by intervening in the model and observing how performance changes.

    *Explaining the LASER Intervention*
    - 01:15 Presentation of LASER, which involves selecting a weight matrix of an LLM and replacing it with a low-rank approximation.
    - 01:32 Description of the transformer architecture as a stack of transformer blocks, each containing several weight matrices.
    - 01:48 Selection of a specific weight matrix to illustrate the LASER process.
    - 02:01 Description of the singular value decomposition (SVD) used for the low-rank approximation and the computational efficiency of the process.
    - 02:39 Identification of the three choices in a LASER intervention: which layer to select, which weight matrix to edit, and how much to approximate.

    *Benefits of LASER and Its Evaluation*
    - 03:03 Mention of LASER's ability to reduce the memory footprint of LLMs, enabling broader accessibility.
    - 03:24 Evaluation of LASER on the GPT-J LLM using the CounterFact question-answering dataset to assess the model's robustness to paraphrases.
    - 03:55 Observation that, contrary to expectations, the right LASER intervention can decrease model loss instead of increasing it, indicating an improvement in the pre-trained LLM.
    - 04:24 Analysis of applying LASER to different layers of an LLM: more approximation increases loss in earlier layers but decreases it in later layers.

    *Generalizability and Analysis of LASER's Effects*
    - 04:52 Confirmation that these surprising results hold across multiple tasks and different LLMs such as RoBERTa, GPT-J, and Llama 2.
    - 05:02 Reports of significant performance improvements on certain tasks following the LASER intervention.
    - 05:20 Brief mention of additional analyses, including that the gains come from rare data points in the training data and that LASER removes erroneous responses, likening it to a denoising process.

    *Conclusion*
    - 05:50 Summary of LASER as an intervention method for LLMs that can increase accuracy while reducing memory usage.
    - 06:07 Pointer to the paper on arXiv and its upcoming presentation at the ICLR conference.
    - 06:16 Closing of the talk.

    Disclaimer: I used gpt4-1106 to summarize the video transcript. This method may make mistakes in recognizing words.
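
    To make the rank-reduction step above concrete, here is a minimal PyTorch sketch of the kind of intervention the talk describes; the function name `low_rank_approx` and the keep-fraction `rho` are illustrative choices, not the authors' code.

    ```python
    # Minimal sketch of a LASER-style edit of one weight matrix, assuming
    # PyTorch. `low_rank_approx` and `rho` are illustrative names.
    import torch

    def low_rank_approx(W: torch.Tensor, rho: float) -> torch.Tensor:
        """Return the best rank-k approximation of W, k = rho * full rank."""
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)  # thin SVD
        k = max(1, int(rho * S.numel()))                     # values kept
        return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

    # Stand-in for one MLP weight matrix of a transformer block.
    W = torch.randn(4096, 1024)
    W_reduced = low_rank_approx(W, rho=0.1)  # keep top 10% of singular values
    ```

    Storing the factors U[:, :k], S[:k], and Vh[:k, :] in place of the dense matrix is what yields the memory savings mentioned at 03:03.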

  • @hussienalsafi1149 · 3 months ago

    ❤️❤️❤️❤️❤️❤️❤️

  • @naniSinek · 3 months ago

    How did you cross-validate that the later layers, where the improvements happen, aren't just a sign that the original model had too many layers, i.e., that training with fewer layers would have improved the loss in the first place?

  • @PicaPauDiablo1 · 4 months ago

    This is awesome. Thank you

  • @khangvutien2538 · 4 months ago

    Thank you very much. Now I understand the underlying principles of LASER. Am I correct that you compute the eigenvectors and disregard the ones with low eigenvalues? Then it's similar to the Principal Component Analysis of my youth, and to JPEG compression. No surprise that the answer accuracy improves, since there is less noise. Now the question is: do you have guidance as to which inner matrix to choose for rank reduction? The middle one, for example layer 45 of 90? The upstream third, for example layer 30 of 90? Or layer 60 of 90?
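
    The PCA analogy checks out numerically: the singular values of W are the square roots of the eigenvalues of WᵀW, so discarding small singular values applies the same spectral cut-off that PCA makes. A small NumPy check (illustrative only):

    ```python
    # Verify that the squared singular values of W equal the eigenvalues
    # of W.T @ W, i.e., SVD truncation and PCA share a spectral cut-off.
    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 5))

    _, s, _ = np.linalg.svd(W, full_matrices=False)
    eigvals = np.linalg.eigvalsh(W.T @ W)[::-1]  # eigenvalues, descending
    print(np.allclose(s ** 2, eigvals))          # True
    ```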

  • @mattanimation · 4 months ago

    shouldn't it be LASERR?

  • @marshallmcluhan33 · 4 months ago

    What about Bagel and SLERP?