Many-Shot VISUAL ICL is amazing! (Stanford)

Science & Technology

Many-shot visual in-context learning (ICL) is amazing! Especially when working with ICL+ at a 1 million token context length, as with Gemini 1.5 Pro; GPT-4o has also been tested.
An amazing alternative to fine-tuning VLMs and LLMs.
A new study by Stanford University shows the potential of new long-context VLMs, including for visual information (images). Tests include up to 1,000 images in a single prompt, with batched queries, and the models perform!
Multimodal many-shot in-context learning is tested at extreme context lengths (1 million tokens and more), across the complete length of the prompt.
The study establishes that multimodal foundation models can effectively leverage many-shot ICL, showing substantial performance gains and efficiency improvements. This paves the way for enhanced adaptability and accessibility of large multimodal foundation models in practical applications.
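To make the setup concrete, here is a minimal sketch of how such a many-shot visual prompt with batched queries could be assembled. This is not code from the paper; it assumes the google-generativeai Python SDK and Gemini 1.5 Pro, and the file names, labels, and instruction text are placeholders.

```python
# Minimal sketch of many-shot visual ICL with batched queries
# (illustrative only, not the authors' code). Assumes the
# google-generativeai SDK and a Gemini 1.5 Pro API key.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")          # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")  # long-context VLM

# Demonstration set: (image path, class label) pairs -- in the many-shot
# regime this list can grow to hundreds of shots or more.
demos = [("demo_001.jpg", "benign"), ("demo_002.jpg", "malignant")]

# Batched queries: several test images answered in one single request.
queries = ["query_001.jpg", "query_002.jpg"]

parts = ["Classify each query image. Answer with the label only."]
for i, (path, label) in enumerate(demos, start=1):
    parts += [f"Example {i}:", Image.open(path), f"Label: {label}"]
for j, path in enumerate(queries, start=1):
    parts += [f"Query {j}:", Image.open(path)]
parts.append("Answers (one line per query):")

response = model.generate_content(parts)
print(response.text)
```

With a 1 million token context window, the demonstration list can grow into the hundreds of image-label pairs and several queries can be answered per request, which is the regime the description above refers to.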
All rights w/ authors:
Many-Shot In-Context Learning in Multimodal Foundation Models
arxiv.org/pdf/2405.09798
#airesearch
#ai
#visual

Comments: 11

  • @propeacemindfortress • 24 days ago

    ohhh I can imagine a lot 😂 great presentation, looking forward to the next series

  • @kenchang3456 • 24 days ago

    Interesting, but (and I'm no expert) for ICL, wouldn't you have to re-evaluate the context every LLM session, whereas with fine-tuning that context is essentially baked into the fine-tuned model? I think ICL would be more flexible in accommodating changes to the context, but I wonder what the token cost would be when you use ICL with a very large number of tokens and re-evaluate that context every LLM session.

  • @marvinkunz843 • 24 days ago

    It is true that ICL increases the number of tokens required. But at the same time, it allows for dynamic selection of examples and can be handled much more flexibly than fine-tuning. Medprompt had a very interesting technique for prompt selection that added a lot to it, in my opinion (a rough sketch of that kind of dynamic example selection is included after the thread).

  • @TheReferrer72 • 24 days ago

    @marvinkunz843 Yep, I think the key to these findings is that it's a much quicker and more flexible way of fine-tuning the model. You can see from the graphs displayed that they had lots of fun varying the batch sizes. This is an amazing video.

  • @DewEfresh • 23 days ago

    I'm interested in pre-training as well. What technique would you use? If one of the goals is to be as efficient as possible, you could possibly use ReLoRA, GaLore, or FSDP QDoRA. If fine-tuning is just an extension of pre-training (with slight differences), these could all be options. You could also throw 1.58-bit LLMs into the mix, which could be trained at FP8.

  • @norman9174 • 21 days ago

    good video

  • @fabriciot4166 • 24 days ago

    Excellent channel! Totally agree with your last observation. It seems we have lost sight of the fact that even the most "intelligent" or expert person in the world in a certain science/discipline will be far from knowing everything about everything. I think that once an LLM learns enough about language, its relationships, and a bit more, it doesn't seem very natural to keep "putting" training data into it about all the information circulating out there. If you think of a "simple" use case (and despite the power of these models today, it is still not possible to have "enough" confidence), such as a customer-service assistant, it must clearly know how to address the client correctly (with the personality most appropriate to the use case), must manage a simple but stable dialogue (without hallucinations or oddities), and what remains, "the fine task" of the assistant, will be something very specific. And on a more philosophical note, if you like: you can't have everything. Either you have something quite good in general but not so good on specific issues, or you have something very good on a specific issue (expert) and only average, like most of us, in general knowledge. Excellent videos; I get a little lost in some of them, but most are understood and enjoyed a lot. Thank you, a big hug.

  • @propeacemindfortress • 24 days ago

    for your last question... sending data to SF is a bad idea...

  • @pensiveintrovert4318 • 20 days ago

    Who is really doing the work? You with your many examples or the LLM? You might as well just give it the answer.

  • @code4AI • 20 days ago

    You just discovered the phenomenon of overfitting, where the LLM learns the answers and not the solution (path) itself. That is the exact reason why we have to be really careful with fine-tuning or ICL+, so we do not enter overfitting, which has been a standard topic for the last two years. Thanks for your comment.

  • @pensiveintrovert4318 • 20 days ago

    @code4AI I wasn't really making a point about overfitting. When one writes a paper, one has unlimited time to play around with creating examples for ICL. If I have to generate examples for any random query, then where is the time saving? It is no longer the general solution that an LLM is supposed to be.
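The dynamic example selection mentioned earlier in the thread (Medprompt-style) can be sketched roughly as follows. This is an illustrative sketch only, not code from the paper or the video; embed() stands in for any real embedding model, and the demonstration pool and query are placeholders.

```python
# Rough sketch of Medprompt-style dynamic example selection via k-NN
# over embeddings (illustrative only).
import numpy as np

def embed(item: str) -> np.ndarray:
    # Placeholder: in practice this would call an embedding model.
    rng = np.random.default_rng(abs(hash(item)) % (2**32))
    v = rng.standard_normal(16)
    return v / np.linalg.norm(v)

# Pool of labelled demonstrations to choose from (made-up examples).
pool = [("example text A", "label_x"), ("example text B", "label_y"),
        ("example text C", "label_x"), ("example text D", "label_z")]
pool_vecs = np.stack([embed(text) for text, _ in pool])

def select_shots(query: str, k: int = 2):
    """Return the k demonstrations most similar to the query."""
    q = embed(query)
    sims = pool_vecs @ q                  # cosine similarity (unit vectors)
    top = np.argsort(-sims)[:k]
    return [pool[i] for i in top]

# Build the ICL prompt from the selected shots plus the new query.
query = "some new query"
shots = select_shots(query, k=2)
prompt = "\n".join(f"Input: {t}\nOutput: {l}" for t, l in shots)
prompt += f"\nInput: {query}\nOutput:"
print(prompt)
```

The selected shots are formatted into the prompt exactly as fixed few-shot examples would be, so the context adapts to each query without any retraining.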
