Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping (Searchformer)

Science & Technology

Paper: arxiv.org/abs/2402.14083
Abstract:
While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks and present Searchformer, a Transformer model that optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps than standard A∗ search. Searchformer is an encoder-decoder Transformer model trained to predict the search dynamics of A∗. This model is then fine-tuned via expert iterations to perform fewer search steps than A∗ search while still generating an optimal plan. In our training method, A∗'s search dynamics are expressed as a token sequence outlining when task states are added and removed into the search tree during symbolic planning. In our ablation studies on maze navigation, we find that Searchformer significantly outperforms baselines that predict the optimal plan directly with a 5-10× smaller model size and a 10× smaller training dataset. We also demonstrate how Searchformer scales to larger and more complex decision making tasks like Sokoban with improved percentage of solved tasks and shortened search dynamics.
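
The abstract's central trick is serializing A*'s search dynamics into tokens. As a rough illustration of what such a trace could look like (my own minimal sketch, not the paper's exact token vocabulary; it assumes a 4-connected grid stored as a dict with 1 = wall, and a Manhattan heuristic):

    import heapq

    def astar_with_trace(grid, start, goal):
        # A* on a 4-connected grid; `grid` maps (x, y) -> 0 (free) or 1 (wall).
        # The trace logs the search dynamics as tokens: "create x y g h" when a
        # node enters the frontier, "close x y g h" when it is expanded.
        def h(p):
            return abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # admissible on a grid

        trace = ["create", str(start[0]), str(start[1]), "0", str(h(start))]
        frontier = [(h(start), 0, start)]
        best_g, parent, closed = {start: 0}, {start: None}, set()
        while frontier:
            _, g, node = heapq.heappop(frontier)
            if node in closed:
                continue
            closed.add(node)
            trace += ["close", str(node[0]), str(node[1]), str(g), str(h(node))]
            if node == goal:
                plan = []
                while node is not None:      # reconstruct the optimal plan
                    plan.append(node)
                    node = parent[node]
                return plan[::-1], trace
            x, y = node
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                ng = g + 1
                if grid.get(nxt, 1) == 1 or ng >= best_g.get(nxt, float("inf")):
                    continue
                best_g[nxt], parent[nxt] = ng, node
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
                trace += ["create", str(nxt[0]), str(nxt[1]), str(ng), str(h(nxt))]
        return None, trace

    # Example: a 3x3 maze with one wall.
    cells = {(x, y): 0 for x in range(3) for y in range(3)}
    cells[(1, 1)] = 1
    plan, trace = astar_with_trace(cells, (0, 0), (2, 2))

A model trained on such (prompt, trace, plan) sequences learns the search dynamics; the expert-iteration fine-tuning described in the abstract then favors shorter traces that still end in an optimal plan.
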
Authors: Lucas Lehnert, Sainbayar Sukhbaatar, Paul Mcvay, Michael Rabbat, Yuandong Tian
Links:
Homepage: ykilcher.com
Merch: ykilcher.com/merch
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: ykilcher.com/discord
LinkedIn: / ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 82

  • @yichunchen4370 · a month ago

    The key thing in A* is the heuristic: it must not overestimate the cost to the goal (or the optimal plan can be missed), but at the same time it should be as large as possible, to fail other frontier nodes quickly and get a shorter search trace. This is hard, since finding a good heuristic is always a non-trivial manual task in A*. I think this model is just using the encoder to better understand the problem context and the trace to understand the search strategy, and it implicitly generates a very complex heuristic within it that outperforms the hand-crafted, deterministic heuristic. That modeled heuristic does not have to be A*'s, since it is not chained to the A* restrictions, but can be better (shorter). A* is just its teacher. It can use some of its self-generated data to "eat its own p..." because in this scenario it is very easy to judge which output is good and which is not, and to only "eat the good p..."

  • @clray123 · a month ago

    Or maybe it just got lucky because of the stochasticity, and the lucky guesses got cherry-picked by the authors of the paper in desperation to make their experiments look worthwhile and publication-worthy.

  • @andreaterlizzi · 16 days ago

    @@clray123 Surely the results are a bit inflated, and I doubt the method would really scale to indefinitely large problem instances (they go OOD). Still, the results are interesting and show that transformers can learn good heuristics for search problems if given nice inductive biases (yay, inductive biases work, what a surprise). I guess a better approach would be to just train the thing to estimate a good heuristic, which is what NNs are very good at, while the A* algorithm acts as a complex "token decoding" strategy. That would single-handedly guarantee that plans are optimal under unrestrictive assumptions (since you are still executing A*), and give better (shorter) plans on average since the NN is learning the heuristic. But I guess this wouldn't suggest that LLMs will eventually be able to plan (which I think they won't), but rather that actual symbolic and algorithmic reasoning is needed.

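To make the admissibility trade-off from the first comment in this thread concrete, here is a tiny illustration (my own example for a grid maze, not from the paper):

    def manhattan(p, goal):
        # Admissible on a 4-connected grid: every move changes one coordinate by 1,
        # so the true remaining cost is never smaller than this value, and A* with
        # it is guaranteed to return an optimal plan.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    def inflated(p, goal, w=1.5):
        # Inadmissible for w > 1: it can overestimate, which "fast-fails" more
        # frontier nodes (shorter search traces) but may return a suboptimal plan.
        return w * manhattan(p, goal)
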
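The suggestion in the reply above, a learned heuristic plugged into plain A* so the search itself stays symbolic, could be sketched as follows; `h_model` is a hypothetical callable (e.g. a small network scoring the remaining cost), not something from the paper:

    import heapq

    def astar_with_model_heuristic(neighbors, start, goal, h_model):
        # Plain A* where the heuristic is supplied by a learned model.
        # `neighbors(node)` yields (next_node, cost) pairs.
        frontier = [(h_model(start, goal), 0, start, [start])]
        best_g = {start: 0}
        while frontier:
            _, g, node, path = heapq.heappop(frontier)
            if node == goal:
                return path
            for nxt, cost in neighbors(node):
                ng = g + cost
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(
                        frontier, (ng + h_model(nxt, goal), ng, nxt, path + [nxt]))
        return None

If `h_model` never overestimates, the returned plan is still optimal; if it can overestimate, optimality is traded for shorter searches.
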
  • @rockapedra1130 · a month ago

    Yay! Paper reviews! Love these best. Yannic is a superb teacher!

  • @agenticmark · a month ago

    It's so awesome when we get synergies like this. I've been writing game AIs for years, and using transformers instead of A* pathfinding is pretty amazing. It's crazy how everything in the ML stack seems to generalize to things it was not designed to work on!

  • @unclecode · a month ago

    Meta-cognition, or "thinking about thinking," has become a subject of AI research, which I find quite impressive and had always wondered when it would happen. Remarkably, it is now achievable within the same transformer architecture and the current large language model (LLM) paradigm. I believe that if we can represent any problem through a sequential linguistic presentation (or find an encoder/decoder for it), then there's likely a transformer that can be trained to find perhaps an optimal solution. Sounds like a new definition of "computability"! An interesting avenue to explore would be applying this to an NP or NP-hard problem, like the Traveling Salesman Problem, to see if a transformer model could uncover an optimal solution. If I were in my twenties, this might have been a paper I'd work on :D

  • @jimwagner7817 · a month ago

    Absolutely, I love this design space. Do you have some papers you can recommend?

  • @antonioILbig · a month ago

    That would be interesting! Heuristics serve exactly this purpose: making something computationally infeasible feasible, albeit restricted to some specific domain or problem distribution. So applying LLMs to this purpose could lead to good heuristics that are useful in the downstream task without handcrafting them.

  • @unclecode · a month ago

    @@antonioILbig

  • @marshallmcluhan33 · a month ago

    I'd give this paper an A+ but it's too Meta.

  • @Jvo_Rien · a month ago

    they used the wildcard * instead of +

  • @marshallmcluhan33 · a month ago

    @@Jvo_Rien Oh I could give it a gold star ⭐

  • @snippletrap · a month ago

    FAIR

  • @marshallmcluhan33 · a month ago

    @@snippletrap Project Aria sees what you did there.

  • @moisestorresgarcia8179 · a month ago

    What a metaverse you laid for us right there 😂

  • @Dan-hw9iu · a month ago

    Here's a potential response to address your criticism: Consider the A* algorithm, and let B = "Least trace result of A* wrapped in a for loop." Algorithms A* and B are quite distinct, by definition. Ditto for datasets produced by them, in general. Algorithm B is guaranteed to produce results compatible with A*. But that's not relevant. Especially when training on B produces substantially better models than A*. Different algorithms, dataset properties, and model performance -- both definitionally and empirically. I think the authors' conclusion that their best model learned _something_ outside of A* is pretty reasonable and supported. Great A/B tests! ;-) Please let me know if I'm wrong here. I only watched your (excellent) video, Yannic. I didn't read the full paper. Cheers, and thanks for all that you do.

  • @rogerc7960 · a month ago

    I'll have to think about it a little bit more.

  • @GNARGNARHEAD · a month ago

    Good explanation. I can't help wanting both known and unknown obstructions in the training data, so that what counts as optimal changes based on forethought.

  • @ericadar · a month ago

    What if you could show that the trace lengths produced by the dynamics bootstrapping are often shorter than the minimum trace length produced by A*? Would that be convincing evidence that they produced a novel heuristic that is more efficient than A* at finding optimal plans, or would you still think it's just solving the subtask of more efficient deterministic path search?

  • @ericadar · a month ago

    @YannicKilcher can you review DeepMind's Mixture-of-Depths paper (arXiv 2404.02258)? Do you think it's a substantive improvement in per-token compute allocation?

  • @MaJetiGizzle · a month ago

    It’s done by leveraging routers/more intelligent routing, but yes. It is a substantial improvement in compute usage and allocation.

  • @chanel454879876354 · a month ago

    Interesting concept. It can be used as a tool for an agent. If the agent can generate the environment, then it can train the tool subnet and use that trained tool to solve the corresponding tasks.

  • @antonioILbig · a month ago

    It'd be really interesting to see an AI-powered A* algorithm in the sense that its internal heuristics are substituted by an AI model of some kind, for instance replacing the tie-breaking heuristic with a model. That way we could really see how much A* can be improved by AI, without doubts about whether it's still A* 😅

  • @VeganeDeDroite · a month ago

    For equal performance, how many training tokens do we need with the "solution only" version compared to the "teaching" version?

  • @TiagoTiagoT · a month ago

    Sounds like the model might have learned a tie-breaker algorithm that statistically beats A* with naive stochastic tie-breaking? Am I understanding this right? Are we approaching the P=NP threshold?

  • @antonioILbig · a month ago

    What happens if an LLM is trained to solve a problem for which no algorithmic solution exists, like the halting problem?

  • @clray123 · a month ago

    I would only be somewhat impressed if the training done on mazes somehow automagically generalized into performance on Sokoban, or vice versa. I don't recall them even attempting to demonstrate such a thing. Other than that, it just seems like another overfit-the-dataset, report-success-to-get-published paper.

  • @jamescamacho3403 · a month ago

    It'd be interesting to apply this to proof solving.

  • @LukePluto · a month ago

    Great review! Really interesting paper; hopefully some of the misleading claims get cleaned up during review. Surprised there was no discussion of Q*.

  • @maloxi1472 · a month ago

    Unsurprising. Hype isn't research.

  • @chrisspencer6502 · a month ago

    Please do a video on Lucene vector search

  • @dansplain2393 · a month ago

    I'd like to see the learned model compared to a simple improvement on the A* heuristic that incorporates the world, such as one that tie-breaks using the number of walls in the possible Manhattan paths to the destination. Why walk through a forest if there's another route through an open field?
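
The wall-aware tie-breaking idea in this comment could look roughly like the following (a sketch under assumptions of my own: the maze is a dict with 1 = wall, and the "possible Manhattan paths" are approximated by the bounding box between the node and the goal):

    def wall_count_tiebreak(grid, node, goal):
        # Secondary key for breaking equal-f ties: count the walls inside the
        # bounding box between `node` and `goal`; fewer walls means the node
        # heading into open terrain is expanded first.
        x0, x1 = sorted((node[0], goal[0]))
        y0, y1 = sorted((node[1], goal[1]))
        return sum(grid.get((x, y), 0)
                   for x in range(x0, x1 + 1)
                   for y in range(y0, y1 + 1))

    # In A*, push (f, wall_count_tiebreak(grid, n, goal), g, n) onto the heap so
    # that among nodes with equal f, the ones likely to avoid walls come out first.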

  • @sagetmaster4 · a month ago

    I wonder if we end up stitching together many extremely purpose-built, very small and efficient LLMs that are called upon for very specific functions. Less mixture-of-experts and more very specialized "dumb" organs, plus one or more larger, more generalized executive models.

  • @PaganPegasus · a month ago

    I'd argue that learning how to "nondeterministically tie-break" makes the algorithm no longer the same as A*, because it essentially introduces a second heuristic into A* for tie-breaking, which is no longer vanilla A*.

  • @ntelo19 · a month ago

    Exactly what Y. LeCun said on the Lex Fridman podcast some months ago. It's undoubtedly a further step toward AGI.

  • @TiagoTiagoT · a month ago

    Even with forking and backtracking, A* is still somewhat in the land of linear thinking. When are we gonna see something like this bootstrapped from something more abstract, like a freeform graph variant of the Wavefunction Collapse algorithm?

  • @simonkotchou9644 · a month ago

    Always hyped when a new Meta paper comes out

  • @wurstelei1356 · a month ago

    Taking 100k training steps instead of 1M is pretty decent, even though I didn't get everything.

  • @clray123 · a month ago

    "optimally solves previously unseen Sokoban puzzles 93.7% of the time, while using up to 26.8% fewer search steps" Well, since A* optimally solves previously unseen Sokoban puzzles 100% of the time, it sounds like their Searchformer found a broken version of the algorithm which generates shorter search dynamics but, because of that, does not work as it should 6.3% of the time. Maybe they should put that version of the achievement in the abstract...

  • @andikacandra4996 · a month ago

    I think it would be good to train the positional embedding.

  • @alexkubiesa9073 · a month ago

    I feel like a simpler way to ensure short execution traces in the SearchFormer model would be, when generating the training data, to just run A* search a few times with random tie-breaking and pick the example with the shortest trace. No need to train one SearchFormer model to generate data to train another model.

  • @tst3rt · a month ago

    A* selects the path with the minimum heuristic value, which is not necessarily the shortest one, see Russell and Norvig for an example

  • @alexkubiesa9073 · a month ago

    @@tst3rt According to the video, as long as the heuristic is admissible, the path output by A* is optimal. But regardless, whatever criterion we apply to filter the data generated by the first SearchFormer, we could have applied to the process of generating the original training data.
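
The data-generation baseline proposed at the top of this thread (run A* a few times with random tie-breaking and keep the shortest trace) might be sketched like this; the function name and the use of expansion count as a proxy for trace length are my assumptions, not the paper's setup:

    import heapq, random

    def astar_random_ties(neighbors, h, start, goal, seed):
        # A* where exact f-ties are broken by a random key; returns the plan and
        # the number of expansions as a stand-in for search-trace length.
        rng = random.Random(seed)
        frontier = [(h(start, goal), rng.random(), 0, start, [start])]
        best_g, expansions = {start: 0}, 0
        while frontier:
            _, _, g, node, path = heapq.heappop(frontier)
            expansions += 1
            if node == goal:
                return path, expansions
            for nxt, cost in neighbors(node):
                ng = g + cost
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(
                        frontier, (ng + h(nxt, goal), rng.random(), ng, nxt, path + [nxt]))
        return None, expansions

    # Keep the seed with the fewest expansions as the training example:
    # best_plan, best_len = min((astar_random_ties(nbrs, h, s, g, k) for k in range(8)),
    #                           key=lambda r: r[1])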

  • @OperationDarkside · a month ago

    Would be cool if this could be generalized and applied to real-world scenarios.

  • @MrMehrd · a month ago

    Isn't it related to Q*?

  • @fredericfournier5594 · a month ago

    The argument that it's just predicting the next token doesn't make much sense; it's more about whether we just memorize all possible answers (which is probably impossible) or whether we are building an ever denser model of reality...

  • @EinsteinNewtonify · a month ago

    🎉🎉🎉🎉🎉

  • @sean_vikoren · a month ago

    The statement about beating A* caught my attention, since A* is one of the few algorithms we have an optimality proof for. It seemed a bit confusing, in the "mathematically not possible" sense.

  • @JorgetePanete · a month ago

    I think the part about replacing traces might make it overfit.

  • @clray123 · a month ago

    They are not claiming that the generated plan is "more optimal" than A*'s in terms of the number of steps to execute when following the plan, just that the same optimal plan was generated using fewer planning steps.

  • @sean_vikoren · a month ago

    @@JorgetePanete The whole paper is self-falsifying. But if you can't read and understand the proof of A*'s optimality, you should probably not talk about it at all. That 'paper' is so sloppy, it's not even wrong. It actually makes me wonder if we should go back to beating children with sticks for being idiots.

  • @sean_vikoren · a month ago

    @@clray123 Read it again. That is exactly what they are claiming. They are grade zero idiots.

  • @sean_vikoren · a month ago

    @@clray123 They are moving their unqualified lips.

  • @hll97fr16 · a month ago

    Isn't this result kind of related to what we've noticed with Tree-of-Thought or Chain-of-Thought: when we ask an LLM to give us a plan of its response, we obtain a better final response in the end? 😊

  • @clray123 · a month ago

    The "key conclusion" that it is easier to "teach" the LLM to find the correct solution if you provide it with a trajectory toward it is not "really interesting", it is trivial. You are artificially constraining the search space, guiding the LLM to generate the correct next token conditioned on the predecessors. In an extreme case, you are spoon-feeding it the entire solution and only asking for the final step. It also shows how little success we can expect from unsupervised "data mining" based training alone, and it explains why LLMs got so much more "clever" when trained on more reasonable datasets (e.g. training with textbooks rather than Internet crap and whatnot). While it is a plausible and rather easy-to-see result, it is not very optimistic for those who imagine they can solve all the world's future problems by brute-forcing them with tons of data and GPUs (well, they sort of can, but it gets really expensive really quickly, already for toy problems... which again is a known characteristic of any brute-force search algorithm... well, duh).

  • @DeepThinker193 · a month ago

    If I ever wanted an LLM to come even close to reasoning like a human being, I'd do it like this. Basically, I would treat it like the mind of a child. 1) I'd place the model in a robotic body with all the sensory mechanisms for sight, smell, touch, taste and hearing. 2) I'd give it a constant thought process, just as humans constantly think and reinforce. 3) I'd then train that model in person as if it were a child, by talking to it and reasoning with it daily using demonstrations, until it gains more and more sense of this physical world and where everything in it stands. Rinse and repeat until it can reason better than me. Mark my words, the way they're going about training the models, by stuffing endless data down their throats, will never achieve AGI. It's so inefficient, in fact, that they're now looking at friggen nuclear power stations to power the models, which I find ridiculous but entertaining nonetheless. We as human beings don't need that much power for our reasoning processes.

  • @drdca8263 · a month ago

    How much data do you imagine the system you describe would involve? Like, how many bytes per second of sensory input are you thinking? You seem to contrast this with the large quantities of data used to train big models today, but it isn't clear to me that this would involve less training data. (Also, the human brain seems to have lots of information already built in, and it isn't clear how that information would get into the model in the setup you describe.)

  • @DeepThinker193 · a month ago

    @@drdca8263 What I'd imagine is that the robot model would have to be plugged in / tethered to a larger mainframe computer at all times, because battery tech simply isn't good enough yet, and I also have no idea how much compute they can fit in the body after adding all the motors for movement etc. The idea is for the robot model to focus on 'efficiency' in logic, reasoning and the physics of our reality, rather than cramming in more and more data and hoping for the best. Think about it: there's a reason why smart animals have to gradually learn about their environment to survive. It's why we are always constantly thinking. One can also think of the mind as our virtual environment and the body as our physical reality. Both our virtual and physical models are intertwined; we simply cannot reason well without one or the other. Nature uses this method because it is the most efficient and effective: slow, gradual steps rather than all the data at once. I would argue that, similar to our brains which, as you say, "already have some data built in," the LLMs also already have data built in. It's just a matter of focusing on their efficiency in logic and reasoning.

  • @clray123 · a month ago

    LLMs don't run algorithms, they imitate algorithms' outputs by approximate copying. Poorly. One could also say LLMs are not reasoning. Or not intelligent. But hey, that could ruin your AI investment.

  • @maheshprabhu · a month ago

    Why do you say that? We have no idea how to define intelligence. If anything, it might just be statistical extrapolation just like how LLMs operate.

  • @clray123 · a month ago

    @@maheshprabhu A human does not need to read a trillion tokens in order to learn language. A human is also capable of learning new concepts, planning, strategizing, optimizing, doing logic and arithmetic, and avoiding contradictions, all while using just 20 watts of power, with a lot of it self-taught. Therefore I claim the LLMs are not intelligent... they are fakes. And that our brains do not run transformers. Funnily enough, when computers were first introduced, people were claiming that a brain was "just like a computer", and when mechanical automata were introduced, our body was "just a sort of biological machine". Well, we should have realized by now that our brains are "a bit" more capable than that.

  • @clray123 · a month ago

    What we have with LLMs right now is exactly what Searle described in his "Chinese room" thought experiment. We have a laboriously trained algorithm with a huge database which reproduces the inputs and outputs of (any) language. But there is plenty of evidence, in the form of the errors it makes, that it does not in fact internally understand any of the language it processes, and that it is able to reproduce traces of reasoning but does not reason.

  • @antonioILbig · a month ago

    @@maheshprabhu That could be the case! We actually don't know. For this same reason I wouldn't say that these things are intelligent, because that would mean we know what intelligence is. So the best I can say is: these things are not intelligent. I'm not sure of the validity of this reasoning, but this is how I feel about the "intelligence" discussion. If something looked really intelligent and we had absolutely no idea how it functioned, I would say it is intelligent.

  • @clray123 · a month ago

    @@antonioILbig Actually, we do know it. We do know that intelligent beings, namely us humans, are capable of running algorithms of arbitrary time/space complexity, given enough resources. We are Turing-complete, LLMs are not. We know the algorithmic complexity of generating the next token by an LLM. It is mathematically impossible for an LLM to "learn" any algorithm which has a higher complexity than that required for the next token generation. What is possible is for the LLM to memorize the lookup function for mapping the algorithm's inputs to outputs, for a certain fixed-size range of inputs. And that is what the LLMs of today are doing. It is an insult to actual intelligence to call that sort of mapping intelligent. Show me an LLM which, after training on sequences of digits of the number PI, has obtained the knowledge of how to generate the n-th digit of PI, for any n > length of training dataset. But wait, the "smartest" LLMs out there even have a problem with counting the number of digits in their input. Try it yourself. There are rather simple limitations in these algorithms which are quite obviously not present in the "algorithms" that execute in our heads.

  • @JorgetePanete · a month ago

    I call overfitting

  • @AngouSana69 · a month ago

    Please stop burning my eyes

  • @DeepThinker193 · a month ago

    Go into the light.

  • @lucasbeyer2985 · a month ago

    Ahhh please don’t start to do vocal fry, leave that to the valley girls!

  • @hasko_not_the_pirate · a month ago

    It's like: who cares about math, let's intuit our way through it. It's valid research, but ultimately it's bad engineering.

  • @jks234 · a month ago

    ? That's all of deep learning. Neural networks are just set up to "potentially contain all branches needed to act properly in all situations". This approach is also not "learning on purpose". What AI is teaching us in general, I think, is that explainability is a luxury, and that often we cannot have it in the most complex and useful solutions.

  • @hasko_not_the_pirate · a month ago

    Clearly, deep learning and LLMs have brought capabilities we haven’t been able to implement in other ways, so that’s good. This paper is valid research in the sense that it’s interesting to explore what is and isn’t possible. It is “bad engineering” though in the sense that instead of teaching the model to use an algorithm that’s known to be optimal, they taught it to emulate it in a very inefficient way.

  • @TheRohr · a month ago

    @@hasko_not_the_pirate Nah, it's not about efficiency here, but about whether a neural model can learn capabilities that are encoded in human-designed algorithms. Still, of course it makes sense to use the algorithm directly in production. ^^

  • @444haluk · a month ago

    What a stupid idea😂😂😂
