What is Q* | Reinforcement learning 101 & Hypothesis

Science & Technology

🔗 Links
- Jim Fan’s tweet: / 1728100123862004105
- Reinforcement learning deep dive: • Reinforcement Learning...
- GitHub: Q-learning AI to play snake game - www.crafters.ai/aitools/teach...
- Let's Verify Step by Step: arxiv.org/abs/2305.20050
- Tree of Thoughts: arxiv.org/abs/2305.10601
- Graph of Thoughts: arxiv.org/abs/2308.09687
👋🏻 About Me
My name is Jason Zhou, a product designer who shares interesting AI experiments & products. Email me if you need help building AI apps! ask@ai-jason.com
#chatgpt #gpt4 #gpt5 #ai #artificialintelligence #tutorial #stepbystep #openai #llm #largelanguagemodels #largelanguagemodel #agent #reinforcementlearning

Comments: 44

@AIJasonZ
6 months ago
Anything else I missed about Q*? Leave a comment & let me know!
@manofsan
6 months ago
Can this approach work on small language models (Alpaca, Orca, etc.)? Can an existing LM that has already been trained be further trained, via transfer learning, with this reinforcement learning technique? Can I train an LM to play the Snake game using RL? How can ordinary people make something like Q*? I realize it will be hard to attain Q*-level performance, given OpenAI's huge resources. But even just demonstrating this multi-step reasoning on some math problems would be a great proof of concept. I'm doing grad studies and would like to attempt a project on this.
@TheDessertFaux
6 months ago
That AlphaGo documentary remains so good, even a few years later. They found the human empathy and passion in a cold technical challenge, all without any narration. It gets me excited about hard tech.
@AIJasonZ
6 months ago
Yeah, so good!
@Laurie-eg8ct
5 months ago
I'll bet it does.
@LukePuplett
6 months ago
Really well put together, Jason, with use of interviews and clips.
@HarpaAI
5 months ago
Great overview! Jason, your videos on AI are the best!
00:00 🤖 *"Q Star" is generating a lot of discussion in the AI community and is associated with OpenAI's recent actions, but its exact nature remains speculative.*
01:08 🎮 *Reinforcement learning is a machine learning framework in which an agent learns by trial and error, aiming to maximize future rewards. It can involve policy networks and value networks.*
03:25 🧠 *Reinforcement learning lets AI agents self-play and discover new strategies, as demonstrated by DeepMind's achievements in games like Breakout and AlphaGo.*
08:01 📚 *There's speculation that "Q Star" could use policy networks and value networks, similar to AlphaGo, to improve reasoning and logic in large language models like GPT.*
11:14 🐍 *You can experiment with reinforcement learning in simple games using open-source projects, even if you're new to the field.*
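The 01:08 entry describes RL as an agent learning by trial and error to maximize future reward. A minimal sketch of tabular Q-learning shows that loop concretely. It assumes a made-up 6-cell corridor environment (the environment, the +1 reward, and the values of `alpha`, `gamma`, `epsilon` are illustrative choices, not from the video), and it uses a lookup table instead of the policy and value networks mentioned above.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a corridor of
# N_STATES cells. The agent starts in cell 0; entering the last cell pays +1 and
# ends the episode. Every other move pays 0.
N_STATES = 6
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn q[(state, action)] by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try a random move
            else:
                best = max(q[(state, a)] for a in ACTIONS)  # exploit, ties broken randomly
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best learned action in each non-terminal cell."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))  # expected: +1 (step right) in every cell
```

Random tie-breaking matters here: with all values starting at 0, always picking the first action would push the agent into the left wall and make the first reward very slow to find.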
@pandoraeeris7860
6 months ago
Q-Star is AGI.
@jasonfinance
6 months ago
Thanks for organising the insights! This hypothesis is very exciting.
@picksalot1
6 months ago
A clear definition of AGI has been difficult to find. Temporarily constraining it to a specific field for evaluation might be helpful. For instance, AGI was achieved in chess and Go when the best humans could no longer beat the game programs. At a certain point, the number of fields in which AGI has been achieved will far outweigh the fields in which it hasn't. When that happens, the "General" in AGI will have been attained.
@flflflflflfl
6 months ago
AGI was achieved in Chess and Go???
@homelessrobot
6 months ago
A lot of people are saying that Q* is some product of A* and Q-learning, but I think mathematically inclined scientists use the * suffix more formally than this. I would guess it is a generalization of Q-learning, in the same way that A* is a generalization of 'A': Dijkstra's algorithm. Maybe it involves graph search, but that is probably coincidental to the name. Pretty much everything these days involves graph search.
@AIJasonZ
6 months ago
Yeah, very likely they have a breakthrough in search, since that is what unlocked truly new/innovative strategies for AlphaGo.
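The comments above relate Q* to A* and graph search, and note that A* generalizes Dijkstra's algorithm. A minimal sketch of that relationship: A* orders its search by f = g + h (cost so far plus a heuristic estimate of the remaining cost), and with h = 0 it reduces to Dijkstra's algorithm. The graph and heuristic values below are made up for illustration, and any link between this and OpenAI's Q* remains the commenters' speculation.

```python
import heapq


def a_star(graph, start, goal, h=lambda node: 0):
    """Return (cost, path) of a cheapest path from start to goal, or (inf, []) if none.

    graph: dict mapping node -> list of (neighbor, edge_cost), with edge_cost >= 0.
    h: heuristic estimate of the remaining cost; with h = 0 this is Dijkstra's algorithm.
    """
    # Priority queue ordered by f = g (cost so far) + h (estimated cost to go).
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier, (new_g + h(neighbor), new_g, neighbor, path + [neighbor]))
    return float("inf"), []


if __name__ == "__main__":
    graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)], "D": []}
    print(a_star(graph, "A", "D"))  # h = 0 (Dijkstra) -> (4, ['A', 'B', 'C', 'D'])
    heuristic = {"A": 3, "B": 2, "C": 1, "D": 0}  # admissible: never overestimates
    print(a_star(graph, "A", "D", h=heuristic.get))  # same answer, guided by h
```

A good heuristic does not change the answer; it only lets the search expand fewer nodes before reaching the goal.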
@jayhu6075
6 months ago
I think Q* must be OPEN SOURCE to benefit humanity, not only big companies.
@half_way_expert
6 months ago
Thanks for sharing, man. Keep it up!
@nickstaresinic4031
6 months ago
Very well organized and informative presentation.
@ran_domness
6 months ago
Excellent. Thanks!
@BooleanDisorder
6 months ago
Q* is the optimal route in Q-learning.
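In standard RL notation, Q* usually denotes the optimal action-value function: the expected discounted return from taking action a in state s and acting optimally afterwards. The "optimal route" is then read off greedily as the policy π*, and tabular Q-learning nudges its estimate Q toward the Bellman optimality target with step size α and discount factor γ. This is textbook notation; whether OpenAI's Q* is named after it is speculation.

```latex
Q^*(s, a) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \right]

\pi^*(s) = \arg\max_{a} Q^*(s, a)

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```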
@Jim-ey3ry
6 months ago
This can be really huge!
@agenticmark
6 months ago
I was waiting for your video, Jason! Thanks! Have you done any Monte Carlo or genetic algorithms? My guess is that Q* is a similar process but done at inference time, or as precached inference.
@AIJasonZ
6 months ago
Thanks! I haven't done it personally, but I'm going to give it a try! My guess is that something related to search is the main breakthrough.
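The Monte Carlo idea raised in this thread can be illustrated with "flat" Monte Carlo action selection: score each candidate action by averaging the outcomes of many random playouts, then pick the highest-scoring one. This is a minimal sketch on a made-up number-line task (`TARGET`, `HORIZON`, and the 0/1 reward are assumptions). AlphaGo's Monte Carlo tree search goes further by growing a search tree and using policy and value networks, and nothing here reflects how Q* actually works.

```python
import random

# Hypothetical toy task (an assumption for illustration): walk along a number
# line starting at 0; reaching TARGET within HORIZON steps earns reward 1, else 0.
TARGET, HORIZON = 3, 7
ACTIONS = (-1, +1)


def random_rollout(position, steps_left, rng):
    """Play uniformly random moves; return 1.0 if TARGET is reached in time."""
    while steps_left > 0:
        if position == TARGET:
            return 1.0
        position += rng.choice(ACTIONS)
        steps_left -= 1
    return 1.0 if position == TARGET else 0.0


def monte_carlo_choose(position, steps_left, n_rollouts=200, seed=0):
    """Estimate each action's value by averaging random rollouts; pick the best."""
    rng = random.Random(seed)
    estimates = {}
    for action in ACTIONS:
        total = sum(random_rollout(position + action, steps_left - 1, rng)
                    for _ in range(n_rollouts))
        estimates[action] = total / n_rollouts
    return max(estimates, key=estimates.get), estimates


if __name__ == "__main__":
    best, estimates = monte_carlo_choose(0, HORIZON)
    print(best, estimates)  # +1 should win: stepping toward TARGET scores higher
```

The cost is spent at decision time (more rollouts give better estimates), which is the kind of inference-time search the comment above is guessing at.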
@Laurie-eg8ct
5 months ago
How does the reward system work for reinforcing behavior beyond Pavlovian bell sounds that signal approval?
@dancingdudezz
5 months ago
Hey, can you please make a video on detecting significant insights using reinforcement learning? I'm curious about making a model learn by itself to classify irregular patterns using reinforcement learning.
@utkua
6 months ago
If it is just an optimization of training, I don't see how it unlocks abstract thinking. If it is actually another multi-model approach, bandwidth will be a limiting factor. But I think your guesses are not far off; OpenAI has focused on training more than anything else from the start. That is how you make your product look like a breakthrough without an actual breakthrough.
@NobleCaveman
6 months ago
Isn't abstract thinking kind of just like increasing the chaos factor and seeking out connections between more 'random' topics or ideas?
@utkua
6 months ago
@NobleCaveman Abstract thinking is defining a concept and simulating it using current knowledge of the world. State-of-the-art AI is still a one-directional flow over frozen links. It is like a record of intelligence: useful, but we are still multiple breakthroughs away from true AGI. The funny thing is that nobody is chasing those problems because they are high-risk in terms of ROI; everyone wants to make a small improvement that looks cool enough to secure investment.
@nucleusaccumbens3228
6 months ago
tyvm
@igorkudryk2199
6 months ago
What are you recording with?
@agenticmark
6 months ago
Dr. Jim is the shit. I will read anything his name is on.
@csabaczcsomps7655
6 months ago
Q is "question" and * is "repeat", so if you synthesize a lot of answers, you get a general intelligent answer. My noob opinion.
@pauldannelachica2388
6 months ago
❤❤❤
@PDragonLabs
6 months ago
👍
@MuzhiLi
6 months ago
Can someone explain why Google didn't figure this out, despite producing so much groundbreaking research in the last decade?
@middle-agedmacdonald2965
6 months ago
Sam has a heck of an enthusiastic team that seems really tightly united. It just takes one Einstein-like idea that nobody else thought of to break through a wall.
@BR-hi6yt
6 months ago
Still don't understand Q* although I am a little clearer about AlphaGo
@andrewcampbell3100
6 months ago
Q is blowing up and it's not even Monday on cable...
@dwikristianto
6 months ago
In the science, technology, and engineering world, no single thing (whether a physical entity or just an idea) has two or more names; each is represented by a single name. Not so in the marketing world, where plenty of things share identical names or have several names. Q* and RLHF are terms from the science world, so they must point to different ideas. IMO.
@lucamatteobarbieri2493
6 months ago
I hate to break this to you, but we are made mostly of dihydrogen monoxide.
@lucamatteobarbieri2493
6 months ago
Open*AI
@victordelmastro8264
6 months ago
I don't believe for a second that the Board would panic over an improved LLM or Transformer. :P Q* was an AGI hooked up to a quantum computer, IMO. That would freak out the Board. Quantum-singularity-core-based AGIs concern me. Don't turn off that device!! Infinite knowledge is infinite negative entropy. :O