Function Approximation | Reinforcement Learning Part 5
The machine learning consultancy: truetheta.io
Want to work together? See here: truetheta.io/about/#want-to-w...
Here, we learn about Function Approximation. This is a broad class of methods for learning within state spaces that are far too large for our previous methods to work. This is part five of a six-part series on Reinforcement Learning.
SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Github: github.com/Duane321
Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
SOURCES
[1] R. Sutton and A. Barto. Reinforcement learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. van Hasselt, et al. RL Lecture Series, DeepMind and UCL, 2021, • DeepMind x UCL | Deep ...
SOURCE NOTES
This video covers topics from chapters 9, 10 and 11 from [1], with only a light covering of chapter 11. [2] includes a lecture on Function Approximation, which was a helpful secondary source.
TIMESTAMPS
0:00 Intro
0:25 Large State Spaces and Generalization
1:55 On Policy Evaluation
4:31 How do we select w?
6:46 How do we choose our target U?
9:27 A Linear Value Function
10:34 1000-State Random Walk
12:51 On Policy Control with FA
14:26 The Mountain Car Task
19:30 Off-Policy Methods with FA
LINKS
1000-State Random Walk Problem: github.com/Duane321/mutual_in...
Mountain Car Task: github.com/Duane321/mutual_in...
NOTES
[1] In the Mountain Car Task, I left out a hyperparameter to tune: Lambda. This controls how far away the evenly spaced proto-points are from any given evaluation point. If lambda is very high, the prototypical points are considered very close together, and they won't do a good job discriminating different values over the state space. But if lambda is too low, then the prototypical points won't share any information beyond a tiny region surrounding each point.
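To make the role of lambda concrete, here is a minimal Python sketch of normalized radial basis features over a grid of proto-points. All names here are illustrative (not taken from the linked code): `lam` is the bandwidth described above; a large `lam` makes the kernels wide, so proto-points blur together and discriminate poorly, while a tiny `lam` means each proto-point only covers a small region.

```python
import numpy as np

def nrb_features(s, protos, lam):
    """Normalized radial basis features for state s.
    protos: (K, d) array of evenly spaced proto-points.
    lam: bandwidth; larger lam -> wider kernels -> more sharing,
         smaller lam -> each proto-point covers only a tiny region."""
    d2 = np.sum((protos - s) ** 2, axis=1)  # squared distance to each proto-point
    phi = np.exp(-d2 / lam)                 # RBF activations
    return phi / phi.sum()                  # normalize so features sum to 1

# Example: a 1-D state space with proto-points on an even grid
protos = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
x = nrb_features(np.array([0.32]), protos, lam=0.01)
# The feature vector peaks at the proto-point nearest to 0.32
```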
Comments: 61
"Who needs theorems when you've got hopes?" - words to live by.
@bean217
3 months ago
Amen
That animation updating the estimates and showing the path the ball -- err "Car" -- took was spectacular. Great work as always!
@Mutual_Information
1 year ago
Thank you my man! When reading the text, this was the example that convinced me it needed to be animated
Just 2 days before the exam😍😍
Great playlist! It would have been cool to include the training time of each task.
You're really criminally underrated, should have hundreds of thousands of views at least
@Mutual_Information
1 year ago
Ha thank you, means a lot to hear that. In my view, I still have a lot of wrinkles in what I'm producing. It's OK to not have a massive audience while I try to figure out how to make great videos. But eventually, they'll be really great and I think the attention will come then.
One of the best in youtube! Thanks!
Thanks Duane, loving these videos. They're a big help for my group of undergrads who are interested in getting into RL research!
@Mutual_Information
1 year ago
Oh that's awesome! Getting my vids into classrooms is the ideal case - thank you for passing it along
Your videos are so fricking good! Thank you for such quality content on YT many of us appreciate it. I'm sure the channel will blow up in the future!!
@Mutual_Information
1 year ago
I hope you're right. Thank you!
amazing series. really appreciate your work!
@Mutual_Information
1 year ago
Thanks guy!
Your explanations are brilliant, thanks for making these videos
@Mutual_Information
1 year ago
Thanks Lukas, happy to do it
Excellent course!
Great video! Very illustrative
Wonderful!
Just a bit of "surfing" on a very broad topic (like just mentioning the "deadly triad" without any hints as to how to deal with it) ;), but the mountain car animation is just wonderful! Thank you for the code! It's always a pleasure to watch such well-prepared videos :D
@Mutual_Information
1 year ago
Thank you Marcin - great to see you back here! Yea, sometimes surfing is the best I can afford :) Glad to hear the code is appreciated. For those who are *really* curious about the details, the code can fill in the gaps
Change the title to RL with DJ featuring Lake Moraine. 😂😂😂😂 . The green screen is actually really useful. Once again, grateful for these videos. You are making content that can be binge watched with a notebook 😂😂😂😂
@Mutual_Information
1 year ago
Thanks Siddharth - the show is a work in progress, and I've actually managed to pull off some progress :)
These videos are great! I really did not like the formatting of Barto and Sutton (e.g. definitions in the middle of paragraphs), but you've done an awesome job of extracting and presenting the most valuable concepts
@Mutual_Information
5 months ago
Thank you for appreciating it! Barto and Sutton is a big bite, so I was intending to ease the digestion with these videos.
Thank you so much
Thank you. It really helps me a lot.
@Mutual_Information
1 year ago
Awesome, happy to hear it
Thanks for all your work making these great videos available to all. Is there a Part 6, or is it still in the making?
@Mutual_Information
1 year ago
There's a part 6 in the works :)
Thanks!
Thanks a lot for the great content. May I know when the final video will be released?
@Mutual_Information
1 year ago
It will be about a month from now. It may help to turn notifications on :)
Hi, thanks! The TD target is not exactly the Bellman update (TV). In off-policy learning, it may weight an unimportant sample, and the new update can move in a wrong direction, which may cause divergence. The update is projected into the feature space (say, of a linear approximator), and then the projected Bellman error is minimized. Am I right?
Thank you for great series! BTW - changing background to completely dark allows to concentrate on the content better
@Mutual_Information
1 year ago
Yea, now that I've changed to the green screen, I think it's much better. We're a new channel now!
Looking forward to the part 6 video. Any idea when it will be out?
@Mutual_Information
1 year ago
Working on it as we speak! I have a lot of non-YT stuff going on as well, so I've been delayed. Let's say.. 3 weeks?
Sir, can you provide the code for these classes? The theory is really great, but I am having trouble with the implementation. One more playlist, please
@Mutual_Information
23 days ago
Code links in the description :) And another playlist lol... I'm tired
>tfw irl all data is spread in multiple excel files throughout the company with no structure whatsoever.
Thank you very much for sharing this amazing content. I have a question. I think the obvious choices for features in the mountain car example are distance and velocity. I don't understand why you (or the book, which used tile coding) chose to use normalized radial basis functions to convert these 2 features into 1225 (35²) features. My understanding of function approximation was that its main goal was to shrink a huge state space into a smaller one. I get the impression that this solution expands the state space.
@Mutual_Information
7 months ago
The essential *information* is distance and velocity, but how you feature those into a model is a different story. Let's say we didn't use a NRB; what's the alternative? E.g. if you do something linear, you'll quickly see that the doable actions over the state space can never produce a sequence that gets out of the valley.
@bonettimauricio
7 months ago
@@Mutual_Information I will experiment with a polynomial expression combining position and velocity instead and check if it converges to the optimal solution. The NRB solution is great. I don't have a standard procedure for feature selection, and don't even know if one exists; if you know any literature about it, please let me know. Again, thanks for this content!
@Mutual_Information
7 months ago
@@bonettimauricio There's a section of Sutton's textbook that's devoted to how to featurize the state space, in case you're interested
If I may suggest a future video topic, how about a deep dive into mercer's theorem and how it is applicable to support vector machines?
@Mutual_Information
1 year ago
Mercer's theorem.. not a bad idea. That would probably get wrapped in a broader conversation with the kernel trick, and SVMs would get mentioned there. Added it to the queue!
Can you help motivate the need for the proto points? You already had a complete encoding of the state with just 2 dimensions: (position, velocity). Encoding the state in 1200 dimensions seems like overparameterization/redundancy. I assume there's a practical reason such as "dividing up the state space into 1200 discretized regions then learning the optimal behavior per region" but I can't wrap my head around why that would be necessary. This confusion carries over into Part 6 where proto points come up again, but now we only have two.
@Mutual_Information
9 months ago
You got me! A length-1200 feature vector is indeed overkill. What I'm doing is: I don't want the representational capacity of the parameterization I chose to be a limiting factor. So I go over the top and do effectively exactly what you describe: "dividing up the state space into 1200 discretized regions" and learning the value in each region, almost independently (but not *actually* independently). In practice, we'd take a lot more care to choose a parsimonious parameterization that would be more sample efficient (assuming we chose the parameterization wisely). But doing so requires machinery I'd rather not use, e.g. a neural network. By picking something simple, I was avoiding the headache of our more powerful tool, but you saw its ugly symptom.
@danielawesome12
9 months ago
Thanks for confirming! And with a speedy response time too! Thanks for making this series!!
@Mutual_Information
9 months ago
@@danielawesome12 Happy to - love it when people check out the RL series
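For the curious, "learning the value in each region, almost independently" can be sketched as a linear value function with a semi-gradient TD(0) update. This is a hedged illustration with hypothetical names (not the repo's code): because each feature vector is concentrated near a few proto-points, each update mostly moves the weights of nearby regions.

```python
import numpy as np

def td0_update(w, x_s, x_next, reward, alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step for a linear value function v(s) = w . x(s).
    Only the components of w where x_s is nonzero get moved, so regions
    of the state space are learned almost independently."""
    td_error = reward + gamma * (w @ x_next) - (w @ x_s)
    return w + alpha * td_error * x_s  # gradient of v(s) w.r.t. w is x(s)

# Toy example with 4 "regions": the state's features are concentrated
# in the first two components, so only those weights change.
w = np.zeros(4)
x_s = np.array([0.7, 0.3, 0.0, 0.0])
x_next = np.array([0.0, 0.6, 0.4, 0.0])
w = td0_update(w, x_s, x_next, reward=1.0)
```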
I am waiting for your policy gradient video to use in my class! Are you going to release it any time soon?👀🙏
@Mutual_Information
1 year ago
Yes, I've finished shooting it. Just in the editing phase now. It'll be posted in about a week
finally :-)
Thank you for the great explanations and animations! They helped me a lot with passing the Advanced Machine Learning course! (Passed with an 8, which is approximately an A in the US grade system.) Is there any way I can donate €5 to your PayPal? I wasn't able to do this through Patreon/YouTube as they both require a credit card, which I don't have. (Credit cards are not that common in the Netherlands, especially not for students)
@Mutual_Information
1 year ago
That's very kind of you! I don't actually have PayPal, so I'm not sure how this transfer would work. But that's ok - there's no need! One thing that I would appreciate much more than the money is if you recommend this channel to someone in your class. Word of mouth is a big deal for a channel like this :)
@datsplit2571
1 year ago
@@Mutual_Information I already recommended your channel in the Teams channel of the university course :) at the start of 2023. I'll also share your channel with some friends of mine. Looking forward to part 6!
@Mutual_Information
1 year ago
@@datsplit2571 you're a hero! Thank you!!
The end looks like tabu search x)