Policy Gradient Methods | Reinforcement Learning Part 6

The machine learning consultancy: truetheta.io
Want to work together? See here: truetheta.io/about/#want-to-w...
Policy Gradient Methods are among the most effective techniques in Reinforcement Learning. In this video, we'll motivate their design, observe their behavior and understand their background theory.
SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Github: github.com/Duane321
Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
SOURCES FOR THE FULL SERIES
[1] R. Sutton and A. Barto. Reinforcement learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. Hasselt, et al. RL Lecture Series, DeepMind and UCL, 2021, • DeepMind x UCL RL Lect...
[3] J. Achiam. Spinning Up in Deep Reinforcement Learning, OpenAI, 2018
ADDITIONAL SOURCES FOR THIS VIDEO
[4] J. Achiam, Spinning Up in Deep Reinforcement Learning: Intro to Policy Optimization, OpenAI, 2018, spinningup.openai.com/en/late...
[5] D. Silver, Lecture 7: Policy Gradient Methods, DeepMind, 2015, • RL Course by David Sil...
TIMESTAMPS
0:00 Introduction
0:50 Basic Idea of Policy Gradient Methods
2:30 A Familiar Shape
4:23 Motivating the Update Rule
10:51 Fixing the Update Rule
12:55 Example: Windy Highway
16:47 A Problem with Naive PGMs
19:43 Reinforce with Baseline
21:42 The Policy Gradient Theorem
25:20 General Comments
28:02 Thanking The Sources
LINKS
Windy Highway: github.com/Duane321/mutual_in...
NOTES
[1] When motivating the update rule with the animation of protopoints and theta bars, I don't specify alpha. That's because the lengths of the gradient arrows can only be interpreted on a relative basis. Their absolute numeric values can't be deduced from the animation because there was some unmentioned scaling done to make the animation look natural. Mentioning alpha would have made this calculation possible to attempt, so I avoided it.
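For readers who want to connect the animation to code: below is a minimal, hypothetical sketch of the REINFORCE-style update theta <- theta + alpha * G * grad log pi(a | theta) on a toy 3-action problem. The reward values, noise scale, and alpha here are made up for illustration; they are not taken from the video.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
theta = np.zeros(3)                        # one logit per action
alpha = 0.1                                # step size; the video deliberately leaves alpha unspecified
true_rewards = np.array([1.0, 0.0, -1.0])  # made-up per-action rewards

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)                   # sample an action from the current policy
    G = true_rewards[a] + rng.normal(scale=0.1)  # noisy return for that action
    grad_log_pi = -probs                         # gradient of log softmax: one_hot(a) - probs
    grad_log_pi[a] += 1.0
    theta = theta + alpha * G * grad_log_pi      # the REINFORCE update

print(softmax(theta))  # the policy should now strongly prefer action 0
```

The point the animation makes survives here: only the relative sizes of the updates matter for the final policy; rescaling alpha mostly changes how fast the logits move.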

Comments: 73

  • @maximilianpowers9785
    @maximilianpowers9785 1 year ago

    My RL exam is in 2 weeks, you’re a life saver. I’m studying at UCL and the lectures lack a bit of that much needed visual intuition!

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Exactly what I'm going for. When I read the update rule, it didn't click until I ultimately landed on a visual like this. Happy it clicked for you too

  • @johnshim6727
    @johnshim6727 9 months ago

    This was a great video as a beginner in RL to grasp concepts. Appreciate your effort and time for making this!!

  • @stemfolk
    @stemfolk 1 year ago

    This is one of the best channels on the site by a long way. Very grateful that you never compromise on the quality of the videos. Excellent work!

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Thank you - and I'm not slowing down!

  • @rajatkumar.j
    @rajatkumar.j 24 days ago

    Finally, after watching this 3 times I got the intuition of this method. Thank you for uploading a great series!

  • @shadowdragon2484
    @shadowdragon2484 1 month ago

    Genuinely such an amazing series its changed the way I look at optimization problems as a whole moving forward

  • @Mutual_Information
    @Mutual_Information 1 month ago

    Thank you for appreciating this one too. It's much less viewed

  • @derickd6150
    @derickd6150 1 year ago

    Wow I'm so lucky. You only upload every few months and I just watched your previous videos yesterday. Love the series!! Thank you so much!

  • @Mutual_Information
    @Mutual_Information 1 year ago

    I intend to upload more frequently in fact. Just need to shorten the videos a bit. This one was a monster.

  • @quachthetruong
    @quachthetruong 1 year ago

    Your channel makes me love statistical probability and its applications more. You fully explain math without turning it into a tough academic lecture! Thank you so much!

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Excellent - exactly the impact I'm hoping for

  • @asdf56790
    @asdf56790 7 months ago

    A huuuge thank you for making this series! It was extremely well explained, and it would've taken me many, many more hours to learn this from a book or other resources. It is a very dense course, so I had to spend quite some time rewatching, but it's absolutely worth it and I like the great coverage. Amazing!

  • @Mutual_Information
    @Mutual_Information 7 months ago

    And you are in exactly the circumstance I was aiming for. After I read the book, I thought.. damn that just takes way too long to learn. If I could lower the cost of learning, people would appreciate it, just like I would have. So I'm glad it worked for you!

  • @rolandbertin-johannet5270
    @rolandbertin-johannet5270 1 year ago

    So grateful for this channel, any topic you cover is understood 10x faster than through other media

  • @Mutual_Information
    @Mutual_Information 1 year ago

    That's what I'm going for - I'm here for the quick learners ;)

  • @chiwaiwan2484
    @chiwaiwan2484 6 months ago

    and 10x quicker than my fucking lectures

  • @joshithmurthy6209
    @joshithmurthy6209 1 year ago

    This video literally came a day before my test. Thanks for uploading before it; even if you had uploaded after my test, I would have watched it. These videos are so good.

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Excellent! Love to hear it

  • @bonettimauricio
    @bonettimauricio 6 months ago

    Really excited to go into this RL journey with you. I was reading the book and watching these lessons, now it is the end (hopefully just the beginning), thank you so much for this!

  • @Mutual_Information
    @Mutual_Information 6 months ago

    Happy to hear you consider it a journey, and you're welcome!

  • @wenkanglee9596
    @wenkanglee9596 6 months ago

    I just want to express my gratitude to you. As a total newbie to AI and ML, IMO this series might be the best-explained set of videos in RL. Please keep up the good work. :)

  • @Mutual_Information
    @Mutual_Information 6 months ago

    Thank you, especially when it's said on this video - part 6 of the RL series. This series took me a really long time and it's only appreciated by a small, studious bunch - so it's great to hear from them. Thanks again!

  • @DoGyKG
    @DoGyKG 10 months ago

    Damn thank you for making this video. The visualization of algorithm is beyond any other explanations

  • @Mutual_Information
    @Mutual_Information 10 months ago

    And thank you for watching this one! It took forever to create and is largely at the very end of the series, so not that many people make it to it

  • @awaisahmad5908
    @awaisahmad5908 2 months ago

    Thank You So Much. I wish we had teachers like You in our universities.

  • @dasyud
    @dasyud 3 months ago

    I was struggling to get into reading RL material because of the lack of intuition and this was exactly what I needed. Thanks a ton! I can now easily build upon the fundamentals you've taught me. I'm gonna binge watch every video on your channel since they are all on topics I find very interesting and want to learn about. I hope you put out more videos! Cheers! 🎉

  • @Mutual_Information
    @Mutual_Information 3 months ago

    Glad they're working for you. And yea I'm cooking a big one as we speak.

  • @user-bj8wg8vq8h
    @user-bj8wg8vq8h 10 months ago

    Very well done video, thank you

  • @siddharthbisht1287
    @siddharthbisht1287 1 year ago

    You are going to heaven sir, with the kind of work you are producing. The interesting thing is that one can watch these with popcorn, with a notebook, or listen while working on another task, and it just works. Your explanations are clear, simple and straightforward, which reflects your understanding. Also, thanks for sharing the sources; it genuinely helps a lot. Keep up the great work.

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Siddharth you're too nice! If I'm in heaven (hopefully in a long time), I'm sure I'll see you there

  • @onurrrrr77
    @onurrrrr77 5 months ago

    I am afraid of a man who watches this series with popcorn.

  • @user-fh7hj7du2f
    @user-fh7hj7du2f 5 months ago

    Thank you for explaining so well this complicated topic.

  • @mCoding
    @mCoding 1 year ago

    Great series! I have a question about the weighting of proto points. Do you do a simple distance weighted average over all proto points, and if so is there a variation that only weights K nearest neighbors? I ask because if the landscape is very non-monotonic, weights from far points might tug in the wrong direction even if locally the nearby proto points give good information, e.g. imagine a gradient landscape that is like a maze.

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Thank you! To answer your Q: in this very simple case, I'm doing a distance-weighted average over all protopoints, but there is a hyperparameter I didn't mention in the video, which scales distances prior to the conversion into normalized weights. The effect is: if the scale is very small, the model effectively becomes identical to K=1 nearest neighbors, so any state uses only the single closest protopoint to determine its action probabilities. In that case, the problematic tugging you mention doesn't exist. If the scale is very large, the model sees all protopoints as almost equally far away, and for any state the action probabilities are essentially a simple average of all proto-action-probabilities. So somewhere between these extremes, we balance the tugging problem against the benefit of averaging data from nearby states. In larger, high-dimensional problems, the way I've done things doesn't work well - we can't tile the space without blowing up the parameter count. So there are a variety of approaches: selecting protopoints that partition the regions where data is observed, reducing the original space into a more manageable latent space, or abandoning protopoints entirely and just using deep nets! In fact, I don't believe I've seen any large RL model that is nearest-neighbors based.
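    For concreteness, here's a hypothetical sketch of the distance-scaled weighting described above. The actual Windy Highway implementation is in the linked repo; the function names and the exponential weighting here are assumptions for illustration.

```python
import numpy as np

def proto_action_probs(state, proto_states, proto_probs, scale=1.0):
    """Blend each protopoint's action probabilities by scaled distance to `state`."""
    d = np.linalg.norm(proto_states - state, axis=1) / scale  # scaled distances
    w = np.exp(-d)            # closer protopoints get larger weight
    w = w / w.sum()           # normalize into weights
    return w @ proto_probs    # weighted average of the rows

# Three protopoints on a 1-D state space, each with probabilities over 2 actions.
proto_states = np.array([[0.0], [1.0], [2.0]])
proto_probs = np.array([[0.9, 0.1],
                        [0.5, 0.5],
                        [0.1, 0.9]])

query = np.array([0.05])
near_first = proto_action_probs(query, proto_states, proto_probs, scale=0.01)  # ~K=1 nearest neighbor
blended = proto_action_probs(query, proto_states, proto_probs, scale=100.0)    # ~simple average
```

    With a tiny scale the result tracks the nearest protopoint's probabilities; with a huge scale it collapses toward the simple average over all protopoints, matching the two extremes described in the reply.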

  • @TallMonkey
    @TallMonkey 4 months ago

    Thank you man. You're the best. Really helped me study for my upcoming exam. Most importantly helping me understand the intuition behind it all

  • @Mutual_Information
    @Mutual_Information 4 months ago

    My goal exactly!

  • @mberoakoko24
    @mberoakoko24 1 year ago

    Aye , you are back. I'll come back to watch this with a notebook

  • @marcegger7411
    @marcegger7411 1 year ago

    My favorite youtube channel is back!!

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Thank you Marc :)

  • @MathVisualProofs
    @MathVisualProofs 1 year ago

    👍So nicely done.

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Thank you MVP - that means something big coming from you

  • @akritiupreti6974
    @akritiupreti6974 6 months ago

    Work of art!

  • @dhinas9444
    @dhinas9444 1 month ago

    Thank you man for maxing out our mutual information!

  • @Mutual_Information
    @Mutual_Information 1 month ago

    YES! Someone finally said it! lol that's honestly exactly what I had in my mind when naming this silly channel

  • @timothytyree5211
    @timothytyree5211 1 year ago

    Encore! Encore! Thou art the mac daddy of RL! I will stay tuned! Couldst thou pretty please consider developing a follow up video on more sophisticated PPO methods?

  • @Mutual_Information
    @Mutual_Information 1 year ago

    PPO would be the next topic. I can't say that's the next thing on the menu, but if this series gathers some attention, I can come back with an encore in due time. Thanks for the love!

  • @maximechopin2600
    @maximechopin2600 5 months ago

    I wanted to ask how you make your animations, they are very clear and concise , thanks for the great content

  • @kimchi_taco
    @kimchi_taco 10 months ago

    salute!

  • @Mutual_Information
    @Mutual_Information 10 months ago

    Thanks for watching these more intense videos. This one took a long ass time!

  • @dhlee8594
    @dhlee8594 1 year ago

    What is your background current job? Your videos are of really high-quality and cover advanced topics

  • @Mutual_Information
    @Mutual_Information 1 year ago

    I used to be in quantitative finance. Now I'm a data scientist at Lyft. And thank you - the advanced topics are where the action is!

  • @zenchiassassin283
    @zenchiassassin283 1 year ago

    Hi, thanks a lot for your videos ! Do you plan to make some videos on other reinforcement learning policy gradient methods ?

  • @Mutual_Information
    @Mutual_Information 1 year ago

    RL videos aren't next in the queue. I'll be exploring some new categories. But eventually, I'd like to touch on PPO more directly, just because of its usefulness. That probably won't happen this year, though.

  • @5_inchc594
    @5_inchc594 1 year ago

    Thanks for the clear explanation. Could you please make a video on the PPO algorithm?

  • @Mutual_Information
    @Mutual_Information 1 year ago

    I don't currently have plans for it, but it would be my next follow-up in the RL series.

  • @5_inchc594
    @5_inchc594 1 year ago

    @@Mutual_Information Thanks! that would be great

  • @add-mt5xc
    @add-mt5xc 7 months ago

    How does one see that the objective (average reward) used in the policy gradient theorem is independent of the initial state? I think this is true as on the right-hand side, you are summing over states s. Is it the Markov assumption that lets you write the average reward in that manner such that it is independent of the initial state?

  • @Mutual_Information
    @Mutual_Information 7 months ago

    It's not that it's independent of the starting state; it's that it's true for any starting state. You'll still be able to improve the expected return even if the starting state is randomly set at the start of each episode.
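    For reference, the policy gradient theorem as stated in Sutton & Barto [1], where mu is the on-policy state distribution under pi - the sum over states weighted by mu is what makes the right-hand side not single out any particular starting state:

```latex
\nabla J(\theta) \propto \sum_{s} \mu(s) \sum_{a} q_\pi(s, a)\, \nabla_\theta \pi(a \mid s, \theta)
```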

  • @DRich222
    @DRich222 1 year ago

    Hah- Nice candid clip at the end.

  • @actualBIAS
    @actualBIAS 1 year ago

    Would love to add this to my playlists. Is there any chance to do it? Why is it even disabled? Great vids btw

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Wait.. what's disabled?? I don't think I've disabled any such thing on my end.

  • @GusTheWolfgang
    @GusTheWolfgang 11 months ago

    Why didn't you upload before my dissertation last year D:

  • @akhilezai
    @akhilezai 1 year ago

    Yaayyyy you're back!

  • @Mutual_Information
    @Mutual_Information 1 year ago

    Indeed. This one was 30 minutes long, hence it took forever to create. Next videos will be shorter/uploaded more frequently.

  • @moisesbessalle
    @moisesbessalle 1 month ago

    I think at @6:20 you meant "since we are creating 3 values out of 2 constraints" right?

  • @wilhem7206
    @wilhem7206 1 month ago

    The one constraint is theta1 + theta2 + theta3 = 0, so if you know two of the thetas the 3rd one is determined

  • @moisesbessalle
    @moisesbessalle 1 month ago

    @@wilhem7206 but there is another constraint which is that each p>=0
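    A quick check of the point being discussed: softmax maps 3 logits to 3 probabilities that satisfy p >= 0 automatically, and it is invariant to shifting all logits by a constant, so only 2 effective degrees of freedom remain. This is a hypothetical illustration, not code from the video.

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())  # shift for numerical stability
    return e / e.sum()

theta = np.array([0.3, -1.2, 0.9])  # three arbitrary logits
p = softmax(theta)
# Softmax automatically yields a valid distribution: every p_i >= 0 and sum(p) == 1,
# so the nonnegativity constraint doesn't remove a degree of freedom.
p_shifted = softmax(theta + 5.0)
# Shifting every logit by the same constant leaves the probabilities unchanged,
# which is why 3 logits carry only 2 effective degrees of freedom.
```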

  • @wrjog23
    @wrjog23 1 year ago

    too much informatiooooon!!!

  • @zerotwo7319
    @zerotwo7319 16 days ago

    Man, I hate that this has nothing to do with neurons or anything biologically inspired. Great explanation for seeing what's really going on, but this has nothing to do with intelligence.

  • @TimL_
    @TimL_ 1 year ago

    Thank you.

  • @gravkint8376
    @gravkint8376 5 months ago

    Damn this video is helpful. So far I was only able to get a vague understanding of the topic with lots of time and effort. But this gives a whole new level of intuition. Thank you so much!

  • @Mutual_Information
    @Mutual_Information 5 months ago

    Exactly what I'm going for ;)