Temporal Difference Learning (including Q-Learning) | Reinforcement Learning Part 4

The machine learning consultancy: truetheta.io
Want to work together? See here: truetheta.io/about/#want-to-w...
Part four of a six-part series on Reinforcement Learning. As the title says, it covers Temporal Difference Learning, Sarsa and Q-Learning, along with some examples.
SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Github: github.com/Duane321
Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
SOURCES
[1] R. Sutton and A. Barto. Reinforcement learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. Hasselt, et al. RL Lecture Series, Deepmind and UCL, 2021, • DeepMind x UCL | Deep ...
SOURCE NOTES
The video covers topics from chapters 6 and 7 from [1]. The whole series teaches from [1]. [2] has been a useful secondary resource.
TIMESTAMP
0:00 What We'll Learn
0:52 No Review
1:18 TD as an Adjusted Version of MC
2:49 TD Visualized with a Markov Reward Process
6:34 N-Step Temporal Difference Learning
8:08 MC vs TD on an Evaluation Example
11:50 TD's Trade-Off between N and Alpha
12:47 Why does TD Perform Better than MC?
15:29 N-Step Sarsa
17:15 Why have N above 1?
19:02 Q-Learning
20:50 Expected Sarsa
21:48 Cliff Walking
25:04 Windy GridWorld
28:12 Watch the Next Video!
NOTES
Code to compare TD vs MC on the evaluation task: github.com/Duane321/mutual_in...

Comments: 96

  • @lordjared2572
    @lordjared2572 1 year ago

    ok, just pls upload more vids. There's a huge vacuum of ML education out here for people who are not scared of math.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    That’s the audience I want!

  • @rewixx69420

    @rewixx69420

    1 year ago

    I'm learning ML by myself and it's hard to find information. This guy saves me on RL, thanks.

  • @marcin.sobocinski
    @marcin.sobocinski 1 year ago

    Your animations are fantastic, it's like a new dimension of learning. It helps so much to be able to visualize RL processes. Thank you!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Thank you Marcin - it means a lot when I hit someone in the audience exactly as I hoped :)

  • @samlaki4051
    @samlaki4051 1 year ago

    Starting to get into RL, Yannic recommended you! How have I missed such a gem of a channel!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Love Yannic's stuff. Super pumped to get the shout out.

  • @marcegger7411
    @marcegger7411 1 year ago

    Fantastic!! Keep up the amazing work! It's always so great to see quality content presented so eloquently :)

  • @saranahluwalia5353
    @saranahluwalia5353 1 year ago

    I wish I had this to review 5 years ago. This would have eliminated wasteful experiments. Thank you for making this more accessible.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    The way I'm designing the channel.. is the channel I would have wanted when I was learning ML for the first time. Seems like that theory landed!

  • @rostislavmarkov7488
    @rostislavmarkov7488 5 months ago

    Awesome series covering essential fundamentals with great didactics. Raised the bar at creating high-quality content!

  • @mryazbeck98
    @mryazbeck98 9 months ago

    I love your videos to recap what I read in the book. Helps me understand and visualize everything better. I was however hoping to know more about batch training because I didn't understand how it works at all!

  • @buh357
    @buh357 1 year ago

    I am starting to learn RL, and your video is helping me a lot. You have a clear and precise explanation; thank you. Looking forward to new coming videos :)

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    excellent! Exactly what I'm going for :)

  • @pandie4555
    @pandie4555 9 months ago

    dude, the amount of work you put in these videos is fantastic.

  • @Mutual_Information

    @Mutual_Information

    9 months ago

    lol yea these videos were crazy hard. This one took me 100+ hours

  • @bornamorasai5285
    @bornamorasai5285 1 year ago

    Can't wait for part 5 and 6!!!! Let's go!!!!

  • @hjop010
    @hjop010 1 year ago

    Great video as always! It is helping me so much complementing Barto & Sutton.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Sweet - that's exactly how I want this series used!

  • @victormanuel8767
    @victormanuel8767 11 months ago

    "You've covered a lot, give yourself a 3 second break" *0.5 seconds later* "Great, let's keep going"

  • @antonkot6250
    @antonkot6250 8 months ago

    Oh, man. These visualisations are top-notch!

  • @ryderbrooks1783
    @ryderbrooks1783 1 year ago

    This channel is extremely under subscribed. I very much appreciate the work you're putting in here. Thank you

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Ha well I can't expect a large number of subs when my stuff is so technical. So.. should I make it less technical? Nope!

  • @fedelozano2895
    @fedelozano2895 1 year ago

    Hi, your videos are really specific and super helpful! This information is helping me with my paper, can't wait for the next one, thank you :)

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Lol yea specific as hell! Glad it helps and I'm working on the next one as we speak!

  • @skirazai7591
    @skirazai7591 1 year ago

    Man, you're doing some very high quality stuff, keep it up.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Thanks - I'm tryin!

  • @qiguosun129
    @qiguosun129 1 year ago

    Excellent lecture! It resolved my doubts about the method reviewers asked me to use for parameter uncertainty analysis in scientific research papers.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Excellent - that's thrilling to hear this has some real impact!

  • @datsplit2571
    @datsplit2571 1 year ago

    High quality videos, my compliments! This helps so much in understanding RL for a Master's course. Thank you!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    You're welcome! And yuno what would be totally sweet? If you told your classmates about these vids :)

  • @datsplit2571

    @datsplit2571

    1 year ago

    @Mutual_Information Posted it in the teams chat of the Advanced Machine Learning course!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    @datsplit2571 thank you! Over time moves like that will make all the difference :)

  • @AlisonStuff
    @AlisonStuff 1 year ago

    love it!! so good!!!

  • @ezragarcia6910
    @ezragarcia6910 1 year ago

    Thanks!! I just found your channel and IT'S AWESOME!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Thanks Ezra - I think it’s a work in progress lol 😁

  • @broccoli322
    @broccoli322 1 year ago

    Great videos! Can't wait for more.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    This is one of my favorites in fact - glad it hits!

  • @timothytyree5211
    @timothytyree5211 1 year ago

    Excellent video! I am so stoked to use this in my work!

  • @timothytyree5211

    @timothytyree5211

    1 year ago

    I used the knowledge of ^this vid today to help a buddy out at work! You rock, Duane!

  • @timothytyree5211

    @timothytyree5211

    1 year ago

    I'm really looking forward to your next video on function approximation!

  • @IRONMAIDEN146
    @IRONMAIDEN146 1 year ago

    Your videos are helping me a lot in my AI engineering degree, thanks a lot!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Love it!

  • @sathyakumarn7619
    @sathyakumarn7619 1 year ago

    So precise and fun! But highly underrated! Please advertise so that more people can benefit!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Thank you! and I agree, distributing this needs some more effort. Sometimes my tweets help

  • @NazerkeSafina
    @NazerkeSafina 8 months ago

    Superb job with the visualization. Keep it up! Only you could explain certain things to me; I've watched several other tutorials and wasn't feeling confident. One thing: I wish the explanation of how V(s) is obtained for each state was more detailed, perhaps with multiple samples and step-by-step calculations.

  • @selcukkalafat2857
    @selcukkalafat2857 1 year ago

    Thank you. Looking forward to the next part

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    In the works :) but I'll need some patience

  • @surakshachoudhary2880
    @surakshachoudhary2880 1 year ago

    Eagerly awaiting the remaining episodes - remarkable work there! So far I've just watched the videos, and I think it can only become clearer with some practice - but was curious why I keep hearing about 'deep' RL? Where does the 'deep' a.k.a. neural nets fit into these videos..

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Parts 5 and 6 are in the works - I only just started them so, it'll take some time. Nothing coming this month, but probably in Dec. Good question! "Deep" in "Deep RL" refers to deep learning, where we utilize neural networks with many layers to learn complex functions from observations. At this point, those NNs have had no place to be inserted - but that changes in part 5. In part 5, we'll discuss handling state spaces that are so huge, we can't list them out in a table. In that case, you can use a function to model giant swaths of those states.. and Deep NNs can be especially good at that. My video won't be a deep dive in NNs - that's too big of a subject. But it should be clear how they would get used.

  • @marcin.sobocinski
    @marcin.sobocinski 1 year ago

    Thank you.

  • @nathanzorndorf8214
    @nathanzorndorf8214 4 months ago

    Thanks for this. Amazing.

  • @bmenashetheman
    @bmenashetheman 1 year ago

    What a fantastic series, thank you so much!!!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Thanks Ben, glad you see the same value I do. Btw, if you know other people studying the same subject, it would help a lot to share this with them :)

  • @b0nce

    @b0nce

    1 year ago

    Double this, great effort, excellent videos, thank you so much. Also, Duane, you forgot to add this video to the RL playlist.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    @b0nce oh thank you! Fixed

  • @bmenashetheman

    @bmenashetheman

    1 year ago

    @Mutual_Information already shared it with everyone in my class! I'm certain this channel will get really popular really soon, your content is fantastic.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    @bmenashetheman oh you rule! Thank you!

  • @bean217
    @bean217 4 months ago

    "If you recall... which you better!" I swear, I recall!

  • @user-sx3dy6cw8m
    @user-sx3dy6cw8m 2 months ago

    This is a life saver

  • @raghavendrakaushik1691
    @raghavendrakaushik1691 1 month ago

    At 4:23 Shouldn't it be traversing backwards in time for MC?

  • @arrozenescau1539
    @arrozenescau1539 5 months ago

    I wish I could like your videos twice

  • @Mutual_Information

    @Mutual_Information

    5 months ago

    Well unfortunately, there is no way to double-like. I see only one solution: I need to upload 2x more videos!

  • @user-qm6up7kz4n
    @user-qm6up7kz4n 8 months ago

    04:00 "Return g_3 is diff of levels at t=3 and the end of Episode". Could someone explain this? a) Why is g_3 that, and b) how do we know the return at t=3? In our BJ example we only know the Reward at the end of the Episode (play), and we use that Reward to update Q.
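One way to read the question above, as a rough sketch: assuming the "levels" in the video are running totals of reward and there is no discounting (gamma = 1), the return from step t is everything collected after t, which equals the final level minus the level at t. The reward values below are made up for illustration, and `returns_from_rewards` is a hypothetical helper, not code from the video.

```python
from itertools import accumulate


def returns_from_rewards(rewards, gamma=1.0):
    """Return [G_0, ..., G_{T-1}], where rewards[k] is R_{k+1}.

    Works backwards through the episode: G_t = R_{t+1} + gamma * G_{t+1}.
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Made-up rewards R_1..R_5 for one episode (illustrative only).
rewards = [1.0, -2.0, 0.5, 3.0, -1.0]
returns = returns_from_rewards(rewards)

# "Levels": running total of reward after each step, starting from 0.
levels = [0.0] + list(accumulate(rewards))

# With gamma = 1, G_3 is the final level minus the level at t = 3.
g3_from_levels = levels[-1] - levels[3]

# Blackjack-style episode: the only non-zero reward arrives at the end.
blackjack_returns = returns_from_rewards([0.0, 0.0, 1.0])
```

Under these assumptions, part (b) of the question has a simple answer: the return at t = 3 is only known once the episode has finished, which is why Monte Carlo waits until the end before updating. When the only non-zero reward is the final one, every return in the episode equals that reward.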

  • @rewixx69420
    @rewixx69420 1 year ago

    Episode 6: finally I will understand PPO

  • @the_random_noob9860
    @the_random_noob9860 2 months ago

    In an epsilon greedy policy, the two probabilities are epsilon and 1 - epsilon. So, is my understanding correct? If epsilon = 0, the policy always takes the max action value from the Q table while generating the episode, so that Q-learning, Sarsa and Expected Sarsa become identical.
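A small sketch for checking this numerically. It assumes the common epsilon-greedy convention in which exploration is uniform over all actions, so the greedy action has probability 1 - epsilon + epsilon/|A|. The Q-values, reward and discount are made up, and `epsilon_greedy_probs` and `td_targets` are hypothetical helper names. It compares only the one-step update targets of the three methods.

```python
import numpy as np


def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n_actions = len(q_row)
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_row)] += 1.0 - epsilon  # ties go to the first max
    return probs


def td_targets(q_next, reward, gamma, epsilon, next_action):
    """One-step targets given Q(S', .) and the action A' actually sampled."""
    probs = epsilon_greedy_probs(q_next, epsilon)
    return {
        "sarsa": reward + gamma * q_next[next_action],
        "q_learning": reward + gamma * np.max(q_next),
        "expected_sarsa": reward + gamma * float(probs @ q_next),
    }


# Made-up action values for the next state S' (illustrative only).
q_next = np.array([0.2, 1.5, -0.3])
greedy_action = int(np.argmax(q_next))

# epsilon = 0: the sampled next action is always the greedy one.
targets_greedy = td_targets(q_next, reward=-1.0, gamma=0.9,
                            epsilon=0.0, next_action=greedy_action)

# epsilon = 0.3: suppose the behaviour happened to sample action 2.
targets_explore = td_targets(q_next, reward=-1.0, gamma=0.9,
                             epsilon=0.3, next_action=2)
```

With epsilon = 0 the three targets coincide, which supports the intuition in the comment. One caveat: an exercise in chapter 6 of [1] asks whether greedy Sarsa and Q-learning are then exactly the same algorithm. Sarsa picks A' before the update and Q-learning picks its next action after it, so the action sequences are not guaranteed to match in every case.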

  • @123ming1231
    @123ming1231 1 year ago

    Can u make a video later showing how u make those animations? It is fantastic!!!! It shows the concept very clearly!!! The data visualization art behind it is so elegant

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Maybe one day.. The code I use is a big personal library that's not ready for the public. But I could see doing that.. maybe in a year or two after things have gone well. We'll see

  • @Electrikalforenzis
    @Electrikalforenzis 1 year ago

    Where are the rest? You are doing a fine job with these episodes!!

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    haha thank you very much. I need a bit of time for parts 5 and 6. I just moved to a new house, got a full time job, many little things.. but it's coming :)

  • @snowflake5204
    @snowflake5204 1 year ago

    At 20:30 shouldn't it be SARSA rather than TD1? Since we use state value function in TD rather than state action

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Sorry it's not clear. I'm using 1-step TD control and Sarsa interchangeably here.

  • @samuelepignone8255
    @samuelepignone8255 1 year ago

    Thanks a lot for your videos. There's just one thing that doesn't make sense to me: in the last example when you add Q-learning in the graph, it has a lower maximum reward than SARSA, and I don't understand how that's possible since the path it follows has many fewer steps. I hope I have explained my doubt well.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    I don't know either actually. My intuition, by this point, is that an inability to explain performance is the rule, not the exception. It's rare that you can tell a story about why one algo is superior on a particular problem. These very simple toy examples are designed precisely to call out the difference in their character. The last one, however, is weird enough that I can't explain all the performance gaps. If anyone else has an intuition, please chime in!

  • @sidnath7336
    @sidnath7336 1 year ago

    Could we get videos on Markov Chain Monte Carlo methods?

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    MCMC! Absolutely, just may take me a bit to get to it

  • @kimchi_taco
    @kimchi_taco 10 months ago

    14:30 TD is better than MC in general. In my opinion, * TD: It's more aligned with the Bellman Optimality equation, as it focuses on n-step optimization. * MC: It's more aligned with the Bellman equation (with sampling), as it averages the rewards over the trajectory.

  • @abramgeorge3290
    @abramgeorge3290 10 months ago

    Why didn't we use importance sampling in Q-Learning? I have been searching for an answer for days with no clue.

  • @hihellohowrumfine
    @hihellohowrumfine 1 year ago

    Can you please do a series on statistics?

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    That's a bit broad. Is there a particular topic you're interested in?

  • @hihellohowrumfine

    @hihellohowrumfine

    1 year ago

    @Mutual_Information specifically statistical learning theory, something like what the 3blue1brown channel has done for linear algebra. A lot of times when I read ML papers, it's hard to deeply appreciate why certain techniques work.

  • @imanmossavat9383
    @imanmossavat9383 1 year ago

    Why is the mean TD performance getting worse as you increase m (11:24)?

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    I am not sure.. but I know the behavior is expected. That's actually a question posed in Sutton/Barto's book and I'm sure the answer is online somewhere.

  • @imanmossavat9383

    @imanmossavat9383

    1 year ago

    @Mutual_Information Thank you for your response. I really benefit from your videos. If I figure out the answer, I will share it here.

  • @danielm3772

    @danielm3772

    1 year ago

    From what I have read online and my personal interpretation: this is due to 2 factors, mainly a big value for alpha and the initial state values. If we take the 5-state example (calling the states A, B, C, D, E), we know that the true values are 1/6, 2/6, 3/6, 4/6, 5/6. If we then initialize all of them to 1/2, we will first see a decrease in the error due to the updates of A, B, D, E (as they have the biggest difference from the true value). However, at some point they are going to stabilize and V(C) is going to change as well, and because the value of alpha is big, we will move away from 1/2 (which is both the initial AND the true value) by a non-negligible amount. Hope that helps.
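The setup this reply describes matches the five-state random walk from chapter 6 of [1]: start in C, step left or right with equal probability, and receive a reward of 1 only when exiting on the right. Below is a minimal sketch for experimenting with the effect of alpha. It assumes that environment rather than the exact one used in the video, with no discounting. `run_td0`, `rms_error`, the seed and the two alpha values are illustrative choices.

```python
import numpy as np

N_STATES = 5  # non-terminal states A..E, indexed 0..4
TRUE_VALUES = np.arange(1, N_STATES + 1) / (N_STATES + 1)  # 1/6 .. 5/6


def rms_error(values):
    """Root-mean-square error of the estimates against the true values."""
    return float(np.sqrt(np.mean((values - TRUE_VALUES) ** 2)))


def run_td0(alpha, n_episodes, seed=0):
    """Tabular TD(0) on the random walk; returns final values and the
    RMS error recorded before training and after each episode."""
    rng = np.random.default_rng(seed)
    values = np.full(N_STATES, 0.5)  # every state starts at 1/2
    errors = [rms_error(values)]
    for _ in range(n_episodes):
        state = N_STATES // 2  # start in the middle state, C
        while True:
            next_state = state + (1 if rng.random() < 0.5 else -1)
            done = next_state < 0 or next_state >= N_STATES
            reward = 1.0 if next_state >= N_STATES else 0.0
            next_value = 0.0 if done else values[next_state]
            # TD(0) update: move V(S) toward R + V(S').
            values[state] += alpha * (reward + next_value - values[state])
            if done:
                break
            state = next_state
        errors.append(rms_error(values))
    return values, errors


values_small, errors_small = run_td0(alpha=0.05, n_episodes=100)
values_large, errors_large = run_td0(alpha=0.5, n_episodes=100)
```

Plotting `errors_small` against `errors_large`, ideally averaged over many seeds, is one way to check whether a larger alpha makes the error drop and then drift, as this reply suggests. A single seed is noisy, so one run should not be taken as conclusive.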

  • @hansthompson
    @hansthompson 1 year ago

    where is part five? in production?

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    Yea, I took a little break before starting part 5. I'm currently writing it. It'll take some time. Should be ready in January.

  • @hansthompson

    @hansthompson

    1 year ago

    @Mutual Information very easy to follow. I'll be patiently waiting. Thanks.

  • @catcoder12
    @catcoder12 9 months ago

    I really liked the videos, but the pace felt a bit too quick... The effort put into the examples is commendable.

  • @Mutual_Information

    @Mutual_Information

    9 months ago

    I'll take it! I'm learning the slowness thing.. a bit

  • @coconut_camping
    @coconut_camping 1 year ago

    I bet you are in Stanford as a professor teaching RL by now? This became a RL bible to me.

  • @Mutual_Information

    @Mutual_Information

    1 year ago

    haha not quite a professor! But if you're using this as a resource, I consider my job fulfilled

  • @swastiksharma2683
    @swastiksharma2683 6 months ago

    You have such good content, but you tried to make the video as short as you can, so there are no natural pauses, which makes it difficult to focus and understand your content.

  • @Mutual_Information

    @Mutual_Information

    6 months ago

    I think you're right. I'll have fewer cuts in future videos, and I have fewer cuts in my more recent ones.

  • @raminessalat9803
    @raminessalat9803 9 months ago

    Your videos are amazing and I know the time spent creating them is probably astronomical. But I do have some feedback that might help your videos; it's my own observation. I think your body language is too much, and I feel it is very unnatural and isn't meaningful for the content. I don't know if you are actually forcing the body language or not, but I think body language is something that happens naturally and you don't need to try too hard for it. When I first started to watch your videos, that was something that was repelling for me personally, but when I saw your content, I became a fan of your channel. So I hope you take it as constructive feedback from a fan.

  • @Mutual_Information

    @Mutual_Information

    9 months ago

    Thank you, appreciate the genuine feedback, and I know what you mean. There's this awkward robotic-ness that's difficult to shake. But I think some of it is due to this setup. In my more recent videos, my new setup has hopefully brought the unnaturalness down. A work in progress. I also may de-burden myself with trying to match my language with what I'll anticipate will be on screen.