Reinforcement Learning with sparse rewards

Science & Technology

In this video I dive into three advanced papers that address the problem of the sparse reward setting in Deep Reinforcement Learning and pose interesting research directions for mastering unsupervised learning in autonomous agents.
Papers discussed:
Reinforcement Learning with Unsupervised Auxiliary Tasks - DeepMind:
arxiv.org/abs/1611.05397
Curiosity Driven Exploration - UC Berkeley:
arxiv.org/abs/1705.05363
Hindsight Experience Replay - OpenAI:
arxiv.org/abs/1707.01495
If you want to support this channel, here is my patreon link:
/ arxivinsights --- You are amazing!! ;)
If you have questions you would like to discuss with me personally, you can book a 1-on-1 video call through Pensight: pensight.com/x/xander-steenbr...

Comments: 93

  • @michaelc2406 · 6 years ago

    I've just been reading these papers for the openai retro competition. Your video went into a lot of depth, which is really hard to do with complex ideas, bravo!

  • @DeaMikan · 3 years ago

    Seriously great, I'd love to see an updated video with the newest research!

  • @pasdavoine · 6 years ago

    Fantastic video! It saves me time, and in an enjoyable way. Many thanks

  • @henning256yt · 2 years ago

    Love your passion for what you are talking about!

  • @AnkitBindal97 · 6 years ago

    Your teaching style is incredible! Can you please do a video on Capsule Networks?

  • @mohammadhatoum · 5 years ago

    Always impressive, and I never get bored watching your videos. Good job and keep it up 👍

  • @thomasbao4477 · 4 years ago

    AMAZING! The prediction-reward algorithm in the first mentioned paper is very similar to how humans learn, at least based on a computational neurobiology course I took in college.
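The reward-prediction auxiliary task mentioned here comes from the UNREAL paper (arXiv:1611.05397): alongside its main objective, the agent learns to classify the upcoming reward as zero, positive, or negative from a short history of frames, with replay sampled so that rewarding and non-rewarding histories are equally represented. Below is a minimal sketch of only the sampling and labeling step, assuming a replay buffer stored as a list of (frame, reward) pairs and a history of k = 3 frames; the function names are hypothetical and the network that would consume these batches is omitted.

```python
import random


def reward_class(r):
    """Map a scalar reward to a class label: 0 = zero, 1 = positive, 2 = negative."""
    if r > 0:
        return 1
    if r < 0:
        return 2
    return 0


def sample_reward_prediction_batch(replay, batch_size, k=3, rng=random):
    """Build (history, label) pairs for a reward-prediction auxiliary task.

    `replay` is a list of (frame, reward) pairs, where `reward` is the reward
    observed right after that frame. Each example's input is k consecutive
    frames and its label is the class of the reward following the last one.
    Rewarding and non-rewarding examples are drawn with equal probability,
    no matter how rare rewards are in the buffer.
    """
    candidates = range(k - 1, len(replay))
    if not candidates:
        raise ValueError("replay buffer is shorter than the history length k")
    rewarding = [t for t in candidates if replay[t][1] != 0]
    non_rewarding = [t for t in candidates if replay[t][1] == 0]
    batch = []
    for _ in range(batch_size):
        # Fair coin between the two pools; fall back to whichever is non-empty.
        if rewarding and non_rewarding:
            pool = rewarding if rng.random() < 0.5 else non_rewarding
        else:
            pool = rewarding or non_rewarding
        t = rng.choice(pool)
        history = [replay[i][0] for i in range(t - k + 1, t + 1)]
        batch.append((history, reward_class(replay[t][1])))
    return batch
```

Even if only 2 of 100 buffer entries carry a reward, roughly half of each batch is built around them, which is what keeps the auxiliary classifier from simply predicting "zero" every time.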

  • @adrienforbu5165 · 3 years ago

    It's always interesting to see how ideas around curiosity have taken off in reinforcement learning (I'm thinking of the "Never Give Up" paper and Atari57).

  • @timonix2 · 2 years ago

    Holy shit. I have been working on this problem for months, and seeing that professionals are getting almost exactly the same answers as me is pretty cool. There are a whole bunch of ideas in here I have not tried yet as well. Super useful

  • @Jabrils · 5 years ago

    fantastic content lad!

  • @Frankthegravelrider · 5 years ago

    Ah dude, just discovered your videos!! Just what I needed. Can't believe it: I have a 6-year degree in engineering, work in AI, and I can still learn from YouTube. Mad when you think about it. It's a new paradigm of education.

  • @ArxivInsights · 5 years ago

    Haha, glad to hear that! You're welcome :)

  • @CalvinJKu · 6 years ago

    Awesome video as usual!

  • @LatinDanceVideos · 5 years ago

    Great channel. Thanks for this and other videos.

  • @ItalianPizza64 · 6 years ago

    Amazing video again! Clear and concise as always, which is anything but trivial with topics like these. I am very curious to see what you will be focusing on next!

  • @bonob0123 · 5 years ago

    great stuff well done man

  • @robosergTV · 6 years ago

    need more deep RL stuff ^^

  • @aliamiri4524 · 3 years ago

    amazing content(s), you are a very good teacher

  • @mashpysays · 6 years ago

    Thanks for the nice explanation.

  • @armorsmith43 · 3 years ago

    This is a very effective strategy for personal productivity as a programmer with ADHD. I augment my unreliable reward-signaling system with Test-Driven Development.

  • @miriamramstudio3982 · 3 years ago

    Excellent video! Thx.

  • @ianprado1488 · 6 years ago

    You make high quality videos A+

  • @satyaprakashdash8203 · 4 years ago

    I would like to see a video on meta reinforcement learning. It's an exciting field now!

  • @glorytoarstotzka330 · 5 years ago

    No clickbait, good video quality, good sound, relatively nice topics for some people, but only 16k subs? Excuse me, wtf

  • @sebastianjost · 3 years ago

    The video quality is great but the topics are just not interesting for many people. And of course few subs makes it hard to find this channel. I'm glad I did though. This is a great overview.

  • @andrestorres2836 · 5 years ago

    Your videos are awesome!! I'm going to tell all my friends about you

  • @skyheart_dev · a year ago

    Maan it is so damn interesting and good video. I come from a completely different area - game development. And I wanted to understand some basics of A.I because I really want to dive deep into this to eventually teach for example rocket to fly, flappy bird to jump, snake to play efficiently. Reading papers is really difficult without knowledge of some basics, and the way you explained all these things is so good. I still don't understand the terminology and all these formulas, but at least I got one step closer :) Thank you for this brilliant video :)

  • @Leibniz_28 · 4 years ago

    Really happy to find your channel, really sad to find so few videos on it.

  • @inspiredbynature8970 · 2 years ago

    you are doing great, keep it up

  • @emademad4 · 5 years ago

    Great content, great purposes. Please do more videos ASAP. I'm studying in the same field; could you suggest some links to good up-to-date articles?

  • @vadrif-draco · 11 months ago

    So "HER" basically starts off as "if I do this action, I can get to this goal", and then gradually learns how to flip the statement to "if I want to get to this goal, I need to do this action". Pretty nice.
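The relabeling idea described in this comment can be sketched in a few lines. This is a minimal sketch of the "future" strategy from the HER paper (arXiv:1707.01495), assuming episodes are stored as (state, action, achieved_goal) tuples, goals are coordinate tuples, and the reward is 0 when the achieved goal lies within a tolerance of the goal and -1 otherwise. The names her_relabel and sparse_reward, as well as k = 4 and tol = 0.05, are illustrative choices, not the paper's code.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Goal-conditioned sparse reward: 0 if the achieved goal is within tol of the goal, else -1."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode, goal, k=4, rng=random):
    """Return replay transitions for one episode using HER-style 'future' relabeling.

    `episode` is a list of (state, action, achieved) tuples, where `achieved`
    is the goal-space position reached after taking the action. Every step is
    stored once with the real goal, plus k extra copies whose goal is replaced
    by a position actually reached at the same or a later step of the episode.
    """
    transitions = []
    for t, (state, action, achieved) in enumerate(episode):
        # Original transition with the real goal (mostly -1 in a sparse task).
        transitions.append((state, action, goal, sparse_reward(achieved, goal)))
        for _ in range(k):
            # Pretend a later achieved position was the goal all along.
            future = rng.randint(t, len(episode) - 1)
            new_goal = episode[future][2]
            transitions.append((state, action, new_goal, sparse_reward(achieved, new_goal)))
    return transitions
```

Because the goal is stored in every transition and fed to a goal-conditioned policy, a reward of 0 for a relabeled goal teaches the agent how to reach that particular goal rather than telling it that any outcome is good; generalization across goals is what is expected to carry over to the real one.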

  • @nikoskostagiolas · 6 years ago

    Hey dude, awesome video as always! Could you do one for the Relational Deep Reinforcement Learning paper of Zambaldi et al. ?

  • @sunegocioexitoso · 5 years ago

    Awesome video

  • @lukaslorenc4816 · 5 years ago

    I recommend reading "Curiosity-driven Exploration by Self-supervised Prediction"; it's a really awesome paper.
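The core signal in that curiosity paper (arXiv:1705.05363) is an intrinsic reward equal to a forward model's prediction error in feature space, r_i = (η/2)·||φ̂(s_{t+1}) − φ(s_{t+1})||². The sketch below keeps that formula but makes two clearly stated simplifications: φ is a fixed random projection rather than features learned through the paper's inverse-dynamics model, and the forward model is linear and trained with plain SGD. The class name and all hyperparameters are assumptions.

```python
import random


class ForwardModelCuriosity:
    """Curiosity bonus in the spirit of ICM: forward-model prediction error in feature space.

    Simplifications (assumptions, not the paper's setup): phi is a fixed random
    linear projection of the state instead of being learned with an inverse
    model, and the forward model is linear, trained by plain SGD.
    """

    def __init__(self, state_dim, n_actions, feat_dim=8, eta=1.0, lr=0.05, seed=0):
        rng = random.Random(seed)
        self.n_actions = n_actions
        self.eta = eta
        self.lr = lr
        # phi(s) = P s with a fixed random projection P (feat_dim x state_dim).
        self.P = [[rng.gauss(0, 1 / state_dim ** 0.5) for _ in range(state_dim)]
                  for _ in range(feat_dim)]
        # Forward model: phi_hat(s') = W [phi(s); onehot(a)], initialised to zero.
        self.W = [[0.0] * (feat_dim + n_actions) for _ in range(feat_dim)]

    def features(self, state):
        return [sum(p * s for p, s in zip(row, state)) for row in self.P]

    def _input(self, state, action):
        onehot = [1.0 if i == action else 0.0 for i in range(self.n_actions)]
        return self.features(state) + onehot

    def reward(self, state, action, next_state, update=True):
        """Return eta/2 * ||phi_hat(s') - phi(s')||^2, optionally taking one SGD step."""
        x = self._input(state, action)
        target = self.features(next_state)
        pred = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        err = [p - t for p, t in zip(pred, target)]
        r_int = 0.5 * self.eta * sum(e * e for e in err)
        if update:
            # Gradient of 0.5 * ||err||^2 with respect to W[i][j] is err[i] * x[j].
            for i, e in enumerate(err):
                for j, xj in enumerate(x):
                    self.W[i][j] -= self.lr * e * xj
        return r_int
```

Calling reward() repeatedly on the same transition makes the bonus shrink as the forward model learns it, while a transition it has not seen still yields a large bonus; that contrast is what pushes a curious agent toward novel states.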

  • @minos99 · 2 years ago

    I was really touched by the ending of the video. We need research on models and the social-economic consequences of the AI models...and I don't mean that terminator, Butlerian jihad crap. I mean human side: job losses, bias, morality, misuse...etc

  • @QNZE5 · 6 years ago

    Hey, very nice video :) What is the source for that video containing the boat in a behavioural circuit?

  • @mountain_bouy · 5 years ago

    you are amazing

  • @DjChronokun · 5 years ago

    If it wasn't for this channel, I'd never have known it wasn't pronounced 'ark-ziv'

  • @ritajitdey7567 · 5 years ago

    Same here, at least we got it corrected without embarrassing ourselves IRL

  • @wahabfiles6260 · 4 years ago

    @ritajitdey7567 INR

  • @cyrilfurtado · 6 years ago

    Great video, I can now look to read the papers. It would be great to post the links of the papers here

  • @ArxivInsights · 6 years ago

    All links are in the video description! :)

  • @adityaojha627 · 3 years ago

    Nice video. Question: Is DDQN efficient at solving sparse reward environments? Say I only give an agent a reward at the end of an episode.
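Double DQN changes how the bootstrapped target is computed (to reduce overestimation) but does not add an exploration mechanism, so a reward given only at the end of an episode still has to be found by exploration and then propagated backwards. A small sketch of the discounted return G_t = r_t + γ·G_{t+1} for a hypothetical 500-step episode with a single terminal reward shows how weak the signal is at the start; γ = 0.99 and the episode length are assumptions for illustration, and this is a property of discounted returns in general, not of DDQN specifically.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Hypothetical episode: 500 steps, reward 1.0 only on the final step.
rewards = [0.0] * 499 + [1.0]
G = discounted_returns(rewards)
# G[-1] is 1.0, while G[0] equals gamma ** 499, roughly 0.0066.
```

And this is the best case: it assumes the agent reached the terminal reward at all, which random exploration may rarely do in a long episode.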

  • @ThibaultNeveu · 6 years ago

    Very nice video. Thank you :)

  • @420_gunna · 6 years ago

    Great vid! re: the ending of the video, what do you think about creating something on AI safety or ethics?

  • @ArxivInsights · 6 years ago

    Actually, that's a really good suggestion! Added to my pipeline :)

  • @AnonymousAnonymous-ht4cm · 5 years ago

    Have you seen Robert Miles' channel? He has some good stuff on AI safety, but posts rather infrequently.

  • @saikat93ify · 5 years ago

    This channel is a really amazing initiative, as I've always found arXiv extremely interesting but don't have the time to read all the papers. :) This question may sound very silly, but how do programs play games like Mario and Reversi? What I mean is, don't we need some kind of hardware like a keyboard or joystick to play these games? How do software agents play them? I have always been curious about this. Please explain if anyone has the answer. :)

  • @ArxivInsights · 5 years ago

    It's not that hard to hack the game engine so that an RL agent controls the game inputs via an API (so you can do that from e.g. Python) instead of via a controller/joystick. In most gym games there's even an option to train your agent from the raw game state instead of the rendered pixel version!
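To make that reply concrete: the environment exposes functions the agent calls instead of pressing buttons. Below is a self-contained toy environment that mimics the reset()/step() convention of the Gymnasium API, where step returns (observation, reward, terminated, truncated, info). The CorridorEnv class and run_episode helper are hypothetical and made up for illustration; they are not part of any library.

```python
class CorridorEnv:
    """Toy 1-D corridor with a sparse reward, using a Gymnasium-style interface.

    Actions: 0 = move left, 1 = move right. The reward is 1.0 only when the
    agent reaches the right end; the episode is truncated after max_steps.
    """

    def __init__(self, length=10, max_steps=50):
        self.length = length
        self.max_steps = max_steps

    def reset(self, seed=None):
        # seed is accepted for interface parity; this toy environment is deterministic.
        self.pos = 0
        self.steps = 0
        return self.pos, {}

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        self.steps += 1
        terminated = self.pos == self.length - 1
        truncated = self.steps >= self.max_steps and not terminated
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def run_episode(env, policy):
    """The agent 'presses a button' simply by returning an integer action."""
    obs, info = env.reset()
    total = 0.0
    while True:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total += reward
        if terminated or truncated:
            return total
```

A policy that always moves right collects the reward, one that always moves left times out with 0.0, and a random policy may need many episodes before it ever sees the reward, which is the sparse-reward problem in miniature.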

  • @TheAcujlGamer · 3 years ago

    This is so cool, especially the "HER" method. Wow!

  • @aytunch · 4 years ago

    Great videos and channel. Why don't you make any more videos? :(

  • @hassanbelarbi5185 · 4 years ago

    If someone wants to contact you directly, is there any way? I have some questions related to my thesis topic. Thanks in advance for your efforts.

  • @samanthaqiu3416 · 4 years ago

    Make a video on the MuZero paper

  • @DistortedV12 · 6 years ago

    Smart guy

  • @artman40 · 6 years ago

    What about delayed rewards?

  • @ycjoelin000 · 6 years ago

    What's the website you used in 2:23?

  • @Vladeeer · 6 years ago

    Can you do an example for RL?

  • @jeffreylim5920 · 5 years ago

    7:56 where the main point starts

  • @codyheiner3636 · 5 years ago

    Hi Xander, I made a Patreon account just for you! Keep it up!

  • @ArxivInsights · 5 years ago

    Thx a lot Cody!! Getting this kind of support from people I've never met is such a great motivation to keep going! Many thanks :)

  • @arjunbemarkar7414 · 5 years ago

    Can you tell me where you find these articles?

  • @areallyboredindividual8766 · 3 years ago

    The website appears to be arXiv. Searching for DeepMind and OpenAI papers will yield results too.

  • @markusdegen6036 · 6 years ago

    Hi, I am completely new to the topic of machine learning itself... just a thought... when you use these sparse rewards, would it be possible to have each reward in a forced version and a free-will version... and then enforce not having forced ones? It sounds a bit abstract right now... when I get a better grasp of things, maybe I can rephrase that later.

  • @ArxivInsights · 6 years ago

    Markus Degen A bit abstract indeed. In general the current paradigm is as follows: we want to give the algorithm sparse extrinsic rewards because those are usually easy to define and relatively unambiguous: 'win the game', 'stack object A on top of B', ... However, there are many people working on algorithms that create their own derivative intrinsic reward signals. In human terms those are things like motivation, passion, curiosity, ... that might not be directly linked to extrinsic rewards (paychecks, eating food, sex, ...) but seemingly evolution has shaped those drives to overcome similar problems as Deep RL is facing right now!
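The split in this reply between sparse extrinsic rewards and self-generated intrinsic rewards can be made concrete. As a stand-in for the learned curiosity signals discussed in the papers, this sketch uses a simple count-based novelty bonus β/√N(s) for discrete states, a different and older exploration technique swapped in for brevity; the names CountBonus and shaped_reward and the value β = 0.1 are assumptions.

```python
import math
from collections import Counter


class CountBonus:
    """Intrinsic reward beta / sqrt(N(s)): large for rarely visited states, shrinking with visits."""

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = Counter()

    def __call__(self, state):
        # Count this visit first, so the first visit to a state yields exactly beta.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


def shaped_reward(extrinsic, state, bonus):
    """Total reward the agent learns from: sparse extrinsic signal plus an intrinsic novelty term."""
    return extrinsic + bonus(state)
```

With β = 0.1, the first visit to a state adds 0.1 and the fourth visit adds 0.05, so the agent keeps getting small "motivation" for reaching new states even while the extrinsic reward is still 0.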

  • @vigneshamudha821 · 5 years ago

    brother please explain about capsule network

  • @ArxivInsights · 5 years ago

    Aurelien Geron has a great video on CapsNets, no need to redo his video, it's already perfect! kzread.info/dash/bejne/ooSCmsZpdZafYJM.html

  • @vigneshamudha821 · 5 years ago

    +Arxiv Insights thanks bro

  • @MasterofPlay7 · 4 years ago

    any coding videos?

  • @shivajbd · 5 years ago

    15:29 Modi

  • @skbshubham · 4 years ago

    lol!!

  • @wahabfiles6260 · 4 years ago

    Why is his head bigger than his body? Alien?

  • @StevenSmith68828 · 5 years ago

    I really like machine learning because it feels like training a Pokémon. Sure, it sometimes takes a very long time to get it set up, but yeah...

  • @viralblog007 · 5 years ago

    Can you suggest a link to a research paper on reinforcement learning?

  • @dripdrops3310 · 5 years ago

    The number of views of your videos is not proportional to their quality. Looking forward to new content!

  • @herrizaax · 5 years ago

    Nice video. I didn't get the last part: how does it learn faster if it sets virtual goals? If it gets the same reward for a virtual goal as for the real goal, then it will just learn it can shoot at any point which is made a goal but the real goal will never be found. If it gets a lower reward then it learns that shooting at goals gives a reward but it tells nothing about the proximity to the real goal. I'm obviously missing something here and I'm really curious what it is. Thank you :)

  • @planktonfun1 · 4 years ago

    big brain filter

  • @WerexZenok · 6 years ago

    I don't see any social problem automation can cause. If you let the market be free, it will adjust itself as it always has.

  • @egparker5 · 6 years ago

    I sort of feel the same way. We shouldn't make any public policy decisions until we see actual damage happening, and not just overexcited predictions. So far it seems DL/ML is creating net additional jobs and increasing average salaries. If that changes, then maybe it is time to think about new public policies. In the meantime, I would recommend retargeting the time spent worrying about AI into time spent learning about AI to increase your human capital. www.wsj.com/articles/workers-fear-not-the-robot-apocalypse-1504631505 www.forbes.com/sites/bernardmarr/2017/10/12/instead-of-destroying-jobs-artificial-intelligence-ai-is-creating-new-jobs-in-4-out-of-5-companies

  • @WerexZenok · 6 years ago

    Agreed. And even imagining the worst scenario, where AI replaces all jobs, we will still be able to own bots and rent them out. We will live like gods on earth.

  • @NegatioNZor · 6 years ago

    The question here though, is WHO will be owning these robots, and how will these jobs be distributed? For highly educated and resourceful people, this will probably not be a huge issue. But there are something like N million truck drivers in the US, which will have a much harder time adjusting. Going from blue-collar to white-collar is probably not as easy.

  • @elarrayhesohit4479 · 4 years ago

    I just want my computer to grind levels, not take my job.

  • @Rowing-li6jt · 5 years ago

    louder pls

  • @MD-pg1fh · 6 years ago

    Her?

  • @tsunamio7750 · 4 years ago

    VOLUME TOO LOW!!!

  • @creativeuser9086 · a year ago

    what happened to this channel..

  • @loopuleasa · 6 years ago

    Some feedback on your video: trim your content and make the videos more entertaining. Watch how Siraj does it. Personally, I dozed off a couple of times, even though the accuracy of the content is high. Basically: use fewer words, fewer images, less intro, less buildup, and focus more on the crux, while going faster to keep your audience on edge and curious. Hope my view is useful to you. Good luck.

  • @loopuleasa · 6 years ago

    Do it like an AI optimizer does it. Minimize and use simplicity as much as possible until you reach the goal: communicate the idea you want to convey in as little time and with as few actions as possible.
