Training AI Without Writing A Reward Function, with Reward Modelling

Science & Technology

How do you get a reinforcement learning agent to do what you want, when you can't actually write a reward function that specifies what that is?
The paper: arxiv.org/pdf/1706.03741.pdf
The blogpost: openai.com/blog/deep-reinforc...
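Below is a minimal sketch of the core idea behind the linked paper (arXiv 1706.03741). A human compares pairs of trajectory segments, and a reward model is fitted so that P(a preferred over b) = exp(R_a) / (exp(R_a) + exp(R_b)), where R is the predicted reward summed over a segment. This sketch makes several simplifying assumptions. The reward model is linear over hand-made feature vectors instead of a neural network. All names (train_reward_model, prob_a_preferred, the toy segments) are hypothetical and for illustration only.

```python
import math

# Sketch of fitting a reward model to pairwise human preferences.
# Assumptions (illustrative only): observations are plain feature vectors
# and the reward model is linear, r_hat(obs) = w . obs.

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def segment_return(w, segment):
    """Predicted reward summed over every observation in a segment."""
    return sum(sum(wi * xi for wi, xi in zip(w, obs)) for obs in segment)

def prob_a_preferred(w, seg_a, seg_b):
    """exp(R_a) / (exp(R_a) + exp(R_b)), which equals sigmoid(R_a - R_b)."""
    return sigmoid(segment_return(w, seg_a) - segment_return(w, seg_b))

def feature_sum(segment):
    """Element-wise sum of a segment's observation vectors."""
    return [sum(column) for column in zip(*segment)]

def train_reward_model(comparisons, dim, lr=0.1, epochs=200):
    """Gradient descent on the cross-entropy between predicted preferences
    and human labels. Each comparison is (seg_a, seg_b, mu): mu = 1.0 if the
    human preferred a, 0.0 if b, 0.5 if the two were judged equal."""
    w = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, mu in comparisons:
            p = prob_a_preferred(w, seg_a, seg_b)
            fa, fb = feature_sum(seg_a), feature_sum(seg_b)
            # For a linear model, d(loss)/dw = (p - mu) * (fa - fb)
            w = [wi - lr * (p - mu) * (ai - bi) for wi, ai, bi in zip(w, fa, fb)]
    return w

# Toy usage: a simulated "human" prefers segments that score high on feature 0.
seg_good = [[1.0, 0.0], [1.0, 0.0]]
seg_bad = [[0.0, 1.0], [0.0, 1.0]]
comparisons = [(seg_good, seg_bad, 1.0), (seg_bad, seg_good, 0.0)]
w = train_reward_model(comparisons, dim=2)
print(w, prob_a_preferred(w, seg_good, seg_bad))
```

In the full method, an RL agent is then trained against the learned reward in place of a hand-written reward function. Meanwhile, new human comparisons keep refining the reward model.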
Thanks to my wonderful patrons:
/ robertskmiles
James
Gladamas
Steef
Scott Worley
Jordan Medina
Simon Strandgaard
JJ Hepboin
Pedro A Ortega
Said Polat
Chris Canal
Jake Ehrlich
Kellen lask
Francisco Tolmasky
Michael Andregg
David Reid
Robert Daniel Pickard
Peter Rolf
Chad Jones
Richárd Nagyfi
Jason Hise
Phil Moyer
Shevis Johnson
Erik de Bruijn
Alec Johnson
Clemens Arbesser
Ludwig Schubert
Bryce Daifuku
Allen Faure
Eric James
Qeith Wreid
Jonatan R
Ingvi Gautsson
Michael Greve
Julius Brash
Tom O'Connor
Robin Green
Laura Olds
Jon Halliday
Paul Hobbs
Jeroen De Dauw
Lupuleasa Ionuț
Tim Neilson
Eric Scammell
Igor Keller
Ben Glanton
anul kumar sinha
Sean Gibat
Cooper Lawton
Will Glynn
Tyler Herrmann
Tomas Sayder
Ian Munro
Jérôme Beaulieu
Nathan Fish
Taras Bobrovytsky
Anne Buit
Vaskó Richárd
Sebastian Birjoveanu
Euclidean Plane
Andrew Harcourt
DGJono
robertvanduursen
Dmitri Afanasjev
Marcel Ward
Andrew Weir
Ben Archer
Kabs
Miłosz Wierzbicki
Tendayi Mawushe
Jannik Olbrich
Anne Kohlbrenner
Jussi Männistö
Wr4thon
Martin Ottosen
Archy de Berker
Marc Pauly
Andy Kobre
Brian Gillespie
Poker Chen
Kees
Darko Sperac
Truls
Paul Moffat
Anders Öhrt
Marco Tiraboschi
Michael Kuhinica
Fraser Cain
Robin Scharf
Seth Brothwell
Kasper Schnack
Klemen Slavic
Patrick Henderson
Oct todo22
Melisa Kostrzewski
Hendrik
Daniel Munter
Graham Henry
Duncan Orr
Bryan Egan
Robert Hildebrandt
James Fowkes
Alan Bandurka
Ben H
Tatiana Ponomareva
Michael Bates
Simon Pilkington
Dion Gerald Bridger
Petr Smital
Daniel Kokotajlo
Fionn
Yuchong Li
Diagon
Parker Lund
Paul Emmerich
Russell schoen
Andreas Blomqvist
Bertalan Bodor
David Morgan
Jeremy
Ben Schultz
Zannheim
Daniel Eickhardt
lyon549
HD
Ihor Mukha
14zRobot
Ivan
Arne Strasser
Jason Cherry
Igor (Kerogi) Kostenko
Isaac Boates
Thomas Dingemanse
Davy Ker
Alexander Brown
Devon Bernard
Ted Stokes
James Helms
Matheson Bayley
/ robertskmiles

Comments: 925

  • @atimholt · 4 years ago

    “Are scissors technology?” Me: yeah, of course. “Most people would say no.” ¯\_(ツ)_/¯

    ◦ @totally_not_a_bot · 4 years ago

    Those of us who watch these videos don't really qualify as most people.

    ◦ @pauljs75 · 4 years ago

    Even sticks can count as technology, if implemented as tools in some way. (Combination of tools and methods to achieve some goals. Usually making a task easier, or doing something else that improves conditions for the tool user.) Obviously such is not the latest and greatest technology, which seems to be the definition this video is going for.

    ◦ @lucar6897 · 4 years ago

    I also think of calculators as artificial intelligence...

    ◦ @shayneoneill1506 · 4 years ago

    Yeah, the part of my brain that did those anthropology units would never let me think scissors aren't technology

    ◦ @NoahTopper · 4 years ago

    When I was a kid I definitely would have said no. But I remember at some point being taught that anything along the line of a pencil or chair was technology, and that sunk in. But I imagine a lot of people still have that initial instinct.

  • @NoahTopper · 4 years ago

    "If you squint, the training process is sort of like a compiler." Totally brilliant statement.

    ◦ @shouldb.studying4670 · 4 years ago

    I had to squint AND tilt my head but I see what he means 🤣

    ◦ @ZyTelevan · 4 years ago

    Code is data is code

    ◦ @BlargVison · 4 years ago

    yeah that was a fantastic comparison that i won't forget

    ◦ @kacperozieblowski3809 · 4 years ago

    I agree

    ◦ @filipgara3444 · 4 years ago

    „No.”

  • @columbus8myhw · 4 years ago

    "Like, there's no point asking for feedback if you're already pretty sure you know what the answer is, right?" …Do you want me to answer that question?

  • @TheLifeInMotion · 4 years ago

    According to Strong Bad: "Technology is anything that you don't understand how it works and if you break it you have to buy a new one."

    ◦ @chrisdaley2852 · 4 years ago

    So retractable pens are technology. Got it.

    ◦ @CH-bd6jg · 4 years ago

    @@chrisdaley2852 pen goes in, pen goes out. you can't explain that! just buy a new one!

    ◦ @columbus8myhw · 4 years ago

    Chris Daley I mean, yes… but also I was willing to say scissors are technology so maybe I'm not a good judge of these things

    ◦ @OrchidAlloy · 4 years ago

    @@chrisdaley2852 Yes they are

    ◦ @diablominero · 4 years ago

    So my desktop computer isn't technology because I built it and could replace a single broken component rather than the whole thing?

  • @FortoFight · 4 years ago

    If you think about it, this is a lot closer to how a human learns. A human won't constantly bug you for feedback every single time it does something, nor will it learn how to do something properly from a standardised function (e.g. exam mark schemes). A human will independently use its available knowledge, and occasionally ask for help when it's unsure what to do.

    ◦ @dannygjk · 1 year ago

    Do you have any children? Kids seek approval.

    ◦ @owenpawling3956 · 1 year ago

    @@dannygjk no, but he is right. Kids are just unsure more often.

    ◦ @Nico-ur2po · 1 year ago

    @@dannygjk You don't correct a kid every time they talk using improper grammar or mix up word order. You correct them every now and then, and they learn over time combined with observing how other humans talk.

    ◦ @dannygjk · 1 year ago

    @@Nico-ur2po I didn't (I have two kids).

  • @discosteve · 4 years ago

    Your point still stands, but nevertheless the scissors have a butt load of tech in the background that us normies aren't aware of (material science). Just wanted to mention that the humble pair of scissors deserves some praise.

    ◦ @DDvargas123 · 4 years ago

    I was thinking the same thing. We take for granted a lot of the cool tech around us all the time. Levers and pulleys and other simple machines most of all. But Rob makes a good point that people don't commonly think of them as tech even though perhaps they should. Language is a cruel mistress.

    ◦ @infinummjb · 4 years ago

    scissors are relatively low-tech, but a tech nonetheless.

    ◦ @columbus8myhw · 4 years ago

    Would you consider a scissors company a "tech company" the same way you'd consider Apple and SpaceX tech companies? What about post-its? Is 3M a tech company?

    ◦ @DDvargas123 · 4 years ago

    @@columbus8myhw 3M's company description is literally: "applies science and innovation to make a real impact by igniting progress and inspiring innovation in lives and communities across the globe." That sounds really tech company to me

    ◦ @RobertMilesAI · 4 years ago

    I think if you took someone to a scissors factory and showed them all the machines and equipment of the production line, they'd call that technology. But not so much the scissors themselves

  • @Henrix1998 · 4 years ago

    I can already imagine the Indian ML farms where thousands of people just evaluate learning

    ◦ @TurkishLoserInc · 4 years ago

    Sounds a lot like the premise for The Matrix. "On a scale of 1-10, how real do you think this is?"

    ◦ @Encypruon · 4 years ago

    It's called Amazon Mechanical Turk.

    ◦ @Verrisin · 4 years ago

    Damn, that actually sounds likely... - Here is my idea: since AI will take all our jobs... There will be one job of the future: *Specifying preference.* - I actually don't hate it. :D

    ◦ @Verrisin · 4 years ago

    ... thinking about it: It kind of is the ideal job, isn't it? Do we, as humans, even want to do anything more than that? - Our job will be saying what we want in the world, and how we want things to work... It will even work as a voting mechanism for policies since they will be run by AI - that figures out how to best match our preferences... - I think this is the way... (or at least a good direction for now ^^)

    ◦ @benalias5766 · 4 years ago

    I can already imagine a complex AI which is surprisingly good at a wide variety of tasks... and turns out to have hired a load of people in India to do its work for it.

  • @Macieks300 · 4 years ago

    "in a later video" well... see you in 3 months then

    ◦ 4 years ago

    This channel is always worth the wait :)

    ◦ @griest5493 · 4 years ago

    IKR, what a tease.

    ◦ @MatthewStinar · 4 years ago

    You can't rush this kind of quality! Do you know how long it takes to read and digest all those research papers?

    ◦ 4 years ago

    ... almost there.

    ◦ @Macieks300 · 4 years ago

    @ to be fair Robert was on Computerphile in the meantime kzread.info/dash/bejne/ZWWmt4-Pqqmbp9o.html

  • @TheMan83554 · 4 years ago

    The thing about your channel is the little touches of 4th-wall humour. Having backflip-you say "wait, I don't have to do a backflip?" was brilliant.

  • @riccardoorlando2262 · 4 years ago

    So in a couple years captchas will be reward predictor training? "Which of these is the better shoe design"?

    ◦ @toxicpsion · 4 years ago

    nah, i'd bet they do it already; just more subtly than that.

    ◦ @LoveScreamTrue · 4 years ago

    @@toxicpsion Like Google CAPTCHA? - "Select all traffic lights"

    ◦ @johnnymellon7414 · 4 years ago

    "Select all the pictures with Sarah Connor in them" ... wait what?

    ◦ @z-beeblebrox · 4 years ago

    @@LoveScreamTrue Except it'll become "Select your favorite traffic lights"

    ◦ @stribika0 · 4 years ago

    Which of these places do you prefer as a shelter during a robot uprising?

  • @Noxeus1996 · 4 years ago

    Definitely one of the best educational channels on YouTube.

    ◦ @zacharieetienne5784 · 4 years ago

    hold on to your papers and i'll see you, next time!

    ◦ @CynicatPro · 4 years ago

    @@zacharieetienne5784 TwoMinutePapers is also super good X3

    ◦ @hypebeastuchiha9229 · 1 year ago

    @@CynicatPro he sucks

  • @stefano8936 · 4 years ago

    Robert Miles: "what is technology?" Me: move the finger to calibrate the amount of video to skip. Robert Miles: "don't skip ahead" Me: humbly obey

    ◦ @GrixM · 4 years ago

    I feel betrayed because the next 5 minutes were just repetition of previous videos so I wish I had in fact skipped ahead.

    ◦ @jnevercast · 4 years ago

    Yeah he got me too. I was about to skip just as he said don't skip. "Well okay!"

    ◦ @Atariese · 4 years ago

    The thing is... the question he poses after that leads me down that rabbit hole and away from his video... definitely not the intent i would say

    ◦ @riperian8954 · 2 years ago

    @@GrixM lol i did exactly what you and OP did, only I was like 'okay okay that's enough of that' after about 2 minutes. still a brilliant video overall though xd.

  • @sharkinahat · 4 years ago

    I wouldn't mind an ad. YT trained me how to skip paid promotion.

    ◦ @weirdal3333 · 4 years ago

    YouTube Vanced vanced.app

    ◦ @rr.studios · 4 years ago

    @@weirdal3333 lol im using this app rn

    ◦ @HansLemurson · 4 years ago

    What sort of reward function did you use?

  • @zeikjt · 4 years ago

    8:50 That backflip part was super enjoyable :D

  • @megajor232 · 4 years ago

    Watching your videos makes me feel smart without actually having to be

    ◦ @benalias5766 · 4 years ago

    Sounds like you're gaming your reward metric.

    ◦ @ephemeralvapor8064 · 4 years ago

    Maybe your evaluation of his teaching is just: Good teacher = true. Because he brings understanding that lesser teachers could not, in the same time and with the same effort on your part.

  • @jessgold551 · 4 years ago

    I have watched all of Robert's videos several times. It's perfectly paced, well considered and clearly communicated. There is so much there that it's interesting to watch, sleep on it, and watch again later to catch more. I also enjoy the presentation and the multiple interesting ways of presenting things, like word pop-ups and cuts to screen, as well as some graphics and clips. If it helps with demographics, I am a former software engineer and still work in I.T.

  • @MrCreeper20k · 4 years ago

    17:25 Don't worry Robert, at least I don't mind an ad at the end. And if anyone should get that bread, it's you.

  • @the1gip · 4 years ago

    You, sir, remain one of the most interesting educators on YouTube. The effort you've put into making this video watchable and entertaining really shows. There aren't many people I can watch for nearly 18 minutes in front of a beige backdrop and still be hooked.

  • @Felixkeeg · 4 years ago

    I am actually a bit disappointed that you didn't go for the backflip lol

    ◦ @ruvimlashchuk6134 · 4 years ago

    My disappointment is immeasurable, and my day is ruined.

    ◦ @Suush · 4 years ago

    He forgot to program a reward function :P

  • @igordmitriev7211 · 4 years ago

    >We'll talk about them in a later video //Gets hyped, realises that it's the latest video on the channel, gets reminded of Patreon, enlists to see the video a bit sooner

  • @amyshaw893 · 4 years ago

    just replace the human with another ai, and get the human to rate that ai. not good enough? MOAR AI!!11!!

    ◦ @DDvargas123 · 4 years ago

    It's AIs all the way down!

    ◦ @thehypnotoad5184 · 4 years ago

    Just make an AI trained on footage of people doing backflips, no need for human input. Even if the AI is "only" 99% accurate, it should be enough.

    ◦ @DDvargas123 · 4 years ago

    @@thehypnotoad5184 "footage of people doing backflips" IS human input

    ◦ @thehypnotoad5184 · 4 years ago

    @@DDvargas123 I mean the input already exists, it just needs to be collected. It's kind of going full circle, but it would be interesting to see if you can speed up the reward model that way.

    ◦ @rumplstiltztinkerstein · 4 years ago

    @@thehypnotoad5184 but the AI will find ways to exploit it. Nothing stops us from giving it the footage and having a human check it from time to time, telling it to stop using its head as a catapult when it was supposed to be running.

  • @OrioPrisco · 4 years ago

    Hey, it's really cool for the viewers that you turned down that sponsorship offer, thanks

    ◦ @dontfeo · 4 years ago

    Nah he should've taken it. U can skip it anyway and it would help him bring more content.

  • @Varue · 1 year ago

    Humans being able to simulate problems in their head to predict different outcomes is one of their greatest strengths, it means they can be confronted with new experiences they haven’t evolved specifically for and come up with a solution from a list of possible solutions and stand a much greater chance of overcoming the problem without dying

  • @brendanjackman3600 · 4 years ago

    "Hmm, reward functions are a limiting factor on some ML capabilities. This is a problem. How do we solve problems? WITH ML"

    ◦ @DDvargas123 · 4 years ago

    Sometimes a solution is so good it can solve its own cons

    ◦ @MichaelWBauer · 4 years ago

    It's definitely funny when you frame it this way, but it's also interesting to note the similarity here with the brain. The brain is a system of interconnected neural networks which each are responsible for certain aspects of our thinking capabilities. It's not too hard to imagine the connection between the logical extension of the results in this video and the architecture of the human brain.

    ◦ @default632 · 4 years ago

    @@MichaelWBauer Remember where the word neural network came from. Duh

    ◦ @MatthewStinar · 4 years ago

    I think you're describing a Generative Adversarial Network. en.m.wikipedia.org/wiki/Generative_adversarial_network

  • @wilhem13 · 4 years ago

    A video upload ?? My day's already better. Great content my friend, THIS is why I don't watch TV anymore.

  • @arthurguerra3832 · 4 years ago

    I've been so long without your videos. Please upload more frequently so we can drink your intelligence and knowledge.

  • @DamianReloaded · 4 years ago

    I would define intelligence as "the ability to autonomously identify problems and search for solutions to achieve goals"

  • @FrotLopOfficial · 4 years ago

    Those last few minutes of your video will go unnoticed by many, but those of us who do notice them very much appreciate it.

  • @briandoe5746 · 4 years ago

    I am in a room by myself and I audibly cussed when I heard that OpenAI and DeepMind were working together on something. Google's apparent lack of concern with safety is one of the reasons I want your videos, sir.

    ◦ @daniellewilson8527 · 4 years ago

    Brian Doe why is two AIs with different modes of thought working together a problem? Humans have different modes(parts of the brain specialized for different tasks) that combine the inputs from these disparate programs into a coherent idea of the world. Imagine trying to learn about your surroundings when the only sense you have is the ability to differentiate temperature and you will understand why certain AIs need others to help with things.

    ◦ @briandoe5746 · 4 years ago

    @@daniellewilson8527 My main concern with AI is not the speed with which it gets to general intelligence. My concern with AI is the safety mechanisms and their capabilities when it gets to general intelligence. Google has multiple times proven to be unconcerned about the safety question. This is highly concerning.

  • @explogeek · 4 years ago

    Loving your videos, I understand it takes time to research and script and edit, but I wish they came out more often...

    ◦ @dontyoufuckinguwume8201 · 4 years ago

    The guy has a full time job, the only way to get him to make more videos is to donate ^^

  • @DarkPrject · 4 years ago

    This continues to be one of the most interesting channels on YouTube. Fascinating video. Can't wait to see the next one.

  • @geronimomiles312 · 1 year ago

    You choose to tackle issues which really clarify the meat of the process, and you do it fantastically. Really good stuff👍

  • @Alex2Buzz · 4 years ago

    Miles: "What is technology?" *VSauce music*

    ◦ @ohokcool · 4 years ago

    Did u go to Palms Middle?

  • @StromyYTA · 4 years ago

    These videos are awesome. Feel almost like I can keep up to date with AI progress.

  • @n4th4ni3lmc5 · 4 years ago

    Awesome explanation and sounds like great progress in the field! Thank you very much, sir.

  • @cmoxiv · 4 years ago

    Mate, you are brilliant. Great content with a philosophical flavour. The last part about Patreon is probably the only thing that actually convinced me about supporting content creators on Patreon. Well done mate. Well done.

  • @morkovija · 4 years ago

    Been a long time Rob! Hope you brought the sauce!

    ◦ @non_complete · 4 years ago

    I agree wholeheartedly with your name.

    ◦ @wilhem13 · 4 years ago

    Most videos, I MUST watch at x1.25 at least.

    ◦ @morkovija · 4 years ago

    @@wilhem13 means that your content information density is quite high. No way I can speed up Mathologer, for example. But I can easily watch some non-narrated restoration videos at 2-3x.

  • @crypticnomad · 4 years ago

    When people ask me what AI is I generally say that it is a universal function approximator.

  • @Metrolonx · 4 years ago

    Love how the video quality grows with every video! Keep it up!

  • @frib75 · 4 years ago

    An amazing video. Never heard such a beautiful explanation of what reinforcement learning is. Thank you !

  • @wiktormigaszewski8684 · 4 years ago

    This is what I always thought of as making a good robot: you give it feedback while it learns, just like parents to a child. Very good that this concept has been put into practice. It is definitely going to be helpful for AI companies making robots for clients who do not know exactly what they need. The guy from "Two Minute Papers" would say "what a great time to be alive!" :-)

    ◦ @reneko2126 · 4 years ago

    Yeah, why not just raise AI like kids? kzread.info/dash/bejne/l5WNq7dvibvYY9o.html

    ◦ @circle688 · 9 months ago

    what a time to be alive

  • @Telhias · 4 years ago

    With regards to puppeteering the robot to perform a backflip. There is a whole community of the Toribash game who do exactly that. It is a game in which every time period (measured in ms) you decide which joints to flex, extend, hold rigid and relax.

  • @rosborr4330 · 4 years ago

    I subbed because you knew I'd skip ahead the moment you said 'What is technology?'. You win this round, Robert.

  • @esquilax5563 · 4 years ago

    Good to see you on here again! You have some of the most fascinating content on YouTube

  • @NoahTopper · 4 years ago

    12:19 I approve very greatly of your use of "eachother" as one word. The world needs this change. I don't know if you and I talked about this at all at the EA Hotel, but I've been trying to convince everyone to write it like that.

    ◦ @squirlmy · 4 years ago

    I started to do that, but "spell correct" too often comes on and I've gotten used to following automated corrections. I'm wondering if automated (or even AI writing assistants) will slow the evolution of language and grammar, and perhaps even pronunciation will remain in stasis not because of any changing dialect cues of social status, origin (or adopted location), or otherwise, but because of how our "correcting" algorithms are programmed in communication devices.

    ◦ @qwertyTRiG · 4 years ago

    @@squirlmy You've reminded me that I really need to create a dictionary with Oxford Spelling (en-GB-oed).

    ◦ @discipleoferis549 · 4 years ago

    I've been writing "eachother" for 15 years now. I've even told off some of my English teachers for trying to correct me. Heck... I remember back in 6th grade, I think, telling off my teacher for incorrectly correcting another student that had written "ain't". I was an opinionated 11-year-old, haha.

    ◦ @NoahTopper · 4 years ago

    @@discipleoferis549 I told my high school English teacher that I was attempting to turn "eachother" into one word, and asked if she'd be willing to not mark it wrong when I used it. She was super on board.

    ◦ @qwertyTRiG · 4 years ago

    @@NoahTopper It definitely makes sense. Similarly, I tend to distinguish between "alright" (acceptable) and "all right" (completely correct).

  • @mrWade101 · 4 years ago

    Scissors would be Old technology, whilst when most people say Technology they mean New technology.

  • @sjeses · 4 years ago

    Absolutely fascinating. Thank you for putting in all the time and effort to introduce me to all these ideas in such an effective way.

  • @bscutajar · 4 years ago

    This is one of the best channels on YouTube. The guy's explanations are extremely well done.

  • @firefoxmetzger9063 · 4 years ago

    Hmm. If samples are chosen based on unusual examples where the ensemble disagrees, what happens if the exploiting strategy has high agreement among members of the ensemble? It would never show up to the human for "correction", right, because the ensemble is confident about it? So rather than having to trust the network that performs the task, we now have to trust the ensemble training the reward function?

    ◦ @MatthewStinar · 4 years ago

    I was thinking you would still want to throw in some strong matches just to verify.
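The query-selection rule discussed in this thread can be sketched as follows. The rule is to ask the human only about pairs where the ensemble of reward predictors disagrees. This sketch assumes each ensemble member is a linear reward model over plain feature vectors, which simplifies the paper's neural-network ensemble. The helper names are hypothetical. The optional n_random spot checks follow the reply's suggestion and are not part of the disagreement rule itself. They give pairs that the ensemble is confidently wrong about at least some chance of reaching the human.

```python
import math
import random
import statistics

# Hypothetical helpers: each ensemble member is a linear reward model w.

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def segment_return(w, segment):
    """Predicted reward summed over a segment's observations."""
    return sum(sum(wi * xi for wi, xi in zip(w, obs)) for obs in segment)

def prob_a_preferred(w, seg_a, seg_b):
    """One member's predicted probability that seg_a is preferred."""
    return sigmoid(segment_return(w, seg_a) - segment_return(w, seg_b))

def select_queries(ensemble, candidate_pairs, k, n_random=0, seed=0):
    """Return indices of the k pairs whose predicted preference varies most
    across ensemble members, plus n_random uniformly chosen spot checks."""
    scored = []
    for idx, (seg_a, seg_b) in enumerate(candidate_pairs):
        probs = [prob_a_preferred(w, seg_a, seg_b) for w in ensemble]
        scored.append((statistics.pvariance(probs), idx))
    # Highest disagreement first; sort is stable, so ties keep input order
    scored.sort(key=lambda item: item[0], reverse=True)
    chosen = [idx for _, idx in scored[:k]]
    remaining = [i for i in range(len(candidate_pairs)) if i not in chosen]
    rng = random.Random(seed)
    chosen += rng.sample(remaining, min(n_random, len(remaining)))
    return chosen

ensemble = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
candidate_pairs = [
    ([[1.0, 0.0]], [[0.0, 1.0]]),  # members disagree about this pair
    ([[1.0, 1.0]], [[0.0, 0.0]]),  # every member prefers the first segment
]
print(select_queries(ensemble, candidate_pairs, k=1))
print(select_queries(ensemble, candidate_pairs, k=1, n_random=1))
```

With k=1, only the contested first pair is sent to the human. Adding one random spot check also sends the pair on which the ensemble is unanimous.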

  • @xxThabaxx · 4 years ago

    This is something I've been thinking a lot about, as it could work similarly to how we tend to train children. It seems like you could first train a machine learning algorithm to recognize social cues (lingual and physical responses) regarding its behavior and build a reward function based on that. I think you still run into some complicated reward hacking situations, like the machine wanting to force certain reactions. But it seems like it would get us closer.

    ◦ @eathonhowell7414 · 1 year ago

    This way of thinking is exactly what's getting me interested in this field. I cannot help but feel there is a comparison to be made between the inexact nature of child raising and trying to "teach" artificial intelligence, general or otherwise. Hell, think of an individual cell within the body as an AGI and the totality of what humans are seems like a miracle.

    ◦ @gwen9939 · 1 year ago

    @@eathonhowell7414 You should probably watch the video called Why not just Raise AI like Kids.

  • @mgostIH · 4 years ago

    I feel like watching your videos is my reward function by now, can't wait for more!

  • @bensonmiakoun7674 · 4 years ago

    Highly interested for the next video! Thanks

  • @EU_DHD · 4 years ago

    I like watching you talk about AI safety more than I like learning about AI safety. And I really like learning AI safety!

    ◦ @unvergebeneid · 4 years ago

    Shade much? So you're not learning AI safety by watching him talk about it?

    ◦ @EU_DHD · 4 years ago

    @@unvergebeneid Those are two aspects of the same thing. I just like the one aspect more than the other.

  • @BinaryReader · 4 years ago

    Technology is just another word for "tool". Everything of some utility created by humans is a tool, and is therefore technology. I wasn't aware there was confusion around the definition.

    ◦ @oldvlognewtricks · 4 years ago

    Queueing was created by humans and is of some utility. Queueing is not technology. Stand-up comedy was created by humans, and is of some utility. Stand-up comedy is not technology. It is difficult (or perhaps impossible) to write a definition that doesn’t raise exceptions, which I suspect was the point Robert was trying to make. Your example only confirms the point.

    ◦ @BinaryReader · 4 years ago

    Not to get into a huge discussion here, but both of those could be loosely defined as technologies. What are jokes if not tools of social interaction? What is queuing if not a tool for social order (assuming you mean standing in line and not the computer science definition, which is also a technology)

    ◦ @oldvlognewtricks · 4 years ago

    @@BinaryReader I continue to agree, and disagree. A joke and a queue might be tools, but 'technology' is more of a push. technology /tɛkˈnɒlədʒi/ - noun the application of scientific knowledge for practical purposes, especially in industry. "advances in computer technology" machinery and equipment developed from the application of scientific knowledge. "it will reduce the industry's ability to spend money on new technology" the branch of knowledge dealing with engineering or applied sciences. There is perhaps some science to comedy, but a social convention like queueing is hardly an application of science, so much as an emergent social expediency, or whatever. I'm not getting 'engineering' from either, except in the loosest sense. Alternatively, to take the definition to its logical conclusion, all human action is technology and the definition loses its usefulness. But you're right - no potential for confusion whatsoever ;) At best, there is comparative 'technology-ness' - a joke might be technology, but it's less technology than a smartphone. Maybe moreso than a punch to the face. Maybe it depends on context. Still works to make the 'this is not straightforward to define' point.

    ◦ @squirlmy · 4 years ago

    @@BinaryReader Perhaps it's an Americanism, but there's another definition of "tool", and you're well on your way towards demonstrating it. Both of you actually, because none of us need or want an in depth discussion of the definitions of either word. Rob's brief mention of it doesn't warrant further commentary.

    ◦ @drdca8263 · 4 years ago

    Rob’s definition kind of closely matches Strong Bad’s definition, of “anything that’s really cool and you don’t know how it works”. Ryan North’s definition includes language, and I think basically any technique which has been invented. But yeah, like Rob says, it isn’t a big deal how we define it. Slightly different definitions can be used in different social circles, or even in different conversations among the same people.

  • @zachkrakower172 · 4 years ago

    Dude these videos are awesome. Thank you for taking the time to educate all of us!

  • @daviddawkins · 4 years ago

    Incredibly well presented and articulate, thank you.

  • @Laborejo · 4 years ago

    "It is easier to write a program to evaluate a solution". This is also why artificial music composition does not produce even half-decent outcomes yet. Creating an artificial listener (or many of them) is still far down on the to-do list.

    ◦ @postvideo97 · 4 years ago

    There has been no research (that I know of) that uses human reward modeling for music generation. It could be the next breakthrough in music generation!

    ◦ @Sceleri · 4 years ago

    this method could work for that tho you just tell it which beat is more fire

  • @ToriKo_

    4 years ago

    Sceleri exactly

  • @dasc000

    4 years ago

    Emily Howell: hold my beer

  • @JsbWalker

    4 years ago

    Have none of you heard of Emily Howell?

  • @jayteegamble
    4 years ago

    Meh, we don't mind a 60-second spiel if it gets us more of your awesome content (and we can skip forward anyway). Grab that bag, imo.

  • @diribigal

    4 years ago

    This is a tough problem, since watching to the end is probably valued by YouTube's AI, and even though you and I wouldn't mind, some would. So how do the short-term gains of the sponsorship compare to the long-term dividends of the YouTube algorithm and extra subscribers, which increase visibility over time (perhaps by a minor amount)?

  • @sevret313

    4 years ago

    @diribigal That's why you don't put the sponsor at the end, but at the start.

  • @V1ctoria00
    4 years ago

    Damn. I don't usually find a new channel through its latest video. I was hoping I could binge this topic here.

  • @fergochan
    4 years ago

    Great video, but there's still one thing I'm confused about: how do I tell if that simulated robot is doing a back flip or a front flip?

  • @xenoblad
    4 years ago

    You've been playing Raid: Shadow Legends for 10 years?!

  • @hypnotourist
    4 years ago

    Very clear presentation of a fascinating topic! Your "patreon/human discussions" reward function has trained you well, so to speak :-)

  • @haldir108
    4 years ago

    I am EAGERLY awaiting that video about self-teaching or whatever it is.

  • @dsdy1205
    4 years ago

    When you realise you've reinvented the parent-child relationship

  • @AugustusBohn0

    3 years ago

    nature wins again

  • @dsdy1205

    3 years ago

    God, coming back to this comment a year later, it sounds so stupid.

  • @BubbleManxx
    4 years ago

    I laughed at the Vsauce reference.

  • @Hexanitrobenzene

    4 years ago

    Could you provide a timestamp? Looks like I missed it.

  • @BubbleManxx

    4 years ago

    @Hexanitrobenzene Lol, it's at the very start of the video, when he pops up from the lower half of the screen and asks "What is technology?".

  • @Hexanitrobenzene

    4 years ago

    @BubbleManxx Oh, that one :) Looks like I'm rusty on Vsauce, haven't watched him in a while...

  • @andersenzheng

    4 years ago

    @Hexanitrobenzene Not your fault. There hasn't been one for a while.

  • @maloxi1472
    4 years ago

    Thank you for bringing this idea to my attention! Holy cow! This is such a simple yet beautiful idea!

  • @panstromek
    4 years ago

    This is really on point for a problem I am trying to solve now. I do some computer vision for which it is way too complicated to create training data and way too complicated to write a reward function, but it's the "you know it when you see it" type of thing. Thanks for making this video ;)

  • @sk8rdman
    4 years ago

    "Mattresses and VPNs." Someone watches SmarterEveryDay

  • @bencrossley647
    4 years ago

    This sounds like a method to solve NP problems: easy to verify, hard to solve.

  • @4.0.4

    4 years ago

    The year is 2069. A computer is granted the prize for solving the P vs NP problem. Although the judges were unable to confirm whether the overly complex thesis the computer came up with was correct, it looked quite correct to all experts. A mathematician was quoted as saying: "...I mean, in the two new branches of mathematics that the computer invented, the math does check out." It is unknown what the computer will do with the prize, but several paperclip factories report being contacted shortly after the prize money was deposited.

  • @bencrossley647

    4 years ago

    Chrysippus +1 for paperclips (assuming you’re referencing the game). It will work its way to a galactic army at some point.

  • @Kevin________

    4 years ago

    @4.0.4 Alright... you win this comment section.

  • @griest5493

    4 years ago

    I was thinking the same thing when he said that. Also, the halting problem is a thing. The catch is that NNs are just making approximations.

  • @default632

    4 years ago

    @4.0.4 Universal Paperclips: hours of wasted time for a reference on the interwebs. Worth it.

  • @mygreenlama
    3 years ago

    Thank you for another great video! I am very much looking forward to the continuation ;)

  • @gus2747
    3 years ago

    "If you squint, the training process is sort of like a compiler": great sentence!

  • @DigitalicaEG
    4 years ago

    "Don't skip ahe..." Me: **skipping**

  • @Deez-Master
    4 years ago

    We are getting close to having P=NP

  • @governmentofficial1409

    4 years ago

    Silicon Valley spoiler

  • @augustinaslukauskas4433
    4 years ago

    I'm not surprised this result is amazing considering both OpenAI and DeepMind worked on it. I dream of working for one of them after uni. Thank you for explaining the paper so clearly and in an entertaining way!

  • @sam-you-is

    1 year ago

    Did you make it, sir?

  • @spaceyfounder5040
    4 years ago

    Oh my gosh, can't wait for the next video!

  • @realityChemist
    4 years ago

    "How do you learn when there's nobody who can teach you?" Read a textbook or a WikiHow article?

  • @Vode_ika

    4 years ago

    That is someone teaching you, via a book.

  • @realityChemist

    4 years ago

    @Vode_ika True, I was thinking in the context of someone sitting there teaching you, like in this video. So I guess the answer is just unsupervised learning? Although I could have sworn Rob already did a video on that... Maybe it was someone else on Computerphile?

  • @drdca8263

    4 years ago

    Isn’t the answer “think very hard, write things down, and when you can do so safely, try many options, test your previous ideas both by the results of the options you took and by more thinking, repeat”?

  • @Biped

    4 years ago

    @drdca8263 But that all requires some way of evaluating your results (aka having a reward function that teaches you)... It seems weird that there would be a way without that. I mean... the information has to come from somewhere...

  • @SimonBuchanNz

    4 years ago

    I would suspect the answer is, in fact, something like googling it, but this, of course, requires a pretty complete internal model of the world to start generating and testing against your own predictions. I'm struggling to think of alternatives that aren't just this in disguise though: the best I have is looking at a small set of successful examples and trying to break down from the solution used what the problem is, so you have something to test your own solutions against. If there's a decent way to describe that that isn't going to fall prey to small training data issues like overfitting, I'm excited: that's starting to really sound like the casual meaning of learning!

  • @roberthoople
    4 years ago

    "Training AI Without Writing A Reward Function..." *Capitalism Drools*

  • @MatthewStinar

    4 years ago

    Watching this video made me realize how much corporations are like poorly programmed artificial intelligence, like the stamp-collecting AI that decided to "Kill all humans." We take our instrumental goal of maximizing profits and assign that as the corporation's terminal goal. In pursuing its terminal goal of maximizing profits, the corporation decides to "Kill all humans." 😲

  • @bibasniba1832
    4 years ago

    Priceless knowledge, swift explanation. Bravissimo!

  • @ZachAgape
    4 years ago

    The first videos I saw you in were the Computerphile videos on AI, which I enjoyed a lot. Thanks, this video was very interesting too! Also, thank you for not wanting to waste 60 seconds of our time ^^

  • @Karpata1
    4 years ago

    Hey, if I have to hit the "L" button a couple of times so you can get a couple hundred or even a couple thousand pounds, I'm fine with it.

  • @davidm.johnston8994
    4 years ago

    Great video, can't wait for the next one!

  • @gabrote42
    3 years ago

    These are brilliantly designed! I want more!

  • @Kobriks1
    4 years ago

    Excellent explanation! Thank you.

  • @AsmageddonPrince
    4 years ago

    Your voice is so soothing, and videos so informative.

  • @rewrose2838
    4 years ago

    Lovely stuff here, very clear explanation and great use of the graph

  • @weirdsciencetv4999
    1 year ago

    This channel is so underrated. I had to do just what he proposes in one of my experiments in college. The technique most definitely works!

  • @orcu
    4 years ago

    I liked this explanation very much. Great work!

  • @tedstokes57
    4 years ago

    I like that there's a hint about the next video at the end

  • @alexlamson
    4 years ago

    Excellent work Rob, this is a good one. I think there's some content potential in reviewing these big RL papers

  • @injinii4336
    4 years ago

    Surely scissors are an example of some of our most cutting-edge technology. Ba-dum-tss!

  • @andybaldman
    4 years ago

    *Woohoo, new vid. You do not post enough! Love all of your vids. You should post more often.*

  • @AlejandroPiad
    4 years ago

    This is the first of your videos I've seen, and you almost got my subscription with the first, philosophical half, but the second half was plain brilliant, so you definitely got my subscription now.

  • @DahVoozel
    4 years ago

    Fascinating stuff as always.

  • @Havermeijer
    4 years ago

    Your videos made AI an accessible topic for me. I love the pure logic and game-like thinking.

  • @adryncharn1910
    1 year ago

    This was highly interesting, thank you!

  • @stephen-torrence
    4 years ago

    Closest thing to a literal "bicycle for the mind" I've seen in AI research. Cool!

  • @MrKohlenstoff
    10 months ago

    Great video, super well and clearly explained! 👌

  • @lukasmrazik3485
    4 years ago

    There's quite a good chance I will talk about this in my master's state final exams. Thank you, sir, for saving me a lot of time!

  • @marceloprado2035
    4 years ago

    Your content is really great, thanks for it!

  • @MidnightSt
    4 years ago

    ...I don't know much about this area of IT, but the first thing that came to my mind after reading the video title was: "Oh, yeah, what's a better idea than creating a black box that nobody knows how and why it works, and what its boundary conditions actually are? Why, yes, creating such a black box without even explaining to it what is good and what is bad! BRILLIANT!"
