We train an Artificial Intelligence with Reinforcement Learning to play the game Trackmania Nations Forever, and post videos showcasing the progressive improvement of our A.I.
This channel is a collaboration between pb4 (github.com/pb4git) and Agade (github.com/Agade09).
You can contact us at the address pb4videos (at) gmail.com, via our github, or on Discord (server: discord.gg/tD4rarRYpj and channel: discord.com/channels/847108820479770686/1150816026028675133).
Comments
'It took players 9 years to realize they could do this trick...' - AI, literally within 24 hours with no human input: 'Hey, look what I found'
Why don't you make a program that lets us all form a giant computer cluster to train the AI?
They said the AI plays at 20 Hz; I play at 10
omg ahahah 14:13 I started laughing aloud the same way before Agent Smith omg hahahah
New here! Love the content, keep it coming!
As someone who creates maps, but isn't a stellar driver, one of the issues I run into is building around incorrect racing lines. I may build turns that will be taken a different way by a better player resulting in far more speed than expected on the exit. This causes the flow to be completely wrong and the maps to break. In these times I often wish I could have access to more strong players to test my map so I could build around these issues. With this AI, theoretically I could have it run on the map instead and learn from how it drives my maps. I could then make improvements and increase my map quality. Amazing work and excited for the future
First, I know I complained a bit, but I did like and subscribe, so I hope we hear more about how your AI does. I can't drive those races, though I did try a few, as long as coming in around 10,000th place or worse counts as an achievement :) Later.
OK, very interesting. I have to ask: why are you trailing so much? Even in the last race, basically right from the start you are already losing before anything has happened; it seems you must be missing something to always come out behind at the start and then have to speed up. Also, when you know a shortcut exists and that using it on lap two is bad because you will lose time, why can't you teach the AI to use it on lap three, when it would be faster? It must be faster, since you ended up behind. I get how wheels on the track accelerate the car and wheels in the air do nothing, but still, in the race you won you had to come from behind, when you could have been a touch slower but farther down the track and had a bigger lead on the human. I know this is teaching it how to measure best performance, but I bet you could teach it differently: maybe it needs to try jumps that it will fail, so it can rank a jump as undoable at speed x and not attempt it; then when it has more speed it considers it again, learns it's still slower, and marks the jump at that speed as also a bad choice; and on lap three, when it's going even faster, it tries again and sees the better result. I get that this might be hard in your algorithm or web of choices, but you seem smarter than me, and I would just like to see the AI really kick human asses even harder, though I do have to dislike the use of the AI you're using :)
Wait, you're saying an AI can't learn as fast as a human? That seems like maybe your code is fucked in the head. Can't the AI review the track design, then guesstimate the best speed it could drive through the area, then figure out how good a bounce would be at, let's say, 20 points around the curve, try those 20 spots, and see which one is best? Then try 20 spots around the current known best and test again; if none are better, it has the best one it can find. I mean, how does a human player hit every wall, ram/crash, and drive out at speed for every corner? Obviously an AI should be able to try and test these 24 hours a day and find the route?
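The procedure this comment sketches is a simple local search: sample candidate bounce points, keep the best, then resample in a narrower window around it. A minimal illustrative sketch, assuming a one-dimensional parameter; `lap_time` is a hypothetical stand-in for actually simulating a run in the game, and none of this is taken from the video's training code:

```python
def lap_time(point):
    # Toy objective standing in for a real in-game simulation:
    # pretend the ideal bounce point is at parameter value 3.7.
    return (point - 3.7) ** 2

def refine_bounce_point(center, spread, n_candidates=20, rounds=5):
    """Sample n_candidates points around the current best, keep the best,
    then halve the search window and repeat."""
    best_point, best_time = center, lap_time(center)
    for _ in range(rounds):
        step = 2 * spread / (n_candidates - 1)
        candidates = [best_point - spread + i * step for i in range(n_candidates)]
        times = [lap_time(p) for p in candidates]
        best_i = min(range(n_candidates), key=times.__getitem__)
        if times[best_i] < best_time:
            best_point, best_time = candidates[best_i], times[best_i]
        spread /= 2  # narrow the search around the current best
    return best_point
```

In practice this kind of search only finds a local optimum, which is one reason reinforcement learning with exploration is used instead.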
Wonder how hard it would be to train it to get vertical setups by itself and start noseboosting all over the place. PS I think the setup is far harder to train than the noseboosts themselves
insane video, well done.
In the video you give rewards to the agent for following the line of the course. How do you initialize the line? Is it just defined by the track itself, or do you hardcode a line for every track yourself? If you do hardcode it, how? And how do you assign the reward? Is it just based on distance to the line? Wouldn't that mean the agent still gets a reward for being close to the line while moving backward? How do you make sure it moves forward with your reward function?
What happens if you add a multiplier for the reward for reaching mini checkpoints out of order?
bro's gonna accidentally create AGI while trying to beat records in a video game
Shortcuts are found via analysis from the freecam, so maybe you need to give the AI some sort of blueprint of the map
Train it on Deep Dip 2
Good stuff.
Do you see, Mr. Anderson?
Investing in your channel from 13k. Big things ahead
What if the progress line was also part of the AI network? Now that the distance to the line also has an impact on the reward, maybe giving the AI the ability to modify the line would make even better optimized racing lines possible.
You should teach the AI to overcome Deep Dip 2; that would be amazing, if it's even possible 🤩
our wives - tolerating our obsession
wait til this guy learns about TASing
Out of curiosity, have you tried using a dynamic utility score? For example, on a full-speed map the AI would start out almost exclusively maximising top speed/acceleration, and then, as training progresses, the utility score would smoothly shift to maximising track progression rate. One of the things that seems 'cheaty' with these AIs is that changing the variables intentionally (like making it prioritise path separation) makes it seem less genuine. But I don't see why you couldn't initially run a few smaller populations with different priorities: top speed, acceleration, checkpoint separation, or even imitating the current WR (as basically all real players do). You would then slowly shift each individual population towards simply finishing the track quickly, and grow or shrink the populations based on rate of improvement plus current performance, once things start to stabilise, of course. Whilst that obviously adds some computational requirements, it would mean the AI, whilst still requiring training on any individual map, could face a wide breadth of maps and optimise appropriately without a human needing to tweak the variables. I guess the coolest AI is one which, after an initial training session, can beat the WR on a map with a reasonable amount of further real-time training on that specific map, maybe a 100-hour equivalent or so.
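The smoothly changing utility score suggested here amounts to annealing the reward weights over training. A minimal sketch assuming a linear blend; the function name and schedule are invented for illustration and are not from the video:

```python
def blended_reward(step, total_steps, speed_reward, progress_reward):
    """Blend two reward terms over the course of training.

    alpha ramps from 0 (reward only top speed) to 1 (reward only
    track-progression rate) as training proceeds.
    """
    alpha = min(step / total_steps, 1.0)
    return (1.0 - alpha) * speed_reward + alpha * progress_reward
```

A real schedule would likely be non-linear and tied to performance plateaus rather than raw step count, but the shape of the idea is the same.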
Impressive stuff. But what would be truly amazing is an AI trained on all existing maps that can take on any new map without retraining.
Cool, do more
My idea: instead of rewarding it just on its distance from the ideal path, you should also reward it for following the direction and speed of the ideal path.
1) Get the cosine of the angle between the path and the AI's velocity: cos(theta)
2) Get the difference in speed between the path and the velocity: abs(|path| - |velocity|)
So, something like this: cos(theta) - abs(|path| - |velocity|). Essentially, the path shouldn't be a set of points but a set of vectors.
Edit: I think there are some flaws with this:
1) The AI has less freedom to find a better path.
2) Knowing the ideal path before training defeats the entire point of training the AI.
BUT: during training, you can set the "ideal path" to the path the best AI follows (basically a reward system that adjusts itself during training)
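The reward the comment proposes (an alignment term minus a speed-mismatch term) can be written down directly. A sketch in plain Python, where `path_vec` is the local vector of the ideal path and `velocity` is the car's velocity; both names are hypothetical, and this is the commenter's formula, not the channel's actual reward:

```python
import math

def vector_path_reward(path_vec, velocity):
    """cos(theta) between the ideal-path direction and the car's velocity,
    minus the absolute difference between the two speeds."""
    p = math.hypot(*path_vec)   # speed implied by the ideal path
    v = math.hypot(*velocity)   # car's actual speed
    if p == 0.0 or v == 0.0:
        return -abs(p - v)  # no direction to compare; penalise speed mismatch only
    dot = sum(a * b for a, b in zip(path_vec, velocity))
    cos_theta = dot / (p * v)
    return cos_theta - abs(p - v)
```

Driving along the path at the path's speed scores 1; driving the opposite direction scores -1, minus any speed mismatch.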
I love that unlike TAS you don't need to find it yourself, and I believe AI can learn/find anything, like overwalls, nose dances and especially uber bugs! How it got wallbangs is incredible!! Maybe it can discover some new way to get a lower time
Awesome content with the right amount of technical depth into AI technology, keep going!
I would love to see this AI try to complete deep dip one then try deep dip 2
Only 5 minutes in and I can already tell this is a great video, almost on the level of Wirtual’s videos, which is something the majority of Trackmania content creators struggle to accomplish. Be proud of what you have created and strive to make more content like this❤
"The neural network I use is not much bigger than a fruit fly's brain." I know that rats can drive, why haven't we tried flies yet?
If replays are just the inputs to the game, can’t you train an AI from high-score replays, and then have it predict the best time to press the best input?
19:56 relatable AI
Microsoft Azure could be a sponsor of your work, try contacting them ;) Good job!
Please make a technical video about the whole algorithm, modifications and training tricks you have used
This is neato, but from a tech perspective I definitely think a zero-shot AI is gonna be wayyyy cooler!
Bro, what are you coding this in? Is it Python or something else
Awesome work! It’s so cool to see AI play video games like this. Great work
This is cool, but once every track has an AI topping the records it will feel like the game is over
You could have called every human WR car "Mr. Anderson" :)
Absolutely mind blowing, it is Deepmind level stuff
Get the AI to beat Deep Dip 2.0 before anyone else, then I’ll be impressed.
Awesome project! I remember seeing v1 ages ago and I just got caught up. I'm curious: is there a reason the image is still so heavily compressed? It seems like you're probably losing quite a lot of information this way.
The reasons are processing time (2x more pixels ≈ 2x slower pass through the network) and RAM usage. We need ~64 GB of RAM for ~1 million images at that resolution; if we had 2x more pixels we would need 2x more RAM.
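The arithmetic behind that reply: ~64 GB over ~1 million stored images works out to roughly 64 KB per image, and the footprint scales linearly with pixel count. A back-of-the-envelope check, where the 64 GB / 1M figures come from the reply itself and the linear-scaling assumption is mine:

```python
def buffer_gigabytes(n_images, bytes_per_image):
    # Replay-buffer footprint, assuming images are stored uncompressed
    # and memory grows linearly with per-image size.
    return n_images * bytes_per_image / 1e9

bytes_per_image = 64_000  # ~64 GB / ~1M images => ~64 KB each
current = buffer_gigabytes(1_000_000, bytes_per_image)       # ~64 GB
doubled = buffer_gigabytes(1_000_000, 2 * bytes_per_image)   # ~128 GB with 2x pixels
```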
Man, you are a true talent. First of all, your video is so well made, this alone deserves applause. Then there‘s your skill in programming AI. I'm not an expert in this field, but in my understanding this needs quite some knowledge. I really hope you will use these skills in your career. I wish you all the best and look forward to seeing a new update video 😁😁👍🏼
Hi you should try Deep dip
This is incredible.
I'm not sure the video creator knows what a win/loss record is. He seems to ignore the fact that people beat it, and then he needs to come back and tweak it.
I hope you see this. So it had a problem with the problem-solving part of the trial map. Isn't there a way to train it with a massive reward on completion, plus a reward based on the time taken to complete it? Obviously that wouldn't be enough because of the whole local-maximum issue and other things I don't get, but couldn't you also give a reward for collecting checkpoints, or something based on proximity to the next checkpoint?
You could also try deep dip 1 or 2. Would love to see it. Amazing video!!!