A. I. Learns to Play Starcraft 2 (Reinforcement Learning)
Tinkering with reinforcement learning via Stable Baselines 3 and Starcraft 2.
Code and model: github.com/Sentdex/SC2RL
Stable Baselines 3 tutorial: pythonprogramming.net/introdu...
Neural Networks from Scratch book: nnfs.io
Support the content: pythonprogramming.net/support...
#artificialintelligence #machinelearning #python
Comments: 309
I have to say you make the most understandable learning materials: your website together with the videos. All the code is there, the book, the playlists from scratch. Most professional educators can’t do this 🤗
I like how in depth this video is, really enjoyed it!
Never before has a marketing ploy worked so well on me. I'm looking forward to receiving the hardcover version of the book!
Very interesting idea with a macro AI and a strategic AI, sort of working in tandem and forming a symbiotic relationship of sorts. Could maybe even break that down further, like on a per-unit-type basis... Though I imagine the complexity explodes at that point.
@sentdex
2 years ago
We have very few unit types, at least here. For the full game, there are more, and even here I wasn't utilizing all the things a voidray can actually do, but certainly there are ways to have a "voidray" algo and a "probe" algo...etc. Definitely something to think on.
@hikari1690
2 years ago
This sounds like how deepfakes work. Have 2 ai models compete with each other to improve each other. So if the macro ai needs to try to defeat the strategy and vice versa
@prodj.mixapeofficial6431
2 years ago
I believe Dota has 5 controllable units, with an individual OpenAI agent per unit, and modified communication between the 5 to mimic real human gameplay.
@Dethek
2 years ago
When I was looking into the AI for starcraft i was thinking of the following:
Overarching AI - makes final decision on what action to take
Supported by:
- Strategy AI - use training from professional replays to assess based on what player has seen, what is their likely strategy, and then choose strategy based on that
- Macro AI
- Micro AI
@TheFalconerNZ
2 years ago
@@ccriztoff Get his book lol ;-)
I love your idea of drawing your own minimap! That's a smart way to make more information easily available.
I do not code or have the desire to code, but this video is beautiful. I enjoy StarCraft videos seeing people micromanage, but the thought and process that goes into creating a “program” to do the same thing is fascinating. The amount of work and work to obtain the knowledge that goes into the work is far underrated. I hope for you the best!
One thing I feel is missing from the map is a kind of "ghost" of where enemies have been seen previously, which could become "points of interest" for scouting in the future. The "ghosts" could "fade" over time, but never fade to zero again (capped at a minimum of 1, starting at something like 255), to make the algorithm prioritize the most recent ghost locations. Also, instead of scouting with void rays, wouldn't it be cheaper to scout with drones to generate ghost areas? Drone scouting would probably target mineral areas without ghosts, to see if enemies have expanded, while void rays can scout areas WITH ghosts, to see if the enemies are still there and try to defeat them there. You can also send a probe to a ghost area first, to determine enemy strength before attacking.
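A minimal sketch of the fading "ghost" layer described in the comment above, assuming the ghosts live in their own 2D NumPy array. The 255 start value and the floor of 1 come from the comment; the decay rate, the map size and the function names are hypothetical.

```python
import numpy as np

GHOST_START = 255   # value written where an enemy was just seen (from the comment)
GHOST_FLOOR = 1     # faded ghosts never drop below this (from the comment)
DECAY_PER_STEP = 5  # hypothetical fade rate per update


def fade_ghosts(ghost_map: np.ndarray, decay: int = DECAY_PER_STEP) -> np.ndarray:
    """Fade every existing ghost, but never all the way back to zero."""
    faded = ghost_map.astype(np.int16) - decay
    # Only cells that already hold a ghost are clamped to the floor;
    # cells where no enemy was ever seen stay at 0.
    faded = np.where(ghost_map > 0, np.maximum(faded, GHOST_FLOOR), 0)
    return faded.astype(np.uint8)


def mark_sighting(ghost_map: np.ndarray, y: int, x: int) -> None:
    """Record a fresh enemy sighting at map cell (y, x)."""
    ghost_map[y, x] = GHOST_START


ghost_map = np.zeros((8, 8), dtype=np.uint8)
mark_sighting(ghost_map, 2, 3)      # old sighting
for _ in range(100):                # many game steps pass
    ghost_map = fade_ghosts(ghost_map)
mark_sighting(ghost_map, 5, 5)      # fresh sighting
```

The old sighting ends up at the floor value while the fresh one sits at 255, so a scouting routine could rank cells by value to visit the most recent ghosts first.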
@Lithane97
2 years ago
Better yet, just train an observer to scout ghosts, it's almost like they're made for that 👍 Wouldn't even require any logic really, just if ghost entity train observer and have it sit there all game.
@achtsekundenfurz7876
2 years ago
I can imagine some ways to refine the AI using more inputs:
-- time elapsed since game started (there's hardly any risk of attack at all in the 1st minute, but at a late stage, the risk is much higher),
-- current resource totals (letting resources sit in the "bank" is usually worse than expanding the economy or forces),
-- # of "ghosts" on the map (where enemies were sighted and lost again).
About rewards and penalties, I'd suggest the following:
-- adjust the reward/punishment for victory/defeat: a "good" AI should aim for a quick victory, but not at all costs. Maybe set the victory reward to 24,000 / sqrt(seconds played) and cap at 1000 (i.e. don't reward any higher for games lasting
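The victory reward suggested above can be sketched in a few lines. The 24,000 scale and the cap of 1000 are taken from the comment; the function name and the handling of non-positive times are assumptions made only for this sketch.

```python
import math


def victory_reward(seconds_played: float,
                   scale: float = 24_000.0,
                   cap: float = 1_000.0) -> float:
    """Reward for a win that shrinks with game length and is capped for very short games."""
    if seconds_played <= 0:
        # Assumption: treat a degenerate game length as the capped reward.
        return cap
    return min(cap, scale / math.sqrt(seconds_played))


one_hour_win = victory_reward(3600)   # 24000 / 60 = 400.0
quick_win = victory_reward(100)       # 24000 / 10 = 2400, capped to 1000.0
```

The square root makes the reward fall off slowly, so a long win is still clearly better than a loss.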
@tjw2469
2 months ago
@@Lithane97 if there is a raven+cyclone/raven+viking/missile turret then its a dead observer
This was an interesting video. I will have a look at your example code for sure, wanna try to tinker a bit. Thanx for all your hard work.
Can’t get enough of learning this awesome stuff
Thanks for putting all of that together. Looks neat.
this is such a cool project! would love to see this keep going
14:49 That's some next-level Gateway placement.
I have played starcraft for years and years, and I love this channel. This is going to be great.
I'd love to see the next video in this series with dual macro and micro algorithms and improving the win percentage
Here is an idea: you can use more than 3 channels to give spatial information to your network. No need to limit yourself to the conventional idea of 3 channels! If you are worried about how to visualize this, just think of it as an extra map.
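As a rough illustration of the extra-channel idea above, the observation can be built by stacking additional 2D layers onto the RGB minimap. The map size and the two extra layers (`visibility`, `last_seen`) are hypothetical, and whether a particular policy network accepts more than 3 channels depends on the library and is not checked here.

```python
import numpy as np

MAP_H, MAP_W = 64, 64  # hypothetical map size


def build_observation(rgb_map: np.ndarray, extra_layers: list) -> np.ndarray:
    """Append each 2D layer as one more channel behind the RGB channels."""
    layers = [rgb_map] + [layer[..., np.newaxis] for layer in extra_layers]
    return np.concatenate(layers, axis=-1)


rgb_map = np.zeros((MAP_H, MAP_W, 3), dtype=np.uint8)   # the drawn minimap
visibility = np.zeros((MAP_H, MAP_W), dtype=np.uint8)   # hypothetical extra map
last_seen = np.zeros((MAP_H, MAP_W), dtype=np.uint8)    # hypothetical extra map

obs = build_observation(rgb_map, [visibility, last_seen])
# obs has shape (64, 64, 5): 3 color channels plus 2 extra maps
```

For debugging, each extra channel can still be shown as its own grayscale image.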
Hey! I love the update here. I followed the original series you put out. As an SC2 veteran I noticed deficiencies and deviated in a strong way halfway through. I set up separate models to handle the decision making for each aspect of the game. This makes it so it can make the decision to use its army separately from the decision of progressing tech (or not). I stopped around the time I couldn't figure out how to have it build its own strategies, as I ended up giving it a long set of possible actions and letting it pick, and it felt too 'guided'. It was able to beat "Very Hard" 50% of the time vs random's 0%. It was my first exercise with ML. I got the chance to apply the concept at work for something outside of my scope. I used both that and the SC2 project as demonstrations in an interview and got a promotion out of it. This inspires me to try my hand at it again! EDIT: To handle army movement, which you mentioned in the video, I chopped the map up into a grid and gave it decisions to make where it could attack-move its army to any of the cells at will. 9 worked the best, but you could make it much more granular. It used this to both attack and defend.
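The grid idea from the EDIT above could look roughly like this: split the map into cells and let each discrete action index pick the center of one cell as an attack-move target. The map dimensions and the function name are made up for the sketch; only the 9-cell layout comes from the comment.

```python
def grid_targets(map_width: float, map_height: float,
                 rows: int = 3, cols: int = 3) -> list:
    """Return the center point of every grid cell, row by row."""
    cell_w = map_width / cols
    cell_h = map_height / rows
    return [((c + 0.5) * cell_w, (r + 0.5) * cell_h)
            for r in range(rows)
            for c in range(cols)]


# Hypothetical 120 x 90 map split into the 9 cells mentioned in the comment.
targets = grid_targets(120, 90)
center_cell = targets[4]   # action index 4 selects the middle of the map
```

Raising `rows` and `cols` gives the finer control the comment mentions, at the cost of a larger action space.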
It’s a very interesting video about the ML + gaming. As a newbie to this AI world, it also gives lots incentives to continue learning.
This is great! I'd love to see something like this could compete in the arena
Wow congratulations I think what you did is amazing 🤩 I would like to do something like this for software testing for a while but it is so complicated
Looks great, I've always been interested in the AlphaStar gameplay and how it manages all the different tasks. For the enemy search, you can focus on undiscovered minerals (enemies would normally congregate around mineral fields), which is probably better than random search.
Very illuminating!
Great vid. Buying the book.
This is freaking intense! also for the hunters problem: Why not make a "return to safe space" function for them when they detect enemies. That way they only perform scouting duties.
@adye88
2 years ago
And obviously set a variable for safe space= position holding command center
Wow I'm literally working on a series on RL theory and I was just wondering how the hell you'd code things up to actually play Warcraft 3. Starcraft 2, close enough! Such a useful channel
Great stuff! Super interesting.
I like the ideas here, but be sure that you've got a task that can understand the advantage of having high-ground vision, which gives better attacking, vs. not having high-ground vision. Also, I'd consider having the model constantly scouting, as all information gained on the player's actions can lead to better counterattacks and so forth. But yeah, I'm loving this. Keep 'em coming!
Already super impressive that you could do RL for macro-level strategy! Totally agree that to solve a certain problem, how to formulate the state, action and reward is key.
I reply very rarely to these kinds of videos, but hats off. Even though the project structure is messy, your genuine, "realistic", practical approach was very enjoyable to watch.
Thanks! Your tutorials were the first that worked for me. Biggest problem that I had was the directory path for the Starcraft maps.
@sentdex
1 year ago
Thank you for the super!
Great book! Love my copy
Nicely done.
Book Purchased. Thanks
Why not give rewards based on how many enemy units/buildings are destroyed? Then give a penalty based on how many units/buildings are destroyed? Also to help the AI prioritize winning over stalling, you could increase the value of a win based on how fast it won.
@nrobo3840
2 years ago
Yeah, adding a time decay to the win reward was where my mind immediately went.
@moseszero3281
2 years ago
I was thinking a k/d reward and a lowering of all rewards for time
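One way to read the suggestions in this thread is a per-step reward that credits destroyed enemy value, charges for own losses, and subtracts a small amount every step so that stalling costs something. This is only a sketch: the weights, the penalty size and the idea of measuring "value" in resource cost are all assumptions.

```python
def step_reward(enemy_value_destroyed: float,
                own_value_lost: float,
                time_penalty: float = 0.01,
                kill_weight: float = 1.0,
                loss_weight: float = 1.0) -> float:
    """Reward for one step: kills minus losses minus a constant time cost."""
    return (kill_weight * enemy_value_destroyed
            - loss_weight * own_value_lost
            - time_penalty)


# Hypothetical step: destroyed 100 resources' worth of enemy units, lost 40.
good_trade = step_reward(100, 40, time_penalty=0.0)
idle_step = step_reward(0, 0)   # nothing happened, only the time cost applies
```

The weights would need tuning against the win reward, otherwise the agent may prefer farming kills over finishing the game.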
Very satisfying to watch
Amazing video dude. Gj.
I am trying to make an AI that will farm for me in Rust, but I am so lost xD. If I understand correctly, you are not using computer vision because the camera movement is too complicated? So you are building data from the minimap only? If I wanted to train my AI to farm sulfur nodes in Rust, what would be your approach?
It's amazing how are you doing it. Your videos are really inspiring
Thanks for the amazing content
At 4:52 regarding your comment, I added async def on_start(self): self.last_sent = 0 after the on_step function. It makes it a little clearer
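The suggestion above amounts to initializing state once in a start hook instead of inside the step function. The sketch below uses a stand-in base class so it runs without the game; the comment does not say what `last_sent` tracks, so treating it as "the iteration when something was last sent", and the interval of 10, are assumptions.

```python
import asyncio


class BotBase:
    """Stand-in for the real bot base class, so the sketch runs without StarCraft 2."""

    async def on_start(self):
        pass

    async def on_step(self, iteration: int):
        pass


class MyBot(BotBase):
    async def on_start(self):
        # Runs once before the first step, so on_step never has to
        # guess whether the attribute exists yet.
        self.last_sent = 0

    async def on_step(self, iteration: int):
        # Hypothetical use: only act if at least 10 iterations have passed.
        if iteration - self.last_sent >= 10:
            self.last_sent = iteration
            return "sent"
        return None


async def run(bot: MyBot, steps: int) -> list:
    await bot.on_start()
    return [await bot.on_step(i) for i in range(steps)]


bot = MyBot()
results = asyncio.run(run(bot, 25))
```

With the attribute set up front, there is no need for a try/except around its first use.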
Very cool idea. I think programming a few meta builds into your algorithm and seeing how it learns with time (if achieved "this" by "this time" do "this", otherwise do "this"), like doing a rush build, etc.
I wasn’t in the mood to watch the video when I read the title, but when I realized what the thumbnail was I stopped by to drop a like lol.
Kind of neat, I'm wondering if you looked at the AlphaStar research at all to do this, or looked into the StarCraft 2 AI community? There's about 70 coders of various bots and AI that compete against each other and it'd give you a ton of ideas on build choices and especially unit control and decision making.
@Leonhart_93
2 years ago
The AI coders in the community don't make true AI, they just give them a set of commands and responses to various actions. A true AI learns from successes and failures (reinforcement) with very little initial programming.
@PeterRAmice
2 years ago
@@Leonhart_93 While this has some truth to it, what you are referring to is machine learning. The AI spectrum is much wider than learning like a human. The best way of describing AI, imo, is: a machine which observes its environment and executes actions which maximize its goals. So with that definition in mind, I would argue those people are actually building AIs which do not automatically learn from their past experiences, and thus they do not build machine learning AIs, which AlphaStar did.
@Leonhart_93
2 years ago
@@PeterRAmice We just called bots that follow specific sets of instructions AI in the past out of laziness and limited understanding. It doesn't apply to current times anymore, fake AI and true AI have almost nothing in common. We can't use the same word to describe them both, so a "bot" is proper for the fake AI.
@Leonhart_93
2 years ago
@string name; Yes, bots. I played vs the top bots of the sc2 bots community, they are really good. They won't be easy unless you are at least masters, which is impressive for a bot. The major problem with those fake AI is that they can always be cheesed in some way, no human programmer can ever input the right answer for every situation. Btw, AlphaStar never had complete map vision, it wouldn't have been a valid test. It had complete vision of whatever parts it could see since there was no player-like camera which removed any delay from responses. I think that's ok, even bots respond to everything with 0 delay. AlphaStar has potential, but it will never progress past a certain point if they don't train it permanently on the ladder vs pro players and actually see current tactics.
@ErazerPT
2 years ago
@string name; It's no more cheesy than a grandmaster switching cams at 400+apm (yes, they do it...). And while "beating the best" might sound like a great end goal, all you need is to beat 99% to already go WAY beyond what humans can do (on average). There are a few top F1 racers and there are billions of "common drivers"; for a driving ML model, which is more important: beating the top F1 drivers or consistently outdoing "Average Joe"? p.s. that one "human trick" that beat the model in one game was a simple "loop", as the model got stuck reacting to the same thing in a loop, back and forth. You can observe that level of idiocy in humans too at times ;)
This is now the 3rd programming/ artificial intelligence channel that I've found myself watching even though my ability to code (or even Math) is so awful that if there was a gun to my head I would beg to just be shot. But I find it satisfying to watch. Like a time-lapse of an ant colony diligently working away.
What an awesome video! I wonder how making an API like this for other RTS games would be possible and then training AI models for those separately. 🤔
very interesting to see how you structured it to use ML decisions for higher level decision making. would definitely be interested in seeing how you approach a micro script and specifically wonder about the ability to add new behaviors without having to retrain from scratch
For the reward mechanism you could probably build a LSTM that gives you the probability of winning for each action you take and you should probably include a time penalty to avoid the bot dragging the game out
This is amazing. Amazing code, amazing explanation, amazing editing. Only one suggestion: when possible, don't use try: ... except: pass, as this can lead to hellish problems. If you know what exception you are getting in that try-except statement, catching that exception explicitly is better (even if you are just going to 'pass' it).
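To make the advice above concrete, here is a small sketch with a made-up `read_reward` helper: it catches only the exception it expects and lets everything else surface, which a bare `except: pass` would hide.

```python
def read_reward(state: dict) -> float:
    """Return the stored reward, or 0.0 if none has been written yet."""
    try:
        return float(state["reward"])
    except KeyError:
        # Expected case: no reward has been stored yet.
        return 0.0
    # Anything else (for example a ValueError from a malformed value)
    # is NOT swallowed, so real bugs still show up as tracebacks.


missing = read_reward({})                    # expected gap, handled quietly
present = read_reward({"reward": "2.5"})     # normal case
```

A malformed value such as `{"reward": "abc"}` raises `ValueError` here instead of silently producing a wrong reward.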
This video is so God damn cool, I have a current project that I try to make chess self play work on very limited resources, I think SC2 will be my next project if the actual python API is open. What is your GPU, and how long did it take for you to train the agent?
Some time series analysis (windowed access to what has been searched, where stuff was, ...) would probably help the AI make better decisions. The data of just the map does not do a good job of storing time-information. Your rewards seem like a good fit. Great video!
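One simple form of the windowed access suggested above is frame stacking: keep the last few map images and hand the network all of them at once. The class below is a hand-rolled sketch with hypothetical names and sizes; Stable Baselines 3 also ships a `VecFrameStack` wrapper that may cover the same need.

```python
from collections import deque

import numpy as np


class FrameHistory:
    """Keep the last n map frames and expose them as one stacked array."""

    def __init__(self, n_frames: int, frame_shape: tuple):
        # Start with blank frames so the stacked shape is constant from step 0.
        self.frames = deque(
            [np.zeros(frame_shape, dtype=np.uint8) for _ in range(n_frames)],
            maxlen=n_frames,
        )

    def push(self, frame: np.ndarray) -> np.ndarray:
        """Add the newest frame (dropping the oldest) and return the stack."""
        self.frames.append(frame)
        return np.concatenate(list(self.frames), axis=-1)


history = FrameHistory(n_frames=4, frame_shape=(8, 8, 3))
latest = np.full((8, 8, 3), 7, dtype=np.uint8)   # hypothetical newest map image
stacked = history.push(latest)
# stacked has shape (8, 8, 12); the newest frame occupies the last 3 channels
```

The oldest frames come first along the channel axis, so the network can compare positions across time.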
This is what Im looking for
I know nothing about programming or AI, but this is just so fun to watch.
looking forward to see how this turns out after its polished.
well done sir
Do you want Skynet? Because this is how you get Skynet :) (also great strats and explanation on how this works)
So good blows my mind
Reward for resources spent. this will incentivize expansion and rapid army growth until max out. At that point change the reward to enemy units/structures killed. Something like (big reward) for spending money on nexus/probe/stargate (bigger reward) for void ray, (penalty) for having too much money banked up unless supply is >190. Then (big reward) for killing enemy unit/structure, while dialing back on rewards for building structures. zero out the rewards for probes over 70-80 and for pylons over 200 supply. When supply drops due to combat, flip the rewards back to making void rays to max out again.
Interesting actions. Not only do they encode a lot of knowledge about the game, they include deep causal chains that otherwise would take long to learn.
In my experience grouping attacks and synchronising targets is very important.
I wonder if modifying the reward structure to include a small reward for scouting, like finding new enemy structures or something, would be useful to get more wins in those games where you said they regrouped and came back with a larger force to beat you later.
@AlexGrom
1 year ago
Later on there is potential to counter based on what and when was seen. You see early barracks - prepare to counter marines, marauders or reapers.
very inspiring video! Looking forward for more reinforcement learning tutorial!
this book is impressive
Any plans on a part 2 with the microgame plan implemented and see how it runs in tandem?
Oh man, I needed that book 3 months ago. With 3 others, I made our own NN and genetic algorithm to play Mario, also with reinforcement learning. I was thinking about how hard it would be to do this for SC. It doesn't seem too hard, but you didn't write your own neural network, right?
Cool video
That is very cool and powerful
Just a quick note: the "can afford" check at 04:47 is NOT totally redundant. You're inside a "for each idle stargate" sort of loop, and if two are idle, you could end up in a situation where you can afford one but not the other -- and depending on the capabilities of the exception handler, tripping an exception due to insufficient resources could crash the AI.
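The situation described above can be reproduced without the game. In this sketch `Bank`, `train_from_idle` and the cost numbers are stand-ins invented for illustration, not the real bot API; the point is only that the affordability check has to run on every pass through the loop.

```python
class Bank:
    """Toy resource pool standing in for the bot's minerals and gas."""

    def __init__(self, minerals: int, gas: int):
        self.minerals = minerals
        self.gas = gas

    def can_afford(self, cost: tuple) -> bool:
        return self.minerals >= cost[0] and self.gas >= cost[1]

    def spend(self, cost: tuple) -> None:
        if not self.can_afford(cost):
            raise ValueError("insufficient resources")
        self.minerals -= cost[0]
        self.gas -= cost[1]


def train_from_idle(bank: Bank, idle_stargates: list, cost: tuple) -> list:
    """Queue one unit per idle stargate while resources last."""
    trained = []
    for stargate in idle_stargates:
        # Re-check each time: the previous iteration may have spent
        # the resources this one was counting on.
        if not bank.can_afford(cost):
            break
        bank.spend(cost)
        trained.append(stargate)
    return trained


UNIT_COST = (250, 150)   # illustrative numbers only
bank = Bank(minerals=300, gas=200)
trained = train_from_idle(bank, ["stargate_a", "stargate_b"], UNIT_COST)
```

With two idle stargates and resources for one unit, only the first stargate trains and no exception is raised.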
supper cool !!!!!
It remind me my FYP in university.
it might help to allow a phoenix now and then for scouting purposes since void rays are super slow
I remember when you live coded AI plays GTA v and that too on python's default IDLE. Bring those days back. Great video though.
You were so preoccupied with if you could, you didn't stop to think if you should... and so it begins.
Would it be possible for you to script the last few enemy coordinates where the AI encounters them and then project a trajectory to where the enemy may be?
The criteria I would try to ensure it has highest on its priority is- if you win, only- unit efficiency. IE: how many resources did this unit earn, or destroy for an opponent, relative to its own cost? Averaging them out, and defining those units by a percentage based on the actions they were made to perform- and segmenting the game into the first 5 minutes and the rest of the game- you could provide a huge assist to the AI learning more complicated macro and micro strategies.
I'll be interested to see how you link the different AIs with their different specialties together. My only concern would be that there is bound to be some overlap: how would the AI resolve 'competition' against itself when more than one AI specialty wants to control the same thing? Ugh, I can English, I swear!
Does Stable-baselines allow to store states - reward pairs in harddisk? I developed a modification of the MemorySequential class in keras-rl to use little memory in ram. My algorithm uses a thread to store states (images or whatever) as numpy arrays in my ssd disk, and keeps a randomized subset of the states in every loop of the algorithm in order to train the agent without using tons of RAM (which i don't have). It's a sloppy implementation so I was wondering if stable-baselines has something like that
When you got to talking about how to handle the gas extractor on your minimap, I thought you handled it strangely. Keep in mind that the RGB values for the colors you put on your map are arbitrary and serve to help you visually more than the computer. But you could have encoded some meaningful data into the RGB itself. For example, instead of saying "This building is green, this building is dark green" and so on, you could have put all building/unit type info into the R-value of RGB. I.e.: this building is R-value 12, this building is R-value 13, and so on. Then the G-value could represent something else, like building health. I.e.: R-12, G-255 means it's a Refinery at full health, while R-12, G-1 means the Refinery is about to explode if it takes any more damage. Finally, the B-value could then be used as an indicator of something specific to that building. R-13 might be a Barracks, and B-2 might mean that it's in the middle of training something and has 2 units of time before it finishes and can do something else. On the other hand, R-14 might be a Gas node, and B-# could indicate how much gas that node has, while R-15 indicates that this is an extractor, with the B-# still indicating how much is still in the node. Sure, to YOU R-14 and R-15 are basically the same amount of red and your eyes wouldn't be able to tell the difference, but to a computer, those are two distinct values.
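The encoding scheme described above can be sketched as follows. The type IDs 12 to 15 follow the comment's examples; the helper name, the map size and the clamping rules are assumptions for illustration.

```python
import numpy as np

# R channel: unit/building type, using the example IDs from the comment.
TYPE_IDS = {"refinery": 12, "barracks": 13, "gas_node": 14, "extractor": 15}


def encode_cell(unit_type: str, health_fraction: float, extra: int) -> tuple:
    """Pack type, health and one type-specific value into an (R, G, B) triple."""
    r = TYPE_IDS[unit_type]
    # G channel: health scaled to 1..255, so a living unit is never 0.
    g = max(1, min(255, int(health_fraction * 255)))
    # B channel: type-specific value (build time left, gas remaining, ...).
    b = max(0, min(255, int(extra)))
    return (r, g, b)


minimap = np.zeros((16, 16, 3), dtype=np.uint8)
minimap[3, 4] = encode_cell("refinery", 1.0, 0)    # full-health refinery
minimap[7, 2] = encode_cell("barracks", 0.5, 2)    # half health, 2 time units left
```

The resulting image looks almost uniformly dark to a person, so a separate, brightly colored copy is still useful for debugging.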
So coooool 🤗
Can some of the values be included in the iteration or training ... like for example the reward values?
I really want to buy that book.
I love how I know every word but have no idea what you're talking about. (YouTube recommended this video because I follow StarCraft 2)
Any chances of putting the books on amazon as well?
Here's my issue with your approach: Your actions are basically just a hard coded list of commands. You could essentially just create a hierarchy of those commands and apply a little probability and get similar results. The way you've set this up, the AI will never develop novel strategies. It can, at best, play with the topmost level of human strategy available (and that's only if you spend the time to hardcode that into each action). And, that's cool, but... I feel like the point of an exercise like this should be to see how the AI "thinks" about the task and what novel strategies might arise from that. Idk, I do understand that the computing power to decide between the thousands of different options available at any given moment in an RTS is beyond most personal computers, but... I feel like hard-coding the actions kind of defeats the whole purpose.
@JOHNSMITH-ve3rq
2 years ago
hard agree. love the channel but yeah -- all the hardcoded rules are confusing. Can't you simply give it the barest of initial game parameters - no strategy, no rules - and let it learn from winning strategies?
Maybe code a reward for seeing unique enemy units/buildings. That way ai would have to scout the map for enemies, then double the reward for attacking if the attack unit was recently seen by a scout unit.
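A reward for first-time sightings, as proposed above, needs little more than a set of what has already been seen. The function name, the type labels and the reward size are hypothetical.

```python
def scouting_reward(seen_types: set, visible_types: list,
                    reward_per_new: float = 1.0) -> float:
    """Pay out once for each enemy unit/building type seen for the first time."""
    new_types = set(visible_types) - seen_types
    seen_types |= new_types   # remember them so they are not rewarded again
    return reward_per_new * len(new_types)


seen = set()
first = scouting_reward(seen, ["barracks", "marine"])    # two new types
second = scouting_reward(seen, ["marine", "factory"])    # only "factory" is new
```

Because each type pays out once per game, the agent cannot farm this reward by staring at the same enemy base.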
your strategy of multiple Ais to coordinate everything at the end that you mention is the same one Paradox Entertainment uses in games like Stellaris and EU4
Couldn't you use the mini map and cursor to move camera by clicking said mini map? or am I missing something?
9:00 another reward could be time. If you win a match, then the shorter the time, the extra rewards you get. Like wise, the opposite if you lose.
Rewarding unit destruction over victory could lead to the bot learning NOT to win, and instead stalemate, in order to maximize unit destruction.
years of learning starcraft leads me to simply queue a few things but otherwise not give too many orders as that delays facilitation of actions.
Hello, may I ask how to automate process of training there? Or I need to manually restart game everytime?
Well, you could add rewards for scouting new locations, rewards for keeping units alive, and check out the math of pro starcraft players of what units you should use and when. Also tier rewards for which units and buildings it will destroy to reinforce priority targeting for better performance.
You could also give the ai a larger gradient win reward or larger gradient lose reward for shorter games.
SENTDEX: I'm going to teach AI to rush Voidrays. Protoss mains: STOP! I can only get so aroused. Zerg: This is a war crime.
Might also want to look into upgrades if you haven't. Units in mid-late without any upgrades are much worse in SC and this might have quite an impact.
Wish to also learn those skills for software testing
like Deepmind Alphastar, cool, would love to see full gameplay of this, please?
is there a c# version of the code snippets? maybe another resource that can teach machine learning algorithms in c# also? :)
Hey Lieutenant Commander Data, I'm going to go buy your book so I can join Starfleet too!
I read the title and thought "Isn't that redundant? Star Craft has its own A.I.", then I thought "Oh, crap you're mixing A.I.s. Who could have known the singularity was going to come from the greatest RTS ever?"
Such good learning materials! 🤗🤓 Most professional educators can’t do this
@hikari1690
2 years ago
Be honest, you weren't interested in professional education materials cause they use chess or cat and mouse instead of SC2 🤣🤣🤣
What if you sent a few units to the starting locations in a clockwise order using waypoints of some sort (at x time go to y waypoint, repeat)? These waypoints could also be used to pinpoint locations where the computer hid and caused a loss. Also, what if losing many units quickly triggered the upgrade-building function? I think both of these ideas could be implemented with your "engines". This was extremely entertaining!
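A rough sketch of the clockwise waypoint idea above: sort the candidate locations by angle around the map center, then pick the current waypoint from the game time. The coordinates, the 60-second interval and both function names are invented for illustration, and the sort assumes the y axis points up.

```python
import math


def clockwise_order(points: list, center: tuple) -> list:
    """Sort points clockwise around center (y axis pointing up)."""
    cx, cy = center
    # atan2 grows counterclockwise, so sorting in descending order goes clockwise.
    return sorted(points,
                  key=lambda p: math.atan2(p[1] - cy, p[0] - cx),
                  reverse=True)


def waypoint_at(route: list, game_time: float, seconds_per_waypoint: float) -> tuple:
    """'At x time go to y waypoint, repeat': cycle through the route over time."""
    index = int(game_time // seconds_per_waypoint) % len(route)
    return route[index]


# Hypothetical start locations on a 100 x 100 map.
starts = [(50, 10), (90, 50), (10, 50), (50, 90)]
route = clockwise_order(starts, center=(50, 50))
target = waypoint_at(route, game_time=125, seconds_per_waypoint=60)
```

Here the route runs left, top, right, bottom, and at 125 seconds the scouts head for the third stop.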