A series of exhibition matches between DeepMind's 'AlphaStar' and StarCraft II pros TLO and MaNa ended in a one-sided victory for the artificial intelligence, which went a combined 10-1 against its human opponents.
The matches (broadcast from in-game replays) were played under the parameters of Protoss vs. Protoss and on the map Catalyst. AlphaStar was also restricted in the speed of its actions to keep it in line with human pros, such as having a 350 millisecond delay between perceiving information and issuing commands. The AI also played with an average APM (actions per minute) of 280, compared to 390 for MaNa and 678 for TLO.
Despite making a number of decisions that would be considered questionable by human pro standards—such as over-saturating its mineral lines with Probes, building very large numbers of Disruptors (and inflicting self-damage with them), or aggressively attacking high-ground defenses—AlphaStar was able to overpower its human opponents. Some of its plays resembled those of extremely high-APM bots (inhuman speed), such as complicated flanking actions or extremely precise Blink micro.
TLO and MaNa faced a number of different AlphaStar 'agents'—versions of the AI that trained through matches against each other in an internal league (combined with initial imitation learning from human replays). With accelerated training, the agents were able to accrue around 200 years of real-time StarCraft II experience over 14 days. The agents in the demonstration showed different strategic preferences—for example, one opted for mass Disruptors while others chose mass Blink Stalkers.
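The "200 years in 14 days" figure implies an enormous acceleration factor over real-time play; a quick back-of-the-envelope check (using only the numbers quoted above, and ignoring that training was parallelized across many agents):

```python
# Rough arithmetic behind the quoted figure: ~200 years of game
# experience accrued over 14 days of wall-clock training. The inputs
# are the article's numbers; everything else is simple unit conversion.
years_of_play = 200
wall_clock_days = 14

speedup = (years_of_play * 365) / wall_clock_days
print(f"effective speedup: ~{speedup:.0f}x real time")
```

That works out to roughly 5,200 game-days of experience per day of training, which is why self-play leagues can explore strategy spaces no human could in a lifetime.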
While AlphaStar defeated both TLO and MaNa 5-0 in previously recorded matches, MaNa was able to take a victory in a final, live exhibition game against a new agent. While the first ten agents were effectively able to 'see' the entire map at once (NOT a maphack—more akin to a max zoom-out), the new agent was given restrictions to mimic human players' field-of-vision limitations during a game. While this new agent still had an estimated MMR of over 7000, MaNa was able to defeat its mass-Stalker strategy with careful scouting and an overpowering army of Immortals.
While DeepMind chose Protoss and a single map to expedite the training process, the company said the approach could be applied to any race. DeepMind previously gained fame for developing the AlphaGo AI, which defeated a number of top Go players in exhibition matches.
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
I don't think we will ever reach a consensus that the AI outsmarted us at the game, due to the mechanical aspects of the game. A good number of those games were won with completely inhuman stalker micro.
I'd like to know epms and not worthless apms for this, considering the AI was capable of microing stalkers in 3 different places at once and win vs mass immortals.
On January 25 2019 06:48 ArtyK wrote: I'd like to know epms and not worthless apms for this, considering the AI was capable of microing stalkers in 3 different places at once and win vs mass immortals.
Honestly, I feel StarCraft II is much more of a mechanics game, and less of a 'strategy' game, than most people think (it's more real-time connect-four than chess), so I also felt this demonstration wasn't quite as impressive as it could have been.
At least for my personal taste, the truly impressive way to hold this exhibition would be to have an AI that's actually considerably BAD mechanically but still crushes human opponents based on strategies and tactics (and, I guess, EXTREME efficiency of actions).
On January 25 2019 06:48 ArtyK wrote: I'd like to know epms and not worthless apms for this, considering the AI was capable of microing stalkers in 3 different places at once and win vs mass immortals.
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
How is that insignificant? The raw interface didn't lose, yet the camera interface lost the first game it played. That means there is a huge actual difference between the two. Keep in mind that it's an estimated MMR; it's not like the AI played the human ladder and only managed to reach 7300 MMR instead of 7500. It's the internal MMR from their internal league: "The version of AlphaStar using the camera interface was almost as strong as the raw interface, exceeding 7000 MMR on our internal leaderboard."
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
How is that insignificant? The raw interface didn't lose, yet the camera interface lost the first game it played. That means there is a huge actual difference between the two.
This is where I stop engaging you in discussion ^_^
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
How is that insignificant? The raw interface didn't lose, yet the camera interface lost the first game it played. That means there is a huge actual difference between the two. Keep in mind that it's an estimated MMR; it's not like the AI played the human ladder and only managed to reach 7300 MMR instead of 7500. It's the internal MMR from their internal league: "The version of AlphaStar using the camera interface was almost as strong as the raw interface, exceeding 7000 MMR on our internal leaderboard."
AlphaStar lost because MaNa adapted and knew what to expect going in. All you have to do is defend as you would against a regular Insane AI and prioritize ramp defenses. I bet MaNa could do a clean 5-0 sweep if he played again.
I would also tend to put more weight on the results (despite the low sample size for the camera-restricted agent) than on their estimated MMR, at least without information about how those estimations are made. They said on the stream they really didn't know how powerful the agents would be, or even whether they had selected the agents with the best chance against human players. They don't seem to have a lot of faith in these MMR estimates; it seems to just be the MMR from the internal leaderboard.
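For a sense of what the disputed ~200 MMR gap might mean in practice, here is a hedged sketch: Blizzard's MMR is not published as a pure Elo system, but an Elo-style logistic curve is a common way to translate a rating gap into an expected win rate. The divisor of 400 is the classic chess Elo scale, assumed here for illustration, not anything DeepMind or Blizzard has confirmed.

```python
# Elo-style expected score: probability that player A beats player B.
# Assumption: ratings behave like chess Elo (logistic curve, scale 400).
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# The ~200-point gap debated above (raw interface vs camera interface):
p = expected_score(7200, 7000)
print(f"~{p:.0%} expected win rate for the higher-rated agent")  # ~76%
```

Under these assumptions a 200-point gap is a ~76/24 matchup, not a trivial one—but with a single game played by the camera agent, the sample is far too small to settle the argument either way.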
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
How is that insignificant? The raw interface didn't lose, yet the camera interface lost the first game it played. That means there is a huge actual difference between the two. Keep in mind that it's an estimated MMR; it's not like the AI played the human ladder and only managed to reach 7300 MMR instead of 7500. It's the internal MMR from their internal league: "The version of AlphaStar using the camera interface was almost as strong as the raw interface, exceeding 7000 MMR on our internal leaderboard."
AlphaStar lost because MaNa adapted and knew what to expect going in. All you have to do is defend as you would against a regular Insane AI and prioritize ramp defenses. I bet MaNa could do a clean 5-0 sweep if he played again.
Not really. I watched the replay and, even though I don't know PvP much, I'm pretty sure AlphaStar had the game somewhat won after the successful harass (yeah, even after the adept counter-harass from MaNa); it had far better income and a sufficiently higher army value to have time to build a good composition.
But it got stuck really badly on the warp prism harass, and when MaNa started to attack near the 2nd base it was really indecisive.
So it feels like the AI can have a good start because it's relatively easy, but isn't able to change plans according to the current situation (a 2nd oracle was queued up during the harass, no split of stalkers in the main base when the warp prism got cornered).
Reading the article, I'm not sure if their approach can overcome these difficulties given more time.
This was interesting. Nevertheless, right now the AI is playing a different game. It has much higher effective APM, far superior micromanagement, and superior vision; it will not be distracted in terms of sight. That is why it won games. The strategies presented by the AI were rather poor and its reactions questionable. I would say that AI still has a long way to go before it beats any pro players on equal terms, as a successful AI must prove itself by beating players mainly through superior strategies.
On January 25 2019 06:48 ArtyK wrote: I'd like to know epms and not worthless apms for this, considering the AI was capable of microing stalkers in 3 different places at once and win vs mass immortals.
I'm pretty sure apm = epm for the AI
yeah that's what made sense to me but i'd still like to know for all 3 during the matches
I'm quite impressed by the decision making and overall gameplan the AI showed, but after looking at the replays, the micro it's displaying is completely inhuman; it's basically playing with 10-20% more supply in terms of how efficient its units are.
As many others are pointing out, there's not much to be impressed with while the AI is playing a different game. We already know that a microbot using an API can outmicro a human using a mouse. I look forward to seeing what they come up with in the future, but what I saw today was no more interesting than if they had been feeding the AI bonus minerals.
Also looking forward to detailed replay analysis, I expect that will reveal that "APM" is a very misleading metric to be using.
On January 25 2019 07:17 No_Roo wrote: As many others are pointing out, there's not much to be impressed with while the AI is playing a different game. We already know that a microbot using an API can outmicro a human using a mouse. I look forward to seeing what they come up with in the future, but what I saw today was no more interesting than if they had been feeding the AI bonus minerals.
Also looking forward to detailed replay analysis, I expect that will reveal that "APM" is a very misleading metric to be using.
I mean - what we saw today was FAR superior to any bot we've ever seen in terms of strategy/decision-making, so I wouldn't call it unimpressive. It's definitely not at the level of having better decision-making than a pro, though.
On January 25 2019 07:17 No_Roo wrote: As many others are pointing out, there's not much to be impressed with while the AI is playing a different game. We already know that a microbot using an API can outmicro a human using a mouse. I look forward to seeing what they come up with in the future, but what I saw today was no more interesting than if they had been feeding the AI bonus minerals.
Also looking forward to detailed replay analysis, I expect that will reveal that "APM" is a very misleading metric to be using.
Exactly; from the games I've glanced at, this isn't very interesting. Perfect orb-walking and dropship/blink micro. If humans could do that, then all dropships would be heavily nerfed and blink would be removed from the game. I'm sorry, Google - no one is shocked nor cares about this.
Sadly their marketing department will continue to push this bot down our throats while they tell themselves that they've revolutionized AI.
Sure, the micro was incredible, but people aren't giving AlphaStar enough credit for its decision making. It made a few mistakes, like the warp prism harass, but overall it was amazing. Clutch recalls, incredibly good targeting decisions and movement. Really cool to see.
I was hoping for something a little more outlandish to come out of it. I heard they trained it on pro replays initially, which would explain why it played so similarly to human players. It certainly played a very high-micro aggressive style, which it is obviously suited for, but I guess I was hoping for something that would really challenge what we know about SC2. Maybe the probe oversaturation is a thing like that?
I am more interested in how the global top 10-20 would do against the AI. Still interesting to watch and see the decisions the AI makes, and also the better and worse qualities of the AI. I'm also a bit sad that it wasn't smart enough to saturate its bases better, which is one of the first things you learn playing StarCraft.
On January 25 2019 07:17 No_Roo wrote: As many others are pointing out, there's not much to be impressed with while the AI is playing a different game. We already know that a microbot using an API can outmicro a human using a mouse. I look forward to seeing what they come up with in the future, but what I saw today was no more interesting than if they had been feeding the AI bonus minerals.
Also looking forward to detailed replay analysis, I expect that will reveal that "APM" is a very misleading metric to be using.
Exactly; from the games I've glanced at, this isn't very interesting. Perfect orb-walking and dropship/blink micro. If humans could do that, then all dropships would be heavily nerfed and blink would be removed from the game. I'm sorry, Google - no one is shocked nor cares about this.
Sadly their marketing department will continue to push this bot down our throats while they tell themselves that they've revolutionized AI.
User was warned for this post
Do you have any technical understanding whatsoever of AI? Or are you just talking out of your ass?
This is an impressive technical achievement. I wouldn't call it revolutionary, since the algorithms DeepMind is working from are fairly mainstream, but this was still a great first showing.
On January 25 2019 07:43 FueledUpAndReadyToGo wrote: Crazy... Never thought it would be this effective
Does this thing run on some supercomputer mainframe or just a regular i7?
The training is done on 16 TPUs if I recall correctly (specialized accelerators for machine-learning computations), but the agent that plays against humans is made to run on a single desktop GPU (probably a powerful one though; the latest, most powerful Nvidia card would be my guess).
Wow, I wasn't expecting people to bash the A.I. by saying it is only a good microbot and that "no one is shocked nor cares about this".
I was really impressed, and I think it is much more than a simple microbot. It knew when to push and when to back off, meaning it could size up its opponent's army; it targeted warp prisms with phoenixes without over-committing them, because it knew the strength of warp prisms in an attack. It also knew when to expand after seeing its opponent's army in the final showmatch. The A.I. also improved quite significantly between the games vs TLO and the ones versus MaNa. It clearly learned that blocking its entrance was useful and that ramps were also dangerous. While the A.I. didn't know how to deal with MaNa's double immortal drop, that might simply be because it did not practice against this situation before. However, I'm pretty sure it is something it can learn, since it clearly knew how to deal with mineral line harassment by leaving stalkers in its mineral line.
On January 25 2019 07:46 yep wrote: Wow, I wasn't expecting people to bash the A.I. by saying it is only a good microbot and that "no one is shocked nor cares about this".
I was really impressed, and I think it is much more than a simple microbot. It knew when to push and when to back off, meaning it could size up its opponent's army; it targeted warp prisms with phoenixes without over-committing them, because it knew the strength of warp prisms in an attack. It also knew when to expand after seeing its opponent's army in the final showmatch. The A.I. also improved quite significantly between the games vs TLO and the ones versus MaNa. It clearly learned that blocking its entrance was useful and that ramps were also dangerous. While the A.I. didn't know how to deal with MaNa's double immortal drop, that might simply be because it did not practice against this situation before. However, I'm pretty sure it is something it can learn, since it clearly knew how to deal with mineral line harassment by leaving stalkers in its mineral line.
I believe it all comes down to not understanding the technology behind it and how the program operates.
I am very amazed, and I hope they already learned that there are people on the internet who won't like what they do. I liked it very much. We need a company which will build Skynet and kill humanity! Teaching it Go and StarCraft seems like a plausible start.
Oversaturating bases is the future of Starcraft and brute forcing ramps is the future of Starcraft!
Seriously speaking, AlphaStar was indeed impressive but I wasn't satisfied with it inhumanly microing blink stalkers during the most convincing victory and falling to immortal drops when it eventually lost; however, it's already way, way better than any AI in terms of decision making.
I actually don't get why they didn't let a single agent play more than one game, to show its capability of learning from its mistakes; is it still at a point where it needs thousands of games to develop an effective change in strategy?
On January 25 2019 07:46 yep wrote: Wow, I wasn't expecting people bashing on the A.I by saying it is only a good microbot and that "no one is shocked nor cares about this".
I was really impressed, and I think it is much more than a simple microbot. It knew when to push and when to back off, meaning it could size up its opponent's army; it targeted warp prisms with phoenixes without over-committing them, because it knew the strength of warp prisms in an attack. It also knew when to expand after seeing its opponent's army in the final showmatch. The A.I. also improved quite significantly between the games vs TLO and the ones versus MaNa. It clearly learned that blocking its entrance was useful and that ramps were also dangerous. While the A.I. didn't know how to deal with MaNa's double immortal drop, that might simply be because it did not practice against this situation before. However, I'm pretty sure it is something it can learn, since it clearly knew how to deal with mineral line harassment by leaving stalkers in its mineral line.
I believe it all comes down to not understanding the technology behind it and how the program operates.
I am very amazed, and I hope they already learned that there are people on the internet who won't like what they do. I liked it very much. We need a company which will build Skynet and kill humanity! Teaching it Go and StarCraft seems like a plausible start.
Well said. The only people unimpressed by this are those too ignorant of computer science to understand what it means.
Reminds me of:
"Congressman, the iPhone is made by a different company."
They had TLO off-racing and then didn't get any top pros to face it. I don't mean TLO or MaNa aren't good pros, but it would have been cool to see AlphaStar vs Gumiho or Scarlett or Elazer, for example.
On January 25 2019 07:46 yep wrote: Wow, I wasn't expecting people bashing on the A.I by saying it is only a good microbot and that "no one is shocked nor cares about this".
I was really impressed, and I think it is much more than a simple microbot. It knew when to push and when to back off, meaning it could size up its opponent's army; it targeted warp prisms with phoenixes without over-committing them, because it knew the strength of warp prisms in an attack. It also knew when to expand after seeing its opponent's army in the final showmatch. The A.I. also improved quite significantly between the games vs TLO and the ones versus MaNa. It clearly learned that blocking its entrance was useful and that ramps were also dangerous. While the A.I. didn't know how to deal with MaNa's double immortal drop, that might simply be because it did not practice against this situation before. However, I'm pretty sure it is something it can learn, since it clearly knew how to deal with mineral line harassment by leaving stalkers in its mineral line.
I believe it all comes down to not understanding the technology behind it and how the program operates.
I am very amazed, and I hope they already learned that there are people on the internet who won't like what they do. I liked it very much. We need a company which will build Skynet and kill humanity! Teaching it Go and StarCraft seems like a plausible start.
Well said. The only people unimpressed by this are those too ignorant of computer science to understand what it means.
Reminds me of the US Congress trying to ask Sundar Pichai about Google making iPhones.
I wouldn't be so harsh on them. In the end, we may be in their position in other science fields.
On January 25 2019 07:59 geokilla wrote: They had TLO off-racing and then didn't get any top pros to face it. I don't mean TLO or MaNa aren't good pros, but it would have been cool to see AlphaStar vs Gumiho or Scarlett or Elazer, for example.
MaNa got 2nd place at some WCS event last year, I believe. Where was Scarlett at that time?
No offense to anyone, but MaNa is a very strong WCS player, maybe even stronger than Scarlett (don't take my word for it, I'm ignorant of WCS after all). I'd take MaNa and TLO. I would rather have seen sOs, Stats, or Classic, but they wouldn't speak as much English (it's a show) and it would be costly to fly them there.
On January 25 2019 06:48 ArtyK wrote: I'd like to know epms and not worthless apms for this, considering the AI was capable of microing stalkers in 3 different places at once and win vs mass immortals.
Honestly, I feel StarCraft II is much more of a mechanics game, and less of a 'strategy' game, than most people think (it's more real-time connect-four than chess), so I also felt this demonstration wasn't quite as impressive as it could have been.
At least for my personal taste, the truly impressive way to hold this exhibition would be to have an AI that's actually considerably BAD mechanically but still crushes human opponents based on strategies and tactics (and, I guess, EXTREME efficiency of actions).
This was awesome, since the decisions weren't that bad, but the whole thing looked staged due to the micro the AI had.
On January 25 2019 07:46 yep wrote: Wow, I wasn't expecting people to bash the A.I. by saying it is only a good microbot and that "no one is shocked nor cares about this".
I was really impressed, and I think it is much more than a simple microbot. It knew when to push and when to back off, meaning it could size up its opponent's army; it targeted warp prisms with phoenixes without over-committing them, because it knew the strength of warp prisms in an attack. It also knew when to expand after seeing its opponent's army in the final showmatch. The A.I. also improved quite significantly between the games vs TLO and the ones versus MaNa. It clearly learned that blocking its entrance was useful and that ramps were also dangerous. While the A.I. didn't know how to deal with MaNa's double immortal drop, that might simply be because it did not practice against this situation before. However, I'm pretty sure it is something it can learn, since it clearly knew how to deal with mineral line harassment by leaving stalkers in its mineral line.
I believe it all comes down to not understanding the technology behind it and how the program operates.
I am very amazed, and I hope they already learned that there are people on the internet who won't like what they do. I liked it very much. We need a company which will build Skynet and kill humanity! Teaching it Go and StarCraft seems like a plausible start.
Well said. The only people unimpressed by this are those too ignorant of computer science to understand what it means.
Reminds me of the US Congress trying to ask Sundar Pichai about Google making iPhones.
I wouldn't be so harsh on them. In the end, we may be in their position in other science fields.
Well, when I start spewing bs from ignorance, feel free to correct me.
I really wonder what AlphaStar's Zerg macro is going to look like, it has the potential to be VERY different than human Zerg macro in terms of when it drones and when it doesn't.
ZvT from AlphaStar will be especially fascinating to watch.
I was extremely skeptical in the beginning. However, I was positively surprised by the result. It is clear that DeepMind has done an amazing job. I do want to point out, though, that I am fairly certain that after multiple games the pros could figure out a way to defeat the AI easily by exploiting AI patterns. That said, I am eager to see how much DeepMind can improve on what they already have.
I also want to point out that pro APM is hardly comparable to AI APM, since every action of a PC is deliberate, while a lot of pro players' actions are spam. Maybe that's just nitpicking though.
On January 25 2019 08:21 404AlphaSquad wrote: I was extremely skeptical in the beginning. However, I was positively surprised by the result. It is clear that DeepMind has done an amazing job. I do want to point out, though, that I am fairly certain that after multiple games the pros could figure out a way to defeat the AI easily by exploiting AI patterns. That said, I am eager to see how much DeepMind can improve on what they already have.
I also want to point out that pro APM is hardly comparable to AI APM, since every action of a PC is deliberate, while a lot of pro players' actions are spam. Maybe that's just nitpicking though.
Pros should spam less and perform more effective actions. Shame on them! Learn from the AI while the AI is learning from you; it's a two-way street.
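The APM-vs-spam distinction raised above can be made concrete: APM counts every action, while EPM (effective APM) discards "spam". A hedged sketch, where the replay representation and the spam rule (immediate repeats of the same command) are deliberate simplifications, not Blizzard's actual EPM definition:

```python
# Toy APM/EPM calculator over a simplified action log.
# Assumption: an action repeating the previous command counts as spam.
def apm_and_epm(actions, game_minutes):
    """actions: time-ordered list of (timestamp_s, command) tuples."""
    effective = 0
    prev_cmd = None
    for _, cmd in actions:
        if cmd != prev_cmd:  # only command changes count as effective
            effective += 1
        prev_cmd = cmd
    apm = len(actions) / game_minutes
    epm = effective / game_minutes
    return apm, epm

# Three spammed move commands followed by two distinct actions:
log = [(0.1, "move"), (0.2, "move"), (0.3, "move"), (1.0, "attack"), (1.5, "blink")]
print(apm_and_epm(log, game_minutes=1.0))  # (5.0, 3.0)
```

This is why comparing a human's raw APM to an agent's is misleading: for an agent, nearly every action is effective, so its EPM is close to its APM, while a human's EPM is often far below their APM.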
Why converge on practically mono-building stalkers? Well, for one thing, this is a unit composition that becomes more attractive the more you can out-control your opponent. It's a good strategy if you can click faster and more accurately than your opponent.
On January 25 2019 07:17 No_Roo wrote: As many others are pointing out, there's not much to be impressed with while the AI is playing a different game. We already know that a microbot using an API can outmicro a human using a mouse. I look forward to seeing what they come up with in the future, but what I saw today was no more interesting than if they had been feeding the AI bonus minerals.
Also looking forward to detailed replay analysis, I expect that will reveal that "APM" is a very misleading metric to be using.
Exactly; from the games I've glanced at, this isn't very interesting. Perfect orb-walking and dropship/blink micro. If humans could do that, then all dropships would be heavily nerfed and blink would be removed from the game. I'm sorry, Google - no one is shocked nor cares about this.
Sadly their marketing department will continue to push this bot down our throats while they tell themselves that they've revolutionized AI.
There's plenty of room to debate the fairness of this test, and what the results mean, but you're off base when you claim that this isn't a shocking result and a breakthrough. A lot of people don't realize this, but writing an AI that is even capable of accurately evaluating the game state and producing a halfway competent strategy IS the notable result here. Previous RTS AIs were totally incapable of doing anything resembling a reasonable strategy, whether they had 200 APM or 10,000 APM.
Think back to the match where the stalkers targeted down the immortal that had popped out of the robo. No programmer ever entered code that said immortals were strong and that they should be prioritized. The AI figured that much out for itself, along with everything else it did to get there.
This was a very well thought-out, well structured presentation with very good production quality and I enjoyed every minute of it.
Of course there were factors benefiting AlphaStar (only one map, only one matchup, maybe higher eapm than humans can do), but this was just a presentation to show how neural networks can learn a very complex domain with reinforcement learning. It was not a tournament to decide the fate of humanity, so maybe we can just relax a little over such details.
It was truly fascinating to see how the agents got better from the first match to the second one with another week (~ 200 years) of training. Artosis was spot on to highlight the great decision making (respect for ramps, disengaging).
I look forward to what AlphaStar will be able to contribute to the game. Maybe pros will think about building more probes earlier. Maybe we could have a live evaluation during human tournament games to see who is ahead or what a good strategy would be from the current state. Such tools are used in chess to great effect and can really add to the casting of a tournament.
Of course the most important thing about AIs (to me) would be the whole ethical complexity. We are developing something that by many definitions is truly intelligent, yet we have no rules and no laws on how to interact with such machines. Who is accountable for the mistakes of an AI would be one of those questions.
On January 25 2019 08:35 No_Roo wrote: Why converge on practically mono-building stalkers? Well, for one thing, this is a unit composition that becomes more attractive the more you can out-control your opponent. It's a good strategy if you can click faster and more accurately than your opponent.
I'd argue not just when you have better control than your opponent, but when you have better control in an absolute sense. Meaning that even playing against other AI agents with the same mechanical abilities, Stalkers will be the best unit. Even counter units like Immortals, also controlled by the AI, will fail against Stalkers, right? With equal control on both sides, as long as that level of control is high enough.
I'd be interested what units it would land on in other matchups.
It's possible that the AI was overfit to its handicap, in the last match there were still multiple groups of stalkers over a screen away from each other, but the AI was slower without the handicap. Reminds me of Dota 2 and OpenAI, where the AI learned how to play using multiple couriers and developed strategies to fit that handicap, then lost to humans without it.
The AI result is quite incredible and is a hell of an achievement. Any "handicap" it had does not make this less so.
Like everyone, I'm excited to see how it fares in the future when not using a raw interface, while facing pros who are not off-race as well as how it does in non-mirror match ups.
The AI is on the right track. However, I was expecting more elaborate strategies. The AI kept bugging out by pressing F2 against drop harass instead of making a phoenix ^^.
From what I have seen of the stream and the replays: the equivalent of 3-4 gold players playing Archon mode vs a GM :D. Not really fair. Give the AI a mouse and the same camera: different story.
Stalkers are balanced for humans, not for agents. As AlphaStar cannot perform more actions per minute than MaNa, perhaps its edge is that it can perceive ("see") the HP level of all its stalkers at the same time; it doesn't need to focus its eyes on one after another like MaNa would. It always selects the best stalker to blink away and uses its APM super-effectively. Under these conditions, stalkers are overpowered.
Most agents build way more stalkers than any other unit (see statistics at deepmind's website), because it is the best one on their "hands". Attacking or defending with it is too good of a strategy. This limits the strategic depth of the agent, particularly it does not need to scout because nothing counters stalkers properly.
To overcome this limitation, they could lower the average APM limit while training, or put an absolute limit to the instantaneous APM.
Almost forgot: AlphaStar is a tremendous achievement even at this stage of its development. Best gaming AI ever, easily. I can't wait to see what DeepMind brings next time.
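The "absolute cap on instantaneous APM" suggested above amounts to a sliding-window rate limiter on the agent's action stream. A minimal sketch, assuming made-up numbers (the 600-APM cap and 1-second window are illustrative, not anything DeepMind has published):

```python
from collections import deque

class BurstAPMLimiter:
    """Hard cap on instantaneous APM, measured over a short sliding window.

    Illustrative sketch only: the cap and window size are invented numbers.
    """
    def __init__(self, max_burst_apm=600, window_s=1.0):
        self.max_actions = max_burst_apm * window_s / 60.0
        self.window_s = window_s
        self.stamps = deque()  # timestamps of recently allowed actions

    def allow(self, now):
        # Forget actions that have slid out of the window.
        while self.stamps and now - self.stamps[0] >= self.window_s:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_actions:
            return False  # over the instantaneous cap: drop or queue the action
        self.stamps.append(now)
        return True
```

An average-APM budget could be enforced the same way with a second, much longer window; the point is that capping only the average still permits the superhuman bursts seen in the Blink fights.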
>While the first ten agents were effectively able to 'see' the entire map at once (NOT a maphack—more akin to a max zoom-out), the new agent was given restrictions to mimic human players' field-of-vision limitations during a game.
So much of StarCraft 2 is screen placement and attention. Letting the Agent basically interact with the entire map at once really defeats the purpose, in my opinion. It's like letting your opponent move their chess pieces whenever they want while you can still only move one per turn. In that regard, it feels like the Agent wasn't even playing "StarCraft" at that point.
That said, it's really cool and I'm happy for them. It just feels like something is off with their approach. Showcasing an Agent with SUCH a huge advantage (arguably a core, fundamental aspect of SC2 being removed in its favor) vs. a human player just doesn't mean much to me. You can't handicap an agent enough to make up for this insane advantage.
On January 25 2019 09:02 LHK wrote: While this was cool, I'm really disappointed in
>While the first ten agents were effectively able to 'see' the entire map at once (NOT a maphack—more akin to a max zoom-out), the new agent was given restrictions to mimic human players' field-of-vision limitations during a game.
So much of StarCraft 2 is screen placement and attention. Letting the Agent basically interact with the entire map at once really defeats the purpose, in my opinion. It's like letting your opponent move their chess pieces whenever they want while you can still only move one per turn. In that regard, it feels like the Agent wasn't even playing "StarCraft" at that point.
That said, it's really cool and I'm happy for them. It just feels like something is off with their approach. Showcasing an Agent with SUCH a huge advantage (arguably a core, fundamental aspect of SC2 being removed in its favor) vs. a human player just doesn't mean much to me. You can't handicap an agent enough to make up for this insane advantage.
I mean, all of these handicaps that try to mimic human limitations will be flawed. You could take this even further and say "well, the AI should only be able to focus its 'eyes' on an X by Y portion of the screen at a time, and require at least Z milliseconds to process that information." You'll never please everyone, and the best we're gonna get to is 'good enough.'
Well, the first 10 games were played with no "camera simulation", i.e. at "full zoom-out": the AI could see, react, click, and micro anywhere visible on the map. This is how perfect Blink micro on 3 separate fronts surrounding MaNa's army was possible with not-that-much APM, but it is still a very inhuman action.
Camera decisions (where to look) and minimap reading (you see a dot on the minimap: what is it? should you respond to it now or later?) are very important skills for decent human players.
Furthermore, between game, screen, eyes, and brain, there is a minimum delay to understand and react each time a player changes the camera location. Then there is further delay from mouse control, etc.
Still, very impressive results, although expected coming from DeepMind. The next step should be more mechanical fairness (seeing only the minimap and one screen, max 4 screen moves per second, mouse-cursor simulation, etc.) before playing all races and all matchups, because it is already overfitting the "full zoom-out" setting, as seen in the mass Blink-surround play.
I don't like how a lot of the news articles imply that AlphaStar has beaten the best that humans can offer. I believe that conclusion should only be drawn once the BEST human players have had the opportunity to interact with the AI for a while, to come up with strategies catered towards the AI, and the AI still beats them 99 percent of the time. Otherwise, the AI is just another extremely good player, not the unbeatable colossus that it was in chess and Go.
Moreover, DeepMind should really cap the peak APM, because no player can hit 1000 APM microing Stalkers on three different fronts, chipping away at units little by little. That's just a micro-bot doing its thing. I wish they would limit its APM enough that the AI is forced to beat human players not with micro, but with sheer ingenuity and picking the right compositions. That would be the actual great milestone in AI, imo.
On January 25 2019 09:11 Waxangel wrote: You'll never please everyone, and the best we're gonna get to is 'good enough.'
Yep. But the full zoom-out hack (while very convenient from a technical standpoint) is far from "good enough" conditions and closer to an obvious cheat. It is even close to a human-usable cheat, because even human players would play better with the ability to zoom out further.
IDK if they will, but I would like to see AlphaStar play some more games against these pro players. It would also be interesting to see one of the best, like Innovation, play against the AI. I still think what they have right here, a fair AI that is not cheating, playing a full game at this level and winning, is a pretty amazing achievement. I would like to see more games where the pro players can look for weaknesses, but the AI being completely unpredictable makes it harder to read than a normal person. It's too bad they can't transfer this AI to Brood War so we could see a Flash or Jaedong play against it. I also think Brood War is a more balanced, time-tested game than SC2.
Do you think we'll soon get the very best players to play against AlphaStar? I wish they had at least one S-level player today instead of MaNa or TLO (not to say they're bad players at all).
On January 25 2019 09:20 jalstar wrote: The last AI we saw was still very good and would have beaten other humans who aren't as good as Mana
Yes, but it is time to stop overfitting the zoom-out. I guess, as seen with the 11th game, they understand that. Using the cheat-code version to get to 10-0 is a bit frustrating; ideally they should have waited until the more legit version could win games vs. pros before the exhibition matches.
This was really cool and amazing. However, SC2 hasn't been established as long as a game like Go (whose rules don't change with patches). I hope we get a chance to see a rematch in five years, when I believe we will start to see the pinnacle of SC2 players.
On January 25 2019 09:24 snakeeyez wrote: IDK if they will, but I would like to see AlphaStar play some more games against these pro players. It would also be interesting to see one of the best, like Innovation, play against the AI. I still think what they have right here, a fair AI that is not cheating, playing a full game at this level and winning, is a pretty amazing achievement. I would like to see more games where the pro players can look for weaknesses, but the AI being completely unpredictable makes it harder to read than a normal person. It's too bad they can't transfer this AI to Brood War so we could see a Flash or Jaedong play against it. I also think Brood War is a more balanced, time-tested game than SC2.
I guarantee there will be many more matches, against progressively superior AI. This is just the beginning.
Can they put this bot into SC2 and let the public play games against it? In my opinion these results are pretty amazing. There are tons of AIs and bots for Brood War and other RTS games. None of them are even close to this level of play, and this bot is doing it all with human APM and without cheating.
I do have to wonder if the way it plays is informed by its mechanical ability. If we gave it infinite APM and a client with enough framerate to support it, would it just make totally fundamentally unsound plays because it can get away with them?
On January 25 2019 09:28 Garrl wrote: I do have to wonder if the way it plays is informed by its mechanical ability. If we gave it infinite APM and a client with enough framerate to support it, would it just make totally fundamentally unsound plays because it can get away with them?
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
Probably the more accurate description. Though the AI is technically impressive, we already know that an appropriately programmed AI with sheerly precise APM will always win over a human. The AI didn't win through strategy; it won through sheer mechanical muscle.
On January 25 2019 09:27 ANGELIAS1234 wrote: this means nothing unless they beat top KR players
I have been out of the SC2 scene for a while, but I love AI research, so seeing this is pretty cool. I always wondered what the best AI we could produce could do against top-level players. What types of strategies does it use, and are humans playing the game wrong? Can an AI beat someone like Innovation in a long 10-game set? IDK, that seems tough. They said this AI had normal APM, below that of TLO; they showed the graph. It was not cheating.
I would love to see how the Korean pros fare against the AI. I don't think the first 5 games with TLO are a true reflection, given that it was TLO's off-race. If it had been Stats/Zest/Parting, they would have handled it much better.
The AlphaStar AI is definitely leaps and bounds better than previous AI bots, but we did see a few openings/weaknesses.
I'd argue even the AI's effective APM was too high. 270 actions per minute that are all 100% precise and achieve exactly what the computer intends is beyond insane. Add to that that, without screen manipulation, its effective APM is even higher, since we humans spend quite a bit of APM on just that. As mentioned above, it also has inhuman perception, for example in knowing every Stalker's health.
Ultimately the AI did seem to understand the core concepts of the game, but it didn't really do anything 'genius' strategically. It would be very interesting to see this AI handicapped much more, to the point where it would absolutely have to outsmart a human pro player.
Still a great achievement for an AI, as it did understand many core concepts. However, in a sense the showcase versus a pro player felt a bit misplaced. It showed me that an AI with only a reasonable understanding of the game will still win with insane micro. It didn't really show that the AI was capable of actually solving the strategic aspects of SC2. Honestly, it didn't really seem to understand the game well at all. I hope they try training it with a bigger handicap to see if it can actually play like an intelligent human being.
What was really clear here were the economic choices the AI makes when selecting its strategy/unit composition. It's clear across the different agents that they value Stalkers so highly that Stalkers are almost always selected as part of the unit composition.
Then, we know that they incentivise the selection of certain units to create variety amongst agents, leading to their different 'preferences'. However, if you look at the games, those incentives are clearly so strong that the AI always over-builds that one unit, because its predicted value of any of the other units does not match the power of the incentive. So in the last game, the AI didn't build a phoenix because it was incentivised to build oracles, so it always thought that was more valuable than responding to the drop.
That's clearly the biggest weakness of the AI at the moment, but as the agents play against each other more, and as the need for incentives declines, there should be a greater sophistication about which unit preferences create the most value in different scenarios.
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
How is that insignificant? The raw interface didn't lose, yet the camera interface lost the first game it played. That means there is a real difference between the two. Keep in mind that it's an estimated MMR; it's not as if the AI played the human ladder and only managed 7300 MMR instead of 7500. It's the internal MMR from their internal league: "The version of AlphaStar using the camera interface was almost as strong as the raw interface, exceeding 7000 MMR on our internal leaderboard."
To answer your question, it's insignificant because you're committing a post hoc ergo propter hoc fallacy: you see that one event followed another (the AI lost after the camera change happened) and conclude that the earlier event caused the later one.
It's related to the fact that "correlation does not imply causation".
You don't know that it was the camera change that actually was the determining factor here. It could be that MaNa had a better idea of what he was up against. It could be that the Warp Prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as the other AIs.
It's impossible to say, so the safest bet is to include it as part of the larger dataset rather than think of it as a case study unto itself.
On January 25 2019 09:28 Garrl wrote: I do have to wonder if the way it plays is informed by its mechanical ability. If we gave it infinite APM and a client with enough framerate to support it, would it just make totally fundamentally unsound plays because it can get away with them?
Yes.
Heh, otherwise known as "The Marineking" approach to SC2.
On January 25 2019 09:46 snakeeyez wrote: Have they said what kind of neural net it uses? How big is the neural net? They said it also uses reinforcement learning.
We'll have to wait until they give more info. Just saying "reinforcement learning" doesn't mean much. The best you could try to deduce things from would be the one time they tried to explain the AI's thought process, with MaNa's and the AI's POV along with other things on the screen.
On January 25 2019 09:46 snakeeyez wrote: Have they said what kind of neural net it uses? How big is the neural net? They said it also uses reinforcement learning.
You can check out their old paper and blog, but yeah, we have to wait for the updated paper.
On January 25 2019 09:46 snakeeyez wrote: Have they said what kind of neural net it uses? How big is the neural net? They said it also uses reinforcement learning.
Eventually, if they kept taking the winning bots of these leagues, would it end up with one or a few perfect, well-rounded strategies that never lose? Maybe ones that humans, with our limitations, would never have found? Or is the bot finding strats that are not practical for humans?
These results suggest that AlphaStar’s success against MaNa and TLO was in fact due to superior macro and micro-strategic decision-making, rather than superior click-rate, faster reaction times, or the raw interface.
This conclusion isn't particularly convincing. DeepMind's research is great, but their propensity for PR stunts is wearing.
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
Perhaps a more compelling nitpick for you, then, would be that TLO was off-racing. He is not pro level with his Protoss. Really, it went 5-1 against pros, and I don't even think the parent's argument is insignificant. Also, MaNa seemed to find a major flaw in the final game: Warp Prism harass. I'd imagine that in its current state, if you ran this strategy 100 times, the bot would lose every one of them. The title is misleading to seem more impressive than it truly is. Humanity will not kneel to machine.
Does anyone notice the irony of demanding a handicap every time a computer does something better than a human and then arguing that the computer is not better?
In the near future people will demand "the computer use the same amount of biomass as a human brain for its calculations, otherwise it's unfair"...
Even the "unfair" superhuman Blink micro was self-learned by the AI. That alone is pretty impressive, given that 1-2 years ago neural nets could barely win micro fights against the built-in AI.
On January 25 2019 11:02 imp42 wrote: Does anyone notice the irony of demanding a handicap every time a computer does something better than a human and then arguing that the computer is not better?
In the near future people will demand "the computer use the same amount of biomass as a human brain for its calculations, otherwise it's unfair"...
Even the “unfair” super-human blink micro was self-learned by the AI. That alone is pretty impressive, given that 1-2 years ago Neural nets could barely win micro fights against the built-in AI.
The point, rather, is to have an AI beat a human while not relying on superhuman skills; its capability to take autonomous decisions and develop original, appropriate strats is what counts, not perfectly microing Blink Stalkers to the point humans can't keep up: every "stupid" ordinary AI can do that already.
That said, the AI not being coded to do so but getting there by itself is truly impressive; it just isn't what the DeepMind project is developed for and expected to do (I guess).
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
Humanity will not kneel to machine.
If you said that unironically I guess we've truly arrived in a new age
On January 25 2019 11:02 imp42 wrote: Does anyone notice the irony of demanding a handicap every time a computer does something better than a human and then arguing that the computer is not better?
In the near future people will demand "the computer use the same amount of biomass as a human brain for its calculations, otherwise it's unfair"...
Even the “unfair” super-human blink micro was self-learned by the AI. That alone is pretty impressive, given that 1-2 years ago Neural nets could barely win micro fights against the built-in AI.
This is indeed extremely... let's say, weird. But what is more disturbing is this stuff being pushed in 2019. No one thinks that such a computer can lose to a human without limitations. In the end, it becomes a pointless exercise of giving the computer random limitations while still having it succeed. That is why I called this a moronic PR stunt. It accomplishes nothing; the outcome is not unexpected at all (I hope so, at least).
This whole thing would be interesting if somehow you could put real human limitations on the AI, so that we could get something interesting out of its strategic decisions. If we wanted to see meme micro, we could just watch one of those years-old videos where a computer dodges tank shots with a huge group of Zerglings by isolating the single Zergling being targeted.
I'm 100% confident that AI will drastically outperform humans no matter what, even if you impose further mechanical restrictions such as mouse-boxing delay limitations, minimap mouse-movement lag (to name a few components that would result in fairer army movement), certain alterations to micro ms delays, etc. Basically, even if you model the AI's handicaps as close to human level as you can, strategically it WILL beat humans (imo). If today's demonstration, and what we've seen earlier with AlphaGo, doesn't make people respect the potential of AI, then I don't know what will.
I'm glad I'm not one of those believing that humans can make better decisions than machines in games that are purely down to math and attention/movement. AI will eventually be better at pretty much everything that a human can do. It doesn't matter if you bring in your favorite top players either, Korean or not. Maybe in the short term. But betting against the machines long-term (5+ years) for something as 'simple' (maybe not "simple" in our current year, but soon enough) - as a game of SC2, just seems unwise to me.
The real issue will be down to modeling the mechanical limitations of humans and drawing the line at certain averages, trying out different sliders for different properties, and tightening when the AI makes moves that should require additional enforced millisecond "movement cost". But I'm pretty sure that even if you have a really tight model that moves as sluggishly as a human - especially in multi-location skirmishes - the AI's ability to calculate the board state and "what to do next" will absolutely destroy that of a human. Eventually.
Great work by the deepmind team and professional showing by everyone involved in the stream today. It's clear that the handicaps have to be modeled pretty tightly for the AI to be considered fair to us humans, but this was very impressive overall.
On January 25 2019 11:02 imp42 wrote: Does anyone notice the irony of demanding a handicap every time a computer does something better than a human and then arguing that the computer is not better?
You've got it exactly backwards. The people critical of this result are interested in the computer playing by the same set of rules; right now it's playing with extra rules that benefit it, rules which are not available to humans. Allowing it to see without the restrictions of a camera is a special rule that benefits the computer. Allowing it to have direct API access is a special rule that favors the computer.
Starcraft is a game designed and balanced around physical limitations of the input & output devices used to play it. If you're not using those devices, or at least realistically observing their limitations, you're not playing starcraft.
If you're impressed with the results, then hey, great. But it's not fair to say critics are the ones demanding special privileges here.
AI feels like such a terrible, terrible idea for humanity. I will never address a robot by a human name (renamed my "Alexa" to Echo) and I will never play VR games.
On January 25 2019 11:02 imp42 wrote: Does anyone notice the irony of demanding a handicap every time a computer does something better than a human and then arguing that the computer is not better?
You've got it exactly backwards. The people critical of this result are interested in the computer playing by the same set of rules; right now it's playing with extra rules that benefit it, rules which are not available to humans. Allowing it to see without the restrictions of a camera is a special rule that benefits the computer. Allowing it to have direct API access is a special rule that favors the computer.
Starcraft is a game designed and balanced around physical limitations of the input & output devices used to play it. If you're not using those devices, or at least realistically observing their limitations, you're not playing starcraft.
If you're impressed with the results, then hey, great. But it's not fair to say critics are the ones demanding special privileges here.
At the end of the day, Deepmind is not at all interested in Starcraft. What they are interested in is a controlled environment of imperfect information where real-time decisions must be made from a near-infinite set of choices to achieve a desired outcome.
Starcraft just so happens to fulfill those requirements while also providing a handy PR platform, and whether or not AlphaStar is actually conforming to the exact specifics of some fan expectation of how it "should" play is a tertiary consideration at best.
Deepmind is an engineering company, not a Starcraft company. Priorities follow as such.
On January 25 2019 09:46 snakeeyez wrote: Have they said what kind of neural net it uses? How big is the neural net? They said it also uses reinforcement learning.
About what kind of neural networks: Oriol Vinyals (the first presenter from the DeepMind team) said in the video, and also in the blog post, that AlphaStar's core structure is based on an LSTM (Long Short-Term Memory network, a type of RNN, or Recurrent Neural Network). Interestingly, per the blog post, another part is a rarer structure proposed by Google Brain years ago called Pointer Networks, used for finding action sequences efficiently. That is, even though the reaction time (or inference time, as it was called in the video) is about 300 milliseconds on average per agent, instead of outputting one action at a time it can produce a sequence/batch of actions from a single output. Hence it is able to hit APM as high as 1500, i.e. 40 milliseconds per action, which means one AlphaStar output probably consists of around 10 actions, more or less. (Think of it like human reflex reactions during micro: instead of every single action being a conscious decision, you make a decision and the rest is automatic execution from muscle memory, which in the human brain is controlled and learned by the cerebellum.)
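The arithmetic in that paragraph is easy to check. A throwaway sketch, where all numbers are the poster's estimates rather than published figures:

```python
def actions_per_batch(peak_apm, inference_ms):
    """How many actions each output batch must contain to sustain `peak_apm`,
    if the agent produces one batch every `inference_ms` milliseconds."""
    inferences_per_min = 60_000 / inference_ms
    return peak_apm / inferences_per_min

# 1500 APM peaks with ~300 ms per inference imply roughly 7-8 actions per
# batch, consistent with the "around 10, more or less" estimate above.
batch_size = actions_per_batch(peak_apm=1500, inference_ms=300)
```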
As to how big one individual agent is: from my own experience evaluating neural-network inference time (using a top-of-the-line Nvidia 1080 Ti, 2080 Ti, or P100), a network at the scale of millions to tens of millions of parameters (weights) takes around 100 milliseconds per output/inference (without optimization, using TensorFlow). So I imagine a single AlphaStar agent is probably on a similar scale. And as shown in the blog post, the training process optimizes not only the parameters but also the hyperparameters, so I suspect the resulting size of each agent varies, and it makes sense to evolve these agents to be efficient in decision time, keeping the networks compact.
As to how these agents are trained: from what I can gather, it is something like evolutionary algorithms (a big branch of AI, not entirely related to how neural networks are traditionally trained). Here, self-play and maintaining variation within the agent population are the keys to generating novel strategies, more so than each network's structure. It essentially only needs agents that can evolve through many generations, where starting populations that learned from human replays improve upon themselves in the virtual league, with new generations introduced over and over.
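That league setup can be caricatured in a few lines: keep a population, play round-robin matches, and branch new agents off the current leader. A toy sketch only, where a scalar "strength" stands in for a real policy and the mutation step stands in for actual reinforcement learning:

```python
import random

class Agent:
    """Toy stand-in for a league agent: skill is a single scalar."""
    def __init__(self, strength):
        self.strength = strength
        self.wins = 0
        self.games = 0

def play_match(a, b, rng):
    # Winner drawn in proportion to relative strength (a stand-in for a game).
    winner = a if rng.random() < a.strength / (a.strength + b.strength) else b
    a.games += 1
    b.games += 1
    winner.wins += 1
    return winner

def league_generation(population, rng, mutation=0.1):
    """One generation: round-robin matches, then branch a mutated copy
    of the current win-rate leader into the population."""
    for i, a in enumerate(population):
        for b in population[i + 1:]:
            play_match(a, b, rng)
    leader = max(population, key=lambda ag: ag.wins / ag.games)
    child = Agent(leader.strength * (1.0 + rng.uniform(-mutation, mutation)))
    population.append(child)

rng = random.Random(7)
population = [Agent(rng.uniform(0.5, 1.5)) for _ in range(4)]
for _ in range(10):
    league_generation(population, rng)
```

The real league is far richer (old agent snapshots stay in the pool, and matchmaking and per-agent unit incentives are tuned to preserve diversity), but this is the basic shape of population-based self-play.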
On January 25 2019 11:25 Liquid`Snute wrote: But I'm pretty sure that even if you have a really tight model that moves as sluggishly as a human - especially in multi-location skirmishes - the AI's ability to calculate the board state and "what to do next" will absolutely destroy that of a human. Eventually.
This is what I wanted to see, but it is not what was displayed. As Xamo noted before, in the hands of the AI the Stalker is the best unit because of its sheer micro potential. No scouting or reactionary play necessary.
I would like to see strategy and reactionary play, which wasn't displayed here. DeepMind: lower the EAPM to make it comparable to a human's, and let it win using strategy, not pure micromanagement. Don't use the raw view.
The agent is an achievement nevertheless, but it didn't actually solve SC2; it exploited the game's imbalance at the APM it has.
Maybe I'm wrong about this, but after watching the replays I really think that AlphaStar has discovered a novel way to open in PvP that is essentially superior to what humans are doing currently. The Probe overproduction isn't actually a mistake at all. The extra workers make the opponent's harassment pay off far less and can even be used to defend against aggression. Once your natural is up, the faster saturation on 2 bases kicks in and you go into the midgame with a supply lead.
I actually think its early game play is extremely intelligent and demonstrated a clear understanding of the strategy.
On January 25 2019 11:02 imp42 wrote: Does anyone notice the irony of demanding a handicap every time a computer does something better than a human and then arguing that the computer is not better?
You've got it exactly backwards. The people critical of this result are interested in the computer playing by the same set of rules; right now it's playing with extra rules that benefit it, rules which are not available to humans. Allowing it to see without the restrictions of a camera is a special rule that benefits the computer. Allowing it to have direct API access is a special rule that favors the computer.
Starcraft is a game designed and balanced around physical limitations of the input & output devices used to play it. If you're not using those devices, or at least realistically observing their limitations, you're not playing starcraft.
If you're impressed with the results, then hey, great. But it's not fair to say critics are the ones demanding special privileges here.
Maybe take an additional step back and notice that the game itself is already extremely unfairly biased towards humans (see https://www.teamliquid.net/blogs/518977-towards-a-good-sc-bot-p5-hi-2-2, section 1). From this point of view, we already play with a lot of rules that benefit us humans.
Regarding the zoomed out camera view, I agree. That's clearly an unfair advantage for the AI. However, first results seem to show that having to control the camera just slows down the learning a bit and does not represent a significant obstacle.
Direct API access is a different story. It would be relatively easy to engineer a robotic arm that physically uses a keyboard. It's just not valuable in terms of advancing the field of AI.
On January 25 2019 11:25 Liquid`Snute wrote: But I'm pretty sure that even if you have a really tight model that moves as sluggishly as a human - especially in multi-location skirmishes - the AI's ability to calculate the board state and "what to do next" will absolutely destroy that of a human. Eventually.
This is what I wanted to see, but it is not what was displayed. As Xamo noted before, in the hands of the AI the Stalker is the best unit because of its sheer micro potential. No scouting or reactionary play necessary.
I would like to see strategy and reactionary play, which wasn't displayed here. DeepMind: lower the EAPM to make it comparable to a human's, and let it win using strategy, not pure micromanagement. Don't use the raw view.
The agent is an achievement nevertheless, but it didn't actually solve SC2; it exploited the game's imbalance at the APM it has.
Just let the 5 most robust agents play against each other. Assuming their Blink-micro levels are very similar, the exploit cancels out and strategy/reactionary play becomes the dominant factor again.
The only thing you will be missing is the arbitrary benchmark under conditions that consider the physiological limits of humans. Not very relevant in the grand scheme of things.
I for one would enjoy watching two AIs battling each other with crazy strategies at 15'000 apm! *
* likely in mirror matches only - I am aware that the balancing is targeted at human level apm.
On January 25 2019 09:40 TheDougler wrote: You don't know that it was the camera change that actually was the determining factor here. It could be that MaNa had a better idea of what he was up against. It could be that the Warp Prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as the other AIs.
MaNa said the strats the AI came up with threw him off, specifically walking up the ramp in game 1. It's something a pro would never do, so he felt safe right up until AlphaStar forced its way up the ramp. Likewise, MaNa said that in game 5 AlphaStar's strat was so weird he had no idea what was going on. It caught him completely off guard.
If pros had time to study the agent's games, I think it would be much easier to beat them right now. The agents start with pro games to study, so it's only fair that humans get to study their games.
Obviously there's a lot to do on DeepMind's side before they reach the ultimate test. Getting rid of the raw interface and switching to the camera used in the show match is a good first step. Playing non-mirror matchups is another, as well as playing on different maps.
The final change, I would say, is to play only one agent. Every game used a different agent, which is akin to playing different players. TLO didn't know this when he was playing, so he played his matches as if it were the same agent and tried strats to counter what he had just seen in the previous game, which of course didn't work. Playing against a single agent would be quite interesting.
On January 25 2019 11:02 imp42 wrote: Does anyone notice the irony of demanding a handicap every time a computer does something better than a human and then arguing that the computer is not better?
You've got it exactly backwards. The people critical of this result are interested in the computer playing by the same set of rules; right now it's playing with extra rules that benefit it, rules which are not available to humans. Allowing it to see without the restrictions of a camera is a special rule that benefits the computer. Allowing it direct API access is a special rule that favors the computer.
StarCraft is a game designed and balanced around the physical limitations of the input and output devices used to play it. If you're not using those devices, or at least realistically observing their limitations, you're not playing StarCraft.
If you're impressed with the results, then hey, great. But it's not fair to say critics are the ones demanding special privileges here.
Maybe take an additional step back and notice that the game itself is already extremely unfairly biased towards humans (see https://www.teamliquid.net/blogs/518977-towards-a-good-sc-bot-p5-hi-2-2, section 1). From this point of view, we already play with a lot of rules that benefit us humans.
Regarding the zoomed out camera view, I agree. That's clearly an unfair advantage for the AI. However, first results seem to show that having to control the camera just slows down the learning a bit and does not represent a significant obstacle.
Direct API access is a different story. It would be relatively easy to engineer a robotic arm that physically uses a keyboard. It's just not valuable in terms of advancing the field of AI.
A lot of the time, we forget how impressive human vision and instant decision-and-action learning really are. From an engineering point of view, the human vision and object-detection system is so well tuned and calibrated by eons of evolution that it can identify and evaluate objects subconsciously, reconstruct the world around us in our head from moment to moment, and make assumptions that are almost always right the first time, given just a few examples.
All of the above abilities are actually quite far from realization with current engineering capability. Someone might say it is easy to just install a robot arm and a pair of cameras as eyes for an AI system to achieve end-to-end learning (that is, to simulate human constraints), but that is completely unrealistic at the current moment. The reason for using the API as the interface is that pure-vision object detection is very difficult to do efficiently and correctly, and controlling output with a robot arm requires so many parameters per action that learning and tuning them involves effectively infinite possibilities. It is engineering limitations that make the API necessary. (Imagine how badly an AI system would function if 5% of its input were incorrect and half of its output actions were not what it intended; it would be impossible to train no matter what. Image recognition would also slow the reaction time, not to mention the training time. If mapping raw image input to objectives and objects were easy, we would already have reinforcement learning for every game.)
The use of the full map view is also an engineering challenge related to the I/O limitation. Most people don't realize how amazing their "imagination", or reconstruction of the field of view, really is. For raw image input that jumps all over the map, stitching the views together into a whole and finding temporal correlations, while things change and get covered by units and buildings all the time, is very, very difficult. I am amazed at how well the limited-view version actually functioned in the live match; that alone is an achievement. And as the learning curve showed, it took quite a while, almost 2 to 3 days, for the agent to realize that the raw input of a map is actually a map and to produce coherent output, which is astonishing. If you put that on the time scale where 14 days is like 200 years, it actually spent the equivalent of decades of repetition learning this, which is not so much smart as brute force. I'd imagine it would have a hard time with a new map, which might require learning from scratch to reach this realization again (I believe it memorizes landmark patterns in order to navigate and tell which part of the map it is looking at, instead of generalizing to any kind of "virtual view" construction), and I believe that is also why its performance when microing from multiple angles is less impressive.
On January 25 2019 09:40 TheDougler wrote: You don't know that it was the camera change that actually was the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs.
[...] The final I would say is to play only one agent. Every game used a different agent. It's akin to playing different players. TLO didn't know this when he was playing and played his matches as if it was the same agent and thus tried strats to counter what he just saw in the previous game, which of course didn't work. Playing against a single agent would be quite interesting.
A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did).
Compare it to Innovation switching up his macro game with a 3-rax proxy cheese. It's not akin to playing different players, but the same player choosing a different game plan before he starts the game.
The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc.
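The super-agent idea above can be sketched as a mixed strategy over sub-agents. Everything here is invented for illustration (the sub-agent names, the 20/50/30 mix, the seed); the snippet only demonstrates the mechanic of sampling a sub-agent per game in a boX series, not anything DeepMind actually built:

```python
import random

# Hypothetical sketch: a "composite agent" wrapping several trained
# sub-agents. Before each game of a boX series, the super-agent samples
# which sub-agent plays, according to a fixed mixing distribution (which
# would, in the Texas Hold'em analogy, be trained toward a game-theoretic
# cheese/standard ratio).

class CompositeAgent:
    def __init__(self, sub_agents, weights, seed=None):
        assert len(sub_agents) == len(weights)
        total = sum(weights)
        self.sub_agents = sub_agents
        # Normalize the raw weights into a probability distribution.
        self.weights = [w / total for w in weights]
        self.rng = random.Random(seed)

    def pick_for_game(self):
        # Sample one sub-agent according to the mixed strategy.
        return self.rng.choices(self.sub_agents, weights=self.weights, k=1)[0]

# Three made-up sub-agents with a made-up 20/50/30 mix.
composite = CompositeAgent(
    ["proxy_cheese", "mass_stalker", "disruptor_heavy"],
    weights=[2, 5, 3],
    seed=42,
)
series = [composite.pick_for_game() for _ in range(5)]  # a Bo5 line-up
```

From the opponent's point of view, this composite behaves exactly like "one agent" that sometimes cheeses and sometimes plays standard, which is the point of the comparison to Innovation switching game plans.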
On January 25 2019 09:40 TheDougler wrote: [...] Playing against a single agent would be quite interesting.
A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did). [...]
This actually has a technical term in the machine learning community: ensemble learning. But I don't think it is that easy to implement yet. For efficiency's sake, a single agent is actually very different from a group of agents, which would absolutely require quite a bit of parallel processing (it is not as simple as installing more GPUs). And indeed, the agents chosen to represent the group of all agents in the AlphaStar league will be those that encountered many different strategies and still won for the most part overall. It is actually a very difficult problem to introduce "novelty" and still be able to adapt mid-game. The current system simply has no learning capability on the fly (within one game); in machine learning terms, it is a system with offline learning, instead of active/online learning, which is much, much more difficult.
On January 25 2019 11:02 imp42 wrote: Does anyone notice the irony of demanding a handicap every time a computer does something better than a human and then arguing that the computer is not better?
You've got it exactly backwards. The people critical of this result are interested in the computer playing by the same set of rules [...]
[...] Someone might say it is easy to just install a robot arm and a pair of cameras as eyes for an AI system to achieve end-to-end learning (that is, to simulate human constraints), but that is completely unrealistic at the current moment.
Oh, I didn't want to suggest that the bot would have to learn how to use a robotic arm. Rather just substituting a signal "move to x" to a SC API with a signal "press right-mouse button at x-coordinate" to a robotic arm.
The use of the full map view is also an engineering challenge related to the I/O limitation. Most people don't realize how amazing their "imagination", or reconstruction of the field of view, really is. For raw image input that jumps all over the map, stitching the views together into a whole and finding temporal correlations, while things change and get covered by units and buildings all the time, is very, very difficult.
Remember the Atari engine was trained on pure pixel input. The engine was completely ignorant of the fact that some pixels formed an abstract image of a brick while other pixels formed an abstract image of a ball. I agree that finding temporal correlations is very hard when jumping across screens. But to a bot it's just a bunch of pixels; it doesn't interpret them in any way. Also, even Artosis was able to identify an Oracle on the minimap during the cast. The minimap is fully visible at all times and you "just" need to jump in the main screen to zoom in on the action (e.g. placing buildings or microing units).
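For a sense of what "pure pixel input" involves in practice, here is a small sketch that computes the feature-map shapes of the widely known Atari DQN convolutional stack (84x84 grayscale frames in, a 7x7x64 feature map out). It does no learning and uses no ML library; it just shows how far raw pixels get compressed before any decision is made:

```python
# Shape arithmetic for the classic DQN conv stack (Mnih et al.-style):
# conv1: 32 filters, 8x8 kernel, stride 4
# conv2: 64 filters, 4x4 kernel, stride 2
# conv3: 64 filters, 3x3 kernel, stride 1
# Input in the original setup was 84x84 (with 4 stacked frames).

def conv_out(size, kernel, stride):
    # Spatial output size of a valid (no padding) convolution.
    return (size - kernel) // stride + 1

def dqn_feature_shape(h=84, w=84):
    layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]  # (kernel, stride, channels)
    channels = 1
    for k, s, c in layers:
        h, w, channels = conv_out(h, k, s), conv_out(w, k, s), c
    return h, w, channels

shape = dqn_feature_shape()  # (7, 7, 64): 84x84 pixels squeezed to 7x7x64
```

So the network never "sees" a ball or a brick; it sees 7,056 raw intensities per frame that get squeezed into a learned 7x7x64 summary, which is exactly the sense in which it's "just a bunch of pixels".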
I wonder where my posts went. I didn't find them in this thread; how were they merged? Anyway, if DeepMind doesn't fix the crazy APM, any following games against humans will be meaningless.
On January 25 2019 09:40 TheDougler wrote: [...] Playing against a single agent would be quite interesting.
[...] The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc.
That's interesting, thanks for sharing. I'm talking more like what you said at the end. Humans know based on how a Bo5 is going when a good time to cheese might be. Or they have a feeling on what build may work based on what they've seen in prior games. This is what I'm talking about. An agent who has a game plan but also adapts their builds based on current events (even if this is only decided prior to the game starting).
Oh, I didn't want to suggest that the bot would have to learn how to use a robotic arm. Rather just substituting a signal "move to x" to a SC API with a signal "press right-mouse button at x-coordinate" to a robotic arm.
Still not that easy in practice. First of all, the precision of "move to x" is actually a hand-eye coordination learning task: the system has to learn how fast the real mouse moves relative to the field of view, and the right amount of force to apply so the mouse stops at the right location on the screen in a physical system. There is a reason why fully automated robotic factories don't have robotic arms working alongside humans: it is not just efficiency, but also that precise robotic force control is usually fine-tuned to very specific conditions and environments, where a little misalignment will crash the whole procedure. From an engineering perspective, taking all that effort to create such robotics is just not yet cost-effective. (This kind of precise physical system might take years to reach the same level of proficiency as API commands, and most likely millions in expenditures just for the equipment.)
Remember the Atari engine was trained on pure pixel input. The engine was completely ignorant of the fact that some pixels formed an abstract image of a brick while other pixels formed an abstract image of a ball. [...] The minimap is fully visible at all times and you "just" need to jump in the main screen to zoom in on the action (e.g. placing buildings or microing units).
There is a reason why Atari games were used as input: the limited resolution of the games makes the patterns relatively easy to identify. Even so, many types of Atari games are still played very poorly by machine learning. Reconstructing and abstracting pixels into objects is not that easy, even with CNN networks; the delays and inaccurate results they introduce accumulate over time and are difficult to learn around. The current raw view is actually just an enhanced minimap, hence the agent can extract a lot of information without the full-resolution raw screen. And the live-competition version is still not using pure raw image input; as they describe in the blog post, "its perception is restricted to on-screen information, and action locations are restricted to its viewable region." That is, its attention is limited to input within an activation zone, and it can only submit commands within that active zone. As I said, if raw-image object recognition were easy, we would already have solved every game and every real-image problem with machine learning.
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
Let me see the entire map in full detail, let my mind control everything instead of my hands via a clunky keyboard and mouse while I use my eyes to look at a screen, and you say those advantages would be insignificant? You know I have eyelids right? I have to blink, I miss things because of that. I have to scroll the screen around and motion blur messes stuff up. I miss things because of that too.
A human mind would destroy a computer if it had the same inputs and view of the game. It wouldn't be close.
Build a machine that has hands and eyes, make it perform using a monitor, speakers, keyboard and mouse. A human would crush it. Allow the mind to control the game without those clunky devices, and it destroys the AI even more. The mind would be way too fast, way too fast. I can split an army of Blink Stalkers in milliseconds in my mind. It'd be over before it began really.
The human mind is capable of things AI can only dream of. In fact, AI is just one of many achievements of the human mind. StarCraft is a game without perfect information, unlike chess and Go. It will be a long time before AI comes anywhere close given the same inputs.
APM/EPM: Assuming AlphaStar's APM is the same as its EPM (280), what about TLO, MaNa, and other pros? I read somewhere it's around 150 for Masters/GM, so maybe around 200 for top-level pros playing their main race?
Builds/Strats: The commentators were baffled by the heavy emphasis on Stalkers, Phoenixes, and Disruptors. I think it makes perfect sense for an AI that is mechanically better than its opponents to rely on these units that heavily reward micro. This would justify the late gases, to get more minerals for Stalkers. If the AI were pitted against a clone of itself in PvP for 100 games, it would probably adapt by getting a Sentry for more efficiency in Stalker wars, and therefore a faster (or rather, more standard) gas timing.
StarCraft II has historically been balanced for the top 500 or so players; units like High Templar are extremely weak in Bronze league, as players cannot utilize them due to lack of mechanical skill. So it's no surprise that units with great micro potential turn out to be "overpowered" in the AI's hands: they are consistently played inefficiently, even at top human levels. In the current state of the game, I would guess units like Immortals, Carriers, and Tempests (high cost, low micro potential, low speed, low AoE potential, no special effects) would not see play in AIvAI gameplay.
Supersaturation: It might be mathematically advantageous to oversaturate before expanding, as opposed to getting 16 workers, cutting Probes to expand earlier, and then continuing production. Alternatively, could it simply be a buffer against harass, with the harass-heavy style of human players acting as the incentive? Again, would it occur in AIvAI gameplay?
Zerg?: It would be hilarious if AlphaStar 12 pools every game (against humans) simply because of the mechanical advantage.
Edit: I'm confused about the "difference between human and computer input" argument. Isn't having limitations what makes us human? Fine, give them 2 more years to program an AI that has to input commands using mechanical arms, that has to receive information from two local cameras (which shut off for 400ms every 20s), and don't forget to add blur to the cameras during scrolling: I have a feeling that AlphaStar will still crush its human counterparts, granted with a bit less of the crushiness. But would the argument then turn to the fact that the mechanical arm is made with precise machinery and not tissue? Conversely, if we could play StarCraft with just our minds, I'm certain humans would still lose. Think: can you run through even 5 simulations in a second?
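The supersaturation question above can be played with in a toy income model. All of the rates here are assumptions for illustration, not measured StarCraft II mining data; the sketch only makes the tradeoff concrete:

```python
# Toy saturation model (invented numbers): on one base, the first 16
# workers mine at a full assumed rate, and workers 17-24 add a reduced
# rate. The question from the post: is banking extra workers before
# expanding worth it, given that you can instantly transfer a full
# saturation once the second base finishes?

FULL_RATE = 60   # assumed minerals/min for each of the first 16 workers
EXTRA_RATE = 20  # assumed rate for workers 17-24 on the same base

def income_per_min(workers_on_base):
    full = min(workers_on_base, 16)
    extra = max(0, min(workers_on_base, 24) - 16)
    return full * FULL_RATE + extra * EXTRA_RATE

# Oversaturated single base vs. the same 24 workers split after expanding.
one_base = income_per_min(24)                       # 1120 minerals/min
two_bases = income_per_min(16) + income_per_min(8)  # 1440 minerals/min
```

Under these made-up rates, the split is better per minute, so oversaturation only pays if the instant worker transfer (or the harass-buffer effect) outweighs the income lost while waiting on one base, which is exactly the kind of timing question an AIvAI league could settle empirically.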
So this bot could micro perfectly given the APM/EPM restrictions on the bot? That's not really fair though because even humans at a specific APM/EPM will make micro mistakes.
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. [...]
Let me see the entire map in full detail, let my mind control everything instead of my hands [...] A human mind would destroy a computer if it had the same inputs and view of the game. It wouldn't be close. [...] The human mind is capable of things AI can only dream of. [...]
Point to the cat, please:
It's almost as though humans and computers are very, very different.
I just watched the final game. What a joke the AI was. Massing Blink Stalkers. Moving the entire army back to defend against Warp Prism harass. Warping in Stalkers to stop an Immortal drop. Building a cannon to defend against a Warp Prism drop. Not leaving Stalkers in its base to defend against a Warp Prism drop. Blindly charging up ramps with Stalkers versus Immortal/Sentry.
A complete inability to decide how to defend Mana's attack. Without being able to see the entire map, it was terrible.
On January 25 2019 07:06 Kafka777 wrote: This was interesting. Nevertheless, right now the AI is playing a different game. It has much higher effective APM, by far superior micromanagement, and superior vision; it will not be distracted in terms of sight. That is why it won games. The strategies presented by the AI were rather poor and its reactions questionable. I would say that AI still has a long way to go before it beats any pro player on equal terms, as a successful AI must prove it can beat players mainly by adopting superior strategies.
It's hard to say what "equal terms" means for a video game. One could argue that equal terms is when the AI controls a physical keyboard and mouse and gets its screen data from the display, but that adds a lot of problems on top of the in-game decisions. AlphaStar's decisions looked fine because it executed them with insane micro all the time. But the last game showed that AlphaStar really plays best on what it has seen very recently, and the repetitive Immortal drops broke it. In that respect it is very alien, and I believe top players could adapt to its style and play around it.
One thing people also need to remember is that Blizzard's EAPM measurement does not capture only truly effective actions. Rather, Blizzard EAPM = APM - "the most obvious spam clicks".
A lot of the time pro players will make slightly subpar clicks, or they will compensate with higher APM and extra clicks because they know their mouse accuracy isn't 100% (very relevant when splitting against AoE).
So a computer that can average 350 APM with spikes of 1500 is absolutely going to stomp human pro players as long as its general strategic knowledge and decision making are "not terrible".
I would be more interested in seeing it capped at a mechanical level where it comes to mid-high GM players and then see whether it strategically could win games against top pro players.
For that to be the case, it would probably need to be capped at an average of around 150 APM with spikes no higher than 500.
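To make the EAPM point concrete, here is a toy calculation. The spam rule used here (an identical action repeated within 200 ms counts as spam) is my own simplification for illustration, not Blizzard's actual filter:

```python
# Toy APM/EAPM counter over an action log. "Spam" is modeled (assumed
# rule, not Blizzard's real one) as an action identical to the previous
# one and issued within 200 ms of it.

def apm_and_eapm(actions, game_seconds):
    """actions: list of (timestamp_ms, action_name), sorted by time."""
    effective = 0
    prev_time, prev_name = None, None
    for t, name in actions:
        is_spam = (
            prev_name == name and prev_time is not None and t - prev_time < 200
        )
        if not is_spam:
            effective += 1
        prev_time, prev_name = t, name
    minutes = game_seconds / 60
    return len(actions) / minutes, effective / minutes

# A 6-second toy log: three rapid repeated move clicks collapse to one
# effective action under this rule.
log = [(0, "move"), (50, "move"), (100, "move"), (1000, "attack"), (3000, "blink")]
apm, eapm = apm_and_eapm(log, game_seconds=6)  # apm ~ 50, eapm ~ 30
```

The gap between the two numbers is exactly the "spam" being filtered; the point in the post is that a bot's APM is nearly all effective, while a human's raw APM overstates their effective rate.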
I think unless it's a robot sitting at a computer operating a keyboard and mouse, people are always going to be able to take shots at an AI Starcraft player. Unlike chess and go, the interface is a huge part of Starcraft.
Unfortunately, I'm guessing it's not beneficial for DeepMind to try and go that far, since their interest is purely in developing AI and neural networks. I'm sure they'll make a really good AI that stomps any human player, but it will always do it from inside the machine. Then, they'll move on.
Maybe someday a robotics team will take an AlphaStar brain and put it into a robot, who knows haha
That said, I wasn't expecting this iteration of AlphaStar to be anywhere near this good, so it was cool to watch.
On January 25 2019 16:49 Popkiller wrote: I think unless it's a robot sitting at a computer operating a keyboard and mouse, people are always going to be able to take shots at an AI Starcraft player. Unlike chess and go, the interface is a huge part of Starcraft.
Unfortunately, I'm guessing it's not beneficial for DeepMind to try and go that far, since their interest is purely in developing AI and neural networks. I'm sure they'll make a really good AI that stomps any human player, but it will always do it from inside the machine. Then, they'll move on.
Maybe someday a robotics team will take an AlphaStar brain and put it into a robot, who knows haha
That said, I wasn't expecting this iteration of AlphaStar to be anywhere near this good, so it was cool to watch.
Even if there was a Terminator sitting in the chair they'd complain that steel wrists don't hurt. Artificial intelligence is called that for a reason.
Some people just like to whine. They whined when AlphaStar won, they whined when it lost, and they'll whine the next time it plays, no matter what accommodations Deepmind grants. I look forward to hearing their excuses when it wins again.
former deepmind machine learning scientist (and former sc2 player) here. i worked on a separate (unannounced) project but collaborated with a lot of the people who worked on alphastar, so i wanted to share some thoughts from an ML and SC perspective. these opinions are completely my own and don't reflect the positions of deepmind or the company i'm currently at.
ML:
- this is a pretty remarkable achievement in terms of demonstrating imitation learning and RL as a means to learn long term strategies.
- most of the algorithms (inverse planning, imitation learning) used are not new, but have never been applied in such a way and at this scale. a lot of crazy engineering went into this that most people don't think of.
- there are still notable failure cases, which is why there's still more to do to beat top pros in any matchup, but mostly just more training. this could theoretically generalize to any RTS game.
- this is closer to real life than any other existing environment ever created.
SC:
- as a former sc2 player myself, sure i'll admit i wasn't that impressed, for reasons others have stated, including: 1. alphastar had some physical advantages a human doesn't have 2. alphastar made some blatant "errors" from a gameplay perspective (not getting blink, being unable to handle warp prism harass), the latter of which mana exploited for his only win 3. i never really saw alphastar scout, then react to an opponent. it always seemed like it had a strategy at the start, then executed it.
- sc2 is a game where macro effects determine the outcome of a game - major battle losses, usually. this means it's really hard to know if alphastar had a superior strategy or if it out-macroed the humans. and that was demonstrated tenfold by the fact that it had useless workers, some 10 observers, useless units, etc etc, and won those games convincingly.
at the end of the day, this is not to appeal to sc2 fans, just as AlphaGo wasnt meant for go fans (it's just a great side effect). it's to use these games as a testbed for real world situations - planning and decision making in partial information environments with exponentially large action spaces.
On January 25 2019 16:53 pvsnp wrote: I look forward to hearing their excuses when it wins again.
everyone expects AI to win over time. that's what it does, it brute forces games in a way that not only can a human not do, but that humans collectively, as an entity, cannot do. winning is inevitable. but it is an empty exercise right now when the AI is winning by producing mass stalkers into a hard counter and then winning because it can blink micro in a way that a human being cannot. that's simply pointless.
To be honest, I am not going to offer any excuse; I'll just say it would be more interesting if it were a ZvZ, though I suspect DeepMind ZvZs never get past Baneling Nest.
On January 25 2019 16:53 pvsnp wrote: I look forward to hearing their excuses when it wins again.
everyone expects AI to win over time. that's what it does, it brute forces games in a way that not only can a human not do, but that humans collectively, as an entity, cannot do. [...]
Well, this AI didn't brute force it, though. I mean, an AI relying on brute force wouldn't kill its own units with big boom balls of doom against TLO.
On January 25 2019 16:49 Popkiller wrote: I think unless it's a robot sitting at a computer operating a keyboard and mouse, people are always going to be able to take shots at an AI Starcraft player. Unlike chess and go, the interface is a huge part of Starcraft.
Unfortunately, I'm guessing it's not beneficial for DeepMind to try and go that far, since their interest is purely in developing AI and neural networks. I'm sure they'll make a really good AI that stomps any human player, but it will always do it from inside the machine. Then, they'll move on.
Maybe someday a robotics team will take an AlphaStar brain and put it into a robot, who knows haha
That said, I wasn't expecting this iteration of AlphaStar to be anywhere near this good, so it was cool to watch.
Even if there was a Terminator sitting in the chair they'd complain that steel wrists don't hurt. Artificial intelligence is called that for a reason.
Some people just like to whine. They whined when AlphaStar won, they whined when it lost, and they'll whine the next time it plays, no matter what accommodations Deepmind grants. I look forward to hearing their excuses when it wins again.
If they didn't promote this as a fair match between pro and AI, most people wouldn't have a problem with it. This is PR based on, at the very least, a biased presentation of what's going on.
In my opinion the truest level playing field is not a crippled bot vs. a human playing with keyboard and mouse, but a bot vs. a human whose brain is directly hooked up to SC2, so that the human also gains the advantages that the AI had. Imagine if you also had access to perfect clicks and Stalker micro. It's fun to think about; I think the AI, at least the AI we have today, would never beat a human hooked up to SC2, ever.
brute force learning, not playing in an individual game
The term "brute force" has a specific definition in computer science. Generally speaking, it means to compute all of the possibilities and then select the best one. However, this is not feasible in Starcraft for a number of reasons. It wasn't even feasible with the game of go, and Starcraft is a lot more complex. AlphaGo and AlphaStar are both capable of quickly recognizing which possibilities are not worth exploring, but without having to actually evaluate the outcome of such actions. This is why some people describe their mode of operation as more intuition-based. The AI doesn't have to think about what would happen if it made obviously terrible decisions, like move commanding back and forth in front of the enemy's army.
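The contrast above can be sketched in a few lines. This is only a toy illustration, not AlphaStar's actual architecture; the action set and policy weights are invented. Exhaustive enumeration blows up exponentially in the horizon, while a learned policy just samples a handful of promising action sequences:

```python
import itertools
import random

ACTIONS = ["move", "attack", "build", "blink", "noop"]

def brute_force_count(horizon):
    # True brute force: enumerate every action sequence of the given length.
    return sum(1 for _ in itertools.product(ACTIONS, repeat=horizon))

def policy_rollouts(horizon, n_rollouts, policy):
    # A learned policy instead samples a few promising sequences,
    # never evaluating the obviously terrible ones.
    return [[policy(t) for t in range(horizon)] for _ in range(n_rollouts)]

print(brute_force_count(8))  # 5**8 = 390625 sequences for a mere 8 steps

rng = random.Random(0)
# Hypothetical stand-in for a trained network: weighted random action choice.
policy = lambda t: rng.choices(ACTIONS, weights=[4, 2, 2, 1, 1])[0]
print(len(policy_rollouts(20, 16, policy)))  # 16 sampled sequences, not 5**20
```

At a real game's horizon and action space the enumeration branch is astronomically infeasible, which is why "brute force" is the wrong word for what AlphaStar does during a game.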
If they didn't promote this as a fair match between pro and AI, most people wouldn't have a problem with it. This is PR based on, at the very least, a biased presentation of what's going on.
There is no way to spin this as simple PR. Writing an AI that can play RTS at a human level is a world-first achievement in computer science.
EDIT: What OpenAI did with Dota was a PR stunt. As far as I'm concerned, as long as the AI is playing real Starcraft and not some limited version, then it certainly qualifies as a fair evaluation of its strength.
On January 25 2019 16:49 Popkiller wrote: I think unless it's a robot sitting at a computer operating a keyboard and mouse, people are always going to be able to take shots at an AI Starcraft player. Unlike chess and go, the interface is a huge part of Starcraft.
Unfortunately, I'm guessing it's not beneficial for DeepMind to try and go that far, since their interest is purely in developing AI and neural networks. I'm sure they'll make a really good AI that stomps any human player, but it will always do it from inside the machine. Then, they'll move on.
Maybe someday a robotics team will take an AlphaStar brain and put it into a robot, who knows haha
That said, I wasn't expecting this iteration of AlphaStar to be anywhere near this good, so it was cool to watch.
Even if there was a Terminator sitting in the chair they'd complain that steel wrists don't hurt. Artificial intelligence is called that for a reason.
Some people just like to whine. They whined when AlphaStar won, they whined when it lost, and they'll whine the next time it plays, no matter what accommodations Deepmind grants. I look forward to hearing their excuses when it wins again.
If they didn't promote this as a fair match between pro and AI, most people wouldn't have a problem with it. This is PR based on, at the very least, a biased presentation of what's going on.
"Fair" is one of those words that everyone has a different definition for. Machines aren't people. Neural networks don't have a whole lot in common with neurons besides the name. Could you please point to the cat in this picture?
But you are right about one thing. This (meaning the showmatches and such) is a PR stunt. The real success is in the major technical progress they've made. Deepmind doesn't really care about Starcraft except as a vehicle for improving our understanding of AI. The visibility and media attention is just a nice bonus.
To put this in perspective, Google is over a billion in the red because of Deepmind. That money is not being spent to win games of Starcraft. Fairly or otherwise.
Demanding robot arms is a bit silly. At some point it's like asking your mother, who is about to visit the market, to bring back a bouquet of flowers, including roses, tulips and the rare Edelweiss flower growing only at high altitude. It transforms the task from buying something at the market to becoming a mountaineer. Robot arms are a separate field of AI (literally the field of robotics), whereas Deepmind seeks to develop machine learning tools for planning sequential tasks in noisy environments with incomplete information and large action spaces, or whatever the proper terminology is. In that respect the throttling of APM and the added limitation on camera movement are beside the point, only a concession to the domain.
If you want AI to compete in fairer circumstances, such that its APM is hardcapped and noise and drag is introduced to its cursor control, then you have to pressure not Deepmind, but Blizzard. The latter is the only one with an actual investment in StarCraft. For instance, if they intend to use AlphaStar as a tool for balance testing, then it has to mimic human physical constraints to be useful; and if they intend to play more show matches and challenge top players, it has to be able to win convincingly by superior decision making, not act as if it was barely evolved beyond the meanest micro bot.
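A hard APM cap of the sort suggested above could be as simple as a sliding-window rate limiter. This is only a sketch with made-up numbers; it is not how DeepMind's actual throttling was implemented:

```python
from collections import deque

class APMLimiter:
    """Sliding-window cap: allow at most max_actions in any window_s seconds."""

    def __init__(self, max_actions, window_s):
        self.max_actions = max_actions
        self.window_s = window_s
        self.times = deque()  # timestamps of recently accepted actions

    def try_act(self, now_s):
        # Forget actions that have fallen out of the window.
        while self.times and now_s - self.times[0] >= self.window_s:
            self.times.popleft()
        if len(self.times) < self.max_actions:
            self.times.append(now_s)
            return True
        return False  # rejected: this burst would exceed the cap

# Cap equivalent to 300 APM: 25 actions per 5-second window.
lim = APMLimiter(max_actions=25, window_s=5.0)
burst = [lim.try_act(t * 0.01) for t in range(100)]  # 100 actions in 1 second
print(sum(burst))  # only 25 are allowed through
```

The point of a windowed cap rather than a game-long average is that it limits bursts too, which is exactly where superhuman micro hides.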
One disappointment for me was that AlphaStar really excelled at micro, which is a kind of mini-game where you have close to full knowledge, or at least the information you have is not that imperfect.
In terms of long-term planning with a huge amount of imperfect information, AlphaStar was really unimpressive; it was just that this didn't matter much, as long as it could compensate with insane micro and decent tactics.
It actually raises the question of how well DeepMind's approach would scale to longer games where imperfect information has an even bigger impact on the outcome.
I bet Deepmind playing Zerg vs. Terran or Protoss would be a lot weaker, especially ZvT.
It's hard to play safe against every possible opening Terran can do against Zerg. Zerg units also have a lot less micro potential. Sometimes as Zerg you have to take calculated risks and drone a bit harder than should be safe.
I'm a bit disappointed we only saw PvP. PvP is really simple and comes down mostly to execution and micro. I would love to see it play ZvT, where every game is a totally different opening from Terran and it would be getting attacked and harassed non-stop all game.
We saw that it struggled once harassed with a Warp Prism, and that was with Blink Stalkers. I can't help but feel Deepmind wouldn't stand a chance in ZvT currently.
I feel like PvP and ZvZ would be the easiest matchups for Deepmind to dominate, because both those matchups are pretty straight-up and all about micro and massing lots of the same unit.
What strikes me, and people seem to completely forget about it, is that for their 5 best agents, they claimed that none was THE best. This means that each can lose to another one. So of course, as was pointed out, if you study all 5 agents closely and learn their tendencies, then scouting should be enough to adapt and win. This is also very good news for us players, as it means that all strategies are probably viable, at least as a counter to another one.
Pro players would be comparable to 3 to 5 agents, with different BOs planned. For example, one micro-oriented BO with a Blink all-in, another with a double WP harass, and another with a fast Carrier. For now, AI agents still don't really know how to adapt and shift their strategies to match their opponent's. That's why the AI lost to MaNa. He adapted to the AI by harassing, and the AI was clearly confused.
Also, claiming this is just a micro-bot is far from understanding how Deepmind AIs work. They learn from scratch. I would actually have loved to see some kind of progression: see how the game was played after 10 years, 50 years, 100 years. The only progression we saw was the one week of training between TLO's games and MaNa's, which was already an impressive improvement both tactically and strategically (1-base Carrier against TLO, mass Phoenix against MaNa).
The games vs. TLO also showed that the AI learned to defend cannon rushes (probably not in the most efficient way, but still). It probably means that other agents outside the top 5 were cannon rushers.
Anyway, I'm quite impressed by the AI's progress, from the beacon search to this.
On January 25 2019 18:19 LDaVinci wrote: What strikes me, and people seem to completely forget about it, is that for their 5 best agents, they claimed that none was THE best. This means that each can lose to another one. So of course, as was pointed out, if you study all 5 agents closely and learn their tendencies, then scouting should be enough to adapt and win. This is also very good news for us players, as it means that all strategies are probably viable, at least as a counter to another one.
Pro players would be comparable to 3 to 5 agents, with different BOs planned. For example, one micro-oriented BO with a Blink all-in, another with a double WP harass, and another with a fast Carrier. For now, AI agents still don't really know how to adapt and shift their strategies to match their opponent's. That's why the AI lost to MaNa. He adapted to the AI by harassing, and the AI was clearly confused.
Also, claiming this is just a micro-bot is far from understanding how Deepmind AIs work. They learn from scratch. I would actually have loved to see some kind of progression: see how the game was played after 10 years, 50 years, 100 years. The only progression we saw was the one week of training between TLO's games and MaNa's, which was already an impressive improvement both tactically and strategically (1-base Carrier against TLO, mass Phoenix against MaNa).
The games vs. TLO also showed that the AI learned to defend cannon rushes (probably not in the most efficient way, but still). It probably means that other agents outside the top 5 were cannon rushers.
Anyway, I'm quite impressed by the AI's progress, from the beacon search to this.
If a professional player elects for a different strategy for each game of the series, then he is still recognizably the same player, with the same patterns and habits in his control, such that you can predict his reaction to feints, charges, diversions. Furthermore, although a player can deliberate in opening choice, they will tend to fall back to their general style in the mid- and late game.
But suppose you play a game versus AlphaStar, you notice not only some higher level decisions such as a tendency to opt for blink stalker builds, but you also pick up on some habits such as bad scouting, inability to defend against harassment, ineffective wall-offs. And now you're considering your approach for next game. It's obviously perfectly proper for the agent to rotate between strategies in order to avoid being figured out, therefore you realize you can't blindly counter the previous blink stalker build. But you also can't count on its weaknesses in defense, because you're playing a different agent. You don't know if it will have good blink micro, because it's a different agent. etc. In its most extreme form, any type of decision which involves interaction with your opponent will be based on quicksand, because you're playing a completely new opponent every time with no history, no information about it. Whatever this is, it's not standard match conditions. It's more like playing five random ladder games versus barcodes, with the opponents being picked out of time and space, e.g. one will be a 2016 player from Korea, the other a 2018 player from Europe, the other a diamond level player that just does cannon rushes.
In short, it's not predictable, unless all the agents tend to converge to a similar style, which is speculative; but if that's the case, then I think humans could adjust and develop anti-AI strategies, similar to how computers could be defeated in chess for years despite superior calculation ability. Obviously AIs will win in the end, but it's an open question whether that's a week from now or five years from now. We don't even know if Deepmind will stay with the project long enough to thoroughly trash all human opposition. As far as I know, there has been BW AI research for around 10 years without it threatening pro gamers.
On January 25 2019 09:28 Garrl wrote: I do have to wonder if the way it plays is informed by its mechanical ability - if we gave it infinite APM and a client with enough framerate to support it, would it just do totally fundamentally unsound plays because it can get away with it?
Yes, if you trained it against human opponents; not quite, as it is trained today against other agents. It would do plays that might be fundamentally unsound if made by a human, but that would still be positively selected against opponents with the same APM.
On January 25 2019 13:11 ThunderJunk wrote: That was amazing. Can't wait to integrate the Deepmind probe saturation into my pvp
If it is learnt (and not just a side effect of "building probes is good"), we have to assume the continued probe production had a positive effect on the agent's winrate during training. It could mean that other agents on average do a better job of harassing the economy than the human opponents did, so that the continued probe production, which would have been necessary in the face of better harassment, leads to oversaturation against humans.
brute force learning, not playing in an individual game
The term "brute force" has a specific definition in computer science. Generally speaking, it means to compute all of the possibilities and then select the best one. However, this is not feasible in Starcraft for a number of reasons. It wasn't even feasible with the game of go, and Starcraft is a lot more complex. AlphaGo and AlphaStar are both capable of quickly recognizing which possibilities are not worth exploring, but without having to actually evaluate the outcome of such actions. This is why some people describe their mode of operation as more intuition-based. The AI doesn't have to think about what would happen if it made obviously terrible decisions, like move commanding back and forth in front of the enemy's army.
If they didn't promote this as a fair match between pro and AI, most people wouldn't have a problem with it. This is PR based on, at the very least, a biased presentation of what's going on.
There is no way to spin this as simple PR. Writing an AI that can play RTS at a human level is a world-first achievement in computer science.
EDIT: What OpenAI did with Dota was a PR stunt. As far as I'm concerned, as long as the AI is playing real Starcraft and not some limited version, then it certainly qualifies as a fair evaluation of its strength.
The Dota 2 stuff was way more impressive than that; the early-game outplays were far more impressive, and it didn't abuse godlike mechanics to do so. And they're not trying to pretend their AI is beating pros, because it's not doing that yet.
It's really kind of funny when you post a poll asking what people think about slowing SC2's game speed to make a better strategy game with more decision making (but get suppressed by moderators), and then you see AlphaStar crushing every player at human APM.
I don't think there's much you can do with a preexisting game like Starcraft to get a 'fair' game between ai and humans. Maybe intense tweaking of balance and ai ability but even then I don't think it would seem like a real opponent. You would have to design a game from the ground up with ai in mind I think.
I'm very impressed with the decision making of this AI. It's streets ahead of anything else I've seen.
On January 25 2019 17:49 Jasper_Ty wrote: In my opinion the truest level playing field is not a crippled bot vs. a human playing with keyboard and mouse, but a bot vs. a human whose brain is directly hooked up to SC2, so that the human also gains the advantages that the AI had. Imagine if you also had access to perfect clicks and Stalker micro. It's fun to think about; I think the AI, at least the AI we have today, would never beat a human hooked up to SC2, ever.
It would make no big difference, because human attention span is limited too.
On January 25 2019 18:19 LDaVinci wrote: What strikes me, and people seem to completely forget about it, is that for their 5 best agents, they claimed that none was THE best. This means that each can lose to another one. So of course, as was pointed out, if you study all 5 agents closely and learn their tendencies, then scouting should be enough to adapt and win. This is also very good news for us players, as it means that all strategies are probably viable, at least as a counter to another one.
Pro players would be comparable to 3 to 5 agents, with different BOs planned. For example, one micro-oriented BO with a Blink all-in, another with a double WP harass, and another with a fast Carrier. For now, AI agents still don't really know how to adapt and shift their strategies to match their opponent's. That's why the AI lost to MaNa. He adapted to the AI by harassing, and the AI was clearly confused.
Also, claiming this is just a micro-bot is far from understanding how Deepmind AIs work. They learn from scratch. I would actually have loved to see some kind of progression: see how the game was played after 10 years, 50 years, 100 years. The only progression we saw was the one week of training between TLO's games and MaNa's, which was already an impressive improvement both tactically and strategically (1-base Carrier against TLO, mass Phoenix against MaNa).
The games vs. TLO also showed that the AI learned to defend cannon rushes (probably not in the most efficient way, but still). It probably means that other agents outside the top 5 were cannon rushers.
Anyway, I'm quite impressed by the AI's progress, from the beacon search to this.
If a professional player elects for a different strategy for each game of the series, then he is still recognizably the same player, with the same patterns and habits in his control, such that you can predict his reaction to feints, charges, diversions. Furthermore, although a player can deliberate in opening choice, they will tend to fall back to their general style in the mid- and late game.
But suppose you play a game versus AlphaStar, you notice not only some higher level decisions such as a tendency to opt for blink stalker builds, but you also pick up on some habits such as bad scouting, inability to defend against harassment, ineffective wall-offs. And now you're considering your approach for next game. It's obviously perfectly proper for the agent to rotate between strategies in order to avoid being figured out, therefore you realize you can't blindly counter the previous blink stalker build. But you also can't count on its weaknesses in defense, because you're playing a different agent. You don't know if it will have good blink micro, because it's a different agent. etc. In its most extreme form, any type of decision which involves interaction with your opponent will be based on quicksand, because you're playing a completely new opponent every time with no history, no information about it. Whatever this is, it's not standard match conditions. It's more like playing five random ladder games versus barcodes, with the opponents being picked out of time and space, e.g. one will be a 2016 player from Korea, the other a 2018 player from Europe, the other a diamond level player that just does cannon rushes.
In short, it's not predictable, unless all the agents tend to converge to a similar style, which is speculative; but if that's the case, then I think humans could adjust and develop anti-AI strategies, similar to how computers could be defeated in chess for years despite superior calculation ability. Obviously AIs will win in the end, but it's an open question whether that's a week from now or five years from now. We don't even know if Deepmind will stay with the project long enough to thoroughly trash all human opposition. As far as I know, there has been BW AI research for around 10 years without it threatening pro gamers.
Maybe I wasn't clear in my point. I was saying that people could prepare before a match if they had access to several games for each agent and if they played against those same agents. Then scouting allows you to know which one it is, adapt your play accordingly and choose a winning strategy. In the case of MaNa and TLO, they didn't know before the games that they would play against 5 different agents, and even if they had, they wouldn't have known the behavior of each agent. They couldn't prepare.
It's as if Maru and Serral had been playing only training games against each other for a year without sharing replays. It would be impossible for EU Terrans to predict the play of Serral and for KR Zergs to predict the play of Maru. Strategy-wise, they could completely change from one game to another. Of course, tactics-wise (micro, reaction to pressure and so on) they are the same player and would have tendencies that stay the same, and you could probably adapt between the games. In that regard the AI and the 5 agents are different. But still, how do you prepare for the reactions of players if you only have 1-year-old replays?
And even in the case of the 5 AI agents, some patterns could still be recognizable: difficulty splitting its army, over-reaction to seeing enemy units close to its base, and probably others that I missed because I'm no pro. So it could be possible to adapt, to an extent.
On January 25 2019 19:30 Jockmcplop wrote: I don't think there's much you can do with a preexisting game like Starcraft to get a 'fair' game between ai and humans. Maybe intense tweaking of balance and ai ability but even then I don't think it would seem like a real opponent. You would have to design a game from the ground up with ai in mind I think.
I'm very impressed with the decision making of this AI. It's streets ahead of anything else I've seen.
I think it's just the super-micro-potential units that are broken for the AI. I promise that with Zerg the AI would not look nearly that impressive. It would probably have a hard time droning and making units at the right times. And you don't have things like Warp Prism micro or Blink micro that can scale like crazy. What are they gonna do, dance their Zerglings? They can't jump across a wall. What are they gonna do, shoot an Oracle or Void Ray or Banshee with some Roaches?
The AI would probably be forced into a ling-bane-hydra game, and then it would come down to how well it can micro Hydras against AoE.
Zerg is the true race where human intelligence shines. It's about being one step ahead, predicting what the opponent will do, where he will send his Warp Prism, etc. You have to know when to drone and when to make units, sometimes based on instinct alone or on knowing your opponent or current meta trends.
Protoss is: you pick a build, execute it perfectly, and if you do execute it perfectly or very close, you probably win, unless your build order was too coin-flippy.
In fact, PvP is by far the easiest matchup for an AI like Deepmind's to win consistently against humans. edit: Actually, ZvZ might be even easier for AI now that I think about it... Most definitely maybe?
FUN FACT: I wonder if the Deepmind AI could play billions of matches of all matchups to determine which race is potentially the best. If anyone is unbiased and would know, it's an AI. I bet they already know lol.
This kind of AI playing billions of games against same-skill opponents could probably easily conclude whether certain units are overpowered in certain matchups. Very interesting stuff when you think about it...
On January 25 2019 19:30 Jockmcplop wrote: I don't think there's much you can do with a preexisting game like Starcraft to get a 'fair' game between ai and humans. Maybe intense tweaking of balance and ai ability but even then I don't think it would seem like a real opponent. You would have to design a game from the ground up with ai in mind I think.
I'm very impressed with the decision making of this ai. Its streets ahead of anything else I've seen.
I think it's just the super-micro-potential units that are broken for the AI. I promise that with Zerg the AI would not look nearly that impressive. It would probably have a hard time droning and making units at the right times. And you don't have things like Warp Prism micro or Blink micro that can scale like crazy. What are they gonna do, dance their Zerglings? They can't jump across a wall. What are they gonna do, shoot an Oracle or Void Ray or Banshee with some Roaches?
The AI would probably be forced into a ling-bane-hydra game, and then it would come down to how well it can micro Hydras against AoE.
Zerg is the true race where human intelligence shines. It's about being one step ahead, predicting what the opponent will do, where he will send his Warp Prism, etc. You have to know when to drone and when to make units, sometimes based on instinct alone or on knowing your opponent or current meta trends.
Protoss is: you pick a build, execute it perfectly, and if you do execute it perfectly or very close, you probably win, unless your build order was too coin-flippy.
In fact, PvP is by far the easiest matchup for an AI like Deepmind's to win consistently against humans.
FUN FACT: I wonder if the Deepmind AI could play billions of matches of all matchups to determine which race is potentially the best. If anyone is unbiased and would know, it's an AI. I bet they already know lol.
This kind of AI playing billions of games against same-skill opponents could probably easily conclude whether certain units are overpowered in certain matchups. Very interesting stuff when you think about it...
I don't think so. All the ai could tell you is what the balance is like for ai players. Human players use units differently (in a literal sense) so the balance is different. In other words, the ai could tell you which race is best, but the data would be useless to human players.
ITT: People who thought it would be around Plat/Diamond level before the presentation and are now unimpressed because its (self-taught!) mechanics are too good and it hasn't played against Code S players yet.
It seems the AI is cheating. For example, in the 4th game vs. MaNa it is looking at the corner of its own base with 2 Adepts selected. Then a probe comes in (which wasn't in the AI's vision) and places a gateway. We can see that the probe is selected (blue circle around the probe), but at that moment, at the bottom of the screen, the 2 Adepts are still selected.
From that we can conclude that the AI worked with more than one active screen (contrary to what was claimed), and that it could make multiple selections and actions in the same moment, which a human can't do.
I should also point out that the AI's average APM is about 50-100, and that's effective APM. But at crucial moments its APM rises to much higher values (600, 800, 1000, etc.), which is also hardly achievable by a human. So the 277 average APM is a pretty meaningless figure.
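The average-vs-burst distinction the post makes is easy to quantify from a list of action timestamps; here is a sketch (the numbers below are invented, not taken from the actual games):

```python
def apm_stats(action_times_s, window_s=5.0):
    """Return (average APM over the whole game, peak APM in any sliding window)."""
    if not action_times_s:
        return 0.0, 0.0
    duration_s = max(action_times_s) or 1.0
    average = len(action_times_s) / duration_s * 60.0
    peak_count, lo = 0, 0
    for hi in range(len(action_times_s)):        # two-pointer sliding window
        while action_times_s[hi] - action_times_s[lo] > window_s:
            lo += 1
        peak_count = max(peak_count, hi - lo + 1)
    return average, peak_count / window_s * 60.0

# 10 minutes at ~1 action/sec, plus a 2-second 40-action micro burst at t=300.
times = sorted([float(t) for t in range(600)] + [300 + i * 0.05 for i in range(40)])
average, peak = apm_stats(times)
print(round(average), round(peak))  # modest average, huge peak
```

A quoted game-long average hides exactly this kind of spike, which is the poster's point.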
On January 25 2019 20:12 CadoEverto wrote: It seems the AI is cheating. For example, in the 4th game vs. MaNa it is looking at the corner of its own base with 2 Adepts selected. Then a probe comes in (which wasn't in the AI's vision) and places a gateway. We can see that the probe is selected (blue circle around the probe), but at that moment, at the bottom of the screen, the 2 Adepts are still selected.
From that we can conclude that the AI worked with more than one active screen (contrary to what was claimed), and that it could make multiple selections and actions in the same moment, which a human can't do.
Please note that the raw interface agents weren’t using the camera directly. The 10 replays have therefore been post-processed to add heuristic camera movements, such that the target location of each agent action is visible on screen.
Also in your screenshots YOU as the observer have the adepts selected. Click the "X" symbol to the right of the minimap to get to the player selection.
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
On January 25 2019 09:40 TheDougler wrote: You don't know that it was the camera change that was actually the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs.
[...] The final I would say is to play only one agent. Every game used a different agent. It's akin to playing different players. TLO didn't know this when he was playing and played his matches as if it was the same agent and thus tried strats to counter what he just saw in the previous game, which of course didn't work. Playing against a single agent would be quite interesting.
A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did).
Compare it to Innovation switching up his macro game with a 3-rax proxy cheese. It's not akin to playing different players, but the same player choosing a different game plan before he starts the game.
The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc.
This actually has a technical term in the machine learning community: ensemble learning. But I don't think it is that easy to implement yet. And for efficiency's sake, a single agent is actually very different from a group of agents, which would require quite a bit of parallel processing (it's not something that simply installing more GPUs can solve). And indeed, the agents chosen to represent the whole AlphaStar league will be those that encountered many different strategies and still won for the most part. It is actually a very difficult problem to introduce "novelty" and still be able to adapt mid-game. The current system simply has no learning capability on the fly (within one game; in machine learning terms, it is an offline learning system, as opposed to active/online learning, which is much, much harder).
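The super-agent idea can be sketched as a tiny selection layer on top of a fixed pool of sub-agents. Everything here is invented for illustration (the payoff numbers, the exploit rule); a real version would, as the post says, want a game-theoretically sound mixed strategy rather than this naive best-response:

```python
import random

# Hypothetical internal-league winrates: payoff[i][j] = P(agent i beats agent j).
# Deliberately rock-paper-scissors shaped: no single agent is THE best.
payoff = [
    [0.5, 0.7, 0.3],
    [0.3, 0.5, 0.7],
    [0.7, 0.3, 0.5],
]

def pick_agent(opponent_history, rng):
    """Naive super-agent: best-respond to the opponent style seen most often
    so far in the BoX; pick uniformly at random when there is no history."""
    if not opponent_history:
        return rng.randrange(len(payoff))
    favourite = max(set(opponent_history), key=opponent_history.count)
    return max(range(len(payoff)), key=lambda i: payoff[i][favourite])

rng = random.Random(1)
print(pick_agent([], rng))         # game 1: random pick among the pool
print(pick_agent([0, 0, 2], rng))  # opponent leaned on style 0 -> counter with 2
```

A pure best-response like this is itself exploitable, which is why a trained selection policy (converging toward a Nash-style mixture over sub-agents, as in poker bots) is the harder, still-open part.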
I don't know anything about AI, but wouldn't it be sufficient to simply have the bots play Bo5's against each other instead of Bo1's during the training phase? Because then they can still learn from what their opponent has been doing in previous games.
On January 25 2019 18:19 LDaVinci wrote: What strikes me, and people seem to completely forget about it, is that for their 5 best agents, they claimed that none was THE best. This means that each can lose to another one. So of course, as it was pointed out, if you study all 5 agents closely and learn their tendencies, then scouting should be enough to adapt and win. This is also very good news for us players, as it means that all strategies are probably viable, at least as a counter to another one.
Pro players would be comparable to 3 to 5 agents, each with a different build order planned. For example, one micro-oriented BO with a blink all-in, another with a double warp prism harass, and another with a fast carrier. For now, AI agents still don't really know how to adapt and shift their strategies to match the opponent's. That's why the AI lost to MaNa: he adapted to the AI by harassing, and the AI was clearly confused.
Also, claiming this is just a micro-bot is far from understanding how DeepMind AIs work. They learn from scratch. I would actually have loved to see some kind of progression: how the game looked after 10 years, 50 years, 100 years of training. The only progression we saw was the one week of training between TLO's games and MaNa's, which already showed impressive improvement both tactically and strategically (1-base carriers against TLO, mass phoenix against MaNa).
The games vs TLO also showed that the AI learned to defend cannon rushes (probably not in the most efficient way, but still). It probably means that other agents outside the top 5 were cannon rushers.
Anyway, I'm quite impressed by the AI progress, from the beacon search to this
If a professional player elects for a different strategy for each game of the series, then he is still recognizably the same player, with the same patterns and habits in his control, such that you can predict his reaction to feints, charges, diversions. Furthermore, although a player can deliberate in opening choice, they will tend to fall back to their general style in the mid- and late game.
But suppose you play a game versus AlphaStar, you notice not only some higher level decisions such as a tendency to opt for blink stalker builds, but you also pick up on some habits such as bad scouting, inability to defend against harassment, ineffective wall-offs. And now you're considering your approach for next game. It's obviously perfectly proper for the agent to rotate between strategies in order to avoid being figured out, therefore you realize you can't blindly counter the previous blink stalker build. But you also can't count on its weaknesses in defense, because you're playing a different agent. You don't know if it will have good blink micro, because it's a different agent. etc. In its most extreme form, any type of decision which involves interaction with your opponent will be based on quicksand, because you're playing a completely new opponent every time with no history, no information about it. Whatever this is, it's not standard match conditions. It's more like playing five random ladder games versus barcodes, with the opponents being picked out of time and space, e.g. one will be a 2016 player from Korea, the other a 2018 player from Europe, the other a diamond level player that just does cannon rushes.
In short, it's not predictable, unless all the agents tend to converge to a similar style, which is speculative, but if that's the case then I think that humans could adjust and develop anti-AI strategies, similar to how computers could be defeated in chess for years despite superior calculation ability. Obviously AI's will win in the end, but it's an open question whether that's a week from now or five years from now. We don't even know if Deepmind will stay with the project long enough to thoroughly trash all human opposition. afaik there has been BW AI research for around 10 years without threatening pro gamers.
Maybe I wasn't clear in my point. I was saying that people could prepare before a match if they had access to several games from each agent and if they played against the same agents. Then scouting allows you to know which one it is, adapt your play accordingly and choose a winning strategy. In the case of MaNa and TLO, they didn't know before the games that they would play against 5 different agents, and even if they had, they wouldn't have known the behavior of each agent. They couldn't prepare.
It's as if Maru and Serral had played only training games for a year against each other without sharing replays. It would be impossible for EU Terrans to predict the play of Serral and for KR Zergs to predict the play of Maru. Strategy-wise, they could completely change from one game to another. Of course, tactically (micro, reaction to pressure and so on) they are the same player and would have tendencies that stay the same, so you could probably adapt between games. In that regard the AI and the 5 agents are different. But still, how do you prepare for the reactions of a player if you only have year-old replays?
And even in the case of the 5 AI agents, some patterns could still be recognizable: difficulties splitting its army, overreaction to seeing enemy units close to its base, and probably others that I missed because I'm no pro. So it could be possible to adapt, to some extent.
Yeah, but the type of thinking necessary to beat such an AI is just very different from regular competition. You would have to start reasoning like, oh it prefers one-base builds so probably it doesn't understand expanding well, or it hasn't walled off, so it's probably learned to be good at defensive micro. It's all about finding a weakness and exploiting it over and over. Whereas humans are amazingly good at improving on the fly, so you sometimes can't even do the same thing twice in a single game, let alone match.
Personally, I think AlphaStar is browsing this thread right now, trying to learn new ways to play even better. And probably making a note of its biggest critics too (read: an IRL kill list for the future)
On January 25 2019 06:43 Poopi wrote: I think the title is kinda misleading. Raw interface AI goes 10-0, camera interface AI goes 0-1, would be more appropriate
This is a really, really insignificant nit to pick. It's a ~200 MMR difference according to their estimation, which is not worth putting in the TITLE. I'll forgive you though because you're just trying to defend mankind's pride.
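For scale: assuming SC2 MMR behaves like a standard Elo-style rating (an assumption on my part, Blizzard hasn't published the exact model), a ~200-point gap corresponds to roughly a 76% expected win rate for the stronger side, a real but not decisive edge:

```python
def win_probability(mmr_diff):
    """Expected win rate for the higher-rated player under the
    standard Elo logistic curve, assuming SC2 MMR is Elo-like."""
    return 1.0 / (1.0 + 10 ** (-mmr_diff / 400.0))

p200 = win_probability(200)   # roughly 0.76 for a 200 MMR gap
```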
Let me see the entire map in full detail, let my mind control everything instead of my hands via a clunky keyboard and mouse while I use my eyes to look at a screen, and you say those advantages would be insignificant? You know I have eyelids right? I have to blink, I miss things because of that. I have to scroll the screen around and motion blur messes stuff up. I miss things because of that too.
A human mind would destroy a computer if it had the same inputs and view of the game. It wouldn't be close.
Build a machine that has hands and eyes, make it perform using a monitor, speakers, keyboard and mouse. A human would crush it. Allow the mind to control the game without those clunky devices, and it destroys the AI even more. The mind would be way too fast, way too fast. I can split an army of Blink Stalkers in milliseconds in my mind. It'd be over before it began really.
The human mind is capable of things AI can only dream of. In fact, AI is just one of many achievements of the human mind. Starcraft is a game without perfect information, unlike Chess and Go. It will be a long time before AI comes anywhere close given the same inputs.
Point to the cat, please:
It's almost as though humans and computers are very, very different.
On January 25 2019 09:40 TheDougler wrote: You don't know that it was the the camera change that actually was the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs.
Well, you're not wrong. It's just that if you do that and actually want the bot to learn adaptation patterns over multiple games, then you need to feed it the previously played games as input.
If you design that mechanism manually, the simplest approach I can think of is to feed it the history of wins/losses together with the respective agent as additional input:
Game 1: Agent "Mass Blinker" - loss
Game 2: Agent "Proxy Gates" - win
...
and so on (the agent names are chosen for illustration only - to the AI it would just be agent 1, agent 2, ...).
But if you don't want to design anything manually, i.e. let the bot self-learn, then you'd have to feed the complete history of entire games as input, which blows up the input quite a bit.
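A minimal sketch of that manual encoding (the agent pool size, slot layout and helper name are all invented for illustration): each previous game of the boX contributes a one-hot agent id plus a win/loss flag, and slots for unplayed games stay zero.

```python
NUM_AGENTS = 5   # size of the agent pool (illustrative)
MAX_GAMES = 4    # in a Bo5, at most 4 earlier games of history

def encode_history(history):
    """Encode [(agent_id, won), ...] from earlier games of a boX match
    into a fixed-size feature vector: per game slot, a one-hot agent id
    followed by a win flag; slots for unplayed games stay zero."""
    slot = NUM_AGENTS + 1
    feats = [0.0] * (MAX_GAMES * slot)
    for i, (agent_id, won) in enumerate(history[:MAX_GAMES]):
        feats[i * slot + agent_id] = 1.0            # which agent we faced
        feats[i * slot + NUM_AGENTS] = 1.0 if won else 0.0
    return feats

# Game 1: faced agent 2 ("Mass Blinker") and lost; game 2: agent 0, won.
x = encode_history([(2, False), (0, True)])
```

The fixed-size vector is what makes this cheap compared to feeding the complete history of entire games.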
- was the setup "fair" or not?
- did the AI play well or not?
In the chess community we were blown away by the games AlphaZero played about a year ago. We had never seen anything like it. However, it was also very disappointing to realize Deepmind wasn't interested in chess at all. It was nothing but a playing field to demonstrate the capabilities of their neural net. As soon as the experiment concluded successfully, the Deepmind team moved on and left the chess world wondering what could have been, if they had access. Imagine somebody allowing you to peek into a treasure chest full of amazing content, but then closes it and stores it away, not to be opened again.
Realistically we are in the same situation with StarCraft of course. Once Deepmind "beats the game" they will move on without missing a beat. But
"GOOD NEWS EVERYONE"
(Prof. Farnsworth)
Independent of the two questions above (fair setup? good play?) I think it's safe to say StarCraft still holds plenty of challenges for the AI. And even if at some point the peak of strategic depth has been reached in all matchups on arbitrary maps with a configurable average / max APM parameter, there is still the world of custom games to explore.
There is already a matchmaking probability parameter in the reinforcement learning process, as shown in the blog post.
In evolutionary algorithm terms, it is similar to a mating and ranking mechanism put together: if an agent has already played against a certain agent with certain strategies, it shouldn't "mate" with that same agent as often, but really needs to "mate/match" with newly introduced agents/variations, hence the novelty search for new blood. However, I assume the probability of playing against agents it loses to should still be higher, so the policy learning can reward a bit more and certain agents can solidify their successful strategies (try a few more times just to make sure, so to speak).
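A toy version of that matchmaking weighting (a guess at the mechanism, not DeepMind's actual code): weight each potential opponent by our loss rate against them, so hard opponents are matched more often, with a small floor so new agents still get games.

```python
import random

def opponent_weights(loss_rates, exponent=2.0, floor=0.05):
    """Weight opponents by our loss rate against them, raised to an
    exponent to focus on hard matchups, with a small floor so new or
    rarely played agents still get sampled ("new blood")."""
    return [max(lr, floor) ** exponent for lr in loss_rates]

def sample_opponent(loss_rates, rng=random):
    """Pick an opponent index with probability proportional to its weight."""
    w = opponent_weights(loss_rates)
    r = rng.random() * sum(w)
    acc = 0.0
    for i, wi in enumerate(w):
        acc += wi
        if r <= acc:
            return i
    return len(w) - 1

# We lose 70% vs agent 0, 10% vs agent 1; agent 2 is new (assume 50%).
weights = opponent_weights([0.7, 0.1, 0.5])
```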
I had been a proponent of oversaturating minerals for a long time after I saw Hister do it long ago. I gave up on it later, but I think I will go back to it.
There is so much to learn from an AI that doesn't care about norms or customs and does whatever it thinks gives it the best chance to win. As much as I try to avoid the pressure of following whatever everyone else does, it gets me.
Deepmind has a very long way to go to beat Starcraft. A human can regularly take down the cheating SC2 AI with ease; I think Deepmind loses to the cheating AI at this point. Its micro isn't good enough to overcome the massive economic advantage the cheating AI gets.
And why didn't anyone cannon rush the AI? AI is always going to be weak to cheese. It will never play mind games better than a mind.
On January 26 2019 01:05 BronzeKnee wrote: [...] And why didn't anyone cannon rush the AI? AI is always going to be weak to cheese. It will never play mind games better than a mind.
My honest opinion? Because TLO and Mana were pulled aside in a quiet moment and politely asked to please tend towards macro games. We spotted some weakness in the defense, so trying to exploit that via a cannon rush would be a pretty straight-forward move...
Yeah, that's not what I was talking about. Basically, if you want the decisions to depend on history, you need to feed in the history. And if you want pure self-learning you need to feed all of it, i.e. the complete history of the boX match.
On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win.
From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time.
I don't agree, the other matchups could present very different challenges.
Also, oversaturate your minerals, human noobs! That was embarrassing to see; how could we have been so blind? The value is not in the extra minerals but in the ability to absorb probe losses and to expand at full capacity.
I'd bet truly random strategy selection (over the set of good enough strategies) is unbeatable in BoX
Especially because the agents don't have specific tendencies like human players. Unlike a human, AlphaStar can switch to a very different agent executing a very different strategy or even switch races without loss of performance.
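That claim is easy to check in a toy model. Suppose three strategies form a counter triangle, and the opponent always brings the counter to whatever you played last game: a player who repeats himself gets crushed, while uniform random selection cannot be exploited and stays at 50%. The payoff structure here is invented for illustration.

```python
import random

BEATS = {0: 1, 1: 2, 2: 0}                  # strategy i beats BEATS[i]
COUNTER = {v: k for k, v in BEATS.items()}  # COUNTER[x] beats x

def play(mine, theirs, rng):
    """1 if we win this game; mirror matchups are a coin flip."""
    if BEATS[mine] == theirs:
        return 1
    if BEATS[theirs] == mine:
        return 0
    return 1 if rng.random() < 0.5 else 0

def winrate_vs_counter_picker(pick_next, games=100_000, seed=0):
    """Opponent predicts a repeat of our last pick and counters it."""
    rng = random.Random(seed)
    mine = rng.randrange(3)
    wins = 0
    for _ in range(games):
        theirs = COUNTER[mine]        # counter our previous game's pick
        mine = pick_next(mine, rng)   # our selection rule for this game
        wins += play(mine, theirs, rng)
    return wins / games

uniform = winrate_vs_counter_picker(lambda last, rng: rng.randrange(3))
repeat = winrate_vs_counter_picker(lambda last, rng: last)
# uniform hovers around 0.5; repeat loses every single game.
```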
On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
PvP only?
Pick random. AI has no clue what race you are. Cheese hard or feign cheese and collect the easy win.
From a scientific point of view, expanding from 1 to 6 matchups doesn't add much value. It just costs 6 times more in terms of compute power or time.
I don't agree, the other matchups could present very different challenges.
Also, oversaturate your minerals, human noobs! That was embarrassing to see; how could we have been so blind? The value is not in the minerals but in the ability to absorb Probe losses and to expand at full capacity.
It's interesting whether it applies to humans as well. This tactic might be more applicable to agents with superior unit control and spending regime. But then again - maybe it's a new meta.
On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
6 times? My man... that isn't how this works. That isn't how any of this works. It's far greater than 6 times because of the exponential increase in variables.
We used to talk a lot about Starsense: the ability to know something is happening without having any clue that it is. The AI has no Starsense, clearly. MaNa wrecked it with a two-Immortal drop that crippled it.
The AI is going to always favor aggression for the same reason I do: If you are attacking you are controlling the tempo of the game. That makes it more likely it will win because it reduces the number of variables in an interaction. Defending takes on many more forms.
Attack the AI and it will fall apart. Hold its attacks (which is harder because defending takes on more variables and it has near-perfect micro), and it will fall apart.
I think it’s hard to judge from just one game whether it is that vulnerable to opponents attacking. The sad thing from their AMA thread is that they will stop using this current version of their agent, without even playing more games in the camera interface to try to understand its flaws better.
On January 26 2019 01:05 BronzeKnee wrote: I had been a proponent of oversaturating minerals for a long time after I saw Hister do it long ago. I gave up on it later, but I think I will go back to it.
There is so much to learn from an AI that doesn't care about norms or customs and does whatever it thinks gives it the best chance to win. As much as I try to avoid the pressure of following whatever everyone else does, it gets me.
Deepmind has a very long way to go to beat Starcraft. A human can regularly take down the cheating SC2 AI with ease; I think Deepmind loses to the cheating AI at this point. Its micro isn't good enough to overcome the massive economic advantage the cheating AI gets.
And why didn't anyone cannon rush the AI? AI is always going to be weak to cheese. It will never play mind games better than a mind.
Lol, my man, Tencent did that in September already, with a bot 'only' circa Platinum level that would get obliterated by AlphaStar: arxiv.org
Respectfully, it might be a good idea to do some research here...
On January 26 2019 03:23 Poopi wrote: I think it’s hard to judge from just one game whether it is that vulnerable to opponents attacking. The sad thing from their AMA thread is that they will stop using this current version of their agent, without even playing more games in the camera interface to try to understand its flaws better.
Don't worry, they will just look for a more planning-oriented approach and iterate on it. Given the resources they've already invested in this, there's zero chance they write it off as a sunk cost, and as they have followed the AlphaGo playbook to the letter so far (with MaNa as Fan Hui), this is definitely going all the way to AlphaStar vs (Serral, Maru).
On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
I wonder if overbuilding Probes is to anticipate Probe losses, or if it's more something along the lines of balancing the risk-reward of expanding. If you get your expansion denied, you will have better income if you oversaturated, so maybe it's safer against aggressive builds. If you get one-based and, instead of getting the Nexus, you have 8 more Probes in the main, you will be ahead in economy against the other player while having only one base to defend, and when you do get the expansion the reward will be immediate, so it's less of a risk.
On January 25 2019 09:40 TheDougler wrote: You don't know that it was the camera change that actually was the determining factor here. It could be that Mana had a better idea of what he was up against. It could be that the warp prism threw off the AI's gameplan (I think it's this one). It could be that this AI isn't quite as good as other AIs.
[...] The final I would say is to play only one agent. Every game used a different agent. It's akin to playing different players. TLO didn't know this when he was playing and played his matches as if it was the same agent and thus tried strats to counter what he just saw in the previous game, which of course didn't work. Playing against a single agent would be quite interesting.
A misconception IMO. There is no conceptual difference between "one agent" and "multiple agents", because you can simply combine x agents into one composite agent (which is exactly what they did).
Compare it to Innovation switching up his macro game with a 3-rax proxy cheese. It's not akin to playing different players, but the same player choosing a different game plan before he starts the game.
The concept of a composite agent gets interesting when you add a super-agent to it, responsible for picking a sub-agent to play a specific game in a boX match. I would imagine the super-agent would then be trained similar to a Texas Hold'em agent and converge to game-theoretical optima for cheese / standard ratio etc.
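That super-agent idea can be sketched with a toy version of the game-theoretic calculation: given a win-rate matrix between our sub-agents (rows) and the opponent's strategies (columns), fictitious play approximates the optimal mixture over sub-agents. The matrix values and strategy labels are made up for illustration, and this is of course far simpler than anything DeepMind would actually train:

```python
def fictitious_play(payoff, iters=5000):
    """payoff[i][j] = our win prob when we pick sub-agent i and the
    opponent picks strategy j. Returns our empirical mixture over rows."""
    n, m = len(payoff), len(payoff[0])
    row_counts = [1] + [0] * (n - 1)   # start with an arbitrary first pick
    col_counts = [1] + [0] * (m - 1)
    for _ in range(iters):
        # each side best-responds to the opponent's empirical mixture so far
        best_row = max(range(n), key=lambda i: sum(payoff[i][j] * col_counts[j] for j in range(m)))
        best_col = min(range(m), key=lambda j: sum(payoff[i][j] * row_counts[i] for i in range(n)))
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = sum(row_counts)
    return [c / total for c in row_counts]

# Toy matrix: sub-agent 0 ("cheese") beats opponent strategy 1 but loses
# to strategy 0; sub-agent 1 ("standard") is the reverse.
matrix = [[0.3, 0.8],
          [0.7, 0.4]]
mix = fictitious_play(matrix)
```

For this 2x2 matrix the mixture converges near the Nash mixture of roughly 3/8 cheese to 5/8 standard, which is exactly the "game-theoretical optimum for cheese / standard ratio" mentioned above.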
This actually has a technical term in the machine learning community: ensemble learning. But I don't think it is that easy to implement as of yet. For efficiency's sake, a single agent is actually very different from a group of agents, which will absolutely require quite a bit of parallel processing (it is not something that simply installing more GPUs can solve). And indeed, the agents chosen to represent the group of all agents in the AlphaStar league will be those that encountered many different strategies and still won for the most part overall. It is actually a very difficult problem to introduce "novelty" and still be able to adapt mid-game. The current system simply doesn't have any learning capability on the fly (within one game); in machine learning terms, it is a system with offline learning, instead of active/online learning, which is much, much more difficult.
I don't know anything about AI, but wouldn't it be sufficient to simply have the bots play Bo5's against each other instead of Bo1's during the training phase? Because then they can still learn from what their opponent has been doing in previous games.
I got what you said. There used to be a distinction in reinforcement learning between completely recording all past actions and behaviors in archives, versus always training from the very beginning independently, only treating correlation between scenarios with correlation probabilities. The problem with utilizing archives is usually training instability, as well as a scaling problem: the combination of possible samples becomes less representative as we introduce longer and longer training samples to RNNs. (Actually, the input dimension of an LSTM or RNN doesn't have to change; the training samples just become longer, and most likely more hidden layers/nodes are needed for memory.)
In essence, an LSTM should be able to treat a series of games as one super-long input sequence and find patterns across games: the agent's LSTM state would not be re-initialized when a new game starts, but would retain the previous iteration's final hidden output as its new initial hidden input (think of it as the network remembering its final "mental state" after a match and carrying it into the next one). But the first problem is that the supervised learning data doesn't contain enough of these particular player-vs-player, match-to-match game sequences, so the initial policy networks wouldn't be able to learn them via supervised learning. So either the whole structure would have to be redesigned akin to AlphaGo Zero, without any prior supervised learning, or someone would have to painstakingly piece together enough high-level consecutive Bo5/Bo7 training sequences from many high-level players, in all kinds of different strategy combinations, to use as training examples.
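The "carry the hidden state across games" idea can be shown with a bare-bones toy recurrent cell; this is not AlphaStar's actual LSTM, and the weights are toy values, but it demonstrates how state from game 1 can influence behavior in game 2:

```python
import math

class CarryoverRNN:
    """Minimal one-unit recurrent cell whose state can persist between games."""

    def __init__(self, w_in=0.5, w_rec=0.9):
        self.w_in, self.w_rec = w_in, w_rec
        self.h = 0.0  # hidden "mental state"

    def step(self, x):
        self.h = math.tanh(self.w_in * x + self.w_rec * self.h)
        return self.h

    def play_game(self, observations, reset_state=False):
        if reset_state:
            self.h = 0.0  # the standard setup: forget everything between games
        return [self.step(x) for x in observations]

net = CarryoverRNN()
game1 = net.play_game([1.0, 1.0, 1.0])
game2 = net.play_game([0.0, 0.0])   # state carried over: outputs are nonzero
fresh = CarryoverRNN().play_game([0.0, 0.0])  # fresh state: outputs stay zero
```

The only change from the usual setup is skipping the state reset between games; the training-data problem described above is exactly why that simple change is hard to exploit in practice.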
On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
I think it's the latter. I've gone and looked at minerals lost in the final game, where it was sacrificing 2 Oracles each time to one-shot MaNa's workers. The mineral trade looks even in the instant, but killing workers makes AlphaStar better off in the long run, so it takes a very clinical and quantified view of future income. Implicitly embedded in the reinforcement learning algorithm is a very strong Monte-Carlo optimization engine. So once agents are fully trained (in 3 to 6 months IMHO), if they still do whacko stuff like oversaturating minerals, that is probably not a local optimum but the actual thing for us humans to copy.
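The Oracle-for-Probes trade above can be put in rough numbers. Losing two Oracles looks even against the immediate Probe value, but each dead Probe also erases its future mining income; the mining-rate and game-length figures here are rough community estimates, not exact values:

```python
ORACLE_COST = (150, 150)   # minerals, gas per Oracle
PROBE_COST = 50            # minerals per Probe
MINING_RATE = 40           # minerals per worker per minute (rough estimate)

def trade_value(probes_killed, minutes_remaining):
    """Opponent's total loss from Probe kills, in minerals:
    immediate unit value plus foregone future mining income."""
    immediate = probes_killed * PROBE_COST
    future_income = probes_killed * MINING_RATE * minutes_remaining
    return immediate + future_income

# Our side of the trade: two Oracles, ignoring the mineral/gas exchange rate.
our_loss = 2 * (ORACLE_COST[0] + ORACLE_COST[1])
# Their side: 8 Probes killed with 10 minutes of game left.
their_loss = trade_value(probes_killed=8, minutes_remaining=10)
```

Under these assumptions the future-income term dwarfs the immediate mineral count, which is presumably what the agent's value estimate is implicitly picking up on.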
On January 26 2019 01:22 BronzeKnee wrote: People keep talking about the limitations that were placed on the AI... how about the limitations placed on the humans?
Well, to expand from 1 mirror matchup to 3 you most definitely just need ~3 times the resources.
I admit in a non-mirror you have roughly twice as many different unit types. But whether that actually increases the number of variables exponentially depends on the implementation of the NN. For example, if you feed raw pixels to a fully connected NN, then it doesn't matter how many unit types you identify on a higher level of abstraction. You will have the same number of nodes and edges in the neural net.
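The raw-pixels point is easy to check with arithmetic: a dense layer over a flattened image has a parameter count fixed by the resolution and layer width, independent of how many unit types exist in the game. The dimensions below are illustrative, not AlphaStar's real input sizes:

```python
def fc_param_count(height, width, channels, hidden_units):
    """Weights + biases of one fully connected layer over a flattened image."""
    input_dim = height * width * channels
    return input_dim * hidden_units + hidden_units

# Same screen resolution for a mirror (1 race) and a non-mirror (2 races):
params_mirror = fc_param_count(84, 84, 3, 512)
params_nonmirror = fc_param_count(84, 84, 3, 512)
assert params_mirror == params_nonmirror  # unit vocabulary doesn't change the net's size
```

What grows with more matchups is the state space the same-sized network has to learn about, not the network itself.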
First off, the 'what is fair' argument. It doesn't matter. You decide on the challenge you want the AI to solve. No one is going to argue that an AI isn't 'being fair' when it is doing your current job so much better that you get fired. If AI is better at reading medical imaging data than doctors, why should we let doctors keep doing this job and let people die? So what does this whole 'not fair' thing mean, and why is it even important?

To me, the APM cap is already silly. In theory, a human could keep count of the hits every unit they have is taking, because it is in view, so they could calculate the HP of every one of their 20 Marines without having to click on them. But humans cannot. The AI can. That's the point: it doesn't have the limitations humans have. Yes, the AI can see the game state and doesn't need an interface. But we could also display the game state as a matrix of numbers on the screen, rather than the view of the game 'made for humans'. Humans couldn't comprehend that at all, because of human limitations.
BTW, to those who think the AI learned from playing vs MaNa: that almost certainly isn't the case. You train the weights and biases of your net using training games, and you need to be sure you are adjusting your weights properly. If you let the AI play thousands of games vs a human, you don't know whether, when you win more, you actually have better weights or the human is just weaker/tired/messing around, etc.
If you compare this with the BWAPI bots, and considering the power deep-learning neural networks have shown so far, this clearly shows that a neural net can handle playing Starcraft properly. The AI doesn't get stuck doing silly stuff. It clearly knows the objective of the game.
This means that they succeeded in successfully capturing playing RTS as a mathematical equation. They have a game state, which basically is a matrix of a bunch of numbers, and they have a 'move space' which is basically a matrix that decides which action to take. And they were able to state it such that you have a phase space landscape that has curvature enough to move towards a minimum. Training the neural network doesn't lead from one silly useless attempt to control the game, to some other version of that. It is converging towards proper strong play.
I remember having a debate here where people vehemently claimed that AI would never be able to properly play an RTS. That was after AlphaGo beat Lee Sedol, which makes it completely silly in hindsight. I think people who say they are not impressed somehow expected an AI with an eerie ability to read the mind of the other player. Or they expect 'human play', whatever that means. But the bots train to be good at winning.

Yes, they train against each other, so it is not clear initially whether the way the AI plays is riddled with blind spots that humans automatically find and exploit. Often you can see how an AI tries to react to you, and then exploit that. We only saw that once, with the harass (dunno what those units are called, as I am a SC BW player). But the AI beat the human players. That is what matters. We don't know what would have happened if the humans were better.

The AI doesn't care about aesthetics like far-distance mining with workers or building too many Observers. The AI points out what is important in winning the game, and clearly one aspect there is steady macro, good micro and decisive attacks. To me, the most impressive aspect was that the AI knows when it can win a fight and when it cannot. So this tells you something about the game: apparently 'being really good at the micro minigame' is at the core of being good at SC2.

People say the AI didn't scout. Maybe scouting isn't that important? Or maybe the AI did have 'Starsense' and already knew what it needed to know. Furthermore, this isn't a human, so it doesn't think in concrete plans. The AI juggles playing against all unknown plays at the same time. So rather than trying to hard-counter what the opponent is doing, the AI plays a strategy that wins against most things the opponent can do. Maybe this is the proper way to approach SC2? Humans try to play to win each and every game.
But since RTS are games of limited information, maybe, like in poker, you should play the strategy that is best in the long term, not try to win a game you were never going to be able to win because of a random draw/bad luck.
People talk about mindgames and about the AI being like human RTS players, trying to guess their opponent's build. But the AI here is far superior, especially if you can just introduce a new agent that has a completely different style. The AI doesn't know what happened in the game just before, and it doesn't care. It knows the best way to play, and it just keeps doing that. How are you going to counter that as a human? By definition, you will lose every mind game vs the AI, because it doesn't 'think'. Well, maybe the mindgame will be with the human in control of which agent to select.
As for the other matchups: if you can solve this matchup, you can make a completely independent NN for the other ones. Someone said it will be way more difficult to train a NN that can do all matchups. That is true if you need the exact same neural net, with the same nodes and weights, to play all matchups. But you can just make a completely independent agent for each.
As a SC BW player I am kind of confused about how oversaturating minerals is considered 'bad' by SC2 players. I thought in SC BW we already knew you want to oversaturate. So it seems SC2 people forgot, or had to unlearn it because their game is different. This AI clearly has a different approach, so how do we know who is wrong? How do we know whether this AI is wrong to oversaturate, or humans are wrong not to? This is very similar to the discussion people had in Go, where it was not clear whether what the NN was doing was far superior to humans, who just couldn't understand why, or the AI was wrong but so much stronger in other aspects of the game that it didn't matter.
Same thing for the AI doing mass stalkers vs immortals when humans think stalkers get countered by immortals. Maybe going mass stalkers and microing properly is the best strategy in every PvP. It says something about the game, not about how bad the AI is.
Maybe some things to think about before you say "x would beat the strongest Deepmind AI 10 times in a row" or 'just cannon rush the AI and you will win" or 'attack the AI and you will win'.
I also can only shake my head at people saying "The AI was only good at massing units and the micro mini game. It doesn't understand anything about strategy."
I think the only big question now is if the following can happen. Deepmind releases a very strong AI to the public, so everyone can play against it. The whole community plays vs it and tries to find a weakness. Will the community as a whole figure out a weakness so that it can be exploited so that at some point all good players can just use that exploit and win. Maybe. But the point is, you can just generate a new AI that has a different weakness. And humans will lose many games before they can find it. So what is the point? If you could somehow capture the play style of a top human player and freeze it, you can much easier train a NN overfitted to beat that specific human player.
At this point we know that the AI doesn't make human mistakes, is relentless, has all the attention needed to do everything it needs to do, is not obviously exploitable, has decent macro, and will outmicro you. If the only way to beat an AI is to cheese it and hope that build wasn't used by competing agents, then how are humans still superior?
Eudorus, you make a lot of great points, but one thing I don't agree with is that giving the computer limitations (i.e. APM) is silly. It just depends on the question you are trying to answer. Do we want to know if computers can have higher APM than humans? No, we already know that humans could never keep up with a computer in terms of APM. That would be like trying to find out if humans can beat a calculator at math problem-solving speed; we already know the answer to that, we don't need an experiment. Here we are trying to answer: can AI be programmed to be as smart as or smarter than a human at a game of StarCraft, to beat a human with strategy rather than brute APM?
I think the unexpected answer we are getting some early insight into is that maybe StarCraft II isn't as much of a strategy game as we thought it was. To your point, maybe it is mostly about just executing the highest-chance-of-success strategy with insanely perfect execution (macro and micro). That is what the most successful SC2 play looks like. But I don't think the question is fully answered yet.
I agree that for real-life applications like healthcare we would never want to impose limitations, because we're not really trying to answer a question there; the most important thing is saving lives. This kind of research will actually help lead to more real-world applications like healthcare tools (the team mentioned weather forecasting, which I thought was a very bland example).
But Starcraft is a game. You either win or lose. There is no such thing as 'being smart'.
I do agree with the reason why they put in that limitation. It will be more impressive to see an AI play carefully and strategically rather than an AI that just sits there, then suddenly attacks with some silly unit composition, completely outmicros the human and just somehow wins. It's also interesting to see if the AI develops nuanced patterns and behaviors in this realm. But it is probably not the easiest way to converge on a NN that wins.
But if you want an AI to be good at Starcraft, there is no reason to put in a limitation, unless it costs you too much processing power or something and you want to solve the same problem with fewer resources.
When two humans play, it can really pay off to figure out what your opponent's plan is. Humans usually have a clear and concrete plan; they don't juggle 3 candidate plans in their mind and let small details sway which one they commit to. Humans have tells. This can explain why humans play differently: if you can tell your opponent thinks you are ahead, it means something. Same if you can tell your opponent probably wants a longer game against you. But those AI agents playing each other don't have concrete plans, or tells, or try to figure them out and hard-counter them like humans would.
What is fair can be relevant depending on what people are looking for from the benchmark. If it is just about playing Starcraft, then I'd say being fair doesn't really matter and computers simply have advantages in certain areas over humans. However, in my view it isn't just about playing Starcraft but finding methods that can learn how to act in various kinds of complex domains. In that sense the question isn't just whether the machine can beat the best human, but rather can the methods learn to deal with all sorts of things that arise in the game. By handicapping its mechanical abilities human benchmarks can better be used to evaluate how well the methods have learned the other aspects.
On January 26 2019 05:40 Eudorus wrote: But Starcraft is a game. You either win or lose. There is no such thing as 'being smart'.
I don't know who would doubt that with infinite APM an AI would win just by overwhelmingly superior mechanics, and you definitely don't need a neural network to do that. What AlphaStar is expected to do in the future, if DeepMind keeps working on it, is display smart decisions; that's what ordinary AIs can't do by themselves.
For people who doubt AI can beat strong players in StarCraft, just read what people posted here a year ago, for example in the Boxer on AlphaGo thread. Secondly, what kind of AI can beat top players without an APM cap and also without a neural network?
So you say you think you need an APM cap to get an AI with 'smart decisions' rather than one overwhelming the other player with superior mechanics? What does that mean? I know what you are trying to say, but think about how you would define 'being smart' in the context of Starcraft. How is it 'smart' to play more human-like, with a finely tuned unit composition and calm, careful macro, when the nature of the game is such that you should just mass Stalkers, control them individually every time the game state updates, mass up, micro them around continuously, and go in for the kill when your opponent makes a mistake? The best definition of 'being smart' is whatever leads to a higher winrate.
By the way, there are both simplifying aspects to having an APM cap and real-life applications for not having one. It would be hard to train the same network to control all your 30 Stalkers on every game-state update while also deciding when to switch tech or expand; in that case you need a small, fast micro network and a larger, slower network. There are real-world problems that require such NNs, as well as real-world problems that require high speed. Furthermore, who knows what strategy the AI would come up with to properly micro infinite-APM Blink Stalker vs Blink Stalker battles against itself. That would need many more nodes than just moving the hurt one to the back while attacking the lowest-HP enemy Stalker currently in range.
No, the APM cap was decided on to make the AI more 'human-like' and relatable. It is basically PR, as well as seeing if you can train an AI to do tasks in a human-like manner. There are many real-world problems that you could do effectively in eerie, AI-like manners, which would be unacceptable for social reasons, or in a human-like manner. If you can mimic a human while outperforming them, then that is often better than doing the same thing slightly better but being completely alien.
On January 26 2019 06:38 Eudorus wrote: For people who would doubt AI can beat strong players in AI, just read what people posted here a year ago, for example in the Boxer on AlphaGo thread. Secondly, so what kind of AI can beat a top players without an APM cap and also without a neural network?
I believe they picked StarCraft as a game to test the AI on because it is a game where players have to decide how to divide attention and make decisions with imperfect information. If AlphaStar is winning games by brute force, because it has superhuman micro and the ability to focus across multiple screens at the same time, then it isn't achieving that goal. The final game, where it could only focus on one screen and failed to build a Phoenix to counter the Warp Prism, didn't counter MaNa's army composition and made bad army-movement decisions, makes it look like it still has a long way to go on the decision-making side of things. I don't think the purpose of AlphaStar is to have a bot that just bashes players, but to have a bot that beats players at the imperfect-information game, and the bot may never have to learn to do that without some limitations put on it.
StarCraft has this thing called “balance”, where unit strength is calibrated around human abilities to provide strategically rich gameplay. If you break the balance of the game then you can’t test the abilities of your agent, as it will just find some silly unbeatable micro strat. APM caps and such are not just PR...
I wonder if AlphaStar would play differently if it didn't learn anything from replays but had to figure everything out by itself. It would be more interesting strategy-wise to see what it does when it doesn't "mimic humans".
First of all, it was very impressive. Better than any other bots we have seen so far.
Secondly, MaNa only just barely lost the first game (if there had been a second Sentry, the AI would have been as good as dead with its mindless blind rush up the ramp) and another one (when he was ahead with an army of Immortals vs. Phoenix-Stalker; MaNa simply threw a game he had in the bag by that point, because he hadn't expected such a decision from the AI and didn't know its style. If he had just kept his stuff together and killed the third, he would have won). I wasn't impressed by the AI's micro -- it was not perfect. I was more impressed that its early aggression was so "committed", because rushing is exactly how you throw games. So even that part wasn't hopeless for humans.
Then again: so much time working on that program, so much money spent -- and it's still crushed by simple worker harassment, like a typical built-in AI, a trick I learned in, like, 5 games vs. the Insane AI. Kind of stupid for such an advanced program... I'm pretty sure I could beat that program, although I'm no pro. It's still very vulnerable.
Makes me think the battle for mankind hasn't been lost yet...
Finally, if some guy had played one map his whole life in a mirror match and had good micro and mechanics, you would expect that he, having figured out the map and all the most plausible scenarios, would win a lot. That's pretty much what we saw in that demonstration. Now, that guy would be totally helpless were he to play good players on many maps in a random order.
So, there is still hope: only when we see an AI crush random pro gamers in a typical tournament setting, 100 out of 100 matches (or something similarly persuasive), will we have the right to say the game has been figured out by AI, as happened with Go and Chess. Nowadays no world champion in those games can even hope to take a game from the best AIs.
Actually, I'm a bit worried here: now that a human champion has lost to a program, there is a chance that what happened to Go and Chess happens here too...
I'd love to watch a set of exhibition matches against actual top-tier players. There is such a skill gap between the best five pros and the lesser-skilled pros. It would be so nice to see some Terran matches and mixed-race games. Serral, Maru, TY, Classic, Stats, etc. I would watch those games on the edge of my chair.
On January 26 2019 07:53 CobaltBlu wrote: I believe that they picked StarCraft as a game to test the AI on because it is a game where players have to decide how to divide attention and make decisions with imperfect information.
Why would 'divided attention' be a fundamental AI problem? AIs can be programmed to have just as much 'attention' as you have parallel threads available. I believe you are mistaken. Imperfect information? Definitely. Hard to capture in terms of moves? Definitely. Attention? No way!
If AlphaStar is winning games with brute force because it has superhuman micro and the ability to focus across multiple screens at the same time, then it isn't achieving that goal.
What do you mean 'brute force'? If you write an AI to detect brain lesions or tumors on a CT or MRI, why wouldn't you allow the AI access to many many previous images? If you want to write an AI to be an air controller and direct planes to their runways, why wouldn't you want the AI to control multiple airplanes at the same time?
The final game, where it could only focus on one screen, failed to build a Phoenix to counter the Warp Prism, didn't counter MaNa's army composition, and made bad army-movement decisions, makes it look like it still has a long way to go on the decision-making side of things. I don't think the purpose of AlphaStar is to have a bot that just bashes players, but instead a bot that beats players at the imperfect-information game, and it may never have to learn to do that without some limitations put on it.
The AI had imperfect information in all games. Only in the last game did they have a new AI that had to use a camera window to access game info rather than just reading the game state from the API. We don't know why the last AI lost to MaNa. Was it weaker because it had less training? Was it weaker because of this new restriction? Was MaNa playing better, with a bit more luck?
Yes, they thought they had strong bots, so they put a new limitation on one. They obviously bought into the 'fairness' argument, be it for marketing purposes or because having a restriction makes the problem more challenging and relevant to some real-world applications. But to say that their core purpose was to have a neural network with limited attention perform better is false. And even more false is to come in with the notion that an AI without an APM cap is 'less smart' because it wins through micro rather than through other means.
On January 26 2019 08:00 shabby wrote: I wonder if AlphaStar would play differently if it didn't learn anything from replays but had to figure everything out by itself. It would be more interesting strategy-wise to see what it does when it doesn't "mimic humans".
Considering the move space, a neural network with random initial weights will produce purely random actions. It will be as if your cat is walking across your keyboard, or as if a monkey is clicking your mouse. At that point you depend on the AI accidentally building a Pylon, building a Gateway, and having a Zealot move towards the enemy base. So you would have hundreds of thousands of games lasting a very long time with literally nothing happening.

In other words, the phase space is tremendously huge, and only a very tiny segment of it contains agents that actually attempt to play the game. If you initialize a random neural net, it will be out there in a flat desert of completely random clicking. Move in any direction of the phase space and you aren't suddenly winning more games: all your agents just randomly click while the clock counts down to a draw. So all your bots draw against all the others, because the phase space is so huge that no neural network gets initialized with the proper weights.

That's why you first imitate human play. We already know what a game of StarCraft should look like, so there is no sense in exploring the vast flat desert of useless neural nets. Maybe you are copying things from humans that are bad and you don't unlearn them -- hard to know. But it makes no sense to try to train neural nets when only 0.0000001% of them actually send their Probes to mine minerals.
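To put rough numbers on that "flat desert" argument, here is a toy sketch with made-up sizes (a 100-action space and a 6-step "opening" instead of StarCraft's real dimensions), comparing a randomly initialized policy with a tabular policy cloned from demonstrations:

```python
N_ACTIONS = 100     # toy action space (the real SC2 action space is vastly larger)
OPENING_LEN = 6     # actions needed for a coherent "opening"

# A freshly initialized (uniform-random) policy: the chance that it plays
# one specific 6-action opening by pure luck.
p_random = (1.0 / N_ACTIONS) ** OPENING_LEN
print(f"random policy hits the opening with p = {p_random:.1e}")

# Behavioural cloning, tabular version: count "human" demonstration actions
# per step and normalize, which is the maximum-likelihood policy.
demos = [list(range(OPENING_LEN)) for _ in range(1000)]  # fake human replays
counts = [[0] * N_ACTIONS for _ in range(OPENING_LEN)]
for demo in demos:
    for step, action in enumerate(demo):
        counts[step][action] += 1

p_cloned = 1.0
for step in range(OPENING_LEN):
    total = sum(counts[step])
    p_cloned *= counts[step][step] / total  # prob. of the demonstrated action

print(f"cloned policy hits the opening with p = {p_cloned:.3f}")
```

Even in this tiny toy the random policy's odds are one in a trillion, while cloning trivially recovers the opening; scale the numbers up to SC2's real action space and the desert argument becomes overwhelming.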
I'd like to see how well AlphaStar does in BW, his main edge in the matches came from being aggressive and outmicroing his opponents w/ blink micro (obviously while having perfect macro/base management). BW has a much greater defenders advantage, it wouldn't be able to capitalize on the same aspects it did in SC2 (which to me was mainly exploiting micro and positioning).
BW games are slower paced and there's way less poking, AlphaStar wouldn't be able to exploit micro maneuvers which obviously benefits human players.
Edit: Well, AlphaStar could exploit microing multiple muta groups perfectly but that would only be viable in ZvT and ZvZ. Even then a progamer T could adapt and just open 1-1-1 into valks or fast vessel, AlphaStar would probably be godly in ZvZ tho.
People are so offended by this showmatch, it's making me think you believe that whatever happened in it has meaningful value beyond what the team at DeepMind learned that can be applied to other fields.
- Do you honestly believe that any human can control 3 groups of Stalkers on 3 different screens, keeping them on the razor's edge of dying vs. Immortals while keeping a near-perfect surround on 3 different sides?
- Do you think it was fair to throw in a human that has never played vs. an AI with that level of micro and expect him not to constantly misjudge the balance of power at any given screen, ending up with the human taking fights he can't get any advantage out of?
Wondering why PvP was chosen as the AI's focus?
- Easily distinguishable units for the uninitiated to StarCraft, and only one set of them since it's a mirror.
- The unit with the most micro potential in the game that doesn't look hilariously unfair (see individual Zergling / Marine / Baneling micro).
- Not a whole lot of emphasis on building placement for defending, which I could see the AI having major issues with.
- A matchup known for its short games and high aggression.
- Stalkers allow the computer to never really commit to anything it doesn't like while still being able to kill the human player at any point, and to keep a good air defense without having to do much scouting.
It's just a demonstration to improve their company's reputation and stock, attracting talent while doing some R&D, and there is nothing wrong with that. I'm gonna keep watching people play the game, and if every now and again they make a showmatch vs. an AI, cool; if they make AIs fight each other and we get to learn something that can or cannot be utilized by humans... cool. If it's still hurting your ego, keep this in mind: StarCraft was designed by people, balanced for people, and is played by people with their inherent limitations; the moment you change that, no one gets to claim that StarCraft has been solved.
On January 26 2019 08:50 Doko wrote: People are so offended by this showmatch, it's making me think you believe that whatever happened in it has meaningful value beyond what the team at DeepMind learned that can be applied to other fields.
- Do you honestly believe that any human can control 3 groups of Stalkers on 3 different screens, keeping them on the razor's edge of dying vs. Immortals while keeping a near-perfect surround on 3 different sides?
No, that is why we have AI
- Do you think it was fair to throw in a human that has never played vs. an AI with that level of micro and expect him not to constantly misjudge the balance of power at any given screen, ending up with the human taking fights he can't get any advantage out of?
Neither had the AI ever played vs. a human before -- well, it was trained purely vs. other AIs; it didn't know it was playing something else. So what exactly are you trying to say? I also don't understand what this has to do with 'fair'. The only argument you can make is that the humans weren't playing as well as they normally would. And if a human misjudges a fight that the AI judges correctly, what does that have to do with the human being new to AI? You either misjudge a situation or you don't. You mean that the human didn't expect to be outmicroed, because subconsciously he had never experienced that before and was trained to play differently. Correct. But that is what happens when you play vs. a superior player, AI or no AI.
Wondering why PvP was chosen as the AI's focus?
- Easily distinguishable units for the uninitiated to StarCraft, and only one set of them since it's a mirror.
- The unit with the most micro potential in the game that doesn't look hilariously unfair (see individual Zergling / Marine / Baneling micro).
- Not a whole lot of emphasis on building placement for defending, which I could see the AI having major issues with.
- A matchup known for its short games and high aggression.
- Stalkers allow the computer to never really commit to anything it doesn't like while still being able to kill the human player at any point, and to keep a good air defense without having to do much scouting.
Obviously, they thought that PvP would be easiest. You start testing your methods on the easiest matchup, then you go to the more difficult ones. If you don't do that, you are doing it wrong. Protoss has always been the easiest race to play, in SC:BW and apparently also in SC2.
If you can go from Go to Starcraft, then going from PvP to ZvT is trivial.
It's just a demonstration to improve their company's reputation and stock, attracting talent while doing some R&D, and there is nothing wrong with that. I'm gonna keep watching people play the game, and if every now and again they make a showmatch vs. an AI, cool; if they make AIs fight each other and we get to learn something that can or cannot be utilized by humans... cool.
Of course. But let's not forget that Deepmind did also quite well in the protein folding competition just a while ago.
If it's still hurting your ego, keep this in mind: StarCraft was designed by people, balanced for people, and is played by people with their inherent limitations; the moment you change that, no one gets to claim that StarCraft has been solved.
On January 26 2019 05:40 Eudorus wrote: But Starcraft is a game. You either win or lose. There is no such thing as 'being smart'.
First of all, there is, of course, such a thing as "being smart". IQ tests, with all their limits, measure that thing quite well. And that thing has a lot of "predictive power": we can actually formulate provable hypotheses based on IQ level, then test them and see how things work out for a particular individual. It's a proper experiment, and psychometrics and statistics have accumulated a lot of data regarding that.
But you are absolutely correct: you either win or lose. No chess grandmaster, world chess champion, etc. will ever beat the Fritz or Stockfish chess programs. No Go champion will ever beat DeepMind's program.
But that's not the case with StarCraft: the game is still winnable vs. the AI. That's because StarCraft is a complex, cognitively demanding game, and AIs are still not capable of solving cognitive tasks of that level cogently. An AI that can be beaten by a harass trick is worth nothing: it's stupid, it doesn't win games.
When they make an AI that beats any pro in any matchup in 100 out of 100 games, as is the case in Chess now and almost the case in Go, we will have to admit that AI has surpassed humans in that cognitively very demanding domain as well. But that's not the case yet...
On January 26 2019 08:43 TT1 wrote: I'd like to see how well AlphaStar does in BW, his main edge in the matches came from being aggressive and outmicroing his opponents w/ blink micro (obviously while having perfect macro/base management). BW has a much greater defenders advantage, it wouldn't be able to capitalize on the same aspects it did in SC2 (which to me was mainly exploiting micro and positioning).
BW games are slower paced and there's way less poking, AlphaStar wouldn't be able to exploit micro maneuvers which obviously benefits human players.
Edit: Well, AlphaStar could exploit microing multiple muta groups perfectly but that would only be viable in ZvT and ZvZ. Even then a progamer T could adapt and just open 1-1-1 into valks or fast vessel, AlphaStar would probably be godly in ZvZ tho.
AlphaStar in TvT could be extremely interesting imo.
On January 26 2019 08:43 TT1 wrote: I'd like to see how well AlphaStar does in BW, his main edge in the matches came from being aggressive and outmicroing his opponents w/ blink micro (obviously while having perfect macro/base management). BW has a much greater defenders advantage, it wouldn't be able to capitalize on the same aspects it did in SC2 (which to me was mainly exploiting micro and positioning).
BW games are slower paced and there's way less poking, AlphaStar wouldn't be able to exploit micro maneuvers which obviously benefits human players.
Edit: Well, AlphaStar could exploit microing multiple muta groups perfectly but that would only be viable in ZvT and ZvZ. Even then a progamer T could adapt and just open 1-1-1 into valks or fast vessel, AlphaStar would probably be godly in ZvZ tho.
AlphaStar in TvT could be extremely interesting imo.
BW TvT is very chess like tho, the longer the game goes the more it slows down (in terms of army movement). It comes down to positioning (tank lines) but there's a ton of decision making involved as well.
I'd like to see AlphaStar play other SC2 MUs first tho. The PvP wins felt like it was out-computing MaNa/TLO more than anything. Blink micro is by far the most abusable micro aspect in SC2.
I'd personally love to watch AlphaStar 2h mass muta vs FlaSh. But again, even if it did win, it would practically be like it's out-computing a human being by having no mechanical limits... which is a given. Non-ZvZ/ZvT MUs would be a different story tho.
Some observations of the games from me: AlphaStar is pretty good at figuring out army / build-order stuff up until it's maxed out, and it's exceptionally good at the multi-screen / multi-angle micro management, but I think people are overestimating how good it is at actual intelligence-related stuff.
There were many moments where AlphaStar was just running up choke points over and over and losing units even though it had just run into a strong army; it doesn't feel like it makes good combat decisions that aren't immediately micro-related. At some point it also did nonsensical things like having six Observers in the army or getting Blink late in a predominantly Stalker army.
I'm also not sure it really 'gets' economic management at a high level; it expands very conservatively and mostly to build more army. It would be interesting to see whether it can make reasonable decisions late in a maxed-out game, or whether it can judge endgames where resources are going to run out and adapt.
I would have really liked to see a ZvT to gauge this, because long-term macro management and sudden tech switches are definitely a much more strategic problem than the very micro-focused PvP games.
Honestly, people aren't talking enough about how this AI was trained. This AI didn't create strategies on its own. The base agents were trained on replays, so effectively the first generation was just copying the replays exactly, and so on. Those base agents could execute GM-level builds with minimal training.
I'm really impressed by the optimizations it has found in the games. The Disruptor game where it kills its own units seems to be an optimization where it realized it's probably better to sacrifice your own units as long as you kill the opponent's units -- sorta like Banelings. But it does it very inefficiently. Some of the shots were plain mistakes, and we saw that even in its build orders: in one game it built a Fleet Beacon only to cancel it one second later.
As a proof of concept, the fact that it can micromanage its units according to the situation and context, and that it can make decisions on its own, is proof of its effectiveness. This is basically an alpha AI (no pun/context misinterpretation intended) that still has a lot of work to do.
A real test of its effectiveness would be whether it can play turn-based SC2 (not really turn-based), where every 0.25 or 0.5 seconds the game is effectively paused and the player can decide what actions to take with every building and every unit. A single game would take a week of someone sitting there doing those actions. This is just to control for the perfect unit control the AI has, so that clicking and focus-firing units doesn't become a problem for the human player. There really aren't large strategic gains one could make even with 10x longer to think about the game, since micromanaging the units is the most difficult part.
One optimization that it does, and it's really obvious, is aggressive focus-firing even in large battles. If you actually do the calculations, a perfectly efficient army vs. the least efficient army (focusing damage on single units vs. spreading the damage as much as possible) means the AI can fight armies 40% bigger and win outright, and trade evenly against armies around 60% larger. In "even" battles it will humiliate human players like a GM would humiliate a diamond-level player.
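That claim is easy to sanity-check with a toy Lanchester-style duel. The unit stats (100 HP, 10 damage per volley) are made up, and the break-even point shifts with the HP-to-damage ratio, so treat this as illustrating the direction of the effect rather than confirming the exact 40%/60% figures:

```python
import math

def battle(n_focus, n_spread, hp=100.0, dmg=10.0):
    """Simultaneous volleys: one army focus-fires (no overkill), the
    other spreads its damage perfectly evenly over every enemy unit."""
    focus_alive, spread_alive = n_focus, n_spread
    spread_pool = n_spread * hp  # focused damage removes whole units from a pool
    dmg_per_focus_unit = 0.0     # spread damage accrues on every unit equally
    while focus_alive > 0 and spread_alive > 0:
        spread_pool -= focus_alive * dmg
        dmg_per_focus_unit += spread_alive * dmg / focus_alive
        spread_alive = max(0, math.ceil(spread_pool / hp))
        if dmg_per_focus_unit >= hp:  # perfectly spread: all units die at once
            focus_alive = 0
    if focus_alive and not spread_alive:
        return "focus"
    return "spread" if spread_alive else "draw"

for n in range(10, 16):
    print(f"10 focus-firing vs {n} spreading -> {battle(10, n)} wins")
```

With these stats, ten focus-firing units still beat thirteen perfectly spreading ones, which is the same order of advantage described above.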
Watching the replays now and am really impressed. Can't wait to see what this turns into after a couple of years.
Let's just make sure we don't piss HAL off, alright?
On January 26 2019 06:38 Eudorus wrote: For people who doubt AI can beat strong players, just read what people posted here a year ago, for example in the Boxer on AlphaGo thread. Secondly, what kind of AI could beat a top player without an APM cap and also without a neural network?
So you think you need an APM cap to get an AI with 'smart decisions' rather than one overwhelming the other player with superior mechanics? What does that mean? I know what you are trying to say, but think about how you would define 'being smart' in the context of StarCraft. How is it 'smart' to play more human-like, with a finely tuned unit composition and calm, careful macro, when the nature of the game is such that you should just mass Stalkers, control them individually every time the game state updates, micro them around continuously, and go in for the kill when your opponent makes a mistake? The best way to define 'being smart' is whatever leads to a higher winrate.
By the way, there are both simplifying aspects to having an APM cap and real-life applications to not having one. It will be hard to train the same network to control all 30 of your Stalkers at every game state while also deciding when to switch tech or expand. In that case you need a small, fast micro network and a larger, slower strategy network. And there are real-world problems that require such NNs, as well as real-world problems that require high speed. Furthermore, who knows what strategy the AI would come up with to properly micro infinite-APM Blink Stalker vs. Blink Stalker battles against itself. That would need many more nodes than just pulling the hurt Stalker to the back while attacking the lowest-HP enemy Stalker currently in range.
No, the APM cap was decided on to make the AI more 'human-like' and relatable. It is basically PR, as well as seeing if you can train an AI to do tasks in a human-like manner. There are many real-world problems that you can solve effectively either in eerie AI-like ways, which would be unacceptable for social reasons, or in a human-like manner. If you can mimic a human while outperforming them, then that is often better than doing the same thing slightly better but being completely alien.
By "smart" I mean that AlphaStar should be able to find appropriate solutions to unexpected situations (like the Immortal drop) and that it should be able to elaborate and recognize effective strategies; I'd say the AI was more "efficient" than "smart".
As far as I know, AlphaZero was praised for some genuinely innovative and exceptionally good moves, and AlphaStar should be able to do the same. What it has accomplished instead (and that's pretty impressive as well, of course) is to win by mere mechanics; its micro was not exactly perfect due to the cap, but still inhuman at times (see the Blink Stalkers).
I am not a programmer, but I guess you could create a super strong AI capable of beating humans with infinite APM even without the ability to learn by itself; by limiting AlphaStar, you would ideally make it win not by exploiting its mechanical advantages over humans but, possibly, by outsmarting them. I don't think it's just a capricious limitation imposed by silly humans who need to regard this AI as closer to them; the really interesting part is for us to see whether the AI can teach us something, not just beat us.
If the objective here is just to develop an AI that can independently come up with ways to beat the top human players, then you are simply right and I have nothing more to say.
On January 26 2019 08:43 TT1 wrote: I'd like to see how well AlphaStar does in BW, his main edge in the matches came from being aggressive and outmicroing his opponents w/ blink micro (obviously while having perfect macro/base management). BW has a much greater defenders advantage, it wouldn't be able to capitalize on the same aspects it did in SC2 (which to me was mainly exploiting micro and positioning).
BW games are slower paced and there's way less poking, AlphaStar wouldn't be able to exploit micro maneuvers which obviously benefits human players.
Watching all the replays I couldn't help but think the same thing. Obviously I get why they picked SC2, but building an agent for BW that could beat Flash in a Bo5 would be a real challenge.
On January 25 2019 07:02 Poopi wrote: But it got stuck really bad on the warp prism harass, and when MaNa started to attack near the 2nd base it was really indecisive.
Totally! Spotting with the observer played a big role, but this was the real critical point.
Well I guess it shouldn't be too surprising when a computer wins at a computer game.
If you consider that AlphaStar needed "200 years of games" (for one race on one map), you can still appreciate that humans learn way faster under real conditions and are able to adapt much more quickly and energy-efficiently.
I wonder if you can create AI decision-making for applications outside of video games with the same approach. Would it need to compute 200 years of trial and error in one second for every decision it makes in a new situation?
On January 26 2019 08:43 TT1 wrote: I'd like to see how well AlphaStar does in BW, his main edge in the matches came from being aggressive and outmicroing his opponents w/ blink micro (obviously while having perfect macro/base management). BW has a much greater defenders advantage, it wouldn't be able to capitalize on the same aspects it did in SC2 (which to me was mainly exploiting micro and positioning).
BW games are slower paced and there's way less poking, AlphaStar wouldn't be able to exploit micro maneuvers which obviously benefits human players.
Watching all the replays I couldn't help but think the same thing. Obviously I get why they picked SC2, but building an agent for BW that could beat Flash in a Bo5 would be a real challenge.
The popularity of the game inside and outside of Activision/Blizzard likely played a role (promoting the game). I would imagine the DeepMind guys would even prefer doing this with BW.
On January 26 2019 07:57 Grumbels wrote: StarCraft has this thing called “balance”, where unit strength is calibrated around human abilities to provide strategically rich gameplay. If you break the balance of the game then you can’t test the abilities of your agent, as it will just find some silly unbeatable micro strat. APM caps and such are not just PR...
Exactly, it's playing another game; another of those games would have gone to MaNa if the AI hadn't done 1000+ constant APM from 3 different screens.
It's super cool and super impressive, but remember this is PvP on Catalyst. I want to see non-mirror matchups, especially with Zerg; ZvT/ZvP should be way more interesting. When does the AI build units?
About BW, it'd be cool, but BW APM would also have to be capped; if the AI is getting ahead with mechanics in SC2, I can only imagine the monstermodemacro.
On January 25 2019 07:02 Poopi wrote: But it got stuck really bad on the warp prism harass, and when MaNa started to attack near the 2nd base it was really indecisive.
Totally! Spotting with the observer played a big role, but this was the real critical point.
Well I guess it shouldn't be too surprising when a computer wins at a computer game.
I think an important thing to take into account here is that the AI didn't spot the Observer, and didn't even "think" to check for one. That allowed MaNa full view of the ramp and plenty of advance warning to pull his Immortals back when a whole troop of Stalkers came marching up the ramp, and it also gave him knowledge that the whole Stalker army was walking back out of the main. For some reason, the AI was completely unprepared for this harassment and *never* left Stalkers in that corner to defend against it. The first time, it could have been in a state where it didn't know MaNa had a Warp Prism, but the second time it should have been in a different state where it did have that knowledge. And presumably, in its 200 years of training, some games had been against an opposing bot that did some form of harassment, and it would have learned better ways to deal with that. With the DeepMind guys saying they picked the agents that were "least exploitable", it's quite weird that it was so easily exploitable.
On January 26 2019 00:47 imp42 wrote: To sum up most of the debate so far:
- was the setup "fair" or not?
- did the AI play well or not?
In the chess community we were blown away by the games AlphaZero played about a year ago. We had never seen anything like it. However, it was also very disappointing to realize DeepMind wasn't interested in chess at all. It was nothing but a playing field to demonstrate the capabilities of their neural net. As soon as the experiment concluded successfully, the DeepMind team moved on and left the chess world wondering what could have been if it had access. Imagine somebody allowing you to peek into a treasure chest full of amazing content, but then closing it and storing it away, not to be opened again.
Realistically we are in the same situation with StarCraft of course. Once Deepmind "beats the game" they will move on without missing a beat.
Except they didn't. They actually continued the chess research, and released the new results in late 2018.
People forget that science (which is what they are doing) isn't a process where you -- from the outside world -- see results every day. When DeepMind first announced that AlphaZero had beaten Stockfish, it wasn't just the release of 10 games. There was a scientific paper included, which also had to be written and reviewed, and the researchers needed to make sure that the conclusions they drew were correct.
Science isn't a field where we, as the outside world, can expect to get daily or even somewhat regular progress reports (although it can happen). This is, in part, to prevent people and the media from jumping to conclusions before the researchers feel certain they have accounted for whatever they need to account for, and to make sure the results aren't misinterpreted. You are confusing this approach (which is a common and rather sound one) with them having left chess. They certainly haven't. And they certainly haven't left StarCraft II either. It just happens to be a research project, and they are doing it for the research -- not for our entertainment. They are interested in StarCraft and Chess all right -- just not as competitive sports, but as learning tools.
I, for one, am 100% certain that DeepMind isn't done with StarCraft. Since it's the first major AI project (at least that I know of) that tries to solve a problem with limited information, I expect them to put a lot of work into this. For now, they have attempted to train an AI that can win by executing powerful strategies. The next step might be to make the AI more adaptable. This isn't just about training it more -- it's about HOW to train it. That's actually the hard part of creating a good AI: you need to train it CORRECTLY and find the best approach. It's not just a brute-force problem of letting it train for longer and longer.
To give an example of this, take the video by CodeBullet where he tries to train an AI to play "World's Hardest Game" (that's what it's called). Initial attempts weren't getting him any results, so he had to change the way he was teaching it, and he ended up solving the problem by incrementally expanding the number of moves the AI was allowed to make before it would die. In that game, it was an easy solution. But for a game like StarCraft, this is actually a really complex problem.
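That incremental trick can be mimicked on a toy search problem (all numbers made up): the agent must find a 12-move safe path with 4 possible moves per step, and the search horizon only grows once the current prefix survives:

```python
import random

random.seed(1)
MOVES, LENGTH = 4, 12
GOAL = [random.randrange(MOVES) for _ in range(LENGTH)]  # the one safe path

def survives(seq):
    """Number of moves the agent makes before it 'dies'."""
    for i, move in enumerate(seq):
        if move != GOAL[i]:
            return i
    return len(seq)

# Curriculum: only search for the next move once the current prefix works.
prefix, tries = [], 0
while len(prefix) < LENGTH:
    candidate = prefix + [random.randrange(MOVES)]
    tries += 1
    if survives(candidate) == len(candidate):
        prefix = candidate

print(f"curriculum search solved it in {tries} tries; "
      f"naive search over full sequences expects ~{MOVES**LENGTH:,}")
```

Naive search over full sequences is about 16.7 million guesses; the curriculum gets there in a few dozen, which is why how you train matters more than how long you train.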
On January 25 2019 19:30 Jockmcplop wrote: I don't think there's much you can do with a preexisting game like Starcraft to get a 'fair' game between AI and humans. Maybe intense tweaking of balance and AI ability, but even then I don't think it would seem like a real opponent. You would have to design a game from the ground up with AI in mind, I think.
I'm very impressed with the decision making of this AI. It's streets ahead of anything else I've seen.
I think it's just that super-micro-potential units are broken for an AI. I promise that with Zerg, the AI would not look nearly as impressive. It would probably have a hard time droning and making units at the right times. And then you don't have things like warp prism micro or blink micro that can scale like crazy. What are they gonna do, dance their zerglings? They can't jump across a wall. What are they gonna do, shoot an oracle or void ray or banshee with some roaches?
The AI would probably be forced into a ling-bane-hydra game, and then it would come down to how well it can micro hydras against AoE.
Zerg is the true race where human intelligence shines. It's about being one step ahead, predicting what the opponent will do, where he will send his warp prism, etc. You have to know when to drone and when to make units, sometimes based on instinct alone, or on knowing your opponent or current meta trends.
Protoss is: you pick a build, execute it perfectly, and if you do execute it perfectly or very close, you probably win, unless your build order was too coin-flippy.
In fact, PvP is by far the easiest matchup for an AI like DeepMind's to win consistently against humans. Edit: Actually, ZvZ might be even easier for an AI, now that I think about it... Most definitely maybe?
FUN FACT: I wonder if the DeepMind AI could play billions of matches of all matchups to determine which race is potentially the best. If anyone is unbiased and would know, it's an AI. I bet they already know lol.
This kind of AI playing billions of games against equally skilled opponents could probably easily conclude whether certain units are overpowered in certain matchups. Very interesting stuff when you think about it...
There's some truth to what you're saying here, I'm especially interested in how an AI Zerg approaches drone production.
Overall though, I think a Zerg AI would have a flock of mutas constantly poking away at their opponent in ways that no human can.
I also think that splitting up their zerglings would increase their effectiveness.
Let's compare to chess: when AlphaZero beat Stockfish, a lot of chess analysts thought AlphaZero played beautifully and very differently from the typical brute-force AI. YouTube is full of analyses of those games because GMs truly find them beautiful and interesting.
Here it looks like the AI can get away with suboptimal decisions - bad decisions - just because it can click on 100+ units at a time, anywhere on the map.
This feels like giving an average Joe an aimbot in an FPS game.
On January 27 2019 07:19 Xitah wrote: Not impressed. Why?
Let's compare to chess: when alphazero beat stockfish, a lot of chess analysts thought alphazero played beautifully and very different from the typical brute force AI. Youtube is full of game analysis of those games because GMs truly find those games beautiful and interesting.
Here it looks like the AI can get away with suboptimal decisions, bad decisions just because it can click simultaneously on 100+ units at a time anywhere on the map.
This feels like giving an average Joe an aimbot in a FPS game.
On January 25 2019 06:48 ArtyK wrote: I'd like to know epms and not worthless apms for this, considering the AI was capable of microing stalkers in 3 different places at once and win vs mass immortals.
I'm pretty sure apm = epm for the AI
I agree with this notion.
AI APM should be based on human EPM, not human APM.
Let me put this differently. After Deep Blue beat Kasparov, brute-force chess engines clearly surpassed humans, but it was still the case that a human GM + engine could easily beat an engine alone: while the engine could see 20+ moves of tactical combinations by sheer brute force, it could still make obvious long-term mistakes in the middlegame. AlphaZero seemingly does not, so human GM + AlphaZero is not really better than just AlphaZero.
Here it's obvious that humans can still make a lot of improvements to the gameplay, so humans are not surpassed yet.
On January 25 2019 06:48 ArtyK wrote: I'd like to know epms and not worthless apms for this, considering the AI was capable of microing stalkers in 3 different places at once and win vs mass immortals.
I'm pretty sure apm = epm for the AI
I agree with this notion.
AI APM should be based on human EPM, not human APM.
What's the average pro's EPM?
Usually about 200 to 240, with slight variations between races. Still, there is a fundamental difference if the AI can control units through a background API. For instance, say you have an army of phoenixes or mutas, and you want to pull an injured unit back to save its life - or sometimes to save the entire fleet from a parasitic bomb. For humans, it's impossible to achieve in one click because these units are stacked. Best-case scenario: ctrl+left-click to select all units, find the injured one (if it's actually on the first page), click it, then right-click somewhere else on the screen. That's already three clicks plus three drags. Practical-case scenario: drag, drop, right-click; drag, drop, right-click; drag, left-click; drag, right-click. Worst-case scenario: drag, drop, right-click; drag, drop, right-click; drag, drop, right-click - and everything dies.
If you are an AI with a direct interface to select any unit on screen, all you have to do is left-click to select, then drag and right-click. Yes, there is a 350ms latency, but all you need is two clicks and one drag.
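On the APM-vs-EPM point running through this thread: EPM is essentially APM with spam filtered out. A toy filter makes the difference concrete (this is my own heuristic with a made-up 0.5-second window, not the definition used by Blizzard or DeepMind):

```python
def effective_apm(actions, window=0.5):
    """Toy EPM filter: drop repeats of the same command issued within
    `window` seconds of each other, i.e. spam clicks. `actions` is a
    time-sorted list of (timestamp_sec, command) pairs."""
    effective, last = [], {}
    for t, cmd in actions:
        if cmd not in last or t - last[cmd] > window:
            effective.append((t, cmd))
        last[cmd] = t
    duration_min = actions[-1][0] / 60  # assumes a non-empty, sorted log
    return len(effective) / duration_min

# Three rapid-fire "select" spam clicks collapse into one effective action.
log = [(0.0, "select"), (0.1, "select"), (0.2, "select"),
       (1.0, "attack"), (60.0, "select")]
epm = effective_apm(log)   # 3 effective actions over 1 minute -> 3.0
```

The human APM numbers from the showmatch (390 for MaNa, 678 for TLO) include exactly this kind of spam, which is why several posters argue the AI's 280 "clean" APM is not comparable.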
On January 27 2019 11:10 Aegwynn wrote: So now do we have human elitists as well?
Nah, we do not. We will not be surprised if AI is shown to be mechanically superior to humans - that's perfectly fine and expected. But what we want from DeepMind is to show that AI can also exhibit some sort of resourcefulness and wisdom in terms of tactics, to inspire human play. Say, the 25-probe mining is inspiring; the 40-stalker 1000-APM micro is not.
If they just wanted to show an AI that can beat humans, they could just make an Automaton 2000 that doesn't hack and totally destroys any human player with ease. That's like asking an elite marathon runner to race against a car with GPS - the result is already written. There is a reason they implemented a 350ms delay and showed that the AI has a lower average APM than the human players, even though they do not understand human APM very well. The point is that, through their neural-network training on TPUs, the AI can quickly be trained to be superior to human players even if the two are mechanically equivalent.
I am not saying they should build a robot with full real-time image processing to perceive and control the game the same way a human player would, but it should be their goal to make something as equivalent and as fair as possible.
Otherwise, there is no point in proving that a car can run faster than a man on a racing track. It is much better to prove that an autonomous car can lap faster than its human-driven version with equal weight on the same track - the controls and interface are different, sure, but this is much more fair.
Very nice showcase, but as it stands the AI has the equivalent of an unlimited number of hotkeys at its disposal. It doesn't blink back a bunch of stalkers; rather, it decides which stalkers to blink within the bounds of its APM. I think the mechanical brute-forcing will not stop until the AI shows mistakes stemming from imperfect micro.
On January 27 2019 07:19 Xitah wrote: Not impressed. Why?
Let's compare to chess: when alphazero beat stockfish, a lot of chess analysts thought alphazero played beautifully and very different from the typical brute force AI. Youtube is full of game analysis of those games because GMs truly find those games beautiful and interesting.
Here it looks like the AI can get away with suboptimal decisions, bad decisions just because it can click simultaneously on 100+ units at a time anywhere on the map.
This feels like giving an average Joe an aimbot in a FPS game.
That's a superficial and wrong analysis. You're missing many things the AI did right that aren't related to micro.
For example, the AI had an essentially perfect ability to predict the outcome of a battle based on army sizes and unit composition. This shouldn't really come as a surprise - it's something you would expect an AI to do really well after 200 years of accumulated playing experience - but it's an important point. Even if you reduce the micro abilities of the AI further, humans can't hope to ever be as good at judging the outcome of an engagement as an AI. You often see this reflected in the commentary of professional matches: when a medium-sized to large battle ensues, the casters can't tell who is going to win it until the end.
And you saw this in these games too: there were several situations where TLO/MaNa misjudged the winning chances of an engagement and got crushed, both on defense and offense. The AI understood the army compositions better and knew when to move in or when it had enough defense, whereas the human player made a sub-optimal judgement.
Trying to reduce the outcome of this match to simply being about superior micro is plain ignorant.
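For intuition on why battle-outcome judgement is something a machine can excel at: even a crude aggregate model captures a lot of the "who wins this engagement" signal. Below is a toy Lanchester-square-style score - my own illustration, certainly not AlphaStar's actual (learned) evaluation, and the unit stats are made up rather than real SC2 numbers:

```python
def battle_score(units):
    """units: list of (count, dps, hp) tuples for one army.
    Lanchester-square-style score: an army's strength scales with the
    product of its total firepower and its total durability, so numbers
    matter quadratically - 10 stalkers beat 6 by far more than 10:6."""
    total_dps = sum(n * dps for n, dps, hp in units)
    total_hp = sum(n * hp for n, dps, hp in units)
    return total_dps * total_hp

def predict_winner(army_a, army_b):
    """Return which army the toy model favours."""
    return "A" if battle_score(army_a) >= battle_score(army_b) else "B"

# Made-up stats: (count, dps, hp) -- illustrative, not real SC2 values.
stalkers = [(10, 10.0, 150.0)]
fewer_stalkers = [(6, 10.0, 150.0)]
winner = predict_winner(stalkers, fewer_stalkers)   # -> "A"
```

A model like this ignores terrain, spellcasters and micro, which is exactly where the later posts argue human intuition (and AlphaStar's learned evaluation) come in.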
On January 27 2019 15:23 Dumbledore wrote: The alphastar agents that went 10-0 had ability to see the map zoomed out all the time. Not fair at all. The camera based agent got slam dunked.
Humans evolve too, and Mana found a way to abuse the AI. That is expected to happen. He came into this match prepared.
Fact is that according to DeepMind, the camera-based agent is actually only slightly weaker than the non-camera-based agent. You can't judge something from one game. If they had played a 10-game match, I would still expect the AI to win 6-7 out of 10 games.
Here are the strength graphs of the camera agent versus the camera-less agent.
On January 27 2019 20:31 akatama wrote: Very nice showcase, but as it stands now the AI has the equivalent of an unlimited amount of hotkeys at its disposal. It doesn't blink back a bunch of stalkers, but rather decides which stalkers to blink within the bounds of it's APM. I think the mechanical brute forcing will not stop until the AI shows mistakes stemming from imperfect micro.
Actually, according to the information DeepMind has released, the AI uses individual "clicks" (like a simulated mouse) or drag boxes, just like a human.
Of course it's better and faster at it, but it doesn't have infinite hot-keys.
Just my thoughts from the point of view of a casual SC:BW/SC2 player. I take it that AS learned "how to play" from an initial seed of pro replays - thus probably picking up initial concepts like "mining is good" and "kill enemy stuff to win" - and then evolved by playing countless games versus instances of itself, right? If so, I think this is a major flaw in the eventual evolution of its decision-making ability. It's more or less the same as if I watched some replays, played a tutorial with a friend, and then never took a glance outside the box but only played against him, over and over again, for, well... 200 years. The result is a tree of winning techniques (not necessarily strategies or refined builds) that proved to be good within the scope of the learning environment, but it doesn't have much similarity to the evolution of the PvP meta over the years, or even within a patch. It's not trying to be a better gamer (c), but trying to reach a higher win-% with what it's doing. This is of course backed by access to mechanics way beyond human capabilities.
Well, so far so good. It's still nice to see that AS figured out things like flanking, proxies, blink micro, etc. more or less by itself. What it failed to learn is any consistent form of reactionary play depending on the opponent's actions - see MaNa's prism harass. But why should it, as long as pushing hard with near-perfect macro at home while aiming for an economy lead still works? It's been practicing against other versions of itself that (mostly) didn't execute any major tech switches, after all.
Also, I see its heavy emphasis on blink stalkers in one form or another as a result of AS "understanding" that perfect micro with them is freaking awesome - or rather, that cost/return-wise they are often more effective than other Protoss compositions, given that funneling hundreds of APM into microing them makes them a dozen times better than a-moving or simple focus-firing. I assume that in the history of ASvAS games, any AI instance that tried to play a conservative composition against 80-100% blink stalkers eventually got wrecked, so this tree of decisions eventually died, or rather fell too far behind in priority to be considered.
Basically, with unlimited EAPM, SC2 would need an independent set of balance patches built around the mechanical capabilities of AS, or at least AS-PvP would sooner or later evolve to a point where all it needs are nexi, gates, core, forge, probes and stalkers. Kappa.
But: what would have happened if AS didn't get to learn in an environment where it can spike EAPM to 1500 and beyond, but only to a human level? Or even as low as maybe 50-ish EAPM? Obviously, letting it learn at high EAPM and then hard-capping it way below that for an actual match would distort the result big time. We will probably never know, but I guess that with different cost/return weights on units, we would have seen totally different preferred unit compositions. Maybe 11/11 soul-train games.
When I try to improve, I play people seeded somewhere between "a bit better" and "quite a bit better", because otherwise I am unlikely to get a glimpse of my own flaws, and all I get out of a game is a tiny bit of mechanical routine. Obviously, it's quite hard to find a human opponent that's stronger than AS and able to play hundreds of thousands of games per hour, simultaneously. So what do you think would happen if AS didn't play-to-learn against itself, but rather against conventional bots with a gazillion APM, perfect micro and strats designed to crush it with random multi-front pushes, while itself being constrained to screen vision and, say, 200-250 EAPM? And once it started to consistently deal with that, you improve the sparring bot? Would AS learn to be more of an e-Serral than a blink god? Or would AS just evolve a pushing technique that is mostly unstoppable by this kind of play, thus basically ending up where it is right now?
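For what it's worth, the self-play league setup being debated here can be caricatured in a few lines. A toy sketch (everything below is made up for illustration - real league training updates network weights against a population of frozen policies, not a scalar "skill" number):

```python
import random

def league_training(generations=30, matches=50, seed=1):
    """Toy league-style self-play: each agent is just a scalar 'skill'.
    A new agent starts from the latest snapshot, plays frozen snapshots
    sampled from the whole league, nudges its skill up whenever it loses,
    and is then frozen and added to the league - the AlphaStar-like loop
    of learning against a growing population, not a single opponent."""
    rng = random.Random(seed)
    league = [0.0]                      # initial imitation-seeded agent
    for _ in range(generations):
        skill = league[-1]              # start from the latest snapshot
        for _ in range(matches):
            opponent = rng.choice(league)        # sample any past agent
            p_win = 1 / (1 + 10 ** ((opponent - skill) / 400))
            if rng.random() > p_win:    # lost this game: learn from it
                skill += 5.0
        league.append(skill)            # freeze and add to the league
    return league

league = league_training()              # skills rise generation over generation
```

Even this caricature shows the dynamic scwish describes: agents only ever get good at beating what the league contains, so a strategy (or a scripted sparring bot) absent from the population is never learned against.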
On January 27 2019 07:19 Xitah wrote: Not impressed. Why?
Let's compare to chess: when alphazero beat stockfish, a lot of chess analysts thought alphazero played beautifully and very different from the typical brute force AI. Youtube is full of game analysis of those games because GMs truly find those games beautiful and interesting.
Here it looks like the AI can get away with suboptimal decisions, bad decisions just because it can click simultaneously on 100+ units at a time anywhere on the map.
This feels like giving an average Joe an aimbot in a FPS game.
Fair enough, but this is a lot harder than chess. It might take more innovation and more work. This was a good first step, and I thought it played very similarly to top-level humans, which is a big achievement in itself. No bot in an RTS has ever played very human-like, and it made a lot of great decisions even setting aside the amazing, inhuman micro it had. This game is a lot different and much harder than chess. Chess really is a simple game even compared to Go; I think DeepMind's biggest achievement was Go - that was truly amazing. Playing within human limits and beating the top-ranked Korean SC2 players would be an amazing achievement too, though. Even a perfect AI in SC2 you would expect to lose sometimes, because some strats just lose to other strats.
That's a superficial and wrong analysis. You're missing many things the AI did right, and that isn't related to micro.
For example, the AI had an absolutely perfect understanding or ability to predict the outcome of a battle based on army sizes and unit composition. This shouldn't really come as a surprise, since it's something you would expect an AI to do really well after 200 years of accumulated playing-experience, but it's really an important point. Even if you reduce the micro-abilities of the AI even further, humans can't hope to ever be as good at judging the outcome of an engagement as an AI. You often see this reflected in commentators in professional matches, when a medium-sized to large battle ensues, and they can't tell who is gonna win it until the end.
And you also saw this in these games: there were several situations where TLO/Mana misjudged the winning chances of an engagement, and got crushed, both on defense and offense. The AI understood the army compositions better, and knew when to move in or when it had enough defense, where's the human player made a sub-optimal judgement.
Trying to reduce the outcome of this match to simply being about superior micro is plain ignorant.
Predicting the result of a battle is actually very hard. When you run up a ramp, you might only see part of the army. Sometimes you can't see the entire enemy army, and then you're guessing whether you can win based on partial information; you need educated guesses to fill in the rest. Personally, I think Brood War was harder because units were less clumped, so it was harder to see all the enemy units - they were more spread out. In SC2, units bunch together into tight balls that are easier to see. In Brood War you would have huge sprawling groups of units, or tank lines where you couldn't see everything. You needed intuition. It will be interesting to see the bot play TvT, since I think tanks are one of the most interesting and strategic parts of the game - how does it manage and position its siege lines? There is no doubt it will learn, though. This AI is very impressive even setting aside the cheating micro and map management. It did a lot of amazing things I have never seen bots do in RTS games. I have watched and played against a lot of bots and have some understanding of how they work; some of it is lost on average people.
On January 27 2019 15:23 Dumbledore wrote: The alphastar agents that went 10-0 had ability to see the map zoomed out all the time. Not fair at all. The camera based agent got slam dunked.
Humans evolve too, and Mana found a way to abuse the AI. That is expected to happen. He came into this match prepared.
Fact is that according to DeepMind, the camera-based agent is actually only slightly weaker than the non-camera based agent. You can't judge something from one game. If they had gone to a 10-game match, i would still expect the AI to win 6-7 out of 10 games.
Here's the strength-graphs of the camera-agent versus the camera-less agent.
I think we need to see a lot more games. I would like to see MaNa get rematches against the same agents. And every match really needs to be limited to the camera view to be legit - having the entire map is a pretty large cheat code.
Is AlphaStar using a physical keyboard/mouse? If not, there is no point in doing this, since fatigue and coordination error are a big part of StarCraft games.
Do this when the AI uses a real-time camera and a physical keyboard/mouse.
One thing I thought was pretty interesting: how does the AI know that you're not going DT?
In the one game MaNa did go DT, it seemed like he had a pretty decent edge with map control and unit attacks, but he eventually didn't maximize that lead.
I read that the AI can see the blur on the map, and since it has full map vision it can essentially see any cloaked unit, even if it's undetected.
Would removing this ability - or at least requiring the camera to focus on the blur to react to cloaked units - make them much stronger vs the AI, and furthermore, alter the builds it does substantially?
Like, if MaNa had feigned an expo and gone straight to DTs, then - other than against the agent that went mass disruptor - I don't see how this wouldn't expose a serious flaw in its builds.
Making mass blink stalkers and microing them is insane, but it doesn't beat a variety of other builds that never got tested.
Thus, as others have said, it would be great to see the pros play against the same agents multiple times, to see whether their MMR is 7000 in a Bo1 or actually 7000 across X number of games vs pros.
That's a superficial and wrong analysis. You're missing many things the AI did right, and that isn't related to micro.
For example, the AI had an absolutely perfect understanding or ability to predict the outcome of a battle based on army sizes and unit composition. This shouldn't really come as a surprise, since it's something you would expect an AI to do really well after 200 years of accumulated playing-experience, but it's really an important point. Even if you reduce the micro-abilities of the AI even further, humans can't hope to ever be as good at judging the outcome of an engagement as an AI. You often see this reflected in commentators in professional matches, when a medium-sized to large battle ensues, and they can't tell who is gonna win it until the end.
And you also saw this in these games: there were several situations where TLO/Mana misjudged the winning chances of an engagement, and got crushed, both on defense and offense. The AI understood the army compositions better, and knew when to move in or when it had enough defense, where's the human player made a sub-optimal judgement.
Trying to reduce the outcome of this match to simply being about superior micro is plain ignorant.
I don't know if you can call the AI better at judging an engagement. If you download the replays and take control of the fights against an equally skilled opponent, the human player is usually the one with the better army at the end. The AI doesn't evaluate a fight as winnable because it has the better army; it knows it can win because it can out-micro the opponent with its composition.
Think 3 stalkers, 1 zealot and 4 phoenixes vs 8 stalkers, 1 adept and 2 phoenixes.
Not impressed. The situation repeats Kasparov vs Deep Blue: the human side was in the dark and played like it plays against humans. And the human side won - and showed how "stupid" the AI is - in the last game, after having gained enough experience.
On January 28 2019 01:34 scwish wrote: Just my thoughts from the point of view of a casual sc:bw/sc2 player. I take it that AS learned "how to play" by getting an initial seed of pro replays, thus probably picking up the initial concept of "mining is good" and "kill enemy stuff to win", and then evolved by playing countless games versus instances of itself, right? If so, I think that this is a major flaw in the eventual evolution of its decisionmaking ability.
No. This is mathematically required. The phase space of game moves is way too big for the agent to randomly get anywhere near the region of playing properly. There is no point in spending tens of thousands of hours of gameplay on agents just randomly clicking like monkeys, where no iteration of the NN is any better than another.
It´s more or less the same if I was watching some replays and played a tutorial with my friend, and then never took a glance outside the box but only played against him, over and over again, for, well... 200 years.
You can't be serious.
The result is a tree of winning techniques (not necessarily strategies or refined builds) that proved to be good within the scope of the learning environment,
That is all that training a neural network can mathematically ever do. But I thought your point was going to be that it only copies replays and never does anything new or creative.
... but it doesn´t have too much of a similarity to the evolution of the PvP meta over the years, or even within a patch. It´s not trying to be a better gamer (c), but trying to reach higher win-% with what it´s doing. This if of course backed by an access to mechanics way beond human capabilites.
Why would you expect the learning of a neural network to mirror the human meta? It never should. A meta is something fairly arbitrary anyway - that's why there can be different metas when populations of gamers are cut off from each other. And why do we humans have a meta at all? Because we copy each other's strategies. In training a NN, you never copy the weights and biases of the agent that beat you. So why even bring it up?
So it should not happen. And you certainly don't want it to happen either.
Well, so far so good. It´s still nice to see that AS figured out things like flanking, proxies, blink micro etc. more or less by itself. What it failed to learn about is any consistent form of reactionary play depending on opponents' actions, see manas prism harass.
It did respond to MaNa's warp prism, and it did so perfectly consistently - which was exactly the problem. Do you even know the meaning of the words you are using? This was a clear example of a human recognizing a tendency of the AI and exploiting it, with the AI, because of its nature, unable to adapt.
But why should it, as long as pushing hard with near-perfect macro at home while aiming for an economy lead still works? It´s been practicing against other versions of itself that (mostly) didn´t execute any major tech switches after all.
Good insight. The AI has shown us that the correct way to play SC2 is not to try to out-think the opponent. Instead, you play a game where it becomes irrelevant what your opponent is thinking.
You say you know it didn't adapt because it never faced agents that beat it with tech switches. But you have no reason to believe this. And neither TLO nor MaNa won by doing a tech switch. So I don't get this.
Also, I see its heavy emphasis on blink stalkers in one form or another as a result of AS "understanding" that perfect micro with them is freaking awesome, or rather: cost/return wise they are oftentimes more effective than other protoss compositions, given the fact that fueling hundreds of apm into microing them makes them a dozen times better over a-moving them, or just focus firing.
This sentence is an absolute mess, but I think I get the gist of it. Yes, AS has a deeper understanding of StarCraft than all the people here claiming it should do tech switches and diverse unit compositions.
I assume that in the history of ASvAS games, any AI instance that tried to play a conservative composition against 80-100% blink stalkers eventuelly got wrecked, so this tree of decisions eventually died, or rather fell behind too much in priority to be considered.
Which means those agents were playing the game wrong and the ones that won and were selected had a deeper understanding.
Basically, with unlimited eapm, SC2 would need an independent set of balance patches around the mechanical capabilities of AS, or at least AS-PvP would sooner or later evolve to a point where all it needs are nexi, gates, core, forge, probes and stalkers. kappa.
Why? How many more balance patches has SC2 had compared to SC:BW? And it still isn't balanced. Or is the argument here that "AS shows that SC2 sucks"?
But: what would have happened if AS didn´t get to learn in an environment where it can spike eapm to 1500 and beyond, but only to a human level? Or even as low as maybe 50ish eapm? Obviously letting it learn at high eapm and then hardcapping it way below that for an actual match will distort the result big time. We will probably never know, but I guess that with different cost/return weights on units, we would have seen totally different preferred unit compositions. Maybe 11/11 soul train games.
AS would likely have to overcome some technical hurdles, but the outcome would still be AS winning. And you would come up with something new to whine about.
When I for myself try to improve, I play people that are seeded somewhere between "a bit better" and "quite a bit better", because otherwise, I am unlikely to get a glimpse of my own flaws, and all I get out of a game is a tiny bit of mechanical routine.
That's just your own rationalization about how you learn. People on TL have made much stronger cases about the benefits of playing against people weaker than you.
Obviously, it´s quite hard to find a human opponent that´s stronger than AS and able to play hundreds of thousands of games per hour, simultaneously.
After this statement, I am not quite sure you know what 'quite hard' means. But say this were somehow possible - would we then even need to develop deep-learning AIs? We could just get a bunch of people better than a deep-learning algorithm to do the same thing for hundreds of thousands of hours.
So what do you think would happen if AS wouldn´t play-to-learn against itself, but rather against conventional bots with a gazillion of apm, perfect micro and strats that are designed to crush it with random multi-front pushes, while itself being constrained to screen vision and, say, 200-250 eapm?
Do such bots exist? It would simply overfit the NN to beat these very weak bots.
And once it starts to consistently deal with it, improve the sparring bot?
How would it improve the non-deeplearning sparring bot? They are hard-coded line by line.
Would AS learn to be more of an e-serral than a blink-god? Or would AS just evolve into developing a pushing technique that is mostly unstoppable by this kind of play, thus basically ending up where it is right now?
It would waste a whole bunch of Google's invested dollars.
On January 28 2019 15:33 cpower wrote: Is Alphastar using physical keyboard/Mouse? If not then there is no point to do this since fatigue /coordination error is a big part of starcraft games.
But an AI doesn't get fatigued. Why would you hard-code artificial fatigue so that the NN learns to avoid the effect of a fatigue it doesn't suffer from in the first place? Also, I don't think fatigue plays a big role even for a human playing a Bo5, unless you are jet-lagged or something. I assume you mean mental fatigue, which is hard to notice in yourself; from my experience, humans have no obvious problem concentrating for 5x30 minutes.
I don't understand why you say that an AI is not useful unless it has all the flaws humans have.
On January 28 2019 16:06 -Kyo- wrote: I read that the AI can see the blur on the map, and since it has full map vision it can essentially see any cloaked unit, even if it's undetected.
So can humans. They just removed an interface limitation from the AI so that it would learn to actually play the game first. It is silly to have the AI simultaneously learn to play the game and 'fight the interface', so they took all the information theoretically available to the player and fed it straight into an input matrix. If you put these limitations in, the AI will just learn to be really, really good at camera control, whereas right now it can focus directly on strategizing.
Would removing this attribute, or at least making the camera focus on the blur as a requirement vs cloaked units make them much stronger vs the AI, and furthermore, alter the builds it does substantially?
What?
Thus, it would be great to, as others have said, see the pros be able to play against the same agents multiple times to see if their MMR is 7000 in a Bo1, or if it's actually 7000 across X number of games vs pros.
But this is also tricky: a human GM playing a Bo1 vs a random top agent will produce a completely different MMR estimate for AS than a human playing the same AI over and over until they know exactly how to exploit it. An AI can be no. 1 on the ladder while, at the same time, any human can be shown how to beat it with a specific strategy, if humans are allowed to play the exact same AI repeatedly until they find the exploit.
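To make that point concrete, here's a toy Elo-style calculation (the Elo model and all numbers here are just an illustration; DeepMind's MMR estimate is surely more involved): a rating measured from a handful of "honest" games diverges wildly from one measured after the opponent has found a reliable exploit.

```python
# Toy illustration (not DeepMind's method): an Elo-style rating drifts
# far downward once an opponent learns a reliable exploit.

def elo_expected(r_a, r_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    """A's new rating after a game where A scored `score_a` (1 win, 0 loss)."""
    return r_a + k * (score_a - elo_expected(r_a, r_b))

bot, human = 7000.0, 6500.0
# Phase 1: the human plays "honestly" and loses five games in a row.
for _ in range(5):
    bot = elo_update(bot, human, 1.0)
# Phase 2: the human finds a fixed exploit and wins ten games in a row.
for _ in range(10):
    bot = elo_update(bot, human, 0.0)
print(round(bot))  # the measured rating ends up far below the Bo1 estimate
```

A Bo1 samples only phase 1; repeated play against the same frozen agent samples phase 2 as well, and the two estimates disagree by hundreds of points.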
Same sensationalised story as when AlphaZero "beat" Stockfish: it was basically playing against an off-the-shelf Stockfish with no tablebases. Yeah, you can make your AI seem impressive when you give it massively favourable conditions.
AI is a powerful tool, but I stopped taking this seriously at all after the list of conditions and the fact that it sees the whole map. Pretty much a marketing gimmick: cool, but not that impressive.
On January 28 2019 21:51 KelsierSC wrote: Same sensationalised story as when AlphaZero "beat" Stockfish: it was basically playing against an off-the-shelf Stockfish with no tablebases. Yeah, you can make your AI seem impressive when you give it massively favourable conditions.
But the result was 27-0 with 60-something draws... Do you think something could help Stockfish take a game from AlphaZero?
On January 28 2019 21:51 KelsierSC wrote: Same sensationalised story as when AlphaZero "beat" Stockfish: it was basically playing against an off-the-shelf Stockfish with no tablebases. Yeah, you can make your AI seem impressive when you give it massively favourable conditions.
But the result was 27-0 with 60-something draws... Do you think something could help Stockfish take a game from AlphaZero?
When AlphaZero played 1,000 games against Stockfish and Stockfish had access to its opening books, SF was able to consistently win games. AlphaZero won the majority of the games, but the margin was significantly smaller.
AFAIK there have been no fundamental advances in physics since the '60s, yet look at today's technology versus 50 years ago. There are a lot of cool things to be done using current AI tech.
On January 26 2019 08:00 shabby wrote: I wonder if AlphaStar would play differently if it didn't learn anything from replays, but had to figure everything out by itself. Would be more interesting strategywise to see what it does when it doesnt "mimic humans".
Considering the move space, a neural network initialized with random weights will take purely random actions. It will be as if your cat is walking across your keyboard, or as if a monkey is clicking your mouse. At that point you depend on the AI accidentally building a Pylon, building a Gateway, and having a Zealot move towards the enemy base. So you would have hundreds of thousands of games lasting a very long time with literally nothing happening.

In other words, the phase space is tremendously huge, and only a very tiny segment of it contains agents that actually attempt to play the game. If you initialize a random neural net, it will be out there in a flat desert of completely random clicking. If you are there and move in any direction of the phase space, you aren't suddenly winning more games. All your agents would just randomly click while the clock counts down to a draw. So all your bots draw against all the others, because the phase space is so huge that no neural network gets initialized with the proper weights.

That's why you first imitate human play. We already know what a game of StarCraft should look like, so there is no sense in exploring the vast flat desert of useless neural nets. Maybe you are copying things from humans that are bad and you don't unlearn them; hard to know. But it makes no sense to try to train neural nets when only 0.0000001% of them actually send their Probes to mine minerals.
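To put a toy number on that "flat desert" argument (the action-space size and opening length here are completely made up for illustration; AlphaStar's real action space is far larger):

```python
# Back-of-the-envelope sketch: the chance that a uniformly random policy
# stumbles onto even a short scripted opening is astronomically small,
# which is the argument for bootstrapping with imitation learning.
# Both constants below are invented toy numbers.

ACTIONS_PER_STEP = 1000   # assumed size of a (heavily simplified) action space
OPENING_LENGTH = 20       # steps needed for "Pylon, Gateway, move out", etc.

p_random_opening = (1.0 / ACTIONS_PER_STEP) ** OPENING_LENGTH
print(p_random_opening)   # ~1e-60: effectively never within any training budget
```

Even at millions of games per day, the expected wait for one random agent to execute such an opening dwarfs the age of the universe, which is why self-play alone starting from random weights never gets off the ground here.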
Well, yeah, heh. Seems your point is that it would take a lot of processing power/time; my point was about what would happen if you did it. If it floats your boat, you could clone the machine a million times, lower the time limit for a draw in games where nothing happens, and change the game speed drastically for the initial learning, or something. Idk, you could probably even "weight" (if that's the right word) decisions in the beginning to favor mining/building, to speed up the first million games of random clicks. The point is that after it figures out that mining minerals and building buildings and an army to attack with is good, it would "evolve" without any interference from how humans have decided SC should look. Saying that we already know what a game of StarCraft should look like is arrogant imo, and kinda makes the point of an AI moot, besides the technical research, which most of us aren't interested in. People thought they knew how chess and Dota should be played too; turns out they should be played a lot more aggressively, with sacrifices being made along the way for the greater good.
Yeah, the correct way to play SC2 is to rely on micro unattainable by human players. The same way flying at high speed is the correct way to play American football... Great insight.
You would think it would be obvious, right? Well, apparently not to some.
Maybe Blizzard can use this NN in the future to make sure the game doesn't require too much micro.
As for AlphaZero in chess, yeah, DeepMind ran a weak version of Stockfish against AlphaZero. Just like they picked TLO of all people to play vs AlphaStar. And they blamed Blizzard for that, lol. Also, why would you even ask Blizzard? Isn't TLO a Supreme Commander player that retired years ago? And if you go through the Stockfish games, many games have moves that no Stockfish engine would ever make. It is as if they manually entered a bad move, or the way they ran Stockfish resulted in it blundering. And if you look on YouTube, there are even people who claim that for some games they released, they switched around the names, and the game won by AlphaZero was actually won by Stockfish.
On January 28 2019 21:51 KelsierSC wrote: Same sensationalised story as when AlphaZero "beat" Stockfish: it was basically playing against an off-the-shelf Stockfish with no tablebases. Yeah, you can make your AI seem impressive when you give it massively favourable conditions.
But the result was 27-0 with 60-something draws... Do you think something could help Stockfish take a game from AlphaZero?
When AlphaZero played 1,000 games against Stockfish and Stockfish had access to its opening books, SF was able to consistently win games. AlphaZero won the majority of the games, but the margin was significantly smaller.
Alright, I pointed this out before, but... they played 1,200 games in that series. The margin was "significantly smaller" for two reasons:
1. Any non-zero number would make for a "significant" difference.
2. They didn't just "have access to" opening books: the opening positions were determined beforehand. So AZ and SF might have actually started playing after 7 moves, or however many moves it took to reach the pre-determined starting position. Some of those positions would be intrinsically disadvantageous. If you watch some of the games that AZ lost, you can see that this actually did make a difference, and as far as I can tell, those may have been the only games AZ lost.
Despite number 2 above, the score was 290-24 in that match. That's the significantly smaller margin you're pointing to. When you combine this with the increasingly competitive Leela Chess Zero in TCEC, it's hard to argue that DeepMind failed to come up with an algorithm that yields a superior agent in chess. I get that you were responding to a post talking about their earlier complete shutout series, but if we go to the root comment, the fundamental idea that these are "sensationalised" results is just... not tenable.
Getting back to StarCraft and the root comment, the A* results are pretty impressive, any way you cut it. I think where people are getting confused is that they want to see something even more impressive. Well, give them time and let's see. In the meanwhile, they've accomplished something real here, even if it's not exactly what some people want them to have accomplished. There now exists a bot that can beat SCII pros. Am I wrong in thinking that prior to this, nothing else had even come close, even when taking full advantage of bot abilities compared to humans? But I guess that even though they tried to make its abilities somewhat more human-like, it's not an accomplishment because, you know, they didn't succeed fully, so they haven't proven that a bot can totally destroy all humans every game under perfectly human-like conditions. Until they do that, they are just a bunch of over-hyped sensationalists, right?
On January 29 2019 08:48 Polypoetes wrote: You would think it would be obvious, right? Well, apparently not to some.
Maybe Blizzard can use this NN in the future to make sure the game doesn't require too much micro.
As for AlphaZero in chess, yeah, DeepMind ran a weak version of Stockfish against AlphaZero. Just like they picked TLO of all people to play vs AlphaStar. And they blamed Blizzard for that, lol. Also, why would you even ask Blizzard? Isn't TLO a Supreme Commander player that retired years ago? And if you go through the Stockfish games, many games have moves that no Stockfish engine would ever make. It is as if they manually entered a bad move, or the way they ran Stockfish resulted in it blundering. And if you look on YouTube, there are even people who claim that for some games they released, they switched around the names, and the game won by AlphaZero was actually won by Stockfish.
They picked TLO because he is great for PR. They also picked him because he actually is a legit pro and plays well enough to test the waters. They then went on to Protoss pro MaNa to see how good AlphaStar actually can be. So what's the problem?
TLO played SupCom ages ago and deliberately switched to Brood War in order to prepare for SC2, which he knew would be a real esports opportunity.
On January 27 2019 07:19 Xitah wrote: Not impressed. Why?
Let's compare to chess: when AlphaZero beat Stockfish, a lot of chess analysts thought AlphaZero played beautifully and very differently from the typical brute-force AI. YouTube is full of analyses of those games because GMs truly find them beautiful and interesting.
Here it looks like the AI can get away with suboptimal, even bad, decisions just because it can simultaneously click on 100+ units anywhere on the map.
This feels like giving an average Joe an aimbot in an FPS game.
That's a superficial and wrong analysis. You're missing many things the AI did right that aren't related to micro.
For example, the AI had a seemingly perfect understanding of, or ability to predict, the outcome of a battle based on army sizes and unit composition. This shouldn't really come as a surprise, since it's something you would expect an AI to do really well after 200 years of accumulated playing experience, but it's really an important point. Even if you reduce the micro abilities of the AI even further, humans can't hope to ever be as good at judging the outcome of an engagement as an AI. You often see this reflected in commentators in professional matches: when a medium-sized to large battle ensues, they can't tell who is going to win it until the end.
And you also saw this in these games: there were several situations where TLO/MaNa misjudged the winning chances of an engagement and got crushed, both on defense and offense. The AI understood the army compositions better and knew when to move in or when it had enough defense, whereas the human player made a sub-optimal judgement.
Trying to reduce the outcome of this match to simply being about superior micro is plain ignorant.
Good point. Just to add to this a bit: while it is certainly true that medium- and large-scale battles are visually hard to follow and difficult to "score" as the fast action unfurls for casters, another major reason they often won't know who's going to win a fight is because the unit control from each player is also highly variable. Some players have different unit targeting priority tendencies and that can influence outcomes, some are better at positioning, some can more effectively exploit their favorite units, some click faster, some click more accurately, some react faster, and so on.
AlphaStar chooses to take an engagement against an enemy force knowing that it will have near-100% unit efficiency and assuming that the enemy will have the same (since it has prior experience playing against itself). Human play can only introduce mistakes, resulting in relative inefficiency. There was only one moment that I saw where AlphaStar made a critical blunder in a skirmish, and that was in the Blink Stalker game (Game 4 vs MaNa) where it blinked forward and lost like 7 Stalkers in exchange for 1 Immortal. By the odds, if AlphaStar pushes forward against you, it's doing so with extremely high confidence backed up by centuries of experience.
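For readers who want a feel for how mechanical this kind of engagement math can be, here's a toy sketch using Lanchester's square law, a classic aggregate combat model. AlphaStar's learned value estimate is of course nothing this simple, and the unit "effectiveness" numbers below are invented:

```python
# A crude stand-in for what a learned value function does implicitly:
# Lanchester's square law says an army's aggregate strength scales with
# (unit count squared) x (per-unit effectiveness). This is a toy
# heuristic, not AlphaStar's actual model.

def lanchester_strength(count, effectiveness):
    """Aggregate fighting strength under Lanchester's square law."""
    return effectiveness * count ** 2

def predict_winner(army_a, army_b):
    """Each army is a list of (count, per-unit effectiveness) groups."""
    s_a = sum(lanchester_strength(c, e) for c, e in army_a)
    s_b = sum(lanchester_strength(c, e) for c, e in army_b)
    return "A" if s_a > s_b else "B"

# 20 Stalkers vs 12 Immortals, with made-up effectiveness values:
print(predict_winner([(20, 1.0)], [(12, 2.2)]))  # "A": numbers beat quality here
```

Even a formula this crude captures the counterintuitive weight of unit count, which is one reason a system with centuries of self-play data can out-judge a human eyeballing two armies.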
It is logical to first teach the AI to play the "pure" game without any kind of camera-based restrictions, and then introduce the camera on top and make it adapt. It is not entirely clear if that was the case, or if the camera-restricted AI learned everything from scratch. Anyway, DeepMind should go further than a bare camera limitation and introduce a virtual keyboard and mouse with *physical properties*. If they teach their AI to perform as well under those physical restrictions, no one would complain.
ITT a lot of people who don't see nor understand how impressive this is + human feelings hurt.
I think the main problem with the mainstream audience is that their expectations for the first iteration are off the charts + they have little understanding of the concept of "AI", which is an unfortunate term used mainly by movies and then by the media to draw attention.
Anyway, great work by the DeepMind team. Truly impressive to see this development being done in such a short amount of time. I honestly wasn't expecting this so soon.
Here's hoping that the AI will participate in GSL in 2 years. AlphaStar fighting!!
On January 27 2019 07:19 Xitah wrote: Not impressed. Why?
Let's compare to chess: when AlphaZero beat Stockfish, a lot of chess analysts thought AlphaZero played beautifully and very differently from the typical brute-force AI. YouTube is full of analyses of those games because GMs truly find them beautiful and interesting.
Here it looks like the AI can get away with suboptimal, even bad, decisions just because it can simultaneously click on 100+ units anywhere on the map.
This feels like giving an average Joe an aimbot in an FPS game.
That's a superficial and wrong analysis. You're missing many things the AI did right that aren't related to micro.
For example, the AI had a seemingly perfect understanding of, or ability to predict, the outcome of a battle based on army sizes and unit composition. This shouldn't really come as a surprise, since it's something you would expect an AI to do really well after 200 years of accumulated playing experience, but it's really an important point. Even if you reduce the micro abilities of the AI even further, humans can't hope to ever be as good at judging the outcome of an engagement as an AI. You often see this reflected in commentators in professional matches: when a medium-sized to large battle ensues, they can't tell who is going to win it until the end.
And you also saw this in these games: there were several situations where TLO/MaNa misjudged the winning chances of an engagement and got crushed, both on defense and offense. The AI understood the army compositions better and knew when to move in or when it had enough defense, whereas the human player made a sub-optimal judgement.
Trying to reduce the outcome of this match to simply being about superior micro is plain ignorant.
Good point. Just to add to this a bit: while it is certainly true that medium- and large-scale battles are visually hard to follow and difficult to "score" as the fast action unfurls for casters, another major reason they often won't know who's going to win a fight is because the unit control from each player is also highly variable. Some players have different unit targeting priority tendencies and that can influence outcomes, some are better at positioning, some can more effectively exploit their favorite units, some click faster, some click more accurately, some react faster, and so on.
AlphaStar chooses to take an engagement against an enemy force knowing that it will have near-100% unit efficiency and assuming that the enemy will have the same (since it has prior experience playing against itself). Human play can only introduce mistakes, resulting in relative inefficiency. There was only one moment that I saw where AlphaStar made a critical blunder in a skirmish, and that was in the Blink Stalker game (Game 4 vs MaNa) where it blinked forward and lost like 7 Stalkers in exchange for 1 Immortal. By the odds, if AlphaStar pushes forward against you, it's doing so with extremely high confidence backed up by centuries of experience.
Even MaNa himself said, in the YouTube video in the other thread, that the AI just played without fear, and that's what made the big difference. E.g. the Stalker rush: AlphaStar just went up and killed MaNa because it either knew MaNa didn't have the Sentry to Force Field the ramp, or it knew the fight would be beneficial to it even with the Sentry! What human player would do that without Blink? Honestly, who? Not even sOs or Has are that crazy.
Video (in case this was the thread I saw it in, I'm sorry for reposting)
Some time marks: Stalker rush - 0h20m; Immortal rush - 1h01m; at 1h03m MaNa talks about how ballsy AlphaStar is going up the ramp.
I don't remember the time mark where MaNa talks about the agents not having any fear. Also, MaNa explains how the Immortal drop did NOT win him the show match. And no offense to anyone in this thread, but I will believe MaNa more than you. Edit> This is not against you, Excalibur_Z; it just happens I replied to your post. The end is my general rant.
I think AlphaStar was better in some areas and worse in others compared to other bots.
The best part was the Stalker battle engagement evaluation.
Most of the "reactions" it made were adapted to a bizarro bot game universe, with bots overfitted to that environment.
There were also the hilarious total failures, especially obvious in game 4 and the exhibition match.
- The failure to build a new Robo or cannons after losing it to 3 DTs and getting only one Observer out.
- The attacking of its own Forge.
- The hilarious ending of the exhibition match.
- The total lack of Force Field understanding.
If each agent played many matches, and especially longer matches, this would have become much more obvious I think.
There is also the complete lack of scouting, probably because scouting is a cost in the short term and only pays off in the long term, and only if you react to it. This local minimum is probably very hard to escape. I am not sure how they plan to deal with this problem in their learning.
Even with the insane micro, I think MaNa goes 100-0 after training against it a bit. He just needed to unlearn engaging in razor's-edge micro battles.
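The scouting-as-local-minimum point above can be illustrated with a toy discounted-return calculation (all reward numbers invented): to a short-sighted learner, scouting looks like a pure cost, because the payoff arrives many steps later.

```python
# Toy sketch: scouting costs resources now and only pays off later, so
# with a low discount factor it looks strictly worse than doing nothing.
# That is one way a learner can get stuck never scouting.

def discounted_return(rewards, gamma):
    """Sum of rewards discounted by gamma per time step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

no_scout = [0, 0, 0, 0, 0]   # no cost, but no information either
scout    = [-1, 0, 0, 0, 5]  # small cost now, payoff later if you react to it

for gamma in (0.5, 0.99):
    better = discounted_return(scout, gamma) > discounted_return(no_scout, gamma)
    print(f"gamma={gamma}: scouting looks better: {better}")
```

With gamma = 0.5 scouting evaluates as a loss; with gamma = 0.99 the delayed payoff dominates. Escaping the minimum requires the learner to value rewards that far downstream.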
On January 27 2019 07:19 Xitah wrote: Not impressed. Why?
Let's compare to chess: when AlphaZero beat Stockfish, a lot of chess analysts thought AlphaZero played beautifully and very differently from the typical brute-force AI. YouTube is full of analyses of those games because GMs truly find them beautiful and interesting.
Here it looks like the AI can get away with suboptimal, even bad, decisions just because it can simultaneously click on 100+ units anywhere on the map.
This feels like giving an average Joe an aimbot in an FPS game.
That's a superficial and wrong analysis. You're missing many things the AI did right that aren't related to micro.
For example, the AI had a seemingly perfect understanding of, or ability to predict, the outcome of a battle based on army sizes and unit composition. This shouldn't really come as a surprise, since it's something you would expect an AI to do really well after 200 years of accumulated playing experience, but it's really an important point. Even if you reduce the micro abilities of the AI even further, humans can't hope to ever be as good at judging the outcome of an engagement as an AI. You often see this reflected in commentators in professional matches: when a medium-sized to large battle ensues, they can't tell who is going to win it until the end.
And you also saw this in these games: there were several situations where TLO/MaNa misjudged the winning chances of an engagement and got crushed, both on defense and offense. The AI understood the army compositions better and knew when to move in or when it had enough defense, whereas the human player made a sub-optimal judgement.
Trying to reduce the outcome of this match to simply being about superior micro is plain ignorant.
Good point. Just to add to this a bit: while it is certainly true that medium- and large-scale battles are visually hard to follow and difficult to "score" as the fast action unfurls for casters, another major reason they often won't know who's going to win a fight is because the unit control from each player is also highly variable. Some players have different unit targeting priority tendencies and that can influence outcomes, some are better at positioning, some can more effectively exploit their favorite units, some click faster, some click more accurately, some react faster, and so on.
AlphaStar chooses to take an engagement against an enemy force knowing that it will have near-100% unit efficiency and assuming that the enemy will have the same (since it has prior experience playing against itself). Human play can only introduce mistakes, resulting in relative inefficiency. There was only one moment that I saw where AlphaStar made a critical blunder in a skirmish, and that was in the Blink Stalker game (Game 4 vs MaNa) where it blinked forward and lost like 7 Stalkers in exchange for 1 Immortal. By the odds, if AlphaStar pushes forward against you, it's doing so with extremely high confidence backed up by centuries of experience.
I would strongly disagree with Athinira's claim. There is no evidence that AlphaStar perfectly predicted the outcomes of those battles. At best, it thought it had the upper hand in those engagements under the assumption that its opponent was equally mechanically capable. MaNa obviously is not, which is what made those engagements so decisive. To verify how reliable its assessment of the engagements is, you'd have to analyze the AlphaStar league games.
On January 29 2019 18:36 papaz wrote: ITT a lot of people who don't see nor understand how impressive this is + human feelings hurt.
I think the main problem with the mainstream audience is that their expectations for the first iteration are off the charts + they have little understanding of the concept of "AI", which is an unfortunate term used mainly by movies and then by the media to draw attention.
Anyway, great work by the DeepMind team. Truly impressive to see this development being done in such a short amount of time. I honestly wasn't expecting this so soon.
Here's hoping that the AI will participate in GSL in 2 years. AlphaStar fighting!!
I am so bored of this response. Anyone who doesn't immediately embrace machine learning or computer assistance is immediately scared of the future, or as you put it "human feelings hurt".
In reality, it is those of us who can think critically who look at the parameters of this program/show match and realise it isn't as impressive as the title makes it out to be.
No one here will underplay or doubt the importance of "AI" in the modern world, but this is a PR stunt that tries to overplay the result.
On January 29 2019 18:36 papaz wrote: ITT a lot of people who don't see nor understand how impressive this is + human feelings hurt.
I think the main problem with the mainstream audience is that their expectations for the first iteration are off the charts + they have little understanding of the concept of "AI", which is an unfortunate term used mainly by movies and then by the media to draw attention.
Anyway, great work by the DeepMind team. Truly impressive to see this development being done in such a short amount of time. I honestly wasn't expecting this so soon.
Here's hoping that the AI will participate in GSL in 2 years. AlphaStar fighting!!
I am so bored of this response. Anyone who doesn't immediately embrace machine learning or computer assistance is immediately scared of the future, or as you put it "human feelings hurt".
In reality, it is those of us who can think critically who look at the parameters of this program/show match and realise it isn't as impressive as the title makes it out to be.
No one here will underplay or doubt the importance of "AI" in the modern world, but this is a PR stunt that tries to overplay the result.
Plus, quite a lot of researchers in machine learning/AI are a bit pissed off by DeepMind's dishonesty/PR stunts, which could be hurting the field, so acting like everyone expressing criticism is doing so because they don't understand AI is a bit laughable.
On January 29 2019 18:36 papaz wrote: ITT a lot of people who don't see nor understand how impressive this is + human feelings hurt.
I think the main problem with the mainstream audience is that their expectations for the first iteration are off the charts + they have little understanding of the concept of "AI", which is an unfortunate term used mainly by movies and then by the media to draw attention.
Anyway, great work by the DeepMind team. Truly impressive to see this development being done in such a short amount of time. I honestly wasn't expecting this so soon.
Here's hoping that the AI will participate in GSL in 2 years. AlphaStar fighting!!
I am so bored of this response. Anyone who doesn't immediately embrace machine learning or computer assistance is immediately scared of the future, or as you put it "human feelings hurt".
In reality, it is those of us who can think critically who look at the parameters of this program/show match and realise it isn't as impressive as the title makes it out to be.
No one here will underplay or doubt the importance of "AI" in the modern world, but this is a PR stunt that tries to overplay the result.
Plus, quite a lot of researchers in machine learning/AI are a bit pissed off by DeepMind's dishonesty/PR stunts, which could be hurting the field, so acting like everyone expressing criticism is doing so because they don't understand AI is a bit laughable.
Well, the issue is that some responses qualify for that response.
Will there be more games of AlphaStar playing vs pros? I think I heard some of the programmers saying that now they are going back to their general research...
I guess it would make sense for their image. Better to leave with a 10-1 than to get crushed over and over again once players find and abuse the obvious weaknesses AlphaStar has (Warp Prism harass, not giving respect to Sentries, just for example).
On January 25 2019 06:48 ArtyK wrote: I'd like to know EPM and not worthless APM for this, considering the AI was capable of microing Stalkers in 3 different places at once and winning vs mass Immortals.
On January 29 2019 23:45 Subflow wrote: Will there be more games of AlphaStar playing vs pros? I think I heard some of the programmers saying that now they are going back to their general research...
I guess it would make sense for their image. Better to leave with a 10-1 than to get crushed over and over again once players find and abuse the obvious weaknesses AlphaStar has (Warp Prism harass, not giving respect to Sentries, just for example).
Doubt it would get crushed; it will only get better over time. I think they said it had been training for around 1 week before the TLO match, and then another few days before the MaNa match? That's not a very long time, and it was markedly better in MaNa's matches. I don't doubt that if they keep working on it and let it practice, it will eventually be as unbeatable as computers are in other games.
On January 29 2019 23:45 Subflow wrote: Will there be more games of AlphaStar playing vs pros? I think I heard some of the programmers saying that now they are going back to their general research...
I guess it would make sense for their image. Better to leave with a 10-1 than to get crushed over and over again once players find and abuse the obvious weaknesses AlphaStar has (Warp Prism harass, not giving respect to Sentries, just for example).
Doubt it would get crushed; it will only get better over time. I think they said it had been training for around 1 week before the TLO match, and then another few days before the MaNa match? That's not a very long time, and it was markedly better in MaNa's matches. I don't doubt that if they keep working on it and let it practice, it will eventually be as unbeatable as computers are in other games.
That’s not really how it works. They can train it for months, but if they haven’t improved the underlying learning capabilities, it won’t matter. They might have stopped after one or two weeks because there was no additional improvement after that.
On January 29 2019 19:11 MadMod wrote: Even with the insane micro I think MaNa goes 100-0 after training against it a bit. He just needed to unlearn to engage in razors edge micro battles.
There are some huge holes in AlphaStar's game. Some can be teased out easily (like learning to defend Warp Prism drops), but others won't be so easy.
I hope Blizzard lets anyone play against AlphaStar via the SC2 client. That'd be awesome.
On January 25 2019 06:48 ArtyK wrote: I'd like to know EPM and not worthless APM for this, considering the AI was capable of microing Stalkers in 3 different places at once and winning vs mass Immortals.
I'm pretty sure apm = epm for the AI
It seems like this.
Someone on r/machinelearning made an interesting post pointing out that since the agents started off learning from human replays, they seem to have taken on the human trait of spamming and never quite been able to drop it. I haven't checked that it's right, but the post gave examples of times where you could clearly see spamming from A*. If true, this would mean that at best, epm==apm only some of the time. Are there spam clicks even while microing intensively? Is that part of why such high apm spikes are needed?
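On the APM-vs-EPM question, here's a hypothetical sketch of how a replay tool might discount spam when computing effective actions. The 0.25-second window and the "repeated identical command = spam" rule are my assumptions for illustration, not how Blizzard or DeepMind actually count:

```python
# Hypothetical EPM-style filter: treat an action as spam if the same
# command repeats within a short window, roughly how replay tools
# discount spam clicks. Window size and rule are assumptions.

def effective_actions(log, window=0.25):
    """log: list of (timestamp_seconds, action_id); returns filtered list."""
    kept = []
    last_seen = {}
    for t, action in log:
        if action in last_seen and t - last_seen[action] < window:
            last_seen[action] = t   # repeated too quickly: count as spam
            continue
        last_seen[action] = t
        kept.append((t, action))
    return kept

log = [(0.00, "move"), (0.05, "move"), (0.10, "move"),  # spam-clicked move
       (0.50, "build_pylon"), (1.00, "move")]
print(len(effective_actions(log)))  # 3 effective actions out of 5
```

If learned spam survives in AlphaStar's policy, a filter like this would show its EPM sitting noticeably below its raw APM during exactly the stretches where the APM spikes look most superhuman.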
Does anyone know what game speed AlphaStar is playing at during its internal games? Do I remember correctly that they mentioned 200 years of experience in a week? Was it combined playtime across all agents?
What I'm wondering is whether they could make an evolutionary algorithm that is trained to reconstruct a replay from one player's perspective. It's very different from simply teaching it to win. Such an approach would teach it how to model the state of the game from incomplete information. The main problem would be quantifying how faithful the reconstruction of a replay is.
Then they could turn it into a module and incorporate it into AlphaStar, and make it model the game it is currently playing in real time (assuming it can simulate numerous games of SC2 that quickly). It could come up with realistic scenarios explaining what the AI already knows about the opponent. It could create working hypotheses regarding what has been happening behind the fog of war, and perhaps even verify them via scouting.
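That "working hypotheses behind the fog of war" idea is essentially a belief state updated by Bayes' rule. A toy sketch, with strategy names and likelihood numbers invented purely for illustration:

```python
# Toy belief over opponent strategies, updated with Bayes' rule when a
# scout reports an observation. All names and probabilities are made up.

# P(observation | strategy): chance a scout sees an early Robotics
# Facility under each hypothesized strategy.
LIKELIHOOD = {"blink_stalkers": 0.05, "immortal_push": 0.80, "dt_rush": 0.10}

def update_belief(prior, observed_robo):
    """Bayes update of P(strategy) given whether a Robo was scouted."""
    posterior = {}
    for strat, p in prior.items():
        like = LIKELIHOOD[strat] if observed_robo else 1.0 - LIKELIHOOD[strat]
        posterior[strat] = p * like
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

belief = {s: 1.0 / 3 for s in LIKELIHOOD}       # uniform prior
belief = update_belief(belief, observed_robo=True)
print(max(belief, key=belief.get))              # "immortal_push" now dominates
```

The replay-reconstruction module you describe would effectively be learning these likelihoods from data, and scouting becomes the act of choosing observations that most sharpen the posterior.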
On January 28 2019 21:15 Polypoetes wrote: But an AI doesn't get fatigued. Why would you hard-code in artificial fatigue so that the NN develops to avoid the effect of fatigue that it doesn't suffer from in the first place? Also, I don't think even for a human playing a Bo5, fatigue plays a big role. Unless you are jet-lagged or something. I assume you mean mental fatigue, which is hard to notice yourself. From my experience, humans have no obvious problems concentrating for 5x30 minutes.
I don't understand why you say that an AI is not useful unless it has all the flaws humans have.
I may have put it in a wrong way, but misclicks do happen a lot in real games, and the AI is not designed to have misclicks, so it's not really a fair battle to start with. I have actually talked with some developers on this program and will see if they will try to implement that in the next phases.
On January 30 2019 09:51 maybenexttime wrote: Does anyone know what game speed AlphaStar is playing at during its internal games? Do I remember correctly that they mentioned 200 years of experience in a week? Was it combined playtime across all agents?
Is what I'm proposing very far-fetched?
I don't know if I'm understanding you correctly, but you could imagine some sort of implementation where an AI has a belief about the opponent's units and economy, which it acts upon in a game and then verifies via watching the replay. I haven't read the paper they released yet, but from some comments I read I don't think it has these capabilities currently.
Also, I don't like spreading misinformation, but I /recall/ having heard that the figure of 200 years is the playtime of the agent which has played the longest time. The week of training probably also includes the initial stage of imitation learning from replays. Depending on how long this lasted, it would mean that if the agent playing vs TLO had 200 years of practice, then the one playing vs Mana, which trained for another week, would have at least 400 years of experience, but possibly much more.
But it might be best to read the paper. I mean, the ratio of a week to 200 years is like 1 : 10,000 , and I'm pretty sure you can't speed up SC2 that much even with good hardware and eliminating graphics. So a single agent has to be able to train in parallel with itself.
On January 29 2019 23:45 Subflow wrote: Will there be more games of AlphaStar playing vs pros? I think I heard some of the programmers saying that now they are going back to their general research...
I guess it would make sense for their image. Better to leave with a 10-1 than to get crushed over and over again once players find out and abuse the obvious weaknesses AlphaStar has (Warp Prism harass, not giving respect to Sentries, just for example).
Doubt it would get crushed; it will only get better over time. I think they said it had been training for around 1 week before the TLO match and then another few days before the MaNa match? That's not a very long time, and it was markedly better in MaNa's matches. I don't doubt that if they keep working on it and let it practice, it will eventually be as unbeatable as computers are in other games.
That’s not really how it works. They can train it for months, but if they haven’t improved the underlying learning capabilities, it won’t matter. They might have stopped after one or two weeks because there was no additional improvement after that.
I did say "keep working on it" though, in addition to letting it practice. Of course it's not "done" yet, or this wouldn't just be a PR demo.
But it might be best to read the paper. I mean, the ratio of a week to 200 years is like 1 : 10,000 , and I'm pretty sure you can't speed up SC2 that much even with good hardware and eliminating graphics. So a single agent has to be able to train in parallel with itself.
This is a good point. I'm not sure. It would mean that a game of SC2 of normally ~30 minutes would be played in 0.2 seconds. Even having the map and everything loaded into memory in advance, that seems *very* fast to simulate SC2 with 2 quite heavy RL algorithms making the decisions on both sides. On the other hand, they are running it on a rather powerful setup. 16 TPUs can run a pretty hefty NN in very little time. However, the SC2 engine itself is not easily parallelized, and it still needs to compute every unit's actions every step of the simulation.
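To make the arithmetic concrete, here is a quick back-of-envelope sketch of the ratios being discussed. The 8x per-instance speed is an arbitrary assumption, not anything DeepMind has stated:

```python
# Back-of-envelope check of the training-speed ratios discussed above.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600

def speedup(game_years, wall_clock_seconds):
    """How many seconds of in-game time elapse per second of wall-clock time."""
    return game_years * SECONDS_PER_YEAR / wall_clock_seconds

one_week = speedup(200, SECONDS_PER_WEEK)        # ~10,400x, matching the 1:10,000 estimate
two_weeks = speedup(200, 2 * SECONDS_PER_WEEK)   # ~5,200x for the full 14-day league

# If a single headless game instance runs at, say, 8x real time (assumed number),
# the remaining factor must come from parallel instances per agent:
per_instance_speed = 8
instances_needed = one_week / per_instance_speed  # ~1,300 concurrent games
```

So even under generous per-instance assumptions, an agent would need on the order of a thousand games running concurrently, which fits the "thousands of parallel instances" wording in DeepMind's blog post.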
I found it weird that MaNa had to play AlphaStar without any practice sessions, because the AI agent clearly had a playing style. Against human opponents MaNa is aware of player tendencies, so the matchup gains weight from mind games based on the meta developed between the players and the broader meta MaNa engages with online while laddering. Would it not make sense for MaNa and also AlphaStar to have played practice games before the big 5 matches?
I don't know if I'm understanding you correctly, but you could imagine some sort of implementation where an AI has a belief about the opponent's units and economy, which it acts upon in a game and then verifies via watching the replay. I haven't read the paper they released yet, but from some comments I read I don't think it has these capabilities currently.
Not exactly. The training stage of that module would take place before it would be used in actual games. It would involve trying to recreate replays having information from one player's perspective only. So it would use replays to verify its "predictions" regarding how the game unfolded, but only at the training stage. In the final implementation, where it'd be playing actual opponents (AI or human), the AI would model the game up to the current point in real time. It would rely on early scouting information to narrow down the number of game tree paths to consider - similar to how humans analyze the game. The scouting information would serve as the boundary conditions/anchors.
E.g. let's say the AI sees Nexus first, followed by two Gates and a bunch of Zealots. Firstly, it will reject game tree paths with proxy openings as very unlikely. Secondly, it would simulate various scenarios of how the opponent got there and choose those most probable. After early game it will have a certain belief, as you put it, as to how the game has progressed for both sides so far. This will narrow down the number of game tree paths for it to consider in mid game. The process would closely resemble what humans are currently doing, i.e. creating a mental image of the game.
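That mental-image process can be sketched as a simple Bayesian filter over candidate openings. Everything below is invented for illustration: the opening labels, the priors, and the likelihood numbers are all made up, and a real system would learn these from replays rather than hard-code them:

```python
# Toy sketch of "narrowing down game-tree paths": a belief over hypothetical
# opening labels, updated by Bayes' rule as scouting information arrives.
priors = {"proxy_gates": 0.15, "nexus_first": 0.35, "one_gate_expand": 0.35, "dt_rush": 0.15}

# Likelihood of each scouting observation under each opening (invented numbers).
likelihood = {
    "saw_nexus_first": {"proxy_gates": 0.01, "nexus_first": 0.9, "one_gate_expand": 0.2, "dt_rush": 0.1},
    "saw_two_gates_zealots": {"proxy_gates": 0.3, "nexus_first": 0.6, "one_gate_expand": 0.5, "dt_rush": 0.2},
}

def update(belief, observation):
    """One Bayes step: multiply by the observation likelihood and renormalize."""
    posterior = {k: p * likelihood[observation][k] for k, p in belief.items()}
    total = sum(posterior.values())
    return {k: p / total for k, p in posterior.items()}

belief = update(priors, "saw_nexus_first")
belief = update(belief, "saw_two_gates_zealots")
# Proxy openings now carry near-zero probability, as in the example above,
# so the simulation module could stop spending time on those paths.
```

The expensive part you describe — simulating plausible hidden histories consistent with this belief — would sit on top of a filter like this, which is essentially a particle filter over game states rather than over opening labels.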
The implementation I'm proposing would need to be able to simulate SC2 games in quasi-real time. Like you're saying, the ratio of 1 to 10,000 seems excessive. But is it simply a matter of having enough processing power? I'd have to check what sort of hardware they're using to train the AI and then to play against human opponents.
edit: @Acrofales
Would you actually need to parallelize SC2? By that, do you mean simply running one client in parallel with another or something else? Because doing this internally in SC2 could be difficult, but would running multiple clients be a problem? And, as Grumbels said, you'd have to do away with any sort of graphics.
It’s pretty obvious in hindsight that a single AlphaStar agent would be easily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that’s why DeepMind only let each agent play one time: to prevent any agent from being figured out.
On January 30 2019 22:13 Zzoram wrote: It’s pretty obvious in hindsight that a single AlphaStar agent would be easily abused and embarrassed if it had to play a full Bo5 series, since each agent probably sticks strongly to playing the same way with minimal in-game adjustment, and that’s why DeepMind only let each agent play one time: to prevent any agent from being figured out.
That is certainly true for the state AlphaStar is in right now. However, let's assume that they let the agents play Bo5 series against each other instead of Bo1 during the training stage. I think it is not unreasonable that agents would learn to deviate from their "default" playstyle if they continue losing. Thus, such agents might learn to adapt during a BoX series.
I may have put it in a wrong way, but misclicks do happen a lot in real games, and the AI is not designed to have misclicks, so it's not really a fair battle to start with. I have actually talked with some developers on this program and will see if they will try to implement that in the next phases.
Again, what does 'fair' really mean? Humans always blunder in chess. No one in the chess community has ever demanded that chess engines blunder on purpose for it to be 'fair' to claim that computers are better at chess than humans.
Yes, there is the excellent point made earlier about contempt settings. A chess engine doesn't estimate the strength of the opponent. Say you are playing chess and you can make two different moves. One move solidifies your tactical advantage. There is no clear win, but you get a good position with equal material. The other move presents your opponent with big tactical challenges. The opponent has 3 to 4 candidate moves and it is unclear to what positions they lead. You have to calculate deep, and every new position has several candidate moves. Incorrect play will lose you a piece. But you have seen all these positions already (because you are a strong engine) and you know the best moves will lead your opponent to win a pawn and have a slightly more active position.
Clearly the best move is to keep the position simple and keep the advantage. The other move would lose you your advantage, and you will lose a pawn as well. But if you know your opponent will never find the best move, you can win the game in the next few moves by playing the continuation you have seen is inferior.
Clearly, the same is true in Starcraft. There is no reason to play around dangerous micro of your enemy when you have identified it is not capable of this.
We don't know if AlphaStar has some special properties to its NN. It probably has, but not necessarily so. But an ordinary neural network is deterministic. You put in a matrix of data, and the weights and biases give as output a new matrix of data that its training has taught it belongs to that input. So given exactly the same input, it will do the exact same thing. But there might be many stochastic effects that are not relevant but that do lead to the AI doing different things. So an AI might go DTs or not based on something as irrelevant as building placement.
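The determinism point is easy to demonstrate on a toy network. This is a made-up two-layer net with hand-picked weights, nothing to do with AlphaStar's actual architecture: with fixed weights, the forward pass is a pure function, so any behavioral variation must come from differences in the input.

```python
# A fixed-weight network is a pure function: identical inputs, identical outputs.
def relu(xs):
    return [max(0.0, v) for v in xs]

def forward(x, w1, w2):
    """Tiny 2-layer forward pass: linear -> ReLU -> linear."""
    hidden = relu([sum(a * b for a, b in zip(x, row)) for row in w1])
    return [sum(a * b for a, b in zip(hidden, row)) for row in w2]

w1 = [[0.5, -0.2], [0.1, 0.9]]   # arbitrary illustrative weights
w2 = [[1.0, -1.0]]

# Same input twice -> exactly the same output, bit for bit.
assert forward([1.0, 2.0], w1, w2) == forward([1.0, 2.0], w1, w2)
```

So if the agent goes DTs in one game and not another, the cause has to be some difference in what it observed (building placement, timings), or explicit randomness injected somewhere in the pipeline.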
We also don't know if in the internal league the agents knew and learned which opponent they were up against. If they don't know, and you let them alternately play thousands of games against two different agents, an agent will use the adaptations it made against A also against B, and vice versa.
But if you let the same agent play against two static opponent agents, and part of the input is which agent it is, the NN has the potential to overfit to exploit opponents A and B independently. Similarly, you can select or train AIs against the ability of other agents to find and exploit their weaknesses. You can take an agent you want to improve. You keep it static while you evolve several different agents to exploit it. Then you train your agent of interest against these overfitted exploiter NNs. In parallel, you also need your NN to maintain its original winrate against normal NNs.
This will discard decision paths and micro tricks that have the potential to be exploited.
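The exploit-then-harden loop described above can be sketched as an orchestration skeleton. All the training and evaluation functions here are trivial stand-ins (a real version would run actual RL), and the names and numbers are invented:

```python
# Skeleton of the exploiter-training scheme: freeze the agent of interest,
# train fresh exploiters against it, then train the agent against those
# exploiters while checking it doesn't regress against the normal league.
import random

def train_exploiter(frozen_agent):
    # Stand-in: a real version would run RL against the frozen agent.
    return {"target": frozen_agent["name"], "seed": random.random()}

def train_against(agent, opponents):
    # Stand-in: pretend one training round hardens the agent a bit per opponent.
    return {**agent, "robustness": agent["robustness"] + len(opponents)}

def league_winrate(agent):
    # Stand-in: map robustness to a winrate, capped at 1.0.
    return min(1.0, 0.5 + 0.05 * agent["robustness"])

agent = {"name": "agent_A", "robustness": 0}
for _ in range(3):                              # a few exploit/harden cycles
    exploiters = [train_exploiter(agent) for _ in range(4)]
    candidate = train_against(agent, exploiters)
    if league_winrate(candidate) >= league_winrate(agent):  # don't regress vs the league
        agent = candidate
```

The guard on the league winrate is the important design choice: it is what discards adaptations that beat the exploiters but weaken the agent against ordinary opponents.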
You need special tricks for an AI to do mindgames. You could write a NN that does nothing but select the right agent for the right game. You have access to a bunch of replays of human players you play against. You match the patterns you see in these games with what this NN knows about the strengths and weaknesses of your agents. Then you select the agent most likely to beat the human.
As for running SC2 during training, you can run games in parallel. If you have thousands of games to simulate, there is no need to run one game on more than 1 core. Just put one game in one thread. Also, you do not need to render graphics. You just need to know the outcome or generate a replay. I don't know how SC2 was written, but in the ideal case this cuts down computation by a lot, as most of the power required is needed to render the graphics. I don't think SC2 has any physics engine that actually affects the game outcome, right? So it just needs to keep track of where each unit is, where it is oriented, what command it was given, etc.
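The one-game-per-thread idea looks roughly like this. The "game" here is just a stub that returns a winner from a seed; a real setup would launch headless SC2 client processes instead of Python threads, but the fan-out/collect pattern is the same:

```python
# Sketch of running many independent headless games in parallel workers.
import random
from multiprocessing.dummy import Pool  # thread pool: one stub game per thread

def play_headless_game(seed):
    """Stand-in for a graphics-free SC2 game: deterministic given its seed."""
    rng = random.Random(seed)
    return "agent_A" if rng.random() < 0.5 else "agent_B"

with Pool(processes=4) as pool:            # one worker per core
    results = pool.map(play_headless_game, range(100))
wins_a = results.count("agent_A")          # aggregate outcomes for training
```

Since the stub games are independent and share nothing, this scales out trivially; the real bottleneck would be the per-step simulation cost of the SC2 engine itself, which no amount of parallelism reduces for a single game.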
Again, what does 'fair' really mean? Humans always blunder in chess. No one in the chess community has ever demanded that chess engines blunder on purpose for it to be 'fair' to claim that computers are better at chess than humans.
While I don't think this question is necessarily very interesting for Deepmind, there is a market for chess engines that can simulate human players of arbitrary skill. Think about how "ladder anxiety" is a real phenomenon and how beneficial it would be for casual players to be able to face off against engines that play human-like and are capable of learning, while being able to dynamically lower their Elo to be just under yours. If there was an inexpensive method of achieving this, that would have meaningful economic value for the gaming industry (about 180 billion dollars in revenue yearly). Bots aren't capable of this, they can be exploited too easily and they don't play like humans.
That is certainly true for the state AlphaStar is in right now. However, let's assume that they let the agents play Bo5 series against each other instead of Bo1 during the training stage. I think it is not unreasonable that agents would learn to deviate from their "default" playstyle if they continue losing. Thus, such agents might learn to adapt during a BoX series.
I don't think this is even a fair qualifier. Each agent right now specifies a specific, heavily optimized build order with assorted micro, contingency plans for when things go wrong, etc. etc. etc. While I agree the more interesting route from an AI point of view is to see whether adaptive play can be learned in this way (although you'll probably need some way to "model" your opponent), for the sake of competition, they might just as well have said it was all a single agent that had learned 5 different strategies, and it would use any one of them. Internally, the "single agent" selects one of the ASL agents at random and loads that. In order to preempt exploitation of a "single strategy" bot, having some rng is very useful.
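The "single agent that randomly loads one of its strategies" idea is simple to sketch. The league mixture below is invented for illustration (DeepMind describes the real final agent as the Nash distribution over its league), and the strategy names are made up:

```python
# Minimal sketch of a meta-agent that samples one strategy-agent per game
# from a weighted mixture, so opponents can't prepare for a single fixed style.
import random

league = {"mass_stalkers": 0.40, "disruptor_heavy": 0.25, "blink_all_in": 0.35}

def pick_agent(mixture, rng=None):
    """Sample one underlying agent according to the mixture weights."""
    rng = rng or random.Random()
    agents = list(mixture)
    weights = list(mixture.values())
    return rng.choices(agents, weights=weights, k=1)[0]

chosen = pick_agent(league, rng=random.Random(42))  # seeded here for reproducibility
```

From the opponent's perspective this per-game randomization is exactly why a Bo5 against "AlphaStar" is not the same test as a Bo5 against one agent.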
I don't think this is even a fair qualifier. Each agent right now specifies a specific, heavily optimized build order with assorted micro, contingency plans for when things go wrong, etc. etc. etc. While I agree the more interesting route from an AI point of view is to see whether adaptive play can be learned in this way (although you'll probably need some way to "model" your opponent), for the sake of competition, they might just as well have said it was all a single agent that had learned 5 different strategies, and it would use any one of them. Internally, the "single agent" selects one of the ASL agents at random and loads that. In order to preempt exploitation of a "single strategy" bot, having some rng is very useful.
It's not really a highly specific build order though. It's not chess, where you can specify an opening sequence, because in Starcraft II the exact sequence of what-actions-to-execute-when differs literally every game due to stochastic effects and opponent interaction. I don't think it's that easy to have the agent choose randomly from a catalogue of openings, that strikes me as an AI challenge in itself.
This is in the context of the following information.
In order to train AlphaStar, we built a highly scalable distributed training setup using Google's v3 TPUs that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. The final AlphaStar agent consists of the components of the Nash distribution of the league - in other words, the most effective mixture of strategies that have been discovered - that run on a single desktop GPU.
The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.
They also used a seriously flawed APM benchmark, and didn't seem bothered by the fact that TLO's APM peaked way above what is physically possible. While their work is very interesting, there are many serious issues with how they present it. :<
If what that quote claims is correct, reconstructing the game in quasi-real time seems very possible.
I don't know how they can place MaNa and TLO on that graph. Maybe this is their Blizzard ladder MMR? Anyway, you cannot compare MMRs of two distinct populations. And furthermore, MMR doesn't take into account that there is a rock-paper-scissors effect where certain styles counter others. This is clearly so in SC in general, and DeepMind has made a point of it several times that their best agent is consistently beaten by an agent that isn't so highly rated. And one reason for them to have this match is to see how play strength in their agent league translates to play strength in the human realm.
So I guess this chart refers to the MMR of the agents inside their agent league. It means that the agent with the window restriction was able to be quite strong compared to the non-restricted agents. But their window-restricted agent bugged out, for whatever reason, so it lost. That may be related to it having to use a window, or just less training, MaNa adapting and trying harder to find an exploit, or luck.
BTW, 'only seven days' means nothing. If you run the training session on 7 times more TPUs, it takes 1 day.
They likely don't have their best technical people work on all this stuff that is represented on their site. TLO held down buttons, resulting in insane APM. And their agent peaks in APM at crucial micro moments. So this whole APM thing is nonsense. I don't even know why they bother, actually.
The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.
Their estimated MMR is probably based on their AlphaStar league but with the same MMR calculations that Blizzard uses; however, they compare it to MaNa's real MMR. If they were to let the agents play a lot of games on the ladder in real time, the MMR of the agents would probably be different from their estimated MMR (especially if it has flaws for everyone to abuse ^_^).
Plus the fact that their APM is not calculated the same way as Blizzard's (Blizzard counts 2 for building something, for example, because you need to press 2 keys, and 0 for camera movement from the player, but AlphaStar counts 1 for everything) and their shady APM graphics (TLO's spam-inflated curve dwarfs their agent's APM graph, and they didn't specify that this was because of rapid-fire and essentially useless spam from TLO) make these MMR comparisons even more pointless.
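The counting-convention point is easy to show with a toy example. The action stream and weights below are illustrative only, loosely following the description above (hotkey builds counted double, camera moves counted zero under the Blizzard-style convention):

```python
# The same action stream counted under two APM conventions disagrees,
# so cross-convention APM comparisons are apples to oranges.
actions = ["move_camera", "move_camera", "select", "build",
           "attack", "move_camera", "select"]

BLIZZARD_WEIGHT = {"move_camera": 0, "build": 2}   # everything else counts 1

def apm(stream, minutes, weights=None):
    """Actions per minute under a given per-action weighting."""
    weights = weights or {}
    return sum(weights.get(a, 1) for a in stream) / minutes

uniform = apm(actions, minutes=0.5)                        # count everything once
blizzard_style = apm(actions, minutes=0.5, weights=BLIZZARD_WEIGHT)
```

Here the same half-minute of play yields 14 APM under one convention and 10 under the other, before even getting into spam clicks or rapid-fire hotkeys.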
On January 31 2019 20:31 Polypoetes wrote: BTW, 'only seven days' means nothing. If you run the training session on 7 times more TPU's, it takes 1 day.
I highlighted 'seven days' to contrast with the fourteen days for the agents that beat MaNa. I think it's reasonable, based on their play, that these five agents are actually incredibly strong and nigh undefeatable with conventional play. But not the agent that trained for half the time with a camera restriction and was rather straightforwardly defeated by MaNa. And not the agents that played amateurishly against TLO; those might have been pretty good, but clearly not top-tier level.
The implication seems to be that after training only for seven days it was already stronger than Mana, even with the camera restriction. But that's clearly ridiculous, that agent wasn't that strong. It also implies that the agents that defeated TLO are much stronger than Mana, but that also seems dubious.
Yeah, this is the quote they added underneath the image that offended everyone.
CLARIFICATION (29/01/19): TLO’s APM appears higher than both AlphaStar and MaNa because of his use of rapid-fire hot-keys and use of the “remove and add to control group” key bindings. Also note that AlphaStar's effective APM bursts are sometimes higher than both players.
And this was there before:
In its games against TLO and MaNa, AlphaStar had an average APM of around 280, significantly lower than the professional players, although its actions may be more precise.
So that seems pretty fair, but then if you look at the conclusion, it says:
"These results suggest that AlphaStar’s success against MaNa and TLO was in fact due to superior macro and micro-strategic decision-making, rather than superior click-rate, faster reaction times, or the raw interface."
Which is of course pretty ridiculous, since they just had to admit that the agents had a superior click-rate and when they changed the interface it lost in an embarrassing fashion.
But in my opinion it's not that interesting to continuously litigate this point, and it's a pity that Deepmind wasn't more careful in their presentation. If they had only been a bit more cautious they wouldn't have this level of blowback.
On January 31 2019 20:31 Polypoetes wrote: They likely don't have their best technical people work on all this stuff that is represented on their site. TLO held down buttons, resulting in insane APM. And their agent peaks in APM at crucial micro moments. So this whole APM thing is nonsense. I don't even know why they bother, actually.
They bother because their goal seems to be to design an AI that beats humans by outstrategizing them in a game of incomplete information. If they have an AI that is poor at decision making in an incomplete information environment but makes up for it with insane mechanics, that completely defeats the purpose.
I think you're very wrong in your thinking that they want to make an AI that is good at SC2 in general, regardless of what it excels in.
I think you are very wrong to think that making an AI good at SC2 is going to result in an AI that can trick and outmindgame humans. If you think so, you are delusional about what kind of game SC2 is. And the AI will actually show this.
I have asked this before and I still haven't really got an answer from anyone. But how did you think the AI would outsmart the human and win without outmicroing and battle decision making? How would that look? Maybe it is because of my experience and background, but I expected that the AI would just always seem 'lucky' and just win. Which is exactly what we saw in these games. People say the AI didn't scout for DTs and would have auto-lost vs DTs, for example. I don't think so. I think it knew. Maybe not the TLO one, but the one MaNa played against, pretty sure. Same with the Pylon built in TLO's base, and where the AI used all these Shield Batteries to keep one Immortal alive. I think it didn't bug out and place a Pylon there. I think it placed it there to be killed, so the opponent's Stalkers don't do something else like scout the choke. Same with the Stargate it built at the ramp, then cancelled as it was scouted, then rebuilt in the back of the base. Another obvious thing is the AI building more Probes while SC2 players think 2 per patch is enough because Blizzard put n/16 over each Nexus.
Do I think humans can beat these AIs by playing against the same identical AI over and over? Probably, so in that respect it is different from chess or go. But that doesn't really matter because you can generate many agents that are all stronger and different enough they cannot be exploited by the same thing.
The AI just knows what to value because it always weights every parameter there is to weight, and it has been trained enough to give each parameter a very good weight. So in chess AlphaZero always seems to know when to be material and when to be positional. Or find moves that work for all three possible positions. Humans are irrational and are simply not able to do this. They have their tendency and playstyle and that will weaken them. It doesn't look impressive when the AI decides to give up an expansion and keep their army alive, like in the carrier game with TLO. But it is a complex calculation which humans have a hard time evaluating.
I think this holds true in general: when an AI that cannot be beaten by humans makes decisions that seem to be mistakes, they are likely not mistakes. We saw this in Go, where top players said AlphaGo made mistakes and weak moves. You can only show them to be mistakes by exploiting them and winning the game. Of course this was especially relevant in Go because AlphaGo always knew which move to make to get a position that wins 52% of the time with 1 more point than your opponent, compared to winning only 51% of the time but with a larger point difference. It was able to ignore the margin by which it would win. So if this translates to StarCraft, then the AI rarely wins decisively, but as a result loses far less. Humans don't just want to win. They want to win convincingly. If you win but it feels like you never played any better, you aren't that satisfied.
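That AlphaGo point is one line of decision theory: maximize win probability and ignore the expected margin entirely. The move names and numbers below are invented to mirror the 52%-vs-51% example above:

```python
# Pick the move with the highest win probability, ignoring margin.
moves = {
    "safe_move":   {"win_prob": 0.52, "margin": 1},   # small win, more often
    "greedy_move": {"win_prob": 0.51, "margin": 15},  # big win, slightly less often
}

best = max(moves, key=lambda m: moves[m]["win_prob"])
```

A human evaluating the same two options tends to be pulled toward the larger margin, which is exactly the irrationality being described.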
I already said I agreed that in the future it will be interesting to make AIs that play weakly, like a human. But that is certainly not DeepMind's goal so far. They want to show they can 'solve' the problem of SC2 by winning against the best players.
To all the people who think the bot didn't out-strategize anyone: you will never be satisfied by the bot's ability to strategize, because the nature of SC2 is not what you think it is. For you to get what you want, you need a different game.
TLO said DeepMind told him not to make hallucinated units, because they confuse the AI (he mentioned this on The Pylon Show). Someone else said it doesn't understand burrowed units due to the Blizzard API, and apparently it sort of bugged out as Terran during training because it would lift its buildings in order not to lose, and so made no progress.
Another funny bug that TLO mentioned was that the AI learned to kinda crash the game in order to avoid losing during training.
Polypoetes, you make an awful lot of assumptions that don't quite bear out. Pro players generally just want to win, regardless of whether they win decisively. You get the same ladder points and tournament money no matter how much you think you won or lost a game by.
Overmaking workers past 16 is an opportunity cost, where you don't get your money back until 2.5 minutes after you queued the worker. It makes sense if you are planning to transfer workers or expect to lose them to harass. Zerg players, for example, notably do overdrone.
The point of DeepMind's PR stunt was not to show it can win against the best human player (MaNa and TLO aren't even close to the best) but to show that it could out-strategize humans. In practice it just outmuscled them with massive and accurate spikes of APM.
On February 01 2019 01:24 Dangermousecatdog wrote: Polypoetes, you make an awful lot of assumptions that don't quite bear out. Pro players generally just want to win, regardless of whether they win decisively. You get the same ladder points and tournament money no matter how much you think you won or lost a game by.
But this completely ignores what it means to be human and how humans actually learn. I am making assumptions? Your claim is literally that humans can objectively process their experiences, objectively form a goal, and rewire their brains to pursue it. That is not how humans learn.
Humans learn by reinforcement learning as well. But what is the reinforcement signal? You clicking on the screen, trying to kill the enemy army, and it either working or failing? Or you looking at the ladder points after the game?
Overmaking workers past 16 is an opportunity cost, where you don't get your money back until 2.5 minutes after you queued the worker. It makes sense if you are planning to transfer workers or expect to lose them to harass. Zerg players, for example, notably do overdrone.
What are you even trying to say?
The point of DeepMind's PR stunt was not to show it can win against the best human player (MaNa and TLO aren't even close to the best) but to show that it could out-strategize humans. In practice it just outmuscled them with massive and accurate spikes of APM.
No. StarCraft is a complex game with hidden information. They have an AI that can beat top players. They showed they have the AI architecture and techniques to solve this problem. The solving is in the winning, not in impressing SC2-playing college kids.
The reason they will play with APM limits is the well-known problem of finding minima in very high-dimensional phase spaces. There will be deep minima that cannot be found, because they are surrounded by regions where the agents play very badly. Think of strategies or playstyles that only work when very finely tuned and precisely executed, but that do really well once they are.
Clearly it finds an easy, good minimum using Blink Stalkers. The question is how good the minima for other unit compositions are. To cross the phase space from one minimum to another, it has to travel over a peak, and it needs an incentive to do so. You can do that by rewarding it for building certain units, which they already tried. But you can also do it with an APM limit, because the AI's potential APM clearly has a lot of synergy with Stalkers. So you put on an APM limit that changes the phase space and makes Stalkers much less optimal, and then find new minima where it uses strategies with other units. Once you are there, you remove the APM limit and either find that it stays in that minimum and deepens it further, or that it moves back to Stalkers.
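A toy sketch of that incentive argument, with invented win-rate curves (nothing here is measured data; it only shows how a cap can flip which strategy a hill-climbing learner settles on):

```python
# Hypothetical model: blink-stalker play is assumed to scale strongly
# with raw APM, while a disruptor-centric macro style is assumed to be
# mostly decision-driven. All coefficients are made up for illustration.
def win_rate(strategy: str, apm: float) -> float:
    """Invented win-rate curves, not measured from any agent."""
    if strategy == "blink_stalkers":
        return min(0.95, 0.30 + 0.0008 * apm)   # rewards mechanical speed
    else:  # "disruptor_macro"
        return min(0.80, 0.55 + 0.0002 * apm)   # weak APM dependence

def best_strategy(apm_cap: float) -> str:
    """The 'minimum' a greedy learner would settle into under a cap."""
    return max(["blink_stalkers", "disruptor_macro"],
               key=lambda s: win_rate(s, apm_cap))

print(best_strategy(1500))  # uncapped burst APM -> blink_stalkers
print(best_strategy(300))   # human-like cap     -> disruptor_macro
```

Under the cap, the Stalker minimum is shallower than the alternative, so the learner has a reason to climb out of it, which is the whole point of the argument above.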
Yes, an AlphaStar vs. AlphaStar match tells us nothing about how we as humans should play or enjoy SC. But that isn't important at all. Look at their Go and chess projects. They have done very little to 'help' those communities. They do their research into AI using games as the problem to solve. They aren't doing it to help the game in question. They aren't helping the chess community figure out if they should switch to Fischer Random, for example.
I really enjoyed seeing AlphaZero's chess games. I don't think I will enjoy watching AlphaZero play SC2 against itself. It says something about the game, not about the AI.
On January 31 2019 20:31 Polypoetes wrote: I don't know how they can place MaNa and TLO on that graph. Maybe this is their Blizzard ladder MMR? Anyway, you cannot compare the MMRs of two distinct populations. And furthermore, MMR doesn't take into account that there is a rock-paper-scissors effect where certain styles counter others. This is clearly so in SC in general, and DeepMind has made a point of it several times that their best agent is consistently beaten by an agent that isn't rated as highly. And one reason for them to have this match is to see how playing strength in their agent league translates to playing strength in the human realm.
I think what they probably did was draw estimates from a common data point: where the Elite AI ranks in the skill spectrum among humans. They claimed that the Elite AI is equivalent to "low Gold". Presumably they had a more specific internal MMR value from Blizzard, and were also told what MMR differences equate to what outcome probabilities (and probably the exact rating formula itself). DeepMind's primary benchmark was their supervised-learning agents, which could beat the Elite AI "95% of the time". That translates to a certain rating gap (let's say 500). As the agents train against each other in the AlphaStar League, they can measure win percentages against each other. So if Agent A wins 95% of the time against Agent B, and Agent B wins 95% of the time against Agent C, then you know that Agent A is about 1000 MMR higher than Agent C. Eventually you can plot out MMRs for each of the participating agents.
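Under the standard Elo/logistic model (an assumption here; Blizzard's exact MMR formula isn't public), that arithmetic looks like this:

```python
import math

# Standard Elo/logistic model: P(win) = 1 / (1 + 10^(-gap/400)).
# Inverting it gives the rating gap implied by a win probability.
# The 400-point scale is the chess convention, assumed here for MMR.
def rating_gap(win_prob: float) -> float:
    """Rating difference implied by a head-to-head win probability."""
    return 400 * math.log10(win_prob / (1 - win_prob))

gap = rating_gap(0.95)              # a 95% win rate implies ~511.5 points
# Chaining: A beats B 95%, B beats C 95% -> A sits ~2*gap above C,
# which is roughly the "1000 MMR" figure in the post above.
print(round(gap, 1), round(2 * gap, 1))
```

So "let's say 500" per 95% matchup is about right under this model, and stacking two such gaps gives the ~1000-point chain described above.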
You're of course completely correct that you can't compare MMRs of distinct populations. The environments and the metagames are just too different. We have long known that "6k" on NA is not quite the same as "6k" on EU, which is not quite the same as "6k" on KR, and the best benchmarks we can get are the few players who participate in every region simultaneously, but even then it won't be exact. So "6k" in the AlphaStar League might be TLO level, or it might be higher or lower than that. "7k" in the AlphaStar League might be MaNa level, or it might be higher or lower than that. I guess it sort of serves as an okay point of reference for illustration purposes, but it's by no means scientific fact. Since much of the DeepMind news is marketing-oriented and promotional with laypeople as a target audience, it's important to keep the information they present in perspective.
On January 31 2019 21:55 Polypoetes wrote: I think you are very wrong to think that making an AI good at SC2 is going to result in an AI that can trick and outmindgame humans. If you think so, you are delusional about what kind of game SC2 is. And the AI will actually show this.
I have asked this before and I still haven't really gotten an answer from anyone. But how did you think the AI would outsmart the human and win without out-microing and superior battle decision making? What would that look like?
I think I already answered. Having the AI model the game from the opponent's perspective and factor this inferred information into its decision making process would resemble intelligent behavior much more closely. This is what we call reading the game.
Maybe it is because of my experience and background, but I understand that the AI would just always seem 'lucky' and win. Which is exactly what we saw in these games. People say the AI didn't scout for DTs and would have auto-lost vs. DTs, for example. I don't think so. I think it knew. Maybe not the one TLO played, but the one MaNa played against, pretty sure. Same with the Pylon built in TLO's base, and where the AI used all those Shield Batteries to keep one Immortal alive. I don't think it bugged out and placed a Pylon there. I think it placed it there to be killed, so the opponent's Stalkers wouldn't do something else, like scout the choke. Same with the Stargate it built at the ramp, then cancelled as it was scouted, then rebuilt in the back of the base. Another obvious thing is the AI building more Probes while SC2 players think two per patch is enough because Blizzard put n/16 over each Nexus.
You are giving AlphaStar too much credit. Those could've been brilliant moves, but just as well they might've been blunders that were rendered irrelevant by the AI's far superior mechanics. There's no way of telling.
I think this holds true in general: when an AI cannot be beaten by humans, the decisions that seem to be mistakes are likely not mistakes. We saw this in Go, where top players said AlphaGo made mistakes and weak moves. You can only show them to be mistakes by exploiting them and winning. Of course this was especially relevant in Go, because AlphaGo would always prefer the move that wins 52% of the time by one point over the move that wins only 51% of the time by a larger margin. It was able to ignore the margin by which it would win. So if this translates to StarCraft, then the AI rarely wins decisively, but as a result loses far less. Humans don't just want to win. They want to win convincingly. If you win but it feels like you never played any better, you aren't that satisfied.
How are you drawing that conclusion? AlphaStar, by default, must've underestimated its chances in any engagement, since it mostly practiced against opponents that matched it in terms of mechanics. It could very well be the case that AlphaStar's predictions of how favorable an engagement will be are correct only roughly 50% of the time, but that MaNa's "subpar" execution made them appear correct most of the time.
I already said I agreed that in the future it will be interesting to make AIs that play weakly, like a human. But that is certainly not DeepMind's goal so far. They want to show they can 'solve' the problem of SC2 by winning against the best players.
If their goal was to make an AI that beats good SC2 players (while cheating, at that), then that's hardly an accomplishment. What would be their next goal, CS:GO bots with aimbot? I highly doubt that is their goal...
To all the people who think the bot didn't out-strategize anyone: you will never be satisfied by the bot's ability to strategize, because the nature of SC2 is not what you think it is. For you to get what you want, you need a different game.
No, we just need the AI to play on roughly equal terms mechanically, or even have it be mechanically weaker. Then there will be no way for it to compensate for its strategic shortcomings.
On February 01 2019 01:58 Polypoetes wrote: No. StarCraft is a complex game with hidden information. They have an AI that can beat top players. They showed they have the AI architecture and techniques to solve this problem. The solving is in the winning, not in impressing SC2-playing college kids.
Yes, and they conveniently rigged the games in a way that makes the incomplete information aspect irrelevant.
And no, they do not have an AI that can beat top players. Currently they have an AI that is easily confused and exploitable, and wins vs. skilled human opponents through far superior mechanics and not decision making.
The solving is not in the winning. That's trivial. Solving would be creating an AI that is mechanically worse than (or roughly equal to) a skilled human but wins regardless.
“I think I already answered. Having the AI model the game from the opponent's perspective and factor this inferred information into its decision making process would resemble intelligent behavior much more closely. This is what we call reading the game.”
This strikes me as overreaching. We don’t know if the AI needs to explicitly model the opponent in order to make good decisions. Humans don’t really play like that either in standard gameplay. It might be that you can come up with strats and builds that are, to a degree, independent of what your opponent is doing, or that force them to react to your play in a predictable fashion.
I mean, at some point you're demanding that AIs be able to react on the fly to a completely novel strategy before they're considered good, which is a harsher standard than what we apply to humans. Most likely it will never come up, because its standard decision making in terms of micro and engagements will be so sharp that SC2 will leave no room for something utterly bizarre.
Ignorance is one thing, but I cannot believe how arrogant your comments are. Neural networks don't work by modeling the thoughts and intentions of the opponent. This description is also extremely vague. How can you boldly state that you already answered, and then give such a non-answer?
The only conclusion I can draw is that to you, an AI that plays SC2 really well doesn't match your definition of 'intelligence'. I suggest you revisit your definition while also reading up on how a neural network plays a game of chess or a game of SC. Simply put, you have a big matrix of game-state data. You put that through a very large network of weights, and you get an output from which you compute a game move. You then optimize the weights in that network to get the output you desire.
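In its most stripped-down form, that pipeline is just a couple of matrix multiplications. A sketch (sizes, weights, and the 8-action output are all made up; AlphaStar's actual network is vastly more elaborate, with LSTMs, transformers, and pointer networks):

```python
import numpy as np

# Toy forward pass: flattened game-state vector -> hidden layer ->
# scores over candidate actions. Weights are random here; training
# is the process of adjusting them so the argmax becomes a good move.
rng = np.random.default_rng(0)

state = rng.standard_normal(128)            # "big matrix of game state"
W1 = rng.standard_normal((64, 128)) * 0.1   # first layer of weights
W2 = rng.standard_normal((8, 64)) * 0.1     # one output row per action

hidden = np.maximum(0, W1 @ state)          # ReLU nonlinearity
scores = W2 @ hidden                        # unnormalized action scores
action = int(np.argmax(scores))             # pick the highest-scoring move
print(action)
```

Nothing in this loop reflects on itself or on an opponent; it is optimization of weights toward a desired output, which is exactly the point being made here.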
This simply means that you think our current deep-learning neural networks can never be 'intelligent' because they don't have 'thoughts'; they don't reflect on themselves or their opponent.
As for the 'brilliant moves' made, how am I giving AlphaStar too much credit? It won 10-0! That's all I can go off. Because it is a neural network, I cannot judge it as I would judge a human. If it seems only barely stronger than TLO/MaNa but it still wins, that means nothing. It is optimized to win, not to impress or crush. Yes, those actions may have been blunders. But those kinds of moves are exactly what you would expect from a neural network. That is again why I am so confused: people expected something very different, yet they cannot put into words what that would look like.
As for the 'conclusions', they are not conclusions. They are an example. For AlphaGo, DeepMind had an additional system to evaluate the chances of winning. People criticized AlphaGo for making what humans thought were subpar moves, because it was trying to win more reliably by a small margin while humans would try to win more territory. It understood what it meant to 'win' much better than humans, because it always considers all parameters and evaluates them objectively. And it does so as correctly as the weights have been set. It is just an example, literally taken from Go, a more complex game, to show that humans were wrong in how they evaluated AlphaGo. But you just read that statement and thought 'How did he get that 51%?', which is baffling to me. So I doubt you are properly understanding what I wrote just now.
Then you say that their goal of creating an AI that can beat the best players at SC2 is 'hardly an accomplishment'? What does this even mean? That you think they picked the wrong game? This is another puzzling statement. Only a year ago, people here were claiming that it would be a complete theoretical impossibility to create an AI that can beat the best player at SC. Even now, there are people who think humans will remain stronger than the best AI DeepMind can come up with, given time to practice against it. So how can you say that it is trivial? They have a big team there with a whole bunch of smart people with scientific careers in deep learning and machine learning. They have Google's investment and infrastructure. They worked on many other games before they worked on SC, not just Go and chess. And you think it is 'hardly an accomplishment'. The people who throw money at DeepMind don't think so.
And yes, that is literally what it means to make an SC2 AI; it is as if you are making a CS:GO aimbot. I think that was intended as an argumentum ad ridiculum, but it isn't. Why don't you go ahead and code a CS:GO AI that misses intentionally? CS:GO is also a much more straightforward game. You can program an AI to just walk around randomly and it will do pretty well. Furthermore, AI is much better at teamwork than humans are, because it will be a single entity. All team games are mostly about teamwork, and AI cooperating with AI is a trivial problem.
You also lie about DeepMind making the incomplete-information aspect irrelevant. You must mean that it sees the entire map. But I know you know that it still has the fog of war, so its information is still incomplete. So you are not just arrogant, you are also a deliberate liar. Yes, you could have made a point out of the camera control, but DeepMind already pre-empted that entire criticism by developing an agent that didn't use it. That agent bugged out and lost. No one denied that this restriction makes the problem more difficult, which is exactly why they initially didn't impose it. So every single move that AlphaStar made in the 10-0 games, a human player could also have made. Just not all those moves put together.
You also say that the AI is easily confused and exploitable. Yet if we talk about the whole-map version of AlphaStar, it never seemed confused and was never exploited. TLO and MaNa, on the other hand, said they were very confused. AlphaStar had never played against humans, and TLO and MaNa had never played against AlphaStar. And who was the one confused? The humans. And both of them knew they had to try to exploit the AI. MaNa only succeeded in doing so by sending forward a single Stalker and having the AI's Phoenixes lift it in vain. That's about the degree of exploitability I saw.
But I guess making an AI that goes 10-0 vs. strong human players, that doesn't get confused and is barely exploitable, and that simply outplays a human in micro, tactics, build orders, and decision making is trivial.
This really feels like the MBS discussion all over again. People stupidly, and against all evidence, argue that SC2 is a game of strategy and mindgames. It clearly is not. No matter the circumstances under which an AI beats a top player, you will always have an excuse for why it is 'cheating'. And if you give up on that, it is 'simply winning, which is trivial, because it is a machine'.
All these people quibbling over "winning" and "fairness" fail to understand the bigger picture. Starcraft is not the end, merely a small stepping stone in the long road of ML, AI, software engineering, and corporate profit. AlphaStar is a precocious toddler playing in the kiddie pool of Starcraft, before its cousins go on to tackle bigger and better things.
AlphaStar is a work in progress, but even when it is completed, how exactly it plays and wins games of Starcraft are little more than PR. If adding APM caps, or camera restrictions, or what have you will significantly contribute to AlphaStar's learning ability then it will be done. If not, it won't. The way it actually plays Starcraft is purely incidental to the real goals.
So long as it improves Deepmind's understanding of AI, so long as it fulfills the technical requirements of Google's engineers, so long as it satisfies the demands of Alphabet's shareholders, then it will be a success.
Deepmind is well over a billion dollars in the red. The budget for AlphaStar alone is orders of magnitude above anything the entire professional Starcraft scene could ever dream of. Follow the money.
On February 01 2019 06:36 pvsnp wrote: All these people quibbling over "winning" and "fairness" fail to understand the bigger picture. Starcraft is not the end, merely a small stepping stone in the long road of ML, AI, software engineering, and corporate profit. AlphaStar is a precocious toddler playing in the kiddie pool of Starcraft, before its cousins go on to tackle bigger and better things.
AlphaStar is a work in progress, but even when it is completed, how exactly it plays and wins games of Starcraft are little more than PR. If adding APM caps, or camera restrictions, or what have you will significantly contribute to AlphaStar's learning ability then it will be done. If not, it won't. The way it actually plays Starcraft is purely incidental to the real goals.
So long as it improves Deepmind's understanding of AI, so long as it fulfills the technical requirements of Google's engineers, so long as it satisfies the demands of Alphabet's shareholders, then it will be a success.
Deepmind is well over a billion dollars in the red. The budget for AlphaStar alone is orders of magnitude above anything the entire professional Starcraft scene could ever dream of. Follow the money.
I'm curious about your "well over a billion in the red". Could you provide a source or sources for that? I am not really doubting it or calling you out; I just legitimately want to educate myself on it. Though I'd point out that it is common business practice to make things look as unprofitable as possible, if you can get away with it.
Also, I'd like to say that I agree with most of your message, but I do think DeepMind is about more than shareholders. I think that Google's leadership realizes the level of talent and potential DeepMind attracts, and gives them a certain amount of flexibility in what they choose to do with their resources. It wouldn't surprise me if they make some choices (like giving even more "human"-type limitations to AlphaStar) simply for the fun and the challenge.
On February 01 2019 05:48 Grumbels wrote: “I think I already answered. Having the AI model the game from the opponent's perspective and factor this inferred information into its decision making process would resemble intelligent behavior much more closely. This is what we call reading the game.”
This strikes me as overreaching. We don’t know if the AI needs to explicitly model the opponent in order to make good decisions. Humans don’t really play like that either in standard gameplay. It might be that you can come up with strats and builds that are, to a degree, independent of what your opponent is doing, or that force them to react to your play in a predictable fashion.
Humans do play that way, except we only model the fragment of the game we find relevant. In my earlier example I noted how scouting Nexus-first, along with probably a few other things (Probe count, minerals mined, etc.), precludes the possibility of the opponent doing a proxy build (a hypothetical example; I don't know how SC2 PvP works). The way we analyze the information is "he couldn't possibly have had enough minerals for a proxy Gate". The AI module I'm proposing would instead simulate the whole game up to that point, come up with a number of probable game-tree paths, and rank a proxy opening as highly unlikely.
Sure, what you're describing is also true, but I'd say that's even more advanced, at least the part about making the opponent play a certain way. While it may be the case that modelling the game is not necessary for good decision making, I think it would be an improvement, as it would give the decision-making module information that is not readily available by "looking" underneath the fog of war. I think that is the essence of what makes a game like SC2 different from a game like chess, and DeepMind is not really addressing that aspect.
@Polypoetes
I will reply tomorrow as it's late now. For now, let me just say that it's ironic to accuse me of arrogance, seeing as you didn't bother to read what I propose and are constantly making assertions about DeepMind's goals for SC2 that directly contradict what they say about their AI.
I also never claimed that AlphaStar is maphacking or such. I repeatedly said that it beat TLO and MaNa due to far superior mechanics, which made the incomplete-information aspect of the game irrelevant. The same way having a team of aimbotters in CS:GO makes tactics irrelevant. You're basically saying that aimbotters are tactically superior because when they choose to engage, they dominate. Ridiculous.
On January 30 2019 09:51 maybenexttime wrote: Does anyone know what game speed AlphaStar is playing at during its internal games? Do I remember correctly that they mentioned 200 years of experience in a week? Was it combined playtime across all agents?
What I'm wondering is whether they could make an evolutionary algorithm that is trained to reconstruct a replay from one player's perspective. It's very different from simply teaching it to win. Such an approach would teach it how to model the state of the game from incomplete information. The main problem would be quantifying how faithful the reconstruction of a replay is.
Then they could turn it into a module and incorporate it into AlphaStar, and make it model the game it is currently playing in real time (assuming it can simulate numerous games of SC2 that quickly). It could come up with realistic scenarios explaining what the AI already knows about the opponent. It could create working hypotheses regarding what has been happening behind the fog of war, and perhaps even verify them via scouting.
Is what I'm proposing very far-fetched?
I don't know if I'm understanding you correctly, but you could imagine some sort of implementation where an AI has a belief about the opponent's units and economy, which it acts upon in a game and then verifies by watching the replay. I haven't read the paper they released yet, but from some comments I read, I don't think it has these capabilities currently.
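A minimal stand-in for that training signal: a "belief model" maps one player's fogged observations to a guess at the full game state, scored against the ground truth available in the replay. Everything here is a toy (a linear model, random vectors as states); the real thing would be a deep network over replay sequences:

```python
import numpy as np

# Toy replay-reconstruction objective: recover the full state vector
# from a fog-of-war-masked observation, training on least-squares
# error against the ground truth a replay would provide.
rng = np.random.default_rng(1)

full_state = rng.standard_normal(32)       # "true" state from the replay
mask = rng.random(32) < 0.5                # fog of war hides ~half of it
observation = np.where(mask, full_state, 0.0)

W = rng.standard_normal((32, 32)) * 0.05   # linear "belief model" weights

def reconstruction_loss(W):
    belief = W @ observation               # model's guess at the full state
    return float(np.mean((belief - full_state) ** 2))

# Plain gradient descent on the reconstruction objective:
for _ in range(200):
    belief = W @ observation
    grad = 2 * np.outer(belief - full_state, observation) / 32
    W -= 0.1 * grad

print(reconstruction_loss(W))              # drives toward ~0
```

The quantification problem mentioned earlier shows up here as the choice of loss: mean squared error is trivial for vectors, but "how faithful is this reconstructed game?" is much harder to score.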
Also, I don't like spreading misinformation, but I recall having heard that the figure of 200 years is the playtime of the agent that trained the longest. The week of training probably also includes the initial stage of imitation learning from replays. Depending on how long this lasted, it would mean that if the agent playing vs. TLO had 200 years of practice, then the one playing vs. MaNa, which trained for another week, would have at least 400 years of experience, but possibly much more.
But it might be best to read the paper. I mean, the ratio of a week to 200 years is about 1:10,000, and I'm pretty sure you can't speed up SC2 that much even with good hardware and no graphics. So a single agent has to be able to train in parallel with itself.
This is a good point. I'm not sure. It would mean that a normally ~30-minute game of SC2 would be played in 0.2 seconds. Even with the map and everything loaded into memory in advance, that seems *very* fast to simulate SC2 with two quite heavy RL algorithms making the decisions on both sides. On the other hand, they are running it on a rather powerful setup; 16 TPUs can run a pretty hefty NN in very little time. However, the SC2 engine itself is not easily parallelized, and it still needs to compute every unit's actions at every step of the simulation.
In theory they could run multiple instances of the same agent simultaneously, then train from datasets of replays, right?
Also, StarCraft can be sped up a lot. Excluding loading, which takes about 5 seconds, I can run a 5-minute game in about 10 seconds on my shitty desktop. So maybe they really can pull it off.
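The arithmetic in the last few posts checks out roughly like this (approximate figures from the discussion, not from DeepMind):

```python
# Quick check of the "200 years in one week" speed-up figures.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600

speedup = 200 * SECONDS_PER_YEAR / SECONDS_PER_WEEK   # ~10,436x overall
game_wallclock = 30 * 60 / speedup                    # a 30-min game in...
print(round(speedup), round(game_wallclock, 2))       # ~10436, ~0.17 s

# The "5-minute game in 10 seconds" desktop figure above is only a
# ~30x serial speed-up, so a single instance falls far short; the
# ratio only works out with hundreds of games running in parallel.
instances_needed = speedup / (5 * 60 / 10)
print(round(instances_needed))                        # ~348 parallel games
```

So both posters are right: no single instance runs 10,000x real time, but a few hundred parallel instances at a modest per-instance speed-up gets there.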
On February 01 2019 06:36 pvsnp wrote: All these people quibbling over "winning" and "fairness" fail to understand the bigger picture. Starcraft is not the end, merely a small stepping stone in the long road of ML, AI, software engineering, and corporate profit. AlphaStar is a precocious toddler playing in the kiddie pool of Starcraft, before its cousins go on to tackle bigger and better things.
AlphaStar is a work in progress, but even when it is completed, how exactly it plays and wins games of Starcraft are little more than PR. If adding APM caps, or camera restrictions, or what have you will significantly contribute to AlphaStar's learning ability then it will be done. If not, it won't. The way it actually plays Starcraft is purely incidental to the real goals.
So long as it improves Deepmind's understanding of AI, so long as it fulfills the technical requirements of Google's engineers, so long as it satisfies the demands of Alphabet's shareholders, then it will be a success.
Deepmind is well over a billion dollars in the red. The budget for AlphaStar alone is orders of magnitude above anything the entire professional Starcraft scene could ever dream of. Follow the money.
I don't think we fail to understand the bigger picture. Quite the contrary. If you transition from a complete-information game to an incomplete-information game specifically because of that difference, circumventing that aspect of the game defeats the purpose.
On February 01 2019 06:36 pvsnp wrote: All these people quibbling over "winning" and "fairness" fail to understand the bigger picture. Starcraft is not the end, merely a small stepping stone in the long road of ML, AI, software engineering, and corporate profit. AlphaStar is a precocious toddler playing in the kiddie pool of Starcraft, before its cousins go on to tackle bigger and better things.
AlphaStar is a work in progress, but even when it is completed, how exactly it plays and wins games of Starcraft are little more than PR. If adding APM caps, or camera restrictions, or what have you will significantly contribute to AlphaStar's learning ability then it will be done. If not, it won't. The way it actually plays Starcraft is purely incidental to the real goals.
So long as it improves Deepmind's understanding of AI, so long as it fulfills the technical requirements of Google's engineers, so long as it satisfies the demands of Alphabet's shareholders, then it will be a success.
Deepmind is well over a billion dollars in the red. The budget for AlphaStar alone is orders of magnitude above anything the entire professional Starcraft scene could ever dream of. Follow the money.
I'm curious about your "well over a billion in the red". Could you provide a source or sources for that? I am not really doubting it or calling you out; I just legitimately want to educate myself on it. Though I'd point out that it is common business practice to make things look as unprofitable as possible, if you can get away with it.
Also, I'd like to say that I agree with most of your message, but I do think DeepMind is about more than shareholders. I think that Google's leadership realizes the level of talent and potential DeepMind attracts, and gives them a certain amount of flexibility in what they choose to do with their resources. It wouldn't surprise me if they make some choices (like giving even more "human"-type limitations to AlphaStar) simply for the fun and the challenge.
The only figures I can find are that DeepMind's expenses in 2017 were $400 million, more than double the previous year's. If that trend continued into 2018, then it's possible. I have a friend with old academic colleagues in the AI/ML field, and they were expressing frustration at their inability to compete with conglomerates like Samsung, Google, and Facebook when it comes to solving problems, because these giant companies can just throw endless gobs of money at these research challenges. The figure I heard was that it "cost" DeepMind $25 million per day to train AlphaStar. I say "cost" in quotes because it uses Google's TPUs, and DeepMind is a Google subsidiary, so they can effectively write off the real cost. However, if some third party were to use that same cloud computing power, that's how much Google would have charged them. Obviously, you're not going to get anyone in the academic space to raise that sort of money continuously, so DeepMind's AIs can get trained up much faster and the company as a whole can move much more quickly.
But it is worth noting that while DeepMind does make some sales in the medical field, it's still "in the red" because it's first and foremost a research wing of Google. It's a known cost that will eventually recognize returns in various forms as its AI algorithms become sufficiently advanced (for example, it was responsible for reducing cooling costs at Google data centers by 40%, and improving the longevity of phone batteries on Android 9 by using adaptive brightness).
On February 01 2019 06:36 pvsnp wrote: All these people quibbling over "winning" and "fairness" fail to understand the bigger picture. Starcraft is not the end, merely a small stepping stone in the long road of ML, AI, software engineering, and corporate profit. AlphaStar is a precocious toddler playing in the kiddie pool of Starcraft, before its cousins go on to tackle bigger and better things.
AlphaStar is a work in progress, but even when it is completed, how exactly it plays and wins games of Starcraft are little more than PR. If adding APM caps, or camera restrictions, or what have you will significantly contribute to AlphaStar's learning ability then it will be done. If not, it won't. The way it actually plays Starcraft is purely incidental to the real goals.
So long as it improves Deepmind's understanding of AI, so long as it fulfills the technical requirements of Google's engineers, so long as it satisfies the demands of Alphabet's shareholders, then it will be a success.
Deepmind is well over a billion dollars in the red. The budget for AlphaStar alone is orders of magnitude above anything the entire professional Starcraft scene could ever dream of. Follow the money.
I'm curious about your "well over a billion in the red". Could you provide a source or sources for that? I'm not really doubting it or calling you out, I just legitimately want to educate myself on it. Though I'd point out that it is common business practice to make things look as unprofitable as possible, if you can get away with it.
Pretty sure that information should be public too, since Alphabet is publicly traded. Suffice to say that Deepmind costs a pretty penny. After some quick googling:
Also, I'd like to say that I agree with most of your message, but I do think Deepmind is about more than shareholders. I think that google leadership realizes the level of talent and potential deepmind attracts, and gives them a certain amount of flexibility in what they choose to do with their resources. It wouldn't surprise me if they make some choices (like giving even more "human" type limitations to alphastar), simply for the fun and the challenge.
Oh for sure, Deepmind is basically R&D. Very prestigious R&D. The bottom line is not important here, and Google is more than happy to throw money at it, even more so than usual. But there's still faith in Deepmind returning that investment tenfold, eventually.
On February 01 2019 06:36 pvsnp wrote: All these people quibbling over "winning" and "fairness" fail to understand the bigger picture. Starcraft is not the end, merely a small stepping stone in the long road of ML, AI, software engineering, and corporate profit. AlphaStar is a precocious toddler playing in the kiddie pool of Starcraft, before its cousins go on to tackle bigger and better things.
AlphaStar is a work in progress, but even when it is completed, how exactly it plays and wins games of Starcraft are little more than PR. If adding APM caps, or camera restrictions, or what have you will significantly contribute to AlphaStar's learning ability then it will be done. If not, it won't. The way it actually plays Starcraft is purely incidental to the real goals.
So long as it improves Deepmind's understanding of AI, so long as it fulfills the technical requirements of Google's engineers, so long as it satisfies the demands of Alphabet's shareholders, then it will be a success.
Deepmind is well over a billion dollars in the red. The budget for AlphaStar alone is orders of magnitude above anything the entire professional Starcraft scene could ever dream of. Follow the money.
I don't think we fail to understand the bigger picture. Quite the contrary.
First you say this.
If you transition from a complete information game to an incomplete information game specifically due to this difference, circumventing that aspect of the game defeats the purpose.
Then you say that? Please explain how exactly superhuman mechanics allow AlphaStar to magically pierce the fog of war.
On February 01 2019 06:36 pvsnp wrote: All these people quibbling over "winning" and "fairness" fail to understand the bigger picture. Starcraft is not the end, merely a small stepping stone in the long road of ML, AI, software engineering, and corporate profit. AlphaStar is a precocious toddler playing in the kiddie pool of Starcraft, before its cousins go on to tackle bigger and better things.
AlphaStar is a work in progress, but even when it is completed, how exactly it plays and wins games of Starcraft are little more than PR. If adding APM caps, or camera restrictions, or what have you will significantly contribute to AlphaStar's learning ability then it will be done. If not, it won't. The way it actually plays Starcraft is purely incidental to the real goals.
So long as it improves Deepmind's understanding of AI, so long as it fulfills the technical requirements of Google's engineers, so long as it satisfies the demands of Alphabet's shareholders, then it will be a success.
Deepmind is well over a billion dollars in the red. The budget for AlphaStar alone is orders of magnitude above anything the entire professional Starcraft scene could ever dream of. Follow the money.
I'm curious about your "well over a billion in the red". Could you provide a source or sources for that? I'm not really doubting it or calling you out, I just legitimately want to educate myself on it. Though I'd point out that it is common business practice to make things look as unprofitable as possible, if you can get away with it.
Also, I'd like to say that I agree with most of your message, but I do think Deepmind is about more than shareholders. I think that google leadership realizes the level of talent and potential deepmind attracts, and gives them a certain amount of flexibility in what they choose to do with their resources. It wouldn't surprise me if they make some choices (like giving even more "human" type limitations to alphastar), simply for the fun and the challenge.
Yeah, I think it’s important to keep this sort of anti-capitalist / materialist analysis in mind, but I don’t think it fully governs Deepmind’s actions. They have some degree of independence and ‘artistic license’. Obviously every corporation will eventually be assimilated into the logic of global capitalism, but for now it’s a little above that.
@Excalibur_Z Btw, I read that DeepMind is responsible for the current voice-to-text bot Google uses.
Circumvent means to go around. AlphaStar makes this aspect of the game relatively irrelevant. The same way having a team of aimbotters in CS:GO makes tactics irrelevant. The fact that the aimbotters dominate any fight when they choose to engage doesn't make them tactically superior.
The games MaNa lost were rigged in many ways when it comes to the engagements. First of all, there was a vast gap in terms of mechanics between MaNa and AlphaStar - both in terms of battle awareness (not being limited to one screen in case of the AI) and superhuman APM peaks. Secondly, MaNa's experience worked against him. He admitted that he misjudged many engagements due to not being used to playing opponents with such mechanics. Before each battle MaNa overestimated his chances whereas AlphaStar underestimated its chances.
1) It's pretty clear that control plays a bigger role than decision making in those wins. Continuing to make mass stalkers against 8 immortals is not good decision making. Having your whole army go back to the back of your base 4 times to deal with a warp prism while you could just cross the map and go fucking kill him instead is not good decision making. I think it looks especially bad because it's bad decision making in a way that is somewhat obvious, like, very few humans would make those bad decisions. We are used to "bad decision making" that is much subtler than that.
2) I don't like the PR strategy of DeepMind. It seems like they have to hype the fuck out of the accomplishments that they get, and it makes the whole thing seem really artificial to me. I don't have the exact quotes in mind any more but what they said about this starcraft experience felt overreaching when I read it; what they said about the poker experience was even worse, but the poker experience was somewhat more convincing than the starcraft one (it had issues as well).
edit: my mistake, just realized Libratus wasn't made by the same guys. But the same principle applies to both.
On February 01 2019 01:24 Dangermousecatdog wrote: Polypoetes, you make an awful lot of assumptions that don't quite bear out. Pro players generally care about winning, not about winning decisively. You get the same ladder points and tournament money no matter how much you think you have won or lost a game by.
But this completely ignores what it means to be human and how humans actually learn. I am making assumptions? Your claim is literally that humans are able to objectively take their human experiences, objectively form their goal, and rewire their brains so it happens more. That is not how humans learn.
Humans learn by reinforcement learning as well. But what is the reinforcement? You clicking on the screen, trying to kill the enemy army and it either working or failing? Or you looking at the ladder points after the game?
What are you even saying? You quote me but don't actually say anything that interacts with what I am saying. Your assumptions are still false assumptions. And then you write some nonsense. Do you even play SC2?
On February 01 2019 19:24 Nebuchad wrote: I have two main issues with this whole thing.
1) It's pretty clear that control plays a bigger role than decision making in those wins. Continuing to make mass stalkers against 8 immortals is not good decision making. Having your whole army go back to the back of your base 4 times to deal with a warp prism while you could just cross the map and go fucking kill him instead is not good decision making. I think it looks especially bad because it's bad decision making in a way that is somewhat obvious, like, very few humans would make those bad decisions. We are used to "bad decision making" that is much subtler than that.
2) I don't like the PR strategy of DeepMind. It seems like they have to hype the fuck out of the accomplishments that they get, and it makes the whole thing seem really artificial to me. I don't have the exact quotes in mind any more but what they said about this starcraft experience felt overreaching when I read it; what they said about the poker experience was even worse, but the poker experience was somewhat more convincing than the starcraft one (it had issues as well).
edit: my mistake, just realized Libratus wasn't made by the same guys. But the same principle applies to both.
I don't think you can claim that making stalkers is bad decision-making at all. On paper, immortals hard counter stalkers. And in human control they do too. But if you have Alphastar micro capabilities, then suddenly they don't anymore. I think you're mixing cause and effect a bit here. Alphastar learned to make stalkers in most situations *because* it also learned to micro them incredibly well. That seems like a legitimate strategy. It's like when MKP showed that if you split your marines they didn't just get blasted into goo by a couple of banelings, and if you did it well, then suddenly banelings no longer countered marines very well at all. Sure, he micro'd marines FAR better than his contemporaries, but was his choice to then just make lots of marines a bad choice? Clearly not.
As for (2). They are a commercial enterprise. Of course they're going to hype their accomplishments. What did you think? That said, if you actually watched the video, the guys there are quite honest about their achievements and their aspirations. I don't think they believe they have "solved SC2". Or poker, for that matter, although I suspect poker is pretty close to being solved in all its various forms, whereas SC2 will take a bit longer. Still, Alphastar is quite a remarkable achievement, even with its flaws, and they are justifiably proud of it.
On February 01 2019 01:24 Dangermousecatdog wrote: Polypoetes, you make an awful lot of assumptions that don't quite bear out. Pro players generally care about winning, not about winning decisively. You get the same ladder points and tournament money no matter how much you think you have won or lost a game by.
But this completely ignores what it means to be human and how humans actually learn. I am making assumptions? Your claim is literally that humans are able to objectively take their human experiences, objectively form their goal, and rewire their brains so it happens more. That is not how humans learn.
Humans learn by reinforcement learning as well. But what is the reinforcement? You clicking on the screen, trying to kill the enemy army and it either working or failing? Or you looking at the ladder points after the game?
What are you even saying? You quote me but don't actually say anything that interacts with what I am saying. Your assumptions are still false assumptions. And then you write some nonsense. Do you even play SC2?
Are you that dense? Humans don't make a conscious effort to learn. Playing RTS is mostly instinct. Of course a player is trying to win. The question is if and how a player knows what actions make her win. Ladder points are not the reinforcement for human learning. The human experience is how they learn. This is why you can and do learn from false reinforcement. For example, beginning players turtling up. They think they are playing better because the duration of the game is longer.
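That turtling example of false reinforcement is easy to sketch: the same update rule, fed a different reward signal, learns a different (and in this case misleading) preference. This is a minimal bandit-style illustration, not DeepMind's actual algorithm; the action names and reward probabilities are made up.

```python
import random

# Minimal sketch (illustrative only, NOT AlphaStar's method): an agent's
# learned preference depends entirely on which reward signal we feed it.

def train(reward_fn, n_steps=5000, eps=0.1, lr=0.05, seed=0):
    rng = random.Random(seed)
    q = {"aggressive": 0.0, "turtle": 0.0}  # estimated value per action
    for _ in range(n_steps):
        # epsilon-greedy: mostly pick the best-looking action, sometimes explore
        a = rng.choice(list(q)) if rng.random() < eps else max(q, key=q.get)
        r = reward_fn(a, rng)
        q[a] += lr * (r - q[a])  # incremental value update toward the reward
    return q

# Reinforcement = winning: aggressive play wins 80% of games here (made up).
def win_loss(a, rng):
    return 1.0 if rng.random() < (0.8 if a == "aggressive" else 0.2) else 0.0

# "False" reinforcement = game length: turtling always makes games longer.
def game_length(a, rng):
    return 0.9 if a == "turtle" else 0.3

print(train(win_loss))     # learns to prefer "aggressive"
print(train(game_length))  # learns to prefer "turtle", despite losing more
```

Same learner, two different reinforcements, two opposite habits; the turtler genuinely believes they are improving because the signal they attend to keeps going up.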
You call them 'assumptions', but this is exactly in line with everything I have heard modern experts on human learning talk about. It also makes sense in my complete scientific world view. Yet your view is that humans will themselves to learn in an objective way because of ladder points and tournament money. Absurd!
Everything I said exactly addresses your absurdly false claims. But this must be your 'tactic'.
You ask me if I play SC2. Of course I don't. I think it is a bad and boring game, which is borne out by AlphaStar. It shows that the perfect way to play, either as an AI or as a human, is to 'circumvent' interesting game play and rely on mechanics and superior micro of one or two units.
But why is it relevant? We aren't talking about SC2. We are talking about how humans learn and how AIs learn. I would like to ask you 'Do you even code?' or 'Do you even have a general understanding of the cognitive sciences?'. But it seems clear to me that you have problems thinking, comprehending the English language, expressing yourself in the English language, or all three.
On February 01 2019 19:24 Nebuchad wrote: I have two main issues with this whole thing.
1) It's pretty clear that control plays a bigger role than decision making in those wins. Continuing to make mass stalkers against 8 immortals is not good decision making. Having your whole army go back to the back of your base 4 times to deal with a warp prism while you could just cross the map and go fucking kill him instead is not good decision making. I think it looks especially bad because it's bad decision making in a way that is somewhat obvious, like, very few humans would make those bad decisions. We are used to "bad decision making" that is much subtler than that.
2) I don't like the PR strategy of DeepMind. It seems like they have to hype the fuck out of the accomplishments that they get, and it makes the whole thing seem really artificial to me. I don't have the exact quotes in mind any more but what they said about this starcraft experience felt overreaching when I read it; what they said about the poker experience was even worse, but the poker experience was somewhat more convincing than the starcraft one (it had issues as well).
edit: my mistake, just realized Libratus wasn't made by the same guys. But the same principle applies to both.
I don't think you can claim that making stalkers is bad decision-making at all. On paper, immortals hard counter stalkers. And in human control they do too. But if you have Alphastar micro capabilities, then suddenly they don't anymore. I think you're mixing cause and effect a bit here. Alphastar learned to make stalkers in most situations *because* it also learned to micro them incredibly well. That seems like a legitimate strategy. It's like when MKP showed that if you split your marines they didn't just get blasted into goo by a couple of banelings, and if you did it well, then suddenly banelings no longer countered marines very well at all. Sure, he micro'd marines FAR better than his contemporaries, but was his choice to then just make lots of marines a bad choice? Clearly not.
As for (2). They are a commercial enterprise. Of course they're going to hype their accomplishments. What did you think? That said, if you actually watched the video, the guys there are quite honest about their achievements and their aspirations. I don't think they believe they have "solved SC2". Or poker, for that matter, although I suspect poker is pretty close to being solved in all its various forms, whereas SC2 will take a bit longer. Still, Alphastar is quite a remarkable achievement, even with its flaws, and they are justifiably proud of it.
There comes a point where it wouldn't have worked though. I don't know how many immortals are required, if it's 10 or 14, but at some point Alphastar would have still lost. Mana perceived that the point was 8 because he was used to playing human stalkers, and so he was on the map with 8 immortals thinking he was safe when he wasn't. At some point he would have been safe.
I didn't put 2) in there because I find it particularly surprising, but because it makes me ask myself more questions about the whole enterprise than I would if they made their commentary more fair and analytical.
Well, apparently the internal agents favoured stalkers naturally. Maybe those that were given the artificial incentive to build immortals were beating those stalker-heavy agents. We don't know for sure. But apparently there is a downside to making a lot of immortals when most other agents are heavy on stalkers. Maybe they would mostly lose against any other agent not making stalkers.
If stalkers counter everything but immortals, and immortals get countered by everything but stalkers, then it probably is still best to make stalkers. If you don't like this, take it up with Blizzard, not with Deepmind.
I find it annoying that we could not see more games with the camera interface once MaNa finally won. They could have cut the off-race TLO games and played more live matches.
On February 02 2019 02:42 Polypoetes wrote: If stalkers counter everything but immortals, and immortals get countered by everything but stalkers, then it probably is still best to make stalkers. If you don't like this, take it up with Blizzard, not with Deepmind.
This. That's exactly the point. As long as the game allows for insane apm, one cannot blame an AI for using it.
On February 02 2019 02:55 Nebuchad wrote: I find it difficult to make many charitable assumptions on how much the machine calculated when I look at that warp prism defense.
A neural network does the same amount of calculations with random weights as it does with the weights it found through '200 years of gameplay'. So there you already go wrong.
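That point about compute cost deserves emphasis: a fixed-architecture network performs the identical sequence of operations no matter what its weights are; training changes the values flowing through, not the amount of computation. A quick sketch (the layer sizes are arbitrary and have nothing to do with AlphaStar's actual architecture):

```python
import numpy as np

# Sketch: inference cost depends only on the architecture, not the weights.

def forward(x, weights):
    h = x
    for W in weights:
        h = np.maximum(W @ h, 0.0)  # dense layer followed by ReLU
    return h

layer_sizes = [128, 256, 256, 64]  # arbitrary illustrative sizes

def multiply_adds(sizes):
    # one matmul per layer: out_dim * in_dim multiply-adds each
    return sum(b * a for a, b in zip(sizes[:-1], sizes[1:]))

rng = np.random.default_rng(0)
random_weights = [rng.normal(size=(b, a)) for a, b in zip(layer_sizes, layer_sizes[1:])]
trained_weights = [0.01 * W for W in random_weights]  # stand-in for "learned" values

x = rng.normal(size=layer_sizes[0])
forward(x, random_weights)   # untrained network
forward(x, trained_weights)  # "trained" network: exact same operation count
print(multiply_adds(layer_sizes))
```

Both calls execute the same matmuls over the same shapes; '200 years of gameplay' changed what the outputs mean, not how much work producing them takes.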
Secondly, that AI was different from the AIs that went 5-0 against MaNa. So don't judge those AIs by the different AI in the last game.
Third, we saw AlphaGo become 'delusional' after it played very strongly. These kinds of blind spots and failures are natural for neural networks, because no amount of training can ever prepare a NN completely for any test input. If you want an AI that succeeds 99.99% of the time, then don't use a NN. Yet despite its delusions, AlphaGo was stronger than Lee Sedol. So once the NN goes wrong and is losing, you cannot judge its strengths on what it is doing then. Korean commentators were literally laughing AlphaGo off stage. If you have watched the AlphaGo documentary, which you probably have if you are debating here, you already know what I mean.
On February 01 2019 18:12 maybenexttime wrote: @pvsnp
Circumvent means to go around.
Really? I had no idea. Thanks for the pretentious dictionary copypaste.
AlphaStar makes this aspect of the game relatively irrelevant. The same way having a team of aimbotters in CS:GO makes tactics irrelevant. The fact that the aimbotters dominate any fight when they choose to engage doesn't make them tactically superior.
And.....? AlphaStar's goal is not to be a tactically superior Starcraft player. Just using the word "superior" betrays your lack of understanding. Superior to a completely arbitrary performance benchmark? Is Mana the only progamer out there? Change the benchmark and it's inferior, or superior, or whatever. Whether it happens to be superior or not is purely incidental to the real goals of making decisions with incomplete information. Which, as you are either overlooking or ignoring, applies to all aspects of the game, not just micro.
The games MaNa lost were rigged in many ways when it comes to the engagements. First of all, there was a vast gap in terms of mechanics between MaNa and AlphaStar - both in terms of battle awareness (not being limited to one screen in case of the AI) and superhuman APM peaks. Secondly, MaNa's experience worked against him. He admitted that he misjudged many engagements due to not being used to playing opponents with such mechanics. Before each battle MaNa overestimated his chances whereas AlphaStar underestimated its chances.
And more of the inane blathering about the same talking points. Did you even bother to read my post? AlphaStar doesn't really care about rigging. Deepmind doesn't really care. Google doesn't really care.
Because "rigging" implies that there is a "proper" (human) way for the AI to play Starcraft, which there isn't, as far as Deepmind is concerned. Because the true goal of AlphaStar is not to play Starcraft like a human. It's to further understand AI decision-making with incomplete information. And if the optimal decision is to use superhuman micro, then that's a useful conclusion. Does it trivialize the problem if the decision is always to use superhuman micro in battle? Perhaps, to some degree. But to claim that the entire game as an incomplete information environment, from production to scouting to harass and finally to battle, is trivialized merely because superhuman micro is within the action set, is idiocy.
Any attention AlphaStar pays towards Starcraft as a game, the fans, the fairness, the showmatches, is more or less entirely incidental. Or PR driven. This is a technical project with technical goals, and Starcraft is simply the vehicle of choice. If Starcraft had zero progamers and zero support, the only things AlphaStar would lose are a useful (but nonessential) performance benchmark and a PR opportunity.
Bluntly put, everything you're so busy preaching about doesn't matter. Go play Starcraft and leave AI to the professionals.
On February 02 2019 04:47 pvsnp wrote:Really? I had no idea. Thanks for the pretentious dictionary copypaste.
I thought you were implying that I somehow said that superhuman mechanics allow AlphaStar to pierce through the fog of war. I guess it's you making that claim.
And.....? AlphaStar's goal is not to be a tactically superior Starcraft player. Whether it happens to be or not is purely incidental to the real goals of making decisions with incomplete information. Which, as you are either overlooking or ignoring, applies to all aspects of the game, not just micro.
And more of the inane blathering about the same talking points. Did you even bother to read my post? AlphaStar doesn't really care about rigging. Deepmind doesn't really care. Google doesn't really care.
Because "rigging" implies that there is a "proper" (human) way for the AI to play Starcraft, which there isn't, as far as Deepmind is concerned. Because the true goal of AlphaStar is not to play Starcraft like a human. It's to further understand AI decision-making with incomplete information. And if the optimal decision is to use superhuman micro, then that's a useful conclusion.
Any attention they pay towards Starcraft as a game, the fans, the fairness, the showmatches, is more or less entirely PR. This is a technical project with technical goals and Starcraft is simply the vehicle of choice.
That was an analogy meant to show how an AI can completely ignore the decision making aspect of a game that normally has one. Superhuman Stalker micro does not put the AI's decision making to the test, let alone decision making in an incomplete information environment (I'm pretty sure that AlphaStar's Stalker micro would look the same if they were to make SC2 a game of complete information). While there is decision making involved in Blink Stalker micro, it's the inhuman speed and precision of the execution that makes all the difference, not the decisions made. It doesn't matter which Stalker you choose to blink and where if you can blink five of them faster than a human would blink just one...
@pvsnp, That’s really silly. AlphaStar is to a large extent a collaboration between Blizzard and Deepmind. Do you think Blizzard has no stake in SC2? Deepmind can’t just use the custom-made SC2 interface, the set of replays Blizzard took from the ladder etc. and do a “hit-and-run” without antagonizing Blizzard. Furthermore, the video game industry makes 150 billion dollars a year in revenue, so creating agents that can play video games has potential utility and therefore economic value. As an example, without APM limits and human-esque play you can’t use these agents for replacing the in-game AI.
And why do you think that Deepmind targeted Go and chess? Or Atari games? Or SC2? Maybe because it was founded by a former chess prodigy who is obsessed with board games, and maybe because it largely attracts researchers who are gaming enthusiasts. Deepmind is not just some nebulous google research center plotting world domination, they are also a prestige project and they have some degree of autonomy. You can’t be completely cynical about them.
Fact of the matter is, everyone in this thread who was taking this tone of “suck it up, Deepmind considers SC2 beneath itself, it will vulture-like scavenge what it can from it and then move on, their true goal is skynet/world domination”... they are likely to be proven wrong as the co-leads of AlphaStar already conceded they would look at the APM limits and probably adjust them.
Of course you shouldn’t trust them, but they just aren’t beyond sentimental and moral considerations.
On February 02 2019 04:47 pvsnp wrote:Really? I had no idea. Thanks for the pretentious dictionary copypaste.
I thought you were implying that I somehow said that superhuman mechanics allow AlphaStar to pierce through the fog of war. I guess it's you making that claim.
I was making the claim that you're missing the point, and everything I've heard since only reinforces that.
And.....? AlphaStar's goal is not to be a tactically superior Starcraft player. Whether it happens to be or not is purely incidental to the real goals of making decisions with incomplete information. Which, as you are either overlooking or ignoring, applies to all aspects of the game, not just micro.
And more of the inane blathering about the same talking points. Did you even bother to read my post? AlphaStar doesn't really care about rigging. Deepmind doesn't really care. Google doesn't really care.
Because "rigging" implies that there is a "proper" (human) way for the AI to play Starcraft, which there isn't, as far as Deepmind is concerned. Because the true goal of AlphaStar is not to play Starcraft like a human. It's to further understand AI decision-making with incomplete information. And if the optimal decision is to use superhuman micro, then that's a useful conclusion.
Any attention they pay towards Starcraft as a game, the fans, the fairness, the showmatches, is more or less entirely PR. This is a technical project with technical goals and Starcraft is simply the vehicle of choice.
That was an analogy meant to show how an AI can completely ignore the decision making aspect of a game that normally has one. Superhuman Stalker micro does not put the AI's decision making to the test, let alone decision making in an incomplete information environment (I'm pretty sure that AlphaStar's Stalker micro would look the same if they were to make SC2 a game of complete information). While there is decision making involved in Blink Stalker micro, it's the inhuman speed and precision of the execution that makes all the difference, not the decisions made. It doesn't matter which Stalker you choose to blink and where if you can blink five of them faster than a human would blink just one...
A terrible analogy that is either ignorant or disingenuous, given what you said earlier about circumventing the point. If the point is decision-making then deciding to use superhuman micro works just fine. Why would anything about human blink skill factor in? What bearing does that have on optimal decision-making from limited information?
Decisions on where and how to blink with superhuman micro don't matter in the context of winning the game. They matter in the context of choosing optimally given limited knowledge about current game state. Guess which one Deepmind cares about?
Anyway, not wasting my time any further with you.
Good to hear, correcting you was getting tiresome.
On February 02 2019 06:21 Grumbels wrote: @pvsnp, That’s really silly. AlphaStar is to a large extent a collaboration between Blizzard and Deepmind. Do you think Blizzard has no stake in SC2? Deepmind can’t just use the custom-made SC2 interface, the set of replays Blizzard took from the ladder etc. and do a “hit-and-run” without antagonizing Blizzard. Furthermore, the video game industry makes 150 billion dollars a year in revenue, so creating agents that can play video games has potential utility and therefore economic value. As an example, without APM limits and human-esque play you can’t use these agents for replacing the in-game AI.
And why do you think that Deepmind targeted Go and chess? Or Atari games? Or SC2? Maybe because it was founded by a former chess prodigy who is obsessed with board games, and maybe because it largely attracts researchers who are gaming enthusiasts. Deepmind is not just some nebulous google research center plotting world domination, they are also a prestige project and they have some degree of autonomy. You can’t be completely cynical about them.
Fact of the matter is, everyone in this thread who was taking this tone of “suck it up, Deepmind considers SC2 beneath itself, it will vulture-like scavenge what it can from it and then move on, their true goal is skynet/world domination”... they are likely to be proven wrong as the co-leads of AlphaStar already conceded they would look at the APM limits and probably adjust them.
Of course you shouldn’t trust them, but they just aren’t beyond sentimental and moral considerations.
I can see how you got to your conclusion that I have a very low and/or cynical opinion of Deepmind/Google. Especially since that's not exactly an uncommon opinion. But you've got it totally backwards. I love Deepmind and Google. I think AlphaStar is both technically interesting and very entertaining.
What annoys me is seeing all the prejudice surrounding AlphaStar, the misconceptions on how ML works, and the general ignorance about anything technical. Of course there are legitimate criticisms to be levelled at the way Deepmind has approached Starcraft with AlphaStar. But it's annoying, to say the least, when laymen pretend at expertise.
Google and Deepmind are doing Starcraft a favor by bringing so much attention. And yet so many people react by immediately attacking them and their work, in many cases without the slightest understanding of the technical aspects involved.
This video was quite disappointing. The AI has numerous mechanical advantages over the human players. Under these circumstances we cannot learn anything about strategy or the hidden beauty that is left in SC2.
1. APM was up to 1000 in the Blink Stalker battles, as far as I have seen.
2. Click precision should be lowered to match humans.
3. Perception should be lowered so it can't detect invisible units immediately.
4. It shouldn't be able to perceive the whole map and act on more than one screen at a time.
On February 02 2019 09:32 Greenei wrote: This video was quite disappointing. The AI has numerous mechanical advantages over the human players. Under these circumstances we cannot learn anything about strategy or the hidden beauty that is left in SC2.
1. APM was up to 1000 in the Blink Stalker battles, as far as I have seen.
2. Click precision should be lowered to match humans.
3. Perception should be lowered so it can't detect invisible units immediately.
4. It shouldn't be able to perceive the whole map and act on more than one screen at a time.
It's an AI research project, not a 100% fairness project. The AI is not intended to compete in professional leagues. It's intended to further research in the field of AI, and as someone else said, StarCraft is merely the vessel of choice.
Complaining that the AI has superior micro is the equivalent of complaining that chess computers can calculate millions of positions per second (Deep Blue reached a peak of around 120 million positions per second when it beat Kasparov). Turning that argument around, you could also complain that the human can more easily outsmart the AI, and that it isn't fair to the AI.
Both are stupid arguments. Humans and computers aren't the same, and they're not intended to compete, which is why they generally don't. In chess, humans compete in human-only tournaments, and AIs compete in AI-only tournaments. And when humans and AIs do compete against each other from time to time, it's either (1) for fun, (2) for show or (3) for learning. It's NOT for competition.
It’s actually a pretty big problem if AIs win due to outright superior mechanics, because it makes the game a far easier problem to solve. If you want to push the field of reinforcement learning using a strategy game that relies on mechanics as well as strategy, you need a relatively fair fight to do so.
On February 01 2019 23:46 Dangermousecatdog wrote:
On February 01 2019 01:58 Polypoetes wrote:
On February 01 2019 01:24 Dangermousecatdog wrote: Polypoetes, you make an awful lot of assumptions that don't quite bear out. Pro players generally just want to win, not to win decisively. You get the same ladder points and tournament money no matter how much you think you have won or lost a game by.
But this completely ignores what it means to be human and how humans actually learn. I am making assumptions? Your claim is literally that humans are able to objectively take their human experiences, objectively form their goal, and rewire their brains so it happens more. That is not how humans learn.
Humans learn by reinforcement learning as well. But what is the reinforcement? You clicking on the screen, trying to kill the enemy army, and it either working or failing? Or you looking at the ladder points after the game?
What are you even saying? You quote me but don't actually say anything that engages with what I am saying. Your assumptions are still false assumptions. And then you write some nonsense. Do you even play SC2?
Are you that dense? Humans don't make a conscious effort to learn. Playing RTS is mostly instinct. Of course a player is trying to win. The question is whether and how a player knows what actions make her win. Ladder points are not the reinforcement for human learning. The human experience is how they learn. This is why you can and do learn from false reinforcement. For example, beginning players turtling up: they think they are playing better because the game lasts longer.
You call them 'assumptions', but this is exactly in line with everything I have heard modern experts on human learning say. It also makes sense in my scientific world view. Yet your view is that humans will themselves to learn in an objective way because of ladder points and tournament money. Absurd!
Everything I said exactly addresses your absurdly false claims. But this must be your 'tactic'.
You ask me if I play SC2. Of course I don't. I think it is a bad and boring game, which is borne out by AlphaStar. It shows that the perfect way to play, either as an AI or as a human, is to 'circumvent' interesting game play and rely on mechanics and superior micro of one or two units.
But why is that relevant? We aren't talking about SC2. We are talking about how humans learn and how AIs learn. I would like to ask you 'Do you even code?' or 'Do you even have a general understanding of the cognitive sciences?'. But it seems clear to me that you have problems thinking, comprehending the English language, expressing yourself in the English language, or all three.
User was temp banned for this post.
So, no you don't play SC2. I thought so. And from the sounds of it no other game or sport either. It's fairly obvious that you are talking out of your arse.
On February 02 2019 09:32 Greenei wrote: This video was quite disappointing. The AI has numerous mechanical advantages over the human players. Under these circumstances we cannot learn anything about strategy or the hidden beauty that is left in SC2.
1. APM was up to 1000 in the Blink Stalker battles, as far as I have seen.
2. Click precision should be lowered to match humans.
3. Perception should be lowered so it can't detect invisible units immediately.
4. It shouldn't be able to perceive the whole map and act on more than one screen at a time.
Complaining that the AI has superior micro is the equivalent of complaining that chess computers can calculate millions of positions per second
It obviously isn't. There are no mechanics in chess; what we're testing is the decision making. That is also what should be tested in Starcraft. It's not impressive to learn that a program works faster than a hand; anyone could have told you that.
On February 02 2019 09:32 Greenei wrote: This video was quite disappointing. The AI has numerous mechanical advantages over the human players. Under these circumstances we cannot learn anything about strategy or the hidden beauty that is left in SC2.
1. APM was up to 1000 in the Blink Stalker battles, as far as I have seen.
2. Click precision should be lowered to match humans.
3. Perception should be lowered so it can't detect invisible units immediately.
4. It shouldn't be able to perceive the whole map and act on more than one screen at a time.
It's an AI research project, not a 100% fairness project. The AI is not intended to compete in professional leagues. It's intended to further research in the field of AI, and as someone else said, StarCraft is merely the vessel of choice.
Complaining that the AI has superior micro is the equivalent of complaining that chess computers can calculate millions of positions per second (Deep Blue reached a peak of around 120 million positions per second when it beat Kasparov). Turning that argument around, you could also complain that the human can more easily outsmart the AI, and that it isn't fair to the AI.
Both are stupid arguments. Humans and computers aren't the same, and they're not intended to compete, which is why they generally don't. In chess, humans compete in human-only tournaments, and AIs compete in AI-only tournaments. And when humans and AIs do compete against each other from time to time, it's either (1) for fun, (2) for show or (3) for learning. It's NOT for competition.
Why do you think they limited the APM of the engine at all then? In the interview they said that they are happy if new strategies emerge from the AI and humans can learn something from it for their own game. This goal is inconsistent with the AI having large APM. My criticism is that they did not go far enough to ensure fair play.
Furthermore, I said that I was disappointed in it because we can't learn anything about strategy. I didn't say anything about fair competition. Microbots just aren't that fun to look at.
They said that each agent requires 16 TPUs to train, but I don't know how many agents compete in the league. But let's assume it's about a thousand. That means 16,000 TPUs running in parallel (for reference, AlphaZero used 5,000). The maximum pricing given is $4.50/h. That comes down to a cost of ~1.7 million dollars a day, and about 25 million dollars for the full two weeks of training, not including the initial phase of learning from replays.
But I picked the number of agents at random; maybe I'm off by an order of magnitude. And obviously Google's pricing of TPU services to outside clients is not the same as its internal assessment of the cost for in-house research teams.
Someone on Reddit said that it would cost them 25 million a day to train AlphaStar, but I don't know how that person came up with those numbers. Possibly many people within the SC2 AI or ML community have access to inside information.
But if these numbers are roughly correct, and if you assume that a functioning AI that works for all match-ups and all maps and all patches would require ten times the training, that's hundreds of millions of dollars for training an AI. And that's disregarding the labor costs of the dozens of people working on this project for the last couple of years.
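For what it's worth, the arithmetic in the estimate above is easy to sanity-check in a few lines. Note the 1,000-agent figure is the poster's admitted guess, not a confirmed number:

```python
# Back-of-envelope AlphaStar training cost, using the numbers from the post.
agents = 1000              # assumed: order-of-magnitude guess, not confirmed
tpus_per_agent = 16        # stated in the Deepmind AMA
price_per_tpu_hour = 4.50  # maximum public Cloud TPU pricing quoted, USD

tpus = agents * tpus_per_agent                 # 16,000 TPUs in parallel
cost_per_day = tpus * price_per_tpu_hour * 24  # ~$1.7 million per day
cost_14_days = cost_per_day * 14               # ~$24 million for the league

print(f"{tpus} TPUs, ${cost_per_day:,.0f}/day, ${cost_14_days:,.0f} total")
```

So the ~1.7M/day and ~25M/two-weeks figures are internally consistent, given the guessed agent count.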
some really moronic people in here, the point of google AI is to outthink a human, not outclick a human. no one would be impressed with an AI that can outmicro pros.
if you want an AI that uses optimal micro to beat a human player, then I'm sure the hardest blizz AI with a marine splitting/stalker blinking/roach burrow cheat would have beaten humans about 5 years ago.
On February 03 2019 13:03 shadymmj wrote: some really moronic people in here, the point of google AI is to outthink a human, not outclick a human. no one would be impressed with an AI that can outmicro pros.
if you want an AI that uses optimal micro to beat a human player, then I'm sure the hardest blizz AI with a marine splitting/stalker blinking/roach burrow cheat would have beaten humans about 5 years ago.
And believe it or not, the Deepmind engineers are well aware of that. Which is exactly why they've designed AlphaStar as they have.
This is the first time in a while I've taken time to watch SC2, not for lack of love of the game but lack of time, and holy shit this was fascinating!
I am much less impressed with the match score (10-1) now that I've learned the computer could see the whole map at once in the first 10 games, meaning it didn't have to spend APM adjusting the screen while executing that amazing Blink Stalker micro. But the presentation was strong and they were very transparent, which I appreciate.
Probably the most interesting thing overall is that it seems to have made a case for oversaturating mineral lines in the beginning. I was a Zerg player, so the possibilities for some of the more complex Zerg ability interactions in the future really fascinate me!
I'm sorry, but falling back on the "inhuman micro is unfair" argument is childish.
1. It's an artificial intelligence. 2. It's programmed to play in the most optimal way. 3. Its training is also inhuman; playing other AIs will reinforce inhuman tactics and the most efficient inhuman APM.
4. The agents all play different strategies; while most of them used Stalkers as the core of their army, they varied greatly in how they built around them.
So when criticising AlphaStar, I really wish people would consider both the constraints and the advantages it has.
You cannot say definitively that X player trained to play against Y bot will destroy it. As far as I can tell, across all 11 games AlphaStar's openings were nearly flawless. It was aggressive and paid attention to buildings, units and positioning.
Let's consider the proxy gate game. We talk about it boldly moving up the ramp, but I think many pros would have done the same, considering the pressure needed to win that particular game. If it had not moved up that ramp, it would have lost, especially since the only units it had were Stalkers against a Robo, a Sentry and a few Stalkers. Not to mention, as I said before, how AlphaStar weighs future probabilities and the actions necessary to win.
By all means AlphaStar isn't unbeatable, but you are severely underselling its achievements. That's just downright ignorant, opinionated thinking.
P.S. I would make a more refined post but I'm not up for the challenge at 4am in China. I'm not arguing for or against, but I really think people need to analyze this situation with more than "I think" or "it thinks". No one should be "winning" reviews; the point of a review is to expose mistakes and weaknesses.
I also really dislike people saying, "If this happened then X would win," when in fact we hardly know anything about how AlphaStar would react to the situation.
"But in the last game..." Yeah, the last game was one agent, with more human-like camera control. We actually don't know if the other agents would respond the same way. Our sample pool is just too small to be criticizing and arguing over optimal APM.
Since the non-handicapped versions went 10-0, I think there is no basis to say that they played with flaws that could be abused.
A very important note that most people are missing:
The version of AlphaStar that dropped a game had only been trained for 7 days, versus the 14 days of by far the most impressive one. Training for half the time caused an MMR drop of around 1000, while using the camera interface only caused an MMR drop of around 200.
I wrote about the three significant impacts of AlphaStar from a data scientist perspective: the arrival of real-time AI, the shift beyond supervised machine learning, and the future of mutual learning between AI and humans.
Back propagation calculates the derivatives for whatever you want to do. You can do Newton-Raphson, BFGS, DFP, etc. You can very well use back propagation and go uphill.
I'm just correcting your wrong terminology and you're being belligerent without reason.
Dude, read it again. You wrote a totally nonsensical line. Back propagation does not go uphill because it does not go anywhere. You don't think it's valid to point out that you're mixing up the buzzwords, but you think it's reasonable to point out a typo? I'm not disputing the rest of your post (nor am I supporting it). Instead of getting fired up because of very local, concrete and polite criticism, you could just write "yeah I got'em mixed up".
I didn't read your whole post, that's why I'm not disputing it. I read the very beginning and there's a glaring error there. I know very little about machine learning, but I happen to study optimization so I took my time to clarify.
Back propagation is not an optimization algorithm. It's also not "NN or ML". It's an algorithm created to differentiate functions expressed as the composition of functions whose derivatives are known. It's older than both of us and it's basically the chain rule, known for hundreds of years.
The optimization algorithm used in most neural networks is stochastic gradient descent (btw, it goes uphill sometimes because it's stochastic). It is not the only one. L-BFGS has also been used with some success. It is not purely gradient based (it uses approximate second-derivative information), so it can also go uphill.
If someday you decide to learn something instead of fighting people who are helping you, I suggest you read Nocedal's book, it's a very good introduction.
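For anyone following along, the distinction is easy to see in a toy sketch (my own illustration, nobody's actual training code): backprop is just the chain rule producing a derivative, while gradient descent is the separate update rule that consumes it, and you could just as well use the same derivative to go uphill.

```python
# Toy model: one weight w, one data point (x, y), loss = (w*x - y)^2.
def forward(w, x, y):
    # composition of functions: u = w*x, then loss = (u - y)^2
    u = w * x
    loss = (u - y) ** 2
    return u, loss

def backward(w, x, y):
    # "backprop": the chain rule applied through the composition,
    # from the output back to the parameter. No optimization here.
    u, _ = forward(w, x, y)
    dloss_du = 2 * (u - y)   # derivative of (u - y)^2 w.r.t. u
    du_dw = x                # derivative of w*x w.r.t. w
    return dloss_du * du_dw  # chain rule: dloss/dw

w, x, y, lr = 0.0, 2.0, 4.0, 0.1
for _ in range(50):
    grad = backward(w, x, y)  # derivative from "backprop"
    w -= lr * grad            # gradient DESCENT; `w += lr * grad` would ascend

print(round(w, 3))  # → 2.0, the exact minimizer, since 2.0 * 2.0 == 4.0
```

The same `backward` output could be fed to Newton-Raphson or a quasi-Newton method instead; the differentiation step doesn't care what you do with it.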
LOL, amazing Bill O'Reilly tactics. I wonder if I'm chatting with Eliza.
I'll only reply to your second paragraph, because the rest is clearly gibberish. Stochastic gradient descent randomly selects a subset of the training set to optimize over. It tunes the parameters of the network so as to make the error over this subset go down. It does NOT guarantee that the true error, over the whole set, will go down. And we're not even talking about generalization, which is the real difficulty with neural networks.
Now that I know that you're trolling, I'm out of this stupid debate.
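To illustrate the minibatch point in a few lines of Python (a made-up one-parameter fit, purely a sketch): each update is computed from, and directly reduces, the error on a random subset; the error over the full set is only driven down on average, not on every step.

```python
import random

random.seed(0)  # deterministic for the example

# Toy problem: fit y = w*x to data generated with true slope 3.0.
data = [(x, 3.0 * x) for x in range(-5, 6)]

def mse(w, batch):
    # mean squared error of the model over a set of (x, y) pairs
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)

w, lr = 0.0, 0.01
for step in range(500):
    batch = random.sample(data, 3)  # the "stochastic" part: a random subset
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    w -= lr * grad  # this step reduces mse(w, batch) ...
    # ... but mse(w, data) over the FULL set can go up on any single step

print(round(w, 3))  # ends very close to the true slope, 3.0
```

With a convex toy loss like this the noise washes out and w converges anyway; the lack of a per-step guarantee on the full-set error is exactly the "stochastic" caveat in the post above.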
I don't know how many agents compete in the league. But let's assume it's about a thousand.
Only a few dozen agents by the end of it. Did they say 16 TPUs per agent specifically?
From the reddit AMA, here is one of David Silver's answers:
In order to train AlphaStar, we built a highly scalable distributed training setup using [Google's v3 TPUs](https://cloud.google.com/tpu/) that supports a population of agents learning from many thousands of parallel instances of StarCraft II. The AlphaStar league was run for 14 days, using 16 TPUs for each agent. The final AlphaStar agent consists of the most effective mixture of strategies that have been discovered, and runs on a single desktop GPU.
so yeah 16 TPUs per agent.
Yup, and "many thousands". But I'm not sure you can use commercial computing prices. I don't know if there's any way of taking only cost price, which is essentially manufacturing and electricity (and the former is almost certainly negligible in comparison to the latter).
This micro thing got me reading the SC2 API docs to understand how AlphaStar talks to SC2.
As others have already pointed out, the SC2 API is unfair by design, since it lets the bot perform actions on units directly rather than moving some kind of mouse pointer and issuing clicks (or key presses, whatever). This approach to interfacing with a game is understandable, considering that all previous bots failed to beat a human even with that extremely unfair API. Why make anything human-like if bots can't win even with unlimited control features?
However, what makes me angry is how all those "bot vs human" matches are presented. Am I exaggerating, or are the developers of AlphaStar trying to claim that their bot is "as human-like as possible"? (1) It's a kind of PR, right?
I didn't dive into the details of what the "Rendered" interface is, but I assume it is as close to the human<->SC2 interface as possible. I also assume that AlphaStar uses "Feature Layer" or "Raw" instead. Is there any article describing which API the bot uses, and any proof that the choice is fair enough (i.e. human-like)? (2) So far I can't find any.
So, if the answer to (1) is "yes" (meaning the developers DID claim their bot is "as human as possible") and the answer to (2) is "no" (meaning there is no proof that the claim is true), then the developers are trying to cheat the community.
I'm not against bots that cheat in a game. I'm against developers who cheat the SC community (which mostly consists of people who are not professionals in the programming field) by keeping quiet about the overpowered interface their cheating bot has.
Does anyone know if they plan to alter the parameters and redo the tests so the AI can focus more on the strategy rather than the micro aspect of the game? I mean there was obviously a lot of decision making and early game choices involved by the AI, but I feel we could make the test even more interesting by tweaking a few things.
For example, involving camera movement more and removing its ability to see the whole map and react to everything instantly. Spotting something on the minimap and having to move your camera there to react, while executing something on the other side of the map and keeping up your macro, is not simply a mechanical decision but a strategic one that players make because of APM limitations. So I think it would be really interesting to challenge the AI more and see how it utilizes its APM under different rules. Camera movement and positioning are also part of proper macro and eat up a chunk of a player's APM, so focusing more on macro could lead to neglected micro and vice versa.
Not only that, but human APM is itself heavily inflated. If we look at the conscious actions a human can make in a single second, there really aren't that many non-spam actions we can physically perform, while an AI that can see the whole map and control everything with 250 precise APM is above anything a human will ever be capable of. And I know this isn't about humans, it's about the advancement of the AI, but I believe that limiting its ability to win through inhuman mechanics would lead it to figure out other build orders and strategies to win earlier or later in the game, and that would be a vastly more interesting experiment.
So far this is very impressive though and I hope they have more in store for us!
The reason it won was that it was more precise, not better at anything else. Precision is what made it win every match it played. It was definitely worse at decision making, though it did have crazy micro skills.
Not only that, but human APM is itself heavily inflated. If we look at the conscious actions a human can make in a single second, there really aren't that many non-spam actions we can physically perform, while an AI that can see the whole map and control everything with 250 precise APM is above anything a human will ever be capable of. And I know this isn't about humans, it's about the advancement of the AI, but I believe that limiting its ability to win through inhuman mechanics would lead it to figure out other build orders and strategies to win earlier or later in the game, and that would be a vastly more interesting experiment.
^ This!
I have no confidence that they will put any effort into doing this, though. In chess they just played a match against Stockfish under their own conditions and then disappeared, claiming Alpha is by far the strongest chess entity. There are concerns that it was just a publicity stunt, because Stockfish appeared to be making weaker moves than it normally would.
Here they will probably pull similar crap by beating the human pros with inhuman mechanical efficiency, like in the MaNa match, and portraying it as if it's so great.
Hopefully I'm wrong. They should have Alpha compete in TCEC once in chess, and in SC2 give it legitimately human mechanics instead of pretending that a human can use their APM as efficiently as Alpha does. They don't even appear to acknowledge that it's way too mechanically efficient in SC2 right now. It seems they're parading it as being on an equal playing field with a human, to an audience that doesn't know any better.
Demis Hassabis, of DeepMind, on their work with self-learning systems:
He talks about AlphaStar from 32:50. Most of it is familiar if you watched the exhibition matches, but there's an interesting bit at 39:24 where he talks about how they added intrinsic motivations to get build diversity (and you can see the rise of the Stalker anyway).