Balance and Bonjwas: A Statistical Analysis

Lightwip

United States5497 Posts

March 29 2012 02:49 GMT

Brood War is one of the most balanced games ever created. In fact, it may be the most balanced competitive game ever created, an amazing feat given that its three races are about as different from each other as they could possibly be. The game is so well balanced that it's very hard to tell which race is superior when just starting to play. Yet, although many fans of the game find it hard to admit, there is one race which is clearly superior to the others. Meet Brood War's favorite child: Terran.

You will never find a more wretched hive of scum and villainy.

In this post, I will present and explain my findings on this issue, and attempt to show that Terran has the upper hand both in general play and especially in the creation of bonjwa players. I won't deny that I'm pretty biased on this subject, but I will try to set that aside and justify my position through facts and numbers. It's as easy to say "tank imba, vulture imba" as it is to say "zealot imba, ultralisk imba" without any proof. In truth, I believe that no one unit makes Terran superior, but simply the presence of many options and outs in any given situation. However, let's consider facts rather than opinions. For those of you familiar with statistics, this should be pretty straightforward. For those of you who aren't, I'll explain as well as I can.

My analysis will be based off the aggregate scores of all pro games played from the start of 2002 until present day that are listed in the TLPD. A few notes:
1. There can be no complaints of a lucky season or an outlier player. The Six Dragons era is statistically insignificant. Swarm Season is statistically insignificant. Flash, Boxer, Oov, and Nada are also all statistically insignificant. Maps are also statistically insignificant as an aggregate. All these factors balance out into a value that would be very hard to refute.
2. Over ANY 3-4 year period, these numbers are about equal. Savior's innovations, Nal_rA's innovations, etc. are all insignificant over time because all the other races are, in the long term, able to compensate for these differences. These statistics are not a relic of a pre-Savior past.
3. 2002 is the logical starting point because it is after the release of patch 1.08, the last significant balance patch. It would make sense to start as early as possible, but not to evaluate a game with a different set of rules.
4. It would be neither viable nor useful to look at non-pro games. At any level other than pro, the balance is irrelevant because players simply aren't good enough. If you're not a pro, you pretty much lose only because the opponent played better. Balance is more significant at a higher level, in general (the same rule applies for chess, where white is imba).
5. "If terran is imba, why don't terrans win EVERYTHING?" Because terran is only slightly imbalanced. But as I will demonstrate, a little is enough.
6. It really doesn't matter whether the league victories are concentrated in a single player or spread around many, because if one player wins, then every other player cannot. It only makes sense that the better player will win more often than not.
7. Semifinalists and silver winners are completely irrelevant. There are dozens of mediocre and outright bad players who have made semifinals and even finals. Yet you'll be hard-pressed to argue that ANY of the starleague winners clearly didn't deserve to win. A few are debatable, but even ones like July or Casy aren't clearly unworthy. Too many non-winners who got close are, though.

Let's start by looking at the MSL and OSL titles won by each race.
MSL
Terran: 12
Zerg: 10
Protoss: 4
Total: 26

OSL
Terran: 14
Zerg: 10
Protoss: 9
Total: 33
There really isn't much difference between the MSL and the OSL that would affect this experiment, so we can merge the two and express each race's share of titles as a percentage.

MSL+OSL
Terran: 26/59 = 44.1%
Zerg: 20/59 = 33.9%
Protoss: 13/59 = 22%
There's also no need to differentiate between Zerg and Protoss in this test, so we can simply make this in terms of Terran vs. non-Terran.

Terran: 26/59 = 44.1%
Non-Terran: 33/59 = 55.9%
And now, we're ready to conduct a test.

For anyone familiar with statistics, one helpful tool is the hypothesis test. It starts from an assumed (null) value for a statistic and asks whether the value observed in a sample of a population is likely to appear by chance if the null is correct. In this case, the sample is all leagues to date and the population is all leagues, played and unplayed. Since we do not know the standard deviation (spread) of the population (it is impossible to acquire in this situation), we will use a one-sample t-test with all leagues played as the sample.

Our hypotheses are:
H0: μ=.33 (null hypothesis: the population mean is .33, or a fair 1/3 chance for Terran to win)
Ha: μ>.33 (alternative hypothesis: the population mean is larger than .33, so Terran has a larger than even chance of league victory).

The easiest way to conduct this test is to create a table with the values. It would be long and pointless to list, but it consists of 26 1's to indicate 26 Terran league wins, and 33 0's to indicate Terran losses in leagues. So we run the test:
n(number of sample points): 59
s (sample standard deviation; the standard error is s/sqrt(n), about .065): .501
t value (test statistic: distance of the sample mean from the null mean, in standard errors): 1.698
p value: .0475 or 4.75%
Sample Mean: .441 or 44.1%(this was calculated earlier)
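
For anyone who wants to reproduce these figures, here is a minimal sketch of the one-sample t-test using only the Python standard library. It assumes the 26 ones and 33 zeros described above and the null value of .33 exactly as written; the helper names (`t_pdf`, `upper_tail`) are made up for illustration, and the p-value comes from a simple numerical integration rather than a statistics package, so treat the last digit as approximate.

```python
import math
import statistics

# 26 Terran titles (1) and 33 non-Terran titles (0), as described in the post.
titles = [1] * 26 + [0] * 33
NULL_MEAN = 0.33  # the post's null value; an exact 1/3 would give a slightly smaller t

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """One-sided p-value P(T >= t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n = len(titles)
mean = statistics.mean(titles)
sd = statistics.stdev(titles)   # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)          # standard error of the mean
t_stat = (mean - NULL_MEAN) / se
p_value = upper_tail(t_stat, n - 1)

print(f"n={n} mean={mean:.3f} sd={sd:.3f} se={se:.4f} t={t_stat:.3f} p={p_value:.4f}")
```

The .501 in the list above matches the sample standard deviation in this sketch; the standard error that goes into the t value is that figure divided by sqrt(59).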

Now, most of these values are pretty inconsequential, and are only listed for the sake of completeness. The important thing here is the p value, which is the chance that a sample mean at least this high would appear in a population with the mean stated in the null hypothesis. As a rule of thumb, if the p value is less than .05 (5%), there is pretty strong evidence against the null hypothesis and in favor of the alternative one. It's not so strong that it's beyond a shadow of a doubt, but this test shows that we have pretty good evidence that Terran does indeed have a higher than fair chance of MSL/OSL victory.

This raises the question: how much higher? Well, let's build a new model. To do this, we'll need a winrate for all matchups. I added up all the games from 2002 onward (patch 1.08), and here is the result:
TvZ: 6549-5490 (54.40%)
ZvP: 5162-4280 (54.67%)
PvT: 4782-4317 (52.56%)
For anyone involved in BW, this T>Z>P>T trend is not at all surprising. Nor should the ZvP>TvZ>PvT trend be unexpected. At a quick glance, it's obvious that these results are ever so slightly favorable for Terran. If we were to weight the percentages of each matchup equally (with the mirror being 50%):

Terran: 54.40*(1/3) + 47.44*(1/3) + 50*(1/3) = 50.61%
Zerg: 54.67*(1/3) + 45.6*(1/3) + 50*(1/3) = 50.09%
Protoss: 52.56*(1/3) + 45.33*(1/3) + 50*(1/3) = 49.30%
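
The equal-weight averages above can be recomputed from the raw records with a short sketch like this one. It only assumes the three aggregate records listed earlier and a 50% mirror; the names `records`, `vs`, and `equal_weight` are illustrative. Because it works from unrounded rates, the last digit can differ slightly from the hand-rounded figures.

```python
# Aggregate non-mirror records from the post: (wins, losses) for the first-named race.
records = {"TvZ": (6549, 5490), "ZvP": (5162, 4280), "PvT": (4782, 4317)}

def win_rate(wins, losses):
    """Fraction of games won by the first-named race."""
    return wins / (wins + losses)

tvz = win_rate(*records["TvZ"])
zvp = win_rate(*records["ZvP"])
pvt = win_rate(*records["PvT"])

# Each race's rate against every opponent race; mirrors are fixed at 50%.
vs = {
    "Terran":  {"Terran": 0.5, "Zerg": tvz, "Protoss": 1 - pvt},
    "Zerg":    {"Terran": 1 - tvz, "Zerg": 0.5, "Protoss": zvp},
    "Protoss": {"Terran": pvt, "Zerg": 1 - zvp, "Protoss": 0.5},
}

# Equal weighting: every opponent race counts for one third.
equal_weight = {race: sum(row.values()) / 3 for race, row in vs.items()}

for race, rate in equal_weight.items():
    print(f"{race}: {rate:.2%}")
```
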
These are essentially the odds that a player faces in Proleague. So basically, a probable Terran is slightly more likely to win a given game than a probable Zerg or probable Protoss. However, even over a large period of time this isn't going to produce results that are especially telling. Terran will have a higher winrate, but not by much. The imbalance truly comes out in the individual leagues. So let's look at a starleague.

Welcome to the Lightwip Hypothetical Starleague!

The Starleague proper consists of 36 players: 13 Terran, 13 Zerg, and 10 Protoss. Starleagues, unlike Proleague, are not race-balanced; the lower total winrate of Protoss actually hurts the chances of qualifying. If you look at every league in history, Protoss usually qualifies less than Terran and Zerg.

While I could average results from hundreds of simulations to find out the winrate, it would be too difficult to account for all factors and honestly not much more accurate. Therefore, the Lightwip HSL shall have a different set of rules: Victory is winning 16 of 20 games. This is pretty comparable to winning a Starleague proper, even if not exactly the same. By all means, it's a good proxy variable.
Let's calculate the win percentages by race and player count (mirrors are again 50%):
Terran: 54.40*(13/35) + 47.44*(10/35) + 50*(12/35) = 50.9%
Zerg: 54.67*(10/35) + 45.6*(13/35) + 50*(12/35) = 49.7%
Protoss: 52.56*(13/35) + 45.33*(13/35) + 50*(9/35) = 49.2%
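
The field-weighted rates above can be sketched as follows. This assumes the rounded matchup rates from the post, the 13/13/10 field, and that each opponent is drawn uniformly from the other 35 players (which is what the 35 in the denominators implies); `expected_rate` is an illustrative name, not anything from TLPD.

```python
# Per-game rates from the post, as fractions; mirrors are fixed at 50%.
TVZ, ZVP, PVT = 0.5440, 0.5467, 0.5256

vs = {
    "Terran":  {"Terran": 0.5, "Zerg": TVZ, "Protoss": 1 - PVT},
    "Zerg":    {"Terran": 1 - TVZ, "Zerg": 0.5, "Protoss": ZVP},
    "Protoss": {"Terran": PVT, "Zerg": 1 - ZVP, "Protoss": 0.5},
}

# Hypothetical Starleague field from the post: 36 players.
field = {"Terran": 13, "Zerg": 13, "Protoss": 10}

def expected_rate(race):
    """Average per-game win rate against a uniformly random opponent from the field."""
    opponents = dict(field)
    opponents[race] -= 1  # a player never faces himself, leaving 35 opponents
    total = sum(opponents.values())
    return sum(vs[race][opp] * count for opp, count in opponents.items()) / total

weighted = {race: expected_rate(race) for race in field}

for race, rate in weighted.items():
    print(f"{race}: {rate:.1%}")
```
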
This becomes slightly more Terran-favored. By the binomial distribution (which applies when each game is a win or a loss with a known probability, as here), the chance for a hypothetical player of each race to win at least 16 of 20 games is as follows (really low, because 16/20 is an insane record):
Terran: .738%
Zerg: .548%
Protoss: .483%

Scaled,
Terran: 41.7%
Zerg: 31.0%
Protoss: 27.3%
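
Both lists above can be reproduced with a small sketch of the binomial tail, assuming the rounded per-game rates (50.9%, 49.7%, 49.2%) and reading "reach 16" as "win at least 16 of 20 independent games". The function name `tail` is illustrative.

```python
from math import comb

def tail(p, wins=16, games=20):
    """P(at least `wins` wins in `games` independent games), each won with probability p."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))

# Rounded per-game rates for a "probable" player of each race, from the post.
rates = {"Terran": 0.509, "Zerg": 0.497, "Protoss": 0.492}

chances = {race: tail(p) for race, p in rates.items()}

# "Scaled": each race's chance as a share of the three chances combined.
total = sum(chances.values())
shares = {race: chance / total for race, chance in chances.items()}

for race in rates:
    print(f"{race}: chance {chances[race]:.3%}, scaled {shares[race]:.1%}")
```

Note that the scaling step compares one hypothetical player per race; it does not multiply by the number of players of each race in the field.
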
For the most part, these statistics mirror actual SL results, reposted below.

Terran: 26/59 = 44.1%
Zerg: 20/59 = 33.9%
Protoss: 13/59 = 22%
Zerg actually has a slightly higher winrate while Protoss has a smaller one, but predictions are not perfect. It's close enough, at any rate. We could conduct another t-test to see whether 44.1% is far from 41.7%, but I think it's obvious that the new model is a good enough fit for all three races.

Like any statistical analysis, this one is not perfect. There are two things that ought to be considered: bias and confounding variables.

Let's start with bias. Quite simply, there is none. We're not using any data that could be skewed by any form of human tendencies because all these numbers are a fact.

Now, as far as confounding variables go, there are two things to consider.
The first is a simple problem: scaling up to 1. We did this to form our model above. This is not necessarily going to ruin anything, but admittedly it's not exactly 100% reliable. While it could generate meaningless data, I think it's not a problem. I could be wrong though, and the model is certainly imperfect.

The second is a bit more tricky: mirror matchups. Zerg and Protoss mirrors often devolve into a coinflip, which allows good players to be defeated by worse players. One consequence of this is that zerg and protoss titles are less concentrated in a few key players, but rather in a bunch of weaker ones. Terran, on the other hand, features numerous key players holding a good number of the titles. As I mentioned before, for one player to win, all others must lose(and the best is most likely to not lose), so simply chalking this up to skilled players is not enough. And on top of that, skilled players are more likely than unskilled players to win against all races, subject to the same conditions. So when a Terran key player advances from a TvT, he'll have more chance than any other Terran of winning his next game against Zerg or Protoss.
Now, this problem would not cause our experiment to incorrectly conclude that Terran has an unfair advantage; on the contrary, if anything it would cause us to underestimate Terran imbalance. But it also brings up an interesting topic: bonjwas. It seems that it is indeed easier for Terran to make bonjwas, simply because not only do Terrans have an advantage inherent in winrates, but they also have a mirror matchup that favors stronger players over weaker players to an extent higher than a coinflip. This certainly does help to explain why Terran spawns bonjwas so readily while Zerg and especially Protoss are hard-pressed to get one out.

I'd like to hear your thoughts and criticisms. Perhaps my logic, analysis, or numbers are somehow wrong. Please, point this out.

sharkeyanti

United States1271 Posts

March 29 2012 03:06 GMT

Good stuff. The general idea that Terrans are better rewarded for their skill is harder to see in non-professional games, but is absolutely true for the pros. Yea the 16/20 is perhaps a poor reflection of how starleagues are actually won, but seems okay based on the large data set. My main problem might be the typical "eye-test" complaint, as the reasons for people winning a starleague often have little to do with the basic premise of race-balance. But I will certainly accept that Terrans are more likely to be in a position of semi-finals/finals because of their racial "imbalance." It'd be tough to say that Bonjwas are created because of this though.

1a2a3aPro

Canada227 Posts

March 29 2012 03:18 GMT

There are several points on which to object to your analysis. I will name only a few of them.

Firstly, you are taking historical data from the beginning of Brood War. This is simply not fair. Pre-savior ZvT bears no resemblance to post-savior ZvT. The same can be said for mutalisk micro and stacking, or the popularization of forge fast expansion builds against Zerg. These changes revolutionized matchups that were, before this, heavily favoured towards one race. These older changes add skew to the %s and distribution.

The second point I want to make, is that just because the best players have been Terran, does not mean that Terran is the best race. The players growing up idolize and want to be like BoxeR, like NaDa, like oov. This causes a heavier skew on the ladder towards these races. It also causes there to be very good coaching for those respected races, from some of these players (I'm looking at you, oov).

Finally, how much skew is there, really? Winning 16 in a row is ridiculous, and is not a fair judge of a player's ability. A player with 70% across all matchups would not only be S class, they would be as good as Flash. This means a player can do WWLWWLWWLW, repeat, for their entire career, and always win Bo3s and have a great win rate in proleague. Why is it necessary to have such ridiculous streaks? I feel that you have some selection bias here: you are selecting a statistic that will of course heavily favour Terran, due to the volatility of PvP and ZvZ matches.

Overall, I think this is "see great Terran players, infer Terran bias, find a way to make statistics work to my conclusion." 1) Terran destroyed for a long time before players figured out different things, adds skew. 2) The %s we are talking about are very, very minute. Compare these to a game like WC3, and you will see what I mean (a couple of races come out clearly better). 3) Your criteria of large win streak is not the characteristic of a bonjwa. Someone with a consistent 70-75% winrate would simply completely dominate the game.

I will re-evaluate my #1 if you can prove the validity of this statement:

1. There can be no complaints of a lucky season or an outlier player. The Six Dragons era is statistically insignificant. Swarm Season is statistically insignificant. Flash, Boxer, Oov, and Nada are also all statistically insignificant.

Over ANY 3-4 year period, these numbers are about equal. Savior's innovations, Nal_rA's innovations, etc. are all insignificant over time because all the other races are, in the long term, able to compensate for these differences. These statistics are not a relic of a pre-Savior past.

Glockateer

United States254 Posts

March 29 2012 03:21 GMT

The percentage of wins each race has over each other here is within the realm of being statistically insignificant since groups of players can certainly skew that statistic. I think that the terran mirror match up is the most noteworthy reason for a bonjwa having an edge over his counterparts of other races.

[V]

United States905 Posts

March 29 2012 03:23 GMT

Truth.

Given all things equal, between three equally superlatively skilled Terran, Zerg, or Protoss, the Terran will have much strategy and specific units/abilities he can exploit to gain the advantage and win. Moreover, Terran allows for better recovery, defense, and adjustment in gameplay.

More than a decade of game statistics and the "bonjwa" record testify to this. I suspect that non-terran progamers feel an evil childlike pleasure every time they beat a Terran.

endy

Switzerland8966 Posts

March 29 2012 03:23 GMT

While I agree that on the ~32k games you aggregated "anomalies" like Flash do not impact on the race statistics figures, you cannot keep this assumption at your next step, it's like
1. Flash, Bisu, NaDa incredible win rates do not impact on 32k games. Ok.
2. Let's keep this assumption when we only have 36 players and 20 games for each player. You can't keep this assumption at all.

When someone like Flash has a 70% win rate in any matchup with great bo3/5 skills, he is way more imbalanced than the 50.9% race advantage.
Even if we suppose that at higher stages of Starleagues he will also face players who have a higher win rate, the odds of Flash winning are still way more than 50%.

And since this is about statistics and bonjwas, it also happens that during their strongest domination era (by that I mean ignore NaDa winning an OSL in late 2006, etc) bonjwas all had a 70%+ winrate. Maybe only sAviOr didn't because he often dropped games during series.

Lightwip

United States5497 Posts

March 29 2012 03:28 GMT

Winning 16/20 is indeed ridiculous. But the point isn't that that is expected of them. The point is to estimate relative probability of achieving this result, which puts Terran ahead by a sizeable margin. 12/20 or 14/20 would have similar results, but it's more apparent at 16/20.
No single player is significant enough to influence the results. This analysis incorporates well over 25,000 games, and it doesn't even include mirrors. No player's career, and no 3-6 month period, will make a sizeable difference in these results. As for the numbers themselves, it's simply that they do not fluctuate that much. There is a difference of at most 1% between eras, which is certainly not enough to reverse trends. An aggregate of the eras compensates for all new developments.
Having Boxer as a hero hardly makes anyone better than having Savior or Nal_rA as a hero.
Keep in mind, the analysis may not be perfect, but I fail to see how my methods would give a contrary-to-fact result.
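
The comparison between 12/20, 14/20, and 16/20 mentioned above can be checked with a minimal sketch. It assumes the rounded per-game rates from the opening post (50.9%, 49.7%, 49.2%) and the same "at least k of 20" binomial model; `scaled_shares` is an illustrative helper name.

```python
from math import comb

# Rounded per-game rates from the opening post.
RATES = {"Terran": 0.509, "Zerg": 0.497, "Protoss": 0.492}

def tail(p, wins, games=20):
    """P(at least `wins` wins in `games` independent games at per-game probability p)."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))

def scaled_shares(wins, games=20):
    """Each race's tail probability as a share of the three combined."""
    chances = {race: tail(p, wins, games) for race, p in RATES.items()}
    total = sum(chances.values())
    return {race: chance / total for race, chance in chances.items()}

for wins in (12, 14, 16):
    shares = scaled_shares(wins)
    line = ", ".join(f"{race} {share:.1%}" for race, share in shares.items())
    print(f"{wins}/20: {line}")
```

Under this model the Terran share rises as the threshold rises, so how large the edge looks depends on which threshold is chosen.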

The bonjwa point is that terrans have to be x better than their peers while P/Z have to be y, where x<y by enough to allow for quite a few more T bonjwas. If we were doing standard deviations, it would be 2.5 terran, 2.9 zerg, 3.4 protoss or something because 2.5 standard deviations above the probable terran is far enough to reach the bonjwa threshold.

On March 29 2012 12:18 1a2a3aPro wrote:
Finally, how much skew is there, really? Winning 16 in a row is ridiculous, and is not a fair judge of a player's ability. A player with 70% across all matchups would not only be S class, they would be as good as Flash. This means a player can do WWLWWLWWLW, repeat, for their entire career, and always win Bo3s and have a great win rate in proleague. Why is it necessary to have such ridiculous streaks? I feel that you have some selection bias here: you are selecting a statistic that will of course heavily favour Terran, due to the volatility of PvP and ZvZ matches.

If anything, this experiment severely understates the Terran balance advantage since all 3 mirrors are given the value of 50%.

shaftofpleasure

Korea (North)1375 Posts

March 29 2012 03:48 GMT

Ahhh. The dreaded Starcraft Statistics. This is like me hating bitter melon when the only donuts around are bitter melon flavored donuts.

L3gendary

Canada1469 Posts

March 29 2012 03:55 GMT

From a stats point of view ur reasoning with the 16/20 wins makes no sense. You can take the slightest difference in any 2 percentages, take them to some ridiculous power and then conclude that they're far apart...when they're not. Then you rescaled it out of 100%??! This is really disingenuous.

Lightwip

United States5497 Posts

March 29 2012 03:58 GMT

#10

On March 29 2012 12:55 L3gendary wrote:
From a stats point of view ur reasoning with the 16/20 wins makes no sense. You can take the slightest difference in any 2 percentages, take them to some ridiculous power and then conclude that they're far apart...when they're not. Then you rescaled it out of 100%??! This is really disingenuous.

You're right. Bonjwas are decided over the course of a single game, so taking a win rate over a large stretch of time is disingenuous. I should have scaled it out of a single game.

Scarecrow

Korea (South)9172 Posts

March 29 2012 04:12 GMT

#11

I agree that the reduced luck factor in TvT contributes to more good terrans progressing and hindering the success of great players in other races. Similar to how PvP in sc2 can be detrimental to great players progressing.

The starleague figures are pretty interesting. A lot of the low toss rate has to do with the historical difficulties toss has had with zerg, making a 16/20 run extraordinarily difficult. Same with historical problems in ZvT. I feel the game and maps today are far more balanced than they were in Boxer/Nada's era. Extrapolating base race winrates over such a long period and so many different map pools to the chance of a race winning a starleague (when it's more player dependent) doesn't seem that reliable, but I'm surprised how the numbers line up.

L3gendary

Canada1469 Posts

March 29 2012 04:14 GMT

#12

What im saying is you cant take some percentages that are very close, raise it to some arbitrary power and make conclusions from it. Are the differences in percentages even statistically significant? Where did the 10 protosses come from? Because that's really the only reason terran has the higher percentage in ur scenario because PvT is its worst mu and there are fewer P so that automatically makes it terran favoured.

etrensce

Australia337 Posts

March 29 2012 04:16 GMT

#13

On March 29 2012 12:58 Lightwip wrote:

You're right. Bonjwas are decided over the course of a single game, so taking a win rate over a large stretch of time is disingenuous. I should have scaled it out of a single game.

No but your methods are not correct statistically. Just conduct a simple p-test over the 32k games and you can see how insignificant the differences between TZP are.

Soft`Soap

Canada865 Posts

March 29 2012 04:19 GMT

#14

Honestly
it's all Boxer's fault

Lightwip

United States5497 Posts

March 29 2012 04:21 GMT

#15

On March 29 2012 13:16 etrensce wrote:

No but your methods are not correct statistically. Just conduct a simple p-test over the 32k games and you can see how insignificant the differences between TZP are.

Unless I'm thinking of something else, the significance tests would be pointless since the values are parameters. 32k games = game population.

oldgregg

New Zealand1176 Posts

March 29 2012 04:24 GMT

#16

16/20 wins for a Starleague seems kinda arbitrary. Wouldn't it be better to average the % games won out of games played for each Starleague winner on their starleague run, and use that %?

Your point about Terrans being more dominant because TvT is more skill based than ZvZ and PvP is quite interesting though, although Zergs and Protosses might find this offensive lol

Lightwip

United States5497 Posts

March 29 2012 04:27 GMT

#17

On March 29 2012 13:24 oldgregg wrote:
16/20 wins for a Starleague seems kinda arbitrary. Wouldn't it be better to average the % games won out of games played for each Starleague winner on their starleague run, and use that %?

Your point about Terrans being more dominant because TvT is more skill based than ZvZ and PvP is quite interesting though, although Zergs and Protosses might find this offensive lol

16/20 is arbitrary, but an accurate simulation is a coding nightmare. At the same time, I don't see how 16/20 would differ too much in results from a standard probable starleague.

I think it's universally accepted that PvP and especially ZvZ can be coinflips.

1004

United States104 Posts

March 29 2012 04:28 GMT

#18

honestly

if you had time to think hard about this and to write up this entire post

you have too much time on your hands...

go read a book
kiss a girl
play starcraft
SOMETHING other than just trying to do math to say terran is imbalanced.
"you shouldn't tell me what to do"
okay, then heed them as suggestions... but seriously, you clearly stated that it is completely irrelevant outside of the pro scene. not one person on these forums is a pro, therefore you merely just told everyone a cool story bro.

User was warned for this post

PurePwnageofTerran

268 Posts

March 29 2012 04:30 GMT

#19

My thoughts: Stimmed marine DPS is ridiculous.. esp against zerg
Siege tanks have a range of 12

How does a zerg counter siege tank/bio mid game before defilers?
I think that's where zerg's problem is.

L3gendary

Canada1469 Posts

March 29 2012 04:30 GMT

#20

On March 29 2012 13:28 1004 wrote:
honestly

if you had time to think hard about this and to write up this entire post

you have too much time on your hands...

go read a book
kiss a girl
play starcraft
SOMETHING other than just trying to do math to say terran is imbalanced.
"you shouldn't tell me what to do"
okay, then heed them as suggestions... but seriously, you clearly stated that it is completely irrelevant outside of the pro scene. not one person on these forums is a pro, therefore you merely just told everyone a cool story bro.

At least he's contributing something, you on the other hand are not.

OT i still wanna know why there are only 10 P? I feel like im missing something.
