|
nzb,
Bravo! This is awesome! I was listening to the state of the game yesterday and after hearing Liquid'Tyler's explanation I was sure this could easily be proved statistically. I was going to go off and write a program to do exactly this. I'm glad you beat me to it as it saves a great amount of time. Great presentation too!
The discussion on the state of the game podcast got me totally interested in tournament theory. For anyone else interested I ran into another great discussion about tournament theory here:
http://www.vrbones.com/2009/07/designing-tournament-part-1.html
I agree with Liquid'Tyler that the primary goal of a tournament should be to find the best player. Or to quote from the above site:
The primary goal of a tournament is to provide an objective method for finding the competitor with the highest true skill.
I think it's interesting that people dislike the extended series rule because it's not used in other tournaments. A better question might be why isn't the extended series rule used more often in other tournament formats? It clearly does a better job at determining more accurate results.
One thing that I'm really curious about is what the optimum tournament format would be. For example, say you wanted the most accurate results after playing a total of 'N' games: what's the best way to do that? What about tweaking the format so that the first round of games is a Bo1 and making it a triple elimination or quadruple elimination tournament? Would it be better to have fewer games played across a wider variety of players? Also, how efficient is the pool play system that the GSL tournament uses? It seems like a lot of very good players failed to qualify out of pool play.
Anyways, great post nzb!
|
On November 12 2010 11:13 Dragar wrote:
On November 12 2010 10:26 rasnj wrote:
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Well, that is precisely what I am wondering. Is it possible that rather than specifying a 'goal' for the tournament, you instead want to find the tournament framework that minimises the impact of the order in which the players face each other (i.e. ideally it shouldn't matter if Idra plays Huk in the first game or HDStarcraft) when determining the winner (or rankings, or whatever goal you want to pick for the tournament). The point of going to these lengths is to address exactly the difficulty you talk about - it's not realistic to impose a total order on the skill levels of the players (though it's obviously not terribly far out; we just want to know if it will affect the analysis or not).
This is a good idea. I think I will try to see if I can work out anything from this point of view tomorrow (3:30am here). Does anyone know where we can find the official mlg rules for map sets, rankings, brackets, map elimination etc.? Would like to analyze it using the mlg format.
|
My general thought is this. At this skill level, in best-of series, talent shakes out, and the more games 2 players play, the better the sample you get to truly determine the better player. Looking at an extended series as a bo7 (i.e., removing the time-between-games element), it does a better job of getting a winner, avoids ambiguous results and prevents "incorrect" results.
For example, there is simply no way Idra should lose to HD in a best of 7 if he is much more talented. Hence, if they play an early set and HD wins it 2-1, the reasonable expectation is that, if Idra is truly more talented, the second match-up should see a 2-0 or 2-1 result for Idra, making the overall score 3-2 or 3-3. If we assume that the vastly better player wins the best of 7 even when an extra game has to be played, Idra should win and move on whether the rematch is an extended series or a second bo3; there is no difference between the two. The extended series also avoids the problem of Idra winning the first set and losing the second, which allows the worse player to move on even if Idra leads 3-2 overall. Again, this is working under the assumption that the vastly more talented player should win every best of 7.
However, in even games, the extended series does a better job of truly showing who the better player is by increasing the sample of games. For example, if Idra and Nony play, most people would say the skill difference is negligible: whoever wins a bo7 would be the one who happened to be playing better that day, had a special build, or something like that. Say Idra wins the first series and they play again. Take the normal, non-extended format first. If Idra wins again, he was the better player that day, no problems; he won 4 of a possible 7 games. However, should Nony win with the same score as the first series, say both were 2-0, we have no way of telling who the better player was that day, just that Nony happened to win later on. Overall, with a 2-2 score, there is no actual result or closure; ambiguity remains. With an extended series, playing out the rest, someone has to definitively prove they were playing better that day. There is no ambiguous result: someone wins a bo7.
Now let's assume 2 players of slightly different skill levels, say Idra and Machine. Idra should win a series the majority of the time. In this case, the more games the 2 play, the more the match shifts in Idra's favor: as the better player, given infinite matches, he should win the majority. So increasing the number of games they play increases the chance the better player moves on.
From a purely analytical standpoint, extending the series and playing more games simply increases the better player's opportunity to win the series.
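The "more games favours the better player" argument can be checked with a bit of binomial arithmetic. This is only a minimal sketch, assuming every game is an independent coin flip with a fixed win chance p for the better player (a simplification, and not the performance model from the original post). The name `series_win_prob` is made up for illustration.

```python
from math import comb

def series_win_prob(p: float, best_of: int) -> float:
    """Chance that a player with per-game win chance p wins a best-of-N series.

    Assumes independent games with a constant p (illustrative assumption).
    Winning a best-of-N is equivalent to winning a majority of N games,
    even though the real series stops early.
    """
    need = best_of // 2 + 1
    return sum(
        comb(best_of, k) * p**k * (1 - p) ** (best_of - k)
        for k in range(need, best_of + 1)
    )

bo3 = series_win_prob(0.6, 3)  # 60% per game, best of 3
bo7 = series_win_prob(0.6, 7)  # 60% per game, best of 7
```

With p = 0.6 this gives about 0.648 for a bo3 and about 0.710 for a bo7. With p = 0.5 both come out at 0.5, which matches the point that for truly even players the longer series only removes ambiguity, it does not favour anyone.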
|
On November 12 2010 10:32 nzb wrote:
On November 12 2010 10:26 zulu_nation8 wrote: I think your study would only be meaningful if people actually assumed a bo7 series does not determine the best player as well as a bo3 series.
I'm not really sure what you are responding to ... The point of this is to determine exactly how much of an effect extended series has, both for individual matches and for an entire tournament. I'm pretty sure I haven't seen anyone talk about this with real numbers to back up what they are saying
I don't understand why numbers are needed to argue that a format with lots of games played has less variance than a format with few games. What does your conclusion show that we don't already know? Moreover, what can you interpret from the data you collected? What does the 1% increase mean? Is it significant? What does it reveal about extended series other than that it slightly increases the chance that the "best" player wins, which everyone should already understand since an extended series is a bo7 compared to the normal bo3? I also don't agree with your performance model; skill level is extremely difficult to quantify.
|
Your model is incomplete, as it ignores the fact that the extended series suddenly changes the rules. While it seems insignificant for sports based on (physical) skill, because players always play 'straight up', it heavily messes up a strategy game like SC2.
In a bo3 both players play 'honest', because one loss puts you on the verge of losing the match. And the best player should win, especially because he can increase his chances by picking the final map, but sometimes the weaker player has a good day and takes the win.
When they meet again and the series resumes as bo7, that completely messes up the rules. P1 starts with 4 losses to elimination, which enables him to use very risky tactics. He can all-in/cheese 3 times until it gets serious and he gets to pick 2 maps, which is very convenient because he only needs 2 wins.
And that's the problem: it heavily increases his chances to win, way more than it should, especially if it's a non-mirror matchup. That's something unique to SC2 (or strategy games in general): these all-in tactics allow a player to reduce the decision making to a minimum, which means the better player can't take advantage of his higher skill level, and let's be honest, even the best players won't be able to scout every cheesy tactic in every game.
So what should happen in your simulation is a lot more false positives, where an inferior player advances one round just because of that rule. It will happen several times per tournament (depending on the size) and it will cause other superior players to rank lower than expected, because they dropped out since they didn't get an extended series in their favor and got matched against another high-class player instead.
That's why I don't buy this 58% success rate of the extended series, but the problem is you can't model the effect of map picking/risky tactics that the player from the winners bracket gets, because there are no statistics to get these % from.
Now, what you could do is replace the extended series with a fresh bo5, because in the worst case they had to play 4-5 games anyway and that should increase the success of the better player, because they start on equal ground and he actually benefits from his higher skill regardless if he won or lost the first bo3.
Of course I can't prove my point that the extended series is reaaally bad, because only mlg uses it for SC2 and 3 tournaments are not enough to get results from statistics. (it would be hard anyway because the real skill level of each player remains unknown)
But why even risk using a flawed system like that (flawed for strategy games) when a more reasonable solution like a fresh bo5 is available? Especially when your simulations seem to prove that the effect on the final ranking is minimal, even in a perfect world that assumes that players never play cheesy.
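For comparison, here is what the three rematch formats look like under the plain independent-games assumption, i.e. exactly the "perfect world" with no cheese or map-pick effects that this post objects to. It is only a sketch: `win_from` is a hypothetical helper, and p = 0.6 is an arbitrary illustrative number, not something measured.

```python
def win_from(p, wins, losses, target):
    """Chance that a player with per-game win chance p reaches `target` wins
    before the opponent does, starting from a wins-losses score.

    Assumes independent games with constant p (no cheese, no map picks).
    """
    if wins == target:
        return 1.0
    if losses == target:
        return 0.0
    return (p * win_from(p, wins + 1, losses, target)
            + (1 - p) * win_from(p, wins, losses + 1, target))

p = 0.6  # assumed per-game edge of the better player (illustrative only)

# The better player lost the first bo3 1-2 and now plays the rematch:
extended = win_from(p, 1, 2, 4)   # bo7 resumed at 1-2
fresh_bo3 = win_from(p, 0, 0, 2)  # separate bo3
fresh_bo5 = win_from(p, 0, 0, 3)  # fresh bo5

# The better player won the first bo3 2-0 instead:
extended_ahead = win_from(p, 2, 0, 4)  # bo7 resumed at 2-0
```

Under these assumptions the better player gets through the rematch about 48% of the time in the extended series, 65% in a separate bo3 and 68% in a fresh bo5 if he lost the first series 1-2. If he won it 2-0, the extended series figure rises to about 91%. None of this captures the risky-tactics effect described above, which would need per-game probabilities that change with the score.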
|
To me the point of contention is clear:
If you believe that you are 'playing the field' in the tournament, then you should count them as separate series.
If you believe that that tournament is a series of head-to-head match-ups, which seems to appeal more strongly to a sense of personal fairness, then the best of 7 provides a better picture of relative skill.
Personally, I prefer the best of 7 because I prefer to at least keep the head to head matches fair. The separate series format seems to go back into the teeth of the flaws of the system with the randomness of seeding, etc.
I also do not like when we start discussing 'what a player deserves'. Seems like a bad route to travel.
|
On November 12 2010 11:51 Nienordir wrote: Your model is incomplete, as it ignores the fact that the extended series suddenly changes the rules. While it seems insignificant for sports based on (physical) skill, because players always play 'straight up', it heavily messes up a strategy game like SC2.
In a bo3 both players play 'honest', because one loss puts you on the verge of losing the match. And the best player should win, especially because he can increase his chances by picking the final map, but sometimes the weaker player has a good day and takes the win.
When they meet again and the series resumes as bo7, that completely messes up the rules. P1 starts with 4 losses to elimination, which enables him to use very risky tactics. He can all-in/cheese 3 times until it gets serious and he gets to pick 2 maps, which is very convenient because he only needs 2 wins.
And that's the problem: it heavily increases his chances to win, way more than it should, especially if it's a non-mirror matchup. That's something unique to SC2 (or strategy games in general): these all-in tactics allow a player to reduce the decision making to a minimum, which means the better player can't take advantage of his higher skill level, and let's be honest, even the best players won't be able to scout every cheesy tactic in every game.
So what should happen in your simulation is a lot more false positives, where an inferior player advances one round just because of that rule. It will happen several times per tournament (depending on the size) and it will cause other superior players to rank lower than expected, because they dropped out since they didn't get an extended series in their favor and got matched against another high-class player instead.
That's why I don't buy this 58% success rate of the extended series, but the problem is you can't model the effect of map picking/risky tactics that the player from the winners bracket gets, because there are no statistics to get these % from.
Now, what you could do is replace the extended series with a fresh bo5, because in the worst case they had to play 4-5 games anyway and that should increase the success of the better player, because they start on equal ground and he actually benefits from his higher skill regardless if he won or lost the first bo3.
Of course I can't prove my point that the extended series is reaaally bad, because only mlg uses it for SC2 and 3 tournaments are not enough to get results from statistics. (it would be hard anyway because the real skill level of each player remains unknown)
But why even risk using a flawed system like that (flawed for strategy games) when a more reasonable solution like a fresh bo5 is available? Especially when your simulations seem to prove that the effect on the final ranking is minimal, even in a perfect world that assumes that players never play cheesy.
You are right that there are all kinds of effects that aren't being captured in the model. That's why I said the player model was definitely the weakest part of the analysis. What you bring up is interesting, because unlike many other objections, it is a systematic error that would favor the winner of the winners' round game. However, statistically speaking, this person is likely to be the 'better' player, so it probably doesn't actually change things that much. It would decrease the 58%, but it would also decrease the 4%.
I guess in a larger sense, you can't take any of the exact numbers from the original post literally. This is a model, and it is a simplified one. I absolutely, 100% guarantee that every individual number in the original post is wrong. That wasn't the point, though. The point was the overall trends, and I still think they are correct.
Notice that the conclusion doesn't reference a single number from the body of the post. Instead, it draws lessons from the numbers and states those. I think they are still, by and large, correct:
- Extended series will increase the likelihood that the better player advances. (Pending your objection, probably by less than the analysis shows.)
- However, it won't have much impact on overall tournament settings.
- If we want to improve tournament outcomes, we should modify the tournament format.
I didn't talk about this in the main post, because it's just my opinion and wasn't backed by any numbers, but I think a good format would be:
- Play swiss-style tournament to determine the top 8-16 players.
- Play single elimination to get champion.
This would be a very reliable way to determine the top 8 or 16, and then would switch into overdrive to determine the champ. It would be very exciting, similar to how the NCAA does March Madness. I would love it if we could get someone to do some special event using this format just to try it out.
|
I feel like in the numbery, statisticy way of thinking, yes, the extended series makes sense. But you have to think of the tourney scenario. Later in the tournament, there is more pressure or money at stake, so the players may play differently. This alone should make the new best of three, as iNcontroL says, an isolated event. It should be completely separate from the first best of three. There is also the momentum aspect. If in the first bo3 player A wins the first game, he will then have a psychological advantage, so he has momentum going into the second game. Since the second best of three doesn't have that same momentum because the games are played at different times, I don't think they should be considered part of the same series.
|
Lol what's with people saying "THIS MODEL IS INCOMPLETE"?
Well, of course it is. You can NEVER have a perfect model, and if you've done any kind of studies in stats you'd realize that the simplest model is usually the best.
The numbers are but a tool for determining the meaning of a series of data. They're not supposed to be the end-all description of everything.
This is a great model for determining the value of extended series statistically. It's simple and straightforward, and for the most part it describes the different kinds of tournament formats within reason. Besides, even if you tried to add all these "effects", given that he did a million tries, the data would probably not shift in any meaningful way.
I think people should stop discussing the validity of it.
|
On November 12 2010 13:12 italiangymnast wrote: I feel like in the numbery, statisticy way of thinking, yes, the extended series makes sense. But you have to think of the tourney scenario. Later in the tournament, there is more pressure or money at stake, so the players may play differently. This alone should make the new best of three, as iNcontroL says, an isolated event. It should be completely separate from the first best of three. There is also the momentum aspect. If in the first bo3 player A wins the first game, he will then have a psychological advantage, so he has momentum going into the second game. Since the second best of three doesn't have that same momentum because the games are played at different times, I don't think they should be considered part of the same series.
I think one of the points of my post that has been lost on most people is that extended series, although it doesn't seem to negatively affect outcomes, doesn't really seem to help much either in the "macro" sense. Therefore, other considerations become more important.
I think considerations such as: entertainment value, counter-intuitiveness, different tournament settings, etc.. are all very important, and this post makes it clear that the statistics do not support the extended series as a must-have for the double-elimination tournament format.
My conclusion from all of this is that the extended series rule is really a judgement call based on other, subjective qualities. From my reading of public opinion on TL.net, it seems that most people do not like it, and therefore maybe it should be reconsidered.
I would caution, however: what would people's opinions be if instead Liquid`Tyler had beaten Painuser 2-0 in the winners' bracket, and then lost 1-2 to him in the losers'? He would have gone 3-2 against him in the tournament, but been knocked out. Where would TL.net stand on the issue then?
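To make that hypothetical concrete, here is a rough sketch of how the two rules treat the same rematch. It assumes the extended series works as described in this thread (the rematch becomes a bo7 that resumes from the score of the first bo3). `advance` is a made-up helper name, and "A"/"B" stand for the two players.

```python
def advance(first_series, rematch_games, extended):
    """Decide who advances from a losers'-bracket rematch.

    first_series: (a_wins, b_wins) from the earlier winners'-bracket bo3.
    rematch_games: string of 'A'/'B', the winner of each rematch game in order.
    extended: True for an extended series (bo7 resumed from first_series),
              False for a separate bo3.
    Returns 'A', 'B', or None if the games listed do not decide it yet.
    """
    a, b = first_series if extended else (0, 0)
    target = 4 if extended else 2
    for game in rematch_games:
        if game == "A":
            a += 1
        else:
            b += 1
        if a == target:
            return "A"
        if b == target:
            return "B"
    return None

# A won the first bo3 2-0, then loses the rematch games 1-2 (B, A, B):
separate = advance((2, 0), "BAB", extended=False)   # 'B' advances
resumed = advance((2, 0), "BAB", extended=True)     # None: A leads 3-2, play on
```

With a separate bo3, B advances despite being 2-3 in games overall. With the extended series, the same three games leave A ahead 3-2 and the series simply continues until someone reaches 4.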
|
nzb, thanks for doing this. It seems clear that doubleExtended is slightly more accurate, which is to be expected, as a single bo7 would yield better results than 2 x bo3 if all done stand-alone.
The unfactorable variation on this is the situation in which the games have been played. The classic example is Liquid`Tyler vs. PainUser, where external issues affected the game in an unknown way and thus resulted in a (highly) possibly modified outcome. I suppose one could argue this both ways, since two separate bo3s might also favour one player due to completely uncontrollable circumstances.
Another simple situation comes to mind with standard doubleElimination: Player A beats Player B, and Player B is now in the losers' bracket. Player A then loses to Player C and gets knocked down to the losers' bracket. Player A and Player B have their 2nd bo3 in this tournament, and this time Player B wins. Player A is now knocked out of the tournament.
So, while both players have won 1 bo3, because of the order, Player A has been knocked out, and Player B continues.
I have a question about your data: in your abstract, you state that single elim yields a 19% champion rate for the best player, while double elim gives 24%, double elim+ gives 25% and round robin shows 47%.
How did you come to that conclusion from this?
Format         | Winner | Depth | 2^Depth
---------------+--------+-------+--------
Single         | 0.91   | 52.09 | 110.07
Double         | 0.88   | 48.31 | 89.83
DoubleExtended | 0.88   | 46.01 | 87.42
RoundRobin     | 0.72   | 22.29 | 28.85
I understand the results are 1 - winner%, but wouldn't that mean that this shows single elim having a winner of 9% (where 1 - winner% = 0.91)?
|
On November 12 2010 13:24 nzb wrote:
I would caution, however, what would people's opinions be if instead Liquid`Tyler had beat Painuser 2-0 in the winners' bracket, and then lost 1-2 to him in the losers'. He would have gone 3-2 against him in the tournament, but been knocked out. Where would TL.net stand on the issue then?
^This is probably what most people are missing when they simply get caught up about tournament format.
But it's still valid to say Tyler got eliminated because he lost 2 Bo3 series, whereas Painuser only lost 1 Bo3 in the tournament.
Just different way of looking at things.
|
On November 12 2010 13:33 voss wrote: nzb, thanks for doing this. It seems clear that doubleExtended is slightly more accurate, which is to be expected, as a single bo7 would yield better results than 2 x bo3 if all done stand-alone.
The unfactorable variation on this is the situation in which the games have been played. The classic example is Liquid`Tyler vs. PainUser, where external issues affected the game in an unknown way and thus resulted in a (highly) possibly modified outcome. I suppose one could argue this both ways, since two separate bo3s might also favour one player due to completely uncontrollable circumstances.
As with all of these objections to the model, you have to determine if they systematically favor the winner of the previous series or not. It doesn't seem like random outside factors, such as this, would have any systematic effect. Example: What if Painuser had been the one with a re-game, except in the losers' bracket match after defeating Tyler 2-0 in the winners' bracket? It seems like that would benefit Tyler.
Another simple situation comes to mind with standard doubleElimination: Player A beats Player B, and Player B is now in the losers' bracket. Player A then loses to Player C and gets knocked down to the losers' bracket. Player A and Player B have their 2nd bo3 in this tournament, and this time Player B wins. Player A is now knocked out of the tournament.
So, while both players have won 1 bo3, because of the order, Player A has been knocked out, and Player B continues.
Exactly. This is something I tried to highlight in the intro, but I bet most people didn't read that.
I have a question about your data: in your abstract, you state that single elim yields a 19% champion rate for the best player, while double elim gives 24%, double elim+ gives 25% and round robin shows 47%.
How did you come to that conclusion from this?
Format         | Winner | Depth | 2^Depth
---------------+--------+-------+--------
Single         | 0.91   | 52.09 | 110.07
Double         | 0.88   | 48.31 | 89.83
DoubleExtended | 0.88   | 46.01 | 87.42
RoundRobin     | 0.72   | 22.29 | 28.85
I understand the results are 1 - winner%, but wouldn't that mean that this shows single elim having a winner of 9% (where 1 - winner% = 0.91)?
This is something that I realized was confusing after posting -- the numbers in the abstract are from a tournament with 64 players, and the numbers in Section 4.1 are from a 128-player tournament. I generated the numbers for 64 players to make the graphs for varying #'s of games, and then I realized since I was talking about MLG Dallas, it would be a good idea to use 128-players for my main results. I forgot to change the ones in the abstract. :/ Oh well.
In some sense, it's better this way because the 128-player numbers don't show the (slight) benefit of the extended series in the 'winner' metric at the second decimal point. I would need to include more precision.
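One way to see why the 64-player and 128-player 'winner' numbers differ so much: every extra round is one more series the best player has to survive. The sketch below computes exact champion probabilities for a single-elimination bracket from a table of head-to-head win chances. It is not the performance model from the OP; `champion_probs` and the flat 70% edge are assumptions made up purely for illustration.

```python
def champion_probs(win_prob, bracket):
    """Exact chance of each player winning a single-elimination bracket.

    win_prob[i][j]: chance that player i beats player j in one series.
    bracket: player indices in bracket order (length must be a power of two).
    Returns a dict mapping player index -> probability of being champion.
    """
    def solve(players):
        if len(players) == 1:
            return {players[0]: 1.0}
        half = len(players) // 2
        left, right = solve(players[:half]), solve(players[half:])
        out = {}
        # A player wins this sub-bracket by winning their half, then beating
        # whoever comes out of the other half.
        for a, pa in left.items():
            out[a] = pa * sum(pb * win_prob[a][b] for b, pb in right.items())
        for b, pb in right.items():
            out[b] = pb * sum(pa * win_prob[b][a] for a, pa in left.items())
        return out

    return solve(list(bracket))

def flat_edge_table(n, edge=0.7):
    """Hypothetical field: player 0 beats anyone with chance `edge`,
    everyone else is a coin flip against each other."""
    return [[edge if i == 0 else 1 - edge if j == 0 else 0.5
             for j in range(n)] for i in range(n)]

best_64 = champion_probs(flat_edge_table(64), range(64))[0]     # 0.7 ** 6
best_128 = champion_probs(flat_edge_table(128), range(128))[0]  # 0.7 ** 7
```

Under this toy assumption the best player takes a 64-player bracket about 11.8% of the time and a 128-player bracket about 8.2% of the time, so the same format looks noticeably worse on the 'winner' metric just from doubling the field.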
|
On November 12 2010 13:35 scion wrote:
On November 12 2010 13:24 nzb wrote:
I would caution, however, what would people's opinions be if instead Liquid`Tyler had beat Painuser 2-0 in the winners' bracket, and then lost 1-2 to him in the losers'. He would have gone 3-2 against him in the tournament, but been knocked out. Where would TL.net stand on the issue then?
^This is probably what most people are missing when they simply get caught up about tournament format.
But it's still valid to say Tyler got eliminated because he lost 2 Bo3 series where as Painuser only lost 1 Bo3 in the tournament.
Just different way of looking at things.
I think, ultimately, you are screwed either way:
Option A: Use the extended series rule -- it's awkward and people don't like it.
Option B: Keep things with BO3, and deal with the weird paradoxes like getting knocked out even though you "beat" the other player, but at least it is consistent.
Because the statistics aren't conclusive, you are left with a judgement call.
In my opinion what this really says is that double elimination has problems, and maybe we should use a different format that does a better job of ranking most of the players, and concludes with an exciting tourney to determine the champ. (See previous post about swiss/single elim hybrid.)
|
Very interesting. The only thing missing would be some measurement of tournament efficiency, that is, which model produces the most accurate result in the lowest number of games.
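As a starting point for the "cost" side of that efficiency measure, the number of series each format needs follows from simple counting. This is a sketch with a made-up helper name, `matches_needed`; it counts series (matches), not individual games, and says nothing about accuracy.

```python
def matches_needed(n_players):
    """Number of series needed to finish a tournament with n_players.

    single: every series eliminates exactly one player, so n - 1.
    double: everyone but the champion loses twice, and the champion loses
            at most once, so between 2n - 2 and 2n - 1.
    round_robin: every pair meets once, so n * (n - 1) / 2.
    """
    return {
        "single": n_players - 1,
        "double": (2 * n_players - 2, 2 * n_players - 1),
        "round_robin": n_players * (n_players - 1) // 2,
    }

counts = matches_needed(128)
```

For 128 players that is 127 series for single elimination, 254 or 255 for double elimination, and 8128 for a full round robin, which is why the round robin's better accuracy is not really comparable without normalising by games played.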
|
On November 12 2010 13:47 fenixauriga wrote: Very interesting. The only thing missing would be some measurement of tournament efficiency, that is, which model produces the most accurate result in the lowest number of games.
If you read the wikipedia pages on tournaments (at bottom of OP), they have a good discussion of this. Swiss seems to have good results, but it has other issues....
|
Thank you for doing this. I had to cringe so hard hearing Nony explain the point behind extended series and then Day9 going "Gee, I didn't think of that, wow."
Then later on both him and Idra claimed the purpose of a tournament format is not to have the best player win because eventually someone not the best player will win. Yeah, it will. Unless you play an infinite round robin. But you can't play so many games.
So then Day9 went into this line that as a math graduate he knew that it doesn't matter what structure you use since the variance of the coins will never be evened out by a tournament structure. This is obviously very wrong as it is very likely for a person to win once when he is only 30% likely to win. But if you need to get that 30% odds several times then that's not going to be likely. The argument that it was fundamentally wrong to even think that a tournament structure would have any effect was so obviously wrong, I was sad Nony got a little intimidated. Yeah, it would be hard for him to make the actual argument as he couldn't run these simulations on that spot in his head and use that as evidence. Anyway he would have to do some handwaving and it wouldn't have looked strong to many viewers, who were already in favour of normal double elim.
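The arithmetic behind that is short enough to write down. A tiny sketch, assuming each upset is independent and has the same 30% chance (`upset_run_prob` is just an illustrative name):

```python
def upset_run_prob(p_upset, rounds):
    """Chance of pulling off `rounds` upsets in a row, each with chance p_upset.

    Assumes the upsets are independent and equally likely (illustrative).
    """
    return p_upset ** rounds

one = upset_run_prob(0.3, 1)
three = upset_run_prob(0.3, 3)
```

One upset happens 30% of the time, but three in a row happen only about 2.7% of the time, so a structure that forces the weaker player to win repeatedly does change the outcome distribution.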
People just hate to accept that what they did for years wasn't that good. It's hard for people to accept that in the past people went out of tournaments, eliminated by people they had a winning record against. They have to accept that that was somehow just.
I don't understand why numbers are needed to argue that a format with lots of games played has less variance than a format with few games. What does your conclusion show that we don't already know?
Didn't you hear what Day9 said on that podcast? And, there's still people here that dispute the result.
I remember when people didn't understand why the person coming into the finals from the losers' bracket had to win twice. Some people aren't very good at this stuff.
|
On November 12 2010 13:58 nzb wrote:
On November 12 2010 13:47 fenixauriga wrote: Very interesting, The only thing missing would be some measurement of tournament efficiency, that is which model produces the most accurate result in the lowest number of games
If you read the wikipedia pages on tournaments (at bottom of OP), they have a good discussion of this. Swiss seems to have good results, but it has other issues....
Personally I like the idea of elimination swiss. Basically you use the swiss pairing system and then eliminate players once they have lost 3 rounds. Eventually, when there are 4 or fewer players left, they do a playoff for the top spot.
|
On November 12 2010 14:06 Almeisan wrote: So then Day9 went into this line that as a math graduate he knew that it doesn't matter what structure you use since the variance of the coins will never be evened out by a tournament structure. This is obviously very wrong as it is very likely for a person to win once when he is only 30% likely to win. But if you need to get that 30% odds several times then that's not going to be likely. The argument that it was fundamentally wrong to even think that a tournament structure would have any effect was so obviously wrong, I was sad Nony got a little intimidated. Yeah, it would be hard for him to make the actual argument as he couldn't run these simulations on that spot in his head and use that as evidence. Anyway he would have to do some handwaving and it wouldn't have looked strong to many viewers, who were already in favour of normal double elim.
Hahahah... This is exactly why I was inspired to do this. I was like... "C'mon Day[9]! You are representing all science grad students in the universe on this show, and you come up with this crap?" I'd like to think that Nony was just chillin' and didn't want to fight about it anymore. Anyway, I had to run the numbers myself and show the (admittedly small) advantage of the extended series. I thought it might make a bigger difference than it did, but facts are facts.
|
I didn't read all of the statistics stuff, but really interesting analysis. I think the only major tournament that uses group stages followed by a single elim Ro16 is WCG, and it seems like the best players always win there (Korea won every single BW WCG since forever). It'd be cool if more tournaments picked up that kind of format if it's indeed more accurate in fewer games.
|