|
On December 29 2012 05:39 Salient wrote: The GSL ratings are superior because they are limited to games between top players. You fail to account for the quality of the games with your statistics 101 class project. More sophisticated models can actually do that (and thus become more powerful), but they become necessarily more subjective. It's a price you need to pay to make the ratings meaningful. It takes a lot of work though. Your system is apparently less subjective and easier to construct, but you get what you pay for.
Please enlighten us with your vast knowledge, because obviously you are some sort of mathematical genius who can quantify subjectivity in a meaningful and objective/useful way. Also, please define "top players".
If you want subjectivity, just read power ranks plz.
|
Effort is 10th because of 3 proleague wins, ha.
|
Good rankings. I think people are viewing this as an absolute power ranking (like an NFL team ranking), but those are inherently subjective. This is a nice little tool to see how people are performing recently; it's not going to show you who's better than whom, since that requires a whole host of data (weighting different tourney wins, examining the games themselves, etc.) that would be far too time consuming to gather.
There are some people here that don't pass the eye test, but that's the nature of these things. I feel that most people just want some SC2 authority to spout off a power rank and for that power ranking to align with their own personal views.
|
It looks like a lot of people do not understand the idea behind this work. While a lot of them think that whatever is the top 10 on your ranking is what you (TheBB) believe to be the top 10, I do appreciate that it's "just" a result of a consistent application of a statistical algorithm.
I like this work very much. I don't entirely agree with a lot of these rankings (as was mentioned: CombatEX, Mvp, and a lot of Kespa players who play only with one another and are linked to the "main pool" of players via only three tournaments [OSL, WCS Korea, MLG vs Proleague] - these ratings are far from what I would call "expected"), but it's only yet another tool for enthusiasts to look at.
I do wish the events were weighted differently, though I understand the decision (it's easier not to weight them, and probably fairer from a stats point of view). There might be some room for improvement (or an alternative ranking) where events are weighted - be it by prize money, a predefined ranking, or the median skill of competing players. But having this "raw" data, where every game counts the same, is important as well.
One suggestion: how about publishing the uncertainty of a rating of a given player (both on profile and on the list)?
|
On December 27 2012 22:29 GGzerG wrote: I think you should just remove CombatEX from the rankings altogether, since he is not even allowed on this website; it is pretty pointless to even have him in the rankings IMO. Just move on to the next person, please; it wouldn't be hard to replace him with someone that actually matters.
wow lol. this is so stupid
|
On January 04 2013 00:36 THF wrote: It looks like a lot of people do not understand the idea behind this work. While a lot of them think that whatever is the top 10 on your ranking is what you (TheBB) believe to be the top 10, I do appreciate that it's "just" a result of a consistent application of a statistical algorithm.
I like this work very much. I do not rationally agree with a lot of these rankings (as was mentioned - CombatEX, Mvp and a lot of Kespa players who play only with one another and are linked to the "main pool" of players only via three tournaments [OSL, WCS Korea, MLG v Proleague] - these ratings are far from what I would say "expected"), but it's only yet another tool for enthusiasts to look at.
I miss the events being rated differently. Though I understand the decision (it's easier not to rank, and probably fairer from stats point of view). There might be some room for improvement (or an alternative ranking) where events are weighed - be it by prize money, predefined ranking, or the median skill of competing players. But having this "raw" data where every game counts the same is important as well.
One suggestion: how about publishing the uncertainty of a rating of a given player (both on profile and on the list)?
It is a lot of work just to add games from so many different sources; there isn't "just" GSL, MLG, IPL, DH, IEM, SPL, GSTL, OSL etc. There are also tons of cups: ASL, NSL, Zotac. Weighting and gathering the data that could be used for applying a weight is both time consuming and very subjective.
|
TheBB, could you make a .csv or something available for the latest ratings list? I'd like to play around with it.
|
On January 04 2013 00:51 Grovbolle wrote: It is a lot of work just to add games from so many different sources; there isn't "just" GSL, MLG, IPL, DH, IEM, SPL, GSTL, OSL etc. There are also tons of cups: ASL, NSL, Zotac. Weighing and gathering data which could be used for applying a weight is both time consuming and very subjective.
I don't doubt that. As I said, I really appreciate this work - especially as the database of games is pretty much publicly available.
The weighting could be done semi-automatically as well, though, depending on the weighting criteria. Someone would need to try what "works best", and it would definitely be controversial whatever criteria one decides to use. For example, weighting based on the average player rating in a tournament could be fully automatic once implemented, whereas weighting based on other criteria (money, prestige) would require some more tournament metadata.
Now that I think about it, any sort of tournament-based weighting would require a different rating algorithm than the current one. The current one "simply" asks "which set of ratings maximises the probability of these real-life outcomes", which have already happened; it's an "ex post" algorithm. Something like Elo might be more suited for weighting, since ratings are evaluated continuously and you can apply "multipliers" to given results.
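THF's Elo-with-multipliers idea can be sketched roughly as follows. The K-factor, the multiplier values, and the ratings are made-up illustrations, not anything Aligulac actually implements.

```python
# Sketch of an Elo update with a per-tournament weight multiplier.
# K, the multipliers, and the starting ratings are invented values.

def expected_score(r_a, r_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32.0, multiplier=1.0):
    """Update both ratings after one game.

    score_a is 1.0 for an A win, 0.0 for a loss.
    multiplier scales the update, e.g. 1.5 for a premier
    tournament and 0.5 for an online cup.
    """
    exp_a = expected_score(r_a, r_b)
    delta = k * multiplier * (score_a - exp_a)
    return r_a + delta, r_b - delta

# A premier-tournament win moves ratings more than an online-cup win.
premier = elo_update(1500, 1500, 1.0, multiplier=1.5)
cup = elo_update(1500, 1500, 1.0, multiplier=0.5)
print(premier)  # (1524.0, 1476.0)
print(cup)      # (1508.0, 1492.0)
```

The multiplier works naturally here precisely because Elo is incremental: each result is processed once, at a chosen weight, rather than all results being fitted jointly after the fact.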
|
On January 04 2013 21:57 THF wrote: The weighing could be done semi-automatically as well, though, depending on the weighing criteria. Someone would need to try what "works best", and it would definitely be controversial whatever criteria one decides to use. Now that I think about it, any sort of tournament-based weighting would require a different rating algorithm than the current one - the current one is "simply" asking "which set of ratings maximises the probability of these real life outcomes", which have already happened; it's an "ex post" algorithm. Something like Elo might be more suited for weighing, where ratings are evaluated continuously, and you can apply "multipliers" to given results.
Currently, the matches aren't assigned to a given tournament. The descriptions you see here (http://aligulac.com/results/) are just made up by whoever enters the data from the first played game, and the rest of us just use the same form.
I usually just use a description similar to the LR thread I copy from :D
|
On December 29 2012 04:36 TheBB wrote: On December 29 2012 01:05 Salient wrote: The ranking system is only useful if it is predictive. I'm getting tired of this. The whole idea behind the system was to make it predictive. [image] Behold. This is a plot of almost 50k historical games. On the horizontal axis you find the predicted winrate for the presumed stronger player, using the ratings at the time the game was played. The games were grouped in reasonably small bins, i.e. 50%-53.3% and so on. (Obviously no numbers below 50, since this is the predicted winrate for the stronger player. It also only goes up to about 75 because there are very few games past that.) On the vertical axis is the actual winrate for each group. As can be plainly seen (cough, cough), the actual winrate is close to the predicted winrate or, in some cases, higher. Now I'm no statistician, and honestly I don't know how to do a proper prediction test, but it looks damn good to me.
Does that mean that if I want the actual winrate I have to add a couple of percentage points to the result your website delivers? Or do you account for that internally while giving me the predicted winrate?
Another question: does your choice of the periods affect the ratings? For example, if PlayerA wins 1 TvZ in period 1 and loses 1 TvP in period 2, then his race-specific ratings don't differ from each other at the end of period 2, right?
But what if he plays both games in one period? Then the race-specific ratings will differ, if I understand it correctly.
This doesn't make much sense to me. For example, Demuslim went 2-3 vs Violet and 3-0 vs Sen in the last period. I don't see why it's fair that his TvP and TvT get the same boost as his TvZ, as I'm pretty sure that his TvZ is his best matchup right now.
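The calibration check TheBB describes above (group games into small predicted-winrate bins, then compare each bin's average prediction with its actual winrate) can be sketched as follows. The games here are simulated so the snippet is self-contained; the bin width and game count only roughly mirror the plot described, not the real Aligulac dataset.

```python
import random

# Simulate ~50k games where the stronger player's predicted winrate is
# known; the outcomes are drawn from those probabilities, so this data
# is well calibrated by construction.
random.seed(0)
games = []
for _ in range(50000):
    p = random.uniform(0.5, 0.75)  # predicted winrate, stronger player
    won = random.random() < p      # simulated outcome
    games.append((p, won))

# Group games into roughly 3.33-point bins of predicted winrate.
bin_width = 1.0 / 30
bins = {}
for p, won in games:
    b = int((p - 0.5) / bin_width)
    bins.setdefault(b, []).append((p, won))

# For each bin, compare average predicted winrate with actual winrate.
for b in sorted(bins):
    group = bins[b]
    predicted = sum(p for p, _ in group) / len(group)
    actual = sum(1 for _, won in group if won) / len(group)
    print(f"predicted {predicted:.3f}  actual {actual:.3f}  n={len(group)}")
```

On real data, a systematic gap between the two columns (rather than noise) would indicate miscalibration, which is exactly what the plot is meant to rule out.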
|
I think the fact that CombatEX made the list shows there is something wrong with the way this is calculated. It honestly single-handedly crushes any credibility this list might have had, and takes away from the other players on the list, or not on the list.
|
Combat-EX making his way through the zergs!
|
On January 05 2013 11:14 envyYaegz wrote: I think the fact that CombatEX made the list shows there is something wrong with the way this is calculated. It honestly single-handedly crushes any credibility this list might have had, and takes away from the other players on the list, or not on the list. First everybody bitched that circumstances led to CombatEx representing Canada, then he does well there and you still keep your head in the sand? Honestly his results are probably better than fellow Canuck HuK would've had at that event.
That being said, I am deathly afraid of CombatEx. The name shakes me to the core. I couldn't even type it out, I had to copy and paste it into my post.
The great battle between the heart and the head in this thread is hilarious: http://wiki.teamliquid.net/starcraft2/User_talk:EhonTiming!/CombatEX
|
On January 05 2013 11:14 envyYaegz wrote: I think the fact that CombatEX made the list shows there is something wrong with the way this is calculated. It honestly single-handedly crushes any credibility this list might have had, and takes away from the other players on the list, or not on the list.
If you take an entry with only 6 games and say that this list doesn't have any credibility, you are just being dumb. With only 6 data points, no system in the world can give an accurate description of a player's skill...
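To put rough numbers on why six games say so little, here is a quick confidence-interval sketch using the normal approximation; the records shown are illustrative, not any player's actual results.

```python
import math

# Approximate 95% confidence interval for a true winrate, using the
# normal approximation to the binomial. The interval shrinks with the
# square root of the number of games, so tiny samples are nearly useless.

def winrate_ci(wins, games, z=1.96):
    """Return an approximate 95% CI for the true winrate, clamped to [0, 1]."""
    p = wins / games
    half = z * math.sqrt(p * (1 - p) / games)
    return max(0.0, p - half), min(1.0, p + half)

print(winrate_ci(5, 6))      # 5-1 over 6 games: a huge interval
print(winrate_ci(250, 300))  # the same ~83% winrate over 300 games: tight
```

(The normal approximation is crude at such small samples, which only strengthens the point: six games pin down almost nothing.)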
|
On January 05 2013 11:07 Greenei wrote: Does that mean that if I want the actual winrate I have to add a couple of percentage points to the result your website delivers? Or do you account for that internally while giving me the predicted winrate?

That's not accounted for, no. You can add those percentages if you like; I think most people feel (and I would agree) that the percentages are sometimes a bit extreme.
On January 05 2013 11:07 Greenei wrote: Another question: does your choice of the periods affect the ratings? For example, if PlayerA wins 1 TvZ in period 1 and loses 1 TvP in period 2, then his race-specific ratings don't differ from each other at the end of period 2, right?
But what if he plays both games in one period? Then the race-specific ratings will differ, if I understand it correctly.
This doesn't make much sense to me. For example, Demuslim went 2-3 vs Violet and 3-0 vs Sen in the last period. I don't see why it's fair that his TvP and TvT get the same boost as his TvZ, as I'm pretty sure that his TvZ is his best matchup right now.

Well yeah, that's a weakness. The distribution of games over periods matters, not just their order. This is true for most rating systems. It's a bit more visible here because the periods are short, as is the race thing. It's just the sort of thing that happens when you use discrete instead of continuous time, I guess.
If you're mathematically inclined, I do this to ensure that the likelihood function has a unique maximum (or in other words, it restricts the parameter space so that the Hessian is nondegenerate). I could do that in a different way, too, which would allow race-specific matchups to change even in cases like those you mention.
If I didn't do it, then there's no way to tell if a 10-0 in TvT over a period is due to an increase in (a) general skill, (b) TvT skill or (c) a combination of the two. In the case of (c), some choice has to be made regarding the mixing factor.
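TheBB's identifiability point can be illustrated with a toy model in which a player's effective rating in a matchup is an overall rating plus a matchup offset; all numbers here are invented. Two different parameter sets produce identical effective ratings, so game outcomes alone cannot choose between them, and a sum-to-zero constraint on the offsets (one way of restricting the parameter space) picks a unique representative.

```python
# Toy model: effective rating in a matchup = overall + matchup offset.
# Two different parameter sets can predict identical outcomes, so the
# data alone can't distinguish "general skill rose" from "all matchup
# offsets rose". All numbers are made up.

def effective(overall, offsets):
    """Effective per-matchup ratings implied by the parameters."""
    return {mu: overall + off for mu, off in offsets.items()}

# Explanation (a): general skill is 1100, no matchup specialization.
a = effective(1100, {"vP": 0, "vT": 0, "vZ": 0})
# Explanation (b): general skill is 1000, every offset is +100.
b = effective(1000, {"vP": 100, "vT": 100, "vZ": 100})
print(a == b)  # True: the game data can't tell these apart

def normalise(overall, offsets):
    """Sum-to-zero constraint: move the mean offset into the overall rating."""
    mean = sum(offsets.values()) / len(offsets)
    return overall + mean, {mu: off - mean for mu, off in offsets.items()}

# Both explanations normalise to the same unique representation.
print(normalise(1000, {"vP": 100, "vT": 100, "vZ": 100}))
print(normalise(1100, {"vP": 0, "vT": 0, "vZ": 0}))
```

This also shows why a 10-0 TvT period is ambiguous on its own: without the constraint, raising the overall rating or raising the TvT offset (or any mix of the two) fits the data equally well.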
|
Am I missing something about specialization? The site shows Stephano (+281 vT), while Life has a difference of +348 between his vT and overall rating.
e: It would be cool if the site sorted more things like best in each MU (as well as how much higher someone is than 2nd place). i.e. ZvT: Life 3203 (+661)
|
I find it very interesting how many players have two very similar match-up ratings and then the third is vastly different. I think this could be analysed further :D
|
On December 29 2012 06:54 Grovbolle wrote: On December 29 2012 05:39 Salient wrote: The GSL ratings are superior because they are limited to games between top players. You fail to account for the quality of the games with your statistics 101 class project. More sophisticated models can actually do that (and thus become more powerful), but they become necessarily more subjective. It's a price you need to pay to make the ratings meaningful. It takes a lot of work though. Your system is apparently less subjective and easier to construct, but you get what you pay for.

Please enlighten us with your vast knowledge, because obviously you are some sort of mathematical genius who can quantify subjectivity in a meaningful and objective/useful way. Also, please define top players. If you want subjectivity, just read power ranks plz.
With "less subjective" he probably means using less "non-data", e.g. weights for tournaments, which make matches in big tournaments count more than online cups.
By the way, why is CombatEx appearing on the list? Did he place well in some tournaments I am not aware of?
|
On January 08 2013 00:40 JustPassingBy wrote: By the way, why is CombatEx appearing on the list? Did he place well in some tournaments I am not aware of?
He got 4th in WCG, I believe; I'm not sure how that makes him appear on that list.
|
Bisutopia wrote:
Seeing this list reinforces my question of why people call Stephano the undisputed best foreigner. I guess I don't buy into the hype and am more impressed with other foreigners.
|