Chess-like player ratings (Was: Scraping policy) |
tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.16 21:06:37 Latest Ratings, Stable Ratings, Ratings Movement between Stable and Latest
Ratings movement will become more interesting over time, I think; for now it's almost a restatement of the latest ratings in some ways, because stable ratings are so poorly defined.
Last edited by Tilps - 2007.01.23 03:45:08 |
Nis Kwon-Tom Obsessive Puzzles: 2208 Best Total: 22m 1s | Posted - 2007.01.17 01:11:20 Would you care to elaborate on the formula you use to calculate the rating? I have spent quite a bit of time playing with backgammon rating formulas, and would be interested in what you are doing. |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.17 02:39:36 I use the expected win formula from the Wikipedia article on chess ratings to calculate the probability that player A will rank higher than player B in a given day's puzzle. (Two players with the same rating should have equal chances of ranking higher or lower than each other on any given day.)
For each player in a competition, the probabilities of ranking lower than each player (including themselves) are summed and 0.5 is added. This gives their 'expected rank'. This is compared to their actual rank, and the difference between expected and actual rank is divided by the total number of contestants, to normalize for the fact that some competitions have more people than others. This normalized number is multiplied by a volatility constant, which basically just controls how fast the ratings change, and then added to the player's previous rating.
First-time players are given an estimated rating of 1500, and non-competitors keep their previous rating.
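The steps above can be sketched roughly as follows in Python. This is only an illustration, not the actual program: it assumes the standard Elo expected-score formula with a 400-point scale, a volatility constant of 30 (a guess), made-up function names, and no tied ranks.

```python
def win_probability(rating_a, rating_b):
    """Elo-style probability that a player rated rating_a ranks above one rated rating_b."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def expected_rank(player, ratings):
    """0.5 plus the summed probability of ranking below each competitor (self included)."""
    own = ratings[player]
    return 0.5 + sum(win_probability(other, own) for other in ratings.values())


def update_ratings(ratings, ranks, volatility=30.0, initial=1500.0):
    """ratings: player -> rating before the puzzle.
    ranks: player -> actual rank (1 = best) for those who competed.
    Returns a new dict; non-competitors keep their previous rating."""
    # First-time players enter with the initial rating (1500 in the post).
    competing = {p: ratings.get(p, initial) for p in ranks}
    n = len(competing)
    updated = dict(ratings)
    for player, actual in ranks.items():
        delta = (expected_rank(player, competing) - actual) / n  # normalized by field size
        updated[player] = competing[player] + volatility * delta
    return updated
```

With two equally rated players, each has an expected rank of 1.5, so the winner gains 7.5 points and the loser drops 7.5 under these assumed constants.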
The probability formula from Wikipedia has the advantage of being easy to calculate in C#, which does not provide the erf function that normal-distribution probability calculations would need. |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.17 08:01:57 Found a bug - user names are case insensitive, but my program was treating them as case sensitive. This meant there were two tilps entries in the list; not sure if it affected anyone else. Will have to merge the results.
Update: Results have been updated to correct the issue - I think 6 people were affected. Still doesn't fix the fact that my rating is only 1508.
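For illustration, one way such a merge could look in Python is to key every entry on a case-folded user name. The data layout (name mapped to a list of results) and the function name are assumptions, not the actual program's structure.

```python
def merge_case_variants(results):
    """Merge result lists whose user names differ only by letter case.

    results: dict mapping user name -> list of result entries (hypothetical layout).
    Returns a dict keyed by the case-folded name.
    """
    merged = {}
    for name, entries in results.items():
        key = name.casefold()  # "Tilps" and "tilps" map to the same key
        merged.setdefault(key, []).extend(entries)
    return merged
```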
Last edited by Tilps - 2007.01.17 08:13:16 |
Brian Kwon-Tom Obsessive Puzzles: 4907 Best Total: 9m 6s | Posted - 2007.01.17 18:55:30 This is interesting.
I think I'm relatively better on the big weekend puzzles (equivalently, relatively worse on the shorter weekday puzzles), so I'm guessing that I'll rank lower in your system than I would by ranking weekly times. (On the other hand, this effect might simply be due to shorter puzzles having a higher variance in solving time.) But in any case it would be interesting to see how well these two ranking systems correlate.
Also, if you kept doing this for a while, it would be interesting to see ratings for specific days, because I'm sure some people are (relatively) better at solving hard puzzles than easy ones or vice versa.
Last edited by Brian - 2007.01.17 18:56:19 |
astrokath Kwon-Tom Obsessive Puzzles: 3258 Best Total: 13m 42s | Posted - 2007.01.17 19:12:53
Quote: Originally Posted by brian This is interesting.
I think I'm relatively better on the big weekend puzzles (equivalently, relatively worse on the shorter weekday puzzles), so I'm guessing that I'll rank lower in your system than I would by ranking weekly times. (On the other hand, this effect might simply be due to shorter puzzles having a higher variance in solving time.) But in any case it would be interesting to see how well these two ranking systems correlate.
Also, if you kept doing this for a while, it would be interesting to see ratings for specific days, because I'm sure some people are (relatively) better at solving hard puzzles than easy ones or vice versa. |
Mmm. I'm not quite as competitive as I used to be for the quickest puzzles, but I seem to be holding my own for the trickier-than-average ones. Of course, the long, slow puzzles don't get you on the BECT board! |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.17 20:36:08
Quote: Originally Posted by brian This is interesting.
...
Also, if you kept doing this for a while, it would be interesting to see ratings for specific days, because I'm sure some people are (relatively) better at solving hard puzzles than easy ones or vice versa. |
My thoughts have headed in this direction too, but it's going to be months before such ratings mean anything significant.
It's easy to tack onto my current system, but I think I will wait a month before showing them for the first time.
Edit: Files have been updated again to include the last 12 hours of data for latest/movement.
Last edited by Tilps - 2007.01.17 21:19:48 |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.18 00:03:32
Quote: Originally Posted by brian This is interesting.
I think I'm relatively better on the big weekend puzzles (equivalently, relatively worse on the shorter weekday puzzles), so I'm guessing that I'll rank lower in your system than I would by ranking weekly times. (On the other hand, this effect might simply be due to shorter puzzles having a higher variance in solving time.) But in any case it would be interesting to see how well these two ranking systems correlate.
|
I have done some thinking on this matter too now. I can implement another set of pages which show ratings changes based on 'weekly times'; however, I have several different designs which may be worth considering.
1) Ratings update 'daily' like the current ratings. The weekly total is calculated for each day by going back over the last 7 days. This means each day is included in 7 different ratings calculations.
a) A competitor is included if they have done any of the last 7 puzzles for that day; otherwise they are deemed non-competing and keep their previous rating.
b) A competitor is included if they have done all 7 previous puzzles; otherwise they are deemed non-competing and keep their previous rating.
2) Ratings are updated once a week, based only on days which are closed to further competition. (Only a Stable-Weekly.html is generated, no Latest-Weekly.html.) Each day is only included in one rating calculation. It seems pretty obvious that competitors should only be excluded from rating calculation and change if they don't compete on any of the 7 days.
3) New ratings are generated each day like in 1), but the rating calculations are done like in 2), with the exception that maybe a useful Latest-Weekly.html can be generated.
The problem with 2 and 3 is that the number of rating changes between the beginning of time and now is much smaller, which means it takes longer for the ratings to become meaningful. The problem with 1 is that each day gets used in so many ratings.
I guess I could do both 1 and 3 (given that 2 is just Sunday's Stable-Weekly.html from 3) and see how they go. |
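To make the difference between options 1a and 1b concrete, here is a small Python sketch of the eligibility test for a 7-day rolling window. The data layout (each player mapped to a set of day numbers they solved) and the function name are assumptions made for illustration only.

```python
def rolling_competitors(solved_days, window_days, require_all=False):
    """Return the players counted as competing in a rolling window.

    solved_days: dict player -> set of day numbers that player solved (hypothetical layout).
    window_days: the day numbers in the window, e.g. the last 7 days.
    require_all=False is rule (a): any puzzle in the window counts.
    require_all=True is rule (b): every puzzle in the window is required.
    """
    window = set(window_days)
    competitors = set()
    for player, days in solved_days.items():
        done = days & window
        eligible = (done == window) if require_all else bool(done)
        if eligible:
            competitors.add(player)
    return competitors
```

Anyone left out of the returned set would be treated as non-competing and keep their previous rating.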
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.18 08:34:20 Okay, here we go. More ratings than you can poke a stick at.
Kwon-Tom Ratings
We now have: Normal, where rating is updated each day based on today's performance.
Weekly, where rating is updated weekly, based on the last week's performance. (On Tuesdays, Weekly is the week ending Tuesday; on Wednesdays, Weekly is the week ending Wednesday; and so on.)
Rolling, where rating is updated daily, based on the last week's performance.
Daily, where rating is updated weekly, based on the last day's performance. Since people will always be interested in each of the 7 different daily ratings, all 7 are available, rather than just today's as it is for weekly.
Rolling and weekly stats are very vulnerable to missing out on a puzzle, rolling even more than weekly, I suspect. Therefore, given the incomplete data I have at the start of my score table, these ratings may show very unexpected results for quite some time.
Daily stats simply need a lot more data (as do weekly) before they become really interesting. Both need 7 times more data than rolling and normal, and I suspect at least 3 weeks of data is needed for normal before the top-rated players' ratings stop rising monotonically.
Enjoy!
Last edited by Tilps - 2007.01.23 03:45:34 |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.18 08:39:29 Would anyone like 'wrong' ratings? - it would be quite easy to add them to my system if a LastWrongHour.php page was added to the site much like the LastHour page. |
Stephen Kwon-Tom Obsessive Puzzles: 5215 Best Total: 21m 48s | Posted - 2007.01.18 10:00:30 How about a 'rank' column on the tables, i.e. simply 1-n? If you are outside the top 10 or so, it's depressing to have to count the lines each time to see where you stand! |
foilman Kwon-Tom Admin Puzzles: 3613 Best Total: 24m 6s | Posted - 2007.01.18 10:01:46 Wrongs Solved In Last Hour |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.18 10:42:08 And to show how fast I can add the wrongs - they are done (*will clean up the code later *)
Kwon Tom Ratings Kwon Tom Wrong Ratings
Moved the pages to new urls, since the number of files in my download directory was getting out of hand.
And I added the rank feature too.
Edit: Scores updated again.
Last edited by Tilps - 2007.01.18 21:16:07 |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.19 23:10:02 Another Daily update done.
Edit: And another.
Last edited by Tilps - 2007.01.21 00:05:46 |
Brian Kwon-Tom Obsessive Puzzles: 4907 Best Total: 9m 6s | Posted - 2007.01.21 00:39:23
Quote: Originally Posted by tilps Another Daily update done.
Edit: And another. |
So I take it that you manually update the webpage that lists the rankings, but you have a program that has them at any moment?
It might be interesting to have not just the latest movement, but individual histories too, possibly in graph form. (You've probably already considered this.) Of course this would only be useful after a few weeks when the ratings have levelled off.
Anyway, cool stuff. |
m2e Kwon-Tom Obsessive Puzzles: 607 Best Total: 16m 43s | Posted - 2007.01.21 04:26:57 Also what happens when a player misses a day? |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.21 06:32:19 Missing a day is quite bad for weekly ratings, very bad for rolling ratings, and not relevant for the rest. For the rest of the ratings, if you miss a day, you keep your existing ratings. But because weekly and rolling are based on 'weekly total time', everyone with 6 completed days is ranked lower than anyone with 7 completed days, in line with the leader board.
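As a rough Python sketch of that ordering (assumed data layout and function name, not the real code): sort first by the number of completed days, most first, and only then by total time.

```python
def weekly_ranking(times):
    """Rank players for a week: more completed days first, then lower total time.

    times: dict player -> list of solve times in seconds, one per completed day
    (hypothetical layout). Players with no completed days are left out.
    """
    active = [p for p in times if times[p]]
    order = sorted(active, key=lambda p: (-len(times[p]), sum(times[p])))
    return {player: rank for rank, player in enumerate(order, start=1)}
```

So a player with six very fast days still ranks below anyone who finished all seven, however slowly.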
And long-term player histories - I'll consider them, but since I'm only using static web pages, that is a very large number of static web pages to be producing at all times. (And I'm currently spending my spare time on development for my Apple Hunt game, so not so much time to spend learning PHP in order to generate the player histories on demand.)
Edit: Updated ratings again.
Last edited by Tilps - 2007.01.21 22:04:29 |
fgnn Kwon-Tom Obsessive Puzzles: 717 Best Total: 19m 46s | Posted - 2007.01.22 02:42:05 And how does bombing a puzzle affect your overall score? Like, a 2-minute puzzle and a 4-minute puzzle are both good attempts, but a 2-day puzzle was clearly a screwup and is not representative of how good you actually are. Do you account for this at all? (Sorry if you already said this.) |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.22 02:49:06 Absolute score takes no part in the calculation of new ratings, only your ranking within a given puzzle. Therefore, if you do exceptionally badly on one puzzle, the worst you can do is come last. Your rating will probably suffer, but it's only going to be a short-term thing; you will rebuild your rating based on your normal performance. The greatest number of points your rating can change in a single day is 30.
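A tiny Python illustration of why the absolute time does not matter: once times are turned into ranks, a 2-day solve and a 5-minute solve that both come last look identical. The function name and data layout are made up, and ties are simply broken by input order here.

```python
def ranks_from_times(times):
    """Convert solve times (seconds) into ranks, 1 = fastest.

    times: dict player -> solve time in seconds (hypothetical layout).
    Only the ordering survives; the size of the gap is discarded.
    """
    order = sorted(times, key=times.get)
    return {player: rank for rank, player in enumerate(order, start=1)}


# Coming last by two days or by one minute yields the same ranks.
bombed = ranks_from_times({"a": 120, "b": 240, "c": 2 * 24 * 3600})
close = ranks_from_times({"a": 120, "b": 240, "c": 300})
```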
Of course, someone trying to game the system to boost their rating would just not submit at all on a day when they think they have done really badly. This, however, is only likely to help their normal and individual-day ratings. It would be a major detriment to their weekly and rolling ratings, since doing 7 puzzles is always considered better than 6 in those cases. |
Tilps Kwon-Tom Obsessive Puzzles: 6720 Best Total: 18m 37s | Posted - 2007.01.23 03:46:52 Updated again about 5-6 hours ago - just went through and updated all the links in my posts in this thread to point to the location I moved them to last weekend. |