Dedicated Chess Computer Test Scores
Moderators: Harvey Williamson, Steve B, Watchman
My formulas were truncated:
USCF = 180 + 0.94 × FIDE, if FIDE is under 2000
USCF = 20 + 1.02 × FIDE, if FIDE is over 2000
FIDE → USCF
2787 → 2862.74
2773 → 2848.46
2718 → 2792.36
2659 → 2732.18
I have checked the first five players on your list, and it seems to fit quite well!
But none of this is really important; it is just to have a reference.
Br
Nicolas
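For anyone who wants to apply the conversion programmatically, the two-branch fit above can be written as a small function (a sketch only; the function name is mine, not from the post):

```python
def fide_to_uscf(fide):
    """Approximate a USCF rating from a FIDE rating using the two-branch
    linear fit from the post. Note the branches meet at FIDE 2000
    (both give 2060), so the conversion is continuous."""
    if fide < 2000:
        return 180 + 0.94 * fide
    return 20 + 1.02 * fide

# Spot-check against the table above
for fide, uscf in [(2787, 2862.74), (2773, 2848.46), (2718, 2792.36), (2659, 2732.18)]:
    assert round(fide_to_uscf(fide), 2) == uscf
```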
Another piece of information: at the recent Millionaire Open #2 in Las Vegas, FIDE ratings were used without adjustment in the Open section where available; otherwise the USCF rating was used. In ALL other sections, any FIDE rating used had 60 points added to it. I am not sure of the methodology used to arrive at a 60-point difference, but the organizers were trying to eliminate sandbagging and players entering the wrong section.
Rating regards,
Brian B
- spacious_mind
- Senior Member
- Posts: 4018
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
Can you also try the less-than-2000 ratings?
scandien wrote:My formulas were truncated:
USCF = 180 + 0.94 × FIDE, if FIDE is under 2000
USCF = 20 + 1.02 × FIDE, if FIDE is over 2000
FIDE → USCF
2787 → 2862.74
2773 → 2848.46
2718 → 2792.36
2659 → 2732.18
I have checked the first five players on your list, and it seems to fit quite well!
But none of this is really important; it is just to have a reference.
Br
Nicolas
Regards
Nick
- spacious_mind
- Senior Member
- Posts: 4018
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
Cool, that seems to fit between the 41 and 87 that I had found.
Brian B wrote:Another piece of information: at the recent Millionaire Open #2 in Las Vegas, FIDE ratings were used without adjustment in the Open section where available; otherwise the USCF rating was used. In ALL other sections, any FIDE rating used had 60 points added to it. I am not sure of the methodology used to arrive at a 60-point difference, but the organizers were trying to eliminate sandbagging and players entering the wrong section.
Rating regards,
Brian B
So I still believe you can use the chess.com scale up to 1700 and work out a progression upwards.
Best regards
Nick
Hello
Whatever rating system is used (which may introduce only an offset), it seems that when tuning "computer vs. computer" ratings to "man vs. machine" results, we have to adjust the computer ratings of the stronger engines.
After a while and several tests, I consider that for the stronger engines (rated over 2000 in my rating list) I have to adjust my list with the following formula:
Rating vs. Human = 2000 + (Rating vs. Computer − 2000) × 0.6
My own tests seem to show that this tuning is not necessary for machines rated under 2000.
The vs.-human data come from:
. CRA ratings,
. ratings from games against rated humans (internet information, mostly from AEGON or "Hombre versus Machina" data),
. internet games against humans on FICS,
. Selective Search data (when available, for a few machines).
All these data seem relevant and consistent. I remember that L. Kaufman made the same observation in his old CCR ratings. I never really understood why the ratings of the stronger machines have to be tuned, but it seems they do.
Strangely, I found that the Elo Aktiv rating list seems to be fine and does not need tuning...
I would be happy to get any explanation (or idea) for this behavior.
Br
Nicolas
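The adjustment formula above leaves ratings at or below 2000 untouched and compresses the excess over 2000 by 40%. A minimal sketch (the function name is mine):

```python
def rating_vs_human(vs_computer_rating):
    """Nicolas's tuning: compress the part of a computer-vs-computer
    rating above 2000 by a factor of 0.6 to estimate the vs-human rating."""
    if vs_computer_rating <= 2000:
        return vs_computer_rating  # no tuning needed below 2000
    return 2000 + (vs_computer_rating - 2000) * 0.6

# e.g. a machine rated 2500 vs. computers comes out at 2300 vs. humans
```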
- Steve B
- Site Admin
- Posts: 10146
- Joined: Sun Jul 29, 2007 10:02 am
- Location: New York City USofA
- Contact:
Hi Nicolas
scandien wrote:Hello
Whatever rating system is used (which may introduce only an offset), it seems that when tuning "computer vs. computer" ratings to "man vs. machine" results, we have to adjust the computer ratings of the stronger engines.
After a while and several tests, I consider that for the stronger engines (rated over 2000 in my rating list) I have to adjust my list with the following formula:
Rating vs. Human = 2000 + (Rating vs. Computer − 2000) × 0.6
My own tests seem to show that this tuning is not necessary for machines rated under 2000.
The vs.-human data come from:
. CRA ratings,
. ratings from games against rated humans (internet information, mostly from AEGON or "Hombre versus Machina" data),
. internet games against humans on FICS,
. Selective Search data (when available, for a few machines).
Selective Search published a dedicated-vs-human rating list each month for years, and it included a lot more than a few machines.
The last published list I can find is from 2005 and includes about 100 different computers.
The number of games played per machine ranged from a small sample to hundreds of games.
I didn't scientifically analyze the human rating list, but there does not seem to be a pattern for computers rated over or under 2000, or over or under 2200, or any other Elo range.
Some machines are rated higher against humans, some lower... some rating differences are as little as 10 points and some as high as 200 points.
An example of differences vs. humans:
Tasc R30 −78
Mephisto Genius 030 +10
Fidelity Eag V5 +25
Mephisto MM5 −119
Extensive List Regards
Steve
Last edited by Steve B on Sat Jan 23, 2016 11:09 am, edited 1 time in total.
- spacious_mind
- Senior Member
- Posts: 4018
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
Hi Nicolas,
scandien wrote:Hello
Whatever rating system is used (which may introduce only an offset), it seems that when tuning "computer vs. computer" ratings to "man vs. machine" results, we have to adjust the computer ratings of the stronger engines.
After a while and several tests, I consider that for the stronger engines (rated over 2000 in my rating list) I have to adjust my list with the following formula:
Rating vs. Human = 2000 + (Rating vs. Computer − 2000) × 0.6
My own tests seem to show that this tuning is not necessary for machines rated under 2000.
The vs.-human data come from:
. CRA ratings,
. ratings from games against rated humans (internet information, mostly from AEGON or "Hombre versus Machina" data),
. internet games against humans on FICS,
. Selective Search data (when available, for a few machines).
All these data seem relevant and consistent. I remember that L. Kaufman made the same observation in his old CCR ratings. I never really understood why the ratings of the stronger machines have to be tuned, but it seems they do.
Strangely, I found that the Elo Aktiv rating list seems to be fine and does not need tuning...
I would be happy to get any explanation (or idea) for this behavior.
Br
Nicolas
In a sense no list needs tuning, because the data as they stand are correct: they reflect the results actually played. But for tuning against humans, your list would only be selectively correct if you fixed only the over-2000 ratings.
It absolutely needs tuning from top to bottom. As I stated before, the bottom entries, when compared to humans, are far more wrong than even the top ones.
You have to start from the bottom. As I have stated previously, there is no such thing as a computer that is weaker than a beginner. If you really want to fix the problem, then you have to consider that at the bottom there would hardly be a computer below 1000 USCF or 1200 FIDE. Currently all those lists, whether for dedicated machines or engines, run as low as 700. Now, since you are all telling me these lists are European and FIDE, that would make the 700 computer a 500 USCF one, and neither of those exists, especially since FIDE cannot be lower than USCF.
You are at the moment only adjusting 25% of the problem and not considering the other 75%. It is a 100% problem that must be fixed.
Best regards
Nick
Yes, I know... I tried unsuccessfully to get this rating list... It should be interesting...
Steve B wrote:Selective Search published a dedicated-vs-human rating list each month for years, and it included a lot more than a few machines.
The last published list I can find is from 2005 and includes about 100 different computers.
The number of games played per machine ranged from a small sample to hundreds of games.
I think it depends mostly on the program style. Some are really effective against humans (such as the Henne, Kaplan, or Lang programs) and others are tuned to play against other computers (Rathsman's or Morsch's programs). Schröder's may be better against humans (MM IV, Polgar) or against computers (MM V).
Steve B wrote:
I didn't scientifically analyze the human rating list, but there does not seem to be a pattern for computers rated over or under 2000, or over or under 2200, or any other Elo range.
Some machines are rated higher against humans, some lower... some rating differences are as little as 10 points and some as high as 200 points.
An example of differences vs. humans:
Tasc R30 −78
Mephisto Genius 030 +10
Fidelity Eag V5 +25
Mephisto MM5 −119
Extensive List Regards
Steve
Br
Nicolas
- spacious_mind
- Senior Member
- Posts: 4018
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
Sure, a player's way of playing can be affected by the computer opponent they face; I am not disagreeing with that. The R30, however, if you read Kopec again, is the ideal candidate for the prestige rating adjustment he listed. It is at the top of the food chain, and it will have that effect as part of its rating for sure.
scandien wrote:Yes, I know... I tried unsuccessfully to get this rating list... It should be interesting...
Steve B wrote:Selective Search published a dedicated-vs-human rating list each month for years, and it included a lot more than a few machines.
The last published list I can find is from 2005 and includes about 100 different computers.
The number of games played per machine ranged from a small sample to hundreds of games.
I think it depends mostly on the program style. Some are really effective against humans (such as the Henne, Kaplan, or Lang programs) and others are tuned to play against other computers (Rathsman's or Morsch's programs). Schröder's may be better against humans (MM IV, Polgar) or against computers (MM V).
Steve B wrote:
I didn't scientifically analyze the human rating list, but there does not seem to be a pattern for computers rated over or under 2000, or over or under 2200, or any other Elo range.
Some machines are rated higher against humans, some lower... some rating differences are as little as 10 points and some as high as 200 points.
An example of differences vs. humans:
Tasc R30 −78
Mephisto Genius 030 +10
Fidelity Eag V5 +25
Mephisto MM5 −119
Extensive List Regards
Steve
Br
Nicolas
Certain other computers will also have this for sure. You have to remember that the lists don't play opponents equally and are strongly influenced by the opponents actually played, so there are going to be anomalies throughout a large list. The rating formulas also don't allow for calibration, so the bigger the list, the bigger the deviations from reality at the top and at the bottom.
Best regards
Nick
- spacious_mind
- Senior Member
- Posts: 4018
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
I have adjusted the chess.com list to go all the way down to USCF 800 (absolute beginner) and matched it to the FIDE equivalent per chess.com's analysis.
Here are the original SSDF ratings compared to today's list.
Hopefully you can see, as I can clearly, the pattern between the original SSDF and chess.com's FIDE.
Best regards
Nick
Maybe we are all wrong... the level of computer chess against humans does not really work like man-vs-man ratings.
I have run several matches on the internet with several machines. For every machine (except the MEPHISTO MIRAGE and KRYPTON REGENCY) the result was clear: the machine played at a specific level (the range of comparable opponents was each time about 100 points wide). Players below this range were crushed, and players above this range won easily (a set of players rated 200 points above the machine won 80-85% of the games).
The machines' results are not compliant with the Elo formulas.
So it matters when you defeat a machine in a match (I just ran a small four-game match against the Roma II and won 2.5-0.5; the fourth game was unnecessary, as I had already qualified for the next round). This is a small match, but I am certainly not rated 273 points above the Roma.
The results are the same in computer-vs-computer matches (the MEPHISTO NIGEL SHORT outclassed the SAITEK TURBO ADVANCED TRAINER 2.5-0.5, and the NOVAG SUPER FORTE C outclassed the MEPHISTO MODENA by the same score).
This is why I try to run comparable games and matches with various opponents.
Br
Nicolas
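For reference, here is the standard Elo expectation being compared against: a 200-point favorite is expected to score about 76%, so the observed 80-85% is above what the formula predicts. A minimal sketch:

```python
def expected_score(rating_diff):
    """Elo expected score for the higher-rated player, given the rating gap."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

# A 200-point gap predicts roughly a 76% score for the stronger side,
# versus the 80-85% observed against the machines above.
print(round(expected_score(200), 2))
```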
Hello again
I got some interesting results for machine vs. man under tournament conditions. The reference is the FIDE rating:
Novag Sapphire: 34 games - perf 2087 (2100 on FICS)
Novag Diablo/Scorpio: 19 games - perf 2185
MEPHISTO ATLANTA: 6 games - perf 2382
MEPHISTO MILANO/POLGAR: 6 games - perf 2237
MEPHISTO BERLIN PRO: 7 games - perf 2300
FIDELITY DESIGNER MACH III: 4 games - perf 2160
A lot of the data come from Hombre vs. Machina tournaments or AEGON results.
I know that 6 games is not a lot, but it may help for a good tuning. And 19 or 34 games are enough to get an official rating.
BR
Nicolas
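As a rough guide to how performance figures like these are computed, here is the common linear approximation (average opponent rating plus 400 × (wins − losses) / games). This is a sketch only; FIDE's official calculation uses a score-percentage-to-rating-difference (dp) lookup table rather than this linear form:

```python
def performance_rating(avg_opponent, wins, draws, losses):
    """Linear-approximation performance rating:
    Rp = average opponent rating + 400 * (wins - losses) / games.
    FIDE's official method uses a dp lookup table instead."""
    games = wins + draws + losses
    return avg_opponent + 400.0 * (wins - losses) / games

# e.g. 3 wins and 1 loss against an average-2000 field -> about 2200
```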
- Steve B
- Site Admin
- Posts: 10146
- Joined: Sun Jul 29, 2007 10:02 am
- Location: New York City USofA
- Contact:
The Milano and the Polgar are not the same exact program, so why are they listed together?
scandien wrote:Hello again
I got some interesting results for machine vs. man under tournament conditions. The reference is the FIDE rating:
Novag Sapphire: 34 games - perf 2087 (2100 on FICS)
Novag Diablo/Scorpio: 19 games - perf 2185
MEPHISTO ATLANTA: 6 games - perf 2382
MEPHISTO MILANO/POLGAR: 6 games - perf 2237
MEPHISTO BERLIN PRO: 7 games - perf 2300
FIDELITY DESIGNER MACH III: 4 games - perf 2160
A lot of the data come from Hombre vs. Machina tournaments or AEGON results.
I know that 6 games is not a lot, but it may help for a good tuning. And 19 or 34 games are enough to get an official rating.
BR
Nicolas
Anyway... let's compare to the Selective Search human list from 2005:
Novag Sapphire: 83 games - 2139
Novag Diablo/Scorpio: 140 games - 2126
MEPHISTO ATLANTA: 9 games - 2357
MEPHISTO MILANO: 14 games - 2087
Mephisto POLGAR 5 MHz: 17 games - 2076
MEPHISTO BERLIN PRO: 29 games - 2217
FIDELITY DESIGNER MACH III: 245 games - 2107
All close except for the MILANO/POLGAR rating
Comparative Regards
Steve
- spacious_mind
- Senior Member
- Posts: 4018
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
Yep, that's all interesting. Kaufman spent a lot of time on the R30 in 1994. He was quite impressed with it and rated King 2.5 at around 2530. The 1994 list showed King 2.2 with a mean of 2526: CCN (Eric Hallsworth) at 2521 and PLY (SSDF) at 2530.
Steve B wrote:The Milano and the Polgar are not the same exact program, so why are they listed together?
scandien wrote:Hello again
I got some interesting results for machine vs. man under tournament conditions. The reference is the FIDE rating:
Novag Sapphire: 34 games - perf 2087 (2100 on FICS)
Novag Diablo/Scorpio: 19 games - perf 2185
MEPHISTO ATLANTA: 6 games - perf 2382
MEPHISTO MILANO/POLGAR: 6 games - perf 2237
MEPHISTO BERLIN PRO: 7 games - perf 2300
FIDELITY DESIGNER MACH III: 4 games - perf 2160
A lot of the data come from Hombre vs. Machina tournaments or AEGON results.
I know that 6 games is not a lot, but it may help for a good tuning. And 19 or 34 games are enough to get an official rating.
BR
Nicolas
Anyway... let's compare to the Selective Search human list from 2005:
Novag Sapphire: 83 games - 2139
Novag Diablo/Scorpio: 140 games - 2126
MEPHISTO ATLANTA: 9 games - 2357
MEPHISTO MILANO: 14 games - 2087
Mephisto POLGAR 5 MHz: 17 games - 2076
MEPHISTO BERLIN PRO: 29 games - 2217
FIDELITY DESIGNER MACH III: 245 games - 2107
All close except for the MILANO/POLGAR rating
Comparative Regards
Steve
OK, so if you take the mean of 2526 for King 2.2 and deduct, say, the GM average difference of 87, or the 60 from the casinos, then you would have a rating for King 2.2 of somewhere between 2439 and 2466 Elo. The Active list has it at 2367, which is a difference of between 72 and 99 Elo.
Atlanta on the Active list is 2266, which is 101 Elo lower than the Active King 2.2 rating.
Based on your two independent human Atlanta ratings, the average rating over the combined 15 games is 2367.
Therefore you have the following:
1) Kaufman, CCN & PLY would probably have rated Atlanta at around 2435
2) Deduct 87 and you get 2348
3) Deduct 60 and you get 2375
Mean = 2362
It is all very close really.
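The arithmetic above can be reproduced in a few lines (the 2435 Atlanta estimate and the 87/60 offsets are the figures quoted in this thread, not independently verified):

```python
# Game-weighted average of the two human Atlanta performances:
# 2382 over 6 games and 2357 over 9 games (15 games combined)
perfs = [(2382, 6), (2357, 9)]
atlanta_human = sum(r * n for r, n in perfs) / sum(n for _, n in perfs)
print(atlanta_human)  # 2367.0

# Estimated 1994-style rating of 2435, minus each offset, then averaged
estimate = 2435
mean_adjusted = ((estimate - 87) + (estimate - 60)) / 2
print(mean_adjusted)  # 2361.5, i.e. the "Mean = 2362" figure when rounded
```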
So let's assume the 60-point deduction becomes a standard for ratings over 2000. Then near the top of the dedicated list you would have:
King 2.2 at 2466 as a list-calibration computer
Atlanta at 2367 as a list-calibration computer (your human rating)
At the bottom you could have:
MK I at 985 as a list-calibration computer
CC7 (which has plenty of games) at 1311 as a list-calibration computer
And I think you would soon find that most other computers would fall quite closely into place for a reasonably good comparison against humans.
The problem with today's rating software, like EloStat or BayesElo, is that you can only calibrate one computer. That means when you run your list with 300-500 computers, you are very quickly so far away from human ratings that it becomes impossible to compare accurately across the whole list. Think about it: you are expecting a good human comparison across 500 computers based on one calibration. It just doesn't work.
You almost need one calibration per, say, every 50 computer programs to have a good chance of being accurate across the complete list.
So anyway, this brings the circle around to why I originally posted my questions to Nicolas.
Best regards
Nick