Dedicated Chess Computer Test Scores

This forum is for general discussions and questions, including Collectors Corner and anything to do with Computer chess.

Moderators: Harvey Williamson, Steve B, Watchman

scandien
Member
Posts: 206
Joined: Mon Sep 12, 2011 1:15 pm

Post by scandien »

Steve B wrote:
the Milano and Polgar are not the exact same program, so why are they listed together?


Steve
I was quite sure that those two programs were nearly the same. To me, the Milano is the Polgar program put into a cheaper machine (with some small improvements).
Was I wrong?
Best regards

Nicolas
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

scandien wrote:
Steve B wrote:
the Milano and Polgar are not the exact same program, so why are they listed together?


Steve
I was quite sure that those two programs were nearly the same. To me, the Milano is the Polgar program put into a cheaper machine (with some small improvements).
Was I wrong?
Best regards

Nicolas
Hi Nicolas

Yes, the Milano, Polgar, and Nigel Short should always be kept separate.

Regards
Nick
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

I was just checking to see whether there were any major rating variances over the years.

Here is Bobby Fischer's rating:

Image

Apparently there is only a minor difference between USCF and FIDE for Bobby Fischer.

No major changes. Regards,
Nick
scandien
Member
Posts: 206
Joined: Mon Sep 12, 2011 1:15 pm

Post by scandien »

Fischer is not a good example: his level was so high that any rating would be almost meaningless. He was better than all the Soviet grandmasters and outclassed every Western champion of his time.

I should point out that I am a Bobby fan :)

Here is an interesting link with some information about ratings:

http://web.tecnico.ulisboa.pt/diogo.fer ... rength.pdf

Best regards


Nicolas
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

scandien wrote: Fischer is not a good example: his level was so high that any rating would be almost meaningless. He was better than all the Soviet grandmasters and outclassed every Western champion of his time.

I should point out that I am a Bobby fan :)

Here is an interesting link with some information about ratings:

http://web.tecnico.ulisboa.pt/diogo.fer ... rength.pdf

Best regards


Nicolas
The example is about the correlation between USCF and FIDE, not about his strengths or weaknesses :) The ratings I posted are official ratings; I am not judging whether his abilities should place him higher or lower :)

If you evaluate every single grandmaster game, or all of Bobby Fischer's games, you will get a range of performances from him probably spanning 2000 to 3400 ELO. You can't even say that the Byrne game was the best game he ever played without evaluating all of his games.

Best regards
Nick
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

Hi Nicolas,

You mentioned that you had struggles with Kaplan programs. Well, you are not the only one. Take a look at this chart that I created for the 1983 and 1985 US Open Championships.

COMPUTERS IN US OPEN CHAMPIONSHIPS

Image

Kaplan's Turbostar 432 had a fantastic tournament in 1985, performing at ELO 2151 against some well-rated opponents. Also, take a look at how CCR's mean performance rating for the Turbostar 432 changed over the years.

In 1983 you can see Super Constellation with a rating of 2018. Also take a look at how the ratings changed over the years. In 1995 Kaufman started adding a CRA rating list, which covers only the performances of computers versus humans. Notice how Kaufman rated Super Constellation back at 2018. I am starting to believe that Kaufman never fully bought into the CCN or PLY arguments about where the USCF ratings should lie. I think this is why he decided to start listing a CRA rating as well, and why he never fully accepted those lower ratings when comparing against humans. He went back to 2018 ELO for Super Constellation in 1995 in his CRA list.

The British and SSDF lists tried to convince people that, as players became more familiar with chess computers over the years, a Super Constellation should now suddenly be rated 1650 or so. You know that's a load of codswallop.

I notice that a man called Groszek played in both 1983 and 1985. His USCF rating remained consistent through that period, and his performances against computers also remained consistent.

You cannot base ratings on people who own a particular chess computer and win by continuously repeating the same winning formula against it. The only real measure is true performance against humans in proper tournaments or club matches. Therefore I don't buy any argument from people familiar with their own chess computers that is used to justify lower ratings in order to fit a rating-list formula. If anything, that player should adjust his own rating for familiarity, based on how he performs against chess computers, rather than the other way around.

I have color marked people who played more than once against computers in the above tournaments.

Here is where the 200 ELO that has us all confused came from.

Image

Kaufman increased CCN by 100 points and PLY by 200 points so that he could make a rough comparison between the ratings of the British and Swedish rating lists, let US players get a feel for their own strength, and at the same time keep his human-versus-computer tests aligned. His CCR ratings, which are based only on humans versus computers, showed different results; of course they would, since they compare humans against computers rather than a mixture of both, or computers against computers only.

As you can see, over the years he tried to make those adjustments, until finally in 1995 he decided to show the human-versus-computer results separately again, and therefore went back to showing Super Constellation at 2018. That is probably where Super Constellation rightly belongs.

In 1995 he also changed his PLY adjustment from 200 ELO to 180 ELO.

Here are some more human ratings for chess computers from the same 1995 CCR report:

Image
Image

Arthur Bisguier, whom we should all know since he played Bobby Fischer many times, also seems to think that Mephisto Montreux is rated around USCF 2495. Deduct 60-87 and you have an approximate FIDE playing strength of ELO 2435-2408. Fits with Atlanta again?

We are talking about GM Kaufman, GM Bisguier, and other top-rated players. Yet we doubt them just to prove that a computer rating list is right for human comparison? That is really strange.

Chess computer rating lists are fine for relative comparisons between computers, but today's lists are really not that useful for comparing strength against humans. That is a pity, since it is what I believe Larry Kaufman always tried to achieve in his reports, until they were finally dropped.

Take any of the above examples from Kaufman, deduct 60-87, and you are probably close to a FIDE rating for that chess computer.
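The deduction described here can be expressed as a tiny helper. This is only a sketch of the thread's own rule of thumb; the 60-87 point gap is the posters' estimate, not an official conversion.

```python
def uscf_to_fide_estimate(uscf, gap=(60, 87)):
    """Estimate a FIDE range from a USCF rating by subtracting the
    thread's assumed 60-87 point USCF-to-FIDE gap.
    Returns (low, high) FIDE bounds."""
    small, large = gap
    return uscf - large, uscf - small

# Bisguier's USCF 2495 estimate for Mephisto Montreux:
low, high = uscf_to_fide_estimate(2495)
print(low, high)  # 2408 2435
```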

Best regards

PS: For Constellation 3.6 I dropped the two red results and added Williams from the same year's report, to bring the number of players who faced a Constellation up to a sufficient count.
Nick
scandien
Member
Posts: 206
Joined: Mon Sep 12, 2011 1:15 pm

Post by scandien »

Hello,

Very interesting data :). Thanks.

I totally agree with you that the machines are not weaker nowadays than they were 30 years ago. Those machines play exactly as they did in the year they were produced.
This is why, to rate the machines, I take a mean of several sources: my own computer games, the Aktiv List, and CCR as computer-versus-computer sources, plus the CRA rating, internet ratings, and my own tests over the internet (I run several matches).
spacious_mind wrote:
The British and SSDF lists tried to convince people that, as players became more familiar with chess computers over the years, a Super Constellation should now suddenly be rated 1650 or so. You know that's a load of codswallop.
Regarding the SSDF list and the drop in level, I once again totally agree with you. The reason for this drop seems dubious to me. I think it was caused by the newer machines/programs being overestimated, so the levels were reduced by 100.
I think the overestimation of new programs was due to what you call the Kopec effect (and to the phenomenon pointed out in my previous posts: the results of much stronger machines are overestimated when they face opponents who are too weak).

In my tests against internet players (on FICS, where the standard level is close to the FIDE level according to FICS statistics and my own experience, so it is quite easy to link the two rating systems), the Constellation (not even the Super Conny) performs at 1840 in active chess against 1810-rated opposition. If, like Larry Kaufman, you reduce this rating by 70 to get a 40/2 rating, you can consider that the Constellation plays at a level near 1770.
To get the Super Conny's level you should add at least 60 points (Aktiv List), so its level should be 1830 on FICS and therefore at least 1830 on FIDE (going by the FICS statistics, the approximate level would be 1900). Using the USCF formula you get a 1900 (or 1930) USCF rating.

The detailed results can be found here:

[url]file:///H:/Perso/echecs/echiqiuer%20de%20Bures/Machine_vs_Internet_en.xhtml[/url]

I have not translated that page yet, but the Constellation could have achieved a much better rating if its endgame handling were better. I don't own a Super Conny, so I don't know whether it handles the endgame better.

You can find interesting data on other computers there. I will try to find time to translate the page into English (ongoing).

Best regards

Nicolas
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

scandien wrote:Hello,

Very interesting data :). Thanks.

I totally agree with you that the machines are not weaker nowadays than they were 30 years ago. Those machines play exactly as they did in the year they were produced.
This is why, to rate the machines, I take a mean of several sources: my own computer games, the Aktiv List, and CCR as computer-versus-computer sources, plus the CRA rating, internet ratings, and my own tests over the internet (I run several matches).
spacious_mind wrote:
The British and SSDF lists tried to convince people that, as players became more familiar with chess computers over the years, a Super Constellation should now suddenly be rated 1650 or so. You know that's a load of codswallop.
Regarding the SSDF list and the drop in level, I once again totally agree with you. The reason for this drop seems dubious to me. I think it was caused by the newer machines/programs being overestimated, so the levels were reduced by 100.
I think the overestimation of new programs was due to what you call the Kopec effect (and to the phenomenon pointed out in my previous posts: the results of much stronger machines are overestimated when they face opponents who are too weak).

In my tests against internet players (on FICS, where the standard level is close to the FIDE level according to FICS statistics and my own experience, so it is quite easy to link the two rating systems), the Constellation (not even the Super Conny) performs at 1840 in active chess against 1810-rated opposition. If, like Larry Kaufman, you reduce this rating by 70 to get a 40/2 rating, you can consider that the Constellation plays at a level near 1770.
To get the Super Conny's level you should add at least 60 points (Aktiv List), so its level should be 1830 on FICS and therefore at least 1830 on FIDE (going by the FICS statistics, the approximate level would be 1900). Using the USCF formula you get a 1900 (or 1930) USCF rating.

The detailed results can be found here:

[url]file:///H:/Perso/echecs/echiqiuer%20de%20Bures/Machine_vs_Internet_en.xhtml[/url]

I have not translated that page yet, but the Constellation could have achieved a much better rating if its endgame handling were better. I don't own a Super Conny, so I don't know whether it handles the endgame better.

You can find interesting data on other computers there. I will try to find time to translate the page into English (ongoing).

Best regards

Nicolas
Hi Nicolas,

I think if you can somehow anchor a rating list between two points, something like:

Chess Challenger 10 at 1250 and King 2.2 R30 at 2466, then I think you will see all the computers fall roughly into the places where they should be. Even the computers below CC10 will slot in, with the MK around 1000 FIDE or 800 USCF, and the dedicated computers rated higher than the R30 King 2.2 would also fall into place.

Everyone would be happy, and a human could compare himself to a dedicated computer.

The problem is that the rating calculators are not geared up to do this.
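Anchoring a list between two fixed machines, as proposed above, amounts to a linear rescaling between two anchor points. A minimal sketch; the machine names and input ratings below are invented for illustration, not taken from any real list.

```python
def rescale(ratings, old_lo, new_lo, old_hi, new_hi):
    """Linearly remap ratings so that old_lo maps to new_lo and
    old_hi maps to new_hi; everything else lands proportionally between."""
    scale = (new_hi - new_lo) / (old_hi - old_lo)
    return {name: round(new_lo + (r - old_lo) * scale)
            for name, r in ratings.items()}

# Hypothetical list values, anchored at CC10 = 1250 and King 2.2 R30 = 2466:
example = {"Chess Challenger 10": 1150,
           "Super Constellation": 1650,
           "King 2.2 R30": 2350}
print(rescale(example, 1150, 1250, 2350, 2466))
# {'Chess Challenger 10': 1250, 'Super Constellation': 1757, 'King 2.2 R30': 2466}
```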

Best regards
Nick
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

Lexman recently tested RadioShack Master Chess.

RADIO SHACK MASTER CHESS MODEL 60-2217 - 1998 - FRANS MORSCH

Image

Radio Shack Master Chess uses a 16-bit H8 processor at 16 MHz, with 16 KB ROM and probably 1 KB RAM (guessing).

http://www.spacious-mind.com/html/maste ... puter.html

RADIO SHACK MASTER CHESS MODEL 60-2217 TEST RESULT

Image

Radio Shack Master Chess finished the tests with a final score of ELO 2208. I believe that Master Chess is probably the Fritz 2 program; there are a lot of similarities visible in these tests.

Best regards
Nick
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

Here are the test results for Novag Beluga:

NOVAG BELUGA - DAVID KITTINGER - 1990

Image

http://www.spacious-mind.com/html/beluga.html

Novag Beluga has an 8-bit 6301Y with a 16 MHz crystal, running internally at 4 MHz. Beluga has 16 KB ROM and 2 KB RAM.

NOVAG BELUGA - TEST RESULT

Image

Novag Beluga completed the tests with a score of 2026 ELO.

Best regards
Nick
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

When I played my U1400 tournaments a few years ago, I mistakenly thought that level 16 was the level for approx. 30 seconds per move, as the manuals of all the 73-level Ron Nelson chess computers stated:

Levels 1 - 15 = 1 second per playing level
Level 16 - 72 = 2 seconds per playing level
Level 73 = Infinite.

Recently, in my tests, I noticed that level 16 moves far too fast on all these computers, so after some trials I settled on level 25 as the closest match to computers with an average time setting of 30 seconds per move.

Since I had just completed the rating test with Novag Beluga, where it finished at ELO 2026, a little higher than Ron Nelson's E-Chess, which had completed the tests at ELO 1979, I thought I would test Beluga against E-Chess. Beluga played at level 2 (60/30 average time setting) and Radio Shack E-Chess Model 60-2845 played at level 25 with FAST ON, a setting that allows E-Chess to search selectively. Also, Beluga can ponder (think on the opponent's time), whereas E-Chess cannot.

[Event "Computer Test Match"]
[Site "Pelham, Alabama"]
[Date "2016.01.30"]
[Round "1"]
[White "Radio Shack E-Chess 60-2845, LV 25."]
[Black "Novag Beluga, LV 2 60/30."]
[Result "1/2-1/2"]
[ECO "C29"]
[WhiteElo "1365"]
[BlackElo "1723"]
[PlyCount "94"]
[EventDate "2015.01.30"]
[EventType "match"]
[EventRounds "4"]
[EventCountry "USA"]

1. e4 e5 2. Nc3 Nf6 3. f4 d5 4. fxe5 {NOVAG BELUGA OUT OF BOOK} Nxe4 5. Nxe4 dxe4 {RADIO SHACK E-CHESS MODEL 60-2845 OUT OF BOOK} 6. d4 Nc6 7. Be3 Be7 8. Bb5 Bd7 9. Bxc6 Bxc6 10. Ne2 O-O 11. O-O f6 12. e6 f5 13. Nc3 Bg5 14. Bxg5 Qxg5 15. d5 Rad8 16. Qd4 f4 17. e7 Qxe7 18. Rad1 e3 19. Rxf4 Rxf4 20. Qxf4 e2 21. Re1 Qc5+ 22. Qf2 Qxf2+ 23. Kxf2 Bxd5 24. Nxd5 Rxd5 25. c4 Ra5 26. a3 Rc5 27. b3 Ra5 28. a4 Kf7 29. Rxe2 Rf5+ 30. Ke3 Rf1 31. Kd4 Rh1 32. g3 Rf1 33. Re5 b6 34. a5 Rd1+ 35. Ke4 Re1+ 36. Kd5 Rxe5+ 37. Kxe5 Ke7 38. a6 c5 39. g4 Kd7 40. Kd5 h6 41. h4 g6 42. Ke5 Ke7 43. h5 g5 44. Kf5 Kf7 45. Ke5 Ke7 46. Kf5 Kf7 47. Ke5 Ke7
{DRAW BY 3 X REPETITION} 1/2-1/2

FINAL POSITION

[fen]8/p3k3/Pp5p/2p1K1pP/2P3P1/1P6/8/8 w - - 0 48[/fen]

Well, it was a test game in which I expected Beluga to win, but it was played pretty evenly. E-Chess at level 25 is definitely a lot better than its Active rating of ELO 1365 (earned at level 16), and at level 25 it probably competes with the computers at around 1700 ELO on Schachcomputer.Info.

These are the Active games that Beluga has played at Schachcomputer.Info:

178 Novag Beluga : 1723 - 40 (+ 10,= 9,- 21), 36.2 %

Mephisto Europa / Europa A / Marco Polo : 10 (+ 4,= 2,- 4), 50.0 %
Mephisto MM II + HG240 : 10 (+ 2,= 2,- 6), 30.0 %
Mephisto Rebell 5.0 : 20 (+ 4,= 5,- 11), 32.5 %
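A record like Beluga's above can be turned into a performance figure with the common linear approximation: average opponent rating plus 400 * (wins - losses) / games. The 1800 opponent average below is an assumed value for illustration only; the real averages are on the Schachcomputer.Info pages.

```python
def performance_rating(avg_opponent_elo, wins, draws, losses):
    """Linear ELO performance estimate: opponents' average plus
    400 * (wins - losses) / games. Draws only add to the game count."""
    games = wins + draws + losses
    return avg_opponent_elo + 400 * (wins - losses) / games

# Beluga's overall record above: + 10, = 9, - 21 over 40 games (36.2 %)
print(performance_rating(1800, 10, 9, 21))  # 1690.0
```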

Best regards
Nick
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

Recently I posted about differences between USCF ratings and FIDE or European ratings.

I have done some further reading and searching on this topic, and my belief is that the rating calculators in use, such as EloStat, work well at what they do.

However, the problem is that these calculators cannot correctly calculate a FIDE list from top to bottom, because of the limitations of the FIDE scale highlighted by the Chess.com report I showed earlier: FIDE's novice rating starts at 1400 ELO, and nothing below that is acknowledged. Around 90% of the world's players lie at about 1400 ELO, so you know FIDE is not a good yardstick for this area, since many people who have played for years still play at around this level.

As reported in an earlier post, USCF generally considers beginners to play at around 800 ELO. By virtue of this, the rating calculators can only calculate an accurate USCF rating for chess computers, because the relationship between a beginner computer and a beginner human has to be about the same. A USCF rating is the only rating that can be reproduced through a rating calculator.

Another problem is that all rating lists, past and present, are run from Europe and other countries outside the U.S., which means a square peg is being forcibly driven into a round hole. Calculators that by nature really only work on a USCF scale are being adjusted with European ratings to selectively fit what is perceived to be a good rating for selected computers, matching the list-makers' experience. This in turn drives the list down to a level where neither a European player, whose own rating is FIDE, nor a USCF player can compare himself, because the ratings at the bottom of the scale end up too low for everyone.

Therefore, by the nature of how the calculators work, you actually end up with a list based on a USCF formula that is too low for USCF and at the same time has little relevance for a European player either.

I think Kaufman and CCR knew all this, as their continuous references and adjustments seem to confirm.

Image

Notice how everything is always referenced as European Ratings and not FIDE?

It's because you cannot use these calculators to create a FIDE list.

Image

Notice how CCR also has to manually adjust to get a USCF equivalent?

Notice also how they adjusted the SSDF (PLY) tournament results by 180 points, and blitz by 30 points, to give a CCR rating as follows:

Image

If you look at the above from the 1995 CCR list you will see that R30 King 2.2 was adjusted to have a 2488 Tournament rating and 2512 Blitz rating.

This would mean that PLY had R30 King 2.2 rated on a European list as:

Tournament = 2308 ELO
Blitz = 2488 ELO

Obviously CCR did not buy into this kind of discrepancy, and therefore adjusted the blitz rating by only 30 ELO instead of their usual 180 for tournament ratings.

Anyway, since Brian had sent me some early reports of good scores from his testing of Genius 3, I pulled up the above chart to compare, and I noticed that Gideon 3.1 also appears on it. Since Gideon 3.1 is listed both among dedicated computers and among DOS computers, I thought it might be an ideal reference point for some comparisons, using ELO 2488 as a USCF baseline for Gideon 3.1.

With this in mind I asked Micha at Schachcomputer.Info to run the Active List using Gideon 3.1 as a baseline at 2488 ELO. Micha kindly provided the list, and I used it to make some comparisons and adjustments, based on the 60 ELO USCF-to-FIDE difference reported by Brian B in an earlier post and on the Chess.com comparison chart for beginners who held both FIDE and USCF ratings. Here is the rating list that I created with all the adjustments:

Image

On the left, in green, is the list I named Kaufman 95, created with Gideon 3.1 set at ELO 2488 as a USCF value. The next column shows the adjustments as an approximate reference to FIDE strength, using the Chess.com adjustments and the 60 ELO adjustment at the top end. I then compared it to the current Active List and noticed that the difference is a constant 111 points throughout the list. The FIDE difference is also a constant 51 ELO until you get close to the area where the Chess.com adjustments kick in.

When I saw this I thought: wow, the current Schachcomputer Active List would probably only need a 100-point increase to give a really good USCF list from start to finish.
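The constant-difference observation can be checked mechanically by subtracting the two lists machine by machine and seeing whether the gap is flat. The numbers below are made up purely to show the check; the real columns are in the chart above.

```python
def per_machine_offsets(list_a, list_b):
    """Rating difference (a - b) for every machine present in both lists."""
    return {name: list_a[name] - list_b[name]
            for name in list_a if name in list_b}

# Hypothetical columns standing in for the Kaufman 95 and Active lists:
kaufman95 = {"Machine A": 1834, "Machine B": 2061, "Machine C": 2277}
active    = {"Machine A": 1723, "Machine B": 1950, "Machine C": 2166}
diffs = per_machine_offsets(kaufman95, active)
print(diffs)  # every machine differs by the same 111 points
```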

Now I ask you to take a close look at the USCF ratings (or, if you are a FIDE player, at the FIDE comparison) and compare these computer ratings from top to bottom with:

1) What the old experts showed
2) How it matches an official USCF or FIDE scale (especially at the bottom)
3) How certain computers performed against humans in tournaments (e.g. the Atlanta report from an earlier post)
4) What your gut tells you as a player, without being influenced too much by your familiarity with a specific computer.

I would love to hear from you all and see how you feel about the above rating charts. Does Kaufman work for you? Does the Active List plus 100 work for you? Or does the current Active List as it stands work for you?

Best regards
Nick
paulwise3
Senior Member
Posts: 1508
Joined: Tue Jan 06, 2015 10:56 am
Location: Eindhoven, Netherlands

Post by paulwise3 »

spacious_mind wrote: When I played my U1400 tournaments a few years ago, I mistakenly thought that level 16 was the level for approx. 30 seconds per move, as the manuals of all the 73-level Ron Nelson chess computers stated:

Levels 1 - 15 = 1 second per playing level
Level 16 - 72 = 2 seconds per playing level
Level 73 = Infinite.

Recently, in my tests, I noticed that level 16 moves far too fast on all these computers, so after some trials I settled on level 25 as the closest match to computers with an average time setting of 30 seconds per move.
Hi Nick,

When playing some games with the Sabre IV, I had the same idea about level 16. I will have to look it up; I may even have made a remark about it somewhere already...

I just mailed you my test sheets. The last three columns are new; they are for the Mephisto Master Chess, the Novag Emerald, and the Saitek Executive. I saw in your other thread that you already tested the Barracuda; I wonder if the results are the same. And please keep in mind my remarks about the Emerald using about twice as much time as the Executive...

Best regards, Paul
2024 Special thread: viewtopic.php?f=3&t=12741
2024 Special results and standings: https://schaakcomputers.nl/paul_w/Tourn ... 25_06.html
If I am mistaken, it must be caused by a horizon effect...
lexman
Member
Posts: 121
Joined: Mon Oct 12, 2015 11:35 pm

Post by lexman »

It would be interesting to compare the different rating lists in terms of percentiles. What is the FIDE rating for someone at the 90th percentile, i.e. better than 90 percent of the people in that pool? Then do the same for USCF, Germany, the UK, etc.
With regard to dedicated computers, the position versus humans is complex, because one always has the problem of people developing pet lines into which the computer repeatedly falls as the program becomes better known, as has been alluded to in this thread.
One way of looking at this might be to see whether there is a pattern or curve in a program's decrease in strength versus humans, and to allow for it mathematically.
Another way would have been to ensure that games against humans begin from a neutral starting position that is part of neither side's book or repertoire, as in themed tournaments.
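The first suggestion, checking for a pattern in a program's declining results against humans and allowing for it mathematically, can be sketched as a least-squares line over (year, performance) points. The data here are invented purely to show the mechanics, not real tournament figures.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x over (x, y) pairs,
    computed by hand to stay dependency-free."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Invented performance-vs-humans figures for one machine over four years:
data = [(1983, 2018), (1985, 1990), (1987, 1962), (1989, 1934)]
a, b = fit_line(data)
print(b)  # -14.0 ELO per year
```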
spacious_mind
Senior Member
Posts: 4001
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

paulwise3 wrote:
spacious_mind wrote: When I played my U1400 tournaments a few years ago, I mistakenly thought that level 16 was the level for approx. 30 seconds per move, as the manuals of all the 73-level Ron Nelson chess computers stated:

Levels 1 - 15 = 1 second per playing level
Level 16 - 72 = 2 seconds per playing level
Level 73 = Infinite.

Recently, in my tests, I noticed that level 16 moves far too fast on all these computers, so after some trials I settled on level 25 as the closest match to computers with an average time setting of 30 seconds per move.
Hi Nick,

When playing some games with the Sabre IV, I had the same idea about level 16. I will have to look it up; I may even have made a remark about it somewhere already...

I just mailed you my test sheets. The last three columns are new; they are for the Mephisto Master Chess, the Novag Emerald, and the Saitek Executive. I saw in your other thread that you already tested the Barracuda; I wonder if the results are the same. And please keep in mind my remarks about the Emerald using about twice as much time as the Executive...

Best regards, Paul
Hi Paul,
Thanks, I will get them added to the list. If you test the Sabre IV, try to use the same level 25 so that we can compare the individual moves for family relationships.

Best regards
Nick