Dedicated Chess Computer Test Scores

This forum is for general discussions and questions, including Collectors Corner and anything to do with Computer chess.

Moderators: Harvey Williamson, Steve B, Watchman

spacious_mind
Senior Member
Posts: 4018
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

lexman wrote:It would be interesting to compare the different rating lists in terms of percentiles. What is the FIDE rating for someone at the 90th percentile, i.e. better than 90 percent of the people in that pool? Then do the same for USCF, Germany, the UK, etc.
With regard to dedicated computers, the position versus humans is complex, because there is always the problem of people developing pet lines into which the computer always falls as the program becomes better known, as has been alluded to in the thread.
One way of looking at this may be to see whether there is a pattern or curve in the program's decrease in strength versus humans, and to allow for it mathematically.
Another way would have been to ensure that games against humans begin from a neutral start position that is not part of either side's book or repertoire, as in themed tournaments.
Yes I agree. Everything would be nice if the world were to abolish all current ratings and just agree on one rating for everyone. That would be too easy.

The adjustments I made at the bottom of the scale follow a natural progressive scale until USCF and FIDE reach the difference of 60 that Chess.com shows for players who have both ratings. The 60 is reached at ELO 1950 USCF or ELO 1890 FIDE, after which I left it constant. You could argue, per the conversion scale used by USCF, that:

2000 = 60
2050 = 61
2100 = 62
2150 = 63
2200 = 64
2250 = 65
2300 = 66
2350 = 67
2400 = 68
2450 = 69
2500 = 70

This is a 1-point graduation in the difference between USCF and FIDE for every 50 points above ELO 2000, i.e. 2500 = 2570 USCF, which would mean that R40 in the Info Active Plus 100 list would have a FIDE rating of 2435 and the Mephisto Portorose 68020 would have a FIDE rating of 2185.
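
The graduated scale above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up here, the table is read as keyed by FIDE rating (following the "2500 = 2570 USCF" example), ratings between two 50-point steps use the lower step, and the sketch refuses ratings outside the listed 2000 to 2500 range rather than guessing.

```python
def uscf_fide_gap(fide: int) -> int:
    """Points by which USCF exceeds FIDE, per the scale listed in the post.

    Assumptions (not stated in the post): the table is keyed by FIDE
    rating, and a rating between two 50-point steps uses the lower step.
    """
    if not 2000 <= fide <= 2500:
        raise ValueError("the listed scale only covers 2000 to 2500")
    return 60 + (fide - 2000) // 50


def fide_to_uscf(fide: int) -> int:
    """Hypothetical helper: add the gap to get the USCF equivalent."""
    return fide + uscf_fide_gap(fide)


for fide in (2000, 2250, 2500):
    print(fide, "FIDE is roughly", fide_to_uscf(fide), "USCF")
```

Going the other way (USCF to FIDE) needs the inverse lookup, and the result depends on which rating the table is keyed by, so treat the direction chosen here as a guess.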

Regarding humans and their familiarity with their computers, I look at it this way. When you sit at home drinking coffee and playing chess you can become world champion in your own mind. You are relaxed, you can take back moves, and if you mess up you turn off the computer and no one knows it. You keep repeating the same sequence of moves over and over again to win. When you finally beat your computer repeatedly through this unnatural way of playing, which is totally different to what you would get away with against a human, you then treat the computer with contempt because you now feel that you are superior :)

I tend to ignore people's opinions of where a computer should be because of this. I have the same tendencies myself; it is normal.

To be fair to a computer, go to a chess club and let someone else bring in a chess computer that he owns (even if you also own it). Let that person sit across the table from you with a chess clock, and now play a match or several matches. Then let's see what the performances really are when you have the pressure of a clock and no take-backs for your mistakes. Let's see what would really happen then :)

Best regards
Nick
paulwise3
Senior Member
Posts: 1508
Joined: Tue Jan 06, 2015 10:56 am
Location: Eindhoven, Netherlands

Post by paulwise3 »

spacious_mind wrote:If you test the Saber IV, try to use the same level 25 so that we can compare the individual moves for family relationship.
Ok, I will do that :)

Best regards, Paul
2024 Special thread: viewtopic.php?f=3&t=12741
2024 Special results and standings: https://schaakcomputers.nl/paul_w/Tourn ... 25_06.html
If I am mistaken, it must be caused by a horizon effect...

Post by paulwise3 »

Hi Nick,

I started testing the Saber IV at level 25. Looking at its display I noticed that it regularly exceeds 30 secs/move, but hardly ever takes more than 40 secs. When it stayed under 30 secs, it almost always took 25 secs or more. So I am not sure this is the right level, although the extra time may be compensated by opening book moves etc.
In the first game it scored about the same as the Grandmaster, so I started wondering... but in testgame 2 it scored badly, so I stopped wondering ;).
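
One way to check whether a level really averages 30 secs/move is to note the displayed time for each move and summarize the readings. The sketch below is only an illustration: the function name and the sample readings are invented here and are not Paul's measurements.

```python
from statistics import mean


def summarize_move_times(seconds, target=30.0):
    """Summarize thinking times (in seconds) against a target average."""
    return {
        "moves": len(seconds),
        "average": mean(seconds),
        "longest": max(seconds),
        "over_target": sum(1 for s in seconds if s > target),
    }


# Hypothetical readings, invented for illustration only.
sample = [27, 33, 38, 25, 31, 36, 29, 40]
summary = summarize_move_times(sample)
print(summary)
```

Moves played instantly from the opening book would pull the whole-game average down, so it matters whether they are included in the readings.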

Further, I finally found the sheet where I did testgame 3 with the Excalibur Chess Station a few months ago. There it was also obvious that level 16 is far too low for 30 secs/move. I then tested at which level it would play the mating-threat move 21. c4, and that was level 18.

Time Level testing regards
Paul

Post by spacious_mind »

Hi Paul,
When I was testing it with level 25 I had the Novag Beluga next to me, also testing at the same time. I decided on level 25 because overall the time was not any longer than the average time of other dedicated computers. You could go lower as well, but when you have other computers sometimes taking 1:30 to 2 minutes for a move, I think level 25 is quite a fair compromise, when you take opening books into consideration as well.

Best regards
Nick
scandien
Member
Posts: 206
Joined: Mon Sep 12, 2011 1:15 pm

Post by scandien »

hello,
spacious_mind wrote:Regarding humans and their familiarity with their computers, I look at it this way. When you sit at home drinking coffee and playing chess you can become world champion in your own mind. You are relaxed, you can take back moves, and if you mess up you turn off the computer and no one knows it. You keep repeating the same sequence of moves over and over again to win. When you finally beat your computer repeatedly through this unnatural way of playing, which is totally different to what you would get away with against a human, you then treat the computer with contempt because you now feel that you are superior :)

I tend to ignore people's opinions of where a computer should be because of this. I have the same tendencies myself; it is normal.
I have to disagree with you, Nick... Of course, anybody can play against a machine under the conditions you mentioned, but it is not mandatory.

When I play against my machines I use near-tournament conditions:
- the same time control for both opponents (always at least 1 hour per game per player, or 1 hour + 30 sec, or even 40/1 when I have time);
- no take-backs allowed (I am not playing to win at all costs but for real training);
- I do not always use the same move sequence (but I use my own openings, and since my opening book is quite small, it may seem that I am playing similar games);
- but yes, I am playing at home with my coffee, and when I play a game I am in good condition and willing to play!
- when I see that a computer will always play the same move sequence, I avoid playing against it (this was the case for the MEPHISTO B&P) instead of changing my opening!
- I play tournaments (sometimes with small matches) with automatic selection of my opponents (so that I cannot choose them).

In spite of these conditions, the results versus weaker (or stronger) opponents are truncated.
I apply the following formula to my performance:
If my performance is within (opponents' average rating) +/- 60, then my normalized performance is my real performance (no adjustment).
Otherwise, the performance points above (opponents' average rating + 60), or below (opponents' average rating - 60), are divided by 2.

This is to reflect that a pool of machines will be less efficient versus a stronger human, but more efficient versus a weaker human player (as far as result statistics are concerned).

This little formula seems to work, and it implies that a machine will always play at the same level, will be (nearly) unable to win versus a stronger human, and will (nearly always) crush a weaker human opponent.
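
The formula above can be read as "keep anything inside the +/- 60 band, halve whatever sticks out of it". A minimal Python sketch of that reading follows; the function and parameter names are made up for illustration, and the interpretation that only the excess beyond the band is halved is an assumption.

```python
def normalized_performance(performance: float, opponents_avg: float,
                           band: float = 60.0) -> float:
    """Halve the part of a performance that lies outside avg +/- band.

    Assumption: only the excess beyond the band is divided by 2; the
    part inside the band is kept as it is.
    """
    upper = opponents_avg + band
    lower = opponents_avg - band
    if performance > upper:
        return upper + (performance - upper) / 2
    if performance < lower:
        return lower - (lower - performance) / 2
    return performance


# Illustrative values only, with an opponents' average of 1900.
for perf in (1940, 2060, 1700):
    print(perf, "->", normalized_performance(perf, 1900))
```

With an opponents' average of 1900, a performance of 1940 is left alone, while 2060 sits 100 points above the band edge of 1960 and is pulled back by half of that.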



br

Nicolas

Post by paulwise3 »

Here are the results for the Excalibur Saber IV. I also tested the Chess Station with testgames 2 and 3, and it made exactly the same moves, which confirms they are clones. Both played at level 25.

Excalibur Saber IV/Chess Station
   1    2    3    4    5     avg
W 2252 2092 2301 1780 1851 | 2055
B 2168 1258 1611 2028 1839 | 1781
---------------------------+-----
A 2210 1675 1956 1904 1845 | 1918
So the overall score is, surprisingly, slightly better than that of the Krypton Regency.
But other people would have to agree that level 25 is the right level for 30 secs/move before this score becomes relevant for a general active rating list, I guess. I think the Emerald, and maybe also the Beluga, cannot be used as a good comparison. It would be better to take the Executive (or another GK2000 clone), as it really seems to keep to the average of 30 secs/move.
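
For anyone who wants to recompute the averages in the table, here is a small Python sketch. It assumes that W and B are the scores with White and Black, that the A row is the mean of W and B for each testgame, and that the overall figure is the mean of the two colour averages; the thread does not spell out the method, but the figures in the table come out the same this way. The `half_up` helper is introduced here because Python's built-in `round` rounds halves to the nearest even number.

```python
def half_up(x: float) -> int:
    """Round a positive number to the nearest integer, halves upward."""
    return int(x + 0.5)


# Scores copied from the Saber IV/Chess Station table.
white = [2252, 2092, 2301, 1780, 1851]
black = [2168, 1258, 1611, 2028, 1839]

per_game = [half_up((w + b) / 2) for w, b in zip(white, black)]
white_avg = sum(white) / len(white)
black_avg = sum(black) / len(black)
overall = half_up((white_avg + black_avg) / 2)

print("A row:", per_game)
print("W avg:", half_up(white_avg), "B avg:", half_up(black_avg))
print("overall:", overall)
```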

Best regards,
Paul

Post by spacious_mind »

Hi Paul,

It's a hard topic to know what is fair and unfair. I know of lots of computers, specifically those starting at around the ELO 1900 range and below, that don't really keep to their specified times. Emerald and Beluga are just a couple of examples from many, and the problem is not exclusive to Novag. Therefore, since they are all listed with an Active rating, why would you not give the Excaliburs as close as possible the same level setting, especially since it increases their game quality and makes them interesting in U1700/U1800 tournaments?

Comparing the setting to a Morsch is even more unrealistic, as the Morsch programs are 200 ELO higher in the ratings. At that level of play most of the computers were already designed to stick to level settings more accurately.

Level 23 or 24 is probably as low as you could go before you start to become really unfair again when comparing it to its peers. You mentioned level 18 earlier. You do know that with level 18 you struggle to get 20 seconds before it moves most of the time?

Best regards
Nick

Post by spacious_mind »

Hi Nicolas,

I am not doubting your honesty. The point is that it is hard to substantiate games that are played at home. There is no one sitting opposite you to verify the match conditions as you would in a club. And I am being general here and not discounting what you do or some other people do, which may well be perfect.

However you still get used to your opponent's responses through repeated practice of your opening repertoire that allows you to replay moves that you found successful against the computer you are playing. Which in time also increases your own ELO and lowers your computer's ELO, therefore you would still be inflating your rating and reducing the computer's rating.

In club matches you may meet your opponent twice a year and therefore it is a little harder to prepare for him especially as he may have changed his openings and practiced to play against you as well since the last time you played him. The equilibrium remains more equal. Playing a human maybe 10 times over 5 years means that you have not learned everything you need to know of your human opponent whereas your computer you have learned to play against by heart in that period :)

A fun test would be, with closed eyes, try to randomly pick an opening from one of your 5 ECO - Encyclopedia of Chess Openings volumes, open any page and point a finger to an opening. Then set it up and play the computer :) Because that is probably closer to what a human player would face when playing a computer in a club. ;)
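
If you do not have the books to hand, the closed-eyes pick can be imitated by drawing a random ECO code. ECO codes run from A00 to E99 (one letter per volume, A to E). The sketch below is only an illustration and the function name is made up.

```python
import random


def random_eco_code(rng=random) -> str:
    """Draw a random ECO code between A00 and E99."""
    volume = rng.choice("ABCDE")   # one letter per ECO volume
    number = rng.randrange(100)    # 00 to 99
    return f"{volume}{number:02d}"


code = random_eco_code()
print("Set up the opening with ECO code", code)
```

This gives every code the same chance, which is not quite the same as pointing at a random page, since some codes fill far more pages than others.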


Best regards
Nick

Post by scandien »

spacious_mind wrote:However you still get used to your opponent's responses through repeated practice of your opening repertoire that allows you to replay moves that you found successful against the computer you are playing. Which in time also increases your own ELO and lowers your computer's ELO, therefore you would still be inflating your rating and reducing the computer's rating.
Maybe, but I am not so sure... With my pairing system (a tournament cycle over several years), I do not play the same opponent more than once per year (and some opponents have not been encountered for several years). The inflating/deflating process should be reduced in this case.

A second benefit of my method is that I play computers that become stronger over time (as the tournament progresses).

Of course I agree with you: as I play my own openings, I become more and more confident with them.
But it is the same versus humans... I took several years to build my opening book, and now I have few lines, but I know those lines well. Even versus humans (I can play humans from time to time), I play the same openings.
Maybe I should test on FICS versus humans to check whether my chess skill has improved or not.
spacious_mind wrote: In club matches you may meet your opponent twice a year and therefore it is a little harder to prepare for him especially as he may have changed his openings and practiced to play against you as well since the last time you played him.

Isn't it the same thing when you are facing several computers?
Here are the games played versus computers over 6 years:

Constellation 3.6: 2010 -> 1, 2014 -> 4
Excalibur GrandMaster: 2014 -> 1, 2015 -> 1
Excellence: 2015 -> 1
Fritz 1 DosBox: 2009 -> 1
Mephisto B&P: 2011 -> 3
Mephisto Europa: 2009 -> 2, 2011 -> 1
Mephisto Mirage: 2011 -> 3
Mephisto MM II: 2015 -> 1
Mephisto MM IV: 2014 -> 4, 2015 -> 1
Mephisto MM V: 2015 -> 1
Mephisto Polgar: 2015 -> 2
Mephisto Rebell 5: 2011 -> 6, 2015 -> 1
Mephisto RomaII: 2011 -> 1, 2016 -> 3
Mirage 2.07 DosBox: 2009 -> 1
Naum 1.8 Palm: 2009 -> 1
Novag Super VIP: 2011 -> 1
Psion 2.17 DosBox: 2009 -> 1
Saitek CORONA: 2015 -> 1
Sapphire: 2011 -> 1
Sargon 5 DosBox: 2009 -> 2
Scisys Turbo 16K: 2011 -> 4
Siberian 2.15 DosBox: 2009 -> 1
TogaIIv1.1a Palm: 2009 -> 1
Turbo Advanced Trainer: 2009 -> 1, 2010 -> 9, 2011 -> 1
Yeno 416 XL: 2011 -> 5
Zarkov 2.50: 2009 -> 1
Zircon: 2009 -> 1, 2015 -> 1
As you can see, and in my view, I cannot adapt to a specific machine; there is no time to do so, as I vary my opponents as much as possible... I should add that I play only at longer time controls (never action chess).

And you say that I cannot try to prepare for the match, but my opponent can? Why not, but I think (I have played in clubs and tournaments) that at a low level you cannot prepare against your opponents, as you don't have enough data to do so (at master level this is obviously not the same).


spacious_mind wrote: A fun test would be, with closed eyes, try to randomly pick an opening from one of your 5 ECO - Encyclopedia of Chess Openings volumes, open any page and point a finger to an opening. Then set it up and play the computer :) Because that is probably closer to what a human player would face when playing a computer in a club.
Not really. Only a computer can select a random opening.
As you know, most chess players use only a few openings or variations (and even masters tend to play specific or favorite openings).
As I always begin my games with 1.e4, you cannot force me to play a Queen's Gambit with White (or even with Black, as I mostly play the King's Indian). Against each Black answer to 1.e4, I select a variation that I can master. This would be the same whether the opponent is a computer or a human.

I am not used to playing the Queen's Gambit, as I discarded this line. So I am not used to planning in positions arising from this opening, and I cannot get good results with it (versus human opponents I had very bad results in the past).

Best regards

Post by paulwise3 »

spacious_mind wrote: Level 23 or 24 is probably as low as you could go before you start to become really unfair again when comparing it to its peers. You mentioned level 18 earlier. You do know that with level 18 you struggle to get 20 seconds before it moves most of the time?
Hi Nick,

At that time I did not really test the levels for 30 secs/move. It was only too obvious that level 16 was not right. It is only that I think level 25 may be just a little too high.
I agree that it is not fair that machines like the Emerald take so much more time. In normal gameplay it does come to an acceptable move average. So the way the Emerald program works may not be suitable for this kind of test.
On the other hand, I played a small match between the Emerald and the Executive, and the Emerald leads by 3 - 1, with only two draws for the Executive. But that may just prove that the Executive does not like the Emerald's playing style ;-)

But I do agree that we should take some action to re-assess the playing levels for the Saber IV and the like.

Re-assessing regards,
Paul

Post by paulwise3 »

Hi Nick,

Right now I am testing the Krypton/Systema Challenge with all features set to 20. A long while ago I already did this for the Concerto; I will send you both in time. As may be expected, the attacking values are a little better and the defending values a little worse, but on average the Concerto came out about the same as with the standard settings. I just finished the first two testgames for the Challenge, and in both games it scored slightly better than the Concerto, but with the same characteristics. It will be interesting to see the other three tests :-)

Testing regards,
Paul

Post by paulwise3 »

As I recently got an Analyst module for my Galileo, I preferred to test it before finishing the Challenge :-).
It scored quite nicely, thanks to a relatively good score for testgame 4:

Saitek Galileo D+ 6 MHz
       1    2    3    4    5     avg
White 2836 2072 2444 1764 2102
Black 1980 2102 2033 2120 2019
Avg   2408 2087 2239 1942 2061 | 2147

Satisfied regards,
Paul