Computer ratings vs humans

This forum is for general discussions and questions, including Collectors Corner and anything to do with Computer chess.

Moderators: Harvey Williamson, Steve B, Watchman

Brian B
Member
Posts: 74
Joined: Mon Jun 09, 2014 10:37 pm

Computer ratings vs humans

Post by Brian B »

Which computers are better at playing humans than other computers? Which computers are better at playing other computers than humans?

In looking at the calibration results put together by the SSDF, found here: http://privat.bahnhof.se/wb432434/level.htm I thought it was interesting how some computers were a lot better at playing humans than other computers, even after accounting for the 100-point lowering of computer ratings. OTOH, some computers were much better at playing other computers. Maybe I didn't read this right, but here are some examples:

Better vs Humans:

Fidelity Mach III at 2067 vs humans, 1993 vs other computers
Fidelity Mach IV at 2177 vs humans, 2074 vs other computers
CXG Sphinx Galaxy at 1999 vs humans, 1880 vs other computers (using the Modena Elo number)

Better against computers:

Novag Diablo at 1975 vs humans, 2100 vs other computers
Fidelity Excel Club 68000 at 1774 vs humans, 1857 vs other computers
Novag Forte B at 1861 vs humans, 2026 vs other computers

Granted, in many cases the results are based on only a handful of games, so with more games played I am sure some of the data would smooth out. It is a shame we can't get more computer vs human data out there to see which computers play humans better or tougher than other computers, especially for those people who buy computers to actually play them rather than to have them play other computers.

Only Human Regards,
Brian B
Larry
Senior Member
Posts: 2269
Joined: Wed Aug 01, 2007 8:42 am
Location: Gosford, NSW Australia

Post by Larry »

Running my eye down the list I got a surprise at the ratings of both the
Fidelity Excellence and Super Connie.
Excellence -> humans 1578, comps 1760 ... about a 180 Elo difference
Super Connie -> humans 1555, comps 1730 ... a similar difference.
I read somewhere that the ratings of comps versus humans do drop
back over time as the humans figure out how to beat the comps, and
just intuitively go on beating them pretty much the same way. If those
same humans were forced to play various openings against the comps,
the comps would fare better.
L
Mike Watters
Member
Posts: 429
Joined: Fri Sep 26, 2008 12:31 pm
Location: Milton Keynes
Contact:

Post by Mike Watters »

From 1987 to April 2005 Eric Hallsworth published Human v Computer results in Selective Search alongside his Computer v Computer ratings. There were several thousand Human games recorded with some machines having 200+ games.

As far as I know that is the best source of Human v Computer ratings there is, and it always surprised me a little that more use was not made of them. In connection with the Strong Tournament on my website which I am about to start I include the v Human ratings in the tables.

http://www.chesscomputeruk.com/html/strong.html

Generally speaking the v Human ratings are higher, but for some machines the difference is 100+.

Most of the later Fidelity machines show up well, e.g.
Mach 2C 1918/2059
Mach 3/Designer 2265/V2 1985/2107
Mach 4/Designer 2325/V7 2076/2179

whereas most Lang programs (not Amsterdam) have v Human ratings closer to the v Computer ratings.

All the best
Mike
Brian B
Member
Posts: 74
Joined: Mon Jun 09, 2014 10:37 pm

Post by Brian B »

Larry wrote: I read somewhere that the ratings of comps versus humans do drop
back over time as the humans figure out how to beat the comps, and
just intuitively go on beating them pretty much the same way. If those
same humans were forced to play various openings against the comps
the comps would fare better.
L
I agree. Sometimes the same opening variations seem to work against different computers, and I tend to avoid other opening variations where I get crushed. Recently I found that playing 3.f3 against the Caro-Kann seemed to work well, with wins against the Par Excellence and Diamond, but not as well against the Saitek D+. I think the same could be said of playing the same player over and over again at the club: avoid their strengths and play whatever works.

Tartakower regards,
Brian B
Brian B
Member
Posts: 74
Joined: Mon Jun 09, 2014 10:37 pm

Post by Brian B »

Mike Watters wrote:From 1987 to April 2005 Eric Hallsworth published Human v Computer results in Selective Search alongside his Computer v Computer ratings. There were several thousand Human games recorded with some machines having 200+ games.

As far as I know that is the best source of Human v Computer ratings there is, and it always surprised me a little that more use was not made of them. In connection with the Strong Tournament on my website which I am about to start I include the v Human ratings in the tables.

http://www.chesscomputeruk.com/html/strong.html

Generally speaking the v Human ratings are higher, but for some machines the difference is 100+.

Most of the later Fidelity machines show up well, e.g.
Mach 2C 1918/2059
Mach 3/Designer 2265/V2 1985/2107
Mach 4/Designer 2325/V7 2076/2179

whereas most Lang programs (not Amsterdam) have v Human ratings closer to the v Computer ratings.

All the best
Mike
Thanks for posting this information. Your website is an excellent resource! Looking forward to your tournament, should be interesting to see how it turns out.

I wonder why the later Fidelity units are so much better against Human opponents?

Thanks,
Brian B
spacious_mind
Senior Member
Posts: 3999
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama
Contact:

Post by spacious_mind »

Mike Watters wrote:From 1987 to April 2005 Eric Hallsworth published Human v Computer results in Selective Search alongside his Computer v Computer ratings. There were several thousand Human games recorded with some machines having 200+ games.

As far as I know that is the best source of Human v Computer ratings there is, and it always surprised me a little that more use was not made of them. In connection with the Strong Tournament on my website which I am about to start I include the v Human ratings in the tables.

http://www.chesscomputeruk.com/html/strong.html

Generally speaking the v Human ratings are higher, but for some machines the difference is 100+.

Most of the later Fidelity machines show up well, e.g.
Mach 2C 1918/2059
Mach 3/Designer 2265/V2 1985/2107
Mach 4/Designer 2325/V7 2076/2179

whereas most Lang programs (not Amsterdam) have v Human ratings closer to the v Computer ratings.

All the best
Mike
Hi Mike,

Excellent, I am really excited to see you doing such a class tournament.

Best regards,

Nick
Nick
IvenGO
Member
Posts: 298
Joined: Tue Oct 18, 2011 5:37 am
Location: Moscow, Russia

Post by IvenGO »

It also depends a lot on playing style: I'm at least equal against the Mephisto Atlanta, but the Novag Star Diamond makes me lose a lot, with very few wins or draws; I think I score about 15% against it, never mind that the SD is rated ~100 points lower than the MA...
scandien
Member
Posts: 206
Joined: Mon Sep 12, 2011 1:15 pm
Contact:

Dedicated versus internet

Post by scandien »

I have performed some tests on the Internet to evaluate the real level of dedicated chess computers:


Each machine will run 16 games against humans on the FICS server.
With the results and performance ratings, I will first be able to check whether the various rating lists are relevant or not.
Analysis of the games will let me check whether each machine's performance is plausible or not.
I will then be able to assign a level to each machine and compare those levels with the levels found during the 80's.


Process
All games are run on FICS. FICS ratings can easily be compared with FIDE ratings (nearly the same below 2000). Compare the two rating systems here.
Games are played at 30 minutes per game (I select the nearest machine level). Sides are selected randomly by FICS, and opponents are matched as closely in rating as possible.

Before the test I already had some expectations for each machine. The AktiveSchach rating list is based on performances from the 80's, and if we take into account an Elo inflation of about 120 points, we can probably anticipate each machine's level today.
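For anyone who wants to reproduce the "FICS PERFORMANCE" figures below, a performance rating over a small sample can be computed the usual way: find the rating at which the logistic Elo model's expected total score equals the actual score. A minimal sketch; the opponent ratings and scores in the usage note are made-up illustrations, not the actual test data:

```python
def expected_score(rating, opponent):
    # logistic Elo expected score for `rating` against `opponent`
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))

def performance_rating(opponent_ratings, total_score):
    # binary-search for the rating whose expected total equals the actual score
    lo, hi = 0.0, 4000.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, r) for r in opponent_ratings) < total_score:
            lo = mid
        else:
            hi = mid
    return round((lo + hi) / 2.0)
```

For example, scoring 8/16 against opponents all rated 1800 returns a performance of 1800, while 10/16 against the same field comes out near 1889 (a +89 Elo gap corresponds to a 62.5% score in the logistic model).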


The Results :

MEPHISTO MIRAGE (MEPHISTO II program on better hardware)
AKTIVESCHACH RATING: 1556
FICS PERFORMANCE: 1795
Comments: The MIRAGE plays in a human-like way. While its opening book is weak, it has good general knowledge and can handle strong opponents. In spite of its age, the MEPHISTO MIRAGE is well suited to club players (even strong club players).

MEPHISTO MM II:
AKTIVESCHACH RATING: 1782
FICS PERFORMANCE: 2021
Comments: A machine containing the Conchess program. Very strong tactically, the MM II handles the opening quite well and is not so weak in the endgame. Its very good performance is due to the fact that it outclasses any opponent below 2000! Its main weakness is closed positions, where the machine has no idea which move to make.
A good machine for strong club players.

NOVAG SUPER VIP:
AKTIVESCHACH RATING: 1770
FICS PERFORMANCE: 1808
Comments: The Super VIP plays positional chess, even if it can enter dubious variations, probably due to tactical problems. Generally the positions reached by the VIP are good and free of weaknesses.

RadioShack Chess Champion 2150L
AKTIVESCHACH RATING: 1777
FICS PERFORMANCE: 1989
Comments: The CC2150L is very aggressive, and any player should watch for weak points in his position, otherwise the CC2150L will exploit them easily. But if the player takes care to limit his weaknesses, he will have no problems!
This machine is made for strong club players!

SCISYS TURBO 16K
AKTIVESCHACH RATING: 1617
FICS PERFORMANCE: 1762
Comments: The SciSys Turbo 16K has a good style of play and generally reaches good positions. But it has difficulty evaluating king safety and the effect of pins! So against good players it has no real chance to win. The Turbo 16K is a good opponent for a club player.

MEPHISTO EUROPA
AKTIVESCHACH RATING: 1717
FICS PERFORMANCE: 1809
Comments: The MEPHISTO EUROPA has a really solid style of play, but has some tactical weaknesses. Only strong players can take advantage of them, however. The Europa's level is nearly that of a strong club player!

Constellation 3.6
AKTIVESCHACH RATING : 1771
FICS PERFORMANCE: 1840
Comments :

EXCELLENCE
AKTIVESCHACH RATING: 1780
FICS PERFORMANCE: 1901
Comments: The Excellence has a good tactical and aggressive style of play. It can defeat strong players and is well suited to good club players.



MEPHISTO REBELL 5.0
AKTIVESCHACH RATING: 1843
FICS PERFORMANCE: 1928
Comments: In spite of weaknesses in the endgame and material greed in the opening (it sometimes loses a game due to this greed), the REBELL is a strong opponent and can face strong club players.



KRYPTON REGENCY
AKTIVESCHACH RATING: 1780
FICS PERFORMANCE: 1874
Comments: A very surprising style of play; the Regency can find good solutions in very closed positions! The Regency has some tactical weaknesses, which is why it can lose some games. Its average level is near that of a strong club player!

BR

Nicolas
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA
Contact:

Post by Steve B »

scandien wrote: Each machine will run 16 games against humans on the FICS server,

Interesting approach
can I ask you...
how did you verify you were really playing against a human?
I play a lot of fast games on Playchess..I used to play a lot on FICS and ICC as well
my guess is that about 2/3 to 3/4 of the guests/members I play are cheating with a computer..even if they are playing as a guest..even in unrated games


Humanoid Regards
Steve
Monsieur Plastique
Senior Member
Posts: 1014
Joined: Thu Jul 03, 2008 9:53 am
Location: On top of a hill in eastern Australia

Post by Monsieur Plastique »

Larry wrote: I read somewhere that the ratings of comps versus humans does drop back over time as the humans figure out how to beat the comps
For my own part, a more philosophical question is whether, for example, a strong amateur player (say FIDE 1950 and higher) is the same strength today as they were, say 35 years ago.

The "official" rating system of course only works on the relative strength of humans against other humans and not absolute strength per se. My feeling (unfortunately not substantiated in any way) is that humans today are significantly stronger than similarly rated humans in years gone by when the influences of computer strength upon human play was not a significant factor - at least for club level players and higher (obviously not novices).

Obviously a dedicated computer has precisely the same playing strength today as it did when it first came off the production line, unless it has been modified in some way. I don't necessarily buy the argument that humans have "learned" to beat them - at least in the last two decades at any rate - and that this is totally responsible for a ratings drop. Certainly that is some of the reason, and I myself admit to exploiting computer weaknesses to win (in fact just in the game I played last week, where I left a pawn en prise knowing that if the computer took it, the resulting combination was outside its selective search horizon). But I don't know whether humans crushing computers is that statistically significant - personally I lose my fair share of games even though I am fully aware of the weaknesses of computers.

Anyway, I feel that if I went back and played machines I have not played for decades (and thus might have forgotten their precise weaknesses), I might fare better, but only because having played computers and engines for so long, I am simply a better player. But my rating won't be higher, because other humans have done precisely the same thing and improved accordingly.
Chess is like painting the Mona Lisa whilst walking through a minefield.
xchessg
Member
Posts: 173
Joined: Thu Jul 28, 2011 7:09 pm

Post by xchessg »

Monsieur Plastique wrote:
For my own part, a more philosophical question is whether, for example, a strong amateur player (say FIDE 1950 and higher) is the same strength today as they were, say 35 years ago.
Nothing philosophical about it. There has been Elo inflation. A 1950 player in the '90s is to be compared with about 2020-something today.

I don't know about the positive effects of computer chess on human play, though. For sure, opening play especially has become way more dynamic today with all that computer-assisted opening (and middlegame...) analysis. Searching through large reference databases has also helped a lot. I played competitive chess from the 80's on. Opponents in those days played the opening like patzers, but they came up with a lot of ideas during the game. Nowadays opponents play the opening like a god, but the follow-up isn't always that pretty to watch...

Anyway, concerning Elo ratings: one needs about 600+ games to land on a rating that is worth noting in a list. Even the actual computer-computer rating lists contain "ratings" :roll: based on way too few games/opponents....

I have to say, it is often appalling to see just how hung up some people are on an Elo rating. After all, it only serves to divide a large player field into categories for organisational purposes, in the case of competitions limited to a number of rounds. To me, a difference in Elo strength of, say, +-100 points is meaningless (I'm 2098). I know club players who refuse to play rated games because they fear losing some points... The system wasn't designed with this in mind, I dare say.

Down with ELO!

Xavier
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA
Contact:

Post by Steve B »

xchessg wrote:
Anyway, concerning Elo ratings: one needs about 600+ games to land on a rating that is worth noting in a list. Even the actual computer-computer rating lists contain "ratings" :roll: based on way too few games/opponents....
Well 600 games seems a bit harsh to me
to this day the SSDF rating lists are among the most highly respected rating lists in the world (not to mention the longest continuously published list)..many of their ratings are based upon fewer than 600 games
if you review their lists you can see that, generally speaking...
600 games will produce a standard rating deviation of about +-30 Elo
200 games will produce a standard rating deviation of about +-50 Elo

for me...knowing that a dedicated chess computer has a rating that is accurate within 50 Elo points is more than sufficient
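Those two rules of thumb line up with the standard error model for Elo: with a roughly even score, the per-game score deviation is 0.5, and dividing by the slope of the logistic expected-score curve at equality converts that into rating points. A quick sketch, assuming a 95% confidence band (z = 1.96):

```python
import math

def rating_error(n_games, score=0.5, z=1.96):
    # standard deviation of a single game's score (largest at a 50% score)
    sd_game = math.sqrt(score * (1.0 - score))
    # slope of the logistic expected-score curve, in score points per Elo
    slope = math.log(10) / 400.0 * score * (1.0 - score)
    # confidence half-width of the rating estimate, in Elo points
    return z * sd_game / math.sqrt(n_games) / slope
```

This gives roughly +-28 Elo for 600 games and +-48 Elo for 200 games, close to the SSDF figures quoted above.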



50 Something Regards
Steve
Monsieur Plastique
Senior Member
Posts: 1014
Joined: Thu Jul 03, 2008 9:53 am
Location: On top of a hill in eastern Australia

Post by Monsieur Plastique »

Steve B wrote:Well 600 games seems a bit harsh to me
Agreed Steve. It's been the case with my own and Cameron's extensive games over the years that you don't even need anything remotely like that to achieve a reliable and stable rating, especially with computers. That said, it is of course nice to have a massive database because the theoretical deviations are much smaller.

If I look at rating averages taken from our own private lists and the SSDF, I typically find that even in ad-hoc 10 or 12 game encounters, the outcome statistically matches the rating differences except for those cases where machines have specific difficulties against others.

And in that very long-winded rating test I did for the Nintendo DS Fritz, where I played 100 games at 40/2, the difference between the rating it earned after 40 games and the rating it earned after the full 100 games was a mere 2 points.

I agree with the SSDF philosophy where a rating is considered publishable after 100 games so long as the deviations are noted (and they are). For me, based on experience what is much more critical is that there is an adequate variety of opponents in and around the vicinity of the test subject's estimated rating (which should be determined by preliminary testing). This reduces the exaggerated effect of computer rating differences and allows for the effects of "difficult" and "easy" opponents that in absolute terms are actually similar in strength.

Of course, none of this is new, and Larry Kaufman mentioned the importance of this approach in the early days of the Computer Chess Reports.
Chess is like painting the Mona Lisa whilst walking through a minefield.
ricard60
Senior Member
Posts: 1285
Joined: Thu Aug 09, 2007 2:46 pm
Location: Puerto Ordaz

Post by ricard60 »

The only two dedicated chess machines that were ever rated by an official federation were the Fidelity Mach 3 (2265) and Mach 4 (2325). They competed under official USCF tournament conditions against rated humans, and those are the Elos they earned. It would be very interesting if, 20 years later, they were allowed to compete again in USCF tournaments under the same conditions; there we would see whether their ratings against humans have changed. Of course the machines will always play at the same strength, so the test would show whether the humans of today play at the same Elo as the humans of 20 years ago.

Elo regards
Ricardo
scandien
Member
Posts: 206
Joined: Mon Sep 12, 2011 1:15 pm
Contact:

Post by scandien »

Steve B

Interesting approach
can I ask you...
how did you verify you were really playing against a human?
I can't be sure that I played only humans, of course! But I think that if a player had been using a chess program, the older machines would have been really crushed... so I am quite confident about this.
Monsieur Plastique & Larry:

Larry wrote: I read somewhere that the ratings of comps versus humans do drop back over time as the humans figure out how to beat the comps

Monsieur Plastique wrote: For my own part, a more philosophical question is whether, for example, a strong amateur player (say FIDE 1950 and higher) is the same strength today as they were, say 35 years ago.

First, during my tests my opponents were not aware that they were facing old dedicated chess computers. This is the only way to check the real level of a machine (otherwise humans will use special anti-computer maneuvers!). I know this is not very honest, but it was done only to test my machines, and I defeated every machine in a match (the goal was not to win Elo rating during the test).

Second, the machines perform at a level 100-120 points higher than the AktiveSchach rating list (which is based/tuned on man vs machine matches during the 80's). If we check Rod Edwards' page, we can see that each machine's performance is consistent with its old level once we add 100-120 points to the old rating. I suppose this is one way to demonstrate Elo inflation! The machines are playing exactly as they did in the 80's, but can now defeat 'better' players (or at least players with better ratings)!

This page discusses Elo inflation (with a possible explanation):
http://members.shaw.ca/redwards1/


One last observation:
Except for two machines (MEPHISTO MIRAGE & KRYPTON REGENCY), the machines play at a fixed level. Players 50-100 points below a machine are crushed by it (nearly no human victories), and players 50-100 points above it win nearly all their games. This behavior is not consistent with human ratings, but it shows that the machines are not limited by playing conditions, stress or illness... always the same level.
Strangely, the MEPHISTO MIRAGE & KRYPTON REGENCY seem to behave like humans!
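That win/loss cliff really is un-Elo-like: under the standard logistic model, a 100-point gap should still leave the weaker side scoring about 36%, not "nearly no victories". A quick check of what the model predicts for the stronger side:

```python
def expected_score(diff):
    # logistic Elo expected score for the stronger side of a `diff`-point gap
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

# expected scores for 50-, 100- and 200-point gaps under the logistic model
gaps = {d: round(expected_score(d), 2) for d in (50, 100, 200)}
```

The model predicts about 57%, 64% and 76% for 50-, 100- and 200-point gaps, so a machine that crushes everyone 100 points below it is far more deterministic than human-vs-human results would be.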

Best Regards

Nicolas