A new Ratings list
Moderators: Harvey Williamson, Watchman
- Dylan Sharp
- Senior Member
- Posts: 2431
- Joined: Fri Aug 10, 2007 12:07 am
-
- Full Member
- Posts: 708
- Joined: Mon Sep 10, 2007 4:38 am
Dylan Sharp wrote:Are human ratings inaccurate? Because humans also keep learning...
Different because each human is unique and learns on his or her own with the one brain.
The same engine accumulating learning on different computers at different rates, playing other engines with little or no learning (or with learning also accumulated at different rates), means that you are effectively testing several modified versions that differ from one another.
Regards, Graham.
- PortCitySlim
- Senior Member
- Posts: 1045
- Joined: Thu Mar 20, 2008 12:20 pm
- Location: Conservative, America
Harvey Williamson wrote:I think very few customers have a 64-bit OS, and it will stay that way for a long time. Many now have a dual-core system, but very few other than us computer chess geeks are running a 64-bit OS - so for the majority of customers, 32-bit results are more interesting. Surely the main purpose of any ratings list is to show what you can expect on the average hardware of most customers, not the geeks.
I'm new to chess engines and I own a 64-bit OS (Vista Ultimate). I just bought Deep Hiarcs 12 from Chessbase, and I hope I am not disappointed, especially since it was the most expensive GUI+UCI package I found. I finally chose H12 because of its playing style and because Chessbase bundled it with their Fritz 11 GUI. If I had just wanted the very strongest engine out there I would have bought Rybka, but I will wait and learn more about chess engines, and when Rybka 3 comes out I will step up to that level.
-
- Member
- Posts: 185
- Joined: Tue Aug 28, 2007 10:12 pm
- Location: Brighton
You will not be disappointed - Hiarcs is a great, unique engine. While the improvement over the H11 series may not be as great as we hoped, it is an improvement!
My guess is that we will see better results at Blitz; however, I am hopeful that some setting tweaks will be significant at longer time controls.
So far, Sharpen PV on does not appear to be the silver bullet, but more testing is required.
I expect that the Hiarcs testers will be trying longer time controls, and if promising settings are discovered I will probably be the first to test for CCRL…
Shaun
Graham Banks wrote:
Dylan Sharp wrote:Are human ratings inaccurate? Because humans also keep learning...
Different because each human is unique and learns on his or her own with the one brain.
The same engine accumulating learning on different computers at different rates playing other engines with little or no learning or learning also accumulated at different rates means that you're literally testing several modified versions that are unique to each other.
Regards, Graham.
I would seriously doubt that the differences between the modified versions would exceed the margin of error in the rating. All of the versions run the same book and the same learning code, so they are very likely to learn the same things. The biggest variation will be the order in which things are learned, and I would expect those differences to even out over a large number of games and opponents. The key to maintaining such a list, I expect, would be to replay the matches over time to see which engines benefit the most from learning.
Of course it would be really cool if the learning from different computers could be combined, and updated from time to time.
And of course human ratings are inaccurate! Very few have played the thousands of games needed for statistical significance!
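The sample-size point can be made concrete. Here is a rough sketch of the confidence interval around a measured rating difference, assuming the standard logistic Elo model and treating each game as an independent win/loss trial (ignoring draws, which in practice narrows the interval somewhat):

```python
import math

def elo_diff(score):
    """Convert an average score (0 < score < 1) into an Elo difference
    under the standard logistic rating model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(score, n_games, z=1.96):
    """Approximate 95% confidence interval for the Elo difference
    implied by `score` over `n_games`. Simplification: each game is a
    Bernoulli trial, so draws are ignored and the interval is a bit
    wider than the true one."""
    se = math.sqrt(score * (1.0 - score) / n_games)
    return elo_diff(score - z * se), elo_diff(score + z * se)

# The same 55% score, measured over 100 games and over 1000 games:
lo_100, hi_100 = elo_interval(0.55, 100)
lo_1000, hi_1000 = elo_interval(0.55, 1000)
```

With 100 games the interval spans roughly -33 to +106 Elo around a +35 point estimate; with 1000 games it shrinks to roughly +13 to +57. A typical human career contains far fewer rated games than an engine plays on one rating list, which is the point being made above.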
Krazyken wrote:
Graham Banks wrote:
Dylan Sharp wrote:Are human ratings inaccurate? Because humans also keep learning...
Different because each human is unique and learns on his or her own with the one brain.
The same engine accumulating learning on different computers at different rates playing other engines with little or no learning or learning also accumulated at different rates means that you're literally testing several modified versions that are unique to each other.
Regards, Graham.
I would seriously doubt that the differences between each of the modified versions would exceed the margin of error found in the rating. All of the versions are running the same book and the same learning code, thus they are very likely to learn the same things. The biggest variation will be the order in which things are learned, which I would expect to even out over a large number of games and opponents. The key to keeping such a list I expect would be to keep replaying the matches over time to see which engines benefit the most from learning.
Of course it would be really cool if the learning from different computers could be combined, and updated from time to time.
And of course human ratings are inaccurate! Very few have played the thousands of games needed for statistical significance!
The whole position learning thing is very interesting. I'm personally slightly sceptical as to the true benefit the learning files bring. Not being a programmer, I don't really know HOW this parameter works. However, past learning files have been limited in size. For instance, a past Hiarcs version had only 64 KB devoted to the learning file, whereas the engine Shredder had 24 MB devoted to learning. Now my question is: what happens when the file becomes full? Presumably any new information will be written over old information, destroying the old information in the process.
In the later versions of Crafty, position learning was compiled out because Prof. Robert Hyatt considered it not worth the effort. I would like to see some examples where position learning helps, and whether it then helps the engine increase its playing strength.
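For what it's worth, a common way to implement a bounded learning store is a fixed-size hash table indexed by position key, where a new entry simply overwrites whatever occupied its slot. This is a hypothetical sketch of that scheme, not the actual Hiarcs or Shredder file format:

```python
class LearnTable:
    """Hypothetical fixed-size position-learning store. Each slot holds a
    (position_key, score) pair. Collisions overwrite the older entry, so
    once the table is effectively full, storing a new position destroys
    old information - exactly the behaviour speculated about above."""

    def __init__(self, n_slots):
        self.slots = [None] * n_slots

    def store(self, key, score):
        # Always-replace scheme: the previous occupant of this slot,
        # if any, is silently lost.
        self.slots[key % len(self.slots)] = (key, score)

    def probe(self, key):
        entry = self.slots[key % len(self.slots)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None  # never stored, or evicted by a later collision
```

At, say, 16 bytes per entry, a 64 KB file holds only about 4,000 positions while 24 MB holds about 1.5 million, which is why the allotted file size matters so much for how quickly old learning gets overwritten.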
We never stop learning.
Regards,
Graham
- Peter Grayson
- Member
- Posts: 207
- Joined: Sat Aug 04, 2007 3:23 pm
- Location: South Wales, UK
Graham Banks wrote:
Dylan Sharp wrote:Are human ratings inaccurate? Because humans also keep learning...
Different because each human is unique and learns on his or her own with the one brain.
The same engine accumulating learning on different computers at different rates playing other engines with little or no learning or learning also accumulated at different rates means that you're literally testing several modified versions that are unique to each other.
Regards, Graham.
Graham,
Too many comparisons are made between computer chess and human chess, which is misguided because they are fundamentally different. The emotional and psychological states of player and opponent, together with an assessment of the opponent's style, cannot be taken into account in engine games but may have a significant impact on human games. I believe there should be a completely different rating system for chess computers and engines, rather than trying to use the Elo system, which doesn't always work too well for humans either.
As far as engine learning is concerned, it is in the same category as book learning. If the engine author's intention is for an engine to learn positions from a game, that is fine, but for a particular match the best approach is to start with an empty learning file, just as a fresh opening book should be used.
This is a very difficult topic, because even the use of a book permits pre-learnt opening lines that the engine might not otherwise have played. Many of the sharp lines contain ideas beyond anything the engine would likely find during normal play. Conversely, the engine is a good tool for sharpening many lines that contain human errors. So just how do you set the rules, and where do you draw the line on permitted added knowledge?
The use of any opening book gives an engine knowledge gained elsewhere, so even the so-called neutral lines used by the test houses - which probably don't go far enough into any particular line - add knowledge that may favour one engine more than another. Who is qualified to choose these lines? Most of the engines on today's hardware are probably stronger chess players than those who decided which lines should be used!
For analysing positions, using several engines is probably best; e.g. Rybka is good in the middlegame but may miss some tactical shot. I wouldn't rely on Rybka 2.3.2a's analysis in the endgame, particularly if bishops are involved, given its well-known evaluation errors in bad-bishop endings.
So for engine-versus-engine testing, it is probably best to stick with the commercially released package. Analysis may be a different matter: a good analysis engine isn't necessarily a good playing engine, because the judgment criteria are different. The whole engine testing system needs a major review, but I doubt that an all-encompassing system of testing could be achieved that would keep everyone happy!
PeterG
- Andreas Guettinger
- Member
- Posts: 27
- Joined: Tue Jul 31, 2007 5:07 pm
- Location: Bern, Switzerland
Harvey Williamson wrote:I think very few customers have a 64-bit OS, and it will stay that way for a long time. Many now have a dual-core system, but very few other than us computer chess geeks are running a 64-bit OS - so for the majority of customers, 32-bit results are more interesting. Surely the main purpose of any ratings list is to show what you can expect on the average hardware of most customers, not the geeks.
I don't quite agree with that. Outside the Windows world, operating systems are mainly 64-bit, i.e. Linux and Mac OS. Windows is lagging behind, but people will realize this over time.
Thus, many new chess engines are designed to run best on 64-bit hardware. By using bitboards, the programmers invest time and effort to be prepared for the future, even though they take a performance penalty on 32-bit systems.
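The bitboard point is worth unpacking: a bitboard packs the 64 squares of the board into a single 64-bit word, so operations on whole piece sets become single register instructions on a 64-bit CPU, while a 32-bit CPU must split each one across two words. A minimal sketch, with square 0 = a1 and Python's unbounded integers standing in for a 64-bit register:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # keep results within 64 bits

WHITE_PAWNS = 0x000000000000FF00  # all eight white pawns on rank 2

def north_one(bb):
    """Advance every piece on the bitboard one rank: a single
    shift-and-mask, no matter how many pieces are set."""
    return (bb << 8) & MASK64

def popcount(bb):
    """Count the pieces on a bitboard."""
    return bin(bb).count("1")
```

One `north_one` call moves all eight pawns at once; that is the kind of whole-set operation that runs in one instruction on 64-bit hardware and two or more on 32-bit, which is where speedups of the magnitude quoted for Rybka come from.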
The 64-bit version of Rybka is around 1.8 times faster than the 32-bit version, approximately the speed increase of dual CPU vs. single CPU. I have not read what time controls were used, but at short time controls this would be a huge disadvantage for 32-bit Rybka; at long time controls the disadvantage should be less pronounced.
Although I'm very glad that Hiarcs can hold its own against Rybka 32-bit, I still think there is some way to go to reach Rybka 64-bit, especially on 4 CPUs, where the improvements of Hiarcs 12 seem not as pronounced as in the single-CPU version.
regards,
Andy