Our dedicated chess computers in fact 300-350 elo weaker ??
Moderators: Harvey Williamson, Steve B, Watchman
- mclane
- Senior Member
- Posts: 1605
- Joined: Sun Jul 29, 2007 9:04 am
- Location: Luenen, germany, US of europe
- Contact:
Our dedicated chess computers in fact 300-350 elo weaker ??
I was confronted with the idea that our dedicated chess computers
have in fact a 300-350 Elo lower rating than SSDF suggests.
E.g. MM5 1575 Elo.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
- Steve B
- Site Admin
- Posts: 10145
- Joined: Sun Jul 29, 2007 10:02 am
- Location: New York City USofA
- Contact:
Re: Our dedicated chess computers in fact 300-350 elo weaker
mclane wrote:
I was confronted with the idea that our dedicated chess computers
have in fact a 300-350 Elo lower rating than SSDF suggests.
E.g. MM5 1575 Elo.
you are saying that the SSDF ratings for dedicated chess computers are all overstated by 300-350 Elo?
SSDF ratings correlated quite nicely with Selective Search ratings for more than 30 years
they both can't be all wrong
Doubtful Regards
Steve
- mclane
- Senior Member
- Posts: 1605
- Joined: Sun Jul 29, 2007 9:04 am
- Location: Luenen, germany, US of europe
- Contact:
This is not my opinion. I was confronted with this opinion in a discussion about
ratings in the CSS forum.
My opinion is that, thanks to games between dedicated computers and humans, the ratings were always calibrated at that time, while in the later days of computer chess
the ratings are not calibrated anymore.
I remember that the Super Constellation, MM2 and MM4 or 5 were quite often
playing against humans:
Porzer Open, Aegon tournament, in chess clubs, ...
There were the position test ratings (Bratko-Kopec, Bednorz-Tönnissen etc.)
and the rating lists of SSDF and the British rating lists.
I do believe in the 1900 Elo, not in the 300 or 350 Elo weaker numbers.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
One way to get an idea of a machine's Elo is to play it against humans, but we can also test it against positions. There are a lot of position tests, like the Colditz, BT-2450, BT-2630, BS-2830 and others. With those tests you can also get an idea of the strength of the machine. If you run those tests today, you will get the same result as if you run them 5 years from now. So why change the Elo of the machines?
Elo regards
Ricardo
I think you'll find the problem with rating the dedicateds is that only
the early games against the same person have a lot of meaning. Once
the owner figures out how to beat the comp, he can more or less go
ahead and beat at will, on any level, because he has long since found
holes in its knowledge. It seems to me that the ratings are a fair
reflection of the strength of each comp.
Nick made a comment about this a few months ago, when he said that
a given dedicated chess comp would not be so easy to beat if someone
other than yourself made the first several moves for you.
L
- spacious_mind
- Senior Member
- Posts: 4016
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
The problem is that, for example, the 40/40 list and other modern lists no longer reflect reality. They are only worthwhile as a ranking of engines against engines and are totally useless for anything else.
MM5, per what you are used to, is listed at ELO 1982. You could test it by playing it against any of the below listed engines running at your fastest PC speed, and MM5 would end up with a performance of 1500 ELO.
You could try it: play any of the above on an Intel Core i7 against Mephisto MM5 20 times at 40/40, 30 seconds or 2hr/40 and see what the results are. Heck, you could even use that Athlon 64 X2 4600+ (2.4 GHz), which is about 4 times slower than an i7 (about 4x50=200 ELO), and the results would still be the same.
MM5 would be lucky to get one or two draws, never mind a win. So the score would be engine 19 points and MM5 1 point.
Impossible, right? Considering that on that list MM5 has the same ELO rating?
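For anyone who wants to check the arithmetic behind the 19-1 example above, here is a minimal Python sketch. It assumes the standard logistic Elo formula with a 400-point scale; the function names `expected_score` and `performance_rating` are made up for this illustration and are not taken from any rating list's software. Under that assumption, two opponents listed at the same rating should split the points evenly, while a score of 1 out of 20 against a 1982-rated opponent works out to a performance of roughly 1470, which is in line with the 1500 figure quoted.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (0..1) of player A against player B.

    Assumption: standard logistic Elo model with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def performance_rating(opponent_rating: float, points: float, games: int) -> float:
    """Rating at which the expected score equals the score actually achieved.

    Hypothetical helper for illustration; it is undefined for 0% or 100%
    scores because the logistic formula never reaches those values.
    """
    fraction = points / games
    if not 0.0 < fraction < 1.0:
        raise ValueError("performance is undefined for 0% or 100% scores")
    return opponent_rating - 400.0 * math.log10(1.0 / fraction - 1.0)


if __name__ == "__main__":
    # Same listed rating on both sides: the model predicts an even match.
    print(expected_score(1982, 1982))
    # MM5 scoring 1 point out of 20 against an engine listed at 1982.
    print(round(performance_rating(1982, 1, 20)))
```

A 19-1 result between two entries with the same listed rating is therefore far outside what the formula predicts, which is the inconsistency being pointed out.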
Do you remember the days and the reports in CS&S etc. where top Grandmasters played and lost against Genius 6 or Rebel 8 or MChess Pro? And that was with hardware that played on a lowly Pentium 300 or 450. So what chance would these Grandmasters have today against, for example, ProDeo?
Well, after this match the Grandmaster would probably lose a couple of hundred points against ProDeo and be lucky to escape with a final rating of 2400-2500. (About the same loss as what you are seeing with MM5.)
So it doesn't surprise me at all that the engine people have completely lost their sense of reality. To me a good list matches the ability of the programs against humans. It's for this reason that I stopped taking other lists seriously and stick to my own line-in-the-sand list that I created a couple of years ago, taking the data from Schachcomputer.Info and freezing it in time.
Computer rating lists nowadays only interest me if I can match the ELO to some degree with humans. Everything else in my opinion is uninteresting.
http://www.spacious-mind.com/html/ratin ... ments.html
Best regards
Nick
- Steve B
- Site Admin
- Posts: 10145
- Joined: Sun Jul 29, 2007 10:02 am
- Location: New York City USofA
- Contact:
spacious_mind wrote:
The problem lies in that for example the 40/40 List and other modern lists are no longer realistic to reality. They are only worthwhile as a ranking of engines to engines for comparisons and totally useless for anything else.
which is all that interests me
I own many computers and I want to know how they will play against each other ..could care less how they play against humans or against pc engines or against other modified computers
the SSDF and Selective Search Lists remain accurate and very meaningful to this day.. as do the BT test suites ..and other indicia of rating.. all of which correlate well with each other and have done so for 30+ years
spacious_mind wrote:
Computer rating lists nowadays only interest me if I can match the ELO to some degree with humans. Everything else in my opinion is uninteresting.
and they only interest me if they can tell me how they will play against other dedicated computers..only mildly interested in performance VS humans
as mentioned elsewhere Humans can manipulate their play against computers and over time create false results increasing their win percentage
Selective Search on occasion would publish a rating list Vs Humans and I never paid much attention to it
Hallsworth eventually dropped publishing the lists due to lack of submissions
you would change the age old definition of a dedicated chess computer and now you would change their historical ratings
you're doing well Nick
Alt Left Regards
Steve
Last edited by Steve B on Thu Jun 29, 2017 11:32 am, edited 1 time in total.
- spacious_mind
- Senior Member
- Posts: 4016
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
I seem to be the only one, Steve, who has stuck to their historical ratings. SSDF doesn't; it is out of proportion and has been for years. Why do you think they show some old defunct program and hardware at the bottom of the list and the latest, newest program at the top? So how do you compare the old defunct program and hardware to the hardware at the top of the list? You can't; there is no comparison.
Nick
- Steve B
- Site Admin
- Posts: 10145
- Joined: Sun Jul 29, 2007 10:02 am
- Location: New York City USofA
- Contact:
spacious_mind wrote:
I seem to be the only one, Steve, who has stuck to their historical ratings. SSDF doesn't; it is out of proportion and has been for years. Why do you think they show some old defunct program and hardware at the bottom of the list and the latest, newest program at the top? So how do you compare the old defunct program and hardware to the hardware at the top of the list? You can't; there is no comparison.
I don't compare old dedicated computers to modern PC engines
I can't speak to the modern testing methods of the SSDF today
I guess Lars Sandin could do that
not sure he would take an old dedicated computer and play it against a PC engine for a rating
I think he does that with dedicated computers that are PC engine based like the Phoenix computers
I am interested though in the BT test suite performance of some of the modern PC engines you listed with a 1500 rating
my guess is they would score much higher than 1500 which I think is an indication that something is not quite right with their rating
not sure of that though...
any tests like that available?
Drilling Down Regards
Steve
- spacious_mind
- Senior Member
- Posts: 4016
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
There are plenty of tests. You can try my rating test, which would accurately show the results of those tests for all and everything.
The error in modern lists is that they seem to think there is this magical ceiling of 3400 ELO, and God forbid that someone gets past it. That is what is ruining the previously good ratings of old programs. If there is a ceiling, you should be reaching it through diminishing returns, meaning that if Komodo were 9 points better than Stockfish at that ceiling, as it shows today, then perhaps in reality, through diminishing returns, the difference between them is really 1 point, and you add two digits behind the decimal point.
It needs a total reinvention of the rating calculation system, as it doesn't work.
Besides, we humans buy computers, so we have every right to know accurately where we stand on any list.
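To put the rating gaps discussed in this thread into perspective, here is a small Python sketch that converts a rating difference into an expected score. It assumes the standard logistic Elo formula with a 400-point scale, and `score_from_gap` is a hypothetical name chosen for this example only. Under that assumption a 9-point lead means an expected score of about 51.3%, while a 350-point lead means about 88%.

```python
def score_from_gap(gap: float) -> float:
    """Expected score (0..1) of the higher-rated side for a given rating lead.

    Assumption: standard logistic Elo model with a 400-point scale.
    A negative gap gives the expected score of the lower-rated side.
    """
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


if __name__ == "__main__":
    # Gaps mentioned in the thread: 9 points at the top of the list,
    # and the 300-350 points claimed for the dedicated computers.
    for gap in (1, 9, 300, 350):
        print(f"{gap:>3} points -> {score_from_gap(gap):.1%} expected score")
```

The curve flattens quickly at both ends, so a few points near the top of a list correspond to only a fraction of a percent in results.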
Best regards
Nick
- Steve B
- Site Admin
- Posts: 10145
- Joined: Sun Jul 29, 2007 10:02 am
- Location: New York City USofA
- Contact:
spacious_mind wrote:
The errors in modern lists is that they seem to think that there is this magical ceiling of 3400 ELO and god forbid that someone gets past it. That is what is ruining previously good ratings of old programs.
OK so your beef is with the modern rating lists and not the older established ones like Selective Search ( or the old SSDF lists) which NEVER had older computers play against MODERN pc engines
is that correct?
I didn't get that from your first post
if so then that's my bad and I can see your point
Missed Your Point (I think) Regards
Steve
- paulwise3
- Senior Member
- Posts: 1508
- Joined: Tue Jan 06, 2015 10:56 am
- Location: Eindhoven, Netherlands
spacious_mind wrote:
The errors in modern lists is that they seem to think that there is this magical ceiling of 3400 ELO and god forbid that someone gets past it. That is what is ruining previously good ratings of old programs.
You found the solution Nick!
So this means our rating of dedicated machines is ok, and that programs like Komodo and Stockfish are thus rated 350 points too low!!!
Rating the rating system regards,
Paul
2024 Special thread: viewtopic.php?f=3&t=12741
2024 Special results and standings: https://schaakcomputers.nl/paul_w/Tourn ... 25_06.html
If I am mistaken, it must be caused by a horizon effect...
-
- Full Member
- Posts: 679
- Joined: Mon Aug 29, 2016 8:31 pm
- Location: Cheshunt, Hertfordshire, UK
I subscribed to Selective Search magazine for a few years, and following the games in the magazine by dedicateds against rated humans, there is no doubt in my mind that the ratings are not far off. Of course, if you play a good dedicated often enough you will find a weakness in an opening line and could beat it almost at will, but whenever you play another human, whether in club games or at a higher level, that is not how the real scenario works. There are plenty of tests, including giving the computer positions out of the Chess Informant periodical, which I have done, and the R30 for one has done well.
- spacious_mind
- Senior Member
- Posts: 4016
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
Hi Steve,
Yes, I believe that the problem lies in keeping all the programs within a range. Therefore, the stronger the top end gets, the more you lower the ratings at the bottom and create some crazy justification for why that needs to be done, i.e. miraculously the chess player of today is sooooooo much better than the chess player from 10 years ago and he can now beat all the dedicated computers etc., which you know is bs.
The challenge on the top end is that if you use today's formula, then all of a sudden you might see the R30 rated at 2550 or something, and no one is going to like that either.
Therefore the rating calculation that was created years ago is no longer adequate for today's top engines and fast computer speeds.
Best regards
Nick
- spacious_mind
- Senior Member
- Posts: 4016
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
When I mentioned in an earlier post that I seemed to be the only one who has stayed faithful to the old ratings: take a look at this CCR list from 1995, which was shortly before PLY (SSDF) decided to mess around (founders of the messing around) and downgrade chess computers in order, at that time, to make space for the new generation of chess software and WinBoard engines.
Back then there were about 1000 times more dedicated computer players than we have today. But I guess a committee of two can decide that if the shoe doesn't fit, change it.
So show me a list today that comes as faithfully close to what the experts of the past reported as what I showed here:
http://www.spacious-mind.com/html/ratin ... ments.html
Look at the USCF ratings and compare them to the above CCR list and you will get my point.
At the time when I did this a couple of years ago, I was soooo tempted to increase the start base even higher to make the USCF ratings resemble the CCR list even more closely, but I resisted the temptation because the Info list for dedicated chess computers is the best there is today, given the number of games played and collected. So I used Info's list as an accurate start base, even though over the years that list swayed with the wind as well: starting high, then adjusting downwards by 100 ELO to suit SSDF, and then a few years ago adjusting back upwards again by 100 ELO.
Even today Info has this self-inflicted barrier where under no circumstances should a dedicated computer program be listed above 2400 ELO. The only exception is the R40, which no one has except for Steve, so no one really cares that it lies above 2400 today. So probably even Hiarcs 1%, which easily won their tournament, continues to fit below 2400 although overall it trounces every other dedicated chess computer.
So who is to say what really is right when a few can decide for the rest of the world that the number 100 is appropriate for a reduction or an increase? So the moral of the story is, "If the shoe doesn't fit, chop off some toes and then it will fit nicely!"
Best regards
Nick