Your Country Needs You!


Post by spacious_mind »

I have made a slight adjustment to how the tests are rated in order to allow engines to be tested as well, and hopefully to get close to the ratings shown here at CCRL 40/40:

http://computerchess.org.uk/ccrl/4040/

This link is also interesting:

http://www.chess.com/article/view/chess ... ons?page=3

I have been trying to find a comparison between USCF ratings and FIDE ratings. The above is about the only thing that I have found. Interestingly, in the table at chess.com, FIDE ratings are higher than the USCF ratings up to about Elo 1800, after which USCF ratings become higher than FIDE ratings.

Because I have adjusted the ratings to allow testing of all programs as listed in CCRL 40/40, below are the links to download the revised spreadsheets for test games 1 and 2:

Test Game 1 Download

http://spacious-mind.com/forum_reports/ ... ellvi.xlsx

Test Game 2 Download

http://spacious-mind.com/forum_reports/ ... _turk.xlsx

I have also created two new tests. Test game 3 is probably the first Immortal game ever played (move aside, Adolf Anderssen!). The game was played between Thomas Bowdler and Henry Seymour Conway in London, England, in 1788, a good 63 years before Adolf Anderssen played his Immortal Game against Lionel Kieseritzky.

Thomas Bowdler, a physician by profession, is best known for publishing "The Family Shakespeare", which censored William Shakespeare's works to make them more suitable for reading by women and children. Well, that was the intent. I guess women were considered as innocent as children in the 18th century. This achievement also gave the English dictionary a new word, "bowdlerize", meaning:

1: to expurgate (as a book) by omitting or modifying parts considered vulgar

2: to modify by abridging, simplifying, or distorting in style or content

Therefore Thomas Bowdler was not only a good chess player in his day, but also the inventor of censorship!!!

Field Marshal Henry Seymour Conway, it seems, was a British general and statesman, brother of the 1st Marquess of Hertford, and cousin of Horace Walpole, the Earl of Orford and son of Prime Minister Sir Robert Walpole.

Well, it seems that back then chess was very much a pastime for noblemen, who all had other jobs and positions.

1. e4 e5 2. Bc4 Bc5 3. d3 (START)

[fen]rnbqk1nr/pppp1ppp/8/2b1p3/2B1P3/3P4/PPP2PPP/RNBQK1NR b KQkq - 0 3[/fen]

Test Game 3 Download

http://spacious-mind.com/forum_reports/ ... onway.xlsx

You cannot really post anything related to 18th-century chess without also including something from François-André Danican Philidor. Philidor has to be the first Super Grandmaster in history! He played most of his games either blindfold or at odds of a pawn or a piece, plus first move for his opponent. Philidor was also a famous composer in his day, producing over 20 operas.

In the game below he plays blindfolded against none other than Thomas Bowdler. The game was drawn. It is interesting because it features a closed position, which tends to be really difficult for dedicated computers.

1. e4 c5 2. Bc4 e6 3. Qe2 Nc6 4. c3 a6 5. a4 b6 6. f4 d6 7. Nf3 Nge7 8. Ba2 g6
9. d3 Bg7 10. Be3 d5 11. Nbd2 O-O 12. O-O f5 (START)

[fen]rnbqk1nr/pppp1ppp/8/2b1p3/2B1P3/3P4/PPP2PPP/RNBQK1NR w KQkq - 0 3[/fen]

Test Game 4 Download

http://spacious-mind.com/forum_reports/ ... lidor.xlsx

Finally I have tested 4 computers with the 4 tests:

Image

For these 4 tests I have left the actual games in the Test spreadsheets for you to look at.

The above new ratings should allow you to accurately compare the real strength difference between all tested programs and engines.

P.S. I forgot to mention that all you humans can play these tests as well. The spreadsheets also have tabs specifically for human players to try out their skill.

Best regards
Nick

Post by spacious_mind »

Oops, I just noticed that I did not show the Test Game 4 start position.

Here it is:

Test Game 4

"1. e4 c5 2. Bc4 e6 3. Qe2 Nc6 4. c3 a6 5. a4 b6 6. f4 d6 7. Nf3 Nge7 8. Ba2 g6
9. d3 Bg7 10. Be3 d5 11. Nbd2 O-O 12. O-O f5 (START)"

[fen]r1bq1rk1/4n1bp/ppn1p1p1/2pp1p2/P3PP2/2PPBN2/BP1NQ1PP/R4RK1 w - - 0 13[/fen]

The test starts from the above position.

You can download the spreadsheet from the link in the previous post.

Thanks and regards,
Nick

Post by spacious_mind »

Here is the 5th and last test from the pre-19th century.

This game was played between Captain Smith and Philidor, in London in 1790. Philidor played blindfold.

TEST GAME 5 - SMITH-PHILIDOR

[Event "London blindfold"]
[Site "London"]
[Date "1790.03.13"]
[Round "?"]
[White "Smith"]
[Black "Philidor, Francois Andre Dani"]
[Result "0-1"]
[ECO "C24"]
[PlyCount "66"]
[EventDate "1790.??.??"]
[EventType "game"]
[EventRounds "3"]
[EventCountry "ENG"]
[Source "ChessBase"]
[SourceDate "2002.11.25"]

1. e4 e5 2. Bc4 Nf6 3. d3 c6 4. Bg5 h6 5. Bxf6 Qxf6 6. Nc3 b5

[fen]rnb1kb1r/p2p1pp1/2p2q1p/1p2p3/2B1P3/2NP4/PPP2PPP/R2QK1NR w KQkq - 0 7[/fen]

The test ends after 29 moves, but below are the complete moves:

7. Bb3 a5 8. a3
Bc5 9. Nf3 d6 10. Qd2 Be6 11. Bxe6 fxe6 12. O-O g5 13. h3 Nd7 14. Nh2 h5 15. g3
Ke7 16. Kg2 d5 17. f3 Nf8 18. Ne2 Ng6 19. c3 Rag8 20. d4 Bb6 21. dxe5 Qxe5 22.
Nd4 Kd7 23. Rae1 h4 24. Qf2 Bc7 25. Ne2 hxg3 26. Qxg3 Qxg3+ 27. Nxg3 Nf4+ 28.
Kh1 Rxh3 29. Rg1 Rxh2+ 30. Kxh2 Rh8+ 31. Nh5 Rxh5+ 32. Kg3 Nh3+ 33. Kg4 Rh4#


DOWNLOAD TEST GAME 5

http://spacious-mind.com/forum_reports/ ... lidor.xlsx

I have tested a few computers on test game 5, as shown below, and the results are included in the Test Game 5 spreadsheet attached above.

Image

DOWNLOAD SCORE CALCULATOR

http://spacious-mind.com/forum_reports/ ... uters.xlsx

This spreadsheet download now incorporates all 5 games into a final test rating score, as shown below for the computers that have played all 5 tests so far.

Image

The RATING column shows the final rating for the pre-19th-century tests.

Best regards
Nick

Post by paulwise3 »

Hi Nick,
As you suggested, I downloaded the spreadsheet for the first test game from http://hiarcs.net/forums/viewtopic.php?t=6835&start=60 and entered my moves. I did the test on 6-1-2015, but did not take the time to post them. With this new version I can see my total score, but not the separate white/black score. Since I have always thought my defensive qualities are better than my offensive ones, I am quite curious! As for my rating: I used to be a good club player, but haven't played really serious games for more than 30 years. But I guess this (1878) is about the upper limit of the rating I should have.
I am not sure how to insert an image, so here are my moves:
WHITE BLACK
MOVE PLAYED MOVE PLAYED
4. ... Nf6
5.Nf3 5. ... Bg4
6.h3 6. ... Bh5
7.Qxf3 7. ... c6
8.Qxb7 8. ... Nbd7
9.Nb5 9. ... Bd6
10.d4 10. ... Rb8
11.Nxc8 11. ... Nxc8
12.Bb5+ 12. ... Nd6
13.Bb5+ 13. ... Nd7
14.Qxb5+ 14. ... Nd7
15.Bf4 15. ... exd5
16.Qxd5 16. ... c5
17.0-0-0 17. ... 0-0
18.Rxd5 18. ... Ke7
19.Bf4 19. ... Qxg2
20.Qxd7+ 20. ... Kf8
21.Qd8+ CHECKMATE

Looking forward to my white/black scores!

Regards, Paul

Post by spacious_mind »

paulwise3 wrote:Hi Nick,
With this new version I can see my total score, but not the separate white/black score.
Looking forward to my white/black scores!

Regards, Paul
Hi Paul

With these tests there are several tabs. Are you not seeing them?

Image

The picture above is from the tab called COMPUTER TEST. Here you type in your computer's information, and this is where you play out the computer's moves.

Image

The information that you put into the COMPUTER TEST tab automatically carries over into Column A of the next tab, called COMPUTER SCORE - 30 SECONDS. When I have finished the COMPUTER TEST game, I come to this tab and copy Column A into a free column: highlight Column A, do "COPY", then "PASTE FORMAT" into a new free column, and follow this with "PASTE VALUE". This transfers the information into the new column without copying the formulas that are in Column A; you don't want the formulas copied to another column, because Column A is continuously overwritten when you add new computers to the test. In the example above you can see that I did this for the CXG LEGEND, which is shown in both Column A and Column F.
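
If you would rather automate that copy step than do it by hand, here is a minimal Python sketch using openpyxl. It only illustrates the idea; the file name and target column are placeholders, so adjust them to your own copy of the spreadsheet (the tab name is the one described above).

[code]
# Minimal sketch: copy the VALUES (not the formulas) of Column A in the
# "COMPUTER SCORE - 30 SECONDS" tab into a free column, mimicking the manual
# COPY + PASTE FORMAT + PASTE VALUE procedure described above.
# File name and target column are placeholders - adjust to your copy.
from copy import copy
from openpyxl import load_workbook

SRC_FILE   = "test_game_spreadsheet.xlsx"      # placeholder file name
SHEET      = "COMPUTER SCORE - 30 SECONDS"
TARGET_COL = "F"                               # first free column in your sheet

# Load twice: once with cached values (no formulas), once normally so the
# saved file keeps the formulas in Column A for the next computer.
wb_vals = load_workbook(SRC_FILE, data_only=True)
wb_out  = load_workbook(SRC_FILE)
ws_vals = wb_vals[SHEET]
ws_out  = wb_out[SHEET]

for row in range(1, ws_vals.max_row + 1):
    src = ws_out.cell(row=row, column=1)                  # Column A (formatting)
    dst = ws_out[f"{TARGET_COL}{row}"]
    dst.value = ws_vals.cell(row=row, column=1).value     # "paste value"
    dst.font = copy(src.font)                             # "paste format"
    dst.fill = copy(src.fill)
    dst.number_format = src.number_format

wb_out.save(SRC_FILE)
[/code]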

Also, in the tab named COMPUTER SCORE - 30 SECONDS, you can see the scores for White and Black and the total score, as I am showing in the picture below:

Image

At the top of the picture you see WHITE SCORE and at the bottom of the picture you see BLACK SCORE and TOTAL SCORE. So you do have both scores showing here.

Also, to post a result to the forum, all you have to do is highlight the column that you want to copy, "COPY" it in the spreadsheet, and then "PASTE" it into your forum post; you will get what I am showing below:

COMPUTER TEST RESULT
CXG
"LEGEND - 1992
CONCERTO - 1992"
GYULA HORVATH
LEVEL 53 STYLE 5 (30 SECONDS) AVERAGE TIME
H8 - 8 BIT - 10 MHZ - 32 KB ROM 1 KB RAM
WHITE
4.Qh5
5.Nc3
6.Nf3
7.c3
8.Bxf7+
9.Bb3
10.Kd2
11.c3
12.Nxc3
13.Qg4+
14.Qxg7
15.Bxg8
16.Bxd6+
17.Bxd6+
18.Bd5+
19.Bxa8
20.Bxa8
21.a4+
22.Qb3+
WHITE SCORE:
2419
BLACK MOVES
3. ... d6
4. ... Nf6
5. ... b5
6. ... Qb6
7. ... Bxg1
8. ... Kd7
9. ... Qxa1
10. ... Nf6
11. ... Bxc3+
12. ... Qxh1
13. ... Kc7
14. ... Ne7
15. ... Ngf6
16. ... cxb5
17. ... Kd8
18. ... Ka6
19. ... Nc5
20. ... Qf1
21. ... Kxb4
BLACK SCORE:
2097
TOTAL SCORE:
2258

This, for example, is the result of test game 3 for the CXG Legend playing at level 53 with its normal setting (Style 5).

Best regards
Nick

Post by paulwise3 »

Hi Nick,
Thanx for your explanations! I'm always fighting with Excel ;-). Didn't even notice that the white moves score was just a PageUp away...
I entered my moves in the COMPUTER TEST tab and now everything works.
Since I seem not able to edit my previous post, here is my score:
Paul Wiselius (3 min./move)
WHITE SCORE:
1708
BLACK SCORE:
2048
TOTAL SCORE:
1878
Which confirms my thoughts about better defending than attacking.

Although I'm not as passive as the Concerto! That machine is definitely not suited for blitz games. I let it play Black against the Sphinx Comet at 20 seconds average per move (Comet in standard playing mode):
1. With all parameters set at 50. That was a dramatic game, in which the Comet was in a clearly winning position, about a rook ahead, but let it slip to a draw by threefold move repetition (it has no detection for move repetition).
2. With standard style and parameter values. This time the Concerto had a clear winning advantage, but let it slip away; eventually it could be happy with a draw by move repetition. This, however, was interesting, because the Concerto did not signal the draw, probably because a cycle of 4 moves (8 ply) was involved.
(Which could lead to another subject: with how long a move/ply cycle does a machine still detect threefold position repetition?)
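
Out of curiosity, here is a small sketch of that question using the python-chess library (assuming it is installed; this is just an illustration, not how any dedicated machine actually does it). Detection only has to compare whole positions, so even a 4-move (8-ply) cycle is caught once the same position has occurred three times:

[code]
# Sketch: threefold repetition keys on repeated POSITIONS, not on the length
# of the move cycle. Here the cycle is 4 full moves (8 ply) long.
import chess

cycle = ["Nf3", "Nc6", "Nc3", "Nf6", "Ng1", "Nb8", "Nb1", "Ng8"]

board = chess.Board()
for _ in range(2):          # play the 8-ply cycle twice...
    for san in cycle:
        board.push_san(san)

# ...so the starting position has now occurred three times (ply 0, 8 and 16)
print(board.can_claim_threefold_repetition())   # True
[/code]

A detector that only remembers the last few moves, instead of comparing the positions it has seen, will miss exactly these longer cycles.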

One extra edit: I really like playing with the Comet. At first I was disappointed to see its Elo of 1500, but it has a nice, active playing style, well suited to a quick game (and you can still win in the end ;-)). I'm adding an extra post to my thread about it not being an 1800 Elo machine, with a link to new documentation showing that it is a Kaare Danielson program. Probably the strongest 4 KB ROM program there is!


Regards, Paul

Post by paulwise3 »

Hi,
I'm proud to present the test score for game 1 for the Saitek Cougar:

COMPUTER TEST RESULT
Saitek
Cougar - 1998
Frans Morsch
Level A8 (30 seconds) average time
H8/3214 (HD6433214) - 8 Bit - 16 MHz - 32 KB ROM - 1 KB RAM
WHITE MOVES
5.Nf3
6.Bxf7+
7.Qxf3
8.Qxb7
9.Nb5
10.0-0
11.Bb5+
12.0-0
13.Qc6+
14.Qxb5+
15.c4
16.Qxd5
17.Qxd5
18.c3
19.0-0
20.Qxd7+
21.Qd8+
WHITE SCORE:
2596
BLACK MOVES
4. ... g6
5. ... e6
6. ... Bh5
7. ... Nc6
8. ... Nbd7
9. ... Bd6
10. ... Rb8
11. ... Nxc4
12. ... Nd6
13. ... Ke7
14. ... Nd7
15. ... exd5
16. ... c5
17. ... 0-0
18. ... Rg8
19. ... Qe4+
20. ... Kf8
BLACK SCORE:
2168
TOTAL SCORE:
2382

Well, I'm continuing with game 2 ;-)

Baffled regards,
Paul

Post by paulwise3 »

A word about the clone/programmer discussion: a while ago I found a way to determine whether my Turbo Advanced Trainer really was a clone of the GK2000. I ran some of the BT-2450 test positions that the GK2000 had solved, and found that the Turbo Advanced Trainer solved them in exactly the same time! This works for machines of Elo 1600 or higher, but you have to spend some time waiting. If there is a machine on the list that you suspect is a clone, you can start with a test position for which it has a short solution time ;-)

Edit: the list I'm referring to is on http://www.schach-computer.info/wiki/index.php/BT-2450
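
If you want to keep the comparison tidy, a rough sketch of the bookkeeping could look like this (the position numbers and times below are placeholders I typed in, not real GK2000 or Turbo Advanced Trainer data):

[code]
# Rough sketch of the clone check described above: compare hand-recorded
# BT-2450 solution times of two machines, position by position.
# The numbers below are placeholders, NOT real measurements.

gk2000_times  = {1: 12, 4: 95, 7: 230, 11: 18}   # seconds per solved position
trainer_times = {1: 12, 4: 95, 7: 230, 11: 18}

def looks_like_clone(a, b, tolerance=0.05):
    """True if both machines solved the same positions in (nearly) the same time."""
    if a.keys() != b.keys():
        return False
    return all(abs(a[p] - b[p]) <= tolerance * max(a[p], 1) for p in a)

print(looks_like_clone(gk2000_times, trainer_times))   # True -> likely the same program
[/code]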

Clone regards,
Paul

Post by paulwise3 »

And here is the score for test game 2 (The Turk):
COMPUTER TEST RESULT
Saitek
Cougar
Frans Morsch
Level A8 (30 seconds) average time
H8 - 8 Bit - 16 MHz - 32 KB ROM - 1 KB RAM
WHITE MOVES
7.0-0
8.Nbd2
9.Be3
10.Nd4
11.Bxc6+
12.Nxe6
13.Nxd5
14.c4
15.Nb5
16.Be3
17.Nbxc7
18.Rxe3
19.Nxd5
20.Rd3
21.Rxe8
22.Rd3
23.b4
24.Rab3
25.Rxb7+
26.Rxa7+
27.Qb6+
WHITE SCORE:
2034
BLACK MOVES
6. ... Nb4
7. ... Bb4
8. ... Qf4
9. ... Bc5
10. ... Qf6
11. ... Bh3
12. ... Bd7
13. ... 0-0-0
14. ... Bxc6
15. ... Bd6
16. ... Be6
17. ... Bxd5
18. ... Bf7
19. ... Rhe8
20. ... Rhe8
21. ... Qf8
22. ... Re4
23. ... Re4
24. ... Nc6
25. ... Kc8
26. ... Nxa7
BLACK SCORE:
2140
TOTAL SCORE:
2087

Average for tests 1 & 2:
White: 2315, Black: 2154, Total: 2234.5

Test regards,
Paul

Post by spacious_mind »

Hi Paul,

Thanks for testing the computers. I hope you are enjoying it. I will be updating the rating list as soon as all 5 tests are complete. You will find, since each game is unique, that the scores vary a lot, but it is the final 5-game score that counts when compared with the rest of the computers.

I am busy at the moment testing my latest new toy! Will post about it later as well!! :P

Best regards
Nick

Post by Carl Bicknell »

These tests are interesting, thank you for doing them.

My only comment would be that they are not real games.

I've just played a mini-match between my Mephisto Master (aka Milano Pro) and my RISC 2500.

You'd assume the RISC 2500 would win, right? Well, not at G60 it doesn't. Its handling of the clock is so bad (too fast) that it gets clobbered by the Master.

Post by spacious_mind »

Carl Bicknell wrote:These tests are interesting, thank you for doing them.

My only comment would be that they are not real games.

I've just played a mini-match between my Mephisto Master (aka Milano Pro) and my RISC 2500.

You'd assume the RISC 2500 would win, right? Well, not at G60 it doesn't. Its handling of the clock is so bad (too fast) that it gets clobbered by the Master.
Hi Carl,

Thanks for the feedback. I normally play the RISC 2500 at 60/30, 60/60, or the tournament setting of 40 moves in 2 hours, and under these settings I have never really noticed a problem with clock handling. I am assuming you are playing G60 countdown? Perhaps it has problems with countdown levels. Also, do you have it set up as Expert? Under another setting it might not play at its best.

Regarding your comment about "not real games", can you elaborate on this please? Do you mean because computers don't play this way (i.e., old historical games)? Or because a complete game is not played, due to the continuous taking back of moves?

These tests are intended to see which moves the computer or human would have chosen in the positions arising from these historical human games, and the moves are rated accordingly, one move at a time. In this respect you could say that they are not real from a complete-game perspective, as the natural flow is missing; however, move by move, the move choice and the rating of that choice are quite real and very accurate, because that is what the computers or humans are actually playing in these positions :)

I am working on a complete set of tests through the centuries. The very old games, such as the first 5 tests, will of course incorporate openings and moves that are not normal nowadays, but the computer still has to figure out the best move it would play. It is these individual moves that are being scored and compared.
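
For anyone wondering what "scoring one move at a time" can look like in practice, here is a bare-bones sketch of the principle in Python. The position names, candidate moves and numbers are made-up placeholders, and this is only an illustration of the idea, not the exact formulas used in the spreadsheets:

[code]
# Illustration only - placeholder moves and numbers, not the spreadsheet formulas.
# Idea: every test position offers a menu of candidate moves, each pre-assigned a
# score; the tested computer or human earns the score of the move it chose, and
# the per-move scores are averaged into White, Black and Total figures.

MOVE_SCORES = {                      # position id -> {candidate move: score}
    "white_4": {"Qh5": 2400, "Nf3": 2200, "d4": 2000},
    "white_5": {"Nc3": 2300, "Nf3": 2250, "c3": 1900},
    "black_3": {"d6": 2100, "Nf6": 2300, "g6": 1800},
}
DEFAULT = 1600                       # score for a move not on the pre-rated list

def score_moves(played):             # played: {position id: chosen move}
    scores = [MOVE_SCORES[pos].get(move, DEFAULT) for pos, move in played.items()]
    return sum(scores) / len(scores)

white = score_moves({"white_4": "Qh5", "white_5": "Nf3"})
black = score_moves({"black_3": "d6"})
print(round(white), round(black), round((white + black) / 2))   # White, Black, Total
[/code]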

As the tests evolve through the centuries you will see modern game evaluations as well.

Besides, these historical players are being rated as well, so this approach also lets you get a feel for the Masters' strength through the generations.

It all just takes soooooooo much work and time.

Just trying to kill 5 birds with one stone regards :roll:
Nick

Post by Carl Bicknell »

Hi,

You're doing great work! I'm not saying otherwise.

I don't know what the intention of your testing is; it may just be for fun. If it is for establishing the strength of computers (Elo), though, then a test suite only goes so far.

A test suite doesn't take into account how a machine handles the clock. It's possible for two dedicated machines to score the same result on a test suite and yet for one to beat the other in a real match, simply because it handles the clock better.
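
To make "clock handling" concrete: dedicated machines and engines typically budget their remaining time each move with a rule along these lines (a generic sketch, not the RISC 2500's or any other machine's actual algorithm):

[code]
# Generic sketch of per-move time budgeting - not any specific machine's algorithm.
def time_for_move(remaining_s, moves_to_go=None, increment_s=0, safety=0.05):
    """Seconds to spend on the current move."""
    if moves_to_go:                      # tournament control, e.g. 40 moves in 2 h
        base = remaining_s / moves_to_go
    else:                                # sudden death, e.g. game in 60
        base = remaining_s / 30          # guess: about 30 moves still to come
    budget = base + increment_s
    return max(0.1, min(budget, remaining_s * (1 - safety)))

print(time_for_move(7200, moves_to_go=40))   # ~180 s per move at 40 in 2 h
print(time_for_move(3600))                   # ~120 s per move at game in 60
[/code]

A machine whose guess for "moves still to come" is far too high will blitz out its moves in sudden-death games and play below its strength, which is the kind of behaviour I'm describing.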

This is one issue I have with something like the BT2630 test for determining strength: it's a great test, but the SSDF's real games are better.

But I am just thinking out loud - continue your testing and I know it takes many hours to do.

(Yes I was testing the RISC 2500 at G60 on Expert, and its time management is horrible)

Post by spacious_mind »

Carl Bicknell wrote:I don't know what the intention of your testing is; it may just be for fun. If it is for establishing the strength of computers (Elo), though, then a test suite only goes so far.

A test suite doesn't take into account how a machine handles the clock. It's possible for two dedicated machines to score the same result on a test suite and yet for one to beat the other in a real match, simply because it handles the clock better.
Hi Carl,

I don't disagree with you; tests will only take things so far, and of course a match between two computers may end up completely different from any test ever created.

Having said that, the same applies to test matches. I have played enough tournaments to know quite well that you can take any row of 10 computers from a rating list and let them play two games against each other, and I would be safe betting a million bucks that the resulting table will not look like the Wiki, the Elo lists, the SSDF, or whatever other rating list the computers were taken from. Repeat this a second time and the second result would be different from the first.

So everything we do is just a guideline at the end of the day, and ultimately fun; otherwise, why do it? Testing at the SSDF or playing tournaments is absolutely no different. Nothing is guaranteed. You do it for fun and to learn something more about your hobby.

Every list and every test lacks something; they are all approximate guidelines. There are plenty of examples of two computers playing 20 games against each other where, after about 8 games, the score is something like 6-2 or even 7-1 for one computer, but after 20 games the other computer has caught up or is ahead. So even that makes you wonder whether 12-game matches are enough, or whether it should be 20, or even 40, or 100.
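
To put some rough numbers on that, here is a small sketch using the standard Elo formulas (my own back-of-the-envelope illustration, not how the SSDF or anyone else computes their error margins). It shows how wide the 95% confidence band around a measured rating difference still is after 8, 20, 40 and 100 games:

[code]
# Rough sketch: how the 95% confidence interval of a measured Elo difference
# shrinks with match length. Logistic Elo model, draws ignored for simplicity.
import math

def elo_diff(p):
    """Elo difference implied by a score fraction p (0 < p < 1)."""
    return -400 * math.log10(1 / p - 1)

def elo_interval(score, games):
    p = score / games
    se = math.sqrt(p * (1 - p) / games)              # binomial standard error
    lo, hi = max(p - 1.96 * se, 0.01), min(p + 1.96 * se, 0.99)
    return elo_diff(lo), elo_diff(p), elo_diff(hi)

for games in (8, 20, 40, 100):
    low, mid, high = elo_interval(0.6 * games, games)    # a 60% score each time
    print(f"{games:3d} games: {mid:+4.0f} Elo, roughly {low:+4.0f} to {high:+4.0f}")
[/code]

Even at 100 games the band is still well over a hundred Elo points wide, which is why a 6-2 start turning into a level match after 20 games should not surprise anyone.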

Something else to question is the value of playing two computers against each other repeatedly for many games. It might prove a difference in strength between those two at the end of the day, but it does not necessarily reflect how they would perform against a dozen or two dozen different computers in a tournament.

If you played Stockfish, for example, 100 times against your Saitek Master or RISC 2500, do you think they would rate at 2100 to 2200 Elo, or closer to 1000 Elo? What perspective would a test like that give you?

So in the end I think we all try different things for fun and in order to cut corners. It is impossible to play enough games unless you have 100 dedicated testers playing for a hundred years at 40 moves in 2 hours, and even if that were possible, repeat the whole thing a second time and I guarantee the outcome and rankings would not be the same as in your first attempt.

A driver for me with my test is this logic. My 5 test games have a total of 226 moves that are being rated. If I scale this up to a set of, say, 16 test games, that comes to approximately 723 rated moves, since the 5 games average about 45 moves each (and of course, the more games, the higher the potential accuracy). Now imagine it were possible to use these 723 moves to post results similar to, for example, the SSDF's, with every computer in the world having been evaluated on exactly the same 723 moves. Don't you think there may be a slight possibility that this test ends up at least as accurate as, if not more accurate than, other tests, including the SSDF? To achieve the same equal footing with real games, I would have to play 723 games with every single computer against every single computer, and that is impossible to achieve in a hundred thousand years.

Can you imagine how much time I am saving with the difference between 723 moves and 723 games per tested computer against every other tested computer, if I can make this work? And that is without even mentioning that every single computer would be tested in exactly the same way. It would be like playing every single computer in the world against every single computer in the world a fixed number of times. :wink:

Best regards
Nick

Post by Reinfeld »

spacious_mind wrote:If you played Stockfish, for example, 100 times against your Saitek Master or RISC 2500, do you think they would rate at 2100 to 2200 Elo, or closer to 1000 Elo? What perspective would a test like that give you?
Now this is an interesting question - though Stockfish isn't the best example. But a modern engine that includes an auto-adjust feature (Hiarcs, Shredder, Fritz, for example) could provide some relevant data.

Clearly these programs would smack our dedicateds in every game, if the programs were set at full strength. But apply the auto-adjust, and what would happen? I've tried this now and then, and I find it usually takes about 20 games to get to the break-even line. The Chessmaster program is also good for this, because it has the added feature of the programmed personalities. In addition, if you create a username for your dedicated (RISC 2500), you can run those 20 games in reasonably short order and get to a starting point for a rating - just as a human can.

I hesitate to match software against dedicateds because of the obvious strength factors (plus it's a little boring), but I think using the various self-rating systems in those engines is another way to reach a reasonable measurement.

- R.
"You have, let us say, a promising politician, a rising artist that you wish to destroy. Dagger or bomb are archaic and unreliable - but teach him, inoculate him with chess."
– H.G. Wells