SPACIOUS-MIND RATING TEST DOWNLOAD AND INSTRUCTIONS
- spacious_mind
- Senior Member
- Posts: 4000
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
I created this new post to avoid having these lists lost amongst all the test posts.
Below are all 5 rating tests for download, as well as some PowerPoint instructions. The Excel spreadsheets and the PowerPoint file were created with MS Office 2010, so earlier versions of Excel will probably not work (sorry, I can't help that).
Since I have carried out about 200 tests with each spreadsheet, I know they are fully functional and error free. If you find an error in your test games, it is much more probable that you have made a mistake somewhere. To minimize that, I created some instructions that are also downloadable.
The attachments below are the 5 rating tests, a ranking list and the instructions. I have kept all the tested games in the spreadsheet to allow you to compare the programs.
Test Game 1
http://spacious-mind.com/forum_reports/ ... ellvi.xlsx
Test Game 2
http://spacious-mind.com/forum_reports/ ... known.xlsx
Test Game 3
http://spacious-mind.com/forum_reports/ ... onway.xlsx
Test Game 4
http://spacious-mind.com/forum_reports/ ... lidor.xlsx
Test Game 5
http://spacious-mind.com/forum_reports/ ... lidor.xlsx
Rating List
http://spacious-mind.com/forum_reports/ ... _list.xlsx
INSTRUCTIONS
http://spacious-mind.com/forum_reports/ ... tions.pptx
For anyone who can use the spreadsheets but doesn't have PowerPoint, here are the instructions on how to use them.
RATING TEST INSTRUCTIONS
I have decided to show the scoring while you play through a test, which should give you a lot more fun as you see the ratings develop.
The tests are really quite easy to do; you just have to give one a go and you will soon get used to it.
A bit of advice for computers that have both average and fixed time settings: think about using the fixed setting where applicable. Some computers, when you change sides, keep working towards 60 moves in 30 minutes, so they start to move faster and faster. The fixed setting is much more accurate for this test. Ponder doesn't do a lot, since you have to take back moves a lot.
Best regards
Nick
- paulwise3
- Senior Member
- Posts: 1505
- Joined: Tue Jan 06, 2015 10:56 am
- Location: Eindhoven, Netherlands
Re: SPACIOUS-MIND RATING TEST DOWNLOAD AND INSTRUCTIONS
spacious_mind wrote:
A bit of advice for computers that have both average and fixed time settings: think about using the fixed setting where applicable. Some computers, when you change sides, keep working towards 60 moves in 30 minutes, so they start to move faster and faster. The fixed setting is much more accurate for this test. Ponder doesn't do a lot, since you have to take back moves a lot.
Nick,
Until now I only saw this behaviour with the Mephisto III.
But of course I tested a lot fewer machines than you did. Do you know more of these?
Regards, Paul
2024 Special thread: viewtopic.php?f=3&t=12741
2024 Special results and standings: https://schaakcomputers.nl/paul_w/Tourn ... 25_06.html
If I am mistaken, it must be caused by a horizon effect...
- spacious_mind
- Senior Member
- Posts: 4000
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
Re: SPACIOUS-MIND RATING TEST DOWNLOAD AND INSTRUCTIONS
paulwise3 wrote:
spacious_mind wrote:
A bit of advice for computers that have both average and fixed time settings: think about using the fixed setting where applicable. Some computers, when you change sides, keep working towards 60 moves in 30 minutes, so they start to move faster and faster. The fixed setting is much more accurate for this test. Ponder doesn't do a lot, since you have to take back moves a lot.
Nick,
Until now I only saw this behaviour with the Mephisto III.
But of course I tested a lot fewer machines than you did. Do you know more of these?
Regards, Paul
Yes, the ones that show Fixed in the Level setting column of the list I had to do twice.
Thanks
Nick
- spacious_mind
- Senior Member
- Posts: 4000
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
blaubaer wrote:
Hi Nick,
wow, now I know what you're doing day and night....
Are you still working on Mysticum?
Regards,
Michael
Hi Michael,
Not day and night, as I also have to work, but yes, I spend a lot of my spare time on it.
Good to hear from you. Yes, I am doing Mysticum as well. I am trying to finish Revelation and Gavon 2 first, then I will add Mysticum. In the end it will give a good overview of the top programs on good dedicated hardware.
It just needs a lot of patience to get closer to the end goal.
Best regards,
Nick
- blaubaer
- Full Member
- Posts: 935
- Joined: Thu Jul 28, 2011 12:53 pm
- Location: Bavaria, the centre of Mysticum
- Contact:
Hi Nick,
I have some questions regarding the test:
- You generally play with 30s/move?
- How exactly do you perform the test on the chess computer? You let the cc do the first move, i.e. in Test 1 it is move 4 for black, and then you press the respective button to start the move calculation for the white side for move 5 and so on?
- I think the choice of the opening book is relevant for the test - don't you take a note of it?
Regards, Michael
Hi Nick,
I ran your first position with Komodo 9.02 but could not finish it because Komodo crashed a couple of times with the comment: no engine loaded.
I switched to Stockfish 5 64 bit, which ran stable.
I don't really understand your scoring system. On move 11 Stockfish plays Nxc8, which is a winning move but gets 0.00 points. On move 12 Stockfish plays ...Nd7 with 0.00 points awarded, and at move 16 it plays Qxd5 with zero points. Although in all these cases there might be moves which are very slightly better, these are all basically winning moves, so I don't understand the 0.00 score.
Best wishes,
Gerhard
- spacious_mind
- Senior Member
- Posts: 4000
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
kgvetter wrote:
Hi Nick,
I ran your first position with Komodo 9.02 but could not finish it because Komodo crashed a couple of times with the comment: no engine loaded.
I switched to Stockfish 5 64 bit, which ran stable.
I don't really understand your scoring system. On move 11 Stockfish plays Nxc8, which is a winning move but gets 0.00 points. On move 12 Stockfish plays ...Nd7 with 0.00 points awarded, and at move 16 it plays Qxd5 with zero points. Although in all these cases there might be moves which are very slightly better, these are all basically winning moves, so I don't understand the 0.00 score.
Best wishes,
Gerhard
Hi Gerhard,
It is not about winning moves; there are plenty of those in many of the positions. It is about rewarding the best moves. In some cases the gap between the best move and the second-best move is so large that the second-best move falls outside the range that earns points.
Nxd7 is not a move good enough to score points, and neither is Qxd5 on move 16. Check it yourself: let the positions run for a while and you will see that these moves are not the best ones.
I created these tests with evaluations for each move that ran much, much longer than 30 seconds or even 3 minutes, so I know these moves are not worthy of points. Trust in the tests and you will get a good rating in the end.
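To illustrate the idea of rewarding only moves that are close enough to the best one, here is a minimal sketch. It is not the formula used in the spreadsheets, which is not given in this thread. The cutoff of 0.50 pawns, the linear scale, the function name `move_points` and all evaluations are made-up assumptions.

```python
# Hypothetical illustration only: the real point tables live in the spreadsheets.
CUTOFF = 0.50  # assumed: largest evaluation gap (in pawns) that still earns points


def move_points(best_eval: float, move_eval: float, cutoff: float = CUTOFF) -> float:
    """Return 1.0 for the best move, less for near-best moves, 0.0 beyond the cutoff."""
    gap = best_eval - move_eval
    if gap <= 0:
        return 1.0
    if gap >= cutoff:
        return 0.0
    return round(1.0 - gap / cutoff, 2)


# Made-up evaluations for one position, as a long analysis might report them.
candidates = {"best move": 6.10, "near-best move": 5.90, "winning but weaker": 4.80}
best = max(candidates.values())
for name, evaluation in candidates.items():
    print(name, move_points(best, evaluation))
```

Under these assumed numbers, a move that is still clearly winning (+4.80) earns nothing because it is too far behind the best move (+6.10), which is the situation described above.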
Best regards
Nick
- spacious_mind
- Senior Member
- Posts: 4000
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
blaubaer wrote:
Hi Nick,
I have some questions regarding the test:
- You generally play with 30s/move?
- How exactly do you perform the test on the chess computer? You let the cc do the first move, i.e. in Test 1 it is move 4 for black, and then you press the respective button to start the move calculation for the white side for move 5 and so on?
- I think the choice of the opening book is relevant for the test - don't you take a note of it?
Regards, Michael
With engines I do the tests using fixed time. All tests are 30 seconds per move, or the equivalent with dedicated computers. In 2-player mode you manually play through to the start position. When you get to the start position, stop 2-player mode and press enter for the program to start calculating its move. Once the program has moved, you find that move in the drop-down list and it gets rated. You then look at what the human played, correct the move if the computer played something different, and hit enter again for the computer to calculate the next move, which this time might be the black move, and so on. Once you have done that for the whole game you will see your final game score for the test. In some of the 5 games the programs will do better and in some maybe not quite so well. That is life... that is chess. Once all 5 tests are completed, you let the 5-test-game formula calculate the final ELO.
The opening book is irrelevant for this test and should be switched off if possible. For dedicated computers I deliberately chose start positions where dedicated computers were already out of book. But here and there, there might be an exception where a computer still has a book move. In those cases take back the move, turn off the book and then let the move be calculated again. You will find the computer will most likely play a different move from the book move.
The intent of these tests is to find out the computer's program strength and not its book strength.
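The procedure described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `run_test`, the `steps` layout, the moves and the point values are all invented here, and the real point tables are in the spreadsheets.

```python
from typing import Callable, Dict, List, Tuple

# One step = (move played in the original game, points table for that position).
# All moves and point values below are invented placeholders.
Step = Tuple[str, Dict[str, float]]


def run_test(steps: List[Step], choose_move: Callable[[int], str]) -> float:
    """Score a program over one test game.

    choose_move(i) stands for "press enter and let the program think" at step i.
    """
    total = 0.0
    for i, (game_move, points) in enumerate(steps):
        played = choose_move(i)
        total += points.get(played, 0.0)  # moves not in the table earn nothing
        # Whatever the program played, the game continues with game_move,
        # so the next position is always the one from the original game.
    return total


steps = [
    ("Nf3", {"Nf3": 1.0, "d4": 0.5}),
    ("Bb5", {"Bb5": 1.0}),
    ("O-O", {"O-O": 1.0, "d3": 0.25}),
]
program_moves = ["d4", "Bb5", "Nc3"]
print(run_test(steps, lambda i: program_moves[i]))
```

Because the board is always corrected back to the original game move, every program is rated on exactly the same positions.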
Best regards
Nick
- spacious_mind
- Senior Member
- Posts: 4000
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
- blaubaer
- Full Member
- Posts: 935
- Joined: Thu Jul 28, 2011 12:53 pm
- Location: Bavaria, the centre of Mysticum
- Contact:
Hi Nick,
spacious_mind wrote:
With engines I do the tests using Fixed time. All tests are 30 seconds per move or the equivalent with dedicated computers.
what does "equivalent with dedicated computers" exactly mean?
spacious_mind wrote:
You then look at what the human played and correct it if the computer played something different and then you hit enter again for the computer to calculate the next move which this time might be the black move and so on.
"You then look at what the human played" means what is played on the left side of the game table and correct the cc's move if it's different?
Do you test with two identical chess computers?
Regards,
Michael
- blaubaer
- Full Member
- Posts: 935
- Joined: Thu Jul 28, 2011 12:53 pm
- Location: Bavaria, the centre of Mysticum
- Contact:
Hi Nick,
impressive!
I played a quick Test Game 1 on my Tasc R30 v2.5 (it was already set up and waiting...) - the test result was ELO 2392 and .info Wiki says ELO 2368; a (small) difference of 24 ELO points!
Next will be Mysticum Hiarcs...
Impressed Regards,
Michael
- spacious_mind
- Senior Member
- Posts: 4000
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
blaubaer wrote:
Hi Nick,
spacious_mind wrote:
With engines I do the tests using Fixed time. All tests are 30 seconds per move or the equivalent with dedicated computers.
what does "equivalent with dedicated computers" exactly mean?
spacious_mind wrote:
You then look at what the human played and correct it if the computer played something different and then you hit enter again for the computer to calculate the next move which this time might be the black move and so on.
"You then look at what the human played" means what is played on the left side of the game table and correct the cc's move if it's different?
Do you test with two identical chess computers?
Regards,
Michael
Hi Michael,
By equivalent I mean 60/30 or 40/20. I try to use average time settings as well. But on some computers the clock keeps counting forwards when you take back moves, so for those I sometimes use fixed time instead. I use whichever setting gives 30 seconds of thinking most of the time.
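The arithmetic behind "equivalent" is simply the average time per move. A small sketch, with `seconds_per_move` being a name made up for this illustration:

```python
def seconds_per_move(moves: int, minutes: float) -> float:
    """Average thinking time per move for a 'moves in minutes' level."""
    if moves <= 0:
        raise ValueError("moves must be positive")
    return minutes * 60 / moves


for moves, minutes in [(60, 30), (40, 20)]:
    print(f"{moves}/{minutes}: {seconds_per_move(moves, minutes):.0f} s per move")
```

Both 60/30 and 40/20 work out to an average of 30 seconds per move, which matches the fixed setting used for engines.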
Yes, you correct with the human move on the left side of the table, which of course is the move played in the original game. I also rated the human game so that the computer can be compared against the human players.
Anyway, it looks like you've got it now, since you just completed a test.
PS: You must play all 5 test games in order to get a final rating that can be compared to other computers that were tested.
I have found in these tests that the Schroeder and King programs overperform a bit, and some of the Lang and Spracklen programs underperform a bit. That is why I will create more test games in the near future to add to these 5 tests.
But I have also found that engines overall have shown very accurate ratings. The nice thing is that you can actually:
1) Compare all chess programs inside the same universe. (Any Dedicated, DOS, PC Program etc etc)
2) Find clones or related programs
3) Compare strength improvements between same program playing test at 30 Seconds and then playing test at 3 minutes.
4) Humans can be tested and compared against computer.
5) The Grandmasters playing strength can be established.
There is so much I can do with these tests that no other tests can do. Of course nothing is 100% accurate, but at least everyone gets tested 100% the same way, which means everything is consistent.
Best regards
Nick
Hello Nick,
I am beginning to have real fun and enjoyment with your tests!
As I managed now to run Stockfish 5 64bit in the Mysticum-environment with all four cores active I ran your game 2 and came out with a combined rating of 3275.
I will perform the remaining tests soon and will keep you posted of the results.
Thanks for your great test-suite!
Gerhard
- spacious_mind
- Senior Member
- Posts: 4000
- Joined: Wed Aug 01, 2007 10:20 pm
- Location: Alabama
- Contact:
kgvetter wrote:
Hello Nick,
I am beginning to have real fun and enjoyment with your tests!
As I managed now to run Stockfish 5 64bit in the Mysticum-environment with all four cores active I ran your game 2 and came out with a combined rating of 3275.
I will perform the remaining tests soon and will keep you posted of the results.
Thanks for your great test-suite!
Gerhard
Hi Gerhard,
Great, I am looking forward to it. Hopefully very soon I will be able to let you all enter results online, with a database of results kept so that everyone can download any or all of the results they want.
Alain has been helping with this. In fact, while I was creating the tests he has been creating the online version.
There is just one final piece we are trying to solve: keeping computer entries consistent, so that we don't have duplicated chess computer names in the test (in other words, a standard naming convention). We also need a consistent time entry that is flexible enough to allow different levels to be tested.
Alain is waiting on me for some final input on this piece; quite soon after that it will be ready and everyone will get access.
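As an illustration of the duplicate-name problem, here is a minimal sketch of one possible normalization. It is an assumption, not the convention the online version will use; `canonical_name` and the spelling variants are hypothetical.

```python
import re


def canonical_name(raw: str) -> str:
    """Collapse case, spacing and punctuation so one machine maps to one key."""
    cleaned = re.sub(r"[^a-z0-9.]+", " ", raw.lower())
    return " ".join(cleaned.split())


# Hypothetical spellings that different testers might enter for the same machine.
entries = ["Tasc R30 v2.5", "TASC  R30  V2.5", "tasc-r30 v2.5"]
print({canonical_name(entry) for entry in entries})
```

All three spellings collapse to a single key, so they would be stored as one computer rather than three.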
Best regards
Nick