SPACIOUS-MIND RATING TEST DOWNLOAD AND INSTRUCTIONS

spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

I created this new post to avoid having these lists lost amongst all the test posts.

Below are all 5 Rating Tests for download as well as some PowerPoint instructions. The Excel spreadsheets and the PowerPoint file were created with MS Office 2010, so they will probably not work in earlier versions of Excel (sorry, I can't help that).

Since I have carried out about 200 tests with each spreadsheet, I know they are fully functional and error free. If you find an error in your test games, it is much more probable that you have made a mistake somewhere. To minimize that, I created some instructions that are also downloadable.

The attachments below are the 5 rating tests, a ranking list and the instructions. I have kept all the tested games in the spreadsheet to allow you to compare the programs.

Test Game 1

http://spacious-mind.com/forum_reports/ ... ellvi.xlsx

Test Game 2

http://spacious-mind.com/forum_reports/ ... known.xlsx

Test Game 3

http://spacious-mind.com/forum_reports/ ... onway.xlsx

Test Game 4

http://spacious-mind.com/forum_reports/ ... lidor.xlsx

Test Game 5

http://spacious-mind.com/forum_reports/ ... lidor.xlsx


Rating List

http://spacious-mind.com/forum_reports/ ... _list.xlsx

INSTRUCTIONS

http://spacious-mind.com/forum_reports/ ... tions.pptx

For anyone that can use the spreadsheets but doesn't have PowerPoint here are the instructions on how to use the spreadsheets.

RATING TEST INSTRUCTIONS

[15 images of the instruction slides, not reproduced here]

I have decided to show the scoring while you play through a test, which should be a lot more fun as you see the ratings develop.

The tests are really quite easy to do; you just have to give one a go and you will soon get used to it.

A bit of advice for computers that have both average-time and fixed-time settings: think about using the fixed setting where applicable on computers that, when you change sides, continue working towards 60 moves in 30 minutes, as these computers will start to speed up their moves. The fixed setting is much more accurate for this test. Ponder doesn't do a lot, since you have to take back moves a lot.

Best regards
Nick
paulwise3
Senior Member
Posts: 1505
Joined: Tue Jan 06, 2015 10:56 am
Location: Eindhoven, Netherlands

Post by paulwise3 »

spacious_mind wrote:A bit of advice for computers that have both average-time and fixed-time settings: think about using the fixed setting where applicable on computers that, when you change sides, continue working towards 60 moves in 30 minutes, as these computers will start to speed up their moves. The fixed setting is much more accurate for this test. Ponder doesn't do a lot, since you have to take back moves a lot.
Nick,

Until now I have only seen this behaviour with the Mephisto III.
But of course I have tested far fewer machines than you have. Do you know of more of these?

Regards, Paul
spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

paulwise3 wrote:
spacious_mind wrote:A bit of advice for computers that have both average-time and fixed-time settings: think about using the fixed setting where applicable on computers that, when you change sides, continue working towards 60 moves in 30 minutes, as these computers will start to speed up their moves. The fixed setting is much more accurate for this test. Ponder doesn't do a lot, since you have to take back moves a lot.
Nick,

Until now I have only seen this behaviour with the Mephisto III.
But of course I have tested far fewer machines than you have. Do you know of more of these?

Regards, Paul
Yes, I had to do the ones that show Fixed in the Level setting column of the list twice.

Thanks
Nick
blaubaer
Full Member
Posts: 935
Joined: Thu Jul 28, 2011 12:53 pm
Location: Bavaria, the centre of Mysticum

Post by blaubaer »

Hi Nick,

wow, now I know what you're doing day and night.... :wink:

Are you still working on Mysticum?

Regards,
Michael
spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

blaubaer wrote:Hi Nick,

wow, now I know what you're doing day and night.... :wink:

Are you still working on Mysticum?

Regards,
Michael
Hi Michael,

Not day and night as I also have to work, but yes I spend a lot of time on it in my spare time :P

Good to hear from you. Yes I am doing Mysticum as well, trying to finish Revelation and Gavon 2, then I will add Mysticum as well. It will in the end give a good overview of the top programs on good dedicated hardware.

It just needs a lot of patience to get closer to the end goal.

Best regards,
Nick
blaubaer
Full Member
Posts: 935
Joined: Thu Jul 28, 2011 12:53 pm
Location: Bavaria, the centre of Mysticum

Post by blaubaer »

Hi Nick,

I have some questions regarding the test:

- You generally play with 30s/move?
- How exactly do you perform the test on the chess computer? You let the cc make the first move, i.e. in Test 1 it is move 4 for Black, and then you press the respective button to start the move calculation for the White side for move 5, and so on?
- I think the choice of the opening book is relevant for the test - don't you take a note of it?

Regards, Michael
kgvetter
Member
Posts: 239
Joined: Sat May 12, 2012 5:22 pm

Post by kgvetter »

Hi Nick,

I ran your first position with Komodo 9.02 but could not finish it because Komodo crashed a couple of times with the message: no engine loaded.
I switched to Stockfish 5 64 bit, which ran stably.

I don't really understand your scoring system. On move 11 Stockfish plays Nxc8, which is a winning move but gets 0.00 points. On move 12 Stockfish plays ...Nd7 with 0.00 points awarded, and on move 16 it plays Qxd5 with zero points. Although in all these cases there might be moves which are very slightly better, these are all basically winning moves, so I don't understand the 0.00 score.

Best wishes,

Gerhard
spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

kgvetter wrote:Hi Nick,

I ran your first position with Komodo 9.02 but could not finish it because Komodo crashed a couple of times with the message: no engine loaded.
I switched to Stockfish 5 64 bit, which ran stably.

I don't really understand your scoring system. On move 11 Stockfish plays Nxc8, which is a winning move but gets 0.00 points. On move 12 Stockfish plays ...Nd7 with 0.00 points awarded, and on move 16 it plays Qxd5 with zero points. Although in all these cases there might be moves which are very slightly better, these are all basically winning moves, so I don't understand the 0.00 score.

Best wishes,

Gerhard
Hi Gerhard,
It is not about winning moves; there are plenty of those in many of the situations. It is about rewarding the best moves. In some cases the gap between the best move and the second-best move is so large that the second-best move falls outside the range that earns points.

Nxd7 is not a move good enough to score points, and neither is Qxd5 on move 16. Check it yourself: let the positions run for a while and you will see that these moves are not the best ones.

I created these tests using evaluations that ran for much, much longer than 30 seconds or even 3 minutes per move, so I know they are not moves worthy of points. :) Trust in the tests and you will get a good rating in the end :)
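
For readers who want to see the idea in code, here is a minimal sketch of "points only for moves close to the best move". The function name `move_points`, the 0.30-pawn window and the 1.0 point scale are all assumptions made up for illustration; the spreadsheets' real values and formula are not given in this thread.

```python
# Hypothetical sketch: award points only to moves close to the best move.
# The window size and point scale are illustrative assumptions, NOT the
# values used in the actual spreadsheets.

def move_points(best_eval, move_eval, window=0.30, max_points=1.0):
    """Return points for a move; evaluations are in pawns from the mover's view."""
    loss = best_eval - move_eval          # how much worse than the best move
    if loss <= 0:
        return max_points                 # the best move scores full points
    if loss >= window:
        return 0.0                        # outside the reward window: no points
    # Inside the window, scale the points down linearly with the loss.
    return round(max_points * (1 - loss / window), 2)

# A move can still be clearly winning (+4.50) and score nothing
# when a much stronger move (+6.00) was available.
print(move_points(6.00, 6.00))  # 1.0
print(move_points(6.00, 4.50))  # 0.0
```

Under these assumed numbers, a winning but clearly second-best move earns 0.00, which is the situation described above.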

Best regards
Nick
spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

blaubaer wrote:Hi Nick,

I have some questions regarding the test:

- You generally play with 30s/move?
- How exactly do you perform the test on the chess computer? You let the cc make the first move, i.e. in Test 1 it is move 4 for Black, and then you press the respective button to start the move calculation for the White side for move 5, and so on?
- I think the choice of the opening book is relevant for the test - don't you take a note of it?

Regards, Michael
With engines I do the tests using Fixed time. All tests are 30 seconds per move or the equivalent with dedicated computers. You manually input the start position in 2 player mode. When you get to the start position, stop 2 player mode and press Enter for the program to start calculating its move. Once the program has moved, you find that move in the drop-down list and it gets rated. You then look at what the human played and correct it if the computer played something different, and then you hit Enter again for the computer to calculate the next move, which this time might be the Black move, and so on.

Once you have done that for the whole game you will see your final game score for the test. In some of the 5 games the programs will do better and in some maybe not quite as well. That is life... that is chess. Once all 5 tests are completed, you let the 5 test game formula calculate the final ELO.
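
The procedure described above can be sketched roughly as a loop. Everything in this example is hypothetical: the moves, the point values and the names `test_game`, `run_test` and `choose_move` are invented for illustration, and the real ratings live in the spreadsheets. The formula that turns the 5 game scores into a final ELO is not described in this thread, so it is left out.

```python
# Hypothetical sketch of one test game. Move names and point values are
# made up for illustration; the real values are in the spreadsheets.

test_game = [
    # (move played in the original game, {candidate move: points})
    ("Nf3", {"Nf3": 1.0, "d4": 0.6}),
    ("Bg4", {"Bg4": 1.0, "e6": 0.8}),
    ("h3", {"h3": 1.0}),
]

def run_test(game, choose_move):
    """Score a program over one test game.

    choose_move(ply, history) stands in for pressing Enter and waiting
    for the program's reply at about 30 seconds per move.
    """
    history = []   # the board always follows the original game
    total = 0.0
    for ply, (game_move, scored) in enumerate(game):
        played = choose_move(ply, history)
        total += scored.get(played, 0.0)   # moves not in the list earn nothing
        # If the program deviated, its move is taken back and the move
        # from the original game is entered instead.
        history.append(game_move)
    return total

# Stand-in "program": finds the game move once, a lesser move once, and
# an unrated move once.
replies = ["Nf3", "e6", "a3"]
score = run_test(test_game, lambda ply, history: replies[ply])
print(score)
```

The point of the design is that every program is rated on exactly the same positions, because the board is reset to the original game after each move.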

The opening book is not of interest for this test and should be switched off if possible. For dedicated computers I deliberately chose start positions where the computers were already out of book. But here and there there might be an exception where a computer still has a book move. In those cases take back the move, turn off the book and then let the move be calculated again. You will find the computer will most likely play a different move from the book move.

The intent of these tests is to find out the computer's program strength and not its book strength.

Best regards
Nick
spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

Hi Gerhard,

I am testing Revelation MM5 at the moment as well as Millennium Chess Genius.

The Revelation result when I am finished should be interesting for you to compare against what you score with your I7 Mysticum MM5.

Best regards
Nick
blaubaer
Full Member
Posts: 935
Joined: Thu Jul 28, 2011 12:53 pm
Location: Bavaria, the centre of Mysticum

Post by blaubaer »

Hi Nick,
spacious_mind wrote:With engines I do the tests using Fixed time. All tests are 30 seconds per move or the equivalent with dedicated computers.

What exactly does "equivalent with dedicated computers" mean?
spacious_mind wrote:You then look at what the human played and correct it if the computer played something different and then you hit enter again for the computer to calculate the next move which this time might be the black move and so on.
Does "You then look at what the human played" mean the move shown on the left side of the game table, and that you correct the cc's move if it's different?

Do you test with two identical chess computers?

Regards,
Michael
blaubaer
Full Member
Posts: 935
Joined: Thu Jul 28, 2011 12:53 pm
Location: Bavaria, the centre of Mysticum

Post by blaubaer »

Hi Nick,

impressive!

I played a quick Test Game 1 on my Tasc R30 v2.5 (it had just been set up and was waiting...) - the test result was ELO 2392 and the .info Wiki says ELO 2368; a (small) difference of 24 ELO points!

Next will be Mysticum Hiarcs... :wink:

Impressed Regards,
Michael
spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

blaubaer wrote:Hi Nick,
spacious_mind wrote:With engines I do the tests using Fixed time. All tests are 30 seconds per move or the equivalent with dedicated computers.

What exactly does "equivalent with dedicated computers" mean?
spacious_mind wrote:You then look at what the human played and correct it if the computer played something different and then you hit enter again for the computer to calculate the next move which this time might be the black move and so on.
Does "You then look at what the human played" mean the move shown on the left side of the game table, and that you correct the cc's move if it's different?

Do you test with two identical chess computers?

Regards,
Michael
Hi Michael,
By equivalent I mean 60/30 or 40/20 (60 moves in 30 minutes or 40 moves in 20 minutes). I try to use average-time levels as well. But for some computers whose clock keeps counting forwards when you take back moves, I sometimes use fixed time instead, whichever most reliably gives 30 seconds of thinking per move.
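
As a quick check of the arithmetic, the sketch below shows why 60/30 and 40/20 both work out to 30 seconds per move, and how the budget shrinks if the clock keeps running during take-backs. The function names are made up for this example, and the assumption that a computer spreads its remaining time evenly over the remaining moves is a simplification; real machines manage their time in their own ways.

```python
def seconds_per_move(moves, minutes):
    """Average thinking time for a level of `moves` moves in `minutes` minutes."""
    return minutes * 60 / moves

print(seconds_per_move(60, 30))  # 30.0
print(seconds_per_move(40, 20))  # 30.0

def remaining_budget(moves, minutes, moves_done, seconds_used):
    """Average time left per move, assuming the rest is spread evenly."""
    return (minutes * 60 - seconds_used) / (moves - moves_done)

# Assumed example: 10 moves done at 60/30, but take-backs let the clock
# run for 600 seconds instead of the intended 300.
print(remaining_budget(60, 30, 10, 600))  # 24.0
```

In that assumed case the computer is down to 24 seconds per move, which is why a fixed 30-second level can be the more consistent choice on such machines.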

Yes, you correct with the human move on the left side of the table, which of course is the move from the game as it was originally played. I also rated the human game so that the computer can be compared against the human players.

Anyway it looks like you got it now since you just completed a test :)

ps. You must play all 5 test games in order to get a final rating that can be compared to other computers that were tested.

I have found in these tests that the Schroeder and King programs overperform a bit and some of the Lang and Spracklen programs underperform a bit. That is why I will create more test games in the near future to add to these 5 tests.

But I have also found that engines overall have shown very accurate ratings. The nice thing is that you can actually:

1) Compare all chess programs inside the same universe (any dedicated computer, DOS program, PC program, etc.).
2) Find clones or related programs.
3) Compare the strength improvement of the same program between playing the test at 30 seconds and at 3 minutes per move.
4) Test humans and compare them against computers.
5) Establish the grandmasters' playing strength.

There is so much I can do with these tests that no other tests can do :) Of course nothing is 100% accurate but at least everyone gets tested 100% the same way, which means everything is consistent.
Best regards
Nick
kgvetter
Member
Posts: 239
Joined: Sat May 12, 2012 5:22 pm

Post by kgvetter »

Hello Nick,

I am beginning to have real fun and enjoyment with your tests!
Now that I have managed to run Stockfish 5 64bit in the Mysticum environment with all four cores active, I ran your game 2 and came out with a combined rating of 3275.
I will perform the remaining tests soon and will keep you posted of the results.

Thanks for your great test-suite!

Gerhard
spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

kgvetter wrote:Hello Nick,

I am beginning to have real fun and enjoyment with your tests!
Now that I have managed to run Stockfish 5 64bit in the Mysticum environment with all four cores active, I ran your game 2 and came out with a combined rating of 3275.
I will perform the remaining tests soon and will keep you posted of the results.

Thanks for your great test-suite!

Gerhard
Hi Gerhard,

Great, I am looking forward to it. Hopefully very soon I will be able to let you all enter your results online, with a database of results kept so that everyone can download any or all of the results they want.

Alain has been helping with this. In fact, while I was creating the tests he has been creating the online version of it.

There is just a final piece that we are trying to solve: keeping computer entries consistent so that we don't have duplicated chess computer names in the test (in other words, a standard naming convention), plus a consistent time entry that is flexible enough to allow different levels to be tested.

Alain is waiting on me for some final input on this piece; quite soon thereafter it will be ready to give everyone access.

Best regards
Nick