Phoenix Systems Revelation Test Scores

spacious_mind
Senior Member
Posts: 4000
Joined: Wed Aug 01, 2007 10:20 pm
Location: Alabama

Post by spacious_mind »

Here are some test scores for Revelation. I have completed the Revelation engines; I have not yet started on the dedicated emulation programs.

Phoenix Systems Revelation

Image

Revelation has a 500 MHz ARM XSCALE 32 Bit Processor with 32 MB RAM.

Sorry about the bad pic. The lighting is bad in the room where I am playing.

COLOR KEY

Image


REVELATION TEST SCORES - 30 SECONDS AVERAGE TIME

Image

Deep Sjeng 3.0 performing best was a small surprise, since I expected either Hiarcs 13.3 or Shredder 12 to place first. Deep Sjeng 1.8 liked the tests as well and performed much better than expected, which resulted in Ruffian 2.1 finishing last.

Best regards
Nick
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

Hey Nick

your results for the Rev are running a solid 200 Pts lower than the SSDF:

39 Revelation Hiarcs 13.3 XScale 500 MHz 2775
42 Revelation Shredder 12 XScale 500 MHz 2706
44 Revelation Rybka 2.2 XScale 500 MHz 2628

Of Course your tests are at 30 Sec.Fixed ... while the SSDF plays full games at 40/2

Partial Explanation Regards
Steve

Post by spacious_mind »

Hi Steve,

If you take the per doubling factor of the individual programs that I posted in the ratings estimation post, then you get as follows:

Hiarcs 13.3 = 2.5*74=185 which equals 2587+185=ELO 2772
Shredder 12 = 2.5*97=243 which equals 2514+243=2757
Rybka 2.2 = 2.5*88=220 which equals 2507+220=2727
Deep Sjeng 3.0 = 2.5*41=103 which equals 2638+103=2741
Shredder 11.74 = 2.5*61=153 which equals 2582+153=2735

I would say that's pretty close to SSDF?
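
The arithmetic above can be reproduced with a short Python sketch. The names `ENGINES`, `project` and `round_half_up` are made up for illustration, and the 2.5 doublings is simply the figure used in this post (going from 30 s to 180 s per move is strictly log2(6), about 2.58 doublings). The ratings and per-doubling values are the ones quoted above.

```python
# Projection used in the post: rating at 30 s/move plus
# (doublings x Elo gained per doubling of thinking time).

DOUBLINGS = 2.5  # the post's figure; 30 s -> 180 s is log2(6), about 2.58

# name: (test rating at 30 s/move, Elo per doubling), as quoted in the post.
# 2587 with a factor of 74 is the Revelation Hiarcs 13.3 score from this thread.
ENGINES = {
    "Hiarcs 13.3": (2587, 74),
    "Shredder 12": (2514, 97),
    "Rybka 2.2": (2507, 88),
    "Deep Sjeng 3.0": (2638, 41),
    "Shredder 11.74": (2582, 61),
}


def round_half_up(x: float) -> int:
    """Round .5 upwards as the post does (built-in round() gives 242 for 242.5)."""
    return int(x + 0.5)


def project(rating: int, per_doubling: int, doublings: float = DOUBLINGS) -> int:
    """Estimated rating at the longer time control."""
    return rating + round_half_up(doublings * per_doubling)


for name, (rating, per_doubling) in ENGINES.items():
    print(f"{name}: {rating} -> {project(rating, per_doubling)}")
```

This is only a restatement of the estimate, not a validation of it; whether a fixed Elo gain per doubling holds across this range is exactly what is being debated in the thread.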

Best regards
Nick

Post by Steve B »

Hi Nick
I'm not sure you can do that
if you look at the Wiki ratings you never see a difference of 200 Pts between a computer's active rating (30 min) and tournament rating
actually the differences in rating are not all that great, which makes sense to me, with most computers having a higher active rating than tournament rating
if we used your formula for other dedicated computers I think the Wiki ratings would be way under yours at the tournament level

for example from your other thread:
http://hiarcs.net/forums/viewtopic.php?t=7308

you show a 30 sec rating for the Van (active) of 2155
I don't know what your speed doubling factor would come out to but I imagine it would be at least 50-70 Pts
using your formula you would get 60 (let's say, for discussion purposes) times 2.5 = 150, or 2305 ... way over the Wiki TM rating of 2163, which is lower than its Wiki active rating of 2173

perhaps the formula works better for the XScale processor than for the Motorola?

Wondering Regards
Steve
Last edited by Steve B on Sun Aug 30, 2015 8:12 pm, edited 1 time in total.

Post by spacious_mind »

Steve B wrote:
Hi Nick

not sure you can do that
if you look at the Wiki ratings you never see a difference of 200 Pts between a computers active rating(30 min) and tournament Rating
actually the differences in rating are not all that great which makes sense to me
if we used your formula for other dedicated computers i think the Wiki ratings would be way under yours at the Tournament level

Cautious Regards
Steve
Hi Steve,

You asked about the difference between SSDF and how I rated the Revelations. I just showed you that it is not unreasonable. There is a huge difference between modern engines with sufficient RAM and ROM and old dedicated computers.

Old dedicated computers are maxed out in their intelligence and hardware, so I would never compare them as I did with Resurrection, Revelation, Gavon & Gavon 2.

It would be difficult to compare a 68000 Almeria at 2000 ELO with a calculation like this, as it would add 947 ELO at a value of 70 per doubling if you compared it to CCRL's hardware. That would probably be very crazy. I have, however, tried to show everyone that every single program has a different doubling factor. The same would apply to Almeria if it were to run on Gavon 2. Actually that would be a nice comparison, since we know its performance at 68000, 68020 and Revelation.

Besides, it is all relative. If you take CCRL 40/40, the difference in ELO between Stockfish 6 and Hiarcs 13.2 (which is close enough to 13.3, since 13.3 was created to work with Revelation) is 332 points, with exactly the same hardware for both. In my test Gavon 2 is 1.39 times faster than Revelation and the difference is 464 points. Divide 464 points by 1.39 and you get 333.8 points, which again is very close to the 332 points difference between the programs as shown in CCRL 40/40!

My preference is CCRL 40/40 for comparison because of the simple fact that they have 600,000+ games on exact same hardware compared to 136,000+ SSDF games on different hardware and their opponent variety is also greater.

Best regards
Nick

Post by Steve B »

spacious_mind wrote:

It would be difficult to compare a 68000 Almeria at 2000 elo with a calculation like this as it would add 947 ELO with a factor of 70 to it if you compared it to CCRL's hardware. That would probably be very crazy.

That makes Sense
although the fact remains that under your methodology the Rev's Active ratings seem to be significantly lower than its TM ratings
to be honest I don't know the differences in rating strength PC engines exhibit between Active and TM levels
is it common for PC engines' Active ratings to be well below their TM ratings? we don't see that in the older dedicated computers


Thanks Regards
Steve

Post by spacious_mind »

No, I think their ratings mostly make sense, with some exceptions like Deep Sjeng 1.8 and of course dedicated exceptions such as the Spracklen, Lang and Kittinger programs performing so badly in certain games. That suggests I will have to add more test games at some point.

Many of the Revelation programs showed some major weakness in test games 4 and 5. However, Hiarcs remained pretty consistent. You can see the performance improvement between Gavon and Gavon 2 and also between Resurrection & Revelation (other than Ruffian). They all trended upwards by quite a lot. Therefore I feel pretty sure, since they are not limited by RAM, that the 2.5 doublings in time on Revelation would make up a lot of the difference between my Active rating and the SSDF tournament rating.

You just have to compare the performance growth from Resurrection to Revelation and have some faith :P Hallelujah !!

Best regards
Nick

Post by spacious_mind »

I just looked at the difference between CCRL 40/40 Stockfish 6 and Hiarcs 13.2, and the difference between Gavon Stockfish 6 and Revelation Hiarcs 13.3. Gavon is most closely related to Revelation and only 35% faster, so this is a pretty accurate comparison.

CCRL Stockfish 6 = ELO 3237
CCRL Hiarcs 13.2 = ELO 2905

= Difference ELO 332

Gavon Stockfish 6 Test Score = 2947
Revelation Hiarcs 13.3 Test Score = 2587

= Difference ELO 360

Difference between Revelation & Gavon = Gavon 35% faster. Revelation Hiarcs 13.3 doubling factor was 74 points, and 74 * 35% = 26

360 minus 26 = 334

With this adjustment for hardware I show a difference of 334 ELO compared to CCRL's 332!

With this in mind I will stand behind the ratings for Revelation.
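
For anyone who wants to redo the comparison with other numbers, here is a small Python sketch of it. The function names are invented for this example. `linear_share` follows the post (doubling factor times the 35% speed advantage); `log_share` is an alternative added here as an assumption, treating the speed advantage as a fraction of a doubling via log2, and is not something the post uses.

```python
import math

# Figures quoted in the post
CCRL_GAP = 3237 - 2905   # Stockfish 6 vs Hiarcs 13.2 on identical hardware
TEST_GAP = 2947 - 2587   # Gavon Stockfish 6 vs Revelation Hiarcs 13.3
PER_DOUBLING = 74        # Revelation Hiarcs 13.3 Elo per doubling
SPEEDUP = 1.35           # Gavon is 35% faster than Revelation


def linear_share(per_doubling: float, speedup: float) -> float:
    """The post's adjustment: doubling factor times the fractional speed gain."""
    return per_doubling * (speedup - 1.0)


def log_share(per_doubling: float, speedup: float) -> float:
    """Alternative (an assumption, not from the post): speed gain as log2 doublings."""
    return per_doubling * math.log2(speedup)


adjusted_linear = TEST_GAP - round(linear_share(PER_DOUBLING, SPEEDUP))
adjusted_log = TEST_GAP - round(log_share(PER_DOUBLING, SPEEDUP))

print("CCRL gap:", CCRL_GAP)
print("Test gap, linear hardware adjustment:", adjusted_linear)
print("Test gap, log2 hardware adjustment:", adjusted_log)
```

Both adjustments land within a few points of the CCRL gap, so the conclusion does not hinge on which of the two is used.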

Best regards
Nick
Shaun
Member
Posts: 32
Joined: Sat Nov 30, 2013 8:25 am
Location: Near Brum-in-the-smog, England

Post by Shaun »

Hi. I was just wondering whether you were planning to run a similar exercise for the Rev 2. I played 5 games - Rev 2 (Hiarcs 2900 ELO) vs. Fritz 11 (3076 ELO) recently. 30 mins each. Fritz was noticeably faster but the result was a creditable one win each with three draws, and incidentally some terrific 'no-holds-barred' chess, observed by a mere ~1800 human patzer (me)!

Enjoying the post, many thanks

Post by spacious_mind »

Steve B wrote:
you show a 30 sec rating for the Van (active) of 2155
dont know what your speed doubling factor would come out to but i imagine it would be at least 50-70 Pts
using your formula you would get 60(lets say for discussion purposes) times 2.5=150 or 2305...way over the Wiki TM rating of 2163 which is lower then its Wiki active rating of 2173

perhaps the formula works better for the XScale processor rather then the Motorola?

Wondering Regards
Steve
Hi Steve,

To provide a little more clarity on your observation above: at Schachcomputer.info, SSDF and CCRL you have these rating lists:

Info Active list + Tournament List - mixed hardware
CCRL 40/4 list + 40/40 List. - same hardware
SSDF 2hrs/40 List - some mixed hardware

As a result you see for example the following:

40/40 List - Stockfish 6 64-Bit 4CPU ELO 3310 (weaker than 40/4?)
40/4 List - Stockfish 6 64-Bit 4CPU ELO 3386
Active Genius 68030 - 2335 ELO
Tournament Genius 68030 - 2278 ELO (Weaker than active?)

What this tells you at CCRL is that Stockfish 6 is, comparatively speaking, a little stronger relative to its opponents (assuming they are all the same opponents) at 40/4 than when it plays at 1 minute per move. What it doesn't show you, however, is the reality that if a program at 40/40 were to play the same program at 40/4, you would see a considerable loss for the 40/4 side.

The same applies to Info's lists. They don't show the relative rating loss between 60/30 and 2hr/40, but what they do show well is the relative strength difference between the same program, like Genius, playing on different hardware and speeds, i.e. within the active list universe or the tournament list universe (but not the combined universe!).

In this regard SSDF's method would be better, because they do test, to some degree, program performance across different hardware. But since these games are played at the long time setting of 2hrs/40, SSDF in my opinion lacks enough opponents (not enough testers), and the large number of games played between the same two programs creates, again in my opinion, a kind of tunnel vision where sufficient variety of opponents is lacking, especially since the huge majority of their tests are centered around Chessbase programs and most everything else is not considered.

The tests that I am doing, albeit not perfect because there are too few of them, do however allow you to play whatever level and whatever hardware you want to compare, and in the final results you should see the outcome of these settings compared within the same universe.

I guess you can say that everyone has a different methodology :)

Best regards
Nick

Post by Steve B »

Thanks Nick

not familiar with CCRL or any PC engines
I am just trying to get my head around why the Rev engines are significantly weaker at Active (according to your methodology) than they are at Tournament controls
again..we don't see this on any of the dedicated computers regardless of hardware
you replied to my question about the modern pc engines and hardware and you said there is also no such disparity in the rating
so this would mean that this anomaly only occurs for the Rev due to its particular hardware specs (processor, RAM, etc.)


I guess the only way to verify your active ratings would be to do some matches at 30 sec per move

I did conduct a match(and posted it here) with the REV Hiarcs when it was first released in 2011
10 games vs Rev Rybka
1 min time control
Hiarcs trounced Rybka 8.5-1.5 turning in a performance rating of over 2900
even I think that's a bit too high
:P
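
As a rough cross-check of that figure, here is a Python sketch of the usual logistic Elo performance estimate. It is only an assumption that this is how the number was worked out, and the opponent rating of 2628 is the SSDF tournament rating for Revelation Rybka 2.2 quoted earlier in the thread, not a rating at the 1 min time control. The function name is made up for this example.

```python
import math


def performance_rating(opponent_rating: float, score: float, games: int) -> float:
    """Logistic Elo estimate: opponent rating plus the gap implied by the score."""
    p = score / games
    if not 0.0 < p < 1.0:
        raise ValueError("undefined for 0% or 100% scores")
    return opponent_rating - 400.0 * math.log10(1.0 / p - 1.0)


# 8.5 out of 10 against an opponent assumed to be rated 2628
print(round(performance_rating(2628, 8.5, 10)))
```

With those inputs the estimate comes out a little above 2900. Ten games is a very small sample, so the error bars on such a figure are wide.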

anyway....I remember not long ago .INFO did include the Phoenix engines in their ratings lists but after a long debate they removed them
do you have any active ratings for the REV (Res II) engines from those lists before they were removed?

Archival Regards
Steve

Post by spacious_mind »

No, I don't have any Revelation games, but bear with me. I am trying to complete the Gavon 2 tests, followed by some Mysticum programs and Palm & Pocket PC, and then I will do a tournament with all these programs, which should give us the ratings at active level when done. But that won't start until later this year or early next year.

My rating test could be repeated for Revelation Hiarcs at 3 minutes per move, but I am dreading doing it as it takes 6 times longer than doing it at 30 seconds per move :)

Best regards
Nick

Post by spacious_mind »

Well based on Steve's concerns about the rating accuracy I decided to also test Revelation Hiarcs 13.3 with a time setting of 180 seconds fixed in order to compare the rating with SSDF where Revelation Hiarcs 13.3 at the setting of 2hrs/40 moves rates as ELO 2775.

http://www.ssdf.bosjo.net/list.htm

REVELATION RATING TEST RESULTS INCLUDING HIARCS 13.3 AT 180S/MOVE

Image

My test scored 21 ELO points less than the rating at SSDF. I am also glad I did the test, not only to satisfy my curiosity but also because I spotted an error with Hiarcs 13.3 playing at 30S fixed time. Previously I had shown Hiarcs 13.3 at 2587 ELO, but it seems that when transferring the final rating score to the game comparison list I had not carried over the last two moves of test game #3. With this adjustment Hiarcs 13.3 at 30 seconds per move fixed time in fact scores 2602 ELO.

That was a pain because I then had to check all other 170+ computer scores over the 5 tests to make sure that I did not do something similar in other tests. Well I was lucky as this was the only transfer error I found.

TOP 30 RANKING LIST

Image

Revelation Hiarcs 13.3 at 3 minutes per move currently ranks 18th on the list. Revelation Hiarcs 13.3 at 30 seconds per move ranks 27th on the list.

I guess it would be interesting to do the other Revelation programs at 3 minutes per move as well. But that would take me a long time, so I will pass on that for now.

SPEED DOUBLING FACTOR

Image

As a result of the Hiarcs 13.3 30 seconds per move fixed time rating improving to ELO 2602, the Hiarcs speed doubling factor changed as well, to 70 ELO per doubling, as seen in the above chart.

I find the above chart interesting as it does provide a good guideline. For example:

Hiarcs 13.3 - 30S/move = ELO 2602 + (70*2.5) = ELO 2777 for 180S/move

The rating test scored ELO 2754, which means the difference between using the factor and the actual rating test is 23 ELO points, which I think is also quite close as a guide.
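
The same check can be written as a few lines of Python, which also work the numbers backwards to see what gain per doubling the two measured ratings imply. The variable names are made up for this sketch, and the implied values are plain arithmetic on the two test scores, not figures from the chart.

```python
import math

RATING_30S = 2602    # Hiarcs 13.3 test score at 30 s/move
RATING_180S = 2754   # Hiarcs 13.3 test score at 180 s/move
CHART_FACTOR = 70    # Elo per doubling from the chart
DOUBLINGS_USED = 2.5                   # figure used in the post
DOUBLINGS_EXACT = math.log2(180 / 30)  # about 2.58

# Forward: project the 180 s rating from the 30 s rating
projected = RATING_30S + CHART_FACTOR * DOUBLINGS_USED
error = projected - RATING_180S

# Backward: Elo per doubling implied by the two measured ratings
measured_gain = RATING_180S - RATING_30S
implied_factor = measured_gain / DOUBLINGS_USED
implied_factor_exact = measured_gain / DOUBLINGS_EXACT

print(f"projected {projected:.0f}, measured {RATING_180S}, error {error:.0f}")
print(f"implied Elo per doubling: {implied_factor:.1f} (2.5 doublings), "
      f"{implied_factor_exact:.1f} (log2(6) doublings)")
```

The backward calculation gives a somewhat smaller gain per doubling than the chart value, which is another way of stating the 23 point gap.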

Best regards
Nick

Post by spacious_mind »

I had time to add 4 more Revelation Emulation programs to the tests:

Revelation Vancouver - ELO 2209
Revelation Polgar - ELO 2256
Revelation Diablo Select 3 - ELO 2116
Revelation Glasgow - ELO 2013

Not surprising really, since Lang programs have problems scoring well in test #4. Ed Schroeder's Polgar scored best out of the above programs.

REVELATION RATING TEST RESULTS

Image

Best regards
Nick

Post by spacious_mind »

Had time to add Revelation London to the tests. London with Active setting scored better than Vancouver and finished with an ELO score of 2240. The table in the previous post is updated to include London.
Nick