Human game Appleshampogal vs Novag Obsidian

This forum is for general discussions and questions, including Collectors Corner and anything to do with Computer chess.

Moderators: Harvey Williamson, Steve B, Watchman

Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

Monsieur Plastique wrote:

Hi Steve,

The only conclusion I really drew from that match is that the Mark VI defeated the Gameboy. I did not draw a conclusion as to which computer was actually stronger, even though others might have. I have plenty of matches in my database where a weaker machine soundly defeats a stronger one, and the SSDF databases have many examples as well. I also have examples in various 10-game encounters where the same machine can go down 8 to 2 against itself and then the next time win 7 - 3. It is my experience that even the lowest level of statistical reliability is only achieved after around 40 games and at least 5 different opponents whose own ratings are as close as possible to the test machine's estimated rating. My concern is thus that, given there is a very good chance the score will not be 5-all, a faulty conclusion will be reached.

Even after 100 games I consider a computer rating to be questionable though in general things tend to stabilise around the 40 game mark unless unsuitable opponents are used.

However since I don't own an Obsidian I do not need to spend time playing the matches but again, I just cannot see it being possible to draw a conclusion on such a small sample. It might be of interest, but I can't see it providing anything definitive. In the end, I will actually be surprised if in the test positions there is any significant variation between the old and new machines unless there has actually been a significant hardware change that Novag did not make public.
Well we will have to agree to disagree
firstly.. I have no idea what is learned when you have the same dedicated unit play against itself..you are basically playing without pondering


Secondly..I have played several 10 game matches over the years ..both online and offline ....both here and on the CCC .. and where the score was lopsided the stronger rated computer always won
never once did I have the opposite occur
I would agree that a close score...6-4 or 5½-4½ would tell us very little
but a wipe out score would be quite revealing
in my experience if one computer wins 8-2 or 9-1 there is very little chance the other computer will come back and reverse that score
it's theoretically possible I suppose but remote
it might better its score in the next 10 games but not with a lopsided win
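
As a rough check on the lopsided-score argument, here is a minimal sketch of how often a 10-game match between two equally strong machines would end 8-2 or better purely by chance. It assumes no draws and independent games, which are simplifications of mine and not something measured in this thread; `prob_at_least` is a hypothetical helper name.

```python
from math import comb


def prob_at_least(wins: int, games: int, p: float = 0.5) -> float:
    """Probability of at least `wins` wins in `games` independent games.

    Assumes every game is decisive (no draws) and that the per-game
    win probability `p` is constant across the match.
    """
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# Chance that one named machine scores 8-2 or better against an equal opponent.
one_sided = prob_at_least(8, 10)

# Chance that either machine does so (the two events cannot both happen).
either_side = 2 * one_sided

print(f"one named machine wins 8-2 or better: {one_sided:.4f}")
print(f"either machine wins 8-2 or better:    {either_side:.4f}")
```

Under these assumptions a wipe-out between equal machines is uncommon but not negligible. Draws would generally make such a score rarer still, so the figure is best read as a loose upper bound, not a prediction.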

anyway...if there are no differences in the moves from some test positions then the whole exercise is moot

the point of a match between the two computers is also to have some fun playing the two machines against each other ..always looking for a reason for some friendly competition


Possible But Remote Regards
Steve
fourthirty
Full Member
Posts: 763
Joined: Fri Dec 06, 2013 8:46 pm
Location: San Francisco

Post by fourthirty »

Steve B wrote:
fourthirty wrote: Steve,

212636 has the small black date code sticker on the box and unit.
202541 has the small black date code sticker on the unit (unit was purchased without the box)

My inferior Obsidian may actually play a stronger overall game?

Cue the Rocky music regards...
Greg
Thanks for Info
interesting that your second unit has the sticker on it
seems the original owner knew enough to remove the sticker from the box and place it on the unit ...Steve
Steve,

To clarify, BOTH Obsidian units have the date sticker on the back.

Double date regards,
Greg
fourthirty
Full Member
Posts: 763
Joined: Fri Dec 06, 2013 8:46 pm
Location: San Francisco

Post by fourthirty »

appleshampogal wrote: Hi Greg,

Unfortunately, the gentleman I bought it from sold it to me with just the carrying case. He DID however indicate that he bought his "new" and that the only reason he was selling it was because he didn't use it enough. So I can't imagine this unit is very old at all.
Thanks Kat. Interesting that the 100,000 series units don't seem to have the date code on the back.
Monsieur Plastique
Senior Member
Posts: 1014
Joined: Thu Jul 03, 2008 9:53 am
Location: On top of a hill in eastern Australia

Post by Monsieur Plastique »

Steve B wrote:I have no idea what is learned when you have the same dedicated unit play against itself..you are basically playing without pondering
I was referring to two actual units against each other - not the same actual machine against itself, so pondering is on if the feature is available.

Steve B wrote:Secondly..I have played several 10 game matches over the years ..both online and offline ....both here and on the CCC .. and where the score was lopsided the stronger rated computer always won

It would take a very long time indeed to go through the entire SSDF database to give examples of matches providing a result contrary to rating expectation, but here are a few examples that caught my eye after a very quick perusal. These are examples where the lower rated machine actually won. There are a number of shorter encounters I saw but these are the longer ones:

Novag Super Expert C (1861) vs Milano (1866) result: 11.5 - 8.5
Novag Super Expert C (1861) vs Polgar 10 Mhz (1942) result: 16 - 15
Mephisto Roma 68000 (1870) vs Polgar 5 Mhz (1871) result: 14 - 8
GK2100 (1879) vs Portorose 68000 (1945) result: 11.5 - 8.5
Novag Sapphire (1990) vs Atlanta (1981) result: 9 - 11
Novag Sapphire (1990) vs Mach IV (1975) result: 16.5 - 21.5
Mephisto Berlin (2014) vs Portorose 68000 (1945) result: 9.5 - 10.5
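
To put numbers on "contrary to rating expectation", here is a minimal sketch that compares each result above with what the standard Elo expected-score formula would predict from the quoted ratings. Using that formula here is my assumption (the SSDF list may be computed differently), and `expected_score` and `matches` are illustrative names.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score per game for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# (machine A, rating A, machine B, rating B, score A, score B), as quoted above.
matches = [
    ("Novag Super Expert C", 1861, "Milano", 1866, 11.5, 8.5),
    ("Novag Super Expert C", 1861, "Polgar 10 Mhz", 1942, 16.0, 15.0),
    ("Mephisto Roma 68000", 1870, "Polgar 5 Mhz", 1871, 14.0, 8.0),
    ("GK2100", 1879, "Portorose 68000", 1945, 11.5, 8.5),
    ("Novag Sapphire", 1990, "Atlanta", 1981, 9.0, 11.0),
    ("Novag Sapphire", 1990, "Mach IV", 1975, 16.5, 21.5),
    ("Mephisto Berlin", 2014, "Portorose 68000", 1945, 9.5, 10.5),
]

for name_a, rating_a, name_b, rating_b, score_a, score_b in matches:
    games = score_a + score_b
    predicted = expected_score(rating_a, rating_b) * games
    print(
        f"{name_a} vs {name_b}: expected {predicted:.1f} of {games:g}, "
        f"actual {score_a:g}"
    )
```

The gap between the expected and actual columns gives a sense of how far a single match of 20 to 40 games can drift from the rating list.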

I also noticed quite a few results where the match result was a draw but there was a significant rating disparity. And of course an even larger number of encounters where despite the higher rated machine being victorious, it did not perform at all close to its actual rating (not that any of us should be too surprised at that).

It would actually be an interesting exercise to go through every listed match on a machine by machine basis and pull out all the anomalies, but as I say, unless I could get hold of the database in a spreadsheet it would be a huge job.

I also have many more examples in my private database of results obtained contrary to rating, though in the end of course the ratings are earned across multiple opponents and a large number of games and are therefore fairly reliable nonetheless.
Chess is like painting the Mona Lisa whilst walking through a minefield.
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

The situation I am addressing is if we have a lopsided score between the 2 computers
I don't see that in your list... but let's not beat this to death

I for one would be interested in seeing a match between the two versions if we see they reply differently to some test positions

if the match score was close then I would come away with the impression that the programs were about the same in strength and nothing significantly changed in the program's overall playing ability
if the match score was lopsided I would come away with the impression that the winning version was stronger
further games could always be played to confirm this

Simple As That Regards
Steve
fourthirty
Full Member
Posts: 763
Joined: Fri Dec 06, 2013 8:46 pm
Location: San Francisco

Post by fourthirty »

appleshampogal wrote:I like this idea. Of course, now there is the task of figuring out what the test positions should be!

Chesscomputeruk has a magazine with 20 chess problems for dedicated chess computers with instructions for evaluating the results which *may* be helpful. I definitely want input. Otherwise, I might suggest using a couple dozen positions from past games if that seems advisable.
I'm game also. Due to my travel schedule there may be some delays, but I'd be happy to put the "inferior" Obsidians to work!

Greg
appleshampogal
Member
Posts: 126
Joined: Sun Jan 12, 2014 8:53 am

Post by appleshampogal »

I pitted Jill against Ada, who defeated the Mephisto Milano. I thought a match between them would be insightful. The result is two draws thus far. I also pitted Jill against Agent Cassidy Adams (MCC), and while the game is still in the opening phase, Jill has already won a pawn. The evaluation favors Jill by +1.0, but we'll see how it goes. Jill opened with f4, which already speaks to variety, so it should be interesting. I will also see about some test positions.
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

Monsieur Plastique wrote:
select just two or three test positions then set the machines to a fixed ply level (say 6 for a complicated middlegame or 9 for an endgame with just minor pieces and pawns)

And the fixed ply Novag levels are the same as the fixed time levels in that they use all the proper "tournament play" heuristics, etc, but of course we don't want to use fixed time levels because we want to see if the two versions take differing amounts of time to go through the same computations.

If the moves are different then that would be a strong indication the program has changed. Timings might vary by a very small percentage but this is typical of batch to batch / machine to machine variances (say up to a second or three).
Following Jon's Suggestions I gave the Superior(?) Obsidian three middle game positions
The positions were all selected from the games in Spacious Mind's now cult classic thread on rating the chess computers:
http://hiarcs.net/forums/viewtopic.php? ... sc&start=0

each position offered several reasonable move choices
there are no forced wins or winning tactical shots or forced mates
basically just solid positional play

i set the level to FIXED DEPTH 7 PLY(level FD7)
I recorded the move selected...time ..and Eval

POSITION 1
Botvinnik, M. vs Grob, H. , Zurich 1956
[fen]rnb1kb1r/1p3p1p/pq2pp2/2pP4/4P3/2N2N2/PP3PPP/R2QKB1R w KQkq - 0 10[/fen]
WHITE TO MOVE

Move- Qc2
Time- 3 Min. 43 Sec.
Eval- +.25




POSITION 2
Unzicker, W. vs Sanchez, M. , Saltsjobaden, 1952
[fen]r1r3k1/2qbbppp/p1np1n2/4p3/Pp1PP3/4NN1P/1PB2PP1/R1BQR1K1 w - - 0 17[/fen]
WHITE TO MOVE

Move- b3
Time- 2 Min. 29 Sec.
Eval- +.04



POSITION 3
Mangini, A. vs Kotov, A. , Mar del Plata, 1957
[fen]r2q1rk1/pp4pp/2nbb3/4pp2/2B5/2P1BQ2/PP1N1PPP/R4RK1 b - - 0 14[/fen]
BLACK TO MOVE

Move- Qe7
Time- 3 Min. 01 Sec.
Eval- +.04
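
For anyone entering these positions by hand, here is a minimal sketch that unpacks the three FEN strings above and checks that each rank adds up to eight squares and that the side to move agrees with the labels. `parse_fen` is a hypothetical helper written for this illustration; it does not check move legality.

```python
def parse_fen(fen: str) -> dict:
    """Split a FEN string into its six fields and expand the board ranks."""
    placement, side, castling, en_passant, halfmove, fullmove = fen.split()
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")

    board = []
    for rank in ranks:
        squares = []
        for ch in rank:
            if ch.isdigit():
                squares.extend("." * int(ch))  # a digit is a run of empty squares
            elif ch in "prnbqkPRNBQK":
                squares.append(ch)
            else:
                raise ValueError(f"unexpected character {ch!r} in rank {rank!r}")
        if len(squares) != 8:
            raise ValueError(f"rank {rank!r} does not cover 8 squares")
        board.append("".join(squares))

    return {
        "board": board,  # rank 8 first, rank 1 last
        "side_to_move": side,
        "castling": castling,
        "en_passant": en_passant,
        "halfmove_clock": int(halfmove),
        "fullmove_number": int(fullmove),
    }


positions = {
    1: "rnb1kb1r/1p3p1p/pq2pp2/2pP4/4P3/2N2N2/PP3PPP/R2QKB1R w KQkq - 0 10",
    2: "r1r3k1/2qbbppp/p1np1n2/4p3/Pp1PP3/4NN1P/1PB2PP1/R1BQR1K1 w - - 0 17",
    3: "r2q1rk1/pp4pp/2nbb3/4pp2/2B5/2P1BQ2/PP1N1PPP/R4RK1 b - - 0 14",
}

for number, fen in positions.items():
    parsed = parse_fen(fen)
    print(f"POSITION {number}: {parsed['side_to_move']} to move")
    print("\n".join(parsed["board"]))
```

Printing the expanded board rank by rank makes it easier to compare against the pieces on the unit before starting the search.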

Let's Rock Regards
Steve
Cyberchess
Full Member
Posts: 658
Joined: Wed Jan 08, 2014 6:10 pm

Post by Cyberchess »

:) The Cyberator approves of your methodology, Steve.

GM Alexander Kotov achieved a promising position against Mangini with the black pieces in Position #3.

After black’s last [1.)…. Qe7], white needs to take measures against 2.)…. f4.

Everybody in the whole cellblock….
Was dancin’ to the jailhouse rock regards,

John
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

Speaking Of The Cyberator...
The Ides Of October are upon us
;)


Oktoberfest Regards
Steve
appleshampogal
Member
Posts: 126
Joined: Sun Jan 12, 2014 8:53 am

Post by appleshampogal »

I will also test these on the aforementioned settings when I get home! I will post the results soon.
appleshampogal
Member
Posts: 126
Joined: Sun Jan 12, 2014 8:53 am

Post by appleshampogal »

Here are the results!



Setting Ply Depth 7 (FD7)

Position 1

WHITE TO MOVE

Move- Qc2
Time- 3 Min. 43 Sec.
Eval- +.25

Matched





Position 2

WHITE TO MOVE

Move- b3
Time- 2 Min. 29 Sec.
Eval- +.04

Matched



Position 3

BLACK TO MOVE

Move- Qe7
Time- 2 Min. 43 Sec.
Eval- +.04

Time Deviation





Given the last bit, I think some more positions are in order to determine this definitively, but this is a good start. I can figure out some as well and post them here. Stay tuned!
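
The comparison can also be written down as a small script. This is a minimal sketch using only the figures posted in this thread for the FD7 level; the 3-second tolerance is my reading of the "up to a second or three" remark quoted earlier, and `Result`, `mmss` and `compare` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    move: str
    seconds: int       # thinking time in seconds
    evaluation: float  # score shown by the unit, in pawns


def mmss(minutes: int, seconds: int) -> int:
    """Convert a 'Min. Sec.' reading into seconds."""
    return minutes * 60 + seconds


# Figures as posted for the three test positions.
steve = {
    1: Result("Qc2", mmss(3, 43), 0.25),
    2: Result("b3", mmss(2, 29), 0.04),
    3: Result("Qe7", mmss(3, 1), 0.04),
}
kat = {
    1: Result("Qc2", mmss(3, 43), 0.25),
    2: Result("b3", mmss(2, 29), 0.04),
    3: Result("Qe7", mmss(2, 43), 0.04),
}

TOLERANCE_SECONDS = 3  # assumed unit-to-unit variance


def compare(a: dict, b: dict, tolerance: int) -> dict:
    """Compare two sets of results position by position."""
    report = {}
    for position in sorted(a):
        ra, rb = a[position], b[position]
        delta = abs(ra.seconds - rb.seconds)
        report[position] = {
            "same_move": ra.move == rb.move,
            "same_eval": ra.evaluation == rb.evaluation,
            "time_delta": delta,
            "within_tolerance": delta <= tolerance,
        }
    return report


report = compare(steve, kat, TOLERANCE_SECONDS)
for position, row in report.items():
    print(position, row)
```

On the posted figures, all three moves and evaluations agree, and only Position 3 differs in time, by 18 seconds.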
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

appleshampogal wrote:
Position 3

BLACK TO MOVE

Move-Qe7
Time-2 Min. 43 Sec.
Eval-+.04

Time Deviation


Given the last bit, I think some more positions are in order to determine this definitively, but this is a good start. I can figure out some as well and post them here. Stay tuned!
re-played position 3 to Double Check my time
remains at 3 Min. 01 Sec.

We are both using the Superior(?) Obsidian WITH NEXT BEST and more varied book
we need someone with the inferior(?) Obsidian to post results

Treading Water Regards
Steve
appleshampogal
Member
Posts: 126
Joined: Sun Jan 12, 2014 8:53 am

Post by appleshampogal »

Oh okay! I see now. I forgot you mentioned you also had a similar unit. I am interested to see those who have the other model of Obsidian come forward with their results. I have been corresponding with Jonathan (MonsieurPlastique) and showing him some recent games from my Obsidian. He does concur it plays in a way that deviates from the other inferior Obsidians.

Curious,

Kat
Steve B
Site Admin
Posts: 10140
Joined: Sun Jul 29, 2007 10:02 am
Location: New York City USofA

Post by Steve B »

appleshampogal wrote:
I have been corresponding with Jonathan (MonsieurPlastique) and showing him some recent games from my Obsidian. He does concur it plays in a way that deviates from the other inferior Obsidians.

Curious,

Kat
Hmmm
my theory..stated earlier in this thread (and the reason we are testing positions) was that perhaps the "Inferiors" had Next Best deleted and the book shortened..in order to make room for changes in the program and perhaps they actually Enhanced the playing strength
so we have to see what "deviates" means.. do they play stronger or weaker?

i doubt the Obsidians with Next Best and larger book ...also play stronger
i see only two options here...
Inferiors and Superiors are the same in strength
OR
Inferiors actually play a stronger game

But Until we can get some "Inferiors" to show their results of the positions we are still

In The Dark Regards
Steve