Posted by spacious_mind on 08.01.2019, 01:28
Re: Thoughts on Chess Computer Performance

There is a book that I played through a thousand times when I was young and still enjoy today. It was written in the late 1960s and is called Solitaire Chess.

https://www.amazon.com/Solitaire-Che.../dp/4871878236

This one was written by Al Horowitz, but there are others built on the same concept that are fun for humans to play when they are at home alone. It has a points rating system, but it is very basic.

Peter mentioned Yagaz. His ideas are similar and probably have the same roots, but they also have the same problems today as the book example I just gave.

The problem with these books today is that if you run the games through a modern computer, you see that the people who wrote them were missing something important.

1) They did not allow for the hundreds of alternative, reasonably good moves. They gave scoring points for one or two moves and that is it.

2) In many cases the points they gave look wrong when you study them today, because they missed better moves, which they scored at zero points.

So although these books are great, they had gaps in knowledge and research. It did not matter whether a Grandmaster helped to create them. Modern analysis will show the mistakes made.

But I loved these books and the concept behind them. So in 2014, when I created these tests, I decided to use modern technology to help me evaluate everything: every possible move, move by move, for any game.

But to start I had to sit back and think through the exact same questions that are being asked today, because I knew that regardless of what is done, people will find a way to criticize its accuracy, the choice of game, etc.

So before I even started working on the first test, I made a small list of problems:

1) What games do I pick
2) How do I know that the evaluations are correct
3) How do I evaluate them
4) etc...

And when I started making these notes, which were just like the comments I am reading here in the posts, a light bulb went on and it said: this is crazy!

1) Why am I worried about which game to pick?
2) Why am I worried about the absolute exactness of the evaluation?

Well, for 1) and 2) I realized that my questions, and worrying about them, are absolutely crazy and totally irrelevant if I plan to evaluate every single possible move.

So the question is why is it crazy?

Well, let's start with the obvious. In 2014 Stockfish 8 or 9 was the strongest program, I can't remember which. It sits at 3400 ELO and is about 600 ELO better than the best human player in existence. So why would I be worried about it evaluating a move here and there perhaps differently from Houdini or Komodo? Who cares, and what difference does it make?

Let me explain the "Who cares" comment. I will put it in bold again and perhaps the Light will shine for more people. Here is the Light:

I AM EVALUATING EVERY POSSIBLE MOVE WITHIN EVERY POSSIBLE MOVE IN A GAME AND SCORING IT!

That is the Light that has to first shine before the game rating system can be understood. So what does that mean?

Let's say the best move within a given tolerance gets 30 points, the next move, which is almost as good, gets 29.4 points, and so on down to zero. What difference does it make to the final score whether I used Stockfish, Komodo, Houdini or LC0? The answer is almost "ZERO DIFFERENCE".
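
To make the idea concrete, here is a minimal Python sketch of scoring every candidate move in one position on a 30-point scale. It is not my actual spreadsheet formula: the linear scale, the 5 centipawn tolerance, the 300 centipawn cutoff and the example evaluations are all assumptions chosen only so the numbers resemble the 30 / 29.4 / ... / 0 example above.

```python
def score_move(best_eval_cp, move_eval_cp, max_points=30.0,
               tolerance_cp=5, cutoff_cp=300):
    """Map an engine evaluation (in centipawns) to points.

    Assumed, hypothetical scheme: moves within `tolerance_cp` of the best
    move get full points, moves `cutoff_cp` or more worse get zero, and
    everything in between is scaled linearly.
    """
    loss = best_eval_cp - move_eval_cp
    if loss <= tolerance_cp:
        return max_points
    if loss >= cutoff_cp:
        return 0.0
    return round(max_points * (1 - loss / cutoff_cp), 1)


def score_position(evals_cp, **kwargs):
    """Score every evaluated move in one position against the best one."""
    best = max(evals_cp.values())
    return {move: score_move(best, ev, **kwargs)
            for move, ev in evals_cp.items()}


# Made-up evaluations for four candidate moves (not real engine output).
example_evals = {"e4": 35, "d4": 29, "Nf3": 20, "h4": -280}
points = score_position(example_evals)
for move, pts in points.items():
    print(move, pts)
```

Because every move in the position gets a score, a small disagreement between two engines about one move only shifts that move's points slightly instead of flipping it between "full points" and "zero".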

Why? Because if Komodo evaluated one move here or there better than Stockfish, and vice versa, then over the 238 moves that are being evaluated the differences probably cancel each other out. So let's say that over these 5 games there is a difference of 30 points between the best programs out of a total of 6780 points. What is 30/6780 as a percentage? It is 0.44%. What if the difference is 100 points? Then it is about 1.5%. In a rating system, 1.5% amounts to roughly +/- 15 ELO on a 2000 ELO computer.
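
The arithmetic is easy to recheck with a few lines of Python. The 6780-point total and the 2000 ELO reference come from the text above; reading the +/- figure as half of the percentage applied to the rating is an assumption made for this sketch, and the function names are made up.

```python
TOTAL_POINTS = 6780   # total points available over the 5 games
BASE_ELO = 2000       # reference rating used in the example


def percent_of_total(point_diff, total=TOTAL_POINTS):
    """Point difference between two engines as a percentage of the total."""
    return 100.0 * point_diff / total


def elo_plus_minus(point_diff, base_elo=BASE_ELO, total=TOTAL_POINTS):
    """Assumed reading: apply the percentage to the rating, split as +/-."""
    span = base_elo * percent_of_total(point_diff, total) / 100.0
    return span / 2.0


for diff in (30, 100):
    print(diff,
          round(percent_of_total(diff), 2),
          round(elo_plus_minus(diff), 1))
# 30  -> 0.44 %, about +/- 4.4 ELO
# 100 -> 1.47 %, about +/- 14.7 ELO
```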

So once you sit back and realize this, you start to laugh at how crazy it is to even ask what happens if the evaluation is wrong, when it comes from a Stockfish rated 3400 ELO and you are planning to spend about an hour on each move to reach at least 35 ply or more (this was a few years ago when I did this test). Try running Critter 1.6, rated by CCRL at 3159 ELO, at 3 minutes per move or 30 seconds per move, and you will see that it will not come close to 35-50 ply. So yes, once you know this, you also know that you can pretty much evaluate everything and rate it.

The second light that needs to shine is this: IT MAKES ABSOLUTELY NO DIFFERENCE WHICH CHESS GAME YOU USE!

That is the second misconception. I can rate any game I want, because the game itself is cosmetic. It is the evaluation, from Stockfish for example, that matters!

Once you get past these two fundamental misconceptions, which are bred into us because of the tests we grew up with, like BT2450 or BT2630, you will realize that there are other ways to do this nowadays!

Anyway, this is what made me realize that I could do something different and be very accurate in comparisons, not only in the test scores between computers, but at the same time have a very accurate tool for testing clones and related programs, as you can see deviations or no deviations in each of the 5 games.

I converted the test scores to ELO only for cosmetic reasons, because I, just like everyone else here, like to get a feel for where the rating stands in ELO. But it is cosmetic only.

Frank, you asked how many games are needed?

Well, when I realized that it made no difference which game I evaluate, I became nostalgic. I wanted to also rate famous names in history, right from the beginning to today. That is why I started with the 1st test, which came from a poem written in the year 1475 that follows a game between Francesco di Castellvi and Narciso Vinyoles. That is often cited as the earliest recorded game of modern chess. The second game is the Turk (Automaton) from 1770. The 3rd game is a beautiful series of sacrifices made by Thomas Bowdler in the year 1788. Games 4 and 5 are Philidor. All these games are pre 1800.

I am planning to someday do famous games and players between 1800-1900 and then 1900 to 2000 and then 2000 to now.

As I said, it makes no difference to me what the game is. I just want them to be famous players and beautiful games that let me also compare the rating of the human who played them, not just the computer that is being tested.

Just imagine for example if I picked 20 Steinitz games or picked his World Championship games. I would be able to rate Steinitz as a player and compare him to Lasker for example. The possibilities are endless.

Anyway, to answer your question: my 5 games are too small a sample for obtaining accurate data across many different styles of play. But if, for example, I had another set of games for each of the following timelines, then I think it would be extremely accurate and fun to test and play:

5 for before 1800
5 for 1800-1850
5 for 1850-1900
5 for 1900-1950
5 for 1950-2000
5 for 2000-2019

Now these should be classic games, e.g. Anderssen's Immortal Game!

BTW, because of the depth of analysis needed, each game takes about a week to prepare, so you can see why I have only done 5 games so far.

Best regards

Last edited by spacious_mind (08.01.2019 at 01:34)