Question: Elo game levels - Page 4 - Schachcomputer.info Community

#31

06.07.2024, 18:36

Tibono

TASC R40

Member for 6 years, 5 months and 25 days.

Registered since: 22.05.2018

Location: France

Age: 63

Posts: 548

Thanks given: 2,953

Received 1,305 thanks for 436 posts

Re: Elo game levels

Here is the tournament table

And the ELOStat rating, where Fidelity Excellence level 4 (3 MHz device) was used as the 'Start Elo' reference value (1798).

In addition, a statistical view: the correlation is calculated per program from both sets of data (Elo settings, ELOStat outcome), and the mean deviation is the average per program of the unsigned Elo gaps.
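As an illustration of these two statistics, here is a minimal sketch. It assumes the correlation is the usual Pearson coefficient (the post does not name the formula) and that the mean deviation is the average absolute gap between set and achieved Elo. The function names and the sample figures are hypothetical, they are not tournament data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def mean_abs_deviation(set_elo, achieved_elo):
    """Average of the unsigned gaps between set and achieved Elo."""
    gaps = [abs(a - s) for s, a in zip(set_elo, achieved_elo)]
    return sum(gaps) / len(gaps)

# Hypothetical figures for one program (NOT taken from the tournament).
set_elo = [1200, 1400, 1600, 1800, 2000]
achieved_elo = [1500, 1650, 1800, 1950, 2100]

print(round(pearson(set_elo, achieved_elo), 3))
print(mean_abs_deviation(set_elo, achieved_elo))
```

In this made-up case the achieved values rise in a straight line with the settings, so the correlation is maximal even though every level is off by 100 to 300 points. This is why a program can score well on correlation and still show a high mean deviation.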

Comments per program (alphabetical order):

Arasan 24.2.1
With an achieved Elo some 200 to 400 points below the setting, the throttled Arasan program is largely overrated. It could not do better than the bottom third of the tournament table. I would consider the weak Elo values quite deserved: it is definitely prone to very early blunders, most often obvious, nonsensical gifts, up to the 2000 Elo setting! A very disappointing behavior from J. Dart's program. The way it uses the granted thinking time is compliant but rather artificial (added waits).

Chiron 5
It is on the podium (bronze medal) as far as the set_Elo/achieved_Elo correlation is concerned, with a correlation coefficient of 0.981 (better than Shredder 13!). As a result, the Elo ranking resulting from the tournament is perfectly ordered. The mean deviation, on the other hand, is rather high: close to 400 Elo points on average, even if it shrinks once levels 1800-2000 are reached. Play is very consistent even at low Elo levels, which may be why the claimed Elo looks underrated.

HIARCS 15.3.1
Very early in the tournament rounds I suspected it would be somewhat stronger than expected; the ELOStat outcome confirms it: the mean deviation is over 400 Elo points, that's a couple of categories! Scaling is not bad, but not perfect either: the 1600 and 1800 Elo settings achieve the same ELOStat ranking, and the 1200 setting gets as many points as the 1800 one in the tournament. The best thing about M. Uniacke's program is its playing consistency, without gifts even at low levels. Timing is perfectly fine, very natural; and it is worth mentioning the openings adapted to the target level, assuming the eponymous GUI is used and the internal book selected.

Honey X5i
Definitely an exception within my selection of renowned series; the reason I was keen on testing it is its adaptive feature, a rare one amongst chess software. Okay, the tournament context is not exactly about adaptive behavior, rather predefined strength. But hey, let's check anyway... The ranking resulting from the tournament is perfect, but the rating from ELOStat not so much (with the 2000 Elo setting slightly inferior to the 1800 one). As a consequence, the correlation coefficient is mediocre. But the main issue is the deviation: over 800 Elo points on average and over 1000 for the 1200 Elo setting! The theoretical Elo setting is widely underrated... The moves played are consistent, but of course this does not mean much given the far too high strength level. No adaptation occurs for the timing: instant moves, regardless of the granted thinking time.

Komodo Dragon 3.3
Silver medal on the correlation podium and under 200 Elo points for the mean deviation: the figures are good. Unsurprisingly, the ELOStat ranking is perfect. The deviation decreases as higher Elo levels are set (meaning lower accuracy at lower Elo levels). Play is consistent even at the lowest levels (not that low, though). Timing is not adjusted (instant play). A concern: the UCI option names for limiting strength are not standard, and therefore need to be set apart from the usual GUI options, using the engine configuration.
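For context, the UCI protocol defines two standard options for strength limiting, `UCI_LimitStrength` and `UCI_Elo`. Here is a minimal sketch of the commands a GUI would send. The helper name is hypothetical, and the non-standard names used by Komodo Dragon are deliberately not guessed here: they should be read from the engine's own option list and passed in as parameters.

```python
def strength_commands(elo, limit_name="UCI_LimitStrength", elo_name="UCI_Elo"):
    """Build the UCI 'setoption' lines that request a limited playing strength.

    The defaults are the standard UCI option names; an engine with
    non-standard names needs them supplied explicitly.
    """
    return [
        f"setoption name {limit_name} value true",
        f"setoption name {elo_name} value {elo}",
    ]

# Standard case: ask a compliant engine to play at about 1800 Elo.
for line in strength_commands(1800):
    print(line)
```

A GUI that only knows the standard names cannot drive an engine exposing different ones, which is presumably why the engine configuration dialog is needed in this case.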

Maia
As a reminder, I used a non-standard implementation (no one-node level, rather nps throttling). As a consequence, the timing is perfect and the frequency of tactical blunders is reduced. Nevertheless, Maia remains prone to such mistakes. Correlation remains weak; in particular, the 1200 and 1400 nets are underrated by over 350 Elo points. The 1400 to 1800 settings could only be separated in the tournament table thanks to the tie-break (all three versions got 10 points). In short, getting compliant Elo levels from Maia is hard to achieve, if possible at all. I like to use Maia anyway for its playing style, which remains different from usual chess software and also differs across the various nets that can be used. Lower-level nets play rather aggressive chess, higher-level ones play more cautiously. Too bad this is often ruined by unexpected mistakes (I suspect a very high selectivity due to the net, strongly steering the search). I think a touch of brute force would help!

Mephisto Academy
Correlation hits a medium value, mean deviation is under 200 Elo points: these are rather healthy scores. The ELOStat ranking is perfectly ordered, and so is the tournament table. That's an achievement for the oldest Schröder participant! But it can offer gifts when set to low levels; things get better starting with the 1600 Elo setting. A small limitation (also true for the RISC and Nigel Short): to use Elo levels, you have no timing choice but tournament or active chess. As a consequence, I used 30 minutes per game as the standard for all programs involved in this tournament. By the way, timing of play is compliant.

Mephisto Nigel Short
Worst correlation within this tournament, due to the 1400 and 1600 Elo settings, which achieved much too strong ELOStat values. The extreme settings look much better, with around 100 Elo points of deviation, which is low. Nevertheless, the tournament and ELOStat rankings are perfectly fine. Play is consistent, even using low Elo levels. Timing is fine; worth mentioning: once out of time, the program settles on using three seconds per move.

Mephisto RISC
With only two participants (1800 and 2000 Elo settings), the correlation is not meaningful (value 1 = maximum, since two points cannot be misaligned). The lowest Elo you can set is 1700, which is somewhat elitist. Mean deviation is the same as the Academy's. Of course, with the rather high levels offered, play is consistent. I faced a concern with the timing: the RISC sometimes overstepped its own granted 30 minutes, and from that point on was no longer able to comply with the expected thinking time (now apparently unlimited). I therefore needed either to reassign the Elo level in order to grant another 30-minute bucket, or to manually force each move after a suitably reduced thinking time.

Millennium the King Performance
Gold medal as far as correlation is concerned! On the other hand, the ELOStat calculation results in a mean deviation of over 400 Elo points. But this is a score close to those of Chiron 5 and HIARCS, nothing to be ashamed of. Ordering is perfect, both in the ELOStat rating list and the tournament table ranking. But considering that only three levels could attend the tournament, with large gaps from one level to the next, correct ordering was to be expected. This is a (small) drawback of the easy level settings, which belong to a short list: no adjustment can be made. Another limitation is the timing (instant play, or about five seconds per move using the comfort menu Elo levels). But the user can adjust the CPU clock speed, to slow this down somewhat for a more convenient pace. Another concern (from the player's point of view) is the opening book, which is still fully used regardless of the Elo limit. The de Koning program plays very consistently, even using the lowest levels.

Shredder 13
It failed to achieve a consistent ranking order over the ten tournament rounds; thankfully the ELOStat calculation restored the expected rating, perfectly ordered. The correlation coefficient is fine enough, and the mean deviation is slightly over 200 Elo points, which is rather good. Usage of the thinking time is perfect. Play is consistent down to the 1200 Elo level, but the 1000 Elo level can commit gifts. Good, though not the best, as I would have expected. Using the native GUI, the opening book can be restricted along with the weakened levels.

Stockfish 16.1
The most accurate one, with only 87 Elo points average deviation. The ELOStat rating is accordingly perfectly ordered, whilst the tournament ranking is not. Timing is fine, but unusual: the engine appears to search at full speed and depth, in the end selecting a move inferior to the one displayed as the principal variation. Correlation is weak, due to the 2000 setting that underperformed (so it is overrated). Strange, I would have expected this engine to be more comfortable with higher Elo throttling...

Wasp 7.0
It achieved correct ordering both in the tournament ranking and the ELOStat rating, with not much more than 300 Elo points deviation and not so bad a correlation. Timing is fine. It lacks low levels (the minimum is 1500), but the main issue is that it often fails to wrap up the win when short of time: unable to mate with Q+K vs K at level 2000, also unable to mate with Q+R+K vs K at level 1800! Just unacceptable...

Best regards,
Eric

Edited by Tibono (06.07.2024 at 18:41). Reason: typo


#32

07.07.2024, 13:27

Tibono

Re: Elo game levels

Hi all,

About Maia and search width: I analyzed this position from round 7, where Maia_1800 plays Black against Academy_1800, at move 56:

Maia played 56...Ke4??, which lost immediately (mate in 1: 57.Rd4#).

First check: I set up the position with level = 1 node, and Maia_1800 plays the same move, so it is indeed the "natural" one from the 1800 net's knowledge.

I then increased the granted number of nodes, with multi-PV set to 4, and repeated the search (with an engine restart each time) until the Ke4 move was assigned the -M1 score, meaning Maia is fully aware of the potential blunder.

Worth reading: Lc0 options. The options that drive the search width appear to be CPuct, CPuctBase and CPuctFactor; with an additional set of options dedicated to the root of the search.

Using the default settings of lc0 (here I run version 0.26.3: CPuct & CPuctAtRoot=2.147; CPuctBase & CPuctBaseAtRoot=18368; CPuctFactor & CPuctFactorAtRoot=2.815), Maia still wants to play Ke4 up to 8 nodes searched (score -1.59).
Starting with 9 nodes searched, Maia prefers the correct move Kc4 thanks to its better score of -1.52, while Ke4 still gets -1.59. Reached depth is 3/5.

Maia needs 16 searched nodes in order to assign Ke4 the correct -M1 score; reached depth is 3/6. With the NPSLimit I set for Maia_1800 (1.26), this requires a 12.7-second search. Given the move number (56), the 30-minute limit for the whole game, and the blunder actually played, Maia_1800 must have spent less time on that move.

Then I pushed the above UCI options to their limits in order to maximize the width of the search: CPuct & CPuctAtRoot=100; CPuctBase & CPuctBaseAtRoot=1; CPuctFactor & CPuctFactorAtRoot=1000. Maia still wants to play Ke4 up to 7 nodes searched (score -1.81).
Starting with 8 nodes searched, Maia prefers the correct move Kc4 thanks to its better score of -1.55, while Ke4 still gets -1.81. Reached depth is 2/4.
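To see why these settings widen the search, a rough sketch of the exploration coefficient may help. It assumes lc0 computes the coefficient as CPuct + CPuctFactor * ln((N + CPuctBase) / CPuctBase), with N the visit count of the parent node. This formula is an assumption based on a reading of the lc0 search code and should be checked against the version in use; the function name is hypothetical.

```python
from math import log

def effective_cpuct(n_parent, cpuct, cpuct_base, cpuct_factor):
    """Exploration coefficient for a node with n_parent visits.

    ASSUMPTION: mirrors the growth formula believed to be used by lc0;
    not verified against a specific lc0 release.
    """
    return cpuct + cpuct_factor * log((n_parent + cpuct_base) / cpuct_base)

# Values quoted in the post, evaluated at 16 visits.
default = effective_cpuct(16, cpuct=2.147, cpuct_base=18368, cpuct_factor=2.815)
extreme = effective_cpuct(16, cpuct=100, cpuct_base=1, cpuct_factor=1000)

print(f"default settings: {default:.3f}")
print(f"extreme settings: {extreme:.1f}")
```

Under this assumption the default coefficient barely moves away from CPuct at such tiny node counts, while the extreme values make it hundreds of times larger, so the exploration term should swamp the value estimates and spread the few available visits over more moves.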

Now to the real gain: Maia needs only 11 searched nodes to assign Ke4 the -M1 score, at depth 2/5 (reminder: the default setting needed 16 nodes and depth 3/6). The associated displayed variation is Kd5e4 Rd2d4+. Maia is now aware. This result would have required an 8.7-second search using my NPSLimit setting, a better chance than the 12.7 seconds needed with default values.
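The two durations follow directly from dividing the node count by the nps limit. A small sketch of that arithmetic, using only the figures quoted in this post (the function name is hypothetical):

```python
def search_seconds(nodes, nps_limit):
    """Time needed to search a given number of nodes at a fixed nps limit."""
    return nodes / nps_limit

NPS_LIMIT = 1.26  # NPSLimit value quoted for Maia_1800

print(round(search_seconds(16, NPS_LIMIT), 1))  # default settings, 16 nodes
print(round(search_seconds(11, NPS_LIMIT), 1))  # widened search, 11 nodes
```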

Worth additional tests over full games...
Best regards,
Eric

#33

08.07.2024, 11:32

Tibono

Re: Elo game levels

Quote from Tibono:

Worth additional tests over full games...

Well, I ran many matches of Maia_1800 "default setting" vs Maia_1800 "extreme width search"; there is no way to get the latter to win a decent share of the games, even when granting it a massive acceleration (I tested up to 16 times the nps limit). Worse: even when accelerated enough to reach the same or even a deeper search depth, it remains unable to cope with the default setting.

I then tried out some intermediate, more reasonable settings, but the "brute force" effect (the ability to detect the mate-in-1 threat at low search depth) just vanished.
In a nutshell, I could not get really convincing behavior and shall stick to the default setting for now.
Cheers,
Eric
