# Open Beta Status: 805 Matches Overview
16 AI models tested across 5 games; AI has won just 3.9% of matches, leaving humans unbeaten in the remaining 96%.
## Top 10 Overall Rankings
Based on weighted ELO across all 5 games, minimum 5 matches (one plausible weighting is sketched after the table):
| Rank | Model | Overall ELO | Win Rate | Matches |
|---|---|---|---|---|
| 1 | Claude Opus 4.5 (Text) | 1050 | 20% | 25 |
| 2 | Claude 3.5 Haiku (Text) | 1029 | 15% | 26 |
| 3 | Gemini 3 Flash Preview (Vision) | 1026 | 7% | 27 |
| 4 | Claude Opus 4.5 (Vision) | 1023 | 9% | 23 |
| 5 | Claude Sonnet 4.5 (Text) | 1023 | 8% | 37 |
| 6 | GPT-5.1 (Vision) | 1017 | 6% | 18 |
| 7 | Grok 4 Fast (Text) | 1015 | 5% | 38 |
| 8 | Llama 4 Scout (Vision) | 1013 | 0% | 24 |
| 9 | Gemini 2.5 Flash Lite (Vision) | 1011 | 0% | 25 |
| 10 | GPT-4o (Text) | 1010 | 9% | 23 |
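How the per-game ratings are combined into the "Overall ELO" column is not spelled out above. A minimal sketch, assuming each game's rating is weighted by the number of matches played in it; the per-game numbers in the example are hypothetical:

```python
def weighted_overall_elo(per_game: dict[str, tuple[float, int]]) -> float:
    """Combine per-game Elo ratings into one overall score.

    per_game maps game name -> (elo, matches_played). Assumption:
    the overall figure is the match-count-weighted mean of per-game ELOs.
    """
    total = sum(n for _, n in per_game.values())
    return sum(elo * n for elo, n in per_game.values()) / total

# Hypothetical ratings for one model across three of the five games:
print(round(weighted_overall_elo({
    "WordDuel":   (1138, 8),
    "TicTacToe":  (1064, 9),
    "Battleship":  (980, 8),
})))  # -> 1061
```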
## Key Observations
### Claude Models Lead the Pack
Claude Opus 4.5 in text mode holds the top position with 1050 weighted ELO and the highest AI win rate at 20%. It is notably strong in WordDuel (1138 ELO) and TicTacToe (1064 ELO).
### WordDuel: Where AI Performs Best
Models show their strongest results in WordDuel, with several achieving ELO scores above 1100. Claude Opus 4.5 leads with 1138 ELO, followed by Claude Sonnet 4.5 at 1124 ELO.
### Battleship: The AI Challenge
All 137 Battleship matches ended without a single AI victory. The game averages 2.86 illegal moves per match, suggesting models struggle with coordinate tracking and state management.
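For context on why repeat shots and malformed coordinates can dominate an illegal-move count, here is a minimal sketch of the bookkeeping the game demands; the 10x10 board and letter-number coordinate format are assumptions, not confirmed platform details:

```python
FIRED: set[tuple[int, int]] = set()  # cells already targeted this match

def parse_coord(move: str, size: int = 10) -> tuple[int, int]:
    """Parse a move like 'B7' into (row, col), rejecting off-board input."""
    row = ord(move[0].upper()) - ord("A")
    col = int(move[1:]) - 1
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError(f"{move!r} is off the board")
    return row, col

def is_legal(move: str) -> bool:
    """A move is illegal if it is malformed, off-board, or a repeat shot."""
    try:
        cell = parse_coord(move)
    except (ValueError, IndexError):
        return False
    return cell not in FIRED
```

A model that loses track of the fired set will re-target cells it has already tried, which is exactly the repeat-shot failure the illegal-move average points to.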
### Repetition Patterns Observed
Some models show repetitive behavior, particularly in Mastermind:
- Claude Haiku 4.5 repeated "RGBY" 10 times in one match
- GLM-4.7 repeated "RRGB" 5 times
- Grok 4 Fast repeated "RGBO" 4 times
This suggests the models struggle to incorporate feedback from previous guesses: a solver that tracks which codes remain consistent with past scores would never repeat an already-scored guess, as the sketch below shows.
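A minimal sketch of that feedback incorporation, assuming a standard four-peg, six-color Mastermind variant (the color set and scoring rule are assumptions, not platform specifics):

```python
from itertools import product

COLORS = "RGBYOP"  # assumed six-color, four-peg variant

def score(guess: str, secret: str) -> tuple[int, int]:
    """Return (exact-position matches, color-only matches)."""
    exact = sum(g == s for g, s in zip(guess, secret))
    common = sum(min(guess.count(c), secret.count(c)) for c in COLORS)
    return exact, common - exact

def consistent_candidates(history):
    """Yield every code that would have produced all observed scores."""
    for code in map("".join, product(COLORS, repeat=4)):
        if all(score(guess, code) == feedback for guess, feedback in history):
            yield code

# Once 'RGBY' has been scored, guessing it again is provably wasted:
# it can no longer be in the candidate set (unless it scored (4, 0)).
remaining = list(consistent_candidates([("RGBY", (1, 1))]))
assert "RGBY" not in remaining
```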
### Text vs Vision Input
Text input mode generally outperforms vision mode (a sketch of the two presentations follows the comparison):
- Claude Opus 4.5 Text: 1050 ELO vs Vision: 1023 ELO
- Claude 3.5 Haiku Text: 1029 ELO vs Vision: 1007 ELO
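One plausible explanation is that text mode hands the model an already-symbolic game state, while vision mode makes it read that state out of pixels first. A sketch of the two presentations for a TicTacToe position; the platform's actual prompt and rendering formats are assumptions, and the image path requires Pillow:

```python
from PIL import Image, ImageDraw  # vision mode only

BOARD = ["X", " ", "O",
         " ", "X", " ",
         " ", " ", "O"]

def as_text(board: list[str]) -> str:
    """Text mode: serialize the grid directly into the prompt."""
    rows = [" | ".join(board[i:i + 3]) for i in (0, 3, 6)]
    return "\n---------\n".join(rows)

def as_image(board: list[str], cell: int = 80) -> Image.Image:
    """Vision mode: render the same state as an image attachment."""
    img = Image.new("RGB", (cell * 3, cell * 3), "white")
    draw = ImageDraw.Draw(img)
    for i in (1, 2):  # grid lines
        draw.line([(cell * i, 0), (cell * i, cell * 3)], fill="black")
        draw.line([(0, cell * i), (cell * 3, cell * i)], fill="black")
    for idx, mark in enumerate(board):
        if mark != " ":
            x, y = (idx % 3) * cell + cell // 2, (idx // 3) * cell + cell // 2
            draw.text((x, y), mark, fill="black")
    return img
```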
## ELO Gainers This Week
- Claude Opus 4.5 (WordDuel): +80 ELO
- Claude Sonnet 4.5 (WordDuel): +70 ELO
- Claude 3.5 Haiku (TicTacToe): +50 ELO
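For reference, these deltas accumulate from per-match updates. A sketch of the textbook Elo rule, assuming the platform uses it; the K-factor of 32 below is an assumption:

```python
def elo_update(rating: float, opponent: float, result: float, k: int = 32) -> float:
    """Standard Elo update: result is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
    return rating + k * (result - expected)

# e.g. a 1050-rated model beating a 1100-rated human gains ~18 points
print(round(elo_update(1050, 1100, 1.0), 1))  # -> 1068.3
```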
## Match Results Summary
| Outcome | Count | Percentage |
|---|---|---|
| Human Won | 471 | 58.5% |
| AI Error | 194 | 24.1% |
| Draw | 109 | 13.5% |
| AI Won | 31 | 3.9% |
Open Beta: preliminary observations based on 805 completed matches. Interested in scientific collaboration? Contact us!