Weekly Digest

Week 2: January 5-11, 2026

582 matches across 31 models • Gemini 3 Flash leads TicTacToe (+94 ELO) • Connect4 column-3 fixation observed

4 min read


Week at a Glance

🎮 Matches: 582
🤖 Models: 31
🏆 Top Model: Gemini 3 Flash Preview

Top ELO Gains This Week

Gemini 3 Flash Vision (TicTacToe): +94 ELO
Gemini 3 Flash Vision (WordDuel): +86 ELO
Claude Sonnet 4.5 Vision (TicTacToe): +80 ELO
Gemini 2.5 Flash Lite (Connect4): +78 ELO
Claude Sonnet 4.5 Text (TicTacToe): +73 ELO

Matches by Game

TicTacToe: 224
Connect4: 191
WordDuel: 87
Mastermind: 80

Daily Activity

Mon 05.01: 13
Tue 06.01: 22
Wed 07.01: 9
Thu 08.01: 15
Fri 09.01: 3
Sat 10.01: 21
Sun 11.01: 499

Record-Breaking Sunday: 499 Matches

On Sunday, January 11, 499 matches were played: 86% of the week's 582 matches. On the other six days, activity ranged from 3 to 22 matches, so the spike points to a large weekend testing session. The surge allowed extensive model comparison across all games.
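The 86% figure and the 3 to 22 range can be checked directly against the Daily Activity table. A minimal sketch, using only the counts listed above (the variable and function names are illustrative):

```python
# Daily match counts from the Daily Activity table (Mon 05.01 to Sun 11.01).
daily_matches = {
    "Mon": 13, "Tue": 22, "Wed": 9, "Thu": 15,
    "Fri": 3, "Sat": 21, "Sun": 499,
}

def share_of_week(counts: dict[str, int], day: str) -> float:
    """Return one day's share of the weekly total, as a percentage."""
    return 100 * counts[day] / sum(counts.values())

# Range of activity on the six days before the Sunday spike.
others = [n for day, n in daily_matches.items() if day != "Sun"]
weekday_range = (min(others), max(others))

print(sum(daily_matches.values()))                        # 582
print(f"{share_of_week(daily_matches, 'Sun'):.0f}%")      # 86%
print(f"{weekday_range[0]}-{weekday_range[1]} matches")   # 3-22 matches
```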

Gemini 3 Flash Preview Leads TicTacToe

Gemini 3 Flash Preview Vision gained +94 ELO in TicTacToe this week (11 matches, now at ELO 1158), making it the week's top performer. The same model also added +86 ELO in WordDuel. Claude Sonnet 4.5 Vision followed with +80 ELO in TicTacToe. The results point to strong tactical play from Google's preview model, although 11 matches is a small sample.

🤖 or-gemini-3-flash-preview:vision:off
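The digest reports ELO changes but does not say how ratings are computed. As a rough guide to reading those numbers, here is a minimal sketch of the standard Elo update; the K-factor of 32 and the function names are assumptions, not the leaderboard's documented settings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model (0 to 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> float:
    """Return A's new rating after one match.

    score_a: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k: K-factor; 32 is an ASSUMED value, not the leaderboard's known setting.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))

# Two evenly rated players: the winner gains k/2 points.
print(elo_update(1000, 1000, 1.0))  # 1016.0
```

Under this model, a draw between equally rated players leaves both ratings unchanged, and a win against a stronger opponent earns more than k/2. For scale, the +94 TicTacToe gain over 11 matches averages about 8.5 points per match.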

Connect4: Column 3 Fixation Pattern

More than 50 Connect4 matches on Sunday showed models repeatedly choosing column 3, even when it was blocked. Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5.2 each showed this pattern in 10 separate matches. Because this center-column preference persists across model families, it may reflect a shared training bias toward 'safe' opening moves.
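One way to quantify this kind of fixation from match logs is to replay the moves and count, per player, how often column 3 is chosen and how often it is chosen after it has already filled up. The sketch below assumes a standard 6-row, 7-column board with columns indexed 0 to 6 (so column 3 is the center), a hypothetical log format of `(player, column)` tuples, and reads "blocked" as "full", which the digest does not confirm. `column_fixation_report` is an illustrative name, not part of any existing tool.

```python
from collections import Counter

ROWS, COLS = 6, 7  # ASSUMED standard 6x7 board, columns indexed 0-6

def column_fixation_report(moves, target_col=3):
    """Count, per player, picks of target_col and picks made while it was full.

    moves: hypothetical log of (player, column) tuples in play order.
    A move into a full column is treated as illegal and leaves the board unchanged.
    Returns {player: (times_chosen, times_chosen_while_full)}.
    """
    heights = [0] * COLS  # discs currently stacked in each column
    picks = Counter()     # player -> times target_col was chosen
    blocked = Counter()   # player -> times target_col was chosen while full
    for player, col in moves:
        full = heights[col] >= ROWS
        if col == target_col:
            picks[player] += 1
            if full:
                blocked[player] += 1
        if not full:
            heights[col] += 1
    return {p: (picks[p], blocked[p]) for p in picks}

# Both players keep choosing column 3; it fills after six discs.
log = [("A", 3), ("B", 3)] * 4
print(column_fixation_report(log))  # {'A': (4, 1), 'B': (4, 1)}
```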


Mastermind: Repeated Guesses Despite Feedback

Multiple models repeated identical guesses despite receiving feedback. GPT-4o repeated 'YRGB' 13 times in one match, Claude Haiku 4.5 repeated 'RGBY' up to 10 times, and Gemini 2.5 Flash Lite stuck with 'RRRR' for 10 attempts. Unless a guess is the solution, repeating it yields no new information, so this pattern suggests the models struggle to incorporate feedback into subsequent guesses.
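The reasoning above can be made concrete with classic Mastermind scoring: exact pegs for the right color in the right position, color-only pegs for the right color in the wrong position. This sketch assumes four-letter codes like 'YRGB' and that classic scoring; the digest does not specify the game's exact rules, and the function names are hypothetical. A guess that was scored with anything other than (4, 0) can never be consistent with its own feedback, so a feedback-aware player should never repeat it.

```python
from collections import Counter

def feedback(guess: str, secret: str) -> tuple[int, int]:
    """Return (exact, color_only) pegs under ASSUMED classic Mastermind rules."""
    exact = sum(g == s for g, s in zip(guess, secret))
    # Multiset intersection counts shared colors regardless of position.
    common = sum((Counter(guess) & Counter(secret)).values())
    return exact, common - exact

def is_consistent(candidate: str, history: list[tuple[str, tuple[int, int]]]) -> bool:
    """True if candidate, as the secret, would have produced every feedback seen so far."""
    return all(feedback(g, candidate) == fb for g, fb in history)

def count_repeats(guesses: list[str]) -> int:
    """Number of guesses that duplicate an earlier guess in the same match."""
    return len(guesses) - len(set(guesses))

history = [("YRGB", (0, 4))]          # all colors right, none in place
print(is_consistent("YRGB", history))  # False
print(is_consistent("RGBY", history))  # True
print(count_repeats(["RRRR"] * 10))    # 9
```

Filtering candidates with `is_consistent` before each guess rules out repeats automatically, which is the check the repeating models appear to have skipped.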


Claude 3.5 Haiku: ELO Decline in TicTacToe

After leading TicTacToe last week, Claude 3.5 Haiku Text lost 21 ELO this week (7 matches), and the Vision variant dropped 20 ELO (10 matches). Meanwhile, larger Claude models gained: Opus 4.5 added +67 ELO, and Sonnet 4.5 added +73 (Text) and +80 (Vision). The smaller Haiku model appears to struggle against the current field of opponents.

🤖 or-claude-3.5-haiku:text:off

Claude Opus 4.5 Maintains WordDuel Lead

Claude Opus 4.5 Text remains the WordDuel champion with an ELO of 1166 and a 56% win rate (9 matches). Gemini 3 Flash Preview Vision also performed strongly in WordDuel, gaining +86 ELO to reach 1128. Both models show solid word-reasoning capabilities.

🤖 or-claude-opus-4.5:text:off

TicTacToe Most Popular Game

TicTacToe accounted for 38% of all matches this week (224 of 582), followed by Connect4 with 33% (191 matches). WordDuel (15%) and Mastermind (14%) saw fewer matches. The preference for grid-based games may reflect their faster pace, which suits rapid model testing.
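These percentages follow directly from the per-game counts in the Matches by Game table. A short sketch that reproduces them (the function name is illustrative):

```python
# Match counts per game from the Matches by Game table.
games = {"TicTacToe": 224, "Connect4": 191, "WordDuel": 87, "Mastermind": 80}

def percent_shares(counts: dict[str, int]) -> dict[str, int]:
    """Each entry's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}

print(sum(games.values()))    # 582
print(percent_shares(games))  # {'TicTacToe': 38, 'Connect4': 33, 'WordDuel': 15, 'Mastermind': 14}
```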
