Week 11: March 9-15, 2026
99 matches across 38 models • Activity up 5x from last week • 25 repetition patterns detected in Connect4 & TicTacToe
Week at a Glance
Activity Surge: 99 Matches with 38 Models
PlayTheAI saw a significant jump in activity this week: 99 completed matches across 38 models and all 6 games, up from 20 matches the prior week (roughly a 5x increase). Tuesday was the busiest day with 31 matches, followed by Thursday (20) and Friday (17). Players tested a wide range of models, with activity spread across the full week.
Claude Sonnet 4.6 Vision: Strongest Weekly Gainer (+33 ELO)
Claude Sonnet 4.6 in vision mode gained 33 ELO in TicTacToe across 5 matches, rising from 1089 to 1122. It also gained 14 ELO in WordDuel. The model appears to benefit from visual board input in strategic games. Other notable gains came in Connect4, where GPT-4o (vision) gained 31 ELO and GLM 4.7 gained 30.
25 Repetition Bugs — Connect4 Most Affected
Repetition patterns increased sharply, from 3 cases last week to 25 this week. Connect4 accounted for 13 of them, with Claude Haiku 4.5 (3 cases), Claude Opus 4.6 (3 cases), and GPT-5.2 (2 cases) among the most affected. Gemini 2.5 Flash Lite repeated a single column 7 times in one Connect4 match. In WordDuel, Llama 4 Maverick repeated 'HOUSE' 6 times despite receiving letter feedback. Higher match volume naturally leads to more observed bugs, but volume alone does not explain the distribution: Connect4 produced about half of the cases (13 of 25) while making up under a quarter of the matches (22 of 99). That concentration suggests the game format is particularly prone to triggering repetitive behavior.
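The report does not describe how repetition patterns are detected. As a rough illustration only, one simple approach is to look for the longest run of identical consecutive moves made by a single model. The function names, the threshold of 3, the assumption that repeats are consecutive, and the sample move lists below are all hypothetical and are not PlayTheAI's actual detector.

```python
from itertools import groupby


def longest_repeat(moves):
    """Return (move, run_length) for the longest run of identical
    consecutive moves, or (None, 0) if the list is empty."""
    best_move, best_len = None, 0
    for move, group in groupby(moves):
        run = sum(1 for _ in group)
        if run > best_len:
            best_move, best_len = move, run
    return best_move, best_len


def is_repetition(moves, threshold=3):
    """Flag a move list when one move repeats `threshold` or more
    times in a row. The threshold is an assumed value."""
    return longest_repeat(moves)[1] >= threshold


# Hypothetical move logs shaped like the cases described above.
connect4_columns = [4, 4, 4, 4, 4, 4, 4]   # same column 7 times
wordduel_guesses = ["HOUSE"] * 6           # same guess 6 times
varied_moves = [1, 2, 1, 2, 3]             # no consecutive repeat

print(longest_repeat(connect4_columns))
print(longest_repeat(wordduel_guesses))
print(is_repetition(varied_moves))
```

A run-length check like this only catches back-to-back repeats. Counting total occurrences of each move would be a looser alternative if repeats are interleaved with other moves.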
TicTacToe Accounts for Half of All Matches
TicTacToe made up 50 of the 99 matches, making it by far the most popular game. Its simple rules and quick rounds seem to attract the most players. Connect4 followed with 22 matches, while WordDuel saw 9 matches after no play last week. Mastermind held steady with 7 matches, and Battleship (6) and Dots & Boxes (5) remain niche choices.
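For readers who want to check the split, the per-game shares can be recomputed from the match counts quoted above. This is a minimal sketch; the dictionary and variable names are illustrative.

```python
# Match counts per game, as reported for this week.
matches_per_game = {
    "TicTacToe": 50,
    "Connect4": 22,
    "WordDuel": 9,
    "Mastermind": 7,
    "Battleship": 6,
    "Dots & Boxes": 5,
}

total = sum(matches_per_game.values())

# Share of all matches per game, in percent, rounded to one decimal.
shares = {
    game: round(100 * count / total, 1)
    for game, count in matches_per_game.items()
}

print(total)
for game, share in shares.items():
    print(f"{game}: {share}%")
```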
Champions Hold Steady — Same Leaders Across Games
The leaderboard remained stable despite high activity. Claude Opus 4.5 continues to lead TicTacToe (1440 ELO, 66 matches), Grok 4 Fast holds Connect4 (1154 ELO, 42 matches), and Gemini 3 Flash Preview stays on top in WordDuel (1272 ELO, 41.2% win rate). In Mastermind, Claude Opus 4.5 (vision) leads with 1074 ELO. No model lost ELO this week; every recorded change was a gain.
Claude 4.6 Models Build Track Record
After their debut last week, Claude Opus 4.6 and Sonnet 4.6 are establishing themselves with 20+ matches each. Opus 4.6 (text) reached 1145 ELO in TicTacToe — matching Sonnet 4.5's score. In vision mode, Opus 4.6 shows a 30% win rate in TicTacToe (10 matches, 1088 ELO). Sonnet 4.6 (text) climbed to 1050 ELO in TicTacToe and 1022 in Connect4 with a 20% win rate.
Battleship & Dots and Boxes: Illegal Moves Remain High
Both games consistently show about 3 illegal moves per match across nearly all models. In Battleship, GPT-4o Mini (vision) has the most, with 60 illegal moves in 20 matches, and GPT-5.2 (text) shows 48 in 16 matches. Dots & Boxes shows a similar pattern, with GPT-5.1 making 9 illegal moves in 3 matches. These more complex games with coordinate-based input continue to challenge current models more than simpler grid-based games like TicTacToe.
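The per-match rate follows directly from the totals quoted above. A minimal sketch of the arithmetic, using only the three examples named in this section (the record layout and function name are illustrative):

```python
# (game, model, illegal_moves, matches) as quoted in this section.
records = [
    ("Battleship", "GPT-4o Mini (vision)", 60, 20),
    ("Battleship", "GPT-5.2 (text)", 48, 16),
    ("Dots & Boxes", "GPT-5.1", 9, 3),
]


def illegal_rate(illegal_moves, matches):
    """Average number of illegal moves per match."""
    return illegal_moves / matches


rates = {
    model: illegal_rate(illegal, played)
    for _, model, illegal, played in records
}

for model, rate in rates.items():
    print(f"{model}: {rate:.1f} illegal moves per match")
```

All three examples work out to exactly 3.0 illegal moves per match, which matches the rate stated above.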