Weekly Digest Sunday, March 22, 2026

🆕

Week 12: March 16-22, 2026

71 matches across 38 models • Claude 4.6 debuts in 4 variants • Sonnet 4.6 leads ELO gains (+54) • 9 repetition patterns observed


Week at a Glance

🎮 Matches: 71
🤖 Active Models: 38
🆕 New Variants
📅 Busiest Day: Mon (37)

Matches by Game

TicTacToe: 20
Dots & Boxes: 18
WordDuel: 11
Connect4: 9
Mastermind: 7
Battleship: 6

Top ELO Gains This Week

Sonnet 4.6 (TicTacToe): +54 ELO
GPT-5.1 (WordDuel): +26 ELO
GPT-5.2 (Mastermind): +14 ELO
Grok 4 Fast (Mastermind): +14 ELO
Gemini 2.5 Flash Lite (WordDuel): +14 ELO

🆕

Claude 4.6 Generation Enters the Arena

This week marks the debut of Claude 4.6 with four new variants: Opus and Sonnet, each in text and vision mode. Claude Opus 4.6 (text) shows an early TicTacToe ELO of 1145 across 13 matches, while Sonnet 4.6 (vision) reaches 1132 ELO in the same game across 18 matches. The data is still limited, but these early results suggest competitive performance, at least in TicTacToe.

📈

Sonnet 4.6 Leads ELO Gains with +54 in TicTacToe

Claude Sonnet 4.6 (text) was this week's biggest ELO mover, climbing from 1050 to 1104 in TicTacToe over 4 matches. Second place goes to GPT-5.1 with +26 ELO in WordDuel (2 matches). No significant ELO losses were recorded this week, likely due to the lower overall match volume of 71 games.

🤖 or-claude-sonnet-4.6:text:off

🔲

Dots and Boxes Gains Traction as Second Most Popular Game

With 18 matches, Dots and Boxes was the second most-played game after TicTacToe (20). Players seem drawn to the strategic line-drawing challenge. However, the game also produces recurring illegal-move spikes: 7 model-game combinations showed elevated rates, suggesting that rule compliance in Dots and Boxes remains a notable challenge for AI models.

🔁

9 Repetition Patterns: Feedback Processing Remains Challenging

Nine repetition patterns were detected across 7 different models. Standout cases: GPT-4o Mini repeated the guess 'RBGY' 7 times in a single Mastermind match, and Gemini 2.5 Flash Lite guessed 'ADIEU' 6 times in WordDuel. In Connect4, three different models (Sonnet 4.5, Gemini 3 Flash, Mistral Large) each repeated moves in column 3. These patterns suggest ongoing difficulty with incorporating game feedback into subsequent decisions, which is relevant for any AI application that relies on iterative feedback loops.
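As a rough illustration of how repeated moves like these could be counted from a match log, here is a minimal Python sketch. The function name `find_repeated_moves`, the move-list format, and the `min_repeats` threshold are assumptions made for illustration; the digest does not describe how PlayTheAI actually detects repetition.

```python
from collections import Counter

def find_repeated_moves(moves, min_repeats=3):
    """Return {move: count} for every move played at least `min_repeats` times.

    `moves` is a hypothetical list of one player's moves in a single match
    (for example Mastermind guesses). The threshold is an illustrative
    assumption, not a documented PlayTheAI setting.
    """
    counts = Counter(moves)
    return {move: n for move, n in counts.items() if n >= min_repeats}

# Made-up guess sequence that mirrors the 'RBGY' case described above.
guesses = ["RBGY", "RBGY", "GGYB", "RBGY", "RBGY", "RBGY", "RBGY", "RBGY"]
print(find_repeated_moves(guesses))  # {'RBGY': 7}
```

A plain count like this suits guessing games, where repeating a guess never adds information. In Connect4, playing the same column several times can be perfectly legal, so a real detector would likely need game-specific context.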

📊

Quiet Week with Monday Spike: 52% of Matches on One Day

With 71 total matches, this was a quieter week on PlayTheAI. Notably, 37 matches (52%) were played on Monday alone, with the remaining six days averaging just 5-6 matches each. Players challenged 38 different AI models across all 6 available games, with TicTacToe and Dots and Boxes together accounting for over half of all activity.
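The shares quoted above can be re-derived from the per-game counts in this digest. The short Python sketch below is only a worked check of that arithmetic; the variable names are illustrative.

```python
# Per-game match counts as listed under "Matches by Game".
matches_by_game = {
    "TicTacToe": 20,
    "Dots and Boxes": 18,
    "WordDuel": 11,
    "Connect4": 9,
    "Mastermind": 7,
    "Battleship": 6,
}

total = sum(matches_by_game.values())          # 71 matches overall
monday = 37                                    # matches played on Monday

monday_share = monday / total                  # fraction of matches on Monday
other_days_avg = (total - monday) / 6          # mean over the remaining six days
top_two_share = (
    matches_by_game["TicTacToe"] + matches_by_game["Dots and Boxes"]
) / total                                      # fraction from the two top games

print(f"Monday share: {monday_share:.0%}")             # Monday share: 52%
print(f"Other days average: {other_days_avg:.1f}")     # Other days average: 5.7
print(f"Top two games share: {top_two_share:.0%}")     # Top two games share: 54%
```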

🇨🇳

Qwen 3.5 Plus Joins the Model Roster

Alongside Claude 4.6, Qwen 3.5 Plus also debuted in both text and vision modes. The vision variant shows early promise in TicTacToe (1083 ELO, 14 matches), while the text variant has yet to secure a win across 17 matches. With limited data, it's too early for conclusions; more matches will reveal where Qwen 3.5 Plus settles in the rankings.

⚓

Battleship and Dots & Boxes: Illegal Move Hotspots

Of the 20 model-game combinations with elevated illegal move rates, 13 involve Battleship and 7 involve Dots and Boxes. In Battleship, GPT-4o Mini (vision) averages 3 illegal moves per match across 20 matches, while GPT-5.2 and Grok 4 Fast (vision) show similarly high rates. Coordinate tracking in grid-based games continues to challenge the spatial reasoning of models from several families, a pattern that may carry over to real-world applications such as warehouse robotics and navigation systems.

⚠️

Note: Open Beta

Preliminary observations based on limited data. All findings reflect a small sample size and should be interpreted with caution.