Weekly Digest
🆕

Week 8: February 16-22, 2026

248 matches across 38 models • Claude 4.6 debuts • GPT-5.2 leads ELO gains (+69) • 33 repetition patterns observed



Week at a Glance

🎮 Matches: 248
🤖 Active Models: 38
🆕 New Models: 6 variants
📈 Peak Day: Sat (74)

Top ELO Gains This Week

GPT-5.2 (vision), WordDuel: +69 ELO
GPT-5.2 (text), TicTacToe: +60 ELO
Llama 4 Maverick (text), WordDuel: +54 ELO
GPT-5.1 (text), Mastermind: +44 ELO
Claude Opus 4.5 (text), TicTacToe: +39 ELO

Matches by Game

TicTacToe: 107
Dots & Boxes: 48
WordDuel: 30
Connect4: 26
Battleship: 20
Mastermind: 17

Matches by Day

Mon, Feb 16: 31
Tue, Feb 17: 26
Wed, Feb 18: 40
Thu, Feb 19: 28
Fri, Feb 20: 28
Sat, Feb 21: 74
Sun, Feb 22: 21
🆕

Claude 4.6 Generation Enters the Arena

Anthropic's latest generation arrived on PlayTheAI this week with 6 new variants of Claude Opus 4.6 and Claude Sonnet 4.6, covering both text and vision modes. With 20 matches played across TicTacToe, Connect4, Battleship, and WordDuel, the new models are still in their calibration phase. Early results show cautious play: no wins yet against human opponents, though the sample size remains small. Claude Opus 4.6 (text) reached 1036 ELO over 6 TicTacToe matches, slightly above the 1000 starting point. More data is needed before drawing conclusions about the 4.6 generation's game capabilities.

🤖 or-claude-opus-4.6:text:off
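
The digest does not state how PlayTheAI computes its ratings. Assuming a standard Elo system, the sketch below shows how a rating would move away from the 1000 starting point after a single match. The function names and the K-factor of 32 are illustrative assumptions, not the platform's actual settings.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Return the new rating after one match.

    score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k=32 is an assumed K-factor, not PlayTheAI's actual value.
    """
    return rating + k * (score - expected_score(rating, opponent))


# A new model at the 1000 starting point draws an equally rated opponent:
print(update_rating(1000, 1000, 0.5))  # rating stays at 1000.0
# The same model beats an equally rated opponent:
print(update_rating(1000, 1000, 1.0))  # 1016.0 with the assumed k=32
```

Under these assumptions, draws between equally rated players leave ratings unchanged, which is one reason a small sample of matches in a draw-heavy game moves a rating only slightly.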
🏆

Claude Opus 4.5 Rises to TicTacToe #1 at 1443 ELO

Claude Opus 4.5 (text:off) now holds the highest TicTacToe rating at 1443 ELO after gaining +39 this week over 2 matches. The model also leads WordDuel at 1248 ELO (+37, 46.7% win rate) and Mastermind via its vision variant at 1074 ELO — making it the champion in 3 of 5 established games. In TicTacToe, a game where draws are common due to its solvable nature, maintaining a rating above 1400 suggests consistently strong positional awareness. This is the strongest cross-game presence of any model on the platform.

🎮 tictactoe 🤖 or-claude-opus-4.5:text:off
📈

GPT-5.2 Posts the Week's Biggest ELO Gains

GPT-5.2 was this week's most improved model, with gains across multiple games. Its vision variant climbed +69 ELO in WordDuel (reaching 1180 over 3 matches), while the text variant gained +60 in TicTacToe (1204 ELO over 9 matches) and +41 in WordDuel (1155 ELO over 2 matches). The vision variant, however, lost 12 ELO in TicTacToe even as the text variant thrived there, a reminder that performance can vary significantly between input modes within the same base model.

🤖 or-gpt-5.2:vision:off
📊

Dots and Boxes Adoption Accelerates

PlayTheAI's newest game saw 48 matches this week, making it the second most-played game behind TicTacToe. The territory-capture game challenges models with spatial planning and the extra-turn mechanic for completing boxes. All models tested so far show elevated illegal move rates (averaging 3 per match), reflecting the game's complex line-placement format. No clear ELO leader has emerged yet: most models sit in the 994-998 range, just below the 1000 starting point, suggesting the game's strategic depth is a genuine challenge for current AI models.

🔄

33 Repetition Bugs: Connect4 Column 3 and Mastermind Loops

This week saw 33 repetition bugs across 6 games and 15 models, meaning cases where a model repeated the same move 3 or more times. Connect4 remains the most affected game with 10 cases, nearly all involving Column 3; the affected models include GPT-5.2, Claude Haiku 4.5, Claude Opus 4.6, and Gemini 3 Flash. In Mastermind, Gemini Flash Lite repeated "RRRR" 10 consecutive times and GPT-5.2 repeated "RGOY" 8 times, which points to difficulty processing feedback. In WordDuel, Grok 4 Fast repeated "STERN" 6 times and Qwen 3.5 Plus repeated "DIESE" 5 times despite receiving letter-position feedback each round.
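
The repetition criterion above (the same move 3 or more times) can be sketched as a check over a match's move list. This is a minimal illustration that assumes repetitions are counted as consecutive runs; the function names and the sample guess sequence are hypothetical, not PlayTheAI's actual detection logic or logged data.

```python
from __future__ import annotations

from itertools import groupby


def longest_repeat(moves: list[str]) -> tuple[str | None, int]:
    """Return the move with the longest consecutive run and that run's length."""
    best_move, best_len = None, 0
    for move, run in groupby(moves):
        length = sum(1 for _ in run)
        if length > best_len:
            best_move, best_len = move, length
    return best_move, best_len


def is_repetition(moves: list[str], threshold: int = 3) -> bool:
    """Flag a match when any move is repeated `threshold` or more times in a row."""
    return longest_repeat(moves)[1] >= threshold


# Hypothetical Mastermind guess sequence, for illustration only.
guesses = ["RGBY", "RRRR", "RRRR", "RRRR", "OGBY"]
print(longest_repeat(guesses))  # ('RRRR', 3)
print(is_repetition(guesses))   # True
```

A non-consecutive definition (counting total occurrences of a move per match) would flag more cases; which of the two the platform uses is not stated in the digest.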

📅

Saturday Spike: 74 Matches on February 21

Activity this week showed a distinct Saturday peak with 74 matches, nearly 30% of the weekly total in a single day. The remaining six days averaged 29 matches each, with weekday counts ranging from 26 (Tuesday) to 40 (Wednesday). Sunday was the quietest day with 21 matches. The 248 total matches are on par with the previous period's 247, indicating stable platform engagement during the Open Beta phase. Six games are now available, giving players more variety in how they challenge AI opponents.
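
The share and average quoted above follow directly from the daily counts in this digest. A short sketch of the arithmetic (variable names are illustrative):

```python
# Daily match counts for February 16-22, as listed in this digest.
matches_by_day = {
    "Mon": 31, "Tue": 26, "Wed": 40, "Thu": 28,
    "Fri": 28, "Sat": 74, "Sun": 21,
}

total = sum(matches_by_day.values())
peak_day = max(matches_by_day, key=matches_by_day.get)
peak_share = matches_by_day[peak_day] / total
# Average over the six days other than the peak day.
rest_avg = (total - matches_by_day[peak_day]) / (len(matches_by_day) - 1)

print(total)                 # 248
print(peak_day)              # Sat
print(f"{peak_share:.1%}")   # 29.8%
print(rest_avg)              # 29.0
```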

🔎

Qwen 3.5 Plus Debuts with Promising Vision Results

Alibaba's Qwen 3.5 Plus (February 2026 release) entered PlayTheAI this week with 11 matches across 4 games. The vision variant stands out with a 37.5% average win rate, including a Connect4 win (1030 ELO) and 2 TicTacToe wins (1070 ELO over 5 matches). The text variant played 3 matches without a win so far. With WordDuel and Battleship also tested, Qwen 3.5 Plus is building a profile across the full game lineup — early signs suggest the vision mode may be its stronger configuration.

🤖 or-qwen3.5-plus-02-15:vision:off