Blog

Which Game is Hardest for AI? A 4,150-Match Analysis

From 0.3% win rate in Battleship to 8.9% in WordDuel — five games reveal very different AI capabilities

Text Mode vs Vision Mode: Does Seeing the Board Help AI?

15 models tested in both modes across 4,011 matches. Text wins 59% of head-to-head comparisons.

Open Beta Status: 4,150 Matches Overview

31 AI model variants across 5 games — who handles human opponents best?

February 2026 Leaderboard: Who Leads the AI Rankings?

After 4,150 matches — Gemini and Claude battle for the top spot across 5 games

Battleship Deep Dive: AI Naval Strategy Analysis

398 matches, 31 model variants, 1 AI victory — why Battleship remains the hardest challenge for LLMs

Open Beta Status: 805 Matches Overview

16 AI models tested across 5 games - humans lead with 96% win rate

Jan 5, 2026

📰 Weekly Digest

📈 Weekly Digest

April 5, 2026 • 97 Matches

Week 14: March 30 - April 5, 2026

97 matches (2× previous week) • Saturday record: 65 matches • Claude Opus 4.6 Vision +52 ELO • 25 repetition bugs across 10 models

🤖 38 Models 💡 7 Insights ⏱️ 5 min

🎯 Weekly Digest

March 29, 2026 • 46 Matches

Week 13: March 23-29, 2026

46 matches • Connect4 column-3 repetition trap across 5 models • Gemini Flash leads WordDuel (1272 ELO) • Claude 4.6 early results

🤖 38 Models 💡 8 Insights ⏱️ 5 min

March 22, 2026 • 71 Matches

Week 12: March 16-22, 2026

71 matches across 38 models • Claude 4.6 debuts in 4 variants • Sonnet 4.6 leads ELO gains (+54) • 9 repetition patterns observed

🤖 38 Models 💡 8 Insights ⏱️ 5 min

🚀 Weekly Digest

March 15, 2026 • 99 Matches

Week 11: March 9-15, 2026

99 matches across 38 models • Activity up 5× from last week • 25 repetition patterns detected in Connect4 & TicTacToe

🤖 38 Models 💡 7 Insights ⏱️ 4 min

March 8, 2026 • 20 Matches

Week 10: March 2-8, 2026

20 matches across 12 models • Claude 4.6 models debut • Connect4 shows positive ELO trends

🤖 12 Models 💡 7 Insights ⏱️ 4 min

March 1, 2026 • 138 Matches

Week 9: February 23 - March 1, 2026

138 matches across 38 models • Claude 4.6 family debuts (+55 ELO) • Column 3 pattern in 17 Connect4 matches

🤖 38 Models 💡 8 Insights ⏱️ 5 min

February 22, 2026 • 248 Matches

Week 8: February 16-22, 2026

248 matches across 38 models • Claude 4.6 debuts • GPT-5.2 leads ELO gains (+69) • 33 repetition patterns observed

🤖 38 Models 💡 7 Insights ⏱️ 5 min