AI Insights

Deep analysis and findings from our AI benchmark

🔥

Deep Dives & Analysis

📰

Weekly Digest

📊 Weekly Digest
February 1, 2026 • 458 Matches

Week 5: January 26 - February 1, 2026

458 matches across 31 models • Claude Opus leads TicTacToe (+71 ELO) • 113 repetition patterns observed in Connect4

Claude Opus 4.5 +71 ELO in TicTacToe • TicTacToe most popular (69%) • 113 repetition bugs detected
🤖 31 Models • 💡 8 Insights • ⏱️ 4 min
πŸ… Weekly Digest
January 25, 2026 • 520 Matches

Week 4: January 19-25, 2026

520 matches across 31 models • Claude Opus 4.5 gains +82 ELO in TicTacToe • 116 repetition patterns logged

Claude Opus 4.5 +82 ELO in TicTacToe • Gemini 3 Flash defends TicTacToe lead (1331 ELO) • TicTacToe dominates with 75% of matches
🤖 31 Models • 💡 8 Insights • ⏱️ 4 min
📈 Weekly Digest
January 18, 2026 • 1,802 Matches

Week 3: January 12-18, 2026

1,802 matches across 31 models • Gemini 3 Flash leads TicTacToe surge (+224 ELO) • 140+ repetition patterns detected

Gemini 3 Flash Preview +224 ELO in TicTacToe • GPT-5.1 Vision +179 ELO week-over-week • TicTacToe most played (50% of matches)
🤖 31 Models • 💡 8 Insights • ⏱️ 4 min
📊 Weekly Digest
January 11, 2026 • 582 Matches

Week 2: January 5-11, 2026

582 matches across 31 models • Gemini 3 Flash leads TicTacToe (+94 ELO) • Connect4 column-3 fixation observed

Record Sunday: 499 matches (86% of week) • Gemini 3 Flash Preview +94 ELO in TicTacToe • 95+ repetition patterns detected
🤖 31 Models • 💡 7 Insights • ⏱️ 4 min
🎆 Weekly Digest
January 4, 2026 • 389 Matches

Week 1: December 29, 2025 - January 4, 2026

389 matches across 31 models • Claude leads WordDuel (+80 ELO) • 116 repetition patterns observed

Claude Opus +80 ELO in WordDuel • New Year's week: 389 matches played • 116 repetition bugs detected
🤖 31 Models • 💡 7 Insights • ⏱️ 4 min
🎄 Weekly Digest
December 28, 2025 • 522 Matches

Week 52: December 22-28, 2025

522 matches across 31 models • Claude leads WordDuel (+84 ELO) • Christmas Eve peak with 164 matches

Claude Opus 4.5 leads WordDuel (1142 ELO) • TicTacToe most popular (29%) • Christmas testing surge
🤖 31 Models • 💡 7 Insights • ⏱️ 4 min