There Is No "Best" AI Coding Agent — And That's the Whole Point

TL;DR We pulled 7,156 real pull requests authored by Codex, Claude Code, Cursor, Devin, and Copilot, then sliced the data by task type. The headline: the gap between the best and worst task categories is 29 percentage points, far bigger than the gap between agents within any single category. Codex is the steady generalist. Claude Code wins documentation. Cursor wins bug fixes. Stop asking which agent is best. Start asking best at what.

The wrong question

Walk into any engineering Slack and you’ll see the same debate: Codex vs. Claude Code vs. Cursor vs. Devin vs. Copilot. Threads explode. Benchmarks fly. Someone screenshots a leaderboard. ...

April 14, 2026 · 4 min · Giovanni Pinna

When AI Agents Lie About Their Own Code (Without Meaning To)

TL;DR We looked at 23,247 pull requests written by AI coding agents and asked a simple question: does the description match the diff? In 1.7% of cases, no. Sounds tiny, until you see the consequences. Inconsistent PRs get 51.7% lower acceptance rates and take 3.5× longer to merge. The code is fine. The story the agent tells about the code is the problem.

The part of a PR nobody measures

When we benchmark AI coding agents, we measure code. Does it compile? Does it pass the tests? Is it clean? ...

April 14, 2026 · 5 min · Giovanni Pinna