Same Model, Different Results
Claude Sonnet feels completely different in Cursor, Claude Code, and Antigravity. The toolchain is the product.
Claude Sonnet feels completely different in Cursor, Claude Code, and Antigravity. The toolchain is the product.
Run Claude Sonnet in three different tools and you get three different experiences. Same weights. Same training data. Different output.
Last month I asked all three to trace how auth worked in the same Next.js repo. Cursor pulled the right middleware files on the first try. Claude Code wandered through three directories before it found the session helper. A raw API call with no codebase indexing gave me a plausible-sounding answer that pointed at a file we deleted six months ago.
Same model. The wrapper around it, the context it sees, and the agent loop running the show changed everything.
Builder.io's framing matches what I saw in that test: in 2026 the best LLM for coding is a full stack, not a model name on a leaderboard.
Cursor lives in your editor. It indexes the repo locally, so @Codebase questions pull semantically relevant files into the prompt. Composer applies multi-file edits without leaving the IDE. I reach for it when I already know roughly where the code lives and want to stay in flow.
Claude Code lives in the terminal. It reads files on demand, runs commands, and can hook into external services through MCP. Extended thinking helps on gnarly bugs. I use it when the fix might require running tests, grepping logs, or touching five files I haven't opened yet.
Antigravity is Google's bet on documentation-first agents. It builds context from explicit artifacts and skills rather than raw repo traversal. Different shape entirely. I haven't lived in it the way I live in Cursor and Claude Code, but the contrast is useful: some teams want the agent reading your ADRs, not your entire src/ tree.
Same Sonnet under all three. Day to day, they feel like different products.
Context strategy is the big one. Cursor retrieves files by semantic similarity. Claude Code scopes dynamically as the agent explores. Antigravity leans on structured docs. Coderide.ai put it bluntly: without context the model guesses, with context it knows. My auth trace failed in the API call for exactly that reason.
Tool harness matters too. Cursor gives the model a fixed toolbox for search, edit, and terminal work inside the IDE. Claude Code loads tools through MCP, so the same session can hit your database or a custom integration. A model that can run your test suite and read the failure behaves differently from one that can only patch files.
Hidden prompts are the part nobody talks about. Every product wraps your message in system instructions before Sonnet sees it. Wording shifts in those layers change output more than most benchmark charts admit. OpenAI's prompt guide is about your prompts. Product prompts are the other half of the conversation, and you never get to edit them.
Agent loop design is where sessions diverge over time. Retry logic, linter checks, when to stop and ask you. A tool that validates its own edit against eslint catches mistakes a single-pass tool ships. I've watched Claude Code loop on a flaky test until it passed. I've watched other tools declare victory after one broken attempt.
I stopped asking "which tool is best" and started asking which one fits the task in front of me.
Terminal brain? Claude Code. Live in the editor? Cursor. Monorepo you barely know? Cursor's @Codebase first, then Composer once you've read the map. Greenfield spike where docs are the source of truth? Worth trying Antigravity's angle.
The Qodo comparison landed on something I keep relearning: output quality tracks how clearly you describe the task at least as much as which logo is on the window. A sharp prompt in a mid-tier tool beats a vague one in the flagship.
If you haven't done this yet, pick one real ticket from your backlog. Something bounded. Run it through two tools and compare where each one wasted your time. That experiment taught me more than any model release blog post.
The model race isn't slowing down. New weights every few months. The toolchain you pick today is the product you're actually buying. Context, tools, hidden prompts, retry logic. That's where the gap shows up when you sit down to ship.