Nathan Fennel

Two Models, Thirty-Five Minutes

Anthropic and OpenAI released their flagship models within minutes of each other. Here's what matters for the engineers actually using them.

On February 5th, Anthropic released Claude Opus 4.6. Roughly thirty-five minutes later, OpenAI released GPT-5.3-Codex. Nobody coordinated this. But it was not a coincidence, either.

Both companies are watching each other's release cadence with the intensity of chess players studying each other's openings. When one moves, the other is already staging a response. The result, for those of us actually using these tools daily, is a sudden jump in capability on the same afternoon. You go to lunch with one set of tools and come back to a meaningfully different landscape.

That is interesting in itself. But what actually matters is what changed under the hood, and whether any of it affects how you work tomorrow morning.

What Each Model Brings

Claude Opus 4.6 pushes in the direction of scale and sustained reasoning. It ships with a 1M-token context window, a feature Anthropic calls "Adaptive Thinking" (which dynamically adjusts reasoning depth based on problem complexity), and support for agent teams that can split complex tasks across parallel workers. Where it excels is long-context reasoning: ingesting a large codebase, a full document set, or a sprawling test suite and reasoning over the whole thing holistically. If your workflow involves navigating enterprise-scale codebases or running sustained agentic coding sessions, Opus 4.6 is a meaningful step up from Opus 4.5.

GPT-5.3-Codex takes a different angle. It is 25% faster than GPT-5.2, strong at autonomous debugging, and designed for full software lifecycle management. OpenAI notably disclosed that 5.3-Codex was the first model used to assist in its own creation, which is either impressive or unsettling depending on your disposition. Its strength is fast, focused coding execution: the kind of iterative loop where you describe a change, watch it happen, steer, and repeat.

Reasoning Effort

Both model families now let you dial reasoning effort from low to high. Claude adds a "max" tier above high. The idea is straightforward: low effort is fast and cheap for tasks where the answer is obvious; high effort applies more compute for problems that require multi-step logic, mathematical reasoning, or complex architectural decisions.

The practical takeaway for day-to-day work: medium is fine for most things. You should only reach for high or max when the problem genuinely requires sustained chain-of-thought reasoning. Running every request at max effort is like running every SQL query with EXPLAIN ANALYZE. It is technically more thorough, but it is mostly a waste of time and money. Save the heavy compute for the problems that actually need it.
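In API-driven workflows, that dial is just a request parameter. Here is a minimal sketch of defaulting to medium and escalating only when needed. It assumes the reasoning_effort parameter from the current OpenAI SDK carries over to 5.3-Codex, and the model ID is a placeholder, not a confirmed identifier; Anthropic's equivalent control may be named differently.

from openai import OpenAI

client = OpenAI()

def ask(prompt: str, hard: bool = False) -> str:
    """Default to medium effort; escalate only when the problem needs sustained reasoning."""
    response = client.chat.completions.create(
        model="gpt-5.3-codex",  # placeholder model ID for illustration
        reasoning_effort="high" if hard else "medium",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Routine edit: medium effort is plenty.
ask("Rename the user_id parameter to account_id throughout this function: ...")

# Genuine architecture question: worth the extra compute.
ask("Given these three service boundaries, where should retry logic live?", hard=True)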

When to Reach for Which

Opus 4.6 is the pick when you need to ingest a massive codebase or document set and reason over it as a whole. If the task requires understanding how 30 files interact before making a change, you want the large context window and the deep reasoning. It is also the stronger choice for agentic workflows where the model needs to maintain state across a long sequence of operations.

GPT-5.3-Codex is the pick when you want fast, iterative coding with real-time feedback. If you are building a feature and want tight loop execution (describe, generate, test, refine), the speed advantage is noticeable. It is also solid for autonomous debugging, where the model can identify and fix failures across a test suite without constant hand-holding.

For most engineers, both are good enough that the deciding factor is which ecosystem you are already invested in. If you live in Claude Code or Cursor, Opus 4.6 slots in naturally. If you are in ChatGPT or Codex CLI, 5.3-Codex is the obvious choice. Switching ecosystems for a marginal model difference rarely makes sense.

Cost

Claude Opus 4.6 prices at $5 per million input tokens and $25 per million output tokens, identical to Opus 4.5 (which saw a 66% price drop from the Opus 4.1 era). GPT-5.3-Codex is bundled into ChatGPT paid plans, with standalone API pricing still to be announced. For reference, GPT-5.2 API pricing sits in a comparable range.
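
To put those rates in perspective, here is a back-of-the-envelope calculation for a long Opus 4.6 session at the published prices. The token counts are illustrative guesses at a heavy agentic session, not measured figures.

INPUT_RATE = 5 / 1_000_000    # dollars per input token ($5 per million)
OUTPUT_RATE = 25 / 1_000_000  # dollars per output token ($25 per million)

input_tokens = 800_000   # illustrative: a large slice of a codebase plus conversation history
output_tokens = 40_000   # illustrative: generated diffs, explanations, tool calls

session_cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"${session_cost:.2f}")  # $4.00 + $1.00 = $5.00 for the whole session

Even a session that nearly fills the context window lands in single-digit dollars.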

The trend line is clear: capability is going up while cost per token is trending down. For most teams, the API cost is no longer the bottleneck. The bottleneck is knowing how to use the tools effectively. A team spending $200/month on API calls and losing 10 engineering hours to poor prompting has a people problem, not a pricing problem.

What Wasn't Possible Six Months Ago

Things that were impractical before this generation of models: ingesting an entire monorepo into a single context window and asking architectural questions across the whole thing. Having an agent autonomously debug its own failing tests across multiple services without losing track of what it already tried. Generating and iterating on full pull requests with real-time steering, where the model understands the entire diff and the broader codebase it sits within.

None of these are science fiction anymore. They are Tuesday afternoon workflows. The gap between "proof of concept demo" and "thing I actually use in production" has been closing fast, and this week it closed a bit more.

The interesting story is not that two companies shipped on the same day. It is that the tools are now good enough that the limiting factor has shifted from "can the model do it" to "does the engineer know how to ask." That is the conversation worth having.