Two Models, Thirty-Five Minutes

2 Sents

Claude Opus 4.6 and GPT-5.3-Codex dropped within about thirty-five minutes of each other, and they are optimized for different jobs. Opus 4.6 is better when I need deep, long-context reasoning across a messy codebase, while 5.3-Codex is better in fast edit-test-fix loops and autonomous debugging. For most teams, the bigger limit now is operator skill, not raw model cost.

On February 5, Anthropic released Claude Opus 4.6. About thirty-five minutes later, OpenAI released GPT-5.3-Codex.

The timing made for fun gossip, but what mattered to me was practical. I was already mid-sprint, and both releases changed what I could finish that same day.

Opus 4.6

Opus 4.6 goes big on context and sustained reasoning. You get a 1M-token context window, Anthropic's "Adaptive Thinking," and support for agent teams that split work across parallel workers. I can load a large repo plus tests plus docs and keep one coherent thread.

GPT-5.3-Codex

5.3-Codex goes the other way, toward speed and execution. OpenAI says it is 25% faster than GPT-5.2, and that lines up with what I felt in coding loops. I can ask for a change, run tests, steer, and repeat without waiting long enough to lose focus.

Reasoning effort

Both model families now let you dial reasoning effort from low to high. Claude adds a "max" tier above high.

I keep most tasks on medium. High or max is for messy architecture, multi-step logic, or math-heavy debugging. Running everything at max feels like running every SQL query with EXPLAIN ANALYZE. Thorough, expensive, and mostly unnecessary.
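Here is a rough sketch of that habit as code. The task labels, the function name, and the `escalate` flag are all my own invention for illustration. None of this is either vendor's API; it only picks a string you would then pass to whatever client you use.

```python
# Hypothetical task labels for the "hard" bucket described above.
HARD_TASKS = {"architecture", "multi_step_logic", "math_debugging"}


def pick_effort(task_kind: str, has_max_tier: bool = False, escalate: bool = False) -> str:
    """Return a reasoning-effort label for a task.

    task_kind:    a hypothetical label such as "rename" or "architecture".
    has_max_tier: True if the model family offers a tier above "high".
    escalate:     True if a first attempt at "high" was not good enough.
    """
    if task_kind not in HARD_TASKS:
        # Default: most tasks stay on medium.
        return "medium"
    if escalate and has_max_tier:
        # Only reach for max when high has already fallen short.
        return "max"
    return "high"


print(pick_effort("rename"))
print(pick_effort("architecture"))
print(pick_effort("math_debugging", has_max_tier=True, escalate=True))
```

The point of the `escalate` flag is that max is something you step up to after high falls short, not a starting position.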

Picking one

Opus 4.6 when you need to ingest a massive codebase or document set and reason over it as a whole. Need to understand how thirty files interact before you change anything? That is a job for the large context window and deep reasoning.

GPT-5.3-Codex when you want fast, iterative coding with real-time feedback. Describe, generate, test, refine. Also solid for autonomous debugging across a test suite without constant hand-holding.

For most engineers, ecosystem fit matters more than tiny model deltas. Already deep in Claude Code or Cursor? Opus 4.6 slots in cleanly. Team lives in ChatGPT or Codex CLI? 5.3-Codex is the smoother default. I almost never recommend switching ecosystems for marginal gains.

Cost

Claude Opus 4.6 costs $5 per million input tokens and $25 per million output tokens, same as Opus 4.5. GPT-5.3-Codex is bundled in paid ChatGPT plans, and OpenAI still has not posted standalone API pricing. The closest reference point, GPT-5.2 API pricing, is in a similar neighborhood.
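To make the Opus 4.6 numbers concrete, here is a minimal cost estimate using only the two per-token rates quoted above. It assumes flat rates and ignores any caching, batch, or tiered pricing, and the token counts in the example are made up.

```python
# Rates quoted above for Claude Opus 4.6, in USD per million tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at flat per-token rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical request: a 200k-token repo dump in, a 4k-token answer out.
print(f"${request_cost_usd(200_000, 4_000):.2f}")  # prints $1.10
```

Input dominates here: the 200k tokens in cost $1.00, while the 4k tokens out cost $0.10.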

Capability keeps climbing while per-token cost drifts down. In my experience, most teams now lose more money to weak prompting and poor review habits than to API spend. A team burning 10 engineering hours a month on low-quality agent output should fix workflow before negotiating model pricing.
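A back-of-the-envelope version of that comparison is below. The $100 hourly rate is a placeholder I picked for illustration, not a figure from any survey. The output rate is the Opus 4.6 price quoted earlier.

```python
# Assumption: a loaded engineering cost of $100/hour. Swap in your own number.
HOURLY_RATE_USD = 100.0
# Opus 4.6 output rate quoted earlier, in USD per million tokens.
OUTPUT_USD_PER_MTOK = 25.0


def wasted_labor_usd(hours: float, hourly_rate_usd: float = HOURLY_RATE_USD) -> float:
    """Dollar cost of engineering hours lost to low-quality agent output."""
    return hours * hourly_rate_usd


def equivalent_output_tokens(usd: float, usd_per_mtok: float = OUTPUT_USD_PER_MTOK) -> float:
    """How many output tokens the same dollars would buy at a flat rate."""
    return usd / usd_per_mtok * 1_000_000


lost = wasted_labor_usd(10)  # the 10 hours a month from the paragraph above
print(f"${lost:,.0f} lost, roughly {equivalent_output_tokens(lost):,.0f} output tokens")
```

Under that assumed rate, ten lost hours cost as much as about forty million output tokens, which is why I look at workflow before pricing.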

Six months ago

Six months ago, three workflows were mostly demo material for me.

Loading an entire monorepo into one context and asking cross-system architecture questions.
Letting an agent debug failing tests across multiple services without forgetting prior attempts.
Iterating on full pull requests with live steering while the model tracks both the diff and the surrounding codebase.

Now they are normal Tuesday workflows.

The interesting race is no longer "who shipped this week." It is which teams are building better model operators. That is where the next 10x gains are.