Anthropic's Skill Formation Study: Where AI Helps and Where It Hurts
Anthropic's recent study found AI users scored slightly lower on a coding quiz. The research has real signal, the methodology has limits, and you can still use AI without sacrificing learning.
Anthropic's study showed developers using AI to learn a new Python library scored lower on comprehension quizzes than those who struggled through it manually. Generation-then-comprehension beats blind delegation if you want the tool without hollowing out the learning.
Anthropic ran a randomized trial on whether AI helps you learn to code, or just finish faster. Developers who used AI on unfamiliar Trio tasks scored 17 points lower on the follow-up quiz (50% vs 67%) and saved about two minutes. Twitter read that as "AI makes you dumb." I read it as "AI plus a timed quiz plus a brand-new library is a specific trap."
What they actually did
Fifty-two developers, mostly junior, at least a year of Python, some AI tool familiarity, zero Trio experience. One group got an AI assistant for two Trio tasks in 35 minutes. The other didn't. Everyone knew a quiz was coming and was told to work fast.
The AI group did worse on the quiz. The biggest gap was debugging questions. That part worries me. Debugging is how you catch bad agent output and own code you didn't write. Anthropic frames it as "scalable oversight," which is a fancy way of saying someone still has to notice when the model ships nonsense.
How people used AI mattered more than whether they used it. High scorers (65%+) asked conceptual questions, asked for explanations, or did generation-then-comprehension. Low scorers (under 40%) delegated the coding or let AI debug without following along. Same tool, different habits.
Caveats I'd actually argue about in a review
Time pressure. Thirty-five minutes, quiz looming, "work as fast as possible." Of course some people treated the AI like a finish line. Sprint culture does the same thing. The lab setup may have exaggerated the effect, but I don't think it's fake.
Nobody aced the quiz. No-AI averaged 67%, AI averaged 50%. Real gap, shallow mastery on both sides.
n=52. Twenty-six per arm. The qualitative buckets get silly small. "Generation-then-comprehension" had two people. I'm not building policy on a pair.
Junior cohort. I use AI differently than I did at year two. This paper doesn't tell me how seniors would score.
Chat in a browser, not Cursor. Feels closer to 2024 workflows than how I work today. Another reason not to treat this as the last word on agentic tools.
Two minutes still counts. On a ~22-minute task that's roughly 10% faster. The paper called it not statistically significant. In a week of tickets, I'd notice.
Errors teach. AI smooths them away
The no-AI group hit more errors (median 3 vs 1) and fixed them themselves. Classic Trio faceplants: RuntimeWarning for a coroutine never awaited, TypeError when you pass a coroutine object where an async function belongs. Ugly, instructive.
The AI group skipped most of that friction. Fewer scars, less library intuition. The study may have punished them by robbing them of failure loops that actually teach.
Pasting vs typing made no difference on scores. Nine pasters finished fastest. Nine manual typers learned nothing extra. Retyping AI output is theater. Asking "why does this need await here?" is not. Other work also suggests AI-boosted speed doesn't always stick when you take the tool away, which fits.
Fast when you know the domain, muddy when you don't
Anthropic's older observational studies saw up to ~80% speedups on familiar work. This trial found no significant productivity win. Different job: learning a new library cold.
My read: AI is a multiplier on skills you already have. On net-new material, it can trade depth for done. Use it for routine implementation. Be careful onboarding someone to a framework they've never touched.
The experiment used chat assistance, not Claude Code or Cursor writing into the repo. Agentic tools might offload even more cognition. Worth remembering before you hand a junior an unlimited Composer budget.
"But I'm more scalable now"
Fair pushback. Plenty of seniors say AI frees them for architecture. This study only measured short-term comprehension of Trio, not whether you'll design better systems in five years. Different claim.
What I do with this on my team
When I'm learning something new, I don't let the agent close the loop. I ask why. I make it explain tradeoffs. Same instinct as Ask mode before Edit: understand, then generate.
Claude Code Learning, ChatGPT Study Mode, that whole family exists for acquisition, not just throughput. Turn them on when the goal is skill, not when the goal is "ship before standup."
Managers: if your system only rewards closed tickets, AI will look like a cheat code. Juniors will ship Trio they can't debug. I've felt that after a fast agent session: "I got lazy," "there are still a lot of gaps." Sound familiar.
The study has holes. The signal still landed for me. Tutor, not substitute, when the library is new. Copilot, not crutch, when you've been here before.