Code Review Fatigue and The New Seniority

AI did not remove the hard part of software engineering. It moved it from writing code to proving the code is worth trusting.

Jun 26, 2026

The Verification Tax

I think the fight over AI coding is less about taste than people admit.

At one pole, Andrew Kelley bans LLM-generated contributions from Zig. His argument is economic. Maintainers do not have infinite review time, and a low-effort AI pull request spends the scarcest resource in the project: the attention of someone who can actually tell whether the patch belongs there.

At the other pole, Andrej Karpathy gave us the language of vibe coding, and then the industry took the phrase as permission to stop touching the keyboard. The emotionally honest version of that position is seductive: if the agent can produce the code, why keep paying the human cost of typing it?

Both reactions make sense. That is the uncomfortable part.

The human cost of generating code has gone down significantly while the cost of trusting code has remained the same. Every team that adopts agents eventually runs into the resulting exchange: implementation time turns into verification debt. Sometimes the exchange rate is favorable. Sometimes it is awful.

That is the verification tax.

The Kelley-Karpathy Split Is Rational

I do not read Kelley’s position as nostalgia for hand-written code. I read it as a maintainer protecting the review loop. A compiler project is a hostile environment for median code. It cares about undefined behavior, portability, bootstrap constraints, diagnostics, performance, language design, and a hundred small invariants that are obvious only after years inside the project.

An agent can write code that looks like a contribution, but the contributor remains a human, at least for now. That distinction matters. A human who submits a clumsy patch can learn the project, so the review time spent on them may be an investment. A drive-by AI patch usually has no future learning curve attached to it. The human submitter can disappear, leaving the maintainer to explain the difference between “passes local tests” and “belongs in the compiler.”

So the ban is not irrational. It is a local policy for a system where review bandwidth is the production bottleneck.

Karpathy is also not irrational. He is reacting to a different production function. If I am building a prototype, exploring an API, or generating scaffolding around a clear boundary, an agent can be absurdly useful. I have had the same experience: the code appears faster than my hands can make it. The first draft often gets me to the real question sooner.

The split appears because people are measuring different loops. The enthusiast measures time-to-first-working-version. The skeptic measures time-to-correct-and-maintainable-version. Those are not the same metric.

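The second loop is the one that decides the outcome. Even with agents in it, the work of getting from a first draft to a correct and maintainable version shrinks the expected AI speedup.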

The METR Result Was A Warning, Not A Verdict

The best empirical work I have seen does not give either side a clean victory. METR’s 2025 randomized trial studied 16 experienced open-source developers working on 246 tasks in mature projects they already knew. The developers expected AI tools to make them faster. Afterward, they still felt faster. Measured task completion time went the other way: with the early-2025 tools in that setting, they were slower.

I would not turn that into a law. Sixteen developers is not civilization. The tools have already improved. The study also focused on mature projects, which are exactly where hidden context matters most.

But that is why I take the result seriously.

The interesting finding is not “AI makes developers slower.” The interesting finding is that experienced engineers can feel acceleration while the system slows down. Prompting feels like progress. Watching the agent stream code feels like progress. Accepting a patch feels like progress. The meter spins.

Then the senior engineer starts paying the bill.

They read the diff. They chase the invariant. They notice that the agent used the local helper but bypassed the authorization boundary. They ask why the migration touches a table it does not own. They wonder whether the “cleanup” changed behavior. None of this work vanished. It moved downstream.

A separate 2025 study of Copilot adoption in open-source projects found the same shape from another angle: less-experienced developers produced more, but the added maintenance and review burden shifted toward core developers. That is the profession in miniature. Output rises. Judgment becomes the choke point.

The Dangerous Code Looks Boring

A syntax error is a kindness. The worst AI-generated code is not the code that obviously fails.

The dangerous patch looks cruelly normal. It follows the local naming style. It imports the expected library. It adds a test. The test is often too close to the implementation, but at a glance the ritual has been performed.

Then you look closer.

The schema migration assumes a nullable field is always present because every fixture had it. The retry loop catches the broad exception and converts a partial write into a silent success. The authentication check moved below a cache read because the agent optimized for the happy path. The refactor split one ugly function into four clean ones and lost the fact that a side effect had to happen before the second branch returned.
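
To make the retry-loop case concrete, here is a minimal sketch in Python. Every name in it (`TransientError`, `save_rows_broad`, `save_rows_strict`, `write_row`) is hypothetical and invented for illustration; it is not taken from any real codebase. The first function is the boring-looking version. The second is one possible stricter shape, assuming the storage layer can tell transient failures apart from permanent ones.

```python
class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def save_rows_broad(rows, write_row, attempts=3):
    """Anti-pattern: catches everything, so a failure midway reads as success."""
    for _ in range(attempts):
        try:
            for row in rows:
                write_row(row)
            return True
        except Exception:
            # Swallows permanent failures too, and the next attempt
            # re-writes rows that already succeeded.
            continue
    return True  # gives up quietly: a partial write is reported as success


def save_rows_strict(rows, write_row, attempts=3):
    """Retry each row only on TransientError; anything else propagates."""
    for row in rows:
        for attempt in range(attempts):
            try:
                write_row(row)
                break  # this row is written, move on to the next one
            except TransientError:
                if attempt == attempts - 1:
                    raise  # retries exhausted: fail loudly
    return True
```

Both functions are short and tidy at a glance. Only the second one lets the caller find out that the write stopped halfway.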

This is where senior engineers are feeling the profession change under their feet. The old loop was: think, type, run, revise. The new loop is closer to: specify, generate, inspect, constrain, reject, generate again. That is not a small tooling change. It changes where competence lives.

When I hand-write code, some checks happen before language. My hands do not type the broad catch. My hands do not put the auth check after the cache read. My hands have absorbed scars from old outages, old bugs, old reviews, old shame. The agent has absorbed public code. That is not nothing. It is also not my production history.

So when the agent writes, the senior engineer has to externalize instincts that used to be silent. Write the invariant. Write the acceptance test. Write the boundary. Write the “do not touch this table” instruction. Write the review checklist. The tacit has to become executable.
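
As one illustration of the tacit becoming executable, the instinct "authorization runs before the cache read" can be written down as a test. This is a sketch under assumptions: `fetch_document`, its parameters, and the test name are hypothetical, and a real service would have its own request and cache types.

```python
def fetch_document(user, doc_id, is_authorized, cache, load):
    """Hypothetical handler: authorization runs before any cache read."""
    if not is_authorized(user, doc_id):
        raise PermissionError(f"{user} may not read {doc_id}")
    if doc_id in cache:
        return cache[doc_id]
    cache[doc_id] = load(doc_id)
    return cache[doc_id]


def test_cached_document_still_requires_authorization():
    """The invariant, stated as a test an agent's patch has to pass."""
    cache = {"doc-1": "secret"}  # warm cache: the case the happy path hides

    def deny_all(user, doc_id):
        return False

    try:
        fetch_document("mallory", "doc-1", deny_all, cache, load=lambda d: "secret")
    except PermissionError:
        return  # denied even though the document was cached
    raise AssertionError("cache hit bypassed the authorization check")
```

If an agent later moves the cache read above the check, this test fails without anyone having to remember why the order mattered.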

That is painful because tacit knowledge was part of the status game of engineering. The expert knew without saying. Agents punish that.

Legacy Humans Have The Same Problem As Legacy Systems

There is an obvious version of this argument for companies. A twenty-year-old enterprise system contains valuable domain knowledge in the worst possible format: tribal memory, stale diagrams, incident lore, Jira archaeology, Slack threads, and modules shaped by reorganizations no one remembers.

But the same thing is true of a twenty-year engineer.

The legacy engineer has immense knowledge. The question is whether that knowledge can be turned into constraints an agent can use. If it stays as private taste, the agent cannot benefit from it. If it becomes tests, interfaces, threat models, migration rules, and retrieval context, it becomes a force multiplier.

That is the pivot I care about.

The engineer who refuses all agents may still be right inside a compiler, kernel, database, allocator, or safety-critical system where the review tax overwhelms generation speed. The engineer who refuses to learn agent direction in ordinary product engineering is making a different bet: that typing will remain scarce enough to protect them.

I do not buy that bet.

Typing is getting less scarce. Good constraints are getting more scarce. So is taste. So is the ability to know which generated solution is subtly wrong.

The New Seniority

A junior developer with an agent can now create more surface area than a senior engineer can review. That sentence should make managers nervous.

It does not mean juniors become useless. It means the old apprenticeship model breaks if the junior never develops the internal model the agent is replacing. If all they learn is prompting and acceptance, they become fast at producing code they cannot defend.

It also does not mean senior engineers can retreat into purity. A senior who only says “no AI” may be protecting quality in one loop while losing the larger shift in production. The future engineer has to know how to create a harness around the agent: small diffs, deterministic tests, explicit contracts, locked migrations, typed APIs, repeatable benchmarks, permission boundaries, and review gates that fail before a human gets tired.
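
A review gate of that kind does not have to be elaborate. Here is a minimal sketch, assuming a CI step that already knows which files changed and by how many lines. The `FileChange` type, the 400-line budget, and the protected path patterns are all hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class FileChange:
    path: str
    lines_changed: int


MAX_LINES = 400  # assumed review budget, tune per team
PROTECTED = ("migrations/*", "auth/*")  # assumed paths that need an explicit owner


def gate_violations(changes, max_lines=MAX_LINES, protected=PROTECTED):
    """Return reasons to reject a diff before a human reviewer sees it."""
    violations = []
    total = sum(change.lines_changed for change in changes)
    if total > max_lines:
        violations.append(f"diff too large: {total} lines > {max_lines}")
    for change in changes:
        if any(fnmatch(change.path, pattern) for pattern in protected):
            violations.append(f"protected path touched: {change.path}")
    return violations
```

An empty list means the diff is small enough and stays inside its boundary. Anything else goes back to the agent, not to the reviewer.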

This is where my loyalties are. I am betting on top-down processing, but not the lazy version where a prompt replaces engineering. I mean top-down as constraint design. The human decides the shape of the system, the invariants that matter, the blast radius, the evaluation, and the language in which success is judged.

The agent fills in leaves. The human owns the tree.

That is a more abstract job, but it is not an easier one. It may require more engineering maturity, because the code no longer carries as many visible fingerprints of its author’s uncertainty. The uncertainty is still there. It is just hidden behind fluent syntax.

The verification tax is the price of that fluency.

I do not want to go back to a world where every leaf node is hand-written. I also do not want a world where nobody can explain why the generated forest is safe to walk through.

The interesting work is in the middle: make human judgment explicit enough that machines can act inside it, and make machine output bounded enough that humans can still verify it.

That is the new software engineering, with the private parts of judgment dragged into the light.

Khola.Blog: Post-Human Engineering
