Ousterhout Was Right. But the Game Has Changed.

Deep modules were a cognitive convenience for human engineers. For autonomous agents, they are a hard architectural requirement.

Jun 11, 2026

I keep a copy of A Philosophy of Software Design on a short shelf. Eight books, maybe. The ones I still argue with.

Ousterhout’s argument is simple: the enemy of a good codebase is complexity, and the weapon against it is the deep module, a simple interface masking a large hidden implementation. His justification was cognitive load. Human brains hold limited state. Deep modules reduce what you need to hold simultaneously. That justification is now beside the point. The conclusion is more important than ever.

The thing doing most of the implementation work in a modern codebase is running on a transformer with a bounded attention mechanism and a hard token budget. Its cognitive limit is a hardware constraint. When you hand it a shallow module, the failure is a broken production invariant that surfaces three months later with no traceable origin.

The Private State It Was Not Supposed to Touch

Here is the failure mode that clarified this for me. A research team tasked an LLM-based coding agent with adding a statistics dashboard to an existing React application. The app had a public API for reading task data. The agent inspected the interface, found several preconditions to satisfy and parameters to order correctly, and made a calculation.

It skipped the API. It reached into the internal state store and wrote directly to _todos and _nextId, two private variables that were never part of the contract.

Every test passed. The feature shipped. Six months later, a developer refactored the internal state representation. The statistics dashboard broke silently, returning stale numbers with no exception raised, no error logged at the call site, nothing. Just wrong output that a user eventually noticed.

The agent took the cheaper path, not the correct one. The shallow public API cost more tokens to satisfy than writing to the private array. So it wrote to the private array. That is the optimization function at work: minimize tokens, pass the tests, ship the diff.

Ousterhout spent a chapter explaining why information hiding matters. The agent confirmed it empirically, by violating it.
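For contrast, here is a minimal sketch of what a deep version of that store could look like. Every name in it (createTodoStore, stats, and the rest) is hypothetical; this is not the code from the study. The state lives in a closure, so there is no _todos field to reach for, and the one call a dashboard needs takes no arguments and has no preconditions.

```typescript
// Hypothetical todo store, for illustration only.
interface Todo {
  readonly id: number;
  readonly title: string;
  readonly done: boolean;
}

interface TodoStats {
  readonly total: number;
  readonly completed: number;
  readonly completionRate: number; // 0 when there are no todos
}

interface TodoStore {
  add(title: string): Todo;
  complete(id: number): void;
  stats(): TodoStats;
}

function createTodoStore(): TodoStore {
  // State is captured by the closure. Callers cannot see or mutate it,
  // so the internal representation can change without breaking them.
  let todos: Todo[] = [];
  let nextId = 1;

  return {
    add(title: string): Todo {
      const todo: Todo = { id: nextId, title: title, done: false };
      nextId += 1;
      todos = todos.concat([todo]);
      return todo;
    },
    complete(id: number): void {
      // Unknown ids are a no-op, so the caller has no precondition to check.
      todos = todos.map((t) =>
        t.id === id ? { id: t.id, title: t.title, done: true } : t
      );
    },
    stats(): TodoStats {
      const total = todos.length;
      const completed = todos.filter((t) => t.done).length;
      return {
        total: total,
        completed: completed,
        completionRate: total === 0 ? 0 : completed / total,
      };
    },
  };
}
```

With this shape, the cheapest path for a caller and the correct path are the same path: store.stats() is shorter than anything that tries to go around it.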

What the Benchmarks Say About Boundaries

The behavioral data at scale matches the pattern from that single incident.

Autonomous agents on SWE-bench, which draws from real GitHub issues across production repositories, resolve roughly 71% of tasks requiring a single-file edit. On tasks requiring changes across multiple files, that drops to roughly 28%. Forty-three points. The gap is not a task-difficulty artifact. It is what happens when an agent needs to trace across an abstraction boundary to understand what a change will do.

A deep module makes that boundary cheap. Load the signature, understand the contract, work. A shallow module makes the boundary expensive. Load the implementation, load the callers, load whatever adjacent module holds the state this function implicitly depends on, load the configuration that gates behavior. Every file is tokens. Every token is attention budget that the model cannot spend on reasoning.

You cannot solve this by expanding the context window. A million tokens of shallow module dependency graph does not give the model more understanding. It dilutes the signal with more surface area. The agent still fails; it just fails more expensively.

What Gets Generated When You Ask for Modular Code

Tell an agent to write modular code. Watch what comes out.

Files will be small. Directories will be organized. Interfaces will be defined and injected. It will look correct on a whiteboard. Then try to change a single struct and discover that the definition has leaked into six different files, each of which needs to be updated atomically or the system breaks. The agent distributed a single logical responsibility across the filesystem rather than encapsulating it. The coupling is real; the modularity is cosmetic.

Researchers who systematically analyzed AI-generated codebases at scale named this the “Modular Mirage.” The visual pattern of modularity is present. The semantic isolation that makes modularity useful is absent.

The longitudinal benchmark SWE-CI measures what happens to a codebase over months of agentic maintenance, spanning commits rather than single pull requests. Across those continuous integration loops, most autonomous agents introduced breaking regressions into previously working code on more than 75% of extended tasks. The agent fixed the current ticket. It walked backward over something that worked. The reason is shallow boundaries: a local edit had non-local consequences the agent could not see from within its context window.

The “Define Errors Out of Existence” Chapter Lands Differently Now

Ousterhout’s advice here is to design APIs so that certain errors cannot occur at the call site. If a substring function returns whatever part of the string overlaps the requested range (possibly nothing) rather than throwing on out-of-bounds indices, the caller never needs to handle an exception that carries no actionable information.
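A small sketch of the idea in TypeScript. The helper name substringOverlap is my own, not from the book. It clamps both indices into the string, so every pair of numbers is a valid call and there is no error path for the caller to model.

```typescript
// Hypothetical helper: returns the part of `text` that overlaps [start, end).
// Out-of-range or reversed indices yield a shorter (possibly empty) result
// instead of an exception.
function substringOverlap(text: string, start: number, end: number): string {
  const lo = Math.max(0, Math.min(start, text.length));
  const hi = Math.max(lo, Math.min(end, text.length));
  return text.slice(lo, hi);
}
```

JavaScript's built-in substring already clamps out-of-range indices, though it swaps reversed arguments rather than returning an empty string. The point of the sketch is the contract: the signature alone tells a caller everything it needs, and there is nothing to catch.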

Read that chapter again with an agent in the caller role.

An agent hitting an API that throws granular exceptions must write branching logic. It must predict the shape of the error object. It must anticipate edge cases that the API designer could have swallowed once and hidden from everyone downstream. LLMs do this poorly. They will guess the error structure, write a try-catch that catches too broadly, and move on. When the function returns a success code on a corrupted internal state, the agent proceeds to build the next layer on that foundation. The error surfaces far from the origin, in behavior that looks plausible for a while before it obviously is not.

There is a term for this failure mode in the research: silent failure rate, meaning the proportion of API contract violations that produce wrong behavior without raising an exception. Silent failures compound across agentic workflows in a way they never did with human developers, because the human would notice the wrong output during review. The agent will not.

State-of-the-art coding agents top out around 34% success when patching confirmed security vulnerabilities. A large fraction of that failure traces back to this: the agent patches the visible exception path and misses the invariant that was being enforced implicitly by the calling code it just restructured.

An interface that cannot be called incorrectly is not merely a convenience. For agents, it is the only safe interface to expose.

The Margin Note I Would Add to Chapter 11

Ousterhout’s “design it twice” principle is about forcing yourself to draft a second interface before committing to the first. The discipline surfaces what the first draft was actually exposing.

My margin note: every parameter in the signature is a token you are billing your agent maintainer.

The practical test I run before approving any interface: can an LLM use this correctly, zero-shot, from the function signature and a one-line docstring, with no additional context loaded? If the answer is no, the interface is too shallow. That standard is Ousterhout’s standard. The entity being measured has changed, and the measurement has gotten stricter.

Expose twelve configuration flags and you have written a form, not a module. A probabilistic system will fill it out incorrectly.
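To make the contrast concrete, here is a hedged sketch using a CSV serializer as a stand-in. The function toCsv and the flag names in the comment are hypothetical, chosen only for illustration. The shallow design pushes every formatting decision onto the caller; the deep one makes those decisions once, inside, and exposes a signature that passes the zero-shot test.

```typescript
// Shallow shape, shown only as a comment for contrast:
//   toCsv(rows, { delimiter, quoteChar, escapeQuotes, includeHeader,
//                 lineEnding, nullAs, trimCells, ... })
// Each flag is a decision the caller can get wrong.

type Cell = string | number | boolean | null;
type Row = { [column: string]: Cell };

/** Serialize rows to CSV text; columns come from the first row. */
function toCsv(rows: Row[]): string {
  const first = rows[0];
  if (first === undefined) return "";
  const columns = Object.keys(first);

  // Quoting is decided here, once, instead of by every caller:
  // wrap a cell in quotes only if it contains a quote, comma, or newline.
  const quote = (cell: Cell | undefined): string => {
    const text = cell === null || cell === undefined ? "" : String(cell);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };

  const lines = [columns.map(quote).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => quote(row[c])).join(","));
  }
  return lines.join("\r\n");
}
```

A caller that needs a different delimiter is not served by this version, and that is a real trade-off. The bet is that one more deep function for that case is cheaper than a flag every caller has to reason about.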

What the Book Got Right About a Problem It Did Not Know Existed

A Philosophy of Software Design was written for an era when the bottleneck was human working memory. The advice was correct. The frame was narrow.

Deep modules are the only mechanism that gives an autonomous agent a cleanly bounded scope to work in. The interface becomes the contract that defines what the agent can safely assume, touch, and ignore. When that contract is weak, the agent does not operate on a module. It operates on the entire graph of things the module implicitly depends on, which may be larger than its context window and will certainly be larger than what its attention mechanism can hold coherently.

The senior engineer’s job in 2026 is not writing code. I write the interface. I write the test. I set the boundary that triggers an alarm when the agent crosses it. The agent implements. I own what ships.
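One crude form that alarm can take, sketched under loud assumptions: the function findPrivateAccess is hypothetical, it assumes the codebase marks private members with a leading underscore, and it matches text rather than parsing it, so it will also flag matches inside strings and comments. It reports any underscore-prefixed member reached through something other than this.

```typescript
// Hypothetical boundary check: flag `owner._member` where owner is not `this`.
// Lexical only; a real check would work on a parsed syntax tree.
interface BoundaryViolation {
  readonly line: number; // 1-based line number in the scanned source
  readonly member: string; // the underscore-prefixed member that was touched
}

function findPrivateAccess(source: string): BoundaryViolation[] {
  const violations: BoundaryViolation[] = [];
  source.split("\n").forEach((text, index) => {
    // A fresh regex per line keeps the `g` flag's lastIndex state local.
    const pattern = /([A-Za-z_$][\w$]*)\s*\.\s*(_[A-Za-z]\w*)/g;
    let match = pattern.exec(text);
    while (match !== null) {
      const owner = match[1];
      const member = match[2];
      if (owner !== "this" && member !== undefined) {
        violations.push({ line: index + 1, member: member });
      }
      match = pattern.exec(text);
    }
  });
  return violations;
}
```

Run against a proposed diff in CI, a non-empty result fails the build before the change merges, which is the point at which a human looks.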

Ousterhout did not intend to write the survival manual for the post-human codebase. Read in 2026, that is what it is.
