How Practitioners Are Improving SpecOps

When I first wrote a book about the SpecOps methodology, it represented my experience working to modernize legacy systems in government and my ideas for how we should do that work effectively with AI tools. SpecOps is a different way to think about legacy system modernization. It's built on two proven ideas borrowed from modern software practices. First, Spec-driven development, which says that you write a comprehensive description of how a system should behave before any code gets written. And second, GitOps, which says that the version-controlled representation of a system is the authoritative source of truth about how that system works, and changes flow to it first. Put those two powerful ideas together and you get SpecOps.

That reversal, in which a change reaches the specification before it reaches the code, is what makes SpecOps different. Traditional legacy modernization, whether done by human coders or with AI assistance, converts old code into new code and treats the new code as the source of truth the moment the project ends. Requirements documents and architecture diagrams get produced along the way, but they exist to support the conversion and stop being maintained once it's over. SpecOps decouples understanding how a system works from how it happens to be built. Policy experts and program staff can read and verify a plain-language specification of eligibility rules or benefit calculations in a way they never could verify a Java translation of COBOL. Verified specifications then guide the actual legacy modernization. When something needs to change, the specification changes first.

This matters because it breaks a cycle that government has been burdened with forever. Even a successful conversion produces a new system that starts accumulating technical debt on day one and becomes the next decade's modernization crisis. A verified, maintained specification means a government agency never again gets stuck operating a critical system it depends on but no longer understands. The knowledge outlives the technology.

I wrote the SpecOps book so that practitioners could use it, share it, and improve it. That's now happening, and the improvements are worth writing down and talking about. Two practitioners working independently have extended SpecOps in ways the original articulation didn't fully anticipate, and both extensions converge on the same reality of modernizing legacy systems: a verified specification is essential, but it does not, by itself, guarantee that the system stays aligned with it. Something has to actively watch for the gaps between the system and the spec.

Specifications Need a Layer Above the Rules

The first set of extensions comes from a working engineering team that turned SpecOps into a tool they install into a codebase and run every day. Several of their additions are practical scaffolding, but two reach into the core theory of SpecOps.

The first is the separation of state from motion. A specification describes what should be true forever. It says nothing about the order in which work gets done, what's blocking what, or how anyone knows a piece is finished. The team at Jarvus Innovations added a second artifact they call a plan. Each plan is one bounded chunk of work in its own file, carrying its scope, the specifications it implements, its dependencies on other plans, and a checklist of concrete criteria that have to be satisfied before the work counts as done. Specifications are frozen by review; plans freeze when the work merges, and a frozen plan becomes a permanent record of what got built and what got deferred. This is the part of their approach with no real counterpart in my original description of SpecOps. SpecOps answered what must be true and who verifies it. The plan protocol answers what we work on next, in what order, and how we prove each piece is complete, and it does so in a form that survives the departure of whoever did the work. That directly serves the knowledge-preservation goal at the heart of SpecOps.
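To make the shape of a plan concrete, here is a minimal Python sketch. It is not the Jarvus team's format: the class, the field names, and the two helper functions are all hypothetical, chosen only to mirror the parts described above (scope, implemented specifications, dependencies, and a completion checklist).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Plan:
    """One bounded chunk of work. All field names are illustrative assumptions."""
    name: str
    scope: str
    implements: list[str]                                      # specifications this plan implements
    depends_on: list[str] = field(default_factory=list)        # names of other plans
    checklist: dict[str, bool] = field(default_factory=dict)   # criterion -> satisfied?
    frozen: bool = False                                       # set once the work merges

    def is_done(self) -> bool:
        # Work counts as done only when every listed criterion is satisfied;
        # a plan with no criteria is never considered done.
        return bool(self.checklist) and all(self.checklist.values())


def work_order(plans: list[Plan]) -> list[str]:
    """Return plan names in an order that respects their dependencies."""
    graph = {p.name: set(p.depends_on) for p in plans}
    return list(TopologicalSorter(graph).static_order())


def next_ready(plans: list[Plan]) -> list[str]:
    """Plans that are not yet done and whose dependencies are all done."""
    done = {p.name for p in plans if p.is_done()}
    return [p.name for p in plans
            if p.name not in done and set(p.depends_on) <= done]
```

With a structure like this, "what do we work on next, and in what order" becomes a question the files can answer on their own, whoever is reading them.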

The second is the elevation of principles to first-class content inside a specification. My original framing treated the specification as the place where system behavior lives: the rules, the calculations, the edge cases. The Jarvus team noticed that whoever implements a specification, a person or an AI agent, makes hundreds of small decisions, and enumerated rules can only cover the cases someone thought to write down. So they add the reasoning behind the rules. A principle is a decisive statement that picks a side of a real trade-off, written specifically to resolve the cases no rule anticipated, the same way the original author would have. The bar is that it has to rule something out. "The interface should be user-friendly" is useless. "Show stale data with a timestamp rather than block the screen on a refresh, because this tool runs in places with no signal" is enforceable. This is a valuable addition. SpecOps preserves what a system does. This extension argues you also have to preserve the judgment behind what it does, or the next person quietly rebuilds it as something else.

Generated Code Drifts from the Behavior It Was Supposed to Preserve

The second set of extensions comes from Ryan Mahoney, a seasoned SpecOps practitioner who focused on a problem I probably underplayed: even after experts verify a specification, the code an AI generates from it can fail to match the legacy system it replaces. Code generation is itself a lossy step. It drops error handlers, weakens concurrency, loses edge cases, and does so silently, because nothing in a specification audit can see what the generator actually produced.

The technique here is a clever inversion of the SpecOps method itself. After a round of code generation, you run the same analysis you ran on the legacy system, the identical prompt and structure, against the freshly generated code, treating that new code as if it were itself legacy. Then you compare the two descriptions. The original SpecOps approach distrusts a code translation enough to analyze the legacy system into a readable specification. This extension applies the same distrust to the output, on the reasonable grounds that generated code deserves no more automatic trust than the COBOL it replaced.

What makes this valuable is the taxonomy that comes with it: a structured account of how generated code loses fidelity. Behavior goes missing, with logging and error modes the usual casualties. Behavior gets weakened, where "exactly three retries with exponential backoff" degrades into "retries on failure." Behavior changes outright, in defaults or ordering or concurrency. Edge cases disappear. And modernized code sometimes gains behaviors the original never had, which catches scope creep as well as loss. The SpecOps book describes where AI hallucination happens, inventing rules and fabricating cases. This is the operational complement: a map of where AI code generation silently subtracts. For a benefits or tax system, silent subtraction is the more dangerous of the two.
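The taxonomy above lends itself to a small data model. The sketch below is an illustration in Python, not Ryan's tooling: it assumes each analysis has already been reduced to a dictionary of named behaviors, and the `Behavior` type, the behavior keys, and the `candidate_divergences` function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Drift(Enum):
    """The five ways generated code can diverge, per the taxonomy above."""
    MISSING = "behavior missing"
    WEAKENED = "behavior weakened"
    CHANGED = "behavior changed"
    EDGE_CASE_LOST = "edge case lost"
    ADDED = "behavior added"


@dataclass(frozen=True)
class Behavior:
    description: str
    is_edge_case: bool = False


def candidate_divergences(legacy: dict[str, Behavior],
                          generated: dict[str, Behavior]) -> list[tuple[str, Drift]]:
    """Compare two behavior descriptions and return provisional findings.

    A plain text comparison cannot tell WEAKENED from CHANGED, so every
    differing description is provisionally labeled CHANGED and left for
    a reviewer to reclassify.
    """
    findings = []
    for key, behavior in legacy.items():
        if key not in generated:
            kind = Drift.EDGE_CASE_LOST if behavior.is_edge_case else Drift.MISSING
            findings.append((key, kind))
        elif generated[key].description != behavior.description:
            findings.append((key, Drift.CHANGED))
    for key in generated:
        if key not in legacy:
            findings.append((key, Drift.ADDED))
    return findings
```

For example, a legacy entry described as "exactly three retries with exponential backoff" and a generated entry described as "retries on failure" would surface as a provisional `Drift.CHANGED` finding, which a reviewer would then reclassify as weakened.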

Ryan adds two further disciplines worth understanding. The first is epistemic humility about the audit itself. Both descriptions being compared are AI-generated, so a difference between them might be nothing more than two different ways of describing identical behavior. Every flagged divergence therefore gets checked against the actual generated code, which is treated as the only real ground truth, and false alarms are expected as a normal result rather than a failure. The second is treating completion as something you measure rather than declare. The back end of a migration becomes a loop: generate, audit for drift, correct, generate again, and keep going until only cosmetic differences remain. Severity sets the gate. A single critical divergence blocks completion no matter how few divergences remain overall, and if the corrections stop shrinking from one round to the next, that plateau is itself a signal that the generator has a systematic blind spot worth investigating.
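The completion gate and the plateau signal can both be written down as small checks. This Python sketch rests on assumptions that are mine rather than Ryan's: the three severity levels (the text names only "cosmetic" and "critical"), the function names, and the two-round window used to decide that corrections have stopped shrinking.

```python
from enum import IntEnum


class Severity(IntEnum):
    COSMETIC = 0
    MAJOR = 1      # assumed intermediate level, not named in the text
    CRITICAL = 2


def is_complete(confirmed: list[Severity]) -> bool:
    """The gate: pass only when every confirmed divergence is cosmetic.

    `confirmed` should hold only divergences already checked against the
    generated code, so false alarms never reach the gate.
    """
    return all(s == Severity.COSMETIC for s in confirmed)


def has_plateaued(corrections_per_round: list[int], window: int = 2) -> bool:
    """True when each of the last `window` rounds failed to shrink the
    correction count relative to the round before it."""
    if len(corrections_per_round) < window + 1:
        return False  # not enough history to judge
    recent = corrections_per_round[-(window + 1):]
    return all(later >= earlier for earlier, later in zip(recent, recent[1:]))
```

Under these assumptions a migration that logs 12, 7, 7, and 7 corrections across four rounds would be flagged as plateaued, while one that logs 12, 7, 4, and 2 would not. The gate is deliberately independent of the count: one critical finding among many cosmetic ones still blocks completion.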

The Concept of "Spec Drift" Is More Than Just One Thing

Put these enhancements side by side and a pattern emerges that neither states alone. In my original vision, the verified specification was the thing that prevented drift. What these practitioners have highlighted is that "drift" is not a single problem. There are at least three types, and each needs its own mechanism.

There is drift within a specification, where it becomes ambiguous or contradicts itself. There is drift between a living specification and its code, where the two fall out of sync over time as a system evolves. And there is migration drift, where freshly generated code fails to preserve what the legacy system actually did. A verified specification, all on its own, doesn't really address any of these. It makes all three findable, which is something, but finding them takes deliberate auditing, not faith that the specification's existence keeps everything magically aligned.

I'm more convinced now than I was when I published the book that SpecOps is the right way to approach legacy modernization. But the ideas in the book are only the beginning. It's the people practicing this approach who will write the next chapter in the SpecOps story.

If you're applying SpecOps to your own systems and finding gaps the original ideas didn't cover, I'd love to hear about it. That's how this approach gets better.