Return on Code

The Return on Code glossary.

A plain-language vocabulary for measuring the ROI of AI-generated code: the metrics Codelitics uses to tell durable value apart from raw output across every AI coding tool a team runs. Our Claude Code ROI guide applies these metrics to a specific tool.

Return on Code (RoC)
Code Yield
Code Half-Life
Survival rate
Cost per realized change
Verification tax
Tool yield
AI-authored line
Tokenmaxxing

Return on Code (RoC)

The realized return on AI-generated code: how much of what you spent on AI coding tools became durable, verified, goal-linked code that shipped.

Return on Code is the AI-era successor to vanity productivity metrics. Where older measures counted output (lines, suggestions accepted, pull requests), RoC anchors on outcomes: code that shipped, survived, and served a goal, set against what it cost. It is deliberately repo-local and tool-neutral, so the same definition applies across every AI coding tool a team runs.

Code Yield

The share of committed AI-authored code that ships, lasts, and matters: the product of three gate yields, reported on a 0–100 scale.

Code Yield is the headline Return on Code metric. It multiplies three gates rather than averaging them:

Code Yield = Ship × Last × Matter

Ship: the fraction of AI-authored lines that reach your default branch instead of being abandoned or reverted before merge.
Last: the fraction still present and load-bearing at 30 and 90 days, not just technically present in a dead file.
Matter: the fraction tied to a real goal and not implicated in a revert, hotfix, or incident.

The rolled-product form is the point: value leaks at every gate, so three gates at 80% yield 0.8³ ≈ 51%, not 80%. That makes the number hard to game by inflating any single stage.
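The arithmetic can be sketched in a few lines of Python. This is a minimal illustration of the formula above, not Codelitics' implementation; the function name `code_yield` and the sample gate values are assumptions made for the example.

```python
def code_yield(ship: float, last: float, matter: float) -> float:
    """Return Code Yield on a 0-100 scale from three gate fractions in [0, 1]."""
    for name, value in (("ship", ship), ("last", last), ("matter", matter)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Multiply the gates instead of averaging them, so losses compound.
    return 100.0 * ship * last * matter


# Three gates at 80% each compound to roughly 51, not 80.
even = code_yield(0.8, 0.8, 0.8)

# Pushing one gate to 100% does not rescue two weak gates (illustrative values).
inflated = code_yield(1.0, 0.5, 0.5)

print(round(even, 1), round(inflated, 1))
```

An average of the second set of gates would report about 67; the product reports 25, which is why inflating a single stage does little for the headline number.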

Code Half-Life

The time it takes for half of a cohort of AI-written code to be rewritten or deleted. A durability number, reported per tool and per model.

Code Half-Life applies survival analysis to a cohort of AI-authored lines from a given period and reports the point at which 50% has been modified or removed. It captures persistence in one vivid, screenshot-friendly figure ("our AI code half-life is six weeks"). It is a diagnostic companion to Code Yield, not a replacement: half-life measures how long code lasts, while Code Yield measures how much of it ships, lasts, and matters.
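One common way to do this kind of survival analysis is a Kaplan-Meier estimate, which handles lines that are still intact when observation stops. The sketch below assumes that technique; the source does not say which estimator is used, and the function name, record shape, and cohort data are hypothetical.

```python
from typing import Optional


def half_life_days(lifetimes: list[tuple[int, bool]]) -> Optional[int]:
    """Kaplan-Meier median: first day estimated survival falls to 50% or less.

    Each item is (days_observed, died). died=False means the line was still
    intact when observation stopped (right-censored).
    Returns None if survival never reaches 50% within the observed window.
    """
    at_risk = len(lifetimes)
    survival = 1.0
    for day in sorted({days for days, _ in lifetimes}):
        deaths = sum(1 for days, died in lifetimes if days == day and died)
        if deaths:
            survival *= 1 - deaths / at_risk
        if survival <= 0.5:
            return day
        # Every line observed for exactly this long leaves the risk set.
        at_risk -= sum(1 for days, _ in lifetimes if days == day)
    return None


# Hypothetical cohort of eight lines: five were rewritten or deleted,
# three were still intact when observation ended.
cohort = [
    (5, True), (12, True), (20, True), (42, True), (42, True),
    (60, False), (90, False), (90, False),
]
print(half_life_days(cohort))
```

For this made-up cohort the estimate crosses 50% on day 42, a six-week half-life. A cohort in which most lines are still intact returns `None`, meaning the half-life is longer than the observation window.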

Survival rate

The percentage of AI-generated lines still present in the repository at 7, 30, or 90 days after they first landed.

Survival rate is the most intuitive durability signal: of the code an AI tool wrote, how much is still in the codebase weeks later? It is the foundation of the Last gate in Code Yield. On its own, raw survival can over-credit code that persists only because nobody dares touch it, which is why a rigorous version also checks that surviving code is load-bearing.
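A raw survival rate can be sketched as follows. The record shape (landing date plus optional removal date) and the sample data are assumptions for illustration, and this version deliberately does not attempt the load-bearing check described above.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical record: (date the line landed, date it was removed, or None).
Line = tuple[date, Optional[date]]


def survival_rate(lines: list[Line], window_days: int, as_of: date) -> Optional[float]:
    """Percent of lines still present window_days after landing.

    Lines too young to have reached the window are excluded, so recent
    code cannot inflate the figure. Returns None if no line is old enough.
    """
    window = timedelta(days=window_days)
    eligible = [(landed, removed) for landed, removed in lines
                if landed + window <= as_of]
    if not eligible:
        return None
    survived = sum(1 for landed, removed in eligible
                   if removed is None or removed > landed + window)
    return 100.0 * survived / len(eligible)


jan1 = date(2024, 1, 1)
lines = [
    (jan1, None),                 # still present
    (jan1, date(2024, 1, 5)),     # removed after 4 days
    (jan1, date(2024, 2, 15)),    # removed after 45 days
    (jan1, None),                 # still present
    (date(2024, 4, 20), None),    # too recent for the 30- and 90-day windows
]
rates = {d: survival_rate(lines, d, date(2024, 4, 30)) for d in (7, 30, 90)}
print(rates)  # {7: 80.0, 30: 75.0, 90: 50.0}
```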

Cost per realized change

Total AI spend plus the human cost of verifying the output, divided by the number of changes that actually shipped and stuck. Sometimes phrased as cost per surviving line.

Cost per realized change is what makes cross-tool comparison coherent. It collapses a flat-seat tool, a token-priced tool, and a hybrid into one comparable figure by putting spend over outcome. The numerator includes not just tokens and subscriptions but also inferred verification cost: the human time AI work consumed, estimated from commit-to-merge latency and fixup density rather than from surveys.
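As a rough sketch of the calculation, assuming verification time has already been inferred and is priced at a single hourly rate: the `ToolPeriod` fields, the rate, and every number below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolPeriod:
    name: str
    tool_spend: float          # seats, tokens, or both, in dollars
    verification_hours: float  # inferred human review and fixup time
    realized_changes: int      # changes that shipped and stuck


def cost_per_realized_change(period: ToolPeriod, hourly_rate: float) -> Optional[float]:
    """(tool spend + verification cost) / realized changes, or None if nothing stuck."""
    if period.realized_changes == 0:
        return None
    total_cost = period.tool_spend + period.verification_hours * hourly_rate
    return total_cost / period.realized_changes


# Illustrative figures only: a flat-seat tool and a token-priced tool.
seat_tool = ToolPeriod("flat-seat", tool_spend=400.0, verification_hours=10.0, realized_changes=50)
token_tool = ToolPeriod("token-priced", tool_spend=900.0, verification_hours=6.0, realized_changes=60)

seat_cost = cost_per_realized_change(seat_tool, hourly_rate=100.0)
token_cost = cost_per_realized_change(token_tool, hourly_rate=100.0)
print(seat_cost, token_cost)  # 28.0 25.0
```

In this made-up case the tool with the larger bill comes out cheaper per realized change once verification time is counted, which is the kind of reversal the metric is meant to surface.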

Verification tax

The hidden human time AI-generated code consumes before it can be trusted, measured from commit-to-merge latency and fixup-commit density.

Faster generation is not free. Code that arrives quickly still has to be read, tested, and often corrected. The verification tax makes that cost visible by inferring it from observed signals instead of self-reports, capturing the J-curve where AI accelerates typing but shifts effort into review. It feeds directly into cost per realized change.
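The two observed signals can be computed directly from change history. The sketch below stops at the signals themselves, because the source does not give a formula for converting them into hours or dollars; the dictionary keys and sample changes are hypothetical.

```python
from datetime import datetime
from statistics import median


def verification_signals(changes: list[dict]) -> dict:
    """Median commit-to-merge latency (hours) and fixup commits per commit."""
    latencies = [
        (c["merged_at"] - c["first_commit"]).total_seconds() / 3600
        for c in changes
    ]
    total_commits = sum(c["commits"] for c in changes)
    fixups = sum(c["fixup_commits"] for c in changes)
    return {
        "median_latency_hours": median(latencies),
        "fixup_density": fixups / total_commits,
    }


# Three illustrative AI-assisted changes.
changes = [
    {"first_commit": datetime(2024, 3, 1, 9), "merged_at": datetime(2024, 3, 1, 13),
     "commits": 3, "fixup_commits": 1},
    {"first_commit": datetime(2024, 3, 2, 9), "merged_at": datetime(2024, 3, 2, 19),
     "commits": 5, "fixup_commits": 2},
    {"first_commit": datetime(2024, 3, 3, 9), "merged_at": datetime(2024, 3, 4, 15),
     "commits": 2, "fixup_commits": 0},
]
signals = verification_signals(changes)
print(signals)  # {'median_latency_hours': 10.0, 'fixup_density': 0.3}
```

Comparing these signals against the same figures for comparable human-only changes is one plausible way to isolate the extra review effort, though that baseline is an assumption rather than something the definition specifies.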

Tool yield

Survival and cost broken down per AI coding tool and stratified by task type, so each tool's contribution can be compared fairly.

Tool yield is the procurement view: which tools earn their seat. A naïve "Tool A 31% vs Tool B 22%" ranking is invalid, because it is confounded by which work each tool was given and how much. A valid comparison stratifies by task type, repo area, and author seniority, and reports confidence alongside every figure. It is the cross-tool view no single vendor can produce, because each one only sees its own usage.
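A small, invented dataset shows how a pooled ranking can mislead. In the sketch below, the row layout, tool names, task types, and counts are all hypothetical, and confidence intervals are left out for brevity.

```python
from collections import defaultdict

# Hypothetical rows: (tool, task_type, lines_authored, lines_surviving).
rows = [
    ("Tool A", "boilerplate", 900, 720),
    ("Tool A", "refactor", 100, 30),
    ("Tool B", "boilerplate", 100, 85),
    ("Tool B", "refactor", 900, 360),
]


def survival_by(rows, key):
    """Survival percentage per group, where key(row) names the group."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        bucket = totals[key(row)]
        bucket[0] += row[2]  # lines authored
        bucket[1] += row[3]  # lines surviving
    return {group: 100.0 * surviving / authored
            for group, (authored, surviving) in totals.items() if authored}


pooled = survival_by(rows, key=lambda r: r[0])
stratified = survival_by(rows, key=lambda r: (r[0], r[1]))

print(pooled)      # {'Tool A': 75.0, 'Tool B': 44.5}
print(stratified)  # Tool B leads in both task types: 85.0 vs 80.0 and 40.0 vs 30.0
```

Pooled, Tool A appears far ahead. Within each task type, Tool B survives better; Tool A only looks stronger because it was given mostly boilerplate.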

AI-authored line

A line of code attributed to an AI tool: the atomic unit Return on Code metrics are built on.

Everything downstream depends on knowing which code an AI tool wrote. Attribution is resolved at the change or session level, ideally consuming an existing git-notes attribution layer and falling back to commit trailers or telemetry where needed. Every reported figure carries its attribution method and a confidence level, so the number can be audited rather than taken on faith.
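The fallback order can be sketched as a simple resolver. The source names the three evidence sources but not how confidence is graded, so the labels "high", "medium", and "low" below are assumptions, as are the function name and the tool names in the example.

```python
from typing import NamedTuple, Optional


class Attribution(NamedTuple):
    tool: str
    method: str      # which evidence source resolved the attribution
    confidence: str  # hypothetical grading, not a documented scale


def resolve_attribution(
    note_tool: Optional[str],
    trailer_tool: Optional[str],
    telemetry_tool: Optional[str],
) -> Optional[Attribution]:
    """Prefer git-notes evidence, then commit trailers, then telemetry."""
    sources = (
        ("git-notes", note_tool, "high"),
        ("commit-trailer", trailer_tool, "medium"),
        ("telemetry", telemetry_tool, "low"),
    )
    for method, tool, confidence in sources:
        if tool:
            return Attribution(tool, method, confidence)
    return None  # no evidence: treat the change as unattributed


from_notes = resolve_attribution("tool-x", "tool-y", None)
from_trailer = resolve_attribution(None, "tool-y", "tool-y")
unattributed = resolve_attribution(None, None, None)
print(from_notes, from_trailer, unattributed)
```

Returning the method and confidence alongside the tool name keeps each downstream figure auditable.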

Tokenmaxxing

Throwing an expensive, overkill model at work a cheaper one would have shipped just as well, burning budget without improving outcomes.

Tokenmaxxing is what cost-blind AI adoption looks like in practice: maximizing token usage as if it were a proxy for value. You can only catch it if you can see survival and cost per tool and per workflow. Otherwise an expensive model quietly drains budget on trivial work, and the bill is the only evidence it ever happened.
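With survival and cost available per workflow and per model, a first-pass check might look like the sketch below. The two-point survival margin, the data layout, and the workflow and model names are all assumptions for illustration, not a documented rule.

```python
def flag_tokenmaxxing(stats: dict, min_gain_pts: float = 2.0) -> list[tuple[str, str, str]]:
    """Flag (workflow, pricier_model, cheaper_model) where the extra cost buys
    less than min_gain_pts percentage points of survival.

    stats maps workflow -> model -> (survival_pct, cost_per_surviving_line).
    """
    flagged = []
    for workflow, models in stats.items():
        cheapest = min(models, key=lambda m: models[m][1])
        base_survival, base_cost = models[cheapest]
        for model, (survival, cost) in models.items():
            if model == cheapest:
                continue
            if cost > base_cost and survival - base_survival < min_gain_pts:
                flagged.append((workflow, model, cheapest))
    return flagged


# Illustrative figures only.
stats = {
    "docstrings": {"large-model": (71.0, 0.09), "small-model": (70.0, 0.02)},
    "migrations": {"large-model": (64.0, 0.12), "small-model": (41.0, 0.05)},
}
flags = flag_tokenmaxxing(stats)
print(flags)  # [('docstrings', 'large-model', 'small-model')]
```

In this example the larger model earns its cost on migrations but not on docstrings, where the cheaper model survives almost as well at a fraction of the price.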