Code Yield
Code Yield is the share of AI-authored code that ships, lasts, and matters.
It is the headline Return on Code metric, computed as a rolled product of three gates rather than an average. Where acceptance rate stops at the keystroke, Code Yield follows each line through to whether it reached the default branch, stayed load-bearing, and served a real goal. It is repo-local and tool-neutral, so the same definition holds across every AI coding tool a team runs.
The three gates
Each gate asks a harder question than the last.
A line of AI-authored code has to clear all three gates to count. Each one is a fraction of the cohort that came before it, which is why the order matters and the failures compound.
It reaches the default branch
The fraction of AI-authored lines that actually land on main, instead of being abandoned, rewritten, or reverted before merge. A line generated in a session that never ships counts for nothing here.
It is still load-bearing at 30 and 90 days
The fraction still present and doing work weeks later, not just technically present in a dead file. Code that survives only because nobody dares touch it does not pass this gate.
It is tied to a goal and clean
The fraction connected to a real ticket or objective and not implicated in a revert, hotfix, or incident. Lines that shipped and survived but were never part of anything that mattered are excluded.
Each gate draws on the same attribution layer, so every Code Yield figure carries the method and confidence behind it.
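The cohort logic above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the `LineRecord` type and its field names are hypothetical, and it assumes the attribution layer has already decided, per line, whether each gate was cleared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineRecord:
    """One AI-authored line. All field names are hypothetical."""
    reached_default_branch: bool   # gate 1: landed on main
    load_bearing_at_30d: bool      # gate 2: still present and doing work
    tied_to_goal_and_clean: bool   # gate 3: linked to a goal, no revert/hotfix/incident


def gate_yields(lines):
    """Return (ship, last, matter), each a fraction of the cohort before it."""
    shipped = [rec for rec in lines if rec.reached_default_branch]
    lasted = [rec for rec in shipped if rec.load_bearing_at_30d]
    mattered = [rec for rec in lasted if rec.tied_to_goal_and_clean]

    def frac(part, whole):
        # An empty cohort passes nothing on, so treat its yield as 0.
        return len(part) / len(whole) if whole else 0.0

    return frac(shipped, lines), frac(lasted, shipped), frac(mattered, lasted)


def code_yield(lines):
    """Rolled product of the three gates on a 0-100 scale."""
    ship, last, matter = gate_yields(lines)
    return 100 * ship * last * matter


# Made-up cohort of 10 lines: 8 ship, 6 of those last, 3 of those matter.
cohort = (
    [LineRecord(True, True, True)] * 3
    + [LineRecord(True, True, False)] * 3
    + [LineRecord(True, False, False)] * 2
    + [LineRecord(False, False, False)] * 2
)
print(gate_yields(cohort))        # (0.8, 0.75, 0.5)
print(round(code_yield(cohort)))  # 30
```

Each gate filters the survivors of the previous one, which is what makes the denominators shrink and the failures compound.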
A worked example
Watch a healthy-looking week collapse to a 54.
The numbers below are illustrative, not a measured benchmark or a real customer result. They exist to show how the rolled product behaves.
| Gate | What it asks | Lines passing | Gate yield |
|---|---|---|---|
| Ship | Reached main | 820 of 1,000 | 0.82 |
| Last | Still load-bearing at 30 days | 680 of 820 | 0.83 |
| Matter | Tied to a closed ticket, clean of reverts | 540 of 680 | 0.79 |
Rolled product: 0.82 × 0.83 × 0.79 ≈ 0.54, reported as a Code Yield of 54 on a 0–100 scale. Just over half of the week's AI-authored lines actually shipped, lasted, and mattered.
The same gates, averaged: (0.82 + 0.83 + 0.79) ÷ 3 ≈ 0.81, which reads as 81 and looks fine. The 27-point gap between 81 and 54 is the leak the rolled product exposes and the average hides.
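The arithmetic can be checked directly from the illustrative counts in the table. The sketch below assumes nothing beyond those numbers; the variable names are ours.

```python
from math import prod

# Illustrative counts from the table: (lines passing, size of the cohort before it).
gates = {
    "Ship": (820, 1000),
    "Last": (680, 820),
    "Matter": (540, 680),
}

yields = {name: passed / cohort for name, (passed, cohort) in gates.items()}

rolled = 100 * prod(yields.values())
averaged = 100 * sum(yields.values()) / len(yields)

print(round(rolled), round(averaged))  # 54 81
```

Because each gate's denominator is the previous gate's numerator, the product telescopes to 540 ÷ 1,000: the share of the original 1,000 lines that cleared all three gates.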
Why multiply, not average
An average forgives a leak. A product compounds it.
Rolled Throughput Yield exists because a process with many stages is only as good as the product of its stages. A factory line where each of five steps runs at 95% does not run at 95% overall; it runs at about 77%, because the losses stack. Code moves through the same kind of pipeline: generated, merged, kept, and justified. Averaging the stages quietly pretends the losses do not stack.
Multiplying also resists gaming. If you flood main with loosely reviewed AI code to push the Ship gate up, you tend to drag Last and Matter down as that code gets reverted or stranded. No single stage can carry a weak score, and a near-zero gate collapses the whole figure regardless of the other two. That is the opposite of acceptance rate, which can look great while almost none of the accepted code survives.
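Both claims are easy to check numerically. The sketch below compares a product with a mean for the five-step factory line, then for a set of gates where one is near zero. The `lopsided` values are made up for illustration.

```python
from math import prod


def rolled_yield(stage_yields):
    """Multiply per-stage yields, so losses stack."""
    return prod(stage_yields)


def mean_yield(stage_yields):
    """Plain average of the stages, for comparison."""
    return sum(stage_yields) / len(stage_yields)


# Five factory steps at 95% each.
factory = [0.95] * 5
print(f"{rolled_yield(factory):.2f}")  # 0.77
print(f"{mean_yield(factory):.2f}")    # 0.95

# Hypothetical gates: strong Ship and Last, near-zero Matter.
lopsided = [0.95, 0.90, 0.05]
print(round(100 * rolled_yield(lopsided)))  # 4
print(round(100 * mean_yield(lopsided)))    # 63
```

The average still reports a passable 63 for the lopsided case, while the product falls to 4.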
This matters more as confidence outruns reality. In a controlled study, experienced developers were measured roughly 19% slower on real tasks when using AI tools, having forecast a speedup of about 24% and still believing afterward that AI had made them about 20% faster. A metric that compounds its gates is built to catch exactly that gap between how productive AI feels and how much of its output endures.
How to read a low score
A low Code Yield is a map, because you can see which gate leaked.
Because the score decomposes into three gates, a low number is diagnostic rather than just discouraging. The same 54 means very different things depending on where it leaked.
Ship leaks
A lot of AI code never reaches main. Look at where sessions stall before merge: prompts that produce throwaway scaffolding, or work generated faster than it can be reviewed.
Last leaks
Code ships but gets rewritten within weeks. That points at quality or fit: AI output that passes review but does not hold up once the next person touches it. Pair this with Code Half-Life.
Matter leaks
Code survives but is tied to reverts, hotfixes, or nothing at all. That is busywork or risk, not realized value, and it is where cost-blind volume hides.
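A minimal sketch of that reading, assuming the three gate yields are already available as fractions. The gate profiles are invented to show that the same headline score can come from different leaks, and the `diagnose` helper and its hint text are illustrative, paraphrased from the descriptions above.

```python
from math import prod

# Where to look first for each leaking gate, paraphrased from the text above.
FIRST_LOOK = {
    "Ship": "where sessions stall before merge",
    "Last": "quality or fit of code that passed review; pair with Code Half-Life",
    "Matter": "code tied to reverts, hotfixes, or no goal at all",
}


def diagnose(yields):
    """Return (score on 0-100, weakest gate, where to look first)."""
    score = round(100 * prod(yields.values()))
    weakest = min(yields, key=yields.get)
    return score, weakest, FIRST_LOOK[weakest]


# Two hypothetical teams with the same score and different leaks.
ship_leak = {"Ship": 0.60, "Last": 0.95, "Matter": 0.95}
last_leak = {"Ship": 0.95, "Last": 0.60, "Matter": 0.95}

print(diagnose(ship_leak)[:2])  # (54, 'Ship')
print(diagnose(last_leak)[:2])  # (54, 'Last')
```

Picking the single weakest gate is a simplification; when two gates are close, both deserve a look.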
Reading Code Yield per tool turns it into a procurement signal. The cross-tool view is the one no single vendor dashboard can produce, because each tool only sees its own usage. See it applied to Claude Code, GitHub Copilot, and Cursor.
Code Yield FAQ
What engineering leaders ask about the number.
- What is a good Code Yield?
  There is no published industry benchmark for AI-code survival yet, so we do not quote a universal pass mark. The useful comparison is internal: your own Code Yield over time, and the gap between tools, teams, and workflows on the same codebase. A score that holds steady as AI volume rises is a healthier signal than any single absolute number. Watch the trend and the spread, not a magic threshold.
- How is Code Yield different from acceptance rate?
  Acceptance rate measures a keystroke: did the developer accept the suggestion. It says nothing about whether that code shipped, survived, or mattered. Code Yield starts where acceptance ends. A suggestion can be accepted, merged, and still get reverted a week later, which lowers Code Yield while leaving acceptance rate untouched. Acceptance is an input signal; Code Yield is an outcome.
- How is Code Yield different from Code Half-Life?
  Code Half-Life measures duration: the point at which half of a cohort of AI-authored lines has been modified or removed. Code Yield measures realized return: the rolled product of shipping, surviving, and mattering. Half-life answers how long code lasts; Code Yield answers how much of it was worth keeping. They are companions, not substitutes, and they are computed from the same underlying attribution.
- Why multiply the three gates instead of averaging them?
  Value leaks at every stage, and a rolled product captures that compounding while an average hides it. Three gates at 0.80 average to 80, but multiply to 0.80 cubed, roughly 51. The product is also harder to game: you cannot rescue a weak Code Yield by inflating one stage, because a near-zero gate drags the whole figure down no matter how strong the other two look.
- Can Code Yield be gamed?
  It is deliberately hard to game. Because the gates multiply, padding one stage does little, and the gates pull against each other: shipping more loose code lowers Last and Matter. Every figure is also exportable and traceable to how it was computed, with its attribution method and confidence attached, so a number can be audited rather than taken on faith. You control which repositories and tools are in scope.
- Does measuring Code Yield rank individual developers?
  The public, conformant Code Yield score is reported at the team and tool level and never ranks individuals. Individual-level views do exist, but they are opt-in, sit outside the conformant score, and are governed by your own policy, including GDPR and works-council requirements. The headline number is about how your AI tooling performs, not about scoring people.