“AI writes most of our code” sounds like a startup slogan. Can it be real for an enterprise application — live customers, live billing, a monorepo where a bad merge costs money? At QualityUnit it is. Here is the ten-month trail of evidence, and the rules that make it work.
TL;DR: In ten months, agent-authored work went from the first experimental PRs to 133 of 144 development PRs merged in May (92%), verified by a three-way forensic audit of all 1,409 merged PRs, down to commit trailers and a manual inspection of every unmarked 2026 PR. It didn’t happen by “letting the AI code”; it happened by adding rules: a risk-tier harness config, a staged agent pipeline with bounded review loops, protected paths, and a human holding every merge. The rules are the product. And with a context engine feeding the agents, the same work now costs ~30% less per task (measured in our A/B test).
What it actually takes
Not a tool. A pipeline, a policy file, and a gate, all run by harnext.
The pipeline: staged agents, one human
The harness is harnext, QualityUnit’s open-source, provider-agnostic coding-agent harness. In our production monorepo, every issue that enters the pipeline runs the same gauntlet of CI-triggered agent stages, its progress tracked through labels a human can read at a glance.
Two details matter more than the stage count. The loop is bounded: defects found in review go back to the implementation stage a limited number of times — agents converge or escalate to a human, they don’t thrash. Nothing starts blind: before writing a line, the implementing agent must load the project’s conventions and emit a confirmation block reviewers can check.
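As a rough illustration of the bounded loop, here is a minimal Python sketch. It is not harnext's actual code: the cap of three rounds, the `Outcome` type, the status names, and the stub agents are all assumptions made for the example. The article itself only says that defects go back to implementation "a limited number of times".

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical cap: the article only says "a limited number of times".
MAX_REVIEW_ROUNDS = 3


@dataclass
class Outcome:
    status: str                  # "ready_for_human" or "escalated"
    rounds: int                  # implement/review rounds actually run
    open_defects: List[str] = field(default_factory=list)


def run_bounded_review(
    implement: Callable[[List[str]], str],
    review: Callable[[str], List[str]],
    max_rounds: int = MAX_REVIEW_ROUNDS,
) -> Outcome:
    """Alternate implement and review until clean or the cap is hit."""
    defects: List[str] = []
    for round_no in range(1, max_rounds + 1):
        change = implement(defects)   # first round gets an empty defect list
        defects = review(change)
        if not defects:
            # Clean review: hand over to the human, who still owns the merge.
            return Outcome("ready_for_human", round_no)
    # Cap reached with defects still open: escalate instead of thrashing.
    return Outcome("escalated", max_rounds, defects)


# Stub agents for illustration: the second patch passes review.
def demo_implement(defects: List[str]) -> str:
    return "patch-v2" if defects else "patch-v1"


def demo_review(change: str) -> List[str]:
    return [] if change == "patch-v2" else ["missing test"]


result = run_bounded_review(demo_implement, demo_review)
print(result.status, result.rounds)  # prints: ready_for_human 2
```

The design point is that both exits lead to a person: a clean review goes to the human for the merge decision, and an exhausted loop goes to the human as an escalation with the open defects attached.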
The policy file
The other half is a machine-readable policy: every path in the repo classified into risk tiers, each tier with enforceable gates. CI reads it; merge policy reads it; agents are briefed on it. It’s not advice.
Protected paths — migrations, payments, auth — are files no agent may touch. Architectural boundaries are enforced, not suggested. Take these rules away and a coding agent is a very fast generator of plausible-looking liabilities.
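To make the idea concrete, here is a small Python sketch of how such a policy could be evaluated in CI. The real harnext policy format is not shown in this article, so everything below is hypothetical: the `POLICY` structure, the path patterns, the tier numbers, and the function names are invented for illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical policy: patterns and tier numbers are illustrative only.
POLICY = {
    # Paths no agent may touch.
    "protected": ["db/migrations/*", "services/payments/*", "services/auth/*"],
    # Checked in order, highest risk first; the first matching tier wins.
    "tiers": [
        (3, ["services/billing/*", "integrations/llm/*"]),
        (2, ["services/*"]),
        (1, ["docs/*", "*.md"]),
    ],
    "default_tier": 2,
}


def is_protected(path: str) -> bool:
    # Note: fnmatch's "*" also matches "/" so patterns cover subdirectories.
    return any(fnmatchcase(path, pat) for pat in POLICY["protected"])


def tier_for(path: str) -> int:
    for tier, patterns in POLICY["tiers"]:
        if any(fnmatchcase(path, pat) for pat in patterns):
            return tier
    return POLICY["default_tier"]


def check_change(paths: List[str], author_is_agent: bool) -> Dict[str, object]:
    """Return the gate decision for a set of changed paths."""
    violations = [p for p in paths if author_is_agent and is_protected(p)]
    # A change is gated at the highest tier among the files it touches.
    required_tier = max((tier_for(p) for p in paths), default=POLICY["default_tier"])
    return {"allowed": not violations, "violations": violations, "tier": required_tier}


print(check_change(["docs/setup.md", "integrations/llm/client.py"], author_is_agent=True))
print(check_change(["services/payments/charge.py"], author_is_agent=True))
```

In this sketch the first call is allowed and gated at tier 3, while the second is rejected because an agent touched a protected path. A real setup would load the policy from a file in the repository and let CI fail the check.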
Ten months, one chart
The adoption trail, measured from the repository itself.
The chart counts, for every month, how many merged development PRs carry any hard agent signal — the coding agent’s footer, the pipeline’s labels, the harness tier convention, commit co-author trailers, agent commit emails, or the pipeline’s own account as author. Dependency-bot PRs (about 8% of all merges) are excluded from the chart entirely — they’re neither human nor coding-agent work. We audited the signals three independent ways: PR metadata for all 1,409 merges, commit-level trailers across 5,000+ commits, and a manual forensic pass over every single unmarked PR of 2026. Three readings matter:
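The counting rule can be sketched in a few lines of Python. This is a simplified illustration, not the audit tooling itself: the `MergedPR` fields, the bot account names, the label, the footer text, and the agent email are hypothetical placeholders, and only four of the six signal types are modeled.

```python
from dataclasses import dataclass, field
from typing import List, Set

# All of these identifiers are hypothetical placeholders.
DEPENDENCY_BOTS = {"dependabot[bot]", "renovate[bot]"}
PIPELINE_ACCOUNT = "pipeline-bot"
AGENT_LABELS = {"agent:implemented"}
AGENT_FOOTER = "Generated by coding agent"
AGENT_EMAIL = "agent@example.com"


@dataclass
class MergedPR:
    number: int
    author: str
    labels: Set[str] = field(default_factory=set)
    body: str = ""
    commit_trailers: List[str] = field(default_factory=list)


def has_agent_signal(pr: MergedPR) -> bool:
    """True if the PR carries at least one hard agent signal."""
    co_authored = any(
        t.lower().startswith("co-authored-by:") and AGENT_EMAIL in t.lower()
        for t in pr.commit_trailers
    )
    return (
        pr.author == PIPELINE_ACCOUNT
        or bool(pr.labels & AGENT_LABELS)
        or AGENT_FOOTER in pr.body
        or co_authored
    )


def agent_share(prs: List[MergedPR]) -> float:
    """Share of development PRs with an agent signal, dependency bots excluded."""
    dev = [pr for pr in prs if pr.author not in DEPENDENCY_BOTS]
    if not dev:
        return 0.0
    return sum(has_agent_signal(pr) for pr in dev) / len(dev)


sample = [
    MergedPR(1, "alice", labels={"agent:implemented"}),
    MergedPR(2, "bob"),
    MergedPR(3, "dependabot[bot]"),
    MergedPR(4, "carol", commit_trailers=["Co-authored-by: Agent <agent@example.com>"]),
]
print(f"{agent_share(sample):.0%}")   # prints: 67%
print(f"{133 / 144:.0%}")             # the May figure from the article; prints: 92%
```

Because the dependency-bot PR is dropped from the denominator, the sample yields two agent-signal PRs out of three development PRs.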
Enthusiasm fades; infrastructure sticks. The 2025 era was ad-hoc, personal adoption — and it oscillated exactly like personal habits do: 44% one month, barely 4% in November when the heaviest users paused. The harness changed the shape of the curve: within a month of the risk tiers arriving, the measured share jumped to 89%; with the full pipeline it reached 92% and stayed there. Each layer of rules increased adoption more than any individual’s enthusiasm ever did. The two shades tell the same story inside the agent share: the light band is developers pairing with the agent by hand; the dark band — work that ran the full pipeline from issue to reviewed PR — appears only when the harness lands, and by May it carries the majority of the agent work.
We inspected the remainder, PR by PR. For April–June 2026, the PRs without any marker decompose into: dependency-bot automation, agent work whose only attribution survived in commit trailers, and a residue of plausibly hand-written changes, which amounts to about 11% of non-automation merges. So the honest sentence is: ~89% of real development merges in the last quarter show verifiable agent involvement, and even that is a floor, since editor-level AI assistance leaves no trace at all. We also pointed skeptical auditors at the three weakest months, PR by PR: November’s count rose from 1 to 3 proven (plus 3 suspected on style), January’s fell from 10 to 8 after catching two false positives, and December was confirmed exactly, with one twist: by code volume, December’s eight marked PRs delivered 39% of that month’s inserted lines. The agent was already writing the big features; the count just couldn’t see it. Adoption also isn’t uniform: some developers run near-100% agent-assisted, a couple still mostly hand-write, and the pipeline carries a growing share either way.
Quality didn’t move backwards. The same window shipped Tier-3 changes — LLM-provider integration, payment-adjacent work, an i18n expansion — under gates that got stricter over the period, not looser. And when we measured agent review consistency directly, 21 of 22 independent review agents reached the same verdict on the same PR.
So who’s the author?
The best articulation of where this leaves the human comes from an engineering thesis that studied harness-driven development on an aviation-grade project:
By the time a change reached the human author, the routine quality issues had been resolved — the author’s review concentrated on architectural and domain-level decisions. The merge was the author’s decision. Authorship of the merged code rests with the human author, regardless of which actor produced the initial draft.
— Štefan Moravík, Design and Implementation of a Drone Mission Planning Module for Airport Lighting Inspection (thesis, 2026)
That’s the deal in production too: agents do the drafting and the routine quality work; the human does architecture, domain judgment, and owns the merge.
The collaboration nobody planned
The least expected change wasn’t speed — it was what happened between developers. Everything runs on GitHub: agent PRs and human PRs sit in the same queue, carry the same labels, clear the same gates, and we review each other’s work — and the agents’ — in one flow. There is no separate “AI process” to context-switch into.
And it changed what happens to small ideas. Normally, when you stumble on a bug or think of an improvement mid-task, you write it on some list and never get back to it. Now you tell the agent, mid-flow: “file an issue for this.” The issue is created, tagged, triaged, and picked up by the pipeline while you finish what you were doing — and by the time you’re done, the fix is often already implemented, sitting in a PR, waiting for your review. The backlog of “I’ll get to it someday” quietly became a queue of things waiting for approval.
And it keeps getting cheaper
The newest layer is the context engine. Instead of every agent re-reading every convention document on every task, harnext’s meaninggrid serves a versioned, cited digest of the project’s rules — and our measured A/B test found the digest-plus-policy-file configuration was both the cheapest and the most thorough reviewer setup (−32% cost vs. reading the docs). The harness made agentic development trustworthy; the context engine is making it cheap.
The rules are the product
What to copy if you want this in your company.
What you can do on Monday
- Write the rules down as config, not culture. Risk tiers, protected paths, architectural boundaries — one machine-readable file that CI enforces. A wizard can propose it; a human must review it.
- Stage the pipeline. Tag → triage → plan → implement → review, each as a CI workflow with visible state. Don’t hand an agent a raw issue and hope.
- Bound the review loop. A review-fix cycle with a hard iteration cap, so agents converge instead of thrashing.
- Gate the context. No agent writes code before loading the project’s conventions and proving it — a confirmation block reviewers can check.
- Keep the merge human. The pipeline’s job is to make human review architectural, not to remove it.
- Then cut the context bill. Serve the rules through a context engine instead of raw file reads (measured: −32% per task, with better results).
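One way to make the "gate the context" item checkable is to have the agent echo a digest of every convention document it loaded and let CI compare those digests against the current files. The Python sketch below assumes that design; the confirmation format, the function names, and the document names are hypothetical and do not describe harnext's actual confirmation block.

```python
import hashlib
from typing import Dict, List


def digest(text: str) -> str:
    """Short content hash used to prove which version of a document was read."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def build_confirmation(loaded_docs: Dict[str, str]) -> Dict[str, str]:
    """What an agent would emit: document name -> digest of the text it loaded."""
    return {name: digest(body) for name, body in loaded_docs.items()}


def gate_problems(confirmation: Dict[str, str], required: Dict[str, str]) -> List[str]:
    """List every required document that is missing or was read in an old version."""
    problems = []
    for name, body in sorted(required.items()):
        if name not in confirmation:
            problems.append(f"not loaded: {name}")
        elif confirmation[name] != digest(body):
            problems.append(f"stale: {name}")
    return problems


# Hypothetical convention documents.
conventions = {
    "architecture.md": "No cross-module imports.",
    "style.md": "Prefer small functions.",
}

# The agent loaded only one document, so the gate should block it.
partial = build_confirmation({"style.md": "Prefer small functions."})
print(gate_problems(partial, conventions))   # prints: ['not loaded: architecture.md']
```

An empty problem list means the agent may proceed to write code. Anything else blocks the implementation stage and gives the reviewer a precise reason.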
An open invitation
If you’re asking whether this can be real in your company, we’d genuinely like to compare notes. The harness is open source at harnext.dev, and the FlowHunt team helps companies set up exactly these pipelines. Skeptics especially welcome.

