
The Next Gap

Michael Domanic, Section’s Head of AI, published a short piece this week called “The chasm is already visible.” It’s worth reading. He lays out what’s happening inside Section: 154 pull requests in three weeks from a five-person engineering team. Pre-AI, per Domanic, that team would have shipped 54. That’s nearly a 3x throughput improvement. He notes that some of Section’s customers are now so deep in agent experimentation that they’re building internal marketplaces and improvising governance structures, because the tools don’t support any of this natively yet.

His thesis: the gap between AI-forward companies and laggards used to be subtle. Now it’s not. And it’s widening every week.

He’s right. And I want to add one thing.

The visible gap and the invisible one

The gap Domanic describes is a velocity gap. Throughput, decision speed, scope of what small teams can take on. That’s real and it’s measurable. PR counts go up. Time-to-decision goes down. Headcount needed to reach a given output target falls.

But there’s a second gap forming underneath it, and almost no one is talking about it yet.

It’s the gap between companies whose agents produce reliable numbers and companies whose agents hallucinate them.

When Domanic describes a Chief of Staff agent delivering daily strategic briefings with more context than any human could synthesize, that’s a powerful capability. But somewhere in that briefing is revenue, runway, burn, headcount cost per function, customer acquisition cost, payback period, contribution margin by segment. Numbers. And every one of those numbers, if it came out of an LLM doing the arithmetic, has a non-trivial probability of being wrong.

Not “wrong” in a fuzzy, judgment-call way. Wrong in a “the LLM produced 0.847 when the correct answer was 0.487” way. Wrong silently. Wrong with the full fluency and confidence of a well-formatted briefing document.
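
Here’s a toy version of that failure, with invented figures. Run the same payback-period calculation with the correct margin and the transposed one, and notice that the wrong answer doesn’t look broken. It looks better:

```python
# Toy illustration (figures invented for the example): a silently
# transposed contribution margin flows straight into a payback-period
# calculation, and nothing downstream flags it.

def payback_months(cac: float, monthly_revenue: float, margin: float) -> float:
    """Months to recover customer acquisition cost from contribution profit."""
    return cac / (monthly_revenue * margin)

cac = 1200.00             # hypothetical customer acquisition cost ($)
monthly_revenue = 300.00  # hypothetical revenue per customer per month

print(round(payback_months(cac, monthly_revenue, 0.487), 1))  # 8.2 months (correct)
print(round(payback_months(cac, monthly_revenue, 0.847), 1))  # 4.7 months (transposed)
```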

The CEO reads the briefing. The numbers look right. The narrative holds together. The decision gets made.

This is the part of the Supercompany story that nobody is auditing yet.

Why velocity hides the floor

The reason this gap is invisible right now is that velocity metrics don’t surface it. PR counts don’t tell you whether the code is correct. Briefing throughput doesn’t tell you whether the financial model underneath the briefing is arithmetically sound. Decision speed doesn’t tell you whether the inputs to the decision were computed reliably.

The Harvard, MIT, Wharton, and BCG study on the jagged frontier found that consultants using GPT-4 produced output rated 40% higher in quality on tasks inside the frontier. Beautiful result. But the same study found that on tasks outside the frontier, consultants using AI were 19 percentage points less likely to reach the correct answer than the control group. And they didn’t know they’d crossed the line. The AI just confidently produced wrong answers that looked right, and the consultants accepted them.

The researchers called it “falling asleep at the wheel.” That’s a polite way of describing what happens when fluency outruns correctness.

For tasks where “mostly right” is good enough, the velocity gain is pure upside. Brainstorming, drafting, summarizing, coding scaffolds, research synthesis. The Section team’s 154 PRs almost certainly fall mostly into this bucket, and that’s why the gain is so dramatic.

But for tasks where “mostly right” is structurally insufficient, the velocity gain is a trap. Drug dosing. Bond pricing. Loan amortization. Tax calculation. Engineering tolerances. Insurance reserves. Enterprise KPIs. Any number that ends up in a contract, a regulatory filing, a financial statement, a bonus payment calculation, or a clinical decision.

In those domains, 99% accurate is 100% wrong, because you don’t know which 1% is wrong.
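
Loan amortization is a good example of the floor. The function below is the standard textbook formula, computed with exact decimal arithmetic; nothing about it is probabilistic, and nothing about it needs a model:

```python
from decimal import Decimal, ROUND_HALF_UP

def monthly_payment(principal: Decimal, annual_rate: Decimal, months: int) -> Decimal:
    """Standard amortization formula: P * r / (1 - (1 + r)^-n)."""
    r = annual_rate / Decimal(12)
    payment = principal * r / (1 - (1 + r) ** -months)
    return payment.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# $250,000 over 30 years at 6.5% APR
print(monthly_payment(Decimal("250000"), Decimal("0.065"), 360))  # 1580.17
```

An LLM asked the same question will usually land near $1,580, and sometimes on a fluent, well-formatted number that is simply not it. The code gives the same answer every time.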

The architecture answer

Section’s customers are already feeling this. Domanic mentions that they’re “hacking together governance structures because the tools don’t support it natively yet.” That sentence is the whole story.

The reason they’re hacking is that the agent stack as currently constituted has a missing layer. There’s a layer for orchestration. A layer for memory. A layer for tool use. A layer for retrieval. There’s no layer for deterministic computation.

When an agent needs to send an email, it calls an email tool. When it needs to query a database, it calls a database tool. When it needs to do math, it… asks the LLM to do math. And the LLM, being a probability machine over tokens, produces an answer that is probably right.

That’s the floor problem. It’s not a capability problem and it won’t be solved by a bigger model. It’s an architectural problem. The math layer needs to be a separate, deterministic, auditable component, the same way the database layer is a separate component. Agents that call a deterministic math kernel produce reliable numbers. Agents that don’t, don’t.
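
A minimal sketch of what that missing layer could look like. The names and shapes here are illustrative, not any particular framework’s API; the pattern is what matters. The LLM routes intent to an operation, and the number comes out of plain code:

```python
from decimal import Decimal
from typing import Callable, Dict

# Sketch of a deterministic math kernel registered as an agent tool.
# (Illustrative names, not a real framework.)
KERNEL: Dict[str, Callable[..., Decimal]] = {
    "qoq_growth": lambda prev, curr: (curr - prev) / prev,
    "payback_months": lambda cac, monthly_profit: cac / monthly_profit,
}

def math_tool(operation: str, **inputs: str) -> Decimal:
    """The tool the agent calls instead of doing math 'in its head'.

    The LLM's job ends at choosing `operation` and extracting the
    inputs; everything past this line is ordinary, testable code.
    """
    args = {name: Decimal(value) for name, value in inputs.items()}
    return KERNEL[operation](**args)

# The agent emits a tool call, not an answer:
print(math_tool("payback_months", cac="1200", monthly_profit="146.10"))
# -> 8.2135523..., deterministic, same answer every time
```

The design point is the boundary. Everything the LLM touches is a string; everything that becomes a number happens in deterministic code.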

This is what we’re building at TrueMath. A computational layer for the agent control plane. LLMs interpret intent. TrueMath executes the math. Every calculation is traceable, auditable, and reproducible. When the briefing says revenue grew 14.2% quarter over quarter, you can click through and see exactly which inputs produced exactly which output, with no probability distribution involved.
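
I won’t reproduce TrueMath’s interface here, but the general shape of an auditable calculation is easy to sketch (illustrative schema, invented revenue figures): the result never travels without the inputs and the operation that produced it.

```python
from dataclasses import dataclass
from decimal import Decimal
from datetime import datetime, timezone

# Generic sketch of an auditable calculation record.
# (Illustrative, not TrueMath's actual schema.)

@dataclass(frozen=True)
class CalcRecord:
    operation: str
    inputs: dict      # exactly which inputs...
    result: Decimal   # ...produced exactly which output
    computed_at: str

def qoq_growth(prev: Decimal, curr: Decimal) -> CalcRecord:
    return CalcRecord(
        operation="qoq_growth",
        inputs={"prev": prev, "curr": curr},
        result=(curr - prev) / prev,
        computed_at=datetime.now(timezone.utc).isoformat(),
    )

record = qoq_growth(Decimal("8400000"), Decimal("9592800"))
print(record.result)  # 0.142 -- the 14.2% in the briefing, reproducible on demand
print(record.inputs)  # the click-through: what produced that number
```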

What the next chasm looks like

Domanic’s chasm is real, and it’s the right thing to be writing about right now. Companies that haven’t started experimenting with agents in a serious, structured way are watching the gap widen every week. That’s true.

But there’s a second chasm forming a layer underneath, and it will become visible the moment a Chief of Staff agent’s briefing produces a number that turns out to be wrong in a way that costs real money. Or the moment a customer-facing agent calculates and quotes a price that doesn’t match the underlying contract or agreed-upon business logic. Or the moment a compliance audit traces a regulatory filing back through an agent workflow and can’t reconstruct how the numbers were computed.

When that happens, and it will, the velocity gap will suddenly look less important than the correctness gap. Companies built on agents with deterministic computation underneath will keep moving. Companies built on agents alone will spend the next year figuring out which of the past year’s decisions were based on hallucinated arithmetic.

The first chasm rewards speed. The second one rewards architecture.

If you’re already deep in agent experimentation, good. Keep going. But while you’re hacking together governance structures, ask one more question: what’s underneath the math?

Because the chasm Domanic sees is already visible. The one I’m describing is right behind it.

Reach out: bill.kelly@truemath.ai
Learn more: truemath.ai
Sign up for early access: app.truemath.ai/signup

