
The “Flying Too Close to the Sun” Argument Is Wrong

And if you’re building AI infrastructure, you need to understand why.

Frequently, an investor or analyst looks at what we’re building at TrueMath and says some version of what we heard this past week: “You’re flying too close to the sun where the big models roam.” In other words: isn’t it easy for OpenAI or Anthropic or Gemini to just add a feature and take you out?

It’s a reasonable instinct. It’s also wrong. And the reason it’s wrong tells you something important about where AI is actually going.

The math benchmark scores keep climbing. The reproducibility problem has not moved.

OpenAI just shipped interactive math visualizations inside ChatGPT. Adjust a slider, watch the hypotenuse update. It’s genuinely useful for students. It is not, in any meaningful sense, a threat to math infrastructure for regulated industries.

Here’s why. LLMs are stochastic by design. Run the same prompt twice, get two outputs, or at least two methods of solving the problem. That is not a bug waiting to be patched. It is an architectural property of how transformers work. Determinism cannot be added as a feature. If it could have been, it would have been by now. The big models have poured billions into math reasoning. Benchmark scores keep climbing. The reproducibility problem has not moved an inch.

This isn’t conjecture. IBM Field CTO Raffi Khatchadourian tested 74 configurations across 12 models to find out how reliably frontier LLMs reproduce the same decision given the same inputs. Even at temperature zero, models like Claude Opus and Gemini Pro achieved only 88.5% decision determinism. Under stress conditions like data quality faults or redeployment, the numbers dropped further. His conclusion: determinism requires purpose-built infrastructure. It cannot be solved through model selection or prompt engineering. The architecture itself is the constraint.
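A toy sketch makes the architectural point concrete. This is not the cited study’s methodology, and the logits are made up; it just shows why sampling from a distribution, which is how LLMs pick each token, produces different outputs from identical inputs, while a direct computation does not.

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, rng=random):
    """Temperature sampling: draw the next token from the distribution."""
    probs = softmax([x / temperature for x in logits])
    r = rng.random()
    cum = 0.0
    for token, p in enumerate(probs):
        cum += p
        if r < cum:
            return token
    return len(probs) - 1

# Hypothetical next-token logits for a single "prompt".
logits = [2.0, 1.9, 0.5]

# Sampling: the same inputs yield different tokens run to run.
draws = {sample_token(logits) for _ in range(50)}

# Greedy decoding (the temperature-zero limit) is repeatable for fixed
# logits -- but real deployments still face nondeterministic GPU
# reductions, batching effects, and model updates, which is why even
# temperature zero did not reach 100% determinism in the study above.
greedy = max(range(len(logits)), key=lambda i: logits[i])
```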

The reason is simple and underappreciated: math reasoning is a language skill. Math execution is not.

We can train people to teach and explain calculus. We cannot train people to perform calculus reliably in their heads, or even on paper beyond the basic arithmetic. That’s why we built calculators, then spreadsheets, then computational engines. The brain handles intent and interpretation. The tool handles execution. That division of labor is not a historical accident. It is the correct architecture.

LLMs are extraordinarily good at the language part. They are structurally unsuited to the execution part. TrueMath is the execution layer.

Accessibility without auditability is liability.

AI makes complex computation accessible to non-specialists and unlocks efficiencies for specialists. Those are genuinely good things. But the workflows that matter most, the ones where AI’s potential is largest, are also the ones where a stochastic system is simply not enough.

Consider what becomes possible when you put a deterministic execution layer underneath the model:

  • An EMT is managing a pediatric trauma case in the back of a moving ambulance. She describes the patient’s vitals, weight, and presenting symptoms out loud. The AI replies with dosing calculations, contraindication flags, and recommended interventions. She trusts the numbers because she knows they weren’t generated; they were computed, the same way the most experienced clinician on the planet would have computed them with a calculator and a reference manual, just in three seconds instead of three minutes. Her eyes stay on the patient.
  • A real estate investment fund lets prospects ask any question they want about the underlying pro formas. Want to know what happens to IRR if vacancy runs four points higher than projected for the first two years? Ask. The partners trust the answers because the math is deterministic and auditable. And they’ve discovered something unexpected: the questions prospects ask tell them more about buyer intent and sophistication than any intake form ever did.
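What “computed, not generated” means in practice can be sketched in a few lines. The formula and numbers below are illustrative only (not medical guidance, and not TrueMath’s API): a deterministic function that returns both the value and the derivation trail that makes it auditable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditedResult:
    """A computed value plus the trail of how it was derived."""
    value: float
    steps: tuple  # ordered (description, intermediate value) pairs

def weight_based_dose(weight_kg: float, mg_per_kg: float, max_mg: float) -> AuditedResult:
    """Deterministic weight-based dose with an auditable derivation.
    Hypothetical example -- illustrative numbers, not medical guidance."""
    raw = weight_kg * mg_per_kg
    capped = min(raw, max_mg)
    steps = (
        (f"{weight_kg} kg x {mg_per_kg} mg/kg", raw),
        (f"cap at {max_mg} mg", capped),
    )
    return AuditedResult(value=capped, steps=steps)

# Same inputs, same output, every run -- and the trail shows why.
result = weight_based_dose(weight_kg=18.0, mg_per_kg=0.1, max_mg=2.0)
```

The point is the shape of the result, not the formula: every answer carries the steps that produced it, so a clinician (or an auditor) can verify rather than trust.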

None of these workflows exist today. Not because the AI isn’t capable enough. Because there’s no infrastructure underneath it that anyone on either side of those transactions would trust.

That’s the problem TrueMath solves. Accessibility without auditability is liability. And that problem gets harder, not easier, as AI matures. Autonomous agent workflows (software talking to software: executing transactions, pricing contracts, running clinical protocols without a human in the loop) are coming fast. The efficiency gains are real. So is the audit requirement. A reliable trail of how calculations were derived is not a nice-to-have in that world. It is table stakes for operating in it.

TrueMath is infrastructure, not a feature.

TrueMath runs as a developer environment with an SDK and as an MCP server. It sits underneath the model, not beside it. The model handles intent. TrueMath handles execution, with deterministic results, full auditability, and 30 years of verified mathematical correctness in the PowerOne kernel underneath it.
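The division of labor can be sketched generically. This is not TrueMath’s actual SDK or MCP interface; it is a minimal illustration of the pattern: the model translates intent into an expression, and a deterministic execution layer evaluates it against a whitelist instead of generating an answer.

```python
import ast
import operator

# Whitelist of arithmetic operations the execution layer will perform.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def evaluate(expression: str) -> float:
    """Deterministically evaluate an arithmetic expression the model produced.
    Anything outside the whitelist is rejected rather than guessed at."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported construct: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval"))

# The model handles intent ("annual growth at 5% compounded monthly");
# the execution layer computes the expression the same way every run.
value = evaluate("(1 + 0.05 / 12) ** 12")
```

The key property is in the last branch: the layer refuses what it cannot execute deterministically, which is the opposite of a model’s instinct to produce something plausible.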

This is the Stripe analogy. When LLMs arrived, no one expected them to simply subsume payment transactions. The reason is obvious in retrospect: payment infrastructure requires guarantees that intelligence alone cannot provide. You need the same transaction to execute the same way every time, with a record, governed by rules, auditable on demand. Math infrastructure is the same category of problem.

The “just add a feature” argument proves the opposite of what people think.

Could a major model provider throw money at this, build teams, create the functionality? Yes. But they don’t start with a natural technology advantage. It is not a feature they can add based on their current architecture. That means any serious move in this direction involves a build-versus-buy decision. And a buy decision is a partnership or exit path, not an existential threat.

The “flying too close to the sun” argument assumes that proximity to a large player means you’ll be subsumed. Sometimes that’s true. It’s true when what you’re building is essentially a feature, a thin layer of functionality that the platform can absorb in an afternoon.

It’s not true when what you’re building is infrastructure. Infrastructure requires different architecture, different engineering, and in TrueMath’s case, a 30-year head start on the underlying math kernel. That’s a moat, not a liability.

The sun isn’t the threat. It’s the reason the market exists.

Reach out: bill.kelly@truemath.ai
Learn more: truemath.ai
Sign up for early access: https://app.truemath.ai/signup 

