Determinism Isn’t Enough
The Case for Math Governance in AI Financial Workflows
A Field CTO at IBM Financial Services recently published a paper that should be required reading for anyone deploying AI in regulated financial environments. Raffi Khatchadourian’s “Replayable Financial Agents” poses a simple question: if a regulator asks your AI system to reproduce a flagged transaction decision, can it?
Across 74 configurations and 12 models, his answer is mostly no.
Even frontier models, including Claude Opus and Gemini Pro, achieved only 88.5% decision determinism at temperature zero. Under stress conditions like data-quality faults or redeployment, the numbers drop further. The paper introduces a rigorous evaluation framework called DFAH (Determinism-Faithfulness Assurance Harness) to measure this problem, along with deployment recommendations for financial institutions trying to navigate it.
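To make the metric concrete: a minimal sketch of what a "decision determinism" score measures. This is not DFAH itself; `run_model` is a stand-in for any model call that returns a discrete decision.

```python
def decision_determinism(run_model, inputs, trials=20):
    """Fraction of inputs whose decision is identical across all trials.

    `run_model` is a hypothetical stand-in for a model call that returns
    a discrete decision (e.g. "approve" / "flag") for a given input.
    """
    stable = 0
    for x in inputs:
        decisions = [run_model(x) for _ in range(trials)]
        if len(set(decisions)) == 1:  # every trial agreed
            stable += 1
    return stable / len(inputs)
```

A system that is fully reproducible scores 1.0; any input whose decision flips across trials drags the score down, which is what the 88.5% figure reflects.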
The framework is excellent. The diagnosis is correct. But I think the paper stops one layer short of the real solution.
Measuring the Symptom vs. Fixing the Architecture
Raffi frames determinism as a property of the model. Tier 1 models (7–20B parameters) achieve 100% determinism; frontier models don’t. His recommendation for compliance-critical deployments: use smaller, schema-first models and accept lower accuracy in exchange for reproducibility.
That’s a reasonable workaround. But I’d argue it’s treating the symptom.
LLMs are probabilistic inference engines. Asking them to produce deterministic math is asking the wrong layer of the stack to do the wrong job. It’s not a model selection problem. It’s an architectural one.
You wouldn’t evaluate whether your accounting software “tends to” produce consistent numbers. You’d demand it always produces the same number given the same inputs. That isn’t a benchmark. It’s a design requirement.
LLMs should interpret intent, context, and natural language. Deterministic infrastructure should execute the math. These are different jobs. Conflating them is where the problem starts.
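A minimal sketch of that division of labor, assuming a hypothetical `compute_interest` function and structured-request shape (illustrative only, not TrueMath's API):

```python
from decimal import Decimal

# Hypothetical split: the LLM's only job is to map natural language to a
# structured request; the math runs in fixed, auditable code.

def compute_interest(principal: Decimal, rate: Decimal, years: int) -> Decimal:
    """Deterministic compound interest: same inputs, same output, every time."""
    return (principal * (1 + rate) ** years).quantize(Decimal("0.01"))

# Upstream, an LLM would parse "what does $10k earn at 4% over 3 years?"
# into this structured call. Interpretation above, execution below.
request = {"principal": Decimal("10000"), "rate": Decimal("0.04"), "years": 3}
print(compute_interest(**request))  # -> 11248.64
```

The model can be as probabilistic as it likes about parsing the question; once the request is structured, the number comes from code that cannot drift.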
But Determinism Alone Isn’t Enough Either
Here’s where I’d push the conversation further: even if you solve the determinism problem at the compute layer (as TrueMath does), you haven’t yet satisfied what a financial regulator actually needs.
Regulators don’t just ask “did the system produce the same answer?” They ask:
What business logic produced that number?
Who approved that logic?
Has it changed since the decision was made?
That’s not operational determinism. That’s governance. And it’s the layer the field hasn’t talked about yet.
What TrueMath Provides Today
TrueMath is a deterministic math infrastructure platform for AI workflows. Built on Elia Freedman’s PowerOne math kernel, refined over 30 years, it sits beneath the LLM layer and executes calculations with provable consistency. But the governance capabilities are where the story gets interesting.
Every calculation, stored.
TrueMath logs 100% of calculations for later recall. Not as an LLM transcript, but as an actual calculation record — variables, logic paths, outputs — that can be retrieved, replayed, and shown to an examiner. It’s the difference between a flight data recorder and asking the pilot to remember the flight.
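As an illustration of the idea (not TrueMath's actual record schema), a calculation record needs to carry enough to be replayed and verified after the fact:

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# Hypothetical record shape -- a sketch of the "flight data recorder"
# idea, with field names invented for illustration.

@dataclass(frozen=True)
class CalcRecord:
    formula_id: str       # which approved formula ran
    formula_version: str  # exact version in force at execution time
    inputs: dict          # every variable, exactly as passed
    output: str           # result stored as text, avoiding float drift

    def fingerprint(self) -> str:
        """Content hash an examiner can use to verify nothing was altered."""
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

rec = CalcRecord("concentration_limit", "v2.1",
                 {"exposure": "1200000", "capital": "15000000"}, "0.08")
assert rec.fingerprint() == rec.fingerprint()  # stable, hence replayable
```

Because the record is content-hashed, replaying the same formula version on the same inputs either reproduces the fingerprint or proves that something changed.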
Business logic, locked.
The formula that ran on Tuesday runs identically on Thursday. Not because the LLM was consistent, but because the logic is immutable at the compute layer. This includes proprietary methods: firms can encode their own valuation models, risk algorithms, or compliance rules directly into TrueMath workflows, without exposing them or losing control of them.
One canonical version, everywhere.
A single approved calculation method propagates uniformly across every agent, every workflow, every desk that calls it. No shadow spreadsheets. No desk-by-desk drift. One version of the math, firm-wide. Think interest rate models, risk weight formulas, concentration limit rules.
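A hedged sketch of the canonical-version idea: a registry where published versions are immutable and every caller resolves through a single current pointer. `FormulaRegistry` and its methods are hypothetical, not TrueMath's interface.

```python
# Hypothetical single-source registry: every desk resolves a formula
# through one table, so there is exactly one current version firm-wide.

class FormulaRegistry:
    def __init__(self):
        self._current = {}  # name -> current version
        self._frozen = {}   # (name, version) -> callable, immutable once published

    def publish(self, name, version, fn):
        key = (name, version)
        if key in self._frozen:
            raise ValueError("published versions are immutable")
        self._frozen[key] = fn
        self._current[name] = version  # all callers move together

    def call(self, name, **inputs):
        version = self._current[name]  # every desk gets the same version
        return version, self._frozen[(name, version)](**inputs)

registry = FormulaRegistry()
registry.publish("risk_weight", "2025.1",
                 lambda exposure, weight: exposure * weight)
print(registry.call("risk_weight", exposure=100.0, weight=0.5))  # -> ('2025.1', 50.0)
```

The two properties in one structure: a version can never be silently edited in place, and there is no path by which one desk ends up running different math than another.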
Where This Is Going: Certified Math
The regulatory framework that governs traditional quantitative models in financial services has always required something that goes beyond reproducibility. It requires provenance. A model needs documentation, validation, approval, and change governance.
Nobody has applied that framework to the math inside AI workflows. Until now.
TrueMath’s near-term roadmap extends the governance layer in three directions:
Certification and timestamping. Every formula and method gets a provenance record: what it is, when it was approved, what version is current. The math has a chain of custody, not just a log.
Named approval rights. Specific individuals can be provisioned with authority to approve business logic, and to approve changes to it. An auditor can see not just what the system did, but who signed off on the methodology that governed it.
Change governance. When a model or formula changes, the change is recorded, attributed, and traceable. The regulator doesn’t just see the current state. They see the history and the human decisions behind it.
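The three roadmap items above can be sketched as an append-only approval ledger. Everything here (`APPROVERS`, `GovernedFormula`, the field names) is a hypothetical illustration of the pattern, not the product:

```python
from datetime import datetime, timezone

# Hypothetical governance ledger: approvals are attributed to named
# individuals, and the full change history is retained.

APPROVERS = {"m.chen": "Head of Model Risk"}  # provisioned approval rights

class GovernedFormula:
    def __init__(self, name):
        self.name = name
        self.history = []  # append-only: every version, approver, timestamp

    def approve(self, version, definition, approver):
        if approver not in APPROVERS:
            raise PermissionError(f"{approver} lacks approval rights")
        self.history.append({
            "version": version,
            "definition": definition,
            "approved_by": approver,  # named approval rights
            "approved_at": datetime.now(timezone.utc).isoformat(),  # timestamped
        })

    def current(self):
        return self.history[-1]  # the one version in force now

    def audit_trail(self):
        return list(self.history)  # who changed what, and when
```

An examiner querying `audit_trail()` sees not just the current state but every prior version, who approved it, and when: the certification, approval, and change-governance story in one record.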
This is SR 11-7 logic applied to the AI era. It’s not a new concept. It’s an overdue extension.
The Stack, Restated
Raffi’s paper answers one question: is this AI system behaving consistently?
TrueMath answers a different one: is the math in this AI system governed?
These are sequential requirements, not competing ones. Determinism is table stakes. Governance is what gets you past the examiner.
The right architecture: LLM for intent, language, and reasoning. TrueMath for execution, storage, and governance of the math.
Intelligence is becoming abundant. Trustworthy, governed execution is the new scarcity.
Reach out: bill.kelly@truemath.ai
Learn more: truemath.ai
Sign up for early access: https://app.truemath.ai/signup
