When AI Agents Start Operating Excel, Determinism Still Is Not Solved

Anthropic’s expansion of Claude’s enterprise “Cowork” plugins marks an important inflection point in the evolution of AI systems. Claude is no longer positioned solely as a conversational assistant but as an operational actor capable of interacting directly with tools like Excel, PowerPoint, Google Docs and Sheets, and internal enterprise systems. This signals a broader shift in the AI stack. Language models are moving from generating drafts and answering questions to executing structured work inside the systems where financial, operational, and regulatory decisions are made.

It is natural to assume that once a model can operate Excel, the problem of unreliable math disappears. Excel is deterministic: volatile functions such as NOW() and RAND() aside, a fixed formula with fixed inputs always returns the same result. On the surface, the architecture appears sound. Claude interprets intent, writes formulas into Excel, and Excel computes the output. Determinism is restored through the spreadsheet.

The problem is that this assumption misunderstands where determinism must exist.

Deterministic Arithmetic vs Deterministic Construction

Excel deterministically evaluates formulas. It does not deterministically construct them. When Claude builds a financial model, restructures a workbook, or generates a multi-step projection, it must decide how to decompose the problem, which intermediate variables to create, how to link sheets, which assumptions to embed, and how to interpret ambiguity. Those modeling decisions are generated probabilistically at runtime. Two identical prompts can yield slightly different model structures or embedded assumptions. Excel will compute whatever it is given with perfect consistency, but the model determines what it is given.
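A minimal Python sketch can make the gap between evaluation and construction concrete. The two functions below are hypothetical stand-ins for two workbook structures an agent might plausibly produce for the same request ("project revenue at 10% growth for three years"). The function names and figures are illustrative assumptions, not taken from any real model.

```python
from fractions import Fraction


def project_compound(base: Fraction, growth: Fraction, years: int) -> Fraction:
    """One plausible construction: growth compounds every year."""
    return base * (1 + growth) ** years


def project_simple(base: Fraction, growth: Fraction, years: int) -> Fraction:
    """Another plausible construction: growth is applied linearly to the base."""
    return base * (1 + growth * years)


# Illustrative inputs (assumed): 1000 in revenue, 10% growth, 3 years.
base, growth, years = Fraction(1000), Fraction(1, 10), 3

compound = project_compound(base, growth, years)
simple = project_simple(base, growth, years)

# Each construction is perfectly repeatable on its own...
assert compound == project_compound(base, growth, years)
assert simple == project_simple(base, growth, years)

# ...but the two constructions disagree on the answer.
print(compound, simple)  # 1331 1300
```

Both results are "deterministic" in the sense Excel guarantees. The 31-unit difference comes entirely from which structure was chosen, which is the step the spreadsheet does not control.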

In complex enterprise workflows, the risk is not arithmetic error. The risk is silent structural variation. Financial projections, underwriting models, healthcare cost analyses, and engineering simulations depend on stability of logic as much as correctness of calculation. If a model constructed last quarter cannot be reconstructed exactly under the same assumptions, the organization does not have governed automation. It has a powerful but opaque artifact.
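One way to surface silent structural variation is to fingerprint a workbook's formulas rather than its computed values. The sketch below assumes the formulas have already been extracted into a plain cell-to-formula mapping; the `structure_fingerprint` helper and the sample cells are hypothetical, and how the mapping is obtained from a real workbook is out of scope here.

```python
import hashlib
import json


def structure_fingerprint(formulas: dict[str, str]) -> str:
    """Hash the cell-to-formula mapping, independent of insertion order."""
    canonical = json.dumps(formulas, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical formulas from the model built last quarter.
last_quarter = {"B1": "=A1*(1+A2)^A3", "B2": "=B1-A1"}

# A rebuild with the same logic, written in a different order.
rebuilt_same = {"B2": "=B1-A1", "B1": "=A1*(1+A2)^A3"}

# A rebuild whose growth formula is structurally different.
rebuilt_drift = {"B1": "=A1*(1+A2*A3)", "B2": "=B1-A1"}

same = structure_fingerprint(last_quarter) == structure_fingerprint(rebuilt_same)
drifted = structure_fingerprint(last_quarter) != structure_fingerprint(rebuilt_drift)
print(same, drifted)  # True True
```

This comparison is purely textual: two formulas that are mathematically equivalent but written differently would still produce different fingerprints, so it detects change rather than proving equivalence.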

Plugins and enterprise connectors standardize workflow access and orchestration. That is important. But workflow governance is not computational governance. Controlling which systems an agent can access does not guarantee that the logic producing enterprise-critical numbers is deterministic, versioned, and replayable.

Excel as Human-in-the-Middle Governance

There is another structural issue that is less discussed. Excel has historically functioned as a human-in-the-middle governance layer. A person builds the model. A person reviews the formulas. A person signs off on the assumptions. The spreadsheet becomes both the execution environment and the artifact of accountability.

That governance model assumes human authorship at every step.

When an autonomous agent begins constructing and modifying spreadsheets, that assumption changes. The spreadsheet remains the artifact, but the human is no longer necessarily the author of the logic. Governance shifts from reviewing human judgment to reviewing machine-generated structure. The familiarity of Excel creates a sense of safety, but its governance model was built around human construction, not autonomous execution.

Even if every formula calculates correctly, the deeper question becomes who owns the structure, how that structure is versioned, and whether it can be replayed at the level of logic rather than at the level of files.

Even Perfect Construction Does Not Solve the Storage Problem

Suppose the reasoning layer became fully stable. The IBM Drift study by Raffi Khatchadourian argues that larger models may drift rather than stabilize, but assume for a moment that this challenge is solved. The model decomposes problems the same way every time and writes identical formulas into Excel under identical conditions. Even in that scenario, Excel remains a document-based storage model for what is now becoming agent-driven execution infrastructure.

Spreadsheets are extraordinary productivity tools. They were not designed to be structured systems of record for autonomous agents constructing business logic at scale. They store formulas and results, but they do not inherently separate execution graphs from presentation layers. They do not version business logic at the level of variables and dependencies. They do not provide structured replay under time-locked assumptions without significant external scaffolding. Governance becomes file-centric rather than logic-centric.
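To illustrate what storing logic at the level of variables and dependencies might look like, here is a minimal Python sketch. It is not a description of any particular product. The `LOGIC_V1` structure, the variable names, and the `evaluate` helper are all assumptions made for the example; the point is only that the execution graph lives as explicit data, separate from any presentation layer.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical logic definition: each variable lists its dependencies
# and a pure function over them. Nothing here knows about cells or sheets.
LOGIC_V1: dict[str, tuple[tuple[str, ...], Callable[..., int]]] = {
    "revenue": (("units", "price"), lambda units, price: units * price),
    "cost": (("units", "unit_cost"), lambda units, unit_cost: units * unit_cost),
    "margin": (("revenue", "cost"), lambda revenue, cost: revenue - cost),
}


def evaluate(logic, inputs: dict[str, int]) -> dict[str, int]:
    """Evaluate every variable in dependency order, starting from raw inputs."""
    graph = {name: set(deps) for name, (deps, _) in logic.items()}
    values = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in logic:  # raw inputs appear in the order but need no computation
            deps, fn = logic[name]
            values[name] = fn(*(values[d] for d in deps))
    return values


result = evaluate(LOGIC_V1, {"units": 100, "price": 5, "unit_cost": 3})
print(result["margin"])  # 200
```

Because the graph is ordinary data, a change to `margin` could be reviewed and versioned as a change to one named variable and its dependencies, rather than as a diff between two saved files.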

Supporting existing workflows is attractive. Financial teams already use spreadsheets. Audit processes already reference saved workbooks. Continuity reduces friction. But architectural familiarity is not architectural optimality. As AI systems increasingly generate and modify these artifacts, governance shifts toward managing documents rather than managing deterministic execution logic. If a structured, versioned execution layer exists that can store business logic independently of presentation artifacts, then the spreadsheet no longer needs to serve as the ultimate audit substrate. It can remain a powerful interface without remaining the system of record.

Smarter Agents Increase the Need for Deterministic Infrastructure

There is a broader pattern in AI infrastructure. Improvements at the reasoning layer consistently increase demand for stable infrastructure beneath it. Better embeddings did not eliminate vector databases. More capable code models did not eliminate orchestration tools. In the same way, more capable reasoning agents operating inside enterprise software increase the need for deterministic execution layers that are independent of the model and independent of document artifacts.

This is not a critique of Anthropic. The movement toward operational AI is inevitable and beneficial. The question is not whether Claude can operate Excel. It clearly can. The deeper question is whether enterprise-critical numbers are being produced by a deterministic, versioned, replayable execution substrate or by a probabilistic construction layer writing into mutable documents.

As AI autonomy increases, the distinction between reasoning and execution becomes more important, not less. Language models excel at interpreting intent and navigating ambiguity. Deterministic execution, particularly in domains where numbers carry legal or financial consequences, requires fixed logic graphs, explicit variable definitions, version control, and replayability under time-locked assumptions. That is the category of infrastructure TrueMath is designed to address: a dedicated execution layer that stores and governs business logic independently of both the model and the spreadsheet artifact.
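As a rough illustration of "replayability under time-locked assumptions," the Python sketch below resolves an assumption as of a given date and recomputes a figure from it. This is not TrueMath's API or implementation. The tax-rate history, the dates, and the function names are hypothetical, and amounts are kept in integer cents and basis points to avoid floating-point noise.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical assumption history: (effective date, tax rate in basis points),
# kept sorted by effective date.
TAX_RATE_HISTORY = [
    (date(2024, 1, 1), 2100),
    (date(2025, 1, 1), 2500),
]


def assumption_as_of(history: list[tuple[date, int]], as_of: date) -> int:
    """Return the value that was in effect on the given date."""
    effective_dates = [d for d, _ in history]
    idx = bisect_right(effective_dates, as_of) - 1
    if idx < 0:
        raise ValueError(f"no assumption in effect on {as_of}")
    return history[idx][1]


def after_tax_cents(pre_tax_cents: int, as_of: date) -> int:
    """Fixed logic: the only thing that varies is the time-locked assumption."""
    rate_bps = assumption_as_of(TAX_RATE_HISTORY, as_of)
    return pre_tax_cents * (10_000 - rate_bps) // 10_000


# Replaying last year's figure uses last year's assumption, even after the rate changed.
replay_2024 = after_tax_cents(100_000, date(2024, 6, 30))
current_2025 = after_tax_cents(100_000, date(2025, 6, 30))
print(replay_2024, current_2025)  # 79000 75000
```

The design choice worth noting is that the `as_of` date is an explicit input to the calculation, so a past result can be reproduced without relying on whichever workbook file happened to be saved at the time.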

Claude operating Excel is a meaningful step forward. But it is not the final answer to deterministic computation or enterprise governance. Arithmetic may be stable. Construction may vary. And even if construction stabilizes, document-centric storage built for human-in-the-middle workflows may still be insufficient for autonomous systems. As AI systems take on greater operational responsibility, organizations will need to decide whether productivity artifacts remain their execution layer or whether a purpose-built deterministic substrate sits beneath them. That decision will shape not just convenience, but reliability, auditability, and institutional trust.

Reach out: bill.kelly@truemath.ai
Learn more: truemath.ai