Audit trails for AI decisions

What a compliance-grade decision record contains, why post-hoc "explainability" isn't it, and what regulations are starting to expect.

Explanation vs. record

Most AI "explainability" answers the question "what might the model have been thinking?" The answer is a reconstruction, generated after the fact, with no guaranteed connection to the actual computation. That can be useful for debugging. It is close to worthless in a dispute.

An audit trail answers a different question: "what exactly happened?" For an operational decision, a record that can survive a customer dispute, an insurance claim or a regulator's query needs to contain, at minimum:

The trigger — the task or disruption, as received.
The inputs — the facts used, and which versions of which rules were active.
The alternatives — every option considered, with the verdict on each and the specific rule any rejected option violated.
The decision — what was chosen, and on what basis it was preferred.
The authority — which autonomy mode applied, and who approved (or that it executed autonomously under a configured policy).
Immutability — append-only storage, so the record's integrity isn't an honor-system claim.
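
As a rough illustration of the list above, here is a minimal Python sketch of such a record and an append-only log. All names (DecisionRecord, Alternative, AuditLog and their fields) are hypothetical, chosen for this example rather than taken from any particular product. The hash chain is one common way to make tampering detectable; it does not prevent tampering by itself, so a real deployment would still need write-once storage or an external anchor for the latest hash.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Alternative:
    option: str
    verdict: str                          # "accepted" or "rejected"
    violated_rule: Optional[str] = None   # rule id and version, if rejected


@dataclass(frozen=True)
class DecisionRecord:
    trigger: str                  # the task or disruption, as received
    facts: dict                   # the inputs used
    rule_versions: dict           # rule id -> version active at decision time
    alternatives: tuple           # every option considered, with its verdict
    chosen: str                   # what was decided
    basis: str                    # why it was preferred
    autonomy_mode: str            # hypothetical values: "supervised", "autonomous"
    approved_by: Optional[str]    # None if executed under a configured policy
    timestamp: str


class AuditLog:
    """In-memory append-only log; each hash covers the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    def append(self, record: DecisionRecord) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        payload = json.dumps(asdict(record), sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev_hash, "payload": payload, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Calling verify() after any in-place edit to a stored payload returns False, which is the property that turns "we did not change the record" from a claim into a check.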

Why architecture decides whether this is possible

Here's the uncomfortable part: whether you can keep such records is determined by the decision-making architecture, not the logging budget.

If a language model makes the decision, the "basis" of the decision is a distribution over tokens — there is nothing crisp to record. You can log the prompt and the output, but the middle is fog, and the explanation you log is itself generated text.

If a deterministic solver makes the decision over structured facts and versioned rules, the full record falls out naturally: the inputs are enumerable, the verdicts are reproducible, and each rejection can be traced to the specific constraints that conflict (in SAT/SMT-style solvers, an unsat core). Logging 100% of decisions with complete reasoning isn't a heroic feature; it's a side effect of the design.
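
To make the contrast concrete, here is a toy Python sketch of deterministic evaluation over structured facts and versioned rules. It is not a real constraint solver and does not compute unsat cores; it only shows how per-option verdicts and violated rule versions fall out of the evaluation itself. The rule ids, versions, limits and driver names are invented for illustration and are not statements of any actual regulation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    version: str
    check: Callable[[dict, dict], bool]  # (facts, option) -> True if satisfied


def evaluate(facts, options, rules):
    """Return (chosen option name or None, one verdict per option considered)."""
    verdicts = []
    for option in options:
        violated = [f"{r.rule_id}@{r.version}" for r in rules
                    if not r.check(facts, option)]
        verdicts.append({
            "option": option["name"],
            "verdict": "rejected" if violated else "feasible",
            "violated_rules": violated,
            "cost": option["cost"],
        })
    feasible = [v for v in verdicts if v["verdict"] == "feasible"]
    # Lowest cost wins; ties are broken by name so the result is reproducible.
    chosen = (min(feasible, key=lambda v: (v["cost"], v["option"]))["option"]
              if feasible else None)
    return chosen, verdicts


# Hypothetical rules and facts, for illustration only.
RULES = [
    Rule("max_daily_driving", "v3",
         lambda facts, opt: facts["hours_driven"][opt["driver"]] + opt["extra_hours"]
         <= facts["daily_limit_hours"]),
    Rule("adr_certificate", "v1",
         lambda facts, opt: not facts["dangerous_goods"]
         or opt["driver"] in facts["adr_certified"]),
]

FACTS = {
    "hours_driven": {"anna": 7.5, "ben": 4.0, "cara": 3.0},
    "daily_limit_hours": 9.0,
    "dangerous_goods": True,
    "adr_certified": {"anna", "ben"},
}

OPTIONS = [
    {"name": "assign_anna", "driver": "anna", "extra_hours": 2.0, "cost": 100},
    {"name": "assign_ben", "driver": "ben", "extra_hours": 2.5, "cost": 120},
    {"name": "assign_cara", "driver": "cara", "extra_hours": 2.0, "cost": 90},
]

chosen, verdicts = evaluate(FACTS, OPTIONS, RULES)
```

Here assign_anna is rejected under max_daily_driving@v3, assign_cara under adr_certificate@v1, and assign_ben is chosen as the only feasible option. Running evaluate again on the same facts and rule versions yields the same verdicts, which is what makes the record reproducible rather than merely plausible.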

The regulatory direction

The EU AI Act phases in obligations around transparency, human oversight and record-keeping for AI systems, scaled by risk class. Sector rules in transport, such as driving-time regulation, dangerous-goods (ADR) rules and safety certifications, already demand demonstrable compliance today. The practical reading for operations leaders: any AI that takes or recommends operational actions will increasingly need to show, on demand, which rule, which fact and which approval lay behind each action.

One operational bonus

Teams that run verified audit trails report an unplanned benefit: the trail becomes the internal arbiter. "Why was it scheduled this way?" stops being an argument between shifts and becomes a lookup. Institutional memory with timestamps beats institutional memory with seniority.
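
A sketch of what that lookup might look like in Python, assuming records are stored as plain dictionaries keyed by a decision id. The field names and the why function are hypothetical and only mirror the record contents listed earlier.

```python
def why(records, decision_id):
    """Answer "why was it scheduled this way?" from a stored record."""
    record = records[decision_id]
    lines = [f"{record['chosen']} chosen: {record['basis']}"]
    for alt in record["alternatives"]:
        if alt["verdict"] == "rejected":
            rules = ", ".join(alt["violated_rules"])
            lines.append(f"{alt['option']} rejected: violates {rules}")
    approver = record["approved_by"] or "configured policy (autonomous)"
    lines.append(f"approved by {approver} at {record['timestamp']}")
    return lines


# Hypothetical stored record, for illustration only.
RECORDS = {
    "D-1": {
        "chosen": "assign_ben",
        "basis": "only feasible option",
        "alternatives": [
            {"option": "assign_anna", "verdict": "rejected",
             "violated_rules": ["max_daily_driving@v3"]},
            {"option": "assign_ben", "verdict": "feasible", "violated_rules": []},
        ],
        "approved_by": None,
        "timestamp": "2024-01-01T06:30:00Z",
    }
}

answer = why(RECORDS, "D-1")
```

The answer is assembled entirely from stored fields, so two shifts asking the same question get the same lines back.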