Why the LLM never decides

18 March 2025 · Ujjwal Soni · 2 min read

When I started building Vera in early 2025, the prevailing wisdom was that you build an AI agent by giving a large language model tools and letting it reason its way to an action. I tried that first, honestly. The demos were spectacular. Then I watched one confidently assign a technician without a gas certification to a gas-boiler job, and explain its choice in three fluent, persuasive, wrong paragraphs.

That failure mode isn't a bug you patch. It's what a probabilistic text generator is. You can lower the temperature, you can prompt it with the rules in bold capital letters, and you have still only changed the probability of violation, not eliminated it. In an operation where the constraint is "a driver may not exceed legal driving hours" or "hazmat must not pass through residential zones," a 99% compliance rate is not a safety property. It's a lawsuit schedule.
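
To put a rough number on that, here is a back-of-the-envelope sketch. The figure of 500 decisions and the assumption that each decision complies independently are mine, chosen only for illustration; they are not measurements from any real operation.

```python
# Assumptions for illustration only: each decision complies independently
# with probability 0.99, and the operation makes 500 such decisions.
compliance_rate = 0.99
decisions = 500

# Average number of rule violations across those decisions.
expected_violations = decisions * (1 - compliance_rate)

# Chance that at least one decision violates a rule.
p_at_least_one = 1 - compliance_rate ** decisions

print(f"expected violations: {expected_violations:.1f}")
print(f"P(at least one violation): {p_at_least_one:.3f}")
```

Under those assumptions you should expect about five violations, and getting through all 500 decisions cleanly is the rare outcome, not the normal one.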

So Vera is built on one rule that shapes everything else: the LLM never decides.

The language model does exactly one job — translation. Free text in, typed structures out, validated against strict schemas. A dispatcher writes "Gold customers must get a P1 response within four hours" and the model produces a structured rule with scope, condition and requirement. If the output doesn't validate, it gets repaired or rejected. It is never waved through.
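
A minimal sketch of what "validated against strict schemas" can look like, using only the Python standard library. The field names, the allowed values, and the `parse_rule` helper are hypothetical and invented for this example; Vera's actual schema and repair logic are not shown here.

```python
import json
from dataclasses import dataclass

class RuleValidationError(ValueError):
    """Raised when model output does not match the rule schema."""

@dataclass(frozen=True)
class ResponseRule:
    customer_tier: str        # scope: who the rule applies to
    priority: str             # condition: which kind of response
    max_response_hours: int   # requirement: the deadline

# Hypothetical vocabularies; a real deployment would define its own.
ALLOWED_TIERS = {"gold", "silver", "bronze"}
ALLOWED_PRIORITIES = {"P1", "P2", "P3"}
EXPECTED_KEYS = {"customer_tier", "priority", "max_response_hours"}

def parse_rule(raw: str) -> ResponseRule:
    """Turn raw model output into a typed rule, or raise. Never guess."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RuleValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise RuleValidationError(f"expected exactly keys {sorted(EXPECTED_KEYS)}")
    tier, priority, hours = (data[k] for k in
                             ("customer_tier", "priority", "max_response_hours"))
    if not isinstance(tier, str) or tier not in ALLOWED_TIERS:
        raise RuleValidationError(f"unknown customer tier: {tier!r}")
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        raise RuleValidationError(f"unknown priority: {priority!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(hours, bool) or not isinstance(hours, int) or hours <= 0:
        raise RuleValidationError(f"max_response_hours must be a positive int: {hours!r}")
    return ResponseRule(tier, priority, hours)

# What the model might emit for the sentence above (assumed shape).
model_output = '{"customer_tier": "gold", "priority": "P1", "max_response_hours": 4}'
rule = parse_rule(model_output)
print(rule)
```

The design choice that matters is that `parse_rule` has only two outcomes: a fully typed rule or an exception. An extra key, a misspelled tier, or "four" instead of 4 all fail loudly, which is the point at which a repair attempt or a rejection would happen.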

The deciding is done by Z3, a constraint solver. Solvers are the opposite of language models in every way that matters operationally: deterministic, exhaustive over the rule set, and capable of proving infeasibility. When Vera rejects a plan, it returns the unsat core — the minimal set of rules and facts that make the plan impossible. That's not an explanation the system made up afterwards. It's the actual reason, extracted from the proof.

People sometimes ask whether this is just rules engines again. It isn't, and the difference is the front door: classical rules engines die because encoding rules requires specialists and every change is a project. In Vera, the operators write the rules in their own language, confirm the structured form, and the rules go live. The LLM solved the authoring problem. The solver solved the trust problem. Neither could solve both.

We're running this architecture in a supervised pilot with a Finnish field-service operator right now. The early lesson is the one I hoped for: dispatchers don't actually want an AI that sounds smart. They want one that can show its work.

Want to see this in your operation?

Bring one real disruption from last week — request a demo and we'll show you the verified version.