Why do IaC tools use state files at all?

Three legitimate reasons - mapping declared resources to real provider identifiers, caching state to avoid slow refresh calls, and recording dependency metadata for ordering destroys. These are real problems; the state file solves them at the cost of maintaining a copy of reality that must be locked, refreshed, and repaired.

Can agents just use Terraform via its CLI?

They can, and for executing an existing human workflow (plan, review, apply) it works. It breaks down when agents need to ask ad-hoc questions, act on resources outside the state file, or operate concurrently - state locking serializes writers, and the state answers only "what does Terraform manage?", not "what exists?".

What replaces the state file for agents?

Live querying of the provider APIs. The control plane is the source of truth; an interface that queries it directly (and mutates through the same surface) gives agents current facts, no lock contention, and visibility into unmanaged resources. This is the model StackQL implements.

Why State Files Break Down for AI Agents

State files were designed around assumptions that AI agents violate: one writer at a time, changes batched into infrequent human-reviewed applies, and a workflow that tolerates the state being stale between refreshes. Agents are concurrent, act continuously, and reason from whatever facts they are given - if those facts come from a state file, the agent reasons from a stale, partial copy of reality.

The assumptions, made explicit

The state-file model works when:

Writes are serialized. State locking assumes contention is rare. Human teams apply a few times a day; lock contention is an edge case.
Reads happen at plan time. State is refreshed when a human runs plan; staleness in between is invisible because nobody is looking.
The managed set is the relevant set. The state file describes only resources the tool created. The workflow assumes nothing important exists outside it.
Errors are repaired by experts. Corrupted or divergent state is fixed by a practitioner doing state surgery (imports, moves, manual edits).

Each assumption fails under agentic operation. Agents act in seconds, so serialized writes become a throughput ceiling and lock contention the norm. Agents observe continuously, so between-refresh staleness becomes a stream of false premises. Agents are routinely asked about the whole environment ("is anything publicly exposed?"), and the answer outside the managed set is precisely what matters. And an agent that corrupts state cannot reliably perform its own state surgery - the repair loop assumed human judgment.

The deeper problem: the copy is the interface

All of these reduce to one design decision: the state file makes a copy of infrastructure state the system's working interface. Every copy has a synchronization problem, and every consumer of the copy inherits it. Infrastructure drift is not an operational accident; it is the steady-state behavior of a copy in an environment where reality changes through other channels - consoles, autoscalers, other tools, other agents.

Human workflows absorb this with process: change freezes, single-tool mandates, periodic reconciliation. Agentic environments cannot - the value of agents is that they act often and independently, which is exactly the traffic pattern that makes a coordinated copy unmaintainable.

The replacement primitive

The control plane itself is already a consistent, authoritative, concurrent-safe state store - the providers run it. What agents need is an interface to it that fits how they work: declarative queries over current state, uniform across providers, with mutations expressed in the same grammar and gated by policy. That is the model StackQL implements:

SELECT instance_id, instance_type, launch_time
FROM aws.ec2.instances
WHERE region = 'us-east-1';

The result is true at execution time, includes unmanaged resources, requires no lock, and needs no repair workflow - there is nothing to corrupt. Identifier mapping and dependency ordering (the state file's legitimate jobs) move into the query engine and deployment orchestration (stackql-deploy) rather than into a mutable artifact the agent must keep consistent.

What is Infrastructure Drift? - drift as a property of copies
What is Agentic Infrastructure? - the requirements agents impose
StackQL vs Terraform - the two models side by side
What is Queryable Infrastructure? - the live-query operating model
How to use StackQL with AI agents - the practical setup

The assumptions, made explicit​

The deeper problem: the copy is the interface​

The replacement primitive​

Related concepts​

The assumptions, made explicit

The deeper problem: the copy is the interface

The replacement primitive

Related concepts