Formally verified runtime guardrails for agentic workflows
Overview — Assurance Taxonomy
Problem: Who Guards the AI Agent?
AI agents with tool access can cause real-world damage — delete files, leak credentials, overwrite production data. Each individual tool call may look benign; the danger emerges from sequences of calls that compose into destructive outcomes. Current defenses are ad-hoc permission prompts (Claude Code, Cursor) or static policy DSLs (Progent, SEAgent) — none provide formal guarantees about what "safe" actually means.
Why this matters: ~80 real behavioral bugs across 14+ agent platforms (Claude Code, Cursor, Codex, Devin, ...) show that agents routinely violate safety in production. Without a principled taxonomy, we cannot systematically classify, detect, or prevent these violations.
Axis 1: Safety × Liveness
Alpern & Schneider (1985) proved that every trace property is the intersection of a safety property (□ nothing bad ever happens) and a liveness property (◊ something good eventually happens). Schneider (2000) showed that the properties enforceable by runtime monitors (security automata) are precisely safety properties. This gives us two columns.
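As a concrete illustration of Schneider-style enforcement, the sketch below implements a tiny security automaton in Python: it consumes one action at a time and truncates the run at the first action that would complete a bad prefix. The two-state policy ("never write externally after reading a secret") and all names are illustrative, not taken from the AgentGuard implementation.

```python
# Minimal security automaton (in the style of Schneider 2000): a monitor
# that vets actions one at a time and halts the target before any bad
# prefix completes. The policy encoded here is purely illustrative.

class SecurityAutomaton:
    def __init__(self):
        self.state = "clean"          # clean -> tainted

    def step(self, action):
        """Return True if the action may proceed, False to truncate the run."""
        if self.state == "clean":
            if action == "read_secret":
                self.state = "tainted"
            return True               # everything is allowed before tainting
        # tainted: further reads are fine; an external write is the "bad thing"
        return action != "write_external"

monitor = SecurityAutomaton()
trace = ["read_file", "read_secret", "write_external"]
verdicts = [monitor.step(a) for a in trace]
print(verdicts)  # [True, True, False]: the final write is truncated
```

Because the monitor only ever rejects on a finite prefix, it can enforce the safety column of the taxonomy but not the liveness column, which is exactly the boundary the two axes capture.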
Axis 2: Intent × Control × Execution
Marr (1982) distinguished three levels of analysis for information-processing systems: what is computed and why, how it is computed, and how the computation is physically realized. We adapt this to agent guardrails as three assurance layers: Intent (are the goals permissible?), Control (do decisions follow policy?), and Execution (does the mechanism run without corruption?). This gives us three rows.
The Plane: 3 × 2 = 6 Assurance Properties (MECE)
Crossing two axes yields six fundamental properties. Every agent safety concern maps to exactly one cell:
|  | Safety (□) | Liveness (◊) |
|---|---|---|
| Intent | Validity: goals are well-formed and permissible. → rm -rf / via prompt injection; resume filter using race/gender | Feasibility: safe goals remain achievable (no over-blocking). → guard blocks git push because "push" pattern-matches as dangerous |
| Control | AGCompliance: decisions never select forbidden actions. → two sessions concurrently acquire the same skill (NanoClaw #138 TOCTOU race) | AGConvergence: decision processes eventually reach a terminal state. → Cursor rollback on timeout replays messages forever (NanoClaw #164) |
| Execution | AGIntegrity: data flow preserves security labels (no taint leaks) | Availability (Execution × Liveness; see Section 5) |
Each cell follows the same pipeline: empirical bug → formal property → mechanized verification with explicit (O, S). The decidability boundary means every property on its decidable side (the four cells Integrity, Availability, Compliance, and Convergence) has a terminating checker.
Section 2 — Verification Stack
Property × Observation × Specification
Each decidable cell mapped to its verification parameters (O, S) and concrete artifacts.
Compliance uses two observation levels: O=Local for stateless deny rules (single tool call), O=Trace for stateful taint tracking (call chain). Both are the same property (Compliance) at different observation granularities.
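The two observation levels can be contrasted in a few lines of Python. The rule bodies and the source/sink sets below are hypothetical, chosen only to make the O=Local / O=Trace split concrete:

```python
# O=Local: a stateless deny rule sees exactly one tool call.
def deny_local(call):
    return call["tool"] == "bash" and "rm -rf /" in call["args"]

# O=Trace: a stateful checker folds over the whole call chain,
# tracking whether any value became tainted by reading a secret.
SECRET_SOURCES = {"read_env", "read_keychain"}   # hypothetical source set
EXTERNAL_SINKS = {"http_post", "send_email"}     # hypothetical sink set

def deny_trace(trace):
    tainted = False
    for call in trace:
        if call["tool"] in SECRET_SOURCES:
            tainted = True
        elif call["tool"] in EXTERNAL_SINKS and tainted:
            return True        # credential-leak shape: source, then sink
    return False

leak = [{"tool": "read_env", "args": ""}, {"tool": "http_post", "args": "..."}]
print(deny_local(leak[0]))   # False: each call is individually legal
print(deny_trace(leak))      # True: the sequence composes into a leak
```

The point of the example is that both functions check the same Compliance property; only the observation granularity (one call vs. the call chain) differs.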
Behavioral Bug Survey: ~80 Bugs Across 14+ Agent Platforms
Behavioral safety violations: individually legal tool calls composing into destructive sequences. Compliance violations across Claude Code, Cursor, Codex, Devin, and others.
| Platform | D1 Destructive Delete | D2 Credential Leak | D3 Unauth Overwrite | D4 Msg Misdirect | D5 Scope Escape | Total |
|---|---|---|---|---|---|---|
| Claude Code (Anthropic) | 8 | 5 | 3 | 1 | 2 | 19 |
| Cursor (Anysphere) | 6 | 4 | 2 | 1 | 1 | 14 |
| OpenAI Codex (OpenAI) | 5 | 3 | 1 | 2 | 1 | 12 |
| Devin (Cognition) | 4 | 3 | 1 | 1 | 2 | 11 |
| Others (10+ platforms) | 12 | 7 | 2 | 1 | 2 | 24 |
| Total | ~35 | ~22 | ~9 | ~6 | ~8 | ~80 |
Compliance Prevention Analysis

| Bug class | Blocked by Compliance (O=Local or O=Trace) | Needs Validity (Intent) | Total |
|---|---|---|---|
| D1 Destructive Delete | ~32 / 35 (O=Local) | ~3 | ~35 |
| D2 Credential Leak | ~20 / 22 (O=Trace) | ~2 | ~22 |
| D3 Unauth Overwrite | ~7 / 9 (O=Local) | ~2 | ~9 |
| D4 Msg Misdirect | ~3 / 6 (O=Trace) | ~3 | ~6 |
| D5 Scope Escape | ~4 / 8 (O=Local+Trace) | ~4 | ~8 |
| Total | ~66 / 80 (~82%) | ~14 | ~80 |
D1 (destructive deletion) blocked by Compliance at O=Local (deny rules). D2 (credential leak) blocked at O=Trace (taint tracking). Same Compliance property, different observation levels — the taxonomy naturally absorbs the stateless/stateful distinction via O.
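The convergence side of taint tracking can be shown in miniature: propagation is a monotone fixpoint over a finite security lattice, so iteration must terminate. The two-point lattice and the flow edges below are a sketch, not the implementation's actual lattice:

```python
# Taint propagation as a monotone fixpoint over a finite lattice
# {LOW < HIGH}. Joins only move labels upward and the lattice is finite,
# so the loop must terminate: the convergence argument in miniature.
# Variable names and flow edges are illustrative.

LOW, HIGH = 0, 1

def fixpoint(labels, flows):
    """flows: (src, dst) pairs meaning dst's label joins src's label."""
    changed = True
    while changed:
        changed = False
        for src, dst in flows:
            joined = max(labels[dst], labels[src])   # lattice join
            if joined != labels[dst]:
                labels[dst] = joined
                changed = True
    return labels

labels = {"api_key": HIGH, "msg": LOW, "request_body": LOW}
flows = [("api_key", "msg"), ("msg", "request_body")]
print(fixpoint(labels, flows))  # all three labels end HIGH: transitive taint
```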
Section 5 — Competitive Landscape
Platform × Assurance Coverage
How existing platforms map onto the four decidable assurance properties.
| Platform | Integrity (Execution × Safety) | Availability (Execution × Liveness) | Compliance (Control × Safety) | Convergence (Control × Liveness) |
|---|---|---|---|---|
| AgentGuard (this work) | ✓ TLC model checking; 3 TLA+ specs, 8 invariants | ✓ TLC temporal; taint fixpoint convergence | ✓ Security automaton + taint tracking | ✓ TLC composition; no new deadlock from Compliance |
| Progent (DSL policies) | — no state model | — no liveness guarantee | Partial: policy DSL; O=Local only; no taint tracking | — |
| SEAgent (MAC framework) | — | — | Partial: MAC rules; O=Local only; no composition analysis | — |
| Claude Code (Anthropic) | — no multi-session state | — | Partial: permission prompts; O=Local, S=Informal | — |
| OpenAI Agents (Agents SDK) | — | — | — | — |
| AIOS (Rutgers) | Partial: OS metaphor; S=Informal | — | — | — |
AgentGuard is the only system covering all four decidable cells. Progent and SEAgent partially address Compliance at O=Local but lack O=Trace (taint) and formal verification. No existing system addresses Convergence.
Section 6 — Architecture
Spec → Verify → Runtime → Integration
Top-down stack. The 5-state OS process model is unchanged; Compliance integrates as guard conditions, not new states.
Execution Layer: Integrity + Availability. Control Layer: Compliance + Convergence.

1. Spec (TLA+): 3 modules, 8 invariants, 2 liveness properties; 5-state process model.
Runtime: Unix socket daemon with a JSON protocol and lease-based stale recovery.
Integration: Claude Code hooks (PreToolUse/PostToolUse); invisible to the user; fail-open on daemon errors.
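Assuming a hook contract of the form "tool event in, allow/block verdict out" (the payload fields below are assumptions for illustration, not a verbatim copy of the Claude Code hook API), the fail-open guard logic behind a PreToolUse hook might look like:

```python
# Sketch of the fail-open guard logic behind a PreToolUse hook.
# The event shape ("tool_name", "tool_input") is an assumed hook payload;
# a real hook would read the event as JSON on stdin and translate the
# ALLOW/BLOCK verdict into the hook's exit-code convention.
import json

ALLOW, BLOCK = "allow", "block"

def pre_tool_use(event_json):
    try:
        event = json.loads(event_json)
        tool = event.get("tool_name", "")
        args = json.dumps(event.get("tool_input", {}))
        # Stateless O=Local deny rule; a query to the daemon would slot in here.
        if tool == "Bash" and "rm -rf /" in args:
            return BLOCK
        return ALLOW
    except Exception:
        return ALLOW          # fail-open: malformed events never block

print(pre_tool_use('{"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}}'))  # block
print(pre_tool_use("not json"))                                                      # allow
```

The fail-open branch mirrors the design choice stated above: a daemon or parse error must never block the user's workflow, at the cost of missing checks during outages.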
5. Future (Kani/CBMC): bounded model checking that the Rust scheduler matches the TLA+ spec.
Abstract interpretation: a Galois connection proving taint propagation sound over the security lattice.
5-State Process Model (unchanged): pending → ready → running → blocked → completed/failed. The Compliance check is a guard condition on the ready → running transition; on violation, the task moves to failed(reason: "compliance_violation"). No new states are added.
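The guard-condition integration can be sketched as a transition function over the unchanged states; the `compliant` predicate here is a stand-in for the real Compliance daemon query:

```python
# The 5-state process model with Compliance as a guard on ready -> running
# (completed/failed are the two terminal outcomes). No new states: a
# violation routes down the existing ready -> failed edge.

STATES = {"pending", "ready", "running", "blocked", "completed", "failed"}

def ready_to_running(task, compliant):
    """Attempt the ready -> running transition under the Compliance guard."""
    assert task["state"] == "ready"
    if compliant(task):
        task["state"] = "running"
    else:
        task["state"] = "failed"
        task["reason"] = "compliance_violation"
    return task

is_safe = lambda t: t["tool"] != "rm_rf"   # stand-in for the daemon check
ok = ready_to_running({"state": "ready", "tool": "git_status"}, is_safe)
bad = ready_to_running({"state": "ready", "tool": "rm_rf"}, is_safe)
print(ok["state"], bad["state"], bad["reason"])
# running failed compliance_violation
```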
Section 7 — Roadmap
ASE 2026 Scope & Research Roadmap
ASE 2026 Paper Scope: the Assurance Taxonomy (A = L × P), the decidability boundary, four verified cells (Integrity, Availability, Compliance, Convergence), a dual bug study (~123 bugs reclassified into the taxonomy matrix), and formal guarantees: TLC specs, automata-theoretic soundness (O=Local), and a refinement argument (Rust ↔ TLA+). Claim: a formally specified and partially verified implementation.
1. AgentGuard: Formally Verified Guardrails for Agentic Workflows. ASE 2026, Mar 2026.