Research

Authorized access only

AgentGuard Research Landscape

Formally verified runtime guardrails for agentic workflows

Overview — Assurance Taxonomy

Problem: Who Guards the AI Agent?

AI agents with tool access can cause real-world damage — delete files, leak credentials, overwrite production data. Each individual tool call may look benign; the danger emerges from sequences of calls that compose into destructive outcomes. Current defenses are ad-hoc permission prompts (Claude Code, Cursor) or static policy DSLs (Progent, SEAgent) — none provide formal guarantees about what "safe" actually means.

Why this matters: ~80 real behavioral bugs across 14+ agent platforms (Claude Code, Cursor, Codex, Devin, ...) show that agents routinely violate safety in production. Without a principled taxonomy, we cannot systematically classify, detect, or prevent these violations.

Axis 1: Safety × Liveness

Alpern & Schneider (1985) proved that every trace property decomposes into exactly one safety property (□ nothing bad ever happens) and one liveness property (◊ something good eventually happens). Schneider (2000) showed safety properties are enforceable by runtime monitors (security automata). This gives us two columns.

Axis 2: Intent × Control × Execution

Marr (1982) distinguished three levels of analysis for information-processing systems: what is computed, how it is computed, and where it is computed. We adapt this to agent guardrails as three assurance layers: Intent (are the goals permissible?), Control (do decisions follow policy?), Execution (does the mechanism run without corruption?). This gives us three rows.

The Plane: 3 × 2 = 6 Assurance Properties (MECE)

Crossing two axes yields six fundamental properties. Every agent safety concern maps to exactly one cell:

Safety (□)
Liveness (◊)
Intent
Validity
Goals are well-formed and permissible
rm -rf / via prompt injection; resume filter using race/gender
Feasibility
Safe goals remain achievable (no over-blocking)
→ Guard blocks git push because "push" pattern-matches as dangerous
Control
AG Compliance
Decisions never select forbidden actions
→ Two sessions concurrently acquire same skill (NanoClaw #138 TOCTOU race)
AG Convergence
Decision processes eventually reach a terminal state
→ Cursor rollback on timeout replays messages forever (NanoClaw #164)
Execution
AG Integrity
Data flow preserves security labels (no taint leaks)
read ~/.ssh/id_rsasend_slack(#public) — each call legal, sequence leaks key
AG Availability
Taint propagation stabilizes; all calls properly labeled
→ Subagent announcement delivered 17x — "delivered" label never stabilizes (OpenClaw #20880)

Verification Toolchain

Layer
Technique
Formalization
Intent
LoRA fine-tuned judge — QLoRA Qwen2.5-7B
Empirical only (Rice 1953: undecidable)
Control
TLA+ / TLC model checking — FSM spec, temporal properties
TLC model checker — verified 8 invariants, 2 liveness
Execution
Taint tracking — Denning lattice, label propagation
Knaster-Tarski fixpoint — monotone lattice convergence

Kani (CBMC) separately verifies the Rust runtime matches TLA+ specs.

Decidability boundary: Control + Execution = decidable (TLA+, taint lattice). Intent = undecidable (Rice 1953) — approximated by LoRA.
AgentGuard formally verifies all four decidable cells: Compliance + Convergence + Integrity + Availability.
▽ per-cell verification ▽
Section 1 — Per-Cell Verification

Each Cell: Bug → Property → Technique (O, S)

For each decidable cell, a real bug, the property it violates, and how AgentGuard verifies it.

Property
Real Bug
Formal Property
Technique (O, S)
IntegrityExecution × Safety
NanoClaw #293
Two agents acquire the same skill simultaneously. Duplicate execution corrupts shared state.
□ no corrupt state
∀ reachable states: at most one session holds a skill. No deadlock state reachable.
TLC model checking
O=Trace, S=Formal. TLA+ spec (AgentGuardScheduler.tla). 8 invariants verified.
ComplianceControl × Safety
Credential exfiltration
Agent reads .env, then send_slack(#ch, AWS_KEY). Each call legal; sequence leaks credentials.
□ no forbidden action
∀ traces: no path from taint source to taint sink without denial.
Deny rules + taint tracking
O=Local (deny rules) + O=Trace (taint propagation). S=Operational. Security automaton (Schneider 2000).
AvailabilityExecution × Liveness
OpenClaw #1090
Heartbeat timeout cascading failure. Taint labels never stabilize. System silently stops labeling.
◊ taint stabilizes
∀ flows: taint propagation reaches fixpoint (Knaster-Tarski). All calls eventually labeled.
TLC temporal property
O=Trace, S=Formal. Monotone lattice guarantees fixpoint. Lease-based stale recovery in runtime.
ConvergenceControl × Liveness
Silent taint drift
Tainted data propagates through 5 calls, reaching HTTP POST. No single call triggers deny. Monitor never resolves the flow.
◊ eventually resolves
∀ tainted flows: ◊(denied ∨ sanitized). Monotone lattice ⇒ fixpoint convergence (Knaster-Tarski).
TLC composition proof
O=Trace, S=Formal. Compliance guard encoded as TLA+ monitor. Verified: adding guard doesn't break Availability.

Each cell: empirical bug → formal property → mechanized verification with explicit (O, S). The decidability boundary means every property above has a terminating checker.

▽ verification technique details ▽
Section 2 — Verification Stack

Property × Observation × Specification

Each decidable cell mapped to its verification parameters (O, S) and concrete artifacts.

Property
Spec Artifact
O
S
Runtime
Integrity
TLA+ — AgentGuardScheduler.tla, MultiSkill.tla, RateLimit.tla
Trace
Formal
5-state scheduler — Rust, SQLite, 8 invariants
Compliance
Deny rules + taint policy — TOML: sources, sinks, labels
Local + Trace
Operational
Pattern matcher + taint tracker — label propagation
Availability
TLA+ — monotone lattice fixpoint (◊ stabilizes)
Trace
Formal
Taint fixpoint — label propagation convergence
Convergence
TLA+ monitor — AgentGuardTaint.tla
Trace
Formal
Fixpoint convergence — monotone lattice

Compliance uses two observation levels: O=Local for stateless deny rules (single tool call), O=Trace for stateful taint tracking (call chain). Both are the same property (Compliance) at different observation granularities.

▽ what bugs motivate this? ▽
Section 3 — Execution-Layer Bugs (Integrity + Availability)

Structural Bug Study: 43 Concurrency Bugs from OpenClaw + NanoClaw

System-level concurrency bugs addressed by Integrity and Availability properties.

Class A
MutualExclusion
Class B
Liveness/Deadlock
Class C
Atomicity/TOCTOU
Total
OpenClaw18 bugs
5
7
6
18
NanoClaw25 bugs
3
15
7
25
Total
8
22
13
43
Prevented by Integrity + Availability
Needs Compliance (Control layer)
Total
Class A
8 / 8
0
8
Class B
18 / 22
4
22
Class C
9 / 13
4
13
Total
35 / 43 (81%)
8
43

Integrity + Availability (model checking) prevents 81% of structural concurrency bugs. The remaining 8 require Control-layer analysis (Compliance + Convergence).

▽ control-layer bugs ▽
Section 4 — Control-Layer Bugs (Compliance + Convergence)

Behavioral Bug Survey: ~80 Bugs Across 14+ Agent Platforms

Behavioral safety violations: individually legal tool calls composing into destructive sequences. Compliance violations across Claude Code, Cursor, Codex, Devin, and others.

D1
Destructive Delete
D2
Credential Leak
D3
Unauth Overwrite
D4
Msg Misdirect
D5
Scope Escape
Total
Claude CodeAnthropic
8
5
3
1
2
19
CursorAnysphere
6
4
2
1
1
14
OpenAI CodexOpenAI
5
3
1
2
1
12
DevinCognition
4
3
1
1
2
11
Others10+ platforms
12
7
2
1
2
24
Total
~35
~22
~9
~6
~8
~80

Compliance Prevention Analysis

Blocked by Compliance
O=Local or O=Trace
Needs Validity (Intent)
Total
D1 Destructive Delete
~32 / 35 (O=Local)
~3
~35
D2 Credential Leak
~20 / 22 (O=Trace)
~2
~22
D3 Unauth Overwrite
~7 / 9 (O=Local)
~2
~9
D4 Msg Misdirect
~3 / 6 (O=Trace)
~3
~6
D5 Scope Escape
~4 / 8 (O=Local+Trace)
~4
~8
Total
~66 / 80 (~82%)
~14
~80

D1 (destructive deletion) blocked by Compliance at O=Local (deny rules). D2 (credential leak) blocked at O=Trace (taint tracking). Same Compliance property, different observation levels — the taxonomy naturally absorbs the stateless/stateful distinction via O.

▽ platform landscape ▽
Section 5 — Competitive Landscape

Platform × Assurance Coverage

How existing platforms map onto the four decidable assurance properties.

Integrity
Execution × Safety
Availability
Execution × Liveness
Compliance
Control × Safety
Convergence
Control × Liveness
AgentGuardThis work
AG
TLC model checking. 3 TLA+ specs, 8 invariants.
AG
TLC temporal. Taint fixpoint convergence.
AG
Security automaton + taint tracking.
AG
TLC composition. No new deadlock from Compliance.
ProgentDSL policies
Empty
No state model.
Empty
No liveness guarantee.
Partial
Policy DSL. O=Local only. No taint tracking.
Empty
SEAgentMAC framework
Empty
Empty
Partial
MAC rules. O=Local only. No composition analysis.
Empty
Claude CodeAnthropic
Empty
No multi-session state.
Empty
Partial
Permission prompts. O=Local, S=Informal.
Empty
OpenAI AgentsAgents SDK
Empty
Empty
Empty
Empty
AIOSRutgers
Partial
OS metaphor. S=Informal.
Empty
Empty
Empty

AgentGuard is the only system covering all four decidable cells. Progent and SEAgent partially address Compliance at O=Local but lack O=Trace (taint) and formal verification. No existing system addresses Convergence.

▽ architecture ▽
Section 6 — Architecture

Spec → Verify → Runtime → Integration

Top-down stack. The 5-state OS process model is unchanged; Compliance integrates as guard conditions, not new states.

Execution Layer
Integrity + Availability
Control Layer
Compliance + Convergence
1
Spec
TLA+
3 modules, 8 invariants, 2 liveness properties. 5-state process model.
Deny rules + taint policy
TOML config: asset paths, sources/sinks/labels. Security automaton (Schneider 2000).
2
Verify
TLC model checker
Invariant checking (Integrity) + temporal properties (Availability). O=Trace, S=Formal.
TLC composition
Verify Compliance guard doesn't break Availability (no new deadlock). Cross-property compatibility.
3
Runtime
Rust scheduler
Direct translation of TLA+ actions. 5-state FSM, SQLite, tokio async. 95 tests
Taint tracker + pattern matcher
O=Local: deny rule match. O=Trace: label propagation. Guard on ready→running transition.
4
Integration
Unix socket daemon — JSON protocol. Lease-based stale recovery.
Claude Code hooks — PreToolUse/PostToolUse. Invisible to user. Fail-open on daemon errors.
5
Future
Kani/CBMC
Bounded model checking: Rust scheduler matches TLA+ spec.
Abstract interpretation
Galois connection: proves taint propagation sound over security lattice.
5-State Process Model (unchanged): pending → ready → running → blocked → completed/failed
Compliance check is a guard condition on the ready→running transition. If violation: task → failed(reason: "compliance_violation"). No new states added.
▽ roadmap ▽
Section 7 — Roadmap

ASE 2026 Scope & Research Roadmap

ASE 2026 Paper Scope: Assurance Taxonomy (A = L × P) + decidability boundary + four verified cells (Integrity, Availability, Compliance, Convergence) + dual bug study (~123 bugs reclassified into taxonomy matrix) + formal guarantees: TLC specs, automata-theoretic soundness (O=Local), refinement argument (Rust ↔ TLA+). Claim: formally specified + partially verified implementation.
1
AgentGuard: Formally Verified Guardrails for Agentic Workflows ASE 2026Mar 2026
Assurance Taxonomy. TLC + security automata. Dual bug study (structural: 43, behavioral: ~80). Decidability-bounded scope.
2
Kani Verification: Closing the Spec-to-Impl Gap Tool Paper2026 Q4
CBMC-based bounded model checking on Rust impl. Scheduler transitions, taint propagation correctness.
3
Formalized Taint as Abstract Interpretation ICSE / FSE 2027Sep 2026
Information flow type system. Proves Compliance at O=Trace via Galois connection over security lattice.
4
Plan-Before-Execute: Telemetry Study CHI / CSCW 2027Jan 2027
Empirical study using AgentGuard telemetry. Does requiring plans reduce Compliance violations? N=100+ users.
5
Distributed AgentGuard: Multi-Host Coordination OSDI / ATC 2027
Extend from Unix socket to distributed consensus. Raft-based Integrity + Availability across machines.
References

Key References

Formal Foundations

P1-P6
Assurance Taxonomy
Marr (1982): three levels of analysis (Layer axis). Alpern & Schneider (1985): Safety/Liveness decomposition (Property axis). Schneider (2000): security automata. Clarkson & Schneider (2010): hyperproperties (Observation axis). Rice (1953): undecidability of Intent layer.

T — Verification Stack

T1-T5
Integrity + Availability: TLC Model Checking
AgentGuardScheduler.tla, MultiSkill.tla, RateLimit.tla. 8 invariants, 2 liveness properties. O=Trace, S=Formal.
T6-T10
Compliance + Convergence: Security Automaton + Taint
Deny rules (O=Local, S=Operational). Taint policy (O=Trace, S=Operational). Composition verified via TLC.

A — Execution-Layer Bug Study

A1-A3
43 concurrency bugs from OpenClaw + NanoClaw
github.com/yiidtw/agentguard
Class A: Mutual exclusion (8). Class B: Liveness/deadlock (22). Class C: Atomicity/TOCTOU (13).

D — Control-Layer Bug Survey

D1-D6
~80 Compliance violations across 14+ agent platforms
D1: Destructive deletion (~35, O=Local). D2: Credential leak (~22, O=Trace). D3: Unauthorized overwrite (~9). D4: Message misdirection (~6). D5: Scope escape (~8).

C — Competitive Landscape

C1-C4
AgentGuard (this work)
github.com/yiidtw/agentguard
C5-C8
Progent — DSL policy system
Compliance at O=Local only. No formal verification.
C9-C12
SEAgent — MAC framework
Compliance at O=Local only. No Integrity, no Convergence.

F — Architecture

F1-F2
TLA+ Specifications + Security Automaton
spec/ — Scheduler, MultiSkill, RateLimit, Taint
F3-F4
Rust Kernel + Unix Socket Daemon
software/ — Cargo workspace, SQLite, tokio async
AgentGuard — Formally Verified Runtime Guardrails for Agentic Workflows — github.com/yiidtw/agentguard Updated: 2026-03-09