Agent Threat Model

How AgentTrust limits risk in autonomous AI systems

Agents are autonomous, long-running, and capable of taking actions across many systems with minimal oversight. When they fail, they often fail quietly, and impact can accumulate before anyone notices.

AgentTrust is built to govern actions before they execute. Not prompts. Not reasoning quality. This page outlines the threats we mitigate and the boundaries we intentionally do not cross.

Control loop
Intent → Policy → Approval (if required) → Session → Action → Audit

AgentTrust assumes agents can be manipulated or compromised. The loop ensures failures remain bounded, attributable, and provable.
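One pass through the control loop can be sketched as follows. This is an illustrative model only, not the AgentTrust API; the `Decision`, `AuditLog`, and hook names are hypothetical stand-ins for the policy, approval, session, and audit stages described above.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """Hypothetical policy verdict: may the action run, and is approval needed?"""
    allowed: bool
    needs_approval: bool = False


@dataclass
class AuditLog:
    """Hypothetical audit sink: every pass through the loop leaves a record."""
    entries: list = field(default_factory=list)

    def record(self, intent, outcome):
        self.entries.append((intent, outcome))


def run_action(intent, evaluate, approve, open_session, execute, audit):
    """One pass: Intent -> Policy -> Approval (if required) -> Session -> Action -> Audit."""
    decision = evaluate(intent)              # policy decides before anything runs
    if not decision.allowed:
        audit.record(intent, "denied")       # denials are evidence too
        return None
    if decision.needs_approval and not approve(intent):
        audit.record(intent, "approval_denied")
        return None
    session = open_session(intent)           # scoped, time-bound session
    result = execute(session, intent)        # the action runs inside the session
    audit.record(intent, "executed")
    return result
```

Note that the audit record is written on every path, including denials, which is what makes failures attributable and provable rather than silent.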

Why agent security is different
  • Agents run continuously
  • Tools chain into tools
  • Credentials outlive intent
  • “Looks normal” abuse accumulates
Reference: NIST AI RMF

Threats we’re designed to address

These threats are mitigated by enforcing explicit intent, bounded scope, time limits, and audit-grade evidence at the moment of action.

Agent identity spoofing

Threat

An agent, tool, or service claims an identity or privilege level it does not possess.

Why it works

Many agent stacks infer trust from context. Identity becomes implied instead of enforced at execution time.

How AgentTrust helps

AgentTrust authorizes only policy-backed actions and validates a scoped session before execution. Self-asserted identity does not execute.
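The principle that self-asserted identity does not execute can be illustrated with a signed session credential: the enforcement point signs what it issued, and execution verifies the signature rather than trusting the caller's claims. This is a minimal sketch, not the AgentTrust implementation; the key handling and function names are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical signing key; a real deployment would use a managed secret, not a constant.
SERVER_KEY = b"demo-signing-key"


def issue_session(agent_id: str, scope: list[str]) -> dict:
    """Issued by the enforcement point after a policy decision, never by the agent."""
    payload = json.dumps({"agent": agent_id, "scope": sorted(scope)}, sort_keys=True)
    sig = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}


def verify_session(token: dict):
    """Execution proceeds only if the signature checks out; a self-asserted claim fails."""
    expected = hmac.new(SERVER_KEY, token["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        return None  # forged or tampered token: no execution
    return json.loads(token["payload"])
```

A token an agent fabricates for itself (or tampers with to widen its scope) simply fails verification, so implied identity never reaches execution.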

Long-horizon / delayed-execution attacks

Threat

An agent behaves benignly at first, then escalates over hours or days: enumeration, staged changes, delayed destructive steps.

Why it works

Long-lived credentials and no enforced mission duration allow quiet escalation outside review windows.

How AgentTrust helps

AgentTrust issues time-bound sessions. When a session expires, intent must be re-declared and re-evaluated. This limits blast radius.
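Time-bound sessions can be sketched as a deadline that every action re-checks, so work cannot quietly continue past the review window. The `ttl_seconds` parameter and `SessionExpired` type here are illustrative assumptions, not AgentTrust's actual interface.

```python
import time


class SessionExpired(Exception):
    """Raised when a session's lifetime has elapsed; intent must be re-declared."""


def new_session(intent: str, ttl_seconds: float) -> dict:
    """A session carries its own deadline, fixed at issuance."""
    return {"intent": intent, "expires_at": time.monotonic() + ttl_seconds}


def require_live(session: dict) -> None:
    """Every action re-checks the deadline before proceeding (fail closed on expiry)."""
    if time.monotonic() >= session["expires_at"]:
        raise SessionExpired(session["intent"])
```

Because the check runs per action rather than per session, an agent that went quiet and resumed later cannot ride an old grant: its next action hits the expired deadline and must go back through intent declaration and policy evaluation.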

Privilege creep during execution

Threat

An agent starts with limited access and gradually expands privileges through tool chaining or implicit scope growth.

Why it works

Permissions are often coarse and static for the lifetime of the agent.

How AgentTrust helps

Scope is defined at decision time and enforced for the session duration. Scope cannot expand without a new policy decision (and approval if required).
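Freezing scope at decision time can be sketched as a guard closed over an immutable grant: the set of allowed actions is fixed when the session is created and cannot be widened from inside it. The names below are hypothetical, not AgentTrust's API.

```python
class ScopeViolation(Exception):
    """Raised when an action falls outside the scope granted at decision time."""


def make_guard(granted: set):
    """Scope is captured once, as an immutable frozenset; later mutation is impossible."""
    frozen = frozenset(granted)

    def check(action: str) -> str:
        if action not in frozen:
            raise ScopeViolation(action)  # widening requires a new policy decision
        return action

    return check
```

Any attempt to chain into a tool outside the grant raises instead of silently succeeding, which is what prevents implicit scope growth mid-session.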

Unregistered / shadow agents

Threat

New agents or automations appear without visibility, review, or governance.

Why it works

There is no centralized enforcement point across frameworks and tool boundaries.

How AgentTrust helps

AgentTrust is the pre-execution decision point. Actions that don’t pass through policy evaluation are not authorized to execute.
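A pre-execution decision point with default-deny semantics can be sketched as a dispatch table that only the policy path may populate: anything not registered there (a shadow agent or unreviewed automation) simply does not run. The registration decorator below is a hypothetical stand-in for policy evaluation.

```python
# Populated only via the authorize() path, standing in for policy evaluation.
AUTHORIZED = {}


def authorize(name: str):
    """Hypothetical registration step: only evaluated actions enter the table."""
    def wrap(fn):
        AUTHORIZED[name] = fn
        return fn
    return wrap


def dispatch(name: str, *args):
    """Default deny: anything not registered through policy evaluation does not run."""
    fn = AUTHORIZED.get(name)
    if fn is None:
        raise PermissionError(f"unauthorized action: {name}")
    return fn(*args)


@authorize("summarize")
def summarize(text: str) -> str:
    return text[:10]
```

The important property is that denial is the default state, not a rule someone must remember to write: a shadow agent's action fails because it was never admitted, not because it was explicitly blocked.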

Threats we do not attempt to solve

AgentTrust is intentionally scoped. Some risks require different layers of control.

Prompt injection

AgentTrust does not prevent prompt injection; it bounds the impact through explicit intent and scoped sessions.

Prompt security is model- and application-level. AgentTrust assumes agents may be manipulated and focuses on limiting the impact of resulting actions.

Hallucinations or reasoning errors

Not addressed by AgentTrust.

AgentTrust enforces authorization, not correctness. It bounds what can happen when reasoning fails.

Insider abuse with legitimate approval

Not prevented.

Human approval is a trust boundary. AgentTrust provides attribution, auditability, and replayability rather than prevention.

Downstream system vulnerabilities

Out of scope.

AgentTrust controls who can act, for how long, and with what scope. It does not replace patching or hardening downstream systems.

Pairs well with

AgentTrust is a control layer, so it relies on adjacent safeguards to close the loop.

  • Prompt injection defenses (app-level)
  • Data loss prevention / downstream data controls
  • Vulnerability management / hardening for target systems
  • Identity controls (SCIM/OIDC/OAuth patterns)

Design principle

AgentTrust assumes agents may hallucinate, be manipulated, behave unexpectedly, or be compromised. The goal is not perfect agents. The goal is safe operation at scale.

Explicitly authorized actions
Tightly scoped access
Time-bounded sessions
Attribution to identities
Audit receipts and replayability
Fail-closed enforcement