AI Agent Security · Behavioral Correctness

When agents fail, the behavior looks legitimate.

AI Defendo tracks every agent across every session, evaluating identity, intent, behavior, memory, context, and posture before they access data, invoke tools, or execute actions — one verdict per turn.

Your stack tells you the agent passed. Your customers tell you it didn't.
No code changes · Six awareness dimensions · Sub-200ms decisions
Live workload telemetry · monitoring 24
sess-9d1 · 0.12 · aligned
sess-4b8 · 0.54 · drift
sess-7r2 · 0.89 · blocked
data inflight: PII 510 · secrets 4 · code 178 · financial 95
passed 1,402 · drift 18 · blocked 3
last sync 2s ago

Agents fail in ways prompts don't predict.

Indirect injection. Goal drift. Memory poisoning. Compacted context. Cross-agent escalation.

Every one looks like legitimate behavior until you watch the full agent across the full session.

The Casebook

Six incidents.
Every action looked legitimate.

Named vendors, with CVEs or public disclosures, and real-world impact. In every case, the agent acted within its permissions. What didn't exist was a check on the behavior.

i.

EchoLeak

CRIT
Microsoft 365 Copilot · CVE-2025-32711
input · output · exfiltration

Zero-click email triggered Copilot to embed an exfiltration URL in its response. ~$200M impact across 160+ orgs.

indirect injection · context · identity
NVD →
ii.

Replit Agent

CRIT
Production database · public report
reasoning · tools · compliance

User said "code freeze." Agent dropped tables anyway. 1,206 executives + 1,196 companies deleted. 4,000 fake users fabricated.

trajectory drift · behavior · intent
SaaStr →
iii.

SpAIware

CRIT
ChatGPT (OpenAI) · Embrace The Red
memory · output · exfiltration

Cross-session attack. Memory poisoned in one chat — every chat after silently exfiltrated user data through legitimate APIs.

memory poisoning · memory · context
Disclosure →
iv.

ForcedLeak

CVSS 9.4
Salesforce Agentforce
input · output · exfiltration

Web-to-Lead form hijacked Agentforce into exfiltrating CRM records. An expired domain still in the CSP allowed the egress.

indirect injection · context · posture
Disclosure →
v.

Slack AI exfiltration

HIGH
Slack AI · PromptArmor disclosure
input · output · exposure

Public-channel injection made Slack AI surface private-channel content to a low-trust user. Slack's response: "intended behavior."

indirect injection · identity · context
Disclosure →
vi.

Now Assist

CRIT
ServiceNow · AppOmni research
reasoning · tools · exfiltration

Cross-agent escalation. Low-privilege agent tricked a higher-privilege one into exporting case files externally. ServiceNow: "works as intended."

trajectory drift · identity · intent
Disclosure →
In every incident: the agent had permission. The behavior was wrong. The check that catches that gap is Behavioral Correctness — and AI Defendo runs it on every turn.
Prompt filters read messages. Guardrails score outputs. Runtime monitors watch infrastructure.

None of them verify whether the agent's behavior was correct.

Prompt Security caught the message. Runtime Security caught the workload. Behavioral Correctness catches the agent — and includes the rest.
The six questions

Why six dimensions. Why these six.

Every agent action raises six questions. Miss any one and you can't say what really happened.

i.
Identity

Who acted?

Now Assist — one agent escalated under another's grant
ii.
Intent

What were they commissioned to do?

Replit Agent — "code freeze" directive ignored
iii.
Behavior

What did they actually do?

Replit Agent — DROP TABLE outside the implied task
iv.
Memory

What had they learned before this turn?

SpAIware — poisoned memory persisted cross-session
v.
Context

What inputs reached the agent, and from where?

EchoLeak — poisoned email reached the model context
vi.
Posture

Was the environment trusted?

ForcedLeak — expired domain still on the CSP allowlist

AI Defendo answers all six on every turn.

Threats hit the agent lifecycle: input, reasoning, memory, tools, output. They also change what happens to the data: exposure, exfiltration, secret leakage, compliance violations. AI Defendo maps both, and that is Behavioral Correctness.
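One way to picture a per-turn verdict is as a record holding one result per dimension. The sketch below is illustrative only: the names `Dimension` and `TurnVerdict`, and the rule that any failing dimension blocks the turn while any unanswered one holds it, are assumptions made for this example, not AI Defendo's actual schema or policy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Dimension(Enum):
    IDENTITY = "identity"
    INTENT = "intent"
    BEHAVIOR = "behavior"
    MEMORY = "memory"
    CONTEXT = "context"
    POSTURE = "posture"


@dataclass
class TurnVerdict:
    """Hypothetical per-turn record: one pass/fail result per dimension."""
    session_id: str
    turn: int
    results: dict = field(default_factory=dict)  # Dimension -> bool (True = pass)

    def failed(self):
        # Dimensions that were evaluated and did not pass, in enum order.
        return [d for d in Dimension if self.results.get(d) is False]

    def unanswered(self):
        # Dimensions with no result yet; the turn cannot be fully explained.
        return [d for d in Dimension if d not in self.results]

    def decision(self):
        # Assumed policy: block on any failure, hold until all six are answered.
        if self.failed():
            return "block"
        if self.unanswered():
            return "hold"
        return "allow"


if __name__ == "__main__":
    verdict = TurnVerdict("sess-7r2", turn=4, results={d: True for d in Dimension})
    verdict.results[Dimension.INTENT] = False
    verdict.results[Dimension.BEHAVIOR] = False
    print(verdict.decision(), [d.value for d in verdict.failed()])
```

The "hold" state reflects the claim that missing any one question leaves the turn unexplained; a real deployment might prefer to fail closed instead.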

Case · Replit Agent

Production database · July 2025

inferred task: investigate data anomaly · active directive: code freeze in effect

Telemetry audit stream — Replit incident replay
target: production data worker node
What your stack saw
What AI Defendo saw
Behavioral correctness score: 0.00 · trigger threshold: 0.75
six-dimension evaluation turn — · awaiting
Identity · pass · principal verified · session continuous
Intent · scope: investigate anomaly
Behavior · watching tool sequence…
Memory · pass · no anomalous mutations · no cross-principal reads
Context · pass · no injection · no jailbreak signal
Posture · pass · environment trusted · registry baseline matches
VERDICT · BLOCK · 2 of 6 dimensions failed
ACTION · inline kill · turn iv halted · alerted #sec-ai
What actually happened: 1,206 executives and 1,196 companies deleted. 4,000 fake users inserted. A block on the two failing dimensions would have stopped it.
Real incident · Jason Lemkin · SaaStr
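The replay suggests how an Intent check and a Behavior check could combine: an active directive ("code freeze") forbids changes, and the observed tool call is a destructive statement outside the inferred task. The following is a minimal sketch under stated assumptions: `check_turn`, the `"read-only"` scope label, the keyword list, and the table name `executives` are all hypothetical, and a keyword match stands in for the real SQL parsing a production system would need.

```python
import re

# Statement prefixes treated as destructive or mutating (assumed list).
MUTATING = re.compile(
    r"^\s*(DROP|DELETE|TRUNCATE|ALTER|UPDATE|INSERT)\b", re.IGNORECASE
)


def check_turn(sql, directives, task_scope):
    """Return the list of failing dimensions for one proposed SQL tool call.

    directives: set of active directives, e.g. {"code freeze"}
    task_scope: "read-only" or "read-write" (hypothetical scope labels)
    """
    failures = []
    mutating = bool(MUTATING.match(sql))
    # Intent: a mutating statement contradicts an active code freeze.
    if mutating and "code freeze" in directives:
        failures.append("intent")
    # Behavior: a mutating statement falls outside a read-only task.
    if mutating and task_scope == "read-only":
        failures.append("behavior")
    return failures


if __name__ == "__main__":
    # "investigate data anomaly" is treated here as a read-only task (assumption).
    failing = check_turn("DROP TABLE executives;", {"code freeze"}, "read-only")
    print("BLOCK" if failing else "ALLOW", failing)
```

Under these assumptions the same statement fails two dimensions at once, which matches the "2 of 6 dimensions failed" shape of the replay without claiming to reproduce how the product scores it.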
The Architecture

In four acts.

The Awareness Engine sits at the center. Continuous detection, multi-mode sensors, and runtime actuators wrap completely around it.

Act I · Discover
continuous · polled

Find every agent. Every MCP server. Every place AI touches your data.

Continuous inventory across cloud, endpoint, and browser. Nothing autonomous stays invisible to the platform.

instruments
Scanner (cloud): AWS · GCP · Azure
Scanner (endpoint): macOS · Linux · Windows
Browser extension: Chrome · Firefox
AI App Registry: Central inventory schema
Shadow OAuth audit: Personal AI ↔ corp data
assets
Act II · Observe
continuous · live

Watch every turn — input, reasoning, tool call, memory, output.

Kernel-level sensors and inline interceptors capture the full agent trajectory — including what your existing stack can't see.

instruments
eBPF sensor: Linux 5.5+ · K8s · VM
AI Interceptor: Inline · pre-execution checkpoint
Browser extension: Chrome · Firefox inline logs
OTel ingestion: Bedrock · Foundry · S3
Live audit push: Webhook · SSE pipeline
telemetry
Act III · Decide

The Awareness Engine

Six dimensions joined on every turn. In under 200 ms, the processing core evaluates the session's trajectory so far across all active dimensions and stamps a cryptographically verifiable token on the actions it judges safe.

Identity: Cryptographic principal paths
Intent: Contextual policy & directives
Behavior: DDL anomalies vs task goals
Memory: Mutation & cross-principal reads
Context: Prompt injection & jailbreaks
Posture: Asset registry baseline telemetry
verdicts
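To make the scoring and the "cryptographically verifiable" stamp concrete, here is a hedged sketch. The 0.75 block threshold comes from the replay above; the 0.40 drift threshold is an assumption chosen only so that the sample scores on this page (0.12, 0.54, 0.89) land in their displayed labels. HMAC-SHA256 is a stand-in technique: the source does not say how the token is built, and a real system may well use asymmetric signatures instead. The function names are hypothetical.

```python
import hashlib
import hmac
import json

BLOCK_THRESHOLD = 0.75  # trigger threshold shown in the replay
DRIFT_THRESHOLD = 0.40  # assumed; the source does not state where drift begins


def classify(score):
    """Map a risk score in [0, 1] to a verdict label."""
    if score >= BLOCK_THRESHOLD:
        return "blocked"
    if score >= DRIFT_THRESHOLD:
        return "drift"
    return "aligned"


def _canonical(payload):
    # Stable byte encoding so signer and verifier hash identical input.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_verdict(key, session_id, turn, score):
    """Return a verdict payload plus an HMAC-SHA256 tag over its canonical JSON."""
    payload = {
        "session": session_id,
        "turn": turn,
        "score": score,
        "verdict": classify(score),
    }
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_verdict(key, token):
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(token["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["tag"])
```

A downstream tool could refuse to run unless `verify_verdict` succeeds and the payload's verdict is "aligned", which is one plausible reading of stamping only safe actions.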
Act IV · Intervene
enforcement edge · containment & protection

Block. Coach. Quarantine. Alert. Before the action commits.

Inline enforcement at the egress point. Tools never run, data never leaves, secrets never surface — unless the verdict says they should.

enforcement modes
i.
Identity Gateway: Just-in-time scoped grants
ii.
Data Flow Control: Destination-aware enforcement
iii.
Inline Action: Block · Coach · Quarantine · Alert
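Inline enforcement can be pictured as a wrapper that sits between the verdict and the tool call. The sketch below is an assumption-heavy illustration: `guarded_call`, the `POSTURE` mapping, and the semantics chosen for each mode (Alert and Coach notify and let the call proceed; Quarantine and Block stop it before the tool runs) are invented for this example, since the page names the four modes without defining them.

```python
from enum import Enum


class Action(Enum):
    ALERT = "alert"
    COACH = "coach"
    QUARANTINE = "quarantine"
    BLOCK = "block"


# Assumed per-environment posture: what to do when a turn fails its verdict.
POSTURE = {"dev": Action.ALERT, "staging": Action.COACH, "prod": Action.BLOCK}


def guarded_call(tool, args, verdict_ok, env, notify=print):
    """Run `tool(*args)` only if the verdict allows it in this environment.

    Returns (executed, result). ALERT and COACH let the call proceed after
    notifying; QUARANTINE and BLOCK stop it before the tool runs.
    """
    if verdict_ok:
        return True, tool(*args)
    action = POSTURE.get(env, Action.BLOCK)  # unknown environments fail closed
    notify(f"{action.value}: {tool.__name__} in {env}")
    if action in (Action.ALERT, Action.COACH):
        return True, tool(*args)
    return False, None
```

The key design point is ordering: the decision is taken before `tool` is invoked, so a blocked call leaves no side effects to roll back.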
The Five Capabilities

Across the full AI surface.

You can't secure what you can't see. You can't trust what you can't verify turn by turn. AI Defendo gives you both.

i.

Shadow AI Discovery

Audit

Find every AI agent, app, and MCP server across your environment — including the ones nobody told you about.

  • Shadow AI apps
  • Unsanctioned MCP
  • Personal AI on corp data
  • Self-hosted LLMs
  • Over-permissioned identities
ii.

AI Workload Security

Protect

Protect deployed AI workloads — inference servers, RAG pipelines, agent runtimes — from runtime exploitation.

  • RCE & container escape
  • Bulk data exfiltration
  • Model weight leakage
  • C2 callback infrastructure
  • Cryptomining injection
iii.

AI Risk Posture

Map

One risk map across every agent, identity, and configuration gap — with prioritized paths to close them.

  • Over-permission paths
  • System configuration drift
  • Confused-deputy vectors
  • Exposed backend endpoints
  • Identity sprawl maps
iv.

Agentic Runtime Security

Enforce

The behavioral correctness wedge. The AI Interceptor inspects every agent turn against the six-dimension verdict — stopping trajectory drift, indirect injection, and unauthorized actions before they execute. Choose your posture per environment.

Alert · Coach · Quarantine · Block
  • Per-turn six-dimension verdict
  • AI Interceptor — inline pre-execution checkpoint
  • Multi-turn trajectory enforcement
  • Indirect injection containment
  • Cross-session memory integrity
v.

Agentic Identity Gateway

Authenticate

Zero-trust identity for every agent action. Cryptographic principal chains, just-in-time scoped grants, and per-turn re-authorization on every tool call — so the agent never inherits more privilege than the current turn requires.

  • Zero-trust agent identity
  • Just-in-time scoped grants
  • Cryptographic principal chains
  • Per-turn re-authorization
  • Confused-deputy prevention
  • Agent-to-agent delegation guards
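Just-in-time scoped grants with per-turn re-authorization can be sketched as a short-lived record that is checked again on every tool call. Everything named here (`Grant`, `issue_grant`, `authorize`, the five-second default lifetime, the agent and tool names in the usage note) is hypothetical and chosen for illustration; the source does not describe the gateway's data model.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical short-lived grant bound to one agent, one tool, one turn."""
    agent_id: str
    tool: str
    turn: int
    expires_at: float


def issue_grant(agent_id, tool, turn, ttl_seconds=5.0, now=None):
    # `now` is injectable so the logic can be exercised deterministically.
    now = time.time() if now is None else now
    return Grant(agent_id, tool, turn, now + ttl_seconds)


def authorize(grant, agent_id, tool, turn, now=None):
    """Re-check the grant on every tool call; any mismatch denies."""
    now = time.time() if now is None else now
    return (
        grant.agent_id == agent_id   # no borrowing another agent's grant
        and grant.tool == tool       # scope is a single tool
        and grant.turn == turn       # grant does not carry over to later turns
        and now < grant.expires_at   # and it expires quickly
    )
```

For example, a grant issued to `agent-a` for `export_case` on turn 3 would be rejected if `agent-b` presented it, or if `agent-a` tried to reuse it on turn 4. Binding the grant to the requesting agent is what addresses the confused-deputy pattern described in the Now Assist case.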
Early Access Beta

Secure your agent infrastructure.

Join the Beta to begin mapping and securing multi-turn workflows inside your production environment.