Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild

The Summary

Unit 42 documents real indirect prompt-injection attacks against tool-enabled AI agents: malicious instructions hidden in web content the agent processes, hijacking it into leaking data or misusing its own legitimate tool permissions. With agents that browse, execute code, and take actions, the blast radius of a single injected instruction grows from embarrassing to catastrophic.

Why It Matters for AI Harness

Prompt Injection is the first named threat in the doctrine's Threat Surface, and Unit 42 shows why it can't be patched at the prompt layer. The attack turns the agent's own authority against the enterprise, which is the reasoning behind Trust Does Not Travel: data the agent ingests has no authority to issue instructions, and every tool handoff is a boundary that must be governed independently. Defense lives at runtime, in execution and tool governance, not in a better system prompt.
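To make the runtime point concrete, here is a minimal sketch of a tool-call gate that decides on provenance rather than on prompt wording. Everything in it is an assumption for illustration: the `Source` labels, the tool names, and the `POLICY` table are hypothetical and do not come from Unit 42 or from the doctrine text. It also assumes the harness can already tell where the instruction behind a call originated, which is the hard part in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Hypothetical provenance labels for whatever triggered a tool call."""
    USER = "user"  # the principal the agent acts for
    WEB = "web"    # ingested content: fetched pages, search results


@dataclass(frozen=True)
class ToolCall:
    tool: str       # hypothetical tool name, e.g. "send_email"
    origin: Source  # where the triggering instruction came from


# Hypothetical policy table: which origins may trigger which tools.
# Ingested web content may lead to further reads, never to actions.
POLICY = {
    "fetch_url": {Source.USER, Source.WEB},
    "send_email": {Source.USER},
    "run_code": {Source.USER},
}


def authorize(call: ToolCall) -> bool:
    """Deny by default; allow only origins listed for this tool."""
    return call.origin in POLICY.get(call.tool, set())
```

Under these assumptions, `authorize(ToolCall("send_email", Source.WEB))` is refused no matter how persuasive the injected text is, because the check runs at the tool boundary and never consults the prompt.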

MissionHarness.ai curates third-party reporting and adds original doctrine analysis. The summary and commentary above are our own; the original article is the property of Palo Alto Networks Unit 42 and is linked, not reproduced. Doctrine terms link to the independent standard at aiharnessdoctrine.org.
