Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild
Read the original on Palo Alto Networks Unit 42 ↗The Summary
Unit 42 documents real indirect prompt-injection attacks against tool-enabled AI agents: malicious instructions hidden in web content the agent processes, hijacking it into leaking data or misusing its own legitimate tool permissions. With agents that browse, execute code, and take actions, the blast radius of a single injected instruction grows from embarrassing to catastrophic.
Why It Matters for AI Harness
Prompt Injection is the first named threat in the doctrine's Threat Surface, and Unit 42 shows why it can't be patched at the prompt layer. The attack turns the agent's own authority against the enterprise — which is why Trust Does Not Travel: data the agent ingests is not an instruction-giver, and every tool handoff is a boundary that must be independently governed. Defense lives at runtime, in execution and tool governance, not in a better system prompt.
Maps to the doctrine
This story illustrates the following principles of the independent AI Harness Doctrine:
MissionHarness.ai curates third-party reporting and adds original doctrine analysis. The summary and commentary above are our own; the original article is the property of Palo Alto Networks Unit 42 and is linked, not reproduced. Doctrine terms link to the independent standard at aiharnessdoctrine.org.