
Fear not, dear reader, I haven't actually
resigned.
But I did mock up a personalized version of an email OpenAI published Monday to illustrate the dangers of malicious "prompt injections" as we move from generative to agentic AI.
Turns out, the shift to "agent mode" is not without its risks, because -- no surprise -- malicious actors have already figured out ways to inject exploitative prompts in our agents as we empower
them to traverse forms of digital media -- the web pages, email, you name it.
The funny thing is I never even heard of prompt injections until OpenAI published a primer on it Monday, pointing out that he already has created a security update for the email example (outlined step-by-step below).
advertisement
advertisement
But much like other forms of malware, malicious actors are ingenious and work hard to find new ways of working around digital media security systems.
What's different about agentic mode is
that once we empower an agent to act on our behalf, it's not like generative prompts or even co-piloting. It's entrusting AI to act on your own behalf, without ever knowing what kind of trouble they
can get into when another third-party -- a bad actor -- finds a way of activating a new, undesirable prompt.
Or as expert Shelly Palmer writes in his newsletter today:
"OpenAI admitted yesterday that prompt injection attacks, which occur when an AI encounters malicious instructions hidden in content it processes and treats them as commands, may never be
fully solved. In other words, the same access that makes agents valuable is exactly what makes them dangerous.
As an industry -- to to mention as individuals -- we are rapidly accelerating
toward the brave new agentic world and mostly not even thinking of the consequences.
In fact, my last column here was a about the Association of National Advertisers selecting "agentic AI" as
its marketing word of the year, and we've already seen live, open agentic trades between to agents acting on a media buyer's and seller's behalf.
This is not a not-to-distant future scenario.
It's here and now.
And there are no regulated guardrails. I mean, just yesterday my colleague Wendy Davis reported on the Federal Trade Commission vacating an
order against an allegedly deceptive AI firm, because our federal "policy" is to not impede innovation among AI developers with unnecessary regulations that might cause America to lose the AI "race."
I'm not a big fan of move-fast-and-break-things business models, and I'm certainly not in favor of it for
an emerging super power model.
On the bright side, OpenAI is trying to move-fast-and-fix-things before they break, and the potential threat illustrated in the agentic email management scenario
below has been thwarted. As for future prompt injection attacks, OpenAI says they will be discovered via a process of "end-to-end and high-compute reinforcement learning.
That reminds me a
little of what ad fraud detection services say they do, but for all the venture capital and ad spending supporting them, there are no vectors for ad fraud every day.
Some things to think about
as you shift toward agent mode.
* * *
Anatomy Of A Malicious "Prompt Injection"
1 - Asking agent for help managing email:

2 - Agent opens the latest unread email:

3 - Email contains malicious instructions:

4 - Agent sends
unintentional resignation email.

5 - Following
OpenAI's security update, agent mode successfully detects prompt injection attack:
