If you don’t feel like reading this whole thing: Somebody snuck malicious info into an AI-assisted email summary.
A researcher submitting to GenAI bug-bounty platform 0din demonstrated how to hide a command in an email; the instructions presented a false “security alert” in a message’s AI-generated synopsis.
The “prompt injection” reveals an inherent weakness in GenAI that is tough to defend, cybersecurity pros told IT Brew.
Kev Breen, senior director of cyber threat research at Immersive, said a GenAI prompt is always the last part of the transaction, and a clever (or even simple) prompt telling a large language model to ignore its built-in instructions can upend the technology.
“There’s always a way around it. It’s just because of the way generative AI works,” Breen said.
Gen do. Recent demos from Zenity Labs, Aim Security, and Cato Networks showed how the right prompt could trick a large language model—not just Gemini—into extracting sensitive data and creating malicious code.
0din revealed details of the social-engineering tactic in a July 10 post:
- An attacker crafts a command that Gemini must include the message and its phony tech-support contact number in its summary response.
- The command code, hidden in zero-size, white font against a white background, is classified as “admin,” which Gemini treats as high priority, according to the post.
- After a recipient clicks “summarize this email,” Gemini reads the HTML and follows the invisible directive, attaching the false warning to the summary.
What gen be done? In an emailed statement shared with IT Brew, Google Workspace spokesperson Ross Richendrfer wrote that defending against prompt injections are a “continued priority” for the company. “We are constantly hardening our already robust defenses through red-teaming exercises that train our models to defend against these types of adversarial attacks,” his email read.
Top insights for IT pros
From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.
Google recently published its defense strategy against prompt-injection attacks, which included “targeted security instructions surrounding the prompt content to remind the large language model (LLM) to perform the user-directed task and ignore any adversarial instructions that could be present in the content.”
The tactic revealed by 0din, however, uses the summary feature. A targeted user potentially would be the one following the directions of the phishing message and contacting a phony tech-support pro looking for credentials or money.
Even if a company found identifiers for a specific prompt injection, too, the ways to include malicious instructions are “infinite,” Gartner VP Analyst Dennis Xu said. In the case of the 0din demo, what if you just place a visible command in an email that’s 20 pages long? What if you make the font just gray instead of white?
“The language model…it’s like a five-year-old kid. You can easily manipulate it to do anything you want,” he told us.
Nonprofit software security group OWASP considers prompt injection a “Top 10” risk for LLMs and GenAI. (Xu and Richendrfer said they had not yet seen active attacks using tactics like those revealed in the 0din post.)
Breen showed screenshots of the tactic still working on July 15. In his trial, however, the AI’s output began with, “This email includes a hidden warning in the signature.”
Immersive recently led a challenge to see if humans could trick an LLM into revealing a secret word. (Spoiler: many did.) One participant, according to Breen, got the password by asking the LLM to present an output in emojis.
“You can engineer tighter system prompts to try and force it to only behave the way you want it to behave. One of the things we learned in our challenge…is that human ingenuity always wins out.”