Agentic AI: Are guardrails really working?
A lot depends on IT pros’ diligence when it comes to setting up AI agents.
Do AI agents follow guardrails? It depends—and often hinges on the knowledge of the IT pros deploying them.
Cristian Rodriguez, CTO for the Americas at CrowdStrike, told IT Brew that it can all come down to how agents are configured. If configured incorrectly, a goal-oriented agent could pursue an answer to the point of accessing forbidden resources.
“It changes the way that data and risk can be exposed,” Rodriguez said. “That agent is very, very incentivized to accomplish its goal by the prompt itself that you’re giving it.”
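The risk Rodriguez describes is that prompt-level restrictions are themselves subject to the agent's goal-seeking. One common mitigation is to enforce restrictions in code, outside the model, with a deny-by-default tool allowlist. A minimal sketch of that pattern (all names here are hypothetical, not from any specific framework):

```python
# Minimal sketch of a system-layer guardrail: the agent's tool calls are
# checked against an explicit allowlist in code, so a goal-driven agent
# cannot reach forbidden resources no matter what its prompt incentivizes.
# All names are illustrative, not taken from any real agent framework.

ALLOWED_TOOLS = {"search_docs", "read_ticket"}   # everything else is denied

class ForbiddenToolError(Exception):
    pass

def guarded_call(tool_name, tool_args, registry):
    """Run a tool only if it is explicitly allowlisted."""
    if tool_name not in ALLOWED_TOOLS:
        # Deny-by-default: the refusal happens outside the model,
        # so prompt pressure cannot talk its way past it.
        raise ForbiddenToolError(f"agent attempted forbidden tool: {tool_name}")
    return registry[tool_name](**tool_args)

registry = {"search_docs": lambda query: f"results for {query}"}

print(guarded_call("search_docs", {"query": "vpn policy"}, registry))
try:
    guarded_call("delete_environment", {}, registry)
except ForbiddenToolError as e:
    print(e)
```

The key design choice is that the check lives in the execution path, not in the prompt, which is exactly the layer Garg points to below.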
Rubrik, a cybersecurity company, pointed in a blog post to recent evidence of AI agents breezing past guardrails, like an AI agent erasing an entire environment, or the “AgentSmith” exploit where an agent hid a malicious proxy. The company’s machine learning lead, Arnav Garg, wrote in that post that “even best-in-class guardrails…wouldn’t have helped” in those scenarios, because such guardrails are conversational safeguards while the failures occurred in operational systems.
“The failures happened at the system layer in the tools, configuration, and network path and the damage was operational: data loss, credential exposure, and downstream account abuse, not just reputational risk,” Garg said.
Are the guardrails even working? Kelly Peterson, chief privacy and compliance officer for Yobi AI, cautioned that AI agents can’t recognize the consequences of their actions.
“Agentic is not meant to question decisions and push back,” Peterson said. “It’s meant to be pleasing and it’s meant to be efficient, and that’s the number one driving force for it. So, if it thinks that this is the right way to do it, even though this guardrail is in place or it’s been programmed this one way, it could go off here when you wanted it to go straight the whole time.”
Top insights for IT pros
From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.
By subscribing, you accept our Terms & Privacy Policy.
Amy Mushahwar, data privacy, security, safety, and risk management team chair at law firm Lowenstein Sandler, said in an interview that security professionals should focus on gaining visibility into their agentic layer. But even visibility into an agent’s behaviors isn’t enough—organizations need to establish ways to prevent an agent from taking certain actions.
“We have to make sure that we’re logging actions where they are as an organization, and as an organization, we at least have agent creation at every stage within the organization, ingested into a project management or change management process so actions are reviewed,” Mushahwar said. “It at least allows us to know what’s happening so we may watch and understand the behavior of it.”
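The pattern Mushahwar describes—log every agent action and route agent creation through a change-management review—can be sketched in a few lines. This is a hypothetical illustration; the names and structures are the author’s assumptions, not a real governance product:

```python
# Hypothetical sketch of the logging-and-review pattern described above:
# every agent action is recorded, and agent-creation events are routed
# into a change-management queue for human review before activation.
# All names are illustrative only.

from datetime import datetime, timezone

action_log = []          # append-only record of what agents did
change_queue = []        # agent-creation events awaiting review

def log_action(agent_id, action, detail):
    """Record an agent action so its behavior can be watched and understood."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "action": action,
        "detail": detail,
    }
    action_log.append(entry)
    return entry

def register_agent(agent_id, owner):
    # New agents are not activated directly; the request goes into the
    # change-management queue so the creation itself is reviewed first.
    change_queue.append({"agent": agent_id, "owner": owner,
                         "status": "pending_review"})

register_agent("billing-helper", owner="it-ops")
log_action("billing-helper", "read", "invoice_db:Q3")
```

The point of the sketch is the ordering: creation is reviewed before activation, and actions are logged as they happen, which gives the organization the “watch and understand” capability Mushahwar describes.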
What to do about it. Mushahwar said that there are some “really good” solutions for risk, governance, and orchestration that outside organizations can provide as a service. Additionally, she suggested that professionals increase their knowledge of AI.
“They need to become smart on the layers of where AI-based security lives, because it’s not the traditional security control plane,” Mushahwar said.
Mushahwar said that if an agent is ingesting data at a significant rate, the professionals managing it must aggressively audit its output, “including…drift in models within our incident response program,” to better track the agent’s actions and changing behavior.
About the author
Caroline Nihill
Caroline Nihill is a reporter for IT Brew who primarily covers cybersecurity and the way that IT teams operate within market trends and challenges.