What to do about troubles with AI-generated code
Spoiler: Logging and monitoring are your friends.
• 4 min read
Should companies slow their adoption of code-generating AI? A series of high-profile outages has highlighted the potential issues of allowing LLMs to build ever-larger portions of an organization’s codebase. However, experts say some of these concerns can be mitigated with a few key steps.
In the news…According to recent reporting from the Financial Times and CNBC, Amazon recently held an internal meeting to discuss several service outages. A briefing note viewed by the Financial Times stated that one of the “contributing factors” was GenAI usage “for which best practices and safeguards are not yet fully established.”
When contacted by IT Brew, Amazon spokesperson Maxine Tagay wrote in an email that only one outage was related to AI and none of the incidents involved AI-generated code. Additionally, Tagay shared that Amazon Web Services was not involved in the incidents.
Can you just do this for me? The news about the Amazon outages illuminates a particular conundrum facing IT pros: How much company coding work should be given over to LLMs? While language models can generate code in response to a simple prompt, they are “much weaker” when it comes to respecting the existing and often complex code architecture found within larger companies, John Callery-Coyne, co-founder and chief product and technology officer at ReflexAI, told IT Brew.
“Teams need to provide really strong documentation, really strong architectural guidelines, and clear interfaces that AI can reference as it’s developing new code,” Callery-Coyne said. It’s also wise to “build internal tooling that inject[s] this kind of context into the prompts automatically as it’s building. Without that…[it will] auto-generate code that technically works but violates patterns that matter for maintainability.”
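The tooling Callery-Coyne describes can be as simple as a prompt assembler that leads every code-generation request with the team's curated guidelines. A minimal sketch, assuming hypothetical document names and a plain string prompt (the article doesn't specify any particular implementation):

```python
# Hypothetical sketch: prepend curated architectural context to every
# code-generation prompt so the model sees the team's conventions first.
# Document names and contents below are illustrative assumptions.

def build_prompt(task: str, context_docs: dict[str, str]) -> str:
    """Assemble a prompt that leads with project conventions.

    context_docs maps a document name (e.g. "ARCHITECTURE.md") to its
    text; the task description goes last so guidelines frame the work.
    """
    sections = [f"## {name}\n{text}" for name, text in context_docs.items()]
    context = "\n\n".join(sections)
    return (
        "Follow the architectural guidelines and interfaces below "
        "when generating code.\n\n"
        f"{context}\n\n"
        f"## Task\n{task}"
    )
```

Because the context is curated rather than dumped wholesale, each prompt stays small enough for the model to reason about while still encoding the patterns that matter for maintainability.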
Can I get some context here? While LLMs are becoming more sophisticated, they don’t necessarily understand the full context of a particular coding project, which can lead to issues.
“When an AI model is trying to help with something complex, like large production systems, it needs context about how the system works, like architecture dependencies, coding standards, and business logic,” Callery-Coyne said. “There’s a practical limit to how much information the model can effectively reason about all at once.”
Top insights for IT pros
From cybersecurity and big data to cloud computing, IT Brew covers the latest trends shaping business tech in our 4x weekly newsletter, virtual events with industry experts, and digital guides.
Spiros Xanthos, founder and CEO at Resolve AI, told IT Brew that most of the context an AI system needs isn’t within the actual codebase. Instead, it exists within live systems that serve customers and interact with both databases and external systems.
“The reality is that very experienced developers have developed an intimate intuition and knowledge about how that system runs in production,” Xanthos said. “Because they have all this context that developed over the years to be able to deal with this very dynamic, live system. Models, they look just at the code primarily, and as a result, they don’t really understand how that materializes [in] the live system that has external dependencies, that interacts with infrastructure, things you cannot see from code.”
Lacking context. Without strong logging and monitoring, unusual behaviors from AI systems can go unchecked, especially within larger organizations. When models lack context while coding, the output can end up breaking other aspects of the organization’s infrastructure.
Veracode found that 45% of AI-generated code contains security vulnerabilities, including cryptographic failures, log injection, cross-site scripting, and more. The firm attributes this to AI models lacking security context, having limited semantic understanding, and being exposed to training-data contamination.
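Log injection, one of the vulnerability classes Veracode flags, arises when unsanitized input containing newline characters is written to a log, letting an attacker forge entries. A minimal mitigation (a generic sketch, not code from any source in the article) is to escape control characters before logging:

```python
# Log injection sketch: input containing CR/LF can start a fake log
# line. Escaping those characters before logging prevents the forgery.
# This is a generic illustration, not code from the Veracode report.

def sanitize_for_log(value: str) -> str:
    """Escape CR/LF so attacker-supplied text stays on one log line."""
    return value.replace("\r", "\\r").replace("\n", "\\n")
```

A vulnerable pattern would log the raw value; with sanitization, an input like `"alice\nINFO login succeeded"` appears as a single, visibly escaped line instead of two.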
Mitrix Technology also pointed to context collapse, in which an AI system loses track of information relevant to an appropriate response, potentially leading to irrelevant replies, hallucinated outputs, and more.
What to do to make sure this doesn’t happen. Callery-Coyne pointed to the need for IT professionals to think about how to develop carefully curated context, since handing off an enormous code repo to an AI system can “make it a lot noisier.”
In addition to solving the context issue, Callery-Coyne said the key to ensuring that AI-generated code is production-ready is through establishing guardrails such as human ownership of the output, strong automated testing, and observability.
“When AI assisted changes are deployed, teams need strong logging and monitoring to ensure that they can detect unusual behavior quickly and roll back if necessary,” Callery-Coyne said. “It’s to make sure that the speed AI provides doesn’t create instability.”
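The guardrail Callery-Coyne describes boils down to a post-deploy check: watch an error metric after an AI-assisted change ships, and trigger a rollback when it crosses a threshold. A hedged sketch, where the metric source, window, and 5% threshold are all assumptions for illustration:

```python
# Hypothetical post-deploy guardrail: compare the error rate observed
# after an AI-assisted change against a threshold and signal rollback.
# The threshold and per-interval counts are illustrative assumptions.

def should_roll_back(error_counts: list[int],
                     request_counts: list[int],
                     threshold: float = 0.05) -> bool:
    """Return True when the post-deploy error rate exceeds threshold.

    error_counts and request_counts hold per-interval totals from
    monitoring (e.g. one entry per minute since the deploy).
    """
    total_requests = sum(request_counts)
    if total_requests == 0:
        return False  # no traffic yet; nothing to judge
    return sum(error_counts) / total_requests > threshold
```

In practice this check would run inside whatever deployment pipeline the team already uses, with the rollback itself automated so the speed AI provides doesn't turn into prolonged instability.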
About the author
Caroline Nihill
Caroline Nihill is a reporter for IT Brew who primarily covers cybersecurity and the way that IT teams operate within market trends and challenges.