From the first rumbles of hype around the latest culture-shattering AI tools, developers and the coding-curious alike have been using them to generate code at the touch of a button. Security experts quickly pointed out that, in many cases, the code being produced was of poor quality and vulnerable. In the hands of people with little security awareness, it could unleash an avalanche of insecure apps and websites on unsuspecting consumers.
And then there are those who have enough security knowledge to use these tools for, well, evil. For every mind-blowing AI feat, there seems to be a counter-punch of the same technology being used for nefarious purposes. Phishing, deepfake scam videos, malware creation and general script kiddie shenanigans: these disruptive activities can now be carried out much faster, with lower barriers to entry.
There is certainly a lot of clickbait touting this tooling as revolutionary, or at least as coming out on top when matched against "average" human skill. While it looks inevitable that large language model (LLM) technology will change the way we approach many aspects of work, not just software development, we need to take a step back and consider the risks beyond the headlines.
And as a coding companion, the technology's flaws are perhaps its most "human" attribute.
Poor Coding Patterns Dominate Its Go-To Solutions
ChatGPT was trained on decades of existing code and knowledge bases. So it's no surprise that, for all its marvel and mystery, it suffers from the same common pitfalls people face when navigating code. Poor coding patterns are its go-to. It still takes a security-aware driver to get secure code out of it, by asking the right questions and writing well-engineered prompts.
Even then, there is no guarantee that the code snippets it returns are accurate, functional and secure. The technology is prone to hallucination, even making up nonexistent libraries when asked to perform some specific JSON operations. This could lead to "hallucination squatting" by threat actors, who would be all too happy to publish malware disguised as the fabricated library that ChatGPT recommended with full confidence.
Ultimately, we have to face the reality that, in general, we have not expected developers to be sufficiently security-aware. Nor have we, as an industry, adequately prepared them to write secure code by default. That shortfall is baked into the enormous amount of training data fed into ChatGPT, so we can expect similarly lackluster security in its output, at least initially. Developers will need to be able to identify the security bugs and either fix them themselves or design better prompts for a more robust outcome.
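Researchers at Stanford University ran the first large-scale user study of how people interact with an AI coding assistant across a variety of security-related tasks, and it supports this notion. Participants with access to the assistant wrote less secure code than those without it, yet were more likely to believe their code was secure.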
Between this and the inevitable AI-borne threats that will permeate our future, developers, now more than ever, must hone their security skills and raise the bar for code quality, no matter its origin.
The Road to a Data Breach Disaster Is Paved With Good Intentions
It should come as no surprise that AI coding companions are popular. Developers face increasing responsibility and tighter deadlines, and a company's ambitions for innovation rest on their shoulders. However, even with the best intentions, a lack of actionable security awareness when using AI for coding will inevitably lead to glaring security problems. Every developer using AI/ML tooling will generate more code, and the security risk of that code will depend on their skill level. Organizations need to be acutely aware that untrained people will certainly generate code faster, but they will also accumulate security debt faster.
Even our preliminary test with ChatGPT in April revealed that it makes very basic mistakes that could have devastating consequences. When we asked it to build a login routine in PHP using a MySQL database, it quickly generated functional code. However, that code had three serious problems:

- It stored passwords in plaintext in the database.
- It kept the database connection credentials in the code.
- It used a coding pattern that could result in SQL injection. It did apply some filtering to the input parameters, and it did output database errors.

All rookie errors by any measure.
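For contrast, here is a minimal sketch of a login check that avoids those three mistakes. It is not the PHP code from our test. It is written in Python and makes the following substitutions:

- **Database:** the built-in `sqlite3` module stands in for MySQL, so the sketch runs without a server.
- **Password storage:** salted PBKDF2 (`hashlib.pbkdf2_hmac`) replaces plaintext.
- **Queries:** parameterized `?` placeholders replace string-built SQL.

The table layout, the iteration count and the function names (`init_db`, `create_user`, `login`) are illustrative assumptions.

```python
import hashlib
import hmac
import os
import sqlite3
from typing import Optional, Tuple

# PBKDF2 work factor: higher is slower for attackers (and for your login path).
ITERATIONS = 600_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256 instead of plaintext."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
    )


def create_user(conn: sqlite3.Connection, username: str, password: str) -> None:
    salt, digest = hash_password(password)
    # "?" placeholders send user input as data, never as part of the SQL text.
    conn.execute(
        "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
        (username, salt, digest),
    )
    conn.commit()


def login(conn: sqlite3.Connection, username: str, password: str) -> bool:
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    _, candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, stored)


if __name__ == "__main__":
    # Throwaway in-memory database for the demo. A real MySQL deployment would
    # read host, user and password from the environment or a secrets manager.
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    create_user(conn, "alice", "correct horse battery staple")
    print(login(conn, "alice", "correct horse battery staple"))  # True
    print(login(conn, "alice", "wrong guess"))                  # False
    print(login(conn, "' OR '1'='1", "anything"))               # False: treated as a literal username
```

In PHP, the rough equivalents are prepared statements (for example via PDO) and the built-in `password_hash()` / `password_verify()` functions. One remaining weakness: the early return for unknown usernames skips the hash computation. That creates a timing difference an attacker could use to enumerate accounts, which a hardened version would also address.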
Further prompting got the mistakes fixed, but it takes significant security knowledge to know how to course-correct. Unchecked, widespread use of these tools is no better than unleashing junior developers onto your projects. If this code is building sensitive infrastructure or processing personal data, we're looking at a ticking time bomb.
Of course, just as junior developers improve their skills over time, we expect AI/ML capabilities to improve. A year from now, it may no longer make such obvious, simple security mistakes. However, that will dramatically raise the level of security skill needed to track down the subtler, more serious errors it may still produce.
We Remain Ill-Prepared to Find and Fix Security Vulnerabilities, and AI Widens the Gap
While there has been much talk of “shifting left” for many years, the fact remains that, for most organizations, there is a significant lack of practical security knowledge among the development cohort, and we must work harder to provide the right-fit tools and education to help them on their way.
As it stands, we're not prepared for the security bugs we already encounter. And new AI-borne issues such as prompt injection and hallucination squatting represent entirely new attack vectors that are set to spread like wildfire. AI coding tools do represent the future of a developer's coding arsenal, but the education to wield these productivity weapons safely must come now.