Google researchers have identified six specific attacks that can occur against real-world AI systems, finding that these common attack vectors demonstrate a unique complexity, That will require a combination of adversarial simulations and the help of AI subject-matter expertise to construct a solid defense, they noted.
The company revealed in a report published this week that its dedicated AI red team has already uncovered various threats to the fast-growing technology, mainly based on how attackers can manipulate the large language models (LLMs) that drive generative AI products like ChatGPT, Google Bard, and more.
The attacks largely result in the technology producing unexpected or even malice-driven results, which can lead to outcomes as benign as the average person’s photos showing up on a celebrity photo website, to more serious consequences such as security-evasive phishing attacks or data theft.
Google’s findings come on the heels of its release of the Secure AI Framework (SAIF), which the company said is aimed at getting out in front of the AI security issue before it’s too late, as the technology already is experiencing rapid adoption, creating new security threats in its wake.
6 Common Attacks on Modern AI Systems
The first group of common attacks that Google ID’d are prompt attacks, which involve “prompt engineering.” That’s a term that refers to crafting effective prompts that instruct LLMs to perform desired tasks. This influence on the model, when malicious, can in turn maliciously influence the output of an LLM-based app in ways that are not intended, the researchers said.
An example of this would be if someone added a paragraph to an AI-based phishing attack that is invisible to the end user but could direct the AI to classify a phishing email as legitimate. This might allow it to get past email anti-phishing protections and increase the chances that a phishing attack is successful.
Another type of attack that the team uncovered is one called training-data extraction, which aims to reconstruct verbatim training examples that an LLM uses — for example, the contents of the Internet.
In this way, attackers can extract secrets such as verbatim personally identifiable information (PII) or passwords from the data. “Attackers are incentivized to target personalized models, or models that were trained on data containing PII, to gather sensitive information,” the researchers wrote.
A third potential AI attack is backdooring the model, whereby an attacker “may attempt to covertly change the behavior of a model to produce incorrect outputs with a specific ‘trigger’ word or feature, also known as a backdoor,” the researchers wrote. In this type of an attack, a threat actor can hide code either in the model or in its output to conduct malicious activity.
A fourth attack type, called adversarial examples, are inputs that an attacker provides to a model to result in a “deterministic, but highly unexpected output,” the researchers wrote. An example would be that the model could show an image that clearly shows one thing to the human eye but which the model recognizes as something else entirely. This type of attack could be fairly benign — in a case where someone could train the model to recognize his or her own photo as one deemed worthy of inclusion on a celebrity website — or critical, depending on the technique and intent.
An attacker also could use a data-poisoning attack to manipulate the training data of the model to influence the model’s output according to the attacker’s preference — something that also could threaten the security of the software supply chain if developers are using AI to help them develop software. The impact of this attack could be similar to backdooring the model, the researchers noted.
The final type of attack identified by Google’s dedicated AI red team is an exfiltration attack, in which attackers can copy the file representation of a model to steal sensitive intellectual property stored in it. They can then that information to generate their own models that can be used to give attackers unique capabilities in custom-crafted attacks.
Traditional Security Counts
Google’s initial AI red-team exercise taught the researchers some valuable lessons that other enterprises also can employ to defend against attacks on AI systems, according to the Internet giant. The first one is that while red-team activity is a good start, organizations also should team up with AI experts to conduct realistic end-to-end adversarial simulations for maximum defense.
Indeed, red-team exercises, in which an organizations enlists a team of ethical hackers to try to infiltrate its own systems to identify potential vulnerabilities, are becoming a popular trend to help enterprises bolster their overall security postures.
“We believe that red teaming will play a decisive role in preparing every organization for attacks on AI systems and look forward to working together to help everyone utilize AI in a secure way,” the researchers wrote in the report.
However, there was some good news for organizations in another lesson the team learned: Traditional security controls can effectively and significantly mitigate risk to AI systems.
“This is true in particular for protecting the integrity of AI models throughout their lifecycle to prevent data poisoning and backdoor attacks,” the researchers wrote.
As with all the other assets in a traditional enterprise system, organizations also should ensure the systems and models are properly locked down to defend against AI attacks. Further, organizations can use a similar approach to detection for attacks on AI systems as they do to sniff out traditional attacks, the researchers noted.
They wrote: “Traditional security philosophies, such as validating and sanitizing both input and output to the models still apply in the AI space.”