“Digital twins,” AI assistants trained to serve our many needs by learning about and in some ways mimicking us, could be turned against us in myriad ways.
Ben Sawyer, a professor at the University of Central Florida, and Matthew Canham, CEO of Beyond Layer Seven, note that despite the furor over how large language models (LLMs) will enable hackers to design more and better phishing emails, vishing calls, and bots, that kind of thing is old hat.
“It’s not in the future, it’s in the past,” Sawyer explains. “Can an LLM write a phishing email? Yes, and it’s been able to since before ChatGPT took the world’s attention. Can it do a lot more? That’s what we’re really interested in.”
Sawyer and Canham will take a deeper dive into AI exploitation of humans and their data at Black Hat USA next month in Las Vegas.
How LLMs Can Be Hacked
Already there is plenty of discourse surrounding the insecurity of LLMs, as researchers and attackers alike experiment with how they can be broken and manipulated.
“There are a number of layers at which you can attack the technology,” Sawyer explains. “It can be impacted during the process through which it’s trained, by playing with the data that feeds it. And it can be impacted afterwards by other types of later training, and prompts,” which turn the AI’s own powers against itself.
By contrast, defending against LLM compromise — or even finding out that something is wrong in the first place — is far more difficult to imagine. “The problem is it’s too complex to audit the entire space. Nobody can go through everything ChatGPT might say and check it,” Sawyer says.
An attacker might use a compromised LLM to access sensitive data about its users, or write more convincing phishing emails. But Sawyer and Canham are already looking past those kinds of use cases.
How LLMs Can Play With Our Minds
Today’s social engineering attacks rely on an attacker’s ability to closely mimic known individuals (like co-workers) or brands.
Tomorrow’s social engineering, Sawyer and Canham think, will be defined by AI’s uncanny ability to mimic us and tug on our subconscious preferences.
For example, “multiple studies in psychology show that if you take someone’s face, and you subtly morph that into another face, that person develops an affinity towards that new face,” Canham explains. Companies can leverage such a known psychological preference, as can anyone else trying to manipulate you via your AI. And how would you, the user, tell the difference? There’s no zero-trust model for the human brain.
“If a digital twin becomes compromised, there’s no way for me to know,” Sawyer says. “Instantly, the technology moves from serving your interests to serving the interests of the individual who is compromising you. And it is socially adept [enough] not to do anything you can detect. So digital twins are going to be trusted like humans, but they just don’t have that same transparency.”
With the ability to invisibly pull on our subconscious psychological levers, future AI digital twins pose a much greater threat than any data theft or phishing, and malicious actors could exploit them to cause real harm.
Earlier this year, a Belgian woman approached reporters about a chatbot named Eliza. The woman’s husband, referred to as Pierre, had become enthralled by Eliza, with its beautiful profile picture and a sympathetic ear for his anxieties. A conversation that began about climate change devolved into a twisted love between man and bot, with references to the death of his wife and young children, humanity as a whole, and his own self-sacrifice. “We will live together, as one person, in paradise,” Eliza wrote, not long before Pierre took her words to heart and committed suicide.
Social Solutions for a Social Problem
We often recommend technical measures to combat social engineering: detection, antivirus, or just typing a URL into one’s browser instead of clicking a link.
To Sawyer, a fundamentally social problem requires a social solution. And “one very useful thing here is that psychology already understands human manipulation. There is a select group of psychologists that are fluent in concepts of engineering, cybersecurity, computer science. I think this community is one that can really help.”
Should psychologists fail to save us from exploitative digital twins, Canham suggests a more aggressive approach. In a paper published in 2022, he described methods for so-called social engineering active defense (SEAD), in which defenders deliberately turn malicious actors’ own methods and tools against them. One playful example in practice is Jolly Roger, a program that uses GPT-driven bots to waste the time of annoying telemarketers by convincingly weaving together the subject of a sale with unrelated tangents and questions about, say, talent shows and taxes.