You’ve heard of Zoom Bombing, but have you heard of Zoom Snooping? Researchers contend they can extract keystroke data from participants in a video call simply by tracking shoulder movements. A recently published study warns malicious actors might use the technique to decipher personal passwords and proprietary business information.
This week, a group of University of Texas researchers published a technical brief on what they claim is a reliable framework – using shoulder movements (PDF) – that can determine what someone on the other end of a Zoom, Microsoft Skype or Google Hangouts video call is typing.
Researchers led by Murtuza Jadliwala set out to determine, “can an adversary, who is at one end of a video call, infer some potentially sensitive information about the participant at the other end which is not trivially visible/audible from the call?”
In a controlled test with a limited number of words, researchers averaged about 75 percent accuracy when it came to snooping on participants. Control factors included specific chairs, keyboards and webcams. Factors affecting accuracy, researchers said, included long hair, long sleeves and slow “hunt and peck” style typing.
“Being security/privacy researchers, and heavy users of such applications ourselves, we wondered what non-obvious private information one (with nefarious motives) can infer by being on the other end of such call/conference videos,” Jadliwala told Threatpost by email.
Can Your Shoulders REALLY Reveal What You’re Typing?
He added most users are typically doing other stuff while participating in video calls and a lot of those tasks involve typing.
“This observation led us to investigate if it was indeed possible to infer what someone is typing by just observing the upper body of the user (in a video call),” Jadliwala said. “One of the reasons our attack framework targets image frames (in the video call) containing upper body/shoulders of the user is because that is the only portion of the body that is typically visible in most video calls.”
Jadliwala reported his team was able to read slight pixel shifts on high-definition video around someone’s shoulders and upper arms to see if their movements were headed either north, south, east or west. From there, the team could map keystrokes on a QWERTY keyboard to make inferences about the text.
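The mapping step can be illustrated with a rough sketch. The Python below is not the researchers' code; it assumes a simplified, unstaggered QWERTY grid and hypothetical helper names (`direction`, `word_signature`, `candidates`), and it only shows how a sequence of coarse compass directions could narrow a word list.

```python
# Hypothetical QWERTY grid: (row, column) per key, ignoring real keyboard stagger.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def direction(a, b):
    """Coarse compass direction of the move from key a to key b."""
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    dr, dc = r2 - r1, c2 - c1
    if dr == 0 and dc == 0:
        return "X"  # same key pressed twice, no movement
    if abs(dc) >= abs(dr):
        return "E" if dc > 0 else "W"  # mostly horizontal move
    return "S" if dr > 0 else "N"  # mostly vertical move


def word_signature(word):
    """Direction string for the key-to-key moves needed to type a word."""
    return "".join(direction(a, b) for a, b in zip(word, word[1:]))


def candidates(observed, dictionary):
    """Dictionary words whose signature matches the observed directions."""
    return [w for w in dictionary if word_signature(w) == observed]


print(candidates("WE", ["cat", "dog", "pin", "bat"]))
```

In this toy example “cat” and “bat” produce the same direction sequence, which hints at why a coarse four-direction signal leaves room for error even with a limited word list.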
And while the technology is still experimental and needs work, the sheer volume of work, school and social life being done on high-definition video calling platforms is driving cybersecurity researchers to take a hard look at their vulnerabilities.
Public-interest technologist Bruce Schneier recently highlighted the research on his blog, writing, “Accuracy isn’t great, but that it can be done at all is impressive.”
Video Conferencing Faces Enormous Security Challenges
Video conferencing platforms have struggled in recent months to keep up with the security demands of their new user base. Zoom “bombings,” in which people interrupt meetings with hate speech, pornography or otherwise jarring or inappropriate content, were an early issue. One such hijacking went out live on national television during a House Oversight Committee hearing last April.
Zoom was sued for claiming to provide encryption that users said was not there. The company later announced it would provide end-to-end encryption (E2EE), but only for its paid subscribers; it was later pressured to roll out E2EE for basic users too.
And in early October, Cisco’s Webex, another popular high-definition video conferencing platform, issued patches for three “high-severity” flaws and 11 “medium-severity” ones for its conferencing system’s video surveillance IP cameras and Identity Services Engine network admin software.
Users who are concerned about their keystrokes being mapped over video conferencing can take a few simple steps to protect their data, according to Jadliwala and his team.
First, use existing tools to blur the background during calls. The researchers experimented with blurring and found it cut their ability to decipher words from 65 percent down to as low as 13 percent.
“These results show that blurring is an effective mitigation technique, which imposes little efficiency and quality overheads,” the report said.
Pixelation and frame skipping, much like blurring, were also effective at mitigating the team’s ability to read keystrokes, according to the report.
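To give a sense of why these mitigations work, here is a minimal sketch in Python. It is not taken from the report: it assumes a grayscale frame stored as a list of rows, and the function names `pixelate` and `skip_frames` are hypothetical. Both steps discard the fine detail and timing that small shoulder movements would otherwise leave in the video.

```python
def pixelate(frame, block):
    """Replace each block x block tile of a grayscale frame with its mean value."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]  # copy so the input frame is untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            ys = range(top, min(top + block, height))
            xs = range(left, min(left + block, width))
            tile = [frame[y][x] for y in ys for x in xs]
            mean = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def skip_frames(frames, keep_every):
    """Keep only every keep_every-th frame, dropping the ones in between."""
    return frames[::keep_every]


frame = [[0, 10, 100, 100],
         [10, 20, 100, 104]]
print(pixelate(frame, 2))
print(skip_frames(list(range(10)), 3))
```

After pixelation, every pixel in a tile carries the same value, so a shift of a pixel or two inside a tile changes little or nothing in the output; frame skipping similarly removes the intermediate frames in which a brief movement would appear.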
Jadliwala added that there is no reason for alarm and that no attempts to use video conferencing to interpret keystrokes have been observed.
“However,” he added, “it is good to be informed about such threats as a user of such video calling/conferencing applications.”