Apple Vision Pro’s Eye Tracking Exposed What People Type

An anonymous reader quotes a report from Wired: You can tell a lot about someone from their eyes. They can indicate how tired you are, the type of mood you’re in, and potentially provide clues about health problems. But your eyes could also leak more secretive information: your passwords, PINs, and messages you type. Today, a group of six computer scientists are revealing a new attack against Apple’s Vision Pro mixed reality headset where exposed eye-tracking data allowed them to decipher what people entered on the device’s virtual keyboard. The attack, dubbed GAZEploit and shared exclusively with WIRED, allowed the researchers to successfully reconstruct passwords, PINs, and messages people typed with their eyes. “Based on the direction of the eye movement, the hacker can determine which key the victim is now typing,” says Hanqiu Wang, one of the leading researchers involved in the work. They identified the correct letters people typed in passwords 77 percent of the time within five guesses and 92 percent of the time in messages.

To be clear, the researchers did not gain access to Apple’s headset to see what users were viewing. Instead, they worked out what people were typing by remotely analyzing the eye movements of a virtual avatar created by the Vision Pro. This avatar can be used in Zoom calls, Teams, Slack, Reddit, Tinder, Twitter, Skype, and FaceTime. The researchers alerted Apple to the vulnerability in April, and the company issued a patch to stop the potential for data to leak at the end of July. It is the first attack to exploit people’s “gaze” data in this way, the researchers say. The findings underline how people’s biometric data (information and measurements about your body) can expose sensitive information and be used as part of the burgeoning surveillance industry.

The GAZEploit attack consists of two parts, says Zhan, one of the lead researchers. First, the researchers created a way to identify when someone wearing the Vision Pro is typing by analyzing the 3D avatar they are sharing. For this, they trained a recurrent neural network, a type of deep learning model, with recordings of 30 people’s avatars while they completed a variety of typing tasks. When someone is typing using the Vision Pro, their gaze fixates on the key they are likely to press, the researchers say, before quickly moving to the next key. “When we are typing our gaze will show some regular patterns,” Zhan says. Wang says these patterns are more common during typing than if someone is browsing a website or watching a video while wearing the headset. “During tasks like gaze typing, the frequency of your eye blinking decreases because you are more focused,” Wang says. In short: Looking at a QWERTY keyboard and moving between the letters is a pretty distinct behavior.
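The fixate-then-jump pattern described above can be illustrated with a much simpler technique than the researchers' recurrent neural network. The following is a minimal sketch of classical dispersion-threshold fixation detection, not the GAZEploit method. The function names, the normalized 2D gaze coordinates, and the threshold values are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed: gaze position in normalized 2D coordinates


def dispersion(window: List[Point]) -> float:
    """Spread of a group of gaze samples: x-range plus y-range."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(
    gaze: List[Point],
    max_dispersion: float = 0.05,  # hypothetical threshold, not from the paper
    min_samples: int = 4,          # hypothetical minimum fixation length
) -> List[Point]:
    """Return the centroid of each run of samples that stays within a small area."""
    fixations: List[Point] = []
    start = 0
    n = len(gaze)
    while start + min_samples <= n:
        end = start + min_samples
        if dispersion(gaze[start:end]) <= max_dispersion:
            # Grow the window while the gaze stays within the dispersion limit.
            while end < n and dispersion(gaze[start:end + 1]) <= max_dispersion:
                end += 1
            window = gaze[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((cx, cy))
            start = end
        else:
            # Gaze is still moving (a saccade); slide forward one sample.
            start += 1
    return fixations


# Synthetic trace: dwell on one spot, jump, dwell on another.
trace = [(0.0, 0.0)] * 5 + [(0.5, 0.5)] + [(1.0, 1.0)] * 5
print(detect_fixations(trace))
```

On real gaze traces the thresholds would need tuning, and a learned model such as the one the researchers describe would presumably also use cues this sketch ignores, such as the reduced blink rate Wang mentions.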

The second part of the research, Zhan explains, uses geometric calculations to work out where someone has positioned the keyboard and the size they’ve made it. “The only requirement is that as long as we get enough gaze information that can accurately recover the keyboard, then all following keystrokes can be detected.” Combining these two elements, they were able to predict the keys someone was likely to be typing. In a series of lab tests, they had no knowledge of the victim’s typing habits or speed, or of where the keyboard was placed. However, the researchers could predict the correct letters typed, in a maximum of five guesses, with 92.1 percent accuracy in messages, 77 percent of the time for passwords, 73 percent of the time for PINs, and 86.1 percent of occasions for emails, URLs, and webpages. (On the first guess, the letters would be right between 35 and 59 percent of the time, depending on what kind of information they were trying to work out.) Duplicate letters and typos add extra challenges.
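To make the "within five guesses" figures concrete: once a keyboard's position and size are known, a gaze fixation can be ranked against the key centers, and the closest few keys become the candidate guesses. The sketch below assumes a simplified flat QWERTY layout with hypothetical row offsets and a plain nearest-distance ranking. It is an illustration of the idea, not the researchers' geometric recovery procedure, and all names in it are invented for this example.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.5, 1.5]  # assumed horizontal stagger, in key widths


def key_centers(origin: Point = (0.0, 0.0), key_size: float = 1.0) -> Dict[str, Point]:
    """Compute the center of each letter key for an assumed keyboard placement."""
    centers: Dict[str, Point] = {}
    for row_idx, (row, offset) in enumerate(zip(ROWS, ROW_OFFSETS)):
        for col_idx, ch in enumerate(row):
            x = origin[0] + (offset + col_idx) * key_size
            y = origin[1] + row_idx * key_size
            centers[ch] = (x, y)
    return centers


def top_k_keys(fixation: Point, centers: Dict[str, Point], k: int = 5) -> List[str]:
    """Rank keys by distance from the fixation point and keep the k closest."""
    ranked = sorted(centers, key=lambda ch: math.dist(fixation, centers[ch]))
    return ranked[:k]


centers = key_centers()
# A fixation that lands slightly off the center of the "g" key.
print(top_k_keys((4.4, 1.1), centers))
```

A first guess is correct only when the fixation lands nearest the intended key, while a five-guess list also tolerates fixations that drift onto a neighboring key, which is one plausible reason the reported top-five accuracy is so much higher than the first-guess accuracy.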