Big Wave Shoulder Surfing

Big Wave Shoulder Surfing Neil Patil Brian Cui∗ Hovav Shacham [email protected] [email protected] [email protected] University of Texas at Austin University of Texas at Austin University of Texas at Austin ABSTRACT Camera technology continues to improve year over year with advancements in both hardware sensor capabilities and computer vision algorithms. The ever increasing presence of cameras has opened the door to a new class of attacks: the use of computer vision techniques to infer non-digital secrets obscured from view. We prototype one such attack by presenting a system which can recover handwritten digit sequences from recorded video of pen motion. We demonstrate how our prototype, which uses off-the-shelf computer vision algorithms and simple classification strategies, can predict a set of digits that dramatically outperforms guessing, challenging the belief that shielding information in analog form is sufficient to maintain privacy in the presence of camera surveillance. We conclude that addressing these new threats requires a new method of thinking that acknowledges vision-based side-channel attacks Figure 1: An example scenario. Even though the writing is against physical, analog mediums. not visible, the pen’s motion can be observed and the writing reconstructed by a digital camera. Our prototype gener- CCS CONCEPTS alizes this attack to less optimal camera angles, distances, • Security and privacy → Human and societal aspects of se- and image quality. curity and privacy; Security in hardware; • Computing method- ologies → Computer vision. Meanwhile, analog mediums continue to be trusted platforms KEYWORDS for sensitive information. Writing on paper, physical ID cards, and face-to-face meetings are considered more trustworthy and less video camera, computer vision, pen and paper, side-channel attack, accessible to foreign adversaries when compared to digital com- image analysis, mobile devices, motion tracking, handwriting munication. The airgapped nature of analog, “real-life" activities ACM Reference Format: is assumed to avoid many of the risks stemming from the digital Neil Patil, Brian Cui, and Hovav Shacham. 2019. Big Wave Shoulder Surfing. realm. In . ACM, New York, NY, USA, 10 pages. Given ongoing trends regarding the proliferation of cameras, improvements to sensor hardware, and new computer vision tech- 1 INTRODUCTION niques, we argue that the safety of analog data - and in particular, Cameras are widely understood as capturing a limited 2D projec- the process of its creation - is overestimated in today’s environment. tion of a 3D world. However, improvements in camera hardware The increasing availability and effectiveness of cameras, present and image processing are enabling cameras to infer aspects of the in both pockets and on property, mean that common areas where surrounding 3D space. Advancements in motion tracking, object physical data is frequently produced cannot be trusted as confi- recognition, and scene reconstruction from the Computer Vision dential, and core societal processes dependent on analog platforms community, combined with higher megapixel counts and video have not yet adapted to the increasing capabilities of cameras. framerates, are enabling advanced image processing features on While cameras add permanency to what can be seen directly consumer devices such as "Night Sight" (for low light photography) in their field of view, they can also indirectly observe hidden in- and "Portrait Mode" (for simulated depth-of-field). formation in and outside the frame. As a result, the number of ∗The first two authors contributed equally to this work. opportunities for computer-vision based side channel attacks is increasing. Advanced image processing techniques can recover Permission to make digital or hard copies of all or part of this work for personal or visual artifacts from what was previously thought as noise [17]. classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation Even in designated secure environments, such as a bank vault or on the first page. Copyrights for components of this work owned by others than the government embassy, remote recording devices intended for main- author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or taining surveillance and security can be used to breach the analog republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. sphere. As an example, we believe the following attacks will become ,, feasible within the coming years: © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 • An attacker with access to a surveillance camera observes the hand and pen motion of someone writing on a paper ,, Patil, Cui, Shacham form at a bank, inferring what was written without a direct only available in specialized high speed cameras [18]. Minute visual view of the paper. In this paper, we prototype a working details useful for side-channel attacks - like the vibrations of a bag version of this attack. of chips caused by noise [11] - will begin to become more easily • The camera present in an airplane infotainment tablet ob- captured by general purpose consumer-grade camera hardware. serves the laptop vibrations and arm motions of a passen- Furthermore, manufacturers are beginning to exceed single- ger, inferring what he or she is typing. Attacks making use camera configurations in favor of housing multiple cameras in of hardware vibrations from user input have already been their devices for the capture of "3D pictures" and depth perception demonstrated, inferring keypresses from the vibration of a effects. The recently released Nokia 9 PureView features five rear- laptop [16] and touchscreen input from the vibrations of a facing cameras [28], and the next iPhone is expected to house three tablet [38]. [20]. As sensor costs decrease over the coming years, these multi- • An overhead camera records a customer using an ATM and camera setups are predicted to make their its way to lower-end uses the victim’s muscle movements to decipher their PIN, phones and other consumer hardware [23]. even if they have shielded the keypad with their hand. Prior work has already shown that hand motions can be used to 2.2 Improvements in Software infer a user’s smartphone PIN upon entry [36]. Improvements in photo processing have made their way to the We explore the first attack in depth: recovering handwritten consumer market, as innovations in computer vision pave the way text through distant observation of pen motion alone. Such an for intelligent analysis and interpretation of noisy raw camera attack would be effective even if the victim deliberately attempts data [17]. "Night Sight", present in the Google Pixel line of phones, to obscure their writing from view by covering the tip of their pen can capture details in near-total darkness using relatively short or the writing itself. To test the attack, we prototype a system that shutter speeds. "Portrait Mode", available in smartphones from demonstrates promising accuracy, while only making use of a single several manufacturers including the iPhone, uses computer vision fixed camera, off-the-shelf 2D motion tracking, and a relatively techniques to estimate depth from one or more cameras. Google simple classification strategy. Clips, a pocket-sized camera released in 2018, uses machine learning In summary, the contributions of this paper are: to control the shutter and automatically take "interesting" photos • We outline a new class of attacks that allow digital cameras [14]. to steal information through indirect observation of analog Recent advancements in computer vision have also enabled com- input devices and storage mediums. puters to fill in hidden visual content. Scene reconstruction of3D • We demonstrate a prototype of an example attack that allows environments from a set of disjoint 2D images has been studied an attacker to steal hidden written text from camera footage extensively over past years [12]. Adobe’s "content aware fill" fea- of pen motion alone. ture, now present in Adobe After Effects, is capable of seamlessly • We discuss ongoing social trends as well as hardware and inferring and filling in missing sections of an image [7]. These software capabilities that will make this new class of attacks techniques could be applied towards ascertaining aspects about the a serious concern in the future. environment that are not immediately shown in the camera frame. Machine learning has also driven improvements in discerning 2 BACKGROUND patterns from noisy signal data [17]. This has already made possible Side channel attacks leveraging mobile sensors have grown in pop- several side channel attacks against the physical space: mobile ularity over recent years. Attacks using microphones to infer key- keyloggers have been built from smartphone microphones [2] and board input [2] and accelerometers to infer smartphone input [3] gyroscopes [8]. We believe similar techniques can be carried over have been demonstrated to be effective. However, the progress of to analyze camera images and enable new computer vision side these attacks has been slowed due to diminishing returns in hard- channel attacks. ware development and sensor fidelity - in part because of a lack of consumer interest (microphone quality is not usually a distinguish- 2.3 Societal Trends ing hardware feature) and physical constraints (such as the inverse We believe ongoing societal trends will multiply the quantity of square law). cameras and their frequency of use in the coming years, amplifying On the other hand, camera technology continues to improve the plausibility of visual side-channel attacks. year-over-year. Image and video quality improvements open the The norms around disclosure of surveillance are being relaxed. door to new vision based side channels that may have previously The release of Google Glass in 2013 was followed by significant criti- been considered infeasible.

Big Wave Shoulder Surfing

Zencam: Context-Driven Control of Continuous Vision Body Cameras

Devices, the Weak Link in Achieving an Open Internet

Electronic 3D Models Catalogue (On July 26, 2019)

Prof. Feng Liu

Manovich.Ai Aestheti

AUTHORS and MACHINES Jane C

PDF (Thumbnails)

Nestdnn: Resource-Aware Multi-Tenant On-Device Deep Learning for Continuous Mobile Vision

Chameleon: Scalable Adaptation of Video Analytics

Photoshop/Lightroom Tips

Calling BS, Far and Wide

Edgeeye - an Edge Service Framework for Real-Time Intelligent Video Analytics