TOWARDS LOW-COST SPATIAL HAPTICS: BRAKE-BASED SHAPE DISPLAYS AND AUTOMATIC CONTENT AUTHORING

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Kai Zhang
August 2020

© 2020 by Kai Zhang. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/kh451yb0256

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Sean Follmer, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Juan Rivas-Davila

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Gordon Wetzstein

Approved for the Stanford University Committee on Graduate Studies. Stacey F. Bent, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Haptic technology can significantly enhance the richness and realism of user interaction by providing the experience of physical touch. It is broadly used in various application scenarios including game controllers, surgical robots, design and modeling tools, and accessibility for Blind people. Tactile displays are a class of haptic devices that render tactile effects to users, for example reproducing the skin deformation a user experiences when in contact with a real object, allowing the user to feel vibration, pressure, touch, and texture. However, key challenges currently limit the widespread use of tactile displays, including cost, spatial resolution, fabrication complexity, shape rendering flexibility, and refresh rate. Beyond these hardware limitations, authoring rich tactile content is another challenge. Although manual authoring methods have been developed, methods that can automatically translate vast amounts of information into meaningful tactile stimuli are highly desirable. In this thesis, I present my work on addressing the hardware and content authoring challenges of tactile displays. The first part of my thesis work focuses on the use of electrostatic brakes and clutches to enable low-cost tactile displays. I investigated the use of low-cost electrostatic adhesion in the context of a high-resolution, refreshable 2.5D tactile shape display. I modeled, fabricated, and characterized the contact force, refresh rate, and robustness of the brakes. A user study conducted with an integrated 4×2 tactile shape display based on the brakes showed similar user shape recognition performance between our device and 3D printed shapes. In addition to the bed-of-nails tactile display mentioned above, I also explored formable-crust 2.5D tactile shape displays using auxetic materials. A simulation model and algorithms were developed to investigate the characteristics of the shape display and to render a target shape, and an experimental prototype was constructed to verify the simulation results. The auxetic 2.5D tactile shape display has the advantages of low cost, a large displacement range, and roll-to-roll fabrication compatibility. In the second part of my thesis, I investigated methods to address the content authoring challenges of tactile displays. A pipeline was built to automatically generate spatial tactile effects by analyzing cross-modality features in a video. We believe these results help to improve the accessibility of haptic technology to a broader audience.

Acknowledgments

Unlike most other occupations, Ph.D. students serve as brave explorers who expand the border of human knowledge into new territory. Since my first day at Stanford in 2014, I have witnessed my progress from a naive bachelor's graduate to a more experienced researcher with a deeper understanding of this world of technology. It would have been impossible for me to complete my Ph.D. journey without the help and encouragement of so many mentors and friends, and I would like to thank these people with my sincere gratitude. First of all, I would like to thank my Ph.D. advisor, Prof. Sean Follmer. In many ways, the life of a Ph.D. student depends greatly on his or her advisor, so I feel very fortunate to be a member of Sean's group. Whether it was building the roadmap for my Ph.D. thesis, deciding the framework of each research project, or solving detailed technical challenges in an experimental system, Sean always gave me very helpful advice. Exploring a brand new mechanism in research is like searching for a path across the ocean, and Sean's encouragement, patience, and optimism always helped me overcome the frustration along the way. Besides research, Sean also gave me much valuable advice on many important things, including career development, classes, and personal health during the COVID-19 pandemic. I believe Sean's advice helped me become not only a more experienced researcher but also a better and stronger person, ready to deal with the challenges in my future life. I appreciate every member of my defense committee for their valuable feedback: Prof. Gordon Wetzstein, Prof. Juan Rivas-Davila, Prof. Allison Okamura, and Prof. Larry Leifer. Their valuable comments helped me improve this thesis. I would also like to thank all the collaborators in my Ph.D. research. Lawrence Kim gave me lots of helpful suggestions on the auto haptics project, and I learned a lot from his insights in analyzing a research problem. His internet-influencer dog, Boba, also serves as a lab mascot. Eric Gonzalez and Jianglong Guo helped me significantly improve my project on the electrostatic tactile display; their rigor in clarifying every detail of the research is impressive. Yipeng Guo is a very effective experimental collaborator, and my generation gap with younger graduate students was definitely narrowed by working with such an energetic collaborator. Jan Friedrich from Berlin provided very valuable help on the auxetic shape display projects. As a bonus, I also enjoyed the many photos of local European views that Jan shared.

I also appreciate many other people for their valuable discussions along the way. Yapeng Tian kindly explained many details of his previous project so that I could have a jump start on mine. Jackie Yang and Abe Davis provided many inspiring ideas when I was brainstorming my new project on auto haptics. As a researcher familiar with flexible structure fabrication, Amy Han showed me many handy techniques for building the auxetic shape display. I thank all the labmates in the Shape Lab. This is a fantastic research group with awesome members; many of them have won a best paper award or honorable mention at one or more academic conferences. I always learned a lot from their insightful comments in group meetings, peer paper reviewing activities, and everyday academic discussions. The culture of the Shape Lab is also unique and warm. People held all kinds of interesting events, including ceramic painting, board games, pumpkin carving, online drawing, and the traditional Thursday tea time at two thirty (TTTTTT). I also appreciate the help from Andrea Brand-Sanchez and Renee Chao. As a student working closely with hardware systems, I received tremendous help from them in purchasing everything correctly and as quickly as possible. I appreciate the guidance and support of Prof. Jonathan A. Fan. I did most of my master's research in the Fan Lab. Jon not only gave me tons of sharp advice on the plasmonic research project I was working on at that time, but also patiently showed me much of the methodology of doing good research. I would also like to thank my undergraduate research advisors, Prof. Zhongfan Liu and Prof. Hailin Peng, from Peking University. Although I was just a junior student with very little research experience, they gave me the chance to access a real research lab. Without this opportunity, I would not have been able to start building my research skills as a college student. I would like to thank all of my friends at Stanford. Tao Jia, Haoli Guo, and Yujie Zheng have been my friends since our first year as graduate students, and we have shared many memorable moments along the Ph.D. journey. I thank Hongquan Li, Shi Dong, and Kuan Fang for their support and help during the hard times; it is a treasure to have these people around. It has also been a great pleasure to meet other friends at Stanford: He Wang, Zheng Cui, Jinye Zhang, Li Tao, Minda Deng, Yu Miao, Shang Zhai, Zhouchangwan Yu, and Sophia Qin. I cannot put everyone's name here due to space limitations, but I would like to express my gratitude to all my friends at Stanford. Without their company and support, I would not have been able to stay as optimistic during my Ph.D. journey. Last but most important, I would like to thank my parents for everything they have done to support me throughout the entire six years of my graduate student life. Although we were separated by the Pacific Ocean for most of the time, I always felt their company along the way.

Contents

Abstract

Acknowledgments

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Prior Work
      1.3.1 2D Tactile Displays
      1.3.2 2.5D Tactile Shape Display
      1.3.3 Tactile Content Authoring Methods
  1.4 Dissertation Overview

2 Electrostatic Adhesive Brakes for 2.5D Bed-of-nails Tactile Shape Displays
  2.1 Design of an Individual Electrostatic Adhesive Brake
      2.1.1 Background
      2.1.2 Modeling
      2.1.3 Implementation
      2.1.4 Evaluation: Quasi-static Loading
      2.1.5 Evaluation: AC vs. DC Voltage
      2.1.6 Evaluation: Brake Engagement Time
      2.1.7 Evaluation: Residual Force
      2.1.8 Evaluation: Robustness
  2.2 Design of a 2.5D Tactile Display using Electrostatic Adhesive Brakes
      2.2.1 System Workflow
      2.2.2 System Design and Implementation
      2.2.3 Mechanical Clutch
      2.2.4 System Refresh Rate Analysis
  2.3 User Study
      2.3.1 Participants
      2.3.2 Materials
      2.3.3 Procedure
      2.3.4 Results & Discussion
  2.4 Limitations & Future Work
  2.5 Conclusion

3 Electrically Programmable Auxetic Materials for 2.5D Formable-crust Tactile Shape Displays
  3.1 Modeling of Auxetic 2.5D Tactile Shape Display
      3.1.1 Construction of Simulation Model
      3.1.2 Design Space of Auxetic 2.5D Tactile Shape Display
      3.1.3 Target Shape Rendering by Auxetic 2.5D Tactile Shape Display
  3.2 Prototyping of Auxetic 2.5D Tactile Shape Display
  3.3 Required Force to Lock the Auxetic 2.5D Tactile Shape Display
  3.4 Limitations and Future Work
  3.5 Conclusions

4 Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features in a Video
  4.1 Framework for Automatically Generating Spatial Tactile Effects
      4.1.1 Audio Source Separation for Complex Audio Content
      4.1.2 Construction of Sound Source Localization Heatmap using the Audiovisual Content
      4.1.3 Spatial Tactile Effects
      4.1.4 Tactile Display
  4.2 User Study: Evaluation of Tactile Effects based on Cross-Modal Features
      4.2.1 Participants
      4.2.2 Study Setup
      4.2.3 Independent Variables
      4.2.4 Design and Procedure
      4.2.5 Results and Discussion
  4.3 Limitations and Future Work
  4.4 Conclusions

5 Conclusions and Future Work
  5.1 Conclusions and Future Work
  5.2 Restatement of Thesis Contributions
      5.2.1 Electrostatic Adhesive Brakes for 2.5D Tactile Shape Display
      5.2.2 Electrically Programmable Auxetic Materials for 2.5D Shape Displays
      5.2.3 Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features in a Video
  5.3 Future Work
      5.3.1 Hardware Design and Content Authoring for Real-time Low-cost Spatial Haptic Interaction
      5.3.2 Scaling of the Hardware Design and Content Authoring Methods
      5.3.3 Extensions Enabled by Emerging Technologies

A ANOVA Analysis

Bibliography

List of Tables

2.1 Static Loading Robustness Test

4.1 Summary of Videos Used in the User Study
4.2 Questionnaire Used in the User Study

A.1 One-Way Within-Subject ANOVA for Overall Results
A.2 One-Way Within-Subject ANOVA for Each Video

List of Figures

1.1 (a) Electrostatic adhesive brakes for 2.5D tactile shape display. (b) Electrically programmable auxetic materials for 2.5D shape displays.
1.2 Automatic spatial tactile effects generation framework by analyzing cross-modality features in a video.

2.1 Diagram (a) and cross-section (b) of an individual electrostatic adhesive brake. Electrostatic attraction is generated between the interdigital electrodes and the brass pin when a voltage differential is applied.
2.2 Top view of an individual electrostatic adhesive brake. The dielectric film is attached to the sides between the frame and two PCBs used to send signals to the electrodes. Overhang of the pin tensions the film and reduces gaps. (Modified from [153], Fig. 4, © 2018 IEEE.)
2.3 Circuit schematic of an individual electrostatic adhesive brake. A high voltage transistor controls the voltage applied to the electrodes (Vcc), with a series resistor R1 used to limit current. The control signal Vin is sent via microcontroller. (Modified from [153], Fig. 7, © 2018 IEEE.)
2.4 A high-resolution row of electrostatic adhesive brakes, with a pitch of 1.7 mm and 0.8 mm pins. Inset shows interdigital electrodes fabricated via laser ablation on an 8 µm dielectric film. (From [153], Fig. 1(A), © 2018 IEEE.)
2.5 (a) Experimental setup for contact force measurement. A linear actuator slowly engages the clutched pin and force is measured until brake failure. (b) Maximum contact force of an electroadhesive brake as a function of interdigital electrode spacing. 150 µm, 300 µm and 500 µm gaps were investigated. (c) Maximum contact force of an electroadhesive brake as a function of dielectric film thickness. 8 µm and 24 µm thicknesses were investigated. (d) Maximum contact force of an electroadhesive brake as a function of brake resolution. A higher resolution brake (0.8 mm pin width, 1.7 mm pitch) and a lower resolution brake (1.6 mm pin width, 4 mm pitch) were investigated. Three trials were performed for each condition, with standard deviation shown.

2.6 Comparison of maximum contact force provided by DC and AC voltage applied to the interdigital electrodes. All measurements are carried out under a voltage of 194.4 V, using 60.3 mm length interdigital electrodes, a 1.6 mm width brass pin, and 8 µm thickness dielectric film.
2.7 Brake engagement time measurement. All measurements are carried out using 60.3 mm length interdigital electrodes, a 1.6 mm width brass pin, and 8 µm thickness dielectric film. (a) Engagement time as a function of applied voltage. (b) Engagement time as a function of pin velocity. Voltage applied was 294 V.
2.8 Maximum contact force and residual force observed for different applied voltages. Maximum contact force is measured with the electrodes turned on, while residual force is measured with the electrodes just switched off. All measurements are carried out using 60.3 mm length interdigital electrodes, a 1.6 mm width brass pin, and 8 µm thickness dielectric film.
2.9 Repeatability of the maximum contact force sustainable by an electroadhesive brake, evaluated over ten trials with electrostatic discharging of the pin after each trial. All measurements are carried out using 60.3 mm length interdigital electrodes, a 1.6 mm width brass pin, and 8 µm thickness dielectric film.
2.10 Repeatability of the maximum contact force sustainable by an electroadhesive brake, evaluated over 50 trials without discharging the pin after each trial. We observe a slight decreasing trend in magnitude (red) due to charge accumulation in the brake. All measurements are carried out using 60.3 mm length interdigital electrodes, a voltage of 294 V, a 1.6 mm width brass pin, and 8 µm thickness dielectric film.
2.11 Rendering workflow of an electroadhesive brake-based refreshable tactile display. An actuated platform positions the pins, which are then clutched via electroadhesive braking. When the surface is complete, a global mechanical clutch can be engaged to further increase the holding force on the pins. Red indicates brake/clutch engagement.
2.12 Prototype tactile shape display. A single actuated platform moves pins into position for braking. PCBs in each row route signals between the electrodes and transistors on a separate board. (From [153], Fig. 8, © 2018 IEEE.)
2.13 Top view of the tactile display prototype. Pitch size is 4 mm within-row and 3.2 mm between-row. Milled Delrin sheets are used to constrain the pins, and side-mounted PCBs connect the interdigital electrodes to a main control board.
2.14 System diagram of our tactile shape display. The CPU forwards desired pin positions to a microcontroller over USB Serial, which then sets the display accordingly by driving an actuated platform and applying the electroadhesive brakes at the proper heights.

2.15 (a) Diagram of the mechanical clutch design. When engaged, rubber strips clutch all pins in place and allow the surface to sustain higher contact forces. (b) Assembled mechanical clutch and tactile display. Two linked clutches are used (top and bottom) to avoid asymmetric loading.
2.16 Measured maximum contact force sustained by a mechanically-clutched pin and the corresponding linear fit of the measured data.
2.17 (a) Shapes rendered in the user study. Top: images shown to the user during the study. Middle: corresponding 2.5D shapes rendered by our tactile display. Bottom: 3D printed shapes used as a control. (b) Results of the study comparing shape recognition rates and response times between the tactile display and 3D printed conditions. Error bars depict the standard error of the results.

3.1 (a) Auxetic 2.5D shape display before inflation. (b) Inflated auxetic 2.5D shape display. Some of the triangle vertices are anchored in order to render the shape.
3.2 Meaningful parameters for the design of an auxetic 2.5D shape display. (a) Number of hexagons. The red area depicts an example of one hexagon. (b) Ratio of the openings in the auxetic material. The auxetic material on the left has a smaller ratio of openings than the auxetic material on the right. The red region depicts an example of one opening area.
3.3 Metrics of shape rendering. (a) Amplitude of the shape. (b) Curvature of the shape.
3.4 Amplitude of the rendered shape. (a) Auxetic shape displays with different numbers of hexagons render shapes with a similar amplitude. (b) Auxetic shape displays with a larger ratio of openings in the auxetic pattern render shapes with a smaller amplitude.
3.5 Curvature of the rendered shape. (a) The impact of the size of the inflated region on the curvature of the rendered shape is studied. (b) A smaller inflated region renders the 2.5D shape with a larger curvature. The number of hexagons and the ratio of the opening area have a larger influence for smaller inflated regions.
3.6 Strategies to control the shape rendered by the auxetic 2.5D shape display by manipulating the auxetic cells. (a) Plate locking: auxetic cells are locked to the ground plate. (b) Cell locking: auxetic cells are locked with respect to each other. (c) Plate locking and cell locking are combined to generate a more complicated shape.
3.7 The pipeline to render a target shape. A target shape saved as a 3D point cloud is first converted to a 2D heatmap. Then a CNN-based model is combined with a binary thresholding method to map the 2D heatmap to a 1D vector representing the states of the hexagons. The target shape is rendered by the auxetic shape display by feeding the states of the hexagons to the system.

3.8 (a) Correlation coefficients were calculated to compare four target shapes with the shapes rendered by the plate locking strategy or the plate locking combined with cell locking strategy. (b) Heatmaps were generated to compare the original four target shapes with the shapes rendered by our shape rendering methods.
3.9 In the prototype, the auxetic material is made of a leather sheet. The membrane is made of a thin LDPE film. The auxetic shape display after inflation is shown on the right side of the picture.
3.10 An acrylic mask locks the hexagons according to the plate locking strategy. A few acrylic hexagons are attached to hexagons of the auxetic material by adhesives to achieve cell locking.
3.11 The results of the target shape rendering methods are fed to the simulation model and the experimental prototype. The plate locking strategy and the plate locking combined with cell locking strategy are compared with the target shape.
3.12 (a) The force required to lock a hexagon by the plate locking strategy was measured by empirical tests. (b) The force required to lock a hexagon by the cell locking strategy was calculated by the model.
3.13 The force required to lock a hexagon by plate locking and cell locking is compared with the force that can be provided by an electroadhesive brake.

4.1 The automatic tactile effects generation pipeline uses both visual and audio features to separate the diegetic audio signal and determine the location of the tactile stimuli. The intensity of the generated haptic effects is determined only by the diegetic audio signals.
4.2 The pipeline for separation of diegetic and non-diegetic audio signals [104]. A multisensory net is used to fuse the visual and audio features while a u-net further separates the mixed spectrogram into foreground and background audio signals.
4.3 Pipeline for sound source localization in audiovisual content. Visual features are extracted through a VGG-19 network [126] while the audio segment is processed by a VGG-like network [53]. Audio-guided visual attention is then used to generate a heatmap showing the location of the sounding object.
4.4 The generation pipeline builds the spatial tactile mapping by calculating both the spatial distribution and the intensities of the tactile stimuli. Amplitudes of the audio stream after downsampling are translated to the intensities of the tactile stimuli. The sound source localization heatmap is used to determine the spatial distribution of the haptic effects.
4.5 Spatial tactile mappings generated from video examples downloaded from YouTube.
4.6 (a) Chair cushion with haptic actuators inside as a vibrotactile display. (b) Diagram of the tactile rendering hardware system.

4.7 Participants sat against the chair cushion with vibrotactile actuators during the user study. The vibrotactile device provided tactile stimuli that were synchronized to the shown videos.
4.8 The aggregated results for all eight examples are shown. * p < 0.05, ** p < 0.01, *** p < 0.001. The results compare the performance of the baseline condition (i.e., plain videos without tactile effects), the modified saliency-driven method [74], and our method for the seven questionnaire items (Q1-Q7, see Table 4.2).
4.9 (a) Subjective ratings for QoE. (b) Subjective ratings for tactile-related questions. A standard error bar is shown for each measurement. * p < 0.05, ** p < 0.01, *** p < 0.001. The results compare the performance of plain videos, the modified saliency-driven method [74], and our method for the seven questionnaire items (Q1-Q7, see Table 4.2).

Chapter 1

Introduction

1.1 Motivation

Before the digital age, the sense of touch was involved in almost every human daily task. With the widespread development and use of computers and digital devices in recent years, that has dramatically changed, as human-computer interfaces have traditionally relied most heavily on visual and audio output devices. However, over the past 30 years, the haptics research field [107] has aimed to incorporate the sense of touch into the digital world. Representative haptic devices include game controllers, surgical robots, braille displays, and wearable devices, which help to enhance the audiovisual experience, manipulate virtual objects, and deliver information. Haptic research can be subdivided into kinesthetic effects and tactile effects. Kinesthetic perception represents people's sensations from their tendons, muscles, and joints. The force, velocity, and position of an object can be perceived by a person through a kinesthetic haptic device. On the other hand, tactile perception focuses on the sense of touch on the skin, such as texture, vibrations, and pressure, as well as local features of objects like shapes and edges [47]. A tactile display is designed as a human-computer interface to provide tactile sensations to a user [26]. Tactile displays can transmit information for visually impaired people [125], help users manipulate objects in both real and virtual environments [82], and create a more immersive multimedia user experience [75]. Tactile displays can be further split into 2D tactile displays and 2.5D tactile shape displays. 2D tactile displays, such as braille displays [24], electrotactile displays [66], and vibrotactile displays [74], mainly focus on rendering tactile cues on a 2D surface. Depending on the scale of such displays, they allow users to perceive tactile stimuli on different parts of their bodies, including the fingertips (which have the highest density of the mechanoreceptors used to perceive such stimuli), the hands, or other areas of the body. 2.5D tactile shape displays bridge the gap between tactile stimuli and kinesthetic feedback by being able to reproduce velocity, force, and position in addition to tactile cues. Representative 2.5D tactile shape displays include pin arrays [138, 37] and formable-crust displays [10].


Although there is much research on tactile displays, remaining hardware and content authoring challenges still limit the widespread application of tactile displays in everyday life. In terms of hardware design, most state-of-the-art tactile displays [95] rely on mechanical actuators, which are limited by their high cost, large form factor, and manufacturing complexity. Other actuation technologies have also been developed, such as electromagnetic actuators [151], electroactive polymer membranes [157], and piezoelectric actuators [80]. However, spatial resolution, displacement of each taxel (a single tactile display element), and contact force are still unsolved challenges for such tactile displays. To extend the use of tactile displays from limited research scenarios to more affordable commercial applications, one promising path is to manufacture tactile displays with methods compatible with well-established solid-state electronics fabrication processes. Tactile displays could then be fabricated in a batch process, which would achieve the goal of lower-cost displays. In addition to low-cost fabrication, an ideal tactile display should also meet the design requirements of high resolution, small form factor, high shape rendering flexibility, and high refresh rate.

Beyond these hardware design requirements, content authoring is another bottleneck preventing affordable tactile displays from being more widely adopted. To create tactile stimuli rendered on tactile displays, researchers have developed various manual authoring methods and tools [113, 146]. However, manual authoring is laborious and thus not ideal for a number of application domains, including immersive multimedia. For example, manually processing video to add a haptic channel would not easily scale to the massive amount of existing video footage. Tactile stimuli have also been automatically translated from inertial sensors [140] and from audio or visual content [74, 86]. These methods have the potential to generate tactile content more affordably. However, collecting data from inertial sensors requires extra equipment. Additionally, current methods that translate tactile stimuli from either audio or visual content lack the ability to process complicated scenarios. Thus, it is also essential to develop a more intelligent automatic authoring method for tactile stimuli so that large amounts of low-cost tactile content can be rendered on tactile displays. Solving the hardware and content authoring challenges mentioned above will pave the way for more widespread and affordable tactile displays.

1.2 Contribution

To resolve the hardware and content authoring challenges for tactile displays described in Sec. 1.1, this dissertation presents the design and investigation of two tactile shape displays that have the potential to be fabricated at scale by utilizing electrostatic brakes, and an automatic tactile stimuli generation framework that utilizes cross-modality features in a video. Towards our goal of lower-cost tactile displays, our specific focus is on applying actuation techniques that can be batch fabricated through existing solid-state electronics techniques.


Figure 1.1: (a) Electrostatic adhesive brakes for 2.5D tactile shape display. (b) Electrically programmable auxetic materials for 2.5D shape displays.

However, a key insight of this work is that it may not be optimal to try to achieve all of the design requirements of ideal shape displays. Instead, we propose to sacrifice refresh rate to achieve a display that can be batch fabricated with a high spatial resolution, high force density, and low material cost. To do this, we propose to investigate brake-based shape displays. Instead of using actuators to drive the taxels in a tactile display, we focus on refreshable tactile displays that use brakes to engage the taxels to a single actuator. Since brakes can be constructed with methods and materials used in the batch fabrication processes of the well-established solid-state electronics industry, our refreshable tactile displays can achieve lower fabrication cost as well as other essential design requirements such as high resolution, small form factor, high contact force, and a large displacement range for each taxel. Specifically, we investigated electrostatic adhesive brakes for a 2.5D tactile shape display, as Fig. 1.1(a) depicts. Since electrostatic adhesive brakes can be fabricated by solid-state electronics manufacturing methods such as roll-to-roll printing or laser ablation, they can be made at a low cost and at high spatial resolutions. Additionally, electrostatic adhesive brakes allow for a large displacement range and have a large force-to-weight ratio. Before building a brake-based tactile display, we first carried out a theoretical analysis to understand the design space of the electrostatic adhesive brakes. Next, we investigated the fabrication process to build an electrostatic adhesive brake with a high resolution (1.7 mm pitch, Fig. 1.1(a)). We performed a series of measurements to evaluate the contact force, robustness, and refresh rate of electrostatic adhesive brakes built with different design parameters.
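To give a rough sense of the scaling behind this design space, a first-order estimate treats the brake as a parallel-plate electroadhesive interface. This is a simplified sketch for intuition only, not the interdigital-electrode model developed in Chapter 2; here V is the applied voltage, d the dielectric film thickness, A the overlap area between the electrodes and the pin, and μ the friction coefficient at the interface:

\[
  P \;=\; \tfrac{1}{2}\,\varepsilon_0 \varepsilon_r \left(\frac{V}{d}\right)^{2},
  \qquad
  F \;\approx\; \mu P A \;=\; \frac{\mu\,\varepsilon_0 \varepsilon_r\, A\, V^{2}}{2\, d^{2}},
\]

where P is the normal electrostatic pressure and F the shear (braking) force on the pin. Under this approximation the holding force grows quadratically with voltage and inversely with the square of the film thickness, which is why the applied voltage and the dielectric film thickness are among the parameters characterized in Chapter 2.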

Following the investigation and characterization of an individual electrostatic adhesive brake, we created an integrated tactile display with 4 × 2 pins. A control circuit and a system process flow were developed to operate the tactile display. To further increase the contact force provided by each pin, we designed an additional mechanical clutch. We conducted a user study to evaluate our electrostatic adhesive brake-based tactile display, and found a comparable shape recognition success rate and response time when comparing our device to passive, 3D printed 2.5D shapes.

In addition to the bed-of-nails tactile display, we also investigated the application of similar technologies to formable-crust type shape displays (Fig. 1.1(b)). Although bed-of-nails tactile displays are more straightforward to control due to their simple kinematics, formable-crust shape displays can have a larger range of motion, because such devices rely on the forward propagation of the kinematics of their structures to render shapes. Specifically, we investigate the use of auxetic materials to create controllable conformal surface constraints. These 2D auxetic materials are made up of individual cells which expand when a stress is applied, and specific cell patterns can be created to allow a flat sheet to form a doubly curved surface. By controlling the ability of each cell to expand, different shapes can be rendered. Thus, these 2D auxetic materials can generate 2.5D forms and are potentially able to be fabricated by roll-to-roll batch fabrication methods. To explore the characteristics of an auxetic 2.5D shape display, we first built a simulation model using a physics simulator that models the system as an aggregation of particles connected by damped springs. Meaningful design parameters of the auxetic 2.5D shape display and metrics to evaluate the rendered shape were defined. Simulations were conducted to investigate the relation between the design parameters and the shape rendering metrics in order to select meaningful design parameters. Furthermore, we developed two strategies to lock the auxetic shape display and neural-network-based approaches to enable rendering of a target shape. A passive auxetic 2.5D shape display was constructed to demonstrate the feasibility of the simulation results in a real system. The force required to lock the auxetic shape display with the two locking strategies was modeled and compared with the force that can be provided by electroadhesive brakes.
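To make the particle/damped-spring idea concrete, the sketch below implements a generic mass-spring-damper update. It is a minimal illustration under assumed names and constants, not the simulator actually used in this thesis.

import numpy as np

def spring_damper_forces(pos, vel, edges, rest_len, k=50.0, c=0.5):
    # pos, vel: (N, 3) particle positions and velocities; edges: (E, 2) particle
    # index pairs; rest_len: (E,) spring rest lengths; k, c are illustrative
    # stiffness and damping constants.
    f = np.zeros_like(pos)
    for (i, j), L0 in zip(edges, rest_len):
        d = pos[j] - pos[i]
        L = np.linalg.norm(d) + 1e-9           # current spring length
        u = d / L                              # unit vector from particle i to j
        rel_v = np.dot(vel[j] - vel[i], u)     # closing speed along the spring
        fmag = k * (L - L0) + c * rel_v        # Hooke's law plus damping
        f[i] += fmag * u                       # equal and opposite forces
        f[j] -= fmag * u
    return f

def step(pos, vel, edges, rest_len, mass=1.0, dt=1e-3):
    # One semi-implicit Euler step; a locked (braked) cell could be modeled by
    # zeroing the velocities of its particles after this update.
    acc = spring_damper_forces(pos, vel, edges, rest_len) / mass
    vel = vel + dt * acc
    return pos + dt * vel, vel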


Figure 1.2: Automatic spatial tactile effects generation framework by analyzing cross-modality features in a video.

While there are many different types of content which can be displayed on tactile displays, an increasingly important area is that of enhancing audio-visual content with tactile effects. To address the challenges in content authoring for such media, we developed an automatic spatial tactile effects generation framework that analyzes cross-modality features in a video. We chose an automatic generation framework since the high labor cost required by manual authoring methods is one of the key costs of deploying such tactile displays in this context. The novelty of our method centers on using cross-modality features in a video, rather than only visual or audio signals as proposed by prior work in this area [74, 86]. This enables our framework to create spatial tactile effects with better spatial-temporal synchronization with the video content and to provide a more immersive experience for users. As Fig. 1.2 shows, our framework first extracts the diegetic audio (sounds from objects that are visible or can be implied from the scene) from the audio content by analyzing the cross-modality information. Non-diegetic audio signals (sounds from objects or people that are off-screen) are discarded, since translating non-diegetic audio into spatial tactile effects would be distracting. Then we carry out sound source localization to obtain the spatial distribution of the tactile effects. The diegetic audio signals are translated into the intensities of the tactile effects. The spatial mapping of the tactile stimuli is calculated by combining the spatial distribution and the intensities of the tactile effects. While our technical contribution focuses on proposing a framework that uses both visual and audio features to generate spatial tactile effects, some modules of the framework are adopted from prior work [104, 134]. We built a vibrotactile display to demonstrate the spatial tactile mapping generated by our framework. A user study was conducted to evaluate videos accompanied by the tactile effects generated by our framework, compared to plain videos without tactile effects and to videos with tactile stimuli generated by a method based on visual-only features. Together, these contributions address a number of key challenges in the creation of low-cost tactile displays.
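As a concrete illustration of the final combination step, the sketch below fuses per-frame sound source localization heatmaps with the diegetic audio envelope to produce a spatial tactile mapping for a small actuator grid. This is a hypothetical sketch of the idea, not the thesis's implementation; the function name, grid size, and pooling scheme are assumptions.

import numpy as np

def spatial_tactile_mapping(heatmaps, audio, sr, fps, grid=(4, 3)):
    # heatmaps: (T, H, W) per-frame sound source localization maps in [0, 1];
    # audio: mono diegetic audio samples (float array); sr: audio sample rate;
    # fps: video frame rate. H and W are assumed divisible by the grid size.
    T, H, W = heatmaps.shape
    gh, gw = grid
    # Intensity: RMS envelope of the diegetic audio, one value per video frame.
    hop = int(sr / fps)
    env = np.array([np.sqrt(np.mean(audio[t * hop:(t + 1) * hop] ** 2))
                    for t in range(T)])
    env = env / (env.max() + 1e-9)
    # Spatial distribution: average-pool each heatmap down to the actuator grid.
    pooled = heatmaps.reshape(T, gh, H // gh, gw, W // gw).mean(axis=(2, 4))
    pooled = pooled / (pooled.max(axis=(1, 2), keepdims=True) + 1e-9)
    # Spatial tactile mapping: distribution scaled by per-frame intensity.
    return pooled * env[:, None, None]

Each frame of the returned array can then drive one actuator per grid cell, so the vibration follows the on-screen sound source in space while its strength tracks the diegetic audio in time.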

1.3 Prior Work

In this section we review prior research on 2D tactile displays, 2.5D tactile shape displays, and tactile content authoring methods.

1.3.1 2D Tactile Displays

Tactile Pin Arrays

2D tactile pin arrays consist of an array of tactors (sometimes referred to as taxels) that can be controllably raised and lowered. Digital information including characters, symbols, signals, and figures can be presented with a 2D tactile pin array. Unlike 2.5D tactile pin arrays, 2D tactile pin arrays usually have a very small stroke for each tactor; thus, they are mainly used as graphic displays or refreshable Braille displays. Since the 1970s, piezoelectric and electromagnetic actuators have been used in 2D tactile pin arrays [8]. However, electromagnetic actuators often have a complicated mechanical structure that requires complex mechanical assembly of each individual tactor, and piezoelectric actuators require large piezoelectric bimorph bars to drive the tactors. Therefore, 2D tactile pin arrays based on piezoelectric and electromagnetic actuators are usually bulky and expensive, which hinders the adoption of commercial refreshable Braille displays.

Alternative 2D tactile display actuation technologies have been developed to overcome the above challenges using shape memory alloys (SMAs) [45], electromagnetic actuators [100], and electrorheological (ER) fluids [133]. The unique characteristics of SMAs enable a large actuation force; however, bandwidth is limited due to the slow ramp-down of the material temperature, and these devices have high power consumption when run continuously. ER fluids can stiffen under a high voltage (e.g. 10 kV) and have been used to display information through changes in stiffness [133]. Although this approach has a simpler and more flexible design than piezoelectric actuators, the high voltages required in such systems may raise additional safety concerns, and there are limitations on the spatial density of such ER displays. In recent years, the development of microelectromechanical system (MEMS) technology has enabled the fabrication of 2D tactile pin arrays with higher resolution. Higher productivity and a higher level of system integration are also achieved by MEMS technology due to its compatibility with the semiconductor manufacturing process. Various 2D tactile pin arrays have been fabricated with MEMS technology, including electroactive polymer membrane actuators [157], phase-change microactuators [44], and integrable electrostatic microvalves [150]. However, these displays often have limited force output.

Vibrotactile Array

Vibrotactile arrays are a class of tactile displays that use vibration, as opposed to displacement, to convey information. Human vibrotaction, the sensory response to vibration, is limited to specific frequencies, and people have a lower spatial perceptual acuity for vibration than for other tactile cues. However, vibrotactile feedback is the most widely used form of haptic feedback due to the low cost of vibration motors. Vibrotactile arrays are often built by embedding multiple vibration actuators in a flexible soft cover to provide local vibrotactile sensations to a user. To avoid interference between different vibrators, the connection between the vibrators and the cover should be lightweight [25].

Vibrotactile arrays are usually mounted against the user's skin with gentle pressure for better contact. Vibration actuators used in vibrotactile arrays include linear electromagnetic actuators (e.g. the C2 tactor by Engineering Acoustics, Inc.), rotary electromagnetic actuators, and nonelectromagnetic actuators (e.g. piezoelectric vibrators). Applications of vibrotactile arrays include reproducing properties of physical objects, delivering information, and enhancing multimedia experiences. To reproduce the roughness of a texture, Kyung et al. [81] built a piezoelectric vibrotactile array. Allerkamp et al. [2] developed a rendering strategy based on a vibrotactile array to present textiles. In terms of delivering abstract information, vibrational braille displays have been demonstrated by various research works [142, 145]. Vibrotactile arrays also show encouraging potential to provide helpful information in conjunction with visual and audio signals. Kim et al. [70] demonstrated the possibility of delivering driving safety information, such as collision warnings and directional cues, to the driver through a vibrotactile array. Vibrotactile arrays have also been used to convey physiological information to attending clinicians in an operating room [97]. Vibrotactile arrays have been built into gloves [75], blankets [30], and jackets [89] to provide a richer and more immersive multimedia viewing experience. For example, the motion of a fast race car or the punches between two fighting characters in an action movie can be better expressed by proper vibrotactile stimuli. Tactile cues can be added as first-person sensations, third-person sensations, or background effects [75].

Electrotactile Array

Electrotactile arrays directly stimulate the mechanoreceptors in the user's skin with electrical current to provide tactile sensations such as pressure or vibration. By avoiding the use of mechanical actuators, electrotactile arrays have the advantages of low cost, low power consumption, and better potential for miniaturization with MEMS technology. The design space of electrotactile stimulation, including electrode type, body location, electrode area, waveform, frequency, pulse width, sensation current, and sensation charge, was thoroughly investigated in [65, 132]. Studies [64] also show that the polarity of the electrical pulses can influence the sensation threshold, perceived intensity, and perceived quality of the generated tactile stimuli. While promising, these electrotactile displays often stimulate only specific mechanoreceptors, limiting their ability to realistically display tactile information. To address this problem, Kajimoto et al. [68] developed methods to stimulate each type of mechanoreceptor, using an analogy to the “primary colors” of red, green, and blue. While these displays have many benefits, they often come at the expense of user comfort, as the stimuli can sometimes be perceived as painful. Applications of electrotactile arrays include braille displays [132], vision substitution [67], and augmented haptic displays [69]. Strong et al. [132] characterized small-pattern discriminability to understand the possibility of rendering a braille picture for blind users with an electrotactile array.

To help blind people perceive their surroundings, Kajimoto et al. [67] capture the outline of the view in front of the user with a camera and translate the signals into electrotactile stimulation of 512 electrodes on the forehead. Besides accessibility use cases, electrotactile arrays can also be used to augment the modalities of the user. Kajimoto et al. [69] built SmartTouch, an electric skin that conveys the “untouchable” surface information of a physical object to the user by electrotactile stimulation.

1.3.2 2.5D Tactile Shape Display

Pin Display

2.5D tactile pin displays are bed-of-nails style devices that utilize an array of pins which can be individually raised and lowered to approximate a 2.5D shape. They serve as tangible user interfaces that provide spatially continuous haptic feedback. Compared with the formable-crust displays discussed in Sec. 1.3.2, it is relatively straightforward to render a 2.5D shape by setting the position of each of the pins, which can be directly encoded using a depth map. However, the shape rendering dynamic range of 2.5D tactile pin displays is limited by the stroke of the pins. Since most state-of-the-art 2.5D tactile pin displays require an array of actuators to drive the pins, their high cost and bulky structure limit the widespread use of such devices. Brake-based 2.5D tactile pin displays [108, 16] have been proposed to overcome these limitations, although at the expense of refresh rate. Actuated and brake-based 2.5D tactile shape displays are discussed further in Sec. 1.3.2. Applications of 2.5D tactile pin displays include remote palpation [54], volumetric data rendering [87], and providing physical affordances and constraints [37]. For example, to address the challenge of locating hidden arteries beneath opaque tissue, [54, 79] proposed a teleoperation system that utilizes a tactile array sensor pressed against the patient's tissue to detect local features and a 2.5D tactile pin array to display the processed signals to the surgeon. Sublimate [87] augments the user's perception of digital information with an actuated 2.5D tactile pin display: when the user edits a chart rendered by an augmented reality (AR) display, the 2.5D tactile pin display deforms to match the change. 2.5D tactile pin displays can also provide dynamic physical affordances and constraints [37]. For example, physical buttons, lines, surfaces, and handles can be created from multiple pins for user interaction. Since the pins can move up and down very quickly, the affordances provided by the 2.5D tactile pin display can dynamically change to reflect the current state of the system.
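Because a 2.5D pin-display shape can be encoded directly as a depth map, rendering reduces to rasterizing that map onto the pin grid. The sketch below illustrates this under assumed parameters; the grid size, stroke, and positioning step are hypothetical, not a specific device's specifications.

import numpy as np

def depth_to_pin_heights(depth, pins=(4, 2), stroke_mm=40.0, step_mm=1.0):
    # depth: (H, W) depth map; pins: pin grid size; stroke_mm: pin travel range;
    # step_mm: positioning resolution. All parameter values are illustrative.
    h, w = depth.shape
    ph, pw = pins
    cropped = depth[:h - h % ph, :w - w % pw]        # crop to a divisible size
    pooled = cropped.reshape(ph, (h - h % ph) // ph,
                             pw, (w - w % pw) // pw).mean(axis=(1, 3))
    norm = (pooled - pooled.min()) / (np.ptp(pooled) + 1e-9)  # map to [0, 1]
    heights = norm * stroke_mm                       # scale to the pin stroke
    return np.round(heights / step_mm) * step_mm     # quantize to the step size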

Formable-Crust

Formable-crust type shape displays, as introduced in [10], are grid-like structures that can form a variety of different shapes through the propagation of their forward kinematics. Specifically, an array of joints, such as spherical joints or prismatic joints, are connected with rigid or hinged linkages [10, 116] to create the formable crust. Formable-crust type displays are a relatively unexplored research area compared with the number of works investigating 2.5D tactile pin displays.

One advantage of formable crusts over 2.5D tactile pin displays is their ability to form various shapes, thanks to the greater freedom between their elements and the larger range of motion of their structures. To theoretically evaluate the shape rendering performance of formable-crust displays, [10] developed the Surface Freedom Measure (SFM) and the Freedom from Ground Measure (FGM). Compared with FEELEX [57], a representative 2.5D tactile pin display, the various formable-crust designs proposed by [10] showed high scores for SFM and FGM. However, in terms of Data Resolution (DR), another measure evaluating the density of information that can be presented by the device, FEELEX outperformed the formable crusts. Manufacturability of the formable crust also remains a challenge, since mass fabrication of the joints and links at a small scale is difficult [116]. Austina N. Nguyen [98] conducted a thorough study of formable crusts in terms of design, manufacturing, and control. Several different unit cell designs were explored, including an eight-sided cell, a spherical joint cell, and a linear triangle cell. Manufacturability, scalability, and deformability are the main attributes when selecting a unit cell for the formable crust. To create a formable-crust system, unit cells need to be connected with each other in a matrix; various matrices, such as grid and hexagon matrices, have also been proposed by [98]. Manufacturing is a key factor in building a formable crust. Microelectromechanical systems (MEMS) technologies such as thermal press molding and injection molding, laser chemical vapor deposition (LCVD), and stereolithography (SLA) are possible manufacturing techniques [98]. Controlling a complicated system with many joints and links is also a challenging task. [98] built models to describe the kinematics of actuated formable crusts and the deformation of the rendered shape under an external force. However, many challenges remain in this class of shape display technology.

Actuation Technologies for 2.5D Tactile Shape Displays

To display a shape, actuators can be used to actively drive the taxels of a 2.5D tactile shape display. Alternatively, brakes can passively lock or engage desired parts of the 2.5D tactile shape display while a single global actuator moves up and down to render the target shape. Although most actuator-based shape displays have the advantages of real-time interaction and fast response compared with brakes, they usually have more complicated mechanical structures and higher cost. In this section, we review the characteristics of various active and brake-based actuation technologies for 2.5D tactile shape displays.

Active: Most active 2.5D tactile pin displays are driven by DC motors [57, 138]. FEELEX by Iwata et al. [57] used servo motors with a speed of 250 mm/sec. To achieve a high resolution of 8 mm, a piston-crank mechanism was utilized in the FEELEX system, with the bulky actuators placed on the side of the tactile display. Shape memory alloy (SMA) wires have also been used as compact and noiseless actuators for 2.5D tactile pin displays [111, 54]. Lumen [111] by Poupyrev et al. aims at developing an aesthetically pleasing and calm display, for which the smooth and continuous movement of SMAs is desirable.

The response time of SMA wires is limited by the slow cooling of the material. To address this problem, Howe et al. [54] added forced-air cooling to reduce the thermal response time, and a lever was used for each pin to amplify the displacement of the SMA wires. Sensing has also been incorporated into 2.5D tactile pin displays through unique actuators. Relief by Leithinger et al. [88] utilized electric slide potentiometers as the actuators, which can sense the positions of the pins for closed-loop control of the rendered shape. Contact force between the user's palm and the 2.5D tactile pin display is another important parameter during interaction; inFORCE by Nakagaki et al. [95] detects the contact force by measuring the current induced in the motors. On the other hand, active actuation of formable crusts remains a very challenging problem due to the structural complexity of these devices [155]. Researchers have proposed many possible configurations for the actuation [10, 98, 116]. The proposed formable crusts are mainly actuated by hydraulic actuators [98, 116], which are faster, more reliable, and more controllable than pneumatic actuators [155]. Compared with electrical actuators, they are also more compact and lightweight [155]. The fluidically driven actuator is designed to control the joint angle between two adjacent elements of the formable crust so that a shape can be rendered [117]. Although there are many proposed actuators for formable crusts, implemented actuation of formable crusts is still at an early prototype stage [98]. Nguyen et al. [98] used many 1-DoF mechanical levers to drive unit cells of the formable crusts; however, the system needs to be actuated manually.

Brakes: As an alternative to actuator-per-pin tactile pin displays, a brake-per-pin paradigm can significantly reduce the form factor as well as the cost of the device, thanks to the simpler structure and higher force density of the brakes. The disadvantage of brake-based tactile pin displays is that they are not interactive: the entire tactile pin display needs to be reset for one pin to change its position, and this often takes more than one second. To render a shape, brake-based tactile pin displays utilize a global actuator to raise and lower all the pins together while individual brakes are engaged along the way to set the pins at their desired positions. Compared with active tactile pin displays, brake-based designs are less explored [16, 108]. Carlberg [16] developed a mechanical clutch that utilizes a tilted washer to lock the pin; if the electromagnet in the clutch is activated, the washer is forced into a substantially level position, which frees the pin. This clutch design shows the potential for brake-based tactile pin displays; however, the manufacturing and assembly of the clutch are still complex, which makes the system expensive. Peters [108] built fusible alloy clutches toward a brake-based tactile pin display. When the fusible alloy solder cools down, the brake engages to hold the pin in place, and the pin is able to move freely when the solder is heated above the melting temperature. This design can provide a large contact force, but the refresh rate is limited by the time required to heat and cool the fusible alloy solder, and there are challenges in power and thermal management.
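The brake-per-pin rendering procedure just described can be summarized in a short sketch. The `platform` and `brakes` driver objects below are hypothetical placeholders, not a real API; a real system would add settling delays and, as in Chapter 2, may engage a global mechanical clutch once the surface is set.

def render_shape(target_heights, platform, brakes, step_mm=1.0, top_mm=40.0):
    # target_heights: {pin_id: desired height in mm}; platform and brakes are
    # assumed hardware driver objects exposing move_to()/engage()/release().
    platform.move_to(0.0)                  # reset: all pins rest on the platform
    for pin in brakes.values():
        pin.release()
    z = 0.0
    while z <= top_mm:
        platform.move_to(z)                # raise every unbraked pin together
        for pin_id, h in target_heights.items():
            if abs(h - z) < step_mm / 2:   # platform has reached this target
                brakes[pin_id].engage()    # clamp the pin at its height
        z += step_mm
    platform.move_to(0.0)                  # lower the platform; braked pins stay

This single-sweep structure is what makes the refresh rate the cost of the approach: every pin must wait for the global actuator to pass its target height before the surface is complete.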

In terms of formable crusts, as far as we know there is no existing research on brake mechanisms. As mentioned in [10], formable crusts are better than bed-of-nails devices at providing a larger degree of freedom for shape rendering. While brake-based formable crusts would have simpler structures that reduce cost, using a global actuator in the system instead of driving each unit cell with an individual actuator can also limit shape rendering flexibility.

1.3.3 Tactile Content Authoring Methods

Tactile displays are essentially a platform to deliver information to users through tactile cues. Tactile content can be explicitly created by an author [66, 57]. FEELEX by Iwata et al. [57] utilized the motion of the pins of a 6 × 6 2.5D tactile pin display to express the anger and struggling of the Cambrian-era animal Anomalocaris; sixteen patterns of pin motion were manually prepared by the author. To facilitate manual authoring of tactile content, researchers have developed various graphical user interface (GUI) design tools [75, 113]. Kim et al. [75] designed a GUI tool to help haptic designers manually add vibrotactile signals to enhance the video watching experience for users. Designers can draw lines with a tactile brush on top of each frame extracted from a video to represent the required tactile stimuli; to add temporally varying tactile effects, the design tool can also overlap two adjacent frames to show the motion of objects.

Manual authoring can be very laborious if a large amount of tactile content needs to be created. To address this problem, most tactile content is automatically generated from digital information such as text [9], 3D models [88, 116], databases [95], inertial sensors (e.g. accelerometers or gyrometers) [140], tactile sensors [54], audio signals [86], or visual signals [74, 69]. The conversion from text to braille code is a highly explored field. Blenkhorn developed a table-driven method [9] that can be configured to translate a wide range of different languages into braille code; the algorithm was designed to help braille specialists who are not computer science experts create and modify braille codes. 3D models can easily be automatically converted to tactile content that can be rendered by a 2.5D tactile pin display. Relief by Leithinger et al. [88] used the open-source software Processing to build 3D models for geographical exploration applications; the position of each pin of the 2.5D tactile pin display is controlled by an Arduino program to simulate the 3D model. Researchers have also explored remotely rendering tactile stimuli on a 2.5D tactile pin display for teleoperation tasks like palpation [54]. The tactile cues were collected by a probe containing a tactile array sensor in contact with the patient's tissue, and signal processing algorithms were developed to translate the raw data from the tactile array sensor to the shape rendered on the 2.5D tactile pin display. To generate tactile stimuli for a more realistic multimedia experience, the most comprehensive approach is to capture the exact forces and motion by placing appropriate sensors directly in the scene. For instance, piezoelectric sensors and accelerometers have been used to monitor contact forces [106] and motion profiles [12]. However, capturing these physical sensor data can dramatically increase the setup time and the overall cost, as additional technicians and equipment are required.

In addition, most videos available on video streaming platforms do not have the necessary force or motion sensor data for haptic rendering. Tactile content can also be automatically translated from visual features to enhance the video watching experience. Kim et al. [74] utilized visual saliency, the visual features that attract the user's attention, to build spatial and temporal maps which can be further converted to spatial tactile maps through multiple thresholding and downsampling steps. However, this method can often generate contextually inapt tactile effects due to the lack of audio features. For example, a strong tactile stimulus is generated whenever a large object moves across the scene, even when such a stimulus is not appropriate. Thus, the method is still limited to processing relatively simple scenes. Audio signals have also been used to generate tactile effects by analyzing their frequency characteristics [20], vibrotactile scores (a metaphor of musical scores) [85], or perception-level intensity [86]. However, simply translating an audio signal to a haptic signal without understanding its semantic correspondence with the visual content could lead to inadequate haptic effects. For example, directly translating an off-screen narrator's voice to a tactile stimulus can be distracting to the audience.

1.4 Dissertation Overview

This chapter discussed the motivation for hardware design and content authoring towards low-cost spatial haptics. Chapter 2 describes the design and analysis of electrostatic adhesive brakes, which were used to build a refreshable 2.5D tactile shape display; a user study was conducted to evaluate the performance of this display. Chapter 3 introduces a 2.5D shape display based on electrically programmable auxetic materials, for which a simulation model and a preliminary prototype system were built. Chapter 4 explores an automatic spatial tactile content authoring method that analyzes cross-modality features in a video; a user study was carried out to compare our method with plain videos and a saliency-driven generation method from prior work.

Chapter 2

Electrostatic Adhesive Brakes for 2.5D Bed-of-nails Tactile Shape Displays

Tactile displays enable users to perceive small-scale surface and shape characteristics with their fingers and hands. Unlike audiovisual feedback, tactile output affords richer physical interactions and leverages the innate dexterity and spatial acuity of our hands and fingers. As described in the previous chapter, tactile displays have been explored in applications for improving information accessibility for visually impaired people [24, 109, 136], telepresence [21, 110, 138, 7], and human-computer interaction (HCI) [57, 103]. Most commonly these displays are achieved by actuating an array of tactors, or pins, which can be raised or lowered to create rasterized shapes and surfaces. Among tactile displays, 2.5D tactile displays distinguish themselves by enabling larger pin displacements, allowing them to render larger-scale 2.5D approximations of an object's global shape, similar to a relief sculpture. An advantage of large 2.5D tactile arrays is that they afford whole-hand interaction, which is important for gross shape perception, especially in the context of tactile graphics for blind and visually impaired people. And while other researchers have achieved promising results with a variety of techniques, including creating illusions of shape perception using a simple tilting platform to display the first-order information of a curved surface [143, 33], these approaches cannot support such interaction. Tactile displays can usually be categorized into static refreshable displays and dynamic displays [136]. Static refreshable displays, while unable to provide interactive haptic feedback at rates commonly found in traditional haptic interfaces, have many potential uses, especially in the context of tactile spatial graphics for people who are blind or visually impaired, where it may take significant time to fully explore a shape. An ideal static refreshable 2.5D tactile shape display should


possess high spatial resolution for shape recognition (≈2-3 mm [124]), a sufficient refresh rate (≈10 seconds [136]), and support contact forces of at least 50 gf generated by the user's finger during feature exploration [127]. Cost is another important consideration: if manufacturing costs (e.g., raw materials, assembly, and testing) for a large-scale (100 × 100) high-resolution shape display can be reduced to less than $0.10 USD per pin, it will be available at a price comparable to other consumer electronics devices (e.g., a smartphone or laptop). The device should also be lightweight and enable large pin displacements to support a wide variety of tactile rendering scenarios. Various actuation techniques have been explored for dynamic and static 2.5D tactile displays including linear actuators [125, 138, 57, 37], electromagnetic (EM) actuation [151], shape-memory alloys (SMA) [111, 96], hydraulic actuation [156], microelectromechanical systems (MEMS) [99] and piezoelectric actuation [80, 58] (more details are provided in the prior chapter; see [136] or [7] for a comprehensive review). In this chapter, we present our work on a new static refreshable 2.5D tactile shape display based on electrostatic adhesive brakes. We detail the design, modeling, and fabrication of an individual electrostatic adhesive brake mechanism. We demonstrate two levels of high spatial resolution enabled by this mechanism, fabricating separate rows with 4 mm and 1.7 mm inter-pin spacing, respectively. We further characterize our electrostatic adhesive brake in a series of experiments measuring its maximum sustainable contact force, robustness, refresh rate, and residual force. Based on the individual mechanism, we demonstrate an electrostatic adhesive brake-based tactile shape display prototype with a 4 × 2 array of pins. We reduce the display's raw material cost to $0.11 USD per pin by using a transistor-based solid-state brake (0.09 USD for 57.8 mm² of PVDF-based dielectric film, 0.02 USD for an ON MMBTA42LT1G transistor in quantities of 10,000, and 0.004 USD for sputtered metal electrodes). To further increase the maximum contact force sustainable by each pin during haptic exploration, we develop and evaluate a simple, global compliant mechanical clutch to hold the pins in place once initially positioned via electrostatic adhesion. Finally, we present the results of a user study evaluating shape recognition using our tactile shape display prototype compared to static 2.5D tactile graphics as a control.
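As a quick sanity check on the cost target above, the raw-material figures quoted for our prototype can be totaled and scaled in a few lines. The following back-of-the-envelope sketch in Python uses only the prices listed above:

```python
# Per-pin raw material costs quoted above (USD).
film = 0.09         # 57.8 mm^2 of PVDF-based dielectric film
transistor = 0.02   # ON MMBTA42LT1G at quantities of 10,000
electrodes = 0.004  # sputtered metal electrodes

per_pin = film + transistor + electrodes
print(f"raw materials per pin: ${per_pin:.3f}")         # $0.114, i.e., ~$0.11/pin

# Scaling to the large-scale (100 x 100) display considered above.
print(f"100x100 display: ${per_pin * 100 * 100:,.0f}")  # ~$1,140 in raw materials
```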

• © 2018 IEEE. Reprinted with permission, from Zhang, Kai, et al. "Electrostatic Adhesive Brakes for High Spatial Resolution Refreshable 2.5D Tactile Shape Displays." IEEE Haptics Symposium, 2018 [153].

• © 2019 IEEE. Reprinted with permission, from Zhang, Kai, et al. "Design and Analysis of High-Resolution Electrostatic Adhesive Brakes Towards Static Refreshable 2.5D Tactile Shape Display." IEEE Transactions on Haptics, 2019 [154].

2.1 Design of an Individual Electrostatic Adhesive Brake

2.1.1 Background

Electrostatic adhesion, first reported by Johnsen & Rahbek in 1923 [60], is a technique commonly used in industry for semiconductor wafer chucking [5] and chemical vapor deposition [19]. More recently, there has been increased interest in electrostatic adhesion for robotics applications such as wall climbing robots [76] and exoskeletal actuation [31]. In general, electrostatic adhesive forces are generated between a substrate under prehension and an electroadhesive pad made of conductive interdigitated electrodes deposited on the surface of a dielectric material. When engaged, the alternating charges on adjacent electrodes create electric fields that induce opposite charges on the substrate material, resulting in coulombic attraction between the pad and the substrate. Electrostatic adhesion has several advantages over other clutching techniques. Electrostatic adhesive mechanisms can easily be made in small form factors, at low cost, and with very low weight, since the patterned electrodes and dielectric material are very thin. Furthermore, they are solid-state and can be made using traditional electronics fabrication methods (e.g., rollable printing), which ensures minimal cost. When the electrodes are active, the current flowing in the system is very small (e.g., 60 µA), yielding low power consumption. Additionally, a variety of substrate materials can be used with electrostatic adhesion, both conductive and non-conductive; however, conductive materials such as metals generate larger attractive forces. Specifically, in our design we use electrostatic adhesion to clutch 1.6 mm square brass pins to custom interdigitated electrode pads on a dielectric film fabricated using gold sputtering and laser ablation, as shown in Fig. 2.1.

2.1.2 Modeling

We provide a theoretical model to understand the design space of our electroadhesive (EA) brake. The electroadhesive force exerted by the electrodes is modeled using a simplified 2D representation of the Maxwell stress tensor method [15, 22]. Neglecting the effects of magnetism, the electrostatic

Maxwell stress tensor $T_{ij}$ is defined in component form as:

$$T_{ij} = \epsilon \left( E_i E_j - \frac{1}{2}\,\delta_{ij}\,\|\mathbf{E}\|^2 \right) \qquad (2.1)$$

where $\epsilon$ is the dielectric permittivity, $\delta_{ij}$ is the Kronecker delta, $\mathbf{E}$ is the electric field in the dielectric layer, and $E_i$ and $E_j$ are its $i$th and $j$th components, respectively. The electric field can be readily calculated from the electric potential $\Phi$ as:

$$\mathbf{E} = -\nabla\Phi \qquad (2.2)$$

where $\Phi$ must also satisfy the Laplace equation $\nabla^2\Phi = 0$. We focus our analysis on a single period of the interdigital electrode structure (i.e., $s + w$ in Fig. 2.1).

Figure 2.1: Diagram (a) and cross-section (b) of an individual electrostatic adhesive brake. Electrostatic attraction is generated between the interdigital electrodes and the brass pin when a voltage differential is applied.

The electroadhesive normal force $f_{EA,N}$ per unit width of pin can then be calculated as:

$$f_{EA,N} = \oint_S T_{zz}\, dS = \frac{1}{2}\,\epsilon_0 \int_0^{w+s} \left[ E_y^2(x, y, t) - E_x^2(x, y, t) \right] dx \qquad (2.3)$$

where $\epsilon_0$ is the permittivity of air, $w$ is the electrode width, $s$ is the gap width between two electrodes, and $E_x$ and $E_y$ are the spatio-temporally varying electric field components. Note that this model assumes a uniform electric field in the z-direction, and edge effects of the electrode array are neglected. Also note that the time-varying characteristics of the electric field are considered here to account for its dynamic response to a step voltage applied to the electrode. When a step voltage is applied, the EA force between the pad and substrate increases over time until saturation (when $F_{EA,N}$ is maximum). This dynamic polarization further creates a time-varying air

gap $d_a(t)$ between the EA pad and pin.
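To make the model concrete, the force-per-width integral in Eq. (2.3) can be evaluated numerically once the field components over one electrode period are known. The sketch below is a minimal illustration only, assuming the sampled fields come from a separate electrostatics (Laplace) solver; the placeholder field values are hypothetical, not measured:

```python
import numpy as np

EPS0 = 8.854e-12  # permittivity of free space / air, F/m

def f_ea_n(E_x, E_y, x):
    """Electroadhesive normal force per unit pin width, Eq. (2.3):
    (1/2) * eps0 * integral over one period of (Ey^2 - Ex^2) dx.
    E_x, E_y are field samples (V/m) at positions x (m) spanning one
    electrode period of length w + s."""
    integrand = E_y**2 - E_x**2
    # Trapezoidal rule over the sampled period.
    return 0.5 * EPS0 * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))

# Hypothetical example: one period with w = 500 um electrodes, s = 300 um gap.
x = np.linspace(0.0, 800e-6, 401)
E_y = 1.0e7 * np.ones_like(x)  # placeholder normal field component, V/m
E_x = 2.0e6 * np.ones_like(x)  # placeholder tangential field component, V/m
print(f_ea_n(E_x, E_y, x), "N per metre of pin width")
```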

The total EA normal force $F_{EA,N}$ across $n$ electrode periods exerted on a pin of width $l$ is subsequently found by:

$$F_{EA,N} = n\, l\, f_{EA,N} = \frac{1}{2}\, n\, \epsilon_0\, l \int_0^{w+s} \left[ E_y^2(x, y, t) - E_x^2(x, y, t) \right] dx \qquad (2.4)$$

If the pin is not grounded (as in our case) and dielectric breakdown effects are considered, following [15] we then have:

$$F_{EA,N} = \frac{1}{2}\, n\, \epsilon_0\, l \left[ \epsilon_r - 1 \right] \bar{C}(\tilde{w}, \tilde{d}) \left( E_{BD}^{\mathrm{air}} \right)^2, \qquad \text{where } \tilde{w} \equiv \frac{w}{w+s}, \quad \tilde{d} \equiv \frac{2(d + d_a)}{w+s} \qquad (2.5)$$

where $\epsilon_r$ is the relative permittivity of the dielectric film, $E_{BD}^{\mathrm{air}}$ is the breakdown electric strength of air, and $\bar{C}(\tilde{w}, \tilde{d})$ is a dimensionless function of geometric parameters comprising the electrode width $w$, the interelectrode spacing $s$, the air gap between the dielectric film and the pin $d_a$, and the thickness of the dielectric film $d$. As detailed in [15], the larger the applied voltage and permittivity of the dielectric film, and the smaller the interelectrode spacing, air gap, and film thickness, the greater the obtainable EA normal force. In tactile display applications, the brake is expected to experience non-negligible tangential forces. The maximum contact force supported by an engaged pin is determined by the maximum shear force

$F_{EA,S}$ supported by the engaged EA brake, which can be expressed as:

$$F_{EA,S} = \mu \left( F_{EA,N} + F_{Suc} + F_{Van} + F_{Res} \cos\theta \right) + F_{Res} \sin\theta \qquad (2.6)$$

where $\mu$ is the coefficient of static friction (typically > 1 for most EA surfaces), $F_{Suc}$ is the suction force between the EA pad and pin due to negative pressure, $F_{Van}$ is the van der Waals force between the EA pad and pin ($F_{Suc}$ and $F_{Van}$ are typically negligible, however), $F_{Res}$ is the restriction force occurring due to the shear displacement of the EA pad, and $\theta$ is the angle between $F_{EA,N}$ and $F_{Res}$.

Since $F_{EA,N}$ is proportional to both pin width and length, increasing these parameters is one way to further increase $F_{EA,S}$. The theoretical formulation above provides reasonable guidelines for interdigital electrode design in tactile applications. However, there are some trade-offs between these design parameters that prevent us from arbitrarily increasing the maximum allowable contact force ($\approx F_{EA,S}$). Most prominently, although increasing the pin's width or length increases $F_{EA,N}$ (and thus $F_{EA,S}$), it leads to a larger form factor and lower spatial resolution of the tactile display. Furthermore, higher voltages may be at odds with safety considerations as users come into direct contact with the system.
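In practice, Eq. (2.6) is often dominated by the friction term, since $F_{Suc}$ and $F_{Van}$ are typically negligible. A small helper like the following sketch makes the design trade-off explicit; note that the default μ = 1.2 is an assumed value for illustration (the text only states that μ is typically greater than 1 for most EA surfaces):

```python
import math

def max_shear_force(F_ea_n, mu=1.2, F_suc=0.0, F_van=0.0, F_res=0.0, theta=0.0):
    """Maximum shear (contact) force held by an engaged brake, Eq. (2.6).
    All forces share one unit (e.g., gf); theta is in radians. F_suc and
    F_van are typically negligible and default to zero."""
    return mu * (F_ea_n + F_suc + F_van + F_res * math.cos(theta)) + F_res * math.sin(theta)

# With the secondary terms dropped, F_EA,S ~ mu * F_EA,N: holding a 50 gf
# fingertip load with an assumed mu = 1.2 requires ~42 gf of normal force.
print(max_shear_force(F_ea_n=42.0))  # ~50 gf
```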

Figure 2.2: Top view of an individual electrostatic adhesive brake. The dielectric film is attached at the sides between the frame and two PCBs used to send signals to the electrodes. Overhang of the pin tensions the film and reduces gaps. (Modified from [153], Fig. 4, © 2018 IEEE.)

Other considerations include the increased cost with higher dielectric constant materials, reduced material strength with reduction in film thickness d, and increased residual force in the disengaged brake with larger friction coefficient µ. In our design evaluation of the electrostatic adhesive brake, we test multiple combinations of these parameters to find a good balance between contact force, cost, spatial resolution, and residual force.

2.1.3 Implementation

Figure 2.1(a) shows a diagram of an individual electroadhesive brake. In general, an individual brake consists of:

• a 1.6 mm square brass pin (140 mm length)

• a Delrin frame with a 1.90 mm wide by 1.21 mm deep rectangular groove to accept the pin and serve as the primary mechanical structure

• a dielectric film (PolyK Technologies PVDF-TrFE-CFE, $\epsilon_r = 50$) with interdigital electrodes deposited through gold sputtering and laser ablation

• 2 custom side-mounted printed circuit boards (PCBs) that route signals between the interdigital electrodes and a main control board

While not shown in Fig. 2.1, a linear actuator (Actuonix L12-50-50-6-P) moves a 3D-printed

Figure 2.3: Circuit schematic of an individual electrostatic adhesive brake. A high voltage transistor controls the voltage applied to the electrodes (Vcc), with a series resistor R1 used to limit current. The control signal Vin is sent via microcontroller. (Modified from [153], Fig. 7, © 2018 IEEE.)

staging platform below the pins to position them prior to engaging the electroadhesive brake. Breakout pads deposited on the dielectric film connect to pads on the PCBs through conductive tape. The trace width of the interdigital electrodes was selected as 500 µm, following results from [148]. The PCBs are 0.5 mm thick to minimize the row-to-row pitch of the tactile display. To control brake engagement, the circuit shown in Fig. 2.3 is used. A high voltage transistor (STMicroelectronics STN0214) switches one electrode between a high voltage (engaged) and ground (disengaged); the complementary electrode is fixed at ground potential. A high voltage DC-DC converter (EMCO AG05p-5) shifts applied voltages from 1.4-2 V to 250-335 V. A resistor (R1 in Fig. 2.3) is used to limit current in the system. We explore both large (5 MΩ) and small (2.7 kΩ) values for R1. The larger value restricts current below 60 µA when the brake is off, lowering the power consumption (at 300 V) of an individual brake to 18 mW and improving safety. The smaller R1 value allows a larger charging current, enabling faster brake engagement, as measured in Section 2.1.6. To maximize contact between the pin and dielectric film, the groove in the Delrin frame is made 0.39 mm shallower than the pin. This slight protrusion, as shown in Fig. 2.2, tensions the film against the pin, minimizing air gaps in the interface and thus increasing the electroadhesive force generated. To demonstrate the potential for the mechanism's miniaturization, we also fabricate a row of electroadhesive brakes with 0.8 mm brass pins and 1.7 mm pitch, shown in Fig. 2.4. In the following sections, we detail a series of experimental evaluations used to characterize the effects of altering design parameters (e.g., voltage, electrode spacing, etc.) on the brake's performance (e.g., electroadhesive force generation, engagement time, etc.). All evaluations were performed using the 1.6 mm pin setup.

Figure 2.4: A high-resolution row of electrostatic adhesive brakes, with a pitch of 1.7 mm and 0.8 mm pins. Inset shows interdigital electrodes fabricated via laser ablation on an 8 µm dielectric film. (From [153], Fig. 1(A), © 2018 IEEE.)

2.1.4 Evaluation: Quasi-static Loading

Contact with the user's fingertip is the core element of interaction with any tactile display. Thus, it is important for a tactile display to support the loads applied by the fingertip as the user explores the display surface. In this section, we ran three experiments to evaluate the effects of interelectrode spacing, dielectric film thickness, and brake resolution on the brake's maximum sustainable contact force before pin slippage. These parameters were chosen to encompass the most important parameters in our theoretical model (Eq. 2.5). The measurements help us better understand the ability of our electroadhesive brake to provide large sustainable contact forces, and its design space.

Apparatus & Procedure

The setup of this experiment is shown in Fig. 2.5(a). We simulate the scenario of a fingertip touching the pin using a rate-controlled (2.6 mm/s) linear actuator (Actuonix L12-50-50-6-P) mounted with a force sensor (Honeywell FSG005, 9.8 mN sensitivity). To limit the effects of gravity, the experiment is carried out horizontally, with a single pin lying in a groove of the support frame. As the end-effector of the actuator encounters the pin, the magnitude of the measured contact force increases. As the actuator continues to push on the pin, the measured force increases until the pin begins to slip relative to the dielectric film and the force drops – we take the maximum force sensor reading


Figure 2.5: (a) Experimental setup for contact force measurement. A linear actuator slowly engages the clutched pin and force is measured until brake failure. (b) Maximum contact force of an electroadhesive brake as a function of interdigital electrode spacing; 150 µm, 300 µm, and 500 µm gaps were investigated. (c) Maximum contact force of an electroadhesive brake as a function of dielectric film thickness; 8 µm and 24 µm thicknesses were investigated. (d) Maximum contact force of an electroadhesive brake as a function of brake resolution; a higher resolution brake (0.8 mm pin width, 1.7 mm pitch) and a lower resolution brake (1.6 mm pin width, 4 mm pitch) were investigated. Three trials were performed for each condition, with standard deviation shown.

prior to slipping as the maximum contact force $F_{C,max}$ sustainable by the pin.

We ran three experiments: (1) comparing the effect of interelectrode spacing (150 µm, 300 µm, 500 µm) on $F_{C,max}$, (2) comparing the effect of dielectric film thickness (8 µm, 24 µm) on $F_{C,max}$, and (3) comparing $F_{C,max}$ for a high resolution brake (0.8 mm pin width, 1.7 mm pitch) to that of a lower resolution brake (1.6 mm pin width, 4 mm pitch). All experiments were conducted across a range of voltages (50-340 V) with a fixed electrode length of 60.3 mm.

Results

The results of these experiments are shown in Fig. 2.5(b), (c) and (d), respectively. Smaller interelectrode gaps and larger voltages yielded larger $F_{C,max}$ values. The thinner dielectric film (8 µm) yielded larger $F_{C,max}$ for voltages below 200 V, while the thicker film (24 µm) yielded larger $F_{C,max}$ at higher voltages (a 60% increase at 300 V). The higher resolution brake (0.8 mm pin width, 1.7 mm pitch) supports substantial shear contact force (41 gf at 294 V, 52 gf at 335 V) considering its small form factor. For comparison, the friction force on the pin when the brake was not engaged was measured to be 0.7 gf.

Discussion & Parameter Selection

As we can observe from Fig. 2.5, our measurement results are consistent with the qualitative predictions of our theoretical model in Sec. 2.1.2: the larger the applied voltage, the larger the pin width, and the smaller the interelectrode spacing and the film thickness (for voltages below 200 V), the larger the sustainable contact force. We discuss each measurement in more detail in the following paragraphs. As shown in Fig. 2.5(b), higher contact force is observed for smaller interelectrode spacing, which is consistent with the theoretical analysis in [15]. However, we cannot further reduce the interelectrode spacing to increase contact force because, during preliminary testing, it was found that spacings smaller than 100 µm produced sparks between the electrodes when 300 V was applied, which can damage the dielectric film and be hazardous to users. Although the 150 µm interelectrode spacing yields the largest contact force in our measurements, we selected an interelectrode spacing of 300 µm in our tactile display design to ensure an adequate safety factor. We also see that the rate at which contact force grows with respect to voltage decreases as the applied voltage increases. Given the results of these experiments, we chose to use a voltage of 300 V for our system, as it supports sufficiently high contact loads while minimizing the risk of sparking, charge accumulation issues, and higher cost components. As shown in Fig. 2.5(c), we also measured the maximum contact force for different dielectric film thicknesses. While a thicker film (24 µm) supported larger contact forces in the 300 V range, for cost and assembly considerations we selected a dielectric film thickness of 8 µm.

Figure 2.6: Comparison of the maximum contact force provided by DC and AC voltages applied to the interdigital electrodes. All measurements are carried out under a voltage of 194.4 V, using 60.3 mm length interdigital electrodes, a 1.6 mm width brass pin, and 8 µm thickness dielectric film.

Although an individual higher resolution brake with 0.8 mm pin width provides a smaller contact force, as shown in Fig. 2.5(d), it allows a higher density of pins within a fixed contact area under the user's finger, making it very promising for a high resolution tactile shape display (1.7 mm pitch with 0.8 mm pin width).

2.1.5 Evaluation: AC vs. DC Voltage

It has been shown that using AC voltage can reduce residual attractive forces in electrostatic adhesion applications by reducing charge accumulation [148]. Here we examine the effect of voltage type and frequency on the maximum contact force sustained by the brake. Using the same experimental setup as detailed in Section 2.1.4, we measure the maximum contact force for multiple voltage frequencies. We use a 50% duty cycle bipolar square wave with a peak voltage of 194.4 V, an interdigital electrode length of 60.3 mm, and a 1.6 mm width brass pin. Results are shown in Fig. 2.6, and indicate that contact force decreases with increasing AC voltage frequency. In all cases, DC voltage yielded the largest contact force. For this reason, we select DC voltage operation for our electrostatic adhesive brake.

Figure 2.7: Brake engagement time measurement. All measurements are carried out using 60.3 mm length interdigital electrodes, a 1.6 mm width brass pin, and 8 µm thickness dielectric film. (a) Engagement time as a function of applied voltage. (b) Engagement time as a function of pin velocity; the voltage applied was 294 V.

2.1.6 Evaluation: Brake Engagement Time

Modeling

The engagement time of our electrostatic adhesive brake depends on multiple factors, including the electrical charging characteristics of the brake as well as the mechanical deformation of the film as it comes into close contact with the pin. The first we discuss is the charging characteristics of the brake, which we model as a simple capacitor. The schematic in Fig. 2.14(a) shows that when the brake is engaged, the system can be approximated as a simple RC circuit, with current flowing through a pull-up resistor to charge the interdigital electrodes. Thus, the charging time can be approximated given the RC time constant of the circuit. We estimate the capacitance of the brake using the following equation:

$$C = \epsilon_r \epsilon_0 \frac{A\eta}{d} = \epsilon_r \epsilon_0 \frac{w\, h_{contact}\, \eta}{4d} \qquad (2.7)$$

where $\eta$ is the percentage of non-conductive area on the electrode pad, $w$ is the pin width, and $h_{contact}$ is the length of pin in contact with the electrodes. In our system, we have $\eta$ = 62.5%, $w$ = 1.6 mm and $h_{contact}$ = 60.3 mm. This corresponds to a capacitance of 0.82 nF. We use a 2.7 kΩ resistor in series to charge the electrodes; thus, the RC time constant is 2.2 µs. While this value estimates the time to electrically charge the brake, mechanical properties such as the air gap at the pin-film interface and wrinkles in the film also play a very important role. Since these properties can vary with time and are thus difficult to model, we perform an experimental investigation to quantify the total brake engagement time for our mechanism.
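The numbers above can be reproduced directly from Eq. (2.7); the short Python sketch below plugs in the stated parameters and confirms the quoted capacitance and RC time constant (up to rounding):

```python
EPS0 = 8.854e-12    # permittivity of free space, F/m

eps_r = 50          # relative permittivity of the PVDF-TrFE-CFE film
d = 8e-6            # dielectric film thickness, m
w = 1.6e-3          # pin width, m
h_contact = 60.3e-3 # pin length in contact with the electrodes, m
eta = 0.625         # fraction of non-conductive area on the electrode pad

# Eq. (2.7): the engaged brake approximated as a parallel-plate capacitor.
C = eps_r * EPS0 * (w * h_contact * eta) / (4 * d)
print(f"C   = {C * 1e9:.2f} nF")      # ~0.83 nF (0.82 nF quoted above)

R = 2.7e3  # series charging resistor, ohms
print(f"tau = {R * C * 1e6:.1f} us")  # ~2.2-2.3 us RC time constant
```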

Apparatus & Procedure

The experiment was conducted using a vertically mounted single-pin electroadhesive brake. The pin was modified with a light conductive element mounted to the bottom, which bridged two small conductive plates on the actuator platform when the pin was resting on it. With the brake disengaged, the pin was driven down by the actuated platform at a constant speed. The brake was then engaged, and the time difference between switching the voltage on (t1) and the pin separating from the downward-traveling platform (t2) was taken as the brake engagement time. t1 was captured in software after sending the brake command, and t2 was found by measuring the electrical connectivity between the actuator plates. Since the actuator speed used to raise and lower the pins is potentially a large factor in determining the refresh rate of a brake-based tactile display, we also characterize brake engagement time for various actuator speeds as well as voltages.

Results & Discussion

Figure 2.7(a) shows the effect of the applied voltage on the engagement time of the electrostatic adhesive brake. The pin was lowered at 12.1 mm/s in this experiment. Intuitively, larger voltages lead to larger attractive forces and thus faster brake engagement. For 335 V, the largest voltage tested, we observed an engagement time of 6.7 ms. We also note that for smaller voltages (below 200 V) the measured engagement time was much less consistent, as indicated by the large standard deviation bars. Moreover, the experimentally determined engagement time (6.7 ms at 335 V) is significantly larger than the calculated electrical time constant (2.2 µs), indicating that unmodeled mechanical factors play a dominant role in dictating the engagement time of the brake. Figure 2.7(b) shows brake engagement time measured as a function of actuation velocity. As the results show, brake engagement did not vary significantly across a velocity range of 11 to 31 mm/s when measured at a voltage of 294 V (all roughly 7 ms). By using higher actuation speeds, the refresh rate of an overall display can be increased.

2.1.7 Evaluation: Residual Force

When refreshing the tactile shape display, the voltage to the electrodes is switched off and the electrostatic attraction force between the pin and film dissipates, allowing the pins to detach. Ideally, this process is instantaneous and no force acts on the pin once the brake is off. In practice, however, some residual electrostatic force remains even after the electrodes are off. Since each pin is very light (1 gf), these residual forces may have a significant effect on system performance (e.g., creating unwanted impedance when resetting the display). Thus, we characterized the residual force in an individual brake for a range of applied voltages, as shown in Fig. 2.8. The experimental setup was identical to that in Section 2.1.4. Residual force was measured as

the average resistance experienced by the actuator driving the pin at 2.6 mm/s after the electrodes were turned off. Interdigital electrodes with a total length of 60.3 mm and a 1.6 mm square pin were used in this experiment. The friction force between the pin and Delrin frame alone was previously measured to be 0.7 gf; this is subtracted from all measurements to obtain the contribution from residual force alone. The average electrostatic residual force measured across all tested voltages is 0.6 gf. Thus, the total resistive force in the brake after switching off the voltage is 1.3 gf. Furthermore, higher residual force is observed for higher applied voltages.

Figure 2.8: Maximum contact force and residual force observed for different applied voltages. Maximum contact force is measured with the electrodes turned on, while residual force is measured just after the electrodes are switched off. All measurements are carried out using 60.3 mm length interdigital electrodes, a 1.6 mm width brass pin, and 8 µm thickness dielectric film.

2.1.8 Evaluation: Robustness

Static Loading Robustness

In the context of a tactile shape display, it is important that our electrostatic adhesive brake mechanism repeatedly and reliably (1) attaches and supports the pins (1 gf) when engaged and (2) handles the loads imparted by fingers repeatedly making contact. We conducted two experiments to evaluate these conditions. In both experiments, the brake setup was mounted vertically. In the first experiment, a pin is repeatedly attached to the film via engagement of the brake and reset using the global actuation platform. An electrically grounded pad on the actuation platform is used to reduce charge accumulation. The goal of this experiment is to observe the number of cycles achieved before failure (i.e., unsuccessful pin attachment or failure to detach due to residual adhesion build-up).

Table 2.1: Static Loading Robustness Test

    Experiment   Load    Cycles Until Failure
    1*           1 gf    4740
    2**          13 gf   over 1021
                 18 gf   over 1021
                 23 gf   80

*Failure considered unsuccessful attachment of pin to film, or failure to detach from film when brake disengaged. Carried out with a voltage of 294 V. Weight of pin alone is 1 gf.
**Failure considered pin detachment from film and/or significant slippage. Carried out with a voltage of 318 V.

We observed failures of pin detachment after 4740 cycles of successful loading and unloading of the pin (see Table 2.1, Experiment 1). This is due to residual adhesion increasing over the course of the test, which can be mitigated by resting the device [32]. No failure of pin attachment was observed during the test. The experiment was carried out with a voltage of 294 V. The second experiment evaluated the repeated loading/unloading of an attached pin. Three calibration weights were tested: 13 gf, 18 gf, and 23 gf. In this experiment, a calibration weight was hung from the bottom of the pin with fishing line and the brake was engaged. The actuation platform was raised and lowered cyclically to repeatedly load and unload the hanging weight from the pin. In the unloaded state, the weight was supported by the platform; in the loaded state, the weight hung freely from the pin. We report the number of cycles observed before failure (i.e., pin detachment). The experiment was carried out with a voltage of 318 V. The results of the two experiments are compiled in Table 2.1. We observed no failure in the repeated loading of the 13 gf and 18 gf calibration weights after 1021 cycles. In the 23 gf case, we observed failure after 80 cycles. These results indicate the electroadhesive brake can reliably function and sustain expected contact loads over time.

Maximum Contact Force Repeatability

To further characterize the robustness of our electrostatic adhesive brake, we also measure the consistency of the maximum contact force observed over multiple trials. The experimental setup is identical to that in Section 2.1.4; here the procedure is repeated for a fixed voltage and changes in the maximum contact force are observed over 10 trials. The brass pin is discharged using conductive fabric after each trial to prevent charge accumulation. Interdigital electrodes with a total length of 60.3 mm and a 1.6 mm square pin were used in this experiment. The results of this experiment are shown in Fig. 2.9. Although the maximum contact force shows some variation over the 10 trials (STD 3.89 gf for 250 V, STD 6.81 gf for 294 V), the observed minimums of 40.9 gf (250 V) and 45.2 gf (294 V) indicate the brake would still support contact forces expected during haptic exploration.

Figure 2.9: Repeatability of the maximum contact force sustainable by an electroadhesive brake, evaluated over ten trials with electrostatic discharging of the pin after each trial. All measurements are carried out using 60.3 mm length interdigital electrodes, a 1.6 mm width brass pin, and 8 µm thickness dielectric film.

Charge Accumulation & Contact Force Degradation

Charge accumulation is a known challenge in electroadhesive force applications that can cause performance issues. When a user's fingertip contacts the tactile display with significant force, relative movement between the pin and electrodes will induce a charge buildup in the brass pin. To assess the effect of this charge accumulation on the maximum contact force sustainable by our electroadhesive brake, we perform an experiment identical to that in the preceding section, except we did not discharge the pin between trials to remove accumulated charge. The results of this experiment are shown in Fig. 2.10. We see that contact force tends to degrade with repeated trials, decreasing by approximately 10% after 50 trials of loading until failure. As will be discussed in Section 2.2.2, a mechanical clutch is developed to increase the contact force sustainable by each pin after it is initially attached using electrostatic attraction. Relative motion between pin and film is thereby further minimized, and the effect of charge accumulation is less severe.

2.2 Design of a 2.5D Tactile Display using Electrostatic Adhesive Brakes

2.2.1 System Workflow

In this section, we introduce the basic operating process of an electroadhesive brake-based refreshable tactile shape display. This workflow is illustrated by Fig. 2.11. Initially, all brakes are disengaged

and all pins are resting on the global actuation platform. The platform is first raised to its maximum height, and then lowered (Fig. 2.11.1). Driven by gravity, the pins follow the platform's movement downward. During this process, individual electroadhesive brakes engage to clutch each pin at its desired height (Fig. 2.11.2), leaving the desired 2.5D surface rendered (Fig. 2.11.3). To further increase the maximum contact loads supported by the rendered surface, a global mechanical clutch (described in Section 2.2.2) can engage all pins once the surface is rendered. To refresh the display, the mechanical clutch is disengaged, followed by all electroadhesive brakes. The platform is then raised to its highest position, physically disengaging any pins that may remain clutched due to residual electrostatic adhesion (Fig. 2.11.4). If rendering another surface, the process then begins again with a different set of desired pin positions.

Figure 2.10: Repeatability of the maximum contact force sustainable by an electroadhesive brake, evaluated over 50 trials without discharging the pin after each trial. We observe a slight decreasing trend in magnitude (red) due to charge accumulation in the brake. All measurements are carried out using 60.3 mm length interdigital electrodes, a voltage of 294 V, a 1.6 mm width brass pin, and 8 µm thickness dielectric film.
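The rendering workflow of Section 2.2.1 is straightforward to express as a control loop. The sketch below is illustrative only, assuming hypothetical `platform` and `brakes` hardware interfaces with the methods shown; because the platform descends, pins with the highest target heights are clutched first:

```python
def render_surface(target_heights, platform, brakes, step=0.1):
    """Render a 2.5D surface on a brake-per-pin display (cf. Fig. 2.11).
    target_heights: desired height per pin (mm); platform/brakes are
    hypothetical hardware wrappers, not the actual firmware API."""
    # 1. Reset: disengage all brakes and lift every pin to the top.
    for brake in brakes:
        brake.disengage()
    platform.move_to(platform.max_height)

    # 2. Descend; clutch each pin as the platform passes its target height.
    #    Pins are processed from highest target to lowest.
    pending = sorted(range(len(target_heights)),
                     key=lambda i: target_heights[i], reverse=True)
    height = platform.max_height
    while pending and height > platform.min_height:
        height -= step
        platform.move_to(height)
        while pending and target_heights[pending[0]] >= height:
            brakes[pending.pop(0)].engage()  # pin stays; platform keeps moving

    # 3. Optionally engage the global mechanical clutch for extra holding force.
```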

2.2.2 System Design and Implementation

To demonstrate the potential of electroadhesive braking in high resolution shape rendering applications, we designed a 4 × 2 tactile shape display prototype, shown in Figure 2.12.

Mechanical Assembly

A top view schematic of the display is illustrated in Fig. 2.13. The main structure is composed of four layered Delrin sheets (1.98 mm thick), each with two pin grooves (1.90 mm wide, 1.21 mm deep, 4 mm spacing) milled using a desktop CNC (Bantam Tools Desktop PCB Milling Machine).

Figure 2.11: Rendering workflow of an electroadhesive brake-based refreshable tactile display. An actuated platform positions the pins, which are then clutched via electroadhesive braking. When the surface is complete, a global mechanical clutch can be engaged to further increase the holding force on the pins. Red indicates brake/clutch engagement.

Square brass pins (1.6 mm) in each row are constrained between the grooves and a dielectric film (8 µm, PolyK Technologies PVDF-TrFE-CFE). Two pairs of interdigital electrodes deposited on each dielectric film through gold sputtering and laser ablation enable individual braking of each pin. The dielectric film is tensioned and fixed to the sides of the Delrin layer via adhesive. A custom PCB lines each side of the layer, routing signals from a master control board to breakout pads on the dielectric film via a thin layer of conductive tape. The two PCBs, dielectric film, milled Delrin sheet, and two pins make up a single row of the display. Rows are then stacked with 3.2 mm spacing to create a 4 × 2 pin array.

Figure 2.12: Prototype tactile shape display. A single actuated platform moves pins into position for braking. PCBs in each row route signals between the electrodes and transistors on a separate board. (From [153], Fig. 8, © 2018 IEEE.)

Figure 2.13: Top view of the tactile display prototype. Pitch size is 4 mm within-row and 3.2 mm between-row. Milled Delrin sheets are used to constrain the pins, and side-mounted PCBs connect the interdigital electrodes to a main control board.

Figure 2.14: System diagram of our tactile shape display. The CPU forwards desired pin positions to a microcontroller over USB Serial, which then sets the display accordingly by driving an actuated platform and applying the electroadhesive brakes at the proper heights.

Electronics & Control

Signals to the interdigital electrodes are governed by a control board consisting of a microcontroller (PJRC Teensy 3.6), 9 high voltage transistors (STMicroelectronics STN0214), a high voltage DC-DC converter to shift voltages from 1.4-2 V to 250-335 V, and a motor driver (TI DRV8833) for controlling a global actuation platform. The control circuit for an individual electrostatic adhesive brake is shown in Fig. 2.3. A 5 MΩ current-limiting resistor is used in each brake to keep current below 60 µA for safety considerations. An overall system diagram is shown in Fig. 2.14.
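On the host side, the CPU in Fig. 2.14 only needs to forward the eight desired pin heights to the microcontroller over USB serial. A minimal sender might look like the following sketch (using the pyserial package; the one-byte-per-pin framing, baud rate, and start/end markers are illustrative assumptions, not the actual firmware protocol):

```python
import serial  # pyserial

def send_pin_heights(port, heights_mm, max_height_mm=50.0):
    """Quantize each pin height to one byte and send an assumed framed packet."""
    frame = bytes(min(255, int(255 * h / max_height_mm)) for h in heights_mm)
    with serial.Serial(port, baudrate=115200, timeout=1) as ser:
        ser.write(b"\x02" + frame + b"\x03")  # assumed start/end markers

# 4 x 2 display: one target height (mm) per pin.
send_pin_heights("/dev/ttyACM0", [10.0, 25.5, 40.0, 5.0, 0.0, 50.0, 33.3, 12.0])
```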

Actuation

To actively raise and lower pins within the display, a global actuation platform is used. A linear actuator (Actuonix L12-50-50-6-P) is fitted with a 3D printed platform beneath the pins and drives them to their desired heights prior to the brakes engaging.

2.2.3 Mechanical Clutch

With the setup as described, the entire 4 × 2 tactile array can support approximately 300-400 gf of contact force when operating at 300 V, assuming all pins are contacted equally. The main limiting factor for this contact force stems from the small form factor of the pins, which is necessary for high spatial resolution. To address this trade-off and enable our display to support larger forces, we developed a global mechanical clutch to hold all pins in place after positioning via the actuation platform and electroadhesive braking. Shown in Fig. 2.15(a), the clutch is designed with two main elements: a 3D printed structural frame and a series of compliant silicone rubber strips (McMaster-Carr 3788T21) which engage the pins. A second linear actuator (Actuonix L12-50-50-6-P) is used to retract the mechanical clutch such that the rubber strips engage the pins and provide significant

additional friction and holding force. To prevent asymmetric loading on the pins, two identical mechanically coupled clutches are used on the top and bottom of the pin array.

Figure 2.15: (a) Diagram of the mechanical clutch design. When engaged, rubber strips clutch all pins in place and allow the surface to sustain higher contact forces. (b) Assembled mechanical clutch and tactile display. Two linked clutches are used (top and bottom) to avoid asymmetric loading.

Evaluation of Mechanical Clutch

We experimentally evaluated the additional contact force sustained by our mechanical clutch. We measured the maximum contact force supported by all 8 pins engaged at once for different mechanical clutch displacements. In this evaluation, only the mechanical clutch was engaged; no electrostatic adhesion was used. The results of this experiment are shown in Fig. 2.16. A linear fit of the measured data yields a slope of 52.5 gf/mm, an intercept of -12.7 gf, and an R² value of 0.9852, indicating that our clutch can be modeled reasonably well as a simple spring. That is, the maximum contact force sustained by the clutch can be estimated as linearly proportional to its displacement. From these results, we observe that the mechanical clutch yields a maximum contact force of 211 gf when engaged with 4 mm displacement. The addition of this force on top of that supplied by the electrostatic adhesive brakes helps ensure the tactile display can support contact loads from the fingertip during haptic exploration (51 gf of contact force is sufficient for small feature exploration according to [127]), and especially helps combat the degradation of electroadhesive force due to charge accumulation that is present with the electrostatic adhesive brakes alone.
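Because the clutch behaves like a simple spring, the reported fit gives a one-line predictor for holding force as a function of engagement displacement. A sketch using the fitted coefficients above:

```python
def clutch_force_gf(displacement_mm, k=52.5, b=-12.7):
    """Max contact force (gf) across all 8 pins from the linear fit
    F = k*x + b (R^2 = 0.985); clamped at zero for small displacements."""
    return max(0.0, k * displacement_mm + b)

print(clutch_force_gf(4.0))  # ~197 gf from the fit; 211 gf was measured at 4 mm
```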

Figure 2.16: Measured maximum contact force sustained by the mechanically-clutched pins and the corresponding linear fit of the measured data.

2.2.4 System Refresh Rate Analysis

Considering the rendering workflow detailed in Section 2.2.1, if a refreshable tactile display has n pins, then there are at most n different pin heights in a given rendering. As experimentally characterized in Section 2.1.6, the engagement time of a single electrostatic adhesive brake is approximately 6.7 ms when using 335 V. Thus, the display would require at least 6.7 × n ms to engage all pins, not considering travel time between pin heights. In the case of our prototype display, this value is 6.7 ms × 8 pins = 53.6 ms. As the experimental results in Section 2.1.6 showed, the electrostatic adhesive brakes can be engaged without the actuation platform coming to a stop (11-31 mm/s tested). This means that, to render any surface, the actuation platform simply needs to travel from its uppermost to its lowest position, with the pins being electrostatically clutched individually at their appropriate heights (see Fig. 2.11); hence, the speed of the actuation platform is the dominant factor determining the overall refresh rate of the display. Considering a dynamic height range of 50 mm for a given display, a platform linearly actuated at 30 mm/s could render a surface in 1.67 s. To refresh the display, the platform must move up another 50 mm, corresponding to another 1.67 s. Thus, the total rendering time for any arbitrary surface is 3.33 s, yielding a refresh rate of 0.3 Hz. A faster linear actuator and a reduced dynamic range would further increase the refresh rate of the display at the cost of rendering accuracy and fidelity (e.g., a 2 mm dynamic height range with 50 mm/s actuator speed would increase the refresh rate to 12.5 Hz). According to [136], for applications involving static refreshable tactile displays users typically take a few minutes to explore the entire display, so a refresh time of around 10 s is sufficient (note that our refresh rate does not scale with the number of pins). From our

refresh rate analysis, our system can readily satisfy the requirements of a static refreshable display.
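The refresh-rate argument above reduces to a simple formula: one full downward render pass plus one upward reset pass of the platform, with per-pin brake engagement time negligible by comparison. The sketch below reproduces the figures quoted in this section:

```python
def refresh_rate_hz(height_range_mm, platform_speed_mm_s):
    """Refresh rate of a brake-based display: one render pass down plus one
    reset pass up; brake engagement time is neglected, since it is small
    and does not scale with the number of pins."""
    one_pass = height_range_mm / platform_speed_mm_s
    return 1.0 / (2.0 * one_pass)

print(refresh_rate_hz(50, 30))  # ~0.3 Hz for the prototype parameters
print(refresh_rate_hz(2, 50))   # 12.5 Hz with reduced range, faster actuator
```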

2.3 User Study

We carried out a user study to evaluate our prototype display's effectiveness in representing high resolution 2.5D shapes. As described in [136], static refreshable tactile displays can substitute for nonrefreshable haptic graphics in education, various professional activities, and entertainment applications. Our goal was to determine whether users of our static refreshable tactile display can achieve shape recognition performance similar to that with nonrefreshable haptic graphics (i.e., 3D printed patterns). We also aimed to verify that the display is capable of handling the contact forces imparted during haptic exploration. Rather than exploring new areas of haptic perception, our study was mainly carried out to validate the robustness and performance of our EA brake-based tactile display. A total of four shape patterns were tested. A set of static 3D printed shapes with spatial resolution identical to our display was used as a control for comparison. The shape patterns, display renderings, and 3D printed models used are shown in Fig. 2.17(a).

2.3.1 Participants

We recruited 6 participants aged 20-26 (M = 23; 1 female; five right-handed, one left-handed) for this experiment. Participants were compensated at a rate of 15 USD per hour, and the experiment generally lasted less than an hour.

2.3.2 Materials

Our 4 × 2 prototype shape display was used in this experiment, mounted on a desktop. 3D printed proxy shapes were mounted onto the display during the appropriate conditions. A visual barrier prevented users from seeing the display or their hand during interaction. Images of the four shape patterns (Fig. 2.17(a), first row) were displayed on a board visible to the user at all times.

2.3.3 Procedure

At the start of each study, a short training session was carried out to familiarize users with the sensation of tactile shape exploration using the device and the 3D printed shapes. During each condition, the four shapes were presented to the participant a total of 20 times in a randomized order. Thus, each participant experienced 20 repetitions × 2 conditions = 40 trials. In each trial, depending on the condition, the designated shape was either rendered using the display or the experimenter physically mounted the corresponding 3D printed shape on the display. The participant was then instructed to feel the shape and give their best guess as to which of the visible shapes they were touching. Participants were given a short break between the two conditions. To counteract

any learning effects, half of the studies were performed with the tactile display condition first and the other half with the 3D printed proxies condition first, with all participants performing both conditions.

2.3.4 Results & Discussion

The results of the study are shown in Figure 2.17(b). We observe 95.83% shape recognition accuracy (SE 2.39%) for the tactile display condition, and 96.67% shape recognition accuracy (SE 1.05%) for the 3D printed shape condition. No significant difference was found between shape recognition in the tactile display and 3D printed cases (p = 0.7412); this is intuitive, as both the display and the 3D printed shapes have identical spatial resolution. The primary difference between the two cases is in the amount of contact force they can sustain without deformation. The 3D printed shapes are made of PLA, a generally durable and hard material – thus, they can sustain considerable loading and did not deform at all during the study. Conversely, our tactile display can support roughly 75 gf per pin, considering the clutching force from both the electroadhesive brake (50 gf at 294 V) and the mechanical clutch (25 gf with 4 mm displacement). However, the comparable shape recognition accuracy between the two suggests that the tactile display was able to retain its rendered shape without significant deformation during haptic exploration. The larger standard deviation for shape recognition in the tactile display case highlights the impact of individual differences in haptic exploration practices (e.g., how much contact force is applied) and how they may be relevant when designing tactile displays. In both conditions, participants did not achieve 100% accuracy, likely because they were feeling physically rasterized versions of the shapes they saw (as illustrated by Fig. 2.17(a)), which are easier to confuse. Although temporary brake failure due to excessive user contact force could jeopardize tactile display performance, we only observed a small (0.8%) accuracy difference between the rendered shapes and the 3D printed shapes, and no other obvious pin displacements were found. The participants' response times were also comparable in both conditions (p = 0.4891), as shown in Fig. 2.17(b). As mentioned in Sec. 2.3, the purpose of this study was to verify the robustness and performance of our EA brake-based device as a functional tactile display. Recognition accuracy and response times comparable to those for the 3D printed shapes, as shown in Fig. 2.17, demonstrate our device's ability to repeatably render patterns correctly.

2.4 Limitations & Future Work

The most salient limitation of the proposed electroadhesive brake is the magnitude of contact force it can support. As shown by Fig. 2.5, an individual brake can support ≈50 gf at 300 V. As our prototype tactile shape display has 8 pins, it can ideally support ≈400 gf if the load is equally distributed between pins. We demonstrated a moderate improvement to this load capacity with the

Figure 2.17: (a) Shapes rendered in the user study. Top: images shown to the user during the study. Middle: corresponding 2.5D shapes rendered by our tactile display. Bottom: 3D printed shapes used as a control. (b) Results of the study comparing shape recognition rates and response times between the tactile display and 3D printed conditions. Error bars depict standard error.

addition of a global mechanical clutch, which provides another 200 gf of contact load support; thus, if contact is equally distributed, the prototype display can support ≈600 gf, which is more than sufficient for small feature exploration according to [127]. In practice, any occasional pin slippage can also be corrected by refreshing the display. Future work should investigate pin displacement during haptic exploration to confirm the usable resolution in the x-axis of the pin display. Our proposed brake mechanism also has significant voltage requirements (≈300 V). However, since current requirements are very low (less than 60 µA/pin), the power consumption of the system is considerably small (≈18 mW/pin). Furthermore, insulating caps are placed on each pin to ensure user safety when interacting with the system. The primary limitation of any brake-based tactile display is that shape rendering cannot be dynamic; to change the height of a single pin, the entire display must be refreshed. While there are many applications in which a refreshable display is sufficient, such as tactile information displays for the visually impaired and passive haptic feedback in virtual reality, the ideal tactile shape display would support dynamic rendering, allowing rendered contact to react to user interaction (e.g., pressing a rendered button) and further decreasing rendering time. Our current design relies on gravity to set and reset pins. This design reduces the structural complexity of our system but also limits the system refresh rate, since a pin's acceleration during the refreshing process is bounded by gravitational acceleration. The other limitation is that the device must be oriented such that gravity is aligned with the travel of the pins. However, these limitations may be overcome in future work by developing dynamic control for individual pins in an electroadhesive brake-based display, potentially through the use of multiple sets of interdigital electrodes per pin. Another potential solution is the development of electrostatic linear motors for individual pin control, such as inchworm motors [149]. However, the low output force of these actuators may limit the system's capabilities as a tactile display. To meet the practical requirements of most real-life tactile applications, scaling the display up from the current 4 × 2 array (to, say, 100 × 100) is essential. Replacing manual assembly with a repeatable automated process would largely solve misalignment issues and improve functional consistency. With a larger display, however, the global mechanical clutch design may also need revision to prevent uneven distribution of the clutch force. One potential solution is to move from parallel rubber strips to a grid, providing interstitial anchor points between groups of pins. Implementation of both an automated assembly process and a grid-like global clutch design should be experimentally explored in future work. Although we characterized the maximum contact force supported by a high resolution version of our EA brake-based display in this thesis, we have not yet built a larger scale high-resolution tactile shape display. Since everything will be further miniaturized in such a design, our future work will investigate how to fabricate an EA brake structure with 100 µm accuracy. Nanofabrication methods (i.e., lithography) used in Micro-Electro-Mechanical Systems (MEMS) research will be introduced

to replace our current fabrication methods (i.e., laser cutting and milling). Another possible direction is integrating direct sensing of pin heights, which could be used to determine whether a pin has slipped or was misattached. One approach could be to engineer the system such that the contact area between the pin and electrodes is proportional to its height; pin height could then be sensed directly by measuring the capacitance of the electroadhesive brake.

2.5 Conclusion

This chapter presented the modeling, design, and characterization of a low cost, small form-factor, lightweight, low power electroadhesive brake. We evaluated the proposed brake's load capacity, engagement time, robustness, and residual force characteristics. To demonstrate the use of our electroadhesive brake in high resolution tactile shape display applications, we developed a 4 × 2 electroadhesive brake-based prototype display with 4 mm interpin spacing and 3.2 mm interrow spacing. We detailed the system workflow of our electroadhesive brake-based display and analyzed the overall refresh rate of the device. We also detailed the addition of a global mechanical clutch to further increase the display's contact load capacity. Lastly, we presented the results of a user study evaluating users' shape recognition accuracy when interacting with our tactile shape display. Through these investigations, we believe electrostatic adhesive brakes have demonstrated significant potential for improving the spatial resolution and lowering the cost of static refreshable 2.5D tactile shape displays. While in this chapter we mainly looked into bed-of-nails tactile shape displays, which rely on the movement of an array of pins to render a 2.5D shape, in the next chapter we will explore brake-based formable-crust tactile shape displays, which have their own unique advantages compared with bed-of-nails displays.

Electrically Programmable Auxetic Materials for 2.5D Formable-crust Tactile Shape Displays

In Chapter 2, we investigated bed-of-nails tactile shape displays, in which each pin serves as a pixel to collectively form a physical shape. Although bed-of-nails devices are straightforward to control by setting individual pins' positions, the family of shapes that can be rendered by a bed-of-nails shape display is limited by the stroke of the pins. Formable-crust shape displays, as described in [10, 117], are instead deformable matrices that act like a piece of cloth: with a repetitive structured pattern, the formable crust can produce structures through the propagation of forward kinematics. Although formable-crust displays have a lower Data Resolution (DR) than bed-of-nails shape displays, they have advantages in terms of the Surface Freedom Measure (SFM) and the Freedom from Ground Measure (FGM) [9], which indicates that the surface points have more degrees of freedom and the structure has a larger range of motion. In this chapter, we investigate formable-crust tactile shape displays.

Most state-of-the-art 2.5D shape displays are driven by an array of motors due to their wide commercial accessibility [57, 88, 37]. Researchers have also explored other actuation technologies for shape displays, including shape memory alloy [111], pneumatics [130], and magnetic fluid [139]. However, as the shape display scales, the number of actuators must scale with the number of pixels, which significantly increases the mechanical complexity as well as the cost of the system. Instead of actuating every pixel of the shape display in real time, a brake-based shape display uses a global actuator to set and reset the entire pattern; the brake on each pixel engages at a specific time during the shape rendering process to control its position. While brake-based shape displays are less interactive than actuated shape displays, they have the advantages of lower system complexity, smaller form factor, and lower cost thanks to the simplicity of brakes compared with actuators. Different brake mechanisms have been applied to shape displays, including mechanical brakes [16], fusible alloy brakes [108], and electrostatic brakes [153]. Based on the above advantages, we believe it is meaningful to investigate brake-based formable-crust shape displays.

Due to the unique characteristics of formable-crust shape displays, selecting proper construction materials is an essential step towards building such a 2.5D shape display. First introduced in 1987 [83], auxetic materials are of broad interest because of their novel characteristic of negative Poisson's ratio (ν). In contrast to ordinary materials, which become thinner when stretched, the spatial arrangement of auxetic materials' internal structure allows them to expand under stretching [1, 147]. Due to this feature, 2D materials constructed with a matrix of auxetic patterns can approximate conformal (doubly curved) surfaces. The first step in building an auxetic shape-changing interface is approximating a curved surface with the auxetic materials. Konaković et al. [77] introduced a design tool for this purpose. However, since the fabricated auxetic structures are a perfectly regular lattice, they can easily deform into an infinite family of other surfaces if there is no guiding structure for the deployment. In their work, the pattern also needs to be aligned carefully with the guiding structure and manually deformed onto the curved surface. To overcome these obstacles, a follow-up work [78] encodes the target shape into the auxetic pattern by using a spatially varying structure instead of a regular lattice. The target shape can also be deployed by inflation or gravity, which is much less laborious than [77]. Although the method proposed by [78] can approximate a target curved surface with an auxetic pattern, the pattern must be preprogrammed and fabricated, and it cannot deform in real time to serve as a shape-changing interface. To encode another target curved surface, new auxetic material must be fabricated. Developing reconfigurable auxetic materials for real-time shape-changing applications thus remains a challenge in the field.

In this chapter, we propose to investigate brake-based formable-crust 2.5D shape displays enabled by auxetic materials. A shape is rendered by locking a specific set of faces of the auxetic materials in two different ways while inflating the overall structure with a thin membrane; the programmed auxetic material constrains the inflated shape. To investigate the characteristics of the auxetic 2.5D shape display, a simulation model was constructed using software with a physics engine that approximates the system as a particle system. The meaningful design parameters of the 2.5D auxetic shape display and the metrics to evaluate the rendered shapes are defined in Sec. 3.1.2. We swept the design parameters of the 2.5D auxetic shape display in the simulation model and assessed its shape rendering abilities using the defined evaluation metrics. To render a target shape with the relatively complicated kinematics of the auxetic materials, we developed neural-network-based algorithms to control the faces of the auxetic materials through two different mechanisms. A passive auxetic 2.5D shape display prototype was built to demonstrate the simulation results in a real system. The required locking forces for the auxetic 2.5D shape display are also discussed in this chapter.

3.1 Modeling of Auxetic 2.5D Tactile Shape Display

3.1.1 Construction of Simulation Model

To investigate the design space of the auxetic 2.5D shape display, we need a way to simulate the system using a physics-based simulation together with the kinematic constraints provided by the structure. To achieve this, we built a simulation model adapted from [38] using the commercial software package Rhinoceros (abbreviated as Rhino [115]). Rhino is a 3D modeling tool that can create and analyze surfaces, point clouds, and meshes. Since auxetic materials are relatively complicated 3D structures, we used the Grasshopper [43] plugin for Rhino in our simulation. Grasshopper provides a graphical algorithm editor where we can use both built-in components and scripts to describe the position, size, local features, and orientation of the complicated 3D structures. To simulate the physics of the auxetic 2.5D shape display, we used the Grasshopper physics-engine add-on Kangaroo, which abstracts the complicated 3D structures as an aggregation of particles. A particle is modeled as a point with mass but no physical size, and Newtonian mechanics are applied to evaluate the position and velocity of each particle. A similar approach was used by Friedrich et al. [38].

The auxetic 2.5D shape display model we constructed is shown in Fig. 3.1. The auxetic material is made of equilateral triangles arranged in a lattice structure; each triangle is connected to three neighbouring triangles with kinematic linkages. We include a frame in the auxetic 2.5D shape display to clamp the auxetic material on the side. A membrane (Fig. 3.1(b)) can be inflated with air to further drive the auxetic material. The system is reset to a flat plane when air pressure is low (Fig. 3.1(a)). A 2.5D shape can be rendered when some triangles are anchored to the base plate while the rest of the auxetic material is free to move and the entire structure is inflated (Fig. 3.1(b)). The stretchability of the auxetic material originates from the deformation of the macrostructures rather than the constituent materials [77]. We can observe from our simulation model that the openings between the triangles are wider after inflation (Fig. 3.1(b)). At the same time, the triangles maintain their original size, which follows the nature of auxetic materials.

The Grasshopper plugin enables us to automatically sweep different parameters of the simulation model, including the number of hexagons, the ratio of the openings between neighbouring triangles in the initial state, the size of the shape display, the anchor points locked to the ground plate, and the inflation pressure in the membrane. Some of these parameters (e.g., the number of triangles and the opening area) are essential to the design of an auxetic 2.5D shape display; we investigate them in Sec. 3.1.2. We can also characterize the rendered shape with our simulation model. For a 2.5D shape display, metrics of shape rendering include the amplitude of the shape, the curvature of the shape, the spatial resolution of the device, and the stretchability of the structures [118].
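To make the particle abstraction described above concrete, the following minimal Python sketch (our illustration, not the Kangaroo solver; all numbers are arbitrary) treats vertices as point masses, rigid triangle edges as distance constraints, and inflation as a crude upward pressure force.

import numpy as np

# Minimal, hypothetical sketch of the particle abstraction used in the
# simulation: vertices become point masses, rigid triangle edges become
# distance constraints, and inflation is modeled as an upward pressure
# force. This illustrates the modeling idea only.
def step(pos, vel, edges, rest_len, anchored, pressure=20.0, dt=0.01, iters=20):
    """One step of semi-implicit Euler plus iterative constraint projection."""
    acc = np.zeros_like(pos)
    acc[:, 2] += pressure - 9.81       # crude inflation force minus gravity
    vel = 0.98 * (vel + dt * acc)      # light damping for stability
    new_pos = pos + dt * vel
    new_pos[anchored] = pos[anchored]  # anchored vertices stay on the plate
    for _ in range(iters):             # enforce rigid edge lengths
        for (i, j), L in zip(edges, rest_len):
            d = new_pos[j] - new_pos[i]
            n = np.linalg.norm(d)
            corr = 0.5 * (n - L) / n * d
            if not anchored[i]:
                new_pos[i] += corr
            if not anchored[j]:
                new_pos[j] -= corr
    return new_pos, (new_pos - pos) / dt

# Toy example: a single triangle with one anchored vertex pivots upward.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.87, 0.0]])
vel = np.zeros_like(pos)
edges = [(0, 1), (1, 2), (2, 0)]
rest = [np.linalg.norm(pos[j] - pos[i]) for i, j in edges]
anchored = np.array([True, False, False])
for _ in range(200):
    pos, vel = step(pos, vel, edges, rest, anchored)
print(pos.round(2))  # free vertices rise; edge lengths are preserved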


Figure 3.1: (a) The auxetic 2.5D shape display before inflation. (b) The inflated auxetic 2.5D shape display. Some of the triangle vertices are anchored in order to render the shape.

We will evaluate the correlation between these metrics and the design parameters in Sec. 3.1.2.

Controlling the auxetic material to render a target shape is fundamental to the functionality of the 2.5D auxetic shape display. Essentially, our ultimate goal is to determine the correlation between the anchored triangle vertices and the target shape. While our simulation can only perform the forward simulation, in which we set the anchored vertices of the auxetic material and the simulation model calculates the rendered shape, the inverse problem is both more complicated and computationally expensive. To address the issues of runtime computational cost and speed, we approach the inverse problem using a neural network (NN) to learn the relation between the target shape and the state of each cell, as described in Sec. 3.1.3. Although training the NN is expensive, it is cheap and fast to run the NN to predict the states of the cells for a given target shape. To collect the dataset needed to train the NN model, our Grasshopper simulation model can automatically run thousands of forward simulations and save the corresponding anchor point positions and the rendered 2.5D shapes. The trained NN model is able to predict the anchor points given the target shape.
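The data-collection step is then a simple loop around the forward simulator. In the hypothetical sketch below, run_forward_simulation is a stub standing in for the Grasshopper model, which in practice is driven from within Rhino; the cell count and grid size follow the prototype described later in this chapter.

import numpy as np

# Hypothetical sketch of the forward-simulation data collection loop.
# run_forward_simulation() stands in for the Grasshopper/Kangaroo model.
def run_forward_simulation(anchor_states):
    """Stub: returns the simulated 2.5D height map for the given anchors."""
    return np.random.rand(30, 30)  # placeholder for the real simulator output

rng = np.random.default_rng(0)
n_cells, n_samples = 143, 3000
inputs, targets = [], []
for _ in range(n_samples):
    anchors = rng.integers(0, 2, size=n_cells)   # random anchored/free pattern
    height_map = run_forward_simulation(anchors)
    inputs.append(height_map)                    # NN input: rendered shape
    targets.append(anchors)                      # NN label: cell states
np.savez("auxetic_dataset.npz", x=np.array(inputs), y=np.array(targets))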

3.1.2 Design Space of Auxetic 2.5D Tactile Shape Display

In this section we discuss the meaningful design parameters for building an auxetic 2.5D shape display. The metrics used to evaluate the rendered shape are also discussed. We use our simulation model to investigate how these design parameters determine the shape rendering performance of the auxetic 2.5D shape display.

Meaningful Parameters

As we mentioned in Sec. 3.1, the parameters that can change the rendered shape include the number of hexagons, the ratio of the openings in the auxetic material in the initial state, the size of the shape display, the anchor points locked to the ground plate, and the pressure in the membrane.


Figure 3.2: Meaningful parameters for the design of an auxetic 2.5D shape display. (a) Number of hexagons. The red area depicts an example of one hexagon. (b) Ratio of the openings in the auxetic material. The auxetic material on the left has a smaller ratio of the openings than the one on the right. The red region depicts an example of one opening area.

The number of hexagons (Fig. 3.2(a)) is an important design parameter since it determines the resolution and potentially the curvature of the rendered shape. The ratio of the openings is also a meaningful parameter since it decides the "stretchability" of the auxetic material [38], thus controlling the maximum amplitude of the rendered shape. Fig. 3.2(b) shows two pieces of auxetic material with different ratios of the openings; the red region indicates an example of one opening area.

We define the ratio of the openings as A_openings / A_total, where A_openings is the total opening area and A_total is the total area of the auxetic material. In Fig. 3.2(b), the auxetic material on the left has a smaller ratio of the openings than the auxetic material on the right. The size of the shape display can be represented by the number of hexagons, since the auxetic material can scale within a reasonable range. The anchor points and the pressure can be changed in real time, so they are not considered design parameters; instead, they are the free parameters used to create a target shape on any given shape display. Other possible parameters include the strength of the membrane and the strength of the linkages between the faces. Since our simulation model mainly focuses on the kinematic analysis of the auxetic shape display, a detailed investigation of the material properties remains future work for this project. Thus, the meaningful parameters we study in this section are the number of hexagons and the ratio of the openings in the auxetic material.

Metrics of Shape Rendering

Roudaut et al. [118] introduce a general metric for comparing different shape-changing devices. Of the ten features proposed in this metric [118], amplitude and curvature are the most relevant for evaluating our auxetic 2.5D shape display. Amplitude, as shown in Fig. 3.3(a), describes the range of motion of the rendered shape.


Figure 3.3: Metrics of shape rendering. (a) Amplitude of the shape. (b) Curvature of the shape.

Amplitude is defined as the displacement between the actuated position and the rest position of a point on the surface of the shape display. While the amplitude of a pin-array shape display is limited by the stroke of the pins, a formable-crust shape display is considered to have a larger range of motion due to the better flexibility of its structures [10]. Thus, it is meaningful to characterize the amplitude of the shapes we can render.

Curvature evaluates the sharpness of the features that can be rendered by the shape display. Gaussian curvature and mean curvature are mathematically defined concepts for calculating the curvature of a surface [112]. Since formable-crust shape displays rely on the forward kinematics of their structures to form a shape, the rendered shape is generally smoother than one rendered by a bed-of-nails shape display, where a single actuated pin can produce a very sharp peak. Hence it is important to assess our auxetic 2.5D shape display's ability to render fine features by measuring the curvature of the rendered shape. To compare the curvature of shapes rendered by auxetic 2.5D shape displays with different design parameters, we average the mean curvature within a small box on the top of the shape, as shown in Fig. 3.3(b), to obtain the curvature metric. Other metrics mentioned in [118] are either not very important for the shape display design (e.g., porosity and closure) or relatively straightforward to evaluate (e.g., granularity), so they are not discussed in detail in this section.
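For a 2.5D height map z = h(x, y), this metric can be evaluated from finite-difference derivatives using the standard Monge-patch formula for mean curvature; the sketch below (our illustration, with hypothetical grid values) averages it over a small window at the peak, mirroring the box-averaging described above.

import numpy as np

# Hypothetical sketch: mean curvature of a 2.5D height map z = h(x, y)
# via the standard Monge-patch formula, averaged over a small window at
# the peak (mirroring the "small box on top of the shape" metric).
def mean_curvature(h, spacing=1.0):
    hy, hx = np.gradient(h, spacing)
    hxy, hxx = np.gradient(hx, spacing)
    hyy, _ = np.gradient(hy, spacing)
    num = (1 + hy**2) * hxx - 2 * hx * hy * hxy + (1 + hx**2) * hyy
    return num / (2 * (1 + hx**2 + hy**2) ** 1.5)

def peak_curvature(h, box=5, spacing=1.0):
    """Average mean curvature in a (2*box+1)^2 window around the peak."""
    H = mean_curvature(h, spacing)
    i, j = np.unravel_index(np.argmax(h), h.shape)
    return H[max(i - box, 0):i + box + 1, max(j - box, 0):j + box + 1].mean()

# Example: a Gaussian bump; sharper bumps give larger |peak curvature|.
x = np.linspace(-1, 1, 64)
X, Y = np.meshgrid(x, x)
bump = np.exp(-(X**2 + Y**2) / 0.1)
print(peak_curvature(bump, spacing=x[1] - x[0]))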

Simulation and Evaluations

Amplitude: In this section we used the simulation model constructed in Sec. 3.1 to investigate the amplitude of the shapes rendered by auxetic 2.5D shape displays with various numbers of faces and ratios of opening areas. We first changed the number of hexagons forming the auxetic material while fixing the other parameters (including the overall dimension). Note that a small portion of the faces will be clamped in the frame so that the auxetic material can be fixed on the side; thus, the actual number of faces used to render the shape will be slightly lower than the nominal number.


Figure 3.4: Amplitude of the rendered shape. (a) Auxetic shape displays with different numbers of hexagons render shapes with a similar amplitude. (b) Auxetic shape displays with a larger ratio of openings in the auxetic pattern render shapes with a smaller amplitude.

As shown in Fig. 3.4(a), auxetic 2.5D shape displays with different numbers of hexagons actually render shapes with a similar amplitude under the same pressure. Since the size of the entire shape display is also fixed in the simulation, the hexagons shrink as their number increases. Because scaling a piece of auxetic material with the other parameters unchanged has limited impact on its ability to stretch, it is straightforward to see why shape displays made with hexagons of different sizes have similar shape rendering amplitude.

Compared with the number of hexagons, the ratio of openings has more impact on the amplitude of the rendered shape. As shown in Fig. 3.4(b), auxetic 2.5D shape displays with a larger ratio of openings render shapes with a smaller amplitude. This result is consistent with the nature of the auxetic material, which relies on the rotation of the faces around the linkages to expand. Auxetic material with a higher ratio of openings possesses less potential to stretch since the structure reaches its fully expanded state sooner; thus the rendered shape is constrained to a smaller amplitude. We can also observe from Fig. 3.4(b) that auxetic 2.5D shape displays with a ratio of openings of 0.1 and 0.2 have almost the same shape rendering amplitude.


Figure 3.5: Curvature of the rendered shape. (a) The impact of the size of the inflated region on the curvature of the rendered shape is studied. (b) A smaller inflated region renders the 2.5D shape with a larger curvature. The number of hexagons and the ratio of the opening area have a larger influence for smaller inflated regions.

This is because, under the air pressure set in the simulation, the auxetic materials in these conditions have not stretched sufficiently to provide a constraint force on the membrane. The ratio of openings is therefore not the dominating factor in the shape rendering; in fact, the amplitude of the rendered shape is mainly dependent on the strength of the membrane (Fig. 3.1(b)). In contrast, the auxetic 2.5D shape display with a ratio of openings of 0.5 already has internal stress causing it to elastically deform and strain, so it provides a constraint force that lowers the amplitude of the membrane. Based on our simulations, we conclude that the number of hexagons is less relevant, while a smaller ratio of openings is beneficial to the auxetic 2.5D shape display's ability to render a shape with large amplitude.

Curvature: In this section, we explored the ability of our auxetic 2.5D shape display to render sharp features by measuring the curvatures of the shapes rendered by our simulation model. Besides the number of hexagons and the ratio of opening areas, a unique variable for characterizing the curvature of the rendered shape is the radius of the inflated region, since the same auxetic 2.5D shape display will render shapes with different curvatures as the size of the inflated region changes.

Since we mainly focus on investigating the design space of the auxetic 2.5D shape display, a small bump, as a very basic structure, was used in our simulation for comparison purposes. From our preliminary tests, other shapes rendered by locking faces at specific positions to the ground plate can be considered as an aggregation of small bumps, and thus have a local curvature similar to that of the small bumps that form them. The definition of the radius of the inflated region is depicted in Fig. 3.5(a).

We first varied the number of hexagons and the inflated region radius of the auxetic 2.5D shape display while fixing the ratio of the opening areas. From Fig. 3.5(b) we can observe that a smaller inflated region usually renders a shape with larger curvature. Although a larger inflated region tends to render a bump with a larger height, the curvature of the shape is still lower than that of a small bump. Therefore, the maximum curvature that an auxetic 2.5D shape display is capable of rendering is mainly decided by the curvature of a small bump rendered by the device. In terms of the number of hexagons, Fig. 3.5(b) shows that when the inflated region is small, the curvature fluctuates more with the number of hexagons. This phenomenon can be explained by the specific kinematics of auxetic materials with different numbers of hexagons. For example, if a face happens to be rendered at the top of the bump, the measured curvature will be small due to the flat surface of the face; if the vertex of a face is placed at the top, or buckling occurs in the auxetic material, the measured curvature will be larger. In contrast, when the shape is rendered by a large inflated region, the number of hexagons does not have a significant impact on the curvature of the shape (Fig. 3.5(b)): the inflated bump can be well approximated by the faces, so there are fewer abrupt features in the rendered shape, leading to more constant curvatures.

To investigate the impact of the ratio of opening areas on the curvatures of the rendered shapes, we fixed the number of hexagons in the simulation model while sweeping the ratio of openings and the region radius. As we can observe in Fig. 3.5(b), the curvature increases as the radius of the inflated region decreases, which is consistent with our findings in the simulations above. When the inflated region radius is small, the curvature tends to decrease significantly as the ratio of the openings grows. This trend is due to the limited amplitude of the rendered shape when there is a large opening area: the membrane is constrained by the strain of the auxetic material. On the other hand, the curvature stays at a more constant and lower level when the inflated region has a larger radius. From the simulation model we found that in this scenario, although the amplitude of the rendered shape is still lower when there is a larger ratio of the opening area, the top surfaces of the rendered shapes are very similar across different ratios of the opening area.
From the above simulations we can conclude that only a small region of an auxetic 2.5D shape display should be inflated when we need to render a shape with sharp features, and that an auxetic 2.5D shape display should be designed with a suitable number of hexagons and a small ratio of opening area in order to render larger curvatures. To determine the number of hexagons that yields the optimal curvature rendering ability, the buckling of the auxetic material and the location of each face of the rendered shape should be carefully engineered.

3.1.3 Target Shape Rendering by Auxetic 2.5D Tactile Shape Display

To render a target shape with our auxetic 2.5D tactile shape display, we developed two strategies to control the rendered shape by manipulating the auxetic cells, as Fig. 3.6 shows. The first strategy is to lock the auxetic cells to the ground plate (Fig. 3.6(a)). This strategy was also mentioned in Sec. 3.1 as the main approach to modulate the rendered shape. After inflating the system, free cells are driven by the membrane to render a Gaussian-like bump while locked cells remain fixed to the ground plate. The advantage of plate locking is that it is straightforward to decide which cells to lock for a given target shape through a simple binary thresholding of the target shape's height. The plate locking strategy is also effective for rendering 2.5D shapes with significant features. However, the shape rendered by the plate locking strategy is limited to Gaussian-like bumps, which cannot represent more complicated features of a target 2.5D shape.

The second strategy, cell locking, is to lock the faces of the auxetic cells with respect to each other (Fig. 3.6(b)), limiting the amount they can locally open. This is an additional locking approach that complements the plate locking strategy. The most important characteristic of the auxetic material is that the faces can rotate around the linkages when the material is stretched, thus expanding the entire structure. As Fig. 3.6(b) shows, after inflating the system, the free cells can move along the surface of the membrane while the locked cells generate a constraint force that modifies the rendered shape. The advantage of cell locking is that it can provide minor local modifications to the rendered shape, which can be used together with the plate locking strategy to render a more complicated target shape. The main challenge of applying the cell locking strategy is that it is nontrivial to decide which cells should be locked to bring the rendered shape closer to the target shape. As Fig. 3.6(c) depicts, by combining the plate locking strategy with the cell locking strategy, a rendered shape can have both significant bumps and proper surface curvatures to approximate a target shape.

An end-to-end pipeline was developed to render a 3D model with our auxetic tactile shape display, as Fig. 3.7 shows. Initially the model is saved as a 3D CAD point cloud. Since our tactile shape display can only render 2.5D shapes, we translate the 3D point cloud to a 2D height map matrix representing the 2.5D shape derived from the 3D model. The translation is carried out by sampling the points on the surface of the 3D model while ignoring the details of the structures under the top surface. Some features of the 3D model are therefore not maintained in the 2.5D shape; for example, the features on the other side of the Stanford bunny (Fig. 3.7) would not be shown by the 2.5D shape. This is a limitation of the 2.5D shape representation scheme.
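A minimal sketch of this top-surface sampling step, assuming the point cloud is an N × 3 array (the grid size and function names are illustrative):

import numpy as np

# Hypothetical sketch of the 3D-point-cloud -> 2.5D-height-map step:
# bin points into an (x, y) grid and keep only the highest z per cell,
# discarding any structure underneath the top surface.
def point_cloud_to_height_map(points, grid=(30, 30)):
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    ij = ((xy - lo) / (hi - lo + 1e-9) * (np.array(grid) - 1)).astype(int)
    hmap = np.zeros(grid)
    for (i, j), z in zip(ij, points[:, 2]):
        hmap[i, j] = max(hmap[i, j], z)   # top surface only
    return hmap / (hmap.max() + 1e-9)     # normalize heights to [0, 1]

# Example with a random cloud; a real mesh surface would be sampled instead.
cloud = np.random.rand(10000, 3)
print(point_cloud_to_height_map(cloud).shape)  # (30, 30)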


Figure 3.6: The strategies to control the shape rendered by the auxetic 2.5D shape display by manipulating the auxetic cells. (a) Plate locking: auxetic cells are locked to the ground plate. (b) Cell locking: auxetic cells are locked with respect to each other. (c) Plate locking and cell locking are combined to generate a more complicated shape.

The nontrivial part of this pipeline is mapping the 2D matrix representing the 2.5D shape to the 1D vector of hexagon states. Three states are represented in the 1D vector: 0 indicates that the faces of the hexagon are free to rotate around the linkages, 1 indicates that the hexagon is locked by the plate locking strategy, and 2 indicates that the hexagon is locked by the cell locking strategy. While plate locking is more straightforward, determining the proper cells to lock with the cell locking strategy is more complicated, requiring information about the entire target shape. Therefore, we developed a convolutional neural network (CNN) model to determine the overall states of the hexagons, including the plate and cell locking states. The advantage of a neural-network-based model is that it can directly map the complicated target shape to the states of the hexagons, skipping the modeling of the complicated physics of the linkages and the kinematics of the auxetic materials, which makes rule-based algorithms hard to apply in this pipeline.


Figure 3.7: The pipeline to render a target shape. A target shape saved as a 3D point cloud is first converted to a 2D height map. A CNN-based model is then combined with a binary thresholding method to map the 2D height map to a 1D vector representing the states of the hexagons. The target shape is rendered by the auxetic shape display by feeding the states of the hexagons to the system.

Although collecting data and training the neural-network model is time consuming, once the model is trained it is fast and cheap to predict the hexagon state vector for a target shape. We chose a CNN among neural network models because the local spatial features of the 2.5D shape determine the states of the hexagons. After two convolutional layers, a fully connected (FC) layer was chosen as the last layer of the CNN model so that global information about the 2.5D target shape can also be considered when determining the state of a local hexagon. Specifically, the input 2D matrix has 30 × 30 pixels. The first convolutional layer of the CNN model applies 8 filters with a kernel size of 8 × 8, same padding, and a stride of 1. The second convolutional layer applies a 1 × 1 filter. A flatten and a dropout layer were added before the FC layer. The output of the FC layer is a 143-dimensional vector representing the states of the 143 hexagons of the auxetic material. One-hot encoding was used to represent the three states of the hexagons: free, plate locking, and cell locking. A binary thresholding was applied after the CNN model to further improve performance: if a certain area of the target shape is lower than a threshold height (i.e., 0.3), we enforce plate locking for that cell. This step eliminated some false predictions of the CNN model. After obtaining the states of the hexagons, a 2.5D shape is rendered by feeding the hexagon states to the auxetic shape display simulation model as shown in Fig. 3.7. Thus, the end-to-end pipeline achieves the goal of rendering a 3D model as a 2.5D shape using the auxetic shape display.
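A minimal Keras-style sketch of the described architecture follows. Layer sizes follow the text above; the per-hexagon softmax over the three one-hot states, the dropout rate, and the filter count of the 1 × 1 layer are our assumptions, since the text specifies a 143-dimensional output vector without fixing the exact encoding.

import tensorflow as tf
from tensorflow.keras import layers

# Sketch of the described CNN: 30x30 input, 8 filters of 8x8 with same
# padding and stride 1, a 1x1 convolution, flatten, dropout, and a fully
# connected output. Output encoding (143 hexagons x 3 one-hot states)
# and the dropout rate are assumptions, not thesis-specified.
n_hexagons, n_states = 143, 3
model = tf.keras.Sequential([
    tf.keras.Input(shape=(30, 30, 1)),                # 2D height map
    layers.Conv2D(8, (8, 8), padding="same", strides=1, activation="relu"),
    layers.Conv2D(8, (1, 1), activation="relu"),      # 1x1 convolution
    layers.Flatten(),
    layers.Dropout(0.5),                              # assumed rate
    layers.Dense(n_hexagons * n_states),
    layers.Reshape((n_hexagons, n_states)),
    layers.Softmax(axis=-1),                          # one state per hexagon
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()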


Figure 3.8: (a) Correlation coefficients comparing four target shapes with the shapes rendered by the plate locking strategy and by plate locking combined with cell locking. (b) Heatmaps comparing the original four target shapes with the shapes rendered by our shape rendering methods.

To train the CNN model in the above end-to-end pipeline, we ran the forward simulation to collect 3000 data points. To collect a data point, a randomly generated 1D vector of hexagon states is fed to the auxetic shape display simulation model constructed in Sec. 3.1 to render a 2.5D shape. When generating the random hexagon state vectors, we set the chance of a hexagon being in the free state to 50%, the plate locked state to 25%, and the cell locked state to 25%. This bias was intended to expose the CNN model to more complicated rendered shapes. The rendered shapes and the 1D hexagon state vectors were collected as the training dataset.

After developing and training the target shape rendering pipeline, we evaluated its performance with four target shapes: a Stanford bunny, a pear, a donut, and a car (Fig. 3.8). Two locking strategies were used when rendering the target shapes: plate locking only, and plate locking combined with cell locking. Note that in the plate-locking-only mode we did not use the CNN model and only used binary thresholding to calculate the hexagon states, since the kinematics are more straightforward in this mode. In the plate and cell locking mode, the CNN model was used before the binary thresholding to account for the kinematics of the auxetic material. The correlation coefficient r, defined in [13] and used in a similar 2.5D target shape rendering system [131], was used to evaluate the correlation between the rendered 2.5D shape and the target 2.5D shape. r is calculated by the following equation [131]:

r = \frac{\sum_{i}\sum_{j} (A_{ij} - \bar{A})(B_{ij} - \bar{B})}{\sqrt{\left(\sum_{i}\sum_{j} (A_{ij} - \bar{A})^{2}\right)\left(\sum_{i}\sum_{j} (B_{ij} - \bar{B})^{2}\right)}}    (3.1)

where A and B are two 2.5D shapes, A_ij and B_ij each represent a pixel of shape A and shape B, and Ā and B̄ are the average values of shape A and shape B. While r = 0 indicates that A and B have no correlation, r = 1 indicates that A and B are exactly matched, and r = −1 indicates that A is an exact inversion of B.

From Fig. 3.8(a) we can observe that shapes rendered by both strategies have a correlation coefficient larger than 0.7, which indicates the effectiveness of our target shape rendering methods. Fig. 3.8(b) depicts the four target shapes, the shapes rendered by the plate locking strategy, and the shapes rendered by the plate and cell locking strategy. Combining Fig. 3.8(a) and Fig. 3.8(b), we find that the binary thresholding step helped to determine the boundaries of the rendered shapes by enforcing plate locking on some of the cells; therefore, in both locking modes, the rendered shapes achieved overall boundaries similar to the target shapes. As we can see from Fig. 3.8(b), the plate locking strategy has the limitation that the rendered shapes are composed of Gaussian-like bumps. This is due to the nature of the inflated membrane, which essentially has a strain-energy-minimizing characteristic [6]. However, many target shapes are more complicated than a simple Gaussian-like bump; for example, the Stanford bunny has muscles around its legs that cannot be rendered by the plate locking method. It is possible to mitigate this problem with cell locking, since it can make minor modifications to the rendered shapes. As shown by the car shape in Fig. 3.8(b), the target shape has a bump in the center of the car, while the shape rendered by the plate locking method generated a bump closer to the front of the car. With the plate and cell locking strategy, the location of the bump in the rendered shape is more similar to the target shape. This observation was also verified by the correlation coefficient calculations (Fig. 3.8(a)): the plate and cell locking strategy achieved a larger correlation coefficient (0.90) than the plate-locking-only strategy (0.89). However, for the other target shapes, the plate and cell locking strategy performed worse than the plate-locking-only strategy. This is likely a limitation of the current CNN model: since we only collected 3000 data points, limited by the time required to run the simulations, the CNN model was relatively simple and compromised in its prediction accuracy. Thus, the rendered shapes were not able to reproduce some of the features in the target shapes. From the above results we conclude that the cell locking strategy has the potential to improve the target shape rendering accuracy if a more powerful CNN model is trained with more data points. Another limitation of the current shape rendering pipeline is the limited space of shapes that can be rendered due to the limited kinematics of the auxetic 2.5D tactile shape display; for example, sharp features with curvatures larger than those in our simulation results in Sec. 3.1.2 will be hard to render. Furthermore, since our auxetic tactile shape display can only render 2.5D shapes, our shape rendering pipeline will not be able to render 3D shapes with overhanging structures.
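For reference, Eq. (3.1) is simply the Pearson correlation between the two height maps, as in the following minimal NumPy sketch.

import numpy as np

# Minimal NumPy sketch of Eq. (3.1): the Pearson correlation coefficient
# between two 2.5D height maps A and B of the same shape.
def shape_correlation(A, B):
    a, b = A - A.mean(), B - B.mean()
    return (a * b).sum() / np.sqrt((a**2).sum() * (b**2).sum())

A = np.random.rand(30, 30)
print(shape_correlation(A, A))    # 1.0: identical shapes
print(shape_correlation(A, -A))   # -1.0: exact inversion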

3.2 Prototyping of Auxetic 2.5D Tactile Shape Display

We built an experimental auxetic 2.5D tactile shape display prototype to show the feasibility of applying the novel concept of an auxetic formable-crust shape display in a real system. The prototype was also used to verify the validity of the two locking strategies and to qualitatively compare against the simulation results. Note that in this prototype the locking strategies were achieved with passive components instead of electrically programmable components. Although the passive components have the disadvantage of not being able to change the rendered shape in real time, they are sufficient for the above goals of this experimental prototype.

Fig. 3.9 depicts the passive auxetic shape display prototype we built. We made the auxetic material from a leather sheet. The advantage of a leather sheet is that, compared with other common materials that can be used to fabricate auxetic material, such as Delrin or metal sheets, it can be easily patterned with a laser cutter; compared with fabric and felt sheets, it can support higher stress. The disadvantage of a leather sheet is that it is usually thick. In this prototype, a 5/64-inch-thick leather sheet was used after our empirical tests of various materials including Delrin, felt, fabric, and leather. The thickness of the leather sheet causes a perceivable empty area in the openings of the auxetic material, which introduces additional and usually unwanted tactile sensations for the user.

The membrane of this prototype was made from an LDPE thin film. While silicone sheets and LDPE thin films are two common membrane materials [78], we used LDPE thin film in our setup to limit the pressure required to inflate the system and render a 2.5D shape. The LDPE film dimensions were chosen to match its fully inflated state; the film is then wrinkled to fit underneath the auxetic material and unwrinkles as the device is inflated. As our empirical tests showed, if a silicone sheet served as the membrane, the 2.5D shape rendered by the prototype usually could not have significant features due to the large constraint force required to change its shape. This is because the silicone membrane stores much more force in the elastic deformation of the membrane, whereas with the thin LDPE film we utilize the expansion of the film through unwrinkling. The prototype after inflation is also shown in Fig. 3.9: the auxetic material expanded by rotating around the linkages, leaving larger openings between the faces.

As we described in Sec. 3.1.3, two locking strategies were defined to lock the faces of the auxetic material to render a target shape. In this prototype, we used an acrylic mask to achieve plate locking (Fig. 3.10): the faces of the auxetic material and the membrane under the mask cannot move under inflation, which is equivalent to being locked to the ground plate. To achieve the cell locking strategy, we attached acrylic hexagons to certain hexagons of the auxetic material with adhesive so that the faces of the corresponding hexagons cannot open under inflation.


Figure 3.9: In the prototype, the auxetic material is made of a leather sheet. The membrane is made of a thin LDPE film. The auxetic shape display after inflation is shown on the right side of the picture.


Figure 3.10: An acrylic mask locks the hexagons according to the plate locking strategy. A few acrylic hexagons are attached to hexagons of the auxetic material by adhesives to achieve cell locking.


Figure 3.11: The outputs of the target shape rendering methods are fed to the simulation model and the experimental prototype. The plate locking strategy and plate locking combined with cell locking are compared with the target shape.

After constructing the passive auxetic shape display, we compared the simulation model with the experimental system using one of the target shapes from Sec. 3.1.3. As shown in Fig. 3.11, two locking strategies were used: plate locking, and plate locking combined with cell locking. The target shape was first fed to the target shape rendering pipeline, where our models generated the corresponding hexagon states. In plate locking mode, the hexagons were only allowed to be free or locked to the plate; in plate locking combined with cell locking mode, the hexagons could be free, locked to the ground plate, or locked by the cell locking strategy. The hexagon states were then fed to the simulation model to render the target shape; the corresponding results are shown in Fig. 3.11 as simulation results. We also locked the experimental prototype with the same generated hexagon states using the passive locking methods described previously; the corresponding experimental results are also shown in Fig. 3.11.

Observing the results in Fig. 3.11, we find that both the simulation and the experimental results achieve a shape similar to the target shape. In plate locking mode, both the simulation result and the experimental result have a bump closer to the front part of the car, while in plate locking combined with cell locking mode, the bump shifts towards the back of the car in both the simulation and the experimental results. These results indicate that the experimental results follow a similar trend to the simulation results. Furthermore, adding the cell locking strategy made the rendered shape more similar to the target shape than using the plate locking strategy alone. Since the main purpose of the experimental prototype was to show the feasibility of the novel concept of a formable-crust tactile shape display made of auxetic materials, we only qualitatively compared the simulation results with the experimental results in this section. To quantitatively compare them, the material properties of the experimental prototype need to be integrated into the simulation model. A quantitative comparison would show not only the feasibility of the formable-crust tactile shape display, but also the accuracy of the simulation model relative to a real system.


Figure 3.12: (a) The required force to lock a hexagon by the plate locking strategy was measured by empirical tests. (b) The required force to lock a hexagon by the cell locking strategy was calculated by the model.

3.3 Required Force to Lock the Auxetic 2.5D Tactile Shape Display

In this section we discuss the force required to lock the auxetic material using the plate locking and cell locking strategies. The locking force was measured or calculated for a single hexagon of the auxetic material. In our measurements and modeling, the parameters were based on the auxetic shape display prototype we built in Sec. 3.2; the edge length of a hexagon of the auxetic material used in this section is 22.4 mm. The required force is compared with the locking force that can be provided by electroadhesive brakes, to explore the feasibility of using electroadhesive brakes to realize the two locking strategies described in Sec. 3.1.3. Note that if the parameters of the auxetic shape display change, the results and conclusions of this section may also change.

To obtain the plate locking force, we experimentally loaded mass onto a hexagon of the auxetic shape display before inflating the system. The locked hexagon is shown in blue (Fig. 3.12(a)). If the mass was below a threshold, the hexagon beneath the mass would leave the ground plate after inflation.

We gradually increased the mass until the hexagon stayed attached to the ground plate after inflation, and recorded the minimum mass needed to lock the hexagon. The same experiment was conducted in the center, near the edge, and in the corner of the auxetic shape display. The corresponding plate locking forces were 2.6 N in the center, 1.8 N near the edge, and 1.4 N near the corner. The edge length and total area of a hexagon of the auxetic material were 22.4 mm and 1304.6 mm². The air pressure we measured at the gauge was 3 psi above atmospheric pressure. However, since the auxetic shape display was built for demonstration purposes, the seal of the chamber was not optimized for this prototype, so the chamber pressure was lower than the pressure at the gauge. From the measured plate locking force of 2.6 N in the center of the auxetic shape display and the total area of a hexagon, the calculated chamber pressure was 0.29 psi above atmospheric pressure, assuming the plate locking force was mainly used to compensate for the chamber pressure. Since the locking force needs to be sufficient to lock any area of the auxetic material, we chose 2.6 N as the required plate locking force for a single hexagon. Note that the measured force is specific to our auxetic shape display prototype: the locking force depends strongly on the pressure and materials, including the pattern geometry, thickness, and Young's modulus of the auxetic material, and the thickness and Young's modulus of the membrane.

In Fig. 3.12(b), we calculated the required force for the cell locking strategy. To model the system, we consider the inflated auxetic shape display as a hemisphere with a flattened top surface due to the constraint force of the cell locking. We assume the flat region is on the top of the hemisphere for simplicity of modeling; due to the geometric symmetry of the sphere, the locking force should be similar even if the locked hexagon is located elsewhere on the sphere. Furthermore, since a hexagon is not axisymmetric, we approximated it by its circumscribed circle, whose radius equals the hexagon's edge length, which allows a 2D analysis of the cell locking force. The hexagon locked by the cell locking strategy is shown in blue (Fig. 3.12(b)). The locked hexagon is under the pressure of the chamber, so the rest of the membrane provides a force to compensate for this pressure, indicated as F2 in Fig. 3.12(b). We assumed the same chamber pressure for the plate locking and cell locking force calculations. Since the plate locking force measured above can be approximated as the air pressure on a hexagon, F2 is taken to equal the plate locking force of 2.6 N under this approximation.

Since the membrane meets the locked cell at an angle θ, a shear force F1 is generated together with F2; F1 will pull the faces of the locked hexagon apart. Thus, to determine the required locking force for the cell locking strategy, it is important to calculate F1. The relation between F1 and F2 is determined by the angle θ. From our assumption that the inflated auxetic shape display is a hemisphere, cos(θ) = e_hexagon / r_display, where e_hexagon is the edge length of a hexagon and r_display is the radius of the tactile shape display (Fig. 3.12(b)). Our tactile display has a radius of 133.4 mm, so θ can be calculated as 80.3°. Using this value of θ, the F2 value of 2.6 N, and the relation tan(θ) = F1/F2, we calculate F1 to be 15.2 N. Since F1 is a shear force, we converted it to a normal locking force to compare with the plate locking force, using F_locking = 1.5 · F1. Since we will explore whether the electroadhesion mechanism can provide the locking force, the constant 1.5 comes from the ratio between the normal attractive force and the shear contact force provided by interdigital electrodes [5]. In our calculation, the required cell locking force is 22.8 N.

After estimating the required locking force, we calculated the locking force that can be provided by electroadhesive brakes, both theoretically and experimentally, to explore the feasibility of using the electroadhesion mechanism to provide the locking force. To calculate the theoretical electroadhesion force on a hexagon, we used the following equation [148]:

F = -\frac{1}{2}\,\varepsilon_{r}\varepsilon_{0}\,A\left(\frac{V_{a}}{2d}\right)^{2}    (3.2)

where ε_r is the relative dielectric constant, ε_0 is the vacuum permittivity, A is the total contact area, V_a is the control voltage, and d is the thickness of the dielectric film. We used the constants for the same dielectric film as in Chapter 2, since the performance of this film was experimentally demonstrated in our electroadhesive brakes. We assume ε_r = 50, V_a = 350 V, and d = 24 µm. Thus, the theoretical electroadhesion force for a hexagon is 61.4 N.

In addition to theoretically modelling the attractive force provided by an electroadhesive brake, we also scaled the empirically measured shear contact force from Chapter 2 to estimate the potential locking force that could be generated by an experimental system. As Fig. 2.5 shows, interdigital electrodes on a dielectric film of 24 µm thickness provide a shear contact force of 0.97 N at 335 V. Since a hexagon in our auxetic material is much larger than an electroadhesive brake in Chapter 2, after scaling the force in proportion to the total area of the brake, the potential shear contact force provided by a hexagon is 13.1 N. Using the constant 1.5, the ratio between the normal attractive force and the shear contact force provided by interdigital electrodes [5], the experimental locking force provided by a hexagon is 19.9 N.

As a summary, Fig. 3.13 compares the force required to lock a hexagon by the plate locking or cell locking strategy. The theoretically calculated and experimentally measured forces that can be provided by an electroadhesive brake are also listed for comparison. From Fig. 3.13 we can observe that the theoretically calculated electroadhesion force would be sufficient to support both plate locking and cell locking. The experimentally measured electroadhesion force is larger than the required plate locking force but slightly lower than the required cell locking force. However, there is still room to further increase the experimentally measured electroadhesion force; for example, our measurements in Chapter 2, Fig. 2.5(c), show a trend that the electroadhesion force can grow if a larger voltage is applied.


Figure 3.13: The force required to lock a hexagon by plate locking and cell locking, compared with the force that can be provided by an electroadhesive brake.

The experimentally measured electroadhesion force was less than the theoretically calculated value, partially due to the air gap between the dielectric film and the electrodes; thus, it is possible to increase the experimental electroadhesion force dramatically by reducing the air gap. In conclusion, from the above modeling, we believe that electroadhesive brakes can potentially be applied to provide the plate and cell locking for an auxetic shape display.
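The force budget in this section reduces to a few lines of arithmetic; the sketch below reproduces the chamber-pressure and cell-locking numbers from the values quoted in the text (small rounding differences aside).

import math

# Numerical check of the locking-force budget in this section, using the
# values quoted in the text (hexagon edge 22.4 mm, area 1304.6 mm^2,
# display radius 133.4 mm, measured plate locking force 2.6 N).
A_hex = 1304.6e-6          # hexagon area (m^2)
F_plate = 2.6              # measured plate locking force (N)
e_hex, r_disp = 22.4e-3, 133.4e-3

# Chamber pressure implied by the plate locking force over one hexagon.
p = F_plate / A_hex                                   # ~1993 Pa
print(f"chamber pressure ~ {p / 6894.76:.2f} psi")    # ~0.29 psi

# Cell locking: the membrane meets the locked cell at angle theta.
theta = math.acos(e_hex / r_disp)   # ~80.3 degrees
F1 = F_plate * math.tan(theta)      # shear force, ~15.3 N
F_lock = 1.5 * F1                   # ~22.9 N (text rounds F1 first: 22.8 N)
print(f"theta = {math.degrees(theta):.1f} deg, F1 = {F1:.1f} N, "
      f"F_lock = {F_lock:.1f} N")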

3.4 Limitations and Future Work

One of the major limitations of the auxetic 2.5D shape display is the interactivity of the device. Before the inflation of the system, selected hexagons of the auxetic material must be locked to the ground plate or to each other, which determines the rendered shape. To refresh the rendered shape, the entire system must be deflated to its initial state so that another set of hexagons can be locked; the rendered shape cannot change dynamically while the system is inflated. This lack of real-time interaction is a common limitation of brake-based tactile displays. To improve the dynamic response of the auxetic 2.5D shape display, a potential direction for future work is adding a few actuators to the system. Although this will increase the fabrication cost of the device, the extra actuators will be able to modify the rendered shape in real time. The number of actuators should be much smaller than the number of faces of the auxetic material so that the features of the rendered shape are still dominated by the batch-fabricable auxetic material.

Another limitation of the auxetic 2.5D shape display is the curvature of the rendered shape. Bed-of-nails shape displays usually have the ability to render very sharp features by actuating a single pin while keeping the others in the reset state. However, formable-crust shape displays rely on the propagation of forward kinematics to render shapes: since the local features of the rendered shape are decided by their neighbouring features, the surfaces of the shapes rendered by formable-crust shape displays are usually smoother (Fig. 3.5) than those of bed-of-nails shape displays. To dramatically increase the curvature of the shapes rendered by the auxetic 2.5D shape display, future work should explore the buckling of the faces of the auxetic material. As Fig. 3.1 depicts, some of the faces rotate around the linkages when a large strain is applied to the auxetic material. In our current results, this buckling is treated as noise in the system, which can be mitigated by overlaying another layer of smooth material on top. However, if methods can be developed to model and control the buckling of the faces in a desired way, the curvature of the rendered shape could be significantly higher than in the current configuration.

Most of our current results come from modeling and simulation. These results help us determine the design space of the auxetic 2.5D shape display for certain shape rendering evaluation metrics, and the current simulation model also provides the ability to render a target shape by locking the faces to the ground plate or with respect to each other. Although we have built a prototype to demonstrate the idea of constructing an auxetic 2.5D shape display, it is a passive device without the ability to programmatically control the rendered shape. Future work should improve the current setup so that the brakes on each of the faces are electrically connected and controlled by a circuit board; developing a roll-to-roll fabrication approach to wire the electrical components on the auxetic material in a batch-fabricable way will be an important direction. Many material properties, such as the mass of the faces, the pressure in the membrane, the strength of the membrane, and the kinematics of the linkages, are currently modeled in a very abstract way with many engineering assumptions. Future work needs to match these parameters with the properties of real materials; exploring the feasible material space will be very meaningful towards building a functioning auxetic 2.5D shape display. Finally, our simulation model utilizes two mechanisms to change the rendered shape: faces are either locked to the ground plate or with respect to each other. We have modeled the required forces for the plate locking and cell locking strategies (Sec. 3.3), showing the potential of electroadhesive brakes to provide the locking force; to establish a functioning auxetic 2.5D shape display, future work should continue in this direction and experimentally build brakes that meet the locking force requirements.

3.5 Conclusions

We proposed a novel type of 2.5D shape display based on auxetic materials, with the advantages of low cost, a large displacement range, and roll-to-roll fabrication compatibility. To explore the characteristics of the auxetic 2.5D shape display, we built a simulation model using a physics engine that treats the structure as a particle system. We defined the meaningful design parameters for the auxetic 2.5D shape display and the relevant metrics to evaluate the rendered shape. The simulation model was used to generate shapes rendered by auxetic 2.5D shape displays with different design parameters, and every rendered shape was assessed against the evaluation metrics we defined, providing guidance for the future design of an experimental auxetic 2.5D shape display. After exploring the design space of the auxetic 2.5D shape display, we developed methods to render a target shape by locking a set of faces of the auxetic material. Due to the complicated kinematics of the auxetic material, neural-network-based methods were used to predict the hexagon states for a target shape, with forward simulations conducted to collect the training data. Besides the modeling, we built an experimental passive auxetic 2.5D shape display and qualitatively compared the rendered shapes with the simulation results. To explore the feasibility of using electroadhesive brakes to provide the locking force, we modeled the required locking forces for the plate and cell locking strategies and compared them with the force that an electroadhesive brake can provide on a single hexagon.

In conclusion, we showed the feasibility of building a 2.5D shape display based on auxetic materials, which has the unique advantages of low cost, batch fabrication, and a large range of displacement. In this chapter and the previous one, we focused on solving the hardware challenges of tactile displays; in the next chapter we turn to the other core problem introduced in Chapter 1, tactile content authoring tools.

Chapter 4

Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features in a Video

As mentioned in Chapter 1, content authoring is another key challenge in providing low-cost tactile displays to a broader audience. I investigated new techniques to address the hardware challenges of 2.5D tactile shape displays in Chapter 2 and Chapter 3, since compared with 2D tactile displays there are more unsolved hardware bottlenecks for 2.5D tactile shape displays. In this chapter, however, I use 2D vibrotactile displays as a test bed to develop tactile content authoring methods, because vibrotactile displays are well-established and widely used. Additionally, when the tactile content authoring methods I developed are applied to these 2D tactile displays, the algorithmic problems can be decoupled from the problems arising from prototype hardware. As summarized in Chapter 1, vibrotactile displays can be used in many application scenarios, including reproducing properties of physical objects [81], delivering information [142], and providing 4D movie experiences [75]. I focus on investigating tactile content authoring methods for 4D movie experiences because, from our analysis of the state-of-the-art technologies, there are many unsolved challenges in this area, as I discuss below. Furthermore, providing 4D movie experiences is also a very popular use case of vibrotactile displays, especially when proper tactile cues are added [46, 101]. As Chapter 1 describes, researchers have developed many manual authoring tools to create tactile content for vibrotactile displays [75, 113, 17]. However, manual editing is a laborious process, especially for long videos. To address this, researchers have explored various ways to automatically generate tactile effects. Visual saliency calculated from an object's motion is used to generate spatial tactile content in [73]. This method performs well for simple scenes, but because it lacks cross-modal understanding, complex scenes with rich audio and visual events cannot be translated in a meaningful way.


For instance, generating temporally synchronized tactile effects for a moving truck that honks sporadically is impossible with visual features alone. Audio signals without any visual information (e.g., music or special sound effects) have also been translated to tactile stimuli [86, 20]. However, the resulting haptic content is not spatially aligned with the visual content. While a spatial audio channel could be used to generate spatial haptic content, most online videos (e.g., on YouTube) do not have the necessary spatial audio information.

Our method focuses on automatically generating the haptic content for a spatial vibrotactile display based on the audiovisual information in a video. Contrary to previous work, which utilized either the visual or the audio channel alone, I use cross-modality information from both channels to improve the spatiotemporal synchronization between the tactile stimuli and the audiovisual content. This design decision improves the quality of the user experience compared with over-emphasizing a single modality. To extract audio and visual features, I first use neural networks to pull out diegetic audio signals (sounds from objects that are visible or can be implied from the scene) [104] from the sound track. The diegetic sound is then used both to decide the intensities of the tactile stimuli and to localize the sound source within the scene [134], which is then mapped to the spatial heatmap of the tactile effects.

To evaluate the performance and user experience of our cross-modal approach, I conducted a user study comparing videos with tactile effects from our method against videos without any tactile stimuli and videos with tactile stimuli generated by a modified version of the visual saliency-based approach from prior work [74]. For our study, I used a custom-designed tactile display, consisting of an array of haptic actuators mounted on the backrest of a chair, to deliver the tactile stimuli to the backs of the participants. To evaluate the user experience, I measured Sensory, Distraction, Novelty, and Immersion ratings on a Likert scale. Videos downloaded from YouTube were used for the study. Based on the study results, I discuss the advantages of extracting cross-modal information for automatic generation of tactile stimuli compared to using the visual or audio channel alone.

The novelty and technical contribution of this work center around its use of both audio and visual content to automatically generate spatial tactile effects. While I used algorithms from prior work [104, 134] to create parts of our pipeline, to our knowledge no one has developed a pipeline that utilizes both audio and visual information, nor compared its performance to algorithms that only use visual information [74].

4.1 Framework for Automatically Generating Spatial Tactile Effects

In this section, we provide an overview of our framework to automatically generate spatial tactile effects. Our pipeline utilizes both the visual and the audio signals to determine the spatial distribution and the intensities of the tactile effects. As shown in Fig. 4.1, the audiovisual content is first separated into the visual and audio content.

Figure 4.1: The automatic tactile effects generation pipeline uses both visual and audio features to separate the diegetic audio signal and determine the location of the tactile stimuli. The intensity of the generated haptic effects is decided only by the diegetic audio signals. (Diagram: Audiovisual Content → Visual Content / Audio Content → Audio Source Separation → Diegetic Audio, with non-diegetic audio discarded → Sound Source Localization → Spatial Tactile Mapping → spatial distribution and intensity of tactile effects.)

Since the relation between non-diegetic sounds (sounds that are not visible on the screen or whose source cannot be implied by the action of the film) and haptic signals is not well understood [29], we extract only the diegetic sounds from the audio content and discard the non-diegetic audio information. With recent progress in neural-network-based methods, many researchers have applied neural networks to separate the diegetic and non-diegetic audio channels [39, 18], among which the on/off-screen speaker audio separation algorithm developed by Owens et al. [104] is the most relevant to our pipeline and is thus used in our system.

The outputs of our pipeline are spatial tactile effects that consist of the spatial distribution and the intensities of the tactile effects. To obtain the spatial distribution, we calculate the location of the sound source in the video using an audio-guided visual attention mechanism [134]. This method creates a heatmap representing the probability distribution of the sound source within the scene. Only the diegetic audio obtained from the audio separation algorithm is used as the input for this process, since non-diegetic audio is not visually present in the scene. The amplitudes of the diegetic audio signal are converted to the intensities of the tactile effects while satisfying the frequency response of the actuator. Finally, we combine the two components (i.e., spatial distribution and intensities) to generate tactile effects at the computed locations with the computed intensities.

Figure 4.2: The pipeline for separation of diegetic and non-diegetic audio signals [104]. A multisensory net fuses the visual and audio features, while a u-net further separates the mixed spectrogram into foreground and background audio signals. (Diagram: Video Frames + Mixed Audio → Multisensory Net; Mixed Spectrogram → u-net → Foreground Audio / Background Audio.)

4.1.1 Audio Source Separation for Complex Audio Content

Audio-visual source separation is a necessary step to translate audiovisual content into meaningful haptic mappings. There are mainly two types of audio in a video: diegetic and non-diegetic sounds. Diegetic sounds originate from objects on the screen, such as a car engine, a chainsaw, or a musical instrument, while non-diegetic sounds include background music and a narrator's voice. Since haptic effects are mostly used to enhance physical events happening within the scenes [29], it is necessary to filter out non-diegetic sounds from the mixed audio channel.

Prior Work

Common approaches to audio source separation, also known as the cocktail party problem, include probabilistic methods [41, 120, 27] and, more recently, machine-learning-based methods. Researchers have trained neural networks to assign contrastive embedding vectors [52] and to create attractor points in a high-dimensional embedding space of the acoustic signals [23]. Audio source separation tasks investigated by researchers include multi-talker speech separation [35], multichannel music audio separation [50], and audio-visual source separation, which uses the visual signal to help disentangle the audio sources [104].

Our Implementation based on the State-of-the-Art

Among the audio source separation techniques mentioned above, the model developed by Owens et al. [104] is the most relevant for our application, since it allows us to separate diegetic and non-diegetic audio signals. Therefore, we adopted it in our tactile effects generation pipeline. Although this method does not work for background music separation, it can remove a background narrator's voice from the audio stream and create a clean foreground audio channel. As Fig. 4.2 shows, two neural networks were trained and used in the model. A temporal multisensory net representing the fused visual and audio components of the video was first obtained by self-supervised training of a 3D multisensory convolutional neural network (CNN) with an early-fusion design. A u-net encoder-decoder (Fig. 4.2) with parameters similar to [55] was used to map the mixed audio stream to its foreground and background components. The audiovisual features extracted by the multisensory network were concatenated into the u-net so that the visual information in the video could contribute to the separation. The u-net model was trained to separate speech using the VoxCeleb dataset [94]. The separated foreground and background audio streams are initially generated as spectrograms and then converted to time-domain signals. Since on/off-screen audio separation is a broadly studied research field and [104] is a representative state-of-the-art work, we mainly use their open-sourced multisensory net and u-net models [105] as building blocks to separate out the non-diegetic audio signals and keep only the diegetic sound. Although [104] only demonstrated a speech separation use case with an on-screen speaker and an off-screen human speaker, we also found it effective in separating the audio stream of on-screen non-human sounding objects from an off-screen narrator. Thus, we use their method to remove non-diegetic audio signals and process the extracted diegetic audio stream for the next step in our pipeline. Examples of audio source separation results and spatial tactile mappings generated with and without audio source separation are included in the supplemental materials. Next, in Sec. 4.1.2, we translate the diegetic audio signals into spatial tactile mappings by locating the sounding object.
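To make the spectrogram-masking step concrete, the following is a minimal sketch of mask-based audio separation. It assumes a trained model that predicts a soft foreground mask over the mixed spectrogram; the predict_foreground_mask argument is a hypothetical stand-in for the open-sourced u-net of [104, 105], whose actual interface differs.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_foreground(mixed_audio, fs, predict_foreground_mask):
    """Mask-based separation sketch. The mixed waveform is converted to a
    spectrogram, a trained model predicts a soft (0..1) foreground mask,
    and the masked spectrograms are inverted back to time-domain signals.
    predict_foreground_mask stands in for the trained u-net of [104]."""
    f, t, Z = stft(mixed_audio, fs=fs, nperseg=512)
    mask = predict_foreground_mask(np.abs(Z))      # same shape as Z
    _, foreground = istft(Z * mask, fs=fs, nperseg=512)
    _, background = istft(Z * (1.0 - mask), fs=fs, nperseg=512)
    return foreground, background                  # diegetic, non-diegetic

# Usage with a trivial stand-in mask (keeps everything as foreground):
# diegetic, _ = separate_foreground(audio, 16000, lambda mag: np.ones_like(mag))
```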

4.1.2 Construction of the Sound Source Localization Heatmap using the Audiovisual Content

Using the extracted audiovisual content without the non-diegetic audio component, we generate the spatial tactile channel. As mentioned in Sec. 4.1, our framework obtains the distribution of the spatial haptic channel by localizing the sounding object in the scene. In this section, we describe the methods used to achieve spatial and temporal sound event localization.

Prior Work

There are many existing approaches to the sound event localization problem. Researchers have investigated sound event detection in an acoustic scene based on Hidden Markov Models (HMM) [51], feed-forward Deep Neural Networks (DNN) [14], and Bidirectional Long Short-Term Memory (BLSTM) networks [122]. Temporal action localization in videos has also been studied by modeling it as a classification problem with each temporal sliding window considered as a candidate [102, 36]. Furthermore, multimodal methods that learn joint representations over multiple modalities (e.g., video and audio) have been proposed. Arandjelovic and Zisserman [3, 4] utilized unsupervised learning to generate cross-modality representations for audio-visual correspondence tasks; they also achieved localization of sounding objects in images with their learned network. Many recent works have also explored sound source separation and localization by audio-visual representation learning [123, 35, 119, 134].

Figure 4.3: Pipeline for sound source localization in audiovisual content. Visual features are extracted through a VGG-19 network [126], while each audio segment is processed by a VGG-like network [53]. Audio-guided visual attention is then used to generate a heatmap showing the location of the sounding object. (Diagram: V_t, one video frame → VGG-19; A_t, 1-second mono audio → VGG-like network; attention → adaptive heatmap.)

Our Implementation based on the State-of-the-Art

Among the sound event localization approaches above, we follow the method developed by Tian et al. [134] for our automatic haptic effects generation pipeline. Their model is open-sourced and capable of recognizing various sounding objects in an audiovisual scene. The model was originally developed for audio-visual event localization, so it consists of five major modules: visual and audio feature extraction, audio-guided visual attention, temporal modeling, multimodal fusion, and temporal labeling. Among these, the visual and audio feature extraction module preprocesses the audiovisual content and feeds the intermediate results to the other modules, and the audio-guided visual attention module generates the sound localization heatmap from the extracted visual and audio features. We therefore modified and used these two modules in our automatic tactile effects generation pipeline.

As Fig. 4.3 depicts, three neural networks are utilized in this step. A VGG-19 network [126] pre-trained on ImageNet was used to extract visual features. The original model of [134] sampled 16 RGB video frames from a 1-second video clip and applied global average pooling [91] over the 16 extracted pool5 features to generate one 512 × 7 × 7-D feature map. This results in the same heatmap for the entire 1-second clip, which does not meet our requirement for a dynamic haptic response. Thus, we modified the visual feature extraction process so that each frame of a video clip is sent to the VGG-19 network to extract its pool5 feature without averaging over the other frames. This generates a sound source localization map for each frame in a video clip. Then, a VGG-like network [53] pre-trained on AudioSet [40] was used to extract a 128-D audio representation for each 1 s audio segment. While using shorter audio clips would increase the dynamic response of the generated heatmap, it would sacrifice the accuracy of the downstream functions, since a shorter audio clip provides only limited audio features. The attention function f_att is adaptively learned from the visual feature map v_t and the audio feature vector a_t by the following equation [134]:

v_t^{att} = f_{att}(a_t, v_t) = \sum_{i=1}^{k} w_t^i v_t^i    (4.1)

where w_t is an attention weight vector representing the probability distribution across k visual regions; w_t is computed from v_t and a_t. The resulting sound localization heatmap is shown in Fig. 4.3. More examples of generated sound source localization heatmaps are available in the supplemental materials. As shown in Fig. 4.3, the audio-guided visual attention method highlights only the sound source location (e.g., the engine of a car) instead of the entire sounding object (e.g., the car). Thus, compared to other prior work [123], which highlights a very large region in the video for sound source localization, [134] provides a better model for spatial haptic use cases. Several random background regions will be attended by this model if no audio-visual events are happening. For example, if a race car appears in the visual scene but there are no engine sounds in the audio stream, the algorithm will fail to highlight the car engine region in the heatmap. This phenomenon demonstrates that the sound localization heatmap is generated under the guidance of the audio signal rather than by simply calculating the visual saliency of the scene. The performance of our method does not degrade if a random sound localization heatmap is generated due to the lack of audio signals: the intensities of our tactile effects, translated from the audio signal intensities, will also be very weak and will not distract the user. Since locating the sound source is a well-established research topic in the computer vision community with many working models, we mainly trimmed and modified the most relevant model [134], as discussed above, to use it as a building block in our automatic tactile effects generation pipeline. This sound localization module is fed with the diegetic audio obtained from the audio separation module, along with the visual frames, to generate a localization heatmap of the sounding object at each frame.
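For illustration, the following sketch implements an additive attention in the form of Eq. 4.1 with NumPy. The projection matrices U_v and U_a and the vector w_f are placeholders for learned parameters; the exact parameterization used in [134] may differ.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def audio_guided_attention(v_t, a_t, U_v, U_a, w_f):
    """Sketch of Eq. 4.1 with illustrative (untrained) projection weights.
    v_t: (k, d_v) visual features for the k = 49 regions of the 7x7 pool5 map
    a_t: (d_a,) audio feature for the current 1 s segment
    U_v: (d_v, h), U_a: (d_a, h), w_f: (h,): learned attention parameters."""
    # Score each visual region against the audio feature, then normalize
    # the scores into the attention distribution w_t over the k regions.
    scores = np.tanh(v_t @ U_v + a_t @ U_a) @ w_f   # (k,)
    w_t = softmax(scores)
    v_att = w_t @ v_t            # Eq. 4.1: sum_i w_t^i * v_t^i
    heatmap = w_t.reshape(7, 7)  # per-region weights as a localization map
    return v_att, heatmap
```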

Limitations

There are some limitations to the current audio-guided visual attention model [134]. Currently, it only works for 28 event categories (e.g., racing car, dog barking, chainsaw) due to its relatively small training dataset (4143 videos). However, the model is generalizable and can be extended to a larger number of event categories given an appropriate training dataset. Another limitation is that, when there are multiple sounding objects from the same category in the scene, the current model has difficulty finding the correct sound source. For example, a race car in the street generates much more sound than an electric car, but since any type of car is classified as the car object by our method, both cars may be highlighted in the sound source localization heatmap. This leads to a mismatch between the spatial haptic effects converted from the generated heatmap and the audiovisual content. However, selecting the correct sounding object among other objects from the same category is also a difficult task for humans if only a mono audio channel is available. The potential effects of this limitation are discussed further in Sec. 4.2.

4.1.3 Spatial Tactile Effects

In this section, we describe our spatial tactile stimuli generation pipeline based on the diegetic audio stream obtained from the audio source separation method (Sec. 4.1.1) and the sound source localization heatmap generated by the set of neural networks described in Sec. 4.1.2. To compute the intensities of the tactile effects, we use the amplitudes of the diegetic audio signal, whereas the heatmap from the sound source localization determines the spatial distribution of the tactile stimuli.

Intensity

As Fig. 4.4 shows, the intensities of the haptic signals are calculated from the amplitudes of the diegetic audio stream. Since haptic actuators have a limited frequency response range, the audio signals are first downsampled to a sampling rate slightly higher than 2F_s according to the Nyquist theorem, where F_s is the resonant frequency of the haptic actuator. A commonly used downsampling method is to keep one data point out of every n data points in an audio stream s1. However, this method overemphasizes the low-frequency components in the audio stream s1. In some cases, there can be a significant difference between the downsampled waveform and the original audio waveform, leading to a mismatch between the haptic and audio signals that compromises the user experience. Thus, this method is not adopted in our pipeline. Our system instead downsamples using a "running sum" method [90], which retains the characteristics of the audio signals.

Figure 4.4: The generation pipeline builds the spatial tactile mapping by calculating both the spatial distribution and the intensities of the tactile stimuli. The amplitudes of the audio stream after downsampling are translated to the intensities of the tactile stimuli; the sound source localization heatmap is used to determine the spatial distribution of the haptic effects. (Diagram: Audio Stream → Downsampling → Intensities; Heatmap → Contrast Stretching → Downsampling → Moving Average → Contrast Stretching → Thresholding → Spatial Distribution; both combine into the haptic signal.)

Assuming the original audio stream s1 has a sampling rate of r1 and the target downsampled data stream s2 has a sampling rate of r2, we need to convert every n = r1/r2 samples of s1 into one data point in s2. We first calculate the sum of the absolute amplitudes of every n samples in s1, obtaining a data stream s1′ that has a sampling rate of r2. We then add a minus sign to every other data point in s1′ to obtain a data stream s2 with a waveform similar to s1, as Fig. 4.4 depicts. Although there are minor differences between the original audio stream and the downsampled data stream, the overall waveform shape is maintained.
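A minimal sketch of this running-sum downsampling is shown below; the sampling rates in the usage comment are illustrative.

```python
import numpy as np

def running_sum_downsample(s1, r1, r2):
    """'Running sum' downsampling from Sec. 4.1.3: every n = r1/r2 samples
    of s1 are summed in absolute value (data stream s1'), and every other
    output point is negated so s2 keeps a waveform similar to s1."""
    n = max(int(r1 // r2), 1)
    usable = (len(s1) // n) * n
    s1_prime = np.abs(s1[:usable]).reshape(-1, n).sum(axis=1)
    signs = np.where(np.arange(len(s1_prime)) % 2 == 0, 1.0, -1.0)
    return s1_prime * signs  # data stream s2 at rate r2

# Example: 44.1 kHz audio to ~180 Hz, slightly above 2*Fs for an actuator
# with resonant frequency Fs = 80 Hz (illustrative values).
# s2 = running_sum_downsample(audio, r1=44100, r2=180)
```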

Spatial Distribution Mapping

The distribution of the spatial haptic effects is determined by the sound source localization heatmap. Since this heatmap is a probability distribution of the sound sources rather than an audio intensity map, darker areas in the heatmap do not indicate weaker audio signals in the visual scene. Thus, instead of rendering weaker tactile vibrations proportional to the heatmap values in the darker areas, we filter out these dark areas and do not render any haptic signals in those regions. To do this, we first carry out a percentile contrast stretching on the sound source localization heatmap (Fig. 4.4), with the 1st and 99th percentiles of the histogram mapped to 0 and 255 and the other pixels remapped into the 0 to 255 range. This filtering step is commonly used in image processing pipelines to boost image contrast [121]. Next, we downsample the contrast-stretched sound source localization heatmap to a 3 × 3 tactile map, which reflects the number of actuators in our vibrotactile device. A moving average is used to remove random noise in the tactile map caused by inaccuracy of the sound localization model:

M_n = 0.95 * M_{n-1} + 0.05 * M_{new}

where M_n and M_{n-1} are the filtered tactile maps at time steps n and n − 1, and M_{new} is the incoming tactile map with noise. Another contrast stretching is then carried out on the created tactile mapping, for the same reason as mentioned above. To reduce the effects of low-probability regions, we apply a thresholding similar to the method described in [74]: tactile map pixels with a value lower than a certain threshold (e.g., 150 out of 255) are removed.

The heatmap from the sound source localization can often contain noise. In Fig. 4.4, the bright region on the left of the heatmap is noise, while the bright region on the right is the actual sounding object. This noise can be detected and removed by comparison with the heatmaps from neighbouring time steps. As shown in Fig. 4.4, the distribution of the bright region in the heatmap is maintained after the contrast stretching and downsampling steps; after the moving average step, the bright pixel values on the left side of the spatial tactile mapping are significantly reduced. This is because the sound localization heatmaps (not shown in Fig. 4.4) generated at neighbouring time steps do not contain such bright regions on the left side of the mapping. Since we apply a weight of only 0.05 to the new tactile map, noise on a single frame has a minor effect on the filtered tactile map. Thus, these pixels are construed as random noise generated by the sound source localization algorithm and removed by the moving average step.
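The following sketch summarizes the per-time-step heatmap processing described above (contrast stretching, downsampling to the actuator grid, moving average, and thresholding), using the parameter values reported in the text; the block-averaging downsampler is one reasonable choice among several.

```python
import numpy as np

def contrast_stretch(img, lo_pct=1, hi_pct=99):
    """Percentile contrast stretching: the 1st/99th percentiles are mapped
    to 0/255 and the remaining values are rescaled and clipped."""
    lo, hi = np.percentile(img, [lo_pct, hi_pct])
    return np.clip((img - lo) / max(hi - lo, 1e-9) * 255.0, 0.0, 255.0)

def update_tactile_map(heatmap, prev_map, grid=3, alpha=0.05, thresh=150):
    """One time step of the spatial mapping: stretch, downsample to the
    actuator grid, moving average M_n = (1-alpha)*M_{n-1} + alpha*M_new,
    then re-stretch and threshold (parameter values follow the text)."""
    h = contrast_stretch(heatmap)
    H, W = h.shape
    # Downsample to the grid by block averaging.
    m_new = h[:H // grid * grid, :W // grid * grid] \
        .reshape(grid, H // grid, grid, W // grid).mean(axis=(1, 3))
    m = (1.0 - alpha) * prev_map + alpha * m_new   # temporal smoothing
    out = contrast_stretch(m)
    out[out < thresh] = 0.0                        # reject low-probability regions
    return m, out                                  # (filter state, tactile map)
```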

Although the moving average mitigates some harmful noise, it also reduces the pipeline's ability to capture fast-moving objects. If the pipeline is used to generate spatial tactile effects for body parts with lower spatial tactile perception resolution (e.g., the torso or arm), a longer time window can be used to remove the random noise more effectively: even if objects slightly change their positions in the video, they can still be presented by the same pixel in the spatial tactile mapping due to its lower resolution. On the other hand, if the spatial tactile stimuli are used for body parts with higher spatial tactile sensitivity (e.g., the hand or fingertip), the moving average step should assign more weight to the current video frame so that the generated spatial tactile mapping better matches the sounding object. Since our vibrotactile display outputs tactile effects on the user's back, which has a low spatial tactile sensitivity, we use an experimentally determined longer time window in the moving average step (a large weight of 0.95 for the filtered tactile map from the previous time step). After both the audio and visual components are processed, the amplitudes of the audio signals are mapped to the control voltages of each haptic actuator according to the spatial tactile map.

Each individual haptic actuator's intensity I_i, proportional to its control voltage, is given by

I_i = T_i * Amp / \sum_{i=1}^{k} T_i

where T_i is the value of the corresponding tactile map pixel, Amp is the amplitude of the audio signal, and k is the number of tactile pixels. Since the intensities of the haptic effects are designed to be proportional to the amplitudes of the audio signals, the last step of the pipeline is to normalize the root mean square (RMS) of the tactile stimuli intensities to the intensities of the audio signals.
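A minimal sketch of this per-actuator intensity computation follows; the final RMS normalization over the whole clip is omitted for brevity.

```python
import numpy as np

def actuator_intensities(tactile_map, amp):
    """Per-actuator intensity I_i = T_i * Amp / sum_k T_k (Sec. 4.1.3).
    tactile_map: thresholded 3x3 map; amp: downsampled audio amplitude."""
    T = tactile_map.flatten()
    total = T.sum()
    if total == 0.0:              # silent or fully thresholded time step
        return np.zeros_like(T)
    return T * amp / total
```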

Figure 4.5: Spatial tactile mappings generated from video examples downloaded from YouTube.

Examples of the final spatial tactile mappings created by our framework are shown in Fig. 4.5. More details are available in the supplemental materials. Video examples were downloaded from YouTube [49, 92, 63, 11, 137, 48, 129, 34]. Note that there are no tactile signals in one of the screenshots due to its low audio signal intensity.

4.1.4 Tactile Display

To evaluate the performance of our spatial tactile effects generation pipeline, we developed a 3 × 3 vibrotactile array device to render the tactile stimuli.

Location

For 4D movies and other immersive media with a haptic channel, a commonly used configuration is to apply the vibrotactile display to the viewer's back, since users are often seated and do not need to don and doff a device if it is integrated into a chair. The back also provides a relatively flat, large-area surface for spatial tactile feedback, which is needed given the relatively low tactile spatial acuity outside the hands, face, and tongue [61]. Thus, we adopt a similar setup to [74, 89] by building a 2-D vibrotactile array into a chair backrest cushion (Fig. 4.6(a)) resting against the user's back to demonstrate our automatic spatial tactile effects generation results.

Hardware Design

The design requirements for such a vibrotactile display include sufficient tactile stimuli, minimal interference between different haptic actuators, and low distracting audio noise. To generate sufficient vibration intensities, we utilize 30 W haptic actuators (Dayton Audio TT25-16 PUCK Tactile Transducer Mini Bass Shaker), as Fig. 4.6(a) depicts. This haptic actuator can provide sufficient stimulation to the user's back, which has lower tactile sensitivity than other areas commonly used for haptic effects (e.g., the hand or arm). To reduce interference among haptic actuators, the actuators are attached to a lumbar support back cushion with high-density memory foam, which provides a good balance between firmness and softness: too much softness would absorb actuator vibrations and weaken the haptic effects, while too much firmness would result in cross-talk between the different haptic actuators due to the longer vibration propagation distance, jeopardizing the localization of spatial haptic effects. Furthermore, while this actuator is essentially a voice coil, it is custom designed as a tactile transducer to minimize any distracting audio effects accompanying the vibrations. The number of haptic actuators and the array size are also important design considerations for a vibrotactile display. Prior work [62] demonstrated that a user's back supports a high tactile localization accuracy with a 3 × 3 array of vibrotactile actuators placed 60 mm apart. This configuration is also used in [73] to enhance the video viewing experience. Thus, we also arrange our haptic actuators in a 3 × 3 array.

Figure 4.6: (a) Chair cushion with haptic actuators inside, used as a vibrotactile display. (b) Diagram of the tactile rendering hardware system: the CPU sends synchronization signals to the MCU, which drives the haptic actuators with pre-calculated tactile stimuli through motor drivers.

Since our haptic actuators have a diameter of 70 mm, we placed them with a 10.5 cm inter-tactor spacing to avoid cross-talk. Response time is another consideration for the vibrotactile display. According to [72], latencies shorter than the time to play a single visual frame are in an acceptable range. Since our actuator has a frequency response range of 20 - 80 Hz, which corresponds to a rise time lower than 6 ms, it is sufficient to synchronize with videos at a 30 frames per second (fps) frame rate (33 ms per frame). To control the vibrotactile display, we designed and fabricated a driver circuit. A medium-power motor driver (Pololu MAX14870) with a peak current of 1.7 A was used for each tactile transducer. A microcontroller (Teensy 3.6) was used to connect the computer and the motor drivers for tactile effects processing. Since our vibrotactile display is installed in a chair cushion rather than worn on the body (e.g., on the hand or arm), we did not strictly optimize the form factor of the electronic components. The primary goal of this prototype is to demonstrate the feasibility and performance of our automatic haptic effects generation pipeline.

Tactile Rendering Process

We developed a C# program to render the spatial tactile mappings obtained in Sec. 4.1.3 with our vibrotactile display. Since our pipeline takes approximately 5 minutes to process a 10-second video clip, we computed the required tactile signals offline and saved them on the microcontroller. Up to 1000 seconds of tactile signals can be pre-loaded onto our microcontroller (Teensy 3.6). When the user begins watching a video, a start signal with the current video identifier is sent to the microcontroller (Fig. 4.6(b)), which then triggers the haptic actuators with the pre-computed signals. A signal is sent from the CPU to the microcontroller every 100 ms to re-synchronize the two subsystems (Fig. 4.6(b)). Our haptic actuators are low-noise voice coils with a short rise time of about 4 ms, given their frequency response of 20 - 80 Hz. Since this haptic latency is within an acceptable range [72], we did not intentionally send driving signals ahead of time to compensate for the latency.

This is in contrast to some prior work [74, 75], which used eccentric rotating mass (ERM) vibration motors for tactile rendering; due to the large latency of these actuators, the tactile stimuli need to be sent ahead of time to maintain good synchronization with the audiovisual content.
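The following sketch illustrates the host-side half of this synchronization scheme. Our actual implementation is a C# program; the sketch is in Python for brevity, and the one-byte message framing is a hypothetical example rather than our actual protocol.

```python
import time
import serial  # pyserial; the port name and message framing are assumptions

def play_with_haptics(port, video_id, duration_s):
    """Host-side synchronization sketch matching Sec. 4.1.4: send a start
    message carrying the video identifier, then a re-sync message with the
    elapsed time every 100 ms. The one-byte opcodes are hypothetical."""
    mcu = serial.Serial(port, baudrate=115200)
    start = time.time()
    mcu.write(b'S' + bytes([video_id]))     # trigger pre-loaded tactile signals
    while time.time() - start < duration_s:
        elapsed_ms = int((time.time() - start) * 1000)
        mcu.write(b'T' + elapsed_ms.to_bytes(4, 'little'))   # re-sync
        time.sleep(0.1)
    mcu.close()
```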

4.2 User Study: Evaluation of Tactile Effects based on Cross-Modal Features

In this section, we describe our user study evaluating the performance of the tactile effects that were automatically generated from cross-modal information, as described in the previous sections. We compare our cross-modality method with a state-of-the-art method that only utilizes the visual channel [74], and we also compare videos from both methods to a baseline condition without any tactile effects. Thus, three conditions were evaluated in this user study: plain videos without any tactile effects, videos with tactile effects from the modified version of the saliency-driven visual-based method [73], and videos with tactile effects from our method based on cross-modal features.

4.2.1 Participants

We recruited 20 participants (12 male, 8 female) from our institution, with ages ranging from 21 to 44 (M = 25.8, SD = 5.1). Participants were compensated 15 USD for their participation, and the experiment lasted an average of 40 minutes.

4.2.2 Study Setup

A 24-inch LCD display was used to present the videos, while noise-cancelling headphones (Audio-Technica ATH-ANC7b) were used to deliver the audio channel as well as to prevent participants from hearing the tactile transducer noise. Participants were seated approximately 65 cm from the display. A chair cushion equipped with the 3 × 3 tactile transducers (Fig. 4.6(a)), as described in Sec. 4.1.4, was used to deliver the tactile effects to the participant's back, as shown in Fig. 4.7. Participants were asked to refrain from wearing thick or multiple layers of clothing to ensure adequate delivery of the vibrations to their backs.

4.2.3 Independent Variables

As discussed earlier, the main independent variable for this study is the type of tactile effects: no tactile effects, tactile effects from the visual saliency-driven method, and tactile effects from our cross-modal method. In order to see how these compare across different types of videos, we presented participants with eight videos of varying scene complexity, such as the number of audiovisual events and the number of objects in the scene.

Figure 4.7: Participants sat against the chair cushion with vibrotactile actuators during the user study. The vibrotactile device provided tactile stimuli synchronized to the shown videos.

Tactile Effects

Three conditions were tested for each video: (1) plain video without any tactile effects, (2) tactile effects generated by the modified saliency-driven method [74], and (3) our method. Plain video was used as the baseline condition to evaluate whether adding spatial tactile effects augments the user experience. To provide a comparison with the state-of-the-art visual-based automatic tactile effects generation pipeline, the modified saliency-driven method [74] was chosen as the comparison condition. We chose not to compare with audio-based tactile effects, as the audio signal alone is not sufficient to generate spatial tactile stimuli. One issue with direct translation from an audio source is that it translates both diegetic and non-diegetic sounds into tactile effects, which may often be distracting and random; visual content is necessary to perform the sound source separation. Furthermore, most online videos do not have stereo audio information, and a mono audio channel cannot be translated into spatial tactile stimuli. Since we propose a spatial tactile effects generation framework in this thesis, we only compared methods that can be used to generate spatial haptic effects.

The saliency-driven method [74] first builds a spatial saliency map for each frame by finding the visually apparent features. It also constructs a temporal saliency map from frame-to-frame differences to emphasize the dynamic motion of objects. However, as mentioned in [74], this method does not distinguish foreground motion from background motion. If there are abrupt camera movements in the video, it cannot avoid assigning tactile effects to the moving background, even though the foreground objects (e.g., the car in a racing game) should be the main focal point of the tactile rendering.

This limitation compromises the performance of the visual-only method in many videos. Thus, in our implementation of the saliency-driven automatic tactile generation pipeline, we modified the method of Kim et al. by adopting a neural-network-based method [59] for detecting visual saliency. Recent work in video saliency research [59] established a large-scale database of eye-tracking videos to train deep neural networks (DNNs) for the prediction of visual saliency. Because it is trained on generic videos, this method generalizes to other videos and thus serves as a better representative of the visual-only approach to automatic tactile effects generation. We believe this improvement to the saliency-driven method enables a more even-grounded comparison between the visual-only and cross-modality methods for spatial tactile effects generation.

To implement the modified version of the saliency-driven spatial tactile effects generation pipeline, we first used the method described by Jiang et al. [59] to obtain a visual saliency heatmap for each frame. We then used that saliency map as the input to the rest of the pipeline detailed in [74] to create a spatial tactile map. We applied the adaptive thresholding technique described in [74], as it had the best performance according to their user evaluation. Specifically, a contrast stretching was carried out to map the 1st and 99th percentiles of the heatmap histogram to 0 and 255 and remap the other pixels into the 0 to 255 range. Next, the heatmap was downsampled to a 3 × 3 tactile map to match the number of haptic actuators in our vibrotactile display. A moving average was then applied to reduce noise. As described in [74], we applied a deadzone mapping to reject pixels below a certain threshold (e.g., 20 out of 255) and remapped the other pixels to 0 to 255. The RMS of the tactile signal intensities was normalized to the RMS of the amplitudes of the visual saliency heatmaps. Another normalization factor was used to set the tactile signal intensities of the saliency-driven method and our method to a similar range for a fairer comparison. We empirically tested different parameters to obtain the above values so that the visual saliency heatmaps produced by the neural network method [59] work well with the rest of the pipeline described by Kim et al. [74]. Visual saliency heatmaps and the spatial tactile mappings generated by the saliency-driven method are included in the supplemental material for demonstration.
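As an illustration of how the baseline's deadzone mapping differs from the thresholding in our own pipeline, the following is a minimal sketch; the linear remapping of the surviving pixels is an assumption, as the exact remapping is not fully specified in [74].

```python
import numpy as np

def deadzone_map(tactile_map, thresh=20):
    """Deadzone mapping used in our reimplementation of the saliency-driven
    baseline [74]: pixels below `thresh` (out of 255) are rejected and the
    surviving pixels are remapped to the full 0-255 range."""
    out = np.asarray(tactile_map, dtype=float).copy()
    out[out < thresh] = 0.0
    peak = out.max()
    if peak > thresh:
        nz = out > 0
        out[nz] = (out[nz] - thresh) / (peak - thresh) * 255.0
    return out
```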

Stimuli

We downloaded eight video clips from YouTube for the experiment; see Table 4.1 for a brief summary of each video. The complexity of the scene, such as the number of audiovisual events and the parameters of the objects in the scene (e.g., number, size, and motion of objects), are major factors that can influence the quality of the generated tactile stimuli. Thus, we selected example videos for the user study that cover various combinations of these factors to more thoroughly evaluate the different tactile effects generation methods.

M1 [49] presents the continuous motion of a plane going from the right to the top left of the scene.

Since the object is small and fast, it tests the ability of an algorithm to accurately track the location of an object within the scene. M2 [92] shows a truck moving on a road. Although it is easier to spatially track the object due to its larger size, an algorithm still needs to understand several special audiovisual events (truck honking) and emphasize them through tactile effects. M3 [63] consists of a collision between a train and a truck. It tests whether an algorithm can successfully understand the scene and accurately render proper tactile stimuli at the right position and time. Since there is more than one moving object in the scene, it is slightly more complicated than M2. M4 [11] is a video of a wall-mounted clock. It has much fewer spatial features, with just a single moving part (i.e., the second hand) in the foreground. However, it has many audiovisual events, as each tick could be enhanced with tactile effects. Since the overall scene is quiet and there are no significant physical events, an algorithm should be intelligent enough to provide subtle tactile effects and refrain from providing distracting stimuli to the users. M5 [137] is an example with many sudden motions along with multiple audiovisual events: a dog moves around in the video and barks sporadically, and there is also a narrator who is not visually present in the scene. This clip tests whether an algorithm can quickly and accurately track a highly dynamic animal; the ability to understand and separate the relevant audio signals (i.e., diegetic sound) is also an important factor for providing semantically relevant haptic effects. M6 [48] depicts a person walking in a serene winter forest with a thick layer of snow. It is appropriate to render very weak or almost no tactile effects, since there are no significant physical events and tactile effects should not stimulate users in such a tranquil scene. M7 [129] shows a person cutting a tree log with a chainsaw. There are multiple objects moving in the scene (e.g., the person's body and the chainsaw), so it can be a challenge for an algorithm to identify the correct positions for the tactile stimuli. The person finishes cutting the log in the middle of the video, so the accompanying tactile stimuli should also terminate at the right time; while his body continues to move around after the log-cutting event, an algorithm should understand the semantics and stop generating irrelevant tactile effects. M8 [34] is spatially more complicated than the other videos, as there are many moving objects in the scene (e.g., buildings, cars, people). Having multiple objects from the same category (e.g., a line of cars) adds an extra layer of complexity, especially for our cross-modal method.

4.2.4 Design and Procedure

We used a one-factor within-subject design where the independent variable was the type of tactile effects presented with a video. As discussed in Sec. 4.2.3, three conditions (no tactile effects, the modified version of the saliency-driven method [74], and our cross-modal method) were presented for each of the 8 video examples downloaded from YouTube. Thus, every participant experienced 3 × 8 = 24 trials in a random order.

Table 4.1: Summary of Videos Used in the User Study

Movie     M1                M2             M3                M4
Summary   flying airplane   moving truck   train collision   wall-mounted clock
Length    10s               10s            10s               10s

Movie     M5            M6              M7           M8
Summary   barking dog   serene forest   log sawing   busy street
Length    20s           10s             10s          20s

(Excerpt rows showing video screenshots are omitted.)

A practice session demonstrating the capabilities of the system was carried out before the formal user study in order to reduce the "novelty effect" for new users. After every eight trials, participants were given a two-minute break. After each trial, participants filled out a modified version of a quality of experience (QoE) questionnaire (see Table 4.2). We designed this around the concept of Presence. Witmer and Singer [144] identified four factors of presence: (1) Control, (2) Sensory, (3) Realism, and (4) Distraction. Since the participants passively received the tactile effects, the "Control" category is not relevant for this study. We also excluded the "Realism" item, as our tactile effects are designed to enhance the audiovisual experience and do not necessarily have to replicate the real world. The "Sensory" factor assesses how each modality is solicited in the experience, and the "Distraction" item helps evaluate whether our tactile effects are distracting to the participants; these two items were included in the questionnaire. We also added two items, "Novelty" and "Immersion", which are frequently used for the evaluation of tactile effects accompanying multimedia [74, 75]. We developed and asked a single question for each category, as shown in Table 4.2. For sessions with tactile effects, three additional questions were added to the questionnaire to better understand whether the haptic content matched the video spatially and temporally (Table 4.2). All questions use a 0-10 Likert scale, where 10 represents strong agreement with the statement and 0 indicates strong disagreement.

Table 4.2: Questionnaire Used in the User Study

No.   Factor        Question                                                                        Scale
Q1    Sensory       The multi-sensory interaction of the system helped with its content delivery.   0-10
Q2    Distraction   I can focus on the content without being distracted by the delivery methods     0-10
                    (e.g., display, headphone and haptic actuators).
Q3    Novelty       I found this system interesting to use.                                         0-10
Q4    Immersion     I was immersed in the movie.                                                    0-10
Q5    --            The vibrations were spatially matched with the movie.                           0-10
Q6    --            The vibrations were temporally matched with the movie.                          0-10
Q7    --            It was straightforward to understand why there were vibrations.                 0-10

4.2.5 Results and Discussion

Overall Results and Trends

The aggregated results from all of the videos were analyzed. We applied a one-way within-subject ANOVA with a sphericity test and Greenhouse-Geisser corrections on the QoE questions (Table 4.2, Q1-Q4) to test whether there were any statistical differences among the tactile effect conditions (see Table A.1). Furthermore, we carried out a set of paired t-tests with Bonferroni corrections on all questions (Table 4.2, Q1-Q7) to evaluate the statistical significance of the results between each pair of the three conditions (Fig. 4.8). Since there are only two conditions for the tactile questions (Table 4.2, Q5-Q7), we only show ANOVA results for those, as shown in Table A.1.

As shown in Fig. 4.8, our cross-modal method significantly outperformed the saliency-driven method for all items in the QoE (Q1-Q4, p < 0.001). The plain videos were not statistically different from the videos based on the saliency-driven condition in terms of Sensory, Novelty, and Immersion, while the videos from the saliency-driven method were significantly worse than the plain videos in terms of Distraction (p < 0.001). Although the videos from the saliency-driven method were not rated higher than the plain videos in aggregate, the method was more effective in specific scenarios, which we discuss in the following paragraphs. Between the videos with tactile effects from our cross-modal method and the plain videos, there were significant differences (p < 0.01) in the Sensory, Novelty, and Immersion categories. Our method was rated with a lower mean score than the plain videos on the Distraction item (Q2), albeit in a statistically insignificant manner (p > 0.05). The results demonstrate that, for the various examples we tested, our method helped to improve the user experience.

The tactile questions (Q5-Q7) in Fig. 4.8 were aimed at evaluating how well the tactile effects match the audiovisual content. The overall results for the eight videos in Fig. 4.8 show that our cross-modal method performs significantly better (p < 0.001) than the modified saliency-driven method in terms of synchronizing with the videos both spatially (Q5 in Table 4.2) and temporally (Q6 in Table 4.2). Participants were also better able to understand the rationale behind the tactile effects (Q7 in Table 4.2).
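For reference, the pairwise analysis can be sketched as follows; the data layout (per-participant mean scores per condition) is assumed, and the ANOVA step with Greenhouse-Geisser correction is omitted.

```python
from scipy import stats

def paired_tests_bonferroni(scores, pairs):
    """Sketch of the per-question pairwise analysis: paired t-tests with a
    Bonferroni correction over the number of comparisons. `scores` maps a
    condition name to a list of per-participant means (layout assumed)."""
    m = len(pairs)
    results = {}
    for a, b in pairs:
        t, p = stats.ttest_rel(scores[a], scores[b])
        results[(a, b)] = (t, min(p * m, 1.0))   # Bonferroni-corrected p
    return results

# Example for one question across the three conditions:
# res = paired_tests_bonferroni(q1,
#     pairs=[("plain", "saliency"), ("plain", "ours"), ("saliency", "ours")])
```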


Figure 4.8: Aggregated results for all eight examples. * p < 0.05, ** p < 0.01, *** p < 0.001. The results compare the performance of the baseline condition (i.e., plain videos without tactile effects), the modified saliency-driven method [74], and our method for the seven questionnaire items (Q1-Q7, see Table 4.2).

In-depth Analysis of Each Video

Here, we provide an in-depth analysis of the advantages and disadvantages of the three conditions for each video, applying the same analysis described in Sec. 4.2.5. As described in Sec. 4.2.3, M1 depicts a flying airplane, which is challenging because an algorithm needs to track a highly dynamic object. Although our method performed better than the saliency-driven method in the aggregated results, Fig. 4.9(b) M1 shows that our method was not significantly better than the saliency-driven method for this video. This is within our expectation, since the saliency-driven method is specialized in tracking moving objects. This result confirms that our method based on cross-modality information is as good as the visual-only method for localization. A further comparison of the QoE results showed that both methods were statistically better (p < 0.01) than the plain video for the Novelty item, while only our method significantly improved the Immersion item (p < 0.01).

M2 is an example with more temporally complex audiovisual events (e.g., a truck honking). Results in Fig. 4.9(b) M2 show that our method is significantly better (p < 0.01) than the saliency-driven method for temporal synchronization (Q6), indicating that cross-modality information is more effective than pure visual features in temporally synchronizing with the events in the scene.


Figure 4.9: (a) Subjective ratings for QoE. (b) Subjective ratings for the tactile-related questions. Standard error bars are shown for each measurement. * p < 0.05, ** p < 0.01, *** p < 0.001. The results compare the performance of plain videos, the modified saliency-driven method [74], and our method for the seven questionnaire items (Q1-Q7, see Table 4.2).

Although our method obtained a higher mean score for spatial synchronization (Q5), there was no statistically significant difference, which suggests that both methods have comparable performance for spatially tracking a large moving object. The QoE results in Fig. 4.9(a) M2 demonstrate that videos from our cross-modal method are significantly better (p < 0.01) than the plain videos in terms of the Sensory, Novelty, and Immersion items. Since the saliency-driven method is less effective at temporal synchronization with the audiovisual content, users may have found its tactile effects more distracting, as they were not triggered at the appropriate time.

M3 displayed a train running into a truck. It is more complex than M2, as discussed in Sec. 4.2.3. Although the collision between the two moving objects produced a cloud of dust, which may potentially be detected by a visual-only method, it is more efficient to locate the collision with the audio information. This is demonstrated in Fig. 4.9(b) M3, in which our method was significantly better (p < 0.001) for all of the tactile-related questions (Q5-Q7). Note that, unlike in M1 and M2, our method was this time also better for spatial synchronization (Q5), due to its ability to find the sounding object, which better matched the collision site. In addition, as shown in Fig. 4.9(a) M3, our method was rated significantly higher than the plain video condition in terms of Sensory and Novelty (p < 0.05), and higher than the saliency-driven method for Immersion (p < 0.05). We observed that users found the tactile effects generated by the saliency-driven method significantly (p < 0.05) more disturbing than the plain video, likely because the tactile effects did not synchronize with the collision. The results for M3 show that our cross-modal method is able to temporally track an event with higher precision than the visual-based method.

M4 mainly tested the algorithms' ability to generate spatial tactile effects for multiple audiovisual events (i.e., clock ticking) in a relatively tranquil scene. As Fig. 4.9(a) depicts, the saliency-driven method does not work well for this case. For all items in the QoE, our method was rated significantly higher (p < 0.01) than the saliency-driven method. In addition, our method significantly outperformed (p < 0.001) the saliency-driven method for all tactile-related questions (Fig. 4.9(b)). This is mainly because our method takes the audio features into account, thus tracking the clock ticking more accurately. In contrast, the saliency-driven method generated tactile effects both for spatially salient objects (e.g., the tick marks on the clock) and temporally salient events (e.g., the motion of the second hand), so users were confused by the unnecessary tactile stimuli. Since our method normalizes the tactile effect intensities to the RMS of the sound track, it can present an appropriate amount of tactile effects for tranquil scenes. Because tranquil scenes may still contain visually salient features, it is difficult to generate tactile stimuli with appropriate intensities from the visual information alone; thus, the saliency-driven method was not as good as our method in this example.

M5 evaluated the algorithms' performance in both spatially and temporally tracking multiple audiovisual events. As Fig. 4.9(a) depicts, for the Sensory, Novelty, and Immersion items, our method was significantly better (p < 0.001) than both the plain video condition and the saliency-driven method. In terms of the Distraction item, our method obtained a mean score similar to that of

the plain video condition, while the saliency-driven method was significantly worse (p < 0.001) than both the plain video condition and our cross-modal method. As shown in Fig. 4.9(b), our method significantly outperformed (p < 0.001) the saliency-driven method in both spatially and temporally matching the tactile effects with the audiovisual content. While the saliency-driven method generated more continuous tactile stimuli due to the movement of the dog in the scene, our method took the audio features into account and was thus able to emphasize the tactile effects on the more semantically salient audiovisual events (e.g., the dog barking). This example demonstrated our method's ability to understand the scene semantics and recognize multiple audiovisual events correctly.

M6 is a tranquil scene with a person walking in a serene winter forest. It mainly tested whether the algorithms can refrain from generating inappropriate tactile effects when there are no significant physical events in the scene. As shown in Fig. 4.9(a), both the plain video condition and our method were significantly better (p < 0.01) than the saliency-driven method for the Sensory, Distraction, and Immersion items. Note that our method generates tactile stimuli with intensities proportional to the intensities of the audio signals; thus, in a tranquil scene, the created tactile effects were minimal. In contrast, the saliency-driven method did not perform well in this example (Fig. 4.9(a)), since it generated continuous and significant tactile stimuli because of the salient motion of the walking person in the scene. This example showed the importance of analyzing cross-modality information to achieve a better semantic understanding of scenes that do not contain significant physical events.

M7 was used to compare the algorithms' ability to analyze the activities in the scene and render tactile stimuli matching the proper scene semantics. From Fig. 4.9(a) we can observe that, in terms of the Sensory and Novelty items, our method was significantly better than the plain video condition (p < 0.01) and the saliency-driven method (p < 0.05). The saliency-driven method was also significantly worse (p < 0.01) than the plain video in terms of Distraction. Although our method and the saliency-driven method had similar mean scores for spatially matching the audiovisual content (Fig. 4.9(b)), our method was significantly better (p < 0.001) in terms of temporal synchronization. Participants also had a significantly better understanding of why there were tactile stimuli for the videos generated by our method (Q7) (p < 0.05). Since our method takes the audio features into account when generating tactile effects, it is able to stop right when the person stops sawing in the scene, whereas the saliency-driven method continued to provide tactile stimuli throughout the video because of the visually salient body movements of the person in the scene.

M8 is an example with much higher spatial complexity. Although our method showed significantly better results than the plain video condition (p < 0.01) and the saliency-driven method (p < 0.05) for the Novelty item (Fig. 4.9(a)), all three conditions had similar mean scores in the Sensory, Distraction, and Immersion categories. From Fig. 4.9(b), we can observe that the saliency-driven method significantly outperformed (p < 0.05) our method in spatially matching the tactile stimuli to the audiovisual content. Our method performed less well in this example due to one of the limitations of our sound source localization algorithm.

to one of the limitations of our sound source localization algorithm. If there are multiple sounding objects from the same category (e.g. a fleet of cars) in the scene, our sound source localization model is not smart enough to determine which of the objects is the main sounding source. Thus, tactile effects can be assigned to any one of the objects in the same category. We will discuss more about this in Sec. 4.3. Despite this limitation, our method still showed similar performance to the plain video condition and the saliency-driven method in terms of the QoE (Fig. 4.9(a)). Participants also significantly better (p ¡ 0.01) understood why there were tactile effects for videos generated by our method (Q7 Fig. 4.9(b)). From the examples above, we can conclude that our cross-modal method outperformed the saliency-driven method in terms of both the QoE and the synchronization of tactile effects. We infer that the cross-modality features that our method used helped to improve synchronization of the spatial tactile stimuli with the audiovisual content. Although the saliency-driven method and our method obtained comparable results for spatially tracking moving objects and adding tactile effects to the desired location in some video examples (Fig. 4.9(b) M1, M2, M6, M7, M8), our method is more effective in temporally synchronizing tactile effects to the examples (Fig. 4.9(b) M2, M3, M4, M5, M7). Thus, cross-modal features allow algorithm to provide a more spatiotemorally synchronized tactile effects and improve the overall user experience compared to methods based only on the visual features.

4.3 Limitations and Future Work

One of the limitations of our current pipeline is that the sound source localization model we used [134] can only identify 28 distinct event categories. If none of the 28 events are present in a video, our model will generate a random heatmap for sound localization, which substantially diminishes the user experience. Thus, we only used 10–20 second long video clips containing events our model was previously trained on. This constraint allows a more meaningful comparison in our user study. However, we can overcome this limitation in the future by collecting larger, more diverse video datasets and training the model on them. Since our main contributions are the end-to-end demonstration and evaluation of a pipeline that uses cross-modality information for automatic generation of tactile stimuli, we currently only demonstrate our results with videos within the 28 event categories.

The sound source localization model also has a limitation when dealing with videos containing multiple sounding sources in the same scene, especially when they are from the same category (e.g., a fleet of cars). Although the model will be able to extract features from the audio signals and assign one or several cars as the sounding objects, the real sounding source may be a different car within the scene. This limitation can potentially be resolved in the future by training the model on videos with stereo audio channels to help understand the locations of the sounding objects.

The audio source separation method we use [104] still has room for improvement. A portion of non-diegetic sounds sometimes remains after the separation. However, this part of the pipeline could easily be replaced by newer emerging models from the audio source separation field.

Another potential limitation is that our cross-modality method for rendering tactile effects relies on the audio information of the video. If a video is recorded without any sound or at a very low amplitude (e.g., a microphone placed too far from the action), our cross-modal method would not be able to locate sounding objects in the scene and thus cannot render relevant tactile effects. Additionally, there may be some types of diegetic events that haptic designers wish to render but that lack a significant audio component; for example, a sponge being pressed involves a significant physical action but generates little or no sound. Our method will not work for these non-sounding events. Future work could investigate how to combine our cross-modality method with other techniques to best capture a given scene.

Our automatic spatial tactile effects generation pipeline consists of two neural networks to separate the diegetic audio signals and locate the sound source. Due to the complexity of the data structures, it takes them 2–3 minutes to process a 10-second video, and the processing time of any video is linearly proportional to its length. Thus, the current method is inadequate for real-time generation of tactile effects. However, most content on online video platforms (e.g., YouTube) is not streamed live but rather pre-uploaded, which allows us time to pre-compute the tactile effects. One future direction is to develop a pipeline that can generate tactile effects in real time to meet the requirements of live videos. Considering the rapid development of application-specific integrated circuits (ASICs) for computer vision, this limitation can potentially be solved in the near future.

Haptic effects are mostly used to augment physical events that happen in the scene and make users feel the sense of "being physically present" [29]. Therefore, we mainly focus on adding haptic effects to the diegetic audiovisual content. However, haptic effects can also be used to enhance non-diegetic components of videos (e.g., ambiance or emotion) [29]. Researchers have looked into affective haptics, which helps haptic designers add non-diegetic haptic effects [135, 128]. However, this direction is still relatively under-explored [29], with no available method that can automatically generate non-diegetic haptic effects for audiovisual content. Thus, future work should investigate how to augment both the diegetic and the non-diegetic content with haptic effects.

Our vibrotactile display has a 3 × 3 array of haptic actuators inside the chair cushion. Currently, tactile effects can only be rendered at each of the haptic actuators. In order to render tactile stimuli at arbitrary locations with a fixed array of actuators, Israr et al. proposed a method to modulate multiple actuators to render tactile stimuli in between them [56]. Another potential approach is to leverage a swarm of robots [84, 71], each carrying a tiny haptic actuator, as a reconfigurable haptic display. Further research would be needed to optimize the locations of the tactors in such a reconfigurable vibrotactile display to match the audiovisual content.
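As a rough illustration of the actuator-modulation idea, the sketch below implements the energy-summation "phantom tactor" interpolation commonly associated with Israr et al.'s surround haptics work [56]; the one-dimensional formulation and the function name are our own simplification, not the authors' implementation.

    import numpy as np

    def phantom_intensities(beta: float, a_virtual: float):
        """Drive two physical actuators so a virtual tactor is felt between them.

        beta is the normalized position of the virtual tactor between actuator 1
        (beta = 0) and actuator 2 (beta = 1); a_virtual is the desired perceived
        intensity. The energy model keeps a1**2 + a2**2 == a_virtual**2 so the
        perceived magnitude stays constant as the virtual tactor moves.
        """
        a1 = np.sqrt(1.0 - beta) * a_virtual
        a2 = np.sqrt(beta) * a_virtual
        return a1, a2

    # Example: a virtual tactor 30% of the way from actuator 1 to actuator 2.
    a1, a2 = phantom_intensities(beta=0.3, a_virtual=1.0)  # -> (0.837, 0.548)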

4.4 Conclusions

We developed a framework to automatically generate spatial tactile effects based on cross-modality features from a video. First, neural networks are used to separate the diegetic sound information, which is then used to locate the sounding object in the scene. The intensity of the diegetic audio is translated to the intensity of the tactile stimuli, while the probability heatmap of the sounding object is mapped to the spatial distribution of the tactile effects. Using a 3 × 3 array of haptic actuators on the backs of the users, we conducted a human subject experiment to evaluate and compare videos with tactile effects from our cross-modal method, a modified version of the saliency-driven visual-based method [74], and videos without any tactile effects. The study results demonstrate that the spatial tactile effects generated by our cross-modal framework are more promising in providing spatiotemporally synchronized and immersive content than those generated based on visual features only.
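To make the intensity and heatmap mappings concrete, the following is a minimal sketch of the per-frame combination step, assuming a sound-source probability heatmap and a normalized RMS amplitude of the diegetic audio are already available for the frame; the pooling scheme and all names are our own simplifications.

    import numpy as np

    def tactile_mapping(heatmap, audio_rms, grid=(3, 3)):
        """Combine a sound-source heatmap and an audio intensity into
        per-actuator drive levels for one video frame.

        heatmap:   HxW array of sound-source probabilities.
        audio_rms: RMS amplitude of the diegetic audio, normalized to [0, 1].
        Returns a grid-shaped array of drive intensities in [0, 1].
        """
        h, w = heatmap.shape
        gh, gw = grid
        # Block-average the heatmap into one cell per actuator.
        pooled = heatmap[:h - h % gh, :w - w % gw].reshape(
            gh, (h - h % gh) // gh, gw, (w - w % gw) // gw).mean(axis=(1, 3))
        peak = pooled.max()
        spatial = pooled / peak if peak > 0 else pooled
        # The most probable location vibrates at the audio intensity; the
        # other actuators scale down with their localization probability.
        return spatial * audio_rms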

Chapter 5

Conclusions and Future Work

5.1 Conclusions and Future Work

In recent years, spatial haptic interaction has been investigated for various applications in the fields of medical devices, accessibility, wearable devices, and entertainment. 2D tactile displays and 2.5D tactile shape displays can serve as platforms that provide meaningful spatial haptic feedback to users. Although many researchers have explored the field of tactile displays, there is still a lack of approaches for obtaining low-cost tactile display devices and tactile content. To address the current hardware challenges of tactile shape displays, we pursued an overall strategy of developing electroadhesive-brake-based tactile shape displays. By accepting a reduced refresh rate, brake-based tactile shape displays gain the key advantages of low cost and a compact form factor. To automatically generate tactile content with satisfactory accuracy, we made the design decision to translate cross-modality features into tactile stimuli. This chapter summarizes the contributions of this dissertation towards building affordable brake-based tactile displays enabled by batch fabrication methods and towards developing automatic, low-cost tactile content authoring methods. Potential future work to extend the current results is also discussed.

5.2 Restatement of Thesis Contributions

5.2.1 Electrostatic Adhesive Brakes for 2.5D Tactile Shape Display

In this project we investigated electrostatic adhesive brakes towards the design of 2.5D brake-based tactile shape displays. Such displays have advantages including low cost, high resolution, and roll-to-roll fabrication compatibility thanks to the brake mechanism employed in the system. First, a theoretical model was applied to study the mechanism of the electrostatic adhesion brake. After exploring the design space of the electrostatic adhesion brake, we developed fabrication methods that are compatible with solid-state electronics manufacturing processes to build real electrostatic adhesion brakes with high resolution. We further characterized the fabricated brakes by measuring their contact force across various design parameters, as well as their refresh rate and robustness under different conditions.

We developed a brake-based tactile shape display prototype based on the individual electrostatic adhesion brakes we constructed. The tactile shape display prototype can render 2.5D shapes by following a pre-defined process flow. To further increase the total contact force provided by the tactile shape display, an additional mechanical clutch was implemented in the system. We carried out a user study to compare the performance of our tactile shape display with 3D printed shapes; no statistically significant difference was observed in terms of shape recognition rate or user response time. Our investigation of electrostatic adhesive brakes for refreshable 2.5D tactile shape displays has provided a feasible hardware platform towards the goal of low-cost devices for spatial haptic interaction.
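As a reminder of the first-order physics behind such brakes, a common parallel-plate estimate (our simplified notation; the full model in Chapter 2 is more detailed) gives the normal electrostatic pressure on a conductive pin clamped against a dielectric-covered electrode at voltage V, and the resulting friction-limited holding force:

    P = \frac{\varepsilon_0 \varepsilon_r V^2}{2 d^2}, \qquad F_{\text{hold}} = \mu P A,

where ε0 is the vacuum permittivity, εr and d are the relative permittivity and thickness of the dielectric, A is the overlap area, and µ is the friction coefficient at the interface. The quadratic dependence on V and on 1/d is what makes thin, high-permittivity dielectric films attractive for low-voltage operation.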

5.2.2 Electrically Programmable Auxetic Materials for 2.5D Shape Displays

The second hardware platform we developed to achieve low-cost spatial haptic interaction devices is a 2.5D shape display based on electrically programmable auxetic materials. A 2.5D shape is rendered by locking a set of faces of the auxetic material, in one of two different ways, while inflating the system with air. This shape display shares the low-cost advantage of the electrostatic-adhesion-brake-based 2.5D tactile shape display of Sec. 5.2.1, since it can utilize the same electrostatic brake technology. The auxetic materials that constitute the shape display can also be fabricated in a batch process with roll-to-roll manufacturing techniques. Furthermore, the device has a large shape-rendering displacement because it employs the formable-crust mechanism.

Since auxetic materials have a relatively complicated structure, we built a simulation based on a spring-damper model, developed in the Rhino and Grasshopper software, to study the design space of the auxetic 2.5D shape display. We discussed the meaningful design parameters for the auxetic 2.5D shape display and the metrics to evaluate the rendered 2.5D shape. To optimize the design parameters for better shape-rendering ability, we used the simulation model to assess the metrics of the 2.5D shapes rendered by auxetic 2.5D shape displays with different design parameters. After investigating the design space, we developed neural-network-based algorithms to render a target shape by determining the set of faces to lock before inflation.

Based on the knowledge we obtained in simulation, we constructed an experimental passive auxetic 2.5D shape display prototype and demonstrated the shape-rendering mechanism by locking a set of faces. Furthermore, we measured and modeled the brake force required to lock the faces of the auxetic material under the two locking strategies. We discussed the potential of applying an electroadhesion mechanism to lock the 2.5D auxetic shape display based on the locking-force measurement and modeling results. The auxetic 2.5D shape display we proposed provides another possible hardware platform for low-cost spatial haptic interaction.
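For intuition about the simulation approach, here is a minimal standalone sketch of one integration step of a node-and-edge spring-damper mesh; the dissertation's model was built in Rhino and Grasshopper, so this Python version only illustrates the numerical scheme, and all parameter names are our own.

    import numpy as np

    def step_spring_damper(pos, vel, edges, rest_len, k, c, dt, ext_force):
        """One explicit-Euler step of a spring-damper mesh with unit node mass.

        pos, vel:  (N, 3) node positions and velocities.
        edges:     (E, 2) index pairs of connected nodes (hinged auxetic faces).
        rest_len:  (E,) edge rest lengths; k, c: spring and damping constants.
        ext_force: (N, 3) external forces, e.g. inflation pressure on the membrane.
        """
        force = ext_force.copy()
        for (i, j), l0 in zip(edges, rest_len):
            d = pos[j] - pos[i]
            length = np.linalg.norm(d)
            if length < 1e-9:
                continue
            n = d / length
            # Hooke spring along the edge plus damping of the relative velocity.
            f = k * (length - l0) * n + c * np.dot(vel[j] - vel[i], n) * n
            force[i] += f
            force[j] -= f
        vel = vel + dt * force
        pos = pos + dt * vel
        return pos, vel

In such a model, locking a face can be emulated by zeroing the velocities of its nodes before a step, which is one way to approximate the two locking strategies numerically.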

5.2.3 Automatic Generation of Spatial Tactile Effects by Analyzing Cross-modality Features in a Video

In addition to hardware platforms, content authoring is also a key bottleneck for low-cost spatial haptic interaction. To address the challenges of current authoring methods based on manual editing, inertial sensors, audio signals, or visual content, we developed an automatic spatial tactile effects generation pipeline that analyzes cross-modality features in a video. As demonstrated in our user study, it has the advantages of providing better spatiotemporal synchronization between the audiovisual content and the accompanying tactile effects, as well as a more immersive user experience.

The automatic spatial tactile effects authoring method we proposed is a generation pipeline. It first separates the diegetic audio signals from the audio stream, since they are more relevant to the tactile effects. To calculate the spatial distributions of the tactile effects, our framework then creates a heatmap of the sound sources in the scene by looking at both the diegetic audio signals and the visual features of the video. The intensities of the spatial tactile effects are translated from the intensities of the diegetic audio signals. Finally, the spatial tactile mappings are created by combining the spatial distribution and the intensities of the spatial tactile effects obtained in the previous steps of the pipeline.

To evaluate the performance of our automatic spatial tactile effects generation framework, we built a vibrotactile display as well as a control program to render the tactile stimuli in sync with the audiovisual content. A user study was conducted to compare videos accompanied by tactile stimuli generated by our framework, plain videos without any tactile effects, and videos with tactile stimuli generated based on visual features. Our method performed better than the two control conditions in terms of spatiotemporal synchronization with the audiovisual content and the quality of user experience. The spatial tactile effects generation framework we developed enables automatic authoring of vast amounts of tactile content, thus paving the way for low-cost spatial haptic interaction.
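The overall flow of the pipeline can be summarized in a few lines of Python; separate_diegetic and localize_source are stubs standing in for the two pretrained networks, and the frame-rate bookkeeping is our own simplification.

    import numpy as np

    def author_tactile_track(frames, audio, separate_diegetic, localize_source,
                             grid=(3, 3)):
        """Sketch of the authoring pipeline: one grid-shaped tactile map per frame.

        frames: list of video frames; audio: 1-D waveform covering the clip.
        Assumes the heatmap height/width are divisible by the grid dimensions.
        """
        diegetic = separate_diegetic(audio)              # step 1: source separation
        n = len(diegetic) // len(frames)                 # audio samples per frame
        track = []
        for t, frame in enumerate(frames):
            chunk = diegetic[t * n:(t + 1) * n]
            rms = float(np.sqrt(np.mean(np.square(chunk))))   # step 2: intensity
            heat = localize_source(frame, chunk)              # step 3: HxW heatmap
            gh, gw = grid                                     # step 4: combine
            pooled = heat.reshape(gh, heat.shape[0] // gh,
                                  gw, heat.shape[1] // gw).mean(axis=(1, 3))
            track.append(pooled / max(pooled.max(), 1e-9) * rms)
        return track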

5.3 Future Work

5.3.1 Hardware Design and Content Authoring for Real-time Low-cost Spatial Haptic Interaction

The hardware platforms we built for spatial haptic interaction are based on brakes. The 2.5D tactile shape display described in Chapter 2 uses electrostatic adhesive brakes, while the auxetic 2.5D shape display described in Chapter 3 also relies on brakes to lock the faces of the auxetic material. While brake-based shape displays have the advantage of low cost due to their compatibility with batch fabrication processes (Sec. 1.2), they are only "refreshable," with a response time on the order of seconds. It would be meaningful to develop low-cost shape displays with the ability to interact with users in real time. One approach would be to significantly reduce the refresh time of the shape displays so that they are perceived as dynamically responding devices.

There are possible technical solutions to this issue, for example the electrostatic inchworm linear actuator described by Yeh et al. [149]. In that work, two clutch-drive electrostatic actuators take turns engaging and disengaging a sliding shuttle to drive the actuator forward (see the sketch after this section). Because it can be fabricated with a silicon-on-insulator (SOI) process, this approach can maintain low cost while increasing the interactivity of the tactile shape display by providing dynamic responses. The main challenge in applying this technique to our tactile shape display is the small force demonstrated by the electrostatic actuator [149], since it is a MEMS device. Future work should focus on scaling the technique to meet the force requirement of a mechatronic device such as our tactile shape display.

The auxetic 2.5D shape display we developed in Chapter 3 is driven by a pneumatic inflation system to render the 2.5D shape while a set of faces of the auxetic material is locked by the plate locking strategy or the cell locking strategy. A set of brakes is turned on before inflating the system; during inflation, the unlocked faces of the auxetic material travel along the surface of the membrane by rotating around their hinge points. Towards a real-time device, we could block the relative movement between the unlocked faces and the membrane by turning the brakes on at specific points in time rather than only before inflation. The auxetic 2.5D shape display could then change its shape without deflating the entire chamber.

The automatic spatial tactile stimuli generation framework discussed in Chapter 4 uses two neural network models to separate the diegetic audio signals and create the sound source localization heatmaps, plus extra steps to calculate the intensities of the tactile stimuli and the final spatial tactile mappings. Due to the relatively complicated data structures of the neural networks and the number of steps required in the framework, the entire pipeline takes around 5 minutes to process a 10-second video clip, so the current framework cannot generate tactile stimuli in real time. A potential direction for future work is training a single neural network model that directly maps the input audiovisual content to the output spatial tactile mappings. In this way, the redundant layers of the neural network models and the extra steps in the framework would be removed, accelerating the generation of spatial tactile stimuli.
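To make the alternating clutch-drive gait of such an inchworm actuator concrete, here is a conceptual control loop; this is our own abstraction of the engage/release sequence, not the drive scheme of [149].

    from dataclasses import dataclass

    @dataclass
    class Clutch:
        engaged: bool = False
        def engage(self):  self.engaged = True
        def release(self): self.engaged = False

    def inchworm_cycle(front: Clutch, back: Clutch, advance, stroke_um: float):
        """One gait cycle: one clutch always grips the shuttle while the other
        repositions, so the shuttle advances by one stroke per cycle."""
        front.engage()        # front clutch grips the shuttle
        back.release()        # back clutch lets go
        advance(stroke_um)    # drive actuator pulls the shuttle one stroke forward
        back.engage()         # hand the shuttle over to the back clutch
        front.release()       # front clutch resets for the next cycle

    # Example: 1000 cycles of a 2 um stroke yield 2 mm of travel.
    front, back = Clutch(), Clutch()
    strokes = []
    for _ in range(1000):
        inchworm_cycle(front, back, strokes.append, stroke_um=2.0)
    print(f"total travel: {sum(strokes):.0f} um")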

5.3.2 Scaling of the Hardware Design and Content Authoring Methods

Scaling the hardware platforms up for more complicated application scenarios is also an important direction for future work. We built a small tactile shape display prototype of 4 × 2 pins in Chapter 2 to demonstrate the electrostatic adhesive brakes; a tactile shape display with a larger array of pins is desired in order to render 2.5D shapes with more features. The auxetic 2.5D shape display we built in Chapter 3 demonstrated the concept of a formable-crust 2.5D shape display. Due to the limited number of faces in the auxetic material, the individually controllable region size is large, which limits the curvature of the rendered shape. As with the electrostatic-adhesive-brake-based display, scaling up the number of faces of the auxetic 2.5D shape display would improve its ability to render 2.5D shapes with more detail.

However, a number of challenges make it hard to scale both devices. Uniformity issues are a main obstacle when scaling up the hardware platforms: with more pins or faces in the shape displays, there is a higher chance of failure. For the 2.5D tactile shape display based on electrostatic adhesive brakes (Chapter 2), it will be essential to improve the manufacturing accuracy of the frames, the pins, and the dielectric film with interdigital electrodes on it. Once the features of the structures are on the order of 100 µm, a slight curvature of the components will cause the electrostatic adhesive brakes to fail to engage, since an air gap between the metal pin and the interdigital electrodes dramatically reduces the electrostatic adhesion (see the estimate below). MEMS manufacturing methods should be adopted in future work to ensure the tactile shape display is built with high accuracy. For the auxetic 2.5D shape display (Chapter 3), wiring the electrodes on the faces of the auxetic material through the narrow linkage points (less than 1 mm) will be a challenge when scaling the device up. Future work should use a sputtering technique, with either a photoresist mask defined by lithography or laser ablation, to make the electrical connections between the large number of faces of a scaled-up auxetic 2.5D shape display.

Concerning the tactile content authoring methods, it is also desirable to scale them to generate tactile stimuli under various scenarios. Our current automatic spatial tactile effects generation framework is limited to identifying sound sources in 28 event classes (Chapter 4). Future work should extend our current method by training the neural network models on a more comprehensive dataset with a larger variety of sounding event classes, so that different sounding objects can be identified with higher accuracy. Furthermore, our method mainly focuses on using tactile effects to enhance sounding objects in the video; other non-diegetic components of the video, such as emotion or ambiance, are not captured by our method. An interesting direction for future work is training neural network models to understand different emotions and ambiances in a video, and then assigning proper tactile effects to a specific non-diegetic video component.
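The sensitivity to air gaps follows from treating the dielectric and the gap as capacitors in series; in the same simplified notation as the parallel-plate estimate in Sec. 5.2.1, the adhesion pressure with an air gap g between the pin and a dielectric of thickness d becomes

    P(g) = \frac{\varepsilon_0 V^2}{2\,(d/\varepsilon_r + g)^2}.

As a numeric illustration (our example values), for d = 25 µm and εr = 3 the effective electrical gap d/εr is about 8.3 µm, so a mechanical air gap of a similar 8.3 µm, easily produced by a slightly bowed pin, doubles the effective gap and already cuts the adhesion pressure by roughly a factor of four.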

5.3.3 Extensions Enabled by Emerging Technologies

The hardware platforms and tactile content authoring methods we developed in this dissertation can be extended by many technologies that have emerged in recent years. Concerning the 2.5D tactile shape display based on electrostatic adhesive brakes (Chapter 2), the positions of the pins are decided by an open-loop control program, so the 2.5D shape cannot be rendered with high accuracy in the pins' direction of travel. Light Detection and Ranging (LiDAR), a rapidly developing field, can serve as a depth camera to generate a point cloud of the environment, and the depth information can help to calibrate the positions of the pins. Compared with intrusive sensing methods such as capacitive sensing, which requires integrating sensing circuits into the interdigital electrodes, a LiDAR depth camera does not increase the form factor or the structural complexity of the 2.5D tactile shape display. Furthermore, the cost of a LiDAR depth camera does not scale with the number of pins as long as the tactile display is covered by the device's field of view. One limitation of current LiDAR depth cameras is depth accuracy; for example, the Intel RealSense L515 has an average error of 5 mm at 1 m distance. However, since the wavelength of near-infrared light is very small, the depth accuracy can improve as the technology advances.

The auxetic 2.5D shape display in Chapter 3 can also be extended by the various technologies emerging in the field of auxetic materials. We used a 2D auxetic material in our shape display, with the 2.5D shape created by inflating the membrane. It would be desirable to utilize auxetic materials with higher stretchability so that the amplitude of the rendered 2.5D shape can be larger. Recent work [93] explored twelve architectures for auxetic materials that can be fabricated with a laser cutter; with the highly tunable Poisson's ratios and high stretchability shown in those structures, auxetic 2.5D shape displays able to render shapes with a very large range of motion can potentially be achieved. Besides 2D auxetic materials, researchers have also investigated 3D auxetic materials [114, 28], which expand along two normal lateral directions when the third direction is strained. Applying 3D auxetic materials to 2.5D shape displays could be a promising solution in which novel actuation schemes provide more degrees of freedom for shape rendering. Improving the amplitude of the buckling of 3D auxetic materials also remains a challenge for future work.

In terms of the tactile content authoring method (Chapter 4), emerging technologies in the fields of computer vision and neural networks can help the tactile effects generation framework to understand more complicated scenarios and generate tactile stimuli that are more relevant to the content. For example, more advanced generative neural network structures such as generative adversarial networks (GANs) [42] have unique advantages in generating complicated features from input data; representative applications of GANs include image-to-image translation [55], text-to-image translation [152], and semantic-image-to-photo translation [141]. Since spatial tactile mappings can also be considered 2D images, it will be an interesting route for future work to explore whether GANs can be used to translate the audiovisual content or other information into spatial tactile effects.
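As a small illustration of the LiDAR-based calibration idea, the sketch below estimates per-pin height errors from a single depth frame; the helper and all parameter names are hypothetical, and a real implementation would also need the camera-to-display registration.

    import numpy as np

    def pin_height_errors(depth_mm, pin_uv, commanded_mm, cam_to_base_mm):
        """Signed height error of each pin from one depth-camera frame.

        depth_mm:       HxW depth image in millimeters from the LiDAR camera.
        pin_uv:         (N, 2) pixel coordinates (u, v) of the pin tips.
        commanded_mm:   (N,) pin extensions commanded by the open-loop program.
        cam_to_base_mm: distance from the camera to the fully retracted pin plane.
        """
        measured = cam_to_base_mm - depth_mm[pin_uv[:, 1], pin_uv[:, 0]]
        return measured - commanded_mm  # feed back into the actuation schedule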

Appendix A

ANOVA Analysis

Table A.1: One-Way Within-Subject ANOVA for Overall Results

All     F                          p
Q1      F(1.41, 26.73) = 13.74*    3.1E-4
Q2      F(2, 38) = 29.23*          2.1E-8
Q3      F(2, 38) = 22.09*          4.3E-7
Q4      F(2, 38) = 21.30*          6.2E-7

Note: *p < .05
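For reference, a one-way within-subject ANOVA of this form can be reproduced with, for example, the pingouin Python package; the fractional degrees of freedom above (e.g., 1.41) indicate that a Greenhouse-Geisser sphericity correction was applied. The data file and column names below are placeholders, not the study's actual files.

    import pandas as pd
    import pingouin as pg

    # One row per (participant, condition) pair with the Q1 rating, e.g.:
    #   subject  condition        q1
    #   P01      plain_video      4
    #   P01      saliency_driven  3
    #   P01      cross_modal      6
    df = pd.read_csv("ratings_q1.csv")  # placeholder file name

    # correction=True applies Greenhouse-Geisser when sphericity is violated,
    # which yields non-integer degrees of freedom such as F(1.41, 26.73).
    aov = pg.rm_anova(data=df, dv="q1", within="condition",
                      subject="subject", correction=True)
    print(aov)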


Table A.2: One-Way Within-Subject ANOVA for Each Video

        M1                                 M2
        F                        p         F                        p
Q1      F(2, 38) = 4.25*         0.0217    F(1.36, 25.92) = 8.74*   0.0034
Q2      F(2, 38) = 2.94          0.0650    F(2, 38) = 3.46*         0.0418
Q3      F(2, 38) = 12.95*        0.0001    F(2, 38) = 15.78*        1.0E-5
Q4      F(2, 38) = 6.19*         0.0047    F(2, 38) = 10.51*        0.0002

        M3                                 M4
        F                        p         F                        p
Q1      F(2, 38) = 5.74*         0.0066    F(2, 38) = 17.35*        4.4E-6
Q2      F(2, 38) = 5.09*         0.0110    F(2, 38) = 18.99*        1.9E-6
Q3      F(2, 38) = 6.92*         0.0027    F(2, 38) = 11.09*        0.0002
Q4      F(2, 38) = 4.33*         0.0203    F(2, 38) = 12.53*        0.0001

        M5                                 M6
        F                        p         F                        p
Q1      F(1.26, 23.91) = 21.83*  3.6E-5    F(1.30, 24.62) = 13.33*  0.0006
Q2      F(2, 38) = 35.62*        1.9E-9    F(1.43, 27.24) = 30.78*  8.4E-7
Q3      F(2, 38) = 19.10*        1.8E-6    F(1.35, 25.70) = 4.14*   0.0413
Q4      F(2, 38) = 25.50*        9.5E-8    F(2, 38) = 16.52*        6.9E-6

        M7                                 M8
        F                        p         F                        p
Q1      F(1.49, 28.24) = 11.33*  0.0007    F(2, 38) = 3.66*         0.0350
Q2      F(2, 38) = 4.31*         0.0205    F(2, 38) = 4.40*         0.0191
Q3      F(2, 38) = 16.73*        6.1E-6    F(2, 38) = 9.63*         0.0004
Q4      F(1.36, 25.86) = 4.93*   0.0257    F(2, 38) = 2.35          0.1094

Note: *p < .05

Bibliography

[1] Alderson Alderson and KL Alderson. Auxetic materials. Proceedings of the Institution of Mechanical Engineers, Part G: Journal of Aerospace Engineering, 221(4):565–575, 2007.

[2] Dennis Allerkamp, Guido Böttcher, Franz-Erich Wolter, Alan C Brady, Jianguo Qu, and Ian R Summers. A vibrotactile approach to tactile rendering. The Visual Computer, 23(2):97–108, 2007.

[3] Relja Arandjelovic and Andrew Zisserman. Look, listen and learn. In Proceedings of the IEEE International Conference on Computer Vision, pages 609–617, 2017.

[4] Relja Arandjelovic and Andrew Zisserman. Objects that sound. In Proceedings of the European Conference on Computer Vision (ECCV), pages 435–451, 2018.

[5] Kazutoshi Asano, Fumikazu Hatakeyama, and Kyoko Yatsuzuka. Fundamental study of an electrostatic chuck for silicon wafer handling. IEEE Transactions on Industry Applications, (3):840–845, 2002.

[6] Frank Baginski and William Collier. Energy minimizing shapes of partially inflated large scientific balloons. Advances in Space Research, 21(7):975–978, 1998.

[7] Mohamed Benali-Khoudja, Moustapha Hafez, Jean-Marc Alexandre, and Abderrahmane Kheddar. Tactile interfaces: a state-of-the-art survey. In Int. Symposium on Robotics, volume 31, pages 23–26, 2004.

[8] Dean Blazie. Refreshable braille now and in the years ahead. Braille Monitor, 43(1):1–6, 2000.

[9] Paul Blenkhorn. A system for converting print into braille. IEEE transactions on rehabilitation engineering, 5(2):121–129, 1997.

[10] Paul Bosscher and Imme Ebert-Uphoff. Digital clay: Architecture designs for shape-generating mechanisms. In 2003 IEEE International Conference on Robotics and Automation (Cat. No. 03CH37422), volume 1, pages 834–841. IEEE, 2003.


[11] White Noise Box. White noise — clock ticking sound. https://www.youtube.com/watch?v=_2ImeyoNUcs, 2018.

[12] Andy Brady, Brian MacDonald, Ian Oakley, Stephen Hughes, and Sile O’Modhrain. Relay: a futuristic interface for remote driving. In proceedings of EuroHaptics, pages 8–10, 2002.

[13] Kai Briechle and Uwe D. Hanebeck. Template matching using fast normalized cross correlation. In Aerospace/Defense Sensing, Simulation, and Controls, volume 4387, pages 95–102, 2001.

[14] Emre Cakir, Toni Heittola, Heikki Huttunen, and Tuomas Virtanen. Polyphonic sound event detection using multi label deep neural networks. In 2015 international joint conference on neural networks (IJCNN), pages 1–7. IEEE, 2015.

[15] Changyong Cao, Xiaoyu Sun, Yuhui Fang, Qing-Hua Qin, Aibing Yu, and Xi-Qiao Feng. Theoretical model and design of electroadhesive pad with interdigitated electrodes. Materials & Design, 89:485–491, 2016.

[16] Christian Carlberg. Clutch mechanism for a raised display apparatus, October 21 2008. US Patent 7,439,950.

[17] Jongeun Cha, Yongwon Seo, Yeongmi Kim, and Jeha Ryu. An authoring/editing framework for haptic broadcasting: passive haptic interactions using MPEG-4 BIFS. In Second Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (WHC'07), pages 274–279. IEEE, 2007.

[18] Pritish Chandna, Marius Miron, Jordi Janer, and Emilia Gómez. Monoaural audio source separation using deep convolutional neural networks. In International Conference on Latent Variable Analysis and Signal Separation, pages 258–266. Springer, 2017.

[19] R Chandrasekhar and KL Choy. Electrostatic spray assisted vapour deposition of fluorine doped tin oxide. Journal of crystal growth, 231(1-2):215–221, 2001.

[20] Angela Chang and Conor O’Sullivan. Audio-haptic feedback in mobile phones. In CHI’05 extended abstracts on Human factors in computing systems, pages 1264–1267. ACM, 2005.

[21] Elaine Y Chen and Beth A Marcus. Exos slip display research and development. In Proceedings of the International Mechanical Engineering Congress and Exposition, pages 55–1, 1994.

[22] Rui Chen, Yao Huang, and Qian Tang. An analytical model for electrostatic adhesive dynamics on dielectric substrates. Journal of adhesion science and Technology, 31(11):1229–1250, 2017.

[23] Zhuo Chen, Yi Luo, and Nima Mesgarani. Deep attractor network for single-microphone speaker separation. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 246–250. IEEE, 2017.

[24] Hyouk Ryeol Choi, SW Lee, Kwang Mok Jung, Ja Choon Koo, SI Lee, HG Choi, Jae Wook Jeon, and Jae-Do Nam. Tactile display as a braille display for the visually disabled. In 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)(IEEE Cat. No. 04CH37566), volume 2, pages 1985–1990. IEEE, 2004.

[25] Seungmoon Choi and Katherine J Kuchenbecker. Vibrotactile display: Perception, technology, and applications. Proceedings of the IEEE, 101(9):2093–2104, 2012.

[26] Vasilios G Chouvardas, Amalia N Miliou, and Miltiadis K Hatalis. Tactile displays: Overview and recent advances. Displays, 29(3):185–194, 2008.

[27] Martin Cooke, John R Hershey, and Steven J Rennie. Monaural speech separation and recognition challenge. Computer Speech & Language, 24(1):1–15, 2010.

[28] Richard Critchley, Ilaria Corni, Julian A Wharton, Frank C Walsh, Robert JK Wood, and Keith R Stokes. The preparation of auxetic foams by three-dimensional printing and their characteristics. Advanced Engineering Materials, 15(10):980–985, 2013.

[29] Fabien Danieau, Anatole Lécuyer, Philippe Guillotel, Julien Fleureau, Nicolas Mollet, and Marc Christie. Enhancing audiovisual experience with haptic feedback: a survey on HAV. IEEE Transactions on Haptics, 6(2):193–205, 2012.

[30] Esko O Dijk, Alina Weffers-Albu, and Tin De Zeeuw. A tactile actuation blanket to intensify movie experiences with personalised tactile effects. In Demonstration papers proc. 3rd International Conference on Intelligent Technologies for Interactive Entertainment, 2010.

[31] Stuart Diller, Carmel Majidi, and Steven H Collins. A lightweight, low-power electroadhesive clutch and spring for exoskeleton actuation. In Robotics and Automation (ICRA), 2016 IEEE International Conference on, pages 682–689. IEEE, 2016.

[32] Stuart B Diller, Steven H Collins, and Carmel Majidi. The effects of electroadhesive clutch design parameters on performance characteristics. Journal of Intelligent Material Systems and Structures, page 1045389X18799474, 2018.

[33] Hanifa Dostmohamed and Vincent Hayward. Trajectory of contact region on the fingerpad gives the illusion of haptic shape. Experimental Brain Research, 164(3):387–394, 2005.

[34] effspot. Car spotting gone wild: London is a circus. https://www.youtube.com/watch?v=DswlvppbBmI, 2016.

[35] Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T Freeman, and Michael Rubinstein. Looking to listen at the cocktail party: A speaker-independent audio-visual model for speech separation. arXiv preprint arXiv:1804.03619, 2018.

[36] Victor Escorcia, Fabian Caba Heilbron, Juan Carlos Niebles, and Bernard Ghanem. Daps: Deep action proposals for action understanding. In European Conference on Computer Vision, pages 768–784. Springer, 2016.

[37] Sean Follmer, Daniel Leithinger, Alex Olwal, Akimitsu Hogge, and Hiroshi Ishii. inform: dynamic physical affordances and constraints through shape and object actuation. In ACM UIST 2013, volume 13, pages 417–426, 2013.

[38] Jan Friedrich, Sven Pfeiffer, and Christoph Gengnagel. Locally varied auxetic structures for doubly-curved shapes. In Humanizing Digital Reality, pages 323–336. Springer, 2018.

[39] Ruohan Gao, Rogerio Feris, and Kristen Grauman. Learning to separate object sounds by watching unlabeled video. In Proceedings of the European Conference on Computer Vision (ECCV), pages 35–53, 2018.

[40] Jort F Gemmeke, Daniel PW Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R Channing Moore, Manoj Plakal, and Marvin Ritter. Audio Set: An ontology and human-labeled dataset for audio events. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 776–780. IEEE, 2017.

[41] Zoubin Ghahramani and Michael I Jordan. Factorial hidden markov models. In Advances in Neural Information Processing Systems, pages 472–478, 1996.

[42] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.

[43] Grasshopper. https://www.grasshopper3d.com/, 2018.

[44] Scott R Green, Brandon J Gregory, and Naveen K Gupta. Dynamic braille display utilizing phase-change microactuators. In SENSORS, 2006 IEEE, pages 307–310. IEEE, 2006.

[45] Yoichi Haga, Wataru Makishi, Kentaro Iwami, Kentaro Totsu, Kazuhiro Nakamura, and Masayoshi Esashi. Dynamic braille display using sma coil actuator and magnetic latch. Sensors and Actuators A: Physical, 119(2):316–322, 2005.

[46] Hend M Hamed and Rehab A Hema. The application of the 4-d movie theater system in egyptian museums for enhancing cultural tourism. Journal of Tourism, 10(1), 2009.

[47] Vincent Hayward, Oliver R Astley, Manuel Cruz-Hernandez, Danny Grant, and Gabriel Robles-De-La-Torre. Haptic interfaces and devices. Sensor Review, 24(1):16–29, 2004.

[48] HDNatureStock. Man walking on forest road in snow. https://www.youtube.com/watch?v=nMuGkoRILd8, 2013.

[49] Hyper Head. Jets fighter in low pass - shocking spectacors. https://www.youtube.com/watch?v=Tz1hhx8yxyU, 2018.

[50] Toni Heittola, Anssi Klapuri, and Tuomas Virtanen. Musical instrument recognition in polyphonic audio using source-filter model for sound separation. In ISMIR, pages 327–332, 2009.

[51] Toni Heittola, Annamaria Mesaros, Antti Eronen, and Tuomas Virtanen. Context-dependent sound event detection. EURASIP Journal on Audio, Speech, and Music Processing, 2013(1):1, 2013.

[52] John R Hershey, Zhuo Chen, Jonathan Le Roux, and Shinji Watanabe. Deep clustering: Discriminative embeddings for segmentation and separation. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 31–35. IEEE, 2016.

[53] Shawn Hershey, Sourish Chaudhuri, Daniel PW Ellis, Jort F Gemmeke, Aren Jansen, R Channing Moore, Manoj Plakal, Devin Platt, Rif A Saurous, Bryan Seybold, et al. CNN architectures for large-scale audio classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 131–135. IEEE, 2017.

[54] Robert D Howe, William J Peine, DA Kantarinis, and Jae S Son. Remote palpation technology. IEEE Engineering in Medicine and Biology Magazine, 14(3):318–323, 1995.

[55] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.

[56] Ali Israr, Seung-Chan Kim, Jan Stec, and Ivan Poupyrev. Surround haptics: tactile feedback for immersive gaming experiences. In CHI'12 Extended Abstracts on Human Factors in Computing Systems, pages 1087–1090. ACM, 2012.

[57] Hiroo Iwata, Hiroaki Yano, Fumitaka Nakaizumi, and Ryo Kawamura. Project FEELEX: adding haptic surface to graphics. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 469–476. ACM, 2001.

[58] Sungjune Jang, Lawrence H Kim, Kesler Tanner, Hiroshi Ishii, and Sean Follmer. Haptic edge display for mobile tactile interaction. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 3706–3716. ACM, 2016.

[59] Lai Jiang, Mai Xu, Tie Liu, Minglang Qiao, and Zulin Wang. Deepvs: A deep learning based video saliency prediction approach. In Proceedings of the European Conference on Computer Vision (ECCV), pages 602–617, 2018.

[60] Alfred Johnsen and Knud Rahbek. A physical phenomenon and its applications to telegraphy, telephony, etc. Journal of the Institution of Electrical Engineers, 61(320):713–725, 1923.

[61] Lynette A Jones, Brett Lockyer, and Erin Piateski. Tactile display and vibrotactile pattern recognition on the torso. Advanced Robotics, 20(12):1359–1374, 2006.

[62] Lynette A Jones and Kathryn Ray. Localization and pattern recognition with tactile displays. In 2008 Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pages 33–39. IEEE, 2008.

[63] Poor Judgement. Train crash compilation part 1. https://www.youtube.com/watch?v=GduFDV_oY2s, 2017.

[64] Kurt A Kaczmarek, Mitchell E Tyler, and Paul Bach-y Rita. Electrotactile haptic display on the fingertips: Preliminary results. In Proceedings of 16th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 2, pages 940–941. IEEE, 1994.

[65] Kurt A Kaczmarek, John G Webster, Paul Bach-y Rita, and Willis J Tompkins. Electrotactile and vibrotactile displays for sensory substitution systems. IEEE transactions on biomedical engineering, 38(1):1–16, 1991.

[66] Hiroyuki Kajimoto. Electrotactile display with real-time impedance feedback using pulse width modulation. IEEE Transactions on Haptics, 5(2):184–188, 2011.

[67] Hiroyuki Kajimoto, Yonezo Kanno, and Susumu Tachi. Forehead electro-tactile display for vision substitution. In Proc. EuroHaptics, 2006.

[68] Hiroyuki Kajimoto, Naoki Kawakami, T Maeda, and S Tachi. Electro-tactile display with tactile primary color approach. In Proceedings of International Conference on Intelligent Robots and Systems, volume 10, pages 1–13, 2004.

[69] Hiroyuki Kajimoto, Naoki Kawakami, Susumu Tachi, and Masahiko Inami. Smarttouch: Elec- tric skin to touch the untouchable. IEEE computer graphics and applications, 24(1):36–43, 2004.

[70] Hyunho Kim, Changhoon Seo, Junhun Lee, Jeha Ryu, Si-bok Yu, and Sooyoung Lee. Vibrotactile display for driving safety information. In 2006 IEEE Intelligent Transportation Systems Conference, pages 573–577. IEEE, 2006.

[71] Lawrence H Kim and Sean Follmer. Swarmhaptics: Haptic display with swarm robots. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, page 688. ACM, 2019.

[72] Myongchan Kim. Saliency-driven real-time tactile effects authoring.

[73] Myongchan Kim, Sungkil Lee, and Seungmoon Choi. Saliency-driven tactile effect authoring for real-time visuotactile feedback. In International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, pages 258–269. Springer, 2012.

[74] Myongchan Kim, Sungkil Lee, and Seungmoon Choi. Saliency-driven real-time video-to-tactile translation. IEEE transactions on haptics, 7(3):394–404, 2013.

[75] Yeongmi Kim, Jongeun Cha, Jeha Ryu, and Ian Oakley. A tactile glove design and authoring system for immersive multimedia. IEEE MultiMedia, 17(3):34–45, 2010.

[76] Keng Huat Koh, M Sreekumar, and SG Ponnambalam. Hybrid electrostatic and elastomer adhesion mechanism for wall climbing robot. Mechatronics, 35:122–135, 2016.

[77] Mina Konaković, Keenan Crane, Bailin Deng, Sofien Bouaziz, Daniel Piker, and Mark Pauly. Beyond developable: computational design and fabrication with auxetic materials. ACM Transactions on Graphics (TOG), 35(4):1–11, 2016.

[78] Mina Konaković-Luković, Julian Panetta, Keenan Crane, and Mark Pauly. Rapid deployment of curved surfaces via programmable auxetics. ACM Transactions on Graphics (TOG), 37(4):1–13, 2018.

[79] Dimitrios A Kontarinis and Robert D Howe. Static display of shape. In Telemanipulator and Telepresence Technologies, volume 2351, pages 250–259. International Society for Optics and Photonics, 1995.

[80] Ki-Uk Kyung, Minseung Ahn, Dong-Soo Kwon, and Mandayam A Srinivasan. A compact broadband tactile display and its effectiveness in the display of tactile form. In Eurohaptics Conference, 2005 and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2005. World Haptics 2005. First Joint, pages 600–601. IEEE, 2005.

[81] Ki-Uk Kyung, Seung-Chan Kim, and Dong-Soo Kwon. Texture display mouse: vibrotactile pattern and roughness display. IEEE/ASME Transactions on Mechatronics, 12(3):356–360, 2007.

[82] Ki-Uk Kyung, Seung-Woo Son, Dong-Soo Kwon, and Mun-Sang Kim. Design of an integrated tactile display system. In IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA’04. 2004, volume 1, pages 776–781. IEEE, 2004.

[83] Roderic Lakes. Foam structures with a negative poisson’s ratio. Science, 235:1038–1041, 1987.

[84] Mathieu Le Goc, Lawrence H Kim, Ali Parsaei, Jean-Daniel Fekete, Pierre Dragicevic, and Sean Follmer. Zooids: Building blocks for swarm user interfaces. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pages 97–109. ACM, 2016.

[85] Jaebong Lee and Seungmoon Choi. Evaluation of vibrotactile pattern design using vibrotactile score. In 2012 IEEE Haptics Symposium (HAPTICS), pages 231–238. IEEE, 2012.

[86] Jaebong Lee and Seungmoon Choi. Real-time perception-level translation from audio signals to vibrotactile effects. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 2567–2576. ACM, 2013.

[87] Daniel Leithinger, Sean Follmer, Alex Olwal, Samuel Luescher, Akimitsu Hogge, Jinha Lee, and Hiroshi Ishii. Sublimate: state-changing virtual and physical rendering to augment interaction with shape displays. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1441–1450, 2013.

[88] Daniel Leithinger and Hiroshi Ishii. Relief: a scalable actuated shape display. In Proceedings of the fourth international conference on Tangible, embedded, and embodied interaction, pages 221–222, 2010.

[89] Paul Lemmens, Floris Crompvoets, Dirk Brokken, Jack Van Den Eerenbeemd, and Gert-Jan de Vries. A body-conforming tactile jacket to enrich movie viewing. In World Haptics 2009 - Third Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, pages 7–12. IEEE, 2009.

[90] Kevin A Li, Timothy Y Sohn, Steven Huang, and William G Griswold. Peopletones: a system for the detection and notification of buddy proximity on mobile phones. In Proceedings of the 6th international conference on Mobile systems, applications, and services, pages 160–173. ACM, 2008.

[91] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.

[92] Matchboxtruckman163. Truck horns compilation. https://www.youtube.com/watch?v=cs-RPPsg_ks, 2013.

[93] Luke Mizzi, Enrico Salvati, Andrea Spaggiari, Jin-Chong Tan, and Alexander M Korsunsky. Highly stretchable two-dimensional auxetic metamaterial sheets fabricated via direct-laser cutting. International Journal of Mechanical Sciences, 167:105242, 2020.

[94] Arsha Nagrani, Joon Son Chung, and Andrew Zisserman. Voxceleb: a large-scale speaker identification dataset. arXiv preprint arXiv:1706.08612, 2017.

[95] Ken Nakagaki, Daniel Fitzgerald, Zhiyao Ma, Luke Vink, Daniel Levine, and Hiroshi Ishii. inFORCE: Bi-directional 'force' shape display for haptic interaction. In Proceedings of the Thirteenth International Conference on Tangible, Embedded, and Embodied Interaction, pages 615–623, 2019.

[96] Masashi Nakatani, Hiroyuki Kajimoto, Dairoku Sekiguchi, Naoki Kawakami, and Susumu Tachi. 3d form display with shape memory alloy. In ICAT, volume 8, pages 179–184, 2003.

[97] Jessie YC Ng, Jo CF Man, Sidney Fels, Guy Dumont, and J Mark Ansermino. An evaluation of a vibro-tactile display prototype for physiological monitoring. Anesthesia & Analgesia, 101(6):1719–1724, 2005.

[98] Austina Nga Nguyen. Designing, Manufacturing, and Predicting Deformation of a Formable Crust Matrix. PhD thesis, Georgia Institute of Technology, 2004.

[99] T Ninomiya, K Osawa, Y Okayama, Y Matsumoto, and N Miki. Mems tactile display with hydraulic displacement amplification mechanism. In Micro Electro Mechanical Systems, 2009. MEMS 2009. IEEE Conference on, pages 467–470. IEEE, 2009.

[100] Tiene Nobels, Frank Allemeersch, and Kay Hameyer. Design of a high power density electromagnetic actuator for a portable braille display. In 10th International Power Electronics and Motion Control Conference EPE-PEMC, 2002.

[101] Eunji Oh, Minkyoung Lee, and Sujin Lee. How 4d effects cause different types of presence experience? In Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry, pages 375–378. ACM, 2011.

[102] Dan Oneata, Jakob Verbeek, and Cordelia Schmid. Action and event recognition with fisher vectors on a compact feature set. In Proceedings of the IEEE international conference on computer vision, pages 1817–1824, 2013.

[103] Dan Overholt. The matrix: a novel controller for musical expression. In Proceedings of the 2001 conference on New interfaces for musical expression, pages 1–4. National University of Singapore, 2001.

[104] Andrew Owens and Alexei A Efros. Audio-visual scene analysis with self-supervised multisensory features. In Proceedings of the European Conference on Computer Vision (ECCV), pages 631–648, 2018.

[105] Andrew Owens and Alexei A Efros. Audio-visual scene analysis with self-supervised multisensory features. https://github.com/andrewowens/multisensory, 2018.

[106] Sile O'Modhrain and Ian Oakley. Touch TV: Adding feeling to broadcast media. In European Conference on Interactive Television: from Viewers to Actors, pages 41–17, 2003.

[107] Claudio Pacchierotti, Stephen Sinclair, Massimiliano Solazzi, Antonio Frisoli, Vincent Hayward, and Domenico Prattichizzo. Wearable haptic systems for the fingertip and the hand: Taxonomy, review, and perspectives. IEEE Transactions on Haptics, 10(4):580–600, 2017.

[108] Benjamin J Peters. Design and fabrication of a digitally reconfigurable surface. PhD thesis, MIT, 2011.

[109] Grégory Petit, Aude Dufresne, Vincent Levesque, Vincent Hayward, and Nicole Trudeau. Refreshable tactile graphics applied to schoolbook illustrations for students with visual impairment. In Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility, pages 89–96. ACM, 2008.

[110] EM Petriu and WS McMath. Tactile operator interface for semi-autonomous robotic applica- tions. AIRAS, Artificial Intelligence, Robotics and Automation, Space, pages 77–82, 1992.

[111] Ivan Poupyrev, Tatsushi Nashida, Shigeaki Maruyama, Jun Rekimoto, and Yasufumi Yamaji. Lumen: interactive visual and shape display for calm computing. In ACM SIGGRAPH 2004 Emerging technologies, page 17. ACM, 2004.

[112] Andrew Pressley. Elementary Differential Geometry. 2000.

[113] Abdur Rahman, Abdulmajeed Alkhaldi, Jongeun Cha, Abdulmotaleb El Saddik, et al. Adding haptic feature to YouTube. In Proceedings of the 18th ACM International Conference on Multimedia, pages 1643–1646. ACM, 2010.

[114] Xin Ren, Jianhu Shen, Phuong Tran, Tuan D Ngo, and Yi Min Xie. Design and characterisation of a tuneable 3d buckling-induced auxetic metamaterial. Materials & Design, 139:336–342, 2018.

[115] Rhino. http://www.rhino3d.com/, 2018.

[116] David Rosen, Austina Nguyen, and Hongqing Wang. On the geometry of low degree-of-freedom digital clay human-computer interface devices. In ASME 2003 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, pages 1135–1144. American Society of Mechanical Engineers Digital Collection, 2003.

[117] Jarek Rossignac, Mark Allen, Wayne J Book, Ari Glezer, Imme Ebert-Uphoff, Chris Shaw, David Rosen, Stephen Askins, Jing Bai, Paul Bosscher, et al. Finger sculpting with digital clay: 3d shape input and output through a computer-controlled real surface. In 2003 Shape Modeling International, pages 229–231. IEEE, 2003.

[118] Anne Roudaut, Abhijit Karnik, Markus Löchtefeld, and Sriram Subramanian. Morphees: toward high "shape resolution" in self-actuated flexible mobile devices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 593–602, 2013.

[119] Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, and Antonio Torralba. Self-supervised audio-visual co-segmentation. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2357–2361. IEEE, 2019.

[120] Sam T Roweis. One microphone source separation. In Advances in Neural Information Processing Systems, pages 793–799, 2001.

[121] Robert J Schalkoff. Digital image processing and computer vision, volume 286. Wiley New York, 1989.

[122] Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.

[123] Arda Senocak, Tae-Hyun Oh, Junsik Kim, Ming-Hsuan Yang, and In So Kweon. Learning to localize sound source in visual scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4358–4366, 2018.

[124] Makoto Shimojo, Masami Shinohara, and Yukio Fukui. Human shape recognition performance for 3d tactile display. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 29(6):637–644, 1999.

[125] Masami Shinohara, Yutaka Shimizu, and Akira Mochizuki. Three-dimensional tactile display for the blind. IEEE Transactions on Rehabilitation Engineering, 6(3):249–256, 1998.

[126] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[127] Allan M Smith, Geneviève Gosselin, and Bryan Houde. Deployment of fingertip forces in tactile exploration. Experimental Brain Research, 147(2):209–218, 2002.

[128] Jocelyn Smith and Karon MacLean. Communicating emotion through a haptic link: Design space and methodology. International Journal of Human-Computer Studies, 65(4):376–387, 2007.

[129] French River Springs. Milwaukee chainsaw in action. https://www.youtube.com/watch?v= EPX6IhnxWoY, 2018.

[130] Andrew A Stanley, James C Gwilliam, and Allison M Okamura. Haptic jamming: A deformable geometry, variable stiffness tactile display using pneumatics and particle jamming. In 2013 World Haptics Conference (WHC), pages 25–30. IEEE, 2013.

[131] Andrew A. Stanley, Kenji Hata, and Allison M. Okamura. Closed-loop shape control of a haptic jamming deformable surface. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 2718–2724, 2016.

[132] Robert Michael Strong. An explorable electrotactile display. PhD thesis, Massachusetts Institute of Technology, 1970.

[133] PM Taylor, A Hosseini-Sianaki, CJ Varley, and DM Pollet. Advances in an electrorheological fluid based tactile array. 1997.

[134] Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, and Chenliang Xu. Audio-visual event localization in unconstrained videos. In Proceedings of the European Conference on Computer Vision (ECCV), pages 247–263, 2018.

[135] Dzmitry Tsetserukou, Alena Neviarouskaya, Helmut Prendinger, Naoki Kawakami, and Susumu Tachi. Affective haptics in emotional communication. In 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, pages 1–6. IEEE, 2009.

[136] Fernando Vidal-Verdú and Moustapha Hafez. Graphical tactile displays for visually-impaired people. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 15(1):119–130, 2007.

[137] Dolph C. Volker. Messing with a junk yard dog - guard dog on duty. https://www.youtube.com/watch?v=GsXIIyc77xU, 2013.

[138] Christopher R Wagner, Susan J Lederman, and Robert D Howe. A tactile shape display using rc servomotors. In Haptic Interfaces for Virtual Environment and Teleoperator Systems, 2002. HAPTICS 2002. Proceedings. 10th Symposium on, pages 354–355. IEEE, 2002.

[139] Akira Wakita, Akito Nakano, and Nobuhiro Kobayashi. Programmable blobs: a rheologic interface for organic shape design. In Proceedings of the fifth international conference on Tangible, embedded, and embodied interaction, pages 273–276, 2010.

[140] Conrad Wall, Marc S Weinberg, Patricia B Schmidt, and David E Krebs. Balance prosthesis based on micromechanical sensors using vibrotactile feedback of tilt. IEEE Transactions on Biomedical Engineering, 48(10):1153–1161, 2001.

[141] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. High-resolution image synthesis and semantic manipulation with conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8798–8807, 2018.

[142] Junpei Watanabe, Hiroaki Ishikawa, Xavier Arouette, Yasuaki Matsumoto, and Norihisa Miki. Demonstration of vibrational braille code display using large displacement micro-electro-mechanical systems actuators. Japanese Journal of Applied Physics, 51(6S):06FL11, 2012.

[143] Maarten WA Wijntjes, Akihiro Sato, Astrid ML Kappers, and Vincent Hayward. Haptic perception of real and virtual curvature. In International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, pages 361–366. Springer, 2008.

[144] Bob G Witmer and Michael J Singer. Measuring presence in virtual environments: A presence questionnaire. Presence, 7(3):225–240, 1998.

[145] Xin Xie, Yuri Zaitsev, L Velasquez-Garcia, S Teller, and C Livermore. Compact, scalable, high-resolution, mems-enabled tactile displays. In Proc. of solid-state sensors, actuators, and microsystems workshop, pages 127–130, 2014.

[146] Gi-Hun Yang, Ki-Uk Kyung, Mandayam A Srinivasan, and Dong-Soo Kwon. Quantitative tactile display device with pin-array type tactile feedback and thermal feedback. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006., pages 3917–3922. IEEE, 2006.

[147] Wei Yang, Zhong-Ming Li, Wei Shi, Bang-Hu Xie, and Ming-Bo Yang. Review on auxetic materials. Journal of materials science, 39(10):3269–3279, 2004.

[148] Kyoko Yatsuzuka, Fumikazu Hatakeyama, Kazutoshi Asano, and Shinichiro Aonuma. Fundamental characteristics of electrostatic wafer chuck with insulating sealant. IEEE Transactions on Industry Applications, 36(2):510–516, 2000.

[149] Richard Yeh, Seth Hollar, and Kristofer SJ Pister. Single mask, large force, and large displacement electrostatic linear inchworm motors. Journal of Microelectromechanical Systems, 11(4):330–336, 2002.

[150] Levent Yobas, Dominique M Durand, Gerard G Skebe, Frederick J Lisy, and Michael A Huff. A novel integrable microvalve for refreshable braille display system. Journal of Microelectromechanical Systems, 12(3):252–263, 2003.

[151] Juan José Zárate and Herbert Shea. Using pot-magnets to enable stable and scalable electromagnetic tactile displays. IEEE Transactions on Haptics, 10(1):106–112, 2017.

[152] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 5907–5915, 2017.

[153] Kai Zhang and Sean Follmer. Electrostatic adhesive brakes for high spatial resolution refreshable 2.5D tactile shape displays. In Haptics Symposium (HAPTICS), 2018 IEEE, pages 319–326. IEEE, 2018.

[154] Kai Zhang, Eric J Gonzalez, Jianglong Guo, and Sean Follmer. Design and analysis of high-resolution electrostatic adhesive brakes towards static refreshable 2.5D tactile shape display. IEEE Transactions on Haptics, 12(4):470–482, 2019.

[155] Haihong Zhu. Practical structural design and control for Digital Clay. PhD thesis, Georgia Institute of Technology, 2005.

[156] Haihong Zhu and Wayne J Book. Practical structure design and control for digital clay. In ASME 2004 International Mechanical Engineering Congress and Exposition, pages 1051–1058. American Society of Mechanical Engineers, 2004.

[157] Rui Zhu, Ulrike Wallrabe, Matthias C Wapler, Peter Woias, and Ulrich Mescheder. Dielectric electroactive polymer membrane actuator with ring-type electrode as driving component of a tactile actuator. Procedia Engineering, 168:1537–1540, 2016.