Intelligent Carpet: Inferring 3D Human Pose from Tactile Signals

Yiyue Luo, Yunzhu Li, Michael Foshey, Wan Shou, Pratyusha Sharma, Tomás Palacios, Antonio Torralba, Wojciech Matusik
Massachusetts Institute of Technology
{yiyueluo, liyunzhu, mfoshey, wanshou, pratyuss, tpalacios, torralba, [email protected]

Figure 1. Left: A low-cost, high-density, large-scale intelligent carpet system to capture real-time human-floor tactile interactions. Right: The 3D human pose inferred from the captured tactile interactions of a person standing up from a sitting position.

Abstract

Daily human activities, e.g., locomotion, exercises, and resting, are heavily guided by the tactile interactions between the human and the ground. In this work, leveraging such tactile interactions, we propose a 3D human pose estimation approach that uses the pressure maps recorded by a tactile carpet as input. We build a low-cost, high-density, large-scale intelligent carpet, which enables real-time recording of human-floor tactile interactions in a seamless manner. We collect a synchronized tactile and visual dataset of various human activities. Employing a state-of-the-art camera-based pose estimation model as supervision, we design and implement a deep neural network model to infer 3D human poses using only the tactile information. Our pipeline can be further scaled up to multi-person pose estimation. We evaluate our system and demonstrate its potential applications in diverse fields.

1. Introduction

Human pose estimation is critical in action recognition [30, 52], gaming [26], healthcare [64, 36, 22], and robotics [34]. Significant advances have been made in estimating human pose by extracting skeletal kinematics from images and videos. However, camera-based pose estimation remains challenging when occlusion happens, which is inevitable during daily activities. Further, the rising demand for privacy also promotes the development of non-vision-based human pose estimation systems [63, 62]. Since most human activities depend on the contact between the human and the environment, we present a pose estimation approach using tactile interactions between humans and the ground. Recently, various smart floor or carpet systems have been proposed for human movement detection [11, 2, 48, 7, 3, 60, 40, 16, 1] and posture recognition [25, 50]. Previous work has also demonstrated the feasibility of using pressure images for pose estimation [6, 9, 8]. However, these studies mainly focus on the estimation of poses where a large portion of the body is in direct contact with the sensing surface, e.g., resting postures. A more challenging task is to infer 3D human pose from the limited pressure imprints involved in complicated daily activities, e.g., using the feet's pressure distribution to reconstruct the pose of the head and limbs. So far, complex 3D human pose estimation and modeling using tactile information, spanning a diverse set of human activities including locomotion, resting, and daily exercises, has not been available.

In this study, we first develop an intelligent carpet: a large integrated tactile sensing array consisting of over 9,000 pressure sensors, covering over 36 square feet, which can be seamlessly embedded on the floor. Coupled with readout circuits, our system enables real-time recording of high-resolution human-ground tactile interactions. With this hardware, we collect over 1,800,000 synchronized tactile and visual frames for 10 different individuals performing a diverse set of daily activities, e.g., lying, walking, and exercising. Employing the visual information as supervision, we design and implement a deep neural network to infer the corresponding 3D human pose using only the tactile information. Our network predicts the 3D human pose with an average localization error of less than 10 cm compared with the ground truth pose obtained from the visual information. The learned representations from the pose estimation model, when combined with a simple linear classifier, allow us to perform action classification with an accuracy of 98.7%. We also include ablation studies and evaluate how well our model generalizes to unseen individuals and unseen actions. Moreover, our approach can be scaled up for multi-person 3D pose estimation. Leveraging the tactile sensing modality, we believe our work opens up opportunities for human pose estimation that is unaffected by visual obstructions, in a seamless and confidential manner.

Figure 2. Tactile data acquisition hardware. Top: Our recording set-up includes a tactile sensing carpet spanning 36 ft² with 9,216 sensors, the corresponding readout circuits (32 electrodes each), and 2 cameras. Bottom: Typical pressure maps captured by the carpet from diverse human poses and activities (step, bend, lunge, sit, sit-up, roll, push-up).

2. Related Work

2.1. Human Pose Estimation

Thanks to the availability of large-scale datasets of annotated 2D human poses and the introduction of deep neural network models, human pose estimation from 2D images or videos has witnessed major advances in recent years [57, 56, 45, 38, 39, 59, 46, 5, 17, 12, 20, 54]. Recovering 3D information from 2D inputs is intrinsically ambiguous. Some recent attempts to recover 3D human pose from 2D images either require explicit 3D supervision [33], rely on a discriminator and adversarial training to learn a valid pose distribution [23, 24], or perform semi-supervised learning by leveraging temporal information [44]. Still, 3D pose estimation remains a challenging problem due to the underlying ambiguity, and many methods do not perform well in the presence of severe occlusion. Another alternative is to use multi-camera motion capture systems (e.g., VICON) or multiple cameras to obtain a more reliable pose estimation [51, 63, 21].

Our work builds upon these past advances in computer vision by using OpenPose [4] to extract 2D keypoints from multiple cameras and triangulate them to generate the ground truth 3D pose. Our system predicts 3D pose from only the tactile signals, which does not require any visual data and is fundamentally different from all past work in computer vision. The introduced tactile carpet has a lower spatial resolution than typical cameras. However, it functions as a camera viewing humans from the bottom up. This type of data stream does not suffer from the occlusion problems that are typical for camera systems. Furthermore, it provides additional information, such as whether humans are in contact with the ground and the pressure they exert.

2.2. Tactile Sensing

Recent advances in tactile sensing have benefited the recording, monitoring, and modeling of human-environment interactions in a variety of contexts. Sundaram et al. [55] have investigated the signatures of human grasp through the tactile interaction between human hands and different objects, while other researchers have proposed to connect vision and touch using cross-domain modeling [31, 61, 28]. Davoodnia et al. [10] transform annotated in-bed human pressure maps [41, 15] into images containing the shapes and structures of body parts for in-bed pose estimation. Furthermore, extensive works on biomechanics, human kinematics, and dynamic motions [13, 29, 35, 32] have explored the use of the foot pressure maps induced by daily human movement. Previous studies have demonstrated human localization and tracking by embedding individual pressure sensing units in smart floor systems [53, 50]. Furthermore, using large-scale pressure sensing matrices, researchers have been able to capture foot pressure patterns while humans are standing and walking, and to develop models that provide gait analysis and human identification [43, 58, 37]. Based on the fact that humans maintain balance by redirecting the center of mass through forces exerted on the floor [19], Scott et al. [49] have predicted foot pressure heatmaps from 2D human kinematics.

Different from previous works, which include only limited actions due to the limited size and resolution of the tactile sensing platform, we record and leverage high-resolution tactile data from diverse human daily activities, e.g., exercises, to estimate the 3D human skeleton.

3. Dataset

In this section, we describe the details of our hardware setup for tactile data acquisition, the pipeline for ground truth 3D keypoint confidence map generation, as well as data augmentation and synthesis for multi-person pose estimation.

Figure 3. 3D keypoint confidence map generation. The ground truth voxelized 3D keypoint confidence maps are annotated by first extracting 2D skeleton keypoints from RGB images using OpenPose [4], then generating 3D keypoints through triangulation and optimization, and finally applying a 3D Gaussian filter.

To annotate the ground truth human pose in a scalable manner, we leverage a state-of-the-art vision-based system, OpenPose [5], to generate 2D skeletons from the images captured by the cameras. Once we have obtained the 2D skeletons generated from the calibrated camera system, we triangulate the keypoints to generate the corresponding 3D skeletons. The triangulation results may not be perfect in some frames due to perception noise or misdetection. To resolve this issue, we add a post-optimization stage to constrain the length of each link. More specifically, we first calculate the length of the links in the skeleton using the median value across the naively triangulated results for each person. For this specific person, we denote the length of the i-th link as K_i. We then use q^A and q^B to represent the N detected keypoints at a specific time step from the two cameras, which lie in a 2D space, where q^A = {q^A_1, ..., q^A_N} and q^A_k = (u^A_k, v^A_k).

3.1. Tactile Data Acquisition

Our tactile data acquisition is based on a custom, high-
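The annotation pipeline above — 2D keypoints detected in two calibrated cameras, then triangulated into a 3D skeleton — can be sketched with standard direct linear transform (DLT) triangulation. This is a minimal NumPy sketch of the general technique, not the authors' code; the function name and the 3x4 projection matrices PA and PB (assumed known from calibration) are our assumptions.

```python
import numpy as np

def triangulate_point(qA, qB, PA, PB):
    """Linear (DLT) triangulation of one keypoint seen in two views.

    qA, qB : (u, v) pixel coordinates of the same keypoint in cameras A and B.
    PA, PB : 3x4 camera projection matrices obtained from calibration.
    Returns the 3D point in world coordinates.
    """
    uA, vA = qA
    uB, vB = qB
    # Each view contributes two linear constraints on the homogeneous point X.
    M = np.stack([
        uA * PA[2] - PA[0],
        vA * PA[2] - PA[1],
        uB * PB[2] - PB[0],
        vB * PB[2] - PB[1],
    ])
    # The least-squares solution is the right singular vector of M with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(M)
    X = Vt[-1]
    return X[:3] / X[3]
```

Running this per keypoint and per frame yields the "naively triangulated" skeletons that the post-optimization stage then refines.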
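The median-link-length constraint can likewise be sketched. The paper does not spell out its optimization objective here, so the enforcement step below is a simple heuristic stand-in: compute K_i as the per-link median over the naively triangulated frames, then snap each child joint to distance K_i from its parent along the current bone direction. The link topology `LINKS` is illustrative, not the paper's skeleton.

```python
import numpy as np

# Illustrative chain skeleton: (parent, child) joint-index pairs, listed
# parent-first so each joint is fixed before its children are visited.
LINKS = [(0, 1), (1, 2), (2, 3)]

def median_link_lengths(frames):
    """frames: (T, N, 3) triangulated keypoints over T time steps.
    Returns K, the per-link median length (robust to noisy frames)."""
    lengths = np.stack([
        np.linalg.norm(frames[:, c] - frames[:, p], axis=-1)
        for p, c in LINKS
    ], axis=1)  # shape (T, num_links)
    return np.median(lengths, axis=0)

def enforce_link_lengths(pose, K):
    """Heuristic refinement (our assumption, not the paper's optimizer):
    rescale each bone to length K_i while keeping its direction."""
    fixed = pose.copy()
    for i, (p, c) in enumerate(LINKS):
        d = fixed[c] - fixed[p]
        n = np.linalg.norm(d)
        if n > 1e-8:
            fixed[c] = fixed[p] + d / n * K[i]
    return fixed
```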
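The final step in Figure 3 — turning the optimized 3D keypoints into voxelized Gaussian confidence maps — amounts to evaluating a Gaussian blob around each keypoint on a 3D grid. A sketch of that idea follows; the grid size and sigma are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

def keypoint_confidence_maps(keypoints, grid=(20, 20, 18), sigma=1.0):
    """Voxelized 3D confidence maps: one Gaussian blob per keypoint.

    keypoints : (N, 3) keypoint positions, already expressed in voxel
                coordinates of `grid`.
    Returns an array of shape (N,) + grid whose k-th channel peaks (at 1.0)
    at the voxel containing keypoint k and decays with distance.
    """
    X, Y, Z = np.meshgrid(*(np.arange(g, dtype=float) for g in grid),
                          indexing="ij")
    maps = []
    for kx, ky, kz in keypoints:
        d2 = (X - kx) ** 2 + (Y - ky) ** 2 + (Z - kz) ** 2
        maps.append(np.exp(-d2 / (2.0 * sigma ** 2)))
    return np.stack(maps)
```

A network can then be trained to regress these per-keypoint volumes directly, with the predicted joint location recovered as the arg-max voxel.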
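As a rough illustration of how thousands of sensing points can be read with a modest number of electrodes, the sketch below scans a resistive matrix row by row, assuming a 96 x 96 grid (96 x 96 = 9,216, matching the paper's sensor count). Everything here is our assumption: `read_column_adc` is a hypothetical stand-in for the real readout electronics, and the paper's circuit details are not reproduced.

```python
import numpy as np

ROWS, COLS = 96, 96  # assumed grid: 96 x 96 = 9,216 sensing points

def scan_frame(read_column_adc):
    """One raw pressure frame from a row-column scanned sensing matrix.

    `read_column_adc(row)` (hypothetical) drives one row electrode and
    returns ADC counts for all column electrodes. Scanning rows one at a
    time means 96 + 96 electrode connections suffice to address all
    9,216 crossing points, instead of one wire per sensor.
    """
    frame = np.empty((ROWS, COLS))
    for r in range(ROWS):
        frame[r] = read_column_adc(r)
    return frame
```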
