Learning Perspective Undistortion of Portraits
Yajie Zhao1,*, Zeng Huang1,2,*, Tianye Li1,2, Weikai Chen1, Chloe LeGendre1,2, Xinglei Ren1, Jun Xing1, Ari Shapiro1, and Hao Li1,2,3
1USC Institute for Creative Technologies  2University of Southern California  3Pinscreen

Figure 1: We propose a learning-based method to remove perspective distortion from portraits. For a subject with two different facial expressions, we show input photos (b) (e), our undistortion results (c) (f), and reference images (d) (g) captured simultaneously using a beam splitter rig (a). Our approach handles even extreme perspective distortions.

Abstract

Near-range portrait photographs often contain perspective distortion artifacts that bias human perception and challenge both facial recognition and reconstruction techniques. We present the first deep learning based approach to remove such artifacts from unconstrained portraits. In contrast to the previous state-of-the-art approach [25], our method handles even portraits with extreme perspective distortion, as we avoid the inaccurate and error-prone step of first fitting a 3D face model. Instead, we predict a distortion correction flow map that encodes a per-pixel displacement that removes distortion artifacts when applied to the input image. Our method also automatically infers missing facial features, i.e., occluded ears caused by strong perspective distortion, with coherent details. We demonstrate that our approach significantly outperforms the previous state-of-the-art [25] both qualitatively and quantitatively, particularly for portraits with extreme perspective distortion or facial expressions. We further show that our technique benefits a number of fundamental tasks, significantly improving the accuracy of both face recognition and 3D reconstruction, and enables a novel camera calibration technique from a single portrait. Moreover, we also build the first perspective portrait database with a large diversity in identities, expressions, and poses.

1. Introduction

Perspective distortion artifacts are often observed in portrait photographs, in part due to the popularity of the "selfie" image captured at a near-range distance. The inset images, where a person is photographed from distances of 160cm and 25cm, demonstrate these artifacts. When the object-to-camera distance is comparable to the size of a human head, as in the 25cm example, there is a large proportional difference between the camera-to-nose distance and the camera-to-ear distance. This difference creates a face with unusual proportions, with the nose and eyes appearing larger and the ears vanishing altogether [61].
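The magnification imbalance follows directly from pinhole projection: a feature at depth Z is imaged at a scale proportional to 1/Z. The short sketch below works through the arithmetic with an assumed nose-to-ear depth offset of roughly 10cm; the numbers are for intuition only and are not taken from the paper.

```python
# Illustrative pinhole-camera arithmetic (assumed ~10cm nose-to-ear depth offset;
# figures are for intuition only, not from the paper).
# Under pinhole projection, a feature at depth Z is imaged at scale f / Z,
# so the nose is magnified relative to the ears by Z_ear / Z_nose.

def nose_to_ear_magnification(camera_to_nose_cm: float,
                              nose_to_ear_offset_cm: float = 10.0) -> float:
    camera_to_ear_cm = camera_to_nose_cm + nose_to_ear_offset_cm
    return camera_to_ear_cm / camera_to_nose_cm

print(nose_to_ear_magnification(25.0))   # ~1.40: strong distortion at selfie range
print(nose_to_ear_magnification(160.0))  # ~1.06: nearly uniform at portrait range
```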
Perspective distortion in portraits not only influences the way humans perceive one another [10], but also greatly impairs a number of computer vision-based tasks, such as face verification and landmark detection. Prior research [41, 42] has demonstrated that face recognition is strongly compromised by the perspective distortion of facial features. Additionally, 3D face reconstruction from such portraits is highly inaccurate, as geometry fitting starts from biased facial landmarks and distorted textures.

Correcting perspective distortion in portrait photography is a largely unstudied topic. Recently, Fried et al. [25] investigated a related problem, aiming to manipulate the relative pose and distance between a camera and a subject in a given portrait. Towards this goal, they fit a full perspective camera and parametric 3D head model to the input image and performed 2D warping according to the desired change in 3D. However, the technique relied on a potentially inaccurate 3D reconstruction from facial landmarks biased by perspective distortion. Furthermore, if the 3D face model fitting step failed, as it could for extreme perspective distortion or dramatic facial expressions, so would their broader pose manipulation method. In contrast, our approach does not rely on model fitting from distorted inputs and thus can handle even these challenging inputs. Our GAN-based synthesis approach also enables high-fidelity inference of any occluded features, not considered by Fried et al. [25].

In our approach, we propose a cascaded network that maps a near-range portrait with perspective distortion to its distortion-free counterpart at a canonical distance of 1.6m (although any distance between 1.4m and 2m could be used as the target distance for good portraits). Our cascaded network includes a distortion correction flow network and a completion network. Our distortion correction flow method encodes a per-pixel displacement, maintaining the original image's resolution and its high-frequency details in the output. However, as near-range portraits often suffer from significant perspective occlusions, flowing individual pixels often does not yield a complete final image. Thus, the completion network inpaints any missing features. A final texture blending step combines the face from the completion network and the warped output from the distortion correction network. As the possible range of per-pixel flow values varies with camera distance, we first train a camera distance prediction network, and feed this prediction along with the input portrait to the distortion correction network.
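To make the roles of the flow and completion stages concrete, the following is a minimal sketch of how a predicted per-pixel correction flow could be applied and blended with an inpainted result. It assumes OpenCV-style backward warping, a `flow` array of x/y displacements in pixels, and a binary `hole_mask` marking disoccluded regions; none of these names or conventions are prescribed by the paper.

```python
# Minimal sketch (assumptions noted above; not the paper's implementation):
# warp the distorted portrait with a per-pixel correction flow, then fill
# disoccluded regions with the output of a separate completion/inpainting step.
import cv2
import numpy as np

def apply_correction_flow(image: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward warp: output(x, y) samples input(x + flow_x, y + flow_y)."""
    h, w = image.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w, dtype=np.float32),
                                 np.arange(h, dtype=np.float32))
    map_x = grid_x + flow[..., 0]
    map_y = grid_y + flow[..., 1]
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)

def blend_with_completion(warped: np.ndarray, completed: np.ndarray,
                          hole_mask: np.ndarray) -> np.ndarray:
    """Keep warped pixels where valid; take inpainted pixels inside the holes."""
    mask = hole_mask.astype(np.float32)[..., None]
    return (warped * (1.0 - mask) + completed * mask).astype(warped.dtype)
```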
Training our proposed networks requires a large corpus of paired portraits with and without perspective distortion. However, to the best of our knowledge, no previously existing dataset is suited for this task. As such, we construct the first portrait dataset rendered from 3D head models with large variations in camera distance, head pose, subject identity, and illumination. To visually and numerically evaluate the effectiveness of our approach on real portrait photographs, we also design a beam-splitter photography system (see Figure 1) to capture portrait pairs of real subjects simultaneously on the same optical axis, eliminating differences in poses, expressions, and lighting conditions.

Experimental results demonstrate that our approach removes a wide range of perspective distortion artifacts (e.g., increased nose size, squeezed face, etc.), and even restores missing facial features like the ears or the rim of the face. We show that our approach significantly outperforms Fried et al. [25] both qualitatively and quantitatively on a synthetic dataset, constrained portraits, and unconstrained portraits from the Internet. We also show that our proposed face undistortion technique, when applied as a pre-processing step, improves a wide range of fundamental tasks in computer vision and computer graphics, including face recognition/verification, landmark detection on near-range portraits (such as those from head-mounted cameras in visual effects), and 3D face model reconstruction, which can help 3D avatar creation and the generation of 3D photos (Section 6.3). Additionally, our novel camera distance prediction provides accurate camera calibration from a single portrait.

Our main contributions can be summarized as follows:

• The first deep learning based method to automatically remove perspective distortion from an unconstrained near-range portrait, benefiting a wide range of fundamental tasks in computer vision and graphics.

• A novel and accurate camera calibration approach that only requires a single near-range portrait as input.

• A new perspective portrait database for face undistortion with a wide range of subject identities, head poses, camera distances, and lighting conditions.

2. Related Work

Face Modeling. We refer the reader to [46] for a comprehensive overview and introduction to the modeling of digital faces. With advances in 3D scanning and sensing technologies, sophisticated laboratory capture systems [6, 7, 9, 23, 26, 40, 44, 62] have been developed for high-quality face reconstruction. However, 3D face geometry reconstruction from a single unconstrained image remains challenging. The seminal work of Blanz and Vetter [8] proposed a PCA-based morphable model, which laid the foundation for modern image-based 3D face modeling and inspired numerous extensions, including face modeling from internet pictures [35], multi-view stereo [3], and reconstruction based on shading cues [36]. To better capture a variety of identities and facial expressions, multi-linear face models [60] and FACS-based blendshapes [12] were later proposed. When reconstructing a 3D face from images, sparse 2D facial landmarks [20, 21, 52, 65] are widely used for a robust initialization. Shape regressions have been exploited in state-of-the-art landmark detection approaches [13, 34, 48] to achieve impressive accuracy.

Due to the low dimensionality and effectiveness of morphable models in representing facial geometry, there have been significant recent advances in single-view face reconstruction [56, 49, 38, 55, 30]. However, for near-range portrait photos, the perspective distortion of facial features may lead to erroneous reconstructions even when using state-of-the-art techniques. Therefore, portrait perspective undistortion must be considered as part of a pipeline for accurately modeling facial geometry.

… using vanishing points from a single scene photo with horizontal lines. To the best of our knowledge, our method is the first to estimate camera parameters from a single portrait.

3. Overview