arXiv:2104.07608v1 [cs.CV] 15 Apr 2021
Camera View Adjustment Prediction for Improving Image Composition

Yu-Chuan Su, Raviteja Vemulapalli, Ben Weiss, Chun-Te Chu, Philip Andrew Mansfield, Lior Shapira, Colvin Pitts
Google Research

Abstract

Image composition plays an important role in the quality of a photo. However, not every camera user possesses the knowledge and expertise required for capturing well-composed photos. While post-capture cropping can sometimes improve the composition, it does not work in many common scenarios in which the photographer needs to adjust the camera view to capture the best shot. To address this issue, we propose a deep learning-based approach that provides suggestions to the photographer on how to adjust the camera view before capturing. By optimizing the composition before a photo is captured, our system helps photographers to capture better photos. As there is no publicly-available data for this task, we create a view adjustment dataset by repurposing existing image cropping datasets. Furthermore, we propose a two-stage semi-supervised approach that utilizes both labeled and unlabeled images for training a view adjustment model. Experimental results show that the proposed semi-supervised approach outperforms the corresponding supervised alternatives, and our user study results show that the suggested view adjustment improves image composition 79% of the time.

[Figure 1 diagram: the initial view is passed to the View Adjustment Model, which gives real-time feedback (Needs adjustment: Yes; Adjustment: Move-Down; Magnitude: 20%); the user adjusts the view and obtains a photo with improved composition.]
Figure 1: Our goal is to improve the composition of the captured photo by providing view adjustment suggestions when the user is composing the shot.

1. Introduction

Image composition has a significant effect on the perception of an image. While a good composition can help make a great picture out of the dullest subjects and plainest of environments, a bad composition can easily ruin a photograph however interesting the subject may be. Unfortunately, a typical camera user may lack the knowledge and expertise required to capture images with great composition.

A commonly used technique for improving image composition is image cropping, and several existing works study how to crop images automatically [38, 5, 35, 29, 4, 49, 23, 11, 31, 40, 10, 7, 12, 44, 6, 42, 47, 26, 39, 48]. However, cropping works only in limited scenarios in which the best composition can be achieved by removal of certain portions of the image. It is not suitable in many common scenarios where the photographer needs to adjust the camera view to get the best shot. When evaluated on our view adjustment dataset, the crops predicted by the state-of-the-art GAIC cropping model [48] have an average IoU of 0.61 with the ground-truth views, clearly indicating that cropping is not enough. In comparison, the proposed view adjustment model achieves a much higher IoU of 0.75.

While there are some existing rules for composing photos, each rule is valid only for specific scenes, and requires detection of various low-level (leading lines, triangles, etc.) and high-level semantic (face/person, foreground objects) cues. Furthermore, it is non-trivial to determine which rule or combination of rules is applicable for a given scene.

To address this problem, we introduce a system that provides camera view adjustment suggestions to the photographer when they are composing the shot. Given a view composed by the user, our goal is to suggest a candidate view adjustment and its magnitude such that the photo captured after applying the adjustment will have a better composition (see Fig. 1). Specifically, we consider the following adjustments in this work: horizontal (left or right), vertical (up or down), zoom (in or out), and rotation (clockwise or counterclockwise) about the principal axis. The adjustment magnitude is represented using a percentage of the image size for all adjustments except rotation, for which we use radians. By adjusting the view prior to capture, we enable generic modifications to image composition. Hence, our system can improve image composition in scenarios where cropping fails (see Fig. 2). Note that this work focuses on static scenes, or more specifically scenes where motion is not the subject (e.g. portraits, nature and urban environment photography). Suggesting view adjustments may not suit dynamic (wildlife, sports, action) scenes where lightning-quick decisions are required from the photographer.

Figure 2: View adjustment enables more diverse modification to image composition, and can improve composition in scenarios where image cropping fails.

To the best of our knowledge, there is no publicly-available data for evaluating the performance of view adjustment prediction. While a common practice is for human raters to annotate images with ground-truth labels, this is difficult for view adjustment because the results of adjustment are generally not available. The raters need to infer how the adjustment affects the composition, which may be difficult without professional photography knowledge. Instead, we create a new view adjustment dataset from existing image cropping datasets. The idea is to convert view adjustments into operations on 2D image bounding boxes and use the best crop annotation as the target view for adjustment. The proposed approach allows us to generate samples and view adjustment labels from image cropping datasets automatically without additional human labor.

We acknowledge that our view adjustment dataset is not ideal in several respects. It ignores perspective distortions while adjusting the camera view, and the adjustment magnitude could be limited depending on the ratios between the best crop and the uncropped image sizes. Despite these limitations, our view adjustment dataset still provides a good starting point to evaluate composition-aware view adjustment prediction models. Furthermore, we address these limitations by also evaluating the view adjustment model on 360° images, which do not suffer from distortions or limited field-of-view (FOV). Please refer to Sec. 4.3 for details.

Another limitation of our view adjustment dataset is that its size and diversity are inherently limited by the cropping datasets. Because state-of-the-art machine learning models typically require a large and diverse dataset to train well, our view adjustment dataset may not be sufficient for training a good view adjustment model. In light of this problem, we propose a two-stage training approach that makes use of additional unlabeled images (see Fig. 3). Our empirical results show that the additional unlabeled data is important for improving model performance.

We evaluate our approach on a view adjustment dataset consisting of 3,026 samples generated from 521 images from the FCDB [6] and GAICD [47] datasets. Quantitative results show that the proposed semi-supervised approach clearly outperforms the corresponding supervised alternatives, and a user study shows that the adjustments suggested by our model improve the composition 79% of the time.

Our major contributions are as follows. First, we formulate the problem of view adjustment prediction for improving image composition. Second, we introduce a labeled dataset for evaluating view adjustment prediction models. Finally, we propose a two-stage semi-supervised approach that leverages both labeled and unlabeled data to train the view adjustment model. We show that the proposed semi-supervised approach outperforms the corresponding supervised alternative quantitatively and demonstrate the effectiveness of our model through a user study.

2. Related Work

Image cropping. Cropping is a widely used technique for changing image composition during post-processing, and automatic image cropping algorithms aim to find the best crop within an image. One common approach followed by existing methods is to select the candidate crops using a scoring function, and the research focus has been on designing a good scoring function for cropping. Existing works exploit saliency [38, 5, 35, 29, 4], photography rules [49, 23, 11], or a data-driven approach to learn the scoring function [31, 40, 10, 7, 12, 44, 6, 42, 47, 26, 39, 48, 22]. While early methods learn the cropping model using unsupervised data [7, 12] or data annotated for generic image aesthetic quality [31, 40, 10], recent works show that a large-scale annotated dataset designed specifically for cropping is essential for learning a state-of-the-art image cropping model [42, 47, 26, 39, 48]. Instead of following the scoring function paradigm, some works try to predict the target crop directly without generating and scoring candidate crops [19, 20, 25, 21, 24].

Our goal is not to produce the best image crop; it is to improve image composition while the photographer is composing the image. Because view adjustment is a more general operation than image cropping, it may improve composition in scenarios where cropping is not suitable. Furthermore, a view adjustment model needs to make a suggestion based on partial information, while the target crop is fully visible to image cropping models. This introduces unique challenges in view adjustment prediction for both data collection and modeling.

Photography recommendation. Prior works on photography recommendation provide various types of suggestions. Some of them study person location recommendation [50, 43, 28, 27, 41, 34]. They take a scenic image as input and suggest where a person should stand within the
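[Figure 3 diagram: an unlabeled image passes through the Composition Scoring Model (Sec. 3.2) and Pseudo Label Generation (Sec. 3.3); a well-composed image with a cropping annotation gives the best image crop, which is perturbed for View Adjustment Label Generation (Sec. 3.1); both paths train the View Adjustment Model (Sec. 3.4), which answers: Need adjustment? Which adjustment? Adjustment magnitude?]
Figure 3: Our two-stage semi-supervised approach leverages both labeled and unlabeled data to learn the view adjustment model.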
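The labeled path in Figure 3 (perturb the best crop, then generate a view adjustment label) and the IoU comparison reported in the introduction can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the authors' procedure, which is given in Sec. 3.1. The `Box` class, the function names, the sign conventions, and the choice to measure magnitude as a fraction of the current view size are all hypothetical. Only horizontal and vertical shifts are covered; zoom and rotation are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned view in pixel coordinates, from (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        return self.y1 - self.y0

    @property
    def area(self) -> float:
        # Clamp so that an empty intersection has zero area.
        return max(self.width, 0.0) * max(self.height, 0.0)


def apply_adjustment(view: Box, kind: str, magnitude: float) -> Box:
    """Shift a view; magnitude is a signed fraction of the view size (assumed)."""
    if kind == "horizontal":  # positive moves the view right
        dx = magnitude * view.width
        return Box(view.x0 + dx, view.y0, view.x1 + dx, view.y1)
    if kind == "vertical":  # positive moves the view down (image coordinates)
        dy = magnitude * view.height
        return Box(view.x0, view.y0 + dy, view.x1, view.y1 + dy)
    raise ValueError(f"unsupported adjustment in this sketch: {kind}")


def make_sample(best_crop: Box, kind: str, perturbation: float):
    """Perturb the best crop into an initial view; the label undoes the shift."""
    initial_view = apply_adjustment(best_crop, kind, perturbation)
    # A pure shift keeps the view size, so the inverse is the negated magnitude.
    return initial_view, (kind, -perturbation)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = Box(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1))
    union = a.area + b.area - inter.area
    return inter.area / union if union > 0 else 0.0


if __name__ == "__main__":
    best_crop = Box(100, 100, 500, 400)  # hypothetical annotated best crop
    view, label = make_sample(best_crop, "vertical", -0.2)  # view shifted up
    print("label:", label)  # the suggested adjustment is to move down
    print("IoU before adjustment:", iou(view, best_crop))
    print("IoU after adjustment:", iou(apply_adjustment(view, *label), best_crop))
```

In this sketch a view displaced upward by 20% of its height receives the label "move down by 20%", which mirrors the suggestion shown in Figure 1. Applying the label recovers the best crop, so the IoU with the target view rises to 1.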