
Learning to Learn Cropping Models for Different Aspect Ratio Requirements

Debang Li1,2, Junge Zhang1,2, Kaiqi Huang1,2,3
1 CRISE, Institute of Automation, Chinese Academy of Sciences, Beijing, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
3 CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China
{debang.li, jgzhang, kaiqi.huang}@nlpr.ia.ac.cn

Abstract

Image cropping aims at improving the composition of an image by removing its extraneous outer areas, and it is widely used in the photographic and printing industry. In some cases, the aspect ratio of the cropping result is specified depending on some conditions. In this paper, we propose a meta-learning (learning to learn) based aspect ratio specified image cropping method called Mars, which can generate cropping results of different expected aspect ratios. In the proposed method, a base model and two meta-learners are obtained during the training stage. Given an aspect ratio in the test stage, a new model with new parameters can be generated from the base model. Specifically, the two meta-learners predict the parameters of the base model based on the given aspect ratio. The learning process of the proposed method is learning how to learn cropping models for different aspect ratio requirements, which is a typical meta-learning process. In the experiments, the proposed method is evaluated on three datasets and outperforms most state-of-the-art methods in terms of accuracy and speed. In addition, both the intermediate and final results show that the proposed model can predict different cropping windows for an image depending on different aspect ratio requirements.

Figure 1. Illustration of the aspect ratio specified image cropping. The left image is the original image, and the three images on the right are cropped images of different required aspect ratios.

1. Introduction

Image cropping is commonly used in image editing, trying to find a good view with a better composition than the input image. Automatic image cropping can be widely applied in the photographic industry, the printing industry, and other related fields to save time. Depending on the application, the aspect ratio of the cropped image may be specified and vary with different conditions. As such, aspect ratio specified image cropping algorithms should be able to cover a range of aspect ratios, an illustration of which is shown in Figure 1.

Early research on general image cropping mostly focuses on two-stage methods [31, 32, 13, 6, 7, 19]: many candidates are generated at the first stage and ranked at the second stage. These two-stage methods can be directly transferred to the aspect ratio specified setting by adjusting the candidates. Since there are many candidates in an image, the speed of these methods is inevitably slow. To speed up, several methods [40, 25, 41, 26] obtain the cropping window directly without using the sliding window. However, these methods rarely consider the aspect ratios. In [12], an object detection based approach is proposed for the aspect ratio specified image cropping by adding more prediction heads.

In this paper, we regard generating the cropped images of a specified aspect ratio as an isolated task and adopt a single model to accomplish multiple such sub-tasks. The model should be able to adapt to many environments with different aspect ratio requirements. Therefore, we propose a meta-learning (learning to learn) based aspect ratio specified image cropping approach (called Mars) to accomplish this goal. In the proposed approach, we train a base model and two meta-learners during the training process. In the inference stage, a new model with new parameters is generated from the base model given a new aspect ratio. Specifically, some parameters of the base model are predicted by the meta-learners depending on the required aspect ratio. As the required aspect ratio is a continuous value, the number of models with different parameters is infinite. The learning process of the proposed method can be viewed as learning how to learn cropping models for different aspect ratios. In the base model, the parameters depending on the required aspect ratio are the aspect ratio specified feature transformation matrix (ARS-FTM) and the aspect ratio specified pixel-wise predictor (ARS-PWP). When both ARS-FTM and ARS-PWP are determined by the meta-learners, the newly generated model can predict the cropping window of the specified aspect ratio from the image. In the experiments, both the quantitative and qualitative results show that the proposed meta-learning based approach can generate cropping windows of required aspect ratios effectively and efficiently.

The main contributions of this work are:
• We propose a meta-learning based method that can predict the cropping results of arbitrary aspect ratios using a single model.
• We develop an aspect ratio embedding method and two aspect ratio specified modules (i.e., ARS-FTM and ARS-PWP) to model the aspect ratio information and map the aspect ratio to the parameters of the model.
• We show that the proposed algorithm achieves state-of-the-art performance on both the quantitative evaluation and the user study and can run in real-time (over 100 FPS).

2. Related Work

Image Cropping. Most early research on image cropping focused on sliding window based two-stage operations. According to the standard used to rank the candidates generated by the sliding window, these methods can be divided into two groups: attention-based and aesthetics-based methods. The attention-based methods [30, 32, 37, 13, 4] usually rank candidates according to the attention score obtained by saliency detection [30]. As such, the cropping windows can preserve the main subjects and draw more attention from people. However, they may fail to generate visually pleasing results because they do not consider the image composition [6]. The aesthetics-based methods [19] try to find the most visually pleasing cropping window from the input image. Some methods [31, 8, 42, 13] design a set of hand-crafted features to evaluate the aesthetics, and other methods [17, 6, 7, 41] train aesthetics discriminators from data to rank cropping candidates. Several methods [36, 9, 35, 4, 25, 26] are developed to search for the optimal cropping window more efficiently instead of evaluating all candidates, which speeds up the sliding window based methods.

Fast-AT [12] is designed for the aspect ratio specified image cropping by plugging several prediction heads for different aspect ratio intervals into an object detection model [10]. In the proposed meta-learning based approach, we do not have to train different filters separately for different aspect ratios, but use a single model to adapt to different aspect ratio requirements, where the aspect ratio specified parameters are predicted by the meta-learners. Image retargeting methods [1, 27] adjust the images to fit the target aspect ratio while keeping the important contents, which is related to our task. However, image cropping aims to find the best window on the image that satisfies the requirement, while image retargeting concentrates on content-aware image resizing, so the experimental settings of these two tasks are different.

Meta-Learning. Meta-learning is also known as learning to learn, which means that machine learning algorithms can learn how to learn knowledge. In other words, the model needs to be aware of and take control of its learning [24]. Through these properties of meta-learning, models can be adapted more easily to different environments and tasks, rather than considering each one separately. Due to these reasons, meta-learning has been widely applied in hyper-parameter optimization [29], neural network optimization [5], few-shot learning [14], fast reinforcement learning [38], and visual tracking [2, 24].

In this paper, our goal is to solve cropping problems for different aspect ratio requirements with a single model. Regarding generating cropping results for a specified aspect ratio as an isolated task, the above goal can be naturally solved by meta-learning. Weight prediction is one of the meta-learning strategies [23], which can adapt models to different environments by dynamically predicting the weights of the model [3, 16, 43]. The proposed method also belongs to this category: it predicts the weights of the model depending on the aspect ratio information. Kishore et al. [21] use the adaptive convolution [18] for the final classification and regression according to a scalar input value (the aspect ratio). In contrast, our method uses embedding interpolation for the aspect ratio representation and also generates an ARS-FTM module for the global feature transformation in the middle stage, which can encode the aspect ratio information in the global feature representation. In addition, our model can run much faster (over 100 FPS).

3. Proposed Algorithm

3.1. Problem Formulation

In this section, we formulate the aspect ratio specified image cropping problem and the proposed meta-learning based approach (Mars). For general image cropping problems, the model takes an image x_i as input and outputs a visually pleasing cropping window y_i, which is

y_i = F(x_i; W), (1)

where W represents the parameters of the model F. Different from the general setting, the aspect ratio specified image cropping has an additional aspect ratio requirement, which is

y_i^(τ_i) = F(x_i, τ_i; W), (2)

where τ_i is the required aspect ratio, and y_i^(τ_i) is the cropping result with an aspect ratio of τ_i.

Figure 2. Overview of the proposed model. The numbers above each feature map represent the shape of the feature map (height × width × channel).

In this paper, we propose a meta-learning based approach that can generate model parameters for different τ_i continuously. Specifically, a sub-network (meta-learner) is used to map τ_i to the model parameters, which is

W = ϕ(τ_i; W′), (3)

where W′ is the parameters of the meta-learner ϕ. Since τ_i is a continuous value, the number of models with different parameters generated by the meta-learner can be infinite. The proposed approach can be finally formulated as

y_i^(τ_i) = F(x_i; ϕ(τ_i; W′)), (4)

where the model parameters incorporate the aspect ratio information and change accordingly.

3.2. Architecture Overview

With the previous formulation, we now introduce the proposed meta-learning framework, which contains a base model and two meta-learners. The architecture and details of the proposed framework are illustrated in Figure 2.

There are two inputs of the framework: the image and the required aspect ratio (τ_i). At first, the aspect-ratio-agnostic feature vector f_ara is extracted from the input image through the convolution blocks (backbone network) and a global average pooling (GAP) operation, which is the feature representation of the input image without considering the required aspect ratio. After that, f_ara is transformed into an aspect-ratio-specified feature vector f_ars by an aspect-ratio-specified feature transformation matrix (ARS-FTM), which is a fully-connected layer whose parameters are predicted by a meta-learner depending on τ_i. In this way, the image feature and the aspect ratio information are both embedded in f_ars. Then f_ars is added to each location of the last feature map before the GAP layer to generate a new feature map. The new feature map retains the original spatial information and also incorporates the global feature and the aspect ratio information. The details of the feature transformation process are shown in Figure 3.

The new feature map is fed into several cascaded deconvolution layers (the upsampling module) to increase its spatial resolution to H_out × W_out. Each deconvolution layer doubles the resolution and keeps the same channel dimension (C_out). After that, an aspect-ratio-specified pixel-wise predictor (ARS-PWP), which is a 1 × 1 convolution layer predicted by a meta-learner, is used to predict the cropping area. The prediction is finally normalized by a sigmoid function, and the cropping window of the required aspect ratio is generated through a post-processing step (see Section 3.4).

In general, the parameters of machine learning models are fixed in the test stage. However, the parameters of ARS-FTM and ARS-PWP vary depending on the required aspect ratio during the test, which can be interpreted as a new model for a new aspect ratio. With meta-learning, we can generate models for arbitrary aspect ratio requirements, even if these aspect ratios do not appear in the training stage.
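The forward pass described above can be sketched in PyTorch-style pseudocode as follows. This is only a minimal sketch, not the authors' implementation: the tensor shapes, the absence of bias terms, and the names feat_map and upsample are assumptions, and the ARS-FTM/ARS-PWP parameters are supplied externally by the meta-learners of Section 3.3.

```python
import torch
import torch.nn.functional as F

def base_model_forward(feat_map, W_ftm, W_pwp, upsample):
    """Base-model forward pass with aspect-ratio-specified parameters.

    feat_map : last backbone feature map, shape (B, C, h, w)
    W_ftm    : ARS-FTM weight matrix (C x C), predicted by a meta-learner (Eq. 5)
    W_pwp    : ARS-PWP 1x1 conv kernel (1 x C_out x 1 x 1), predicted by a meta-learner (Eq. 6)
    upsample : cascaded deconvolution (upsampling) module, aspect-ratio-agnostic
    """
    # Aspect-ratio-agnostic global feature via global average pooling (GAP)
    f_ara = feat_map.mean(dim=(2, 3))                 # (B, C)
    # ARS-FTM: fully-connected transform into the aspect-ratio-specified space
    f_ars = F.linear(f_ara, W_ftm)                    # (B, C)
    # Add f_ars to every spatial location of the last feature map (Figure 3)
    x = feat_map + f_ars[:, :, None, None]            # (B, C, h, w)
    # Cascaded deconvolutions double the resolution up to H_out x W_out
    x = upsample(x)                                   # (B, C_out, H_out, W_out)
    # ARS-PWP: 1x1 convolution predicting the cropping area, then sigmoid
    return torch.sigmoid(F.conv2d(x, W_pwp))          # (B, 1, H_out, W_out)
```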
3.3. Aspect Ratio Specified Module

In this section, we introduce the meta-learners that map the aspect ratio to the parameters of the base model. As shown in Figure 2, there are two modules whose parameters are determined by τ_i, namely ARS-FTM and ARS-PWP. According to Equation 3, the map functions of these two modules can be written as

W_ARS-FTM = ϕ_ARS-FTM(τ_i; W′_ARS-FTM) (5)

and

W_ARS-PWP = ϕ_ARS-PWP(τ_i; W′_ARS-PWP). (6)

The output of ϕ_ARS-FTM is a matrix that transforms the aspect-ratio-agnostic feature into the aspect-ratio-specified feature space, and the output of ϕ_ARS-PWP is a 1 × 1 convolution layer that predicts the cropping area.
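A minimal sketch of how the two map functions in Equations 5 and 6 can be realized is given below, assuming the dimensions of Section 4.2 (512-d embeddings, c = C_out = 96) and the single 512-neuron fully-connected layer of Table 2. Omitting predicted bias terms and any activation function are assumptions made here for brevity, not details stated in the paper.

```python
import torch.nn as nn

class ARSMetaLearner(nn.Module):
    """Maps the aspect-ratio embedding to the ARS-FTM and ARS-PWP parameters."""

    def __init__(self, emb_dim=512, feat_dim=96, out_ch=96):
        super().__init__()
        self.feat_dim, self.out_ch = feat_dim, out_ch
        self.fc = nn.Linear(emb_dim, 512)                      # shared FCs (Figure 4, Table 2)
        self.ftm_head = nn.Linear(512, feat_dim * feat_dim)    # ARS-FTM matrix, reshaped below
        self.pwp_head = nn.Linear(512, out_ch)                 # ARS-PWP 1x1 conv kernel

    def forward(self, emb):                                    # emb: (emb_dim,) embedding of tau_i
        h = self.fc(emb)
        W_ftm = self.ftm_head(h).view(self.feat_dim, self.feat_dim)
        W_pwp = self.pwp_head(h).view(1, self.out_ch, 1, 1)    # (out_channels, in_channels, 1, 1)
        return W_ftm, W_pwp
```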

Figure 3. Illustration of the feature transformation process. The obtained feature map retains the original spatial information and also incorporates the global information (GAP) and the aspect ratio information (ARS-FTM). The symbols are described in Figure 2.

In this paper, we use a fully-connected network with two outputs to implement the above two map functions. Since the aspect ratio τ_i is a scalar, directly mapping τ_i to a high-dimensional space may not perform well, which is also verified in the experiments (see Section 4.2). Instead, we use embedding vectors and linear interpolation to represent the continuous τ_i.

First, we select N aspect ratios, each with a corresponding embedding vector. The set of selected aspect ratios is denoted as S_τ, and the corresponding set of embedding vectors is S_emb. To generate the embedding vector of an arbitrary τ_i, we use the linear interpolation of the two embedding vectors from S_emb whose corresponding aspect ratios are the closest to τ_i. Following [12], the range of the aspect ratio is from 0.5 to 2, that is, τ_i ∈ [0.5, 2]. When choosing the N aspect ratios in S_τ, we want to make the numbers of chosen aspect ratios in [0.5, 1) and (1, 2] equal, because the shapes of the images in these two intervals are symmetrical (rotated 90°), such as 3:4 and 4:3. For this purpose, we use the logarithmic transformation to map τ_i to log τ_i and choose log τ_i in [log 0.5, log 2] evenly with a step size of 2 log 2 / (N − 1), where N is an odd number.

Since the aspect ratios are equally spaced in the logarithmic space, the linear interpolation is also performed in the logarithmic space to generate the embedding vector E(τ_i) of an arbitrary τ_i, which is

E(τ_i) = [(log τ_i^(upper) − log τ_i) / (2 log 2 / (N − 1))] × E(τ_i^(lower)) + [(log τ_i − log τ_i^(lower)) / (2 log 2 / (N − 1))] × E(τ_i^(upper)), (7)

where τ_i^(upper) and τ_i^(lower) are the two adjacent aspect ratios of τ_i in S_τ, satisfying τ_i^(upper) > τ_i > τ_i^(lower). Since τ_i is a continuous value, the number of embedding vectors that can be generated by the linear interpolation is infinite. The embedding vectors in S_emb are all trainable in the training stage, and new embedding vectors for new aspect ratios can be generated in the test stage. The dimension of the embedding vectors is 512.

After obtaining the embedding vector of the required aspect ratio, we use a fully-connected network with two outputs to implement the two meta-learners, which map the embedding vector to the model parameters. The architecture of the meta-learners is shown in Figure 4. When a new required aspect ratio is given, the outputs of the sub-network are reshaped to the target shape and plugged into the base model to form a new model with new parameters.

Figure 4. Illustration of the aspect-ratio-specified modules. We translate the aspect ratio τ_i (1-d) to the embedding vector (512-d) using Equation 7. Then the sub-network maps the embedding vector to the parameters of the base model. The channel dimension of f_ara in Figure 2 is c. Because the channel dimension of the feature map output by the upsampling module is C_out, ARS-PWP is reshaped to C_out × 1, which means the number of input channels is C_out and the number of output channels is 1.
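The log-space interpolation of Equation 7 can be sketched as a small module as follows. This is an illustrative implementation only, with N = 101 anchors and 512-dimensional embeddings taken from Section 4.2; the clamping of out-of-range indices is an assumption.

```python
import math
import torch
import torch.nn as nn

class AspectRatioEmbedding(nn.Module):
    """Log-space linear interpolation of learned aspect-ratio embeddings (Eq. 7)."""

    def __init__(self, num_anchors=101, emb_dim=512):
        super().__init__()
        assert num_anchors % 2 == 1, "N is odd so that tau = 1 is an anchor"
        self.num_anchors = num_anchors
        self.step = 2.0 * math.log(2.0) / (num_anchors - 1)   # spacing in log-space
        self.table = nn.Embedding(num_anchors, emb_dim)        # trainable anchor embeddings

    def forward(self, tau):
        # Position of tau in log-space, measured from log(0.5) in units of the step size
        pos = (math.log(tau) - math.log(0.5)) / self.step
        lower = int(math.floor(pos))
        lower = min(max(lower, 0), self.num_anchors - 2)        # clamp to a valid anchor pair
        upper = lower + 1
        w_upper = pos - lower                                   # weight of the upper anchor
        w_lower = 1.0 - w_upper                                 # weight of the lower anchor
        e_lower, e_upper = self.table(torch.tensor([lower, upper]))
        return w_lower * e_lower + w_upper * e_upper            # E(tau), Eq. 7
```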
3.4. Training and Inference

During the training process, the target value of the pixels in the cropping area is 1, and the value of the rest is 0. The binary cross entropy (BCE) loss is used to compute the loss function, which is

L(p, g) = −(1 / N_pixel) Σ_i [g_i log p_i + (1 − g_i) log(1 − p_i)], (8)

where p and g are the prediction and ground truth values, respectively, N_pixel is the number of pixels, and i is the index of the pixel position. The meta-learners do not have other supervision, and the entire model is trained with the BCE loss in an end-to-end manner.

In the inference stage, after obtaining the prediction of the network, we use a post-processing step to get the cropping result. First, the prediction is binarized using a threshold θ. Then, the center of the cropping result is obtained by computing the median of the coordinates of all positions whose value is 1. We sum the values of each column (or row) and select the median of those non-zero results as the height (or width). After that, the height or width is reduced to meet the aspect ratio requirement, while the other one is kept unchanged. Finally, the cropping window is determined by the center, width, and height.
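A minimal sketch of this post-processing step is shown below, assuming the prediction is available as a NumPy array and that the aspect ratio τ is defined as width divided by height; the rounding of the final window and the handling of empty masks are assumptions that the paper does not specify.

```python
import numpy as np

def postprocess(pred, tau, theta=0.4):
    """Turn the pixel-wise prediction into a crop window (cx, cy, w, h) with aspect ratio tau."""
    mask = (pred >= theta).astype(np.float32)        # binarize with threshold theta (Section 4.1)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:                                 # no pixel above threshold: no valid window
        return None
    cx, cy = np.median(xs), np.median(ys)            # center = median of the active coordinates
    col_sums = mask.sum(axis=0)                      # per-column counts -> vertical extent
    row_sums = mask.sum(axis=1)                      # per-row counts -> horizontal extent
    h = np.median(col_sums[col_sums > 0])            # height = median of the non-zero column sums
    w = np.median(row_sums[row_sums > 0])            # width  = median of the non-zero row sums
    # Reduce one side so that w / h == tau, keeping the other side unchanged
    if w / h > tau:
        w = tau * h
    else:
        h = w / tau
    return cx, cy, w, h
```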

4. Experimental Results

4.1. Experimental Settings

Data and Metrics. In the experiments, we adopt the training set provided by FAT [12] to train the proposed framework, which contains 24,154 images with 63,043 annotations. Each image has up to 3 annotations with an aspect ratio in [0.5, 2]. We evaluate the proposed method on three image cropping datasets, including HCDB [13], FCDB [6], and FAT. HCDB contains 500 images, and each image is annotated by 10 different experts. FCDB contains 343 testing images, and each image has a single annotation. The test set of FAT contains 3,910 images with 7,005 annotations. To show the generalization ability of the proposed model, we evaluate the model trained with the training set of FAT on the above three datasets without additional training. Following existing methods [41], we use the average intersection-over-union ratio (IoU) and the average boundary displacement error (BDE) as evaluation metrics for FCDB and HCDB, and employ the average IoU and the average center offset to evaluate different methods on FAT.
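For reference, the two window-level metrics reported throughout the tables can be sketched as follows, with crop windows assumed to be given as (x1, y1, x2, y2) corner coordinates; the exact normalization of BDE and of the center offset used by the benchmarks may differ, so this is only illustrative.

```python
import numpy as np

def crop_metrics(pred_box, gt_box):
    """IoU and center offset between two crop windows given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred_box
    gx1, gy1, gx2, gy2 = gt_box
    # Intersection-over-union of the two rectangles
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0
    # Center offset: Euclidean distance between the window centers (in pixels)
    pc = np.array([(px1 + px2) / 2.0, (py1 + py2) / 2.0])
    gc = np.array([(gx1 + gx2) / 2.0, (gy1 + gy2) / 2.0])
    offset = float(np.linalg.norm(pc - gc))
    return iou, offset
```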
Implementation Details. The backbone network is pre-trained on ImageNet [11]. The longest edge of the input image is resized to 256, while the aspect ratio is kept unchanged. The mini-batch size for training is 32. The Adam algorithm [20] is used to optimize the model, and the learning rate is set to 1e−4. The weight decay for the base model is 1e−4, and that for the meta-learners is 1e−3. The model is trained for 50 epochs on the training set, during which warmup [15] is adopted in the first 5 epochs and cosine learning rate decay [28] is used in the following 45 epochs. The number of chosen aspect ratios in S_τ (N) is set to 101. The threshold θ for the binarization in Section 3.4 is set to 0.4 through a grid search on the training set.

4.2. Ablation Study

In this section, we conduct a series of experiments to determine the backbone network, the aspect ratio specified module, and the upsampling module. During the ablation study, we choose 1000 training images with 2357 annotations from the training set as the validation set and use the other training images to train the models.

4.2.1 Backbone Network

First, we conduct experiments to determine the backbone network of the proposed model. The running speed is critical for image cropping since it usually runs on mobile devices or laptops, so we consider both the accuracy and the complexity of the models when choosing the backbone network. We choose three networks (MobileNetV2 [33], VGG16 [34], and ResNet50 [15]) truncated at different layers as candidates, and keep the other experimental settings the same. The FCs in Figure 4 are implemented by a 1-layer fully-connected network with 512 neurons. The output of the model (H_out × W_out) is up-sampled to H_in/4 × W_in/4, and the channel dimension of all deconvolution layers (C_out) is 96 (see Figure 2). The results on the validation set are shown in Table 1.

Table 1. Ablation study of the backbone network on the validation set. The cx_y in the layer column means the model is truncated after the y-th convolution layer whose output resolution (h × w in Figure 2) is H_in/2^x × W_in/2^x. The parameter size of the model (Param), the speed, and the cropping accuracy (IoU and offset) are evaluated for different backbone networks.

Backbone     Layer   Param    Speed↑    IoU↑    Offset↓
MobileNetV2  c3_3    1.0M     127 FPS   0.652   65.1
MobileNetV2  c4_3    2.7M     115 FPS   0.688   53.8
MobileNetV2  c4_6    5.6M     108 FPS   0.706   49.8
MobileNetV2  c5_1    14.3M    96 FPS    0.705   50.9
VGG16        c3_3    142.6M   110 FPS   0.672   64.7
VGG16        c4_1    145.0M   107 FPS   0.693   52.6
VGG16        c4_3    149.7M   103 FPS   0.702   51.1
VGG16        pool5   149.7M   102 FPS   0.698   51.5
ResNet50     c3_4    136.4M   115 FPS   0.668   58.8
ResNet50     c4_3    140.7M   96 FPS    0.699   51.3
ResNet50     c4_6    144.0M   86 FPS    0.702   50.6
ResNet50     c5_1    150.6M   81 FPS    0.705   50.1

From Table 1, we have the following observations: 1) For each model, truncating at shallow layers may lead to unsatisfactory performance (e.g., c3_y). With a deeper network and more parameters, the performance increases but plateaus when the complexity is too high (e.g., c5_y). 2) Surprisingly, the best performances of the three models are similar. Although ResNet50 significantly surpasses MobileNetV2 in ImageNet classification [11], it fails to improve the performance of the proposed method. This may be because the number and distribution of the training samples limit further performance gains for image cropping. Considering the performance and the running speed, we choose MobileNetV2 (truncated after the c4_6 layer) as the backbone network in the following experiments unless stated otherwise. As such, h × w × c in Figure 2 is equal to H_in/16 × W_in/16 × 96.

4.2.2 Aspect Ratio Specified (ARS) Module

In this section, we conduct experiments to determine the model size of the ARS module and analyze the necessity of each component. As shown in Figure 4, the embedding of the target aspect ratio is passed through several fully-connected layers (FCs) and then transformed into the parameters of the base model. First, we evaluate FCs of different sizes on the validation set and keep the other modules the same as in the ablation study of the backbone network. The results are shown in Table 2, where we increase the number of FC layers and keep the number of neurons in each layer at 512. Table 2 shows that a shallow network (1-layer) can obtain a pleasant result, and deeper architectures do not improve the performance. As such, we use the 1-layer FC (with 512 neurons) to implement the FCs of Figure 4 in the following parts.

Table 2. Ablation study on the model size of the aspect ratio specified module on the validation set. FC512 × n means there are n fully-connected (FC) layers with 512 neurons for the feature representation (FCs in Figure 4), and FC512 × 0 means the embedding is directly mapped to the parameters without intermediate FC layers.

Model size   Param   Speed↑    IoU↑    Offset↓
FC512 × 0    5.3M    110 FPS   0.701   50.3
FC512 × 1    5.6M    108 FPS   0.706   49.8
FC512 × 2    5.9M    108 FPS   0.704   50.1
FC512 × 3    6.1M    107 FPS   0.704   50.3
FC512 × 4    6.4M    105 FPS   0.704   50.1

Second, we study the influence of each component in the ARS module, i.e., ARS-FTM and ARS-PWP (see Section 3.3). The ablation study results are shown in Table 3. When removing ARS-FTM from the model (Ours w/o ARS-FTM), f_ars is identical to f_ara in Figure 2. When removing ARS-PWP (Ours w/o ARS-PWP), we replace it with a standard 1 × 1 convolution layer whose parameters are fixed after training. When the meta-learning approach is abandoned (Ours w/o ARS-FTM & ARS-PWP), the performance drops dramatically. After plugging ARS-FTM or ARS-PWP into the model, the performance is improved significantly. The model without ARS-PWP outperforms the model without ARS-FTM, showing that ARS-FTM plays a more critical part than ARS-PWP in the proposed model. Overall, the model with both modules achieves the best performance.

Table 3. Ablation study on each component of the aspect ratio specified module on the validation set.

Model                         IoU↑    Offset↓
Ours w/o ARS-FTM & ARS-PWP    0.665   53.6
Ours w/o ARS-FTM              0.694   52.6
Ours w/o ARS-PWP              0.696   52.2
Ours                          0.706   49.8

Third, we study the influence of the proposed aspect ratio embedding method (see Section 3.3). In Table 4, we employ simpler ways to represent the aspect ratio information. For "w/o aspect ratio embedding vector", the input of the meta-learner is not the embedding vector but the value of the aspect ratio directly. For "w/o embedding interpolation", the proposed model predicts the map using the aspect ratio in S_τ that is the closest to the required one, after which the post-processing is used to resize the cropping window to the target size. Table 4 shows that the model using the proposed embedding method outperforms the other two baselines. The reason can be interpreted as the proposed embedding method carrying more useful information, which helps the model find better cropping results that satisfy the aspect ratio requirements. We also study the number and the dimension of the embedding vectors in Table 5 and find that increasing either one helps improve the performance, but the gain is marginal when they are too large. As such, we set the number and the dimension of the embedding vectors to 101 and 512, respectively.

Table 4. Ablation study on the aspect ratio embedding method on the validation set.

Embedding vector of aspect ratios    IoU↑    Offset↓
w/o aspect ratio embedding vector    0.689   52.5
w/o embedding interpolation          0.704   50.6
proposed                             0.706   49.8

Table 5. Ablation study on the number and dimension of the aspect ratio embedding vectors on the validation set.

Number   Dimension   IoU↑    Offset↓
11       512         0.702   51.5
101      512         0.706   49.8
201      512         0.707   49.8
501      512         0.709   49.5
101      128         0.699   51.6
101      256         0.700   50.7
101      512         0.706   49.8
101      1024        0.708   49.4

4.2.3 Upsampling Module

The backbone network and the aspect ratio specified module have been determined. Now we conduct experiments to determine the upsampling module. As shown in Figure 2, after the feature transformation (from f_ara to f_ars), the feature map is upsampled to H_out × W_out × C_out with several deconvolution layers. In the implementation, each deconvolution layer doubles the resolution and keeps the same channel dimension (C_out). In this section, we study the influence of different output resolutions (H_out × W_out) and different channel dimensions (C_out). The ablation study results are shown in Table 6.

Table 6. Ablation study on the upsampling module with different output resolutions (H_out × W_out) and output channel dimensions (C_out). The first column (H_r × W_r) is the ratio of the output resolution to the input resolution (H_r = H_out/H_in, W_r = W_out/W_in).

H_r × W_r     C_out   Param   Speed↑    IoU↑    Offset↓
1/16 × 1/16   96      5.5M    113 FPS   0.695   52.1
1/8 × 1/8     96      5.6M    110 FPS   0.702   52.0
1/4 × 1/4     96      5.6M    108 FPS   0.706   49.8
1/2 × 1/2     96      5.6M    105 FPS   0.705   50.3
1 × 1         96      5.7M    101 FPS   0.708   49.5
1/4 × 1/4     32      5.5M    108 FPS   0.703   50.4
1/4 × 1/4     64      5.5M    108 FPS   0.704   50.2
1/4 × 1/4     96      5.6M    108 FPS   0.706   49.8
1/4 × 1/4     128     5.7M    107 FPS   0.703   50.1
1/4 × 1/4     256     6.2M    105 FPS   0.703   50.0

Upsampling the resolution of the output feature maps does help improve the performance (from 1/16 × 1/16 to 1/4 × 1/4), but the gain is marginal or even negative when the output resolution is high enough. As such, the output resolution (H_out × W_out) is set to H_in/4 × W_in/4 in the following parts. Similar observations are obtained for the channel dimension of the output feature map (C_out); the reason may be that more parameters make the model easier to overfit. As such, C_out is set to 96 for the output feature map.

4.3. Quantitative Evaluation

After determining the model through the ablation study, we retrain the model using all the training data and compare it to other state-of-the-art methods.

Cropping Accuracy. We show the comparison results on the three datasets (HCDB [13], FCDB [6], and FAT [12]) in Table 7. Since the proposed method is designed for the aspect ratio specified image cropping, we use the aspect ratio of the user-annotated window as the required aspect ratio when evaluating on HCDB and FCDB. As the original results of the compared methods do not use the aspect ratio information on these two datasets, we modify these methods to meet the aspect ratio requirements for a fair comparison. For the sliding-window (grid anchor) based methods (i.e., VFN, VEN, and GAIC), we only generate sliding windows with the required aspect ratio as candidates, and the number of candidates (1140) is higher than that of the original methods (e.g., 895 for VEN). Since A2RL generates the cropping result directly and VPN predicts scores for its pre-defined cropping boxes, we shrink their results to meet the required aspect ratios following [41]. Table 7 shows the results of these modified methods accompanied by the results from their original papers (without considering the aspect ratio). Since the source code of DIC (Conf.) and DIC is not available, we only show the results from the original papers for reference. Most modified results are better than the original results, and the others achieve similar results. Overall, the proposed model achieves better performance than these state-of-the-art cropping methods under different evaluation metrics on these two datasets. The most competitive method among them is VEN, but the proposed method runs 540 times faster than VEN (108 FPS vs. 0.2 FPS) and also achieves better accuracy.

Table 7. Comparisons against state-of-the-art methods on three datasets. For HCDB and FCDB, the modified aspect ratio specified results (before the brackets) and the results from the original papers (in the brackets) are both shown in each column. Except that VFN and A2RL are based on AlexNet [22] and Fast-AT is based on ResNet101, the other methods are all based on VGG16, so we also show the results of the proposed model using VGG16 (truncated after the c4_3 layer) as the backbone (h × w × c in Figure 2 is equal to H_in/16 × W_in/16 × 512).

Method             Backbone      Speed     HCDB IoU↑        HCDB BDE↓        FCDB IoU↑        FCDB BDE↓        FAT IoU↑   FAT Offset↓
Speed < 1 FPS
VFN [7]            AlexNet       0.5 FPS   0.848 (-)        0.034 (-)        0.687 (0.684)    0.077 (0.084)    0.617      74.7
VEN [41]           VGG16         0.2 FPS   0.852 (0.837)    0.032 (0.041)    0.734 (0.735)    0.064 (0.072)    0.702      49.9
Speed > 1 FPS
Fast-AT [12]       ResNet101     9 FPS     -                -                -                -                0.680      55.0
AdaConv [21]       VGG19         12 FPS    -                -                -                -                0.770      51.8
A2RL [25]          AlexNet       4 FPS     0.818 (0.820)    0.041 (0.045)    0.695 (0.663)    0.073 (0.089)    0.630      66.9
DIC (Conf.) [39]   VGG16         5 FPS     - (0.810)        - (0.057)        -                -                -          -
DIC [40]           VGG16         5 FPS     - (0.830)        - (0.052)        - (0.650)        - (0.080)        -          -
VPN [41]           VGG16         75 FPS    0.837 (0.835)    0.034 (0.044)    0.716 (0.711)    0.068 (0.073)    0.708      48.3
GAIC [44]          VGG16         125 FPS   0.826 (-)        0.033 (-)        0.673 (0.673)    0.064 (-)        0.585      66.2
Mars (Ours)        VGG16         103 FPS   0.858            0.031            0.736            0.062            0.710      47.7
Mars (Ours)        MobileNetV2   108 FPS   0.868            0.029            0.735            0.062            0.710      47.7
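To make the comparison protocol concrete, the aspect-ratio-constrained candidate generation applied to the sliding-window baselines can be sketched as below. The scales and stride are placeholders chosen for illustration only and do not reproduce the exact 1140 candidates used in the experiments.

```python
def fixed_ratio_candidates(img_w, img_h, tau, scales=(0.5, 0.6, 0.7, 0.8, 0.9), stride=16):
    """Generate sliding-window crop candidates that all share the aspect ratio tau (w / h)."""
    candidates = []
    for s in scales:
        # Largest window of aspect ratio tau at this scale of the image height
        h = s * img_h
        w = tau * h
        if w > img_w:                      # clip to the image if the ratio makes it too wide
            w = s * img_w
            h = w / tau
        y = 0.0
        while y + h <= img_h:
            x = 0.0
            while x + w <= img_w:
                candidates.append((x, y, x + w, y + h))   # (x1, y1, x2, y2)
                x += stride
            y += stride
    return candidates
```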
The results on the test set of FAT are also shown in Table 7. Since this dataset itself specifies the aspect ratio requirement, we directly compare the proposed method with the existing methods without modification. The proposed method also achieves better results than most compared methods.

Running Speed. A practical cropping model should run at a fast speed due to its application scenarios. As such, we also report the running speed of the proposed method and the competing methods in Table 7, where the speed is compared in terms of frames per second (FPS). For our model, both the inference time and the post-processing time are included in computing the speed, which is evaluated on a single GPU. Since the proposed method directly predicts the cropping area (without using the sliding window or pre-defined anchors) and adopts a lightweight architecture, it runs in real-time at a speed of 108 FPS, which is faster than most existing methods.

12691 Figure 5. Qualitative results of the proposed method. In each group of images, the left one is the original image, the second column shows the predicted maps (Hout × Wout × 1) for different aspect ratio requirements, the third column shows the images masked by the predicted maps, and the fourth column shows the results satisfying the aspect ratio requirements after the post-processing. More results are available in the supplementary materials.

Figure 6. Qualitative comparison: (a) Input, (b) Ours, (c) GAIC [44], (d) VEN [41], (e) VPN [41], (f) A2RL [25], (g) VFN [7]. Compared to other state-of-the-art methods, the proposed method can find better-composed cropping windows that satisfy the aspect ratio requirements. More results are available in the supplementary materials.

4.4. Qualitative Results

In this section, we show both the predicted cropping areas (intermediate results) and the final cropping results in Figure 5. The intermediate results show that the prediction for an image varies with the required aspect ratio, indicating the capability of the proposed meta-learning method. After the post-processing, the obtained cropping results represent the original images effectively while satisfying the aspect ratio requirement. We also compare the qualitative results of the proposed method with those of other state-of-the-art methods in Figure 6, which shows that the proposed method can obtain better visual results of the target aspect ratio than the other methods. More qualitative results are available in the supplementary material.

4.5. User Study

Due to the subjective nature of evaluating image cropping results, we also perform a user study to further compare the proposed method with other state-of-the-art methods. We randomly select 300 images from the above three datasets, i.e., the HCDB [13], FCDB [6], and FAT [12] datasets, 100 from each dataset. For each image, we generate the aspect ratio specified cropping results using the settings in Section 4.3 for the proposed method and the compared methods, and ask 5 experts to select the best one from the cropping results. We show the user study results in Figure 7. An interesting observation is that, although the GAIC [44] model obtains a worse IoU than VEN [41] and VPN [41] in Table 7, it achieves better results in the user study. The reason may be that the cropping results generated by GAIC [44] do not have large overlaps with the ground truth, but these cropping results are still well-composed. Nonetheless, the proposed method still obtains better user study results than the other compared methods.

Figure 7. User study results. We show the proportion of cropping results selected by the photographers for the proposed method and the other compared methods, including the GAIC [44], VEN [41], VPN [41], A2RL [25], and VFN [7] methods.

5. Conclusion

In this paper, we propose a meta-learning based method for the aspect-ratio-specified image cropping, which can generate cropping results of different aspect ratios for an input image in real-time (108 FPS). A base model and two meta-learners are trained in an end-to-end manner. In the inference stage, the meta-learners predict the parameters of the base model according to the given aspect ratio, which means that the parameters of the base model are not fixed during the test. The experimental results demonstrate that the proposed method can generate cropping results effectively and efficiently, outperforming most existing methods in terms of accuracy and speed.

Acknowledgement

This work is funded by the National Natural Science Foundation of China (Grant 61876181, Grant 61673375, and Grant 61721004) and the Projects of Chinese Academy of Sciences (Grant QYZDB-SSW-JSC006).

References

[1] Shai Avidan and Ariel Shamir. Seam carving for content-aware image resizing. In TOG, 2007.
[2] Luca Bertinetto, João F. Henriques, Jack Valmadre, Philip Torr, and Andrea Vedaldi. Learning feed-forward one-shot learners. In NeurIPS, 2016.
[3] Qi Cai, Yingwei Pan, Ting Yao, Chenggang Yan, and Tao Mei. Memory matching networks for one-shot image recognition. In CVPR, 2018.
[4] Jiansheng Chen, Gaocheng Bai, Shaoheng Liang, and Zhengqin Li. Automatic image cropping: A computational complexity study. In CVPR, 2016.
[5] Yutian Chen, Matthew W. Hoffman, Sergio Gómez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, and Nando de Freitas. Learning to learn without gradient descent by gradient descent. In ICML, 2017.
[6] Yi-Ling Chen, Tzu-Wei Huang, Kai-Han Chang, Yu-Chen Tsai, Hwann-Tzong Chen, and Bing-Yu Chen. Quantitative analysis of automatic image cropping algorithms: A dataset and comparative study. In WACV, 2017.
[7] Yi-Ling Chen, Jan Klopp, Min Sun, Shao-Yi Chien, and Kwan-Liu Ma. Learning to compose with professional photographs on the web. In ACM MM, 2017.
[8] Bin Cheng, Bingbing Ni, Shuicheng Yan, and Qi Tian. Learning to photograph. In ACM MM, 2010.
[9] Gianluigi Ciocca, Claudio Cusano, Francesca Gasparini, and Raimondo Schettini. Self-adaptive image cropping for small displays. IEEE Transactions on Consumer Electronics, 53(4):1622–1627, 2007.
[10] Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object detection via region-based fully convolutional networks. In NeurIPS, 2016.
[11] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[12] Seyed A. Esmaeili, Bharat Singh, and Larry S. Davis. Fast-AT: Fast automatic thumbnail generation using deep neural networks. In CVPR, 2017.
[13] Chen Fang, Zhe Lin, Radomir Mech, and Xiaohui Shen. Automatic image cropping using visual composition, boundary simplicity and content preservation models. In ACM MM, 2014.
[14] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, 2017.
[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[16] Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, and Ross Girshick. Learning to segment every thing. In CVPR, 2018.
[17] Jingwei Huang, Huarong Chen, Bin Wang, and Stephen Lin. Automatic thumbnail generation based on visual representativeness and foreground recognizability. In ICCV, 2015.
[18] Di Kang, Debarun Dhar, and Antoni B. Chan. Crowd counting by adapting convolutional neural networks with side information. arXiv preprint arXiv:1611.06748, 2016.
[19] Yueying Kao, Ran He, and Kaiqi Huang. Automatic image cropping with aesthetic map and gradient energy map. In ICASSP, 2017.
[20] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[21] Perla Sai Raj Kishore, Ayan Kumar Bhunia, Shuvozit Ghose, and Partha Pratim Roy. User constrained thumbnail generation using adaptive convolutions. In ICASSP, 2019.
[22] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
[23] Christiane Lemke, Marcin Budka, and Bogdan Gabrys. Metalearning: a survey of trends and technologies. AI Review, 2015.
[24] Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, and Xiaolin Hu. High performance visual tracking with siamese region proposal network. In CVPR, 2018.
[25] Debang Li, Huikai Wu, Junge Zhang, and Kaiqi Huang. A2-RL: Aesthetics aware reinforcement learning for image cropping. In CVPR, 2018.
[26] Debang Li, Huikai Wu, Junge Zhang, and Kaiqi Huang. Fast A3RL: Aesthetics-aware adversarial reinforcement learning for image cropping. TIP, 2019.
[27] Si Liu, Zhen Wei, Yao Sun, Xinyu Ou, Junyu Lin, Bin Liu, and Ming-Hsuan Yang. Composing semantic collage for image retargeting. TIP, 2018.
[28] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. In ICLR, 2017.
[29] Dougal Maclaurin, David Duvenaud, and Ryan Adams. Gradient-based hyperparameter optimization through reversible learning. In ICML, 2015.
[30] Luca Marchesotti, Claudio Cifarelli, and Gabriela Csurka. A framework for visual saliency detection with applications to image thumbnailing. In ICCV, 2009.
[31] Masashi Nishiyama, Takahiro Okabe, Yoichi Sato, and Imari Sato. Sensation-based photo cropping. In ACM MM, 2009.
[32] Jaesik Park, Joon-Young Lee, Yu-Wing Tai, and In So Kweon. Modeling photo composition and its application to photo re-arrangement. In ICIP, 2012.
[33] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR, 2018.
[34] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[35] Fred Stentiford. Attention based auto image cropping. In ICVSW, 2007.
[36] Bongwon Suh, Haibin Ling, Benjamin B. Bederson, and David W. Jacobs. Automatic thumbnail cropping and its effectiveness. In UIST, 2003.
[37] Jin Sun and Haibin Ling. Scale and object aware image thumbnailing. IJCV, 2013.
[38] Jane X. Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016.

[39] Wenguan Wang and Jianbing Shen. Deep cropping via attention box prediction and aesthetics assessment. In ICCV, 2017.
[40] Wenguan Wang, Jianbing Shen, and Haibin Ling. A deep network solution for attention and aesthetics aware photo cropping. TPAMI, 2018.
[41] Zijun Wei, Jianming Zhang, Xiaohui Shen, Zhe Lin, Radomír Mech, Minh Hoai, and Dimitris Samaras. Good view hunting: Learning photo composition from dense view pairs. In CVPR, 2018.
[42] Jianzhou Yan, Stephen Lin, Sing Bing Kang, and Xiaoou Tang. Learning the change for automatic image cropping. In CVPR, 2013.
[43] Tong Yang, Xiangyu Zhang, Zeming Li, Wenqiang Zhang, and Jian Sun. MetaAnchor: Learning to detect objects with customized anchors. In NeurIPS, 2018.
[44] Hui Zeng, Lida Li, Zisheng Cao, and Lei Zhang. Reliable and efficient image cropping: A grid anchor based approach. In CVPR, 2019.
