
Including “ugly” features to improve aesthetic prediction in natural images.

Extended abstract

Visual neuroaesthetics is a branch of empirical aesthetics that uses neuroscience to explain and understand visual aesthetic experiences at the neurological level. It has been suggested (Cupchik et al., 2009) that aesthetic experience results from the interaction between top-down, attention-driven cognitive processes and bottom-up, pre-attentive perceptual facilitation mechanisms. Within this context, many attempts have been made to organise the aesthetic experience in terms of known rules that relate to neural properties of the brain. For example, the neurologist Semir Zeki proposed two Laws of the Visual Brain: constancy (the capacity to filter out irrelevant dynamic properties), which helps capture the essence of objects, and abstraction (the capacity to generate general representations that can be applied to many particulars), which helps to process visual stimuli efficiently. A more detailed set of eight highly speculative rules was proposed by V.S. Ramachandran and William Hirstein (Ramachandran and Hirstein, 1999); these are claimed to be universal laws or principles underlying aesthetic appreciation that transcend cultural boundaries and styles.

Up to now, Computer Science has explored the problem of aesthetic judgement prediction mainly through data-driven approaches (Datta et al., 2006; Dhar et al., 2011; Marchesotti et al., 2011; Perronnin, 2012; Yan et al., 2006; Yiwen and Xiaoou, 2008). Most of these rely on extracting ad-hoc visual features based on photography rules of thumb (such as the “rule of thirds”), intuition, and trends in the field (Dhar et al., 2011; Yan et al., 2006; Yiwen and Xiaoou, 2008). These features, which may include image distortion (Hanghang et al., 2005), spatial distribution of edges, blur, histograms of low-level colour properties such as brightness and hue (Yan et al., 2006), composition (Dhar et al., 2011), etc., are interpreted as medium-level attributes that connect cognitive high-level concepts (e.g. beautifulness, ugliness, indifference) to low-level retinal stimulation (pixels). These medium-level descriptors have been shown to improve judgement prediction performance significantly (Dhar et al., 2011; Yan et al., 2006; Yiwen and Xiaoou, 2008). However, since there is no agreed underlying theory, it is very difficult to learn anything from this feature selection, and the results therefore have more technological than scientific value. Moreover, these approaches ignore the bulk of the neuroscience and neuroaesthetics literature developed in recent years and, crucially, all the experiments have been performed on image datasets that are both semantically charged and highly biased towards the “beautiful” end of the scale. Aesthetic analysis databases are generally constructed from large repositories of images such as DPChallenge (http://www.dpchallenge.com/), whose purpose is to host digital photography contests, or web-hosting sites such as Flickr (https://www.flickr.com/) (see for example Perronnin, 2012).
These repositories contain highly semantically charged images (images with meaning beyond the purely aesthetic) that were uploaded with clear purposes: in the case of DPChallenge, to win a competition; in the case of Flickr, to share or preserve images considered important for some reason (“ugly” or “uninteresting” images are generally deleted from cameras and computers). For this reason, there is currently no image dataset that contains semantics-free, purposely-made aesthetically unpleasant images, which may constitute a strong bias, since nobody knows what features determine the lower extreme of the “ugly–beautiful” continuum.
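To make the notion of a “medium-level” descriptor concrete, the sketch below computes coarse brightness and hue histograms of the kind cited above (cf. Yan et al., 2006). This is an illustration only, not the implementation used in any of the cited papers; the bin count, the choice of HSV colour space, and the function name are arbitrary assumptions made for the example.

```python
# Illustrative sketch of simple low-level colour features used in
# data-driven aesthetic prediction: normalised histograms of brightness
# (HSV value) and hue. Not the features of any specific cited paper.
import colorsys
import numpy as np

def brightness_hue_histograms(rgb, n_bins=8):
    """rgb: (H, W, 3) float array with values in [0, 1].
    Returns (brightness_hist, hue_hist), each normalised to sum to 1."""
    pixels = rgb.reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pixels])
    hue, value = hsv[:, 0], hsv[:, 2]
    hue_hist, _ = np.histogram(hue, bins=n_bins, range=(0.0, 1.0))
    val_hist, _ = np.histogram(value, bins=n_bins, range=(0.0, 1.0))
    # Normalise so that images of different sizes are comparable.
    return val_hist / pixels.shape[0], hue_hist / pixels.shape[0]

# Example: a uniform mid-grey image concentrates all its mass in a
# single brightness bin and (by convention) in hue bin 0.
grey = np.full((16, 16, 3), 0.5)
val_hist, hue_hist = brightness_hue_histograms(grey)
```

In a full feature-extraction pipeline, descriptors like these would be concatenated with edge, blur, and composition measures before being fed to a classifier or regressor.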

In this Master’s dissertation, we propose to create a subset of natural scenes devoid of semantic content (i.e. containing no man-made objects or animals), artificially manipulated to lower the aesthetic preference ratings assigned by observers. A new, unbiased dataset of images including aesthetically “ugly” scenes will thus be generated and relabelled by human observers in a fashion consistent with existing datasets. Afterwards, we will apply the chromatic induction model (CIWaM; Otazu et al., 2010) to this new database and statistically learn the behaviour of its pyramid of operators. These operators functionally correspond to the activation of neurons in the presence of images, and are expected to play a significant role in capturing the low-level features that determine the behaviour of human observers in this task.
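CIWaM itself is a specific wavelet-based chromatic induction model defined in Otazu et al. (2010); its details are beyond this abstract. Purely as an illustration of what a multiscale “pyramid of operators” means, the sketch below builds a plain Gaussian pyramid, in which one low-pass operator is applied repeatedly at halving resolutions. The 1-2-1 kernel and the number of levels are arbitrary choices for the example, not parameters of CIWaM.

```python
# Rough illustration of a multiscale pyramid of operators: a Gaussian
# pyramid built with a separable 1-2-1 low-pass filter. This is NOT an
# implementation of CIWaM (Otazu et al., 2010), only a generic example.
import numpy as np

def blur(img):
    """Separable 1-2-1 low-pass filter with edge replication."""
    k = np.array([0.25, 0.5, 0.25])
    p = np.pad(img, ((1, 1), (0, 0)), mode="edge")
    img = k[0] * p[:-2] + k[1] * p[1:-1] + k[2] * p[2:]
    p = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    return k[0] * p[:, :-2] + k[1] * p[:, 1:-1] + k[2] * p[:, 2:]

def gaussian_pyramid(img, levels=4):
    """Return a list of progressively blurred, half-resolution images."""
    pyramid = [img]
    for _ in range(levels - 1):
        img = blur(img)[::2, ::2]   # low-pass, then subsample by 2
        pyramid.append(img)
    return pyramid

# A constant image stays constant at every scale (the kernel sums to 1).
pyr = gaussian_pyramid(np.ones((32, 32)), levels=4)
```

In a model like CIWaM, the responses of such operators at each scale (rather than the pyramid levels themselves) would be the quantities whose statistics are learned from observer data.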