Intuition Learning

Anush Sankaran(1), Mayank Vatsa(2), Richa Singh(2)
(1) IBM Research
(2) IIIT, Delhi, India
[email protected], [email protected], [email protected]

[Figure 1: The proposed solution where an intuition learning algorithm can supplement a supervised or semi-supervised algorithm at decision level. Panels: (a) supervised learning, (b) transfer learning, (c) semi-supervised learning, (d) self-taught learning, and (e) intuition learning; each panel shows how labeled data, unlabeled data, a feature subspace, and a supervised or reinforcement based representation learner interact to produce labels.]
Abstract

"By reading the title and abstract, do you think this paper will be accepted in IJCAI 2018?" A common impromptu reply would be "I don't know, but I have an intuition that this paper might get accepted". Intuition is often employed by humans to solve challenging problems without explicit effort. Intuition is not trained but is learned from one's own experience and observation. The aim of this research is to provide intuition to an algorithm, apart from what it is trained to know in a supervised manner. We present a novel intuition learning framework that learns to perform a task completely from unlabeled data. The proposed framework uses a continuous-state reinforcement learning mechanism to learn a feature representation and a data-label mapping function using unlabeled data. The mapping function and feature representation are succinct and can be used to supplement any supervised or semi-supervised algorithm. Experiments on the CIFAR-10 database show challenging cases where intuition learning improves the performance of a given classifier.

1 Introduction

Intuition refers to knowledge acquired without inference and/or the use of reason [Simpson et al., 1989]. Philosophically, there are several definitions of intuition; the most popularly used one is "thoughts that are reached with little apparent effort, and typically without conscious awareness" [Hogarth, 2001], and it is considered the opposite of a rational process. From a machine learning perspective, training a supervised classifier is a rational process in which the classifier is trained with labeled data, allowing it to learn a decision boundary. Traditional unsupervised methods, on the other hand, do not map the learnt patterns to their corresponding class labels. Semi-supervised approaches bridge this gap by leveraging unlabeled data to better perform supervised learning tasks. However, the final task (say, classification) is performed only by a supervised classifier using labeled data with some additional knowledge from unsupervised learning. The notion of intuition would mean that the system performs the task using only unlabeled data, without any supervised (rational) learning. In other words, intuition is context dependent guesswork that can be incorrect at times.

In a typical learning pipeline, the concept of intuition can be used for a variety of purposes, from training data selection up to and including decision making. Heuristics are the simplest form of intuition; they bypass, or are used in conjunction with, rational decisions to obtain quick approximate results. For example, heuristics can be used in (1) choosing new data points in an online active learning scenario [Chu et al., 2011], (2) feature representation [Coates et al., 2011], (3) feature selection [Guyon and Elisseeff, 2003], or (4) the choice of classifier and its parameters [Cavalin et al., 2012].

Table 1 shows a comparison of existing popular machine learning paradigms. Supervised learning attempts to learn an input-output mapping function on a feature space using a set of labeled training data. Transfer learning aims to improve the target learning function using the knowledge in a source (related) domain and source learning tasks [Pan and Yang, 2010]. Many types of knowledge transfer, such as classification parameters [Lawrence and Platt, 2004], feature representations [Evgeniou and Pontil, 2007], and training instances [Huang et al., 2006], have been tested to improve the performance of supervised learning tasks. Semi-supervised learning utilizes additional knowledge from unlabeled data drawn from the same distribution and having the same task labels as the labeled data. Much of this research has focused on unsupervised feature learning, i.e., creating a feature subspace using the unlabeled data, onto which the labeled data can be projected to obtain a new feature representation [Chapelle et al., 2006]. In 2007, Raina et al. [Raina et al., 2007] proposed a framework termed "self-taught learning" to create the generic feature subspace using sparse auto-encoders irrespective of the task labels. Self-taught learning dismisses the same-class-label assumption of semi-supervised learning and forms a generic high-level feature subspace from the unlabeled data, onto which the labeled data can be projected.

Paradigm | Input data | Learnt function | Comments
Supervised [Bishop and Nasrabadi, 2006] | data, labels | data-label mapping |
Unsupervised [Bishop and Nasrabadi, 2006] | data | clusters |
Semi-supervised [Chapelle et al., 2006] | data, labels, unlabeled data | data-label mapping | unlabeled data follow the same distribution
Reinforcement [Kaelbling et al., 1996] | state, action, reward function (or value) | policy | need a teacher to provide reward
Active [Settles, 2010] | | data-label mapping, new data selection | need human annotator (Oracle) or expert algorithm to provide labels for new data
Transfer [Pan and Yang, 2010] | | targetData-targetLabel mapping | transfer can be data instances, classification parameters, or features
Imitation [Natarajan et al., 2011] | sourceData, sourceData-sourceLabel mapping | targetData-targetLabel mapping | need a teacher to provide reward
Self taught [Raina et al., 2007] | data, labels, unlabeled data | data-label mapping | unlabeled data need not follow the same distribution and label as data
Deep learning [Bengio, 2009] | data, labels, unlabeled data | data-label mapping | complex architecture to learn robust data representations
Intuition | data, unlabeled data, reward function (or value) | data-label mapping | unlabeled data need not follow the same distribution, need a reward function

Table 1: Comparison of existing popular machine learning paradigms along with the proposed intuition learning paradigm.

As shown in Figure 1, we postulate a framework that supplements a supervised or semi-supervised classifier with intuition decisions at the decision level. The decisions drawn by the intuition learning block in Figure 1 are called intuition because they are learnt only from the unlabeled data with an indirect reward from a teacher. Existing algorithms broadly require training labels for building a classifier or borrow the classifier parameters from an already trained classifier. Direct or indirect training is not always possible, as obtaining data labels is very costly. To address this challenge, we propose a novel paradigm for an unsupervised task performance mechanism learnt from cumulative experience. Intuition is modeled as a learning framework which provides the ability to learn a task completely from unlabeled data. By using continuous state reinforcement learning as a classifier, the framework learns to perform the classification task without the need for explicit labeled data. Reinforcement learning helps in adapting a randomly initialized feature space to the specific task at hand, where a parallel supervised classifier is used as a teacher. As the proposed framework is able to learn a mapping function from the input data to the output class labels without the requirement for explicit training, it functions similar to human intuition, and we term this approach Intuition Learning.

1.1 Research Contributions

This research proposes an intuition learning framework to enable algorithms to learn a specific classification or regression task completely from unlabeled data. The major contributions of this research are as follows:
• A continuous state reinforcement learning based classification framework is proposed to map input data to output class labels without explicit training.
• A residual Q-learning based function approximation method is proposed for learning the feature representation of task specific data. A novel reward function, which does not require class labels, is designed to provide feedback to the reinforcement based classification system.
• A context dependent addition framework is proposed, where the result of the intuition framework is added based on the confidence of the trained supervised or semi-supervised mapping function.

[Figure 2: A block diagram outlining how a feature space can be adapted using a reinforcement learning algorithm with feedback from a supervised classifier trained on limited task-specific data. A large set of unlabeled data forms a generic feature subspace; for each of Task #1 to Task #m, task-specific features are derived and an RL-based classifier, guided by limited task-specific data, predicts the label.]

2 An Intuition Learning Algorithm

The basic idea of the proposed intuition learning framework is presented in Figure 2. Given a large set of unlabeled data, different kinds of feature representations are extracted to describe the data, irrespective of the task at hand. To further leverage the knowledge interpreted from the unlabeled data, a continuous state reinforcement learning mechanism is used to perform the given classification task. As reinforcement is a continuous learning process, the classification task improves with time through a reward based feedback mechanism. The reinforcement learner, on the one hand, acts as a classifier, while on the other hand it continuously adapts the feature representation with respect to the given task. Thus, given multiple tasks, the proposed intuition learning framework can adapt the generic feature space to be consistent with the corresponding task.

[Figure 3: Overall scheme of the proposed intuition learning algorithm that aids a supervised classifier. From the unlabeled data {Iu(1), ..., Iu(p)}: (I) extract r features, (II) cluster every feature to form the intuition based feature subspace, and (III) calculate the current state; the state drives feature adaptation and a continuous state reinforcement learner that outputs a predicted label, in parallel with a supervised classifier trained on hand-crafted or unsupervised learnt features of the labeled data.]

Let {(I_l^(1), y^(1)), (I_l^(2), y^(2)), ..., (I_l^(m), y^(m))} be the set of m labeled training data drawn i.i.d. from a distribution D. The labeled data are represented as {(x_l^(1), y^(1)), (x_l^(2), y^(2)), ..., (x_l^(m), y^(m))}, where x_l^(i) ∈ R^n is the feature representation of the data I_l^(i) and y^(i) ∈ [1, 2, ..., C] denotes the class label corresponding to x_l^(i). Let the set of unlabeled data be {I_u^(1), I_u^(2), ..., I_u^(p)}, where the subscript u denotes that the data are unlabeled. Contrary to self-taught learning [Raina et al., 2007], we do not assume that the labeled and unlabeled data are drawn from the same distribution D or have the same class labels; however, they should be derived from the same modality. Given a set of labeled data and a large set of unlabeled data, the aim of intuition learning is to learn a hypothesis h_0 : (X → R) ∈ [1, 2, ..., C] that predicts the label for a given input representation of the data. However, the hypothesis h_0 is learnt without the direct use of the labels y^(i) and is used as a supplement for the hypothesis h learnt using (x_l^(i), y^(i)) in a supervised (or semi-supervised) manner.

2.1 Adapting Feature Representation

From a large set of unlabeled data, many different kinds of feature representations are extracted. Each representation may correspond to a different property of the data that we try to capture. For image data, the features could be color, texture, and shape, while for text data, the features could be n-grams, bag-of-words, and word embeddings. The features can also be a set of different color features or a set of hierarchical n-grams. If the large set of unlabeled data is seen as the world (or the universal set), the features are the different observations made by the algorithm from the world. Similar to human intuition, the set of extracted feature representations is task independent, and later, depending on the learning task, a subset of these features could be dominantly used. This task independent feature space is similar to the human intuition learnt by observing the environment.

Figure 3 provides a detailed description of the proposed intuition learning framework. From the set of unlabeled data I_u, we extract r different kinds of feature representations, {X_u1, X_u2, ..., X_ur}, where X_ui = {x_ui^(1), x_ui^(2), ..., x_ui^(p)} and x_ui^(j) ∈ R^{n_i}. For every feature representation q ∈ [1, 2, ..., r], we cluster the representation [x_uq^(1), x_uq^(2), ..., x_uq^(p)] into C clusters using k-means clustering¹. The cluster centroids for the qth feature representation are given as [z_uq^(1), z_uq^(2), ..., z_uq^(C)]. This collection of centroids, for q = [1, 2, ..., r], is called the Intuition based Feature Subspace (IFS), as it clusters the entire set of unlabeled data into groups based on every observation (feature).

¹The best adaptation results are obtained when C is fixed to the number of classes in the learning task.
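To make the construction above concrete, the following is a minimal Python sketch (not from the paper) of building the IFS, assuming the r feature representations of the unlabeled data have already been extracted into arrays; it uses scikit-learn's k-means, and names such as build_ifs and feature_views are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_ifs(feature_views, C, seed=0):
    """Build the Intuition based Feature Subspace (IFS).

    feature_views : list of r arrays, each of shape (p, n_q), holding one
                    feature representation of the p unlabeled samples.
    C             : number of clusters (Sec. 2.1 suggests C = number of classes).
    Returns a list of r centroid matrices, each of shape (C, n_q).
    """
    ifs = []
    for X_uq in feature_views:                      # one view per feature type q
        km = KMeans(n_clusters=C, n_init=10, random_state=seed).fit(X_uq)
        ifs.append(km.cluster_centers_)             # [z_uq^(1), ..., z_uq^(C)]
    return ifs

# Example with r = 2 synthetic feature views of p = 1000 unlabeled samples.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = [rng.normal(size=(1000, 64)), rng.normal(size=(1000, 512))]
    ifs = build_ifs(views, C=10)
    print([z.shape for z in ifs])                   # [(10, 64), (10, 512)]
```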

2.2 Classification using Reinforcement Learning

For a given set of m labeled training data, {I_l^(1), I_l^(2), ..., I_l^(m)}, the same set of r features (as used for the unlabeled data) is extracted as [x_lq^(1), x_lq^(2), ..., x_lq^(m)], where q = [1, 2, ..., r]. The extracted features are then projected onto the Intuition based Feature Subspace (IFS) by calculating the distance of the features from the corresponding cluster centroids:

    s_q^(i) = || x_lq^(i) − z_uq^(j) ||_2    (1)

for j = [1, 2, ..., C], q = [1, 2, ..., r], and i = [1, 2, ..., m]. The representation of data point i is given by concatenating the distances corresponding to all the features,

    s^(i) = [s_1^(i), s_2^(i), ..., s_r^(i)]    (2)

The obtained representation is succinct, with a fixed length of rC × 1, where r is the number of different feature types extracted and C is the number of clusters; in essence, every value represents the distance from a cluster centroid. In a typical semi-supervised (or self-taught) learning scheme, the mapping between the intuition based representation and the output class labels, {(s^(1), y^(1)), (s^(2), y^(2)), ..., (s^(m), y^(m))}, would be learnt in a supervised manner. In the proposed intuition learning, however, we attempt to learn the data-label mapping without using the class labels, using reinforcement learning. The aim of reinforcement learning is to learn an action policy π : s → a, where s ∈ S is the current state of the system and a is the action performed in that state. As the setup involves a continuous state environment, the optimal action policy is learnt using a model free, off-policy Temporal Difference (TD) algorithm called Q-learning, where the Q(s, a)-value denotes the effectiveness of a state-action pair. The TD(0) Q-learning update is given by

    Q(s_t, a) = Q(s_t, a) + α [ r_t + γ max_{a'} Q(s_{t+1}, a') − Q(s_t, a) ]    (3)

where r_t ∈ R is the immediate reward obtained for performing action a in state s_t, γ ∈ [0, 1] is the factor with which future rewards are discounted, and α ∈ [0, 1] is the learning rate. In our problem, reinforcement learning is formulated as a classification problem: the IFS based representation is the current state s and the action a is the output label to be predicted, so the policy π learns the data-label relation for the given data. Due to the large, probabilistic, and continuous definition of the space s, the Q-values are approximated using a universal function approximator, i.e., a neural network [Sutton and Barto, 1998],

    Q(s, a) = ψ(s, a, θ) = Σ_i φ_i(s, a) · θ_i = φ(s, a)^T · θ    (4)

where φ is the approximation (basis) function. Using the residual Q-learning algorithm [Paletta and Pinz, 2000], the free parameters θ are updated as

    θ_{t+1} = θ_t + α · ψ · Δψ    (5)

which expands to

    θ_{t+1} = θ_t + α [ r_t + γ max_{a'} Q(s_{t+1}, a') − Q(s_t, a) ] × [ βγ (∂/∂θ) max_{a'} Q(s_{t+1}, a') − (∂/∂θ) Q(s_t, a) ]    (6)

where β is a weighting factor called the Bellman residual. Baird [Baird, 1995] guaranteed the convergence of the above approximate Q-learning formulation; the details are skipped for the sake of brevity. An ε-greedy exploration strategy is adopted, where in every state a random action is preferred with a probability of ε. As observed in [Lagoudakis and Parr, 2003], "the crucial factor for a successful approximate algorithm is the choice of the parametric approximation architecture and the choice of the projection (parameter adjustment) method(s)". The choice of the reward function is therefore highly important and directly determines the effectiveness of adaptation; it is explained in the next section.
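The following is a minimal sketch, in the spirit of Eqs. (1)-(6), of the distance-based state construction and a residual-gradient Q-learning classifier. The paper uses a neural network approximator, so the linear form, the class and function names, and the exact sign convention of the residual update are simplifying assumptions made here for illustration; the reward argument is expected to come from the label-free reward function of Section 2.3.

```python
import numpy as np

class ResidualQClassifier:
    """Linear Q(s, a) = theta[a] . s over the IFS state of Eq. (2).

    Actions are the C class labels. Default hyperparameters follow the
    settings reported in Section 3.3 of the paper.
    """

    def __init__(self, state_dim, n_actions, alpha=0.99, gamma=0.95,
                 beta=0.2, epsilon=0.05, seed=0):
        self.theta = np.zeros((n_actions, state_dim))
        self.alpha, self.gamma, self.beta, self.epsilon = alpha, gamma, beta, epsilon
        self.rng = np.random.default_rng(seed)

    def q_values(self, s):
        return self.theta @ s                       # linear stand-in for Eq. (4)

    def act(self, s):
        # epsilon-greedy exploration over the class labels
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(self.theta)))
        return int(np.argmax(self.q_values(s)))

    def update(self, s, a, reward, s_next):
        # TD error: delta = r + gamma * max_a' Q(s', a') - Q(s, a)   (Eq. 3)
        q_next = self.q_values(s_next)
        a_next = int(np.argmax(q_next))
        delta = reward + self.gamma * q_next[a_next] - self.q_values(s)[a]
        # Residual-gradient style parameter update (cf. Eqs. 5-6):
        # grad of Q(s, a) w.r.t. theta[a] is s; grad of max Q(s', .) is s_next.
        self.theta[a] += self.alpha * delta * s
        self.theta[a_next] -= self.alpha * self.beta * self.gamma * delta * s_next


def ifs_state(sample_views, ifs):
    """Eqs. (1)-(2): concatenate distances of each feature view to its IFS centroids."""
    parts = [np.linalg.norm(centroids - x, axis=1)   # ||x_lq - z_uq^(j)||_2, j = 1..C
             for x, centroids in zip(sample_views, ifs)]
    return np.concatenate(parts)                     # state vector of length r*C
```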

Type | Feature | Dimension
Color | Color Harris [Van De Weijer et al., 2006] | 10 × 2
Color | Color Autocorrelograms [Huang et al., 1997] | 64 × 1
Local Texture | Local Binary Pattern (LBP) [Ojala et al., 2002] | 59 × 4
Global Texture | GIST [Oliva and Torralba, 2001] | 512 × 1
Saliency | Region covariances [Erdem and Erdem, 2013] | 32 × 32
Shape | Multilayer autoencoder [Vincent et al., 2008] | 10 × 1

Table 2: Details of different features extracted from the image data.

2.3 Design of Reward Function

The Intuition based Feature Subspace (IFS) is defined by the cluster centroid points obtained using the unlabeled data for every feature q as [z_uq^(1), z_uq^(2), ..., z_uq^(C)], where q = [1, 2, ..., r]. This space provides an organized definition of how the entire set of unlabeled data is observed and inferred. From the various features of the labeled training data [(x_lq^(1), y^(1)), (x_lq^(2), y^(2)), ..., (x_lq^(m), y^(m))], where q ∈ [1, 2, ..., r], the centroid points for every feature and every class are calculated as [z_lq^(1), z_lq^(2), ..., z_lq^(C)], for q = [1, 2, ..., r]. This space, called the Labeled data Feature Subspace (LFS), formed by these centroid points, provides the inference for the particular learning task to be performed. It is to be noted that:
• Apart from the unlabeled data, every labeled training data point (and even every testing data point) gets incrementally added to the IFS, as the observed data affects the overall understanding of the features.
• The aim of this incremental learning is to shape the IFS as close as possible to the LFS while learning the feature-label mapping function using reinforcement learning.

The incremental update of the IFS for the ith training example belonging to the jth class is given by

    z_uq^(j) = z_uq^(j) + ( x_lq^(i) − z_uq^(j) ) / n_q^j    (7)

for q = [1, 2, ..., r], where n_q^j is the number of data points in the jth cluster for the qth feature. Further, to make the learning from this incremental update effective, the reward function is defined in terms of the distance between the current IFS and the LFS:

    r_t = || z_{uq,t}^(j) − z_lq^(j) ||_2^{−1}    (8)

for q = [1, 2, ..., r] and j = [1, 2, ..., C] at a given time t.
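A small sketch of the incremental IFS update (Eq. 7) and the inverse-distance reward (Eq. 8) follows; the per-cluster count bookkeeping, the eps guard, and the summation of the reward over the r feature types are implementation choices assumed here, not prescribed by the paper.

```python
import numpy as np

def incremental_ifs_update(ifs, counts, x_views, j):
    """Eq. (7): pull the j-th IFS centroid of every feature toward a new sample.

    ifs     : list of r centroid matrices, shape (C, n_q) each.
    counts  : list of r integer arrays of shape (C,), points per cluster (n_q^j).
    x_views : the r feature views of the new (labeled or test) sample.
    j       : cluster/class index the sample is assigned to.
    """
    for q, x in enumerate(x_views):
        counts[q][j] += 1
        ifs[q][j] += (x - ifs[q][j]) / counts[q][j]


def reward(ifs, lfs, j, eps=1e-8):
    """Eq. (8): inverse distance between the current IFS and the LFS centroids
    of class j, summed over the r feature types (the summation and eps guard
    are choices made for this sketch)."""
    return sum(1.0 / (np.linalg.norm(ifs[q][j] - lfs[q][j]) + eps)
               for q in range(len(ifs)))
```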

2.4 Context Dependent Addition Mechanism

The intuition learning framework acts as a supplement to (and not a complement of) supervised learning. The need for intuition arises only when the confidence of the supervised learner falls below a particular threshold. Therefore, a context dependent mechanism is designed so that supervised learning is supplemented with intuition only when required. For the given labeled training data {I_l^(1), I_l^(2), ..., I_l^(m)}, some hand-crafted or unsupervised features are extracted, {(x_l^(1), y^(1)), (x_l^(2), y^(2)), ..., (x_l^(m), y^(m))}, and a supervised model is learnt, H_s : x_l^(i) → ŷ_s. Based on the supervised learning algorithm, the classification confidence computed for the ith data point is given as conf_s^(i) = [c_s1^(i), c_s2^(i), ..., c_sC^(i)]. The mechanism used to calculate the classification confidence depends on the supervised learning model used. Similarly, the intuition learner can be represented as H_int : s^(i) → ŷ_int, and its classification confidence is the output of the last layer of the value function approximation neural architecture, given as conf_int^(i) = [c_int1^(i), c_int2^(i), ..., c_intC^(i)]. A label switching mechanism is performed to give the final predicted label ŷ as follows,

    ŷ = ŷ_s      if Δ > th
    ŷ = ŷ_new    otherwise    (9)

where th is the threshold for using intuition and the context condition Δ is calculated as

    Δ = max_j ( c_sj^(i) ) − max_{l≠j} ( c_sl^(i) )    (10)

In the cases where intuition is used to boost the confidence of the supervised classifier, the new label is computed as follows:

    c_newk^(i) = λ · c_sk^(i) + (1 − λ) · c_intk^(i)    (11)

    ŷ_new = arg max_j ( c_newj^(i) )    (12)

where λ is the trade-off parameter between intuition and supervised learning. Thus, in simple words, we add the feeling of intuition to an algorithm.
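The context dependent addition mechanism of Eqs. (9)-(12) can be sketched as follows, assuming both models expose per-class confidence vectors; the function name combined_prediction is illustrative, and the default values follow the settings reported in Section 3.3.

```python
import numpy as np

def combined_prediction(conf_s, conf_int, th=0.9, lam=0.5):
    """Label switching with confidence blending (Eqs. 9-12), as a sketch.

    conf_s   : per-class confidences of the supervised model, shape (C,).
    conf_int : per-class confidences of the intuition (RL) model, shape (C,).
    th, lam  : threshold and trade-off parameter.
    """
    top2 = np.sort(conf_s)[-2:]
    delta = top2[1] - top2[0]                     # Eq. (10): supervised margin
    if delta > th:
        return int(np.argmax(conf_s))             # Eq. (9): keep the supervised label
    c_new = lam * conf_s + (1 - lam) * conf_int   # Eq. (11): blend confidences
    return int(np.argmax(c_new))                  # Eq. (12): label after adding intuition
```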

3 Experimental Analysis

3.1 Dataset

The proposed intuition learning algorithm is applied to a 10-class classification problem using the CIFAR-10 database [Krizhevsky and Hinton, 2009]. The database contains 60,000 labeled color images, each of size 32 × 32, pertaining to 10 classes (i.e., 6,000 images per class). There are 50,000 training images and 10,000 test images. The dataset contains small images, leading to limited and noisy information content, and thus provides a relevant case study to demonstrate the effectiveness of the proposed paradigm. The STL-10 database [Coates et al., 2011] is used as the unlabeled image dataset; it has one million color images of size 96 × 96. As shown in Table 2, six different feature representations are extracted from the images. These features comprehensively cover the various types of features that can be extracted from image data. For all the experiments, five-times random cross-validation is performed and the best model accuracy is reported.

Training size | Semi-supervised | Supervised | Supervised intuition
10000 | 38.90 | 29.64 | 36.19
5000 | 37.13 | 24.34 | 34.78
3000 | 14.12 | 17.07 | 25.65
1000 | 10.34 | 16.54 | 19.61

Table 3: The performance accuracy (%) of supervised intuition learning compared with supervised (neural network) and semi-supervised (self-taught) learning methods. The significance of intuition is studied by varying the amount of available training data. Five-times random cross validation is performed and the best model's performance is reported.

[Figure 4: Data clusters for each of the extracted features (Color Autocorrelogram, Color Harris, Shape, Saliency, Texture LBP, Texture GIST), with a grid depicting the cluster density at local regions; axes are the first two components, and panel (c) plots dissimilarity against iteration number. (a) shows the IFS of all the unlabeled data, (b) shows the adapted task specific feature space after 300 epochs of learning, and (c) shows the amount of change happening in the clusters after the addition of an image. Best viewed in color.]

3.2 Interpreting the Intuition based Feature Subspace

The primary aim of the approach is to construct the feature subspace completely from unlabeled data and to adapt it to a specific learning task. Figure 4 shows the clusters of the entire unlabeled data corresponding to every extracted feature. The concatenation of the feature spaces put together in Figure 4(a) represents the IFS. Figure 4(b) shows the adapted task-specific feature subspace after performing 300 epochs of learning with the given labeled data. Figure 4(c) shows the amount of update in the clusters after adding an image, calculated as the dissimilarity between the cluster centroids before and after the addition of the image. Cluster dissimilarity is calculated for the rth feature representation as follows:

    C_dis = Σ_{j=1}^{C} (1 / D_j) · || z_ur^(j)|_{t+1} − z_ur^(j)|_{t} ||_2    (13)

where D_j is the density of the jth cluster. It can be visually observed from the plot that the shape, GIST, and LBP feature spaces are updated (learn) after the addition of each image, indicating that these features contribute more towards the classification task. However, both the color Harris and autocorrelogram features are not updated much by the training data.
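For reference, a one-function sketch of the cluster dissimilarity of Eq. (13) is given below; interpreting the density D_j as the number of points in cluster j is an assumption made here, since the paper does not define it further.

```python
import numpy as np

def cluster_dissimilarity(centroids_before, centroids_after, density):
    """Eq. (13): density-weighted centroid shift for one feature type.

    centroids_before, centroids_after : arrays of shape (C, n_r), the IFS
        centroids of feature r at time t and t+1.
    density : array of shape (C,); taken here as the number of points per
        cluster (one plausible reading of D_j).
    """
    shift = np.linalg.norm(centroids_after - centroids_before, axis=1)
    return float(np.sum(shift / density))
```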

3.3 Performance Analysis

It is to be noted that the intuition learning framework is used to supplement any supervised or semi-supervised learning mechanism. In this paper, we show results in the following scenarios:
1. Using two supervised learning algorithms (back-propagation neural network and multi-class SVM) with Uniform Circular Local Binary Pattern (UCLBP) [Ojala et al., 2002] as features. Labeled data, [(x_lq^(1), y^(1)), (x_lq^(2), y^(2)), ..., (x_lq^(m), y^(m))], from CIFAR-10 is used to train the supervised algorithms.
2. Using a semi-supervised learning algorithm, with a neural network as the classifier and UCLBP features trained on the CIFAR-10 dataset. The semi-supervised algorithm used for comparison is one approach to self-taught learning [Raina et al., 2007], with unlabeled data from the STL-10 dataset, {(s^(1), y^(1)), (s^(2), y^(2)), ..., (s^(m), y^(m))}.
3. Using the intuition learning framework only, with the intuition based task specific feature representation combined with continuous state reinforcement learning (Q-learning), Equation 4, for classification.
4. Using a supervised intuition framework, where the output of the supervised learning algorithm and the intuition learning framework are combined using the context dependent addition mechanism in Equation 12.
5. Using a semi-supervised intuition framework, where the output of the semi-supervised learning algorithm and the intuition learning framework are combined using the context dependent addition mechanism.

The optimized values of the various parameters used in our framework are as follows: α = 0.99, γ = 0.95, β = 0.2, th = 0.9, λ = 0.5, and ε = 0.05. Preprocessing of features is done using z-score normalization. All the experiments are performed on an Intel Xeon E5-2640 0, 2.50 GHz, 64 GB RAM server.

As already discussed, intuition has better significance in challenging problems with limited training data. Table 3, Table 4, and Table 5 show the performance of the proposed intuition learning in comparison with other learning methods, varying the training size as a parameter². It can be observed that with enough training data, the supervised algorithms (both neural network and SVM) yield the best classification performance. However, as the size of the training data decreases, the performance of all three approaches (supervised, semi-supervised, and intuition learning) reduces. The results show that in such a scenario, incorporating intuition with the supervised or semi-supervised algorithm yields improved results. This supports our hypothesis that adding intuition improves performance under challenging circumstances such as limited training data. Similarly, from a human's perspective, in the presence of all data and information one may take correct decisions; however, when the background information is limited, intuition learning helps.

²For a given training size, the same subset of images is used across all the classifiers to avoid any training bias.

Training size | Intuition | Intuition supervised | Intuition semi-supervised
10000 | 12.11 | 36.19 | 29.21
5000 | 10.00 | 34.78 | 23.85
3000 | 10.00 | 25.65 | 22.49
1000 | 08.99 | 19.61 | 20.53

Table 4: The influence of supplementing intuition to the supervised and semi-supervised algorithms is shown by the improvement in the performance accuracy (%).

Training size | Supervised | Intuition supervised
10000 | 44.57 | 41.83
5000 | 43.21 | 40.77
3000 | 13.56 | 17.48
1000 | 06.19 | 09.78

Table 5: The performance accuracy (%) of the supervised and supervised intuition frameworks using an SVM classifier.

[Figure 5: Examples of (a) success and (b) failure cases of the proposed intuition learning. AL = actual ground truth label, SL = label predicted by the supervised neural network learner, and IL = label predicted when intuition is combined with the supervised neural network learner. Success cases: (AL: Ship, SL: Truck, IL: Ship), (AL: Cat, SL: Horse, IL: Cat). Failure cases: (AL: Horse, SL: Horse, IL: Cat), (AL: Truck, SL: Truck, IL: Frog).]

Further, some key analyses are summarized below:
1. To study the effectiveness of the residual learning in Equation 6, the training error over successive epochs is plotted for a training size of 10000. It can be observed that the training error gradually decreases and remains constant after 300 epochs, indicating that the maximum training capacity has been achieved with minimum training error.
2. The computation time required for intuition learning depends on the complexity of the r features that are extracted. However, for one sample, under the assumption that the feature extraction happens off-line, the overall intuition decision and feature space can be generated in 0.082 s, while the supervised decision takes ∼4 s on average. This shows that intuition is much faster, requiring little effort compared to supervised decision making.
3. In Figure 5, some success and failure cases are shown where (a) intuition helps in correctly classifying a data point where supervised learning fails, and (b) a data point is incorrectly classified because of intuition. As previously discussed, intuition can sometimes go wrong. Upon analyzing the first horse example in the failure cases (Figure 5(b)), it is observed that horses are clustered more towards brown color in the autocorrelogram color feature space. However, as the horse shown in the image is white, it gets clustered along with cat and is misclassified by intuition learning.

4 Conclusion

Inspired by the human capability of instinctive reasoning, this research presents an intuition learning framework that supplements a classifier for improved performance, especially with limited training data. Intuition is modeled as continuous state reinforcement learning that adapts to a particular task using a large amount of unlabeled data and limited task specific data. The performance of intuition is shown on a 10-class image classification problem, in comparison with supervised, semi-supervised, and reinforcement learning. The results indicate that the application of intuition improves the performance of the classifier with limited training data.
References

[Baird, 1995] Leemon Baird. Residual algorithms: Reinforcement learning with function approximation. In ICML, pages 30–37, 1995.
[Bengio, 2009] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
[Bishop and Nasrabadi, 2006] Christopher M Bishop and Nasser M Nasrabadi. PRML, volume 1. 2006.
[Cavalin et al., 2012] Paulo R Cavalin, Robert Sabourin, and Ching Y Suen. LoGID: An adaptive framework combining local and global incremental learning for dynamic selection of ensembles of HMMs. PR, 45(9):3544–3556, 2012.
[Chapelle et al., 2006] Olivier Chapelle, Bernhard Schölkopf, Alexander Zien, et al. Semi-Supervised Learning, volume 2. MIT Press, Cambridge, 2006.
[Chu et al., 2011] Wei Chu, Martin Zinkevich, Lihong Li, Achint Thomas, and Belle Tseng. Unbiased online active learning in data streams. In ACM SIGKDD, pages 195–203, 2011.
[Coates et al., 2011] Adam Coates, Andrew Y Ng, and Honglak Lee. An analysis of single-layer networks in unsupervised feature learning. In ICAIS, pages 215–223, 2011.
[Erdem and Erdem, 2013] Erkut Erdem and Aykut Erdem. Visual saliency estimation by nonlinearly integrating features using region covariances. Journal of Vision, 13(4), 2013.
[Evgeniou and Pontil, 2007] A Evgeniou and Massimiliano Pontil. Multi-task feature learning. In NIPS, volume 19, page 41, 2007.
[Guyon and Elisseeff, 2003] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. JMLR, 3:1157–1182, 2003.
[Hogarth, 2001] Robin M Hogarth. Educating Intuition. University of Chicago Press, 2001.
[Huang et al., 1997] Jing Huang, S Ravi Kumar, Mandar Mitra, Wei-Jing Zhu, and Ramin Zabih. Image indexing using color correlograms. In CVPR, pages 762–768, 1997.
[Huang et al., 2006] Jiayuan Huang, Arthur Gretton, Karsten M Borgwardt, Bernhard Schölkopf, and Alex J Smola. Correcting sample selection bias by unlabeled data. In NIPS, pages 601–608, 2006.
[Kaelbling et al., 1996] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A survey. arXiv cs/9605103, 1996.
[Krizhevsky and Hinton, 2009] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto, 2009.
[Lagoudakis and Parr, 2003] Michail G Lagoudakis and Ronald Parr. Least-squares policy iteration. JMLR, 4:1107–1149, 2003.
[Lawrence and Platt, 2004] Neil D Lawrence and John C Platt. Learning to learn with the informative vector machine. In ICML, pages 65–72, 2004.
[Natarajan et al., 2011] Sriraam Natarajan, Saket Joshi, Prasad Tadepalli, Kristian Kersting, and Jude Shavlik. Imitation learning in relational domains: A functional-gradient boosting approach. In IJCAI, pages 1414–1420, 2011.
[Ojala et al., 2002] Timo Ojala, Matti Pietikainen, and Topi Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE TPAMI, 24(7):971–987, 2002.
[Oliva and Torralba, 2001] Aude Oliva and Antonio Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. IJCV, 42(3):145–175, 2001.
[Paletta and Pinz, 2000] Lucas Paletta and Axel Pinz. Active object recognition by view integration and reinforcement learning. Robotics and Autonomous Systems, 31(1):71–86, 2000.
[Pan and Yang, 2010] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE TKDE, 22(10):1345–1359, 2010.
[Raina et al., 2007] Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y Ng. Self-taught learning: Transfer learning from unlabeled data. In ICML, pages 759–766, 2007.
[Settles, 2010] Burr Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2010.
[Simpson et al., 1989] John Andrew Simpson, Edmund SC Weiner, et al. The Oxford English Dictionary, volume 2. Clarendon Press, Oxford, 1989.
[Sutton and Barto, 1998] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. Cambridge University Press, 1998.
[Van De Weijer et al., 2006] Joost Van De Weijer, Theo Gevers, and Arnold WM Smeulders. Robust photometric invariant features from the color tensor. IEEE TIP, 15(1):118–127, 2006.
[Vincent et al., 2008] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In ICML, pages 1096–1103, 2008.