Deep Reinforcement Learning for Imbalanced Classification
Enlu Lin1, Qiong Chen2,*, Xiaoming Qi3
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
[email protected] [email protected] [email protected]

arXiv:1901.01379v1 [cs.LG] 5 Jan 2019

Abstract—Data in real-world applications often exhibit a skewed class distribution, which poses an intense challenge for machine learning. Conventional classification algorithms are not effective in the case of imbalanced data distribution, and may fail when the distribution is highly imbalanced. To address this issue, we propose a general imbalanced classification model based on deep reinforcement learning. We formulate the classification problem as a sequential decision-making process and solve it with a deep Q-learning network. The agent performs a classification action on one sample at each time step, and the environment evaluates the classification action and returns a reward to the agent. The reward from a minority-class sample is larger, so the agent is more sensitive to the minority class. The agent finally finds an optimal classification policy in imbalanced data under the guidance of the specific reward function and a beneficial learning environment. Experiments show that our proposed model outperforms the other imbalanced classification algorithms; it identifies more minority samples and achieves great classification performance.

Index Terms—imbalanced classification, deep reinforcement learning, reward function, classification policy

I. INTRODUCTION

Imbalanced data classification has been widely researched in the field of machine learning [1]–[3]. In some real-world classification problems, such as anomaly detection, disease diagnosis, and risk behavior recognition, the distribution of data across different classes is highly skewed. The instances in one class (e.g., cancer) can be 1000 times fewer than those in another class (e.g., healthy patients). Most machine learning algorithms assume a balanced training data set. When facing imbalanced scenarios, these models often provide good recognition rates for the majority instances, whereas the minority instances are distorted. The instances in the minority class are difficult to detect because of their infrequency and casualness; however, misclassifying minority-class instances can result in heavy costs.

A range of imbalanced data classification algorithms have been developed during the past two decades. The methods to tackle these issues are mainly divided into two groups [4]: the data level and the algorithmic level. The former group modifies the collection of instances to balance the class distribution by re-sampling the training data, typically through different types of data manipulation techniques. The latter group modifies existing learners to alleviate their bias towards the majority class, often by assigning a higher misclassification cost to the minority class. However, with the rapid development of big data, a large amount of complex data with high imbalance ratios is generated, which brings an enormous challenge for imbalanced data classification. Conventional methods are inadequate to cope with increasingly complex data, so novel deep learning approaches are becoming popular.

In recent years, deep reinforcement learning has been successfully applied to computer games, robot control, recommendation systems [5]–[7], and so on. For classification problems, deep reinforcement learning has served to eliminate noisy data and learn better features, yielding great improvements in classification performance. However, there has been little research on applying deep reinforcement learning to imbalanced data learning. In fact, deep reinforcement learning is ideally suited to imbalanced data learning, since its learning mechanism and specific reward function make it easy to pay more attention to the minority class by giving a higher reward or penalty.

A deep Q-learning network (DQN) based model for imbalanced data classification is proposed in this paper. In our model, the imbalanced classification problem is regarded as a guessing game that can be decomposed into a sequential decision-making process. At each time step, the agent receives an environment state, represented by a training sample, and then performs a classification action under the guidance of a policy. If the agent performs a correct classification action it is given a positive reward; otherwise, it is given a negative reward. The reward from the minority class is higher than that from the majority class. The goal of the agent is to obtain as many cumulative rewards as possible during the sequential decision-making process, that is, to recognize the samples as correctly as possible.

The contributions of this paper are summarized as follows: 1) We formulate the classification problem as a sequential decision-making process and propose a deep reinforcement learning framework for imbalanced data classification. 2) We design and implement the DQN-based imbalanced classification model DQNimb, which mainly includes building the simulation environment, defining the interaction rules between agent and environment, and designing the specific reward function. 3) We study the performance of our model through experiments and compare it with the other methods of imbalanced data learning.

The rest of this paper is organized as follows. The second section introduces the research methodology of imbalanced data classification and the applications of deep reinforcement learning to classification problems. The third section elaborates the proposed model and analyzes it theoretically. The fourth section presents the experimental results and evaluates the performance of our method against the other methods. The last section summarizes the work of this paper and looks forward to future work.

II. RELATED WORK

A. Imbalanced data classification

Previous research in imbalanced data classification concentrates mainly on two levels: the data level [8]–[11] and the algorithmic level [12]–[21]. Data-level methods aim to balance the class distribution by manipulating the training samples, including over-sampling the minority class, under-sampling the majority class, and combinations of the two [11]. SMOTE is a well-known over-sampling method, which generates new samples by linear interpolation between adjacent minority samples [9]. NearMiss is a typical under-sampling method based on the nearest neighbor algorithm [10]. However, over-sampling can potentially lead to overfitting, while under-sampling may lose valuable information about the majority class. Algorithmic-level methods aim to raise the importance of the minority class by improving existing algorithms, including cost-sensitive learning, ensemble learning, and decision threshold adjustment. Cost-sensitive learning methods assign different misclassification costs to different classes by modifying the loss function, so that the misclassification cost of the minority class is higher than that of the majority class. Ensemble learning based methods train multiple individual sub-classifiers and then use voting or combining to obtain better results. Threshold-adjustment methods train the classifier on the original imbalanced data and change the decision threshold at test time. A number of deep learning based methods have recently been proposed for imbalanced data classification [22]–[26]. Wang et al. [22]

noisy data. In [27], the classification task was constructed as a sequential decision-making process that uses multiple agents interacting with the environment to learn the optimal classification policy. However, the intricate simulation between agents and environment caused extremely high time complexity. Feng et al. [28] proposed a deep reinforcement learning based model to learn relationship classification in noisy text data. The model is divided into an instance selector and a relational classifier. The instance selector selects high-quality sentences from noisy data under the guidance of the agent, while the relational classifier learns better performance from the selected clean data and feeds back a delayed reward to the instance selector. The model finally obtains a better classifier and a high-quality data set. The work in [29]–[32] utilized deep reinforcement learning to learn advantageous features of training data in their respective applications. In general, the advantageous features improve the classifier, while the better classifier feeds back a higher reward, which encourages the agent to select more advantageous features. Martinez et al. [33] proposed a deep reinforcement learning framework for time series data classification in which the definition of the specific reward function and the Markov process are clearly formulated. Research on imbalanced data classification with reinforcement learning has been quite limited. In [34], an ensemble pruning method was presented that selected the best sub-classifiers by using reinforcement learning. However, this method was only suitable for traditional small datasets because it was inefficient at selecting classifiers when there were many sub-classifiers. In this paper, we propose a deep Q-network based model for imbalanced classification which is efficient on complex high-dimensional data such as images or text and performs well compared to the other imbalanced classification methods.

III. METHODOLOGY

A. Imbalanced Classification Markov Decision Process

Reinforcement learning algorithms that incorporate deep learning have defeated world champions
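The sequential decision-making formulation described above (one sample per step, positive reward for a correct action, larger reward magnitude for the minority class) can be sketched as a minimal environment. This is an illustrative sketch, not the paper's DQNimb implementation: the class `ImbalancedClassifyEnv`, the minority/majority reward magnitudes 1.0 and 0.1, and the `step` signature are all assumptions made here for demonstration.

```python
import numpy as np

class ImbalancedClassifyEnv:
    """Classification as sequential decision-making: each step
    presents one training sample as the state; the action is a
    predicted label; reward magnitudes favor the minority class."""

    def __init__(self, X, y, minority_label=1, majority_reward=0.1):
        # majority_reward < 1 is an illustrative choice, not the paper's value
        self.X, self.y = X, y
        self.minority_label = minority_label
        self.lam = majority_reward
        self.t = 0

    def reset(self):
        self.t = 0
        return self.X[self.t]

    def step(self, action):
        label = self.y[self.t]
        # Minority samples carry reward magnitude 1, majority samples lam,
        # so the agent loses more when it misses a minority sample.
        magnitude = 1.0 if label == self.minority_label else self.lam
        reward = magnitude if action == label else -magnitude
        self.t += 1
        done = self.t >= len(self.y)
        next_state = None if done else self.X[self.t]
        return next_state, reward, done

# A degenerate policy that always predicts the majority class ends up
# with a negative return, because the single missed minority sample
# outweighs the three correct majority predictions.
env = ImbalancedClassifyEnv(np.eye(4), np.array([0, 0, 0, 1]))
state = env.reset()
total = 0.0
while True:
    state, r, done = env.step(0)  # always predict the majority class 0
    total += r
    if done:
        break
print(total)  # 3 * 0.1 - 1.0 = -0.7
```

The asymmetric magnitudes are what makes "maximize cumulative reward" differ from "maximize plain accuracy": under this reward, the always-majority policy is no longer near-optimal.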
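The SMOTE idea mentioned in Section II (new minority samples by linear interpolation between adjacent minority samples) can be sketched as follows. This is a simplified illustration, not the reference implementation of [9]: the function name `smote_sample`, the fixed `k`, and the uniform interpolation factor are assumptions for demonstration.

```python
import numpy as np

def smote_sample(X_min, k=2, n_new=4, rng=None):
    """Generate synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from the chosen sample to every minority sample
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]  # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Three minority points; each synthetic point lies on a segment
# between one of them and a nearest neighbour.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = smote_sample(X_min)
print(X_new.shape)  # (4, 2)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the minority region, which is also why SMOTE can reinforce (overfit to) that region, as noted above.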
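The cost-sensitive reweighting described in Section II (a higher misclassification cost for the minority class, applied by modifying the loss function) can be sketched as a class-weighted cross-entropy. The inverse-frequency weights below are one common heuristic assumed here for illustration, not the specific costs used by the surveyed methods.

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy in which each sample's loss is scaled by the weight
    of its true class, raising the cost of minority-class errors."""
    p_true = probs[np.arange(len(labels)), labels]
    w = class_weights[labels]
    return float(np.mean(-w * np.log(p_true)))

# Minority class 1 appears once in four samples; inverse-frequency
# weights give it a larger per-sample cost (this heuristic is an
# assumption for the sketch).
labels = np.array([0, 0, 0, 1])
counts = np.bincount(labels)
class_weights = len(labels) / (len(counts) * counts)  # [0.667, 2.0]

probs = np.array([[0.9, 0.1]] * 4)  # classifier always leans to class 0
loss_plain = weighted_cross_entropy(probs, labels, np.ones(2))
loss_weighted = weighted_cross_entropy(probs, labels, class_weights)
print(loss_weighted > loss_plain)  # the missed minority sample dominates
```

Under uniform weights the biased classifier looks acceptable on average; under the class weights, the single misclassified minority sample dominates the loss, pushing training toward the minority class.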