
Ordinal regression methods: survey and experimental study

Pedro Antonio Gutiérrez, Member, IEEE, María Pérez-Ortiz, Javier Sánchez-Monedero, Francisco Fernández-Navarro, and César Hervás-Martínez, Senior Member, IEEE

Abstract—Ordinal regression problems are those in which the objective is to classify patterns using a categorical scale that shows a natural order between the labels. Many real-world applications present this labelling structure, which has increased the number of methods and algorithms developed in this field over recent years. Although ordinal regression can be addressed with standard nominal classification techniques, there are several algorithms which can specifically benefit from the ordering information. Therefore, this paper reviews the state of the art of these techniques and proposes a taxonomy based on how the models are constructed to take the order into account. Furthermore, a thorough experimental study is presented to check whether the use of ordering information improves the performance of the models obtained, considering the most significant published approaches within the taxonomy. The results confirm that ordering information benefits ordinal models, improving their accuracy and the closeness of the predictions to the actual targets on the ordinal scale.

Index Terms—Ordinal regression, ordinal classification, binary decomposition, threshold methods, augmented binary classification, proportional odds model, support vector machines, discriminant learning, artificial neural networks

1 INTRODUCTION itself. The second one covers those variables where a user provides his/her judgement on the grade of the ordered EARNING to classify or to predict numerical values categorical variable. However, imposing an ordering is from prelabelled patterns is one of the central re- L meaningful for both cases. search topics in machine learning and data mining [1]– Ordinal regression problems are very common in [3]. However, less attention has been paid to ordinal many research areas, and they have been frequently regression (also called ordinal classification) problems, considered as standard nominal problems which can where the labels of the target variable exhibit a natural lead to non-optimal solutions. Indeed, ordinal regression ordering. For example, student satisfaction surveys usu- problems can be said to be between classification and ally involve rating teachers based on an ordinal scale regression, presenting some similarities and differences. {poor, average, good, very good, excellent}. Hence, class Some of the fields where ordinal regression is found labels are imbued with order information, e.g. a sample are medical research [5]–[11], age estimation [12], brain vector associated with class label average has a higher computer interface [13], credit rating [14]–[17], econo- rating (or better) than another from the poor class, but metric modelling [18], face recognition [19]–[21], facial good class is better than both. When dealing with this beauty assessment [22], image classification [23], wind kind of problems, two facts are decisive: misclassifica- speed prediction [24], social sciences [25], text classifica- tion costs are not the same for different errors (it is tion [26], and more. All these works are examples of clear that misclassifying an excellent teacher as poor application of specifically designed ordinal regression should be more penalised than misclassifying him/her models, where the ordering consideration improves their as very good) and the ordering information can be used performance with respect to their nominal counterparts. to construct more accurate models. A further distinc- In , were firstly studied by using tion is made by Anderson [4], which differentiates two a link function able to model the underlying prob- major types of ordinal categorical variables, “grouped ability for generating ordinal labels [4]. The field of continuous variables” and “assessed ordered categorical ordinal regression has evolved in the last decade, with variables”. The first one is a discretised version of an un- a plethora of noteworthy research progress made in derlying continuous variable, which could be observed supervised learning [27], from support vector machine This work has been partially subsidised by the TIN2011-22794 project of (SVM) formulations [28], [29] to Gaussian processes [30] the Spanish Ministry of Economy and Competitiveness (MINECO), FEDER or discriminant learning [31], to name a few. However, funds and the P2011-TIC-7508 project of the “Junta de Andaluc´ıa”(Spain). up to the authors’ knowledge, these methods have not P.A. Guti´errez, M. P´erez-Ortiz, J. S´anchez-Monedero and C. 
Herv´as-Mart´ınez are with the Department of Computer Science yet been categorised in a general taxonomy, which is and Numerical Analysis, University of C´ordoba, Campus de essential for further research and for identifying the Rabanales, Albert Einstein building, 14017 - C´ordoba, Spain, e- developments made and the present state of existing mail:{pagutierrez,i82perom,jsanchezm,chervas}@uco.es F. Fern´andez-Navarro is with the Department of Mathematics and Engineer- methods. This paper contributes a review of the state- ing, Loyola University Andaluc´ıa,Spain, e-mail: [email protected] of-the-art of ordinal regression, a taxonomy proposal to IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 2 better organise the advances in this field, and an ex- These labels form categories or groups of patterns, and perimental study with a complete repository of datasets the objective is to find a classification rule or function f : and a total of 16 ordinal regression methods (including X → Y to predict the categories of new patterns, given a software tool to run and test all the methods). a training set of N points, D = {(xi, yi), i = 1,...,N}.A Several objectives motivate the experimental study. natural label ordering is included for ordinal regression, First of all, our focus is on evaluating the necessity C1 ≺ C2 ≺ ... ≺ CQ, where ≺ is an order relation given of taking ordering information into account. In [32], by the nature of the classification problem. Many ordinal ordinal meta-models were compared with respect to regression measures and algorithms consider the rank of their nominal counterparts to check their ability to ex- the label, i.e. the position of the label in the ordinal scale, ploit ordinal information. The work concludes that such which can be expressed by the function O(·), in such a meta-methods do exploit ordinal information and may way that O(Cq) = q, q = 1,...,Q. The difference between yield better performance. However, as will be analysed this setting and other related ones is now established. in this work, specifically designed ordinal regression The assumption of an order between class labels makes methods can further improve the results with respect to that two different elements of Y can be always compared meta-model approaches. Another study [33] argues that by using the relation ≺, which is not possible under the ordinal classifiers may not present meaningful advan- nominal classification setting. If compared to regression tages over the analogue non-ordinal methods, based on (where y ∈ R), it is true that real values in R can be accuracy and Cohen’s Kappa statistic [34]. The results ordered by the standard < operator, but labels in ordinal of the present review show that statistically significant regression (y ∈ Y) do not carry metric information, so the differences are found when using measures which take category serves as a qualitative indication of the pattern the order into account, which is the case of the Mean rather than a quantitative one. Absolute Error (MAE), i.e. the average deviation be- tween predicted and actual targets in number of cate- 2.2 Ordinal regression in the context of ranking and gories. The second main motivation of this paper is to sorting provide some guidelines to decide on the best methods in terms of accuracy, MAE and computational time. 
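To make the two evaluation measures used throughout this discussion concrete, the following is a minimal sketch (not taken from the paper, and assuming labels are already coded as ranks 1, ..., Q) of how accuracy and MAE can be computed:

```python
import numpy as np

def ordinal_metrics(y_true, y_pred):
    """Accuracy and Mean Absolute Error (MAE) for ordinal labels coded as ranks 1..Q.

    MAE is the average deviation between predicted and actual targets,
    measured in number of categories on the ordinal scale.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    mae = float(np.mean(np.abs(y_true - y_pred)))
    return accuracy, mae

# Predictions one category away are penalised less by MAE than distant ones.
print(ordinal_metrics([1, 2, 3, 4, 5], [2, 2, 3, 5, 1]))  # (0.4, 1.2)
```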
Although ordinal regression has been paid attention Since there are not specific repositories of ordinal re- recently, the amount of related research topics is worth gression datasets, proposals are usually evaluated using to be mentioned. First, it is important to remark the discretised regression ones, where the target variable is differences between ordinal regression and other related simply divided into different bins or classes. 24 of these ranking problems. There are three terms to be clarified: discretised datasets are used for our study, in addition to ranking, sorting and multipartite ranking. 17 real benchmark ordinal regression datasets extracted Ranking generally refers to those problems where the from public repositories. The last objective is to evaluate algorithm is given a set of ordered labels [36], with one whether the methods behave differently depending on label for each pattern, and the objective is to learn a rule the nature of the datasets. able to rank patterns by using this discrete set of labels. This paper is a significant extension of a preliminary The induced ordering should be partial with respect to conference version [35]: a deeper analysis of the state- the patterns, in the sense that ties are allowed. This rule of-the-art has been performed, including most recent should be able to obtain a good ranking, but not to proposals and a taxonomy to group them. Moreover, the classify patterns in the correct class. For example, if the experimental study includes more methods and datasets. labels predicted by a classifier are shifted one category The rest of the paper is organised as follows. Section 2 (in the ordinal scale) with respect to the actual ones, the introduces the problem of ordinal regression and briefly classifier will still be a perfect ranker. describes its differences from some related machine Another term, sorting [36] is referred to the problem learning topics outside the scope of this paper. Section where the algorithm is given a total order for the training 3 revises ordinal regression state-of-the-art by grouping dataset and the objective is to rank new sets during the different methods with a proposed taxonomy. The main test phase. As we can see, this is equivalent to a ranking representatives of each family are then empirically com- problem where the size of the label set is equal to the pared in Section 4, where the experiments are described number of training points, Q = N. Ties are not allowed and the corresponding results are studied and discussed. for the prediction. Again, the interest is in learning a Finally, Section 5 deals with the main achievements. function that can give a total ordering of the patterns instead of a concrete label. 2 NOTATION AND NATURE OF THE PROBLEM The multipartite ranking problem is a generalisation of the well-known bipartite ranking one. Multipartite rank- 2.1 Problem definition ing can be seen as an intermediate point between rank- The ordinal regression problem consists on predicting ing and sorting. It is similar to ranking because training the label y of an input vector x, where x ∈ X ⊆ RK and patterns are labelled with one of Q ordered ratings y ∈ Y = {C1, C2,..., CQ}, i.e. x is in a K-dimensional (Q = 2 for bipartite ranking), but here the goal is to learn input space and y is in a label space of Q different labels. 
from them a ranking function able to induce a total order IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 3 in accordance with the given training ratings [37]–[39], labels (multipartite ranking). which is similar to sorting. The objective of multipartite ranking is to obtain a ranking function which ranks 2.3 Advanced related topics “high” classes ahead of “low” classes (in the ordinal In this section, other advanced methods related to ordi- scale), being this a refinement of the order information nal regression are surveyed. They are outside the scope provided by an ordinal classifier, as the latter does not of this paper, as they consider different learning settings distinguish between objects within the same category. Monotonic classification [42]–[44] is a special class of ROC analysis, which evaluates the ability of classifiers ordinal classification task, where there are monotonicity to sort positive and negative instances in terms of the constraints between features and decision classes, i.e. area under the ROC curve, is a clear example of training x  x0 → f(x) ≥ f(x0) [45], where x  x0 means a binary classifier to perform well in a bipartite ranking that x dominates x0, i.e. x ≥ x0 , k = 1,...,K. Mono- problem. The relationship between multipartite ranking k k tonic classification tasks are very common in real-world and ordinal classification is discussed in [38]. An ordinal problems [43] (e.g. consider the case where employers regression classifier can be used as a ranking function by must select their employees based on their education and interpreting the class labels as scores. However, this type experience), where monotonicity may be an important of scoring will produce a large number of ties (which model requirement for justifying the decision made. This is not desirable for multipartite ranking). On the other kind of problems have been approached, for example, by hand, a multipartite ranking function f(·) can be turned decision trees [43], [46] and rough set theory [44]. into an ordinal classifier by deriving thresholds to define A recent work is concerned with transductive ordinal an interval for each class, but how to find the optimal regression [27], where a SVM model is derived to learn thresholds is an open issue. from a set of labelled and unlabelled patterns. The core A more general term is , gathering of their formulation is an objective function that caters different methods in which the goal is to automatically to several commonly used loss functions in transductive construct a ranking model from training data [40]. Meth- settings, but for ordinal regression. This SVM model is ods used for the three previously mentioned problems combined with a proposed label swapping scheme for can be used for learning to rank ones. Moreover, ordinal multiple class transduction to derive ordinal decision regression can be used as a learning to rank algorithm, boundaries that pass through a low-density region of where the categories are individually evaluated for each the augmented labelled and unlabelled data. Another training pattern, using a finite ordinal scale. In this related work [47] considers transfer learning in the same context, we refer now to the categorisation presented context, where the objective is to obtain a classifier for in [40], which establishes different families of ranking new target domains using the available label information model structures: pointwise or itemwise ranking (where the of other related source domains. 
The proposed method relevance of an input vector x is predicted by using ei- spans the feasible solution space with an ensemble of ther real-valued scores or ordinal labels), pairwise ranking ordinal classifiers from the multiple relevant source do- (where the relative order between two input vectors x mains, using the maximum margin criterion. and x0 is tried to be predicted, i.e. the local comparison Uncertainty has been included in ordinal regression nature of ranking, which can be easily cast to binary models in two different ways. Nondeterministic ordinal classification) and listwise ranking (where the algorithms classifiers (defined as those allowed to predict more try to order a finite set of patterns S = {x , x ,..., x } 1 2 N than one label for some patterns) are considered in [48]. by minimising the inconsistency between the predicted In [49] a kernel model is proposed for those ordinal permutation and the training permutation). Ordinal re- problems where partial class memberships probabilities gression methods are pointwise ranking models, where are available instead of crisp labels. each vector is assigned an ordinal label in order to rank One step forward [50] considers those problems where it. In this way, they can be used for ranking as an the prediction labels follow a circular order (e.g. direc- alternative to both pairwise and listwise structures, which tional predictions). have serious problems of scalability with the size of the training dataset [41], the former needing to examine all pairs of patterns and the latter considering all possible 3 ANORDINALREGRESSIONTAXONOMY permutations of the training data. In this section, a taxonomy of ordinal regression methods In summary, ordinal regression is a pointwise ap- is proposed. With this purpose we firstly review what proach to classify data, where the labels exhibit a natural have been referred to as na¨ıveapproaches, in the sense that order. It is related to the problems of ranking, sorting the model is obtained by using other standard machine and multipartite ranking, but, during the test phase, its learning prediction algorithms (e.g. nominal classifica- objective is to obtain correct labels or labels as close tion or standard regression). Secondly, ordinal binary de- as possible to the correct ones, not a correct relative composition approaches are reviewed, the main idea being partial order of the patterns (ranking), a total order of to decompose the ordinal problem into several binary patterns in accordance to the order of the training set ones, which are separately solved by multiple models (sorting) or a total order in accordance to the training or by one multiple-output model. The third group will IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 4 include the set of methods known as threshold models, Ordinal which are based on the general idea of approximating a Regression real value predictor and then dividing the real line into intervals. The taxonomy proposed is given in Fig. 1.

(Fig. 1 content: Naïve Approaches: Regression, Nominal Classification, Cost-Sensitive Classification. Ordinal Binary Decompositions: Multiple Models, Multiple Output Single Model. Threshold Models: Cumulative Link Models, Support Vector Machines, Discriminant Learning, Ensembles, Augmented Binary Classification, Gaussian Processes.)

3.1 Naïve approaches
Ordinal regression problems can be easily simplified into other standard problems, which generally involves making some assumptions. As will be discussed later, these methods can be very competitive given that, even though these assumptions may not hold, they inherit the performance of very well-tuned models.

Fig. 1. Proposed taxonomy for ordinal regression meth- 3.1.1 Regression ods One idea is to cast all the different labels {C1, C2,..., CQ} into real values {r1, r2, . . . , rQ} [51], where ri ∈ R, and TABLE 1 then to apply standard regression techniques [2], [52], Example of different cost matrices for a five class [53] (such as neural networks, support vector regres- classification problems, with class labels sion...). Typically, the value of each label is related to y ∈ Y = {C1, C2, C3, C4, C5}. its position in the ordinal scale, i.e. ri = i. For example, Kramer et al. [54] map the ordinal scale by assigning Zero-one Absolute cost Quadratic cost numerical values, applying a regression tree model and 0 1 1 1 1 0 1 2 3 4  0 1 4 9 16 1 0 1 1 1 1 0 1 2 3  1 0 1 4 9  rounding the results for assigning the class when pre-       1 1 0 1 1 2 1 0 1 2  4 1 0 1 4  dicting new values. They also evaluate the possibility of 1 1 1 0 1 3 2 1 0 1  9 4 1 0 1  1 1 1 1 0 4 3 2 1 0 16 9 4 1 0 using the median, the mode, or the rounded mean of all Actual class labels are arranged in rows, while predicted class labels are the patterns in the leaves of the tree. The main problem arranged in columns. with these approaches is that real values used for the labels may hinder the performance of the regression algorithms, and there is no principled way of deciding the value a label should have without prior information 3.1.3 Cost-sensitive classification about the problem, since the distance between classes is unknown. Moreover, regression learners will be more A more advanced method that can be considered in this sensitive to the representation of the label rather than group is cost-sensitive learning. Many real-world appli- its ordering [55]. A recent alternative is proposed in cations of machine learning and data mining require the [56], where, instead of choosing arbitrary ordered values evaluation of the learned system with different costs for the different labels, the variable is reconstructed by for different types of misclassification errors [60]. This examining the different pairwise class distances. is the case with ordinal regression, although the exact costs for misclassification can not be always evaluated a priori. The cost of misclassifications can be forced to 3.1.2 Nominal classification be different depending on the distance between real Ordinal classification problems are usually considered and predicted classes, in the ordinal scale. The work from a standard nominal perspective, and the order of Kotsiantis and Pintelas [61] considers cost-sensitive between classes is simply ignored. Some researchers classification, by using absolute costs (i.e. the element routinely apply nominal response data analysis methods cij of the cost matrix C is equal to the difference in (yielding results invariant to the permutation of the the number of categories, cij = |i − j|). Different algo- categories) to both nominal and ordinal target variables rithms are shown to obtain better MAE values when alike because they are both categorical [57]. Nominal cost matrices are used, without harming (in fact even classification algorithms ignore the ordering of the labels, improving) accuracy [61]. We include two cost matrices thus requiring more training data [55]. 
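As an illustration of the regression-based naïve approach described in Section 3.1.1, the sketch below maps the labels to the ranks r_i = i, fits a standard regressor and rounds the output back to the ordinal scale. It is an assumed scikit-learn-style implementation for illustration only, not the one evaluated in the paper:

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_naive_regression(X, y_rank):
    """Naive approach: treat the ranks 1..Q as real-valued targets (r_i = i)."""
    model = Ridge(alpha=1.0)  # any standard regressor (SVR, trees, ...) could be used
    model.fit(X, y_rank)
    return model

def predict_naive_regression(model, X, n_classes):
    """Round the real-valued prediction to the nearest rank and clip it to [1, Q]."""
    raw = model.predict(X)
    return np.clip(np.rint(raw), 1, n_classes).astype(int)
```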
The Support for a five class problem in Table 1, with the absolute 2 Vector Machine paradigm (SVM) [58] is perhaps the most cost matrix and the quadratic cost (cij = |i − j| ), common kernel learning method for statistical pattern together with a zero-one cost matrix, which is the one recognition. Beyond the application of the kernel trick to assumed in nominal classification. Other possibilities are allow non-linear decision discriminants, and the slack- to choose asymmetric costs or non-convex two-Gaussian variables to avoid inseparability, relax the constraints cost [41]. Again, the main problem is that, without a and handle noisy data, the original binary SVM had to priori knowledge of the ordinal regression problem, it be reformulated to deal with multiclass problems [59]. is not clear which cost matrix is more suitable. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 5
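The cost matrices of Table 1, and the way a probabilistic classifier could use them at prediction time, can be sketched as follows. This is an illustrative example, not code from the referenced works; the quadratic option reproduces the right-hand matrix of Table 1:

```python
import numpy as np

def cost_matrix(n_classes, kind="absolute"):
    """Cost matrices of Table 1: zero-one, absolute cost |i-j| and quadratic cost |i-j|^2."""
    i, j = np.indices((n_classes, n_classes))  # i: actual class index, j: predicted class index
    if kind == "zero-one":
        return (i != j).astype(int)
    if kind == "absolute":
        return np.abs(i - j)
    if kind == "quadratic":
        return np.abs(i - j) ** 2
    raise ValueError(f"unknown cost matrix: {kind}")

def min_expected_cost_prediction(class_probs, costs):
    """Pick, for each pattern, the class whose expected misclassification cost is lowest."""
    # class_probs: array (N, Q) of estimated P(y = C_q | x); costs: (Q, Q) matrix.
    expected = class_probs @ costs
    return expected.argmin(axis=1) + 1  # ranks 1..Q

print(cost_matrix(5, "quadratic"))
```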

TABLE 2 independently and combining the binary outputs into a Binary decompositions for a 5-class ordinal problem, with label, which is the approach followed by Frank and Hall class labels y ∈ Y = {C , C , C , C , C }. 1 2 3 4 5 in [63] (this decomposition is called OrderedPartitions in Table 2). In their work, Frank and Hall considered C4.5 Nominal decompositions OneVsAll OneVsOne as the binary classifier and the decision of the different +, −, −, −, − −, −, −, −,,,,,,  binary classifiers were combined by using associated −, +, −, −, − +,,,, −, −, −,,,      probabilities pq = P (y Cq|x), q = 1,...,Q − 1: −, −, +, −, −  , +,,, +,,, −, −,  −, −, −, +, −  ,, +,,, +,, +,, − −, −, −, −, + ,,, +,,, +,, +, + P (y = C1|x) ≈ 1 − p1,P (y = CQ|x) ≈ pQ−1, Ordinal decompositions OrderedPartitions OneVsNext OneVsFollowers OneVsPrevious P (y = Cq|x) ≈ pq−1 − pq, 2 ≤ q ≤ Q − 1. −, −, −, − −,,,   −,,,  +, +, +, + +, −, −, − +, −,,  +, −,,  +, +, +, −         +, +, −, −  , +, −,  +, +, −,  +, +, −,  Note that this approach may lead to negative probability +, +, +, −  ,, +, − +, +, +, − +, −,,  estimates [64], given that binary classifiers are indepen- +, +, +, + ,,, + +, +, +, + −,,, dently learned and nothing assures that pq−1 < pq. When there is no need for proper probability estimations, prediction can be done by selecting the maximum. 3.2 Ordinal binary decompositions In the work of Waegeman et al. [65], this framework is used but explicit weights over the patterns of each This group includes all those methods which are based binary system are imposed, in such a way that errors on on decomposing the ordinal target variable into several training objects are penalised proportionally to the ab- binary ones, which are then estimated by a single or solute difference between their rank and q (the category multiple models. A summary of the decompositions is examined). Additionally, labels for the test set are ob- given in Table 2, where five classes are considered, each tained by combining the estimated outcomes y of all the method generating a different decomposition matrix. q Q−1 binary classifiers. The interpretation of these binary Columns of the matrix correspond to the binary subprob- outcomes y ∈ {+1, −1}, q = 1,...,Q − 1, i = 1,...,N, lems and rows to the role of each class for each subprob- qi intuitively leads to y C if y = +1. In this way, the lem. The symbol + is associated to the positive class and i q qi rank k is assigned to pattern x so that y = −1, ∀q < k, the symbol − to the negative one. If the class is not used i qi and y = +1, ∀q ≥ k. As stated by the authors, this in the specific binary subproblem, no symbol is included qi strategy can result in ambiguities for some test patterns, in the corresponding position. OneVsAll and OveVsOne and they should be solved by using similar techniques formulations are nominal classification methods (and to those considered for nominal classification. A very should be listed as na¨ıve approaches), but they have been similar scheme is proposed in [12], where the weights included in this table for comparison purposes. Note are obtained slightly differently, and different kernels are the high number of binary decompositions needed by used for the different binary classification sub-problems. OneVsOne (in this case, 10 combinations). Other ordinal binary decompositions can be found Two main issues have to be taken into account when in the literature. 
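A minimal sketch of the Frank and Hall OrderedPartitions scheme summarised above, assuming a scikit-learn-style binary classifier with predict_proba (logistic regression is used here purely for illustration; the original work used C4.5):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def fit_ordered_partitions(X, y_rank, n_classes, base=None):
    """Train Q-1 binary models; the q-th one answers 'is the label greater than C_q?'."""
    base = base or LogisticRegression(max_iter=1000)
    return [clone(base).fit(X, (np.asarray(y_rank) > q).astype(int))
            for q in range(1, n_classes)]

def predict_ordered_partitions(models, X, n_classes):
    """Combine p_q = P(y > C_q | x) as in Frank and Hall and pick the largest estimate."""
    p = np.column_stack([m.predict_proba(X)[:, 1] for m in models])  # shape (N, Q-1)
    probs = np.empty((X.shape[0], n_classes))
    probs[:, 0] = 1.0 - p[:, 0]             # P(y = C_1)  ~ 1 - p_1
    probs[:, 1:-1] = p[:, :-1] - p[:, 1:]   # P(y = C_q)  ~ p_{q-1} - p_q
    probs[:, -1] = p[:, -1]                 # P(y = C_Q)  ~ p_{Q-1}
    return probs.argmax(axis=1) + 1         # negative estimates are possible, see text
```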
The cascade linear utility model [66] analysing the methods herein presented: 1) some of them considers Q−1 projections, in such a way that projection are based on the idea of training a different model for q separates classes C ∪ ... ∪ C from class C , each subproblem (multiple model approaches), while 1 Q−q−1 Q−q i.e. one class is eliminated for each projection (this is others learn one single model for all the subproblems; 2) the OneVsPrevious decomposition in Table 2). The pre- apart from defining how to decompose the problem, it dictions are then combined by a union utility function. is important to define a rule for predicting new patterns, Finally, binary SVMs were also applied to ordinal re- once the decision values are obtained. For the prediction gression [15], by making use of the ordinal pairwise phase, the corresponding binary codes of Table 2 can be partitioning approach [14]. This approach is composed of considered as part of the error-correcting output codes four different reformulations of the classical OneVsOne (ECOC) framework [62], where the predicted class is the and OneVsAll paradigms. OneVsNext considers that each one closest to the code formed by all binary responses. binary classifier q separates class C from class C , and Taking the first criterion into account, we have divided q q+1 OneVsFollowers (which is similar to the OneVsPrevious ordinal binary decomposition algorithms into multiple approach in [66] but in the opposite direction) constructs model and multiple-output single model approaches. each binary classifier q for the task of separating class Cq from classes Cq+1 ∪ ... ∪ CQ. The prediction phase is then 3.2.1 Multiple model approaches approached by examining each binary classifier in order, Ordinal information gives us the possibility of compar- so that, if a model predicts that the pattern is in the class ing the different labels. For a given rank q, a direct which is isolated (not grouped with other classes), then question can be the following, “is the label of pattern x this is the predicted class. This can be done in a forward greater than q?” [41]. This question is clearly a binary manner or in a backward manner [15]. classification problem, so ordinal classification can be Finally, another possibility [67] is to derive a classifier solved by considering each binary classification problem for each class but separating the labels into groups IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 6 of three classes (instead of only two) for intermediate Additionally, multiple models are also trained using the subtasks (labels lower than Cq, label Cq, and labels OneVsOne and the OrderedPartitions approaches. For the higher than Cq), or two classes for the extreme ones. prediction phase, the loss-based decoding approach [62] The objective is to incorporate the order information in is utilised, i.e. the chosen label is that which minimises the subclassification tasks. Although the decomposition the exponential loss, k = arg minq=1,...,Q d (Mq, y(x)) , for intermediate classes is not binary but ternary, this where Mq is the code associated to class q (q-th row approach has been included in this group because its of the coding matrix), y(x) is the vector of predic- motivation is similar to all the aforementioned. tions, and d (Mq, y(x)) is the exponential , PQ d (Mq, y(x)) = i=1 exp (Mqi · yi(x)). 
The values of the 3.2.2 Multiple-output single model approaches vector y(x) are assumed to be in the [−1, +1] range, Among non-parametric models, one appealing property and those of Mq in the set {−1, 0, +1}. The single ELM of neural networks is that they can handle multiple was found to obtain slightly better generalisation results responses in a seamless fashion [68]. Usually, as many and also to report the lowest computational time [73]. output neurons as the number of target variables are Other adaption of the ELM is found in [74], where included in the output layer and targets are presented an evolutionary algorithm is applied to optimise the to the network in the form of vectors ti, i = 1,...,N. different weights of the model by using a fitness function When applied to nominal classification, the most usual to impose the ordering restriction in model selection. A approach is to consider a 1-of-Q coding scheme [53], different approach is taken in [75], where the ordinal i.e. ti = {ti1, . . . , tiQ}, tiq = 1 if xi corresponds to an constraints are included into the weights connecting the example belonging to class Cq, and tiq = 0 (or tiq = hidden and output layers. −1), otherwise. In the ordinal regression framework, Costa [69] followed a probabilistic framework to pro- one can take the ordering information into account to pose another neural network architecture able to exploit design specific ordinal target coding schemes, which the ordinal nature of the data. The proposal is based can improve the performance of the methods. Indeed, on the joint prediction of constrained concurrent events, all the decompositions in Table 2 can be used to train which can be turned into a classification task defined neural networks, by taking each row as the code for in a suitable space through a “partitive approach”. An the target class, ti, and a single model will be obtained appropriate entropic loss is derived for P(Y), i.e. the for all related subproblems (considering that each output set of subsets of Y, where Y is a set of Q elementary neuron is solving each subproblem). This can be done by events. A probability for each possible subset should assigning a value (+1, 0 or −1) to each of the different be estimated, leading to a total of 2Q probabilities. symbols (+ or −) in Table 2. For sigmoidal output However, depending on the classification problem, not neurons, a 1 is assigned for positive symbols (+) and a 0 all possibilities should be examined. For example, this is for negative ones (−). For hyperbolic functions, negative simplified for random variables taking values in finite or- symbols are represented with a −1 and positive ones dered sets (i.e. ordinal regression), as well as in the case also with a 1. Those decompositions where a class is of independent boolean random variables (i.e. nominal not involved should be treated as a “does not matter” classification). To adapt neural networks to the ordinal condition where, whatever the output response, no error case structure, targets were reformulated following the signal should be generated [69]. OneVsFollowers approach and the prediction phase was A generalisation of ordinal perceptron learning [70] accomplished by considering that, under its constrained in neural networks was proposed in [71]. The method entropic loss formulation, the output of the q-th output is based on two main ideas: 1) the targets are coded neuron estimates the probability that q and q − 1 events using the OrderedPartitions approach; and 2) instead of are both true. 
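For the single-model neural network approaches discussed above, the target coding of Table 2 can be generated directly from the ranks. The following small sketch of the OrderedPartitions coding is illustrative only (0/1 targets for sigmoid outputs, -1/+1 for hyperbolic tangent outputs):

```python
import numpy as np

def ordered_partitions_targets(y_rank, n_classes, negative=0):
    """Row of Table 2 for each pattern: the q-th output encodes 'is y greater than C_q?'."""
    thresholds = np.arange(1, n_classes)                              # q = 1, ..., Q-1
    t = (np.asarray(y_rank)[:, None] > thresholds[None, :]).astype(int)
    return np.where(t == 1, 1, negative)

print(ordered_partitions_targets([1, 3, 5], n_classes=5))
# [[0 0 0 0]
#  [1 1 0 0]
#  [1 1 1 1]]
```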
This methodology was further evaluated using the softmax function [53] for the output nodes, a and compared in other works [64], [76], [77]. standard sigmoid function is imposed, and the category Although all these neural network approaches consist assigned to a pattern is equal to the index previous to of a single model, they are trained independently in that of the first output node whose value is higher than the sense that the output of the neurons do not depend a predefined threshold T , or when no nodes are left. on the other outputs (only on common nonlinear trans- This method ignores inconsistencies (e.g. a sigmoid with formations of the inputs). That is the reason why we value higher than T after the index selected). have included them into the category of ordinal binary Extreme learning machines (ELMs) are single-layer decompositions. feedforward neural networks, where the hidden layer These neural network models can be grouped under does not need to be tuned given that corresponding the term multitask learning [78] (MTL), which is a learn- weights are randomly assigned. ELMs have demon- ing paradigm that considers the case of simultaneously strated good scalability and generalisation performance tackling several related tasks. Any of the different pro- with a faster learning speed when compared to other posals in this field could be applied to train a single models such as SVMs [72]. They have been adapted model for the different ordinal decompositions analysed to ordinal regression [73], and one of the proposed in this section. Indeed, one of the existing proposals, ordinal ELMs also considers OrderedPartitions targets. MTL via conic programming [79], was validated in the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 7

context of ordinal regression, showing promising results. assumption F is made for , the cumulative model −1 is obtained by choosing the inverse distribution F g−1 3.3 Threshold models as the inverse link function . The most common choice for the distribution of  is the Often, in the ordinal regression paradigm, it is natural (which is indeed the one selected for the POM [82]), to assume that an unobserved continuous variable un- although , complementary log-log, negative log- derlies the ordinal response variable. Such a variable log or cauchit functions could also be used [81]. If the is called a , and methods based on that ordinal response is a coarsely measured latent continu- assumption are known as threshold models, which are ous variable f(x), label Cq in the training set is observed the most popular approaches for modelling ordinal and if and only if f(x) ∈ [bq−1, bq], where the function f and ranking problems [49]. These methodologies estimate: b = (b0, b1, ..., bQ−1, bQ) are to be determined from the • A function f(x) that tries to predict the values of the data. It is assumed that b0 = −∞ and bQ = +∞, so latent variable, acting as a mapping function from the real line, defined by f(x), x ∈ X , is divided into feature space to the real one (similar to the ranking Q consecutive intervals. Each region separated by two function to be learned by multipartite algorithms). consecutive biases corresponds to a category Cq. The Q−1 • A set of thresholds b = (b1, b2, . . . , bQ−1) ∈ R to constraints b1 ≤ b2 ≤ ... ≤ bQ−1 ensure that P (y  Cq|x) represent intervals in the range of f(x), which must increases with q [83]. satisfy the constraints b1 ≤ b2 ≤ ... ≤ bQ−1. As will be seen, all the models in this section are Threshold models can be seen as an extension of na¨ıve inspired by the POM in the strategy assumed, obtaining regression models. The main difference between these a one-dimensional mapping function and dividing the two approaches is that the distances among the different real line into different ordered intervals. This mapping classes are not defined a priori for threshold models, function can be used to obtain more information about being estimated during the learning process. Although the confidence of the predictions by relating it to its they are also related to (single-model) ordinal binary proximity to the biases. Additionally, the POM model decomposition approaches, the main difference is that provides us with a solid probabilistic interpretation. The threshold models are based on one single projection distribution of  is assumed to be the standard logistic vector with multiple thresholds, one for each class. function for the POM:   −1 P (y  Cq|x) T 3.3.1 Cumulative link models g (P (y  Cq|x)) = ln = bq − w x, P (y Cq|x) Arising from a statistical background, the Proportional T where q = 1,...,Q − 1, odds(y  Cq|x) = exp(bq − w x), Odds Model (POM) is one of the first models specifically P (yCq |x) so odds(y  Cq|x) = . Therefore, the ratio of designed for ordinal regression [80], dated back to 1980. 1−P (yCq |x) It is a member of a wider family of models recognised as the odds for two pattens x0 and x1 are proportional:

Cumulative Link Models (CLMs) [81]. In order to extend odds(y  Cq|x1) T binary to ordinal regression, CLMs = exp(−w (x1 − x0)). odds(y  Cq|x0) predict probabilities of groups of contiguous categories, taking the ordinal scale into account. In this way, cumu- More flexible non-proportional alternatives have been developed, one of them simply assuming different w for lative probabilities P (y  Cj|x) are estimated, which can be directly related to standard probabilities: each class (which is known as the generalised model [84]). Another alternative applies the pro- P (y  Cq|x) = P (y = C1|x) + ... + P (y = Cq|x), portional odds assumption only to a subset of variables P (y = Cq|x) = P (y  Cq|x) − P (y  Cq−1|x), (partial proportional odds [85]). Moreover, Tutz [86] presented a general framework for parametric models with q = 2,...,Q, and considering by definition that that extends generalised additive models to incorporate P (y = C1|x) = P (y  C1|x) and P (y  CQ|x) = 1. nonparametric parts. A decision rule f : X → Y is not fitted directly. Apart from assuming proportional odds, linear CLMs Instead, stochastic ordering of space X is satisfied by are rather inflexible since the decision functions are the following general model form [28]: always linear hyperplanes, this generally affecting the −1 T performance of the model (as analysed in the experi- g (P (y  Cq|x)) = bq − w x, q = 1, . . . , Q, mental section of this work). A non-linear version of the where g−1 : [0, 1] → (−∞, +∞) is a monotonic function POM model was proposed in [18], [83] by simply setting often referred to as the inverse link function and bq is the projection f(x) to be the output of a neural network. the threshold defined for class Cq. This model is clearly The probabilistic interpretation of CLMs can be used to inspired by the latent variable motivation, considering apply a maximum likelihood maximisation for setting that f(x) = wT x is a linear transformation. Consider the the network parameters. Gradient descent techniques error of the model of the latent variable, f(x) = wT x+, with proper constraints for the biases serve this purpose. where  is the random component with zero expectation, This non-linear generalisation of the POM model based E[] = 0, distributed according to F. If a distribution on neural networks was considered in [87], where an IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 8

evolutionary algorithm was applied to optimise all the parameters considered. Linear ordinal logistic regression was combined with nonlinear kernel machines using primal-dual relations from Nyström sampling [88]. However, to make the computation of the model feasible, a sub-sample from the data had to be selected, which limits the applicability to those cases where there is a reasonable way to do this [88].
An interesting alternative to CLMs is the so-called ordistic model presented in [89]. The work presents two threshold-based constructions which can be used to generalise loss functions for binary labels, such as the logistic and hinge loss, and another generalisation of the logistic loss based on a probabilistic model for ordered labels. Both constructions are based on including Q − 1 thresholds partitioning the real line into Q segments, but they differ in how predictors outside the "correct" segment (or too close to its edges) are penalised. The immediate-threshold construction only penalises the violations of the two thresholds limiting this segment, while the all-threshold one considers all of them.

3.3.2 Support vector machines
Because of their good generalisation performance, SVM models are perhaps the most widely applied to ordinal regression, their structure being easily adapted to that of threshold models. The proposal of Herbrich et al. [28], [90] is the first SVM-based algorithm, where they consider a pairwise approach by deriving a new dataset made up of all possible difference vectors xij = xi − xj and yij = sign(O(yi) − O(yj)), with yi, yj ∈ {C1, . . . , CQ}. In contrast, all the SVM pointwise approaches share the common objective of seeking Q − 1 parallel discriminant hyperplanes, all of them represented by a common vector w and the scalar biases b1 ≤ . . . ≤ bQ−1 to properly separate training data into ordered classes. In this sense, several methodologies for the computation of w and {b1, . . . , bQ−1} can be considered. The work of Shashua and Levin [91] introduced the first two methods: the maximisation of the margin between the closest neighbouring classes and the maximisation of the sum of the margins between classes. Both approaches present two main problems [64]: the model is incompletely specified, because the thresholds are not uniquely defined, and they may not be properly ordered at the optimal solution, since the inequality b1 ≤ b2 ≤ . . . ≤ bQ−1 is not included in the formulation.
Consequently, Chu and Keerthi [29], [92] proposed two different reformulations for the same idea, solving the problem of unordered thresholds at the solution. On the one hand, they imposed explicit constraints on the optimisation problem, only considering adjacent labels for threshold determination (Support Vector Ordinal Regression with Explicit Constraints, SVOREX). On the other hand, patterns in all the categories were allowed to contribute errors for each hyperplane (SVOR with Implicit Constraints, SVORIM), which, as they prove [29], leads to automatically satisfied constraints in the
optimal solution (see Lemma 1 of [29]). Let N_q be the number of patterns of class C_q, and let x_i^q be those patterns x whose class label is C_q. The SVORIM learning problem is defined as follows:

\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{q=1}^{Q-1} \left( \sum_{j=1}^{q} \sum_{i=1}^{N_j} \xi_{ji}^{q} + \sum_{j=q+1}^{Q} \sum_{i=1}^{N_j} \xi_{ji}^{*q} \right),

subject to the constraints:

w \cdot x_i^j - b_q \le -1 + \xi_{ji}^{q}, \quad \xi_{ji}^{q} \ge 0, \quad j \in \{1, \dots, q\},
w \cdot x_i^j - b_q \ge +1 - \xi_{ji}^{*q}, \quad \xi_{ji}^{*q} \ge 0, \quad j \in \{q+1, \dots, Q\},

where i ∈ {1, . . . , N_j}, b ∈ R^{Q−1}, and ξ^q_{ji} and ξ^{*q}_{ji} are the slacks for the q-th parallel hyperplane (defined for the left and right parts of the hyperplane, respectively). The first group of constraints is focused on the left part of the q-th hyperplane (classes with j ≤ q), while the second one is focused on the right part (classes with j > q). They empirically found that SVOREX performed better in terms of accuracy (with a more local behaviour), and SVORIM in terms of absolute deviations in number of classes or MAE (with a more global behaviour), and this is justified theoretically based on the loss minimised for each method. The framework of reduction [41] also explains this from the point of view of the cost matrices selected. Our results seem to agree with these conclusions for discretised regression datasets, but the differences are not so clear for real ordinal regression ones. Generalisation properties for some ordinal regression algorithms, including SVOR, were further studied in [93].
In [94], the errors of an ordinal SVM classifier are studied separately depending on whether they correspond to upgrading errors (the predicted label being higher than the actual one) or downgrading ones (the predicted label being lower than the actual one). The authors address the two-objective problem of finding a classifier that maximises the two margins simultaneously, and they show that the whole set of Pareto-optimal solutions can be obtained by solving a quadratic optimisation problem.
Some recent works have focused on solving the bottleneck of these SVM proposals, which is usually the high computational complexity needed to handle larger datasets. Concerning this topic, two different proposals can be distinguished: block-quantised support vector ordinal regression [95] and ordinal-class core vector machines [96]. The former is based on performing kernel k-means and applying SVOR to the cluster representatives, on the idea of approximating the kernel matrix K by K̃, which will be composed of k² constant blocks, in such a way that the problem scales with the number of clusters instead of the dataset size. The latter is an extension of core vector machines [97] to the ordinal regression setting. Finally, an incremental version of SVOR algorithms is proposed in [98].
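Whatever the training procedure (POM, SVOREX, SVORIM, ...), all these threshold models share the same decision rule once the projection and the ordered thresholds have been obtained. The following generic sketch is for illustration and is not tied to any specific published implementation:

```python
import numpy as np

def threshold_model_predict(f_values, biases):
    """Assign C_q when f(x) falls in (b_{q-1}, b_q], with b_0 = -inf and b_Q = +inf.

    The predicted rank is therefore 1 + the number of thresholds below f(x).
    `biases` must already satisfy b_1 <= b_2 <= ... <= b_{Q-1}.
    """
    f_values = np.asarray(f_values)
    biases = np.asarray(biases)
    return 1 + np.sum(f_values[:, None] > biases[None, :], axis=1)

print(threshold_model_predict([-2.0, 0.1, 3.5], biases=[-1.0, 0.0, 1.0, 2.0]))  # [1 3 5]
```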

3.3.3 Discriminant learning TABLE 3 Extended binary transformation for three given patterns Discriminant learning has also been reformulated to (x1, y1 = C1), (x2, y2 = C2), (x3, y3 = C3), the identity tackle ordinal regression [31]. Discriminant analysis is coding matrix and the quadratic cost matrix. usually not considered as a classification technique by itself, but rather as a supervised dimensionality reduc- (q) xi (q) tion. Nonetheless, it is widely used for that purpose, i q wi,q x mq yi since, as a projection method, the definition of thresholds 1 1 2 · |0 − 1| = 2 x1 {1, 0} 2 1 < 1 − 1 = −1 1 2 2 · |1 − 4| = 6 x1 {0, 1} 2J2 < 1K − 1 = −1 can be used to discriminate the classes. In general, to 2 1 2 · |1 − 0| = 2 x2 {1, 0} 2J1 < 2K − 1 = +1 J K allow the computation of the optimal projection for the 2 2 2 · |0 − 1| = 2 x2 {0, 1} 2 2 < 2 − 1 = −1 3 1 2 · |4 − 1| = 6 x3 {1, 0} 2J1 < 3K − 1 = +1 data, this algorithm analyses two main objectives: the 3 2 2 · |1 − 0| = 2 x3 {0, 1} 2J2 < 3K − 1 = +1 maximisation of the between-class distance, and the min- J K imisation of the within-class distance, by using variance- covariance matrices and the Rayleigh coefficient. In order to reformulate the algorithm for ordinal regression, an subsection work differently, and, as later analysed, the ordering constraint over contiguous classes is imposed models derived are equivalent to threshold models. on the averages of projected patterns of each class, which A reduction framework can be found in the works leads the algorithm to order projected patterns according of Lin and Li [41], [105], where ordinal regression is to their label. This will preserve the ordinal information reduced to binary classification by applying three steps: and avoid some serious ordinal misclassification errors. 1) A coding matrix M is used to represent the class The original optimisation problem is transformed and being examined. Given a coding matrix M of extended with a penalty term (C): (Q−1) rows, input patterns (xi, yi) are transformed

T into extended binary patterns by replicating them, min J(w, ρ) = w Sww − Cρ, (q) (q) (xi , yi ), with: N T 1 P q (q) (q) subject to w (mq+1 − mq) ≥ ρ, where mq = N i=1 xi, q xi = (xi, mq), yi = 2 q < O(yi) − 1, Sw is the within-class scatter matrix and ρ represents J K the minimum difference of the projected means between where q = 1,...,Q − 1, mq is the q-th row of M consecutive classes (if ρ > 0, the projected means are and · is a Boolean test which is 1 if the inner correctly ranked). This methodology is known as Kernel conditionJ K is true, and 0 otherwise. Q − 1 replicates Discriminant Learning for Ordinal Regression (KDLOR) of each pattern are generated with weights: [31] and it has been used in some later works [9], [99]. w = (Q − 1) · |C − C |, In [100], the KDLOR model is extended by trying to i,q O(yi),q O(yi),q+1 learn multiple orthogonal projections, which are then where i = 1,...,N, C is a V-shaped cost ma- combined into a final decision function. trix (i.e. CO(yi),q−1 ≥ CO(yi),q, if q ≤ O(yi), and The method was extended in [101], [102] based on CO(yi),q ≤ CO(yi),q+1, if q ≥ O(yi)). The cost the idea of preserving the intrinsic geometry of the matrix must be defined a priori. An example of data in the embedded non-linear structure, i.e. in the this transformation is given in Table 3. As can induced high-dimensional feature space, via kernel map- be seen, the final extended pattern represents the ping. This consideration is the basis of manifold learning question “Is the rank of x greater than q?” [41]. The [53], and the algorithms mentioned construct a neigh- weights measure the importance of the pattern for bourhood graph (which takes the ordinal nature of the the binary classifier, and they are also used for the dataset into account) which is afterwards used to derive theoretical analysis. the Laplacian matrix and obtain a projection which 2) A single binary classifier with confidence outputs, considers the underlying manifold of the data. A related f(x, mk), is trained for the new weighted extended method is proposed in [103], where several different patterns, aiming at a low weighted 0/1 loss. projections are iteratively derived. 3) A classification rule like the following is used to construct a final prediction for new patterns:

3.3.4 Perceptron learning Q−1 X PRank [104] is a perceptron online learning algorithm r(x) = 1 + f(x, mq) > 0 . (1) with the structure of threshold models. It was then ex- q=1 J K tended by approximating the Bayes point, what provides All the binary classification problems are solved jointly good performance for generalisation [55]. by computing a single binary classifier. The most striking characteristic of this algorithm is that it unifies many 3.3.5 Augmented binary classification existing ordinal regression algorithms [41], such as the Although the approaches in Subsection 3.2 are simple perceptron ones [104], kernel ranking [36], AdaBoost.OR to implement, their generalisation performance cannot [106], ORBoost-LR and ORBoost-All thresholded ensem- be analysed easily. The two algorithms included in this ble models [107], CLM [81] or several ordinal SVM IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 10 proposals (oSVM [64], SVORIM and SVOREX [29]). RankBoost [109] is a boosting algorithm that constructs Moreover, it is important to highlight the theoretical an ensemble of those confidence functions to form a guarantees provided by the framework, including the better ordering preference. Some efforts were made to derived cost and regret bounds and the proof of equiv- apply a similar idea for ordinal regression problems, alence between ordinal regression and binary classifi- deriving into Ordinal Regression Boosting (ORBoost) cation. An extension of this reduction framework was [107]. The corresponding thresholded-ensemble models proposed in [108], where ordinal regression is proved to inherit the good properties of ensembles, including more be equivalent to a regular multiclass classification whose stable predictions and sufficient power for approximat- distribution is changed. This extension is free of the ing complicated target functions [110]. The model is following restrictions: target functions should be rank- composed of confidence functions, and their weighted monotonic; and rows of loss matrix are convex. linear combination is used as the projection f(x). A set The data replication method of Cardoso et al. [64] of thresholds for this projection is also included in the (whose previous linear version appeared in [10]) is a model and iteratively updated with the rest of parame- very similar framework, except that it essentially con- ters. Following a similar approach to [89], large margin siders the absolute cost, consequently being less flexible. bounds of the classification error and the absolute error However, for ordinal regression, increasing the error are derived, from which two algorithms are presented: with the absolute difference between the predicted and ORBoost with all margins and ORBoost with left-right estimated labels is a natural choice in the absence of any margins [107]. Two alternative thresholded-ensemble al- other information [18]. An advantage of the framework gorithms are presented in [111], both generating an of data replication is that it includes a parameter s which ensemble of ordinal decision rules based on forward limits the number of adjacent classes considered, in such stagewise additive modelling. a way that the replicate q is constructed by using the With a different perspective, the well-known Ad- q − s classes to its ’left’ and the q + s classes to its ’right’ aBoost algorithm was recently extended to improve any [64]. This parameter s ∈ {1, ..., Q − 1} plays the role of base ordinal regression algorithm [106]. 
The extension, controlling the increase of data points. AdaBoost.OR, proved to inherit the good properties of It is worth mentioning that augmented binary classifi- AdaBoost, improving both the training and test perfor- cation models and threshold models are closely related, mances of existing ordinal classifiers. Another ordinal re- and that is the reason why they have been included in gression version of AdaBoost is proposed in [112], while this section. The extended patterns only differ in the in this case the adaption is based on considering a cost new variables introduced by the coding matrix M. The matrix both in pattern weighting and error updating. original version of the dataset is replicated in different The framework of negative correlation learning (where subspaces, with different values for the new variables. the ensemble members are learnt in such a way that the By obtaining the intersection of the binary hyperplane in correlation between their responses is minimised) was the extended dataset with each of the subspace replicas used in the context of ordinal regression [17], [113] by we derive parallel boundaries in the original dataset [64], calculating the correlation between the latent variable with a single projection vector and multiple thresholds. estimations or, alternatively, between the probabilities In fact, SVORIM and reduction to SVM is known to obtained by the ensemble members. be not so different in formulation [103]. The model of Mathieson [18] (threshold model) is equivalent to 3.3.7 Gaussian processes the one proposed in [64] (oNN, an augmented binary All the previous threshold models can be considered classification model) if the activation function of the discriminative models in the sense that they estimate output node is set to the logsig function and the model directly the posterior P (y|x), or learn a function to map is trained to predict the posterior probabilities when the input x to class labels. On the contrary, generative fed with the original input variables and the variables models learn a model of the joint probability P (x, y) of generated by the data replication method. The predicted input patterns x and label y, and make the prediction by thresholds would be the weights of the connection of a Bayesian framework to estimate P (y|x). the added Q − 2 components. Finally, augmented binary Gaussian Processes for Ordinal Regression (GPOR) classification and ordinal binary decomposition are not [30] models the latent variable f(x) using Gaussian Pro- disjoint categories. The former class of models do not cesses, to estimate then all the parameters by means of restrict consistency of binary classifiers, making use of a Bayesian framework. The values of the latent function a “voting” of the binary classifiers (see Eq. 1 of [105]). {f(xi)} are assumed to be the given by random variables Moreover, all the ordinal decompositions in Table 2 can indexed by their input vectors in a zero-mean Gaussian be viewed as a special case of “cost-sensitive ordinal process. Mercer kernel functions approximate the covari- classification” via augmented binary classification [41]. ance between the functions of two input vectors. Finally, the thresholds are included in the model to divide the 3.3.6 Ensembles latent variable in consecutive intervals, and they are From a different perspective, the confidence of a binary optimised together with the rest of parameters, using classifier can be regarded as an ordering preference. padding variables to avoid unordered solutions. 
3.3.7 Gaussian processes
All the previous threshold models can be considered discriminative models in the sense that they estimate directly the posterior P(y|x), or learn a function to map the input x to class labels. On the contrary, generative models learn a model of the joint probability P(x, y) of input patterns x and labels y, and make the prediction by using a Bayesian framework to estimate P(y|x).

Gaussian Processes for Ordinal Regression (GPOR) [30] models the latent variable f(x) using Gaussian processes and then estimates all the parameters by means of a Bayesian framework. The values of the latent function {f(x_i)} are assumed to be given by random variables indexed by their input vectors in a zero-mean Gaussian process. Mercer kernel functions approximate the covariance between the functions of two input vectors. Finally, the thresholds are included in the model to divide the latent variable into consecutive intervals, and they are optimised together with the rest of parameters, using padding variables to avoid unordered solutions. Given the latent function f, the joint probability of observing the ordinal variables is P(D|f) = \prod_{i=1}^{N} P(y_i | f(\mathbf{x}_i)), and the Bayes theorem is applied to write the posterior probability P(f|D) = \frac{1}{P(D)} \prod_{i=1}^{N} P(y_i | f(\mathbf{x}_i)) P(f). A Gaussian noise with zero mean and unknown variance \sigma^2 is assumed for the latent functions. The normalisation factor P(D), more exactly P(D|\theta), is known as the evidence for the vector of hyperparameters \theta, and it is estimated in the paper by two different approaches: a Maximum a Posteriori approach with Laplace approximation and an Expectation Propagation approach with variational methods. A more general GPOR was then proposed to tackle multi-class classification problems but with a free structure of preferences over the labels [114]. A probabilistic sparse kernel model was proposed for ordinal regression in [115], where a Bayesian treatment was also employed to train the model. A prior over the weights governed by a set of hyperparameters was imposed, inspired by the well-known relevance vector machine. Srijith et al. have proposed a probabilistic version of GPOR [116], two different sparse versions [117] and a semi-supervised version [118].

3.4 Other approaches and problem formulations
This subsection includes some methods that are difficult to consider in the previous groups. For example, an alternative methodology is proposed by da Costa et al. [76], [77] for training ordinal regression models. The main assumption of their proposal is that the random variable class associated with a given pattern should follow a unimodal distribution. For this purpose, they provide two possible implementations: a parametric one, where a specific discrete distribution is assumed and the associated free parameters are estimated by a neural network; and a non-parametric one, where no distribution is assumed but the error function is modified to avoid errors from distant classes. The same idea was then applied to SVMs in [119] by solving an ordinal problem through a single optimisation process (the all-at-once strategy).

In [120], both decision trees and nearest neighbour (NN) classifiers are applied to ordinal regression problems by introducing the notion of consistency: a small change in the input data should not lead to a 'big jump' in the output decision, i.e. adjacent decision regions should have equal or consecutive labels. This rationale was used as a post-processing mechanism of a standard decision tree and as a pre- or post-processing step for the NN method. An improvement was presented in [121] to reduce the over-regularised decision region artifact by using ensemble learning techniques.

Two ordinal learning vector quantisation schemes, with metric learning, specifically designed for classifying data items into ordered classes, are introduced in [122], [123]. The methods use the order information during training, both in the selection of the prototypes and in determining the way they are updated.

Different prediction methods as a function of the error measure to be minimised are presented in [124]. The paper discusses the fact that the Bayes optimal decision for a classifier which returns probability estimates is different depending on the loss function considered for the errors. In this way, for the maximisation of the accuracy one should consider the mode (or maximum probability), but the median of the probability distribution is the optimal decision when minimising the MAE in ordinal regression problems.
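A small Python sketch of these two Bayes decisions follows, assuming that a classifier provides a probability vector over the Q ordered classes for each pattern; it illustrates the rule discussed in [124] rather than reproducing its original implementation:

    import numpy as np

    def optimal_label(probs, loss="zero-one"):
        """Bayes decision for one pattern given class probabilities over the
        ordinal scale {1, ..., Q}: the mode minimises the zero-one loss (MZE),
        while the median of the distribution minimises the absolute loss (MAE)."""
        probs = np.asarray(probs, dtype=float)
        if loss == "zero-one":
            return int(np.argmax(probs)) + 1            # mode
        cum = np.cumsum(probs)
        return int(np.searchsorted(cum, 0.5)) + 1       # (lower) median

    p = [0.05, 0.40, 0.15, 0.25, 0.15]   # hypothetical posterior for Q = 5
    print(optimal_label(p, "zero-one"))  # -> 2 (mode)
    print(optimal_label(p, "absolute"))  # -> 3 (median)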
4 EXPERIMENTAL STUDY
4.1 Experimental design
In this subsection, the experiments are clearly specified, including the datasets and algorithms considered, the parameters to optimise, the performance measures and the statistical tests used for assessing the differences.

4.1.1 Datasets selected
The most widely used dataset repository is the one provided by Chu et al. [30], including different regression benchmark datasets. These datasets are not real ordinal classification problems but regression ones, which are turned into ordinal classification by discretising the target into Q different bins with equal frequency. It is clear that these datasets do not exhibit some characteristics of typical complex classification tasks, such as class imbalance, given that all classes are assigned the same number of patterns. However, we find it interesting to check how the algorithms perform in this more controlled environment and to compare the conclusions obtained. Table 4 shows the characteristics of the 41 datasets, including the number of patterns, attributes and classes, and also the number of patterns per class.

The real ordinal classification datasets were extracted from benchmark repositories1 (UCI [125] and mldata.org [126]), and the regression ones were obtained from the website of W. Chu2. For the discretised datasets, we considered Q = 5 and Q = 10 bins to evaluate the response of the classifiers to the increase in the complexity of the problem. The synthetic toy dataset was generated as proposed in [77] with 300 patterns. All nominal attributes were transformed into as many binary attributes as the number of categories, and all the datasets were properly standardised.

1. We would like to note that many of these datasets have been previously considered in the machine learning literature, but ignoring the ordering information.
2. http://www.gatsby.ucl.ac.uk/∼chuwei/ordinalregression.html

4.1.2 Algorithms selected
We have selected some representatives of the different families included in the proposed taxonomy (see Table 5). It is important to note that naïve approaches and ordinal binary decompositions can be applied using almost any base binary classifier or regressor. In our experiments, we have selected in those cases SVMs, given that they are suggested by many of the authors of the different works analysed.
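As an illustration of the dataset preparation described in Section 4.1.1 (equal-frequency discretisation of the regression targets and standardisation of the inputs), a minimal Python sketch is given below. It is not the preprocessing code used for the experiments, and the binarisation of nominal attributes is omitted for brevity:

    import numpy as np

    def equal_frequency_bins(y_continuous, Q):
        """Discretise a regression target into Q ordinal classes with (roughly)
        the same number of patterns per class, as done for the discretised
        regression benchmarks."""
        ranks = np.argsort(np.argsort(y_continuous))          # 0 .. N-1
        return (ranks * Q // len(y_continuous)) + 1           # labels 1 .. Q

    def standardise(X):
        """Zero-mean, unit-variance scaling of every input attribute."""
        return (X - X.mean(axis=0)) / X.std(axis=0)

    rng = np.random.default_rng(0)
    y = rng.normal(size=20)
    print(np.bincount(equal_frequency_bins(y, Q=5))[1:])      # -> [4 4 4 4 4]
    X = rng.normal(size=(20, 3))
    print(np.allclose(standardise(X).mean(axis=0), 0.0))      # -> True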

TABLE 4: Characteristics of the benchmark datasets
TABLE 5: Different algorithms considered for the experiments

Discretised regression datasets Abbr. Short description Dataset #Pat. #Attr. #Classes Class distribution Na¨ıveapproaches pyrim5 (P5) 74 27 5 ≈ 15 per class SVC1V1 Support Vector Classifier with OneVsOne [59] machine5 (M5) 209 7 5 ≈ 42 per class SVC1VA Support Vector Classifier with OneVsAll [59] housing5 (H5) 506 14 5 ≈ 101 per class SVR Support Vector Machines for regression [52] stock5 (S5) 700 9 5 140 per class CSSVC Cost-Sensitive Support Vector Classifier (CSSVC) [59] abalone5 (A5) 4177 11 5 ≈ 836 per class Ordinal Binary decompositions bank5 (B5) 8192 8 5 ≈ 1639 per class SVMOP Support Vector Machines with OrderedPartitions [63], [65] bank5’ (BB5) 8192 32 5 ≈ 1639 per class NNOP Neural Network with OrderedPartitions [71] computer5 (C5) 8192 12 5 ≈ 1639 per class ELMOP Extreme Learning Machine with OrderedPartitions [73] 8192 21 5 ≈ 1639 computer5’ (CC5) per class Threshold models cal.housing5 (CH5) 20640 8 5 4128 per class POM Proportional Odds Model [80] census5 (CE5) 22784 8 5 ≈ 4557 per class NNPOM Neural Network based on Proportional Odd Model [18] census5’ (CEE5) 22784 16 5 ≈ 4557 per class SVOREX Support Vector Ordinal Regression with Explicit Constraints [29] pyrim10 (P10) 74 27 10 ≈ 8 per class SVORIM Support Vector Ordinal Regression with Implicit Constraints [29] machine10 (M10) 209 7 10 ≈ 21 per class SVORLin SVORIM using a linear kernel [29] housing10 (H10) 506 14 10 ≈ 51 per class KDLOR Kernel Discriminant Learning for Ordinal Regression [31] stock10 (S10) 700 9 10 70 per class GPOR Gaussian Processes for Ordinal Regression [30] abalone10 (A10) 4177 11 10 ≈ 418 per class REDSVM Reduction applied to Support Vector Machines [41] bank10 (B10) 8192 8 10 ≈ 820 per class ORBALL Ordinal Regression Boosting with All margins [107] bank10’ (BB10) 8192 32 10 ≈ 820 per class computer10 (C10) 8192 12 10 ≈ 820 per class computer10’ (CC10) 8192 21 10 ≈ 820 per class cal.housing (CH10) 20640 8 10 2064 per class census10 (CE10) 22784 8 10 ≈ 2279 per class census10’ (CEE10) 22784 16 10 ≈ 2279 per class sification algorithm (SVMOP), but including different Real ordinal regression datasets weights, as proposed by Waegeman et al. [65]. However, Dataset #Pat. #Attr. #Classes Class distribution contact-lenses (CL) 24 6 3 (15, 5, 4) given the problem of possible ambiguities recognised by pasture (PA) 36 25 3 (12, 12, 12) the authors, probability estimates are obtained following squash-stored (SS) 52 51 3 (23, 21, 8) squash-unstored (SU) 52 52 3 (24, 24, 4) the method presented in [128]. Then, the fusion of proba- tae (TA) 151 54 3 (49, 50, 52) bilities of Eq. (1) is performed [63]. 2) The neural network newthyroid (NT) 215 5 3 (30, 150, 35) balance-scale (BS) 625 4 3 (288, 49, 288) model proposed in [71] (NNOP). This model considers SWD (SW) 1000 10 4 (32, 352, 399, 217) the OrderedPartitions coding scheme for the labels and a car (CA) 1728 21 4 (1210, 384, 69, 65) bondrate (BO) 57 37 5 (6, 33, 12, 5, 1) rule for decisions based on the first node whose output toy (TO) 300 2 5 (35, 87, 79, 68, 31) is higher than a predefined threshold (T = 0.5, in our eucalyptus (EU) 736 91 5 (180, 107, 130, 214, 105) LEV (LE) 1000 4 5 (93, 280, 403, 197, 27) experiments). We consider then the mean square error automobile (AU) 205 71 6 (3, 22, 67, 54, 32, 27) function over the outputs and the iRProp+ algorithm winequality-red (WR) 1599 11 6 (10, 53, 681, 638, 199, 18) ESL (ES) 488 4 9 (2, 12, 38, 100, [129] to optimise the parameters. 
3) Finally, the single model ordinal ELM presented in [73] (ELMOP).
116, 135, 62, 19, 4) ERA (ER) 1000 4 9 (92, 142, 181, 172, 158, 118, 88, 31, 18)
Starting with naïve approaches, the following methods were considered: 1) C-Support Vector Classifier (C-SVC) with OneVsOne and OneVsAll decompositions (SVC1V1 and SVC1VA), because they are the two main approaches when applying SVM to multiclass problems [59]. Although these methods consider binary decompositions, they have been included in the nominal classification group, given that they do not take the class order into account. 2) Support Vector Regression (SVR) applied to a modified dataset where the target variable Y = {C_1, C_2, ..., C_Q} is mapped to the real values {0, 1/(Q-1), 2/(Q-1), ..., 1}. The concrete regression model considered is the ε-SVR [52]. 3) Cost-Sensitive SVC (CSSVC), which is a C-SVC [59] with the OneVsAll decomposition, where absolute costs are included as different weights [127] for the negative class of each decomposition.
Regarding the ordinal binary decompositions, the methods considered are the following: 1) the OrderedPartitions decomposition applied to the C-SVC classification algorithm (SVMOP), 2) NNOP and 3) ELMOP, all of which are detailed above.
The threshold models considered are the following: 1) The POM [82], with the logit link function (the most popular one). 2) A neural network approach based on the POM (NNPOM), similar to the one proposed by Mathieson [18]. The cross entropy function is optimised by the iRProp+ algorithm [129]. Threshold constraints are satisfied by substituting the set of parameters {b_1, b_2, ..., b_Q} by {\alpha_1, \alpha_1 + \alpha_2^2, ..., \alpha_1 + \alpha_2^2 + ... + \alpha_Q^2}, which allows unconstrained optimisation of {\alpha_1, ..., \alpha_Q}. 3) The ordinal support vector formulations of Chu and Keerthi [29], including both explicitly and implicitly constrained alternatives (SVOREX and SVORIM). We have also included a linear version of the SVORIM method (considering the linear kernel instead of the Gaussian one) to check how the kernel trick affects the final performance (SVORLin). 4) The KDLOR algorithm presented in [31]. 5) The GPOR method [30] including automatic relevance determination, as proposed by the authors. 6) The reduction from ordinal regression to binary SVM classifiers was also considered (REDSVM). The configuration used was the identity coding matrix, the absolute cost matrix and the standard binary soft-margin SVM, as proposed in [41]. 7) Finally, the ORBoost method with all margins [107] (ORBALL). As proposed by the authors, the total number of ensemble members is set to T = 2000, and normalised sigmoid functions are used as the base classifier, where the smoothness parameter is γ = 4 [107].
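The reparameterisation used by NNPOM (item 2 above) can be written compactly: the unconstrained parameters α are mapped to thresholds through cumulative sums of squares, which keeps the thresholds ordered. A minimal Python sketch of that mapping, not the NNPOM training code:

    import numpy as np

    def thresholds_from_alphas(alphas):
        """NNPOM-style reparameterisation: b_1 = alpha_1 and
        b_q = alpha_1 + alpha_2**2 + ... + alpha_q**2 for q >= 2, so the
        thresholds are non-decreasing whatever the unconstrained alphas are."""
        alphas = np.asarray(alphas, dtype=float)
        increments = np.concatenate(([alphas[0]], alphas[1:] ** 2))
        return np.cumsum(increments)

    print(thresholds_from_alphas([-1.2, 0.8, -0.5, 1.1]))
    # -> [-1.2  -0.56 -0.31  0.9 ], a non-decreasing set of thresholds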
4.1.3 Performance evaluation and model selection
Different measures can be considered for evaluating ordinal regression models [119], [130], [131]. However, the most common ones are the Mean Zero-one Error (MZE) and the Mean Absolute Error (MAE). MZE is the error rate of the classifier:

MZE = \frac{1}{N} \sum_{i=1}^{N} \llbracket y_i^* \neq y_i \rrbracket = 1 - Acc,

where y_i is the true label, y_i^* is the predicted label and Acc is the accuracy of the classifier. MZE values range from 0 to 1. It is related to global performance, but without considering the order. The MAE is the average deviation in absolute value of the predicted rank, O(y_i^*), from the true one, O(y_i) [131]:

MAE = \frac{1}{N} \sum_{i=1}^{N} |O(y_i) - O(y_i^*)|.

MAE values range from 0 to Q - 1 (maximum deviation in number of categories). In this way, MZE considers a zero-one loss for misclassification, while MAE uses an absolute cost. We consider these costs for evaluating the datasets because they are the most common ones (for example, see [29]-[31], just to cite some of them).
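Both measures are straightforward to compute once the predictions are available. A minimal Python sketch follows, assuming that labels are coded directly by their rank 1, ..., Q, so that O(y) is the label itself:

    import numpy as np

    def mze(y_true, y_pred):
        """Mean Zero-one Error: fraction of misclassified patterns (1 - Acc)."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.mean(y_true != y_pred))

    def mae(y_true, y_pred):
        """Mean Absolute Error between the ranks of predicted and true labels
        (labels are assumed to be coded directly as their ranks 1, ..., Q)."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.mean(np.abs(y_true - y_pred)))

    y_true = [1, 2, 2, 4, 5]
    y_pred = [1, 3, 2, 2, 5]
    print(mze(y_true, y_pred), mae(y_true, y_pred))   # -> 0.4 0.6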
Multiple random splits of the datasets were considered. For discretised regression datasets, 20 random splits were done and the numbers of training and test patterns were those suggested in [30]. For real ordinal regression problems, 30 random stratified splits with 75% and 25% of the patterns in the training and test sets were considered, respectively (as suggested in [132]). All the partitions were the same for all the methods, and one model was trained and evaluated for each split. Then, MZE or MAE values were obtained, and the computational times were also gathered.

All SVM classifiers or regressors were run using the implementations available in the libsvm library (version 3.0) [127]. The mnrfit function of Matlab was used for training the POM model. The authors of GPOR, SVOREX, SVORIM, RED-SVM and ORBoost provide publicly available software implementations of their methods3. All the experiments were run using a common Matlab framework, with an Intel(R) Xeon(R) CPU E5405 at 2.00GHz with 8GB of RAM. This framework is available, together with all the datasets and partitions, the individual results and the detailed results of the statistical tests, on the website associated with this paper4.

3. GPOR (http://www.gatsby.ucl.ac.uk/∼chuwei/ordinalregression.html), SVOREX and SVORIM (http://www.gatsby.ucl.ac.uk/∼chuwei/svor.htm), ORBoost (http://www.work.caltech.edu/∼htlin/program/orensemble/) and RED-SVM (http://home.caltech.edu/∼htlin/program/libsvm/)
4. http://www.uco.es/grupos/ayrna/orreview

It is very important to consider a proper model selection process to assure a fair comparison. In this sense, all model hyperparameters were selected by using a nested five-fold cross-validation over the training set. Once the configuration with the lowest cross-validation error was obtained, it was applied to the complete training set and test results were extracted. The criteria for selecting the best configuration were both MAE and MZE performances, depending on the measure we were interested in. The parameter configurations explored are now specified. The Gaussian kernel function was considered for all the kernel methods (SVC1V1, SVC1VA, SVR, CSSVC, SVMOP, SVOREX, SVORIM, REDSVM and KDLOR). The following values were considered for the width of the kernel, σ ∈ {10^{-3}, 10^{-2}, ..., 10^{3}}. The cost parameter C of all SVM methods (including SVORLin) was selected within the values C ∈ {10^{-3}, 10^{-2}, ..., 10^{3}}, and for KDLOR within the values C ∈ {10^{-1}, 10^{0}, 10^{1}}, since, in this case, this parameter presents a different interpretation and, therefore, there is no need to use a larger spectrum of values. An additional parameter u was also needed by KDLOR, which is intended to avoid singularities in the covariance matrices. The values considered were u ∈ {10^{-6}, 10^{-5}, ..., 10^{-2}}. The range of ε for the ε-SVR was ε ∈ {10^{0}, 10^{1}, ..., 10^{3}}. For the neural network algorithms (NNOP and NNPOM), the number of hidden neurons, H, was selected by considering the following values, H ∈ {5, 10, 20, 30, 40}. The sigmoidal activation function was considered for the hidden neurons. For the iRProp+ algorithm, the number of iterations, iter, was also decided by cross-validation, considering the values iter ∈ {50, 100, 150, ..., 500}. The other parameters of iRProp+ were set as in [129]. For ELMOP, higher numbers of hidden neurons are considered, H ∈ {5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100}, given that it relies on sufficiently informative random projections [72]. With regards to the GPOR algorithm, the hyperparameters are determined as part of the optimisation process. The ORBoost process did not need any hyperparameter.

Each pair of algorithms is compared by means of the Wilcoxon test [133]. A level of significance of α = 0.1 was considered, and the corresponding correction for the number of comparisons was also included. As 16 algorithms are compared, the total number of comparisons for each dataset is 120, so the corrected level of significance was α* = 0.1/120 = 0.00083.

4.2 Discretised regression datasets
Tables 6 and 7 show the results obtained for all the algorithms throughout the discretised regression datasets (when considering Q = 5 and Q = 10 bins), and also the ordinal regression ones (analysed in the following subsection). The results include the average and the standard deviation of MZE and MAE, respectively. Additionally, an analysis of the ranking of each method for each dataset was done, where this ranking is 1 for the best method and 16 for the worst one. Then, the average

TABLE 6 Test MZE results for each dataset and method, including the average over all the splits and the standard deviation.

Discretised regression datasets (MeanSD ) Datasets SCV1V1 SVC1VA SVR CSSVC SVMOP NNOP ELMOP POM NNPOM SVOREX SVORIM SVORLin KDLOR GPOR REDSVM ORBALL P5 .546.091 .542.075 .492.090 .533.063 .521.083 .646.109 .652.088 .485.118 .646.085 .508.095 .498.092 .485.099 .512.081 .515.096 .506.080 .562.067 M5 .428.060 .452.061 .431.056 .458.059 .414.063 .400.066 .402.074 .394.065 .431.073 .432.058 .416.067 .403.061 .425.064 .403.054 .400.064 .389.069 H5 .353.034 .374.027 .351.031 .372.025 .351.030 .358.026 .361.027 .355.018 .386.027 .329.022 .325.027 .350.019 .362.027 .310.029 .324.029 .328.025 S5 .107.013 .112.017 .120.016 .107.016 .116.020 .122.018 .112.014 .370.016 .119.015 .113.014 .113.014 .368.017 .113.013 .109.015 .111.018 .101.014 A5 .512.010 .527.008 .530.007 .524.009 .507.010 .520.009 .521.007 .539.005 .518.009 .512.006 .524.008 .541.005 .546.009 .509.007 .524.008 .529.008 B5 .453.064 .539.044 .357.056 .546.032 .431.035 .705.071 .545.046 .271.024 .765.023 .273.025 .276.029 .275.021 .315.048 .267.017 .283.038 .329.028 BB5 .709.021 .690.020 .664.020 .684.016 .693.019 .780.014 .735.026 .639.026 .786.013 .652.024 .646.024 .642.023 .657.024 .585.029 .644.024 .673.018 C5 .421.016 .470.017 .424.035 .463.020 .413.018 .426.017 .461.030 .381.014 .453.036 .390.028 .397.031 .386.012 .429.027 .377.014 .397.038 .458.026 CC5 .380.032 .425.023 .375.018 .421.021 .370.024 .406.017 .457.046 .342.019 .415.021 .343.016 .338.014 .337.016 .387.028 .325.017 .341.019 .434.031 CH5 .519.021 .542.022 .527.020 .546.023 .516.023 .520.016 .547.019 .492.010 .527.032 .496.011 .499.011 .502.017 .510.017 .497.017 .494.016 .515.012 CE5 .564.018 .576.016 .587.015 .587.021 .567.017 .587.023 .595.019 .558.010 .596.023 .555.013 .556.012 .567.016 .568.018 .535.009 .559.013 .592.016 CEE5 .564.016 .585.017 .573.016 .585.018 .560.020 .600.016 .599.016 .584.017 .634.020 .553.018 .555.016 .585.020 .552.016 .539.009 .553.013 .583.011 RD5 8.5000 12.1667 9.6667 12.0000 7.8333 11.6250 12.4167 5.7500 13.3750 5.3750 5.2917 6.8333 9.0833 2.6667 4.5000 8.9167 P10 .775.069 .758.064 .769.091 .758.063 .763.079 .802.075 .794.081 .719.076 .850.073 .783.076 .735.095 .708.078 .756.086 .779.087 .717.102 .760.091 M10 .597.055 .641.062 .634.065 .634.047 .643.047 .621.058 .610.054 .642.059 .653.054 .624.062 .629.054 .648.061 .651.065 .642.056 .618.051 .606.055 H10 .597.025 .636.028 .590.035 .623.019 .589.031 .584.033 .599.029 .592.032 .606.033 .574.033 .557.029 .602.026 .599.038 .542.026 .548.032 .551.029 S10 .206.022 .218.019 .231.019 .215.020 .212.021 .249.033 .238.018 .561.019 .257.025 .235.023 .234.021 .589.022 .230.019 .249.019 .230.021 .232.025 A10 .713.008 .733.012 .739.005 .725.008 .718.008 .725.010 .731.008 .743.005 .719.008 .709.008 .733.008 .748.007 .744.009 .708.005 .728.010 .732.007 B10 .755.033 .766.020 .588.052 .763.020 .704.047 .851.044 .754.044 .472.019 .881.013 .506.020 .494.021 .483.023 .524.047 .485.016 .487.022 .560.023 BB10 .858.010 .860.011 .825.024 .848.009 .861.015 .895.006 .869.014 .805.014 .894.006 .814.017 .811.018 .804.017 .816.019 .777.014 .815.017 .831.012 C10 .643.016 .677.019 .634.032 .677.016 .631.016 .634.011 .672.024 .580.014 .657.035 .598.027 .607.024 .589.017 .644.028 .588.017 .601.027 .654.015 CC10 .603.013 .647.020 .585.032 .651.027 .591.019 .605.021 .645.024 .534.012 .628.021 .549.030 .546.021 .536.017 .611.034 .527.013 .548.020 .642.024 CH10 .732.019 .744.016 .725.013 .744.013 .733.022 .723.015 .746.017 .704.011 .719.022 .702.012 .707.015 .710.012 .717.018 .696.014 .706.018 .713.011 CE10 .770.012 .780.015 .770.009 
.779.011 .768.019 .778.013 .778.012 .753.012 .786.014 .755.014 .756.014 .763.016 .776.014 .736.007 .758.018 .778.010 CEE10 .778.022 .782.014 .767.014 .782.013 .771.013 .785.009 .788.009 .780.008 .813.012 .758.009 .759.012 .780.009 .761.015 .749.010 .762.018 .774.007 RD10 8.0833 12.2500 8.6667 11.3333 8.3333 10.7500 12.0833 6.1667 13.5000 5.3333 5.5000 7.7500 9.2500 4.0833 4.5833 8.3333 RD 8.2917 12.2083 9.1667 11.6667 8.0833 11.1875 12.2500 5.9583 13.4375 5.3542 5.3958 7.2917 9.1667 3.3750 4.5417 8.6250

Ordinal regression datasets (MeanSD ) Datasets SCV1V1 SVC1VA SVR CSSVC SVMOP NNOP ELMOP POM NNPOM SVOREX SVORIM SVORLin KDLOR GPOR REDSVM ORBALL CL .278.126 .278.110 .317.080 .261.136 .367.102 .283.125 .422.218 .383.170 .356.143 .356.129 .383.117 .361.099 .339.155 .394.093 .372.121 .356.129 PA .307.119 .330.107 .337.129 .322.114 .319.091 .237.116 .400.184 .504.154 .344.178 .348.119 .344.122 .344.122 .326.116 .478.178 .326.116 .300.121 SS .359.095 .403.145 .387.113 .395.134 .403.094 .395.116 .444.162 .618.152 .500.131 .374.127 .372.129 .372.133 .392.125 .549.100 .379.132 .364.124 SU .221.121 .279.158 .244.103 .269.156 .267.116 .295.113 .387.129 .651.142 .390.143 .264.114 .269.114 .315.113 .264.108 .356.162 .269.119 .297.098 TA .439.059 .448.065 .396.073 .428.072 .456.057 .415.060 .436.079 .496.077 .453.090 .411.069 .401.072 .460.072 .433.055 .672.041 .399.068 .403.057 NT .035.027 .041.029 .044.024 .038.025 .041.027 .035.022 .057.024 .028.022 .033.025 .034.023 .034.023 .031.024 .026.020 .034.024 .032.023 .042.029 BS .028.015 .033.012 .165.027 .031.013 .034.014 .039.013 .087.023 .094.019 .062.048 .002.006 .002.006 .094.019 .160.031 .034.012 .001.004 .032.016 SW .422.034 .437.035 .435.030 .429.032 .424.035 .423.035 .426.025 .432.030 .456.032 .432.030 .431.027 .431.030 .514.028 .422.031 .429.027 .439.032 CA .006.005 .014.006 .027.006 .014.006 .003.004 .026.013 .159.015 .843.306 .106.022 .012.006 .012.005 .077.010 .047.010 .037.009 .012.004 .012.006 BO .429.042 .431.060 .458.072 .433.060 .447.098 .533.093 .567.135 .656.161 .618.143 .453.062 .451.074 .433.085 .473.093 .422.032 .436.055 .458.091 TO .050.026 .055.028 .068.037 .054.028 .072.028 .062.031 .071.029 .711.026 .062.032 .020.014 .020.014 .730.016 .107.032 .046.022 .023.012 .052.025 EU .360.029 .445.026 .361.027 .435.026 .350.026 .423.036 .428.029 .851.016 .462.042 .364.027 .361.033 .357.023 .372.028 .314.034 .362.035 .380.029 LE .368.025 .372.020 .375.025 .367.019 .367.025 .375.026 .367.022 .377.028 .382.024 .375.022 .380.018 .384.028 .458.031 .388.030 .373.024 .391.029 AU .246.058 .260.058 .321.067 .265.061 .258.042 .389.060 .381.061 .533.194 .550.077 .316.055 .323.069 .408.068 .301.067 .389.073 .317.070 .294.055 WI .352.023 .361.024 .372.019 .362.023 .358.020 .402.018 .397.016 .403.015 .401.022 .373.019 .373.020 .407.015 .350.019 .394.015 .373.020 .334.021 ES .307.037 .328.027 .296.034 .320.026 .289.032 .308.039 .302.039 .295.034 .341.129 .290.026 .284.032 .293.033 .355.034 .287.031 .287.030 .323.022 ER .736.025 .825.026 .750.023 .801.029 .745.023 .708.017 .746.019 .744.021 .727.028 .714.026 .751.021 .750.023 .805.031 .712.027 .751.019 .760.021 ROR 4.0000 8.8824 8.5588 7.1471 6.4706 8.3529 11.4706 12.9118 12.1765 6.6176 7.0588 9.9706 9.7647 8.7059 5.8824 8.0294 The best result for each dataset is in bold face and the second one in italics
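For reference, the ranking rows reported in these tables (RD5, RD10, RD and ROR) can be reproduced from a matrix of average errors by ranking the methods within each dataset (1 for the lowest error) and averaging over datasets. A small Python sketch follows; it is illustrative only and, in particular, the tie-handling rule is an assumption of this sketch:

    import numpy as np
    from scipy.stats import rankdata

    def average_rankings(mean_errors):
        """mean_errors: (n_datasets, n_methods) array of average MZE or MAE.
        For each dataset the methods are ranked (1 = lowest error, ties get
        the average rank), and the ranks are then averaged over datasets."""
        ranks = np.apply_along_axis(rankdata, 1, np.asarray(mean_errors, float))
        return ranks.mean(axis=0)

    errors = np.array([[0.51, 0.49, 0.55],     # toy values for 2 datasets
                       [0.36, 0.33, 0.31]])    # and 3 methods
    print(average_rankings(errors))            # -> [2.5 1.5 2. ]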

values of these rankings were obtained for discretised draws (or absence of statistically significant differences). regression problems of 5 classes (RD5), of 10 classes These results are included in Table 9 for the discretised (RD10), for all discretised regression problems (RD) and regression datasets( Q = 5 and Q = 10). The number for real ordinal regression ones (ROR), as a reference of of wins (w), draws (d) and losses (l) for 15 × 24 = 360 the general performance. For all methods including a comparisons are included (24 datasets and 15 methods to model selection process, Table 6 shows the results when compare each method against). The methods are ordered using MZE as the selection criterion, while the results by the number of statistically significant wins. MAE in Table 7 are obtained using for this selection. In By analysing Tables 6, 7, 8, and specially Table 9, addition, the average rankings of the total computational the best performing ordinal regression methods from time (the sum of cross-validation, training and test times) the different families in the taxonomy can be obtained. are presented in Table 8. For all these Tables, the best From the na¨ıve approaches, SVC1V1 obtains better ac- method for each dataset (or set of datasets) is highlighted curacy or MZE while SVR is better on MAE. However, in bold face and the second one in italics. both improved performances imply worse results for As previously stated, the Wilcoxon test was applied MAE and MZE, respectively. The computational time to check the existence of significant differences. Using of SVR is higher, given that an additional parameter has this test, each pair of methods was compared for each to be cross-validated. In general, SVC1VA and CSSVC dataset and the total number of statistically significant show worse MAE and MZE than SVC1V1. When wins or losses was recorded, together with the number of considering ordinal binary decompositions, SVMOP is IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 15

TABLE 7 Test MAE results for each dataset and method, including the average over all the splits and the standard deviation.

Discretised regression datasets (MeanSD ) Datasets SCV1V1 SVC1VA SVR CSSVC SVMOP NNOP ELMOP POM NNPOM SVOREX SVORIM SVORLin KDLOR GPOR REDSVM ORBALL P5 0.790.13 0.750.18 0.660.14 0.740.15 0.690.16 0.910.21 1.120.26 0.700.20 0.960.21 0.660.12 0.630.11 0.640.14 0.670.19 0.650.13 0.640.14 0.730.10 M5 0.470.08 0.550.08 0.480.09 0.550.10 0.460.08 0.460.07 0.450.09 0.430.08 0.490.07 0.470.07 0.440.10 0.430.07 0.490.10 0.440.07 0.460.09 0.410.08 H5 0.410.05 0.440.04 0.380.04 0.440.04 0.400.04 0.410.03 0.420.04 0.400.02 0.450.05 0.360.03 0.360.03 0.400.02 0.390.05 0.340.04 0.360.03 0.360.03 S5 0.110.01 0.110.01 0.120.02 0.110.02 0.110.02 0.130.02 0.110.01 0.390.02 0.120.02 0.110.01 0.110.02 0.390.02 0.110.01 0.110.01 0.110.02 0.100.01 A5 0.710.02 0.790.02 0.660.01 0.780.02 0.670.01 0.660.01 0.670.01 0.690.01 0.700.01 0.660.01 0.650.01 0.690.01 0.760.02 0.680.01 0.650.01 0.670.01 B5 0.500.08 0.670.09 0.360.05 0.670.07 0.470.04 1.130.19 0.690.10 0.280.03 1.320.19 0.280.03 0.280.03 0.280.03 0.330.06 0.270.02 0.280.03 0.340.03 BB5 1.180.07 1.180.05 0.940.08 1.190.07 1.090.07 1.400.06 1.300.15 0.980.09 1.470.09 0.940.08 0.920.07 0.930.07 0.950.08 0.830.08 0.930.07 0.890.05 C5 0.510.03 0.590.05 0.490.03 0.570.04 0.500.03 0.500.03 0.550.04 0.430.02 0.560.06 0.450.05 0.450.04 0.440.01 0.520.06 0.430.02 0.440.04 0.520.04 CC5 0.430.05 0.500.03 0.400.02 0.500.03 0.420.04 0.490.20 0.530.06 0.380.03 0.480.03 0.370.02 0.370.01 0.370.02 0.440.04 0.350.02 0.370.02 0.480.05 CH5 0.660.04 0.750.04 0.630.03 0.750.04 0.640.03 0.650.03 0.700.03 0.590.01 0.670.05 0.600.02 0.600.02 0.600.02 0.650.04 0.620.03 0.590.02 0.640.02 CE5 0.790.03 0.860.04 0.760.03 0.870.05 0.770.03 0.810.03 0.800.03 0.730.02 0.870.05 0.740.03 0.730.02 0.730.02 0.790.04 0.740.02 0.730.02 0.790.03 CEE5 0.810.03 0.880.08 0.740.02 0.860.04 0.790.04 0.870.04 0.830.03 0.780.03 0.930.04 0.740.03 0.720.03 0.780.03 0.740.03 0.740.02 0.720.03 0.730.01 RD5 10.7500 13.9167 7.3333 12.9167 8.7500 11.8333 12.2500 6.7500 14.3333 5.6667 3.2500 5.8333 8.9167 3.5833 3.5000 6.4167 P10 1.880.40 1.880.35 1.430.22 1.820.43 1.460.30 1.770.37 2.080.53 1.540.29 2.200.80 1.440.23 1.390.19 1.470.24 1.430.26 1.550.24 1.320.19 1.530.22 M10 1.050.14 1.140.15 1.030.15 1.110.16 0.990.09 0.940.16 0.920.12 0.920.11 1.120.15 0.960.12 0.920.13 0.910.11 1.090.20 0.970.11 0.920.11 0.910.12 H10 0.890.08 1.070.09 0.800.06 1.030.09 0.880.06 0.870.07 0.900.06 0.880.05 0.950.11 0.780.06 0.770.06 0.890.05 0.840.06 0.760.05 0.760.07 0.760.06 S10 0.220.02 0.220.02 0.240.02 0.230.02 0.220.02 0.270.04 0.250.02 0.820.03 0.270.03 0.230.02 0.240.02 0.800.04 0.240.02 0.250.02 0.230.02 0.240.02 A10 1.560.03 1.730.06 1.380.02 1.740.03 1.430.03 1.380.02 1.390.02 1.450.01 1.490.02 1.470.02 1.360.02 1.440.02 1.640.04 1.490.02 1.360.02 1.400.02 B10 1.470.28 1.620.09 0.720.11 1.640.16 0.980.06 2.240.58 1.460.29 0.540.03 2.870.43 0.590.05 0.570.03 0.560.03 0.630.10 0.570.03 0.560.05 0.670.04 BB10 2.510.10 2.600.18 1.910.12 2.580.14 2.270.19 2.910.10 2.560.30 1.980.14 3.000.13 1.920.14 1.880.13 1.920.17 1.920.12 1.830.18 1.880.13 1.830.10 C10 1.150.06 1.300.11 0.990.05 1.280.13 1.060.05 1.060.06 1.180.06 0.870.03 1.160.10 0.930.05 0.900.03 0.880.03 1.080.10 0.920.04 0.900.04 1.040.06 CC10 0.940.05 1.140.06 0.830.05 1.120.07 0.890.05 0.920.05 1.040.10 0.750.04 1.050.12 0.790.06 0.760.05 0.750.03 0.970.10 0.740.03 0.760.03 0.960.09 CH10 1.460.06 1.670.08 1.270.04 1.630.07 1.370.06 1.340.06 1.430.05 1.230.04 1.400.11 1.310.05 1.250.04 1.240.04 1.410.10 1.300.06 1.250.04 1.310.03 CE10 1.770.09 1.910.12 
1.550.04 1.920.17 1.640.06 1.720.07 1.650.06 1.540.06 1.860.11 1.650.07 1.510.04 1.530.06 1.750.12 1.610.06 1.500.05 1.640.07 CEE10 1.760.05 1.920.06 1.530.05 1.890.06 1.640.06 1.770.07 1.720.06 1.640.04 2.000.14 1.620.08 1.500.04 1.630.05 1.610.09 1.640.08 1.510.06 1.510.03 RD10 11.1667 14.0833 6.0833 13.5833 8.2500 10.3333 10.8333 6.2500 14.1667 6.9167 3.8333 5.7500 9.1667 6.4167 3.0833 6.0833 RD 10.9583 14.0000 6.7083 13.2500 8.5000 11.0833 11.5417 6.5000 14.2500 6.2917 3.5417 5.7917 9.0417 5.0000 3.2917 6.2500

Ordinal regression datasets (MeanSD ) Datasets SCV1V1 SVC1VA SVR CSSVC SVMOP NNOP ELMOP POM NNPOM SVOREX SVORIM SVORLin KDLOR GPOR REDSVM ORBALL CL 0.520.22 0.460.19 0.380.15 0.460.23 0.500.18 0.460.25 0.520.28 0.530.25 0.480.23 0.480.13 0.520.17 0.460.16 0.520.22 0.510.17 0.460.16 0.420.18 PA 0.300.14 0.320.10 0.320.12 0.340.13 0.290.13 0.240.11 0.400.14 0.590.20 0.370.22 0.330.12 0.340.12 0.340.11 0.340.13 0.490.19 0.330.11 0.300.12 SS 0.380.14 0.460.16 0.370.12 0.430.15 0.410.12 0.420.13 0.480.18 0.810.25 0.540.16 0.370.15 0.380.13 0.380.14 0.370.15 0.630.15 0.350.15 0.360.12 SU 0.220.13 0.280.18 0.270.10 0.280.17 0.270.12 0.280.10 0.420.14 0.830.23 0.420.14 0.260.12 0.270.11 0.320.11 0.250.13 0.360.16 0.260.12 0.310.11 TA 0.540.10 0.500.09 0.480.07 0.520.11 0.500.08 0.540.11 0.620.12 0.630.12 0.580.13 0.470.06 0.470.07 0.550.08 0.460.07 0.860.16 0.460.06 0.500.09 NT 0.040.03 0.040.02 0.050.03 0.040.02 0.040.03 0.040.02 0.050.02 0.030.02 0.030.03 0.030.02 0.030.02 0.030.02 0.020.02 0.030.02 0.030.02 0.040.03 BS 0.030.01 0.030.01 0.170.03 0.030.01 0.030.02 0.040.01 0.090.03 0.110.02 0.110.19 0.000.01 0.000.01 0.110.02 0.160.02 0.030.01 0.000.00 0.030.02 SW 0.440.04 0.480.04 0.450.03 0.480.04 0.450.04 0.450.04 0.450.03 0.450.03 0.480.04 0.450.03 0.450.03 0.450.03 0.580.03 0.440.03 0.450.03 0.460.04 CA 0.010.01 0.010.01 0.030.01 0.020.01 0.000.00 0.030.01 0.180.01 1.450.55 0.120.03 0.010.01 0.010.01 0.080.01 0.050.01 0.040.01 0.010.00 0.010.01 BO 0.640.10 0.600.10 0.590.10 0.570.11 0.590.12 0.670.16 0.650.17 0.950.32 0.800.21 0.620.09 0.610.09 0.600.10 0.630.08 0.620.06 0.610.08 0.530.11 TO 0.050.02 0.060.03 0.060.03 0.060.03 0.070.03 0.060.03 0.080.03 0.980.04 0.060.03 0.020.01 0.020.01 0.960.07 0.110.03 0.050.02 0.020.01 0.050.02 EU 0.410.04 0.510.04 0.400.03 0.510.04 0.400.03 0.480.04 0.530.05 1.940.25 0.570.08 0.400.04 0.390.03 0.380.03 0.400.03 0.330.04 0.400.04 0.410.04 LE 0.400.03 0.410.03 0.410.02 0.410.03 0.400.03 0.410.03 0.410.03 0.410.03 0.420.03 0.410.02 0.410.02 0.410.03 0.510.04 0.420.03 0.410.02 0.430.03 AU 0.360.10 0.390.10 0.390.08 0.390.10 0.370.08 0.500.09 0.540.09 0.950.69 0.850.15 0.420.09 0.390.08 0.470.09 0.390.08 0.590.13 0.400.09 0.350.08 WI 0.410.02 0.420.02 0.420.02 0.420.02 0.410.02 0.440.02 0.430.02 0.440.02 0.450.03 0.420.02 0.420.02 0.440.02 0.390.02 0.420.02 0.420.02 0.370.02 ES 0.320.04 0.350.04 0.310.04 0.340.03 0.300.04 0.320.04 0.320.03 0.310.04 0.460.63 0.300.04 0.300.03 0.310.03 0.370.04 0.300.03 0.310.04 0.340.03 ER 1.290.07 2.020.12 1.220.05 1.910.15 1.240.04 1.180.02 1.240.04 1.220.05 1.260.06 1.210.06 1.210.04 1.220.04 1.780.10 1.240.05 1.220.04 1.250.04 ROR 6.4412 9.2941 7.0588 8.6471 5.8824 8.4706 12.5000 12.5000 13.1765 5.4118 6.4706 9.2647 9.2941 9.6471 4.8235 7.1176 The best result for each dataset is in bold face and the second one in italics

the best performing one, improving MAE and MZE how the methods perform in more realistic situations, with respect to NNOP and ELMOP methods. ELMOP is where the underlying variable is really unobservable slightly better than NNOP in MAE, but the opposite and traditional classification problems appear (e.g. class happens when observing MZE. However, ELMOP is imbalance). Tables 6, 7 and 8 show the complete set of clearly the fastest ordinal binary decomposition method. results obtained, and Table 10 includes the Wilcoxon From the threshold methods, it is clear that GPOR is tests. Its format is similar to that in Table 9, but the the best performing one in MZE, REDSVM obtains the number of comparisons is now 15×17 = 255. In general, lowest MAE (although with higher computational cost), similar conclusions can now be obtained, but there are and the POM is the fastest one, followed by SVOREX. some differences. SVC1V1 performance regarding MZE SVORLin obtains very good MAE results, close to that is now better, when compared to the rest of methods. of SVORIM. Although one could have expected a lower The problems associated with these datasets (mainly computational time for SVORLin, the pressure of obtain- uneven distribution ratios) are generally better solved ing a low error rate using a linear model when C = 1000 by using this kind of decomposition. With this kind of makes convergence very difficult. datasets, ELMOP is worse than NNOP for MZE and MAE, although its computational time is the lowest 4.3 Ordinal regression datasets in binary decompositions. The best option from binary This section presents the study performed on the or- decompositions is SVMOP.When analysing threshold dinal regression datasets. The objective is to analyse methods, our experiments show that the performance of IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 16

TABLE 8 TABLE 10 Average ranking of the total computational time for each Wilcoxon tests over ordinal regression datasets. method. MZE MAE Time Average computational time Method w d l Method w d l Method w d l Method RD5 RD10 RD ROR SVOREX 86 159 10 REDSVM 88 160 7 POM 241 6 8 SCV1V1 8.0833 8.0833 8.0833 6.1765 SVORIM 79 162 14 SVOREX 88 159 8 ELMOP 219 2 34 SVC1VA 6.8333 6.8333 6.8333 7.7647 REDSVM 76 166 13 SVORIM 85 163 7 SVORLin 171 11 73 SVR 12.0833 11.7500 11.9167 11.8235 SCV1V1 75 170 10 ORBALL 74 145 36 SVOREX 154 10 91 CSSVC 6.3333 6.1667 6.2500 8.1765 SVMOP 74 166 15 SVMOP 68 176 11 SCV1V1 153 21 81 SVMOP 9.5000 10.0000 9.7500 9.5294 ORBALL 63 163 29 SCV1V1 64 169 22 SVORIM 153 11 91 NNOP 14.8333 9.3333 12.0833 12.4118 GPOR 59 114 82 SVR 62 163 30 ORBALL 148 9 98 ELMOP 2.8333 2.7500 2.7917 3.0588 SVR 51 166 38 GPOR 55 126 74 SVC1VA 125 30 100 POM 1.0000 1.2500 1.1250 1.7059 CSSVC 50 167 38 KDLOR 50 108 97 CSSVC 114 47 94 NNPOM 15.9167 15.9167 15.9167 15.3529 SVC1VA 47 164 44 NNOP 45 159 51 REDSVM 104 20 131 SVOREX 4.0000 4.5000 4.2500 6.6471 NNOP 43 160 52 CSSVC 42 162 51 SVMOP 97 17 141 SVORIM 4.9167 6.7500 5.8333 6.7059 KDLOR 39 120 96 SVC1VA 41 162 52 KDLOR 72 14 169 SVORLin 6.6667 9.0000 7.8333 5.6471 SVORLin 34 155 66 SVORLin 40 152 63 SVR 66 7 182 KDLOR 12.5000 12.9167 12.7083 11.1765 ELMOP 23 145 87 ELMOP 22 134 99 NNOP 57 9 189 GPOR 13.9167 12.8333 13.3750 13.5882 NNPOM 22 131 102 POM 21 94 140 GPOR 34 28 193 REDSVM 8.0000 9.8333 8.9167 9.2353 POM 13 104 138 NNPOM 19 120 116 NNPOM 11 0 244 ORBALL 8.5833 8.0833 8.3333 7.0000 Best method of each family in the taxonomy is highlighted in bold face The best result is in bold face and the second one in italics
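A sketch of the pairwise comparison protocol behind Tables 9 and 10 is given below: a Wilcoxon signed-rank test over the per-split errors of two methods, with the significance level corrected for the 120 comparisons described in Section 4.1.3. Deciding the direction of a significant difference by the lower average error is a simplification made for this sketch:

    import numpy as np
    from scipy.stats import wilcoxon

    def compare_pair(errors_a, errors_b, n_methods=16, alpha=0.1):
        """Wilcoxon signed-rank test between two methods on the same splits.
        The significance level is divided by the number of pairwise
        comparisons (C(16, 2) = 120), giving alpha* = 0.1 / 120 = 0.00083."""
        corrected_alpha = alpha / (n_methods * (n_methods - 1) / 2)
        p_value = wilcoxon(errors_a, errors_b).pvalue
        if p_value >= corrected_alpha:
            return "draw"
        return "win" if np.mean(errors_a) < np.mean(errors_b) else "loss"

    rng = np.random.default_rng(1)            # synthetic per-split MAE values
    a = rng.normal(0.40, 0.02, size=30)
    b = rng.normal(0.46, 0.02, size=30)
    print(compare_pair(a, b))                 # most likely "win" for method a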

TABLE 9 First of all, POM is a linear model and, as such, it is Wilcoxon tests over discretised regression datasets very fast to train (with no associated hyperparameters), (Q = 5 and Q = 10). but its performance is significantly low (except for MZE in discretised regression datasets). This fact is important, MZE MAE Time given that, excluding the machine learning area, the Method w d l Method w d l Method w d l GPOR 211 145 4 REDSVM 198 161 1 POM 357 0 3 POM and its variants are the most widely used ordinal SVOREX 149 205 6 SVORIM 195 162 3 ELMOP 314 11 35 regression methods [25], [81], [84], [88]. SVORIM 137 208 15 GPOR 169 164 27 SVOREX 265 19 76 REDSVM 135 213 12 SVORLin 159 137 64 SVORIM 218 38 104 When dealing with large datasets, we conclude that POM 118 169 73 SVOREX 156 177 27 SVC1VA 180 75 105 SVORLin 100 185 75 POM 152 136 72 SVORLin 171 55 134 POM is a good option, given the low computational cost SVMOP 76 219 65 SVR 146 165 49 CSSVC 169 95 96 needed. The results achieved for MZE and MAE are SCV1V1 66 219 75 ORBALL 130 165 65 SCV1V1 156 52 152 KDLOR 65 225 70 SVMOP 96 154 110 ORBALL 152 70 138 worse than those of other alternatives, but they are good ORBALL 65 193 102 KDLOR 75 186 99 REDSVM 134 66 160 enough when computational time is a priority. SVORLin SVR 59 213 88 SCV1V1 55 145 160 SVMOP 116 77 167 NNOP 35 195 130 ELMOP 54 135 171 NNOP 93 2 265 can also be a good option, although a narrower range CSSVC 32 192 136 NNOP 52 159 149 SVR 79 42 239 of cross-validation for C should be selected. Neural SVC1VA 24 201 135 CSSVC 19 110 231 KDLOR 63 54 243 ELMOP 23 176 161 SVC1VA 17 106 237 GPOR 44 76 240 networks (NNPOM and NNOP) are generally beaten NNPOM 20 172 168 NNPOM 16 120 224 NNPOM 2 2 356 by their SVM counterparts, both in MZE and MAE. Best method of each family in the taxonomy is highlighted in bold face Moreover, the training time for these methods and GPOR is generally the highest. Our study shows that the na¨ıve approaches can obtain GPOR is much lower in both MZE and MAE. SVM competitive performance and be difficult to beat for based threshold models are the best performing ones some datasets. SVC1V1 achieves very good MZE re- for both measures, SVOREX achieving the best results sults for real ordinal regression datasets. However, SVM in MZE and very close to the best performing method, threshold models improves MAE and MZE results, REDSVM, in MAE. In this case, the gap of performance as well as being simpler models. Indeed, all threshold between SVORIM and SVORLin is higher. Discarding models allow the visualisation of predicted projections POM and ELMOP, the lowest computationally time is together with the thresholds. This can be used for associated to SVORLin. For ordinal regression datasets, various purposes, from ranking patterns to trying to the results of ORBALL are better with respect to the rest discover uncertain predictions (projections very close to of methods than when consider the discretised datasets. class thresholds). This kind of analysis is generally more With respect to REDSVM, its computational cost is high difficult with nominal models, such as SVC1V1. In gen- when compared to SVOR, ORBoost and SVC methods. eral, the SVC1VA alternative has been shown to achieve worse results than SVC1V1 for the three measures evalu- 4.4 Discussion ated (as previously shown in other studies [59]). 
CSSVC Several ordinal regression methods (see Tables 9 and results are a bit better than those of SVC1VA, but still 10) can be emphasised according to their error (GPOR, far from SVC1V1. SVOREX, SVORIM, REDSVM and SVMOP) or their com- Binary decomposition approaches are shown to be putational time (POM, SVORLin or ELMOP). However, good alternatives, especially SVMOP. However, as dis- there are many factors that can influence the choice of cussed in Subsection 3.2, their theoretical analysis is the method, and all of them should be considered. more difficult, and it is necessary to decide how to IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 17 combine different binary predictions. SVORIM can be considered the best threshold models, Of all the threshold models analysed, SVOREX and showing competitive accuracy, MAE and time values. SVORIM are the best. The computational time required Finally, there is a website (http://www.uco.es/grupos/ by SVOREX is slightly lower, and it always achieves ayrna/orreview) collecting the implementations of the better results than SVORIM, except for MAE in dis- methods in this survey, the detailed results, the datasets cretised regression datasets. ORBALL shows a worse and the corresponding statistical analysis. performance than SVOR methods. REDSVM is very competitive, but with a higher computational cost. REFERENCES When comparing discretised regression datasets and real ordinal regression ones, some performance differ- [1] A. K. Jain, R. P. Duin, and J. Mao, “Statistical pattern recognition: A review,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 1, ences can be highlighted. For example, GPOR perfor- pp. 4–37, 2000. mance is seriously affected when dealing with real ordi- [2] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning nal classification datasets. In general, SVM and ORBALL Tools and Techniques, 2nd ed., ser. Data Management Systems. Morgan Kaufmann (Elsevier), 2005. methods are more robust in the derived problems that [3] V. Cherkassky and F. M. Mulier, Learning from Data: Concepts, can appear with these datasets. This is an important Theory, and Methods. Wiley-Interscience, 2007. point, because many of the ordinal regression works in [4] J. A. Anderson, “Regression and ordered categorical variables,” Journal of the Royal Statistical Society. Series B (Methodological), the literature make use of discretised regression sets, vol. 46, no. 1, pp. 1–30, 1984. hiding some possible difficulties of the methods when [5] R. Bender and U. Grouven, “Ordinal logistic regression in med- dealing with problems such as imbalanced distributions. ical research,” J. R. Coll. Physicians Lond., vol. 31, no. 5, pp. 546– 551, 1997. When real ordinal regression datasets are considered, [6] ——, “Using binary logistic regression models for ordinal data POM and GPOR performances decrease (both in MZE with non-proportional odds,” J. Clin. Epidemiol., vol. 51, no. 10, and MAE) drastically. Both models have one feature in pp. 809–816, 1998. [7] W. M. Jang, S. J. Eun, C.-E. Lee, and Y. Kim, “Effect of repeated common. They assume that their perturbation terms fol- public releases on cesarean section rates,” J. Prev. Med. Pub. low certain distribution functions. These distributional Health, vol. 44, no. 1, pp. 2–8, 2011. assumptions perform correctly in discretised regression [8] O. M. Doyle, E. Westman, A. F. Marquand, P. Mecocci, B. Vellas, M. Tsolaki, I. Kłoszewska, H. Soininen, S. Lovestone, S. C. 
datasets, but not for real ordinal regression datasets. Williams et al., “Predicting progression of alzheimer’s disease using ordinal regression,” PloS one, vol. 9, no. 8, p. e105542, 2014. [9]M.P erez-Ortiz,´ P. A. Gutierrez,´ C. Garc´ıa-Alonso, L. Salvador- 5 CONCLUSIONS Carulla, J. A. Salinas-Perez,´ and C. Hervas-Mart´ ´ınez, “Ordinal classification of depression spatial hot-spots of prevalence,” in This paper offers an exhaustive survey of the ordinal Proceedings of the 11th International Conference on Intelligent Sys- regression methods proposed in the literature. The prob- tems Design and Applications (ISDA), nov. 2011, pp. 1170 –1175. lem setting has been clearly established and differenti- [10] J. S. Cardoso, J. F. P. da Costa, and M. Cardoso, “Modelling ordinal relations with SVMs: an application to objective aesthetic ated from other ranking topics. After this, a taxonomy evaluation of breast cancer conservative treatment,” Neural Net- of ordinal regression methods is proposed, dividing works, vol. 18, no. 5-6, pp. 808–817, 2005. them into three main groups: na¨ıve approaches, binary [11]M.P erez-Ortiz,´ M. Cruz-Ram´ırez, M. Ayllon-Ter´ an,´ N. Heaton, R. Ciria, and C. Hervas-Mart´ ´ınez, “An organ allocation system decompositions and threshold models. Furthermore, the for liver transplantation based on ordinal regression,” Applied most important methods of each family (a total of 16 Soft Computing Journal, vol. 14, pp. 88–98, 2014. methods) are empirically evaluated in two kinds of [12] K.-Y. Chang, C.-S. Chen, and Y.-P. Hung, “Ordinal hyperplanes ranker with cost sensitivities for age estimation,” in 2011 IEEE datasets, 24 discretised regression datasets and 17 real Conference on Computer Vision and Pattern Recognition (CVPR), ordinal regression ones. june 2011, pp. 585 –592. The taxonomy proposed can help the researcher or [13] J. W. Yoon, S. J. Roberts, M. Dyson, and J. Q. Gan, “Bayesian inference for an adaptive ordered : An application the practitioner choose the best method for a concrete to brain computer interfacing,” Neural Networks, vol. 24, no. 7, problem, considering also the empirical results herein pp. 726–734, 2011. provided. It can also assist researchers in developing [14] Y. Kwon, I. Han, and K. Lee, “Ordinal pairwise partitioning (opp) approach to neural networks training in bond rating,” and proposing new methods, providing a way to classify Intelligent Systems in Accounting Finance and Management, vol. 6, them and to select the most similar ones. The results no. 1, pp. 23–40, 1997. presented in this paper confirm that there is no single [15] K.-j. Kim and H. Ahn, “A corporate credit rating model us- ing multi-class support vector machines with an ordinal pair- method which performs the best in all possible datasets wise partitioning approach,” Computers and Operations Research, and problem requirements. However, these results can vol. 39, no. 8, pp. 1800–1811, 2012. be used to discard some of the methods, especially those [16] H. Dikkers and L. Rothkrantz, “Support vector machines in ordinal classification: An application to corporate credit scoring,” clearly presenting worse performance or too high com- Neural Network World, vol. 15, no. 6, pp. 491–507, 2005. putational time. We would like to stress certain methods: [17] F. Fernandez-Navarro,´ P. Campoy-Munoz,˜ M.-D. La Paz-Mar´ın, 1) SVC1V1 as representative of the na¨ıve approaches, C. Hervas-Mart´ ´ınez, and X. 
Yao, “Addressing the EU sovereign ratings using an ordinal regression approach,” IEEE Transactions achieving an especially good MZE because of the re- on Cybernetics, vol. 43, no. 6, pp. 2228–2240, 2013. cursive partitioning of all pairs of classes; 2) SVMOP [18] M. J. Mathieson, “Ordinal models for neural networks,” in achieves the best results from ordinal binary decompo- Proceedings of the Third International Conference on Neural Networks in the Capital Markets, ser. Neural Networks in Financial Engi- sition methods; 3) ELMOP or POM are a good option if neering, J. M. A.-P. N. Refenes, Y. Abu-Mostafa and A. Weigend, the computational cost is a priority; and 4) SVOREX and Eds. World Scientific, 1996, pp. 523–536. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 18

[19] M. Kim and V. Pavlovic, “Structured output ordinal regression [40] T.-Y. Liu, Learning to Rank for Information Retrieval. Springer- for dynamic facial emotion intensity prediction,” in Proceedings Verlag, 2011. of the 11th European Conference on Computer Vision (ECCV 2010), [41] H.-T. Lin and L. Li, “Reduction from cost-sensitive ordinal rank- Part III, ser. Lecture Notes in Computer Science, K. Daniilidis, ing to weighted binary classification,” Neural Comput., vol. 24, P. Maragos, and N. Paragios, Eds., vol. 6313. Springer Berlin no. 5, pp. 1329–1367, 2012. Heidelberg, 2010, pp. 649–662. [42] Q. Hu, W. Pan, L. Zhang, D. Zhang, Y. Song, M. Guo, and [20] O. Rudovic, V. Pavlovic, and M. Pantic, “Multi-output laplacian D. Yu, “Feature selection for monotonic classification,” IEEE dynamic ordinal regression for facial expression recognition and Trans. Fuzzy Syst., vol. 20, no. 1, pp. 69–81, 2012. intensity estimation,” in Proceedings of the 2012 IEEE Conference on [43] Q. Hu, X. Che, L. Zhang, D. Zhang, M. Guo, and D. Yu, “Rank Computer Vision and Pattern Recognition (CVPR), 2012, pp. 2634– entropy-based decision trees for monotonic classification,” IEEE 2641. Trans. Knowl. Data Eng., vol. 24, no. 11, pp. 2052 –2064, 2012. [21] ——, “Kernel conditional ordinal random fields for temporal [44] W. Kotłowski, K. Dembczynski,´ S. Greco, and R. Słowinski,´ segmentation of facial action units,” in Proceedings of the European “Stochastic dominance-based rough set model for ordinal clas- Conference on Computer Vision Computer Vision (ECCV 2012). sification,” Inf. Sciences, vol. 178, no. 21, pp. 4019–4037, 2008. Part II, ser. Lecture Notes in Computer Science, A. Fusiello, [45] W. Kotlowski and R. Slowinski,´ “On nonparametric ordinal V. Murino, and R. Cucchiara, Eds., vol. 7584. Springer Berlin classification with monotonicity constraints,” IEEE Trans. Knowl. Heidelberg, 2012, pp. 260–269. Data Eng., vol. 25, no. 11, pp. 2576–2589, 2013. [22] H. Yan, “Cost-sensitive ordinal regression for fully automatic [46] R. Potharst and A. J. Feelders, “Classification trees for prob- facial beauty assessment,” Neurocomputing, vol. 129, pp. 334–342, lems with monotonicity constraints,” ACM SIGKDD Explorations 2014. Newsletter, vol. 4, no. 1, pp. 1–10, jun 2002. [23] Q. Tian, S. Chen, and X. Tan, “Comparative study among three strategies of incorporating spatial structures to ordinal image [47] C.-W. Seah, I. Tsang, and Y.-S. Ong, “Transfer ordinal label regression,” Neurocomputing, vol. 136, no. 0, pp. 152 – 161, 2014. learning,” IEEE Trans. on Neural Networks and Learning Systems, [24] P. A. Gutierrez,´ S. Salcedo-Sanz, C. Hervas-Mart´ ´ınez, L. Carro- vol. 24, no. 11, pp. 1863–1876, 2013. Calvo, J. Sanchez-Monedero,´ and L. Prieto, “Ordinal and nom- [48] J. Alonso, J. J. Coz, J. D´ıez, O. Luaces, and A. Bahamonde, inal classification of wind speed from synoptic pressure pat- “Learning to predict one or more ranks in ordinal regression terns,” Engineering Applications of Artificial Intelligence, vol. 26, tasks,” in Proceedings of the 2008 European Conference on Machine no. 3, pp. 1008–1015, 2013. Learning and Knowledge Discovery in Databases (ECML), Part I, ser. [25] A. S. Fullerton and J. Xu, “The proportional odds with partial Lecture Notes in Computer Science, W. Daelemans, B. Goethals, proportionality constraints model for ordinal response vari- and K. Morik, Eds., vol. 5211. Springer Berlin Heidelberg, 2008, ables,” Soc. Sci. Res., vol. 41, no. 1, pp. 182–198, 2012. pp. 39–54. [26] S. Baccianella, A. 
Pedro Antonio Gutiérrez was born in Córdoba, Spain. He received the B.S. degree in Computer Science from the University of Sevilla, Spain, in 2006, and the Ph.D. degree in Computer Science and Artificial Intelligence from the University of Granada, Spain, in 2009. He is currently an Assistant Professor in the Department of Computer Science and Numerical Analysis, University of Córdoba, Spain. His current research interests include pattern recognition, evolutionary computation, and their applications.

María Pérez-Ortiz was born in Córdoba, Spain, in 1990. She received the B.S. degree in computer science in 2011 and the M.Sc. degree in intelligent systems in 2013 from the University of Córdoba, Spain, where she is currently pursuing the Ph.D. degree in computer science and artificial intelligence in the Department of Computer Science and Numerical Analysis. Her current interests include a wide range of topics concerning machine learning and pattern recognition.

Javier Sánchez-Monedero was born in Córdoba, Spain. He received the B.S. degree in Computer Science from the University of Granada, Spain, in 2008, and the M.S. degree in Multimedia Systems from the University of Granada in 2009, where he obtained the Ph.D. degree in Information and Communication Technologies in 2013. He is working as a researcher with the Department of Computer Science and Numerical Analysis at the University of Córdoba. His current research interests include computational intelligence and distributed systems.

Francisco Fernández-Navarro (M'13) was born in Córdoba, Spain, in 1984. He received the M.Sc. degree in computer science from the University of Córdoba, Spain, in 2008, the M.Sc. degree in artificial intelligence from the University of Málaga, Spain, in 2009, and the Ph.D. degree in computer science and artificial intelligence from the University of Málaga in 2011. He is currently a Research Fellow in computational management with the European Space Agency, Noordwijk, The Netherlands. His current research interests include neural networks, ordinal regression, imbalanced classification, and hybrid algorithms.

César Hervás-Martínez was born in Cuenca, Spain. He received the B.S. degree in Statistics and Operations Research from the "Universidad Complutense", Madrid, Spain, in 1978, and the Ph.D. degree in Mathematics from the University of Seville, Spain, in 1986. He is currently a Professor of Computer Science and Artificial Intelligence in the Department of Computer Science and Numerical Analysis, University of Córdoba, and an Associate Professor in the Department of Quantitative Methods, School of Economics, University of Córdoba. His current research interests include neural networks, evolutionary computation, and the modelling of natural systems.