Received September 14, 2016, accepted October 1, 2016, date of publication October 26, 2016, date of current version November 28, 2016. Digital Object Identifier 10.1109/ACCESS.2016.2619719 Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study ADNAN AMIN1, SAJID ANWAR1, AWAIS ADNAN1, MUHAMMAD NAWAZ1, NEWTON HOWARD2, JUNAID QADIR3, (Senior Member, IEEE), AHMAD HAWALAH4, AND AMIR HUSSAIN5, (Senior Member, IEEE) 1Center for Excellence in Information Technology, Institute of Management Sciences, Peshawar 25000, Pakistan 2Nuffield Department of Surgical Sciences, University of Oxford, Oxford, OX3 9DU, U.K. 3Information Technology University, Arfa Software Technology Park, Lahore 54000, Pakistan 4College of Computer Science and Engineering, Taibah University, Medina 344, Saudi Arabia 5Division of Computing Science and Maths, University of Stirling, Stirling, FK9 4LA, U.K. Corresponding author: A. Amin (
[email protected]) The work of A. Hussain was supported by the U.K. Engineering and Physical Sciences Research Council under Grant EP/M026981/1. ABSTRACT Customer retention is a major issue for various service-based organizations particularly telecom industry, wherein predictive models for observing the behavior of customers are one of the great instruments in customer retention process and inferring the future behavior of the customers. However, the performances of predictive models are greatly affected when the real-world data set is highly imbalanced. A data set is called imbalanced if the samples size from one class is very much smaller or larger than the other classes. The most commonly used technique is over/under sampling for handling the class-imbalance problem (CIP) in various domains.