Journal of Machine Learning Research 20 (2019) 1-7
Submitted 1/19; Revised 4/19; Published 5/19

PyOD: A Python Toolbox for Scalable Outlier Detection

Yue Zhao
[email protected]
Carnegie Mellon University∗
Pittsburgh, PA 15213, USA

Zain Nasrullah
[email protected]
University of Toronto
Toronto, ON M5S 2E4, Canada

Zheng Li jk
[email protected]
Northeastern University Toronto
Toronto, ON M5X 1E2, Canada

Editor: Alexandre Gramfort
arXiv:1901.01588v2 [cs.LG] 10 Jun 2019

Abstract

PyOD is an open-source Python toolbox for performing scalable outlier detection on multivariate data. Uniquely, it provides access to a wide range of outlier detection algorithms, including established outlier ensembles and more recent neural network-based approaches, under a single, well-documented API designed for use by both practitioners and researchers. With robustness and scalability in mind, best practices such as unit testing, continuous integration, code coverage, maintainability checks, interactive examples and parallelization are emphasized as core components in the toolbox's development. PyOD is compatible with both Python 2 and 3 and can be installed through the Python Package Index (PyPI) or https://github.com/yzhao062/pyod.

Keywords: anomaly detection, outlier detection, outlier ensembles, neural networks, machine learning, data mining, Python

1. Introduction

Outlier detection, also known as anomaly detection, refers to the identification of rare items, events or observations that differ from the general distribution of a population. Since the ground truth is often absent in such tasks, dedicated outlier detection algorithms are extremely valuable in fields that process large amounts of unlabelled data and require a means to reliably perform pattern recognition (Akoglu et al., 2012).
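To make the definition above concrete, the following minimal sketch flags values that deviate strongly from the bulk of an unlabelled sample, using a robust z-score built from the median and the median absolute deviation (MAD). It is an illustration only and does not use PyOD's API; the function names, the sample readings and the 3.5 cutoff are assumptions chosen for the example.

```python
import statistics


def mad_outlier_scores(values):
    """Return one robust z-score per value, based on the median and the MAD."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0:
        # Degenerate case: at least half of the values equal the median,
        # so any value that differs from it gets an infinite score.
        return [0.0 if d == 0 else float("inf") for d in abs_dev]
    # 0.6745 rescales the MAD so that scores are comparable to ordinary
    # z-scores when the bulk of the data is roughly normal.
    return [0.6745 * d / mad for d in abs_dev]


def flag_outliers(values, threshold=3.5):
    """Flag values whose robust z-score exceeds the (assumed) threshold."""
    return [score > threshold for score in mad_outlier_scores(values)]


# Hypothetical unlabelled readings with one obviously unusual value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
flags = flag_outliers(readings)
outliers = [r for r, is_outlier in zip(readings, flags) if is_outlier]
print(outliers)
```

No labels are used at any point: the score is derived solely from how far each value lies from the centre of the sample, which is the setting the paragraph above describes.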