Scikit-Learn
Total Page:16
File Type:pdf, Size:1020Kb
Scikit-Learn i Scikit-Learn About the Tutorial Scikit-learn (Sklearn) is the most useful and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling including classification, regression, clustering and dimensionality reduction via a consistence interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib. Audience This tutorial will be useful for graduates, postgraduates, and research students who either have an interest in this Machine Learning subject or have this subject as a part of their curriculum. The reader can be a beginner or an advanced learner. Prerequisites The reader must have basic knowledge about Machine Learning. He/she should also be aware about Python, NumPy, Scipy, Matplotlib. If you are new to any of these concepts, we recommend you take up tutorials concerning these topics, before you dig further into this tutorial. Copyright & Disclaimer Copyright 2019 by Tutorials Point (I) Pvt. Ltd. All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish any contents or a part of contents of this e-book in any manner without written consent of the publisher. We strive to update the contents of our website and tutorials as timely and as precisely as possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our website or its contents including this tutorial. If you discover any errors on our website or in this tutorial, please notify us at [email protected] ii Scikit-Learn Table of Contents About the Tutorial ........................................................................................................................................... ii Audience .......................................................................................................................................................... ii Prerequisites .................................................................................................................................................... ii Copyright & Disclaimer .................................................................................................................................... ii Table of Contents ........................................................................................................................................... iii 1. Scikit-Learn — Introduction ...................................................................................................................... 1 What is Scikit-Learn (Sklearn)? ........................................................................................................................ 1 Origin of Scikit-Learn ....................................................................................................................................... 1 Community & contributors.............................................................................................................................. 1 Prerequisites .................................................................................................................................................... 2 Installation ....................................................................................................................................................... 2 Features ........................................................................................................................................................... 3 2. Scikit-Learn ― Modelling Process ............................................................................................................. 4 Dataset Loading ............................................................................................................................................... 4 Splitting the dataset ........................................................................................................................................ 6 Train the Model ............................................................................................................................................... 7 Model Persistence ........................................................................................................................................... 8 Preprocessing the Data ................................................................................................................................... 9 Binarisation ...................................................................................................................................................... 9 Mean Removal ................................................................................................................................................. 9 Scaling ............................................................................................................................................................ 10 Normalisation ................................................................................................................................................ 11 3. Scikit-Learn — Data Representation ....................................................................................................... 13 Data as table .................................................................................................................................................. 13 Data as Feature Matrix .................................................................................................................................. 13 Data as Target array ...................................................................................................................................... 14 iii Scikit-Learn 4. Scikit-Learn ― Estimator API ................................................................................................................... 16 What is Estimator API? .................................................................................................................................. 16 Use of Estimator API ...................................................................................................................................... 16 Guiding Principles .......................................................................................................................................... 17 Steps in using Estimator API .......................................................................................................................... 18 Supervised Learning Example ........................................................................................................................ 18 Unsupervised Learning Example ................................................................................................................... 23 5. Scikit-Learn — Conventions .................................................................................................................... 26 Purpose of Conventions ................................................................................................................................ 26 Various Conventions ...................................................................................................................................... 26 6. Scikit-Learn ― Linear Modeling .............................................................................................................. 31 Linear Regression .......................................................................................................................................... 32 Logistic Regression ........................................................................................................................................ 34 Ridge Regression ........................................................................................................................................... 37 Bayesian Ridge Regression ............................................................................................................................ 40 LASSO (Least Absolute Shrinkage and Selection Operator)........................................................................... 43 Multi-task LASSO ........................................................................................................................................... 45 Elastic-Net...................................................................................................................................................... 47 MultiTaskElasticNet ....................................................................................................................................... 51 7. Scikit-Learn — Extended Linear Modeling ............................................................................................... 54 Introduction to Polynomial Features ............................................................................................................. 54 Streamlining using Pipeline tools .................................................................................................................. 55 8. Scikit-Learn ― Stochastic Gradient Descent ............................................................................................ 57 SGD Classifier ................................................................................................................................................