Innovator Predict Employee Turnover Using IBM Watson Studio Build, Train and Evaluate Machine Learning Models
Exercise steps

In this part of the exercise we will use a Watson Studio feature called AutoAI. AutoAI is an automated machine learning tool that helps you get started with machine learning models quickly. It handles data cleaning, data preparation, feature engineering, hyperparameter optimization and model experimentation, and then selects the right model to deploy for your data.

1. Let’s start by adding an AutoAI experiment asset to the project. Click the Add to project button in the Assets tab of your project, then click AutoAI Experiment.
2. Name the model “Predicting Employee Turnover” and click Associate a Machine Learning service instance to create a new Machine Learning instance on your cloud.
3. A new tab opens with the details of the Machine Learning instance. Make sure you are creating a New Machine Learning instance (if you have already created a Machine Learning service before, switch to the Existing tab and select your service instead).
4. Again, make sure that the Lite plan is selected.
5. Leave everything at its default, scroll to the bottom, and create the Machine Learning instance.
6. Confirm the creation of the instance.
7. Once the Machine Learning instance is created, you should be back on the Create an AutoAI experiment details screen. Reload the Machine Learning service to view the newly created instance.
8. Once it has loaded, you should see your newly added Machine Learning service instance. Click Create.
9. The AutoAI experiment builder is now displayed. The first thing we need to do is load the training data. Since the data set is already in the project, click Select from project.
10. Select the Shaped dataset and then click Select Asset.
11. Next, select the column that we want to predict. Since we want to predict the attrition of an employee, select the Attrition column.
12. AutoAI analyzes your data and determines that the Attrition column contains “Yes” and “No” values, which makes it suitable for binary classification. The positive class is Yes, and the default metric for binary classification is ROC AUC, which summarizes how well the model separates the two classes across all decision thresholds (the trade-off between the true positive rate and the false positive rate). You can change any of these parameters, and learn more about them, by clicking Experiment settings.

The Data Source tab shows a few settings that you can change for the dataset. You can subsample the dataset so that only part of it is used, which saves time. You can change the training data split, which is set to 90% by default (the training data is split into three folds for creating, optimizing and validating the pipelines, and the remaining 10% is kept separate as holdout data to test each model). You can also select which columns to exclude from your machine learning model.

In the Prediction tab, there are a few settings you can change related to the machine learning model. Under the prediction type you will find three techniques:
• Binary classification is used if the prediction column has exactly two values, such as “true or false” or “0 or 1”.
• Multiclass classification is used if your prediction column has more than two categories.
• Regression is used if your prediction column has continuous numeric values (it is automatically disabled in our case because our prediction column is a classification type).

You can change the positive class for attrition to Yes or No. For the optimized metric and the algorithms to include, you can leave the recommended settings as they are.

Lastly, the Runtime tab shows the general runtime settings of the model being created. Once you have gone through the configuration and understand it, click Save settings.
13. Now let’s run the experiment.
14. AutoAI starts running and you will see the Relationship map of what is happening. Click Swap view on the right to see the Progress map of your model, which shows more clearly which steps AutoAI is going through.
15. The Progress map shows the entire pipeline of what Watson is going to do with this data. This pipeline may take about 10-15 minutes or more to complete. While the pipeline is running, let’s go through the steps it performs:
a. Read dataset: First it reads the dataset.
b. Split holdout data: It splits the data into three sets.
Generally, when data scientists test models, they split the data into three parts: 60% is given to the machine, which learns from it and builds patterns; 20% is kept as validation data, which helps refine those patterns and check whether examples are correctly identified as yes or no; and the final 20% is kept to test the algorithm’s conclusions and see whether it works well with new data.
c. Read training data: After the split, it reads just the first set of the data.
d. Preprocessing: The machine cleans the data as best it can (we performed some of this manually in the last module).
e. Model selection: It chooses the kind of machine learning model that would work best with this data.
f. XGB Classifier: It chooses the XGB (eXtreme Gradient Boosting) classifier as its model.
g. Hyperparameter optimization: AutoAI tries different settings of the chosen model, starting from the defaults, and compares error rates to see which settings perform best.
h. Feature engineering: AutoAI prepares the input dataset to be compatible with the machine learning algorithm and improves the performance of the models using a range of techniques.
i. Hyperparameter optimization: Feature engineering may have created additional columns, for example by multiplying, dividing or subtracting existing columns, so AutoAI runs another round of hyperparameter optimization with the newly added columns.
16. Once the pipeline is completed, it should look similar to the figure below.
17. Scroll down to look at the leaderboard. The pipelines you see here are ranked by the ROC AUC metric.
As you can see, Pipelines 3 and 4 performed best out of the 8 pipelines (8 different machine learning configurations), with a ROC AUC of 0.870, so AutoAI has marked the top-ranked pipeline with a star to show which one to continue using. Let’s also compare the pipelines by clicking the Pipeline comparison button at the top.
18. This chart gives an overview of the four pipelines and shows how the different metrics compare across them.
19. To see the details of a pipeline, scroll down and click the individual pipeline’s name.
20. Here you can see the Model Evaluation, Confusion Matrix, Precision Recall Curve and Model Information. Explore this section to understand the pipeline better. Once you are done, go back to the previous screen by clicking Back to Predicting Employee Turnover.
21. You can also see a summary of every pipeline by going back to the Experiment summary, clicking the arrow beside the pipeline and expanding each section.
22. Here you can see the summary of the pipeline, including the ROC curve, the cross-validation score, which is used to rank the pipelines trained by AutoAI, and the holdout score, which is used to evaluate the resulting pipeline model and compute performance information such as ROC curves and confusion matrices. Now click Save As -> Model for the top-ranked pipeline.
23. Click Save.
24. Once the model is saved, a popup appears. Click View in Project, or view the model under the Assets tab of your project.
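
For readers who want to see what the 90%/10% training data split from step 12 and the ROC AUC ranking from step 17 mean in practice, here is a minimal Python sketch using only the standard library. It is not how AutoAI is implemented internally (AutoAI may shuffle or stratify differently). The function names split_holdout_and_folds, roc_auc and confusion_counts are hypothetical, and the row count, labels and scores are made-up toy values, not results from this experiment.

```python
import random


def split_holdout_and_folds(n_rows, holdout_fraction=0.10, n_folds=3, seed=0):
    """Shuffle row indices, set aside a holdout slice, and deal the
    remaining training rows into n_folds roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_rows * holdout_fraction)
    holdout, training = indices[:n_holdout], indices[n_holdout:]
    folds = [training[i::n_folds] for i in range(n_folds)]
    return holdout, folds


def roc_auc(labels, scores, positive="Yes"):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_counts(labels, scores, threshold=0.5, positive="Yes"):
    """Count true/false positives and negatives when every score at or
    above the threshold is predicted as the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        actual_positive = y == positive
        if predicted_positive and actual_positive:
            counts["tp"] += 1
        elif predicted_positive:
            counts["fp"] += 1
        elif actual_positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical row count, chosen only to make the arithmetic easy to follow.
holdout, folds = split_holdout_and_folds(1000)
print(len(holdout), [len(f) for f in folds])

# Toy holdout labels and predicted probabilities of "Yes" (made up).
labels = ["Yes", "No", "Yes", "No", "No", "Yes", "No", "No"]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]

auc = roc_auc(labels, scores)
counts = confusion_counts(labels, scores)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])
print(f"ROC AUC = {auc:.3f}, precision = {precision:.2f}, recall = {recall:.2f}")
```

With these toy values, 12 of the 15 Yes/No pairs are ordered correctly, so the ROC AUC is 0.8. Note that ROC AUC depends only on how the scores rank the examples, whereas the confusion matrix, precision and recall depend on the chosen threshold (0.5 here).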
In the next module you will learn how to deploy your model for use and test it.