Innovator Predict Employee Turnover using IBM Studio Build, train and evaluate models

Exercise steps

To perform this part of the exercise we will use a feature in Watson Studio called AutoAI. AutoAI is an automated machine learning tool that helps you get started quickly with machine learning models. It helps with cleaning the data, preparing the data, hyperparameter optimization, model experimentation, and finally choosing the right model to deploy for your data.

1. Let’s start by adding an AutoAI experiment asset type to the project. Click on the Add to project button in the Assets tab of your project and click on AutoAI Experiment.

2. Name the model “Predicting Employee Turnover” and click on Associate a Machine Learning service instance to create a new Machine Learning instance on your cloud.

3. A new tab will open with the details of the Machine Learning instance. Make sure you’re creating a New Machine Learning instance.

4. Again, make sure that the lite plan is selected.

5. Leave everything as default, scroll down to the bottom and create the Machine Learning instance.

6. Confirm the creation of the instance.

7. Once the Machine Learning instance is created, you should be back on the Create an AutoAI experiment details screen. Reload the Machine Learning service to view the newly created instance.

8. Once it is loaded you should see your newly added Machine Learning service instance.

Go on ahead and click on Create.

9. Now, the AutoAI experiment builder will be displayed. The first thing we need to do is load the training data. Since we already have the data set in the project, go ahead and click on Select from project.

10. Select the Shaped dataset (you can hover over the name to see the full name of the file) and then click on Select Asset.

11. Now, we need to select the column that we want to predict. Since we want to predict the attrition of an employee we will go ahead and select the Attrition column.

12. AutoAI analyzes your data and determines that the Attrition column contains Yes and No values, making it suitable for binary classification. The positive class is true, and the default metric for binary classification is ROC AUC, which balances precision, accuracy, and recall. You can change any of these parameters by clicking Configure prediction.
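To make the ROC AUC metric more concrete, here is a minimal sketch in plain Python. It is not how AutoAI computes the metric internally. It uses the common interpretation of ROC AUC as the share of positive/negative pairs in which the positive example receives the higher score. The function name `roc_auc`, the sample labels and scores, and the use of "Yes" as the positive label are all assumptions made for illustration.

```python
def roc_auc(labels, scores, positive="Yes"):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. `positive` is the label treated as the
    positive class (an assumption for this sketch).
    """
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical attrition labels and model scores for five employees.
labels = ["Yes", "No", "Yes", "No", "No"]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
print(roc_auc(labels, scores))
```

In this made-up sample, the positive example wins in 5 of the 6 positive/negative pairs. A value of 1.0 would mean every employee who left was scored above every employee who stayed, and 0.5 would be no better than guessing.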

13. You can change any of these parameters by choosing a different prediction type, positive class, or optimized metric.

In the prediction type you will find three different techniques:

• Binary classification is used if the prediction column has exactly two values, such as “true or false” or “0 or 1”.
• Multiclass classification is used if your prediction column has more than two categories.
• Regression is used if your prediction column has continuous numeric values.
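The three prediction types can be illustrated with a rough heuristic in plain Python. This is only a sketch of the idea, not AutoAI's actual detection logic. The function name `guess_prediction_type` and the `max_classes` threshold are hypothetical.

```python
def guess_prediction_type(values, max_classes=20):
    """Guess a prediction type from the values of the target column.

    `max_classes` is a made-up threshold: a numeric column with more
    distinct values than this is treated as a regression target.
    """
    distinct = set(values)
    if len(distinct) < 2:
        raise ValueError("the prediction column needs at least two distinct values")
    if len(distinct) == 2:
        return "binary classification"

    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if all_numeric and len(distinct) > max_classes:
        return "regression"
    return "multiclass classification"


print(guess_prediction_type(["Yes", "No", "No", "Yes"]))
print(guess_prediction_type(["Sales", "HR", "Research", "Sales"]))
print(guess_prediction_type([i * 1.5 for i in range(100)]))
```

For the Attrition column, which holds only Yes and No, a heuristic like this would pick binary classification, which matches what AutoAI recommends in this exercise.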

As for the optimized metric, you can keep whichever one is recommended.

14. Let’s go ahead and run the experiment now.

15. Now you'll see the entire pipeline of what Watson is going to do with this data. This pipeline may take about 10-15 minutes or more to complete.

As the pipeline is running let me explain the different steps it goes through:

a. Read dataset: First, AutoAI reads the dataset.
b. Split holdout data: The data is split into three sets. Generally, when data scientists test models, they split the data into three parts. 60% of the data is given to the machine, which learns from it and builds patterns. 20% is kept as test data, which helps refine those patterns and check whether they correctly identify Yes and No. The last 20% is held out to check whether the conclusions the machine has reached also work well with new data.
c. Read training data: After the split, it reads only the first set of the data.
d. Preprocessing: The machine then cleans the data as best it can (we performed some of this manually in the last module).
e. Model selection: It then chooses which kind of machine learning model would work best with this data.
f. XGB Classifier: It chooses the XGB (eXtreme Gradient Boosting) classifier as its model.
g. Hyperparameter optimization: Here AutoAI tries different settings of the chosen model and looks at the error rates to see which settings perform best.
h. Feature engineering: Here AutoAI prepares the input dataset to be compatible with the machine learning algorithm and improves the performance of the models by using numerous techniques.
i. Hyperparameter optimization: Feature engineering may have created new columns by multiplying, dividing, or subtracting existing columns, so AutoAI runs another round of hyperparameter optimization with the newly added columns.
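The three-way split described in step b can be sketched in a few lines of plain Python. This is only an illustration of the 60/20/20 idea from the text. The function name `three_way_split`, the fixed seed, and the stand-in rows are assumptions, and the proportions AutoAI actually uses may differ.

```python
import random


def three_way_split(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle the rows and cut them into training, validation and holdout sets.

    Whatever is left after the training and validation shares becomes
    the holdout set. The seed only makes the shuffle repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)

    training_rows = shuffled[:n_train]
    validation_rows = shuffled[n_train:n_train + n_validation]
    holdout_rows = shuffled[n_train + n_validation:]
    return training_rows, validation_rows, holdout_rows


# Stand-in for 1000 employee records (row numbers instead of real data).
rows = list(range(1000))
training_rows, validation_rows, holdout_rows = three_way_split(rows)
print(len(training_rows), len(validation_rows), len(holdout_rows))  # 600 200 200
```

Shuffling before cutting matters: if the file happened to be sorted, for example by department, an unshuffled split could leave one of the sets with a very different mix of employees.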

16. Once the pipeline is completed, it should look similar to the figure below.

17. Once you scroll down, you can have a look at the leaderboard. The pipelines you see here are ranked based on the ROC AUC metric. Let’s start by comparing pipelines by clicking on the Compare pipelines button.

18. This table provides an overview of the four pipelines. You can return to the pipeline leaderboard by clicking on Back to Predicting Employee Turnover.

19. To see details of the pipelines you can click on an individual pipeline’s name.

20. Here you can see the Model Evaluation, Confusion Matrix, Precision Recall Curve and Model Information. Again, go back to the previous screen by clicking on Back to Predicting Employee Turnover.
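If the confusion matrix, precision and recall are new to you, the following plain-Python sketch shows how they relate. It is an illustration only, not the code behind the Watson Studio screen. The function names, the sample values, and the use of "Yes" as the positive label are assumptions.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count the four cells of a binary confusion matrix."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["tp"] += 1  # predicted to leave, and did leave
        elif p == positive:
            counts["fp"] += 1  # predicted to leave, but stayed
        elif a == positive:
            counts["fn"] += 1  # predicted to stay, but left
        else:
            counts["tn"] += 1  # predicted to stay, and stayed
    return counts


def precision_recall(counts):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical actual and predicted Attrition values for six employees.
actual = ["Yes", "No", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "No", "Yes"]

counts = confusion_counts(actual, predicted)
print(counts)
print(precision_recall(counts))
```

Precision answers "of the employees flagged as leaving, how many actually left?", while recall answers "of the employees who left, how many did the model flag?".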

21. You can also see the summary of every pipeline by clicking on the arrow and expanding each section.

22. Here you can see the summary of the pipeline, including the ROC curve. The cross-validation score is used to rank the pipelines trained by AutoAI, and the holdout score is used to evaluate the resulting pipeline model and to compute performance information such as ROC curves and confusion matrices.
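To see how a cross-validation score differs from a single holdout score, here is a minimal sketch of k-fold cross-validation in plain Python. The function names, the choice of three folds, and the `score_fold` callback are all hypothetical, and AutoAI's own fold settings are not stated in this exercise.

```python
def k_fold_indices(n_rows, k=3):
    """Yield (training, validation) row indices for each of k folds.

    Every row lands in the validation part of exactly one fold.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        training = list(range(0, start)) + list(range(start + size, n_rows))
        yield training, validation
        start += size


def cross_validation_score(n_rows, score_fold, k=3):
    """Average the score returned by `score_fold(training, validation)`.

    `score_fold` is a stand-in for "train on these rows, score on those".
    """
    scores = [score_fold(tr, va) for tr, va in k_fold_indices(n_rows, k)]
    return sum(scores) / len(scores)


for training, validation in k_fold_indices(10, k=3):
    print(len(training), validation)
```

Because each fold is scored on rows the model did not train on, the averaged score is a steadier basis for ranking pipelines than one split alone, while the holdout set stays untouched for the final check.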

Now let’s go ahead and click on Save as model for the top ranked pipeline.

23. Click on Save.

24. Once the model is saved, a popup will appear. You can click on View in Project, or you can view the model under the Assets tab of your project.

In the next module you will learn how to deploy your model for use and test it.