Model Performance Evaluation: API Overview and Usage
This document describes the web services provided by the Model Performance Evaluation (MPE) component. It allows the user to compute different performance indicators for any given predictive model.

The component API contains a single web service:

evaluateModelPerformance: This function evaluates the performance of a predictor by comparing its predictions with the true values. It assumes that the true values were never communicated to the predictor. It computes regression metrics, confidence intervals, and error statistics.

End point base URL of the component

The end point base URL (referred to as <end_point_base_url> in this chapter) is the following:

Exchange URL: <end_point_base_exchange_url> = https://api.exchange.se.com/analytics-model-performance-evaluate-api-prod

Authorization

In order to be authorized to access the API, please provide the following parameter in the request header:

Authorization: <Subscription_Key>

For the steps to generate a <Subscription_Key>, please refer to the Features tab.

modelPerfEval Input

End point Full URL (for Exchange and AzureML)

The end point full URL (referred to as <end_point_full_url> in this chapter) is the following:

<end_point_full_exchange_url> = <end_point_base_exchange_url>/c367fb8bbe0741518426b0c5f2fa439e

An HTTP code example showing how to consume the modelPerfEval web service is given below, after the input data and global parameters are described.

Input data

The inputDataset contains 2 columns. The first column contains the real (measured) values. The second column contains the predicted values. An input example can be seen below. The API enforces a strict schema, so the column names must be provided exactly as shown to ensure that the service works and that the results are interpreted correctly.

real      prediction
80321.33  80395.76
80477.75  80489.77
80648.28  80637.01
80770.82  80756.70
80037.92  80070.09
80353.61  80374.96
...       ...

Global parameters

Input parameters for the function evaluateModelPerformance are as follows:

seed : integer, sets the random seed
bootstrap : integer, the number of bootstrap samplings to perform; zero means no sampling
alpha1 : float, between 0 and 1; first confidence intervals are computed with probability 1 - alpha1
alpha2 : float, between 0 and 1; second confidence intervals are computed with probability 1 - alpha2. This parameter enables the user to get confidence intervals computed with a second significance threshold. We recommend using alpha1 for the desired threshold and alpha2 as a more stringent threshold.
error_bs : boolean; if True, bootstrap resampling is used to compute statistics for the error
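A minimal sketch of such a call is shown below in Python. The endpoint URL and the Authorization header follow the description above; the exact JSON body layout is not reproduced in this document, so the payload shape and the field names inputDataset and GlobalParameters are assumptions, not the confirmed request contract.

```python
# Minimal sketch of calling the evaluateModelPerformance web service.
# The endpoint URL and Authorization header come from the documentation above;
# the JSON body layout (inputDataset / GlobalParameters) is an assumption.
import requests

END_POINT_FULL_EXCHANGE_URL = (
    "https://api.exchange.se.com/analytics-model-performance-evaluate-api-prod"
    "/c367fb8bbe0741518426b0c5f2fa439e"
)
SUBSCRIPTION_KEY = "<Subscription_Key>"  # generated as described in the Features tab

payload = {
    # Two columns: real (measured) values and predicted values.
    "inputDataset": [
        {"real": 80321.33, "prediction": 80395.76},
        {"real": 80477.75, "prediction": 80489.77},
        {"real": 80648.28, "prediction": 80637.01},
    ],
    # Global parameters of evaluateModelPerformance.
    "GlobalParameters": {
        "seed": 42,
        "bootstrap": 15,
        "alpha1": 0.05,
        "alpha2": 0.01,
        "error_bs": True,
    },
}

response = requests.post(
    END_POINT_FULL_EXCHANGE_URL,
    headers={
        "Authorization": SUBSCRIPTION_KEY,
        "Content-Type": "application/json",
    },
    json=payload,
)
response.raise_for_status()
result = response.json()  # expected to contain the three output dataframes
```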
modelPerfEval output

The function evaluateModelPerformance returns three dataframes:

performanceMetricsDf : various state-of-the-art regression metrics (e.g. MAE, RMSE, R-squared, ...) to evaluate the prediction model performance
confidenceIntervalDf : confidence intervals regarding the differences between real and predicted values under consideration of different significance levels
errorStatisticsDf : error statistics that describe the distribution of the prediction errors

The following sections explain in more detail how the performance indicators are computed.

Calculation of Regression Metrics

The calculation of regression metrics evaluates the model performance along different dimensions. Generally, the resulting output can be classified into three categories, which are outlined in more detail below.

Absolute performance

Absolute performance metrics refer to errors measured in absolute terms of the measured/predicted unit. For example, if the measured and predicted values are given in kWh, a mean_absolute_error = 199.93 can be interpreted as "an average error of 199.93 kWh in the predictions". The following metrics are implemented to evaluate the model performance in absolute terms:

Mean Absolute Error (MAE) (or Average Absolute Deviation): shows the average error between the predicted and actual observed values.
Root Mean Squared Error (RMSE) (or Root Mean Squared Deviation): similar to the MAE metric, but punishes larger prediction errors (MAE $\leq$ RMSE).

Performance relative to target value

Relative model performance metrics represent the average error of the predicted value in relation to the target variable. They can be converted into a "percentage of error". For example, relative_error = 0.002 can be interpreted as "an average prediction error of 0.2 % relative to the actual observations". In all cases, the relative errors are expressed in absolute terms. Note that this is opposed to the "standard" performance calculation typically described in the literature; however, we believe that this approach is more suitable for the MPE component.

Normalized Mean Absolute Error (NMAE) (or Coefficient of Variation of MAE): this metric is used to facilitate the comparison of the MAE across datasets with different scales. As a means of normalization, the model performance evaluation tool uses the mean of the measured data.
Normalized Root Mean Squared Error (NRMSE) (or Coefficient of Variation of RMSE): this metric is used to facilitate the comparison of the RMSE across datasets with different scales. As a means of normalization, the model performance evaluation tool uses the mean of the measured data.
Relative Error Strict ($RE_S$)
Relative Error Lenient ($RE_L$)
Relative Error (RE) (or Percentage Error)

A computation sketch covering the absolute and target-relative metrics is given after this list.
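To make the metric descriptions above concrete, the sketch below computes the absolute and target-relative indicators with NumPy. The referenced formulas are not reproduced in this document, so the definitions used here are the common ones: relative errors normalized by the actual value, the lenient variant by the larger of actual and predicted value, and the strict variant by the smaller. Treat these as assumptions about the service's exact formulas, not a reimplementation of the component.

```python
# Illustrative sketch of the absolute and target-relative metrics described above,
# using common textbook definitions; the exact formulas used by the service may
# differ in detail.
import numpy as np

def regression_metrics(real, prediction):
    real = np.asarray(real, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    error = real - prediction
    abs_error = np.abs(error)

    mae = abs_error.mean()                # mean_absolute_error
    rmse = np.sqrt((error ** 2).mean())   # root_mean_squared_error, always >= MAE

    mean_real = real.mean()               # normalization by the mean of the measured data
    return {
        "mean_absolute_error": mae,
        "root_mean_squared_error": rmse,
        "normalized_mean_absolute_error": mae / mean_real,
        "normalized_root_mean_squared_error": rmse / mean_real,
        # Relative errors expressed in absolute terms (assumed definitions):
        "relative_error": (abs_error / np.abs(real)).mean(),
        "relative_error_lenient": (abs_error / np.maximum(np.abs(real), np.abs(prediction))).mean(),
        "relative_error_strict": (abs_error / np.minimum(np.abs(real), np.abs(prediction))).mean(),
    }

# Example usage with the first rows of the input dataset shown earlier.
print(regression_metrics([80321.33, 80477.75, 80648.28], [80395.76, 80489.77, 80637.01]))
```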
Performance relative to constant model

Performance metrics in this category compare the prediction to the performance of a constant ("dummy") predictor that simply predicts the average of the actual observations. Except for r_squared (which has an inverted scale), they can be interpreted as the ratio between the actual prediction error and the prediction error of the constant model. For example, a relative_absolute_error = 0.2 suggests the applied predictor is 5 times better than the dummy model.

Relative Absolute Error (RAE)
Root Relative Squared Error (RRSE)
Coefficient of determination ($R^2$)

Performance metrics output example

In our example, the output dataframe performanceMetricsDf is as follows:

[
  { "category": "absolute", "reference": "actualValue", "indicatorName": "mean_absolute_error", "dataset": "inputs", "value": "28.0984546107082" },
  { "category": "absolute", "reference": "actualValue", "indicatorName": "root_mean_squared_error", "dataset": "inputs", "value": "34.0783380277383" },
  { "category": "relative", "reference": "actualValue", "indicatorName": "normalized_mean_absolute_error", "dataset": "inputs", "value": "0.000349003813162058" },
  { "category": "relative", "reference": "actualValue", "indicatorName": "relative_error", "dataset": "inputs", "value": "0.000349355437697777" },
  { "category": "relative", "reference": "actualValue", "indicatorName": "relative_error_lenient", "dataset": "inputs", "value": "0.00034920297099358" },
  { "category": "relative", "reference": "actualValue", "indicatorName": "relative_error_strict", "dataset": "inputs", "value": "0.00034938263530956" },
  { "category": "relative", "reference": "actualValue", "indicatorName": "normalized_root_mean_squared_error", "dataset": "inputs", "value": "0.000423278435867207" },
  { "category": "relative", "reference": "constantModel", "indicatorName": "relative_absolute_error", "dataset": "inputs", "value": "0.132684888413312" },
  { "category": "relative", "reference": "constantModel", "indicatorName": "root_relative_squared_error", "dataset": "inputs", "value": "0.137531222898783" },
  { "category": "relative", "reference": "constantModel", "indicatorName": "r_squared", "dataset": "inputs", "value": "0.981085162727965" }
]

Rows of the output dataframe performanceMetricsDf are performance metrics. The columns are as follows:

indicatorName : the name of the performance metric
category : the category of the performance metric
reference : the reference the metric is computed against (actualValue or constantModel)
dataset : the name of the input dataset
value : the value of the performance metric

The following graph summarizes the performance metrics computed by the service:

Calculation of Confidence Intervals

The confidence interval table outlines statistical error quantiles of the predictions for a given significance value $\alpha$. Symmetric and asymmetric confidence intervals are both computed. The symmetric confidence interval considers the overall distribution of the prediction error, whereas the asymmetric confidence interval considers the positive and negative bias of the prediction error separately. Formally, if we consider that

$$ y = \hat{y} + e $$

and $P$ the probability operator, then we can write

symmetric interval: $P(|e| \leq m) = 1 - \alpha$
asymmetric interval: $P(-m^{-} \leq e \leq m^{+}) = 1 - \alpha$

All intervals should be interpreted with respect to the associated significance value $\alpha$. For $\alpha$ = 0.05, each confidence interval should be interpreted as "a 5 % risk that the error committed is outside of the given interval". This output is a good indicator of the reliability of the prediction. Thus, for a given prediction $\hat{y}$, the risk that the true value $y$ is outside the interval $[\hat{y} - m^{-}, \hat{y} + m^{+}]$ (or $[\hat{y} - m, \hat{y} + m]$ for the symmetric interval) is equal to $\alpha$. Each confidence interval boundary is obtained by bootstrapping the sample Nb times in order to get a robust value (the mean across bootstrap folds). For this reason, a standard deviation of the confidence interval boundaries is also provided.
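The bootstrap procedure can be sketched as follows: resample the prediction errors Nb times, take the relevant quantile on each fold, and report the mean and standard deviation across folds. The sketch below only illustrates that idea; the equal $\alpha/2$ tail split for the asymmetric interval and the choice of quantile estimator are assumptions, not the component's implementation.

```python
# Minimal sketch of bootstrapped confidence interval boundaries for the prediction
# error e = y - y_hat, as described above. Not the component's actual implementation.
import numpy as np

def bootstrap_error_bounds(real, prediction, alpha=0.05, n_bootstrap=15, seed=0):
    rng = np.random.default_rng(seed)
    errors = np.asarray(real, dtype=float) - np.asarray(prediction, dtype=float)

    sym, neg, pos = [], [], []
    for _ in range(n_bootstrap):
        sample = rng.choice(errors, size=errors.size, replace=True)
        # Symmetric bound m: P(|e| <= m) = 1 - alpha on the resampled errors.
        sym.append(np.quantile(np.abs(sample), 1 - alpha))
        # Asymmetric bounds m-, m+: alpha/2 of the mass in each tail (assumed split).
        neg.append(-np.quantile(sample, alpha / 2))
        pos.append(np.quantile(sample, 1 - alpha / 2))

    def summarize(values):
        # The reported boundary is the mean across bootstrap folds; the spread
        # across folds corresponds to the "std" column of confidenceIntervalDf.
        return float(np.mean(values)), float(np.std(values))

    return {
        "symmetric_m": summarize(sym),
        "asymmetric_m_minus": summarize(neg),
        "asymmetric_m_plus": summarize(pos),
    }
```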
Confidence interval output example

In our example, the output confidenceIntervalDf is as follows:

[
  { "scale": "Relative", "tails": "Positive", "dataset": "inputs", "num_bootstrap": "15", "significance_level": "0.01", "value": "0.000800377772775264", "std": "0.000193069842693065" },
  { "scale": "Relative", "tails":