Improving Statistical Reporting Data Explainability via Principal Component Analysis

Shengkun Xie and Clare Chua-Chow Global Management Studies, Ted Rogers School of Management, Ryerson University, Toronto, Canada

Keywords: Explainable, Data Visualization, Principal Component Analysis, Size of Loss, Business Analytics.

Abstract: The study of high-dimensional data for decision-making is rapidly growing since it often leads to more accurate information that is needed to make reliable decisions. To better understand the natural variation and the pattern of statistical reporting data, visualization and interpretability of data have been an on-going challenging problem, mainly in the area of complex statistical data analysis. In this work, we propose an approach of dimension reduction and feature extraction using principal component analysis, in a novel way, for analyzing the statistical reporting data of auto insurance. We investigate the functionality of the loss relative frequency with respect to the size-of-loss, as well as the pattern and variability of extracted features, for a better understanding of the nature of auto insurance loss data. The proposed method helps improve the data explainability and gives an in-depth analysis of the overall pattern of the size-of-loss relative frequency. The findings in our study will help insurance regulators make better rate filing decisions in auto insurance that would benefit both the insurers and their clients. The approach is also applicable to similar data analysis problems in other business applications.

1 INTRODUCTION

The study of high dimensional data for decision-making is rapidly growing since it often leads to more accurate information, which is needed to make reliable decisions (Elgendy and Elragal, 2014; Goodman and Flaxman, 2017; Hong et al., 2019). Also, global businesses have entered a new era of decision making using big data, and it has posed a new challenge to most companies. Statistical analysis provides managers with tools for making a better decision. However, it is always a challenge to pick a better tool to analyze the data to give a better picture necessary to make even smarter and better business decisions. In insurance rate regulation, statistical data reporting is a big data application. It involves many distributed computer systems to implement data collections and data summaries, regularly, and the amounts of data collected are massive (Wickman, 1999). For example, to study the loss behaviour by the forward sortation area (FSA) in Ontario, Canada, around 10 Gigabytes of loss data were collected from all auto insurance companies during the period from 2010 to 2012. Processing this significant amount of data to extract useful information is extremely difficult and requires specific statistical approaches that can help reduce the dimensionality and complexity of the data. This is why the insurance regulators are focusing on the analysis of statistical reporting, which contains the aggregate loss information from the industry. On the other hand, big data is not just about a large volume of data being collected; it also implies the high level of complexity of the frequently updated data (Lin et al., 2017; Bologa et al., 2013). In the regulation process, insurance loss data is continuously being collected. The collected data are then further aggregated and summarized using some necessary statistical measures such as counts and totals. The data organizations are separated by different factors of interest. For instance, in statistical data reporting of large-loss analysis, claim counts and claim loss amounts are reported by coverage, territory, accident year and reporting year (McClenahan, 2014). These data, which are organized as exhibits, are then used by insurance regulators for a better understanding of the insurance risk and uncertainty, through suitable statistical analysis, both quantitatively and qualitatively. The obtained results from statistical analysis are used as guidelines for decision-making of rate filing reviews.


Because of the need for understanding the nature of the aggregated loss data, it calls for suitable data analytics that can be used for processing statistical data reporting (Xie and Lawniczak, 2018; Xie, 2019).

As a multivariate statistical approach, the conventional Principal Component Analysis (PCA) is often used to reduce the dimension of multivariate data or to reconstruct the multidimensional data matrix using only the selected PCs. However, within the PCA approach, the functionality between the multivariate data and other variables is not considered (Bakshi, 1998). Application using PCA may become problematic when multivariate data are interconnected. From the data visualization perspective, it could be misleading if the frequency value at a given size-of-loss interval is visualized without incorporating the size-of-loss in the plot. In statistical data reporting, the incurred losses are grouped as intervals. Each size-of-loss interval is not even, and with the increase of the incurred loss, the width of the intervals dramatically increases. To overcome the potential mistakes that can be caused by the visualization of loss data, PCA is used to extract key information from the data matrix so that the main pattern of functionality between the relative frequency and the size-of-loss can be visualized properly. By doing so, we significantly improve data explainability. In this work, PCA is used for both low-rank approximations and feature extractions, with the consideration of the functionality of relative frequency values and the size-of-loss.

Our contribution to this research area is using PCA in a novel way, to extract key features of auto insurance loss to improve the data visualization for a better decision-making process. To the best of our knowledge, the proposed method appears for the first time in the literature to consider the data explainability problem of statistical data reporting in the insurance sector. The proposed method helps to improve the data explainability as well as a better understanding of the overall pattern of the size-of-loss relative frequency at the industry level. Also, feature extraction by PCA facilitates the understanding of loss variability, both the overall and the local behaviour, and its natural functionality between the frequency values and the size-of-loss. The analysis conducted in this work illustrates the application of a suitable multivariate statistical approach to dimension reduction of statistical data in auto insurance to achieve a higher data interpretability. This paper is organized as follows. In Section 2, the data and its collection are briefly introduced. In Section 3, the proposed methods, including feature extraction and low-rank approximation via PCA, are discussed. In Section 4, the analysis of auto insurance size-of-loss data and the summary of the main results are presented. Finally, we conclude our findings and provide further remarks in Section 5.

2 DATA

In this work, we focus on the study of the size-of-loss relative frequency of auto insurance using datasets from the Insurance Bureau of Canada (IBC), which is a Canadian organization responsible for insurance data collections and their statistical data reporting problems in the area of property and casualty insurance. During this process, insurance companies report the loss information, including the number of claims, number of exposures, loss amounts, as well as other key information such as territories of loss, coverages, driving records associated with loss, and accident years. These statistical data are reported regularly (i.e., weekly, biweekly or monthly). At the end of each half-year, the total claim amounts and claim counts reported by all insurance companies are aggregated by territories, coverages, accident years, etc. The statistical data reporting is then used for insurance rate regulation to ensure the premiums charged by insurance companies are fair and exact. The dataset used in this work consists of summarized claim counts by different sizes of loss, which are represented by a set of non-overlapping intervals. The claim counts are aggregated by major coverages, i.e., Bodily Injuries (BI) and Accident Benefits (AB). Also, the data were summarized by different accident years, by different report years, and by different territories, i.e., Urban (U) and Rural (R).

To carry out the study, we organize the data by coverages (AB and BI) and by territories (U and R). We consider the data from different reporting years and accident years as repeated observations. There are two reporting years, 2013 and 2014. For each reporting year, there is a set of rolling most recent five years of data corresponding to five accident years. Therefore, for this study, we have in total ten years of observation. Also, since we have both Accident Benefits and Bodily Injuries as the coverage type and Urban and Rural as the territory, we consider the following four different combinations: Accident Benefits and Urban (ABU), Accident Benefits and Rural (ABR), Bodily Injuries and Urban (BIU), and Bodily Injuries and Rural (BIR). These data are then formed into a data matrix of dimension 40 × 24, where 40 is the total number of observations and 24 is the total number of size-of-loss intervals.
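To make the data organization concrete, the following is a minimal sketch of how grouped claim counts could be assembled into the relative-frequency data matrix described above. The column names, the small illustrative records, and the pivoting step are assumptions for illustration only; they are not the IBC reporting format.

```python
import pandas as pd

# Hypothetical grouped statistical-reporting extract: one row per
# (coverage, territory, report year, accident year, size-of-loss interval),
# with the aggregated claim count for that cell.
records = pd.DataFrame({
    "coverage":      ["AB", "AB", "AB", "BI", "BI", "BI"],
    "territory":     ["U",  "U",  "U",  "R",  "R",  "R"],
    "report_year":   [2013, 2013, 2013, 2014, 2014, 2014],
    "accident_year": [2009, 2009, 2009, 2014, 2014, 2014],
    "interval_id":   [0, 1, 2, 0, 1, 2],      # index of the size-of-loss interval
    "claim_count":   [1200, 800, 150, 900, 300, 40],
})

# Pivot so that each observation (coverage, territory, report year,
# accident year) becomes one row and each size-of-loss interval one column.
counts = records.pivot_table(
    index=["coverage", "territory", "report_year", "accident_year"],
    columns="interval_id",
    values="claim_count",
    fill_value=0,
)

# Convert counts to empirical relative frequencies: each row is divided
# by its own total claim count, as in Equation (1) of Section 3.1.
Y = counts.div(counts.sum(axis=1), axis=0)

# In the paper's setting this matrix would be 40 x 24: ten observation years
# for each of the four coverage-territory combinations, and 24 intervals.
print(Y.shape)
```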


3 METHODS

3.1 Defining the Size-of-Loss Relative Frequency Distribution

Since the claim count data was pre-grouped, we must first give a definition to the empirical size-of-loss relative frequency distribution based on the aggregate observations of claim counts. Let f(x) be the true size-of-loss relative frequency distribution, where x ∈ R^+ is the ultimate loss. Ideally, to study the size-of-loss relative frequency distribution function, we would expect to have a set of raw data pairs of loss frequency values and the ultimate loss, so that we can estimate the distribution function by some parametric modelling approaches. However, in statistical data reporting, we are only able to analyze the grouped data because the raw data is usually not available to insurance regulators. For the grouped data, the estimated size-of-loss relative frequency distribution is defined as follows

\hat{f}(x) = \frac{C_i}{\sum_{i=1}^{p} C_i} \quad \text{if } x \in [l_i, l_{i+1}),   (1)

where C_i is the total claim count associated with the ith interval and [l_i, l_{i+1}) is the ith size-of-loss interval. Note that the function \hat{f}(x) is an empirical estimate of its true size-of-loss relative frequency distribution. In this work, we use the PCA approach to approximate the function f(x) by retaining only a small number of principal components. Since we have grouped data, the function \hat{f}(x) is replaced by a vector.

3.2 Feature Extraction by Principal Component Analysis

Feature extraction is a dimension reduction method in machine learning. It aims at extracting important and key features from data so that further analysis can be facilitated. Assume that we have n observations of f(x); then the function f(x) is replaced by a p-variate vector, denoted by Y_i = [y_{i1}, y_{i2}, ..., y_{ip}], where i = 1, 2, ..., n. Here n is the number of observations and p is the number of variables, which corresponds to the number of size-of-loss intervals. These observation data can be organized by the following n × p data matrix

Y = (Y_1, Y_2, \ldots, Y_n)^\top =
\begin{pmatrix}
y_{11} & y_{12} & \ldots & y_{1p} \\
y_{21} & y_{22} & \ldots & y_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
y_{n1} & y_{n2} & \ldots & y_{np}
\end{pmatrix},

where y_{ij} corresponds to the observed relative frequency value at the ith observation and the jth interval. We assume that each column of Y is centred, and we will explore both cases with and without scaling on the data matrix Y. Using PCA, we map the data matrix Y onto a low-dimensional data matrix Z, which will be defined later. This mapping is achieved by retaining only a selected number of components, called principal components. These principal components explain the majority of the data variation of the underlying variables. Specifically, the first principal component is the normalized linear combination of variables that leads to the following form:

z_{i1} = \phi_{11} y_{i1} + \phi_{21} y_{i2} + \ldots + \phi_{p1} y_{ip}, \quad i = 1, \ldots, n,   (2)

which has the maximum variance, subject to the constraint \sum_{j=1}^{p} \phi_{j1}^2 = 1. The first principal component loading vector \phi_1 = [\phi_{11}, \phi_{21}, ..., \phi_{p1}]^\top indicates the direction in the principal component feature space, and the first principal component score vector Z_1 = [z_{11}, ..., z_{n1}]^\top is the projected values of Y onto the feature subspace \phi_1. The subsequent principal components are obtained by following the same step as the first principal component, which maximizes the variance of the linear combination of the underlying variables after removing the variation that has been explained by the previous components, and they are orthogonal to the previous principal components. Through this process, we can obtain the principal component loading matrix, denoted by \phi = [\phi_1, \phi_2, ..., \phi_p], and the principal component score matrix, denoted by Z = [Z_1, Z_2, ..., Z_p]. Also, we have Z = Y\phi. Once we compute these principal components, we can reduce the dimension of our data by solely focusing on the major principal components. These low-dimensional projected feature vectors can be visualized if the dimension is not higher than three. If we retain only M principal components, then the principal component score matrix in the feature subspace is given as follows:

Z^{*} = [Z_1, Z_2, \ldots, Z_M] =
\begin{pmatrix}
z_{11} & z_{12} & \ldots & z_{1M} \\
z_{21} & z_{22} & \ldots & z_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
z_{n1} & z_{n2} & \ldots & z_{nM}
\end{pmatrix}.

Note that, by retaining only M components, we can reduce the dimension of the data matrix Y from n × p to n × M, where M ≪ p. This is how we improve the data explainability using a feature domain instead of the original domain. Mapping the data matrix Y onto a feature matrix Z is referred to as feature extraction in the machine learning literature (Khalid et al., 2014).
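As a minimal sketch of the feature-extraction step described in this section, the code below centres (and optionally scales) a stand-in relative-frequency matrix and computes the score matrix Z and the loading matrix φ using scikit-learn; the synthetic input only takes the place of the real 40 × 24 reporting data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
Y = rng.random((40, 24))            # placeholder for the n x p relative-frequency matrix

# Centre each column; setting with_std=True would also scale, the second
# case explored in the analysis.
Y_centred = StandardScaler(with_std=False).fit_transform(Y)

M = 2                               # number of retained principal components
pca = PCA(n_components=M)
Z = pca.fit_transform(Y_centred)    # n x M score matrix (Equation (2) for each component)
phi = pca.components_.T             # p x M loading matrix, columns phi_1, ..., phi_M

print(Z.shape, phi.shape)                        # (40, 2) (24, 2)
print(pca.explained_variance_ratio_)             # proportion of variance per component
print(np.cumsum(pca.explained_variance_ratio_))  # cumulative proportion, as reported in Table 1
```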


3.3 Low Rank Approximation of Relative Frequency Distribution by Principal Component Analysis

The relative frequency distribution is considered to be a function of the size-of-loss, as shown in Equation (1). By taking a linear combination of observations with a suitable choice of weight values, one can extract the major patterns of the size-of-loss relative frequency distribution. The extracted major pattern reflects the functionality between the relative frequency value and the size-of-loss. This function approximation uses only M principal components to reconstruct the data matrix Y. Reducing the dimension from p to M to reconstruct the data matrix Y is called a low-rank approximation of Y (Clarkson and Woodruff, 2017), and it can be expressed as follows

Y \approx Z\phi^\top,   (3)

where \phi is the p × M loading matrix and Z is the n × M score matrix. When M = p, the data matrix is fully restored by the score and loading matrices. Another common approach related to low-rank approximation is to find principal components through the Singular Value Decomposition (SVD) of the data matrix Y (Golub and Reinsch, 1971; Alter et al., 2000). Mathematically, it means that the data matrix Y can be decomposed into the following equation

Y = U\Sigma V^\top,   (4)

where U is an n × n unitary matrix, \Sigma is an n × p diagonal matrix and V is a p × p unitary matrix. For a given new observation Y with dimension 1 × p, the projected value becomes YV, which is the feature extraction using PCA discussed in Section 3.2. If we use only the first M eigenvectors, the dimension of the extracted feature vector is M.

On the other hand, based on Equation (4), we can derive the spectral decomposition of Y^\top Y as shown below in Equation (5)

Y^\top Y = V\Sigma^\top U^\top U\Sigma V^\top,   (5)

where Y^\top Y can be interpreted as the p × p sample covariance matrix. Since U is a unitary matrix, we have U^\top U = I, which is an identity matrix. This implies that

Y^\top Y = V\Sigma^\top \Sigma V^\top = (V\Sigma^\top)(V\Sigma^\top)^\top.   (6)

This is a Cholesky decomposition of Y^\top Y. Note that the following choice of Y^{*} solves the above equation:

Y^{*} = \Sigma V^\top.   (7)

Using Equations (4) and (7), we can obtain Y = UY^{*}. This result implies that any orthogonal decomposition of Y^{*} will solve Equation (6). This is the reason why the principal component is not unique.
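The sketch below illustrates the connection between Equations (3) and (4): the loadings are taken from the right singular vectors of the centred data matrix, and keeping only the first M of them gives the rank-M reconstruction Y ≈ Zφ^⊤. The data matrix here is synthetic and only stands in for the centred relative-frequency data.

```python
import numpy as np

rng = np.random.default_rng(1)
Y_raw = rng.random((40, 24))        # stands in for the 40 x 24 relative-frequency matrix
col_means = Y_raw.mean(axis=0)
Y = Y_raw - col_means               # column-centred data matrix

# Full SVD: Y = U @ diag(s) @ Vt, as in Equation (4).
U, s, Vt = np.linalg.svd(Y, full_matrices=False)

M = 2                               # number of retained components
phi = Vt[:M].T                      # p x M loading matrix (first M right singular vectors)
Z = Y @ phi                         # n x M score matrix

# Rank-M reconstruction, Equation (3): Y is approximated by Z @ phi.T.
Y_hat = Z @ phi.T
print(np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y))  # relative reconstruction error

# Feature extraction for a new 1 x p observation: centre it, then project onto phi.
y_new = rng.random((1, 24)) - col_means
print((y_new @ phi).shape)          # (1, M)
```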


4 RESULTS

First, we illustrate that the visualization of the size-of-loss distribution could be misleading when the functionality between relative frequency values and the size-of-loss is not considered. Figures 1 and 2 show the relative frequency values for all combinations of major coverages and territories. It is mistaken if one tries to comment on the shape of the distribution since the intervals of the size-of-loss are not the same.

Figure 1: The barplots of loss relative frequency by different combinations of major coverages and territories, before and after reconstruction using the first five principal components.

Figure 2: The barplots of loss relative frequency by different combinations of major coverages and territories, before and after reconstruction using only the first two principal components.

Table 1: Results of the first 6 PCs of loss relative frequency, including the standard deviation of each principal component, the proportion of variation explained by each principal component, and the cumulative proportion of variation explained by the first few PCs.

                          PC1      PC2      PC3      PC4      PC5      PC6
Standard deviation        3.8051   2.2667   1.2818   0.8789   0.7836   0.6426
Proportion of Variance    0.6033   0.2141   0.0684   0.0321   0.0255   0.0172
Cumulative Proportion     0.6033   0.8174   0.8858   0.9180   0.9436   0.9608

Figure 3: The loss relative frequency patterns for different accident years and different combinations of coverage and territory for the 2013 reporting year, with respect to the size-of-loss.

Figure 4: The loss relative frequency patterns for different accident years and different combinations of coverage and territory for the 2014 reporting year, with respect to the size-of-loss.

The interval width is dramatically increased with the increase of the size-of-loss. The shape of the distribution heavily depends on the size-of-loss. This result implies that when visualizing the relative frequency values, they must be presented with respect to the size-of-loss. When this is the case, the tail of the distribution is much heavier as the bars located at the right side of the distribution are stretched out, and the bars located at the left-hand side will be combined and become more dominant than other size-of-loss. In addition, Figures 1 and 2 summarize the results of using PCA to reconstruct the data matrix as stated in Equation (3). In Figure 1, five PCs were used, while in Figure 2, only two PCs were used. The results show that the more PCs used, the better the approximation of the original data pattern. However, if only the major PCs were used, one can observe a similar tendency of the loss relative frequency within the same coverage, which leads to a better data interpretability. In Figures 3 and 4, the relative frequency values of different accident years with respect to the size-of-loss for different combinations of coverage and territory of the reporting years 2013 and 2014 are presented. The results shown in Figures 3 and 4 are more interpretable as they clearly show the similar loss pattern for different accident years.

The results shown in Figures 3 and 4 reveal that the size-of-loss relative frequency values do not heavily depend on the reporting years. The result implies that the claim counts were mainly developed within the accident year. We also observe that within the same coverage, either AB or BI, the size-of-loss relative frequency appears to have high similarity among different accident years. For both coverages, the zero claim amount has the most dominant frequency. This fact implies that many reported claims cause zero loss, which may be due to the insurance deductible. This result suggests that the loss amount associated with zero size-of-loss is mainly due to the expenses that occurred during the claim reporting process.
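The following is a minimal sketch of the plotting convention argued for at the beginning of this section: the same grouped relative frequencies drawn once as equal-width bars indexed by interval, and once against the actual size-of-loss scale. The interval endpoints and counts are illustrative only, not the IBC interval definitions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative (not the actual IBC) size-of-loss interval endpoints: the
# interval width grows rapidly with the incurred loss.
edges = np.array([0, 1, 5_000, 10_000, 25_000, 50_000, 100_000,
                  250_000, 500_000, 1_000_000, 2_000_000])
counts = np.array([900, 420, 300, 220, 160, 110, 60, 25, 8, 2])

rel_freq = counts / counts.sum()            # empirical relative frequency, Equation (1)
midpoints = (edges[:-1] + edges[1:]) / 2    # place each frequency at its interval midpoint

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))

# Potentially misleading view: equal-width bars indexed by interval number.
ax1.bar(np.arange(len(rel_freq)), rel_freq)
ax1.set_xlabel("Interval index")
ax1.set_ylabel("Relative frequency")

# View advocated in the text: frequencies plotted against the size-of-loss itself.
ax2.plot(midpoints, rel_freq, marker="o")
ax2.set_xlabel("Size of loss")
ax2.set_ylabel("Relative frequency")

plt.tight_layout()
plt.show()
```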


Figure 5: The first principal component approximation of loss relative frequency patterns of different accident years and different combinations of coverage and territory for the 2013 reporting year, with respect to the size-of-loss.

Figure 6: The first principal component approximation of loss relative frequency patterns of different accident years and different combinations of coverage and territory for the 2014 reporting year, with respect to the size-of-loss.

Figure 7: The scree plot, which shows the distribution of eigenvalues with respect to their principal components.

Figure 8: The plot of principal component loadings with respect to the size-of-loss for the first four PCs.

From the management perspective, reducing the processing of claims with zero loss is necessary for auto insurance companies to significantly reduce the total expenses. On the other hand, in Figures 3 and 4, the overall pattern of AB coverage shows that as the size-of-loss increases, the claim frequency decreases. For BI coverage, the overall pattern of loss relative frequency shows that as the size-of-loss increases, the claim frequency decreases first, then it increases again. It reaches the local maximum at the size-of-loss around $5,000 to $75,000. Figures 5 and 6 show the approximation of size-of-loss relative frequency using only the first PC to reconstruct the data matrix.


Figure 9: The extracted two-dimensional features, which are the first two principal components, with the input data being scaled.

Figure 10: The extracted two-dimensional features, which are the first two principal components, without the input data being scaled.

From these results, we observe that the overall relative frequency pattern was picked up by the first PC, although there are some small differences when comparing to the original observations shown in Figures 3 and 4. The benefit of having these major patterns recognized by the first PC is to enhance an understanding of the driving forces for insurance claims. Coverages and territories might share some commonality in claim frequency distribution.

Figure 7 displays the results of the eigenvalues for their principal components. From the results shown in Figure 7, we can see that the first four PCs can explain a significant amount of data variation. These four PCs have explained more than 95% of the total variation, which can be seen from Table 1. In particular, the first principal component and the second principal component have explained about 60% and 21% of the total variation, respectively. That is to say, the first two PCs have been able to explain 81% of the total variation. In Figure 8, the harmonic pattern as a function of the size-of-loss is displayed. The result shows that the main harmonic patterns are significant only between zero and $250,000, as the harmonic approaches zero when the size-of-loss is greater than $250,000. Also, this result implies that the primary insurance uncertainty in terms of loss relative frequency is mainly concentrated between zero and $250,000. The fluctuation of loss relative frequency from different accident years and different reporting years starts to become small when an individual size-of-loss is higher than $250,000. This result provides us with valuable information on the estimate of the cut-off value of the large size-of-loss, which is an important aspect that leads to reinsurance or determination of the large size-of-loss loading factor. On the other hand, the first two principal component loading vectors share a similar functionality with respect to the size-of-loss. In contrast, the third and fourth principal component loadings behave similarly and go in the opposite direction from the first two PCs. This fact may suggest that different principal components could explain the diverse nature of variation. To extract the major pattern of the loss data, that is, to see if there is anything in common, one should focus on either the first or the first two PCs.

Figures 9 and 10 show the extraction of two-dimensional features, respectively, for both cases of input data with and without scaling. In practice, it is critical to see if there is an effect from the scaling of the data because the variation of each underlying variable may be different, and the separation of extracted features may be affected by the scales of the variables. Figure 9 shows the results with scaling before applying PCA. The result reveals that the two-dimensional features can be separated by the type of data, with a slight overlapping part on AB coverage (ABR and ABU). The BI coverage is well separated by different territories. The result suggests that PCA could form each type of claim count data into clusters, which facilitates the understanding of their similarities and differences. Furthermore, the extracted features of the BI coverage appear to be more linear than the features extracted from the AB coverage. This result implies that the extracted features of the BI coverage are more correlated within the same territory. Based on this result, we can further infer that the low-rank approximation of the original data enhances the level of similarity within the same territory. This result coincides with the observation we found in both Figures 5 and 6, which suggests a high level of similarity of relative frequency distribution among different combinations of territory. However, PCA can only capture the similarity within the BI coverage, but not the AB coverage. Similarly, Figure 10 displays the extracted features from the first two PCs, using the input data without scaling. We can observe that without scaling, the separability of the extracted features is lower than the one associated with the scaled input data. From the result, we can conclude that it is crucial to scale the data when using the PCA approach.
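To illustrate the effect of scaling discussed above, the sketch below applies PCA to the same stand-in frequency matrix with and without scaling the columns and compares the resulting two-dimensional features; the group labels mimic the four coverage-territory combinations, but the data are simulated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
groups = np.repeat(["ABU", "ABR", "BIU", "BIR"], 10)            # 40 observations
Y = rng.random((40, 24)) + (groups == "BIU")[:, None] * 0.05    # simulated frequencies

def first_two_scores(X, scale):
    """Return the first two PC scores, with or without scaling the columns."""
    X_prep = StandardScaler(with_std=scale).fit_transform(X)
    return PCA(n_components=2).fit_transform(X_prep)

scores_scaled = first_two_scores(Y, scale=True)
scores_unscaled = first_two_scores(Y, scale=False)

# Compare how the coverage-territory groups separate in each feature space,
# e.g. by looking at the group centroids of the two-dimensional features.
for name, scores in [("scaled", scores_scaled), ("unscaled", scores_unscaled)]:
    centroids = np.array([scores[groups == g].mean(axis=0) for g in np.unique(groups)])
    print(name, np.round(centroids, 2))
```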


5 CONCLUDING REMARKS

Data visualization and data interpretability improvement have been an on-going challenging problem in complex statistical analysis. Importantly, our research has shown that breaking down the grouped data using PCA will give more detailed information for insurance regulators to make better decisions. In this work, we proposed PCA as a feature extraction and low-rank approximation method, and we applied it to the auto insurance size-of-loss data. We illustrate that PCA is a suitable technique to improve data visualization and the interpretability of data. First, we use principal component analysis to extract data features from the size-of-loss relative frequency distributions for a better understanding of their natural fluctuation and functionality to the size-of-loss. We also use PCA to reconstruct the input data matrix so that the major pattern of the size-of-loss relative frequency distribution can be obtained for data from a combination of major coverages and territories. By doing these, we capture the common functionality of data so that the result can be used as a baseline of the size-of-loss relative frequency distribution. Our study shows that the size-of-loss distributions share some common statistical properties within the same major coverage or the same territory. It is necessary to further study this commonality by relating the size-of-loss relative frequency pattern to some potential risk factors, including coverages, territories, accident years and reporting years. It is of interest to estimate these factor effects and to test their statistical significance in future research.

REFERENCES

Alter, O., Brown, P. O., and Botstein, D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. Proceedings of the National Academy of Sciences, 97(18):10101–10106.

Bakshi, B. R. (1998). Multiscale PCA with application to multivariate statistical process monitoring. AIChE Journal, 44(7):1596–1610.

Bologa, A.-R., Bologa, R., Florea, A., et al. (2013). Big data and specific analysis methods for insurance fraud detection. Database Systems Journal, 4(4):30–39.

Clarkson, K. L. and Woodruff, D. P. (2017). Low-rank approximation and regression in input sparsity time. Journal of the ACM (JACM), 63(6):54.

Elgendy, N. and Elragal, A. (2014). Big data analytics: a literature review paper. In Industrial Conference on Data Mining, pages 214–227. Springer.

Golub, G. H. and Reinsch, C. (1971). Singular value decomposition and least squares solutions. In Linear Algebra, pages 134–151. Springer.

Goodman, B. and Flaxman, S. (2017). European Union regulations on algorithmic decision-making and a "right to explanation". AI Magazine, 38(3):50–57.

Hong, S., Hyoung Kim, S., Kim, Y., and Park, J. (2019). Big data and government: Evidence of the role of big data for smart cities. Big Data & Society, 6(1):2053951719842543.

Khalid, S., Khalil, T., and Nasreen, S. (2014). A survey of feature selection and feature extraction techniques in machine learning. In 2014 Science and Information Conference, pages 372–378. IEEE.

Lin, W., Wu, Z., Lin, L., Wen, A., and Li, J. (2017). An ensemble random forest algorithm for insurance big data analysis. IEEE Access, 5:16568–16575.

McClenahan, C. L. (2014). Ratemaking. Wiley StatsRef: Statistics Reference Online.

Wickman, A. E. (1999). Insurance data and intellectual property issues. In Casualty Actuarial Society Forum, Winter 1999, Including the Ratemaking Discussion Papers, page 309. Citeseer.

Xie, S. (2019). Defining geographical rating territories in auto insurance regulation by spatially constrained clustering. Risks, 7(2):42.

Xie, S. and Lawniczak, A. (2018). Estimating major risk factor relativities in rate filings using generalized linear models. International Journal of Financial Studies, 6(4):84.
