Igor Horvat, Jože Rovan and Anuška Ferligoj

SEGMENTATION OF RETAIL BANK MARKET

Abstract

The goal of the study is to classify private customers of the Slovene bank market into segments (groups), to determine the characteristics of each segment, and to determine the market share of each segment. The segments are determined in such a way that customers within each segment are as similar as possible according to three types of characteristics: a) the use of bank services, b) demographic characteristics, and c) the behavior of customers in relation to the banks. The variables are measured on different types of scales, and therefore a combination of two statistical methods was used:
– hierarchical clustering and
– multiple correspondence analysis (MCA).

Using each method separately, we can obtain only partial answers to the research goal. Hierarchical clustering gives the groups of customers, but because of the large number of mixed variables considered, it does not yield a simple description of the segments. MCA, on the other hand, identifies the relationships between categories but does not classify the observed units into segments. Therefore, the combined use of both approaches is proposed. The clustering of units into segments is obtained by hierarchical clustering, so that each unit is assigned to a segment. The membership of units in segments defines a new variable. This segment variable is included in MCA as a supplementary variable. In this way the characteristics of the formed segments can be recognized. The private customers of the Slovene bank market can be classified into seven different segments.

ANALYSIS OF SEGMENTS IN THE RETAIL BANK MARKET (ANALIZA SEGMENTOV NA BANČNEM TRGU OBČANOV)

Abstract

The main goal of the analysis is to classify individuals in the Slovene bank market into segments (groups), to define the characteristics that determine each segment, and to establish the market shares of the segments. The segments are determined so that units (individuals) within each segment are as similar as possible and units in different segments are as different as possible. Similarity and difference derive from three groups of characteristics: the use of bank services, demographic characteristics, and the relationship between individuals and banks. Because the variables studied are measured on different types of scales, two statistical methods had to be used in the analysis:
– hierarchical clustering of units into groups and
– multiple correspondence analysis (MCA).

Using each of the two methods separately answers the research questions only partially. Hierarchical clustering determined the membership of individuals in segments, but because of the large number of mixed variables the characteristics of the segments could not be described simply. MCA, on the other hand, makes it possible to identify relationships between categories but does not group individuals into segments. We therefore propose the combined use of both methods. Hierarchical clustering yields the values of a new segment variable, which defines the membership of individuals in segments. This variable is then included in MCA as a supplementary variable. In this way the characteristics of the segments can be identified. The Slovene retail bank market comprises seven different segments.

1 Introduction

A bank market is a structure that banks treat either as one homogeneous group of customers or as a number of internally homogeneous and externally heterogeneous groups, called segments. Segments are formed in such a way that units within each segment are as similar as possible according to the characteristics (variables) that determine the use of bank services. Banks have two possibilities:
– they can focus on the bank market as a whole or
– they can specialize in servicing just a few segments and adjust their services to the specific needs of these segments.

Focusing on a few segments leads to high satisfaction of the customers. They have a feeling that the bank has adjusted its service to their needs. This may be the reason for choosing a certain bank.

2 Research Objectives

This market research focuses on the retail bank market (the bank market of individuals). Segments will be formed according to the characteristics of the bank market, with the aim of achieving as much heterogeneity between segments as possible. Because of its complexity, the bank market cannot be segmented using only one variable (for instance age, education, or income); a set of many variables should be considered instead.

The use of bank services on the retail bank market is mainly determined by three types of variables:
– the variables that describe the use of bank services,
– demographic characteristics, and
– the variables that describe the relationship between the customers and the banks.

The aim of the study is to:
– classify individual customers of the Slovene bank market into segments,
– determine the characteristics of each segment, and
– determine the market share of each segment.

3 Data Acquisition

Since 1991, the marketing agency »PR plus RM« from Maribor has been conducting various types of surveys for the Slovenian banking pool. To analyze the use of bank services by individuals, a sample survey was conducted from May 23rd to June 28th, 1997. In interviews, 2809 randomly selected individuals older than 15 years answered questions about the reputation of banks, the use and quality of various bank services, loyalty to the banks, and other topics such as demographic characteristics of the customers and their lifestyle.

The first group of variables describes the use of bank services:
– savings booklet; 1U - use, 1N - not use
– giro account; 2U, 2N
– current account; 3U, 3N
– savings booklet (foreign currency); 4U, 4N
– current account (foreign currency); 5U, 5N
– Slovene debit card; 6U, 6N
– extra overdraft on current account; 7U, 7N
– long term and short term deposits; 8U, 8N
– processing payment orders and other transfers; 9U, 9N
– short term personal loans; 10U, 10N
– long term personal loans; 11U, 11N
– withdrawal of cash through the ATMs; 12U, 12N
– paying the payment orders through the ATMs; 13U, 13N, and
– direct debit; 14U, 14N.

The second group of variables describes demographic characteristics:
– age of the individuals; A1 - 15-17 years, A2 - 18-29 years, A3 - 30-39 years, A4 - 40-49 years, A5 - 50-59 years, A6 - 60 years and over,
– the size of the household; SH1 - 1 or 2 persons, SH2 - 3 persons, SH3 - 4 persons, SH4 - 5 persons and more,
– level of education; EDNA - no answer, EDPS - at most primary school, EDLP - lower professional degree, EDSP - secondary professional degree, EDCO - college or more,
– monthly income; I1 - up to 30,000 SIT, I2 - 30,001-70,000 SIT, I3 - 70,001-150,000 SIT, I4 - 150,001-250,000 SIT, I5 - 250,001 SIT and over, INA - no answer, and
– employment; EMNA - no answer, EMTM - top manager, EMMM - middle manager, EMEN - entrepreneur, EMSEP - self employed professional, EMEP - employed professional, EMCL - clerk, EMWO - worker, EMFA - farmer, EMST - pupil or student, EMHW - housewife, EMUN - unemployed, EMRE - retired person, EMCR - craftsman.

The third group of variables describes the relationship between the customers and the banks:
– loyalty to the bank; LPO - loyal in any case (poor offer), LBO - loyal-the best offer, LSB - intend to go to another Slovene bank, LFB - intend to go to a foreign bank,
– frequency of visiting the bank; FV1 - less than once a month, FV2 - once a month, FV3 - twice a month, FV4 - once a week, FV5 - twice a week or more, and
– perception of the change in quality of bank services; PQNA - no answer, PQIM - improvement of the quality, PQST - stagnation of the quality, PQDE - decrease in the quality, PQUN - unawareness of bank quality changes.

4 Analytical Methods

To answer the questions about segmentation of the Slovene retail bank market, two statistical methods were used:
– hierarchical clustering and
– multiple correspondence analysis.

Both methods are described in the next two sections.

4.1 Hierarchical Clustering

Hierarchical clustering consists of two steps: – calculation of the matrix of similarity or dissimilarity and – sequential joining of units (groups) into new groups.

In this case, nominal and ordinal variables were used. One way to calculate measures of similarity for such variables is to represent the original variables by new binary variables. A dichotomous variable is represented by one binary variable, and a polytomous variable by as many binary variables as it has distinct values. This way of representing the values of categorical variables is called dummization (dummy coding).
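The coding rule described above can be illustrated with a minimal Python sketch. It is not the authors' SAS code; the function name `dummy_code`, the variable names, and the two sample units are hypothetical, and only the category codes (1U, 1N, SH1 to SH4) come from the variable list.

```python
def dummy_code(units, categories):
    """Turn categorical answers into 0/1 vectors (hypothetical helper).

    units: list of dicts mapping variable name -> observed category
    categories: dict mapping variable name -> ordered list of its categories
    """
    rows = []
    for unit in units:
        row = []
        for variable, levels in categories.items():
            if len(levels) == 2:
                # Dichotomous variable: a single binary column (1 = first level).
                row.append(1 if unit[variable] == levels[0] else 0)
            else:
                # Polytomous variable: one binary column per category.
                row.extend(1 if unit[variable] == level else 0 for level in levels)
        rows.append(row)
    return rows


# Hypothetical data: one dichotomous and one polytomous variable.
categories = {
    "savings_booklet": ["1U", "1N"],
    "household_size": ["SH1", "SH2", "SH3", "SH4"],
}
units = [
    {"savings_booklet": "1U", "household_size": "SH2"},
    {"savings_booklet": "1N", "household_size": "SH4"},
]
coded = dummy_code(units, categories)
print(coded)
```

Each unit ends up with one column for the savings booklet and four columns for household size, of which exactly one is 1.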

The starting point for calculating similarity measures is an association table for a pair of units. Frequency a is the number of binary variables for which the answer is positive for both units U1 and U2 (+ + matching), frequency d is the number of binary variables for which the answer is negative for both units (– – matching), and frequencies b and c are the numbers of binary variables with opposite answers (+ – and – + matching, respectively).

Association table

                   unit U2
                   +     –
unit U1      +     a     b
             –     c     d

The frequencies in the association table are the basis for calculating the measures of similarity. In our case the Jaccard measure of similarity was used, because it does not take the – – matching into account. Measures that also count the – – matching are not appropriate here, because dummization artificially increases the number of – – matches.

Jaccard measure

$$ s = \frac{a}{a + b + c} $$
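The association counts and the Jaccard measure can be sketched in Python as follows. This is an illustration and not the authors' SAS data step; the function names and the two example vectors are hypothetical, and returning 0 when a + b + c = 0 is an assumption made only to keep the sketch total.

```python
def association_counts(u1, u2):
    """Return (a, b, c, d) for two equally long 0/1 vectors."""
    a = sum(1 for x, y in zip(u1, u2) if x == 1 and y == 1)  # + + matching
    b = sum(1 for x, y in zip(u1, u2) if x == 1 and y == 0)  # + - matching
    c = sum(1 for x, y in zip(u1, u2) if x == 0 and y == 1)  # - + matching
    d = sum(1 for x, y in zip(u1, u2) if x == 0 and y == 0)  # - - matching
    return a, b, c, d


def jaccard_similarity(u1, u2):
    """s = a / (a + b + c); the - - matches (d) are ignored."""
    a, b, c, _ = association_counts(u1, u2)
    if a + b + c == 0:
        return 0.0  # assumption: no positive answers at all
    return a / (a + b + c)


def jaccard_dissimilarity(u1, u2):
    """Complement of the similarity, 1 - s."""
    return 1.0 - jaccard_similarity(u1, u2)


# Hypothetical dummy-coded units.
u1 = [1, 0, 1, 0, 0]
u2 = [1, 0, 0, 0, 1]
print(association_counts(u1, u2), jaccard_similarity(u1, u2))
```

In this example the two – – matches do not raise the similarity, which is the property that motivated the choice of the Jaccard measure.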

The values of s are defined on the [0,1] interval and can be transformed into dissimilarity measures using the equation d = 1 − s. The units are joined into groups on the basis of the values of the similarity or dissimilarity measures. Two units (groups) are joined into a new group according to the value of some criterion function. In this case the Ward method was selected, because the minimum (single linkage) and maximum (complete linkage) methods caused a chaining problem.

Ward method

$$ d(C_i \cup C_j, C_k) = \frac{(n_i + n_j)\, n_k}{n_i + n_j + n_k}\, d^2(T_{ij}, T_k) $$

where
$T_{ij}$ - centroid of the joined group $C_i \cup C_j$
$T_k$ - centroid of the group $C_k$
$n_i$ - number of units in the group $C_i$
$d$ - measure of dissimilarity
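The Ward formula can be evaluated with a short Python sketch. The sketch assumes squared Euclidean distance between centroids, which is an assumption for illustration only; the study itself works with a Jaccard-based dissimilarity in SAS, and the function names and example groups below are hypothetical.

```python
def centroid(group):
    """Coordinate-wise mean of the units (lists of numbers) in a group."""
    n = len(group)
    return [sum(column) / n for column in zip(*group)]


def squared_euclidean(p, q):
    """Squared Euclidean distance (assumed here in place of d^2)."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def ward_dissimilarity(group_ij, group_k):
    """(n_i + n_j) * n_k / (n_i + n_j + n_k) * d^2(T_ij, T_k).

    group_ij holds the units of the already joined group C_i U C_j,
    so len(group_ij) equals n_i + n_j.
    """
    n_ij = len(group_ij)
    n_k = len(group_k)
    d2 = squared_euclidean(centroid(group_ij), centroid(group_k))
    return n_ij * n_k / (n_ij + n_k) * d2


# Hypothetical groups of dummy-coded units.
joined = [[1, 0], [1, 1]]   # C_i U C_j, centroid (1, 0.5)
other = [[0, 0]]            # C_k, centroid (0, 0)
print(ward_dissimilarity(joined, other))
```

The weight (n_i + n_j) n_k / (n_i + n_j + n_k) grows with group sizes, so joining two large groups is penalized more than joining two small groups at the same centroid distance.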

The joining process was continued until 8 groups remained; further joining would not produce meaningful groups. In this way each unit is assigned to a segment. The membership of units in a segment is a new variable, called the "segment variable". The values of the segment variable are labeled with letters from A to H.
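Once every unit carries a value of the segment variable, the relative size of each segment follows directly from the label counts. A minimal Python sketch, with a hypothetical function name and made-up labels rather than the survey data:

```python
from collections import Counter


def segment_shares(segment_variable):
    """Share of units per segment label, as fractions of all units."""
    counts = Counter(segment_variable)
    total = len(segment_variable)
    return {segment: counts[segment] / total for segment in sorted(counts)}


# Hypothetical segment labels for eight units (not the survey data).
labels = ["A", "H", "H", "C", "A", "H", "G", "H"]
shares = segment_shares(labels)
print(shares)
```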

Because of the numerous calculations, hierarchical clustering is a memory-consuming process. For that reason many statistical tools can process no more than a few hundred observations. With 2809 units to analyze we also faced this problem, but overcame it using the SAS system. The similarity matrix was calculated with the "data step" and hierarchical clustering was carried out with the "proc cluster" procedure.

4.2 Multiple Correspondence Analysis

Multiple correspondence analysis (MCA) is a multivariate method for exploring cross-tabular data. A prime output of this method is usually a planar map, in which categories of the considered variables are represented by points. The proximities between points represent the affinities between categories, i.e. two points that lie close together in the map represent categories that refer mainly to the same units.

The common starting point of MCA is the Burt matrix. It is a partitioned symmetric matrix containing all pairwise crosstabulations among a set of categorical variables. The main diagonal consists of diagonal submatrices of the marginal frequencies (i.e. the crosstabulation of each variable with itself). Each off-diagonal crosstabulation is an ordinary two-way contingency table.
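The structure of the Burt matrix can be made concrete with a small Python sketch. It builds a full indicator matrix Z (one column per category of every variable) and forms the product of its transpose with itself. The function name, variable names, and the three sample units are hypothetical.

```python
def burt_matrix(units, categories):
    """Return (labels, B) where B = Z'Z for the full indicator matrix Z."""
    # One indicator column per category of every variable.
    columns = [(var, level) for var, levels in categories.items() for level in levels]
    z = [[1 if unit[var] == level else 0 for var, level in columns] for unit in units]
    size = len(columns)
    burt = [
        [sum(row[p] * row[q] for row in z) for q in range(size)]
        for p in range(size)
    ]
    labels = [level for _, level in columns]
    return labels, burt


# Hypothetical data: two variables with two categories each, three units.
categories = {
    "savings_booklet": ["1U", "1N"],
    "household_size": ["SH1", "SH2"],
}
units = [
    {"savings_booklet": "1U", "household_size": "SH1"},
    {"savings_booklet": "1U", "household_size": "SH2"},
    {"savings_booklet": "1N", "household_size": "SH2"},
]
labels, burt = burt_matrix(units, categories)
for label, row in zip(labels, burt):
    print(label, row)
```

The two 2 x 2 blocks on the main diagonal are diagonal matrices of marginal frequencies, and the off-diagonal block is the ordinary contingency table of the two variables.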

Spectral decomposition of the equation based on the Burt matrix (Dimovski, Rovan, 1998) results in a diagonal matrix of principal inertias and a matrix of standardized coordinates. MCA includes the fitting of the diagonal submatrices of the Burt table. As a result, the total inertia is inflated, and the proportions of the total inertia accounted for by the first few principal inertias are reduced. One of the ways to overcome this deficiency is to calculate modified inertias according to Benzécri's formula (Benzécri, 1979). The next question is how well the positions of the profiles are represented by the first few principal coordinates. For that purpose Greenacre's formula for calculating relative modified inertias was used (Greenacre, 1994).
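The paper cites these two formulas without stating them. As they are commonly given in the correspondence analysis literature, and with notation that is assumed here rather than taken from the paper ($Q$ for the number of variables, $J$ for the total number of categories, $\lambda_k$ for the $k$-th principal inertia of the Burt matrix), they can be written as follows. The modified inertias are computed only for axes with $\sqrt{\lambda_k} > 1/Q$:

$$ \lambda_k^{\mathrm{mod}} = \left( \frac{Q}{Q-1} \right)^2 \left( \sqrt{\lambda_k} - \frac{1}{Q} \right)^2 $$

The relative modified inertias then divide each modified inertia by an adjusted total inertia, where the sum runs over all principal inertias of the Burt matrix:

$$ \tau_k = \frac{\lambda_k^{\mathrm{mod}}}{\dfrac{Q}{Q-1} \left( \sum_{m} \lambda_m - \dfrac{J-Q}{Q^2} \right)} $$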

Vectors of standardized coordinates are multiplied by the square roots of modified inertias to get vectors of principal coordinates of the observed categories. The position of categories of the variables considered is determined by the values of principal coordinates and can be presented in an optimal subspace of appropriate dimension. When the sum of the first two relative modified inertias is high enough it is recommended to choose a two-dimensional map (two-dimensional subspace), because of its simplicity. When a representation of position of categories in higher dimensional subspace is needed, a three-dimensional map or Andrews’ curves (Rovan, 1994) should be used.

In this case, the sum of the first two relative modified inertias is relatively high, at 78.31%. For that reason the positions of the categories are represented in a two-dimensional map.

5 Complementary Use of Both Methods

As mentioned before, the primary aim of the study was to classify the individuals on the retail bank market into segments, and to determine the characteristics of each segment. MCA is able to determine the relations between categories, but does not enable the classification of units into segments. On the other hand, hierarchical clustering efficiently defines the groups of individuals, but it is unable to give a simple description of the segments, because of large number of variables considered.

The research aim can be achieved by combining both analytical methods. First, the units are joined into segments by hierarchical clustering. In this way each unit is assigned to an appropriate segment. The membership of units in a certain segment is a new variable, called the "segment variable". This segment variable is included in MCA as a supplementary variable. Second, on the basis of MCA, all categories of the primary and supplementary variables are presented in an optimal subspace. The positions of the categories of the primary variables are determined solely by the values of these variables and not by the supplementary variable. On the basis of the positions of all the categories, the nature of the formed segments can be determined.

6 Results

The results are presented in a two-dimensional map (Figure 1). According to the positions of the categories, the first axis represents three ordinal variables (level of education, income, and frequency of visiting the bank). The values of all three ordinal variables increase from left to right. It is much more difficult to determine the meaning of the second axis. To some extent it represents the age of individuals, but the broken line that connects the age groups is not parallel to the second axis. The younger age groups are located in the lower part of the map and the older age groups are in the upper part.

The map includes the segments obtained by the hierarchical clustering of units. The groups (segments) are presented graphically as circles, and the area of each circle corresponds to the relative size of the segment.

The picture presents the structure of the Slovene retail bank market, determined by demographic characteristics of individuals, the use of bank services, and the relationship between the individuals and the banks. The groups are defined in the following way:
– A (13%) - older retired persons. This is a segment of old (A6), mainly retired persons (EMRE). Individuals with a lower level of education (EDPS) and lower income (I1, I2) dominate in this segment, and they do not use many bank services.
– B (9%) - younger retired persons. This is a segment of older persons (50 to 59 years, A5), either just before retirement or recently retired. Compared with the segment of older retired persons, their income is somewhat higher, they have a higher level of education, and they use more bank services.
– C (15%) and D (3%) - young individuals. This segment consists mainly of pupils and students. The main characteristics of this group are the lowest income and non-use of bank services. This segment is represented by two groups (C and D), because further joining would not merge these two groups but some others, which would make no sense in this context. In the future many units from this segment will become very interesting for the banks, and they therefore deserve special attention.
– E (8%) - nonattractive individuals. This segment is composed of housewives (EMHW), unemployed persons (EMUN), and farmers (EMFA). They have the lowest level of education and the lowest income, and they do not use bank services. Normally, the banks are not interested in this segment.
– F (15%) - workers. This segment is characterized by a lower professional degree (EDLP), a secondary professional degree (EDSP), and lower income (I2). Workers visit banks at most twice a month (FV1, FV2, and FV3), and they remain loyal to their bank only if the bank keeps a high level of service quality (LBO). They use a very limited number of bank services, mainly the savings booklet (1U).
– G (16%) - less demanding individuals. This segment is composed mainly of clerks (EMCL) and craftsmen (EMCR). They visit the bank almost every week (FV4), and they are not loyal to their bank (LSB, LFB). They mainly use the current account (3U), giro account (2U), savings booklet in foreign currency (4U), processing of payment orders and other transfers (9U), and withdrawal of cash through the ATMs (12U).
– H (22%) - demanding individuals. This segment is composed mainly of individuals with the highest income (I4, I5) and the highest level of education (EDCO). They are entrepreneurs (EMEN), middle managers (EMMM), top managers (EMTM), employed professionals (EMEP), and self employed professionals (EMSEP). They also use an extra overdraft on the current account (7U), payment of payment orders through the ATMs (13U), the Slovene debit card (6U), short term (10U) and long term personal loans (11U), short term and long term deposits (8U), a current account in foreign currency (5U), and direct debit (14U).

Figure 1: Representation of the categories of the considered variables and the segments in the optimal subspace. Segment labels shown in the map: A 13%, B 9%, C 15%, D 3%, E 8%, F 15%, G 16%, H 22%.

7 Conclusion

With the combination of hierarchical clustering and multiple correspondence analysis, the analytical question about segmentation of Slovene retail bank market was answered.

The segmentation used in Slovene banks is mainly the result of empirical experience and is usually based on differences in the income and age of individuals. We believe that the process of segmentation can be improved by considering not just a few variables, but a set of many variables.

Because of the transitional processes, the Slovene bank market is facing substantial changes. At this moment we are particularly interested in the direction of future changes. For that reason we intend to include the time component in our further analysis.

REFERENCES

Benzécri, J.P. (1979). »Sur le calcul des taux d’inertie dans l’analyse d’un questionnaire. Addendum et erratum à [BIN.MULT].« Cahiers de L’analyse des Données, 4, pp. 377-378.
Dimovski, V., Rovan, J. (1998). Environmental turbulence in the credit union industry: A multiple correspondence analysis approach, in A. Ferligoj (ed.), Advances in Methodology, Data Analysis, and Statistics, pp. 49-60. Ljubljana: FDV, Serija: Metodološki zvezki 14.
Ferligoj, A. (1989). Metodološki zvezki, št. 4, Razvrščanje v skupine (in English: Classification into groups). Ljubljana: Jugoslovansko združenje za sociologijo, Sekcija za metodologijo in statistiko.
Greenacre, M.J. (1984). Theory and Applications of Correspondence Analysis. London: Academic Press.
Greenacre, M.J. (1994). Multiple and joint correspondence analysis, in M.J. Greenacre and J. Blasius (eds.), Correspondence Analysis in the Social Sciences, pp. 141-161. London: Academic Press.
Horvat, I. (1996). Uporaba bančnih storitev s strani podjetij s korespondenčno analizo (in English: The use of the bank services by companies with correspondence analysis). Ljubljana: Univerza v Ljubljani, Ekonomska fakulteta.
PR plus RM (1997). Raziskava ugleda, kakovosti storitev in tržne pozicije slovenskih bank med občani (in English: The investigation of the reputation, the quality of bank services and the market position of Slovene banks by the individuals). Maribor: PR plus RM.
Rovan, J. (1991). The role of the Andrews’ curves in correspondence analysis, in Proceedings of the SAS European Users Group International Conference, pp. 460-474. SAS Institute.
Rovan, J. (1994). Visualising solutions in more than two dimensions, in M.J. Greenacre and J. Blasius (eds.), Correspondence Analysis in the Social Sciences, pp. 210-229. London: Academic Press.
Rovan, J., Mramor, D., Horvat, I. (1997). The use of bank services by the companies. Slovenska ekonomska revija, Ljubljana, št. 6, letnik 48, december 1997, pp. 507-517.

Questions and comments on this paper can be addressed to:

Igor Horvat, B.A. (Econ), Nova Ljubljanska banka d.d., Ljubljana, Trg republike 2, 1520 Ljubljana, Slovenija, E-mail: [email protected]
Jože Rovan, Ph.D., Assistant Professor of Statistics, Faculty of Economics, University of Ljubljana, Kardeljeva ploščad 17, 1109 Ljubljana, E-mail: [email protected]
Anuška Ferligoj, Ph.D., Professor of Statistics, Faculty of Social Sciences, University of Ljubljana, Kardeljeva ploščad 5, 1109 Ljubljana, E-mail: [email protected]
