Market basket analysis with neural gas networks and self-organising maps
Received (in revised form): 3rd March, 2003

Reinhold Decker is Professor and Head of Marketing at the University of Bielefeld, and a visiting professor in marketing at universities in Austria, Russia and Bulgaria. He is author and co-author of numerous publications in journals, conference proceedings and compilations focussing on marketing research and data analysis as well as the author and co-editor of academic books. He is a member of several academic societies and serves as a referee for different journals.

Katharina Monien received her diploma in mathematics in 2001 and is a lecturer in marketing at the Department of Economics and Business Administration, University of Bielefeld. Her research interest is the application of neural networks and machine learning in marketing.

Abstract Market basket analysis has been an elementary part of quantitative decision support in retail marketing for many years and it is regularly cited as a prime application area of data mining. In this paper two competitive neural network approaches are presented and discussed with respect to their suitability for purchase interdependence analysis on the product category level. Particular attention is paid to the user-oriented representation or visualisation of cross-category dependences. Both approaches are applied to point of sales scanner data provided by a German retail chain to check how far they are able to uncover presumed purchase interdependences.

Katharina Monien, Department of Economics and Business Administration, University of Bielefeld, PO Box 10 01 31, 33501 Bielefeld, Germany. Tel: +49 (0)521 106 4844; Fax: +49 (0)521 106 6456; e-mail: kmonien@wiwi.uni-bielefeld.de

© Henry Stewart Publications 1479-1862 (2003) Vol. 11, 4, 373-386 Journal of Targeting, Measurement and Analysis for Marketing

INTRODUCTION
Market basket analysis has not only been an important topic of traditional retail marketing for more than 25 years¹⁻³ but has also gained an increasing relevance for electronic retailing.⁴ According to Russell and Petersen⁵ market basket analysis focuses on the decision process in which a consumer selects items from a given set of product categories on the same shopping trip. Correspondingly, the term 'purchase interdependence' is used in the following pages to refer to interrelations between different elements of a retail assortment resulting from purchases that have already been carried out.

The analysis of market basket data has experienced a renaissance as a result of various publications on data mining and knowledge discovery in databases since 1993.⁶⁻⁹ In these papers market basket analysis is partly regarded as a typical field of application for data mining and the main emphasis is put on association rule-based approaches. In Decker and Schimmelpfennig¹⁰ the traditional association coefficient-based approach (measuring purchase interdependence by means of cluster analysis and multidimensional scaling) is compared with a rule-based approach reflecting on electronic retailing. In the present paper both a self-organising map and a neural gas network approach are investigated with respect to their suitability for purchase interdependence analysis on the product category level.

The practical benefit of such approaches strongly depends on their ability to uncover 'relevant' interrelations in the assortment to be analysed and the extent to which one succeeds in adequately documenting or visualising the same for decision support in retail marketing. It may be that existing asymmetric interdependences between individual product categories and the inevitable occurrence of 'random take away effects'¹¹ are special challenges in this context. At the same time purchase interdependence analysis is confronted with a continuously growing database resulting from point of sales (POS) scanning. All these aspects have to be kept in mind when a new approach is discussed.

The rest of the paper is organised as follows. In the following section a description of the self-organising map as well as the less well-known neural gas network approach are presented as tools for purchase interdependence analysis. Then easy-to-interpret representations of purchase interdependences are proposed in the next section. Finally, a reality-based application of both approaches is presented. The paper closes with some conclusions and a short outlook.

PURCHASE INTERDEPENDENCE ANALYSIS BY MEANS OF SELF-ORGANISING MAPS AND NEURAL GAS NETWORKS
Both approaches introduced for purchase interdependence analysis start with two basic assumptions. First, the measurement of purchase interdependence is meaningful only on the multicategory level. This means that all product categories of interest have to be considered simultaneously. Secondly, interrelations of the interesting kind usually lead to similar 'market basket patterns'. Two market baskets have a similar pattern if they are characterised by a more or less identical combination of product categories.

To identify those interrelations the authors adapted the neural gas network (NGN) approach introduced by Martinetz and Schulten¹² and the well-known self-organising map (SOM) approach introduced by Kohonen¹³ to the problem on hand. In both cases the relevant patterns are represented by 'neurons' (units). The formal counterpart of such a unit is a weight vector. When the training of the neural network is finished, that is to say after the net weights have been fitted to the given data, the resulting final weight vectors are called prototypes.

Before starting the methodological considerations some symbols have to be introduced. Let $n$ be the number of interesting product categories and $m$ the unrestricted number of individual market baskets to be analysed. Then an individual market basket can be defined by binary transaction vector $t_j = (t_{j1}, t_{j2}, \ldots, t_{jn})'$ with $j \in \{1, \ldots, m\}$, where $t_{jk} = 1$ if market basket $j$ contains at least one item of product category $k \in \{1, \ldots, n\}$ and $t_{jk} = 0$ otherwise.

Essentials of the SOM approach
In most applications the units of an SOM are organised on a two dimensional grid where the individual positions reflect the interrelations between the respective units.¹⁴ In the following, each unit $u_h$ ($h \in \{1, \ldots, p\}$) is represented by a weight vector $\eta_{ab} = (\eta_{ab1}, \ldots, \eta_{abk}, \ldots, \eta_{abn})'$ where $a$ and $b$ refer to the position of the unit within a rectangular grid and $0 \le \eta_{abk} \le 1$ holds true.
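The binary coding of market baskets introduced with the notation above can be illustrated with a short Python sketch. It is not the authors' implementation (the paper reports using SAS); the function name `encode_baskets` and the toy baskets are hypothetical and serve only to show how a basket becomes a transaction vector $t_j$.

```python
def encode_baskets(baskets, n_categories):
    """Turn each basket (iterable of category numbers 1..n) into a 0/1 vector t_j."""
    vectors = []
    for basket in baskets:
        t = [0] * n_categories
        for k in basket:
            if not 1 <= k <= n_categories:
                raise ValueError(f"category {k} outside 1..{n_categories}")
            t[k - 1] = 1  # presence only: several items of one category still give 1
        vectors.append(t)
    return vectors


# Hypothetical toy data: three baskets over n = 5 product categories.
toy_baskets = [[1, 2, 2], [4], [1, 5]]
transaction_vectors = encode_baskets(toy_baskets, n_categories=5)
```

Because only presence is recorded, the first toy basket, which holds two items of category 2, gets the same entry as a basket holding one.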


Applying the SOM approach to POS scanner data in order to identify existing purchase interdependences means carrying out two different tasks simultaneously: finding an optimal set of prototypes representing similar market baskets and ensuring the optimal topological arrangement of these prototypes. In the resulting map similar weight vectors (representing similar market basket patterns) are located close together. How to interpret such a map in detail with respect to purchase interdependence will be shown in the empirical part of the paper.

The net training process looks like this: at the beginning, all weights $\eta_{abk}$ have to be initialised in a suitable way. In the following iterations $l \in \{1, \ldots, l_{max}\}$ the distances between each weight vector $\eta_{ab}$ and a randomly chosen input vector (market basket) $t_j$ are computed according to

$$\mathrm{dist}(t_j, \eta_{ab}) = \lVert t_j - \eta_{ab} \rVert^2 = \sum_{k=1}^{n} (t_{jk} - \eta_{abk})^2 \qquad (1)$$

to determine the winning unit $u_{c_a c_b}$ with minimum distance. Then each weight vector has to be updated as follows:

$$\eta_{ab}(l+1) = \eta_{ab}(l) + \alpha(l) \cdot \mathrm{nhf}_{c_a c_b}(a,b) \cdot (t_j - \eta_{ab})$$

where learning rate $\alpha(l) = \alpha(0)(1 - l/l_{max})$ is a decreasing function of iteration $l$. The extent of this adaptation can be controlled via neighbourhood function

$$\mathrm{nhf}_{c_a c_b}(a,b) = \exp\left(-\frac{(a - c_a)^2 + (b - c_b)^2}{2\sigma(l)^2}\right)$$

where $\sigma(l) = \sigma(0)(1 - l/l_{max})$ determines the scope of the neighbourhood kernel. The whole procedure is repeated until the maximum number of iterations $l_{max}$ or any other a priori defined stopping criterion is reached. The minimisation of distances finally leads to the optimal prototype system $\{\eta_{c_a c_b}\}$.

To optimise the topological structure of the whole map the accumulated neighbourhood distance

$$\sum_{(c_a, c_b)} \mathrm{dist}(\eta_{c_a c_b}, \tilde{\eta}_{\tilde{c}_a \tilde{c}_b}) = \sum_{(c_a, c_b)} \sum_{(\tilde{c}_a, \tilde{c}_b)} \lVert \eta_{c_a c_b} - \tilde{\eta}_{\tilde{c}_a \tilde{c}_b} \rVert^2 = \sum_{(c_a, c_b)} \sum_{(\tilde{c}_a, \tilde{c}_b)} \sum_{k=1}^{n} (\eta_{c_a c_b k} - \tilde{\eta}_{\tilde{c}_a \tilde{c}_b k})^2 \qquad (2)$$

with $\tilde{u}_{\tilde{c}_a \tilde{c}_b} \in N_{c_a c_b}$ has to be minimised as well. Neighbourhood set $N_{c_a c_b}$ contains all units $\tilde{u}_{\tilde{c}_a \tilde{c}_b}$ that are topological neighbours of the winning unit $u_{c_a c_b}$ within the rectangular grid.

In the basic SOM approach the number of units $p$ has to be fixed in advance which can be restrictive in some cases. To avoid this disadvantage Alahakoon et al.¹⁵ have proposed a so-called growing self-organising map (GSOM) approach where both the size and the shape of the network are determined dynamically during the training process. Because of the fundamental nature of this investigation this aspect is not elaborated on at this point.

Essentials of the NGN approach
Again, each unit $u_h$ ($h \in \{1, \ldots, s\}$) is represented by a weight vector $\eta_h = (\eta_{h1}, \ldots, \eta_{hk}, \ldots, \eta_{hn})' \in [0;1]^n$. The dimensionality of these vectors is equal to the number of interesting product categories $n$. In contrast to the SOM approach, there is no a priori defined grid to be fitted.¹⁶

The similarity of both approaches also becomes apparent in the training process. After initialisation the distance between each weight vector $\eta_h$ and a randomly chosen binary transaction vector $t_j$ is computed in accordance to equation 1. In respect of these distances all units $u_h$ are arranged in such a way that the winning unit with weight vector $\eta_{win}$ takes rank $r(t_j, \eta_{win}) = 0$, the co-winning unit takes rank 1 and so on. This ranking explicitly determines the strength of the adaptation of the individual weight vectors in each iteration. It is

$$\Delta\eta_h(l) = \varepsilon(l) \cdot \mathrm{nhf}_{\lambda(l)}(r(t_j, \eta_h)) \cdot (t_j - \eta_h)$$

with neighbourhood function

$$\mathrm{nhf}_{\lambda(l)}(r(t_j, \eta_h)) = \exp\left(-\frac{r(t_j, \eta_h)}{\lambda(l)}\right)$$

and learning rates

$$\varepsilon(l) = \varepsilon_{Init} \cdot \left(\frac{\varepsilon_{End}}{\varepsilon_{Init}}\right)^{l/l_{max}} \quad \text{and} \quad \lambda(l) = \lambda_{Init} \cdot \left(\frac{\lambda_{End}}{\lambda_{Init}}\right)^{l/l_{max}}$$

respectively. Obviously, the numerical value of neighbourhood function $\mathrm{nhf}_{\lambda(l)}(r(t_j, \eta_h))$ decreases, other things being equal, with an increasing rank of the respective unit. Therefore, only those weight vectors that are close to the input signal $t_j$ are changed significantly according to:

$$\eta_h(l+1) = \eta_h(l) + \Delta\eta_h(l).$$

The basic NGN approach can be extended in different ways. For example, following Martinetz and Schulten¹⁷ a data-driven topological structure can be added to show the interrelations between units. For this purpose a connectivity matrix $C = (c_{h_1 h_2})_{h_1, h_2 = 1, \ldots, s}$ has to be introduced with

$$c_{h_1 h_2} = \begin{cases} 1, & \text{if unit } u_{h_1} \text{ and } u_{h_2} \text{ are connected} \\ 0, & \text{otherwise.} \end{cases}$$

Each time unit $u_h$ becomes the winning unit indicator variable $c_{h\tilde{h}}$ is set to 1 for the co-winning unit $u_{\tilde{h}}$. To take into account the strength of the connection between two units $u_h$ and $u_{\tilde{h}}$ an additional controlling variable $\mathrm{age}_{h\tilde{h}}$ is introduced. This variable is set to 0 if $c_{h\tilde{h}}$ is set to 1. The 'ages' of all the other connections to winning unit $u_h$ are raised by 1. If the 'age' of a connection exceeds the dynamically computed maximum

$$\mathrm{age}_{max}(l) = \mathrm{age}_{Init} \cdot \left(\frac{\mathrm{age}_{End}}{\mathrm{age}_{Init}}\right)^{l/l_{max}}$$

this connection is removed from the network. Threshold $\mathrm{age}_{max}(l)$ depends on pre-defined initial and final values $\mathrm{age}_{Init}$ and $\mathrm{age}_{End}$ as well as on iteration $l$.

Current extensions of the NGN approach focus on the speeding up of the training process or on the dynamic determination of the number of units to be included in the network. Atukorale and Suganthan,¹⁸ for example, have published a so-called implicit ranking scheme. The basic idea of this proposal is to redefine the neighbourhood function using a kind of normalised distance

$$q(t_j, \eta_h) = \frac{\mathrm{dist}(t_j, \eta_h) - \mathrm{dist}_{min}}{\mathrm{dist}_{max} - \mathrm{dist}_{min}}$$

instead of the original rank order to reduce training time. The maximum and minimum distances $\mathrm{dist}_{max}$ and $\mathrm{dist}_{min}$ between the current input signal $t_j$ and each weight vector have to be defined adequately. For the winning unit $q(t_j, \eta_{win})$ is equal to 0. Another modification suggested by the same authors is the so-called truncated update rule, where only those weight vectors are updated with normalised distances smaller than a dynamically computed threshold. In an own sensitivity study it became apparent that there are only negligible differences in the running times of these modifications, if $s \le 20$. Therefore, a detailed description was dispensed with and a decision made in favour of the original NGN approach for the investigations.
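The SOM and NGN training steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' SAS implementation: the function names, the dictionary and list layouts for the weights, and the representation of connections as a dictionary of edge ages are all hypothetical choices, and the default parameter values are simply those reported in the empirical part of the paper.

```python
import math
import random


def sq_dist(t, eta):
    """Squared Euclidean distance between basket t and weight vector eta (equation 1)."""
    return sum((tk - ek) ** 2 for tk, ek in zip(t, eta))


def som_step(weights, t, l, l_max, alpha0=0.7, sigma0=3.0):
    """One SOM iteration; weights maps a grid position (a, b) to a weight list."""
    ca, cb = min(weights, key=lambda pos: sq_dist(t, weights[pos]))  # winning unit
    frac = 1.0 - l / l_max                  # linear decay of alpha(l) and sigma(l)
    if frac <= 0.0:                         # at l = l_max both are zero: nothing to update
        return ca, cb
    alpha, sigma = alpha0 * frac, sigma0 * frac
    for (a, b), eta in weights.items():
        nhf = math.exp(-((a - ca) ** 2 + (b - cb) ** 2) / (2.0 * sigma ** 2))
        weights[(a, b)] = [e + alpha * nhf * (tk - e) for tk, e in zip(t, eta)]
    return ca, cb


def decay(init, end, l, l_max):
    """Schedule init * (end / init) ** (l / l_max), used for epsilon, lambda and age_max."""
    return init * (end / init) ** (l / l_max)


def ngn_step(weights, t, l, l_max, eps=(0.7, 0.005), lam=(1.0, 0.01)):
    """One NGN iteration; weights is a list of weight lists. Returns winner, co-winner."""
    eps_l = decay(eps[0], eps[1], l, l_max)
    lam_l = decay(lam[0], lam[1], l, l_max)
    order = sorted(range(len(weights)), key=lambda h: sq_dist(t, weights[h]))
    for rank, h in enumerate(order):        # rank 0 is the winning unit
        step = eps_l * math.exp(-rank / lam_l)
        weights[h] = [e + step * (tk - e) for tk, e in zip(t, weights[h])]
    return order[0], order[1]


def update_edges(ages, winner, co_winner, age_max):
    """Age the winner's other connections, drop those above age_max, refresh the new one."""
    key = frozenset((winner, co_winner))
    for edge in list(ages):
        if winner in edge and edge != key:
            ages[edge] += 1
            if ages[edge] > age_max:
                del ages[edge]              # connection removed from the network
    ages[key] = 0                           # c = 1 and age = 0 for winner/co-winner


# Hypothetical toy run with n = 3 product categories.
basket = [1, 0, 1]
grid = {(a, b): [0.5, 0.5, 0.5] for a in (1, 2) for b in (1, 2)}
grid[(1, 1)] = [0.9, 0.1, 0.9]
pos = som_step(grid, basket, l=1, l_max=100)

random.seed(0)
units = [[random.random() for _ in range(3)] for _ in range(4)]
before = [sq_dist(basket, u) for u in units]
win, co = ngn_step(units, basket, l=1, l_max=100)
after = [sq_dist(basket, u) for u in units]
```

In the NGN step every unit moves towards the input, but the step size shrinks exponentially with the rank, so units far from the basket barely change. A full training loop would call `update_edges` with `decay(10, 100, l, l_max)` as the current age threshold.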


Similarly to the GSOM approach, there is also a dynamic extension of the NGN approach called growing neural gas network. For a detailed description of this see Fritzke.¹⁹

REPRESENTATION OF INTERRELATIONS
As already emphasised at the beginning of the paper, an essential aspect of purchase interdependence analysis is the adequate documentation or visualisation of the uncovered interrelations. The availability of comprehensible representations (eg as an essential part of regular sales reporting) makes it easier to apply this information to marketing decisions, for instance, with respect to product placement and promotion pricing. In this section some procedures are proposed that can be applied to both the SOM and the NGN approach. For demonstration purposes the focus is on the latter.

Table 1 gives a general impression of the NGN output underlying the following considerations. After having finished the training process the weights of each unit $u_h$ can be arranged in descending order: $1 \ge \eta_{hk_1^h} \ge \ldots \ge \eta_{hk_i^h} \ge \ldots \ge \eta_{hk_n^h} \ge 0$. The corresponding 'ranking' of the individual product categories (abbreviated by PC) is displayed in the centre of the table whereas the share of market baskets $P(u_h)$ (with $\sum_{h=1}^{s} P(u_h) = 1$) which fit prototype $h$ is given in column 'Frequency'. Starting from this it is possible to visualise interesting interrelations both on the product category and the market basket level by means of a graph.

Table 1: Elements of the NGN output

Unit     Frequency    Ranking                                              Weights
$u_1$    $P(u_1)$     $PC_{k_1^1}$  $PC_{k_2^1}$  ...  $PC_{k_n^1}$        $\eta_{1k_1^1}$  $\eta_{1k_2^1}$  ...  $\eta_{1k_n^1}$
...      ...          ...                                                  ...
$u_s$    $P(u_s)$     $PC_{k_1^s}$  $PC_{k_2^s}$  ...  $PC_{k_n^s}$        $\eta_{sk_1^s}$  $\eta_{sk_2^s}$  ...  $\eta_{sk_n^s}$

Visualising purchase interdependences
To visualise possible interdependences on the product category level $P(PC_{k_i^h} \mid u_h) := \eta_{hk_i^h} \;\forall\, h, k_i^h$ (with $k_i^h \in \{1, \ldots, n\}$) is interpreted as the probability of observing product category $PC_{k_i^h}$ in a market basket that corresponds to prototype $h$. Then this probability is combined with the observed frequency of each prototype to get the compound probability $P(PC_{k_i^h} \cap u_h) = P(u_h) \cdot P(PC_{k_i^h} \mid u_h) \;\forall\, h, k_i^h$. To decide whether two product categories have to be connected in a relating graph because of their interdependence a user-defined threshold $d$ can be introduced. Product categories $k_1$ and $k_2$ are connected if both compound probabilities $P(PC_{k_1} \cap u_h)$ and $P(PC_{k_2} \cap u_h)$ are greater than $d$ for at least one $h$. Alternatively, a suitable elbow criterion can be applied. Figure 1 shows what such a graph might look like.

Product categories 6 and 7 are interdependent in the given sense but without having any other meaningful interrelation, whereas product category 2, for example, is part of a much more complex net. In the empirical application section such a graph will be created from real POS scanner data with presumed interrelations between individual product categories.
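The threshold rule for connecting product categories can be made concrete with a small Python sketch. The function name `interdependence_edges` and the dictionary layout are hypothetical. The numbers are the frequencies and five highest weights of prototypes h = 4 and h = 9 as reported in Table 3, together with the threshold d = 0.025 used in the empirical part; since only two of the ten prototypes are included, the result is only a fragment of the full graph.

```python
from itertools import combinations


def interdependence_edges(frequency, weights, d):
    """Connect two categories if both compound probabilities exceed d in one prototype.

    frequency[h] is P(u_h); weights[h] maps category k to eta_hk = P(PC_k | u_h).
    """
    edges = set()
    for h, p_unit in frequency.items():
        # Compound probability P(PC_k and u_h) = P(u_h) * P(PC_k | u_h)
        compound = {k: p_unit * eta for k, eta in weights[h].items()}
        strong = sorted(k for k, p in compound.items() if p > d)
        edges.update(combinations(strong, 2))   # every pair of strong categories
    return edges


# Prototypes h = 4 and h = 9 from Table 3 (five highest weights each).
frequency = {4: 0.125, 9: 0.170}
weights = {
    4: {9: 0.858, 10: 0.373, 8: 0.362, 11: 0.331, 12: 0.242},
    9: {1: 0.926, 2: 0.554, 3: 0.304, 25: 0.088, 6: 0.051},
}
edges = interdependence_edges(frequency, weights, d=0.025)
```

With these inputs the five baby-food categories of prototype 4 are connected pairwise, while in prototype 9 only the hair care categories 1, 2 and 3 pass the threshold; categories 25 and 6 stay unconnected there.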


Figure 1 Possible structure of a simple purchase interdependence graph (vertices: product categories 1 to 7)

Heterogeneity and asymmetry in market basket patterns
Using connectivity matrix $C$ a graph that represents existing similarities between individual prototypes is created. Two units are 'neighbouring' if they are represented by similar weight vectors. Each edge of the graph is weighted by the reciprocal age $1/\mathrm{age}_{h_1 h_2}$. The longer a connection has not been confirmed during network training the lower the weight of this edge. If a continuously growing number of market baskets have to be represented (eg as a result of daily POS scanning in retail stores) the current relevance of a connection can be expressed dynamically in this way. A high number of edges with comparatively low weights point to distinctive differences (heterogeneity) with respect to the underlying buying patterns.

Furthermore, using the formula of Bayes, asymmetries on the market basket level can be described. Let

$$P(u_h \mid PC_k) = \frac{P(PC_k \cap u_h)}{P(PC_k)} = \frac{P(u_h) \cdot P(PC_k \mid u_h)}{\sum_{\tilde{h}=1}^{s} P(u_{\tilde{h}}) \cdot P(PC_k \mid u_{\tilde{h}})} \qquad (3)$$

be the probability of observing pattern (prototype) $h$ when product category $PC_k$ is already an element of an arising market basket. The computation of these probabilities can provide valuable hints about those product categories which induce the purchase of items of the remaining categories. From a sales-oriented point of view those product categories are of particular interest to the retail management which induce market baskets containing 'profitable' product categories with high probability. Information of this kind is useful primarily for periodical promotion planning. Figure 2 illustrates the general structure of an individual vertex of the graph. The given probabilities result from equation 3.

Figure 2 General structure of vertices

EMPIRICAL APPLICATION TO POS SCANNER DATA

Data description
To demonstrate their general suitability both neural network approaches were applied to POS scanner data collected by a German retail chain in the mid-1990s. For illustration purposes 25 product categories from the chemist's assortment were selected. Accompanying investigations of the available data have shown that the following considerations can also be transferred without restrictions to other categories of products in everyday use. In this context, it should be mentioned that in this implementation, using SAS Release 8.02, neither the SOM nor the NGN approach is restricted with respect to the maximum number of market baskets to be analysed. But, of course, an upper bound may result from the storage capacity of the employed hardware.

The data referred to in Table 2 result from 2,079 market baskets. Product categories with possible purchase interdependences have been put together in separate fields. Each market basket is coded with a binary vector where 1 indicates the occurrence and 0 the non-occurrence of the respective product category. The total number of items of a product category in a basket is not considered. In doing this it is possible to abstract from biasing stock-keeping effects. The transformation of the original data (containing, for example, information on price, time and date of purchase) into binary vectors was realised with standard data management facilities of SAS.

Table 2: Profile of the POS scanner data

No.   Product category            Occurrence
1     Shampoos                    732
2     Hair conditioners           263
3     Hair lotions                206
4     Tampons                     221
5     Sanitary napkins            248
6     Cat food                    174
7     Rewards for cats/dogs       127
8     Juices for babies           147
9     Desserts for babies         260
10    Vegetables for babies       120
11    Children's food             98
12    Children's menus            93
13    Denture cleansing agents    76
14    Denture fixer               50
15    Sun protectors/blockers     21
16    After sun lotions           10
17    Shaving soaps/creams        181
18    Razor blades                233
19    Slim/diet food              56
20    Functional/health food      132
21    Cough drops                 473
22    Chewing gum                 362
23    Heart and nerve tonics      56
24    Eyeshadow                   234
25    Lipsticks                   174

The product categories 8 to 12, for example, can be assumed to be interrelated in the relevant sense because they are at least partially complementary. The same, but in a somewhat more pronounced way, seems to be valid for product category 17 and 18. Items of these two groups can only be used jointly. Finally, product categories 21 and 22 represent so-called random take away products, that are often placed spontaneously in the basket at the checkout. A distinct and causally motivated relation to any other product category listed in the table is not discernible.

Results received from SOM
Checking several possible alternatives an 8 × 8 SOM layer was found to produce the best results with respect to heterogeneity (cf. for this equation 1)

$$\mathrm{het}\{\eta_{c_a c_b}\} = \frac{1}{m} \sum_{j=1}^{m} \mathrm{dist}(t_j, \eta_{c_a c_b}) = 1.183,$$

and simplicity (cf. equation 2)

$$\mathrm{simpl}\{\eta_{c_a c_b}\} = \sum_{(c_a, c_b)} \mathrm{dist}(\eta_{c_a c_b}, \tilde{\eta}_{\tilde{c}_a \tilde{c}_b}) = 2.390$$

of the prototype system as well as the interpretability in content. The solution accompanied by the initial learning rate $\alpha(0) = 0.7$ and $\sigma(0) = 3$ for the neighbourhood kernel is shown in Figure 3.

Figure 3 SOM output after $l_{max} = 100{,}000$ iterations (8 × 8 grid with rows a = 1, ..., 8 and columns b = 1, ..., 8; each field lists the product categories with weights greater than 0.8)

Each of the 64 fields of the map represents one unit or prototype. To make interpretations easier, however, only those product categories with weights greater than 0.8 have been displayed. That is why three product categories, 15, 16 and 19, do not appear in the map. Regarding the first two product categories (none of them have weights greater than 0.25), this is not too surprising because of the comparatively low frequency of occurrence (cf. Table 2). Product category 19, with maximum weight 0.73, however, only narrowly misses its inclusion in the map at the 'expected' place (row 5, column 8).

All in all, the different 'clouds' in Figure 3 conform to a great extent to the presumed interrelations. The hair care products of categories 1, 2 and 3, for instance, define a cloud where product category 1 plays a seemingly special role. Taking into account the basic function of the items of this category (shampoos) within the entire hair care process this seems to be quite plausible. In the same way the product categories 8 to 12 define a fairly compact cloud in the upper right-hand corner of the map. This could be rated as a hint at some stronger relations between these product categories. The interrelation of product categories 24 and 25, to mention another nice example, is also undoubtedly understandable.

Furthermore, the suggested type of representation is useful with respect to the detection of so-called random take away products. A simple and plausible indicator for this phenomenon is the extent to which a product category 'scatters' across the map. In the present case this seems to be valid for product category 21 (cough drops) which appears in several, but not necessarily neighbouring, units and which is displayed together with different product categories within one unit. For the other presumed random take away product category 22 (chewing gum) unfortunately no such clear picture is obtained. In a countermove this seems to apply for product category 1.

Results received from NGN
Table 3 contains the output of the original NGN approach using 10 units. Even this parsimonious specification turned out to produce acceptable results with respect to heterogeneity

$$\mathrm{het}\{\eta_h\} = \frac{1}{m} \sum_{j=1}^{m} \mathrm{dist}(t_j, \eta_h) = 1.090$$

and interpretability. The initial and final values of the learning rates ($\varepsilon_{Init} = 0.7$, $\varepsilon_{End} = 0.005$, and $\lambda_{Init} = 1.0$, $\lambda_{End} = 0.01$) as well as those of the 'age' ($\mathrm{age}_{Init} = 10$, $\mathrm{age}_{End} = 100$) are similar to proposals made in Martinetz and Schulten,²⁰ Martinetz et al.²¹ and Fritzke.²²

Table 3: NGN output after $l_{max} = 20{,}000$ iterations

h    P(u_h)   PC_k1  PC_k2  PC_k3  PC_k4  PC_k5   η_hk1   η_hk2   η_hk3   η_hk4   η_hk5
1    0.071    24     25     1      3      18      0.925   0.497   0.279   0.143   0.095
2    0.131    22     21     1      24     3       1.000   0.341   0.275   0.106   0.066
3    0.091    5      1      4      22     6       1.000   0.363   0.226   0.121   0.074
4    0.125    9      10     8      11     12      0.858   0.373   0.362   0.331   0.242
5    0.058    20     19     21     22     1       0.983   0.300   0.275   0.167   0.133
6    0.055    6      7      13     14     3       0.772   0.605   0.158   0.158   0.079
7    0.143    21     1      5      24     4       1.000   0.232   0.117   0.084   0.077
8    0.089    18     17     1      25     13      0.773   0.681   0.314   0.065   0.059
9    0.170    1      2      3      25     6       0.926   0.554   0.304   0.088   0.051
10   0.067    4      1      22     2      24      1.000   0.371   0.171   0.100   0.100

Because of space restrictions only five product categories at a time (starting with the highest weight) have been displayed. Obviously, the product categories for children are clearly dominating prototype $h = 4$ and are strongly interdependent. Even product category 12 at rank 5 has a weight greater than 0.2. But there are also some other very plausible purchase interdependences indicated by comparatively high weights within one prototype. This applies for example to the hair care product categories 1, 2 and 3 that are dominating prototype $h = 9$. At the same time it emerges that product category 1 (shampoos) seemingly contains random take away products because of its co-occurrence in several other market basket prototypes. Another very nice interrelation is uncovered by prototype $h = 6$ where the dental care products appear together with cat food and rewards for cats and dogs. Obviously, these products for small pets are predominantly bought by an older clientele. Insights of this kind can be used excellently for customer oriented sales promotions. Finally, prototype $h = 1$ contains face and hair care products typically bought by female consumers.

All in all the NGN results are very similar to those produced by the SOM. The NGN approach proves, however, to be more parsimonious both regarding training time and the required number of units. In this respect, at least in this investigation, the NGN approach slightly dominates the SOM approach. Nevertheless, a final assessment needs more comparisons on different data sets.

Taking into account that it makes a great difference whether a prototype represents a frequent or a scarce market basket pattern the compound probabilities $P(PC_{k_i^h} \cap u_h)$ can be computed using the available weights and frequencies. Additionally, defining a threshold $d = 0.025$, for example, finally leads to the graphical representation of purchase interdependences on the product category level depicted in Figure 4.

Figure 4 Interdependences on the product category level

The present visualisation of purchase interdependences is remarkable in two respects. First, the individual subgraphs reflect, as expected, the most important interrelations from a data analytical point of view. The starlike subgraph on the left-hand side, for instance, confirms once again the special role of product category 1 which has already been mentioned and which results from the at least partly existing random take away effects. Secondly, graphical representations like this are an elegant way of enabling visualisation of both direct and indirect interdependences. Product categories 2 and 3, for example, are directly interdependent. But there is also an indirect relation to product category 4 and 5 via product category 1. In contrast to this the apparently strongly interdependent product categories 6 and 7 seem to occupy a solitary position. This point will be considered later.

Figure 5 shows the result of an application of the Bayes approach (cf. equation 3) to the given data. To simplify the graph only those connections (and, as a result, prototypes) have been depicted that exceed an a priori defined probability to appear. The probability of an edge connecting unit $u_{h_1}$ and unit $u_{h_2}$ is equal to the number of iterations where these units were the winning and the co-winning unit divided by the total numbers of iterations. Fixing the threshold for this probability to 1/25, for instance, results in the graph on hand. Unit $u_3$ and $u_5$ are missing because of their 'weak' connections to the other units in the present sense. Each edge of the graph is additionally weighted by its reciprocal age.

Figure 5 Interdependences on the market basket level (vertex contents per unit: rank order of product categories, Bayesian probabilities, product purchase probabilities)

Unit    Row                 Rank 1   Rank 2   Rank 3   Rank 4   Rank 5
u_1     Product category    24       25       1        3        18
        P(u_h|PC_k)         0.581    0.420    0.056    0.102    0.060
        P(PC_k)             0.113    0.084    0.352    0.099    0.112
u_2     Product category    22       21       1        24       3
        P(u_h|PC_k)         0.754    0.197    0.102    0.124    0.087
        P(PC_k)             0.174    0.228    0.352    0.113    0.099
u_4     Product category    9        10       8        11       12
        P(u_h|PC_k)         0.858    0.808    0.639    0.878    0.677
        P(PC_k)             0.125    0.058    0.071    0.047    0.045
u_6     Product category    6        7        13       14       3
        P(u_h|PC_k)         0.506    0.543    0.237    0.360    0.044
        P(PC_k)             0.084    0.061    0.037    0.024    0.099
u_7     Product category    21       1        5        24       4
        P(u_h|PC_k)         0.630    0.094    0.141    0.107    0.104
        P(PC_k)             0.228    0.352    0.119    0.113    0.106
u_8     Product category    18       17       1        25       13
        P(u_h|PC_k)         0.614    0.696    0.079    0.069    0.145
        P(PC_k)             0.112    0.087    0.352    0.084    0.037
u_9     Product category    1        2        3        25       6
        P(u_h|PC_k)         0.445    0.741    0.519    0.178    0.103
        P(PC_k)             0.352    0.127    0.099    0.084    0.084
u_10    Product category    4        1        22       2        24
        P(u_h|PC_k)         0.633    0.071    0.066    0.053    0.060
        P(PC_k)             0.106    0.352    0.174    0.127    0.113

The small weight of the edge connecting unit $u_2$ and $u_7$ (1/18), for example, indicates that the seeming similarity of both prototypes could not be confirmed for a longer time. The opposite holds for the edge connecting unit $u_9$ and $u_{10}$. Obviously, the latest input signal (transaction vector) has confirmed the common ground of both prototypes. The reader might be astonished about the connection of unit $u_2$ and $u_4$ although both seem to be represented by different prototypes. In fact, this edge is primarily determined by product categories ranked 6 to 25, which are not displayed. Therefore, if the NGN is continuously (eg daily) adapted to new POS scanner data the changes of weights in the course of time provide valuable hints at an emerging alignment or differentiation of buying patterns. A corresponding challenge to future research might be the development of a measurement framework to monitor dynamically movements in the observed purchase interdependences. Additionally, it would be possible to focus on the respective influence of changing sales promotion activities.

According to Figure 2 each vertex of the graph in Figure 5 contains the rank order of product categories (first row), the Bayesian probabilities $P(u_h \mid PC_{k_i^h})$ (second row), and the product purchase probabilities $P(PC_{k_i^h})$ (third row). Looking at the last two rows some interesting asymmetries are detectable. For instance, in unit $u_9$, the probability of observing the respective market basket pattern is greater with product category 2 on hand instead of product category 1 (0.741 versus 0.445). The contrary is valid for the probabilities of buying items of these two product categories (0.127 versus 0.352). An example of a more or less symmetric relation is given by product categories 9 and 11 (0.858 versus 0.878) in unit $u_4$. But the fact that items of product category 9 appear in a market basket with a higher probability (0.125 versus 0.047) makes this one more interesting for promotional activities.
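The asymmetry reported for unit $u_9$ can be retraced from the published numbers with a few lines of Python. This is only a rough check under one assumption: the denominator $P(PC_k)$ of equation 3 is approximated here by the relative occurrence of the category in Table 2, because the complete weight vectors needed for the sum over all prototypes are not printed. The function name `posterior` is hypothetical.

```python
def posterior(p_unit, weight, p_category):
    """P(u_h | PC_k) = P(u_h) * P(PC_k | u_h) / P(PC_k), following equation 3."""
    return p_unit * weight / p_category


M = 2079                          # number of market baskets (Table 2)
p_u9 = 0.170                      # P(u_9) from Table 3
eta_9 = {1: 0.926, 2: 0.554}      # weights of categories 1 and 2 in prototype 9 (Table 3)
occurrence = {1: 732, 2: 263}     # occurrences of categories 1 and 2 (Table 2)

# Assumption: P(PC_k) is estimated by occurrence / M rather than by the sum in equation 3.
post = {k: posterior(p_u9, eta_9[k], occurrence[k] / M) for k in eta_9}
```

The two values come out close to the 0.445 and 0.741 shown in the vertex of $u_9$, with small deviations that are plausibly due to rounding of the published weights. Although shampoos (category 1) have the far higher weight in prototype 9, hair conditioners (category 2) are the stronger indicator of this pattern because they are bought much less often overall.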


Figure 6 MDS representation of the NGN results
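A map like Figure 6 can be approximated by describing each product category through its vector of joint probabilities over all prototypes, turning these profiles into pairwise dissimilarities, and embedding them in two dimensions. The sketch below is an illustration under stated assumptions only: it uses classical (Torgerson) MDS written in NumPy rather than SAS PROC MDS, takes the Euclidean distance between profiles as the dissimilarity, and works on a small invented matrix `joint` instead of the retail data.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS on a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical joint probabilities P(PC_k and u_h): rows are four product
# categories, columns are three prototypes. Invented for illustration.
joint = np.array([
    [0.10, 0.01, 0.02],
    [0.09, 0.02, 0.02],
    [0.01, 0.08, 0.03],
    [0.02, 0.07, 0.03],
])

# Euclidean distance between category profiles (an assumed dissimilarity).
diff = joint[:, None, :] - joint[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

coords = classical_mds(dist, n_components=2)
for k, (x, y) in enumerate(coords, start=1):
    print(f"category {k}: ({x:+.3f}, {y:+.3f})")
```

Categories with similar profiles across the prototypes end up close together in the plane. Other dissimilarities or a non-metric MDS variant could be substituted without changing the overall procedure.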

It is necessary to point out that asymmetries of the present kind are only valid for individual prototypes. Product categories 24 and 1, for example, are characterised by an extremely asymmetric relation with respect to prototype $h = 1$, whereas the same relation looks nearly symmetric for prototype $h = 10$. This is caused by the fact that product category 24 (together with 25) dominates the profile of prototype $h = 1$. Information about those ‘dominations’ can be used to force cross-sellings within the assortment under consideration.

Last but not least, the Bayesian probabilities can be used to generate rules like this: ‘Product category k determines the occurrence of market basket type h’. Transforming the conditional probabilities into verbal rules eases the communication between the analyst and the decision maker.23 The interesting nature of such a rule can be determined using the lift, a measure that is well-known from data mining.24

$$\mathrm{lift}(PC_k \Rightarrow u_h) = \frac{\mathrm{conf}(PC_k \Rightarrow u_h)}{\mathrm{sup}(u_h)} = \frac{P(u_h \mid PC_k)}{P(u_h)}$$

If only product categories with a lift greater than 1.0 are considered another graph will result which is very similar to that depicted in Figure 4. The corresponding probabilities are in bold face in Figure 5. This time product categories 6 and 7 as well as 13 and 14 would be connected within the concerning subgraph. In this way the abovementioned interesting nature of this interrelation is confirmed from a methodological point of view as well.

Visualising purchase interdependences by means of NGN-based multidimensional scaling

To be able to compare the results of the previous subsection to traditional approaches of purchase interdependence analysis, for instance to those starting from association coefficients like Tanimoto,25 it seems to be helpful to carry out a simple transformation of the NGN output. In the present case product categories $k_1$ and $k_2$ are assumed to be interrelated if they have similar probabilities $P(PC_{k_1} \cap u_h)$ and $P(PC_{k_2} \cap u_h)$ with respect to all prototypes $h$. Analysing the corresponding similarities
by means of multidimensional scaling (MDS) finally leads to Figure 6. Because of the high conformity of this representation (produced with SAS PROC MDS) with the assumptions above a more intensive interpretation is not necessary.

CONCLUSIONS AND OUTLOOK

This paper is concerned with the presentation and discussion of two alternative neural network approaches for purchase interdependence analysis. With the empirical investigation it could be shown that both approaches are powerful tools providing outputs that can be processed in different ways to extract information. Both can isolate random take away effects to a certain degree and can be extended regarding the detection of asymmetries at the market basket level. An important advantage of both approaches is the absence of a methodologically motivated restriction of the maximum number of market baskets to be analysed. The adaptability of NGN makes this approach a useful instrument for dynamic POS scanner data analysis.

Considering the fact that, at least for the data here, the NGN approach is superior to the SOM approach with respect to both the training time and the required number of weights, the former is worth a more thorough investigation in the present context. On the other hand, in contrast to NGN, implementations of the basic SOM methodology are available in several commercial or academic tools for data analysis, which makes its application significantly easier.

Future research should concentrate on the development of meaningful quality measures for application-oriented market basket analysis and the identification of possible differences between weekly shopping baskets and those of ‘top-up’ shopping trips. Beyond this the reliable and comprehensive isolation of random take away effects still requires considerably greater effort.

The authors are preparing a further application of both approaches to a large data set provided by another retail chain, once again concerning everyday products, but different from the chemist’s assortment. Those who are interested in the results (which are scheduled to be available in summer 2003) are invited to write to the authors.

Acknowledgment

The authors would like to thank two anonymous reviewers for their helpful hints concerning an earlier draft of the paper.

References

1 Böcker, F. (1978) ‘Die Bestimmung der Kaufverbundenheit von Produkten’, Duncker & Humblot.
2 Hruschka, H. (1991) ‘Bestimmung der Kaufverbundenheit mit Hilfe eines probabilistischen Meßmodells’, Zeitschrift für betriebswirtschaftliche Forschung, Vol. 43, No. 5, pp. 418–434.
3 Merkle, E. (1981) ‘Die Erfassung und Nutzung von Informationen über den Sortimentsverbund in Handelsbetrieben’, Duncker & Humblot, Berlin.
4 Hao, M. C., Dayal, U., Hsu, M., Sprenger, T. and Gross, M. H. (2001) ‘Visualization of directed associations in e-commerce transaction data’, Hewlett Packard Research Laboratories, Palo Alto.
5 Russell, G. J. and Petersen, A. (2000) ‘Analysis of cross category dependence in market basket selection’, Journal of Retailing, Vol. 76, No. 3, pp. 367–392.
6 Agrawal, R., Imielinski, T. and Swami, A. (1993) ‘Mining association rules between sets of items in large databases’, in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, pp. 207–216.
7 Brin, S., Motwani, R., Ullman, J. D. and Tsur, S. (1997) ‘Dynamic itemset counting and implication rules for market basket data’, in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, pp. 255–264.
8 Hu, Z., Chin, W.-N. and Takeichi, M. (2000) ‘Calculating a new data mining algorithm for market basket analysis’, in Pontelli, E. and Santos Costa, V. (eds) ‘Practical aspects of declarative languages’, Lecture Notes in Computer Science, No. 1753, Springer, Berlin, pp. 169–184.
9 Hao et al. (2001) op cit.
10 Decker, R. and Schimmelpfennig, H. (2002)
‘Alternative Ansätze zur datengestützten Verbundmessung im Electronic Retailing’, in Ahlert, D., Olbrich, R. and Schröder, H. (eds) ‘Jahrbuch Handelsmanagement 2002 – Electronic Retailing’, Deutscher Fachverlag, pp. 193–212.
11 Schmalen, H., Pechtl, H. and Schweitzer, W. (1996) ‘Sonderangebotspolitik im Lebensmittel-Einzelhandel’, Schäffer-Poeschel.
12 Martinetz, T. and Schulten, K. (1991) ‘A “neural gas” network learns topologies’, in Kohonen, T., Mäkisara, K., Simula, O. and Kangas, J. (eds) ‘Artificial neural networks’, North Holland, Amsterdam, pp. 397–402.
13 Kohonen, T. (1982) ‘Self-organized formation of topologically correct feature maps’, Biological Cybernetics, Vol. 43, pp. 59–69.
14 Kohonen, T. (2001) ‘Self-organizing maps’, 3rd edn, Springer, Berlin.
15 Alahakoon, D., Halgamuge, S. K. and Srinivasan, B. (2000) ‘Dynamic self-organizing maps with controlled growth for knowledge discovery’, IEEE Transactions on Neural Networks, Vol. 11, No. 3, pp. 601–614.
16 Martinetz and Schulten (1991) op cit.
17 Ibid.
18 Atukorale, A. and Suganthan, N. (2000) ‘Hierarchical overlapped neural-gas network with application to pattern classification’, Neurocomputing, Vol. 35, No. 1–4, pp. 165–176.
19 Fritzke, B. (1995) ‘A growing neural gas network learns topologies’, in Tesauro, G., Touretzky, D. S. and Leen, T. K. (eds) ‘Advances in neural information processing systems 7’, MIT Press, Cambridge, pp. 625–632.
20 Martinetz and Schulten (1991) op cit.
21 Martinetz, T., Berkovich, S. G. and Schulten, K. (1993) ‘“Neural-gas” network for vector quantization and its application to time-series prediction’, IEEE Transactions on Neural Networks, Vol. 4, No. 4, pp. 558–569.
22 Fritzke (1995) op cit.
23 Pedrycz, W. (2001) ‘Granular computing in data mining’, in Kandel, A., Last, M. and Bunke, H. (eds) ‘Data mining and computational intelligence’, Physica, Heidelberg, pp. 37–61.
24 Brin et al. (1997) op cit.
25 Merkle (1981) op cit.