
Hierarchical clustering

François Husson

Applied Mathematics Department - Rennes Agrocampus

[email protected]



1 Introduction
2 Principles of hierarchical clustering
3 Example
4 K-means : a partitioning algorithm
5 Extras
  • Making more robust partitions
  • Clustering in high dimensions
  • Qualitative variables and clustering
  • Combining factor analysis with clustering
6 Describing classes of individuals


Introduction

• Definitions :
  • Clustering : making or building classes
  • Class : a set of individuals (or objects) with shared, similar characteristics
• Examples :
  • of clustering : the animal kingdom, a computer hard disk, the geographic division of France, etc.
  • of classes : social classes, political classes, etc.
• Two types of clustering :
  • hierarchical clustering : a tree
  • partitioning methods

Hierarchical example : the animal kingdom

[Figure : the animal kingdom organized as a hierarchical tree]


What data ? What goals ?

Clustering is for data tables : rows of individuals, columns of quantitative variables

Goals : build a tree structure that
• shows hierarchical links between individuals or groups of individuals
• detects a “natural” number of classes in the population

[Figure : a small dendrogram of 8 individuals A to H, with a height axis from 0 to 4]

Criteria

Measuring the similarity of individuals :
• Euclidean distance
• similarity indices
• etc.

Measuring the similarity between groups of individuals :
• minimum jump or single linkage (smallest distance)
• complete linkage (largest distance)
• Ward criterion
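As a minimal R sketch (assuming `X` is a data frame of quantitative variables with individuals as rows), the three aggregation criteria correspond to the `method` argument of `hclust` :

```r
# Euclidean distances between individuals, computed on standardized variables
d <- dist(scale(X))

# Aggregation criterion = distance between groups of individuals
tree.single   <- hclust(d, method = "single")    # minimum jump / single linkage
tree.complete <- hclust(d, method = "complete")  # complete linkage
tree.ward     <- hclust(d, method = "ward.D2")   # Ward criterion
plot(tree.ward)
```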

Algorithm

Starting from the 8 individuals A, …, H, the two closest elements (individuals or groups) are merged at each step and the distance matrix is updated :

{A},{B},{C},{D},{E},{F},{G},{H}
1st grouping (d = 0.25) : {AC},{B},{D},{E},{F},{G},{H}
2nd grouping (d = 0.50) : {ABC},{D},{E},{F},{G},{H}
3rd grouping (d = 0.61) : {ABC},{D},{E},{FG},{H}
4th grouping (d = 1.00) : {ABC},{DE},{FG},{H}
5th grouping (d = 1.12) : {ABC},{DE},{FGH}
6th grouping (d = 1.81) : {ABC},{DEFGH}
7th grouping (d = 4.07) : {ABCDEFGH}

[Figure : the distance matrix recomputed after each grouping, and the resulting dendrogram on A–H]

Trees and partitions

Trees always end up . . . cut through !
Click to cut the tree : choosing a height to cut at gives a partition.

[Figure : hierarchical tree of decathlon athletes with the barplot of inertia gains ; cutting at a chosen height defines the classes]

Remark : given how it was built, this partition is interesting but not optimal
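A small sketch of the cut in R, reusing the `tree.ward` object from the previous block (the height, or equivalently the number of classes, is the analyst's choice) :

```r
# Cut the tree : either give the number of classes or a cutting height
classes <- cutree(tree.ward, k = 3)      # partition into 3 classes
# classes <- cutree(tree.ward, h = 1.5)  # equivalent call using a height
table(classes)                           # class sizes
```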

Partition quality

When is a partition a good one ?
• If individuals placed in the same class are close to each other
• If individuals in different classes are far from each other

Mathematically speaking ?
• small within-class variability
• large between-class variability

=⇒ Two criteria. Which one to use ?

Partition quality

Let $\bar{x}_k$ denote the overall mean of variable $x_k$ and $\bar{x}_{qk}$ its mean in class $q$.

$$\underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(x_{iqk}-\bar{x}_k)^2}_{\text{total inertia}} \;=\; \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(x_{iqk}-\bar{x}_{qk})^2}_{\text{within-class inertia}} \;+\; \underbrace{\sum_{k=1}^{K}\sum_{q=1}^{Q}\sum_{i=1}^{I}(\bar{x}_{qk}-\bar{x}_k)^2}_{\text{between-class inertia}}$$

[Figure : geometric illustration of the decomposition of the total inertia]

=⇒ 1 criterion only !

Partition quality

Partition quality is measured by : $0 \le \dfrac{\text{between-class inertia}}{\text{total inertia}} \le 1$

$\dfrac{\text{between-class inertia}}{\text{total inertia}} = 0 \Longrightarrow \forall k, \forall q,\ \bar{x}_{qk} = \bar{x}_k$ : variable by variable, the classes have the same means ⇒ doesn't allow us to classify

$\dfrac{\text{between-class inertia}}{\text{total inertia}} = 1 \Longrightarrow \forall k, \forall q, \forall i,\ x_{iqk} = \bar{x}_{qk}$ : individuals in the same class are identical ⇒ ideal for classifying

Warning : don't just accept this criterion at face value : it depends on the number of individuals and the number of classes
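A minimal sketch of this quality criterion in R, assuming `X` is a matrix of standardized quantitative variables and `classes` the vector giving each individual's class :

```r
# Between-class inertia / total inertia of a partition (value in [0, 1])
inertia.ratio <- function(X, classes) {
  X       <- as.matrix(X)
  total   <- sum(sweep(X, 2, colMeans(X))^2)                   # total inertia
  centers <- apply(X, 2, function(v) tapply(v, classes, mean)) # class centers of gravity
  sizes   <- as.numeric(table(classes))                        # class sizes
  between <- sum(sizes * sweep(centers, 2, colMeans(X))^2)     # between-class inertia
  between / total
}
inertia.ratio(scale(X), classes)
```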

Ward's method

• Initialize : 1 class = 1 individual ⇒ between-class inertia = total inertia
• At each step : combine the classes a and b that minimize the decrease in between-class inertia

$$\mathrm{Inertia}(a) + \mathrm{Inertia}(b) = \mathrm{Inertia}(a \cup b) - \underbrace{\frac{m_a\, m_b}{m_a + m_b}\, d^2(a,b)}_{\text{to minimize}}$$

⇒ Ward's criterion groups together objects with small weights and avoids chain effects ; it groups together classes with similar centers of gravity and can be used directly for clustering

[Figure : minimum jump (saut minimum) vs. Ward on two simulated datasets]


Temperature data

• 23 individuals : European capitals
• 12 variables : mean monthly temperatures over 30 years

             Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct    Nov    Dec   Area
Amsterdam    2.9    2.5    5.7    8.2   12.5   14.8   17.1   17.1   14.5   11.4    7.0    4.4   West
Athens       9.1    9.7   11.7   15.4   20.1   24.5   27.4   27.2   23.8   19.2   14.6   11.0   South
Berlin      -0.2    0.1    4.4    8.2   13.8   16.0   18.3   18.0   14.4   10.0    4.2    1.2   West
Brussels     3.3    3.3    6.7    8.9   12.8   15.6   17.8   17.8   15.0   11.1    6.7    4.4   West
Budapest    -1.1    0.8    5.5   11.6   17.0   20.2   22.0   21.3   16.9   11.3    5.1    0.7   East
Copenhagen  -0.4   -0.4    1.3    5.8   11.1   15.4   17.1   16.6   13.3    8.8    4.1    1.3   North
Dublin       4.8    5.0    5.9    7.8   10.4   13.3   15.0   14.6   12.7    9.7    6.7    5.4   North
Elsinki     -5.8   -6.2   -2.7    3.1   10.2   14.0   17.2   14.9    9.7    5.2    0.1   -2.3   North
Kiev        -5.9   -5.0   -0.3    7.4   14.3   17.8   19.4   18.5   13.7    7.5    1.2   -3.6   East
Krakow      -3.7   -2.0    1.9    7.9   13.2   16.9   18.4   17.6   13.7    8.6    2.6   -1.7   East
Lisbon      10.5   11.3   12.8   14.5   16.7   19.4   21.5   21.9   20.4   17.4   13.7   11.1   South
London       3.4    4.2    5.5    8.3   11.9   15.1   16.9   16.5   14.0   10.2    6.3    4.4   North
Madrid       5.0    6.6    9.4   12.2   16.0   20.8   24.7   24.3   19.8   13.9    8.7    5.4   South
Minsk       -6.9   -6.2   -1.9    5.4   12.4   15.9   17.4   16.3   11.6    5.8    0.1   -4.2   East
Moscow      -9.3   -7.6   -2.0    6.0   13.0   16.6   18.3   16.7   11.2    5.1   -1.1   -6.0   East
Oslo        -4.3   -3.8   -0.6    4.4   10.3   14.9   16.9   15.4   11.1    5.7    0.5   -2.9   North
Paris        3.7    3.7    7.3    9.7   13.7   16.5   19.0   18.7   16.1   12.5    7.3    5.2   West
Prague      -1.3    0.2    3.6    8.8   14.3   17.6   19.3   18.7   14.9    9.4    3.8    0.3   East
Reykjavik   -0.3    0.1    0.8    2.9    6.5    9.3   11.1   10.6    7.9    4.5    1.7    0.2   North
Rome         7.1    8.2   10.5   13.7   17.8   21.7   24.4   24.1   20.9   16.5   11.7    8.3   South
Sarajevo    -1.4    0.8    4.9    9.3   13.8   17.0   18.9   18.7   15.2   10.5    5.1    0.8   South
Sofia       -1.7    0.2    4.3    9.7   14.3   17.7   20.0   19.5   15.8   10.7    5.0    0.6   East
Stockholm   -3.5   -3.5   -1.3    3.5    9.2   14.6   17.2   16.0   11.7    6.5    1.7   -1.6   North

Which cities have similar weather patterns ? How to characterize groups of cities ?

Temperature data : hierarchical tree

[Figure : hierarchical tree of the 23 capitals, with the barplot of inertia gains]

Temperature data

Loss in between-class inertia when going from :
23 clusters to 22 clusters : 0.01
22 clusters to 21 clusters : 0.01
21 clusters to 20 clusters : 0.01
…
9 clusters to 8 clusters : 0.15
8 clusters to 7 clusters : 0.16
7 clusters to 6 clusters : 0.27

6 clusters to 5 clusters : 0.29
5 clusters to 4 clusters : 0.60
4 clusters to 3 clusters : 0.76
3 clusters to 2 clusters : 2.36
2 clusters to 1 cluster : 6.76

The loss is large when going from 2 clusters to a single cluster, so we prefer to keep 2 clusters.

Sum of the losses of inertia = 12 (the total inertia)
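A hedged sketch of this analysis with FactoMineR, assuming a data frame `temp` with the 23 capitals as rows, the 12 monthly temperatures in columns 1 to 12 and Area in column 13 (the dataset is distributed on the course website) :

```r
library(FactoMineR)
res.pca  <- PCA(temp, quali.sup = 13, graph = FALSE)  # PCA on the monthly temperatures, Area supplementary
res.hcpc <- HCPC(res.pca, graph = TRUE)               # hierarchical tree (Ward) built on the components
plot(res.hcpc, choice = "bar")                        # barplot of the losses of between-class inertia
```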

Using the tree to build a partition

Should we make 2 groups ? 3 ? 4 ?

[Figure : the hierarchical tree of the 23 capitals, cut into 2 groups]

Cut into 2 groups : $\dfrac{\text{between-class inertia}}{\text{total inertia}} = \dfrac{6.76}{12} = 56\,\%$

What can we compare this percentage with ?

Using the tree to build a partition

56 % of the information is contained in this 2-class cut. What can we compare this percentage with ?

[Figure : the 23 capitals on the first PCA plane (Dim 1 : 82.90 %, Dim 2 : 15.40 %)]

Using the tree to build a partition


Separate the cold cities into 2 groups : $\dfrac{\text{between-class inertia}}{\text{total inertia}} = \dfrac{2.36}{12} = 20\,\%$

[Figure : the hierarchical tree, now cut into 3 groups]

Using the tree to build a partition

The move from 23 cities to 3 classes keeps 56 % + 20 % = 76 % of the variability in the data

[Figure : the 3 classes of capitals on the first PCA plane (Dim 1 : 82.90 %, Dim 2 : 15.40 %)]

Determining the number of classes
• from the barplot of inertia gains
• from the shape of the tree
• depends on the intended use (survey, etc.)
• ultimate criterion : interpretability of the classes

[Figure : inertia gain barplot and hierarchical tree of the 23 capitals]


Partitioning algorithm : K-means

Algorithm for aggregating around moving centers (K-means)

• Randomly choose Q centers of gravity
• Assign each point to its closest center
• Recompute the Q centers of gravity
• Repeat the last two steps until the partition no longer changes (a code sketch is given below)

[Figure : successive K-means iterations on the 23 capitals, displayed on the first PCA plane (Dim 1 : 82.9 %, Dim 2 : 15.4 %)]
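The same algorithm is available in base R as `kmeans` ; a sketch on the standardized temperature data (assumed from the earlier block), with several random starts since the result depends on the initial centers :

```r
set.seed(123)                                    # the algorithm starts from random centers
res.km <- kmeans(scale(temp[, 1:12]), centers = 3, nstart = 50)
res.km$cluster                                   # class of each capital
res.km$betweenss / res.km$totss                  # between-class / total inertia of the partition
```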


Robustifying a partition obtained using hierarchical clustering

The partition obtained by hierarchical clustering is not optimal and can be improved or made robust using K-means

Algorithm :
• use the obtained hierarchical partition to initialize K-means
• run a few iterations of K-means

⇒ potentially improved partition
Advantage : more robust partition
Disadvantage : loss of the hierarchical structure
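In FactoMineR this consolidation is what the `consol = TRUE` argument of `HCPC` performs ; written out in base R it is only a few lines (a sketch, assuming `X` is the standardized data matrix and `classes` the partition read off the tree) :

```r
# Centers of gravity of the classes obtained by cutting the tree ...
init.centers <- apply(X, 2, function(v) tapply(v, classes, mean))
# ... used as starting centers for a few K-means iterations
res.consol <- kmeans(X, centers = init.centers, iter.max = 10)
table(classes, res.consol$cluster)   # individuals that changed class during consolidation
```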


Hierarchical clustering in high dimension

• If there are many variables : do a PCA and keep only the first axes ⇒ back to the classical case
• If there are many individuals, the hierarchical algorithm is too slow :
  • use K-means to partition the data into around 100 classes
  • build the tree using these classes (weighted by the number of individuals in each class)
  • this gives the “top” of the tree (see the code sketch below)


[Figure : tree built from the original data vs. tree built from the ~100 K-means classes]
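With FactoMineR both ideas map onto arguments of `PCA` and `HCPC` ; a hedged sketch for a large data frame `X` :

```r
library(FactoMineR)
res.pca  <- PCA(X, ncp = 5, graph = FALSE)   # many variables : keep only the first axes
res.hcpc <- HCPC(res.pca, kk = 100)          # many individuals : K-means pre-partition into ~100
                                             # classes, then the tree is built on those classes
```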

Hierarchical clustering on qualitative data

Two strategies :

• Transform them to quantitative data :
  • do an MCA and keep only the first dimensions
  • do the hierarchical clustering on the principal components of the MCA
• Use measures/indices suited to qualitative variables : similarity indices, Jaccard index, etc.
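The first strategy, as a sketch with FactoMineR (assuming `Xq` is a data frame of factors) :

```r
library(FactoMineR)
res.mca  <- MCA(Xq, ncp = 10, graph = FALSE)   # keep only the first dimensions of the MCA
res.hcpc <- HCPC(res.mca, graph = TRUE)        # hierarchical clustering on the MCA components
```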

Doing factor analysis followed by clustering

• Qualitative data : MCA outputs quantitative principal components
• Factor analysis eliminates the last components, which are just noise ⇒ more stable clustering

[Diagram : PCA transforms the variables x.1, …, x.K into components F1, …, FK ; the first components carry the structure of the data, the last ones the noise]

Doing factor analysis followed by clustering

• Representation of the tree and the classes on two factor axes ⇒ factor analysis gives continuous information, the tree gives discontinuous information. The tree hints at information hidden in further axes.

[Figure : “Hierarchical clustering on the factor map” : the tree drawn above the first PCA plane (Dim 1 : 82.9 %), with the 3 clusters colored]


The class make-up : using “model individuals”

Model individuals : the ones closest to each class center (distance to the center in brackets)

Cluster 1 : Oslo (0.339), Helsinki (0.884), Stockholm (0.9224), Minsk (0.9654), Moscow (1.7664)
Cluster 2 : Berlin (0.5764), Sarajevo (0.7164), Brussels (1.038), Prague (1.0556), Amsterdam (1.124)
Cluster 3 : Rome (0.360), Lisbon (1.737), Madrid (1.835), Athens (2.167)

[Figure : the 3 clusters of capitals on the first PCA plane (Dim 1 : 82.90 %, Dim 2 : 15.40 %)]
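With FactoMineR, these model individuals (called “parangons”) are part of the `HCPC` output ; a sketch reusing the `res.hcpc` object built earlier :

```r
res.hcpc$desc.ind$para   # for each class, the individuals closest to its center of gravity
res.hcpc$desc.ind$dist   # for each class, the individuals farthest from the other classes
```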

Characterizing/describing classes

• Goals :
  • find the variables which are most important for the partition
  • characterize a class (or a group of individuals) in terms of quantitative variables
  • sort the variables that best describe the classes

• Questions :
  • which variables best characterize the partition ?
  • how can we characterize the individuals in the 1st class ?
  • which variables describe them best ?

Characterizing/describing classes

Which variables best represent the partition ?
• For each quantitative variable :
  • build an analysis of variance model between the quantitative variable and the class variable
  • do a Fisher test to detect a class effect
• Sort the variables by increasing p-value

             Eta2     P-value
October    0.8990   1.108e-10
March      0.8865   3.556e-10
November   0.8707   1.301e-09
September  0.8560   3.842e-09
April      0.8353   1.466e-08
February   0.8246   2.754e-08
December   0.7730   3.631e-07
January    0.7477   1.047e-06
August     0.7160   3.415e-06
July       0.6309   4.690e-05
May        0.5860   1.479e-04
June       0.5753   1.911e-04
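These tables come from the class-description function `catdes` of FactoMineR ; `HCPC` already calls it and stores the result in `$desc.var` (a hedged sketch, reusing `res.hcpc`) :

```r
library(FactoMineR)
# Description of the classes by the variables, as computed by HCPC
res.hcpc$desc.var$quanti.var   # Eta2 and p-value of the F test for each quantitative variable
res.hcpc$desc.var$quanti       # per-class description with the v.test (next slides)

# catdes() can also be called directly on a data frame whose class variable is in column num.var
# res.cat <- catdes(my.data, num.var = ncol(my.data))
```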

Characterizing classes using quantitative variables

[Figure : for each month, the temperatures of the 23 capitals plotted on a common axis from −10 to 20 °C]

Characterizing classes using quantitative variables

1st idea : if the values of X for class q seem to be randomly drawn from all the values of X, then X doesn’t characterize class q.

[Figure : the values of the January variable compared with a random draw of values ; temperature axis from −10 to 10 °C]

2nd idea : the more a random draw appears unlikely, the more X characterizes class q.

Characterizing classes using quantitative variables

Idea : use as reference a random draw of $n_q$ values among the $N$ values of $X$

What values can $\bar{x}_q$ take ? (i.e., what is the distribution of $\bar{X}_q$ ?)

$$\mathbb{E}(\bar{X}_q) = \bar{x} \qquad\qquad \mathbb{V}(\bar{X}_q) = \frac{s^2}{n_q}\left(\frac{N - n_q}{N - 1}\right)$$

$\mathcal{L}(\bar{X}_q) = \mathcal{N}$ because $\bar{X}_q$ is a mean

$$\Longrightarrow\ \text{Test statistic} = \frac{\bar{x}_q - \bar{x}}{\sqrt{\dfrac{s^2}{n_q}\left(\dfrac{N-n_q}{N-1}\right)}} \sim \mathcal{N}(0,1)$$

• If |test statistic| ≥ 1.96 then X characterizes class q
• The larger |test statistic| is, the better X characterizes class q

Idea : rank the variables by decreasing |test statistic| (the v.test of the FactoMineR output below)
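A sketch of this test statistic for one variable and one class, written as a small R function (`x` : values of the variable, `cls` : class of each individual, `q` : the class of interest) :

```r
v.test <- function(x, cls, q) {
  N      <- length(x)
  nq     <- sum(cls == q)
  xbar   <- mean(x)
  xbar.q <- mean(x[cls == q])
  s2     <- sum((x - xbar)^2) / N            # variance of x over the N individuals
  (xbar.q - xbar) / sqrt(s2 / nq * (N - nq) / (N - 1))
}
# |v.test(x, cls, q)| >= 1.96 : the variable characterizes class q (5 % level)
```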

$quanti$`1`
            v.test   Mean in category   Overall mean   sd in category   Overall sd    p.value
July         -1.99              16.80          18.90            2.450         3.33   0.046100
June         -2.06              14.70          16.80            2.520         3.07   0.039600
August       -2.48              15.50          18.30            2.260         3.53   0.013100
May          -2.55              10.80          13.30            2.430         2.96   0.010800
September    -3.14              11.00          14.70            1.670         3.68   0.001710
January      -3.26              -5.14           0.17            2.630         5.07   0.001130
December     -3.27              -2.91           1.84            1.830         4.52   0.001080
November     -3.36               0.60           5.08            0.940         4.14   0.000781
April        -3.39               4.67           8.38            1.550         3.40   0.000706
February     -3.44              -4.60           0.96            2.340         5.01   0.000577
October      -3.45               5.76          10.10            0.919         3.87   0.000553
March        -3.68              -1.14           4.06            1.100         4.39   0.000238


$quanti$`2`
NULL

$quanti$`3`
            v.test   Mean in category   Overall mean   sd in category   Overall sd    p.value
September     3.81              21.20          14.70             1.54         3.68   0.000140
October       3.72              16.80          10.10             1.91         3.87   0.000201
August        3.71              24.40          18.30             1.88         3.53   0.000211
November      3.69              12.20           5.08             2.26         4.14   0.000222
July          3.60              24.50          18.90             2.09         3.33   0.000314
April         3.53              14.00           8.38             1.18         3.40   0.000413
March         3.45              11.10           4.06             1.27         4.39   0.000564
February      3.43               8.95           0.96             1.74         5.01   0.000593
June          3.39              21.60          16.80             1.86         3.07   0.000700
December      3.39               8.95           1.84             2.34         4.52   0.000706
January       3.29               7.92           0.17             2.08         5.07   0.000993
May           3.18              17.60          13.30             1.55         2.96   0.001460

Characterizing classes using qualitative variables

Which variables best characterize the partition ?

• For each qualitative variable, do a χ² test between it and the class variable
• Sort the variables by increasing p-value

$test.chi2
          p.value   df
Area  0.001195843    6

Characterizing classes using qualitative variables

Does the South category characterize the 3rd class ?

             Cluster 3   Other cluster   Total
South        n_mc = 4          1          n_m = 5
Not South         0           18              18
Total        n_c = 4          19          n = 23

Test : $H_0 : \dfrac{n_{mc}}{n_c} = \dfrac{n_m}{n}$ versus $H_1$ : category m abnormally overrepresented in class c

Under $H_0$ : $\mathcal{L}(N_{mc}) = \mathcal{H}\!\left(n_c,\ \dfrac{n_m}{n},\ n\right)$, and we compute $P_{H_0}(N_{mc} \ge n_{mc})$

Cluster 3 :
              Cla/Mod   Mod/Cla   Global    p.value   v.test
Area=South         80       100    21.74   0.000564    3.448

$$\frac{4}{5}\times 100 = 80\ ;\qquad \frac{4}{4}\times 100 = 100\ ;\qquad \frac{5}{23}\times 100 = 21.74\ ;\qquad P_{\mathcal{H}(4,\,5/23,\,23)}\left[N_{mc} \ge 4\right] = 0.000564$$
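The p-value can be checked directly with the hypergeometric distribution in R :

```r
# P(Nmc >= 4) when drawing the nc = 4 cities of cluster 3 among the n = 23 cities,
# of which nm = 5 are in the South
phyper(4 - 1, m = 5, n = 23 - 5, k = 4, lower.tail = FALSE)   # ~ 0.000565
```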

⇒ $H_0$ rejected : South is overrepresented in the 3rd class
Sort the categories in terms of p-values

Characterizing classes using factor axes

These are also quantitative variables

$`1`
        v.test   Mean in category   Overall mean   sd in category   Overall sd    p.value
Dim.1    -3.32              -3.37              0             0.85         3.15   0.000908

$`2`
        v.test   Mean in category   Overall mean   sd in category   Overall sd    p.value
Dim.3    -2.41              -0.18              0             0.22         0.36   0.015776

$`3`
        v.test   Mean in category   Overall mean   sd in category   Overall sd    p.value
Dim.1     3.86               5.66              0             1.26         3.15   0.000112
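In the `HCPC` output this description of the classes by the factor axes is directly available (a sketch, reusing `res.hcpc`) :

```r
res.hcpc$desc.axes   # v.test, class means and p-values for each principal component
```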

Conclusions

• Clustering can be done on tables of individuals vs quantitative variables ⇒ MCA transforms qualitative variables into quantitative ones

• hierarchical clustering gives a hierarchical tree ⇒ number of classes

• K-means can be used to make classes more robust

• Characterize classes by active and supplementary variables, quantitative or qualitative

More

Husson F., Lê S. & Pagès J. (2017). Exploratory Multivariate Analysis by Example Using R, 2nd edition, 230 p., Chapman & Hall/CRC. ISBN 978-1-138-19634-6

The FactoMineR package for performing clustering : http://factominer.free.fr/index_fr.html

Videos on YouTube :
• a YouTube channel : youtube.com/HussonFrancois
• a playlist with 11 videos in English
• a playlist with 17 videos in French
