Introduction Principles of hierarchical clustering Example K-means Extras Describing the classes found
Hierarchical clustering
François Husson
Applied Mathematics Department - Rennes Agrocampus
Hierarchical clustering
1 Introduction
2 Principles of hierarchical clustering
3 Example
4 K-means: a partitioning algorithm
5 Extras
  • Making partitions more robust
  • Clustering in high dimensions
  • Qualitative variables and clustering
  • Combining factor analysis and clustering
6 Describing the classes found
Introduction
Definitions:
• Clustering: making or building classes
• Class: a set of individuals (or objects) sharing similar characteristics
Examples:
• of clusterings: the animal kingdom, a computer hard disk, the geographic division of France, etc.
• of classes: social classes, political classes, etc.
Two types of clustering:
• hierarchical: a tree
• partitioning methods
Hierarchical example: the animal kingdom
What data? What goals?
Clustering applies to data tables: rows of individuals, columns of quantitative variables.
Goals: build a tree structure that:
• shows the hierarchical links between individuals or groups of individuals
• detects a “natural” number of classes in the population
[Dendrogram of individuals A–H.]
Criteria
Measuring the similarity of individuals:
• Euclidean distance
• similarity indices
• etc.
Measuring the similarity between groups of individuals:
• single linkage, or minimum jump (smallest distance between the two groups)
• complete linkage (largest distance between the two groups)
• Ward’s criterion
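A sketch of the first two group-distance criteria in Python (plain Euclidean distance; the point coordinates are invented for illustration):

```python
import math

def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(g1, g2):
    """Smallest distance over all pairs of points across the two groups."""
    return min(dist(p, q) for p in g1 for q in g2)

def complete_linkage(g1, g2):
    """Largest distance over all pairs of points across the two groups."""
    return max(dist(p, q) for p in g1 for q in g2)

group_a = [(0.0, 0.0), (1.0, 0.0)]
group_b = [(4.0, 0.0), (6.0, 0.0)]
print(single_linkage(group_a, group_b))    # 3.0
print(complete_linkage(group_a, group_b))  # 6.0
```

The two criteria can disagree about which pair of groups is closest, which is why the choice of linkage changes the shape of the tree.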
Algorithm
Step-by-step on individuals A–H: start from the partition into singletons, merge the two closest elements (single linkage here), update the distance matrix, and repeat.
1st grouping: A and C (d = 0.25) → {AC}, {B}, {D}, {E}, {F}, {G}, {H}
2nd grouping: AC and B (d = 0.50) → {ABC}, {D}, {E}, {F}, {G}, {H}
3rd grouping: F and G (d = 0.61) → {ABC}, {D}, {E}, {FG}, {H}
4th grouping: D and E (d = 1.00) → {ABC}, {DE}, {FG}, {H}
5th grouping: FG and H (d = 1.12) → {ABC}, {DE}, {FGH}
6th grouping: DE and FGH (d = 1.81) → {ABC}, {DEFGH}
7th grouping: ABC and DEFGH (d = 4.07) → {ABCDEFGH}
[Dendrogram with leaves F A B E C D H G, merge heights as above.]
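The merging loop above can be sketched as a naive single-linkage agglomeration (the toy distance values below are illustrative, not the slide's matrix):

```python
import itertools

def agglomerate(d):
    """Single-linkage agglomeration on a dict of pairwise distances
    d[(i, j)] between single items; returns the list of merges."""
    items = sorted({x for pair in d for x in pair})
    clusters = [frozenset([x]) for x in items]

    def dd(a, b):
        i, j = min(a, b), max(a, b)
        return d[(i, j)]

    def linkage(c1, c2):
        # single linkage: smallest distance across the two clusters
        return min(dd(a, b) for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(itertools.combinations(clusters, 2),
                     key=lambda pair: linkage(*pair))
        h = linkage(c1, c2)
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((sorted(c1 | c2), h))
    return merges

# toy distances between 4 items
d = {("A", "B"): 1.0, ("A", "C"): 4.0, ("A", "D"): 5.0,
     ("B", "C"): 3.0, ("B", "D"): 5.5, ("C", "D"): 0.5}
for members, height in agglomerate(d):
    print(members, height)
```

Each merge is recorded with its height, which is exactly the information a dendrogram draws.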
Trees and partitions
Trees always end up... being cut!
Choosing a height at which to cut the tree yields a partition.
[Interactive dendrogram (decathlon athletes): click to cut the tree; bar chart of inertia gains.]
Remark: given how it was built, the resulting partition is interesting but not optimal.
Partition quality
When is a partition a good one?
• when individuals placed in the same class are close to each other
• when individuals in different classes are far from each other
Mathematically speaking:
• small within-class variability
• large between-class variability
=⇒ Two criteria. Which one should we use?
Notation: x̄_k is the mean of variable k; x̄_qk is the mean of variable k in class q.
∑_{k=1}^{K} ∑_{q=1}^{Q} ∑_{i=1}^{I} (x_iqk − x̄_k)²  =  ∑_{k=1}^{K} ∑_{q=1}^{Q} ∑_{i=1}^{I} (x_iqk − x̄_qk)²  +  ∑_{k=1}^{K} ∑_{q=1}^{Q} ∑_{i=1}^{I} (x̄_qk − x̄_k)²
        total inertia                      within-class inertia                     between-class inertia
[Illustration: the decomposition pictured for variables x1, x2, x3.]
=⇒ Only one criterion! Since total inertia is fixed, minimizing within-class inertia is equivalent to maximizing between-class inertia.
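Huygens' decomposition can be checked numerically; a minimal sketch on an invented one-variable example:

```python
def inertia_decomposition(values, classes):
    """Return (total, within, between) inertia for one variable,
    given parallel lists of values and class labels."""
    n = len(values)
    mean = sum(values) / n
    total = sum((v - mean) ** 2 for v in values)
    within = between = 0.0
    for q in set(classes):
        vq = [v for v, c in zip(values, classes) if c == q]
        mq = sum(vq) / len(vq)
        within += sum((v - mq) ** 2 for v in vq)       # spread inside class q
        between += len(vq) * (mq - mean) ** 2          # class mean vs overall mean
    return total, within, between

vals = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
cls = ["a", "a", "a", "b", "b", "b"]
total, within, between = inertia_decomposition(vals, cls)
print(total, within + between)  # the two numbers are equal
```

Because total inertia does not depend on the partition, a single criterion suffices.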
Partition quality is measured by:
0 ≤ between-class inertia / total inertia ≤ 1

between / total = 0 =⇒ ∀k, ∀q : x̄_qk = x̄_k
for every variable, all classes have the same mean — doesn’t allow us to classify

between / total = 1 =⇒ ∀k, ∀q, ∀i : x_iqk = x̄_qk
individuals in the same class are identical — ideal for classifying
Warning: don’t take this criterion at face value: it depends on the number of individuals and on the number of classes.
Ward’s method
• Initialization: 1 class = 1 individual =⇒ between-class inertia = total inertia
• At each step: merge the two classes a and b that minimize the loss of between-class inertia:

Inertia(a ∪ b) − Inertia(a) − Inertia(b) = (m_a m_b)/(m_a + m_b) × d²(a, b)   ← to minimize

Ward’s criterion tends to merge classes with small weights and classes with close centers of gravity, and avoids chain effects (unlike single linkage).
[Figures: single linkage (“saut minimum”) vs Ward on simulated data; Ward can be used directly for clustering.]
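A sketch of the quantity Ward minimizes — the loss of between-class inertia when merging classes a and b with weights m_a, m_b and centers of gravity g_a, g_b (toy values):

```python
def ward_cost(ga, gb, ma, mb):
    """Loss of between-class inertia when merging two classes:
    (ma * mb) / (ma + mb) * d^2(ga, gb)."""
    d2 = sum((x - y) ** 2 for x, y in zip(ga, gb))
    return ma * mb / (ma + mb) * d2

# merging two singletons 1 apart costs 1/2 * 1 = 0.5
print(ward_cost((0.0,), (1.0,), 1, 1))   # 0.5
# heavier classes at the same distance cost much more to merge
print(ward_cost((0.0,), (1.0,), 10, 10)) # 5.0
```

The weight factor m_a m_b / (m_a + m_b) is what makes Ward reluctant to merge two large classes, favoring small classes and close centers.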
Temperature data
• 23 individuals: European capitals
• 12 variables: mean monthly temperatures over 30 years
            Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec   Area
Amsterdam   2.9   2.5   5.7   8.2  12.5  14.8  17.1  17.1  14.5  11.4   7.0   4.4   West
Athens      9.1   9.7  11.7  15.4  20.1  24.5  27.4  27.2  23.8  19.2  14.6  11.0   South
Berlin     -0.2   0.1   4.4   8.2  13.8  16.0  18.3  18.0  14.4  10.0   4.2   1.2   West
Brussels    3.3   3.3   6.7   8.9  12.8  15.6  17.8  17.8  15.0  11.1   6.7   4.4   West
Budapest   -1.1   0.8   5.5  11.6  17.0  20.2  22.0  21.3  16.9  11.3   5.1   0.7   East
Copenhagen -0.4  -0.4   1.3   5.8  11.1  15.4  17.1  16.6  13.3   8.8   4.1   1.3   North
Dublin      4.8   5.0   5.9   7.8  10.4  13.3  15.0  14.6  12.7   9.7   6.7   5.4   North
Elsinki    -5.8  -6.2  -2.7   3.1  10.2  14.0  17.2  14.9   9.7   5.2   0.1  -2.3   North
Kiev       -5.9  -5.0  -0.3   7.4  14.3  17.8  19.4  18.5  13.7   7.5   1.2  -3.6   East
Krakow     -3.7  -2.0   1.9   7.9  13.2  16.9  18.4  17.6  13.7   8.6   2.6  -1.7   East
Lisbon     10.5  11.3  12.8  14.5  16.7  19.4  21.5  21.9  20.4  17.4  13.7  11.1   South
London      3.4   4.2   5.5   8.3  11.9  15.1  16.9  16.5  14.0  10.2   6.3   4.4   North
Madrid      5.0   6.6   9.4  12.2  16.0  20.8  24.7  24.3  19.8  13.9   8.7   5.4   South
Minsk      -6.9  -6.2  -1.9   5.4  12.4  15.9  17.4  16.3  11.6   5.8   0.1  -4.2   East
Moscow     -9.3  -7.6  -2.0   6.0  13.0  16.6  18.3  16.7  11.2   5.1  -1.1  -6.0   East
Oslo       -4.3  -3.8  -0.6   4.4  10.3  14.9  16.9  15.4  11.1   5.7   0.5  -2.9   North
Paris       3.7   3.7   7.3   9.7  13.7  16.5  19.0  18.7  16.1  12.5   7.3   5.2   West
Prague     -1.3   0.2   3.6   8.8  14.3  17.6  19.3  18.7  14.9   9.4   3.8   0.3   East
Reykjavik  -0.3   0.1   0.8   2.9   6.5   9.3  11.1  10.6   7.9   4.5   1.7   0.2   North
Rome        7.1   8.2  10.5  13.7  17.8  21.7  24.4  24.1  20.9  16.5  11.7   8.3   South
Sarajevo   -1.4   0.8   4.9   9.3  13.8  17.0  18.9  18.7  15.2  10.5   5.1   0.8   South
Sofia      -1.7   0.2   4.3   9.7  14.3  17.7  20.0  19.5  15.8  10.7   5.0   0.6   East
Stockholm  -3.5  -3.5  -1.3   3.5   9.2  14.6  17.2  16.0  11.7   6.5   1.7  -1.6   North

Which cities have similar weather patterns? How can we characterize groups of cities?

Temperature data: the hierarchical tree
[Dendrogram of the 23 capitals with the corresponding bar chart of inertia gains.]
Loss in between-class inertia when going from:
23 clusters to 22: 0.01
22 clusters to 21: 0.01
21 clusters to 20: 0.01
…
9 clusters to 8: 0.15
8 clusters to 7: 0.16
7 clusters to 6: 0.27
6 clusters to 5: 0.29
5 clusters to 4: 0.60
4 clusters to 3: 0.76
3 clusters to 2: 2.36
2 clusters to 1: 6.76
The loss is large when going from 2 clusters to a single cluster, so we prefer to keep 2 clusters.
Sum of the inertia losses = 12 = total inertia
Using the tree to build a partition
Should we make 2 groups ? 3 ? 4 ?
Cutting into 2 groups: between-class inertia / total inertia = 6.76 / 12 = 56%
What can we compare this percentage with?
[Dendrogram cut into 2 groups.]
56% of the information is contained in this 2-class cut.
[Factor map of the cities (Dim 1: 82.90%, Dim 2: 15.40%) showing the 2-class partition.]
Separating the cold cities into 2 groups: between-class inertia / total inertia = 2.36 / 12 = 20%
[Dendrogram cut into 3 groups.]
Going from 23 cities to 3 classes retains 56% + 20% = 76% of the variability in the data.
[Factor map of the cities (Dim 1: 82.90%, Dim 2: 15.40%) showing the 3-class partition.]
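The percentages above follow from simple arithmetic on the inertia losses read off the tree (total inertia = 12):

```python
total_inertia = 12.0
loss_2_to_1 = 6.76  # between-class inertia of the 2-class cut
loss_3_to_2 = 2.36  # extra between-class inertia gained by the 3-class cut

pct_2 = loss_2_to_1 / total_inertia                   # quality of the 2-class cut
pct_3 = (loss_2_to_1 + loss_3_to_2) / total_inertia   # quality of the 3-class cut
print(round(100 * pct_2), round(100 * pct_3))         # 56 76
```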
Determining the number of classes:
• from the tree
• from the bar chart of inertia gains
• depending on the intended use (survey, etc.)
• ultimate criterion: interpretability of the classes
[Dendrogram and bar chart of inertia gains.]
Partitioning algorithm: K-means
Algorithm for aggregating around moving centers (K-means)
• Randomly choose Q centers of gravity
• Assign each point to its closest center
• Recompute the Q centers of gravity
• Repeat the assignment and recomputation steps until the partition stabilizes
[Figure: successive K-means iterations on the temperature data, displayed on the first factorial plane (Dim 1: 82.9%, Dim 2: 15.4%).]
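The steps above can be sketched as a minimal 1-D K-means (random initial centers; invented toy data — this is not FactoMineR's implementation):

```python
import random

def kmeans(points, q, n_iter=20, seed=0):
    """Minimal 1-D K-means: random initial centers, then alternate
    assignment and center recomputation."""
    rng = random.Random(seed)
    centers = rng.sample(points, q)                  # randomly choose Q centers
    for _ in range(n_iter):
        clusters = [[] for _ in range(q)]
        for p in points:                             # assign to the closest center
            j = min(range(q), key=lambda j: (p - centers[j]) ** 2)
            clusters[j].append(p)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]  # recompute centers of gravity
    return centers

pts = [1.0, 1.2, 0.8, 10.0, 10.2, 9.8]
print(sorted(kmeans(pts, 2)))  # two centers, near 1.0 and 10.0
```

On this well-separated toy data any random start converges to the same two centers; in general K-means only reaches a local optimum, which motivates the robustification below.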
Robustifying a partition obtained using hierarchical clustering
The partition obtained by hierarchical clustering is not optimal and can be improved or made robust using K-means
Algorithm:
• use the partition from hierarchical clustering to initialize K-means
• run a few iterations of K-means
=⇒ a potentially improved partition
Advantage: a more robust partition
Disadvantage: the hierarchical structure is lost
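A sketch of this consolidation step on invented 1-D data: K-means is started from the centers of gravity of the existing classes instead of random centers.

```python
def consolidate(points, partition, n_iter=5):
    """Run a few K-means iterations starting from the centers of
    gravity of an existing partition (list of class labels)."""
    labels = sorted(set(partition))
    for _ in range(n_iter):
        # centers of gravity of the current classes
        centers = {q: sum(p for p, c in zip(points, partition) if c == q)
                      / partition.count(q) for q in labels}
        # reassign each point to the closest center
        partition = [min(labels, key=lambda q: (p - centers[q]) ** 2)
                     for p in points]
    return partition

# a deliberately imperfect 2-class partition of 1-D data
pts = [0.9, 1.1, 1.0, 9.9, 10.1, 10.0]
bad = ["a", "a", "b", "b", "b", "b"]  # the point 1.0 is misplaced in class b
print(consolidate(pts, bad))          # 1.0 moves back to class a
```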
[Figure: dendrogram built from the original data vs dendrogram built from classes.]
Hierarchical clustering in high dimension
• If there are many variables: run PCA and keep only the first axes =⇒ back to the classical case
• If there are many individuals, the hierarchical algorithm takes too long:
  • use K-means to partition the data into around 100 classes
  • build the tree on these classes (weighted by the number of individuals in each class)
  • this gives the “top” of the tree
Hierarchical clustering on qualitative data
Two strategies:
• transform the data into quantitative variables: run MCA, keep only the first dimensions, and run hierarchical clustering on the MCA principal axes
• use measures/indices suited to qualitative variables: similarity indices, the Jaccard index, etc.
Doing factor analysis followed by clustering
• Qualitative data: MCA outputs quantitative principal components
• Factor analysis eliminates the last components, which are mostly noise =⇒ a more stable clustering
[Diagram: PCA splits the data table (variables x.1 … x.K) into structure (first components F1 … FQ) and noise (remaining components up to FK).]
• Represent the tree and the classes on the first two factor axes
=⇒ factor analysis gives continuous information, the tree gives discontinuous information; the tree hints at information hidden in the later axes
[Figure: hierarchical clustering drawn on the factor map (Dim 1: 82.9%), with the 3 classes colored.]
The class make-up: using “model individuals”
Model individuals: the individuals closest to each class center.
Cluster 1: Oslo (0.339), Helsinki (0.884), Stockholm (0.9224), Minsk (0.9654), Moscow (1.7664)
Cluster 2: Berlin (0.5764), Sarajevo (0.7164), Brussels (1.038), Prague (1.0556), Amsterdam (1.124)
Cluster 3: Rome (0.360), Lisbon (1.737), Madrid (1.835), Athens (2.167)
[Factor map (Dim 1: 82.90%, Dim 2: 15.40%) showing the 3 classes.]

Characterizing/describing classes
Goals:
• find the variables that are most important for the partition
• characterize a class (or a group of individuals) in terms of quantitative variables
• sort the variables that best describe the classes
Questions:
• Which variables best characterize the partition?
• How can we characterize the individuals in the 1st class?
• Which variables describe them best?
Which variables best represent the partition?
• For each quantitative variable:
  • build an analysis-of-variance model of the variable against the class variable
  • run a Fisher test to detect a class effect
• Sort the variables by increasing p-value
             Eta2     P-value
October    0.8990   1.108e-10
March      0.8865   3.556e-10
November   0.8707   1.301e-09
September  0.8560   3.842e-09
April      0.8353   1.466e-08
February   0.8246   2.754e-08
December   0.7730   3.631e-07
January    0.7477   1.047e-06
August     0.7160   3.415e-06
July       0.6309   4.690e-05
May        0.5860   1.479e-04
June       0.5753   1.911e-04
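The Eta2 column is, for each variable, the share of its inertia explained by the partition (between-class inertia over total inertia). A sketch on invented data:

```python
def eta_squared(values, classes):
    """Between-class inertia / total inertia for one variable."""
    n = len(values)
    mean = sum(values) / n
    total = sum((v - mean) ** 2 for v in values)
    between = 0.0
    for q in set(classes):
        vq = [v for v, c in zip(values, classes) if c == q]
        between += len(vq) * (sum(vq) / len(vq) - mean) ** 2
    return between / total

cls = ["a", "a", "b", "b"]
# a variable perfectly separated by the partition
print(eta_squared([1.0, 1.0, 5.0, 5.0], cls))  # 1.0
# a variable unrelated to the partition
print(eta_squared([1.0, 5.0, 1.0, 5.0], cls))  # 0.0
```

Variables with Eta2 near 1 separate the classes well; the table then orders them by the p-value of the associated Fisher test.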
Characterizing classes using quantitative variables
[Figure: for each month January–December, the distribution of the 23 cities’ temperatures, colored by class.]
1st idea : if the values of X for class q seem to be randomly drawn from all the values of X, then X doesn’t characterize class q.
[Figure: the January temperatures of class q compared with a random draw of the same size from all values, on the temperature axis.]
2nd idea : the more a random draw appears unlikely, the more X characterizes class q.
Idea: use as reference a random draw (without replacement) of n_q values from the N values of X.
What values can x̄_q take? (i.e., what is the distribution of X̄_q?)

E(X̄_q) = x̄        V(X̄_q) = (s² / n_q) × (N − n_q) / (N − 1)
L(X̄_q) ≈ N (normal), because X̄_q is a mean.
=⇒ Test statistic:  (x̄_q − x̄) / sqrt( (s² / n_q) × (N − n_q) / (N − 1) )  ∼  N(0, 1)
• If |test statistic| ≥ 1.96, then X characterizes class q
• The larger the |test statistic|, the better X characterizes class q
Idea: rank the variables by decreasing |test statistic|
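This test statistic (FactoMineR's v.test) is easy to compute directly; a sketch with invented values:

```python
import math

def v_test(class_values, all_values):
    """(x̄_q − x̄) / sqrt(s²/n_q · (N − n_q)/(N − 1)),
    with s² the population variance of all N values."""
    N, nq = len(all_values), len(class_values)
    xbar = sum(all_values) / N
    xq = sum(class_values) / nq
    s2 = sum((v - xbar) ** 2 for v in all_values) / N
    return (xq - xbar) / math.sqrt(s2 / nq * (N - nq) / (N - 1))

all_vals = [float(i) for i in range(1, 11)]  # values 1..10, overall mean 5.5
hot = [8.0, 9.0, 10.0]                       # a class with high values
v = v_test(hot, all_vals)
print(round(v, 2), abs(v) >= 1.96)
```

Here |v| exceeds 1.96, so the variable would be retained as characterizing the class.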
$quanti$`1`
            v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
July         -1.99             16.80         18.90           2.450        3.33  0.046100
June         -2.06             14.70         16.80           2.520        3.07  0.039600
August       -2.48             15.50         18.30           2.260        3.53  0.013100
May          -2.55             10.80         13.30           2.430        2.96  0.010800
September    -3.14             11.00         14.70           1.670        3.68  0.001710
January      -3.26             -5.14          0.17           2.630        5.07  0.001130
December     -3.27             -2.91          1.84           1.830        4.52  0.001080
November     -3.36              0.60          5.08           0.940        4.14  0.000781
April        -3.39              4.67          8.38           1.550        3.40  0.000706
February     -3.44             -4.60          0.96           2.340        5.01  0.000577
October      -3.45              5.76         10.10           0.919        3.87  0.000553
March        -3.68             -1.14          4.06           1.100        4.39  0.000238
$`2`
NULL

$`3`
            v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
September     3.81             21.20         14.70            1.54        3.68  0.000140
October       3.72             16.80         10.10            1.91        3.87  0.000201
August        3.71             24.40         18.30            1.88        3.53  0.000211
November      3.69             12.20          5.08            2.26        4.14  0.000222
July          3.60             24.50         18.90            2.09        3.33  0.000314
April         3.53             14.00          8.38            1.18        3.40  0.000413
March         3.45             11.10          4.06            1.27        4.39  0.000564
February      3.43              8.95          0.96            1.74        5.01  0.000593
June          3.39             21.60         16.80            1.86        3.07  0.000700
December      3.39              8.95          1.84            2.34        4.52  0.000706
January       3.29              7.92          0.17            2.08        5.07  0.000993
May           3.18             17.60         13.30            1.55        2.96  0.001460
Characterizing classes using qualitative variables
Which variables best characterize the partition ?
• For each qualitative variable, run a χ² test between it and the class variable
• Sort the variables by increasing p-value
$test.chi2
          p.value  df
Area  0.001195843   6
Does the South category characterize the 3rd class ?
            Cluster 3   Other clusters   Total
South       n_mc = 4          1          n_m = 5
Not South       0            18              18
Total       n_c = 4          19          n = 23
Test: H0: n_mc / n_c = n_m / n  versus  H1: category m is abnormally overrepresented in class c
Under H0: L(N_mc) = H(n_c, n_m, n) (hypergeometric); compute P_H0(N_mc ≥ n_mc)

Cluster 3:
             Cla/Mod  Mod/Cla  Global   p.value  v.test
Area=South        80      100   21.74  0.000564   3.448

Cla/Mod = 4/5 × 100 = 80 ; Mod/Cla = 4/4 × 100 = 100 ; Global = 5/23 × 100 = 21.74 ; P_{H(4,5,23)}[N_mc ≥ 4] = 0.000564
=⇒ H0 is rejected: South is overrepresented in the 3rd class.
Sort the categories by p-value.
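The hypergeometric p-value on the slide can be reproduced exactly with math.comb (n = 23 cities, n_m = 5 in the South, n_c = 4 cities in cluster 3, n_mc = 4 of them South):

```python
from math import comb

def hypergeom_pvalue(n, n_m, n_c, n_mc):
    """P(N_mc >= n_mc) when drawing n_c individuals out of n,
    of which n_m carry the category."""
    return sum(comb(n_m, k) * comb(n - n_m, n_c - k)
               for k in range(n_mc, min(n_m, n_c) + 1)) / comb(n, n_c)

p = hypergeom_pvalue(n=23, n_m=5, n_c=4, n_mc=4)
print(round(p, 6))  # 0.000565, i.e. the slide's 0.000564 up to rounding
```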
Characterizing classes using factor axes
These are also quantitative variables
$`1`
        v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.1    -3.32             -3.37             0            0.85        3.15  0.000908

$`2`
        v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.3    -2.41             -0.18             0            0.22        0.36  0.015776

$`3`
        v.test  Mean in category  Overall mean  sd in category  Overall sd   p.value
Dim.1     3.86              5.66             0            1.26        3.15  0.000112
Conclusions
• Clustering can be done on tables of individuals vs quantitative variables ⇒ MCA transforms qualitative variables into quantitative ones
• hierarchical clustering gives a hierarchical tree ⇒ number of classes
• K-means can be used to make classes more robust
• Characterize classes by active and supplementary variables, quantitative or qualitative
More
Husson F., Lê S. & Pagès J. (2017). Exploratory Multivariate Analysis by Example Using R, 2nd edition, 230 p., Chapman & Hall/CRC. ISBN 978-1-138-19634-6.
The FactoMineR package for performing clustering: http://factominer.free.fr/index_fr.html
Videos on YouTube:
• a YouTube channel: youtube.com/HussonFrancois
• a playlist with 11 videos in English
• a playlist with 17 videos in French