
Supplementary Information for

History of art paintings through the lens of entropy and complexity

Higor Y. D. Sigaki, Matjaž Perc, and Haroldo V. Ribeiro

To whom correspondence should be addressed. E-mail: [email protected] or hvr@dfi.uem.br

This PDF file includes: Supplementary text; Figs. S1 to S11


www.pnas.org/cgi/doi/10.1073/pnas.1800083115

Supporting Information Text

[Fig. S1 plot: two panels showing Complexity, C (vertical axis) versus Entropy, H (horizontal axis, 0.80 to 0.96). Points are labeled by period: 1031-1570, 1570-1760, 1760-1836, 1836-1869, 1869-1880, 1895-1902, 1902-1909, 1939-1952, 1952-1962, 1962-1970, 1970-1980, 1980-1994, 1994-2016.]

Fig. S1. Robustness of the evolution trends against sampling. Each gray curve corresponds to the average values of H and C obtained by randomly sampling 30% (left panel) and 10% (right panel) of the images in the dataset. A total of 100 different realizations of the sampling procedure were made. The black curves depict the average trend obtained with the full dataset. We observe that the historical trends displayed by the average values of H and C are robust against sampling, even when using only 10% of images.
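The sampling check described in this caption can be illustrated with a minimal sketch. It assumes that images are drawn without replacement and that each image is stored as a `(period, H, C)` tuple; both the data layout and the function name `subsample_trend` are hypothetical, and the data below are synthetic stand-ins rather than values from the dataset.

```python
import random
from statistics import mean

def subsample_trend(images, fraction, n_realizations=100, seed=0):
    """Average H and C per period over random subsamples of the images.

    `images` is a list of (period, H, C) tuples (hypothetical layout).
    Returns one dict per realization mapping period -> (mean H, mean C).
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(images)))
    trends = []
    for _ in range(n_realizations):
        sample = rng.sample(images, k)  # assumption: sampling without replacement
        by_period = {}
        for period, h, c in sample:
            by_period.setdefault(period, []).append((h, c))
        trends.append({
            period: (mean(h for h, _ in vals), mean(c for _, c in vals))
            for period, vals in by_period.items()
        })
    return trends

# Synthetic stand-in data: two periods with different typical H and C.
rng = random.Random(1)
images = (
    [("1031-1570", rng.gauss(0.90, 0.03), rng.gauss(0.10, 0.02)) for _ in range(500)]
    + [("1939-1952", rng.gauss(0.85, 0.03), rng.gauss(0.11, 0.02)) for _ in range(500)]
)
trends = subsample_trend(images, fraction=0.10)  # 100 realizations of a 10% sample
```

Each entry of `trends` plays the role of one gray curve; plotting them together with the full-dataset averages gives a comparison of the kind shown in the figure.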

Fig. S2. The relationship between the values of H and C, calculated by means of the average RGB channels and by means of the gray-scale luminance transformation. Each dot in the scatter plots shows the values of H and C for each image, as obtained through the average values of the three color shades of each pixel, and through the gray-scale luminance transformation. We observe that both transformations yield strongly correlated values of H and C.

[Fig. S3 plot: probability distribution (logarithmic vertical axis) versus image dimensions (horizontal axis, 0 to 4000 pixels) for image width and image height. Legend: width = 895 [313, 2491] (95% interval); height = 913 [323, 2702] (95% interval).]

Fig. S3. Probability distribution of image dimensions. The red and blue curves show the probability distributions of the widths and heights of all images in our dataset on a log-linear plot. It can be observed that the width and height have a similar distribution and practically the same average value (895 pixels for width and 913 pixels for height). The shaded regions represent the intervals of width and height containing 95% of all images.

Fig. S4. Complexity measures H and C are uncorrelated with image dimensions. The scatter plots depict the values of H (left panels) and C (right panels) versus the image length, defined as the square root of the image area (that is, $\sqrt{n_x n_y}$, where $n_x$ is the image width and $n_y$ is the image height). The first row shows the relationship on a linear scale, the second on a linear-log scale, and the third row on a log-log scale. Each dot represents an image in our dataset. We observe no correlations between the complexity measures and image length. In particular, the Pearson linear correlation is ≈ 0.05 for the relationship between the image length and H, and ≈ 0.01 for C. Also, no significant correlation is detected by the maximal information coefficient (MIC), whose values are ≈ 0.07 for both relationships. This analysis indicates that our results obtained with embedding dimensions $d_x = d_y = 2$ are not biased by image dimensions.
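As an illustration of the quantities named in this caption, the following minimal sketch computes the image length and a Pearson linear correlation using only the Python standard library. The function names and the example records are hypothetical, and the MIC is not covered here.

```python
import math

def image_length(width, height):
    """Square root of the image area, sqrt(n_x * n_y)."""
    return math.sqrt(width * height)

def pearson(xs, ys):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical records: (width, height, H) for a handful of images.
records = [(895, 913, 0.91), (313, 323, 0.88), (2491, 2702, 0.90), (1200, 800, 0.93)]
lengths = [image_length(w, h) for w, h, _ in records]
entropies = [h_value for _, _, h_value in records]
r = pearson(lengths, entropies)  # correlation between image length and H
```

A value of `r` close to zero over the full dataset would correspond to the absence of a linear relationship reported in the caption.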

[Fig. S5 plot: horizontal bar plot with one bar per artistic style; horizontal axis: Total by style (logarithmic, $10^{2}$ to $10^{4}$).]

Fig. S5. Image distribution among different artistic styles in our dataset. The barplot shows the number of images for all the 92 different styles that have at least 100 images each.

[Fig. S6 plot: complexity-entropy plane with one labeled point per artistic style; axes: Entropy, H (horizontal) and Complexity, C (vertical), shown as a main panel together with enlarged regions of the plane.]

Fig. S6. Distinguishing among different artistic styles with the complexity-entropy plane. The colored dots represent the average values of H and C for every style in our dataset. Error bars represent the standard error of the mean. The insets highlight three different regions of the plane for better visualization. All 92 styles having at least 100 images are shown in this plot.

Fig. S7. The average values of H and C are statistically significantly different among most styles. The matrix plot shows the outcome of the bootstrap two-sample t-test that compares the differences between the average values of H and C among all possible pairs of styles. We have also considered the Bonferroni correction in order to account for the multiple hypothesis testing. The yellow cells indicate pairwise comparisons where the null hypothesis is rejected at 95% confidence (that is, there is a significant difference between the values of H and/or C between the two styles), while the purple cells indicate pairwise comparisons where the null hypothesis cannot be rejected (that is, no significant difference between the values of H and/or C is observed between the two styles). We note that the null hypothesis is rejected in 91.7% of pairwise comparisons.
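A bootstrap two-sample t-test with a Bonferroni correction can be sketched as follows. This is one common variant (both samples are shifted to the pooled mean before resampling, and a t statistic with unequal variances is used); the caption does not specify these details, so they are assumptions, as are the function names and the synthetic data. Only a single measure (for example H) is tested here.

```python
import math
import random
from itertools import combinations
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample t statistic that does not assume equal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def bootstrap_t_test(a, b, n_boot=1000, seed=0):
    """Two-sided bootstrap p-value for the null hypothesis of equal means."""
    rng = random.Random(seed)
    t_obs = abs(t_statistic(a, b))
    pooled, mean_a, mean_b = mean(a + b), mean(a), mean(b)
    # Shift both samples to the pooled mean so the null hypothesis holds.
    a0 = [x - mean_a + pooled for x in a]
    b0 = [x - mean_b + pooled for x in b]
    hits = 0
    for _ in range(n_boot):
        ra = rng.choices(a0, k=len(a0))  # resample with replacement
        rb = rng.choices(b0, k=len(b0))
        if abs(t_statistic(ra, rb)) >= t_obs:
            hits += 1
    return (hits + 1) / (n_boot + 1)

def pairwise_tests(groups, alpha=0.05, **kwargs):
    """Map each pair of groups to True when the null hypothesis is rejected."""
    pairs = list(combinations(sorted(groups), 2))
    threshold = alpha / len(pairs)  # Bonferroni correction
    return {
        (s, t): bootstrap_t_test(groups[s], groups[t], **kwargs) < threshold
        for s, t in pairs
    }

# Synthetic stand-in values of H for three hypothetical styles.
rng = random.Random(42)
style_1 = [rng.gauss(0.90, 0.02) for _ in range(60)]
style_2 = [rng.gauss(0.85, 0.02) for _ in range(60)]
styles = {"style_1": style_1, "style_2": style_2, "style_3": list(style_1)}
rejected = pairwise_tests(styles)
```

With 92 styles there are 4186 pairs, so the corrected threshold is far smaller than the smallest p-value this sketch can resolve with `n_boot=1000`; a much larger number of resamples would be needed at that scale.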

[Fig. S8 plot: silhouette coefficient (vertical axis, 0.35 to 0.60) versus distance threshold (horizontal axis, 0.05 to 0.20); the maximum is marked at (0.03, 0.57).]

Fig. S8. Silhouette coefficient of clusters obtained by cutting the dendrogram of Figure 3B at different distance thresholds. This coefficient quantifies the quality of the clustering analysis. Its value ranges from −1 to +1, and the higher the value, the better the match among styles within a cluster in comparison to the neighboring clusters. Thus, by finding the distance threshold that maximizes the silhouette coefficient, we are maximizing the quality of the clustering obtained from the dendrogram. The silhouette coefficient reaches its maximum value (0.57) at the distance threshold of 0.03. We have thus used this value to cut the dendrogram and define the number of clusters in Figure 3B.
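The threshold selection described in this caption can be sketched without external libraries. The example below swaps in single-linkage clustering, for which cutting the dendrogram at a given height is equivalent to taking the connected components of the graph that links points closer than that height; the linkage method behind Figure 3B is not stated here, so this choice is an assumption, as are the function names and the synthetic (H, C) points.

```python
import math

def cut_single_linkage(points, threshold):
    """Cluster labels from cutting a single-linkage dendrogram at `threshold`."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

def silhouette(points, labels):
    """Mean silhouette coefficient; None unless 2 <= clusters < points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if not 2 <= len(clusters) < len(points):
        return None
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def best_threshold(points, thresholds):
    """Return (silhouette, threshold) for the threshold with the highest score."""
    scored = [(silhouette(points, cut_single_linkage(points, t)), t) for t in thresholds]
    return max((s, t) for s, t in scored if s is not None)

# Synthetic (H, C) averages forming three tight groups of hypothetical styles.
points = [
    (0.900, 0.100), (0.905, 0.102), (0.898, 0.099),
    (0.850, 0.130), (0.852, 0.128), (0.849, 0.131),
    (0.950, 0.060), (0.951, 0.062), (0.948, 0.059),
]
thresholds = [0.005, 0.01, 0.02, 0.05, 0.1]
best_score, best_t = best_threshold(points, thresholds)
```

Thresholds that merge everything into one cluster (or leave every point alone) have no defined silhouette coefficient and are skipped.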

[Fig. S9 plot: matrix plot of the pairwise “distance” between styles (color scale labeled Distance, 0.2 to 1.0) together with the corresponding dendrogram; rows and columns are labeled with the style names, colored by cluster.]

Fig. S9. Hierarchical organization of styles according to the keywords extracted from the corresponding Wikipedia pages. For each of the 92 different styles having at least 100 images, we have obtained the textual content from its Wikipedia page. These texts were processed using the term frequency-inverse document frequency (TF-IDF) approach, and the top-100 keywords were obtained for each style. We thus define the “distance” between two styles as the inverse of 1 plus the number of shared keywords between the two styles. Thus, styles having no common keywords are at the maximum “distance”, while styles sharing several keywords are at a closer “distance”. The matrix plot shows this “distance” matrix as well as the dendrogram. The colored labels indicate the clusters obtained by cutting the dendrogram at the threshold distance maximizing the silhouette coefficient. This approach yields 24 different style groups, which is considerably more than the 14 clusters obtained from the complexity-entropy plane (Figure 3). However, both clusters share similarities, which can be quantified by the homogeneity h, the completeness c, and the v-measure metrics. Perfect homogeneity (h = 1) implies that all clusters obtained from the Wikipedia texts contain only styles belonging to the same clusters obtained from the complexity-entropy plane. On the other hand, perfect completeness (c = 1) implies that all styles belonging to the same cluster obtained from the complexity-entropy plane are grouped in the same clusters obtained from the Wikipedia texts. The v-measure is the harmonic mean between h and c, that is, $v = 2hc/(h + c)$. Our results yield h = 0.49, c = 0.40, and v = 0.44. These values are significantly larger than those obtained from a null model where the number of shared keywords is randomly chosen from a uniform distribution between 0 and 100 ($h_{\mathrm{rand}} = 0.42 \pm 0.02$, $c_{\mathrm{rand}} = 0.35 \pm 0.01$, and $v_{\mathrm{rand}} = 0.38 \pm 0.01$; average values over 100 independent realizations).
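The keyword “distance” and the three clustering metrics from the Fig. S9 caption can be written out in a few lines. This is a minimal sketch using the standard entropy-based definitions of homogeneity and completeness; the function names and the toy labels are hypothetical, and keyword extraction with TF-IDF is not shown.

```python
import math
from collections import Counter

def keyword_distance(keywords_a, keywords_b):
    """Inverse of 1 plus the number of keywords shared by two styles."""
    shared = len(set(keywords_a) & set(keywords_b))
    return 1.0 / (1 + shared)

def entropy(labels):
    """Shannon entropy (in nats) of a list of cluster labels."""
    n = len(labels)
    return -sum(cnt / n * math.log(cnt / n) for cnt in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given) in nats for two equal-length label lists."""
    n = len(target)
    joint = Counter(zip(target, given))
    marginal = Counter(given)
    return -sum(cnt / n * math.log(cnt / marginal[g]) for (_, g), cnt in joint.items())

def homogeneity(reference, predicted):
    """1 when every predicted cluster holds members of one reference cluster."""
    h_ref = entropy(reference)
    return 1.0 if h_ref == 0 else 1.0 - conditional_entropy(reference, predicted) / h_ref

def completeness(reference, predicted):
    """1 when every reference cluster ends up inside one predicted cluster."""
    return homogeneity(predicted, reference)

def v_measure(h, c):
    """Harmonic mean of homogeneity and completeness, v = 2hc / (h + c)."""
    return 2 * h * c / (h + c) if h + c > 0 else 0.0

# Toy example: reference clusters (complexity-entropy plane) versus
# predicted clusters (Wikipedia keywords) for four hypothetical styles.
reference = [0, 0, 1, 1]
predicted = [0, 1, 2, 3]
h = homogeneity(reference, predicted)
c = completeness(reference, predicted)
v = v_measure(h, c)
```

Plugging the reported values into `v_measure` reproduces the figures in the caption: `v_measure(0.49, 0.40)` rounds to 0.44 and `v_measure(0.42, 0.35)` rounds to 0.38.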

[Fig. S10 plots: training score and cross-validation score in ten panels. Nearest Neighbors: (A) scores versus number of neighbors (0 to 500), (B) scores versus training size. Random Forest: (C) scores versus number of trees, (D) scores versus Max. depth, (E) scores versus training size. Support Vector Machine (RBF): (F) and (G) scores versus the two SVC parameters, (H) scores versus training size. Neural Network: (I) scores versus the neural network parameter, (J) scores versus training size.]

Fig. S10. Training and cross-validation scores obtained with the four different machine learning algorithms used for predicting styles, as a function of their main parameters and the training size. Panels (A) and (B) show the results for the nearest neighbors algorithm. Panels (C), (D) and (E) show the scores for the random forest algorithm. The main parameters, in this case, are the number of trees in the forest and the maximum depth of the trees (Max. depth). Panels (F), (G) and (H) show the scores for the support vector classification (SVC) with the radial basis function kernel (RBF). The parameter γ is associated with the width of the RBF, and C0 is the penalty parameter. Panels (I) and (J) show the results for a neural network algorithm. The parameter α is the so-called L2 penalty, and the number of neurons is equal to 100. The average accuracies reported in Figure 4 were obtained for number of neighbors = 400; γ = $10^{4}$ and C0 = 0.1; number of trees = 400 and Max. depth = 5; and α = $10^{-4}$. All algorithms are implemented in the scikit-learn library, and the learning curves are estimated with the best tuning parameters of each learning model.
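To make the idea of a cross-validation score as a function of a tuning parameter concrete, here is a dependency-free sketch for the nearest neighbors case. It is not the scikit-learn implementation referred to in the caption; the k-fold scheme, the function names, and the synthetic (H, C) data are assumptions made for illustration.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, k, n_folds=5, seed=0):
    """Mean accuracy of k-nearest neighbors over `n_folds` held-out folds."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

# Synthetic (H, C) points for two well-separated hypothetical styles.
rng = random.Random(3)
data = (
    [((rng.gauss(0.90, 0.01), rng.gauss(0.10, 0.01)), "style_1") for _ in range(60)]
    + [((rng.gauss(0.80, 0.01), rng.gauss(0.15, 0.01)), "style_2") for _ in range(60)]
)
# Cross-validation score for a few candidate numbers of neighbors.
scores = {k: cross_val_accuracy(data, k) for k in (1, 5, 15)}
```

Scanning `k` over a wider grid and plotting `scores` gives a validation curve of the kind shown in panel (A); the real dataset is far less separable than this toy example, which is why the reported accuracies are much lower.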

[Fig. S11 plot: bar plot of accuracy (vertical axis, 0.00 to 0.12) for Random Forest, Neural Network, Nearest Neighbors, Support Vector Machine (RBF), Dummy (Stratified), and Dummy (Uniform).]

Fig. S11. Accuracies of learning methods when considering the 92 different artistic styles with more than 100 images each. Depicted is the comparison between the four different statistical learning algorithms (nearest neighbors, random forest, support vector machine, and neural network), and the null accuracies obtained from two “dummy” classifiers. Error bars represent the standard error of the mean. The four classifiers have similar accuracies (≈13%) and significantly outperform the “dummy” classifiers. The parameters of the learning methods are the same as those used in Figure 4.
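The two null baselines can be illustrated with a short sketch. It assumes that the “stratified” classifier draws predictions according to the class frequencies of the training labels and that the “uniform” classifier draws every class with equal probability, which is how these strategy names are commonly understood; the function name and the toy labels are hypothetical.

```python
import random
from collections import Counter

def dummy_accuracy(train_labels, test_labels, strategy, seed=0):
    """Accuracy of a classifier that ignores the features entirely."""
    rng = random.Random(seed)
    counts = Counter(train_labels)
    classes = sorted(counts)
    if strategy == "stratified":
        weights = [counts[c] for c in classes]  # follow training frequencies
    elif strategy == "uniform":
        weights = [1] * len(classes)            # every class equally likely
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    predictions = rng.choices(classes, weights=weights, k=len(test_labels))
    return sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)

# Toy labels: four hypothetical styles with unequal numbers of images.
labels = ["a"] * 7000 + ["b"] * 1000 + ["c"] * 1000 + ["d"] * 1000
acc_stratified = dummy_accuracy(labels, labels, "stratified")
acc_uniform = dummy_accuracy(labels, labels, "uniform")
```

In expectation the uniform baseline scores one over the number of classes (about 1.1% for 92 classes), while the stratified baseline scores the sum of the squared class frequencies, so both depend only on the label distribution.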
