SUPPLEMENTARY INFORMATION
Letters
https://doi.org/10.1038/s41562-019-0570-1

In the format provided by the authors and unedited.

Large-scale quantitative profiling of the verse tradition

Leonard Neidorf 1,7, Madison S. Krieger 2,7*, Michelle Yakubek3,4, Pramit Chaudhuri 5 and Joseph P. Dexter 6*

1Department of English, Nanjing University, Nanjing, China. 2Program for Evolutionary Dynamics, Harvard University, Cambridge, MA, USA. 3Research Science Institute, Center for Excellence in Education, McLean, VA, USA. 4Texas Academy of Mathematics and Science, Denton, TX, USA. 5Department of Classics, University of Texas at Austin, Austin, TX, USA. 6Department of Systems Biology, Harvard Medical School, Boston, MA, USA. 7These authors contributed equally: Leonard Neidorf, Madison S. Krieger. *e-mail: [email protected]; [email protected]

Nature Human Behaviour | www.nature.com/nathumbehav

[Supplementary Figure 1 plot. y-axis: length of poem (words), logarithmic scale from 50 to 10,000; x-axis: ordinal index of poem, 0 to 350.]

Supplementary Figure 1. Distribution of poems in the OE verse corpus by length. The OE verse corpus contains approximately 350 poems of total length 291,000 words. Aside from a small number of more substantial works, the vast majority of texts in the corpus contain fewer than 1,000 words.

[Supplementary Figure 2 plot, panels a and b. y-axis: $\sum_{i=1}^{5} |f_{i,t} - f_{i,c}|$ (a: 0 to 0.30; b: 0 to 0.20); x-axis: length of text (characters), 0 to 120,000. Labeled texts: Psalm 118 and Maxims II in panel a; Widsith, Psalm 118, and Maxims II in panel b.]

Supplementary Figure 2. Phonetic profiling using bigrams and four-grams. Plot of cumulative difference in functional n-gram frequency (for the five most common n-grams) against text length for (a) bigrams and (b) four-grams. As in Figure 1, each dot denotes one text, and anomalous texts are highlighted and labeled.
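The quantity plotted in Supplementary Figure 2 can be illustrated with a minimal Python sketch. It assumes that frequencies are relative counts of plain character n-grams, that the five most common n-grams are taken from the pooled corpus, and that the differences are summed as absolute values. The selection of functional n-grams described in the main text is not reproduced here, and the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every overlapping character n-gram in a single text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative_freqs(counts):
    """Convert raw counts to relative frequencies (empty input gives {})."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cumulative_difference(text, corpus_texts, n=2, top_k=5):
    """Sum of |text frequency - corpus frequency| over the top_k corpus n-grams.

    Assumption: the corpus profile pools counts text by text, so no n-gram
    spans the boundary between two poems.
    """
    pooled = Counter()
    for t in corpus_texts:
        pooled.update(ngram_counts(t, n))
    corpus_freqs = relative_freqs(pooled)
    # Most common n-grams in the corpus; ties are broken alphabetically.
    top = sorted(corpus_freqs, key=lambda g: (-corpus_freqs[g], g))[:top_k]
    text_freqs = relative_freqs(ngram_counts(text, n))
    return sum(abs(text_freqs.get(g, 0.0) - corpus_freqs[g]) for g in top)
```

A text whose n-gram profile matches the corpus scores near zero, while a short or stylistically unusual text scores higher, which is consistent with the labeled outliers in the figure.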

[Supplementary Figure 3 plot. y-axis: frequency of compound word, logarithmic scale from 1 to 50; x-axis: ordinal index of compound word, 0 to 1,500.]

Supplementary Figure 3. Distribution of nominal compounds in the OE verse corpus. Most compounds are hapax legomena (bottom line).
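As an illustration of how a rank-frequency distribution like the one in Supplementary Figure 3 could be tabulated, the following Python sketch counts occurrences of each compound and reports the share that occur exactly once. The input is assumed to be a flat list of already-identified compound tokens; how compounds are detected in the corpus is not covered, and the function names are hypothetical.

```python
from collections import Counter


def rank_frequency(compounds):
    """Frequencies of distinct compounds, sorted from most to least common."""
    return sorted(Counter(compounds).values(), reverse=True)


def hapax_fraction(compounds):
    """Fraction of distinct compounds that occur exactly once."""
    counts = Counter(compounds)
    if not counts:
        return 0.0
    hapax = sum(1 for c in counts.values() if c == 1)
    return hapax / len(counts)
```

Plotting the output of `rank_frequency` against its index on a logarithmic y-axis gives the stepped shape of the figure, with the hapax legomena forming the bottom line at frequency 1.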

[Supplementary Figure 4 plot, panels a to c. y-axis: line of compound incidence; x-axis: ordinal index of hapax compound.
Panel a: series labels Genesis A1 and Genesis A2 survive; slopes shown are 31.5, 19.5, and 16.4.
Panel b: series Exodus A, Exodus B, and Exodus C; slopes shown are 7.5, 6.2, and 4.8.
Panel c: series labels "Christ and" and Exodus survive; slopes shown are 55.5, 30.3, 27.4, 24.6, 14.3, and 9.2.]

Supplementary Figure 4. Usage of hapax compounds differs between authors. Rate of use of compounds in (a) three sections of the composite poem Genesis (Pearson’s r(36) = 0.973, r(32) = 0.981, and r(22) = 0.990 for A1, A2, and B, respectively), (b) three random partitions of Exodus, which is believed to be of unitary authorship (Pearson’s r(10) = 0.995, r(10) = 0.971, and r(13) = 0.998 for A, B, and C, respectively), and (c) a selection of longer poems, some written by (Pearson’s r(14) = 0.968, r(21) = 0.934, r(47) = 0.992, r(69) = 0.997, r(229) = 0.992, and r(62) = 0.990 for , Juliana, Elene, Andreas, Beowulf, and Exodus, respectively). p < 0.001 for all correlations by a two-tailed t-test. Numbers next to linear fits denote their slope.
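The slopes and correlations reported in Supplementary Figure 4 can be illustrated with a short Python sketch of an ordinary least-squares fit. It assumes the input is the sorted list of line numbers at which successive hapax compounds occur, that ordinal indices start at 1, and that the value in parentheses after r is the degrees of freedom (n - 2). The function name is hypothetical, and the p-value calculation is omitted.

```python
import math


def fit_incidence(lines):
    """Fit line-of-incidence against ordinal index of hapax compound.

    Returns (slope, pearson_r, degrees_of_freedom). The slope is in lines
    per hapax compound, so a smaller slope means denser use of compounds.
    """
    n = len(lines)
    if n < 3:
        raise ValueError("need at least three compounds for a meaningful fit")
    xs = range(1, n + 1)  # assumption: ordinal index starts at 1
    mean_x = sum(xs) / n
    mean_y = sum(lines) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in lines)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, lines))
    if syy == 0:
        raise ValueError("all line numbers are identical; r is undefined")
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r, n - 2
```

Under this reading, r(36) for Genesis A1 would correspond to 38 hapax compounds, and a slope of 7.5 would mean roughly one new hapax compound every 7.5 lines.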

[Supplementary Figure 5 plot, panels a and b: dendrograms. y-axis: distance (a: 0 to 6; b: 0 to 10); x-axis: text number.
Panel a leaf order: 45 37 34 22 29 17 46 24 28 44 20 11 19 21 49 18 47 36 43 26 16 30 40 48 39 25 33 23 0 7 14 2 5 1 6 3 4 9 8 27 13 35 38 10 12 31 42 41 15 32
Panel b leaf order: 13 28 30 22 29 45 34 42 35 25 33 43 44 46 18 47 49 17 23 21 20 40 24 27 41 2 5 4 3 9 31 10 36 26 32 12 38 48 8 7 0 11 37 14 1 6 19 15 16 39]

Supplementary Figure 5. Additional dendrograms. Dendrograms produced from hierarchical agglomerative clustering with (a) functional bigrams and (b) functional four-grams. The numbering and color scheme for the texts are the same as in Figure 4 and correspond to the labels in Supplementary Table 1.
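To make the clustering step behind these dendrograms concrete, here is a minimal pure-Python sketch of hierarchical agglomerative clustering. The distance metric and linkage rule are not stated in this supplement, so Euclidean distance and average linkage are assumptions chosen for illustration, as are the function names. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally be used instead.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(vectors):
    """Repeatedly merge the two closest clusters until one remains.

    Returns a list of (members_a, members_b, distance) tuples in merge
    order; the distances are the heights at which a dendrogram would
    join the two branches.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        dist = average_linkage(clusters[a], clusters[b], vectors)
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges
```

Each text would be represented by a vector of n-gram frequencies, and the integer indices in the merge list would correspond to the text numbers on the x-axis.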

Supplementary Table 1. List of texts as numbered in Figure 4 and Supplementary Figure 5.

Label  Poem
0      Genesis
1      Beowulf 1-2300
2      Andreas
3      Christ
4      Guthlac
5      Elene
6      Beowulf 2301-end
7      Daniel
8      Christ and Satan
9      Juliana
10     The Phoenix
11     Exodus
12
13     Paris Psalm 118
14
15
16     The Judgment Day II
17     Meters of Boethius 20
18     Maxims I
19     The Menologium
20     The Seasons for Fasting
21     Azarias
22     Paris Psalm 77
23     The
24     Psalm 50
25     I
26     Widsith
27     The Descent into
28     Paris Psalm 88
29     Paris Psalm 105
30     The Lord’s Prayer II
31     The Judgment Day I
32
33     Soul and Body II
34     Paris Psalm 106
35     Resignation
36     The Wanderer
37     Paris Psalm 104
38     Fates of the Apostles
39     Meters of Boethius 26
40     The Gifts of Men
41     The Order of the World
42     Paris Psalm 68
43     Riddle 40
44     Metrical Charm 1
45     Paris Psalm 103
46     Meters of Boethius 11
47     The Fortunes of Men
48     The
49     Meters of Boethius 29
