
[Charts, page 1. The Greene Corpus: Alphonsus; Friar Bacon and Friar Bungay; James IV; Orlando; Selimus. Plays by Christopher Marlowe: Tamburlaine 1; Tamburlaine 2]

What you see are the classification results of the Greene and Marlowe corpora. A survey of this kind has to leave superfluous information aside, but it is indispensable to know what the columns and lines actually represent. For each play, the first column returns the nsc classification (nearest shrunken centroid) based on words, the second column gives the svm classification (support vector machine) based on words, and the third uses the delta classifier (Burrowsian). This pattern continues to the right: the next three columns are based on character bigrams, and the last three columns of each play refer to character trigrams, which are probably more reliable. It is also said that the svm classifier has a high decision level, whereas nsc jumps very quickly to attributions. All classifications were carried out with windows of 8000 words. Smaller windows tend to produce a large number of outliers and create a confusing picture. With an 8000-word window the first measurement (line 1 of each play) is recorded at 4000 words, and each subsequent line lies 250 words further on. The length of a play can therefore be estimated by adding another 4000 words to the position of the last line. There may be a small trail of words at the end of a play that was not considered. The sequence of classified plays is such that the Greene corpus was surveyed first. Then you encounter a number of plays which are stylistically linked with Marlowe, even though they contain quite a number of Greene signals. But as we come to the remaining Marlowe corpus on page 3 it becomes clear that the clarity of Marlowe attributions as displayed from Tamburlaine 1 down to Kyd’s Cornelia has disappeared.

[Charts, page 2. Plays by Christopher Marlowe: Locrine; The Battle of Alcazar; David and Bethsabe; Cornelia. Caption: arrangement of classifiers in the charts]

In some cases classifiers contradict each other. Edward II, for example, shows a pattern where nsc results have a majority in the upper and the lower division of the chart indicating , whereas svm has a clear penchant for Greene in all three types of variables. Delta, on the other hand, returns mainly Kyd and Marlowe in the word section, Shakespeare in character bigrams, and Shakespeare and Rowley in character trigrams. It is not surprising that results vary to such a degree, as the mathematical kernels of the classifiers concentrate on different patterns. The only safe assumption in the case of Edward II, however, is that the very small number of Marlowe signals supports the view that the play was not written by Marlowe. The other nominal Marlowe plays on page 3 do not even have significant Marlowe sections in any of the classifiers. Judging by averages and majorities, much of the style of Dido, Queen of Carthage comes from Greene, The Jew of Malta would be Shakespeare’s work, Chapman would have played a major role in the writing of The Massacre at Paris, and the two extant versions of Dr. Faustus are largely based on works by Greene.

In Greene’s corpus it is The Scottish History of James IV which does not replicate the typical Greene pattern as displayed in Alphonsus, Friar Bacon and Friar Bungay, and Orlando Furioso. Instead it is George Chapman who seems to have had a tremendous interest in historical themes. In addition Selimus is not at all clear as a Greene play.

The strong links of the two Tamburlaines with the anonymous Tragedy of Locrine, with Peele’s The Battle of Alcazar and The Love of David and Bethsabe, as well as with Kyd’s Cornelia, suggest that these plays are in fact by Marlowe.

[Charts, page 3. The nominal Marlowe Corpus: Dido, Queen of Carthage; Edward II (upper); Edward II (lower); The Jew of Malta; The Massacre at Paris; Doctor Faustus A; Doctor Faustus B]

It is obvious that the findings displayed in these charts must be underpinned by further methodical steps such as Rolling Delta analyses. Matching n-grams and collocations between the texts can also be followed up, and there is quite a body of secondary literature dealing with the texts above. Some of it can be found on this website.

But before we come to that it is necessary to go through the files again, this time with Greene’s plays The Scottish History of James IV and Selimus removed from the list of reference texts, as their Greene attribution is no longer that clear. As we come to the Tamburlaines on page 4 the Marlowe indications are stronger than before, and the Marlowe signals also grow in Locrine, The Battle of Alcazar, David and Bethsabe and finally Kyd’s Cornelia, which now has more Kyd signals as well.

The remaining Marlowe corpus on page 5 now shows a reduced number of Greene attributions. Shakespeare’s share, however, grows in Edward II, and The Jew of Malta now has a clear Shakespeare attribution. In The Massacre at Paris, Chapman’s part becomes more obvious as Greene disappears. The Doctor Faustus texts, too, are reduced in their Greene assignments, but apparently Rolling Classify cannot decipher the assortment of smaller textual contributions. The most important aspect, however, is that none of the texts is in any way related to the two Tamburlaines. The Marlowe cells in Dido and Edward II are far too few (see page 5).

[Charts, page 4. Plays by Christopher Marlowe: Tamburlaine 1; Tamburlaine 2; Locrine; The Battle of Alcazar; David and Bethsabe; Cornelia]

[Charts, page 5. The nominal Marlowe Corpus without the impact of James IV and Selimus as reference texts: Dido, Queen of Carthage; Edward II; The Jew of Malta; The Massacre at Paris; Dr. Faustus (A)]

The chart with the set of problems on page 6 summarizes the present situation of this investigation. The thesis is that the Marlowe corpus consisting of plays no 1 to 7 is stylistically not homogeneous. This is a somewhat clear result of the Rolling Classify (RC) analyses listed under methodology. Instead we find a number of files with Marlovian style and a remainder of corpus files with non-Marlovian style. The next task is to employ the findings of Rolling Delta (RD). My publications up to “Christopher Marlowe: Hype and Hoax” made use of selected reference texts that were believed to be relevant in the authorship determination of the corpus. This naturally involved a high degree of subjectivity, and it became necessary to use a large number of reference texts that were sole-authored and well-attributed. Now it was the program that extracted the lowest delta values from each of the reference text windows to indicate the play with the smallest stylistic difference from the target text. The window size was 4000 or 5000 words and the step size was 250 words, which means that every 250-word segment was furnished with an authorial preference. The disadvantage, however, was that it took the computer about 3 to 4 hours to go through all the files.

[Chart, page 6: The set of problems]

For this reason the chosen variables were character trigrams (mf3c), preferred for their reliability as compared to character bigrams (mf2c) and words (mf1w). The delta values that were detected by R Stylo were then transferred into a spreadsheet that contained about 100 play names in column A and the measurements of the first window in column B. Column C then contained the measurement of the next window, shifted by 250 words. This continued to the right according to the length of the target play. To give an example: if the window size was 4000 words, the first measurement was noted at 2000 words (column B), the next at 2250 words (column C), the following at 2500 words (column D) and so on. In another step column B was subjected to conditional formatting in which the lowest delta of all was highlighted in green, the second-lowest in yellow, and the third-lowest in red. This was applied to all columns (except A of course). Then plays (rows) with no highlighting at all were eliminated, and what was left was rotated by 90° into a table conveying the sequential attributions of the target play.
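The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet procedure itself: the function names, play labels and delta values are hypothetical, and the sketch assumes that each window is recorded at its midpoint and that only complete windows are measured.

```python
def window_midpoints(n_words, window=4000, step=250):
    """Word positions at which complete windows are recorded (their midpoints)."""
    half = window // 2
    return list(range(half, n_words - half + 1, step))


def three_lowest(deltas):
    """Return the three reference texts with the smallest delta, lowest first."""
    return sorted(deltas, key=deltas.get)[:3]


# Hypothetical delta values for one window (one spreadsheet column).
column_b = {"Play A": 0.91, "Play B": 0.74, "Play C": 1.08, "Play D": 0.82}

# Colour scheme from the text: lowest = green, second = yellow, third = red.
highlight = dict(zip(three_lowest(column_b), ["green", "yellow", "red"]))

print(window_midpoints(5000))  # midpoints for a hypothetical 5000-word text
print(highlight)
```

Reference texts that never appear in `highlight` for any window correspond to the rows that were eliminated from the spreadsheet.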

One example of the evaluation is given on page 7. It concerns the RD attributions of Tamburlaine, part 1. Column A recalls the number of words of the play at a distance of 250 words, starting at 2000 words, which means that the window size is 4000 words. Out of a hundred reference texts the three lowest delta values belong to Tamburlaine 2 (column E), Peele’s The Battle of Alcazar (column F), and Locrine (column A). Each line gives the next measurement and at the bottom of the table we find the percentage attributions. Column H indicates the scenes and acts of the play and their accumulated lengths, optimally adapted to the word counts of column A.
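How percentage attributions of this kind can be derived is sketched below. The sketch assumes that the percentage of a reference text is its share of the windows in which it has the lowest delta; the evaluation tables may count rankings differently. All labels and values are invented for illustration.

```python
from collections import Counter


def attribution_percentages(windows):
    """Share of windows (in percent) in which each reference text ranks first.

    `windows` is a list of dicts mapping reference text -> delta value,
    one dict per window position.
    """
    winners = Counter(min(w, key=w.get) for w in windows)
    total = len(windows)
    return {play: 100 * n / total for play, n in winners.items()}


# Four hypothetical window positions with three hypothetical reference texts.
windows = [
    {"Ref A": 0.70, "Ref B": 0.85, "Ref C": 0.90},
    {"Ref A": 0.72, "Ref B": 0.69, "Ref C": 0.95},
    {"Ref A": 0.66, "Ref B": 0.80, "Ref C": 0.81},
    {"Ref A": 0.75, "Ref B": 0.88, "Ref C": 0.74},
]
print(attribution_percentages(windows))
```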

As could be expected the best suited reference text is Tamburlaine 2. This could have been predicted by anyone, but the noteworthy aspect is that the selection was done on the basis of the lowest delta values, extracted by RD from about a hundred reference texts. Furthermore we note that the other reference texts that are related to Tamburlaine 1 in terms of style are Locrine, Kyd’s Cornelia, and Peele’s The Battle of Alcazar, a confirmation of the findings of RC above. Selimus and Peele’s Edward I are doubtful as sole-authored plays and have to be regarded as hardly meaningful.

In subsequent presentations it would be very space consuming to reproduce each time the complete attribution scheme of each of the discussed plays. Suffice it therefore to account only for the final section of each evaluation, which gives the number of rankings and the percentage of the highest ranking reference texts. This can further be supported by Pervez Rizvi’s Attribution Tester, which makes use of maximal n-grams that plays have in common. There is in particular one feature which allows for authorship allocations and thus influences the choice of plays from which n-grams are taken. According to our present findings the Marlovian plays on page 6 are all registered as Marlowe plays, Dido and Dr. Faustus appear tentatively under Greene’s name, Edward II and The Jew of Malta are connected with Shakespeare, and The Massacre at Paris with Chapman. With these provisions the weighted n-gram matches between plays are then sorted and return the ranking of the most likely authors (see the Tamburlaine 1 result on page 8).

Tamburlaine 1


As we come to Tamburlaine 2, RD provides the following summary of lowest deltas in the respective reference texts, among which three plays remained that are most probably not sole-authored, namely Greene’s Selimus, Lodge’s A Looking Glass for London and England, and Peele’s Edward I.

Tamburlaine 2

RD results of Tamburlaine 2 and ranking of weighted n-gram matches that plays have in common

With Locrine on page 9 RD returns a distinct result with Tamburlaine 2 as the most suitable reference text. But weighted n-gram matches prefer Robert Greene in unique matches, in unique tetragram+ matches and in tetragram+ matches, but Marlowe shows higher values in unique trigram+ matches and in tetragram+ matches. For the differences and definitions of terms in columns D to I please see Pervez Rizvi’s homepage http://www.shakespearestext.com/can/index.htm and in particular the section Collocations and N-grams.

Locrine

Peele’s The Battle of Alcazar

Both RD and matching n-grams give a clear Marlowe authorship.

Peele’s The Love of David and Bethsabe

RD returns Tamburlaine 2 as the best suited reference text and as far as matching n-grams are concerned Marlowe is top in three of the five criteria.

Thomas Kyd’s Cornelia

Both RD and the maximal n-gram tester bring Kyd into play, even though the RD percentage is very small and Marlowe is ahead in terms of all n-gram matches and trigram+ matches.

In all four plays on pages 9 and 10 the weighted number of n-grams always sees Christopher Marlowe in a prominent position, sometimes together with Robert Greene as in Locrine, sometimes with Kyd as in Cornelia. In the remaining official Marlowe corpus presented on pages 11 to 13 it is only Edward II which records Marlowe with higher numbers. Apparently the Marlowe sections displayed by RD in Dido and Edward II make themselves felt as well.

Dido, Queen of Carthage

Unfortunately the n-gram tester and my version of EXCEL did not agree with each other. There was confusion because of the comma in the play’s title. Various workarounds were of no avail.

Edward II

The Jew of Malta

The Massacre at Paris

Doctor Faustus (A) 1604

Doctor Faustus (B) 1616

The oppose-function is another feature of R Stylo. It ‘performs a contrastive analysis between two given sets of texts’ by generating ‘a vector of words significantly preferred by a tested author, and another vector containing the words significantly avoided.’ (stylo_howto.pdf, p. 26) The following chart contrasts the remaining official Marlowe corpus with the two Tamburlaines, and simultaneously tests the Marlowe plays from pages 9 and 10. Their position between the preferred vocabulary of the official Marlowe corpus (top left) and the avoided vocabulary of the two Tamburlaines (bottom right) is most interesting. They seem to hold a somewhat middling position, but if the scaling is taken into account it becomes clear that the tested files The Battle of Alcazar, Cornelia, The Love of David and Bethsabe and Locrine have only 34 points in common with the preferred vocabulary of the primary set, but as far as avoided vocabulary is concerned they are in a region of 49 +.

Last but not least, R Stylo provides a couple of functions like the bootstrap consensus tree, cluster analysis, and principal component analysis, which were applied to the files used so far. The first tree is based on words:

The Marlowe plays from pages 2, 4 and 9 and 10 are clearly linked with the two Tamburlaines, and the same outcome can be seen when character trigrams (mf3c) are used (see chart on the left). It is no surprise that Cluster Analysis reproduces exactly the same relations between the texts, to which MF3C results can be added (see page 16).


The comparison between Cluster Analysis results and those of Multidimensional Scaling of Delta, also carried out with MF3C, reports identical relations when the vertical positions in the latter are accounted for. Alcazar, Tamburlaine 1 and Tamburlaine 2 are even printed on top of each other.


Principal Component Analysis (PCA) also gives a clear verdict in the sense that we have two clearly distinct corpora, one of which is composed of the official Marlowe corpus, while the other comprises the two Tamburlaines as Marlowe’s original work and the files with the heading “test_”.

Scholarship and Learning

The Tragedy of Locrine

It is more than obvious that the results presented in all brevity on these pages do not correspond to what people have learned at school and in universities. Even when there is no clear attribution, as in the case of The Tragedy of Locrine, a multitude of possible relations and similarities are given. We learn that, due to the title page of the 1595 quarto and its reference to “W.S”, the play was added to Chetwinde’s Shakespeare Third Folio in 1664; that a manuscript note by Sir George Buck referred to his cousin Charles Tilney as author of the play; and that parallels in the plot elements and verbal phrases in both Locrine and Selimus (1594) furthered speculations that the author of Selimus must have borrowed from Locrine. The opposite was also argued, and some critics saw the same author at work in both. Generally seen as a play by Robert Greene and possibly Thomas Lodge, Selimus was not fully investigated here, but when RC classified the play from about a hundred reference texts (see page 1) there was no indication of Lodge, and the main attributions referred to Greene and Kyd. The attribution of Locrine to Marlowe then must rest on the findings of RC, RD and corresponding n-grams (page 9) as well as the accounts of consensus trees, cluster analyses, multidimensional scalings and principal component analyses (pages 14 to 17). Locrine was also extensively tested in the determination of Cornelia as a Marlowe play (see Hartmut Ilsemann, "Forensic Stylometry", Digital Scholarship in the Humanities, Volume 34, Issue 2, 2 June 2019, 335-349, doi.org/10.1093/llc/fqy023).

The Battle of Alcazar

Peele’s The Battle of Alcazar, which is believed to have been staged under the title Muly Molucco, was performed by Lord Strange’s Men between February 1592 and January 1593, so that 1591 has become the probable date of writing. The attribution to Peele is questionable (Edelmann, p. 16), and Chambers names the anthology England's Parnassus (1600) as the source of that apparently faulty attribution (Chambers, vol. III, p. 459-60). When in 1999 Brian B. Ritchie dealt with this play in his thesis The Plays of Christopher Marlowe and George Peele: Rhetoric and Renaissance Sensibility he stated:

The Battle of Alcazar clearly shows the influence of Marlowe’s Tamburlaine: the resolution and drive of Sebastian; the aspirations of Stukley; and the prominence of the exotic, of pageantry, and of scenic effects, are all reminiscent of Marlowe’s heroic . Above all, the choice of the medium and even the diction reveal the influence of Marlowe (Ritchie, p. 69).

In his footnote on the same page he refers to more background information:

See Cheffaud, pp. 75-78. Writing of Peele’s approach, Cheffaud comments on ‘la magnificence de ses tableaux, la rapidité de son action et le ton déclamatoire de son style, en un mot par l’adoption sans réserve de tous les procédés marlowesques’ (p.75) […] It is instructive, as an example of adaptation, to see just how Peele uses the characteristically Marlovian theme of aspiration to regal pomp. Kingship and the symbol of the crown as the object of aspiration are chief concerns of Tamburlaine; he says such things as: ‘Is it not passing brave to be a king, / And ride in triumph through Persepolis?’ […] and ‘That perfect bliss and sole felicity, / The sweet fruition of an earthly crown.’ […] Peele seizes upon the words ‘crown’ and ‘king’ in a speech he gives to Stukley:

There shall no action passe my hand or sword,
That cannot make a step to gaine a crowne,
No word shall passe the office of my tong.
That sounds not of affection to a crowne.
No thought have being in my lordly brest,
That works not everie waie to win a crowne,
Deeds, words and thoughts shall all be as kings,
My chiefest companie shall be with kings,
And my deserts shall counterpoise a kings,
Why should not I then looke to be a king?
I am the marques now of Ireland made,
And will be shortly king of Ireland,
King of a mole-hill had I rather be,
Than the richest subject of a monarchie.
Huffe is brave minde, and never cease t’aspire,
Before raigne soul king of thy desire. (2.3.452)

The observations of Cheffaud and Ritchie illustrate how difficult it is for traditional scholarship to detect the hidden truth behind stylistic similarities. With the non-traditional stylometry tools of R Stylo employed in this presentation the Marlovian character of The Battle of Alcazar becomes more than obvious.

The Love of David and Bethsabe

Annaliese Connolly (2007) calculates and extensively demonstrates that Peele’s David and Bethsabe bears traces of Marlovian influence. She sees it in line with biblical drama which according to Blistein ‘as a whole seemed to interest neither the Elizabethan dramatists nor his audience’ (Blistein 1970, 174). Even though Peele’s David and Bethsabe was entered in the Stationers' Register in May 1594 its first quarto was only printed in 1599 when Peele had been dead for three years. Its performance is doubtful as Chambers reports:

Of one other play by Peele it is difficult to take any account in estimating evidence as to staging. This is David and Bethsabe, of which the extant text apparently represents an attempt to bring within the compass of a single performance a piece or fragments of a piece originally written in three discourses (Chambers, vol. III, 48).

To which he adds:

… but the provenance of David and Bethsabe is so uncertain and its text so evidently manipulated, that it would be very temerarious to rely upon it as affording any proof of public usage (Chambers, 118).

How the play came to the printers is not known, but when Adam Islip printed its quarto in 1599 an established pattern may have been used, namely to give the name of a deceased, but lucrative author on the title page. That it is not Marlowe’s name may have to do with Thomas Beard’s disastrous The Theatre of God's Judgements which had come out in 1597.

Objections to an unbelieving atheist author writing biblical drama are only an external contradiction. Connolly confirms that ‘in fact it is the king's relationship with his sons, particularly Absalom, with which the play is most concerned.’ (§2) Otherwise, she maintains, it followed Marlowe’s leaning towards exotic locations, charismatic protagonists and stage spectacle entirely.

If we draw the findings of this section together we can state that the playwright who wrote the plays Tamburlaine part 1 and part 2 has a strong stylistic presence in other plays of the time, namely the anonymous Tragedy of Locrine, Kyd’s Cornelia, a closet play, and Peele’s The Battle of Alcazar and David and Bethsabe. There is much in favour of the conjecture that Peele’s plays were wrongly attributed.

Cornelia

Ever since Thomas Kyd published his closet drama1 Cornelia early in 1594, which he had translated from the French original Cornélie published in 1574 by Robert Garnier, there has never been any doubt about his authorship. Quite to the contrary, his dedication to the Countess of Sussex in which he bids for aristocratic patronage is very explicit:

‘I shall beseech your Honour to repaire, with the regarde of those so bitter times, and priuie broken passions that I endured in the writing it.’ 2

1 A play to be read rather than acted.

Kyd was to die a couple of months later, a consequence of brutal tortures instigated by the Privy Council in the attempt to find the source of mutinous libels that had been posted around London in May 1593. The writings that the authorities had found in Kyd’s possession belonged to Christopher Marlowe, with whom Kyd shared lodgings while they were both in their patron’s service. Marlowe, however, was killed in Deptford on 30 May, stabbed to death by fellow government agent Ingram Frizer.

Kyd was best known for his most popular play The Spanish Tragedy (1587), and it is certainly surprising that he now turned his attention to a play that was never meant to be performed. Was he just imitating the fashionable aristocratic tastes of Mary Sidney, Countess of Pembroke, whose own translation of Garnier's Marc Antoine had been printed in 1592, as Curtis Perry remarked in his abstract of ‘The Uneasy Republicanism of Thomas Kyd's Cornelia’?3 Or was he actually involved in furthering the political thought of his time, exploring the limits of royal authority and the implementation of native liberties? The latter points would certainly have been part of his own experience that could be taken up again in the play and woven into the historical situation of a Roman Republic which was on the brink of mutating into the imperial rule of Caesar. A third option has just been offered above by the new stylometric features of R Stylo.4

Kyd was in dire straits after he had been released from torture and prison. Not only had he lost his patron, his estate was debt-ridden and his health was poor. A manuscript at hand for publication would have been more than welcome. In his dedication he qualifies the play as ‘small endeauours’, which critics understood as an acknowledgement of the weakness of the translation. But it may just as well have been a hint at material that had come into his possession and was not his own work.

One crucial question remains at the end of this paper. How do the results fit with the endeavours of other authorship studies?5 The answer is obvious: Not very well. It is equally clear that discrepancies do not arise from the careless handling of texts and materials. It is the chosen method that decides on the outcome, and here Gary Taylor, John V. Nance and Keegan Cooper have only just recently ventured into the snares and pitfalls of attributions.6 Their main target is Brian Vickers and his ‘advocacy of Kyd,’ who according to Vickers had written most of Edward III, 1 Henry VI, ‘and all of Arden of Faversham, and King Leir’ (p.146). In this respect Rolling Delta results are in accordance with their criticism, since Rolling Delta attributes them to Chapman and Shakespeare. This again will not find favour with Taylor, Nance and Cooper, as their approach is based on a very different axiom. Stating the differences is inescapable, as is noting the areas of consensus. To break away from Vickers’s line of argument and to avoid self-defeating circularity, Taylor, Nance and Cooper established a couple of algorithms. Their first point, not to assume to know the author of the text in question, finds the full support of Rolling Delta procedures, in that a vast number of reference texts have been consulted, and it was in this way that Kyd’s Cornelia appeared, contrary to all expectations, to be of Marlovian provenance. To use databases and search engines that are publicly available certainly makes sense, particularly when results have to be checked.

2 The works of Thomas Kyd, 1558-1594, edited by Frederick S. Boas, (Oxford: Clarendon Press, 1901), p.102. Digitized by Google from the collections of Harvard University (https://archive.org/details/worksthomaskyd00kydgoog)
3 From: Curtis Perry, “The Uneasy Republicanism of Thomas Kyd's Cornelia,” Criticism, Volume 48, Number 4, Fall 2006, pp. 535-555 | 10.1353/crt.2008.0009 https://muse.jhu.edu/article/231447#f3, last contacted 28.01.2018
4 Eder, M., Rybicki, J. and Kestemont, M. (2016). Stylometry with R: a package for computational text analysis. R Journal 8(1): 107-121.
5 A comprehensive and exemplary survey of the methodology of authorship attributions was provided by Gabriel Egan in ‘A History of Shakespearean Authorship Attribution’, The New Oxford Authorship Companion, ed. by Gary Taylor & Gabriel Egan, Oxford: Univ. Press, 2017, pp. 27-47
6 Taylor, G., Nance, J., & Cooper, K. (2017). Shakespeare and Who? Aeschylus, Edward III and Thomas Kyd. In P. Holland (Ed.), Shakespeare Survey 70: Creating Shakespeare (Shakespeare Survey, pp. 146-153). Cambridge: Cambridge University Press. doi:10.1017/9781108277648.015
There is however one constraint with Rolling Delta, in that texts have undergone some changes. Speaker names, secondary text information and punctuation have been removed to exclude editorial influences. What is left are the speeches alone in lower case letters. To identify the author of samples correctly is indeed one of the problems in authorship attributions, and here the danger of circularity is always present. Once a text has been wrongly attributed, its samples cannot be used to identify other texts. This is not only a problem of the Marlowe corpus, where most of his plays have been wrongly attributed, if one is to believe RD results. Taylor, Nance and Cooper use a control sample from Cornelia to successfully identify Kyd’s writing, in this case in Edward III, which had been wrongly attributed to Kyd by Vickers. As long as their use of ‘two to four consecutive words (‘n-grams’) or juxtapositions of two or more semantically significant words within ten words of one another (‘collocations’)’ (p. 147) is confirmed by Edward III this would lead to a faulty attribution. Luckily the n-gram figures did not match, particularly not in the Mariner’s speech on which Vickers had relied. Likewise Shakespeare, Greene or Peele were unlikely to have written the speech. So in the end it was the unique verbal parallels and philological expertise that pointed to Marlowe. But the discrepancy remains in the present state of stylometric assessments. On the one hand we have ‘laboriously collected, detailed data’ focusing on a small amount of text and therefore not permitting inferences to be drawn about a full-length play, and on the other hand there is a new methodology that uses longer sole-authored reference texts, permitting the minutest stylistic differences to show up, even in smaller sections of the target text. However, n-grams can be very useful, as the following summary of n-grams, previously accounted for in their weighted numbers, shows.

N-grams and Collocations

In his recently composed paper ‘Working blind, without preconceived theories of authorship’ Thomas Merriam discusses in detail the usefulness of n-gram matches and refers to Mueller’s dictum ‘Authors are trumps’ as he ‘found that plays by the same known author share on average twice the number of matching n-grams as plays by different authors’ (Merriam, 2018, p.1). If we make use of the n-gram summaries of Rizvi that are based on 527 texts we can follow the relationships between the plays just referred to and the remaining nominal Marlowe corpus.
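To make concrete what a matching n-gram between two texts is, here is a minimal Python sketch. It is a simplification and not Rizvi’s procedure: it uses plain lower-cased word forms without lemmatisation, counts every shared sequence of two to four words rather than maximal n-grams only, and applies no weighting. The function names are hypothetical; the two sample lines are modernised from the Tamburlaine and Stukley passages quoted earlier.

```python
def ngrams(text, n_min=2, n_max=4):
    """All word n-grams of length n_min..n_max in a text, as a set of tuples."""
    words = text.lower().split()
    return {
        tuple(words[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(words) - n + 1)
    }


def shared_ngrams(text_a, text_b):
    """N-grams that occur in both texts."""
    return ngrams(text_a) & ngrams(text_b)


line_a = "is it not passing brave to be a king"
line_b = "why should not i then look to be a king"

matches = shared_ngrams(line_a, line_b)
for m in sorted(matches, key=len, reverse=True):
    print(" ".join(m))
```

In this toy case every match is a fragment of the single four-word sequence "to be a king"; a tester that counts maximal n-grams would report only that longest match.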


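Contingency table of unique (bottom left) and total (top right) n-grams

| Row | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | Ø 1-6 | Ø 7-11 |
| 4 | Tamburlaine 1 | 1 | | 866 | 195 | 225 | 139 | 70 | 124 | 77 | 107 | 85 | 166 | | |
| 5 | Tamburlaine 2 | 2 | 271 | | 166 | 188 | 123 | 99 | 138 | 133 | 142 | 83 | 166 | | |
| 6 | Locrine | 3 | 17 | 16 | | 149 | 90 | 90 | 92 | 90 | 106 | 68 | 134 | 185.5 | 99.2 |
| 7 | The Battle of Alcazar | 4 | 4 | 12 | 21 | | 104 | 50 | 60 | 149 | 73 | 47 | 152 | | |
| 8 | David and Bethsabe | 5 | 10 | 10 | 4 | 10 | | 43 | 56 | 61 | 73 | 41 | 287 | | |
| 9 | Cornelia | 6 | 7 | 6 | 9 | 6 | 1 | | 56 | 35 | 59 | 45 | 72 | | |
| 10 | The Jew of Malta | 7 | 7 | 7 | 0 | 1 | 5 | 6 | | 65 | 91 | 72 | 152 | | |
| 11 | The Massacre at Paris | 8 | 2 | 9 | 8 | 15 | 5 | 4 | 5 | | 41 | 59 | 321 | | |
| 12 | Dido, Queen of Carthage | 9 | 4 | 4 | 8 | 2 | 4 | 5 | 3 | 4 | | 59 | 165 | | |
| 13 | Dr. Faustus | 10 | 7 | 8 | 3 | 4 | 1 | 2 | 1 | 5 | 6 | | 97 | | |
| 14 | Edward the Second | 11 | 11 | 7 | 11 | 8 | 10 | 2 | 11 | 23 | 9 | 6 | | | |

Row 16: Ø 1-6 = 28.5
Row 17: Ø 7-11 = 5.6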
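Group averages over a contingency table of this kind can be recomputed mechanically. The following Python sketch does so for the total n-grams (top-right half of the table), assuming that the averages are plain arithmetic means over the relevant cells. The helper names are hypothetical; the counts are copied from the table.

```python
from itertools import combinations

# Total n-grams shared by each pair of plays (top-right half of the table).
# TOTAL[i] lists the counts for play i with plays i+1 .. 11.
TOTAL = {
    1: [866, 195, 225, 139, 70, 124, 77, 107, 85, 166],
    2: [166, 188, 123, 99, 138, 133, 142, 83, 166],
    3: [149, 90, 90, 92, 90, 106, 68, 134],
    4: [104, 50, 60, 149, 73, 47, 152],
    5: [43, 56, 61, 73, 41, 287],
    6: [56, 35, 59, 45, 72],
    7: [65, 91, 72, 152],
    8: [41, 59, 321],
    9: [59, 165],
    10: [97],
}


def shared(i, j):
    """Total n-grams shared by plays i and j (order does not matter)."""
    a, b = sorted((i, j))
    return TOTAL[a][b - a - 1]


def mean_within(group, skip=()):
    """Mean over all pairs inside one group, optionally skipping some pairs."""
    pairs = [p for p in combinations(group, 2) if p not in skip]
    return sum(shared(*p) for p in pairs) / len(pairs)


def mean_between(rows, cols):
    """Mean over all cells pairing a play from `rows` with one from `cols`."""
    cells = [shared(i, j) for i in rows for j in cols]
    return sum(cells) / len(cells)


marlovian = range(1, 7)   # plays 1-6
nominal = range(7, 12)    # plays 7-11

print(round(mean_between(marlovian, nominal), 1))        # 99.2
print(round(mean_between(marlovian, range(7, 11)), 1))   # 83.3, without Edward II
print(round(mean_within(marlovian, skip={(1, 2)}), 1))   # 123.6, without the Tamburlaine pair
```

The same helpers can be pointed at the bottom-left half of the table (unique n-grams) by swapping in the corresponding counts.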

The plays that have been identified by Rolling Delta as Marlovian can be found in column A4 to A9 (no. 1 – 6). Nominal Marlowe plays follow from A10 to A14 (no. 7 – 11). The recorded numbers of unique n-grams and of total n-grams that each play shares with another are in B4 to M14. Unique n-grams which are not shared with any other play are at bottom left, the total number of n-grams shared between two or more plays can be found at top right. The conclusive information is that the average of unique n-grams (line 16) is 28.5 for the identified Marlowe plays, but only 5.6 for the Marlowe corpus (line 17). If we look at the total number of n-grams the evaluated Marlovian plays share on average 185.5 n-grams, but the un-Marlovian Marlowe corpus only 99.2.

Of course these figures are essentially influenced by the division of Tamburlaine into two parts resulting in 271 unique n-grams between them and a total of 866 n-grams. If these figures are disregarded there are still 9.5 unique real Marlowe n-grams on average as compared to 5.6 of the Marlowe corpus, and if Edward II is not taken into account due to its deviating figures, the average number of unique n-grams is only 5 in the Marlowe corpus. As to the total number of n-grams between plays that have been identified as Marlovian the average without the Tamburlaines is 123.6. The official Marlowe corpus yields 99.2 and without Edward II the average number is 83.3. In stylistic terms there is a clear prevalence of word n-grams among the plays that have been identified as Marlovian by RC, RD and other R Stylo features in this paper. Within the accepted Marlowe corpus there seems to be a close connection between Edward II and The Massacre at Paris. Unique n-grams between the plays go up to 23 (J14) and the highest number of n-grams in M11 is 321.

There is reason to confirm Mueller’s view that the number of n-grams can attest to authorship. Another test is the number of collocations by Rizvi.i The contingency table reveals once again the difference between Marlovian plays and the nominal Marlowe corpus.

Contingency table of collocations (N6)

| Row | A | B | C | D | E | F | G | H | I | J | K | L | M |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| 4 | Tamburlaine 1 | 1 | | 5856 | 2187 | 2374 | 2386 | 2005 | 1178 | 1390 | 1346 | 1016 | 2801 |
| 5 | Tamburlaine 2 | 2 | 5856 | | 2517 | 2181 | 2600 | 2226 | 1307 | 1315 | 1620 | 1143 | 2422 |
| 6 | Locrine | 3 | 2187 | 2517 | | 1727 | 2101 | 2127 | 1113 | 1125 | 1315 | 853 | 2116 |
| 7 | The Battle of Alcazar | 4 | 2374 | 2181 | 1727 | | 2072 | 1344 | 816 | 1522 | 864 | 646 | 2521 |
| 8 | David and Bethsabe | 5 | 2386 | 2600 | 2101 | 2072 | | 1759 | 1042 | 1403 | 1449 | 819 | 2958 |
| 9 | Cornelia | 6 | 2005 | 2226 | 2127 | 1344 | 1759 | | 1012 | 913 | 1194 | 740 | 1873 |
| 10 | The Jew of Malta | 7 | 1178 | 1307 | 1113 | 816 | 1042 | 1012 | | 742 | 921 | 768 | 1450 |
| 11 | The Massacre at Paris | 8 | 1390 | 1315 | 1125 | 1522 | 1403 | 913 | 742 | | 709 | 519 | 2844 |
| 12 | Dido, Queen of Carthage | 9 | 1346 | 1620 | 1315 | 864 | 1449 | 1194 | 921 | 709 | | 621 | 1817 |
| 13 | Dr. Faustus | 10 | 1016 | 1143 | 853 | 646 | 819 | 740 | 768 | 519 | 621 | | 838 |
| 14 | Edward the Second | 11 | 2801 | 2422 | 2116 | 2521 | 2958 | 1873 | 1450 | 2844 | 1817 | 838 | |

The group of plays that RC, RD etc. linked stylistically with the Tamburlaines (1 – 6) have on average 2364 collocations in common, whereas the remaining Marlowe corpus (7 – 11) only produces 1394 collocations on average in relation to the tested Marlovian plays. If the Tamburlaine collocations are not counted, the average number is 2115 for the Marlovian plays, and when Edward II is not considered because of its high counts, the Marlowe corpus yields on average 1131 collocations that the plays have in common. The figures give a clear indication of the stylistic discrepancy between the two groups and largely confirm the findings of n-grams listings and R Stylo’s attributions.

Literature and References

(to be completed)

Eder, M., Rybicki, J. and Kestemont, M. (2016). Stylometry with R: a package for computational text analysis. R Journal 8(1): 107-121.

i Please note Rizvi’s explanation: Search results are shown below in modern spelling. Searches are carried out using the lemmatised forms of words; so, for example, kind heart is matched with kind- hearted. Collocations are searched for in ten-word windows. A collocation is reported only if it contains at least two words which are not among the 154 most common words in the database. The list below shows the top 2.5% of matches, according to a formula that ranks each collocation match between two plays according to the number and commonness of the words in the collocation and how many plays it occurs in. Inferior ranks are given to collocations containing proper nouns. The full 100% of matches, and the constituents of the ranking formula, are provided separately in a CSV file.