The Greene Corpus | Plays by Marlowe 1
[Charts, page 1. The Greene Corpus | Plays by Marlowe: Alphonsus; Friar Bacon and Friar Bungay; James IV; Orlando; Selimus; Tamburlaine 1; Tamburlaine 2]
What you see are the classification results for the Greene and Marlowe corpora. This is a survey, so superfluous information is left aside, but it is still indispensable to know what the columns and lines actually represent. For each play, the first column gives the nsc classification (nearest shrunken centroid) based on words, the second gives the svm classification (support vector machine) based on words, and the third uses the delta classifier (Burrowsian), again on words. The pattern repeats to the right: the next three columns are based on character bigrams, and the last three columns of each play are based on character trigrams, which are probably more reliable. It is also said that the svm classifier has a high decision level, whereas nsc jumps very quickly to attributions.
All classifications were carried out with windows of 8000 words. Smaller windows tend to produce a large number of outliers and create a confusing picture. With an 8000-word window, the first measurement (line 1 of each play) is noted at 4000 words, and each following line lies 250 words further on. The length of a play can therefore be calculated from the position of the last line, to which another 4000 words must be added. There may be a small trail of words at the end of a play that was not considered.
The plays are classified in the following sequence: the Greene corpus was surveyed first; then you encounter a number of plays that are stylistically linked with Marlowe, even though they contain quite a number of Greene signals. But as we come to the remaining Marlowe corpus on page 3, it becomes clear that the clarity of Marlowe attributions displayed from Tamburlaine 1 down to Kyd’s Cornelia has disappeared.
[Charts, page 2. Plays by Christopher Marlowe: Locrine; The Battle of Alcazar; David and Bethsabe; Cornelia. Caption: arrangement of classifiers in the charts]
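The window arithmetic for the 8000-word charts (first line at 4000 words, one line every 250 words, another 4000 words added after the last line) can be illustrated with a minimal sketch. The function names and the example of 41 chart lines are hypothetical and are not taken from the charts themselves.

```python
def measurement_positions(n_lines, window=8000, step=250):
    """Word positions at which each chart line is noted (the window midpoints)."""
    half = window // 2
    return [half + i * step for i in range(n_lines)]


def estimated_length(n_lines, window=8000, step=250):
    """Approximate play length: position of the last line plus half a window.

    A short trail of words after the last full window is not counted.
    """
    last = measurement_positions(n_lines, window, step)[-1]
    return last + window // 2


# Hypothetical example: a chart with 41 lines.
print(measurement_positions(3))  # [4000, 4250, 4500]
print(estimated_length(41))      # 18000
```

The result is a lower bound, since up to 249 trailing words may lie beyond the last full window.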
In some cases the classifiers contradict each other. Edward II, for example, shows a pattern in which the nsc results have a majority indicating Samuel Rowley in both the upper and the lower division of the chart, whereas svm has a clear penchant for Greene in all three types of variables. Delta, on the other hand, returns mainly Kyd and Marlowe in the word section, Shakespeare in character bigrams, and Shakespeare and Rowley in character trigrams. It is not surprising that results vary to such a degree, as the mathematical kernels of the classifiers concentrate on different patterns. The only safe assumption in the case of Edward II, however, is that the very small number of Marlowe signals supports the view that the play was not written by Marlowe. The other nominal Marlowe plays on page 3 don’t even have significant Marlowe sections in any of the classifiers. Judging by averages and majorities, much of the style of Dido, Queen of Carthage comes from Greene, The Jew of Malta would be Shakespeare’s work, George Chapman would have played a major role in the writing of The Massacre at Paris, and the two extant versions of Dr. Faustus are largely based on works by Greene.
In Greene’s corpus it is The Scottish History of James IV which does not replicate the typical Greene pattern as displayed in Alphonsus, Friar Bacon and Friar Bungay, and Orlando Furioso. Instead the pattern points to George Chapman, who seems to have had a tremendous interest in historical themes. In addition, Selimus is not at all clear as a Greene play. The strong links of the two Tamburlaines with the anonymous Tragedy of Locrine, with Peele’s The Battle of Alcazar and The Love of David and Bethsabe, and with Kyd’s Cornelia suggest that these plays are in fact by Marlowe.
[Charts, page 3. The nominal Marlowe Corpus: Dido, Queen of Carthage; Edward II (upper); Edward II (lower); The Jew of Malta; The Massacre at Paris; Doctor Faustus A; Doctor Faustus B]
It is obvious that the findings displayed in these charts must be underpinned by further methodical steps, for example Rolling Delta analyses. Matching n-grams and collocations between the texts can also be pursued, and there is a considerable body of secondary literature dealing with the texts above. Some of it can be found on this website.
But before we come to that, it is necessary to go through the files again, this time with Greene’s plays The Scottish History of James IV and Selimus removed from the list of reference texts, as their Greene attribution is no longer that clear. As we come to the Tamburlaines on page 4, the Marlowe indications are stronger than before, and the Marlowe signals also grow in Locrine, The Battle of Alcazar, David and Bethsabe and finally Kyd’s Cornelia, which now has more Kyd signals as well. The remaining Marlowe corpus on page 5 now shows a reduced number of Greene attributions. Shakespeare’s share, however, grows in Edward II, and The Jew of Malta now has a clear Shakespeare attribution. In The Massacre at Paris, Chapman’s part becomes more obvious as Greene disappears. The Doctor Faustus texts, too, have fewer Greene assignments, but apparently Rolling Classify cannot decipher the assortment of smaller textual contributions. The most important aspect, however, is that none of the texts is in any way related to the two Tamburlaines. The Marlowe cells in Dido and Edward II are far too few (see page 5).
[Charts, page 4. Plays by Christopher Marlowe: Tamburlaine 1; Tamburlaine 2; Locrine; The Battle of Alcazar; David and Bethsabe; Cornelia]
[Charts, page 5. The nominal Marlowe Corpus without the impact of James IV and Selimus as reference texts: Dido, Queen of Carthage; Edward II; The Jew of Malta; The Massacre at Paris; Dr. Faustus (A)]
The chart with the set of problems on page 6 summarizes the present situation of this investigation. The thesis is that the Marlowe corpus consisting of plays no. 1 to 7 is stylistically not homogeneous. This is a fairly clear result of the Rolling Classify (RC) analyses listed under methodology. Instead we find a number of files with Marlovian style and a remainder of corpus files with non-Marlovian style.
The next task is to employ the findings of Rolling Delta (RD). My publications up to “Christopher Marlowe: Hype and Hoax” made use of selected reference texts that were believed to be relevant to the authorship determination of the corpus. This, naturally, involved a high degree of subjectivity, and it became necessary to use a large number of reference texts that were sole-authored and well-attributed. Now it was the program that extracted the lowest delta values from each of the reference text windows to indicate the play with the smallest stylistic difference from the target text. The window size was 4000 or 5000 words and the step size was 250 words, which means that every 250-word segment was furnished with an authorial preference. The disadvantage, however, was that it took the computer about 3 to 4 hours to go through all the files. For this reason the chosen variables were character trigrams (mf3c), on account of their reliability as compared to character bigrams (mf2c) and words (mf1w).
[Chart, page 6: The set of problems]
The delta values detected by R Stylo were then transferred into a spreadsheet that contained about 100 play names in column A and the measurements of the first window in column B. Column C then contained the measurement of the next window, shifted by 250 words. This continued to the right according to the length of the target play. To give an example: if the window size was 4000 words, the first measurement was noted at 2000 words (column B), the next at 2250 words (column C), the following at 2500 words (column D), and so on.
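The spreadsheet layout just described (play names in column A, one window per column from B onwards, each measurement noted at the window midpoint) can be sketched as follows. The helper names, the play length of 5000 words, and the rule that only full windows are counted are assumptions made for illustration; they are not documented behaviour of R Stylo.

```python
def column_letter(index):
    """Spreadsheet letter for a 0-based column index (0 -> A, 25 -> Z, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def window_columns(play_length, window=4000, step=250):
    """Pair each full window with its spreadsheet column and midpoint position."""
    if play_length < window:
        return []
    # Assumption: only complete windows are measured; trailing words are dropped.
    n_windows = (play_length - window) // step + 1
    # Column A holds the play names, so the first window goes into column B.
    return [(column_letter(i + 1), window // 2 + i * step) for i in range(n_windows)]


# Hypothetical target play of 5000 words.
for column, position in window_columns(5000):
    print(column, position)
```

For a 5000-word play this yields five columns, B to F, noted at 2000, 2250, 2500, 2750 and 3000 words.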
In another step, column B was subjected to conditional formatting in which the lowest delta of all was highlighted in green, the second-lowest in yellow, and the third-lowest in red. This was applied to all columns (except A, of course). Then plays (rows) with no highlighting at all were eliminated, and what was left was rotated by 90° into a table conveying the sequential attributions of the target play.
One example of the evaluation is given on page 7. It concerns the RD attributions of Tamburlaine, part 1. Column A records the word position in the play at intervals of 250 words, starting at 2000 words, which means that the window size is 4000 words. Out of a hundred reference texts, the three lowest delta values belong to Tamburlaine 2 (column E), Peele’s The Battle of Alcazar (column F), and Locrine (column A). Each line gives the next measurement, and at the bottom of the table we find the percentage attributions. Column H indicates the scenes and acts of the play and their accumulated lengths, aligned as closely as possible with the word counts of column A.
As could be expected, the best-suited reference text is Tamburlaine 2. This could have been predicted by anyone, but the noteworthy aspect is that the selection was made on the basis of the lowest delta values, extracted by RD from about a hundred reference texts. Furthermore, we note that the other reference texts related to Tamburlaine 1 in terms of style are Locrine, Kyd’s Cornelia, and Peele’s The Battle of Alcazar, a confirmation of the RC findings above. Selimus and Peele’s Edward I are doubtful as sole-authored plays, and their results have to be regarded as hardly meaningful.
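The highlighting and elimination steps described above can be approximated in a few lines. This is only a sketch: the delta values and reference names below are invented for illustration and are not measurements from the study, and the percentage is assumed to be the share of windows in which a reference text has the lowest delta, which the text does not specify.

```python
from collections import Counter


def rank_windows(deltas):
    """For each window, return the three reference texts with the lowest delta.

    deltas maps a reference text name to its list of delta values, one per window.
    """
    n_windows = len(next(iter(deltas.values())))
    ranked = []
    for w in range(n_windows):
        order = sorted(deltas, key=lambda name: deltas[name][w])
        ranked.append(order[:3])  # green, yellow, red
    return ranked


def attribution_shares(ranked):
    """Assumed percentage rule: share of windows in which a text ranks first."""
    firsts = Counter(top[0] for top in ranked)
    return {name: 100.0 * count / len(ranked) for name, count in firsts.items()}


# Invented values for four hypothetical reference texts and four windows.
deltas = {
    "Reference A": [0.80, 0.82, 0.90, 0.85],
    "Reference B": [0.95, 0.81, 0.93, 0.99],
    "Reference C": [0.97, 0.96, 0.91, 0.98],
    "Reference D": [1.20, 1.25, 1.22, 1.30],
}

ranked = rank_windows(deltas)                    # one row per window (the rotated table)
kept = {name for top in ranked for name in top}  # rows that received any highlighting
shares = attribution_shares(ranked)

print(ranked[0])
print(sorted(kept))
print(shares)
```

Reference D is never among the three lowest values and is dropped, which mirrors the removal of rows without highlighting.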