Shakespeare Vs. Fletcher: A Stylometric Analysis by Radial Basis Functions

Author(s): David Lowe and Robert Matthews
Source: Computers and the Humanities, Vol. 29, No. 6 (Dec., 1995), pp. 449-461
Published by: Springer
Stable URL: http://www.jstor.org/stable/30200368

© 1995 Kluwer Academic Publishers. Printed in the Netherlands.

David Lowe and Robert Matthews *
Neural Computing Research Group, Aston University, Birmingham B4 7ET, England
e-mail:[email protected]; [email protected]

* David Lowe is Professor of Neural Computing at Aston University, UK. His research interests span from the theoretical aspects of dynamical systems theory and statistical pattern processing, to a wide range of application domains, from financial market analysis ("Novel Exploitation of Neural Network Methods in Financial Markets", invited paper, World Conference on Computational Intelligence, vol. VI, pp. 3623-28, 1994) to the 'artificial nose' ("Novel 'Topographic' Nonlinear Feature Extraction using Radial Basis Functions for Concentration Coding in the 'Artificial Nose'", 3rd IEE International Conference on Artificial Neural Networks, pp. 95-99, Conference Publication number 372, The Institute of Electrical Engineers, 1993).
Robert Matthews is a visiting research fellow at Aston University. His research interests include probability, number theory and astronomy. His recent paper in Nature (vol. 374, pp. 681-82, 1995) somehow managed to combine all three.

Key words: neural networks, stylometric analysis, Shakespeare, Fletcher, discrimination, classification

Abstract

In this paper we show, for the first time, how Radial Basis Function (RBF) network techniques can be used to explore questions surrounding authorship of historic documents. The paper illustrates the technical and practical aspects of RBF's, using data extracted from works written in the early 17th century by William Shakespeare and his contemporary John Fletcher. We also present benchmark comparisons with other standard techniques for contrast and comparison.

1. Introduction

Literary scholars have long debated over questions of authorship of various works and documents. Many such questions centre on alleged works by William Shakespeare and one of the oldest of these disputes concerns the authorship of an obscure play, The Two Noble Kinsmen. This was first performed around 1613 but has been relatively ignored ever since. A copy of this script circulating around 1634 ascribed the work to William Shakespeare and John Fletcher (who succeeded Shakespeare after his death in 1616 as chief dramatist to the Kings Men). The question arises as to whether this obscure play really is a genuine collaborative work of Shakespeare. Whilst some scholars have accepted the play as such, others remain unconvinced.

Conventionally, the primary information used to try and ascribe authorship is centred around scholarly opinion of the aesthetic style of the prose and the subtle use of language, vocabulary and grammar when compared to other works of undisputed provenance.

This is a classic problem faced in many scholarly domains which use high level, human cognitive methods of reasoning combined with 'intuition' and 'experience' to try and arrive at a consensus of opinion. However there are also quantitative, statistical approaches to data analysis which might have something to offer in these domains. The field of stylometry is essentially the application of mathematical methods to extract quantitative measures to assist in such debates.

Of course, no technique can ascribe definitive answers in such applications. The best we can hope for is a technique which provides additional quantifiable evidential weight in favour of one author or another. Another problem is that in extracting high level qualitative information from an abstract knowledge source for quantitative analysis, we need to produce an intermediate representation of information which is more 'low-level'. This process of dimensionality reduction and feature extraction is inevitably a nonlinear process. If the transformed information has been nonlinearly distorted, then evidently we need access to nonlinear analysis techniques to resolve any conflict. Unfortunately there are very few nonlinear methods which have an inherent ability to extract and convey statistical information. However, one such class of techniques exists in the neural network domain.

There is already evidence (Matthews and Merriam, 1993) that the Multilayer Perceptron is a potentially very useful tool in stylometric analysis. It was shown that the Multilayer Perceptron could be trained to classify 96% of the training set successfully (using cross-validation) composed of known Shakespeare-Fletcher works. When applied to other data not used as part of the training set, very successful discrimination was obtained on known works, and when applied to disputed works the method provided information which was in general broad agreement with current scholarly opinion.

However there are many distinct types of neural network methods, each with their own properties, advantages and disadvantages. There are also many recent statistical techniques which have yet to be appropriately developed in this type of problem domain. The previous work which has studied this particular problem was a preliminary, feasibility study in that no comparative performance experiments were presented, either contrasting with other network techniques, or with other traditional methods. This paper addresses these criticisms by presenting an alternative network study as well as presenting comparative performance estimates using more traditional techniques. In particular this paper presents an analysis of Shakespeare-Fletcher data using a range of quantitative techniques, including classical statistical pattern processing methods and the Radial Basis Function network. This latter technique has several advantages over the previously applied Multilayer Perceptron, especially when applied to small sample data sets as exemplified by the specific problem considered in this paper. Some of these advantages will be discussed later.

2. Classification Using the Radial Basis Function Network

The Radial Basis Function (Broomhead and Lowe, [...] mated by a suitable Radial Basis Function architecture. In addition it can be considered as a generalisation of several traditional statistical pattern processing techniques. Its strength derives from a rich interpretational basis since it lies in the confluence of a variety of 'established' scientific disciplines. Thus, although the original motivation of this particular network structure was in terms of functional approximation techniques (Powell, 1992), the network may be derived on the basis of statistical pattern processing theory (Lowe, 1991), regression and regularisation (Girosi et al., 1995), biological pattern formation, mapping in the presence of noisy data etc. However, in addition to exhibiting a range of useful theoretical properties, it is also a practically useful construct as it may be applied to problem domains in discrimination (see e.g. Niranjan and Fallside, 1990, for a speech classification example), time series prediction (see articles in Rao Vemuri and Rogers, 1994, for financial and other examples) and other mapping problems, and feature extraction/topographic mapping problem domains (e.g. Lowe, 1993, for a chemical odour concentration coding example).

2.1. Neural networks and classification problems

Neural networks such as the Radial Basis Function network are examples of techniques known as nonparametric methods. This means that they can be used to construct representations to problems where an explicit model of the problem domain is not known (such as in financial market prediction) or is too difficult to evaluate (as in weather forecasting). This is achieved by optimising the structure of a neural network architecture by minimising a criterion function (usually a sum squared error criterion between the desired answer and the predicted network answer). Although originally motivated by the apparent structure of information processing in nervous systems, we now know that artificial neural networks are more closely related to pattern processing methods than to biology.

The architecture of an artificial neural network is very simple and is composed of layers of processing elements with nonlinear (though differentiable) transfer functions at each node. An artificial neural network has a set of input nodes, a set of 'hidden
