Stylometry-based Approach for Detecting Writing Style Changes in Literary Texts

Helena Gómez-Adorno1,2, Juan-Pablo Posadas-Durán3, Germán Ríos-Toledo4, Grigori Sidorov1, Gerardo Sierra2

1 Instituto Politécnico Nacional, Centro de Investigación en Computación, Ciudad de México, Mexico
2 Universidad Nacional Autónoma de México, Instituto de Ingeniería, Ciudad de México, Mexico
3 Instituto Politécnico Nacional (IPN), Escuela Superior de Ingeniería Mecánica y Eléctrica Unidad Zacatenco (ESIME-Zacatenco), Ciudad de México, Mexico
4 Centro Nacional de Investigación y Desarrollo Tecnológico, Cuernavaca, Mexico
advantage of this situation in order to turn the vast amount of data into practical and useful knowledge.

In authorship analysis, typical features used for text representation in the Vector Space Model (VSM) are words, the Bag of Words (BoW) model [11], word n-grams [16, 22], character n-grams [7, 22], and syntactic n-grams [19]. The values of these features can be Boolean [15], tf-idf

Abstract. In this paper, we present an approach to identify changes in the writing style of 7 authors of novels written in English. We defined 3 stages of writing for each author; each stage contains 3 novels with a maximum of 3 years between publications. We propose several stylometric features to represent the novels in a vector space model. We use supervised learning algorithms to determine whether this stylometric representation makes it possible to identify the stage of writing to which each novel belongs.
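As a rough illustration of the vector space representation mentioned above, the following Python sketch builds word and character n-gram features and assigns them Boolean or tf-idf values. The function names, the toy sentences, and the particular tf-idf variant (relative term frequency multiplied by log(N/df)) are assumptions made for illustration only; they are not the authors' implementation, and the features and corpus used in the paper are not reproduced here.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return word n-grams built from lowercased whitespace tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def boolean_vectors(docs_features):
    """Map each document to a 0/1 vector over the shared vocabulary."""
    vocab = sorted({f for feats in docs_features for f in feats})
    vectors = []
    for feats in docs_features:
        present = set(feats)
        vectors.append([1 if f in present else 0 for f in vocab])
    return vocab, vectors


def tfidf_vectors(docs_features):
    """Map each document to a tf-idf vector.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Other weightings exist; this one is chosen only for simplicity.
    """
    vocab = sorted({f for feats in docs_features for f in feats})
    n_docs = len(docs_features)
    counts = [Counter(feats) for feats in docs_features]
    df = {f: sum(1 for c in counts if f in c) for f in vocab}
    vectors = []
    for c, feats in zip(counts, docs_features):
        total = len(feats) or 1  # avoid division by zero for empty documents
        vectors.append([(c[f] / total) * math.log(n_docs / df[f]) for f in vocab])
    return vocab, vectors


# Toy documents (hypothetical placeholders, not taken from any novel).
docs = ["the cat sat", "the dog sat down"]
features = [word_ngrams(d, 1) for d in docs]
vocab, bool_vecs = boolean_vectors(features)
_, tfidf_vecs = tfidf_vectors(features)
```

Each row of `bool_vecs` or `tfidf_vecs` is one document in the vector space and could be passed, together with a stage label, to any supervised classifier. Replacing `word_ngrams(d, 1)` with `char_ngrams(d, 3)` switches the representation to character trigrams without changing the rest of the sketch.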