
The Explorative Value of Computational Methods: Rereading the American

Stephanie Siewert and Nils Reiter

ABSTRACT

This article explores the use of computational methods to study stylistic and content shifts in the nineteenth-century short story. It is generally assumed that the modern American short story somehow represents the democratic discourse of the United States. This paper argues that an explorative computational approach can help us to reconsider the connections between the textual patterns of the short story and U.S.-American modernity. We offer a critical, digital perspective on the distribution of certain indicative linguistic features across 123 short stories from 1840 to 1916. We used methods from computational linguistics to automatically annotate the texts with various linguistic properties (named entities and direct speech, for instance). The quantitative results of automated text processing are presented against the backdrop of the major social, economic, and cultural developments of the time. Our findings provide further insights into the tensions between processes of individualization and economic dependencies during the nineteenth century, especially with respect to the publishing industry. In addition, we propose that the more experimental nature of a macro-analytical perspective can direct our attention to texts or groups of texts that remain underestimated with regard to their literary value or their exemplary nature for a certain topical, structural, or linguistic pattern. In this vein, the article offers a close reading of Thomas Bailey Aldrich’s short story “Mademoiselle Olympe Zabriski,” which features an above-average number of proper names within the corpus. Reread in the light of the quantitative results, the text can be seen to comment humorously on class issues in 1870s New England. Thus, the article offers an example of how humanities research can benefit from quantitative approaches. To conclude, the article promotes the gains of digital literary analysis for future research in American Studies.

1 Introduction

In 2015, the British Comma Press released MacGuffin, a new digital self-publishing platform for , essays, and , where users can pick their favorite short stories according to theme and length. What distinguishes the platform from other popular blogging and fan fiction sites is its open analytics section, visible to both readers and authors. Writers and their publishers can test the attractiveness of narrative structures, measured by the time each visitor spends reading a story.1 The platform might provide a way to revitalize the slow-selling short story, which lags behind the more popular genre of the novel. While the publishing industry has been less reluctant to use innovative data-mining techniques to evaluate its products and to place them more effectively,2 universities and scholars still seem to shy away from the critical potential of digitally (re)mastered artefacts. While we should not hastily abandon established working tools in light of “ever-shorter cycles of innovation” (Kelleter 383), the digitization of culture will continue. Digital Humanities (DH), practiced as a (self-)critical and interdisciplinary effort, are “needed as a collective player—as a lobbying force and resource of practice and strategies” to meet the challenges of the digital age (Kelleter 387).3 Most objects of scholarly interest are multimodal and exist in a spectrum of analytical environments; computational methods simply extend their scope into a digital arena. The example of MacGuffin could be turned into a rewarding DH project on human attention and/or reader response in the digital age if combined with a critical perspective on data-mining tools and their epistemological value.4 Alternatively, one might use digital methods to reassess these short stories and to look for patterns across extended periods of time and spatial scope.5 The same holds for other corpora and historical material. In this vein, the following essay explores stylistic shifts in U.S.-American short fiction during the nineteenth century with the help of computational methods. The genre itself has inspired a multifarious and at times overwhelming critical discourse, a quality that also holds true for the numerous efforts to demarcate its boundaries. Ruth Suckow wrote in 1927 that America is “the land of the definition of the short story” (317-18), and yet the genre itself seems as elusive as ever.6 In fact, Oliver Scheiding has pointed out that the American short story is still “undertheorized” (235).

1 See Bridle.
2 On a similar note, short-story dispensers owned by the French publisher ShortÉdition are increasingly gaining in popularity and have now reached Great Britain and the U.S. (ShortÉdition; see also Flood). They are a welcome distraction for people waiting in line at an airport, train station, or elsewhere. The vending machine prints original stories on demand depending on how much time each user has to spare.
3 It is rather surprising that the pre-selection of the German Excellence Initiative 2017, which included 88 project proposals, largely skipped over the digital turn in the social sciences, law, economics, and cultural studies. However, the DFG listed three promising projects on data analysis in the humanities (TU Darmstadt), algorithms and the informational society (Karlsruher Institut für Technologie), and hermeneutical processes with a strong DH focus (Stuttgart/Tübingen). See: dfg.de/en/research_funding/programmes/excellence_initiative/index.html.
4 DH should (and already do) brace themselves against the technophile, affirmative big-data hype that supports the logic of market mechanisms and its rhetoric of accessibility and transparency.
5 As Kathleen Fitzpatrick states: “The particular contribution of the Digital Humanities, however, lies in its exploration of the difference that the digital can make to the kinds of work that we do as well as to the ways that we communicate with one another. These new modes of scholarship and communication will best flourish if they, like the Digital Humanities, are allowed to remain plural” (14).
6 The short story is a rather ‘young’ genre. As F. L. Pattee notes of the term ‘short story,’ “it began to be used more and more during the sixties and the seventies, but never in a generic sense; always the emphasis on the first word” (291). Taking up Edgar Allan Poe’s ideas in “The Philosophy of Composition,” Brander Matthews formalized the short story as a distinct art form that, for him, even exceeds the novel due to its “unity of impression” (17). Matthews provided a thorough definition of the short story in his seminal work “The Philosophy of the Short Story” (1901), but had first expressed his ideas in an article in the Saturday Review in 1884.

A quantitative analysis might allow us to understand and conceptualize modern short fiction in different and potentially new ways, without doing away with qualitative work in the field. Can a large-scale analysis, then, support, specify, or challenge certain assumptions about nineteenth-century short fiction? Among American Studies scholars it is generally accepted that a more ‘democratic’ register came to dominate fictional and that the everyday assumed a central role as sujet, setting, and language. This includes a development from earlier plot-driven romantic tales towards character-based realistic stories, a shift from telling to showing, and an emphasis on vernacular culture (Bendixen 3-19). Comparing early romantic tales such as Washington Irving’s “Rip Van Winkle” to Mark Twain’s realist story “The Celebrated Jumping Frog of Calaveras County,” one is inclined to predict a development from longer to shorter stories. Yet do stories actually get shorter, and if so, does this correlate with changes in other metrics? Can we conceptualize developments of the short story during the nineteenth century on the basis of certain linguistic features to arrive at a more nuanced view of broad trends in its history? This article offers a critical, digital perspective on the distribution of certain indicative linguistic features across short stories from 1840 to 1916. Read against the backdrop of major social, economic, and cultural developments in the publishing industry, our findings provide further insights into the tensions between processes of individualization and economic dependencies during the nineteenth century. In the long run, computational analysis might help us to understand which changes in narrative style actually reflect a more general concern with modernity in the United States. To begin with, we will review general methodological considerations that form the basis of our work (Section 2).
This will help to situate our approach in the larger framework of digital scholarship and, in particular, with respect to the relation between quantitative and qualitative research. Section 3 presents our corpus of 123 short stories and its internal composition before we turn to the text-processing components (sentence splitting, part of speech tagging, lemmatization) and their evaluation (Section 4). In Section 5, we present and discuss the quantitative results of automated text processing in the form of plots organized by publication year on the x-axis (time series). In some cases we provide brief interpretations, which, in the spirit of an explorative analysis,7 serve as suggestions for further research. In Section 6 we reconnect our most substantial findings to changes in the publishing landscape and to certain influencing factors in the socio-economic set-up of the nineteenth century. While we analyze broader linguistic trends in our corpus throughout this essay, we discuss particularly interesting examples in more depth.8 Thomas Bailey Aldrich’s short story “Mademoiselle Olympe Zabriski” serves as a case study of a text that was forgotten during the heyday of realist and later experimental modernist fiction but caught our attention due to its surprising use of proper names. The results of the close reading are particularly interesting in light of a general increase of proper names in our corpus, which, we propose, might be indicative of a turn towards social distinctiveness and topicality of events and people in modern short stories. Section 7 concludes with remarks on the state and nature of digital scholarship and interdisciplinary work in American Studies.9

7 Kathleen Fitzpatrick and Ryan Cordell have repeatedly emphasized the “explorative” value of digital methods.
8 The somewhat figurative term ‘distant reading,’ coined by Moretti (“Operationalizing”), describes the use of automatic and statistical analysis on large text collections, in contrast to the classical ‘close reading’ of texts (i. e., the literal reading). Since both techniques are indispensable and interconnected, Mueller employs the notion of ‘scalable reading’ as a way to combine close and distant reading.

2 On the Relation between Qualitative and Quantitative Methods

Computational literary studies open up a variety of large-scale and comparative research questions for textual analysis concerned with the publication histories of texts (stylometrics) or single-author analysis (text mining, stylometrics), for the correlation of different data sets (mapping), or for visualizations of distinct phenomena and developments. However, all these approaches depend on the existence of adequate infrastructure, technical support, and a research team that works towards a common goal rather than on the efforts of one single individual. DH projects are challenging since they demand more than discursive exchange: They need continuous and elaborate communication, an open-minded reciprocal translation process, and iterative methods. While many interdisciplinary projects within the humanities are grounded in similar intellectual traditions, methods, and paradigm shifts, DH projects require an observational exchange on the interplay of computational and philological epistemologies and working tools. Julia Flanders has called the tension between the digital and the humanities a “productive unease” that arises from potentially different perspectives on the notion of progress and the emphasis on progressiveness.

On a more theoretical level, the unease with the idea of a quantifiable object of analysis that belongs to the realm of expressive creativity and imagination exposes a general paradox of hermeneutic approaches.10 In Radiant Textuality, Jerome McGann reminds scholars of literary and cultural studies that all our critical tools of analysis are and always have been “prostheses for acting at a distance,” the same “distance that makes reflection possible” (103). It is undeniable that some of the most influential literary anthologies, such as the Norton Anthology of American Literature, define literary schools and paradigm shifts on the basis of a small number of canonized authors. Since this selection process is perceived as more ‘natural,’ it often remains unquestioned and conveys the impression that canonized texts are indicative of, or even representative of, certain genres or literary periods. The personal (and political) tastes of a few influential publishers, such as Harper’s, Scribner’s, or Putnam’s, and of certain men of letters (e. g. William Dean Howells or William Randolph Hearst) had a lasting effect on the reading lists of young and old in the nineteenth century.

9 We would like to thank Linda Kessler for annotations. Part of the research presented in this essay was conducted at the Center for Reflected Text Analytics (CRETA), funded by the German Ministry for Education and Research (BMBF).
10 Digital humanities have seen a fair amount of criticism with regard to their role and standing in literary studies departments. In his trenchant and widely read article “Technology Is Taking Over English Departments” (2014), Adam Kirsch accuses DH projects of a “post or pre-verbalist discourse of pictures and objects” and an “anti-humanistic” stance. The author certainly has a point in arguing that digital scholarship always needs the well-versed literary scholar to ask the right questions and draw meaningful conclusions from the data at hand. However, Kirsch also claims that digital scholarship has little to do with the “work of thinking.” While he mocks the alleged disciplinary flaws of the digital humanities, its pluralist lineup and lack of “essence” (especially in view of the accomplishments of the New Critics in literary studies), one might respond that the testing and cautious nature of digital scholarship demonstrates a commitment to complexity.
Yet critics of DH consider quantitative methods much more biased than qualitative selection and interpretation processes, even though the latter typically take only a very small selection of texts into account. The rise of data-driven approaches in DH and the accompanying methodological standards make such biases visible, provided that selection criteria are documented and thus open to criticism. In Macroanalysis (2013), Matthew Jockers claims: “Both the theory and methodology [of macroanalysis] are aimed at the discovery and delivery of evidence” (48). While this is certainly desirable for both close and distant reading, we would like to emphasize one specific aspect: A macro-analytical perspective can direct our attention to texts that remain underestimated with regard to their literary value or exemplary nature for a certain topical, structural, or linguistic shift. The experimental nature of DH methods allows us to see texts in new and surprising ways, for instance, as they appear within certain structural clusters or webs of meaning. This “new evidence” has the potential to “alter our sense of what we thought we knew” (Jockers 48). There is, however, one major challenge when dealing with literary concepts: Before they can be studied quantitatively or digitally, they need to be operationalized. While this procedure is related to defining the concepts, the two are not equivalent. Rather, operationalization requires one to formalize concepts in such a way that a computer program can be implemented to detect and count them. This is not a trivial task and often generates new insights into the concepts in question. The field of computational linguistics has been working on operationalizing linguistic concepts since the 1950s, and purely linguistic concepts can now be operationalized reliably.11 These concepts can be employed as a basis for literary text analysis, such that the detection of literary properties uses linguistic properties as features.
In contrast, the operationalization of many terms originating in literary studies remains at an early, experimental stage, even in the case of the comparatively formalistic concepts developed in narratology (Gius and Jacke). This applies whether one aims for strict hypothesis testing or explorative use of quantitative methods. Hypothesis testing is common practice in the natural sciences, but it differs considerably from workflows in the humanities and puts more emphasis on methodological questions.12 An important criterion for any hypothesis, however, is its falsifiability, which directly depends on its operationalization. Disproving vague concepts with debatable operationalizations always runs the risk of missing the point: For example, a rejected hypothesis might be about a slightly different concept and might then be discarded too easily as irrelevant. Even if the concepts cannot be operationalized directly, we can still explore data collections with approximate operationalizations. As a first step, one needs to identify phenomena that are detectable in texts with reasonable accuracy (for instance, those originating from CL). Investigating several of these ‘proxies’ and their correlations, distributions, or developments sometimes yields inferences about the concepts in question. Clearly, we cannot expect quantitative results to present final answers since they need to be interpreted and contextualized. To cope both with the unavoidably imperfect performance of automatic tools and with the uncertainties that arise from the ambiguity of the concepts in question, automated methods require sample-based verification and critical reflection. Before we can start testing, the scholars involved need to agree upon their operationalization. Ideally, the whole community comes to an understanding to ensure a common ground for future research.

11 The relation between computational linguistic (CL) and (non-computational) linguistic concepts is not always clear. In general, one should assume a certain gap between the concepts of the two areas. A strong guiding force in CL has been to annotate concepts and to calculate inter-annotator agreement, which puts a number on the consensus of multiple annotators. It quickly became clear that concepts had to be refined iteratively to be applicable intersubjectively.
12 Such as: Is the data representative? Did you select an appropriate hypothesis test? Is the data normally distributed?
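To illustrate what investigating a ‘proxy’ and its development over time might look like, the following Python sketch normalizes a raw count (named-entity tokens, standing in for proper names) to occurrences per 1,000 tokens and computes its Pearson correlation with publication year. The numbers are invented toy values, not results from our corpus, and Pearson’s r is only one of several possible measures; the function names are hypothetical.

```python
import math

def per_thousand(count, n_tokens):
    """Normalize a raw feature count to occurrences per 1,000 tokens."""
    return 1000 * count / n_tokens

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented toy data: (publication year, named-entity tokens, all tokens).
stories = [(1845, 40, 8000), (1870, 66, 6000), (1895, 90, 5000), (1915, 96, 4000)]
years = [year for year, _, _ in stories]
density = [per_thousand(ne, tokens) for _, ne, tokens in stories]
print(density)  # [5.0, 11.0, 18.0, 24.0]
print(pearson(years, density))
```

Normalizing by length matters here because stories differ considerably in size: without it, a change in raw counts could simply reflect stories getting shorter or longer rather than a change in how densely a feature is used.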
Projects such as heureCLÉA have used an annotation workflow to reassess narratological categories (Bögel et al.), and initiatives such as shared tasks promote collaborative efforts to improve the operationalization process (Reiter et al.).13 These projects showcase a certain methodological departure from traditional humanities work. In addition to the work on conceptual definitions, projects aiming for strict hypothesis testing need to construct annotated datasets (corpora). Manual annotation of relevant concepts is only feasible on small- and mid-sized corpora (and on these corpora, classical close reading might even be more efficient). The annotation of large corpora requires the development of automatic detection tools. Despite recent advances in natural language processing due to neural networks, the inclusion of heterogeneous knowledge sources will remain challenging. For instance, it is rather unlikely that scholars will be able to automatically detect focalization with acceptable reliability in the coming years (even after the conceptual challenges have been met).
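Footnote 11 mentions inter-annotator agreement as a way to put a number on the consensus of multiple annotators. One widely used measure for two annotators is Cohen’s kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following minimal Python sketch is illustrative only; the labels ‘speech’ and ‘narration’ are hypothetical, and the article does not specify which agreement measure a given project used.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations of four text segments.
annotator_a = ["speech", "speech", "narration", "narration"]
annotator_b = ["speech", "narration", "narration", "narration"]
print(cohens_kappa(annotator_a, annotator_b))  # 0.5
```

Although the two annotators agree on three of four segments (75 percent), kappa is only 0.5, because half of that agreement would be expected by chance given how often each annotator chose each label.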

3 Corpus

We conducted an analysis of 123 short stories (1820-1916) based on early digitized anthologies (e. g. William Dean Howells, The Great Modern American Stories, 1920) and handpicked texts from the story collections of single authors. We consulted platforms such as the Internet Archive,14 Documenting the American South,15 Oxford Text Archive,16 and Project Gutenberg,17 as well as other digital archives and bibliographies. The texts had to be available already as HTML or text files, since we did not have the resources to digitize more recently published anthologies or to cope with the mediocre results of automatic optical character recognition (OCR) on the digitized material.18 The corpus includes some of the most frequently anthologized texts of the period as well as others, such as Thomas Bailey Aldrich’s “Mademoiselle Olympe Zabriski,” that were rather popular in their own age but have been lost in the process of canonization.

13 Shared tasks are a workshop and research format popular in natural language processing (NLP). In a shared task, different participants work on the same, clearly defined problem, such as part of speech tagging. The participants develop different systems to solve the problem automatically, and the systems are evaluated on the same test set. Thus, results are directly comparable, and the systems can be ranked according to their performance.
14 See .
15 See .

We chose the date of first publication as one criterion of selection, thereby foreclosing the more socially embedded ways in which literary texts have circulated in the past. The corpus cannot be seen as statistically representative of the entire textual production of short fiction in the nineteenth century, yet every text analysis is prone to sample bias. We decided against using more contemporary collections of American short stories for a number of reasons. In his deconstructivist study of American short fiction, Douglas Tallack argues that the short story “has been virtually monopolized by Modernist theory. This very persistent strand of thinking has successfully incorporated key elements of Romantic aesthetics and thereby instituted a literary history of the short story in which the genre has its origins in Romanticism and its high point in Modernism” (vii). A look at the lineup of most literary anthologies and surveys, and their implied or explicit promotion of the ‘best’ and ‘major’ stories, supports this claim (see e. g. Voss, Litz, Bausch/Cassill).
Hence, some authors receive more extensive treatment, while others are omitted altogether.19 Our corpus relies on collections released close to the date of original publication to capture the zeitgeist of the nineteenth century and to remain closer to the perception of the short story in its own time.20 While this does not eradicate the editors’ own literary standards and/or economic dependencies, it might offer a less retrospective construction of the quality of nineteenth-century short stories. As a consequence, our corpus contains stories such as Frank R. Stockton’s “The Lady, or the Tiger?” (1882), which “became the most-talked-of short story of the period” (Pattee 296) and is today missing from textbooks such as the Norton Anthology of Short Fiction. Like any other selection, our corpus is based on visibility and availability, yet we paid attention to regional variety and cultural diversity. The predominance of white anglophone writers in our corpus mirrors the influence of a certain socio-cultural group in those early anthologies. In this respect, the selection depicts (on a mid-sized scale) the hegemonic perspective of nineteenth-century print culture and the public sphere with all its mechanisms of exclusion and power relations, its social networks, as well as race, class, and gender hierarchies. Both female and African American authors, as well as authors from other marginalized groups, were bound by the social restrictions of the time. While their situation changed after the Civil War, as educational opportunities expanded and literary productivity increased, they still faced discrimination in terms of salary, choice of topics, and overall recognition. The literary approval and broader success of African American authors such as Charles W. Chesnutt at the end of the nineteenth century was still far from common and often depended on a white support system and readership in the North.

16 See .
17 See .
18 Despite digitization efforts in recent years, a large number of texts are still only available in printed form. Moreover, a scan of the book needs to be followed by optical character recognition (OCR) to transform the image into digitally represented characters. As the variety of fonts, layouts, book printing, and storage qualities is enormous, the quality of OCR output may range from excellent to unacceptable. Thus, manual inspection, control, and sometimes correction become necessary. Having the text of a book OCR’d is not enough, as we typically want to abstract from the physical form and instead investigate the text object, without index, imprint, or page numbers.
19 In an article on the most anthologized short stories of all time, Emily Temple considers twenty anthologies published between 1983 and 2017. Her analysis rests on the more prominent anthologies, such as The Norton Anthology of American Short Fiction and The Oxford Book of American Short Stories, but also includes 100 Years of the Best American Short Stories and The Best American Short Stories of the Century. While she does not differentiate between periods and excludes more generic and thematic anthologies (, , romance, war etc.) as well as prize- and journal-based selections, her short analysis provides interesting results. While Hawthorne’s “Young Goodman Brown,” Melville’s “Bartleby,” or Perkins Gilman’s “The Yellow Wallpaper” appear several times, and Poe, unsurprisingly, is one of the most anthologized authors, none of the other popular nineteenth-century writers such as Harte, O’Brien, Garland, Stockton, or Jewett enter the list. This further substantiates Tallack’s argument about the predominance of a certain modernist perspective on, and perception of, the short story as an art form.
20 In the nineteenth century, magazines such as McClure’s published their own ‘best stories’ lists. In the future we will incorporate magazine collections such as Scribner’s Stories by American Authors. Project Gutenberg grants access to these texts in hypertext and txt format. In addition, scholars looking for material can use the holdings of institutions subscribing to HathiTrust, where subscribers have access to single pages in plain text format. Internet Archive provides full-text access; however, the plain text is based on PDF files and contains multiple OCR errors, which, in turn, require time-consuming corrections before one can even start to process the text. The University of Pennsylvania has assembled a useful collection of links to American short fiction anthologies from the nineteenth and early twentieth centuries, many of which require further digitization efforts.

4 Text Preprocessing

Technically, our corpus consists of 123 plain text files, with some metadata encoded in the file names. This way of organizing metadata has its limits but also has the advantage that the files can be processed without creating dependencies on external libraries or resources. In a first step, we made the corpus structurally more homogeneous by removing hyphenation and by harmonizing quotation marks. Texts that originate in scans and/or PDF files typically contain hyphenation. Leaving it untouched would potentially disturb frequency analysis methods, because split words are then counted as two separate words. To give an example: hyphenating the token ‘corpus’ would result in the two character sequences ‘cor-’ and ‘pus’ appearing as two independent words. To avoid this, we decided to remove hyphenation before further processing. This is not always a trivial undertaking, however, as not all appearances of the hyphen at line endings are caused by hyphenation (e. g. ‘human-looking’). In order to ensure that we only erased hyphens where appropriate, we made use of a spelling dictionary: We automatically replaced a hyphenated form with the joined word only if that word was contained in the dictionary. Furthermore, as already mentioned, our corpus is compiled from various sources. Not all texts follow the same edition standards, most notably in the use of quote characters. Since we are not investigating edition standards per se, we harmonized all quote characters (e. g. `”“„«») by replacing them with a plain double quote.

After this cleanup, we used automatic language analysis tools to process the corpus. More specifically, these were: detection of token and sentence boundaries, part of speech tagging, lemmatization, named entity recognition, tense/aspect tagging, and direct speech annotation.21 All these tools have been integrated in a UIMA pipeline to ensure their sequential processing.22 Part of speech tagging and lemmatization relied on the Mate component (Björkelund et al.), while the Stanford Named Entity Recognition system provided information on the appearance and distribution of proper names (Finkel et al.). The other components are rule-based and were developed specifically for this project.
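The dictionary-based dehyphenation and the quote harmonization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project’s actual code: the toy lexicon, the regular expressions, and the exact set of quote characters are hypothetical stand-ins for a full spelling dictionary and the project’s own rules.

```python
import re

# Hypothetical toy lexicon; a real run would load a full spelling dictionary.
LEXICON = {"corpus", "hyphenation"}

# A word fragment, a hyphen at the end of a line, and the next line's fragment.
LINE_END_HYPHEN = re.compile(r"(\w+)-\n(\w+)")
# Assumed set of quote characters to unify: paired backticks/apostrophes,
# curly double quotes, low double quotes, and guillemets.
QUOTE_CHARS = re.compile(r"``|''|[“”„«»]")

def dehyphenate(text, lexicon=LEXICON):
    """Join words split at line ends, but only if the joined form is a known word."""
    def join_if_known(match):
        first, second = match.group(1), match.group(2)
        joined = first + second
        if joined.lower() in lexicon:
            return joined  # e.g. 'cor-' + 'pus' -> 'corpus'
        # Unknown joined form: keep the hyphen (e.g. 'human-looking'),
        # but still remove the line break.
        return first + "-" + second
    return LINE_END_HYPHEN.sub(join_if_known, text)

def harmonize_quotes(text):
    """Replace typographic double quotes and guillemets with a plain double quote."""
    return QUOTE_CHARS.sub('"', text)

sample = "The cor-\npus contains “human-\nlooking” masks."
print(harmonize_quotes(dehyphenate(sample)))  # The corpus contains "human-looking" masks.
```

Checking the joined form against a dictionary, rather than deleting every line-final hyphen, is what protects genuine compounds such as ‘human-looking’; the trade-off is that correctly hyphenated words missing from the dictionary are left split, so dictionary coverage directly affects the result.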

4.1 Evaluation of Automatic NLP Components

Since most modern tools for natural language processing (NLP) employ machine learning methods, their performance depends on the availability and size of the training corpora. If the language of the training corpus differs from that of the application corpus, one can expect a drop in performance (Reiter 47). To assess the quality of the automatically produced linguistic annotations, we manually annotated token boundaries, part of speech tags, and lemmas in a portion of the corpus. We selected this test corpus according to linguistic and pragmatic criteria, so that phenomena we knew to be problematic (e. g. direct speech) were included. We do not expect plot, theme, or setting to affect the performance on this level of linguistic analysis.

Text | Tokens | Tokenization | PoS-Tagging | Lemmatization
Kate Chopin, “Désirée’s Baby” (1893) | 2,575 | 97.5 | 86.8 | 93.8
Mark Twain, “The Celebrated Jumping Frog of Calaveras County” (1865) | 3,088 | 91.8 | 78.5 | 88.4

Table 1: Evaluation results on reference corpus. All columns except tokens show accuracy, i. e., the percentage of correctly identified/classified items.

21 Tokens include all words and all punctuation symbols. The task of tokenization is non-trivial because some punctuation symbols form a unit with their preceding word (abbreviations), while others do not. Part of speech tagging describes the process of identifying the part of speech of a word (e. g., noun). Lemmatization aims at identifying the base form of inflected words (e. g., assigning the base form ‘be’ to the verb ‘is’). Named entity recognition identifies proper names in texts and groups them according to the type of the named entity (e. g., person or location). Tense/aspect tagging assigns grammatical tense and aspect to verb phrases, and direct speech annotation marks direct speech in a text.
22 See .

Table 1 shows the results of the evaluation, measured in accuracy (i. e. the percentage of correctly identified/classified items). The numbers indicate that none of the methods works flawlessly. Up to 22 percent of the words are assigned an erroneous part of speech tag, and up to 12 percent are assigned an erroneous lemma. In the following analysis, we therefore focused on linguistic levels that achieve high performance and do not rely on part of speech tags. The error rate of roughly 6 to 12 percent for lemmatization lies within an acceptable range. The results also highlight another aspect: there is a clear difference between the two stories. The tools perform better on “Désirée’s Baby” than on “The Celebrated Jumping Frog of Calaveras County.” Manual inspection of the errors reveals that this is mostly due to vernacular speech. In Twain’s short story, character speech includes words like “solit’ry,” “feller,” and “thish-yer.” These tokens pose a significant challenge to tools that were originally developed for newspaper texts, as determining their correct reading often requires recognizing their pronunciation.
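Accuracy as used in Table 1 is the share of items whose automatic label matches the manual (gold) label, and the error rates quoted above are simply 100 minus accuracy. The sketch below illustrates both. The function name `accuracy` and the example tag sequences are hypothetical; only the accuracy values are taken from Table 1.

```python
def accuracy(gold: list[str], predicted: list[str]) -> float:
    """Percentage of items whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted annotations must be aligned item by item")
    if not gold:
        raise ValueError("accuracy is undefined for an empty sample")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)


# Accuracy values copied from Table 1; the error rate is 100 minus accuracy.
TABLE_1 = {
    "Désirée's Baby": {"tokenization": 97.5, "pos_tagging": 86.8, "lemmatization": 93.8},
    "Jumping Frog": {"tokenization": 91.8, "pos_tagging": 78.5, "lemmatization": 88.4},
}

ERROR_RATES = {
    story: {task: round(100.0 - acc, 1) for task, acc in scores.items()}
    for story, scores in TABLE_1.items()
}

if __name__ == "__main__":
    gold = ["DT", "NN", "VBZ", "IN", "DT", "NN"]       # hypothetical gold PoS tags
    predicted = ["DT", "NN", "NNS", "IN", "DT", "NN"]  # hypothetical tagger output
    print(f"accuracy: {accuracy(gold, predicted):.1f}%")
    print(ERROR_RATES)
```

Applied to Table 1, this yields the PoS error rate of 21.5 percent and the lemmatization error rates of 6.2 and 11.6 percent for the two stories.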

5 Experiments / Quantitative Results

This section presents and discusses the experiments we conducted from a technical and quantitative perspective.

5.1 Overview

Figure 1: Overview of the Corpus.

Figure 1 displays the distribution of short stories over decades, in absolute numbers. As one can see, the corpus is not evenly distributed. About half of the stories were written after 1890, and some decades are represented by five stories or fewer.
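The per-decade counts in Figure 1 amount to binning each story's publication year by decade. A minimal sketch, assuming one publication year per story; the function name and the years in the example are invented for illustration and do not come from the corpus.

```python
from collections import Counter


def stories_per_decade(years: list[int]) -> dict[int, int]:
    """Count stories per decade, keyed by the decade's first year (e.g. 1890)."""
    counts = Counter((year // 10) * 10 for year in years)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    example_years = [1843, 1849, 1865, 1893, 1899, 1901]  # hypothetical publication years
    print(stories_per_decade(example_years))
```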

5.2 Experimental Results

In the following charts, each short story is represented by a number. The list of stories with their IDs is available online, along with high-resolution versions of all plots and the R code we used to produce them.23 Trend lines are fitted by locally weighted polynomial regression (loess), which fits low-degree polynomials by weighted least squares in the neighborhood of each point (implemented in the R function scatter.smooth()). In addition to the trend lines, the plots depict the confidence level as dashed lines: one can be relatively certain that the true trend line lies between them (‘confidence band’). The band is wider at the beginning and end of the plots, because we have no information about the stories preceding and succeeding our corpus. We therefore concentrated our analysis on the narrow parts. These confidence bands also serve to counterbalance the skewed temporal distribution (a large number of stories in two decades).
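R's scatter.smooth() draws a loess curve. To make the idea of a locally weighted trend line concrete, here is a simplified, non-robust local linear smoother with tricube weights in the spirit of loess. It is an illustrative sketch only. The function name `local_linear_smooth` and the `span` parameter are our own choices. It omits loess's robustness iterations and does not compute the confidence band, so it is not a reimplementation of scatter.smooth().

```python
import numpy as np


def local_linear_smooth(x, y, span=2 / 3, grid=None):
    """Non-robust, loess-style local linear smoother with tricube weights.

    For each evaluation point x0, the nearest ceil(span * n) observations are
    weighted by their distance to x0 and a straight line is fitted by weighted
    least squares; the fitted value at x0 is that line's intercept.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(span * n))))  # neighbourhood size
    grid = x if grid is None else np.asarray(grid, dtype=float)
    fitted = np.empty(len(grid))
    for i, x0 in enumerate(grid):
        d = np.abs(x - x0)
        h = max(np.sort(d)[k - 1], 1e-12)  # distance to the k-th nearest observation
        u = np.clip(d / h, 0.0, 1.0)
        w = (1.0 - u**3) ** 3  # tricube weights, zero at and beyond distance h
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x0])  # columns: intercept, slope
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted


if __name__ == "__main__":
    years = np.array([1840, 1850, 1860, 1870, 1880, 1890, 1900, 1910], dtype=float)
    values = np.array([30, 31, 29, 27, 25, 24, 22, 21], dtype=float)  # hypothetical scores
    print(local_linear_smooth(years, values).round(2))
```

Because each local fit is a weighted regression, the curve follows local tendencies rather than one global polynomial, which is why the band around it widens where few neighbouring stories exist.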

Story Length

Figure 2: Story length measured in tokens and sentences

Irrespective of the way of measuring (by counting sentences or tokens), we observed little variation in the length of the stories: 50 percent of the stories are between 4,800 and 8,600 tokens long and consist of 202 to 454 sentences. Both numbers remain relatively stable over time. Clearly visible in figure 2 is one exceptionally long story: Washington Irving’s “Wolfert Webber; or, Gold Dreams” (27). This short fiction contains a long embedded narrative (and could thus be considered as two stories of roughly 1,300 sentences). The other two exceptions were both written by Henry James: “The Story of a Year” (35) and “A Passionate Pilgrim” (42), and can thus be seen as indicative of a certain authorial style. The fact that story length did not change significantly within our corpus poses interesting questions: How can we further theorize and operationalize the concept of brevity with respect to the rather hazy but much acclaimed compression and concision of the short story (cf. Poe, Hamilton, Matthews)? Is the concept of brevity connected to a shortening of the form itself over the last two centuries? There are several possible reasons why the alleged decline in story length cannot be found in our corpus: (1) Corpus size: In terms of data science, the data set we work with can still be considered small. A larger corpus might show this tendency. (2) Time period: If we look closely, we can identify a small drop in story length, but it starts later than expected. The curve reaches its highest peak around 1860, and this peak is followed by a minor decrease in story length. But since this decrease amounts to only about 40-50 sentences, it is hardly noticeable in close reading. (3) Overestimation of samples: If we compare stories by Henry James (35, 42) with those of Edith Wharton (89, 90), we would indeed conclude that length has decreased. Such an assumption, however, fails to take into account that James’s stories are not at all representative of the short stories of the 1860s in terms of length. This fact further substantiates Tallack’s assumptions about James’s “anachronistic techniques” (182). Indeed, James’s short fiction seems ‘out of place’ within the modernist discourse and the more lyrical forms of the short story, while it also seems to defy the magazine standards of the time. Tallack cites James’s unease with the genre as being “too restrictive for him” (qtd. in Tallack 205-06).

23 DOI: 10.5281/zenodo.1307315, currently hosted on .
Story length is but one aspect in a range of features that can shed light on questions of the literary vanguard, often seen as a phenomenon of gifted individual authors, declared and at times romanticized as bold innovators. Are authors such as Edgar Allan Poe, Nathaniel Hawthorne, or Herman Melville really distinct when analyzed on the basis of linguistic features? Or do other, less visible authors determine (popular) shifts in the genre at the time and beyond?
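The statement that 50 percent of the stories fall within a certain length range refers to the interquartile range: the span between the lower and upper quartile of the story lengths. A minimal sketch using Python's standard library. The function name is hypothetical, and the example lengths are invented. We use `method="inclusive"`, which we believe corresponds to R's default quantile definition; the paper does not state which definition was used.

```python
import statistics


def middle_half(lengths: list[float]) -> tuple[float, float]:
    """Lower and upper quartile: the range containing the middle 50 percent."""
    q1, _median, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return q1, q3


if __name__ == "__main__":
    example_lengths = [3900, 5100, 6200, 7300, 9800, 12500]  # hypothetical token counts
    print(middle_half(example_lengths))
```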

Sentence Length

Next, we analyzed the length of sentences in the stories (measured in tokens). Figure 3 shows the mean sentence length per story, the maximum, and the standard deviation. The latter expresses the variance of sentence length within a story: the lower the standard deviation, the closer the sentence lengths are to their mean (within one story). The average sentence length decreases from over 30 to just over 20 tokens: sentences become shorter (a). The longest sentence per story rises to a local maximum around 1860 but quickly drops after that (b). Interestingly, the standard deviation of sentence lengths also decreases (c). Stories become more homogeneous with respect to sentence length: both extremely long and extremely short sentences within one narrative become less frequent. Henry James’s stories differ with respect to the overall length of the story, but not with respect to sentence length, whereas Mark Twain’s “The Celebrated Jumping Frog of Calaveras County” (36) and William Allen White’s “By the Rod of the Wrath” (105) depart significantly (t-test, 95 percent confidence level) from the average sentence length.

Figure 3: Diachronic development of sentence length, measured in words

An interesting case is Edgar Allan Poe’s “The Man of the Crowd” (19), since it features one extremely long sentence and thus a high standard deviation. The longest sentence is over 400 tokens long and describes the protagonist’s perception of a scene. It is debatable whether a reader perceives it as a single sentence, but the case is clear from a linguistic point of view: there is a single finite root-level verb at the beginning, followed by a list of noun phrases, many of which contain relative clauses: “I saw Jew peddlers […]; sturdy professional street beggars […]; modest young girls …” This, however, is the only sentence in the story that falls out of the ordinary: the average sentence length is only slightly outside of the confidence band.

In his essays “Paris: The Capital of the Nineteenth Century” and “On Some Motifs in Baudelaire,” Walter Benjamin called the protagonist of Poe’s “The Man of the Crowd” a modern flâneur. Interestingly, the phenomenological impression of the strolling spectator, which often figures in longer, meandering sentences, does not correspond to further and comprehensive deviations from standard sentence length in Poe’s story. Dana Brand argues that the concept of the flâneur acts as a defining feature in Poe’s philosophical and fictional universe. Yet the author “was not a cultural anomaly, out of step with his time and place, but was thoroughly grounded in a specific historical and cultural moment, and a particular literary tradition,” in which “he was first and foremost a magazinist” (Werner viii).24

24 As early as 1923, F. L. Pattee observed: “Poe as a writer of tales undoubtedly was created by the magazines, and the magazines movement to a large extent had been brought about by copyright conditions” (130).

Poe was well aware of the burgeoning mass culture, and while he criticized the commodification of art, he clearly profited from the booming publishing industry (cf. Whalen). The conventional sentence length might be tied to the publishing customs and reading habits of the mid-nineteenth century. While the story’s overall syntactic construction seems conventional, the long sentence acts as a disturbance, a defamiliarization within an otherwise consistent story flow.25
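The three per-story measures plotted in Figure 3 (mean, maximum, and standard deviation of sentence length in tokens) can be computed from a tokenized, sentence-split story. The sketch below is a minimal illustration. The names `SentenceLengthStats` and `sentence_length_stats` are hypothetical. We also assume the sample standard deviation (n - 1 denominator, as in R's `sd()`); the paper does not state which variant was used.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SentenceLengthStats:
    mean: float
    maximum: int
    stdev: float


def sentence_length_stats(sentences: list[list[str]]) -> SentenceLengthStats:
    """Mean, maximum, and sample standard deviation of sentence length in tokens."""
    lengths = [len(sentence) for sentence in sentences]
    if len(lengths) < 2:
        raise ValueError("need at least two sentences for a standard deviation")
    return SentenceLengthStats(
        mean=statistics.mean(lengths),
        maximum=max(lengths),
        stdev=statistics.stdev(lengths),  # sample standard deviation (n - 1)
    )


if __name__ == "__main__":
    story = [
        ["The", "man", "walked", "."],
        ["He", "stopped", ",", "looked", "around", ",", "and", "walked", "on", "."],
    ]  # hypothetical tokenized sentences
    print(sentence_length_stats(story))
```

A single very long sentence, as in Poe's story, raises the maximum and the standard deviation sharply while barely moving the mean.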

Proper Names

The analysis of proper names is based on automatically detected named entities. Systems for named entity recognition (NER) detect proper names for persons, organizations, locations, and others. Our analysis took only person names into account. We counted both the raw number (relative to text length) and the number of unique names. The raw number gives insight into stylistic features of the story, because authors have the choice of referring to characters with pronouns (she) or appellative noun phrases (the dark-haired woman) instead. The number of unique names can be seen as a rough approximation of the number of different characters in the story. It is only a rough approximation for two reasons: (1) Not all characters in narrative texts have to be named. If a character is always referred to with a pronoun, the number of unique proper names does not include this character. (2) The same character can be referenced by different names, e. g. by first name only and by full name. In this case, a single character will be counted twice (as John and John Doe, for instance). It is very hard to quantify these two error sources, so we have to be cautious when interpreting results based on counting unique names. A third factor that influences these numbers is the length of the text: longer texts may include more proper names. We therefore normalized the raw counts of proper names and unique names by story length and show the numbers per 100 tokens. Figure 4a shows a strong increase in the number of proper names, followed by a possible drop just at the end. The number of proper names increases relative to text length, which could indicate a stylistic change in this period. The number of unique names roughly follows the same development, rising in the beginning and reaching a stable plateau just above 0.2 unique names per 100 tokens. We will discuss the most striking outlier (44) in section 6.

25 Poe is in some regards a precursor of literary impressionism (Matz 38-9). The degree of sentence length deviation could be another valid indication of literary experiment. It might be used to explore textual manifestations of sensations within the author’s oeuvre or serve as a vantage point for more elaborate comparisons between Poe and later modernists such as Virginia Woolf, Marcel Proust, or James Joyce.

Figure 4: Percentage of automatically detected names in relation to text length
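The normalization described under Proper Names divides the number of detected person names, and the number of distinct name strings, by story length and scales to 100 tokens. The sketch below assumes NER output is already available as one string per detected person mention. The function name `name_metrics` and the example mentions are hypothetical. It also reports the share of the most frequent name, counted in mentions (a two-word name such as "Van Twiller" would occupy two tokens in the text). Note that "John" and "John Doe" count as two unique names, which reproduces the caveat stated above.

```python
from collections import Counter


def name_metrics(num_tokens: int, person_mentions: list[str]) -> dict[str, float]:
    """Person-name counts normalized per 100 tokens.

    person_mentions holds one string per detected person name (e.g. NER output).
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    counts = Counter(person_mentions)
    scale = 100.0 / num_tokens
    return {
        "names_per_100_tokens": len(person_mentions) * scale,
        "unique_names_per_100_tokens": len(counts) * scale,
        # mentions of the most frequent name, as a percentage of all tokens
        "top_name_percent": max(counts.values()) * scale if counts else 0.0,
    }


if __name__ == "__main__":
    mentions = ["John", "John Doe", "John", "Mary"]  # hypothetical NER output
    print(name_metrics(1000, mentions))
```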

Lexical Richness

A standard metric for lexical richness is the type-token ratio (TTR). Intuitively, the TTR expresses the proportion of new (= used for the first time) tokens among all tokens in a text. The TTR is calculated by dividing the number of types (i. e. the number of different tokens) in a text by the number of all tokens in the text. The sentence “the cat is on the mat” contains six tokens but only five different tokens (= types), because “the” appears twice. The TTR of this sentence is therefore 5/6 ≈ 0.833. A text in which no token is repeated would have a TTR of 1, while a text that consists of a single word repeated many times has a TTR only slightly above 0. The higher the TTR, the more varied the vocabulary of a text. However, a large portion of the tokens in natural language text are function words, over which authors have limited control, as grammatical rules govern the use of these words.26 Statistically, the most frequent words are function words (mostly determiners, prepositions, and pronouns). This means that longer texts ‘automatically’ have a lower TTR (if they adhere to grammar), because more and more words are repeated. This in turn means that the TTRs of texts with different lengths cannot be directly compared: longer texts have a lower TTR because of their increased length, not because of a lack of lexical richness. One way to make the texts comparable is to measure TTR over text segments. For this standardized TTR (STTR), we split each text into segments of fixed length (100 tokens, for instance), calculated the TTR of each segment, and averaged the TTRs of all segments.
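The TTR and STTR definitions above translate directly into code. A minimal sketch with hypothetical function names. Two details are assumptions, because the text does not specify them: an incomplete final segment is dropped, and tokens are compared exactly as given, so any lowercasing must happen beforehand.

```python
def ttr(tokens: list[str]) -> float:
    """Type-token ratio: number of distinct tokens divided by number of tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty text")
    return len(set(tokens)) / len(tokens)


def sttr(tokens: list[str], segment_size: int = 100) -> float:
    """Standardized TTR: mean TTR over consecutive full segments of segment_size tokens.

    An incomplete final segment is ignored (an assumption of this sketch).
    """
    segments = [
        tokens[i:i + segment_size]
        for i in range(0, len(tokens) - segment_size + 1, segment_size)
    ]
    if not segments:
        raise ValueError("text is shorter than one segment")
    return sum(ttr(segment) for segment in segments) / len(segments)


if __name__ == "__main__":
    print(ttr("the cat is on the mat".split()))  # 5 types / 6 tokens
```

Because every segment has the same length, the length effect described above cancels out, and STTR values of long and short stories become comparable.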

26 This is also the reason why function words are so important for authorship attribution and are sometimes said to form the ‘fingerprint’ of an author (Hoover 176).

Figure 5: Standardized Type-Token-Ratio.

Figure 5a shows the STTR for the entire text of each short story. STTR values lie between 0.33 and 0.51, which is a relatively large range. Consequently, trends are harder to determine (the confidence bands are wide). Generally, we see a decrease between 1860 and 1895, and more or less stable periods before and after. The story with the lowest STTR is Thomas Nelson Page’s “Marse Chan” (52), while the highest STTR is found in O. Henry’s “The Making of a New Yorker” (109). Whereas figure 5a considers the texts in their entirety, figure 5b shows STTR only for character speech, i. e., for all tokens within double quotes. We did not distinguish between individual characters here, but focused on excluding the narrator’s voice. The plot shows only texts that have at least 500 tokens of character speech. Here, the decrease in STTR is stronger: the variety of the vocabulary used by speaking characters in the short stories declines until 1880 and increases again afterwards. The dispersion is still high, but not as high as when analyzing entire texts.

Character Speech

Based on these encouraging results, we proceeded further with our analysis of character speech. Detecting different kinds of character speech in narrative texts is far from trivial, as Brunner argues (“Automatic Recognition of Speech”), in particular when it comes to indirect or reported speech. Our analysis only took into account direct speech signified by quotation marks. In a small sample evaluation, we did not find any instances of non-direct speech marked by quotation marks.27 Figure 6a shows the number of utterances. The relative number of direct speech instances varies between zero and three (per 100 tokens) for most texts. Texts with more instances appear later, and the beginning of our period is marked by relatively few instances per story. The percentage of words in direct speech (fig. 6b) is more or less evenly distributed over time, with an increase between the 1820s and 1860s. There is one interesting outlier, however: George Washington Harris’ story “Sicily Burn’s Wedding” (37) is narrated (almost) entirely in dramatic mode. The average length of the utterances in a story (fig. 6c) shows a slight decrease, while their standard deviation remains stable (fig. 6d), i. e. the variation between long and short utterances within a story does not change much. “Sicily Burn’s Wedding” features very long utterances, together with Louisa May Alcott’s “The Frost King” (23) and Thomas Nelson Page’s “Marse Chan” (52).

Figure 6: Number of direct speech events (a), number of words in direct speech (b), mean utterance length (c), and standard deviation of utterance length (d).

27 This is likely a result of the restricted domain. In open-domain text analysis, one has to consider other uses of quotation marks, e. g., as scare quotes or simple emphasis markers.
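The four measures in Figure 6 can be derived once direct speech is identified as text between double quotes (after quote harmonization). The sketch below makes simplifying assumptions that are ours, not the authors': whitespace tokenization, utterances that do not span paragraphs, and a sample standard deviation. The function name `speech_metrics` is hypothetical. The speech tokens it extracts could equally be fed into an STTR computation restricted to character speech.

```python
import re
import statistics

# Text between a pair of straight double quotes counts as one utterance.
QUOTED = re.compile(r'"([^"]*)"')


def speech_metrics(text: str) -> dict[str, float]:
    """Direct-speech measures for one story, using whitespace tokenization."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty text")
    utterances = [u.split() for u in QUOTED.findall(text)]
    utterances = [u for u in utterances if u]  # ignore empty quote pairs
    lengths = [len(u) for u in utterances]
    return {
        "utterances_per_100_tokens": 100.0 * len(utterances) / len(tokens),
        "percent_tokens_in_speech": 100.0 * sum(lengths) / len(tokens),
        "mean_utterance_length": statistics.mean(lengths) if lengths else 0.0,
        "stdev_utterance_length": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
    }


if __name__ == "__main__":
    sample = 'He said "go now" and she said "no" quietly.'  # hypothetical text
    print(speech_metrics(sample))
```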

Spelling

Inspired by the evaluation of the automatic NLP components, we conducted a final analysis geared towards vernacular speech. Opening a text with a high rate of vernacular speech in a text editor immediately demonstrates that spell checkers flag many vernacular words as spelling mistakes. We therefore counted the misspellings (normalized by the length of the story).28 Their distribution is shown on the left of figure 7. The right plot gives the same analysis with respect to the proportion of misspelled words in character speech. Of course, deviations from standardized spelling are only one aspect of vernacular speech. More elaborate forms, for instance unusual syntactic constructions, are not recognized by this approach. In addition, the spell checker allows for some variation and combination of words (e. g. deriving adjectives from verbs), including certain words which are ultimately not acceptable.

As we can see in figure 7, some interesting changes take place with regard to vernacular speech. The overall proportion of misspelled words varies between zero and five percent, with a few outliers after 1880. Around 1880, every story features at least some vernacular words, which causes the trend line to form a hill around 1880. The number of outliers above the trend line (i. e., with a higher proportion of misspelled words) increases over time. Looking at vernacular character speech, we see a similar picture. There are some stories that feature vernacular speech before 1880, but both their number and the extent of vernacular speech increase after that date.

The overall impression is that vernacular speech is the exception. Most stories and most characters in these stories do not feature a strong vernacular style. Intuitively, one would expect spelling irregularities to correlate with the type-token ratio (see above): if a word appears in different spellings (to reflect different characters’ speech), this generally increases the TTR. This effect, however, is not visible in our analysis, and the Pearson correlation coefficient is weak (ρ = –0.22).29 There are multiple possible reasons for this. The total number of spelling irregularities might be so low (between zero and five percent of the tokens) that it does not have a measurable impact on the TTR. Conversely, if an author uses the same spelling irregularity multiple times (instead of a more varied vocabulary), these repeated forms lower the TTR. It is important to note that neither TTR nor spelling irregularities measure vernacular speech directly. The finding that they are only weakly correlated suggests that TTR and spelling irregularities measure different aspects of vernacular speech.

Figure 7: Portion of irregularly spelled words, as measured by spell checker.

28 We follow the spell checkers’ view here. These are not necessarily mistakes or errors.
29 The correlation coefficient describes the strength and direction of the linear relationship between two variables. It ranges from –1 (a perfect negative correlation: one variable decreases as the other increases) to 1 (a perfect positive correlation); a coefficient of 0 indicates no linear correlation. When both variables are standardized, ρ equals the slope of the regression line: an increase of one standard deviation in one variable is associated with an expected change of ρ standard deviations in the other.
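The spelling analysis combines two computations: a per-story rate of tokens unknown to a spelling dictionary, and the Pearson correlation between that rate and the TTR across stories. The sketch below illustrates both under assumptions of our own. A plain word set stands in for a real spell checker, so it cannot reproduce the derivational tolerance described above. A "word" is any token made of letters, optionally with internal apostrophes or hyphens. The names `misspelling_rate` and `pearson` are hypothetical. In Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.

```python
import math
import re

# Candidate words: letters, optionally with apostrophes or hyphens (e.g. solit'ry, thish-yer).
WORD = re.compile(r"[^\W\d_](?:[^\W\d_]|['\u2019\-])*")


def misspelling_rate(tokens: list[str], dictionary: set[str]) -> float:
    """Percentage of all tokens that look like words but are not in the dictionary."""
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if WORD.fullmatch(t) and t.lower() not in dictionary)
    return 100.0 * unknown / len(tokens)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


if __name__ == "__main__":
    toy_dictionary = {"well", "is", "a", "frog"}  # hypothetical spelling dictionary
    tokens = ["Well", ",", "thish-yer", "feller", "is", "a", "frog", "."]
    print(f"{misspelling_rate(tokens, toy_dictionary):.1f}% irregular spellings")
```

A coefficient near zero, such as the reported –0.22, means that a story's spelling-irregularity rate says little about where its TTR lies.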

6 Interpretation

The short story has often been called a seismograph of change: an apt form “for the perception of crisis” (Scofield 238). And indeed, one might argue that short fiction embodies a fundamental tension inherent in Western modernity: the rise of modes of individualization as a consequence of the growing economization of life. Our findings support this hypothesis. The increase in proper names (see fig. 4) and in direct speech (see fig. 6a and 6b) indicates writers’ growing interest in creating credible story worlds and in emphasizing distinct cultural patterns and cultures of conversation. The decrease in utterance length (see fig. 6c), the reduction of average sentence length, the standardization in the distribution of sentence length (see fig. 3), and the decrease in lexical richness (see fig. 5) might be due to the accelerated pace of modern life, the impact of magazines and journals on the publication of short stories, the rise of new printing standards, and an overall commodification of culture. In the following, we will put our findings into context and present one outstanding example of the use of proper names to illustrate the explorative value of our analysis.

6.1 Proper Names and Direct Speech as Sources of Credibility and Distinctiveness

The increase in the number of proper names per story from 1840 to 1916 requires a closer look (see fig. 4). In literature, names always prompt the question of referentiality, not only between the fictional and the real world but also within and between texts. While some names are endowed with almost metaphysical gravitas, others are meaningful due to their apparent lack of individuality or their analogous function. Alastair Fowler has pointed out that names cannot easily be classified as either “hermogenean,” ordinary, meaningless names, or “cratylic,” meaningful, moral names (98). They exist in a social context: they are assigned, adopted, and interpreted by individuals, groups, and institutions at a certain time. While literary scholars usually dwell on the qualitative, telling function of names in texts, we tend to neglect the importance of their quantitative appearance. The frequency, time, and place of proper names within texts might add to their semantic value, since they support, clarify, or even contradict meaning. While fictional names are not always associated with ‘real’ individuals in the narrative, we propose that these stories increasingly use proper names to attest to the actuality and the distinctiveness of events and people. They support the verisimilitude of the narrated world, the credibility of its setting and characters. Our findings correspond to the rise of realist fiction after the Civil War, which reached its peak at the turn of the century and stressed the commonplace, everyday experience of men and women and the diversity of the nation. The living conditions and social fabric of urban and rural areas changed dramatically in the nineteenth century. Metropolises such as New York and Chicago grew fast, and alongside these cities, stories also became more crowded.
The experience of the common man was a drawing card for the reading masses in the United States and Europe, promoting America as the land of dreams for ordinary people. Topics and writing styles had a lasting effect on the print industry and vice versa. The booming magazine culture catered to different social and political groups. Their struggle to define what it meant to be American fostered an individualist perspective. Magazines, pamphlets, and newspapers shaped working-class identities and a sense of political union, but they also helped to define an ideology of middle-class professionalism. Along these lines, proper names were (and still are) used to perform class distinction and respectability. While the increase in proper names does not account for the variety of characters or the frequency of the appearance of a particular person (the numbers might be due to one name being mentioned several times), the results suggest a growing need for recognition. With respect to a more general shift towards individualized forms, the slight increase of direct speech in our corpus supports the argument of a turn towards the ordinary, which manifested itself in an enactment of the unmediated experience and language of the characters and their social exchange. The success of the theater (from classical to burlesque) among all classes in the nineteenth century, and the emergence of new media devices such as the telephone, affected communication patterns in U.S. society. These changes might also factor into the rising curve visible in our analysis. After all, the prominence of direct speech contributes to a text’s performative quality. The increase in the percentage of direct speech serves as an expression of reduced distance and underlines the tendency towards a more immediate, unfiltered transmission of experience observable towards the end of the nineteenth century.30

6.1.1 Thomas Bailey Aldrich and the Politics of Names: A Close Reading

The story “Mademoiselle Olympe Zabriski” (1873) by Thomas Bailey Aldrich, first published in The Atlantic Monthly and later anthologized in William Dean Howells’s The Great Modern American Short Stories, contains the highest number of proper names (i. e., personal names) in the corpus. While this alone does not make for an interesting argument, the text is indeed a rediscovery, a submerged literary gem that was forgotten during the heyday of realist fiction and literary modernism. A close reading reveals that proper names are used strategically to comment on the altered social sphere of the 1870s and to raise questions of class and identity in the Gilded Age.

30 For more information on the “showing,” mimetic function of dialogue and character speech and its implications for distance, see Chatman 32, Genette 44-49, Fludernik 36.

Aldrich (1836-1907) was a poet, a short story writer, and editor of The Atlantic Monthly (1881-1890). Today he remains best known for his short story collection Marjorie Daw and Other People (1873) and his popular classic The Story of a Bad Boy (1870), which inspired Mark Twain’s Adventures of Tom Sawyer (1876) and Adventures of Huckleberry Finn (1884) (cf. Jacobson). At a young age, Aldrich took up a business position and wrote poetry for magazines (Oxford; Merriam Webster). Later criticized for his “Boston-plated conservatism,” he was “esteemed and respected as a major literary figure” in his own time (Samuels 16, 15). Nowadays, the author’s work is absent from short story collections and hardly recognized among scholars of the field.31 And yet, within our frame of analysis, the story proves to be an outstanding example of the politics of proper names in nineteenth-century fiction. The narrator, a member of a prestigious gentlemen’s club, tells the story of Ralph Van Twiller, “a lineal descendant of Wouter Van Twiller, the famous old Dutch governor of New York, Nieuw Amsterdam” (385-86), who falls in love with a trapeze artist named Olympe Zabriski. Young Van Twiller, whose “ancestors have always been burgomasters or admirals or generals” (386), admires the lady from a distance and is struck with shame when he finds out that his beloved is a poor, cross-dressing gymnast who plays the part for a living. The name Van Twiller is mentioned 79 times in the narrative, which amounts to about 2 percent of the tokens in the text. On average, over the entire corpus, the most frequent name per text accounts for only 0.3 percent of the tokens. The frequent use of a single name does not reflect an overall low lexical richness in Aldrich’s writing; in fact, the type-token ratio of his story is higher than average.
Instead of using a pronoun or another noun phrase more often, Aldrich chooses to repeat the last name Van Twiller, giving it an urgency of self-affirmation in view of a declining aristocratic culture and the rise of the working class. The constant use of the name works as a mantra of class assurance, a bulwark of authority that slowly crumbles as the story progresses. Van Twiller once calls ‘Olympe Zabriski’ a “nom du guerre,” which strikes a comic note. Taken literally, it captures the class conflict of the character constellation quite wittily. Humor serves as a central ploy to disrupt the social order and moral high ground of the genteel society of New York, whose hypocrisy became a target for writers such as Thomas B. Aldrich and Mark Twain.32 Aldrich deliberately plays with the reader’s expectations and the allure of names as indications of social rank and status.

In the end, he presents the story of a clever ‘bad boy,’ whose queerness, adaptability, and ambition make him a better fit for the modern world. In a letter to Van Twiller, in which he explains his financial situation and reveals his gender identity, the boy signs with “Charles Montmorenci [sic] Walters” and once more uses a fake aristocratic name to debunk the romanticist notions of both Van Twiller and the reader. The name is an allusion to the French noble Charles Montmorency, who also appears in Sir Walter Scott’s History of France (1834). Scott himself might have been the eponym for the second surname. This amounts to a rather satirical statement on the popular sentimental and romantic literature of the time. While the boy’s adopted name mimics the authority of his counterpart Van Twiller, his note, riddled with mistakes, discloses his real social status and seems emblematic of the ‘uneducated masses.’ In a similar vein, the gentleman Van Twiller is exposed as an old-fashioned bohemian whose internalized ideals of romance and chivalry fail when confronted with the economic dependencies of ordinary men.

While Aldrich’s text derides both the aristocratic elite and the uneducated working class, who both indulge in a corrupted understanding of art, the new middle class seems to be absent as the ‘reasonable’ third party. Aldrich, who had experienced the Hudson River aristocracy as a society that “based its hierarchy primarily on birth and money,” moved to Boston, where he, as an (upper) middle-class writer, was greeted with respect by the Brahmin class and recognized as a professional writer (Samuels 18). The author inhabits a middle ground between the old genteel traditions, melodramatic sentiment, and a new middle-class professionalism in the last third of the nineteenth century. He was a firm believer in the purity and relevance of his own profession, yet his politics remained close to genteel morals, preferring a detached intellect over populism and progressive middle-class values.33 Contemporary literary critics and historians have shunned the author for his conservative views (cf. Samuels). While it is certainly true that he portrayed the “New England eccentricities” and a nostalgia for a vanishing genteel tradition (Samuels 7), he was also well aware of the regional differences within New England culture. “Mademoiselle Olympe Zabriski” shows that Aldrich understood the nuances of the changing social climate in the United States. He was less naïve about class issues than believed by critics such as H. L. Mencken, C. Hartley Grattan, or Vernon Louis Parrington, who had attacked him for his “escapist fiction” (Samuels 134).

Finally, we would like to emphasize Aldrich’s relevance as a literary role model for one of the greatest U.S.-American authors, Mark Twain. In fact, the representation of young Van Twiller, whose “ancestors have always been burgomasters or admirals or generals” (386), recalls Mark Twain’s Life on the Mississippi (1883), which contains barbed assaults on southern chivalric fiction in American periodicals. In the chapter “Enchantments and Enchanters,” Twain rages:

It was Sir Walter that made every gentleman in the South a major, or a colonel, or a general, or a judge, before the war; and it was he, also, that made these gentlemen value these bogus decorations. For it was he that created rank and caste down there, and also reverence for rank and caste, and pride and pleasure in them. (501, emphasis added)

Aldrich’s mockery of the aristocratic New York elite seems to anticipate Twain’s ridicule of the glory of fake ranks and riches in the South, ten years before Life on the Mississippi was published. The two men became acquainted in 1871. Four years later, Twain worked together with Aldrich at the Atlantic Monthly (Jacobson; Samuels).34 A lifelong admirer of Aldrich’s work, Twain praises the author in his autobiography.

31 With the exception of his short story “Marjorie Daw,” his life and achievements as a witty short fiction writer with a talent for surprise endings have been lost on the critics of the twentieth and twenty-first centuries (cf. Samuels). There are only two major works dealing with his role as an editor and author, namely, the authorized biography by Ferris Greenslet and Charles E. Samuels’s small but informative rendering of Aldrich’s literary talents and his place among New England’s intellectuals.
32 ‘Olympe’ (French for Olympus) not only evokes the home of the gods, alluding to the heights at which the boy performs, both figuratively, in terms of the illusion he creates, and physically, as a trapeze artist, but also hints at the well-known French playwright and early feminist Olympe de Gouges, who was executed during the French Revolution. While she was an abolitionist and fought for equality between the sexes, she remained loyal to Louis XVI (Smart 115). The fusion of an ambitious lower-class and an emancipated female identity serves as an interesting counterpart to the aristocratic bachelor in the story. The last name ‘Zabriski’ adds to the hybrid constellation and might refer to the wave of Polish immigration in the 1870s as a consequence of the Franco-Prussian War. However, the name also harks back to the colonial history of the United States and keeps the reader guessing about the true nature of the artist’s identity and class consciousness. ‘Zabriski’ was a respected name among the New England community: the family’s first ancestor settled in colonial America and became a judge and landowner (Bergen County Historical Society).
33 See Ferret; Samuels.
In an entry from 1904, he recalls a conversation with Robert Louis Stevenson in which he described the popular Bret Harte as “good company” and a “pleasant talker” (247), but added that Harte “must not be classed with Thomas Bailey Aldrich, nor must any other man, ancient or modern; that Aldrich was always witty, always brilliant […] Aldrich has never had his peer for prompt and pithy and witty and humorous sayings. None has equaled him, certainly none has surpassed him, in the felicity of phrasing with which he clothed these children of his fancy […] when he speaks the diamonds flash” (247-48). While Aldrich’s boy fiction (The Story of a Bad Boy) and its impact on Twain’s work have been noted,35 his more general influence on Twain’s style and on his attitude towards the New England elite and the aspiring ‘newly rich’ might have been underestimated and deserves further attention. Although Twain’s political views were more progressive, the two authors share a certain humor and wit in tone and phrasing. A comparative quantitative analysis of their short fiction and essays could offer further insight into their literary entanglements.36

34 Even earlier, in his novel The Gilded Age (1873), co-authored with Charles Dudley Warner, Twain satirizes the longing of the ‘newly rich’ to belong to refined society. Here the Irish family O’Reilly changes its name to “Oreille” to pass as a wealthy French family from Cork.

35 See, e.g., the Mark Twain Project.

36 The personal and professional connections between Aldrich, Twain, Howells, Keeler, and others are foregrounded in Thomas Ruys Smith’s article on the literary friendships of the Gilded Age. Lilian Aldrich’s autobiography Crowding Memories provides further information on the nature of Aldrich’s and Twain’s acquaintance.

6.2 Economization in Language: Sentence Length and Vocabulary Richness

While the increase in proper names and direct speech (see figs. 4 and 6) certainly indicates a shift towards a more character-driven and identificatory narrative style, other aspects point to an economization of language. The corpus shows a decrease in the length of utterances (see fig. 6c), a reduction of standard sentence length, and a standardization in the distribution of sentence length, while lexical richness decreases. The main reasons for these developments might be found in the standardization of language, the technical innovations of the print industry, and the economic and social developments of post-Civil War society. Oliver Scheiding is right to point out that one needs to examine the interrelations between “practices of text-making” and “cultures of print in which textual circulation and economic exchange are homologues” (Scheiding 325). The industrialization of print products, and especially the printed book, “belonged to this broader development within the publishing trades and related industries, which was itself part of the American and transatlantic Industrial Revolution of the mid-nineteenth century” (Casper 7). The proliferation of newspapers, magazines, and other periodicals from 1840 to 1880 altered the public sphere. As Groves states: “The growth in the number of periodicals reported was tremendous—from 1,631 to 11,314, almost sevenfold increase” (225).
The second half of the nineteenth century and the beginning of the twentieth century witnessed another major step in the evolution and impact of the printing press, equipment manufacturers, printers’ unions, and schools.37 Printers began to standardize spelling in order to make the printing process more efficient. Magazines promoted the idea of the professional writer, not least by standardizing payment for author contributions. As Lupfer states: The Atlantic had a base rate of $5 to $6 per page. Harper’s and Putnam’s paid about the same. These magazines did not simply expand the market for paying work; they also considerably altered how American writers understood the role that magazines might [play] in their careers. (254) Short story writers began to understand that their careers depended on the growing magazine market and on new journalistic forms of narration. As the literary historian F. L. Pattee summarizes: Magazines and newspaper offices now became schools for short story writing. […] By the later nineties, the short story had become so established an article of merchandise that the production became a recognized industry with numberless workers. The coming of the fifteen-cent magazine and the Sunday supplement stimulated greatly the quantity of the output. (337) Most of the short stories in our corpus first appeared in magazines operating in a highly competitive field. Authors trying to enter the literary market discovered that certain topics and genres, such as melodrama and sentimental fiction, were well received by a broad audience. The growing number of readers produced by public education, and their hunger for entertaining and instructive stories, put further pressure on the printing industry. With new technical advances in printing, quality improved and costs declined, making print products such as magazines and journals more affordable to the general public. The shorter life cycle, and thus the transitory nature, of texts might have fostered higher productivity at the expense of more elaborate modes. As early as 1844, Edgar Allan Poe observed: “(the) energetic, busy spirit of the age tended wholly to Magazine literature—to the curt, the terse, the well-timed, and the readily diffused, in preference to the old forms of the verbose and ponderous and inaccessible” (“Letters” 271). At the same time, the standardization and reduction of utterances mirror the economization of time in language. Short stories competed with other, more condensed textual forms of information such as advertisements and news, often within the same medium. The copyright law of 1891 might have further affected the length of utterances and the decrease in lexical richness. Before this law, it was cheaper for magazines to pirate European works for publication, since authors in the United States demanded royalties. At the time, young authors from outside the metropolitan centers took the opportunity to publish their works for less in order to gain a foothold in the dominant northern literary market (Maguire Skaggs 1-7). The increase in spelling mistakes since 1860 (see figs. 6 and 7) supports the assumption that their texts might not have been as refined and linguistically rich as the work of their more experienced European counterparts. After the Civil War, many short stories, often hastily written, (re)told the story of a defeated yet glorious South (Washington Smith 4). Young writers longed to restore the respectability of a lost world and of their humiliated fathers (Maguire Skaggs 5-7). In turn, Northern magazines fostered interest in remote and unfamiliar territories and cultures at a time when people tried to grasp the sheer size and diversity of the nation (cf. Maguire Skaggs). In this respect, it would be rewarding for future research to focus more closely on the effects that an accelerating society had on the stylistic development of the short story.

37 Noah Webster’s speller, first published in 1783, and his American Dictionary of the English Language (1828) helped facilitate homogenization. In 1857, work on the Oxford English Dictionary began, and in 1866 Thomas MacKellar published his famous manual The American Printer: A Manual of Typography.
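The trends discussed in this section rest on a few simple measures: the mean length of sentences, the spread of that length (its standard deviation), and lexical richness. The following Python sketch illustrates how such measures can be computed in principle; it does not reproduce the annotation pipeline used for this study. The regular-expression sentence splitter and tokenizer are simplifying assumptions (a full NLP pipeline would use trained components), and the moving-average type-token ratio (MATTR) is shown as one common length-robust richness measure, chosen here for illustration. All function names are hypothetical.

```python
import re
import statistics

# Simplifying assumptions: a sentence ends at ., !, or ? followed by whitespace,
# and a word is a run of letters with optional internal apostrophes.
# A real pipeline would use trained sentence splitters and tokenizers.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences at terminal punctuation."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; punctuation is discarded."""
    return [w.lower() for w in WORD.findall(text)]


def sentence_length_stats(text: str) -> tuple[float, float]:
    """Mean and population standard deviation of sentence length in tokens.

    A shrinking standard deviation across decades would correspond to a
    standardization in the distribution of sentence length.
    """
    lengths = [len(tokenize(s)) for s in split_sentences(text)]
    lengths = [n for n in lengths if n > 0]  # ignore punctuation-only fragments
    if not lengths:
        return 0.0, 0.0
    return float(statistics.mean(lengths)), statistics.pstdev(lengths)


def mattr(words: list[str], window: int = 100) -> float:
    """Moving-average type-token ratio (MATTR).

    Averages the type-token ratio over every window of `window` tokens, which
    makes the score less dependent on text length than a plain type-token ratio.
    """
    if not words:
        return 0.0
    if len(words) <= window:
        return len(set(words)) / len(words)  # short text: fall back to plain TTR
    ratios = [
        len(set(words[i:i + window])) / window
        for i in range(len(words) - window + 1)
    ]
    return statistics.mean(ratios)


if __name__ == "__main__":
    story = "The boy climbed. He swung high above the crowd! Did anyone know?"
    print(sentence_length_stats(story))  # mean 4.0, standard deviation about 1.414
    print(mattr(tokenize(story), window=5))  # 1.0: no word repeats within 5 tokens
```

Comparing such values per decade, rather than per text, would be one way to trace the diachronic trends described above; the window size of 100 tokens is an arbitrary default, not a value taken from this study.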

6.3 Vernacular Language and the Publishing Sphere

Digital methods help us to explore certain trends in their nuances, deviations, and ambivalences. They might also help us question linear narratives altogether. Measuring the distance between texts allows us to classify and group them according to topics and linguistic patterns, which may support or challenge labels such as romantic vs. realist or ethnic vs. Anglo-American. One rather surprising pattern in our corpus pertains to the use of vernacular language (fig. 7). Given the prominence of local color fiction throughout the nineteenth century, we had expected a more pronounced development, but vernacular does not emerge as a defining stylistic feature in this selection of texts.38 While the limited size and representativeness of the corpus do not allow for any statements on a general increase in the variety of American dialect and further class and ethnic distinctions throughout the nineteenth century, the findings support two main assumptions of American literary history. First, the close proximity of Thomas Nelson Page (52) and Charles W. Chesnutt (56) complements qualitative research in the field.
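Counting ‘spelling issues’ is one simple way to operationalize vernacular: tokens that a standard lexicon does not recognize are flagged, and their rate is compared across texts. The Python sketch below illustrates this idea under explicit assumptions: the tiny lexicon is a hypothetical stand-in for a full dictionary or spell-checker word list, the function name `flag_nonstandard` is ours, and the sketch does not claim to reproduce the procedure behind fig. 7. Because every unknown token is flagged, proper names and genuine typos are counted alongside dialect spellings, which is one reason to combine such a check with named entity recognition.

```python
import re

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def flag_nonstandard(text: str, lexicon: set[str], per: int = 1000) -> tuple[float, list[str]]:
    """Return the rate of out-of-lexicon tokens per `per` tokens and the flagged tokens.

    Caveat: every token missing from `lexicon` is flagged, so proper names,
    archaisms, and genuine typos are counted alongside dialect spellings.
    """
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return 0.0, []
    flagged = [w for w in words if w not in lexicon]
    return len(flagged) / len(words) * per, flagged


if __name__ == "__main__":
    # Hypothetical miniature lexicon; a real study would load a full word list.
    lexicon = {"i", "am", "going", "to", "the", "house"}
    rate, flagged = flag_nonstandard("I am gwine to de house.", lexicon)
    print(flagged)  # ['gwine', 'de']
    print(round(rate, 1))  # 333.3 flagged tokens per 1,000
```

Returning the flagged tokens, and not only the rate, keeps the measure open to close reading: an outlier text can be inspected to see whether its score stems from dialect, names, or errors.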

38 For a quantitative analysis of vernacular language in American fiction, see Gemma et al.

Their topical connections and Chesnutt’s cautiously subversive variation on Page’s “Marse Chan” are well documented in literary studies.39 Second, the increase in spelling issues, which peak at the end of the 1880s, further confirms the findings of literary historians such as Richard H. Brodhead, who argues that regional fiction and dialect had an important impact on reading tastes in the second half of the nineteenth century. F. L. Pattee goes further: The eighties in the history of the American short story were ruled by the local ‘colorists.’ It was a period of dialect stories, of small peculiar groups isolated and analyzed of unique local ‘characters’ presented primarily for exhibition. The short story writer now thought first of materials, often only materials. (268) Post-Civil War writing about the South, the West, and even rural northern New England represented a metropolitan nostalgia for a decaying picturesque world (cf. Brodhead). Amy Kaplan agrees: Regionalist writers […] were published by a highly centralized industry located in Boston and New York that appealed to an urban middle-class readership; this readership was solidified as an imagined community by consuming images of rural ‘others’ as both a nostalgic point of origin and a measure of cosmopolitan development. (251) Plantation literature in particular, written for (upper-)middle-class Americans, upheld the idea of the ‘Old South’ as an edenic space of past glories. Presenting “characters, and dialects with varying degrees of literary skill and at least a veneer of authenticity,” authors such as Thomas Nelson Page fostered stereotypes of the servile and docile African-American character (Lamplugh 177; MacKethan). Page’s fiction promoted white supremacy at a time when Jim Crow took its toll and America witnessed violent lynchings across the country. C.
Vann Woodward famously stated: “one of the most significant inventions of the New South was the ‘Old South’—a new idea in the eighties, and a legend of incalculable potentialities” (154-55). This new idea also figured in a large number of publications from the South, as Merrill Maguire Skaggs argues: “Between 1860 and 1900 local color burst into print” (1). In our corpus, the number of spelling issues drops after the 1880s and consolidates around 1900. This result partly contradicts, or at least adds nuance to, the notion of a prolonged and above-average representation of vernacular at the time. One might assume that the urban publishing market for regional dialect fiction quickly became saturated, although magazines such as Scribner’s Monthly were “especially receptive to local color” (Maguire Skaggs 2). In a similar vein, Pattee recalls a situation from the literary life of Joel Chandler Harris, author of the Uncle Remus stories: The period closed some time in the early nineties. In 1898 Harris, writing to Scribner’s Magazine, felt he should apologize for sending them sketches in dialect: ‘That sort of stuff,’ he wrote, ‘has seemed to be under a ban.’ For new writers certainly it had been under the ban for a decade. (287)

39 See, for instance, Lamplugh or Martin.

After 1900, authors such as Will Levington Comfort, Annie Trumbull Slosson, Wells Hastings, and Jack London are noteworthy for their use of vernacular language, yet the overall use of vernacular declines (see fig. 7). The issues of the Reconstruction era, which were still prominent in the impoverished South, faded from the spotlight, and a Northern (elite) readership turned to new topics, locales, and forms as socialist movements gained momentum and the specter of class war occupied the public sphere. A few years later, World War I firmly held readers’ attention. We might also assume that urban spaces no longer appeared as the antithesis of rural areas but rather became one topical strand in a broad spectrum of literary appetites.40 We will need full (or, for a start, easier and better) access to magazine fiction and popular short story collections of the time to verify these assumptions. Once available, representative corpora of U.S.-American short fiction could make a transformative contribution to discussions of the prominence and distribution of vernacular fiction.

40 Casper argues that literary historians tend to “render urbanization the central development of our period,” while “it is equally true that in those years of westward expansion and sectional division, the nation was defined literally by its regional parts” (27).

7 Conclusion

Quantitative methods constitute a valid entry point for addressing texts as communicative sources (being read, being distributed). Quantitative research in American Studies has the potential to open up new fields of discussion and paths of interpretation. In future work, we will need to complement our findings with a number of follow-up studies to fully grasp the complexity of the short story as a cultural artifact. We believe that simply rejecting these novel approaches means misunderstanding the genuinely explorative quality of quantitative methods. The quantitative analysis we have presented here is not all-encompassing. Instead, we focused on surface-level properties that can be extracted from the texts with the help of robust and validated methods. In the future, we plan to extend our analysis in four directions: 1) A more focused analysis of individual authors. Many authors represented in the corpus produced more than one short story. Looking for similarities or differences between their stories is technically feasible and promises to yield interesting results. 2) The phenomenon of vernacular speech also warrants a more extensive treatment. At least on the level of grammar, an operationalization seems feasible and would allow us to detect unusual syntactic constructions. 3) Most importantly, we will extend the depth of our analysis towards a quantitative analysis of semantic text properties. We aim to explore how exactly trends that have been attributed to this period (e.g., urbanization) affect plot, character, and setting. 4) Although the automatic extraction of these textual properties is a challenging task, recent advances in natural language processing (e.g., topic modeling, deep learning) provide us with new tools to that end. Topic modeling in particular offers an interesting starting point for further research. In combination with keyword analysis (which words are used more frequently or in different contexts in one corpus compared to another), we plan to analyze the changing semantic and ideological value of concepts such as ‘property’ or ‘work’ throughout the eighteenth and nineteenth centuries. A combination of bottom-up and top-down approaches can offer valuable insights into discursive fields such as commerce and might challenge well-established concepts that define our understanding of the societal frames in question.

One of the most hotly debated issues in the DH community concerns the relationship between quantitative and qualitative methods, and the ways in which quantitative methods fit into hermeneutic workflows. We believe this to be an important discussion that can be addressed from different perspectives. The angle we have proposed here is to use quantitative methods as an explorative tool that helps us to review research positions and/or generate new perspectives. Since these methods do not measure literary concepts directly, caution is required when interpreting their results. One way to validate quantitative results is to compare and relate them to established accounts of literary history. In any case, these approaches can illuminate, and possibly alter, the bigger picture. They help us to identify general diachronic trends or the distribution of objects of interest in a given data set. At the same time, we might become aware of outliers, exceptions, irregularities, or other interesting cases, which can then be studied more closely.
This procedure can bridge the supposed antagonism between empirical research and critical cultural studies, which sometimes dismiss quantitative work as positivist and reactionary. Ideally, scholars undertaking quantitative research may come across aspects that have gone unnoticed in topical readings of texts. Automated literary pattern recognition can be tied back to historical and socio-economic conditions. In fact, quantitative results depend on critical scholars to unveil issues of race, class, gender, and identity in the material at hand. Our own research would benefit from an extension of the corpus towards a more ‘democratic’ and popular scope. The current selection rests mainly on material preselected by a white intellectual elite and reflects their literary tastes. Full access to available texts, further digitization efforts, and better processing tools for historical material will bring us closer to a perspective on nineteenth-century short fiction that moves beyond the canon. However, this requires additional resources (time, funding, and technical infrastructure). And we should not forget the time-consuming but rewarding interdisciplinary dialogue on the logic of computational and human categorization. In sum, we believe that the use of quantitative methods in the humanities currently only scratches the surface. To fully unlock their potential, we need to find an appropriate way of connecting quantitative and qualitative work for each research question and object of study. Due to their brevity, their popularity during the nineteenth century, and their influential role in shaping public and counterpublic spheres, short stories are well suited as a subject of scalable readings and pose many fascinating questions for future work in the field.
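Keyword analysis, as outlined among our planned extensions, asks which words are used more frequently in one corpus than in another. A widely used keyness statistic for this comparison is Dunning’s log-likelihood (G²). The Python sketch below shows how it could rank words such as ‘property’ or ‘work’ between two sub-corpora (for instance, stories published before and after 1880). The choice of statistic, the function names, and the toy token lists are illustrative assumptions, not the design of the planned study.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) for one word.

    a, b: frequency of the word in the target and the reference corpus
    c, d: total number of tokens in the target and the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # the term 0 * log(0) is treated as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target: list[str], reference: list[str], top: int = 10) -> list[tuple[str, float]]:
    """Rank words by signed G2: positive means overrepresented in `target`."""
    t, r = Counter(target), Counter(reference)
    c, d = sum(t.values()), sum(r.values())
    if c == 0 or d == 0:
        raise ValueError("both corpora must contain tokens")
    scores = {}
    for word in t.keys() | r.keys():
        g2 = log_likelihood(t[word], r[word], c, d)
        if t[word] / c < r[word] / d:  # relatively more frequent in the reference
            g2 = -g2
        scores[word] = g2
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    later = ["work"] * 5 + ["the"] * 5  # hypothetical later sub-corpus
    earlier = ["the"] * 5 + ["property"] * 5  # hypothetical earlier sub-corpus
    for word, score in keywords(later, earlier):
        print(f"{word:10s} {score:+.2f}")  # work +6.93, the +0.00, property -6.93
```

A keyness score alone says nothing about how a word is used; in the spirit of the explorative approach argued for here, high-scoring words would serve as entry points for reading their contexts in the texts themselves.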

Works Cited

Aldrich, Lilian. Crowding Memories. New York: Houghton Mifflin, 1920. Print.
Aldrich, Thomas Bailey. “Mademoiselle Olympe Zabriski.” The Atlantic Monthly 32.192 (1873): 385-91. The Unz Review. Web. 10 Oct. 2018.
“Aldrich, Thomas Bailey.” The Oxford Companion to American Literature. Eds. James D. Hart and Phillip Leininger. 6th ed. New York/Oxford: Oxford UP, 1995. 16. Print.
“Aldrich, Thomas Bailey.” Merriam-Webster’s Encyclopedia of Literature. Merriam-Webster, 1995. Academic OneFile. Web. 12 Feb. 2018.
Bausch, Richard, and R. V. Cassill, eds. The Norton Anthology of American Short Fiction. 8th ed. New York: W. W. Norton & Company, 2015. Print.
Bendixen, Alfred. “The Emergence and Development of the American Short Story.” A Companion to the American Short Story. Eds. James Nagel and Alfred Bendixen. Malden: Wiley-Blackwell, 2010. 3-19. Print.
Benjamin, Walter. Illuminations: Essays and Reflections. Ed. Hannah Arendt. Schocken, 1968. Print.
Björkelund, Anders, et al. “A High-Performance Syntactic and Semantic Dependency Parser.” Coling 2010: Demonstrations, Aug. 2010, Beijing. 33-36. aclanthology.coli.uni-saarland.de/papers/C10-3009. Web. 11 Oct. 2018.
Bögel, Thomas, et al. “Collaborative Text Annotation Meets Machine Learning: heureCLÉA, a Digital Heuristic of Narrative.” DHCommons 1 (2015): n. pag. Web. 27 Mar. 2018.
Brand, Dana. The Spectator and the City in Nineteenth-Century American Literature. Cambridge: Cambridge UP, 1991. Print.
Bridle, James. “The New Platform Luring Readers into Short Fiction.” The Guardian 31 May 2015. Web. 28 Feb. 2018.
Brodhead, Richard H. Cultures of Letters: Scenes of Reading and Writing in Nineteenth-Century America. Chicago: U of Chicago P, 1993. Print.
Brunner, Annelen. “Automatic Recognition of Speech, Thought, and Writing Representation in German Narrative Texts.” Literary and Linguistic Computing 28.4 (2013): 563-75. Print.
Casper, Scott E. “Introduction.” A History of the Book in America. Vol. 3: The Industrial Book, 1840-1880. Eds. Scott E. Casper et al. Chapel Hill: U of North Carolina P, 2007. 1-39. Print.
Chatman, Seymour. Story and Discourse: Narrative Structure in Fiction and Film. Ithaca: Cornell UP, 1978. Print.
Cordell, Ryan. “A Larger View of Digital American Studies.” Amerikastudien/American Studies 61.3 (2016): 397-403. Print.
De Castilho, Richard Eckart, and Iryna Gurevych. “A Broad-Coverage Collection of Portable NLP Components for Building Shareable Analysis Pipelines.” Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for HLT, 23 Aug. 2014, Dublin. 1-11. Web. 28 Feb. 2018.
Documenting the American South. 2004. docsouth.unc.edu. Web. 4 Apr. 2018.
Finkel, Jenny Rose, et al. “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling.” Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, 25-30 June 2005, Ann Arbor. 363-70. aclanthology.coli.uni-saarland.de/papers/P05-1045. Web. 11 Oct. 2018.
Fitzpatrick, Kathleen. “The Humanities, Done Digitally.” Debates in the Digital Humanities. Ed. M. K. Gold. Minneapolis: U of Minnesota P, 2012. 12-15. Print.
Flanders, Julia. “The Productive Unease of 21st-century Digital Scholarship.” Digital Humanities Quarterly 3.3 (2009): n. pag. Web. 10 Oct. 2018.
Flood, Alison. “Short Story Vending Machines Press French Commuter’s Button.” The Guardian 13 Nov. 2015. Web. 28 Feb. 2018.
Fludernik, Monika. An Introduction to Narratology. 2006. New York: Routledge, 2009. Print.
Fowler, Alastair. “Proper Naming: Personal Names in Literature.” Essays in Criticism 58.2 (2008): 97-119. Web. 10 Oct. 2018.
Gemma, Marissa, et al. “Operationalizing the Colloquial Style: Repetition in Nineteenth-Century American Fiction.” Digital Scholarship in the Humanities 32.2 (2017): 312-35. Web. 28 Feb. 2018.
Genette, Gérard. Narrative Discourse Revisited. 1983. Ithaca: Cornell UP, 1988. Print.
Gius, Evelyn, and Janina Jacke. “The Hermeneutic Profit of Annotation: On Preventing and Fostering Disagreement in Literary Analysis.” International Journal of Humanities and Arts Computing 11.2 (2017): 233-54. Web. 11 Oct. 2018.
Goodman, Susan. “Thomas Bailey Aldrich: Guardian at the Gate.” Republic of Words: The Atlantic Monthly and Its Writers, 1857-1925. Hanover: UP of New England, 2011. 142-50. Print.
Greenslet, Ferris. The Life of Thomas Bailey Aldrich. Boston: Houghton Mifflin, 1908. Internet Archive. Web. 10 Oct. 2018.
Groves, Jeffrey D. “Periodicals and Serial Production. Introduction.” A History of the Book in America. Vol. 3: The Industrial Book, 1840-1880. Eds. Scott E. Casper et al. Chapel Hill: U of North Carolina P, 2007. 224-29. Print.
Hamilton, Clayton. “The Novel, the Novelette, and the Short Story.” A Manual of the Art of Fiction. New York: Doubleday, Page & Company, 1918. 172-88. Internet Archive. Web. 10 Oct. 2018.
Hoover, David. “Corpus Stylistics, Stylometry, and the Styles of Henry James.” Style 41.2 (2007): 174-203. Web. 10 Oct. 2018.
Howells, William Dean. The Great Modern American Stories: An Anthology. New York: Boni and Liveright, 1920. Internet Archive. Web. 10 Oct. 2018.
Internet Archive. archive.org. Web. 4 Apr. 2018.
Jacobson, Marcia. “Thomas Bailey Aldrich.” The Mark Twain Encyclopedia. Eds. J. R. LeMaster and James Darrell Wilson. New York/London: Garland Publishing, 1993. 20-21. Print.
Jockers, Matthew. Macroanalysis: Digital Methods and Literary History. Urbana: U of Illinois P, 2013. Print.
Kaplan, Amy. “Nation, Region, Empire.” The Columbia Literary History of the American Novel. Ed. Emory Elliott. New York: Columbia UP, 1991. 240-66. Print.
Kelleter, Frank. “Response to William Uricchio. ‘There’s Something Happening Here’: Digital Humanities and American Studies.” American Studies Today. Eds. Winfried Fluck et al. Heidelberg: Winter, 2014. 383-97. Print.
Kirsch, Adam. “Technology Is Taking over English Departments.” The New Republic 2 May 2014. Web. 10 Oct. 2018.
Lamplugh, George R. “The Image of the Negro in Popular Magazine Fiction, 1875-1900.” The Journal of Negro History 57.2 (1972): 177-89. Web. 10 Oct. 2018.
Litz, A. Walton. Major American Short Stories. New York: Oxford UP, 1975. Print.
Lupfer, Eric. “The Business of American Magazines.” A History of the Book in America. Vol. 3: The Industrial Book, 1840-1880. Eds. Scott E. Casper et al. Chapel Hill: U of North Carolina P, 2007. 248-57. Print.
MacKethan, Lucinda. “Plantation Fiction, 1865-1900.” The History of Southern Literature. Eds. Louis D. Rubin et al. Baton Rouge: Louisiana State UP, 1985. 209-19. Print.
Maguire Skaggs, Merrill. The Folk of Southern Fiction. Athens: U of Georgia P, 1972. Print.
Mark Twain Project. “Thomas Bailey Aldrich.” 2007. Web. 28 Feb. 2018.
Martin, Matthew R. “The Two-Faced New South: Plantation Tales of Thomas Nelson Page and Charles W. Chesnutt.” The Southern Literary Journal 30.2 (1998): 17-36. Web. 10 Oct. 2018.
Matthews, Brander. The Philosophy of the Short Story. New York/London: Longmans, Green & Co., 1901. Internet Archive. Web. 10 Oct. 2018.
Matz, Jesse. Literary Impressionism and Modernist Aesthetics. Cambridge: Cambridge UP, 2001. Print.
McGann, Jerome J. Radiant Textuality: Literature after the World Wide Web. New York: Palgrave, 2001. Print.
Moretti, Franco. “Conjectures on World Literature.” New Left Review 1 (2000): 54-68. Web. 10 Oct. 2018.
---. “‘Operationalizing’: or, the Function of Measurement in Modern Literary Theory.” Pamphlets of the Stanford Literary Lab 6. Stanford Literary Lab, Dec. 2013. Web. 28 Feb. 2018.
Mueller, Martin. “Scalable Reading.” scalablereading.northwestern.edu. 29 May 2012. Web. 27 Mar. 2018.
Pattee, Fred L. The Development of the American Short Story: An Historical Survey. New York/London: Harper & Bros., 1923. Print.
Poe, Edgar Allan. “The Philosophy of Composition.” Graham’s Magazine 28.4 (1846): 163-67. HathiTrust. Web. 10 Oct. 2018.
---. The Letters of Edgar Allan Poe. Ed. John Ward Ostrom. Vol. 1. Cambridge: Harvard UP, 1948. Print.
Reiter, Nils. Discovering Structural Similarities in Narrative Texts Using Event Alignment Algorithms. Diss. Heidelberg University, 2014. Web. 28 Feb. 2018.
Reiter, Nils, et al. “A Shared Task for a Shared Goal: Systematic Annotation of Literary Texts.” Digital Humanities 2017: Conference Abstracts, Montreal, Canada, Aug. 2017. Web. 11 Oct. 2018.
Samuels, Charles E. Thomas Bailey Aldrich. New York: Twayne Publishers, 1965. Print.
Scheiding, Oliver. “The American Short Story.” Handbook of Transatlantic North American Studies. Ed. Julia Straub. Berlin: De Gruyter, 2016. 234-50. Web. 10 Oct. 2018.
Scofield, Martin. The Cambridge Introduction to the American Short Story. Cambridge/New York: Cambridge UP, 2006. Print.
ShortÉdition. 2011. short-edition.com. Web. 28 Feb. 2018.
Smart, Annie K. Citoyennes: Women and the Ideal of Citizenship in Eighteenth-Century France. Newark: U of Delaware P, 2011. Web. 10 Oct. 2018.
Smith, Thomas Ruys. “Missing Ralph Keeler: Bohemians, Brahmins and Literary Friendships in the Gilded Age.” Comparative American Studies: An International Journal 14.2 (2016): 139-61. Web. 10 Oct. 2018.
Suckow, Ruth. “The Short Story.” Saturday Review of Literature 4.17 (1927): 317-18. The Unz Review. Web. 10 Oct. 2018.
Tallack, Douglas. The Nineteenth-Century American Short Story: Language, Form, and Ideology. London: Routledge & Kegan Paul, 1993. Print.
Temple, Emily. “The Most Anthologized Short Stories of All Time.” LitHub 6 July 2017. Web. 4 Apr. 2018.
The Oxford Text Archive. 2015. ota.ox.ac.uk. Web. 4 Apr. 2018.
Twain, Mark. “Enchantments and Enchanters.” Mississippi Writings. Life on the Mississippi. 1883. New York: Literary Classics of the United States, 1982. 499-502. Print.
---. Mark Twain’s Autobiography. Vol. 1. New York/London: Harper & Bros., 1924. Print.
---. The Gilded Age and Later Novels: The Gilded Age, The American Claimant, Tom Sawyer Abroad, Tom Sawyer, Detective, No. 44, The Mysterious Stranger. Ed. Hamlin L. Hill. Library of America, 2002. Print.
University of Pennsylvania Library. The Online Books Page. “Short Stories, American.” N.d. Web. 6 Apr. 2018.
Vann Woodward, C. Origins of the New South, 1877-1913. Baton Rouge: Louisiana State UP, 1951. Print.
Voss, Arthur. The American Short Story: A Critical Survey. Norman: U of Oklahoma P, 1973. Print.
Washington Smith, Rebecca. The Civil War and Its Aftermath in American Fiction, 1861-1899. Diss. Chicago: U of Chicago Libraries, 1932. Print.
Werner, James. American Flaneur: The Cosmic Physiognomy of Edgar Allan Poe. Routledge, 2004. Print.
Whalen, Terence. Edgar Allan Poe and the Masses: The Political Economy of Literature in Antebellum America. Princeton: Princeton UP, 1999. Print.
Wright, Kevin. “Zabriskie Family History, Part I.” Bergen County Historical Society. N.d. Web. 4 Apr. 2018.