Counting for Humanists

Andrew Goldstone (http://andrewgoldstone.com)
Wednesday, April 30, 2014

shall we count?

Academic disciplines (and even interdisciplines or hybrids) are relational entities; they must define themselves by what they are not. And what literary studies is not is a “counting” discipline. This negative relation to numbers is traditional—foundational, even—and it has not been seriously challenged by the rise of interdisciplinarity…. Literary studies has shouldered much of the burden of…defending qualitative models and strategies against the naïve or cynical quantitative paradigm that has become the doxa of higher-educational management. Under these institutional circumstances, antagonism toward counting has begun to feel like an urgent struggle for survival.

James English, “Everywhere and Nowhere: The Sociology of Literature After ‘the Sociology of Literature,’” NLH 41, no. 2 (Spring 2010): xii–xiii.

what shall we count?
    favorite author      female (%)   male (%)
    Stephen King         17.5         35.9
    Wilbur Smith          3.0         23.5
    Agatha Christie      11.0          7.2
    Danielle Steel       13.0          0.3
    Jeffrey Archer        8.1          9.1
    Virginia Andrews     11.9          0.8
    Catherine Cookson    11.0          0.9
    Sidney Sheldon        3.7          3.1
    Bryce Courtenay       3.2          2.7
    Tom Clancy            1.5         11.6

Table: Australian readers’ favorite authors, by gender, from Tony Bennett et al., Accounting for Tastes (Cambridge UP, 1999), 151.

[Figure 6: Book imports into India, 1850–1900, in thousands of pounds sterling.]
Source: Priya Joshi, In Another Country: Colonialism, Culture, and the English Novel in India, New York 2002. Figure reprinted in Franco Moretti, “Graphs, Maps, Trees,” NLR 24 (Nov.-Dec. 2003): 75.

An antipathy between politics and the novel. Still, it would be odd if all crises in novelistic production had a political origin: the French downturn of the 1790s was sharp, true, but there had been others in the 1750s and 1770s—as there had been in Britain, for that matter, notwithstanding its greater institutional stability. The American and the Napoleonic wars may well be behind the slumps of 1775–83 and 1810–17 (which are clearly visible in figure 2), write Raven and Garside in their splendid bibliographic studies; but then they add to the political factor ‘a decade of poorly produced novels’, ‘reprints’, the possible ‘greater relative popularity of other fictional forms’, ‘a backlash against low fiction’, the high cost of paper… [6] And as possible causes multiply, one

[6] James Raven, ‘Historical Introduction: the Novel Comes of Age’, and Peter Garside, ‘The English Novel in the Romantic Era: Consolidation and Dispersal’, in Garside, Raven and Schöwerling, eds, The English Novel 1770–1829, 2 vols, Oxford 2000; vol. i, p. 27, and vol. ii, p. 44.

comma-separated values

    "firstname","surname","bornCountry"
    "Alice","Munro","Canada"
    "Mo","Yan","China"
    "Tomas","Tranströmer","Sweden"
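Looking ahead to R, which these slides introduce later: the small comma-separated example can be turned into a table with base R's `read.csv`. This is only a sketch. The variable names `csv_text` and `laureates` are made up for illustration, and the data is supplied as an in-memory string rather than as a file on disk.

```r
# The three-row CSV example, held in a string instead of a file.
# (csv_text and laureates are hypothetical names chosen for this sketch.)
csv_text <- '"firstname","surname","bornCountry"
"Alice","Munro","Canada"
"Mo","Yan","China"
"Tomas","Tranströmer","Sweden"'

# read.csv treats the first row as column names by default.
# stringsAsFactors = FALSE keeps the text columns as plain strings.
laureates <- read.csv(text = csv_text, stringsAsFactors = FALSE)

nrow(laureates)        # number of data rows: 3
names(laureates)       # the column names from the first row
laureates$bornCountry  # one column, as a series of strings
```

With a real file, the same call would take a file name in place of `text = csv_text`.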
the norms of CSV

- plain-text file for tabular data
- delimiter separates columns (usually , or a tab)
- newline separates rows
- names of columns in first row (optional)
- tricky bits:
    - what if a data point contains a comma?
    - what if a data point contains a quotation mark?
    - what text-encoding should be used?
    - how do you know what rules have been followed? (There is RFC 4180, but no promises.)

people

    id,firstname,surname,born,died,bornCountry,bornCountryCode,bornCity,diedCountry,diedCountryCode,diedCity,gender,year
    892,Alice,Munro,1931-07-10,0000-00-00,Canada,CA,Wingham,,,,female,2013
    880,Mo,Yan,0000-00-00,0000-00-00,China,CN,Gaomi,,,,male,2012
    868,Tomas,Tranströmer,1931-04-15,0000-00-00,Sweden,SE,Stockholm,,,,male,2011
    854,Mario,"Vargas Llosa",1936-03-28,0000-00-00,Peru,PE,Arequipa,,,,male,2010
    844,Herta,Müller,1953-08-17,0000-00-00,Romania,RO,"Nitzkydorf, Banat",,,,female,2009
    832,"Jean-Marie Gustave","Le Clézio",1940-04-13,0000-00-00,France,FR,Nice,,,,male,2008
    817,Doris,Lessing,1919-10-22,2013-11-17,"Persia (now Iran)",IR,Kermanshah,"United Kingdom",UK,London,female,2007
    808,Orhan,Pamuk,1952-06-07,0000-00-00,Turkey,TR,Istanbul,,,,male,2006
    801,Harold,Pinter,1930-10-10,2008-12-24,"United Kingdom",UK,London,"United Kingdom",UK,London,male,2005

Source: requests to api.nobelprize.org. See http://www.nobelprize.org/nobel_organizations/nobelmedia/nobelprize_org/developer

words

    WORDCOUNTS,WEIGHT
    the,766
    of,482
    and,305
    in,259
    to,224
    a,195
    new,101
    as,101
    that,86
    it,75

Source: a wordcounts CSV file from a http://dfr.jstor.org request.

affordances

What kinds of data can be accommodated in this format? And what can’t be?

data types: simple: numerical

- Whole numbers (integer scale). How many (books, people, words, genres…)?
- Real numbers (interval scale). How much (distance, time, money…)?

Special cases:

- percentages or proportions (ratio scale). How much of the total (population, corpus of texts…)?
- dates. When? (And does the day, month, year, decade, century… matter?)

data types: simple: categorical

- Unordered. Which of… (languages, nations, genders(?))?
  Special cases:
    - binary or Boolean category: true or false, yes or no.
    - many categories (headwords in the dictionary, authors in the catalogue).
- Ordinal. Which (letter of the alphabet, sales rank, “like, dislike, or neutral”)?

Categories to numbers

- true: 1, false: 0
- like: 1, neutral: 0, dislike: -1
- like: 2, neutral: 1, dislike: 0
- a: 1, b: 2, c: 3… (character encoding)

data types: compound

The list / the series

    17.5, 3.0, 11.0, 13.0, 8.1, 11.9, 11.0, 3.7, 3.2, 1.5

The list of lists / the table

    firstname: Alice, Mo, Tomas
    surname: Munro, Yan, Tranströmer
    bornCountry: Canada, China, Sweden

    firstname  surname      bornCountry
    Alice      Munro        Canada
    Mo         Yan          China
    Tomas      Tranströmer  Sweden

(more elaborate possibilities exist…)

and text?

a (looooong) list of characters (a “string”):

    O, n, c, e, *space*, u, p, o, n, *space*, a, *space*, t, i, m, e

other representations

- the bag of words (to: 2, be: 2, or: 1, not: 1)
- content analysis (automated, human, or semi-automated)
- marked-up text

      <sp who="#Salinus"><speaker>Duke.</speaker>
      <p>Haplesse <name>Egeon</name> whom the fates haue markt...</p>

- parsed trees
- page images
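The bag-of-words representation listed among the ways of encoding text can be sketched in a few lines of base R. This is an illustration under simple assumptions, not a full tokenizer: the phrase is already lowercase, has no punctuation, and is split at single spaces. The names `phrase`, `words`, and `bag` are made up for the example.

```r
# A short phrase as one string (a long list of characters).
phrase <- "to be or not to be"

# Split the string at spaces; strsplit returns a list, so take its first element.
words <- strsplit(phrase, " ")[[1]]

# Count how often each distinct word occurs. Word order is discarded:
# that loss of order is what makes this a "bag".
bag <- table(words)
bag

# Look up the count for a single word.
bag["to"]
```

The counts match the slide: to: 2, be: 2, or: 1, not: 1.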
programming in a nutshell

1. A program is a formal description of a process for transforming data.
2. A computer performs calculations on numbers and stores the results of those calculations.
3. If the inputs, outputs, and the formal description can be encoded as numbers, a program can be executed on a computer.

the R experience

The console
You type an expression, R figures out its value (and sometimes: stores a value, draws a figure, reads a file from the disk, saves a file on the disk).

The script
You prepare a list of expressions in a file, and R figures out their value one by one.

first steps in the console

R is a parrot

    2
    "Shiver me timbers"

R gets crabby easily

    Shiver
    Shiver me timbers
    help ( "Shiver

Press ESC.

some hidden features

- history navigation with up and down arrows (or RStudio History pane)
- tab completion
- help: help("sqrt") or ?sqrt

R data kinds (“modes”)

Numbers
Whole, integer, real, ratio… (complex too)

Strings

    "Avast"
    "\"Avast,\" he said"
    "Beware the \\"

Represent a newline with \n and a tab with \t.

Booleans
TRUE and FALSE or T and F for short

Factors
(For categorical data: more later)

Rithmetic

Try:

    2 * 2
    5/7
    TRUE | FALSE
    TRUE & FALSE
    T & T
    !FALSE
    !TRUE
    4 == 3
    !(4 == 3)
    4 != 3
    1 < 5

R functions

Functions map inputs to outputs. Describe these:

    sqrt(4)
    nchar("Munro")
    paste("Alice", "Munro")

Functions in R can have named parameters as well. Experiment with:

    paste("Munro", "Alice", sep = ", ")
    paste("Munro", "Alice", sep = "")

assignment

<- stores a value under a name which you can refer to (or change) later.

    x <- 108
    x
    x + 2
    storage <- 10
    storage <- storage - 10
    My_Perfectly_Good_Name2012 <- "Mo Yan"

R compound data types

vectors (for a series of values)

Construct a vector with the special function c (concatenate):

    xs <- c(2, 4, 8)
    xs
    bs <- c(T, F, T)
    bs
    people <- c("Munro", "Mo", "Transtromer")
    people
    c(people, "Vargas Llosa")
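One thing worth trying with these vectors: most of the functions and operators met earlier (arithmetic, `nchar`, comparisons) work on a whole vector at once, element by element. The sketch below reuses the slide's `xs` and `people`. The name `more_people` is made up here to show that `c()` returns a new vector and leaves the original alone unless the result is assigned.

```r
xs <- c(2, 4, 8)
people <- c("Munro", "Mo", "Transtromer")

# Arithmetic and comparison apply to each element in turn.
xs * 2        # 4 8 16
xs > 3        # FALSE TRUE TRUE

# So do many functions, such as nchar (number of characters).
nchar(people) # 5 2 11

# length counts the elements of a vector.
length(people) # 3

# c() builds a new, longer vector; people itself is unchanged
# until the result is stored under a name.
more_people <- c(people, "Vargas Llosa")
length(more_people) # 4
length(people)      # still 3
```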
