Humanities Computing: A Federation of Disciplines

John Nerbonne
Alfa-Informatica, University of Groningen
[email protected]
http://www.let.rug.nl/alfa/

Series: Is Humanities Computing an Academic Discipline?
Organized by John Unsworth, University of Virginia
Oct. 29, 1999

Federation

Thesis: Humanities Computing (HC) is not a discipline (yet), but a federation of cooperating disciplines.
- a discipline is demarcated by
  - a subject matter
  - a range of analytical techniques
  - one or more competing theories
  - where appropriate, practical applications
- caution: these are claims about fields, not about every piece of work, or every scholar's cumulative work
- HC has neither coherent subject matter nor theory (apart from its components)

Vision

Humanities Computing is "...the continuation of the humanities by other means" (die Fortsetzung der Geisteswissenschaften mit anderen Mitteln; with apologies to Clausewitz, Battus).
- unsurprising but significant
  - HC must engage traditional Humanities
  - HC's primary value is within traditional Humanities
  - which traditional Humanities problems have we solved?
- opposed to the view that HC's purpose is to understand digital culture using humanities methods
  - studies comparing printing press to computers
    - Electronic Incunabula (Nerbonne, 1995)
  - linguistic studies of computer-mediated communication
  - literary studies of hypertext vs. planar text
  - recent proposal from Dutch Science Council WTR
  - interesting, but not HC's job

Parallel?
- in general, scholarship makes use of all available technology

  optical magnification   astronomy, medicine, biology
  photography             astronomy, biology, biokinetics, ethnology
  phonograph              acoustics, linguistics, ethnology

- an essential tool is insufficient to create a field of scholarship
- common subject matter, theory is essential

Promising

- HC ought to be past the stage of a promising development
- Humanities colleagues
  - are not anti-technical, certainly not in general (McCarty's essay)
  - are not guarding noncomputational empires
  - want results, like all good scholars
- HC vies for attention with
  - new theoretical discussion
  - broader views
  - other interdisciplinary perspectives
- HC is over 30 (Computing and the Humanities, 1967)
  - no longer a Wunderkind
- Which traditional Humanities problems have we solved?

Engaging the Humanities

George Welling (Groningen) digitized and organized the import records (Paalgeld) of Amsterdam 1771-1817
- computational methods deployed for organization (database), verification (consistency), and exploration (nominal record linkage)
- historical results
  - Baltic trade (moedernegotie) eclipsed by American trade even in 1771 (pace Israel, de Vries)
  - American shipping took over Dutch business prevented by British blockade in 4th Anglo-Dutch war (1780-84)
  - American shipping catapulted to world-wide second place
- organizational effect
  - active interest in HC by local historians

Art History

Elwin Koster (Groningen) has digitized and organized city maps
- computational methods used to reconstruct architectural work for which plans (and buildings) were inaccessible
- results in architectural history
  - more complete reconstruction of urban development
  - a digital terrain model of 17th cent.
    Groningen

(Computational) Linguistics

- Linguistics is universally part of Humanities
  - language is a cultural product
  - language is the vehicle for most elaborate and subtle cultural expression
  - aggressively interdisciplinary, esp. wrt psychology, cognitive science
- Computational Linguistics
  - focus on computational language processing
  - 1,500-member prof. organization
  - active collaboration with CS (50% members)

Computational Linguistics

[Figure: syntax trees and automaton diagrams pairing machine classes with recognition complexity: Turing machine (undecidable), push-down automaton (polynomial), finite-state automaton (linear)]

Chomsky (1963): "[...] we must conclude that the competence of the native speaker cannot be characterised by a finite automaton [...]. Nevertheless, the performance of the speaker or hearer must be representable by a finite automaton of some sort."

Van Noord approximates sophisticated grammars in FSAs

Dialect Geography

In analogy to isotherms in a climate map, linguists draw lines around areas in which the same or similar forms are used. The lines are isoglosses. They are more broadly interesting because they show cultural affinity, which might be due to social or commercial ties, migration, or conquest.

Originally pursued (late 19th cent.) in order to see whether local linguistic change might be more phonetically regular than global change (it isn't).

Isoglosses

[Map: local forms of 'kippen' plotted by location, including hinnn, hinne, henne, hounder, hoonder, hoender, kippe, kipm, kiepm, kiepe, tjoeken, kiekes, kiekens]

Isoglosses for different forms of 'kippen' (chicken) would be drawn North-South around the eastern border (variants of hounder), and in Flanders (variants of kieken).

Isoglosses

[Map: local forms of 'optillen' plotted by location, including optilln, optille, opbeurn, obbeurn, opbeure, oplichte, uplichte, opheffe, opeffe, ipeffn, ophuffe, uphuffe, opluchte]

Isoglosses for different forms of 'optillen' (lift up) would run East-West.

Isoglosses

Isoglosses are important, but insufficient for identifying dialect areas (areas with similar varieties).

Bloomfield (1916, 1933) summarized this, but the problem was already well-known:
- Bloomfield: every word has its history
- Coseriu (1956): danger of atomistic view

Linguistics

some unsolved problems in dialectology
- what is the analytical basis of 'dialect areas'?
  - Coastal New England, U.S. Southern Coastal, Saxon (Dutch)
- Can we state more precisely in what sense dialectal differences are cumulative (Chambers and Trudgill)?
- How do we reconcile the notions 'dialect area' and 'dialect continuum'?

Computational Perspective

- need a way to aggregate individual differences
  - a numerical view
- Edit Distance (= Levenshtein Distance) equals the cost of (the least costly set of) operations mapping one string to another
- basic costs are insertions (1), deletions (1), substitutions (2)
- two strings are compared by calculating their Levenshtein distance

  adresse
      insert d     1
  addresse
      delete e     1
  address
                   2

How do you know it's the cheapest? Try all the sequences of operations?

Algorithm

Levenshtein distance(adresse, address)

           a  d  d  r  e  s  s
        0  1  2  ...
     a  1
     d  2
     r  .
     e  .
     s
     s
     e

- Top horizontal row is always 1, 2, ...: cost of insertions
- Left vertical column is always 1, 2, ...: cost of deletions
- begin at upper left (= 0)
- to fill in a cell: min(above + delete, left diag + replace, left + insert), where replace costs 0 if the two characters match
- lower right corner of table contains LevD

Algorithm

Levenshtein distance(adresse, address)

           a  d  d  r  e  s  s
        0  1  2  3  4  5  6  7
     a  1  0  1  2  3  4
     d  2  1  0  1  2
     r  3  2  1  2  1
     e  4  3  2        1
     s  5  4              1
     s  6                    1
     e  7                    2

address, adresse are two Levenshtein units apart.
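The table-filling procedure from the Algorithm slides can be sketched in Python as follows. This is a minimal illustration, not the author's implementation; the function names `levenshtein_table` and `levenshtein` are assumptions, and the costs (insert 1, delete 1, substitute 2, match 0) follow the slides.

```python
def levenshtein_table(source, target, ins=1, dele=1, sub=2):
    """Fill the dynamic-programming table for turning `source` into `target`.

    Rows correspond to characters of `source`, columns to characters of
    `target`. The function name and parameters are illustrative assumptions.
    """
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]

    # Top row: reaching a prefix of `target` from nothing costs insertions only.
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + ins
    # Left column: reducing a prefix of `source` to nothing costs deletions only.
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + dele

    for i in range(1, rows):
        for j in range(1, cols):
            # Replacing a character by itself is free.
            replace = 0 if source[i - 1] == target[j - 1] else sub
            table[i][j] = min(
                table[i - 1][j] + dele,         # above + delete
                table[i - 1][j - 1] + replace,  # left diagonal + replace
                table[i][j - 1] + ins,          # left + insert
            )
    return table


def levenshtein(source, target):
    """Return the value in the lower right corner of the table."""
    return levenshtein_table(source, target)[-1][-1]


table = levenshtein_table("adresse", "address")
print(table[-1][-1])  # 2
```

Because every cell is computed once from three neighbours, the cheapest sequence of operations is found without enumerating all sequences.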

Alignment

Levenshtein distance(adresse, address)

           a  d  d  r  e  s  s
        0  1  2  3  4  5  6  7
     a  1  0  1  2  3  4
     d  2  1  0  1  2
     r  3  2  1  2  1
     e  4  3  2        1
     s  5  4              1
     s  6                    1
     e  7                    2

path of lowest scores shows alignment of strings

     a d d r e s s
     | | | | | | | |
     a d   r e s s e

Applications

- other
  - biology: align DNA sequences
  - ethology: map evolution in bird songs
- In language
  - spell checker: given a misspelling, find the closest match in the dictionary
    - more is needed for this!
  - alignment: align bilingual texts
    - use sentence length as indicator of base similarity
  - language therapy: identify sources of deviant pronunciation
  - language variation: measure differences among dialects or social groups

Dialect Pronunciations

- use 100-word sample in a large number of varieties
- dialect distance is equal to the sum of the word distances
  - we've aggregated over individual words!
- first applied for dialect comparison by Kessler (1995) for Irish dialects
- applied for Dutch dialects by Nerbonne et al. (1996), Nerbonne and Heeringa (1997), Nerbonne and Heeringa (1999, to appear).

American English example: 'saw a girl' is pronounced as [sO:@gIrl] (Standard American) and [sO:r@gø:l] (Boston). Change the first pronunciation into the other.

  sO@gIrl
      delete r       1
  sO@gIl
      replace I/ø    2
  sO@gøl
      insert r       1
  sOr@gøl
                     4

Levenshtein distance

- Calculate the cost of changing one string into another
- Refinement: by looking at the features, the value of a replacement r varies between 0 and 2.
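The aggregation step (dialect distance as the sum of word distances) and the idea of a variable replacement cost might be sketched as below. This is an assumption-laden illustration: `word_distance`, `dialect_distance`, and the `sub_cost` parameter are hypothetical names, the word pairs are simply the two examples from the slides used as stand-ins for a 100-word sample, and no actual phonetic feature table is supplied.

```python
def word_distance(a, b, sub_cost=lambda x, y: 2.0):
    """Levenshtein distance with insert = delete = 1 and a pluggable
    replacement cost. `sub_cost` is a hypothetical hook: a feature-based
    refinement would return a value between 0 and 2 for each segment pair.
    """
    prev = [float(j) for j in range(len(b) + 1)]  # top row: insertions only
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]  # left column: deletions only
        for j, cb in enumerate(b, start=1):
            replace = 0.0 if ca == cb else sub_cost(ca, cb)
            curr.append(min(prev[j] + 1,            # above + delete
                            prev[j - 1] + replace,  # left diagonal + replace
                            curr[j - 1] + 1))       # left + insert
        prev = curr
    return prev[-1]


def dialect_distance(variety_a, variety_b, sub_cost=lambda x, y: 2.0):
    """Sum the word distances over two equally long, word-aligned samples."""
    if len(variety_a) != len(variety_b):
        raise ValueError("samples must contain the same words in the same order")
    return sum(word_distance(x, y, sub_cost)
               for x, y in zip(variety_a, variety_b))


# The 'saw a girl' example from the slide.
print(word_distance("sO@gIrl", "sOr@gøl"))  # 4.0

# Stand-in two-item "samples" built from the examples in the slides.
print(dialect_distance(["adresse", "sO@gIrl"], ["address", "sOr@gøl"]))  # 6.0
```

Passing the replacement cost as a function keeps the table-filling logic unchanged when a feature-based cost is substituted for the flat cost of 2.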