Automatic discovery of syntactic changes

Micha Elsner and Emily Lane melsner0@gmail and [email protected] Department of Linguistics The Ohio State University

Abstract CE) with the Medieval writing of Thomas Aquinas (c. 1270) and the intermediate stage of the Vulgate Syntactic change tends to affect construc- Bible (4th century CE). Such a method can be used tions, but treebanks annotate lower-level for the initial “hypothesis discovery” step in a his- structure: PCFG rules or dependency arcs. torical research project. Although the method is This paper extends prior work in native currently incapable of discovering some (lexically language identification, using Tree Substi- bound) constructions, we demonstrate that it dis- tution Grammars to discover constructions covers several interpretable and interesting histor- which can be tested for historical variabil- ical changes. ity. In a case study comparing Classi- The method (which we review more fully be- cal and Medieval Latin, the system dis- low) induces a Tree Substitution Grammar (TSG) covers several constructions correspond- from a constituency treebank. TSG rules are larger ing to known historical differences, and than Context-Free Grammar (CFG) rules and thus learns to distinguish the two varieties with have the power to represent constructions, includ- high accuracy. Applied to an intermediate ing partial lexicalization. We use chi-squared fea- text (the Vulgate Bible), it indicates which ture selection to rank the TSG rules for their sen- changes between the eras were already oc- sitivity to historical change. We evaluate the rules curring at this earlier stage. both by building classifiers to identify the histori- cal period of unknown text, and by manual exam- 1 Introduction ination and interpretation. In recent years, the study of language variation and change has been aided by a variety of computa- 2 Variationist research tional tools that can automatically infer hypothe- ses about language change from a corpus (Eisen- Computational methods for studying language stein, 2015). In the domain of syntax, however, variation can enhance both diachronic (historical) computational work is still limited by the neces- and synchronic (sociolinguistic) research. In some sity of manually choosing interesting hypotheses cases, the computational contribution is to build a to study. For example, computational research on classifier for a particular feature which is already the syntax of African-American English (Stewart, of interest. For instance, Bane et al. (2010) tar- 2014) is driven by pre-existing scholarly intuitions get pre-selected phonetic features for analysis in about the distinctive features of this dialect, but recorded speech. Other computational systems are such intuitions are much harder to obtain for dead exploratory: capable of discovering new hypothe- (or newly-emerging) language varieties. ses about geographical or social variation in the This paper adopts a method for unsupervised data. But existing systems of this type are lex- learning of syntactic constructions previously icographic. For instance, Eisenstein (2015) de- found effective for native language identification tects previously unknown local slang terms, such (Swanson and Charniak, 2012), and shows that it as “deadass” in New York City. Rao et al. (2010) can discover a range of historically varying ele- discover words and ngrams correlated with gender ments in a Latin corpus. In particular, we conduct and other social attributes, as do later papers such a case study comparing classical prose (1st century as Bamman et al. (2014).

156 Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pages 156–164, Berlin, Germany, August 11, 2016. c 2016 Association for Computational Linguistics C creates a TSG rule for every maximal fragment which occurs more than once in the dataset. For C V.IND instance, the fragment in figure 1 would be ex- quod tracted from the trees for dicit quod Cicero con- 1 N.NOM N.NOM V.IND sul est and quod Caesar dux est scimus, since it is shared between them both, but cannot be fur- est ther expanded without adding an unshared ele- Figure 1: A TSG fragment with the root sym- ment. TSGs are equivalent in expressive power to bol C (complement clause), introducing an indica- CFGs and can be efficiently parsed using the same tive subclause headed by quod which contains two algorithms (Goodman, 1996). nominals and the verb est “is”. TSGs have been used effectively for native language identification (Swanson and Charniak, 2012): determining the native language of a writer Work on syntactic variation is much rarer. For with intermediate proficiency in English, given a the most part, it is confirmatory rather than ex- sample of their English writing. (Two closely planatory; computational systems are designed to related approaches are Wong and Dras (2011) find examples of specific constructions in order and Wong et al. (2012).) Swanson and Charniak to support investigations driven by pre-existing (2014) show that the rules learned by their system hypotheses. Such systems do not suggest new can be interpreted as transferring features or con- hypotheses from the data. Stewart (2014) de- structions from their native language. In this work, tects African-American copula deletion and aux- we argue that TSG is also useful for detecting the iliary verb structures; Doyle (2014) investigates forms of change which occur in historical corpora. “needs done” and double modals. We know of one exploratory project using syntactic features: Jo- 4 Classical and Medieval Latin hannsen et al. (2015) use universal dependencies to extract “treelets” correlated with age and gen- Lind (1941) divides Latin roughly into Classi- der. Our TSG fragments are similar to their treelet cal (250 BCE to 100 CE), Late (100-600 CE), features, but have the potential to be larger and are Medieval (600-1300) and Neo-Latin (1300-1700). partly lexicalized. Though these divisions are heuristic, they do cor- respond to episodes of lexical and grammatical 3 Tree substitution grammars change. Medieval Latin was an educated language A Tree Substitution Grammar (TSG) generalizes used by clerics and scholars. It diverges from Context-Free Grammar (CFG) by allowing rules its Classical roots partly due to the influence of to insert arbitrarily large tree fragments (Cohn et the evolving Romance languages and of Church al., 2009). Each fragment has a root symbol (anal- texts (themselves often influenced by Hebrew and ogous to the left-hand-side category in a CFG) and Greek) (Lind, 1941; Lofstedt,¨ 1959). a frontier which can consist of terminals (words) Scholars debate the nature and origins of vari- and non-terminal symbols to be filled in later in the ability within Medieval Latin. Lofstedt¨ (1959, ch. derivation. An example tree fragment is shown in 3) surveys this research. For instance, an early figure 1; this fragment describes a particular com- theory that African Late Latin was syntactically plement clause structure which can be interpreted distinct was rejected on the grounds that the sup- as the construction “that X is Y”. posedly African constructions represented a dis- A single treebank tree may have multiple TSG tinct rhetorical style rather than a dialect. Simi- derivations (depending on how it is split up into lar questions have been raised about dialectal dif- constructional fragments), so TSGs must be in- ferences between France and Spain and the influ- duced from the data. The Data-oriented Parsing ences of Germanic languages on their local vari- (DOP) method (Bod and Kaplan, 1998) was crit- eties of Latin. icized by Johnson (2002) for its poor estimation A robust computational method could help to procedure. Newer methods select a set of frag- resolve controversies like these. In many cases, ments either using Bayesian models (Cohn et al., the dispute is centered around some construction 2009; Post and Gildea, 2009) or using so-called 1“He says Cicero is consul” and “That Caesar is a general, Double-DOP (Sangati and Zuidema, 2011), which we know”.

157 which is claimed to be a regional variant. For Author Text Sents. Date instance, Hanssen (1945) claims that mittere pro Classical (Perseus) may be a calque of English “send for”, a claim Cicero In Catalinam 327 63 which Lofstedt¨ (1959) rebuts by providing a va- BCE riety of examples from elsewhere. The construc- Sallust Bellum 701 c. 42 tions involved may be quite rare, and a special- Catalinae BCE ist in one region or period may be unaware that a Caesar de Bello 71 c. 57 construction of interest is attested elsewhere, es- Gallico BCE pecially in obscure texts. An automatic method Petronius Satyricon 1114 c. 54- for discovering cases which vary across regions or 68 CE periods could not only help to reject this type of Late (Perseus) spurious claim, but also find genuine examples of Jerome Vulgate 405 c. 380 regional variation which may not have been previ- (editor) Bible (Reve- CE ously noticed. lation) Medieval (Thomisticus) 5 Case study Thomas Summa 9859 c. Aquinas Contra 1250- To demonstrate the effectiveness of our method, Gentiles 70 we use it to construct a classifier which differen- tiates between single utterances of Classical and Table 1: Authors and texts used in the current Medieval Latin. The classifier features are a set of study; dates from (Shipley et al., 2008; Horn- TSG fragments. We induce the TSG from train- blower et al., 2012). ing data, then run a feature selection procedure to limit their number. We show that the learned clas- 6 Data and preprocessing sifier is fairly effective, and analyze two sets of its learned features by hand, connecting them to the Our case study uses two Latin treebanks, Perseus literature on known historical changes. (Bamman and Crane, 2011) and Index Thomisti- As a secondary question, we investigate the cus (Passarotti, 2011), each of which contains placement of the Vulgate Bible: is it more simi- dependency-parsed Latin prose (Bamman et al., lar to Classical or Medieval Latin? The Vulgate 2007). Table 1 provides a list of authors, dates is often seen as an intermediate between the two and sizes. Unfortunately, the Late and Medieval periods. Sidwell (1995, p.30) says that it: groups are represented by a single author each; this represents a weakness of this project, since it will be impossible to distinguish Medieval Latin “sanctified usages such as changes in the in general from the specific style of Aquinas. The use of cases and the subjunctive, and the data is also somewhat unbalanced, with Aquinas more frequent use of quod/quia clauses representing much more text than any other author. in reported speech. . . . It is linguistically These limitations are imposed by the system’s re- a central text.” quirement for parse trees, and the unavailability of other parsed Latin data. But while the Vulgate has a strong influence on Both source treebanks use non-projective de- 2 Medieval tradition, its compiler, St. Jerome, was pendency trees. To employ the TSG technique, classically educated; in a famous letter, he actu- we convert these to constituency trees. Our con- ally chastised himself for being “a Ciceronian, not version introduces a phrasal projection over ev- a Christian” (Wright, 1933) because of his pref- ery head word with children; following Klein and erence for classical prose over the “uncultivated” Manning (2004), we give this projection the same Biblical style. Running the classifier on sentences label as the head word’s part of speech. Non- from the Vulgate can reveal how the text balances projective edges are converted to projective ones these two affinities. by reordering the words so that the descendants of every head are contiguous. When a subtree is 2Our sample, the book of Revelation, was “slightly re- vised” (Sidwell, 1995) by Jerome from an older Latin trans- moved for this purpose, its tag is marked with a lation of the 2nd century CE (Hornblower et al., 2012). diacritic, so that the grammar can learn separate

158 ROOT UNK, although the part of speech tags remain as a guide to the grammatical form. Finally, we split our data into random 1 train/dev/test sections, with 10 of each era for de- 2 quem ad finem iactabit velopment, 10 testing and the rest training. Since what to end hurl.3.FUT we do not train or develop on the Vulgate, we set V.IND this data aside as a single set.

7 Learning and ranking constructions R V.IND

iactabit We extract a set of TSG rules using Double-DOP R N.ACC (Sangati and Zuidema, 2011). As stated above, ad P.ACCmov N.ACC this process yields the set of all maximal TSG fragments which occur in more than one tree- quem finem bank tree. It is usually more exhaustive than the Figure 2: Transformation of the non-projective Bayesian extractors (Cohn et al., 2009), although construction quem ad finem iactabit (“how far will it can be slow for large corpora. [your audacity] hurl [itself]”) from the Perseus As in Swanson and Charniak (2013), we then Treebank into a constituency structure. mov is the match each rule against each treebank sentence, “moved element” diacritic. deciding whether that rule can occur in any deriva- tion of the sentence. We assemble these decisions into a (rule sentence) binary co-occurrence ma- 3 × rules for non-projective constructions. See figure trix. To compute variants which change between 2 for an example. Classical and Medieval Latin, we sum across sen- The treebanks use multidimensional part of tences in the training set to compute the four-way speech tags, with slightly different tagging con- contingency table of counts: sentences with and ventions. We use only the top-level part of speech without the rule in each era. We compute the χ2 tag for most words, converting the Thomisticus statistic for each table and use this to rank the tags deterministically into the Perseus tags. We rules for feature selection. Swanson and Charniak use the remaining dimensions to annotate nomi- (2013) recommend χ2 because it tends to retain nals with their case and verbs with their mood (in- moderately rare rules with good predictive power, dicative, subjunctive, imperative or infinitive). rather than focusing on generally-applicable rules Finally, again following Swanson and Charniak with weak predictions (as is the case for Informa- (2013), we selectively delexicalize the trees. This tion Gain). prevents the “syntactic” patterns our system learns We select rules for which the χ2 probability is as markers of variation from being dominated by less than .00001 (tuned on development experi- lexical items marking different topics (Sarawgi ments; the corresponding χ2 statistic is about 19). et al., 2011). For instance, Aquinas frequently In our dataset, this method selects 357 TSG frag- uses the adjective Christiana “Christian”, while ments. We use the Megam (Daume´ III, 2004) the Classical authors do not. But this is a change maximum entropy classifier to learn a predictor for in culture, rather than in language. the era (Classical or Medieval) of a sentence given We remove all lexical items except prepositions the binary feature vector indicating presence or ab- (POS tag R), conjunctions (C), and a short list of sence of these 357 fragments. Results are shown adverbials (D), ne, non, tam, tamen, ita, etiam in table 2. The classifier overpredicts the major- (“lest, not, so, however, thus, besides”). We re- ity class (Medieval) but still achieves 77% accu- place all forms of the verb esse (“to be”), which is racy on the minority class, indicating that its fea- often used as an auxiliary, with the Perseus Tree- tures are reasonably informative about language bank lemmatized form sum1. For example, the change. phrase ad quem finem is delexicalized to ad UNK 7.1 Analysis 3Unlike the construction of Nivre and Nilsson (2005), this tag is intended to describe non-projective constructions but We will discuss two interpretable patterns discov- does not give enough information to parse them. ered by our system: a known historical change in

159 Era N Correct Acc Fragment χ2 hits Classical 442 341 77% More Classical Medieval 1972 1931 98% (V.INF N.ACC V.INF) 46 69 Majority 2414 1972 82% (C (C cum) V.SBJV) 299 68 Overall 2414 2268 94% (C (C cum) V.IND 102 24 More Medieval Table 2: Classifier accuracy on test data. (C igitur) 353 1575 (C (C autem) V.IND) 351 1475 the use of complement clauses, and a stable but (C (C quod) V.IND) 161 990 hard-to-interpret pattern in adjective/noun order- (C (C quod) V.SBJV) 150 738 ing. Finally, we discuss our failure to detect the decline of a parenthetical construction called the Table 3: Hand-selected features related to changes absolute. In each case, we have manually in complement clauses. grouped together TSG fragments selected by the system and imposed an interpretation on them by The system’s failure to extract this construction doing additional linguistic analysis. with high confidence stems from an inability to Classical Latin verbs like dicere “to say” typi- generalize over the contents of the subclause. Due cally take nonfinite complement clauses (Pinkster, to the flat tree structure, a subclause with a tem- 1990). In Medieval Latin, these verbs more com- poral modifier, for example: dico te priore nocte monly take finite complement clauses, often with venisse “I say that you came last night” cannot the complementizer quod “that” (Sidwell, 1995, be unified with a subclause without. This leads p.368). The two sentences below (the first from to data fragmentation, and therefore to a low fre- Cicero, the second from Aquinas) exemplify these quency for the construction, which reduces the different structures: system’s confidence in associating it with the Clas- (1) Lepidum te habitare sical period. Lepidus.ACC you.ACC live.with.INF A second interpretable set of TSG fragments velle dixisti governs adjective ordering. It consists of all the want.INF say.2.PFV rules (N.case N.case Adj.case) (a nominal, headed “You said you want to live with Lepidus.” by a noun with a postnominal adjective) and (N.case Adj.case N.case) (prenominal adjective). (2) Dicitur quod sapientia Table 4 shows the statistics. With the exception of say.3.PASS COMP wisdom.NOM the ablative case, the Classical data slightly prefers infinitus thesaurus est postnominal adjectives, while the Medieval data infinite treasury be.3 strongly prefers prenominals. Ledgeway (2012) “It is said that wisdom is an infinite trea- states that Classical Latin used postnominal ad- sury.” jectives in unmarked contexts, with prenominals Table 3 shows a collection of tree fragments re- serving some semantic and pragmatic functions. lated to this change, along with their χ2 statistic This preference is claimed to be stable throughout values. The system clearly identifies the Medieval the Middle Ages, leading to modern Romance lan- complementizer quod “that” with both indicative guages with mainly postnominal adjectives. Our and subjunctive clauses, along with autem “how- Medieval corpus data does not follow this pat- ever” and igitur “therefore”. Although these do tern, since prenominals are more typical. But occur in Classical prose, the high values of the whether this reflects an actual localized or tempo- χ2 statistic show that they are clearly much more rary change, or Aquinas’s personal style, cannot widely used in Medieval Latin. The system also be determined without further investigation. identifies the decline of the Classical complemen- The ablative absolute is an adverbial modifier tizer cum (“when” with indicatives, “since” with that is frequently used to denote a time, or the subjunctives). The low χ2 value (46) for the in- cause of an action, and often takes the place of a finitival complement clause, however, must be ac- subordinate clause. However, the ablative absolute counted as a partial failure of the system; this frag- is not grammatically dependent on any word in its ment appears low in the selected list of features. sentence (Allen and Greenough, 1983, p. 263).

160 Case % postnominals χ2 tive absolute, but which is actually a prepositional Classical Medieval phrase: Nom 53 26 135 Gen 56 25 115 (4) Quem in rebus cognoscendis Dat 65 8 90 That in things.ABL known.PART Acc 57 34 413 quotidie experimur Abl 35 36 228 daily experience.1.PL “That we experience daily in the knowing Table 4: Percent of postnominal adjectives in of things” noun-adjective phrases, and χ2 value for the post- nominal rule. The Classical data contains more Thus, we miss this historical change because the postnominals, while the Medieval data contains ablative absolute is quite varied in form, and be- more prenominals. cause our representation fails to distinguish it from similar constructions.

It normally consists of a noun and a passive par- 7.2 Late Latin: The Vulgate ticiple, (although another noun or an adjective can We run the Classical/Medieval classifier on the replace the ): Vulgate, with results shown in table 5. Despite (3) Omni pacata Gallia the classifier’s overall bias towards the Medieval All.ABL pacified.PAST.PART Gaul.ABL class, we find that the Vulgate is generally more ad eos exercitus noster Classical. However, the proportion of sentences against them army.NOM our.NOM labeled in this way (64%) is not comparable to the adduceretur 77% of Classical sentences labeled as Classical, lead.3.SBJV indicating that the Vulgate is indeed intermediate between the two eras. “With all of Gaul having been pacified, To determine which features most typify the our army would be led against them” Classical and Medieval components of the Vul- There is current speculation that the ablative ab- gate, we compute the summed contribution of each solute descends from either an instrumental or a feature to the entire set of decisions. If a feature fi locative origin (Allen and Greenough, 1983). Ra- has weight θi, we compute its importance M(i) mat (1991) argues that it developed from a very over a set of examples x: colloquial style of speech, as a way to compensate M(i) = f θ (1) for a lack of “complementizers, auxiliaries, and | i i| x determiners” (p. 261). Furthermore, Ramat argues X that the construction is “more pragmatic than syn- The top 5 features for each class are shown tactic”, and thus declined as Medieval Latin be- in table 6. Several features represent changes in came more formal and syntactically rigid. adjective ordering (discussed above) and the use The system finds several rules for ablative of complementizers or clause-initial markers. A noun/participle phrases, but none with a χ2 value few, such as the occurrence of conjunctions and above 40. We detect 56 uses in the Classics and adverbs, do not represent real historical changes 65 in the Medieval corpus. This construction is and are presumably markers of specific topics or hand-annotated in the treebanks, however, so we styles. The importance attached to the preposi- can check our accuracy. In fact, the Classical cor- tion in may reflect either a stylistic difference, or pus contains 105 ablative absolutes, while the Me- the Medieval tendency to use a preposition where dieval corpus has none. Our system underdetects Classical Latin uses the bare ablative case (Sid- the Classical cases due to modifiers and reorder- well, 1995, p. 367). We believe these results show ings, as discussed above. It overdetects the Me- that the system can aid a linguist in finding lan- dieval ones; Medieval constructions that appear to guage change, but that the output still needs to be be ablative absolutes often contain rather analyzed and interpreted by hand. than passive , an issue hidden by delexi- With the exception of cum, the clausal features calization and the use of coarse tags. Additionally, discussed above have little impact on the classifi- Thomas favors a construction similar to the abla- cation of Vulgate sentences. To determine whether

161 N% pears to be a transitional form, in which the first Total 405 100 quod is not a complementizer, but a demonstra- Labeled Classical 258 64 tive introducing a direct quote (“you say this:I Labeled Medieval 147 36 . . . ”). This is evident from the following first- person verb, where an indirect quote ought to be Table 5: Classifier results on the Late Latin Vul- in second person. The use of quod here echoes the gate. Greek text and is an instance of the well-known Classical M(i) Medieval M(i) influence of the Greek Bible on Christian Latin Postnominal 868 Genitive pro- 757 (Lofstedt,¨ 1959, ch. 6). adj. (abl) nouns Any conjunc- 768 Preposition in 713 tion “in” 8 Discussion Preposition su- 725 Clause-initial et 601 per “on” “and” We find that TSGs are effective at identifying sev- Postnominal 631 Any adverb 559 adj. (acc) eral historical changes in a modestly-sized corpus Conjunction 600 Postnominal 507 of Latin text. This extends the results of earlier cum “when” adjective in PP papers which use TSGs to identify the writing of Table 6: Features important in the classification of non-native English users. Here, the same features Vulgate sentences, ranked by importance M(i). are applied to changes across time; we anticipate that similar results could be obtained in synchronic analysis of different dialects. this represents a failure to generalize, or genuine The approach does have significant limitations, ambiguity, we search the Vulgate Book of Revela- however. Firstly, the dependence on treebank tions by hand for verbs with clausal complements; parses limits the set of texts to which the method these are not particularly frequent, accounting for can be applied. Parsing historical data may require their small importance weights. However, both 4 specialized techniques (Pettersson et al., 2013) types of complements appear: and fits within a larger set of cross-domain pars- (5) his, qui se dicunt ing problems which are notoriously difficult (Mc- those.ABL who REFL.ACC say.3.PL Closky et al., 2010). In particular, we suspect that Judaeos esse, et non sunt the most difficult constructions will be precisely Jews.ACC be.INF and not be.3.PL the ones which are novel in a particular era or re- “those who say that they are Jews and are gion, since these may not appear in the training not” data. Parsers for Latin of any kind are rare, al- though working systems (McGillivray, 2013; Pas- sarotti and Dell’Orletta, 2010) do exist. (6) quia dicis quod dives sum . . . et Secondly, as seen above, the system has trouble because say.2 DEM rich be.1 . . . and unifying different examples of large constructions, nescis quia tu es miser such as clauses with and without modifiers. This not.know.2 COMP you be.2 poor prevents it from learning constructions larger than “For you say, “I am rich, . . . ” You do not one or two context-free rules due to data sparsity. realize that you are wretched” More expressive versions of TSG like Tree Ad- joining Grammar (Joshi and Schabes, 1997) have (7) diabolus . . . sciens quod modicum been studied as solutions to this problem, includ- devil . . . knowing COMP short ing variants reducible to TSG (Swanson et al., tempus habet 2013). It seems likely that such a more sophis- time has.3 ticated grammatical representation could help to “the devil [has come down to you with address this problem. great wrath], because he knows that his Although delexicalization of all content words time is short” was effective in controlling for the very different topics represented in our corpus, it also renders Example 5 shows the Classical infinitive clause the system incapable of recognizing any lexically and 7 the Medieval quod-clause. Example 6 ap- mediated changes. For instance, the system can- 4Translations from the New Revised Standard Edition. not represent changes in the argument structure

162 or subcategorization of a particular verb. Lofstedt¨ David Bamman, Jacob Eisenstein, and Tyler Schnoe- (1959) lists changes such as datives with verbs of belen. 2014. Gender identity and lexical varia- asking. Detecting this kind of change would re- tion in social media. Journal of Sociolinguistics, 18(2):135–160. quire relexicalizing the trees, and therefore devel- oping more sensitive statistical controls for topic. Max Bane, Peter Graff, and Morgan Sonderegger. Due to the rarity of any individual word in a small 2010. Longitudinal phonetic variation in a closed system. Proc. CLS, 46:43–58. corpus, however, a solution to this problem would be far less useful without methods for solving the Rens Bod and Ronald Kaplan. 1998. A probabilistic previous ones as well. Only with a large automat- corpus-driven model for lexical-functional analysis. In Proceedings of the 17th international conference ically parsed corpus and a method for reducing on Computational linguistics-Volume 1, pages 145– fragmentations could enough examples of a lexi- 151. Association for Computational Linguistics. cally specific construction be gathered for any but Trevor Cohn, Sharon Goldwater, and Phil Blun- the most common words. som. 2009. Inducing compact but accurate tree- Finally, the system cannot represent any substitution grammars. In Proceedings of Human changes involving semantic shifts. For instance, Language Technologies: The 2009 Annual Confer- (Sidwell, 1995, p. 364) describes shifts in the ence of the North American Chapter of the Associa- tense system, including the use of pluperfect tion for Computational Linguistics, pages 548–556. Association for Computational Linguistics. where perfect would be expected. Such changes cannot be detected from trees alone. Discovering Hal Daume´ III. 2004. Notes on CG and LM-BFGS them requires an ability to interpret the text and optimization of logistic regression. Paper avail- able at http://pub.hal3.name#daume04cg-bfgs, im- infer the implied time at which actions take place. plementation available at http://hal3.name/megam/, August. 9 Conclusion Gabriel Doyle. 2014. Mapping dialectal variation by Despite these limitations, we believe TSGs offer querying social media. In EACL, pages 98–106. a useful exploratory tool for discovering syntactic Jacob Eisenstein. 2015. Written dialect variation in variation in corpora. Such a tool can allow histor- online social media. In Charles Boberg, John Ner- ical linguists to learn about possible grammatical bonne, and Dom Watt, editors, Handbook of Dialec- changes in dead languages for which they have no tology. Wiley. native intuition, broadening the kinds of questions Joshua Goodman. 1996. Efficient algorithms for pars- they might investigate. This would parallel the re- ing the DOP model. In Proceedings of EMNLP. cent use of computational systems to learn about Jens T. Hanssen. 1945. Observations on Theodor- lexical variation, allowing similar insights about icus Monachus and his history of the old Norwe- the nature and history of syntactic change. gian kings, from the end of the XII. sec. Symbolae Osloenses, 24. Acknowledgments Simon Hornblower, Antony Spawforth, and Esther Ei- Ben Swanson, Brian Joseph, Alex Erdmann, Julia dinow. 2012. The Oxford Classical Dictionary. Ox- Papke and our reviewers, gratias vobis agimus. ford University Press. Anders Johannsen, Dirk Hovy, and Anders Søgaard. 2015. Cross-lingual syntactic variation over age and References gender. In Proceedings of CoNLL. J.H. Allen and J.B. Greenough. 1983. Allen and Gree- Mark Johnson. 2002. The DOP estimation method is nough’s New . Caratzas Publishing biased and inconsistent. Computational Linguistics, Co., Inc., New Rochelle, New York. 28(1):71–76. David Bamman and Gregory Crane. 2011. The an- Aravind K Joshi and Yves Schabes. 1997. Tree- cient Greek and Latin dependency treebanks. In adjoining grammars. In Handbook of formal lan- Language Technology for Cultural Heritage, pages guages, pages 69–123. Springer. 79–98. Springer. Dan Klein and Christopher Manning. 2004. Corpus- David Bamman, Marco Passarotti, Gregory Crane, and based induction of syntactic structure: Models of Savina Raynaud. 2007. A collaborative model of dependency and constituency. In Proceedings of the treebank development. In Proceedings of the Sixth 42nd Meeting of the Association for Computational International Workshop on Treebanks and Linguistic Linguistics (ACL’04), Main Volume, pages 478–485, Theories, pages 1–6. Barcelona, Spain, July.

163 Adam Ledgeway. 2012. From Latin to Romance: Mor- Federico Sangati and Willem Zuidema. 2011. Ac- phosyntactic typology and change. Oxford Univer- curate parsing with compact tree-substitution gram- sity Press. mars: Double-DOP. In Proceedings of the Con- ference on Empirical Methods in Natural Language L.R. Lind. 1941. Medieval Latin studies: Their nature Processing, pages 84–95. Association for Computa- and possibilities. University of Kansas Publications. tional Linguistics.

Einar Lofstedt.¨ 1959. Late Latin. Instituttet for Sam- Ruchita Sarawgi, Kailash Gajulapalli, and Yejin Choi. menlignende Kulturforsking. 2011. Gender attribution: tracing stylometric evi- dence beyond topic and genre. In Proceedings of David McClosky, Eugene Charniak, and Mark John- the Fifteenth Conference on Computational Natural son. 2010. Automatic domain adaptation for pars- Language Learning, pages 78–86. Association for ing. In Human Language Technologies: The 2010 Computational Linguistics. Annual Conference of the North American Chap- Graham Shipley, John Vandespoel, David Mattingly, ter of the Association for Computational Linguistics, and Lin Foxhall. 2008. The Cambridge Dictio- pages 28–36, Los Angeles, California, June. Associ- nary of Classical Civilization. Cambridge Univer- ation for Computational Linguistics. sity Press. Barbara McGillivray. 2013. Methods in Latin Compu- Keith Sidwell. 1995. Reading Medieval Latin. Uni- tational Linguistics. Brill. versity of Cambridge.

Joakim Nivre and Jens Nilsson. 2005. Pseudo- Ian Stewart. 2014. Now we stronger than ever: projective dependency parsing. In Proceedings of African-american syntax in Twitter. EACL 2014, the 43rd Annual Meeting on Association for Compu- page 31. tational Linguistics, pages 99–106. Association for Computational Linguistics. Ben Swanson and Eugene Charniak. 2012. Native language detection with tree substitution grammars. Marco Passarotti and Felice Dell’Orletta. 2010. Im- In Proceedings of the 50th Annual Meeting of the provements in parsing the Index Thomisticus tree- Association for Computational Linguistics: Short bank. revision, combination and a feature model for Papers-Volume 2, pages 193–197. Association for medieval Latin. In Proceedings of LREC. Computational Linguistics. Ben Swanson and Eugene Charniak. 2013. Extract- Marco Carlo Passarotti. 2011. Language resources. ing the native language signal for second language the state of the art of Latin and the Index Thomisti- acquisition. In HLT-NAACL, pages 85–94. cus treebank project. In Corpus anciens et Bases de donnes, ALIENTO. changes sapientiels en Mditer- Ben Swanson and Eugene Charniak. 2014. Data rane, pages 301–320. ALIENTO. driven language transfer hypotheses. In EACL, pages 169–173. Eva Pettersson, Beata´ Megyesi, and Jorg¨ Tiedemann. 2013. An SMT approach to automatic annotation Ben Swanson, Elif Yamangil, Eugene Charniak, and of historical text. In Proceedings of the Workshop Stuart M Shieber. 2013. A context free TAG variant. on Computational Historical Linguistics at NODAL- In ACL (1), pages 302–310. IDA 2013, NEALT Proceedings Series, volume 18, pages 54–69. Sze-Meng Jojo Wong and Mark Dras. 2011. Exploit- ing parse structures for native language identifica- Harm Pinkster. 1990. Latin Syntax and Semantics. tion. In Proceedings of the Conference on Empiri- Routledge. cal Methods in Natural Language Processing, pages 1600–1610. Association for Computational Linguis- Matt Post and Daniel Gildea. 2009. Bayesian learning tics. of a tree substitution grammar. In Proceedings of the Sze-Meng Jojo Wong, Mark Dras, and Mark Johnson. ACL-IJCNLP 2009 Conference Short Papers, pages 2012. Exploring adaptor grammars for native lan- 45–48. Association for Computational Linguistics. guage identification. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natu- Paolo Ramat. 1991. On Latin absolute constructions. ral Language Processing and Computational Natu- Linguistic Studies on Latin: Selected Papers from ral Language Learning, pages 699–709. Association the 6th International Colloquium on Latin Linguis- for Computational Linguistics. tics, pages 259–268. F.A. Wright. 1933. Letter to Eustochium: Select letters Delip Rao, David Yarowsky, Abhishek Shreevats, and of St. Jerome. Harvard University Press. Manaswi Gupta. 2010. Classifying latent user at- tributes in Twitter. In Proceedings of the 2nd in- ternational workshop on Search and mining user- generated contents, pages 37–44. ACM.

164