Strong Generative Capacity of Morphological Processes

Hossep Dolatian, Jonathan Rawski, and Jeffrey Heinz
Department of Linguistics & Institute for Advanced Computational Science
Stony Brook University
{hossep.dolatian,jeffrey.heinz,jonathan.rawski}@stonybrook.edu

Proceedings of the Society for Computation in Linguistics (SCiL) 2021, pages 228-243. Held on-line February 14-19, 2021.

Recommended Citation: Dolatian, Hossep; Rawski, Jonathan; and Heinz, Jeffrey (2021) "Strong Generative Capacity of Morphological Processes," Proceedings of the Society for Computation in Linguistics: Vol. 4, Article 22. DOI: https://doi.org/10.7275/sckf-8f46. Available at: https://scholarworks.umass.edu/scil/vol4/iss1/22

Abstract

Morphological processes are generally computable with 1-way finite-state transducers. However, we show that 1-way transducers do not capture the strong generative capacity of certain morphological analyses for more complex processes, including mobile affixation, infixation, and partial reduplication. As diagnostics for strong generative capacity, we use origin semantics and order-preservation. These analyze the input-output correspondences generated by finite-state transducers and their corresponding logical transductions. For some linguistic analyses of these complex processes, their strong generative capacity is matched by more expressive grammars, such as non-order-preserving transductions and their corresponding 2-way finite-state transducers.

1 Introduction

A central goal of computational morphology is to define the minimally sufficient and restrictive classes of grammars which can compute attested morphological processes. Virtually all of morphology is sufficiently computable with 1-way finite-state transducers (FSTs) (Roark and Sproat, 2007). Furthermore, most of morphology can be computed with restricted subclasses of these finite-state grammars (Chandlee, 2017). Thus, 1-way FSTs are adequate in weak generative capacity (WGC).

This paper examines the strong generative capacity (SGC) of 1-way FSTs when computing morphological functions. For a given theory, we find a divergence between the WGC and SGC of different morphological processes, including infixation, mobile affixation, and partial reduplication. There is a longstanding controversy around defining adequate diagnostics for the SGC of linguistic structures (Manaster-Ramer, 1987a; Miller, 1991, 1999). For our purposes, we use two diagnostics which are well-defined in theoretical computer science but have not previously been applied to computational morphology: origin semantics (Bojańczyk, 2014) and order-preservation (Filiot, 2015). They provide a unique lens for examining the input-output correspondences created by different classes of finite-state grammars and their corresponding logical transductions (Engelfriet and Hoogeboom, 2001). We use these diagnostics to show that simple affixation is definable with 1-way FSTs in terms of both WGC and SGC. However, depending on the specific morphological theory, these diagnostics indicate that 1-way FSTs do not match the SGC of more complex processes. Instead, some morphological analyses are more faithfully computed with more expressive non-order-preserving transductions, which are themselves computed by 2-way FSTs. These results do not argue against the practicality or efficiency of 1-way FSTs. Instead, they are scientific results about the computational and mathematical properties of morphology.

This paper is organized as follows. We review mathematical results on generative capacity in linguistics in §2. In §3, we define origin semantics and order-preservation as diagnostics for SGC. We use these diagnostics in §4 to show how 1-way FSTs capture the SGC of simple affixation. In §5, we show how 1-way FSTs do not capture the theory-dependent SGC of other morphological processes, while 2-way FSTs do. We conclude in §6. Appendix §A provides some illustrative 2-way FSTs which do capture the SGC of these analyses.

2 Weak vs. strong generative capacity

Given a grammar, its WGC defines the set of forms which it can generate, usually stringsets. In contrast, its SGC defines the type of hidden structure that it posits during the derivation. It is generally harder to determine the SGC of a grammar than its WGC. Informally, there are two issues:

1. Fundamental issues in SGC
   (a) Grounding: basis for interpretations
   (b) Diagnostic: formal tools for evaluations

The grounding for SGC is the external basis assumed when evaluating grammars. For syntax, the external basis for evaluating SGC is semantic interpretation and constituency, i.e., whether a grammar's phrase structure tree is similar to the semantic interpretation. The diagnostic for SGC is simply the set of formal tools used to determine 'similarity'. The simplest diagnostic is to require, for example, that the tree and semantics are identical. More elaborate diagnostics utilize nuanced interpretations and deductions from tree geometry (Miller, 1999).

In syntax, WGC and SGC often converge. Most context-free (CF) phenomena are CF in both WGC and SGC (Chomsky, 1956; Pullum and Gazdar, 1982; Gazdar and Pullum, 1985), while most non-CF phenomena are non-CF in both WGC and SGC (Culy, 1985; Radzinski, 1991; Stabler, 2004; Kobele, 2006; Clark and Yoshinaka, 2014). But WGC and SGC can diverge when the overt syntax is CF but the associated semantics is non-CF (Radzinski, 1990). For example, both Dutch and Swiss German have cross-serial clause constructions, which contain a sequence of noun phrases followed by a sequence of verbs that subcategorize for these nouns: N₁N₂N₃V₁V₂V₃. In terms of their semantics, such constructions are non-CF in both languages (Bresnan et al., 1982; Shieber, 1985), and thus non-CF in SGC. But in Dutch, these sequences are CF in terms of WGC because there is no overt morphological marking of subcategorization between verbs and nouns. In contrast, Swiss German nouns show different case marking depending on the verbs which subcategorize for them. Thus, these constructions are non-CF in both WGC and SGC in Swiss German, but only in SGC in Dutch.¹

In morphology and phonology, there are fewer debates on generative capacity. We speculate that this is due to two issues. First, morphology and phonology have comparatively restrictive WGC. Second, it is unclear what external basis (grounding) should be used for SGC, and thus what diagnostics or metrics to use.

In terms of WGC, virtually all attested morphological and phonological processes are sufficiently characterized by the class of Regular languages and functions (Johnson, 1972; Koskenniemi, 1983; Sproat, 1992; Ritchie, 1992; Kaplan and Kay, 1994; Beesley and Karttunen, 2003; Roark and Sproat, 2007). In fact, most of these processes only require less expressive subclasses of subregular languages and rational functions (Rogers and Pullum, 2011; Rogers et al., 2013; Heinz and Idsardi, 2013; Chandlee, 2014, 2017; Aksënova et al., 2016; Chandlee and Heinz, 2018; Chandlee et al., 2018; Heinz, 2018). The exception is total reduplication, which is not definable with finite-state automata (FSAs) (Culy, 1985) or 1-way FSTs (Chandlee, 2017). Furthermore, many theories of phonology have been computationally proven to be notationally equivalent and thus equivalent in WGC. This includes theories of phonotactics (Graf, 2010a,b), vowel harmony (Andersson et al., 2020), syllabification (Strother-Garcia, 2019), and tone (Danis and Jardine, 2019; Jardine et al., 2020; Oakden, 2020).² For morphology, many theories are likewise finite-state definable and thus equivalent in WGC (Karttunen, 2003; Roark and Sproat, 2007; Ermolaeva and Edmiston, 2018).

There are few debates on the SGC of phonology and morphology. For phonology, the proper grounding for SGC is unclear. For morphology, the grounding of SGC is often treated as the semantic constituency of words. Due to prefix-suffix dependencies, the semantic constituency of words (SGC) is context-free (Langendoen, 1981; Selkirk, 1982; Carden, 1983; Oseki, 2018; Oseki et al., 2019; Oseki and Marantz, 2020), but in practice, the morphotactics of words (WGC) are regular (Hammond, 1993; Bjorkman and Dunbar, 2016; Aksënova and De Santo, 2019).³ Furthermore, although partial

¹ A more elaborate example is total reduplication (copying),

² There is debate on the WGC of constraint-interaction grammars like Optimality Theory (Prince and Smolensky, 2004). There are many finite-state approximations (Eisner, 1997, 2000a,b; Karttunen, 1998; Frank and Satta, 1998; Riggle, 2004; Gerdemann and Van Noord, 2000; Gerdemann and Hulden, 2012) of varying computational tractability (Idsardi, 2006; Heinz et al., 2009). But in principle, constraint interaction can express non-regular functions (La-