Analytic Morphology – Merging the Paradigmatic and Syntagmatic Perspective in a Treebank

Vladimír Petkevič, Alexandr Rosen, Hana Skoumalová, Přemysl Vítovec
Charles University in Prague, Faculty of Arts

Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing, pages 9–16, Hissar, Bulgaria, 10–11 September 2015.

Abstract

We present an account of analytic verb forms in a treebank of Czech texts. According to the Czech linguistic tradition, the description of periphrastic constructions is a task for morphology. On the other hand, their components cannot be analyzed separately from syntax. We show how the paradigmatic and syntagmatic views can be represented within a single framework.

1 Introduction

Analytic verb forms (henceforth AVFs) consist of one or more auxiliaries and a content verb. The auxiliaries can be seen either as marking the content verb with morphological categories or as being part of a multi-word expression, to which the categories are assigned. This is the perspective taken by all standard grammar books of Czech, which treat AVFs as a morphological rather than a syntactic phenomenon. AVFs are listed in conjugation paradigms quite like synthetic forms for a good reason: from a meaning-based view, whether a certain category in a certain language happens to be expressed by a single word or a string of words is an epiphenomenon.

From a different perspective, each of the components has its role in satisfying a syntactic grammaticality constraint and in making a contribution to the lexical, grammatical or semantic meaning of the whole. This approach is common in both corpus and generative linguistics (including theories such as LFG or HPSG),¹ where each form is treated as a syntactic word and AVFs belong to the domain of syntax. As a result, morphological categories are not assigned to units spanning word boundaries. This is for several reasons: (i) an AVF does not emerge as a single orthographical word (often not even as a single phonological word); (ii) AVFs may be expressed by a potentially discontinuous string of a content verb and multiple auxiliaries, sometimes in an order determined by information structure rather than by rules of morphology or syntax proper; (iii) some auxiliary forms share properties with some content words: like weak pronouns, the auxiliary is a 2nd position clitic.

¹ See, e.g., Webelhuth (1995), Dalrymple (1999), Pollard and Sag (1994).

Our claim is that the two views are compatible, complementary and amenable to formalization within a single framework, combining the traditional paradigmatic view with a syntagmatic view. This reconciliatory effort is part of a more general goal: a choice of different interpretations of annotated corpus data, depending on the preferences of a user or an application.

AVFs are assigned a syntactic structure: the (finite) auxiliary is treated as the surface head, governing the rest of the form, which is the deep head.²

² We use the term government in the sense of “subcategorization” or “imposition of valency requirements.”

In Czech, AVFs are used to express the verbal categories of mood, tense and voice: in periphrastic passive (all moods and tenses), in periphrastic future, in 1st and 2nd person past tense, in pluperfect and in present and past conditional. In all these forms the auxiliary is být ‘to be’. Here we focus on past tense and conditional forms, including pluperfect and past conditional, but the solution works for all the above AVFs. It also covers negation of some components of the AVFs by the prefix ne- and can be extended to some other kinds of function words, such as prepositions and conjunctions. In (1)–(4) below we show some properties of the past and conditional forms. The finite auxiliary is marked for person, number and mood, while the l-participle³ is marked for gender and number. Past tense (1) consists of the auxiliary in the present tense and the l-participle of the content verb. Present conditional (2) consists of the conditional auxiliary and the content verb’s l-participle. Past conditional (3) includes an additional l-participle of the auxiliary. In (4) we show that other words can be inserted, the auxiliary l-participle can be repeated, and any l-participle can be negated.

³ We avoid the frequently used term past participle because the same form is also used in present conditional.

(1) Já jsem přišel
    I be.PRS.1SG come.PTCP.M.SG
    ‘I have come.’

(2) Já bych přišel
    I be.COND.1SG come.PTCP.M.SG
    ‘I would come.’

(3) Já bych byl přišel
    I be.COND.1SG be.PTCP.M.SG come.PTCP.M.SG
    ‘I would have come.’

(4) Kdybys tenkrát nebyl býval tak duchapřítomně zasáhl...
    If-be.COND.2SG back then be.PTCP.M.SG.NEG be.PTCP.M.SG.ITER so readily intervene.PTCP.M.SG
    ‘If you hadn’t intervened so readily back then...’

We exemplify the solution using a treebank of Czech. The framework is based on HPSG.⁴ The annotation, originally produced by a stochastic dependency parser, is checked by a formal grammar, using a valency lexicon and implemented in Trale.⁵ Trees complying with grammatical and lexical constraints are augmented with information derived from the lexicon and any annotation provided by a stochastic parser.

⁴ See, e.g., Pollard and Sag (1994).
⁵ See http://www.sfs.uni-tuebingen.de/hpsg/archive/projects/trale/. For more details see Jelínek et al. (2014).

2 Previous Work

Grammars of Czech take a paradigmatic perspective, treating AVFs as an exclusively morphological phenomenon (Karlík et al., 1995; Cvrček et al., 2010; Komárek et al., 1986) and glossing over their syntagmatic and word-order properties. In Komárek et al. (1986), components of AVFs are assigned a particular grammatical meaning (person, number, tense, mood, voice) but their syntactic status is not specified.

The syntagmatic approach has been introduced to Czech by Veselovská (2003) and Veselovská and Karlík (2004), who analyze past tense and periphrastic passive within the Minimalist Program. A non-transformational account was pursued by Karel Oliva in an HPSG-inspired prototype grammar checker of Czech (Avgustinova et al., 1995). HPSG and LFG have been used to account for similar phenomena in closely related Polish, where the border between morphology and syntax is even less apparent than in Czech: all forms of the past tense and conditional auxiliaries are floating suffixes, attached either directly to the l-participle, or to some other preceding word. In the following, we briefly review several proposals for Polish, with an extension to Czech.

Based on the analysis of similar phenomena in West European languages, Borsley (1999) proposes two structures for modelling Polish AVFs: (i) classic VP complementation where the auxiliary is a subject-raising verb selecting a phrasal complement headed by an l-participle (Fig. 1), and (ii) flat structures where the auxiliary subcategorizes for an l-participle and its complements (Fig. 2).⁶ The former is used for future tense, the latter for present conditional and past tense. This distinction is motivated by the ability or inability of the auxiliary to be preceded by the associated l-participle and its complements: while the future auxiliary allows for VP-preposing, the other auxiliaries do not.

⁶ Heads are denoted by boxed nodes in the figures.

[Figure 1: VP complementation. Tree: VP dominating Aux and VP.]

[Figure 2: Flat structure. Tree: VP dominating Aux, V, C, C, ...]

Kupść (2000) follows Borsley (1999) but rejects the flat structure for past tense and present conditional as it makes incorrect predictions with respect to clitic climbing. Instead, she assumes VP complementation for all AVFs.

Kupść and Tseng (2005) argue against the unified treatment of AVFs. Only the future tense auxiliary behaves like a full syntactic word. In contrast, the forms of the conditional auxiliary, albeit syntactic words, are clitics and thus subject to specific constraints (dependencies on various clitic hosts). Past tense is viewed as a simple tense and the past tense agreement markings are treated as inflectional elements, even if they are not attached to the l-participle. This analysis builds on (i) an observation that agreement markings are much more closely bound to the preceding word than conditional clitics, and (ii) the fact that there are no agreement markings used in the third person. As a result, the l-participle becomes the head of the whole structure. In order to ensure that the agreement marking appears somewhere in the structure, the head acts as its trigger, carried by an agreement marker, either the head itself (Fig. 3), or some other preceding element (Fig. 4).

[Figure 3: Local marker. Tree: VP dominating VP+MRK and XP.]

[Figure 4: Nonlocal agreement marker. Tree: VP dominating XP+MRK and VP.]

In light of diachronic and comparative considerations, Tseng and Kupść (2006) and Tseng (2009) extend the analysis of Kupść and Tseng (2005) to other Slavic languages, including Czech. The Czech past auxiliary forms are at the same time syntactic words and clitics with a restricted distribution (2nd position, cannot be negated). Moreover, the 2nd person singular clitic -s is similar to the Polish floating suffixes, suggesting that the head is the l-participle. As a result, the analysis of the Polish past tense can be applied to Czech with only a slight modification: the agreement markings are carried (mostly) by syntactic rather than morphological elements. No changes are proposed for the analysis of the Czech conditional either, where the only complication is the separable ending -s in the 2nd person singular (bys). However, the extremely restricted distribution of this phenomenon (only in combination with the si and se reflexives, resulting in the sis and ses forms) does not motivate treating Czech conditional structures like Polish past tense structures. The authors admit that the analysis based on the standard VP complementation is equally possible.

3 Our Approach

3.1 Two Types of Heads: Surface and Deep

In addition to strictly linguistic criteria for an optimal analysis of AVFs, our choice of the core representation format was influenced by the treebank design, which should allow for the derivation of syntactic structure and categorial labels of various shapes and flavours to be used in queries, responses and exported data. Adopting a uniform analysis for all AVFs simplifies the task. Each AVF is represented as a syntactic phrase with two constituents: a surface head daughter representing the auxiliary, and a deep head daughter representing the auxiliary’s VP complement, which includes the content verb.⁷ Multiple auxiliaries within a single AVF are surface heads within recursively embedded deep heads (see Fig. 5).⁸

⁷ Cf. Przepiórkowski (2007) for an equivalent distinction between syntactic and semantic heads.
⁸ The node labels in Fig. 5 are actually feature structure attributes modelling phrasal daughters, abbreviated as SH and DH in Fig. 7.

[Figure 5: Structure of nebyl bych spal ‘I wouldn’t have slept’. Tree: the root dominates SurfHead bych(2) and DeepHead; that DeepHead dominates SurfHead nebyl(1) and DeepHead spal(3). The subscripts mark the position of each word in the string.]

3.2 Modifications of the HPSG Signature

HPSG represents linguistic data as typed feature structures. Words and phrases are subtypes of sign, a structure representing their form, meaning and combinatorial properties. Fig. 6 shows a simplified representation of an English sentence dogs bark. Types are in italics, attributes in upright capitals, and boxed numbers indicate identity of values. Each word consists of two parts: PHONOLOGY for the analyzed string and SS (SYNSEM) for its paradigmatic analysis. Phrases have two additional attributes: SD (SUBJECT-DAUGHTER) and HD (HEAD-DAUGHTER). The value of SS has L (LOCAL) as its single attribute; its NONLOCAL counterpart, used for discontinuous constituents, is not relevant for our example. The CAT (CATEGORY) attribute specifies (i) morphosyntactic properties of the expression as its HEAD features and (ii) its VALENCY. The CONT (CONTENT) attribute is responsible for semantic interpretation.
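To make the notion of structure sharing concrete, the following is a minimal sketch, not the Trale encoding used by the authors. It assumes that a feature structure can be approximated by nested Python dictionaries and that a boxed number corresponds to two paths leading to the same object. The attribute names and values are read off Fig. 6 (dogs bark); the helper names `path` and `shared` are hypothetical.

```python
# Assumption: nested dicts stand in for typed feature structures; a boxed tag
# is modelled as one Python object reachable via several paths.
index = {"NUMBER": "pl", "GENDER": "neut", "PERSON": "3rd"}   # tag 3
verb_head = {"TYPE": "verb", "VFORM": "finite"}                # tag 1
bark_cont = {"TYPE": "bark", "ARG1": index}                    # tag 2
dogs_ss = {                                                    # tag 4
    "L": {
        "CAT": {"HEAD": {"CASE": "nom"}, "VALENCY": []},
        "CONT": {"INDEX": index, "RESTR": {"TYPE": "dog", "INSTANCE": index}},
    }
}

sentence = {
    "TYPE": "phrase",
    "PHONOLOGY": ["dogs", "bark"],
    "SS": {"L": {"CAT": {"HEAD": verb_head, "VALENCY": []}, "CONT": bark_cont}},
    "SD": {"TYPE": "word", "PHONOLOGY": ["dogs"], "SS": dogs_ss},
    "HD": {
        "TYPE": "word",
        "PHONOLOGY": ["bark"],
        "SS": {"L": {"CAT": {"HEAD": verb_head, "VALENCY": [dogs_ss]},
                     "CONT": bark_cont}},
    },
}


def path(fs, *keys):
    """Follow a sequence of attribute names (or list positions) into a structure."""
    for key in keys:
        fs = fs[key]
    return fs


def shared(fs, path_a, path_b):
    """True if both paths lead to the very same object, i.e. carry the same tag."""
    return path(fs, *path_a) is path(fs, *path_b)


# The phrase and its head daughter share their HEAD value (tag 1).
print(shared(sentence, ("SS", "L", "CAT", "HEAD"), ("HD", "SS", "L", "CAT", "HEAD")))
# The verb's single valency item is the SS of the subject daughter (tag 4).
print(shared(sentence, ("SD", "SS"), ("HD", "SS", "L", "CAT", "VALENCY", 0)))
```

Using object identity rather than equality is the point of the sketch: two values that merely look alike are not token-identical, whereas a shared tag means that any information added along one path is automatically visible along the other.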

[Figure 6: An HPSG representation of a sentence. Attribute-value matrix for dogs bark: a phrase with PHONOLOGY ⟨dogs, bark⟩, an SS value holding CAT (HEAD [1] verb, VFORM finite; empty VALENCY) and CONT [2] (bark, ARG1 [3]), a subject daughter SD (the word dogs, SS [4], CASE nom, INDEX [3] with NUMBER pl, GENDER neut, PERSON 3rd) and a head daughter HD (the word bark, HEAD [1], VALENCY ⟨[4]⟩, CONT [2]).]

Some values are shared due to the Head Feature Principle (HFP), projecting features of the head daughter to its phrasal mother, the Valency Principle (ValP), a general valency satisfaction mechanism, and the Semantics Principle. Morphosyntactic categories of the noun relevant for pronominal reference are the properties of CONTENT’s INDEX, while those of the verb relevant for agreement are specified indirectly, as properties of its subject. For languages with rich morphology, NP-internal agreement and null subjects, such as Czech, other arrangements of morphosyntactic and valency features have been proposed.

In addition to the introduction of surface and deep heads, the standard HPSG signature has been modified in two main aspects: (i) at least in the current version, attributes such as LOCAL and HEAD are missing to simplify annotation of extensive data: discontinuities are treated as word order variations and head features are the value of CATEGORY; and (ii) the signature is extended by introducing a cross-classification of morphological and morphosyntactic categories along three dimensions: morphological (inflectional), syntactic and semantic (lexical).⁹ This is useful especially for word classes where classification criteria in the three dimensions do not coincide, such as numerals and pronouns. Their standard definitions are based on semantic criteria, but otherwise cardinal numerals and personal pronouns behave like nouns, whereas ordinal numerals and possessive pronouns behave like adjectives. The cross-classification can also be used to model some regular derivational relations, e.g., deverbal nouns and adjectives (inflectional classes) are derived from verbs (lexical class).

⁹ See Rosen (2014) for more details.

3.3 Representing Analytic Categories

To accommodate AVFs, the 3D classification has been extended by an analytic dimension. The AC attribute specifies categories appropriate to the AVF as a whole. A verbal AC includes three basic properties: TENSE, MOOD and VOICE. Their values are encoded in the lexical specifications of function words, including the deep head’s contribution, which is mediated through the valency frame of the auxiliary, including the content verb’s ALEMMA. The rest is the task of ValP and HFP. More specifically, the surface head and its mother share their head features, including the analytic categories, and their deep valency frames, so the deep structure is available in the phrasal category. Since AC is a head feature, tense, mood and voice are projected from the auxiliary as the surface head of the AVF.

3.4 An Example

The mechanism is illustrated in Figs. 5 and 7, using the past conditional form of the verb spát ‘to sleep’ (5).¹⁰

(5) nebyl bych spal
    be.PTCP.M.SG.NEG be.COND.1SG sleep.PTCP.M.SG
    ‘I wouldn’t have slept’

Past conditional consists of the finite conditional auxiliary (bych), the l-participle form of the ‘to be’ auxiliary (nebyl) and the l-participle of the content verb (spal).¹¹

¹⁰ Fig. 5 ignores word order, which is specified within the PHON list of the phrase.
¹¹ Past conditional may include additional l-participle auxiliaries with the meaning unchanged: an iterative and a plain form (6). Passive past conditional, where two l-participle auxiliaries are obligatory (7), shows that the iterative is used to avoid two identical l-participles.

(6) nebyl bych býval (byl) spal
    be.PTCP.M.SG.NEG be.COND.1SG be.PTCP.M.SG.ITER (be.PTCP.M.SG) sleep.PTCP.M.SG
    ‘I wouldn’t have slept’
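The way analytic categories arise from recursively embedded surface and deep heads can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' Trale grammar: the classes `Word` and `SDHeaded`, the function `analytic_category` and the inflection labels `fin_ind`, `fin_cond` and `lpple` are hypothetical names. The `MOOD_TENSE` table and the polarity rule follow the `mood_tense` and `polarity` relations listed with constraints (11) and (12) in Section 3.6, and the sketch covers only structures with a finite auxiliary on top.

```python
from dataclasses import dataclass
from typing import Union

# (mood, number of l-participles) -> tense, after the mood_tense relation in (11)
MOOD_TENSE = {
    ("ind", "one"): "past",
    ("ind", "more"): "plusq",
    ("cond", "one"): "pres",
    ("cond", "more"): "past",
}


@dataclass
class Word:
    form: str
    lemma: str
    infl: str               # hypothetical labels: "fin_ind", "fin_cond", "lpple"
    negated: bool = False
    voice: str = "actv"


@dataclass
class SDHeaded:
    sh: Word                      # surface head: an auxiliary
    dh: Union["SDHeaded", Word]   # deep head: content verb or embedded phrase


def analytic_category(node):
    """Compute the AC values of a content verb or of a surface/deep-headed phrase."""
    if isinstance(node, Word):    # the lexical deep head, i.e. the content verb
        return {"ALEMMA": node.lemma, "AVOICE": node.voice,
                "APOL": "minus" if node.negated else "plus", "APPLE": "one"}
    dh = analytic_category(node.dh)
    sh = node.sh
    ac = {
        "ALEMMA": dh["ALEMMA"],   # (8) lemma shared with the deep head
        "AVOICE": dh["AVOICE"],   # (9) voice shared with the deep head
        # (12) negative as soon as either daughter is negative
        "APOL": "minus" if sh.negated or dh["APOL"] == "minus" else "plus",
    }
    if sh.infl == "lpple":
        ac["APPLE"] = "more"      # (10) l-participle auxiliary over an l-participle
    else:
        mood = "cond" if sh.infl == "fin_cond" else "ind"
        ac["AMOOD"] = mood        # (11) mood from the finite auxiliary
        ac["ATENSE"] = MOOD_TENSE[(mood, dh["APPLE"])]
    return ac


spal = Word("spal", "spát", "lpple")
nebyl = Word("nebyl", "být", "lpple", negated=True)
bych = Word("bych", "být", "fin_cond")
avf = SDHeaded(sh=bych, dh=SDHeaded(sh=nebyl, dh=spal))
print(analytic_category(avf))
```

For nebyl bych spal the sketch yields conditional mood, past tense, negative polarity and active voice with the lemma spát. Mood and tense stay undetermined for the embedded phrase nebyl spal and are fixed only once the finite auxiliary is reached, which is why the function adds them at the topmost level only.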

(7) byl bych býval / ?byl dopaden
    be.PTCP.M.SG be.COND.1SG be.PTCP.M.SG.ITER / be.PTCP.M.SG catch.PASS.M.SG
    ‘I would have been caught’

The auxiliary bych is the surface head of the entire structure (see Fig. 5). Its sister phrase, i.e., nebyl spal, is the deep head, consisting of nebyl and spal as the surface and the deep head daughter. The auxiliary bych takes a single l-participle, unspecified as a content verb or auxiliary (see Fig. 8 below). This distinction, related to the interpretation of tense and/or voice of the AVF, is handled by the grammar, see (8)–(12) below. In our example, the conditional auxiliary takes an auxiliary form nebyl, which in turn can take another l-participle as part of (i) indicative pluperfect (as in byl spal, ‘he had slept’), or (ii) past conditional, as in our example. It is the presence or absence of the conditional auxiliary that identifies the structure as conditional or indicative. The substructure nebyl spal determines its tense as past, so the resulting phrase is identified as past conditional.

The binary tree shown in Fig. 5 is represented as a feature structure of the sdheaded type (i.e., a surface/deep-headed phrase) in Fig. 7. The structure is similar to that in Fig. 6, except for the additions and some abbreviations: PH stands for PHONOLOGY, SH for the surface head daughter, DH for the deep head daughter, C for CATEGORY and COMPS for non-subject valency.¹²

¹² Subject valency, specified by a separate attribute, SUBJ, is not shown in Fig. 7.

[Figure 7: Analysis of nebyl bych spal. Attribute-value matrix of type sdheaded with PH ⟨nebyl, bych, spal⟩; top-level SS|C [1] with AC (aVerb: ALEMMA [2] spát, AMOOD cond, APOL minus, ATENSE past, AVOICE [3] actv), IC iFinCond: být, sg, first, LC lVerbAux: být, imperf, plus, and empty COMPS; SH is the word bych with SS|C [1] and COMPS ⟨[5]⟩; DH [5] is an sdheaded phrase with PH ⟨nebyl, spal⟩, SS|C [4] with AC (ALEMMA [2], APOL minus, APPLE more, AVOICE [3]), IC iLPple: být, ma, sg, LC lVerbAux: být, imperf, minus, and empty COMPS; its SH is the word nebyl with SS|C [4] and COMPS ⟨[6]⟩, its DH the word spal with SS [6], AC (ALEMMA [2], APOL plus, APPLE one, AVOICE [3]), IC iLPple: spát, ma, sg, LC lVerbMain: spát, imperf, plus, reflno, and empty COMPS.]

The C attribute consists of three parts, representing three aspects of the category: analytic in AC, inflectional in IC and lexical in LC.¹³ The AC attribute includes the lemma of the content verb (ALEMMA, shared as [2] with the lexical deep head and all its deep head projections), its mood, polarity (minus due to the negated auxiliary nebyl), tense and voice (actv for active). As in ALEMMA, [3] shows that the lexical deep head shares AVOICE with its projections. In the embedded structures, AMOOD and ATENSE are unspecified, because they can be determined only when the AVF is evaluated as a whole. E.g., the embedded DH phrase nebyl spal can be part of either indicative pluperfect or past conditional, and the content verb participle spal can be part of past or pluperfect indicative, present conditional or past conditional.

¹³ SC for the syntactic aspect is omitted for brevity.

Values of the other two attributes IC and LC are abbreviated: the categorial type is followed by a list of attribute values. More importantly, they refer only to the surface head of the phrase. They are obtained from the input parse. The grammar checks that some of them (person, number, also gender) agree with corresponding values in the rest of the predicate and/or in the subject.

Inflectional properties of the surface head as the finite conditional auxiliary (iFinCond) are first person singular of the lemma být. As the value of LC shows, bych is an imperfective positive (plus) form of the verb být.

The top-level SS part, concerning the entire phrase, is followed by two daughters: (i) SH bych, whose categorial properties are identified with those of the whole phrase ([1], due to HFP), and whose single valency is identified with the SS part of its deep head sister ([5]); and (ii) DH nebyl spal. As for AC, the negative polarity minus as the value of APOL is due to the negative form nebyl. The rather technical APPLE attribute specifies the number of l-participles (more) and helps to determine AMOOD and ATENSE. The attributes IC and LC refer only to the l-participle (iLPple) nebyl as the surface head of the embedded phrase in singular masculine animate. LC (the lexical category) states that nebyl is a negative form of the imperfective auxiliary být.

The COMPS (non-subject valency) list is empty: the phrase nebyl spal is saturated. It is made up of DH, the content verb spal, and SH, the auxiliary participle nebyl, whose C is shared with its mother’s C ([4]) and whose single item on the COMPS list ([6]) is identified with its deep head sister, the content verb’s SS. The categorial features of the content verb are specified in SS|C. The form is positive (APOL plus) and the phrase spal consists of the single form (APPLE one). As above, the values of the IC and LC attributes refer to the form spal itself: lemma = spát, masculine animate form (ma), imperfective aspect, polarity positive (plus), non-reflexive (reflno) content verb (lVerbMain). The intransitive content verb has no non-subject valency, so its COMPS list is empty.

Representations of AVFs are built from: (i) skeletal phrase structures, converted from dependency trees produced by the parser, including morphosyntactic information about the terminals, and (ii) valency of auxiliaries (except for the subject, the valency of content verbs is irrelevant for analytic predicates).

3.5 Lexical Entries for the Auxiliaries

The forms bych and nebyl, used in Fig. 7, are derived from lexical entries shown in Figs. 8 and 9. The entries stand for all forms of the conditional auxiliary and l-participle.

[Figure 8: Lexical entry for the conditional auxiliary (e.g., bych). Attribute-value matrix of type entry: SS|C with AC (aVerb: AMOOD cond, AVOICE [1]), IC iFinCond, LC (lVerbAux: LASPECT imperf, LLEMMA být); SUBJ [3]; COMPS with a single arg (SFUN dhead) whose SS|C has AC (aVerb: AVOICE [1]) and IC iLPple, with DVAL [2] and SUBJ [3]; DVAL [2].]

Fig. 8 describes the conditional auxiliary irrespective of person or number, i.e., including bych. The value of C determines the conditional mood (in AMOOD) for the whole AVF. Its voice is the same as the voice of its l-participle complement and of the entire structure. Its inflectional category is finite conditional and lexically it is a form of the imperfective auxiliary být (lVerbAux). The valency (COMPS) specifies an l-participle (iLPple) whose deep valency is shared with that of bych itself. The subject of bych is also shared with that of its complement, including a potentially null subject.

[Figure 9: Lexical entry for the past auxiliary l-participle (e.g., nebyl). Attribute-value matrix of type entry: SS|C with AC (aVerb: AVOICE [1]), IC iLPple, LC (lVerbAux: LASPECT imperf, LLEMMA být), SC sLPple; SUBJ [3]; COMPS with a single arg (SFUN dhead) whose SS|C has AC (aVerb: AVOICE [1]) and IC iLPple, with DVAL [2] and SUBJ [3]; DVAL [2].]

The lexical entry for the form nebyl in Fig. 9 differs from the entry for bych in the following respects: (i) no value of mood is present (there is no AMOOD in the AC attribute), (ii) the type of the IC attribute is iLPple, i.e., the form nebyl is an l-participle, and (iii) there is an SC (syntactic category) attribute whose sLPple value states that the form is a syntactic participle rather than iFinPlain, reserved for 3rd person l-participles.

3.6 Constraints for the Analytic Categories

Additional specifications are due to constraints of the grammar. Deep and surface heads share their ALEMMAs (8), the deep head shares AVOICE with its auxiliary (9), an l-participle surface head is marked as APPLE:more if the deep head is also an l-participle (10), tense is determined by the mood and number of the l-participles (11), and polarity of the entire AVF is positive unless any of its constituents is negated (12).

Using the same mechanism with different attributes, prepositions and conjunctions as surface heads can model the AC of prepositional phrases and subordinate clauses.

(8)  sdheaded
     → [SH|..ALEMMA [1], DH|..ALEMMA [1]]

(9)  [sdheaded, SH|..LC lVerbAux]
     → [SH|..AVOICE [1], DH|..AVOICE [1]]

(10) [sdheaded, SH|..LC lVerbAux, SH|..SC sLPple, DH|..SC sLPple]
     → [SH|..APPLE more]

(11) [sdheaded, SH|..LC lVerbAux, SH|..IC iFin, DH|..SC sLPple]
     → [SH|..AMOOD [1], SH|..ATENSE [2], DH|..APPLE [3]] ∧ mood_tense([1], [2], [3])

     mood_tense(ind, past, one)
     mood_tense(ind, plusq, more)
     mood_tense(cond, pres, one)
     mood_tense(cond, past, more)

(12) [sdheaded, SH|..LC lVerbAux, DH|..SC sLPple]
     → [DH|..APOL [1], SH|..LPOL [2], SH|..APOL [3]] ∧ polarity([1], [2], [3])

     polarity(bool, minus, minus)
     polarity(minus, plus, minus)
     polarity(plus, plus, plus)

4 Discussion

We presented a uniform and compact approach to the annotation of AVFs, supporting effective search options in a treebank. Information about an AVF as a whole is contained in its analytic category (AC, Fig. 7) in the phrasal node representing this form (Fig. 5). E.g., content verbs in past conditional can be retrieved by a straightforward query quoting appropriate values of the ACAT attributes. The entire AVF, including auxiliaries, is retrieved when the selection is extended by all surface and deep heads along the analytic projection of the content verb.

This is an advantage over an approach adopted, e.g., in the Prague Dependency Treebank (PDT).¹⁴ On its analytic level,¹⁵ auxiliaries are immediate dependents of a content verb (unless coordination is involved) and sisters of dependents of other types. Thus it is not easy to identify AVFs and their type or to infer their properties, e.g., as a response to a query. On the PDT’s tectogrammatical level the auxiliaries are absent: an AVF is represented as a single complex node, but components of the complex node on the analytic level can be recovered since the representations on the two levels are interlinked. However, the corpus annotated on the tectogrammatical level is too small for many research tasks.

¹⁴ http://ufal.mff.cuni.cz/pdt3.0
¹⁵ Note that analytic level denotes a level of surface syntax rather than anything related to AVFs.

AVFs can have a complex internal syntax. If there is a single auxiliary for two or more coordinated content verbs (e.g., in Já jsem přišel a viděl. ‘I came and saw’), the two content verbs as well as the predicate are identified as active past indicative forms. On the other hand, such structures are very difficult to identify on the PDT analytic layer. Searching, e.g., for all present conditionals, requires a complex query, based on detailed knowledge of the PDT representation.

The automatically determined analytic categories can be projected to a different annotation format, including PDT or CoNLL-U.¹⁶ At the very least, the annotation of content verbs can be extended by analytically determined specification of mood, tense and voice. In addition to theoretical interest, some NLP applications may profit from the identification of AVFs as a distinctive unit with specific properties. While a certain language tends to express morphological meanings analytically, using auxiliaries and other function words, a different (synthetic) language may avoid AVFs. Identification of such equivalent units may improve the quality of parallel text alignment and machine translation. Similarly, a parser trained on texts where such units are identified can produce better results.

¹⁶ http://universaldependencies.github.io/docs/format.html

The first release of a part of the Czech National Corpus annotated in the style of the PDT analytic level is due soon. A pilot treebank including the proposed annotation of analytic categories will follow, supplemented by the formal grammar and lexicon. The planned size is in the order of tens of millions of words. The annotation will include analytic categories and other information added by the grammar and the lexicon, or a flag identifying a failure in the application of the grammar and its possible reason, while the annotation will retain only information from the parser. At present, the grammar and the lexicon are developed and tested on a sample of 1000 sentences from the PDT annotation manual,¹⁷ covering a wide range of linguistic phenomena. A proper evaluation is planned on a larger sample extracted from real corpus texts.

¹⁷ Hajič et al. (1999).

Acknowledgments

This research was supported by the Grant Agency of the Czech Republic, grant no. 13-27184S.

References

Tania Avgustinova, Alla Bémová, Eva Hajičová, Karel Oliva, Jarmila Panevová, Vladimír Petkevič, Petr Sgall, and Hana Skoumalová. 1995. Linguistic problems of Czech. Project Peco 2924. Technical report, Charles University, Prague.

Robert D. Borsley. 1999. Auxiliaries, verbs and complementizers in Polish. Slavic in Head-Driven Phrase Structure Grammar, pages 29–59.

Václav Cvrček, Vilém Kodýtek, Marie Kopřivová, Dominika Kováříková, Petr Sgall, Michal Šulc, Jan Táborský, Jan Volín, and Martina Waclawičová. 2010. Mluvnice současné češtiny. Karolinum, Praha.

Mary Dalrymple. 1999. Lexical-functional grammar. In Rob Wilson and Frank Keil, editors, MIT Encyclopedia of the Cognitive Sciences. The MIT Press.

Jan Hajič, Jarmila Panevová, Eva Buráňová, Zdeňka Urešová, and Alla Bémová. 1999. A Manual for Analytic Layer Annotation of the Prague Dependency Treebank. Technical report, ÚFAL MFF UK, Prague.

Tomáš Jelínek, Vladimír Petkevič, Alexandr Rosen, Hana Skoumalová, Přemysl Vítovec, and Jiří Znamenáček. 2014. A grammar-licensed treebank of Czech. In Verena Henrich, Erhard Hinrichs, Daniël de Kok, Petya Osenova, and Adam Przepiórkowski, editors, Proceedings of TLT13, pages 218–229, Tübingen.

Petr Karlík, Marek Nekula, and Zdenka Rusínová. 1995. Příruční mluvnice češtiny. Lidové noviny, Praha.

Miroslav Komárek, Jan Kořenský, Jan Petr, and Jarmila Veselková, editors. 1986. Mluvnice češtiny 2 – Tvarosloví. Academia, Praha.

Anna Kupść. 2000. A HPSG Grammar of Polish Clitics. Ph.D. thesis, Atelier national de Reproduction des Thèses.

Anna Kupść and Jesse Tseng. 2005. A New HPSG Approach to Polish Auxiliary Constructions. In Stefan Müller, editor, Proceedings of HPSG12, CSLI Publications, pages 253–273. University of Lisbon.

Carl J. Pollard and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago.

Adam Przepiórkowski. 2007. On heads and coordination in valence acquisition. In Alexander Gelbukh, editor, CICLing 2007, pages 50–61, Berlin. Springer.

Alexandr Rosen. 2014. A 3D taxonomy of word classes at work. In Ludmila Veselovská and Markéta Janebová, editors, Complex Visibles Out There. Proceedings of OLINCO 2014, volume 4, pages 575–590, Olomouc. Palacký University.

Jesse Tseng. 2009. A formal model of grammaticalization in Slavic past tense constructions. Current Issues in Unity and Diversity of Languages, pages 749–762.

Jesse Tseng and Anna Kupść. 2006. A cross-linguistic approach to Slavic past tense and conditional constructions. Proceedings of FDSL6.

Ludmila Veselovská. 2003. Analytické préteritum a opisné pasivum v češtině: dvojí způsob saturace silného rysu <+v> hlavy v*. Sborník filozofické fakulty Masarykovy Univerzity, pages 161–177.

Ludmila Veselovská and Petr Karlík. 2004. Analytic passives in Czech. Zeitschrift für Slawistik, pages 163–235.

Gert Webelhuth, editor. 1995. Government and Binding Theory and the Minimalist Program. Blackwell Publishers, Oxford, UK.
