Manual for Morphological Annotation of Czech Sentences

Revision for the Prague Dependency Treebank 2.0

ÚFAL Technical Report No. 2005/27

Jiří Hana
Daniel Zeman
Jan Hajič
Hana Hanová
Barbora Hladká
Emil Jeřábek

Table of Contents

Preface to Version 2.0
Preface to Version 1.0
1. Introduction
2. Lemma and tag structure
  2.1. Lemma structure
    2.1.1. Base form and number
    2.1.2. Reference
    2.1.3. Category
    2.1.4. Term
    2.1.5. Style
    2.1.6. Explanational comment
    2.1.7. Comment on derivation
  2.2. Tag Structure
    2.2.1. Positional tags
    2.2.2. Compact tags
    2.2.3. Informal abbreviations
3. Names
  3.1. Personal names
    3.1.1. von, van, etc.
    3.1.2. Chinese and Korean names
    3.1.3. Foreignized Czech names
  3.2. Geographical names
    3.2.1. Countries, cities, rivers, mountains
    3.2.2. Streets
  3.3. Companies and institutions
    3.3.1. Restaurants
    3.3.2. Sport clubs
  3.4. Horses, DJ's etc.
  3.5. Products
  3.6. Sporting and other events
  3.7. Other
    3.7.1. Buildings
    3.7.2. Televisions
    3.7.3. News and magazines
    3.7.4. Song names
  3.8. Adjectives derived from names
4. Abbreviations
  4.1. Gender
  4.2. Isolated letters
  4.3. Units of measurements
  4.4. Authors' signatures
  4.5. Academic titles
5. Colloquial Czech
  5.1. Cos, kdys, jaks...
  5.2. Suffix -@ in plural of neuter
6. Foreign words and phrases
  6.1. Articles
  6.2. English noun clusters
  6.3. Nouns
  6.4. Verbs
    6.4.1. English verbs
  6.5. Slavic languages and Czech dialects
7. Errors
  7.1. Characters
  7.2. Separators
8. Hard to decide
  8.1. až
  8.2. jak
  8.3. málo
  8.4. moc
  8.5. proto
  8.6. svůj
  8.7. tak
9. Selected words
10. Date and time
11. Numbers, numerals and quantifiers
12. Hyphenated composites
13. Insertion
  13.1. Possessive adjectives
  13.2. Words ending with -ismus, -izmus
  13.3. Transcription of pronunciation
  13.4. Crippled forms
  13.5. Isolated morphemes
  13.6. Geometry
  13.7. Chess codes

List of Tables

2.1. Lemma examples
2.2. Lemma categories
2.3. Term types
2.4. Style flags
2.5. Attributes in positional tags
2.6. POS
2.7. SUBPOS
2.8. Obsolete SUBPOS values
2.9. GENDER
2.10. NUMBER
2.11. CASE
2.12. POSSGENDER
2.13. POSSNUMBER
2.14. PERSON
2.15. TENSE
2.16. GRADE
2.17. NEGATION
2.18. VOICE
2.19. VAR
3.1. Name types
3.2. Examples of geographical names
3.3. Examples of company names
3.4. Examples of restaurant names
3.5. Examples of sport club names
3.6. Examples of event names
4.1. Examples of abbreviations
4.2. Gender of abbreviations
4.3. Examples of isolated letters
4.4. Examples of units
5.1. Colloquial examples
6.1. Examples of foreign phrases
6.2. Articles in common foreign languages
6.3. Number and case of English nouns
6.4. Examples of English verbs

List of Examples

2.1. Following examples illustrate this:
2.2. Other examples:
3.1. Personal names with von, van etc.
3.2. Chinese and Korean names
3.3. Street names
3.4. Names of horses
3.5. TV company names
3.6. Names of periodicals
11.1. Case agreement in counted phrases
12.1. Hyphenated composites
13.1. -ismus, -izmus
13.2. Transcription of pronunciation
13.3. Crippled forms

Preface to Version 2.0

Although the title of this report inherits the word "Manual" from the previous version, it is no longer intended to guide the annotators. Rather, it attempts to describe the current state of the morphological annotation in PDT 2.0. Most of the added information resulted from several semi-automatic checks performed on the data before its release. In some cases it was not feasible to bring the data to the desired state; if so, both the desired and the current state of the data are described.

PDT 2.0 contains 1,960,657 morphologically annotated tokens in 126,831 sentences. There are 168,454 distinct word forms, 71716 distinct lemmas, and 1740 morphological tags.

The final checking and analysis of the data, as well as the work on this manual revision, were supported by the Czech Academy of Sciences program called "Information Society", projects No. 1ET101120503 and 1ET101120413, and the grant No. GA405/03/0913.

Preface to Version 1.0

We are pleased to publish the first version of the manual for morphological annotation of Czech sentences. We believe that such guidelines can be of use to the users of the Prague Dependency Treebank 1.0 (PDT 1.0), as well as for the preparation of new data.

Let us recall the most important steps we took in order to obtain about two million morphologically annotated words (PDT 1.0). At the very beginning, we put together a team of eight annotators. We introduced them to a system of morphological tags that we designed to describe Czech morphological properties; we also used (as a preprocessing step) a morphological analyzer for processing isolated words; and, last but not least, we relied on the knowledge of Czech morphology that they had acquired while studying at secondary school, i.e. we did not offer them any annotation guidelines.

One may object that this strategy is too hazardous: how does one deal with the discrepancies the annotators produce and still ensure the consistency of annotation? First, two annotators annotated each text file. Then, by a "blind" automatic procedure (regardless of what word is processed, just comparing two strings), we detected words annotated differently. Subsequently, a single annotator (a member of a team of just two) handled these cases and also checked the morphological annotations against the syntactic-analytic annotations. In this way we made up for the absence of annotation guidelines by sequential elimination of discrepancies across both the morphological and the syntactic-analytic levels of annotation.

Along the way we were writing this annotation manual. It is not intended as a comprehensive guide to the morphological annotation of Czech sentences (in contrast to the manual for syntactic-analytic annotations). The authors concentrate "only" on those cases which caused the most ambiguities and problems while annotating PDT 1.0. The ongoing effort is directed at treating the not-yet-solved problematic cases in accord with the conventions of the automatic morphological analyzer.

The morphological annotation of PDT 1.0 was carried out in the framework of experimental verification of the definition of formal representation of the analysis of Czech sentences (the project GAČR 405/96/0198, "Formal representation of language structures"). The material obtained in this way (data) is used in many domains of research in computational linguistics, above all as basic (training) data in projects of automatic language analysis, the MŠMT research project MSM113000006, the "Laboratory for Language Data Processing" (the MŠMT project VS961510) and the Center for Computational Linguistics (the MŠMT project LN00A063). These data have also been used as verification material for various partial projects within the complex program GAČR 405/96/K214 ("Czech Language in Computer Age"). The "Center for Computational Linguistics" project financially supported work on these morphological annotation guidelines.

Chapter 1. Introduction

We do not want to substitute for a grammar book of Czech, so we are not going to systematically define word classes and paradigms. All the annotators should understand the fundamentals of Czech morphology, as most native Czech speakers do (the material is taught in elementary schools). What we are going to describe are the difficult or unusual phenomena. Most notably, we will address the annotation of proper names, foreign words, and abbreviations. Such categories are rarely and sparsely covered by standard dictionaries.

To get an idea of what a foreign word, proper name etc. means, it is useful to try to find it using an internet portal, an encyclopedia etc. During annotation, we found the following internet links useful:

Portals.
