A Morphological Analyser for Crimean Tatar

Kemal Altintas, Ilyas Cicekli
kemal@cs.bilkent.edu.tr, ilyas@cs.bilkent.edu.tr
Bilkent University, Department of Computer Engineering, Bilkent, Ankara, Turkey

Abstract

This paper describes the details of a morphological analyser designed for the Crimean Tatar language. The system is based on two-level morphology and implemented using the XEROX finite state tools. The phonological rules that govern the differentiation of sounds are explained in detail. Then the morphotactic rules that organise the Crimean Tatar morpheme orders are given. A brief comparison of Turkish and Crimean Tatar is followed by some examples of the program outputs.

I. Introduction

The language spoken by Crimean Tatars is actually a transition between Oghuz-oriented Anatolian Turkish and Kipchak-oriented languages such as Kazan Tatar and Kazakh-Kirgiz. Having had close historical relations with the Ottoman Empire, Crimean Tatar people speak a language that is intelligible to Anatolian Turks. However, the Kipchak grammar rules and words are not negligible [ ].

There are three main dialects of Crimean Tatar. The northern dialect, which is called "çöl şivesi" (steppe dialect) in Crimean Tatar, shows many more Kipchak properties and is close to Kazakh and Kirgiz. The central dialect is called "Bahçesaray şivesi", after Bahçesaray, the capital city of the Crimean Khanate, and is the basic literary dialect. The southern dialect is "Yalıboyu şivesi" (coastal dialect) and is very close to Anatolian Turkish [ ].

In this project we implemented the system to be compatible with the Bahçesaray dialect, since it is the literary language. Throughout the paper, the term Crimean Tatar means "the Bahçesaray dialect of the Crimean Tatar language" and the term Turkish means "the literary Turkish language spoken in Turkey".

Most of the root words in Crimean Tatar are common with Turkish [ , ]. However, today the differences both in roots and in grammatical rules are not negligible. Many words, especially in the northern dialect, are completely different from Anatolian Turkish:

Azbar (avlu, yard)
Kökrek (göğüs, chest)
Yengil (hafif, light)

Many words are present in both languages but have different meanings. "Taşlamaq" in Crimean Tatar means to leave something somewhere, whereas in Turkish "taşlamak" means to stone. "Salmaq" in Crimean Tatar means to put or add, whereas "salmak" in Turkish means to let something go.

There are many differences between the grammars of Turkish and Crimean Tatar. For example, the second tense of a verb is written as a separate word in Crimean Tatar, while it is joined to the root in Turkish. Also, the narrative suffix in Crimean Tatar is –gen or its equivalents according to harmony rules, while narration is expressed with –miş or its equivalence class in Turkish. For example, "kelgen edi" is written as "gelmişti" (he had come) in Turkish.

Having lived under Russian rule for more than two centuries, Crimean Tatars feel the influence of Russian heavily. Not only are there many words derived from Russian; sometimes even Russian grammatical rules are applied to Crimean Tatar words. However, since these are not valid structures for Crimean Tatar, they have not been considered in the system we developed. Words related to technology in particular, and usually the counterparts of Turkish words that come from Western languages, are mostly derived from Russian. Some examples are:

Televizor (televizyon, television)
Avtobus (otobüs, bus)
Peçqa (peçka, soba, stove)

The rest of the paper is organized as follows. The next section gives a brief explanation of two-level morphology. The following section gives the Crimean Tatar alphabet, and the fourth section lists the vowel and consonant harmony rules. The fifth section explains the morphotactics of Crimean Tatar, and the sixth section compares Turkish and Crimean Tatar grammars. The seventh section summarises the implementation of the program with several example runs, and the paper ends with a conclusion section.

II. Overview of Two-Level Morphology

Two-level morphology is a way of handling morphological structures by executing pseudo-parallel rules [ ]. The system has two levels: the surface level and the lexical level. The surface level representation is the direct representation of an input as it appears in the original language. The lexical level is the decomposed form of the input, and it is the output of the system when the surface representation is given as input. In a finite state transducer, the surface and lexical levels are normally represented as two expressions separated by a colon. For example, an expression like a:b is usually taken to mean "lexical form a corresponds to surface form b".

Rules that denote the morphological modifications and variations are all executed in parallel, and all the rules work on the same input. If all of the rules accept the input, the machine accepts the input; if the input is rejected by any of the rules, the machine rejects it directly. There are four different rule types in such a system:

a:b => LC _ RC
Lexical a may be mapped to surface b only if it appears in these left and right contexts. However, appearing in this context does not require such a mapping. In other words, if a is mapped to b, then it must be in this context and cannot happen in another context.

a:b <= LC _ RC
If lexical a appears between LC and RC, it is mapped to surface b. However, it is also possible to map a to b in another context.

a:b <=> LC _ RC
A lexical a is always mapped to a surface b in this context, and this is possible only in this context.

a:b /<= LC _ RC
A lexical a is never mapped to a surface b in the given context.

The morphotactic rules are compiled into a finite state transducer and are combined with these rules. The system as a whole tries to locate the roots and possible following suffixes for a given surface form input. If the system at any stage cannot locate a valid suffix, or it discovers a situation violating the morphological modification rules, it returns no answer. For a detailed explanation of two-level morphology, see [ ].
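The semantics of the four rule types can be made concrete with a small checker. The following is a minimal Python sketch, not the XEROX tools the system is built on: it assumes the lexical and surface strings are already aligned symbol by symbol (no insertions or deletions), models LC and RC as hypothetical predicate functions, and tests each rule directly instead of compiling it into a transducer. As described above, an input is accepted only if every rule accepts it. The example rules are a simplified harmony constraint for the archiphoneme A (a after a back vowel, e after a front vowel) whose contexts look only at surface vowels; the names `Rule`, `accepts`, `after_back_vowel` and `HARMONY` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A lexical:surface pair such as ("A", "a"), i.e. a:b in the notation above.
Pair = Tuple[str, str]
# A context predicate receives the whole aligned sequence and the index being checked.
Context = Callable[[List[Pair], int], bool]


@dataclass
class Rule:
    lexical: str
    surface: str
    op: str         # one of "=>", "<=", "<=>", "/<="
    left: Context   # LC
    right: Context  # RC

    def accepts(self, pairs: List[Pair]) -> bool:
        for i, (lex, surf) in enumerate(pairs):
            in_context = self.left(pairs, i) and self.right(pairs, i)
            is_pair = lex == self.lexical and surf == self.surface
            # => : the pair a:b may occur only inside the context.
            if self.op in ("=>", "<=>") and is_pair and not in_context:
                return False
            # <= : inside the context, lexical a must surface as b.
            if self.op in ("<=", "<=>") and lex == self.lexical and in_context and surf != self.surface:
                return False
            # /<= : the pair a:b must never occur inside the context.
            if self.op == "/<=" and is_pair and in_context:
                return False
        return True


def accepts(pairs: List[Pair], rules: List[Rule]) -> bool:
    """Accept only if every rule accepts (the rules act in parallel)."""
    return all(rule.accepts(pairs) for rule in rules)


# Hypothetical contexts for illustration; they inspect surface vowels only.
VOWELS = set("aeIioOuUâ")
BACK = set("aIuoâ")
FRONT = set("eiOU")


def last_vowel(pairs: List[Pair], i: int):
    vowels = [surf for _, surf in pairs[:i] if surf in VOWELS]
    return vowels[-1] if vowels else None


def after_back_vowel(pairs: List[Pair], i: int) -> bool:
    return last_vowel(pairs, i) in BACK


def after_front_vowel(pairs: List[Pair], i: int) -> bool:
    return last_vowel(pairs, i) in FRONT


def anywhere(pairs: List[Pair], i: int) -> bool:
    return True


HARMONY = [
    Rule("A", "a", "<=>", after_back_vowel, anywhere),
    Rule("A", "e", "<=>", after_front_vowel, anywhere),
]

print(accepts(list(zip("balalAr", "balalar")), HARMONY))  # True
print(accepts(list(zip("balalAr", "balaler")), HARMONY))  # False
print(accepts(list(zip("kOylAr", "kOyler")), HARMONY))    # True
```

Because each rule is checked independently and the results are combined with `all`, adding a rule can only make the analyser stricter, which mirrors the parallel rule application described above.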
III. The Alphabet

As is the case for all Turkic languages, Crimean Tatar was also written in the Arabic script at the beginning of the twentieth century. After the establishment of Soviet rule, the Cyrillic alphabet of the Russian language came into use. Now a Latin-based alphabet, which is the same as the Turkish alphabet with a few additions, has been accepted by the Crimean Tatar National Assembly and is in use. The Latin-based Crimean Tatar alphabet is:

Aa Ââ Bb Cc Çç Dd Ee Ff Gg Ğğ Hh Iı İi Jj Kk Ll Mm Nn Ññ Oo Öö Pp Qq Rr Ss Şş Tt Uu Üü Vv Yy Zz

In the program, the letters that are present in ASCII are used as is, in lowercase. In both the surface form and the lexical form, we represent the letters that are absent in ASCII with the capital form of the closest symbol. The correspondences are as follows: ç–C, ğ–G, ı–I, ö–O, ş–S, ü–U, ñ–N.

At the lexical level, however, we need to use some extra characters to represent one-to-many mappings and exceptions. For this purpose we use the following capital letters, which are used only in the program and are invisible to the user:

J – ç that does not change to c (kUJU → kUCU)
P – p that does not change to b (saPI → sapI)
Q – q that does not change to ğ (baQa → baqa)
T – t that does not change to d (beT+i → beti)
W – k that does not change to g (teWi → teki)
H – corresponds to the symbols I, i, u, U according to vowel harmony rules
A – corresponds to a or e according to vowel harmony rules
Y – corresponds to u or U according to vowel harmony rules
K – corresponds to g, k, G, q according to consonant harmony rules
D – corresponds to d or t according to consonant harmony rules
Z – s that does not drop as a joining sound (alim+Ziñ → alimsiñ)

We also use the following groupings in the two-level morphology rules. The vowels are:

VOWEL: a e I i o O u U A H M Y â

The consonants are:

CONS: b c C d f g G h j k l m n N p q r s S t v y z K Z B P Q J W

The other groupings are as follows:

Back Vowel BACKV: a I u o â
Front Vowel FRONTV: e i O U
Front Unrounded Vowel FRUNROV: i e
Front Rounded Vowel FRROV: O U
Back Rounded Vowel BKROV: u o
Back Unrounded Vowel BKUNROV: a I â
Soft Consonants SEDALI: b c d g G j v z l m n N r y h B
Hard Consonants SEDASIZ: p C t k q S f s Z P Q J W
Joining Consonants X: s y

IV. Vowel and Consonant Harmony Rules

A realized as a
A:a => [BACKV] [CONS] [CONS | CONS | ] _
After a back vowel, the following A must be realized as a.
bala+lAr → balalar (çocuklar – kids)
qoy+lAr → qoylar (koyunlar – sheep)

A realized as e
A:e => [FRONTV] [CONS] [CONS | CONS | ] _
Similar to the previous rule, this follows the grammar rule stating that front vowels are followed by front vowels.
kOy+lAr → kOylMr (köyler – villages)
gUl+DAn → gUldMn (gülden – from the rose)

A realized as y
A:y => [VOWEL] _
In Crimean Tatar, if the root ends in a vowel, the present progressive tense suffix is –y and the future tense suffix is –ycAK.
sora+A → sora+y (soruyor – she is asking)
qorCala+AcAK → qorCala+ycaq (koruyacak – she will protect)

H realized as u
H:u => [CONS] [BKROV] [CONS] [CONS | CONS | ] _
If there is only one syllable in the root, which means there is only one vowel before H, and that vowel is a back rounded vowel (o, u), H is resolved to u.
soNHncH → soNuncI (sonuncu – the last)

H realized as U
H:U => [CONS] [FRROV] [CONS] [CONS | CONS | ] _
Similarly, when H comes in the second syllable and follows a front rounded vowel (U, O), it is resolved to U.
kOy+ZHz → kOysUz (köysüz – without a village)
UC+HncH → UCUnci (üçüncü – the third)

H realized as i
H:i => [VOWEL] [CONS] [CONS | CONS | ] [FRONTV] [CONS] [CONS | CONS | ] _
       [CONS] [FRONTV] [CONS] [CONS | CONS | ] _
In the Crimean Tatar language, u and ü in suffixes can appear only in the second syllable; in the third and later syllables the same suffix is written with ı or i. A few exceptional morphemes, such as the past morpheme –di and the accusative morpheme –ni, are most of the time written with ı/i even if they appear in the second syllable. Here are two rules operating in parallel. The first rule checks whether there are at least two syllables. If there are, H is resolved to i after all front vowels, namely e, i, O, U. If there is only one syllable, the second rule applies and maps H to i only after front unrounded vowels.
kOr+DH → kOrdi (gördü – she saw)
sUt+sHz+lHK
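The ASCII convention (ç–C, ğ–G, ı–I, ö–O, ş–S, ü–U, ñ–N, with â kept as is) and the two rules that realize the archiphoneme A as a after a back vowel and as e after a front vowel can be illustrated together. The following Python sketch is an illustration under stated assumptions, not the paper's implementation: it assumes lowercase input, handles only A (not H, Y, K, D, the A:y rule, or the exception symbols J, P, Q, T, W, Z), and simply takes the value of A from the nearest vowel to its left instead of running compiled two-level rules. The function names `to_ascii`, `from_ascii` and `realize_A` are hypothetical.

```python
# ASCII convention: non-ASCII Crimean Tatar letters are written as the
# capital of the closest ASCII letter (â is kept as is).
TO_ASCII = str.maketrans({"ç": "C", "ğ": "G", "ı": "I", "ö": "O",
                          "ş": "S", "ü": "U", "ñ": "N"})
FROM_ASCII = str.maketrans({"C": "ç", "G": "ğ", "I": "ı", "O": "ö",
                            "S": "ş", "U": "ü", "N": "ñ"})

BACK_VOWELS = set("aIuoâ")   # BACKV
FRONT_VOWELS = set("eiOU")   # FRONTV


def to_ascii(word: str) -> str:
    """Encode a lowercase Crimean Tatar word in the program's ASCII form."""
    return word.translate(TO_ASCII)


def from_ascii(word: str) -> str:
    """Decode the ASCII form back to Crimean Tatar Latin letters."""
    return word.translate(FROM_ASCII)


def realize_A(stem: str, suffix: str) -> str:
    """Attach a suffix, realizing each A as a after a back vowel, e after a front vowel."""
    last = None
    for ch in stem:
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            last = ch
    out = []
    for ch in suffix:
        if ch == "A":
            if last is None:
                raise ValueError("no preceding vowel to harmonize with")
            ch = "a" if last in BACK_VOWELS else "e"
        if ch in BACK_VOWELS or ch in FRONT_VOWELS:
            last = ch
        out.append(ch)
    return stem + "".join(out)


print(realize_A("bala", "lAr"))                       # balalar
print(from_ascii(realize_A(to_ascii("köy"), "lAr")))  # köyler
```

Since the surface encoding only swaps single characters, `from_ascii(to_ascii(w))` returns `w` for any lowercase word, so the ASCII form is a lossless internal representation; the harmony decision itself is made entirely on that ASCII form.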