https://helda.helsinki.fi
The Finno-Ugric Languages and the Internet project
Jauhiainen, Tommi
Septentrio Academic Publishing 2015-01-15
Jauhiainen, T., Jauhiainen, H. & Linden, K. 2015, The Finno-Ugric Languages and the Internet project. In T. Pirinen, F. Tyers & T. Trosterud (eds), First International Workshop on Computational Linguistics for Uralic Languages: Proceedings of the Workshop. vol. 2, Septentrio Conference Series, no. 2, vol. 2015, Septentrio Academic Publishing, Tromsø, pp. 87–98, International Workshop on Computational Linguistics for Uralic Languages, Tromsø, Norway, 16/01/2015. https://doi.org/10.7557/scs.2015.2 http://hdl.handle.net/10138/159402
Downloaded from Helda, University of Helsinki institutional repository. This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

Proceedings of 1st International Workshop in Computational Linguistics for Uralic Languages (IWCLUL 2015); ‹http://dx.doi.org/10.7557/scs.2015.2›

The Finno-Ugric Languages and The Internet project
Heidi Jauhiainen, University of Helsinki, Department of Modern Languages, heidi.jauhiainen@helsinki.fi
Tommi Jauhiainen, University of Helsinki, Department of Modern Languages, tommi.jauhiainen@helsinki.fi
Krister Lindén, University of Helsinki, Department of Modern Languages, krister.linden@helsinki.fi
December […]

Abstract

This paper describes a Kone Foundation funded project called "The Finno-Ugric Languages and The Internet", together with some of the results achieved so far. The main activity of the project is to crawl the internet and gather texts written in small Uralic languages. The sentences and words of the texts found will be assembled into a freely available corpus. Crawling is done using the open-source crawler Heritrix, which is developed by the Internet Archive. Heritrix crawls through the pages and passes the texts it finds to a language identifier. We are using a state-of-the-art language identifier, which has been further developed within the project and has been evaluated using […] languages. We describe the language identification evaluation results concerning the Uralic languages known by the language identifier. We also describe the initial observations and results from the first five large crawls, which were done in the national internet domains of Finland, Sweden, Norway, Russia and Estonia.

This work is licensed under a Creative Commons "Attribution-NoDerivatives 4.0 International" Licence. Licence details: http://creativecommons.org/licenses/by-nd/4.0/

3.7. The Finno-Ugric Languages and The Internet project [page 87 of 131] ‹http://dx.doi.org/10.7557/5.3471›
1 Introduction

"The Finno-Ugric Languages and The Internet" project¹ started at the beginning of […] as part of the Kone Foundation Language Programme [1]. The project is located at the Department of Modern Languages² at the University of Helsinki and is part of the international CLARIN³ cooperation. The main goal of the project is to build a prototype of a system that will crawl the internet and gather texts written in small Uralic languages. Web crawling has been used to collect text corpora for a variety of languages [2, 3], but to our knowledge this project is the first to collect texts in small Uralic languages. The largest Uralic languages, Hungarian, Finnish and Estonian, are outside the scope of the project.

We are using language identification software to detect the language of the crawled web pages. The gathered texts will be collected into sentence and word corpora for each language, and the links to the associated web pages into link collections. The corpora will act as a source for linguists, and the link collections will hopefully spread knowledge of the existence of relevant pages to interested parties. Due to the copyrights connected with longer texts, we only publish corpora of text snippets up to sentence length, with links to the original text on the internet. Negotiating free use of copyrighted texts would be far beyond the resources of this project. We aim to make the complete workflow, from crawling to the creation of corpora, as automated as possible.

In Section 2 we discuss the language identifier used in the project with respect to Uralic languages. Section 3 deals with the five large crawls done so far, i.e. in the national domains of Finland, Sweden, Norway, Russia and Estonia. In Sections 4 and 5 we describe the link collection and the sentence corpora, respectively.

¹ http://suki.ling.helsinki.fi
² http://www.helsinki.fi/modernlanguages/
³ http://clarin.eu
2 Language Identification for Uralic Languages

We are using an extended version of the language identifier described in [4]. The extended version of the language identification method will be described in a forthcoming journal article [5], where it is evaluated together with the methods presented in [6], [7], [8], [9], [10] and [11]. The language identifier uses relative frequencies of character n-grams together with tokens and a token-based backoff. The evaluated language identifier recognizes […] languages from all around the world. The definition of a language is taken from Ethnologue [12], and the division of the languages follows the ISO 639-3 standard⁴. The languages include the following 34 Uralic languages: Hungarian, Khanty, Mansi, Estonian, Finnish, Kven Finnish, Tornedalen Finnish, Ingrian, Karelian, Liv, Livvi-Karelian, Ludian, Veps, Votic, Võro, Hill Mari, Meadow Mari, Erzya, Moksha, Udmurt, Komi-Permyak, Komi-Zyrian, Inari Sami, Kildin Sami, Skolt Sami, Ume Sami, Lule Sami, North Sami, South Sami, Nenets, Nganasan, Forest Enets, Tundra Enets and Selkup. The current version of the language identifier knows only one orthography per language, but this will be corrected in future versions.

The language identifier was evaluated on character sequences from […] to […] characters in length, and the recall figures for Uralic languages can be seen in Table 1. The character sequences are random parts of the test corpus, always starting at the beginning of a word. For most of the languages, the test set consists of the Universal Declaration of Human Rights in that particular language, and the training set is text from Wikipedia. The aim was for the test set always to come from a different domain than the training text. This was easily possible for most of the larger languages, but quite difficult for some of the smaller Uralic languages. In some cases, such as Forest and Tundra Enets, the test set is from a different section of the same document as the training text. The amount of training material differs considerably between languages, ranging from […] characters in Ume Sami to over […] million characters in Hungarian.

The average identification accuracy for Uralic languages is generally slightly lower than for all languages. This is because some of the languages are very close varieties of each other, especially within the Finnic languages. The language identifier has not been optimized to perform better with Uralic languages, or even with closely related languages. At the test length of […] characters, the overall average is […]%, whereas the average for the Uralic languages is […]%. Almost all languages attain […]% recall at […] characters. Table 1 also includes the recall figures attained by the widely used method described in [6].

Table 2 is a confusion matrix showing the kinds of mistakes made when identifying most of the Finnic languages in the […]-character tests. Notable problem pairs are Finnish (fin) and Tornedalen Finnish (fit), Tornedalen Finnish (fit) and Kven Finnish (fkv), as well as Ludian (lud) and Livvi-Karelian (olo). The Sami languages are not as easily confused, and no table is presented for them.
⁴ http://www.sil.org/iso639-3/
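The idea of scoring a text by the relative frequencies of its character n-grams can be illustrated with a minimal sketch. It is deliberately simplified: it uses only character n-grams of lengths 1 to 3 and a fixed floor value for unseen n-grams, and it omits the token features and token-based backoff of the project's identifier. The function names, the maximum n-gram length and the floor value are hypothetical choices made for illustration, not the project's implementation.

```python
from __future__ import annotations

import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping character n-grams of `text`, padded with one space on each side."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_model(text: str, max_n: int = 3) -> dict[str, float]:
    """Map each character n-gram (n = 1..max_n) to its relative frequency
    among the n-grams of the same length in the training text."""
    model: dict[str, float] = {}
    for n in range(1, max_n + 1):
        counts = Counter(char_ngrams(text.lower(), n))
        total = sum(counts.values())
        for gram, count in counts.items():
            model[gram] = count / total
    return model


def score(text: str, model: dict[str, float], max_n: int = 3, floor: float = 1e-6) -> float:
    """Sum of log relative frequencies; n-grams unseen in training get a small floor value."""
    return sum(
        math.log(model.get(gram, floor))
        for n in range(1, max_n + 1)
        for gram in char_ngrams(text.lower(), n)
    )


def identify(text: str, models: dict[str, dict[str, float]]) -> str:
    """Return the language code whose model gives `text` the highest score."""
    return max(models, key=lambda lang: score(text, models[lang]))


# Toy models: "the house is big and beautiful" in Finnish and in Hungarian.
models = {
    "fin": train_model("talo on iso ja kaunis"),
    "hun": train_model("a ház nagy és szép"),
}
print(identify("talo on iso", models))  # -> fin
```

With models trained on real corpora, very short sequences from closely related languages such as fin and fit share most of their n-grams, so their scores lie close together; that is the situation in which the confusions discussed above arise.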
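[Table 1: recall (%) per test length in characters, for the SUKI identifier (left) and the Cavnar and Trenkle (CT) identifier (right). Rows (ISO 639-3): ekk, enf, enh, fin, fit, fkv, hun, izh, kca, koi, kpv, krl, liv, lud, mdf, mhr, mns, mrj, myv, nio, olo, sel, sjd, sju, sma, sme, smj, smn, sms, udm, vep, vot, vro, yrk, Average. Values not preserved in this copy.]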
Table 1: Recalls of Uralic languages obtained by the two language identifiers for test lengths between […] and […] characters. Percentages are averages over […] sample sequences of each length. The figures on the left are for the identifier developed within the project, and the figures on the right are for an identifier using the well-known method of Cavnar and Trenkle.
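The evaluation procedure described in Section 2, drawing random test sequences that always start at the beginning of a word and averaging recall over them for each length, can be sketched as follows. The sample count, the random seed and the function names are hypothetical; `identify` stands for any callable that maps a text to an ISO 639-3 code.

```python
from __future__ import annotations

import random
from typing import Callable


def word_start_samples(text: str, length: int, count: int, seed: int = 0) -> list[str]:
    """Draw `count` random substrings of exactly `length` characters that begin at a word start."""
    starts = [
        i for i, ch in enumerate(text)
        if not ch.isspace()
        and (i == 0 or text[i - 1].isspace())
        and i + length <= len(text)
    ]
    if not starts:
        raise ValueError("text has no word start with enough characters after it")
    rng = random.Random(seed)
    return [text[s:s + length] for s in (rng.choice(starts) for _ in range(count))]


def recall(samples: list[str], gold: str, identify: Callable[[str], str]) -> float:
    """Fraction of samples whose identified language equals the gold language."""
    hits = sum(1 for sample in samples if identify(sample) == gold)
    return hits / len(samples)


def recall_by_length(test_text: str, gold: str, identify: Callable[[str], str],
                     lengths: list[int], count: int = 1000) -> dict[int, float]:
    """Average recall for each test length, i.e. one row of a recall table."""
    return {n: recall(word_start_samples(test_text, n, count), gold, identify) for n in lengths}
```

Fixing the seed makes the sampled sequences reproducible, so two identifiers (such as the two columns of Table 1) can be compared on exactly the same test sequences.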
[Table 2: rows (correct language) and columns (identified language): ekk, fin, fit, fkv, izh, krl, lud, olo, vep, vot, vro. Values not preserved in this copy.]
Table 2: Confusion matrix of Finnic languages. The Finnic languages were also mistaken for other languages; if the figures for the other languages were added to the table, each row would add up to […].
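A confusion matrix like Table 2 can be computed from (correct, identified) language pairs as row percentages. The sketch below, whose function names are hypothetical, also shows why a row restricted to the displayed languages can sum to less than 100: the percentages for languages outside the table are simply not shown.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


def confusion_matrix(pairs: Iterable[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Map each correct language to the percentage of its samples identified as each language."""
    counts: defaultdict[str, Counter[str]] = defaultdict(Counter)
    for gold, predicted in pairs:
        counts[gold][predicted] += 1
    return {
        gold: {pred: 100.0 * n / sum(row.values()) for pred, n in row.items()}
        for gold, row in counts.items()
    }


def visible_row_sum(matrix: dict[str, dict[str, float]], gold: str, shown: set[str]) -> float:
    """Sum of one row over the displayed columns only."""
    return sum(p for pred, p in matrix[gold].items() if pred in shown)
```

For example, if one of four fit samples is identified as an unrelated language that is not a column of the table, the visible fit row sums to 75 rather than 100.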
Table 3 shows the confusions between most of the Uralic languages written in Cyrillic script. Forest Enets seems to dominate over Tundra Enets, as does Komi-Permyak over Komi-Zyrian. The language model for Komi-Permyak is based primarily on Wikipedia, and the one for Komi-Zyrian on a Bible translation. The test material for Komi-Zyrian is also from the Bible, which should in fact make it easier to identify, but nevertheless […] of the character extracts are identified as Komi-Permyak.
[Table 3: rows (correct language) and columns (identified language): enf, enh, kca, koi, kpv, mdf, mhr, mrj, myv, udm. Values not preserved in this copy.]
Table 3: Confusion matrix of some of the Uralic languages written in Cyrillic script. As with Table 2, the rows in this table would add up to […] if the figures for all the languages were shown.
3 Crawling the National Domains

In order to crawl for pages written in small Uralic languages, we use Heritrix [13], a web archiving system developed by the Internet Archive⁵. We chose Heritrix after considering several available crawlers. Heritrix is the outcome of many years of development by the Internet Archive, and it is still being maintained. It began as a cooperation between the Internet Archive and the Nordic national libraries, and it is still used by several national libraries around the world to collect national web archives. It has also been used successfully for collecting similar corpora by [2]. The goal of the Internet Archive is to archive the sites as usable collections for future generations. In this project, we are only interested in collecting the textual material in the small Uralic languages.

The version we are currently using downloads all text files as well as the PDF files it finds within the domain in question. We have made some custom changes to the code of the crawler so that when a file has been downloaded, the running text is extracted from it. The crawler sends an excerpt of […] characters from the middle of the text to the language identifier, which responds with the ISO code of the language of the text. If the language is one of the Uralic languages we are interested in, the crawler sends the whole text to be re-identified. If this identification still points to a small Uralic language, the whole text of the page is archived. The address and the identification results of all crawled pages, including the rejected ones, are stored.

Figure 1: A diagram showing how Uralic web pages are processed during a crawl.

We have chosen to start collecting material by crawling the national domains most likely to contain material written in small Uralic languages, i.e. .fi, .ee, .no, .ru and .se. Table 4 shows the statistics for each of the five national domain crawls. The first column, "URLs", indicates the total number of files downloaded during the crawl. The second column, "LI URLs", gives the number of pages identified during the crawl as containing small Uralic languages. The third column is the number of pages still identified as Uralic after a more precise language identification, which was done after the crawl. The fourth column, "domains", indicates how many subdomains were crawled (for the Russian crawl, "domains" counts only the top-level domains, as our crawling tactic had changed when the crawl started). The fifth and sixth columns indicate the number of subdomains in Uralic languages before and after the more precise identification.

⁵ http://www.archive.org
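The two-stage check described above (identify a short excerpt from the middle of the text, re-identify the whole text only if the excerpt looks like a small Uralic language, and log every decision) can be sketched as follows. This is not the project's modification of Heritrix, which runs inside the Java-based crawler; the excerpt size of 100 characters is a hypothetical stand-in for the value used in the project, the names are illustrative, and the target set simply lists the Table 1 languages minus Hungarian, Finnish and Estonian.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Small Uralic languages (ISO 639-3) from Table 1, excluding hun, fin and ekk,
# which are outside the project's scope.
TARGET = {
    "enf", "enh", "fit", "fkv", "izh", "kca", "koi", "kpv", "krl", "liv", "lud",
    "mdf", "mhr", "mns", "mrj", "myv", "nio", "olo", "sel", "sjd", "sju", "sma",
    "sme", "smj", "smn", "sms", "udm", "vep", "vot", "vro", "yrk",
}


@dataclass
class CrawlLog:
    """Identification result of every crawled page, plus the archived texts."""
    entries: list[tuple[str, str, bool]] = field(default_factory=list)  # (url, language, archived)
    archived: dict[str, str] = field(default_factory=dict)              # url -> full text


def middle_excerpt(text: str, size: int) -> str:
    """Return `size` characters from the middle of `text` (the whole text if it is shorter)."""
    start = max(0, (len(text) - size) // 2)
    return text[start:start + size]


def process_page(url: str, text: str, identify: Callable[[str], str],
                 log: CrawlLog, excerpt_size: int = 100) -> bool:
    """Archive the page only if both the excerpt and the whole text are identified as target languages."""
    language = identify(middle_excerpt(text, excerpt_size))
    keep = False
    if language in TARGET:
        language = identify(text)  # second pass on the whole text
        keep = language in TARGET
    if keep:
        log.archived[url] = text
    log.entries.append((url, language, keep))  # rejected pages are logged too
    return keep
```

Identifying only a short excerpt first keeps the common case (a page in a large language) cheap, while the whole-text pass filters out pages whose middle happened to resemble a small Uralic language.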
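[Table 4 columns: URLs, LI URLs, LI URLs after the more precise identification, domains, LI domains, LI domains after the more precise identification. Rows: fi, se, no, ru, ee. Values not preserved in this copy.]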
Table 4: Statistics for the crawls of the five national domains.
In the following paragraphs we make a few notes on the individual national domain crawls. We will be doing new crawls for all of them, as most of the crawls ended before the domains were really exhausted. It is actually far from trivial to define when a national domain has been exhausted. Many sites dynamically generate an infinite number of web pages and even subdomains, which makes each national domain infinite in size if we count pages or subdomains. Currently we have set the crawler to accept only up to […] pages per top domain. Even this does not allow us the luxury of simply waiting for the queued URLs to be exhausted, as some sites are very slow to serve their pages, and waiting for them to reach the page limit could take months or even years. We are now trying to determine whether the speed of the crawl could be used as an additional indication of domain exhaustion. We could consider, for example, that the domain is exhausted if the hourly average speed drops below […]% of the average speed during the first week of the crawl. As we have not yet settled on our criteria for exhaustion, the current figures cannot really be compared with each other and do not give a realistic picture of the size of the national domains.
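The two stopping rules discussed above, a per-domain page cap and treating a domain as exhausted once the hourly speed falls below a fraction of the first week's average, could be expressed as below. Both the cap value and the 10% threshold are hypothetical, since the exhaustion criteria are still open; the hourly speeds are assumed to be available as a list of pages-per-second averages, one per hour since the crawl started.

```python
from __future__ import annotations

from collections import Counter

HOURS_PER_WEEK = 7 * 24


def is_exhausted(hourly_speeds: list[float], threshold: float = 0.1) -> bool:
    """True if the latest hourly average speed (pages/s) is below `threshold`
    times the mean hourly speed of the first week of the crawl."""
    if len(hourly_speeds) <= HOURS_PER_WEEK:
        return False  # the first week is still the baseline period
    baseline = sum(hourly_speeds[:HOURS_PER_WEEK]) / HOURS_PER_WEEK
    return hourly_speeds[-1] < threshold * baseline


class PerDomainCap:
    """Accept at most `limit` pages for each top domain."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.seen: Counter[str] = Counter()

    def admit(self, domain: str) -> bool:
        """Count the page and return True, or return False once the domain's cap is reached."""
        if self.seen[domain] >= self.limit:
            return False
        self.seen[domain] += 1
        return True
```

A speed-based rule alone would also fire when a crawl slows down for unrelated reasons (for example disk or network problems), so in practice it would be one signal among several rather than a definitive test.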
Finnish .fi domain

In the crawl of the Finnish internet we downloaded around […] million files. The Finnish crawl was terminated because the crawler was running out of disk space and the speed had slowed down to around […] pages per second. The average speed for the first week of the crawl was […] pages per second, so we could consider the crawl exhausted. The Finnish language model used in the language identifier is derived from the Finnish Wikipedia, which is mostly written in the official standard form of written Finnish. However, many people do write texts using the forms of their respective dialects. The written forms of Tornedalen Finnish, Kven Finnish and Ingrian are much closer to these written Finnish dialects than the official written Finnish is. This creates a problem, as a great number of texts in dialectal Finnish are identified as one of these three languages. This could be corrected by creating separate language models for written dialects of Finnish.
Swedish .se domain

In the crawl of the Swedish internet we downloaded around […] million files, and the crawl was terminated after its speed had slowed to around […] pages per second, which is well below […]% of the […] pages per second average for the first week of the crawl. In this crawl, the library systems localized for Northern Sami turned out to be a problem. The crawler found over […] domains dedicated to library systems, the largest of them, bibliotek.nora.se, with […] pages in Northern Sami. Not only do the library catalogue links expire quickly, they usually include the same text over and over again. We will have to incorporate a duplicate-checking mechanism before creating the link collections and corpora, in order to avoid collecting the same text many times. Some methods for removing duplicates are introduced in [2] and [3].
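One simple way to implement the duplicate check mentioned above is to drop pages whose normalized text has been seen before and, for near-duplicates such as catalogue pages that differ only slightly, to compare sets of word 3-grams (shingles) with Jaccard similarity. This is a generic technique sketched under our own assumptions (the 0.8 threshold, the shingle size and the class name are hypothetical), not necessarily one of the methods of [2] or [3].

```python
from __future__ import annotations

import hashlib
import re


def normalize(text: str) -> str:
    """Lower-case and collapse all whitespace runs to single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of word k-grams; texts shorter than k words yield a single shingle."""
    words = normalize(text).split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


class DuplicateFilter:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.digests: set[str] = set()
        self.kept: list[set[tuple[str, ...]]] = []

    def is_new(self, text: str) -> bool:
        """Return True (and remember the text) unless it duplicates a kept text."""
        digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        if digest in self.digests:
            return False  # exact duplicate after normalization
        current = shingles(text)
        if any(jaccard(current, seen) >= self.threshold for seen in self.kept):
            return False  # near-duplicate of an earlier page
        self.digests.add(digest)
        self.kept.append(current)
        return True
```

The pairwise comparison grows linearly with the number of kept pages; for a whole national crawl, an index such as MinHash with locality-sensitive hashing would be needed instead.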
Norwegian .no domain, Russian .ru domain and Estonian .ee domain

All three crawls ended in problems with either the software, the hardware or the crawl strategy used. The national domains were far from exhausted by any criterion we have considered. As most of the pages written in Uralic languages were found in our crawl of the Norwegian internet, we include the statistics for these crawls in Tables 4 and 5.
4 The link collection

The link collection available at the time of writing has been curated by hand from the pages of the crawl⁶. It contains links to […] sites from which text was found in […] of the […] small Uralic languages searched. The links have not been verified by experts or native speakers. We are planning to incorporate a simple crowdsourcing platform to get feedback from those who are more familiar with the languages. The links lead to the actual pages currently found on the internet, so some of the links will certainly break as time passes. We will not remove the broken links completely from the database, but move them elsewhere and, if possible, make links to the corresponding pages in the Internet Archive. Our goal is to make the creation of the link collection as automated as possible, avoiding manual link curation. The list of sites from the Finnish crawl available at the moment includes only the front page of some sites, although more pages were found during the crawl. In the future, all the links found when crawling will be in the list of links. The greatest problems will arise from pages that are written in a correctly identified language but are near-duplicates of other pages, as in the case of the Swedish library systems mentioned above.

⁶ http://suki.ling.helsinki.fi/sites
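The planned handling of broken links, moving them out of the live collection instead of deleting them and pointing them to an Internet Archive copy where possible, might look like the sketch below. The liveness check is passed in as a function so that any HTTP client can be used. The Wayback Machine URL pattern `https://web.archive.org/web/<timestamp>/<url>` is the public one, but whether a capture exists for a given page and date is an assumption that would need checking. The class and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

WAYBACK_PREFIX = "https://web.archive.org/web/"


@dataclass
class LinkCollection:
    live: dict[str, str] = field(default_factory=dict)                # url -> ISO 639-3 code
    broken: dict[str, tuple[str, str]] = field(default_factory=dict)  # url -> (code, archive url)

    def refresh(self, is_alive: Callable[[str], bool], crawl_date: str) -> None:
        """Move dead links to `broken`, attaching a Wayback Machine link for the crawl date (YYYYMMDD)."""
        for url, lang in list(self.live.items()):  # copy, since entries are deleted while iterating
            if not is_alive(url):
                del self.live[url]
                self.broken[url] = (lang, f"{WAYBACK_PREFIX}{crawl_date}/{url}")
```

Keeping the language code with each broken link means the archived copy can still be listed under the right language in the collection.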
5 Sentence corpora

When creating sentence corpora, one of the greatest problems we currently face is that many of the downloaded pages are multilingual. We are currently surveying methods for language identification in multilingual documents, and in future we will incorporate a multilingual detection method into the system. We performed a separate language identification for every line of every file containing small Uralic languages, in order to see which lines are indeed written in the language indicated by the identification of the file as a whole. The first column of Table 5 shows the number of unique lines identified as written in the respective language. The second and third columns show the total number of words and characters in these lines. Even though the language identification used is state of the art, it is far from perfect. The collections have not been checked by experts in the corresponding languages, but some things are clear even to a layman. The three smallest collections, Selkup, Nganasan and Tundra Enets, do not actually contain the intended language at all; for Nganasan and Tundra Enets they mostly consist of some sort of lists of model numbers in Cyrillic. The Selkup collection consists of pages frequented by a word that is very frequent in the training text used for Selkup⁷ ('article', "Статья"). The Ingrian collection contains mostly hyphenated or otherwise broken Finnish, or Finnish dialects written as spoken. Southwestern Finnish dialects in particular seem to be identified as Ingrian, while Northern Finnish dialects are identified as either Kven or Tornedalen Finnish. In order to fix these problems with dialectal Finnish, we will try to include separate language models for dialectal Finnish in the future. Some long lists of names from the Swedish crawl have been identified as Ludian and are now polluting the collection for that language. The Khanty collection is polluted by long lists of model numbers written in Cyrillic script.

⁷ www.yamalchild.ru/docs/konv_selkup.doc
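The line-level filtering behind Table 5 can be sketched as follows: every line of a file is identified separately, only lines whose language matches the language of the whole file are kept, duplicates are removed, and unique lines, words and characters are counted. Splitting words on whitespace and counting characters including spaces are our assumptions, since the counting conventions are not specified; the function name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def line_statistics(files: Iterable[tuple[str, str]],
                    identify: Callable[[str], str]) -> dict[str, tuple[int, int, int]]:
    """files: (language of the whole file, file text) pairs.
    Returns language -> (unique lines, words, characters) over the lines
    that are themselves identified as that language."""
    kept: defaultdict[str, set[str]] = defaultdict(set)
    for file_lang, text in files:
        for line in text.splitlines():
            line = line.strip()
            if line and identify(line) == file_lang:
                kept[file_lang].add(line)  # the set keeps each line only once
    return {
        lang: (
            len(lines),
            sum(len(ln.split()) for ln in lines),
            sum(len(ln) for ln in lines),
        )
        for lang, lines in kept.items()
    }
```

Because the same identifier is used at both levels, errors such as lists of model numbers identified as Nganasan survive this filter; line-level identification removes foreign-language lines from a file but cannot correct a wrong model.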
[Table 5 columns: unique lines, words, characters. Rows: Northern Sami (sme), Võro (vro), Ingrian (izh), Eastern Mari (mhr), Western Mari (mrj), Southern Sami (sma), Udmurt (udm), Erzya (myv), Lule Sami (smj), Inari Sami (smn), Tornedalen Finnish (fit), Moksha (mdf), Komi-Zyrian (kpv), Skolt Sami (sms), Livvi (olo), Liv (liv), Kven Finnish (fkv), Ludian (lud), Khanty (kca), Veps (vep), Komi-Permyak (koi), Karelian (krl), Mansi (mns), Votic (vot), Kildin Sami (sjd), Ume Sami (sju), Nenets (yrk), Selkup (sel), Nganasan (nio), Tundra Enets (enh). Values not preserved in this copy.]

Table 5: The number of lines, words and characters in small Uralic languages after identifying the language of each individual line.

6 Future work

Language identification methods will be further developed in order to improve the robustness of the language identifier we use. We will also try to enhance the language models in order to distinguish small languages from various dialects more efficiently, and to identify languages in multilingual documents. The material found during the crawls already performed will be of assistance in this. We will furthermore try to increase the speed of the crawler in order to crawl more widely and more often. The most important national domains with regard to Uralic language speakers will be recrawled in more depth and more frequently. We also intend to look into crawling the .com and .org domains. We would also like to extract text from binary files other than PDFs.

Acknowledgments

We are thankful for the support of the Kone Foundation, and to Jack Rueter for sharing his invaluable resources in Finno-Ugric languages. We also thank the anonymous reviewers for their suggestions and references.
References

[1] Kone Foundation. The language programme. http://www.koneensaatio.fi/en
[2] M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation.
[3] Vladimír Benko. Aranea: Yet another family of (comparable) web corpora. In TSD: Proceedings of the International Conference on Text, Speech and Dialogue, Brno, Czech Republic.
[4] Tommi Jauhiainen. Tekstin kielen automaattinen tunnistaminen [Automatic language identification of text]. Master's thesis, University of Helsinki, Helsinki.
[5] Tommi Jauhiainen and Krister Lindén. Identifying the language of digital text. In review, submitted.
[6] William B. Cavnar and John M. Trenkle. N-gram-based text categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, Las Vegas.
[7] Erik Tromp and Mykola Pechenizkiy. Graph-based n-gram language identification on short texts. In Benelearn: Proceedings of the Twentieth Belgian Dutch Conference on Machine Learning, The Hague.
[8] John Vogel and David Tresner-Kirsch. Robust language identification in short, noisy texts: Improvements to LIGA. In The Third International Workshop on Mining Ubiquitous and Social Environments, Bristol.
[9] Josh King and Jon Dehdari. An n-gram based language identification system. The Ohio State University.
[10] Ralf D. Brown. Selecting and weighting n-grams to identify languages. In Text, Speech and Dialogue: International Conference, TSD, Pilsen, Czech Republic, Proceedings.
[11] Tommi Vatanen, Jaakko J. Väyrynen, and Sami Virpioja. Language identification of short text segments with n-gram models. In LREC, Seventh International Conference on Language Resources and Evaluation, Malta.
[12] M. Paul Lewis, Gary F. Simons, and Charles D. Fennig, editors. Ethnologue: Languages of the World, seventeenth edition. SIL International, Dallas, Texas.
[13] Gordon Mohr, Michael Stack, Igor Ranitovic, Dan Avery, and Michele Kimpton. Introduction to Heritrix. In International Web Archiving Workshop, Bath.