https://helda.helsinki.fi

The Finno-Ugric Languages and the Internet project

Jauhiainen, Tommi

Septentrio Academic Publishing 2015-01-15

Jauhiainen, H, Jauhiainen, T & Lindén, K 2015, The Finno-Ugric Languages and the Internet project. in T Pirinen, F Tyers & T Trosterud (eds), First International Workshop on Computational Linguistics for Uralic Languages: Proceedings of the Workshop. vol. 2, Septentrio Conference Series, no. 2, vol. 2015, Septentrio Academic Publishing, Tromsø, pp. 87–98, International Workshop on Computational Linguistics for Uralic Languages, Tromsø, Norway, 16/01/2015. https://doi.org/10.7557/scs.2015.2 http://hdl.handle.net/10138/159402

Downloaded from Helda, University of Helsinki institutional repository. This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

Proceedings of 1st International Workshop in Computational Linguistics for Uralic Languages (IWCLUL 2015); ‹http://dx.doi.org/10.7557/scs.2015.2›

The Finno-Ugric Languages and The Internet project

Heidi Jauhiainen
University of Helsinki, Department of Modern Languages
heidi.jauhiainen@helsinki.fi

Tommi Jauhiainen
University of Helsinki, Department of Modern Languages
tommi.jauhiainen@helsinki.fi

Krister Lindén
University of Helsinki, Department of Modern Languages
krister.linden@helsinki.fi

December […]

Abstract

This paper describes a Kone Foundation funded project called "The Finno-Ugric Languages and The Internet" together with some of the achieved results. The main activity of the project is to crawl the internet and gather texts written in small Uralic languages. The sentences and words of the found texts will be assembled into a freely available corpus. Crawling is done using the open source crawler Heritrix, which is developed by the Internet Archive. Heritrix crawls through the pages and passes the found texts to a language identifier. We are using a state of the art language identifier, which has been further developed within the project and has been evaluated using […] languages. We describe the language identification evaluation results concerning the 34 Uralic languages known by the language identifier. We also describe the initial observations and results from the first five large crawls, which were done in the national internet domains of Finland, Sweden, Norway, Russia and Estonia.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by-nd/4.0/

3.7. The Finno-Ugric Languages and The Internet project [page 87 of 131] ‹http://dx.doi.org/10.7557/5.3471›

1 Introduction

The "Finno-Ugric Languages and The Internet" project¹ started at the beginning of […] as part of the Kone Foundation Language Programme [1]. The project is located at the Department of Modern Languages² at the University of Helsinki and is part of the international CLARIN³ cooperation. The main goal of the project is to build a prototype of a system that will crawl the internet and gather texts written in small Uralic languages. Web crawling has been used to collect text corpora for a variety of languages [2, 3], but to our knowledge this project is the first to collect texts in small Uralic languages. The largest Uralic languages, Hungarian, Finnish and Estonian, are outside the scope of the project.

We are using language identification software to detect the language of the crawled web pages. The gathered texts will be collected into sentence and word corpora for each language, and the links to the associated web pages into link collections. The corpora will act as a source for linguists and the link collections will hopefully spread the knowledge of the existence of relevant pages to interested parties. Due to the copyrights connected with longer texts, we are only publishing corpora of up to sentence-length text snippets with links to the original text on the internet. The negotiations for free use of copyrighted texts would be far beyond the resources of this project. We aim at making the complete workflow from the crawling to the creation of corpora as automated as possible.

In Section 2 we talk about the language identifier used in the project with respect to Uralic languages. Section 3 deals with the five large crawls done so far, i.e. in the national domains of Finland, Sweden, Norway, Russia and Estonia. In Sections 4 and 5 we describe the link collection and the sentence corpora, respectively.

¹ http://suki.ling.helsinki.fi
² http://www.helsinki.fi/modernlanguages/
³ http://clarin.eu

2 Language Identification for Uralic Languages

We are using an extended version of the language identifier described in [4]. The extended version of the language identification method will be described in a forthcoming journal article [5], where it is evaluated together with the methods presented in [6], [7], [8], [9], [10] and [11]. The language identifier uses relative frequencies of n-grams of characters together with tokens and token-based backoff.

The evaluated language identifier recognizes […] languages from all around the world. The definition of a language is taken from Ethnologue [12] and the division of the languages is as in the ISO 639-3 standard⁴. The […] languages include 34 Uralic languages: Hungarian, Khanty, Mansi, Estonian, Finnish, Kven Finnish, Tornedalen Finnish, Ingrian, Karelian, Liv, Livvi-Karelian, Ludian, Veps, Votic, Võro, Hill Mari, Meadow Mari, Erzya, Moksha, Udmurt, Komi-Permyak, Komi-Zyrian, Inari Sami, Kildin Sami, Skolt Sami, Ume Sami, Lule Sami, North Sami, South Sami, Nenets, Nganasan, Forest Enets, Tundra Enets and Selkup. The current version of the language identifier only knows one orthography per language, but this will be corrected in future versions.

The evaluation of the language identifier was done in tests using sequences from […] to […] characters in length, and the recall figures for Uralic languages can be seen in Table 1. The character sequences are random parts of the test corpus, always beginning from the beginning of a word. For most of the languages, the test set consists of the text of the Universal Declaration of Human Rights in that particular language and the training set is the text from Wikipedia. The aim was that the test set would always be a text from a different domain than the training text. This was easily possible for most of the larger languages, but quite difficult for some of the smaller Uralic languages. In some cases, such as Forest and Tundra Enets, the test set is from a different section of the same document as the training text. The amount of training material differs considerably between languages, ranging from […] characters in Ume Sami to over […] million characters in the Hungarian material.

The average identification accuracy for Uralic languages is generally slightly lower than for all languages. This is due to some of the languages being very close varieties of each other, especially within the Finnic languages. The language identifier has not been optimized to perform better with Uralic languages or even with closely related languages. In the test length of […] characters, the overall average is […], whereas the average is […] for the Uralic languages. Almost all languages attain […] recall at […] characters. Table 1 also includes the recall figures attained by the widely used method described in [6].

Table 2 is a confusion matrix showing the kind of mistakes that were made in the language identifications between most of the Finnic languages in the […]-character sized tests. Notable problem pairs are those of Finnish (fin) and Tornedalen Finnish (fit), Tornedalen Finnish (fit) and Kven Finnish (fkv), as well as Ludian (lud) and Livvi-Karelian (olo). The Samic languages are not as easily confused and no table is presented for them.

⁴ http://www.sil.org/iso639-3/
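As a rough illustration of identification by relative character n-gram frequencies, the sketch below trains one trigram model per language and picks the language whose model gives a text the highest summed log relative frequency. It is not the project's identifier: tokens and token-based backoff are left out, the n-gram length and the penalty for unseen n-grams are arbitrary assumptions, and the names `ngrams`, `train`, `identify` and the toy training strings are hypothetical.

```python
import math
from collections import Counter

N = 3            # n-gram length (an assumption made for this sketch)
PENALTY = 1e-7   # relative frequency assumed for unseen n-grams (arbitrary)


def ngrams(text, n=N):
    """Return the character n-grams of a lower-cased, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(corpora):
    """Map each language code to the relative frequencies of its n-grams."""
    models = {}
    for lang, text in corpora.items():
        counts = Counter(ngrams(text))
        total = sum(counts.values())
        models[lang] = {gram: c / total for gram, c in counts.items()}
    return models


def identify(text, models):
    """Return the language whose model gives the highest log-score."""
    def score(model):
        return sum(math.log(model.get(g, PENALTY)) for g in ngrams(text))
    return max(models, key=lambda lang: score(models[lang]))


# Toy training strings, invented for illustration and not vetted by speakers.
models = train({
    "fin": "tämä on suomenkielinen lause ja se on kirjoitettu suomeksi",
    "sme": "dát lea davvisámegiel cealkka ja dat lea čállojuvvon sámegillii",
})
print(identify("se on suomeksi", models))
```

With so little training text the models are only useful for showing the mechanics; closely related languages need far more material and, as the section notes, are still easily confused.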


[Table 1 body: the recall figures are not recoverable. Column groups: SUKI, […] char; CT, […] char. Rows: ekk, enf, enh, fin, fit, fkv, hun, izh, kca, koi, kpv, krl, liv, lud, mdf, mhr, mns, mrj, myv, nio, olo, sel, sjd, sju, sma, sme, smj, smn, sms, udm, vep, vot, vro, yrk, Average.]

Table 1: Recalls of Uralic languages obtained by the two language identifiers for test lengths between […] and […] characters. Percentages are averages over […] sample sequences of each length. The figures on the left are for the identifier developed within the project and the figures on the right are for an identifier using the well-known method of Cavnar & Trenkle.
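The evaluation procedure behind Table 1 (random fixed-length character sequences that always start at the beginning of a word, with recall averaged over the samples) can be sketched as follows. This is a minimal reading of the description, not the authors' evaluation code; the function names, the seed, the sample sizes and the stand-in identifier are all assumptions.

```python
import random


def sample_sequences(corpus, length, count, seed=0):
    """Draw `count` substrings of `length` characters that start at a word start."""
    rng = random.Random(seed)
    # A word start is position 0 or a non-space character preceded by a space.
    starts = [i for i in range(len(corpus) - length + 1)
              if not corpus[i].isspace() and (i == 0 or corpus[i - 1].isspace())]
    if not starts:
        raise ValueError("no word start leaves room for the requested length")
    return [corpus[s:s + length] for s in rng.choices(starts, k=count)]


def recall(samples, true_lang, identify):
    """Percentage of samples for which `identify` returns `true_lang`."""
    hits = sum(1 for s in samples if identify(s) == true_lang)
    return 100.0 * hits / len(samples)


corpus = "kaikki ihmiset syntyvät vapaina ja tasavertaisina arvoltaan ja oikeuksiltaan"
samples = sample_sequences(corpus, length=10, count=5)
# A stand-in identifier that always answers "fin" (placeholder, not a real model).
print(samples, recall(samples, "fin", lambda text: "fin"))
```

Sequences may end in the middle of a word, which matches the description: only the starting point is constrained.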


[Table 2 body: the figures are not recoverable. Rows and columns: ekk, fin, fit, fkv, izh, krl, lud, olo, vep, vot, vro.]

Table 2: Confusion matrix of Finnic languages. The Finnic languages were also mistaken for other languages, and if the figures for the other languages were added to the table, the rows would add up to […].

Table 3 shows the confusions between most of the Uralic languages written in Cyrillic script. Forest Enets seems to dominate over Tundra Enets, as does Komi-Permyak over Komi-Zyrian. The language model for Komi-Permyak is based primarily on Wikipedia and the one for Komi-Zyrian on a Bible translation. The test material for Komi-Zyrian is also from the Bible, which should in fact make it easier to identify, but nevertheless […] of the […]-character extracts are identified as Komi-Permyak.

[Table 3 body: the figures are not recoverable. Rows and columns: enf, enh, kca, koi, kpv, mdf, mhr, mrj, myv, udm.]

Table 3: Confusion matrix of some of the Uralic languages written in Cyrillic script. As with Table 2, the rows in this table would add up to […] if the figures for all the […] languages were shown.
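A confusion matrix of the kind shown in Tables 2 and 3 can be derived from (true language, identified language) pairs. The sketch below assumes that each row is expressed as percentages of that language's test samples, which is one plausible reading of the captions rather than something the source states; the pairs are invented and `confusion_matrix` is a hypothetical name.

```python
from collections import Counter, defaultdict


def confusion_matrix(pairs):
    """Return {true_lang: {predicted_lang: percentage of that row}}."""
    counts = defaultdict(Counter)
    for true_lang, predicted in pairs:
        counts[true_lang][predicted] += 1
    matrix = {}
    for true_lang, row in counts.items():
        total = sum(row.values())
        matrix[true_lang] = {pred: 100.0 * c / total for pred, c in row.items()}
    return matrix


# Invented (true, predicted) pairs; they do not reproduce the paper's figures.
pairs = [("fin", "fin"), ("fin", "fit"), ("fin", "fin"), ("fin", "fin"),
         ("fit", "fkv"), ("fit", "fit")]
matrix = confusion_matrix(pairs)
print(matrix["fin"])
```

Under this assumption a row only sums to the full total when every predicted language is kept; restricting the columns to one language group, as the printed tables do, drops the samples that were mistaken for languages outside the group.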

3 Crawling the National Domains

In order to crawl for pages written in small Uralic languages, we use Heritrix [13], a web archiving system developed by the Internet Archive⁵. We chose to use Heritrix after considering several available crawlers. Heritrix is the outcome of many years of development by the Internet Archive and it is still being maintained. In the beginning, it was a product of cooperation between the Internet Archive and the Nordic national libraries, and it is still used by several national libraries around the world to collect national web archives. It has also been successfully used for collecting similar corpora by […].

The goal of the Internet Archive is to archive the sites as usable collections for future generations. In this project, we are only interested in collecting the textual material in the small Uralic languages. The version we are currently using downloads all text files as well as pdf files it finds from within the domain in question. We have made some custom changes to the code of the crawler so that when a file has been downloaded, the running text is extracted from it. The crawler sends an excerpt of […] characters from the middle of the text to the language identifier, which responds with the ISO code of the language of the text. If the language is one of the Uralic languages we are interested in, the crawler sends the whole text to be re-identified. If this identification still points to a small Uralic language, the whole text of the page is archived. The address and the identification results of all crawled pages, including the ones rejected, are stored.

Figure 1: A diagram showing how Uralic web pages are processed during a crawl.

We have chosen to start collecting the material by crawling the national domains most likely to contain material written in small Uralic languages, i.e. .ee, .fi, .no, .ru and .se. Table 4 shows the statistics for each of the five national domain crawls. The first column "URLs" indicates the total number of downloaded files during the crawl and the fourth column "domains" indicates how many subdomains were crawled. For the Russian crawl, "domains" numbers only the top-level domains, as our crawling tactic had changed when the crawl started. The second column "LI URLs" gives the number of pages identified to contain small Uralic languages during the crawl. The third column "LI[…] URLs" is the number of pages still identified as Uralic after a more precise language identification, which was done after the crawl. The fifth and sixth columns indicate the number of subdomains in Uralic languages before and after the more precise identification.

⁵ http://www.archive.org

[Table 4 body: the figures are not recoverable. Columns: URLs, LI URLs, LI[…] URLs, domains, LI domains, LI[…] domains. Rows: .fi, .se, .no, .ru, .ee.]

Table 4: Statistics for the crawls of the five national domains.
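The two-stage check performed for each downloaded page in Section 3 (an excerpt from the middle of the text first, the whole text only for candidates) can be sketched as below. This is not the modified Heritrix code: `process_page`, `middle_excerpt`, the excerpt length, the example subset of ISO 639-3 codes and the stub identifier are all hypothetical stand-ins.

```python
EXCERPT_LENGTH = 100  # hypothetical value; the project's figure is not reproduced here
TARGET_LANGUAGES = {"sme", "smn", "sms", "fit", "fkv", "olo"}  # example subset only


def middle_excerpt(text, length=EXCERPT_LENGTH):
    """Return up to `length` characters taken from the middle of the text."""
    start = max(0, (len(text) - length) // 2)
    return text[start:start + length]


def process_page(url, text, identify, log, archive):
    """Two-stage check: excerpt first, whole text only for candidate pages."""
    first = identify(middle_excerpt(text))
    second = identify(text) if first in TARGET_LANGUAGES else None
    # Results are stored for every page, including the rejected ones.
    log.append((url, first, second))
    if second in TARGET_LANGUAGES:
        archive[url] = text
        return True
    return False


def toy_identify(text):
    # Placeholder for the real language identifier.
    return "sme" if "á" in text else "swe"


log, archive = [], {}
process_page("http://example.se/a", "sámegiella " * 30, toy_identify, log, archive)
process_page("http://example.se/b", "svenska språket " * 30, toy_identify, log, archive)
print(sorted(archive), log)
```

Identifying a short excerpt first keeps the cost per page low, since most crawled pages are in the majority language and never reach the second stage.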

In the following paragraphs, we make a few notes on the individual national domain crawls. We will be doing new crawls for them all, as most of the crawls ended before the domains were really exhausted. It is actually far from trivial to define when we have exhausted a national domain. There are many sites that dynamically generate an infinite number of web pages and even subdomains, which makes each of the national domains infinite in size if we are calculating the number of pages or subdomains. Currently we have set the crawler to accept only up to […] pages per top-domain. Even this does not allow us the luxury to just wait for the exhaustion of the queued URLs, as some of the sites are very slow to serve the pages and waiting for them to reach the page limit could take months or even years. We are now trying to determine if the speed of the crawl could be used as an additional indication of domain exhaustion. We could consider, for example, that if the hourly average speed drops below […]% of the average speed of the first week of the crawl, the domain is exhausted. As we have not yet stabilized our criteria for exhaustion, the current figures cannot really be compared with each other and do not give a realistic picture of the size of the national domains.
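The speed-based exhaustion criterion considered above can be written down as a small check. The sketch assumes that speed is tracked as the number of pages downloaded per hour and that the threshold is passed in as a fraction; the function name `is_exhausted` and the example numbers are hypothetical, and the text itself notes that the criterion has not been settled.

```python
def is_exhausted(hourly_pages, threshold, baseline_hours=7 * 24):
    """True if the latest hourly speed is below `threshold` x first-week average.

    hourly_pages: pages downloaded in each hour since the crawl started.
    threshold: fraction of the first-week average, supplied by the caller.
    """
    if len(hourly_pages) <= baseline_hours:
        return False  # the first week is still the baseline period
    baseline = sum(hourly_pages[:baseline_hours]) / baseline_hours
    return hourly_pages[-1] < threshold * baseline


# Invented speeds: a steady first week followed by two slower hours.
hourly = [3600] * (7 * 24) + [1800, 300]
print(is_exhausted(hourly, threshold=0.1))
```

A single slow hour can be caused by one unresponsive site, so a practical version would probably look at several consecutive hours before stopping the crawl.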

Finnish .fi domain. In the crawl of the Finnish internet, we downloaded around […] million files. The Finnish crawl was terminated as the crawler was running out of disk space and the speed had slowed down to around […] pages per second. The average speed for the first week of the crawl was […] pages per second, so we could consider the crawl exhausted. The Finnish language model used in the language identifier is derived from the Finnish Wikipedia, which is mostly written in the official form of written Finnish. However, many people do write texts using the forms of their respective dialects. The written forms of Tornedalen Finnish, Kven Finnish and Ingrian are much closer to these written Finnish dialects than to official written Finnish. This creates a problem, as a great number of texts in dialectal Finnish are identified as these three languages. This could be corrected by creating separate language models for written dialects of Finnish.

Swedish .se domain. In the crawl of the Swedish internet, we downloaded around […] million files, and it was terminated after the speed of the crawl had slowed to around […] pages per second, which is well below […]% of the […] pages per second average for the first week of the crawl. In this crawl, the library systems which were localized for Northern Sami turned out to be a problem. Over […] domains dedicated to library systems were found by the crawler, the largest of them, bibliotek.nora.se, with […] pages in Northern Sami. Not only do the library catalogue links expire quickly, they usually include the same text over and over again. We will have to incorporate a double-checking mechanism before creating the link collections and corpora in order to avoid collecting the same text many times. Some methods for removing doubles are introduced in [2] and [3].
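The simplest form of such a double-checking mechanism is an exact-duplicate filter over normalized page text, sketched below. It is deliberately not one of the methods from the cited works, and it only catches texts that are identical after case and whitespace normalization; the near-doubles described here would need a similarity measure on top. The names `normalize`, `drop_duplicates` and the example URLs are hypothetical.

```python
import hashlib


def normalize(text):
    """Lower-case and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def drop_duplicates(pages):
    """Keep the first (url, text) pair for each distinct normalized text."""
    seen = set()
    unique = []
    for url, text in pages:
        digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((url, text))
    return unique


# Invented pages: two catalogue URLs serving the same localized text.
pages = [
    ("http://library.example/search?session=1", "Oza  girjjiid\nja artihkkaliid"),
    ("http://library.example/search?session=2", "oza girjjiid ja artihkkaliid"),
    ("http://library.example/about", "Girjerájus lea rabas"),
]
print([url for url, _ in drop_duplicates(pages)])
```

Storing only the digests keeps memory use small even when millions of pages are compared.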

Norwegian .no domain, Russian .ru domain and Estonian .ee domain. All three crawls ended in problems with either software, hardware or the crawl strategy used. The national domains were far from exhausted by any criteria we have considered. As most of the pages written in Uralic languages have been found in our crawl of the Norwegian internet, we are including the statistics for these crawls in Tables 4 and 5.

4 The link collection

The link collection that is available at the time of writing has been curated by hand from the pages of the .fi crawl⁶. It contains links to […] sites from which text was found in […] of the […] small Uralic languages searched. The links have not been verified by experts or native speakers. We are planning to incorporate a simple crowd-sourcing platform to be able to get feedback from those who are more familiar with the languages. The links lead to the actual pages currently found on the internet, so it is certain that some of the links will break as time passes. We will not remove the broken links completely from the database, but move them elsewhere and, if possible, make links to corresponding pages in the Internet Archive.

Our goal is to make the creation of the link collection as automated as possible, avoiding manual link curation. The list of sites from the Finnish crawl available at the moment includes only the front page of some sites, although more pages were found during the crawl. In the future, all the links found when crawling will be in the list of links. The greatest problems will arise from the pages which are written in a correctly identified language, but are near-doubles of other pages, as in the case of the Swedish library systems mentioned above.

⁶ http://suki.ling.helsinki.fi/sites
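The planned handling of broken links (moving them aside and pointing to the Internet Archive where possible) might look like the sketch below. It assumes the Wayback Machine address form `https://web.archive.org/web/<url>` and does not check whether a capture actually exists; `split_links`, `wayback_url` and the caller-supplied `is_alive` check are hypothetical names, not part of the project's database.

```python
def wayback_url(url):
    """Build a Wayback Machine address for a page (existence is not verified)."""
    return "https://web.archive.org/web/" + url


def split_links(links, is_alive):
    """Separate live links from broken ones; broken ones get an archive pointer."""
    live, moved = [], []
    for url in links:
        if is_alive(url):
            live.append(url)
        else:
            moved.append({"original": url, "archived": wayback_url(url)})
    return live, moved


# Invented links and a stub liveness check standing in for a real HTTP request.
links = ["http://example.fi/sami", "http://example.fi/gone"]
live, moved = split_links(links, is_alive=lambda url: "gone" not in url)
print(live, moved)
```

Keeping the original address next to the archive pointer means a link can be restored if the page comes back.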

5 Sentence corpora

When creating sentence corpora, one of the greatest problems we have at the moment is that many of the downloaded pages are multilingual. We are currently making a survey of the methods for language identification in multilingual documents and in the future we will incorporate a multilingual detection method in the system. We did a separate language identification for all the lines of all the files containing small Uralic languages in order to see which ones are indeed written in the language indicated by the identification of the file as a whole. The first column of Table 5 shows the number of unique lines identified as written in the respective language. Columns 2 and 3 show the total number of words and characters in these lines.

Even though the language identification used is state of the art, it is far from perfect. The collections have not been checked by experts in the corresponding languages, but some things are clear even to a layman. The three smallest collections, Selkup, Nganasan and Tundra Enets, do not actually contain the intended language at all. For Nganasan and Tundra Enets they are mostly some sort of lists of model numbers in Cyrillic. The Selkup collection consists of pages in which the word "Статья" ("article")⁷ occurs frequently; it is a very frequent word in the training text used for Selkup. The Ingrian collection contains mostly hyphenated or otherwise broken Finnish, or Finnish dialects written as spoken. Especially southwestern Finnish dialects seem to be identified as Ingrian. Northern Finnish dialects are identified as either Kven or Tornedalen Finnish. In order to fix these problems with dialectal Finnish, we will try to include separate language models for dialectal Finnish in the future. Some long lists of names from the Swedish crawl have been identified as Ludian and are now polluting the collection of the language. The Khanty collection is polluted by long lists of model numbers written in Cyrillic script.
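The line-by-line re-identification and the three counts reported per language can be sketched as follows. The sketch assumes that a line is kept only when its own identification agrees with the language assigned to the whole file and that characters are counted including spaces; neither detail is spelled out in the text, and `line_statistics` and the stub identifier are hypothetical.

```python
def line_statistics(files, identify):
    """Count unique lines, words and characters per language.

    files: iterable of (file_language, text) pairs, where file_language is the
           language assigned to the file as a whole.
    Only lines whose own identification agrees with the file's are counted.
    """
    unique = {}
    for file_lang, text in files:
        for line in text.splitlines():
            line = line.strip()
            if line and identify(line) == file_lang:
                unique.setdefault(file_lang, set()).add(line)
    return {lang: (len(lines),
                   sum(len(l.split()) for l in lines),
                   sum(len(l) for l in lines))
            for lang, lines in unique.items()}


def toy_identify(line):
    # Placeholder for the real language identifier.
    return "sme" if "á" in line else "eng"


files = [("sme", "Sámegiella lea giella\nRead more\nSámegiella lea giella\n")]
stats = line_statistics(files, toy_identify)
print(stats)
```

Using a set per language removes repeated lines such as navigation text, which would otherwise dominate the counts for sites with many similar pages.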

⁷ www.yamalchild.ru/d[…]cs/konv_selkup.doc

[Table 5 body: the figures are not recoverable. Columns: unique lines, words, characters. Rows, in the order printed: Northern Sami (sme), Võro (vro), Ingrian (izh), Eastern Mari (mhr), Western Mari (mrj), Southern Sami (sma), Udmurt (udm), Erzya (myv), Lule Sami (smj), Inari Sami (smn), Tornedalen Finnish (fit), Moksha (mdf), Komi-Zyrian (kpv), Skolt Sami (sms), Livvi (olo), Liv (liv), Kven Finnish (fkv), Ludian (lud), Khanty (kca), Veps (vep), Komi-Permyak (koi), Karelian (krl), Mansi (mns), Votic (vot), Kildin Sami (sjd), Ume Sami (sju), Nenets (yrk), Selkup (sel), Nganasan (nio), Tundra Enets (enh).]

Table 5: The number of lines, words and characters in small Uralic languages after language identifying each individual line.

6 Future work

Language identification methods will be further developed in order to improve the robustness of the language identifier we use. We will also try to enhance the language models in order to more efficiently distinguish small languages from various dialects and to identify languages in multilingual documents. The material found during the already performed crawls will be of assistance for this. We will furthermore try to increase the speed of the crawler in order to crawl more widely and more often. The most important national domains in regard to the Uralic language speakers will be re-crawled with more depth and more frequency. We also intend to look into crawling the .com and .org domains. We would also like to extract text from other binary files than pdfs.

Acknowledgments

We are thankful for the support of the Kone Foundation and to Jack Rueter for sharing his invaluable resources in Finno-Ugric languages. We also thank the anonymous reviewers for their suggestions and references.


References

[1] Kone Foundation. The language programme […]. http://www.koneensaatio.fi/en, […].

[2] M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, […].

[3] Vladimír Benko. Aranea: Yet another family of (comparable) web corpora. In TSD […], Proceedings of […]th International Conference, pages […], Brno, Czech Republic, […].

[4] Tommi Jauhiainen. Tekstin kielen automaattinen tunnistaminen [Automatic identification of the language of a text]. Master's thesis, University of Helsinki, Helsinki, […].

[5] Tommi Jauhiainen and Krister Lindén. Identifying the language of digital text. In review (submitted […]), […].

[6] William B. Cavnar and John M. Trenkle. N-gram-based text categorization. In Proceedings of SDAIR-[…], […]rd Annual Symposium on Document Analysis and Information Retrieval, pages […], Las Vegas, […].

[7] Erik Tromp and Mykola Pechenizkiy. Graph-based n-gram language identification on short texts. In Benelearn […], Proceedings of the Twentieth Belgian Dutch Conference on Machine Learning, pages […], The Hague, […].

[8] John Vogel and David Tresner-Kirsch. Robust language identification in short, noisy texts: Improvements to LIGA. In The Third International Workshop on Mining Ubiquitous and Social Environments, pages […], Bristol, […].

[9] Josh King and Jon Dehdari. An n-gram based language identification system. The Ohio State University, […].

[10] Ralf D. Brown. Selecting and weighting n-grams to identify […] languages. In Text, Speech, and Dialogue, […]th International Conference, TSD […], Pilsen, Czech Republic, September […], Proceedings, pages […], Pilsen, […].

[11] Tommi Vatanen, Jaakko J. Väyrynen, and Sami Virpioja. Language identification of short text segments with n-gram models. In LREC […], Seventh International Conference on Language Resources and Evaluation, pages […], Malta, […].

[12] M. Paul Lewis, Gary F. Simons, and Charles D. Fennig, editors. Ethnologue: Languages of the world, seventeenth edition. SIL International, Dallas, Texas, […].

[13] Gordon Mohr, Michael Stack, Igor Ranitovic, Dan Avery, and Michele Kimpton. Introduction to Heritrix. In […]th International Web Archiving Workshop, Bath, […].