Quick viewing(Text Mode)

Automatic Recognition of Chinese Personal Name Based on Role

Automatic Recognition of Chinese Personal Name Based on Role

 ȖzᢖᄶʃͼՈę¦ȑႮ¬᪊/ۘऺ

ŬŒ¿   ๨   

(  ę࢕ƚ┦ᩥਜ਼äƫۘऺ¤ḳâۘऺǘ ě˜) ( 2 , (1 2) (ě˜̓ƚǍʻƚ┦ᩥਜ਼ƶிᩥਜ਼᪱ᣄ¤ ě˜ 100871 ) ʼᡅ ę¦ȑႮ¬᪊/«ƦԿơ᪑᪊/Ոₑͥʐ▂ͥ ֲQՈᢧΟ5˄ႮᵯƌńP‡ƨᯬՈ෾┻ ǒ┉ά Șẜ▂Ñƍᱷ◄˖4ƨȴϦrP࢑ȖzᢖᄶʃͼՈę¦ȑႮ¬᪊/5ͩ ͢Ȗƨɩ̿«ʵǒń¦ȑ᪊/ ՈňϬ ₋ǚ Viterbi ਜ਼ͩȭ ᪑඗Șẟᜐᢖᄶʃͼ ńᢖᄶÛՈȖ܄Z ẟᜐµśż̓Ľ‑ żඌǒɴę¦ ȑՈ᪊/4᪊/ẋ࣏uØǮ◄ᡅɌ᪑ň&ĽǎᢖᄶՈɋÑǎᢖᄶ7ⒸՈḰࢿɋ4᪩5ͩՈǒϬɳẜńz ǒ᪱ßՈȵ⒱^Ō΢ϟ᪙ ᪩5ͩףǒ᪱ßႮ¬ġǚǣ44Ởẋȭ 16M Ƌᅆףẝ‡ᢖᄶǍʻǀ͔dzѺ ǚǣrȉẕ 98% Ոǰúɋ4ᩥਜ਼¤˝᪱᪑ͩ Ȍிඣ ICTCLAS ▊t¦ȑ᪊/ਜ਼ͩ7Ȓ ᪑ͩ ȌՈβܲɋȴʌ r 1.41% Ȑr¦ȑ᪊/ՈවȌūʃ F-1 ȨẂ4r 95.40% 4]ȐǒɀºȈᢖòᜬŠȖzᢖᄶʃͼՈ¦ȑ᪊/ ਜ਼ͩᜐ7ƅά4 ͟ ⏲ ᪑ ę¦ȑ᪊/ ƦԿơ᪑᪊/ ᢖᄶʃͼ Viterbi ਜ਼ͩ Ěͩ ିǻ TP3

Automatic Recognition of Chinese Person al Name Based on Role Tagging

    ZH ANG -ping , L IU Qun

1) ,2 ) (Software division , Institute of Computing Technology , The Chinese Academy of Sciences , Beijing , 100080) 2) ( Inst. of Computational Linguistics, Dept. of Computer Sci. Tech., Information Sci School, Peking University , Beijing, 100 871 ) Abstract: Automatic recognition of Chinese person al name is emphasis and difficulty for unknown words recognition. Because of their inherent deficiencie s, previous solutions are not satisfactory. This paper presents a n approach for Chinese person al name recognition based on role tagging. That is: tokens after segmentation are tagged using Viterbi algorithm with different roles according to their functions in the generation of Chinese person al name; The possible names are recognized after maximum pattern matching on the roles sequence. During the recognition process, only the possibilities of tokens being specific roles and the transition possibilities betw een roles are required . The significance is that such lexical knowledge can be totally extracted from corpus automatically. In both close and open test on a 16 -Mbyte realistic corpus , its recalling rate is nearly 98%. After combine d with the algorithm for person al name recognition, Our Chinese lexical analysis system ICTCLAS improves 1.41% in

ᰈ­ -ŬŒ¿ ,ϻ ʐᩥਜ਼¤:ǻQ͓☖ÀȖₕ-ֲ *  ऺ-ֲۘ܄ƨ᫂Lǣ4ęǪₑͥȖ ๨ ϻ 1966 Àϣ ń༐Ş̇ۘऺϣ sۘऺÀϣ , Ş̇ۘऺϣ 'ᡅۘऺ5ȕ&ᩥਜ਼᪱ᣄƚ Ǎʻ̠ˊ^Ǎʻġǚ 4 1978 ɜ 'ᡅۘऺ5ȕ&ƶ„຿᪕ ႮϢ᪱ᣄ̠ˊ^Ǎʻ̠ˊ 4 performance while the agglomerative evaluation argument F -1 value of person recognition achieve 95.40%. Various experiments show that: our role -based algorithm is effective for Chinese person al name recognition. Keywords: Chines e person al name recognition; unknown words recognition; role tagging; Viterbi algorithm.

1 šᣄ

ȌՈۘऺ>ඓǚǣṇ̓ẟʉ IJń̠ˊȯ Q᪑ֲͩ ܄Ȍ«ႮϢ᪱ᣄ̠ˊՈQȴʐȖ ᪑ͩ ƅƦԿơ᪑Ոƨr ͢඗Șǔ▂ƍᱷǒ┉◄˖ 4ƦԿơ᪑njnj^͢QȒՈƋ᪑ǍඈȌ ]±ʺ¤rႮᵯ ȌՈǷܲɋ 4 Ȍ/ႷϘǩƄ º໐̓̓Ո┑ĺr᪑ͩ Ո▂ò ໐FₑŌ¾Ôrּ὿᪑ՈǷܲ ƦԿơ᪑⒲Lǒ┉Z>ඓt&r᪑ͩ ȌǒϬĚՈ'ᡅκ< 4

ę¦ȑ 1ńƦԿơ᪑Ťƅṇ̓ɨₑ K«ƦԿơ᪑᪊/Ո'ᡅ▂ͥ 4ń¦ʥaĉ 1998 À 1 ƄՈ᪱ ß͝ᩥ 2,305 ,896 Ƌ  ¿ţɣ 100 ƋĉȯƦԿơ᪑ 1.192 (]ᩥϔ᪑ 3rⒸ᪑ ) ͢ 48.6% ՈƦԿ ơ᪑«ę¦ȑ 4863 -306 ö࿁ȉǧäƫcǪඈ 98 ÀòȭęͱႮ¬ ᪑ḳâՈ᪈ϟ඗ȘᜬŠ [1] ę¦ȑ᪊ /Ոǰúɋ±& 68.77% ͢ ⏝᪳ʌẂ 50% ÑZ ȭ¤ƅ ᪑⏝᪳ẟᜐඣᩥ ȑ⏝᪳ŤrȺẕ 90% [2] 4 üǸę¦ȑՈႮ¬᪊/«ƦԿơ᪑᪊/Ոₑͥʐ͟⏲ ę¦ȑ᪊/⒲LՈᢧΟȑϢĆȴʌ᪑ͩ Ȍ 3ǩ ͩ Ȍ/ႷϘǍʻ̠ˊՈżඌᯬₓ 4

1.1 ę¦ȑႮ¬᪊/ՈČ▂ ę¦ȑϔₓă̶ ᢈǗȈŎ ƅǔ̓Ո╓͛ɳ 4ȭ͢ẟᜐ᪊/Ո'ᡅČ▂ńz 1ę¦ȑȀtՈ ʳɳ 2¦ȑͱᾬּ~t᪑ 3¦ȑ^͢Z[ඈȌt᪑ 4ǻ5ˊᢧ 4̶ 1ę¦ȑ Ȁt ՈƮśƅ  1 +ȑ ΞŬŒ¿ 3Ŭ 3ᡃ⒬Ƚ▮ 3᪼ጟš  2 ƅȑ\ Ξ 8¡ᅵ ͥͥ͐ 9 8Ǭ ŌΙț 93 ƅ\ȑ Ξ8ࢴ᰹>ࡿŌ˳ᡃ 9 (4) +QȒහ Ξ ʇ 3 Ŭ໅ 3ɃNJ 3ήɌ (5) ÛɟǴ਍Ō>ŽΣΏՈȑ Ξᇇǜ)΄ 3ƹŬ☖ 4 ȑͱᾬּ~t᪑ ūՈ«^ȑ 3ȑ^ȑ7Ⓒƨᵯɥ«P>ඓᝯʴȏ᪑ͤΚơՈ᪑ 4Ξ8¦χ¦2 ȒǑɍ ú4ɭō┦ 9 [ɏę ]ම3[ʌ̤ ]3[˾Ο ]3Ŭ[ƙ⓷ ]4ʵǒuØȭ 80 ,000 ǝ¦ȑՈඣᩥ ͱᾬt᪑Ո ɨŷʌẂ 6.89% 4 3¦ȑ^͢Z[ඈȌt᪑ĉŐ¦ȑՈŊᾬ zȑՈŊƋ ^¦ȑՈZt᪑Ñǎ¦ȑՈɲᾬ  zȑՈƧƋ ^[t᪑ 4ŷΞ 8 ẝₐ [ƅ͟]ͅȕ Ո̊ʹwẽ 98 ᯽Ƒ Ởȕ ¦̓„ĆȴR☦ĉ Ɏ94 ń ¦ʥaĉ 1998 À 1 ƄՈ᪱ß ẝ࢑̑Ρ ȉẕ 200 ŷ4 Ð 9ƌńę¦ȑ^ŌȑÐ 9Ո 8 4ǻ5ˊᢧ'ᡅ« ϵȐļ ǻ5 Ξॅ [3] š᰻ ՈŷΞ 8 ͇ěׅ ʐ 9Ոǻ5 [4] 4؃9ʐ8ɬ ؃ʐ Ȑƚ 9ƌń¦ȑ 8ɬ؃Ո࢑ ǻ5ˊᢧ 8 ɬ 1.2 ɴƅᢧΟ5˄ǎ͢]ᱷ ⍌ȭę¦ȑՈႮ¬ ᪊/⒲L ¦Ø>ඓ ňẋ͑ ՈȆ௦ ÂȴϦr̶ ࢑ᢧΟ 5˄ 4ʵǒ͢ ūϬՈ 5ͩ ^ඣᩥּ඗ȌՈ 5ͩ[1,7,8] 4 5 ͩ[2,4,5] 3ඣᩥ 5ͩ[6] Ñǎᢈ& Y࢑ ᢈ Ȑ ẝ‡5˄ ̓Ⴘdz Ñ[ ᢈ5 ͩ'ᡅ« -Ϭ࢑ Ǎʻ ʣϬƋ ି[5] ʐ└: ɳt [8] 4ŷ Ȍ ẋ࣏  ƟÏȳ 4ͣƅŠ º ȭȑՈQȒ Ĺา ẟᜐ └: 4Ƀ ĽǍ ՈȑϬƋ ŌϧᢪǕ ȑՈ᪊/ ẋ࣏ Â₋▊ ȑQȒּ͟Ոt ᢈµϟ᪙Ո඗ȘᜬŠ ͢βܲɋ dzÑʌẂ 97% [4] 4ń෾; ̓ᢈ µc ᪱ßՈr ȅ ᢈĨ: «̳PdzᜐՈ 5 ͩ4ඣᩥ 5ͩ'ᡅ« ⍌ȭȑ᪱ß ǡᩱඇɌ Ƌ ň&ȑඈtᾬ Ո ɋ ÂϬƷØǡᩥਜ਼Ɍ ȅọ Ƌɉ ^ඣᩥּ඗Ȍ 5ͩ P5☦Ởň&ȑՈ ɋȨ ͢ ɋȨ̓z ɌPⓌȨ ՈƋ ɉ᪊/&ę¦ȑ [6] 4ᢈ

1ẝₐ¤ ᪸Ոę¦ȑĉŐę̓ ┊3ÛɟǴŌ ľՈ˝K ¦ȑÑǎ ⚭ęƙѐ ਍ŌՈ ☢⚷᪕ ¦ȑ 4 Ո̩Ϭ ǡ┑ĺඣᩥ 5ͩȭ᪱ßᢈ µՈ5 ͩՈ ̩ƾ ɳ^ ֲֶɳ ǪP5☦Ởẋ ᢈẋ ɋᩥ ਜ਼ǡλɅ ᢈ ^ඣᩥՈ] ȐƓ ₑ4 ^ඣᩥּ඗ȌՈ 5ͩ ]Ȑ7̠±ńzᢈ ᡅ˖ 4ֲQՈۘऺȖƨZ ῁«₋ǚᢈ ɴƅᢧΟ 5˄ ƨᵯ ƌńP ‡Ė ƅՈ]ᱷ Ŋ̴ ƷØP ჰ῁₋ ǚ8řͥŊzɲ ɬϏ 9[4] Ոƶ:ǡᢪ Ǖ¦ȑՈ᪊/̠ˊ 4ŷÏȳ 4 ʣϬƋ 3༐᜘ 3ࢴʀ਍ͣƅŠ ºȑ ĽǍ ՈƋ ɉr ±ĆȺQȒՈ όƋ  &ȅọ ȑƋ ɉẟᜐ¦ȑՈ᪊/ 4ẝʳnjnjĆ ͍ὧ‡ ]̣ͣ ŠºĽǍ Ոȑ ΞZȴ4Ո 8ƅȑ\ 9 ۘऺ ້ȺɅₓՈ xƋz̶Ƌ᪑ ൷ ȒՈ řƋݒċ [1,2,4,6] Kƅᾬ Ո̑Ρ 4͢Ƶ ȑ ȅọ Ƌɉ̓῁«ọǚ ͑ȅọ ƋɉՈọǚᇇĐ [4] 4ńẝ ࢑ọ ǚƶ: ՈňϬ[ ͱᾬt᪑Ñǎ^Z[t᪑Ո¦ȑɥǔ▂ǰú 4ʵǒ njnj ˔ݒ ZՈඣᩥϔǒ ϵzẝ ࢑ƶ: ¤š᰻ Ոǰúɋ ǃ͍ ]Ƀz 10% 4͹້ ¦ȑ᪊/¤ϬՈᢈ ǒ᪱ß Ⱥȑף PჰÏã~᯹ ໐F▂Ñ Íʉ4ŷΞ Dz[4]  ۘऺ ້ɥ«º 10 Wǝ¦ȑß 32 «ƋՈ  ǒ᪱ß ẜ«ȴ ͨ᪊/ᢈף ^4\ ᩾«Κ ▊ᢈµ4 ̓Ո¦ȑß &r 9 ି Âʇ ඗r 21 ǝ᪊/ᢈ ϬƋ P᯽r᯽ ŸՈ̓1࣏ «P ჰƶ Ȁ▂Ñ ãǛ Ո4Pbʺ¤r ,ĽǍ Ո¦ȑ ẜȑ/ʺ¤ּ àՈ,ᢈ»῁ dz Ñlj᪅ ǔʌՈ βܲɋ IJ« çŁ἗ ᢈک 5 ͩǔ▂ Íʉ4uØK ₑ,ǚᩦ üǸᢈ ȭÑQՈᢈ  Ŀி Ոᡊ֚ᇇĐ῁ «ƅ └Ո ȭz ᡊ֚▊ 7̲Ո¦ȑɥ ǀ͔ \࿁& Ÿ4ᢈ ƨȴ ƇP࢑dz Ñὃ̹ ZẴ]ᱷՈᢧΟ 5˄ 44 Ȗz ᢖᄶʃͼ Ոę¦ȑႮ¬᪊/ 5ͩ4᪩5 ͩ'ᡅ ₋ Ϭ╔ȠɈ࢕͇µƧ ń ᪑඗ȘZ ʃͼ ¦ȑȀt ᢖᄶ ϢȒń ʃͼϦ ՈᢖᄶÛ Ȗ܄ZʵǒȈ] ȐՈᢖᄶ ẟᜐż ⑃µ śĽ‑ żඌ᪊/ Ϧ¦ȑ 4ɌƋ᪑¦ȑȀt ᢖᄶ Ո:ǎƉ ǒ« ᪩Ƌ᪑ń¦ȑȀt¤ ᰻Ո] Ȑň Ϭ Ξ3ȑ3Z 3[਍ 4ᢖᄶʃͼ ¤◄ՈƋ᪑ ň&] Ȑᢖᄶ ՈϦɴ ɋʐ ᢖᄶ ⒸՈ Ḱࢿ ɋ ῁«ń᪱ß ᩱඇẋ࣏ Ⴎ¬ ġǚՈ 4Ȗz ᢖᄶʃͼ Ո¦ȑ᪊/ ਜ਼ͩՈ Ľᄶẜ ńz Ⴎ¬ ƚL 3Ⴎ¬᪊/ 1Ոָȉ¾ 8 Νǜᩱඇ ʳƨ ɥdzÑỆà, Ո̑Ρ PƵÏȳ \◄ú ś Ȑrọǚ¤ƅ dz࿁Ƌ ɉ ¦◄\ Šǒ᪱ßZՈ] Ȑϟ᪙ǒɀᜬ ף Û Z \◄ ɬϏ 4ń̓ᢈ µ ň&ȅọ ȑ ᪊/̠ˊ ňϬńϘ Ȍ ᪩5 ͩ࿁ǚ ǣṇʌՈǰúɋ 4żȒ uØȺ ᢖᄶʃͼ Ո¦ȑ᪊/ àϬ4r ࢕┦ᩥ ਜ਼¤ۘ :Ո˝᪱᪑ͩ ிඣ ICTCLAS  àϬ¦ȑ᪊/QȒՈȭɨǒ ɀ᪸ Š¦ȑ᪊/ ǽ̓Ոȴʌr᪑ͩ Ȍɳ࿁ ¦ȑ᪊/K ǚǣrṇΙՈ වȌɳ࿁ 4 ƨ ৰxᅆ ȺⓔẴ᪩5 ͩՈˊ ᩾Ɖ ǒ ╓Ȓ ඝϦͣĿ Ոǒ ɴਜ਼ ͩ żȒȴ Ƈ¦ȑ᪊/] ȐՈ᪈ϟǒ ɀ ẟᜐ඗Ș Ȍ 4

2 ȖzᢖᄶʃͼՈę¦ȑႮ¬᪊/5ͩ

2.1 ę¦ȑՈȀtᢖᄶ ę¦ȑϬƋ^Z[Ϭ᪑ ῁ɨṇ ▊ ƅǔ ƆՈᢈǗɳ 4ń 83 ,077 ǝ¦ȑß  ʣϬƋ±ƅ 820  ͢ ɏ3Ŭ3NJY ̓ ɥŤr 20% 20 ,631 řȑ řȑϬƋ& 1,489 ; ǐȑՈŊƋ^ƧƋϔₓţ] 4 2000 4¦ȑՈZ[Ϭ᪑ ᇇĐ Kǔƅ └4ZP ჰ«ࢴ ʀ3༐᜘ ʐP ‡Ả ᪑3¬᪑਍ Ξ 8 ʇඣ938' ç938 ·938 ȕ9਍4[̶̓« 8᪸938 ᜬ ߾938 Ȑȣ 97ିՈ¬᪑ʐࢴ ᫗᪑4 ńẝₐ uØȺPǩƄ¤ƅՈ᪑  & ¦ȑՈͱᾬඈt 3Z[ 3\͟᪑ Âࢴ7&ę¦ȑՈ Ȁt ᢖᄶ (&ᜐ 5ƫ Ñ[ ੄ࢴᢖᄶ )4ͣĿ Ոᢖᄶ ିᢅ ᜬ 1 ʵǒ ᪩ᜬ ȭ ඗Ș 8ĺ/ͱ/┌ /ɬ/ʵ/ ǡ/ʐ/ὗ/J/᱉ϣ /Q/ūϬ/ẋ/Ո/ĭ˅ 9ẟᜐ ᢖᄶʃͼ ͢඗Ș& 8 ĺ/A ͱ/A ┌ /K ɬ/B ʵ/C ǡ/D ʐ/M ὗ /B J/C ᱉ϣ /V Q/A ūϬ/A ẋ/A Ո/A ĭ˅ /A 94 ᜬᜬᜬ 1 ę¦ȑՈȀtᢖᄶᜬ ͛5 ŷƄ ۅේ B ʣ ŬŒ¿̴ϣ ƻ⓷ ǚ C ǐȑՈŊƋ ŬŒ¿̴ϣ D ǐȑՈƧƋ ŬŒ ¿̴ϣ E řȑ Ŭ᪸8 u«PΙ¦ 9 F Qහ ໅3ɃNJ G Ȓහ ɏʇ3໅3ཚʣ3ȸΤ3ǺQ K ¦ȑՈZ njǡ4 zξΟ ՈǪ 4 ʨ ٸL ¦ȑՈ[ ,Œࠂᩴ້ M ę¦ȑ7ⒸՈt ේkό⍫ȓ ʐँ἗☖᪸ U ¦ȑՈZ^ ʣt᪑ ɴç'y&Ł е) 4 V ¦ȑՈƧƋ^[t᪑ ݎƚ ¿਍ :Ȱ , ὗJ ᱉ϣ Q X ^ ǐȑՈŊƋt᪑ ɏę ම3 Y ^ řȑt᪑ ʌ̤ 3˾Ο Z ǐȑƨᵯt᪑ Ŭƙ⓷ A ͢Â\͟᪑ ͔· ʐ Ǻᨪ nj͡  ෉Ɍ ὗ Ƀ¿

2.2 ᢖᄶႮ¬ʃͼ^ę¦ȑ᪊/ ^Ϣȯę¦ȑՈǩƄĉȯ 3ȑ3Z[਍Ȁt ᢖᄶ ὧ4dž P ᢖò᪸uØ dzÑʃͼ ǩƄ᪑^ Ո] ȐȀt ᢖᄶ ,Ởẋ ȭᢖᄶÛ ẟᜐ ੄ř ՈµśĽ‑ǡ ǒɴę¦ȑՈ᪊/ 4ǒ┉Z ę¦ȑȀt ᢖᄶ Ո ʃͼ ƨᯬZ«P ੄ř Ո᪑ ିʃͼẋ࣏ 4 uØ ₋Ϭ Viterbi ਜ਼ͩ[9] ǡǒɴᢖᄶ Ⴎ¬ ʃͼ 4ŷº¤ƅ dz࿁Ո ʃͼÛ ĄọϦ ɋż̓ ້ň &żඌ ʃͼ ඗Ș 4˖ᢧ ẋ࣏ ȌȰΞ[  uØ ȳǎ W « ᪑ȒՈ Token Û (ŷƦԿơ᪑᪊/QՈ᪑᪱ ඗Ș ) T « W Ɍdz࿁Ո ᢖᄶʃͼ Û .͢ T# &żඌ ʃͼ ඗Ș ŷ ɋż̓Ո ᢖᄶÛ 4ƅ

W=( w1, w 2 , … , w m),

T= (t1, t2, … , tm), m>0, T#= arg max P(T | W)...... …...... …...... E1 T ʵǒ ᯡǺ+͘ ś ƅ P(T |W )= P(T)P(W|T)/P(W) ...... …... E2 ȭzP Ľǎ Ո Token Ûǡ᪸ P(W) «P„ϔ üǸʵǒ E1 ʐ E2 uØ dzÑǣ4 T#= arg max P(T)P(W|T) …...... …...... …...... ….E3 T

ΞȘ î᪑ wi ᢊ&ᢆȓ Ȩ, îᢖᄶ ti ᢊ&źɍ Ȩ͢ t0 &!ϧźɍ 4  W «ᢆȓ Û ໐ T &╔ᒓń W ȒՈ źɍ Û ẝ«P ╔ȠɈ࢕͇ ⏂4ὧ4 uØ dzÑš͑╔ȠɈ࢕͇µƧ [10] ǡᩥਜ਼ P(T)P(W|T) 4ŷ m P(T) P(W|T)  ∏ p(wi | ti ) p(ti | ti−1) i=1 m _T#= arg max ∏ p(wi | ti ) p(ti | ti−1) ...... …...... …... …...... …...... E4 T i=1 &rᩥ ਜ਼Ո੄ƫ ȭ E4 ࢑Ոɋǚ ᯣȭϔ ƅ m T #= arg min (− ∑{ln p(wi | ti ) + ln p(ti | ti−1)})...... …...... …...... E5 T i=1 żȒ ᢖᄶ Ⴎ¬ ʃͼ ⒲Lɥ Ḱdž& E5 Ո˖ᢧ⒲L 4Vite rbi ਜ਼ͩ[9] «ᢧΟẝ ି⒲LՈඓͤ ਜ਼ͩ T #Kɥ Ẓ ໐ᢧr 4 &rᢧΟ¦ȑ^͢Z[ඈȌt᪑Ո⒲L ń¦ȑ᪊/7Q uØᡅȭ ᢖᄶ U¦ȑՈZʐt᪑ ឆ& KB 3DL z້ EL 4໐Ȗz ᢖᄶÛ Ո¦ȑ᪊ ឆ̠ˊ 4ּàŌ ẟᜐ ʐ V¦ȑՈƧƋʐ[t᪑ /dzÑඓ ẋżඌ ᢖᄶÛ Ȗ܄ZՈ µśż̓ Ľ‑ ǒɴ4uØ ūϬ4Ո¦ȑ᪊/ µś▊&{ BBCD, BBE, BBZ, BCD, BE,BG,BXD,BZ,CD, FB, Y,XD} 4P bĽ‑ 4͢ՈPż ⑃µ ś ͢ȭ àՈ Token ċɉ ɥ᪊/ &ę¦ȑ 42. 1 ᅆՈŷǩ 8ĺ/ͱ/┌ /ɬ/ʵ/ǡ/ʐ/ὗ/J/᱉ϣ /Q/ūϬ/ẋ/Ո/ĭ˅ 9ඓẋ Viterbi ᩥਜ਼Ȓ ȭ à Ո T# &8AAKBCDMBCVAAAAAA 94 V ឆ ̠ ˊ Ȓ ż ඌ Ո ᢖ ᄶ Û  & 8AAKBCDMBCDLAAAAAA 94 µśż̓ Ľ‑ Ȓ uØ᪊/ ϦՈ¦ȑ« 8 ɬʵǡ 9ʐ8ὗJ᱉ 94 2.3 ᢖᄶǍʻՈႮ¬ġǚ

p(wi|ti) ʐ p (ti|ti-1)« E5 ͟⏲Ո ᢖᄶ Ǎʻ džϔ4͢ p(wi|ti)ȭàՈǒ┉͛5«ń ඝǎᢖᄶ ti Ոǝ

â[ ᪩ Token & wi Ոɋ p (ti|ti-1)ᜬ߾ᢖᄶ ti-1 4ᢖᄶ ti ՈḰࢿ ɋ4ń̓ᢈ µ᪱ß ᩱඇ ՈQȴ[ ʵ ǒ̓ϔ ǎˊ uØ dzÑǣ4 

p(wi|ti)C( wi,ti)/C( ti) …...... …...... …...... …...... …...... E6

͢ C( wi,ti)wi ň&ᢖᄶ ti Ϧɴ ՈƵϔC( ti)ᢖᄶ ti Ϧɴ ՈƵϔ4

i>1 r p(ti|ti-1)C( ti-1,ti)/C( ti-1)...... …...... …...... …...... …...... …E7

͢ C( ti-1,ti)ᢖᄶ ti-1 Ո[P ᢖᄶ « ti ՈƵϔ

C( wi,ti), C( ti), C( ti-1,ti)ţdzỞẋ ȭ>ඓ ʃͼ ΙՈ c᪱ßẟᜐ ƚLᩱඇ 3Ⴎ¬ ġǚǣ4 4

3 Ⴎ¬᪊/Ոǒɴਜ਼ͩ

Ȗz ᢖᄶʃͼ Ոę¦ȑႮ¬᪊/'ᡅĉŐ Yẋ࣏ ᢖᄶ ǍʻՈႮ¬ ġǚᢖᄶʃͼ Ñǎ¦ȑՈżඌ ᪊/ 4ᢖᄶʃͼ ǒᯬZ ିĨ zP ɃƧ Ո᪑ɳ ʃͼẋ࣏ 'ᡅ«º¤ƅ dz࿁Ո ᢖᄶʃͼ  ɱȷ ọǚƍᱷ E5 ՈʃͼÛ 4Viterbi ਜ਼ͩc ⒬ᢧΟẝ ି⒲L >ඓ ☢„t c4ńǸ] ň·එ 4[☦ / ඝϦᢖᄶ ǍʻႮ¬ ġ ǚਜ਼ͩʐϘę¦ȑႮ¬᪊/ ϕ࣏4

3.1 ᢖᄶǍʻႮ¬ġǚਜ਼ͩ uØՈ ᩱඇ▊ǡ Ⴎz ʃͼ ΙՈ¦ʥaĉ᪱ß ᪩᪱ß ₋ϬՈ« ě˜̓ƚᩥਜ਼᪱ᣄ¤ :ǎ Ո᪑ ି ʃͼ▊ 4ńě̓ʃβ  ę¦ȑՈ^ȑ Ō IJǮ₋ϬȐP ʃͼ nr ]-zę¦ȑՈ ψ/ ΞĚ 1 ¤߾4 &Ǹ uØ ljНƣ ϧՈ ඗Ș ₋Ϭ ICTPOS ࢕┦ᩥ ਜ਼¤᪑ ିʃͼ▊ ÏŻě̓ʃͼ▊ ICTPOS ȭȑ / ʃͼ & nf ʐ nl ÂȺϘ¦ȑ ʃͼ & nr ΞĚ 2 ¤߾4ǚǷȒՈ᪱ß Ű5ƫ- z¦ȑ ՈǎĹ ʐ Ȍ 4

 nr Ɠ /nr 3/w ՁՁՁ/nr Ṳ /nr ĉ἗/v/ٸٸٸ ƨĉ /r ᕐǼ /ns Ƅ/t a/t Ϲ/n ᩴ້ /n /w ,À/t Ո/u ⍣̌ /n  /d ϖˑ /v /we Ň/m ₐ/q š͇/ns Čǡ/v Πᩳ /n Ě 1. ₋ǚě̓ᩥ ਜ਼᪱ᣄ¤᪑ ିʃͼ▊ Ոƣϧƣ nf Ɠ /nl]nr 3/we [ ՁՁՁ/nf Ṳ /nl ]nr ĉ/ٸٸٸ ] ƨĉ /r ᕐǼ /ns Ƅ/t a/t Ϲ/n ᩴ້ /n ἗/v /we ,À/t Ո/uj ⍣̌ /n  /d ϖˑ /v /we Ň/m ₐ/q š͇/ns Čǡ/v Πᩳ /n Ě 2. ₋ǚ ICTPOS ࢕┦ᩥ ਜ਼¤᪑ ିʃͼ▊ ʃͼ Ոƣ ᢖᄶ ǍʻႮ¬ ġǚՈȖƨ ɩᲳ «ńǚǷՈ᪱ßȖ܄Z Ⱥ᪑ ିʃͼḰ dž&ᢖᄶ ẟᜐ ᢖᄶ Ǎʻඣᩥ 4 ͣĿਜ਼ ͩΞ[  (1) º ʃͼ ΙՈ c᪱ß ƉƵ ᪿ͑ŭ᪑ɳ ʃͼ ΙՈǩƄ  (2) ʵǒ᪑ɳ ʃͼ nf ʣ ,nl ȑ z້ nr ȑ ǎĹϦ ę¦ȑ ʃͼ Ⱥę¦ȑÑ ̲Ո᪑Ո ʃͼ džtᢖᄶ A4 (3) ᆩ¦ȑQ☦Ո ċ) p ʐ¦ȑŊᾬ f t& ,᪑ pf Ⱥ pf ʃͼ & U, ȪȺ p ʃ& K( ᆩ p ƣǡʃͼ Ոᢖ ᄶ« A) z M( ᆩ p ƣǡʃͼ Ոᢖᄶ « L) 4 (4) ᆩ¦ȑɲᾬ t ʐ¦ȑȒ☦Ո ċ) n t& ,᪑ tn Ⱥ tn ʃͼ & V, ȪȺ n ʃ& L4 /ȭ 3ǐȑŊƋ 3ǐȑƧƋ 3řȑ3Qහ 3Ȓහּ àŌʃ ʵǒƨ 1.1 ᅆ¦ȑՈ 5 ࢑ି / (5) ͼ&ᢖᄶ B3C3D3 E 3F3G4ͱᾬt᪑Ո ̑Ρ ּàŌʃͼ & X3Y3Z4

(6) ńǩƄՈ ᢖᄶÛ  Ⱥᢖᄶ ]« A Ո᪑ wi ƌ͑ ę¦ȑ᪊/᪑ͤ Âඣᩥ wi ň& ti ՈϦɴƵ ϔ

C( wi,ti)4Ȑr௳ᩥ¤ƅ] Ȑᢖᄶ ՈϦɴƵ ϔ C( ti)Ñǎּ὿ ᢖᄶ Ո͝ ɴƵ ϔ C( ti-1,ti)4 Ởẋ ZẴẋ࣏ ḰdžȒՈ ᢖᄶ ᪱Ξ Ě 3 ¤߾4

B ƓƓƓ/C /D 333/M ՁՁՁ/B UUU/C ̤̤̤/D ĉĉĉ/ٸٸٸ ƨĉ /A ᕐǼ /A Ƅ/A a/A Ϲ/A ᩴ້ /K ἗἗἗/L /A ,À/A Ո/A ⍣̌ /A  /A ϖˑ /A /A Ň/A ₐ/A š͇/A Čǡ/A Πᩳ /A

Ě 3. ¦ȑ ᢖᄶ ʃͼ ƣ 3.2 ę¦ȑՈ᪊/ϕ࣏  ₋ǚ Viterbi ਜ਼ͩẟᜐ ᢖᄶ µƧ [11] Ჳǐ Ո˝᪱୛ڱ᪑ uØ ₋ǚՈ«Ȗz N-ż ȭǩƄẟᜐ (1) ʃͼ ˖Ϧ ɋż̓Ո ᢖᄶÛ T#4 (2) Ⱥᢖᄶ & U Ոċ) pf ឆ& KB ᆩ f & 3 KC ᆩ f &ǐȑŊƋ z ᆩ f &řȑ 4 (3) Ⱥᢖᄶ & V Ոċ) tn ឆ& DL ᆩ t &ǐȑƧƋ z EL ᆩ t &řȑ 4 (4) ȭ ឆ̠ˊȒՈ ᢖᄶÛ ńȑ᪊/ µś▊ẟᜐ µśż̓ Ľ‑ ṗϦȭàċɉ ඈt¦ȑ Ȑ rᩴơƷØńǩƄ ƟՈ Ĺา 4 (5) ȭ᪊/ Ϧǡ Ո඗Ș¤ ͑P‡└: ᢈ Ƕ┨ ⏝᪳Ոę¦ȑ 4Ξę¦ȑQȒ ]࿁« 89 ü&ẝ ࢑̑Ρ [ njnj« ̲ę¦Ո ᪕ȑ 4

4 ǒɀ඗Ș^ Ȍ

Ȍ 4 ඝϦ uØńǒ ɀẋ࣏ ¤ ₋ǚՈϟ ᪙▊ ʐū ʃ ϢȒ ඝϦ ]Ȑǝâ[Ոǒ ɀ඗Șǎּ à ☦]

4.1 ϟ᪙▊^᪈ϟūʃ ę¦ȑẝ ିcȑՈ᪊/᪈ϟ njnj◄ᡅP ǎᢈµՈϟ ᪙▊ 4ϟ᪙▊ Pჰƅ࢑ ǮȯcȑՈǩƄ ▊ʐ ǒף ǒ᪱ᣄ ɳʟ ̓ₓՈ]ȯ Ľǎ cȑՈǩƄ ϟ᪙඗Șnjnj Ȼʌ4ΞńףǒՈ᪱ß 4Q້ɉЩ r ףǀ͔ ᪱ ]ȯę¦ȑՈǩƄ ᱉ẋ 90% ໐ẝ ‡ǩƄǔ dz࿁ᝯ⏝᪳Ո᪊/ Ϧ¦ȑ ǡ4ŷΞ 8 ș˽ ՈĽͥ« ᯯȦǧŤ ͔ׅ Ո 1/3 2Ƿ 49 Ո 8ș˽ 9Pჰ῁ Ćᝯ᪊/ &¦ȑ 4IJ«ń ǮȯcȑՈǩƄ ▊ZẟᜐՈϟ ᪙ ǒɀ ẝ࢑dz ࿁ᝯ⏝᪳᪊/ՈǩƄw ̴ţᝯ¦&Ո Ƕ┨ǭ r4Ǫ̲ ʵǒ> ǕᜬՈ তº߾ ֲQϬ ǡϟ ᪙ՈǩƄ ▊ĉȯ¦ȑՈϔₓţ]ᱷ 1000 [2,4] ּɨ7[ ᪱ßᢈ µnjnjɨṇ̓ ĉȯՈ¦ȑϔₓʐ ࢑ି ǒ᪱ßZẟᜐϟ ᪙ ᪈ϟ඗ ףZϟ ᪙Űͣඣᩥ͛5 4üǸ ń] ɆçŁ ਟọՈ̓ᢈ µ܄Ȁ ńǸȖǔ῁ ǒՈ᪱ᣄ ɳʟ 4ףȘŰ৪ Ȍ ʵǒϟ ᪙▊ ʐᩱඇ▊ Ո] Ȑ͟ி uØȺ᪈ϟƅ dzÑ & ȵ⒱ ϟ᪙ʐŌ ΢ϟ᪙4͢ ȵ⒱ ϟ᪙Ոϟ ᪙▊ «ᩱඇ▊ ՈƄ ▊ ໐Ō ΢ϟ᪙Ոϟ ᪙▊ ^ᩱඇ▊ ]ƌńĉȯz ້ᝯĉȯ Ո͟ ி4ńÑ[Ոǒ ɀ uØȺń ǒ᪱ßZẟᜐȈ ࢑ǝâ[Ո ȵ⒱ ʐŌ ΢᪈ϟ 4ף ʃͼ ΙՈ ʥaĉ :¦9 ⍌ȭę¦ȑ᪊/ uØ ₋ǚr„ϬՈ Y᪈ϟū ʃ ŷβ ܲɋ (P) ǰúɋ (R) ʐවȌū ʃ F Ȩ(F) 4͢ ǎ5Ξ[  βܲɋ P=Ƿܲ᪊/ ϦՈ¦ȑϔ /᪊/ ϦՈ¦ȑϔ 100% ǰúɋ R=Ƿܲ᪊/ ϦՈ¦ȑϔ /ǒ┉¦ȑϔ 100%

2  R × P × (1 + β ) , ͢ ?«βܲɋ P ʐǰúɋ R 7ⒸՈ ƿᜥ üƄ ńẝₐ uØ ᩨ& P ʐ R Ȑ਍ₑ R + P × β 2

ᡅ üǸ ?ǚ 1 Ǹr F ࢴ& F-1 Ȩ4 4.2 ƦԿơ¦ȑՈ᪊/᪈ϟǒɀ ᪑Ոʴȏ᪑ͤΚơՈ᪑ǝϔֲÑǎΚơ P5☦ ę¦ȑՈ᪊/͟⏲ńz᪊/ ਜ਼ͩ ǪP5☦ Ϭǡ Ľ‑ ᪊ȑՈϔₓK ָȉƽˑ żඌՈ᪊/ άȘ4ȭ᪑ͤ>Κơ¦ȑẟᜐ᪊/ ǒ┉Zɥ« ੄ř Ոµś¦ /βܲɳʐǰúɋnjnjȉẕz 100% 4ȭzPʴȏ᪑ͤΚơr̓ₓ¦ȑՈ ிඣǡ᪸ żඌՈ¦ȑ᪊/ɳ ࿁Ű࣏̓òZ ƞ£ zʴȏ᪑ͤ 4&r ǖᢆ ᪈ãȖz ᢖᄶʃͼ Ոę¦ȑ᪊/ ਜ਼ͩ uØ ɆrYඈǮ໇ᔕ ƦԿ ơ¦ȑՈ᪊/ǒ ɀ ŷń¦ȑՈඣᩥ Ǯඣᩥ᪑ͤ ̵ƅΚơ¦ȑՈ᪊/ɳ࿁ 4Qඈǒ ɀ«ȵ⒱ ϟ᪙ ᩱඇ▊ ʐϟ ᪙▊ ּȐৰY ඈ&Ō ΢ϟ᪙ ᩱඇ▊ &9¦ʥaĉ :1998 À 1 Ƅ 1 aႷ 2 Ƅ 19 aՈ ,Ⓙ᪱ 4 Yඈǒ ɀՈ඗ȘΞ ᜬ 2 ¤߾4 ᜬᜬᜬ 2 ƦԿơ¦ȑ᪊/Ոǒɀ඗Ș ϟ᪙ିƧ ȵ⒱ ϟ᪙ 1 ȵ⒱ ϟ᪙ 2 Ō΢ϟ᪙ ,Ⓙaƛ 98.1.1 -31 98.2.1 -20 98.2.20 -28 ϟ᪙▊ ̓Ƀ(KB) 8,621 6,185 2,605 ǒ┉¦ȑϔ 13722 7534 3149 ᪊/ ϦՈ¦ȑϔ 17167 10646 4130 Ƿܲ᪊/ϔ 13376 7489 2886 βܲɋ 77.92% 70.35% 69.88% ǰúɋ 97.48 % 99.29% 91.65% F-1 Ȩ 86.61% 82.35% 79.30% ͼ᪱ßţǡႮz 9¦ʥaĉ : »ºᜬ 2  uØ dzÑ׏4Ȗz ᢖᄶʃͼ ¦ȑ᪊/Ոǰúɋń ȵ⒱ ϟ᪙Ո̑Ρ [࿁Ẃ4 97.48% ϞႷ 99.29%, Ō΢ϟ᪙ՈǰúɋKȉẕ 92% 4໐ɴńՈP ‡ᢧΟ 5˄ ±& 68.77% [1] żẕP ‡5 ͩɃᢈµϟ᪙Ո ǰúɋP ჰKǔ▂Ẃ4 90% [2,3,5,6] 4ȭ¦ȑ᪊/ ǡ᪸ ǰúɋɨ βܲɋ ۤₑᡅ ü&ĺǰúɋɥ͛ ɷ؄̵ ƅ ¢ͩ͹ň ȒනՈᜩεȎ9 ໐βܲɋ ǀ͔dz ÑỞẋ└: ǝâz ້Ȓන̠ˊ Ξ᪑ɳ ʃͼ 3ǩͩ Ȍ਍ ਍ ̲ę¦ȑ Ξ ±±« „̄ ᾬɉȺ⏝᪳Ո¦ȑ Ƕ┨ǭ º໐ȴʌżඌՈ βܲɋ 4ֲQuØ ₋ϬՈ └: ᢈ¯ Ș͹ʺ¤P ‡ƅάՈ4┨ ᢈ z້▊ t4Ϙ᪑ͩ Ȍ βܲɋ ẜƅǔ̓ՈZ ŋा Ⓒ4 ᩱඇ▊ \᩾ƅ̶̓ Ʒɩণ « ȵ⒱ Ոƅ └▊ 4üǸuØ š͑ rϔǒ¿ Ž̠ˊ ƶ: ūǣȖz ᢖᄶʃͼ Ո5ͩẜdz Ñt £᪊/ Ϧᩱඇ ʳƨ ▊a֚ᇇĐ 7̲Ոǔ̶¦ȑ 4Ȗƨ ɩᲳ «⍌ȭę¦ȑ᪊/᪑ͤ ̵ƅ ΚơՈP ‡Ƌ᪑ uØ Ǡϟ͢ dz࿁Ո ᢖᄶ ϢȒ ᰏt Pṇ ɃՈɋ ẟᜐ ᢖᄶʃͼ żඌ᪊/¦ȑ 4ŷΞ  8ʾ9ńę¦ȑ᪊/᪑ͤ Â̵ƅᝯΚơ IJuØĆȭ 8ᩴ້ ̫ʾ 98ʾ9 Ոᢖᄶ ẟᜐȈ ࢑dz ࿁Ո Ǡ ϟ żȒ 8ʾ9ň&ᢖᄶ D( řȑ)ՈQȴ[ ᪩ǩ±࿁ǚǣż̓ ɋՈ ᢖᄶÛ 4Ǹr 8 ̫ʾ 9Ո ᢖᄶ &8 BD 9 (+řȑ) üǸ dzÑᝯ᪊/& řřȑՈ¦ȑ 4 4.3 ICTCLAS ^¦ȑ᪊/

Ȍ ிඣ ICTCLAS ✑Institute of Computing Technology Chinese Lexical ńuØۘ :Ոᩥ ਜ਼¤˝᪱᪑ͩ Analysis System  uØ àϬrȖz ᢖᄶʃͼ Ոę¦ȑ᪊/ 5ͩ4&r᪈ϟ¦ȑ᪊/^᪑ͩ ȌՈ͟ ி4 ń 1,108 ,049 ᪑ՈŌ ΢ϟ᪙▊ Z uØȭ ICTCLAS ẟᜐrPඈȭɨǒ ɀ 1 BASE: ̵ƅàϬ¦ȑ᪊/Ո ICTCLAS 2 PERSON: àϬȖ ᄶʃͼ ¦ȑ᪊/7ȒՈ ICTCLAS ȭɨǒ ɀ඗ȘΞᜬ 3 ¤߾4 ᜬᜬᜬ 3 ICTCLAS àϬę¦ȑ᪊/QȒȭɨǒɀ ି/ BASE PERSON Ƿܲɋ 96.55 % 97.96 % ᪑ିʃͼ Ƿܲɋ 93.9 3% 95.34 % ¦ȑ᪊/Ƿܲɋ 16.32 % 95.57 %

¦Ⴎ ϵ[ṁ ƶẒ ۘऺ ʐ ˟ ţdzńႮϢ᪱ᣄ̠ˊŌ ΢¿Ǵ www.nlp.org.cnۅ᪩ி ඣ͔ᾬՈ ļ✑ ɜ31࣏ äƫ¦ ɜÑǎǍʻ̠ˊՈ õΙ້ū ϬÂūǷ 4 ¦ȑ᪊/ǰúɋ 94.91 % 95.23 % ¦ȑ᪊/ Ž Ȩ 27.85% 95.40% ͼ 1 ϟ᪙▊ ̓Ƀ 1 ,108 ,049 ᪑¦ȑ 15 ,888  2 Ƿܲɋ = ǷܲՈ᪑ϔ /ʇ᪑ϔ 100%  3 ᪑ିʃͼ Ƿܲɋ =᪑ିʃͼ Ƿܲϔ /ʇ᪑ϔ 100%  4 ẝₐȴ ƇՈ¦ȑ᪊/ū ʃĉȯrʴȏ᪑ͤΚơՈ¦ȑ 4 ᜬ 3 ՈϔǒᜬŠ  1 ¦ȑ᪊/ǒ┉Ōȴʌr ˝᪱ ᪑ʐ᪑ ିʃͼ ՈǷܲɋ 4ńuØՈǒ ɀ ę¦ȑ᪊/ àϬȒ ICTCLAS Ո Ƿܲɋʐ᪑ ିʃͼ Ƿܲɋ Ȑrȴʌr 1.41% 4 2 ᪑ͩ Ȍȴʌr¦ȑ᪊/Ոżඌɳ࿁ 4ę¦ȑ᪊/ àϬń ICTCLAS 7Ȓ ¦ȑ᪊/Ո F-1 Ȩȴʌ r:]̶ 15% Pᾬ ƣü«ʴȏ᪑ͤ>Κơ¦ȑՈ¤ ͑ IJŰ'ᡅՈü ௤«᪑ͩ Ȍ z­ ¦ȑ᪊/ Ƕ┨ rǔ̶]ȌˊՈ ȅọ ඗Ș ȴʌrǷܲɋ 4ŷΞ 8 ÐՈˈǔϠ9 ń¦ȑ᪊/ ⓺ɉǔdz࿁Ⱥ 8Ð9⏝᪳ Ո᪊/&¦ȑ ໐ńϘ᪑ͩ Ȍ ȅọ ඗Ș ▊Ɵ  8Ð9ň&Ōȑr Ϙ Ȍ඗ȘՈ ɋ̓z͢&¦ȑ Ո̑Ρ ¤ÑżඌՈ᪑ͩ Ȍ඗ȘĆȺẝ ࢑⏝᪳Ո ̑Ρ Ƕ┨ǭ 4

5 ඗᩾

:ȌrֲQᢧΟ 5˄ ɬϏƶ ƨ ிඣŌۘऺrę¦ȑՈ̶ ࢑ȀtƮśÑǎǍt᪑ՈȈ ࢑̑Ρ ʐȅọ ȑ ọǚՈƨᯬ ෾┻4⍌ȭǒ┉⒲L^>ƅ 5ͩՈ]ᱷ ň້ ȴϦrP ࢑Ȗz ᢖᄶʃͼ Ոę¦ȑ᪊ ẟᜐ ᢖᄶʃ 5ͩ4ŷ₋ Ϭ Viterbi ਜ਼ͩ -Ϭę¦ȑȀt ᢖᄶ ᜬǎּ͢͟ඣᩥǍʻ ȭǩƄՈ] Ȑt/ ͼ ńᢖᄶÛ ՈȖ܄Z ẟᜐ µśż̓ Ľ‑ º໐᪊/ Ϧę¦ȑ 4ę¦ȑȀt ᢖᄶ ūՈ«Ȉ ᪑ ċ )ń¦ȑ᪊/ ẋ࣏ ¤ Òǀ Ո] Ȑᢖᄶ Ξ3řȑ3Z[਍ 4Ɍ᪑ ň&Ľǎᢖᄶ ՈɋÑǎ ᢖᄶ 7 ⒸՈ Ḱࢿ ɋ ͔ᾬº ᩱඇ ᪱ßႮ¬ ġǚ º໐┑ĺr¦ 1ʇ ඗ᢈ Ոʌtƨ^ͱń ෾┻4ᢖᄶ Ոʃͼ ǒ᪱ßՈ ףẋ࣏ ɥ« ọǚż̓ ɋՈ ᢖᄶÛẋ࣏ ὃ̹ rÑQ 5ֲֶͩᢪǕ Ո]ᱷ 4Ởẋ ȭ̓ᢈ µǀ͔ ȵ⒱ ^Ō ΢ϟ᪙ ᪩5 ͩǚǣrּ ƟΙՈ άȘ uØ ẜȺȖz ᢖᄶʃͼ Ո¦ȑ᪊/ ਜ਼ͩ▊t4ᩥ ਜ਼¤˝᪱᪑ ͩ Ȍ ிඣ ICTCLAS  ṇ̓Ոȴʌr᪑ͩ Ȍ άȘ ȐrK Ưẟr¦ȑ᪊/Ո වȌɳ࿁ 4Ȉ࢑ȭɨǒ ɀ ᜬŠȖz ᢖᄶʃͼ Ո¦ȑ᪊/ ਜ਼ͩ«ᜐ7ƅ άՈ ࿁ƍᱷǒ┉Ո◄˖ 4uØ[P ǹՈۘऺ 1ň «ȺȖz ᢖᄶ ʃͼ Ո5ͩȌË 4ȭęŌȑ 3᪕ȑ3෭Щ ᪱਍͢ ÂƦԿơ᪑Ո᪊/ 4

ऺ ɜȭƨՈ 1ňඝ trඊȏՈۘ ܙႸ᫦ ę ࢕ƚ ┦ᩥ ਜ਼äƫۘऺ¤ḳâ ǘՈ࣏ƚ S'çʐŊ y࢕ƚ ǪՁ 3ώ൶ 3Qęʇ਍ȐwȭƨՈ ǀtȴ Ϧrǔ̶ ƅ֎Ոņᩲ ,ńǸ ˵׳᪊źǼ ඈՈ Ŭ 3NJණ⏏ 3NJک ūȰ PÂᜬ߾ͫ᫦ 4żȒ Ľ/ͫ᫦ ᪈Ǖ໅Tඊ Ⴘ໔ȏՈ Ǖɡ 4

dž໇Dz ;1= Heng and Luo Zhen -Shen. Inverse Name Frequency Model and Rules Based on Chinese Name Identifying. In: -& Pu, ed.,"Natural Language Understanding and Machine Translation", Tsinghua Univ. Pr ess, Beijing, China, 2001. pp. 123 -128. (in Chinese) Ƶ Ŭê .ႮϢ᪱ᣄˊᢧ^ƶ„຿᪕ .ě˜ :±Œ̓ƚϦȈٸ:ՈȑႮ¬Ṭ᪊ிඣ .ᢅƗ" ปƓ̌ .ȖzǑɨɋµƧʐᢈ) ࠂ,2001 .p.123 -128 . ) ;2= Lv Y a-Juan , Zhao T ie -Jun . Levelled Unknown Chinese Words Resolution by Dynamic Programming. J ournal of Chinese Informa tion Processing. 2001 .15, 1: pp. 28 -33. (in Chinese) (ș▉S ᰹⎅·਍ .Ȗz ᢧ^¬ɍᢈਚЩՈ˝᪱ƦԿơ᪑᪊/ .Ǎʻƚĉ ,2001 , Vol. 15 , No. 1 : P.123 -128 .) ;3= Sun Mao -Song. et al. J ournal of Chinese Information Processing. 1994. 2 . (in Chinese) (ƍᇆǺ਍ .ȑՈႮ¬Ṭ᪊ .Ǎʻƚĉ ,1994 , No.2 . ) ;4= Luo Zhi -Yong and Song Rou. Integrated and Fast Recognition of Proper Noun in Modern Chinese Word Segmentation. Proceedings of International Conference on Chinese Computing Singapore. 2001. pp. 323 -328. (in Chinese) (ปöË ƿɐ . ɴÏ˝᪱Ⴎ¬ ᪑cȑՈPĿĚ 3ȷợ᪊/5ͩ .ᢅ:Ji Do ng Hong ਍.2001 .ę┉Ϲ࿕ƚƫĆᩲ .,¤ Ž:, 2001 .p.323 -328 .) ;5=Zhen Jia -Heng and Kai -Ying. Discussion on strategy of surname and personal name processing in Chinese word segmentation. In: Chen -Wei. "Research and Application of Computational Linguistics", Beijin g: Linguistics and Culture Press, 1993. ( in Chinese ) (. ᪑ிඣʣ¦ȑՈ̠ˊਚЩȆᩬ .ᢅ:┌Ÿ& .ᩥਜ਼᪱ᣄۘऺ^àϬ .ě˜ :ě˜᪱ᣄƚ┦ϦČࠂ :, 1993 Ō̟ . Ⴎ¬ ᾕǪʞ) ;6=Song Rou and Zhu Hong. Approach of personal name recognition based on corpus and rules. In: Chen Li -Wei. "Research and Application of Computational Linguistics", Beijing: Linguistics and Culture Press, 1993. ( in Chinese ) (. ßՈ¦ȑ᪊/ͩ .ᢅ:┌Ÿ& .ᩥਜ਼᪱ᣄۘऺ^àϬ .ě˜ :ě˜᪱ᣄƚ┦ϦČࠂ :, 1993ƿɐ ƭǃ਍ . Ȗz᪱ßʐᢈ) ;7=Wang Sheng, Huang De -Gen and Yuan -Sheng. Chinese person name recognition based on mixture of stati stics and rules. In: Huang Chang -Ning& Dong Zhen -Dong ed., "Corpora of Computational Linguistics ", Tsinghua Univ. Press, Beijing, China, 1999 (in Chinese) (. Ƶ ጧƓ .ᩥਜ਼᪱ᣄƚ▊ .ě˜ :±Œ̓ƚϦČࠂ ,1999ˆٸ:ּ඗ȌՈȑ᪊/ .ᢅȃʵ Ǥ̯ϣ . Ȗzඣᩥʐᢈٸ ɏׅ) ;8=Chen Xiao -He. Automatic Analys is of Modern Chinese. Beijing: Beijing Univ. Linguistics and Culture Press, 2000. p.104 -114( in Chinese ) (┌Ƀሻ . ɴÏ˝᪱Ⴎ¬ Ȍ . ě˜ ě˜᪱ᣄĚ̓ƚϦČࠂ ,2000 . p.104 -114 ) ;9=L. R.Rabiner (198 9) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recogni tion . Proceedings of IEEE 77(2): pp. 257 -286 . ;10 =L.R. Rabiner and B.H. Juang, (Jun. 1986 ) Introduction to Hidden Markov Models . IEEE ASSP Mag., Pp.4 -166 . ;11 =Zhang Hua -Ping, Liu Qun. Model of Chinese Words Rough Segmentation Based on N -Shortest -Paths Method. Journal of Chinese information processing, 2002,16,5:1 -7 (in Chinese) (µƧ . 2002, 16,5: 1 -7 ᲳǐՈ᪑᪱୛ڱ๨ .Ȗz N-żŬŒ¿ , )

ZHANG Hua -Ping , male, born in 1978, Ph.D. Candidate. His research interests include computational linguistics , Chinese information processing and information extraction.

LIU Qun , male, born in 1966, vice professor, Ph.D. Candidate. His research interests include machine translation, natural language processing and Chinese information processing.