Islamic Republic of Iran
Iranian National Standardization Organization

INSO 16939-1
1st Revision
Dec. 2013 (1392)

Language resource management - Word segmentation of written texts - Part 1: Basic concepts and general principles

ICS: 01.140.10  8 9 :$        $  '0"  (   ( (    ' $ %&  $ 3"  !              (+( ) (7 8 6 ( 1  '  )' 45 2 3 12 + 0  , 1371 -  '+,  * )   .  - ,;    ( 90/6/29 < ( 8  = (; 8 (> (7 '  - ?@  *A              B . 3 - > N&  3, 90/7/24 <  206/35838 - +>  H  I     7  J   (  U(2   (T P ( )  J (  ( > 2 J 2 S 8 6  +2  Q7R 8 6 -J P    '  )8 (=  Y (> ( (  (  (7 \= *  B [+6 1> 2  >  B ? YZ  - WX 8 *$  8 =  )16V@ )+7;   W 2   ) W 2_* ) W 2 =  ^ > )04  ]P  ZP   4*   6 WX 32 1 J 2 3 8 ?  8  S (7 8 6 ( b a@.  >  ^ P = `  = 8 6  J  ) 6 , )**R  +7; U2  ) W 2    6 ,1@  6T 3S   J b@   > g  f  S 8 6  +2 8 e;  04 8d 0   6 cT 8    . >  1  h i   (+ ) 7    ;   *     %H > X  YZ 7 +2 (2 ( (, - (> '( Y j 3 ;  U %& 8d   $&; 8 6  J     2  6  b a@  6  ) ' .  >  1  h i   7    ;  ) *        %H 7 +2  (2 f (  ( (7 +2   '  5 " +>   7    - > >  4 k   2 >  7 7 . >  -   *  6 ^A1     7  J   2(IEC) !(AA= 77+= '  +2) 1(ISO)   77+= '  J  7 8 e; J     7  J  (12  5(CAC)  l(` b2 (2  (+2 4Y ,  ;   3 3(OIML)  $  > -J 77+= '  J  3S(1@ '(cX J ) 12 m c 8 6 8 J   72 Y >    '+j   7 8 6  '   . 2  3= S .  > 8W-, 77+= ' 8 6    ,   S )+7; 8 6  3& n4P ) W 2 _* J 3 +P 8  ) $  - >  a@ 'J  3 ;        7  J  8 6  J e 8  )8 *$  p 3J  TP&  o * 342 J  +H g *P ) +;  8S + (  J  .  + 8 Z )  = ; 8 >  *  )   B&$ / 12 ^c  8 =  o * 8    7 8 (Z X 8 ((     ( 8 6o 2   8  ) 12 o * 8  77+= ' 8 6 J  n4P T   )rJ (X )-  (1 (J  g S    6  J    c J  W 2 - 4   1R  +H 8  'q+6 .  +  ( Z= 2 U(2   (6 - [(1 JX )(p3(J 3 (  3(42 3 ( 8 6 s 6 W    8U+ ) J  3P&( (t B (T Y j k      6  J   W '     7  J  )a? ^  (? ) . (2 (  T 6 X A7+;   p; 6 X  3P& t  6 W )BJo Y > J P     2   J   8  8 2   B ?  ,Z W  U7S ; ' )a? ^  (? )   Z= 2 ) 6 A 77+= ' - [ u .3  J  ' Q 5 [ J   7 8 6  \p 8 

1- International Organization for Standardization (ISO)
2- International Electrotechnical Commission (IEC)
3- International Organization of Legal Metrology (Organisation Internationale de Métrologie Légale, OIML)
4- Contact point
5- Codex Alimentarius Commission (CAC)

 0  /,   -. %)*+ -  $ %&' ( "

"6 7   4)/ 5 :1 (12

- '   / ( : <)= B& P  &  JX - [1  k  k Z; )8 +P ( >  J > k > 2)

:)$ B&   P  6 [1   , k   + )[ 7; (7[  J +   > 2)

( Z4= _P  ) :? @A B&    !  k > 2 + )+6  (7[  J +   > 2)

B& x !  k > 2 8 , ) p (7[  J +   > 2)

B&   P  6 [1  , k > 2  U, ) c ( S     J k > 2 )

B&   *   ^2 -  k > 2 +H S )B 7T (+> A k > 2)

  ; +   - S ) 2 ^2 -  k > 2 J, )8  B& (7[  J k > 2 )

v B  ' (;

:C57 'A    7  J   y >X

v   '  S  +2  4W a@

J   1  2    _ 6 1

3  U= 0  2

3 Q    P&p 3

12  Wz  0p 8     i i 4

18  Wz  0p  +; g  5

23 XML   Wz  0p a + ( U= ) Q= 3 @

24   2

 (6 5- D)E

(2 «(72 g     s6 4 :1 3+$ -8 > 8 6 '  Wz  0p- J 0Z 3 »    ( |7 +2 k& '   *A   - > '   , f  8 6  +2  X b a@ -  !     ! )3 SW $  *   91/11/29<  >J X  8   U,?     (; ( )1371 - ( '+,Z * )            ' $ %&  $ 3 . >  1   |7  

)  (c  B (7; )0 ( (J   (,  (|7 8 6 3S1@  o   [6 +6   [+6 n4P 8  '( ^(+A  %&( 8 ( 2 8 ,1@ 6  > 6 c T  ? BU= 0$     |7 8 6  (  )'   .3SW 6 c $     f  S  +2  T  ? B [6 ) > y 6  .2 - 4 |7 8 6  T  ? 'cX J - +6

:3 J %>  SW $ - 4     ' , 8  2 8lct  0Z

ISO 24614-1:2012, Language resource management - Word segmentation of written texts - Part 1: Basic concepts and general principles

-  : +    s6 4    ' .3 -  $ _ 6 8 > 8 6 J   Wz  0p 7   ' .J @ 72 H  6 J  2   U2+  Wz  0p 72 g 

« 4 < 2» )g }  ;  . >      2 3  J 8 6 P   ' 0p ) Wz  0p = P    - > 3 4 2 c 2  2 « 4»  «< 2» 2 s   P       - > AX -  o  ,+ by 3 $ ^  2 3   P A  f  « 4 < 2» 2  . '  ، ' W  P 6 J 8  €   P 6  0p 8 Wz WSU))1  -  < . . > ^A1 -z  ! J a J    Wz  0p 8 6 P  ) > -   1 7Z$ g }  2 p +6 »:  )  > ^A1  6   •  ! J :7+ J >  >  47R 8 63= P  WSU! :  ) >  %&p ! )(«k S u7c»:  ) >  m c s )(«-  R» :   ) 2 -z  !)(« 2+ 2  6 J 8  .( «  000 $ »:  ) >  8 -z  i  Z;  )(«     Z B» g } J - 4   Wz  0p 8 6 P   ' 0p )7[ g }  ;  ) lW 7 S  +72 ' 2 i6 )  > - 4 -   Wz  0p 8 6 P  8 6J  ? 8  8  @  ;    8 6 eS  81  TP&     [ '   €8 - z  i 8 6 P   8 lW 1 ) 63 >  2 "  8  ' 8 P  2  6 J 8   @ z )i ^} )  7 S 6z  ' 2  6 J 8  .W $ T J  8[ A   Wz  0p 8 6 P   ' 0p 8 - 2  8 7  ^} )  7 S 6-z  .3

W xi 2  6 J )  8 -W  Z2 2 i    6  J 8  6 -z  0p )'  -&; (@ z  (J (2 3 3$  ' )[ 8  J .3 - q@   ?  8 -2 )@ z    -W .3 4  Wz  0p 8  ) 2+ Z1@  W i 8 6YR

ƒ ( !(  (; ( « €8 @» g } 8  .3 S 2 ' 0p 8   ,  = c 8 eS )g P '  U(? "z  0(p P   ;  «» «8 @» b@ )3 - > •   J - > c  €  -  i 72 36  !  ;  X   P&p 2Z2 m c     )[   J . - > SW T 

1- Word segmentation units

2- Collocation

J (P ) >   4 6 J      8  s ; $ .2 7 X U? 0p P  !  P  .(  2  g   P&p 8  2 $

(   (6 (J  (   8 X ( (  ( 6 -z  0p 3, 6r  ;   6  \1 ) &( g(2 (7+ J ) (J ( f ( 8 68 X' S   ' . 2 ^, +    J 8W  v R( ) (;&H  (J  )(+  (TS P ) P&p( 3  )3I= „6S    ) J  =2  ;  « c TS P c gU`»   > B ; g }  ;  . 1R  Z, >  +    ;&H T47= 3 +  ! P&p +  8  )>  8 6 +  8 X 'S  )! "z  0p P  ! . 2 X J

% J  0  /,   -. %)*+ - $ %&' ( 

6 7   4)/ 5 :1 (12

$ 6 :' F /1 J ^ 8 6 ^+=    Wz  0p 72 g     s6 4 '   ' '  J _ 6  (WSU) - z  0p 8 6 P   )l@  ?  Z 8 - >  8 >   0p  3  J .J 

(6 -z  ( '( !( 0(p 8 ( '  .3 8 j     ,4 )-z  )3   J  f    _ "  J , 6 -z   P ' 8   + 2 .s>  >   , Q ! 6 6-z  " 6 ^A1 q X J 3 8 j s(6 ( 7( S Y(c ( (2 (2 8 (6 s    6 3$  8  8 ; $ 'i . 2 - 4 8 lW  1  6 7 S ; $ 0(p. ( (j  ) (6 - +(>   6  + ^ > 2    Z; 8 6-z     P&p ) 6 3>  2 )  @ - x(i 8 6 J 8   ) 2+ - 4 7 S J 6-z  8J  3, 2 @ z  i    6 J 8  P  Wz  .3 ^ A1 ) 2 - 4 6   ;  8   +72 8 6- c 2 8 -2  

(J  (   ( ' ) ( 6z  ( ' 0p  J  2  6J  8 2 8 6  J 8  :   2

: 8 6         Wz  0p . >   +  8 6U6 Z  8  7 r 8 +> -z  B ? %&p v R U 7   Wz  0p. 3 1 (CAT)   !+2  +  U  +  TS P   ( CAT)  !+2  +  8 6 U   P&p 3  8 6      $ 6 W 2 ) > .W

1- Computer-assisted translation

1 C ( 

? 2   . 6 A  +72 8 ? "J  3  8 6  W  - [ @  6   }2 a   J  ?  7+; )'  -&;. 6  ?   "z   … Zp -J   > 0p    > .   +72 8 6J

5-  /  ",; ) +72 8 ? 8  ?   2  =   +72 k   4W ) 4W  ' ^ Z 8 6   .  J   Wz  0p  -`  2j; 8 [= †*R )1  1S †*R

:   'G $ . 6 B ?  c Q 5  2 0p  Wz   '  3(NLP)ZH  J r J @ ƒ  8 6   :J  Z; NLP 8 6    )8  -S 8 6- J @ - )8  8 6 - 2 U? - )& 8 6 - 2 g2 -

 ) ' 8  ZH 8 6   -

. J "A@ ; +?   > P -

1 H'/; . >  J ) +72      o + )-J k   7`  Wz  0 

(  (+72 r +>  g + H   J 0  "J '. 3 6X 3  8     ) J 0  -J _ "   ( ( !( (6 ) (2 ( - 4( 47R 0p 8 6r  J NLP 8 2 8 6    i )g P '  . X 3 ) ( 8(W-J ! .    4  +72 ƒ +?   A ' ! 8   2  Z  6- z     4 3( '(A+ 8 (2 8 (6 (  (2 3( (  ( '( .J  s6 S  ? ?  )l@  ? ) +; ^ $

1- Stress assignment

2- Prosodic pattern assignment

3- Natural language processing

2    3 'A+ 4W 2 8 2   ! )g }  ;  . 2 - 4  c US B  * 0p 8 6 - >J . 2 0p W U  Ai 2 8 6 P   ' 8[   

IJ % 2

' ( .3( - (> ƒ ( 6 X    |7   ' '  2 3   8 P J  U= •  ‡ ( (2d ( 2   2    . >      |7   ' J yU   X  .3(  ( |7   ' T   X 8  8 6T  ?  6 P& ) >  - > -  ƒ 1 8 6 P&  T  ? 'cX - +6 )3 - > - > 6 X  1 ‡  2d   2 2     .3 T   6X 8 

:3  U=   ' 8  J 0  J - 4

2.1- ISO 1087-1:2000, Terminology work - Vocabulary - Part 1: Theory and application
2.2- ISO 1087-2:2000, Terminology work - Vocabulary - Part 2: Computer applications
2.3- ISO 24611, Language resource management - Morpho-syntactic annotation framework
2.4- ISO 24612, Language resource management - Linguistic annotation framework (LAF)
2.5- ISO 24613:2008, Language resource management - Lexical markup framework (LMF)
2.6- ISO 12620, Computer applications in terminology - Data categories
2.7- ISO 16642:2003, Computer applications in terminology - Terminological markup framework
2.8- ISO 30042:2008, Systems to manage terminology, knowledge and content - TermBase eXchange (TBX)

K L B M*73

:  2  J Q    P&p )  ' 

1-3 1(0 :6

1- Abbreviation

3 †R1  A B ,4  SW ^A> [ BS ! J _P   +72 _lP  2 3 &2 ' . 2 [ISO 1087-1:2000]

2-3 1 . > S j (14-3) 3I= !  (22-3) •  !  3 'A+ 2 (5-3) 2  z !

$ >   6  . + 8 ZH3    ) @ ) 1@   ;S ƒ ' i    6 _ "  . > Z6 xi  S    > 

3-3 4 )E . >  (22-3) •  !,(2-3)  i  ! ' @ s6   [ISO 24613:2008]

s  S  8   [ _ " 

4-3 )-: A 8 - z  ˆ6 2 J o + ) >  k Z$ [  J J  J  Z; ! X  2 3 -z  ^A1  .    B ,4  > 8 

1- Affix

2- Bound morpheme

3- Circumfix-3 xi  X   SW S •  _ H )- > ^A1 U? 3+$  J 2 3 8 

4- Agglutination

4 5-3 1 )+ . P

. X [ z  ! ' i  !  YS 2 (18-3) 8z  ! [ISO 24613:2008]

- 6 ^A1  ; J A  ;  X )‰ ; . > - 4 '  -z  !  ;   + «- W» : S:1  Q .   2 « +>- W» )«- W » )«- W »   ) +72 J 8 

)Unvoiced =  . >   7[  J  «un, non, ir, a, in » 8 6 1@ g  2 «» 1@ : S:2  Q .inactive =Š  )atonic = 3c  )irregular=- ; ($ )non-functioning= (Š 

6-3 2R)6 . >  c  (14-3) 3I= i   J 2(23-3)8 -z 

.3 - > k Z$ 10-3 Q )ISO 24613:2008   J -1 " 

-  @  (2 ^2 7 8  8 P 2   aR ) 46 ! W >  3U2   'A+ 2 -2 "   o (H (  ( 2 ! . >  > 6 2 W > 5U2   ) >  > ( 2     ' 2)  6 .2  Z;  2+6z  :    2 J -W J  )68  -z       . > 

7-3 1 6 R)6

1- Bound morpheme

2-Compound

3- Endocentric

4- Head

5- Exocentric

6- Lexicalization

5     c 7 ^A>  )(14-3) 3I=  ^$ P 'SW $ s6 2 J  -z X  2 3 J -z  . >  ^A1 )+2 8 6  WW [ISO 24613:2008]

8-3 2… >    (22-3) •  I  o + )(23-3)  -z   ? 8  3 (23-3) +72 ! ^A>  I .  a US [ISO 24613:2008]

9-3 3 JX z ! .W $ - 4   (23-3) -z  !  ;   c 8 c    2 3 (18-3) 8z !

.3 z A ).3 «8» 2 = P  )3  JX z ! ! « c» )« c»" > -  -z :1  Q

10-3 4:1 4/  s6) s6 4 [  + 2 6  A &  (23-3)  Wz   (24-3) -z  g A> i   J ! 6 . >   Q7R (68   s6) 8  8 6A7+;  (5  [ISO 1087-2:2000]

1- Compounding

2- Derivation

3- Free morpheme

4- Homograph

5 - Semantic homography

6 - Syntactic homography

6 11-3 1KV . X     (22-3) •   (2-3)  2 S j  (24-3) -z  ^A> ! X  2 8  S . Wz    3     S Q* _1 " 

12-3 2:  W. X8  .3 (14-3) 3I7A    1 8  - >  R 8  $ ^A> [ISO 24613:2008]

^c   ;  «'S »^A> ) S  J )«'S »)«  » )«3S »g }  ;  '+ Wz  8 6^ A> ; +? : Q . >  R  Wz  8 6^ A> J -W ' J W  +  8  -z 

13-3

3,8  :  W. X8  . >   ' !  ' (24-3)  Wz  ^A> ! 8  (12-3) 8   -z  ^c  ' 

.X 8  -z  ^c   ;  «'S »   >  ? 'c  8  - z  ^c   ) S  «3S » ' -z  : Q

.14-3 Q )ISO 30042:2008 19-3 Q )ISO 1087-2:2000J SW - " 

1- Inflection

2- Lemma

3 - Lemmatization

7 14-3 1(YJ .   A  +; 8  2 2  32> 6^ A> J 8   Π +; 2 ; U 8 P  [ISO 24613:2008]

.3 2  … > !?  ) >  [ I=J 1R 3 'A+ 3I=EF-1 "  .3 - > Q « 6 •z  J 8 -? J»  ;  ISO 24613    «^A>»-2 " 

15-3 2 6  2 . 2 ^+6z  !  ;  A J P  ! 'c  

 W» g }  ;  >  6• z  J 8 - ? J)«  c» g }  ;  >  •z  !    J P  'i- "  . 6 ^A1 P&p  Z; ! 2 > « 12  3+P ] »  + Z; ! P   «[ S

16-3 3Z 2 . > J `X 6X  f   ;&H  (12-3) 8   - z  8 6^ c   Π +; 2 6^ c  J ,S

17-3 4[. . >  -   1 S  * (18-3) z ! ! 7  2 p ^A> )« (6U» g }  ; ) 6 «,»  )« » )« 6» ^ > « »  « 6» J  0+ 8 6z ! 8 6•z  ) S  J :g }  «U(» •z   ^ (> « (6U» "z   ' ( J . ( 8(S (* p( ^A(> ˆ6 «,» 2 («  »  « W @» . 6 « »  «- @»   « W»  «- @» 8 6• z   f  8 6z ! 2 = P  3 « 6»

1- Lexeme

2- Lexicalization

3- Lexicon

4- Morph

8 18-3 1. P . >    _P J 8 - ? J  6v  J 8 - ? J 7  2   P  'Ai 2 [ISO 24613:2008]

.  z A   JX z ! :    z ! 8  ; +?J ƒ - " 

19-3 2 W. '\ B &A ^ $ , X 2 €ZH 3= P   I= ! ! m c J 2    c  - > ^A1 63 I= J 8 - ? J J 2 (14-3) I= .3 a@ [ISO 24613:2008]

7+ ! J 1R )%&p ! ) Z; Z2  - z  Z2] >  2  Z; !  8 -z  i  Z; - "  (i  (Z; " ( B (+ 8 ( 'R(   ' 1+6 .[( >X g $ ^  ^}+= j ! g }  ; ) 7+ !   .3 'A++ -z 

20-3 3 &A R)6 .3  a@ ^ $ X - 6 ^A1  ; J X 8  2 )(14-3 )3I= i   J ^A1 (23-3) 8 -z 

' 8   X   2 )3 «»  «3c »3I=  J ^A1+ Z; 2 !  S  J « 3c » : Q .3 - > n4P 2

. 6 + ^A1  Z; 2 !  ) 2 - 4  Wz  *; i   J  P&p -1 "   B (, Z  g P'   . X  P   Z; !  ;    > J c T J 3 'A+  Z; 2-2 "  )Z2  Z; !  8 -z  2 ! ' > 8U + 1+6 ^+;  )8J  -z  XS      8l@  a@

1- Morpheme

2- Multiword expression

3- Phrasal compound

9 (p '  +, a -†c H   Wz     –1 Wz  X .    Z; !   Z; 2 ! '  . 2 4

21 -3 2 ] .3 - > A )X J 1R   )(23-3) -z   + X  2 8  S

22 -3 [  Š  3 3 'A+  P  (14-3) 3I= ! ^A>     Ai 2 X ^A> 2 3  J P  .W $ $ >   Z2 )38 @ )4*  [ISO 24613:2008]

23 -3 -z  .  B&2 U J 1R )^$ P WV !  ;  ) 2 3 (14-3) I= [ISO 24613:2008]

24 -3 4W. X]0 .3 ' (23-3) -z  ! J 8  -S 8  W [ISO 1087-2:2000]

. 6 «2 @»"z  J  6^ A>« 2 @»  «   »)«2 @» > ) S  J  : Q

1- Lexico-statistics: W  $ - 4   Q7R 8 6  J ^ A  •  2 3 8 X AA -1

2- Reduplication

3 -Agglutinative

4-Word form

10 25 -3 1 -. %)*+ . >  (26 -3)  Wz  0p 8 6 P  J 8 -? J  ' 0p XS

26 -3 2(WSU) Wz  0p 8 6 P  . >   c X  P  !  ;  2 3 [ ; J 3&; > !   (24-3) -z  ^A>

  8 lW p sy&; ) c sy&; )8 ; sy&; J 3 'A+ 3 -z  ! ^A>  2 3&; > ! - "  ^A1 8 ;  'o sy&; J Z2  )H2O    +> 8 6 + )i 8 6g A   [ $4 sy&; J c .F16   >  - >

27 -3 3W. 8  .3  >- z  ^7  U? J ^ P (23-3) -z  ! 7c  c 

 z  !( J o  Z j  ) 6z !J 8 > 3 'A+ -z  2  @ z )8 - 2  +, x 6  J - "  )'> 8  m c 8  ! [  g + H ($ >  8     6 ) }    6 X  2 >  )-z  (  )X " 6 ^A1  ;  ;   W 8 6    JĤ 6z !  ) 6 J '  -z  ! c  .3 !  !  . >  - q@ 

28 -3 42 "z  .3 a@ ^ $ & 2 X " 6 ^A1 8 6a R J X 72 8  2 (6-3) 2 !

1- Word segmentation

2- Word segmentation units

3- Word structure

4- Word compound

11 «B +6 » )« 1W- W »: Q

 -. %)*+ $   $\ \4

 -. %)*+ :$ ^$   4)/ 5 1 -4

.3 8 j  Wz  0p g  •    '  - > -  %> s6 4

«k  ($»  «-z  ^A(>» )«•z  » € ,4( ;  «3I= » «z !» €; U  ;   f Z 1 ^A> !( .3( -z  ^A(> ! 3I= !  ,4 ^A> .3 •z  ! z ! !  ,4 ^A> . 6   1 J 'c ( 8 ( -z  ^c ( (XS ](H J (2  (>  ^A1 8  -z  8 6 ^c  J Œ  +; k  $ . X  3  -z  8 6^A>

 6'  . 6  P&p   > J 8 6 J  Q7R   8  «-z »  «z ! »  q+6  P&p -1  "  .3 6X  >  J       ) > -  %> 2   2  P&p [

s6  , X     [i 6- z     ^A> 8  A  3  J   8 6 P  = p  > -z  )  2  I7   -z  8W^A>   +; 2 )8 I=  > -z       > -z  .2 2 ! k   -z  8W ^A>   +; 2 )( J  W k  ) 18 @  >- z   4*  >- z     8(W ( ; )3>  2 )2 )… > 8 6 XS ^ > 8 I=  > -z  .2 s   2 -z  .3 A

1-Agglutinative morphology

12 ` V

AI  7 'A 5 7 'A

.__] [.

Z 2 ,8  : W. X8  (YJ W.  /X]0

:  W. X8 

Z 2 P (8  5 AI  7 'A ,)$ ^ &  - 1 X]0 ^A> XS ! , … > 2 i «$ >  >- z  »   > - 4 «8 I=  >- z » %&p 1 -2 "  .3 -z  8W

•;  3 'A+ A . >  A A7Z US  , J  4 ƒ  ^ 1 @  4*  >-z   . W T  8 I=  >- z   S ! U X 2 3 ^= '+6  ) >   Wz W^A> »   «krap»   ) 2  - 4 8 A "z  B ,4  2  8  A J 1b ASX  )g }  ; 2 )8 @ 8 , J 8  .3 « > c  > »  « krap krap-krap» 2 = P  3 « > c J (    Wz  0p T   >- z  ; $ J m c ; +? ! ) xi  •   6  6 X  .3

. - > 'ISO 24614-2    ; $ ' -3 " 

1-Afrikaans:   8 SX  J

13 . Z 2 ^A>

 '0W.

8 I=  >-z  8 @ /4* >-z 

… > 2 *c 8W ; A )  US  (8 @ /4*

/  $  '0W . :   -2 X]0

^A>  .3 >X 8 6g $^  6^ }+= j ) P&p ) Z2 ^ > 1(WWE) 8 - z  i  Z; 8 J  + 8 - z  2 . >   Z;  Z2  Wz   Z2 ^ >  Z2. 2 ƒ 3 8 ,+ 3  3 $ ^  ;  « 4 < 2»)g }  ;  . >  - > SW X 8 6a R ! ! 8 )g P ' .3 4 2 c 2 ! YS  )  S  * B ,4 !  - > )-  o  ! «s  ? »g }  ;  . >  - > SW X 8 6a R ! ! 8  J    Z; 2   )    ,4  2 3  ? ZA s6 « ,4  ? »i W .   s  2 3  ?  i  Z; ! ?   )(20 -5   g }    2 - [ ) X   P Z ZZ2 !1@

1- Multiword expression

14 - 4 P&p  Z;  P 3 g «s  ? » Wz  2 2 i ) 6  ^A1 8 -z  .  c 'i « ,4  ?  » 2 = P  )« 3SX s  ?  » ) >

8 -z  i  Z;

(MWE)

2 %&p ^}+= j >X 8 6 g $ ^

 Wz  2  Z; 2

W. '\ B &A a  -3 X]0

8 ; sy&; ^ > sy&; > . > ^A1 8[ sy&; >  6- z  g A> J  Wz  0p 8 6 P   6 ; '   *  3  sy&;   [ sy&; > J c  8 lW p sy&; ) c  .3 -z   ? 3&; 8 P«! 6»)g }  ;  . > 2 , X 

(WSU)  -. %)*+  / M

 Wz  g A> [ sy&; >

8 lWp sy&; 8 ; sy&;  c sy&; > $4 sy&; >

 -. %)*+  / M a  -4 X]0

15 '6 X)1   -. %)*+  :6 L$ ' 2-4

: W-, J 0   U J  m c  J -J P   Wz  0p  ’H  k  $ -KJ ’B 2 6      ) 6    ) 6 @ ) 6 1@ ^ > ) 6  J =-d ’ 6   -&; )  8 6z  ! J =-e    8 6 -  @ k    Wz  0p  c ' 3, - J  > -z  8   *R1- ’4   - > -  %> g  3) J . J ! J W  +   J 8 -A@ -f

8 ((Q(7R 8 (6 U  '+A ( ) Q(7R 8 (6 '(  Wz  0p  WJ  J  > '“+p T  ) (> ( g +; 0p  ! ' (3-4  2 ƒ ) €8 6  1 r +> 8  2 $ A J  +H  (yU   1         o   ((6  (Q= J - > 2d 0  ) 2 s6 S 8l@ k $ ; . > -  %>

 -. %)*+ 3-4

. 2 \1  Wz  0p  5 ^A>

   bx  ) >  8 lW3 &;  A 8 6† c >   0p sy&;  ' )' = B c 8 6 -   )- (> (  (> P  B (c € J "A@ . >  0p   7 8 6 P   ISO 24612     ( 8 (6z A J = Œo +P   Wz  g A> ^ > 2  X  s6 S k  $ !  ? 8  8  @ ( k  ($   Wz  0p ; $ ) J -A@ . > s6 S U  Wz  0p ; $ J 8  . >  sy&;  Wz  0p 8 6 P  "   0p !    0p ^ Z 8  2 6  ^A1   s6 . 6 8 j

16 B c = 8 6-   :8  '

 Wz  0p 8 6 P   Wz  0p S   J 

 J 8 6-A@ 8   ;&H 0  > P  B c  Wz  0p - > 

; $ "& k  $ sy&; J 8   Wz  0p m c

 -. %)*+ -5 X]0

5 N 4 3> W 3  2  c 1 2 0

5 N 4 3> W 3  2  c 1 2 0

 -. %)*+   %)*+ D  -6 X]0 17 3 (&; '(= g (}  (; ()  A †c >  ' -  !  ;   6 )= 0p \p   (Wz  0(p  (J ( (> P \p  (3 - > †R1<0 )1> -  !  6 ^A> « 2»  (J )3( - > †R1 <0 )2> "   ) >  c > -z  !  ;  « c 2»  c '= « N( 3> W» B  P  .3 «» &; ! "z  ! B P  . X  P   ,    + -z   (Wz  0(p (P   J ^A(1 (2 (7c  c  !  <3 )5> "   )3  Z; 2 !  (i (     Wz  0p P   B   . - > †R1<4 )5> «N»  <3 )4> «3> W» . 2 !+2 -z  8    ! 6  >  >    ^ H     3&;  '

J 8 (> ( - (> -  '( s( •(;   (Wz  0(p.  ( 2  B c ' 8   Wz  0p    Wz  0p P  ! ) >  J ? 'U[ 8 60 p 2 $ ) >   Wz  0p 8 6 P   ' J aR '  «2 • AX -  o  j »7+  . >  >     0p c  ! W   ?  2 )2 0p1« 1 »B   6a R  X     )0p ; $ J c k  8  8  47R ; $ ) 2 + - 4 6 7 S J 2 i :    6  J ) >  7 S k   (  (6 a(R ' J c J 8 > )k  $     bx.(W $ - 4     28 lW  1  2 )38 -z  i  Z;  ;    «-z » B   J P  !  ;   « AX -  o  » (c )   [( k  $     B 7P u  . > 7 )- > SW T  -z  ƒ !  ; ( ) (>  '( ( -z  ^c  !  ;  « AX -  o  » > B + ^ > 3 'A+ 6 k  $ . > ^ > « AX» P  « -  o  »

 -. %)*+ A 75  '0 W.   7 1-5

8Ai 2 8 6 P   6 -z  8   J 6 2 3 ' ISO 24614         , ^ . > «z !»B

1- Token

2- Tokenization

3- Multiword expression

18  -. %)*+ M P ,8  &L 7 2-5

B )6 1-2-5

  J - W  J A : 6 '  Wz  0p P  'c  Z 8   J J ^ g  J   m (c 8 6 J  2 ISO 24614  8 6 aR [   U*R  y } .8 2 - W  J 8[ 8  P ) > g +; Q7R 8 63 $   3 'A+ Q7R g  .3 - > -  \j  )  2  .'  A 8 6>

 $ W -   7 2-2-5

1 )+ . P X7 -KJ  « » g }  ; ) 3  Wz  0p P  ! ? - [ X ) > ^ -z  !   z ! W .(«?  »   z A  ;

2YJ ()  X7 -d ^ ' -z  !  2 W . 6 + $     -z  ! 7c  c  8  ; $ J - 4  2 $ )« 4 < 2» sy&; ' )g }  ; . 3  Wz  0p P  ! Œo +P 6 [ X )J  - X • @ < 2»g }  ;  )  > v   + Ui ˆ6 )  - > -  o  8 ,+ 3  3 $ ^ .  - >  4 < 2 6  2 $«U+ 4 < 2» 34W    2 = P  « 4

"  /D h$  W. P  'L  $ ')$ D)E X$ 2 )g X7 -e

0(p (P  !( - ([ X b(@ ) >      a @^  $ ` 3 c ! 8  -z  !  2 W ' )'   ) >  UZ 3 'A+ ) >  -   Z U= «-  R» ! )g }  ; . >    Wz  .3  Wz  0p P  ! " 6 ^A1 -z 

1- Principle of bound morpheme

2- Principle of lexical integrity

19 1M*7 i 5 X7 -B 0p P  ! !  ;  - [ X b@ ) > - 4 P&p     Wz  8 6^ A> J 8  > W - 4 P&p  Z; !  ; «'4W ƒ   S » g }  ; ) X   P   Wz  (. >

2  X7 -f -z  ! )« A1@» )g } 8  .3  Wz  0p P  ! b@ ) >   J c   -z   2 ! W 2 'U[ ^ 8   &; ˆ6    + «31@» 8   &; 2 i )3  S 8  J . >  >     S  - X 3 2 2

 $ 6 W -   7 3-2-5

3 X7 -KJ A   Wz  J 8  >  -z  ! .3 -z   248 - z    ' 8     !   .3  Wz  0p P  !

(68 '0 9A ) 5(J G- X7 -d   Z;  Z2 c 8    6 c 8 6 > ^ ' . >  -  ^2 !  ;  1 6Ui .  T   W &$ 6 5  , X W P k  $  8  -z  ^c   ;

1- Principle of idiomatic use

2- Principle of non-productivity

3 - Principle of frequency

4- Lexicalization

5- Gestalt principle

6- Cognitive science

20 1(8 '0  '0  $ )  '$ : 7  @A X7 -e

7( (` 8 (e; J 8 (6 8    7 8 e; )  k  $   27 8 [= T      (P (   X ( - X ( (   -  2 TS P  81 3$  , X   . 6    c  > ^ > 8   X s6 S p ^ ' . >  k   - > n4P   7 TS P g }  ;  )  A  Wz  ^A1 8 [= !  7 8 6 +  ;     A Z;  Z2 J .k  $  « 2 + ^ »  «,> +  ?  » 8 6 [=     S  J  « A1@»  «s  ? »

3 $ V2 X7 -B

3( (  b@ ) 6 a6 2 X  J ^7  U? ^A1  k  $  8  2 "z  g +> W  (T 8(  (;&H g (Z  =  8 eS b7@») Q4R  S « S» )g }  ; . >  -z  ! (P  !(  (; ( X †R(1 ) (> - ?W k  $  « S» > W )3 («  & 8 ,+ .3  X  Wz  0p

 -. jX 6 X8  X7 3-5

 ?    U k  $ . > - ?W k  $  A - 4   Wz  0p 8 6 P ,+6 )^  . >  WJ    @   Wz  0p 8 6 P 

 -. %)*+ k  $ J7 4-5 4 $ :  :  X7 -KJ  2 =  - X 3  Wz  0p 8 6 P  7c  8 6 c  3 'A+  Wz  0p  . 6  1 Q7R 8 6 2 [ 8  = +P 'U[ 8 60p

1- Cognitive linguistics

2- Prototype theory

3- Principle of language economy

4- Principle of granularity

21 / :'  6 :')G)$ X7 -d

0(p (P   (+6 J (1R 6 (   ) 2   c 0   > ^* X  2  6  B + •  !  ;  «8- 4 - -  » :' !  ‰4 « 4  » )g }  ;  ) 6 ^A1  Wz  (  « (4»  « 4 »  ' !  W g P'  )3  Wz  0p P  !  c ^2 ! . X   P   Wz  0p 8 6 P   ;   c )s >

/R )6 :'  6 :')G)$ X7 -e

(2 - ([ X )0(  k  ($     ) >  [ 2 ! ^ > 2  > 3S    2 ! W  (; ( Œo +P 2 -  2 2 - +6  X   P   Wz  0p P  !  ;   c W U «AZ(> ^ ;   »‰4 > )g }  ;  . > 8 lW  1 [ 7c   Wz  0p P  ! X  (2 ) (X (  (P (  (Wz  0p P  !  ;  ^2 !    > B + )' !  .3  Wz  0p P  'U[ U «^ ;   » -  2 2

l  / :0 %)*+ X7 -B

(  ) 6X J Z2 6  > [ sy&;    c sy&;   8 ; 8 6 > 7+ J )sy&; > 6 . 6 8  0  J c ^ P ' !  2  > k P 2    >   Wz  0p P  ! 0p P  ! «1945» 8 ; > )«    @  1945 g   B  , „ »g }  )g }  ; 2 m c ' '    «ㄱ» )«3 8 - 2  J  3  3&; '= ㄱ» jS 7+ . 3  Wz  . >    Wz  0p P  !)3 7+ ' ^; S

  ,  W 5 B &m X 6 D0E X75-5

'( (. 6 a> @  J 6   6  W $ - 4   WJ  8 - >    *$   '    I J c m c  6  J 8  ) > -  %>  ' J 8[ aR  2  [ +6 )g P $ •   1 6a R '  U  Wz  0p  -          ' 3+6 . 3 J  . > -   1  3SW 6 c

22 KJ ()E

( An) 1XML  -. %)*+ D 

. >  ISO 24611 k   XML   Wz  0p 8 6 P  a + J g } '

N 3> W   c 2

entry="urn:lexicon:cn:: 8 - 6" />

1- Extensible Markup Language
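The XML representation of word segmentation units that this annex refers to can be sketched in code. The following is a minimal illustration under stated assumptions, not the annex's own example. The `token` and `wordForm` elements and the `tokens`, `lemma` and `entry` attributes follow the general shape suggested by the surviving fragment (`entry="urn:lexicon:..."`) and by ISO 24611. The root element name `annotation`, the helper `build_annotation`, the sample sentence and the URN value are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_annotation(tokens, word_forms):
    """Serialize tokens and word segmentation units as XML.

    tokens: list of token strings, in text order.
    word_forms: list of (lemma, token_indexes, entry) tuples, where
        token_indexes are 1-based positions into `tokens` and entry is
        an optional lexicon reference (hypothetical URN) or None.
    """
    root = ET.Element("annotation")  # root name is an assumption
    for i, text in enumerate(tokens, start=1):
        tok = ET.SubElement(root, "token", {"id": f"t{i}"})
        tok.text = text
    for lemma, indexes, entry in word_forms:
        attrs = {
            "lemma": lemma,
            # One unit may span several tokens, e.g. a multiword expression.
            "tokens": " ".join(f"t{i}" for i in indexes),
        }
        if entry is not None:
            attrs["entry"] = entry
        ET.SubElement(root, "wordForm", attrs)
    return root


# Hypothetical sample: "New York" is treated as a single unit.
root = build_annotation(
    ["New", "York", "is", "big"],
    [
        ("New York", [1, 2], "urn:lexicon:en:new_york"),
        ("be", [3], None),
        ("big", [4], None),
    ],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping tokens and word segmentation units in separate elements lets one unit point at several tokens, which is how a multiword expression can be recorded without changing the token layer.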

Bibliography

1. ISO 639-1:2002, Codes for the representation of names of languages - Part 1: Alpha-2 code
2. ISO 639-2:1998, Codes for the representation of names of languages - Part 2: Alpha-3 code
3. ISO 639-3:2007, Codes for the representation of names of languages - Part 3: Alpha-3 code for comprehensive coverage of languages
4. ISO 639-5:2008, Codes for the representation of names of languages - Part 5: Alpha-3 code for language families and groups
5. ISO 704, Terminology work - Principles and methods
6. ISO 860, Terminology work - Harmonization of concepts and terms
7. ISO 1087-1:2000, Terminology work - Vocabulary - Part 1: Theory and application
8. ISO 1087-2:2000, Terminology work - Vocabulary - Part 2: Computer applications
9. ISO 24611, Language resource management - Morpho-syntactic annotation framework
10. ISO 24612, Language resource management - Linguistic annotation framework (LAF)
11. ISO 24613:2008, Language resource management - Lexical markup framework (LMF)
12. ISO 12620, Computer applications in terminology - Data categories
13. ISO 16642:2003, Computer applications in terminology - Terminological markup framework
14. ISO 30042:2008, Systems to manage terminology, knowledge and content - TermBase eXchange (TBX)
15. Britannica Online Encyclopedia, http://www.britannica.com
16. ALLEN, J., Natural Language Understanding. 1994, Addison Wesley
17. ARONOFF, M. and REES-MILLER, J., The Handbook of Linguistics. 2001, Blackwell
18. BIBER, D. et al., Corpus Linguistics. 1998, Cambridge University Press
19. BUSSMANN, H., Routledge Dictionary of Language and Linguistics. 1996, Routledge
20. CRYSTAL, D., The Cambridge Encyclopedia of Language. 1997, Cambridge University Press
21. JOHNSON, K. and JOHNSON, H., Encyclopedic Dictionary of Applied Linguistics: A Handbook for Language Teaching. 1999, Blackwell
22. KENNEDY, G., An Introduction to Corpus Linguistics. 1998, Addison Wesley Longman
23. MATTHEWS, P.H., Morphology. 1991, Cambridge University Press
24. PACKARD, J.L., The Morphology of Chinese: A Linguistic and Cognitive Approach. 2000, Cambridge University Press
25. POOLE, S.C., An Introduction to Linguistics. 1999, Macmillan
26. RICHARDS, J. et al., Longman Dictionary of Applied Linguistics. 1985, Longman
27. UNGERER, F. and SCHMID, H.-J., An Introduction to Cognitive Linguistics. 1996, Addison Wesley Longman
28. ZHU, Dexi, Lecture on Grammar. 2003, Commercial Press (written in Chinese)
