A Comparative Evaluation of Extraction Techniques

Darren Pearce

School of Cognitive and Computing Sciences (COGS), University of Sussex, Falmer, Brighton BN1 9QH
[email protected]

Abstract

This paper describes an experiment that attempts to compare a range of existing collocation extraction techniques as well as the implementation of a new technique based on tests for lexical substitutability. After a description of the experiment details, the techniques are discussed with particular emphasis on any adaptations that are required in order to evaluate them in the way proposed. This is followed by a discussion of the relative strengths and weaknesses of the techniques with reference to the results obtained. Since there is no general agreement on the exact nature of collocation, evaluating techniques with reference to any single standard is somewhat controversial. Departing from this point, part of the concluding discussion includes initial proposals for a common framework for the evaluation of collocation extraction techniques.

1. Introduction

Over the last thirty years, there has been much discussion in the linguistics literature on the exact nature of collocation. Even now, there is no widely-accepted definition.

A by-product of the lack of any agreed definition is a lack of any consistent evaluation methodology. For example, Smadja (1993) employs the skills of a professional lexicographer, Blaheta and Johnson (2001) use native speaker judgements and Lin (1998) uses a term bank. Evaluation can also take the form of a discussion of the 'quality' of the extracted collocations (e.g. Kita and Ogata (1997)).

Since it is somewhat controversial to evaluate these techniques with reference to anything considered as a standard, whether relying on human judgement or machine-readable resources, relatively little work has been done on comparative evaluation in this area (e.g. Evert and Krenn (2001), Krenn and Evert (2001) and Kita et al. (1994)). However, the view taken in this paper is that such evaluation does offer at least some departure points for a discussion of the relative merits of each technique.

2. Existing Collocation Extraction Techniques

Various researchers in natural language processing have proposed computationally tractable definitions of collocation, accompanying empirical experiments seeking to validate their formulation. With the availability of large corpora came several techniques based on N-gram statistics derived from these large bodies of text. This begins with the z-score of Berry-Rogghe (1973) and continues later with Church and Hanks (1990), Kita et al. (1994) and Shimohata et al. (1997).

Smadja (1993) went one step further and developed a technique that not only extracted continuous sequences of words that form collocations but also those with gaps (e.g. <break down, door> in break down the old door). Underpinning the technique was the implicit inference of syntax through analysis of co-occurrence frequency distributions.

With recent significant increases in efficiency and accuracy, there is no reason why explicit parse information should not be used. Indeed, Goldman et al. (2001) emphasise the necessity of using such information since some collocational dependencies can span as many as 30 words (in French). Several researchers have exploited the availability of parse data. Lin (1998) uses an information-theoretic approach over parse dependency triples, Blaheta and Johnson (2001) use log-linear models to measure association between verb-particle pairs and Pearce (2001) uses restrictions on synonym substitutions within parse dependency pairs.

Recent research into collocation extraction (Pearce, 2001) has produced a new technique based on analysing the possible substitutions for synonyms within candidate phrases. For example, emotional baggage (a collocation) occurs more frequently than the phrase emotional luggage formed when baggage is substituted for its synonym luggage. This technique uses WordNet (Miller, 1990) as a source of synonymic information.

3. Experiment Description

The experiment compares implementations of many of the techniques discussed in the previous section. Each technique was supplied with co-occurrence frequencies obtained from 5.3 million words of the BNC. For each pair of words in this frequency data, the technique was used to assign a 'score' where the higher the score, the 'better' the collocation (according to the technique). Alternatively, the technique, based on some predefined criteria (such as the application of a threshold), could opt not to score the pair. Such a strategy tends to increase precision at the expense of recall.

Evaluation of the output list uses the multi-word information in the machine-readable version of the New Oxford Dictionary of English (NODE) (Pearsall and Hanks, 1998). This was processed to produce a list of 17,485 two-word collocations such as adhesive tape, hand grenade and value judgement. This list was subsequently reduced to the 4,152 entries that occurred at least once in the same data supplied to the techniques, thus forming the 'gold' standard.

4. Technique Details

The following subsections briefly describe each of the evaluated techniques, with particular emphasis on the assignment of scores to word sequences, especially sequences of length two. In addition, accompanying each description is a figure consisting of a small example of the application of the technique to a fabricated corpus of 1000 words. The subsection titles also serve to identify the way in which the techniques are referred to in Section 5.

Throughout the descriptions, a word ($a$, $b$, etc.), when part of a (contiguous) sequence of words $s = w_1 w_2 \ldots w_n$, may be written $w_i$, indicating its position within this sequence, where $1 \le i \le n$. It is useful to distinguish the frequency of a word, $f(w)$, the frequency of a sequence, $f(s)$, and, in particular, the distance frequency, $f_d(a, b)$, which represents the count of word $b$ occurring $d$ words after $a$. In general, the distance $d$ can be negative as well as positive, ranging over the set $D = \{-k, \ldots, -1, 1, \ldots, k\}$; when this set is restricted to a positive range only, it is written $D^+ = \{1, \ldots, k\}$. The total co-occurrence count within the window is

  $f_D(a, b) = \frac{1}{c} \sum_{d \in D} f_d(a, b)$

where the normalisation factor $c$ takes account of the possibility of counting the same word more than once within the window. The arithmetic mean of a bag $B$ is notated $\mu(B)$ and the standard deviation $\sigma(B)$. [1]

[1] These functions, defined purely for the sake of brevity, necessitate the use of bag theory. Bags are similar to sets except that they can contain repeated elements. To correspond with this similarity, they are written using the same font as for sets.

4.1. Berry-Rogghe (1973) (berry)

One of the earliest attempts at automatic extraction, this technique uses a window of words. Given a word $a$ and its frequency $f(a)$, the probability of another word $b$ occurring in any other position in the corpus is approximated by:

  $p(b) = \frac{f(b)}{N - f(a)}$

where the normalisation factor $N - f(a)$ is the number of words in the corpus that aren't $a$. The expected frequency of $a$ and $b$ within a certain window of words is then:

  $E(a, b) = p(b) \cdot f(a) \cdot |D|$

in which there are $|D|$ chances around each $a$ for $b$ to occur. The significance of the actual frequency count $f_D(a, b)$ in comparison to $E(a, b)$ is computed using a normal approximation to the binomial distribution, yielding the z-score:

  $z = \frac{f_D(a, b) - E(a, b)}{\sqrt{E(a, b)\,(1 - p(b))}}$

In order to process bigram data, this must be modified slightly such that the window is just one word wide.

Figure 1: Example for Berry-Rogghe (1973). (The original figure applies the z-score to the pair open and door in the fabricated corpus.)

4.2. Church and Hanks (1990) (church)

Widely used as a basis for collocation extraction, this technique scores the co-occurrence of two words using point-wise mutual information: [2]

  $I(a, b) = \log_2 \frac{P(a, b)}{P(a)\,P(b)}$

where word probabilities are calculated directly using relative frequency. This gives the amount of information (in bits) that the co-occurrence provides over and above the information of the individual occurrences of the two words. Since this measure becomes unstable when the counts are small, it is only calculated if $f(a, b) > 5$.

[2] The notation here differs slightly from that of Church and Hanks (1990). The distinction is made where appropriate although this is usually obvious from the context.

Figure 2: Example for Church and Hanks (1990). (The original figure scores the pair butterfly stroke using a window of two words.)

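As a concrete illustration of the two window-based scores described above, the following sketch computes a one-word-window z-score and the point-wise mutual information for a toy bigram. The counts and function names are invented for illustration; this is not the original implementation.

```python
import math

def z_score(f_ab, f_a, f_b, n, span=1):
    # Berry-Rogghe-style z-score: compare the observed co-occurrence
    # count f_ab with its expectation under independence, within a
    # window of `span` positions around each occurrence of a.
    p = f_b / (n - f_a)            # chance of b in a non-a position
    expected = p * f_a * span      # E(a, b)
    return (f_ab - expected) / math.sqrt(expected * (1 - p))

def pmi(f_ab, f_a, f_b, n):
    # Point-wise mutual information in bits, from relative frequencies.
    # Following Church and Hanks, unstable low counts are not scored.
    if f_ab <= 5:
        return None
    return math.log2((f_ab / n) / ((f_a / n) * (f_b / n)))

# A strongly associated toy pair in a 1000-word corpus.
print(z_score(8, 10, 10, 1000))
print(pmi(8, 10, 10, 1000))
```

Both measures reward pairs that co-occur far more often than their individual frequencies predict; the PMI helper returns no score at all for low counts, mirroring the precision/recall trade described in Section 3.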
4.3. Kita et al. (1994) (kita)

This technique is based on the idea of the cognitive cost of processing a sequence of words, $K(s)$. In general, the non-collocational cost of a word sequence is assumed to be linear in the number of words ($K_l(s) = |s|$). However, collocations are processed as a single unit ($K_c(s) = 1$). [3] Motivated by the rationale that a collocation would serve to reduce the cognitive cost of processing a word sequence, collocations are extracted by considering cost reduction across all occurrences of the phrase within the corpus. With the two costs calculated by:

  $C_l(s) = K_l(s) \cdot f(s) = |s| \cdot f(s)$
  $C_c(s) = K_c(s) \cdot f(s) = f(s)$

the difference represents the cost reduction, $K(s)$, for processing the word sequence as a unit rather than as separate words:

  $K(s) = C_l(s) - C_c(s) = (|s| - 1) \cdot f(s)$

The set of collocations, $C$, extracted by this method is formed by applying a threshold $\theta$ to the cost reduction:

  $C = \{\, s \mid K(s) > \theta \,\}$

Special consideration must be made, however, for the possibility of one collocational phrase being a subsequence of another. This leads to the corrected cost reduction, $\hat{K}(s)$:

  $\hat{K}(s) = (|s| - 1) \cdot \big(f(s) - \sum_{s \sqsubset t} f(t)\big)$

where $\sqsubset$ is the (contiguous) subsequence operator. Importantly, when $|s| = 2$, the uncorrected cost reduction is just the frequency of the word sequence: $K(s) = f(s)$.

[3] This is, as the authors admit, a greatly simplifying assumption.

Figure 3: Example for Kita et al. (1994). (The original figure computes cost reductions for animal liberation and animal liberation front, reducing the score of the shorter phrase to account for its occurrences within the longer one.)

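The corrected cost reduction can be sketched as follows. The frequency table is invented toy data and `is_subseq` is an illustrative helper, not part of the original implementation.

```python
def is_subseq(s, t):
    # True if s occurs as a contiguous subsequence of t.
    return any(t[i:i + len(s)] == s for i in range(len(t) - len(s) + 1))

def corrected_cost_reduction(seq, freq):
    # Kita et al.'s corrected cost reduction: (|s| - 1) times f(s) minus
    # the frequencies of longer candidate sequences that contain s.
    f_longer = sum(f for t, f in freq.items()
                   if len(t) > len(seq) and is_subseq(seq, t))
    return (len(seq) - 1) * (freq[seq] - f_longer)

freq = {("animal", "liberation"): 4, ("animal", "liberation", "front"): 2}
# Occurrences of the bigram inside the longer phrase are discounted.
print(corrected_cost_reduction(("animal", "liberation"), freq))
print(corrected_cost_reduction(("animal", "liberation", "front"), freq))
```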
4.4. Shimohata et al. (1997) (shim)

By determining the entropy of the immediate context of a word sequence, this technique ranks collocations according to the assumption that collocations occur as units in an (information-theoretically) 'noisy' environment. [4] The probability distributions of the words on each side of a sequence $s$:

  $p_l(w \mid s) = \frac{f(w, s)}{f(s)} \qquad p_r(w \mid s) = \frac{f(s, w)}{f(s)}$

are used to find the left and right entropy:

  $H_l(s) = -\sum_{i=1}^{m} p_l(w_i \mid s) \log_2 p_l(w_i \mid s)$
  $H_r(s) = -\sum_{j=1}^{n} p_r(w_j \mid s) \log_2 p_r(w_j \mid s)$

where $m$ and $n$ are the number of different words occurring to the left and the right of $s$ respectively. The entropy of the word sequence is then given by combining the two:

  $H(s) = \min\big(H_l(s), H_r(s)\big)$

[4] In fact, this forms only the first phase of the technique proposed in Shimohata et al. (1997). The second phase combines these collocational units iteratively (after applying a threshold) to yield collocations with 'gaps'. The technique is therefore capable of extracting both interrupted and uninterrupted collocations. For the purposes of the task investigated in this paper, the second phase is ignored.

Figure 4: Example for Shimohata et al. (1997). (The original figure computes the entropy of the left context of bodily harm, which is preceded in the fabricated corpus by actual and grievous.)

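A minimal sketch of the entropy computation, on invented context lists. Note that combining the left and right entropies with a minimum is reconstructed here as an assumption rather than a quotation of the original implementation.

```python
import math
from collections import Counter

def branching_entropy(context_words):
    # Entropy (in bits) of the distribution of words observed on one
    # side of a candidate sequence.
    counts = Counter(context_words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

def shim_score(left_contexts, right_contexts):
    # A sequence behaving as a unit has 'noisy' (high-entropy) contexts
    # on both sides; a low entropy on either side suggests the sequence
    # is a fragment of some longer unit.
    return min(branching_entropy(left_contexts),
               branching_entropy(right_contexts))
```

For example, a sequence always followed by the same word scores zero, however varied its left contexts are.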
4.5. Blaheta and Johnson (2001) (blaheta)

In the specific case of two-word collocations, this technique corresponds to the log odds ratio. Computed over a contingency table, this measure indicates the degree to which one variable influences another and is unaffected by sample size (Howell, 1992). For a particular pair of words $\langle a, b \rangle$, the contingency table, based on pair counts of the form $f(a, b)$, is:

                  w2 = b       w2 != b
  w1 = a          n11          n12         n11 + n12
  w1 != a         n21          n22         n21 + n22
                  n11 + n21    n12 + n22   N

where the counts $n_{ij}$ can be calculated directly from frequency information. [5] The odds of $b$ occurring given that $a$ has not occurred are $n_{21}/n_{22}$; similarly, the odds of $b$ occurring given that $a$ has occurred are $n_{11}/n_{12}$. The ratio of the second fraction to the first is the odds ratio:

  $\theta = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$

Taking the natural logarithm of this measure gives the log odds ratio:

  $\lambda = \log \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}} = \log n_{11} - \log n_{12} - \log n_{21} + \log n_{22}$

which has a standard error of:

  $\sigma = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$

Collocations are then scored by calculating

  $\lambda' = \lambda - 3.29\,\sigma$

which sets $\lambda'$ to the lower bound of a 0.001 confidence interval. This has the effect of trading recall for precision.

[5] In practice, the frequencies are corrected for continuity by adding 0.5.

Figure 5: Example for Blaheta and Johnson (2001). (The original figure scores the bigram paddling pool from a contingency table over the fabricated corpus.)

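The scoring just described can be sketched directly. The cell counts below are invented; the 0.5 continuity correction and the 3.29-standard-error penalty follow the description in the text.

```python
import math

def log_odds_score(n11, n12, n21, n22, z=3.29):
    # Log odds ratio over a 2x2 contingency table, with a continuity
    # correction of 0.5 added to every cell, penalised by z standard
    # errors to give the lower confidence bound.
    n11, n12, n21, n22 = (n + 0.5 for n in (n11, n12, n21, n22))
    lam = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    return lam - z * se

# A strongly dependent toy pair versus a fully independent one.
print(log_odds_score(30, 5, 5, 960))
print(log_odds_score(10, 10, 10, 10))
```

Subtracting several standard errors means that pairs supported by few observations score low even when their raw odds ratio is large, which is the recall-for-precision trade noted above.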
4.6. Pearce (2001) (pearce)

This technique is based on the assumption that if a phrase is semantically compositional, it is possible to substitute component words within it for synonyms without a change in meaning; if a phrase does not permit such substitutions then it is a collocation. More specifically to the experiment described in this paper, the degree to which such changes are possible is obtained and used as a score. Mapping a word to its synonyms for each of its senses is achieved using WordNet, $W$, which consists of a set of synonym sets (or synsets):

  $W = \{S_1, S_2, \ldots\}$

A phrase $s = w_1 \ldots w_n$ is considered to be one lexical realisation of a concept $\langle S_1, \ldots, S_n \rangle$. Assuming that $s$ is semantically compositional, the set of competing lexical realisations of $s$, $R(s)$, can be constructed through reference to the WordNet synsets:

  $R(s) = \{\, v_1 \ldots v_n \mid v_i \in S_i \,\}$

where $S_i$ is the WordNet synset corresponding to the correct sense of $w_i$.

For each phrase $t \in R(s)$, two probabilities are calculated. The first is the likelihood that the phrase will occur, based on the joint probability of independent trials from each synset:

  $\hat{P}(t) = \prod_{i=1}^{n} p(v_i \mid S_i)$

with $p(v \mid S)$, the probability that $v$ is 'picked' from $S$, approximated by a maximum likelihood estimate:

  $p(v \mid S) = \frac{f(v)}{\sum_{v' \in S} f(v')}$

The second probability is based on the occurrences of phrases (rather than words):

  $P(t) = \frac{f(t)}{\sum_{t' \in R(s)} f(t')}$

The difference between these two probabilities,

  $d(t) = P(t) - \hat{P}(t)$

indicates the amount of 'deviation' between what is expected under the substitution assumptions of this technique and what actually occurs. To gauge the strength of this deviation (and hence the strength of the collocation), each $d(t)$ is converted into units of standard deviation: [6]

  $z(t) = \frac{d(t)}{\sigma(\{\, d(t') \mid t' \in R(s) \,\})}$

[6] The mean of the distribution is zero.

In stark contrast to the other techniques investigated in this paper, this development of the work in Pearce (2001) requires sense information. In the absence of such data, the realisation function is modified to:

  $R(s) = \{\, v_1 \ldots v_n \mid v_i \in S,\; S \in C(w_i) \,\}$

where $C(w)$ is the concept set of $w$:

  $C(w) = \{\, S \mid w \in S \,\}$

Similar modifications are also made to the calculation of $\hat{P}$. In addition, since WordNet contains synonym information only for nouns, verbs, adjectives and adverbs, the calculations above are carried out for all possible taggings and the maximum score is used. When one or both of the words do not occur in WordNet, the pair is not scored. This trades recall for precision in a similar way to Church and Hanks (1990) and Blaheta and Johnson (2001).

Figure 6: Example for Pearce (2001). (The original figure scores pedestrian crossing against alternative realisations built from the synonyms of its component words, such as walker and crosswalk.)

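The substitution test can be sketched as below for a two-word phrase. The synsets, counts and function names are all illustrative (this is not WordNet's API), and the sign convention is chosen so that strong collocations score positively.

```python
import statistics
from itertools import product

def pearce_scores(synsets, word_freq, phrase_freq):
    # Expected share of each realisation under independent synonym
    # choice, versus its observed share of the phrase occurrences.
    expected = {}
    for phrase in product(*synsets):
        p = 1.0
        for word, synset in zip(phrase, synsets):
            p *= word_freq.get(word, 0) / sum(word_freq.get(w, 0)
                                              for w in synset)
        expected[phrase] = p
    total = sum(phrase_freq.get(ph, 0) for ph in expected)
    # The deviations sum to zero, so the distribution has mean zero.
    devs = {ph: phrase_freq.get(ph, 0) / total - expected[ph]
            for ph in expected}
    sd = statistics.pstdev(devs.values())
    return {ph: d / sd for ph, d in devs.items()}

synsets = [("pedestrian", "walker"), ("crossing", "crosswalk")]
word_freq = {"pedestrian": 12, "walker": 8, "crossing": 8, "crosswalk": 2}
phrase_freq = {("pedestrian", "crossing"): 18, ("walker", "crossing"): 1}
scores = pearce_scores(synsets, word_freq, phrase_freq)
```

Here pedestrian crossing occurs far more often than independent synonym choice would predict, so it receives the highest (positive) score among the competing realisations.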
5. Results

Each implementation returns a ranked list of word sequences, strongest collocations first (according to the technique). Taking $C$ to be the set of extracted collocations and $G$ the gold standard, recall, $R$, and precision, $P$, are defined:

  $R = \frac{|C \cap G|}{|G|} \times 100 \qquad P = \frac{|C \cap G|}{|C|} \times 100$

As in Evert and Krenn (2001), the results are presented by way of precision and recall graphs, with precision and recall calculated for the top $N\%$ of each ranked list, where $N = 5, 10, 15, \ldots, 100$. For comparison, a baseline was also obtained where scores were allocated at random to each of the bigrams occurring in the corpus. Figures 7 and 8 show recall and precision respectively for each of the techniques; Figure 10 shows the relationship between precision and recall.

Figure 7: Recall for all techniques.

Figure 8: Precision for all techniques.

Figure 10: Recall against precision for all techniques.

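The N-best evaluation behind these graphs can be sketched with a small illustrative helper (the data here are toy values, with percentages as in the definitions above):

```python
def pr_at_percent(ranked, gold, percent):
    # Precision and recall (both in %) over the top `percent`% of a
    # ranked candidate list, scored against a gold-standard set.
    n = max(1, round(len(ranked) * percent / 100))
    hits = len(set(ranked[:n]) & gold)
    return 100 * hits / n, 100 * hits / len(gold)

# Recall grows as more of the ranking is admitted; precision reflects
# how well the technique concentrates gold entries near the top.
print(pr_at_percent(["a", "b", "c", "d"], {"a", "c"}, 50))
```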
Particularly noteworthy is the degree to which church differs from the other curves, especially for precision and for recall against precision. This is due to the condition (as described in Section 4.2.) that the score is only calculated if the bigram frequency is greater than 5. Imposing this constraint leads to the loss of over 80% of the collocations in the gold standard, since most collocations occur very infrequently. This can be seen in Figure 12, which shows the first few entries in the (proportional) distributions of frequencies in the corpus and in the gold standard. With more data, this problem would be somewhat lessened.

To facilitate discussion of differences between the other techniques, Figures 9 and 11 show the same information as Figures 8 and 10 but without church.

Figure 9: Precision for all techniques except church.

Figure 11: Recall against precision for all techniques except church.

The recall curve for blaheta grows particularly smoothly. In contrast, shim performs consistently worse than even the baseline. pearce obtains consistently higher precision than the other techniques (except church) even though it lacks the sense information it requires. This is possibly due to the fact that, by reference to WordNet, bigrams involving closed-class words are not scored. Its trade of recall for precision is not nearly so severe as in church, since it still extracts nearly 90% of the gold standard.

6. Conclusions

A number of collocation extraction techniques have been evaluated in a bigram collocation extraction task. This task, however, is just one of many that could be performed and forms part of an ongoing comparative evaluation of new techniques based on Pearce (2001).

Evaluating against any one particular set of criteria is controversial simply because there is no general agreement on the definition of collocation. This is confirmed by the wide range of strategies advanced in the literature, as discussed in Section 2. In particular, some of the techniques are specifically designed to extract dependencies with gaps, such as those of Berry-Rogghe (1973) and Church and Hanks (1990), although they have been adapted here to process bigram frequency information.

As Krenn and Evert (2001) describe, there is very much a need for the concept of collocation to be precisely defined. However, as has been shown, even in the absence of such a definition, it is still possible for comparative evaluation to be a useful pursuit. Collocation extraction techniques are founded on a variety of assumptions and, in order to fully investigate the utility of a technique, it is necessary to evaluate it against a range of gold standards as well as adopting other strategies such as native speaker judgements and task-based evaluation.

Figure 12: Proportion of bigrams with counts in the range 1 to 6 in the corpus and the gold standard.

Acknowledgements

Much of this work was carried out under a studentship attached to the project 'PSET: Practical Simplification of English Text' funded by the UK EPSRC (ref GR/L53175). Further information about PSET is available at http://osiris.sunderland.ac.uk/~pset/welcome.html.

I would also like to thank my supervisor, John Carroll, for his continued support and advice.

7. References

Godelieve L. M. Berry-Rogghe. 1973. The computation of collocations and their relevance to lexical studies. In A. J. Aitken, R. W. Bailey, and N. Hamilton-Smith, editors, The Computer and Literary Studies, pages 103-112. University Press, Edinburgh, New York.

Don Blaheta and Mark Johnson. 2001. Unsupervised learning of multi-word verbs. In 39th Annual Meeting and 10th Conference of the European Chapter of the Association for Computational Linguistics (ACL39), pages 54-60, Toulouse, France, July.

Kenneth Ward Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22-29, March.

Stefan Evert and Brigitte Krenn. 2001. Methods for the qualitative evaluation of lexical association measures. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France.

Jean-Philippe Goldman, Luka Nerima, and Eric Wehrli. 2001. Collocation extraction using a syntactic parser. In 39th Annual Meeting and 10th Conference of the European Chapter of the Association for Computational Linguistics (ACL39), pages 61-66, Toulouse, France, July.

David C. Howell. 1992. Statistical Methods For Psychology. Duxbury Press, Belmont, California, 3rd edition.

Kenji Kita and Hiroaki Ogata. 1997. Collocations in language learning: Corpus-based automatic compilation of collocations and bilingual collocation concordancer. Computer Assisted Language Learning: An International Journal, 10(3):229-238.

Kenji Kita, Yasuhiko Kato, Takashi Omoto, and Yoneo Yano. 1994. A comparative study of automatic extraction of collocations from corpora: Mutual information vs. cost criteria. Journal of Natural Language Processing, 1(1):21-33.

Brigitte Krenn and Stefan Evert. 2001. Can we do better than frequency? A case study on extracting PP-verb collocations. In 39th Annual Meeting and 10th Conference of the European Chapter of the Association for Computational Linguistics (ACL39), pages 39-46, Toulouse, France, July.

Dekang Lin. 1998. Extracting collocations from text corpora. In First Workshop on Computational Terminology, Montreal, Canada, August.

George A. Miller. 1990. WordNet: An on-line lexical database. International Journal of Lexicography.

Darren Pearce. 2001. Synonymy in collocation extraction. In NAACL 2001 Workshop: WordNet and Other Lexical Resources: Applications, Extensions and Customizations, Carnegie Mellon University, Pittsburgh, June.

Judy Pearsall and Patrick Hanks, editors. 1998. The New Oxford Dictionary of English. Oxford University Press, August. Machine Readable Version.

Sayori Shimohata, Toshiyuko Sugio, and Junji Nagata. 1997. Retrieving collocations by co-occurrences and word order constraints. In 35th Conference of the Association for Computational Linguistics (ACL'97), pages 476-481, Madrid, Spain.

Frank Smadja. 1993. Retrieving collocations from text: Xtract. Computational Linguistics, 19(1):143-177, March.