Corpus Sp eci c Stemming using Word
Form Co o ccurrence
W Bruce Croft and Jinxi Xu
Computer Science Department
University of Massachusetts Amherst
Abstract
Stemming is used in many information retrieval IR systems to reduce word forms to
common ro ots It is one of the simplest and most successful applications of natural language
pro cessing for IR Current stemming algorithms are however either in exible or di cult
to adapt to the sp eci c characteristics of a text corpus except by the manual de nition of
exception lists We prop ose a technique for using corpus based word co o ccurrence statistics
to mo dify a stemmer Exp eriments show that this technique is e ective and is very suitable
for query based stemming