Automatic Corpus-Based Extraction of Chinese Legal Terms

Oi Yee Kwong and Benjamin K. Tsou
Language Information Sciences Research Centre
City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong
[email protected] [email protected]

Abstract

This paper reports on a study involving the automatic extraction of Chinese legal terms. We used a word segmented corpus of Chinese court judgments to extract salient legal expressions with standard collocation learning techniques. Our method takes the characteristics of Chinese legal terms into account. The extracted terms were evaluated by human markers and compared against a legal term glossary manually compiled from the same set of data. Results show that at least 50% of the extracted terms are legally salient. Hence they may supplement the outcome and lighten the inconsistency of human efforts.

is a bilingual glossary of mostly legal terms derived from the bilingual judgments. There are existing bilingual legal dictionaries (e.g. Department of Justice, 1998; 1999) widely used and referenced by law students and court interpreters. Nevertheless, according to many legal professionals, different terminologies are in fact used for different genres of legal documents such as statutes, judgments, and contracts. Therefore, robust and authentic glossaries are needed for different uses. The compilation of a glossary from judgments is hence one of the main tasks in the project. However, identification of legal terms and relevant concepts by humans depends to a large extent on their sensitivity which is, in turn, based on personal experience and legal knowledge. So not only is the process labour intensive, the results are also seriously prone to