Leveraging Morpho-Semantics for the Discovery of Relations in Chinese Wordnet
Shu-Kai Hsieh, Yu-Yun Chang
Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan

Abstract

Semantic relations of different types have played an important role in wordnets, and have been widely recognized in various fields. In recent years, with the growing interest in constructing semantic networks in support of intelligent systems, automatic semantic relation discovery has become an urgent task. This paper aims to extract semantic relations by relying on the in situ morpho-semantic structure in Chinese, which can dispense with an outside source such as corpus or web data. Manual evaluation of thousands of word pairs shows that most relations can be successfully predicted. We believe that it can serve as a valuable starting point complementing other approaches, which holds promise for robust lexical relation acquisition.

1 Introduction

Semantic relations are at the core of WordNet-like architectures, and constitute an essential and integral part of linguistic and conceptual knowledge formalization. However, the manual labeling of semantic relations is very laborious.

To minimize the labor, automatic ways of extracting semantic relations from textual data have been proposed in recent years. Among these methods, extensive work has been done based on the so-called pattern-based approaches, which were pioneered by Hearst (1992). The patterns, predefined or plucked out of a corpus, are often referred to as lexico-syntactic patterns, which serve as an information marker for a certain relation between two concepts. Later representative works using such approaches include (Cimiano et al., 2005) and (Pantel and Pennacchiotti, 2006). Pattern-based extraction has shown quite reasonable success, characterized by a (relatively) high precision rate, but suffers from a very low recall resulting from the fact that the patterns are rare in corpora. Remedies against the problem involve exploiting scaled data from the web (Cimiano et al., 2005), but run the risk of being influenced by the web genre (Alain, 2010).

To enrich the relation coverage in Chinese Wordnet (CWN), in this paper we propose an in situ approach that exploits morpho-semantic information. This method, simple and straightforward as it seems, does not incur the difficulties associated with lexical gaps in cross-language mapping that any translation-based model would encounter; it is also economical and complementary to previous approaches in that we can dispense with an outside corpus resource.

In what follows, Section 2 gives a brief summary of lexical semantic relation acquisition from two perspectives. Section 3 explains the proposed methods for the automatic discovery of semantic relations, which are the main focus of this study. Section 4 shows the experiment results and discussion. Finally, we conclude this paper in Section 5.

2 Relations in Chinese Wordnet

Modelled on the English WordNet, CWN was launched by Academia Sinica in 2006 and has continuously broadened its scope (Huang et al., 2010).¹ The initial version of CWN contains a manually created repository of fine-grained senses but only sparse relations. However, semantic relation labeling is a time-consuming and labor-demanding task. Two main methods have been employed for automatic relation acquisition.

¹ Freely available at http://lope.linguistics.ntu.edu.tw/cwn

2.1 Bilingual Bootstrapping Approach

Though lexical semantic relations (LSRs) could be presumed to be more universal than word senses in human languages, a direct copying or simple porting of LSRs from one wordnet to another could possibly lead to invalid relations in the target wordnet. A broader view on the underlying inference logic of cross-language LSRs with 26 rules was first proposed by (Huang et al., 2002) and formally introduced in (Hsieh, 2009). A series of large-scale bilingual bootstrapping experiments showed substantial improvements (with 55% precision) over the baseline model (47%). However, it was also reported that among the correctly predicted LSRs, a large portion (ca. 60%) belongs to non-lexical relations such as similar to, pertainym, also see, etc.

To look deeper into the issues, a second experiment focusing only on hypernymy-troponymy among verbs was conducted. The bootstrapping model returned a total of 12,214 verb pairs mapped from WordNet 3.0, which were manually evaluated. The analysis shows that around 50% of the verb pairs can be recognized as fitting in CWN; however, two main error types are identified:

[1] Lexicalization of verbs: similar to the lexical gap problems that appear in cross-language sense mapping, a single word in English often has meanings that require several words in Chinese to explain. By analyzing the results, it is found that many verbs could not be described by a single lexeme in Chinese.

[2] Mismatch of synset: other than the above, there are cases in which the hypernymy-troponymy relation of the verb pair is approved, but the synset that CWN chooses is not the same as that of Princeton WordNet (PWN). This could be due to the different semantic ranges between CWN and PWN hypernymy-troponymy pairs, or due to the subtlety of sense division when the sense levels are similar.

The bilingual bootstrapping experiments showed that lexical relations turn out not to be subject to automatic importing and would still require tremendous human validation effort.

2.2 Pattern-based Approach

There has been a variety of studies on the automatic acquisition of lexical semantic relations. Hearst (1992) first proposed a lexico-syntactic pattern-based method for the automatic acquisition of hyponymy from unrestricted texts, and since then automatically finding semantic relations by using various pattern-based algorithms has become the most common approach.

We (Lo et al., 2008) have tried to define some patterns (e.g., a manner of) to extract troponymy among verbs in Chinese. To avoid the interference of unnecessary contextual information, which may include modal verbs, hedging, and negation that often occur in different corpus genres, we applied the proposed patterns to the glosses of CWN. The results were evaluated with substitution tests. The substitution test is commonly used in the linguistic literature (Tsai et al., 2002); EuroWordnet provided linguistic tests for each semantic relation to examine its validity. In (Tsai et al., 2002), sentence formulae were created following the frame in EuroWordnet to examine the validity of certain semantic relations in Chinese. Linguistic semantic tests help researchers check whether two word meanings have a certain kind of semantic relation, and further ensure the quality and consistency of the database. Therefore, following the previous framework, a set of sentence formulae based on properties of troponymy was created to verify the correctness of hypernymy-troponymy verb pairs. However, due to data sparseness, the system achieves high precision but only low recall.

3 Morpho-semantic Linkage

Instead of assuming any external context in which the words to be linked appear, we propose to exploit the language-internal evidence manifested at the morpho-syntactic levels in Chinese, which is presumably guided by the underlying semantic composition of morphemes.

3.1 Morpho-semantics in WordNets

The idea of exploiting morpho-semantic information for the enrichment of WordNet has been discussed and implemented in the WordNet community for a while. (Miller and Fellbaum, 2003) first described the importance of adding "morphosemantic links" to WordNet, with later work (Fellbaum et al., 2009) on the classification of regular polysemous patterns of morphosemantic V-N pairs related via -er affixation (e.g., build-builder).

The notion of morpho-semantic links (MSLs) has been applied to other (morphologically rich) languages such as Czech (Pala and Hlavác̆ková, 2007) (in terms of D-relations), Turkish (Bilgin et al., 2004) and Bantu languages (Bosch et al., 2008). It is worth mentioning that the proposed morpho-semantic relations or derivational relations are relations that hold among literals (lemmas) rather than synsets, which leaves some room for discussion about the extra level at which these relations should be anchored, because neither paradigmatic nor syntagmatic relations would fit.

It should be noted here that for morphologically poor languages like Chinese, the MSLs are quite different in that they do not exist between stems and suffixes, but between word-to-be/word-used-to-be morphemes instead. This has practical advantages for the enrichment of existing paradigmatic relations, as we will introduce in the following.

3.2 Probing Morpho-Semantic Relations in Chinese

The vast majority of Chinese characters rep- [...]

[...] formation in Chinese can be used to identify these relations based on the position and semantic role of morphemes in modification.

In the case of Verb-Verb (compound) words, where the word is composed of two verbal morphemes, linguists have sorted out different types resulting from the interplay of the morphemes within (Li and Thompson, 1981). For instance, for the type of so-called 'parallel' VV compounds, V1 (the verb in the first position) and V2 (the verb in the second position) share a similar meaning (near synonyms), such as bang-zhu 'help-assist' (help) and fang-qi 'loosen-abandon' (give up). With a fine-grained sense analysis, we can label the troponymy between V1 and V1V2, where V1 is widely recognized as the component that carries the heavier semantic load in a VV compound (a.k.a. left-headedness).

In the case of Noun-Noun (compound) words, e.g., mian-dian 'noodle-shop' (noodle shop), where the word is composed of two nominal morphemes, the N1 (modifier) - N2 (head) structure is prevalently observed (a.k.a. right-headedness).
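
The headedness observations for VV and NN compounds can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `propose_relation`, the `Candidate` record, the relation labels, and the hyphenated pinyin input format are all hypothetical. In particular, linking an NN compound to its N2 head by hyponymy is an assumption made for illustration; the text above only states that N2 is the head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate relation between a compound and its head morpheme."""
    source: str    # the whole compound, e.g. "bang-zhu"
    relation: str  # hypothetical relation label
    target: str    # the head morpheme


def propose_relation(compound: str, pattern: str) -> Candidate:
    """Propose a relation candidate from the position of the head morpheme.

    `compound` is a two-morpheme word written as hyphenated pinyin and
    `pattern` is its part-of-speech pattern, "VV" or "NN" (both input
    conventions are assumptions of this sketch).
    """
    parts = compound.split("-")
    if len(parts) != 2:
        raise ValueError(f"expected two morphemes, got {compound!r}")
    first, second = parts

    if pattern == "VV":
        # Left-headed: V1 carries the heavier semantic load, so the
        # compound V1V2 is proposed as a troponym of V1.
        return Candidate(compound, "troponym_of", first)
    if pattern == "NN":
        # Right-headed: N2 is the head. Proposing hyponymy here is an
        # assumption of this sketch, not a claim made in the text above.
        return Candidate(compound, "hyponym_of", second)
    raise ValueError(f"unsupported pattern {pattern!r}")


if __name__ == "__main__":
    print(propose_relation("bang-zhu", "VV"))
    print(propose_relation("mian-dian", "NN"))
```

Any candidate produced this way would still need the kind of manual evaluation described in the abstract, since headedness is a tendency rather than a guarantee.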