Latent Semantic Kernels for Wordnet: Transforming a Tree-Like Structure Into a Matrix

Latent Semantic Kernels for WordNet: Transforming a Tree-like Structure into a Matrix Young-Bum Kim Yu-Seop Kim Department of Computer Engineering Department of Computer Engineering Hallym University Hallym University Chuncheon, Gangwon, 200-702, Korea Chuncheon, Gangwon, 200-702, Korea [email protected] [email protected] Abstract considering its context. Latent Semantic Kernel, which was proposed by [3], WordNet is one of the most widely used linguistic re- has shown simpli£ed methods to measure the similarity be- sources in the computational linguistics society. However, tween documents and also has been applied to many appli- many applications using the WordNet hierarchical structure cations such as an automatic assessment system of [7]. Un- are suffering from the word sense disambiguation (WSD) like the WordNet based methods, the kernel method fortu- caused by its polysemy. In order to solve the problem, nately has no need to consider the polysemy problem during we propose a matrix representing the WordNet hierarchical measurement process. This is the reason we tried to trans- structure. Firstly, we transform a term as a vector with ele- form the WordNet structure into the kernel form, that is, a ments of each corresponding to a synset of WordNet. Then, matrix. This matrix from WordNet has another nice prop- with singular value decomposition (SVD), we reduce the di- erty that it can easily be integrated with other matrix-based mension size of the vector to represent the latent semantic information in other application later. structure. For evaluation, we implement an automatic as- At £rst, we built a term-synset matrix, like a term- sessment system for short essays and acquire reliable accu- document matrix in traditional IR society, with a given Ko- racy. As a result, the scores which are assessed by the au- rean WordNet which is called KorLEX [10]. We initially tomatic assessment system are signi£cantly correlated with gave a score, which was calculated based on a distance to those of human assessors. The new WordNet is expected to the given synset, to each element of the matrix. Then we be easily combined with other matrix-based approaches. constructed a new semantic kernel from the matrix using the SVD algorithm. For evaluation of the new kernel matrix, we implemented an automatic assessment system for 1 Introduction short essay questions. The correlation coef£cient between the automatic system and a human was 0.922. Section 2 will describe the representation of the trans- For a decade, WordNet [4] has been one of the most formed WordNet. The latent semantic kernel method inte- widely used linguistic resources, and is a broad coverage grated with WordNet will be explained in section 3. The lexical network of English words [2]. [1] made use of Wor- experimental results and the concluding remarks will be de- Net to measure the semantic distance between concepts and scribed in section 4 and 5, respectively. [13] used WordNet to disambiguate word senses. [9], [5], and [11] measured the relatedness of concepts, not words themselves, and the relatedness of words were estimated by 2 Transformed WordNet Structure using an assumption proposed by [11]. [11] measured the relatedness between words by calculating the related- Synsets (synonym sets) are representing speci£c under- ness between the most-related pair of concepts. lying lexical concepts in the WordNet. Even though the The assumption would be proper to measure the relat- WordNet has been utilized to solve many semantic prob- edness of words themselves. However, it is less proper to lems in computational linguistics and information retrieval measure the similarity between sentences, paraphrases, and societies, its inherent polysemy problem, that is, one term even documents, and a new complicated method is needed could be shown in multiple synsets, has also caused another to be developed to disambiguate sense of each word with problems. In order to solve the problem, we adapted the latent semantic kernel to the WordNet. eigenvectors associated with the r nonzero eigenvalues of Firstly, we built a term-synset matrix by transforming the AAT and AT A, respectively. The original term-synset ma- WordNet hierarchical structure. Each row vector of the matrix (A) has size of m × n. One component matrix (U), trix is associated with each term listed in WordNet and the with m × r, describes the original row entities as vectors of terms are also appearing more than once in corpus data hav- derived orthogonal factor value, another (V ), with n×r, de- ing about 38,000 documents. The vector can be represented scribes the original column entities in the same way, and the like: third (Σ) is a diagonal matrix, with r ×r, containing scaling values when the three components are matrix-multiplied, tj =<s1,s2,...,si,...,sN > (1) the original matrix is reconstructed. The singular vectors corresponding to the k (k ≤ r) largest singular values are , where tj means a row vector for the j-th term and N is s then used to de£ne k-dimensional synset space. Using these the total number of synsets in the WordNet. And i,setto m × k n × k U V zero initially, is calculated by vectors, and matrices k and k may be rede- £ned along with k×k singular value matrix Σk. It is known α A = U Σ V T k s = that k k k k is the closest matrix of rank to the i k (2) 2 original matrix A.AndUk is replaced with P . [8] explains , where α is a constant value. The si is decreasing along more details of above SVD-based methods, latent semantic with the number of edges, 0 ≤ k ≤ kmax, on the path from analysis (LSA). the synset including the term tj to the i-th synset. The kmax decides the range of synsets related with the term. If kmax 4 Evaluation is increasing, more synsets will be regarded as to be related with the synset of the term. In this paper, we decided that In this section, we will £rstly explain the automatic as- k =2 α =2 max and also . sessment procedure developed to evaluate the usefulness of Figure 1 from [12] shows a part of WordNet extract for the new kernel based on WordNet, and then show the use- { the term ‘car’. The term appears in multiple synsets, car, fulness by giving the assessment accuracy. gondola}, {car, railway car}, and {car, automobile}. Then s (α=2) =2 the values of i for the synsets are determined to (20=1) 4.1 Automatic Assessment for Short Es- in this paper. The adjacent synsets to {car, automobile}, say which are {motor vehicle}, {coupe}, {sedan}, and {taxi} (α=2) in £gure 1, are all given (21=2) =1s as their si values. This Figure 2 shows the whole process of the automatic as- procedure is continued until the kmaxth adjacent synsets are sessment system developed to evaluate the new kernel. The faced. whole process is started with the sentence input written by a student. At £rst, Korean Morphological Analyzer 3 Latent Semantic Kernel for Similarity (KMA) [6] extracts main indexed terms from the input sen- Measurement tence. With the term list, an initial vector is constituted with elements of the vocabulary generated from both a large document collection and WordNet. 16,000 words were listed in With the initial term-synset matrix, A, created above, we the vocabulary. Then, the dimension is reduced by comput- build the latent semantic kernel [3]. Similarity between doc- T ing P d, where P is the kernel from WordNet and d is the uments, d1 and d2, is estimated as follows. initial vector. With model sentences, which were created by instructors and were transformed on the same way as the dT PPd sim(d ,d )=cos(P T d ,PT d )= 1 2 input student sentence, the similarities are estimated by us- 1 2 1 2 T T (3) |P d1||P d2| ing the equation (3), where the student sentence is mapped into d1 and the model sentences are mapped into d2.Inthis P , where is a matrix transforming documents from an input research, we stored £ve model sentences for each question. k(d ,d )=< space to a feature space. A kernel function 1 2 Finally, the highest similarity value is determined as the £- φ(d1),φ(d2) > uses the matrix P to replace φ(d1) with T nal score of the student sentence. P d1. To £nd P , the term-synset matrix A is transformed by 4.2 Comparison to Human Assessors using SVD like A = UΣV T We gave 30 questions about Chinese proverbs to 100 stu- (4) dents, requiring proper meaning of each proverbs. Firstly, a , where Σ is a diagonal matrix composed of nonzero eigen- human assessor decided whether each answer is correct or values of AAT or AT A, and U and V are the orthogonal not and gave one of two scores, 1, and 0, respectively. The Figure 1. WordNet Extract for the term ‘car’ Figure 2. Automatic Assessment Process score of each student are then computed. Likewise, our as- References sessment system does the same thing except that the system gives a partial score to each answer ranged from 0 to 1. [1] A. Budanitsky and G. Hirst. Semantic distance in word- Figure 3 shows the correlation between the scores from net: An experimental, application-oriented evaluation of £ve the human assessor and those of the automatic assessor. The measures. Workshop on WordNet and Other Lexical Re- correlation coef£cient value is 0.922. As shown in £gure sources, Second Meeting of the North Americal Chapter of 3, the assessment system tends to give a little lower score the Association for Computational Linguistics, 2001.

Load more