From: AAAI-94 Proceedings. Copyright © 1994, AAAI (www.aaai.org). All rights reserved.

The Relationship Between Architectures and Example-Retrieval Times

Eiichiro SUMITA, Naoya NISIYAMA and Hitoshi IIDA
ATR Interpreting Telecommunications Research Laboratories
2-2 Hikaridai, Seika, Souraku, Kyoto 619-02, JAPAN
[email protected]

Abstract

This paper proposes a method to find the most suitable architecture for a given response time requirement for Example-Retrieval (ER), which searches for the best match from a bulk collection of linguistic examples. In the Example-Based Approach (EBA), which attains substantially higher accuracy than traditional approaches, ER is extensively used to carry out natural language processing tasks, e.g., parsing and translation. ER, however, is so computationally demanding that it often takes up most of the total sentence processing time. This paper compares several accelerations of ER on different architectures, i.e., serial, MIMD and SIMD. Experimental results reveal the relationship between architectures and response times, which allows us to find the most suitable architecture for a given response time requirement.

Introduction

Novel models for natural language processing (NLP) that make use of the increasing availability of large-scale corpora and bulk processing power have been studied in recent years. They are called Example-Based Approaches (EBAs) because they rely upon linguistic examples, such as translation pairs, derived from corpora.

Example-Retrieval (ER), the central mechanism of the EBA, provides the best match, which improves the coverage and accuracy of NLP systems over those of traditional methods. ER, however, is computationally demanding. Success in speeding up ER using a massively parallel associative memory processor has already been reported. This paper does not focus on such a single implementation but aims to clarify the relationship between architectures and response times, and answers the question: which architecture is most suitable for ER?

First, EBA and ER are briefly introduced; then, the computational cost of ER is described; next, various implementations and experimental results are explained, followed by discussion.

Example-Based Approach

In the early 1980s, Nagao proposed a novel model for machine translation, one that translates by mimicking best-match translation examples, based on the fact that a human translates according to past translation experience (Nagao 1984). Since the end of the 1980s, large corpora and powerful computational devices have allowed us to realize Nagao's model and expand it to deal with not only translation but also other tasks such as parsing. First, the word selection problem in machine translation was attacked (Sato 1991; Sumita & Iida 1991; Nomiyama 1992); then, experimental full translation systems[1] were realized or proposed (Sato 1991; Furuse & Iida 1992; Kitano 1991; Watanabe 1992; Maruyama & Watanabe 1992); next, case frame selection (Nagao 1992) and pp-attachment (Sumita, Furuse & Iida 1993) were investigated. These papers have demonstrated that EBAs surpass conventional approaches in several aspects, particularly coverage and accuracy.

So far, many different procedures for ER have been proposed; however, they share a framework that calculates the semantic distance between the input expression and the example expression and returns the examples whose semantic distance is minimum. Here, a typical definition of the semantic distance measure is explained (Sumita & Iida 1991). Suppose Input I and Example E are composed of n words. The semantic distance measure is the summation of the distance d(I_k, E_k) at the k-th word, multiplied by the weight of the k-th word, w_k. The distance d(I_k, E_k) is determined based on a thesaurus. The weight w_k is the degree to which the word influences the task.

As explained in detail in the next section, ER is computationally demanding. This problem was attacked through the Massively Parallel Artificial Intelligence (MPAI) paradigm (Kitano et al. 1991; Stanfill & Waltz 1986), producing a successful result (Sumita et al. 1993) using a massively parallel associative processor, IXM2 (Higuchi et al. 1991). Unlike this, the authors aim to obtain the relationship between various architectures and response times.

The Computational Cost of Example-Retrieval

ER is the full retrieval of similar examples from a large-scale example database. The total ER time is predominant in processing a sentence and depends on two parameters, i.e., the example database size and the input sentence length. From an estimation of the two parameters, we can derive the time requirement.

Two Parameters

The ER time, T, rises according to the number of examples, N. N is so large, as explained in the next subsection, that T is considerable. ER is called many times while processing a sentence. The number of ER calls, C, rises according to the sentence length, L. Consequently, the total ER time is the predominant factor in the time required for processing a sentence by an EBA system. For example, in Furuse et al.'s prototype system for spoken language translation, ER takes up about 50-80% of the time required for translating a sentence, with other processes such as pattern-matching, morphological analysis and generation consuming the rest of the time (Oi et al. 1993). If N increases, the ER/other ratio will increase.

Response Time Requirement

Here, we estimate N and L by extrapolating those of the machine translation system mentioned at the end of the previous subsection.

N depends on the vocabulary size. The vocabulary size of the prototype system is about 1,500, while the vocabulary size of the average commercially available machine translation system is about 100,000. In the prototype, N of the most common example is about 1,000. Scaling N in direct proportion to a vocabulary size of 100,000 gives about 70,000 (1,000 x 100,000 / 1,500). For the sake of convenience, we assume N = 100,000.

The investigation of the ATR dialogue database (Ehara, Ogura & Morimoto 1990) reveals that the lengths of most sentences are under 20 (words), i.e., L = 20. In our experiments, the following approximate equation holds between the number of ER calls, C, and the sentence length, L:

    C = 5L    (1)

In sum, N = 100,000 and L = 20, i.e., C = 100. The total ER time with a serial machine is then so large that it would not be acceptable in a real-time application, such as interpreting telephony, which we are striving to realize.

Goal: an ER time, T, with 100,000 examples under 1 millisecond.

Achieving this goal means that we will have succeeded in accelerating ER to a sufficient speed, because the maximum of the total ER time, 100 (= T x C = 1 x 100) milliseconds, is much less than an utterance time.

Implementations of ER

This section explains items related to the implementations: the architectures; the task, example and thesaurus-based calculation of semantic distance; and the retrieval algorithms.

Architectures

Although we have implemented and are implementing ER on many machines[2], this paper concentrates on the DEC alpha/7000, KSR1 (Kendall Square Research Corp. 1992) and MP-2 (MasPar Computer Corp. 1992) as representative of the three architectures (serial, MIMD and SIMD, respectively).

Task, Example and Thesaurus-Based Calculation of Semantic Distance

For the experiments, we chose the task of translating Japanese noun phrases of the form "A no B" into English (A and B are Japanese nouns; "no" (の) is an adnominal particle, which also appears in several variant forms). They are translated into various English noun phrases of the form "B' of A'," "B' in A'," "B' for A'," "B' at A'," and so on (A' and B' being the English translations of A and B). We used 100,000 examples, which were collected from the ASAHI newspaper (Tanaka 1991).

"A no B" is represented in a structure that consists of the fields: AW for the WORD of A; BW for the WORD of B; AC for the CODE (thesaurus code, explained below) of A; BC for the CODE of B; and NO for the WORD of "no". The example database is stored in an array of this structure with another field, D, for the semantic distance.

Each word corresponds to its concept in the thesaurus. The semantic distance between words is reduced to the semantic distance between concepts. The semantic distance between concepts is determined according to their positions in the thesaurus hierarchy[3], which is a tree. The semantic distance varies from 0 to 1. When the thesaurus is (n + 1)-layered, the semantic distance k/n is given to the concepts in the k-th layer from the bottom (0 <= k <= n).

The semantic distance is calculated based on the CODE (thesaurus code), which directly represents the thesaurus hierarchy, as in Table 1, instead of traversing the hierarchy. Our n is 3 and the width of each layer is 10. Thus, each word is assigned the three-digit decimal code of the concept to which the word corresponds.

The Index-Based Retrieval Algorithm

The basic algorithm is speeded up by an indexing technique for suppressing unnecessary computation.

Footnotes:
[1] Translation examples vary from phrases to sentences. Some input sentences match a whole example sentence. Other input sentences match a combination of fragmental examples (noun phrase, verb phrase, adverbial phrase and so on).
[2] They include the CM-2 (Thinking Machines Corp. 1990), IXM2 (Higuchi et al. 1991), iPSC/2 (Intel 1989), CM-5 (Thinking Machines Corp. 1991), and a workstation cluster.
[3] The hierarchy is in accordance with a general thesaurus (Ohno & Hamanishi 1984), a brief explanation of which is found in the literature (Sumita & Iida 1992).

478 Enabling Technologies
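The weighted distance measure described in the Example-Based Approach section (sum over word positions of the per-word thesaurus distance times the word's weight, with retrieval returning all minimum-distance examples) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the word-level distance function d and the weight vector w are supplied by the caller, since the paper derives them from a thesaurus and from task-specific word influence.

```python
def semantic_distance(input_words, example_words, d, w):
    """Semantic distance measure: sum over positions k of w[k] * d(I_k, E_k)."""
    return sum(wk * d(ik, ek)
               for wk, ik, ek in zip(w, input_words, example_words))

def retrieve_best(input_words, examples, d, w):
    """Example-Retrieval: return every example whose distance to the input
    is minimum (there may be ties)."""
    scored = [(semantic_distance(input_words, e, d, w), e) for e in examples]
    best = min(dist for dist, _ in scored)
    return [e for dist, e in scored if dist == best]
```

Note that this framing makes the serial cost visible: every call scans all N examples, which is why the paper's total ER time grows with both N and the number of calls C.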
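The three-digit thesaurus codes turn the k/n concept distance into a prefix comparison. Assuming, as the description of Table 1 suggests, that two concepts share their most specific common abstraction in the k-th layer from the bottom exactly when their codes agree in the first n - k digits, the distance can be computed without traversing the tree. The function below is a sketch under that assumption, with the paper's parameters n = 3 and a layer width of 10 (one decimal digit per layer).

```python
def code_distance(code_a: str, code_b: str, n: int = 3) -> float:
    """Concept distance k/n from thesaurus codes, where k is the layer
    (counted from the bottom) of the most specific common abstraction.
    Assumes one decimal digit per thesaurus layer, most significant first."""
    # Length of the common prefix of the two codes.
    p = 0
    while p < n and code_a[p] == code_b[p]:
        p += 1
    # p matching digits means the shared ancestor sits k = n - p layers up.
    return (n - p) / n
```

Identical codes give distance 0; codes differing already in the first digit give the maximum distance 1.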
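The record layout for "A no B" examples (fields AW, NO, BW, AC, BC, plus D for the computed distance) and an index-based retrieval can be sketched as follows. The excerpt does not detail the paper's indexing technique, so the inverted word index and the scoring below are hypothetical illustrations of "suppressing unnecessary computation", not the authors' algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Example:
    AW: str          # WORD of A
    NO: str          # WORD of the adnominal particle
    BW: str          # WORD of B
    AC: str          # thesaurus CODE of A (three decimal digits)
    BC: str          # thesaurus CODE of B
    D: float = 0.0   # semantic distance, written during retrieval

def _code_dist(a: str, b: str, n: int = 3) -> float:
    """k/n distance via common code prefix (stand-in for the paper's Table 1)."""
    p = 0
    while p < n and a[p] == b[p]:
        p += 1
    return (n - p) / n

def build_word_index(db):
    """Hypothetical inverted index: word -> indices of examples containing it."""
    idx = defaultdict(set)
    for i, ex in enumerate(db):
        idx[ex.AW].add(i)
        idx[ex.BW].add(i)
    return idx

def retrieve(db, idx, aw, bw, ac, bc):
    """Score only examples sharing A's or B's word with the input; fall back
    to a full scan when the index yields no candidates."""
    cand = (idx.get(aw, set()) | idx.get(bw, set())) or set(range(len(db)))
    best, best_d = [], float("inf")
    for i in cand:
        ex = db[i]
        ex.D = _code_dist(ac, ex.AC) + _code_dist(bc, ex.BC)
        if ex.D < best_d:
            best, best_d = [ex], ex.D
        elif ex.D == best_d:
            best.append(ex)
    return best
```

On a serial machine an index like this shrinks the scanned fraction of the 100,000-example database; the MIMD and SIMD implementations compared in the paper instead partition or broadcast the scan across processors.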