Index of terms

AFFICODE, 343
affix, 7
alignable sentence matrix, 327
aligned block, 41, 42, 43, 44
alignment
  clause alignment, 8, 117, 127, 128, 130, 131, 321 See also phrase alignment
  constituent alignment, 139, 144, 147 See also phrase alignment
  multilingual alignment (more than two languages), 6, 49-69
  paragraph alignment, 43, 293
  phrase alignment, 139, 140, 161, 322
  sentence alignment, xi, 3, 5, 6, 7, 9, 16, 43, 45, 49, 51, 71, 117-39, 157, 187-200, 240, 254, 292, 313, 315, 322, 335, 343, 344, 345, 346, 369, 370, 381, 382, 387
  structural alignment, 8, 201-17
  term alignment, 12, 246, 253, 264, 266, 269
  word alignment, 5, 6, 16, 49, 69-117, 118, 132, 187, 188, 197, 198, 199, 221, 253, 255, 260, 262, 263, 321, 369, 381, 385, 387, 388
anchor matrix, 327
anchor point, 4, 6, 43, 78, 84, 119, 161, 213, 326, 329, 345, 346
ANSI, 356
ARCADE, vii, viii, 4, 6, 15, 16, 37, 44, 66, 108, 109, 115, 196, 197, 254, 369-88
ARPA, 280, 281, 289
ASCII, 356, 367
axis generator, 29

BACCS, 313, 325, 327
Bell Communications Research, 14
bigram, 29, 281
bilingual concordances, 10, 11, 98, 254, 264, 272
bilingual parsing, 139-69
bitext, 2, 8, 25-49, 107, 115, 278, 279, 280, 347
  bitext axes, 26
  bitext geometry, 25
  bitext map, 25-49, 30
  bitext slope, 26, 32, 33, 37
  bitext space, 25-49
  main diagonal, 26, 27, 28, 33, 34, 35, 37, 38, 84
bootstrapping, 6
bootstrapping grammar, 162
bootstrapping lexicon, 5
bracketing, xiv, 139-69

C/C++, 359
CALL See computer-assisted language learning
char_align, 27, 37, 119
CLIR See cross-language information retrieval
closed class expression, 102
closed class word, 30, 106, 109 See also function word
cognate, 5, 12, 29, 30, 37, 43, 45, 51, 64, 70, 71, 97, 105, 106, 119, 197, 198, 213, 375
  orthographic cognate, 29
  phonetic cognate, 29, 40
cogtag, 213
collocations, 10, 11, 104, 120, 155, 161, 225, 256
comparable corpora, 219, 220, 223, 224, 233
compound, 7, 8, 98, 128, 155, 188, 189, 190, 191, 198, 199, 240, 237-52, 255, 257, 262, 269, 321, 322, 387
computer-assisted language learning, 14, 299-311
content word, 7, 120, 121, 132, 219, 220, 229, 231, 232, 261, 262, 267, 324
context-free grammar, xv, 77, 121, 139-69
contingency (table, matrix), 119, 260, 328
contrastive studies, 3, 16, 169, 170, 177, 182, 184, 301
Convec, 219, 224, 225, 233, 234
corpus
  ARCADE, 383
  BAF, 37, 51, 196, 197, 370, 371, 372, 373, 374
  BLINKER, 15
  CCITT handbook, 15
  CELEX, 123
  COBUILD, 9
  CRATER, 15, 49, 335
  ECI, 14, 15, 72, 370
  English-Norwegian Parallel Corpus, 15
  EPPC, 335-46
  ET10-63, 335
  Hansard, 12, 14, 15, 44, 98, 202, 323, 324, 373, 374, 375, 381
  JEIDA, 15, 313, 315, 316, 320, 332
  JOC, 109, 196, 197, 370, 371, 372, 375, 383
  LINGUA, 15, 304
  MULTEXT, 15, 49, 335, 370
  MULTEXT-EAST, 15
  PAHO, 280, 281, 289
  PEDANT, 15
  SGGS, 335-46
  TELRI, 15
  UN Multilingual Corpus, 276, 280, 286
  UNICEF, 286, 287, 288, 289, 290, 293

Dice coefficient, 5, 29, 99, 119, 120, 384, 385
DICT See dictionary-based term translation
dictionary
  bilingual dictionary, xiv, 5, 10, 13, 69, 99, 119, 120, 190, 191, 192, 197, 226, 227, 230, 233, 275, 279, 280, 291, 304, 326, 329, 330, 331, 333
  bilingual dictionary extraction, 275, 276, 280, 294 See also bilingual lexicon extraction
  COBUILD, 9
  Collins Spanish-English Dictionary, 282
  Kodansha's Japanese to English dictionary, 314
  machine-readable dictionary, 10, 30, 98
  Oxford-Hachette French Dictionary, 10
  Trésor de la Langue Française, 9
dictionary-based term translation, 282, 287, 291, 292
DK-vec, 69, 71, 72, 78, 79, 80, 84, 88, 91, 92
document type definition, 203, 212, 318, 355, 357, 361, 364, 366, 367
Document Type Definition, 203
DTD See document type definition
DTW See dynamic time warping
dynamic programming, 4, 29, 57, 58, 59, 63, 64, 66, 78, 79, 80, 81, 86, 90, 91, 117, 119, 122, 133, 323, 330
dynamic time warping, 202, 208, 214

EBMT See example-based machine translation
EBT See example-based term substitution
ELRA See European Language Resource Association
EM See Estimation-Maximisation algorithm
EMIR, 188
Estimation-Maximisation algorithm, 86, 88
European Language Resource Association, 14
evaluation
  evaluation of alignment systems, 6, 14, 16, 26, 35, 43, 45, 51, 60, 65, 71, 84, 89, 92, 106, 107, 110, 123, 133, 191, 198, 204, 213, 219, 231, 232, 234, 240, 244, 246, 248, 250, 254, 286, 287, 288, 295, 332, 369-88
  evaluation of MT systems, 315
  user evaluation, 308
example-based term substitution, 275, 281, 287, 292, 293, 296

FAHQT See fully automated, high-quality machine translation
false friends, 30, 302 See also faux amis
faux amis, 30
finite-state
  finite-state automaton, 243
  finite-state parser, 267
  finite-state transducer, xv, 139, 140, 259, 325
  weighted finite-state automaton, 29, 30
F-measure, xiv, 51, 55, 111, 196, 197, 376-88
frozen expression, 7
function word, 7, 78, 109, 232, 261, 262, 267, 382

generalized vector space model, 275-98
Geometric Segment Alignment algorithm, 25-49
GLOSS, 282, 292
GLOSSER, 299-311
GSA See Geometric Segment Alignment algorithm
GVSM See generalized vector space model

Hidden Markov Model, 4
HMM See Hidden Markov Model
HTML, 349, 355, 359, 362, 363, 364

IBM model, 161 See also Brown, P. F. (authors)
IBM model 1, 276
IBM model 2, 262
IBM T. J. Watson Research Center, 14
IDF See inverse document frequency
infix, 74
inflection, 7, 113, 150, 294, 300, 305, 307
inflectional language, 73, 125
inflectional morphology, 141
information retrieval, 225, 240, 266
  cross-language information retrieval, 2, 9, 13, 16, 187-200, 275-98
  measures, xii See also precision, recall, F-measure
  monolingual information retrieval, 287-93
  multilingual information retrieval, 118
International Organization for Standardization, 352, 367
inverse document frequency, 229, 283, 288, 289, 293
IPPF See iterative proportional fitting procedure
ISO See International Organization for Standardization
ISO 3166, 356
ISO 639, 356
ISO 646, 356
ISO 8601, 361, 367
ISO 8879, 203, 364
ISO/DIS 1087, 240
ISO/IEC TR 9573-11, 318
iterative proportional fitting procedure, 260

Java, 359
JIS X 0208, 319
JIS X 201, 319

K-Vec, 5

language proportion coefficient, 79 See also length
latent semantic indexing, 275, 282, 285, 295
LCSR See longest common subsequence ratio
LDC See Linguistic Data Consortium
lemma, 73, 74, 75, 76, 77, 78, 89, 93, 96, 124, 125, 126, 146, 238, 259, 306
lemmatisation, 69, 72, 74, 76, 114, 124, 189, 238, 243, 308
length, 27
  character length, 4, 193
  clause length, 131
  sentence length, 4, 5, 51, 119, 323, 343, 348
  word length, 117
lexical anchoring, 4, 338 See also anchor points
lexicography, vii, 1, 9, 10, 11, 50, 97
lexicon
  bilingual lexicon extraction, 10, 69, 85, 88, 100, 109, 120, 219-37, 254-63 See also bilingual dictionary extraction
  core lexicon, 78, 79, 82, 83, 84
  seed translation lexicon, 30, 37, 39, 43
  translation lexicon, 40, 43, 44, 45, 98, 109, 120, 121, 141, 155, 157, 158, 159, 161
Linguistic Data Consortium, 14, 25, 223, 280, 286, 289
Linköping Word Aligner, 97-117
LISA See Localization Industry Standards Association
Localisation Industry Standards Association, 15, 347-68
LOCOLEX, 308
Logos, 353, 354
longest common subsequence ratio, 29, 30, 34, 37, 43, 105
LPC See language proportion coefficient
LSI See latent semantic indexing
LWA See Linköping Word Aligner

machine translation, 13, 40, 118, 169, 184, 223, 238, 253, 275, 283, 295, 314, 321, 323, 350, 353, 354, 362
  computer-aided translation, 117, 118, 187, 238 See also machine-aided translation
  example-based machine translation, 12, 135, 158, 254, 276, 322, 350
  fully automated high-quality machine translation, 11
  machine translation in the fifties, 2
  machine-aided human translation, 8, 11, 169, 253-74
  statistical machine translation, 69
MAHT See machine-aided human translation
MARK ALISTeR, 308
matching predicate, 27, 29, 30, 32, 34, 37, 40, 43
maximal marginal relevance, 281
maximum-likelihood, 139, 157
measure, 111
metric, 34, 35, 39, 43, 51, 120, 125, 196, 287, 327, 369, 376, 387
Microsoft, 353
MLAlign, 211, 213, 214
morphology, xv, 70, 73, 103, 104, 106, 109, 112, 115, 213, 240, 243, 250
  morphological analysis, 305, 307, 308
  morphological module, 102, 103
  morphological variant, 74, 96, 103, 113, 114, 256, 263, 272 See also inflection
MRBD See machine-readable bilingual dictionary
Multiconcord, 202
multitext, 2
multi-word unit, 102, 104, 108, 112, 382 See also compound
mutual conditional probability, 277, 294
mutual information, 99, 120, 156, 221, 244, 246, 327, 328, 329, 330

noise, 261, 263
  in evaluation measures, 381
  noise filtering, 6, 25, 28, 31, 32, 45
  noisy alignments, 54, 114, 268
  noisy corpora, 8, 10, 120, 172, 219, 220, 222, 233
  noisy output, 258, 264
  noisy texts, 6, 71, 119
  noisy translation lexicon, 160
  signal-to-noise ratio, 25, 32
noun phrase, 117, 120, 124, 129, 244, 247, 250, 253, 256, 259, 269, 270

open class expression, 102
open class word, 106 See also content word
Open Standards for Container/content Allowing Re-use, 347-68
OpenTag, 353, 354, 357, 362, 363
OSCAR See Open Standards for Container/content Allowing Re-use

PANGLOSS, 282, 292
ParaConc, 264
parallel concordancer, 10, 174 See also GLOSSER, ParaConc, TransSearch, bilingual concordances
parallel texts
  alignment, aligned parallel text (definition), 2
  definition, 1
part-of-speech, 70, 76, 239, 253, 382
  part-of-speech pattern, 241
  part-of-speech tag, 76, 77, 321
  part-of-speech tagger, 77, 98, 123
  part-of-speech tagging, xii, 124, 245, 259, 305, 308, 325, 336
pattern recognition, 25
Penn Treebank, 123
Perl, 60, 63, 106, 243
phraseology, 7, 10, 11
point of correspondence, 31, 40, 41, 43
POS See part-of-speech
precision, xii, xiv, 51, 55, 56, 59, 60, 61, 62, 77, 106-15, 126, 133, 158, 160, 191, 192, 193, 197, 198, 199, 214, 219, 222, 234, 244, 245, 258, 269, 271, 287, 289, 290, 294, 332, 374, 376-88
prefix, 74, 90
PRF See pseudo relevance feedback
pseudo relevance