Index

AD, 480-492
Accent command, 333-346, 405-412, 420-423
Accent component, 335-336, 403-408, 414, 419-423
Accent function, 141-144, 150-154
Accent group, 405-409
Accent nucleus, 338
Accent pattern, 403, 420, 444
Accent realization, 370, 375
Accent type, 189-192, 333, 338-341, 411, 499-505
Accent-lending rise, 583
Acoustic cue, 240, 328, 387, 550-553
Acoustic model, 42, 306-308, 314-322
Acoustic-phonetic information, 543, 550-553
Ad-restructuring, 481, 482-484, 489-492
Adduction, 15
Allophony, 10, 23
Ambisyllabicity, 93, 100-101, 107-115
Amplitude envelope, 62-66
Analogical reasoning, 86
Angry style, 417-418, 424-428
Anomalous sentence, 545-546
Aperiodic component, 5, 41-55
Aperiodic ratio, 51
Articulatory goal, 228-229
Articulatory synthesis, 175-184, 201, 517, 588
Aspiration noise, 32, 41, 214-215, 566-570
Assimilation, 25, 74, 101-102, 107, 265, 467, 534
Association domain, 434, 478-480, 492
Attentional state, 139-153, 553
Audiovisual intelligibility, 238
Automatic segmentation, 197, 262, 275, 308-315
Automatic stylization, 328, 347-349, 360-361

Bag-of-words strategy, 158
Baseline declination, 191-192
Bayesian classifier, 157-158, 165, 170
Bigrams, 158-165
Boundary type, 320, 500, 505
Breath group, 123, 448
British English, 92, 117-118, 191, 293, 302

Case-based reasoning, 86
Centering theory, 144-148
Cepstral distance, 286-290
Cepstral mismatch, 295
Cepstral representation, 63
Chinese, 329-332, 383-386, 397, 569
Cliticization, 125-127
Close-copy stylization, 347, 348, 353-362
Closed-response format, 544-545
Closure interval, 215-218
Co-production model, 96, 100, 111
Coarticulation, 14, 94, 100, 106-111, 119, 180-184, 245, 263-267, 314, 355, 378, 465
COCOSDA, 513-522
Cognitive effort, 544-552
Collocations, 160-170, 290
Compensatory effect, 394-398
Comprehension, 156, 521-525, 541-556
Concatenation, 3, 24, 57, 101, 125, 187, 200, 217, 261-274, 279, 287-292, 308, 355-357, 413, 459-461, 470, 530, 563-588
Connectionism, 86
Consonantal closure, 216
Consonantal constriction, 211-218
Constraint-based phrase structure grammar, 93
Constriction, 4, 41, 124, 131, 183, 200, 211-218, 229-231
Content word, 134, 158-162, 372, 444-445, 452-453, 583-585
Context dependency, 268
Context-free grammar, 123, 125-137
Continuity distortion, 279, 285-292
Coordinative structure, 227-232
Corpora, 83, 171, 197, 262, 279-280, 314-316, 332-333, 365, 383, 435-436, 533
Corpus-based method, 74, 313-314
Cross-sectional area, 200, 212-215, 226

Date expressions, 133
Decision list, 82, 157-171
Decision tree, 83-85, 157-158, 164, 170, 337
Declarative, 74, 91-94, 99-105, 111, 189-191, 343, 404-414, 427, 581-583
Declarative phonology, 74
Declination line, 406, 436-438
Definite clause grammar, 92
Deletion phenomena, 114
Demisyllable inventory, 263, 269-273
Devocalization, 504
Diphone, 24-25, 183-189, 200-203, 261-275, 280, 293-303, 378, 413, 514, 530-531, 539, 585-587
Discourse model, 139-142, 195
Discourse salience, 145, 153
Discourse , 10, 142-152, 448
Discourse structure, 139-146, 154, 192, 397, 499
, 191-196, 339, 460, 468, 473, 487
Duration rule, 96, 197, 301, 462, 469, 505, 556-557
Dutch, 77-79, 86-87, 106, 282, 434, 463, 477, 485-487, 492, 526, 580, 585-587
Dyad, 181, 563-571
Dynamic , 347, 358

Eagles, 516-526
Early peak, 461-463
Echo question, 406-413
English, 10-12, 20-27, 38, 77-82, 87, 92-93, 105-107, 112, 117-119, 125-127, 102-141, 154, 183-198, 203, 211, 220, 247, 275, 282-302, 334, 343, 366, 385-386, 398, 439, 446, 455, 462, 471, 485, 492, 522-532, 543-545, 564-572, 577, 587-588
EuroCocosda, 519, 526
Evaluation, 11, 25, 43, 50, 55, 69, 86, 91, 97, 106, 125, 168-170, 180-182, 256, 262-264, 270-275, 287, 293, 309, 315-322, 341, 365, 417-419, 424-427, 463, 470-471, 506, 511-543, 556, 584-586
Excitation, 3-6, 13-28, 41-66, 215

F0 contour, 15-16, 24, 135, 194, 262, 334-349, 354, 362, 403-411, 419-427, 444-447, 463, 566
F0 declination, 405-406, 414
F0 end point, 467
F0 maximum, 460
F0 minimum, 460
F0 peak, 203, 462-465, 572
F0 production, 403, 408
F0 synthesis, 23
F0 trace, 189, 201
Face model, 236-244
Feature sharing, 95
Feature specification, 93-98, 460
Feature structure, 94, 101-102
Final lengthening, 118, 394-397, 449, 462, 467
Finite state automaton, 136
Firthian prosodic analysis, 109-110
-timed languages, 116
Force field, 222-223
Formant bandwidth, 11, 217, 496
Formant filter, 41, 47-48
Formant frequency, 432, 495-497, 504-505
Formant parameter, 53, 213-220
Formant structure, 98, 105, 387, 504-505
Formant synthesis, 9, 25, 47, 188, 193, 202
Formant synthesizer, 10, 38, 55, 91, 184, 211-212, 220
Formant transition, 13-14, 23, 102-103, 188, 200, 509
Formant wave form, 41-43
French, 78-83, 127, 197, 202-203, 239-240, 245-248, 256-257, 329-332, 348-349, 357-372, 378, 398, 529-532, 563, 569-571, 587-588
Frequency domain, 28, 43-45, 51-54, 408
Frication noise, 4, 41, 214-215
Fujisaki model, 329, 420
Function word, 105, 136, 283, 372, 444, 449-453, 460, 466, 478, 585
Fundamental frequency contour, 66, 187-194, 203, 343, 367, 401, 414-427, 439
Fundamental frequency modification, 61

Generative phonology, 105, 110, 186-187, 432-436
German, 12, 23-25, 261-272, 328-334, 401, 411-414, 432, 459-462, 468-471, 485, 514, 529-533, 538-539, 563, 569-571, 587-588
German intonation, 401, 411
German language, 531, 539
Given/new information, 140, 145, 150-153
Global focusing, 144, 152-153
Global probability, 163-164
Glottal flow, 15-18, 27-29, 38-42, 47
Glottal opening, 28, 211-219
Glottal orifice, 212-214
Glottal source module, 569
Glottal stop, 5-9, 18-20, 25, 268-270, 352
Glottal waveform, 4-6, 30-33, 38
Glottalization, 6-25
Grapheme-phoneme conversion, 529
Grapheme-to-phoneme conversion, 77-88, 521, 530-539, 572
Greedy algorithm, 329, 383-387, 398

Hard palate, 227
Harmonic modeling, 31
Hat pattern, 461, 474
Head based implementation, 22-23
Hidden Markov models, 306, 313-317, 452
High valley, 468, 473
High-predictability sentence, 546-547
Homograph, 82-84, 131, 157-171, 313, 566
Hurried style, 417-418, 424-428

Idioms, 128, 133, 137
Information gain, 79-85
Information status, 140-145, 150-153, 454
Information theory, 83, 311
Initial demisyllable, 263, 268-270, 413
Instantaneous frequency, 48-49
Instantaneous phase, 48-49
Intermediate phrase, 478
Internal clock, 366-372
Interrogative, 136, 407-412, 581
Intervocalic consonant, 100-101, 266, 274
group, 134-135
Intonation model, 181, 192, 342, 348-349, 358, 401-403, 410-411, 459, 569
Intonation module, 569
Intonation stylization, 348, 362
Intonation synthesis, 137, 175, 187, 193, 362
Intonation(al) contour, 73, 132-136, 180, 189-194, 271, 355-358, 401-414, 434-436, 477-487, 492, 531
Intonational boundary, 484, 490
Intonational domain, 436, 477-492
Intonational phonology, 477
Intonational phrase, 189-192, 433-434, 478, 492, 569-571
Intonational phrasing module, 492, 566-572
Intonational prominence, 139, 149-153, 572
Intrinsic duration, 118, 389-390
Inventory structure, 263-274
Inverse filtering, 4-5, 30, 44
Ipex, 569-572
Ipox, 91-108
, 116, 365-366
Italian, 123-137, 305-308, 514, 532, 569, 588

Jaw model, 248, 253-257

K-nearest neighbors, 85
Klatt model, 23-24
Klatt synthesizer, 91, 99, 106, 216-221, 427

LRE, 519-523
Labial constriction, 217
Language comprehension, 154, 541, 548-554
Language generation, 581
Language processing, 555-557
Laryngeal muscles, 403, 408
Late peak, 461-467
Lemmatization, 566
Letter-to-sound rule, 531-533, 543, 572
Lexical ambiguity, 157, 170
Lexical , 123-124, 130-131, 137, 459-460, 465-466, 472, 532-533, 538
Linguistic constraint, 404, 411, 545
Linguistic features, 119, 129, 401, 406-409
Linguistic information, 123-126, 147, 333-337, 343, 439, 552, 566-568
Linguistic knowledge, 77, 317, 548-550
Linguistic structure, 5, 109-110, 118-119, 142, 328, 433-437, 553, 567-568, 577
Lip model, 235-257
Local focusing, 145-146, 152
Long-term memory, 548, 553
Low rise, 469
Low valley, 468, 473
Lpc analysis, 4, 33-34, 57-58, 188, 194

MBRtalk, 86
MITalk, 74-77, 107, 186, 203, 398, 541, 557, 587
Mandarin, 329-332, 383-398, 567-571, 588
Markov model, 306, 313-317, 452
Medial peak, 461-467
Memory load, 549-551
Memory-based reasoning, 86
Metrical foot, 97, 102-105, 111
Metrical grid, 478
Metrical parsing, 113
Metrical structure, 92, 100, 105
Metrical-prosodic structure, 104, 108
Microprosody, 355, 462-465
Mixed inventory, 261-263, 268-275
Modified rhyme test, 521, 542-543
Modular architecture for TTS, 563-568
, 331-333, 338-343
Morpa-cum-Morphon, 77, 87
Morpho-syntactic analyzer, 123, 125, 132-134
Morphologic(al) analysis, 74, 87, 123-133, 137, 569
Morphological decomposition, 529-537
Morphosyntactic structure, 92, 104, 478
Motor command, 229-232, 369
Motor control, 221-222, 227-232
Multilingual, 138, 275, 513, 520-526, 563-565, 572, 587
Multiple pronunciation, 158, 316
Muscle fiber, 223-226
Muscle tissue, 223-225
Muscular activation level, 225-227

N-gram tagger, 157, 158, 170
NetSpraak, 87
NetTalk, 86-87
NewTIS, 563-572
Ngrams, 159-160, 167, 172
Non-interactive model, 27, 29
Non-linear fitting, 32
Non-terminal peak, 92, 112, 433, 460
Nonnative speaker, 544
Nuclear accent, 11, 190, 201

Open phase, 28-32
Open quotient, 5, 28, 34-36, 215, 566-570
Open/closed vowels, 274

Pair comparison test, 270-274
Parametric control, 225, 459, 468
Part-of-speech ambiguity, 158
Partially deaccented, 459
Pause duration, 339, 367, 459, 464, 502-505
Perceptual analysis, 542, 548
Perceptual centers, 366-368, 377
Perceptual equivalence, 357-362
Perceptual evaluation, 270, 365, 463, 534, 541-543, 555, 586
Perceptual test, 50, 257, 286-288, 301-303, 360, 556
Periodic component, 32, 42-50
Phase distortion, 30
Phoneme boundary estimation, 262, 305-308
Phoneme duration, 306, 502-505
Phoneme environment, 504-505
Phonemic transcription, 78-80, 87, 192, 306
Phonetic exponency, 97-98, 104
Phonetic transcription, 77-79, 123-132, 137, 217, 294, 313-314, 321, 468, 496-497, 531-533, 539, 571, 589
Phonological phrase, 433, 478
Phonological structure, 9, 22-25, 97-99, 106-112, 117-119, 205, 437, 478
Phonological word, 136, 478
Phrasal stress, 10-12, 23
Phrase boundary, 124, 338, 406-411, 444-455, 460, 467-473
Phrase command, 333-343, 405-412, 420-423
Phrase component, 336-342, 403-408, 419-423
Phrase contour, 403-411
Phrase-level prosodic structure, 92, 105
Pitch accent, 139-142, 152-154, 189-195, 203, 217, 420, 436, 478-482, 487, 492, 566, 572
, 11, 62, 123, 327-329, 347-362, 480, 569, 583-584
Pitch movement, 347-362, 411, 436, 583
Pitch peak, 408, 459-461, 466-467
Pitch reset, 436, 460, 468
Pitch valley, 461-473
Place name, 316
Pre-head, 462, 466
Preboundary lengthening, 482, 492
Prefixation, 125, 132
Primary stress, 459
Prolog, 92
Pronunciation accuracy, 537-538
Pronunciation error, 529-531, 538
Pronunciation network, 316-318
Proper name, 128, 133, 140-141, 147-152, 452, 514-516, 530-533, 538, 588
Prosodic analysis tree, 104
Prosodic boundary, 447, 459-469, 483, 492, 584-586
Prosodic constituency, 434-436, 478-484, 492
Prosodic context, 13, 22-23, 102, 283-284, 462
Prosodic grammar, 104-105
Prosodic head, 22, 93-95
Prosodic hierarchy, 22, 433, 449, 478-480, 491-492
Prosodic parameter, 41, 57, 414, 435-437, 495-496, 509, 582
Prosodic parser, 453-455
Prosodic phonology, 432-434, 470, 477-478, 490
Prosodic phrase, 101, 123-124, 405-408, 434-436, 444-455, 462, 473
Prosodic rule, 328, 343, 583-585
Prosodic structure, 22-24, 91-92, 97, 104-105, 154, 329-332, 367, 378, 433-437, 443-455, 477, 485-492, 554
Prosodic test, 520
Prosodic transformation, 59-69
Prosodic utterance, 434, 446-453
Prosodic word, 433-434, 444-453
Pruning, 85, 163-164, 282-283, 318
Psola, 5, 24-27, 181, 261, 275, 292-293, 300, 375, 410, 531, 538, 564, 580-586
Punctuation, 136, 338-339, 385, 450-453, 466-469, 524, 571

Quantitative intonation model, 401, 411
Quantity-sensitive stress systems, 93
Quasi-periodic component, 41-43

Referent-tracking, 443
Repartition model, 365-366, 371-372
Response latency, 550
Rhythmic organization, 110
Rhythmic programming unit, 369
Rule compiler, 91-94, 187
Rule-based synthesis, 211
Running window implementation, 23
Russian, 82, 88, 563, 569-571, 588

SAM, 272, 513-519
, 459, 466, 534
Segment label, 279, 289, 319-322
Segment(al) duration, 34-38, 96-97, 110, 119, 197-203, 262, 291, 314, 367-378, 389-393, 398, 417-418, 423-432, 462-471, 501, 566-571
Segmental context, 9-13, 22-23, 281-283, 289
Segmental intelligibility, 263-264, 270-275, 525, 541-546, 554-556
Segmental test, 272, 520
Semantic ambiguity, 158
Semantic constraint, 169, 545
Sentence accent, 88, 331, 409-412, 520, 578, 583
Sentence duration, 501-502
Sentence mode, 406-413
Sentence stress, 192, 459-466, 472
Sequential network, 371-375
Silence duration, 372-374
Silent interval, 449
Simplified overlap and add technique, 137
Sinusoidal model, 55-69
Source model, 6-9, 27-33
Source parameter, 4, 15-20, 27-38, 214-215, 566-572
Source spectrum, 12-24
Spanish, 57, 293-298, 303, 332, 569-572, 587-588
Spectral energy, 281
Spectral mismatch, 293, 302
Spectral tilt, 3, 9, 15-20, 30-34, 51-53, 215, 281-282, 291, 432, 437, 495-498, 566-570
Speech corpora, 314-315, 332-333
Speech corpus, 152, 279-282, 359-362, 526
Speech database, 185, 280, 305, 313-318, 359, 383-388
Speech intelligibility, 240, 247, 255-256, 531
Speech melody, 431-433, 577-578
Speech perception, 176-178, 245-247, 517, 541, 548-555
Speech production, 38-51, 59, 179-186, 199, 211-212, 221, 227-236, 291, 331, 463-471, 517, 589
Speech quality, 4-6, 284, 521, 526-530, 549, 577, 584-585
Speech rate, 329, 365-375, 381, 422-425, 433-436, 448, 459-473, 490-492, 502
Speech recognizer, 306, 314-315, 353, 575-576
Speech rhythm, 92, 100-104, 109, 300, 378, 431
Speech style, 495, 509, 582-583
Speech timing, 202, 433, 470-471, 554
Speech waveform, 11, 30, 279-282, 289, 315-317, 566
Spontaneous narrative, 139-140, 154-156
Squish, 113-117
Static tone, 347-350
Statistical computational model, 334-337
Stop consonant, 102, 218-220, 317, 322
Street names, 529-532
Stress group, 402-406
Stress-timed languages, 116
Strict layer hypothesis, 433-434, 446, 491-492
Stylization, 328, 347-362
Sums-of-products models, 109
Superlatives, 134-135
Superposition, 401-402, 407, 437-438, 471
Suprasegmental, 6-9, 20, 113, 331, 492, 543, 586
Swedish, 203, 343, 366, 414, 433, 443-444, 449-457, 514
Syllabic tone, 358-359
Syllable boundary, 263-272, 316-318, 390, 499
Syllable compression, 102-108
Syllable constituent, 111-112
Syllable duration, 110, 370, 395-398, 414
Syllable grammar, 93-94
Syllable structure, 10, 93-95, 112-113, 187, 388, 394, 413
Syllable weight, 93
Syntactic ambiguity, 132-135
Syntactic structure, 110, 134-135, 139, 195, 413, 433-434, 455, 465, 477-478, 483-492, 546, 579-583
Synthesis module, 244, 564-569
Synthesis of intonation, 477
Synthesis-by-concept, 529, 531, 538
Synthetic face, 179, 239-247, 555

TTS architecture, 563-572
Temporal alignment, 413, 462
Temporal compression, 113-114
Text analysis module, 571
Tilt bandwidth, 16-17
Time-aligned phonetic transcription, 314, 321
Tonal perception, 347-362
Tonal segment, 351-353, 358, 436
Tonal stylization, 362
Tone copy rule, 480
Tone sequence, 203, 338
Tongue blade, 211-213, 219, 229-231, 265
Tongue body, 198, 211-216, 223-225, 231-234
Tongue tip, 224-232
Trigrams, 160-167

Unification, 94, 227
Unit distortion, 279, 285-292
Unit inventory, 263-266, 275
Unmarked style, 417-428
Unrestricted text, 171, 572-577
Unstressed, 12, 93, 103-109, 118, 219, 264, 358-359, 366, 459-466

Vector space model, 160
Velocity field, 222-223
Viseme, 245-250, 257
Visual intelligibility, 240-241, 253-256
Viterbi decoder, 317-318
Vocal tract, 3, 27-30, 41-42, 59, 73, 182-183, 211-222, 227-232, 279, 368
Voice quality, 6-10, 15, 23-28, 34-47, 53-55, 69, 194, 220, 279-282, 327, 431, 555
Voice source, 10, 25-27, 38-42, 54-55, 291, 347, 509
Voiced speech, 3-6, 27-32, 38, 67
Vowel duration, 27, 36-38, 118, 388-398, 462, 502-505
Vowel elision, 103, 108
Vowel quality, 103, 529, 534-536
Vowel quantity, 462, 534-538
Vowel reduction, 92, 100-104, 264

Waveform manipulation, 575, 584-585
Wh-question, 411-412
Whispered speech, 42, 50-54
Word accent, 331, 402-409, 443-448, 529
Word class model, 365, 370, 443, 452, 520, 529
Word concatenation, 564, 580-586
Word pronunciation module, 88

Yes/no question, 407-412
Yorktalk, 91-101, 107-119