<<

and Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Language and Computers Relation to language Encoding written language Prologue: Encoding Language ASCII

Spoken language Transcription Based on Dickinson, Brew, & Meurers (2013) Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

1 / 60 Language and Language and Computers – where to start? Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic I Systems with unusual If we want to do anything with language, we need a way realization to represent language. Relation to language Encoding written language I We can interact with the computer in several ways: ASCII Unicode I write or read text Spoken language I speak or listen to speech Transcription Why speech is hard to represent I Computer has to have some way to represent Articulation Measuring sound I text Acoustics I speech Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

2 / 60 Language and Outline Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Writing systems Logographic Systems with unusual realization Relation to language

Encoding written language Encoding written language ASCII Unicode Spoken language Spoken language Transcription Why speech is hard to represent Relating written and spoken language Articulation Measuring sound Acoustics Language modeling Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

3 / 60 Language and Writing systems used for human Computers Prologue: Encoding Language

What is writing? Writing systems Alphabetic Syllabic “a of more or less permanent marks used Logographic Systems with unusual to represent an utterance in such a way that it can realization be recovered more or less exactly without the Relation to language Encoding written intervention of the utterer.” language ASCII (Peter . Daniels, The World’s Writing Systems) Unicode Spoken language Transcription Why speech is hard to Different types of writing systems are used: represent Articulation Measuring sound I Alphabetic Acoustics Relating written and I Syllabic spoken language From Speech to Text I Logographic From Text to Speech Language modeling Much of the information on writing systems and the graphics used are taken from the great site http://www.omniglot.com.

4 / 60 Language and Alphabetic systems Computers Prologue: Encoding Language

Writing systems Alphabetic (phonemic alphabets) Syllabic Logographic Systems with unusual realization I represent all sounds, i.e., and Relation to language Encoding written I Examples: Etruscan, Latin, Korean, Cyrillic, Runic, language ASCII International Phonetic Unicode

Spoken language Transcription Why speech is hard to ( alphabets) represent Articulation Measuring sound I represent consonants only (sometimes plus selected Acoustics Relating written and vowels; generally available) spoken language From Speech to Text I Examples: , , From Text to Speech

Language modeling

5 / 60 Language and Alphabet example: Fraser Computers Prologue: Encoding Language An alphabet used to write Lisu, a Tibeto-Burman language spoken by Writing systems about 657,000 people in Myanmar, India, Thailand and in the Chinese Alphabetic Syllabic provinces of Yunnan and Sichuan. Logographic Systems with unusual realization Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

(from: http://www.omniglot.com/writing/fraser.htm) 6 / 60 Language and example: Phoenician Computers Prologue: Encoding Language An abjad used to write Phoenician, created between the 18th and 17th Writing systems centuries BC; assumed to be the forerunner of the Greek and Hebrew Alphabetic Syllabic alphabet. Logographic Systems with unusual realization Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling (from: http://www.omniglot.com/writing/phoenician.htm)

7 / 60 Language and A note on the -sound correspondence Computers Prologue: Encoding Language

I Alphabets use letters to encode sounds (consonants, Writing systems Alphabetic vowels). Syllabic Logographic Systems with unusual I But the correspondence between spelling and realization Relation to language

pronunciation in many languages is quite complex, i.e., Encoding written not a simple one-to-one correspondence. language ASCII Unicode I Example: English Spoken language Transcription I same spelling – different sounds: ough: ought, cough, Why speech is hard to represent tough, through, though, hiccough Articulation I silent letters: knee, knight, knife, debt, psychology, Measuring sound Acoustics mortgage Relating written and I one letter – multiple sounds: exit, use spoken language From Speech to Text I multiple letters – one sound: the, revolution From Text to Speech

I alternate spellings: jail or gaol; but not possible seagh Language modeling for chef (despite sure, dead, laugh)

8 / 60 Language and More examples for non-transparent letter-sound Computers Prologue: Encoding correspondences Language

Writing systems Alphabetic Syllabic Logographic French Systems with unusual realization Relation to language (1) a. Versailles → [veRsai] Encoding written language . ete, etais, etait, etaient → [ete] ASCII Unicode

Spoken language Transcription Why speech is hard to Irish represent Articulation Measuring sound (2) a. samhradh (summer) → [sauruh] Acoustics Relating written and b. scri’obhaim (I write) → [shgri:m] spoken language From Speech to Text From Text to Speech

Language modeling What is the notation used within the []?

9 / 60 Language and The International Phonetic Alphabet (IPA) Computers Prologue: Encoding Language

Writing systems Alphabetic I Several special alphabets for representing sounds have Syllabic Logographic been developed, the best known being the International Systems with unusual realization Phonetic Alphabet (IPA). Relation to language Encoding written language I The phonetic symbols are unambiguous: ASCII Unicode I designed so that each speech sound gets its own Spoken language , Transcription I eliminating the need for Why speech is hard to represent I multiple symbols used to represent simple sounds Articulation Measuring sound I one symbol being used for multiple sounds. Acoustics

Relating written and I spoken language Interactive example chart: http://web.uvic.ca/ling/ From Speech to Text resources/ipa/charts/IPAlab/IPAlab.htm From Text to Speech Language modeling

10 / 60 Language and Syllabic systems Computers Prologue: Encoding Language Syllabic alphabets (Alphasyllabaries) Writing systems Alphabetic I writing systems with symbols that represent a Syllabic Logographic consonant with a vowel, but the vowel can be changed Systems with unusual realization by adding a (= a symbol added to the letter). Relation to language Encoding written I Examples: Balinese, Javanese, Tibetan, Tamil, Thai, language ASCII Tagalog Unicode

(cf. also: http://www.omniglot.com/writing/syllabic.htm) Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics I writing systems with separate symbols for each Relating written and spoken language of a language From Speech to Text From Text to Speech

I Examples: Cherokee. Ethiopic, Cypriot, Ojibwe, Language modeling (Japanese)

(cf. also: http://www.omniglot.com/writing/syllabaries.htm#syll)

11 / 60 Language and example: Cypriot Computers Prologue: Encoding Language The or Cypro-Minoan writing is thought to have Writing systems developed from the Linear A, or possibly the of Crete, Alphabetic Syllabic though its exact origins are not known. It was used from about 800 to 200 Logographic Systems with unusual BC. realization Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

(from: http://www.omniglot.com/writing/cypriot.htm)

12 / 60 Language and Syllabic alphabet example: Lao Computers Prologue: Encoding Language Script developed in the 14th century to write the Lao language, based on Writing systems an early version of the , which was developed from the Old Alphabetic Syllabic , which was itself based on Mon scripts. Logographic Systems with unusual realization Example for vowel diacritics around the letter k: Relation to language Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

(from: http://www.omniglot.com/writing/lao.htm)

13 / 60 Language and Logographic writing systems Computers Prologue: Encoding Language

I Logographs (also called ): Writing systems Alphabetic I Pictographs (): originally pictures of Syllabic Logographic things, now stylized and simplified. Systems with unusual realization Example: development of Chinese horse: Relation to language Encoding written language ASCII Unicode

Spoken language I Ideographs (): representations of abstract Transcription Why speech is hard to ideas represent Articulation I Compounds: combinations of two or more logographs. Measuring sound Acoustics I Semantic-phonetic compounds: symbols with a Relating written and meaning element (hints at meaning) and a phonetic spoken language element (hints at pronunciation). From Speech to Text From Text to Speech I Examples: Chinese (Zhongw¯ en)´ , Japanese (Nihongo), Language modeling Mayan, Vietnamese, Ancient Egyptian

14 / 60 Language and Logograph example: Chinese Computers Prologue: Encoding Language Pictographs Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Relation to language

Encoding written Ideographs language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Compounds of Pictographs/Ideographs Acoustics Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

(from: http://www.omniglot.com/writing/chinese types.htm)

15 / 60 Language and Semantic-phonetic compounds Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound An example from Ancient Egyptian Acoustics Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

(from: http://www.omniglot.com/writing/egyptian.htm) 16 / 60 Language and Two writing systems with unusual realization Computers Prologue: Encoding Language Tactile Writing systems Alphabetic I is a writing system that makes it possible to read Syllabic Logographic and write through touch; primarily used by the (partially) Systems with unusual realization blind. Relation to language Encoding written I It uses patterns of raised dots arranged in cells of up to language ASCII six dots in a 3 x 2 configuration. Unicode I Each pattern represents a character, but some frequent Spoken language Transcription words and letter combinations have their own pattern. Why speech is hard to represent Articulation Measuring sound Chromatographic Acoustics Relating written and spoken language I The Benin and Edo people in southern Nigeria have From Speech to Text From Text to Speech

supposedly developed a system of writing based on Language modeling different color combinations and symbols.

(cf. http://www.library.cornell.edu/africana/Writing Systems/Chroma.html)

17 / 60 Language and Braille alphabet Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

18 / 60 Language and Chromatographic system Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

19 / 60 Language and Relating writing systems to languages Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic I Systems with unusual There is not a simple correspondence between a realization writing system and a language. Relation to language Encoding written I For example, English uses the Roman alphabet, but language ASCII (e.g., 3 and 4 instead of III and IV). Unicode Spoken language Transcription I Why speech is hard to We’ll look at three other examples: represent Articulation I Japanese Measuring sound Acoustics I Korean Relating written and I Azeri spoken language From Speech to Text From Text to Speech

Language modeling

20 / 60 Language and Japanese Computers Prologue: Encoding Language

Writing systems Japanese: logographic system , syllabary , Alphabetic Syllabic syllabary hiragana Logographic Systems with unusual realization I kanji: 5,000-10,000 borrowed Relation to language Encoding written I katakana language I ASCII used mainly for non-Chinese loan words, onomatopoeic Unicode

words, foreign names, and for emphasis Spoken language I Transcription hiragana Why speech is hard to represent I originally used only by women (10th century), but Articulation Measuring sound codified in 1946 with 48 Acoustics I used mainly for word endings, kids’ books, and for Relating written and words with obscure kanji symbols spoken language From Speech to Text I romaji: Roman characters From Text to Speech Language modeling

21 / 60 Language and Korean Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic I The system was developed in 1444 during King Systems with unusual realization Sejong’s reign. Relation to language I There are 24 letters: 14 consonants and 10 vowels Encoding written language I But the letters are grouped into syllables, i.e. the letters ASCII in a syllable are not written separately as in the English Unicode system, but together form a single character. Spoken language Transcription E.g., “Hangeul” (from: http://www.omniglot.com/writing/korean.htm): Why speech is hard to represent Articulation Measuring sound I In South Korea, (logographic Chinese characters) Acoustics Relating written and are also used. spoken language From Speech to Text From Text to Speech

Language modeling

22 / 60 Language and Azeri Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic A Turkish language with speakers in Azerbaijan, northwest Logographic Systems with unusual Iran, and (former Soviet) realization Relation to language I Encoding written 7th century until 1920s: Arabic scripts. Three different language Arabic scripts used ASCII Unicode

I 1929: enforced by Soviets to reduce Spoken language Transcription Islamic influence. Why speech is hard to represent I 1939: Cyrillic alphabet enforced by Stalin Articulation Measuring sound Acoustics I 1991: Back to Latin alphabet, but slightly different than Relating written and before. spoken language From Speech to Text From Text to Speech

Language modeling

23 / 60 Language and Encoding written language Computers Prologue: Encoding Language I Information on a computer is stored in bits. Writing systems Alphabetic I A bit is either on (= 1, yes) or off (= 0, no). Syllabic Logographic I A list of 8 bits makes up a byte, e.g., 01001010 Systems with unusual realization I Just like with the base 10 numbers we’re used to, the Relation to language Encoding written order of the bits in a byte matters: language I ASCII Big Endian: most important bit is leftmost (the standard Unicode

way of doing things) Spoken language Transcription I The positions in a byte thus encode: Why speech is hard to represent 128 64 32 16 8 4 2 1 Articulation I “There are 10 kinds of people in the world; those who Measuring sound know binary and those who don’t” Acoustics Relating written and (from: http://www.wlug.org.nz/LittleEndian) spoken language From Speech to Text I Little Endian: most important bit is rightmost (only From Text to Speech

used on Intel machines) Language modeling I The positions in a byte thus encode: 1 2 4 8 16 32 64 128

24 / 60 Language and Converting decimal numbers to binary Computers Tabular Method Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual Using the first 4 bits, we want to know how to write 10 in bit realization (or binary) notation. Relation to language Encoding written language ASCII 8 4 2 1 Unicode

???? Spoken language Transcription 8 < 10 ? ? ? Why speech is hard to represent 1 8 + 4 = 12 > 10 ? ? Articulation Measuring sound 1 0 8 + 2 = 10 ? Acoustics 1 0 1 0 Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

25 / 60 Language and Converting decimal numbers to binary Computers Division Method Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Relation to language

Encoding written Decimal Remainder? Binary language ASCII 10/2 = 5 no 0 Unicode 5/2 = 2 yes 10 Spoken language Transcription 2/2 = 1 no 010 Why speech is hard to represent 1/2 = 0 yes 1010 Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

26 / 60 Language and An encoding standard: ASCII Computers Prologue: Encoding Language

Writing systems With 8 bits (a single byte), you can represent 256 different Alphabetic Syllabic characters. Logographic Systems with unusual realization I With 256 possible characters, we can store: Relation to language I every single letter used in English, Encoding written language I plus all the things like , periods, space bar, ASCII percent sign (%), back space, and so on. Unicode Spoken language Transcription ASCII = the American Standard Code for Information Why speech is hard to represent Interchange Articulation Measuring sound I 7-bit code for storing English text Acoustics Relating written and I 7 bits = 128 possible characters. spoken language From Speech to Text I The numeric order reflects alphabetic ordering. From Text to Speech Language modeling

27 / 60 Language and The ASCII chart Computers Prologue: Encoding Language

Codes 1–31 are used for control characters (backspace, line Writing systems Alphabetic feed, tab, . . . ). Syllabic Logographic 32 48 0 65 A 82 R 97 a 114 r Systems with unusual 33 ! 49 1 66 B 83 S 98 b 115 s realization Relation to language 34 “ 50 2 67 C 84 T 99 c 116 t 51 3 68 D 85 U 100 d 117 u Encoding written 35 # language 36 $ 52 4 69 E 86 V 101 e 118 v ASCII 37 % 53 5 70 F 87 W 102 f 119 w Unicode 38 & 54 6 71 G 88 X 103 g 120 x Spoken language 39 ’ 55 7 72 H 89 Y 104 h 121 y Transcription Why speech is hard to 40 ( 56 8 73 I 90 Z 105 i 122 z represent 41 ) 57 9 74 J 91 [ 106 j 123 { Articulation Measuring sound 42 * 58 : 75 K 92 \ 107 k 124 — Acoustics 43 + 59 ; 76 L 93 108 l 125 ] } Relating written and 44 , 60 < 77 M 94 ^ 109 m 126 ˜ spoken language 45 - 61 = 78 N 95 _ 110 n 127 DEL From Speech to Text 46 . 62 > 79 O 96 ‘ 111 o From Text to Speech 47 / 63 ? 80 P 112 p Language modeling 64 @ 81 Q 113 q

28 / 60 Language and E-mail issues Computers Prologue: Encoding Language

Writing systems I Mail sent on the internet used to only be able to transfer Alphabetic Syllabic the 7-bit ASCII messages. But now we can detect the Logographic Systems with unusual incoming character set and adjust the input. realization Relation to language I Note that this is an example of meta-information = Encoding written information which is printed as part of the regular language ASCII message, but tells us something about that message. Unicode Spoken language I Multipurpose Internet Mail Extensions (MIME) provides Transcription Why speech is hard to meta-information on the text, which tells us: represent Articulation I Measuring sound which version of MIME is being used Acoustics I what the charcter set is Relating written and I if that character set was altered, how it was altered spoken language From Speech to Text From Text to Speech Mime-Version: 1.0 Content-Type: text/plain; Language modeling charset=US-ASCII Content-Transfer-Encoding: 7bit

29 / 60 Language and Different coding systems Computers Prologue: Encoding Language

Writing systems Alphabetic But wait, didn’t we want to be able to encode all languages? Syllabic Logographic Systems with unusual There are ways ... realization Relation to language I Encoding written Extend the ASCII system with various other systems, language for example: ASCII Unicode I ISO 8859-1: includes extra letters needed for French, Spoken language Transcription German, Spanish, etc. Why speech is hard to represent I ISO 8859-7: Articulation I ISO 8859-8: Hebrew alphabet Measuring sound Acoustics I JIS X 0208: Japanese characters Relating written and spoken language I Have one system for everything → Unicode From Speech to Text From Text to Speech

Language modeling

30 / 60 Language and Unicode Computers Prologue: Encoding Language

Writing systems Problems with having multiple encoding systems: Alphabetic Syllabic I Conflicts: two encodings can use: Logographic Systems with unusual I realization same number for two different characters Relation to language I different numbers for the same character Encoding written language I Hassle: have to install many, many systems if you want ASCII to be able to deal with various languages Unicode Spoken language Transcription Unicode tries to fix that by having a single representation for Why speech is hard to represent every possible character. Articulation Measuring sound “Unicode provides a unique number for every Acoustics Relating written and character, no matter what the platform, no matter spoken language what the program, no matter what the language.” From Speech to Text From Text to Speech

(www.unicode.org) Language modeling

31 / 60 Language and How big is Unicode? Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Version 6.0 has codes for 109,449 characters from Relation to language alphabets, syllabaries and logographic systems. Encoding written language ASCII I Uses 32 bits – meaning we can store Unicode 32 2 = 4, 294, 967, 296 characters. Spoken language Transcription Why speech is hard to I 4 billion possibilities for each character? That takes a lot represent Articulation of space on the computer! Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

32 / 60 Language and Compact encoding of Unicode characters Computers Prologue: Encoding Language

Writing systems I Alphabetic Unicode has three versions Syllabic Logographic I UTF-32 (32 bits): direct representation Systems with unusual 16 realization I UTF-16 (16 bits): 2 = 65536 Relation to language 8 I UTF-8 (8 bits): 2 = 256 Encoding written language I 32 ASCII How is it possible to encode 2 possibilities in 8 bits Unicode

(UTF-8)? Spoken language Transcription I Several bytes are used to represent one character. Why speech is hard to represent I Use the highest bit as flag: Articulation Measuring sound I highest bit 0: single character Acoustics I highest bit 1: part of a multi byte character Relating written and spoken language I From Speech to Text Nice consequence: ASCII text is in a valid UTF-8 From Text to Speech

encoding. Language modeling

33 / 60 Language and UTF-8 details Computers Prologue: Encoding Language I First byte unambiguously tells you how many bytes to expect after it Writing systems Alphabetic I e.g., first byte of 11110xxx has a four total bytes Syllabic Logographic I all non-starting bytes start with 10 = not the initial byte Systems with unusual realization Relation to language

Encoding written Byte 1 Byte 2 Byte 3 Byte 4 Byte 5 Byte 6language 0xxxxxxx ASCII Unicode 110xxxxx 10xxxxxx Spoken language 1110xxxx 10xxxxxx 10xxxxxx Transcription Why speech is hard to 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx represent 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx Articulation Measuring sound 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxxAcoustics Relating written and Example: Greek α (‘alpha’) has a code value of 945 spoken language From Speech to Text I Binary: 11 10110001 From Text to Speech I 11 10110001 = 011 10110001 = 01110 110001 Language modeling I Insert these numbers into x’s in the second row: 11001110 10110001 34 / 60 Language and Unwritten languages Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Many languages have never been written down. Of the 6912 Logographic Systems with unusual realization spoken languages, approximately 3000 have never been Relation to language

written down. Encoding written language ASCII Some examples: Unicode Spoken language I Salar, a Turkic language in China. Transcription Why speech is hard to represent I Gugu Badhun, a language in Australia. Articulation Measuring sound I Southeastern Pomo, a language in California Acoustics Relating written and spoken language (See: http://www.ethnologue.com/ and http://www.sil.org/mexico/ilv/iinfoilvmexico.htm) From Speech to Text From Text to Speech

Language modeling

35 / 60 Language and The need for speech Computers Prologue: Encoding Language

Writing systems Alphabetic We want to be able to encode any spoken language Syllabic Logographic Systems with unusual I What if we want to work with an unwritten language? realization Relation to language

I What if we want to examine the way someone talks and Encoding written language don’t have time to write it down? ASCII Unicode

Many applications for encoding speech: Spoken language Transcription I Building spoken dialogue systems, i.e. speak with a Why speech is hard to represent computer (and have it speak back). Articulation Measuring sound I Helping people sound like native speakers of a foreign Acoustics language. Relating written and spoken language I Helping speech pathologists diagnose problems From Speech to Text From Text to Speech

Language modeling

36 / 60 Language and What does speech look like? Computers Prologue: Encoding Language

Writing systems We can transcribe (write down) the speech into a phonetic Alphabetic Syllabic alphabet. Logographic Systems with unusual realization I It is very expensive and time-consuming to have Relation to language Encoding written humans do all the transcription. language ASCII I To automatically transcribe, we need to know how to Unicode relate the audio file to the individual sounds that we Spoken language Transcription hear. Why speech is hard to represent ⇒ We need to know: Articulation Measuring sound I some properties of speech Acoustics I Relating written and how to measure these speech properties spoken language I how these measurements correspond to sounds we From Speech to Text hear From Text to Speech Language modeling

37 / 60 Language and What makes representing speech hard? Computers Prologue: Encoding Language Sounds run together, and it’s hard to tell where one sound Writing systems ends and another begins. Alphabetic Syllabic People say things differently from one another: Logographic Systems with unusual realization I People have different Relation to language I People have different size vocal tracts Encoding written language ASCII People say things differently across time: Unicode I What we think of as one sound is not always (usually) Spoken language Transcription said the same: coarticulation = sounds affecting the Why speech is hard to represent way neighboring sounds are said Articulation Measuring sound e.g. k is said differently depending on if it is followed by ee or Acoustics Relating written and by oo. spoken language From Speech to Text I What we think of as two sounds are not always all that From Text to Speech different. Language modeling e.g. The s in see is acoustically very similar to the in shoe

38 / 60 Language and Articulatory properties: How it’s produced Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual We could talk about how sounds are produced in the vocal realization tract, i.e. articulatory Relation to language Encoding written language I place of articulation (where): [t] vs. [k] ASCII Unicode

I manner of articulation (how): [t] vs. [s] Spoken language Transcription I voicing (vocal cord vibration): [t] vs. [d] Why speech is hard to represent Articulation But we need to know acoustic properties of speech which Measuring sound we can quantify. Acoustics Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

39 / 60 Language and Measuring sound Computers Prologue: Encoding Language

Writing systems sampling rate = how many times in a given second we Alphabetic Syllabic extract a moment of sound; measured in samples per Logographic Systems with unusual second realization Relation to language I Sound is continuous, but we have to store data in a Encoding written language discrete manner. ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation CONTINUOUS DISCRETE Measuring sound Acoustics I We store data at each discrete point, in order to capture Relating written and spoken language the general pattern of the sound From Speech to Text From Text to Speech Now, we can talk about what we need to measure Language modeling

40 / 60 Language and Acoustic properties: What it sounds like Computers Prologue: Encoding Language

Sound waves = “small variations in air pressure that occur Writing systems very rapidly one after another” (Ladefoged, A Course in Alphabetic Syllabic Phonetics), akin to ripples in a pond Logographic Systems with unusual realization The main properties we measure: Relation to language Encoding written I speech flow = rate of speaking, number and length of language ASCII pauses (seconds) Unicode I loudness (amplitude) = amount of energy (decibels) Spoken language Transcription I frequencies = how fast the sound waves are repeating Why speech is hard to represent (cycles per second, i.e. Hertz) Articulation Measuring sound Acoustics I pitch = how high or low a sound is Relating written and I In speech, there is a fundamental frequency, or pitch, spoken language along with higher-frequency overtones. From Speech to Text From Text to Speech Researchers also look at things like intonation, i.e., the rise Language modeling and fall in pitch

41 / 60 Language and Oscillogram (Waveform) Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation (Check out the Speech Analysis Tutorial, of the Deptartment of Linguistics at Lund University, Sweden at Measuring sound Acoustics http://www.ling.lu.se/research/speechtutorial/tutorial.html, from which the illustrations on this and the following Relating written and slides are taken.) spoken language From Speech to Text From Text to Speech

Language modeling

42 / 60 Language and Fundamental frequency (F0, pitch) Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

43 / 60 Language and Spectrograms Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Spectrogram = a graph to represent (the frequencies of) Systems with unusual realization speech over time. Relation to language

Encoding written language ASCII Unicode

Spoken language Transcription Why speech is hard to represent Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

44 / 60 Language and Measurement-souund correspondence Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic I How dark is the picture? → How loud is the sound? Logographic Systems with unusual realization I We can measure this in decibels. Relation to language Encoding written I Where are the lines the darkest? → Which frequencies language ASCII are the loudest and most important? Unicode

I We can measure this in terms of Hertz, and it tells us Spoken language Transcription what the vowels are. Why speech is hard to represent Articulation I How do these dark lines change? → How are the Measuring sound Acoustics frequencies changing over time? Relating written and spoken language I Which consonants are we transitioning into? From Speech to Text From Text to Speech

Language modeling

45 / 60 Language and Applications of speech encoding Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Mapping sounds to symbols (alphabet), and vice versa, has Relation to language Encoding written some very practical uses. language ASCII I Automatic Speech Recognition (ASR): sounds to text Unicode Spoken language I Text-to-Speech Synthesis (TTS): texts to sounds Transcription Why speech is hard to represent As we’ll see, these are not easy tasks. Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

46 / 60 Language and Automatic Speech Recognition (ASR) Computers Prologue: Encoding Language

Writing systems Alphabetic Automatic speech recognition = process by which the Syllabic Logographic Systems with unusual computer maps a speech signal to text. realization Relation to language

Encoding written Uses/Applications: language ASCII I Dictation Unicode Spoken language I Dialogue systems Transcription Why speech is hard to I Telephone conversations represent Articulation Measuring sound I People with disabilities – e.g. a person hard of hearing Acoustics could use an ASR system to the text (closed Relating written and spoken language captioning) From Speech to Text From Text to Speech

Language modeling

47 / 60 Language and Steps in an ASR system Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual 1. Digital sampling of speech realization Relation to language 2. Acoustic signal processing = converting the speech Encoding written samples into particular measurable units language ASCII 3. Recognition of sounds, groups of sounds, and words Unicode Spoken language Transcription May or may not use more sophisticated analysis of the Why speech is hard to represent utterance to help. Articulation Measuring sound I e.g., a [t] might sound like a [d], and so word Acoustics Relating written and information might be needed (more on this later) spoken language From Speech to Text From Text to Speech

Language modeling

48 / 60 Language and Kinds of ASR systems Computers Prologue: Encoding Language

Writing systems Alphabetic Different kinds of systems, with an accuracy-robustness Syllabic Logographic tradeoff: Systems with unusual realization Relation to language I Speaker dependent = work for a single speaker Encoding written I Speaker independent = work for any speaker of a given language ASCII variety of a language, e.g. American English Unicode Spoken language Thus, a common type of system starts general, but learns: Transcription Why speech is hard to represent I Speaker adaptive = start as independent but begin to Articulation adapt to a single speaker to improve accuracy Measuring sound Acoustics

I Adaptation may simply be identifying what type of Relating written and speaker a person is and then using a model for that spoken language From Speech to Text type of speaker From Text to Speech Language modeling

49 / 60 Language and Kinds of ASR systems Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic I Systems with unusual Differing sizes and types of vocabularies realization I from tens of words to tens of thousands of words Relation to language Encoding written I might be very domain-specific, e.g., flight vocabulary language ASCII I continuous speech vs. isolated-word systems: Unicode

Spoken language I continuous speech systems = words connected Transcription Why speech is hard to together and not separated by pauses represent I isolated-word systems = single words recognized at a Articulation Measuring sound time, requiring pauses to be inserted between words Acoustics → easier to find the endpoints of words Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

50 / 60 Language and Text-to-Speech Synthesis (TTS) Computers Prologue: Encoding Language

Could just record a saying phrases or words and then Writing systems play back those words in the appropriate order. Alphabetic Syllabic Logographic I Systems with unusual This won’t work for, e.g., dialogue systems where realization speech is generated on the fly. Relation to language Encoding written language Or can break the text down into smaller units ASCII Unicode 1. Convert input text into phonetic alphabet (unambiguous) Spoken language Transcription Why speech is hard to 2. Synthesize phonetic characters into speech represent Articulation To synthesize characters into speech, people have tried: Measuring sound Acoustics I Relating written and using formulas which adjust the values of the spoken language frequencies, the loudness, etc. From Speech to Text From Text to Speech I using a model of the vocal tract and trying to produce Language modeling sounds based on how a human would speak

51 / 60 Language and Synthesizing Speech Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic In some sense, TTS really is the reverse process of ASR Logographic Systems with unusual I Since we know what frequencies correspond to which realization Relation to language

vowels, we can play those frequencies to make it sound Encoding written like the right vowel. language ASCII I However, as mentioned before, sounds are always Unicode Spoken language different (across time, across speakers) Transcription Why speech is hard to represent One way to generate speech is to have a database of Articulation speech and to use the diphones, i.e., two-sound segments, Measuring sound Acoustics

to generate sounds. Relating written and spoken language I Diphones help with the context-dependence of sounds From Speech to Text From Text to Speech

Language modeling

52 / 60 Language and Speech to Text to Speech Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization If we convert speech to text and then back to speech, it Relation to language should sound the same, right? Encoding written language ASCII I But at the conversion stages, there is information loss. Unicode To avoid this loss would require a lot of memory and Spoken language Transcription knowledge about what exact information to store. Why speech is hard to represent I Articulation The process is thus irreversible. Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

53 / 60 Language and Demos Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual realization Text-to-Speech Relation to language Encoding written language I ASCII AT&T mulitilingual TTS system: Unicode ∼ http://www2.research.att.com/ ttsweb/tts/demo.php Spoken language Transcription I various systems and languages: Why speech is hard to represent http://www.ims.uni-stuttgart.de/∼moehler/synthspeech/ Articulation Measuring sound Acoustics

Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

54 / 60 Language and N-grams: Motivation Computers Prologue: Encoding Language Let’s say we’re having trouble telling what word a person Writing systems said in an ASR system Alphabetic Syllabic I Logographic We could look it up in a phonetic dictionary Systems with unusual realization I But if we hear something like ni, how can we tell if it’s Relation to language knee, neat, need, or some other word? Encoding written language I All of these are plausible words ASCII I So, we can assign a probability, or weight, to each Unicode change: Spoken language Transcription I e.g., deleting a [t] at the end of a word is slightly more Why speech is hard to represent common than deleting a [d] Articulation I We can look at how far off a word is from the Measuring sound Acoustics pronunciation; we’ll return to the issue of minimum edit Relating written and distance with spell checking spoken language From Speech to Text I But if the previous word was I, the right choice becomes From Text to Speech clearer ... Language modeling

Material based upon chapter 5 of Jurafsky and Martin 2000

55 / 60 Language and N-gram definition Computers Prologue: Encoding Language

Writing systems Alphabetic An n-gram is a stretch of text n words long Syllabic Logographic Systems with unusual I Approximation of language: information in n-grams tells realization us something about language, but doesn’t capture the Relation to language Encoding written structure language ASCII I Efficient: finding and using every, e.g., two-word Unicode Spoken language collocation in a text is quick and easy to do Transcription Why speech is hard to represent N-grams help a variety of NLP applications, including word Articulation Measuring sound prediction Acoustics

Relating written and I N-grams can be used to aid in predicting the next word spoken language − From Speech to Text of an utterance, based on the previous n 1 words From Text to Speech

Language modeling

56 / 60 Language and Simple n-grams Computers Prologue: Encoding Language

Writing systems Let’s assume we want to predict the next word, based on the Alphabetic Syllabic previous context of I dreamed I saw the knights in Logographic Systems with unusual realization I What we want to find is the likelihood of w8 being the Relation to language next word, given that we’ seen w1, ..., w7 Encoding written language I So, we’ll have to examine P(w1, ..., w8) ASCII Unicode In general, for wn, we are looking for: Spoken language Transcription Why speech is hard to (3) P(w1, ..., wn) = P(w1)P(w2|w1)...P(wn|w1, ..., wn−1) represent Articulation Measuring sound But these probabilities are impractical to calculate: they Acoustics hardly ever occur in a corpus, if at all. Relating written and spoken language I From Speech to Text And it would be a lot of data to store, if we could From Text to Speech

calculate them. Language modeling

57 / 60 Language and Unigrams Computers Prologue: Encoding Language

Writing systems Alphabetic So, we can approximate these probabilities to a particular Syllabic Logographic n-gram, for a given n. What should n be? Systems with unusual realization I Unigrams (n = 1): Relation to language Encoding written (4) P(w |w , ..., w ) ≈ P(w ) language n 1 n−1 n ASCII Unicode

I Easy to calculate, but we have no contextual information Spoken language Transcription Why speech is hard to (5) The quick brown fox jumped represent Articulation Measuring sound (6) P(jumped|The, quick, brown, fox) ≈ P(jumped) Acoustics

Relating written and I We would like to say that over has a higher probability spoken language From Speech to Text in this context than lazy does. From Text to Speech Language modeling

58 / 60 Language and Bigrams Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic bigrams (n = 2) are a better choice and still easy to Systems with unusual realization calculate: Relation to language Encoding written (7) P(wn|w1, ..., wn−1) ≈ P(wn|wn−1) language ASCII (8) P(over|The, quick, brown, fox, jumped) ≈ Unicode Spoken language P(over|jumped) Transcription Why speech is hard to represent And thus, we obtain for the probability of a sentence: Articulation Measuring sound Acoustics (9) P(w , ..., w ) = P(w )P(w |w )P(w |w )...P(w |w ) 1 n 1 2 1 3 2 n n−1 Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

59 / 60 Language and Bigram example Computers Prologue: Encoding Language

Writing systems Alphabetic Syllabic Logographic Systems with unusual What is the probability of seeing the sentence The quick realization brown fox jumped over the lazy dog? Relation to language Encoding written language (10) P(The quick brown fox jumped over the lazy dog) = ASCII Unicode

P(The|START)P(quick|The)P(brown|quick)...P(dog|lazy) Spoken language Transcription Why speech is hard to Or, for our ASR example, we can compare: represent Articulation Measuring sound (11) P(need|I) > P(neat|I) Acoustics Relating written and spoken language From Speech to Text From Text to Speech

Language modeling

60 / 60