
Language and Computers (Ling 384)
Topic 1: Text and Speech Encoding

Detmar Meurers∗
Dept. of Linguistics, OSU
Winter 2005

∗ The course was created together with Markus Dickinson and Chris Brew.

Outline

Writing systems
  Alphabetic
  Syllabic
  Logographic
  Systems with unusual realization
  Relation to language
  Comparison of systems
Encoding written language
  ASCII
  Unicode
  Typing it in
Spoken language
  Transcription
  Why speech is hard to represent
  Articulation
  Acoustics
Relating written and spoken language
  From Speech to Text
  From Text to Speech

Language and Computers – where to start?

- If we want to do anything with language, we need a way to represent language.
- We can interact with the computer in several ways:
  - write or read text
  - speak or listen to speech
- Computer has to have some way to represent
  - text
  - speech


Writing systems used for human languages

What is writing?
"a system of more or less permanent marks used to represent an utterance in such a way that it can be recovered more or less exactly without the intervention of the utterer."
(Peter T. Daniels, The World's Writing Systems)

Different types of writing systems are used:
- Alphabetic
- Syllabic
- Logographic

Much of the information on writing systems and the graphics used are taken from the amazing site http://www.omniglot.com.

Alphabetic systems

Alphabets (phonemic alphabets)
- represent all sounds, i.e., consonants and vowels
- Examples: Etruscan, Latin, Korean, Cyrillic, Runic, International Phonetic Alphabet

Abjads (consonant alphabets)
- represent consonants only (sometimes plus selected vowels; [...] generally available)
- Examples: [...]

Alphabet example: Fraser

An alphabet used to write Lisu, a Tibeto-Burman language spoken by about 657,000 people in Myanmar, India, Thailand and in the Chinese provinces of Yunnan and Sichuan.

(from: http://www.omniglot.com/writing/fraser.htm)


Abjad example: Phoenician

An alphabet used to write Phoenician, created between the 18th and 17th centuries BC; assumed to be the forerunner of the Greek and Hebrew alphabets.

(from: http://www.omniglot.com/writing/phoenician.htm)

A note on the letter-sound correspondence

- Alphabets use letters to encode sounds (consonants, vowels).
- But the correspondence between spelling and pronunciation in many languages is quite complex, i.e., not a simple one-to-one correspondence.
- Example: English
  - same spelling – different sounds: ough: ought, cough, tough, through, though, hiccough
  - silent letters: knee, knight, knife, debt, psychology, mortgage
  - one letter – multiple sounds: exit, use
  - multiple letters – one sound: the, revolution
  - alternate spellings: jail or gaol; but seagh is not a possible spelling of chef (despite sure, dead, laugh)

More examples for non-transparent letter-sound correspondences

French
(1) a. Versailles → [veRsai]
    b. été, étais, était, étaient → [ete]

Irish
(2) a. Baile Átha Cliath (Dublin) → [bl'a: kli uh]
    b. samhradh (summer) → [sauruh]
    c. scríobhaim (I write) → [shgri:m]

What is the notation used within the []?
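The French examples above can be read as a small lookup table in which several spellings share one pronunciation. The following is only an illustrative sketch: the dictionary holds the slide's own rough transcriptions, and the helper names `spellings_by_sound` and `is_one_to_one` are made up for this example.

```python
from collections import defaultdict

# Spelling -> pronunciation pairs copied from the French examples above.
# The transcriptions are the slide's own rough notation, not careful IPA.
PRONUNCIATIONS = {
    "Versailles": "veRsai",
    "été": "ete",
    "étais": "ete",
    "était": "ete",
    "étaient": "ete",
}


def spellings_by_sound(pronunciations):
    """Invert the table: pronunciation -> list of spellings that share it."""
    by_sound = defaultdict(list)
    for spelling, sound in pronunciations.items():
        by_sound[sound].append(spelling)
    return dict(by_sound)


def is_one_to_one(pronunciations):
    """True only if no two spellings share a pronunciation."""
    groups = spellings_by_sound(pronunciations).values()
    return all(len(group) == 1 for group in groups)


by_sound = spellings_by_sound(PRONUNCIATIONS)
print(by_sound["ete"])                 # the four spellings pronounced alike
print(is_one_to_one(PRONUNCIATIONS))   # False: spelling and sound do not pair up
```

A table like this only captures the "multiple spellings, one sound" direction; the English ough cases would need the reverse kind of table, from one spelling to several sounds.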

The International Phonetic Alphabet (IPA)

- Several special alphabets for representing sounds have been developed, the best known being the International Phonetic Alphabet (IPA).
- The phonetic symbols are unambiguous:
  - designed so that each speech sound gets its own symbol,
  - eliminating the need for
    - multiple symbols used to represent simple sounds
    - one symbol being used for multiple sounds.
- Interactive example chart: http://web.uvic.ca/ling/resources/ipa/charts/IPAlab/IPAlab.htm

Syllabic systems

Syllabic alphabets (Alphasyllabaries)
- writing systems with symbols that represent a consonant with a vowel, but the vowel can be changed by adding a diacritic (= a symbol added to the letter).
- Examples: Balinese, Javanese, Tibetan, Tamil, Thai, Tagalog
(cf. also: http://www.omniglot.com/writing/syllabic.htm)

Syllabaries
- writing systems with separate symbols for each syllable of a language
- Examples: Cherokee, Ethiopic, Cypriot, Ojibwe, [...] (Japanese)
(cf. also: http://www.omniglot.com/writing/syllabaries.htm#syll)

Syllabary example: Cypriote

The Cypriot syllabary or Cypro-Minoan writing is thought to have developed from the [...], or possibly the [...] of [...], though its exact origins are not known. It was used from about 800 to 200 BC.

(from: http://www.omniglot.com/writing/cypriot.htm)


Syllabic alphabet example: Lao

Script developed in the 14th century to write the Lao language, based on an early version of the [...], which was developed from the Old [...], which was itself based on Mon scripts.

Example for vowel diacritics around the letter k: [Figure]

(from: http://www.omniglot.com/writing/lao.htm)

Logographic writing systems

- Logographs (also called logograms):
  - Pictographs (pictograms): originally pictures of things, now stylized and simplified.
    Example: development of Chinese horse: [Figure]
  - Ideographs (ideograms): representations of abstract ideas
  - Compounds: combinations of two or more ideographs or ideograms.
  - Semantic-phonetic compounds: symbols with a meaning element (hints at meaning) and a phonetic element (hints at pronunciation).
- Examples: Chinese (Zhōngwén), Japanese (Nihongo), Mayan, Vietnamese, Ancient Egyptian

Logograph example: Chinese

Pictographs [Figure]
Ideographs [Figure]
Compounds of Pictographs/Ideographs [Figure]

(from: http://www.omniglot.com/writing/chinese_types.htm)


Semantic-phonetic compounds

An example from Ancient Egyptian [Figure]

Two writing systems with unusual realization

Tactile
- Braille is a writing system that makes it possible to read and write through touch; primarily used by the (partially) blind.
- It uses patterns of raised dots arranged in cells of up to six dots in a 3 x 2 configuration.
- Each pattern represents a character, but some frequent words and letter combinations have their own pattern.

Chromatographic
The Benin and Edo people in southern Nigeria have developed a system of writing based on different color combinations and symbols.
(cf. http://www.library.cornell.edu/africana/Writing_Systems/Chroma.html)

Braille alphabet

[Figure]
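Because a Braille cell has six positions that are each either raised or flat, it behaves like a six-bit code with 2^6 = 64 possible patterns. The sketch below illustrates this using the Unicode "Braille Patterns" block, where dot n corresponds to bit n-1 above the base code point U+2800. The helper `braille_cell` and the three-letter table are illustrative names for this example only, and the dot numbering (1 to 3 down the left column, 4 to 6 down the right) is the usual convention rather than something stated on the slide.

```python
BRAILLE_BASE = 0x2800  # first code point of the Unicode "Braille Patterns" block


def braille_cell(dots):
    """Return the Unicode character for a set of raised dots (numbered 1-6)."""
    code = BRAILLE_BASE
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError("a six-dot cell only has dots 1 to 6")
        code |= 1 << (dot - 1)  # dot n is stored in bit n-1
    return chr(code)


# Raised dots for the first three letters of the Braille alphabet.
LETTER_DOTS = {"a": {1}, "b": {1, 2}, "c": {1, 4}}

print(2 ** 6)  # number of distinct six-dot patterns, including the empty cell
for letter, dots in LETTER_DOTS.items():
    print(letter, braille_cell(dots))
```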

(Ancient Egyptian example from: http://www.omniglot.com/writing/egyptian.htm)

Chromatographic system

[Figure]

Relating writing systems to languages

- There is not a simple correspondence between a writing system and a language.
- For example, English uses the Roman alphabet, but Arabic numerals (e.g., 2 instead of the Roman II).
- We'll look at three other examples:
  - Japanese
  - Korean
  - Azeri

Japanese

Japanese: logographic system kanji, syllabary katakana, syllabary hiragana
- kanji: 5,000-10,000 borrowed Chinese characters
- katakana
  - Used mainly for non-Chinese loan words, onomatopoeic words, foreign names, and for emphasis
- hiragana
  - Originally used only by women (10th century), but codified in 1946 with 48 [...]
  - used mainly for word endings, kids' books, and for words with obscure kanji symbols
- Romaji: Roman characters
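Since Japanese mixes several scripts in one text, a program sometimes needs to tell them apart. One rough way to do so, sketched below, is to look at the official Unicode character names via Python's standard `unicodedata` module. The function `script_of` and its category labels are invented for this illustration, and the prefix test is a simplification that ignores rarer character ranges.

```python
import unicodedata


def script_of(ch):
    """Roughly classify one character by the prefix of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("LATIN"):
        return "romaji"
    return "other"


# One character from each script mentioned on the slide.
for ch in "漢ひカa":
    print(ch, script_of(ch))
```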


Japanese example

[Figure] kanji (red), hiragana (black), katakana (blue)

Translation:
Capsule Hotel
A simple hotel where each room is capsule-shaped. When businessmen miss the last train home, they can stay overnight very cheaply instead of paying a lot of money to go home by taxi.

(from: http://www.omniglot.com/writing/japanese.htm#origin)

Korean

"Korean writing is an alphabet, a syllabary and logographs all at once." (http://home.vicnet.net.au/~ozideas/writkor.htm)

- The system was developed in 1444 during King Sejong's reign.
- There are 24 letters: 14 consonants and 10 vowels
- But the letters are grouped into syllables, i.e. the letters in a syllable are not written separately as in the English system, but together form a single character.
  E.g., "Hangeul" (from: http://www.omniglot.com/writing/korean.htm): [Figure]
- In South Korea, hanja (logographic Chinese characters) are also used.

Azeri

A Turkic language with speakers in Azerbaijan, northwest Iran, and (former Soviet) [...]

- 7th century until 1920s: Arabic scripts. Three different Arabic scripts used
- 1929: Latin alphabet enforced by Soviets to reduce Islamic influence.
- 1939: Cyrillic alphabet enforced by Stalin
- 1991: Back to Latin alphabet, but slightly different than before.
  → Latin [...] and computer [...] were in great demand in 1991
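The Korean slide's point that letters are grouped into syllable characters can be observed directly in Unicode, which encodes both the individual letters (jamo) and the composed syllable blocks. The sketch below uses the standard `unicodedata` module to split the word "Hangeul" into its letters and put it back together. It is an illustration of the grouping idea only; the variable names are arbitrary, and the word is assumed to be stored in its usual composed form.

```python
import unicodedata

word = "한글"  # "Hangeul", written as two syllable characters

# NFD decomposition splits each syllable block into its individual letters.
letters = unicodedata.normalize("NFD", word)
for ch in letters:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFC composition groups the letters back into syllable blocks.
recomposed = unicodedata.normalize("NFC", letters)
print(len(word), len(letters), recomposed == word)
```

The two syllable characters decompose into three letters each, which matches the slide's description of letters that "together form a single character".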


Comparison of writing systems

What are the pros and cons of each type of system?

- accuracy: Can every word be written down accurately?
- learnability: How long does it take to learn the system?
- cognitive ability: Are some systems unnatural? (e.g. Does [...] show that alphabets are unnatural?)
- language-particular differences: English has thousands of possible syllables; Japanese has very few in comparison
- connection to history/culture: Will changing a writing system have social consequences?

Encoding written language

- Information on a computer is stored in bits.
- A bit is either on (= 1, yes) or off (= 0, no).
- A list of 8 bits makes up a byte, e.g., 01001010
- Just like with the base 10 numbers we're used to, the order of the bits in a byte matters:
  - Big Endian: most important bit is leftmost (the standard way of doing things)
    - The positions in a byte thus encode: 128 64 32 16 8 4 2 1
    - "There are 10 kinds of people in the world; those who know binary and those who don't" (from: http://www.wlug.org.nz/LittleEndian)
  - Little Endian: most important bit is rightmost (used, for example, on Intel machines)
    - The positions in a byte thus encode: 1 2 4 8 16 32 64 128

Using bytes to store characters

With 8 bits (a single byte), you can represent 256 different characters. Why would we want so many?

- If you look at a keyboard, you will find lots of non-English characters.
- With 256 possible characters, we can store every single letter used in English, plus all the things like commas, periods, space bar, percent sign (%), back space, and so on.
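The positional weights listed on the "Encoding written language" slide can be checked with a few lines of code. The sketch below reads the example byte 01001010 with both bit orders; `bits_to_int` and its `most_significant_first` parameter are hypothetical names chosen for this illustration.

```python
WEIGHTS = [128, 64, 32, 16, 8, 4, 2, 1]  # most important bit first


def bits_to_int(bits, most_significant_first=True):
    """Add up the weights of the positions that are switched on."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 8 binary digits")
    if not most_significant_first:
        bits = bits[::-1]  # reading right-to-left is the same as reversing
    return sum(weight for weight, bit in zip(WEIGHTS, bits) if bit == "1")


example = "01001010"
print(bits_to_int(example))                                 # 64 + 8 + 2 = 74
print(bits_to_int(example, most_significant_first=False))   # 2 + 16 + 64 = 82
print(2 ** 8)                                               # 256 values per byte
print(int("10", 2))                                         # the "10 kinds of people" joke: 2
```

The same eight bits give 74 or 82 depending on which end counts as most important, which is why the order has to be agreed on.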

An encoding standard: ASCII

- ASCII = the American Standard Code for Information Interchange
- 7-bit code for storing English text
- 7 bits = 128 possible characters.
- The numeric order reflects alphabetic ordering.

The ASCII chart

Codes 0–31 are used for control characters (backspace, line feed, tab, ...).

 32 (space)   48 0   64 @   80 P    96 `   112 p
 33 !         49 1   65 A   81 Q    97 a   113 q
 34 "         50 2   66 B   82 R    98 b   114 r
 35 #         51 3   67 C   83 S    99 c   115 s
 36 $         52 4   68 D   84 T   100 d   116 t
 37 %         53 5   69 E   85 U   101 e   117 u
 38 &         54 6   70 F   86 V   102 f   118 v
 39 '         55 7   71 G   87 W   103 g   119 w
 40 (         56 8   72 H   88 X   104 h   120 x
 41 )         57 9   73 I   89 Y   105 i   121 y
 42 *         58 :   74 J   90 Z   106 j   122 z
 43 +         59 ;   75 K   91 [   107 k   123 {
 44 ,         60 <   76 L   92 \   108 l   124 |
 45 -         61 =   77 M   93 ]   109 m   125 }
 46 .         62 >   78 N   94 ^   110 n   126 ~
 47 /         63 ?   79 O   95 _   111 o   127 DEL

E-mail issues

- Have you ever had something like the following at the top of an e-mail sent to you?
  [The following text is in the "ISO-8859-1" character set.]
  [Your display is set for the "US-ASCII" character set. ]
  [Some characters may be displayed incorrectly. ]
- Mail sent on the internet used to only be able to transfer the 7-bit ASCII messages. But now we can detect the incoming character set and adjust the input.
- Note that this is an example of meta-information = information which is printed as part of the regular message, but tells us something about that message.
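The ASCII chart can be explored with Python's built-in `ord` and `chr`, which convert between a character and its code number. This is just a small sketch of the chart's properties; the word list is made up for the example.

```python
# Character -> code number, matching entries in the chart.
for ch in "AZaz09":
    print(ch, ord(ch))

# Code number -> character: 74 is the byte 01001010 read as ASCII.
print(chr(74))

# Numeric order reflects alphabetic order, but all capitals come first.
words = ["pear", "Zebra", "apple"]
print(sorted(words))

# Upper- and lower-case letters are a fixed distance apart.
print(ord("a") - ord("A"))

# Anything with a code above 127 is outside 7-bit ASCII.
print(all(ord(ch) < 128 for ch in "resume"), all(ord(ch) < 128 for ch in "résumé"))
```

Sorting by code number puts "Zebra" before "apple", since every capital letter has a smaller code than every lower-case letter.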


Multipurpose Internet Mail Extensions (MIME)

MIME provides meta-information on the text, which tells us:

- which version of MIME is being used
- what the character set is
- if that character set was altered, how it was altered

Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Different coding systems

But wait, didn't we want to be able to encode all languages?
There are ways ...

- Extend the ASCII system with various other systems, for example:
  - ISO 8859-1: includes extra letters needed for French, German, Spanish, etc.
  - ISO 8859-7: Greek
  - ISO 8859-8: Hebrew
  - JIS X 0208: Japanese characters
- Have one system for everything → Unicode

Unicode

Problems with having multiple encoding systems:

- Conflicts: two encodings can use the same number for two different characters and use different numbers for the same character.
- Hassle: have to install many, many systems if you want to be able to deal with various languages

Unicode tries to fix that by having a single representation for every possible character.

"Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." (www.unicode.org)
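To make the MIME header and the "conflicts" problem concrete, the sketch below first reads the three header lines from the MIME slide with Python's standard `email` module, and then decodes one and the same byte under three of the ISO 8859 character sets listed above. The message body "Hello" and the choice of the byte 0xE9 are made up for the illustration.

```python
import email

# The three header lines from the MIME slide, plus an invented one-word body.
RAW_MESSAGE = (
    "Mime-Version: 1.0\n"
    "Content-Type: text/plain; charset=US-ASCII\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "Hello\n"
)

message = email.message_from_string(RAW_MESSAGE)
declared_charset = message.get_content_charset()          # returned in lower case
transfer_encoding = message["Content-Transfer-Encoding"]
print(message["Mime-Version"], declared_charset, transfer_encoding)

# The same number means a different character in each encoding.
ONE_BYTE = bytes([0xE9])
decoded = {
    name: ONE_BYTE.decode(name)
    for name in ("iso8859_1", "iso8859_7", "iso8859_8")
}
for name, character in decoded.items():
    print(name, character)
```

Without the charset information in the header, a mail program has no way to know which of the three characters the sender meant.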


How big is Unicode?

Version 3.2 has codes for 95,221 characters from alphabets, syllabaries and logographic systems.

- Uses 32 bits – meaning we can store 2^32 = 4,294,967,296 characters.
- 4 billion possibilities for each character? That takes a lot of space on the computer!

Compact encoding of Unicode characters

- Unicode has three encoding forms
  - UTF-32 (32 bits): direct representation
  - UTF-16 (16 bits): 2^16 = 65536
  - UTF-8 (8 bits): 2^8 = 256
- How is it possible to encode 2^32 possibilities in 8 bits (UTF-8)?
  - Several bytes are used to represent one character.
  - Use the highest bit as flag:
    - highest bit 0: single character
    - highest bit 1: part of a multi-byte character
  - Nice consequence: ASCII text is in a valid UTF-8 encoding.

How do we type everything in?

- Use a keyboard tailored to your specific language
  e.g. Highly noticeable how much slower your English typing is when using a Danish-designed keyboard.
- Use a processor that allows you to switch between different character systems.
  e.g. Type in Cyrillic characters on your English keyboard.
- Use combinations of characters.
  An e followed by an ' might result in an é
- Pick and choose from a table of characters.

So, now we can encode every language, as long as it's written.
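The "highest bit as flag" idea from the compact-encoding slide can be seen by printing the bits of a few UTF-8 encoded characters. This is a minimal sketch using Python's built-in string encoding; `utf8_bits` is a helper name invented for the example, and the three sample characters are arbitrary.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of one character as 8-digit bit strings."""
    return [format(byte, "08b") for byte in ch.encode("utf-8")]


for ch in ["A", "é", "中"]:
    print(ch, utf8_bits(ch))

# Plain ASCII text has exactly the same bytes in ASCII and in UTF-8.
text = "plain ASCII"
print(text.encode("ascii") == text.encode("utf-8"))

# UTF-32 always spends four bytes, even on a plain letter.
print(len("A".encode("utf-32-be")), len("A".encode("utf-8")))

print(2 ** 32)  # number of values that fit into 32 bits
```

The single byte for "A" starts with 0, while every byte of the two- and three-byte characters starts with 1, which is how a reader of the byte stream can tell them apart.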

Unwritten languages

Many languages have never been written down. Of the 6700 spoken, 3000 have never been written down.

- Salar, a Turkic language in China.
- Gugu Badhun, a language in Australia.
- Southeastern Pomo, a language in California

The need for speech

- What if we want to work with an unwritten language?
- What if we want to examine the way someone talks and don't have time to write it down?

Many applications for encoding speech:

- Building spoken dialogue systems, i.e. speak with a computer (and have it speak back).
- Helping people sound like native speakers of a foreign language.
- Helping speech pathologists diagnose problems

What does speech look like?

We can transcribe (write down) the speech into a phonetic alphabet.

- It is very expensive and time-consuming to have humans do all the transcription.
- To automatically transcribe, we need to know how to relate the audio file to the individual sounds that we hear.

⇒ We need to know:
- some properties of speech
- how to measure these speech properties
- how these measurements correspond to sounds we hear


What makes representing speech hard?

Difficulties:

- People have different [...] and different size vocal tracts and thus say things differently
- Sounds run together, and it's hard to tell where one sound ends and another begins.
- What we think of as one sound is not always (usually) said the same: coarticulation = sounds affecting the way neighboring sounds are said
  e.g. k is said differently depending on if it is followed by ee or by oo.
- What we think of as two sounds are not always all that different.
  e.g. The s in see is acoustically very similar to the sh in shoe

Articulatory properties: How it's produced

We could talk about how sounds are produced in the vocal tract, i.e. articulatory phonetics

- place of articulation (where): [t] vs. [k]
- manner of articulation (how): [t] vs. [s]
- voicing (vocal cord vibration): [t] vs. [d]

But unless the computer is modeling a vocal tract, we need to know acoustic properties of speech which we can quantify.

Acoustic properties: What it sounds like

Sound waves = "small variations in air pressure that occur very rapidly one after another" (Ladefoged, A Course in Phonetics)
⇒ Akin to ripples in a pond

- speech flow = rate of speaking, number and length of pauses (seconds)
- loudness (amplitude) = amount of energy (decibels)
- frequencies = how fast the sound waves are repeating (cycles per second, i.e. Hertz)
  - pitch = how high or low a sound is
  - In speech, there is a fundamental frequency, or pitch, along with higher-frequency overtones.
- intonation = rise and fall in pitch


Oscillogram (Waveform)

[Figure]

(Check out the Speech Analysis Tutorial of the Department of Linguistics at Lund University, Sweden, at http://www.ling.lu.se/research/speechtutorial/tutorial.html, from which the illustrations on this and the following slides are taken.)

Fundamental frequency (F0, pitch)

[Figure]

Spectrograms

Spectrogram = a graph to represent (the frequencies of) speech over time.

[Figure]
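To make the idea of "frequencies over time" concrete, here is a small sketch that cuts a signal into short frames and measures how strong each frequency is in each frame. It uses a plain discrete Fourier transform written out by hand. The frame size of 64, the rate of 8000 samples per second, and all function names are assumptions chosen for illustration, and the test signal is a synthetic tone rather than real speech.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each frequency bin up to half the sampling rate."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        mags.append(abs(total))
    return mags

def spectrogram(samples, frame_size):
    """One row of bin magnitudes per non-overlapping frame."""
    return [dft_magnitudes(samples[start:start + frame_size])
            for start in range(0, len(samples) - frame_size + 1, frame_size)]

def loudest_frequencies(samples, rate, frame_size):
    """Frequency in Hz of the strongest bin in each frame."""
    rows = spectrogram(samples, frame_size)
    return [row.index(max(row)) * rate / frame_size for row in rows]

RATE = 8000  # samples per second (assumed)

def tone(hz, n):
    """n samples of a sine wave at the given frequency."""
    return [math.sin(2 * math.pi * hz * i / RATE) for i in range(n)]

# A 1000 Hz tone followed by a 2000 Hz tone.
signal = tone(1000, 128) + tone(2000, 128)
peaks = loudest_frequencies(signal, RATE, 64)
```

Each row of the spectrogram corresponds to one vertical slice of the picture. A large magnitude would be drawn dark, so `peaks` traces where the darkest line sits in each slice.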

How measurements correspond to sounds we hear

- How dark is the picture? → How loud is the sound?
  We can measure this in decibels.
- Where are the lines the darkest? → Which frequencies are the loudest and most important?
  We can measure this in terms of Hertz, and it tells us what the vowels are.
- How do these dark lines change? → How are the frequencies changing over time?
  Which consonants are we transitioning into?

How did we get these measurements?

sampling rate = how many times in a given second we extract a moment of sound; measured in samples per second
- Sound is continuous, but we have to store data in a discrete manner.

[Figure: CONTINUOUS vs. DISCRETE]

- We store data at each discrete point, in order to capture the general pattern of the sound.

Sampling rate

- The sampling rate is often 8000 or 16,000 samples per second. The rate for CDs is 44,100 samples/second (or Hertz (Hz)).
- The higher the sampling rate, the better the quality of the recording ... but the more space it takes.
- Speech needs at least 8000 samples/second, but most likely 16,000 or 22,050 Hz will be used nowadays.
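The trade-off between sampling rate and storage can be worked through in a few lines. This sketch samples a continuous signal, here a function of time, at discrete points and reports how much space one second would take. The 440 Hz test tone, the function names, and the figure of 2 bytes per sample on a single channel are assumptions made for illustration.

```python
import math

def sample(signal, rate, seconds):
    """Read a continuous signal (a function of time) at discrete points."""
    return [signal(i / rate) for i in range(int(rate * seconds))]

def storage_bytes(rate, seconds, bytes_per_sample=2):
    """Uncompressed size for one channel; 2 bytes/sample is an assumption."""
    return int(rate * seconds) * bytes_per_sample

def wave(t):
    """A 440 Hz tone, defined for any time t (continuous)."""
    return math.sin(2 * math.pi * 440 * t)

for rate in (8000, 16000, 44100):
    points = sample(wave, rate, 1.0)
    print(rate, len(points), storage_bytes(rate, 1.0))
```

Under these assumptions, moving from 8000 to 44,100 samples per second raises the cost of one second of sound from 16,000 to 88,200 bytes.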


Applications of speech encoding

Mapping sounds to symbols (alphabet), and vice versa, isn't all that easy.
- Automatic Speech Recognition (ASR): sounds to text
- Text-to-Speech Synthesis (TTS): texts to sounds

Automatic Speech Recognition (ASR)

Automatic speech recognition = process by which the computer maps a speech signal to text.

Uses/Applications:
- Dictation
- Telephone conversations
- People with disabilities, e.g. a person hard of hearing could use an ASR system to get the text

Kinds of ASR systems

Different kinds of systems:
- Speaker dependent = work for a single speaker
- Speaker independent = work for any speaker of a given variety of a language, e.g. American English
- Speaker adaptive = start as independent but begin to adapt to a single speaker to improve accuracy


Kinds of ASR systems

- Differing sizes of vocabularies, from tens of words to tens of thousands of words
- continuous speech vs. isolated-word systems:
  - continuous speech systems = words connected together and not separated by pauses
  - isolated-word systems = single words recognized at a time, requiring pauses to be inserted between words
    → easier to find the endpoints of words

Steps in an ASR system

1. Digital sampling of speech
2. Acoustic signal processing = converting the speech samples into particular measurable units
3. Recognition of sounds, groups of sounds, and words

May or may not use more sophisticated analysis of the utterance to help.

Text-to-Speech Synthesis (TTS)

Could just record someone saying phrases or words and then play back those words in the appropriate order.

Or can break the text down into smaller units:
1. Convert input text into phonetic alphabet
2. Synthesize phonetic characters into speech

To synthesize characters into speech, people have tried:
- using formulas which adjust the values of the frequencies, the loudness, etc.
- using a model of the vocal tract and trying to produce sounds based on how a human would speak
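The first TTS step, converting input text into a phonetic alphabet, can be sketched as a dictionary lookup. This is only a toy illustration: the three-word lexicon, the ARPAbet-style symbols, and the function name are assumptions, and a real system would need a far larger lexicon plus rules for words it does not contain.

```python
# Toy lexicon (illustrative entries only, ARPAbet-style symbols).
LEXICON = {
    "the": ["DH", "AH"],
    "car": ["K", "AA", "R"],
    "crashed": ["K", "R", "AE", "SH", "T"],
}

def to_phonetic(text, lexicon=LEXICON):
    """Step 1: map each word of the input text to phonetic symbols."""
    symbols = []
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        if not word:
            continue
        # Unknown words are flagged rather than guessed at.
        symbols.extend(lexicon.get(word, ["<unknown:" + word + ">"]))
    return symbols

print(to_phonetic("The car crashed."))
```

The resulting symbol sequence is what the second step would take as input, whether it synthesizes by formula or by a vocal tract model.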

It's hard to be natural

When trying to make synthesized speech sound natural, we encounter the same problems that make speech encoding hard in general:
- The same sound is said differently in different contexts.
- Different sounds are sometimes said nearly the same.
- Different sentences have different intonation patterns.
- Lengths of words vary depending on where in the sentence they are spoken.
    The car crashed into the tree.
    It's my car.
    Cars, trucks, and bikes are vehicles.

Speech to Text to Speech

If we convert speech to text and then back to speech, it should sound the same, right?
- But at the conversion stages, there is information loss. To avoid this loss would require a lot of memory and knowledge about what exact information to store.
- The process is thus irreversible.

Demos

Text-to-Speech
- AT&T multilingual TTS system:
  http://www.research.att.com/projects/tts/demo.html
- various systems and languages:
  http://www.ims.uni-stuttgart.de/~moehler/synthspeech/
