
Linguistics 384: Language and Computers
Topic 1: Text and Speech Encoding

Scott Martin∗
Dept. of Linguistics, OSU
Spring 2008

Outline

Writing systems
  Alphabetic
  Syllabic
  Logographic
  Systems with unusual realization
  Relation to language
  Comparison of systems
Encoding written language
  ASCII
  Unicode
  Typing it in
Spoken language
  Transcription
  Why speech is hard to represent
  Articulation
  Acoustics
Relating written and spoken language
  From Speech to Text
  From Text to Speech

Language and Computers – where to start?

- If we want to do anything with language, we need a way to represent language.
- We can interact with the computer in several ways:
  - write or read text
  - speak or listen to speech
- Computer has to have some way to represent
  - text
  - speech

∗ The course was created by Chris Brew, Markus Dickinson and Detmar Meurers.


Writing systems used for human languages

What is writing?
“a system of more or less permanent marks used to represent an utterance in such a way that it can be recovered more or less exactly without the intervention of the utterer.”
(Peter T. Daniels, The World’s Writing Systems)

Different types of writing systems are used:
- Alphabetic
- Syllabic
- Logographic

Much of the information on writing systems and the graphics used are taken from the amazing site http://www.omniglot.com.

Alphabetic systems

Alphabets (phonemic alphabets)
- represent all sounds, i.e., consonants and vowels
- Examples: Etruscan, Latin, Korean, Cyrillic, Runic, International Phonetic Alphabet

Abjads (consonant alphabets)
- represent consonants only (sometimes plus selected vowels; vowel diacritics generally available)
- Examples: [...]

Alphabet example: Fraser

An alphabet used to write Lisu, a Tibeto-Burman language spoken by about 657,000 people in Myanmar, India, Thailand and in the Chinese provinces of Yunnan and Sichuan.

(from: http://www.omniglot.com/writing/fraser.htm)


Abjad example: Phoenician

An abjad used to write Phoenician, created between the 18th and 17th centuries BC; assumed to be the forerunner of the Greek and Hebrew alphabets.
(from: http://www.omniglot.com/writing/phoenician.htm)

A note on the letter-sound correspondence

- Alphabets use letters to encode sounds (consonants, vowels).
- But the correspondence between spelling and pronunciation in many languages is quite complex, i.e., not a simple one-to-one correspondence.
- Example: English
  - same spelling – different sounds: ought, cough, tough, through, though, hiccough
  - silent letters: knee, knight, knife, debt, psychology, mortgage
  - one letter – multiple sounds: exit, use
  - multiple letters – one sound: the, revolution
  - alternate spellings: jail or gaol; but chef does not have an alternative seagh (despite sure, dead, laugh)

More examples for non-transparent letter-sound correspondences

French
(1) a. Versailles → [veRsai]
    b. été, étais, était, étaient → [ete]

Irish
(2) a. Baile Átha Cliath (Dublin) → [bl’a: kli uh]
    b. samhradh (summer) → [sauruh]
    c. scríobhaim (I write) → [shgri:m]

What is the notation used within the []?

The International Phonetic Alphabet (IPA)

- Several special alphabets for representing sounds have been developed, the best known being the International Phonetic Alphabet (IPA).
- The phonetic symbols are unambiguous:
  - designed so that each speech sound gets its own symbol,
  - eliminating the need for
    - multiple symbols used to represent simple sounds
    - one symbol being used for multiple sounds.
- Interactive example chart: http://web.uvic.ca/ling/resources/ipa/charts/IPAlab/IPAlab.htm

Syllabic systems

Syllabic alphabets (Alphasyllabaries)
- writing systems with symbols that represent a consonant with a vowel, but the vowel can be changed by adding a diacritic (= a symbol added to the letter).
- Examples: Balinese, Javanese, Tibetan, Tamil, Thai, Tagalog
(cf. also: http://www.omniglot.com/writing/syllabic.htm)

Syllabaries
- writing systems with separate symbols for each syllable of a language
- Examples: Cherokee, Ethiopic, Cypriot, Ojibwe, Hiragana (Japanese)
(cf. also: http://www.omniglot.com/writing/syllabaries.htm#syll)

Syllabary example: Cypriote

The Cypriot syllabary or Cypro-Minoan writing is thought to have developed from the Linear A, or possibly the Linear B script of Crete, though its exact origins are not known. It was used from about 800 to 200 BC.
(from: http://www.omniglot.com/writing/cypriot.htm)


Syllabic alphabet example: Lao

Script developed in the 14th century to write the Lao language, based on an early version of the Thai script, which was developed from the Old Khmer script, which was itself based on Mon scripts.

Example for vowel diacritics around the letter k: [figure]
(from: http://www.omniglot.com/writing/lao.htm)

Logographic writing systems

- Logographs (also called logograms):
  - Pictographs (pictograms): originally pictures of things, now stylized and simplified.
    Example: development of Chinese horse: [figure]
  - Ideographs (ideograms): representations of abstract ideas
  - Compounds: combinations of two or more logographs
  - Semantic-phonetic compounds: symbols with a meaning element (hints at meaning) and a phonetic element (hints at pronunciation).
- Examples: Chinese (Zhōngwén), Japanese (Nihongo), Mayan, Vietnamese, Ancient Egyptian

Logograph example: Chinese

Pictographs [figure]
Ideographs [figure]
Compounds of Pictographs/Ideographs [figure]
(from: http://www.omniglot.com/writing/chinese_types.htm)


Semantic-phonetic compounds

An example from Ancient Egyptian [figure]
(from: http://www.omniglot.com/writing/egyptian.htm)

Two writing systems with unusual realization

Tactile
- Braille is a writing system that makes it possible to read and write through touch; primarily used by the (partially) blind.
- It uses patterns of raised dots arranged in cells of up to six dots in a 3 x 2 configuration.
- Each pattern represents a character, but some frequent words and letter combinations have their own pattern.

Chromatographic
- The Benin and Edo people in southern Nigeria have developed a system of writing based on different color combinations and symbols.
(cf. http://www.library.cornell.edu/africana/Writing_Systems/Chroma.html)

Braille alphabet

[figure]

Chromatographic system

[figure]

Relating writing systems to languages

- There is not a simple correspondence between a writing system and a language.
- For example, English uses the Roman alphabet, but Arabic numerals (e.g., 3 and 4 instead of III and IV).
- We’ll look at three other examples:
  - Japanese
  - Korean
  - Azeri

Japanese

Japanese: logographic system kanji, syllabary katakana, syllabary hiragana
- kanji: 5,000-10,000 borrowed Chinese characters
- katakana
  - used mainly for non-Chinese loan words, onomatopoeic words, foreign names, and for emphasis
- hiragana
  - originally used only by women (10th century), but codified in 1946 with 48 [...]
  - used mainly for word endings, kids’ books, and for words with obscure kanji symbols
- romaji: Roman characters


Japanese example

The example uses kanji (red), hiragana (black), and katakana (blue): [figure]

Translation:
Capsule Hotel
A simple hotel where each room is capsule-shaped. When businessmen miss the last train home, they can stay overnight very cheaply instead of paying a lot of money to go home by taxi.
(from: http://www.omniglot.com/writing/japanese.htm#origin)

Korean

“Korean writing is an alphabet, a syllabary and logographs all at once.” (http://home.vicnet.net.au/~ozideas/writkor.htm)

- The hangul system was developed in 1444 during King Sejong’s reign.
- There are 24 letters: 14 consonants and 10 vowels
- But the letters are grouped into syllables, i.e. the letters in a syllable are not written separately as in the English system, but together form a single character.
  E.g., “Hangeul” (from: http://www.omniglot.com/writing/korean.htm): [figure]
- In South Korea, hanja (logographic Chinese characters) are also used.

Azeri

A Turkic language with speakers in Azerbaijan, northwest Iran, and (former Soviet) Georgia

- 7th century until 1920s: Arabic scripts. Three different Arabic scripts used
- 1929: Latin alphabet enforced by Soviets to reduce Islamic influence.
- 1939: Cyrillic alphabet enforced by Stalin
- 1991: Back to Latin alphabet, but slightly different than before.
  → Latin typewriters and computer fonts were in great demand in 1991


Comparison of writing systems

What are the pros and cons of each type of system?
- accuracy: Can every word be written down accurately?
- learnability: How long does it take to learn the system?
- cognitive ability: Are some systems unnatural? (e.g. Does dyslexia show that alphabets are unnatural?)
- language-particular differences: English has thousands of possible syllables; Japanese has very few in comparison
- connection to history/culture: Will changing a writing system have social consequences?

Encoding written language

- Information on a computer is stored in bits.
- A bit is either on (= 1, yes) or off (= 0, no).
- A list of 8 bits makes up a byte, e.g., 01001010
- Just like with the base 10 numbers we’re used to, the order of the bits in a byte matters:
  - Big Endian: most important bit is leftmost (the standard way of doing things)
    - The positions in a byte thus encode: 128 64 32 16 8 4 2 1
    - “There are 10 kinds of people in the world; those who know binary and those who don’t” (from: http://www.wlug.org.nz/LittleEndian)
  - Little Endian: most important bit is rightmost (associated mainly with Intel machines, where the term usually describes the order of bytes within a multi-byte value)
    - The positions in a byte thus encode: 1 2 4 8 16 32 64 128

Converting decimal numbers to binary - Tabular Method

Using just 4 bits, we want to know how to write 10 in bit (or binary) notation.

  8  4  2  1
  ?  ?  ?  ?
  1  ?  ?  ?    8 < 10
  1  1  ?  ?    8 + 4 = 12 > 10
  1  0  1  ?    8 + 2 = 10 = 10
  1  0  1  0
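The tabular method above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `to_binary_tabular` and `byte_value` are made up for this example and do not come from the course materials.

```python
def to_binary_tabular(n, width=4):
    """Greedy 'tabular' conversion: try each place value, largest first."""
    bits = []
    remaining = n
    for position in range(width - 1, -1, -1):
        place_value = 2 ** position       # 8, 4, 2, 1 when width is 4
        if place_value <= remaining:      # it fits: write a 1 and subtract it
            bits.append("1")
            remaining -= place_value
        else:                             # it would overshoot: write a 0
            bits.append("0")
    if remaining != 0:
        raise ValueError(f"{n} does not fit in {width} bits")
    return "".join(bits)


def byte_value(bits, most_significant_first=True):
    """Read a string of bits as a number, with either bit order."""
    if not most_significant_first:
        bits = bits[::-1]                 # reverse so the leftmost bit is the largest
    return int(bits, 2)


print(to_binary_tabular(10))                                  # 1010
print(byte_value("01001010"))                                 # 74
print(byte_value("01001010", most_significant_first=False))   # 82
```

The last two calls read the example byte 01001010 with the most important bit on the left (74) and on the right (82), which is why the bit order has to be agreed on before a byte can be interpreted.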

Converting decimal numbers to binary - Division Method

  Decimal     Remainder?   Binary
  10/2 = 5    no              0
   5/2 = 2    yes            10
   2/2 = 1    no            010
   1/2 = 0    yes          1010

Using bytes to store characters

With 8 bits (a single byte), you can represent 256 different characters. Why would we want so many?
- If you look at a keyboard, you will find lots of non-English characters.
- With 256 possible characters, we can store every single letter used in English, plus all the things like commas, periods, space bar, percent sign (%), back space, and so on.

An encoding standard: ASCII

- ASCII = the American Standard Code for Information Interchange
- 7-bit code for storing English text
- 7 bits = 128 possible characters.
- The numeric order reflects alphabetic ordering.
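As a small illustration of the division method for decimal-to-binary conversion, the following Python sketch divides repeatedly by 2 and writes each remainder to the left of the bits found so far. The function name `to_binary_division` is hypothetical, chosen only for this example.

```python
def to_binary_division(n):
    """Convert a non-negative integer to a binary string by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative numbers are handled in this sketch")
    if n == 0:
        return "0"
    bits = ""
    while n > 0:
        n, remainder = divmod(n, 2)     # e.g. 10 -> quotient 5, remainder 0
        bits = str(remainder) + bits    # each new bit goes to the left
    return bits


print(to_binary_division(10))   # 1010
```

Python's built-in `format(10, "b")` gives the same string and can be used to check the result.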


The ASCII chart

Codes 0–31 are used for control characters (backspace, line feed, tab, . . . ).

   32 (space)   48 0    65 A    82 R    97 a   114 r
   33 !         49 1    66 B    83 S    98 b   115 s
   34 "         50 2    67 C    84 T    99 c   116 t
   35 #         51 3    68 D    85 U   100 d   117 u
   36 $         52 4    69 E    86 V   101 e   118 v
   37 %         53 5    70 F    87 W   102 f   119 w
   38 &         54 6    71 G    88 X   103 g   120 x
   39 '         55 7    72 H    89 Y   104 h   121 y
   40 (         56 8    73 I    90 Z   105 i   122 z
   41 )         57 9    74 J    91 [   106 j   123 {
   42 *         58 :    75 K    92 \   107 k   124 |
   43 +         59 ;    76 L    93 ]   108 l   125 }
   44 ,         60 <    77 M    94 ^   109 m   126 ~
   45 -         61 =    78 N    95 _   110 n   127 DEL
   46 .         62 >    79 O    96 `   111 o
   47 /         63 ?    80 P           112 p
                64 @    81 Q           113 q

E-mail issues

- Have you ever had something like the following at the top of an e-mail sent to you?
    [The following text is in the "ISO-8859-1" character set.]
    [Your display is set for the "US-ASCII" character set. ]
    [Some characters may be displayed incorrectly. ]
- Mail sent on the internet used to only be able to transfer the 7-bit ASCII messages. But now we can detect the incoming character set and adjust the input.
- Note that this is an example of meta-information = information which is printed as part of the regular message, but tells us something about that message.

Multipurpose Internet Mail Extensions (MIME)

MIME provides meta-information on the text, which tells us:
- which version of MIME is being used
- what the character set is
- if that character set was altered, how it was altered

    Mime-Version: 1.0
    Content-Type: text/plain; charset=US-ASCII
    Content-Transfer-Encoding: 7bit
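The ASCII codes in the chart can be inspected directly in Python, whose built-in `ord` and `chr` convert between characters and their numeric codes. The helper names `ascii_codes` and `fits_in_ascii` below are made up for this sketch.

```python
def ascii_codes(text):
    """Return the ASCII code of each character; fails for non-ASCII text."""
    return list(text.encode("ascii"))


def fits_in_ascii(text):
    """True if every character of text is one of the 128 ASCII characters."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(ascii_codes("Hi!"))              # [72, 105, 33]
print(chr(74), format(74, "07b"))      # J 1001010
print(sorted(["b", "a", "C"]))         # ['C', 'a', 'b']
print(fits_in_ascii("\u00e9t\u00e9"))  # False
```

Sorting by code puts every uppercase letter before every lowercase one, so the numeric order reflects alphabetic ordering only within one case. The last call shows why 7-bit ASCII is not enough for a French word such as été.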


Different coding systems

But wait, didn’t we want to be able to encode all languages?
There are ways ...
- Extend the ASCII system with various other systems, for example:
  - ISO 8859-1: includes extra letters needed for French, German, Spanish, etc.
  - ISO 8859-7: Greek alphabet
  - ISO 8859-8: Hebrew alphabet
  - JIS X 0208: Japanese characters
- Have one system for everything → Unicode

Unicode

Problems with having multiple encoding systems:
- Conflicts: two encodings can use the same number for two different characters and use different numbers for the same character.
- Hassle: have to install many, many systems if you want to be able to deal with various languages

Unicode tries to fix that by having a single representation for every possible character.

“Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” (www.unicode.org)

How big is Unicode?

Version 3.2 has codes for 95,221 characters from alphabets, syllabaries and logographic systems.
- A direct representation uses 32 bits per character, which could distinguish 2^32 = 4,294,967,296 values (the Unicode standard itself limits the code space to 1,114,112 code points).
- 4 billion possibilities for each character? That takes a lot of space on the computer!
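The conflict between encodings described above can be shown with one byte value read under three of the ISO 8859 tables. This is a minimal sketch using Python's standard codecs; the helper name `decode_everywhere` is hypothetical.

```python
ENCODINGS = ("iso-8859-1", "iso-8859-7", "iso-8859-8")


def decode_everywhere(byte_value):
    """Interpret one byte under each encoding and return the resulting characters."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in ENCODINGS}


# The same number, 233 (hex E9), is a different character in each table.
for name, char in decode_everywhere(0xE9).items():
    print(name, char, f"U+{ord(char):04X}")

print(2 ** 32)   # 4294967296, the number of values that fit in 32 bits
```

Under ISO 8859-1 the byte is é (U+00E9), under ISO 8859-7 it is the Greek letter ι (U+03B9), and under ISO 8859-8 it is the Hebrew letter י (U+05D9). Unicode avoids the clash by giving each of these characters its own number.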

Compact encoding of Unicode characters

- Unicode has three encodings
  - UTF-32 (32 bits): direct representation
  - UTF-16 (16 bits): 2^16 = 65536
  - UTF-8 (8 bits): 2^8 = 256
- How is it possible to encode 2^32 possibilities in 8 bits (UTF-8)?
  - Several bytes are used to represent one character.
  - Use the highest bit as flag:
    - highest bit 0: single character
    - highest bit 1: part of a multi byte character
  - Nice consequence: ASCII text is in a valid UTF-8 encoding.

How do we type everything in?

- Use a keyboard tailored to your specific language
  e.g. Highly noticeable how much slower your English typing is when using a Danish-designed keyboard.
- Use a processor that allows you to switch between different character systems.
  e.g. Type in Cyrillic characters on your English keyboard.
- Use combinations of characters.
  An e followed by an ’ might result in an é
- Pick and choose from a table of characters.

So, now we can encode every language, as long as it’s written.

Unwritten languages

Many languages have never been written down. Of the 6700 spoken, 3000 have never been written down.
- Salar, a Turkic language in China.
- Gugu Badhun, a language in Australia.
- Southeastern Pomo, a language in California
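Returning to the UTF-8 flag bit described earlier, the bytes Python produces when encoding a character can be printed as bits to see the scheme at work. The helper name `utf8_bits` is made up for this sketch, and the three sample characters are arbitrary choices.

```python
def utf8_bits(char):
    """Return the UTF-8 bytes of one character as 8-digit bit strings."""
    return [format(byte, "08b") for byte in char.encode("utf-8")]


for char in ("A", "\u00e9", "\u4e2d"):     # A, é, 中
    print(f"U+{ord(char):04X}", utf8_bits(char))
# U+0041 ['01000001']
# U+00E9 ['11000011', '10101001']
# U+4E2D ['11100100', '10111000', '10101101']

text = "plain ASCII text"
print(text.encode("ascii") == text.encode("utf-8"))   # True
```

The single-byte character starts with 0, while every byte of a multi-byte character starts with 1. The final comparison illustrates the consequence noted on the slide: ASCII text is already valid UTF-8.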


The need for speech

- What if we want to work with an unwritten language?
- What if we want to examine the way someone talks and don’t have time to write it down?

Many applications for encoding speech:
- Building spoken dialogue systems, i.e. speak with a computer (and have it speak back).
- Helping people sound like native speakers of a foreign language.
- Helping speech pathologists diagnose problems

What does speech look like?

We can transcribe (write down) the speech into a phonetic alphabet.
- It is very expensive and time-consuming to have humans do all the transcription.
- To automatically transcribe, we need to know how to relate the audio file to the individual sounds that we hear.

⇒ We need to know:
- some properties of speech
- how to measure these speech properties
- how these measurements correspond to sounds we hear

What makes representing speech hard?

Difficulties:
- People have different dialects and different size vocal tracts and thus say things differently
- Sounds run together, and it’s hard to tell where one sound ends and another begins.
- What we think of as one sound is not always (usually) said the same: coarticulation = sounds affecting the way neighboring sounds are said
  e.g. k is said differently depending on if it is followed by ee or by oo.
- What we think of as two sounds are not always all that different.
  e.g. The s in see is very acoustically similar to the sh in shoe


Articulatory properties: How it's produced

We could talk about how sounds are produced in the vocal tract, i.e. articulatory phonetics:
- place of articulation (where): [t] vs. [k]
- manner of articulation (how): [t] vs. [s]
- voicing (vocal cord vibration): [t] vs. [d]

But unless the computer is modeling a vocal tract, we need to know acoustic properties of speech which we can quantify.

Acoustic properties: What it sounds like

Sound waves = "small variations in air pressure that occur very rapidly one after another" (Ladefoged, A Course in Phonetics)
⇒ Akin to ripples in a pond

- speech flow = rate of speaking, number and length of pauses (seconds)
- loudness (amplitude) = amount of energy (decibels)
- frequencies = how fast the sound waves are repeating (cycles per second, i.e. Hertz)
  - pitch = how high or low a sound is
  - In speech, there is a fundamental frequency, or pitch, along with higher-frequency overtones.
- intonation = rise and fall in pitch

Oscillogram (Waveform)

[Figure: oscillogram (waveform) of a speech signal]

(Check out the Speech Analysis Tutorial of the Department of Linguistics at Lund University, Sweden, at http://www.ling.lu.se/research/speechtutorial/tutorial.html, from which the illustrations on this and the following slides are taken.)
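As a rough illustration of the acoustic properties listed above, the following Python sketch builds the kind of data an oscillogram plots: a fundamental frequency plus overtones, with loudness reported in decibels. The function names, the overtone weights, and the choice of a full-scale reference of 1.0 are assumptions made for this example; they do not come from the slides or from the Lund tutorial.

```python
import math

def synthesize(f0_hz, overtone_weights, duration_s, sample_rate):
    """Return samples of a tone: the fundamental f0_hz plus weighted overtones.

    overtone_weights[0] scales the fundamental, overtone_weights[1] scales
    the overtone at 2 * f0_hz, and so on (hypothetical parameterization).
    """
    n = round(duration_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate  # time of this sample in seconds
        value = 0.0
        for k, weight in enumerate(overtone_weights, start=1):
            value += weight * math.sin(2 * math.pi * k * f0_hz * t)
        samples.append(value)
    return samples

def rms_db(samples, reference=1.0):
    """Loudness as root-mean-square amplitude, in decibels relative to reference."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / reference)

loud = synthesize(100, [1.0], 1.0, 8000)    # 100 Hz tone, full amplitude
quiet = synthesize(100, [0.5], 1.0, 8000)   # same pitch, half the amplitude
print(round(rms_db(loud), 2))   # -3.01
print(round(rms_db(quiet), 2))  # -9.03
```

Halving the amplitude lowers the level by about 6 dB while the frequency, and so the pitch, stays the same. That is the sense in which loudness and pitch are separate measurements.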

Fundamental frequency (F0, pitch)

[Figure: plot of fundamental frequency (F0, pitch)]

Spectrograms

Spectrogram = a graph to represent (the frequencies of) speech over time.

How measurements correspond to sounds we hear

- How dark is the picture? → How loud is the sound?
  We can measure this in decibels.
- Where are the lines the darkest? → Which frequencies are the loudest and most important?
  We can measure this in terms of Hertz, and it tells us what the vowels are.
- How do these dark lines change? → How are the frequencies changing over time?
  Which consonants are we transitioning into?
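The question "where are the lines the darkest?" can be sketched in code. The Python example below cuts a signal into frames, computes a naive discrete Fourier transform for each frame, and reports the loudest frequency per frame, which is roughly what the darkest band in one column of a spectrogram shows. The function names and the frame size are assumptions for this illustration. Real spectrogram tools use faster transforms and overlapping, windowed frames.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each frequency bin (up to half the frame length) via a naive DFT."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes

def loudest_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest bin in one frame."""
    magnitudes = dft_magnitudes(frame)
    k = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return k * sample_rate / len(frame)

def spectrogram_peaks(samples, sample_rate, frame_size):
    """Loudest frequency for each consecutive, non-overlapping frame."""
    return [loudest_frequency(samples[start:start + frame_size], sample_rate)
            for start in range(0, len(samples) - frame_size + 1, frame_size)]

rate = 8000
# A test signal whose frequency changes over time: 500 Hz, then 1000 Hz.
signal = ([math.sin(2 * math.pi * 500 * i / rate) for i in range(80)] +
          [math.sin(2 * math.pi * 1000 * i / rate) for i in range(80)])
print(spectrogram_peaks(signal, rate, 80))  # [500.0, 1000.0]
```

The change from one frame to the next corresponds to the third question on the slide: how the dark lines move over time.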


How did we get these measurements?

- Sound is continuous, but we have to store data in a discrete manner.

[Figure: a CONTINUOUS signal next to its DISCRETE sampled version]

- We store data at each discrete point, in order to capture the general pattern of the sound.

Sampling rate

sampling rate = how many times in a given second we extract a moment of sound; measured in samples per second
- The sampling rate is often 8000 or 16,000 samples per second. The rate for CDs is 44,100 samples/second (or Hertz (Hz)).
- The higher the sampling rate, the better the quality of the recording ... but the more space it takes.
- Speech needs at least 8000 samples/second, but most likely 16,000 or 22,050 Hz will be used nowadays.

Applications of speech encoding

Mapping sounds to symbols (alphabet), and vice versa, isn't all that easy.
- Automatic Speech Recognition (ASR): sounds to text
- Text-to-Speech Synthesis (TTS): texts to sounds
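To make the sampling-rate trade-off concrete, here is a minimal Python sketch that samples a continuous function of time at discrete points and estimates how much space a recording takes. The function names are hypothetical, and the default of 2 bytes per sample (16-bit, single channel) is an assumption for the example. The slides do not state a sample size.

```python
import math

def sample(signal, duration_s, sample_rate):
    """Evaluate a continuous function of time at evenly spaced discrete points."""
    n = round(duration_s * sample_rate)
    return [signal(i / sample_rate) for i in range(n)]

def storage_bytes(duration_s, sample_rate, bytes_per_sample=2):
    """Uncompressed size of a single-channel recording (assumed 16-bit samples)."""
    return round(duration_s * sample_rate) * bytes_per_sample

def tone(t):
    """A continuous 440 Hz sine wave, standing in for a sound."""
    return math.sin(2 * math.pi * 440 * t)

points = sample(tone, 0.01, 8000)
print(len(points))  # 80 stored values for 10 ms of sound

# One minute of audio at the sampling rates mentioned on the slide.
for rate in (8000, 16000, 22050, 44100):
    print(rate, storage_bytes(60, rate))
```

Under these assumptions, one minute at 8000 samples/second takes 960,000 bytes, while the CD rate of 44,100 samples/second takes 5,292,000 bytes for the same minute.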


Automatic Speech Recognition (ASR)

Automatic speech recognition = process by which the computer maps a speech signal to text.

Uses/Applications:
- Dictation
- Telephone conversations
- People with disabilities, e.g. a person hard of hearing could use an ASR system to get the text

Kinds of ASR systems

Different kinds of systems:
- Speaker dependent = work for a single speaker
- Speaker independent = work for any speaker of a given variety of a language, e.g. American English
- Speaker adaptive = start as independent but begin to adapt to a single speaker to improve accuracy

Kinds of ASR systems

- Differing sizes of vocabularies, from tens of words to tens of thousands of words
- continuous speech vs. isolated-word systems:
  - continuous speech systems = words connected together and not separated by pauses
  - isolated-word systems = single words recognized one at a time, requiring pauses to be inserted between words
    → easier to find the endpoints of words
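Why pauses make word endpoints easier to find can be shown with a small Python sketch. It marks a stretch of signal as a word wherever the average energy of a frame rises above a threshold, and as a pause otherwise. This is only an illustration of the idea: the function name, frame size, and threshold value are assumptions, and the slides do not say how any particular ASR system detects endpoints.

```python
def find_words(samples, frame_size, threshold):
    """Return (start, end) sample indices of stretches whose frame energy exceeds threshold."""
    segments = []
    start = None  # index where the current word began, or None during a pause
    for pos in range(0, len(samples), frame_size):
        frame = samples[pos:pos + frame_size]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold and start is None:
            start = pos                      # a word begins
        elif energy <= threshold and start is not None:
            segments.append((start, pos))    # a pause ends the word
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments

# Toy signal: pause, "word", pause, "word" (constant values stand in for speech).
signal = [0.0] * 100 + [0.5] * 200 + [0.0] * 100 + [0.5] * 100
print(find_words(signal, 50, 0.01))  # [(100, 300), (400, 500)]
```

In continuous speech there are no such silent stretches between words, so a simple energy threshold like this one would report a whole phrase as a single segment.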

Steps in an ASR system

1. Digital sampling of speech
2. Acoustic signal processing = converting the speech samples into particular measurable units
3. Recognition of sounds, groups of sounds, and words

May or may not use more sophisticated analysis of the utterance to help.

Text-to-Speech Synthesis (TTS)

We could just record someone saying phrases or words and then play back those words in the appropriate order.

Or we can break the text down into smaller units:
1. Convert input text into phonetic alphabet
2. Synthesize phonetic characters into speech

To synthesize characters into speech, people have tried:
- using formulas which adjust the values of the frequencies, the loudness, etc.
- using a model of the vocal tract and trying to produce sounds based on how a human would speak

It's hard to be natural

When trying to make synthesized speech sound natural, we encounter the same problems that make speech encoding hard in general:
- The same sound is said differently in different contexts.
- Different sounds are sometimes said nearly the same.
- Different sentences have different intonation patterns.
- Lengths of words vary depending on where in the sentence they are spoken.

  The car crashed into the tree.
  It's my car.
  Cars, trucks, and bikes are vehicles.
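The first TTS step, converting input text into a phonetic alphabet, can be sketched as a dictionary lookup. The Python example below is a toy under stated assumptions: the mini-lexicon, its rough ARPAbet-style symbols, and the function name are all made up for illustration and are not taken from the slides or from any real TTS system.

```python
# Hypothetical mini-lexicon; the transcriptions are rough illustrations only.
TOY_LEXICON = {
    "it's": ["IH", "T", "S"],
    "my": ["M", "AY"],
    "the": ["DH", "AH"],
    "car": ["K", "AA", "R"],
}

def to_phonemes(text):
    """Map each word to a list of phonetic symbols; flag words not in the lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [TOY_LEXICON.get(w, ["<unknown>"]) for w in words if w]

print(to_phonemes("It's my car."))
# [['IH', 'T', 'S'], ['M', 'AY'], ['K', 'AA', 'R']]
```

A lookup like this returns the same symbols for "car" wherever the word occurs. That is the limitation the example sentences point to: the length and intonation of a word depend on its position in the sentence, and a plain lookup carries none of that information.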


Speech to Text to Speech

If we convert speech to text and then back to speech, it should sound the same, right?
- But at the conversion stages, there is information loss. To avoid this loss would require a lot of memory and knowledge about exactly what information to store.
- The process is thus irreversible.

Demos

Text-to-Speech
- AT&T multilingual TTS system: http://www.research.att.com/projects/tts/demo.php
- Nuance Realspeak: http://www.nuance.com/realspeak/demo/default.asp
- various systems and languages: http://www.ims.uni-stuttgart.de/~moehler/synthspeech/
