Language and Computers (Ling 384)
Topic 1: Text and Speech Encoding

Detmar Meurers∗
Dept. of Linguistics, OSU
Winter 2005

∗ The course was created together with Markus Dickinson and Chris Brew.

Outline

- Writing systems
  - Alphabetic
  - Syllabic
  - Logographic
  - Systems with unusual realization
  - Relation to language
  - Comparison of systems
- Encoding written language
  - ASCII
  - Unicode
  - Typing it in
- Spoken language
  - Transcription
  - Why speech is hard to represent
  - Articulation
  - Acoustics
- Relating written and spoken language
  - From Speech to Text
  - From Text to Speech

Language and Computers – where to start?

- If we want to do anything with language, we need a way to represent language.
- We can interact with the computer in several ways:
  - write or read text
  - speak or listen to speech
- Computer has to have some way to represent
  - text
  - speech
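To make "representing text" concrete, here is a minimal sketch of how a short string looks to a computer. The sample word and the choice of UTF-8 as the byte encoding are assumptions made for illustration; they are not taken from the slides.

```python
# Illustrative only: how a short piece of text looks to a computer.
text = "text"

# Each character corresponds to a number (its Unicode code point).
code_points = [ord(ch) for ch in text]

# When stored or transmitted, the characters become bytes under some
# encoding; UTF-8 is an assumption chosen for this sketch.
utf8_bytes = text.encode("utf-8")

print(code_points)       # [116, 101, 120, 116]
print(list(utf8_bytes))  # same numbers, because these letters are ASCII
```

For plain English letters the code points and the UTF-8 bytes coincide; for most other scripts one character takes several bytes.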
Writing systems used for human languages

What is writing?

“a system of more or less permanent marks used to represent an utterance in such a way that it can be recovered more or less exactly without the intervention of the utterer.”
(Peter T. Daniels, The World’s Writing Systems)

Different types of writing systems are used:

- Alphabetic
- Syllabic
- Logographic

Much of the information on writing systems and the graphics used are taken from the amazing site http://www.omniglot.com.

Alphabetic systems

Alphabets (phonemic alphabets)

- represent all sounds, i.e., consonants and vowels
- Examples: Etruscan, Latin, Korean, Cyrillic, Runic, International Phonetic Alphabet

Abjads (consonant alphabets)

- represent consonants only (sometimes plus selected vowels; vowel diacritics generally available)
- Examples: Arabic, Aramaic, Hebrew

Alphabet example: Fraser

An alphabet used to write Lisu, a Tibeto-Burman language spoken by about 657,000 people in Myanmar, India, Thailand and in the Chinese provinces of Yunnan and Sichuan.

(from: http://www.omniglot.com/writing/fraser.htm)

Abjad example: Phoenician

An alphabet used to write Phoenician, created between the 18th and 17th centuries BC; assumed to be the forerunner of the Greek and Hebrew alphabets.

(from: http://www.omniglot.com/writing/phoenician.htm)

A note on the letter-sound correspondence

- Alphabets use letters to encode sounds (consonants, vowels).
- But the correspondence between spelling and pronunciation in many languages is quite complex, i.e., not a simple one-to-one correspondence.
- Example: English
  - same spelling – different sounds: ough: ought, cough, tough, through, though, hiccough
  - silent letters: knee, knight, knife, debt, psychology, mortgage
  - one letter – multiple sounds: exit, use
  - multiple letters – one sound: the, revolution
  - alternate spellings: jail or gaol; but seagh is not a possible spelling of chef (despite sure, dead, laugh)

More examples for non-transparent letter-sound correspondences

French

(1) a. Versailles → [veRsai]
    b. été, étais, était, étaient → [ete]

Irish

(2) a. Baile Átha Cliath (Dublin) → [bl’a: kli uh]
    b. samhradh (summer) → [sauruh]
    c. scríobhaim (I write) → [shgri:m]

What is the notation used within the []?
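The English "ough" case can be checked mechanically. The sketch below groups the six example words by the sound that the letters "ough" stand for in each. The sound labels are broad phonetic symbols for a British-style pronunciation, aligned by hand; they are assumptions added for illustration and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical hand-made alignment: word -> sound written as "ough".
# Broad, British-style values; an illustrative assumption.
OUGH_SOUND = {
    "ought": "ɔː",
    "cough": "ɒf",
    "tough": "ʌf",
    "through": "uː",
    "though": "əʊ",
    "hiccough": "ʌp",
}

def sounds_by_spelling(alignment, spelling):
    """Group words by the sound that `spelling` stands for in each of them."""
    groups = defaultdict(list)
    for word, sound in alignment.items():
        if spelling in word:
            groups[sound].append(word)
    return dict(groups)

ough = sounds_by_spelling(OUGH_SOUND, "ough")
for sound, words in ough.items():
    print(f"ough -> [{sound}]: {', '.join(words)}")
print(len(ough), "different sounds for one spelling")
```

With this alignment every word ends up in its own group, so a lookup from spelling to sound cannot be a simple one-to-one table.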
The International Phonetic Alphabet (IPA)

- Several special alphabets for representing sounds have been developed, the best known being the International Phonetic Alphabet (IPA).
- The phonetic symbols are unambiguous:
  - designed so that each speech sound gets its own symbol,
  - eliminating the need for
    - multiple symbols used to represent simple sounds
    - one symbol being used for multiple sounds.
- Interactive example chart: http://web.uvic.ca/ling/resources/ipa/charts/IPAlab/IPAlab.htm

Syllabic systems

Syllabic alphabets (Alphasyllabaries)

- writing systems with symbols that represent a consonant with a vowel, but the vowel can be changed by adding a diacritic (= a symbol added to the letter).
- Examples: Balinese, Javanese, Tibetan, Tamil, Thai, Tagalog

(cf. also: http://www.omniglot.com/writing/syllabic.htm)

Syllabaries

- writing systems with separate symbols for each syllable of a language
- Examples: Cherokee, Ethiopic, Cypriot, Ojibwe, Hiragana (Japanese)

(cf. also: http://www.omniglot.com/writing/syllabaries.htm#syll)

Syllabary example: Cypriote

The Cypriot syllabary or Cypro-Minoan writing is thought to have developed from the Linear A, or possibly the Linear B script of Crete, though its exact origins are not known. It was used from about 800 to 200 BC.

(from: http://www.omniglot.com/writing/cypriot.htm)

Logographic writing systems

- Logographs (also called Logograms):
  - Pictographs (Pictograms): originally pictures of

Logograph writing system example: Chinese

Pictographs

Syllabic alphabet example: Lao

Script developed in the 14th century to write the Lao language, based on an early version of the Thai script, which was developed from the Old Khmer script, which was