Introduction to Formal Linguistics
Simon Dobnik
Department of Philosophy, Linguistics and Theory of Science
September 3, 2015
Based on slides by Robin Cooper

Outline
Practicalities
Overview of linguistics
Phonetics and Phonology
Morphology
Syntax
Semantics
Lexicon
A broader view

Practicalities

The course website
LT2112 H15 Introduction to formal linguistics on https://gul.gu.se
https://gul.gu.se/courseId/65958/content.do?id=26978419
http://gul.gu.se/public/courseId/70822/lang-en/publicPage.do

Course lecturers
- Ellen Breitholtz (morphology)
- Simon Dobnik (syntax and semantics with pragmatics, course organiser)
- Johan Gross (phonetics and phonology)

Overview of linguistics

Linguistics – a scientific view of language
- formal: explicit, exact (to an extent)
- Noam Chomsky, starting in the mid-fifties
- but goes back to the ancient grammarians (Pāṇini, 4th cent. B.C.)
- nineteenth century (historical perspective, diachronic; Hermann Paul: sentences are the sum of their parts)
- pre-Chomskyan 20th century: synchronic (Saussure), structuralists (Leonard Bloomfield, Charles Hockett, Zellig Harris)

Linguistic methods
- corpus linguistics
- formal analysis
- experimental methods

Computational linguistics
... the scientific study of human language – specifically of the system of rules and the ways in which they are used in communication – using mathematical models and formal procedures that can be realised and validated using computers; a cross-over of many disciplines. (Stanford Linguistics Professor, 1980s)
Borrowed from Stephan Oepen’s slide

Computational Linguistics
- Wikipedia
- University of Saarland

A language module
[Figure: components of a language module: lexicon, speech recognizer/synthesizer, morphological analyzer/generator, syntactic parser/generator, semantic analyzer/reasoner, grammar, dialogue planner, knowledge base; with speech input/output and text input/output.]

Phonetics and Phonology

Articulatory phonetics
- how we use our mouth and vocal tract to produce speech sounds
- classification of speech sounds according to articulation

The vocal tract
[Figure: the vocal tract. From Wikipedia.]

The IPA chart
http://www.internationalphoneticalphabet.org/ipa/
[Figure: The International Phonetic Alphabet (revised to 2005), © 2005 IPA. Sections: consonants (pulmonic), consonants (non-pulmonic), vowels, other symbols, diacritics, suprasegmentals, tones and word accents.]
Consonants (pulmonic) are arranged by place of articulation (bilabial, labiodental, dental, alveolar, postalveolar, retroflex, palatal, velar, uvular, pharyngeal, glottal) and by manner of articulation (plosive, nasal, trill, tap or flap, fricative, lateral fricative, approximant, lateral approximant). Where symbols appear in pairs, the one to the right represents a voiced consonant. Shaded areas denote articulations judged impossible.
Vowels are arranged by height (close, close-mid, open-mid, open) and backness (front, central, back). Where symbols appear in pairs, the one to the right represents a rounded vowel.

The IPA chart for pulmonic consonants

The IPA chart for vowels

Acoustic phonetics
- the data from sound waves
- can we recognise speech sounds from the acoustic data?
- not just acoustic data: McGurk effect, video
- from continuous speech to discrete speech sounds; co-articulation

Spectrogram
[Figure: a spectrogram. From Wikipedia.]

Phonology
- phonemes (kit, cat)
- phonological rules ([s]ip, [z]ip; plural sip[s], zip[s] ≈ bib[z], pub[z])
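
The plural alternation above (sip[s], zip[s] but bib[z], pub[z]) can be sketched as a rule keyed on the final sound of the stem. This is a minimal sketch under simplifying assumptions: the function name `plural_allomorph` and the two sound sets are illustrative and not from the slides, the sets are not exhaustive, and the [ɪz] case after sibilants is added for completeness.

```python
# Simplified sets of stem-final sounds in IPA (illustrative, not exhaustive).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}


def plural_allomorph(final_sound: str) -> str:
    """Return the surface form of the English plural suffix after a stem
    whose last sound is `final_sound`."""
    if final_sound in SIBILANTS:
        return "ɪz"  # e.g. bus -> bus[ɪz]
    if final_sound in VOICELESS:
        return "s"   # e.g. sip -> sip[s]
    return "z"       # voiced consonants and vowels, e.g. bib -> bib[z]


# The stems from the slide, paired with their final sound.
for stem, final in [("sip", "p"), ("zip", "p"), ("bib", "b"), ("pub", "b")]:
    print(f"{stem}[{plural_allomorph(final)}]")
```

The rule looks only at the last sound of the stem, which is why sip and zip pattern together even though their initial sounds differ in voicing.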

Morphology

Inflectional morphology
- different forms in a paradigm
- e.g. singular vs plural (cat vs cats), verb forms (run, runs, ran)

Derivational morphology
- creating new words, perhaps of a different category, perhaps with a different meaning
- clever ≈ cleverness, able ≈ ability

Other morphological processes
- it is not clear whether there is a sharp boundary between morphology and syntax
- cliticization – John’s coming, je l’ai vu
- compounding – language technology course assessment
- the meaning is sometimes not just the sum of the meanings of the sub-parts: white house, White House

Syntax

Parts of speech
- dog – noun
- run – verb
- the – determiner, definite article

Construction types
- the dog – noun phrase
- the dog ran – sentence
- the thief [who saw the policeman] ran into the shop – relative clause
- I wonder [who saw the policeman] – embedded question

Grammars and grammar rules
- sentences may consist of a noun phrase followed by a verb phrase: S → NP VP
- phrase structure grammars, context-free grammars (Chomsky hierarchy)
- are natural languages context-free?
- features (e.g. number agreement): *the dog run, *the dogs runs
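
The rule S → NP VP and the role of features can be illustrated with a toy recogniser. This is a sketch under stated assumptions: the five-word lexicon, the extra rules NP → Det N and VP → V, and the function name `is_grammatical` are hypothetical choices made for the example. Treating "run" as plural only is a deliberate simplification.

```python
# Toy lexicon: word -> (category, number). None means unmarked for number.
LEXICON = {
    "the": ("Det", None),
    "dog": ("N", "sg"),
    "dogs": ("N", "pl"),
    "runs": ("V", "sg"),
    "run": ("V", "pl"),  # simplification: ignores person
}


def is_grammatical(sentence: str) -> bool:
    """Recognise S -> NP VP with NP -> Det N and VP -> V,
    requiring the noun and the verb to agree in number."""
    words = sentence.split()
    if len(words) != 3 or any(w not in LEXICON for w in words):
        return False
    (det_cat, _), (n_cat, n_num), (v_cat, v_num) = (LEXICON[w] for w in words)
    if (det_cat, n_cat, v_cat) != ("Det", "N", "V"):
        return False       # violates the phrase structure rules
    return n_num == v_num  # violates agreement otherwise


print(is_grammatical("the dog runs"))   # True
print(is_grammatical("the dogs run"))   # True
print(is_grammatical("the dog run"))    # False
print(is_grammatical("the dogs runs"))  # False
```

Without the number feature the category check alone would accept all four strings, which is the point of the starred examples.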

Syntactic structures
[Figure: syntactic structures; original source link lost]

Semantics

Semantic properties and model theory
- “to know the meaning of a (declarative) sentence is to know the conditions under which it would be true”
- truth in a model

Logic
- propositional logic
- first order logic
- predicates, constants, variables, quantifiers
- Every television presenter has a secret.
  ∀x.(television_presenter(x) ⇒ ∃y.(secret(y) ∧ have(x, y)))
  ∃y.(secret(y) ∧ ∀x.(television_presenter(x) ⇒ have(x, y)))
- model theory for logic
- inference
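
The two formulae for "Every television presenter has a secret" can be told apart by evaluating them in a small model. This is a minimal sketch, assuming a hypothetical model with two presenters and two secrets; the individuals, the `have` relations, and the function names are invented for illustration. Each quantifier ranges over the extension of its restricting predicate, which is equivalent to the formulae above.

```python
# Hypothetical model: extensions of the two one-place predicates.
PRESENTERS = {"ann", "bob"}
SECRETS = {"s1", "s2"}


def every_wide(have: set) -> bool:
    """∀x.(television_presenter(x) ⇒ ∃y.(secret(y) ∧ have(x, y)))"""
    return all(any((x, y) in have for y in SECRETS) for x in PRESENTERS)


def some_wide(have: set) -> bool:
    """∃y.(secret(y) ∧ ∀x.(television_presenter(x) ⇒ have(x, y)))"""
    return any(all((x, y) in have for x in PRESENTERS) for y in SECRETS)


# Each presenter has a different secret.
different = {("ann", "s1"), ("bob", "s2")}
# Both presenters share the secret s1.
shared = {("ann", "s1"), ("bob", "s1")}

print(every_wide(different), some_wide(different))  # True False
print(every_wide(shared), some_wide(shared))        # True True
```

The first model makes only the first formula true, so the two formulae have different truth conditions. The second formula entails the first but not the other way round.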

Pragmatics
- language in use
- speech acts (assert, query, ...)
- language in context: deictic pronouns (I, you), but also demonstratives (this, that) and tense
- presuppositions (my wife is coming → I have a wife, my wife isn’t coming → I have a wife)

Dynamic meaning
[Figure: dynamic meaning; original source link lost]

Lexicon

Words and phrases
- “the lexicon is a list of words”
- also seems to include phrases – look up (the number), keep track of (the score), kick the bucket
- more information than just the words: phonology, morphology, syntax, semantics
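
One way to picture a lexicon that holds phrases as well as words, and more than just the word form, is as a table of structured entries. This is a sketch only: the class `LexicalEntry`, its field names, and the two sample entries (including their broad transcriptions and glosses) are illustrative assumptions, not a format used in the course.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexicalEntry:
    form: str        # orthographic form; may span several words
    phonology: str   # broad IPA transcription
    morphology: str  # informal note on inflection
    syntax: str      # syntactic category
    semantics: str   # informal gloss


# Illustrative entries: one single word and one multi-word idiom.
ENTRIES = [
    LexicalEntry("dog", "dɒɡ", "regular plural", "noun", "domestic canine"),
    LexicalEntry("kick the bucket", "kɪk ðə ˈbʌkɪt",
                 "inflects on 'kick'", "verb phrase (idiom)", "die"),
]
LEXICON = {entry.form: entry for entry in ENTRIES}


def lookup(form: str) -> Optional[LexicalEntry]:
    """Return the entry stored under `form`, or None if there is none."""
    return LEXICON.get(form)


print(lookup("kick the bucket").semantics)  # die
```

Storing the idiom as a single entry reflects that its meaning is not computed from "kick", "the" and "bucket".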

A broader view

Some other areas of linguistics
... which may be relevant to language technology:
- historical linguistics
- comparative linguistics and language typology
- dialect studies
- sociolinguistics
- psycholinguistics (language acquisition, human language processing)

Language variation and universals
- languages are different but there’s a limit on how different they are
- language universals
- Sam read the books in the living-room
- Did Sam read the books in the living-room?
- *Living-room the in books the read Sam?
- Sam read the books which are in the living-room
- Which room did Sam read the books in ?
- *Which room did Sam read the books which are in ?

Everybody can talk
- ... except perhaps because of sickness, developmental characteristics or unusual social conditions
- native speakers
- linguistic (un)consciousness (lexicon vs grammar rules)

Language acquisition
[Figure: language acquisition; original source link lost]

Linguistics and psychology
- developmental psychology
- human processing
- should language technologists be concerned with this?
- should language technology systems imitate humans?

Why is linguistics (and language technology) difficult?
- natural languages are complex
- interaction with context
- multimodality, body language
- difficult to give a precise scientific theory of our linguistic behaviour

Human languages and other languages
- animal languages
- artificial languages (logic, programming languages)
- human languages

Some properties of human languages
- displacement (talking about things not present, time/tense, negation, (im)possibilities)
- arbitrary (compare different words for common objects in unrelated languages)
- productive (take any sentence, can you create a longer sentence which contains it?)
- discrete (digitisation)
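
The productivity question above (can any sentence be extended to a longer one that contains it?) can be illustrated with a recursive embedding. This is a minimal sketch: the function name `embed` and the frame "Sam said that ..." are arbitrary choices for the example, and the input is assumed to be a clause written without initial capitalisation.

```python
def embed(sentence: str, depth: int = 1) -> str:
    """Wrap `sentence` in `depth` layers of 'Sam said that ...'.
    Each layer yields a longer sentence that contains the previous one."""
    for _ in range(depth):
        sentence = "Sam said that " + sentence
    return sentence


print(embed("the dog ran"))     # Sam said that the dog ran
print(embed("the dog ran", 2))  # Sam said that Sam said that the dog ran
```

Because the output of `embed` is itself a valid input, there is no longest sentence in this toy fragment.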