
Introduction to Formal Linguistics

Simon Dobnik Department of Philosophy, Linguistics and Theory of Science

September 3, 2015

Based on slides by Robin Cooper

Outline

Practicalities

Overview of linguistics

Phonetics and Phonology

Morphology

Syntax

Semantics

Lexicon

A broader view

Practicalities

The course website

LT2112 H15 Introduction to formal linguistics on https://gul.gu.se

https://gul.gu.se/courseId/65958/content.do?id=26978419

http://gul.gu.se/public/courseId/70822/lang-en/publicPage.do

Course lecturers

I Ellen Breitholtz (morphology)

I Simon Dobnik (syntax and semantics with , course organiser)

I Johan Gross (phonetics and phonology)

Overview of linguistics

Linguistics – a scientific view of language

I formal: explicit, exact (to an extent)

I Noam Chomsky, starting mid-fifties

I but goes back to ancient grammarians (Pāṇini, 4th cent. B.C.)

I nineteenth century (historical perspective, diachronic, Hermann Paul: sentences are the sum of their parts)

I pre-Chomskyan 20th century – synchronic (Saussure), structuralists (Leonard Bloomfield, Charles Hockett, Zellig Harris)

Linguistic methods

I corpus linguistics

I formal analysis

I experimental methods

Computational linguistics

. . . the scientific study of human language – specifically of the system of rules and the ways in which they are used in communication – using mathematical models and formal procedures that can be realised and validated using computers; a cross-over of many disciplines. (Stanford Linguistics Professor, 1980s)

Borrowed from Stephan Oepen’s slide

Computational Linguistics

Wikipedia

University of Saarland

A language module

[Figure: a language module. Components: lexicon, speech recognizer/synthesizer, morphological analyzer/generator, syntactic parser/generator, semantic analyzer/reasoner, grammar, dialogue planner, knowledge base. Inputs and outputs: speech input, speech output, text input, text output.]

Phonetics and Phonology

Articulatory phonetics

I how we use our mouth and vocal tract to produce speech sounds

I classification of speech sounds according to articulation

The vocal tract

From Wikipedia.

The IPA chart

http://www.internationalphoneticalphabet.org/ipa/

THE INTERNATIONAL PHONETIC ALPHABET (revised to 2005), © 2005 IPA

[Figure: the full IPA chart, with sections for pulmonic consonants, non-pulmonic consonants (clicks, voiced implosives, ejectives), vowels, other symbols, diacritics, suprasegmentals, and tones and word accents.]

CONSONANTS (PULMONIC)

I columns (place of articulation): bilabial, labiodental, dental, alveolar, postalveolar, retroflex, palatal, velar, uvular, pharyngeal, glottal

I rows (manner of articulation): plosive, nasal, trill, tap or flap, fricative, lateral fricative, approximant, lateral approximant

I Where symbols appear in pairs, the one to the right represents a voiced consonant. Shaded areas denote articulations judged impossible.

VOWELS

I columns: front, central, back

I rows: close, close-mid, open-mid, open

I Where symbols appear in pairs, the one to the right represents a rounded vowel.

The IPA chart for pulmonic consonants

The IPA chart for vowels

Acoustic phonetics

I the data from sound waves

I can we recognise speech sounds from the acoustic data?

I not just acoustic data: McGurk effect, video

I continuous speech to discrete speech sounds, co-articulation

Spectrogram

From Wikipedia.
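A spectrogram shows how much energy the signal carries at each frequency, frame by frame over time. The following is a minimal sketch of that idea, not a production analysis: the sample rate, frame length and two-tone test signal are assumptions chosen so that the result is easy to read, and the frames are non-overlapping and unwindowed, whereas real speech analysis normally uses overlapping, windowed frames.

```python
import cmath
import math

SAMPLE_RATE = 1000   # samples per second (assumed for this toy signal)
FRAME_LENGTH = 100   # samples per analysis frame, i.e. 0.1 s


def tone(freq_hz, n_samples):
    """A pure sine wave sampled at SAMPLE_RATE."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(n_samples)]


def magnitude_spectrum(frame):
    """Magnitudes of the discrete Fourier transform, for bins below half the sample rate."""
    size = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / size) for n, x in enumerate(frame)))
        for k in range(size // 2)
    ]


def spectrogram(signal):
    """One magnitude spectrum per non-overlapping frame."""
    return [
        magnitude_spectrum(signal[start:start + FRAME_LENGTH])
        for start in range(0, len(signal) - FRAME_LENGTH + 1, FRAME_LENGTH)
    ]


def peak_frequency(spectrum):
    """Frequency in Hz of the strongest bin in one frame."""
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * SAMPLE_RATE / FRAME_LENGTH


# Half a second of a 100 Hz tone followed by half a second of a 300 Hz tone.
signal = tone(100, 500) + tone(300, 500)
peaks = [peak_frequency(frame) for frame in spectrogram(signal)]
print(peaks)
```

The list of peak frequencies changes halfway through the signal, which is the kind of pattern a spectrogram makes visible. Speech is much harder: neighbouring sounds overlap (co-articulation), so the boundaries between frames rarely line up with discrete speech sounds.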

Phonology

I phonemes (kit, cat)

I phonological rules ([s]ip, [z]ip – sip[s], zip[s] ≈ bib[z], pub[z])
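The suffix alternation in sip[s], zip[s] versus bib[z], pub[z] can be stated as a rule that looks only at the final sound of the stem. The sketch below is a simplified illustration: the phoneme sets are a small assumed inventory covering just these examples, the stems are hand-transcribed, and the sibilant case (as in buses) is added from general knowledge of English rather than from the slide.

```python
# Hypothetical, deliberately incomplete phoneme classes.
VOICELESS = {"p", "t", "k", "f", "θ"}
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}


def s_suffix(stem):
    """Return the allomorph of the -s suffix for a stem given as a list of phonemes."""
    final = stem[-1]
    if final in SIBILANTS:
        return "ɪz"   # e.g. bus -> buses
    if final in VOICELESS:
        return "s"    # voiceless final sound -> voiceless suffix
    return "z"        # voiced final sound (including vowels) -> voiced suffix


# Hand-made broad transcriptions of the slide's examples.
EXAMPLES = {
    "sip": ["s", "ɪ", "p"],
    "zip": ["z", "ɪ", "p"],
    "bib": ["b", "ɪ", "b"],
    "pub": ["p", "ʌ", "b"],
}

for word, phonemes in EXAMPLES.items():
    print(f"{word} -> {word}[{s_suffix(phonemes)}]")
```

Note that the rule is about sounds, not letters: the spelling is -s in every case, while the pronunciation depends on the voicing of the preceding phoneme.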

Morphology

Inflectional morphology

I different forms in a paradigm

I singular vs plural (cat vs cats), person and tense (run, runs, ran)

Derivational morphology

I creating new words, perhaps of a different category, perhaps with a different meaning

I clever ≈ cleverness, able ≈ ability

Other morphological processes

I not clear whether there is a sharp boundary between morphology and syntax

I cliticization – John’s coming, je l’ai vu

I compounding – language technology course assessment

I sometimes not just a sum of meanings of sub-parts: white house, White House

Syntax

Parts of speech

I dog – noun

I run – verb

I the – determiner, definite article

Construction types

I the dog – noun phrase

I the dog ran – sentence

I the thief [who saw the policeman] ran into the shop – relative clause

I I wonder [who saw the policeman] – embedded question

Grammars and grammar rules

I sentences may consist of a noun phrase followed by a verb phrase – S → NP VP

I phrase structure grammars, context free grammars (Chomsky hierarchy)

I are natural languages context free?

I features: *the dog run, *the dogs runs
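The rule S → NP VP and the role of features can be illustrated with a very small recogniser. This is a sketch under stated assumptions, not a general parser: the five-word lexicon, the extra rules NP → Det N and VP → V, and the function names are all invented for the example. The only feature is number, and a sentence is accepted only if the noun phrase and the verb phrase agree.

```python
# Toy lexicon (an assumption): word -> (category, number).
# number is "sg" or "pl"; None means the word carries no number feature.
LEXICON = {
    "the": ("Det", None),
    "dog": ("N", "sg"),
    "dogs": ("N", "pl"),
    "runs": ("V", "sg"),
    "run": ("V", "pl"),
}


def parse_np(words):
    """NP -> Det N. Return the noun's number, or None if words is not an NP."""
    if len(words) != 2:
        return None
    det, noun = LEXICON.get(words[0]), LEXICON.get(words[1])
    if det is not None and noun is not None and det[0] == "Det" and noun[0] == "N":
        return noun[1]
    return None


def parse_vp(words):
    """VP -> V. Return the verb's number, or None if words is not a VP."""
    if len(words) != 1:
        return None
    verb = LEXICON.get(words[0])
    if verb is not None and verb[0] == "V":
        return verb[1]
    return None


def is_sentence(text):
    """S -> NP VP, with the added condition that NP and VP agree in number."""
    words = text.split()
    for split in range(1, len(words)):
        np_number = parse_np(words[:split])
        vp_number = parse_vp(words[split:])
        if np_number is not None and np_number == vp_number:
            return True
    return False


for example in ["the dog runs", "the dogs run", "the dog run", "the dogs runs"]:
    print(("" if is_sentence(example) else "*") + example)
```

Without the agreement check, a plain context free rule S → NP VP would accept all four strings; the feature is what rules out *the dog run and *the dogs runs.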

Syntactic structures

From here.

Semantics

Semantic properties and model theory

I “to know the meaning of a (declarative) sentence is to know the conditions under which it would be true”

I truth in a model

Logic

I propositional logic

I first order logic

I predicates, constants, variables, quantifiers

I Every television presenter has a secret.

∀x.(television_presenter(x) ⇒ ∃y.(secret(y) ∧ have(x, y)))

∃y.(secret(y) ∧ ∀x.(television_presenter(x) ⇒ have(x, y)))

I model theory for logic

I inference
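The two formulas for "Every television presenter has a secret" can be told apart by evaluating them in a model. The sketch below assumes a tiny invented model (two presenters, two secrets) and passes in the extension of have as a set of pairs; the names and the model are hypothetical, and the quantifiers are implemented simply by looping over the finite domain.

```python
# Hypothetical model: individuals and predicate extensions invented for illustration.
PRESENTERS = {"anna", "ben"}
SECRETS = {"s1", "s2"}
DOMAIN = PRESENTERS | SECRETS


def every_presenter_has_some_secret(have):
    """∀x.(television_presenter(x) ⇒ ∃y.(secret(y) ∧ have(x, y)))"""
    return all(
        x not in PRESENTERS or any(y in SECRETS and (x, y) in have for y in DOMAIN)
        for x in DOMAIN
    )


def some_secret_is_had_by_every_presenter(have):
    """∃y.(secret(y) ∧ ∀x.(television_presenter(x) ⇒ have(x, y)))"""
    return any(
        y in SECRETS and all(x not in PRESENTERS or (x, y) in have for x in DOMAIN)
        for y in DOMAIN
    )


# Two possible extensions of have(x, y).
different_secrets = {("anna", "s1"), ("ben", "s2")}
shared_secret = {("anna", "s1"), ("ben", "s1")}

for name, have in [("different secrets", different_secrets), ("shared secret", shared_secret)]:
    print(name, every_presenter_has_some_secret(have), some_secret_is_had_by_every_presenter(have))
```

When each presenter has a different secret, only the first formula is true; when they share one secret, both are true. This matches the inference pattern between the readings: the second formula entails the first, but not the other way round.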

Pragmatics

I language in use

I speech acts (assert, query, . . . )

I language in context: deictic pronouns (I, you), but also demonstratives (this, that) and tense

I presuppositions (my wife is coming → I have a wife, my wife isn’t coming → I have a wife)

Dynamic meaning

From here.

Lexicon

Words and phrases

I “the lexicon is a list of words”

I seems also to include phrases – look up (the number), keep track of (the score), kick the bucket

I more information than just the words: phonology, morphology, syntax, semantics
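One way to picture a lexicon that is more than a list of words is as a collection of records, each bundling phonological, morphological, syntactic and semantic information, with multiword expressions stored alongside single words. The following is only a sketch: the field names, the rough transcriptions and the informal glosses are assumptions made for illustration, not a standard lexicon format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    form: str         # citation form; may consist of more than one word
    phonology: str    # rough broad transcription
    category: str     # syntactic category
    inflected: tuple  # other forms in the paradigm
    meaning: str      # informal gloss standing in for a semantic representation


ENTRIES = [
    LexicalEntry("dog", "dɒɡ", "noun", ("dogs",), "domestic canine"),
    LexicalEntry("look up", "lʊk ʌp", "verb",
                 ("looks up", "looked up", "looking up"),
                 "search for in a reference source"),
    LexicalEntry("kick the bucket", "kɪk ðə ˈbʌkɪt", "verb",
                 ("kicks the bucket", "kicked the bucket"),
                 "die"),
]

# Index the entries by citation form.
LEXICON = {entry.form: entry for entry in ENTRIES}

# Entries that are phrases rather than single words.
multiword = [form for form in LEXICON if " " in form]
print(multiword)
print(LEXICON["kick the bucket"].meaning)
```

The idiom shows why phrases need their own entries: the meaning of kick the bucket is not predictable from the entries for kick and bucket.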

A broader view

Some other areas of linguistics

. . . which may be relevant to language technology:

I historical linguistics

I comparative linguistics and language typology

I dialect studies

I sociolinguistics

I psycholinguistics (language acquisition, human language processing)

Language variation and universals

I languages are different but there’s a limit on how different they are

I language universals

I Sam read the books in the living-room

I Did Sam read the books in the living-room?

I *Living-room the in books the read Sam?

I Sam read the books which are in the living-room

I Which room did Sam read the books in?

I *Which room did Sam read the books which are in?

Everybody can talk

I . . . except perhaps because of sickness, developmental characteristics or unusual social conditions

I native speakers

I linguistic (un)consciousness (lexicon vs grammar rules)

Language acquisition

From here.

Linguistics and psychology

I developmental psychology

I human processing

I should language technologists be concerned with this?

I should language technology systems imitate humans?

Why is linguistics (and language technology) difficult?

I natural languages are complex

I interaction with context

I multimodality, body language

I difficult to give a precise scientific theory of our linguistic behaviour

Human languages and other languages

I animal languages

I artificial languages (logic, programming languages)

I human languages

Some properties of human languages

I displacement (talking about things not present, time/tense, negation, (im)possibilities)

I arbitrary (compare different words for common objects in unrelated languages)

I productive (take any sentence, can you create a longer sentence which contains it?)

I discrete (digitisation)
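The productivity question has a constructive answer: any sentence can be embedded in a larger one. A minimal sketch, assuming one hypothetical embedding frame ("Sam says that ...") and a lower-case starting sentence; applying the function repeatedly gives ever longer sentences, each containing the previous one.

```python
def embed(sentence):
    """Return a longer sentence that contains the given one (hypothetical frame)."""
    return "Sam says that " + sentence


sentence = "the dog ran"
for _ in range(3):
    sentence = embed(sentence)
    print(sentence)
```

Since the step can always be applied once more, there is no longest sentence, which is one reason a finite list of sentences cannot describe a human language.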
