

Linguistics in a nutshell by hook or by crook

Jeremy G. Kahn

Signal, Speech & Language Interpretation Laboratory; Department of Linguistics; University of Washington

22 June 2008 / Workshop 2007

Kahn, Linguistics brushup

Outline

1 Linguistics: big questions
   Linguistics charter
   What linguists look at
   Linguistics’ role
2 Survey of areas of linguistics
   Phonetics & Phonology
   Morphology and syntax
   Semantics
3 The lab

Business information

Linguistics introductions:
- By necessity, incomplete
- Apologies in advance for my personal speaking style and for my guesses about your level of preparation
- Caveat: I’m a computational linguist
- Caveat: I have an engineering bias
- Goal: informality. Questions are good!
- Thanks to Don Baumer (Linguistics) for letting me crib slides & examples

1 Linguistics: big questions (Linguistics charter)

What is linguistics?

Scientific study of human language:
- How is language organized?
- How is it used?
General questions about Language (capital L):
- What do all languages have in common?
- How can we describe how Language (or languages) works?
- How can we describe how a particular language works?

Language & communication

All communication systems have:
- mode or medium: speech, gesture, olfaction, etc.
- semanticity: meaning is carried
- pragmatic function: intention is carried
Some also have:
- interchangeability: users can send *and* receive
- cultural transmission: learned from other users
- arbitrariness: non-iconic (the form does not resemble the meaning)
- discreteness (“compositionality”)
- displacement: can discuss things that aren’t here
- productivity: new ways to organize it
Where do computer languages differ from human languages?

What makes language interesting?

Language is creative, but constrained (* marks an ill-formed string):
- “Seattle is rainy.” – well-formed
- * “Rainy Seattle is.” – ill-formed
- “I like caffeinated drinks without bubbles.”
- * “Bubbles without drinks caffeinated like I.”
Not just word order:
- “pronk” could be an English word (in fact, it is)
- “przak” could not be (how do you know?)
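To make the “przak” point concrete, here is a toy Python sketch that checks whether a spelled word begins with a consonant cluster (onset) that English allows at the start of a word. It is a sketch under loud assumptions: it works on spelling rather than sound, and `ALLOWED_ONSETS` is a small hand-picked list, not a complete description of English phonotactics.

```python
# Toy phonotactic check on SPELLING, not sound (an assumption for illustration).
VOWEL_LETTERS = set("aeiouy")

# Hypothetical, incomplete list of spelled English word-initial onsets.
ALLOWED_ONSETS = {
    "", "b", "bl", "br", "c", "ch", "cl", "cr", "d", "dr", "f", "fl", "fr",
    "g", "gl", "gr", "h", "j", "k", "l", "m", "n", "p", "pl", "pr", "r",
    "s", "sc", "sh", "sk", "sl", "sm", "sn", "sp", "spl", "spr", "st", "str",
    "sw", "t", "th", "thr", "tr", "tw", "v", "w", "wh", "z",
}


def onset(word: str) -> str:
    """Return the run of consonant letters before the first vowel letter."""
    word = word.lower()
    for i, ch in enumerate(word):
        if ch in VOWEL_LETTERS:
            return word[:i]
    return word


def possible_english_start(word: str) -> bool:
    """True if the word's onset is in the (toy) list of English onsets."""
    return onset(word) in ALLOWED_ONSETS


for w in ["pronk", "przak", "street", "ngoma"]:
    print(w, repr(onset(w)), possible_english_start(w))
```

“pronk” starts with the familiar cluster “pr”, while “przak” starts with “prz”, which no English word begins with; speakers know this without ever having been taught a list.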

Constraint and creativity

Linguists like to say language is “rule-governed”. Statistically-minded engineers might quibble...
Engineering way of looking at it (thanks, Shannon):
- the sender wants to have a signal for every idea
- the recipient won’t have all of those signals memorized
- compositionality and productivity allow both novelty and communication

1 Linguistics: big questions (What linguists look at)

Language as a part of the human OS

Language: not literacy.
- major advantage over chimpanzees (e.g., displacement)
- we’ve got specialist wetware
Competent language use:
- no school required
- no explicit instruction required
- most humans are competent in one language before age 3
What do we mean when we say “competent”?

Competence and Performance

Big idea in modern linguistics:
Competence: what a native user of a language knows
- the ability to produce & comprehend language
- the system or knowledge (“grammar”) that supports that ability
- largely subconscious
- learned (as a first language) without effort
Performance: what language users actually do
- often reflects full competence
- but not always: speech errors, typos, “brain-o’s”

What’s so neat about competence?

Many modern linguists care about competence more than performance. Their view (Chomsky):
- your competence is a window on the underlying structure of your grammar
- your performance includes a bunch of messy wetware
These (self-proclaimed “theoretical”) linguists are very, very interested in figuring out what the OS is from its observable behavior.

Grammaticality and meaningfulness

“Meaningful” and “grammatical” are not synonymous:
- Grammatical, but meaningless: ‘Colorless green ideas sleep furiously.’ (Noam Chomsky)
- Ungrammatical, but meaningful: ‘Around the survivors, a perimeter create.’ (Yoda, Episode 2)

1 Linguistics: big questions (Linguistics’ role)

What’s all this about grammar, then?

Descriptive grammar: an attempt to describe the acceptability judgments (or patterns of use/competence) of a speaker.
Prescriptive grammar: explicit instructions on how one should write (or speak); the language police.
Linguistics is not about prescriptive grammar. We don’t tell you how you should speak; we try to describe how you do.
Dogma: all human languages, stigmatized or not, are equally expressive.

Linguistics and semi-supervised learning

Humans do it: we get very little explicit labeling of our language data, yet we learn without instruction:
- what words and parts of words mean
- how to pronounce words we read
- how to understand sophisticated constructions (“respectively”)
- and more...
It’s not all hard-coded (“universal grammar”): patterns are often language-specific.

Linguistics and semi-supervised learning

The corpora are out there:
- the web
- email (Enron emails!)
- newsgroups
- also speech corpora: radio, television, podcasts
All mostly unlabeled, but enormous: perfect for semi-supervised work.


Overview of the different parts of language (different parts of “grammar”):
- Phonetics: how sounds are made and perceived
- Phonology: function and patterning of sounds
- Morphology: structure of words
- Syntax: analysis of sentence structure (word order)
- Semantics: meaning (words to meaning)


Other areas of linguistic study:
- language evolution and creation
- Pragmatics: what else is intended and performed
- Typology: language classification and differences
- Neurolinguistics: the neurobiological basis for language
- Sociolinguistics: language’s influence on, and indication of, social status and behavior systems
- . . . and more
We won’t cover those here.

2 Survey of areas of linguistics (Phonetics & Phonology)

Phonetics

Phonetics: the study of linguistic speech sounds, from three angles:
- articulatory
- auditory (perceptual)
- acoustic
Problems phonetics works with:
- there are no “spaces” between words in the signal, but we perceive them
- sounds lie in a continuous (acoustic) space, but we chunk them into the (discrete) space of the language’s segments
Tools phoneticians use:
- spectrogram readers
- human listening
- a transcription system (usually the International Phonetic Alphabet, IPA)
Why use IPA?
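The idea of chunking a continuous space into discrete segments can be sketched as nearest-prototype categorization. This Python toy maps a vowel measured by its first two formants (F1, F2, in Hz) to the closest category. The prototype values in `PROTOTYPES` are rounded, illustrative assumptions, not measured data, and plain Euclidean distance in Hz is a simplification (perceptual scales are often preferred in practice).

```python
import math

# Rounded, ILLUSTRATIVE (F1, F2) targets in Hz; assumed, not measured data.
PROTOTYPES = {
    "i": (300.0, 2300.0),
    "a": (750.0, 1200.0),
    "u": (300.0, 900.0),
}


def categorize(f1: float, f2: float) -> str:
    """Map a point in continuous (F1, F2) space to the nearest vowel category."""
    return min(PROTOTYPES, key=lambda v: math.dist((f1, f2), PROTOTYPES[v]))


# Slightly different tokens still land in the same discrete category.
for token in [(320, 2100), (280, 2450), (700, 1300), (350, 1000)]:
    print(token, "->", categorize(*token))
```

Many different acoustic tokens collapse onto one category, which is the continuous-to-discrete mapping the slide describes.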

Spelling is not pronunciation

Probably obvious to non-native English speakers.
Some languages have cleaner spelling-sound relationships (Spanish, Korean), but:
- Spanish “corazón” and “quesadilla” have the same initial sound, spelled differently
- even a “clean” alphabetic language (e.g., Spanish) doesn’t have a 1:1 relationship between characters and phonetic segments
English is alphabetic, but with even noisier mappings:
- “this” vs. “thought”: the voicing of the English interdental (tongue-between-teeth) fricative (voiced [ð] vs. voiceless [θ]) is never represented in spelling
This is why we use IPA.

More on phonetics

Lots more available on phonetics:
- articulatory names (parts of the speech system)
- classification system
- learning the IPA
- “supra-segmentals”: articulations across multiple segments (e.g., pitch shapes)
. . . and still not even touching the perceptual or acoustic domain

Phonology

Phonology: the study of
- the inventory of sounds in a language
- how sounds pattern together or contrast
Minimal pairs (research tool):
- ‘had’ vs. ‘hat’: /t/ and /d/ are contrastive in English
- ‘steel’ vs. ‘stale’ vs. ‘stool’: /i/, /e/, /u/ are contrastive
Contrastive sounds are phonemes: the minimal contrastive units of sound.
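The minimal-pair test is easy to mechanize once words are transcribed. This Python sketch scans a tiny lexicon for pairs of equal-length transcriptions that differ in exactly one segment. The lexicon and its broad transcriptions are simplified assumptions (vowel symbols follow the slide), not a real pronouncing dictionary.

```python
from itertools import combinations

# Toy lexicon: word -> simplified broad transcription (an ASSUMPTION).
LEXICON = {
    "had": ("h", "æ", "d"),
    "hat": ("h", "æ", "t"),
    "steel": ("s", "t", "i", "l"),
    "stale": ("s", "t", "e", "l"),
    "stool": ("s", "t", "u", "l"),
}


def minimal_pairs(lexicon):
    """Yield (word1, word2, seg1, seg2) for words differing in exactly one segment."""
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if len(t1) != len(t2):
            continue
        diffs = [(a, b) for a, b in zip(t1, t2) if a != b]
        if len(diffs) == 1:
            yield w1, w2, diffs[0][0], diffs[0][1]


for pair in minimal_pairs(LEXICON):
    print(pair)
```

Each pair found is evidence that the two differing segments are contrastive (separate phonemes) in the language.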

Phonology (2)

Complementary distribution: two sounds appear in consistently different environments (never the same one).
- [pʰ] in ‘pit’
- [p] in ‘spit’
[pʰ] and [p] are not phonemically different: they are allophones of /p/.
Glossing over much more in phonology. . .
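Because allophones are predictable from their environment, the choice between them can be written as a rule. This Python sketch applies a deliberately simplified version of the English pattern: /p/ surfaces as aspirated [pʰ] unless it follows /s/. Real aspiration also depends on stress and syllable position, so treat this as an illustration of complementary distribution, not a full rule of English.

```python
def realize_p(segments):
    """Map phonemic /p/ to its allophone using a SIMPLIFIED environment rule.

    /p/ -> [p] right after /s/, otherwise [pʰ]. Other segments pass through.
    """
    out = []
    for i, seg in enumerate(segments):
        if seg == "p":
            after_s = i > 0 and segments[i - 1] == "s"
            out.append("p" if after_s else "pʰ")
        else:
            out.append(seg)
    return out


print(realize_p(["p", "ɪ", "t"]))       # ['pʰ', 'ɪ', 't']
print(realize_p(["s", "p", "ɪ", "t"]))  # ['s', 'p', 'ɪ', 't']
```

The two outputs never occur in the same environment, which is exactly what makes [pʰ] and [p] allophones of one phoneme rather than two phonemes.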

An aside for the deaf

Sign languages (e.g., American Sign Language) have phonology as well:
- handshapes and gestures are essentially phonemic
- different sign languages make different choices about how to cluster handshapes: different phonemes
I am not an expert, but I know it’s an open research area.

2 Survey of areas of linguistics (Morphology and syntax)

Morphology

Morphology is:
- the study of words
- the rules (patterns) of word formation
Word: a minimal free form; it can appear in isolation and in multiple positions.
“The hunter pursued the bears.”
- Is “-er” a word? No (it must attach after a stem such as “hunt”).
- Is “the hunter” a word? No (it is not minimal).
- Wait: what is “-er”, then?
Morpheme: the smallest part of a word carrying meaning.
Some morphemes can’t stand alone (affixes): prefix, suffix, infix, circumfix.

Syntax

Lexicon: an inventory of words (form and category)
Lexical category (also “content words”): “open class”, e.g.
- Noun (rabbit, bicycle)
- Verb (die, love, walk)
- Adjective (red, tall, frivolous)
- Adverb (often, very)
Functional category (also “function words”): “closed class”, e.g.
- Preposition (with, on, of, for)
- Conjunction (and, or, because)
- Determiner (our, the, this, many)
- Auxiliary (will, can, may)

Syntax

Some words are ambiguous (especially open-class words). Consider “comb”. How can we tell what category it is? Some tests:
- meaning: is it acting as a person, place, or thing? Probably a NOUN.
- inflection: can you add ‘-ed’ or ‘-ing’ to it? Probably a VERB.
- distribution: does it appear after a degree word (e.g., “very”)? Probably an ADJ.
(In computational linguistics: “part-of-speech tagging”.)
Morphology ties to syntax.

Back to morphology

Nope, not done: the lexicon holds not just words but also morphemes:
- closed-class (function) morphemes: prepositions & articles (function words); inflectional morphemes (don’t change the class of the stem)
- open-class morphemes: usually stand-alone (nouns, verbs, etc.); also ‘-ly’, ‘-er’, ‘anti-’: derivational morphemes (may change the class of the stem)

Word formation in English

Inflectional morphemes (no class change):
  -s     third person singular present
  -ed    past tense
  -ing   progressive
  -en    past participle
  -s     plural
  -’s    possessive
  -er    comparative
  -est   superlative

Derivational affixes (class change):
  input                    result
  happy [adj] + -ness      happiness [n]
  beauty [n] + -ful        beautiful [adj]
  beautiful [adj] + -ly    beautifully [adv]
  stable [adj] + -ize      stabilize [v]
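The derivational half of the table can be encoded as rules that check the input class, attach the suffix, and return the new class. This Python sketch models only one spelling adjustment (consonant + y becomes i, as in happy to happi-); it does not handle stem changes such as stable to stabil- in “stabilize”, so that row is left out of the usage.

```python
# Class-changing suffixes from the table: suffix -> (input class, output class).
DERIVATIONS = {
    "-ness": ("adj", "n"),
    "-ful": ("n", "adj"),
    "-ly": ("adj", "adv"),
    "-ize": ("adj", "v"),
}


def derive(stem, stem_class, suffix):
    """Attach a derivational suffix; return (new word, new class).

    Only the 'consonant + y -> i' spelling rule is modeled (an assumption).
    """
    needed, result_class = DERIVATIONS[suffix]
    if stem_class != needed:
        raise ValueError(f"{suffix} attaches to {needed}, not {stem_class}")
    if len(stem) > 1 and stem.endswith("y") and stem[-2] not in "aeiou":
        stem = stem[:-1] + "i"
    return stem + suffix.lstrip("-"), result_class


print(derive("happy", "adj", "-ness"))      # ('happiness', 'n')
print(derive("beauty", "n", "-ful"))        # ('beautiful', 'adj')
print(derive("beautiful", "adj", "-ly"))    # ('beautifully', 'adv')
```

Because the output of one derivation is a valid input to another, rules like these can stack: feeding `derive("beauty", "n", "-ful")` back in with `-ness` gives “beautifulness”.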

Subtleties in morphology

Perverse cases, even in English:

Recursive-ish morphology: beauty [n] + -ful + -ness → beautifulness [n]

English has roughly one (rather rude, emphatic) infix: -****ing- + Massachusetts → “Massa-****ing-chusetts”

Comp ling task: morphological analysis (very important in other languages, e.g., Czech)
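Morphological analysis runs word formation in reverse: given a surface word, propose stem + affix splits. This Python sketch strips the inflectional suffixes listed earlier and keeps only splits whose stem is in a tiny, hypothetical lexicon `LEX`. It deliberately ignores spelling changes and irregular forms.

```python
# Inflectional suffixes from the word-formation table, with rough labels.
INFLECTIONS = [
    ("ing", "progressive"),
    ("est", "superlative"),
    ("ed", "past"),
    ("er", "comparative"),
    ("s", "plural or 3sg present"),
]


def analyze(word, lexicon):
    """Return candidate (stem, analysis) splits whose stem is a known word."""
    results = []
    if word in lexicon:
        results.append((word, "bare"))
    for suffix, label in INFLECTIONS:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in lexicon:
            results.append((stem, label))
    return results


LEX = {"comb", "walk", "tall", "hunt"}  # hypothetical toy lexicon
print(analyze("combs", LEX))   # [('comb', 'plural or 3sg present')]
print(analyze("walked", LEX))  # [('walk', 'past')]
print(analyze("hunter", LEX))  # [('hunt', 'comparative')]
```

The last line is wrong on purpose: “-er” in “hunter” is derivational, not comparative, which is why real analyzers need more than suffix matching, and why the task gets much harder in richly inflected languages.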

Back to syntax

Review: some words are ambiguous (“comb”): what to do? Use meaning, inflection, distribution.
Distribution can tell us a lot:
Constituent: a grammatical unit that is part of a larger unit
- sentence = noun phrase (NP) + verb phrase (VP)
- noun phrase (NP) = determiner + noun
- a noun is a (minimal) constituent
Note that recursion is possible.

Phrase structure and ambiguity

How does phrase structure help with ambiguity?

[S [NP [Det the] [N men]] [VP [V comb] [NP [Det their] [N hair]]]]

[S [NP [Det the] [N men]] [VP [V share] [NP [Det a] [N comb]]]]

Note that structure resolves the lexical ambiguity: whether “comb” is a noun or a verb.
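A parser makes the same point mechanically. This Python sketch is a minimal CKY recognizer for the three binary rules behind the two “comb” sentences (S → NP VP, NP → Det N, VP → V NP), keeping one tree per chart cell. The toy grammar and lexicon are assumptions sized to these examples; “comb” is listed as both N and V, and only the parse that covers the whole sentence fixes its category.

```python
LEXICON = {
    "the": {"Det"}, "their": {"Det"}, "a": {"Det"},
    "men": {"N"}, "hair": {"N"},
    "comb": {"N", "V"},  # lexically ambiguous
    "share": {"V"},
}
RULES = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]


def cky(words):
    """Return one S tree (nested tuples) covering all words, or None."""
    n = len(words)
    # chart[i][j] maps a label to one tree spanning words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for tag in LEXICON[w]:
            chart[i][i + 1][tag] = (tag, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    if left in chart[i][k] and right in chart[k][j]:
                        chart[i][j].setdefault(
                            parent, (parent, chart[i][k][left], chart[k][j][right]))
    return chart[0][n].get("S")


def preterminal(tree, word):
    """Return the category the tree assigns to `word` (first match)."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        return label if kids[0] == word else None
    for kid in kids:
        found = preterminal(kid, word)
        if found:
            return found
    return None


print(preterminal(cky("the men comb their hair".split()), "comb"))  # V
print(preterminal(cky("the men share a comb".split()), "comb"))     # N
```

Only one assignment of categories yields a complete S in each sentence, so the structure, not the word itself, decides.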

Syntax and structural ambiguity

Another kind of ambiguity: The woman shot the man with the gun.

Who has the gun? (She shot him with it.) The PP attaches to the verb phrase:

[S [NP The woman] [VP [V shot] [NP [Det the] [N man]] [PP with the gun]]]

Who has the gun? (He had it.) The PP attaches inside the object noun phrase:

[S [NP The woman] [VP [V shot] [NP [Det the] [N man] [PP with the gun]]]]

There is no ambiguity about the meaning of any word, but there are two different attachments for “with the gun”.
- PP attachment? Messy.
- POS tagging? Fairly easy.
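Counting parses shows the attachment ambiguity directly. This Python sketch runs CKY with counts instead of trees. One assumption to flag: the flat trees on the slide are binarized here as VP → VP PP (verb attachment) and NP → NP PP (noun attachment), which is a standard conversion to binary rules rather than the slide’s exact grammar. Every word has exactly one category, yet the sentence gets two parses.

```python
from collections import defaultdict

LEXICON = {"the": "Det", "woman": "N", "man": "N", "gun": "N",
           "shot": "V", "with": "P"}  # one category per word: no lexical ambiguity
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP modifies the noun: the man had the gun
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP modifies the verb: she shot with the gun
    ("PP", "P", "NP"),
]


def count_parses(words):
    """Number of distinct S parses under the toy binary grammar (CKY with counts)."""
    n = len(words)
    counts = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        counts[i][i + 1][LEXICON[w]] = 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    counts[i][j][parent] += counts[i][k][left] * counts[k][j][right]
    return counts[0][n]["S"]


print(count_parses("the woman shot the man".split()))                # 1
print(count_parses("the woman shot the man with the gun".split()))   # 2
```

Adding more prepositional phrases makes the count grow quickly, which is one reason PP attachment is messy while tagging each word is comparatively easy.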

2 Survey of areas of linguistics (Semantics)

Semantics

Two major areas within the study of language meaning:
- Lexical semantics: meaning of individual morphemes
- Compositional semantics (or “phrasal semantics”): how meaning gets built up from pieces

Lexical semantics

- synonymy: “means (almost) the same thing”: (angry, mad), (vomit, puke)
- homonymy: “same form, unrelated meanings”: (pass[abstain], pass[succeed])
- antonymy: “opposite meaning”
- hyponymy (hypernymy): A is a hyponym of B (A is a special case of B; B is a hypernym of A; B is a generalization of A)
  - poodle < dog < animal
  - sprint < run < move
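Hyponymy is a transitive IS-A relation, so it can be stored as child-to-parent links and queried by walking up the chain. This Python sketch encodes only the two chains from the slide; the `HYPERNYM` table is a toy, whereas resources such as WordNet organize large parts of the English vocabulary in a similar hierarchy.

```python
# Toy IS-A links (hyponym -> immediate hypernym), from the slide's chains.
HYPERNYM = {
    "poodle": "dog", "dog": "animal",
    "sprint": "run", "run": "move",
}


def hypernyms(word):
    """All generalizations of `word`, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym(a, b):
    """True if A is a (possibly indirect) special case of B."""
    return b in hypernyms(a)


print(hypernyms("poodle"))           # ['dog', 'animal']
print(is_hyponym("sprint", "move"))  # True
print(is_hyponym("animal", "dog"))   # False
```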

Compositional semantics

- sense (intension): the meaning of a word/phrase as a function (e.g., “rabbit” is a function from items to a boolean value)
- reference (extension): which thing(s) in the world the function (word, phrase) picks out (the set of rabbits)
Example: “Jeremy” vs. “today’s linguistics tutor”: same reference (extension), different sense.
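The sense/reference distinction maps neatly onto “function” vs. “the set the function selects”. In this Python sketch, senses are predicates over a tiny, made-up model of the world (`UNIVERSE`, an assumption for illustration), and the extension is computed by filtering the model. “Jeremy” and “today’s linguistics tutor” are different functions that happen to pick out the same set here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str
    role: str = ""


# A tiny, made-up model of the world (an ASSUMPTION for illustration).
UNIVERSE = frozenset({
    Entity("Peter", "rabbit"),
    Entity("Flopsy", "rabbit"),
    Entity("Jeremy", "human", role="tutor"),
    Entity("Alice", "human"),
})


# Senses: functions from entities to booleans.
def rabbit(x: Entity) -> bool:
    return x.kind == "rabbit"


def jeremy(x: Entity) -> bool:
    return x.name == "Jeremy"


def todays_linguistics_tutor(x: Entity) -> bool:
    return x.role == "tutor"


def extension(sense, universe=UNIVERSE):
    """Reference (extension): the names of the entities the sense picks out."""
    return {x.name for x in universe if sense(x)}


print(sorted(extension(rabbit)))                                  # ['Flopsy', 'Peter']
print(extension(jeremy) == extension(todays_linguistics_tutor))   # True: same reference
print(jeremy is todays_linguistics_tutor)                         # False: different senses
```

In a different model (say, a day with a different tutor) the two extensions would come apart, which is exactly why the senses count as different.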

Compositional semantics

Dealing with sentences: sentences are boolean functions on the universe.
“I like cheese” vs. “I live in Seattle”: same reference (TRUE), different sense (different function).
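One common way to make “a sentence is a boolean function” concrete is to evaluate it over several possible situations (worlds) rather than a single state of affairs; that modeling choice goes slightly beyond the slide and is an assumption here, as are the made-up facts in `WORLDS`. Both sentences are true in the actual world (same reference), but their truth-value tables across worlds differ (different sense).

```python
# Made-up situations; "actual" stands for how things really are (ASSUMPTION).
WORLDS = {
    "actual": {"likes_cheese": True, "lives_in_seattle": True},
    "w1": {"likes_cheese": False, "lives_in_seattle": True},
    "w2": {"likes_cheese": True, "lives_in_seattle": False},
}


def i_like_cheese(world):
    return world["likes_cheese"]


def i_live_in_seattle(world):
    return world["lives_in_seattle"]


def reference(sentence):
    """The truth value in the actual world."""
    return sentence(WORLDS["actual"])


def sense_table(sentence):
    """The whole function, tabulated world by world."""
    return {name: sentence(w) for name, w in WORLDS.items()}


print(reference(i_like_cheese), reference(i_live_in_seattle))        # True True
print(sense_table(i_like_cheese) == sense_table(i_live_in_seattle))  # False
```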

Summarizing

Lots of areas of linguistic research.
- Most of these are becoming approachable computationally.
- None are very easy.
But these represent what linguists think is going on in natural language, not necessarily what is needed: these classes may not relate to the task at hand in computation.

Emotion detection task, revisited

What can we add to the emotion detection task?
- Class words together (let’s use POS).
- The sequence of classes might be interesting.
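“The sequence of classes” can be turned into features by taking n-grams over the tag sequence. This Python sketch builds padded tag bigrams (or any n) as strings a classifier could consume; the `<s>`/`</s>` padding symbols and the underscore joiner are arbitrary choices, and the example tags assume a Penn Treebank-style tagset.

```python
def tag_ngrams(tags, n=2):
    """Return POS n-grams such as 'PRP_VBP', padded with sentence boundaries."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    return ["_".join(padded[i:i + n]) for i in range(len(padded) - n + 1)]


# "I am happy" tagged as PRP VBP JJ
print(tag_ngrams(["PRP", "VBP", "JJ"]))
# ['<s>_PRP', 'PRP_VBP', 'VBP_JJ', 'JJ_</s>']
```

Unlike the words themselves, these features generalize across vocabulary: “I am happy” and “we are furious” share the same tag pattern.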

The lab

1 Read the datafiles; extract the text and write out datafile.tok.
2 Invoke the Ratnaparkhi tagger on the tokenized text: datafile.maxHpos
3 Read the .maxHpos file and pull out just the tags (clean up the punctuation so it doesn’t break BoosTexter). Create datafile.pos, which must end with space-comma.
4 Paste together datafile.pos with datafile.orig.
5 Rerun the emotion detection, but this time with the extra sequence information.
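Step 3 is the fiddliest, so here is a Python sketch of its logic (the lab itself asks for Perl). It assumes the tagger writes one sentence per line as `word_TAG` tokens; the renaming table `PUNCT_TAGS` is a hypothetical choice of safe names, and ending every line with “ ,” is one reading of the space-comma requirement.

```python
import re
import sys

# Hypothetical safe names for punctuation tags that could confuse BoosTexter.
PUNCT_TAGS = {",": "COMMA", ".": "PERIOD", ":": "COLON", "``": "LQUOTE",
              "''": "RQUOTE", "(": "LPAREN", ")": "RPAREN",
              "-LRB-": "LPAREN", "-RRB-": "RPAREN", "#": "POUND", "$": "DOLLAR"}


def safe_tag(tag):
    """Rename a tag so it contains only letters and digits."""
    if tag in PUNCT_TAGS:
        return PUNCT_TAGS[tag]
    # e.g. PRP$ -> PRPS; drop any other non-alphanumeric characters
    return re.sub(r"[^A-Za-z0-9]", "", tag.replace("$", "S"))


def tags_from_line(line):
    """Pull tags out of 'word_TAG' tokens (split on the LAST underscore)."""
    return [safe_tag(tok.rsplit("_", 1)[1]) for tok in line.split() if "_" in tok]


def convert(maxhpos_path, pos_path):
    """Write one line of space-separated tags per sentence, ending in ' ,'."""
    with open(maxhpos_path, encoding="utf-8") as src, \
         open(pos_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(" ".join(tags_from_line(line)) + " ,\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])  # e.g. datafile.maxHpos datafile.pos
```

Splitting on the last underscore keeps words that themselves contain underscores intact; the resulting file lines up one-to-one with the input sentences, ready for step 4.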

The lab’s goals

- Practice Perl
- Practice practical scripting (Perl is great, but not always the answer)
- Get comfortable with a new tool (the Ratnaparkhi tagger; very easy)
