Typology in Variation: A Stochastic Optimality-theoretic Approach to Person/Voice Interactions in English and (Straits Salish)

by

JOAN BRESNAN, SHIPRA DINGARE, AND CHRIS MANNING

Stanford University

Syntax Workshop, 16 October 2001 In a nutshell . . . i) The generalization: The same categorical phenomena which are attributed to hard grammatical constraints in some languages continue to show up as statistical preferences in other languages, motivating a grammatical model that can account for soft constraints. ii) A case study: The person hierarchy affects subject selection categorically in Lummi (Straits Salish, ), Nootka (Southern Wakashan, British Columbia), Picur´ıs (Tanoan, New Mexico), and other languages. It also affects the frequency of subject selection in active/passive choices in spoken English. iii) An explanation: – The person hierarchy is rooted in psycholinguistic or communicative tendencies which affect not just the formal properties of a few particular languages, but every language. – The detailed effects of these tendencies on can be captured in Optimality Theory (OT). The universal tendencies are modelled as violable constraints which have variable strengths (rankings) across languages. Given a language-particular ranking, an optimization function determines possible grammatical structures by minimizing the worst violations. – Frequentistic variation follows when these same constraints are ranked on a continuous scale with stochastic evaluation (Boersma 1998, 2000, Boersma and Hayes 2001). The resulting model defines a continuum of conventionalization which connects frequentistic preferences in usage to categorical grammatical rules. The Person Hierarchy

1st, 2nd 3rd (local outranks nonlocal)

The Person Hierarchy — appears at the top of a hierarchy of nominal features: e.g. ‘animacy’, ‘topicality’ hierarchies:

1st, 2nd 3rd pronominal name human noun animate

nonhuman noun inanimate noun — ranks nominals according to their referents’ “likelihood of participa- tion in the speech event” (Smith-Stark 1974), their “inherent lexical content” (Silverstein 1976), their discourse-pragmatic topicality (Givon´ 1976, 1979, 1994), or their referents’ accessibility during the psycholinguistic processing of language (Ariel 1990, Warren and Gibson 2001) Why would the person hierarchy influence voice? “Inherent lexical content” of nominals offers little insight. Alternative theories: perspective-based: perspective-taking (MacWhinney, in progress) or empathy (Kuno and Kaburaki 1977, Kuno 1987) — grammar is designed to facilitate perspective shifting during communication; interlocutors share the perspectives of speech-act participants and of referents having causal roles. (These are paradigmatically the subjects of expressions.) -based: accessibility of referents in the pragmatic context (Givon´ 1976, 1979, 1994; Ariel 1991; Warren and Gibson 2001) — nominal expressions are most easily processed when their referents are contextually accessible and their expressions occur in perceptually salient positions (e.g., subject) in linguistic structures Categorical Effects of Person on Voice

The effects of the person hierarchy on grammar are categorical in some languages, most famously in languages with inverse systems, but also in languages with person restrictions on passivization. In Lummi, for example, the person of the subject argument cannot be lower than the person of a nonsubject argument. If this would happen in the active, passivization is obligatory; if it would happen in the passive, the active is obligatory (Jelinek and Demers 1983, 1994).a

aThe ‘transitive’ stem suffix -t, glossed TR, is one of a set that marks degree of voli- tionality of control of the action; the passive suffix - , glossed PASS, also marks middles

(Jelinek and Demers 1994: 706). With local person arguments the active is obligatory. The Lummi pattern holds for bound pronouns; full pronouns designating speaker and hearer are 3rd person deictic expressions (Jelinek and Demers 1994: 714). Lummi examples:

* ‘The man knows me/you’

¡ w

¢ ¢ ¢ ¢ £ ¢¤£ x ci-t-ˇ =s n/=sx c sw y q know-TR-PASS=1/2.SG.NOM by the man

‘I am/you are known by the man’

¢ ¢ £ ¢¥£ ¢ £ £¦¢¨§ x ci-t-sˇ c sw y q c swi qo know-TR-3.TR.SUBJ the man the boy

‘The man knows the boy’

¡

¢ £ £©¢ § ¢ ¢ ¢ £ ¢¥£ x ci-t-ˇ c swi qo c sw y q know-TR-PASS the boy by the man ‘The boy is known by the man’

w

¢ ¢ ¢ £ ¢¥£ x ci-t=sˇ n/=sx c sw y q know-TR=1/2.SG.NOM the man ‘I/you know the man’

* ‘The man is known by me/you’ Like Lummi: Picur´ıs (Tanoan, New Mexico) (Zaharlick 1982, Mithun 1999: 226–228) and Nootka (Southern Wakashan, British Columbia) (Klokeid 1969, Whistler 1985, Emanatian 1988).a Person-driven passives sometimes viewed as inverses (cf. Klaiman 1991, Jacobs 1994, Forrest 1994, Jelinek and Demers 1983, 1994 on Salish), but compare person-driven passives and the Algonquian-type inverse exemplified by Plains Cree (Dahlstrom 1984), from Mithun (1999: 222– 228):

Passive: Inverse: intransitive transitive patient Subject patient Object oblique case marking on agent non-oblique agent omissibility of indefinite agent non-omissibility (Mithun 1999: 227 concludes of Picur´ıs, “There is no question that these constructions are formally passive.”) aIn Picur´ıs, unlike Lummi, full pronouns share the same restriction as bound (Zaharlick 1982). Some languages with person/voice interactions:

Salishan (Western Canada and USA): (, Jelinek and Demers 1983) Lummi (Coast Salish, Jelinek and Demers 1983; cf. Klaiman 1991) Squamish (Coast Salish, Jacobs 1994) Bella Coola (isolate, Forrest 1994)

Southern Wakashan (Western Canada, USA): Nootka (Klokeid 1969, Emanatian 1988) (Jacobsen 1979: 156, 159) Nitinat (Klokeid 1978)

Kiowa Tanoan (USA): Picur´ıs (Tiwa-Tewa, Northern Tiwa, Zaharlick 1982) Southern Tiwa (Tiwa-Tewa, Allen and Frantz 1978, Rosen 1990) Arizona Tiwa (Tiwa-Tewa, Tewa, Kroskrity 1985) Towa (Tiwa-Tewa, Towa, Myers 1970, cited by Klaiman 1991: 292)

Hokan (Mexico and USA): Yana (Sapir 1929)

Algic (Canada and USA): Yurok (relative of Algo- nquian, Northwestern California, Robins 1958)

Mayan (Mexico and Guatamala): K’iche’ (Mondlach 1981, cited in Aissen 1999)

Austronesian: Chamorro (Western Malayo-Polynesian, Guam and Northern Mariana Islands, Chung 1998, Cooreman 1987) A Theory of Passivization in Optimality Theory

Passives are marked (Greenberg 1966, Trask 1979):

Typological distribution: There are many languages without passives.

Language-internal: Where it occurs, the passive is often more restricted than the active. For example, many languages restrict the passive agent (it may not appear, or may appear only in certain persons); others have a morphologically defective passive paradigm (lacking certain tenses, etc); only subclasses of active transitive verbs may passivize.

Morphological: Passivization is morphologically marked (Haspel- math 1990). Why? Historical explanation: actives are basic verb types; passives arise from originally deverbal constructions such as stative adjectives or nominals by a historical process of verbalisation (Trask 1979, Estival and Myhill 1988, Haspelmath 1990, Garrett 1990). But why are actives the basic (unmarked) verb types, rather than passives? The intuition: agents make better subjects than patients do. Semantically ‘active’ (proto-agent) arguments harmonically align with the most promi- nent syntactic argument positions; semantically ‘inactive’ (proto-patient) arguments harmonically align with the least prominent. This harmonic alignment is ultimately to be explained by the underlying psycholinguistic and communicative theories. The detailed effects of harmonic alignment on can be explicitly modelled in OT. In , the sonority hierarchy aligns with syllable structure so that the most sonorous sounds are attracted to syllable peaks and the least sonorous sounds, to syllable margins (see Kager 1999 for a synthetic overview). Aissen’s (1999) theory of harmonic alignment in : Prominence scales: Harmonically aligned scales: ¢

S O S S £¥¤ ¡ ¢

agent patient O O £¥¤ ¡ OT constraint subhierarchies: ¦

*S *S £¥¤ ¡ ¦

*O *O¤ £ ¡ The relative ranking of the subhierarchy is fixed across languages, but interleaving of other constraints can alter its effects.a aIn syntactically ergative languages (Manning 1996), the preference for agentive subjects must be overridden. Harmonic alignment of the person hierarchy with the relational hierarchy yields further constraint subhierarchies, which may interact with role- based harmonic alignment: ¦ ¦ ¡ ¡

*S *S , *O *O , ¢¤£ ¢¤£ ¦ ¡ £ *Obl *Obl ¢ (This in turn follows from the perspective-based or the pragmatics-based theories of person/voice interaction.) ¦

In OT, the universal subhierarchy *S *S entails the markedness of £¥¤ ¡ the passive compared to the active (all else being equal): ¤ input: v(ag,pt) *S *S £ ¡

passive: S , Obl *! £¥¤ ¡ ☞

active: S , O * £¥¤ ¡

Then why is there passivization? Other constraints favor the passive: avoiding or ‘backgrounding’ the agent (Shibatani 1985, Thompson 1987), avoiding subjects newer than non-subjects in the discourse (Birner and Ward 1998), placing the topic in subject position to enhance topic continuity (Givon´ 1983, Thompson 1987, Beaver 2000), etc.

English avoids subjects newer than non-subjects (*S ¡ ¢ ¡ £ ):

input: v(ag/new, pt) *S ¡ ¢ ¡ £ *S *S £¥¤ ¡

active: S ,O *! * £¥¤ ¡ ☞

passive: S ,Obl * £¥¤ ¡ a

Lummi avoids third person subjects (*S ):

¤ input: v(ag/3,pt/1) *S *S *S £ ¡

active: S ,O *! * £¥¤ ¡ ☞

passive: S ,Obl * £¥¤ ¡ In languages without passives, the constraint *S is undominated by any £¥¤ of these countervailing constraints.

aThis analysis of Lummi differs from that of Aissen (1999). It was learned by the GLA. The same constraints are hypothesized to be present in all grammars, but are more or less active depending on their ranking relative to other constraints.

Lummi falls back on *S ¡ ¢ ¡ £ with third person agent and patient: ¡¢ £ ¢ ¤ ¦ § input: v(ag/3/new,pt/3) *S *S *S *S ¥ ¨ § active: S ,O ¦ * *! * ¥ ¨ ☞

passive: S ¦ ,Obl § * * ¥ ¨

English suppresses the syntactic person-avoidance constraints by low ranking: ¦ § ¡¢ £ ¢ ¤ input: v(ag/3, pt/1) *S *S *S *S ¥ ¨ ☞ § active: S ,O ¦ * * ¥ ¨ § passive: S ¦ ,Obl *! ¥ ¨

We know this because the disharmonic combinations are still grammatical in English, unlike Lummi: She met me, She’ll be met by you. Statistical Person/Voice Interactions in English

To see if there is an effect of person on the selection of active or passive in English, we need information about the systematic choices made for each input. Prior studies generally fail to provide the full joint distribution, from which we can reconstruct the conditional frequencies needed.a We have therefore examined the parsed SWITCHBOARD corpus, a carefully designed database of spontaneous telephone conversations spoken by over 500 American English speakers, both male and female, from a great variety of speech communities (Godfrey et al. 1992). The conversations average 6 minutes in length, collectively amounting to 3 million words. We have used the parsed portion of this corpus (Marcus et al. 1993), which contains 1 million words. Although the frequency of passives is quite low in this corpus, the frequency of local pronouns is high.b aEstival and Myhill (1988) provide exactly the kind of information needed for animacy and definiteness, but they provide person frequencies only for the patient role. bFrancis et al. (1999) show that 91% of subjects in the parsed SWITCHBOARD corpus are pronominal. English person/role by voice (full passives):

action: # Act: # Pass: % Act: % Pass:

1,2 1,2 179 0 100.0 0.0

1,2 3 6246 0 100.0 0.0

3 3 3110 39 98.8 1.2

3 1,2 472 14 97.1 2.9 The leftmost column gives the four types of inputs (local person acting on local, local acting on nonlocal, etc.). We estimate the number of times each input was evaluated as the number of actives plus passives with that person/structure association. We then calculate the rate of passivization as the number of times that input was realized as passive. £ ¢

The person/voice effects are highly significant ( ¡ = 115.8, p 0.001;

Fisher exact test, p ¢ 0.001). Similar significance levels result if short passives are included, but we omit them because the person of the agent is not always clear.a aSee Dingare (2001) for detailed analysis and methodological discussion. Compared to the rate of passivization for inputs of third persons acting on third persons (1.2%), the rate of passivization for first or second person acting on third is substantially depressed (0%) while that for third acting on first or second (2.9%) is substantially elevated. The same disharmonic person/argument associations which are avoided categorically in languages like Lummi by making passives either impos- sible or obligatory, are avoided in the SWITCHBOARD corpus of spoken English by either depressing or elevating the frequency of passives relative to actives. In sum, the ‘hard’ grammatical constraints on person/voice interactions seen in languages like Lummi, Picur´ıs, and Nootka continue to show up as statistical preferences in English. Why is English like Lummi?

It is “a mainstay of functional ” that “linguistic elements and patterns that are frequent in discourse become conventionalized in grammar” (from a publisher’s blurb on Bybee and Hopper 2001). On this view, Lummi is simply at an extreme point from English along the continuum of conventionalization that connects frequentistic preferences in usage to categorical grammatical constraints. How can this process work in a conventional ? There, frequentistic processes (such as the conventionalization of usage preferences) must belong either to grammar-external ‘performance’ along with speech errors and memory limitations, or to external choices among competing dialect grammars. Yet neither of these alternatives is an adequate model of variation and change (Weinreich, Labov, and Herzog 1968). Stochastic Optimality Theory

Stochastic OTa (Boersma 1998, 2000, Boersma and Hayes 2001) differs from standard OT in two essential ways: (i) ranking on a continuous scale: Constraints are not simply ranked on a discrete ordinal scale; rather, they have a value on the continuous scale of real numbers. Thus constraints not only dominate other constriants, but they are specific distances apart, and these distances are relevant to what the theory predicts.

(ii) stochastic evaluation: at each evaluation the real value of each constraint is perturbed by temporarily adding to its ranking value a random value drawn from a normal distribution. For example, a constraint with the mean rank of 99 could be evaluated at 98.12 or 100.3. It is the constraint ranking that results from these new disharmonic values that is used in evaluation. a—one of a family of new optimization-based theories of grammar that can provide a unified account of categorical, variable, and gradient data (see Anttila in press, Manning to appear, and references). Constraint ranking on a continuous scale with stochastic evaluation:a

C1 C2

strict 90 88 86 84 82 80 lax

An OT grammar with stochastic evaluation can generate both categorical and variable outputs and can be learned from variable data by the GLA (Gradual Learning Algorithm, Boersma 1998, Boersma and Hayes 2001). Categorical outputs arise when crucially ranked constraints are distant. As the distance between constraints increases, interactions become vanishingly rare. (A distance of five standard deviations ensures an error rate of less than 0.02% (Boersma and Hayes 2001: 50).)b aNote the numerical scale is reversed to show stricter constraints to left as in OT tableaux. bUnits of measurement are arbitrary. With standard deviation = 2.0, a ranking distance of 10 units between constraints is taken to be effectively categorical. Variable outputs arise when crucially ranked constraints are closer together. An example: the English-type ‘pragmatic passive’:

*Spt *Snewer *O1,2

strict 104 99.6 90.1 lax ¡ input: v(ag/new, pt) *S *S ¡ ¢ ¡ £ *O ¢¤£ £¥¤ ☞

active: S ,O * £¥¤ ¡

passive: S ,Obl *! £¥¤ ¡ ¡ input: v(ag/new, pt) *S ¡ ¢ ¡ £ *S *O £¥¤ ¢¤£

active: S ,O *! £¥¤ ¡ ☞

passive: S ,Obl * £¥¤ ¡ The ranking values of the constraints (= the mean of

their normal distributions) are given on the axis. Pragmatic or discourse-driven passivization is a statistical tendency, but not a categorical property of the output. Passives avoid subjects newer than non-subjects, but passivization is infrequent, and actives with new subjects also occur.

(Still unexplained by this partial grammar: passives with nontopical subjects— I was descending Highway 9 on my bike and suddenly had to slow. A young women had been hit by a motorcycle and was lying in the street beside it. It was a sickening accident. ) See slide 44.) Where do the real number ranking values in a stochastic grammar come from? Starting from an initial state grammar in which all constraints have the same ranking values (arbitrarily set to be 100.0), the GLA is presented with learning data consisted of input-output pairs having the statistical distribution of, say, English. For each learning datum (a given input-output pair), the GLA compares the output of its own grammar for the same input; if its own output differs from the given output, it adjusts its grammar by moving all the constraints that disfavor its own output upward on the continuous ranking scale by a small increment, and moving all constraints that disfavor the given output downward along the scale by a small decrement. The adjustment process applies recursively to constraint subhierachies in order to preserve their local ordering relations.a

aThe increment/decrement value is called the ‘plasticity’ and may be assumed to vary stochastically and to change with age (Boersma 2000). Stochastic Grammars for English and Lummi

Partial stochastic grammar of English:

*Sag *Obl1,2 *Spt *S3 *Obl3 *O1,2 *S1,2 *O3

109 103 97 77

Output distribution of grammar: input: % Active: % Passive:

1,2 1,2 100.00 0.00

1,2 3 100.00 0.00

3 3 98.80 1.20

3 1,2 97.21 2.79 The constraint rankings and output distribution of this English grammar were determined by simulation using the Praat system (Boersma and Weenink 2000), which includes an implementation of the Gradual Learning Algorithm. *Sag *Obl1,2 *Spt *S3 *Obl3 *O1,2 *S1,2 *O3 again:

109 103 97 77 ¡ ¦ ¤

Observe: *S *S but *S *S = 6, close enough to produce low

£ £¥¤ frequency variable outputs for some inputs. For inputs where only the

agent is third person, passive outputs will occasionally be favored by *S :

An (infrequent) effect of *S on passive outputs:

¤ input: v(ag/3,pt/1) *S *S *S £ ¡

active: S ,O¤ *! * £ ¡ ☞

passive: S ¤ ,Obl * £ ¡

When both agent and patient are third person, the *S constraint cannot decide between active and passive, and the decision passes to other constraints. For this input it will be *S ¢¡ that permits passive ouputs, with slightly less frequency than the passive outputs pro-

duced by *S £ , which is ranked marginally higher. In a less limited grammar other constraints would fill this role. This

grammar unrealistically omits the effects of the *S ¤¦¥¨§©¥¨ constraint. Additionally, five constraints less active in our

data were also omitted from the simulations for perspicu-

¢¡ ¢¡  ity: *Obl , *Obl , *O , and *O . *Sag *Obl1,2 *Spt *S3 *Obl3 *O1,2 *S1,2 *O3 again:

109 103 97 77 ¡ ¢ ¡ ¡ Observe: *Obl *O¡ £ 10. (*O disfavors an active for an input

¢¤£ ¢¤£ £

with local-person patient and *O for an input with third-person patient.) These rankings reflect the zero frequency of local person passive agents in our data. But Kato (1979) cites (from Studs Terkel, Working):

I said, “Me watch it! Fuck that! Let him watch it.” He was hired by me. I could fire him if I didn’t like him.

When somebody says to me, “You’re great, how come you’re just a waitress?” Just a waitress. I’d say, “Why, don’t you think you deserve to be served by me?” With more training data and a more complete constraint set which includes factors of topicality and focus, our model should learn grammars that produce passives with local person agents. (If the ranking value of

*Obl ¡ in the grammar were lowered from 109 to 104, the output of local ¢¤£ person passives would increase to one-tenth of one percent, 0.1%, while barely changing the frequency of other outputs.) In sum, stochastic OT can capture the soft influence of person on English passivization that exists beneath the level of judgments. Disharmonic person/argument combinations are grammatical but avoided, affecting the frequency of passivization. Unfortunately we lack a parsed SWITCHBOARD corpus for Lummi. Nevertheless, it is possible to show by simulation how the descriptions of passive/voice interactions in Lummi grammar can also be captured by a stochastic OT grammar. We interpret the descriptions of Lummi from Jelinek and Demers (1983, 1994) by means of a simple distribution. Where a sentence type is described as ungrammatical, we assign it 0% relative frequency; where it is described as obligatory, we assign it 100%; and where it is described as optional, we assign it 50%: Simulated Lummi input/output distribution: input: % Active: % Passive:

1,2 1,2 100.00 0.00

1,2 3 100.00 0.00

3 3 50.00 50.00

3 1,2 0.00 100.00 The simulated input/output distribution is then used to generate training data for the GLA, as before. The initial state of the grammar and the training regime are exactly the same as for English. Partial stochastic grammar of Lummi

*O3 *Obl1,2 *S3 *O1,2 *Spt*S1,2 *Sag *Obl3

110 107 93.5 83 ¡ ¢

Observe: *S *S 10. This ranking yields the obligatory pas-

£¥¤ sivization of inputs with local person patients and non-local person agents, capturing the categorical influence of person on Lummi passivization. The output distribution of the grammar matches the similated learning distribution exactly. Isn’t ranking on the continuous scale of real numbers powerful enough to learn any distribution? No, it isn’t. Under the present theory there are no stochastic OT grammars for ‘anti-Lummi’ or ‘anti-English’ distributions, which reverse the gen- eralizations embodied in our data. Greater relative frequency of passives for first or second person acting on third would imply that third person subjects are avoided less than first or second person subjects. If so, then ¡ *S must dominate *S for a greater proportion of evaluations. But ¢¤£ that ranking violates the constraint subhierarchy, which in stochastic OT requires the mean ranking values of these constraints to occur in the reverse order. Thus, stochastic OT grammars are limited to subspaces of distributions that conform to the theory embodied in the constraint set. They are not general-purpose statistical analyzers and they have no special memory for frequencies (Boersma 2000). Conventionalization and Frequency

Stochastic OT grammars allow us to place the person/voice interactions in English and Lummi at points on a continuum of conventionalization that connects frequentistic preferences in usage to categorical grammatical constraints. If this general perspective is correct, then we would expect to find languages at intermediate points on this same continuum. Consider Squamish:

3 2: passive obligatory in Lummi and Squamish

3 1: passive obligatory in Lummi, optional in Squamish Aissen’s (1999) analysis:a Lummi: Squamish: ¦ ¦ ¦ ¡ £ O ¡ ,*O£ *S *O *S *O £¥¤ £¥¤

aSince languages differ in whether first or second person is dominant (DeLancey 1981), the mutual ranking of the local-person avoidance constraints is not fixed by the subhierarchy, but subject to crosslinguistic variation. However, it is not fully informative to say, as Aissen (1999) does, that passivization with third person agents and first person patients is “optional” in Squamish. In terms of what is preferred rather than what is merely possible, Squamish is described as being much the same as Lummi, “except that third person acting on first may be active, though commonly passive” (Klokeid 1969: 11). Thus in Squamish as in English, passives of the type I was fooled by her are optional alternatives to actives with disharmonic local-person objects: She fooled me. But in spoken English, such passives are exceedingly infrequent, far less common than the corresponding actives, while in Squamish they are more frequent than the corresponding actives. Why? In stochastic OT the rerankings postulated by Aissen (1999) take place on a continuous scale and imply changes in frequency as well as changes in grammaticality. The high rate of passivization with first person patients

in Squamish shows that a constraint favoring passive, such as *O ¡ is still ranked considerably above a constraint favoring active, such as *S , on £¥¤ our continuous constraint ranking scale. Squamish and Lummi are closely related Coast Salish languages. It is plausible that we are observing a change in progress: the two languages represent different points in the changing categoricity of person effects on the passive, reflected in the ranking of the person-avoidance constraints for first and second person. If a process of historical change is modeled by the movement in strength of a constraint along the continuous scale, as implied by the stochastic OT model, then (all else being equal) smooth changes in the relative frequencies of usage are predicted. Note however that although the change is smooth, it is not predicted to be linear. Rather, if a constraint reranking is crucial to the choice between two outputs, the prediction is that we should see an ‘S’ curve between the proportion of occurrences of the two outputs, of the sort that has been widely remarked on in historical and socio-linguistics (Weinreich, Labov, and Herzog 1968, Bailey 1973, Kroch 2001). Logistic response

1.0 Lummi

Squamish 0.8 0.6 0.4 Proportion of the time output is passive Proportion of the time output 0.2

English 0.0

-10 -5 0 5 10 Difference in base constraint ranking More technically, assuming that the difference in ranking of two con- straints which are crucial to the choice between two outputs A and B is changing linearly, then the proportion of output A is given by a logistic curve: when the constraints are at least 5 standard deviations apart, the proportion of the disfavored output is negligible; at 2 standard deviations, the rate reaches about 8%, but then it increases much more rapidly to 50% of each output when the two constraints are equally ranked. These considerations suggest that classical grammatical descriptions in terms of what is ‘possible’ or ‘grammatical’ are overly idealized, concealing grammatically significant statistical structure beneath the idealization of linguistic intuitions of grammaticality. Constraint Overlap and Statistical Dependencies of the Input

Splits in local person rankings (as in Squamish) are not naturally explained by the pragmatics-based interpretation of the person hierarchy, but could have origins in different habits of perspective shifting: ‘polite’ or socially distant (2nd person dominates) vs. ‘egocentric’ or socially intimate (1st person dominates). In other words, person-avoidance is controlled independently of informa- tion status. But must we conclude that person-avoidance constraints play a role even in English? Couldn’t it all be pragmatics? After all, first and second persons are seldom discourse new, while third persons may be (Cooreman 1987). Therefore, one could assume that local person subjects are not penalized by the avoidance of newer subjects, while non-local person subjects are. Then even if there were no person constraints in the grammar of English, there would be a soft effect of the

person hierarchy attributable to *S ¡ ¢ ¡ £ . Nonlocal agents are differentially favored for passivization by newness: ¤ input: v(ag/3/new, pt/3) *S ¡ ¢ ¡ £ *S £

active: S ,O *! £¥¤ ¡ ☞

passive: S ,Obl * £¥¤ ¡

input: v(ag/2, pt/3) *S ¡ ¢ ¡ £ *S £¥¤ ☞

active: S ,O £¥¤ ¡

passive: S ,Obl *! £¥¤ ¡ Similarly, nonlocal patients are differentially disfavored for passivization by newness:

input: v(ag/3, pt/3/new) *S ¡ ¢ ¡ £ *S £¥¤ ☞

active: S ,O £¥¤ ¡

passive: S ,Obl *! * £¥¤ ¡

input: v(ag/3, pt/2) *S ¡ ¢ ¡ £ *S £¥¤ ☞

active: S ,O £¥¤ ¡

passive: S ,Obl *! £¥¤ ¡ Because they designate speech act participants at the top of the topicality hierarchy, the local persons are not differentiated by newness. It follows that local–local inputs will be expressed by actives, all else being equal: ¢ Lummi: xci-t-oˇ ¢ s=s n ¡

know-TR-1/2.OBJ=1.NOM.SUBJ ‘I know you’ * ‘You are known by me’ However, pragmatics cannot be the total answer. We should not eliminate the person constraints from English grammar, or from grammar in general. Consider the following tableau: ¡ £ ¡ input: v(ag/1,2, pt/3) *Obl *S ¡ ¢ ¡ £ *S ¢ ¢¤£ ☞

active: S ,O * £¥¤ ¡

passive: S ,Obl *! * £¥¤ ¡

Here the discourse-pragmatic constraint *S ¡ ¢ ¡ £ and the person-avoidance

constraint *Obl ¡ are exactly parallel and penalize passivization. This ¢¤£ represents the most frequent situation: the local person agent is also topical or given in the discourse. The fact that the two constraints have exactly the same violation marks means that during training in the GLA, they will be treated the same (that is, demoted or promoted by the same amount). Since this input will be realized as active, both constraints will

be pulled up above *S ¡ . ¢¤£ Thus, even if the active output is in some sense driven by the discourse constraint, the person constraint will be pushed up as well. If the given input is frequent, the person constraint will end up close enough to the discourse constraint to drive the choice of active even when the local person agent is treated as non-topical. That is, because the local person agent is rarely realized as an oblique, the speaker will disprefer passive even when the local person agent is non-topical. Thus, because of the statistical dependencies of person and ‘newerness’ in the input, the person and discourse-pragmatic constraints will rise in tandem under the GLA. An emerging categoricity of person effects on voice will necessarily accompany an emerging categoricity of ‘newerness’

effects on voice (*S ¡ ¢ ¡ £ ), and vice versa. Evidence that Lummi has near-categorical ranking of the discourse- as well as the person-constraints: (i) indefinite noun phrases cannot appear as the subjects of transitive verbs (Jelinek and Demers 1994: 732). (ii) the avoidance of local person objects and passive agents does not extend to the freestanding referring expressions for local persons, which are focussed, perhaps ‘newer’ (Jelinek and Demers 1994: 714); This “sympathetic” behavior of constraints restricts the typology of possible languages. Rather than reranking constraints from different hierarchies in every possible way, the result of training with the GLA is that certain constraints will move together. This predicts that languages which realize agents as subjects may also tend to realize animates and definites as subjects even when they are not agents. Similarly, languages which realize topical elements as subjects may occasionally realize animates and local persons as subjects even when they are not topical. For example (slide 23): I was descending Highway 9 on my bike and suddenly had to slow. A young women had been hit by a motorcycle and was lying in the street beside it. It was a sickening accident. Here, it appears to be the animacy of the subject, rather than its topicality, that favors the passive. It follows that a language like English could not selectively show effects of the topicality hierarchy and simultaneously not show effects of the person hierarchy, and it is predicted that languages which show effects of the person hierarchy fall back on topicality to choose between active and passive when both inputs are third person.

Conclusion

The same categorical phenomena which are attributed to hard gram- matical constraints in some languages continue to show up as statistical preferences in other languages, motivating a grammatical model that can account for soft constraints. This observation is not new. Givon´ (1979: 26–31) already made this point forcefully over twenty years ago. In many of the world’s languages, probably in most, the subject of declarative clauses cannot be referential-indefinite . . . . Languages of this type are, for example, Swahili, Bemba, Rwanda (Bantu), Chinese, Sherpa (Sino-Tibetan), Bikol (Austronesian), Ute (Uto-Aztecan), Krio (Creole), all Creoles, and many others . . . . In a relatively small number of the world’s languages . . . referen- tial-indefinite nouns may appear as subjects of nonpresentative sentences . . . . When one investigates the text frequency of [such] sentences in English, however, one finds them at an extremely low frequency: About 10% of the subjects of main-declarative-affirmative-active sentences (nonpresentative) are indefinite, as against 90% definite. Now this is presumably not a fact about the “competence” of English speakers, but only about their actual “language behavior.” But are we dealing with two different kinds of facts in English and Krio? Hardly. What we are dealing with is apparently the very same communicative tendency—to reserve the subject position in the sentence for the topic, the old-information argument, the “continuity marker.” In some languages (Krio, etc.), this communicative tendency is expressed at the categorial level of 100%. In other languages (English, etc.) the very same communicative tendency is expressed “only” at the noncategorial level of 90%. And a tranformational-generative linguist will then be forced to count this fact as competence in Krio and performance in English. But what is the communicative difference between a rule of 90% fidelity and one of 100% fidelity? In psychological terms, next to nothing . . . . When live discourse data are taken into account . . . it becomes obvious that noncategorial phenomena are the rule rather than the exception in human language. — Givon´ (1979: 26–31) References

Parts of this talk are written up in the following, where references can be found: Bresnan, Joan, Shipra Dingare and Christopher Manning. 2001. Soft constraints mirror hard constraints: Voice and person in English and Lummi. In M. Butt and T. H. King (eds.), Proceedings of the LFG 01 Conference, University of Hong Kong. On-line, CSLI Publications: http:// csli-publications.stanford.edu/.

Dingare, Shipra. 2001. The effect of feature hierarchies on frequencies of passivization in English. Master’s thesis, Stanford University, Stan- ford, CA. On-line, Rutgers Optimality Archive: http://ruccs.rutgers.edu/ roa.html. ROA-467-0901.