
Corpus-based studies motivation Previous corpus studies PTB and annotation in PTB

Empirical Approaches to Elliptical Constructions Class 2: Corpus-based studies on ellipsis and English gapping

Gabriela Bîlbîie (University of Bucharest) and Anne Abeillé (Université Paris Diderot-Paris 7)
LSA 2017 Linguistic Institute, 10 July 2017, University of Kentucky

Class 2 – Content

1 Corpus-based studies motivation

2 Previous corpus studies Meyer (1995) Greenbaum & Nelson (1999) Harbusch & Kempen (2011)

3 PTB and ellipsis annotation

4 Gapping in PTB Missing material Remnants

1 Corpus-based studies motivation

Ellipsis – a challenge for grammar

Ellipsis: a form/meaning mismatch (significatio ex nihilo)

1 Part of the material necessary for the interpretation is missing from the syntactic structure (which is ’incomplete’).

2 The missing material is recovered from an antecedent in the context.

Descriptive problem: a mass of elliptical constructions, distinguished on the basis of several criteria, e.g. the syntactic function of the missing material (head or dependent), the syntactic context (coordination, subordination; dialogue), and the directionality of ellipsis (forward vs. backward ellipsis).
⇒ Terminology is sometimes unstable.

Theoretical problem: a plethora of competing analyses, which differ in the level at which the missing material is reconstructed: syntactic reconstruction vs. semantic reconstruction.
⇒ Unsolved theoretical problems.

Gathering data on ellipsis

In the literature on ellipsis, the large majority of examples are constructed data, based on introspective acceptability judgments.
This very often leads to significant variation in acceptability judgments across speakers, and sometimes even to contradictory data.
The reliability of introspective acceptability judgments has recently been called into question (Sprouse et al. 2010, Sprouse & Almeida 2012, Gibson & Fedorenko 2013), cf. weak methodological standards in linguistics (Gibson & Fedorenko 2013):
  Confirmation bias on the part of the researcher (bias in favor of the predicted result).
  Confirmation bias on the part of the participants (if they are linguists, they are biased by their own hypotheses).

Problems related to the introspective methods

Even if the acceptability judgment on a given example is correct, we do not always have clear intuitions about the source of the unacceptability.
If only a few examples (sometimes just one) of a given type are used as the basis for the judgment, it is especially unclear whether low acceptability is due to one factor rather than another. E.g. specific lexical items can play a role, as can discourse constraints.
Many subtle factors of usage influence ease of processing and consequently acceptability.
Factors influencing acceptability:
  Grammaticality
  Complexity
  Plausibility
  Lexical semantic properties of the lexical items chosen
  Frequency of lexical items and of sequences of lexical items
  Various usage preferences

Usage Preferences impact

Usage Preferences (UPs) can affect any subdomain of linguistic competence: syntax, semantics, pragmatics, morphology, etc.
Violating UPs can reduce acceptability independently of any processing difficulties.
UP violations can be cumulative, leading to strong unacceptability without any violation of linguistic constraints.

An illustration: the verbal anaphor do so

General assumptions : Do so does not allow stative antecedents (Lakoff & Ross 1976). Do so does not allow non-action event antecedents (Culicover & Jackendoff 2005).

(1) a. *Bill knew the answer, and Harry did so, too. (Lakoff & Ross 1976)
    b. *Robin dislikes Ozzie, but Leslie doesn’t do so. [Stative, Culicover & Jackendoff 2005]
    c. ?*Robin fell out the window, but Leslie didn’t do so. [Non-action event, Culicover & Jackendoff 2005]


Attested examples of do so with stative antecedents :

(2) a. The basic idea is that whenever the relation of complementary distribution holds between phones belonging to a common phoneme, it does so because the phonetic value of the phoneme depends upon the phonetic environment in which it occurs. [Stative, in Fodor, Bever and Garrett, The Psychology of Language, cited by Michiels 1978]
    b. [Lanchester brings] his singular narrative ease to a historical story that sniffs of a quiet, personalized epic, but does so beautifully, eschewing the dripping drama so often wrongly associated with books that trace more than a few decades. [Stative, NYT, cited by Houser 2010]

Paradox: Why do constructed examples with do so and a stative antecedent seem to be ungrammatical when such examples are attested in spontaneous language use?


Usage preferences for finite do so, as evidenced by corpus investigation:
UP1 Finite do so very strongly prefers to occur with non-stative antecedents (98% of cases according to Houser 2010).
UP2 Finite do so very strongly prefers to refer to the same state of affairs as its antecedent, and hence to have the same subject as its antecedent (98% of cases according to Miller 2011).
UP3 Finite do so prefers to occur with a non-contrastive adjunct (83% of cases according to Miller 2011).
It is easy to find examples violating one UP in corpora. It is very difficult to find examples violating two or three UPs.
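A back-of-the-envelope sketch of why multiple violations are so hard to find. It uses the three corpus rates quoted above and assumes, purely for illustration, that the three preferences are violated independently of one another; that independence assumption is not a claim made by Houser (2010) or Miller (2011), and the function name is hypothetical.

```python
# Rates at which finite "do so" RESPECTS each usage preference
# (figures quoted above from Houser 2010 and Miller 2011).
respect_rate = {"UP1": 0.98, "UP2": 0.98, "UP3": 0.83}

# ASSUMPTION (not from the source): violations are statistically independent.
violation_rate = {up: 1 - rate for up, rate in respect_rate.items()}


def joint_violation(ups):
    """Expected share of do-so occurrences violating every UP in `ups`."""
    share = 1.0
    for up in ups:
        share *= violation_rate[up]
    return share


for ups in (["UP1"], ["UP3"], ["UP1", "UP2"], ["UP1", "UP2", "UP3"]):
    per_million = joint_violation(ups) * 1_000_000
    print("+".join(ups), f"about {per_million:.0f} per million occurrences")
```

Under this simplifying assumption, each additional violated preference multiplies the expected frequency by a small factor, which is one way to picture the cumulative effect of UP violations.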


Acceptable examples respect two of the three UPs :

(3) a. The basic idea is that whenever the relation of complementary distribution holds between phones belonging to a common phoneme, it does so because the phonetic value of the phoneme depends upon the phonetic environment in which it occurs. [UP1–, UP2+, UP3+]
    b. [Lanchester brings] his singular narrative ease to a historical story that sniffs of a quiet, personalized epic, but does so beautifully, eschewing the dripping drama so often wrongly associated with books that trace more than a few decades. [UP1–, UP2+, UP3+]
    c. Thus, players were more likely to behave positively if the team’s spectators and coaches did so as well. (COCA, Acad) [UP1+, UP2–, UP3+]


Unacceptable examples violate the three UPs :

(4) a. ↓Bill knew the answer, and Harry did so, too.
    b. ↓Robin dislikes Ozzie, but Leslie doesn’t do so.
    c. ↓Robin fell out the window, but Leslie didn’t do so.

These examples are unacceptable because they violate all three UPs, yet they are grammatical. The down-arrow symbol (↓) indicates unacceptability due to usage-preference violations.
⇒ Gradience in acceptability and grammaticality.


Methodological importance of UPs: Lakoff & Ross (1976) model their do so examples on typical VPE examples. Specifically, out of 33 example sentences with do so, 27 have contrasting subjects. Among these are all of the sentences that they use to argue that do so cannot have stative antecedents.

(5) a. Mary likes apples and Jane does too.
    b. *Bill knew the answer, and Harry did so, too. (Lakoff & Ross 1976)
    c. Bill knew the answer. He did so because he had read an article on the subject in the paper the day before.

Since Lakoff & Ross (1976), this unnatural pattern of usage has made its way into many articles and textbooks, in arguments for VP constituency and for the complement/adjunct distinction:
  in textbooks, e.g. Radford (1988), Haegeman (1991), Haegeman & Guéron (1999);
  in articles, e.g. Sobin (2008), where 26 out of 32 examples of do so have contrasting subjects.

The moral of the story

A better understanding of usage is crucial to the interpretation of acceptability judgments. Corpus research is crucial to understanding usage preferences. Working on a small number of invented examples can lead to serious misinterpretations as to the actual source of acceptability differences.

The importance of corpus for ellipsis studies

Data issues:
  Using empirically attested data avoids the problems related to introspective data and to the variability of acceptability judgments across speakers.
  Need for the contextual dimension (cf. the definition of ellipsis above): investigation of the contextual constraints that apply to the various elliptical constructions; observation of preferences between structures with and without ellipsis.
Quantitative issues: frequency measures for several factors, e.g. which constructions are the most frequent, which constraints are strict or less strict.
Crucially, corpora give a safer ground on which to assess the facts and evaluate competing analyses: they allow one to choose the best-suited analysis based on data.

Corpus investigation is not optional, but a must for ellipsis!

2 Previous corpus studies

Few corpus studies on ellipsis

English
  American English (Meyer 1995): Brown Corpus (80,000 words; edited written English); International Corpus of English (16,000 words; spoken English)
  British English (Greenbaum & Nelson 1999): a selection of 82 spoken and written texts (176,968 words) drawn from the British component of ICE (ICE-GB)

German and Dutch (Harbusch 2011)
  German: TIGER (50,474 sentences of written newspaper text); VERBMOBIL (38,328 spoken sentences)
  Dutch: ALPINO (7,153 sentences of written newspaper text); CGN2.0 (130,594 spoken sentences)


English
  American English: Meyer, Ch. 1995. Coordination Ellipsis in Spoken and Written American English. Language Sciences 17(3). 241-269.
  British English: Greenbaum, S. & Nelson, G. 1999. Elliptical clauses in spoken and written English. In P. Collins & D. Lee (eds.), The clause in English. Amsterdam: John Benjamins.

German and Dutch
  Many papers by Karin Harbusch and Gerard Kempen; for an overview, see:
  Harbusch, K. 2011. Incremental sentence production and clausal coordinate ellipsis: A treebank study comparing spoken and written language in Dutch and German. Dialogue and Discourse 5. 313-332.

Ellipsis in a typological perspective

Cf. Sanders (1977): six types of elliptical constructions across languages, depending on the position of the missing material in the clause:
  Three catalipsis types (= backward ellipsis): the missing material is in the first clause.
  Three analipsis types (= forward ellipsis): the missing material is in the last clause.

(A)BC & DEF    A-ellipsis    initial catalipsis
A(B)C & DEF    B-ellipsis    median catalipsis
AB(C) & DEF    C-ellipsis    final catalipsis
ABC & (D)EF    D-ellipsis    initial analipsis
ABC & D(E)F    E-ellipsis    median analipsis
ABC & DE(F)    F-ellipsis    final analipsis

Only coordination is taken into account (as is generally the case in the literature on ellipsis).


Accessibility hierarchy, cf. Sanders (1977): if an elliptical type T is accessible in a language, then any type to its right is accessible too.

A > B > F > {C, E} > D

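Availability of elliptical coordinations cross-linguistically (positions in the ABC – DEF schema), from the fewest to the most available types:

Chinese
English, Japanese
Quechua
Russian
Hindi, Zapotec
Tojolabal

[Marking of the available positions in ABC – DEF for each language: not recoverable.]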

⇒ At the two extremes of this spectrum, there are languages with very few types of ellipsis (e.g. Chinese) vs. languages where all elliptical types are available (e.g. Tojolabal).
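The implicational reading of the hierarchy can be sketched as a small check. The ranking below is an assumption: it encodes one reading, A > B > F > {C, E} > D, in which "to the right" means "more accessible" and C and E share a rank; that reading is at least consistent with English permitting C-, D- and E-ellipsis. The function name and the non-English inventories are hypothetical test cases, not data from Sanders (1977).

```python
# ASSUMED ranking (higher number = more accessible), encoding the reading
# A > B > F > {C, E} > D of the accessibility hierarchy.
RANK = {"A": 0, "B": 1, "F": 2, "C": 3, "E": 3, "D": 4}


def respects_hierarchy(available):
    """True if every type ranked to the right of an available type is available too."""
    if not available:
        return True
    lowest = min(RANK[t] for t in available)
    return all(t in available for t, rank in RANK.items() if rank > lowest)


# English, as described in the slides: C-, D- and E-ellipsis.
print(respects_hierarchy({"C", "D", "E"}))
# Hypothetical inventories, for illustration only.
print(respects_hierarchy(set("ABCDEF")))  # every type available
print(respects_hierarchy({"A", "D"}))     # A without B, F, C, E
```

An inventory passes the check only if it is closed "to the right", which is what the implicational statement requires; types that share a rank are not required to co-occur.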

Ellipsis types retained by Meyer (1995)

According to Sanders (1977), English permits coordination ellipsis in three positions:
  C-ellipsis (the end of the first conjunct ∼ Right Node Raising), cf. (6)
  D-ellipsis (the beginning of the second conjunct ∼ Left Peripheral Ellipsis), cf. (7)
  E-ellipsis (the middle of the second conjunct ∼ Gapping), cf. (8)

(6) [...] the less developed countries must be persuaded to take the necessary steps to allocate Ø and commit their own resources.

(7) The Australian stopped trying to talk a pidgin I could understand, and Ø spoke strange words from deep in his chest.

(8) The top of the sample was nearly flat and the bottom Ø hemispherical.

⇒ 415 instances of clausal coordination ellipsis (CCE) in the corpora

The frequency of CCE in the corpora

Ellipsis type    Nb. occurrences    Percentage
RNR              9                  2%
Gapping          23                 5.5%
LPE              354                86%
Mixed            29                 7%
Total            415

LPE is the most favored type of ellipsis in English (86%).
Gapping and RNR are less favored (5.5% and 2%, respectively).
  [...] given the attention in linguistic theory devoted to the discussion of E-Ellipsis [Gapping], it is quite surprising to see how truly unproductive a process it is. (Meyer 1995: 258)
  C-Ellipsis [RNR] is a relatively rare form of ellipsis [...] (Meyer 1995: 266)
⇒ How can these frequencies be explained? Psycholinguistic constraints (Suspense Effect, Serial Position Effect), Communicative Dynamism, Coordinator Type, Repetition Effect, etc.

Possible explanations for these frequencies

Sanders’ (1977) psycholinguistic constraints:
  The Suspense Effect: ellipsis will be more desirable if the antecedent of ellipsis is known prior to ellipsis. E.g. D-, E-, and F-ellipsis are ’perceptually’ easier than the others.
  The Serial Position Effect: a particular ellipsis type will be relatively desirable or undesirable depending on how prominent its antecedent position is. E.g. the A position is highly prominent, thus D-ellipsis is very desirable. Then follow C- and E-ellipsis.
Communicative Dynamism (’a linear presentation from low to high information value’):
  Positions with the lowest information value (= old information) ⇒ high potential for ellipsis, e.g. D-ellipsis.
  Positions with high information value (= new information) ⇒ low potential for ellipsis, e.g. C- and E-ellipsis.
Semantic factors such as the type of coordinator:
  ’Congruent’ relationship markers (and, or) facilitate ellipsis.
  ’Incongruent’ relationship markers (but) do not facilitate ellipsis.
Etc.

Ellipsis in speech and writing

           Ellipsis     Full form    Total
Speech     25 (40%)     38 (60%)     63 (100%)
Writing    390 (73%)    142 (27%)    532 (100%)

Discrepancy between speech and writing, as regards ellipsis and its full counterpart: in written clausal coordinations, the proportion of ellipsis was about twice as high as in spoken coordinations.
  In speech, an overwhelming preference for the full unelliptical form.
  In writing, an overwhelming preference for the elliptical form.
Meyer’s explanation in terms of audience design:
  Speech is transitory; repetition can enhance cohesion.
  Writing is permanent; there is no need for repetition.
⇒ Repetition is much more important in speech than in writing.
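The "about twice as high" comparison can be recomputed directly from the counts in the table above. This is only an illustrative sketch; the variable and function names are hypothetical, and the counts are the ones reported from Meyer (1995).

```python
# (elliptical, full form) counts per modality, copied from the table above.
counts = {
    "Speech": (25, 38),
    "Writing": (390, 142),
}


def ellipsis_rate(elliptical, full):
    """Share of clausal coordinations realised with ellipsis."""
    return elliptical / (elliptical + full)


rates = {mode: ellipsis_rate(*pair) for mode, pair in counts.items()}
ratio = rates["Writing"] / rates["Speech"]

for mode, rate in rates.items():
    print(f"{mode}: {rate:.0%} elliptical")
print(f"writing/speech ratio: {ratio:.2f}")
```

The rounded rates reproduce the percentages in the table, and the ratio between them is somewhat below 2.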

CCE across genres

Genre           RNR        Gapping    LPE           Total
Press           0          2 (3%)     60 (97%)      62 (100%)
Learned         2 (5%)     5 (14%)    30 (81%)      37 (100%)
Belles L.       1 (2%)     5 (8%)     59 (91%)      65 (100%)
Gov. Doc.       6 (12%)    7 (14%)    37 (74%)      50 (100%)
Fiction         0          3 (3%)     117 (98%)     120 (100%)
Conversation    0          0          23 (100%)     23 (100%)

Gapping is overall an infrequent type of ellipsis; it is more frequent in learned prose and fiction than in conversation, where it is non-existent.
RNR is relatively uncommon; it occurs most frequently in government documents.
Correlation between the use of ellipsis and clausal complexity in a genre:
  learned prose and government documents: complex and long sentences ⇒ low rate of ellipsis
  press reportage and fiction: less complex sentences ⇒ higher rate of ellipsis

Greenbaum & Nelson (1999)

Focus on elliptical clauses ; exclusion of ellipsis in nominal (9-a)–(9-b) or verbal (10-a)–(10-b) phrases.

(9) a. Archaeological and philological evidence in fact confirms that...
    b. They are also enormously proud of the skill and courage of their armed forces.

(10) a. ... it has changed and homogenised societies by a process of literacy and literary coercion.
     b. The greatest enemy of the Arab world is not Ø and never has been the United States.

In particular, focus on LPE with ’elliptical’ subjects :

(11) a. And uh so we unpacked our stuff and Ø trooped in.
     b. ... you can queue up and Ø get a tour round the White House.

Distinction between independent ellipsis (IE) and coordination ellipsis (CE).

(12) That little plant that grows Ø doesn’t matter what the soil conditions are whether it’s very acidic or chalk...

Discrepancy between speech and writing with respect to the IE vs. CE distinction: CE occurs more frequently in writing (cf. Meyer 1995), while IE is characteristic of speech.

           IE     CE     Total
Speech     356    278    634
Writing    39     310    349
Total      395    588    983

Ellipsis of clause initial elements – the subject or the subject plus the auxiliary – is favoured throughout the corpus.

German and Dutch corpus studies

Many papers by Karin Harbusch and Gerard Kempen, one of the most synthetic ones being Harbusch (2011).

German TIGER : 50 474 sentences of written newspaper text VERBMOBIL : 38 328 spoken sentences

Dutch ALPINO : 7 153 sentences of written newspaper text CGN2.0 : 130 594 spoken sentences (dialogue turns) from more than ten different domains

Four CCE types: RNR, Gapping (including ), LPE, and Subject Gap in clauses with Finite/Fronted verbs (SGF)

CCE in the German and Dutch treebanks

           Spoken language          Written language
           VERBMOBIL    CGN2.0      TIGER       ALPINO
           (German)     (Dutch)     (German)    (Dutch)
Nb. CC     3,713        8,653       7,194       931
Nb. CCE    1,314        924         4,020       319
% CC       10%          6%          14%         13%
% CCE      35%          11%         56%         34%

Confirmation of the English results: for both languages, the incidence of CCE is about two (or even three) times higher in written than in spoken language.
How can this frequency be explained?
  Speaker (but not listener) perspective, related to stronger constraints in speech than in writing: stronger time pressure, tighter processing constraints, higher working memory load, more severe workspace limitations.
  Incremental sentence production.
  Speakers have a stronger tendency than writers to plan the grammatical shape of each clause in isolation, i.e. without taking the shape of coordinated clauses into account, thus overlooking many elliptical options.

Relative frequencies of the four CCE types

            Spoken language          Written language
CCE type    VERBMOBIL    CGN2.0      TIGER       ALPINO
            (German)     (Dutch)     (German)    (Dutch)
RNR         1%           3%          10%         5%
Gapping     33%          31%         17%         10%
LPE         55%          61%         63%         82%
SGF         11%          5%          10%         3%

Well-represented types: LPE and Gapping (together 80%-92%); their distribution differs considerably between modalities:
  LPE: higher frequency in written language
  Gapping: considerably more frequent in spoken language (32% vs. 13%)
RNR: marginal; higher frequency in written language

Overview of existing corpus studies

In written clausal coordinations, the proportion of CCE versions is about twice as high as in spoken coordinations.
  Criticism: The ellipsis type with the highest frequency in all these corpora is LPE with ’elliptical’ subjects. Yet there are alternative analyses proposing a non-elliptical treatment of this kind of construction (i.e. coordination of verb phrases instead of clausal coordination with subject ellipsis). If one removes these LPE occurrences, do the percentages remain the same?
RNR is considerably less represented than Gapping, especially in spoken language.
  Criticism: RNR has a wider distribution than Gapping (not only clauses, but also NP, VP, etc.; not only coordination, but also parataxis, subordination, etc.). By restricting RNR occurrences to the clausal coordination domain, one eliminates a significant mass of data.

Why an additional corpus study?

Limits of the previous corpus studies : very biased results

1 The choice of the ’elliptical’ constructions under investigation: constructions unanimously recognized as elliptical (e.g. Gapping and RNR) are treated on a par with constructions that lend themselves to a non-elliptical account (e.g. LPE with ’elliptical’ subjects). All these studies show that the latter (controversial LPE) type generally has the highest frequency, which leads to a wrong quantitative interpretation of the ellipsis phenomenon in general.

2 The syntactic domain under investigation : only interclausal ellipses are taken into account. Some types of ellipsis are most frequent at the sub-clausal level, e.g. RNR.

A less theory-dependent look at corpus data

When necessary, we take into account not only the clausal, but also the subclausal level. We take into account the fine-grained distinction of various elliptical constructions (e.g. Gapping vs. Stripping vs. Argument Cluster Coordination). ⇒ Class 2 : Gapping in Penn Treebank. ⇒ Class 8 : Right-Node Raising in Penn Treebank.

3 PTB and ellipsis annotation

The Penn Treebank (PTB)

A large annotated corpus of American English (4.5 million words).

Main authors: Mitch Marcus, Ann Taylor (University of Pennsylvania).
A three-phase project, started in 1989.
Sources: 4 sub-corpora of written and spoken English
  written: Wall Street Journal (1989) and Brown Corpus (1961)
  spoken: part of Air Travel Information Services [ATIS-3] (1995) and part of the Switchboard corpus (1991)
Annotations:
  morpho-syntactic annotation (POS tags, lemma)
  constituent annotation (parsing)
  4 different types of functional tags: form/function discrepancies (-adv, -nom), grammatical role (-prd, -sbj, etc.), adverbial semantics (-bnf, -loc, -tmp, etc.), miscellaneous (-clr, -clf, etc.)
  dysfluency annotation (for the Switchboard corpus)
Use of the Stanford Tregex utility for matching patterns in trees, based on node descriptions or relationships between tree nodes.

Gapping annotation in the PTB

For Gapping structures inside a single sentence (including both clausal and sub-clausal occurrences, e.g. Argument Cluster Coordination ∼ Left Peripheral Ellipsis): a simple notational mechanism, based on structural templates.
  The full clause is used as a template.
  The remnants in the gapped clause are mapped onto that template by gap co-indexing, i.e. an equal sign on a remnant indicates that it should be mapped over its correlate in the full clause.

(13) Her name was Suzanne, and mine __ Stephen. (brwn-880)
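A minimal sketch of how the template mechanism can be read off a bracketed tree. The bracketing below is a hypothetical rendering of example (13), written only to illustrate the "-n" (correlate) and "=n" (remnant) indices; the actual PTB tree may differ in labels and detail. The helper names are hypothetical, and plain Python with `re` stands in here for a dedicated tool such as Tregex.

```python
import re

# HYPOTHETICAL bracketing for example (13); labels are illustrative only.
TREE = """
(S (S (NP-SBJ-1 Her name) (VP was (NP-PRD-2 Suzanne)))
   , and
   (S (NP-SBJ=1 mine) (NP-PRD=2 Stephen)))
"""


def labelled_spans(bracketing):
    """Yield (label, words) for every constituent of a bracketed string."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketing)
    stack = []            # open constituents as [label, list_of_words]
    expect_label = False  # True right after an opening bracket
    for tok in tokens:
        if tok == "(":
            expect_label = True
        elif tok == ")":
            label, words = stack.pop()
            if stack:     # pass the words up to the parent constituent
                stack[-1][1].extend(words)
            yield label, " ".join(words)
        elif expect_label:
            stack.append([tok, []])
            expect_label = False
        else:
            stack[-1][1].append(tok)


def map_remnants(bracketing):
    """Pair each '=n' remnant with the '-n' correlate in the full clause."""
    correlates, remnants = {}, {}
    for label, words in labelled_spans(bracketing):
        if m := re.search(r"=(\d+)$", label):
            remnants[m.group(1)] = words
        elif m := re.search(r"-(\d+)$", label):
            correlates[m.group(1)] = words
    return {remnants[i]: correlates[i] for i in remnants if i in correlates}


print(map_remnants(TREE))
```

Searching for labels that contain an equal sign retrieves every construction that uses gap co-indexing, not only Gapping proper, so the hits still need to be classified by hand.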

RNR annotation in the PTB

Right Node Raising: treated in the PTB in terms of null elements.
For discontinuous constituents, use of a Pseudo-Attach function: a method of showing that non-adjacent constituents are related.
⇒ This mechanism is used for several phenomena, such as extraction, extraposition, structural ambiguity and RNR.
Among the four types of pseudo-attach, *RNR*-attach is used for shared constituents, i.e. those cases in which a constituent should be interpreted simultaneously in more than one place.
RNR co-indexing: an index number added to the label of the original constituent is incorporated into the ’null element’.
Regular RNR: two (or more) *RNR*-tags co-indexed with the constituent with which the null is associated.

Sample with regular RNR in PTB

(14) Tonight a group of men *RNR*-1, tomorrow night he himself *RNR*-1, would go out there somewhere and wait. (brwn-12426)

Sample with RNR and Gapping in PTB

(15) Philip Glass is the emperor *RNR*-1, and his music __ the new clothes *RNR*-1, of the avant-garde. (wsj-26487)
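As a rough illustration of RNR co-indexing, the sketch below collects the *RNR* null elements in the flat renderings of (14) and (15) and groups them by index. It works on the example strings exactly as printed here, not on the treebank files themselves, where the nulls sit inside the tree; the function name and the "regular RNR" check are hypothetical.

```python
import re
from collections import defaultdict

# Flat renderings of examples (14) and (15), copied from the slides.
EXAMPLES = {
    "brwn-12426": "Tonight a group of men *RNR*-1, tomorrow night he himself "
                  "*RNR*-1, would go out there somewhere and wait.",
    "wsj-26487": "Philip Glass is the emperor *RNR*-1, and his music __ the "
                 "new clothes *RNR*-1, of the avant-garde.",
}


def rnr_sites(text):
    """Group the character offsets of *RNR* null elements by co-index."""
    sites = defaultdict(list)
    for match in re.finditer(r"\*RNR\*-(\d+)", text):
        sites[int(match.group(1))].append(match.start())
    return dict(sites)


for sentence_id, text in EXAMPLES.items():
    for index, offsets in rnr_sites(text).items():
        # Regular RNR is described above as two (or more) co-indexed nulls.
        kind = "regular RNR" if len(offsets) >= 2 else "check annotation"
        print(sentence_id, f"index {index}: {len(offsets)} nulls, {kind}")
```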

Other ellipsis annotations in PTB

Structures outside a single sentence : a frag tag, used for other ellipses too (e.g. fragments in the dialogue).

VPE, comparative ellipsis and other related types: a *?* placeholder for a missing predicate or piece thereof (no co-indexing).

Gap co-indexing other than Gapping

When you’re doing corpus studies, you have to be careful about annotations! E.g. in the PTB, the gap co-indexing used to annotate Gapping constructions is also used for other constructions.
Gapping (’median’ ellipsis at S-level, in general), cf. (16), vs. ACC (clusters under the syntactic scope of a predicate, in general a verb), cf. (17).

(16) In one hand she clutched a hundred dollar bill and in the other a straw suitcase. (brwn-512)

(17) Shelley sent a copy to Southey, a former friend, and another to Godwin. (brwn-11012)


Argument Cluster Coordination (ACC) :

(18) a. When something unexpected happened, one always asked for water if one were a woman, brandy if one were a man. (brwn-11851)
     b. That’s why we went with IBM for data center management ... and now Digital, for voice and data telecommunications. (wsj-8536)
     c. The rating agency also reduced the ratings for long-term deposits to B-3 from Ba-3 and for preferred stock to Ca from Caa. (wsj-16540)


Stripping :

(19) a. Where I go, she goes – and the kids with us. (brwn-7599)
     b. Winter came, and with it Mary’s baby – a boy as she had wished. (brwn-11007)
     c. Tolstoy’s characters eat, Pushkin’s, Gogol’s. (wsj-29848)
     d. ... I was speaking English *NOT* *NOT* at the time, and quite loud so I could be understood. (wsj-22793)

Partitive verbless appositions (PVA) :

(20) a. Wherever you looked, you saw Committemen running across the meadows, some away from the road, some toward it, some parallel to it. (brwn-12724)
     b. I have two sisters in Texas now, um, one in Austin, one in Dallas. (swbd-53420)

Gap co-indexing in the PTB

Because of many annotation errors, the corpus data had to be processed manually. Extraction of 554 occurrences matching the pattern /=/.

             WSJ    Brwn    Swbd    ATIS    Total
Gapping      97     85      10      0       192
ACC          183    61      38      1       283
Stripping    25     21      18      0       64
PVA          4      9       2       0       15
Total        309    176     68      1       554

Overall, we note the same preference observed in previous corpus studies, wrt the writing/speech distinction : ellipsis occurs more often in writing than in speech.
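Since the classification was done by hand, a quick consistency check of the table is cheap to run. The sketch below recomputes the row and column totals from the cell counts above and derives the share of occurrences found in the written sub-corpora (WSJ and Brwn); all names are hypothetical.

```python
SUBCORPORA = ["WSJ", "Brwn", "Swbd", "ATIS"]

# Cell counts copied from the table above (one list per construction type).
COUNTS = {
    "Gapping":   [97, 85, 10, 0],
    "ACC":       [183, 61, 38, 1],
    "Stripping": [25, 21, 18, 0],
    "PVA":       [4, 9, 2, 0],
}

row_totals = {kind: sum(cells) for kind, cells in COUNTS.items()}
col_totals = dict(zip(SUBCORPORA, map(sum, zip(*COUNTS.values()))))
grand_total = sum(row_totals.values())

# Written sub-corpora: Wall Street Journal and Brown.
written = col_totals["WSJ"] + col_totals["Brwn"]

print(row_totals)
print(col_totals)
print(f"written share of all occurrences: {written / grand_total:.1%}")
```

These are raw counts; they are not normalised by the size of each sub-corpus, so the written share only restates the tendency and does not measure a rate per word or per sentence.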

4 Gapping in PTB

Variables used for data classification

Genre (WSJ, Brwn, Swbd)
Clausal type (root, embedded)
Conjunction (parataxis, ’and’, ’or’, ’but’...)
Category of the source and of the target clauses
Missing material
Number of pairs (2/2, 3/3, + different asymmetries)
  Pair 1: category, function, contrast
  Pair 2: category, function, contrast
  Pair 3 (if any): category, function, contrast
Multiple Gapping (more than one target clause)
Varia (any type of asymmetry: agreement, category, embedding, locality, discursive relation, etc.)

Gapping and clause types

Most gapping occurrences are root clauses (64%). However, embedded occurrences are by no means insignificant (36%).

         WSJ    Brwn    Swbd    Total
root     58     60      5       123
embed    39     25      5       69

(21) To my knowledge, Lincoln remains the only Head of State and Commander-in-Chief who (...) proved man enough to say this publicly – to give his foe the benefit of the fact that in all human truth there is some error, and in all our error, some truth. (brwn-16875)


Most gapping occurrences are finite clauses, but there are some cases of gapping in non-finite clauses (contrary to the classical definition of gapping!):

(22) Each mode is believed to have a specific attribute – one inducing pleasure, another generosity, another love, and so on, to include all of the emotions. (brwn-14809)

(23) But there are also the commercial propagandists and the analysts – one dominated by money, the other by nineteenth-century German scholarship. (brwn-22703)

Most gapping occurrences are declarative clauses, but there are some cases with gapping operating in non-declarative clauses :

(24) a. The arrangement with Argiento was working well, except that sometimes Michelangelo could not figure who was master and who apprentice. (brwn-12642) b. So why did advertising pages plunge by almost 10% and ad revenue by 7.2% in the first half ? (wsj-26366)

(25) Let the open enemy to it be regarded as a Pandora with her box opened ; and the disguised one, as the serpent creeping with his deadly wiles into Paradise. (brwn-17725) [ACC + Gapping, as many examples in the PTB]

75% of gapping cases occur in coordination and 25% in parataxis.

          WSJ  Brwn  Swbd  Nb.occurr.
and        84    46    10         140
or          1     2     0           3
but         0     1     0           1
paratax    12    36     0          48

⇒ In accordance with the contrast coordination/subordination from Sag (1976) :

(26) a. Sandy played the guitar, and Betsy the recorder. b. *Sandy played the guitar {before / after / although / while} Betsy the recorder.
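
The 75% / 25% split between coordination and parataxis follows from the conjunction table above. The sketch below recomputes it, assuming that ’and’, ’or’ and ’but’ together make up the coordination cases; the variable names are illustrative.

```python
# Occurrences per conjunction, as (WSJ, Brwn, Swbd), copied from the table.
rows = {
    "and":     (84, 46, 10),
    "or":      (1, 2, 0),
    "but":     (0, 1, 0),
    "paratax": (12, 36, 0),
}

row_totals = {conj: sum(per_genre) for conj, per_genre in rows.items()}

# Assumption: every overt conjunction counts as coordination.
coordination = sum(n for conj, n in row_totals.items() if conj != "paratax")
parataxis = row_totals["paratax"]
total = coordination + parataxis

print(coordination, parataxis, total)  # 144 48 192
print(100 * coordination / total)      # 75.0
print(100 * parataxis / total)         # 25.0
```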

Gapping is supposed to occur in comparative contexts too, cf. (27-a)–(27-b). But no such example found in PTB. Why ? See Hoeksema (2006) : general preference for coordination structures in gapping vs. general preference for comparison-like structures in . ⇒ Usage preferences.

(27) a. Robin speaks French better than Leslie German. (Culicover & Jackendoff 2005) b. Bill ate more peaches than Harry grapes. (Jackendoff 1971)

Gapping and discourse relations

Levin & Prince (1986), Kehler (2002) : Gapping is appropriate only when a symmetrical discourse relation holds between the two clauses.
Symmetrical discourse relations : independent events, whose conjuncts can be permuted without affecting the interpretation of the whole. E.g. Resemblance (and similarly), Contrast (but), Parallelism (or).
Asymmetrical discourse relations : Succession/Contiguity (and then), Result (and therefore), Condition (or else).

(28) a. Sue became upset and (similarly) Nan angry. b. Gephardt supported Gore, but Armey Bush. c. Bill will visit Sue or Sue Bill.

(29) a. ?Sue became upset and then Nan downright angry. b. #Sue became upset and therefore Nan downright angry. c. #. . . Now either you will go to New York or Bill to Boston !

In the PTB, most gapping examples involve a parallelism or contrast relation (symmetrical discourse relations), except for some cases with a succession/contiguity (and then) relation or a condition relation.

(30) a. The cork should be pulled gradually and smoothly, and the lip of the bottle wiped afterward. (brwn-22792) b. One cell was teased out, and its DNA extracted. (wsj-49927)

(31) In affidavits, each plaintiff claims Mr. Peterson promised the bank purchase would be completed by the end of 1988 or the money returned. (wsj-1915)

Multiple gapping

27 occurrences (14%) with more than one target clause :

(32) These trumps were more touching than they were anything else, and seemed to imply that the nights were long, her children ungrateful, and her marriage bewilderingly threadbare. (brwn-12298)

(33) But we still hear him moaning at night because the Navy has a few ships left, and to satisfy him the Navy’s sea lift forces were given to a new Air Force bureaucracy in Illinois, its space operations to another command in Colorado, the frogmen to a new Army bureaucracy in Fort Bragg, and the Navy’s Indian Ocean and Persian Gulf forces to an Army bureaucracy in Florida. (wsj-25887)

Constraints on the missing material

At least the main (head) verb is missing, except cases with subject inversion (new data !) :

(34) a. Only recently has it been attractively redesigned and its editorial product improved. (wsj-18562) b. So why did advertising pages plunge by almost 10% and ad revenue by 7.2% in the first half ? (wsj-26366)

Case 1 : missing material contains only verbal forms (77%).

1 verb     102    2 verbs        32    3 verbs                14
lex vb      36    modal + vb     18    modal + BE + part      13
aux BE      64    BE + part      10    modal + HAVE + part     1
modal        1    HAVE + part     4
TO           1

Case 2 : more than the verb can be missing (subjects, complements, adjuncts) ; 14%.

subj NP + vb                     10
vb + comp/adj NP                  6
vb + comp/adj PP                  6
vb + pred Adj                     2
vb + comp NP + adj PP             1
vb + adj PP + adj AdvP + vb       1

(35) In Dunston the rent would run close to two hundred a month ; in Medfield, perhaps twenty-five less, not all of it paid by Thayer, who could charge off one room on his expense account. (brwn-18216)

Case 3 : complex embedding (18 occurrences).

embedded infinitives                    8
PP/NP domain                            5
relative adjuncts                       4
subordinates with no complementizer     1

(36) a. The Perch and Dolphin fields are expected to start producing early next year, and the Seahorse and Tarwhine fields later next year. (wsj-12345) b. The second ship is scheduled to be delivered in fall 1990 and the third in fall 1991. (wsj-47074)

(37) a. ... Mr. Rogers said that of the 14 computer-related firms he follows, half will report for their most recent quarter earnings below last year’s results, and half above those results. (wsj-2134) b. Builders get away with using sand and financiers junk when society decides it’s okay, necessary even, to look the other way. (wsj-39369)

(38) a. In the past, it has been the husband who has been dominant and the wife passive. (brwn-21990) b. Inflation is expected to be highest in Greece, where it is projected at 14.25%, and Portugal, at 13%. (wsj-40564) c. Forest-product issues showing strength included Champion International, which went up 1 3 to 31 7 ; Weyerhaeuser, up 3/4 to 27 1/4 ; Louisiana-Pacific, up 1 1/8 to 40 3/8, and Boise Cascade, up 5/8 to 42. (wsj-14299)

(39) Some claim they suffered losses because they sold while he was buying and others because they bought while he was selling. (wsj-7562)

Examples such as (38) are unexpected and ungrammatical according to many theoretical studies (which take this unacceptability as strong evidence for a structural approach, see e.g. Merchant 2001).
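
As a consistency check on the three cases of missing material (verbal forms only, verb plus other material, complex embedding), the sub-counts can be summed and compared with the reported percentages. The sketch assumes that the three cases partition the same 192 occurrences as the clause-type table (123 root + 69 embedded); that assumption is ours, and the variable names are illustrative.

```python
# Case 1: only verbal forms are missing, grouped by number of verbs.
case1 = {
    "1 verb":  {"lex vb": 36, "aux BE": 64, "modal": 1, "TO": 1},
    "2 verbs": {"modal + vb": 18, "BE + part": 10, "HAVE + part": 4},
    "3 verbs": {"modal + BE + part": 13, "modal + HAVE + part": 1},
}
# Case 2: more than the verb is missing (six subtypes, in table order).
case2 = [10, 6, 6, 2, 1, 1]
# Case 3: complex embedding (four subtypes, in table order).
case3 = [8, 5, 4, 1]

case1_by_verbs = {k: sum(v.values()) for k, v in case1.items()}
case_totals = {
    "case 1": sum(case1_by_verbs.values()),
    "case 2": sum(case2),
    "case 3": sum(case3),
}
total = sum(case_totals.values())  # assumed to cover all occurrences
shares = {k: round(100 * n / total) for k, n in case_totals.items()}

print(case1_by_verbs)  # {'1 verb': 102, '2 verbs': 32, '3 verbs': 14}
print(case_totals)     # {'case 1': 148, 'case 2': 26, 'case 3': 18}
print(total)           # 192
print(shares)          # {'case 1': 77, 'case 2': 14, 'case 3': 9}
```

Under this assumption the figures agree with the 77% and 14% reported for Cases 1 and 2, and the 18 occurrences of Case 3 correspond to roughly 9%.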

Gapping and negation

Ross (1967) : negation cannot be elided. Sag (1976) : the omission of negation is blocked with the conjunction and, cf. (40-a).

(40) a. *I didn’t eat fish and Bill ice-cream. b. I didn’t eat fish nor Bill ice-cream.

But what about PTB attested examples such as (41), where negation is perfectly ok with wide scope ? (See also Repp 2009 for more details.)

(41) And, therefore, being in disgrace, they would not be cremated and their ashes flung to the winds in public ceremony. (brwn-5691)

Verb identity in Gapping

The source verb and the missing verb may have different syntactic features, dependents...
Agreement mismatch (person, number) : 14 occurrences
Different status of the auxiliary to be : 5 occurrences

(42) a. The plant was evacuated and workers sent home. (wsj-40142) b. The tables were all spinning, the dice rattling, the bar crowded. (brwn-1924)

But tense identity (43) and lexeme identity (44) are required :

(43) *Paul went yesterday to the movies and Mary tomorrow to the pool.

(44) #Paul shot a pheasant and his son the breeze.

Missing elements can be discontinuous

The redundant material in the target is sometimes omitted along with the gapped material (many cases of nominal ellipsis ; PP ellipsis) :

(45) a. Meat consumption is at 1979’s level, pork production at 1973’s, milk output at 1960’s. (wsj-33185) b. Time is a queer thing and memory a queerer ; the tricks that time plays with memory and memory with time are queerest of all. (brwn-16058)

(46) a. Vegetables are abundant and full of flavor in Poland, the pickles and sauerkraut sublime, the state monopolies long broken. (wsj-33188) b. While American PC sales have averaged roughly 25% annual growth since 1984 and West European sales a whopping 40%, Japanese sales were flat for most of that time. (wsj-20459)

Number of remnants

(At least) two remnants in the target clause, matching two correlates in the source clause. ⇒ (At least) two contrastive pairs.
Jackendoff (1971), Kuno (1976), etc. : no more than two pairs in English (47-a).
Johnson (2014) : Gapping with 3 remnants improves if it is the answer to a multiple wh-question (47-b).

(47) a. *Arizona elected Goldwater Senator, and Massachusetts McCormack Congressman. (Jackendoff 1971) b. A : – Who will send who what ? B : – Sally will send Ron pickles, and Martha Hermione kumquats. (Johnson 2014)

In the PTB, there are cases with 3 contrastive pairs, but in all of them there is no category identity. Hypothesis : Degraded acceptability is not linked to the number of remnants, but rather to their syntactic and semantic identity (e.g. 3 NP remnants with the same semantic type – animate individuals – would be weird).

(48) a. One moment there was a man in the saddle ; the next a headless horror on a horse that bolted through the redcoat ranks, and during the next second or two, we all of us fired into the suddenly disorganized column of soldiers. (brwn-12708) b. The Hart-Scott filing is then reviewed and any antitrust concerns usually met. (wsj-36691)

This could also apply to classical 2 remnants. Difficult to interpret cases such as (49) ; appeal to prosody cues.

(49) NAS is National Advanced Systems, CDC – Control Data Corp. (wsj-40986)

Most frequent patterns

Syntactic category and function in the pairs :

Pair 1 :
subject_NP    176 occ. (92%)

Pair 2, with 5 frequent patterns :
comp_PP       45
comp_NP       40
comp_VP       31
pred_AdjP     30
adjunct_PP    19

Pair 1 – pair 2 :
subject_NP comp_XP       152
subject_NP adjunct_XP     25
subject_NP comp_PP        45
subject_NP comp_NP        34
subject_NP comp_VP        31
subject_NP pred_AdjP      28
subject_NP adjunct_PP     17

Syntactic asymmetries

A rigid constraint : syntactic symmetry wrt the syntactic function, i.e. remnants share the same syntactic function with their correlates. Otherwise, no strict syntactic parallelism (contra Hartmann 2000, Culicover & Jackendoff 2005, etc.). Different categories in a pair : AdjP/PP, AdjP/VP, NP/PP, NP/AdjP, etc.

(50) a. The mouth was thin-lipped and wide, the long cleft in the upper lip like a slide. (brwn-13191) b. Gustav Vasa is a superb example, and Charles 10, the conqueror of Denmark, hardly less so. (brwn-15903)

Different word order :

(51) a. The witness table is center stage and below it, the paraphernalia for the ever-present media, in this case, TNN, the Total News Network. (wsj-17574) b. and then, every other day I’d like to incorporate the video, and then the bike on the other day. (swbd-152914)

Different number of overt dependents :

1 No lexical correlate (implicit correlate) :

(52) The phone had been disconnected but telegrams came for him and notes by special messenger. (brwn-10722)

2 Additional constituent in the target : e.g. sentence adverbials (of course, perhaps, nowadays, etc.).

(53) Keith was on his feet because he didn’t care at all about life any more ; Penny on her feet, proudly, because she cared too much. (brwn-6891)

Remnants are not necessarily sisters (may depend on different verbs), see examples (36)-(39) :
One of the remnants depends on a non-verbal (but predicative) head.
One of the remnants depends on a verbal head, but an embedded one (see embedded infinitives, subordinate clauses).
One of the remnants occurs inside an NP/PP domain or an island.

A general syntactic constraint supposed to apply in Gapping structures is the Major Constituent Condition (Hankamer 1971, 1973) : each remnant must be paired with some ’major’ correlate in the source, namely some correlate that depends on a verbal head.
BUT need of experimental studies :

(54) a. John thought about Jane, and Bill, Betty. b. Fred has been working on semantics, and Bill, syntax. c. Fred sat on a chair, Mary a stool, and Bill, a bench.

Semantic contrast

Remnants must stand in semantic contrast with their correlates (Hartmann 2000, Repp 2009) : they must belong to the same set of alternatives and be different.

frequent remnant classes in PTB      pair 1   pair 2
abstract/concrete objects                95       70
individuals                              53       26
antonymy                                 14       30
alternative the one...the other          13        3
scalar                                    1       25
...

(55) a. Some claim they suffered losses because they sold while he was buying and others because they bought while he was selling. (wsj-7562) b. ... a market order to buy would be filled at the higher price and a market order to sell at the lower price. (wsj-1072) c. Some have beautiful gardens ; some not even a blade of grass. (brwn-23266)

(56) Time is a queer thing and memory a queerer ; the tricks that time plays with memory and memory with time are queerest of all. (brwn-16058)

Lexeme identity is possible with certain quantified phrases, deictic proforms or interrogative elements.

(57) a. ... Mr. Rogers said that of the 14 computer-related firms he follows, half will report for their most recent quarter earnings below last year’s results, and half above those results. (wsj-2134) b. Some have beautiful gardens – some not even a blade of grass. (brwn-23266) c. So each reading can be given a weight and each reading a score by adding up these weights. (brwn-23061) d. The arrangement with Argiento was working well, except that sometimes Michelangelo could not figure who was master and who apprentice. (brwn-12642)

Sometimes the contrast is not so clear (whole-part relationship, possession, individual/object).

(58) a. And, therefore, being in disgrace, they would not be cremated and their ashes flung to the winds in public ceremony. (brwn-5691) b. Only recently has it been attractively redesigned and its editorial product improved. (wsj-18562) c. Philip Glass is the emperor, and his music the new clothes, of the avant-garde. (wsj-26487)

(59) Gross was behind a clean-top desk, only a manila folder before him. (brwn-15936)

Conclusion on PTB Gapping data

Gapping data in the PTB lead us to reassess many assumptions made in the theoretical literature on Gapping.
PTB data show that Gapping is not ’a very rare elliptical construction’ (≠ previous corpus studies on English).
Discrepancy between speech and writing wrt elliptical phenomena (= previous corpus studies) : ellipsis in general is much less frequent in spoken than in written corpora of the PTB. This generalization applies to Gapping too (≠ Harbusch & Kempen).
Need of further empirical investigation (psycholinguistic experiments) to test various issues raised by our corpus data.

Experimental perspectives

There are many aspects of English Gapping that need further empirical investigation. As most of you are interested in setting up an acceptability judgment task, you have various possible topics for the course assignment.

Gapping and mismatch. Check for various asymmetries in English Gapping. E.g. set up an acceptability experiment on valence mismatch :

(60) John sent Mary white roses and Bob a book to his best friend. (Miller, p.c.)

Gapping and the Major Constituent Condition. Check for the acceptability of English Gapping cases where the remnant is not a ’major constituent’ in the sense of Hankamer (1971, 1973), e.g. its correlate may depend on a preposition, and not directly on a verbal head.

(61) a. John thought about Jane, and Bill, Betty. b. Fred has been working on semantics, and Bill, syntax. c. Fred sat on a chair, Mary a stool, and Bill, a bench.

(62) a. John thought about Jane, and Bill, about Betty. b. Fred has been working on semantics, and Bill, on syntax. c. Fred sat on a chair, Mary on a stool, and Bill, on a bench.
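
One common way to present minimal pairs such as (61)/(62) in an acceptability judgment task is a Latin-square design, in which each participant list contains every item in exactly one condition. The sketch below is only an illustration of that idea: the condition labels "NP" and "PP", the function name and the data layout are assumptions, not a design prescribed in these slides.

```python
from typing import Dict, List, Tuple

# Items built from examples (61) (bare NP remnant) and (62) (PP remnant).
items: Dict[int, Dict[str, str]] = {
    1: {"NP": "John thought about Jane, and Bill, Betty.",
        "PP": "John thought about Jane, and Bill, about Betty."},
    2: {"NP": "Fred has been working on semantics, and Bill, syntax.",
        "PP": "Fred has been working on semantics, and Bill, on syntax."},
    3: {"NP": "Fred sat on a chair, Mary a stool, and Bill, a bench.",
        "PP": "Fred sat on a chair, Mary on a stool, and Bill, on a bench."},
}
conditions: List[str] = ["NP", "PP"]  # hypothetical condition labels


def latin_square(
    items: Dict[int, Dict[str, str]], conditions: List[str]
) -> List[List[Tuple[int, str, str]]]:
    """Build one presentation list per condition.

    In list k, the item at position idx appears in condition
    (idx + k) mod number-of-conditions, so across the lists every item
    is seen in every condition, but only once within a single list.
    """
    lists = []
    for k in range(len(conditions)):
        current = []
        for idx, item_id in enumerate(sorted(items)):
            cond = conditions[(idx + k) % len(conditions)]
            current.append((item_id, cond, items[item_id][cond]))
        lists.append(current)
    return lists


lists = latin_square(items, conditions)
for number, presentation in enumerate(lists, start=1):
    print("List", number)
    for item_id, cond, sentence in presentation:
        print(" ", item_id, cond, sentence)
```

With only three items and two conditions, the conditions cannot be balanced within a list; an actual experiment would normally use a number of items that is a multiple of the number of conditions, plus fillers.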

Gapping and the number of remnants. Check for the acceptability of English Gapping with more than two remnants ; in particular, observe the acceptability of examples with three remnants, all being NP. Semantic type ? Syntactic category (prepositional marking) ?

(63) a. *Arizona elected Goldwater Senator, and [Massachusetts] [McCormack] [Congressman]. (Jackendoff 1971) b. *Millie will send the President an obscene telegram, and [Paul] [the Queen] [a pregnant duck]. (Jackendoff 1971)

(64) A : – Who will send who what ? B : – Sally will send Ron pickles, and [Martha] [Hermione] [kumquats]. (Johnson 2014)

(65) Some talked with you about politics and [others] [with me] [about music]. (Winkler 2005)

Gapping and subordination. Check for the acceptability of English Gapping with various subordinating elements : oppositive whereas, while vs. whenever, despite the fact that ; before ; comparative contexts...

(66) a. Men are valued for their economic status, whereas women for their appearance. (Izutsu 2008) b. Boys are encouraged to go out for work, while girls to stay at home.

(67) a. *Sam played tuba whenever Max sax. (Jackendoff 1971) b. *McTavish plays bagpipe despite the fact that McCawley the contrafagotto d’amore.

(68) a. *Sandy played the guitar before Betsy the recorder. (Sag 1976) b. Truth is you will be in a position to hire me, before I, you. (Google search, Chaves p.c.)

(69) a. Robin doesn’t speak French, let alone Leslie, German. (Culicover & Jackendoff 2005) b. Robin speaks French better than Leslie German.

Gapping and embedded remnants. Check for the acceptability of Gapping examples where the remnants belong to different clauses. The attested example (70) from the PTB has a subordinate without an overt complementizer, but what about subordinates with overt complementizers ?

(70) Some claim they suffered losses because they sold while he was buying and others because they bought while he was selling. (wsj-7562)

The judgments from the literature are quite different :
Syntactic constraint : No remnants embedded under a subordinate clause (Johnson 1996/2004). Acceptability for complement that-clauses, but not for adjunct clauses (Gardent 1991).
Semantic constraint : Remnants embedded under a complement that-clause are ok if the root subject and the embedded subject are coreferential (Merchant 2001, Lasnik 2006, Repp 2009).

(71) a. *Some said that Mittie liked beans and others rice. (Johnson 1996/2004) b. *Some left in order to meet Mittie and others Sam.

(72) a. This doctor said that I should eat salmon and that doctor tuna. (Gardent 1991) b. The child insisted that she wanted chips and the mother salad. c. *John left without telling his boss and Bill his colleagues.

(73) a. Jim said that he called his mum and John his dad. (Repp 2009) b. *Jim claimed that Alan went to the ballgame and John to the movies.

(74) a. John_i thinks that he_i will see Susan and Harry Mary. (Lasnik 2006) b. *John said you kissed Susan, and Bill Mary.
