Uncorrected Proofs - 198 Interview with Geoffrey Sampson
Total Page:16
File Type:pdf, Size:1020Kb
A two-way exchange between syntax and corpora Inhiscontribution,GeoffreySampson,ProfessorEmeritusattheUniversity ofSussex(UnitedKingdom),highlightstherelationshipbetweenCorpusLin- guisticsandSyntax.Heshowshowthisbondhasatwo-waynature.Inhisview, theuseofcorporainlanguageresearchallowsonetobetterunderstandsyntac- ticissuesandthedevelopmentoflanguagecomplexity.However,theotherway isalsotrueinSampson’sviewsincehebelievesthefocusonsyntaxisoneofthe majorfactorscontributingtothegrowthofinterestinCorpusLinguistics.From amoregeneralperspective,Sampsonarguesinfavoroflinguisticsremaining acreativeactivitywhichdevelopsinunexpectedways.Asfortheprospectsof CorpusLinguistics,hepredictsitsdeath–notofthisapproachitself,butof theterm.Hebelievesthelabel‘corpuslinguistics’willdisappearwhencorpora becomejustanotherresourceavailabletolinguists. 1. Where do you place the roots of Corpus Linguistics? And to what do you attribute the growth of interest in the area? 2. Is Corpus Linguistics a science or a methodology? Where would you situate Corpus Linguistics in theJohn scientific Benjamins or Publishingmethodological Company panorama? Imusttakethesequestionstogether, becauseansweringeitheroneinvolvesdis- cussingtheother. Thefirstthingthatneedstobesaidabouttheseandtherestofthisseriesof questions(IshallbesurprisedifIamtheonlycontributorwhomakesessentially thesamepoint)isthatitismisleadingtothinkof“CorpusLinguistics”asabranch oflinguistics,alongsidesociolinguisticsorhistoricallinguistics.Corpuslinguists arejustpeoplewhostudylanguageandlanguagesinanempirical,scientificman- ner,usingwhateversourcesofempiricaldataareavailable;atthepresenttimeit happensthat,formanyaspectsoflanguage,themostusefuldatasourcesareoften electroniccorpora.Iworkalotwithcorpora,butIthinkofmyselfasalinguist, nota“corpuslinguist”.Ifsomeaspectoflanguageisbetterstudiedusingother tools,Iwillusethose. Uncorrected proofs - 198 InterviewwithGeoffreySampson The reason why corpora have become more significant in linguistics than theyusedtobeinclude:(i)theavailabilityofcomputers;(ii)changeofemphasis fromphonologytosyntax;and(iii)thebankruptcyofintuition-basedtechniques. Idiscussthesepointsinturn: Availability ofcomputers Itishardtodomuchwithacorpusunlessitisinelectronicformandyouhave accesstoacomputertoprocessandsearchit.TheBrownCorpus,thefirstelec- troniccorpus,waspublishedin1964,whichasithappenswasclosetothetime whenIbeganlearningtoworkwithcomputers–butthatwasveryunusualthen forsomeonewithahumanitiesbackground.Everyonehadheardofcomputers, butmostacademicsknewlittleaboutthemandhadcertainlyneverseenone.I remembertheairofimperfectly-concealedcondescensionwithwhichengineers andmathematiciansgreetedtheideathatsomeofusartstypeswantedtoplay withtheirmachines.Whenwemanagedtodoso,thelow-levelprogramming languagesofthosedaysandthebatch-processingapproachof1960scomputing environmentsmeantthat,althoughonecouldusecomputerstofindoutthings aboutlanguagewhichwouldbehardtodiscoveranyotherway,theprocesswas horriblyslowandcumbersomerelativetowhatispossibleandeasynow. Itwasnotuntilsometimeinthe1980sthatcomputersbegantobecomerou- tinelyavailabletolinguists.Eventhatisquiteawhileagonow;butwhenacom- plexnewtechnologydoesbecomeconvenientandwidelyavailable,itinevitably takestimeforaprofessiontoadjusttoitspossibilities.Corpus-basedtechniques havetakendecadestocatchoninlinguistics,butIamnotsurethatonecould haveexpectedtheprocesstooccur Johnfaster. Benjamins Publishing Company Change ofemphasiswithinthediscipline Untilsomepointinthe1960s,theintellectual“centreofgravity”oflinguisticslay inphonology,whichdealsmainlywithfinitesystemsofafewdozenphonemes thatcombineinalimitednumberofways.Corporadonotoffermuchtothepho- nologist.Onecansurveythepossibilitiesadequatelyusingtraditionaltechniques. Onlywiththeriseofgenerativelinguisticsdidthe“weight”ofthedisciplineshift tosyntax,whichdealswithlargenumbersofelementscombiningineffectively infinitelymanyways.Thatmeantthatoneneededtostudyverylargesamplesto haveachanceofencounteringarepresentativerangeofpossibilities,socorpus compilationbecamethewayforward. Uncorrected proofs - Atwo-wayexchangebetweensyntaxandcorpora 199 Bankruptcy ofintuition-basedtechniques Ironically,whilethegenerativemovementshiftedlinguists’attentiontoanaspect oflanguage–syntax–whichisdifficulttostudyempiricallywithouttheuseof corpora,theunempiricalstyleofresearchadvocatedbythegenerativistsledvery manylinguiststoignorethevirtuesofcorporaforalongtimeaftertheystarted becomingavailable.No-oneinthemodernworldwouldsuggestthat,say,meteo- rologistsormarinebiologistsshoulddecidewhattheirbasicdatawerewithout lookingatevidence:itistooobviousthattheweather,andmarineorganisms,are thingsindependentofusandthatwecanfindoutaboutthemonlybylooking. Languageisnotinthesamesenseindependentofhumancognition,soitmay atfirsthavebeenreasonablefortheChomskyanstobelievethatalinguistcan decidewhatisinandwhatisnotinhislanguagebyintrospection,withoutexter- nalobservation.And,aswellasarguingthatgrammar-writingcanbebasedon introspection,theycitedthe“absenceofnegativeevidence”(thatis,wedon’thear starredsentences)inordertoarguethatgrammar-writingcannotsuccessfullybe basedonobservation. Forashortwhiletheseideasmayhavebeenreasonable,butitsoonturned outthateliminatingthedependenceofscienceonobservationisjustasbadan ideainlinguisticsasinphysicalsciences.Thiswasclearatleastfromthetime when William Labov (1975) demonstrated that speakers simply do not know howtheyspeak,andthatgenerativelinguistsascribeanauthoritytotheirown judgementswhichtheymanifestlydonotpossess.Theargumentfromabsence ofnegativeevidencerepresentedamisunderstandingofhowempiricalscience works(Sampson1975);ifitwereagoodargument,nophysicalsciencewouldbe possible(Sampson2005:89–91). John Benjamins Publishing Company Bynowtherearemanycaseswherecoreelementsofnon-empiricallinguists’ theoriesrestonintuitivebeliefsthatarewildlyatvariancewithreality.Oneof NoamChomsky’sleadingargumentsforinnateknowledgeoflanguage(seee.g. Chomsky1980:40)istheclaimthat,withoutinnateknowledge,childrencould notsucceedinmasteringtheEnglishruleforformingquestions,becausestruc- turesthatareallegedlycrucialfordeterminingthecorrectrulearesorarethat onecanliveone’slifewithouteverhearinganexample.Chomskyseemstohave basedthatstatementonguesswork(or“intuition”,ifonewantstousethemore dignifiedterm).AlthoughIdonotbelievethatoneneedstoheartheseparticu- larstructurestogetthequestionruleright,Iusedthedemographically-sampled speechsectionoftheBritishNationalCorpustocheckhowrarethestructures areinreallife.Itturnedoutthatonecanexpecttohearthousandsofrelevant examplesinalifetime’sexposuretocasualchat(Sampson2005:81).Thisisnotan isolatedcaseofmismatchbetweengenerativelinguists’intuitionsandempirical Uncorrected proofs - 200 InterviewwithGeoffreySampson reality(thoughitisperhapsthemostegregiouscase,inviewofthefrequencywith whichthegenerativeliteraturehasreliedonthisbaselessassertion–cf.Pullum andScholz2002:39–40). Eveninfaceofabsurditieslikethis,quiteafewlinguistsdocontinuetocling to the idea that grammatical research can progress independently of empirical evidence.Butbynowtheyarestartingtoresembleupper-middle-classEdwardian ladieswhocannotconceiveofcookingorcleaningwiththeirownhands.Fiddling aboutwithscriptsforsearchingtextfilesorwithtaperecordingsofspontaneous speechlookslikeservants’worktosomeofthemorepreciousinhabitantsoflin- guisticsdepartments.Buttherealityofmanyareasofpresent-daylinguisticsisthat, ifonewantstomakeprogressratherthanjustgothroughthemotions,thatisthe kindofworkthathastobedone;andIthinkthisisnowobvioustomanyyounger linguists.Soitisnosurprisethatcorpusworkhasbeencomingtothefore. The remaining point in Questions 1 and 2 concerns the “roots” of corpus linguistics.DianaMcCarthyandIsurveyedthehistoricaloriginsofcorpuswork brieflyinourCorpusLinguisticsanthology(SampsonandMcCarthy2004:1–4). OnemightarguethatDrJohnson’sdictionarywasbasedinpartona“corpus”of literaryquotations,andtheworkofWilhelmKaeding(1898)seemstohavebeen aclearearlycaseofcorpuslinguisticsinthemodernsense.Butthesearematters offactandofdefinition(whatcountsasa“corpus”?),ratherthanofintellectual controversy;thereislittletobegainedfromcontributorsrepeatedlyrehearsing thehistoryatlength. 3. How representative can a corpus be? John Benjamins Publishing Company Representativenessseemstohavebecomesomethingofabugbearforcorpusre- searchers,butIamnotquitesurewhyitisfelttobeaworry.Anycorpusisa sampleoflanguageuse,andnaturallyonewantsittobeanunbiased“fairsam- ple”.Statisticianswhodiscusssamplingtalkintermsofdrawingasamplefrom a“population”–the(perhapsinfinitely)numeroussetofentitiesforwhichthe finitesampleisintendedtostandproxy.Ifthereisaworryaboutcorpusrepre- sentativeness,perhapstheproblemislessaboutsamplingtechniquesthanabout decidingwhat“population”istobesampled.Thus,forwrittenlanguageought wetothinkintermsofactsofwriting,oractsofreading(somepiecesofwritten languagearereadverymanytimes,othersonlyonce)?Orperhapstheproblem arisesbecauseoftensionsbetweengroupswhowanttouselanguagecorporafor differentpurposesandhavenotfullyrecognizedthatthesamekindofsamplewill notsuitallpurposesequally.Thewritten-languagesectionoftheBritishNational Corpusincludesquitealotofliterarywriting,sometimesdecadesold.Foraso- ciolinguistinterestedinwhatwrittenusagetheaverageBritonencounters,this Uncorrected proofs - Atwo-wayexchangebetweensyntaxandcorpora 201 mightbeinappropriate;forthedictionarypublisherswhowereamongthelead- ingsponsorsoftheBNCproject,itmaybeverydesirabletogiveextraweightto writingthatisrecognizedasmoreauthoritativethan,say,hastily-composedoffice memos.Thiswouldbeacaseofconflictinginterests;Iwonderwhether“repre-