Uncorrected Proofs - 198 Interview with Geoffrey Sampson

A two-way exchange between syntax and corpora

In his contribution, Geoffrey Sampson, Professor Emeritus at the University of Sussex (United Kingdom), highlights the relationship between Corpus Linguistics and Syntax. He shows how this bond has a two-way nature. In his view, the use of corpora in language research allows one to better understand syntactic issues and the development of language complexity. However, the other way is also true in Sampson’s view since he believes the focus on syntax is one of the major factors contributing to the growth of interest in Corpus Linguistics. From a more general perspective, Sampson argues in favor of linguistics remaining a creative activity which develops in unexpected ways. As for the prospects of Corpus Linguistics, he predicts its death – not of this approach itself, but of the term. He believes the label ‘corpus linguistics’ will disappear when corpora become just another resource available to linguists.

1. Where do you place the roots of Corpus Linguistics? And to what do you attribute the growth of interest in the area?

2. Is Corpus Linguistics a science or a methodology? Where would you situate Corpus Linguistics in the scientific or methodological panorama?

I must take these questions together, because answering either one involves discussing the other.

The first thing that needs to be said about these and the rest of this series of questions (I shall be surprised if I am the only contributor who makes essentially the same point) is that it is misleading to think of “Corpus Linguistics” as a branch of linguistics, alongside sociolinguistics or historical linguistics. Corpus linguists are just people who study language and languages in an empirical, scientific manner, using whatever sources of empirical data are available; at the present time it happens that, for many aspects of language, the most useful data sources are often electronic corpora. I work a lot with corpora, but I think of myself as a linguist, not a “corpus linguist”. If some aspect of language is better studied using other tools, I will use those.
The reasons why corpora have become more significant in linguistics than they used to be include: (i) the availability of computers; (ii) change of emphasis from phonology to syntax; and (iii) the bankruptcy of intuition-based techniques. I discuss these points in turn:

Availability of computers

It is hard to do much with a corpus unless it is in electronic form and you have access to a computer to process and search it. The Brown Corpus, the first electronic corpus, was published in 1964, which as it happens was close to the time when I began learning to work with computers – but that was very unusual then for someone with a humanities background. Everyone had heard of computers, but most academics knew little about them and had certainly never seen one. I remember the air of imperfectly-concealed condescension with which engineers and mathematicians greeted the idea that some of us arts types wanted to play with their machines. When we managed to do so, the low-level programming languages of those days and the batch-processing approach of 1960s computing environments meant that, although one could use computers to find out things about language which would be hard to discover any other way, the process was horribly slow and cumbersome relative to what is possible and easy now.

It was not until some time in the 1980s that computers began to become routinely available to linguists. Even that is quite a while ago now; but when a complex new technology does become convenient and widely available, it inevitably takes time for a profession to adjust to its possibilities. Corpus-based techniques have taken decades to catch on in linguistics, but I am not sure that one could have expected the process to occur faster.

Change of emphasis within the discipline

Until some point in the 1960s, the intellectual “centre of gravity” of linguistics lay in phonology, which deals mainly with finite systems of a few dozen phonemes that combine in a limited number of ways. Corpora do not offer much to the phonologist. One can survey the possibilities adequately using traditional techniques.
Only with the rise of generative linguistics did the “weight” of the discipline shift to syntax, which deals with large numbers of elements combining in effectively infinitely many ways. That meant that one needed to study very large samples to have a chance of encountering a representative range of possibilities, so corpus compilation became the way forward.

Bankruptcy of intuition-based techniques

Ironically, while the generative movement shifted linguists’ attention to an aspect of language – syntax – which is difficult to study empirically without the use of corpora, the unempirical style of research advocated by the generativists led very many linguists to ignore the virtues of corpora for a long time after they started becoming available. No-one in the modern world would suggest that, say, meteorologists or marine biologists should decide what their basic data were without looking at evidence: it is too obvious that the weather, and marine organisms, are things independent of us and that we can find out about them only by looking. Language is not in the same sense independent of human cognition, so it may at first have been reasonable for the Chomskyans to believe that a linguist can decide what is in and what is not in his language by introspection, without external observation. And, as well as arguing that grammar-writing can be based on introspection, they cited the “absence of negative evidence” (that is, we don’t hear starred sentences) in order to argue that grammar-writing cannot successfully be based on observation.

For a short while these ideas may have been reasonable, but it soon turned out that eliminating the dependence of science on observation is just as bad an idea in linguistics as in physical sciences. This was clear at least from the time when William Labov (1975) demonstrated that speakers simply do not know how they speak, and that generative linguists ascribe an authority to their own judgements which they manifestly do not possess. The argument from absence of negative evidence represented a misunderstanding of how empirical science works (Sampson 1975); if it were a good argument, no physical science would be possible (Sampson 2005: 89–91).
By now there are many cases where core elements of non-empirical linguists’ theories rest on intuitive beliefs that are wildly at variance with reality. One of Noam Chomsky’s leading arguments for innate knowledge of language (see e.g. Chomsky 1980: 40) is the claim that, without innate knowledge, children could not succeed in mastering the English rule for forming questions, because structures that are allegedly crucial for determining the correct rule are so rare that one can live one’s life without ever hearing an example. Chomsky seems to have based that statement on guesswork (or “intuition”, if one wants to use the more dignified term). Although I do not believe that one needs to hear these particular structures to get the question rule right, I used the demographically-sampled speech section of the British National Corpus to check how rare the structures are in real life. It turned out that one can expect to hear thousands of relevant examples in a lifetime’s exposure to casual chat (Sampson 2005: 81). This is not an isolated case of mismatch between generative linguists’ intuitions and empirical reality (though it is perhaps the most egregious case, in view of the frequency with which the generative literature has relied on this baseless assertion – cf. Pullum and Scholz 2002: 39–40).

Even in face of absurdities like this, quite a few linguists do continue to cling to the idea that grammatical research can progress independently of empirical evidence. But by now they are starting to resemble upper-middle-class Edwardian ladies who cannot conceive of cooking or cleaning with their own hands. Fiddling about with scripts for searching text files or with tape recordings of spontaneous speech looks like servants’ work to some of the more precious inhabitants of linguistics departments. But the reality of many areas of present-day linguistics is that, if one wants to make progress rather than just go through the motions, that is the kind of work that has to be done; and I think this is now obvious to many younger linguists. So it is no surprise that corpus work has been coming to the fore.
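Editorial illustration (not part of the interview): the kind of script-based check described above, counting how often a structure turns up in a speech sample and expressing it as a rate, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the toy utterances are invented, the regular expression is only a crude stand-in for "a question whose subject contains a relative clause", and the function names are hypothetical. It is not the procedure reported in Sampson (2005), which worked on the British National Corpus itself.

```python
import re

# Invented toy sample (hypothetical data). A real study would read
# transcribed speech from corpus files rather than a hard-coded list.
SAMPLE = [
    "is the man who is tall happy",
    "will the people who are waiting be served",
    "did you see that",
    "can the dog that is barking hear us",
    "where is it",
]

# Crude illustrative pattern: the utterance opens with a fronted auxiliary
# and later contains a relative pronoun directly followed by a form of "be".
# A regex over raw words will miss and over-match cases that a parsed
# corpus would handle properly.
PATTERN = re.compile(
    r"^(is|are|was|were|will|can|does|do|did)\b"
    r".*\b(who|that|which)\s+(is|are|was|were)\b"
)


def count_matches(utterances, pattern=PATTERN):
    """Return how many utterances match the pattern."""
    return sum(1 for u in utterances if pattern.search(u.lower()))


def rate_per_million(hits, total_words):
    """Scale a raw count to occurrences per million words."""
    return hits * 1_000_000 / total_words


total_words = sum(len(u.split()) for u in SAMPLE)
hits = count_matches(SAMPLE)
print(hits, total_words, rate_per_million(hits, total_words))
```

A rate of this kind, multiplied by an estimate of how many words a person hears over the relevant period, gives the sort of lifetime-exposure figure the argument turns on; the rate from a toy sample like this one is of course meaningless in itself.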
The remaining point in Questions 1 and 2 concerns the “roots” of corpus linguistics. Diana McCarthy and I surveyed the historical origins of corpus work briefly in our Corpus Linguistics anthology (Sampson and McCarthy 2004: 1–4). One might argue that Dr Johnson’s dictionary was based in part on a “corpus” of literary quotations, and the work of Wilhelm Kaeding (1898) seems to have been a clear early case of corpus linguistics in the modern sense. But these are matters of fact and of definition (what counts as a “corpus”?), rather than of intellectual controversy; there is little to be gained from contributors repeatedly rehearsing the history at length.

3. How representative can a corpus be?

Representativeness seems to have become something of a bugbear for corpus researchers, but I am not quite sure why it is felt to be a worry. Any corpus is a sample of language use, and naturally one wants it to be an unbiased “fair sample”. Statisticians who discuss sampling talk in terms of drawing a sample from a “population” – the (perhaps infinitely) numerous set of entities for which the finite sample is intended to stand proxy. If there is a worry about corpus representativeness, perhaps the problem is less about sampling techniques than about deciding what “population” is to be sampled. Thus, for written language ought we to think in terms of acts of writing, or acts of reading (some pieces of written language are read very many times, others only once)? Or perhaps the problem arises because of tensions between groups who want to use language corpora for different purposes and have not fully recognized that the same kind of sample will not suit all purposes equally. The written-language section of the British National Corpus includes quite a lot of literary writing, sometimes decades old. For a sociolinguist interested in what written usage the average Briton encounters, this might be inappropriate; for the dictionary publishers who were among the leading sponsors of the BNC project, it may be very desirable to give extra weight to writing that is recognized as more authoritative than, say, hastily-composed office memos. This would be a case of conflicting interests; I wonder whether “repre-
