TheVISLSystem: ResearchandapplicativeaspectsofIT-basedlearing EckhardBick e-mail:[email protected] web:http://visl.sdu.dk 1.Abstract Thepaperpresentsanintegratedinter activeuserinterfaceforteachinggrammaticalanalysis throughtheInternetmedium(VisualInteractiveLearning),developedatSouthern University, covering14differentlanguages ,halfofwhicharesupportedbylivegrammatical analysisofrunningtext.Forreasonsofrobustness,efficiencyandcorrectness,the system'sinternal toolsarebasedontheConstraintGrammarformalism(Karlsson,1990,1995),butusersarefreeto choosefromavarietyofnotationalfilters,supportingdifferentdescriptionalparadigms , witha currentteachingfocuson syntactictreestructuresand theform -functiondichotomy.Theoriginal kernelofprogramswasbuiltaroundamulti -levelparserforPortuguese(Bick, 1996, 2000) developedinadissertation frameworkatÅrhusUniversity andusedasapointofdeparturefor similarsystemsinother languages .Over thepast 5years,VISLhasgrownfromateaching initiativeintoafullblownresearch anddevelopment project withawiderangeof secondary projects,activitiesandlanguagetechnologyproducts. Examplesofapplicationorientedresearch areNLP -basedteachinggames, machineandgrammaticalspellchecking. TheVISL group has repeatedly attractedoutsidefundingfor thedevelopment of grammar teachingtools, semanticsbasedConstraintGrammarsandtheconstructionofannotatedcorpora.

Online Proceedings of NODALIDA 2001 1.Background WhentheVISLprojectstartedin1996,itsprimarygoalwastofurthertheintegration ofITtoolsandITbasedcommunicationroutinesintotheuniversitylanguage teachingmilieuatOdenseUniversity(Denmark),andmorespecifically,todevelop toolsforVisualInteractiveSyntaxLearning.Theinitiativewasfundedjointlyby CTU(Centerfor Teknologi-StøttetUddannelse)andOdenseUniversityfor3years, andthelanguagesinvolvedwereEnglish,German,FrenchandPortuguese. Alreadyintheearlystagesoftheprojectitbecameclearthatadistinction wouldhavetobemadeastowhetherthe languagedatatobeusedintheteaching interfacewouldbelimitedtextbookexamplesorunlimitednaturallanguagetext.We decidedtodevelopbotha"closed"andan"open"system,andtodesigntheteaching applicationsformaximalsynergy,suchthatt heywouldbeabletotakeinputfrom boththeclosedandopenlanguagedatasources, -anddosoinalargelylanguage independentway. Fortheclosedsystemanotationalformalismwasdevelopedthatallowedthe textualexpressionofgraphicalsyntactict reestructures,anddatabasesofmanually analysedsentenceswerebuiltforallparticipatinglanguages,theoriginaltargetbeing 500textbooksentencesand500runningtextsentences.Withthehelpofenthusiastic studentsandteachers,these"closedcorpus"databasesareconstantlybeingenlarged, andtodayVISLcovers14languages,amongthemthebasicRomanceandGermanic languagesaswellasanumberofmoreexoticspecimen,likeArabic,Japaneseand . Theopensystemisresearchbasedan dcenteredaroundtheConstraint Grammarparadigm,introducedbyFredKarlssonatHelsinkiUniversityintheearly 1990ies(Karlsson,1991,1995).The1996rolemodelforthesyntacticVISLsystem wasmyPortugueseCGsystem(Bick1996,2000),whic hfeaturedafull dependencyanalysisofsubclausestructureandaprototypeCG -to-treesyntax transformationgrammar.IhavesincedevelopedsimilarCGbasedsystemsfor Danish,SpanishandEsperanto.ForEnglishandGerman,VISLhas correcteda nd amplifiedlicensedcommercialCGsystemsfromtheFinnishsoftwarefirmLingsoft. 2.Aunifiedapproachtogrammar ThecentralprincipleofVISL'slanguageanalysisis itsfocusonsurfacestructure (expressedaseitherdependencyrelationsorsyntactictrees tructures)andtheform - functiondichotomy.FollowingBacheet.al.(1993,1999),functionsymbolsstartwith uppercaseletters,formsymbolswithlowercaseletters,andboth arecombinedina combinedcolon -separatedsymbol(text)orfunction -over-formsy mbol(graphics). Forthedependencynotation,internationalCGconventionsarefollowed,withupper caselettersforallprimarytags,usingthe@ -symboltointroducefunctiontags,and arrowheads(>,<)forheadorienteddependencymarkers.

Online Proceedings of NODALIDA 2001 VISLlightverticaltree VISLverticaltree (non-graphicalnotation) (non-graphicalnotation) UTT:cl(fcl) STA:fcl S:propVISL S:propVISL P:v(v-pr) er P:v-fin(v-pr) er Cs:g(np) Cs:np =D:art et =DN:art et =H:n forskningsprojekt =H:n forskningsprojekt =D:cl(fcl) =DN:fcl ==S:pron(pron-rel) der ==S:pron-rel der ==P:v(v-pr) involverer ==P:v-fin(v-pr) involverer ==Od:g(np) ==Od:np ===D:pron(pron-indef) mange ===DN:pron-indef mange ===D:adj forskellige ===DN:adj forskellige ===H:n sprog ===H:n sprog VISL [VISL] <*> PROPNOM @SUBJ> er [være] VPRAKT @FMV et [en] ARTNEUSIDF @>N forskningsprojekt [forskningsprojekt] NNEUSIDFNOM @ INDPnGnNNOM @SUBJ> involverer [involvere] VPRAKT&MV @FS-N< mange [mange] DETnGPNOM @>N forskellige [forskellig] ADJnGPnDNOM @>N sprog [sprog] NNEUPIDFNOM @

¡ Auxiliary(Vaux),alsoas<>(D) £

¢ Mainverb(V*,Vm),alsoas (H,K) ¥ ¤ Verbchainparticle(Vp),alsoas<>(D),simplifiedas (A)

¦ Infinitivemarker(Vi,INFM),alsoas<>(D) § Subject(S) ( § ) Formalorprovisionalsubject(Sf),possiblywiththesubclassofsituativesubject(Ss) ¨ Direct(accusative)object(Od) ( ¨ ) Formalorprovisionalobject(Of)

© Indirect(dative)object(Oi)

Prepositionalobject(Op),emneled,evt.forenkletsom ¥ (A) ⊗ Subjectcomplement(Cs),Subjectpredicative(Ps)

[⊗] Freesubjectpredicative(fPs,fCs),simplifiedas ¥ (A)

Online Proceedings of NODALIDA 2001 ⊕ Objectcomplement(Co),Objectpredicative(Po) ⊕

[ ] Freeobjectpredicative(fPo,fCo),simplifiedas ¥ (A)

¥ ¥ Adverbial(A),withpossiblesubdivisionoffree( fA)orbound( bA,bAs,bAo)

£ Head(H),Kernel(K) <> Dependents(D) Subordinator(SUB) ↔ Co-ordinator(CO) # Conjunct(CJT) «» Underspecifiedconstituentatclauselevel(e.g.clausebody) 3.Internetbasedteachingtools OnelessontobelearnedfromtheVISLproject,isthatitisnotatalleasyto introduceIT-basedtoolsintoanexistingteachingenvironment.Ap artfromhardware problems(thereneverbeingenough-compatibleandupdated-machinesintheright roomattherighttime),thereistheverycentralproblemofpsychologicalresistance againstthenewmedium,simplybecauseitmayfeeltoo"technical". Allthings technicalhaveaverylowacceptancerateintheHumanities,andteachersoftenresent thepersonalinvestmentintimeandeffortnecessarytoacquirethenecessaryskills - nottomentionchangesinteachingmaterialandexams.Thereis,ofco urse,a fundamentaldifference intermsof"technicality"betweenahumanteacheranda computerterminal,-thelatterlackstheteacher's naturalness,interactivity,flexibility and tutoringcapacities.Ontheotherhand,computersdohaveevidentteachin g advantages-theycanintegratethesenses,makinguseofcolours,picturesandsounds inamoreflexibleandimpressivemannerthanpapercan.Also,acomputerprogram can"know"more-intermsoffactsandexamples,andwithinawell -definedsubject matter-thanahumanteacher.Andlast,butnotleast,acomputersystem,especially ifaccessiblethroughtheinternet,canteachanunlimitednumberofstudentsatthe sametimeinwhatoptimallystillamountstoanindividualmanner. Giventheseadvantages,itmakessensetoinvestsomeeffortinaddressingthe fourmaindisadvantages,aslistedabove.TheVISL grammarteachinginterfacetries tomakeadvanceswithregardtothefollowingfourprinciples: (i)Flexibility TheVISLinterfaceisnotationallyf lexible,i.e.th eusercanchoosebetween severalnotationalconventions(e.g.flatdependencygrammar,enrichedtext,meta textnotation,treestructures),andmovebackandforthbetweendifferentlevelsof complexity.Forinstance,dependingontheexer cisechosen,thetypeandnumberof grammaticalcategoriesused(e.g.wordclasses)maybechanged.Inordertomake workmorecolourful,itisalsopossibletomovebetweentextbookmaterial,copied "live"texts,randomizedtestsentencesandone'sowncreativeidiolect. Inthetreestructureexamplebelow,theusercanswitchbackandforthbetween lettersymbolsandgraphicalsymbols,morethandoublethenumberofcategories,or reducethetreetoapurefunctiontree(greenonly).

Online Proceedings of NODALIDA 2001 Lettersymbols Graphicalsymbols

VISL'suniqueintegrationofteachingandresearchtoolswouldevenallowtheuserto experimentwithdifferentkindsofsubjectsor addacoupleofplaceandtime adverbialsandrerunthesentenceinfree-textmode–withexactlythesamegraphical setupandpaedagogicalfunctionality. (ii)Interactivity VISL'sjava -treeinterfaceforgrammaticalanalysisallowsthestep -by-step interactiveinspection,constructionandlabellingofsyntactictreesusingmenus, mouseclicksanddrag -and-dropmovements,allknownfromb asictextprocessor functionality. Inthe firstexamplebelow, astudenthasrecognizedthenp "minhest",buthas yettoassemble "lyst"ontothepredicator ("har")oftheadverbialsubclausetothe left.

Online Proceedings of NODALIDA 2001 Whenasentenceprovesproblematicorincomprehensible,theusercanmodifyit,or askforthecomputer 'sopinion(show -meoption).Ingrammargameslike Paintbox, PostOffice or Shoot-the-Verb, interactivityinegratesac ertainelementof competition,andisfurtherenhancedbysoundeffects,timersandhigh-scores.

Online Proceedings of NODALIDA 2001 (iii)Naturalness Amajordrawbackofmostlanguageteachingsoftware(or,forthatmatter, languageanalysissoftware)isthattheydonotrunonfree,na turallanguage,butonly onasmallsetofpredefinedsentencesorstructures("toylexica"or"toygrammars"), thatcannotbemodifiedorreplaced.IntheVISLinterface,forbetterorworse,the underlyinglexicaandgrammarscoverthewholelanguage, supportinggradualand comparativechangesinagivensentence,orconfrontingtheuserwiththestimulating lexicalfreshnessandstructuralunpredictabilityofrunningnaturaltext. Thesecondaspectofnaturalnessconcerns,asmentionedabove,"untech nical" ergonomics,andasmuchkeyboard -interactionaspossiblehasthereforebeen replacedbygraphicalandmousegovernedtools,likemenuchoicesandhelp windows.Beinginternetbased,thesystemautomaticallytakesadvantageofa browser'snavigationtools,scrollbars,pagememoryandcut'n'pastefunctionality. (iv)Tutoring Tutoringistraditionallyahumantask,anddifficulttosimulateinacomputer interface.Therefore,ithasbeenoneofthelastfeaturestobebroadlyimplementedon theVISL site.Acertainminimumoftutoringcanbeachievedsimplybyproviding guidedtours,helpwindows,clickabledefinitionsofgrammaticalterms,show -me- buttons,andreadyaccesstotopicconditionedcorpusexamples(throug hVISL's corpussearchsite).Howev er,realtutoringasksformorespecificandindividual comments.Therefore,tohelpstudentswiththetree -buildingand -labellingtask,we haveimplementedso -callederror -commentfiles ,wherepedagogicremarks(and suggestedreading -links)arestoredfor all commonandsomerarer combinationsof "correctlabelexpected"and"wronglabelchosen",aswellasfordifferenttypesof wrongattachment(phraseandclausegrouping). 4.Amethodologicalresearchparadigm Animportantdifferencebetween theVISLapproachandtraditionalschoolsof grammaristhefactthatwhatunifiesVISL'sdifferentstrandsofresearchisnot primarilyadescriptionalorinterpretativeparadigm,butamethodologicalone. ConstraintGrammarwithitsfocusoncorpusdata ,,disambiguationand wordbasedtaggingissimplyaveryrobustmethod,yieldinglowerrorratesand information-richoutputeasytohandleandfilterwithrelativelysimpletextbased computerprograms.Indescriptionalandapplicativeterms ,ConstraintGrammaris moreatoolgrammarthanatargetgrammar.Thus,attheteachinglevel,VISLuses differentrepresantationsofthesamegrammaticalinformation,forinstancegraphical treeswithform -functionnodes,wordclasscolouringorheadbas edfunction indexing,andanumberofdifferentcorpusannotationandcorpussearchschemes havebeensupportedincollaborationwithoutsideresearchpartners. ConstraintGrammarcanbethoughtofasahierarchicallyorganized progressivelevelsystemof lexicaldatabasesand grammars,dynamicallyadaptable todifferenttasksanddifferentlevelsoranglesofgrammaticaldescription. Inthe

Online Proceedings of NODALIDA 2001 tablebelow ,ahierarchyof"pure"and"applicational"modulesareshownforthe presentVISLlanguages,halfofwhichincorporateCGmodulesatdifferentlevels. Modules Languages Po En Da Sp Ge Es Fr It Ar Ja Gr Ru La Bo Morphologicalparsinglexica + (+) + + (+) (+) * * Valencylexica + (+) + + Semanticlexica + + MorphologicalCG + x + ≈+ x ≈ *+ * SyntacticCG + x+ + ≈+ ≈ ≈ CG-to-treePSGorequivalent + + + + ? PolysemiCG(partial) ? Bilingualelectroniclexica(intoTL) Da Da Es En En MachinetranslationtoTLorfromSL: Da Es Da (withtranslationmappingCG) Po Spelling/grammarcheckerCG ? ? CG-to-treecompatibleteachingcorpora + + + + + + + + + + + + + CGtaggedcorpora + + + CGbasedtreecorpora + + VISL-builtmodule (+) LexiconaspartofaclosedCGsystem,licensedformLingsoft,Helsinki x ClosedCG,licensedfromLingsoft,Helsinki x+ ClosedcommercialCGwithVISLadd-ons(correctionmodule,subclausefunctionetc.) ≈ "Cloned"fromthePortuguesePALAVRASsystem * ProbabilisticTreeDecisionTagger(HelmutSchmid&AchimStein,Stuttgart) *+ ProbabilisticTaggerwithcorrectionCG ? Partialpilotproject 5.Spin-offresults Transcendingitsoriginaltargetarea,internetbasedgrammarteachingtools,VISL hasgeneratedanumberofcollateralspin-offresultsbothtechnologicalandlinguistic. Thus,anumberofcomprehensivebilinguallexica,valency -lexicaandsemantic prototypelexicaareunderdevelopmentforseverallanguages,andGNU -licence compilersforCGandPSGarebeingmadeavailabletothepublic.VISL'scorpussite offersasearchinterfacehandlingregularexpressionsandCGtags,andtextcorpora areaccessibleinbothrawandtaggedfor mforVISL'scorelanguages.S eparatesub- projectsare theconstructionofalargefreelyaccessi bleDanishcorpus(now10 millionwords,incooperationwithDSL,Denmark )anda2millionwordtreebank forPortuguese(incooperationwiththeAC/DC-project,Oslo). VISL'scorpusmaterialispartlyintegratedintothemainsite,part lyaccessible throughaseparatesearchinterface( http://corp.hum.sdu.dk ),whichallowstheuseof regularexpressionsforrunningtext,andthecombination and chaining of word forms,baseforms,wordclass,inflexionandsyntactictagsforCG-taggedtext.

Online Proceedings of NODALIDA 2001 ThetablegivesanoverviewofVISL-productswithindifferentcoreareas: Teaching Corpusandgeneral ConstraintGrammar linguistics Programs Java-trees:Interactive Searchengineforraw flexibleCG-compiler inspection,constructionand textandCG-tagged forConstraintGrammars labellingogsyntactictrees corpora PSG-compilerforCG- Paintbox:Wordclass Filtersforanumberof to-tree-grammars colouringgame differentnotational Postoffice:Syntactic conventions functionstampinggame Shootinggallery:Selection ofgrammaticalcategoriesin movingsentences Linguistic Textbooksentences:Hand- Collectionofraw-text Englishbenchmarktext data analysedormachineanalysed corporafor6languages Port.benchmarktext andproof-read"closed CG-taggedcorporafor corpora"for14languages En,Po,Da NewDanishfreecorpus Portuguese Grammars Unifiedapproachto Corpusdrivengrammar Port.CG(ca.5000rules) grammaticalanalysisand development Dan.CG(ca.3000rules) commoncategoryinventory PSG-grammarsforCG- Spa.CG(ca.3000rules) acrosslanguages to-tree-conversion,for Esp.CG(port.clone) DanishX-and-O-symbols En,Da,(Po,Sp) Eng.add-onCG Ger.add-onCG Lexica Termbankwithdefinitions BilingualMT-lexicafor Valencylexica: ofgrammaticalcategoriesetc. runningtexttranslation: Po,Da,Sp Onlinedictionaries:Po-Da, Po-Da,Po-En,Da-En, Semanticclasslexica: Da-Po,Da-Es,Es-Da En-Da Po,Da,En Texts& Onlinegrammarmanuals, Manuals,e.g.onregular Scientificarticles,BA- documents guidedtoursandtutorials expressionincorpus andPh.D.-projects EB:GrammyiKlostermølle- searches(JMD&HK) EB:TheParsingSystem skoven(Da-En-Ge-Fr), Articles,reportsand "Palavras" PortugueseSyntaxManual evaluations

Online Proceedings of NODALIDA 2001 AmongVISL'snon -teachingapplications,machinetranslationisthemost controversialone,whilethetinyDanishspell-checkermoduleistheonethat evenat theidealevelgeneratesmostcommercialinterest.

A kindof"dictionary" translationservicecaneasilybeincorporatedas polysemy disambiguatedbaseform addedontoCGtaglines,but MTproperasks foranumberofadditionalmodules,suchastargetlanguageinflexiongeneration, syntactictransformations, instantiationofcomplextenses andsoon.Constraint Grammarfunctionshereasacontextsensitivemappingdevicefo rstructuralmarkers orspecialtranslationequivalents.

Bibliography: Bache,Carlet.al.(1999).EnglishSentenceAnalysis,København:Gyldendal Bick,Eckhard(1996). AutomaticParsingofPortuguese. InGarcía,LauraSánchez(ed.), Anais/II EncontroparaoProcessamentoComputacionaldePortuguêsEscritoeFalado. Curitiba: CEFET-PR.

Online Proceedings of NODALIDA 2001 Bick,Eckhard (1997) InternetBasedGrammarTeaching ,in:Christoffersen,Ellen&Music, Bradley(eds.), DatalingvistiskForeningsÅrsmøde1997iKolding,Proceedings, pp.86 -106. Kolding:InstitutforErhvervssprogogSprogligInformatik,HandelshøjskoleSyd Bick,Eck hard(2000 -1), PortugueseSyntax (TeachingManual), http://www.portugues.mct.pt/ Repositorio/Bick_Portuguese_Syntax3.docandhttp://visl.sdu.dk/visl/pt Bick,Eckhard(2000 -2), TheParsingSystem"Palavras" –AutomaticGrammaticalAnalysisof PortugueseinaConstraintGrammarFamework,Aarhus:AarhusUniversityPress Bick,Eckhard(2001),GrammyiKlostermølleskoven-"VISLlight":Tværsprogligsætningsanalyse forbegyndere(TeachingManual),http://visl.sdu.dk/visl/light Dienhart,John(2000),VISL-projektet:OmanvendelseafITisprogundervisningog -forskning.In: AtundervisemedIKT,pp.51-70.Gylling:NarayanaPress Karlsson,Fred(1990) . ConstraintGrammarasaFrameworkforParsingRunningText. In Karlgren,Hans(ed.), COLING-90:Paperspresentedtothe13 thInternationalConferenceon ComputationalLinguistics,Vol.3,pp.168-173.Helsinki:RUCL Karlsson,Fred,et.al.(1995). ConstraintGrammar,ALanguage -IndependentSystemforParsing UnrestrictedText.Berlin:MoutondeGruyter. Santos,Diana&EckhardBick(2000). ProvidingInternetaccesstoPortuguesecorpora:the AC/DCproject ,inMariaGavrilidouetal.(eds.), Proceedingsof theSecondInternational ConferenceonLanguageResourcesandEvaluation,LREC2000(Athens,31May-2June2000), pp.205-210. Tapanainen,Pasi(1996). TheConstraintGrammarParserCG -2. PublicationNo.27.Helsinki: DepartmentofGeneralLinguistics,UniversityofHelsinki Voutilainen,Atro&Heikkilä,Juka&Anttila,Arto(1992). ConstraintGrammarofEnglish,A Performance-OrientedIntroduction, PublicationNo.21.Helsinki:DepartmentofGeneral Linguistics,UniversityofHelsinki Voutilainen,Atro(1 994). DesigningaParsingGrammar. PublicationsNo.22.Helsinki: DepartmentofGeneralLinguistics,HelsinkiUniversity

Online Proceedings of NODALIDA 2001