Taxonomy of XML Schema Languages using Formal Language Theory

On the basis of regular tree languages, we present a formal framework for XML schema languages. This framework helps to describe, compare, and implement such schema languages. Our main results are as follows: (1) four classes of tree languages, namely "local", "single-type", "restrained competition" and "regular"; (2) document validation algorithms for these classes; and (3) classification and comparison of schema languages: DTD, XML-Schema, DSD, XDuce, RELAX Core, and TREX.

Makoto Murata, IBM Tokyo Research Lab. / International University of Japan
Dongwon Lee, UCLA / CSD
Murali Mani, UCLA / CSD

Extreme Markup Languages 2000

1. Introduction

XML is a metalanguage for creating markup languages. To represent a particular type of information, we create an XML-based language by designing an inventory of names for elements and attributes. These names are then utilized by application programs dedicated to this type of information.

A schema is a description of such an inventory: a schema specifies permissible names for elements and attributes, and further specifies permissible structures and values of elements and attributes. Advantages of creating a schema are as follows: (1) the schema precisely describes permissible XML documents, (2) computer programs can determine whether or not a given XML document is permitted by the schema, and (3) we can use the schema for creating application programs (by generating code skeletons, for example). Thus, schemas play a very important role in development of XML-based applications.

Several languages for writing schemas, which we call schema languages, have been proposed in the past. Some languages (e.g., DTD) are concerned with XML documents in general; that is, they handle elements and attributes. Other languages are concerned with a particular type of information which may be represented by XML; primary constructs for such information are not elements or attributes, but rather constructs specific to that type of information. RDF Schema of W3C is an example of such a schema language. Since primary constructs for RDF information are resources, properties, and statements, RDF Schema is concerned with resources, properties, and statements rather than elements and attributes. In this paper, we limit our concern to schema languages for XML documents in general (i.e., elements and attributes); specifically, we consider DTD [BPS00], XML-Schema [TBMM00], DSD [KMS00], RELAX Core [Mur00b], and TREX [Cla01]. Although schema languages dedicated to particular types of information (e.g., RDF, XTM, and ER) are useful for particular applications, they are outside the scope of this paper.

We believe that providing a formal framework is crucial in understanding various aspects of schema languages and facilitating efficient implementations of schema languages. We use regular tree grammar theory ([Tak75] and [CDG+97]) to formally capture schemas, schema languages, and document validation. Regular tree grammars have recently been used by many researchers for representing schemas or queries for XML and have become the mainstream in this area (see OASIS [OA01] and Vianu [VV01]); in particular, XML Query [CCF+01] of W3C is based on tree grammars.

Our contributions are as follows:

1. We define four subclasses of regular tree grammars and their corresponding languages to describe schemas precisely;
2. We show algorithms for validation of documents against schemas for these subclasses and consider the characteristics of these algorithms (e.g., the tree model vs. the event model); and
3. Based on regular tree grammars and these validation algorithms, we present a detailed analysis and comparison of a few XML schema proposals and type systems; an XML schema proposal A is more expressive than another proposal B if the subclass captured by A properly includes that captured by B.

The remainder of this paper is organized as follows. In Section 2, we consider related works such as other survey papers on XML schema languages. In Section 3, we first introduce regular tree languages and grammars, and then introduce restricted classes. In Section 4, we introduce validation algorithms for the four classes, and consider their characteristics. In Section 5, on the basis of these observations, we evaluate different XML schema language proposals. Finally, concluding remarks and thoughts on future research directions are discussed in Section 6.

2. Related Work

More than ten schema languages for XML have appeared recently, and [Jel00] and [LC00] attempt to compare and classify such XML schema proposals from various perspectives. However, their approaches are by and large not mathematical, so that the precise description and comparison among schema language proposals are not straightforward. On the other hand, this paper first establishes a formal framework based on regular tree grammars, and then compares schema language proposals.

Since Kilho Shin advocated use of tree automata for structured documents in 1992, many researchers have used regular tree grammars or tree automata for XML (see OASIS [OA01] and Vianu [VV01]). However, to the best of our knowledge, no papers have used regular tree grammars to classify and compare schema language proposals. Furthermore, we introduce subclasses of regular tree grammars, and present a collection of validation algorithms dedicated to these subclasses.

XML-Schema Formal Description (formerly called MSL [BFRW01]) is a mathematical model of XML-Schema. However, it is tailored for XML-Schema and is thus unable to capture other schema languages. Meanwhile, our framework is not tailored for a particular schema language. As a result, all schema languages can be captured, although fine details of each schema language are not.
3. Tree Grammars

In this section, as a mechanism for describing permissible trees, we study tree grammars. We begin with a class of tree grammars called "regular", and then introduce three restricted classes called "local", "single-type", and "restrained-competition".

Some readers might wonder why we do not use context-free (string) grammars. Context-free (string) grammars [HU79] represent sets of strings. Successful parsing of strings against such grammars provides derivation trees. This scenario is appropriate for programming languages and natural languages, where programs and natural language text are strings rather than trees. On the other hand, start tags and end tags in an XML document collectively represent a tree. Since traditional context-free (string) grammars are originally designed to describe permissible strings, they are inappropriate for describing permissible trees.

3.1 Regular Tree Grammars and Languages

We borrow the definitions of regular tree languages and tree automata in [CDG+97], but allow trees with "infinite arity"; that is, we allow a node to have any number of subordinate nodes, and allow the right-hand side of a production rule to have a regular expression over non-terminals.

Definition 1. (Regular Tree Grammar) A regular tree grammar (RTG) is a 4-tuple G = (N, T, S, P), where:

• N is a finite set of non-terminals,
• T is a finite set of terminals,
• S is a set of start symbols, where S is a subset of N, and
• P is a finite set of production rules of the form X → a r, where X ∈ N, a ∈ T, and r is a regular expression over N; X is the left-hand side, a r is the right-hand side, and r is the content model of this production rule.

Example 1. The following grammar G1 = (N, T, S, P) is a regular tree grammar. The left-hand side, right-hand side, and content model of the first production rule are Doc, doc (Para1, Para2*), and (Para1, Para2*), respectively.

N = {Doc, Para1, Para2, Pcdata}
T = {doc, para, pcdata}
S = {Doc}
P = {Doc → doc (Para1, Para2*), Para1 → para (Pcdata),
     Para2 → para (Pcdata), Pcdata → pcdata ε}

We represent every text value by the node pcdata for convenience.
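To make Definition 1 concrete, the grammar G1 of Example 1 can be written down as plain data. The following is a minimal sketch, not part of the original framework: the names `Rule`, `Grammar`, `matches`, and `is_well_formed` are hypothetical, and the encoding of a content model as a Python regular expression over space-terminated non-terminal names is an assumption made only for this illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str       # non-terminal X
    terminal: str  # terminal a
    content: str   # content model r, encoded as a regex over "Name " tokens


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    starts: frozenset        # S
    rules: tuple             # P


def matches(content, names):
    """True if the sequence of non-terminal names matches the content model."""
    return re.fullmatch(content, "".join(n + " " for n in names)) is not None


def is_well_formed(g):
    """Check that S is a subset of N and every rule uses declared symbols."""
    return g.starts <= g.nonterminals and all(
        r.lhs in g.nonterminals and r.terminal in g.terminals for r in g.rules
    )


# G1 from Example 1; the empty regex stands for the empty content model.
G1 = Grammar(
    nonterminals=frozenset({"Doc", "Para1", "Para2", "Pcdata"}),
    terminals=frozenset({"doc", "para", "pcdata"}),
    starts=frozenset({"Doc"}),
    rules=(
        Rule("Doc", "doc", r"Para1 (Para2 )*"),
        Rule("Para1", "para", r"Pcdata "),
        Rule("Para2", "para", r"Pcdata "),
        Rule("Pcdata", "pcdata", r""),
    ),
)
```

Terminating every name with a space keeps token boundaries unambiguous, so an ordinary string regex can stand in for a regular expression over N in this sketch.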
Without loss of generality, we can assume that no two production rules have the same non-terminal in the left-hand side and the same terminal in the right-hand side at the same time. If a regular tree grammar contains such production rules, we only have to merge them into a single production rule. We also assume that every non-terminal is either a start symbol or occurs in the content model of some production rule (in other words, no non-terminals are useless).

We have to define how a regular tree grammar generates a set of trees over terminals. We first define interpretations.

Definition 2. (Interpretation) An interpretation I of a tree t against a regular tree grammar G is a mapping from each node e in t to a non-terminal, denoted I(e), such that:

• I(e_root) is a start symbol, where e_root is the root of t, and
• for each node e and its subordinates e_0, e_1, ..., e_i, there exists a production rule X → a r such that
  • I(e) is X,
  • the terminal (label) of e is a, and
  • I(e_0) I(e_1) ... I(e_i) matches r.

Now, we are ready to define generation of trees from regular tree grammars, and regular tree languages.

Definition 3. (Generation) A tree t is generated by a regular tree grammar G if there is an interpretation of t against G.

Example 2. An instance tree generated by G1, and its interpretation against G1, are shown in Figure 1.

[Figure 1: (a) instance tree t generated by G1, with root d1, children p1 and p2, and leaves pcdata1 and pcdata2; (b) interpretation of t against G1, which maps d1 to Doc, p1 to Para1, p2 to Para2, and pcdata1 and pcdata2 to Pcdata.]

Figure 1. An instance tree generated by G1, and its interpretation against G1. We use unique labels to represent the nodes in the instance tree.

Definition 4. (Regular Tree Language) A regular tree language is the set of trees generated by a regular tree grammar.

3.2 Local Tree Grammars and Languages

We first define competition of non-terminals, which makes validation difficult. Then, we introduce a restricted class called "local" by prohibiting competition of non-terminals [Tak75]. This class roughly corresponds to DTD.
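Definitions 2 and 3 of Section 3.1 can be checked mechanically on small trees. The sketch below is a naive bottom-up search for an interpretation; it is not one of the validation algorithms discussed in this paper. The function names `candidates` and `is_generated`, the `(label, children)` tuple encoding of trees, and the encoding of content models as regexes over space-terminated non-terminal names are all assumptions made for this illustration.

```python
import re
from itertools import product

# G1 from Example 1, as (left-hand side, terminal, content model) triples.
# Content models are regexes over "Name " tokens (an assumed encoding).
STARTS = {"Doc"}
RULES = [
    ("Doc", "doc", r"Para1 (Para2 )*"),
    ("Para1", "para", r"Pcdata "),
    ("Para2", "para", r"Pcdata "),
    ("Pcdata", "pcdata", r""),
]


def candidates(node):
    """Non-terminals X such that some interpretation can map this node to X."""
    label, children = node
    child_sets = [candidates(c) for c in children]
    result = set()
    for lhs, terminal, content in RULES:
        if terminal != label:
            continue
        # Try every assignment of candidate non-terminals to the children
        # (exponential in the worst case; acceptable only for a sketch).
        for choice in product(*child_sets):
            if re.fullmatch(content, "".join(n + " " for n in choice)):
                result.add(lhs)
                break
    return result


def is_generated(tree):
    """A tree is generated iff its root can be mapped to a start symbol."""
    return bool(candidates(tree) & STARTS)


# The instance tree of Figure 1: doc(para(pcdata), para(pcdata)).
t = ("doc", [("para", [("pcdata", [])]), ("para", [("pcdata", [])])])
```

On this tree, each para node can be mapped to either Para1 or Para2, and only the content model of Doc determines which assignment forms an interpretation.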
Definition 5. (Competing Non-Terminals) Two different non-terminals A and B are said to be competing with each other if

• one production rule has A in the left-hand side,
• another production rule has B in the left-hand side, and
• these two production rules share the same terminal in the right-hand side.

Example 3. Consider a regular tree grammar G3 = (N, T, S, P), where:

N = {Book, Author1, Son, Article, Author2, Daughter}
T = {book, article, author, son, daughter}
S = {Book, Article}
P = {Book → book (Author1), Author1 → author (Son), Son → son ε,
     Article → article (Author2), Author2 → author (Daughter),
     Daughter → daughter ε}

Author1 and Author2 compete with each other, since the production rule for Author1 and that for Author2 share the terminal author in the right-hand side. There are no other competing non-
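Definition 5 depends only on the left-hand side and the terminal of each production rule, so competition can be detected by a pairwise comparison of rules. The following is a minimal sketch under that reading; the function name `competing_pairs` and the `(left-hand side, terminal)` pair encoding are hypothetical and serve only this illustration.

```python
from itertools import combinations

# Production rules of G3 from Example 3 as (left-hand side, terminal) pairs.
# Content models play no part in Definition 5 and are omitted here.
G3_RULES = [
    ("Book", "book"),
    ("Author1", "author"),
    ("Son", "son"),
    ("Article", "article"),
    ("Author2", "author"),
    ("Daughter", "daughter"),
]


def competing_pairs(rules):
    """Return the set of unordered pairs of competing non-terminals."""
    pairs = set()
    for (x, a), (y, b) in combinations(rules, 2):
        # Two different non-terminals compete if their rules share a terminal.
        if x != y and a == b:
            pairs.add(frozenset({x, y}))
    return pairs
```

Applied to `G3_RULES`, the function reports the single pair Author1 and Author2. Applied to the rules of G1 in Example 1, it would report Para1 and Para2, whose rules share the terminal para.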
