Relational Databases, Logic, and Complexity

Phokion G. Kolaitis
University of California, Santa Cruz & IBM Research - Almaden
[email protected]

2009 GII Doctoral School on Advances in Databases

1 What this course is about

Goals:

• Cover a coherent body of basic material in the foundations of relational databases.

• Prepare students for further study and research in relational database systems.

Overview of Topics:

• Database query languages: expressive power and complexity

• Relational Algebra and Relational Calculus

• Conjunctive queries and homomorphisms

• Recursive queries and Datalog

• Selected additional topics: Bag Semantics, Inconsistent Databases

Unifying Theme: The interplay between databases, logic, and computational complexity.

2 Relational Databases: A Very Brief History

• The history of relational databases is the history of a scientific and technological revolution.

• The scientific revolution started in 1970 with Edgar (Ted) F. Codd (1923-2003) at the IBM San Jose Research Laboratory (now the IBM Almaden Research Center).

• Codd introduced the relational data model and two database query languages: relational algebra and relational calculus.
  - "A relational model of data for large shared data banks", CACM, 1970.
  - "Relational completeness of data base sublanguages", in: Database Systems, ed. by R. Rustin, 1972.

3 Relational Databases: A Very Brief History

• Researchers at the IBM San Jose Laboratory embark on the System R project, the first implementation of a relational database management system (RDBMS).

• In 1974-1975, they develop SEQUEL, a query language that eventually became the industry standard SQL.

• System R evolved into DB2, first released in 1983.

• M. Stonebraker and E. Wong embark on the development of the Ingres RDBMS at UC Berkeley in 1973.

• Ingres is commercialized in 1983; its successor project, Postgres, later became PostgreSQL, a free-software ORDBMS (object-relational DBMS).

• L. Ellison founds a company in 1977 that eventually becomes Oracle Corporation; Oracle V2 is released in 1979 and Oracle V3 in 1983.

• Ted Codd receives the ACM Turing Award in 1981.

4 Relational Database Industry Today

• According to Gartner, Inc., June 2007: "Worldwide relational database management systems (RDBMS) total software revenue totaled $15.2 billion in 2006, a 14.2 percent increase from 2005 revenue of $13.3 billion."

• In 2007, the total RDBMS software revenue increased to $17.1 billion (figures released in July 2008).

Company     2006 Revenue   2006 Market Share
Oracle      $7.168B        47.1%
IBM         $3.204B        21.1%
Microsoft   $2.654B        17.4%
Teradata    $494.2M         3.2%
Sybase      $486.7M         3.2%
Other       $1.2B           7.8%
Total       $15.2B        100%

5 Database Research Today

• A very vibrant community comprising several thousand researchers around the world.

• Several major annual conferences in database research:
  - SIGMOD, PODS, VLDB, ICDE, EDBT, ICDT (top six).
  - Numerous other conferences and workshops.

• Several major scholarly journals dedicated to database research:
  - ACM TODS, VLDB Journal, IEEE TKDE, …

• Strong database research groups in academia around the world.

• Several database research groups in industrial laboratories.

6 Database Management Systems

A database management system (DBMS) provides support for:

• At least one data model (a mathematical abstraction for representing data);

• At least one high-level data language (a language for defining, updating, manipulating, and retrieving data);

• Transaction management and concurrency control mechanisms;

• Access control (limiting access to certain data to certain users);

• Resiliency (the ability to recover from crashes).

7 Data Models and Data Languages

• A data model is a mathematical formalism for describing and representing data.

• A data model is accompanied by a data language that has two parts: a data definition language and a data manipulation language.

• A data definition language (DDL) has a syntax for describing "database templates" in terms of the underlying data model.

• A data manipulation language (DML) supports the following operations on data:
  - Insertion
  - Deletion
  - Update
  - Retrieval and extraction of data (querying the data).

• The first three operations are fairly standard. However, there is much variety in data retrieval and extraction (query languages).

9 The Relational Data Model (E.F. Codd, 1970)

• The Relational Data Model uses the mathematical concept of a relation as the formalism for describing and representing data.

• Question: What is a relation?

• Answer:
  - Formally, a relation is a subset of a cartesian product of sets.
  - Informally, a relation is a "table" with rows and columns.

CHECKING-ACCOUNT Table

branch      account-no   customer-name   balance
Orsay       1099106284   Abiteboul       $3,567.53
Hawthorne   1099235671   Hull            $11,245.75
…           …            …               …

10 Basic Notions from Discrete Mathematics

• A k-tuple is an ordered sequence of k objects (which need not be distinct).
  - (2,0,1) is a 3-tuple; (a,b,a,a,c) is a 5-tuple, and so on.

• If D_1, D_2, …, D_k are k sets, then the cartesian product D_1 × D_2 × … × D_k of these sets is the set of all k-tuples (d_1, d_2, …, d_k) such that d_i ∈ D_i, for 1 ≤ i ≤ k.

• Fact: Let |D| denote the cardinality (= number of elements) of a set D. Then |D_1 × D_2 × … × D_k| = |D_1| × |D_2| × … × |D_k|.

• Example: If D_1 = {0,1} and D_2 = {a,b,c,d}, then |D_1 × D_2| = |D_1| × |D_2| = 2 × 4 = 8.

• Warning: Computing cartesian products is an expensive operation!
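The cardinality fact can be checked on the two example sets with a few lines of Python. This is only an illustrative sketch: it relies on the standard-library `itertools.product`, and the variable names are chosen here, not taken from the slides.

```python
from itertools import product

D1 = {0, 1}
D2 = {"a", "b", "c", "d"}

# The cartesian product D1 x D2, materialized as a set of 2-tuples.
pairs = set(product(D1, D2))

# |D1 x D2| = |D1| * |D2|
assert len(pairs) == len(D1) * len(D2)
print(len(pairs))
```

The product grows multiplicatively with every additional factor, which is why materializing it is costly.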

11 Basic Notions from Discrete Mathematics

• A k-ary relation R is a subset of a cartesian product of k sets, i.e., R ⊆ D_1 × D_2 × … × D_k.

• Examples:
  - Unary: R = {0,2,4,…,100} (R ⊆ D)
  - Binary: T = {(a,b): a and b have the same birthday}
  - Ternary: S = {(m,n,s): s = m+n}
  - …

12 Relations and Attributes

• Note: R ⊆ D_1 × D_2 × … × D_k can be viewed as a table with k columns.

Table S

3     5     8
150   100   250
…     …     …

• In the relational data model, we want to have names for the columns; these are the attributes of the relation.

13 Relation Schemas and Relational Database Schemas

• A k-ary relation schema R(A_1, A_2, …, A_k) is a set {A_1, A_2, …, A_k} of k attributes.
  - COURSE(course-no, course-name, term, instructor, room, time)
  - CITY-INFO(name, state, population)
  Thus, a k-ary relation schema is a "blueprint", a "template" for some k-ary relation.

• An instance of a relation schema is a relation conforming to the schema (arities match; also, in a DBMS, the data types of the attributes match).

• A relational database schema is a set of relation schemas R_i(A_1, A_2, …, A_{k_i}), for 1 ≤ i ≤ m.

• A relational database instance of a relational database schema is a set of relations R_i, each of which is an instance of the relation schema R_i, 1 ≤ i ≤ m.

14 Relational Database Schemas: Examples

• BANKING relational database schema with relation schemas
  - CHECKING-ACCOUNT(branch, acc-no, cust-id, balance)
  - SAVINGS-ACCOUNT(branch, acc-no, cust-id, balance)
  - CUSTOMER(cust-id, name, address, phone, email)
  - …

• UNIVERSITY relational database schema with relation schemas
  - STUDENT(student-id, student-name, major, status)
  - FACULTY(faculty-id, faculty-name, dpt, title, salary)
  - COURSE(course-no, course-name, term, instructor)
  - ENROLLS(student-id, course-no, term)
  - …

15 Schemas vs. Instances

Keep in mind that there is a clear distinction between

• relation schemas and instances of relation schemas, and between

• relational database schemas and relational database instances.

Syntactic Notion              Semantic Notion (discrete mathematics notion)
Relation Schema               Instance of a relation schema (i.e., a relation)
Relational Database Schema    Relational database instance (i.e., a database)

16 Programming Language Paradigms

There are two main paradigms of programming languages: imperative (or procedural) languages and declarative languages.

• Imperative (Procedural) Languages: programs are expressed by specifying how the task is to be accomplished (a sequence of operations).
  - FORTRAN, C, …

• Declarative Languages: programs are expressed by specifying what has to be accomplished (as opposed to "how").
  - LISP (functional programming), Prolog (logic programming), …

17 Query Languages for the Relational Data Model

Codd introduced two different query languages for the relational data model:

• Relational Algebra, which is a procedural language.
  - It is an algebraic formalism in which queries are expressed by applying a sequence of operations to relations.

• Relational Calculus, which is a declarative language.
  - It is a logical formalism in which queries are expressed as formulas of first-order logic.

Codd's Theorem: Relational Algebra and Relational Calculus are essentially equivalent in terms of expressive power. (But what does this really mean?)

18 Desiderata for a Database Query Language

Desiderata:

• The language should be sufficiently high-level to secure physical data independence, i.e., the separation between the physical level and the conceptual level of databases.

• The language should have high enough expressive power to be able to pose useful and interesting queries against the database.

• The language should be efficiently implementable to allow for the fast retrieval of information from the database.

Warning:

• There is a tension between the last two desiderata.

• An increase in expressive power comes at the expense of efficiency.

19 Relational Algebra

• Relational algebra strikes a good balance between expressive power and efficiency.

• Codd's contribution was to identify a small set of basic operations on relations and to demonstrate that useful and interesting queries can be expressed by combining these operations.

• Thus, relational algebra is a rich enough language, even though, as we will see later on, it suffers from certain limitations in terms of expressive power.

• The first RDBMS prototype implementations (System R and Ingres) demonstrated that the relational algebra operations can be implemented efficiently.

20 The Five Basic Operations of Relational Algebra

• Group I. Three standard set-theoretic binary operations:
  - Union
  - Difference
  - Cartesian Product.

• Group II. Two special unary operations on relations:
  - Projection
  - Selection.

• Relational Algebra consists of all expressions obtained by combining these five basic operations in syntactically correct ways.

21 Relational Algebra: Standard Set-Theoretic Operations

• Union
  - Input: Two k-ary relations R and S, for some k.
  - Output: The k-ary relation R ∪ S, where
    R ∪ S = {(a_1,…,a_k): (a_1,…,a_k) is in R or (a_1,…,a_k) is in S}

• Difference
  - Input: Two k-ary relations R and S, for some k.
  - Output: The k-ary relation R − S, where
    R − S = {(a_1,…,a_k): (a_1,…,a_k) is in R and (a_1,…,a_k) is not in S}

• Note:
  - In relational algebra, both arguments to the union and the difference must be relations of the same arity.
  - In SQL, there is the additional requirement that the corresponding attributes must have the same data type.

22 Relational Algebra: Standard Set-Theoretic Operations

• Cartesian Product
  - Input: An m-ary relation R and an n-ary relation S.
  - Output: The (m+n)-ary relation R × S, where
    R × S = {(a_1,…,a_m,b_1,…,b_n): (a_1,…,a_m) is in R and (b_1,…,b_n) is in S}

• Note: As stated earlier, |R × S| = |R| × |S|.

23 The Projection Operator

• Motivation: It is often the case that, given a table R, one wants to:
  - Rearrange the order of the columns
  - Suppress some columns
  - Do both of the above.

• Fact: The Projection Operation is tailored for this task.

24 The Projection Operation

• Projection is a family of unary operations of the form π_L(R), where L is a list of attributes of the relation R.

• The intuitive description of the projection operation is as follows:
  - When projection is applied to a relation R, it removes all columns whose attributes do not appear in the attribute list.
  - The remaining columns may be rearranged according to the order in the attribute list.
  - Any duplicate rows are also eliminated.

25 The Projection Operation: Example

SAVINGS

branch-name   acc-no   cust-name   balance
Aptos         153125   Vianu       3,450
Santa Cruz    123658   Hull        2,817
San Jose      321456   Codd        9,234
San Jose      334789   Codd        875

π_{cust-name, branch-name}(SAVINGS)

cust-name   branch-name
Vianu       Aptos
Hull        Santa Cruz
Codd        San Jose
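The example can be reproduced with a minimal Python sketch. It assumes a relation is stored as a set of tuples and that columns are referenced by 1-based position; the helper name `project` is hypothetical.

```python
def project(relation, positions):
    """Keep the columns at the given 1-based positions, in that order.

    Returning a set makes duplicate rows disappear automatically.
    """
    return {tuple(row[i - 1] for i in positions) for row in relation}


SAVINGS = {
    ("Aptos", 153125, "Vianu", 3450),
    ("Santa Cruz", 123658, "Hull", 2817),
    ("San Jose", 321456, "Codd", 9234),
    ("San Jose", 334789, "Codd", 875),
}

# cust-name is column 3 and branch-name is column 1.
result = project(SAVINGS, [3, 1])
print(sorted(result))
```

The two Codd rows of SAVINGS collapse into a single output row, as in the table above.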

26 More on the Syntax of the Projection Operation

• In relational algebra, attributes can also be referenced by position number.

• Projection Operation:
  - Syntax: π_{i_1,…,i_m}(R), where R is of arity k, and i_1,…,i_m are distinct integers from 1 up to k.
  - Semantics: π_{i_1,…,i_m}(R) = {(a_1,…,a_m): there is a tuple (b_1,…,b_k) in R such that a_1 = b_{i_1}, …, a_m = b_{i_m}}

• Example: If R is R(A,B,C,D), then π_{C,A}(R) = π_{3,1}(R).

27 The Selection Operation

• Motivation:
  - Consider the table SAVINGS(branch-name, acc-no, cust-name, balance).
  - We may want to extract the following information from it:
    - Find all records in the Aptos branch
    - Find all records with balance at least $50,000
    - Find all records in the Aptos branch with balance less than $1,000

• Fact: The Selection Operation is tailored for this task.

28 The Selection Operation

• Selection is a family of unary operations of the form σ_Θ(R), where R is a relation and Θ is a condition that can be applied as a test to each row of R.

• When a selection operation is applied to R, it returns the subset of R consisting of all rows that satisfy the condition Θ.

• Question: What is the precise definition of a "condition"?

29 The Selection Operation

• Definition: A condition in the selection operation is an expression built up from:
  - Comparison operators =, <, >, ≠, ≤, ≥ applied to operands that are constants or attribute names or component numbers. These are the basic (atomic) clauses of the conditions.
  - The Boolean logic operators ∧, ∨, ¬ applied to basic clauses.

• Examples:
  - balance > 10,000
  - branch-name = "Aptos"
  - (branch-name = "Aptos") ∧ (balance < 1,000)

30 The Selection Operator

• Note:
  - The use of the comparison operators <, >, ≤, ≥ assumes that the underlying domain of values is totally ordered.
  - If the domain is not totally ordered, then only = and ≠ are allowed.
  - If we do not have attribute names (hence, we can only reference columns via their component number), then we need to have a special symbol, say $, in front of a component number. Thus,
    - $4 > 100 is a meaningful basic clause
    - $1 = "Aptos" is a meaningful basic clause, and so on.

31 Relational Algebra

• Definition: A relational algebra expression is a string obtained from relation schemas using union, difference, cartesian product, projection, and selection.

• Context-free grammar for relational algebra expressions:

  E := R, S, … | (E_1 ∪ E_2) | (E_1 − E_2) | (E_1 × E_2) | π_L(E) | σ_Θ(E),

  where
  - R, S, … are relation schemas
  - L is a list of attributes
  - Θ is a condition.

32 Strength from Unity and Combination

• By itself, each basic relational algebra operation has limited expressive power, as it carries out a specific and rather simple task.

• When used in combination, however, the five relational algebra operations can express interesting and, quite often, rather complex queries.

• Derived relational algebra operations are operations on relations that are expressible via a relational algebra expression (built from the five basic operators).

33 Intersection

• Intersection
  - Input: Two k-ary relations R and S, for some k.
  - Output: The k-ary relation R ∩ S, where
    R ∩ S = {(a_1,…,a_k): (a_1,…,a_k) is in R and (a_1,…,a_k) is in S}

• Fact: R ∩ S = R − (R − S) = S − (S − R)

Thus, intersection is a derived relational algebra operation.
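The identity can be sanity-checked on a small instance with Python sets, whose `-` operator is set difference. The two relations below are made up for illustration.

```python
# Two binary relations of the same arity (hypothetical data).
R = {(1, 2), (3, 4), (5, 6)}
S = {(3, 4), (5, 6), (7, 8)}

via_R = R - (R - S)   # R minus everything in R that is not in S
via_S = S - (S - R)   # the symmetric form

# Both agree with the built-in intersection.
assert via_R == via_S == (R & S)
print(sorted(via_R))
```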

34 Natural Join

• Fact: The most frequently asked queries (FAQs) against databases involve the natural join operation ⋈.

• Motivating Example: Given TEACHES(fac-name, course, term) and ENROLLS(stud-name, course, term), we want to obtain TAUGHT-BY(stud-name, course, term, fac-name).

It turns out that TAUGHT-BY = ENROLLS ⋈ TEACHES.

35 Natural Join

Given TEACHES(fac-name, course, term) and ENROLLS(stud-name, course, term), compute TAUGHT-BY(stud-name, course, term, fac-name) as follows, where T and E abbreviate TEACHES and ENROLLS:

1. ENROLLS × TEACHES

2. σ_{T.course = E.course ∧ T.term = E.term}(ENROLLS × TEACHES)

3. π_{stud-name, E.course, E.term, fac-name}(σ_{T.course = E.course ∧ T.term = E.term}(ENROLLS × TEACHES))

The result is ENROLLS ⋈ TEACHES.

36 Natural Join

• Definition: Let A_1, …, A_k be the common attributes of two relation schemas R and S. Then

  R ⋈ S = π_L(σ_{R.A_1 = S.A_1 ∧ … ∧ R.A_k = S.A_k}(R × S)),

  where L contains all attributes of R × S, except for S.A_1, …, S.A_k (in other words, duplicate columns are eliminated).

• Algorithm for R ⋈ S: For every tuple in R, compare it with every tuple in S as follows:
  - test if they agree on all common attributes of R and S;
  - if they do, take the tuple in R × S formed by these two tuples,
  - remove all values of attributes of S that also occur in R;
  - put the resulting tuple in R ⋈ S.
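The nested-loop algorithm above can be sketched in Python as follows. This is an illustration only: relations are assumed to be sets of tuples paired with a list of attribute names, the function name `natural_join` is hypothetical, and the sample rows are made up.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """Nested-loop natural join of two relations given as (attributes, rows)."""
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in common]  # S-only attributes
    out_rows = set()
    for r in r_rows:
        r_named = dict(zip(r_attrs, r))
        for s in s_rows:
            s_named = dict(zip(s_attrs, s))
            # Test agreement on all common attributes.
            if all(r_named[a] == s_named[a] for a in common):
                # Keep R's columns, then only the non-duplicated columns of S.
                out_rows.add(r + tuple(s_named[a] for a in extra))
    return list(r_attrs) + extra, out_rows


ENROLLS = {("Ann", "CS101", "F09"), ("Bob", "CS101", "F09"), ("Ann", "CS180", "W10")}
TEACHES = {("Vianu", "CS101", "F09"), ("Hull", "CS180", "S10")}

attrs, TAUGHT_BY = natural_join(
    ["studname", "course", "term"], ENROLLS,
    ["facname", "course", "term"], TEACHES,
)
print(attrs)
print(sorted(TAUGHT_BY))
```

The third ENROLLS row finds no partner because it agrees with the Hull row on course but not on term.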

37 Quotient (Division)

• Motivating Example: Given ENROLLS(stud-name, course) and TEACHES(fac-name, course), find the names of students who take every course taught by V. Vianu.

• Other Motivating Examples:
  - Find the names of customers who have an account in every branch of Wachovia in San Jose.
  - Find the names of Netflix customers who have rented every film in which Paul Newman starred.

• These and other similar queries can be answered using the Quotient (Division) operation.

38 Quotient (Division)

• Definition: Let R be a relation of arity r and let S be a relation of arity s, where r > s. The quotient (or division) R ÷ S is the relation of arity r − s consisting of all tuples (a_1,…,a_{r−s}) such that for every tuple (b_1,…,b_s) in S, we have that (a_1,…,a_{r−s},b_1,…,b_s) is in R.

• Example: Given ENROLLS(stud-name, course) and TEACHES(fac-name, course), find the names of students who take every course taught by V. Vianu.
  - Find the courses taught by V. Vianu:
    π_{course}(σ_{fac-name = "V. Vianu"}(TEACHES))
  - The desired answer is given by the expression:
    ENROLLS ÷ π_{course}(σ_{fac-name = "V. Vianu"}(TEACHES))

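39 Quotient (Division)

Fact: The quotient operation is expressible in relational algebra.

Proof: For concreteness, assume that R has arity 5 and S has arity 2.

Key Idea: Use the difference operation.

• R ÷ S = π_{1,2,3}(R) − "tuples in π_{1,2,3}(R) that do not make it to R ÷ S"

• Consider the relational algebra expression (π_{1,2,3}(R) × S) − R. It consists of the combinations of a tuple in π_{1,2,3}(R) with a tuple in S that are missing from R. Intuitively, its projection on the first three columns is the set of all tuples that fail the test for membership in R ÷ S. Hence,

• R ÷ S = π_{1,2,3}(R) − π_{1,2,3}((π_{1,2,3}(R) × S) − R).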
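As a sanity check, the following Python sketch computes the quotient twice, once directly from the definition and once through the difference-based expression, and compares the results. The helper names and the sample ENROLLS data are hypothetical; candidate tuples are taken from the projection of R, as in the proof.

```python
from itertools import product


def project(rel, positions):
    """Projection on 1-based column positions."""
    return {tuple(t[i - 1] for i in positions) for t in rel}


def quotient_direct(R, S, r, s):
    """Definition: prefixes a such that a + b is in R for every b in S."""
    prefixes = project(R, range(1, r - s + 1))
    return {a for a in prefixes if all(a + b in R for b in S)}


def quotient_algebra(R, S, r, s):
    """pi(R) - pi((pi(R) x S) - R), using only basic operations."""
    head = list(range(1, r - s + 1))
    candidates = project(R, head)
    paired = {a + b for a, b in product(candidates, S)}  # pi(R) x S
    failing = project(paired - R, head)                  # prefixes missing some b
    return candidates - failing


ENROLLS = {
    ("Ann", "CS101"), ("Ann", "CS180"),
    ("Bob", "CS101"),
    ("Cy", "CS101"), ("Cy", "CS180"), ("Cy", "CS240"),
}
VIANU_COURSES = {("CS101",), ("CS180",)}

direct = quotient_direct(ENROLLS, VIANU_COURSES, 2, 1)
algebra = quotient_algebra(ENROLLS, VIANU_COURSES, 2, 1)
print(sorted(direct))
```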

40 The Expressive Power of Relational Algebra

• When combined together, the five basic relational algebra operations can express interesting and complex queries.

• In particular, relational algebra can express:
  - The Intersection Operation
  - The Natural Join Operation
  - The Quotient Operation
  - …

41 Independence of the Basic Relational Algebra Operations

• Question: Are all five basic relational algebra operations really needed? Can one of them be expressed in terms of the other four?

• Theorem: Each of the five basic relational algebra operations is independent of the other four, that is, it cannot be expressed by a relational algebra expression that involves only the other four.

Proof Idea: For each relational algebra operation, we need to discover a property that is possessed by that operation, but is not possessed by any relational algebra expression that involves only the other four operations.

42 Independence of the Basic Relational Algebra Operations

Theorem: Each of the five basic relational algebra operations is independent of the other four, that is, it cannot be expressed by a relational algebra expression that involves only the other four.

Proof Sketch (projection and cartesian product only):

• Property of projection:
  - It is the only operation whose output may have arity smaller than its input.
  - Show, by induction, that the output of every relational algebra expression built from the other four basic operations is of arity at least as big as the maximum arity of its arguments.

• Property of cartesian product:
  - It is the only operation whose output has arity bigger than its input.
  - Show, by induction, that the output of every relational algebra expression built from the other four basic operations is of arity at most as big as the maximum arity of its arguments.

Exercise: Complete this proof.

43 Relational Algebra: Summary

• When combined with each other, the five basic relational algebra operations can express interesting and complex queries (natural join, quotient, …).

• The five basic relational algebra operations are independent of each other: none can be expressed in terms of the other four.

• So, in conclusion, Codd's choice of the five basic relational algebra operations has been very judicious.

44 Relational Completeness

• Definition (Codd, 1972): A database query language L is relationally complete if it is at least as expressive as relational algebra, i.e., every relational algebra expression E has an equivalent expression F in L.

• Relational completeness provides a benchmark for the expressive power of a database query language.

• Every commercial database query language should be at least as expressive as relational algebra.

• Exercise: Explain why SQL is relationally complete.

45 SQL vs. Relational Algebra

SQL       Relational Algebra
SELECT    Projection π
FROM      Cartesian Product ×
WHERE     Selection σ

Semantics of SQL via Translation to Relational Algebra

The SQL query

  SELECT R_{i_1}.A_1, …, R_{i_m}.A_m
  FROM   R_1, …, R_K
  WHERE  Ψ

is interpreted as the relational algebra expression

  π_{R_{i_1}.A_1, …, R_{i_m}.A_m}(σ_Ψ(R_1 × … × R_K)).
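To make the correspondence concrete, here is a hedged Python sketch that runs one SELECT-FROM-WHERE query through the standard-library `sqlite3` module and evaluates the matching projection-selection-product expression by hand. The FACULTY and CHAIR rows are made up, and `DISTINCT` is added because SQL keeps duplicate rows by default, whereas relational algebra works on sets.

```python
import sqlite3
from itertools import product

# Hypothetical instances of FACULTY(name, dpt, salary) and CHAIR(dpt, name).
FACULTY = [("Hull", "CS", 100), ("Vianu", "CS", 120), ("Codd", "Math", 90)]
CHAIR = [("CS", "Vianu"), ("Math", "Codd")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FACULTY (name TEXT, dpt TEXT, salary INTEGER)")
conn.execute("CREATE TABLE CHAIR (dpt TEXT, name TEXT)")
conn.executemany("INSERT INTO FACULTY VALUES (?, ?, ?)", FACULTY)
conn.executemany("INSERT INTO CHAIR VALUES (?, ?)", CHAIR)

sql_rows = set(conn.execute(
    "SELECT DISTINCT CHAIR.dpt, FACULTY.salary "
    "FROM FACULTY, CHAIR "
    "WHERE FACULTY.name = CHAIR.name AND FACULTY.dpt = CHAIR.dpt"
))
conn.close()

# The same query read as pi(sigma(FACULTY x CHAIR)).
algebra_rows = {
    (c[0], f[2])                          # SELECT -> projection
    for f, c in product(FACULTY, CHAIR)   # FROM   -> cartesian product
    if f[0] == c[1] and f[1] == c[0]      # WHERE  -> selection
}
print(sorted(sql_rows))
```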

46 Relational Calculus

• In addition to relational algebra, Codd introduced relational calculus.

• Relational calculus is a declarative database query language based on first-order logic.

• Relational calculus comes in two different flavors:
  - Tuple relational calculus
  - Domain relational calculus.
  We will focus on domain relational calculus. There is an easy translation between these two formalisms.

• Codd's main technical result is that relational algebra and relational calculus have essentially the same expressive power.

47 Propositional Logic (aka Boolean Logic): Reminder

• Propositional variables: x, y, z, …
  - They take values 1 (True) and 0 (False).

• Propositional connectives: ∧, ∨, ¬, →

• Propositional formulas: expressions built from propositional variables and propositional connectives
  - Syntax: ϕ := x, y, z, … | (ψ ∧ χ) | (ψ ∨ χ) | ¬ψ | (ψ → χ)
  - Semantics: truth table semantics

• Application: Propositional formulas express Boolean functions
  - (x ∨ y) ∧ (¬x ∨ ¬y)            XOR Gate
  - (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z)    Majority Gate
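The two gate formulas can be checked against their intended Boolean functions by enumerating every truth assignment. The function names below are chosen for this sketch.

```python
from itertools import product


def xor_formula(x, y):
    # (x OR y) AND (NOT x OR NOT y)
    return (x or y) and (not x or not y)


def majority_formula(x, y, z):
    # (x AND y) OR (x AND z) OR (y AND z)
    return (x and y) or (x and z) or (y and z)


# Truth table semantics: compare on all assignments.
for x, y in product([False, True], repeat=2):
    assert xor_formula(x, y) == (x != y)

for x, y, z in product([False, True], repeat=3):
    assert majority_formula(x, y, z) == (x + y + z >= 2)
```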

48 First-Order Logic: Motivation

• First-Order Logic is a formalism for expressing properties of mathematical structures (graphs, trees, partial orders, …).

• Example: Consider a graph G = (V,E) (nodes are in V, edges are in E).
  - There is a self-loop.
  - Every two nodes are connected via a path of length 2.
  - Every node has exactly three distinct neighbors.
  - There is a path of length 3 from node x to node y.
  - Node x has at least four distinct neighbors.
  These and many other similar properties are expressible as formulas of first-order logic on graphs.

• One of Codd's key insights was that first-order logic can also be used to express relational database queries.

49 First-Order Logic

• Question: What is First-Order Logic?

• Answer: Informally,

  "First-Order Logic = Propositional Logic + (∃ and ∀)",

  where ∃ and ∀ range over possible values occurring in relations.

50 Relational Calculus (First-Order Logic for Databases)

• First-order variables: x, y, z, …, x_1, …, x_k, …
  - They range over values that may occur in tables.

• Relation symbols: R, S, T, … of specified arities (names of relations)

• Atomic (Basic) Formulas:
  - R(x_1,…,x_k), where R is a k-ary relation symbol (alternatively, (x_1,…,x_k) ∈ R; the variables need not be distinct)
  - (x op y), where op is one of =, ≠, <, >, ≤, ≥
  - (x op c), where c is a constant and op is one of =, ≠, <, >, ≤, ≥.

• Relational Calculus Formulas:
  - Every atomic formula is a relational calculus formula.
  - If ϕ and ψ are relational calculus formulas, then so are:
    - (ϕ ∧ ψ), (ϕ ∨ ψ), ¬ψ, (ϕ → ψ) (propositional connectives)
    - (∃x ϕ) (existential quantification)
    - (∀x ϕ) (universal quantification).

51 Relational Calculus

• Examples: Assume E is a binary relation symbol.
  1. (∃x) E(x,x)
  2. (∀x)(∀y)(∃z)(E(x,z) ∧ E(z,y))
  3. (∃z_1)(∃z_2)(E(x,z_1) ∧ E(z_1,z_2) ∧ E(z_2,y))
  4. (∃y)(∃z)(E(x,y) ∧ E(x,z) ∧ (y ≠ z))

• Free and bound variables:
  - In the first two formulas above, no variable is free.
  - In the third formula above, the free variables are x and y.
  - In the fourth formula above, the only free variable is x.
  - Intuitively, a variable is free in a formula if the variable must be assigned a value in order to tell if the formula is true or false.

52 Relational Calculus as a Database Query Language

Definition:

• A relational calculus expression is an expression of the form

  {(x_1,…,x_k): ϕ(x_1,…,x_k)},

  where ϕ(x_1,…,x_k) is a relational calculus formula with x_1,…,x_k as its free variables.

• When applied to a relational database I, this relational calculus expression returns the k-ary relation that consists of all k-tuples (a_1,…,a_k) that make the formula "true" on I.

Example: The relational calculus expression {(x,y): ∃z(E(x,z) ∧ E(z,y))} returns the set P of all pairs of nodes (a,b) that are connected via a path of length 2.
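A set comprehension mirrors this expression almost symbol for symbol. In the sketch below, the edge relation E is a made-up example and all variables are assumed to range over the values occurring in E.

```python
# A small directed graph as a binary relation (hypothetical data).
E = {(1, 2), (2, 3), (3, 1), (3, 4)}

# Values occurring in E; the variables x, y, z range over these.
adom = {v for edge in E for v in edge}

# {(x, y) : exists z (E(x, z) and E(z, y))}
P = {
    (x, y)
    for x in adom
    for y in adom
    if any((x, z) in E and (z, y) in E for z in adom)
}
print(sorted(P))
```

The existential quantifier becomes `any(...)` over the candidate values of z.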

53 Relational Calculus as a Database Query Language

Example: FACULTY(name, dpt, salary), CHAIR(dpt, name). Give a relational calculus expression for C-SALARY(dpt, salary) (find the salaries of department chairs).

{(x,y): ∃u(FACULTY(u,x,y) ∧ CHAIR(x,u))}

Here is another relational calculus expression for the same task:

{(x,y): ∃u∃v(FACULTY(u,x,y) ∧ CHAIR(x,v) ∧ (u = v))}

54 Relational Calculus as a Database Query Language

Example: FACULTY(name, dpt, salary). Find the names of the highest paid faculty in CS.

{x: ϕ(x)}, where ϕ(x) is the formula:

∃y,z (FACULTY(x,y,z) ∧ y = "CS" ∧ (∀u,v,w ((FACULTY(u,v,w) ∧ v = "CS") → z ≥ w)))

Exercise: Express this query in relational algebra and in SQL.

Abbreviation:

• ∃x_1,…,x_k stands for ∃x_1 … ∃x_k

• ∀x_1,…,x_k stands for ∀x_1 … ∀x_k

55 Natural Join in Relational Calculus

Example: Let R(A,B,C) and S(B,C,D) be two ternary relation schemas.

• Recall that, in relational algebra, the natural join R ⋈ S is given by

  π_{R.A, R.B, R.C, S.D}(σ_{R.B = S.B ∧ R.C = S.C}(R × S)).

• Give a relational calculus expression for R ⋈ S:

  {(x_1,x_2,x_3,x_4): R(x_1,x_2,x_3) ∧ S(x_2,x_3,x_4)}

Note: The natural join is expressible by a quantifier-free formula of relational calculus.

56 Quotient in Relational Calculus

• Recall that the quotient (or division) R ÷ S of two relations R and S is the relation of arity r − s consisting of all tuples (a_1,…,a_{r−s}) such that for every tuple (b_1,…,b_s) in S, we have that (a_1,…,a_{r−s},b_1,…,b_s) is in R.

• Assume that R has arity 5 and S has arity 3. Express R ÷ S in relational calculus:

  {(x_1,x_2): (∀x_3)(∀x_4)(∀x_5)(S(x_3,x_4,x_5) → R(x_1,x_2,x_3,x_4,x_5))}

• Much simpler than the relational algebra expression for R ÷ S.

57 Relational Algebra vs. Relational Calculus

Codd's Theorem (informal): Relational Algebra and Relational Calculus have essentially the same expressive power, i.e., they can express the same queries.

Note:

• This statement is not entirely accurate.

• In what follows, we will give a rigorous formulation of Codd's Theorem and sketch its proof.

58 Queries

Definition: Let S be a relational database schema. A k-ary query on S is a function q defined on database instances over S such that if I is a database instance over S, then q(I) is a k-ary relation that is invariant under isomorphisms and has values among those occurring in the relations in I (i.e., if h: I → J is an isomorphism, then q(J) = h(q(I))).

Note:

• All "queries" that we have expressed in relational algebra and/or in relational calculus so far are queries in the above sense.

• In particular, a relational calculus expression of the form {(x_1,…,x_k): ϕ(x_1,…,x_k)} defines a k-ary query.

59 From Relational Algebra to Relational Calculus

Theorem: For every relational algebra expression E, there is an equivalent relational calculus expression {(x_1,…,x_k): ϕ(x_1,…,x_k)}.

Proof: By induction on the construction of relational algebra expressions.

• If E is a relation R of arity k, then we take {(x_1,…,x_k): R(x_1,…,x_k)}.

• Assume E_1 and E_2 are expressible by {(x_1,…,x_k): ϕ_1(x_1,…,x_k)} and by {(x_1,…,x_m): ϕ_2(x_1,…,x_m)}. Then
  - E_1 ∪ E_2 (where k = m) is expressible by {(x_1,…,x_k): ϕ_1(x_1,…,x_k) ∨ ϕ_2(x_1,…,x_k)}.
  - E_1 − E_2 (where k = m) is expressible by {(x_1,…,x_k): ϕ_1(x_1,…,x_k) ∧ ¬ϕ_2(x_1,…,x_k)}.
  - E_1 × E_2 is expressible by {(x_1,…,x_k,y_1,…,y_m): ϕ_1(x_1,…,x_k) ∧ ϕ_2(y_1,…,y_m)}.

60 From Relational Algebra to Relational Calculus

Theorem: For every relational algebra expression E, there is an equivalent relational calculus expression {(x_1,…,x_k): ϕ(x_1,…,x_k)}.

Proof (continued):

• Assume that E is expressible by {(x_1,…,x_k): ϕ(x_1,…,x_k)}. Then
  - π_{1,3}(E) is expressible by {(x_1,x_3): (∃x_2)(∃x_4)…(∃x_k) ϕ(x_1,…,x_k)}
  - σ_Θ(E) is expressible by {(x_1,…,x_k): Θ* ∧ ϕ(x_1,…,x_k)}, where Θ* is the rewriting of Θ as a formula of relational calculus.

Corollary: Relational Calculus is relationally complete.

61 From Relational Calculus to Relational Algebra

Fact: It is not true that for every relational calculus expression ϕ, there is an equivalent relational algebra expression E.

Examples:

• {(x_1,…,x_k): ¬R(x_1,…,x_k)}

• {(x,y): ∃z(CHAIR(x,z) ∧ y ≠ z)}, where CHAIR(dpt, name)

• {x: ∀y,z ENROLLS(x,y,z)}, where ENROLLS(s-name, course, term)

62 From Relational Calculus to Relational Algebra

Note: The previous three relational calculus expressions produce different answers when we consider different domains over which the variables are interpreted.

Example: If the variables x_1,…,x_k range over a domain D, then {(x_1,…,x_k): ¬R(x_1,…,x_k)} = D^k − R.

Fact:

• The relational calculus expression {(x_1,…,x_k): ¬R(x_1,…,x_k)} is not "domain independent".

• The relational calculus expression {(x_1,…,x_k): S(x_1,…,x_k) ∧ ¬R(x_1,…,x_k)} is "domain independent".

63 From Relational Calculus to Relational Algebra

• Question: How can we go from relational calculus to relational algebra?

• Answer: There are two possibilities:
  - Restrict ourselves to "domain independent" relational calculus expressions.
  - "Relativize" the semantics of relational calculus expressions by fixing a domain over which the variables range.

64 Active Domain

Definition:

• The active domain adom(ϕ) of a relational calculus formula ϕ is the set of all constants that occur in ϕ.
  - If ϕ is R(x,y), then adom(ϕ) = ∅.
  - If ϕ is ∃y(R(x,y) ∧ (y > 3) ∧ (x < 5)), then adom(ϕ) = {3,5}.

• The active domain adom(I) of a relational database instance I is the set of all values that occur in the relations of I.

65 Active Domain and Relative Interpretations

Definition: Let ϕ(x_1,…,x_k) be a relational calculus formula and let I be a relational database instance.

• If D is a domain such that adom(ϕ) ∪ adom(I) ⊆ D, then ϕ^D(I) is the result of evaluating ϕ(x_1,…,x_k) over D and I, that is,
  - all variables and quantifiers are assumed to range over D;
  - the relation symbols in ϕ are interpreted by the relations in I.

• By definition, ϕ^adom(I) is ϕ^D(I), where D = adom(ϕ) ∪ adom(I).

66 Active Domain and Relative Interpretation

Example: Let ϕ be ¬R(x,y) and let I be the instance with R = {(1,2)}.

• adom(I) = {1,2}

• ϕ^adom(I) = {(2,1),(1,1),(2,2)}

• If D = {1,2,3}, then ϕ^D(I) = {(2,1),(1,1),(2,2),(3,3),(1,3),(3,1),(2,3),(3,2)}

67 Active Domain and Relative Interpretation

Example: Let ϕ be ∃y R(x,y) and let I be the instance with R = {(1,1),(1,2),(2,1),(1,3)}.

• adom(I) = {1,2,3}

• ϕ^adom(I) = {1,2}

• If D = {1,2,3,4}, then ϕ^D(I) = {1,2}.

• More generally, if adom(I) ⊆ D, then ϕ^D(I) = {1,2}.

68 Active Domain and Relative Interpretation

Example: Let ϕ be ∀y R(x,y) and let I be the instance with R = {(1,1),(1,2),(2,1)}.

• adom(I) = {1,2}

• ϕ^adom(I) = {1}

• If D = {1,2,3}, then ϕ^D(I) = ∅.
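The three examples of relative interpretation can be replayed in Python by passing the domain D explicitly. The evaluator functions below are hand-written for these three formulas only and their names are hypothetical; this is a sketch, not a general relational calculus evaluator.

```python
def adom(R):
    """Active domain of an instance consisting of one relation R."""
    return {v for t in R for v in t}


def not_R(R, D):
    """Evaluate the formula NOT R(x, y) over domain D."""
    return {(x, y) for x in D for y in D if (x, y) not in R}


def exists_y_R(R, D):
    """Evaluate EXISTS y R(x, y) over domain D."""
    return {x for x in D if any((x, y) in R for y in D)}


def forall_y_R(R, D):
    """Evaluate FORALL y R(x, y) over domain D."""
    return {x for x in D if all((x, y) in R for y in D)}


I1 = {(1, 2)}
I2 = {(1, 1), (1, 2), (2, 1), (1, 3)}
I3 = {(1, 1), (1, 2), (2, 1)}

# NOT R grows with the domain; EXISTS y R does not; FORALL y R shrinks.
print(len(not_R(I1, adom(I1))), len(not_R(I1, {1, 2, 3})))
print(exists_y_R(I2, adom(I2)), exists_y_R(I2, {1, 2, 3, 4}))
print(forall_y_R(I3, adom(I3)), forall_y_R(I3, {1, 2, 3}))
```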

69 Domain Independence

Definition: A relational calculus formula ϕ is domain independent if for every relational instance I and every domain D such that adom(ϕ) ∪ adom(I) ⊆ D, we have that ϕ^D(I) = ϕ^adom(I).

Examples:

• ¬R(x_1,…,x_k) is not domain independent.

• ∃y R(x,y) is domain independent.

• ∀y R(x,y) is not domain independent.

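• ∀y(R(x,y) → y > 5) is not domain independent: every x in D with no tuple (x,y) in R satisfies it vacuously, so the answer grows with D.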

70 Equivalence of Relational Algebra and Relational Calculus

Theorem: The following are equivalent for a k-ary query q:

1. There is a relational algebra expression E such that q(I) = E(I), for every database instance I (in other words, q is expressible in relational algebra).

2. There is a domain independent relational calculus formula ϕ such that q(I) = ϕ^adom(I) (in other words, q is expressible in domain independent relational calculus).

3. There is a relational calculus formula ψ such that q(I) = ψ^adom(I) (in other words, q is expressible in relational calculus under the active domain interpretation).

71 Equivalence of Relational Algebra and Relational Calculus

Proof (Sketch):

1 ⇒ 2. Show by induction that the earlier translation of relational algebra to relational calculus is actually a translation of relational algebra to domain independent relational calculus.

2 ⇒ 3. This implication is obvious.

3 ⇒ 1.
• Show first that for every relational database schema S, there is a relational algebra expression E such that for every database instance I, we have that adom(I) = E(I).
• Use the above fact and induction on the construction of relational calculus formulas to obtain a translation of relational calculus under the active domain interpretation to relational algebra.

72 Equivalence of Relational Algebra and Relational Calculus

• In this translation, the most interesting part is the simulation of the universal quantifier ∀ in relational algebra.

• It uses the logical equivalence ∀y ψ ≡ ¬∃y ¬ψ.

• As an illustration, consider ∀y R(x,y), where R is the only relation of the instance I.
  - ∀y R(x,y) ≡ ¬∃y ¬R(x,y)
  - adom(I) = π_1(R) ∪ π_2(R)

Rel. Calc. formula ϕ   Relational Algebra Expression for ϕ^adom
¬R(x,y)                ((π_1(R) ∪ π_2(R)) × (π_1(R) ∪ π_2(R))) − R
∃y ¬R(x,y)             π_1(((π_1(R) ∪ π_2(R)) × (π_1(R) ∪ π_2(R))) − R)
¬∃y ¬R(x,y)            (π_1(R) ∪ π_2(R)) − π_1(((π_1(R) ∪ π_2(R)) × (π_1(R) ∪ π_2(R))) − R)
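The last row of the table can be traced step by step in Python, with one set operation per algebra operation. The function name is hypothetical, and the instance is assumed to consist of the single binary relation R.

```python
def forall_y_R_algebra(R):
    """Evaluate FORALL y R(x, y) under the active domain interpretation,
    using only union, cartesian product, difference, and projection."""
    # adom(I) = pi_1(R) union pi_2(R)
    adom = {t[0] for t in R} | {t[1] for t in R}
    # NOT R(x, y): (adom x adom) - R
    not_R = {(x, y) for x in adom for y in adom} - R
    # EXISTS y NOT R(x, y): projection on the first column
    exists_not = {t[0] for t in not_R}
    # NOT EXISTS y NOT R(x, y): adom - pi_1(...)
    return adom - exists_not


R = {(1, 1), (1, 2), (2, 1)}
answer = forall_y_R_algebra(R)
print(answer)
```

On this instance only the value 1 is paired with every element of the active domain, which agrees with the earlier ∀y R(x,y) example.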

73 Equivalence of Relational Algebra and Relational Calculus

Remarks:

• The Equivalence Theorem is effective. Specifically, the proof of this theorem yields two algorithms:
  - an algorithm for translating from relational algebra to domain independent relational calculus, and
  - an algorithm for translating from domain independent relational calculus to relational algebra.

• Each of these two algorithms runs in linear time.

74 Domain Independent Relational Calculus

Note:

• A desirable feature of a logical formalism is that there is an (efficient) algorithm for determining whether or not an expression is a formula of that formalism.

• Both relational algebra and relational calculus have this property.

Question:

• Does domain independent relational calculus have this property?

• In other words, is there an algorithm such that, given a relational calculus formula ϕ, the algorithm tells whether or not ϕ is domain independent?

75 Domain Independent Relational Calculus

Bad News…

Theorem (Di Paola, 1969): Determining domain independence is an undecidable problem, i.e., there is no algorithm such that, given a relational calculus formula ϕ, the algorithm tells whether or not ϕ is domain independent.

Some Good News:

Theorem: Domain independent relational calculus has an effective syntax, i.e., there is a class F of relational calculus formulas such that:

• There is an (efficient) algorithm for testing membership in F.

• Every formula in F is domain independent.

• Every domain independent relational calculus formula is logically equivalent to a formula in F.

76 Domain Independent Relational Calculus

For much more on domain independence:

• Read Sections 5.3 and 5.4 of "Foundations of Databases" by Abiteboul, Hull, and Vianu.

• Read the papers
  - "The recursive unsolvability of the decision problem for the class of definite formulas" by Robert A. Di Paola, JACM, Vol. 16, 1969, pages 324-327.
  - "Safety and translation of relational calculus" by Allen van Gelder and Rodney Topor, ACM Transactions on Database Systems, Vol. 16, 1991, pages 235-278.

77 Queries (Revisited)

Definition: Let S be a relational database schema.

• A k-ary query on S is a function q defined on database instances over S such that if I is a database instance over S, then q(I) is a k-ary relation on adom(I) that is invariant under isomorphisms (i.e., if h: I → J is an isomorphism, then q(J) = h(q(I))).

• A Boolean query on S is a function q defined on database instances over S such that if I is a database instance over S, then q(I) = 0 or q(I) = 1, and q(I) is invariant under isomorphisms.

Example: The following are Boolean queries on graphs:

• Given a graph E (binary relation), is the diameter of E at most 3?

• Given a graph E (binary relation), is E connected?

78 Three Fundamental Algorithmic Problems about Queries

• The Query Evaluation Problem: Given a query q and a database instance I, find q(I).

• The Query Equivalence Problem: Given two queries q and q’ of the same arity, is it the case that q ≡ q’? (i.e., is it the case that, for every database instance I, we have that q(I) = q’(I)?)

• The Query Containment Problem: Given two queries q and q’ of the same arity, is it the case that q ⊆ q’? (i.e., is it the case that, for every database instance I, we have that q(I) ⊆ q’(I)?)

79 Three Fundamental Algorithmic Problems about Queries

• The Query Evaluation Problem is the main problem in query processing.

• The Query Equivalence Problem underlies query processing and optimization, as we often need to transform a given query to an equivalent one.

• The Query Containment Problem and the Query Equivalence Problem are closely related to each other:

  • q ≡ q’ if and only if q ⊆ q’ and q’ ⊆ q.

  • q ⊆ q’ if and only if q ≡ q ∧ q’.

80 Three Fundamental Algorithmic Problems about Queries

• Our goal is to investigate the algorithmic aspects of these problems for queries expressible in relational algebra / relational calculus.

• The questions we want to address are:

  • How can we measure the precise “difficulty” of these problems?

  • Are there “good” algorithms for solving these problems?

  • If not, are there special cases of these problems for which “good” algorithms exist?

81 Three Fundamental Algorithmic Problems about Queries

Our study of these problems will use concepts and methods from two different, yet related, areas:

• Mathematical Logic:

  • Decidable and Undecidable Problems

• Computational Complexity Theory:

  • Complexity Classes and Complete Problems

  • In particular, the classes P and NP, and NP-complete problems.

82 Decision Problems and Languages

• Definition (informal): A decision problem Q consists of a set of inputs and a question with a “yes” or “no” answer for each input.

[Figure: an input x is fed to Q, which answers 1 (“yes”) or 0 (“no”).]

• Definition:
  • Σ* is the set of all strings over a finite alphabet Σ.
  • A language over Σ is a set L ⊆ Σ*.
  • Every language L gives rise to the following decision problem: Given x ∈ Σ*, is x ∈ L?
  • Conversely, every decision problem can be thought of as arising from a language, namely, the language consisting of all inputs with a “yes” answer.

83 Turing Computability

• Turing machines

• Turing computable (partial) functions f: Σ* → Σ*

• Church’s Thesis (aka Church-Turing Thesis): The following statements are equivalent for a (partial) function f: Σ* → Σ*:

  • There is a Turing machine that computes f.

  • There is an algorithm that computes f.

• Main Use of Church’s Thesis: To show that there is no algorithm for computing a function f, it suffices to show that there is no Turing machine that computes f.

84 Recursive and Recursively Enumerable Languages

• Definition: Let L ⊆ Σ* be a language.

• L is recursive if its characteristic function χ_L is Turing computable, where

  • χ_L(x) = 1 if x ∈ L

  • χ_L(x) = 0 if x ∉ L.

• L is recursively enumerable if its semi-characteristic function s_L is Turing computable, where

  • s_L(x) = 1 if x ∈ L

  • s_L(x) = undefined if x ∉ L.

• Theorem: The following are equivalent for a language L ⊆ Σ*:

  • L is recursive.

  • Both L and its complement Σ* \ L are recursively enumerable.

85 Decidable and Undecidable Problems

• Definition: Let Q be a decision problem.

  • Q is decidable (solvable) if the language associated with Q is recursive.

  • Q is undecidable (unsolvable) if the language associated with Q is not recursive.

[Figure: an input x is fed to Q, which answers 1 (“yes”) or 0 (“no”).]

Q is undecidable means that there is no algorithm for this problem.

86 Undecidable Problems

Fact: Undecidable problems exist.
Proof: Use a counting argument:

• There are countably many Turing machines.

• There are uncountably many languages L ⊆ {0,1}*.

• Hence, some language is not decided by any Turing machine.

Theorem: Many natural problems of algorithmic interest or of mathematical significance are undecidable.

87 Undecidable Problems

Theorem: The following problems are undecidable:

• The Halting Problem (A. Turing – 1936): Given a Turing machine M and an input x, does M halt on x?

• The Finite Validity Problem (B. Trakhtenbrot – 1949): Given a first-order formula ϕ on graphs, is ϕ true on every finite graph?

• Hilbert’s 10th Problem (Y. Matiyasevich – 1971): Given a multivariate polynomial p(x₁,…,xₙ) with integer coefficients, does p(x₁,…,xₙ) have an all-integer solution?

88 Undecidable Problems

• The Halting Problem (A. Turing – 1936): Given a Turing machine M and an input x, does M halt on x?

• Implications of Undecidability of the Halting Problem:
  • The undecidability of the Halting Problem implies that there is no algorithm such that, given a C program p and an input x, the algorithm determines whether the program p produces an output on input x or goes into an infinite loop.
  • Of course, it may still be possible to show that a particular program terminates on a given input (or even on every input), but it is not possible to automate this process for every program.
  • But even this may be a difficult task…

89 Proving Program Termination

• McCarthy’s Program: Given a positive integer n:
  While n > 1, do:
  • If n is even, then set n := n/2;
  • If n is odd, then set n := 3n+1.

• Example Run:
  • n = 11 → 34 → 17 → 52 → 26 → 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1.

• Open Problem: Does this program terminate on every input?
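
The loop on this slide can be run directly. The following is a minimal Python sketch, not part of the original slides; the function name `trajectory` is an assumption made for illustration. It records the values visited, so the example run for n = 11 can be reproduced. Whether the `while` loop terminates on every positive input is exactly the open problem stated above.

```python
def trajectory(n):
    """Return the list of values visited by the loop, starting at n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    seq = [n]
    while n > 1:
        # Halve even numbers; map odd numbers n to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        seq.append(n)
    return seq


print(" -> ".join(str(v) for v in trajectory(11)))
```

Observing that a particular call returns shows termination on that one input only; it says nothing about all inputs.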

90 Undecidable Problems

• The Finite Validity Problem (B. Trakhtenbrot – 1949): Given a first-order formula ϕ on graphs, is ϕ true on every finite graph?

• Examples of Finitely Valid Formulas:

  • ∀x (E(x,x) → ∃y E(x,y))

  • ∀x ∀y (E(x,x) ∧ x=y → E(y,y))

  • “if E is a total order, then E has a biggest element”

• Examples of Non-Finitely-Valid Formulas:

  • ∀x ∀y (E(x,y) → E(y,x))

  • (∀x ∃y E(x,y)) → (∃y ∀x E(x,y))

• The undecidability of the Finite Validity Problem implies that there is no algorithm for telling formulas in the first group from formulas in the second group.

91 Undecidable Problems

• Hilbert’s 10th Problem (Y. Matiyasevich – 1971): Given a multivariate polynomial p(x₁,…,xₙ) with integer coefficients, does p(x₁,…,xₙ) have an all-integer root? (i.e., does the equation p(x₁,…,xₙ) = 0 have an all-integer solution?)

• Diophantine Equations (Diophantus of Alexandria, 3rd Century AD):
  • 3x + 5y − 8z = 0
  • x² − 2xy + z³ + 9 = 0
  • x² − 100y² + 1 = 0
  • x² + y² − z² = 0
  • x³ + y³ − z³ = 0

• The undecidability of Hilbert’s 10th Problem implies that there is no algorithm to tell whether or not a given Diophantine equation has a solution consisting entirely of integers.

92 Undecidable Problems

Note:

• The Halting Problem is recursively enumerable, but not recursive (hence, its complement is not recursively enumerable).

• The Finite Validity Problem is co-recursively enumerable, but not recursive (hence, it is not even recursively enumerable).

• Hilbert’s 10th Problem is recursively enumerable, but not recursive (hence, its complement is not recursively enumerable).

93 The Reduction Method

• By now there is a vast collection of undecidable problems.

• The Reduction Method is the main technique for establishing undecidability.

• Reduction Method: To show that a language L* is not recursive, it suffices to find a non-recursive language L and a total Turing computable function f such that for every string x, we have that x ∈ L ⇔ f(x) ∈ L*.
  • Such a function f is called a reduction of L to L*.
  • L ≼ L* means that there is a reduction of L to L*.

94 The Reduction Method

• The Halting Problem was the first fundamental decision problem shown to be undecidable.

• The Finite Validity Problem was shown to be undecidable by showing that Halting Problem ≼ Finite Validity Problem.

• Many database problems have been shown to be undecidable via reductions from one of the following problems:
  • The Halting Problem
  • The Finite Validity Problem
  • Hilbert’s 10th Problem.

• In particular, DiPaola proved that the Domain Independence Problem for relational calculus formulas is undecidable by showing that Finite Validity Problem ≼ Domain Independence.

95 Undecidability of the Query Equivalence Problem

• The Query Equivalence Problem: Given two queries q and q’ of the same arity, is it the case that q ≡ q’? (i.e., is q(I) = q’(I) on every database instance I?)

• Theorem: The Query Equivalence Problem for relational calculus queries is undecidable.
Proof: Finite Validity Problem ≼ Query Equivalence Problem

  • To see this, let ψ* be a fixed finitely valid relational calculus sentence (say, ∀x (E(x,x) → ∃y E(x,y))).

  • Then, for every relational calculus sentence ϕ, we have that ϕ is finitely valid ⇔ ϕ ≡ ψ*.

96 Undecidability of the Query Containment Problem

• The Query Containment Problem: Given two queries q and q’ of the same arity, is it the case that q ⊆ q’? (i.e., is q(I) ⊆ q’(I) on every database instance I?)

• Corollary: The Query Containment Problem for relational calculus queries is undecidable.
Proof: Query Equivalence ≼ Query Containment, since q ≡ q’ ⇔ q ⊆ q’ and q’ ⊆ q.

• Notice the chain of reductions: Halting Problem ≼ Finite Validity ≼ Query Equiv. ≼ Query Cont.

97 The Query Evaluation Problem

• The Query Evaluation Problem: Given a query q and a database instance I, find q(I).

• The Query Evaluation Problem for relational calculus queries is decidable, but, as we will see, it has high computational complexity.

• To understand the precise algorithmic difficulty of the Query Evaluation Problem, we need some basic notions and results from computational complexity.

98 Decidable Problems and Computational Complexity

• Computational Complexity is the quantitative study of decidable problems.

[Figure: the decidable problems shown as a region inside the undecidable problems.]

• “From these and other considerations grew our deep conviction that there must be quantitative laws that govern the behavior of information and computing. The results of this research effort were summarized in our first paper on this topic, which also named this new research area, "On the computational complexity of algorithms".”

J. Hartmanis, Turing Award Lecture, 1993

99 Computational Complexity Classes

• Decidable problems are grouped together in computational complexity classes.
• Each computational complexity class consists of all problems that can be solved in a computational model under certain restrictions on the resources used to solve the problem.
• Examples of computational models:
  • Turing Machine TM (deterministic Turing machine)
  • Nondeterministic Turing machine NTM
  • …

• Examples of resources:
  • Amount of time needed to solve the problem
  • Amount of space (memory) needed to solve the problem
  • …

100 The Five Basic Computational Complexity Classes

• LOGSPACE (or, L): All decision problems solvable by a TM using extra memory bounded by a logarithmic amount in the input size.

• NLOGSPACE (or, NL): All decision problems solvable by a NTM using extra memory bounded by a logarithmic amount in the input size.

• P (or, PTIME): All decision problems solvable by a TM in time bounded by some polynomial in the input size.

• NP: All decision problems solvable by a NTM in time bounded by some polynomial in the input size.

• PSPACE: All decision problems solvable by a TM using memory bounded by a polynomial in the input size.

101 The Five Basic Computational Complexity Classes

Theorem:
• The following inclusions hold: LOGSPACE ⊆ NLOGSPACE ⊆ P ⊆ NP ⊆ PSPACE.

• Moreover, it is known that LOGSPACE ⊂ PSPACE (and, likewise, NLOGSPACE ⊂ PSPACE).

• No proper inclusion between two consecutive classes in this chain is known at present. In particular, it is not known whether P = NP.

Note:
• The question “is P = NP?” is the central open problem in computational complexity.
• It is one of the Millennium Prize Problems; see http://www.claymath.org/millennium/

102 Complete Problems

• A key property of most complexity classes is that they possess complete problems.

• Intuitively, complete problems are the “hardest” problems in the class, in the sense that every other problem in the class can be reduced to them.

• Definition: Let C be a complexity class. A decision problem Q is C-complete if
  • Q is in C.
  • If Q’ is in C, then there is a “suitable” total Turing computable function f such that for every string x, we have that x ∈ Q’ ⇔ f(x) ∈ Q.

• “Suitable” means that f can be computed with fewer resources than those used to define C.
• So, f is a reduction of a restricted nature.

103 Complete Problems and Reductions

Complexity Class    Reductions for Complete Problems
PSPACE              Polynomial-time computable
NP                  Polynomial-time computable
P                   Logspace computable
NL                  Logspace computable

• Definition: A decision problem Q is NP-complete if
  • Q is in NP.
  • If Q’ is in NP, then there is a polynomial-time computable function f such that for every string x, we have that x ∈ Q’ ⇔ f(x) ∈ Q.

• Such an f is a polynomial-time reduction of Q’ to Q (Q’ ≼p Q).

104 Complete Problems for Computational Complexity Classes

• PSPACE-complete:
  • Quantified Boolean Formulas (QBF): Given a quantified Boolean formula ∀x₁ ∃x₂ … ∀xₖ ϕ, is it true?
• NP-complete:
  • Satisfiability (SAT): Given a CNF formula ϕ, is it satisfiable?
  • 3-Colorability: Given a graph G = (V,E), is it 3-colorable?
  • Integer Linear Inequalities (ILI): Given a system of linear inequalities with integer coefficients, does it have an integer solution?
• P-complete:
  • Horn-SAT: Given a Horn CNF formula ϕ, is it satisfiable?
  • Linear Inequalities (LI): Given a system of linear inequalities with integer coefficients, does it have a rational solution?
• NL-complete:
  • Directed Graph Reachability: Given a directed graph G = (V,E) and two nodes s and t, is there a path from s to t?

105 Polynomial Time Reductions

• 3-Satisfiability (3-SAT): Given a 3CNF formula ϕ, is it satisfiable? (each clause has at most 3 literals)

• Theorem: 3-SAT is NP-complete.

Proof: Show that SAT ≼p 3-SAT

• Let ϕ be a CNF formula c₁ ∧ c₂ ∧ … ∧ cₘ.

• If a clause cᵢ has more than three literals, then we replace it with a set of clauses, each with three literals and certain new variables.

• For example, if cᵢ is (x₁ ∨ ¬x₂ ∨ x₃ ∨ x₄ ∨ x₅), then we replace cᵢ by (x₁ ∨ ¬x₂ ∨ y₁), (¬y₁ ∨ x₃ ∨ y₂), (¬y₂ ∨ x₄ ∨ x₅).
• Let ϕ* be the resulting 3CNF formula. Then ϕ is satisfiable ⇔ ϕ* is satisfiable (check this).
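
The clause-splitting step of this reduction can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the slides: literals are plain strings, negation is written with a leading `~`, and the helper names `fresh_vars` and `split_clause` are hypothetical.

```python
from itertools import count


def fresh_vars():
    """Yield new variable names y1, y2, ... (assumed not to occur in the input)."""
    return (f"y{i}" for i in count(1))


def split_clause(clause, fresh):
    """Split a clause with more than 3 literals into a chain of 3-literal clauses."""
    if len(clause) <= 3:
        return [list(clause)]
    y = next(fresh)
    out = [[clause[0], clause[1], y]]       # first link: two literals + new variable
    rest = clause[2:]
    while len(rest) > 2:
        y_next = next(fresh)
        out.append(["~" + y, rest[0], y_next])  # middle link: one original literal
        y, rest = y_next, rest[1:]
    out.append(["~" + y, rest[0], rest[1]])     # last link: two original literals
    return out


print(split_clause(["x1", "~x2", "x3", "x4", "x5"], fresh_vars()))
```

On the clause from the slide, the sketch produces the same three clauses as the example. A clause with k > 3 literals yields k − 2 clauses, so the output size is linear in the input size.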

106 Complete Problems for Computational Complexity Classes

• Theorem: Let C and C’ each be one of the complexity classes NLOGSPACE, P, NP, PSPACE such that C ⊆ C’, and let Q be a C’-complete problem. Then the following statements are equivalent:
  • C = C’.
  • Q is in C.
  In particular, the following statements are equivalent:
  • P = NP.
  • SAT is in P.
  • Your favorite NP-complete problem is in P.

• Conclusion:
  • Complete problems hold the secret of whether or not a higher computational complexity class collapses to a lower one.
  • Showing that a decision problem Q is NP-complete provides strong evidence that Q is not in P.

107 Complexity of the Query Evaluation Problem

• The Query Evaluation Problem for Relational Calculus: Given a relational calculus formula ϕ and a database instance I, find ϕ^adom(I).

• The Query Evaluation Problem for Relational Algebra: Given a relational algebra expression E and a database instance I, find E(I).

• Theorem: The Query Evaluation Problem for Relational Calculus is PSPACE-complete.

• Corollary: The Query Evaluation Problem for Relational Algebra is PSPACE-complete.

108 Complexity of the Query Evaluation Problem

• Theorem: The Query Evaluation Problem for Relational Calculus is PSPACE-complete.

Proof: We need to show that

• This problem is in PSPACE (i.e., give a PSPACE algorithm for it).

• This problem is PSPACE-hard.

We start with the second task.

109 Complexity of the Query Evaluation Problem

• Theorem: The Query Evaluation Problem for Relational Calculus is PSPACE-hard.
• Proof: Show that

Quantified Boolean Formulas ≼p Query Evaluation for Rel. Calc.

Given a QBF ∀x₁ ∃x₂ … ∀xₖ ψ:
• Let V and P be two unary relation symbols.
• Obtain ψ* from ψ by replacing xᵢ by P(xᵢ), and ¬xᵢ by ¬P(xᵢ).
• Let I be the database instance with V = {0,1}, P = {1}.
• Then the following statements are equivalent:

  • ∀x₁ ∃x₂ … ∀xₖ ψ is true.

  • ∀x₁ (V(x₁) → ∃x₂ (V(x₂) ∧ (… ∀xₖ (V(xₖ) → ψ*)) …)) is true on I.
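
To make the reduction concrete, here is a small Python sketch that decides a QBF by evaluating the relativized sentence on the two-element instance with V = {0,1} and P = {1}. The representation is an assumption chosen for illustration: the prefix is a list of `("forall" | "exists", name)` pairs, and the matrix ψ is a Python function over a dictionary of truth values; `qbf_via_query` is a hypothetical name.

```python
def qbf_via_query(prefix, matrix):
    """Decide a QBF by evaluating its relativized sentence on V = {0,1}, P = {1}."""
    V = {0, 1}  # unary relation V: the elements the quantifiers range over
    P = {1}     # unary relation P: the element that plays the role of "true"

    def holds(i, assignment):
        if i == len(prefix):
            # psi*: each Boolean variable x is read as the atom P(x).
            return matrix({x: (a in P) for x, a in assignment.items()})
        quantifier, x = prefix[i]
        branches = (holds(i + 1, {**assignment, x: a}) for a in V)
        return all(branches) if quantifier == "forall" else any(branches)

    return holds(0, {})


# forall x1 exists x2 (x1 <-> x2) is true; swapping the quantifiers makes it false.
print(qbf_via_query([("forall", "x1"), ("exists", "x2")], lambda v: v["x1"] == v["x2"]))
print(qbf_via_query([("exists", "x2"), ("forall", "x1")], lambda v: v["x1"] == v["x2"]))
```

The instance is fixed and tiny; all the difficulty sits in the formula, which foreshadows the distinction between query complexity and data complexity made later.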

110 Complexity of the Query Evaluation Problem

• Theorem: The Query Evaluation Problem for Relational Calculus is in PSPACE.

Proof (Hint): Let ϕ be a relational calculus formula ∀x₁ ∃x₂ … ∀xₘ ψ and let I be a database instance.

• Exponential Time Algorithm: We can find ϕ^adom(I) by exhaustively cycling over all possible interpretations of the xᵢ’s. This runs in time O(n^m), where n = |I| (size of I).

• A more careful analysis shows that this algorithm can be implemented in O(m·log n) space.
  • Use m blocks of memory, each holding one of the n elements of adom(I) written in binary (so O(log n) space is used in each block).
  • Maintain also m counters in binary to keep track of the number of elements examined.

[Figure: one memory block per quantifier ∀x₁, ∃x₂, …, ∀xₘ, holding elements a₁, a₂, …, aₘ of adom(I) written in binary.]
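
The exhaustive algorithm can be sketched as a recursion with one slot per quantified variable. This is a minimal Python illustration under assumptions: the quantifier-free part ψ is given as a Python function over the current variable values, and the name `evaluate_prenex` is hypothetical. The shared dictionary `values` plays the role of the m memory blocks.

```python
def evaluate_prenex(prefix, matrix, adom):
    """Evaluate a prenex sentence by cycling over all values in adom for each variable."""
    values = {}  # one slot per quantified variable, overwritten in place

    def holds(i):
        if i == len(prefix):
            return matrix(values)
        quantifier, x = prefix[i]

        def try_value(a):
            values[x] = a  # reuse the same slot for every candidate value
            return holds(i + 1)

        if quantifier == "forall":
            return all(try_value(a) for a in adom)
        return any(try_value(a) for a in adom)

    return holds(0)


E = {(1, 2), (2, 3), (3, 1)}
adom = sorted({a for edge in E for a in edge})
edge = lambda v: (v["x"], v["y"]) in E
print(evaluate_prenex([("forall", "x"), ("exists", "y")], edge, adom))  # every node has a successor
print(evaluate_prenex([("exists", "y"), ("forall", "x")], edge, adom))  # some node is everyone's successor
```

The recursion depth is m and each level stores a single element of adom(I), which matches the space bound on the slide; the running time remains O(n^m) in the worst case.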

111 Complexity of the Query Evaluation Problem

• Corollary: The Query Evaluation Problem for Relational Algebra is PSPACE-complete.
Proof: The translation of relational calculus to relational algebra yields a polynomial-time reduction of the Query Evaluation Problem for Relational Calculus to the Query Evaluation Problem for Relational Algebra.

112 Summary

• The Query Evaluation Problem for Relational Calculus is PSPACE-complete.

• The Query Equivalence Problem for Relational Calculus is undecidable.

• The Query Containment Problem for Relational Calculus is undecidable.

113 Computational Complexity Classes

Classification of Decidable Problems (not on scale)

[Figure: nested classes, from innermost to outermost: LOGSPACE, NLOGSPACE, P, NP, PSPACE, …]

There are many other complexity classes. For a comprehensive catalog, visit the Complexity Zoo at qwiki.stanford.edu/wiki/Complexity_Zoo

116 The Query Evaluation Problem Revisited

• Since the Query Evaluation Problem for Relational Calculus is PSPACE-hard, there are no polynomial-time algorithms for this problem, unless PSPACE = P (which is considered highly unlikely).

• Let’s take another look at the exponential time algorithm for this problem:

  • Let ϕ be a relational calculus formula ∀x₁ ∃x₂ … ∀xₘ ψ and let I be a database instance.
  • Exponential Time Algorithm: We can find ϕ^adom(I) by exhaustively cycling over all possible interpretations of the xᵢ’s. This runs in time O(n^m), where n = |I|.
  • So, the running time is O(|I|^|ϕ|), where |I| is the size of I and |ϕ| is the size of the relational calculus formula ϕ.

• This tells us that the source of exponentiality is the formula size.

117 The Query Evaluation Problem Revisited

• Theorem: Let ϕ be a fixed relational calculus formula. Then the following problem is solvable in polynomial time: given a database instance I, find ϕ^adom(I). In fact, this problem is in LOGSPACE.

• Proof: Let ϕ be a fixed relational calculus formula ∀x₁ ∃x₂ … ∀xₘ ψ.
  • The previous algorithm has running time O(|I|^|ϕ|), which is a polynomial, since now |ϕ| is a constant.

  • Moreover, the algorithm can now be implemented using logarithmic space only, since we need only maintain a constant number of memory blocks, each of logarithmic size.

[Figure: one memory block per quantifier ∀x₁, ∃x₂, …, ∀xₘ, holding elements a₁, a₂, …, aₘ of adom(I) written in binary.]

118 Vardi’s Taxonomy of the Query Evaluation Problem

M. Y. Vardi, “The Complexity of Relational Query Languages”, 1982

• Definition: Let L be a database query language.
  • The combined complexity of L is the decision problem: given an L-sentence ϕ and a database instance I, is ϕ true on I? (does I satisfy ϕ?) (in symbols, does I ⊨ ϕ?)

  • The data complexity of L is the family of the following decision problems Q_ϕ, where ϕ is an L-sentence: given a database instance I, does I ⊨ ϕ?

  • The query complexity of L is the family of the following decision problems Q_I, where I is a database instance: given an L-sentence ϕ, does I ⊨ ϕ?

119 Vardi’s Taxonomy of the Query Evaluation Problem

Note: Let L be a database query language.
• The input to the combined complexity problem consists of two parts: an L-sentence and a database instance.

• The input to a member of the data complexity of L consists of a database instance only (the L-sentence is fixed).
  • Hence, the data complexity of L is a special case of the combined complexity of L.

• The input to a member of the query complexity of L consists of an L-sentence only (the database instance is fixed).
  • Hence, the query complexity of L is a special case of the combined complexity of L.

120 Vardi’s Taxonomy of the Query Evaluation Problem

• Definition: Let L be a database query language and let C be a computational complexity class.
  • The data complexity of L is in C if for each L-sentence ϕ, the decision problem Q_ϕ is in C.

  • The query complexity of L is in C if for every database instance I, the decision problem Q_I is in C.

• Vardi’s discovery: For most query languages L:
  • The data complexity of L is of lower complexity than both the combined complexity of L and the query complexity of L.
  • The query complexity of L can be as hard as the combined complexity of L.

121 Taxonomy of the Query Evaluation Problem for Relational Calculus

The Query Evaluation Problem for Relational Calculus

Problem                Complexity
Combined Complexity    PSPACE-complete
Query Complexity       Is in PSPACE; it can be PSPACE-complete
Data Complexity        In LOGSPACE

[Figure: computational complexity classes, nested from innermost to outermost: LOGSPACE, NLOGSPACE, P, NP, PSPACE, …]

122 The Query Evaluation Problem for Relational Calculus

• Paradox:
  • The Query Evaluation Problem for Relational Calculus has very high combined complexity (PSPACE-complete, so “harder” than NP-complete).
  • Yet, database systems evaluate SQL queries “efficiently”.

• Resolution of the Paradox:
  • In practice, we deal with the data complexity of the Query Evaluation Problem for Relational Calculus, because we typically have a small fixed collection of queries to answer (while of course the database instances vary).
  • The data complexity of the Query Evaluation Problem for Relational Calculus is in LOGSPACE (hence, in PTIME); so, in principle, it is a tractable problem.

123 Sublanguages of Relational Calculus

• Question: Are there interesting sublanguages of relational calculus for which the Query Containment Problem and the Query Evaluation Problem are “easier” than for the full relational calculus?

• Answer:

  • Yes, the language of conjunctive queries is such a sublanguage.

  • Moreover, conjunctive queries are the most frequently asked queries against relational databases.

124 Conjunctive Queries

• Definition: A conjunctive query is a query expressible by a relational calculus formula in prenex normal form built from atomic formulas R(y₁,…,yₙ), and ∧ and ∃ only:

{(x₁,…,xₖ): ∃z₁ … ∃zₘ χ(x₁,…,xₖ, z₁,…,zₘ)},

where χ(x₁,…,xₖ, z₁,…,zₘ) is a conjunction of atomic formulas of the form R(y₁,…,yₘ).
• Equivalently, a conjunctive query is a query expressible by a relational algebra expression of the form

π_X(σ_Θ(R₁ × … × Rₙ)), where Θ is a conjunction of atomic formulas (equijoin).
• Equivalently, a conjunctive query is a query expressible by an SQL expression of the form SELECT … FROM … WHERE …

125 Conjunctive Queries

• Definition: A conjunctive query is a query expressible by a relational calculus formula in prenex normal form built from atomic formulas R(y₁,…,yₙ), and ∧ and ∃ only:

{(x₁,…,xₖ): ∃z₁ … ∃zₘ χ(x₁,…,xₖ, z₁,…,zₘ)}

• A conjunctive query can be written as a logic programming rule:

Q(x₁,…,xₖ) :- R₁(u₁), …, Rₙ(uₙ), where

  • Each variable xᵢ occurs in the right-hand side of the rule.

  • Each uᵢ is a tuple of variables (not necessarily distinct).
  • The variables occurring in the right-hand side (the body), but not in the left-hand side (the head) of the rule are existentially quantified (but the quantifiers are not displayed).
  • “,” stands for conjunction.

126 Conjunctive Queries

Examples:
• Path of Length 2: (Binary query) {(x,y): ∃z (E(x,z) ∧ E(z,y))}

  • As a relational algebra expression,

π₁,₄(σ_{$2=$3}(E × E))

  • As a rule: q(x,y) :- E(x,z), E(z,y)

• Cycle of Length 3: (Boolean query) ∃x ∃y ∃z (E(x,y) ∧ E(y,z) ∧ E(z,x))

  • As a rule (the head has no variables):
    Q :- E(x,y), E(y,z), E(z,x)
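
Both example queries can be evaluated directly on a binary relation stored as a Python set of pairs. This is an illustrative sketch; the function names are assumptions, not notation from the slides.

```python
def path_of_length_2(E):
    """{(x, y) : exists z (E(x, z) and E(z, y))}, i.e. project columns 1 and 4 of E x E where $2 = $3."""
    return {(x, y) for (x, z1) in E for (z2, y) in E if z1 == z2}


def has_cycle_of_length_3(E):
    """Boolean query: exists x, y, z with E(x, y), E(y, z), E(z, x)."""
    return any((z, x) in E for (x, y1) in E for (y2, z) in E if y1 == y2)


E = {(1, 2), (2, 3), (3, 1)}
print(path_of_length_2(E))
print(has_cycle_of_length_3(E))
```

The existentially quantified variables become loop variables that never appear in the output, which mirrors the rule notation where they occur only in the body.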

127 Conjunctive Queries

• Every relational join is a conjunctive query: Let P(A,B,C), R(B,C,D) be two relation symbols.
  • P ⋈ R = {(x,y,z,w): P(x,y,z) ∧ R(y,z,w)}

  • q(x,y,z,w) :- P(x,y,z), R(y,z,w) (no variables are existentially quantified)

  • SELECT P.A, P.B, P.C, R.D FROM P, R WHERE P.B = R.B AND P.C = R.C
• Conjunctive queries are also known as SPJ queries (SELECT-PROJECT-JOIN queries).
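
As a small illustration, the join of P(A,B,C) and R(B,C,D) can be written as a set comprehension that follows the rule q(x,y,z,w) :- P(x,y,z), R(y,z,w). The function name `natural_join` and the sample tuples are assumptions made for this sketch.

```python
def natural_join(P, R):
    """Join P(A, B, C) with R(B, C, D) on the shared columns B and C."""
    return {
        (x, y, z, w)
        for (x, y, z) in P
        for (y2, z2, w) in R
        if (y, z) == (y2, z2)  # the WHERE condition P.B = R.B AND P.C = R.C
    }


P = {("a", 1, 2), ("b", 1, 3)}
R = {(1, 2, "d"), (1, 2, "e"), (9, 9, "f")}
print(natural_join(P, R))
```

This nested-loop form is the simplest possible evaluation strategy; real systems pick among many join algorithms, which the sketch does not attempt to model.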

128 Conjunctive Query Evaluation and Containment

• Definition: Two fundamental problems about CQs

  • Conjunctive Query Evaluation (CQE): Given a conjunctive query q and an instance I, find q(I).

  • Conjunctive Query Containment (CQC):

    • Given two k-ary conjunctive queries q₁ and q₂, is it true that q₁ ⊆ q₂? (i.e., for every instance I, we have that q₁(I) ⊆ q₂(I))

    • Given two Boolean conjunctive queries q₁ and q₂, is it true that q₁ ⊨ q₂? (that is, for all I, if I ⊨ q₁, then I ⊨ q₂) CQC is logical implication.

129 CQE vs. CQC

• Recall that for relational calculus queries:
  • The Query Evaluation Problem is PSPACE-complete (combined complexity).
  • The Query Containment Problem is undecidable.

• Theorem: Chandra & Merlin, 1977
  • CQE and CQC are the “same” problem.
  • Moreover, each is an NP-complete problem.

• Question: What is the common link?
• Answer: The Homomorphism Problem

130 Isomorphisms Between Database Instances

• Definition: Let I and J be two database instances over the same relational schema S.

  • An isomorphism h: I → J is a function h: adom(I) → adom(J) such that

    • h is one-to-one and onto.

    • For every relational symbol P of S and every (a₁,…,aₘ), we have that (a₁,…,aₘ) ∈ P^I if and only if (h(a₁),…,h(aₘ)) ∈ P^J.
  • I and J are isomorphic if an isomorphism h from I to J exists.

• Note: Intuitively, two database instances are isomorphic if one can be obtained from the other by renaming the elements of its active domain in a 1-1 way.

131 A Digression to the Isomorphism Problem

• The Isomorphism Problem: Given two database instances I and J over the same relational schema, is I isomorphic to J?

• Fact: The exact computational complexity of the isomorphism problem is not known at present.
  • The isomorphism problem is in NP (this is easy).
  • The isomorphism problem is not known to be NP-complete (and there is evidence that it is not NP-complete).
  • The isomorphism problem is not known to be in P. Finding a polynomial-time algorithm for the isomorphism problem would be a major breakthrough.
  • Special cases of the isomorphism problem are known to be in P (for example, the isomorphism problem on planar graphs).

132 Homomorphisms

• Definition: Let I and J be two database instances over the same relational schema S. A homomorphism h: I → J is a function h: adom(I) → adom(J) such that for every relational symbol P of S and every (a₁,…,aₘ), we have that if (a₁,…,aₘ) ∈ P^I, then (h(a₁),…,h(aₘ)) ∈ P^J.

• Note: The concept of homomorphism is a relaxation of the concept of isomorphism, since every isomorphism is also a homomorphism, but not vice versa.

• Example:
  • A graph G = (V,E) is 3-colorable if and only if there is a homomorphism h: G → K₃.
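
A brute-force search for a homomorphism follows the definition literally: try every function from adom(I) to adom(J) and keep one that preserves all facts. The sketch below is an illustration under assumptions: instances are dictionaries from relation names to sets of tuples, and `find_homomorphism` is a hypothetical name. It takes exponential time, as expected for a naive search.

```python
from itertools import product


def active_domain(instance):
    """All elements that occur in some tuple of some relation."""
    return sorted({a for facts in instance.values() for t in facts for a in t})


def find_homomorphism(I, J):
    """Return a dict h: adom(I) -> adom(J) preserving every fact of I, or None."""
    dom_I, dom_J = active_domain(I), active_domain(J)
    for image in product(dom_J, repeat=len(dom_I)):
        h = dict(zip(dom_I, image))
        if all(tuple(h[a] for a in t) in J.get(rel, set())
               for rel, facts in I.items() for t in facts):
            return h
    return None


K3 = {"E": {(a, b) for a in "rgb" for b in "rgb" if a != b}}
triangle = {"E": {(1, 2), (2, 3), (3, 1)}}
K4 = {"E": {(a, b) for a in range(4) for b in range(4) if a != b}}
print(find_homomorphism(triangle, K3))  # a proper 3-coloring of the triangle
print(find_homomorphism(K4, K3))        # None: K4 is not 3-colorable
```

A returned dictionary is a coloring of the graph with the three elements of K₃, which is how the example on this slide reads a homomorphism.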

133 Homomorphisms

• Fact: Homomorphisms compose, i.e., if f: I → J and g: J → K are homomorphisms, then g◦f: I → K is a homomorphism, where g◦f(a) = g(f(a)).

• Definition:

  • Two database instances I and I’ are homomorphically equivalent if there is a homomorphism h: I → I’ and a homomorphism h’: I’ → I.

  • I ≡h I’ means that I and I’ are homomorphically equivalent.

• Note: I ≡h I’ does not imply that I and I’ are isomorphic.

135 The Homomorphism Problem

• Definition: The Homomorphism Problem: Given two database instances I and J, is there a homomorphism h: I → J?

• Notation: I → J denotes that a homomorphism from I to J exists.

• Theorem: The Homomorphism Problem is NP-complete.
Proof: Easy reduction from 3-Colorability: G is 3-colorable if and only if G → K₃.

• Exercise: Formulate 3SAT as a special case of the Homomorphism Problem.

136 The Homomorphism Problem

• Note: The Homomorphism Problem is a fundamental algorithmic problem:

  • Satisfiability can be viewed as a special case of it.

  • k-Colorability can be viewed as a special case of it.

  • Many AI problems, such as planning, can be viewed as a special case of it.

  • In fact, every constraint satisfaction problem can be viewed as a special case of the Homomorphism Problem (Feder and Vardi – 1993).

137 The Homomorphism Problem and Conjunctive Queries

• Theorem: Chandra & Merlin, 1977
  • CQE and CQC are the “same” problem.
  • Moreover, each is an NP-complete problem.

• Question: What is the common link?
• Answer:
  • Both CQE and CQC are “equivalent” to the Homomorphism Problem.
  • The link is established by bringing into the picture
    • Canonical conjunctive queries and
    • Canonical database instances.

138 Canonical CQs and Canonical Instances

• Definition: Canonical Conjunctive Query

Given an instance I = (R₁,…,Rₘ), the canonical CQ of I is the Boolean conjunctive query Q^I with (a renaming of) the elements of I as variables and the facts of I as conjuncts, where a fact of I is an expression Rᵢ(a₁,…,aₘ) such that (a₁,…,aₘ) ∈ Rᵢ.

• Example: I consists of E(a,b), E(b,c), E(c,a)

  • Q^I is given by the rule: Q^I :- E(x,z), E(z,y), E(y,x)
  • Alternatively, Q^I is ∃x ∃y ∃z (E(x,z) ∧ E(z,y) ∧ E(y,x))

139 Canonical Conjunctive Query

• Example: K₃, the complete graph with 3 nodes

K₃ is a database instance with one binary relation E, where E = {(b,r), (r,b), (b,g), (g,b), (r,g), (g,r)}

• The canonical conjunctive query Q^{K₃} of K₃ is given by the rule:
  Q^{K₃} :- E(x,y), E(y,x), E(x,z), E(z,x), E(y,z), E(z,y)

• The canonical conjunctive query Q^{K₃} of K₃ is also given by the relational calculus expression:
  ∃x,y,z (E(x,y) ∧ E(y,x) ∧ E(x,z) ∧ E(z,x) ∧ E(y,z) ∧ E(z,y))

140 Canonical Database Instance

• Definition: Canonical Instance: Given a CQ Q, the canonical instance of Q is the instance I^Q with the variables of Q as elements and the conjuncts of Q as facts.

• Example: Conjunctive query Q :- E(x,y), E(y,z), E(z,w)

  • Canonical instance I^Q consists of the facts E(x,y), E(y,z), E(z,w).
  • In other words, E^{I^Q} = {(x,y), (y,z), (z,w)}.

141 Canonical Database Instance

• Example: Conjunctive query Q(x,y) :- E(x,z), E(z,y), P(z) or, equivalently, {(x,y): ∃z (E(x,z) ∧ E(z,y) ∧ P(z))}

  • Canonical instance I^Q consists of the facts E(x,z), E(z,y), P(z).

  • In other words, E^{I^Q} = {(x,z), (z,y)} and P^{I^Q} = {z}.

142 Canonical Conjunctive Queries and Canonical Instances

• Fact:
  • For every database instance I, we have that I ⊨ Q^I.
  • For every Boolean conjunctive query Q, we have that I^Q ⊨ Q.

• Fact: Let I be a database instance, let Q^I be its canonical conjunctive query, and let I^{Q^I} be the canonical instance of Q^I. Then I is isomorphic to I^{Q^I}.

143 Canonical Conjunctive Queries and Canonical Instances

Magic Lemma: Assume that Q is a Boolean conjunctive query and J is a database instance. Then the following statements are equivalent.
1. J ⊨ Q.
2. There is a homomorphism h: I^Q → J.

Proof: Let Q be ∃x₁ … ∃xₘ ϕ(x₁,…,xₘ).

1. ⇒ 2. Assume that J ⊨ Q. Hence, there are elements a₁,…,aₘ in adom(J) such that J ⊨ ϕ(a₁,…,aₘ). The function h with h(xᵢ) = aᵢ, for i = 1,…,m, is a homomorphism from I^Q to J.

2. ⇒ 1. Assume that there is a homomorphism h: I^Q → J. Then the values h(xᵢ) = aᵢ, for i = 1,…,m, give values for the interpretation of the existential quantifiers ∃xᵢ of Q in adom(J) so that J ⊨ ϕ(a₁,…,aₘ).

144 Homomorphisms, CQE, and CQC

The Homomorphism Theorem: Chandra & Merlin – 1977
For Boolean CQs Q and Q’, the following are equivalent:
• Q ⊆ Q’
• There is a homomorphism h: I^{Q’} → I^Q
• I^Q ⊨ Q’.

In dual form:

The Homomorphism Theorem: Chandra & Merlin – 1977
For instances I and I’, the following are equivalent:
• There is a homomorphism h: I → I’
• I’ ⊨ Q^I
• Q^{I’} ⊆ Q^I

145 Homomorphisms, CQE, and CQC

The Homomorphism Theorem: Chandra & Merlin – 1977
For Boolean CQs Q and Q’, the following are equivalent:
1. Q ⊆ Q’
2. There is a homomorphism h: I^{Q’} → I^Q
3. I^Q ⊨ Q’.

Proof:
1. ⇒ 2. Assume Q ⊆ Q’. Since I^Q ⊨ Q, we have that I^Q ⊨ Q’. Hence, by the Magic Lemma, there is a homomorphism from I^{Q’} to I^Q.

2. ⇒ 3. It follows from the other direction of the Magic Lemma.

3. ⇒ 1. Assume that I^Q ⊨ Q’. So, by the Magic Lemma, there is a homomorphism h: I^{Q’} → I^Q. We have to show that if J ⊨ Q, then J ⊨ Q’. Well, if J ⊨ Q, then (by the Magic Lemma), there is a homomorphism h’: I^Q → J. The composition h’◦h: I^{Q’} → J is a homomorphism, hence (once again by the Magic Lemma!), we have that J ⊨ Q’.

146 Illustrating the Homomorphism Theorem

• Example:

  • Q: ∃x₁∃x₂∃x₃∃x₄ (E(x₁,x₂) ∧ E(x₂,x₃) ∧ E(x₃,x₄))

  • Q’: ∃x₁∃x₂∃x₃ (E(x₁,x₂) ∧ E(x₂,x₃))

Then:
• Q ⊆ Q’: Homomorphism h: I^{Q’} → I^Q with h(x₁) = x₁, h(x₂) = x₂, h(x₃) = x₃.

• Q’ ⊈ Q (why?).
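
The Homomorphism Theorem turns containment of Boolean conjunctive queries into a finite search. The following Python sketch is an illustration under assumptions: a Boolean CQ is a list of atoms `(relation, (variables...))`, the canonical instance is simply the set of those atoms, and `contained_in` is a hypothetical name. It checks the example on this slide by brute force.

```python
from itertools import product


def contained_in(Q, Q_prime):
    """Q is contained in Q' iff there is a homomorphism from I^{Q'} to I^Q."""
    facts = set(Q)                                   # canonical instance of Q
    elems = sorted({v for _, args in Q for v in args})
    variables = sorted({v for _, args in Q_prime for v in args})
    for image in product(elems, repeat=len(variables)):
        h = dict(zip(variables, image))
        # h must map every conjunct of Q' to a fact of the canonical instance of Q.
        if all((rel, tuple(h[v] for v in args)) in facts for rel, args in Q_prime):
            return True
    return False


Q = [("E", ("x1", "x2")), ("E", ("x2", "x3")), ("E", ("x3", "x4"))]
Q_prime = [("E", ("x1", "x2")), ("E", ("x2", "x3"))]
print(contained_in(Q, Q_prime))  # a path of length 3 contains a path of length 2
print(contained_in(Q_prime, Q))  # no homomorphism from the longer path to the shorter
```

The second call answers the "why?" above by exhaustion: no map from the four variables of Q into the three elements of I^{Q’} preserves all three edges.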

147 Illustrating the Homomorphism Theorem

• Example:

  • Q: ∃x₁∃x₂ (E(x₁,x₂) ∧ E(x₂,x₁))

  • Q’: ∃x₁∃x₂∃x₃∃x₄ (E(x₁,x₂) ∧ E(x₂,x₁) ∧ E(x₂,x₃) ∧ E(x₃,x₂) ∧ E(x₃,x₄) ∧ E(x₄,x₃) ∧ E(x₄,x₁) ∧ E(x₁,x₄))

Then:
• Q ⊆ Q’: Homomorphism h: I^{Q’} → I^Q with h(x₁) = x₁, h(x₂) = x₂, h(x₃) = x₁, h(x₄) = x₂.
• Q’ ⊆ Q: Homomorphism h’: I^Q → I^{Q’} with h’(x₁) = x₁, h’(x₂) = x₂.
• Hence, Q ≡ Q’.

148 Illustrating the Homomorphism Theorem

Example: 3-Colorability. For a graph G = (V,E), the following are equivalent:
• G is 3-colorable

• There is a homomorphism h: G → K₃

• K₃ ⊨ Q^G

• Q^{K₃} ⊆ Q^G.

149 The Homomorphism Theorem for non-Boolean Conjunctive Queries

• So far, we have focused on Boolean conjunctive queries.

• However, the Homomorphism Theorem easily extends to conjunctive queries of arbitrary arities by considering homomorphisms that are the identity on the variables in the head of the conjunctive queries (written as rules).

• Moreover, the Homomorphism Theorem also extends to conjunctive queries with constants in some of the conjuncts. Example: Find all cities one can reach by flying from San Jose with two stops:

{x: ∃x₁∃x₂ (FLIGHT(San Jose, x₁) ∧ FLIGHT(x₁,x₂) ∧ FLIGHT(x₂,x))}.

150 The Homomorphism Theorem for non-Boolean Conjunctive Queries

The Homomorphism Theorem: Chandra & Merlin – 1977
Consider two k-ary conjunctive queries

Q(x₁,…,xₖ) :- R₁(u₁), …, Rₙ(uₙ) and Q’(x₁,…,xₖ) :- T₁(v₁), …, Tₘ(vₘ).
Then the following are equivalent:

• Q ⊆ Q’

• There is a homomorphism h: I^{Q’} → I^Q such that h(x₁) = x₁, h(x₂) = x₂, …, h(xₖ) = xₖ.

• I^Q, x₁,…,xₖ ⊨ Q’.

151 The Homomorphism Theorem for non-Boolean Conjunctive Queries

 Example: Consider the binary conjunctive queries
Q(x,y) :- E(y,x), E(x,u) and
Q’(x,y) :- E(y,x), E(z,x), E(w,x), E(x,u)

Then Q ⊆ Q’ because there is a homomorphism h: I^{Q’} → I^Q with h(x)=x and h(y)=y, namely, h(x)=x, h(y)=y, h(z)=y, h(w)=y, h(u)=u.
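For non-Boolean queries the only change to a brute-force containment test is that the head variables are pinned to themselves. A minimal sketch, assuming both queries use the same head variables and atoms are encoded as tuples; the name `contained_in` and its signature are hypothetical.

```python
from itertools import product

def contained_in(head, body_q, body_q_prime):
    """Decide Q ⊆ Q' for two conjunctive queries with the same head variables
    by searching for a homomorphism I^{Q'} → I^Q that fixes the head."""
    facts = set(body_q)
    q_vars = sorted({v for a in body_q for v in a[1:]} | set(head))
    # Only the non-head variables of Q' may be mapped freely
    free = sorted({v for a in body_q_prime for v in a[1:]} - set(head))
    for image in product(q_vars, repeat=len(free)):
        h = {x: x for x in head}
        h.update(zip(free, image))
        if all((a[0],) + tuple(h[v] for v in a[1:]) in facts for a in body_q_prime):
            return True
    return False

head = ("x", "y")
q = [("E", "y", "x"), ("E", "x", "u")]
q_prime = [("E", "y", "x"), ("E", "z", "x"), ("E", "w", "x"), ("E", "x", "u")]

print(contained_in(head, q, q_prime))   # Q ⊆ Q'
print(contained_in(head, q_prime, q))   # Q' ⊆ Q
```

Both tests succeed here, so the two queries of the example are in fact equivalent.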

152 Combined complexity of CQC and CQE

Corollary: The following problems are NP-complete:
 Given two (Boolean) conjunctive queries Q and Q’, is Q ⊆ Q’?
 Given a Boolean conjunctive query Q and an instance I, does I ⊨ Q?

Proof:
(a) Membership in NP follows from the Homomorphism Theorem: Q ⊆ Q’ if and only if there is a homomorphism h: I^{Q’} → I^Q, so one can guess h and verify it in polynomial time.

(b) NP-hardness follows from 3-Colorability: G is 3-colorable if and only if Q^{K3} ⊆ Q^G.

153 Conjunctive Query Equivalence

 The Conjunctive Query Equivalence Problem: Given two conjunctive queries Q and Q’, is Q ≡ Q’?

 Corollary: For conjunctive queries Q and Q’, we have that Q ≡ Q’ if and only if I^Q ≡h I^{Q’}.

 Corollary: The Conjunctive Query Equivalence Problem is NP-complete.
 Proof:
 The following problem is NP-complete: Given a graph H containing a K3, is H 3-colorable?

 Let H be a graph containing a K3. Then H is 3-colorable if and only if Q^H ≡ Q^{K3}.

154 Combined Complexity vs. Data Complexity

Recall Vardi’s Taxonomy of Query Evaluation:
 Combined Complexity: Both the query and the instance are part of the input.
 Data Complexity: Fix the query; the input consists of the instance only.

Complexity of Conjunctive Query Evaluation:
 The combined complexity of conjunctive query evaluation is NP-complete.
 The data complexity of conjunctive query evaluation is in P (in fact, it is in LOGSPACE).

155 The Complexity of Database Query Languages

Problem                                | Relational Calculus       | Conjunctive Queries
Query Evaluation: Combined Complexity  | PSPACE-complete           | NP-complete
Query Evaluation: Data Complexity      | In LOGSPACE (hence, in P) | In LOGSPACE (hence, in P)
Query Equivalence Problem              | Undecidable               | NP-complete
Query Containment Problem              | Undecidable               | NP-complete

156 Conjunctive Query Minimization

 Definition: Let Q be a conjunctive query. A minimal equivalent to Q is a conjunctive query Q’ such that

 Q is equivalent to Q’;

 Q’ has as few conjuncts as any conjunctive query equivalent to Q.

 Example: Let Q be the conjunctive query
Q(x,y) :- E(y,x), E(z,x), E(w,x), E(x,u)
Then the conjunctive query
Q’(x,y) :- E(y,x), E(x,u)
is a minimal equivalent conjunctive query to Q. (Why?)

157 Conjunctive Query Minimization and Conjunctive Query Evaluation

 A natural approach to conjunctive query evaluation is to first process a given conjunctive query Q, transform it to a minimal equivalent conjunctive query Q’, and then evaluate Q’ instead of Q.

 At first sight, this seems to be a promising approach.

 There is good news and bad news. First, the good news:

 Theorem: Let Q be a conjunctive query.
 There is a minimal equivalent conjunctive query Q’ obtained from Q by removing zero or more conjuncts.
 All minimal equivalent conjunctive queries to Q are isomorphic, i.e., their canonical instances are all isomorphic to each other.
Proof: Use the Homomorphism Theorem (exercise).
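The first part of the theorem suggests a simple (exponential-time) procedure: repeatedly drop a conjunct as long as the smaller query stays equivalent, which by the Homomorphism Theorem means there is a head-preserving homomorphism from the current query into the smaller one. The sketch below is one way to write this down; the names `has_homomorphism` and `minimize` and the tuple encoding are assumptions, not part of the course material.

```python
from itertools import product

def has_homomorphism(head, src, dst):
    """Is there a homomorphism from the atoms src to the atoms dst
    that is the identity on the head variables?"""
    dst_facts = set(dst)
    dst_terms = sorted({v for a in dst for v in a[1:]})
    free = sorted({v for a in src for v in a[1:]} - set(head))
    for image in product(dst_terms, repeat=len(free)):
        h = {x: x for x in head}
        h.update(zip(free, image))
        if all((a[0],) + tuple(h[v] for v in a[1:]) in dst_facts for a in src):
            return True
    return False

def minimize(head, body):
    """Greedily remove conjuncts while the query stays equivalent."""
    body = list(body)
    changed = True
    while changed:
        changed = False
        for atom in body:
            reduced = [a for a in body if a != atom]
            # The reduced query always contains the original one (fewer conjuncts);
            # the homomorphism gives the other containment.
            if reduced and has_homomorphism(head, body, reduced):
                body = reduced
                changed = True
                break
    return body

head = ("x", "y")
body = [("E", "y", "x"), ("E", "z", "x"), ("E", "w", "x"), ("E", "x", "u")]
print(minimize(head, body))
```

On the query of the minimization example this leaves the conjuncts E(y,x) and E(x,u).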

158 Conjunctive Query Minimization and Conjunctive Query Evaluation

Next, the bad news:
Theorem:
 The following problem is NP-hard: Given two conjunctive queries Q and Q’, is Q’ a minimal equivalent query to Q?
 Consequently, unless P=NP, there is no polynomial-time algorithm such that, given a conjunctive query Q, the algorithm outputs a conjunctive query Q’ that is a minimal equivalent query to Q.
Proof: Reduction from 3-Colorability (exercise).
Note:
 This is not surprising since, as we saw, conjunctive query evaluation is NP-complete.
 There is an exponential-time algorithm such that, given a conjunctive query Q, the algorithm outputs a conjunctive query Q’ that is a minimal equivalent query to Q.

159 Tractable Cases of Conjunctive Query Evaluation

 Since conjunctive query evaluation is NP-complete, there has been an extensive investigation of special cases of conjunctive query evaluation for which the problem is in P.

 The key idea is to impose structural restrictions on the conjunctive queries considered:

 Acyclic joins – M. Yannakakis (1981)

 Various extensions of acyclicity have been studied over the years, including queries of bounded treewidth and queries of bounded hypertree width.

 Extensive interaction with constraint satisfaction, logic, and graph theory.

 This belongs more to an advanced topics course or an independent study course.

160 Beyond Conjunctive Queries

 What can we say about query languages of intermediate expressive power between conjunctive queries and the full relational calculus?

 Conjunctive queries form the sublanguage of relational algebra obtained by using only cartesian product, projection, and selection with equality conditions.

 The next step would be to consider relational algebra expressions that also involve union.

161 Beyond Conjunctive Queries

 Definition:

 A union of conjunctive queries is a query expressible by an expression of the form q1 ∪ q2 ∪ … ∪ qm, where each qi is a conjunctive query.
 A monotone query is a query expressible by a relational algebra expression which uses only union, cartesian product, projection, and selection with equality condition.

 Fact:

 Every union of conjunctive queries is a monotone query.

 Every monotone query is equivalent to a union of conjunctive queries, but the union may have exponentially many disjuncts (normal form for monotone queries).

 Monotone queries are precisely the queries expressible by relational calculus expressions using ∧, ∨, and ∃ only.

162 Unions of Conjunctive Queries and Monotone Queries

 Union of Conjunctive Queries

E ∪ π1,4(σ$2=$3(E × E)) or, as a relational calculus expression,

E(x1,x2) ∨ ∃z (E(x1,z) ∧ E(z,x2))

 Monotone Query

Consider the relation schemas R1(A,B), R2(A,B), R3(B,C), R4(B,C). The monotone query

(R1 ∪ R2) ⋈ (R3 ∪ R4)
is equivalent to the following union of conjunctive queries:

(R1 ⋈ R3) ∪ (R1 ⋈ R4) ∪ (R2 ⋈ R3) ∪ (R2 ⋈ R4).

163 The Containment Problem for Unions of Conjunctive Queries

Theorem: Sagiv and Yannakakis – 1981

Let q1 ∪ q2 ∪ … ∪ qm and q’1 ∪ q’2 ∪ … ∪ q’n be two unions of conjunctive queries. Then the following are equivalent:

1. q1 ∪ q2 ∪ … ∪ qm ⊆ q’1 ∪ q’2 ∪ … ∪ q’n.

2. For every i ≤ m, there is j ≤ n such that qi ⊆ q’j.

Proof: Use the Homomorphism Theorem.
1. ⇒ 2. Since I^{qi} ⊨ qi, we have that I^{qi} ⊨ q1 ∪ q2 ∪ … ∪ qm, hence I^{qi} ⊨ q’1 ∪ q’2 ∪ … ∪ q’n, hence there is some j ≤ n such that I^{qi} ⊨ q’j, hence (by the Homomorphism Theorem) qi ⊆ q’j.

2. ⇒ 1. This direction is obvious.
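Condition 2 reduces containment of unions to finitely many containment tests between single conjunctive queries. A minimal sketch for Boolean queries follows; the encoding of a union as a list of atom lists and the names `homomorphism_exists` and `ucq_contained_in` are assumptions for illustration.

```python
from itertools import product

def homomorphism_exists(src_atoms, dst_atoms):
    """Brute-force homomorphism test between two canonical instances."""
    src_vars = sorted({v for a in src_atoms for v in a[1:]})
    dst_vars = sorted({v for a in dst_atoms for v in a[1:]})
    dst_facts = set(dst_atoms)
    for image in product(dst_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((a[0],) + tuple(h[v] for v in a[1:]) in dst_facts for a in src_atoms):
            return True
    return False

def ucq_contained_in(union, union_prime):
    """Sagiv-Yannakakis criterion: every disjunct q_i must be contained in
    some disjunct q'_j, i.e. there is a homomorphism I^{q'_j} → I^{q_i}."""
    return all(any(homomorphism_exists(qp, q) for qp in union_prime) for q in union)

path3 = [("E", "x1", "x2"), ("E", "x2", "x3"), ("E", "x3", "x4")]
path2 = [("E", "x1", "x2"), ("E", "x2", "x3")]
two_cycle = [("E", "x1", "x2"), ("E", "x2", "x1")]
loop = [("E", "x1", "x1")]

print(ucq_contained_in([path3, two_cycle], [path2, loop]))
print(ucq_contained_in([path2, loop], [path3, two_cycle]))
```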

164 The Containment Problem for Unions of Conjunctive Queries

 Corollary: The Query Containment Problem for Unions of Conjunctive Queries is NP-complete.

 Proof:

 Membership in NP follows from the Sagiv-Yannakakis Theorem.
   We guess m pairs (q’_{k_i}, h_{k_i}) and verify that for every i ≤ m, the function h_{k_i} is a homomorphism from I^{q’_{k_i}} to I^{q_i}.
 NP-hardness follows from the fact that Conjunctive Query Containment is a special case of this problem.

 Fact: The Query Evaluation Problem for Unions of Conjunctive Queries is NP-complete (combined complexity).
Proof: Exercise.

165 The Complexity of Database Query Languages

Problem                                | Relational Calculus       | Conjunctive Queries       | Unions of Conjunctive Queries
Query Evaluation: Combined Complexity  | PSPACE-complete           | NP-complete               | NP-complete
Query Evaluation: Data Complexity      | In LOGSPACE (hence, in P) | In LOGSPACE (hence, in P) | In LOGSPACE (hence, in P)
Query Equivalence                      | Undecidable               | NP-complete               | NP-complete
Query Containment                      | Undecidable               | NP-complete               | NP-complete

166 Monotone Queries

 Even though monotone queries have the same expressive power as unions of conjunctive queries, the containment problem for monotone queries has higher complexity than the containment problem for unions of conjunctive queries (syntax/complexity tradeoff).

 Theorem: Sagiv and Yannakakis – 1982
The containment problem for monotone queries is Π2^p-complete.

 Note:
 Π2^p is a complexity class that contains NP and is contained in PSPACE.
 The prototypical Π2^p-complete problem is ∀∃SAT, i.e., the restriction of QBF to formulas of the form

∀x1 … ∀xm ∃y1 … ∃yn ϕ.

167 The Complexity of Database Query Languages

Problem                          | Relational Calculus       | Conjunctive Queries       | Unions of Conjunctive Queries | Monotone Queries
Query Eval.: Combined Complexity | PSPACE-complete           | NP-complete               | NP-complete                   | NP-complete
Query Eval.: Data Complexity     | In LOGSPACE (hence, in P) | In LOGSPACE (hence, in P) | In LOGSPACE (hence, in P)     | In LOGSPACE (hence, in P)
Query Equivalence                | Undecidable               | NP-complete               | NP-complete                   | Π2^p-complete
Query Containment                | Undecidable               | NP-complete               | NP-complete                   | Π2^p-complete

168 Conjunctive Queries with Inequalities

 Definition: Conjunctive queries with inequalities form the sublanguage of relational algebra obtained by using only cartesian product, projection, and selection with equality and inequality (≠, <, ≤) conditions.

 Example: Q(x,y) :- E(x,z), E(z,w), E(w,y), z ≠ w, z < y.

 Theorem: (Klug – 1988, van der Meyden – 1992)

 The query containment problem for conjunctive queries with inequalities is Π2^p-complete.
 The query evaluation problem for conjunctive queries with inequalities is NP-complete.

169 The Complexity of Database Query Languages

Problem                          | Relational Calculus       | Conjunctive Queries       | Unions of Conjunctive Queries | Monotone Queries / Conj. Queries with Inequal.
Query Eval.: Combined Complexity | PSPACE-complete           | NP-complete               | NP-complete                   | NP-complete
Query Eval.: Data Complexity     | In LOGSPACE (hence, in P) | In LOGSPACE (hence, in P) | In LOGSPACE (hence, in P)     | In LOGSPACE (hence, in P)
Query Equivalence                | Undecidable               | NP-complete               | NP-complete                   | Π2^p-complete
Query Containment                | Undecidable               | NP-complete               | NP-complete                   | Π2^p-complete

170 Conjunctive Queries with Inequalities

 Note: The Homomorphism Theorem fails for conjunctive queries with inequalities.

 Example: Consider the queries

 q(x,y) :- E(x,y), F(u,v)

 p(x,y) :- E(x,y), F(u,v), u ≠ v.
Then

 q(x,y) ⊈ p(x,y) (why?)
 Yet, I^p is the same as I^q; in particular, there is a homomorphism h: I^p → I^q.

 Note also that p(x,y) ⊆ q(x,y) (why?)

171 Complexity of Query Containment

 So, the complexity of query containment for conjunctive queries and their variants is well understood.

Caveat:
 All preceding results assume set semantics, i.e., queries take sets as inputs and return sets as output (duplicates are eliminated).
 DBMS, however, use bag semantics (multiset semantics), since they return bags (multisets) (recall that duplicates are not eliminated in SQL, unless explicitly specified).

172 A Real Conjunctive Query

 Consider the following SQL query:
Table Employee has attributes salary, dept, …
SELECT salary
FROM Employee
WHERE dept = ‘CS’

 Recall that SQL keeps duplicates, because:

 The user may care about duplicates: {100,100,200} is different than {100,200} for AVERAGE.

 In general, bags can be more “efficient” than sets.

173 Query Evaluation under Bag Semantics

Operation                | Multiplicity
Union R1 ∪ R2            | m1 + m2
Intersection R1 ∩ R2     | min(m1, m2)
Product R1 × R2          | m1 × m2
Projection and Selection | Duplicates are not eliminated

Example:
 R1(A,B): (1,2), (1,2), (2,3)
 R2(B,C): (2,4), (2,5)
 (R1 ⋈ R2)(A,B,C): (1,2,4), (1,2,4), (1,2,5), (1,2,5)

174 Bag (Multiset) Semantics

S. Chaudhuri & M.Y. Vardi – 1993
Optimization of Real Conjunctive Queries

 Called for a re-examination of conjunctive query optimization under bag semantics.

 In particular, they initiated the study of the containment problem for conjunctive queries under bag semantics.

175 Bag Semantics vs. Set Semantics

 Q^BAG(I): Result of evaluating Q on (bag) database instance I.

 For bags R1, R2:

R1 ⊆BAG R2 if m(a,R1) ≤ m(a,R2), for every tuple a.

 Q1 ⊆BAG Q2 if for every (bag) database I, we have that Q1^BAG(I) ⊆BAG Q2^BAG(I).

Fact:

 Q1 ⊆BAG Q2 implies Q1 ⊆ Q2 (this is obvious from the definitions).

 The converse does not always hold.

176 Bag Semantics vs. Set Semantics

Fact: Q1 ⊆ Q2 need not imply that Q1 ⊆BAG Q2.

Example:

 Q1(x) :- P(x), T(x)

 Q2(x) :- P(x)

 Q1 ⊆ Q2 (this is obvious from the definitions)

 Q1 ⊈BAG Q2
 Consider the (bag) instance I = {P(a), T(a), T(a)}. Then:

 Q1(I) = {a, a}

 Q2(I) = {a}, so Q1(I) ⊈BAG Q2(I).
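The counterexample can be reproduced with a small bag evaluator in which the multiplicity of an answer is the sum, over all satisfying assignments, of the product of the multiplicities of the facts used. This is a sketch of the standard bag semantics of conjunctive queries; the function `eval_bag` and the `Counter`-based encoding of instances are assumptions for illustration.

```python
from collections import Counter
from itertools import product

def eval_bag(head, body, instance):
    """Evaluate a conjunctive query under bag semantics.
    instance maps facts (relation, c_1, ..., c_r) to multiplicities."""
    variables = sorted({v for atom in body for v in atom[1:]})
    domain = list({c for fact in instance for c in fact[1:]})
    answer = Counter()
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        multiplicity = 1
        for atom in body:
            fact = (atom[0],) + tuple(assignment[v] for v in atom[1:])
            multiplicity *= instance[fact]  # a missing fact has multiplicity 0
        if multiplicity:
            answer[tuple(assignment[v] for v in head)] += multiplicity
    return answer

# I = {P(a), T(a), T(a)}
I = Counter({("P", "a"): 1, ("T", "a"): 2})
q1 = eval_bag(("x",), [("P", "x"), ("T", "x")], I)
q2 = eval_bag(("x",), [("P", "x")], I)

set_contained = set(q1) <= set(q2)
bag_contained = all(q1[t] <= q2[t] for t in q1)
print(q1[("a",)], q2[("a",)], set_contained, bag_contained)
```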

177 Conjunctive Query Evaluation under Bag Semantics

 Boolean query Q :- E(x,y), E(y,z), E(z,x)

 Set Semantics
Q(G) = “true” if and only if the graph G contains a triangle.

 Bag Semantics
Q^BAG(G) = # of triangles in the graph G.

178 Conjunctive Query Evaluation under Bag Semantics

Consider:

 K3: the complete graph with 3 nodes
 G=(V,E) an arbitrary graph and the canonical conjunctive query Q^G of G.
Then
 Set Semantics
Q^G(K3) = “true” if and only if G is 3-colorable.
 Bag Semantics
(Q^G)^BAG(K3) = # of 3-colorings of the graph G.

Corollary: The conjunctive query evaluation problem under bag semantics is #P-complete.

179 Query Containment under Bag Semantics

 Chaudhuri & Vardi – 1993 stated that:
Under bag semantics, the containment problem for conjunctive queries is Π2^p-hard.

 Open Problem:

 What is the exact complexity, under bag semantics, of the containment problem for conjunctive queries?

 Is this problem decidable? Even this is not known to date!

180 Unions of Conjunctive Queries

Theorem: Ioannidis & Ramakrishnan – 1995
Under bag semantics, the containment problem for unions of conjunctive queries is undecidable.

Hint of Proof: Reduction from Hilbert’s 10th Problem.

181 Hilbert’s 10th Problem

 Hilbert’s 10th Problem – 1900 (10th in his list of 23 problems)
Find an algorithm for the following problem:

Given a polynomial equation p(x1,...,xn) = 0 with integer coefficients, does it have an all-integer solution?

 Matijasevic – 1971
 Hilbert’s 10th Problem is undecidable, hence no such algorithm exists.
 Undecidable, even for degree d=4 and n=58.

182 Hilbert’s 10th Problem

 Fact: The following variant of Hilbert’s 10th Problem is undecidable:

 Given two polynomials p1(x1,…,xn) and p2(x1,…,xn) with positive integer coefficients and no constant terms, is it true that p1 ≤ p2? I.e., is it true that p1(a1,…,an) ≤ p2(a1,…,an), for all positive integers a1,…,an?

 So, there is no algorithm for deciding questions like:
 Is 3x1^4 x2 x3 + 2x2 x3 ≤ x1^6 + 5x2 x3 ?

183 Unions of Conjunctive Queries

Theorem: Ioannidis & Ramakrishnan – 1995
Under bag semantics, the containment problem for unions of conjunctive queries is undecidable, even if all relations are unary.

Hint of Proof:
 Reduction from the previous variant of Hilbert’s 10th Problem:
 Use joins of unary relations to encode monomials (products of variables).
 Use unions to encode sums of monomials.

184 Unions of Conjunctive Queries

Theorem: Ioannidis & Ramakrishnan – 1995
Under bag semantics, the containment problem for unions of conjunctive queries is undecidable, even if all relations are unary.

Example: Consider the polynomial 3x1^4 x2 x3 + 2x2 x3
 The monomial x1^4 x2 x3 is encoded by the conjunctive query P1(w), P1(w), P1(w), P1(w), P2(w), P3(w).

 The monomial x2 x3 is encoded by the conjunctive query P2(w), P3(w).

 The polynomial 3x1^4 x2 x3 + 2x2 x3 is encoded by the union of:
 three copies of P1(w), P1(w), P1(w), P1(w), P2(w), P3(w) and
 two copies of P2(w), P3(w).
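To see why the encoding works, take an instance in which a single element a occurs a_i times in the unary relation P_i. Under bag semantics each join multiplies multiplicities and each union adds them, so the union returns the value of the polynomial at (a_1, a_2, a_3). The sketch below assumes the queries are Boolean (their head is not shown above) and represents a query as the list of relation names of its atoms; `bag_count` and `union_count` are hypothetical helper names.

```python
from collections import Counter

def bag_count(body, instance):
    """Multiplicity of the answer to a Boolean query whose atoms all use the
    same variable w; body is the list of relation names of its atoms."""
    total = 0
    for a in {elem for (_rel, elem) in instance}:
        m = 1
        for rel in body:
            m *= instance[(rel, a)]  # join: multiply multiplicities
        total += m
    return total

def union_count(union, instance):
    # union: add the multiplicities contributed by each disjunct
    return sum(bag_count(q, instance) for q in union)

monomial_1 = ["P1"] * 4 + ["P2", "P3"]   # encodes x1^4 x2 x3
monomial_2 = ["P2", "P3"]                # encodes x2 x3
union = [monomial_1] * 3 + [monomial_2] * 2

# The element a occurs 2 times in P1, 3 times in P2, 5 times in P3
instance = Counter({("P1", "a"): 2, ("P2", "a"): 3, ("P3", "a"): 5})
value = union_count(union, instance)
print(value, 3 * 2**4 * 3 * 5 + 2 * 3 * 5)
```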

185 Computational Complexity of Query Containment

Class of Queries            | Complexity – Set Semantics                 | Complexity – Bag Semantics
Conjunctive queries         | NP-complete (Chandra-Merlin – 1977)        | Open
Unions of conj. queries     | NP-complete (Sagiv-Yannakakis – 1981)      | Undecidable (Ioannidis-Ramakrishnan – 1995)
Conj. queries with ≠, ≤, ≥  | Π2^p-complete (van der Meyden – 1992)      |
Relational calculus queries | Undecidable (Trakhtenbrot – 1949)          | Undecidable

186 Bag Semantics and Conjunctive Queries with ≠

Theorem: Jayram, K…, Vee – 2006
Under bag semantics, the containment problem for conjunctive queries with ≠ is undecidable.

In fact, this problem is undecidable even if
 the queries use only a single relation of arity 2;
 the number of inequalities in the queries is at most some fixed constant.

187 Bag Semantics and Conjunctive Queries with ≠

Proof Idea: Reduction from another variant of Hilbert’s 10th Problem:

Given homogeneous polynomials P1(x1,…,x59) and P2(x1,…,x59), both with integer coefficients and both of degree 5, is P1(x1,…,x59) ≤ (x1)^5 P2(x1,…,x59), for all integers x1,…,x59?

The reduction is much more involved than the earlier reduction for unions of conjunctive queries.

188 Proof Idea (continued)

 Given polynomials P1 and P2
 Both with integer coefficients

 Both homogeneous, degree 5

 Both with at most n=59 variables

 We want to find Q1 and Q2 such that
 Q1 and Q2 are conjunctive queries with inequalities ≠
 P1(x1,…,x59) ≤ (x1)^5 P2(x1,…,x59) for all integers x1,…,x59 if and only if Q1(D) ⊆BAG Q2(D) for all (bag) databases D.

189 Proof Outline:

Proof is carried out in three steps.

Step 1: Only consider DBs of a special form. Show how to use conjunctive queries to encode polynomials (without using inequalities!)

Step 2: “Force” the DB to have the special form using inequalities.

 If D is not of special form, then Q1(D) ⊆BAG Q2(D) necessarily.

Step 3: Show that we only need a single relation of arity 2.

190 Step 1: DBs of a Special Form – Example

 Encode a homogeneous, 2-variable, degree-2 polynomial in which all coefficients are 1:
P(x1,x2) = x1^2 + x1x2 + x2^2
 DBs of special form:
 Ternary relation TERM consisting of (X1,X1,T1), (X1,X2,T2), (X2,X2,T3); all special DBs have precisely this table for TERM
 Binary relation VALUE
 The table for VALUE varies to encode different values for the variables x1, x2.

 Query Q :- TERM(u1,u2,t), VALUE(u1,v1), VALUE(u2,v2)

191 Step 1: DBs of a Special Form – Example

 P(x1,x2) = x1^2 + x1x2 + x2^2
x1=3, x2=2, P(3,2) = 3^2 + 3·2 + 2^2 = 19.

 Query Q :- TERM(u1,u2,t), VALUE(u1,v1), VALUE(u2,v2)
 DB D of special form:

 TERM: (X1,X1,T1), (X1,X2,T2), (X2,X2,T3)

 VALUE: (X1,1), (X1,2), (X1,3), (X2,1), (X2,2)

Claim: P(3,2) = 19 = Q^BAG(D)

192 Step 1: DBs of a Special Form – Example

 P(3,2) = 3^2 + 3·2 + 2^2 = 19.

 Query Q :- TERM(u1,u2,t), VALUE(u1,v1), VALUE(u2,v2)

 D has TERM: (X1,X1,T1), (X1,X2,T2), (X2,X2,T3)

VALUE: (X1,1), (X1,2), (X1,3), (X2,1), (X2,2)
 Q^BAG(D) = 19, because:

 t → T1, u1 → X1, u2 → X1. Hence: v1 → 1, 2, or 3 and v2 → 1, 2, or 3, so we get 3^2 witnesses.

 t → T2, u1 → X1, u2 → X2. Hence: v1 → 1, 2, or 3 and v2 → 1 or 2, so we get 3·2 witnesses.

 t → T3, u1 → X2, u2 → X2. Hence: v1 → 1 or 2, and v2 → 1 or 2, so we get 2^2 witnesses.
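The witness count can be checked by enumerating the joins directly. The sketch below builds the special-form database for given values of x1 and x2 and counts the satisfying assignments of Q; since every fact has multiplicity 1, this count is the bag answer. The helper names `special_db`, `bag_count` and `p` are assumptions for illustration.

```python
TERM = [("X1", "X1", "T1"), ("X1", "X2", "T2"), ("X2", "X2", "T3")]

def special_db(x1, x2):
    """VALUE holds x1 tuples for the symbol X1 and x2 tuples for X2."""
    value = [("X1", i) for i in range(1, x1 + 1)] + [("X2", i) for i in range(1, x2 + 1)]
    return TERM, value

def bag_count(term, value):
    """Number of witnesses of Q :- TERM(u1,u2,t), VALUE(u1,v1), VALUE(u2,v2)."""
    return sum(1
               for (u1, u2, _t) in term
               for (a, _v1) in value
               for (b, _v2) in value
               if a == u1 and b == u2)

def p(x1, x2):
    return x1**2 + x1 * x2 + x2**2

print(bag_count(*special_db(3, 2)), p(3, 2))
```

Varying the arguments of `special_db` corresponds to varying the table for VALUE.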

193 Step 1: Complete Argument and Wrap-up

 The previous technique only works if all coefficients are 1.
 For the complete argument:
 add a fixed table for every term to the DB;
 encode coefficients in the query;
 only the table for VALUE can vary.
 Summary:
 If the database has a special form, then we can encode separately homogeneous polynomials P1 and P2 by conjunctive queries Q1 and Q2.
 By varying the table for VALUE, we vary the variable values.
 No ≠ constraints are used in this encoding; hence, conjunctive query containment is undecidable, if restricted to databases of the special form.

194 Step 2: Arbitrary Databases

Idea: Use inequalities ≠ in the encoding queries to achieve the following:

 If a database D is of special form, then we are back to the previous case.

 If a database D is not of special form, then Q1(D) ⊆BAG Q2(D) necessarily.

195 Step 2: Arbitrary Databases – Hint

1. Ensure that certain “facts” in special-form DBs appear (else neither query is satisfied).
 This is done by adding a part of the canonical query of special-form DBs as subgoals to each encoding query.

2. Modify special-form DBs by adding gadget tuples to TERM and to VALUE.

 TERM: (X1,X1,T1), (X1,X2,T2), (X2,X2,T3), (T0,T0,T0)

 VALUE: (X1,1), (X1,2), (X1,3), (X2,1), (X2,2), (T0,T0)

3. Add extra subgoals to Q2, so that if D is not of special form, then Q2 “benefits” more than Q1 and, as a result, Q1(D) ⊆BAG Q2(D).

196 Step 2: Arbitrary Databases – Example

 P1(x1,x2) = x1^2 + x1x2 + x2^2
 Poly1(u1,u2,t) :- TERM(u1,u2,t), VALUE(u1,v1), VALUE(u2,v2) is the query encoding P1 on special-form DBs.
 TERM: (X1,X1,T1), (X1,X2,T2), (X2,X2,T3), (T0,T0,T0)
 VALUE: (X1,1), (X1,2), (X1,3), (X2,1), (X2,2), (T0,T0)

 Q1 :- Poly1(u1,u2,t)
 Q2 :- Poly2(u1,u2,t), Poly1(w1,w2,w), w ≠ T1, w ≠ T2, w ≠ T3

Fact:

 If the DB is of special form, then Q2 gets no advantage, because w → T0, w1 → T0, w2 → T0 is the only possible assignment.
 If the DB is not of special form, say it has an extra fact (X2,X1,T’), then both Q1 and Q2 can use it equally.

197 Step 2: Arbitrary Databases – Wrap-up

 Additional tricks are needed for the full construction.

 The full construction uses seven different control gadgets.

 Additional complications arise when we encode coefficients.

 Inequalities ≠ are used in both queries.

 The number of inequalities ≠ depends on the size of special-form DBs, not counting the facts in the VALUE table.

 Hence, it depends on the degree of the polynomials and the number of variables.

 It is a huge constant (about 59^10).

198 Complexity of Query Containment

Class of Queries            | Complexity – Set Semantics  | Complexity – Bag Semantics
Conjunctive queries         | NP-complete (CM – 1977)     | Open
Unions of conj. queries     | NP-complete (SY – 1980)     | Undecidable (IR – 1995)
Conj. queries with ≠, ≤, ≥  | Π2^p-complete (vdM – 1992)  | Undecidable (JKV – 2006)
First-order (SQL) queries   | Undecidable (Gödel – 1931)  | Undecidable

199 Directions for Future Work

 Major Open Problem: The conjunctive query containment problem (no inequalities), under bag semantics.

 Is it decidable?

 If so, what is the exact complexity?

 Identify classes of queries for which, under bag semantics, the containment problem is tractable.

 Pinpoint the exact complexity of bag equivalence.

200 The Query Equivalence Problem

 Query Equivalence: given Q1, Q2, is Q1 ≡ Q2?
 Under bag semantics,
 For conjunctive queries, it has the same complexity as GRAPH ISOMORPHISM (Chaudhuri & Vardi – 1993)
 For conjunctive queries with inequalities ≠,
 Lower Bound: It is GRAPH ISOMORPHISM-hard.
 Upper Bound: It is in PSPACE (Nutt, Sagiv, Shurin (Cohen) – 1998)
Big gap between the lower and the upper bound.

201 Limitations of Relational Algebra & Relational Calculus

Outline:

 Relational Algebra and Relational Calculus have substantial expressive power. In particular, they can express

 Natural Join

 Quotient

 Unions of conjunctive queries

 …

 However, they cannot express recursive queries.

 Datalog is a declarative database query language that augments the language of conjunctive queries with a recursion mechanism.

202 Parents, Grandparents, and Great-grandparents

 Let PARENT be a binary relational schema such that if (a,b) ∈ PARENT in some database instance, then a is a parent of b.

 Using PARENT, we can define GRANDPARENT and GREATGRANDPARENT by the conjunctive queries:

GRANDPARENT(x,y) :- PARENT(x,z), PARENT(z,y) and
GREATGRANDPARENT(x,y) :- PARENT(x,z), PARENT(z,w), PARENT(w,y)

 Similarly, we can define GREATGREATGRANDPARENT by a conjunctive query, and so on up to any fixed level of ancestry.

203 Parents and Ancestors

 Question: Is there a relational algebra (relational calculus) expression that defines ANCESTOR from PARENT?

 Note: This type of question occurs in other related concepts:
 Given a binary relation MANAGES(manager, employee), is there a relational algebra (relational calculus) expression that defines HIGHERMANAGER?

 Given a binary relation DIRECT(from, to) about flights, is there a relational algebra (relational calculus) expression that defines CANFLY(from, to)?

 More abstractly, given a binary relation E, is there a relational algebra (relational calculus) expression that defines the Transitive Closure TC of E?

204 Edges and Paths

Definition: Let E be a binary relation.

 For every n ≥ 1, let PATH_n be the binary query: “given a and b, is there a path of length n from a to b along edges from E?”
 PATH is the binary query: “given a and b, is there a path from a to b along edges from E?”

Fact:

 For every n ≥ 1, the query PATH_n is expressible by a conjunctive query. (Why?)
 Hence, PATH is expressible by an infinite union of conjunctive queries:

PATH ≡ PATH_1 ∪ PATH_2 ∪ … ∪ PATH_n ∪ …

205 Edges, Paths, and Transitive Closure

Facts: Let E be a binary relation.

 PATH is the Transitive Closure of E, i.e., the smallest binary relation T such that

 E ⊆ T

 T is transitive (if (a,b) ∈ T and (b,c) ∈ T, then (a,c) ∈ T).

 There are several well-known efficient algorithms for computing the Transitive Closure of a given binary relation E.

 Floyd–Warshall Algorithm (taught in CMPS 101).

 Recall that the following problem is NLOGSPACE-complete: Given E, a and b, is there a path from a to b? (i.e., is (a,b) ∈ TC?)

206 Transitive Closure and Relational Calculus

 Question: Is there a relational algebra (relational calculus) expression that defines ANCESTOR from PARENT? In other words, is there a relational algebra (relational calculus) expression that defines the Transitive Closure of a given binary relation E?

 Theorem: A. Aho and J. Ullman – 1979
There is no relational algebra (or relational calculus) expression that defines the Transitive Closure of a given binary relation E.

207 Transitive Closure and Relational Calculus

Theorem: A. Aho and J. Ullman – 1979
There is no relational algebra (or relational calculus) expression that defines the Transitive Closure of a given binary relation E.

Note:
 The proof of this result requires methods from mathematical logic.

 Ehrenfeucht-Fraïssé Games are a powerful method for proving limitations in the expressive power of relational calculus.

 Two-person perfect-information combinatorial games in which two players take turns and pick elements in two different database instances.
 One player tries to maintain a partial isomorphism between the moves played; the other player tries to violate this.

208 Transitive Closure and Relational Calculus

Theorem: A. Aho and J. Ullman – 1979
There is no relational algebra (or relational calculus) expression that defines the Transitive Closure of a given binary relation E.

Intuition behind this result:
 Relational Calculus queries can only express “local” properties.

[Figure: two graphs E and F, each with distinguished nodes s and t]

209 Limitations of Relational Algebra and Relational Calculus

 There is no relational algebra (relational calculus) expression involving PARENT that defines ANCESTOR.

 ANCESTOR is definable by an infinite union of conjunctive queries, but is not definable by any finite union of conjunctive queries.

 Aho and Ullman’s Theorem reveals a limitation in the expressive power of relational algebra and relational calculus, namely

 They cannot express recursive queries.

210 Overcoming the Limitations of Relational Calculus

 Question: What is to be done to overcome the limitations of the expressive power of relational calculus?

 Answer 1: Embedded Relational Calculus (Embedded SQL):

 Allow SQL commands inside a conventional programming language, such as C, Java, etc.

 This is an inferior solution, as it destroys the high-level character of SQL.

 Answer 2:

 Augment relational calculus with a high-level declarative mechanism for recursion.

 Conceptually, this is a superior solution as it maintains the high-level declarative character of relational calculus.

211 Datalog

 Datalog = “Conjunctive Queries + Recursion”

 Datalog was introduced by Chandra and Harel in 1982 and has been studied by the research community in depth since that time:

 Hundreds of research papers in major database conferences;

 Numerous doctoral dissertations.

 Recent applications outside databases:

 Specification of network properties

 Access control languages

 SQL:1999 and subsequent versions of the SQL standard provide support for a sublanguage of Datalog, called linear Datalog.

212 Datalog Syntax

 Definition: A Datalog program π is a finite set of rules each expressing a conjunctive query

T(x1,…,xk) :- R1(u1),…,Rn(un),

where each variable xi occurs in the body of the rule.

 Some relational symbols occurring in the heads of the rules may also occur in the bodies of the rules (unlike the rules for conjunctive queries).
 These relational symbols are the recursive relational symbols; they are also known as intensional database predicates (IDBs).

 The remaining relational symbols in the rules are known as the extensional database predicates (EDBs).

213 Datalog

 Example: Datalog program for Transitive Closure
T(x,y) :- E(x,y)
T(x,y) :- E(x,z), T(z,y)

 E is the EDB predicate

 T is the IDB predicate

 The intuition is that the Datalog program gives a recursive specification of the IDB predicate T in terms of the EDB E.

 Example: Another Datalog program for Transitive Closure
T(x,y) :- E(x,y)
T(x,y) :- T(x,z), T(z,y)
(“divide and conquer” algorithm for Transitive Closure)

214 Datalog

 Example: Paths of Even and Odd Length
Consider the Datalog program:
ODD(x,y) :- E(x,y)
ODD(x,y) :- E(x,z), EVEN(z,y)
EVEN(x,y) :- E(x,z), ODD(z,y).

 E is the EDB predicate
 EVEN and ODD are the IDB predicates.
 So, a Datalog program may have several different IDB predicates (and it may have several different EDB predicates as well).
 This program gives a recursive specification of the IDB predicates EVEN and ODD in terms of the EDB predicate E.
 This is a Datalog program expressing mutual recursion.

215 Conjunctive Queries vs. Datalog

 As we have seen, conjunctive queries can be written as rules:

T(x1,…,xk) :- R1(u1),…,Rn(un).
 In such a rule, the relation symbol in the head does not occur in the body of the rule.
 Datalog programs are finite sets of rules.

 In a Datalog program, however, a relation symbol occurring in the head of a rule:

 may also occur in the body of the same rule
T(x,y) :- E(x,z), T(z,y)

 or, it may occur in the body of another rule in the program
ODD(x,y) :- E(x,z), EVEN(z,y)
EVEN(x,y) :- E(x,z), ODD(z,y).

216 Datalog Semantics

 Question: What is the precise semantics of a Datalog program?

 Answer: Datalog programs can be given two different types of semantics.
 Declarative Semantics (denotational semantics)
 Smallest solutions of recursive specifications.
 Least fixed points of monotone operators.

 Procedural Semantics (operational semantics)
 An iterative process for computing the “meaning” of Datalog programs.

 Main Result: The declarative semantics coincides with the procedural semantics.

217 Declarative Semantics of Datalog Programs

Motivation:
 Recall the recursive definition of the factorial function f(n) = n!
f(0) = 1
f(n+1) = (n+1)·f(n)
 These two equations give a recursive specification of the factorial function f(n) = n!
 The factorial function is the only function on the integers that satisfies this specification.
 Similarly, recall the recursive definition of g(x,y) = x^y
g(x,0) = 1
g(x,y+1) = x·g(x,y)
 The exponential function g(x,y) = x^y is the only function on the integers that satisfies this specification.

218 Declarative Semantics of Datalog Programs

 Each Datalog program can be viewed as a recursive specification of its IDB predicates.
 This specification is expressed using relational algebra operators.
 The body of each rule uses π, σ, and cartesian product ×.
 All rules having the same predicate in the head are combined using union.
 The recursive specification is given by equations involving unions of conjunctive queries.

 Example:
T(x,y) :- E(x,y)
T(x,y) :- T(x,z), T(z,y)
 Recursive equation:

T = E ∪ π1,4(σ$2=$3(T × T))

219 Declarative Semantics of Datalog Programs

Example: Consider the Datalog program:
ODD(x,y) :- E(x,y)
ODD(x,y) :- E(x,z), EVEN(z,y)
EVEN(x,y) :- E(x,z), ODD(z,y).

 System of recursive equations:

ODD = E ∪ π1,4(σ$2=$3(E × EVEN))

EVEN = π1,4(σ$2=$3(E × ODD)).

220 Declarative Semantics of Datalog Programs

 Unlike the recursive equations for the factorial and the exponential function, recursive equations arising from Datalog programs need not have a unique solution.

 Example: Consider the recursive equation:

T = E ∪ π1,4(σ$2=$3(T × T))
Let E = {(1,2), (2,3)}.

Then both T1 and T2 satisfy this recursive equation, where

 T1 = {(1,2), (2,3), (1,3)}

 T2 = {(1,2), (2,1), (2,3), (3,2), (1,3), (3,1), (1,1), (2,2), (3,3)}.
Furthermore, this recursive equation has many other solutions.
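Whether a relation solves the equation can be checked by evaluating the right-hand side once and comparing. A minimal sketch with relations as Python sets of pairs; the function name `rhs` is an assumption, and T2 is taken to be the full relation on {1,2,3}.

```python
def rhs(E, T):
    """Right-hand side E ∪ π1,4(σ$2=$3(T × T)) of the recursive equation."""
    return set(E) | {(a, c) for (a, b) in T for (b2, c) in T if b == b2}

E = {(1, 2), (2, 3)}
T1 = {(1, 2), (2, 3), (1, 3)}
T2 = {(a, b) for a in (1, 2, 3) for b in (1, 2, 3)}  # full relation on {1,2,3}

print(rhs(E, T1) == T1, rhs(E, T2) == T2)
print(rhs(E, E) == E)  # E itself is not a solution: (1,3) is missing
```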

221 Declarative Semantics of Datalog Programs

 Theorem: Every recursive equation arising from a Datalog program has a smallest solution (smallest w.r.t. the ⊆ partial order).
 Example: Datalog program
T(x,y) :- E(x,y)
T(x,y) :- T(x,z), T(z,y)
 Recursive equation:

T = E ∪ π1,4(σ$2=$3(T × T))
 The smallest solution of this recursive equation is the Transitive Closure of E.

 Note: This is a special case of the Knaster-Tarski Theorem for smallest solutions of recursive equations arising from monotone operators (it is important that Datalog uses monotone relational algebra operators only).

222 Declarative Semantics of Datalog Programs

 Theorem: Every recursive equation arising from a Datalog program has a smallest solution (smallest w.r.t. the ⊆ partial order).

 Definition: The (declarative) semantics of a Datalog program is the smallest solution of the system of recursive equations arising from the Datalog program.

 Note:

 If the Datalog program has more than one IDB, then we get a system of recursive equations, instead of a single one.

 Question:

 What does “smallest solution of a system” mean in this case?

223 Declarative Semantics of Datalog Programs

Example: Consider again the Datalog program:
ODD(x,y) :- E(x,y)
ODD(x,y) :- E(x,z), EVEN(z,y)
EVEN(x,y) :- E(x,z), ODD(z,y).

 System of recursive equations:

ODD = E ∪ π1,4(σ$2=$3(E × EVEN))

EVEN = π1,4(σ$2=$3(E × ODD)).
 The smallest solution of this system is a pair of relations (P,Q) such that

1. (P,Q) satisfies this system (e.g., Q = π1,4(σ$2=$3(E × P))).
2. If a pair (P’,Q’) satisfies this system, then P ⊆ P’ and Q ⊆ Q’.

224 Declarative Semantics of Datalog Programs

 Question:

 How difficult is it to compute the declarative semantics of Datalog programs?

 In other words, what is the computational complexity of the query evaluation problem for Datalog queries?

 Note: On the face of their definition, the declarative semantics of Datalog programs do not give rise to an algorithm for computing this semantics.

225 Procedural Semantics of Datalog Programs

Definition: Let π be a Datalog program. The procedural semantics of π are obtained by the following bottom-up evaluation of the recursive predicates (IDBs) of π:

1. Set all IDBs of π to ∅.
2. Apply all rules of π in parallel; update the IDBs by evaluating the bodies of the rules.
3. Repeat until no IDB predicate changes.
4. Return the values of the IDB predicates obtained at the end of Step 3.

226 Procedural Semantics of Datalog Programs

Example: Datalog program for Transitive Closure
T(x,y) :- E(x,y)
T(x,y) :- E(x,z), T(z,y)

 Bottom-up evaluation:
T^0 = ∅
T^{n+1} = {(a,b): E(a,b) ∨ ∃z (E(a,z) ∧ T^n(z,b))}

Fact: The following statements are true:
 T^n = {(a,b): there is a path of length at most n from a to b}
 Transitive Closure of E = ∪_{n≥1} T^n.
Proof: By induction on n.

227 Procedural Semantics of Datalog Programs

Example: Another Datalog program for Transitive Closure
T(x,y) :- E(x,y)
T(x,y) :- T(x,z), T(z,y)

 Bottom-up evaluation:
T^0 = ∅
T^{n+1} = {(a,b): E(a,b) ∨ ∃z (T^n(a,z) ∧ T^n(z,b))}

Fact: The following statements are true:
 For n ≥ 1, T^n = {(a,b): there is a path of length at most 2^(n−1) from a to b} (since T^1 = E)
 Transitive Closure of E = ∪_{n≥1} T^n.
Proof: By induction on n.
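The two Transitive Closure programs can be compared by running the bottom-up evaluation and recording every stage. The sketch below is a naive evaluation, not an optimized one; the names `bottom_up`, `linear` and `nonlinear` are assumptions, and stages are counted starting from the empty relation.

```python
def bottom_up(E, recursive_rule):
    """Naive bottom-up evaluation; returns the list of stages starting at ∅."""
    stages = [set()]
    while True:
        # T(x,y) :- E(x,y) together with the recursive rule, applied to the last stage
        nxt = set(E) | recursive_rule(E, stages[-1])
        if nxt == stages[-1]:
            return stages
        stages.append(nxt)

def linear(E, T):
    # T(x,y) :- E(x,z), T(z,y)
    return {(a, b) for (a, z) in E for (z2, b) in T if z == z2}

def nonlinear(E, T):
    # T(x,y) :- T(x,z), T(z,y)
    return {(a, b) for (a, z) in T for (z2, b) in T if z == z2}

# A directed path 1 → 2 → ... → 9 with 8 edges
E = {(i, i + 1) for i in range(1, 9)}
lin = bottom_up(E, linear)
non = bottom_up(E, nonlinear)

print(lin[-1] == non[-1], len(lin) - 1, len(non) - 1)
```

On this path the first program needs 8 productive rounds and the second only 4, because the reachable path length doubles in every round of the second program.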

228 Procedural Semantics of Datalog Programs

Example: Consider the Datalog program
  ODD(x,y) :- E(x,y)
  ODD(x,y) :- E(x,z), EVEN(z,y)
  EVEN(x,y) :- E(x,z), ODD(z,y)

• Bottom-up evaluation:
  ODD^0 = ∅
  EVEN^0 = ∅
  ODD^{n+1} = {(a,b) : E(a,b) ∨ ∃z (E(a,z) ∧ EVEN^n(z,b))}
  EVEN^{n+1} = {(a,b) : ∃z (E(a,z) ∧ ODD^n(z,b))}

Fact: The following statements are true:
• ∪_{n≥1} ODD^n = {(a,b) : there is a path of odd length from a to b}
• ∪_{n≥1} EVEN^n = {(a,b) : there is a path of even length from a to b}.
Proof: By induction on n.
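With two IDBs, both predicates are updated in parallel from the previous stage. A minimal Python sketch of this simultaneous iteration follows; the function name odd_even and the sample graph are assumptions made for illustration.

```python
def odd_even(edges):
    """Bottom-up evaluation of the ODD/EVEN program; returns (ODD, EVEN)."""
    edges = set(edges)
    odd, even = set(), set()  # ODD^0 = EVEN^0 = ∅
    while True:
        # Both new stages are computed from the previous stage only.
        new_odd = edges | {(a, b) for (a, z) in edges for (z2, b) in even if z == z2}
        new_even = {(a, b) for (a, z) in edges for (z2, b) in odd if z == z2}
        if new_odd == odd and new_even == even:
            return odd, even
        odd, even = new_odd, new_even


# Hypothetical EDB: the directed path 1 -> 2 -> 3 -> 4.
E = {(1, 2), (2, 3), (3, 4)}
odd, even = odd_even(E)
```

The returned pair is a solution of the system of recursive equations for ODD and EVEN, and by the theorem relating the two semantics it is the smallest one.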

229 Declarative vs. Procedural Datalog Semantics

Theorem: Let π be a Datalog program. Then the following are true:

• The bottom-up evaluation of the procedural semantics of π terminates within a number of steps bounded by a polynomial in the size of the database instance (= size of the EDB predicates).

• The declarative semantics of π coincides with the procedural semantics of π.

Proof: For simplicity, assume that π has a single IDB T of arity k.
• By induction on n, show that T^n ⊆ T^{n+1}, for every n (this uses the monotonicity of unions of conjunctive queries).
• Hence, T^0 ⊆ T^1 ⊆ … ⊆ T^n ⊆ T^{n+1} ⊆ …
• Since each T^n ⊆ adom(I)^k, there is an m ≤ |adom(I)|^k such that T^m = T^{m+1}.

230 Declarative vs. Procedural Datalog Semantics

Theorem: Let π be a Datalog program. Then the following are true:
• The bottom-up evaluation of the procedural semantics of π terminates within a number of steps bounded by a polynomial in the size of the database instance (= size of the EDB predicates).
• The declarative semantics of π coincides with the procedural semantics of π.

Proof: For simplicity, assume that π has a single IDB T of arity k.
• Since T^m = T^{m+1}, we have that the procedural semantics produces a solution to the recursive equation arising from π.
• By induction on n, show that if T* is another solution of this recursive equation, then T^n ⊆ T*, for all n (use the monotonicity of unions of conjunctive queries again).
• In particular, T^m ⊆ T*, hence T^m is the smallest solution of this recursive equation.

231 The Query Evaluation Problem for Datalog

Theorem: Let π be a Datalog program. There is a polynomial-time algorithm such that, given a database instance I, it evaluates π on I (i.e., it computes the semantics of π on I).

Proof: The bottom-up evaluation of the procedural semantics of π runs in polynomial time because:

• The number of iterations is bounded by a polynomial in the size of I.

• Each step of the iteration can be carried out in polynomial time (why?).

Corollary: The data complexity of Datalog is in P.

232 The Query Evaluation Problem for Datalog

Corollary: The data complexity of Datalog is in P.

Theorem: The combined complexity of Datalog is EXPTIME-complete.

Note:
• P ⊆ NP ⊆ PSPACE ⊆ EXPTIME.
• Moreover, it is known that P is properly contained in EXPTIME.
• Thus, Datalog has higher combined complexity than relational calculus (since the combined complexity of relational calculus is PSPACE-complete).

233 Some Interesting Datalog Programs

Example: Non-2-Colorability can be expressed by a Datalog program.

Fact: A graph E is 2-colorable if and only if it does not contain a cycle of odd length.

Datalog program for Non-2-Colorability:

  ODD(x,y) :- E(x,y)
  ODD(x,y) :- E(x,z), EVEN(z,y)
  EVEN(x,y) :- E(x,z), ODD(z,y).
  Q :- ODD(x,x)

Sanity check: Can you find a Datalog program for Non-3-Colorability?
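The Non-2-Colorability program can be tried out on small graphs. The sketch below assumes an undirected graph is given as a symmetric edge relation; the helper names non_2_colorable and symmetric, and the two sample graphs, are hypothetical.

```python
def non_2_colorable(edges):
    """Evaluate Q :- ODD(x,x) bottom-up for the ODD/EVEN program."""
    edges = set(edges)
    odd, even = set(), set()
    while True:
        new_odd = edges | {(a, b) for (a, z) in edges for (z2, b) in even if z == z2}
        new_even = {(a, b) for (a, z) in edges for (z2, b) in odd if z == z2}
        if new_odd == odd and new_even == even:
            break
        odd, even = new_odd, new_even
    # Q holds iff some node reaches itself by a walk of odd length.
    return any(a == b for (a, b) in odd)


def symmetric(pairs):
    """Turn a list of undirected edges into a symmetric binary relation."""
    return set(pairs) | {(b, a) for (a, b) in pairs}


triangle = symmetric({(1, 2), (2, 3), (3, 1)})        # odd cycle
square = symmetric({(1, 2), (2, 3), (3, 4), (4, 1)})  # bipartite
```

The query is true on the triangle and false on the square, in line with the odd-cycle characterization of 2-colorability.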

234 Some Interesting Datalog Programs

Example: Path Systems Problem
  T(x) :- A(x)
  T(x) :- R(x,y,z), T(y), T(z)

Theorem: S. Cook – 1974
Evaluating this Datalog program is a P-complete problem (via logspace reductions).

Note:
• Path Systems was the first problem shown to be P-complete.
• In particular, it is highly unlikely that Path Systems is in NLOGSPACE or in LOGSPACE.
• In this sense, Datalog has higher data complexity than relational calculus.
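Bottom-up evaluation of the Path Systems program is a short fixpoint loop. In this sketch, A is a set of "axioms" and R a set of triples; the function name path_system and the sample data are assumptions made for illustration.

```python
def path_system(axioms, rules):
    """Evaluate T(x) :- A(x);  T(x) :- R(x,y,z), T(y), T(z) bottom-up."""
    axioms, rules = set(axioms), set(rules)
    t = set()
    while True:
        new_t = axioms | {x for (x, y, z) in rules if y in t and z in t}
        if new_t == t:
            return t
        t = new_t


# Hypothetical instance: "e" needs "f", which is never derived.
A = {"a", "b"}
R = {("c", "a", "b"), ("d", "c", "a"), ("e", "d", "f")}
derivable = path_system(A, R)
```

Each round takes time polynomial in the size of A and R, and there are at most as many rounds as there are elements, which illustrates the polynomial-time upper bound (the P-hardness is the nontrivial part of the theorem).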

235 Procedural Semantics of Datalog Programs

• On the face of it, the bottom-up evaluation of Datalog programs is an infinite process, since it gives rise to an infinite sequence of “stages” T^0, T^1, …, T^n, …

• Question: Does this infinite sequence converge?

• It will turn out that, for every Datalog program π and for every database instance I, the bottom-up evaluation of π on I terminates, i.e., there is some m such that T^m = T^{m+1}. Moreover, the smallest such m is bounded by some polynomial in the size of I.

242 Linear Datalog

• Definition: A Datalog program is linear if the body of each rule contains at most one atomic formula involving an IDB predicate.

• Example: Linear Datalog program for Transitive Closure
  T(x,y) :- E(x,y)
  T(x,y) :- E(x,z), T(z,y)

• Example: Nonlinear Datalog program for Transitive Closure
  T(x,y) :- E(x,y)
  T(x,y) :- T(x,z), T(z,y)

243 Linear Datalog

Example: Give a linear Datalog program that computes the binary query COUSIN from the binary relation schema PARENT

  SIBLING(x,y) :- PARENT(z,x), PARENT(z,y)
  COUSIN(x,y) :- PARENT(z,x), PARENT(w,y), SIBLING(z,w)
  COUSIN(x,y) :- PARENT(z,x), PARENT(w,y), COUSIN(z,w).

Fact: COUSIN(Barack Obama, Dick Cheney)
Actually, COUSIN^8(Barack Obama, Dick Cheney)
http://www.msnbc.msn.com/id/21340764/

Fact: COUSIN(Sarah Palin, Princess Diana).
Actually, COUSIN^10(Sarah Palin, Princess Diana)
http://www.dailymail.co.uk/news/worldnews/article1073249/SarahPalin PrincessDianacousinsgenealogistsreveal.html

244 Linear Datalog

• Definition: A Datalog program π is linearizable if there is a linear Datalog program π* that is equivalent to π.

• Example: The following Datalog program is linearizable:
  T(x,y) :- E(x,y)
  T(x,y) :- T(x,z), T(z,y)

• Example: The following Datalog program is not linearizable:
  T(x) :- A(x)
  T(x) :- R(x,y,z), T(y), T(z)
  (the proof of this fact is nontrivial).

245 Datalog and SQL

• SQL:99 and subsequent versions of the SQL standard provide support for linear Datalog programs (but not for nonlinear ones).

• Syntax: WITH RECURSIVE T AS <Datalog program for T> <query involving T>

• Semantics:
  • Compute T as the semantics of <Datalog program for T>.
  • The result of the previous step is a temporary relation that is then used, together with other EDBs, as if it were a stored relation (an EDB) in <query involving T>.

246 Datalog and SQL

Example: Give an SQL query for ANCESTOR (using PARENT as EDB)

  -- The column is named "descendant" because DESC is a reserved word in SQL.
  WITH RECURSIVE ANCESTOR(anc, descendant) AS
    (SELECT parent, child
     FROM PARENT
     UNION
     SELECT PARENT.parent, ANCESTOR.descendant
     FROM PARENT, ANCESTOR
     WHERE PARENT.child = ANCESTOR.anc)
  SELECT * FROM ANCESTOR

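247 Datalog and SQL

Example: Give an SQL query that computes all descendants of Noah

  -- The column is named "descendant" because DESC is a reserved word in SQL.
  WITH RECURSIVE ANCESTOR(anc, descendant) AS
    (SELECT parent, child
     FROM PARENT
     UNION
     -- extend a known (anc, descendant) pair by one more PARENT step
     SELECT ANCESTOR.anc, PARENT.child
     FROM PARENT, ANCESTOR
     WHERE ANCESTOR.descendant = PARENT.parent)
  SELECT descendant
  FROM ANCESTOR
  WHERE anc = 'Noah'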
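A recursive query of this kind can be tried with the sqlite3 module from the Python standard library. This is only a sketch: SQLite stands in for an SQL:99 system, the rows of PARENT are made up for illustration, and the second column of ANCESTOR is called descendant because DESC is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PARENT (parent TEXT, child TEXT)")
# Illustrative sample data (an assumption, not taken from the slides).
conn.executemany(
    "INSERT INTO PARENT VALUES (?, ?)",
    [("Lamech", "Noah"), ("Noah", "Shem"), ("Noah", "Ham"), ("Shem", "Arpachshad")],
)

query = """
WITH RECURSIVE ANCESTOR(anc, descendant) AS (
    SELECT parent, child FROM PARENT
    UNION
    SELECT ANCESTOR.anc, PARENT.child
    FROM PARENT, ANCESTOR
    WHERE ANCESTOR.descendant = PARENT.parent
)
SELECT descendant FROM ANCESTOR WHERE anc = 'Noah'
"""
noah_descendants = sorted(row[0] for row in conn.execute(query))
```

The recursive branch mentions ANCESTOR exactly once, which is the SQL counterpart of a linear Datalog rule.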

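248 Datalog and SQL

• Linear Datalog programs with multiple IDBs are supported in SQL:99

• Syntax: WITH RECURSIVE R, S, T, … AS <Datalog program for R, S, T, …> <query involving R, S, T, …>

• Semantics:
  • Compute R, S, T, … as the semantics of <Datalog program for R, S, T, …>.
  • The results of the previous step are temporary relations that are then used, together with other EDBs, as if they were stored relations (EDBs) in <query involving R, S, T, …>.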

249 Datalog(≠)

• Definition: Datalog(≠)

  • Datalog(≠) is the extension of Datalog in which the body of a rule may also contain ≠.

  • Declarative and Procedural Semantics of Datalog(≠) are similar to those of Datalog.

• Example: w-AVOIDING PATH
  Given a graph E and three nodes x, y, and w, is there a path from x to y that does not contain w?
  T(x,y,w) :- E(x,y), x ≠ w, y ≠ w
  T(x,y,w) :- E(x,z), T(z,y,w), x ≠ w.
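The w-AVOIDING PATH program can be evaluated bottom-up like a plain Datalog program, since ≠ only filters tuples. In the sketch below, the variable w of the first rule ranges over an explicitly supplied set of nodes; that convention, the function name avoiding_path, and the sample graphs are assumptions.

```python
def avoiding_path(edges, nodes):
    """Evaluate the w-AVOIDING PATH program; returns the ternary relation T."""
    edges, nodes = set(edges), set(nodes)
    t = set()
    while True:
        # T(x,y,w) :- E(x,y), x ≠ w, y ≠ w
        new_t = {(x, y, w) for (x, y) in edges for w in nodes if x != w and y != w}
        # T(x,y,w) :- E(x,z), T(z,y,w), x ≠ w
        new_t |= {(x, y, w) for (x, z) in edges for (z2, y, w) in t
                  if z == z2 and x != w}
        if new_t == t:
            return t
        t = new_t


# Hypothetical graph with two routes from 1 to 3: via 2 and via 4.
T = avoiding_path({(1, 2), (2, 3), (1, 4), (4, 3)}, {1, 2, 3, 4})
# Hypothetical graph where every path from 1 to 3 goes through 2.
T2 = avoiding_path({(1, 2), (2, 3)}, {1, 2, 3})
```

In the first graph, node 3 is reachable from node 1 while avoiding 2 (and also while avoiding 4); in the second, no tuple of the form (1, 3, w) is derived.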

250 Datalog with Negation

• Question: What if we allow negation in the bodies of Datalog rules?

• Examples:
  • T(x) :- ¬T(x) (the recursive specification has no solutions!)
  • S(x) :- E(x,y), ¬S(y)

• Note: Several different semantics for Datalog programs with negation have been proposed over the years (see Ch. 15 of AHV):
  • Stratified Datalog programs
  • Well-founded semantics
  • Inflationary semantics
  • Stable model semantics
  • …

251 Query Equivalence and Containment for Datalog

• Note: Recall that the following are known about the Query Evaluation Problem for Datalog queries:

  • The data complexity of Datalog is in P.

  • The combined complexity of Datalog is EXPTIME-complete.

• Questions:

  • What about the Query Equivalence Problem for Datalog: Given two Datalog programs π and π′, is π equivalent to π′? (do they return the same answer on every database instance?)

  • What about the Query Containment Problem for Datalog: Given two Datalog programs π and π′, is π ⊆ π′? (is π(I) ⊆ π′(I), on every database instance I?)

252 Query Equivalence and Containment for Datalog

Theorem: O. Shmueli – 1987

• The query equivalence problem for Datalog queries is undecidable. In fact, it is undecidable even for Datalog queries with a single IDB.

• Consequently, the query containment problem for Datalog queries is undecidable.

Hint of Proof:

• Reduction from Context-Free Grammar Equivalence: Given two context-free grammars G and G′, is L(G) = L(G′)?

For more on this topic, read Chapter 12 of AHV.

253 The Complexity of Database Query Languages

                        Relational       Conjunctive      Unions of            Datalog
                        Calculus         Queries          Conjunctive Queries  Queries

Query Eval.:            PSPACE-complete  NP-complete      NP-complete          EXPTIME-complete
Combined Complexity

Query Eval.:            In LOGSPACE      In LOGSPACE      In LOGSPACE          P-complete
Data Complexity         (hence, in P)    (hence, in P)    (hence, in P)

Query Equivalence       Undecidable      NP-complete      NP-complete          Undecidable

Query Containment       Undecidable      NP-complete      NP-complete          Undecidable

254 Composing Schema Mappings: An Overview

• This topic complements the topics on data integration and data exchange presented in the course by Maurizio Lenzerini.

• It is joint work with Ronald Fagin, Lucian Popa, and Wang-Chiew Tan.

255 Theoretical Aspects of Data Interoperability

The research community has studied two different, but closely related, facets of data interoperability:

• Data Integration (aka Data Federation)

  • Formalized and studied for the past 10-15 years

• Data Exchange (aka Data Translation)

  • Formalized and studied for the past 5-6 years

256 Data Integration

Query heterogeneous data in different sources via a virtual global schema

[Figure: sources S1, S2, S3 with instances I1, I2, I3; a query Q is posed against the global schema T.]

257 Data Exchange

Transform data structured under a source schema into data structured under a different target schema.

[Figure: Σ relates the source schema S to the target schema T; I is a source instance and J is a target instance.]

258 Schema Mappings

• Schema mappings: High-level, declarative assertions that specify the relationship between two database schemas.

• Schema mappings constitute the essential building blocks in formalizing and studying data interoperability tasks, including data integration and data exchange.

• Schema mappings make it possible to separate the design of the relationship between schemas from its implementation.
  • Are easier to generate and manage (semi-)automatically;
  • Can be compiled into SQL/XSLT scripts automatically.

259 Schema Mappings

[Figure: Σ relates Source S to Target T.]

• Schema Mapping M = (S, T, Σ)

  • Source schema S, Target schema T

  • High-level, declarative assertions Σ that specify the relationship between S and T.

• Question: What is a “good” schema mapping specification language?

260 Schema Mapping Specification Languages

• Obvious Idea: Use a logic-based language to specify schema mappings. In particular, use first-order logic.

• Warning: Unrestricted use of first-order logic as a schema mapping specification language gives rise to undecidability of basic algorithmic problems about schema mappings.

261 Schema Mapping Specification Languages

Let us consider some simple tasks that every schema mapping specification language should support:

• Copy (Nicknaming):

  • Copy each source table to a target table and rename it.

• Projection:

  • Form a target table by projecting on one or more columns of a source table.

• Augmentation:

  • Form a target table by adding one or more columns to a source table.

• Decomposition:

  • Decompose a source table into two or more target tables.

• Join:

  • Form a target table by joining two or more source tables.

• Combinations of the above (e.g., “join + column augmentation + …”)

262 Schema Mapping Specification Languages

• Copy (Nicknaming):
  • ∀x1,…,xn (P(x1,…,xn) → R(x1,…,xn))

• Projection:
  • ∀x,y,z (P(x,y,z) → R(x,y))

• Column Augmentation:
  • ∀x,y (P(x,y) → ∃z R(x,y,z))

• Decomposition:
  • ∀x,y,z (P(x,y,z) → R(x,y) ∧ T(y,z))

• Join:
  • ∀x,y,z (E(x,z) ∧ F(z,y) → R(x,y,z))

• Combinations of the above (e.g., “join + column augmentation + …”)
  • ∀x,y,z (E(x,z) ∧ F(z,y) → ∃w (R(x,y) ∧ T(x,y,z,w)))

263 Schema Mapping Specification Languages

• Question: What do all these tasks (copy, projection, column augmentation, decomposition, join) have in common?

• Answer:

  • They can be specified using tuple-generating dependencies (tgds).

  • In fact, they can be specified using a special class of tuple-generating dependencies known as source-to-target tuple-generating dependencies (s-t tgds).

264 Database Integrity Constraints

• Dependency Theory: extensive study of integrity constraints in relational databases in the 1970s and 1980s (Codd, Fagin, Beeri, Vardi, …)

• Tuple-generating dependencies (tgds) emerged as an important class of constraints with a balance between high expressive power and good algorithmic properties. Tgds are expressions of the form
  ∀x (ϕ(x) → ∃y ψ(x, y)), where ϕ(x), ψ(x, y) are conjunctions of atomic formulas.
  Special Cases:
  • Inclusion Dependencies
  • Multivalued Dependencies

265 Schema Mapping Specification Language

The relationship between source and target is given by source-to-target tuple-generating dependencies (s-t tgds)
  ∀x (ϕ(x) → ∃y ψ(x, y)), where
• ϕ(x) is a conjunction of atoms over the source;
• ψ(x, y) is a conjunction of atoms over the target.

• s-t tgds assert that: some conjunctive query over the source is contained in some other conjunctive query over the target.

Example: (dropping the universal quantifiers in the front)
  (Student(s) ∧ Enrolls(s,c)) → ∃t ∃g (Teaches(t,c) ∧ Grade(s,c,g))
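Checking whether a pair (I, J) of instances satisfies this s-t tgd amounts to a nested loop over the source facts. The following sketch assumes that relations are Python sets of tuples (and Student a set of names); the function name satisfies_tgd and the sample instances are hypothetical.

```python
def satisfies_tgd(student, enrolls, teaches, grade):
    """Check (Student(s) ∧ Enrolls(s,c)) → ∃t ∃g (Teaches(t,c) ∧ Grade(s,c,g))."""
    for (s, c) in enrolls:
        if s not in student:
            continue  # the premise does not hold for this pair
        has_teacher = any(c2 == c for (_t, c2) in teaches)
        has_grade = any((s2, c2) == (s, c) for (s2, c2, _g) in grade)
        if not (has_teacher and has_grade):
            return False
    return True


# Hypothetical source instance I and two candidate target instances J.
student = {"ann"}
enrolls = {("ann", "db")}
ok = satisfies_tgd(student, enrolls, {("N1", "db")}, {("ann", "db", "N2")})
missing_grade = satisfies_tgd(student, enrolls, {("N1", "db")}, set())
```

The first target instance satisfies the dependency; the second does not, because no Grade fact exists for ("ann", "db").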

266 Schema Mapping Specification Language

Fact: s-t tgds generalize the main specifications used in data integration:
• They generalize LAV (local-as-view) specifications: P(x) → ∃y ψ(x, y), where P is a source relation.
  • E(x,y) → ∃z (H(x,z) ∧ H(z,y)) (LAV constraint)
  Note: Copy, projection, and decomposition are LAV s-t tgds.

• They generalize GAV (global-as-view) specifications: ϕ(x) → R(x), where R is a target relation (they are equivalent to full tgds: ϕ(x) → ψ(x), where ϕ(x) and ψ(x) are conjunctions of atoms).
  • E(x,y) ∧ E(y,z) → F(x,z) (GAV (full) constraint)
  Note: Copy, projection, and join are GAV s-t tgds.

267 Schema Mappings & Data Exchange

[Figure: Σ relates Source S to Target T; I is a source instance and J is a target instance.]

• Data Exchange via the schema mapping M = (S, T, Σ)
  Given a source instance I, construct a target instance J, so that (I,J) satisfy the specifications Σ of M. Such a J is called a solution for I.
• Difficulty:
  • Usually, there are multiple solutions
  • Which one is the “best” to materialize?

268 Data Exchange & Universal Solutions

Fagin, K…, Miller, Popa: Identified and studied the concept of a universal solution for schema mappings specified by s-t tgds

• A universal solution is a most general solution.

• A universal solution “represents” the entire space of solutions.

• A “canonical” universal solution can be generated efficiently using the chase procedure.

• A universal solution can be used to compute the certain answers of conjunctive queries over the target schema.
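For a single s-t tgd, the chase can be sketched in a few lines: every match of the left-hand side in the source triggers the creation of target facts, with a fresh labeled null for each existentially quantified variable. The sketch below hard-codes the tgd (Student(s) ∧ Enrolls(s,c)) → ∃t ∃g (Teaches(t,c) ∧ Grade(s,c,g)); representing nulls as the strings "N1", "N2", … and the function name chase are assumptions.

```python
from itertools import count


def chase(student, enrolls):
    """Chase a source instance with the Student/Enrolls s-t tgd.

    Returns target relations (Teaches, Grade) containing labeled nulls.
    """
    fresh = (f"N{i}" for i in count(1))  # generator of fresh labeled nulls
    teaches, grade = set(), set()
    for (s, c) in sorted(enrolls):  # sorted only to make null names deterministic
        if s in student:
            teaches.add((next(fresh), c))   # null for the unknown teacher t
            grade.add((s, c, next(fresh)))  # null for the unknown grade g
    return teaches, grade


# Hypothetical source instance.
teaches, grade = chase({"ann", "bob"}, {("ann", "db"), ("bob", "db")})
```

Because s-t tgds read only the source and write only the target, a single pass over the source suffices here; mappings with target constraints would need repeated chase steps.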

269 Managing Schema Mappings

• Schema mappings can be quite complex.

• Methods and tools are needed to automate or semi-automate schema mapping management.

• Metadata Management Framework – Bernstein 2003, based on generic schema mapping operators:
  • Match operator
  • Merge operator
  • …
  • Composition operator
  • Inverse operator

270 Composing Schema Mappings

[Figure: M12 maps Schema S1 to Schema S2, M23 maps Schema S2 to Schema S3, and M13 maps Schema S1 to Schema S3.]

• Given M12 = (S1, S2, Σ12) and M23 = (S2, S3, Σ23), derive a schema mapping M13 = (S1, S3, Σ13) that is “equivalent” to the sequential application of M12 and M23.

• M13 is a composition of M12 and M23

  M13 = M12 ∘ M23

271 Composing Schema Mappings

[Figure: M12 maps Schema S1 to Schema S2, M23 maps Schema S2 to Schema S3, and M13 maps Schema S1 to Schema S3.]

• Given M12 = (S1, S2, Σ12) and M23 = (S2, S3, Σ23), derive a schema mapping M13 = (S1, S3, Σ13) that is “equivalent” to the sequence M12 and M23.

What does it mean for M13 to be “equivalent” to the composition of M12 and M23?

272 Earlier Work

• Metadata Model Management (Bernstein in CIDR 2003)

  • Composition is one of the fundamental operators

  • However, no precise semantics is given

• Composing Mappings among Data Sources (Madhavan & Halevy in VLDB 2003)

  • First to propose a semantics for composition

  • However, their definition is in terms of maintaining the same certain answers relative to a class of queries.

  • Their notion of composition depends on the class of queries; it may not be unique up to logical equivalence.

273 Semantics of Composition

• Every schema mapping M = (S, T, Σ) defines a binary relationship Inst(M) between instances:
  Inst(M) = {(I,J) | (I,J) ⊨ Σ}.
  In fact, from a semantic point of view, a schema mapping M can be identified with the set Inst(M).

• Definition: (FKPT)

  A schema mapping M13 is a composition of M12 and M23 if

  Inst(M13) = Inst(M12) ∘ Inst(M23), that is,

  (I1,I3) ⊨ Σ13 if and only if

  there exists I2 such that (I1,I2) ⊨ Σ12 and (I2,I3) ⊨ Σ23.

274 The Composition of Schema Mappings

Fact: If both M = (S1, S3, Σ) and M′ = (S1, S3, Σ′) are compositions of M12 and M23, then Σ and Σ′ are logically equivalent. For this reason:

• We say that M (or M′) is the composition of M12 and M23.

• We write M12 ∘ M23 to denote it.

275 Issues in Composition of Schema Mappings

• The semantics of composition was the first main issue.

• The second main issue is the language of the composition.

  • Is the language of s-t tgds closed under composition? If M12 and M23 are specified by finite sets of s-t tgds, is M12 ∘ M23 also specified by a finite set of s-t tgds?

  • If not, what is the “right” language for composing schema mappings?

276 Inexpressibility of Composition

Theorem:
• The language of s-t tgds is not closed under composition.

• In fact, there are schema mappings M12 and M23 specified by s-t tgds such that their composition M12 ∘ M23 is not expressible in least fixed-point logic LFP; hence, it is expressible neither in first-order logic nor in Datalog.

277 Lower Bounds for Composition

• M12:
  ∀x∀y (E(x,y) → ∃u∃v (C(x,u) ∧ C(y,v)))
  ∀x∀y (E(x,y) → F(x,y))

• M23:
  ∀x∀y∀u∀v (C(x,u) ∧ C(y,v) ∧ F(x,y) → D(u,v))

• Given graph G = (V,E):

  • Let I1 = E
  • Let I3 = {(r,g), (g,r), (b,r), (r,b), (g,b), (b,g)}

Fact:

  G is 3-colorable iff (I1,I3) ∈ Inst(M12) ∘ Inst(M23)

• Theorem (Dawar – 1998): 3-Colorability is not expressible in LFP
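The Fact can be checked by brute force on small graphs: search for an intermediate instance I2 = (C, F) with (I1, I2) ⊨ Σ12 and (I2, I3) ⊨ Σ23. The sketch below assumes it is enough to try F = E and relations C that assign exactly one color to each vertex, since removing tuples from I2 cannot violate Σ23; the function name in_composition and the sample graphs are hypothetical.

```python
from itertools import product

COLORS = ("r", "g", "b")
# The target instance I3: all pairs of distinct colors.
D = {(u, v) for u in COLORS for v in COLORS if u != v}


def in_composition(edges):
    """Decide whether (I1, I3) ∈ Inst(M12) ∘ Inst(M23) for I1 = edges."""
    edges = set(edges)
    vertices = sorted({x for e in edges for x in e})
    F = edges  # smallest F permitted by E(x,y) → F(x,y)
    for assignment in product(COLORS, repeat=len(vertices)):
        C = set(zip(vertices, assignment))  # one C-tuple per vertex
        # Σ23: C(x,u) ∧ C(y,v) ∧ F(x,y) → D(u,v)
        if all((u, v) in D for (x, u) in C for (y, v) in C if (x, y) in F):
            return True
    return False


triangle = {(1, 2), (2, 3), (1, 3)}                          # 3-colorable
k4 = {(i, j) for i in range(4) for j in range(4) if i < j}   # not 3-colorable
```

The search is exponential in the number of vertices, which is consistent with the NP-hardness of model checking for this composition.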

278 Complexity of Composition

Definition: The model checking problem for a schema mapping M = (S, T, Σ) asks: given a source instance I and a target instance J, does (I,J) ⊨ Σ?

Fact: If a schema mapping M is specified by s-t tgds, then the model checking problem for M is in LOGSPACE.

Fact: There are schema mappings M12 and M23 specified by s-t tgds such that the model checking problem for their composition M12 ∘ M23 is NP-complete.

279 Employee Example

• Σ12:
  • Emp(e) → ∃m Rep(e,m)
• Σ23:
  • Rep(e,m) → Mgr(e,m)
  • Rep(e,e) → SelfMgr(e)

[Figure: relations Emp(e), Rep(e,m), Mgr(e,m), and SelfMgr(e).]

• Theorem: This composition is not definable by any set (finite or infinite) of s-t tgds.

• Fact: This composition is definable in a well-behaved fragment of second-order logic, called SO tgds, that extends s-t tgds with Skolem functions.

280 Employee Example Revisited

Σ12:
• ∀e (Emp(e) → ∃m Rep(e,m))

Σ23:
• ∀e∀m (Rep(e,m) → Mgr(e,m))
• ∀e (Rep(e,e) → SelfMgr(e))

Fact: The composition is definable by the SO tgd

Σ13:
• ∃f (∀e (Emp(e) → Mgr(e, f(e))) ∧ ∀e (Emp(e) ∧ (e = f(e)) → SelfMgr(e)))
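Since the function f in this SO tgd can be chosen independently for each employee, satisfaction of Σ13 can be tested by looking, for every e in Emp, for a manager m with Mgr(e,m) such that either m ≠ e or SelfMgr(e) holds. The sketch below assumes relations are Python sets; the function name satisfies_so_tgd and the test instances are hypothetical.

```python
def satisfies_so_tgd(emp, mgr, self_mgr):
    """Check ∃f (∀e (Emp(e) → Mgr(e, f(e))) ∧ ∀e (Emp(e) ∧ e = f(e) → SelfMgr(e)))."""
    for e in emp:
        # Possible values for f(e): any m with Mgr(e, m).
        candidates = [m for (e2, m) in mgr if e2 == e]
        # f(e) = m is acceptable if m differs from e or e is a self-manager.
        if not any(m != e or e in self_mgr for m in candidates):
            return False
    return True


only_self = satisfies_so_tgd({"a"}, {("a", "a")}, set())
self_declared = satisfies_so_tgd({"a"}, {("a", "a")}, {"a"})
other_choice = satisfies_so_tgd({"a"}, {("a", "a"), ("a", "b")}, set())
```

The third case is satisfied by choosing f(a) = b, which corresponds to the intermediate instance Rep = {(a, b)}.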

281 Second-Order Tgds

Definition: Let S be a source schema and T a target schema. A second-order tuple-generating dependency (SO tgd) is a formula of the form:

  ∃f1 … ∃fm ((∀x1 (φ1 → ψ1)) ∧ … ∧ (∀xn (φn → ψn))), where

• Each fi is a function symbol.

• Each φi is a conjunction of atoms from S and equalities of terms.

• Each ψi is a conjunction of atoms from T.

Example: ∃f (∀e (Emp(e) → Mgr(e, f(e))) ∧ ∀e (Emp(e) ∧ (e = f(e)) → SelfMgr(e)))

282 Composing SO Tgds and Data Exchange

Theorem (FKPT):

• The composition of two SO tgds is definable by an SO tgd.

• There is an algorithm for composing SO tgds.

• The chase procedure can be extended to SO tgds; it produces universal solutions in polynomial time.

• Every SO tgd is the composition of finitely many finite sets of s-t tgds. Hence, SO tgds are the “right” language for the composition of s-t tgds.

283 When is the composition FO-definable?

Fact:
• It is an undecidable problem to tell whether the composition of two schema mappings specified by s-t tgds is first-order definable.
• However, there are certain sufficient conditions that guarantee that the composition is first-order definable.

  • If M12 is specified by GAV (full) s-t tgds and M23 is specified by s-t tgds, then their composition is definable by s-t tgds.

  • Arocena, Fuxman, Miller: If both M12 and M23 are specified by LAV s-t tgds with distinct variables, then their composition is specified by LAV s-t tgds.

284 Synopsis of Schema Mapping Composition

• s-t tgds are not closed under composition.

• SO tgds form a well-behaved fragment of second-order logic.

• SO tgds are closed under composition; they are the “right” language for composing s-t tgds.

• SO tgds are “chasable”: Polynomial-time data exchange with universal solutions.

• SO tgds and the composition algorithm have been incorporated in Clio’s Mapping Specification Language (MSL).

285 Related Work and Open Problems

Related Work:
• Composing richer schema mappings
  Nash, Bernstein, Melnik – 2007
• Composing Schema Mappings in Open & Closed Worlds
  Libkin and Sirangelo – 2008
• XML Schema Mappings
  Amano, Libkin, Murlak – 2009

Open Problems:
• Composition of schema mappings specified by s-t tgds and target constraints (target tgds and target egds).
• Composition of schema mappings specified by richer source-to-target dependencies.

286 “The notion of composition of maps leads to the most natural account of fundamental notions of mathematics, from multiplication, addition, and exponentiation, through the basic notions of logic.”

“Conceptual Mathematics” by Lawvere and Schanuel
