Non-Standard-Datenbanken Und Data Mining

Non-Standard-Datenbanken Und Data Mining

Non-Standard-Datenbanken und Data Mining Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme 1 Übersicht } Semistrukturierte Datenbanken (JSON, XML) und Volltextsuche } Information Retrieval } Mehrdimensionale Indexstrukturen } Cluster-Bildung } Einbettungstechniken } First-n-, Top-k-, und Skyline-Anfragen } Probabilistische Datenbanken, Anfragebeantwortung, Top-k-Anfragen und Open-World-Annahme } Probabilistische Modellierung, Bayes-Netze, Anfragebeantwortungsalgorithmen, Lernverfahren, } Temporale Datenbanken und das relationale Modell, SQL:2011 } Probabilistische Temporale Datenbanken } SQL: neue Entwicklungen (z.B. JSON-Strukturen und Arrays), Zeitreihen (z.B. TimeScaleDB) } Stromdatenbanken, Prinzipien der Fenster-orientierten inkrementellen Verarbeitung } Approximationstechniken für Stromdatenverarbeitung, Stream-Mining } Probabilistische raum-zeitliche Datenbanken und Stromdatenverarbeitungsssysteme: Anfragen und Indexstrukturen, Raum-zeitliches Data Mining, Probabilistische Skylines } Von NoSQL- zu NewSQL-Datenbanken, CAP-Theorem, Blockchain-Datenbanken 2 2 Acknowledgements: This presentation is based on the following two presentations 10 Years of Probabilistic Querying – What Next? Martin Theobald University of Antwerp Temporal Alignment Anton Dignös1 Michael H. Böhlen1 Johann Gamper2 1University of Zürich, Switzerland 2Free University of Bozen-Bolzano, Italy Recap: Probabilistic Databases A probabilistic database Dp (compactly) encodes a probability distribution over a finite set of deterministic database instances Di. D1: 0.42 D2: 0.18 D3: 0.28 D4: 0.12 WorksAt(Sub, Obj) WorksAt(Sub, Obj) WorksAt(Sub, Obj) WorksAt(Sub, Obj) Jeff Stanford Jeff Stanford Jeff Princeton Jeff Princeton } Special Cases: (1) Dp tuple-independent (II) Dp block-independent WorksAt(Sub, Obj) p WorksAt(Sub, Obj) p Jeff Stanford 0.6 Jeff Stanford 0.6 Note: (I) and (II) Jeff Princeton 0.7 Princeton 0.4 are not equivalent! } Query Semantics: (“Marginal Probabilities”) } Run query Q against each instance Di; for each answer tuple t, sum up the probabilities of all instances Di where t is a result. 4 Probabilistic & Temporal Databases A temporal-probabilistic database DTp (compactly) encodes a probability distribution over a finite set of deterministic database T instances Di at each time point of a finite time domain W . BornIn(Sub,Obj) T p Wedding(Sub,Obj) T p Divorce(Sub,Obj) T p DeNiro Green- [1943, 0.9 DeNiro Abbott [1936, 0.3 DeNiro Abbott [1988, 0.8 which 1944) 1940) 1989) DeNiro Tribeca [1998, 0.6 DeNiro Abbott [1976, 0.7 1999) 1977) } Sequenced Semantics & Snapshot Reducibility: [Dignös, Gamper, Böhlen: SIGMOD’12] } Built-in semantics: reduce temporal-relational operators to their non- temporal counterparts at each snapshot (i.e., time point) of the database. } Coalesce/split tuples with consecutive time intervals based on their lineages. marriedTo(x,y)[tb1,Tmax) Ü wedding(x,y)[tb1,te1) Ù } Non-Sequenced Semantics ¬divorce(x,y)[tb2,te2) } Queries can freely manipulate timestamps just like regular attributes. Single temporal operator ≤T supports all of Allen’s 13 temporal relations. } Deduplicate tuples with overlapping time intervals based on their lineages. [Dylla,Miliaraki,Theobald: PVLDB’13] Sequenced Semantics: Example [Dignös, Gamper, Böhlen: SIGMOD’12] Temporal Splitter / Snapshot Reduction [Dignös, Gamper, Böhlen: SIGMOD’12] Temporal Alignment & Deduplication Example (f1 Ù f3) Ú (f1 Ù ¬f3) Ú (f2 Ù f3) Ú (f2 Ù ¬f3) Dedupl. Facts (f1 Ù f3) Ú (f1 Ù ¬f3) (f1 Ù ¬f3) Ú (f2 Ù ¬f3) BornIn(Sub,Obj) T p Wedding(Sub,Obj) T p Divorce(Sub,Obj) T p DeNiro Green- [1943, 0.9 f1 ÙDeNirof3 Abbott [1936, 0.3 DeNiro Abbott [1988, 0.8 Derivedwhich 1944) 1940) 1989) DeNiroFacts Tribeca [1998, 0.6 DeNiro Abbott [1976, 0.7 1999) 1977) f1 Ù ¬f3 f2 Ù f3 f2 Ù ¬f3 f Base 3 wedding(DeNiro,Abbott) f divorce(DeNiro,Abbott) Facts 2 f1 wedding(DeNiro,Abbott) WT Tmin 1936 1976 1988 Tmax Non-Sequenced Semantics: marriedTo(x,y)[tb1,Tmax) Ü wedding(x,y)[tb1,te1) Ù ¬divorce(x,y)[tb2,te2) T marriedTo(x,y)[tb1,te2) Ü wedding(x,y)[tb1,te1) Ù divorce(x,y)[tb2,te2) Ù te1 ≤ tb2 Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived teamMates(Beckham, Ü playsFor(Beckham, Real, T1) Facts Ronaldo, T3) Ù playsFor(Ronaldo, Real, T2) Ù overlaps(T1, T2, T3) 0.16 0.08 0.12 ‘03 ‘04 ‘05 ‘07 0.6 0.4 0.4 0.1 0.2 0.2 Base ‘03 ‘05 ‘07 ‘00 ‘02 ‘04 ‘05 ‘07 Facts playsFor(Beckham, Real, T1) playsFor(Ronaldo, Real, T2) Example using the Allen predicate overlaps Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived teamMates(Beckham, teamMates(Beckham, Facts Ronaldo, T4) Zidane, T5) teamMates(Ronaldo, 0.16Zidane, T6) 0.08 0.12 ‘03 ‘04 ‘05 ‘07 Non-independent Independent 0.6 0.4 0.4 0.1 0.2 0.2 playsFor(Zidane, Real, T3) Base ‘03 ‘05 ‘07 ‘00 ‘02 ‘04 ‘05 ‘07 Facts playsFor(Beckham, Real, T1) playsFor(Ronaldo, Real, T2) Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Derived teamMates(Beckham, teamMates(Beckham, facts stored Ronaldo, T4) Zidane, T5) in views teamMates(Ronaldo, Zidane, T6) Need Non-independent Lineage! Independent } Closed and complete representation model (incl. lineage) } Te m p o r a l a l i g n m e n t is polyn. in the number of input intervals playsFor(Zidane, Real, T3) Base} Confidence computation per interval remains #P-hard Facts} In generalplaysFor requires(Beckham, Monte Real, Carlo T1 )approximationsplaysFor(Ronaldo, (Luby-Karp Real, for T2 ) DNF, MCMC-style sampling), decompositions, or top-k pruning Literature [Das Sarma,Theobald,Widom: ICDE’08] Das Sarma, Anish and Theobald, Martin and Widom, Jennifer, Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases. In: 24th International Conference on Data Engineering (ICDE 2008), IEEE Computer Society Press, pp. 1023-1032. 2008 [Wang,Yahya,Theobald: MUD’10] Wang, Yafang and Yahya, Mohamed and Theobald, Martin, Time-aware Reasoning in Uncertain Knowledge Bases. In: 4th International VLDB Workshop on Management of Uncertain Data, Vol. WP 10-, pp. 51-65, 2010 [Dignös, Gamper, Böhlen: SIGMOD’12] A. Dignös, M.H. Böhlen, J. Gamper. Temporal alignment. In Proc. of the SIGMOD-12, pages 433-444, Scottsdale, AZ, USA, May 20-24, 2012 [Dylla,Miliaraki,Theobald: PVLDB’13] Maximilian Dylla, Iris Miliaraki, Martin Theobald, A Temporal-Probabilistic Database Model for Information Extraction, Proceedings of the VLDB Endowment, Volume 6, Issue 14, 2013.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    12 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us