Technische Universität München Lehrstuhl III - Datenbanksysteme

Schema Flexibility and Data Sharing in Multi-Tenant Databases

Diplom-Informatiker Univ. Stefan Aulbach

Complete reprint of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of

Doktor der Naturwissenschaften (Dr. rer. nat.)

Chair: Univ.-Prof. Dr. Florian Matthes
Examiners of the dissertation:
1. Univ.-Prof. Alfons Kemper, Ph.D.
2. Univ.-Prof. Dr. Torsten Grust (Eberhard-Karls-Universität Tübingen)

The dissertation was submitted to the Technische Universität München on 26.05.2011 and accepted by the Fakultät für Informatik on 12.10.2011.

Abstract

Hosted applications, operated by independent service providers and accessed via the Internet, commonly implement multi-tenancy to leverage economy of scale by consolidating multiple businesses onto the same operational system to lower Total Cost of Ownership (TCO). Traditionally, Software as a Service (SaaS) applications are built on a software platform to allow tenants to customize the application. These platforms use traditional database systems as storage back-end, but implement an intermediate layer that maps the application data to generic structures inside the DBMS. We discuss these mapping techniques in the context of schema flexibility, such as schema extensibility and schema evolution. However, as the schema mapping has to be handled by the application layer, the DBMS degenerates to a “dumb” data repository. Thus, a multi-tenant system for SaaS should offer explicit support for schema flexibility. That implies that schemas can be extended for different versions of the application and dynamically modified while the system is on-line. Furthermore, when co-locating tenants, there is a potential for sharing certain data across tenants, such as master data and configuration data of the application.

We present FlexScheme, a meta-data model that is specially designed for multi-tenancy. It enables data sharing and schema flexibility by adding explicit support for schema extensibility and “Lights-out” Online Schema Evolution. We employ meta-data sharing, where each tenant inherits the schema of the base application, which can be extended to satisfy the individual tenant's needs. The meta-data sharing is complemented by master data sharing, where the global data set is overridden by the individual changes of a tenant. We develop a light-weight schema evolution mechanism that, rather than carrying out costly data reorganizations immediately, lets the DBMS adaptively schedule the reorganization so as not to impact co-located tenants.

We implement a main-memory-based DBMS prototype that has specialized operators for data sharing and lights-out schema evolution. We show that, in the multi-tenancy context, both techniques effectively lower TCO by allowing a denser tenant packing as well as better resource utilization. Furthermore, we focus on efficient storage mechanisms for FlexScheme and present a novel approach, called XOR Delta, which is based on XOR encoding and is optimized for main-memory DBMSs.


Acknowledgements

Many people accompanied me during the time I was working on this thesis. Without their invaluable support, this project would never have been completed.

I am grateful to my advisor Prof. Alfons Kemper, Ph.D., for all the insightful discussions, the feedback on publications and presentations, the encouragements, and the opportunity to pursue this thesis at his chair.

Dean Jacobs, Ph.D., is responsible for my initial contact with Software as a Service. At this point, I would like to thank him for all the comprehensive discussions, phone calls, and e-mails.

For their help, the pleasant work atmosphere, the many discussions, and much more I would like to thank my colleagues: Martina Albutiu, Nikolaus Augsten, Ph.D., Veneta Dobreva, Florian Funke, Dr. Daniel Gmach, Prof. Dr. Torsten Grust, Andrey Gubichev, Benjamin Gufler, Sebastian Hagen, Stefan Krompaß, Dr. Richard Kuntschke, Manuel Mayr, Henrik Mühe, Prof. Dr. Thomas Neumann, Fabian Prasser, Dr. Angelika Reiser, Jan Rittinger, Dr. Tobias Scholl, Andreas Scholz, Michael Seibold, Jessica Smejkal, and Bernd Vögele. I particularly thank our secretary Evi Kollmann.

Several students offered their support for implementing the research prototype: Andreas Gast, Vladislav Lazarov, Alfred Ostermeier, Peng Xu, and Ursula Zucker. I am thankful for their work.

Finally, I thank Iris, my partner, and Ruth, my mother, for all their love, support and patience throughout the years.


Contents

1. Introduction
   1.1 Cloud Computing
   1.2 Software as a Service
   1.3 The Impact of Massive Multi-Tenancy
   1.4 Schema Flexibility of SaaS Applications
   1.5 Problem Statement
   1.6 Contributions and Outline

2. Fundamental Implementation of Multi-Tenancy
   2.1 Basic Database Layouts for Multi-Tenancy
       2.1.1 Shared Machine
       2.1.2 Shared Process
       2.1.3 Shared Table
   2.2 Scalability and Efficiency
       2.2.1 Multi-Tenancy Testbed
       2.2.2 Experiment: DBMS Performance with Many Tables
   2.3 Multi-Tenancy Issues with Current DBMS
       2.3.1 Buffer Pool Utilization
       2.3.2 Meta-Data Management

3. Extensible Schemas for Traditional DBMS
   3.1 Schema Mappings
       3.1.1 Schema Based Approaches
       3.1.2 Generic Structures
       3.1.3 Semistructured Approaches
   3.2 Case Study: The force.com Platform
       3.2.1 Meta-Data-Driven Applications
       3.2.2 Data Persistence
   3.3 Request Processing
       3.3.1 Chunk Tables
       3.3.2 IBM pureXML
   3.4 Chunk Table Performance
       3.4.1 Test Schema
       3.4.2 Test Query
       3.4.3 Transformation and Nesting
       3.4.4 Transformation and Scaling
       3.4.5 Response Times with Warm Cache
       3.4.6 Logical Page Reads
       3.4.7 Response Times with Cold Cache
       3.4.8 Cache Locality Benefits
       3.4.9 Additional Tests
   3.5 Schema Evolution on Semistructured Schema Mappings
       3.5.1 Microsoft SQL Server
       3.5.2 IBM DB2

4. Next Generation Multi-Tenant DBMS
   4.1 Estimated Workload
   4.2 Multi-Tenant DBMS Concepts
       4.2.1 Native Schema Flexibility
       4.2.2 Tenant Virtualization
       4.2.3 Grid-like Technologies
       4.2.4 Main Memory DBMS
       4.2.5 Execution Model
   4.3 Native Support for Schema Flexibility
   4.4 FlexScheme – Data Management Model for Flexibility
       4.4.1 Data Structures
       4.4.2 Deriving Virtual Private Tables
       4.4.3 Comparison to Other Models
   4.5 Applying FlexScheme to Multi-Tenant DBMS

5. Data Sharing Across Tenants
   5.1 Data Overlay
   5.2 Accessing Overridden Data
       5.2.1 Basic Implementations
       5.2.2 Special Index-Supported Variants
       5.2.3 Data Maintenance
       5.2.4 Secondary Indexes
       5.2.5 Overlay Hierarchies
       5.2.6 Comparison to Other Delta Mechanisms
   5.3 Shared Data Versioning
       5.3.1 Pruning of Unused Versions
       5.3.2 Tenant-specific Reconciliation
   5.4 Physical Data Organization
       5.4.1 Snapshot Approach
       5.4.2 Dictionary Approach
       5.4.3 Temporal Approach
       5.4.4 Differential Delta Approach
       5.4.5 XOR Delta Approach
   5.5 Evaluation
       5.5.1 Overlay Operator
       5.5.2 Physical Data Organization

6. Graceful On-Line Schema Evolution
   6.1 Schema Versioning
       6.1.1 Versioning Concepts
       6.1.2 Atomic Modification Operations
       6.1.3 Applying Modification Operations
       6.1.4 Version Pruning
   6.2 Lightweight Physical Data Reorganization
       6.2.1 Objectives
       6.2.2 Evolution Operator
       6.2.3 Physical Tuple Format
       6.2.4 Access Behavior
   6.3 Evolution Strategies
       6.3.1 Immediate Evolution
       6.3.2 Lightweight Evolution
       6.3.3 Strategy Selection
   6.4 Evaluation
       6.4.1 Point-Wise Access
       6.4.2 Queries
       6.4.3 Modification Operation Chain Length
       6.4.4 Summary
   6.5 Related Work

7. Outlook and Future Challenges

A. Appendix
   A.1 Associativity of Data Overlay


List of Figures

.. Soware Development Priorities ......  .. Consolidation ......  .. Extensibility and Evolution for a Single Tenant ......  .. Extensibility and Evolution for Multiple Tenants ...... 

2.1 Basic Database Layouts
2.2 Shared Table Example
2.3 Multi-Tenancy Testbed Architecture
2.4 CRM Application Schema
2.5 Worker Action Classes
2.6 Schema Variability Results

3.1 Private Table Layout
3.2 Extension Table Layout
3.3 Universal Table Layout
3.4 Pivot Table Layout
3.5 Chunk Table Layout
3.6 Chunk Folding Layout
3.7 Sparse Columns Layout
3.8 pureXML Columns Layout
3.9 HBase Pivot Columns Layout
3.10 Query Execution Plan for Query QpureXML
3.11 Test Layout: Conventional
3.12 Test Layout: Chunk
3.13 Query Execution Plan for Simple Fragment Query
3.14 Response Times with Warm Cache
3.15 Number of Logical Page Reads
3.16 Response Times with Cold Cache
3.17 Response Times: Chunk Tables vs. Vertical Partitioning
3.18 Refined Worker Action Classes
3.19 SQL Server Performance
3.20 SQL Server Throughput
3.21 DB2 Performance

4.1 Multi-Tenant DBMS Cluster Architecture
4.2 Thread Model of the Multi-Tenant DBMS Instance
4.3 Example of Polymorphic Relation R

5.1 Data Overlay in SQL
5.2 Query Execution Plan Example
5.3 Overlay with Separate Indexes for Shared and Private Data
5.4 Overlay with Single Combo Index per Tenant
5.5 Overlay Hierarchy
5.6 Snapshot Approach
5.7 Dictionary Approach
5.8 Temporal Approach
5.9 Differential Delta Data Approach
5.10 XOR Delta Approach
5.11 Overlay Operator Response Times
5.12 Rebuild Times for Bit-Set and Combo-Index Rebuild
5.13 Roll-Up Query
5.14 Effects of Data Overlay on Queries
5.15 Query Execution Times
5.16 Space Requirements
5.17 Maintenance Execution Times

6.1 Pruning of Unused Fragment Versions
6.2 Evolution Operator Placement within Query Execution Plan
6.3 Physical Tuple Layout
6.4 Query Execution Plan with Evolution and Overlay Operators
6.5 Evolution Operator Performance for Point-Wise Accesses
6.6 Computational Overhead of Query Processing with Evolution
6.7 Modification Operations Influence

List of Tables

2.1 Schema Variability and Data Distribution
2.2 Experimental Results


1. Introduction

“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” (Mell and Grance, )

With the advent of the Ubiquitous Internet a new trend has emerged: Cloud Computing. Cheap and reliable broadband Internet access allows businesses and individuals to communicate and collaborate in real-time across countries and even continents. Internet-based applications like Google Apps and Microsoft Office Live! gain popularity, as they enable easy collaboration and sharing of data around the globe, without the need for installing any software on the PC. Thus, applications and data are no longer maintained on individual PCs, but rather reside inside the Cloud of the Internet.

1.1 Cloud Computing

According to the NIST Definition of Cloud Computing (Mell and Grance, see above), Cloud Computing offers customers an on-demand self-service for accessing resources remotely. Depending on the service model, these are either hardware or software resources which are accessed via the Internet. The resources are allocated on demand, once a customer requests them, and are freed automatically after the user closes the session, without any human intervention from a service provider. Currently, three different models for Cloud Computing services are available.

Infrastructure as a Service (IaaS) A service provider pursuing this model offers on-demand access to infrastructural services, such as processing resources (e.g. virtual machines), storage, or other computing resources. An IaaS customer is able to deploy any application and run it without modification on the service provider's infrastructure. Within its designated environment, the customer has full control of all resources. Amazon EC2 is a typical representative of this service model.

Platform as a Service (PaaS) One abstraction level above IaaS is PaaS. A service provider offers a software development and runtime platform, where customers and/or software vendors can develop and deploy applications.

 Chapter . Introduction

Add Features

. Decrease Capital Expenditures

Decrease Operational Expenditures On-Premises Software Software as a Service

Figure ..: Soware Development Priorities

soware vendors can develop and deploy applications. e development environ- ment, including the programming language, is offered by the service provider. In the PaaS model, the consumers do not have any possibility to access underlying hardware resources. Instead, the service provider offers high-level APIs to inter- face with the infrastructure. As an example, the force.com Platform (Weissman and Bobrowski, ) offers a data persistence module for accessing a relational DBMS.

Software as a Service (SaaS) In this model, the service provider owns and operates an application that is accessed by the customers via Internet browser and/or Web Services. The customer can only access the application itself; all access to the underlying infrastructure is prohibited by the service provider. However, depending on the application, individual customizations may be possible.

All of the models have in common that the pricing of the services is on a pay-per-use basis: customers are charged only for those resources that have actually been consumed.

1.2 Software as a Service

From an enterprise perspective, accessing applications which are installed and maintained off-site is a well-known practice. An Application Hosting Provider maintains a data-center where complex applications like ERP systems are installed. This way, their customers do not need the technical expertise in their own staff; instead, they pay the hosting provider a monthly fee, based on the amount of consumed resources, for administering the application. As the hosting provider only operates the infrastructure, the hosted application has to be licensed independently.

A very modern form of application hosting is Software as a Service (SaaS). Compared to traditional application hosting, where the service provider only maintains the infrastructure, the SaaS provider owns and maintains the application as well. SaaS customers typically access the service using standard Internet browsers or Web Services. SaaS services have appeared for a wide variety of business applications, including Customer Relationship Management (CRM), Supplier Relationship Management (SRM), Human Capital Management (HCM), and Business Intelligence (BI).

Design and development priorities for SaaS differ greatly from those for on-premises software, as illustrated in Figure 1.1. The focus of on-premises software is generally on adding features, often at the expense of reducing Total Cost of Ownership.

 .. Soware as a Service

Size of Host Machine

Big Iron 10,000 1,000

10,000 1,000 100

Blade 10,000. 1,000 100 10

Email Collaboration CRM/SRM ERP HCM Complexity of Application

Figure ..: Consolidation focus of SaaS is generally on reducing Total Cost of Ownership (TCO), oen at the expense of adding features. e primary reason for this is, of course, that the service provider, rather than the customer has to bear the cost of operating the system. In addition, the recurring revenue model of SaaS makes it unnecessary to add features in order to drive purchases of upgrades. A well-designed hosted service reduces TCO by leveraging economy of scale. e great- est improvements in this regard are provided by a multi-tenant architecture, where multiple businesses are consolidated onto the same operational system. Multi-tenancy invariably occurs at the database layer of a service; indeed this may be the only place it occurs since ap- plication servers for highly-scalable Web applications are oen stateless (Hamilton, ). e amount of consolidation that can be achieved in a multi-tenant database depends on the complexity of the application and the size of the host machine, as illustrated in Figure .. In this context, a tenant denotes an organization with multiple users, commonly around  for a small to mid-sized business. For simple Web applications like business email, a single blade server can support up to , tenants. For mid-sized enterprise applications like CRM, a blade server can support  tenants while a large cluster database can go up to ,. While the TCO of a database may vary greatly, consolidating hundreds of databases into one will save millions of dollars per year. One downside of multi-tenancy is that it can introduce contention for shared resources (MediaTemple, ), which is oen alleviated by forbidding long-running operations. Another downside is that it can weaken security, since access control must be performed at the application level rather than the infrastructure level. Finally, multi-tenancy makes it harder to support application extensibility, since shared structures are harder to individ- ually modify. Extensibility is required to build specialized versions of enterprise applica- tions, e.g., for particular vertical industries or geographic regions. Many hosted business services offer platforms for building and sharing such extensions (Salesforce AppExchange, ). In general, multi-tenancy becomes less attractive as application complexity increases. More complex applications like Enterprise Resource Planning (ERP) and Financials re- quire more computational resources, as illustrated in Figure ., have longer-running op- erations, require more sophisticated extensibility, and maintain more sensitive data. More-

 Chapter . Introduction over, businesses generally prefer to maintain more administrative control over such appli- cations, e.g., determining when backups, restores, and upgrades occur. More complex applications are of course suitable for single-tenant hosting.

1.3 The Impact of Massive Multi-Tenancy

Multi-tenancy makes it harder to implement several essential features of enterprise applications. The first is support for master data, which should be shared rather than replicated for each tenant to reduce costs. Examples of such data include public information about business entities, such as DUNS numbers, and private data about supplier performance. Data may be shared across organizations or between the subsidiaries and branches of a hierarchically-structured organization. In either case, the shared data may be modified by individual tenants for their own purpose, and the DBMS must offer a mechanism to make those changes private to the tenant.

The second problematic feature is application modification and extension, which applies both to the schema and the master data it contains. Such extensibility is essential to tailor the application to individual business needs, which may vary based on industries and geographical regions. An extension may be private to an individual tenant or shared between multiple tenants. In the latter case, an extension may be developed by an Independent Software Vendor (ISV) and sold to the tenant as an add-on to the base application. While a limited form of customization can be provided by configuration switches and wizards, more complex applications require the ability to modify the underlying database schema of the application.

The third problematic feature is evolution of the schema and master data, which occurs as the application and its extensions are upgraded. In contrast to most on-premise upgrades, on-demand upgrades should occur while the system is in operation. Moreover, to contain operational costs, upgrades should be “self-service” in the sense that they require the minimum amount of interaction between the service provider, the ISVs who offer extensions, and the tenants. Finally, it is desirable for ISVs and tenants to delay upgrades until a convenient time in the future. It should be possible to run at least two simultaneous versions of the application to support rolling upgrades; ideally, however, more versions should be supported.

1.4 Schema Flexibility of SaaS Applications

Service providers make their SaaS applications more attractive by allowing a high level of customization. Besides very simple mechanisms like configuration switches, there are more complex mechanisms like adapting predefined entities or even adding new entities. Those modifications result in a high number of application variants, each individually customized for a particular tenant. Although most of the customizations may be small, managing the huge number of variants is a big challenge.

Furthermore, the service provider has to release new versions of the application on a regular basis. Tenants are then forced to upgrade on-the-fly to the most recent version of the application.

Figure 1.3: Extensibility and Evolution for a Single Tenant (base application Base, third-party extension Ext, and tenant-specific extension T, each with its own sequence of versions)

Figure ..: Extensibility and Evolution for a Single Tenant of the application. However, depending on the customizations performed by a particular tenant, that may not be possible. Currently, many service providers avoid this issue by simply limiting the customization possibilities. In a nutshell, SaaS applications develop in at least two dimensions: extensibility and evolution. e extensibility dimension is made up of extensions to the common base ap- plication. ese extensions may be developed individually by tenants themselves or by ISVs, and thus be shared between tenants. e evolution dimension tracks changes to the SaaS applications which are necessary to either x issues with the application or to inte- grate new features. Evolution is not only required for the base application itself, but also for extensions.

Example .: Figure . shows a SaaS application for a single tenant which is made up of the following components: the base application Base in version vm, an extension Ext of an ISV in version vn, and the tenant-specic extension T in version vk. e components are developed and maintained separately from each other, therefore the releases of new versions are not synchronized. ere may be dependencies between the components as an extension may depend on a specic version of the base application or another extension. In the example, T in version vk depends on Ext in version vn and Ext in version vn depends on Base in version vm.

Managing such a setup becomes challenging, as more tenants lead to more dependencies. Currently, SaaS applications avoid these issues by restricting the number of concurrently available versions: the service provider forces its customers to upgrade to the most recent application version as soon as it is released. However, for some tenants such a behavior may not be acceptable, because tenant-specific and ISVs' extensions have to be checked for compatibility. Before migration can take place, changes to extensions may be required. Affected tenants may accept to pay a higher service fee if they do not have to upgrade right away and can stay on a given version of the base application with which all their extensions work. Managing each tenant's application instance separately is not feasible for the service provider, as the administration and maintenance costs would be similar to on-premise solutions multiplied by the number of customers of the service provider. As a consequence, a multi-tenant DBMS has to explicitly model evolution and extensibility of SaaS applications.

 Chapter . Introduction

. Evolution

Base Base vm- vm

Extensibility Ext Ext vn- vn

T T vk- vk

T vi

T T vj- vj

Figure ..: Extensibility and Evolution for Multiple Tenants the service provider, as the administration and maintenance costs would be similar to on- premise solutions multiplied by the number of customers of the service provider. As a consequence, a multi-tenant DBMS has to explicitly model evolution and extensibility of SaaS applications.

Example .: Figure . depicts a SaaS application used by three tenants. It shows how the SaaS application and its extensions develop according to the dimensions extensibility and evolution. In this example, the base application has evolved from version vm− to version vm and the extension has evolved from version vn− to version vn. Tenants T and T have already migrated to the new version of the base application and the extension, but Tenant T is still using the old version of the base application and the extension, because the tenant-specic extension of T has not been checked for compatibility with the new version of the extension and the base application yet.

Even with a high degree of customization, large parts of the SaaS application and the ISV extensions can be shared across tenants. As the previous example shows, there is a high potential for data sharing, as certain data can be shared across tenants. Such data may include master data or catalogs. By applying data sharing techniques, more tenants can be packed onto the existing infrastructure.

Example .: In Figure ., Tenants T and T share version vm of the base application Base. If the application would be managed separately for each tenant, the version vm of the base application Base would require double amount of resources, as it would be part of the application instance of Tenant T and part of the application instance of Tenant T. Once the tenant-specic extensions of Tenant T have been checked for compatibility with


that tenant can migrate to the new version of the base application and the extension. After this migration has happened, there is even more potential for data sharing, as all three tenants share the common base application in version vm and two of the tenants additionally share the common third-party extension Ext in version vn.

Moreover, shared data needs to be updatable, as already existing entries may have to be overwritten by tenants. Instead of replicating all data for each tenant, a small delta per tenant can be used if only a low percentage of the shared data is modified by the tenant. From time to time, new common data becomes available, e.g., when new catalog data is released or in case of application upgrades. For a tenant, it is important to always have a consistent snapshot of the shared data. If shared data could change over time without the knowledge of the tenants, they would get unpredictable results. Therefore, there must be a migration path for each individual tenant when common data changes. Tenants are allowed to either follow the service provider's upgrade cycle or follow their own migration path by staying on a certain version.

The bottom line is that the multi-tenancy features—extensibility, evolution and data sharing—are closely related. A multi-tenant DBMS needs an integrated model to capture these features. There already exist models which capture extensibility, like the object-oriented concept of inheritance (Khoshafian and Abnous, ), and there are models for capturing the evolution of an application (Curino et al., ), but there is currently no integrated model that captures both dimensions and data sharing together. In this thesis, we therefore propose such a model, called FlexScheme.
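To make the idea of a tenant-specific delta over shared data concrete, the following is a minimal sketch in plain SQL, assuming a hypothetical shared catalog table product_shared and a per-tenant delta table product_delta_t1 holding that tenant's overrides and additions; the thesis later replaces such hand-written queries with a dedicated Overlay operator.

```sql
-- Tenant view of the catalog: the tenant's own rows win,
-- shared rows are visible only where no override exists.
SELECT d.product_id, d.name, d.list_price
FROM   product_delta_t1 d
UNION ALL
SELECT s.product_id, s.name, s.list_price
FROM   product_shared s
WHERE  NOT EXISTS (SELECT 1
                   FROM   product_delta_t1 d
                   WHERE  d.product_id = s.product_id);
```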

1.5 Problem Statement

The previous sections show that there are two important design goals for multi-tenant databases. On the one hand, a high level of consolidation must be achieved in order to significantly lower TCO; on the other hand, the SaaS application must be attractive to the customer, and thus schema flexibility mechanisms have to be available to allow customizability of the application.

These features cannot be easily implemented in a traditional DBMS and, to the extent that they are currently offered at all, they are generally implemented within the application layer. Traditional DBMSs do not offer mechanisms for schema flexibility; even worse, they prohibit continuously evolving schemas, as schema-modifying DDL operations may negatively affect service availability due to expensive data redefinition phases. As an example, force.com does its own mapping from logical tenant schemas to one universal physical database schema (Weissman and Bobrowski, ) to overcome the limitations of traditional DBMSs. However, this approach reduces the DBMS to a “dumb data repository” that only stores data rather than managing it. In addition, it complicates development of the application, since many DBMS features, such as query optimization, have to be re-implemented from the outside. Instead, a next-generation multi-tenant DBMS should provide explicit support for extensibility, data sharing and evolution.
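As an illustration of the kind of DDL that becomes problematic under high consolidation, consider a single tenant-specific extension expressed directly against a shared table (hypothetical names); on many traditional DBMSs such a statement acquires exclusive locks or rewrites the table, so its data redefinition phase stalls all co-located tenants.

```sql
-- One tenant wants an extra attribute on a shared CRM entity.
-- Executed naively, this DDL affects every tenant stored in the table.
ALTER TABLE account
  ADD COLUMN risk_class VARCHAR(10) DEFAULT 'unrated';
```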

 Chapter . Introduction

1.6 Contributions and Outline

This thesis addresses schema flexibility and data sharing in the context of multi-tenant applications. Due to the high level of consolidation in the SaaS context, combined with a constantly evolving database schema, the database workload of a multi-tenant SaaS application can fundamentally differ from traditional on-premise application workloads.

In Chapter 2 we discuss the basic setup of traditional DBMSs without schema flexibility when used as a back-end for multi-tenant SaaS services. Based on the results of experiments with our multi-tenant CRM testbed, which simulates a typical SaaS application workload, we argue that traditional DBMSs suffer from performance degradation in a typical SaaS setup with a high consolidation level. Parts of this chapter have been presented at BTW Conference 2007 (Jacobs and Aulbach, 2007) and at SIGMOD Conference 2008 (Aulbach, Grust, Jacobs, Kemper, and Rittinger, 2008).

Chapter 3 presents sophisticated schema mapping techniques to allow schema extensibility on traditional DBMSs. Our novel approach, called Chunk Folding, is compared to already existing schema mapping techniques. Our performance analysis measures the impact of schema mapping on request processing. Furthermore, we evaluate vendor-specific mechanisms of some DBMS products which allow for limited support of schema evolution. The results have been presented at SIGMOD Conference 2008 (Aulbach, Grust, Jacobs, Kemper, and Rittinger, 2008) and SIGMOD Conference 2009 (Aulbach, Jacobs, Kemper, and Seibold, 2009a).

In Chapter 4 we propose our special-purpose DBMS which is optimized for a multi-tenant workload, as it is typically expected in a Cloud Computing environment. We introduce features like native schema flexibility, which is handled by our data model called FlexScheme. Our prototype is optimized for a low TCO by leveraging the advantages of scaling out on a cluster of commodity hardware servers. The prototype concept has been presented at BTW Conference 2009 (Aulbach, Jacobs, Primsch, and Kemper, 2009b) and FlexScheme was presented at ICDE Conference 2011 (Aulbach, Seibold, Jacobs, and Kemper, 2011).

The data sharing component of our prototype is presented in Chapter 5. We describe our approach of accessing shared data which has been overridden by individual tenants. When accessing shared data, our specialized Overlay operator merges the shared data with the tenant's additions on-the-fly. Furthermore, we discuss various physical representations for shared data which support data versioning as well. Parts of this work have been presented at ICDE Conference 2011 (Aulbach, Seibold, Jacobs, and Kemper, 2011).

A specialized query plan operator for schema evolution is presented in Chapter 6. We propose a method for graceful on-line schema evolution, where the data is evolved on access, rather than pre-emptively as in current DBMSs. Our approach enables schema evolution without service outages by deferring the costly physical data redefinition. Finally, we evaluate several strategies for our lazy schema evolution approach.

Chapter 7 concludes this thesis and outlines future challenges.


Previous Publications

Parts of this thesis have been previously published at the following peer-reviewed conferences.

BTW  Dean Jacobs and Stefan Aulbach. Ruminations on Multi-Tenant Databases. In Kemper et al. (), pages  – .

SIGMOD  Stefan Aulbach, Torsten Grust, Dean Jacobs, Alfons Kemper, and Jan Rit- tinger. Multi-Tenant Databases for Soware as a Service: Schema-Mapping Tech- niques. In Wang (), pages  – .

BTW  Stefan Aulbach, Dean Jacobs, Jürgen Primsch, and Alfons Kemper. Anforderun- gen an Datenbanksystem für Multi-Tenancy- und Soware-as-a-Service-Applikationen. In Freytag et al. (), pages  – .

SIGMOD  Stefan Aulbach, Dean Jacobs, Alfons Kemper, and Michael Seibold. A Com- parison of Flexible Schemas for Soware as a Service. In Çetintemel et al. (), pages  – .

ICDE  Stefan Aulbach, Michael Seibold, Dean Jacobs, and Alfons Kemper. Extensibil- ity and Data Sharing in Evolving Multi-Tenant Databases. In Proceedings of the th IEEE International Conference on Data Engineering (ICDE), April  – , , Hannover, Germany, pages  – .

2. Fundamental Implementation of Multi-Tenancy

For SaaS applications, the implementation of multi-tenancy is an inherent requirement, as discussed in the previous section. Database vendors give a lot of recommendations on how to implement multi-tenancy atop their products, but the concrete implementation decision depends on the application domain. This chapter presents a set of fundamental implementation techniques for the database layer of a SaaS application which do not per se offer schema flexibility. They serve as a basis for more sophisticated approaches which support schema flexibility as well.

2.1 Basic Database Layouts for Multi-Tenancy

We first introduce three different approaches for implementing multi-tenancy atop of existing DBMSs: Shared Machine, Shared Process, and Shared Table. Each of these approaches affects TCO differently, as they directly influence the level of tenant consolidation. The approaches are increasingly better at resource pooling, but they increasingly break down the isolation between tenants, weaken security, and aggravate resource contention.

2.1.1 Shared Machine

The Shared Machine approach, as shown in Figure 2.1a, is the easiest way to implement multi-tenancy atop of existing database systems. With this approach, each tenant gets its own DBMS instance. There are two implementation variants, depending on the usage of hardware virtualization techniques. When not using virtualization, there is one DBMS process per tenant running on a shared Operating System (OS) instance on a shared host. In the other case, the host runs a hypervisor that virtualizes the hardware resources. Above this hypervisor, each tenant has its own OS instance running the DBMS process. Figure 2.1a shows the variant without virtualization.

This approach is popular in practice because it does not require modifying the implementation of the DBMS. Furthermore, it does not reduce tenant isolation, especially if each DBMS instance is run in its own virtual machine.

However, the lack of resource pooling is the major drawback of this approach. Recent DBMS instances have a memory footprint of several hundred MB, depending on the database system used.

cf. Jacobs and Aulbach (2007)

 Chapter . Fundamental Implementation of Multi-Tenancy

Schema Schema Schema Schema Shared Schema

DB DB DB Instance DB Instance Instance Instance

Operating System Operating System Operating System

Hardware Hardware Hardware . . . (a) Shared Machine (b) Shared Process (c) Shared Table

Figure ..: Basic Database Layouts system used. In combination with virtualization, the size of the footprint increases by the amount of resources needed by the hypervisor and the OS. Even for large scale hardware, this approach cannot handle more than tens of active tenants per server. In addition, each DBMS instance requires its own connection pool on each application server, so database connections will not be shared among tenants. To improve this situation, the operating system might offer mechanisms to share connections among co-located DBMS instances. Since each tenant has either its own DBMS instance or even its own OS instance, there is a high level of isolation between tenants. Tenant data is placed in one or more table- spaces which are exclusively allocated to the appropriate DBMS instance. Unauthorized data access across tenants is prohibited by the OS or the hypervisor. Furthermore, individ- ual DBMS instances allow for I/O load balancing by distributing tenants across different backing disks. With this approach, each tenant is visible to the OS as a process, or as a vir- tual machine to the hypervisor, respectively, thus the scheduler of the OS or the hypervisor can be leveraged for avoiding resource contention. Collocating data in a table-space on a per-tenant basis allows for easy migration from one host to another. While a tenant is off-line, the DBMS instance can be migrated by simply moving the data to the new host; for online migration, modern database systems provide mechanisms for data migration like Oracle Streams (Oracle Corporation, ). For upgrading the application, each tenant has to be processed individually. is intro- duces a high administrative overhead during the upgrade process but allows for having different application revisions for different tenants. Furthermore, the individual upgrade process places downtime only on single tenants, whereas the application remains available for the other tenants. Sharing of common data can be added by having a separated DBMS instance for shared data which is accessed by the application via a separate connection pool. Tenant-specic overwriting or adding common data with such a solution must be handled inside the appli- cation. Since each tenant has its own DBMS process, there is no possibility for meta-data sharing.


2.1.2 Shared Process

In order to get better resource utilization, the Shared Process approach can be used. As Figure 2.1b shows, each tenant gets its own set of tables within a DBMS instance which is shared across multiple tenants. For most DBMSs, there is no difference whether a tenant gets its own schema or not, since database schemas are usually implemented using a lightweight prefixing mechanism.

All tenants using the same DBMS instance share the resources of that instance. In this setup, the resource allocation to the tenants has to be controlled by the DBMS instance, which aggravates resource contention compared to the shared machine approach. This also affects tenant isolation, since the DBMS instance is responsible for guaranteeing tenant isolation. User rights and authorization rules have to be maintained carefully so that tenants can access only their individual data. For better isolation, it is useful to place each tenant in its individual physical table-space. Doing so allows for easy migration as in the shared machine case, as well as the possibility of doing I/O load balancing across different storage back-ends. For tenant migration, similar mechanisms as in the shared machine scenario can be used.

The shared process approach increases resource sharing across tenants. One of the preeminent factors is the reduced memory consumption per tenant due to co-locating multiple tenants into one single DBMS instance. Furthermore, the connections from the application server to the DBMS can be pooled across tenants. However, in such a setup, the DBMS loses control over access permissions: the application server has to connect as super-user in order to access the data of all co-located tenants, thus weakening the security mechanisms of the DBMS. Additionally, resource contention increases, because the shared process approach suffers from a bad buffer-pool utilization if lots of tables are used simultaneously. We discuss this issue in detail in Section 2.3.1.

As with the shared machine approach, application upgrades have to process each tenant individually. However, the administrative overhead is lower, since only one DBMS instance has to be managed. Common data can be shared by placing the data in a separate schema of the DBMS instance. As with the previous approach, tenant-specific changes to the common data must be handled inside the application. Although no DBMS supports this at the moment, there is a high potential for sharing the meta-data across tenants: since the schemas of tenants are very similar to each other, there is a high level of redundancy in the schema description.
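A minimal sketch of the shared process layout, assuming the DBMS supports schemas as namespaces (the names are made up): each tenant receives its own copy of the application tables inside one shared instance.

```sql
-- One schema per tenant inside a single, shared DBMS instance.
CREATE SCHEMA tenant001;
CREATE TABLE tenant001.account (
  aid  INTEGER PRIMARY KEY,
  name VARCHAR(100)
);

CREATE SCHEMA tenant002;
CREATE TABLE tenant002.account (
  aid  INTEGER PRIMARY KEY,
  name VARCHAR(100)
);
-- The definitions are structurally identical, but the DBMS keeps separate
-- meta-data, separate pages, and separate index roots for every copy.
```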

2.1.3 Shared Table

Figure .c shows the Shared Table approach where all co-located tenants not only share the same DBMS instance, but also the same set of tables. To distinguish entries from various tenants, each table has to be augmented by an additional column Tenant which identies the owner of a particular tuple. An example of such an augmentation can be found in Figure .. Every application query is expected to specify a single value for this column. From a resource pooling perspective, this approach is clearly the best of the three ap- proaches. Its ability to scale-up is only limited by the number of tuples a DBMS instance

 Chapter . Fundamental Implementation of Multi-Tenancy

Account. Tenant Aid Name …   Acme ...   Gump ...   Ball ...   Big ...

Figure ..: Shared Table Example can hold. As with the shared process approach, the connection pool can be shared across tenants, but then the application server needs to connect as super-user, again. However, there are several signicant problems with this approach. Since the data of all tenants are intermingled inside the DBMS instance, several issues arise with such a setup. First, tenant migration requires queries against the operational system instead of simply copying table-spaces. Second, for an application upgrade, all tenants have to migrate to the new version at the same time. ird, the isolation between tenants is further weak- ened. From a security perspective, there is a need for row-based access control if security should be pushed down into the DBMS instance, but not all available DBMSs support these mechanisms. e fourth and biggest problem with this approach is that queries intended for a single tenant have to contend with data from all tenants, which compromises query optimization. In particular, optimization statistics aggregate across all tenants and table scans go across all tenants. Moreover, if one tenant requires an index on a column, then all tenants have to have that index as well. Since all tenants are packed in one single set of tables, there is no need for meta-data sharing at all, however access to common data can be enabled by putting these data in a separate schema. Write access to common data still has to be handled inside the applica- tion. As only one DBMS instance has to be managed, the administrative overhead is as low as in the shared process approach.

2.2 Scalability and Efficiency

From a scalability perspective, the shared machine approach has severe drawbacks compared to the other two approaches. A typical DBMS has a memory footprint of several hundred MB, so even big machines can accommodate only a few tenants. However, due to its high level of tenant isolation, it may be useful in circumstances where security concerns are paramount. For better multi-tenant efficiency, the latter two approaches look more promising.

We therefore analyse the performance of currently available database systems as a back-end for multi-tenant applications when either the shared process or the shared table approach is used. Our multi-tenancy testbed, which simulates a multi-tenant CRM application, serves as a benchmark for the behavior of the DBMS instance as it handles more and more tables. Conventional on-line benchmarks such as TPC-C (Transaction Processing Performance Council, ) increase the load on the database until response time goals for various request classes are violated. In the same spirit, our experiment varies the number of tables in the database and measures the response time for various request classes.

Figure 2.3: Multi-Tenancy Testbed Architecture (a controller distributes actions to multiple workers, which access the system under test via JDBC and report response times to a result DB)

2.2.1 Multi-Tenancy Testbed

Many SaaS applications have been developed leveraging already existing DBMS technology by introducing an intermediate layer enabling multi-tenancy and schema flexibility. Furthermore, some DBMS vendors provide integrated mechanisms for facilitating the deployment of multi-tenancy applications on their products.

For experimenting with various multi-tenant database implementations, we developed a configurable testbed with a well-defined workload simulating the OLTP component of a hosted CRM service. Conceptually, users interact with the service through browser and Web Service clients. The testbed does not actually include the associated application servers; rather, the clients simulate the behavior of those servers. The application is itself of interest because it characterizes a standard multi-tenant workload and thus could be used as the basis for a multi-tenant database benchmark.

Components e testbed is composed of several processes as shown in Figure .. e system under test is a multi-tenant database running on a private host. It can be congured for various schema layouts and usage scenarios. A worker process engages in multiple client sessions, each of which simulates the activities of a single connection from an application server’s database connection pool. Each session runs in its own thread and gets its own connec- tion to the target database. Per request the response time of the database is measured and reported to the controller. Multiple workers are distributed over multiple hosts. e controller task assigns actions and tenants to workers. Following the TPC-C bench- mark (Transaction Processing Performance Council, ), the controller creates a deck of action cards with a particular distribution, shuffles it, and deals cards to the workers. e controller also randomly selects tenants, with an equal distribution, and assigns one to each card. Finally, the controller collects response times and stores them in a result da-

 Chapter . Fundamental Implementation of Multi-Tenancy

Campaign Account

Lead Opportunity Asset Contact

LineItem. Product Case Contract

Figure ..: CRM Application Schema tabase. e timing of an action starts when a worker sends the rst request and ends when it receives the last response.

Database Layout

The base schema for the CRM application contains ten tables, as depicted in Figure 2.4. It is a classic DAG-structured OLTP schema with one-to-many relationships from child to parent. Individual users within a business (a tenant) are not modeled, but the same tenant may engage in several simultaneous sessions, so data may be concurrently accessed.

Every table in the schema has a tenant-id column so that it can be shared by multiple tenants. Each of the tables contains about  columns, one of which is the entity's ID. Every table has a primary index on the entity ID and a unique compound index on the tenant ID and the entity ID. In addition, there are twelve indexes on selected columns for reporting queries and update tasks. All data for the testbed is synthetically generated.
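Under these conventions, one of the testbed tables could be declared roughly as follows; the non-key columns are made up, since the actual column lists are not reproduced here.

```sql
CREATE TABLE account (
  aid        INTEGER NOT NULL,   -- entity ID
  tenant_id  INTEGER NOT NULL,   -- owning tenant
  name       VARCHAR(100),       -- illustrative payload columns
  industry   VARCHAR(50),
  revenue    DECIMAL(15,2),
  PRIMARY KEY (aid)              -- primary index on the entity ID
);

-- Unique compound index on tenant ID and entity ID.
CREATE UNIQUE INDEX account_tenant ON account (tenant_id, aid);

-- One of the additional indexes supporting reporting queries and updates.
CREATE INDEX account_by_name ON account (tenant_id, name);
```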

Worker Actions

Worker actions include CRUD operations and reporting tasks that simulate the daily activities of individual users. The reporting tasks model fixed business activity monitoring queries, as they may be found on an application dashboard, rather than ad-hoc business intelligence queries, and are simple enough to run against an operational OLTP system. These reports perform queries with value aggregation and parent-child roll-ups. Worker actions also include administrative operations for the business as a whole, in particular, adding and deleting tenants. Depending on the configuration, such operations may entail executing DDL statements while the system is on-line, which may result in decreased performance or even deadlocks for some databases. The testbed does not model long-running operations because they should not occur in an OLTP system, particularly one that is multi-tenant.

To facilitate the analysis of the experimental results, the worker actions are grouped into classes with particular access characteristics and expected response times. Lightweight actions perform simple operations on a single entity or a small set of entities. Heavyweight actions perform more complex operations, such as those involving grouping, sorting, or aggregation, on larger sets of entities. The list of action classes (see Figure 2.5) specifies the distribution of actions in the controller's card deck. In contrast to TPC-C, the testbed does not model client think times.


Select Light (): Selects all attributes of a single entity or a small set of entities as if they were to be displayed on an entity detail page in the browser.

Select Heavy (): Runs one of ve reporting queries that perform aggregation and/or parent-child-roll-up.

Insert Light (.): Inserts one new entity instance into the database as if it had been manually entered into the browser.

Insert Heavy (.): Inserts several hundred entity instances into the database in a batch as if they had been imported via a Web Service interface.

Update Light (.): Updates a single entity or a small set of entities as if they had been modied in an edit page in the browser. e set of entities is specied by a lter condition that relies on a database index.

Update Heavy (.): Updates several hundred entity instances that are selected by the entity ID using the primary key index.

Administrative Tasks (.): Creates a new instance of the -table CRM schema by issuing DDL statements.

Figure ..: Worker Action Classes

The testbed adopts a strategy for transactions that is consistent with best practices for highly-scalable Web applications (Kemper et al., ; Jacobs, ). The testbed assumes that neither browser nor Web Service clients can demarcate transactions and that the maximum granularity for a transaction is therefore the duration of a single user request. Furthermore, since long-running operations are not permitted, large write requests such as cascading deletes are broken up into smaller independent operations. Any temporary inconsistencies that result from the visibility of intermediate states must be eliminated at the application level. Finally, read requests are always performed with a weak isolation level that permits unrepeatable reads.
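As a sketch of this policy, deleting an account together with its children would be issued as several short, independent write operations instead of one long transaction (hypothetical statements over the CRM schema); any intermediate state that becomes visible between the steps has to be tolerated, or hidden, by the application.

```sql
-- Each statement runs as its own short transaction; there is no enclosing
-- long-running transaction and no cascading delete inside the DBMS.
DELETE FROM opportunity WHERE tenant_id = 42 AND account_id = 4711;
DELETE FROM contact     WHERE tenant_id = 42 AND account_id = 4711;
DELETE FROM asset       WHERE tenant_id = 42 AND account_id = 4711;
DELETE FROM account     WHERE tenant_id = 42 AND aid = 4711;
```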

Database Drivers

To adapt the testbed to a variety of possible DBMS back-ends, the access to the database is handled by database drivers. These DBMS-specific drivers transform a request for a specific worker action into proprietary code. The database drivers access the DBMS by either a JDBC request or by calling the API of the product-specific library. For each worker session, one separate driver is instantiated. Each request which has been received by the worker triggers a database action by the driver.

Each driver is optimized for the supported DBMS and can thus leverage specialized techniques and features of the DBMS. This is necessary to exploit vendor-specific extensions to the standard feature set of a DBMS, such as XML features or special optimizations for sparse data.

 Chapter . Fundamental Implementation of Multi-Tenancy

Table 2.1: Schema Variability and Data Distribution (for each schema variability level 0.0, 0.5, 0.65, 0.8, and 1.0: the number of schema instances, the tenants per instance, and the resulting total number of tables)

2.2.2 Experiment: DBMS Performance with Many Tables

With the multi-tenancy testbed we measured the performance of traditional relational DBMSs as they handle more and more tables. In order to programmatically increase the overall number of tables without making them too synthetic, multiple copies of the ten-table CRM schema are created. Each copy should be viewed as representing a logically different set of entities. Thus, the more instances of the schema there are in the database, the more schema variability there is for a given amount of data.

The testbed is configured with a fixed number of tenants, a fixed amount of data per tenant, and a fixed workload of client sessions. The schema variability takes values from 0 (least variability) to 1 (highest variability), as shown in Table 2.1. For the value 0, there is only one schema instance and it is shared by all tenants, resulting in ten total tables. At the other extreme, the value 1 denotes a setup where all tenants have their own private instance of the schema. Between these two extremes, tenants are distributed as evenly as possible among the schema instances. For example, with schema variability 0.65, some schema instances have two tenants while the remaining ones have only one.

Testbed Configuration

For this experiment, the multi-tenancy testbed was configured as follows.

1. Each run has a duration of one hour, where the ramp-up phase at the beginning and the ramp-down phase at the end have been stripped. Thus, the performance is reported within the remaining steady-state interval.

2. The action classes are distributed as stated in Figure 2.5.

3. Each worker has a set of sessions, where each session manages a physical connection to the database. This simulates an application server with a pool of database connections. Since some JDBC drivers have optimizations when opening a lot of concurrent connections to the same database, we distributed the load across several workers. Furthermore, with this setup, we were able to stress the DBMS optimally.


We measured the response time of each query. Each worker uses an internal stopwatch to determine the runtime of a request. The response time is end-to-end: it includes submitting the request across the network, request processing within the DBMS, data transfer across the network, and data output at the client. We accessed all attributes of each result set entry to avoid unpredictable results in case the DBMS-specific JDBC driver implements lazy fetching, where the driver avoids unnecessary network operations.

Results e experiment was run on an IBM DB with a . GHz Intel Xeon proces- sor and  GB of memory. e database server was running a recent enterprise-grade Linux operating system. e data was stored on an NFS appliance that was connected with ded- icated  GBit/s Ethernet trunks. e workers were placed on blade servers with a  GBit/s private interconnect. IBM DB allocates  KB of main memory per table within the database heap, as soon as the table is accessed for the rst time. is memory is never freed. Additionally, there are references to le descriptors, etc. which are not part of that memory. e experiment was designed to exploit this issue. Increasing the schema variability beyond . taxes the ability of the database to keep the primary key index root nodes in memory. Schema variability . has , tables, which at  KB per table for DB consumes about  MB of memory. e operating system consumes about  MB, leaving about  MB for the database buffer pool. e page size for all user data, including indexes, is  KB. e root nodes of the , primary key indexes therefore require  MB of buffer pool space. e buffer pool must also accommodate the actual user data and any additional index pages, and the dataset for a tenant was chosen so that most of the tables need more than one index page. e raw data collected by the controller was processed as follows. First, the ramp-up phase during which the system reached steady state was stripped off. en roll-ups of the results were taken across  minute periods for an hour, producing two runs. is process was repeated three times, resulting in a total of six runs. e results of the runs were consistent and so only the rst run is reported for each value of the schema variability; see Table .. e rst line of this table shows the Baseline Compliance, which was computed as fol- lows. e  quantiles were computed for each query class of the schema variability . conguration: this is the baseline. en for each conguration, the percentage of queries within the baseline were computed. e lower the baseline compliance, the higher the per- centage of queries whose response time is above the baseline. Per denition, the baseline compliance of the schema variability . conguration is . Starting around schema variability . the high response times are no longer tolerable. e baseline compliance is also depicted in Figure .a. e second line of Table . is the database throughput in actions per minute, computed as an average over the  minute period. e throughput is also depicted in Figure .a. e middle part of Table . shows the  quantiles for each query class. For the most part, the response times grow with increasing schema variability. We hypothesize that the exceptions occur for the following reasons. First, for low schema variability, there is more

 Chapter . Fundamental Implementation of Multi-Tenancy

Metric                        Schema Variability
                              0.0       0.5       0.65      0.8       1.0
Baseline Compliance [%]       95.0      81.5      79.2      75.5      71.8
Throughput [1/min]            7,325.60  5,162.30  4,225.17  3,852.70  3,829.40
95% Response Time
  Select Light [ms]           370       766       747       846       1,000
  Select Heavy [ms]           2,226     1,677     1,665     1,959     2,375
  Insert Light [ms]           4,508     2,031     2,620     3,020     2,005
  Insert Heavy [ms]           8,530     10,128    13,383    16,681    9,718
  Update Light [ms]           428       1,160     1,403     1,719     2,049
  Update Heavy [ms]           679       1,393     1,524     1,777     2,096
Bufferpool Hit Ratio
  Data [%]                    95.53     93.89     94.58     94.54     94.12
  Index [%]                   97.46     89.13     88.57     86.69     83.07

Table ..: Experimental Results

 

 

 

Figure ..: Schema Variability Results. (a) Baseline Compliance [%] and Database Throughput [Transactions/min] plotted over the schema variability; (b) Buffer Hit Ratios [%] for the data and index buffers plotted over the schema variability.

First, for low schema variability, there is more sharing among tenants and therefore more contention for longer-running queries and tuple inserts. Since the heavyweight select queries perform aggregation, sorting, or grouping, multiple parallel query instances impact each other. The query execution plans show that these queries do a partial table scan with some locking, so the performance for this class degrades. For insert operations, the database locks the pages where the tuples are inserted, so concurrently running insert operations have to wait for the release of these locks. This effect is especially visible for the lightweight insert operations.

Second, there is a visible performance improvement for the insert operations at schema variability ., where the database outperforms all previous configurations. We hypothesize that this behavior is due to DB2 switching between the two insert methods it provides. The first method finds the most suitable page for the new tuple, producing a compactly stored relation. The second method simply appends the tuple to the end of the last page, producing a sparsely stored relation.

The last two lines of Table . show the buffer pool hit ratio for the data and the indexes. As the schema variability increases, the hit ratio for indexes decreases while the hit ratio for data remains fairly constant. Inspection of the query plans shows that the queries primarily use the indexes for processing. The hit ratios are also depicted in Figure .b.

.. Multi-Tenancy Issues with Current DBMS

As the previous experiment shows, there are two main factors which affect the scalability of traditional DBMSs when sharing a DBMS instance across tenants: buffer pool utilization and meta-data management. Furthermore, there is a third issue, online schema changes, which negatively affects DBMS scalability. In the following, we elaborate on these issues and present short-term and long-term solutions.

... Buffer Pool Utilization

Typically, the physical storage of a modern DBMS consists of several data segments which are slotted into pages. Each segment contains a header with information about the segment, such as high-watermarks and free-space information of the pages. A relation is usually stored across a set of such data segments, so that each segment belongs to exactly one relation. Thus, each page in a segment stores tuples of one single relation. With an increasing number of tables, the number of underutilized pages increases as well, leading to a lower buffer hit ratio for data pages. The previous experiment is very index-centric and therefore does not suffer from low buffer utilization of the data pages. However, other scenarios, where the workload is more data-centric, may show a different behavior.

Tree-based indexes are stored using the same mechanisms as described above. The pages do not contain data tuples; instead, each page contains a single node of the B-Tree. A very important property of a B-Tree is its balanced structure combined with its high fan-out, which keeps the height of the tree very small. To guarantee fast index look-ups, most DBMSs keep the root node of a B-Tree in the buffer pool as soon as it has been accessed once. If the root node is kept in the buffer pool, the look-up costs are dramatically reduced

 Chapter . Fundamental Implementation of Multi-Tenancy for subsequent look-ups in the same index. In the previous experiment, this effect becomes visible with a schema variability factor greater than .. At this point, the DBMS is no longer able to keep all root pages of the primary key index in the buffer pool, thus leading to a severe performance degradation. For the short term, the problem of underutilized buffer pools can be ameliorated by shar- ing tables across tenants. However, as discussed before, this lowers tenant isolation and simultaneously increases the risk of resource contention. For the long term, it should be possible to have tuples of various relations in the same page. However, this may negatively affect the paging behavior of current DBMSs and thus decreases the overall query perfor- mance. With the advent of terabyte-scale main-memories, this issue can be addressed by replacing disk-based DBMSs with main-memory-based DBMSs.

... Meta-Data Management

As discussed above, each table which is known to the database system consumes at least  KB of main memory as soon as it has been accessed once. This memory block contains the structural description of the table as well as the information for physical access, such as file descriptors. Furthermore, since DB2 allocates this memory not inside the buffer pool but inside the database heap, these per-table memory blocks are not affected by the DBMS's paging mechanism. Thus, once a table has been opened, its memory block is never released until the whole database has been closed. For example, if , tenants are co-located on one DBMS instance and each tenant has  tables, then  GB of main memory are occupied by these memory blocks.

As a short-term solution, applying the shared table approach decreases the number of tables and thus the memory requirements. However, this lowers tenant isolation. For the long term, there should be a schema inheritance mechanism where the same structural description can be used by multiple table instances. As co-located tenants share the same application, common parts of the schema can be factored out this way.

A second aspect is the way schema redefinitions are handled. If an ALTER TABLE statement changes the data type of a particular attribute, the DBMS may need to perform data reorganizations, or at least check whether all values in the affected column are compliant with the new data type. Such an operation may have severe effects on query performance: during the reorganization, the table becomes unavailable, thus lowering the SaaS application's availability. In the SaaS world this is crucial, as schema modification operations are part of the tenants' self-service and may be quite frequent.

Some DBMSs already address these issues. In the simplest form, checks and reorganizations can be avoided if "compatible" changes are specified in the ALTER TABLE statement. Such compatible changes are, for example, adding attributes, renaming attributes, or changing the data type of an attribute to a more generic data type, such as VARCHAR(50) to VARCHAR(100). In such a scenario, the ALTER TABLE statement only changes the meta-data. However, the ALTER TABLE statement may acquire an exclusive lock on the data dictionary, thus preventing concurrent requests from reading data dictionary entries.

A detailed discussion of this issue can be found in Chapter .
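As an illustration, consider the following DDL statements; this is only a sketch, the table definition is hypothetical, and the exact classification of changes as compatible is DBMS-specific. A widening change can be applied as a pure meta-data operation, whereas a narrowing change typically forces the DBMS to validate or reorganize the existing data:

    -- Hypothetical base table of the running example.
    CREATE TABLE Account (
        Aid  INTEGER NOT NULL PRIMARY KEY,
        Name VARCHAR(50)
    );

    -- "Compatible" change: widening VARCHAR(50) to VARCHAR(100) only touches
    -- the meta-data; no stored value has to be rewritten or re-checked.
    ALTER TABLE Account ALTER COLUMN Name SET DATA TYPE VARCHAR(100);

    -- Incompatible change: narrowing the column requires the DBMS to check
    -- every stored value and possibly reorganize the table, which may take
    -- the table offline for the duration of the operation.
    ALTER TABLE Account ALTER COLUMN Name SET DATA TYPE VARCHAR(20);

Even the meta-data-only variant may still block concurrent access to the data dictionary, as discussed above.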


Other mechanisms, like Oracle's Online Data Reorganization and Redefinition feature (Oracle Corporation, ), keep the affected table online: at first, a delta table is created in the same shape as the source table. During the reorganization process, all updates to the source table are redirected to the delta table. Then a target table is created that reflects the structural changes to the source table made by the ALTER TABLE statement. Next, the data is copied from the source table to the target table. Finally, the delta table is merged into the target table, and the source and delta tables are deleted. This procedure does not guarantee uninterrupted availability, but it reduces the downtime to a sub-second time window. However, it cannot be performed without the intervention of a DBMS administrator and is thus not suitable for SaaS setups.



.  Extensible Schemas for Traditional DBMS

The previous chapter shows that traditional DBMSs are not able to handle a multi-tenant workload out of the box. Besides the issues with buffer pool utilization and meta-data management, the DBMSs have no notion of tenants and thus cannot leverage this knowledge, e.g., for query optimization. Up to now, the fundamental implementation techniques as presented in the previous chapter do not offer any mechanisms for schema flexibility. In this chapter, we extend them with mechanisms for extensibility and evaluate to what extent these implementations are suitable for schema evolution.

Currently available SaaS applications contain sophisticated persistence layers that map the multi-tenant application to the relational world. Thus, the persistence layer needs to be built to overcome the issues with buffer pool utilization and meta-data management. We present schema mapping techniques which address this issue by mapping multiple single-tenant logical schemas in the SaaS application to one multi-tenant physical schema.

The presented schema mapping techniques are suitable for schema extensibility. Furthermore, some of them natively store semi-structured data inside the DBMS and thus have limited support for schema evolution.

.. Schema Mappings

We introduce the schema mapping techniques for multi-tenancy with a running example that shows various layouts for the Account tables of three tenants: tenant 17 has an extension for the health care industry, tenant 42 has an extension for the automotive industry, and a third tenant has no extension.

The most basic technique for implementing multi-tenancy is to add a tenant ID column (Tenant) to each table and share tables among tenants (cf. Section ..). This approach provides very good consolidation but does not support schema extensibility. As a result, it cannot represent the schema of our running example and is not discussed further here. This approach is generally taken by conventional Web applications, which view the data as being owned by the service provider rather than by the individual tenants. As this approach is very limited, only simpler services on the left side of Figure ., which do not need schema flexibility, make use of it.
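For illustration, a minimal sketch of this basic shared-table approach (the concrete column types are assumptions, not taken from the running example):

    CREATE TABLE Account (
        Tenant INTEGER NOT NULL,      -- tenant identifier added to the shared table
        Aid    INTEGER NOT NULL,
        Name   VARCHAR(100),
        PRIMARY KEY (Tenant, Aid)
    );

    -- Every request must be restricted to the requesting tenant:
    SELECT Aid, Name FROM Account WHERE Tenant = 42;

Since all tenants share one fixed table definition, there is no place for tenant-specific extension columns, which is why this layout cannot express the healthcare and automotive extensions of the example.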

 Chapter . Extensible Schemas for Traditional DBMS

Figure ..: Private Table Layout (each tenant has its own Account table: tenant 17's table has the columns Aid, Name, Hospital, Beds and contains the accounts Acme and Gump; the third tenant's table has the columns Aid, Name and contains the account Ball; tenant 42's table has the columns Aid, Name, Dealers and contains the account Big)

We discuss several categories of schema mapping techniques, depending on the physical storage layout. All of them have in common that they support some aspects of schema flexibility. However, the experimental evaluation of these approaches shows that some of them are more suitable for schema evolution than others. Furthermore, the approaches try to avoid stressing the meta-data budget of the DBMS by turning actual meta-data into stored meta-data. This means that some of the meta-data required to interpret a data tuple has to be maintained outside the DBMS. In the example figures, gray columns denote such stored meta-data.

... Schema-Based Approaches

The first category of schema mapping techniques stores the data in fully structured physical relations. That means that the logical single-tenant schemas of the SaaS application are mapped to fully structured physical relations in the DBMS. As a consequence, the DBMS has the full schema information of the stored data.

Private Table Layout

The most basic way to support extensibility is to give each tenant their own private tables, as shown in Figure .. Since the meta-data is entirely managed by the database, there is no overhead for meta-data in the data itself (i.e., gray columns are missing in the figure). However, only moderate consolidation is provided, since many tables are required. This approach is used by some larger services on the right side of Figure ., where a small number of tenants can produce sufficient load to fully utilize the host machine.

Extension Table Layout

Figure . shows how the above layout evolves into the Extension Table Layout by splitting off the extensions into separate tables. Because multiple tenants may use the same extensions, the extension tables as well as the base tables should be given a Tenant column. A Row column must also be added so that the logical source tables can be reconstructed. The two gray columns in Figure . represent the overhead for meta-data in the data itself, the stored meta-data. At run-time, reconstructing the logical source tables carries the overhead of additional joins as well as additional I/O if the row fragments are not clustered together.


Figure ..: Extension Table Layout (the shared base table AccountExt with the columns Tenant, Row, Aid, Name holds the accounts Acme, Gump, Ball, and Big; the extension tables HealthcareAccount with the columns Tenant, Row, Hospital, Beds and AutomotiveAccount with the columns Tenant, Row, Dealers hold the extension attributes; Tenant and Row are stored meta-data)

Figure ..: Universal Table Layout (a single Universal table with a Tenant column, a Table column, and generic data columns Col1 to Col6; each logical row is stored with its values packed into the generic columns, and unused columns hold NULL, shown as −)

Figure ..: Universal Table Layout hand, if a query does not reference one of the tables, then there is no need to read it in, which can improve performance. is approach provides better consolidation than the Private Table Layout, however the number of tables will still grow in proportion to the number of tenants since more tenants will have a wider variety of basic requirements. is approach has its origins in the Decomposed Storage Model, proposed by Copeland and Khoshaan (), where an n-column table is broken up into n -column tables that are joined through surrogate values. is model has then been adopted by column- oriented databases, for example MonetDB (Boncz, ), which leverage the ability to se- lectively read in columns to improve the performance of analytics (Stonebraker et al., ) and RDF data (Abadi et al., ). e Extension Table Layout does not partition tables all the way down to individual columns, but rather leaves them in naturally-occurring groups. is approach has been used to map object-oriented schemas with inheritance into the re- lational model (Elmasri and Navathe, ).

... Generic Structures

In contrast to the previous category, the following schema mapping techniques do not store fully structured data. Instead, the logical single-tenant schemas are shredded into generic structures which allow the creation of an arbitrary number of tables with arbitrary shapes. The schema information that is necessary to transform the fully structured logical schema into the generic structure is available only in the persistence layer of the SaaS application and therefore cannot be used by the DBMS for query optimization.

 Chapter . Extensible Schemas for Traditional DBMS

Universal Table Layout

A Universal Table, as depicted in Figure ., is a generic structure with a Tenant column, a Table column, and a large number of generic data columns. The data columns have a flexible type, such as VARCHAR, into which other types can be converted. The n-th column of each logical source table of each tenant is mapped into the n-th data column of the Universal Table. As a result, different tenants can extend the same table in different ways. By keeping all of the values for a row together, this approach obviates the need to reconstruct the logical source tables. However, it has the obvious disadvantage that the rows need to be very wide, even for narrow source tables, and the database has to handle many NULL values. While DBMSs handle NULL values fairly efficiently, they nevertheless occupy some additional memory. Perhaps more significantly, fine-grained support for indexing is not possible: either all tenants get an index on a column or none of them do. Furthermore, indexes have to cope with various data types in the Universal Table. Consider the Col3 attribute in the Universal Table depicted in Figure .: as the application's persistence layer maps the String attribute Hospital of the healthcare extension to the same physical attribute as the Integer attribute Dealers of the automotive extension, the indexing mechanism must be enhanced to support such a setup. As a result of these issues, additional structures must be added to this approach to make it feasible.

This approach has its origins in the Universal Relation presented by Maier and Ullman (), which holds the data for all tables and has every column of every table. The Universal Relation was proposed as a conceptual tool for developing queries and was not intended to be directly implemented. The Universal Table described here is narrower, and thus feasible to implement, because it circumvents typing and uses each physical column to represent multiple logical columns.

There have been extensive studies of the use of generic structures to represent semi-structured data. Florescu and Kossmann () describe a variety of relational representations for XML data, including Universal and Pivot Tables. Our work uses generic structures to represent irregularities between pieces of schema rather than pieces of data.
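For illustration, a query that retrieves the number of beds of tenant 17's accounts located at the 'State' hospital could be generated against the Universal Table roughly as follows; this is a sketch only, the column positions (Hospital in Col3, Beds in Col4) follow the positional mapping described above, the Table identifier 0 is illustrative, and the CAST is needed because the generic data columns are of type VARCHAR:

    SELECT CAST(Col4 AS INTEGER) AS Beds
    FROM   Universal
    WHERE  Tenant = 17
      AND  Table  = 0
      AND  Col3   = 'State';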

Pivot Table Layout

A Pivot Table is a generic structure in which each field of each row in a logical source table is given its own row. Figure . shows that, in addition to the Tenant, Table, and Row columns described above, a Pivot Table has a Col attribute that specifies which source field a row represents, and a single data-bearing column for the value of that field. The data column can be given a flexible type, such as VARCHAR, into which other types are converted; in such a case the Pivot Table becomes a Universal Table for the Decomposed Storage Model. A better approach, however, in that it does not circumvent typing, is to have multiple Pivot Tables with different types for the data column. To efficiently support indexing, two Pivot Tables can be created for each type: one with indexes and one without. Each value is placed in exactly one of these tables, depending on whether it needs to be indexed.

This approach eliminates the need to handle many NULL values.


Figure ..: Pivot Table Layout (two Pivot Tables, Pivotint and Pivotstr, each with the meta-data columns Tenant, Table, Col, Row and a single typed data column Int or Str; the string values Acme, St. Mary, Gump, State, Ball, and Big of the running example are stored in Pivotstr)

However, a Pivot Table has more columns of meta-data than actual data, and reconstructing an n-column logical source table requires (n − 1) aligning joins along the Row column. This leads to a much higher runtime overhead for interpreting the meta-data than the relatively small number of joins needed in the Extension Table Layout. Of course, like the Decomposed Storage Model, the performance can benefit from selectively reading in a small number of columns.

Grust et al. () use Pivot-like tables in their Pathfinder query compiler to map XML into relations. Closer to our work is the research on sparse relational data sets, which have thousands of attributes, only a few of which are used by any object. Agrawal et al. () compare the performance of Pivot Tables (called vertical tables) and conventional horizontal tables in this context and conclude that the former perform better because they allow columns to be selectively read in. Our use case differs in that the data is partitioned by tenant into well-known dense subsets, which provides both a more challenging baseline for comparison and more opportunities for optimization. Beckmann et al. () also present a technique for handling sparse data sets using a Pivot Table Layout. In comparison to our explicit storage of meta-data columns, they chose an "intrusive" approach which manages the additional runtime operations in the database kernel. Cunningham et al. () present an "intrusive" technique for supporting general-purpose pivot and unpivot operations.

Chunk Table Layout

We propose a third generic structure, called Chunk Table, that is particularly effective when the base data can be partitioned into well-known dense subsets. A Chunk Table, as shown in Figure ., is like a Pivot Table except that it has a set of data columns of various types, with and without indexes, and the Col column is replaced by a Chunk column. A logical source table is partitioned into groups of columns, each of which is assigned a chunk ID and mapped into an appropriate Chunk Table. In comparison to Pivot Tables, this approach reduces the ratio of stored meta-data to actual data as well as the overhead for reconstructing the logical source tables. In comparison to Universal Tables, this approach provides a well-defined way of adding indexes, breaking up overly wide columns, and supporting typing. By varying the width of the Chunk Tables, it is possible to find a middle ground between these extremes. On the other hand, this flexibility comes at the price of a more complex query transformation layer.

 Chapter . Extensible Schemas for Traditional DBMS

Figure ..: Chunk Table Layout (a single Chunk Table Chunkint|str with the meta-data columns Tenant, Table, Chunk, Row and one Int and one Str data column; each logical Account row of the running example is split into one row per chunk)

Figure ..: Chunk Folding Layout (the base attributes are kept in a conventional table AccountRow with the columns Tenant, Row, Aid, Name; the extension attributes are folded into a Chunk Table ChunkRow with the columns Tenant, Table, Chunk, Row, Int, Str)

Chunk Folding

We propose a technique called Chunk Folding where the logical source tables are vertically partitioned into chunks that are folded together into different physical multi-tenant tables and joined as needed. The database's "meta-data budget" is divided between application-specific conventional tables and a large fixed set of Chunk Tables. For example, Figure . illustrates a case where the base attributes of the Account table are stored in a conventional table and all its extension attributes are placed in a single Chunk Table. In contrast to generic structures that use only a small, fixed number of tables, Chunk Folding attempts to exploit the database's entire meta-data budget in as effective a way as possible. Good performance is obtained by mapping the most heavily utilized parts of the logical schemas into the conventional tables and the remaining parts into Chunk Tables that match their structure as closely as possible.
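As a sketch of how a logical table is reconstructed under this layout (table and column names follow the Chunk Folding figure above; the Table and Chunk identifiers are illustrative), tenant 17's Account table is obtained by joining the conventional base table with the Chunk Table on the Row column:

    SELECT a.Aid, a.Name, c.Str AS Hospital, c.Int AS Beds
    FROM   AccountRow a, ChunkRow c
    WHERE  a.Tenant = 17
      AND  c.Tenant = a.Tenant
      AND  c.Table  = 0
      AND  c.Chunk  = 1
      AND  c.Row    = a.Row;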


Figure ..: Sparse Columns Layout (a single shared Account table with the columns Tenant, Aid, Name and sparse columns for the tenant-specific extension fields, such as the hospital and beds fields of the healthcare extension and the dealers field of the automotive extension; rows store values only for the sparse columns their tenant has declared)

... Semistructured Approaches

Although each individual logical single-tenant schema is fully structured, the aggregate schema of all tenants is semi-structured due to the heterogeneity of the extensions the tenants have subscribed to. The following two approaches exploit this property by storing a fully structured version of the common parts and a semi-structured version of each tenant's extensions.

Sparse Columns

Sparse Columns were originally developed to manage data such as parts catalogs, where each item has only a few out of thousands of possible attributes. Storing such data in conventional tables with NULL values can decrease performance even with advanced optimizations for NULL handling. To implement Sparse Columns, SQL Server  uses a variant of the Interpreted Storage Format (Beckmann et al., ; Chu et al., ), where a value is stored in the row together with an identifier for its column.

In our mapping for SaaS, the base tables are shared by all tenants and every extension field of every tenant is added to the corresponding base table as a Sparse Column, as illustrated in Figure .. Sparse Columns must be explicitly defined by a CREATE/ALTER TABLE statement in the DDL and, in this sense, are owned by the database. Nevertheless, the application must maintain its own description of the extensions, since the column names cannot be statically embedded in the code. The implementation of Sparse Columns in SQL Server  requires special handling when accessing the data. For writes, the application must ensure that each tenant uses only those columns that it has declared, since the name space is global to all tenants. For reads, the application must do an explicit projection on the columns of interest, rather than doing a SELECT *, to ensure that NULL values are treated correctly. In all other cases, the Sparse Columns approach is fully transparent to the SaaS application; no special care has to be taken when generating the SQL queries.

Sparse Columns requires only a small, fixed number of tables, which gives it a performance advantage over Private Tables; the experiment in the previous chapter shows that having many tables negatively impacts performance. On the other hand, there is some overhead for managing Sparse Columns. As an example, the SQL Server  documentation recommends using a Sparse Column for an INT field only if at least  of the values are NULL (Microsoft Corporation, ).
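For illustration, extension fields can be declared as sparse columns of the shared base table roughly as follows; this is a sketch only, the column names are hypothetical and, since the name space is global to all tenants, would in practice have to encode the owning tenant:

    ALTER TABLE Account ADD Hospital17 VARCHAR(100) SPARSE NULL;
    ALTER TABLE Account ADD Beds17     INT          SPARSE NULL;
    ALTER TABLE Account ADD Dealers42  INT          SPARSE NULL;

    -- Reads must project the columns of interest explicitly instead of SELECT *:
    SELECT Aid, Name, Hospital17, Beds17
    FROM   Account
    WHERE  Tenant = 17;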

 Chapter . Extensible Schemas for Traditional DBMS

Figure ..: pureXML Columns Layout (a single shared Account table with the columns Tenant, Aid, Name and an Ext_XML column that stores each tenant's extension fields, such as hospital and beds for the healthcare tenant and dealers for the automotive tenant, as a flat XML document)

IBM pureXML

According to Saracco et al. (), IBM pureXML was designed to allow processing of semi-structured data alongside structured relational data. Although the extension schema is fully structured for each individual tenant, the global schema across all tenants is semi-structured. Therefore, Taylor and Guo () give recommendations on how to use pureXML for extensible SaaS applications. Traditionally, XML documents have either been shredded into multiple relations or stored as CLOB attributes. When an XML document is stored as a CLOB, it has to be fully retrieved from the DBMS as a single object and then processed in the application, without any ability to fetch only the relevant parts of the document; furthermore, indexes on such XML documents are not possible. These limitations can be mitigated by shredding the XML; however, a shredded XML document cannot be stored alongside the base tables, which makes querying the data more error-prone. IBM pureXML, in contrast, provides server-side XQuery mechanisms which allow the relevant parts of an XML document to be extracted.

With pureXML, the XML documents are stored in a single attribute of any table. Thus, the base tables are shared by all tenants and each base table is augmented by a column (Ext_XML) that stores all extension fields of a tenant in a flat XML document, as illustrated in Figure .. Since these documents necessarily vary by tenant, they are untyped. This representation keeps the documents as small as possible, which is an important consideration for performance (Nicola, ).

HBase

HBase (), which is an open-source version of Google BigTable (Chang et al., ), was originally designed to support the exploration of massive web data sets. These systems are increasingly being used to support enterprise applications in a SaaS setting (Bernstein, ). In an HBase table, columns are grouped into column families. Column families must be explicitly defined in advance in the HBase "DDL"; for this reason, they are owned by the database. There should not be more than tens of column families in a table, and they should rarely be changed while the system is in operation.


Figure ..: HBase Pivot Columns Layout (a single HBase table whose row key concatenates the tenant ID, the logical table name, and the logical row key; the column families Account and Contact hold the logical tables, with each logical column mapped to its own HBase column, e.g. [name:Acme, hospital:St. Mary, beds:…])

Figure ..: HBase Pivot Columns Layout rarely be changed while the system is in operation. Columns within a column family may be created on-the-y, hence they are owned by the application. Different rows in a table may use the same column family in different ways. All values in a column are stored as Strings. ere may be an unbounded number of columns within a column family. Data in a column family is stored together on disk and in memory. us, a column family is essentially a Pivot Table; each value is stored along with an identier for its column in a tall narrow table (Agrawal et al., ). HBase was designed to scale out across a large farm of servers. Rows are range-partitioned across the servers by key. Applications dene the key structure, therefore implicitly con- trol the distribution of data. Rows with the same key prex will be adjacent but, in general, may end up on different servers. e rows on each server are physically broken up into their column families. e mapping we use for SaaS is illustrated in Figure .. In keeping with best practices for HBase, this mapping ensures that data that is likely to be accessed within one query is clustered together. A single HBase table is used to store all tables for all tenants. e physical row key in HBase consists of the concatenation of the tenant ID, the name of the logical table, and the key of the row in the logical table. Each logical table is packed into its own column family, thus each row has values in only one column family. Within a column family, each column in the logical table is mapped into its own physical HBase column. us, since columns are dynamic, tenants may individually extend the base tables.

.. Case Study: The force.com Multi-Tenant Application Platform

The application development platform force.com uses some of the above schema mapping techniques in its product. Rather than being a SaaS application itself, force.com offers Platform as a Service (PaaS) to ISVs for developing and hosting multi-tenant SaaS applications. The force.com platform emerged from the CRM application salesforce.com, which offers interfaces to ISVs for extending the CRM application. The extensions are available via AppExchange, where individual tenants can subscribe to them. In order to develop applications other than CRM extensions, ISVs can use the force.com PaaS. Weissman and Bobrowski () describe the full architecture of force.com, including application development and query optimization. In this section, we present a summary of their work, focused on meta-data, schema mapping, and data persistence. It serves as an example of how to apply schema mapping techniques to a real-world application.

 Chapter . Extensible Schemas for Traditional DBMS

... Meta-Data-Driven Applications

The force.com platform pursues a meta-data-driven approach to application development. Thus, each component, like forms, reports, workflows, privileges, customizations, business rules, data tables, and indexes, is meta-data. The platform separates this meta-data, which describes either the base functionality of the ISV's application or the tenants' data and customizations, from the compiled runtime environment (kernel) and the application data. This separation allows the system kernel, the base application, and the tenants' customizations to be updated independently.

All objects that are known to the force.com platform exist only virtually in the runtime environment. Virtual data tables containing tenants' data and indexes are mapped to the underlying DBMS's physical tables by a sophisticated schema mapping which combines several of the techniques discussed above. The meta-data is not available to the DBMS, so rather than managing the data, the DBMS simply stores the data, degenerating to a 'dumb data repository'. To reduce the I/O load, the meta-data is cached in main memory.

For each access to a virtual table, the runtime environment has to retrieve the appropriate meta-data and then transform the physical representation into a virtual table. This transformation is done by generating a query which accesses the underlying DBMS and casts the physical representation into the logical representation. The runtime environment has an internal multi-tenant-aware query optimizer that leverages the knowledge of the meta-data for generating optimal queries to the DBMS. Statistics for this cost-based optimizer are collected internally. This way, the optimizer and the statistics component of the DBMS are no longer used.

... Data Persistence

For data persistence, the force.com platform uses several schema mapping techniques in combination. There are two kinds of physical tables. First, there are data tables which serve as a "data heap" for storing tenants' data, either structured data or CLOBs. Second, there are index tables for storing data on which a tenant-defined index exists. Besides these two kinds, there are specialized tables for storing the meta-data of the application. Although this meta-data is stored inside the database, the DBMS is not able to make use of it, since it is application-specific and not database-specific. Changes to a tenant's virtual schema, such as creating new tables or changing attributes, modify the meta-data only; the physical representation in the DBMS is not touched when performing such operations.

The data table is a Universal Table which stores the data accessible by the application. There is only one big data table storing all entities of various types. The force.com variant of the Universal Table differs from the one in Section .. in the following way. The objects are identified by a Global Unique Identifier (GUID) which serves as primary key. A unique compound key containing the tenant ID, the object ID, and the virtual table's name serves as secondary key. The compound key contains the natural name of the entity, like "Account". Actual data is stored in variable-length columns with the generic names Val, ..., Val. The mapping between a logical attribute name and its ValN column is maintained in the meta-data. The ValN columns are of type VARCHAR, so arbitrary types have to be cast when data is stored in or retrieved from them.

A single ValN column can manage information from multiple fields, as long as the fields stem from different objects. Customizations only change the meta-data that maps the virtual attributes to physical ValN columns. There is no online DDL when a customization is performed; everything is handled within the meta-data.

Besides the data table, there are specialized pivot tables for storing unstructured data as CLOBs. In contrast to the Chunk Folding approach, the alignment is done using the GUID. Additional pivot tables simulate indexes, unique constraints, and foreign keys. Native indexes on the ValN columns are not practical, since they may contain data of various virtual tables in various data types. force.com manages indexes by synchronously copying field data that has been selected for indexing within the meta-data to a pivot table. Thus, the index pivot table contains strongly typed, natively indexed columns; it contains one column per supported native type. When a virtual table is accessed, the internal query generator first queries the pivot table for each indexed attribute and then aligns the result with the appropriate tuples from the data table. As an optimization, the force.com platform employs native DBMS partitioning mechanisms to physically partition the data and pivot tables by tenant.

.. Request Processing

The employment of schema mapping techniques leads to more complex request processing, as data is now represented using two different schemas. Inside the SaaS application, tenants' data is represented using the logical, tenant-specific schema, while inside the DBMS back-end the physical schema is used. Thus, the query generator inside the SaaS persistence layer must generate queries against the physical schema of the DBMS that reassemble the tuples of the logical schema. This section presents how data can be accessed when certain schema mappings are used.

... Chunk Tables

Instead of presenting the request processing steps for all generic approaches from Section .., we mainly consider Chunk Tables in this section. Depending on the width w of a Chunk Table, where w denotes the number of attributes, it may mimic a Pivot Table (w = 1) or a Universal Table (w → ∞). Thus, applying the presented processing steps to the Pivot Table and Universal Table approaches is straightforward. Depending on the implementation, data casts may be necessary, especially when using the Universal Table Layout. However, the data query process, as described in the following, is independent of whether data casts are used. Chunk Folding mixes Extension Tables and Chunk Tables. The inclusion of Extension Tables does not affect the query part of the following section at all; the reason is that the only interface between the different tables is the meta-column Row, which is also available in the Extension Table Layout.

 Chapter . Extensible Schemas for Traditional DBMS

Transforming Queries

Consider the following query from Tenant 17 over the Private Table Layout as shown in Figure .:

SELECT Beds
FROM   Account17                                  (Q)
WHERE  Hospital = 'State'

The most generic approach to formulating this query over the Pivot Tables Pivotint and Pivotstr is to reconstruct the original Account table in the FROM clause and then patch it into the selection. Such table reconstruction queries generally consist of multiple equi-joins on the column Row. In the case of Account, three aligning self-joins on the Pivot Tables are needed to construct the four-column-wide relation. However, in Query Q, the columns Aid and Name do not appear, and evaluating two of the three mapping joins would be wasted effort. We therefore devise the following systematic compilation scheme that proceeds in four steps.

1. Collect all table names and their corresponding columns in the logical source query.

2. For each table name, obtain the Chunk Tables and the meta-data identifiers that represent the used columns.

3. For each table, generate a query that filters the correct columns (based on the meta-data identifiers from the previous step) and aligns the different chunk relations on their Row columns. The resulting queries are all flat and consist of conjunctive predicates only.

4. Extend each table reference in the logical source query by its generated table definition query.

Query Q uses the columns Hospital and Beds of table Account. e two columns can be found in relations Pivotstr and Pivotint, respectively. For both columns, we know the values of the Tenant, Table, and Col columns. e query to reconstruct Account checks all these constraints and aligns the two columns on their rows:

(SELECT s.Str AS Hospital, i.Int AS Beds
 FROM   Pivotstr s, Pivotint i                     (QAccount17 (P))
 WHERE  s.Tenant = 17
   AND  i.Tenant = 17
   AND  s.Table = 0 AND s.Col = 2
   AND  i.Table = 0 AND i.Col = 3
   AND  s.Row = i.Row)

To make Chunk Folding work, the meta-data look-up also has to be performed for the Extension Tables.


When using Chunk Tables instead of Pivot Tables, the reconstruction of the logical Account table is nearly identical to the Pivot Table case. In our example, the resulting reconstruction query is particularly simple because both requested columns reside in the same chunk (Chunk = 1):

(SELECT Str1 AS Hospital, Int1 AS Beds
 FROM   Chunkint|str                               (QAccount17 (C))
 WHERE  Tenant = 17
   AND  Table = 0
   AND  Chunk = 1)

To complete the transformation, Query QAccount17 (C) is then patched into the FROM clause of Query Q as a nested sub-query:

SELECT Beds
FROM  (SELECT Str1 AS Hospital, Int1 AS Beds
       FROM   Chunkint|str                         (QChunk)
       WHERE  Tenant = 17
         AND  Table = 0
         AND  Chunk = 1) AS Account17
WHERE Hospital = 'State'

The structural changes to the original query can be summarized as follows.

• An additional nesting due to the expansion of the table definitions is introduced.

• All table references are expanded into join chains on the base tables to construct the references.

• All base table accesses refer to the columns Tenant, Table, Chunk and, in the case of aligning joins, to column Row.

We argue in the following that these changes do not necessarily affect the query response time.

Additional Nesting Fegaras and Maier (, Rule N) proved that the nesting we introduced in the FROM clause (queries with only conjunctive predicates) can always be flattened by a query optimizer. If a query optimizer does not implement such a rewrite, it will first generate the full relation before applying any filtering predicates, a clear performance penalty. For such databases, we must directly generate the flattened queries. For more complex queries (e.g., with GROUP BY clauses), the transformation is, however, not as clean as the technique described above.
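For illustration, the flattened form of the transformed Query Q is shown below; this is a sketch only, following the chunk assignment of the running example, in which Hospital maps to Str1 and Beds to Int1 of chunk 1:

    SELECT Int1 AS Beds
    FROM   Chunkint|str
    WHERE  Tenant = 17
      AND  Table  = 0
      AND  Chunk  = 1
      AND  Str1   = 'State'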

 Chapter . Extensible Schemas for Traditional DBMS

Join Chains Replacing the table references by chains of joins on base tables may be beneficial as long as the costs for loading the chunks and applying the index-supported (see below) joins are lower than the costs for reading the wider conventional relations. The observation that different chunks are often stored in the same relation (as in Figure .) makes this scenario even more likely, as the joins would then turn into self-joins and we may benefit from a higher buffer pool hit ratio.

Base Table Access As all table accesses refer to the meta-data columns Tenant, Table, Chunk, and Row, we should construct indexes on these columns. This turns every data access into an index-supported one. Note that a B-Tree look-up in a (Tenant, Table, Chunk, Row) index is basically a partitioned B-Tree look-up (Graefe, ). The leading B-Tree columns (here Tenant and Table) are highly redundant and only partition the B-Tree into separate smaller B-Trees (partitions). Prefix compression makes sure that these indexes stay small despite the redundant values.
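A minimal sketch of such an index definition (the index and table names are illustrative; Chunk_int_str stands for the Chunkint|str table of the figures, and prefix compression is enabled through DBMS-specific options or defaults):

    -- Meta-data index that turns every chunk access into an index-supported one;
    -- the leading columns (Tenant, Table) partition the B-Tree into small
    -- per-tenant, per-table partitions.
    CREATE INDEX idx_chunk_meta
        ON Chunk_int_str (Tenant, Table, Chunk, Row);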

Transforming Statements

The presented compilation scheme only covers the generation of queries over Chunk Tables. Thus, this section briefly describes how to cope with UPDATE, DELETE, and INSERT statements.

In SQL, data manipulation operations are restricted to single tables or updatable selections/views which the SQL query compiler can break into separate DML statements. For update and delete statements, predicates can filter the tuples affected by the manipulation. As insert and delete operations also require the modification of stored meta-data values, we devised a consistent DML query transformation logic based on single-table manipulations. Since multiple chunked tables are required for a single source table, a single source DML statement generally has to be mapped to multiple statements over Chunk Tables. Following common practice (Hamilton, ), we transform delete operations into updates that mark the tuples as invisible instead of physically deleting them, in order to provide a soft-delete mechanism. Such an update naturally has to mark the affected tuples in all Chunk Tables as deleted, in contrast to normal updates, which only have to manipulate the chunks where at least one cell is affected.

Our DML transformation logic for updates (and thus also for deletes) divides the manipulation into two phases: (a) a query phase that collects all physical rows that are affected by an update, and (b) an update phase that applies the update for each affected chunk with local conditions on the meta-data columns, and especially on column Row, only. Phase (a) transforms the incoming query with the presented query transformation scheme into a query that collects the set of affected Row values. One possibility to implement the updates in Phase (b) is to nest the transformed query from Phase (a) into a sub-query using an IN predicate on column Row. This approach lets the database execute all the work. For updates with multiple affected chunks (e.g., deletes), however, the database has to evaluate the query from Phase (a) for each chunk relation. An alternative approach is to first evaluate the transformed predicate query, let the application buffer the result, and issue an atomic update for each resulting Row value and every affected Chunk Table.
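As an illustration of the nested variant (a sketch only; the chunk assignment follows the running example, Chunk_int_str stands for Chunkint|str, and the new Beds value is hypothetical), an update of tenant 17's Beds value for accounts located at the 'State' hospital can be pushed entirely into the database:

    UPDATE Chunk_int_str
    SET    Int1 = 1100                            -- hypothetical new Beds value
    WHERE  Tenant = 17 AND Table = 0 AND Chunk = 1
      AND  Row IN (SELECT Row                     -- Phase (a): collect affected rows
                   FROM   Chunk_int_str
                   WHERE  Tenant = 17 AND Table = 0 AND Chunk = 1
                     AND  Str1 = 'State');

In this example, the predicate and the updated cell happen to reside in the same chunk; if several chunks were affected, the same sub-query would have to be evaluated once per chunk relation, which is exactly the overhead the alternative, application-buffered approach avoids.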


We now consider insert statements. For any insert, the application logic has to look up all related chunks, collect the meta-data for tables and chunks, and assign each newly inserted row a unique row identifier. With the complete set of meta-data in hand, an insert statement for each chunk can be issued. Other operations like DROP or ALTER statements can be evaluated on-line as well; they, however, require no access to the database, as only the application logic has to do the respective bookkeeping.
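For illustration (a sketch only; Chunk_int_str stands for Chunkint|str, the assignment of Aid/Name to chunk 0 and Beds/Hospital to chunk 1 is an assumption consistent with the running example, and Row = 2 as well as the inserted values are hypothetical), inserting a new logical Account row for tenant 17 results in one INSERT per chunk:

    INSERT INTO Chunk_int_str (Tenant, Table, Chunk, Row, Int1, Str1)
    VALUES (17, 0, 0, 2, 3, 'Mercy Clinic');      -- chunk 0: Aid, Name
    INSERT INTO Chunk_int_str (Tenant, Table, Chunk, Row, Int1, Str1)
    VALUES (17, 0, 1, 2, 120, 'Mercy');           -- chunk 1: Beds, Hospital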

... IBM pureXML

IBM pureXML offers SQL/XML, a hybrid that provides native access to both the structured and the semi-structured representation. As the SaaS application manipulates data in the structured format, accessing extension data requires a correlated sub-query to manage the XML. This sub-query extracts the relevant extension fields using the XMLTABLE function, which converts an XML document into a tabular format using XQuery. The query with the XMLTABLE function has to be generated per client and per query to access the extension fields relevant to the particular query. Since predicate handling in SQL/XML differs between predicates on relational attributes and predicates on XML attributes, we defer the transformation of Query Q into SQL/XML. Instead, we first introduce Query Q, which retrieves a particular entry from Tenant 42's Account table.

SELECT Aid, Name, Dealers
FROM   Account42                                  (Q)
WHERE  Aid = 1

This query has the selection predicate Aid = 1, which can be evaluated using the relational base attribute Aid of the Account table's pureXML representation (cf. Figure .). The transformed query QpureXML is as follows:

SELECT Aid, Name, Dealers
FROM   Account,
       XMLTABLE('$i/ext' PASSING Ext_XML AS "i"
                COLUMNS Dealers INTEGER PATH 'dealers')    (QpureXML)
WHERE  Tenant = 42 AND Aid = 1

For each tuple from the Account table matching the relational predicate Tenant = 42 AND Aid = 1, the Ext_XML attribute is passed to the XMLTABLE table function. Inside this function, the XML document is bound to the variable $i. The COLUMNS stanza then maps relative XPath expressions (e.g., PATH 'dealers') to typed attributes (e.g., Dealers INTEGER). Finally, the XMLTABLE function returns a relational view of the XML document, projected to the attributes in the COLUMNS stanza. Figure . shows the associated query plan. Since rows are mostly accessed through relational base fields, there is no need to use the special XML indexes offered by pureXML as described by Saracco et al. ().

 Chapter . Extensible Schemas for Traditional DBMS

Figure ..: Query Execution Plan for Query QpureXML (a RETURN over an NLJOIN that combines an index-supported FETCH on the accounts table, using an IXSCAN on the (tid, aid) index, with an XSCAN over the Ext_XML documents)

However, the filter predicate of Query Q refers to an attribute that is stored in the XML document associated with each tuple, rather than to a relational attribute. For convenience, Query Q is repeated below.

SELECT Beds
FROM   Account17                                  (Q)
WHERE  Hospital = 'State'

Although an XML filter attribute can be evaluated either by specifying an appropriate XQuery expression within the XMLTABLE function or by adding a predicate on the XML view columns, Nicola () recommends using the XMLEXISTS predicate function instead for performance reasons. The XMLEXISTS predicate function returns true if an XQuery expression returns a sequence of one or more items; otherwise, false is returned. Thus, the transformed Query QpureXML is as follows:

SELECT Beds
FROM   Account,
       XMLTABLE('$i/ext' PASSING Ext_XML AS "i"
                COLUMNS Beds INTEGER PATH 'beds')          (QpureXML)
WHERE  Tenant = 17
  AND  XMLEXISTS('$i/ext[hospital = "State"]' PASSING Ext_XML AS "i")

Hence, for transforming arbitrary queries into SQL/XML queries, the following steps have to be performed:

1. Collect all table names and their corresponding columns, i.e., relational attributes as well as attributes in the XML document, in the logical source query.

(For example, the predicate ... AND Hospital = 'State' of the logical source query corresponds to the XML predicate $i/ext[Hospital = "State"].)


Figure ..: Test Layout: Conventional (table Parent with an Id column and data columns Col1, Col2, …, ColN; table Child with an Id column, a Parent foreign key column, and data columns Col1, Col2, …, ColN)

2. For each table, generate an XMLTABLE call which binds the Ext_XML attribute and extracts the XML attributes from it.

3. If a filter predicate contains XML columns, transform the predicate into an XML predicate using the XMLEXISTS predicate function.

To insert a new tuple with extension data, the application has to generate the appropriate XML document. Updates to extension fields are implemented using XQuery . features to modify documents in place.
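For illustration (a sketch only; the account and its extension values are hypothetical), inserting a new account for tenant 17 together with its extension fields packaged as a flat XML document could look as follows:

    INSERT INTO Account (Tenant, Aid, Name, Ext_XML)
    VALUES (17, 3, 'Mercy Clinic',
            XMLPARSE(DOCUMENT '<ext><hospital>Mercy</hospital><beds>120</beds></ext>'));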

.. Chunk Table Performance

To assess the query performance of standard databases on queries over Chunk Tables, we devise a simple experiment that compares a conventional layout with equivalent Chunk Table layouts of various widths. The first part of this section describes the schema and the query we use; the second part outlines the individual experiments.

... Test Schema

The schema for the conventional layout consists of two tables, Parent and Child, as shown in Figure .. Both tables have an Id column and  data columns that are evenly distributed among the types INTEGER, DATE, and VARCHAR(100). In addition, table Child has a foreign key reference to Parent in column Parent.

The Chunk Table layouts each have two chunk tables: ChunkData, storing the grouped data columns, and ChunkIndex, storing the key Id and foreign key Parent columns of the conventional tables. In the different Chunk Table layouts, the ChunkData table varied in width from  data columns (resulting in  groups) to  data columns (resulting in a single group), in  column increments. Each set of three columns has the types INTEGER, DATE, and VARCHAR(100), allowing groups from the conventional table to be tightly packed into the Chunk Table. In general, the packing may not be this tight, and a Chunk Table may have NULL values, although not as many as a Universal Table. The ChunkIndex table always had a single INTEGER column. As an example, Figure . shows a Chunk Table instance of width  where each row of a conventional table is split into  rows in ChunkData and  (for parents) or  (for children) rows in ChunkIndex.

 Chapter . Extensible Schemas for Traditional DBMS

Figure ..: Test Layout: Chunk (ChunkData with the meta-data columns Table, Chunk, Row and the data columns Int, Int, Date, Date, Str, Str; ChunkIndex with the columns Table, Chunk, Row and a single Int column)

For the conventional tables, we create indexes on the primary keys (Id) and on the foreign key (Parent, Id) in the Child table. For the chunked tables, we create (Table, Chunk, Row) indexes on all tables as well as an (Int, Table, Chunk, Row) index on ChunkIndex to mimic the foreign key index on the Child table. The tests use synthetically generated data for the individual schema layouts. For the conventional layout, the Parent table is loaded with , tuples and the Child table is loaded with  tuples per parent (,, tuples in total). The Chunk Table layouts are loaded with equivalent data in fragmented form.

... Test Query

Our experiments use the following simple selection query.

SELECT p.Id, ...
FROM   Parent p, Child c                          (Q)
WHERE  p.Id = c.Parent
  AND  p.Id = ?

Query Q has two parameters: (a) the ellipsis (...) representing a choice of data columns and (b) the question mark (?) representing a random parent id. Parameter (b) ensures that a test run touches different parts of the data. Parameter (a)—the Q scale factor—species the width of the result. e number of the ColN attributes chosen from the parent ta- ble is equal to the number of the attributes chosen from the child table. As an example, Query Q is Query Q with a scale factor of : the ellipsis is replaced by  data columns each for parent and child.

SELECT p.Id, p.Col1, p.Col2, p.Col3,
       c.Col1, c.Col2, c.Col3
FROM   Parent p, Child c                          (Q)
WHERE  p.Id = c.Parent
  AND  p.Id = ?

Higher Q scale factors (ranging up to ) are more challenging for the chunked repre- sentation because they require more aligning joins.

cf. Base Table Access on page 


Figure ..: Query Execution Plan for Simple Fragment Query (a RETURN over an NLJOIN on top of an HSJOIN; the ChunkIndex and ChunkData tables are accessed via IXSCANs on the meta-data index tcr and the value index itcr, with FETCH operators where data columns are needed)

... Transformation and Nesting

In our first test, we transformed Query Q using the methods described in Section .. and fed the resulting queries into the open-source database MySQL (mysql.com, ) and the commercial database IBM DB2. We then used the database debug/explain facilities to look at the compiled query plans. The MySQL optimizer was unable to flatten the nesting introduced by our query transformation. DB2, on the other hand, presented a totally unnested plan where the selection predicate on p.Id was even pushed into the chunk representing the foreign key of the child relation. DB2's evaluation plan is discussed in more detail in the next test.

We then flattened the queries in advance and studied whether the predicate order on the SQL level would influence the query evaluation time. We produced an ordering where all predicates on the meta-data columns preceded the predicates of the original query and compared it with the ordering that mimics DB2's evaluation plan. For MySQL, the latter ordering outperformed the former ordering by a factor of . After adjusting the query transformation to produce flattened queries with predicates in the optimal order, we reran the experiment on both database systems. DB2 produced the same execution plan as before, and MySQL was able to produce a plan that started with the most selective predicate (p.Id = ?). As one would expect, the query evaluation times for MySQL showed an improvement.

... Transformation and Scaling

To understand how queries on Chunk Tables behave with an increasing number of columns (output columns as well as columns used in predicates), we analyzed the plans for a number of queries. The pattern is similar for most queries, and we discuss the characteristics based on Query Q, which was designed for the Chunk Table Layout in Figure .. The query plan is shown in Figure .. The leaf operators all access base tables.

We chose to present IBM DB2's plan; MySQL's query plan has the same characteristics.

 Chapter . Extensible Schemas for Traditional DBMS with a node in between that refers to the used index. Here, the meta-data index is called tcr (abbreviating the columns Table, Chunk, and Row) and the value index is called itcr. If a base table access cannot be completely answered by an index (e.g., if data columns are accessed) an additional FETCH operator (with a link to the base table) is added to the plan. Figure . contains two different join operators: a hash join (HSJOIN) and an index nested-loop join (NLJOIN). e plan in Figure . can be grouped into  regions. In region . , the foreign key for the child relation is looked up. e index access furthermore applies the aforementioned selection on the ? parameter. In region . , the Id column of the parent relation is accessed and the same selection as for the foreign key is applied. e hash join in region . imple- ments the foreign key join p.Id = c.Parent. But before this value-based join is applied in region . , all data columns for the parent table are looked up. Note that region . expands to a chain of aligning joins where the join column Row is looked up using the meta-data index tcr if parent columns in different chunks are accessed in the query. A similar join chain is built for the columns of the child table in region . .

... Response Times with Warm Cache

Figure . shows the average execution times with warm caches on DB2 V. on a . GHz Intel Xeon processor with  GB RAM. We conducted  runs; for all of them, we used the same values for parameter ?, so the data was in memory. In this setting, the overhead compared to conventional tables is entirely due to computing the aligning joins. The queries based on narrow chunks have to perform up to  more joins (layout Chunk with Q scale factor ) than the queries on the conventional tables, which results in a  ms slower response time. Another important observation is that already for -column-wide chunks, the response time is cut in half in comparison to -column-wide chunks and is at most  ms slower than for conventional tables.

... Logical Page Reads

Figure . shows the number of logical data and index page reads requested when executing Query Q. For all chunked representations,  to  of the reads were issued by index accesses. Figure . clearly shows that every join with an additional base table increases the number of logical page reads. Thus this graph shows the trade-off between conventional tables, where most meta-data is interpreted at compile time, and Chunk Tables, where the meta-data must be interpreted at runtime.

... Response Times with Cold Cache

Figure . shows the average execution times with cold caches. For this test, the database buffer pool and the disk cache were flushed between every run. For wider Chunk Tables, i.e.,  to  columns, the response times look similar to the page read graph (Figure .). For narrower Chunk Tables, cache locality starts to have an effect. For example, a single physical page access reads in  -column-wide tuples and  -column-wide tuples. Thus the response times for the narrower Chunk Tables are lower than for some of the wider Chunk Tables. For a realistic application, the response times would fall between the cold cache case and the warm cache case.

Figure ..: Response Times with Warm Cache (average response time in msec over the Q scale factor, i.e., the number of data columns in Q's SELECT clause, for Chunk Tables of different widths and the Conventional Table).

Figure ..: Number of Logical Page Reads (logical page reads over the Q scale factor, for Chunk Tables of different widths and the Conventional Table).

 Chapter . Extensible Schemas for Traditional DBMS





Time (sec) 

           . Q scale factor ((# of data columns)/ in Q’s SELECT clause)

Chunk Tables (# columns): Conventional Table .     

Figure ..: Response Times with Cold Cache the response times for the narrower Chunk Tables are lower than for some of the wider Chunk Tables. For a realistic application, the response times would fall between the cold cache case and the warm cache case.

... Cache Locality Benefits

The effects of cache locality are further clarified in Figure ., which shows the relative difference in response times between Chunk Folding and more conventional vertical partitioning. In the latter case, the source tables are partitioned as before, but the chunks are kept in separate tables rather than being folded into the same tables. For configurations with  and  columns, Chunk Folding exhibits a response time improvement of more than  percent. In the configuration with  columns, Chunk Folding and vertical partitioning have nearly identical physical layouts. The only difference is that Chunk Folding has an additional column Chunk to identify the chunk for realigning the rows, whereas in the vertical partitioning case, this identification is done via the physical table name. Since the Chunk column is part of the primary index in the Chunk Folding case, there is overhead for fetching this column into the index buffer pools. This overhead produces up to  more physical data reads and a response time degradation of .

... Additional Tests

We also ran some initial experiments on more complex queries (such as grouping queries). In this case, queries on the narrowest chunks could be as much as an order of magnitude slower than queries on the conventional tables, with queries on the wider chunks filling the range in between. The overall result of these experiments is that very narrow Chunk Tables, such as Pivot Tables, carry considerable overhead for the reconstruction of rows. As Chunk Tables get wider, however, the performance improves considerably and becomes competitive with conventional tables well before the width of the Universal Table is reached.

Figure ..: Response Time Improvements for Chunk Tables Compared to Vertical Partitioning (relative improvement in percent over the Q scale factor, for Chunk Tables of different widths).

.. Schema Evolution on Semistructured Schema Mappings

In a second set of experiments, we evaluate the performance of the semi-structured approaches, especially in combination with schema evolution. The experiments have been run against our Multi-Tenancy CRM testbed. To better study schema evolution, we issued a series of schema alteration statements during a run of the testbed and measured the drop in throughput. As a second enhancement, we introduced differently sized tenants, where tenants with more data have more extension fields, ranging from  to . The characteristics of the dataset are modeled on salesforce.com's published statistics (McKinnon, ). For getting finer-grained results, we split the existing request classes, resulting in the nine request classes shown in Figure ..

The experiments were run on Microsoft SQL Server  and IBM DB2 V.. on Windows . The database host was a Virtual Machine on VMware ESXi with  . GHz vCPUs and  GB of RAM.

... Microsoft SQL Server

Figure .a shows the results of running our testbed on Microsoft SQL Server using three different mappings: Private Tables, Extension Tables, and Sparse Columns. The horizontal axis shows the different request classes from Figure ., and the vertical axis shows the response time in milliseconds on a log scale.

In comparison to Private Tables, Extension Tables clearly exhibit the effects of vertical partitioning: wide reads (Sel , Sel , Sel ) are slower because an additional join is required, while narrow reads (Report) are faster because some unnecessary loading of data is avoided.

 Chapter . Extensible Schemas for Traditional DBMS

Select : Select all attributes of a single entity as if it was being displayed in a detail page in the browser.

Select : Select all attributes of  entities as if they were being displayed in a list in the browser.

Select : Select all attributes of the first  entities as if they were being exported through a Web Services interface.

Reporting: Run one of five reporting queries that perform aggregation and/or parent-child-roll-ups.

Insert : Insert one new entity instance as if it was being manually entered into the browser.

Insert : Insert  new entity instances as if data were being synchronized through a Web Services interface.

Insert : Insert  new entity instances as if data were being imported through a Web Services interface.

Update : Update a single entity as if it was being modified in an edit page in the browser.

Update : Update  entity instances as if data were being synchronized through a Web Services interface.

Figure ..: Rened Worker Action Classes

Figure ..: SQL Server Performance. (a) Overall: response time (msec, log scale) per query class for Private Tables, Extension Tables, and Sparse Columns. (b) By Tenant Size: response time per query class, broken down into small, medium, and large tenants, for Private Tables and Sparse Columns.

 Chapter . Extensible Schemas for Traditional DBMS required, while narrow reads (Report) are faster because some unnecessary loading of data is avoided. Updates (Upd , Upd ) perform similarly to wide reads because our tests modify both base and extension elds. Extension Tables are faster for inserts because tables are shared among tenants so there is a greater likelihood of nding a page in the buffer pool with free space. Sparse Columns perform as well or better than Private Tables in most cases. e addi- tional overhead for managing the Interpreted Storage Format appears to be offset by the fact that there are fewer tables. Sparse Columns perform worse for large inserts (Ins ), presumably because the implementation of the Interpreted Storage Format is tuned to fa- vor reads over writes. Figure .b shows a break down of the Private Table and Sparse Column results by tenant size. Recall that larger tenants have more extension elds, ranging from  to . e results show that the performance of both mappings decreases to some degree as the number of extension elds goes up. SQL Server permits up to , Sparse Columns per table. Our standard conguration of the testbed has  tenants, which requires about , columns per table. We also tried a conguration with  tenants and about , columns per table and there was little performance degradation. e number of extension elds per tenant in our testbed is drawn from actual usage, so SQL Server is unlikely to be able to scale much beyond  tenants. As a point of comparison, salesforce.com maintains about , tenants in one (very large) database (McKinnon, ). Figure . shows the impact of schema evolution on throughput in SQL Server. In these graphs, the horizontal axis is time in minutes and the vertical axis is transactions per minute. e overall trend of the lines is downward because data is inserted but not deleted during a run. Part way through each run, the structure of ve base tables are changed. For Private Tables, this results in  ALTER TABLE statements, as each tenant has its own set of base tables. However, with Sparse Columns, the base tables are shared, thus only  ALTER TABLE statements must be performed. e rst two lines in each graph show schema-only DDL statements: add a new column and increase the size of a VARCHAR column. e third line in each graph shows a DDL statement that affects existing data: decrease the size of a VARCHAR column. To implement this statement, SQL Server scans through the table and ensures that all values t in the reduced size. A more realistic alteration would perform more work than this, so the results indicate a lower bound on the impact of evolution. e gray area on each graph indicates the period during which this third operation took place. In the Private Tables case (Figure .a),  ALTER TABLE statements were submitted,  for each of the  tenants. Individual schema-only alterations completed very rapidly, but nevertheless had an impact on throughput because there were so many of them. Adding a new column took about  minute to complete while increasing the size of a VARCHAR col- umn took about  minutes. Decreasing the size of a VARCHAR column took about  minutes and produced a signicant decrease in throughput. e overall loss of throughput in each case is indicated by the amount of time it took to complete the run. In the Sparse Columns case (Figure .b), the tables are shared and  ALTER TABLE statements were submitted. 
e schema-only changes completed almost immediately and
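The difference in DDL volume between the two mappings can be illustrated with a short sketch; the base-table names, tenant count, and the concrete ALTER statement below are assumptions chosen only to show why a private-table mapping multiplies every schema change by the number of tenants, whereas a shared mapping needs one statement per base table.

# Hypothetical sketch: DDL volume of one schema change under two mappings.
# Table names and tenant count are made up for illustration.

BASE_TABLES = ["account", "contact", "lead", "opportunity", "case"]

def private_table_ddl(num_tenants, change="ALTER COLUMN name VARCHAR(80)"):
    # Private Tables: every tenant owns a physical copy of each base table.
    return [f"ALTER TABLE {table}_t{i} {change}"
            for i in range(num_tenants) for table in BASE_TABLES]

def shared_table_ddl(change="ALTER COLUMN name VARCHAR(80)"):
    # Shared mapping (e.g., Sparse Columns): the base tables are shared.
    return [f"ALTER TABLE {table} {change}" for table in BASE_TABLES]

print(len(private_table_ddl(1000)))   # 5000 statements, one per tenant and table
print(len(shared_table_ddl()))        # 5 statements, independent of the tenant count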

Figure ..: SQL Server Throughput (transactions per minute over the testbed runtime in minutes for (a) Private Tables and (b) Sparse Columns; the lines mark the Add New Column, Increase VARCHAR Size, and Decrease VARCHAR Size alterations).

 Chapter . Extensible Schemas for Traditional DBMS





 Private Tables Extension Tables  . pureXML

 Response Time (msec)

 . Sel  Sel  Sel ReportIns  Ins  Ins Upd  Upd 

Query Classes

(a) Overall  Private Tables:  Small Tenant Medium Tenant  Large Tenant

 pureXML: Small Tenant  Medium Tenant Response Time (msec) . Large Tenant  . Sel  Sel  Sel ReportIns  Ins  Ins Upd  Upd 

Query Classes

(b) By Tenant Size

Figure ..: DB Performance had no impact on throughput. Decreasing the size of a VARCHAR column took about  min- utes, during which throughput dropped almost to zero. e overall loss of throughput was greater for Private Tables, as indicated by the amount of time it took to complete the runs. However, the behavior of Private Tables is probably preferable in the SaaS setting because the throughput drop is never as deep, thus the servers don’t need to be over-provisioned as much. In any case, neither of these mappings is ideal in that the application should have more control over when such resource-intensive operations occur.

... IBM DB2

Figure .a shows the results of running our testbed on IBM DB using three different mappings: Private Tables, Extension Tables, and XML using pureXML. e axes are the same as in Figure ..


In comparison to Private Tables, Extension Tables exhibit the same performance variations as in SQL Server. However, XML produces a decrease in performance in most cases. The decrease is particularly severe for reads, which require executing a correlated subquery containing an XQuery statement embedded in a call to the XMLTABLE table function, as described in Section ... Figure .b shows a breakdown of the Private Table and XML results by tenant size. Recall that larger tenants have more extension fields, ranging from  to . The results show that for reads, the performance decrease of XML is proportional to the number of extension fields. Note that in the Insert  case, the results do not include the time to construct the XML document, and there is no variation based on tenant size.

XML gives the application complete control over schema evolution. In this setting, the application is responsible for performing any bulk transformations associated with schema alterations that impact existing data. To study the efficiency of such transformations, we ran our schema evolution throughput experiment on DB2 using pureXML. To simulate the decrease-VARCHAR case, we submitted a query for each of the five base tables that selects one field from all documents. These queries were run on the database server so no data transfer costs were incurred. The results were almost identical to the SQL Server Sparse Columns case shown in Figure .b. The five queries took about  minutes to complete, during which time throughput dropped to a very low level. Of course, the advantage of the XML mapping is that the application need not perform such transformations all at once.

(The XML document holding the extension data has been generated outside the DBMS. Thus, inserting a tuple with an XML document attached is fairly cheap, as we configured the DBMS not to validate the XML document against an XML schema.)



.  Next Generation Multi-Tenant DBMS

The discussion in the previous chapters shows that traditional DBMSs are not well-suited for massive multi-tenancy. The simpler approaches "Shared Machine" and "Shared Process" do not offer any support for schema flexibility. At the same time, they suffer from poor buffer pool utilization and a high memory overhead due to inefficient meta-data handling. Furthermore, multi-tenancy requires features like schema flexibility which are typically not supported by currently available DBMSs. Due to the high level of consolidation, there is a high probability that one of the co-located tenants performs schema modifications. However, the evaluation in the previous chapter shows that, for traditional DBMSs, allowing on-line schema changes may heavily impact the overall system response time and thus must be prevented in a multi-tenant scenario.

Many database vendors circumvent these issues by recommending proprietary techniques for extensible schemas. However, as soon as schemas must be extensible, a schema mapping must be employed to ensure a consistent transformation of the logical tenant-specific schema to the physical schema inside the DBMS. In such a setup, the DBMS no longer maintains the meta-data of the tenants' schemas; rather, the application is now responsible for dealing with the tenant-specific meta-data. The DBMS is then degraded to a 'dumb data repository' which only maintains the physical layout of the schema mapping strategy. Thus, more and more core functionalities of a DBMS, such as query optimization, statistics, and data management, have to be rebuilt inside the application, making further application development more costly.

Most SaaS applications combine the shared schema approach with schema mappings to circumvent the issues with bad resource utilization. However, support for schema flexibility is still lacking, and the shared schema approach weakens the tenant isolation, as all tenants' data is stored interleaved with each other. A physically separated schema for each tenant would therefore offer better data isolation and facilitate tenant migration. Moreover, with separated schemas, tenants can be distributed across a shared-nothing system by simply copying the data from one machine to another.

In this chapter, we describe our approach of addressing this issue with a specialized multi-tenant DBMS with native support for schema flexibility. Our approach provides the same level of tenant isolation as the shared process approach while simultaneously allowing schema flexibility with resource utilization similar to the shared table approach. We describe the key concepts and the architecture of the DBMS and present the underlying data model FlexScheme, which serves as an integrated model for schema flexibility.

 Chapter . Next Generation Multi-Tenant DBMS

.. Estimated Workload

Before presenting the conceptual changes for a multi-tenant DBMS compared to traditional DBMSs, we estimate the typical DBMS workload of a SaaS application. Our estimations are based on the work of Harizopoulos et al. (), who present an in-depth analysis of OLTP applications.

SaaS applications tend to be of light to medium complexity, as already discussed in Chapter . High-complexity applications, such as Enterprise Resource Planning (ERP), are not suitable for SaaS, because such applications need heavy customizations to adapt to the tenants' requirements and may have long-running transactions. Furthermore, the workload of SaaS applications is mostly OLTP-style and optimized for high throughput, which means that most of the requests are look-ups by primary key or other short-running queries. When performing scans across a set of tuples, only small scans are allowed, e.g., for paginating lists; large scans should be avoided altogether. Application developers have to ensure this workload during the design of the SaaS application's data model, for example, by only allowing parent-child relationships. Hamilton () presents other recommendations for an optimized workload for highly scalable services.

For enabling features like a "Dashboard", there may be the need for Ad-Hoc Reports. Those reports do lightweight analytical processing by running pre-defined reports, either synchronously or asynchronously as batch jobs. In this context, "Ad-Hoc Reports" are run on the transactional data, rather than on a dedicated data warehouse.

Typically, SaaS applications are accessed via an Internet browser. This circumstance can be exploited when choosing the DBMS's consistency level. As HTTP requests are stateless and thus do not allow for "real" cross-request sessions, the consistency level can be lowered. This way, the number of concurrent locks inside the DBMS can be reduced and thus throughput increases. However, due to the nature of HTTP requests, mechanisms like Optimistic Locking must be implemented in the application server to avoid update anomalies; a sketch follows below.

Besides the data-centric workload, the SaaS context also has a meta-data-centric workload. The DBMS has to handle ALTER TABLE statements quite often, not because of individual tenants, but due to the high number of co-located tenants. A SaaS application is updated on a regular basis by the service provider and hence may be available in multiple versions. For our workload, we assume that the majority of tenants access the most recent version of the application. As a single tenant may have extensions, either developed by himself or by an ISV, updates to the base application may break the application for that tenant until a new version of these extensions is released. To guarantee application availability, the tenant may defer these updates to the new version of the base application and the extensions, respectively. Therefore, it is possible that a few tenants lag marginally behind the most recent version of the application.
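As a concrete illustration of the optimistic-locking remark above, the following sketch rejects an update when the row version changed between read and write; the table, the column names, and the use of SQLite are assumptions chosen only to keep the example self-contained.

# Hypothetical sketch of optimistic locking with a version column.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 'Acme', 1)")

def read_account(acc_id):
    # The version is shipped to the browser together with the form data.
    return conn.execute("SELECT name, version FROM account WHERE id = ?", (acc_id,)).fetchone()

def update_account(acc_id, new_name, expected_version):
    # The UPDATE only succeeds if nobody changed the row in the meantime.
    cur = conn.execute(
        "UPDATE account SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, acc_id, expected_version))
    conn.commit()
    return cur.rowcount == 1  # False signals a lost-update conflict to the user

name, version = read_account(1)
print(update_account(1, "Acme Inc.", version))  # True: first writer wins
print(update_account(1, "Acme Ltd.", version))  # False: stale version rejected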

.. Multi-Tenant DBMS Concepts

In the following, we illustrate the concepts of our multi-tenant DBMS prototype which leverage the previously introduced workload characteristics.

Figure ..: Multi-Tenant DBMS Cluster Architecture (shared-nothing nodes Node A, Node B, and Node C host the active tenants, while inactive tenants are passivated to a common archive store and activated again on demand).

... Native Schema Flexibility

The predominant requirement for our prototype is to flexibly handle a large number of tenants while allowing for schema flexibility. Thus, one of our key concepts is Integrated Schema Flexibility, which is reflected in both the meta-data and the data handling. In the course of this chapter, we propose FlexScheme, an integrated data management model which captures the aspects of schema extensibility and schema evolution. Furthermore, it enables data sharing capabilities for cross-tenant data and meta-data, and is tightly integrated into the DBMS kernel. We postpone the discussion of native schema flexibility to Section ..

... Tenant Virtualization

Another concept of our multi-tenant DBMS is Tenant Virtualization. From a high-level perspective, we treat each individual tenant as a "Virtual Machine" which is running on a hypervisor, the multi-tenant DBMS. This approach pushes the classical virtualization idea one step further: instead of virtualization at the operating system level, we do virtualization at the tenant level inside the DBMS.

For better scalability, multiple multi-tenant database systems are combined into a shared-nothing cluster for serving the individual tenants. As with traditional virtualization techniques, the multi-tenant infrastructure must support migrating tenants between cluster nodes and suspending off-line tenants to release cluster resources. In addition to the nodes, the cluster needs a storage back-end for storing currently inactive tenants.

Figure . shows the overall architecture of our approach. The shared-nothing nodes have access to a common storage back-end which solely stores inactive tenants' data. Active tenants are mounted on nodes of the cluster and do not use the common storage back-end. Hence, a high-performance, and thus expensive, Storage Area Network (SAN) is not necessary for this storage back-end; cheap near-line storage or cloud-based storage like Amazon S3 () is sufficient.

If a tenant becomes active, a target node for this tenant is selected and the tenant's data is transferred to this data node. Once this transfer is completed, the tenant is on-line. As soon as the tenant does not perform any operations anymore, the tenant can be suspended by transferring the active data of the tenant back to the storage system, and thus the tenant becomes off-line. This way, inactive tenants do not occupy expensive cluster capacities, like main-memory and CPU cycles.

... Grid-like Technologies

The virtualization of tenants furthermore allows the usage of grid technologies, like distribution and replication, especially on low-complexity hardware. Projects like Google BigTable (Chang et al., ) use similar techniques for the same reason. BigTable serves as a highly scalable data storage which is distributed and replicated on a cluster of commodity hardware, but relinquishes the relational model. BigTable stores each datum in a specific cell which can be addressed by the dimensions row, column, and time stamp. Cells of semantically adjacent data based on the dimensions are physically co-located. Amazon Dynamo (DeCandia et al., ) and SimpleDB (Amazon SimpleDB, ) pursue similar approaches, but with weakened guarantees on data consistency. In contrast to these simple key-value stores, PNUTS (Cooper et al., ) offers a relational data model. Key requirements of PNUTS are scalability, distribution, and high availability, but with restricted access possibilities and weakened consistency compared to traditional database systems. These projects heavily rely on distribution and replication, and thus align their data models accordingly. Depending on the requirements on replication, e.g., how often data is replicated, and on data consistency, the data model and the consistency guarantees are chosen, either unrestricted as in traditional DBMSs or restricted to satisfy special application needs.

For better availability, a master/slave replication at tenant level can be used. However, two tenants whose master instances are co-located need not necessarily have co-located slave instances. During tenant activation, multiple cluster nodes are selected to host one master instance and one or more slave instances of the same tenant. As long as a tenant is suspended, any node from the cluster can be selected to host the tenant's replicas. This way a static load balancing can be achieved. For active tenants, a dynamic load balancing strategy might be to switch the master role of a particular tenant from an overloaded host to a less heavily loaded slave replica. However, this strategy restricts the target nodes for the master role to those nodes already hosting the affected tenant's replica. An enhanced strategy would be to create a new slave replica at one particular node, either by loading the tenant's data from the archive and rolling forward the transaction logs, or by snapshotting and cloning an existing slave, and, as soon as the new slave is available, switch the master role to this host. Afterwards, the number of replicas can be decreased again—if desired. This enhancement not only allows for dynamic load balancing, but also for migrating tenants seamlessly from one node to another in the cluster. Furthermore, the cluster can grow by simply adding more nodes which can be used immediately.

... Main Memory DBMS

Our prototype is heavily influenced by main-memory database systems like H-Store (Kallman et al., ) or HyPer (Kemper and Neumann, ). H-Store and HyPer take advantage of keeping all data in main-memory. We follow the approach proposed by Stonebraker et al. () for the H-Store system and use a single-threaded execution model. Thereby, locking and deadlock detection overhead is completely eliminated. This approach is very suitable for processing many short-running queries in main-memory, where performance is limited by main-memory bandwidth only. I/O operations to disk-based storage systems are not required, as durability can be achieved without disk-based storage systems by replication to physically distant server nodes. Alternatively, log records can be written to disk to ensure durability.

Main-memory DBMS technology is very suitable for multi-tenant SaaS. Jim Gray's dictum Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King (Gray, ) anticipates the trend towards main-memory based DBMSs. Plattner () shows that a single off-the-shelf server that is available today provides enough main-memory to accommodate the entire data volume of even large-sized businesses. For even larger deployments, new techniques like RAMCloud (Ousterhout et al., ) may be used. RAMCloud provides a general-purpose storage system that aggregates the main-memories of lots of commodity servers and keeps all information in main-memory. Besides the superior performance for OLTP workloads, Poess and Nambiar () show that there are other interesting factors, such as the low energy consumption of main-memory compared to disks. This further reduces the TCO.

Our prototype does not interfere with special optimizations for main-memory DBMSs, such as second-level cache optimizations. We propose special query plan operators for enabling schema flexibility. These operators rely on standard access operators, so techniques like Cache-Conscious B-Trees (Rao and Ross, ) and cache-optimized compact data representations, e.g., PAX (Ailamaki et al., ), are fully transparent to our approach. Moreover, data compression to further reduce the space requirements can be applied to our approach as well.

... Execution Model

Figure . shows the architecture of one single multi-tenant DBMS instance. One query processing thread per tenant takes requests from the tenant-specific input queue and processes these requests one by one. For each individual tenant, the H-Store-like execution model is applied, thus deadlocks and locking overhead are completely eliminated. Since there are no cross-tenant transactions, there is no need for cross-thread synchronization. Furthermore, due to this execution model, the tenant isolation is increased.

Requests by tenants can span their private data and the common data, but tenant-specific data is completely separated from common data. However, there is still no need for synchronization, as tenant-specific updates to common data are redirected to the private data segment. Common data is read-only, so locks on common data are not necessary. However, the shared data can be updated by the service provider or ISVs, for example, in the context of application upgrades. We assume that the frequency of such updates is much lower than the frequency of tenant-specific updates.

During a tenant's thread lifetime, it has to perform various I/O operations. At initialization, the thread must retrieve the tenant's data from the archive store, and at the end it must write it back. In between, the thread has to perform either logging or replication to guarantee durability. The I/O induced by logging can be significantly reduced if logging is implemented as described by Kemper and Neumann () for the HyPer project.

 Chapter . Next Generation Multi-Tenant DBMS

Shared Data Read-Only Common Data

T. T Tn Tenant-Specic Data

Processing Threads

Request Queues

Figure ..: read Model of the Multi-Tenant DBMS Instance must write them back. In between, the thread has to perform either logging or replication to guarantee durability. e I/O induced by logging can be signicantly reduced, if logging is implemented as described by Kemper and Neumann () for the HyPer project.

.. Native Support for Schema Flexibility

Instead of applying schema mappings from the outside, a multi-tenant DBMS for SaaS should offer inherent support for schemas that are flexible in the following two respects. First, it should be possible to extend the base schema to support multiple specialized variants of the application, e.g., for particular vertical industries or geographic regions. An extension may be private to an individual tenant or shared between multiple tenants. Second, it should be possible to evolve the base schema and its extensions while the database is on-line, including changing existing attributes. Evolution of a tenant-owned extension should be entirely "self-service": the service provider should not be involved, otherwise operational costs would be too high. Furthermore, it may not be acceptable for some tenants to instantly upgrade their extensions whenever the service provider updates the base schema.

Extensions form a hierarchy, which can be used to share data among tenants, such as master data and default configuration data. This hierarchy is made up of instances which combine schema and shared data. Data sharing allows co-locating even more tenants, and thus improves resource utilization and enables economy of scale. From a tenant's perspective, the shared data has to be updatable, but updates from one tenant might interfere with other tenants. Therefore, write access to shared data has to be redirected to a private storage segment of the updating tenant.

If data sharing is employed, evolution affects not only the schema but also the shared data. This can be addressed by versioning the schema and the shared data. Base and extension instances may evolve independently from each other; therefore, versioning has to be employed at the level of individual instances. This mechanism also allows tenants to stay with certain versions of instances.


Since versioning is handled at the level of individual instances, the target version number alone is not sufficient to reconstruct a relation from the instance hierarchy. Therefore, additional information about dependencies between instance versions is needed to derive a certain relation for a particular tenant. We call such a derived relation a Virtual Private Table, since this relation is specific to a particular tenant.

.. FlexScheme – Data Management Model for Flexibility

Traditional DBMSs do not offer mechanisms for flexible schemas and data sharing. Thus, we propose FlexScheme, a model for handling meta-data and shared data in a multi-tenant DBMS, which captures extensibility, evolution, and data sharing in one integrated model.

Our prototype keeps the tenants' data in main-memory, which is an expensive resource that has to be used efficiently: data that is not needed anymore has to be removed from main-memory as soon as possible. This is a big challenge, especially in the multi-tenancy context, when data is shared between tenants. FlexScheme allows identifying which versions of the instances are currently used by a given set of tenants, and which can be removed from main-memory.

... Data Structures

This section introduces the data structures of our data management model based on the example in Figure .. The upper part of the figure (Fig. .a) shows the global view of a particular relation, from which the local views for the individual tenants are derived; the lower part of the figure (Fig. .b) shows such a local view for the Tenant T. FlexScheme models schema information and data in separate data structures to allow for schema flexibility: fragments store schema information, and segments store the actual data. In the following, we introduce these data structures in detail.

Definition . (Fragment): A fragment f^(v) is a set of n typed attributes, forming a schema definition. Some of the attributes form the primary key attributes of the fragment; v denotes a version number.

During the application lifetime, the definition of a fragment may change, so we combine multiple fragments into a fragment sequence. To guarantee proper primary key access, the definition of the primary key attributes must not change across versions.

Definition . (Fragment Sequence): A fragment sequence f = ⟨f^(1), ..., f^(z)⟩ is a sequence of fragments f^(v). Within one fragment sequence f, the primary key attributes of each fragment are identical: pk(f) := pk(f^(1)) = ... = pk(f^(z)).

 Chapter . Next Generation Multi-Tenant DBMS

iBase Polymorphic Relation R fBase

iExt fExt { ( ) sExt fBase , ( )} sExt fExt

iT iT iT fT fT

{ ( ) { ( ) { ( ) sT fBase , sT fBase , sT fBase , ( )} ( )} s (f ), sT f sT fExt T Ext . T ( )} sT fT (a) Global Perspective

iBase iExt iT

() () () f f f . Base Ext T CID Name Address State OUI Contact

() ( () ) () ( ()) sExt f Base sExt f Ext () () s ( f )  Unknown Corp. A-Street CA -- T T … … … … … …  Cisco Systems B-Street CA FC-FC-FB Other Person

() ( () ) () ( ()) sT f Base sT f Ext  Xerox Corp. A-Street CA -- Some Person  MyCompany MyStreet NY XX-YY-ZZ Our Person

(b) Derived Virtual Private Table for Tenant T

Figure ..: Example of Polymorphic Relation R

 .. FlexScheme – Data Management Model for Flexibility

Consider the example of the polymorphic relation in Figure .a. As a first intuition, a polymorphic relation is the global, hierarchical view from which the tenant-specific views are derived. We introduce the concept of polymorphic relations more formally in the course of this section.

Example .: Figure .a shows the fragment sequence fBase at the top of the polymorphic relation R. Suppose fBase has two fragments, fBase^(1) and fBase^(2). Figure .b shows the definition of one of these fragment versions.

Definition . (Segment): A segment s^(v)(f^(w)) contains tuples that have the schema f^(w). Again, v and w denote version numbers.

Segments may be required in various versions; therefore, we combine multiple segments into a segment sequence. Entries in the segment sequence may be independent from each other; however, the schemas of all segments within a segment sequence have to be defined by fragments of the same fragment sequence.

Definition . (Segment Sequence): A segment sequence s(f) is a sequence of segments s^(1)(f^(i)), ..., s^(z)(f^(j)). The fragments f^(i), ..., f^(j) form part of the same fragment sequence f.

Example .: Figure .b shows a segment sExt(fBase^(v)), which is part of the segment sequence sExt(fBase). The schema of the tuples in this segment is defined by the fragment fBase^(v).

As discussed before, modern SaaS applications need extensibility. The schema definitions of the base application can be extended by ISVs or the tenants themselves, thus inheriting the definitions from the base application. In this context, we have two types of inheritance: schema inheritance, where fragment definitions are inherited, and data inheritance, where segments are inherited.

Definition . (Instance): An instance i combines fragment sequences and segment sequences. Each instance contains zero or one fragment sequences, as well as a set of segment sequences. Each segment either has a local fragment as its schema or inherits the schema definition from other instances.

 Chapter . Next Generation Multi-Tenant DBMS

Descriptively, instances contain sequences of fragments and segments for representing schema and data evolution. The dependencies between instance versions are managed separately on a per-tenant basis.

Example .: The instance iExt in Figure .a defines a local fragment sequence fExt and inherits fBase. Furthermore, the instance holds two segment sequences, sExt(fBase) and sExt(fExt), which store the tuples. sExt(fBase) corresponds to the inherited fragment sequence fBase, while sExt(fExt) corresponds to the locally defined fragment sequence.

The individual instances, together with the inheritance relationships, form a polymorphic relation. Multiple polymorphic relations form a polymorphic database.

Definition . (Polymorphic Relation): A polymorphic relation R = (I, E, r) can be represented as a rooted tree. Its vertices are the instances I, and its edges E are the inheritance relationships. Furthermore, there is one distinguished instance r ∈ I that forms the root of the polymorphic relation.

Definition . (Polymorphic Database): A polymorphic database D is a set of polymorphic relations.
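The preceding definitions can be mirrored in a small data-model sketch; the class and field names below are assumptions chosen to follow FlexScheme's terminology, with version sequences modeled as plain lists.

# Hypothetical sketch of FlexScheme's building blocks as plain data classes.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Fragment:                     # one schema version f^(v)
    version: int
    attributes: List[Tuple[str, str]]        # (name, type)
    pk: Tuple[str, ...]                      # identical across all versions

@dataclass
class Segment:                      # one data version s^(v)(f^(w))
    version: int
    fragment_version: int
    tuples: List[dict] = field(default_factory=list)

@dataclass
class Instance:                     # node of the polymorphic relation
    name: str
    fragments: List[Fragment] = field(default_factory=list)   # fragment sequence
    # segment sequences, keyed by the instance that defines the fragment
    segments: Dict[str, List[Segment]] = field(default_factory=dict)
    parent: Optional["Instance"] = None      # inheritance edge towards the root

@dataclass
class PolymorphicRelation:
    root: Instance
    leaves: Dict[str, Instance]              # tenant id -> leaf instance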

... Deriving Virtual Private Tables

The process of deriving a Virtual Private Table (VPT) for a particular tenant comprises several steps:

Step 1: At first, the instances on the path from the tenant's leaf node to the root node of the polymorphic relation have to be determined. For each instance on this path, a fragment has to be selected from the instance's fragment sequence. The selection is based on the dependency information of the tenant. The selected fragments are then concatenated; the concatenated fragments form the virtual schema of that tenant's VPT.

Example .: In Figure .a, the instances iT, iExt, and iBase are on the path of Tenant T. The selected fragments to be concatenated are the chosen versions of fBase, fExt, and fT, as seen in Figure .b.

Step 2: In a second step, a virtual segment containing the data has to be built for each fragment from the previous step. For each instance on the tenant's path to the root node, one segment has to be selected from the segment sequences of that instance. Again, the selection is based on the dependency information of the tenant. For one particular fragment there may be multiple segments available, which have to be overlaid. Since our model allows write access to shared data by redirecting the access to a tenant-specific segment, a data overlay precedence has to be defined: the lower a segment is defined in the polymorphic relation, the higher is its precedence over other segments on the path. We refer to this concept as data overriding.

The overlay of two segments s and t, where s has precedence over t, can be defined as follows; the term t ▷ s denotes the anti-join of t and s:

overlay(s, t) = s ∪ (t ∖ (t ⋉_{pk(t)=pk(s)} s)) = s ∪ (t ▷_{pk(t)=pk(s)} s)

This function is then applied iteratively from the leaf node to the root node to collapse all selected segments per fragment into one virtual segment.

Example .: In Figure .a, two segment sequences, sExt(fBase) and sT(fBase), are available to Tenant T for the fragment sequence fBase. The previous step selected one version of fBase; as seen in Figure .b, the corresponding segments from sExt(fBase) and sT(fBase) are selected. Since the segment from sT(fBase) takes precedence over the one from sExt(fBase), the tuple with CID= from sExt(fBase) is overwritten by the tuple from sT(fBase) with the same CID.
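The overlay and its iterative application can be sketched in a few lines; tuples are modeled as dictionaries, the primary key as a tuple of attribute names, and the sample values are made up. None of this reflects the prototype's physical representation.

# Hypothetical sketch of overlay(s, t): s overrides t on equal primary keys.
def overlay(s, t, pk):
    """s, t: lists of tuples (dicts); pk: tuple of key attribute names."""
    key = lambda row: tuple(row[a] for a in pk)
    overridden = {key(row) for row in s}
    # s united with the anti-join of t against s on the primary key
    return list(s) + [row for row in t if key(row) not in overridden]

def collapse(segments, pk):
    """Apply overlay iteratively from the leaf to the root of the hierarchy:
    segments[0] is the tenant's own segment, segments[-1] the root segment."""
    virtual = []
    for seg in segments:
        virtual = overlay(virtual, seg, pk)
    return virtual

shared = [{"CID": 1, "Name": "Unknown Corp."}, {"CID": 3, "Name": "Cisco Systems"}]
private = [{"CID": 1, "Name": "Xerox Corp."}, {"CID": 7, "Name": "MyCompany"}]
print(collapse([private, shared], pk=("CID",)))
# CID 1 comes from the private segment, CID 3 from the shared one, CID 7 is private only.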

Step 3: Finally, the virtual segments from the previous step have to be aligned. This is similar to vertical partitioning; therefore, well-known defragmentation techniques can be applied. For proper alignment, each tuple in any segment has to replicate the primary key value. The result of this last step is the VPT, which contains the tenant-specific data as well as the shared data, with respect to data overriding.

Example .: Figure .b shows the full VPT of Tenant T.
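Putting the three steps together, a complete VPT derivation can be sketched as follows. The instance representation, the assumption that the tenant's dependency information has already selected one fragment and one segment per instance, and all sample values are simplifications for illustration only.

# Hypothetical sketch of deriving a Virtual Private Table (VPT).
def overlay(s, t, pk):
    seen = {tuple(r[a] for a in pk) for r in s}
    return list(s) + [r for r in t if tuple(r[a] for a in pk) not in seen]

def derive_vpt(leaf, pk):
    """leaf: the tenant's instance; each instance is a dict with 'name', 'parent',
    'fragment' (the selected attribute list) and 'segments' (owner -> selected rows)."""
    # Step 1: collect the instances from the leaf to the root.
    path, node = [], leaf
    while node is not None:
        path.append(node)
        node = node["parent"]
    # The virtual schema is the concatenation of the selected fragments.
    schema = [attr for inst in reversed(path) for attr in inst["fragment"]]
    # Step 2: for each fragment owner, overlay the segments from leaf to root.
    virtual = {}
    for owner in (inst["name"] for inst in path):
        rows = []
        for inst in path:                     # leaf first => highest precedence
            if owner in inst["segments"]:
                rows = overlay(rows, inst["segments"][owner], pk)
        virtual[owner] = rows
    # Step 3: align the virtual segments on the replicated primary key.
    joined = {}
    for rows in virtual.values():
        for r in rows:
            joined.setdefault(tuple(r[a] for a in pk), {}).update(r)
    return schema, list(joined.values())

base = {"name": "base", "parent": None, "fragment": ["CID", "Name"],
        "segments": {"base": [{"CID": 1, "Name": "Unknown Corp."}]}}
ext = {"name": "ext", "parent": base, "fragment": ["OUI"],
       "segments": {"ext": [{"CID": 1, "OUI": "FC-FC-FB"}]}}
tenant = {"name": "T", "parent": ext, "fragment": ["Contact"],
          "segments": {"base": [{"CID": 1, "Name": "Xerox Corp."}],
                       "T": [{"CID": 1, "Contact": "Some Person"}]}}
print(derive_vpt(tenant, pk=("CID",)))   # the private Name overrides the shared one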

... Comparison to Other Models

FlexScheme extends the traditional relational data model with concepts known from the object-oriented world. Object-oriented systems are characterized by support for abstract data types, object identity, and inheritance, as well as polymorphism and method overloading (Khoshafian and Abnous, ; Nierstrasz, ). The object-oriented programming paradigm states that a program is a set of interacting objects telling each other what to do by sending messages. Each object has a type, and all objects of a particular type can receive the same messages (Eckel, ).

In contrast to the object-oriented model, FlexScheme does not have facilities for methods. Thus, an instance can be seen as an abstract data type, which only has attributes, together with a set of objects. The instance hierarchy is similar to a class hierarchy, but the semantics are different: in the object-oriented model, the inclusion polymorphism is specified by sub-typing, i.e., each sub-type can be treated as its super-type. However, in FlexScheme this is not the case: the inclusion polymorphism goes from the leaf node to the root node. To enable data sharing in the multi-tenancy scenario, the ability to override common data with tenant-specific data is needed. We call this concept data overriding. There is some similarity to the object-oriented concept of method overloading, but currently neither object-oriented DBMSs (OODBMSs) nor object-oriented programming languages support the functionality of data overriding. The reason may be that data overriding conflicts with object identity, as a tuple of a tenant overrides a tuple of the base relation or an extension by using the same primary key.

The object-relational model transfers the object-oriented model to relational DBMSs. For example, SQL: differentiates between type and relation (Melton, ). This is similar to the differentiation between fragments and segments in FlexScheme. Theoretically, this mechanism could be used to implement data sharing, but SQL: does not allow this: if two sub-tables reference a common super-table, changes to common attributes within the first sub-table would be reflected in the second sub-table as well.

The fundamental difference to the object-oriented and the object-relational model is that FlexScheme provides extensibility, evolution, and data sharing with data overriding in one integrated model. Schema evolution has been well studied in the relational database community (Roddick, ), as well as in the object-oriented database world (Moerkotte and Zachmann, ; Kemper and Moerkotte, ). Recent work (Curino et al., ; Moon et al., ; Curino et al., ; Moon et al., ) shows that schema evolution is still an important topic, especially in archiving scenarios or in combination with automatic information system upgrades. Schema extensibility has been studied separately in the context of OODBMSs in the form of class hierarchies (Moerkotte and Zachmann, ; Kemper and Moerkotte, ).

.. Applying FlexScheme to Multi-Tenant DBMS

Our objective is to give the knowledge on how to manage the SaaS application's data back to the database, rather than letting the application manage the data. Thus, when FlexScheme is applied to a multi-tenant DBMS, the DBMS has explicit knowledge of the tenants' Virtual Private Tables, which are dynamically derived from the polymorphic relation. FlexScheme integrates extensibility on both the schema and the data level. A multi-tenant DBMS based on FlexScheme enables sharing of meta-data as well as data between tenants by introducing fragments and segments. FlexScheme introduces versioning by the notion of fragment sequences and segment sequences. This is required because both shared schema and shared data can be part of an evolution process. The concepts for schema versioning have been extensively studied in the context of schema evolution (Curino et al., ; Moon et al., ).

We implemented a multi-tenant main-memory DBMS prototype which leverages all features of FlexScheme. The details of this implementation are discussed in the following chapters. Extensibility is realized by decomposition of relations and data overlay. For this, our prototype has specialized query plan operators for data overlay which are optimized for different access patterns on tenant-specific and shared data. Moreover, it supports updating shared data by individual tenants; write access to shared data is redirected to a private data segment per tenant, thus resulting in a "Copy-on-Write" shadow copy. The implementation of data sharing is further discussed in Chapter , including the physical data representation and the data overlay mechanism for retrieving common data. Support for schema evolution is based on Schema Modification Operators (Curino et al., ). We implemented query plan operators that allow for graceful on-line schema evolution by deferring the physical data reorganization until the first read access to a tuple. Chapter  outlines the details.



.  Data Sharing Across Tenants

For better TCO, the level of consolidation can be further increased by sharing data across tenants, like configuration variables or catalogs. However, for some of the shared data, it may be necessary to allow changes by individual tenants. In this chapter, we present the data sharing component of our multi-tenancy DBMS prototype. Based on the FlexScheme model, we allow write access to shared data by individual tenants. Our approach provides an overlay mechanism such that changes of one particular tenant to the shared data do not interfere with changes of other co-located tenants.

.. Data Overlay

Generally, shared data is physically read-only in our prototype. However, we allow logical write access to the shared data employing the following mechanism: if a tenant issues a write operation on the shared data, the actual request is performed by copying the original data from the shared segment to the private segment of the tenant. The issued write operation is then performed on the tenant's private segment. As the shared data remains read-only at all times, we avoid synchronization issues during access to the shared data.

This mechanism is inspired by techniques used in modern file systems to provide snapshots (e.g., Hitz et al., ). Some database systems offer similar mechanisms to provide read-only access to historical data, e.g., Oracle Flashback Technology. Our approach provides Copy-on-Write (CoW) semantics, where shared data that has to be updated is copied to the private segment of the tenant. In contrast to previous mechanisms which work at the page level, our approach allows per-tuple granularity, which is enabled by our main-memory-centric architecture.

We refer to this process as Data Overlay and use a special operator called the Overlay Operator to perform this action. The basic functionality of this operator is to weave two inputs, the shared data and the tenant-specific data. Both inputs have to be segments of the same fragment. Despite the support for multiple fragment versions, the logical identifier (primary key, PK) of a tuple is independent of the fragment version. One of the following two cases applies.

1. If the PK is contained in the shared input, but not in the tenant-specific input, then the tuple from the shared input is returned.

2. If the PK is contained in the tenant-specific input, this tuple is always returned, no matter whether there is a corresponding tuple in the shared input or not.

(An in-depth discussion of the physical tuple layout of our prototype can be found in Section ..)

 Chapter . Data Sharing Across Tenants

(SELECT pk, ... FROM tenant-specific)
UNION
(SELECT pk, ... FROM shared
 WHERE pk NOT IN (SELECT pk FROM tenant-specific))

Figure ..: Data Overlay in SQL

Figure ..: Query Execution Plan Example (scans of the tenant-specific and shared segments for fBase and fExt feed the OverlayBase and OverlayExt operators; their outputs are combined with the scan of the tenant's private segment sT(fT) by tuple alignment).


Figure . shows how an overlay could be formulated in a conventional relational DBMS. is implementation has two major drawbacks. First, the overlay has to be handled explic- itly, either with a view or within a middleware layer. However, this would interfere with our approach to give as much knowledge about the data as possible to the DBMS. Second, the database system has to process both selects and compute the union of the intermediate results. For the calculation of the union, the DBMS has to process both inputs entirely, which causes certain overhead. In the next section we propose our new data overlay oper- ator that is able to process this query much more efficiently by leveraging the knowledge that FlexScheme provides.

.. Accessing Overridden Data

We implemented a family of pipelining overlay operators which can be placed in the query execution plan by the query optimizer. Depending on the implementation, they support efficient full table scans or point-wise look-ups by leveraging specialized indexes. A query execution plan may contain several overlay operators, as a table of a given tenant may consist of several fragments. These operators can be chained together or may be replaced by one N-ary overlay operator that can process multiple inputs at the same time. The correct alignment of overwritten tuples is guaranteed by the PKs that are stored inside each tuple.


A very basic strategy for request processing with common data is to substitute the logical table reference in the query by a sub-plan, as shown in Figure ., that overlays and aligns the data. The query optimizer might then re-arrange operators to allow for early or late alignment.

We implemented two different types of overlay operators. The first type uses already existing indexing techniques—we refer to them as basic implementations—while the second type uses specialized data overlay indexing mechanisms.

... Basic Implementations

Basic implementations do not require special index support. For example, a Hash-Join-based implementation only uses a transient index. In the build phase, the PKs of the tenant-specific segment are hashed into a temporary hash set, and all tuples of this segment are passed through to the operator above the overlay. In the probe phase, the shared segment is opened, and only those tuples which do not have an entry in the temporary hash set are passed to the operator above. This implementation needs enough main-memory resources for the temporary hash set.

Another basic implementation without the need for a specialized index is Merge-Join-based. For this operator, the inputs have to be sorted on the primary keys. If there are tuples with identical PKs from both inputs, only the tuples from the overlay input are returned. An interesting property of this operator is that the result is implicitly sorted on the primary key.
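Both basic implementations can be sketched as pipelined generators over the two inputs; the interfaces below (plain Python iterables and a key function) are assumptions for illustration, not the prototype's operator API.

# Hypothetical sketches of the two basic overlay implementations.
def hash_overlay(tenant_rows, shared_rows, key):
    """Build phase: hash the tenant's keys and pass tenant tuples through.
    Probe phase: emit only shared tuples whose key was not seen."""
    seen = set()
    for row in tenant_rows:              # build + pass-through
        seen.add(key(row))
        yield row
    for row in shared_rows:              # probe
        if key(row) not in seen:
            yield row

def merge_overlay(tenant_rows, shared_rows, key):
    """Both inputs must be sorted on the primary key; the output is sorted too."""
    tenant_rows, shared_rows = iter(tenant_rows), iter(shared_rows)
    t, s = next(tenant_rows, None), next(shared_rows, None)
    while t is not None or s is not None:
        if s is None or (t is not None and key(t) <= key(s)):
            if s is not None and key(t) == key(s):
                s = next(shared_rows, None)   # shared tuple is overridden, skip it
            yield t
            t = next(tenant_rows, None)
        else:
            yield s
            s = next(shared_rows, None)

shared  = [{"CID": 1, "Name": "Unknown Corp."}, {"CID": 3, "Name": "Cisco Systems"}]
private = [{"CID": 1, "Name": "Xerox Corp."},  {"CID": 7, "Name": "MyCompany"}]
k = lambda r: r["CID"]
print(list(hash_overlay(private, shared, k)))
print(list(merge_overlay(sorted(private, key=k), sorted(shared, key=k), k)))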

... Special Index-Supported Variants

Special index support can be used by overlay operators to look up tuples by primary key. We implemented two different variants, both supporting pipelined execution.

Separate Indexes for Shared and Private Data

The first approach uses two separate indexes—one for the common data and one for the tenant-specific data. The index for the common data can be shared across tenants. For this first approach, two different strategies may be employed: tenant-first or common-first access.

Tenant-First Access

This strategy accesses a tenant's data before the shared data. When accessing a tuple by its key, the tenant-specific index is queried first. If an entry is found there, the tuple is retrieved immediately. If there is an index miss on the tenant-specific index, the shared index is queried for an index hit. Only if both indexes do not return a match does a tuple with the requested key not exist.

(The sub-plan shown reconstructs the VPT of Tenant T from the polymorphic relation in Figure ..)

 Chapter . Data Sharing Across Tenants

Example .: Figure .a shows data segments and indexes for two tenants; furthermore, a shared segment with an index exists. The queries of both tenants request a tuple with the same key. One tenant has overwritten the tuple, the other has not. The request of the tenant that has not overwritten the tuple first accesses its own tenant index; as this access results in a miss, the tuple has to be requested via the shared index. The request of the tenant that has overwritten the tuple is answered immediately by its tenant-specific index.
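A sketch of the tenant-first look-up; the dictionary-based indexes and list-based segments are assumptions standing in for the prototype's index structures.

# Hypothetical sketch of the tenant-first look-up strategy.
def lookup_tenant_first(key, tenant_index, tenant_segment, shared_index, shared_segment):
    """Indexes map primary keys to physical positions within their segment."""
    pos = tenant_index.get(key)
    if pos is not None:                  # hit: the tenant has overwritten the tuple
        return tenant_segment[pos]
    pos = shared_index.get(key)          # miss: fall back to the shared index
    if pos is not None:
        return shared_segment[pos]
    return None                          # key exists in neither segment

shared_segment = [{"CID": 1, "Name": "Unknown Corp."}]
shared_index = {1: 0}
tenant_segment = [{"CID": 1, "Name": "Xerox Corp."}]
tenant_index = {1: 0}
print(lookup_tenant_first(1, tenant_index, tenant_segment, shared_index, shared_segment))
print(lookup_tenant_first(1, {}, [], shared_index, shared_segment))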

Common-First Access

In contrast, the common-first strategy always looks up the key in the shared index first and then does a subsequent look-up in the tenant-specific index, if necessary. To increase look-up efficiency, we added an additional data structure per tenant that minimizes the number of secondary look-ups: a bit-set stores the information whether a tuple has been overwritten by that tenant or not. Thus, the bit-set allows skipping the second index access if the tuple is not overwritten by the tenant. The shared index is enhanced to store a bit-set position together with the physical position; thus, for each tenant, the bit-set has the same number of entries, as it is aligned with the shared index.

If a tuple with the requested key exists in the shared data, the look-up in the shared index returns its physical location in the shared data segment together with the bit-set position for the tenant index. Then the bit-set of the given tenant is queried at the position obtained from the shared index. If the bit-set has a zero at that position, the tenant has not overwritten the tuple; in this case, the tuple is fetched from the previously retrieved physical location within the shared segment. If the bit-set has a one at that position, the tuple has to be looked up using the tenant index. The look-up key for the tenant-specific index is the same as that for the shared one. If the key is not found in the shared index, there has to be a second index look-up in the tenant-specific index. Only if both indexes do not contain an entry is the key not present.

Example .: Consider Figure .b. The setup is identical to the previous example, and again both tenants request the same key. Both requests first access the shared index. The shared index look-up retrieves the position of the tuple in the shared segment, as well as the position in the tenant-specific bit-set. The bit-set of the tenant that has not overwritten the tuple has a zero at that position, so this tenant's request retrieves the tuple from the shared data segment; as the physical position of the tuple is already known, no further index access is necessary. In contrast, the tenant that has overwritten the tuple has a one in its bit-set, so a subsequent look-up in its private index must be performed to retrieve the tuple from its own data segment.
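A sketch of the common-first look-up with the per-tenant bit-set; again, the dictionary-based shared index that returns a (physical position, bit-set position) pair and the list-based segments are illustrative assumptions.

# Hypothetical sketch of the common-first look-up with a per-tenant bit-set.
def lookup_common_first(key, shared_index, shared_segment,
                        overridden_bits, tenant_index, tenant_segment):
    """shared_index maps a key to (physical position, bit-set position);
    overridden_bits[i] is True if this tenant overwrote the i-th shared tuple."""
    hit = shared_index.get(key)
    if hit is not None:
        pos, bit = hit
        if not overridden_bits[bit]:         # not overwritten: no second index access
            return shared_segment[pos]
        return tenant_segment[tenant_index[key]]   # overwritten: use the tenant index
    pos = tenant_index.get(key)              # key not shared at all: private-only tuple
    return tenant_segment[pos] if pos is not None else None

shared_segment = [{"CID": 1, "Name": "Unknown Corp."}, {"CID": 3, "Name": "Cisco Systems"}]
shared_index = {1: (0, 0), 3: (1, 1)}
tenant_segment = [{"CID": 1, "Name": "Xerox Corp."}]
tenant_index, overridden_bits = {1: 0}, [True, False]
print(lookup_common_first(1, shared_index, shared_segment,
                          overridden_bits, tenant_index, tenant_segment))  # Xerox Corp.
print(lookup_common_first(3, shared_index, shared_segment,
                          overridden_bits, tenant_index, tenant_segment))  # Cisco Systems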

Single Combo Index

The second approach employs a tenant-specific combo index which actually indexes the result of the overlay operator, and therefore contains the keys for the shared data as well as the tenant-specific data. For each key, the reference to the segment (a single bit—shared or tenant-specific) is stored together with the physical tuple identifier. When looking up a key, the operator selects either the shared or the tenant-specific segment, depending on the segment reference from the index look-up result. Since this type of index cannot be shared, it affects tenant packaging. In a polymorphic relation, combo indexes are only associated with instances at the leaf level.

Figure ..: Overlay with Separate Indexes for Shared and Private Data. (a) Tenant-First Strategy: each request probes the tenant's own index first and falls back to the shared index on a miss. (b) Common-First Strategy: each request probes the shared index first and consults the per-tenant bit-set to decide whether a second look-up in the tenant index is required.

 Chapter . Data Sharing Across Tenants

Figure ..: Overlay with Single Combo Index per Tenant

Example .: Figure . shows the usage of combo indexes for the two tenants T and T already known from the previous examples. e request is directed against the tenant- specic combo index. If there is an index hit, the index entry contains a reference to the data segment where the tuple resides. QT accesses a tuple that resides on the shared data segment, while QT accesses a tuple from the T’s private data segment.

... Data Maintenance In our prototype, the segment containing the shared data is read-only, and thus each tenant has to maintain an individual segment for storing the tenant-specific changes to the shared data. As the data spans multiple segments, data maintenance operations differ depending on whether shared data or private data is changed.

Tenant-specific Data Update Besides maintaining the actual data segments, the additional structures such as indexes and bit-sets have to be maintained. Before inserting a new tuple into the tenant-specific segment, a look-up in the shared data index must be performed to retrieve the bit-set position. If this look-up does not return a result, the new tuple does not overwrite a shared tuple and thus only the tenant-specific data segment and its index have to be updated. Otherwise, besides updating the segment and its index, the appropriate position in the bit-set has to be set.

Updating an already existing tuple requires a check whether the tuple still exists only in the shared data segment or whether it has been updated before. If the tuple has been updated previously, a corresponding entry in the tenant-specific index and the bit-set is already present, so only the private data segment must be updated. Otherwise, the tuple and its bit-set position have to be retrieved; the tuple is then modified and inserted into the tenant-specific data segment and its index, and finally the bit-set position has to be set.

For deleting a tuple, a compensation record may be necessary: if a shared tuple is to be deleted, the private segment must contain a record that marks the tuple as deleted. The procedure is similar to the update procedure. If the tuple is defined in the private segment without overwriting an existing record, it can simply be deleted from the tenant's segment and its index.
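The write path just described can be summarized in a small sketch; the interfaces are assumptions for illustration, and index rebalancing, deletes with compensation records, and error handling are omitted.

    import java.util.BitSet;

    final class TenantWriter {
        // Assumed interfaces; the shared index can return the tenant bit-set position for a key.
        interface SharedIndex { Integer bitsetPos(byte[] key); }        // null if the key is not shared
        interface WritableIndex { void put(byte[] key, long position); }
        interface WritableSegment { long append(byte[] tuple); }

        private final SharedIndex sharedIdx;
        private final WritableIndex tenantIdx;
        private final WritableSegment tenantSeg;
        private final BitSet overridden;

        TenantWriter(SharedIndex s, WritableIndex i, WritableSegment seg, BitSet bits) {
            sharedIdx = s; tenantIdx = i; tenantSeg = seg; overridden = bits;
        }

        // Insert or update a tuple in the tenant-specific segment and maintain the auxiliary structures.
        void upsert(byte[] key, byte[] tuple) {
            long pos = tenantSeg.append(tuple);          // write the private version of the tuple
            tenantIdx.put(key, pos);                     // maintain the tenant-specific index
            Integer bit = sharedIdx.bitsetPos(key);
            if (bit != null) overridden.set(bit);        // mark the shared tuple as overridden
        }
    }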

Globally Shared Data Update In our prototype, the segment containing the shared data is read-only. However, there might be situations where the shared data has to be updated globally, although we assume that this is rarely the case. Such global changes to shared data may affect the tenant-specific data structures. If a globally shared tuple gets updated, its physical location must not change; all traditional DBMSs can already handle this. If a global tuple gets deleted, it only disappears for those tenants who did not overwrite the tuple. Furthermore, if the bit-set position number is not recycled, the tenant-specific bit-sets remain valid. When inserting a tuple, it is only visible to those tenants that do not have a private tuple with the same primary key. What is more, the tenant-specific bit-sets become invalid, so they have to be recreated. For the combo index approach, all insert and delete operations on shared segments result in maintenance of all existing combo indexes.

... Secondary Indexes For some setups, it might be interesting to have secondary indexes on shared data, e.g. for performance reasons or for ensuring constraints. For these secondary indexes, the same mechanisms as discussed above can be used. However, as our prototype is main-memory-centric, it may be sufficient for some workloads to perform full table scans. Our evaluation in Chapter . shows that scanning data with the overlay operator places only a slightly higher load on the system than scanning conventional data segments.

... Overlay Hierarchies The previous sections only describe overlaying an existing shared data segment with a single tenant-specific data segment. However, there may be scenarios where this single-level overlay is not sufficient, for example, if a tenant subscribes to an extension that already overrides shared data. Thus, the overlay mechanism has to support Overlay Hierarchies by allowing multi-level overlays.

(Footnote: We assume the classic slotted-pages layout, where a tuple never changes its initially assigned physical tuple identifier.)

 Chapter . Data Sharing Across Tenants

Figure ..: Overlay Hierarchy (levels Base, Ext, and T with their segments sBase, sExt, sT, indexes, and bit-sets)

Example .: Figure . shows an excerpt of a Polymorphic Relation on the le, where a tenant T overrides data from a extension Ext that in turn overrides shared data in Base. For each level of the hierarchy, our prototype maintains an index (the triangles in the gure), the segment, and bit-sets (densely printed) for invalidating tuples from upper segments, e.g., at level Ext one bit-set for the Base level is present. As the bit-sets affect above-laying segments, they have been duplicated (lightly printed) and raised in the gure for better readability. For example, the marked bit-set-entry at level T invalidates the tuple  in level Ext.

Within a hierarchy, the lowest level overrides the data from all upper levels, thus the overlay hierarchy has to be evaluated top-down. For example, let the hierarchy be Base − Ext − T; then the hierarchy is evaluated as follows:

overlay (T, overlay (Ext, Base))

This top-down evaluation of the overlay hierarchy may be appropriate if the overlay of Ext with Base has a lower cardinality than T, i.e. the local changes are dominant within the hierarchy. However, there may be situations where a bottom-up evaluation is the better choice, e.g., if the globally shared data is predominant in the hierarchy and thus the overlay of T with Ext has a lower cardinality than the global data.


As the overlay operator is associative, any overlay hierarchy can be evaluated either top-down or bottom-up. Thus the above hierarchy from the example can also be evaluated as follows:

overlay (overlay (T, Ext) , Base)

Impact of Overlay Hierarchies on Data Structures The associativity of the logical overlay operator affects those physical implementations of the operator which maintain external structures for performance reasons. The changes affect the combo indexes and the bit-sets.

Combo Index e combo indexes are indexing the result of the overlay across the whole hierarchy. us, each index entry has to refer to the appropriate level within the hierarchy where the referenced tuple is actually stored. Write operations to the leaf associated with the combo index result in index maintenance operations; write operations to upper levels render all dependent combo indexes invalid. Either the index has to be rebuilt or the Multi- Tenant DBMS has to offer mechanisms to update multiple dependent combo indexes if shared data changes.

Bit-Sets As Figure . shows, each segment has one bit-set per inherited segment. Thus, when writing tenant-specific data, the write mechanism has to determine whether (1) a new tuple is inserted, or (2) an existing tuple is overwritten, and if so, at which level of the hierarchy the tuple has been defined before. Modifying the bit-set associated with the definition level of the tuple allows for bottom-up as well as top-down overlay, as described in the next section.

Example .: Consider Figure .. Tenant T wants to update the tuple with key . Right before this update, a query for the key  would return the value from the segment sExt, at whose level the tuple  has been dened before. us, the update mechanism has to insert the new value with key  at segment sT and modify the appropriate bit in the tenants bit-set associated with the segment sExt.

Lookup and Scan When using overlay operators without index support, such as the merge-based overlay, no specialization is necessary. The placement of these operators in query execution plans is similar to that of join operators, so they result in either deep trees or bushy trees. Furthermore, the associativity of the overlay operator can be exploited. As the schemas of the individual inputs and of the interim results are identical, one possible optimization is the introduction of N-ary overlay operators that allow multiple overlays for one base segment, as sketched below.
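A merge-based N-ary overlay can be pictured as follows; this is a simplified, map-based sketch under assumed types, whereas a real operator would merge the sorted inputs in a pipelined fashion.

    import java.util.*;

    final class NaryOverlay {
        record Row(String key, byte[] value) {}

        // N-ary overlay of key-sorted inputs: 'levels' are ordered from the base (index 0)
        // to the most specific level (last index); rows from more specific levels override earlier ones.
        static Collection<Row> overlay(List<List<Row>> levels) {
            TreeMap<String, Row> result = new TreeMap<>();
            for (List<Row> level : levels) {        // base first, most specific last
                for (Row r : level) {
                    result.put(r.key(), r);         // later (more specific) levels override
                }
            }
            return result.values();                 // still sorted by key
        }
    }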

(Footnote: The proof can be found in Section A.)

 Chapter . Data Sharing Across Tenants

As the combo index acts as a tenant-specific index over the result of the overlay across the hierarchy, the only difference to two-level overlays is that the combo index has to distinguish multiple overlay levels.

Due to multi-level overlays, the tenant-first and the common-first access strategies become the bottom-up and the top-down strategies, respectively. With the bottom-up strategy, point-wise look-ups do not need bit-set support, and thus the bit-sets are used for scans only. With the top-down strategy, the bit-sets are also necessary for point-wise look-ups. For filtering out the overwritten tuples from upper levels in the hierarchy, all bit-sets across the hierarchy that correspond to the same segment are bit-wise ORed to form one single bit-set. This single bit-set is then applied to the segment, thus filtering out tuples that are overwritten at lower levels in the hierarchy.

Example .: Again, consider Figure .. For ltering overwritten tuples from segment sBase, the appropriate bit-sets at Level sExt and sT are bit-wise ORed. e resulting bit-set is then used to lter the tuples with keys , , and .

This mechanism collapses multiple bit-sets for one particular segment. The resulting bit-sets are then applied to the corresponding segments. Finally, the result of the overlay can be retrieved by combining the filtered segments across the hierarchy, either bottom-up or top-down.
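Collapsing the bit-sets of one segment is a plain bit-wise OR, as in the following sketch (using java.util.BitSet; names are illustrative).

    import java.util.BitSet;
    import java.util.List;

    final class BitSetCollapse {
        // Collapse all bit-sets that refer to the same segment (one per lower hierarchy level)
        // into a single filter: a set bit marks a tuple overridden at some lower level.
        static BitSet collapse(List<BitSet> bitSetsForSegment) {
            BitSet filter = new BitSet();
            for (BitSet bs : bitSetsForSegment) {
                filter.or(bs);                      // bit-wise OR across the hierarchy
            }
            return filter;
        }
    }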

... Comparison to Other Delta Mechanisms Very similar to the overlay mechanism described above are the mechanisms used in read-optimized database systems. Those systems offer a read-optimized store in which the data is specially organized to exploit performance optimizations. A very popular example of such a specialized organization are column-oriented databases, where each column of a table is stored in its own container. Tuples are reconstructed by aligning the columns of one particular table, which is done using offsets: as each column is sorted by the same sort key, a particular tuple can be identified by its offset. In such column databases, scans and aggregations across one single column can be processed very fast, exploiting the low number of cache misses (Boncz, ). However, as updates and inserts have to be performed in compliance with the organization of the table, such operations become costly. Write operations are therefore collected in a write-optimized store and are regularly propagated into the read-optimized store, thus transforming many single write operations into one batch operation. Representative implementations of this mechanism are C-Store (Stonebraker et al., ) and its commercial offspring Vertica (Vertica Systems, ). As data in these setups spans two different storage areas, a query has to access both of them and merge them on-the-fly to guarantee up-to-date results.

For managing the write-optimized store, the so-called delta, there are two strategies: the value-based delta and the positional delta. In both setups, the delta keeps track of which tuples were inserted, deleted, or modified.


The value-based delta uses a RAM-resident data structure whose sorting is based on the sort key of the underlying read-optimized store. Thus, when merging the delta with the read-optimized store, the tuples of the two storage areas are aligned by the common sort key attributes. Typical implementations of value-based deltas are RAM-friendly B-Trees; they can be merged efficiently into the read-optimized store by simply scanning the leaves of the B-Tree. Value-based deltas have a major drawback when used in read-mostly databases: for queries, the delta has to be transparently merged into the read-optimized store. However, if the sort key spans multiple attributes, the scan has to include all of them, which may reduce scan performance, e.g. if only a single column is requested. Other techniques follow the rationale of Log-Structured Merge Trees (O'Neil et al., ), which also transform random write operations into batch operations.

Positional deltas align the tuples by the position of the tuple instead of the sort key. Héman et al. () propose a variant called Positional Delta Trees (PDT). The PDT contains the same kind of differences as the value-based delta, but the data structure has been optimized for a fast merge operation by providing the actual tuple positions of the read-optimized store where the differences have to be applied. Thus, scans on the read-optimized store can count down to the next position where a differential update has to be handled. This way, the scan is independent of the sort key used for organizing the read-optimized store.

Not only column-oriented database systems make use of deltas; there are also scenarios outside the column-oriented world where deltas become reasonable. RDF-3X (Neumann and Weikum, ) is a specialized database system for RDF data. As it heavily relies on indexing, a single update would incur write access to  compressed indexes, which renders direct updates unattractive. Thus, RDF-3X uses a staging architecture, where write accesses to the RDF store are redirected to differential indexes that only contain the write operations. These differential indexes form the delta of the RDF-3X system and are regularly merged into the RDF store, thus collapsing individual write accesses into one big batch update operation. This procedure lowers the write costs; however, the differential deltas have to be merged on-the-fly by the query processor during queries. Therefore, the query processor integrates additional merge joins into the query execution plan.

All the described systems merge the contents of the delta into the main store, either periodically or as soon as a certain threshold has been reached. This behavior is the main difference between their approach and ours: as globally shared data (i.e. "the main store") and the tenants' data (i.e. "the delta") must remain completely separated, there is no merging across the overlay hierarchy. Furthermore, all of the above systems have been designed for scan-intensive OLAP applications and are not well-suited as transactional systems. To increase the overall performance in OLAP systems, scan performance is clearly the critical factor. However, this is not the case in our prototype, where OLTP-style look-ups and updates make up the majority of a typical SaaS workload. Thus, our approach makes use of value-based deltas, in combination with specialized indexes to support fast look-ups.

 Chapter . Data Sharing Across Tenants

.. Shared Data Versioning

As our Multi-Tenant DBMS prototype allows for schema flexibility, the data sharing mechanisms have to be tightly integrated with the extensibility and evolution mechanisms. FlexScheme allows for schema evolution by keeping multiple schema definitions within one fragment sequence. Furthermore, within one segment sequence multiple versions of the data can exist to allow seamless application evolution. These mechanisms are useful when upgrading the SaaS application to a new revision. Most service providers schedule a downtime on a regular basis during which the application is upgraded; after that downtime, all tenants use the new application revision. However, this may not be desirable in every case: some tenants heavily rely on extensions developed by themselves which require a particular application revision. Those tenants may be willing to pay an increased subscription fee if they can stay on a certain application revision, or at least defer the upgrade. As new application revisions may also lead to updates of the globally shared data, the data sharing mechanism must be able to handle not-yet-updated tenants as well. We thus extend the data sharing component of our Multi-Tenant DBMS prototype with a shared data versioning component which is designed to exploit a typical SaaS application's life-cycle: a SaaS application is updated on a regular basis by the service provider. As a single tenant may have extensions, either developed by himself or by an ISV, those updates may break the application for that tenant. To guarantee application availability, the tenant may defer the update to the new version of the base application and the extensions, respectively. Therefore it is possible that a few tenants lag marginally behind the most recent version of the application. However, we assume that the majority of tenants uses the most recent version of the application.

... Pruning of Unused Versions FlexScheme keeps track of fragment and segment usage. Fragments and segments that are no longer accessed by any tenant can be pruned, thus improving space efficiency. This is very important, especially for main-memory DBMSs, so pruning should be done as soon and as often as possible, but without affecting the tenants' performance. Pruning a segment from a segment sequence erases the segment; we refer to this operation as the Purge operation. A purge operation can occur at any time, as it may be a direct result of a particular tenant's actions, for example, if the tenant is the last remaining user of a specific segment and migrates to a subsequent version within the segment sequence. Thus, the purge operation has to be a light-weight operation, so that it can be performed whenever needed without affecting co-located tenants.

... Tenant-specific Reconciliation As tenants can freely decide whether they immediately follow application upgrades or postpone them, the application has to provide routines for individually pulling global changes into a tenant's workspace. This leads to three different scenarios that have to be handled:


1. A tenant wants to update immediately to the newest application revision, including all dependent shared data.

2. A tenant wants to defer the application upgrade and temporarily stays with its current application revision, including its dependent shared data.

3. A tenant does not want to upgrade the application at all and wants to stay with its current application revision, including the dependent shared data.

e scenarios () and () from above are very closely related, as tenants—even if the tenant decided to stick at some particular application revision—may want to roll forward to the most recent version aer they kept their old revision for a very long time. As we allow these different scenarios, there is no need for special treatment of off-line tenants. However, depending on the above scenarios, there may be synchronization conicts during application upgrade if a tenant made changes to global data.

Example .: Suppose the initial segment of the Base segment sequence contains two tuples. Tenant T's overlay segment contains two entries: one overriding one of the shared tuples and one adding a new tuple with a key that is not present in the shared segment. Now the global Base segment gets updated by the service provider by adding a new segment which contains an additional tuple with exactly that key. If tenant T moves from the initial Base segment to the new one, the tuple with this key has conflicting changes which have to be resolved.

Conflict resolution is highly application dependent, thus the Multi-Tenant DBMS cannot reconcile autonomously, but has to provide mechanisms to facilitate reconciliation. Typically, the reconciliation is based on a three-way merge algorithm, as frequently used by revision control systems like Subversion (SVN). Such a merge algorithm takes the shared data segment s_Base^(N)(f) in revision N, the tenant-specific changes s_T^(N)(f), and the new shared segment s_Base^(N+1)(f). The merge algorithm then creates a new tenant-specific overlay segment s_T^(N+1)(f) which is based on the new shared data segment. A three-way merge can recognize conflicting changes; the detected conflicts are then used as input for application-specific conflict resolution mechanisms inside the SaaS application.
In the Mobile DBMS world, such reconciliation processes occur frequently. Saito and Shapiro () state that optimistic replication allows for efficient data sharing in mobile environments and wide-area networks. In contrast to pessimistic algorithms that do synchronous updates, optimistic algorithms let data be accessed without a priori synchronization. Optimistic replication allows sites and users to remain autonomous. However, as optimistic replication is based on asynchronous operations, consistency may suffer in the form of diverging replicas and conflicts between concurrent operations. One of the prominent usage scenarios for optimistic replication is Revision Control Management, like SVN. Optimistic replication can be transferred to our usage scenario: in the shared data setup,

 Chapter . Data Sharing Across Tenants the shared data segment forms the main site, whereas the tenants’ segments form remote sites. When the main site gets updated, there may be conicting concurrent operations. For shipping data between two replicas, optimistic replication has the notion of state transfers and operation transfers. If two replicas are synchronized via state transfers, only the nal values for particular keys are shipped and then compared. us, the reconciliation process can solely decide on the values of the entries which of the possible alternatives is selected as the new value.

Example .: Suppose there are two replicas D and D of a database. In the initial state, the value  has been stored for key k. e user of D increased the value of key k to . During reconciliation, the state information D ∶ k ←  and D ∶ k ←  has to be merged.

Operation transfers do not ship the final value for a particular key, but rather the operations that have led to the new value. This way, the reconciliation process can leverage the semantics of the operations to find a reconciled value.

Example .: Let the initial setup be as in the previous example, and let the user of D perform the identical operation as before. During reconciliation, the operation informa- tion D ∶ k ← k +  and D ∶ k ← k has to be merged.

Our prototype makes use of both transfer methods. For fragment sequences, we exclusively use operation transfers to connect subsequent fragment versions. For the segment sequences, we offer different physical data representations that provide either state or operation transfers. An in-depth discussion of the physical representations can be found in the following section.
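Referring back to the three-way merge described above, the following sketch shows a key-level merge that separates non-conflicting from conflicting changes. The map-based representation of segments and the conflict handling are assumptions for illustration; the prototype's actual reconciliation interface is not shown here.

    import java.util.*;

    final class ThreeWayMerge {
        // Result: the reconciled tenant overlay plus the keys that need application-level resolution.
        record MergeResult(Map<String, byte[]> newOverlay, Set<String> conflicts) {}

        static MergeResult merge(Map<String, byte[]> sharedOld,      // s_Base^(N)
                                 Map<String, byte[]> tenantOverlay,  // s_T^(N)
                                 Map<String, byte[]> sharedNew) {    // s_Base^(N+1)
            Map<String, byte[]> overlay = new HashMap<>();
            Set<String> conflicts = new HashSet<>();
            for (Map.Entry<String, byte[]> t : tenantOverlay.entrySet()) {
                String k = t.getKey();
                byte[] tenantVal = t.getValue();
                byte[] oldVal = sharedOld.get(k);
                byte[] newVal = sharedNew.get(k);
                boolean sharedChanged = !Arrays.equals(oldVal, newVal);
                if (!sharedChanged) {
                    overlay.put(k, tenantVal);                       // keep the tenant's override
                } else if (!Arrays.equals(tenantVal, newVal)) {
                    conflicts.add(k);                                // both sides changed differently
                    overlay.put(k, tenantVal);                       // keep the tenant value until resolved
                }                                                    // identical change: override becomes obsolete
            }
            return new MergeResult(overlay, conflicts);
        }
    }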

.. Physical Data Organization

In this section, we compare different physical data organization schemes. We discuss the data structures, the access behavior of point-wise look-ups and scans, the maintenance operations on FlexScheme's segment sequences, as well as the space efficiency of these approaches. The maintenance operations on segment sequences, such as creating or deleting segments, should be lightweight operations to avoid performance impacts. A new segment is based on the last segment of a segment sequence; changes to the new segment become available only after the new segment is released. This way new segments can be prepared without affecting co-located tenants. Purging a segment is only possible if no tenant depends on it any more.
This section covers versioning of data. Data versioning is only useful for shared data, i.e. at base or extension level, although there may be circumstances where data versioning is applicable for tenant-specific data as well. The approaches presented in this section are primarily optimized for shared data versioning.


.. . . .

( : a) ( : a) ( : a’) ( : a’) ( : b) ( : b) ( : c) ( : c’) ( : c”) ( : c”) ( : d) ( : d’) ( : e) ( : e) ( : e’)

Figure ..: Snapshot Approach

... Snapshot Approach

The simplest data organization scheme for segment sequences is to materialize each segment. We refer to this approach as the snapshot data organization scheme. Figure . shows the snapshot data organization scheme for one segment sequence. In the example, there are four materialized segments which are all currently used by tenants.

Data Structures Each materialized segment is stored within a separate storage area which contains the data tuples and an index on the primary key of the corresponding fragment. In the following, we refer to this primary key as the tuple identifier.

Point-Wise Access Data tuples of a materialized segment can be accessed point-wise by looking up the tuple identifier in the index of the materialized segment.

Scan Access An index scan can be used to scan the data tuples of a materialized segment sorted by tuple identifier.

Creation of New Segments A new segment in a segment sequence is created by copying the data area and its index from the last segment in the segment sequence. The newly created segment is then available for modifications.

Purge Operation A materialized segment can be purged by deleting the corresponding storage area and its index. Other segments are not affected by this operation, as they are stored in different storage areas. Therefore the purge operation is a lightweight operation in the snapshot data organization scheme.

Space Efficiency The big disadvantage of the snapshot approach is that redundancy between subsequent segments is not eliminated. There is a high potential for redundancy across segments within a segment sequence if there are only few changes from one segment to the next. Orthogonally, value-based compression can be applied to single segments to further reduce space requirements.

 Chapter . Data Sharing Across Tenants

D : a .. . . . D : c D : d ( : D) ( : D) ( : D) ( : D) D : e ( : D) ( : D) D : b ( : D) ( : D) ( : D) ( : D) D : c’ ( : D) ( : D) D : a’ ( : D) ( : D) ( : D) D : c” D : d’ D : e’ Dictionary

Figure ..: Dictionary Approach

... Dictionary Approach The snapshot data organization scheme can be refined to eliminate redundancy across segments of the same segment sequence. The dictionary approach has one independent data area per segment to store the segment index and a separate storage area for the dictionary of the whole segment sequence. Tuple values are stored in the dictionary; the segment indexes then only reference the tuple values of the particular segment. Figure . shows the dictionary data organization scheme.

Data Structures Besides the dictionary, each segment sequence has one index per segment. The indexes reference the values in the dictionary. A reference counter for each entry in the dictionary keeps track of how often an entry is referenced by segment indexes.

Point-Wise Access The data tuples of a segment can be accessed point-wise by looking up the tuple identifier in the index and then retrieving the value from the dictionary.

Scan Access For a scan across a segment, an index scan across the segment's index has to be performed. For each entry in the index, the corresponding value from the dictionary needs to be retrieved.

Creation of New Segments When creating a new segment, the index of the last segment has to be copied. Then, the new index has to be scanned and, for all entries in the index, the reference counter in the dictionary has to be increased.

Purge Operation The purge operation has to delete the segment index and has to remove obsolete values from the dictionary. For all tuples referenced in the index to be purged, the reference counters of those tuples must be decreased; only then can the segment index be deleted. If a reference counter reaches zero, the tuple can be completely removed from the dictionary. Since the purge operation modifies a shared data structure, which may affect performance, we consider this operation heavyweight.
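A sketch of the reference-counted purge is given below; the in-memory maps stand in for the segment index and the dictionary and are assumptions for illustration only.

    import java.util.Map;

    final class DictionarySegment {
        // Assumed dictionary entry: the tuple value plus a reference counter.
        static final class Entry {
            byte[] value;
            int refCount;
        }

        // Purge one segment: release its dictionary references, then drop the segment index.
        static void purge(Map<String, Long> segmentIndex,     // tuple identifier -> dictionary key
                          Map<Long, Entry> dictionary) {
            for (long dictKey : segmentIndex.values()) {
                Entry e = dictionary.get(dictKey);
                if (e != null && --e.refCount == 0) {
                    dictionary.remove(dictKey);               // last reference gone: remove the value
                }
            }
            segmentIndex.clear();                             // finally, delete the segment index
        }
    }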


Figure ..: Temporal Approach

Space Efficiency With this approach, the redundancy in the data areas of the snapshot approach is eliminated by introducing a shared dictionary. The size of the indexes remains unchanged, since the indexes of the dictionary approach are organized similarly to the indexes of the snapshot approach. Further compression of the dictionary is orthogonal.

... Temporal Approach Temporal databases (Stonebraker, ; Jensen and Snodgrass, , ) can manage data in many different versions, which makes them an interesting candidate for managing a segment sequence. The interesting feature of temporal databases is that only those versions of a tuple are stored in which the value of the tuple changes; nevertheless, the temporal database allows any version of a tuple to be queried. When a segment sequence is stored in a temporal database, a segment corresponds to a specific version number. We refer to this approach as the temporal data organization scheme. Figure . shows that approach.

Data Structures Temporal databases typically handle the versioning with a specialized index, the temporal index. The entries of such an index are augmented with versioning information which can consist of, for example, validity timestamps or version numbers; our setup uses the latter. In comparison to the dictionary approach, there is only one common index, the temporal index, instead of one index per segment. Figure . shows a simplified temporal index, since the data tuples are stored in the leaf nodes of the temporal index rather than in the storage area. In our implementation the version numbers are stored bit-wise inverted; this is useful when the temporal index is queried for a particular key without specifying the version information. This way the index can perform a lower-bound access to retrieve the most recent version of the key.

Point-Wise Access The tuples of a segment can be accessed point-wise by performing a lower-bound look-up of the key. The version with the highest version number smaller than or equal to the desired version is returned.
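The effect of storing version numbers bit-wise inverted can be illustrated with a TreeMap-based sketch; this is an assumption-laden stand-in for the prototype's temporal index, not its actual implementation.

    import java.util.Map;
    import java.util.TreeMap;

    final class TemporalIndexSketch {
        // Composite key: tuple identifier plus the bit-wise inverted version number.
        record VersionedKey(String id, long invertedVersion) implements Comparable<VersionedKey> {
            public int compareTo(VersionedKey o) {
                int c = id.compareTo(o.id);
                return c != 0 ? c : Long.compareUnsigned(invertedVersion, o.invertedVersion);
            }
        }

        private final TreeMap<VersionedKey, byte[]> index = new TreeMap<>();

        void put(String id, long version, byte[] value) {
            index.put(new VersionedKey(id, ~version), value);    // store the version bit-wise inverted
        }

        // Return the value of 'id' valid in 'version': the highest stored version <= the requested one.
        byte[] lookup(String id, long version) {
            Map.Entry<VersionedKey, byte[]> e = index.ceilingEntry(new VersionedKey(id, ~version));
            return (e != null && e.getKey().id().equals(id)) ? e.getValue() : null;
        }
    }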

Scan Access Temporal databases support scanning all tuples of a version sorted by tuple identifier. As each segment corresponds to a specific version number, all tuples of a segment can be scanned. However, to perform this operation, temporal databases have to scan the whole temporal index, which may be large, as it contains entries for all versions of all tuples in which the value of a tuple changes.

 Chapter . Data Sharing Across Tenants

.. . . .

( : –,a) ( : –,b) ( : a,a’) ( : b,–) ( : –,c) ( : c,c’) ( : c’,c”) ( : –,d’) ( : –,d) ( : d,–) ( : e,–) ( : –,e’) ( : –,e)

PURGE

( : a’) ( : c”) ( : a,a’) ( : d’) ( : –,b) ( : e’) ( : c,c”) ( : d,–) ( : e,–)

Figure ..: Differential Delta Data Approach

Creation of New Segments For the creation of new segments, only the internal information referring to the most recent version number has to be updated.

Purge Operation In contrast to the dictionary approach, the data of all segments is stored within one common temporal database. The purge operation has to remove all tuples of a given version from the temporal database. For a given version of a tuple, the temporal database may not have an entry; this occurs if the tuple has not been changed in this version, and in this case nothing has to be done to purge the tuple from the temporal database. However, if the temporal database contains an entry for the given version of a tuple, then this entry cannot be removed right away, because it may be required for other versions of the tuple. Instead, the entry has to be replaced by a new entry for the next higher or next lower version of the implicitly stored tuple. This process is performed on a common data structure and thus makes the purge operation heavyweight.

Space Efficiency As only those versions of a tuple are stored in which the value of the tuple changes, the temporal database eliminates a lot of redundancy. For a further reduction of space requirements, the data area containing the actual values can be compressed, which is orthogonal.

... Differential Delta Approach In the differential delta approach, the differences between two subsequent segments of a segment sequence are stored in one storage area. We refer to these differences between two segments as the delta. As the deltas of all preceding segments have to be overlaid to restore the tuples of a given version, it is inefficient to perform operations on arbitrary segments of a segment sequence with deltas only. As discussed above, we assume that the major part of the workload is performed on the last segment of a sequence, and hence we materialize the last segment. We refer to this approach as the differential delta data organization scheme. Figure . shows this approach.

Data Structures e delta corresponding to a segment of a segment sequence is the set of differences between the given segment and the preceding segment. Each delta is stored as a B+-Tree. Each entry has the primary key of the affected tuple as key and a diff entry as value. A diff entry stores two values of the affected tuple, the values before and aer the change. erefore, the differential delta approach has the interesting property that a delta can be used in both directions, forward and backward. We refer to this as random direction property. e last segment in the segment sequence is materialized. e data is stored in a separate data area and indexed by a separate index.

Point-Wise Access Point-wise access behavior depends on the queried version. If the ma- terialized segment is queried, the differential delta approach behaves like the snapshot ap- proach. If other segments are queried, there may be a series of look-ups in the materialized version as well as in deltas: in case the delta corresponding to the wanted segment contains a diff entry for the given key, the aer value of that diff entry represents the value of the tuple. Otherwise, the preceding deltas must be iteratively queried to nd the correct value. e iteration terminates as soon as a diff entry is found, or if the rst segment of the seg- ment sequence has been reached. If a diff entry is found, the aer value contains the tuple data. Alternatively, the succeeding deltas can be iteratively queried. In this case, the itera- tion terminates at the end of the sequence or as soon a diff entry has been found. e tuple value is then retrieved from the before value of the found diff entry.

Scan Access Scans can be performed by overlaying the deltas and the materialization of the last segment. is overlay operation can be implemented efficiently with an N-ary merge join.

Creation of New Segments For creation of a new segment, a new B+-Tree has to be gen- erated. We use the random direction property of the differential delta approach to prepare a new delta in the newly allocated tree. As soon as the new segment is released, the new delta is overlaid with the currently available materialization to form a new materialization. e old materialization is then removed. is removal is lightweight, since the deltas do not depend on a particular materialization.

Purge Operation When purging a segment, it is not sufficient to simply delete the affected delta. Rather, this delta has to be merged into the delta of the succeeding segment in the segment sequence. e actual purge operation is based on a merge join. If only one of the deltas contains an entry with the given key, the diff entry is simply copied. Otherwise, if both deltas contain an entry with the given key, a new diff entry is created with the before value of the affected segment and the aer value of the subsequent segment. An example of a purge operation is shown in Figure .. Since this can be performed without affecting shared data structures, we consider this a lightweight operation.
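A sketch of this merge, with deltas held in TreeMaps and a minimal Diff record, is shown below; the types are illustrative assumptions, and deletions are represented by null values.

    import java.util.TreeMap;

    final class DiffDeltaPurge {
        // A diff entry stores the value before and after the change (null = tuple absent).
        record Diff(byte[] before, byte[] after) {}

        // Purge: merge the delta of the purged segment into the delta of its successor.
        static TreeMap<String, Diff> mergeForPurge(TreeMap<String, Diff> purged,
                                                   TreeMap<String, Diff> successor) {
            TreeMap<String, Diff> result = new TreeMap<>(successor);    // successor-only entries are copied
            purged.forEach((key, d) -> {
                Diff s = successor.get(key);
                // keep the purged delta's before image and, if present, the successor's after image
                result.put(key, s == null ? d : new Diff(d.before(), s.after()));
            });
            return result;
        }
    }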

 Chapter . Data Sharing Across Tenants

.. . . .

( : +a) ( : +c) ( : +b) ( : a⊕a’) ( : –b) ( : +d) ( : c⊕c’) ( : c’⊕c”) ( : +d’) ( : +e) ( : –d) ( : –e) ( : +e’)

( : a’) ( : c”) ( : d’) ( : e’)

Figure ..: XOR Delta Approach

Space Efficiency e differential delta data organization approach eliminates some re- dundancy compared to the snapshot approach, as only the differences between subsequent segments are stored. However, the diff entries may still contain redundancy if only a small part of the tuple’s data changed. is can be addressed by employing an XOR encoding scheme (see next section). Compression of a particular delta to further reduce space re- quirements is orthogonal. Space consumption can be reduced even further by eliminating redundancy across deltas of the same segment sequence. To eliminate this redundancy some form of common data structure between segments would be required, e.g. a dic- tionary. With such a common data structure, the data that corresponds to one segment is spread all over the common data structure. is turns operations on all data of one segment into heavyweight operations, especially the purge operation. Since this is not desirable, we do not eliminate this redundancy.

... XOR Delta Approach The XOR delta approach is based on the previously discussed differential delta data organization scheme. That approach enables lightweight maintenance operations for segment sequences; however, within a diff entry there is still potential for eliminating redundancy. The XOR delta approach addresses this issue by XOR-encoding (Cha and Song, ) the diff entries of the differential delta approach. We refer to this optimization as the XOR delta approach. Figure . shows the XOR delta data organization scheme. The random direction property of the differential delta approach is retained in the XOR delta approach; we make use of this property when creating, purging, or querying segments.

Data Structures e XOR delta data organization scheme is an enhancement of the dif- ferential delta data organization scheme. e only difference is the implementation of diff entries. Instead of storing the before and the aer value of the tuples in the diff entry, the XOR-encoded value before ⊕ aer is stored. Furthermore, each XOR entry denotes either an insert, an update, or a delete operation. Again, the last segment of a segment sequence is materialized.


Algorithm  Lookup tuple with key k in version ver : V ← nearest materialized version to ver : if ver ≤ V then : data ← retrieve data for key k from materialization V {data might be NUL} : for i = V to ver +  do : xor ← XOR entry for key k from delta (i −  ↔ i) : if xor is update then : data ← data ⊕ xor : else if xor is delete then : data ← xor : else if xor is insert then : data ← NUL : else : continue {No XOR entry found, do nothing} : end if : end for : return data : else : [...] {Symmetric to lines  to } : end if

Point-Wise Access Algorithm  describes the point-wise access by primary key, which is closely related to the point-wise access of the differential delta approach. Since the last segment of a segment sequence is materialized, this segment is used as the starting point. For each delta between the target version and the materialized version, the XOR entries are retrieved; the result is then calculated by XOR-ing those entries with the value from the materialized segment.

Example .: To access the tuple with PK  in version ., cf. Figure ., the value c′′ of the tuple with PK  at the materialized version . has to be XOR-ed with the value (c′ ⊕c′′) of XOR Entry  from Delta (. → .). Delta (. → .) has no matching entry for the PK , so it can be skipped. Since XOR is associative and commutative, the result of this operation is c′′ ⊕ (c′ ⊕ c′′) = (c′′ ⊕ c′′) ⊕ c′ =  ⊕ c′ = c′.

As in the differential delta approach, the materialization of the last segment increases performance. Otherwise, the tuple value would have to be reconstructed starting at the delta of the first segment in the segment sequence, which would be very inefficient for our SaaS workload. For better performance, segments other than the last one in the segment sequence can be materialized as well. Then the random direction property can be exploited for point-wise look-ups by choosing the materialization closest to the desired segment as the starting point for the reconstruction of the tuple value. Algorithm  already reflects this optimization, as it allows the reconstruction to start at the nearest materialized version.

 Chapter . Data Sharing Across Tenants

Scan Access As in the differential delta data organization scheme, scans are performed by overlaying the deltas with the materialization using an N-ary merge join.

Creation of New Segments e creation of new segments is identical to the differential delta approach.

Purge Operation As in the differential delta data organization scheme, the purge operation involves merging the two deltas that correspond to subsequent segments in the segment sequence. Two XOR entries can be merged by XOR-ing the XOR values of the two entries. Suppose the first XOR entry stores a ⊕ b and the second XOR entry stores b ⊕ c, where a is the value of the affected tuple before the first change, b the value after the first change and thus before the second change, and c the value after the second change. Then the merged XOR entry stores the XOR value (a ⊕ b) ⊕ (b ⊕ c). Since the XOR operation is associative and commutative, this results in (a ⊕ b) ⊕ (b ⊕ c) = a ⊕ (c ⊕ (b ⊕ b)) = a ⊕ (c ⊕ 0) = a ⊕ c. The XOR encoding of the entries has no impact on the complexity of the purge operation; therefore, the purge operation remains lightweight.
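Both the backward reconstruction and the merge of two XOR entries reduce to a byte-wise XOR, as in this sketch; the values are assumed to be padded to a common length.

    final class XorDeltaOps {
        // XOR two byte arrays of equal length (values are assumed padded to a common length).
        static byte[] xor(byte[] a, byte[] b) {
            byte[] r = new byte[a.length];
            for (int i = 0; i < a.length; i++) r[i] = (byte) (a[i] ^ b[i]);
            return r;
        }

        // Walking one delta backwards: newerValue ⊕ (older ⊕ newer) yields the older value.
        static byte[] applyBackwards(byte[] newerValue, byte[] xorEntry) {
            return xor(newerValue, xorEntry);
        }

        // Merging two consecutive XOR entries for the purge: (a ⊕ b) ⊕ (b ⊕ c) = a ⊕ c.
        static byte[] mergeEntries(byte[] aXorB, byte[] bXorC) {
            return xor(aXorB, bXorC);
        }
    }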

Space Efficiency e redundancy within the diff entries of the differential approach has been eliminated by introducing XOR encoding. If only small parts of a tuple have changed in subsequent segments, the XOR entry could be further compressed by run-length-en- coding, since the unchanged bits are zero, but this is orthogonal to the XOR encoding.

.. Evaluation

In this section, we evaluate the impact of the overlay operator on the access behavior of our main-memory DBMS prototype and compare the different data versioning approaches. Our main-memory DBMS prototype has been implemented in Java . Indexes and deltas use Oracle Berkeley DB Java Edition ., which offers a B+-Tree implementation. The fan-out of the B+-Trees has been configured to mimic Cache-Conscious B+-Trees (Rao and Ross, ). Segments are implemented as slotted pages of KB, and data is represented as byte arrays only. The runtime environment of the experiments is an Intel Xeon X Quad-Core-based system with two processors at . GHz and GB of RAM, running a recent enterprise-grade Linux.

... Overlay Operator

First, we evaluate the performance impact of different overlay operator implementations. For our evaluation we use B+-Trees as the index implementation. Hashing-based indexes show similar relative behavior, but obviously with lower absolute access times.


Figure ..: Overlay Operator Response Times; (a) Point-Wise Lookup (access time in µs vs. number of overwritten tuples), (b) Full Table Scan (scan time in ms vs. number of overwritten tuples); compared configurations: No Common Data, Tenant-Specific Combo Index, Tenant First, Common First, Merge-Join Based, and Overlay Query

 Chapter . Data Sharing Across Tenants

Point-Wise Access

Developers tend to optimize an application's access behavior by using mostly point-wise operations to achieve better performance. Therefore it is crucial that the overlay operator does not add much overhead to the access time for this kind of query.
In Figure .a, we compare the access times of our approach with the access times of the traditional approach when no data is shared; the overlay operator is only required when common data is shared. We compare three different physical operators for the logical overlay operator: Tenant-First-Overlay using index sharing without bit-set support, Common-First-Overlay using index sharing with bit-set support, and Tenant-Specific-Combo-Index-Overlay where each tenant has an index that also contains references to shared data. In this experiment the shared data segment contains , randomly generated tuples. The number of overwritten tuples is increased in steps of , tuples. Each data point represents the average value across  runs; each run performs , random point-wise accesses. After each run the segment and the indexes are discarded and newly created. The reported value is the average time for one point-wise access.
As the baseline we selected a layout where no data sharing occurs: the No Common Data result is an evaluation of a Private Table Layout with a single index on the table. Figure .a shows that the Tenant-Specific-Combo configuration with its single index has the same performance characteristics as the baseline. This validates our assumption that the single bit, which is stored in the index to denote from which segment the tuple has to be fetched, does not impact performance. The Tenant-Specific-Combo approach does no index sharing.
In a next step, we compare the overhead of the implementations with multiple indexes. The Common-First-Overlay operator has a higher hit ratio for lower percentages of overwritten keys, since it first queries the index on the shared data. With an increasing percentage of overwritten keys, the seek time increases due to the fact that most shared data has been overwritten. For the Tenant-First-Overlay, the behavior is vice versa. The break-even point of the Tenant-First-Overlay and the Common-First-Overlay operator is at around  of overwritten data.
As a sanity check, we implemented point-wise overlay access in a classical DBMS. As a baseline we took point-wise access to the shared table, so that no overlay occurs. When implementing the overlay with a UNION operator as displayed in Figure ., the response times increased by up to . In contrast, the performance impact in our setup is limited to  at the break-even point of the two multiple-index implementations.

Scans

Second, we analyze the performance of the overlay operator for scan operations using the same test scenario. The scan returns , tuples. The results are depicted in Figure .b; each data point is an average across  scans. Again, the baseline (No Common Data) is a scan against a Private Table Layout. As in the point-wise access test, the Tenant-Specific-Combo implementation causes no additional overhead. The Overlay Query result is an implementation based on the query plan of a SQL query like the one in Figure .; it performs similarly to the pipelining Merge-Join-Based implementation.




Figure ..: Rebuild Times for Bit-Set and Combo-Index Rebuild (rebuild time in ms vs. number of overwritten tuples)

Furthermore, the execution plan of the Overlay Query implements a Tenant-First strategy, so we omit reporting this result separately. As soon as  of the tuples are overwritten, our specialized Common-First overlay operator performs better than the standard implementation; if more than  of the tuples are overwritten, the performance gain exceeds . These results show that the overlay operator can be realized as a lightweight operation in a main-memory DBMS.

Index Rebuild An interesting aspect of index maintenance in this scenario are the rebuild costs for combo indexes and bit-sets. As previously discussed, there are situations where these structures become invalid, i.e., when shared data has been changed. In this section, we present the time spent for fully rebuilding the structures.
Figure . shows the result of our experiment, where we removed the existing tenant-specific structures and measured the raw time for rebuilding them. For recreating the combo index of a tenant, a full scan across the shared segment and then across the tenant-specific segment is performed. If no tuples are overwritten, the index rebuild lasts  ms, which is the time to build the index across the shared data only. The more tuples are overwritten, the more influence is gained by the index updates during the scan of the tenant-specific segment: for each tuple in the tenant's segment, a look-up for the update position in the combo index has to be performed, followed by the actual index update. The setup of this experiment, where we only overwrite existing tuples, avoids rebalancing the tree, as there are no additional keys in the tenant's segment. As an alternative, it may be possible to deep-clone an already existing index on the shared data, which is then updated with a scan across the tenant-specific segment.
Much cheaper are the rebuilds of the bit-sets. For this test, we assume that the tenant-specific index already exists. Thus, if the shared data changes, the bit-set can be rebuilt by performing a merge-join-like operation on the tenant-specific index and the shared data index. As this operation can exploit the sort order of the data, the rebuild time decreases.

 Chapter . Data Sharing Across Tenants

SELECT   account.id, account.name,
         SUM(opportunity.probability * opportunity.amount) AS value
FROM     accounts account, opportunities opportunity
WHERE    opportunity.account = account.id
GROUP BY account.id, account.name
ORDER BY value ASC

Figure ..: Roll-Up Query

Figure ..: Effects of Data Overlay on Queries (access time in ms vs. number of overwritten tuples; compared: Overlay Query, Merge-Join Based, Tenant-Specific Combo Index, Common First)

This experiment is of special interest for supporting off-line tenants: as the rebuild times are fairly short, it might be attractive to drop all tenant-specific indexes and bit-sets when a tenant is suspended and to recreate these structures during tenant activation.

Influence on Queries To analyze the effect of the overlay operator on queries, we use a roll-up query (cf. Figure .) from our multi-tenancy testbed. The query performs an aggregation and a sort. The test data set consists of two segments belonging to the Account and the Opportunities polymorphic relations. The Account segment contains , tuples;  opportunities are generated per account. The overlay is performed on the account fragment only.
Figure . shows the results of the test for a variable number of overwritten tuples. The data points in the results are an average across  runs of the query; after each run, the data set and the indexes are discarded and newly created. The reported value is the execution time for the whole query in milliseconds. As expected, the customized implementations with the overlay operator clearly outperform the classical query-based approach (Overlay Query) for overlaying the tenant-specific and common data as long as there are overwritten tuples.


       

Figure ..: Query Execution Times; (a) Point-Wise Lookup (access time in µs per segment version), (b) Full Table Scan (scan time in ms per segment version); compared: Snapshot, Dictionary, Temporal, Differential Delta, and XOR Delta

... Physical Data Organization

With our second set of experiments, we compare the different data versioning approaches presented in the previous section. The criteria are (1) the execution times of point-wise look-ups and full table scans, (2) the space requirements, and (3) the execution times of maintenance operations.

All experiments have been configured as follows. A single segment sequence consists of eight segments, each representing a particular version of the shared data. Version  is the base version with , tuples. The average tuple size is  bytes; the primary key size is  bytes, corresponding to the length of UUIDs. From one version to the next,  new tuples are added and  of the existing tuples are updated.

 Chapter . Data Sharing Across Tenants

Data Access

Figure .a shows the behavior of the ve different implementations for point-wise ac- cesses by primary key look-up. e snapshot implementation marks the baseline. For each version, an individual data segment and an individual index is used. e differential delta and the XOR delta have competitive results for look-ups to the most recent version . e dictionary and the temporal implementation have worse performance than the other ap- proaches. e dictionary approach has to maintain a reference counter, while the temporal approach requires a more complex look-up operation, as described in Section ... When accessing previous versions, the execution time for the snapshot, the dictionary, and the temporal implementation do not differ from the access time to the most recent version. However, the execution time of the differential delta and the XOR delta approach depends on the number of deltas to process. Figure .b shows the scan behavior of the approaches. For each version a sorted scan has been performed. Since the number of tuples increases by  between two subsequent versions, the scan times increase from version  to version  for the snapshot, dictionary, and temporal approaches. For accessing the most recent version, differential delta and XOR delta approaches have the same scan times as the snapshot approach because of the materialization of the most recent version. e dictionary approach is a little bit slower because the reference counter has to be stripped off before returning the tuple. However, for the temporal approach the full temporal index has to be scanned. Since this index contains all available versions, the scan takes longer. When scanning previous versions, the snapshot, dictionary, and temporal approaches have the same performance as scanning the most recent version, except from the fact that the number of scanned tuples varies. As stated before, the differential delta and XOR delta performance decreases due to the look-ups in the deltas.

Space Requirements

Figure . shows the space requirements of the approaches. e snapshot approach has the highest space requirements. Since the dictionary approach has a common data seg- ment across all versions, but an individual index for each version, the space requirement of the indexes is identical to the snapshot variant, while the segment size is lower. A fur- ther reduction can be achieved by the temporal approach. e differential delta and the XOR delta approaches materialize and index the latest version and store the differences to previous versions in deltas. ey require a lot less space than the snapshot approach and are competitive with the dictionary approach. From a space requirement perspective, the temporal approach seems very promising. e approaches dictionary, differential delta, and XOR delta have comparable sizes, while the snapshot needs twice the amount of space as the previous three approaches. In our experimental evaluation, we do not consider intra-segment compression tech- niques for the following reasons. First, compression efficiency depends on the data dis- tribution, so we assumed the worst case in which there is no intra-segment redundancy, i.e. tuples are pairwise totally distinct. e second—and more important—reason is that

shared data is accessed by all co-located tenants concurrently. Therefore, we avoid the high decompression overhead caused by the frequent accesses to the shared data.

[Figure ..: Space Requirements — segment, index, and delta sizes in MB for the Snapshot, Dictionary, Temporal, Diff-Delta, and XOR-Delta approaches]

Maintenance Operations

Figure .a shows the execution time for data maintenance operations. The results for delete operations have been omitted from the chart; they are nearly identical to the update operations.

Adding tuples to the latest snapshot results in an index update and a segment update. The differential delta and the XOR delta only update the delta, so only one update operation is necessary. The dictionary approach has to maintain a reference counter, which leads to a slightly higher execution time. For the temporal approach, the version information has to be maintained, which increases the execution time as well.

Updating tuples in a snapshot results in an index look-up for the physical position of the tuple and an update-in-place operation in the data segment. The dictionary approach cannot do an update-in-place operation. Instead, the old tuple has to be retrieved to decrease the reference counter, then a new tuple has to be added to the data segment, and the index has to be updated with the new physical position of the tuple. Updating a tuple when using the temporal approach is identical to adding a new tuple. In the differential delta and XOR delta approaches, two look-ups are required to perform an update, as the before image has to be retrieved either from the materialized version or the non-finalized delta.

Figure .b shows maintenance operations on versions. When creating a new version, the most recent snapshot together with its index has to be copied. For the dictionary approach, only the segment index has to be copied, but for all tuples in the index, the reference counter of the particular tuple, which is maintained in the data segment, has to be increased. The temporal approach only increases its internal version number, which corresponds to the next segment in the segment sequence. For the differential delta and XOR delta approaches, new empty B+-trees have to be created.

 Chapter . Data Sharing Across Tenants











Execution Time [µs] 

 . add update

DML Operations

(a) DML Operations 









Execution Time [ms] 

 . Create Finalize Purge

Versioning Operations

(b) Versioning Operations Snapshot Dictionary Temporal . Differential Delta XOR Delta

Figure ..: Maintenance Execution Times


When finalizing a version, all data structures that are not shared across multiple versions are reorganized. For the dictionary approach, only the segment index can be reorganized, while for the snapshot approach the data segment can be reorganized as well. In the temporal approach, no reorganization is possible without affecting the availability of the shared data. The differential delta and XOR delta approaches use a merge-join variant to create a new data segment with its index.

Purging a version which is no longer needed simply removes the data segment and its index in the snapshot case. The dictionary and the temporal approach have to perform a more complex purge operation, which was discussed in the previous section. The differential delta and XOR delta approaches use a merge-join variant to create a new delta with the recalculated entries. Note that purging a version in the dictionary and temporal approaches affects shared data structures and thus affects performance.

Summary e XOR delta data organization scheme is a promising approach for implementing ver- sioning of shared data in a Multi-Tenant DBMS. Its access and scan behavior at the most recent version does not differ from the snapshot approach, which serves as baseline. How- ever, access and scan performance on older versions is worse than in the snapshot ap- proach, but this has no large impact for a typical SaaS workload, as only a few tenants lag marginally behind the latest version. Maintenance operations on versions can be done without affecting performance, since these operations do not affect shared data structures. Furthermore, the space requirement is competitive to other approaches like the dictionary approach. Although the temporal approach provides better space utilization, its drawbacks regarding data access and maintenance operations prevail. As an optimization, more than one version in the XOR delta approach could be ma- terialized, as XOR deltas can be applied in a forward and a backward manner without reorganization.



.  Graceful On-Line Schema Evolution

Allowing tenants to tailor the application to their needs increases the success of typical SaaS applications, as a more flexible application can also target those customers who require special customizations. The individual schemas of the tenants inherit the majority of definitions from the base application and its extensions. However, even if tenants only redefine small bits of the base application, there is constant change over time when considering not just one tenant, but all co-located tenants: an on-premise single-tenant application typically evolves over time by changing the schema to adapt the application to new requirements or processes, or to introduce new functionality. In the SaaS context, the base application evolves for the same reason. However, each tenant customizes the base application independently and thus—figuratively speaking—creates its own branch of the application, which evolves over time as well. Furthermore, the ISVs' extensions also evolve over time.

These constant changes stress currently available DBMSs. As each schema change has to be performed with DDL statements, a lot of DDL becomes part of the workload. However, the majority of DDL statements involve heavyweight I/O operations, as the affected data has to be either checked, e.g. for type compliance, or reorganized, e.g. when the default value has changed. These I/O operations typically consist of full table scans which exclusively lock the whole affected table, thus rendering it inaccessible for concurrently running operations. DBMS vendors address this issue by providing specialized data migration tools like Oracle Online Table Redefinition, which lower the actual downtime by using a shadow copy for performing the actual redefinition.

In a SaaS environment, changing schema information must not be heavyweight, in order to guarantee service availability. The high number of co-located tenants leads to continuous schema changes in the Multi-Tenant DBMS. Thus, for multi-tenancy, schema changes have to be lightweight in order not to affect system availability and performance. We base our Multi-Tenant Schema Evolution mechanism on the following assumptions:

Extensibility Tenants make use of the base application. They customize their application with extensions, either developed by themselves or by ISVs.

Re-usability Between two subsequent changes, only small differences exist, so much schema information can be re-used across versions or extensions.

 Chapter . Graceful On-Line Schema Evolution

FlexScheme offers mechanisms to track these changes and allows for efficiently storing several versions of the same schema. Thus, if the history of schema changes is available, it can be used for a lightweight schema evolution mechanism which—instead of performing the necessary I/O operations immediately—defers them until the data needing checks or reorganization is actually accessed.

This Lightweight Schema Evolution consists of two components: (1) schema versioning and (2) lightweight physical data reorganization. The idea behind this approach is to separate the logical redefinition as performed by DDL statements from the physical reorganization or checks of the actual data. The schema versioning component is a fundamental part of the FlexScheme approach. It allows for keeping the history of the schema information for a particular table by introducing the notion of fragment sequences and fragment versions. The physical reorganization is performed by a special operator, the evolution operator, which is part of the query plan and performs the physical reorganization on demand. Once a tuple is accessed, the evolution operator applies the scheduled schema changes to migrate the tuple to the most recent schema version.

However, there is a trade-off: the reorganization must take place at some point, since the data has to be consistent. Instead of performing one heavyweight reorganization process right after the schema change, which negatively impacts service availability, the costly reorganization process is spread across the many small requests that form the usual workload of the DBMS.

.. Schema Versioning

To enable graceful schema evolution, the meta-data of the DBMS has to be aware of schema versions. According to Roddick (), schema versioning is accommodated when a DBMS allows access to all data, both retrospectively and prospectively, through user-definable version interfaces. However, in the SaaS context, this paradigm can be restricted to a simpler variant of schema versioning. The following two restrictions are sufficient to make the meta-data aware of versioning while simultaneously allowing lightweight table redefinition.

1. Our prototype does not make the versioning interface publicly available; rather, it hides this layer by introducing transparent schema migration. Only the most recent schema information is exposed to the application.

2. Queries are formulated against the most recent schema versions. These queries access all tuples, even if some of them are stored in older schema versions.

In our prototype, schema versioning is tightly integrated into the FlexScheme approach. In FlexScheme, a Virtual Private Table for a given tenant is decomposed into fragments. Each fragment stores the history of schema changes by providing fragment versions which provide a view of the fragment at a given version number. As discussed above, we assume only small changes between subsequent fragment versions; thus our prototype leverages the high degree of re-usability by sharing much information, such as data types and column attributes, across different fragment versions.


... Versioning Concepts

As multiple fragment versions have to be stored for a single fragment, the history of the schema changes must be tracked. In the world of Object-Oriented DBMSs (Roddick, ) and in model-driven software development (Kögel et al., ), two well-known techniques are available for tracking history information: state-based and operation-based versioning. In this section, we discuss these two versioning concepts and show how they are applied to FlexScheme. Our focus in this section is on fragment versioning; the discussion of segment versioning can be found in Section ...

State-Based Versioning

When using state-based versioning, only the individual states of a model are stored. Each of these states consists of the materialization of the schema information, in our case the fragment versions, containing the attribute names, data types, and type attributes at a given version.

However, for schema evolution, the differences between subsequent fragment versions must be derived, which involves an expensive two-step computation: in the first step, the matching process identifies similarities between two subsequent versions; the second comparison step then identifies the changes.

Operation-Based Versioning

In contrast to the above approach, operation-based versioning records the differences between two subsequent fragment versions. At least one materialization is necessary as a starting point for iteratively applying the changes. Depending on the implementation of operation-based versioning, changes can be applied only in one direction (mostly forward) or in two directions. The latter allows for choosing any fragment version as the materialization that serves as starting point.

Operation-based versioning simplifies schema evolution, as differences can be applied to tuples without an expensive computation step beforehand. Each change is modeled as an atomic modification operation (MO) which describes a transformation process for one particular component of a fragment version. For complex changes, multiple atomic MOs are chained. The concept of MOs is discussed in more detail in the following sections.

FlexScheme-Based Versioning

Each of the previous two concepts has major drawbacks when used in a SaaS context: as different tenants may have different fragment versions active, multiple fragment versions may be active in one DBMS instance. Since operation-based versioning only stores the differences between two versions, long chains of MOs must be processed for schema evolution, beginning at a common starting point for all tenants. This is not the case with state-based versioning; however, its computational overhead might affect the overall system performance.

 Chapter . Graceful On-Line Schema Evolution

Hence, we propose a hybrid approach where we allow for multiple materializations in combination with operation-based versioning. Each fragment version which is in use by a particular tenant is materialized, as with the state-based versioning approach. Furthermore, we record the changes between two subsequent versions using atomic MOs, as in the operation-based versioning approach. As active fragment versions are materialized, this hybrid approach enables fast access to schema information as with the state-based approach, while simultaneously allowing for easier schema evolution, as the MO chains are shorter due to multiple potential starting points.

However, this advantage comes with the following trade-off: as two different versioning concepts are combined, there is an increased memory footprint for the schema information, as parts of it are kept redundantly. To lower the footprint, materializations of fragment versions must be purged after the last tenant using the respective fragment version has switched to another one.
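As an illustration of this hybrid scheme, the following sketch (hypothetical names, not the prototype's actual data structures) keeps, per fragment, an ordered list of versions where each version may carry a materialized schema state and always carries the MO chain leading from its predecessor; schema look-ups use the materialization, while evolution walks the chains.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class FragmentVersion:
        version: int
        mo_chain: list                   # atomic MOs transforming the predecessor into this version
        schema: Optional[dict] = None    # materialized state, kept while some tenant uses this version

    @dataclass
    class FragmentSequence:
        versions: list = field(default_factory=list)   # ordered by version number

        def chain_between(self, src: int, dst: int) -> list:
            """Concatenate the MO chains on the path from version src to version dst."""
            chain = []
            for fv in self.versions:
                if src < fv.version <= dst:
                    chain.extend(fv.mo_chain)
            return chain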

... Atomic Modification Operations

As already mentioned in the previous sections, our approach to schema versioning is based on atomic modification operations, a well-known approach for describing schema changes. Shneiderman and Thomas () proposed a set of atomic schema changes, including structural changes and changes on keys and dependencies. Bernstein et al. (, ) proposed schema modification operations for schema evolution using algebra-based constraints as primitives. Recent work of Curino et al. () uses modification operations to describe differences between subsequent versions of schema definitions. Modification operations are atomic; that is, each MO transforms only one component of the schema. For complex changes, MOs must be chained. Moreover, Curino et al. () introduce the notion of compensation records, which allows for rolling back schema changes.

Our prototype takes up the set of schema modification operations as defined by Curino et al. (). In its current implementation, the prototype supports only the following structural MOs for tables: ADD COLUMN, DROP COLUMN, RENAME COLUMN, CHANGE DATATYPE, SET DEFAULT VALUE, and SET NULLABLE. However, it can be extended to support other MOs as well. Furthermore, it supports user-defined schema modification operations, for example complex string operations. These can be used for application-specific schema modifications, such as splitting a NAME column into GIVENNAME and SURNAME using a given delimiter.
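To make the notion of chained atomic MOs concrete, the following sketch (illustrative only; class names such as AddColumn or UserDefinedMO are hypothetical and not taken from the prototype) models each MO as a small object with an apply method that rewrites one tuple, so a complex change—such as the NAME split mentioned above—becomes a chain of such objects.

    from dataclasses import dataclass

    @dataclass
    class AddColumn:
        name: str
        default: object = None
        def apply(self, tuple_: dict) -> dict:
            tuple_.setdefault(self.name, self.default)
            return tuple_

    @dataclass
    class DropColumn:                 # soft delete: the value is hidden, not removed
        name: str
        def apply(self, tuple_: dict) -> dict:
            tuple_["_inactive"] = tuple_.get("_inactive", set()) | {self.name}
            return tuple_

    @dataclass
    class SplitName:                  # user-defined MO: split NAME into GIVENNAME and SURNAME
        def apply(self, tuple_: dict) -> dict:
            given, _, sur = tuple_.get("NAME", "").partition(" ")
            tuple_.update(GIVENNAME=given, SURNAME=sur)
            return tuple_

    # a complex change is expressed as a chain of atomic MOs
    chain = [SplitName(), DropColumn("NAME")]
    row = {"NAME": "Alice Smith"}
    for mo in chain:
        row = mo.apply(row)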

ADD COLUMN

This operation adds a column at the end of the already existing fragment version. The MO requires the specification of a name and type information consisting of the type name, the type length, the default value, and whether this field is nullable. During physical data redefinition, the tuple is extended to provide additional space for the new attribute.


DROP COLUMN e complementary action to the above one removes a column from a fragment version. During physical redenition, the DROP COLUMN operation should remove the data from the tuple. However, our prototype implements a so-delete mechanism, where data is not instantly deleted. us, it simply sets a ag inside the tuple which hides the attribute. is way, backward compatibility to older fragment versions is maintained as the tuple can be converted to an older version.

RENAME COLUMN

Renaming a column simply changes the information in the fragment versions. No physical redefinition is necessary, as physical access is based on offsets, not on mnemonics.

CHANGE DATATYPE is operation changes the type of a given column to a new data type and/or type length. Currently, our prototype only allows for changing to compatible data types, where the domain is increased. During physical redenition, the tuple format may have to be reor- ganized to t the new data type. Changes to the data type where the domain will be limited cannot be made using this MO, as this atomic MO cannot handle cases where the existing value conicts with the new domain. For such changes, our prototype offers the possibility of custom MOs, where the user can specify a callback to handle conicting changes automatically.

SET DEFAULT VALUE

Setting the default value of a column changes the data type attribute of the affected column. Physical redefinition has to respect the history of the affected tuple. If a SET DEFAULT VALUE MO has been issued before, the default value must not be changed: in conventional DBMSs, changing the default value only affects newly created tuples.

SET NULLABLE is operation changes the type denition of the column. During physical redenition, the tuple layout has to be redened to accept NULL values. Depending on the history of the tuple, two different redenitions are necessary: () if a column was previously dened as NOT NULL and now can accept NULL values, there is no need to change anything during the reorganization phase, and () if a column description has changed to NOT NULL (which can only happen if a default value has been dened) space for the new default value must be allocated.

... Applying Modification Operations

As the physical reorganization of the tuples is spread over time, there may be situations where tuples are accessed which have not yet been evolved to the most recent schema version.

 Chapter . Graceful On-Line Schema Evolution

MO Chain  MO Chain 

() V. V V

MO Chain  MO Chain 

() V V V

MO Chain  || MO Chain 

() V V

Figure ..: Pruning of Unused Fragment Versions

Such tuples are evolved immediately upon table access by applying all scheduled MOs that are necessary to move the tuple to the most recent schema version.

For a specific tuple, the fragment version of the current materialization is extracted from the physical representation. As FlexScheme stores the history of schema changes, the path from any given fragment version to the most recent version can be determined. Furthermore, for each successor relation between two subsequent fragment versions, the schema MOs transforming the old version into the new version can be retrieved. As each edge in the path is annotated with a chain of schema MOs, the tuple can be evolved by sequentially applying the chains, starting at the fragment version of the current physical materialization. After the chains have been applied, the tuple conforms to the most recent schema version and can be processed by the application.
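The evolution step just described can be sketched as follows; this is a simplified illustration under the assumption that fragment versions are totally ordered and that each version stores the MO chain from its predecessor, reusing the hypothetical FragmentSequence and MO objects from the earlier sketches.

    def evolve_tuple(row: dict, tuple_version: int, seq: "FragmentSequence") -> dict:
        """Bring a tuple stored in an old schema version up to the latest version."""
        latest = seq.versions[-1].version
        # concatenate the MO chains on the path tuple_version -> latest ...
        for mo in seq.chain_between(tuple_version, latest):
            row = mo.apply(row)          # ... and apply them in sequence
        return row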

... Version Pruning

To lower the overhead of administering a high number of fragment versions, FlexScheme allows purging versions which are no longer used by any tuple. The DBMS keeps statistics on how many tenants still reference a certain fragment version. As soon as a particular fragment version is no longer used, it can be pruned. Two different scenarios can occur:

1. The fragment version to be pruned does not have any predecessors. In this case, the materialized state and the chain of MOs pointing to other versions can easily be removed.

2. In any other case—as there might be tuples which rely on even older fragment versions—only the materialized state can be pruned. As Figure . shows, the MO chain pointing to the pruned version and the MO chain pointing away from the pruned version have to be linked together.

Aer re-chaining in the second case, the MO chain may contain operations on the same target, for example, () ADD COLUMN CHAR(10) 'colx' and () CHANGE DATATYPE CHAR(20) 'colx'. Such operations may be collapsed into one single MO ADD COLUMN CHAR(20) 'colx' to reduce chain length.


[Figure ..: Evolution Operator Placement within Query Execution Plan — the Evolution operator is placed directly above the Scan of a segment]

As a further enhancement, rarely used versions can be pruned with the same mechanism if the affected tuples are evolved to the most recent version. This process can be scheduled on a regular basis to further reduce the number of concurrently active fragment versions.

.. Lightweight Physical Data Reorganization

The second important component of the Graceful Schema Evolution mechanism is responsible for actually performing the schema evolution on the physical tuples while they are accessed. This on-the-fly reorganization allows for spreading the load across many small accesses to avoid service unavailability.

... Objectives

We implemented a strategy-driven query plan operator which is placed in the access path to perform the schema evolution on demand, before the tuple is processed by the upper query plan operators. The strategy of this so-called evolution operator decides whether the recently evolved tuple is immediately written back to the background storage or whether the write is deferred until the next user-requested update of the tuple.

When writing data, either by inserting new tuples or updating existing ones, the tuple conforms to the latest schema version. Thus, reorganization only occurs when reading tuples not conforming to the latest schema. Reorganization is only necessary if more than one schema version is active per tenant.

... Evolution Operator

The evolution operator is placed in the access path of the query plan which retrieves the tuples from the back-end. Figure . shows an excerpt of a query execution plan. After the tuple has been accessed, the evolution operator transforms—if necessary—the physical representation of the tuple based on the schema changes recorded within FlexScheme. The transformation is done as follows.

After accessing the raw tuple, the current tuple version is determined. The evolution operator uses the information from a fragment to determine the appropriate fragment version. If the tuple conforms to the most recent fragment version, the tuple is pipelined to the above-lying operators of the query plan. If the tuple does not conform with the most

 Chapter . Graceful On-Line Schema Evolution

recent fragment version, the tuple is evolved to the most recent fragment version by applying the chain of schema modification operations in sequence. After that, an evolution strategy decides whether to write the data back to the segment or not. Finally, the tuple is pipelined to the upper operators. Possible evolution strategies are discussed in Section ...

Although the granularity of the evolution is per-tuple, it might be possible to use the same technique with a per-page granularity. However, the coarser the granularity, the more heavyweight the physical reorganization gets. To mimic the behavior of traditional DBMSs, the physical reorganization must be performed at segment granularity.

[Figure ..: Physical Tuple Layout — header fields TID, Version, Flags, and Tuple Length, followed by the NULL bit-set, the inactivity bit-set, and the data area]

... Physical Tuple Format

As Gray and Reuter () point out, traditional DBMSs use the N-ary Storage Model (NSM) as the layout for physically storing tuples. The data segments containing the records are slotted into equally-sized pages; the pages then contain the actual records. Using slotted pages causes one more indirection during look-up, but simplifies free-space management. However, our setup has some additional requirements on the physical tuple format:

Version Number For identifying the version a particular tuple conforms to, the physical representation of the tuple has to be enhanced with an attribute for storing the version number. The version information is stored inside the tuple header.

Tuple Visibility Flag Tuple deletion is implemented via soft-deletes, thus a visibility flag is needed.

Attribute Visibility Bit-Set As the DROP COLUMN MO would cause data loss if the column were physically dropped, we implemented soft-delete at the attribute level. This way, we enable transforming the tuple to previous versions, although the evolution layer of our prototype does not support this at the moment. Soft-delete at the attribute level requires a bit-set per tuple for storing the visibility of each individual attribute.

For our implementation, we enhanced NSM to satisfy the requirements from above. However, the enhancements can be implemented to be fully transparent to other record layouts, like PAX (Ailamaki et al., ), the Interpreted Storage Layout (Beckmann et al., ), or columnar storage (Stonebraker et al., ). The resulting layout is depicted in Figure ..


The header of an NSM-style tuple is extended by a fixed-length field storing the version information of the particular tuple. The value is accessed with every read operation; however, updates to this value only take place when the tuple is written.

We keep the position of the primary key (PK) fixed. The PK is stored outside the actual data field to allow for fast PK access without the need for the schema evolution operator. However, this means that the type of the PK cannot be changed via graceful schema evolution. Thus, if the PK type changes, a complete conversion of the affected segment has to be performed.

Many databases, for example PostgreSQL (), have a bit-set in the tuple header for storing flags, such as whether the tuple contains NULL values or has variable-width attributes. We add two more flags, TUPLE_INVALID and HAS_INACTIVE_ATTRIBUTES. The first flag denotes tuples that have been marked as deleted by the soft-delete functionality. Tuples marked as invalid are removed by a periodically running clean-up job. Instead of such a flag, it would also be possible to store the time-stamp of the deletion operation. The second flag denotes whether an inactivity bit-set is present. Similar to the NULL bit-set of PostgreSQL, where attributes containing NULL values are marked, the inactivity bit-set marks attributes as inactive. Attributes become inactive through a DROP COLUMN MO. Inactive attributes physically contain a value, but do not show up during access. However, physically keeping the old value allows virtually rolling back tuples to older schema versions without information loss. This allows backward compatibility of queries.
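To make the extended header concrete, the following sketch packs such a header with Python's struct module. The field widths chosen here (8-byte TID, 4-byte version, 2-byte flags, 2-byte tuple length, followed by the two bit-sets) are illustrative assumptions, since the exact sizes are not reproduced above; only the presence of the version field and the two extra flags follows the description.

    import struct

    TUPLE_INVALID = 0x01            # soft-deleted tuple
    HAS_INACTIVE_ATTRIBUTES = 0x02  # an inactivity bit-set follows the NULL bit-set

    HEADER = struct.Struct("<QIHH")  # TID, version, flags, tuple length (assumed widths)

    def pack_header(tid: int, version: int, flags: int, length: int,
                    null_bits: bytes, inactive_bits: bytes = b"") -> bytes:
        if inactive_bits:
            flags |= HAS_INACTIVE_ATTRIBUTES
        return HEADER.pack(tid, version, flags, length) + null_bits + inactive_bits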

... Access Behavior e physical transformation of tuples to conform with the newest schema version has to be applied right before other operators in the query plan access attributes of the tuple. us, the table access operators immediately feed their results into the evolution operator. As the tuple transformation is pipelined, the overall query runtime may depend on the time for transforming tuples. For each incoming tuple, the evolution operator determines the source and the target version and then re-assembles the tuple. As a prerequisite for the transformation, the meta- data have to be queried to determine the chain of MOs transforming the tuple from the source to the target version. For fast access to the fragment versions, our prototype has hash-indexes on the versioning information. Furthermore, the time for re-assembling the tuple is dependent on the length of the MO chain. us, it is important that unnecessary versions are pruned from the meta-data. Each segment has a lightweight evolution strategy assigned, which is consulted aer tuple transformation to decide, whether the tuple is written back to the segment or not. Figure . shows one exception, where the evolution operator is not necessarily the rst operator above the table access operator. When data is overridden, the affected segments may be overlaid before the actual evolution is done. is is possible, as the overlay operator only relies on the PK, which is physically stored outside the data eld.

Our prototype currently does not support this type of query, but the physical tuple representation has been designed to support it in the future.

 Chapter . Graceful On-Line Schema Evolution

[Figure ..: Query Execution Plan with Evolution and Overlay Operators — Scans of the shared and tenant-private segments are combined by Overlay operators, evolved by Evolution operators for the base, extension, and tenant fragments, and merged by a final Tuple Alignment]

.. Evolution Strategies

The evolution operator delegates the decision whether to write physically reorganized tuples back to the segment to an evolution strategy. This section discusses various evolution strategies and their influence on the schema evolution process.

... Immediate Evolution

Typical DBMSs perform an immediate physical reorganization. After receiving the ALTER TABLE statement, the DBMS immediately performs the requested changes to the table. Depending on the request, the tuples in the table must either be restructured to conform to the new schema (for example, if fixed-length attributes are modified, CHAR(20) to CHAR(25)), or at least checked, e.g. if the type domain has been narrowed (VARCHAR(25) to VARCHAR(20)). These changes incur a full table scan, thus touching all tuples of the affected table. As the full table scan needs exclusive access (X-locks) on the table, the table is not accessible for other queries during the time of the full table scan. As a consequence, the response time of the blocked queries increases. For some ALTER TABLE requests, there is no need to perform a full table scan, e.g. if the length of variable-length attributes is increased. Most DBMSs recognize such requests and omit the costly reorganization.

Some DBMS vendors offer solutions for keeping the relation accessible during the reorganization process, for example Oracle Online Data Reorganization and Redefinition (Oracle Corporation, ). As already discussed in Section .., this mechanism keeps the affected table available during the redefinition phase. However, according to the feature's documentation, there may still be impacts on the overall system performance.


... Lightweight Evolution

Our approach of lightweight evolution optimizes the schema evolution process. It is not desirable that database objects become unavailable during schema evolution. Thus, compared to the previous approach, lightweight evolution is intended to allow for schema evolution without affecting service uptime and performance. All the following strategies have in common that they are combined with our special schema evolution operator. The evolution operator consults one of the following strategies to decide whether the internally evolved tuple is written back to disk or not.

Eager Evolution When using this strategy, the schema evolution operator immediately writes the tuple back to the storage segment after it has been evolved to the most recent schema version. Thus, if this strategy is selected, a tuple is evolved at the first read access after the schema change. However, in the worst case, the eager strategy turns each read into a write, so this strategy might result in expensive write operations. If a certain tuple is read and immediately updated, the tuple is written twice. Therefore, this strategy is suitable for workloads where a tuple is read several times before it gets updated. An example for the usage of such a strategy is a phone book record, which is most often displayed in a list (e.g. in a directory listing) and rarely updated.

Lazy Evolution In contrast to the previous strategy, the evolution operator never writes back the internally evolved tuple when the lazy strategy is employed. The actual evolution is performed when the tuple gets written by an update request. As each read request has to evolve the tuples, this strategy is suitable for data which is rarely read or for which each read is immediately followed by an update request. However, the worst case for this strategy is data whose schema is updated more often than the actual data. In this case, the length of the MO chain increases over time, which in the long run may reduce the overall performance when accessing those tuples. An example for a suitable usage of lazy evolution is an event ticket reservation system, where a ticket is usually reserved once (data is always written in the latest version), the data is read only once at the event, and changes occur rarely, for example a change of the names of the attending guests. Historical data is read only for OLAP queries and analyses. Since most business intelligence applications nowadays apply an ETL process for data preparation and work on the generated copy, lazy evolution is well suited for such scenarios.

Snoozing Evolution To overcome the drawbacks of the two previous strategies, the snoozing strategy combines them by introducing a threshold for read operations. If the number of read accesses is below the threshold, the tuple is not yet evolved. As soon as the number of read accesses exceeds the threshold, the tuple is evolved and the read access counter is reset.
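The three write-back policies can be sketched as small strategy objects; this is an illustrative reading of the descriptions above (class names are hypothetical), matching the write_back interface used in the operator sketch earlier: the operator asks the strategy after every evolution whether the evolved tuple should be persisted.

    class EagerStrategy:
        def write_back(self, tid: int) -> bool:
            return True                      # persist at the first read after the schema change

    class LazyStrategy:
        def write_back(self, tid: int) -> bool:
            return False                     # persist only on the next user-requested update

    class SnoozingStrategy:
        def __init__(self, threshold: int):
            self.threshold = threshold
            self.reads = {}                  # per-tuple read counter (kept inside the page in the prototype)

        def write_back(self, tid: int) -> bool:
            self.reads[tid] = self.reads.get(tid, 0) + 1
            if self.reads[tid] > self.threshold:
                self.reads[tid] = 0          # counter is reset once the tuple is persisted
                return True
            return False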

 Chapter . Graceful On-Line Schema Evolution

The current implementation maintains a data structure inside the page for managing the number of read accesses to each individual tuple in the page. Furthermore, the threshold is fixed across all pages of a segment. As a future enhancement, the threshold could be adjusted adaptively depending on the access behavior of the workload.

As the snoozing strategy relies on an additional data structure for bookkeeping, applying this strategy to a segment may change the overall access pattern, since each read access has to update the bookkeeping. To reduce the number of write accesses to the page for bookkeeping, the evolution might be performed on a per-page basis rather than on a per-tuple basis. However, as the evaluation shows, maintaining a read access counter per tuple does not severely impact the performance of our main-memory-based prototype.

... Strategy Selection

Choosing the right strategy for on-line schema evolution is vital for the overall system performance. As a rationale, unnecessary writes during read access must be avoided. Unnecessary in this sense are those writes which are performed during an evolution step although the tuple would be updated by the application in the next step anyway.

Besides manual selection of the strategy on a per-segment basis, our prototype implements a very simple form of adaptive strategy selection based on the workload of a particular segment. We keep statistics about the number of reads and writes per page, and once an evolution step is necessary, the operator decides based on a certain threshold which of the three strategies from above is selected.
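A possible reading of this adaptive selection is sketched below, reusing the strategy classes from the previous sketch; the read/write ratio thresholds are invented for illustration and are not taken from the prototype.

    def select_strategy(reads: int, writes: int,
                        eager_ratio: float = 4.0, lazy_ratio: float = 1.0):
        """Pick a write-back strategy from per-page read/write statistics (illustrative thresholds)."""
        ratio = reads / max(writes, 1)
        if ratio >= eager_ratio:    # read-mostly page: pay the write-back once, read cheaply afterwards
            return EagerStrategy()
        if ratio <= lazy_ratio:     # write-mostly page: the next update persists the evolved tuple anyway
            return LazyStrategy()
        return SnoozingStrategy(threshold=3)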

.. Evaluation

For evaluating our approach, we used the same execution environment as presented in Section .. As the baseline for our experiments, we take the traditional behavior of DBMSs where no evolution takes place at runtime. For the baseline, we do not measure the time for actually performing the necessary physical redefinition after issuing a DDL statement; thus, we assume that the physical redefinition has been performed during a service downtime window.

... Point-Wise Access

To analyze the effect of evolution on point-wise accesses, we use a single data segment containing , randomly generated tuples. Before the benchmark run and after the initial data generation, schema modifications consisting of costly operations such as ADD COLUMN, DROP COLUMN, and CHANGE DATATYPE are performed, creating two new versions of the fragment. We omitted lightweight modification operations like RENAME COLUMN.

Each test run consists of , point-wise queries which randomly select tuples with an 80/20 distribution. According to the Pareto principle, 80% of the queries access 20% of the tuples. This means that each of these tuples is accessed multiple times during a benchmark run, while the remaining tuples are accessed at least once. After each run, the data set and the indexes are discarded and newly created.
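A workload of this shape can be generated with a few lines; the sketch below is only meant to illustrate the skewed 80/20 access pattern used in the experiment, with the tuple and query counts left as free parameters since the exact values are not reproduced here.

    import random

    def pareto_workload(num_tuples: int, num_queries: int, hot_fraction: float = 0.2,
                        hot_probability: float = 0.8, seed: int = 42):
        """Yield tuple ids such that about 80% of accesses hit the 'hot' 20% of the tuples."""
        rng = random.Random(seed)
        hot = int(num_tuples * hot_fraction)
        for _ in range(num_queries):
            if rng.random() < hot_probability:
                yield rng.randrange(hot)                  # hot 20% of the tuples
            else:
                yield rng.randrange(hot, num_tuples)      # remaining cold tuples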


We study two different scenarios: the first scenario consists of a read/write workload, where tuples may be written by the application after they have been read. The second scenario comprises a read-only workload.

For our baseline, we mimic the behavior of current DBMSs. If the required reorganization were performed immediately after a schema modification, the fragment would be blocked for around  ms, as the reorganization involves a full table scan on , tuples with a tuple transformation on each item and a write-back. Subsequent reads (see Figure .a) cost around  microseconds. With our single-threaded execution model, queries would have to wait in the queue during the immediate full scan and update; therefore, they might miss their response time goals. In order to overcome this problem, we defer the data conversion until the tuple is first read and use the described write-back strategies. With the eager strategy, each tuple requires one evolution and a subsequent write-back step; after that, all subsequent reads of that tuple do not require conversion anymore. With the lazy strategy,  evolution operations are required, but no write-back is needed. With the snoozing strategy and a threshold of  reads, each tuple is converted  times before it is written back; the following  reads do not require conversion.

The experimental results for the read/write workload (Figure .a) show that the eager evolution strategy is the most time consuming, which is caused by the immediate write on read. Lazy evolution is fastest because it only performs the steps necessary for transforming the tuple to conform to the newest version of the fragment. These results underline that evolution costs less than a write-back, which emphasizes that 'lights-out' schema evolution is a lightweight operation. Furthermore, we see that with the 80/20 workload the snoozing strategy's response times converge with those of lazy evolution in the long run. Therefore, the snoozing strategy can overcome the major drawback of the lazy strategy described in the previous sections.

Figure .b shows the results for the read-only workload. The X-axis shows the fraction of tuples which already conform to the most recent schema version. As in the previous scenario, the costs for writing back evolved tuples heavily influence the eager and the snoozing strategy. However, as more tuples have been evolved, the different strategies converge. When comparing graceful schema evolution with traditional access to the data without evolution ("Static Evolution"), an overhead of about   becomes visible. This overhead results from the meta-data look-up within FlexScheme as well as from reorganizing the tuple within main memory.

... Queries

A second experiment evaluates the impact on report-style queries. The query is the same as in the evaluation of our data sharing component. As a reminder, the query (cf. Figure .) performs a join of two relations with aggregation and grouping; the two relations are accessed via a full table scan. In this experiment, we compare the two graceful strategies, eager and lazy, with traditional query execution without schema evolution.

Figure . shows the result of the experiment. As the report query is read-only, we vary the fraction of already evolved tuples as in the previous experiment. The experiment shows a very similar result to the previous one. If a high number of tuples have to be evolved, the

 Chapter . Graceful On-Line Schema Evolution

[Figure ..: Evolution Operator Performance for Point-Wise Accesses — access time in µs for the Eager, Lazy, Snoozing, and Static Evolution variants; (a) read/write workload over the fraction of user-triggered updates, (b) read-only workload over the fraction of previously evolved tuples]


eager strategy is dominated by the costs of the immediate write-back. However, as more tuples are already evolved, the eager strategy converges with the lazy strategy. Comparing the lazy strategy to query processing with static evolution, only a very small overhead becomes visible. This small overhead is owed to the meta-data requests and the in-memory processing of the tuple.

[Figure ..: Computational Overhead of Query Processing with Evolution — query processing time in ms over the fraction of previously evolved tuples, for the Eager, Lazy, and Static Evolution variants]

... Modification Operation Chain Length

In the previous experiments, the number of schema modification operations was fixed. To determine the trade-off point at which lazy and snoozing evolution are no longer efficient enough to perform better than the eager strategy, we vary the number of schema modification operations. The setup remains identical to the previous experiments: we consider a single segment with , tuples which are accessed in a point-wise fashion. A single version consists of a chain of five atomic modification operations. During the experiment, we periodically "release" new schema versions. After a new version is available, a random set of tuples is accessed.

When using the eager strategy, any access to a tuple after the version release automatically converts the tuple to the new version. This means that every new version requires five MOs to be applied and a single write-back per accessed tuple. Remember the results from the previous experiments, where the write-back costs dominate the reorganization costs. As the lazy strategy evolves the accessed tuple only in main memory, the MO chain length to be processed increases with every released version. Although the tuples are accessed after each version release, the chain length increases because no implicit write-back is performed.

Figure . shows the results of the experiment. We see that the eager evolution result is nearly constant during the test run. This is caused by the fact that tuples are accessed regularly during the test and thus the chain length does not grow much. After the release of the sixth version (or after  MOs), the lazy approach becomes more costly than the eager approach. Beginning at that point, the costs for applying a long chain of MOs are increasing

 Chapter . Graceful On-Line Schema Evolution

compared to the costs for writing back the evolved tuple. The snoozing strategy overcomes this issue. We parameterized the threshold of the strategy such that after three accesses per tuple without write-back, the tuple is evolved and written back at the fourth access. As Figure . shows, the snoozing strategy can save the costs for the write-back by avoiding long MO chains.

[Figure ..: Modification Operations Influence — access time in µs over the number of modification operations performed, for the Eager, Lazy, and Snoozing strategies]

... Summary

In a nutshell, graceful on-line schema evolution places only a very light overhead on query processing. For average MO chain lengths there is an overhead of about  , which is tolerable. Performing lazy evolution can have advantages as long as the MO chain length does not explode, as the costs for transforming the tuples in main memory are much lower than those for writing back the tuple. However, with increasing chain length, the tuple should be evolved, as the costs of in-memory tuple transformation increase linearly with the MO chain length. To avoid a regularly running fix-up job, we proposed the snoozing strategy, and our experiments show that this strategy provides a trade-off between the costs of processing long MO chains and writing back the tuple. The snoozing strategy has to be parameterized by a threshold, which denotes the number of evolution processes per tuple before the tuple evolution is forced to the background storage. This threshold can either be set manually depending on the expected workload or by the query optimizer, whose statistics are enhanced to consider the MO chain length as well.

.. Related Work

In the past, schema evolution has been extensively studied in various contexts. Closely related contexts are relational DBMSs (Roddick, ), Object-Oriented DBMSs (Banerjee et al., ; Moerkotte and Zachmann, ; Kemper and Moerkotte, ; Claypool et al., ), and temporal DBMSs (Roddick and Snodgrass, ). Furthermore, schema evolution has gained significance in the research area of XML databases (Moro et al., ).


Recent work (Curino et al., ; Moon et al., ; Curino et al., ; Moon et al., ) shows that schema evolution is still an important topic, especially in scenarios where information systems must be upgraded with little or no human intervention.



.  Outlook and Future Challenges

In this thesis, we introduced a special-purpose DBMS which has been optimized for the requirements and specialities of a multi-tenant application. The typical use case for such an application is Software as a Service, where multiple businesses are consolidated onto the same operational system and access the service via Internet browsers. From a service provider's perspective, the database back-end should be totally self-service. This means that tenants should be able to customize their application instance without any human intervention from the service provider's staff. From a tenant's perspective, its data should be well isolated from co-located tenants.

First, we presented mechanisms for addressing these issues on conventional DBMSs. We discussed a set of schema mappings which transform the logical tenant-specific schema into a physical schema inside the database back-end. However, the evaluation of this approach shows that—even though the overall system performs well—not all of the requirements are met.

Thus, we introduced a next-generation DBMS which addresses these issues by providing schema flexibility in two dimensions: schema extensibility and schema evolution. Moreover, it allows sharing of common data to further reduce the TCO. Our prototype is based on FlexScheme, a hierarchical meta-data model supporting native schema flexibility in our prototype.

The components responsible for applying schema flexibility to the physical data representations have been implemented as operators that can be placed in the access path of a query execution plan. The overlay operator is responsible for allowing read/write access to shared data, where writes are redirected to the tenant's private segment and thus do not interfere with co-located tenants. The evolution operator allows for graceful on-line schema evolution without affecting system availability. Rather than performing a costly data reorganization right after receiving a schema change request, it uses the schema versioning mechanism of FlexScheme to spread the actual physical reorganization across a lot of small requests. This way, the availability of the overall system can be guaranteed. Our evaluation shows that our modifications are lightweight in the sense that they do not severely impact system performance. We conclude that—given the specialities of a SaaS application's workload—the overall Quality of Service (QoS) goals can be met.

At the moment, our prototype only provides a basic key/value-based interface. As a future enhancement, SQL-based query mechanisms should be included. In this context, the query optimizer should be able to handle the specialities of the Multi-Tenant DBMS. Espe-

 Chapter . Outlook and Future Challenges cially, the selection of the best evolution strategy can be implemented as part of the opti- mizer: instead of relying on a threshold-based selection, the optimizer can use a feedback loop for observing the access behavior of the queries and then select the strategy adap- tively. For example, IBM DB uses the feedback loop of LEO, the LEarning Optimizer, to adaptively adjust the access cardinalities in the database statistics (Stillger et al., ; Markl and Lohman, ). When reconciling changes due to the migration from one version of the shared data to another version, there might be conicts that have to be resolved. Our prototype currently has a very rigid behavior as it has a xed precedence by having a bias towards the most local changes. However, this may not be sufficient in every case. us, the overlay mech- anism has to be enhanced to reconcile such conicting changes autonomously by using a application-specic conict resolution strategy. e same mechanism can then be used to process conicting changes due to schema evolution, which may occur when lowering attribute domains, e.g. CHAR(20) to CHAR(10). Furthermore, schema evolution can affect the usage of indexes on attributes other than the PK attributes. As those attributes are located in an area of the physical representation which is subjected to schema evolution, indexes on those attributes may become stale. is issue is not yet addressed in our prototype and may be subject of further research. Closely related to this issue is an even lazier schema evolution strategy. At the moment, the complete tuple is evolved by the schema evolution operator. However, it might be sufficient to evolve only a subset of the attributes, as the other attributes may not be considered. To further lower the TCO, the whole Multi-Tenant DBMS cluster can be managed by an adaptive controller (Gmach et al., ). Each tenant is treated as a service that has to be placed optimally on a set of cluster nodes. Our support for off-line tenants and tenant replication allows for migrating tenants, even if they are on-line.

A. Appendix

A.. Associativity of Data Overlay

In Section .. we dened the semantics of the overlay operator as follows:

overlay(S, T) ∶= S ∪ ( T ∖ ( T ⋉_{pk(T)=pk(S)} S ) )        (A.)
             = S ∪ ( T ▷_{pk(T)=pk(S)} S )                  (A.)
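Read operationally, the definition says: take all tuples of S, and add those tuples of T whose primary key is not overridden in S. A minimal sketch of this semantics over dictionaries keyed by primary key (illustrative only, not the prototype's operator; here S is the tenant-private data and T the shared data) is:

    def overlay(private: dict, shared: dict) -> dict:
        """private (S) and shared (T) map primary keys to tuples; private entries take precedence."""
        result = dict(shared)      # start with the shared data set
        result.update(private)     # tenant-private tuples override shared tuples with the same key
        return result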

eorem A.: e overlay operator as dened above is associative, thus

overlay (A, overlay (B, C)) = overlay (overlay (A, B) , C) ♢

Proof by Contradiction

The join attributes of the anti-join R ▷_pk S are the primary key attributes pk of the relations R and S. The schemas of R and S are identical, as required by the definition of the overlay operator. Thus, for any primary key value x, the transformation

x ∈ π_pk( R ▷_pk S )  ⇔  ( x ∈ π_pk(R) ) ∧ ( x ∉ π_pk(S) )        (A.)

and its negation

x ∉ π_pk( R ▷_pk S )  ⇔  ( x ∉ π_pk(R) ) ∨ ( x ∈ π_pk(S) )        (A.)

can be performed. Furthermore,

( x ∈ π_pk(R) ) ∨ ( x ∈ π_pk( S ▷_pk R ) )
  ⇔ ( x ∈ π_pk(R) ) ∨ ( ( x ∈ π_pk(S) ) ∧ ( x ∉ π_pk(R) ) )        (A.)
  ⇔ ( x ∈ π_pk(R) ) ∨ ( x ∈ π_pk(S) )


and its negation

( x ∉ π_pk(R) ) ∧ ( x ∉ π_pk( S ▷_pk R ) )
  ⇔ ( x ∉ π_pk(R) ) ∧ ( x ∉ π_pk(S) )        (A.)

For the proof by contradiction, we assume

overlay(A, overlay(B, C)) ≠ overlay(overlay(A, B), C)        (A.)

Case  [ ] ∃x ∶ x ∈ πpk (overlay (A, overlay (B, C))) [ ] ∧ x ∉ πpk (overlay (overlay (A, B) , C)) With (A.):

⇒ ∃x ∶ [ ( x ∈ π_pk(A) ) ∨ ( x ∈ π_pk( overlay(B, C) ▷_pk A ) ) ]
        ∧ [ ( x ∉ π_pk( overlay(A, B) ) ) ∧ ( x ∉ π_pk( C ▷_pk overlay(A, B) ) ) ]

With (A.) and (A.):

⇒ ∃x ∶ [ ( x ∈ π_pk(A) ) ∨ ( x ∈ π_pk( overlay(B, C) ) ) ]
        ∧ [ ( x ∉ π_pk( overlay(A, B) ) ) ∧ ( x ∉ π_pk(C) ) ]

Again, with (A.):

⇒ ∃x ∶ [ ( x ∈ π_pk(A) ) ∨ ( x ∈ π_pk(B) ) ∨ ( x ∈ π_pk( C ▷_pk B ) ) ]
        ∧ [ ( x ∉ π_pk(A) ) ∧ ( x ∉ π_pk( B ▷_pk A ) ) ∧ ( x ∉ π_pk(C) ) ]

Finally:

⇒ ∃x ∶ [ ( x ∈ π_pk(A) ) ∨ ( x ∈ π_pk(B) ) ∨ ( x ∈ π_pk(C) ) ]
        ∧ [ ( x ∉ π_pk(A) ) ∧ ( x ∉ π_pk(B) ) ∧ ( x ∉ π_pk(C) ) ]

This is a contradiction.

⇒ –

Case  [ ] ∃x ∶ x ∉ πpk (overlay (A, overlay (B, C))) [ ] ∧ x ∈ πpk (overlay (overlay (A, B) , C))

 A.. Associativity of Data Overlay

⇒ ∃x ∶ [ ( x ∉ π_pk(A) ) ∧ ( x ∉ π_pk( overlay(B, C) ▷_pk A ) ) ]
        ∧ [ ( x ∈ π_pk( overlay(A, B) ) ) ∨ ( x ∈ π_pk( C ▷_pk overlay(A, B) ) ) ]

⇒ ∃x ∶ [ ( x ∉ π_pk(A) ) ∧ ( x ∉ π_pk( overlay(B, C) ) ) ]
        ∧ [ ( x ∈ π_pk( overlay(A, B) ) ) ∨ ( x ∈ π_pk(C) ) ]

⇒ ∃x ∶ [ ( x ∉ π_pk(A) ) ∧ ( x ∉ π_pk(B) ) ∧ ( x ∉ π_pk( C ▷_pk B ) ) ]
        ∧ [ ( x ∈ π_pk(A) ) ∨ ( x ∈ π_pk( B ▷_pk A ) ) ∨ ( x ∈ π_pk(C) ) ]

⇒ ∃x ∶ [ ( x ∉ π_pk(A) ) ∧ ( x ∉ π_pk(B) ) ∧ ( x ∉ π_pk(C) ) ]
        ∧ [ ( x ∈ π_pk(A) ) ∨ ( x ∈ π_pk(B) ) ∨ ( x ∈ π_pk(C) ) ]

⇒ –

Conclusion e assumption (A.) is wrong. us, the overlay operator is associative.



Bibliography

Daniel J. Abadi, Adam Marcus, Samuel Madden, and Katherine J. Hollenbach. Scalable Semantic Web Data Management Using Vertical Partitioning. In Koch et al. (), pages –. ISBN ----.

Rakesh Agrawal, Amit Somani, and Yirong Xu. Storage and Querying of E-Commerce Data. In Apers et al. (), pages –. ISBN ---.

Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and Marios Skounakis. Weaving Relations for Cache Performance. In Apers et al. (), pages –. ISBN ---.

Amazon EC. Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2/, .

Amazon S. Amazon Simple Storage Service. http://aws.amazon.com/s3/, .

Amazon SimpleDB. Amazon SimpleDB. http://aws.amazon.com/simpledb/, .

Peter M. G. Apers, Paolo Atzeni, Stefano Ceri, Stefano Paraboschi, Kotagiri Ramamoha- narao, and Richard T. Snodgrass, editors. VLDB , Proceedings of th International Conference on Very Large Data Bases, September -, , Roma, Italy, . Morgan Kaufmann. ISBN ---.

Stefan Aulbach, Torsten Grust, Dean Jacobs, Alfons Kemper, and Jan Rittinger. Multi-Tenant Databases for Software as a Service: Schema-Mapping Techniques. In Wang (), pages –. ISBN ----.

Stefan Aulbach, Dean Jacobs, Alfons Kemper, and Michael Seibold. A Comparison of Flexible Schemas for Software as a Service. In Çetintemel et al. (), pages –. ISBN ----.

Stefan Aulbach, Dean Jacobs, Jürgen Primsch, and Alfons Kemper. Anforderungen an Datenbanksysteme für Multi-Tenancy- und Software-as-a-Service-Applikationen. In Freytag et al. (), pages –. ISBN ----.

Stefan Aulbach, Michael Seibold, Dean Jacobs, and Alfons Kemper. Extensibility and Data Sharing in Evolving Multi-Tenant Databases. In Proceedings of the th IEEE International Conference on Data Engineering (ICDE), pages –, .


Jay Banerjee, Won Kim, Hyoung-Joo Kim, and Henry F. Korth. Semantics and Implementation of Schema Evolution in Object-Oriented Databases. In Umeshwar Dayal and Irving L. Traiger, editors, SIGMOD Conference, pages –. ACM Press, .

Jennifer L. Beckmann, Alan Halverson, Rajasekar Krishnamurthy, and Jeffrey F. Naughton. Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format. In Ling Liu, Andreas Reuter, Kyu-Young Whang, and Jianjun Zhang, editors, ICDE, page . IEEE Computer Society, .

Phil Bernstein. Google MegaStore. http://perspectives.mvdirona.com/2008/07/10/GoogleMegastore.aspx, .

Philip A. Bernstein, Todd J. Green, Sergey Melnik, and Alan Nash. Implementing Mapping Composition. In Umeshwar Dayal, Kyu-Young Whang, David B. Lomet, Gustavo Alonso, Guy M. Lohman, Martin L. Kersten, Sang Kyun Cha, and Young-Kuk Kim, editors, VLDB, pages –. ACM, . ISBN ---.

Philip A. Bernstein, Todd J. Green, Sergey Melnik, and Alan Nash. Implementing Mapping Composition. VLDB J., ():–, .

P. A. Boncz. Monet: A Next-Generation Database Kernel For Query-Intensive Applications. PhD thesis, Universiteit van Amsterdam, May . URL http://oai.cwi.nl/oai/asset/14832/14832A.pdf.

Ugur Çetintemel, Stanley B. Zdonik, Donald Kossmann, and Nesime Tatbul, editors. Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD , Providence, Rhode Island, USA, June  - July , , . ACM. ISBN ----.

Sang Kyun Cha and Changbin Song. P*TIME: Highly Scalable OLTP DBMS for Managing Update-Intensive Stream Workload. In Nascimento et al. (), pages –. ISBN ---.

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst., (), .

Eric Chu, Jennifer L. Beckmann, and Jeffrey F. Naughton. The Case for a Wide-Table Approach to Manage Sparse Relational Data Sets. In Chee Yong Chan, Beng Chin Ooi, and Aoying Zhou, editors, SIGMOD Conference, pages –. ACM, . ISBN ----.

Kajal T. Claypool, Jing Jin, and Elke A. Rundensteiner. SERF: Schema Evolution through an Extensible, Re-usable and Flexible Framework. In Georges Gardarin, James C. French, Niki Pissinou, Kia Makki, and Luc Bouganim, editors, CIKM, pages –. ACM, . ISBN ---.

Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni. PNUTS: Yahoo!’s Hosted Data Serving Platform. PVLDB, ():–, .

George P. Copeland and Setrag Khoshafian. A Decomposition Storage Model. In Shamkant B. Navathe, editor, SIGMOD Conference, pages –. ACM Press, .

Conor Cunningham, Goetz Graefe, and César A. Galindo-Legaria. PIVOT and UNPIVOT: Optimization and Execution Strategies in an RDBMS. In Nascimento et al. (), pages –. ISBN ---.

Carlo Curino, Hyun Jin Moon, and Carlo Zaniolo. Graceful Database Schema Evolution: The PRISM Workbench. PVLDB, ():–, .

Carlo Curino, Hyun Jin Moon, and Carlo Zaniolo. Automating Database Schema Evo- lution in Information System Upgrades. In Tudor Dumitras, Iulian Neamtiu, and Eli Tilevich, editors, HotSWUp. ACM, . ISBN ----.

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s Highly Available Key-Value Store. In Thomas C. Bressoud and M. Frans Kaashoek, editors, SOSP, pages –. ACM, . ISBN ----.

Bruce Eckel. Thinking in Java (nd ed.): The Definitive Introduction to Object-Oriented Programming in the Language of the World-Wide Web. Prentice Hall PTR, Upper Saddle River, NJ, USA, . ISBN ---.

Ahmed K. Elmagarmid and Divyakant Agrawal, editors. Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD , Indianapolis, Indiana, USA, June -, , . ACM. ISBN ----.

Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems (th Edition). Addison Wesley, March . ISBN .

Leonidas Fegaras and David Maier. Optimizing Object Queries Using an Effective Calculus. ACM Trans. Database Syst., ():–, .

Daniela Florescu and Donald Kossmann. A Performance Evaluation of Alternative Mapping Schemes for Storing XML Data in a Relational Database. Research Report RR-3680, INRIA, . URL http://hal.inria.fr/inria-00072991/PDF/RR-3680.pdf. Project RODIN.

Johann Christoph Freytag, Thomas Ruf, Wolfgang Lehner, and Gottfried Vossen, editors. Datenbanksysteme in Business, Technologie und Web (BTW ), . Fachtagung des GI-Fachbereichs ”Datenbanken und Informationssysteme” (DBIS), Proceedings, .-. März , Münster, Germany, volume  of LNI, . GI. ISBN ----.

Daniel Gmach, Stefan Krompass, Andreas Scholz, Martin Wimmer, and Alfons Kemper. Adaptive Quality of Service Management for Enterprise Services. TWEB, (), .

Goetz Graefe. Sorting And Indexing With Partitioned B-Trees. In CIDR, .

Jim Gray. Tape is dead, Disk is tape, Flash is disk, RAM locality is king. http://research.microsoft.com/en-us/um/people/gray/talks/Flash_is_Good.ppt, .

Jim Gray and Andreas Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, . ISBN ---.

Torsten Grust, Maurice van Keulen, and Jens Teubner. Accelerating XPath evaluation in any RDBMS. ACM Trans. Database Syst., :–, .

James R. Hamilton. On Designing and Deploying Internet-Scale Services. In LISA, pages –. USENIX, .

Stavros Harizopoulos, Daniel J. Abadi, Samuel Madden, and Michael Stonebraker. OLTP Through The Looking Glass, And What We Found There. In Wang (), pages –. ISBN ----.

HBase. HBase. http://hadoop.apache.org/hbase/, .

Sándor Héman, Marcin Zukowski, Niels J. Nes, Lefteris Sidirourgos, and Peter A. Boncz. Positional Update Handling in Column Stores. In Elmagarmid and Agrawal (), pages –. ISBN ----.

Dave Hitz, James Lau, and Michael A. Malcolm. File System Design for an NFS File Server Appliance. In USENIX Winter, pages –, .

Dean Jacobs. Data Management in Application Servers. In Readings in Database Systems, th edition. The MIT Press, . ISBN .

Dean Jacobs and Stefan Aulbach. Ruminations on Multi-Tenant Databases. In Kemper et al. (), pages –. ISBN ----.

Christian S. Jensen and Richard T. Snodgrass. Temporal Data Management. IEEE Trans. Knowl. Data Eng., ():–, .

Christian S. Jensen and Richard T. Snodgrass. Temporal Database. In Liu and Özsu (), pages –. ISBN ----, ----.

Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alex Rasin, Stanley B. Zdonik, Evan P. C. Jones, Samuel Madden, Michael Stonebraker, Yang Zhang, John Hugg, and Daniel J. Abadi. H-Store: A High-Performance, Distributed Main Memory Transaction Processing System. PVLDB, ():–, .

Alfons Kemper and Guido Moerkotte. Object-Oriented Database Management: Applica- tions in Engineering and Computer Science. Prentice-Hall, . ISBN ---.

Alfons Kemper and omas Neumann. HyPer: HYbrid OLTP & OLAP High PERfor- mance Database System. Technical Report, Technische Universität München, . URL http://drehscheibe.in.tum.de/forschung/pub/reports/2010/TUM-I1010. pdf.gz. Alfons Kemper, Donald Kossmann, and Florian Matthes. SAP R/: A Database Appli- cation System (Tutorial). In Laura M. Haas and Ashutosh Tiwary, editors, SIGMOD Conference, page . ACM Press, . ISBN ---. Alfons Kemper, Harald Schöning, omas Rose, Matthias Jarke, omas Seidl, Christoph Quix, and Christoph Brochhaus, editors. Datenbanksysteme in Business, Technologie und Web (BTW ), . Fachtagung des GI-Fachbereichs ”Datenbanken und Information- ssysteme” (DBIS), Proceedings, .-. März , Aachen, Germany, volume  of LNI, . GI. ISBN ----. Setrag Khoshaan and Razmik Abnous. Object Orientation: Concepts, Languages, Databases, User Interfaces. John Wiley & Sons, Inc., New York, NY, USA, . ISBN ---. Christoph Koch, Johannes Gehrke, Minos N. Garofalakis, Divesh Srivastava, Karl Aberer, Anand Deshpande, Daniela Florescu, Chee Yong Chan, Venkatesh Ganti, Carl-Christian Kanne, Wolfgang Klas, and Erich J. Neuhold, editors. Proceedings of the rd Interna- tional Conference on Very Large Data Bases, University of Vienna, Austria, September -, , . ACM. ISBN ----. Maximilian Kögel, Markus Herrmannsdörfer, Jonas Helming, and Yang Li. State-based vs. Operation-based Change Tracking. In MODELS ’ MoDSE-MCCM Workshop, Den- ver, USA, , . URL http://wwwbruegge.in.tum.de/static/publications/ view.php?id=205. Ling Liu and M. Tamer Özsu, editors. Encyclopedia of Database Systems. Springer US, . ISBN ----, ----. David Maier and Jeffrey D. Ullman. Maximal Objects and the Semantics of Universal Relation Databases. ACM Trans. Database Syst., ():–, . Volker Markl and Guy M. Lohman. Learning Table Access Cardinalities with LEO. In Michael J. Franklin, Bongki Moon, and Anastassia Ailamaki, editors, SIGMOD Confer- ence, page . ACM, . ISBN ---.

Todd McKinnon. Plug Your Code in Here – An Internet Application Platform. www.hpts.ws/papers/2007/hpts_conference_oct_2007.ppt, .

MediaTemple. Anatomy of MySQL on the GRID. http://blog.mediatemple.net/weblog/2007/01/19/anatomy-of-mysql-on-the-grid/, .

Peter Mell and Tim Grance. The NIST Definition of Cloud Computing, Version . http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, .

Jim Melton. Advanced SQL : Understanding Object-Relational, and Other Advanced Features. Elsevier Science Inc., New York, NY, USA, . ISBN .

Microso Corporation. Microso SQL Server  Books Online – Using Sparse Columns. http://msdn.microsoft.com/en-us/library/cc280604.aspx, .

Guido Moerkotte and Andreas Zachmann. Towards More Flexible Schema Management in Object Bases. In ICDE, pages –. IEEE Computer Society, . ISBN -- -.

Hyun Jin Moon, Carlo Curino, Alin Deutsch, Chien-Yi Hou, and Carlo Zaniolo. Managing and Querying Transaction-Time Databases Under Schema Evolution. PVLDB, ():–, .

Hyun Jin Moon, Carlo Curino, and Carlo Zaniolo. Scalable Architecture and Query Optimization for Transaction-Time DBs with Evolving Schemas. In Elmagarmid and Agrawal (), pages –. ISBN ----.

Mirella Moura Moro, Susan Malaika, and Lipyeow Lim. Preserving XML Queries During Schema Evolution. In Carey L. Williamson, Mary Ellen Zurko, Peter F. Patel-Schneider, and Prashant J. Shenoy, editors, WWW, pages –. ACM, . ISBN ----.

mysql.com. mysql.com. http://www.mysql.com/, .

Mario A. Nascimento, M. Tamer Özsu, Donald Kossmann, Renée J. Miller, José A. Blakeley, and K. Bernhard Schiefer, editors. (e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, August  - September  , . Morgan Kaufmann. ISBN ---.

Thomas Neumann and Gerhard Weikum. The RDF-3X Engine for Scalable Management of RDF Data. VLDB J., ():–, .

Matthias Nicola.  Best Practices for pureXML Performance in DB2. http://www.ibm.com/developerworks/data/library/techarticle/dm-0610nicola/, .

Oscar Nierstrasz. A Survey of Object-Oriented Concepts. In Object-Oriented Concepts, Databases and Applications. ACM Press and Addison Wesley, .

Patrick E. O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth J. O’Neil. The Log-Structured Merge-Tree (LSM-Tree). Acta Inf., ():–, .

Oracle Corporation. Oracle Streams. http://www.oracle.com/technetwork/testcontent/streams-oracle-streams-twp-128298.pdf, .

Oracle Corporation. Oracle Database 10g Release 2 Online Data Reorganization & Redefinition. http://www.oracle.com/technology/deploy/availability/pdf/ha_10gR2_online_reorg_twp.pdf, .

John K. Ousterhout, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra, Aravind Narayanan, Guru M. Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, and Ryan Stutsman. The Case for RAMClouds: Scalable High-Performance Storage Entirely in DRAM. Operating Systems Review, ():–, .

Hasso Plattner. A Common Database Approach for OLTP and OLAP using an In-Memory Column Database. In Çetintemel et al. (), pages –. ISBN ----.

Meikel Poess and Raghunath Othayoth Nambiar. Tuning Servers, Storage and Database for Energy Efficient Data Warehouses. In Feifei Li, Mirella M. Moro, Shahram Ghandeharizadeh, Jayant R. Haritsa, Gerhard Weikum, Michael J. Carey, Fabio Casati, Edward Y. Chang, Ioana Manolescu, Sharad Mehrotra, Umeshwar Dayal, and Vassilis J. Tsotras, editors, ICDE, pages –. IEEE, . ISBN ----.

PostgreSQL. e world’s most advanced open source database. http://www.postgresql. org, .

Jun Rao and Kenneth A. Ross. Making B+-Trees Cache Conscious in Main Memory. In Weidong Chen, Jeffrey F. Naughton, and Philip A. Bernstein, editors, SIGMOD Conference, pages –. ACM, . ISBN ---.

John F. Roddick. A Survey of Schema Versioning Issues for Database Systems. Information and Software Technology, (), . ISSN -.

John F. Roddick. Schema Evolution. In Liu and Özsu (), pages –. ISBN ----, ----.

John F. Roddick and Richard T. Snodgrass. Schema Versioning. In The TSQL2 Temporal Query Language, pages –. Kluwer, .

Yasushi Saito and Marc Shapiro. Optimistic Replication. ACM Comput. Surv., ():–, .

Salesforce AppExchange. Salesforce AppExchange. http://sites.force.com/appexchange/home, .

Cynthia M. Saracco, Don Chamberlin, and Rav Ahuja. DB2 : pureXML – Overview and Fast Start. IBM, http://www.redbooks.ibm.com/abstracts/sg247298.html?Open, . ISBN .

Ben Shneiderman and Glenn Thomas. An Architecture for Automatic Relational Database System Conversion. ACM Trans. Database Syst., ():–, .

Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. LEO - DB2’s LEarning Optimizer. In Apers et al. (), pages –. ISBN ---.

Michael Stonebraker. e Design of the POSTGRES Storage System. In Peter M. Stocker, William Kent, and Peter Hammersley, editors, VLDB, pages –. Morgan Kauf- mann, . ISBN ---X.

Michael Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Samuel Madden, Elizabeth J. O’Neil, Patrick E. O’Neil, Alex Rasin, Nga Tran, and Stanley B. Zdonik. C-Store: A Column-oriented DBMS. In Klemens Böhm, Christian S. Jensen, Laura M. Haas, Martin L. Kersten, Per-Åke Larson, and Beng Chin Ooi, editors, VLDB, pages –. ACM, . ISBN ---, ---.

Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. The End of an Architectural Era (It’s Time for a Complete Rewrite). In Koch et al. (), pages –. ISBN ----.

Mary Taylor and Chang Jie Guo. Data Integration and Composite Business Services, Part : Build a Multi-Tenant Data Tier with Access Control and Security. http://www.ibm.com/developerworks/db2/library/techarticle/dm-0712taylor/, .

Transaction Processing Performance Council. TPC-C V. – On-Line Transaction Processing Benchmark. http://www.tpc.org/tpcc/, .

Vertica Systems. The Vertica Database. http://www.vertica.com/wp-content/uploads/2011/01/verticadatabasedatasheet1.pdf, .

Jason Tsong-Li Wang, editor. Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD , Vancouver, BC, Canada, June -, , . ACM. ISBN ----.

Craig D. Weissman and Steve Bobrowski. The Design of the force.com Multitenant Internet Application Development Platform. In Çetintemel et al. (), pages –. ISBN ----.

