A Simultaneous Execution Scheme for Database Caching

A Simultaneous Execution Scheme for Database Caching Der Fakultät Mathematik, Naturwissenschaften und Informatik der Brandenburgischen Technischen Universität Cottbus zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften (Dr.-Ing.) genehmigte Dissertation vorgelegt von Diplom-Informatiker Steffen Jurk geboren am 29. November 1974 in Apolda Gutachter: Prof. Dr. rer. nat. habil. Bernhard Thalheim Prof. Dr. rer. pol. habil. Hans-Joachim Lenz Prof. Dr.-Ing. Jens Nolte Tag der mündlichen Prüfung: 12.12.2005 Hiermit erkläre ich, dass ich die vorliegende Dissertation selbständig verfasst habe. Cottbus, den September 22, 2006 ........................ Steffen Jurk, Striesower Weg 22, D–03044 Cottbus Email: [email protected] ii Abstract Database caching techniques promise to improve the performance and scalability of client- server database applications. The task of a cache is to accept requests of clients and to compute them locally on behalf of the server. The content of a cache is filled dynamically based on the application and users’ data domain of interest. If data is missing or concurrent access has to be controlled, the computation of the request is completed at the central server. As a result, applications benefit from quick responses of a cache and load is taken from the server. The dynamic nature of a cache, the need of transactional consistency, and the complex nature of a request make database caching a challenging field of research. This thesis presents a novel approach to the shared and parallel execution of stored procedure code between a cache and the server. Every commercial database product provides such stored procedures that are coded in a complete programming language. Given a request in form of such a procedure, we introduce the concept of split twin transactions that logically split the procedure code into two parts, say A and B, such that A is executed at the cache and B at the server in a simultaneous and parallel manner. Furthermore, we analyse the procedure code to detect suitable parts. To the best of our knowledge, this has not yet been addressed by any existing approaches. Within a detailed case study, we show that our novel scheme improves the performance of existing caching approaches. Furthermore, we demonstrate that different load conditions of the system require different sizes of the parts A and B to gain maximal performance. As a result, we extend database caching by a new dimension of optimization, namely by splitting of the procedure code into A and B. To solve this problem of dynamically balancing the code execution between cache and server, we define the maximum performance of a database cache over time and propose a stochastic model to capture the average execution time of a procedure. Based on the execution frequencies of primitive database operations, the model allows us to partially predict the response times for different sizes of A and B, hence providing a partial solution to the optimization problem. iii iv Zusammenfassung Datenbank-Cache-Techniken versprechen eine Verbesserung von Client-Server-Anwendungen hinsichtlich Performance und Skalierbarkeit. Anfragen werden direkt an den Cache geleitet und mit Hilfe lokal gespeicherter Daten ausgeführt. Der Cache-Inhalt wird dabei dynamisch entsprechend dem Verhalten von Anwendungen und Benutzern gefüllt. Fehlen dem Cache Daten oder erfordert die Ausführung der Anfrage die Prüfung transaktionaler Konsistenz, wird die Anfrage vom Server vervollständigt. Anwendungen profitieren von schnellen Antworten des Caches und der Entlastung des Servers. Das dynamische Umfeld, die transaktionale Kon- sistenz und die Komplexität der Anfragen bilden zusammen einen interessanten Forschungs- bereich f¨rEffizienzsteigerung von Client-Server-Systemen. In der vorliegenden Arbeit wird eine neue Technik zur geteilten und parallelen Ausführung von Prozeduren aufgezeigt. Nahezu jedes kommerzielle Datenbanksystem bietet heutzutage derartige Prozeduren, die in einer vollständigen Programmiersprache implementiert sind und auf der Ebene des Datenbanksystems ausgeführt werden. Als Erweiterung für Prozeduren werden Geteilte Zwillingstransaktionen vorgestellt, die den Code logisch in zwei Teile A und B teilen, so dass A vom Cache und B vom Server zeitgleich ausgeführt werden. Mögliche Teilungen des Codes werden vorweg durch eine Code-Analyse ermittelt. Entsprechend meinen Erkenntnissen wurde dieses Problem bisher von keiner der existierenden Techniken betrachtet. In einer detaillierten Fallstudie wird die verbesserte Performance der hier dargestellten Methoden aufgezeigt. Weiterhin wird gezeigt, dass verschiedene Lastbedingungen eine unter- schiedliche Dimensionierung der Teile A und B zum Erreichen der maximalen Performance erfordern. Somit eröffnet die richtige Wahl von A und B eine neue Dimension zur Optimierung von Datenbank-Cache-Techniken. Um das Problem des dynamischen Ausbalancierens der Code-Ausführung auf einem Cache sowie dem Server zu lösen, wird die zeitabhängige maximale Cache-Performance und ein stochastisches Modell zur Erfassung der durchschnittlichen Ausführungszeit von Prozeduren definiert. Basierend auf Ausführungshäufigkeiten von elementaren Datenbankoperationen ermöglicht das Modell die partielle Vorhersage von Antwortzeiten für verschiedene Dimen- sionierungen von A und B. Somit wird eine partielle Lösung des Optimierungsproblems er- reicht. v vi Contents Abstract iii Zusammenfassung v I Overview and Motivation 1 1 Introduction 3 1.1 DatabaseCaching ................................. 4 1.2 ProblemStatement................................ 6 1.3 Outline ....................................... 8 2 Database Caching Techniques 9 2.1 TheNeedforDatabaseCaching. ... 9 2.2 DatabaseCacheDesign ............................. 12 2.3 Request Execution and Concurrency Control . ....... 15 2.4 Loading the Cache and Self-Adaptation . ...... 18 2.5 A Classification of Caching Schemes . ..... 20 2.6 Related Fields to Database Caching . ..... 22 2.6.1 MobileDatabases.............................. 22 2.6.2 IntegrityEnforcement . 24 3 Problem Analysis 27 3.1 TheExample .................................... 27 3.2 TraditionalCachingSchemes . 28 3.2.1 Avoidance-Based Protocols . 29 3.2.2 Detection-Based Protocols . 31 3.2.3 Discussion.................................. 32 3.3 ASimultaneousExecutionScheme . 33 3.3.1 TwinTransactions ............................. 33 3.3.2 Partial Execution and Re-Use of Query Results . ...... 34 3.3.3 TheRunTimeOptimizationProblem . 39 3.4 SummaryandDiscussion . .. .. .. .. .. .. .. .. 40 vii II Split Twin Transactions for Database Caching 43 4 A Client-Server Database System for Twin Transactions 45 4.1 Overview ...................................... 45 4.2 Data Definition and Stored Procedures . ...... 48 4.2.1 TablesandConstraints . 48 4.2.2 TheLanguageofStoredProcedures . 48 4.2.3 Low-LevelProcedures . 50 4.3 TheExecutionEngine .............................. 52 4.4 TableFragmentation .. .. .. .. .. .. .. .. .. 54 4.4.1 The Definition of Fragments . 54 4.4.2 Fragment Access of Queries and Low-Level Updates . ...... 55 4.4.3 The Management of Fragments . 57 4.4.4 FragmentVersions ............................. 59 4.5 Fragment Replication and Synchronization . ......... 61 4.5.1 Propagate Updates to Caches . 62 4.5.2 Receive Updates from the Server . 63 4.5.3 Dynamic Data Replication . 65 4.6 Executability of IO Statements . ..... 65 4.7 TwinTransactions ................................ 66 4.7.1 Unique Execution and Tuple Identifiers . 67 4.7.2 Consistent Twin Transaction . 69 4.8 RelatedWork.................................... 71 4.9 SummaryandDiscussion . .. .. .. .. .. .. .. .. 72 5 A Simultaneous Execution Scheme for Split Procedure Code 75 5.1 Split Twin Transactions — An Overview . ..... 76 5.2 TheQueryResultCache............................. 79 5.3 ExecutionattheCache ............................. 80 5.3.1 ThePartialExecutionModel . 80 5.3.2 HandlingIOStatements . 83 5.3.3 CompletinganExecution . 84 5.4 ExecutionattheServer ............................ 84 5.4.1 RetrievingQueryResultsfromtheQRC . 84 5.4.2 HandlingIOStatements . 87 5.4.3 VerifyingQueryResults . 88 5.4.4 CompletinganExecution . 88 5.5 Identify Dependencies within the Procedure Code . .......... 89 5.6 Correctness of Split Twin Transactions . ........ 94 5.7 RelatedWork.................................... 95 5.8 SummaryandDiscussion . .. .. .. .. .. .. .. .. 96 viii 6 Case Study — The ONE-System 99 6.1 Architecture.................................... 99 6.1.1 DatabaseSchemaandProcedureCode . 99 6.1.2 The Client-Server Database System . 100 6.1.3 A Traditional Execution Scheme . 101 6.1.4 The Simultaneous Execution Scheme . 103 6.2 ExperimentalSetup ............................... 104 6.2.1 UsedHard-andSoftware .. .. .. .. .. .. .. 104 6.2.2 ParametersofExperiments . 105 6.2.3 TheExecutionofExperiments . 106 6.3 ComparisonoftheExecutionSchemes . 108 6.3.1 The Traditional Execution Scheme . 108 6.3.2 The Simultaneous Execution Scheme With Dependent Queries . 113 6.3.3 The Simultaneous Execution Scheme With Independent Queries . 117 6.4 Other Related Experimental Results . 120 6.4.1 Performance of Slow Database Caches . 121 6.4.2 Sharedvs.NormalMode .. .. .. .. .. .. .. 122 6.5 SummaryandDiscussion . .. .. .. .. .. .. .. .. 125 III Self-Adaptive Load Balancing between Cache and Server 127 7 Dynamics, Performance and the Optimization 129 7.1 The Dynamics of Client-Server Database Systems . ......... 129 7.2 CacheExecutionPerformance. 133 7.2.1

A Simultaneous Execution Scheme for Database Caching

A Program Optimization for Automatic Database Result Caching

Managing Cache Consistency to Scale Dynamic Web Systems

A Trigger-Based Middleware Cache for Orms

Tuning Websphere Application Server Cluster with Caching Ii Tuning Websphere Application Server Cluster with Caching Contents

Database Performance Tuning Guide

Application-Level Caching with Transactional Consistency by Dan R

A Program Optimization for Automatic Database Result Caching

Middle-Tier Database Caching for E-Business *

Performance Analysis of a Database Caching System in a Grid Environment

A Trigger-Based Middleware Cache for Orms

Apollo: Learning Query Correlations for Predictive Caching in Geo-Distributed Systems

SQL Query Cache for the Mysql Database System