Distributed Database Management Systems
Total Page:16
File Type:pdf, Size:1020Kb
DISTRIBUTED DATABASE MANAGEMENT SYSTEMS A Practical Approach SAEED K. RAHIMI University of St. Thomas FRANK S. HAUG University of St. Thomas A JOHN WILEY & SONS, INC., PUBLICATION DISTRIBUTED DATABASE MANAGEMENT SYSTEMS DISTRIBUTED DATABASE MANAGEMENT SYSTEMS A Practical Approach SAEED K. RAHIMI University of St. Thomas FRANK S. HAUG University of St. Thomas A JOHN WILEY & SONS, INC., PUBLICATION All marks are the properties of their respective owners. IBM® and DB2® are registered trademarks of IBM. JBoss®,RedHat®, and all JBoss-based trademarks are trademarks or registered trademarks of Red Hat, Inc. in the United States and other countries. Linux® is a registered trademark of Linus Torvalds. Access®, ActiveX®, Excel®,MS®, Microsoft®, MS-DOS®, Microsoft Windows®,SQLServer®, Visual Basic®,VisualC#®,VisualC++®, Visual Studio®, Windows 2000®, Windows NT®, Windows Server®, Windows Vista®, Windows XP®, Windows 7, and Windows® are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. Oracle® is a registered trademark of Oracle Corporation. Sun®,Java®, and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Sybase® is a registered trademark of Sybase, Inc. UNIX® is a registered trademark of The Open Group. Copyright © 2010 by IEEE Computer Society. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format. Library of Congress Cataloging-in-Publication Data is available. ISBN 978-0-470-40745-5 Printed in the United States of America. 10987654321 To my mother, Behjat, and my father, Mohammad—though they were not given the opportunity to attend or finish school, they did everything they could to make sure that all of their seven children obtained college degrees. S. K. R. To my mother who taught me to love reading, learning, and books; my father who taught me to love mathematics, science, and computers; and the rest of my family who put up with me while we wrote this book —this would not have been possible without you. F. S. H. CONTENTS Preface xxv 1 Introduction 1 1.1 Database Concepts, 2 1.1.1 Data Models, 2 1.1.2 Database Operations, 2 1.1.3 Database Management, 3 1.1.4 DB Clients, Servers, and Environments, 3 1.2 DBE Architectural Concepts, 4 1.2.1 Services, 4 1.2.2 Components and Subsystems, 5 1.2.3 Sites, 5 1.3 Archetypical DBE Architectures, 6 1.3.1 Required Services, 6 1.3.2 Basic Services, 7 1.3.3 Expected Services, 8 1.3.4 Expected Subsystems, 9 1.3.5 Typical DBMS Services, 10 1.3.6 Summary Level Diagrams, 11 1.4 A New Taxonomy, 13 1.4.1 COS Distribution and Deployment, 13 1.4.2 COS Closedness or Openness, 14 1.4.3 Schema and Data Visibility, 15 1.4.4 Schema and Data Control, 16 1.5 An Example DDBE, 17 1.6 A Reference DDBE Architecture, 18 1.6.1 DDBE Information Architecture, 18 1.6.2 DDBE Software Architecture, 20 vii viii CONTENTS 1.6.2.1 Components of the Application Processor, 21 1.6.2.2 Components of the Data Processor, 23 1.7 Transaction Management in Distributed Systems, 24 1.8 Summary, 31 1.9 Glossary, 32 References, 33 2 Data Distribution Alternatives 35 2.1 Design Alternatives, 38 2.1.1 Localized Data, 38 2.1.2 Distributed Data, 38 2.1.2.1 Nonreplicated, Nonfragmented, 38 2.1.2.2 Fully Replicated, 38 2.1.2.3 Fragmented or Partitioned, 39 2.1.2.4 Partially Replicated, 39 2.1.2.5 Mixed Distribution, 39 2.2 Fragmentation, 39 2.2.1 Vertical Fragmentation, 40 2.2.2 Horizontal Fragmentation, 42 2.2.2.1 Primary Horizontal Fragmentation, 42 2.2.2.2 Derived Horizontal Fragmentation, 44 2.2.3 Hybrid Fragmentation, 47 2.2.4 Vertical Fragmentation Generation Guidelines, 49 2.2.4.1 Grouping, 49 2.2.4.2 Splitting, 49 2.2.5 Vertical Fragmentation Correctness Rules, 62 2.2.6 Horizontal Fragmentation Generation Guidelines, 62 2.2.6.1 Minimality and Completeness of Horizontal Fragmentation, 63 2.2.7 Horizontal Fragmentation Correctness Rules, 66 2.2.8 Replication, 68 2.3 Distribution Transparency, 68 2.3.1 Location Transparency, 68 2.3.2 Fragmentation Transparency, 68 2.3.3 Replication Transparency, 69 2.3.4 Location, Fragmentation, and Replication Transparencies, 69 2.4 Impact of Distribution on User Queries, 69 2.4.1 No GDD—No Transparency, 70 2.4.2 GDD Containing Location Information—Location Transparency, 72 2.4.3 Fragmentation, Location, and Replication Transparencies, 73 2.5 A More Complex Example, 73 2.5.1 Location, Fragmentation, and Replication Transparencies, 75 2.5.2 Location and Replication Transparencies, 76 CONTENTS ix 2.5.3 No Transparencies, 77 2.6 Summary, 78 2.7 Glossary, 78 References, 79 Exercises, 80 3 Database Control 83 3.1 Authentication, 84 3.2 Access Rights, 85 3.3 Semantic Integrity Control, 86 3.3.1 Semantic Integrity Constraints, 88 3.3.1.1 Relational Constraints, 88 3.4 Distributed Semantic Integrity Control, 94 3.4.1 Compile Time Validation, 97 3.4.2 Run Time Validation, 97 3.4.3 Postexecution Time Validation, 97 3.5 Cost of Semantic Integrity Enforcement, 97 3.5.1 Semantic Integrity Enforcement Cost in Distributed System, 98 3.5.1.1 Variables Used, 100 3.5.1.2 Compile Time Validation, 102 3.5.1.3 Run Time Validation, 103 3.5.1.4 Postexecution Time Validation, 104 3.6 Summary, 106 3.7 Glossary, 106 References, 107 Exercises, 107 4 Query Optimization 111 4.1 Sample Database, 112 4.2 Relational Algebra, 112 4.2.1 Subset of Relational Algebra Commands, 113 4.2.1.1 Relational Algebra Basic Operators, 114 4.2.1.2 Relational Algebra Derived Operators, 116 4.3 Computing Relational Algebra Operators, 119 4.3.1 Computing Selection, 120 4.3.1.1 No Index on R, 120 4.3.1.2 B + Tree Index on R, 120 4.3.1.3 Hash Index on R, 122 4.3.2 Computing Join, 123 4.3.2.1 Nested-Loop Joins, 123 4.3.2.2 Sort–Merge Join, 124 4.3.2.3 Hash-Join, 126 4.4 Query Processing in Centralized Systems, 126 4.4.1 Query Parsing and Translation, 127 4.4.2 Query Optimization, 128 4.4.2.1 Cost Estimation, 129 x CONTENTS 4.4.2.2 Plan Generation, 133 4.4.2.3 Dynamic Programming, 135 4.4.2.4 Reducing the Solution Space, 141 4.4.3 Code Generation, 144 4.5 Query Processing in Distributed Systems, 145 4.5.1 Mapping Global Query into Local Queries, 146 4.5.2 Distributed Query Optimization, 150 4.5.2.1 Utilization of Distributed Resources, 151 4.5.2.2 Dynamic Programming in Distributed Systems, 152 4.5.2.3 Query Trading in Distributed Systems, 156 4.5.2.4 Distributed Query Solution Space Reduction, 157 4.5.3 Heterogeneous Database Systems, 170 4.5.3.1 Heterogeneous Database Systems Architecture, 170 4.5.3.2 Optimization in Heterogeneous Databases, 171 4.6 Summary, 172 4.7 Glossary, 173 References, 175 Exercises, 178 5 Controlling Concurrency 183 5.1 Terminology, 183 5.1.1 Database, 183 5.1.1.1 Database Consistency, 184 5.1.2 Transaction, 184 5.1.2.1 Transaction Redefined, 188 5.2 Multitransaction Processing Systems, 189 5.2.1 Schedule, 189 5.2.1.1 Serial Schedule, 189 5.2.1.2 Parallel Schedule, 189 5.2.2 Conflicts, 191 5.2.2.1 Unrepeatable Reads, 191 5.2.2.2 Reading Uncommitted Data, 191 5.2.2.3 Overwriting Uncommitted Data, 192 5.2.3 Equivalence, 192 5.2.4 Serializable Schedules, 193 5.2.4.1 Serializability in a Centralized System, 194 5.2.4.2 Serializability in a Distributed System, 195 5.2.4.3 Conflict Serializable Schedules, 196 5.2.4.4 View Serializable Schedules, 196 5.2.4.5 Recoverable Schedules, 197 5.2.4.6 Cascadeless Schedules, 197 5.2.5 Advanced Transaction Types, 197 5.2.5.1 Sagas, 198 5.2.5.2 ConTracts, 199 CONTENTS xi 5.2.6 Transactions in Distributed System, 199 5.3 Centralized DBE Concurrency Control, 200 5.3.1 Locking-Based Concurrency Control Algorithms, 201 5.3.1.1 One-Phase Locking, 202 5.3.1.2 Two-Phase Locking, 202 5.3.1.3