OLAP Solutions Building Multidimensional Information Systems

Second Edition

Erik Thomsen

Wiley Computer Publishing

John Wiley & Sons, Inc. NEW YORK • CHICHESTER • WEINHEIM • BRISBANE • SINGAPORE • TORONTO


Publisher: Robert Ipsen
Editor: Robert Elliott
Developmental Editor: Emilie Herman
Managing Editor: John Atkins
New Media Editor: Brian Snapp
Text Design & Composition: MacAllister Publishing Services, LLC

Designations used by companies to distinguish their products are often claimed as trademarks. In all instances where John Wiley & Sons, Inc., is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

This book is printed on acid-free paper.

Copyright © 2002 by Erik Thomsen. All rights reserved.

Published by John Wiley & Sons, Inc.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail: PERMREQ @ WILEY.COM.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold with the understanding that the publisher is not engaged in professional services. If professional advice or other expert assistance is required, the services of a competent professional person should be sought.

Library of Congress Cataloging-in-Publication Data:

ISBN: 0-471-40030-0

Printed in the United States of America.

Advanced Praise for OLAP Solutions, Second Edition

“Erik Thomsen’s book goes in depth where other books have not. In terms of completeness, readability, and merging theory and practice, I strongly recommend this book. If you buy only one book on OLAP this year, it should be OLAP Solutions, Second Edition.” W.H. Inmon Partner, www.billinmon.com

“Erik Thomsen’s first edition of OLAP Solutions is widely acknowledged as the standard desk reference for all serious practitioners in the areas of OLAP systems, decision support, data warehousing, and business analysis. All of us have benefited immeasurably from its clear, concise, and comprehensive treatment of multidimensional information systems. The second edition of OLAP Solutions not only continues this great tradition, but also contains many new and profound contributions. In particular, by introducing the LC Model for OLAP and providing thorough examples of its application, this book offers a logically grounded, multidimensional framework and language that overcomes the conceptual difficulties generally encountered in the specification and use of OLAP models. OLAP Solutions, Second Edition, will revolutionize how we think about, build, and use OLAP technologies.” John Poole Distinguished Software Engineer, Hyperion Solutions Corporation

“Erik has done it again! I found his latest work updated to reflect valuable new information regarding the fast-paced changes in OLAP tools and methods. I would recommend this book to those who already have the first edition on their bookshelves for the valuable, updated content that it provides and to those who need to move beyond the beginners’ stage of working with OLAP products.” Alan P. Alborn Vice President, Science Applications International Corporation

“This book is a ‘must read’ for everyone that purports to be a player in the field, as well as for developers that are building today’s leading edge analytical applications. Readers who take advantage of this material will form a much greater understanding of how to structure their analytical applications.” Frank McGuff Independent consultant

“This should be required reading for students and practitioners who plan to or are working in the OLAP arena. In addition to having quite a bit of practical advice, it is well suited to be used as a reference for a senior-level undergraduate or graduate- level data mining course. A ‘relational algebra’ for OLAP was sorely needed, and the real-world examples make you think about how to apply OLAP technology to actually help a business.” David Grossman Assistant Professor, Illinois Institute of Technology

“This book is a comprehensive introduction to OLAP analysis. It explains this complex subject and demonstrates the power of OLAP in assisting decision makers.” Mehdi Akhlaghi Information Officer, Development Data Group of the World Bank

To Hannah and Max and the hopefully joyous lives in which you’ll be fully immersed by the time you can understand this book.

Contents

Preface xix
Acknowledgments xxiii

Part One The Need for Multidimensional Technology 1

Chapter 1 The Functional Requirements of OLAP Systems 3 The Different Meanings of OLAP 5 Where OLAP Is Useful 6 Desired Attributes of General Information Processing 6 The Distinction between Transaction and Decision Support Processing 8 Decision Stages 15 The Functional Requirements for OLAP 18 User Challenges 18 The Goal-Challenge Matrix 20 Core Logical Requirements 21 Core Physical Requirements 23 Summary 24

Chapter 2 The Limitations of Spreadsheets and SQL 29 The Evolution of OLAP Functionality in Spreadsheets and SQL Databases 30 Spreadsheets and OLAP 30 OLAP, the , and SQL Databases 31 Proving the Point: Some Examples 32 A Common Starting Point 33 Trying to Provide OLAP Functionality with a Spreadsheet 36 Trying to Provide OLAP Functionality with SQL 38 Summary 44


Chapter 3 Thinking Clearly in N Dimensions 47 Lower-Dimensional Data Sets 47 Beyond Three Dimensions 51 Multidimensional Type Structures (MTSs) 55 Adding a Fourth Dimension 56 Representing Hypercubes on a Computer Screen 57 Analytical Views 64 Summary 66

Part Two Core Technology 69

Chapter 4 Introduction to the LC Model 71 Disarray in the OLAP Space 73 Terms Frequently Used to Refer to the Same Thing 73 Open Issues 73 Critique of Implicit Issue-Specific Approaches to OLAP Modeling 75 Attributes of an Ideal Model 82 Theoretic Groundedness 83 Completeness 84 Efficiency 85 Analytical Awareness 85 Overview of the Located Content Model 85 A Functional Approach 86 Super Symmetry 86 Type Structures 89 Schemas and Models 90 Summary 92

Chapter 5 The Internal Structure of a Dimension 93 Nonhierarchical Structuring 95 Method of Definition 95 Cardinality 95 Ordering 96 Metrics 101 Dimensional Hierarchies 107 Overview 107 Hierarchies in General 108 Ragged Hierarchies 111 Referencing Syntax 115 Leveled or Symmetric Hierarchies 118

Leveled Dimensions with Nominally Ordered Instances 119 Leveled Dimensions with Ordinally Ordered Instances 122 Leveled Dimensions with Cardinally Ordered Instances: Time and Space 123 Space 125 Constant Scaling Factors 125 Multiple Hierarchies per Type 126 Deviant Hierarchies 129 Pseudolevels 129 Ordering 130 Dummy Members 132 Mixed Hierarchies 133 Summary 135

Chapter 6 Hypercubes or Semantic Spaces 137 Meaning and Sparsity 138 Types of Sparsity 139 Defining Application Ranges 143 Meaning and Comparability 148 When a New Dimension Needs a New Cube 148 A Single Domain Schema 151 Multidomain Schemas 152 Not Applicable versus Nonvarying 157 Joining Cubes with Nonconformant Dimensions 161 Summary 163

Chapter 7 Multidimensional Formulas 165 Formulas in a Multidimensional Context 166 Formula Hierarchies or Dependency Trees 166 Cell- and Axis-Based Formulas 169 Precedences 171 Multidimensional Formulas 177 Anatomy of a Multidimensional Aggregation Formula 178 Formula Types 180 Formula Complexities 193 Summary 199

Chapter 8 Links 201 Types of Links 203 Structure Tables and Links 207 Data Tables and Content Links 208

Row-Member Links 209 Column-Member Links 209 Cell Links 211 Table-Member Links 211 Preaggregation 212 Summary 213

Chapter 9 Analytic Visualization 215 What Is Data Visualization? 216 The Semantics of Visualization 219 Graphic versus Tabular-Numeric Representation 219 Picking the Appropriate Visualization Forms 229 Using Data Visualization for Decision Making 233 Business Pattern Analysis 233 Visualizing Multidimensional Business Data 239 Subsetting 241 Repeating or Tiled Patterns 242 Examples of More Complex Data Visualization Metaphors 242 Product Sales Analysis 243 Credit Card Application Analysis 243 Web Site Traffic Analysis 244 Summary 245

Chapter 10 Physical Design of Applications 247 Data Distribution 249 Within Machines 249 Within an Application 251 Across Machines 258 Across Applications (ROLAP and HOLAP) 262 Across Partitions 264 Calculation Distributions 264 Temporal 265 Across Applications and Utilities 266 Common Configurations 267 Departmental: Relational Data Mart with a Multidimensional Client 267 Departmental: Multidimensional Server and Client 268 Enterprise: Relational Warehouse, Multidimensional Midtier Server, and Multidimensional Client 269 Enterprise: Relational Warehouse, Multidimensional Midtier Server, Web Server, and Multidimensional Thin Client 269 Summary 270

Part Three Applications 271

Chapter 11 Practical Steps for Designing and Implementing OLAP Models 273 User Requirements 275 Understand the Current Situation 275 Useful Questions to Ask 276 Questions about the Actual Situation 277 Questions about Problems 277 Information about Constraints 279 Requirements Documentation 279 Solution Design 280 Logical Model Definition 280 Cubes and Dimensions 285 Links 292 Dimension Hierarchies 292 Dimension Members 297 The Decision Context 298 Formulas 299 Deciding When and Where to Compute 302 More Complex Aggregations and Analysis 302 Nonleaf Input Data 303 Nominal versus Ordinal versus Cardinal Analysis 303 Sparsity 304 Auditing 305 Summary 306

Chapter 12 Introduction to the Foodcakes Application Example 307 Preface to the Second Edition 307 Introduction to the Foodcakes International Application 308

Chapter 13 Purchasing and Currency Exchange 313 Background Issues 313 Data Sources 314 Purchases Cube 315 Locator Dimensions 315 Input Contents 317 Basic Derivations 317 Exchange Rate Cube 329 The Dimensions 329 Combined Purchasing and Exchange Rates 337 Summary 339

Chapter 14 Materials Inventory Analysis 341 The Inventory Throughput Cube 342 Data Sources 342 Dimensions 343 Input Contents 345 Viewing Basic Aggregations 346 More Analytical Derivations 351 Systematic Age Tracking 355 Auditing 365 Derived Variables 365 Costing 367 Site by Time-Level Costing 368 Summary 374

Chapter 15 FCI’s Sales and Marketing 375 Cube Dimensions 376 Cube Input Variables 377 Number of Packages Sold: Qty_Sold{pkg} 377 Return to Qty Sold {pkg} 378 List Price per Package {$/pkg} 379 Analyzing Sales 386 Marketing 390 Value States 395 Summary 402

Chapter 16 FCI’s Activity-Based Management 403 Jane’s Presentation 404 The Approach 405 and Asset Dimensions 406 Calculating Total Cost of Goods Sold 409 Tracking Batchids at the Point of Sales 409 Calculating Total Costs 413 Introduction 413 The Enterprise Asset Utilization and Cost Cube 415 Legal Issues: A Reader Exercise 446 Summary 449

Chapter 17 A Computational Example 451 How to Work through the Exercise 452 FCI Dimension Definitions 453 Global Business Rules 454 FCI Schemas 455

Business Process Schemas 455 Sales and Marketing 455 Transportation from Product Inventory to Stores 456 Product Inventory 456 Transportation from Production to Product Inventory 456 Production 457 Materials Inventory 457 Materials Shipping 457 Currency Exchange 458 Purchasing 458 Cube Views 460 Sales and Marketing 460 Transportation from Product Inventory to Stores 461 Product Inventory 464 Transportation from Production to Product Inventory 467 Production 469 Materials Inventory 472 Materials Shipping 475 Currency and Exchange 479 Purchasing 480 Costs Summary 484 FCI Cost-Revenue Analysis Calculation Steps by Schema 485 Sales and Marketing (see Figures 17.2a-b) 485 Transportation from Product Inventory to Stores (see Figures 17.3a-d) 486 Product Inventory (see Figures 17.4a-c) 487 Transportation from Production to Product Inventory (see Figures 17.5a-c) 488 Production (see Figure 17.6) 489 Materials Inventory (see Figures 17.7a-e) 490 Materials Shipping (see Figures 17.8a-e) 492 Currency Exchange (see Figure 17.9) 494 Purchasing (see Figures 17.10a-d) 495

Part Four Further Issues 499

Chapter 18 Multidimensional Guidelines 501 Outline 502 Core Logical Features 502 Structure 503 Operations 506 Representations 510 Noncore Logical 511 Knowledge-Domain 511 Process-Oriented Domains 513

Physical Features 515 Storage and Access 515 Computations 516 Multitier Distribution 516 Optimizations and Efficiencies 517 Platform Issues 517 Multiuser Security 517

Chapter 19 Product Language Comparisons 519 Kickoff Example 520 Sample Schemata 521 Referencing Examples 521 Referencing a Specific Member 522 Referencing a Parent Member 523 Referencing the Set of Children 525 Previous and Next Members (Lag and Lead) 525 Referring to Ancestors 526 Descendants 529 Treatment of Missing and Inapplicable Cells (Instances) 530 Handling Calculation Precedence 531 Basic Formulas 532 Application Ranges in Joined Cubes 536 Summary 539

Chapter 20 DSS Fusion 541 Overview of a Unified Architecture 542 The Decision Function Dimension 543 The Media Dimension 543 Internal and External Representation Dimensions 544 Source versus Derived Expressions 545 The DEER Cycle 545 Levels of Abstraction Dimension 548 Data Versus Knowledge Dimension 551 Concluding Remarks on a Unified Decision Support Framework 553 Smaller, Multifunction Tasks 553 Integrating OLAP and Data Mining or Statistics 553 Business-Driven Analytical Segmentation 560 MD Visualization 561 MD Arrays for Text Retrieval 561 MD Arrays for Data Mining 561 A Single More Fully Integrated Example 561 Summary 567

Appendix A Formula Index 569

Appendix B Standards in the OLAP Marketplace 571

Appendix C LC Language Constructs 577

Appendix D Glossary 581

Appendix E The Relationship between Dimensions and Variables 589

Appendix F Toward a Theoretically Grounded Model for OLAP and Its Connection to the Relational Model and Canonical Logic 603

Appendix G Codd’s 18 Features 615

Notes 621

Bibliography 635

Index 643

Preface

You should read this book if you are interested in technologies such as:
■■ OLAP
■■ Multidimensional information systems
■■ Data warehousing
■■ Databases
■■ Decision support systems (DSS)
■■ Executive information systems (EIS)
■■ Business intelligence (BI)
■■ Business analytics
■■ Data mining
■■ Data visualization
■■ Knowledge management (KM)

Or if you have goals such as:
■■ Synthesizing knowledge from data
■■ Modeling and analyzing large data sets
■■ Using information systems for real competitive advantage
■■ Implementing activity-based management (ABM)
■■ Tracking key performance indicators (KPI)
■■ Thriving on (rather than being burdened by) lots of fast-changing data and information
■■ Thinking clearly about business, economic, and social issues


This book is principally written for business and scientific analysts, IT workers, and computer science and technically oriented business students. The book’s multi-level style (wherein technical points are kept in parentheses, sidebars, endnotes, appendices, or specifically technical chapters) also makes it valuable and accessible to IT managers and executives. It teaches you everything you need to know about the principles of designing and the skills for using multidimensional information systems. The application section of the book contains a computational model of a company and a full set of exercises through which you can improve your model- and formula-building skills. The book was written to be understood by any intelligent person who likes to think, though it helps if you have some understanding of spreadsheets and/or business analysis. If your background includes relational databases, logic, linear algebra, statistics, cognitive science, and/or data visualization, you will more easily appreciate the advanced material.

Preface to the Second Edition

In the last five years since I wrote the first edition of OLAP Solutions, OLAP (where the term OLAP stands for On-Line Analytical Processing, but really means multidimensional modeling and analysis) has evolved from a niche capability to a mainstream and extremely vital corporate technology. The number of business analysts and IT workers who are familiar with at least some of the basic concepts and who need to effectively use the technology has grown enormously. In addition, the way that OLAP functionality is being deployed has also changed during this time. The current trend is for OLAP functionality to be increasingly deployed within or as a layer on relational databases (which does not mean that R/OLAP beat M/OLAP), and for OLAP capabilities to be provided as extensions to general, relationally based data warehousing infrastructures. Thus, I felt that it was necessary to write a new edition of OLAP Solutions that was totally focused on the underlying technology independent of how it is instantiated, and that provided a vendor-neutral set of tools and formula-creation skill-building exercises geared for OLAP application developers and business analysts (as well as university students).

How the Book Is Organized

The book is grouped into four parts plus appendices. Part 1 begins by defining OLAP (or multidimensional information systems), and explaining where it comes from, what its basic requirements are, and how it fits into an overall enterprise information architecture. Treating OLAP as a category of functionality, the section proceeds to show where traditional spreadsheets and databases run aground when trying to provide OLAP-style functionality. The section ends by taking you, in a clear step-by-step fashion, from the world of ordinary rows and columns to a world of multidimensional data structures and flows. By teaching you how to think clearly in N dimensions, you will be prepared to learn how to design and/or use multidimensional information systems.

Part 2 describes the features and underlying concepts that constitute multidimensional technology. Chapter 4 provides an introduction to the language and teaching method used in the rest of the book. Chapter 5 describes the internals of a dimension, including ragged and leveled hierarchies and ordering. Chapter 6 describes multidimensional schemas or models and tackles the problems of logical sparsity and how to combine information from multiple distinct schemas. Chapter 7 teaches you how to write multidimensional formulas and provides you with a set of reusable formula building blocks. Chapter 8 describes the variety of ways that source data gets linked to a multidimensional model. Chapter 9 explains when data visualization is useful, how it works, and shows a variety of techniques for visualizing multidimensional data sets. Finally, Chapter 10 describes the different ways that multidimensional models and the tools that support them can be physically optimized, exploring optimization within machines, across applications, across network tiers, and across time.

Part 3 opens with a practical set of steps for designing and using multidimensional information systems. The subsequent chapters provide product-neutral application descriptions in an engaging dialog format, which will hone your application-building skills if you think along with the dialog. Chapter 12 is an introduction to the enterprise-wide OLAP solution for a vertically integrated international producer of foodcakes that serves as the context for Chapters 13 through 16. Each of Chapters 13 through 16 represents the working-through of the dimension and cube design and the key formula creation for a particular business process (specifically sales and marketing, purchasing, materials inventory, and activity-based management). Each of the application chapters begins with basic issues and moves on to more advanced topics. Chapter 17 is a single fully integrated cross-enterprise calculation example. The chapter takes you on a cross-enterprise journey from an activity-based management perspective, beginning with product sales and ending with materials purchasing, in order to calculate the earnings or losses incurred by a company during the sale of a particular product.

Part 4 extends and summarizes what you have learned from the first three parts. Chapter 18 provides a set of comprehensive criteria for evaluating OLAP products. Chapter 19 provides a comparison between OLAP languages for some of the major commercially available products. Chapter 20 looks ahead and describes the need for and attributes of unified decision support systems.

The appendices provide an index into the formulas defined in the book (Appendix A), descriptions of some industry activities in benchmarking and APIs (Appendix B), a quick summary of the product-neutral language used in the book (Appendix C), a glossary of key terms (Appendix D), some remarks on the distinction between dimensions and measures (Appendix E), some remarks on the logical grounding of multidimensional information systems and how those grounding needs compare with what is offered by canonical logic (Appendix F), Codd’s original 12 Features (Appendix G), and a Bibliography.

Major Differences between the First and Second Edition

Although the second edition is divided into the same set of four parts as the first edition (where they were called sections), and although the overall goals of the two editions are the same, the differences between the two books are large enough that this edition could be considered a separate book. For those of you who are familiar with the first edition, the major differences between the second and first editions are as follows. In Part 1, the main differences are an expanded treatment of the relationship between OLAP and other decision-support technologies and an updated treatment of the challenges of SQL-99 (OLAP extensions) as opposed to the SQL-89 of the first edition. Part 2 introduces a product-neutral OLAP language and devotes entire chapters to each of the major logical components of OLAP technology: the internal structure of a dimension, cubes, formulas, and links. More complete treatments of hierarchies, formulas, visualization, and physical optimization are given. With the exception of Chapter 11 on practical steps, which was expanded, the rest of the chapters in Part 3, which represent a computational model of an enterprise within the context of activity-based management, are entirely new. In Part 4, the chapters on language comparisons and unified decision-support systems are also new.

The Style of the Book

Although the tone of the book is informal, its content is well grounded. (Chapter 5, with its description of the basis for dimensional hierarchies and the distinction between valid and invalid hierarchies and levels, is perhaps the least informal chapter.) Throughout the book, the emphasis is on explanation and demonstration rather than formal proof or ungrounded assertion. Summaries are presented at the end of most chapters. Also, and especially in Parts 1 and 2, I make liberal use of illustrations. If you are learning these concepts (some of which are quite abstract) for the first time, you will benefit from working through the diagrams. Qualifying and additional points are enclosed in the form of sidebars, endnotes, and appendices.

Tools You Will Need

With the exception of Chapter 17 and Appendix F, you do not need any software or hardware to read this book or to perform the exercises contained herein. For Chapter 17, you will probably want to use a calculator to compute the derived values. And for Appendix F, which describes how to build an OLAP solution from a collection of worksheets, should you decide to avail yourself of the electronic worksheets on my companion Web site, you will need to have a copy of Microsoft Excel.

What’s on the Companion Web Site

The companion Web site contains the worksheet data for Appendix F, in the form of Microsoft Excel 98 files. And it contains the answers to all exercises in Chapter 17, also in the form of Microsoft Excel 98 files. The Web site also contains the document “Building an OLAP Solution from Spreadsheets.”

Acknowledgments

Although there are far too many people to name, it is with heartfelt gratitude that I extend thanks to John Silvestri, Joe Bergman, Steve Shavel, Will Martin, George Spofford, David McGoveran, Mike Sutherland, and Stephen Toulmin, for the quality and longevity of their dialog. I also wish to thank Phil de la Zerda of Inter Corporation, who provided me the opportunity to make many improvements to what had been a section on visualization in the first edition (and which became Chapter 9 in this edition); Vectorspace Inc., who provided me some time to finish parts of the manuscript; and to David Friedlander, whose creation of a fishcake manufacturing story as a part of a project on which we collaborated was the inspiration behind the Foodcakes International application presented in Part 3.

Special thanks go to Deanna Young for her work on the book’s many graphic images, for her diligent editing, for managing all the formatting, and for her extraordinary patience in dealing with the myriad changes that occurred throughout the process. I am also grateful for the assistance rendered by the interns Sam Klein and Jorge Panduro with the cube calculation exercise in Chapter 17 and the application views in Part 3 (and Sam also for the syntax and other consistency checks), and for their consistently positive attitudes during the last couple of months. And I am grateful for the hard work and helpful suggestions of Bob Elliot, Emilie Herman, and the production staff at John Wiley & Sons.

This book is also a finer product thanks to the quality and candor of the efforts of the formal reviewers Frank McGuff, John Poole, Pat Bates, David McGoveran, George Spofford, John O’Connor, David Grossman, and Earle Burris.

Many extra special thanks go to George Spofford for his contributions as main author of Chapter 8 and co-author of Chapter 19.


Sincere thanks are also extended to Nigel Pendse for those contributions to the first edition (as the main author of Chapters 8 and 9) that continue to live on in Chapter 10 of this edition. And exceedingly special thanks are also extended to Steve Shavel for his contribution as main author of the Tractarian Approach section in Appendix F. The responsibility for any errors or omissions in those sections remains mine.

Loving thanks also go to my parents, who raised me to be observant and to think for myself.

This book was delayed for almost two years due to work pressures. Then, shortly after I committed to and embarked upon writing, I learned that my wife Marjorie was pregnant with twins, and so was forced to concentrate on this book during the same time that I would have preferred to focus on her extremely special needs and the preparatory needs of our children-to-be. Thus I am enormously grateful to her for the supporting and selfless (need I say saint-like) attitude she exhibited during the past nine months. Finally, I would like to thank the unborn presences of our children that kept me inspired, especially during the writing of Part 3, where the characters Lulu and Thor were, no surprise, proxy for our known twins of unknown gender. As it turns out, they were born during the copyedit phase, and so I am delighted to announce that Lulu and Thor have become Hannah and Max.

PART One

The Need for Multidimensional Technology

A basket trap is for holding fish; but when one has got the fish, one need think no more about the basket. Words are for holding ideas; but when one has got the idea, one need think no more about the words.

CHUANG TSU

CHAPTER 1

The Functional Requirements of OLAP Systems

Before you can appreciate or attempt to measure the value of a new technology, you need to understand the full context within which the technology is used. There’s no point in trying to convince someone that he needs to adopt a revolutionary new mousetrap technology if that person doesn’t know what mice are, or what it means to catch something. And so it is with On-Line Analytical Processing (OLAP) or multidimensional information systems.

The primary purpose of this chapter is to provide a supporting context for the rest of this book by answering the most common questions of someone who is either coming across the term OLAP for the first time or has had experience with OLAP, but isn’t comfortable explaining its origins or essential attributes. Those questions include the following:
■■ What does the term OLAP mean or refer to: an activity, a set of constraints on an activity, a product, a collection of features, a database, a language, or what?
■■ What kinds of problems can OLAP help me solve? Can it help me figure out which products or customers are profitable? Can it help me pick better stocks?
■■ If so, why? If not, why not?
■■ For the class of problems that OLAP helps solve, what are OLAP’s nearest competitors?
■■ What makes OLAP better? (This is addressed in Chapter 2.)
■■ Is there anything that all OLAP has in common?
■■ What kinds of OLAP are there?

A secondary purpose for this chapter is to deflate as many as possible of the pseudo-technical buzzwords floating around the industry so that the reader is free to concentrate on the essential topics in this book. Towards that end, this chapter will describe the following:1
■■ The different meanings of OLAP
■■ The type of activity for which OLAP is extremely useful
■■ The functional requirements for OLAP

Since there are many related topics that I wish to address in this opening chapter, I have made liberal use of sidebars and endnotes. Unfortunately, the quantity of related topics is more a reflection of the volume of marketing noise in the industry than it is a reflection of any underlying scientific complexity. Figure 1.1 represents a diagrammatic summary of Chapter 1 with the links identifying the place in the main argument to which each sidebar or endnote is linked.

Figure 1.1 Diagrammatic summary of Chapter 1.

In short, the functional requirements for OLAP are as follows:
■■ Rich dimensional structuring with hierarchical referencing
■■ Efficient specification of dimensions and dimensional calculations
■■ Separation of structure and representation
■■ Flexibility
■■ Sufficient speed to support ad hoc analysis
■■ Multi-user support

These requirements derive from the timeless goals of information processing, the distinct optimization requirements for decision support versus transaction processing, the distinct functional requirements for descriptive modeling versus other recursive stages of decision-oriented information processing, the application range distinction between different layers of computing architectures, and the challenges frequently found within the boundaries of Global 2000 corporations.

The Different Meanings of OLAP

As illustrated in Figure 1.2, the term OLAP has several meanings. The reason for this is that the essential elements in OLAP are expressible across several technology layers from storage and access to language layers. Roughly, one can speak of OLAP concepts, OLAP languages, OLAP product layers, and full OLAP products.

OLAP concepts include the notion or idea of multiple hierarchical dimensions and can be used by anyone to think more clearly about the world, whether it be the material world from the atomic scale to the galactic scale, the economics world from micro agents to macro economies, or the social world from interpersonal to international relationships. In other words, even without any kind of formal language, just being able to think in terms of a multi-dimensional, multi-level world is useful regardless of your position in life.

OLAP formal languages, including Data Definition Language (DDL), Data Manipulation Language (DML), Data Representation Language (DRL), and associated parsers (and optional compilers), could be used for any descriptive modeling, be it transactional or decision support. In other words, the association of OLAP with decision support is more a function of the physical optimization characteristics of OLAP products than any inherent characteristics of OLAP language constructs.

OLAP product layers typically reside on top of relational databases and generate SQL as the output of compilation. Data storage and access is handled by the database. Full OLAP products that need to include a compiler and storage and access methods are optimized for fast data access and calculations and are used for Decision Support System(s) (DSS) derived data descriptive modeling. The boundary between OLAP languages and products is not sharp.

Figure 1.2 The different meanings of OLAP. (The figure arranges OLAP thinking, OLAP languages, OLAP product layers, and fully optimized OLAP products against a stack of technology layers, from concepts, language/parsers, compilers, and storage/access down to the file system, system topology, hardware components, and hardware materials.)
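The idea of multiple hierarchical dimensions needs no particular product to be useful. As a minimal illustration (not from the book; the product, store, and month names and the figures are invented), the Python sketch below treats a cube as cells addressed by one member per dimension and rolls the time dimension up one level of its hierarchy:

```python
# Minimal sketch: a cube of unit sales keyed by (product, store, month),
# plus a month -> quarter mapping that expresses one level of a time hierarchy.
sales = {
    ("Ale", "Boston", "2001-01"): 120,
    ("Ale", "Boston", "2001-02"): 95,
    ("Ale", "Boston", "2001-03"): 143,
    ("Ale", "Albany", "2001-01"): 80,
    ("Ale", "Albany", "2001-02"): 88,
    ("Ale", "Albany", "2001-03"): 97,
}

month_to_quarter = {"2001-01": "2001-Q1", "2001-02": "2001-Q1", "2001-03": "2001-Q1"}

def rollup_to_quarter(cube):
    """Aggregate the time dimension one level up its hierarchy."""
    out = {}
    for (product, store, month), units in cube.items():
        key = (product, store, month_to_quarter[month])
        out[key] = out.get(key, 0) + units
    return out

print(rollup_to_quarter(sales))
# {('Ale', 'Boston', '2001-Q1'): 358, ('Ale', 'Albany', '2001-Q1'): 265}
```

The same pattern extends to any number of dimensions and levels, which is what the later chapters formalize.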

Where OLAP Is Useful

Desired Attributes of General Information Processing

The cornerstone of all business activities (and any other intentional activities for that matter) is information processing. This includes data collection, storage, transportation, manipulation, and retrieval (with or without the aid of computers). From the first sheep herders who needed to tell when sheep were missing, to the Roman empire that required status reports on its subjugates, to the industrial barons of the 19th century who needed to keep track of their rail lines and oil fields, to modern day enterprises of every variety, good information processing has always been essential to the survival of the organization.

The importance of good information can be thought of as the difference in value between right decisions and wrong decisions, where decisions are based on that information. The larger the difference between right and wrong decisions, the greater the importance of having good information.

THE MANY FLAVORS OF OLAP

As if the marketing term OLAP wasn’t enough, many vendors and a few industry pundits felt compelled—especially between 1995 and 1998—to create variants, typically in the form of a single consonant added to the front of the term OLAP to distinguish their letter flavor of OLAP from the others. The original users of OLAP letter flavors were the vendors, such as MicroStrategy, who were selling OLAP product layers that sat on top of relational database systems and issued SQL to the database in response to user input. They offered only minor OLAP calculation capabilities and were generally read-only systems. Nevertheless, once relational databases beat OLAP systems as the repository of choice for data-warehousing data (a non-contest from the beginning), it was only natural for them to push the argument and claim that Relational OLAP (ROLAP) was better than OLAP.2 That claim motivated the press to rebrand the non-ROLAP vendors as MOLAP to mean, of all things, multidimensional OLAP. Of course, once the cat was out of the bag, everybody needed a letter. I encountered DOLAP for database OLAP and DOLAP for desktop OLAP, HOLAP for hybrid OLAP, WOLAP for web OLAP, as well as M and R for mobile and remote OLAP.

In April 1997, I chaired what was called the “MOLAP versus ROLAP debate.”3, 4 Unfortunately, asking the question “Which is better, MOLAP or ROLAP?” makes as little sense as asking “Which is better, a car or a boat?” Obviously it depends what you’re trying to do—cross town or cross a lake—and it depends on your constraints. The existence of the ROLAP versus MOLAP debate is based on the false premise that the choice is binary. In fact the integration of multidimensional capabilities and relational capabilities is better described by a spectrum of possibilities where the notions of pure ROLAP and pure MOLAP are unattainable, theoretical limits.

In short, relational database products are far better equipped to handle the large amounts of data typically associated with corporate data-warehousing initiatives. Multidimensional databases are far better equipped to provide fast, dimensional-style calculations (although, as you’ll learn in Chapter 2, SQL databases are evolving to more efficiently support OLAP-style calculations). Thus, most organizations need some blend of capabilities, which, if it needed a letter flavor, would be HOLAP. But, as any proper understanding of OLAP would distinguish between the language or logical aspects of OLAP and its physical implementation, such a proper understanding of OLAP reveals that physically it can have any flavor. Thus the concept of H is subsumed in the physical characteristics of OLAP and no additional letter flavoring is required.
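One way to picture the blend of capabilities described above is a layered setup in which detail rows stay in a relational store while frequently requested aggregates are cached in a small multidimensional-style structure. The sketch below is an illustration of that idea only, not any vendor's architecture; the table, columns, and figures are invented.

```python
# Illustrative sketch: detail facts live in a relational store (SQLite here),
# while an in-memory cache answers repeated aggregate requests without
# re-scanning the detail rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Ale", "East", 100.0), ("Ale", "West", 150.0), ("Stout", "East", 80.0)],
)

aggregate_cache = {}  # product -> total sales, filled on first request

def total_sales(product):
    if product not in aggregate_cache:
        row = conn.execute(
            "SELECT SUM(amount) FROM sales WHERE product = ?", (product,)
        ).fetchone()
        aggregate_cache[product] = row[0] or 0.0
    return aggregate_cache[product]

print(total_sales("Ale"))  # 250.0, computed from the relational store
print(total_sales("Ale"))  # 250.0, served from the cached aggregate
```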

For example, poor information about consumer retail trends results in poor buying and allocation decisions for a retailer, which results in costly markdowns for what was overstocked and lost profit-making opportunity for what was understocked. Retailers thus tend to value accurate product-demand forecasts highly. Good information about world events helps financial traders make better trading decisions, directly resulting in better profits for the trading firm. This is very valuable. Major trading firms invest heavily in information technologies. Good traders are handsomely rewarded.

Regardless of what information is being processed or how it is being processed, the goals or desired attribute values are essentially the same. Good information needs to be existent, accurate, timely, and understandable. All of the requirements are important. Imagine, for example, that you possessed the world’s only program that could accurately predict tomorrow’s stock prices given today’s. Would you suddenly become fabulously wealthy? It also depends on the existence, timeliness, and understandability of the information. The program would be worthless if it could never get access to today’s stock prices, or if it took so long to calculate the predictions that by the time you had independently calculated them they were already in the paper, or if you could access them with sufficient time to act, but received the information in some unintelligible form (like a typical phone bill). Thus, OLAP, like any other form of information processing, needs to provide existent, timely, accurate, and understandable information.

The Distinction between Transaction and Decision Support Processing

Purchasing, sales, production, and distribution are common examples of day-to-day operational business activities. Resource planning, capital budgeting, strategic alliances, and marketing initiatives are common examples of business activities that generate and use analysis-based decision-oriented information.

The information produced through these higher-level activities is analysis-based because some data analysis, such as the calculation of a trend or a ratio or an aggregation, needs to occur as a part of the activity. The information is also decision-oriented because it is in a form that makes it immediately useful for decision making. Knowing which products or customers are most profitable, which pastures have the most lush grass, or which outlets have slipped the most this year is the kind of information that needs to be known in order to make decisions such as which products should have their production increased and which customers should be targeted for special promotions, which fields should carry more sheep, or which outlets should be closed.

DW/DSS/BI/OLAP/ABDOP

In 1996, when I wrote the first edition, there was no term that adequately covered the whole of what we called analysis-based decision-oriented process (ABDOP). The focus of data warehousing was still very supply sided. The term decision support was end-user centric. I referred to OLAP and data warehousing as complementary terms within ABDOP. Since that time, the scope of the terms data warehousing and decision support has expanded to the point that they can, but do not typically, refer to the whole of what I called ABDOP. The term business intelligence also gained popularity and could also claim to cover equal ground, though it typically focuses on more end-user access issues. As of the time that I am writing this, May 2001, I most frequently see the term data warehousing used in conjunction with either the term decision support or business intelligence to refer to the whole of what I call the ABDOP space, without actually giving a name to that whole.5

The decision orientation of the analysis is essential. It serves to direct analysis towards useful purposes. In contrast, many operational activities are decision oriented without being based on analysis. For example, if a credit card customer asks to have her or his bill sent to an address other than her or his principal residence, a decision needs to be made. If the company policy states that bills must be sent to the customer’s place of residence, then the decision is no. The policy information was decision oriented, but there was no analysis involved in the decision (at least not apparently).

Together, operations and decision-oriented analysis are at the core of all business activities, independent of their size, industry, legal form, or historical setting. This is illustrated in Figure 1.3, which highlights a number of diverse businesses in terms of their operational and analysis activities. Figure 1.4 shows the relationship between operations and analysis-based decisions for a merchant in 15th century Venice. Analysis-based decision-oriented activities take normal operating events (such as how much fabric is purchased on a weekly basis, or weekly changes in the internal inventory of fabrics, or sales of spices as inputs) and return changes to operating events (such as changes in how much fabric is bought on a weekly basis, or changes in what clothes are produced, or changes in the selling price for spices) as decision outputs.

For small businesses, it is common for the same inputs, including software, to be used in multiple ways. For a small book shop, a software consultant, or a donut stand, all operating and analysis aspects of the company may be effectively run by one person with a single program. Such a program might be a spreadsheet, a desktop database, or a prepackaged solution. For medium- to large-sized businesses and organizations, however, the world is significantly more complex. This creates a natural tendency for specialization.

Business | Operation | Decision-oriented Analysis
Sheep farming, 5000 B.C. | Raise sheep for milk, wool, meat | Grow/shrink herd, stay, or move on
Trader in Venice, 15th century | Buy/sell fabrics and spices | Pricing, suppliers, product line
Railroad, 19th century | Transport goods and people | Track location, labor source
Bank, 20th century | Lending/borrowing money | Interest rates, target industries, target customers, portfolio analysis
Retail, 20th century | Buy/sell consumer clothing goods; stocking, transporting | How much shelf space/product, when to mark down, how much to buy

Figure 1.3 Operations and analysis activities for a variety of businesses.

Figure 1.4 Operations and analysis activities for a merchant in Venice. (Discrete operational processes in the business environment, such as stocking fabrics, producing garments, transporting spices and clothes, and selling clothing and spices, supply data input for analysis; decision-oriented analysis returns analysis-based decisions, such as what fabrics and spices to buy and what clothing and spices to sell, back to those processes.)

In a typical flow of daily operating events for an even moderately complex company, potential customers may ask sales staff questions about available products and make product purchase decisions that are actualized in sales transactions. In parallel with sales events, products are produced, their inputs are purchased, and all stages of finished goods are transported and stocked. The sales, production, and cost information that is constantly being generated would be recorded and managed in one or more database(s) used for operational purposes. To answer customer questions, perform sales transactions, and for other operational tasks, employees query these databases for operational information.

Operational software activities tend to happen with a relatively constant rate (during periods of normal operations and excepting certain peaks). Data is updated as frequently as it is read. The data represents a current snapshot of the way things are, and each query goes against a small amount of information. Operational queries tend to go against data that was directly input. And the nature of the queries is generally understood in advance (or else you’ll be talking to a supervisor). For example, when a customer tells you her account number and you’ve pulled up her record from the database, you might ask that customer to tell you her name and address, which you would then verify against your record. Verifying a name and address involves retrieving and comparing nonderived data in the sense that the address information was not calculated from other inputs. The address was more likely furnished by the customer, perhaps in a previous order. The set of operations queries that are likely to be performed, such as retrieving the customer’s record and verifying the name and address, is knowable in advance. It frequently follows a company procedure.
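The difference between this kind of operational query and the analytical queries discussed a little further on can be sketched in a few lines of Python. The example is illustrative only; the customers, order lines, and amounts are invented. The first function retrieves and compares a single piece of directly input, nonderived data, while the second derives one number by aggregating over every matching detail row.

```python
# Operational versus analytical access, in miniature.
customers = {
    "C1001": {"name": "A. Jones", "address": "12 Elm St."},
    "C1002": {"name": "B. Smith", "address": "9 Oak Ave."},
}
order_lines = [  # (customer_id, product, revenue, cost)
    ("C1001", "Ale", 40.0, 25.0),
    ("C1002", "Ale", 35.0, 22.0),
    ("C1001", "Stout", 20.0, 16.0),
]

def verify_address(customer_id, claimed_address):
    """Operational: fetch one record of nonderived data and compare it."""
    return customers[customer_id]["address"] == claimed_address

def product_profit(product):
    """Analytical: a derived value funneled from every matching detail row."""
    return sum(rev - cost for cid, prod, rev, cost in order_lines if prod == product)

print(verify_address("C1001", "12 Elm St."))  # True
print(product_profit("Ale"))                  # 28.0
```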

DECISION SCOPE

It is popular and convenient to think of operations and decision-oriented analysis as two distinct categories, especially because they correspond to the two major emphases in physical optimization—update speed and access speed. That is why I use the distinction in this book. The distinction, however, is more appropriately thought of in terms of a spectrum, much like hot and cold are more accurately described in terms of temperature differences. The common denominator for both forms of information processing is decision-making; the difference is the scope of inputs and outputs.

You can look at everyone in an organization from the part-time mail sorter to the chief executive as engaged in a continual process of decision-making. When you call a catalog company and the call center representative takes your name down after you’ve said it on the phone, he is making a decision (that he knows how to spell your name); so is the CEO when she decides to sell the company. Both persons are making decisions. The difference is scope. And scope is not binary. Figure 1.5 shows a collection of decisions made by persons within an organization arranged by the scope of the inputs to the decision and the outcome of the decision. The bottom of the triangle shows the scope of the inputs and outputs. The top of the triangle represents the decision.

Figure 1.5 Decision scope. (A triangle with OLTP transactions spanning the wide base, labeled by the scope of inputs and outputs, and DSS/BI/OLAP decisions toward the narrow top.)

In contrast to operations-oriented information activities and on a less frequent basis, managers and analysts may ask higher-level analytical questions such as
■■ What products have been most profitable for the company this year?
■■ Is it the same group of products that were most profitable last year?

■■ How is the company doing this quarter versus this same quarter last year?
■■ What kinds of customers exhibit the greatest loyalty?

The answers to these types of questions represent information that is both analysis based and decision oriented. The volume of analysis-based decision-oriented software activities may fluctuate dramatically during the course of a typical day. On average, data is read more frequently than written and, when written, it tends to be in batch updates. Data represents current, past, and projected future states, and single operations frequently involve many pieces of information at once. Analysis queries tend to go against derived data, and the nature of the queries is frequently not understood in advance. For example, a brand manager may begin an analytical session by querying for brand profitability by region. Each profitability number refers to the average of all products in the brand for all places in the region where the products are sold for the entire time period in question. There may be literally hundreds of thousands or millions of pieces of data that were funneled into each profitability number. In this sense, the profitability numbers are high level and derived. If they had been planned numbers, they might have still been high level, but directly entered instead of derived, so the level of atomicity for a datum is not synonymous with whether it is derived. If the profitability numbers look unusual, the manager might then begin searching for why they were unusual. This process of unstructured exploration could take the manager to any corner of the database. The differences between operational and analysis-based decision-oriented software activities are summarized in Table 1.1.

As a result of these differences between operational and analysis-based decision-oriented software activities, most companies of medium or greater size use different software products on different hardware systems for operations and for analysis. This is essentially because of three reasons:
1. Typical Global 2000 companies need software that is maximally efficient at operations processing and at analysis-oriented processing.
2. Fast updating, necessary for maximally efficient operations processing, and fast calculating (and the associated need for fast access to calculation inputs), necessary for maximally efficient analysis-oriented processing, require mutually exclusive approaches to indexing.
3. Analysis-based decision-oriented activities should have no impact on the performance of operational systems.

In a nutshell, whereas a typical family can get by with a station wagon for cruising around country roads and transporting loads, large corporations need racecars and trucks.

Software products devoted to the operations of a business, built principally on top of large-scale database systems, have come to be known as On-Line Transaction Processing systems or OLTP. The development path for OLTP software has followed a pretty straight line for the past 35 years. The goal has been to make systems handle larger amounts of data, process more transactions per unit time, and support larger numbers of concurrent users with ever-greater robustness. Large-scale systems process upwards of 1,000 transactions per second. Some, like the airline reservation system SABRE, can accommodate peak loads of over 20,000 transactions per second.

Table 1.1 A Comparison of Operational and Analysis-Based Decision-Oriented Information-Processing Activities

OPERATIONAL ACTIVITIES | ANALYSIS-BASED DECISION-ORIENTED ACTIVITIES
More frequent | Less frequent
More predictable | Less predictable
Smaller amounts of data accessed per query | Larger amounts of data accessed per query
Query mostly raw data | Query mostly derived data
Require mostly current data | Require past, present, and projected data
Few, if any, complex derivations | Many complex derivations

In contrast, software products devoted to supporting ABDOP have gone under a variety of market category names. This reflects the fact that the market for these products has been more fragmented, having followed what may seem like a variety of paths during the past 35 years. In addition to the power analyst-aimed DSS products of the 1970s and the executive-aimed EIS products of the 1980s, spreadsheets and statistical or data-mining packages, and inverted file databases (such as M204 from CCA, and Sybase IQ), not to mention what are now called OLAP packages, have all been geared at various segments of the ABDOP market. This market has been called at various times data warehousing, business intelligence, decision support, and even OLAP. For reasons stated in endnote 1, I use the acronym ABDOP.

Figure 1.6 represents the ABDOP category. It shows the chain of processing from source data to end-user consumption. (Of course that chain is just a link within a larger and iterative cycle of decision making.) In between sources and uses, there may be multiple tiers of data storage and processing. Let’s look at this more closely.

At one end there are links (possibly many) to data sources. These sources may include transaction systems and/or external data feeds such as the Internet or other subscription services. Note that the actual data sources, including the Internet, straddle the boundaries of the category. This is because boundary-straddling data also participates in some other functional category. For example, the transaction data also belongs to an OLTP system. Since the source data is generally copied into the ABDOP category, data and meta data (or data about the data) from the source(s) need(s) to be kept in sync with data in the ABDOP data store(s).

Since there are potentially multiple data sources, it may be necessary to integrate and standardize the information coming from the different sources into a common format. For large corporations there are frequently several stages of integration. Common types of integration include subject dimension integration (creating a single customer or supplier dimension from multiple source files) and metric integration (ensuring that derived measures are either calculated the same way or that their different methods of calculation are well documented). Common types of standardization include standardizing the way a male/female binary response variable is encoded (such as mandating the use of Booleans) or standardizing the way a numeric value is encoded (such as mandating the use of eight-byte floating point numbers).
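A toy sketch of those two steps may help. It is not drawn from the book, and the field names, source records, and merge rule are invented: it standardizes a male/female response and a numeric value, then integrates two differently keyed source files into a single customer dimension.

```python
# Toy integration/standardization pass over two invented source files.
source_a = [{"cust": "Jones", "sex": "F", "credit_limit": "1000"}]
source_b = [{"customer_name": "JONES", "gender": "female", "limit": 1500}]

def standardize_gender(value):
    # One mandated encoding: a Boolean that is True for female.
    return str(value).strip().lower() in ("f", "female")

def standardize_amount(value):
    # One mandated encoding: an eight-byte float.
    return float(value)

customer_dimension = {}
for rec in source_a:
    key = rec["cust"].strip().upper()
    customer_dimension[key] = {
        "is_female": standardize_gender(rec["sex"]),
        "credit_limit": standardize_amount(rec["credit_limit"]),
    }
for rec in source_b:
    key = rec["customer_name"].strip().upper()
    row = customer_dimension.setdefault(
        key, {"is_female": standardize_gender(rec["gender"]), "credit_limit": 0.0}
    )
    # Same customer arriving from a second source: merge rather than duplicate.
    row["credit_limit"] = max(row["credit_limit"], standardize_amount(rec["limit"]))

print(customer_dimension)
# {'JONES': {'is_female': True, 'credit_limit': 1500.0}}
```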

14 Chapter 1

External End-user Data Analysts Source Accuracy, Calculations, consistency, representations availability

Dept class

Enterprise Operational class data and Dept Systems analysis server class

Dept class

Internet

Arrows indicate the direction of data flow ABDOP boundaries

Figure 1.6 The ABDOP category inT context. E calculation are well documented). Common types of standardization include stan- dardizing the way a male/female binaryA response variable is encoded (such as man- dating the use of Booleans) or standardizing theM way a numeric value is encoded (such as mandating the use of eight-byte floating pointing numbers). The raw data for analysis may be real or virtual. If it is real, this means there is an actual data store. If it is virtual, there may only Fexist links to the real source data that get invoked at request time, so the requester mayL not know whether the data requested is actually present in a single data store. (Of course,Y the time it takes to process a query may be an indication.) If the data store is real, there needs to be at least one data server. Assuming that the organization in question is of moderate to large size, it is reason- able to assume that there exist two or more tiers of servers. An enterprise class data and analysis server is one that is in the relative category of most powerful server in the organization. (The term relative is used because one company’s enterprise class server may be another company’s PC.) There may be one enterprise class data server; or there may be many. Regardless of whether the organization is using a single processor-based server with 512MB of RAM and a 32-GB hard drive or a cluster of eight 16-way SMP boxes each with 32GB of main memory and each connected to an eight-TB disk array, the servers are more than just data servers. ABDOP servers need to perform analytical calculations. Some of the calculations may be associated with loads and performed in bulk. Others may be initiated by end-users.


Again, assuming the organization is of moderate to large size, a single enterprise class data and analysis server will connect to a number of department-level data and analysis servers. Network wires, if not WANs, may be used here and bandwidth may pose a problem (as you will see in Chapter 10). At each stage in the ABDOP chain, from the original sources to final uses, the data is refined like any other raw material destined for producthood. When it finally reaches the user, the information has been cleansed, integrated, aggregated, extended, and otherwise enhanced to the point where it is immediately useful for decision making.
ABDOP is a natural, or software-independent, category of information processing that serves as the grounded framework for defining OLAP, data mining, decision optimization, data transformation tools, and relational databases as used for creating warehouse-like decision-support schemata. OLAP products, which are optimized for data access over data updates, focus on ABDOP rather than operational activities.
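As a rough illustration of the integration and standardization work described above, the following sketch (mine, not the book's) uses the pandas library on two hypothetical source feeds, mandating a Boolean encoding for a male/female response variable and eight-byte floats for numeric values.

```python
# A minimal sketch of source-data standardization; field names are hypothetical.
import pandas as pd

# Two source feeds that encode the same attributes differently
feed_a = pd.DataFrame({"customer_id": [1, 2], "gender": ["M", "F"], "revenue": ["1200.50", "980"]})
feed_b = pd.DataFrame({"customer_id": [3, 4], "gender": ["male", "female"], "revenue": [450, 1310.25]})

def standardize(feed: pd.DataFrame) -> pd.DataFrame:
    out = feed.copy()
    # Mandate a single Boolean encoding for the male/female response variable
    out["is_female"] = out["gender"].str.lower().isin(["f", "female"])
    out = out.drop(columns=["gender"])
    # Mandate eight-byte floating-point numbers for numeric values
    out["revenue"] = out["revenue"].astype("float64")
    return out

# Integrate the standardized feeds into a common format
combined = pd.concat([standardize(feed_a), standardize(feed_b)], ignore_index=True)
print(combined.dtypes)  # is_female -> bool, revenue -> float64
```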

Decision Stages

So how does all this analysis-based information actually connect to decision making? Or, looking the other way, how do decisions connect to information sources? What's a decision anyway? See the "Decision Cycles" sidebar at the end of this chapter. A good way to see all the information required for a decision is to examine how a challenged decision may be justified. Let's examine a decision to lower retail prices 25 percent in the eastern stores.
Any decision that can be justified needs to be based on at least one goal and one prediction. Try refuting that. Imagine you were trying to justify the decision to drop prices 25 percent, and when asked by your boss "Why are you dropping prices 25 percent? What are you trying to accomplish?" you responded that you had no goal; you weren't trying to accomplish anything in particular; you just felt like dropping prices. Aside from making you a good candidate for being let go, if you don't have a goal, any decision is as good as any other. How could you defend the decision to drop prices 25 percent in the eastern stores as opposed to dropping them 50 percent in the west if there weren't a goal you were trying to accomplish through the decision? The kinds of algorithms you might use to find an optimal decision include linear programming and Monte Carlo simulation. Tools that help you think through the options, exogenous variables, and contingent events associated with decisions are called decision-analysis tools.6
So having a goal is a necessary, but not sufficient, grounding for a decision (as we're about to show). In addition to having a goal, you need to have at least one prediction. Say again you're talking to your boss about this price drop you're advocating and the boss buys into your goal of clearing inventory to make room for the fall line. However, the boss doesn't think that the sales discount should be 25 percent and wants to know how you can justify that discount amount as opposed to any other amount. The only way you could justify your decision to drop prices 25 percent is by sharing some prediction to which you have access that shows 25 percent as the discount amount most likely to just clear inventory. The kinds of algorithms you might use to make a prediction range from basic extrapolation to sophisticated model merging. Tools that help you make predictions fall under the category of either statistical packages or data mining.7

Where did that prediction come from? What if your boss isn't content knowing that you have a prediction, but wants you to justify that prediction? Upon what is a prediction based? To justify to your boss the need to drop prices 25 percent, you would need to show some analysis of past sales data that highlighted a pattern between changes in sales amounts and changes in price such that, by assuming that the sales/price relationship holds constant over time, you can determine the price discount required to increase sales by the amount necessary to just clear inventory. Thus, predictions are extensions of patterns, relationships, or explanations. They require at least one assumption in order to be extended. All predictions require at least two sets of descriptions and at least one assumption. The kinds of algorithms you might use to discover patterns include regressions, decision trees, clustering, association rules, and neural nets. The tools that help you discover patterns are the same statistical and data-mining tools you would use to make predictions.
What if your boss is astute and buys into your predictive reasoning, but still wants to challenge your prediction? Where would he look? There's only one place. That's the descriptions upon which the patterns were based. If your descriptions are inaccurate, your predictions will be bogus regardless of your predictive reasoning: garbage in, garbage out. How would you defend your descriptions? Your first thought might be to defend your data collection practice. So you say to your boss that you use the finest quality control for data collection and are willing to risk your job defending the accuracy of your data inputs.
What if your boss responds that he believes your data inputs are accurate, but he still claims that your predictions are wrong, even though he grants you the logic is correct? What is he criticizing? How would you defend your actions, or your job? Well, if your source descriptions are accurate, it has got to be a problem with your derived descriptions. Derived descriptions include any kind of aggregation, allocation, difference, ratio, ordering, or product. Nearly all the information we reason with and look for patterns in consists of derived descriptions. This includes weekly product sales, daily register totals, total product costs, division earnings, overhead, cost of goods sold, market share, productivity, and profits. OLAP tools focus on creating derived descriptions. And, as you'll read about in this book, there are many ways to make mistakes. The right OLAP tools and techniques are your best way to ensure descriptive accuracy.
Thus, as illustrated in Figure 1.7, the type of activity for which OLAP products are useful is descriptive modeling, as opposed to explanatory, predictive, or prescriptive modeling, for decision-oriented derived representations (as opposed to source-oriented or nonderived representations) for larger-scope or decision-support applications.

Descriptive: aggregation, allocation, ratios, scaling, weighting, type transforms
Explanatory: regressions, clustering, associations, split finding, probabilistic networks
Predictive: pattern extension, model management, model result merging
Decision making: goal seeking, simulations

Figure 1.7 Decision functions.
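To make the notion of derived descriptions concrete, here is a minimal sketch (not from the book) that produces a few of them, an aggregate, a difference, and a ratio, from hypothetical raw sales records using pandas.

```python
# A small sketch of descriptive modeling: deriving aggregates, differences,
# and ratios from raw sales records. Column names and values are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "product": ["desk", "desk", "chair", "chair"],
    "week":    [1, 2, 1, 2],
    "actual":  [220.0, 180.0, 150.0, 165.0],
    "plan":    [200.0, 200.0, 160.0, 160.0],
})

# Aggregation: weekly product sales rolled up to total product sales
totals = sales.groupby("product")[["actual", "plan"]].sum()

# Difference and ratio: two more kinds of derived description
totals["variance"] = totals["actual"] - totals["plan"]
totals["variance_ratio"] = totals["actual"] / totals["plan"]
print(totals)
```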

OLAP VERSUS DATA MINING

Another casualty of technology marketing was the publicly perceived relationship between OLAP and data mining. When OLAP technology took off in 1994 and 1995, it was only natural for vendors to try to repeat the same success with related technologies. Many attempts were made in the 1995–1997 time period to productize data mining and sell it in the same way that OLAP tools were sold. These attempts were of dubious merit from the beginning because the proper use of a data-mining tool requires basic knowledge of statistics, and thus one can't sell data-mining tools to the same collection of casual business users to whom one can sell OLAP-style query tools. But beyond that, as a part of the sales and marketing process, multiple data-mining vendors felt the need to promote data mining by criticizing, or at least pigeonholing, OLAP. As a result, numerous ads appeared in journals that positioned OLAP as a technology that worked with aggregate or summary data whereas data mining worked with detail data. That distinction has stuck in the minds of enough people and is sufficiently incorrect that I routinely spend time dismantling it.
The distinction between OLAP and data mining has nothing to do with the distinction between summary and detail data. As I described earlier in this chapter, there is a valid distinction to be made between descriptive and explanatory modeling. The functions or algorithms typically found in OLAP tools (such as aggregation [in its many forms], allocations, ratios, products, etc.) are descriptive modeling functions, whereas the functions found in any so-called data-mining package (such as regressions, neural nets, decision trees, and clustering) are pattern discovery or explanatory modeling functions. In addition to the fact that OLAP provides descriptive modeling functions while data mining provides explanatory modeling functions, OLAP also provides a sophisticated structuring consisting of dimensions with hierarchies and cross-dimensional referencing that is nowhere provided in a data-mining environment. A typical data-mining or statistics tool looks at the world in terms of variables and cases.
While it is true that a good data-mining environment needs to connect to detail data, so too does a good OLAP environment (as you will have ample time to discover for yourself in Part Three of this book). Aggregation begins with detail data. Furthermore, most patterns actually discovered through the process of data mining are found within and between partially aggregated derived variables. Very few sales forecasters will create forecasts for stock keeping units (SKUs). Most product forecasts are done at the product, brand, category, or department level. The fact that many data miners do their work without using OLAP tools doesn't mean they aren't using OLAP functions. On the contrary, all data miners do some OLAP work as part of their data exploration and preparation prior to running particular pattern detection algorithms. Simply put, many data miners rely on the basic calculation capabilities provided either in the data-mining tool or in the backend database.
I look forward to the day when the true complementary nature of OLAP and data-mining tools will be more widely appreciated and vendors will better support their dynamic integration. Examples of OLAP and data-mining integration are shown in Chapter 20. At least I am not alone in this regard.
Jiawei Han, the director of the intelligent database lab at the University of Illinois at Urbana-Champaign, co-author of the book Data Mining: Concepts and Techniques8 and a leading researcher in the field of data mining, has long been a strong proponent of integrating data mining and OLAP.
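As a loose illustration of that complementarity, the following sketch (mine, not the book's, with invented data and column names) performs an OLAP-style descriptive aggregation with pandas and then fits a simple explanatory regression on the partially aggregated result with NumPy.

```python
# A sketch of OLAP-style and data-mining-style functions used together:
# descriptive aggregation first, then an explanatory regression on the
# partially aggregated derived variables. Data is hypothetical.
import numpy as np
import pandas as pd

detail = pd.DataFrame({
    "sku":      [101, 102, 201, 202, 101, 201],
    "brand":    ["A", "A", "B", "B", "A", "B"],
    "month":    [1, 1, 1, 1, 2, 2],
    "discount": [0.00, 0.10, 0.05, 0.15, 0.20, 0.25],
    "units":    [100, 130, 90, 140, 170, 160],
})

# OLAP-style descriptive step: aggregate SKU-level detail up to brand/month level
by_brand_month = detail.groupby(["brand", "month"], as_index=False).agg(
    avg_discount=("discount", "mean"), total_units=("units", "sum"))

# Data-mining-style explanatory step: fit units sold as a function of discount
slope, intercept = np.polyfit(by_brand_month["avg_discount"], by_brand_month["total_units"], 1)
print(f"estimated units = {intercept:.1f} + {slope:.1f} * discount")
```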

The Functional Requirements for OLAP

The functional requirements for OLAP have a hub and spoke form. The hub, root, necessary, or minimal requirements on the logical side include support for multiple dimensions, hierarchies, dimensional formulas, and the separation of data structure and representation. Physically, the major requirement is sufficient speed to support ad hoc analysis. Any language or product that doesn't at least support these requirements can't with any seriousness be classified as OLAP supporting.
Of course the detailed requirements that end-user organizations have for OLAP solutions typically extend far beyond these minimal or core requirements. I considered arranging these additional requirements and corresponding capabilities in a weak-to-strong OLAP support axis, but after much reflection felt that beyond the minimal requirements, all others were inherently optional, contingent, or spokish in nature. One could, perhaps, argue that multiuser support is a necessary requirement for an OLAP product, but I would disagree. As useful and perhaps client-mandated as such a feature might be, it is certainly possible to build and analyze arbitrarily complex data sets in a multidimensional fashion without resorting to a multiuser environment. If multiuser support were a necessary requirement for a product to be called OLAP, then we would need yet another category to apply to any single-user multidimensional product; and languages that are user-quantity invariant would therefore lack an essential attribute of OLAP. Thus, I treated multiuser support as a contingent, albeit practical, necessity for any server-style OLAP product.
The one area where I do advocate distinguishing degrees of support is in dimensional structuring and the specification of calculations. However, to fully appreciate the choices available and thus the relative strengths and weaknesses of particular products, you will need to have read a good part of this book; so for now I leave the hub requirements undifferentiated. Between Chapters 5 and 7, these hub requirements will become differentiated.

User Challenges

Global 2000 organizations share many of the same information problems. For example, it may take too long to answer particular queries because every time a manager asks a question about a class of customers defined in terms of a set of attributes, a whole SQL routine needs to be written by a database programmer. Or it may take too much disk space to store all the relevant analytical data. Or it may just be impossible to allocate certain costs below the department level. Or it may be dangerous for more than a handful of users to be on the system at once. Or it may be impossible for end-users to edit the data. Or the company might want to perform nightly incremental computes, which means that all changes from operations need to be reflected in the analytical data within something as small as a four-hour window. Or reports, once generated, may be hard to reconfigure. Or the network may get jammed every time a client makes a request for data from a server.

The typical challenges for Global 2000 corporations are summarized as follows:

■■ Core challenges:
   ■■ Many levels of detail from SKU codes to individual products, groups of products, product lines, brands, and more
   ■■ Many data analysis factors including products, location, company organization, market, and time
   ■■ Many derived calculations to specify and execute
■■ Contingent challenges:
   ■■ Lots of data: very large data stores measured in the hundreds and thousands or even millions of gigabytes
   ■■ Lots of users of the same data: hundreds and thousands
   ■■ Lots of sites: hundreds, thousands, and more
   ■■ Decentralized decision making: user-analysts aren't following predefined procedures; they are making autonomous decisions as to what data to access and how to analyze it

One way to look at the functional requirements for OLAP is as an attempt to provide descriptive modeling with the desired attributes or goals of coverage, timeliness, accuracy, and understandability given the typically occurring challenges mentioned previously (see Figure 1.8).
To the degree that Global 2000 organizations share many of the same challenges, and they all share the same goals, they can be said to share a variety of common, though contingent, requirements. These include the ability to efficiently store, access, and perform calculations on large data sets; the ability to partition data sets across multiple logical drives; the ability to distribute calculations in time; and the ability to distribute calculations between an OLAP server and an underlying relational database. These spoke requirements are common to most Global 2000 organizations implementing OLAP solutions within the context of an enterprise decision-support framework and will be discussed later in this book, mostly in Chapter 10.

(Figure 1.8 shows the challenges (lots of data, lots of levels, lots of factors, many calculations, lots of users, lots of sites) mapped through requirements to the goals of coverage, timeliness, accuracy, and understandability.)

Figure 1.8 Requirements map challenges to goals.

The Goal-Challenge Matrix

Most OLAP challenges affect multiple goals. For example, in addition to speed, which is obvious, large amounts of data can affect accuracy and understandability as well as coverage or existence. On the input side, the presence of lots of source data means there will be an increased likelihood that some data is anomalous. And calculating with large amounts of data has its own complications that may introduce errors. There are likely to be missing and meaningless data that need to be properly dealt with, and there are likely to be differences in the way existent data need to be treated. For example, sales tax may need to be applied differentially to stores as a function of the cities and states in which they are located. Different cities in an economic analysis may need to be weighted as a function of their population. All of these data-quantity-derived challenges can affect the accuracy of source and derived data.
The abundance of data can also affect the understandability of that data. Getting a feeling for the whole of the data set requires more compression than with smaller data sets. There is thus more chance for compressing the data to the point where its essential features are no longer understood. The more source data there is, the more likely some physical optimization is applied, such as partitioning across multiple physical devices and storing the data in a source format (such as relational) that may, in turn, affect the ability of the system to create needed derivations. So the challenge of large amounts of data affects all OLAP goals.
Since most of the challenges affect multiple goals, a useful (and, I might add, multidimensional) way to think about the source of the functional requirements for OLAP (or the functional requirements for your particular OLAP-based project) is as a matrix of goal challenges as illustrated in Table 1.2.
For every intersection of a goal and a challenge, you should be able to identify the impact strength of the challenge on the goal. All intersections with high impact should connect to one or more functional requirements. For example, a product manager in an organization that has very deep product hierarchies and that uses multilevel marketing may find that the presence of lots of data levels has an especially strong impact on the accuracy and timeliness of calculated total product costs. Thus, the intersection of lots of levels with timeliness might be addressed by the functional requirement for strong aggregate management (a feature of some OLAP and now many relational products). The intersection of lots of levels with accuracy might be addressed by the functional requirements for rich dimensional structuring, including the ability to input data at any level and a powerful calculation language that enables the manual override of otherwise calculated values (features present in several OLAP products).

Table 1.2 The Goal Challenges for OLAP

GOALS/CHALLENGES    LOTS OF DATA AND DERIVED DESCRIPTIONS   LOTS OF LEVELS   LOTS OF FACTORS   LOTS OF USERS   COMPLEX TOPOLOGY
Existence
Timeliness
Accuracy
Understandability
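One informal way to work with such a matrix in practice is sketched below; the impact ratings and requirement names are invented for illustration and are not from the book.

```python
# A sketch of the goal-challenge matrix: record an impact strength for each
# (goal, challenge) intersection and surface the high-impact cells that should
# map to functional requirements. All ratings and names here are hypothetical.
goals = ["existence", "timeliness", "accuracy", "understandability"]
challenges = ["lots of data", "lots of levels", "lots of factors", "lots of users", "complex topology"]

# Impact strength on a 1 (low) to 3 (high) scale, filled in by the project team
impact = {
    ("timeliness", "lots of levels"): 3,
    ("accuracy", "lots of levels"): 3,
    ("timeliness", "lots of users"): 2,
    ("understandability", "lots of data"): 3,
}

requirements = {
    ("timeliness", "lots of levels"): "strong aggregate management",
    ("accuracy", "lots of levels"): "rich dimensional structuring with manual overrides",
    ("understandability", "lots of data"): "flexible views with appropriate compression",
}

for goal in goals:
    for challenge in challenges:
        if impact.get((goal, challenge), 0) >= 3:
            need = requirements.get((goal, challenge), "requirement to be identified")
            print(f"{challenge} x {goal}: {need}")
```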

Core Logical Requirements

Having seen that each of the challenges of descriptive modeling can affect multiple goals, and that the requirements for OLAP are the vehicle that maps challenges to goals or that accomplishes the goals given the challenges, it is time to look at the requirements.

1. Rich Dimensional Structuring with Hierarchical Referencing

As stated previously, rich dimensional structuring with hierarchical referencing is a necessary requirement for OLAP. The recognition that we live in a world characterized by highly multidimensional sets of interacting subsystems, each with many levels of data, detail, reality, or abstractions, is a fundamental insight. The ability of OLAP tools to efficiently model that complexity is one of their fundamental contributions.

2. Efficient Specification of Dimensions and Calculations

As stated in the Foreword, there is more to analysis than simply aggregating numbers. Sure, it is important to be able to correctly sum and average large quantities of data, but most of the important information to be gleaned results from the intelligent comparison of ratios and inferred trends over time and other dimensions. In other words, a good part of the querying that takes place in analytical processing contains embedded analysis. As you will see in Chapter 7 when we look at dimensional formulas, the combination of aggregating and comparison analysis can get pretty tricky.
For example, imagine you are the sales director for an electronic products company that just finished its first year of international operations. You might want to know which product categories experienced profit levels abroad that were the most different from their profit levels in the United States. You might want to see the results ordered from most positive to most negative. To answer this query, the system needs to perform a variety of calculations. Profits need to be calculated per product. As profits are derived values and can be calculated in many different ways, this task alone can be nontrivial. Profit values then need to be normalized so that they can be compared across products. Normalized profit levels then need to be aggregated to the product category level. This needs to be done across time for the United States and for the new foreign market. Time aggregations may need to be adjusted for differences in reporting periods and/or number of days per period. The profit level for each product group then needs to be compared between the two markets, and finally the differences in profit level need to be ordered from most positive to most negative. In an OLAP system, these types of calculations should be as straightforward to define as they are to say.

To provide true OLAP functionality, a software tool needs to contain a sophisticated calculation language.
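The calculation chain just described can be followed step by step outside any particular OLAP product. The sketch below (mine, not the book's) uses pandas on a small invented data set and assumes profit margin as the normalization; a real OLAP tool would express the same chain in its dimensional calculation language.

```python
# A sketch of the calculation chain described above, with hypothetical data.
import pandas as pd

sales = pd.DataFrame({
    "market":   ["US", "US", "US", "Intl", "Intl", "Intl"],
    "category": ["audio", "audio", "video", "audio", "video", "video"],
    "product":  ["amp", "tuner", "tv", "amp", "tv", "camera"],
    "revenue":  [900.0, 600.0, 1500.0, 700.0, 1200.0, 800.0],
    "cost":     [600.0, 450.0, 1100.0, 550.0, 1000.0, 700.0],
})

# 1. Profit per product (one of many possible profit definitions)
sales["profit"] = sales["revenue"] - sales["cost"]
# 2. Normalize so profits are comparable across products (profit margin)
sales["margin"] = sales["profit"] / sales["revenue"]
# 3. Aggregate normalized profit to the category level per market
by_cat = sales.groupby(["market", "category"])["margin"].mean().unstack("market")
# 4. Compare the two markets and 5. order from most positive to most negative
by_cat["intl_minus_us"] = by_cat["Intl"] - by_cat["US"]
print(by_cat.sort_values("intl_minus_us", ascending=False))
```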

3. Flexibility

Flexibility is another crucial component of OLAP functionality. It carries a variety of meanings. There is flexible viewing, flexible definition, flexible analysis, and flexible interface. Systems need to be flexible in all these ways.
There are many ways to present or look at information. View flexibility means the user can easily choose to see information in the form of graphs, matrices, or charts, and, within any form such as a matrix, can select how the information is mapped to the view form. In terms of definitions, whoever has the relevant authority should be able to change the names of descriptors, the formatting of numbers, the definitions of formulas, the triggers for automatic processes, or the location of source data without incurring any undue penalty.
In contrast, the DSS systems of the 1970s and the EIS systems of the 1980s were often fast and powerful, but at the cost of being rigid. Usually a team of developers needed to write custom decision-support or aggregation routines. Thereafter, these routines were simply called. While they performed well, they were hard coded and thus inflexible. If an end-user wanted to change the way some aggregation routine was performed, it required programmatically editing the application. Generally speaking, the categories of data to be processed, such as the names of departments or the names of countries, were hard coded as well.
Interface flexibility is a more general form of what is sometimes called an intuitive interface. Intuitiveness or friendliness is so important that it almost deserves its own category. I have elected to keep it under the more general heading of flexibility because not everybody has the same concept of what's intuitive. Thus, to allow everyone the ability to interact with a system in the way that makes the most sense for them, what is needed is a flexible or friendly interface. It is no good having a fast, powerful system if you cannot figure out how to use it and you don't understand what it is telling you. On the input side, interface flexibility or friendliness affects the user's ability to quickly define what the user wants to do.
Like other types of flexibility, interface flexibility applies to a variety of areas such as model definition, model browsing, formula specification, direct data input, and external data source links. The friendlier the software, the less time you spend thinking about how to do something and the more time you spend actually doing it. This is especially important in today's world. In earlier times, analysis tended to be performed by small groups of individuals within the organization who devoted most of their time to analysis. These people could afford the steep learning curves associated with DSS-style analytical systems. Today, however, analysis is increasingly performed by a wider scope of people, each of whom devotes only a small percentage of her or his time to analysis. They need an analytical environment that enables them to maximally leverage what they already know, so they can be up and running quickly. The standard paradigm for friendliness these days, on the input side, is a graphical user interface. Of course, friendly is whatever appeals to the end-user. There are many people who still prefer a command line supporting a well-documented 4GL for its greater speed and power of expression.

4. Separation of Structure and Representation

The degree to which grid-like data views (or any other kind of external representation) are flexible is in great part a function of the degree of separation between data structure and representation. Having this separation means that views can be reorganized by an end-user without making any structural changes to the data. If data needs to be restructured to support a new view, chances are the new view won't be created, certainly not in an ad hoc fashion. The lack of separation between structure and representation is one of the weak links for spreadsheets, as you will see in Chapter 2.
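A minimal sketch of the idea, not from the book: the same underlying records rendered as two differently organized views, with no change to the data structure itself. The data and column names are hypothetical, and pandas stands in here for whatever view layer a product provides.

```python
# Two views produced from one structure; only the representation changes.
import pandas as pd

facts = pd.DataFrame({
    "region":   ["Northeast", "Northeast", "Midwest", "Midwest"],
    "category": ["skin care", "furniture", "skin care", "furniture"],
    "quarter":  ["Q1", "Q1", "Q1", "Q1"],
    "sales":    [7191, 7360, 6655, 9124],
})

# View 1: categories nested within regions along the rows
view1 = facts.pivot_table(index=["region", "category"], columns="quarter", values="sales")

# View 2: regions nested within categories, from the same structure, no rework
view2 = facts.pivot_table(index=["category", "region"], columns="quarter", values="sales")

print(view1, view2, sep="\n\n")
```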

Core Physical Requirements

1. Fast Access

Speed is a crucial component of OLAP. It's more than just the thrill. It's about maintaining your train of thought. OLAP needs to support ad hoc analytical queries, some of which may require computations performed on-the-fly. For example, someone might start a session by querying how overall product profitability was in Europe last quarter. Seeing profitability was lower than expected, he might navigate down into individual countries while still looking at overall product profitability. Here, he might see that some countries were significantly below the others and so he would naturally navigate further down into product groups for these countries, always looking for some data that could help explain the higher-level anomalies. A first glance might not reveal anything unusual. Sales were where they were supposed to be, as were returns and manufacturing costs. Ah, but wait. Indirect costs were substantially higher for the errant countries. Further navigating into indirect costs might reveal that significant taxes were recently levied on those products (possibly the result of some trade dispute), which served to sharply reduce their profitability because prices remained stable due to market competition from domestic players.
Each step in this train of thought constituted a query. Each query was iterative in the sense that it followed the result of the previous one. Anyone engaged in this sort of querying wants to maintain the momentum. It would be difficult to follow this type of analytical thread if the mean response time per query were measured in days, hours, or even minutes. A commonly stated goal of OLAP systems is to provide a mean response time of five seconds or less regardless of the type of query or the size of the database.
In the past, analysts who needed quick response time placed data extracts in local, single-user applications that were fully dedicated to a single user/analyst. Today's challenge is for systems to provide blazing response time to access and computation requests while working with large data sets in a multiuser environment distributed across a network. Some tools provide for this by precomputing all aggregates.


This can, however, lead to database explosion (as described in Chapter 10). Even if one could store all the precomputed results, the increase in the size of the database can actually offset any gains made to query response time by precomputing. For maximally efficient access, tools need to provide the right combination of precomputed and query-time computed results.
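One way to picture that combination is sketched below (my illustration, not any product's design): a few popular rollups are precomputed at load time, while everything else is computed at query time and then cached.

```python
# A sketch of mixing precomputed and query-time aggregates. The data,
# dimension names, and caching policy are all hypothetical.
import pandas as pd

detail = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "product": ["desk", "chair", "desk", "chair"],
    "sales":   [220.0, 150.0, 180.0, 165.0],
})

class AggregateStore:
    def __init__(self, detail: pd.DataFrame, precompute: list[tuple[str, ...]]):
        self.detail = detail
        # Materialize a handful of popular rollups at load time
        self.cache = {dims: detail.groupby(list(dims))["sales"].sum() for dims in precompute}

    def total(self, dims: tuple[str, ...]) -> pd.Series:
        if dims not in self.cache:  # otherwise compute at query time, then cache
            self.cache[dims] = self.detail.groupby(list(dims))["sales"].sum()
        return self.cache[dims]

store = AggregateStore(detail, precompute=[("region",)])
print(store.total(("region",)))            # served from the precomputed rollup
print(store.total(("region", "product")))  # computed on the fly, then cached
```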

2. Multiuser Support

Even though multiuser support is not a necessary requirement, given the preponderance of OLAP servers and the practical requirement of multiuser support, it is discussed briefly in this section. Corporations are collaborative work environments. As a result of downsizing and decentralizing, the relative number of employees who need read/write access to decision-empowering analytical data is on the rise. Problems discovered by a regional sales manager, for example, may need to be communicated to a distribution or manufacturing manager for prompt resolution. Forecasts examined by senior executives may reflect data that was generated from a dozen or more separate departments. For global corporations, some of those departments may not even share the same country or language.
As with any complex issue, multiuser support is not a binary function. It is a question of degrees. There can be multiuser read based on separate caches for each user with no write-back capabilities. This may satisfy some applications like sales reporting where the source of the data came from operational systems. There can be multiuser read/write with all the processing done on the server. This may satisfy budgeting applications where the budget is centrally controlled on the server. And there can be multiuser read/write where the processing and data are distributed between client and server, where the server is intelligent enough to know when data should be processed on the server and when on the client, and where the server offers a multiuser cache. The goal for most OLAP vendors is this last form of multiuser support because it is the most flexible and makes the most efficient use of enterprise computing resources.

Summary

Information processing is a part of all organizations regardless of whether they use computers. The major subdivisions in information processing reflect the major subdivisions of organizational activity: operations and decision-oriented analysis. The term OLAP, when used to refer to a collection of products, denotes descriptive modeling for analysis-based decision-oriented information processing (ABDOP). These days, popular terms such as data warehousing, decision support, and business intelligence could all be used more or less synonymously with the term ABDOP. However, because the meanings of those terms are at the mercy of many marketing departments, I use the term ABDOP for its stability of meaning. The requirements for OLAP are derived from a combination of the goals of information processing in general and the challenges of descriptive modeling for ABDOP. These requirements have a hub and spoke form. The major hub requirements for OLAP are rich dimensional structuring with hierarchical

referencing, efficient specification of dimensions and calculations, flexibility, separation of structure and representation, sufficient speed to support ad hoc analysis, and multiuser support. In the next chapter we will look at why the two main traditional technologies for providing OLAP functionality, SQL and spreadsheets, are inadequate for the job.

DECISION CYCLES

Information is used for decision making within a framework of intentional activity. As illustrated in Figure 1.9, all intentional activity, whether performed by individual persons, corporations, or governments, can be represented in terms of collections of Decision-Execution-Environment-Representation (DEER) cycles. The cyclically related action stages, roles, or functions, which I've labeled with descriptions that highlight the cyclical relationships, are:

■■ The decision, based on some previous representation
■■ The execution of the decision, based on some previous representation
■■ The environment in which the execution and consequences of the decision, based on some previous representation, take place
■■ The future decision-supporting representation of the environment in which the execution and consequences of the decision, based on some previous representation, take place

(Figure 1.9 shows the four DEER cycle stages arranged in a loop: the Decision, based on some previous representations; the Execution of the decision; the Environment; and the future decision-supporting Representation of the environment. The representation stage is annotated with the decision stages Descriptive, Explanatory, Predictive, and Prescriptive, and with its roles of integrating sources and serving use or decision-oriented representation.)

Figure 1.9 DEER cycle.


Figure 1.9 shows a typical ABDOP system in a representation role used for making decisions that, when executed, produce results in an environment that can then be measured and represented within the ABDOP system. It is important to keep in mind that value may be gained or lost at any phase in the cycle. Thus value may be gained or lost due to good or bad decisions, good or bad execution, lucky or unlucky environmental factors, and good or bad representations (information).
The main reason for looking at organizations and their information systems in terms of DEER cycles is to link together all the actions/events that co-exist in a naturally occurring implicit causal framework within the context of any intentional behavior. For example, let's use the DEER cycle approach to reconstruct the causal events surrounding the promotional sales data shown in Figure 1.10. The promotional sales data itself is functioning as a representation. So what was the event that the sales data is a measurement of? Are there any distinguishing characteristics? Maybe it was a beautiful spring day and the high sales were more a function of the weather than the promotional pricing or advertising. If you think a little more about it, you can see there are at least two different DEER cycles sharing the same event space that need to be reconstructed to show the full causal links surrounding the sales data, the seller's and the buyer's, as illustrated in Figure 1.11.
So, if you want to be able to impact future promotions, you need to know the success of the event represented by the sales data for both the seller and for the buyer. This means you need to understand first (for the seller) the promotional decisions made, the information upon which the promotional decisions were made, and the execution of the decisions and, second (for the buyer) the purchasing decisions made, the information upon which the purchasing decisions were made, and their execution.
Using the DEER cycle approach to direct information collection is a form of methodology for decision value archaeology. In the same way that a hominid archaeologist would know to look for tool, shelter, and clothing remnants near food that was already found in an attempt to draw a picture of what life was like 500,000 years ago, a decision archaeologist would know to look for past and future related decisions, executions, and events surrounding a piece of found information in an attempt to draw a picture of the events causally connected to the information.

Transaction ID   Item   Quantity   Price   Promotion   Total Amount
23917            412    4          25      30%         $70.00
23917            509    2          75      0%          $150.00

Figure 1.10 Promotional sales. Image courtesy of Erik Thomsen, “Information Impact: Business Analytics Revealed Part 2,” Intelligent Enterprise, June 29, 2001, p. 52.

(Figure 1.11 depicts two interlocking DEER cycles that share the purchasing event: the seller's cycle of a decision to sell goods, execution of that decision, the environment in which goods are sold, and its representation; and the buyer's cycle of a decision to buy goods, execution of that decision, the environment in which goods are bought, and its representation.)

Figure 1.11 Promotional sales with causal link data. Image courtesy of Erik Thomsen, “Information Impact: Business Analytics Revealed Part 2,” Intelligent Enterprise, June 29, 2001, p. 52.

Decisions are thus made within a cycle of intentional activity. So what are the types of information that go into making a decision? Is there any built-in structure to it?

CHAPTER 2

The Limitations of Spreadsheets and SQL

During the past 15 years, the two most commonly used product technologies for delivering analysis-based decision-oriented information to end-users have been databases and spreadsheets. Spreadsheets are typically used directly by end-users, while databases are usually set up and administered by IT professionals. With databases, end-users are normally given access either to canned query templates or to easy-to-use query tools existing within a managed query environment.
This chapter explores why traditional spreadsheet and database approaches to organizing, managing, and querying data are inadequate for meeting the core logical requirements of Online Analytical Processing (OLAP) applications. Those requirements are support for multiple dimensions, hierarchies, dimensional calculations, and the separation of structure and representation. We will start with a quick review of the last 15 years, during which time both spreadsheet and database vendors have added significant amounts of OLAP functionality to their products. You will see that technology producers have been aware for quite some time that the traditional tools in their original form did not satisfy end-users' analytical requirements. Then we will get into specific examples of where spreadsheet and database technologies break down. Since this is a written chapter, it will not be possible to provide direct evidence of any shortcomings in physical requirements such as calculation speed. (Issues of physical design are treated in Chapter 10. Also, there is a discussion of OLAP benchmarks in Appendix B.) Readers uninterested in the historical evidence that spreadsheets and SQL/relational databases have evolved toward supporting increasing amounts of OLAP functionality may safely skip the first section in this chapter.


The Evolution of OLAP Functionality in Spreadsheets and SQL Databases

Spreadsheets and OLAP

Spreadsheets were originally positioned against the calculator by promoting their capability to perform what-if calculations. Instead of having to rewrite a series of interdependent numbers (such as a total of line items in a budget) just to see what would happen if one or more inputs were changed and then manually reperform the computation, the spreadsheet enabled users to define the result in terms of a formula. In this way, users could simply change the value of an input and the spreadsheet would automatically recalculate the total. The need for spreadsheets was a no-brainer. In the 1980s, as personal computers became more powerful and an increasing number of people were performing analyses on their desktops, the demand increased for higher-level functionality, to which the spreadsheet market responded in a variety of ways.
First came the spreadsheet add-ins such as Panaview, Look 'n' Link, and Budget Express. These add-in tools were designed to compensate for the difficulties that spreadsheets had with aggregating data and combining worksheets. For example, Budget Express enabled users to create a multilevel spreadsheet outline that they could then collapse and expand. This helped with aggregations. Look 'n' Link enabled users to link aggregations from one worksheet into a cell of another worksheet. This also helped users working with larger amounts of data.
The functionality embraced by the market in the form of add-ins eventually became incorporated into the spreadsheets themselves.1 Packages like Lucid 3D offered linked worksheets. Version 3 of Lotus 1-2-3 added addressable pages to the rows and columns. In this way, if rows represented products and columns represented stores, pages could represent weeks and the total yearly sales of all products for all stores could be defined as the sum across weekly pages for the all product by all store cell.
Even with increased functionality, creating near-OLAP style functionality in a spreadsheet is still cumbersome. In the early 1990s I worked at Power Thinking Tools, which had developed a desktop multidimensional analysis tool. We were frequently asked the following question: Why can't I create a multidimensional model with a spreadsheet? We intuitively felt that spreadsheets weren't suited for multidimensional modeling, but we did some direct testing to better understand the obstacles. We defined a small, yet typical multidimensional model for comparison testing. It consisted of stores, time periods, products, and measures. Stores, time periods, and products each contained several hierarchical levels. The test actions consisted of defining and performing simple aggregations, more complex aggregations that contained weighting and inferencing schemes, and a variety of views. Ad hoc queries were conducted as well. No serious analysis was performed. It took an experienced spreadsheet macro programmer eight hours to define the macros and lookup tables necessary to create all the aggregation levels, calculations, and views. (The test was performed using Lotus 1-2-3, Quattro Pro, and Microsoft Excel.) Using our product, FreeThink, the entire model could be defined and calculated in less than 15 minutes.
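Going back to the what-if capability that launched spreadsheets, here is a minimal sketch (mine, not the book's) of the underlying idea: the total is defined as a formula over its inputs, so changing an input and re-evaluating replaces manual recomputation. The line items are hypothetical.

```python
# A sketch of what-if recalculation: the result is a formula over its inputs.
budget = {"rent": 2000.0, "salaries": 8500.0, "marketing": 1200.0}

def total(items: dict[str, float]) -> float:
    # The "cell formula": the total is defined in terms of the line items
    return sum(items.values())

print(total(budget))          # 11700.0
budget["marketing"] = 2500.0  # what if marketing spend rises?
print(total(budget))          # 13000.0, recomputed from the same formula
```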

By 1993, products such as Excel offered pivot tables. Pivot tables, or n-way cross-tabs, are multidimensional, and any survey of the OLAP market space would have to include a product like Excel because it offers some OLAP functionality. However, even today, Excel's pivot table functionality is not integrated into its core spreadsheet structures. Because you have to import data from your Excel worksheet into the Excel pivot tables, functionally speaking, the pivot table is not a spreadsheet.
These days, every copy of Microsoft Office 2000 comes with a version of something called pivot table services (not to be confused with pivot tables), which are part of Microsoft's client-resident OLAP technology. Vendors of client-tier OLAP applications, from Business Objects to Proclarity, that leverage Microsoft's client-server OLAP technology write to pivot table services.

OLAP, the Relational Model, and SQL Databases

SQL databases are the commercial implementations of the relational data model, which was initially defined by Edgar Codd in his 1970 paper, "A Relational Model of Data for Large Shared Data Banks."2 The conceptual schema of the relational data model is largely derived from set theory (and the theory of relations) and the predicate calculus. The relational model defines, among other things, a declaratively oriented relational calculus and a procedurally oriented relational algebra. The combination of the calculus and the algebra defines a complete data definition language (DDL) and data manipulation language (DML). While the relational model freed the database designer/user from having to think about the physical storage and access of data, the relational model turned out to be overly focused on low-level detail for the higher-level work of developers and end-user analysts. It was particularly difficult to define decision support applications (or any other kind of application) that depended on complex data aggregation. As a result, soon after Codd published his original paper, work began on building higher-level abstractions into the relational model.
All through the 1970s and into the 1980s, many academic papers were written on adding abstraction capabilities to the relational model. Abstract data types are one approach to looking at the problem of aggregation. In many approaches, such as ADABPTL3 and DAPLEX,4 notions of hierarchy were added to relational concepts. In one groundbreaking article, Smith and Smith5 wrote about adding two new abstraction types: aggregates and generalizations. Aggregate objects were complex data types or what today would be called objects (though interobject messaging was not a part of the aggregate type). For example, "Hotel Reservation" might be an aggregate consisting of the fields date, time, reservation taker, confirmation number, name of reserving party, date of reservation, number of days of stay, number of persons in party, type of room requested (including bed types, smoking or nonsmoking, floor, and so forth), and price per night. Another aggregate could be "Airline Reservation," consisting of the fields date, time, reservation taker, confirmation number, date of flight, time of flight, flight number, number of seats, position of seats, fare class, method of payment, and special requests. A generalization of hotel and airline reservations might be "Reservation." The fields associated with "Reservation" might be date, time, reservation taker, reservation number, date of reservation, name of reserving party, and number of persons in party.

The fields associated with the generalization "Reservation" would be passed on to or inherited by any aggregate that was a child/specialization of the generalization "Reservation."
Even Dr. Codd wrote academically on the topic when in 1978 he published "Extending the Database Relational Model to Capture More Meaning."6 His focus in that paper was on defining hierarchical data types and a variety of hierarchical relationships, such as subtypes and supertypes, as extensions to the basic structures of the relational model. The work on database semantics led to the development of object-oriented (OO) programming methods, OO databases, entity-relationship data modeling, and knowledge bases.
By the early 1980s, with the release of SQL database products from Oracle and IBM, relational databases were becoming a commercial reality. As a part of that reality, SQL, which had evolved from experiments with database languages that attempted to implement the relational model, became the de facto standard commercial relational database language. By the early 1990s, more than 100 commercial database products supported some dialect of SQL.
Although SQL's original goal may have been to be an end-user query language (hence the original name SEQUEL, from "Structured English Query Language"), the language proved too abstruse for typical end-users. In the same way that products were built to extend the capabilities of spreadsheets, products were built to extend the capabilities (or at least hide the syntax) of SQL. Query by example, graphical queries, natural language queries, and arranged end-user queries were all attempts to hide the complexity of SQL generation from the end-user. This occurred because analyzing data and exploring relationships were not part of the original SQL vocabulary. For example, classic texts such as A Guide to the SQL Standard by Chris Date and Hugh Darwen and Instant SQL Programming by Joe Celko do not even treat the most basic operations of analysis, such as comparisons.7 As you will see in the next section of this chapter, there is a good reason for this. Most types of comparisons are difficult to specify in SQL. That said, SQL-99 (the newest version of SQL as of December 2001) has recently incorporated a number of analytically oriented features as well as some OLAP extensions. These changes represent a redistribution of OLAP functionality from outside the SQL database engine to largely within it.
Plenty of circumstantial evidence suggests that neither spreadsheets nor traditional or even current SQL databases adequately meet users' needs for OLAP functionality.
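To make the earlier aggregate/generalization example concrete, here is a small sketch (mine, not Smith and Smith's notation): a Reservation generalization carries the shared fields, and the hotel and airline aggregates inherit and extend them. The field lists follow the text above only loosely.

```python
# A sketch of the aggregate/generalization abstractions as Python dataclasses.
from dataclasses import dataclass

@dataclass
class Reservation:                       # the generalization
    date: str
    time: str
    reservation_taker: str
    confirmation_number: str
    name_of_reserving_party: str
    number_of_persons: int

@dataclass
class HotelReservation(Reservation):     # an aggregate/specialization
    number_of_days: int
    room_type: str
    price_per_night: float

@dataclass
class AirlineReservation(Reservation):   # another aggregate/specialization
    flight_number: str
    fare_class: str
    number_of_seats: int

stay = HotelReservation("2001-06-29", "14:00", "agent 7", "H123",
                        "J. Smith", 2, number_of_days=3,
                        room_type="nonsmoking double", price_per_night=149.0)
print(stay.confirmation_number, stay.price_per_night)
```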

Proving the Point: Some Examples

Indirect evidence is fine to whet your appetite. But now let's follow with some specific examples so you can see firsthand why it is so difficult to provide OLAP-style functionality with spreadsheets and SQL databases. The best way to judge the performance of SQL and spreadsheets is to put them to work on the same problem.
One metaphor for looking at the problem of generating analysis-based decision-oriented views from base data is to think of the entire process as an information economy, as illustrated in Figure 2.1. We have a common supply function; this refers to our base data and the rate at which it is refreshed. We also have a common demand function; this refers to analytical reports and views and the rate at which end-users consume/view them.

(Figure 2.1 depicts the information economy as a chain running from sources of raw data, to supplies of raw data, through the production of finished information ready for consumption, to the consumption of analysis-based decision-oriented information.)

Figure 2.1 The information economy.

In between we have a manufacturing logistics or production problem. The key production question may be phrased as follows: What is the most efficient method to define and produce the consumer information goods demanded by our end-users in an accurate, timely, and understandable fashion?

A Common Starting Point

Let's start with the desired outputs or finished consumable information. Imagine that you are the sales director for a retail organization and that you are responsible for the ongoing analysis of sales for all the products sold in all your stores. As a part of your normal analysis process, you want to see monthly sales totals by product and product category. You also want to see monthly total product sales broken down by region and store. In addition to cumulative totals, you want to see how the actuals vary from plans. These views or reports are illustrated in Figures 2.2 and 2.3.
Figures 2.2 and 2.3 represent typical decision-oriented information. For example, knowing that the sale of desks has been falling for the last 4 weeks relative to plan may help you decide that there should be a midwinter sale on desks. On the other hand, even if sales are slow this month relative to last month, if sales have been consistently slower this month for the past 3 years, it might indicate that it's some sort of as-yet-unaccounted-for seasonal variation, not indicative of a longer-term slowdown, and thus not cause for a sale. Regardless of how these views were generated, they are common to any decision-oriented analysis process.

                     Jan                Feb                Mar                Quarter 1
                     actuals  variance  actuals  variance  actuals  variance  actuals  variance
skin care            4,156    1.01      4,572    1.01      5,034    1.01      3,846    1.01
soap                 2,517    1.00      2,769    1.00      3,051    1.00      8,391    1.00
rose water soap      1,342    0.99      1,476    0.99      1,629    0.99      4,472    0.99
olive oil soap       1,175    1.02      1,293    1.02      1,427    1.02      3,919    1.02
lotion               1,639    1.03      1,803    1.03      1,988    1.03      5,455    1.03
hypoallerg. lotion   1,639    1.03      1,803    1.03      1,988    1.03      5,455    1.03
furniture            4,953    1.06      5,448    1.06      5,998    1.06      16,484   1.06
office               3,796    1.09      4,176    1.09      4,598    1.09      12,625   1.09
bookshelves          1,501    0.99      1,651    0.99      1,821    0.99      4,998    0.99
dividers             2,295    1.16      2,525    1.16      2,782    1.16      7,626    1.16
home                 1,157    0.99      1,273    0.99      1,405    0.99      3,860    0.99
mattresses           1,157    0.99      1,273    0.99      1,405    0.99      3,860    0.99
All products         9,109    1.04      10,020   1.04      11,027   1.04      30,331   1.04

Figure 2.2 Product manager’s quarterly sales report.


Sales (All products)   Jan                Feb                Mar                Quarter 1
                       actuals  variance  actuals  variance  actuals  variance  actuals  variance
Northeast              4,369    1.02      4,806    1.02      5,291    1.02      14,551   1.02
Ridgewood              1,995    1.04      2,184    1.04      2,407    1.04      6,600    1.04
Newbury                1,824    1.01      2,006    1.01      2,212    1.01      6,067    1.01
Avon                   560      0.97      616      0.97      683      0.97      1,884    0.97
Midwest                4,740    1.06      5,214    1.06      5,740    1.06      15,779   1.06
Francis                2,643    1.11      2,907    1.11      3,203    1.11      8,778    1.11
Nikki's                1,390    1.00      1,529    1.00      1,687    1.00      4,631    1.00
Roger's                707      1.00      778      1.00      860      1.00      2,370    1.00
All companies          9,109    1.04      10,020   1.04      11,027   1.04      30,331   1.04

Figure 2.3 Regional manager’s quarterly sales report.

Now let's consider the raw data from which these views were constructed. It is not as easy to define a common starting point as it is a common endpoint (unless we go all the way back to the original data-generating events). Different production techniques, notably SQL and spreadsheets, have such radically different logic that the source data will tend to be structured differently between them.
Imagine that all the data has been collected, refined, and now resides in a large base table. In addition, there may be some adjunct lookup tables. A star schema for this base data is shown in Figure 2.4. This table schema is called a star schema because the central fact table is usually depicted as surrounded by each of the dimension tables that describe each dimension. In this example, the base sales data table is the fact table and each lookup table is a dimension table.

THE DIFFICULTY OF DEFINING A COMMON SOURCE DATA SET FOR SPREADSHEETS AND SQL

SQL, if thought of as a language for defining and manipulating relational tables, is supposedly application-neutral when it comes to structuring base data. Base data, in a relational sense, is supposed to be structured or normalized on the basis of its own internal relationships. This facilitates the database update operations that characterize transaction processing. But for many decision-oriented applications, SQL tables are intentionally denormalized (or physically optimized) to speed up queries. When thinking through a problem with spreadsheets, the way that base data is structured (that is, the form of the data) has a great impact on what can be done with it. This will become even clearer as we get into the specific examples. It is important to bring this up now because it isn't possible to define a realistic starting point where the data has the same form for both spreadsheets and SQL. Rather, our common starting point will be defined in terms of the content of the data.


STORE LOOKUP (dimension table)
Store ID   Store       Region      Company
1          Ridgewood   Northeast   B&B
2          Newbury     Northeast   B&B
3          Avon        Northeast   B&B
4          Francis     Midwest     B&B
5          Nikki's     Midwest     B&B
6          Roger's     Midwest     B&B

TIME LOOKUP (dimension table)
Month ID   Month   Quarter
1          Jan     1
2          Feb     1
3          Mar     1
4          Apr     2
5          May     2
6          Jun     2

PRODUCT LOOKUP (dimension table)
Prod. ID   Prod. name          Prod. type   Prod. Group
1          rosewater soap      soap         skin care
2          olive oil soap      soap         skin care
3          hypoaller. lotion   lotion       skin care
4          bookshelves         office       furniture
5          dividers            office       furniture
6          mattresses          home         furniture

BASE SALES DATA (fact table)
Store id   Month id   Prod. id   Scenario   Sales      Costs
1          1          1          actuals    285        240
1          1          1          plans      280        230
1          1          2          actuals    270        260
1          1          2          plans      265        255
1          1          3          actuals    350        300
1          1          3          plans      300        280
1          1          4          actuals    220        230
1          1          4          plans      230        235
1          1          5          actuals    480        400
1          1          5          plans      450        380
1          1          6          actuals    380        370
1          1          6          plans      390        375
1          2          1          actuals    313        264
1          2          1          plans      308        253
:          :          :          :          :          :
6          12         6          actuals    1,199.28   1,168.14
6          12         6          plans      1,230.42   1,183.71

Figure 2.4 Common base table for sales data and lookup tables for stores, times, and products.

Notice how the stores, weeks, and products columns in the fact table in Figure 2.4 contain numeric values. Fact tables can grow to a huge number of rows. Because each value of the store, weeks, and product column is repeated many times, using a simple value in the base table and storing the full name only once in the dimension's lookup table saves space.
The lookup tables also contain hierarchy information relating each store, week, and product with its higher-level aggregations. For example, store 1 in the base table of Figure 2.4 connects with the "Store Lookup" table where it has the name Ridgewood and rolls up to the Northeast region. Product 2 in the base table connects with the "Product Lookup" table where it has the name olive oil soap and rolls up into the product type soap in the skin care products group. In practice, the lookup tables may also contain additional attribute information about the individual stores, times, and products (such as the square footage per store).
This base data content (that is, the collection of propositions or statements about the world), though not necessarily the base data form, would be the same regardless of whether you were going to build your analytical model with a spreadsheet, with an SQL database, or, as we will see later on, with a multidimensional tool. Consider it a common starting point.
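As a sketch of how this star schema supports reports like those in Figures 2.2 and 2.3, the following code (mine, not the book's) joins a few fact rows to their lookup tables and rolls actuals up the product hierarchy with pandas; it assumes, for illustration only, that the variance column is the ratio of actuals to plans.

```python
# A sketch of producing a Figure 2.2-style rollup from the Figure 2.4 schema.
# Only a handful of fact rows are included; variance is assumed to be actuals/plans.
import pandas as pd

fact = pd.DataFrame({
    "store_id": [1, 1, 1, 1], "month_id": [1, 1, 1, 1],
    "prod_id":  [1, 1, 2, 2],
    "scenario": ["actuals", "plans", "actuals", "plans"],
    "sales":    [285, 280, 270, 265],
})
product_lookup = pd.DataFrame({
    "prod_id":    [1, 2],
    "prod_name":  ["rosewater soap", "olive oil soap"],
    "prod_type":  ["soap", "soap"],
    "prod_group": ["skin care", "skin care"],
})
time_lookup = pd.DataFrame({"month_id": [1], "month": ["Jan"], "quarter": [1]})

# Join the fact table to its dimension tables, then roll sales up to product type
joined = fact.merge(product_lookup, on="prod_id").merge(time_lookup, on="month_id")
rollup = joined.pivot_table(index=["prod_group", "prod_type"],
                            columns="scenario", values="sales", aggfunc="sum")
rollup["variance"] = rollup["actuals"] / rollup["plans"]
print(rollup)
```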

Trying to Provide OLAP Functionality with a Spreadsheet

Now let's explore how we satisfy end-user information demands using spreadsheets. The first problem you face with spreadsheets is deciding the form of the base data. Would you build one gigantic worksheet? And if so, how would you organize it? Would you nest weeks within stores within scenarios along the rows and then nest measures within products along the columns? Or, perhaps, would you nest measures within stores within products along the rows and nest weeks within scenarios along the columns? The possibilities are numerous. Also, the method you choose affects the ease of defining subsequent structures. For example, the auto-sum function in a spreadsheet will only work with contiguous cells, so cells that you want to sum need to occupy such contiguous ranges. You definitely have to think about how you are structuring your base data.
Probably, for manageability reasons, you would distribute the base data across a variety of worksheets. Here again, there are many options. You could create a separate worksheet for each product and/or each store and/or each week. Theoretically, any method will work. Of course, you will be faced with the unpleasant task of integrating data from many worksheets, perhaps hundreds or even thousands. For example, if you decide to break up the base data into groups of simple worksheets so that each worksheet can be managed more easily, and toward that end you decide that all base data worksheets will be organized with stores along the rows and measures along the columns, with a separate worksheet for each product, scenario, and week, and if you assume there are 100 products, 3 scenarios, and 13 weeks (not an overly complex model), then you will have 3,900 separate base data worksheets. This is not a feasible solution. Spreadsheets are not geared for working with multidimensional data, OLAP's first logical requirement.
The next major problem you face is trying to aggregate the base data. Each of the worksheets will need to have the rollup information embedded for the purposes of defining aggregates, or you could define even more worksheets. In either event, there will be massive amounts of structural redundancy. For example, you will need to define city and regional aggregates for stores. Given that there are 3,900 worksheets, all organized by store, you will have to make 3,900 copies of a store-city-region hierarchy, one for each worksheet. Thus, spreadsheets do not provide support for hierarchies, OLAP's second logical requirement.
Hierarchies are not the only structures that are massively duplicated; so too are formulas. After all, spreadsheets work with cell-based formulas. Every aggregate cell needs to be defined with a formula. Not only do you have to repeat the store hierarchy information 3,900 times, but for each of those 3,900 times you would have to repeat, for example, the formula that sums stores 1 through 3 into the Northeast region for each of 13 measures. So the formula that says "compute the value for the Northeast region as the sum of stores 1 through 3," which in a multidimensional model is defined once, would be defined 3,900 × 13 or 50,700 times across all your worksheets! Clearly, spreadsheets do not support dimensional calculations, OLAP's third logical requirement.
Defining hierarchies for the nonrepeated dimension values (in this example, products, scenarios, and weeks) is no less complicated.
Creating product aggregates requires linking each underlying product worksheet to the appropriate aggregate worksheet on a cell-by-cell basis. With 100 stores and 13 measures per worksheet, this means creating 1,300 links for each aggregate worksheet. Can you think of anyone who really wants to do this?

Assuming that through some combination of perseverance and luck you have managed to create all the necessary aggregations, the question remains: How are you going to view the aggregates? As long as you are viewing information in the form in which you defined it for the purposes of aggregation, there is no problem. For example, if you defined your low-level spreadsheets with products nested within stores along the rows and variables nested within time along the columns, your aggregate view would look like that shown in Figure 2.5.

What if you wanted to compare how different regions performed for the same products? In other words, what if you wanted to see how toys sold in the Northeast as opposed to the Southeast? You would want to see a view, such as the one shown in Figure 2.6, where regions were nested inside of products along the rows. The view shows the same information as the one in Figure 2.5, but the arrangement is different.

Unfortunately, the spreadsheet offers no simple way to rearrange views. As any spreadsheet user knows all too well, you need to build the view from scratch. The fastest way (and the word "fast" is definitely misleading here) to build the new view is to create a new view template with just the margins filled in and then bring in the data through intersheet references. Because the organization of the two sheets is different, you will need one formulaic link per cell. Spreadsheets also have a problem with view reorganization. This is because spreadsheets cannot separate the structure of a model from the views of that model; that separation is OLAP's fourth logical requirement. For example, there is no way in a spreadsheet to capture the model's dimensional hierarchy information apart from creating worksheets that embody the dimensions in views.8

Sales Actuals            Quarter 1   Quarter 2   Quarter 3   Quarter 4
Northeast  skin care         7,191       9,556      12,857      17,889
           furniture         7,360       9,780      13,157      18,305
Midwest    skin care         6,655       8,843      11,905      16,565
           furniture         9,124      12,126      16,291      22,662

Figure 2.5 Aggregated spreadsheet view.

Sales Actuals            Quarter 1   Quarter 2   Quarter 3   Quarter 4
skin care  Northeast         7,191       9,556      12,857      17,889
           Midwest           6,655       8,843      11,905      16,565
furniture  Northeast         7,360       9,780      13,157      18,305
           Midwest           9,124      12,126      16,291      22,662

Figure 2.6 View reorganization.

USING SPREADSHEETS AS DISPLAY VEHICLES FOR MULTIDIMENSIONAL SYSTEMS

Before closing the book on spreadsheets, it is fair to say that spreadsheet and multidimensional products share the same display grid or matrix metaphor. This is why, as you will see in the next chapter, spreadsheets make such a fine vehicle for displaying views or slices of a multidimensional model. Because almost everybody is familiar with how to perform analysis with a spreadsheet, many vendors of multidimensional tools have intentionally used spreadsheet products as interfaces.

As a result, the same worksheets that represent your base data limit the aggregate views that can be created. And each view, such as the ones shown in Figures 2.5 and 2.6, represents a separate grid of numbers.

Although it is theoretically possible to create hierarchical multidimensional views with a traditional spreadsheet, it is practically infeasible. Add to this the fact that today's challenge is enterprise-wide and requires server and client resources combined across a network, and you can see why RAM-based spreadsheets can't create multiuser, multidimensional information systems.

Trying to Provide OLAP Functionality with SQL

As we have seen, it is not so easy to deliver OLAP-style functionality with a spreadsheet. Let's take a look at doing it with SQL. After all, SQL is a database language based on structured English. It is supposed to be powerful and intuitive. Look again at Figure 2.4. These are the base tables for our example model. Since it is straightforward in SQL to create dimension tables and star schemas, SQL—unlike spreadsheets—does support multiple dimensions.

If SQL were being used to perform the aggregations, it would create one aggregate table for each unique combination of aggregation levels. With four levels for time, three for stores, and three for products, there are 36 unique aggregation levels (including the base level), requiring 36 unique aggregate tables. The combinations are enumerated in Figure 2.7. Each of the tables would have to be created with a separate CREATE TABLE statement. For example, the following SQL fragment would create the table for the quarter by region by product type aggregation level:

CREATE TABLE quarter_region_prodtype (
    quarter  CHAR(16),
    region   CHAR(16),
    prodtype CHAR(16),
    scenario CHAR(16),
    sales    DOUBLE PRECISION,
    costs    DOUBLE PRECISION
)

Time: Year, Quarter, Month      Stores: All, Region, Store      Products: All, Type, Group, Prod.

 1  month.store.product       13  qtr.store.product       25  year.store.product
 2  month.store.pr group      14  qtr.store.pr group      26  year.store.pr group
 3  month.store.pr type       15  qtr.store.pr type       27  year.store.pr type
 4  month.store.all prod      16  qtr.store.all prod      28  year.store.all prod
 5  month.region.product      17  qtr.region.product      29  year.region.product
 6  month.region.pr group     18  qtr.region.pr group     30  year.region.pr group
 7  month.region.pr type      19  qtr.region.pr type      31  year.region.pr type
 8  month.region.all prod     20  qtr.region.all prod     32  year.region.all prod
 9  month.all strs.product    21  qtr.all strs.product    33  year.all strs.product
10  month.all strs.pr group   22  qtr.all strs.pr group   34  year.all strs.pr group
11  month.all strs.pr type    23  qtr.all strs.pr type    35  year.all strs.pr type
12  month.all strs.all prod   24  qtr.all strs.all prod   36  year.all strs.all prod

Figure 2.7 Thirty-six unique aggregation levels.

Then, to set up for populating the quarter_region_prodtype table, views that contain only the relevant aggregation levels would be created for each of the three lookup tables in Figure 2.4. For example, in order to populate the quarter_region_prodtype table shown in the preceding SQL fragment, a view of the product lookup table would first be created that contained only product IDs and product types. Otherwise, the columns of the original lookup tables that are not needed for the quarter_region_prodtype table—such as the product name and product group columns in the product lookup table—would be needlessly replicated upon joining with the base table. The SQL used to create these three lookup views is as follows:

CREATE VIEW quarter_name (quarter, month_id)
AS SELECT quarter, month_id
FROM time_lookup

CREATE VIEW region_level (region, store_id)
AS SELECT region, store_id
FROM store_lookup

CREATE VIEW product_type (prodtype, prod_id)
AS SELECT prodtype, prod_id
FROM product_lookup

Then each of the three lookup table views would be joined with the base table to create a modified view of the base table containing the aggregate-level columns quarter, region, and prodtype in addition to the base-level columns month_id, store_id, and prod_id. The modified base table view would be defined by the following SQL statement:

CREATE VIEW qrp_modified_base_table
    (month_id, quarter, store_id, region, prod_id, prodtype, scenario, sales, costs)
AS SELECT b.month_id, q.quarter, b.store_id, r.region, b.prod_id, p.prodtype,
    b.scenario, b.sales, b.costs
FROM base_table b
    JOIN quarter_name q ON b.month_id = q.month_id
    JOIN region_level r ON b.store_id = r.store_id
    JOIN product_type p ON b.prod_id = p.prod_id

The actual aggregate table is then populated by the following SQL statement:

INSERT INTO quarter_region_prodtype
SELECT quarter, region, prodtype, scenario, SUM(sales), SUM(costs)
FROM qrp_modified_base_table
GROUP BY quarter, region, prodtype, scenario

The aggregation functions for the variables would have to be defined within each of the aggregate tables. Repeating this process a total of 35 times will generate all the basic aggregate values, such as the sum of products across all stores. (We started off with the month-store-product level table, so 1 of the 36 was already there.) So, although SQL does support multiple dimensions, OLAP's first logical requirement, it does not offer strong support for hierarchies, OLAP's second logical requirement. SQL is flexible enough to define them, but there is no built-in support. The process as a whole is cumbersome.

But how about generating some of the comparisons that were shown in Figures 2.2 and 2.3? Alas, this is not a trivial exercise. Let's calculate the ratio between actuals and plans that is shown in the rightmost column of Figures 2.2 and 2.3. The problem is that the difference between actuals and plans for any measure is defined across the rows of the base table. For example, referring back to the base table in Figure 2.4, to calculate the ratio of actuals to plan for the sale of rose water soap for the Ridgewood store in the first quarter (store 1, time 1, product 1), you would need to calculate the ratio between the sales figure in row 1 (the actuals value) and the sales figure in row 2 (the planned value). SQL has no good way of defining interrow comparisons or of inserting new rows that are defined in terms of arithmetic operations performed on other rows. SQL processing is column-oriented; therefore, you need to somehow put plans and actuals in separate columns. SQL's lack of support for interrow calculations is a result of SQL's lack of support for dimensional calculations, OLAP's third logical requirement.

The way to place actuals and plans in separate columns is to create two views for each of the 36 already existing aggregate tables. One of the views will hold just the actual values, and the other will hold just the planned values. This is shown in the following SQL statements. (For simplicity, the remaining examples pretend that the base data is joined with the appropriate names found in the lookup tables.)

/* create view of actuals */
CREATE VIEW actual_quarter_region_prodtype
    (quarter, region, prodtype, act_sales, act_costs)
AS SELECT quarter, region, prodtype, sales AS act_sales, costs AS act_costs
FROM quarter_region_prodtype
WHERE scenario = 'actuals'

/* create view of plans */
CREATE VIEW plan_quarter_region_prodtype
    (quarter, region, prodtype, plan_sales, plan_costs)
AS SELECT quarter, region, prodtype, sales AS plan_sales, costs AS plan_costs
FROM quarter_region_prodtype
WHERE scenario = 'plans'

Figure 2.8 shows the view returned by selecting rows from the actual_quarter_region_prodtype view. Notice that we haven't done anything useful yet in terms of calculating variances. All we've done is reorganize the data in order to get around SQL's limited interrow math by converting what had been row differences into column differences. To begin to use these columns, you now need to join the two views together in a query. The following SQL code will perform that join:

SELECT quarter, region, prodtype, act_sales, plan_sales
FROM actual_quarter_region_prodtype
    JOIN plan_quarter_region_prodtype
    USING (quarter, region, prodtype)

The join creates a table view that contains the shared quarter, region, and product type columns, and separate columns for actual sales and planned sales, as shown in Figure 2.9. This arrangement is suited to the intercolumn arithmetic that SQL can express.

Quarter   Region      Prod. Type   act_sales   act_costs
1         Northeast   soap             4,217       4,019
1         Northeast   lotion           2,974       2,474
1         Northeast   office           5,313       4,839
1         Northeast   home             2,047       1,918
2         Northeast   soap             5,603       5,339
2         Northeast   lotion           3,952       3,288
2         Northeast   office           7,060       6,431
2         Northeast   home             2,720       2,549
1         Midwest     soap             4,174       3,883
1         Midwest     lotion           2,481       2,266
1         Midwest     office           7,312       6,667
1         Midwest     home             1,812       1,342
2         Midwest     soap             5,546       5,159
2         Midwest     lotion           3,297       3,011
2         Midwest     office           9,718       8,860
2         Midwest     home             2,408       1,783

Figure 2.8 Table of actual values.

Quarter   Region      Prod. Type   act_sales   act_costs   plan_sales   plan_costs
1         Northeast   soap             4,217       4,019        4,316        3,995
1         Northeast   lotion           2,974       2,474        2,779        2,349
1         Northeast   office           5,313       4,839        5,081        4,634
1         Northeast   home             2,047       1,918        2,117        1,961
2         Northeast   soap             5,603       5,339        5,735        5,308
2         Northeast   lotion           3,952       3,288        3,693        3,121
2         Northeast   office           7,060       6,431        6,752        6,158
2         Northeast   home             2,720       2,549        2,813        2,606
1         Midwest     soap             4,174       3,883        4,068        3,777
1         Midwest     lotion           2,481       2,266        2,498        2,266
1         Midwest     office           7,312       6,667        6,551        5,948
1         Midwest     home             1,812       1,342        1,769        1,322
2         Midwest     soap             5,546       5,159        5,405        5,018
2         Midwest     lotion           3,297       3,011        3,319        3,011
2         Midwest     office           9,718       8,860        8,706        7,905
2         Midwest     home             2,408       1,783        2,351        1,756

Figure 2.9 View of the joined tables containing a column for actual and planned sales and costs.

Quarter   Region      Prod. Type   act_sales   plan_sales   variance
1         Northeast   soap             4,217        4,316       0.98
1         Northeast   lotion           2,974        2,779       1.07
1         Northeast   office           5,313        5,081       1.05
1         Northeast   home             2,047        2,117       0.97
2         Northeast   soap             5,603        5,735       0.98
2         Northeast   lotion           3,952        3,693       1.07
2         Northeast   office           7,060        6,752       1.05
2         Northeast   home             2,720        2,813       0.97
1         Midwest     soap             4,174        4,068       1.03
1         Midwest     lotion           2,481        2,498       0.99
1         Midwest     office           7,312        6,551       1.12
1         Midwest     home             1,812        1,769       1.02
2         Midwest     soap             5,546        5,405       1.03
2         Midwest     lotion           3,297        3,319       0.99
2         Midwest     office           9,718        8,706       1.12
2         Midwest     home             2,408        2,351       1.02

Figure 2.10 End-user view of variance data.

Now that the columns are suitably arranged, we can divide values across columns in a straightforward way. For example, the following SQL view will generate the table shown in Figure 2.10.

CREATE VIEW var_quarter_region_prodtype
    (quarter, region, prodtype, act_sales, plan_sales, variance)
AS SELECT quarter, region, prodtype, act_sales, plan_sales,
    act_sales / plan_sales AS variance
FROM actual_quarter_region_prodtype
    JOIN plan_quarter_region_prodtype
    USING (quarter, region, prodtype)

Repeating this process a total of 35 times will generate all the correct views from which to compute variances for all the aggregate levels in the model. To generate the data values for our reports, using only summary tables, we will need to join up to 8 or 12 of them together, as the reports contain quarters and months combined with products, product types, product groups, and all products, or stores, regions, and the entire chain of stores.

Note how the months, products, and stores are all displayed along the rows. This is not how the information was displayed in Figures 2.2 and 2.3. In those figures, the months were displayed in a natural way across columns. It is common and useful to be able to organize data across rows and columns. It is done all the time with spreadsheets and in presentations. What about with SQL? How would you rearrange the view to show months along the columns? Basically, SQL has no way to transpose rows and columns. You can write a special-purpose view that pivots a particular column into a set of columns when you know in advance which values that column contains, but it will be tricky to code, and it will require recoding if the values in the column change. Thus, SQL is also weak in its separation of data structure and representation, OLAP's fourth logical requirement.

You can see that SQL has no natural way to define comparative functions or provide flexible view reorganizations that involve the transposing of rows and columns. Something as simple as the ratio of planned sales to actual sales, which can be defined in a minute or two with a typical spreadsheet by copying a formula across a new variance column or, as you'll see in the next chapter, even more quickly with a multidimensional tool, is undefinable without going through a whole sequence of complicated maneuvers like the ones we just went through in this section. Ralph Kimball put this well when he said that a freshman in business needs a Ph.D. in SQL.9

SQL has other analytical limitations as well. Common analytical functions such as cumulative averages and totals, subtotals as found in typical spreadsheets, and rankings of data are not supported in standard SQL, though vendors have implemented extensions to support them. The topic has received some interest from notable relational experts.10 Convenient expression of multidimensional aggregation, including partial aggregations across multiple dimensions at once, has also received attention.11 The latest enhancements to SQL have taken steps to provide these capabilities.

One fundamental source of complication for SQL in performing analytical operations is its weakness at interrow computations, as shown previously. The SQL-99 standard, as implemented by vendors, supports user-defined (DBA-defined) stored functions that can be coded in procedural logic, thereby overcoming many of the current shortcomings. However, while any application can be created in a simple procedural language, the point is not whether something can be done, but how difficult it is to do. To code an interrow ratio function as a SQL-99 stored function would be akin to programming it in COBOL. Adding intrinsic row functions to SQL would make it extremely different from the SQL of today.

The organization of definitions in a SQL database is another limitation of SQL that affects even the proposed extensions for handling analytical functions and user-defined functions.
All views (stored queries) and stored functions (in the SQL-3 standard)12 would be organized by database schema, just as tables and indexes are. They are placed in a flat list structure, just like any one table. The structures that they would be called on to define and compute, however, would be hierarchical and multidimensional. The disparity of structure between the analytical elements and the operations that define them


will cause them to be difficult to manage. (Even though a spreadsheet function does not have the general expressive power of a SQL query, you at least know where to go in a spreadsheet to view or modify a formula for a particular cell.) If you altered the definitional structure of a SQL database to give its catalog the same hierarchical and dimensional structure as the analytical information stored in its data tables, you would have gone a long way to metamorphose SQL into a multidimensional language.

Instead of using complicated procedures to contort SQL into providing analysis-based, decision-oriented information, it is also possible to issue multiple simple SQL commands and perform the analytical functions on the client. The problem with this strategy is that the amount of data that gets returned is frequently too large for the client and overloads either the client or the network or both. For example, what if you worked for a large retail organization and you wanted to see your top 20 products in terms of cumulative sales? If you had 500 stores and weekly sales totals for 50,000 products, and you were 20 weeks into the year, you would have to send 500 million numbers across the network (dubious) to your desktop (highly dubious) to calculate 20 numbers. Rankings are an important class of comparative function. When large amounts of data are involved, it is not the kind of operation to be performed on the desktop.
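As an aside, the SQL enhancements alluded to above did eventually standardize several of these operations. The sketch below is not part of the chapter's worked example; it simply illustrates, against the hypothetical tables defined earlier in this chapter, how the windowed aggregates and grouping extensions introduced with the SQL:1999 OLAP additions (and implemented to varying degrees by vendors) express a server-side ranking and a set of partial aggregations that the base language could not.

-- A hedged sketch of later SQL OLAP extensions; vendor support varies.

-- Rank product types by cumulative actual sales on the server,
-- instead of shipping detail rows to the desktop.
SELECT prodtype,
       SUM(sales) AS total_sales,
       RANK() OVER (ORDER BY SUM(sales) DESC) AS sales_rank
FROM quarter_region_prodtype
WHERE scenario = 'actuals'
GROUP BY prodtype;

-- Produce several aggregation levels in one pass rather than in 36 separate tables.
SELECT quarter, region, prodtype, SUM(sales) AS sales
FROM qrp_modified_base_table
WHERE scenario = 'actuals'
GROUP BY ROLLUP (quarter, region, prodtype);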

Summary

Table 2.1 roughly summarizes the degree of support provided by spreadsheets and SQL databases, as suggested by the examples in this chapter, by structured comparisons between spreadsheets and OLAP tools, and by the publicly documented efforts to provide even basic OLAP functionality through extensions to SQL. This table also shows what one would like to see in an OLAP-supporting product for the four core logical requirements for OLAP. The numbers range from 1 to 5, with 5 being the highest. As you've seen in this chapter and can see here in summary, neither spreadsheets nor SQL databases provide enough support for OLAP's core logical requirements.

Table 2.1 Comparison of Support for Core Logical OLAP Requirements

                                             SPREADSHEETS   SQL DATABASES   OLAP REQUIREMENTS
Multiple dimensions                                1               5               4/5
Hierarchies                                        1               3               4/5
Dimensional calculations                           1               2               4/5
Separation of structure and representation         1               3               4/5

It is normal to want to be able to phrase analytical queries in a natural or near-natural way. If you want to see product totals and subtotals, this year/last year comparisons, moving 52-week averages, top-selling products, changes in market share between this year and last year, or percent changes in margins, that's all you should have to ask for. Spreadsheets let you phrase analytical queries in a near-natural way, but only for small amounts of data that can be organized in a two-dimensional grid. SQL lets you phrase simple queries based on columns of data in a near-natural way as long as they do not involve any analysis or interrow comparisons. Neither approach lets you easily phrase analytical queries against large amounts of complexly organized data.

Regardless of whether horizontally specialized OLAP vendors continue to find markets beyond the reaches of expanding relational database vendors, OLAP will continue to thrive as a data definition and manipulation technology independent of how and where it is physically instantiated.

CHAPTER 3

Thinking Clearly in N Dimensions

The notion of a hypercube, or a cube with more than three dimensions, is fundamental to an understanding of multidimensional software that uses hypercubes in the same way spreadsheets use worksheets and databases use tables. All browsing, reporting, and analysis are done in terms of hypercubes.

Hypercubes are typically introduced following a description of lower-dimensional surfaces such as lines, planes, and cubes. The reader is then left to visualize, by analogy, a higher-dimensional cube.1 However, this is not the best approach because the path to understanding hypercubes does not pass through the length, width, and height of a physical cube. This chapter shows you how to think clearly about N-dimensional data sets or things in the world by incrementally building on something understood by everyone: a two-dimensional row-by-column arrangement of data. By the end of this chapter, you will have a solid understanding of what hypercubes are and what they are not, and you will be ready to assimilate the information found in this book on OLAP solutions or multidimensional information systems.

Lower-Dimensional Data Sets

Let us start with a typical example of two-dimensional data. Anything that you track, whether it is hours per employee, costs per department, balance per customer, or complaints per store, can be arranged in a two-dimensional format.


Month        Sales   Direct Costs   Indirect Costs   Total Costs   Margin
January        790        480             110            590         200
February       850        520             130            650         200
March          900        530             140            670         230
April          910        590             150            740         170
May            860        600             120            720         140
June           830        490             100            590         240
July           880        500             110            610         270
August         900        620             130            750         150
September      790        300              90            390         400
October        820        540             100            640         180
November       840        570             150            720         120
December       810        600             120            720          90
Total       10,180      6,340           1,450          7,790       2,390

Figure 3.1 Sales, costs, and margin by month.

Figure 3.1 shows five columns of sales and cost data organized by month in a two-dimensional grid. This grid could easily be created in any spreadsheet program and displayed on any computer screen. Months are arranged down the rows. The total for all months is displayed on the bottom row. The grid has five columns: one for each sales or cost variable. The data set may be said to have two dimensions: a row-arranged month dimension and a column-arranged variables dimension. (In fact, there are a variety of ways of characterizing the dimensionality of a data set, which will be treated in depth beginning in Chapter 4. For now, it is sufficient to think in terms of 1-N locator or identifier or key dimensions with 1-N variables collected in a single variables dimension.)

Individually, sales, costs, and margins represent variables. Variables are the items we are tracking. If someone were to ask you the question, "What are you measuring or tracking?" you would respond, "Sales, costs, and margins." Each member or element of the variables dimension is a variable.

In contrast, the months represent how we are organizing the data. We are not tracking months. We are using months to individuate sales and cost values. If someone were to ask you the question, "From where are you getting your data?" or "How often are you making measurements?" you would respond, "We are tracking sales on a monthly basis." Months are a type of key, identifier, or locator dimension. Our simple data example has two dimensions: one locator dimension and one variables dimension.

What happens when we add a third dimension called products? It seems easy enough to visualize. After all, it is just a cube. Figure 3.2 shows a three-dimensional cube that represents products, months, and variables. But where is this cube that we are showing? Is it on the computer screen? Is it out there in the world? Is it in your mind? Where is it?

As Magritte might have said, the cube cannot be a part of the computer screen any more than an apple can be a part of a piece of paper (see Figure 3.3). The computer screen, like a sheet of paper, has only two (usable) physical dimensions.

(Cube axes: Time = Jan., Feb., Mar., Apr.; Product = Skis, Boots, Bindings, Poles; Variable = Sales, Costs, Margin.)

Figure 3.2 Three-dimensional cube: products, months, and variables.

Ceci n'est pas une pomme.

This is not an apple.

Figure 3.3 This is not an apple.

We can create a two-dimensional representation of a three-dimensional cube on a computer screen. It is done all the time. But we cannot create a three-dimensional representation of anything as part of a two-dimensional surface.

This raises the first of our dimension hurdles: The actual two-dimensional screen display of data is separate from the metaphors we use to visualize that data. The cube we visualized in Figure 3.2 is just that, a visual metaphor. It could have even been a real cube. But it is not the same as the two-dimensional display shown in Figure 3.4. Now why is this an important point? It is important because there are natural limits to what can be done with a two-dimensional computer screen. This limitation made software developers think long and hard about the optimal way of representing information with more than two dimensions on a flat computer screen for the purposes of viewing and manipulation.

Figure 3.5 shows a spreadsheet-like display of the three-dimensional data set visualized in Figure 3.2.

Figure 3.4 Flat-screen display of a cube.

page: Product: shoes        columns: Variables: all

              Sales   Direct Costs   Indirect Costs   Total Costs   Margin
January         520        320             110             430         90
February        400        250             130             380         20
March           430        300             120             420         10
April           490        320             150             470         20
May             520        310             180             490         30
June            390        230             150             380         10
July            470        290             160             450         20
August          500        360             150             510        -10
September       450        290             140             430         20
October         480        290             140             430         50
November        510        310             150             460         50
December        550        330             160             490         60

rows: Time: months

Figure 3.5 Typical three-dimensional display.

Most of the display looks the same as the two-dimensional display shown in Figure 3.1. It is essentially a two-dimensional grid except that, in the upper-left portion of the display, there is an icon called page with the label Product: Shoes. The page icon stands for a third, so-called page dimension. The three-dimensional data set consisting of variables, time, and products is displayed on a computer screen in terms of the three display dimensions: row, column, and page. The row and column display dimensions correspond to the row and column screen dimensions.

(Same grid as Figure 3.5, shown as one slice—the shoes product—of a three-dimensional cube whose dimensions are products, months, and variables.)

Figure 3.6 Slice of the cube.

We can see as many rows and columns of data as the screen's rows and columns permit. Given a large enough screen, we could see the whole of any two-dimensional data set. In contrast, the page dimension doesn't correspond to anything that is actually on the screen. No matter how big the screen, all that is seen is an indicator saying which page is currently visible: shoes, socks, shirts, and so on. Still, it is easy to visualize the relationship between the data shown on the screen and the whole of the data set stored in the computer. All you have to do is imagine a three-dimensional data cube and a screen display showing one slice of that cube, as illustrated in Figure 3.6.

Beyond Three Dimensions

Suppose that you are tracking different measures of different products by month for a chain of stores. You've now got a four-dimensional data set. What kind of visual metaphor should you use to represent the whole? How should the relevant information be organized on the screen? Trying to use a cube as the basis for a four- or higher-dimensional visualization can get very messy quickly. Figure 3.7 shows a picture of a tesseract, the technical name for a four-dimensional cube. It looks way too complicated—an indication that something is wrong.

The cube metaphor shown in Figure 3.2 was easy to understand. Likewise, adding the notion of stores to the data set in Figure 3.1 didn't seem to add much additional complexity. So why, when we add just one more factor to the data set, does the complexity of its visualization skyrocket? And what would we do to represent a five- or six-dimensional data set? Another way of asking the question is, Why does the cube metaphor work so well for data of up to three dimensions but break down as we add a fourth?


Figure 3.7 View of a tesseract.

This brings us to the second dimension hurdle: the difference between logical and physical dimensions.

To answer that question, consider what the great Austrian thinker Ludwig Wittgenstein postulated back in the 1920s.2 For something to serve as a representation of something else, the representation needs to share some structural attributes of the thing represented. Thus, there needs to be some aspect of the cube representation that mirrors some aspect of the data-generating events, which explains why the cube metaphor works for lower-dimensional data sets. There also has to be some structural aspect of the cube that is not mirrored by the data-generating events, which explains why the cube metaphor breaks down for data sets of more than three dimensions.

Figure 3.8 is a simple sketch of an event that could have generated the data shown in Figure 3.6. It is a picture of a retailer selling a product to a customer in exchange for money and buying supplies from a supplier. It certainly doesn't look like a cube. Figure 3.9 displays a series of these pictures: one for each month and their correspondence with a cube arrangement of data. Yet there is something about the cube that makes it an intuitive representation of the event. Let's explore this further.

One attribute of any cube is that all its dimensions are coexistent at every point.

This is illustrated in Figure 3.10, which shows how any point (xi, yj, zk) in three-dimensional space is identified by its x, y, and z values. In terms of data representation, this means, for example, that one point in the cube might represent the fact that $1,000 worth of shoes was sold in February. Each fact in the cube is identified by one value from each dimension.


Figure 3.8 Retailer-centric view of the world.


Figure 3.9 Events that could generate the data in Figure 3.6.


Figure 3.10 Points are identified by their x, y, and z values.

Going back to the event image in Figure 3.8, you can see that every business dimension of the event is also coexistent. For each sales transaction, one can always identify a product sold, a dollar amount, and a time. Thus, the attribute "coexistent dimensions" is shared by the cube and the data-generating events.

Another attribute of any cube is that each of its coexistent dimensions is independent of all the others. As illustrated in Figure 3.11, change can occur independently in any dimension.


Figure 3.11 Change can occur by any value of any dimension.

From the point 111, one can travel to the point 112, 121, 211, and so forth. In terms of the data-generating event, this means, for example, that any product could be sold at any time. And at any time, any product could be sold. (Actually, the condition of independence may or may not hold for the variables, but it is safe to ignore that now. For a full treatment of variables and formulas, see Chapter 7, and for a discussion of interdependent dimensions, please see Chapter 20.) Going back to the event image in Figure 3.8, you can imagine that every business dimension is independent of every other. From shoe sales in February, you can navigate to shoe sales in March or sock sales in February. That's because products, stores, time, and variables are independent of one another. Thus, the attribute "independent dimensions" is shared by the cube and the data-generating events.

So where do things break down? What structural attribute of the cube is not shared by the data-generating events? What's another attribute of all cubes? Think about it for a minute.

What about angularity? In textbook geometry, the x axis is perpendicular to the y axis, which is perpendicular to the z axis. The three perpendicular angle-related axes translate perfectly into the physical dimensions of length, width, and height. Physical geometry is practical and has served humankind well for over two millennia. (Incidentally, the Greeks inherited their geometry from the Egyptians, who used it for surveying.) Now, what does angle have to do with the relationship between variables, stores, time, and products? Does it make any sense to say that stores are perpendicular to products or that customers are perpendicular to time? It makes no sense at all. (It does make sense to say that the display of stores on the computer screen is perpendicular to the display of products, but that is a totally different topic.) Although there is nothing wrong with using an angle-based cube for representing events, the angle-based cube cannot represent more than three dimensions because there are only three angle-based independent dimensions to our physical space.


The angle-based definition of cube dimension relationships is not necessary for a useful representation of the event. A useful representation requires coexistent and independent dimensions regardless of how that coexistence and independence are defined. These two properties are logical properties, not physical properties. Any metaphor that provides a consistent definition of independent and coexistent dimensions will work.

Multidimensional Type Structures (MTSs)

Here we introduce a new metaphor for representing data, data-generating events, and, ultimately, OLAP meta data that is not based on angle-defined dimensions and that is capable of representing any number of event dimensions. If you want to give it a name, you may call it a multidimensional type structure (MTS). An example of a simple representation in the new metaphor is shown in Figure 3.12 along the path between data-generating events and data cubes shown in Figure 3.9. Each dimension is represented by a line segment. Every member within a dimension is represented by a unit interval within the segment. As we are starting with a three-dimensional example, there are three line segments: one for time, one for products, and one for variables. Any union of one interval from each of the three line segments is connected to an element in the event and in the cube. For example, in Figure 3.12, the MTS highlights March shoe sales as does the cube.

In the same way that one can move independently in each cube dimension, one can move independently in each MTS dimension, as shown in Figure 3.13. In Figure 3.13, where there are 12 time periods, 10 products, and 5 variables, there are 12 × 10 × 5 = 600 hypercube intersections, or potential data points. In this sense, an MTS is more descriptive than a physical cube. However, an MTS doesn't show actual data points, just possible combinations of dimension members. So here it is marginally less descriptive than a cube that can at least allude to (though it cannot actually show) all data points.

(Diagram: a data-generating event, an MTS with Time, Product, and Variable segments, and a data cube; March shoe sales is highlighted in both the MTS and the cube.)

Figure 3.12 MTSs are a way of representing events.

Time:     Jan., Feb., Mar., Qtr. 1, Apr., May, Jun., Qtr. 2, Jul., Aug., Sep., Qtr. 3
Product:  Tables, Desks, Chairs, Lamps, Shirts, Shoes, Socks, Caviar, Coffee, Wine
Variable: Margin, Total sales, Direct sales, Indirect sales, Cost

Figure 3.13 You can move independently in each MTS dimension.

Then again, the purpose of a visual metaphor is to give a useful representation of the structure of a model. It is the job of the display to show the data.

Adding a Fourth Dimension

Using an MTS, it is easy to add a fourth dimension to the model. Remember when we tried to add a store dimension to the cube? That's when things started to break down. But not with an MTS. It's a cinch. Just add a fourth line segment called stores, as shown in Figure 3.14. The MTS is not a pictorial representation of the data-generating event, but then neither was the cube. The MTS shows the number of data points extracted from the event and their logical organization. It shows all the dimensions one can browse in and how far one can go in any dimension. As you will see later in this book, an MTS can also show information about hierarchies and data flows within and between hypercubes. It shows more structural information than a cube, and it can show it for an arbitrary number of dimensions.

If you want a more realistic picture of the event, you should consider using some enhanced version of the simple images I used in Figure 3.12 or perhaps using a camera. If you want a useful computer screen representation of the data in numeric form, you need to use some grid-like display form like the three-dimensional spreadsheet metaphor shown in Figure 3.6.

Time:     Jan., Feb., Mar., Apr., May, Jun., Jul., Aug., Sep., Oct., Nov., Dec.
Product:  Tables, Desks, Chairs, Lamps, Shirts, Shoes, Socks, Caviar, Coffee, Wine
Variable: Margin, Total sales, Direct sales, Indirect sales, Cost
Store:    Store 1, Store 2, Store 3, Store 4, Store 5, Store 6, Store 7, Store 8

Figure 3.14 The fourth dimension may be represented by a fourth line segment.

Representing Hypercubes on a Computer Screen

We’ve figured out why the physical cube metaphor breaks down, and we’ve introduced a logical visualization principle for representing the structure of N-dimensional data sets. We still need to see them on the computer screen. This brings us to our final dimension hurdle: mapping multiple logical dimensions onto two physical (screen) dimen- sions. Look again at the image of a three-dimensional grid-style interface, as shown in Figure 3.5. How are we going to represent four or more logical dimensions given our three display dimensions of row, column, and page? The answer is we combine multi- ple logical dimensions within the same display dimension. Let’s examine this more closely. Look at Figure 3.15. It shows a two-dimensional arrangement of products by vari- ables with the name of each intersection explicitly written. Notice how each point or intersection in the two-dimensional grid is formed from one member of each dimen- sion. Also notice how each member of each dimension combines with each member of the other dimension. In this simple example two products and three variables make a total of six product variable combinations. Mapping the two dimensions into one dimension means creating a one-dimensional version of the two-dimensionally arranged intersections. Although any one-dimensional arrangement will work, the typical method is to nest one dimension within the other. For example, Figure 3.16 shows how to create a one-dimensional list of variables nested within products from the two-dimensional grid originally shown in Figure 3.15. Notice how the list scrolls through every member of the variables dimension for each product as it is scrolling through products. You can think of it as a generic loop:

For products = 1 to N
    For variables = 1 to N
    End variables
End products

            sales        cost        margin
shoes       shoe sales   shoe cost   shoe margin
socks       sock sales   sock cost   sock margin

Figure 3.15 Two-dimensional arrangement.

    1  shoe: sales
    2  shoe: cost
    3  shoe: margin       (grid cells numbered: shoes = 1 2 3, socks = 4 5 6)
    4  sock: sales
    5  sock: cost
    6  sock: margin

Figure 3.16 Nesting variables within products.

    1  shoe: sales
    2  sock: sales
    3  shoe: cost         (grid cells numbered: shoes = 1 3 5, socks = 2 4 6)
    4  sock: cost
    5  shoe: margin
    6  sock: margin

Figure 3.17 Nesting products within variables.

This is what it means for variables to be nested within products. In contrast, Figure 3.17 shows how to create a one-dimensional list of products nested within variables from the same two-dimensional grid.

THE EFFECTS OF COMBINING DIMENSIONS

Two main things change as a result of combining dimensions: axis lengths and neighborhoods. One main thing does not change: truth criteria.

The first thing that changes is the shape of the viewable data. As you can see from Figures 3.16 and 3.17, the length of the one-dimensional list is equal to the product of the lengths of each of the two dimensions in the two-dimensional arrangement. When the lengths are as small as they are in these examples, the impact is negligible, but when the axis lengths are in the hundreds or thousands, it can make a real difference. For example, browsing through a two-dimensional grid of 100 by 100, which may fit all at once on your screen, is considerably easier than browsing through a list of 10,000 rows, which most certainly will require substantial scrolling. This doesn't mean you shouldn't combine dimensions, but you should be aware of how you are changing the shape of the data by combining them.

The second thing that changes as a result of combining dimensions is the set of neighbors surrounding any point.

In two dimensions, each point has four immediate neighbors; when the two dimensions are combined, each point in the one-dimensional list has just two immediate neighbors. This is shown in Figure 3.18. Notice how in the top panel, where the data is arranged in two dimensions, the sales value for Newbury in February has four neighboring cells. Compare this with the lower panel, which shows the exact same data, only in one dimension. Here, the sales value for Newbury in February has only two neighbors. The changing of neighbors can affect analyses and graphical visualizations because they make use of information about neighbors. (For a discussion of multidimensional visualization, see Chapter 9.)

One important thing does not change during the process. No matter how dimensions are combined, and no matter how the data grids are arranged, they make the same statements, claims, or propositions about the world, whose truth or falsehood are a function of the same criteria. (For a discussion of truth conditions, see Chapter 6.)

(Note the difference in adjacency)

             Jan    Feb    Mar
Ridgewood    555    611    677
Newbury      490    539    598
Avon         220    242    271

Jan   Ridgewood   555
      Newbury     490
      Avon        220
Feb   Ridgewood   611
      Newbury     539
      Avon        242
Mar   Ridgewood   677
      Newbury     598
      Avon        271

Note: highlighting in the original figure marks the cells adjacent to the Newbury/February sales value.

Figure 3.18 Two different dimensional structurings of the same data set.

Notice how in both Figures 3.16 and 3.17 the number of elements is the same for the one-dimensional as for the two-dimensional arrangement. No data is lost by combining dimensions. (Actually, some information is lost, as you will see later.) Any number of dimensions can be combined.

Now that you have learned how to combine dimensions, let us more fully demonstrate the process by adding two dimensions to our previous four-dimensional example. Figure 3.19 shows a six-dimensional data set consisting of products, times, stores, customers, variables, and scenarios. Figure 3.20 shows each dimension of Figure 3.19 connected to a row, column, or page role of a three-dimensional grid display.

Store:      Store 1, Store 2, Store 3, Store 4, Store 5, Store 6, Store 7, Store 8
Cust. Type: 18 males (1), 18 females (2), Adults (3), 65+ males (4)
Variable:   Margin, Total sales, Direct sales, Indirect sales, Cost
Time:       Jan., Feb., Mar., Apr., May, Jun., Jul., Aug., Sep., Oct., Nov., Dec.
Scenario:   Actual (1), Planned (2), Variance (3)
Product:    Tables, Desks, Chairs, Lamps, Shirts, Shoes, Socks, Caviar, Coffee, Wine

Figure 3.19 Six-dimensional MTS.

Page order dimensions:    Store = Store 3    Cust. Type = Males, 65+
Column order dimensions:  Product (Desks, Lamps) with Scenario (Actual, Planned) nested
Row order dimensions:     Time (January–March) with Variable nested

Store 3, Males, 65+                 Desks                 Lamps
                               Actual   Planned      Actual   Planned
January    Total sales            375       450         400       480
           Direct sales           250       300         267       320
           Indirect sales         125       150         133       160
February   Total sales            500       600         425       510
           Direct sales           333       400         283       340
           Indirect sales         167       200         142       170
March      Total sales            525       630         375       450
           Direct sales           350       420         250       300
           Indirect sales         175       210         125       150

Figure 3.20 Six-dimensional data display.

Notice how multiple dimensions are combined in the row, column, and page dimensions of the grid display. The same visual display that works for three dimensions can easily be extended to work with N dimensions. From here on, we will call this type of display a multidimensional grid display.

Page order dimensions:    Store = Store 3    Cust. Type = Males, 65+
Column order dimensions:  Variable (Direct sales, Indirect sales, Total sales) with Scenario nested
Row order dimensions:     Time (January–March) with Product nested

Store 3, Males, 65+        Direct sales         Indirect sales        Total sales
                          Actual  Planned      Actual  Planned      Actual  Planned
January    Desks             250      300         125      150         375      450
           Lamps             267      320         133      160         400      480
February   Desks             333      400         167      200         500      600
           Lamps             283      340         142      170         425      510
March      Desks             350      420         175      210         525      630
           Lamps             250      300         125      150         375      450

Figure 3.21 Different six-dimensional data display of the same MTS.

Page order dimensions:    Time = January    Cust. Type = Males, 65+
Column order dimensions:  Product (Desks, Lamps) with Scenario (Actual, Planned) nested
Row order dimensions:     Store (Store 1–Store 3) with Variable nested

January, Males, 65+                 Desks                 Lamps
                               Actual   Planned      Actual   Planned
Store 1    Total sales            400       550         430       500
           Direct sales           250       315         242       290
           Indirect sales         180       200         150       180
Store 2    Total sales            500       600         425       500
           Direct sales           340       390         273       335
           Indirect sales         167       200         145       165
Store 3    Total sales            375       450         400       480
           Direct sales           250       300         267       320
           Indirect sales         125       150         133       160

Figure 3.22 Yet a third six-dimensional data display of the same MTS.

Figures 3.21 and 3.22 show two different ways that the same six model dimensions can be mapped onto row, column, and page axes.

If you look at Figures 3.20 through 3.22, you will notice that there is always exactly one member represented from each model dimension shown on the pages of the screen. There is no way around this fact. Try getting around it. You will see that if you attempt to show more than one member from a dimension represented as a page, you will need to choose how to arrange the two or more members, and the only choices you have (in a flat, static screen display) are across the rows or across the columns. Of course, once the members of a dimension are displayed across the rows or columns, they are no longer being displayed on a page.

Multidimensional grids are so flexible that they can mimic any kind of regular table. After all, a table is just a special case of a grid where all or most of the dimensions are represented as columns. The table shown in Figure 3.23 can be thought of as a spreadsheet-like grid where five dimensions are represented as columns whose members are the column values, and one dimension—in this case, the variables dimension—has its members represented as column headings. Tables, such as the one shown in Figure 3.23, where one of the dimensions has its members represented as column headings and where the rest of the dimensions have their members represented as row values, are frequently (and in this book will be) called type one tables.3 Using this same nomenclature, Figure 3.24, which is discussed next, would be called a type zero table because it has zero dimensions whose members are represented as column headings.

Notice how every instance of a sales variable in the type one table is associated with an instance from every dimensional attribute. The key, locator, or identifier dimensions of the situation are represented as primary keys in the table. The variables of the situation are represented as nonkey attributes. Relationally speaking, this grid is a table in third normal form. This means that all of the nonkey attributes—in this case, sales and the number of units sold—apply to or are uniquely identified by all of the primary or dimension keys. There are no sales variables that are not associated with a product, store, scenario, customer type, and time. For every unique dimensional combination, one and only one instance of a variable exists.

                   Dimensional attributes                     Variables
Time   Product   Scenario   Store   Customer Type        # Units    Sales
  2      10          1        12           3                  2    $1,500
  2      11          1        12           3                  3    $2,250
  2      12          1        12           3                  2    $1,500

Figure 3.23 Electronic capture of point-of-sale data in a type one table.

                   Dimensional attributes                          Variables
Time   Product   Scenario   Store   Customer Type        Variable name   Variable value
  2      10          1        12           3                 # Units                2
  2      10          1        12           3                 Sales             $1,500
  2      11          1        12           3                 # Units                3
  2      11          1        12           3                 Sales             $2,250
  2      12          1        12           3                 # Units                2
  2      12          1        12           3                 Sales             $1,500

Figure 3.24 Type zero table.

Finally, look at Figure 3.24, which shows a different table-like representation of the same data found in Figure 3.23. Although both table forms allocate one column per dimension, notice how in Figure 3.24 the individual variable names, sales and units, are treated as the values of a column called variables. Also notice the addition of a value column whose row entries are the data values that, in the preceding table, were situated in each of the variables columns. (For an in-depth discussion of the value dimension and how it relates to the dimensionality of a multidimensional model, see Chapter 4.) As a practical matter, the table in Figure 3.23 will be shorter in rows and wider in columns than the table in Figure 3.24. The long and narrow table in Figure 3.24 will use more space for storing keys because it has an extra key column. It will also use less space storing missing data values. (A minimal SQL sketch contrasting these two table shapes appears at the end of this section.)

The ability to easily change views of the same data by reconfiguring how dimensions are displayed is one of the great end-user navigational benefits of multidimensional systems. It is due to the separation of data structure, as represented in the MTS, from data display, as represented in the multidimensional grid. The actual method will be different from tool to tool, but the essence is the same. For example, the commands or actions to create Figure 3.20 are as follows:

1. Show variables nested within months along the rows of the screen.
2. Show scenarios nested within products along the columns of the screen.
3. Show stores and customer type along the pages of the screen.
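As promised above, here is a minimal SQL sketch contrasting the two table shapes. It is purely illustrative: the table names, column names, and data types are assumptions, not definitions taken from Figures 3.23 and 3.24.

-- Type one: one column per variable (the shape of Figure 3.23).
CREATE TABLE sales_type_one (
    time_id    INTEGER,
    prod_id    INTEGER,
    scenario   INTEGER,
    store_id   INTEGER,
    cust_type  INTEGER,
    units      INTEGER,             -- variable: # Units
    sales      DECIMAL(12,2),       -- variable: Sales
    PRIMARY KEY (time_id, prod_id, scenario, store_id, cust_type)
);

-- Type zero: variable names become data in an extra key column
-- (the long, narrow shape of Figure 3.24).
CREATE TABLE sales_type_zero (
    time_id        INTEGER,
    prod_id        INTEGER,
    scenario       INTEGER,
    store_id       INTEGER,
    cust_type      INTEGER,
    variable_name  CHAR(16),        -- '# Units' or 'Sales'
    variable_value DECIMAL(12,2),   -- one generic value column for all variables
    PRIMARY KEY (time_id, prod_id, scenario, store_id, cust_type, variable_name)
);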


Analytical Views

While there is no such thing as a right or wrong grid display, there are some rules of thumb that you should keep in mind when analyzing multidimensional data in grid form.

First, nesting dimensions across rows and columns consumes many screen resources relative to putting dimensions into screen pages. Because we still live in an age of limited screen realty, the more screen space is consumed displaying dimension members, the less space is left for displaying data. The less space left for displaying data, the more scrolling you need to do between screens to see the same data. The more scrolling you need to perform, the harder it is to understand what you are looking at. Thus, you should try keeping dimensions along pages unless you know you need to see more than one member at a time. This helps maximize the degree to which everything on the screen is relevant.

Second, when you do need to nest multiple dimensions across rows and columns, since there is generally more usable vertical screen space than horizontal screen space, it is generally better to nest more dimensions across columns than across rows. Thus, a classic view consists of one dimension across the rows and one to three dimensions nested in the columns, with all other dimensions kept as pages, as shown in Figure 3.25. Compare this with the view in Figure 3.26, where four dimensions are nested in the rows, one dimension is shown along the columns, and no dimensions are left in the pages.

Third, before you decide how to display information on the screen, ask yourself "What do I want to look at?" or "What am I trying to compare?" For example, you may want to look at and compare actual costs across stores and time for some product and customer type.

Store.Paris               Actual                       Plan
                    Toys         Clothes         Toys         Clothes
                 Sales  Costs  Sales  Costs   Sales  Costs  Sales  Costs
Q1                 320    200    825    750     525    603    750    629
Q2                 225    220    390    250     554    600    365    400
Q3                 700    600    425    630     653    725    720    530
Q4                 880    850    875    700     893    875    890    889

Figure 3.25 Classic OLAP view.


                                      Q1     Q2     Q3
Actual   Paris   Toys      Sales     320    225    700
                           Costs     200    220    600
                 Clothes   Sales     825    390    425
                           Costs     750    250    630
         NYC     Toys      Sales     500    310    880
                           Costs     450    500    850
                 Clothes   Sales     210    625    875
                           Costs     225    600    700
Plan     Paris   Toys      Sales     525    554    653
                           Costs     603    600    725
                 Clothes   Sales     750    365    320
                           Costs     629    400    530
         NYC     Toys      Sales     460    520    810
                           Costs     325    610    875
                 Clothes   Sales     655    725    890
                           Costs     780    650    889

Figure 3.26 Inefficient OLAP view.

If this is the case, you should set your page dimensions to the product and customer type you are analyzing and organize your display to show stores and times for actual costs, as shown in Figure 3.27.

Now, what would you do if you were trying to compare the ratio between sales and advertising costs for low- and high-priced products across stores and times? You could try to see whether the returns on advertising for low- and high-priced products varied across stores or times. Perhaps it costs less to sell expensive products in high-income area stores, or perhaps there were better returns on advertising cheaper products during holiday time periods. How would you want to set up the screen to look at this? The complicating factor here is that what you are looking at—variables by product category—is itself a two-dimensional structure.

Page: product: shoes   variable: cost   scenario: actual   customer type: 2

           January   February   March   April
Store 1      1250       1700     1570    1140
Store 2      2000       1950     1290    1570
Store 3      1360       1580     1320    1440

Figure 3.27 Arranging data to compare costs across stores and time.

                           December                 January
                    High Price   Low Price   High Price   Low Price
Store 1   Sales (000)      450         200          340         280
          Costs            267         160          265         200
Store 2   Sales            402         400          350         450
          Costs            283         315          310         325
Store 3   Sales            250         460          200         500
          Costs            225         300          165         356

Figure 3.28 Two-dimensional comparison.

The way to show this type of information on the screen is to put one of the looked-at dimensions in the most nested position along one visible row or column axis and to put the other looked-at dimension across the other visible column or row axis, as shown in Figure 3.28. The more you think in terms of complicated multidimensional structures, such as the one shown in Figure 3.28, the more you may want to use graphical visualization techniques for displaying the information. (These techniques are discussed in Chapter 9.) Although any combination of dimensions may be mapped to any combination of rows, columns, and pages, you will be most productive when you think about what you are trying to understand or learn before defining screen representations for your data.

Summary

In this chapter, you have seen why the cube is such a popular metaphor for representing multidimensional data and why the cube breaks down as a metaphor for data sets of more than three dimensions. Multidimensional type structures (MTS), which are based on logical rather than physical attributes, were introduced as a method of representing N-dimensional data sets and their associated properties. MTS can represent data of any dimensionality. You also learned that the most common display for data in numeric form leverages a row-column-page metaphor. As the number of dimensions in an analytical data set typically exceeds three, multiple dimensions of information need to be combined onto each row, column, and page axis of a display device, thus making it possible to visualize and understand a multidimensional data set in terms of information presented on a flat computer screen. Now that you have learned how to think clearly about N-dimensional data sets and how to display them on a two-dimensional computer screen, it is time to start drilling into dimensional structures.

PART

Two

Core Technology

Laws like the principle of sufficient reason, etc., are about the net [of language] and not about what the net describes.

LUDWIG WITTGENSTEIN

CHAPTER 4

Introduction to the LC Model

The first edition of OLAP Solutions (Wiley 1997) included a software product called TM1. I used it in the application section of the book so that the reader could gain an understanding of both the conceptual and practical aspects of building Online Analytical Processing (OLAP) solutions. Even though the book was about OLAP—the fundamental technology—and not about the particular product TM1, the book did, all too often, get labeled a product book. Furthermore, one of the problems in the OLAP world with trying to provide a solid foundation for OLAP in general is that there hasn’t been a generally accepted OLAP model. Every OLAP product has its own implicit model. So by using any product in this book, the book is invariably skewed to some degree in its application section toward that particular product. (Even the discussion of core technology may become skewed as that discussion needs to be consistent with the applications that follow.) If you’re going to include a product in a book to illustrate concepts, you should probably rely on either an easy-to-use, though obscure, product so as not to prejudice the market (which is what I did in the first edition), or an extremely popular one. I did consider using Microsoft’s Analysis Services product with its MDX language for this book. However, George Spofford and I already wrote Microsoft OLAP Solutions (Wiley 1999), which details Microsoft’s first version of its OLAP product, and there are many more books like it. George Spofford has also written an extremely thorough book on the MDX language, MDX Solutions (Wiley 2001). Additionally, as popular as Microsoft’s product and language may become, owing in no small part to Microsoft’s general market dominance, neither their product nor their language is ideal. Thus, after much deliberation I decided for pedagogical purposes to use my own OLAP language and model, called Located Contents (LC), in this book. Although I haven’t done much to publicize it, most of the major vendors are aware of the LC model and a number of them are gradually incorporating its features.


The LC model is a proper superset of every public OLAP product/model, which is to say that it can mimic any OLAP product/model simply by restricting itself. It is also fully grounded in math and logic, meaning, as described in the following sections, that the model provides for closure and a defensible definition of a well-formed formula. The LC model contains a complete language and through it you, the reader, will learn about hierarchies, multicubes, dimensional calculations, and all the fun stuff associated with solid OLAP analyses. Perhaps most important of all, you can reason in and with the LC language so when we get to the application chapters, you can think through the issues in LC and learn how to efficiently solve multidimensional modeling and analysis problems. The techniques you learn in LC will be applicable to any tool you may ultimately deploy and will enable you to understand the strengths and limitations of any tool(s) you use against a more general and vendor-neutral background. The LC model will be presented throughout this section of the book as a vehicle for explaining the core aspects of OLAP technology. For example, the language for describing hierarchies is presented in Chapter 5 when I discuss hierarchies, the language for distinguishing different types of sparsity is introduced in Chapter 6 when I discuss hypercubes, and the language for defining different types of formulas is introduced in Chapter 7 when I discuss formulas. Although the LC model is used throughout the book, this book is not about the LC model. Parts of the model that go beyond traditional OLAP technology (as in the area of inheritance or multimedia pattern representation) have been left out of the book. Similarly, analyses that are straightforward in LC but nigh impossible with traditional OLAP tools have been left out as well. I have done my best to keep focused on topics and areas of application for which commercially available OLAP technology may be successfully used. For now, the main thing you need to understand about the LC model, which is described at the end of this chapter, is symmetry. Assuming you have some prior exposure to OLAP or datawarehousing, and regardless of the product(s) you may have used or the literature you may have read, you have undoubtedly encountered the terms/concepts or features of dimensions, measures/facts, and attributes in such a way as to think that they were three fundamentally different types of primitive structures. In other words, somewhere in your system’s meta data there is a place for dimensions, a place for measures, and a place for attributes. As it turns out, anything that looks like a dimension1 can also be used as a measure or attribute, and anything that looks like a measure can be used as a dimension or attribute. Anything that looks like an attribute can also be used as a dimension or a measure. I mentioned this in a few places in the first edition of OLAP Solutions, especially in Chapter 15, but stopped short of forcing the reader to think in terms of a single underlying structure.
If it were just a matter of conceptual preference, I would not have taken the time to introduce a deeper and more unifying view in this chapter and use it throughout the book, but hardcoding the distinction between dimensions, measures, and attributes leads to many problems, including unnecessary difficulties from trying to support multiple schemata sharing the same data set, from trying to create even reasonably sophisticated models such as customer value models, from trying to have models refer to or derive from other models (recursion), and from trying to cleanly support data mining and statistics, not to mention lots of unnecessary administrative overhead.

Even though OLAP products do not currently support full symmetry, you will have an easier time designing applications if you think symmetrically, as the application chapters will show you later in the book. Who knows—maybe by the time you are reading this, some OLAP products will already be fully symmetrical. Since symmetry is only one of several key attributes of the LC model, since every OLAP product comes with its own implicit model, since the OLAP models implicit in today’s products are unnecessarily restrictive in a variety of different ways, and since I want you to be able to judge for yourself the disparity between different implicit models and the appropriateness of the LC model or any other multidimensional model you might come across or develop, this chapter will accomplish the following:
■■ Identify fundamental OLAP issues where the diversity between products is greatest
■■ Critique the product-based OLAP models that implicitly exist
■■ Describe the ideal features of an OLAP model
■■ Give an overview of the LC model for OLAP
A description of why neither the relational model nor canonical logic provides an adequate foundation for OLAP functionality is provided in Appendix F.

Disarray in the OLAP Space

There are many substantial differences between OLAP tools that reflect their different approaches, assumptions, and implicit models. However, before exploring these open issues, we need to corral the variety of terms used to describe similar things so as to minimize the potential for confusion. As the definitions of these term clusters are a part of what articulating a model is all about, I go no further at this point than identifying the clusters.

Terms Frequently Used to Refer to the Same Thing
■■ Dimension, factor, variable
■■ Element, position, instance, member
■■ Measure, fact, metric, variable, formula
■■ Attribute, property, characteristic
■■ Level, generation, ancestors, descendants, parent/child, hierarchy

Open Issues
The open issues that follow reflect important logical differences in the way products work. They represent what any inquisitive mind might pose after working with a variety of products and noticing that they differ not just in trivial ways but in seemingly fundamental ways as well.


We are not looking at physical differences between products at this time. Those differences are real and will be discussed in Chapter 10, but they do not affect what we are calling a logical OLAP model. The main logical issues are:

■■ Whether meta data is generic or nongeneric ■■ Whether hierarchies are an intrinsic part of a dimension ■■ What constitutes a cube ■■ How to interpret sparse cells

■■ Where formulas belong

Whether Meta Data Is Generic or Nongeneric Is all structural meta data definable in terms of generic dimensions, or is there a justifi- able basis for distinguishing between measures/variables and dimensions? There is not an obvious answer to this question. If you look to matrix algebra for guidance, you might think that all structural meta data could be defined in terms of generic dimen- sions. On the other hand, if you look to the relational model or the predicate calculus, you might be inclined to support the dimension/variable distinction. Some products, such as Microsoft Analysis Services, Oracle Express, and IBM’s Metacube, distinguish variables (possibly called measures, facts, or metrics) as separate from dimensions. Other products, such as Essbase,T Holos, and TM1, purport to treat all dimensions as generically equivalent. These latter products treat variables as members of a variables dimension. E A Whether Hierarchies Are an M Intrinsic Part of a Dimension F Are hierarchies (single or multiple) an intrinsic or necessaryL part of a dimension, or should hierarchies simply be specified by relating members from different dimen- sions? Although most products support the existence of one orY more hierarchies within a dimension, a number of products such as Express from Oracle and Cross Target from Dimensional Insight do not have a built-in notion of intradimensional hierarchy. Do products that do not support intradimensional hierarchies suffer from any loss of func- tionality? If dimensions should support hierarchies, what constrains such hierarchies? Could any collection of linked nodes in an N-1 relation constitute a hierarchy or are there certain assumptions that need to be true about the relationship between the nodes?

What Constitutes a Cube

What defines a cube? Can all data fit into a single cube, or are there natural chunking principles by which data gets divided between different cubes? Some products such as Express, TM1, and Analysis Services take a multicube approach to modeling data.


Others, most notably Essbase (though the product has moved away from a pure hypercube approach since the release of its version 5), take a more hypercube—where everything can fit into a single cube—approach to modeling data.

How to Interpret Sparse Cells

What should empty cells (or sparse regions of cells) mean for an OLAP model? Is there any way or set of ways they should be treated? For example, should an empty cell mean that data is missing but potentially forthcoming as in a late sales report? Or should an empty cell mean that data could never apply to the cell, as would be the case in a two-dimensional model where one dimension was defined by employee names, the other dimension was defined by employee attributes, and you were looking at the cell defined by the intersection of an unmarried employee and the attribute spouse’s name? Or should an empty cell simply mean that all zeros were suppressed, as is typically the case with retail transaction tables where only those products that were sold are reported? Many products automatically treat empty cells as being equal to zero. Most products cannot distinguish between an empty cell that refers to a missing data value and an empty cell that refers to no meaningful value. Is this a problem? Given (as described in Chapter 6) that confusing missing, meaningless, and zero-valued data can produce calculation errors, you would think that an OLAP model should clearly distinguish between the various types of sparse data.
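The calculation hazard can be sketched in a few lines of Python. This is not any product’s behavior, only an illustration under the assumption that missing and meaningless cells are marked with distinct sentinel objects rather than zeros.

    # A sketch of why missing, meaningless, and zero cells should be kept
    # distinct when aggregating (sentinels and figures are invented).
    MISSING = object()       # data expected but not yet reported
    MEANINGLESS = object()   # no value could ever apply to this cell

    def average(cells):
        # Only genuine numbers participate; missing and meaningless cells
        # are excluded rather than silently treated as zero.
        values = [c for c in cells if c not in (MISSING, MEANINGLESS)]
        return sum(values) / len(values) if values else None

    reported = [100, 0, MISSING, MEANINGLESS, 50]
    print(average(reported))                        # 50.0
    print(sum(0 if c in (MISSING, MEANINGLESS) else c
              for c in reported) / len(reported))   # 30.0 -- the zero-fill error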

Where Formulas Belong

Where should calculations be defined in an OLAP model? Most tools provide for some dimension-based calculations as well as some additional rules or calculation scripts. Some products even support cell-specific calculations. Furthermore, the tools vary considerably in terms of the amount of calculations they support on the dimensions. Some tools support only intrasibling addition, subtraction, and division; others support a full range of functions with sophisticated referencing. Why is this? Is the distinction between dimension-based and script-based calculations logically grounded (and if so, what is the logical basis for making it), or is it a matter of convenience? If it is a matter of convenience, is this the end-user’s convenience or the product developer’s convenience? Is there an ideal distribution of calculations, or should all calculations be available from either a dimension or a scripting perspective?

Critique of Implicit Issue-Specific Approaches to OLAP Modeling

As shown previously, many products take different approaches to what are seemingly fundamental aspects of multidimensional modeling technology. If all approaches are equally justifiable, then the issues can’t be fundamental. If the issues are fundamental, then some approaches need to be demonstrably better or worse. Or different approaches need to somehow correspond to different applications.

What follows is a critique of the issue-specific approaches to OLAP modeling that are implicit in the main products.2

Treating All Dimensions Generically

The biggest problem with generic dimension approaches is their lack of explicit meta data support for the actual data.3 Generic dimension approaches invariably use a measures or variables dimension whose members are the names of variables. The result of treating measures as members of a measures dimension is the implicit creation of a data dimension. By implicit I mean that the user may not perform any explicit action to define the data dimension, in which case, the data dimension is typically defined in terms of standard numeric machine types such as double-precision floating-point numbers or integers. Expression 4-1 represents a prototype generic dimensional expression and illustrates how the data is not accounted for by any dimension values:

4-1 data = dim1.valn, dim2.valn, dim3.valn

Expression 4-1 reads, “The intersection of one value from every dimension identifies or is associated with an element of data.” For instance, expression 4-2 fits this framework:

4-2 500 = Measure.Unit_sales , Time.January , Store.Cambridge

It reads, “The intersection of the unit_sales member of the Measure dimension and the January member of the Time dimension and the Cambridge member of the Store dimension is associated with the data element 500.” The relationship between the data and meta data that this model implies is illustrated in Figure 4.1. Note that there are many ways to represent the data. The fact that I used a classic table does not imply that the differences between multidimensional models are defined in terms of tables. Tables are simply a well-understood data source metaphor. You can see from Figure 4.1 that there is no model-contained dimensional meta data that accounts for the data values. Obviously, OLAP products need to deal with the data values. As was stated previously, current products associate the values of the implicit data dimension with the values of some standard machine types, at the very least; but those machine types are extrasystemic as far as the OLAP model is concerned. Even when the data dimension is implicit, if there are different types of source data, such as strings, integers, currencies, or fractions, it is awkward to think in terms of a single data dimension, because that single data dimension really represents multiple types or domains—one for each kind of data. This is because the potential values of the data dimension, in the presence of heterogeneous data, become a function of the specific values of other dimensions. For example, the potential values for sales are different than the potential values for color or address where sales, color, and address are row values in a measures dimension. Since the value range for the data dimension is bound to specific collections of values (as variable names) from other dimensions, it can’t be reused with different data sets independently of the variables’ dimensions with which it is associated.

Product Meta Data: Time   Stores   Products   Measures
(no dimension accounts for the Data column)

Data:
Time   Store   Prod.   Meas.   Data
Jan    Camb.   Shoes   Sales    500
Jan    Camb.   Shoes   Costs    350
Jan    Camb.   Socks   Sales    450
Jan    Camb.   Socks   Costs    375

Figure 4.1 The generic dimensional approach doesn’t account for the data.

OLAP products that appear to take a generic dimensions approach and that enable multiple kinds of data, such as Essbase, may enable the user to assign a data type to members of the measures dimension. But if a tool provides for member-specific data types on the measures dimension, the tool is not really following a generic dimensional model. Figure 4.2 illustrates the practical reality to which most so-called generic dimensional products subscribe.

Product Meta Data: Time   Stores   Products   Measure names / Measure domains (Sales: $, Margin: Ratio)

Data:
Time   Store   Prod.   Meas.   Data
Jan    Camb.   Shoes   Sales    500
Jan    Camb.   Shoes   Costs    350
Jan    Camb.   Socks   Sales    450
Jan    Camb.   Socks   Costs    375

Figure 4.2 Practical reality for so-called generic dimensional products.

Another problem with generic dimensions is their inability to identify meaningful assertions or well-formed formulas in a data set (unless one interprets a so-called generic dimensional approach as treating the data dimension as what is asserted of the union of the nondata dimensions—again violating the so-called generic principle). Assertions are statements capable of being true or false. Data sets, whether considered to be relational or multidimensional, should consist of meaningful facts, which is to say facts capable of being true or false. If for some reason meaningless facts or nonsense enters the data set, the system should be able to detect a pseudofact as an invalid assertion and ignore it.

For example, given a row in a third normal form relation, every nonkey attribute field in the row is an assertion about the union of the key fields in the row. If the keys were store and time and the attributes were sales and costs, each sales field and each cost field would be an assertion about a particular store-time combination. In contrast, the store and time fields would not be assertions. They define the that (the logical subject) about which the attribute fields (the logical predicates) constitute assertions. Related to this, the so-called facts in a fact table are not, in isolation, facts at all. The value $500 in a sales column does not constitute a fact. It has no meaning in isolation and is incapable of being true or false. The fact is the predication of the term $500 of the subject composed of the terms Shoes-Boston-January. If all the dimensions are equivalent, then there is no way to tell what is being asserted of what and there is no way to know how to process empty cells. Finally, by leaving out the dimension/variable distinction, an important means of describing data sets is needlessly lost. It’s akin to describing a wine without making reference to color. For example, there is a huge qualitative difference between data sets that have a large number of data points and a small number of variables, such as a census study, and data sets that have a large number of interrelated variables but only a small number of data points per variable, such as a typical econometric model. These last two issues represent problems with the generic dimensional approach if actually followed, but in practice, it isn’t followed to this extreme. Distinctions are made between dimensions and measures through attributes assigned to members of the measures dimension. (Think of a dimensional version of Animal Farm—all dimensions are created equal, but some dimensions are more equal than others.)
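The extrasystemic nature of the data under a generic dimensional approach can be sketched in Python. The dictionary-of-tuples representation below is only an assumption for illustration, not how any particular product stores data; the point is that every key member is covered by some dimension while the data values are covered by nothing.

    # A sketch of the generic dimensional representation of Figure 4.1, with
    # measures carried as members of a Measures dimension.
    dimensions = {
        "Time": ["Jan"],
        "Store": ["Camb."],
        "Product": ["Shoes", "Socks"],
        "Measures": ["Sales", "Costs"],
    }
    facts = {
        ("Jan", "Camb.", "Shoes", "Sales"): 500,
        ("Jan", "Camb.", "Shoes", "Costs"): 350,
        ("Jan", "Camb.", "Socks", "Sales"): 450,
        ("Jan", "Camb.", "Socks", "Costs"): 375,
    }

    # Every key member is accounted for by some dimension ...
    all_members = {m for members in dimensions.values() for m in members}
    assert all(m in all_members for key in facts for m in key)

    # ... but the values themselves are extrasystemic: no dimension describes
    # them, so nothing in the model stops nonsense such as a string where a
    # cost should be.
    facts[("Jan", "Camb.", "Socks", "Costs")] = "blue"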

Referentially Distinguishing Dimensions and Variables: Series Cubes

Although approaches that hardcode the distinction between dimensions and measures would appear to overcome the problems with the generic approach as illustrated in the example and Figure 4.3, they introduce new problems in their stead. Expression 4-3 represents models that explicitly account for data as values of some measure. The superscript on the dimension term indicates which dimension is referred to by the term. The superscript on the member term indicates what dimension the member term belongs to, and the subscript on the member term indicates which member it is:

4-3  Meas.val = dim^1.mem^1_n , dim^2.mem^2_n , dim^3.mem^3_n

In contrast with the generic dimension example given previously, this one reads, “Each unique intersection of a member from every dimension on the right-hand side of the equation identifies or is associated with some value of a measure (Meas.val) on the left-hand side of the equation.” For instance, expression 4-4 would fit this framework:

4-4  Units_Sold.500 = Time.January , Store.Cambridge

It reads, “The value 500 of the measure Units_Sold is associated with the intersection of the January member of the Time dimension and the Cambridge member of the Store dimension.” The relationship between data and meta data that this model implies is illustrated in Figure 4.3. This approach would seem to be better than the generic dimension approach because assertions are as easy to define as in the relational model and because all of the data is represented in the meta data; thus, nothing is extrasystemic. However, there is still a big problem. Approaches that hardcode the distinction between dimensions and measures embed that distinction in the meta data as illustrated in Figure 4.4. This means that at the meta data level, there are two kinds of primitive structures—dimensions and measures—and each primitive structure has its own set of properties. For example, dimensions may have hierarchies, while measures may represent the left-hand side of an equation. A set of data elements (such as a column of sales figures or a column of store names) needs to connect either to a dimension or to a variable. This static or referential notion of meta data reduces the flexibility of any model because once a set of elements is defined as a dimension, it can’t be used as a measure and vice versa. The only way to use a dimension as a measure is to replicate it in the meta data and to create a separate model for each distinct pattern of meta data usage.

Product Meta Data: Time   Geog.   Products   Sales   Costs

Data:
Time   Geog.   Prod.   Sales   Costs
Jan    Camb.   Shoes    500     350
Jan    Camb.   Socks    450     375
Jan    Camb.   Hats     725     500

Figure 4.3 Meta data with a referential approach.

Product Meta Data (hardcoded): Dimensions: Time, Stores, Products   Measures: Sales, Costs

Data:
Time   Store   Prod.   Meas.   Data
Jan    Camb.   Shoes   Sales    500
Jan    Camb.   Shoes   Costs    350
Jan    Camb.   Socks   Sales    450
Jan    Camb.   Socks   Costs    375

Figure 4.4 Hardcoded meta data.

Beyond the obvious problem of redundancy and the resulting challenge of maintaining consistency across meta data updates, this approach also makes it awkward—if not practically impossible—for multiple analytical views to collaboratively work on the same data, as each view would constitute a separate model.
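The rigidity can be sketched with two hypothetical Python classes (they stand in for hardcoded meta data structures, not for any product’s API). Because Dimension and Measure are different primitives, the only way to let Sales behave as a dimension is to declare a second, redundant schema.

    # A sketch of hardcoding the dimension/measure distinction in meta data.
    class Dimension:
        def __init__(self, name, members):
            self.name, self.members = name, members    # dimensions carry members

    class Measure:
        def __init__(self, name, datatype):
            self.name, self.datatype = name, datatype  # measures carry a domain

    # Schema 1: the usual view -- Store and Time locate values of Sales.
    schema1 = {"dimensions": [Dimension("Store", ["Camb.", "Paris"]),
                              Dimension("Time", ["Jan", "Feb"])],
               "measures":   [Measure("Sales", float)]}

    # To analyze the count of stores by sales bin, Sales must now behave as a
    # dimension, which the hardcoded structures cannot express; the only
    # option is a second schema whose meta data must be kept in sync with the
    # first by hand.
    schema2 = {"dimensions": [Dimension("SalesBin", ["0-999", "1000-1999"]),
                              Dimension("Time", ["Jan", "Feb"])],
               "measures":   [Measure("StoreCount", int)]}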

Stuffing Everything into a Single Cube: Hypercubes

The biggest problem with single hypercube approaches is their lack of any natural chunking or segmentation principle. This surfaces when attempting to use hypercubes for modeling complex multidomain data sets.

From a hypercube perspective, two variables that share no dimensions in common and which have no way to be compared could still fit as adjacent members of a variables dimension in a hypercube that contains the union of the two sets of dimensions. An admittedly extreme example would be product sales dimensioned by product, channel, and time, and atmospheric pressure dimensioned by distance from planet surface and planet mass. The two variables would have no dimensions in common; the hypercube as a whole would be extremely sparse. One consequence of treating the world as a single hypercube is that the greater the variety of variables, the greater the amount of sparsity and the harder the cube is to understand. Hypercubes would seem to violate what is intuitively the most natural criterion for defining a cube: intercomparability or interactivity, meaning that a logical cube defines a contiguous analytical space where the values of any variables (or data associated with the members of a variables dimension) may be meaningfully compared. If being in the same cube doesn’t guarantee comparability, and anything can fit in a hypercube, then the term hypercube loses any practical meaning.
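A back-of-the-envelope calculation makes the sparsity point concrete. The cardinalities below are invented purely for illustration; the conclusion does not depend on the particular numbers.

    # Sales dimensioned by product, channel, and time; atmospheric pressure
    # dimensioned by altitude and planet mass. Forcing both into one
    # hypercube crosses all five dimensions plus a two-member variables
    # dimension, and almost every intersection is necessarily empty.
    product, channel, time = 1000, 5, 365
    altitude, planet_mass = 100, 20

    sales_cells = product * channel * time               # 1,825,000
    pressure_cells = altitude * planet_mass              # 2,000
    hypercube_cells = 2 * sales_cells * pressure_cells   # every combination

    filled = sales_cells + pressure_cells
    print(f"fill ratio: {filled / hypercube_cells:.8f}")  # about 0.00025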

Attributes of an Ideal Model

The problems with pseudogeneric dimensions and with static hardcoded dimension-measure distinctions point to the need for a more general set of primitives for defining and manipulating or working within multidimensional models. The problem with hypercubes points to the need for defining an objective basis for delimiting the content of a cube. Beyond solving these specific problems, are there any objective criteria for identifying an ideal model that can be agreed to in advance of proposing any model? I believe there are. What follows is a set of criteria for any OLAP model capable of supporting the full range of current OLAP products and of pointing to useful future product development not embodied in any product currently on the market. Please note that this is not a feature list. A comprehensive feature list for OLAP products can be found in Chapter 18.

WHAT’S A MODEL

The term model refers to at least a combination of a data definition language (DDL), a data manipulation language (DML), and possibly a data representation language (DRL). Leaving aside for the moment whether it makes sense to distinguish between these languages and, if so, under what conditions, a model needs to provide some method of defining an internal representation of an external data set, some method of connecting to such external data sets, some method of creating derived values, and some method of representing values.

Theoretic Groundedness
Theoretic groundedness is the degree to which a model is based on (or grounded in) logic and mathematics. It is one of the most important attributes of the relational model. A model that is inconsistent with either logical or mathematical principles is of limited value for decision support. Key model features, such as support for hierarchies and criteria for well-formed facts, need to be explicitly connected to their underlying theoretic support.

Logical Grounding

A logic-grounded OLAP model needs to define valid assertions or facts (that is, have some notion of a well-formed formula), correctly process invalid (missing and meaningless) assertions in addition to valid assertions, and provide for correct rules of inference. Yet the different implicit models for OLAP as described previously imply different definitions of a well-formed formula or valid assertion, different methods of working with sparse data, and different rules of inference. For example, generic dimensional approaches imply that any combination of dimension members defines a valid assertion. On the other hand, first-order predicate logic requires at least one subject (argument) and one predicate (function). So, it would seem that the ubiquitous data dimension needs to play the role of predicate, but this complicates the generic dimensional model. Approaches that stuff everything into a single cube would also seem to have logical difficulties. At the very least, they would need to distinguish valid from invalid intersections. Approaches that distinguish variables and dimensions would seem to have a cleaner interpretation in first-order logic—namely, that the union of the dimension elements at an intersection defines the subject, and each variable defines a predicate.

Mathematical Grounding

A mathematically grounded OLAP model needs to provide for rules of comparability and closure as well as preserve the determinism of inherently deterministic calculations such as summation. This is easier said than done. Queries for the dimensions in generic dimensional models can produce result sets whose elements, namely dimension members, were not a part of the original dimensional data set. Worse than that, as shown previously with generic dimensional models, the model may not describe even the given data. (Using a relational analogy, all the data needs to fall within some domain.) If the model does not describe all the data, then some of the data is extrasystemic, or is unaccounted for by any domain, and closure is violated off the bat. As extrasystemic, there would be no way to use that data as a dimension or create queries conditional upon the values of that data. For example, if time, store, product, and variable were the dimensions, then value would be extrasystemic. If value is extrasystemic, then there would be no way to use, say, cost values as the basis for a dimension without creating a new model.


Furthermore, no current OLAP product supports the equivalent of what in the object database world are called user-defined types that specify a collection of instances and valid operators within a closed system of operations. OLAP products that support the notion of levels defined on ragged hierarchies may enable the user to define aggregation operations that produce inconsistent results (discussed in Chapter 5). Finally, OLAP products should be able to leverage the wide body of existing mathematics. Explanatory and predictive modeling may rely on sophisticated mathematics such as neural nets, sets, matrix algebra, calculus, groups, and chaos. They may also rely on the ability to define data types with specified cardinality or specific operators. For example, in a simulation model one might need to define a large number of state variables where each state variable has a (possibly different) number of potential states. Although the calculation language provided by an OLAP model should be rich enough to support a wide variety of mathematical constructs, at the very least it would seem necessary to provide data structures such as composite data types that are suited to store the results of value-added mathematical calculations.

Completeness

An OLAP model needs to provide a DDL, DML, and a query language (though they may all be aspects of the same thing) capable of specifying multivariate, multilevel, multidimensional relationships. This covers all possible formula references, including hierarchical and positional offsets, conditional overrides, measured and derived values, and even sorting. If an OLAP model can’t express these relationships, then its language is semantically incomplete. Additional aspects of completeness include being able to recursively define models, associate multiple models with the same data set, and associate multiple data sets with the same model. For example, if a single data set can only be modeled in one way per model, then the same model cannot account for multiple users’ schemata, and each user’s schema needs to be represented as a separate model. This is inefficient since it means there will be duplicated meta data. It also means that it would require an additional meta data layer (meta meta data?) in order to translate, between the different schemata, changes made within each schema. For example, if one analyst were looking at changes in sales across time and another analyst were looking at changes in the count of times per unique sales, and the first analyst made some projections that added new sales values, these, in a collaborative environment, should be passed through to the second analyst as additional times per sales tuple. But if the model does not support multiple schemata, then it would require a separate translation to update the second meta data schema given changes to the first. Since, as was shown in Chapter 1, descriptive modeling is the backbone of decision support, an OLAP model needs to support (though not necessarily incorporate) all decision support systems (DSS) functions. This includes structural operations (operations that function on dimensional structures such as concatenate), statistical operations, nonparametric data-mining operations, mapping data types and operations, text operations (including text searches and summarizations), and visualization.


Efficiency
Efficiency has two meanings: irreducibility of operators and simplicity of expression. Following the first meaning, primitive OLAP structures and operators need to be irreducible. Given the previous discussion, this would seem to indicate the need to subsume dimensions and measures within a single underlying structure. This definition also points to subsuming structural operators and numeric operators within a single notion of operation. Following the second meaning, OLAP models need to express complex relationships efficiently. For example, if a model has no notion of hierarchy, then it may inefficiently require the referencing of declared dimensional relationships for aggregation expressions. If a model has no notion of multiple hierarchies within a single dimension, then it may inefficiently require multiple dimension declarations to model multiple hierarchies.

Analytical Awareness
An OLAP model should offer subtle analytical help. This is because any OLAP model formalizes, to some degree, what might be called the analytical process. For example, unless you are testing the degree to which stores are independent from time, you do not want to analyze the correlation between store and time. In a model that distinguishes dimensions and variables, you would not analyze the correlation between dimensions as a part of any routine analysis. In contrast, you would routinely analyze the correlation between variables. In a model that does not distinguish dimensions and variables, any analysis is as useful as any other. For example, in a hypercube model, it makes as much sense to ask for the variance in sales across stores and time as to ask for the variance in March across measures and stores. By formalizing the analytical process, an OLAP model would open the way for more intelligent automation.

Overview of the Located Content Model

As I stated at the beginning of this chapter, the LC model will be introduced over the next several chapters as each OLAP topic arises. The purpose of this overview is to give you an appreciation of the scope and degree of integration provided by types—the underlying structure in the LC model—by presenting you a mile-high look at a type-based approach to dimensional modeling, as well as to drill down in one spot, namely, that of symmetry, as it is the first thing you need to understand. In addition, I want to reassure those readers who simply want to know how to build useful, maintainable, performable, and potentially complex OLAP solutions within some product’s environment that the use of the LC model as a pedagogical tool will simplify your path to OLAP enlightenment. Although I have included some of the more clearly theoretical or abstract aspects of the LC model in this book, those snippets are tucked away in appendices, endnotes, or technical asides.

A Functional Approach
The LC model is a software data model grounded in functionally4 based logic and mathematics that resulted from research conducted over the past 20 years by a number of individuals, including myself. (See Appendix F for a brief description of how the LC model’s underlying logic relates to canonical logic.) Within the LC model, the term locator is roughly analogous to the term dimension for those products that distinguish dimensions and measures. The term location is roughly analogous to the concept of the intersection of one or more members from each of the nonmeasure dimensions. The term content is roughly analogous to the terms variable, fact, formula, measure, or attribute. The major difference is that the two LC terms (ignoring here the distinction between a unidimensional locator and a potentially multidimensional location) refer to functional or use distinctions for a single set of underlying types and not for any primitive difference between data structures.

Super Symmetry

Dimension and Measure

At a high level, the LC model is composed of types, methods of structuring types into schemas that have an LC form, and methods of connecting structured types to data sets through one or more models (or populated schemas). What are called dimensions and what are called variables in any OLAP product/model from, for example, Oracle’s Express, Accrue’s Lightship, Microsoft’s Analysis Services, or IBM’s Metacube would be treated as individual types in an LC model. Thus, in a typical situation, the classic dimensions Time and Store would be represented as types in an LC model as well as the classic measures Sales and Cost. Types define the limits of what can be defined, queried, and/or calculated. Since the distinction between a type used as a locator or dimension and a type used as a content or variable is based on how the type is used within an expression, there is no strict need for a user/analyst to declare types as locators or contents; the distinction is apparent from the form of the query, definition, or calculation. In general, by parsing input tokens (and assuming here an expression in a textual language) into type names T^x (read “the type named x”) and type instances i^x_N (read “the Nth instance of the type named x”), the contents are those types with unspecified instances. Thus, the query:

select sales where store = Cambridge and year = 1996

parses into T^1: sales, T^2: store, i^2_N: Cambridge, T^3: year, i^3_N: 1996. This reads, “Sales is the name of type one; store is the name of type two; Cambridge is the name of the Nth (or some) instance of type two; year is the name of type three; and 1996 is the name of the Nth (or some) instance of type three.” The result set is the collection of instances for the type sales that correspond to Cambridge 1996. So for this query, sales is the content and store and time are the locators.

In contrast, for the query:

select store where sales > 300

the term store serves as the content since we are only given the name, and sales is the location as we are given the name and instance. Figures 4.5 and 4.6 illustrate this symmetry provided by a functional approach.
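The same functional reading can be sketched in ordinary Python before turning to the figures. The row data and the select helper below are invented for illustration, and simple equality stands in for the richer LC predicates; the point is only that whichever type is left without a specified instance becomes the content.

    # A toy version of the symmetry: types with an instance given act as
    # locators, the type named alone acts as the content.
    rows = [
        {"store": "Cambridge", "year": 1996, "sales": 250},
        {"store": "Cambridge", "year": 1996, "sales": 400},
        {"store": "Paris",     "year": 1996, "sales": 350},
    ]

    def select(rows, content, **located):
        # 'located' holds the types whose instances are specified;
        # 'content' is the type whose instances we want back.
        return [r[content] for r in rows
                if all(r[t] == i for t, i in located.items())]

    print(select(rows, "sales", store="Cambridge", year=1996))  # [250, 400]
    print(select(rows, "store", sales=350))                     # ['Paris']
    # The same types appear on either side of the request: sales is the
    # content in the first query and a locator in the second.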

Time   Product   Store     Sales    Costs
1999   Jackets   Western   10,000   9,000

Sales.costs, Time.1999, Prod.jackets, Store.Western

Figure 4.5 Using time, product, store as dimensions, and sales and costs as measures.

Time.year, Prod., Sales.max, Store.all

Location:
Time   Product     Store   Sales
1998   Furniture   All     Max

Figure 4.6 Using sales and stores as dimensions and time and products as measures.

The dimension/measure symmetry applies to all operations, not just querying. Thus, continuing this example, you could build a dimensional model using sales bins and time as dimensions, and using the count of stores as content. This is illustrated in Figure 4.7.
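A minimal sketch of this inverted model follows; the store names and sales figures are made up, and the bin boundaries merely echo the style of Figure 4.7. Sales values are binned and the content becomes a count of stores per bin.

    # Inverting dimensions and measures: Sales (binned) locates, Stores are counted.
    sales_for_2000 = {"Store A": 450, "Store B": 800, "Store C": 1500,
                      "Store D": 2400, "Store E": 3100, "Store F": 5200}
    bins = [(0, 999), (1000, 1999), (2000, 4999), (5000, float("inf"))]

    store_count = {b: 0 for b in bins}
    for sales in sales_for_2000.values():
        for low, high in bins:
            if low <= sales <= high:
                store_count[(low, high)] += 1
                break

    for (low, high), count in store_count.items():
        print(f"{low}-{high}: {count} stores")
    # 0-999: 2 stores, 1000-1999: 1, 2000-4999: 2, 5000-inf: 1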

Data and Meta Data

In addition to providing symmetry between what are otherwise called dimensions and measures, types also provide symmetry between so-called data and meta data. The distinction between them is functional as well. Thus, the instances of a type (typically considered to be meta data) can be inputs to an equation whose outputs are the instances of another type. For example, one large store may have a list of carried products that is defined to be twice as numerous as the number of distinct products offered by a competitor.

[Bar chart: count of stores (vertical axis, 0 to 5) by sales bins for 2000 in thousands of dollars (horizontal axis, bins ranging from 0-999 up to 5,000 and over).]

Figure 4.7 Using Sales as a dimension and Stores as a measure.

A large franchise company with a single slowly changing store dimension may want to analyze changes in the number of individual stores by time and country. Again, this is treating the so-called store dimension meta data as data.

Recursion

Not only are the distinctions between dimension and measure and between data and meta data functional, but in addition, the distinctions are better thought of as defining one stage, query, or calculation in an N-stage recursive process. Classic dimension/measure queries can be iteratively defined. So too can operations that analyze meta data or that use data to condition the instance ranges of types (in other words, using data as meta data).

Type Structures
Types can have any form of single or multiple hierarchy. Hierarchies can be leveled, ragged, or mixed. As you will learn in Chapter 5, there are natural rules for distinguishing well-formed hierarchies from illegitimate ones. Types, in this sense, provide for rich dimension structuring.

Types can be combined through structural operators such as Cartesian product, insert, concatenate, and 1-N and 1-1 correspondence. In this sense, types behave like sets and can support the set-like operators used in relational algebra. You will see this introduced in Chapter 6. Types can also enter into equations with either the whole type serving as independent variable (right-hand side of the equation), dependent variable (left-hand side of the equation), or with an instance of a type serving, again, on either side of the equals sign. In other words, formulas may be associated with the instances of a type and/or between types. There are no restrictions to the calculations that can be defined in an LC model.5 Data may be input from outside a particular LC model and/or derived. These features are described in Chapter 7. Thus, types support a rich formula language and they integrate structural (more set-like) operations and numeric operations within a single DML.
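Two of these structural operators can be sketched by treating a type simply as an ordered collection of instances; this is only an analogy for the reader, not the LC implementation, and the member lists are invented.

    # Cartesian product and concatenation as set-like operations on instances.
    from itertools import product as cross_product

    time = ["Jan", "Feb"]
    store = ["Camb.", "Paris"]

    # Cartesian product: every (time, store) intersection.
    print(list(cross_product(time, store)))
    # [('Jan', 'Camb.'), ('Jan', 'Paris'), ('Feb', 'Camb.'), ('Feb', 'Paris')]

    # Concatenation: the instances of one type appended to those of another,
    # as when two product lines are merged into a single type.
    shoes = ["boots", "sandals"]
    shirts = ["tees", "oxfords"]
    merged_products = shoes + shirts   # ['boots', 'sandals', 'tees', 'oxfords']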

Schemas and Models
A schema is defined by combining or structuring types in an LC form. Being structured in an LC form means (as illustrated later) that at least one type in the structure (the one used as a locator or classic dimension) has some instances already specified, and at least one type in the structure (the one used as a content or classic measure) does not have its instances specified. As you will learn in Chapter 6, the schema serves as a backdrop for parsing expressions, be they queries, definitions, or calculations. A populated schema (or what is frequently called a model) is defined by connecting a schema with data. This is what the term cube (and its derivatives such as multicube) really mean. For example, as illustrated in Figure 4.8, assume you have the types time, store, product, sales, and a data set comprised of a table whose columns have the same headings as the type names (and ignoring, until Chapter 7, the way in which the table-contained data is linked to the schema). Using time, store, and product as locators, and using sales as contents would be represented by the following schema:

4-5  (Time. × Store. × Product.) ~ Sales

■■ Where the dot token “.” means every unique instance of the type that precedes it and the operator “×” means cross product so that the expression in parentheses refers to the cross product of the unique instances of Time, Store, and Product
■■ Where the operator “~” means the sets denoted by the expressions on either side of it are in one-to-one correlation with each other
■■ Where the existence of a type name without the dot token “.” means some instance of that type
In words, you take the cross product of the distinct instances of time, store, and product and for every intersection associate some value of sales. This is the functional definition of a cube that captures the sales measurement event.

System Meta Data: Time   Store   Prod.   Sales

Source Data:
Time   Store   Prod.    Sales
Jan    Camb.   Shoes    $10,000
Jan    Camb.   Shirts   $25,000
Jan    Paris   Shoes    $30,000
Jan    Paris   Shirts   $20,000
Feb    Camb.   Shoes    $12,000

Figure 4.8 A simple model.

It implies that, when measuring sales, you know in advance for which stores, times, and products you are doing the measuring. However, you don’t know what the sales were. That’s why they’re being measured! What if instead of measuring the sales values for particular store-time-product combinations, you want to measure the distribution of stores as a function of how much they sold of all products in the year 2000? Recall what the result set could look like. It was shown in Figure 4.7. But how would you express the schema using the same combinatorial operators that were just introduced? How about the following:

(Sales.bin. × Product.all × Time.2000) ~ Count(Store)

In words, it says to take the cross product of every sales bin with the all member of the product dimension and the year 2000 member of the time dimension and for every defined intersection associate some value for the count of stores.
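Both schemas can be sketched over the source data of Figure 4.8. The sketch below uses January in place of the year 2000 (the sample data is monthly), and the bin boundaries are invented; it is only an illustration of the cross product on the locator side being paired with one content value per intersection.

    # Evaluating the two schemas over the Figure 4.8 rows.
    from itertools import product as cross_product

    source = [("Jan", "Camb.", "Shoes", 10000), ("Jan", "Camb.", "Shirts", 25000),
              ("Jan", "Paris", "Shoes", 30000), ("Jan", "Paris", "Shirts", 20000),
              ("Feb", "Camb.", "Shoes", 12000)]

    # (Time. x Store. x Product.) ~ Sales : one sales value per intersection.
    times = sorted({r[0] for r in source})
    stores = sorted({r[1] for r in source})
    products = sorted({r[2] for r in source})
    cube = {}
    for t, s, p in cross_product(times, stores, products):
        matches = [r[3] for r in source if r[:3] == (t, s, p)]
        cube[(t, s, p)] = matches[0] if matches else None   # sparse cells stay empty

    # (Sales.bin. x Product.all x Time.Jan) ~ Count(Store) : stores per bin.
    bins = [(0, 14999), (15000, 29999), (30000, float("inf"))]
    jan_by_store = {}
    for t, s, p, sales in source:
        if t == "Jan":
            jan_by_store[s] = jan_by_store.get(s, 0) + sales   # Product.all
    count_store = {b: sum(1 for v in jan_by_store.values() if b[0] <= v <= b[1])
                   for b in bins}
    print(count_store)   # {(0, 14999): 0, (15000, 29999): 0, (30000, inf): 2}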

Summary

In this chapter you have seen how beneath the single term OLAP there lies a sea of genuine product-based differences in approach to what constitutes an OLAP model. Furthermore, you’ve seen the variety of problems associated with common approaches such as generic dimension approaches, approaches that referentially distinguish between dimensions and measures, and approaches that try to stuff everything into a single cube. You saw the attributes of an ideal model and were introduced to the LC model that will be described in detail throughout this book as a pedagogical vehicle for explaining the many features of OLAP systems. I also drilled into one area, symmetry, because you need to understand that so-called dimensions and measures are really aspects of the same underlying structure (which I call a type) before getting into the next chapter on the internal structure of a type or what you might feel more comfortable thinking of as dimensional hierarchies (though all of that internal structuring can apply equally well to variables or formulas). I want to stress that what follows is the most efficient way to give you a detailed knowledge of what multidimensional modeling and analysis or OLAP is, how OLAP works, and how you can take advantage of it to solve your business or other computational problems. For example, if you ever want to get beyond OLAP 101 and solve more challenging business problems, you need to know that many such problems require inverting so-called dimensions and measures in order to be solved. Anytime you want to understand how many stores, products, or customers met some sales, earnings, visit, lead, or cost criteria, you’re inverting dimensions and measures. The fact that the current crop of OLAP products does not support this kind of symmetry should not deter you from using symmetry in your analyses. In the same way you would use a data-mining tool for pattern detection work (even though data mining may not be directly supported by your OLAP environment), you can use an underlying relational database (or even your OLAP environment) to create alternative, analytical star schemas (that is, ones where the dimensions and measures are inverted) for your more advanced OLAP work.

CHAPTER 5

The Internal Structure of a Dimension

Strong support for hierarchies is the second logical requirement for Online Analytical Processing (OLAP) and a principal one that distinguishes it from traditional SQL databases and spreadsheets. The two main approaches to defining hierarchies, as illustrated in Figure 5.1, are typically called ragged and leveled. You can think of them as different lenses for viewing the world. In the same way that if you wear blue-tinted glasses the world looks blue, if you wear leveled-hierarchy glasses the world looks leveled, and if you wear ragged-hierarchy glasses the world looks ragged. Is the difference between these approaches significant? If so, is one of these approaches better or more general than the other? (Is the world really leveled or ragged?) Or are the two approaches complementary, each being better suited for some situations over others? Is it possible to combine both approaches within a single and more general notion of hierarchy or resolution? By the end of this chapter, you will have learned why OLAP-enabled tools need to support both leveled and ragged forms of hierarchy and language in order to provide for complete hierarchical representations. In other words, if a tool supports only ragged or only leveled hierarchies, there will be relationships that the tool cannot efficiently model or calculate. By understanding how products ought to behave, you will have an easier time working with whatever product you have inhouse because you will understand its limitations, and if you are looking for a tool you will know what questions to ask. As was shown in Chapter 4, dimensions, variables or measures, and attributes or characteristics are aspects of a single underlying structure called types. Hierarchies are thus an aspect of types and not dimensions. (Rational numbers, for example, which one might be inclined to treat as variables, also have a hierarchical form.) However, if as a reader you are familiar with and feel comfortable thinking in terms of dimensional hierarchies as opposed to type hierarchies, then you may mentally substitute the term dimension for the term type while reading this chapter.



[Two panels. Left: level-based hierarchy tools see the world in terms of leveled hierarchies and are best at representing what seem to be naturally leveled hierarchies, such as time (years, months, hours, seconds) or geography (countries, states, towns, dwellings). Right: ragged-based hierarchy tools see the world in terms of ragged hierarchies and are best at representing what seem to be naturally ragged hierarchies, such as employees or products.]

Figure 5.1 A tale of two hierarchies.

As a matter of style, I will use the terms type and dimension interchangeably in this chapter. This chapter is a combination of a minimally formal demonstration of the source of and types of hierarchies as well as an introduction to the dimensional language needed to define them for the purposes of model definition, formula creation, or simply data navigation. If you are new to OLAP and simply want to learn some of the common ways that data can be hierarchically referenced, you may wish to skip to the section titled Dimensional Hierarchies and focus on the referencing syntax subsections. If you are already familiar with OLAP tools and want to understand why different hierarchies arise and what their properties are, I encourage you to read the whole chapter. The reasons I begin this chapter with a discussion of metrics and ordering include the following points:
■■ Hierarchies are a fundamental part of OLAP systems.
■■ There are substantial differences between the hierarchical approaches of different products.
■■ The choice of which hierarchical approach to use in an application has significant implications for the efficiency and possibly the correctness of that application.
■■ The source of the distinction between, and properties of, different kinds of hierarchies can only be explained by an appeal to ordering and metrics.
Since the issues surrounding dimensional hierarchies touch on many neighboring topics from object-oriented design to the foundations of mathematics, I made liberal use of endnotes and sidebars in this chapter.


Nonhierarchical Structuring

Ordering, metrics, and method of definition are the three nonhierarchical aspects of a type or dimension that provide a basis for the hierarchical structuring that follows.

Method of Definition
The two basic methods for the members, elements, or instances of a dimension to be defined are explicit (or list-based) and formulaic (or generative rule, or process-based). Most business dimensions, such as products, customers, employees, stores, channels, accounts, scenarios, and suppliers, are explicitly defined. This means that the instances of the dimension, such as the names of particular products or stores, originated from a list or collection of instances declared to be or simply treated as valid. This can be accomplished by reading in the distinct instances from an external table (as is typically done when OLAP cubes define their dimension instances based on dimension instances defined in the organization’s data warehouse), or by someone manually entering values.1 Verification of the instances of an explicitly defined dimension can only be based on matching the dimension’s instances with the instances of some master list. Absent some list of defined-as-true instances (regardless of whether that list is considered to be external or internal to the dimension), no instance verification is possible. Some business dimensions such as time are or can be formulaically defined. Using a date data type, one can set a time range such as “this year by week” and fill in the appropriate week instances without appeal to any table of explicit instances. Any attempt to verify the instances of a formulaically defined dimension needs to use some instance-generating rule rather than a master list of explicitly defined instances. (To verify the instances of a time dimension by appeal to some list of valid instances would be to treat time as an explicitly defined dimension.)
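The contrast can be sketched with a hypothetical store list versus a generated “this year by week” time dimension. The store names and the choice of Mondays as week markers are assumptions made only for the sketch.

    # Explicit (list-based) versus formulaic (rule-based) dimension definition.
    from datetime import date, timedelta

    stores = ["Cambridge", "Paris", "NYC"]          # explicit: valid by declaration

    def weeks_of(year):
        # formulaic: instances are produced by a rule, not read from a list
        d = date(year, 1, 1)
        d += timedelta(days=(7 - d.weekday()) % 7)  # first Monday of the year
        while d.year == year:
            yield d
            d += timedelta(weeks=1)

    week_dimension = list(weeks_of(2002))
    print(len(week_dimension), week_dimension[0])   # 52 2002-01-07

    # Verification differs accordingly: a store instance is checked against
    # the declared list, while a week instance is checked against the rule's output.
    print("Paris" in stores)                        # True
    print(date(2002, 1, 14) in week_dimension)      # True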

Cardinality
The term cardinality is a fancy word for number of or count. (Although hierarchies are not a function of cardinality, it makes sense to mention it here with other nonhierarchical aspects of types.) The cardinality of a dimension is thus the number of instances in the dimension. (Unfortunately, the root term cardinal is also used to refer to a type of ordering.) Some business dimensions, like scenario or channel, may only have three to five instances; others, like customer or transaction, may have millions or billions of instances. Although the cardinality of a dimension does not directly affect its hierarchical structure,2 it does affect a number of physical issues such as partitioning and indexing, which will be discussed in Chapter 10. Applications that access OLAP-stored data frequently need to know the cardinality of each of the data dimensions and so rely on API or other function calls of the form

Count (dimension)

Ordering
Some readers may remember from statistics class the difference between nominal, ordinal, and cardinal series. A list of political parties or a list of runners is a classic example of a nominal series. The placement of those same runners at a road race represents an ordinal series. The integers are an example of a cardinal series. Since some of the differences between dimensional hierarchies are based on ordering, since ordering plays a major role in determining the permissible operations for a type, and since ordering is also an important concept for dimensional analysis (discussed in Chapter 20), a detailed definition of ordering differences is provided next. The matrices used here should be read as a compact representation of the possible relationships between any left-hand-side instance i^L and any right-hand-side instance i^r. Each equality sign in each cell of each matrix represents the relationship between the i on the row as a left-hand-side term and the i on the column as a right-hand-side term.

Nominally Ordered Instances

If instances are nominally ordered, then all one can say is that two instances are equal (=) or not equal (≠). Table 5.1 illustrates, for a dimension composed of four instances, the definable relationships between its nominally ordered instances. The instances arranged along the rows have the letter L as a superscript to indicate they represent the left-hand side of the relationship. The instances arranged along the columns have the letter r as a superscript to indicate they represent the right-hand side of the relationship.

For example: i1L = i1r ; i1L < i2r ; ...

Types composed of nominally ordered instances can be manipulated with set-like operators such as concatenate, intersect, and union. The only comparison operator they support is the test for equality or identity.3 Nominal dimensions cannot support operations that compare differences or refer to instances via any directional means such as previous, next, first, or last. Most OLAP dimensions, such as stores, products, customers, and channels, are nominally ordered to the user (although an OLAP tool may automatically associate an ordinal value such as load time order or array offset in storage with a user-defined nominal dimension).
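As a minimal sketch (hypothetical channel names, ordinary Python sets rather than any OLAP API), the operations available on a nominally ordered dimension look like this:

# Nominal instances support set-like operators and equality tests only.
channels_a = {"Retail", "Catalog", "Web"}
channels_b = {"Web", "Wholesale"}

print(channels_a | channels_b)      # union
print(channels_a & channels_b)      # intersection
print("Retail" == "Web")            # equality/identity test: False

# There is no meaningful "previous" or "next" channel unless some ordering
# (for example, a load-time or storage order) is imposed on top of the set.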

Table 5.1 Matrix of Nominally Ordered Instance Relationships

        i1r    i2r    i3r    i4r
i1L     =      ≠      ≠      ≠
i2L     ≠      =      ≠      ≠
i3L     ≠      ≠      =      ≠
i4L     ≠      ≠      ≠      =

Ordinally Ordered Instances

If one can rank instances but not assign any specific magnitude to the differences between them, the instances are ordinally ordered. Table 5.2 illustrates, for a dimension composed of four instances, the definable relationships between the ordinally ordered instances.

Note how, in addition to distinguishing between sameness and difference, an ordinal series supports transitive comparisons of the form: i2 is greater than i1 and less than i3. This forms the basis for operations such as next, previous, first, and last that are frequently used in dimensional referencing. These operations can be generalized to starting point, offset, and direction. The starting point is most often the current cursor position, or the first or last instance of the series.

Typically, dimensions are either defined to be ordinal, as with a rank dimension, or the cardinal attributes of a nominal dimension are used to ordinalize the dimension. For example, a Store dimension, a typical nominal dimension, may be ordered by the attributes size and age, or by the measures sales or profitability, and so forth. When attributes and measures are used to ordinalize a dimension, many different useful orderings can typically be applied. Furthermore, the relative position functions such as previous and next or up3 return a different instance or set of instances for each distinct ordering of the dimension. Thus, it is useful to be able to name each ordering and to refer to any ordering within the context of a relative position function. For example, the expression

Store.(size_order, this-1)

would refer to the previous position in the Store dimension ordered according to size, and the expression

Store.(age_order, first+3)

would refer to the fourth position in the Store dimension ordered by age. Several products, such as Microsoft's OLAP product, support named orderings. My first product, FreeThink, supported the creation of libraries of named orderings that could be scrolled by the variables.
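A rough sketch of named orderings over a nominal Store dimension follows; the store names, attribute values, and the relative() helper are all hypothetical and simply stand in for expressions such as Store.(size_order, this-1).

stores = ["A", "B", "C", "D", "E", "F"]
size = {"A": 1000, "B": 1500, "C": 2000, "D": 2000, "E": 2500, "F": 3000}
age = {"A": 12, "B": 3, "C": 7, "D": 9, "E": 1, "F": 5}

# Each named ordering is just the store list sorted by a different attribute.
orderings = {
    "size_order": sorted(stores, key=lambda s: size[s]),
    "age_order": sorted(stores, key=lambda s: age[s]),
}

def relative(ordering_name, start, offset):
    # Return the store `offset` positions away from `start` in the named ordering.
    order = orderings[ordering_name]
    return order[order.index(start) + offset]

print(relative("size_order", "C", -1))    # the store one size-position before C
print(orderings["age_order"][3])          # roughly Store.(age_order, first+3)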

Table 5.2 Matrix of Ordinally Ordered Instance Relationships

        i1r    i2r    i3r    i4r
i1L     =      <      <      <
i2L     >      =      <      <
i3L     >      >      =      <
i4L     >      >      >      =

When creating an ordinal ordering of the instances of a nominal dimension in terms of the instances of a cardinal dimension, it is necessary to resolve the potential multiplicity of the ordered instances that may be associated with each unique value of the ordering instances. Thus, if ordering stores by square footage, and if several stores have the same square footage, you must decide how to resolve the multiplicity of same-ranked stores. Either stores with the same square footage are randomly listed, as shown in Figure 5.2; or a secondary ordering is used, say rent per square foot, to assign each store a unique position (though this approach cannot guarantee there won't be a tie among two or more stores); or the multiplicity of stores that share the same square footage is treated as a single instance of Store.size_order. In the last case, any values that are subsequently dimensioned by Store.size_order need to be aggregated in some way, as shown in Figure 5.3.

Cardinally Ordered Instances

If one can assign distances to and compare unit differences between the instances, then the relationship between the instances is cardinal. Table 5.3 illustrates, for a dimension composed of four instances, the definable relationships between the cardinally ordered instances. Another, and more popular, term for cardinally ordered instances is the integers.

Store ID   Store Size   Sales
A          1,000        $12,500.00
B          1,500        $14,000.00
C          2,000        $30,000.00
D          2,000        $38,000.00
E          2,500        $40,000.00
F          3,000        $55,000.00

Figure 5.2 (Storeid.size_order.) ~ Store_size, Sales.

Store Size   Cnt (stores)   Avg. (sales)
1,000        1              $12,500.00
1,500        1              $14,000.00
2,000        2              $34,000.00
2,500        1              $40,000.00
3,000        1              $55,000.00

Figure 5.3 (Store.size_order.) ~ Sales.
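The third resolution strategy, collapsing same-sized stores into a single instance and aggregating their values, can be sketched as follows (plain Python, with the data of Figure 5.2 typed in by hand):

from collections import defaultdict

rows = [("A", 1000, 12500), ("B", 1500, 14000), ("C", 2000, 30000),
        ("D", 2000, 38000), ("E", 2500, 40000), ("F", 3000, 55000)]

by_size = defaultdict(list)
for store_id, store_size, sales in rows:
    by_size[store_size].append(sales)

# Stores sharing a square footage become one instance of Store.size_order,
# so their sales must be aggregated, as in Figure 5.3.
for store_size in sorted(by_size):
    sales_values = by_size[store_size]
    print(store_size, len(sales_values), sum(sales_values) / len(sales_values))
# The 2,000 square-foot row shows a count of 2 and an average of $34,000.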

Notice that the integers, as shown in Table 5.3, are codefined with their operations, and that the relationship between the integers is a part of their definition. In other words, the integers represent formulaically defined instances, and the meaning of an integer instance is its position in the series. The dimensions that most naturally possess cardinal properties are numeric, temporal, and spatial. As with numeric dimensions, the instances of temporal and spatial dimensions can be (though they are not necessarily) formulaically defined.

Although there may be many ways to cardinally order the instances of an otherwise nominal dimension, it is more typical for cardinality to enter with the definition of the dimension in standard OLAP applications, and much less common for a nominal dimension to be cardinalized. Cardinally ordering dimensions is more common, however, in advanced OLAP applications, as shown in Chapter 20. Stores, an otherwise nominal dimension, for example, may be cardinally ordered by size, age, sales, profitability, and so forth in the same way they might be ordinalized. The difference between ordinalizing and cardinalizing can be seen with the following example. Table 5.4 shows a set of sales figures representing total transaction amounts for some store at some time, and the count of the number of times each transaction amount occurred.

Table 5.3 Matrix of Cardinally Ordered Instance Relationships

        i1r     i2r     i3r     i4r
i1L     +/-0    +1      +2      +3
i2L     -1      +/-0    +1      +2
i3L     -2      -1      +/-0    +1
i4L     -3      -2      -1      +/-0

Table 5.4 Raw Sales Histogram

SALES AMOUNT ($)   COUNT OF TIMES
50                 1
100                2
125                4
150                4
300                5
325                4
375                2
400                2
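The two figures that follow contrast an ordinalized with a cardinalized view of these counts. A small sketch of the difference (plain Python; the 25-dollar bucket width is an arbitrary choice for illustration):

raw = {50: 1, 100: 2, 125: 4, 150: 4, 300: 5, 325: 4, 375: 2, 400: 2}

# Ordinalized: each sales amount is replaced by its rank, so the gaps
# between amounts disappear and the shape looks roughly unimodal.
ordinalized = {rank: raw[amount] for rank, amount in enumerate(sorted(raw), start=1)}

# Cardinalized: the axis is denominated in dollars, so amounts with no
# sales remain visible as zeros and the bimodal shape becomes apparent.
cardinalized = {amount: raw.get(amount, 0) for amount in range(0, 501, 25)}

print(ordinalized)
print(cardinalized)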

Figure 5.4 Ordinalized sales. [Histogram: count of times (y-axis, 1 to 5) by sales.ordinal (x-axis, from min to max).]

Figure 5.4 shows a histogram of the count of sales by an ordinalized version of sales amount. The X-axis in the graph shows each sales rank ordered from least to most. The graph appears to have a normal distribution with a maximum frequency around the middle four ordinal sales values. In contrast, Figure 5.5 shows a histogram of the count of sales by a cardinalized version of sales amount. The X-axis in the graph is denominated in dollars and runs from $0 on the leftmost edge to $500 at the rightmost edge. Notice how in the cardinalized version all of the dollar amounts for which there were no sales are kept as information and shown as zero amounts. When the zeros are shown, it is clear that there is a bimodal distribution with frequent sales of around $125 and around $325.

In addition, you can perform arithmetic on cardinal dimensions, such as calculating the ratio of time spent courting a customer to time spent deriving revenue from that same customer, as with the following expression:

(First_sale {date} - First_contact {date}) / (Last_sale {date} - First_sale {date})

Figure 5.5 Cardinalized sales. [Histogram: count of times (y-axis, 1 to 5) by sales.cardinal (x-axis in dollars, 50 to 500).]

Or you can define intervals and growth functions, as with the following expression, which might serve to define a unit of time based on two events, such as the first contact and first sale, and then, as a function of some other condition, to modify that time amount by a growth factor, possibly based on economic conditions or product price:

Time.(First_sale{date} - First_contact{date}) * Growth_Factor per day {Integers}
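A sketch of this kind of cardinal date arithmetic in Python follows; the dates and the 1.15 growth factor are invented for illustration and do not come from the text.

from datetime import date

first_contact = date(2001, 3, 1)
first_sale = date(2001, 9, 1)
last_sale = date(2002, 3, 1)

# Ratio of time spent courting the customer to time spent deriving revenue.
courting_days = (first_sale - first_contact).days
revenue_days = (last_sale - first_sale).days
print(courting_days / revenue_days)

# An interval defined by two events, then scaled by a growth factor that
# might itself depend on economic conditions or product price.
growth_factor = 1.15
print(courting_days * growth_factor)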

Metrics

All Instances Are Associated with Metrics

At first glance, it might appear that a dimension is simply a collection of instances, with or without some hierarchical structure. After all, you can define hierarchies in a gazillion different OLAP products and none of them prods you to define units or metrics (except for those products that use the term metric to mean variable or measure). So how does the notion of unit or metric enter the picture? The quick answer is that every time you use the terms parent, child, or level you are implicitly referring to a metric. The name of a type or dimension (much like the name of a set) connotes the range of its possible instances. For example, a dimension named City implies that tokens that refer to cities such as Chicago or New York are valid instances of the dimension, while

tokens that refer to colors such as red and blue are invalid instances. Even though OLAP tools don't force you to maintain homogeneous collections of instances as the elements of a dimension, best practices would dictate that you do. Consider the following example. If you were the equivalent of a data warehouse administrator on your first day at a new job, and you were familiarizing yourself with one of the company's dimensional models, and you came across a dimension or type labeled "Products" and saw, mixed in with a bunch of product names, the names of managers and towns, you would probably flag the manager and town names as anomalous, especially if you wanted to keep your job.

Thus, types are, at least, collections of things, items, instances, members, positions, or elements that can be treated equivalently for the purposes of the model. Think of the dimension name as shorthand for the inclusion condition or metric that all the instances have in common. But if there always exists some inclusion criterion or metric that every single instance of a dimension needs to meet, then a more accurate way of representing the instances would be as a collection of instance-metric pairs.

ON THE DEFINITION OF A TYPE4

There is no way to define necessary attributes or properties of a type independently of some set of voluntarily chosen criteria. The following general definition of a type is no exception. The criterion imposed on the definition of types here is that the types so defined be rich enough to support a classical truth-functional logic. At the very least (and we are looking for minimal criteria), this translates into the ability to define assertions and negations. This means it must be possible to assert or negate that some instance of some type is true or false. Thus, for example, given two types, Store names and Sales, if the set of valid store names includes Geneva, Paris, and Madrid and sales is defined on decimal integers, the following expressions represent assertions and negations of some sales value of some store:

Assert: Madrid sales = $250
Assert: Paris sales = $300
Negate: Geneva sales = $350

Given the truth functional constraint mentioned, let’s narrow down what we mean by the terms type and instance. The relationship between these terms is roughly analogous to the relationship between the terms set and member. That is to say, the identification of a type places a limit on what may be considered an instance of that type. For example,

A type called State might have valid instances such as MA, NY, CT, and VT.
A type called Dollars ($) might have valid instances such as $1, $2, $3, and $4.
A type called Person_Name might have valid instances such as B. Smith, L. Jones, and K. Hardy.

CONSTRAINTS ON THE INSTANCES OF A TYPE

1. Multiplicity. A type needs to have at least two potential instances. If some type T^n (read: the type named n) had only one potential instance, i_x^n (read: some instance named x of the type named n), there would be no possibility of creating propositions where T^n serves as a predicate capable of assertion and negation. The general case is shown here:

i_y^n = (T^n, T^m.i_x^m)  <==>  !(i_y^n) ≠ (T^n, T^m.i_x^m)

In words, some instance y of some type n is true of some instance x of some type m if and only if the negation of instance y of that same type n is false of that same instance x of that same type m. For example, given a type called color with two color instances green and red, the condition states that green can be true of the instance of some other type such as the instance sea of the type “parts of earth” if and only if red is false of the color of the sea. Thus, if the type color could only assume the one color green, there would be no way to say that something had a color other than green or that if green is the true color then some color other than green is false. As a result, Booleans or binary types are the simplest possible types. (See the sidebar on “Types” at the end of this chapter.)

2. Uniqueness. The instances of a type need to be unique and mutually exclusive. Given, for example, a type T^n with instances i_1^n, i_2^n, i_3^n, i_4^n. This means that all i's are different from each other (uniqueness) and that if any i_x^n is true within the context of a proposition, all other i's are false (and that no two i's can be true at the same time). To see why this is necessary, let's examine the converse. Assume that two instances in an explicitly defined dimension are the same, or that T^n = i_1^n(1), i_2^n, i_3^n, i_1^n(2). Now consider the statement that i_y^n (i_1^n(1)) = (T^n, T^m.i_x^m) is true. We can't say that the negation of i_y^n (i_1^n(1)) is false because there is another instance of i_y^n, namely i_y^n (i_1^n(2)), that would also be true. This violates the rules of logic.

What about mutual exclusivity? An example of a collection of nonmutually exclusive instances would be red, blue, green, small, or large. Something could be red and large. So red and large, though unique, are not mutually exclusive. The notion of mutual exclusivity is tied to the notion that a type denotes a single measurement or evaluation method or strategy. Clearly, we would need two different tools and/or procedures to measure color and size. For a single evaluation method we can say the output of evaluation strategy x is i1 or i2 or i3 or i4 or ...


We typically represent a dimension and its instances as in the following expression:

City: Chicago, New York, Boston

But a more accurate description would resemble expression 5-2:

5-2  City.any: City.Chicago, City.New York, City.Boston

Expression 5-2 reinforces the idea that every single instance in the dimension must adhere to the inclusion rule connoted by the dimension name. Now consider the simple month and year, and meter and kilometer dimensions that follow:

Time.metric. ∃ Month, Year
Read "the set of metrics for time are month and year."

Time.Month. ∃ Month.January, Month.February, ... Month.December
Read "the set of instances for months are January through December."

Time.Year. ∃ Year.1998, Year.1999, Year.2000
Read "the set of instances for years are 1998 through 2000."

Distance.metric. ∃ Meter, Kilometer
Read "the set of metrics for distance are meter and kilometer."

Distance.Meter. ∃ Meter.1, Meter.2, ... Meter.1000
Read "the set of meter instances are 1 through 1000 meters."

Distance.Kilometer. ∃ Kilometer.1, Kilometer.2
Read "the set of kilometer instances are 1 and 2."

Everyone knows that 12 months are equivalent to one year and that 1,000 meters are equivalent to one kilometer. The ratios of 12 to 1 and 1,000 to 1 represent the scaling factors between the units or metrics of the time and spatial distance dimensions respectively. Thus, if we have some value denominated in terms of one year, we know to divide it by 12 to calculate the average value per month, and if we have 3 months' values we know to multiply them by 4 to estimate a value for the year.

It is normal to treat meters, kilometers, months, and years as units or metrics. Consider what you get for knowing the units of a measurement or value. In addition to being able to figure out scaling factors between levels of granularity, you also get a set of permissible operations. Meters can be added or subtracted and multiplied or divided. They can be divided by units of time. A glacier might move at a speed of three meters per year; a bureaucracy may move even slower. Consider what you lose if you

don't know the units. How would you add five unknown-length units and three unknown-length units? There is no operation you can perform with the two different measures of length if you don't know the units with which they are measured. Thus, you can't separate the translation aspect of metrics from the permissible operations aspect of metrics from the inclusion rule aspect of metrics.

Although we typically associate the concept of metrics with dimensions whose instances are cardinally ordered, as with the month-year and meter-kilometer examples previously, the role of defining instance membership rules, providing for quantitative comparability between scaling factors, and defining valid operations applies equally well to nominally ordered dimensions, as shown in the following section. Consider again the city dimension example, but this time as part of a multilevel geography dimension with an added country level:

Geography.levels ∃ Country, City
Geography.City. ∃ Chicago, New York, Boston, Paris, Nice
Geography.Country. ∃ USA, France

In the same way that we talked about the scaling factor between months and years, we can talk about the scaling factor between cities and countries. The only difference is that the scaling factor may not be constant across the whole of the two dimension levels. For example, the scaling factor between cities in the USA and the country USA is 3 to 1, while the scaling factor between cities in France and the country France is 2 to 1. Although the collection of cities is nominally ordered and the collection of countries is nominally ordered, each city and each country is associated with a level or metric that provides inclusion rules, scaling factors, and permissible operations in the same way as with the month-year and meter-kilometer examples. Thus, the concept of metrics, which is being generalized here, is not restricted to cardinally ordered dimensions, but rather applies to all dimensions regardless of how they're ordered.

Since a dimension denotes a series of metric-specific instances and since there may be more than one metric associated with the instances of a dimension, as with the city/country or meter/kilometer examples previously, it is necessary to associate a metric (or ancestor/descendent rank or level value) with each instance of a type or dimension as follows:

Geography. ∃ Country.France, Country.USA, City.Boston, City.NY, City.Northampton, City.Paris, City.Nice

The general form of a type is thus:

T^n ≡ m_0^n ∃ i_1^n m_1^n, i_2^n m_2^n, i_3^n m_3^n, i_4^n m_4^n, ...

The type named n, or T^n, denotes the root metric m_0^n (where the subscript 0 refers to the metric's serving as the root and the superscript n refers to the metric's belonging to the type named n), and is composed of the set of m instance-metric pairs denoted by the subscripts 1 through m, belonging to the type n as denoted by their superscripts.5
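Expressed as a data structure rather than as notation, the general form amounts to a root metric plus a collection of instance-metric pairs. The following Python sketch is illustrative only; the dictionary layout and the helper function are not drawn from any product.

geography = {
    "root_metric": "Geography",
    "instance_metric_pairs": [
        ("USA", "Country"), ("France", "Country"),
        ("Boston", "City"), ("NY", "City"), ("Northampton", "City"),
        ("Paris", "City"), ("Nice", "City"),
    ],
}

def instances_at_metric(dimension, metric):
    # Return every instance paired with the given metric, e.g., all cities.
    return [i for i, m in dimension["instance_metric_pairs"] if m == metric]

print(instances_at_metric(geography, "City"))
print(instances_at_metric(geography, "Country"))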

The Metric Basis for Hierarchies

When a dimension is composed of instances that do not all belong to the same metric, the metrics for the different instances need to be quantitatively comparable. This should be obvious if you reflect on any multilevel dimension with which you are familiar. For example, days, months, and years are quantitatively comparable and belong in one hierarchical dimension. Any information collected in units of days can be translated into units of months or years. On the other hand, the collection of instance.metric pairs red.color, blue.color, green.color, small.size, medium.size, and large.size have quantitatively noncomparable metrics, as there is no quantitative way to compare colors and sizes. So the two sets of instances belong in different dimensions.

Figure 5.6 sums up the previous discussion on metrics with a classification tree for types. Recalling the conditions for a well-formed type that were presented in the preceding sidebar, a type needs to have two or more unique and mutually exclusive instances. Every instance is associated with a metric. If all the metrics are the same, the type is nonhierarchical. If there are two or more metrics and the metrics are quantitatively comparable, the type is hierarchical. If there are two or more metrics and they are not all quantitatively comparable, the type is invalid. Thus, hierarchies are grounded in the notion of metric relationships and there is no way to define a hierarchy without making reference to metrics. Absent the notion of metrics, dimensions or types would lack any inclusion, scaling, or operational rules and would reduce to arbitrary collections of tokens incapable of internally supporting any mathematics or propositional logic.

So what does this mean for you if you're using an OLAP product that doesn't seem to provide for metrics? First off, you should realize that any OLAP product you might be using definitely provides for some metrics. At the very least, your product will support one, and probably more, data types to be associated with what it may call measures or variables. These are your cardinal types. In addition, the instances of any other dimensions you might define are probably predefined as eight-byte strings.

Figure 5.6 Classification tree for types. [Are the metrics all the same? Yes: nonhierarchical type. No: are they numerically relatable? Yes: hierarchical type. No: illegitimate type.]
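The classification tree of Figure 5.6 can be sketched as a small function. The comparability test is passed in as an assumption, since deciding whether two metrics are quantitatively comparable is a modeling judgment, not something the code can infer.

def classify_type(instance_metric_pairs, comparable):
    metrics = {metric for _, metric in instance_metric_pairs}
    if len(metrics) == 1:
        return "nonhierarchical type"
    if all(comparable(a, b) for a in metrics for b in metrics if a != b):
        return "hierarchical type"
    return "illegitimate type"

time_pairs = [("January", "Month"), ("February", "Month"), ("1998", "Year")]
mixed_pairs = [("red", "Color"), ("small", "Size")]

months_and_years = lambda a, b: {a, b} <= {"Month", "Year"}   # months scale to years
print(classify_type(time_pairs, months_and_years))            # hierarchical type
print(classify_type(mixed_pairs, lambda a, b: False))         # illegitimate type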

Beyond the basics that your tool supports, you should manually keep track of the ordering and metrics (or units) you're associating with the dimensions you've defined. Though you may not be forced to define the ordering applicable to a set of products, stores, or customers, you should declare someplace in your manually maintained meta data the kind of ordering that applies to each of your dimensions, such as products, stores, time, channels, or customers. Many of them will be nominal, but some of them, like time or scenario or channel location, may be ordinal or cardinal. When it comes to the measures, variables, or contents (as you will experience first hand if you work through the calculation model on my companion Web site), you will have a much easier time building, communicating, and debugging your applications if you systematically track the units of your variables. In the following section, you will learn how hierarchies may be differentiated as a function of the degree to which they are more efficiently described using a named-level or ragged hierarchy language.

Dimensional Hierarchies

Overview

Most business and scientific dimensions have a hierarchical structure. Similar concepts include abstraction, grouping, multilevel relationships, aggregation, and consolidation. In an informal way, everybody is familiar with some hierarchical dimensions. Time, which we think of, for example, in hours, days, weeks, months, quarters, and years, forms a hierarchical dimension. Geography, which we may think of in terms of neighborhoods, cities, states, and countries, forms another hierarchical dimension. Corporate reporting structures, which frequently include a task, project, department, business unit, and company level, form a hierarchical dimension as well. In contrast, a scenario dimension, which is common to most business models, typically has a small number of members such as an actual member, a planned or perhaps several planned members, and a variance member for each combination of planned and actual members. It would almost never be portrayed hierarchically.

In business, as in other types of human activity, hierarchies are a necessity of life. It would be impossible to run a company effectively if all the company's data were confined to the transaction level. Whether it is done in the computer or in your head, you need to track the performance of different types of product aggregations. Even if you ran a small company with a small number of products working out of a single storefront, you would still need to aggregate product sales over time and view your business in terms of larger time scales than that of your sales transactions in order to have the information required to decide which products are selling well and which are selling poorly. Here again, the benefit of multidimensional information systems is that they are geared for directly handling hierarchical dimensions.

The categories by which data are aggregated for analysis and reporting should be the same as the categories in terms of which you act or make decisions. If you make product-pricing decisions to maximize sales, then you need to look at product sales data aggregated by product price or price changes. As your analytical and decision-making

criteria change, so too must your aggregation routines. Aggregation routines need to be flexible to support unplanned analysis. Managers and analysts spend most of their time thinking about groups of things: groups of products, groups of customers, groups of accounts, and so on. Even when we think we are thinking about particulars such as particular sizes, colors, and widths of a specific men's shoe, we are usually considering a tight grouping such as the group of all individual shoes of the same size, style, color, and width. Where data is concerned, groups of things, such as all men's shoes, frequently mean derived things, such as the sum of men's dress and casual shoe sales (unless the data flow is top down, as discussed later in this book). In this regard, OLAP-style derived data queries are different from transaction queries, as the latter generally return the same data that was input, such as the address of a customer, because the data was originally supplied by the customer. Furthermore, groups are frequently changing. It is especially important for analysis-based decision-making to have strong support for flexible as well as deep and wide-fanning hierarchies.

The capability of multidimensional software to reference things according to their position along a hierarchy is incredibly useful for managing real-world applications. Hierarchical referencing may be more common than you think. In corporate organizations, for example, security and privileges are often defined along a hierarchy. Persons below a certain level may not be allowed to see certain data, or they may need authorization to make certain purchases. Sales and other corporate activities are aggregated along hierarchies. Budgeted resources are frequently allocated along a hierarchy. Some amount for total overhead that applies to a whole division may be allocated down the corporate hierarchy as a function of headcount or sales. In fact, a typical term for a corporate reporting organization is a reporting hierarchy!

As you will learn in the remainder of this chapter, there are different types of hierarchies. These hierarchical differences apply to the world in the sense that there are different kinds of naturally occurring hierarchies such as time and space, which are typically called leveled, and corporate reporting structures and product lists, which are typically called ragged. It should be no surprise to the astute reader that these same hierarchical differences also apply to OLAP products, as these products were built to model the world. Thus, some products are built to most easily support leveled hierarchies and some products are built to most easily support ragged hierarchies. (Ideally, products should be able to support both.) Furthermore, these hierarchical types are not easily interdefinable. In other words, it is awkward to try to represent a leveled hierarchy in the world with a tool built to model ragged hierarchies, and it is awkward to try to represent a ragged hierarchy in the world with a tool built to model leveled hierarchies. So, you should be aware of the type of hierarchies you need to model in your world to solve your problems and make sure that your OLAP tools of choice can cleanly support those types of hierarchies.

Hierarchies in General

As stated at the beginning of this chapter, I want to present a semiformal or reasonably solid definition of hierarchies that you can use for modeling and tool analysis, while avoiding any arbitrary, if formal, definitions. (Many rigorous and formal definitions are arbitrary.) As a result, I am trying to keep as much of the text as concrete as possible. That said, I believe you will benefit from spending at least a little time on what all hierarchies share to some degree because the referencing methods that apply to hierarchies in general apply to any combination of ragged or level hierarchies you may encounter. This is most clearly illustrated with a simple hierarchy.

Recall that a hierarchical type is composed of a collection of instance-metric pairs where not all the metrics are the same, but they are all quantitatively comparable. There are thus two different ways that the instance-metric pairs are differentiated: by changes in a metric for a given instance, and by changes in instance for a given metric. All notions of parent-child, ancestor-descendent, level, granularity, scaling factor, or resolution are forms of Δm per i, or Δm/i. This is illustrated in Figure 5.7.

Figure 5.7 Delta M per I relationships. [A root pair i1m1 connected by Δm links to two children, i2m2 and i3m2, which are connected to each other by a Δi link.]

All hierarchies have some Δm/i, parent-child, or many-to-one or scaling factor relationships. Within a hierarchical type, the metric of maximum scale, or root, has no parent metric. The metric of minimum scale, or leaves, has no children. The act of decrementing the m of any pair inmn, such as i1m1, is typically called taking the child, taking the first descendent, or lowering the level of m of i1m1. Likewise, the act of incrementing the m of either i2m2 or i3m2 is typically called taking the parent, taking the first ancestor, or increasing the level of m of either i2 or i3.

The inverse of Δm/i relationships are Δi/m relationships. All notions of sibling and series (think time series) are forms of Δi/m. In Figure 5.7, there is only one Δi/m relationship. The act of navigating or referring from i2m2 to i3m2 is typically called taking the sibling of or moving across the level from i2m2. Any further refinements in Δi/m relationships (such as cousin or distance or first/last functions) depend on additional attributes such as ordering. All type structure or dimensional referencing, whether in OLAP, relational databases, or statistics, grows out of combinations of Δm/i and Δi/m relationships. In other words, a series of relationships of the form Δm/i defines the concept of, or is definitionally equivalent to, "resolution," while a series of relationships of the form Δi/m defines the concept of, or is definitionally equivalent to, "position."

All the standard terms for hierarchical relationships that I've mentioned before, namely parent/child, ancestor/descendent, resolution, level, and granularity, implicitly connote either a leveled or ragged hierarchy. Specifically, parent/child and ancestor/descendent connote a ragged hierarchy (and notice how the terms refer to Δm, not any absolute value for m, and thus the terms come in pairs) while resolution, level, and granularity connote a leveled hierarchy (and notice how these terms refer to absolute values for m and thus appear singly). Therefore, to avoid confusion, when I am referring to hierarchies in general, and trying to make the point that some feature or attribute is general to any kind of hierarchy, I will use the Δm terminology. This consists of referring to the m values of an instance or set of instances, the m distance between two instances, and the set of instances that share the same m value.

If you prefer anthropomorphic terms, you can certainly substitute the terms parent and child for up1 m and down1 m respectively. It is the implicit one-stepness to the terms parent and child that I find most limiting. (In fact, several products that implemented parent and child functions implemented them in this one-step fashion, thus making it necessary for the user to nest multiple parent or child calls to refer farther than one step from an instance.) Also, and this is not insignificant, real parent-child relationships are substantially more complex (and I'm not even referring to the human element here), in that every person as an instance has two biological parents. There is thus not a single parent, much less a single grandparent, for a person. That said, I will periodically use the terms parent and child in the book when I specifically mean up1 m or down1 m. It should make things a little less dry and besides, you should feel comfortable with these terms as they are frequently used by vendors and analysts.

Now let's add a few more elements to the simple type of Figure 5.7, as illustrated in Figure 5.8. Look at the leaves of the hierarchy. Notice how there is still a clean sibling or Δi/m relationship between i4m3 and i5m3 and between i6m4 and i7m4. But is there still necessarily a sibling relationship between any of the children of i2m2 and i3m2? It depends on whether m3 is equivalent to m4. Does m2 represent weeks while m3 and m4 represent days, as shown in Figure 5.9?

Figure 5.8 Some further complexity. [Root i1m1 over i2m2 and i3m2; i2m2 over i4m3 and i5m3; i3m2 over i6m4 and i7m4; whether i5m3 and i6m4 are siblings is marked with a question mark.]

Figure 5.9 Time hierarchy for days and weeks. [Root i1m1 over two weeks, i2m2 and i3m2; the weeks are over days i4m3 through i10m3 and i11m4 through i17m4 respectively.]

Or does m1 represent managers while m2 and m3 represent their direct reports as shown in Figure 5.10? Congratulations, you’ve made it to the junction of ragged and leveled hierarchies.

Ragged Hierarchies

One of the key distinctions between hierarchies is whether a group of instances that share the same m distance from either the root(s) or the leaves of the hierarchy also share the same m distance from either the leaves or the root(s). Consider Figures 5.11, 5.12, and 5.13. Given the hierarchy in Figure 5.11, let us say we wanted to reference all the elements that were the same distance from the root as home appliances, as shown in Figure 5.12. The collection includes the elements desks, chairs, beds, home appliances, and office equipment. All the elements in this collection are two m decrements from the root.

However, the members of the hierarchical dimension that are two m decrements down from the root are not all the same distance from the leaves. This is shown in Figure 5.13, which is composed of the same product hierarchy as Figure 5.12, but the collection of members the same distance from the bottom of the hierarchy as home appliances is highlighted. Home appliances is two m decrements from the root and two m increments from the leaves. However, when counting up from the leaves, no other member of the original top-down defined group is the same number of m increments from the leaves as home appliances. Hierarchies of the type just shown are called asymmetric or ragged. In practice, product, organizational, and territory reporting hierarchies are frequently ragged.

Figure 5.10 A ragged reporting hierarchy. [CEO at the root over two supervisors; each supervisor over a VP and an advisor; the VPs over managers (M) at the leaves.]

Figure 5.11 Product hierarchy for a manufacturer. [Root member Allproducts over Furniture and Electronics (Electronics is a child of Allproducts); Furniture over Desks, Chairs, and Beds; Electronics over Home Appl. and Office; Home Appl. over Radios and TVs; Office over Fax and Copiers; leaf members include Battery, Electric, Portable, and Big Screen.]

Figure 5.12 Referencing from the root within a hierarchy. [The same product hierarchy with members annotated by their m distance from the root; Desks, Chairs, Beds, Home Appl., and Office are all two m decrements from the root.]

Figure 5.13 Referencing from the leaves in a hierarchy via the maximum distance. [The same product hierarchy with members annotated by their maximum m distance from the leaves; Home Appl. is two m increments from the leaves, but no other member of the group highlighted in Figure 5.12 is.]


Figure 5.14 Referencing from the leaves in a hierarchy via the minimum distance. [The same product hierarchy with members annotated by their minimum m distance from the leaves.]

The astute reader will have noticed that there is more than one way to count up from the leaves. Essentially, you can count the maximum distance, as was done in Figure 5.13, or the minimum distance, which is shown in Figure 5.14. Both counting methods are useful. For example, the notion of maximum distance is a better measure of hierarchy depth, while the notion of minimum distance is a better measure of aggregation step when aggregations begin at the leaves and proceed upwards. However, the maximum distance method is the one more commonly used, and so I will use it by default in this book.

Figure 5.15 shows a simple ragged hierarchy with a name in the center of each node and with three numbers around the sides. The top left number represents the node's metric distance from the root, the bottom left number represents the minimum metric distance to a leaf, and the bottom right number represents the maximum metric distance to a leaf. This type of diagram enables you to see the degree of correspondence between the sets of elements calculated as some distance from the root and sets of elements calculated as some distance from the leaves. You can also see exactly which elements have different-length paths connecting them to the leaves. In this case, both the CEO and the COO have different-length reporting chains.

Figure 5.15 A simplified ragged corporate hierarchy. [Nodes: CEO at the root; Mad Scientist and COO under the CEO; Admin and Foreman under the COO; two Craftsmen at the leaves under the Foreman. Each node is annotated with its distance from the root and its minimum and maximum distances to a leaf.]

Whereas the hierarchy shown in Figure 5.12 and Figure 5.13 is a typically ragged hierarchy, the hierarchy shown in Figure 5.15 is a simple example of an extremely ragged hierarchy. While it may be more ragged than even a typical corporate reporting hierarchy, it is an accurate representation of a decision tree. Consider the binary decision process, as shown in Figure 5.16. Beginning at the root, and at each node, the process takes one of two paths so that a history of the process looks like the hierarchy shown in the figure. The decision tree could have just as easily had a nonbinary branching principle as shown in Figure 5.17.

Referencing Syntax

Here are summarized the main referencing relationships and the Located Contents (LC) syntax that works for ragged dimensions. Most products use one of two intratype referencing styles: dot notation or function argument notation. Using dot notation, you typically refer to the name of the type followed by a dot, then by the name of some instance followed by a dot, and finally by a referencing token as shown in this example:

Products.electronics.down1

Read 'move down one from the node labeled "electronics" in the dimension labeled "products"'

Figure 5.16 A binary decision tree. [START at the root; a Yes/No choice at each node; a highlighted path represents the path taken, ending at END.]

Using function argument notation that same reference might have looked as follows:

Children(electronics , Products)

Read 'invoke the children function on the electronics member of the Products dimension'

As I find the dot notation simpler to work with, especially for more complex relationships, it is the style that will be used in this book. That said, what follows are the main ragged referencing functions or syntax. The referencing examples work against the product hierarchy shown in Figure 5.11.

Figure 5.17 A nonbinary decision tree. [START at the root; more than two choices at some nodes; the path taken ends at FINISH.]

Given a type name and an instance in dot notation, Type.instance., add any of the following:

■■ under, which refers to all elements that result from decrementing the metric of the given instance and not counting the given instance.

Products.homeappliances.under yields (radios, battery, ...)

■■ atunder, which refers to all elements that result from decrementing the metric of the given instance and counting the given instance.

Products.homeappliances.atunder yields the same set as above plus the member 'home appliances'

■■ over, which refers to all elements that result from incrementing the metric of the given instance and not counting the given instance.

Products.office.over >> electronics , allproducts

■■ atover refers to all elements that result from incrementing the metric of the given instance and counting the given instance.

Products.office.atover >> same as above plus products.office

■■ downn which refers to all elements that result from decrementing the metric of the given instance n decrements.

Products.electronics.down2 >> radios , TVs , Fax , Copiers

■■ upn refers to all elements that result from incrementing the metric of the given instance n increments.

Products.fax.up2 >> electronics

Given these basic relationships, it is straightforward to define the slightly more complex relationships commonly referred to as sibling and cousin. The siblings of any instance, including oneself, are simply the children of its parent, as with the following expression:

Type.instance.up1.down1

An example would be

Products.TVs.up1.down1 >> Radios, TVs

The cousins of an instance are the grandchildren of the grandparent (note here the breakdown of the anthropomorphic metaphor).

Type.instance.up2.down2

An example would be

Products.TVs.up2.down2 >> Radios, TVs, Fax, Copiers
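To make the semantics of these functions concrete, here is a rough Python sketch of under, over, upn, downn, siblings, and cousins against the product hierarchy of Figure 5.11. The parent-child table is abbreviated and the function names are mine, not LC syntax.

PARENT = {
    "Furniture": "Allproducts", "Electronics": "Allproducts",
    "Desks": "Furniture", "Chairs": "Furniture", "Beds": "Furniture",
    "Home Appl.": "Electronics", "Office": "Electronics",
    "Radios": "Home Appl.", "TVs": "Home Appl.",
    "Fax": "Office", "Copiers": "Office",
}
CHILDREN = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, []).append(child)

def under(node):                       # all descendants, not counting the node
    out = []
    for child in CHILDREN.get(node, []):
        out += [child] + under(child)
    return out

def over(node):                        # all ancestors, not counting the node
    return [PARENT[node]] + over(PARENT[node]) if node in PARENT else []

def down(node, n):                     # decrement the metric n times
    nodes = [node]
    for _ in range(n):
        nodes = [c for p in nodes for c in CHILDREN.get(p, [])]
    return nodes

def up(node, n):                       # increment the metric n times
    for _ in range(n):
        node = PARENT[node]
    return node

print(under("Home Appl."))             # Radios, TVs
print(over("Fax"))                     # Office, Electronics, Allproducts
print(down("Electronics", 2))          # Radios, TVs, Fax, Copiers
print(up("Fax", 2))                    # Electronics
print(down(up("TVs", 1), 1))           # siblings of TVs: Radios, TVs
print(down(up("TVs", 2), 2))           # cousins of TVs: Radios, TVs, Fax, Copiers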

Now that you’ve seen classic ragged hierarchies, let’s turn to classic leveled hierarchies.

Leveled or Symmetric Hierarchies

Although asymmetric hierarchies are common, not all hierarchies are asymmetric. Hierarchies, like the time hierarchy shown in Figure 5.18, where the collection of members some distance from the top or bottom of the dimension is the same regardless of which end you are counting from, are called symmetric hierarchies. With symmetric hierarchies, you can refer to members by their level. Thus, in Figure 5.18, the set of instances with the metric quarter is the set of all members two metric increments from the bottom and one metric decrement from the top; it can simply be referred to as the quarter level.

Figure 5.18 A symmetric hierarchy. [A Year over four Quarters, each Quarter over three Months.]

Whereas ragged hierarchies excel as a vehicle for aggregating irregularly clumped data, they provide no native support for cross-sibling analysis (although they sometimes support methods for doing some cross-sibling referencing, as shown later in this chapter under the heading Orderings in the section on deviate hierarchies; however, it needs to be qualified by saying that the function's semantics are not clearly defined for an arbitrary ragged hierarchy). Leveled hierarchies, by contrast, provide a notion of level that is the basis for performing any type of series analysis. The attributes and uses for level-based hierarchies are intertwined with the concept of ordering, and thus I will successively describe dimension levels with nominal, ordinal, and cardinal properties.

Leveled Dimensions with Nominally Ordered Instances

The classic example of a business dimension that is represented with nominal levels is that of geography in the sense of named geographical locations, as illustrated in Figure 5.19. Note the named levels store, city, region, country, and all.

Figure 5.19 A business dimension represented with nominal levels. [Level names and instances per level: All: ALL; Country: USA, FRANCE; Region: Bible Belt, Rust Belt, Burgundy, Southern Rhone; City: Jackson, Dallas, Detroit, Cleveland, Dijon, Givry, Avignon, Gigondas; Store: Store 1 through Store 16.]

One of the conditions that the instances need to meet is that levels are fully connected to each other. Every store connects to some city; every city connects to one or more stores. When collections of instances with the same metric are fully connected to the instances of an adjacent metric, all aggregate operations have deterministic results. For example, aggregations of data between levels are equivalent. The sum of sales across all cities must equal the sum of sales across all stores. Counts and averages as well must produce identical results across levels. If they do not, then the collections of instances are not fully connected and should be called named groups (described later in this chapter) instead of levels.

The general hierarchy referencing of Δm/i that was used for the ragged referencing functions up/down applies just as well to levels. The issue is one of starting point. If you begin with an instance and move up or down, what returns is an instance or set of instances. If you begin, however, with a level and move up or down, what returns is a level. The astute reader may well have realized by this point that he or she will also need some method of requesting the level of some instance and the instances at some level. Thus, the following syntax is used:

■■ under When applied to a level, it returns all the levels below the given level, not counting the given level.

Geog.Country.under >> Region, City, Store

When applied to an instance, it still returns all the instances below the given instance, not counting the given instance; in other words it behaves the same way it did with general and ragged hierarchies.

Geog.Avignon.under >> Store.(13, 14)

■■ atunder When applied to a level, it returns all the levels below the given level counting the given level.

Geog.Country.atunder >> Country, Region, City, Store

■■ over When applied to a level, it returns all the levels above the given level, not counting the given level.

Geog.City.over >> Region, Country, All

■■ atover When applied to a level, it returns all the levels above the given level counting the given level.

Geog.City.atover >> City, Region, Country, All

■■ downn Refers to the level that results from decrementing the given level n decrements.

Geog.Country.down2 >> City

■■ upn refers to the level that results from incrementing the given level n incre- ments.

Geog.City.up2 >> Country

In addition to moving up and down levels, there is also a need to return the level of an instance or the instances at a level, as follows: ■■ Type.instance.level returns the name of the level of the instance:

Geog.USA.level >> Country

■■ Type.instance.level.up/down ‘x’ returns the name of the level x units up or down from the level of the given instance:

Geog.USA.level.down1 >> Region

■■ Type.instance.specifically named level atover the level of the given instance returns the name of the instance that is the ancestor of the given instance at the specified level:

Geog.store.12.Country >> France

■■ Type.level.instance. returns all the instances at the given level:

Geog.USA.level.down1.i. >> Bible Belt, Rust Belt

To simplify things, Type.level. also returns all the instances at a level, as it makes no sense to ask for all the levels at a level. This notation will be used in the book. In addition to directional referencing, levels can be used for explicit referencing, such as asking for the tax rate at the city level for a given store, as with the following formula:

Taxes owed = sales x (Tax rate, Geog.city)

This formula naturally applies to any element of the Geog dimension that is at or under the city level. It says to take the city tax rate for the city that is over the element whose sales is under consideration. When using named levels, referencing data at particular levels in formulas can become much easier. For example, a formula can divide sales by the year-level or the country-level sales without having to know how far away in the hierarchy the year-level value is from the current time member or how far away the country-level value is

from the current geography member. Named levels help the person writing the formula concentrate on the meaning of the formula as opposed to the structure of the hierarchy.
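A sketch of this kind of level-based lookup follows; the hierarchy fragment is taken from Figure 5.19, while the 19.6 percent tax rate and the helper names are invented for illustration.

PARENT = {"Store 13": "Avignon", "Store 14": "Avignon", "Avignon": "Southern Rhone",
          "Southern Rhone": "FRANCE", "FRANCE": "ALL"}
LEVEL = {"Store 13": "Store", "Store 14": "Store", "Avignon": "City",
         "Southern Rhone": "Region", "FRANCE": "Country", "ALL": "All"}
TAX_RATE = {"Avignon": 0.196}          # a tax rate stored at the city level

def ancestor_at_level(member, level):
    # Walk up the hierarchy until reaching the member at the named level.
    while LEVEL[member] != level:
        member = PARENT[member]
    return member

sales = 1000.0
taxes_owed = sales * TAX_RATE[ancestor_at_level("Store 13", "City")]
print(taxes_owed)      # the formula applies to any member at or under the city level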

Leveled Dimensions with Ordinally Ordered Instances

It is common for dimensions with nominal levels to get ordered according to some ordinal principle, such as ordering the stores of a geography dimension by sales or the countries of a geography dimension by population. Given a leveled dimension where the instances of some level are ordinally arranged, it is possible to apply ordinal operations such as logistic functions to data sets dimensioned by the ordinalized level. The standard form for these data sets is as follows:

(Type.level.ordinal_ordering.)~ [Content1 , Content2 , ...]

In words, this reads that for every instance of the ordinally ordered level of the type on the left-hand side of the "~" operator there is a one-to-one association between each element on the left-hand side and some value of each of the contents on the right-hand side. Examples include the age of each runner (C) dimensioned by that runner's finish place (L), the median income of voters (C) dimensioned by political parties ordered by how they placed in the most recent election (L), or the return on investment (ROI) for new stores (C) dimensioned by the size rank of the store (L). These types of analyses will be discussed in Chapter 20. It is also possible to use location-sensitive visualization techniques on ordinalized level dimensions. This will be discussed in Chapter 9.

Recall from the earlier discussion of hierarchies that there are two basic types of relationships: Δm/i and Δi/m. One of the weaknesses of ragged hierarchies is their lack of native support for Δi/m functions. This is because, beyond the immediate siblings for an instance, there are no natural, nonhierarchy-traversing referencing functions. Even sibling traversal assumes that it makes sense to treat all the children of a common parent as belonging to the same metric. This assumption may or may not be true, as it is quite easy for some children to be leaves while others contain deep subhierarchies. In these latter cases, it is not at all clear that it makes sense to treat all the siblings as equal. What this means is that Δi/m functions are sufficiently general for a ragged hierarchy that they can produce nonsense unless the designer manually verifies their semantics. This, of course, is one of the strengths of leveled hierarchies. You can always refer to the other instances of a metric, and metrics, in this sense, behave like (and actually are) a series: nominal, ordinal, and/or cardinal.

Given that the instances can be ordered on a level-by-level basis, it is possible to use different sorting techniques for different levels. Consider again the geography dimension in Figure 5.19. Note the element connections in the figure, such as the city Dijon connecting to the Burgundy region. Now what happens to the interlevel connections if you ordinalize the cities by size as shown in Figure 5.20? Basically, the Δm/i relationships have not altered. You can still navigate from Dijon to the Burgundy region. However, you can no longer take a vertical slice by requesting Region.Burgundy.under and get back instances from the city level that are ordered by size.

Figure 5.20 Connections between ordinally ordered cities. [Regions R1 through R4 (Rust Belt, Burgundy, Bible Belt, Southern Rhone) connected to cities C1 through C8 (Detroit, Cleveland, Dallas, Jackson, Dijon, Givry, Avignon, Gigondas), with the cities arranged into city-size bins 1 through 4.]

Leveled Dimensions with Cardinally Ordered Instances: Time and Space

Time and space are the most pervasive examples of leveled dimensions or types whose instances are naturally cardinally ordered and that are typically used to represent business dimensions as opposed to measures (otherwise, I would have added mass as a pervasive leveled type as a variable). Their levels might be called seconds, minutes, hours, days, weeks, months, and years, or centimeters, meters, and kilometers. In addition to all the type's instances belonging to one or another metric and all the metrics being quantitatively comparable, all instances within a level are cardinally related. So


you can add and subtract quantities along levels, such as subtracting two months from a given month as you might do in a lag function where the future is related to the past. A common function that you will encounter over and over again is relative referencing along the instances within a level based on the current cursor position. This is used to support all kinds of range queries from lags and leads to moving and cumulative aggregates. In other words, when an OLAP system is evaluating an expression, it is doing so relative to a dimensional context such as a set of stores and times. For example, let's say that the time dimension is composed of months, quarters, and years and what's needed is to define a projected sales function based on historical growth rates over the last three months. In LC, you would refer to specific distances back and forth along the time dimension using the following syntax. Time.month.this+/-"x" means from whatever is the current month (month.this) add to or subtract "x" units. Thus, in June, Time.month.this-2 yields April and Time.month.this+2 yields August.

Projected sales, Time.Month.(this+3) = (Sales, Time.Month.this) x (Avg growth, Time.Month.((this-3) to this))

This reads: the projected sales three months out equals the sales this month times the average growth in sales over the past three months. Levels whose instances are cardinally ordered support all the series-based formulas that were shown in the previous section. Thus, you may wish to compute a running average of the last three months' values for some variable, as with the following expression:

Moving three month average sales = Avg(Sales, Time.Month.((this-3) to this))

Even though today's OLAP tools do not model space very well, if at all, so that by far the greatest amount of relative cardinal referencing occurs, in an OLAP context, in the Time dimension, it is worth mentioning at least one relative range referencing example in space.

'Changes_in_economic_activity', Latitude.this.(+ or - "x"), Longitude.this.(+ or - "x"), Time.this = 'Some_growth_factor' x ('Changes_in_economic_activity', Latitude.this, Longitude.this, Time.(this - "x"))

In words, and assuming that the tuples defined by latitude.this and longitude.this denoted important economic spots, the expression says that changes in economic activity in some regions are driven by changes in neighboring economic hot spots at some earlier time. Of course, there are many other occasions when one would want to leverage relative referencing in space, including gas dispersal, population movements, cultural fads, consumer tastes, and technology transfers.
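The time-based examples above can be sketched with an ordinary list of months standing in for the month level; the sales numbers are made up, and this_plus() is merely a stand-in for Time.month.this plus or minus "x".

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = {"Jan": 100, "Feb": 110, "Mar": 118, "Apr": 130, "May": 141, "Jun": 150}

def this_plus(month, offset):
    # Relative reference along the month level from the current cursor position.
    return months[months.index(month) + offset]

this = "Jun"
print(this_plus(this, -2))                        # Apr

# Moving three-month average ending at the current month.
window = [sales[this_plus(this, -k)] for k in range(3)]
print(sum(window) / len(window))

# Projected sales: this month's sales times the average month-over-month growth
# observed over the past three months, as in the expression above.
growth = [sales[this_plus(this, -k)] / sales[this_plus(this, -k - 1)] for k in range(3)]
print(sales[this] * sum(growth) / len(growth))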


Even though time and space are the dimensions that are most frequently associated with levels whose instances are cardinally ordered for the purpose of analysis, cardinal levels can apply to any dimension. The fact that the instances of a level may be cardinally ordered doesn’t mean the scaling factor between instances across two different levels is the same across the entire level. For example, although days and months are each cardinally ordered, there is not a constant scaling factor between days and months. Practically speaking, this means that in representing the dimension, explicit lineage links will still be needed as shown in Figure 5.21.

Space

Although space is naturally cardinally ordered, unlike time it requires the combination of three dimensions to be modeled. The variety of geometries used to define space, from basic rectangular to spherical, typically lies either in mathematics or in geographic information systems modeling, but in any event is beyond the scope of today's OLAP tools. In addition to working with time and space, most of the mathematics you learned can be rephrased in terms of levels with cardinally ordered instances. Any function defined on the integers or rationals is an example. We will look at examples of this kind of dimensional analysis in Chapter 20.

Constant Scaling Factors

So far, all of the leveled dimensions we’ve seen, even the ones with cardinally ordered instances, have had variable scaling factors. Now we look at leveled dimensions with constant scaling factors. Although it is logically possible for constant scaling factors to apply to a leveled dimension with nominally ordered instances, in practice, one sees

Figure 5.21 A time dimension with explicit lineage links.

constant scaling only in levels with cardinally ordered instances. As such I will look only at the latter. Since the difference between instances is the same for cardinally ordered instances (that is, the total distance along any series of 60 seconds is the same), given the known relationship between levels, such as the conversion between seconds and minutes, and the ordinal position of an instance, one can figure out the exact lineage connections for that instance. For example, second number 73 is a child of minute number 2 and hour number 1. Day number 40 is a child of month number 2. With constant scaling factor dimensions, the dimension can be specified by defining the scaling factor between the levels. For example, one might define Time.levels. = Second, Minute, Hour where all three are expressed in integers. Then it suffices to state that Level.Hour = 60 x Level.Minute and Level.Minute = 60 x Level.Second. From that point on, one can determine an entire lineage from any given second, minute, or hour. Of course, little except perhaps the pure numerics is constant. We've been known to add seconds every century or so and certainly hours get added and subtracted at least twice a year in most places. Space would appear to be a little more constant than time since all the hierarchies are derived, but even then, things like gravity can alter our space metrics. So the point here isn't to artificially define some idealized set of relationships that never hold, but rather to identify the main patterns of hierarchical structure that do exist and to show how two basic types of complementary hierarchical vocabulary as reflected in commercial products can capture all the hierarchical relationships you will ever need to describe.
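The arithmetic behind constant scaling factors can be sketched in a few lines of Python. This is an illustration only, assuming idealized 60-to-1 conversions and 1-based ordinals; it is not LC or any product's syntax.

SCALE = {("second", "minute"): 60, ("minute", "hour"): 60}

def parent_ordinal(ordinal, child_level, parent_level):
    # With a constant scaling factor, lineage is arithmetic: 1-based ordinals,
    # so second 73 falls in minute (73 - 1) // 60 + 1 = 2.
    factor = SCALE[(child_level, parent_level)]
    return (ordinal - 1) // factor + 1

minute = parent_ordinal(73, "second", "minute")
hour = parent_ordinal(minute, "minute", "hour")
print(minute, hour)   # 2 1 -- second number 73 is a child of minute 2 and hour 1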

Multiple Hierarchies per Type

A single type may have more than one hierarchy. Stores, for example, in a Geography dimension may roll up by region and by store type. Products may roll up by category and by brand. Each hierarchy in a type is composed of a unique set of levels or hierarchically related instances. Although levels are unique within a single hierarchy, hierarchies may overlap, effectively sharing levels. For example, a time type may be composed of two hierarchies: a fiscal hierarchy and a calendar hierarchy. Both hierarchies may share a day and month level. The hierarchies may then diverge at the quarter and year levels. For a ragged hierarchy that can otherwise be defined in terms of a parent-child table, the existence of multiple hierarchies in the dimension means that up-hierarchy functions need to specify the hierarchy on which they are operating. Return to the product hierarchy shown in Figure 5.11, and look at the individual products represented at the leaf nodes: beds, portable TVs, photocopiers, and fax machines. There are other ways to group these individual products than by furniture versus electronics. Take pricing, for example. As shown in Figure 5.22, the categories bargain, typical, and deluxe represent another equally useful method for organizing the members of the product dimension. Choosing the appropriate middle levels of grouping or aggregation is critical to our understanding and our ability to make decisions because data have only one detail

Figure 5.22 Dimensions can have multiple hierarchies.

level and one fully aggregated level but many intermediate levels. As shown in Figure 5.23, there are many paths to nirvana. Each path highlights some factors and hides others. For example, aggregating store-level product sales into cities, states, and countries highlights factors that are differentiated by region, such as economic zone-based differences in sales. The same aggregation path hides other factors, such as store size-based differences in sales. The nearly limitless number of intermediate levels, such as product price groups, product margin groups, product type groups, and product producer groups, enables us to experiment with different ways of looking at and understanding our data. This is one of the areas of convergence between dimensional structures and statistics. For example, techniques such as cluster analysis can help to identify the natural groupings in the data that then serve as useful intermediate levels of aggregation. (More on the relationship between OLAP and statistics in Chapter 20.) Alternate methods of grouping do more than just rearrange the members; they create entirely different aggregate numbers. Figure 5.24 shows two spreadsheets in two panels. Both spreadsheets are two-dimensional and are composed of a variables dimension and a product dimension. The variables dimension is identical between the two spreadsheets. The product dimensions share the same leaf nodes between the two spreadsheets; however, in Figure 5.24, A rolls up into furniture, home, and office products, whereas B rolls up into bargain and deluxe products. Not only are the numbers different, the number of numbers is different. Price and customer are two equally valid ways of grouping individual products. For most multidimensional products, the two rollups would be treated as separate hierarchies within the same dimension. In this sense, a dimension may be thought of as a collection of leaf members and the set of hierarchy or group members created from that collection. In other words, all the members of a dimension—leaf members, intermediate level members, and root

Figure 5.23 The many paths to nirvana.

A:
              Sales   Costs
Beds            500     450
Tables          350     200
Furniture       850     650
Radios          150     125
TVs             550     400
Home            700     525
Fax             275     275
Photocopier     600     500
Office          875     775

B:
              Sales   Costs
Tables          350     200
Fax             275     275
Radios          150     125
Bargains        775     600
Beds            500     450
Photocopiers    600     500
TVs             550     400
Deluxe         1650    1350

Figure 5.24 Different rollup paths produce different aggregate numbers.

members—form a single collection of members of the same type, which, as a whole, constitutes one factor/dimension in a multifactor/dimensional situation. Keep in mind that all members of a dimension, from leaf to root, can vary from analysis to analysis.

Looking again at Figure 5.22, if a dimension has multiple hierarchies, then when traversing up the hierarchy from any node (child) that has two or more parents, it is necessary to specify the parent or root toward which you are navigating. For example, if from the member fax machines, you wanted to navigate up to office equipment, it would be necessary to indicate that you wanted to move in the direction of allproducts rather than in the direction of deluxe products. OLAP tools differ in how and whether they provide this functionality. Many define a main hierarchy that is the default and force you, sometimes through less than intuitive steps, to specify an alternate that you may want to follow. Others offer a great deal of flexibility and provide for custom hierarchies that can be defined within the context of a report. The following examples illustrate hierarchical referencing along a dimension that contains multiple hierarchies.

Time.fiscal_hierarchy.yearend

Read “the yearend member of the fiscal hierarchy of the time dimension.”

Geography.store_type_hierarchy.department_stores

Read “the department stores level of the store_type hierarchy of the Geography dimension.”

Myfavoritedimension.alternativehierarchy.most popular member.up1

Read “the parent of the most popular member of the alternative hierarchy of my favorite dimension.”

Deviant Hierarchies

You ideally want a tool that supports both leveled and ragged hierarchies within a single syntax because it is awkward to try to represent ragged hierarchies with a level-based tool and to represent a leveled hierarchy with a ragged-based tool. So what are the awkward or deviant options?

Pseudolevels

A named level can be simulated with a tool that is based on ragged hierarchies (so long as it supports multiple hierarchies) by introducing a member into the dimension that serves as a root for the level, and by making all members intended to belong to a common level immediate children of that root. To represent the concept of a city level as shown in the ragged geography hierarchy in panel B of Figure 5.25, a root member named City needs to be added to the dimension and be made a parent of Chicago, Milwaukee, Columbus, and so on. I call this technique pseudolevels, as it creates the named grouping of members, though without any of the semantic attributes possessed by levels. Figure 5.25 illustrates the pseudolevels in panel B contrasted against named levels in panel A.

Figure 5.25 Named levels and pseudolevels.

When using pseudolevels, it is awkward to find out how the level for a member relates to other levels, or even to find the next level above or below, because the tool does not directly support this notion of level. (The adjacent-level information might be coded as an attribute of the level's root member.) Accessing members from one level to the next can be a little difficult as well. Say that a geography dimension is modeling countries, provinces/states, and cities using level root members named country, province, and city. To get the cities for the country France, you will have to obtain "all children of the member city that are also children of the member France." A tool may or may not be able to express this easily. Ideally, a tool provides clean support for both irregular hierarchies and named levels that are semantically understood to be like levels of scale.
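A short, hypothetical Python sketch of the pseudolevel workaround: because the "level" roots are ordinary members, getting the cities of France comes down to intersecting two child sets.

children = {
    "city":    {"Paris", "Lyon", "Berlin", "Munich"},   # children of the level's root member
    "France":  {"Ile-de-France", "Paris", "Lyon"},      # ragged: a region plus two cities
    "Germany": {"Berlin", "Munich"},
}

def pseudolevel_members_under(level_root, ancestor):
    # "All children of the member city that are also children of the member France."
    return children[level_root] & children[ancestor]

print(pseudolevel_members_under("city", "France"))   # {'Paris', 'Lyon'}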

Ordering

By incorporating the notion of ordering into siblings so that there is a first, second, and so on up to last sibling, several OLAP products make it possible to refer to the same relative sibling in a ragged dimension within a relative or cousin function. The classic uses for these types of functions are relative time-based comparisons where someone wants to compare the first week of some quarter with the first week of the previous quarter or the first week of the same quarter for the previous year. As a note of caution, these types of ordering-based referencing functions do not have clearly defined semantics in an arbitrary ragged hierarchy.

For example, and referring to Figure 5.26 with a starting point of Y3.Q2.M2.W2, you might want to refer to the same relative week, previous month, and same absolute quarter and year

Y3.Q2.M2.W2.up1.prev1.W2 or the same relative week and month for the previous quarter for the same absolute year

Y3.Q2.M2.W2.up2.prev1.M2.W2 or the same relative week, month, and quarter for the previous year

Y3.Q2.M2.W2.up3.prev1.Q2.M2.W2

Referring to the sample syntax, note that in order to make these references work, it is necessary to identify a previous member from within the ancestry. Again, this notion of previous sibling is a leveled notion for it assumes that the metrics of each sibling are identical, an assumption that cannot be made in a ragged hierarchy without adding level-like conditions. For products that support this type of referencing, either they make you specify the nearest root whose value does not change (this is quarter in the first example and year in the second) or they enable you to refer to a previous element

Figure 5.26 View of the time dimension hierarchy.

at some level in the hierarchy as described previously. Either way, the dimension against which the function is evaluated has elements of a leveled dimension and cannot be said to be purely ragged.
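The following Python sketch (hypothetical node structure, not a product's API) shows what ordering-based cousin referencing assumes: the up, prev, and nth-child steps only land where you expect because every month is given the same number of week children.

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def up(node, steps=1):
    for _ in range(steps):
        node = node.parent
    return node

def prev(node, steps=1):
    # Previous sibling by ordinal position; only well defined if it exists.
    siblings = node.parent.children
    index = siblings.index(node) - steps
    if index < 0:
        raise ValueError("no previous sibling at this level")
    return siblings[index]

def nth_child(node, ordinal):
    return node.children[ordinal - 1]

def month(q, m):
    return Node(f"Q{q}.M{m}", [Node(f"Q{q}.M{m}.W{w}") for w in range(1, 5)])

y3 = Node("Y3", [Node(f"Q{q}", [month(q, m) for m in range(1, 4)]) for q in range(1, 5)])

start = y3.children[1].children[1].children[1]               # Y3.Q2.M2.W2
print(nth_child(prev(up(start, 1)), 2).name)                 # Q2.M1.W2: same relative week, previous month
print(nth_child(nth_child(prev(up(start, 2)), 2), 2).name)   # Q1.M2.W2: previous quarter, same month and week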

Dummy Members

In the same way that ragged hierarchy-based tools can provide some of the capabilities of levels by using pseudolevels, pure level-based tools can provide for some ragged modeling by making use of dummy members. How would you model a ragged geography hierarchy with a level-based tool? Basically you need to associate each node with a level. Of course, the problem is that many nodes would need to associate with more than one level. For example, the stores report directly to the region level without passing through a city member. The way to get around this is by introducing dummy cities to represent the stores at the city level where there otherwise isn't any city. This is illustrated in Figure 5.27.

Figure 5.27 Using dummy members.

The thing to keep in mind when using dummy members is whether the dummy member figures in the meta data for its level, that is, whether it is counted as a member and capable of being seen as a member, and whether it has any data associated with it. If the answer is no, then the use of a dummy member has turned the level into a named group because the level aggregates no longer correspond. This isn't necessarily bad, but you need to know what you're doing. For example, if you used a level-based tool to define a ragged geography dimension composed of stores, cities, and states where some stores fed directly into states, you might create dummy city members to represent those stores that had no corresponding city. However, since the dummy members are otherwise invisible, if you were to query for the total sales across cities, that total would not include the values associated with the dummy members and city totals would be less than store totals. This is a clear violation of level properties. You should no longer think of the dimension as leveled. However, just because the dimension isn't leveled doesn't mean it's wrong. It could be that the city instances represent some specific city infrastructure that isn't present in those cities where stores feed directly to states.
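A small, hypothetical Python sketch of the pitfall just described: when the dummy city member is hidden from the city level, the sales of the store beneath it drop out of the city total.

stores = {
    "store_A": {"city": "Chicago",   "sales": 100},
    "store_B": {"city": "Milwaukee", "sales": 80},
    "store_C": {"city": "dummy_city_for_store_C", "sales": 60},   # feeds straight to its state
}

visible_cities = {"Chicago", "Milwaukee"}   # the dummy member is hidden from the city level

store_total = sum(s["sales"] for s in stores.values())
city_total = sum(s["sales"] for s in stores.values() if s["city"] in visible_cities)

print(store_total)   # 240
print(city_total)    # 180 -- the city "level" no longer accounts for all store sales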

Mixed Hierarchies

Finally, it should be clear by now that the world is pretty well mixed in terms of hierarchies. For most applications that you may ever undertake there are likely to be some types best modeled as ragged, some types that are best modeled as leveled, and some types that would benefit from a combination of leveled and ragged structuring. Real-world product hierarchies are frequently leveled near the root and become ragged towards the leaves. Therefore, from a tool perspective, an ideal dimension structure supports levels and raggedness equally well and within the same hierarchy. When a hierarchy is defined with both ragged and level characteristics, the instances of any level must not contain any parent-child relations, and for any two levels A and B (and assuming those levels to be adjacent to each other in the hierarchy6), where the instances of level A are considered up from the instances of level B, the levels must be related so that every instance of level A has one or more child instances in level B and no children that are not within level B; every instance of level B must have one and only one parent instance in level A and no parent that is not within level A. In other words, the levels must be fully connected. Figures 5.28 and 5.29 show examples of valid and invalid mixed hierarchies. For mixed hierarchies (unlike extreme forms of either ragged or leveled hierarchies), the level descriptors and the lineage descriptors are independent. So it is important to be able to use both sets of descriptors for intratype referencing within the same type.


Figure 5.28 A valid mixed hierarchy.

Figure 5.29 An invalid mixed hierarchy.


Summary

In this chapter you have seen why and how dimensional hierarchies are defined in terms of several nonhierarchical characteristics, including methods of definition, metrics, and ordering. You learned the difference between a valid and an invalid dimension or type; you saw how hierarchies originate from dimensions with multiple quantitatively comparable metrics; and you saw that while ragged and leveled hierarchies are reasonably independent and require distinct referencing syntaxes, they stem from a common and more general form of hierarchy. You were exposed to examples of referencing syntax for ragged and leveled hierarchies. You learned about multiple hierarchies, mixed hierarchies, and ways of using leveled hierarchy-based tools to mimic ragged hierarchies, and ways of using ragged hierarchy-based tools to mimic leveled hierarchies.

TYPES

The fact that types need to have at least two possible instances doesn’t mean that you can’t create seemingly monolithic types to represent the “universe” or the “world” or “everything possibly imaginable.” For example, say you were to define a type to represent “anything imaginable” and give it just one possible instance, namely anything imaginable, so that it might seem that any expression could take this instance as a true assertion. Could you really create assertions and negations from this mono-instanced type? Or is something missing? Let’s explore this more closely. Assume two types as shown in the following table:

TYPE NAME        THE IMAGINABLE          ANY COLOR
Type instances   Anything imaginable     Red, Blue, Green

Assuming each type is predicable of the other, we can create the following expressions: Red is anything imaginable; blue is anything imaginable; green is anything imaginable. Anything imaginable is red; anything imaginable is blue; anything imaginable is green. If we were now to imagine that anything imaginable were true as a content predicated of red, then according to the rules of logic, there must exist some other possible instance or value of the imaginable whose value would be false of the same location, namely red. It is logically impossible to assert that anything imaginable is a true instance of the type the imaginable when predicated of the color red without being able to express what a false instance of the type the imaginable would look like. Not only is it impossible to engage in truth function or assertion/negation expressions without types of two or more potential instances, but types bearing a single potential instance have no power to make any differentiations. If anything imaginable is supposedly true of anything, and anything is true of anything imaginable, then the type anything imaginable, with its single potential instance, is incapable of being used to differentiate anything, nor is anything differentiated by it. Thus, the mono-instanced type not only violates the basic tenets of logic, but it is furthermore void of any information-carrying capability.

CHAPTER 6

Hypercubes or Semantic Spaces

Cubes or hypercubes, in the sense of working Online Analytical Processing (OLAP) models, are live or active or populated schemas, which is to say any schema whose data links are in use so that data are connected to the schema and being manipulated in accordance with the rules or formulas defined in that schema. Links need not replicate data. The data can reside in whatever form is dictated by the data source so long as it can be manipulated via a multidimensional schema. Although not typically the case, data can be entirely entered by hand directly into a model; in that case the data links are what connects the keyboard (or other input device) to the schema. (Please refer to Chapter 10 for a discussion of different approaches to storing multidimensional data.) What happens when source data that could exist or are thought to exist don't actually exist? What happens when there are more data in the source than can be accounted for in the type definitions? How is one supposed to connect multiple data sources of clearly different dimensionalities, such as production and sales data, to a single schema for some integrated analytical purpose? Are there natural segmentation principles for defining schemas? In this chapter you will learn why a populated schema or model constitutes what I call a semantic space or a contiguous collection of facts or propositions that can be true or false. You will learn the steps you need to take to ensure your schemas are well defined. You will learn how to identify and treat separately the naturally distinct domains in your data or models, and you will learn how to bring them together when necessary into a multidomain model.


Meaning and Sparsity

One of the key attributes of a schema is its enforcement of meaningfulness. Consider the familiar schema

(Store. × Time. × Product.) ~ Sales, Cost

connected through a set of data links with some table T1 (illustrated in Figure 6.1), such as the following:

■■ Store. << Select distinct T1 Column (Col)1
■■ Time. << Select distinct T1Col2
■■ Product. << Select distinct T1Col3
■■ Sales. << Select T1Col4
■■ Cost. << Select T1Col5

where the << symbol acts like an assignment operator, and distinct means that the unique instances of each of the types Store, Time, and Product are set equal to the unique instances found in the columns to which they're connected. The schema defines a semantic space or contiguous collection of facts consisting of statements or assertions about sales and costs for every store-time-product combination. If there are 10 stores, 10 times, and 10 products, there are 1,000 assertions about sales and 1,000 assertions about costs. Each assertion is a four-tuple with a three-tuple location and a one-tuple content.

Sales (Store.x , Time.x , Product.x) = Y

The above can be read as "The sales-as-content value for the location defined by Store x, Time x, and Product x is the Sales value Y." If a formula is defined for margins based on sales and costs, without further specification, it will be applied 1,000 times, once for each instance of sales and costs. Integrity constraints, as set up when all the component types were initially defined, are maintained here. Thus, if you were to try to enter a string or other out-of-bounds value for sales, the schema would catch it.

          Col1      Col2      Col3        Col4    Col5
Row1      Store1    Time1     Product1    $500    $400
...
Row1000   Store10   Time10    Product10   $400    $600

Figure 6.1 Table T1 (with only a few values filled in).

Additionally, and unless otherwise restricted, the schema defines the contents sales and costs to be applicable to the entire collection of locations defined by the intersections of stores, times, and products. This is why there are a thousand assertions. Thus, if, when connected to a data source, there are locations for which no sales or costs data are present, the schema (without further specification) should interpret the lack or sparsity of data to mean that the data are missing but otherwise forthcoming. Another possible interpretation for sparse or nonpresent rows could be that rows only exist for items that were sold; therefore, nonpresent rows mean that zero amount of product was sold. Another interpretation could be that the nonpresent rows mean that a certain product is by definition not sold at a certain store and/or time and thus the contents sales and costs are not applicable to certain combinations of stores, times, and products. As you will learn in the following sections, the way in which these nonpresent rows or sparse or empty cells are interpreted is critical to the integrity of the application as a whole. Many OLAP tools are unfortunately lacking in methods to make the appropriate distinctions, which can lead to incorrect data derivations. For a technology whose claim to fame is creating derived data, that's a pretty serious flaw. A well-designed schema should correctly process (or allow an application to process) all types of invalid source data by either rejecting certain data, overriding certain data, substituting certain data, or inferring certain data.
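As a hypothetical Python sketch (the rule names and data are invented for illustration), the same absent row can be parsed in different ways, and the choice determines what downstream formulas see.

rows = {("S1", "T1", "P1"): 500, ("S1", "T1", "P2"): 300}    # sparse source table
not_applicable = {("S2", "T1", "P2")}                         # P2 is by definition not sold at S2

def sales(cell, absent_means="zero"):
    # One cube's rule for reading an absent row: inapplicability is checked first,
    # then the absent row is read as either zero or missing, per the application's choice.
    if cell in not_applicable:
        return "n/a"
    if cell in rows:
        return rows[cell]
    return 0 if absent_means == "zero" else "missing"

cells = [("S1", "T1", "P1"), ("S1", "T1", "P2"), ("S2", "T1", "P1"), ("S2", "T1", "P2")]
print([sales(c, "zero") for c in cells])      # [500, 300, 0, 'n/a']
print([sales(c, "missing") for c in cells])   # [500, 300, 'missing', 'n/a']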

Types of Sparsity

The terms "sparse" and "sparsity" are frequently associated with multidimensional data sets. Figure 6.2 is a view of a sparse four-dimensional matrix. The four dimensions are stores, time, product, and variables. The view shows variables and products along the columns, and stores and times along the rows. Notice all the sparse or blank cells. Why are the sparse cells sparse? Does a sparse cell mean that data for the cell are missing but potentially forthcoming, like a late sales report? Does a sparse cell mean that data could never apply to the cell, such as the name of an unmarried employee's spouse? Or does a sparse cell simply mean that all zeros are being suppressed, like the zeros associated with individual product sales in a store that carries many products but that only sells 5 percent of its items on any one day? The term sparse has been indiscriminately used within the OLAP community to mean missing, inapplicable, and zero. The first two cases fall under the heading of what in the database world is thought of as invalid data. It is an important topic. E. F. Codd, in his expanded 18 features for OLAP, suggested that OLAP models follow the

Figure 6.2 Viewing a sparse four-dimensional model.

relational model version 2 rules for handling missing data.1 These entail what is called four-valued logic. This will be treated in depth later in this section. The third case, where the term sparsity has been used to mean the existence of many zeros, is a special case of how to handle large numbers of repeating values where the repeated value happens to be a zero. Zero is just as valid a number as any other number. Technically, it is an integer. It is a perfectly well-formed statement to say that store x sold zero units of product y or that the difference in units sold between store x and store y is zero. It carries as much value as to say that store x sold 15 units of product y. You can add, subtract, and multiply with zeros (though, of course, you can't divide by them). You can compare them to other quantities and so on. The confusion has arisen because OLAP applications frequently encounter large numbers of repeating zeros and large amounts of missing and meaningless data. The techniques for physically optimizing the storage of large numbers of repeating values are similar to, and sometimes the same as, the techniques for physically optimizing the storage of large amounts of missing data and meaningless locations or intersections.

Past Treatment of Invalid Data

Missing and meaningless values are not valid data. They cannot be treated in the same way as any other value. You cannot take the sum of three integers plus two missing values. You cannot compare the value of an integer with that of an inapplicable token. Special logical techniques are required to handle these cases. Improper treatment of nulls can cause inaccurate calculations. Mistaking a missing value for a zero, or mistaking a zero for a missing value, or assigning a proxy value where none is applicable all create incorrect results. For example, if a missing value is treated like a zero, then the sum of 3 + 6 + "missing" will be called 9, which is wrong. The right answer, in the absence of a proxy value, is "missing," or "9 + missing." The accuracy of calculations is of crucial importance to the analysis of any data sets, whether or not they are multidimensional. The issue of how formulas should work with sparse data is an important one and is frequently debated in the database world. As it relates to the accuracy of calculations, especially in the sense of database queries and computations, it is a question of logic. For most of its history, logicians have believed there were two logical values: true and false. Logical rules that apply to databases are expressed in terms of these two values. For example, querying a database to list all salespersons who sold more than $50,000 of product in March may be thought of as posing, to each salesperson's record for March, the question "Is it true that this salesperson sold more than $50,000 of product?" and listing all the records where the answer is "yes, the statement is true." This is illustrated in Figure 6.3. Problems arise, however, when invalid data enter a logical system or database. Look again at Figure 6.3. Notice the blank entry for the commission field in row five. Now imagine the query, "Is it true that the value of the commission field in this row is greater than $50,000?" What would the answer be? You could be a wise guy like Aristotle and say the statement is false; that is, it is false that the value of the commission field is greater than $50,000 simply because there is no value. But there is a problem with this answer. The problem is that the truth of the

List of employees, job titles, and commissions (sorted alphabetically by employee name)

Employee name   Job Title    Commission
Anderson        Sales        75,000
Awklin          Sales        65,000
Bundy           Sales        82,500
Benson          Sales        58,000
Burnett         Sales        Missing     (late report)
Caloway         Marketing    N/A         (Caloway is not a salesperson)
Johnson         Sales        62,500
Kreiger         Sales        55,000

Figure 6.3 Record sparsity.

answer is a function of how the question was phrased. In other words, if the query had had the form "Is it true that the value of the commission field is less than or equal to $50,000?" the answer would still, according to the same logic, be false. But according to logic, a proposition and its inverse cannot both be true or have the same truth value. More likely you would say, "I don't know whether it is true or false because the data are missing." This is tacit acceptance of your inability to process the missing data as if it were existent. In the same vein, and answering the same query, if an employee did not draw a commission because that person was salaried, then the commission field would be inapplicable, as shown in row 6 of Figure 6.3. That is, commission does not apply because the employee draws a salary. "Not applicable" is similar to "missing" in that you cannot process it like a valid data value, but it is different from "missing" in that it would be wrong to assign it a proxy. Logic has no rules for handling invalid data, yet such things as missing and inapplicable data regularly enter all kinds of databases—including multidimensional ones. Logicians are divided over the best way to deal with invalid data. Some prefer to use logical systems that work with three or more logical values. Terms such as "unknown" and "not applicable" are frequently given to these beyond-two-valued values. The extra values then enter into the formal calculus. The main problem with the three- and higher-valued logics is that the meaning of the logical constants or operators, such as the negation term, which have been built up and used successfully over the last 2,000 years, depends on there being just two values. It is inconsistent to add a third or fourth term to the pool of logical values while continuing to use operators that assume there are only two values.2, 3 In the relational model, version 2, Codd advocates the use of four-valued logic. Like others who have gone before, he changes the meaning of the negation term, giving it two different meanings, as illustrated in Figure 6.4.4 For true and false propositions, the negation of a term yields a different term with a different truth value. The negation of true is false, and the negation of false is true. For

P    not(P)
T    F
A    A
I    I
F    T

T = true, F = false, A = missing, I = meaningless

Figure 6.4 The four-valued logic of the relational model, version 2.

missing and inapplicable propositions, the negation of a term is itself. So the negation of missing is missing and the negation of inapplicable is inapplicable. This inconsistency in the definition of negation produces several problems that have been described elsewhere.5 Some prefer to retain two-valued logic because of its desirable inferencing properties and exclude invalid data from entering the system.6 They generally characterize invalid data as either the result of poor database design or as data about the data (sometimes called meta data). For example, rather than entering the value for "spouse's name" into an employee database when an employee has no spouse, it would be better to enter the valid data value "no" into a field called "is married." A separate spouse table would contain the relevant spouse information. The improved database design would eliminate the invalid data. The problem with this two-values-plus-meta-data approach is that it conflates two general classes of invalid data: missing and meaningless.7 The two classes need to be individually treated because they affect computations in different ways. They just don't need to be treated as logical values on a par with true and false. Consider the following example.

Need to Distinguish Missing and Meaningless

Imagine you are the general sales manager for a chain of 100 department stores, and imagine you need to calculate the average monthly shoe sales per store for all 100 stores. In the simple case, all 100 stores sell shoes and all 100 stores reported their sales of shoes. The arithmetic is clear. If all 100 stores sell shoes, but only 80 out of the 100 stores reported their sales, and the average for those 80 stores was $20,000, the arithmetic is no longer straightforward. It would not be accurate to state without qualification that the average shoe sales per store was $20,000. Any statement about shoe sales that applies to all stores in the chain must, implicitly or explicitly, assign a default value to the nonreporting stores. Saying that the average sales per store is $20,000 assumes that the average per nonreporting store is $20,000 as well. This default value for the nonreporting stores need not equal the average value of the reporting stores, however. For example, it may make more sense to assume that nonreporting stores sell a dollar value equal to what they sold the last period modified by some function of their previous growth rate. If the reason why 20 stores did not report shoe sales is because they recently stopped selling shoes, then no statement should be created about shoe sales that applies to all 100 stores. In other words, no default value for shoe sales should ever be assigned to stores for whom the value is meaningless. In this situation, it would be accurate to state that the average sales for the 80 stores that sell shoes is $20,000. However, in the same way that a proxy value may be substituted for a missing data point, there are times when an applicable measure or variable, such as slippers, may be substituted for the inapplicable variable, shoe sales. For example, it may be that shoes are thought of as footwear and, although most stores sell shoes, some specialty stores do not sell shoes, but do sell slippers. In this instance, it would make sense to report on the sale of slippers (modified by some appropriate function relating the average sales of shoes to slippers) where a store does not sell shoes. Thus, when a measure or variable, such as shoe sales, is applicable but data values are missing, regardless of the reason, the missing values need to be assigned some type of default if they enter into operations with other data. (It is always possible to stipulate that missing data values be eliminated from a calculation as, for example, when a manager's bonus is tied to his or her store's sales figures and the bonus cannot be assigned until the actual sales figures are known.) When a measure or variable (such as shoe sales) is not applicable to a location for whatever reason (such as a particular store that stopped selling shoes) and substitute variables are not being used, no default data value should ever be assigned to it for that location. In general, the existence/nonexistence of data (such as sales reports) and the applicability/inapplicability of measures (such as the dollar value of shoe sales) may vary from location to location, or cell to cell, in a model. For example, if sales reports are late, then applicable data is missing for those stores. If a store changes its product lines, then certain variables may become inapplicable and/or applicable between time periods.
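A quick numeric sketch in Python of the point about defaults (figures hypothetical): the all-store average is really a statement about whatever default is assigned to the 20 nonreporting stores.

reported = [20_000] * 80      # the 80 reporting, shoe-selling stores
nonreporting = 20

def all_store_average(default_for_nonreporting):
    values = reported + [default_for_nonreporting] * nonreporting
    return sum(values) / len(values)

print(all_store_average(20_000))       # 20000.0 -- assumes nonreporters sell the average
print(all_store_average(0))            # 16000.0 -- assumes nonreporters sold nothing
print(sum(reported) / len(reported))   # 20000.0 -- a claim about the 80 reporters only
# If the 20 stores had stopped selling shoes, no default is legitimate and
# only the 80-store figure should be reported.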
From my experience, and as described in the remainder of this section, the ideal approach is to combine two-valued logic with procedures for detecting missing and inapplicable data on an instance-by-instance basis.8 This approach lets the user freely incorporate user-defined rules for substituting measures and data whenever invalid cases are found. For those situations where the system is left with missing and/or inapplicable data, formal rules are used to convert the mixed expressions containing missing and/or inapplicable data into a two-valued form where they can be processed via the use of traditional and consistent propositional logic.

Defining Application Ranges

Unless your data sets are perfectly dense (which is unlikely) or unless all sparsity means zero (again unlikely), it is essential that you maintain the distinction between missing and inapplicable data in your applications. The most unambiguous way to do this is by explicitly defining application ranges for those types used as contents within a schema. (You can do this through your own application documentation since application ranges are not typically supported by OLAP products.)


Recall any basic schema such as

(Store. × Time. × Product.) ~ Sales

This schema is expecting a sales value for every single store, time, and product intersection. Now in fact, stores rarely sell every product every day. A typical sales summary table that would feed this type of schema would have one row of information for every product sold on a store-by-time basis. So already there is an implicit interpretation rule, namely that the schema interprets the lack of a row for any particular product for any time by store as meaning that zero amount of that product was sold at that store that day. Now what happens if some products, say winter garments, are inapplicable to (that is, by definition not sold in) certain stores, such as those in southern Florida, either in general or at certain times? In this case, the schema is not really correct as it stands. Sales are not applicable to every intersection of store, time, and product. Queries for average sales per store of winter garments, as shown in the previous section, will be incorrect. There needs to be some way to specify the inapplicability of winter garments to southern Florida stores or the application will incorrectly parse the absent rows for winter garments in southern Florida as meaning zeros instead of not applicable. This specification is the definition of an application range and requires some syntactic device for specifying applicability and inapplicability. In Located Contents (LC), applicability and inapplicability are specified when the type as variable or content is defined, as shown next in a continuation of the previous example.

Sales, (Time. × Store.Southern Florida.atunder × Product.Winter garments.atunder) = n/a

The expression in parentheses specifies that portion of the total location, namely all intersections of southern Florida stores and winter garments, for which sales is inapplicable (n/a). You may be harboring several questions at this point, such as why didn't I define the location range for which sales was applicable, or is n/a a value and if so to what type does it belong, or could one also have declared the same location range to be missing?

I didn't define the location range for applicability because the initial schema specification defines the content, sales, as applicable across the entire range of times, stores, and products. So the definition of this inapplicability may be thought of as an exception or qualification relative to the general schema that preceded it. n/a is a logical value and as such applies to any type and can be represented in dot notation as Type.n/a. Thus, one could pose a query for the ratio of sales growth increase for those stores that sell winter garments versus those that don't with the following expression:

Sales_Growth, Store.(Sales.n/a , Product.WinterGarments.atunder). / Sales_Growth, Store.(Sales.a , Product.WinterGarments.atunder).

Finally, in the same way that n/a is a logical value and can be used to qualify any type, so too can the logical values applicable a (used in the expression above), missing

m, and present p. Thus, the previous query could be restated to ask for the sales growth ratio of stores with present versus missing sales as shown here:

Sales_Growth, Store.(Sales.m , Product.WinterGarments.atunder). / Sales_Growth, Store.(Sales.p , Product.WinterGarments.atunder).

Application Ranges in Use

The example shown in Figure 6.5 involves monthly sales reports sent from stores to the home office. At the end of March 2000, the home office wanted to calculate how many shoes were sold across all stores for the first quarter of 2000 in order to refine plans for the second quarter. Where actual figures were not available, an estimate had to be used to facilitate the planning process. On February 1, 2000, as part of a specialization program, the Buckley store stopped selling shoes. The Ashmont and Painesville stores had not, as of March 31, 2000, reported their sales for March. The available data types per store and month, as highlighted in Figure 6.5(a), indicate that the Buckley store needed to be eliminated from any calculations involving February and March. By maintaining applicability information, this is easily done. The available data per store and month, as highlighted in Figure 6.5(b), required that March figures for the Ashmont and Painesville stores be estimated since the measurements were applicable but were as yet unavailable. Consider the following formula to calculate the average monthly shoe sales applied across all stores in the company for the first three months of 2000:

Shoe_Sales, L.leaf.above. = Avg(Shoe_Sales.a, L.leaf.)

If Shoe_Sales, L.leaf. = m
Shoe_Sales, L.leaf. = (Shoe_Sales, (Time.year.(this - 1))) * Shoe_Sales_Growth_Factor

In words, the top line says the average value for shoe sales will be calculated at nonleaf levels of the location structure by averaging the shoe sales values wherever they are applicable at the leaf level. Without further specification this formula would return a value of missing if there were any missing values because no instructions are given for how to process missing values. The second section gives instructions for processing missing values and says to look at the value for shoe sales for each leaf level location. If it is applicable but missing, then substitute an approximation determined by taking last year's value and multiplying that by an expected growth factor. Then calculate the average shoe sales from all the valid values plus all the projected values. Wherever the shoe sales variable is inapplicable, such as in the Buckley store in February and March, no substitution will be made for those locations. Where the shoe sales variable is applicable but data is missing, such as Painesville in March, the formula will create an estimate based on sales for the same period in the prior year. If there were no approximation clauses in the formula to deal with missing data, or they in turn were missing, then the formula as a whole would be unevaluable and would return a result of missing.
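Here is a minimal Python sketch of that evaluation logic (not LC itself; the store names, prior-year figures, and growth factor are placeholders): inapplicable leaves drop out of the average, while applicable-but-missing leaves are replaced by last year's value times a growth factor.

MISSING, NA = "missing", "n/a"

this_year = {"Buckley": NA, "Middletown": 150, "Ashmont": MISSING, "Painesville": MISSING}
last_year = {"Buckley": 140, "Middletown": 145, "Ashmont": 120, "Painesville": 70}
GROWTH_FACTOR = 1.05

def leaf_value(store):
    value = this_year[store]
    if value == NA:
        return None                               # inapplicable: drops out of the aggregate
    if value == MISSING:
        return last_year[store] * GROWTH_FACTOR   # the approximation clause
    return value

values = [v for v in (leaf_value(s) for s in this_year) if v is not None]
print(sum(values) / len(values))                  # average over applicable leaves only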

Panel (a) of Figure 6.5 shows items sold and dollar sales per store (Buckley, Middletown, Ashmont, Painesville) for January through March. Shoes are no longer sold at Buckley, so those store-months must be excluded from computations involving shoe sales. In order to compute a value involving the unreported figures, some implicit or explicit assumption must be made about them, or else the fact that they are not available must make the result unavailable as well.

Panel (b), sample result (shoe sales):

             Jan   Feb   Mar
Buckley      150   n/a   n/a
Middletown   100   120   150
Ashmont      150   105   130
Painesville   85    65    75

Average sales for the quarter, all stores: 103 (2/12 n/a, 2/12 estimated, 8/12 actual)

Figure 6.5 Handling inapplicable versus missing data.

EVALUATING EXPRESSIONS WITH VALID AND INVALID DATA

The LC model offers a general method for evaluating formulas that contain a mixture of valid and invalid data. These procedures are outlined and illustrated in Figure 6.6. The following procedure is repeated for each variable in the expression or formula. The application and the formula together determine the strategy for deciding what to do when you encounter an inapplicable variable or missing data. The procedure takes place per location, and per variable reference within the expression.

Phase One: Test for Applicability

Determine whether the variable is applicable to this location. If not, then refer to the evaluation strategy for what to do: Substitute another variable or value, drop the variable, or consider this computation to be invalid and stop. Substituted terms follow the same rules as do other expressions, so, if substituting a variable, bring it in and restart phase one with it. If the variable is applicable, proceed to phase two.

Phase Two: Test for Missing Data

Determine whether the variable has a known value for this location. If not, then refer to the evaluation strategy for what to do: Substitute in another expression, drop the variable, or consider this computation to be invalid and stop. Substituted terms follow the same rules as other expressions, so, if substituting a term involving variables, bring the substitute term in and restart it at phase one.

Figure 6.6 Procedure for handling invalid/missing information. (For each variable reference at each location the procedure asks "Is it applicable?" and then "Is data available?"; at either test the evaluation strategy may substitute a variable or a value or range, drop the term, or stop. Upon stop, evaluation of the entire formula is halted; upon drop, evaluation continues as though that term were not present in the original formula. The procedure then evaluates the variable and returns a value and validity status. See the text for the use of ranges.)

Phase Three: Evaluate

At this stage, values are presented for computation. Note that ranges of values as well as individual values may be substituted for invalid data. When logical data (truth values) are missing, the result may be, for the purposes of logic, true or false—essentially a range of values from false to true. Computing a result from this, at a logical level, would be equivalent to creating two scenarios: computing one result as though the value were true, computing a second result as though the value were false, and then comparing the results. If both outcomes are the same, then it didn't matter that the data was missing, and that result can be used. If the outcomes are different, then the missing data is significant, and the result is still unknown (pending further substitutions, of course). Simplifications can be made to the process so as to avoid the requirement that a computer actually processes all possible outcomes, but the logic remains the same.
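As an illustration, here is a hypothetical Python sketch of the three phases; the evaluation strategy, data structures, and the simplified treatment of substitution (which does not re-enter phase one) are assumptions, not the LC specification.

SUBSTITUTE, DROP, STOP = "substitute", "drop", "stop"

def evaluate_term(variable, location, data, applicable, strategy):
    # Phase one: test for applicability.
    if not applicable(variable, location):
        action, value = strategy("inapplicable", variable)
        if action == STOP:
            raise ValueError(f"formula invalid at {location}")
        return None if action == DROP else value
    # Phase two: test for missing data.
    if (variable, location) not in data:
        action, value = strategy("missing", variable)
        if action == STOP:
            raise ValueError(f"formula invalid at {location}")
        return None if action == DROP else value
    # Phase three: evaluate.
    return data[(variable, location)]

# Example strategy: drop inapplicable terms, substitute 0.0 for missing data.
# (A fuller version would send substituted variables back through phase one.)
strategy = lambda problem, var: (DROP, None) if problem == "inapplicable" else (SUBSTITUTE, 0.0)
data = {("sales", "store1"): 500.0}
applicable = lambda var, loc: not (var == "sales" and loc == "store3")

for loc in ("store1", "store2", "store3"):
    print(loc, evaluate_term("sales", loc, data, applicable, strategy))
# store1 500.0 / store2 0.0 / store3 None (dropped)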

Meaning and Comparability

In the previous section we explored how to deal with all types of logically sparse data within the context of a single, logical hypercube created by taking the Cartesian product of a set of dimensions and associating them in 1-to-1 correlation with a collection of variables. Can or should everything be represented in a single hypercube? What happens if we need to add dimensions to a model? Is there a limit to the number of dimensions in a hypercube? Are there any natural cube structurings? If there are, how do you decide whether a new dimension belongs in an existing cube or in some new cube? If it turns out that you've got multiple cubes worth of data, how do you make comparisons between them? What are the limits of comparison? Is it possible to have two totally incomparable hypercubes?

When a New Dimension Needs a New Cube

Consider Figure 6.7, which shows a grid of implicitly actual values. Let's imagine that we now need to track actuals against plans and variances. The best way to do this is by adding a scenario dimension and explicitly reclassifying prior data as actuals. Because the new dimension reflects new data and connects to the old data, it works well as an addition to the initial dimensional structure. The modified cube is shown in Figure 6.8. Now imagine we started tracking employees and the number of hours they spend on each type of task: customer help, stocking, and cash register. In addition to an employee dimension and a set of variables denoting hours worked per task, the data would also contain the store and time dimensions.

                      Jan   Feb   Mar   Apr
Store 1 Shoe Sales    200   250   150   270
Store 1 Shoe Costs    180   200   200   270
Store 1 Shirt Sales   320   350   400   300
Store 1 Shirt Costs   300   300   350   270

Figure 6.7 A grid of implicitly actual values.

The cube of Figure 6.7 extended with a scenario dimension (Actual, Plan, Variance) alongside time, store, and product. Note: original data, such as March.Store1.Shoe Sales, becomes Actual.March.Store1.Shoe Sales.
Figure 6.8 The modified cube.

Should these dimensions and associated data be added to the original cube in the same way that planned data with its associated scenario dimension was added, or should the data go into a separate cube? Does it make a difference? Let's try adding this new information to the original cube and see what happens. In the modified model, sales measures continue to be identified by their store, time, and product, in addition to which they are further identified by an employee and task

variable. On the face of it, this doesn't seem to make a lot of sense. According to the modified cube, sales are now a function of store, time, product, employee, and task. But measured sales values are not differentiated by employee or task per employee. It would seem that employee task data describes a fundamentally different situation from the product sales data (even though the two situations share some dimensions) and thus ought to be defined as a separate data cube. This contrasts with the planned and variance data, which, though generated by a separate situation from that of the actual sales data, fit cleanly into the original model. What is the difference between these two modifications? Why does one modification seem to fit and another seem not to fit?

VISUALIZING MEANINGLESS INTERSECTIONS

Figure 6.9 shows how for all but the allemployee and alltask members the regions defined by employee and task are empty. Look at the bottom right panel of the figure. You see two tables composed of one column each. We are starting with 22 data points. In the upper left part of the figure is the cube that results from combining the two data sets. The resulting cube is 11 by 11 by 2, giving it 242 intersections. There are only 22 data points in it: 11 product sales values and 11 employee hours values. Figure 6.9 shows where the numbers from the two original tables fit into the combined cube, and it shows all the meaningless intersections. The employee hours data fit along the employee by allproduct by hours region. The product sales data fit along the products by allemployee by sales region. The rest of the cube is meaningless.

Figure 6.9 Except for allemployees and alltasks, regions defined by employee and task are meaningless.

The plan and variance data have the same dimensional structure as the original data. The original (actual) data and the new plan data are both dimensioned by store, time, and product. In addition, you can reclassify the sales variables (units sold, dollar value sold) as actual variables. Thus, the original data could have been dimensioned by store, time, product, and scenario where scenario had only a single member, the actuals member. In this way, the addition of plan and variance can be seen as the addition of new members to the scenario dimension. In contrast, the employee and hours data do not share the same dimensional structure as the original data. The original data are not dimensioned by either employee or hours per task, and the new data are not dimensioned by product or sales variables.

A Single Domain Schema

A logical hypercube or single domain schema is composed of a set of types in LC form, which is to say that some types are acting as locators and some as contents, and all the contents or variables apply to all the same locators or dimensions. According to this definition, the act of adding a scenario dimension does not violate the definition of a logical hypercube because the new variables share the same identifier dimensions as the old variables. In contrast, the addition of an hours variable and an employee identifier dimension does violate the definition of a logical hypercube because the new variables do not share the same identifier dimensions as the old sales variables. In addition to the type-based definition of a single domain schema, there are two additional and complementary ways to gauge the degree to which your data represents multiple domains' worth of information. The first way, described in the following section, is more data-oriented; the second way, described in the section on multidomain schemas, is more source semantics-oriented.

Data-Oriented Domain Testing

If two data sets belong in the same logical hypercube, the density of their combination will equal the weighted average of their densities prior to being combined, where the weighting is a function of the number of data points per data set. For example, if a perfectly dense cube is defined in terms of 100 stores, 10 time periods, 100 products, and 5 sales measures, it will contain 500,000 data points. If a second cube is defined in terms of 100 stores, 10 time periods, 100 products, 5 measures, and 3 scenarios where the 3 scenarios are plan, actual, and variance, and data exist only for plan, it will also contain 500,000 data points, but be only 33 percent dense. The combination of the two cubes would contain 1,000,000 data points and be 67 percent dense. If two data sets do not belong in the same logical hypercube, the density of their combination will be less than that of either of the two original data sets. For example, if a perfectly dense cube is defined in terms of 100 stores, 10 time periods, 5 sales measures, and 100 products, it will contain 500,000 data points. If a second perfectly dense cube is defined in terms of 100 stores, 10 time periods, 200 employees, and 5 task measures, it will contain 1,000,000 data points. As separate cubes, they are each perfectly dense.

The density plummets, however, if the two cubes are combined. The combined cube would have 100 stores, 10 times, 100 products, 5 sales measures, 200 employees, and 5 task measures. This defines 500 million intersections, but we have only 1.5 million data points. By combining two perfectly dense cubes, we created one cube that was 99.7 percent empty!
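The arithmetic behind this test is easy to check with a short, tool-neutral sketch in Python; the cube sizes are the ones used above, and the code is purely illustrative.

# Data-oriented domain test: density = data points / intersections. If combining
# two data sets drives density below that of either source, they likely belong
# to different domains.

def density(points, *dimension_sizes):
    intersections = 1
    for size in dimension_sizes:
        intersections *= size
    return points / intersections

sales_points = 100 * 10 * 100 * 5    # stores x periods x products x sales measures
task_points = 100 * 10 * 200 * 5     # stores x periods x employees x task measures

print(density(sales_points, 100, 10, 100, 5))   # 1.0: the sales cube is fully dense
print(density(task_points, 100, 10, 200, 5))    # 1.0: the employee cube is fully dense
combined = density(sales_points + task_points, 100, 10, 100, 5, 200, 5)
print(f"{combined:.3%}")                         # about 0.3% dense once combined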

Multidomain Schemas

The fact that data doesn’t all belong in a single cube doesn’t mean you don’t want to bring it together for analytical purposes. Many analytical applications depend on data from more than one type of situation or domain, such as financial and marketing, or manufacturing and distribution, or demographic and sales. Consider all the analytical queries that depend on multidomain information, such as correlations of changes in production costs with changes in sales volumes, or correlations in time-lagged changes in free cash flow with changes in marketing campaigns. In the previous section, the query that broke the single hypercube’s back required data from sales and from employees. But what exactly is a situation or domain? We’ve seen a type-based method and a data-based method for identifying a single domain schema. But how do they come about? What is the source of a domain? By what method can you count the number of domains in a data set? How does one construct a multidomain model?

Data sets are like packaged food. By the time you reach for that 16-ounce squashy, plastic-wrapped package on a supermarket shelf, many of the original differences between the food now packaged (such as whether it is of animal or vegetable origin) have been eliminated or occluded. Likewise, data sets with their rows and columns can look pretty much the same until you reconstruct the events for which the data are a representation. By looking at, or at least imagining, the data-generating events, and specifically the measurement process whose output is the data, you can understand the source(s) or cause(s) of the domain(s) represented by the data.

Thus, in the previous section where we looked at sales and employee data, it is not hard to see that there had to be two distinct measurement processes in place to capture the information. To capture the sales information, some device needs to be placed at each cash register or point of sale that records each sales transaction and the quantity and price of each item purchased and the dollar amount for the transaction as a whole. That device needs to have an internal list of known products and their associated prices, a clock to record the transaction time, and a location tag that identifies it with a particular cash register. To capture the employee information, a collection of devices needs to be in place that recognizes each employee, that recognizes different tasks, and that can record the times when an employee changes tasks. Although it is likely that the employees would simply fill out a time sheet to collect this information, there are several ways this could be accomplished using cameras. For example, each employee could have a small camera on her or his person recording actual movements from which subsequent pattern analysis is used to determine when the employee was performing which tasks. Cameras could be mounted so that they scan the locations associated with different tasks such as stocking, check out, and so forth, with subsequent analysis used to figure out who has entered what task’s space at what time.

Regardless of whether one is relying on the employees’ own internal recordings or on external cameras, the recording processes required for measuring the time each employee spends on each type of task is distinct from the recording processes required for measuring what items were bought for how much money. The two recording processes are not interchangeable. The events they are tracking are not interchangeable, and the data sets that are produced by each of the recording processes, as defined by the types used to represent each data set, are not interchangeable. Thus, one would say that the sales data and the employee task data represent two different domains. Any initial representation of the data could be spoken of in terms of two distinct hypercubes or single domain schemas. Now that it has been established that the sales and employee data belong to different domains of information and are represented as two separate schemas, the question remains: can we meaningfully compare them and, if so, how? Consider now the two following schemas:

(Store. ⊗ Time. ⊗ Product.) ~ Sales
(Store. ⊗ Time. ⊗ Employee.(Store.this). ⊗ Task.) ~ Hours

Note the subexpression Employee.(Store.this). It says to take all the employees for each store as each store in the store dimension is scrolled. Otherwise, the cross product of stores and employees would be very sparse, as most if not all employees work in only one store. Clearly, any queries about how much of what product was sold when can be fully answered by looking only at the sales cube. Any queries about which employees spent how much time doing what tasks can be fully answered by looking only at the employee tasks cube. But what if you wanted to analyze whether certain employees or types of employees were better or worse at helping to sell certain products? Any analyses of this type would need a combination of information from each of the cubes.

As you will have already guessed, the basic method of comparing or analyzing information between different domains starts with joining the cubes along their common dimensions, as I describe a little later. Additional complexity factors, such as joining cubes when the join dimensions are not equivalent (or conformed, to use a popular datawarehousing term), joining cubes when all that appears similar are the measures, or joining cubes when there do not appear to be any similar dimensions, are addressed at the end of this chapter.

Look again at the two cubes we defined previously: the sales data cube and the employee tasks cube. The two cubes share two dimensions: stores and time. In addition, the sales cube has product, scenario, and measures dimensions; none of them is shared by the employee cube. The employee cube has an employee dimension and a task variables dimension that it does not share with the sales cube. Practically speaking,9 there are three main ways that OLAP products join dimensions: by creating single hypercubes, by creating virtual join cubes, and by referencing data between cubes. Each method presumes the join dimensions are either identical or have identical subsets (such as a geography dimension having a state level that might join with a state dimension).


I’ve already mentioned the single hypercube approach earlier in this chapter and in Chapter 4. Suffice it to say that given the huge amount of sparsity it creates, it is not an ideal approach for working with multidomain data. The second approach, which at the time of this writing is popular with products such as Microsoft’s OLAP product, begins with single domain cubes and then joins them to create a virtual hypercube whose dimensions are the union of all the dimensions and whose measures, as identified along a measures dimension in each cube, are concatenated in the virtual join cube. The third approach leaves each cube separate, but allows the analyst to reference data in any cube from any cube.

Imagine you are trying to analyze whether there is any relationship between the amount of time that employees are spending at various tasks and the amount of products sold. How would you go about this? To compare the values of measures originating in two separate cubes, we first need to define a common denominator or analytical framework. By analogy, if you want to compare 2/3 with 3/4, you need to define both fractions in terms of a common denominator such as twelfths, that is, 2/3 = 8/12 and 3/4 = 9/12. Once both fractions share the same common denominator, their numerators may be compared directly; that is, you can directly compare the 9 in 9/12 with the 8 in 8/12. Because 9 is greater than 8, 3/4 is greater than 2/3.

The common denominator between the two cubes is the union of their shared dimensions: store and time. Conversely, the union of the nonshared dimensions in each cube forms the numerators. This holds true for the comparison of single values, one-dimensional series, or any dimensional volume. Figure 6.10 shows a schema for this multidomain model. Think of each of the subcubes defined by the union of the nonshared dimensions as a collection or array of contents, and each intersection of the sub-cube, such as young men stocking hours, as a variable.

Figure 6.10 Model schema: the global dimensions (store and time) are shared by both cubes; the sales cube adds product, scenario, and product variables dimensions, while the employee task cube adds employee and task variables dimensions.



Figure 6.11 View schema: within a grid of the global dimensions (time by store), each cell holds a product subcube (product by variables by scenario) and an employee subcube (employee by task).

Figure 6.11 shows a view schema and Figure 6.12 displays a sample view. For example, to compare the sales of all products sold at the Cambridge store for the month of March with the number of hours that young men spent working the aisles, you define and compute a ratio like the following. Note the references to the specific cubes in the square brackets.

[Sales cube]: ('sales', Time.month.march, Product.all) ÷
[Employee hours cube]: ('Hours', Employees.young_men, Task.working_the_aisles)

Then you look to the cell for the Cambridge store in the month of March for the value. The result is a single data point. Note the bracketed cube names in the expression. They identify the cube from which each of the variables comes and show what the comparison formula would look like in a tool that kept each cube’s data separate. Alternatively, if you had created a single join cube called, let’s say, “Sales & Employees” the same formula would have looked as follows:

[Sales & Employees cube]: ("sales", Time.month.march, Employee.all, Task.all, Product.all) ÷
("Hours", Employees.young_men, Task.working_the_aisles, Time.month.march, Product.all)
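As a rough sketch of what a tool that references data between cubes is doing, the following Python fragment keeps the two cubes as separate coordinate-keyed mappings and divides one cell by the other; the member names and figures are invented for illustration.

# Cross-cube reference: each cube keeps its own coordinates; the shared
# dimensions (store, month) act as the common denominator for the comparison.

sales_cube = {  # (store, month, product) -> sales dollars
    ("Cambridge", "Mar", "all_products"): 1200.0,
}
hours_cube = {  # (store, month, employee group, task) -> hours
    ("Cambridge", "Mar", "young_men", "working_the_aisles"): 80.0,
}

def sales_per_aisle_hour(store, month):
    sales = sales_cube[(store, month, "all_products")]
    hours = hours_cube[(store, month, "young_men", "working_the_aisles")]
    return sales / hours

print(sales_per_aisle_hour("Cambridge", "Mar"))  # 15.0 dollars of sales per aisle hour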

Figure 6.12 Sample view: for each store (Ridgewood, Newbury) and month (Jan, Feb), a grid of product (rose water soap, olive oil soap, hypoallergenic lotion) by sales and costs for actuals, plans, and variance, alongside a grid of employee hours by gender and task (stocking, cash register, aisles).

Note that in this expression, I repeated the qualifiers Time.month.march and Product.all. I did this purely for expository purposes. In most tools, context is set on a variable-by-variable basis from left to right. Thus, once it was stated for the sales variable that it applied to the month of March for all products, that context wouldn’t normally need to be repeated when parsing the Hours variable. To analyze the relationship between changes in product sales and changes in young men’s hours working the aisles, and doing it within a single join cube, you could define and compute a month-level series like the following (note the joining of the cubes):

[Sales & Employees cube]: ("sales", Time.month., Employee.all, Task.all, Product.all) ÷
("Hours", Employees.young_men, Task.working_the_aisles, Time.month., Product.all)

You could further compare young men’s task-specific hours with the sales of electronic products by creating the following set of queries, superposing the three result sets as three separate line graphs as shown in Figure 6.13.

[Sales & Employees cube]:
Line A ("sales", Time.month., Employee.all, Task.all, Product.electronics),


Line B ("Hours", Employees.young_men, Task.working_the_aisles, Time.month., Product.all),

Line C ("sales", Time.month., Employee.all, Task.all, Product.electronics) ÷ ("Hours", Employees.young_men, Task.working_the_aisles, Time.month., Product.all)

Figure 6.13 Comparison of data from three different cubes: for the Cambridge store, monthly line graphs of (A) all-product sales in dollars, (B) young men’s aisle hours, and (C) product sales divided by young men’s aisle hours.

What this expression is testing for is whether there is any visible correlation between the company’s sales of electronic products and the number of hours young men spend working the aisles. The implicit theory is that guys are more familiar with electronic products than women are and that their presence in the aisles, where they can provide customer support, helps sales of those products. When the company tracks employee hours, it does so relative to all products; therefore, the best that can be done here is to note a temporal and spatial correlation between the presence of guys on the floor and increases in sales of certain products. In this example, the data suggest that young men make good aisle workers for electronic products.

Not Applicable versus Nonvarying

Sometimes you will encounter variables that appear to be dimensioned differently, but which upon closer examination are revealed to be of the same dimensionality. It is common practice in the OLAP world to treat a variable that does not appear to vary in a dimension as if it were not applicable to that dimension. However, if a variable is constant along a dimension rather than not applicable to it, there are operations an analyst can perform on the former that can’t be performed on the latter.

Thus, it is necessary, when trying to integrate data from multiple cubes, to test whether a variable is constant along or not applicable to the nonshared dimensions from the other cube(s). For example, consider the two schemas that follow:

(Store. ⊗ Time. ⊗ Product.) ~ Sales

(Time. ⊗ Product.) ~ Price

Notice how price is not dimensioned by store. This is a common representation for price. Now if we took this schema to mean that price is not applicable to stores, then in comparing prices and sales and following the methods outlined previously, we would define the common denominator between the two schemas as Time and Product. We would also be able to compare store sales with prices per time and product. However, unlike the multidomain example, price isn’t inapplicable to store; rather, price doesn’t vary by store. Price is applicable but contingently constant across stores. Therefore, the true schema for price includes Store as a locator dimension and allows for comparisons across stores as well as products and time.
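A small sketch makes the distinction tangible. In the fragment below (plain Python, with invented figures), price is stored by time and product only, yet because it is merely constant across stores rather than inapplicable to them, it can be broadcast to every store coordinate and combined with store-level sales.

# "Constant along a dimension" versus "not applicable": price carries no store
# coordinate in storage, but it still applies at every store and can be broadcast.

price = {("Jan", "rose_water_soap"): 5.00}        # (time, product) -> unit price
sales_units = {                                   # (store, time, product) -> units sold
    ("Ridgewood", "Jan", "rose_water_soap"): 57,
    ("Newbury", "Jan", "rose_water_soap"): 48,
}

revenue = {
    (store, time, product): units * price[(time, product)]
    for (store, time, product), units in sales_units.items()
}
print(revenue)   # store-level revenue, a comparison that would be meaningless if
                 # price were genuinely not applicable to the store dimension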

Using Multidomain Schemas to Integrate Irregular Spreadsheets

Multidomain schemas are especially useful when trying to combine any kind of irregular spreadsheet or record-based data from across multiple departments. Consider, for example, spreadsheet data belonging to two departments, such as sales and finance, where each department maintains its own data. Figure 6.14 shows a multidomain view of an integrated sales and financial reporting model for a retail corporation composed of two types of stores: low-cost consumer goods (Five ‘n’ Dime) and furniture (Furniture Land). Store and month compose the base level of the shared dimensions, with stores aggregating into chains of stores and months aggregating into quarters. The sales and financial reporting information coexists within each store-month cell. The financial information consists of a two-dimensional grid of financial indicators by planned/actual/variance. The sales information consists of a three-dimensional grid of product type by sales/cost information by planned/actual/variance. Computationally, the sales and cost figures from the sales spreadsheets are ready to be brought into the financial indicators spreadsheet where they drive the calculation of the financial indicators. Visually, the two hypercubes may be integrated as shown in Figure 6.14. Note how the two types of stores share the same financial subcube while the members of their product subcubes are different.

Multidomain Schemas for Organizing Irregular Records

Consider the irregularity of detailed operational information. Figure 6.15 shows a multidomain view of individual transactions and aggregate data records for the same stores discussed in the previous example. A furniture store may sell 5 units of furniture in 5 transactions one day and 60 units in 15 transactions the next.

Figure 6.14 Multicube framework for organizing irregular spreadsheets. Planning and accounting structures typically lend themselves to multidimensional structuring, but not all dimensions apply to each structure. Typical spreadsheet planning and reporting views may have some dimensions in common and some different. A common overarching location frame (chain, store, quarter, month) organizes the spreadsheet views, while the spreadsheet views themselves contain only the number of dimensions and coordinate positions needed: a financial grid of indicators (revenue, COGS, gross profit, NPAT, and so on) by planned/actual/variance, and a merchandise grid of products by cost, sale price, profit, returns, and display square footage by planned/actual/variance. The ‘Actual’ merchandise values are aggregated from raw transaction information per coordinate. Here, the spreadsheets contain only measurement information, but organized in coordinate fashion for easy navigation; store-level and corporate-level planning and reporting are integrated into a single overall structure.

Figure 6.15 Multicube framework for organizing irregular records. A five-and-dime chain and a furniture store chain will have some data types in common, and others different to suit the business rules (e.g., no customer charge accounts at the five-and-dimes). The figure shows, per chain and month, aggregate figures (total sales, number of sales, year-to-date sales, head count) and, for the furniture chain only, customer account balances; per store and day, it shows line-item transaction detail (item, taxable flag, amount), per-transaction summaries (clerk, register or payment terms, totals), clerk hours worked, and daily aggregates.

Different business rules and thus fields apply to the two chains: The furniture stores allow credit accounts for customers, whereas the five ‘n’ dimes are strictly cash-and-carry. Several types of records are being organized. At the store-by-month level there is employee payroll information, line-item sales information, and overall sales information. At the chain-by-month level, the furniture chain tracks account balances. All stores track aggregate sales data. The multidomain model provides a general method for relating different sets of measurement complexes with each other.

Joining Cubes with Nonconformant Dimensions

So far, every time we have joined cubes, regardless of the method, the dimensions through which the cubes were to be joined were identical. What happens when the join dimensions are not identical? How do you compare product sales across time when the products offered are not the same? How do you compare employee productivity across time when the employees are not the same? How do you compare the potential return on investment across dramatically different business opportunities? Regardless of the dimensionality of the source cubes whose information you wish to compare, you still need to find a common denominator as described previously. The tricky part is creating that commonality.

The most straightforward and most common cases are those where some instances of some dimension, most typically product and employee, are known to change over time and the organization, most likely a datawarehousing group, has responded by creating a time- or space-stamped dimension. Figure 6.16 represents a typical fragment of a time-stamped product dimension.

Product    Event    Date
Bats       Start    1/1/95
Gloves     Start    11/1/96
Hats       Start    1/11/97
Bats       Stop     1/1/98
Shoes      Start    2/2/98

Figure 6.16 A typical fragment of a time-stamped product dimension.

The product dimension is already in a type structure of form:

(Product.) ~ Existence_start_time, Still_existent

That is to say, for every product that was ever carried, that product has a date associated with its coming into existence. (If this product type structure were fully normalized, the still_existent content would be applied to a location structure consisting of Product and Time.) In addition, as of the current time of the type structure, some of the products are still being carried and some have been discontinued. For those that have been discontinued, whose value of still_existent is no, a separate type structure would record when the product was discontinued. (If you put the stop time in the same structure as the start time, you could create semantic ambiguity between those still existent products whose stop time would be blank but would mean not applicable and those products that might have been discontinued but whose stop times are missing.)

Once the life span of an instance is time-stamped, you still have to figure out the basis by which you want to make comparisons. Unless you restrict yourself to comparing only those products that existed during the entire time period, you need to select some common denominator through which comparisons will be made. Although it may be that you translate the present into some form that resembles the past or the past into some form that resembles the present, you can also transform both into some framework that is appropriate for your analysis. Issues surrounding the maintenance of time-stamped dimensions and intertemporal comparisons based on past or present views have been well explored by Ralph Kimball.10 Even though the specific instances of the product or employee dimensions vary over time, when the comings and goings of the instances are tracked within a single type structure and all data sets being compared share the same multiversion type structure, you may think of the type structure as conformed in the sense that the capability to compare over time has been prepared for in advance.

What if there were no such forethought? What if what needs to be compared are two separate product lines from two different companies that were, perhaps, recently acquired by a single company? In these cases, the initial data sets may need to be coerced into a common framework. That coercion is likely to require some thought. Consider the case of two sales cubes where both are dimensioned by product and time. On the surface they may seem to be immediately comparable. However, upon closer inspection, you may notice that, although both data sets have a time dimension with the same calendar and the same fiscal years, the time dimensions in the two cubes reflect different holidays and a different number of operating hours per week. The product dimensions are entirely different.

Before doing anything, you need to be clear about what you’re trying to do. Assume the two data sets reflect product sales from two regional chains that operate in different regions but carry similar merchandise. Let’s say you’re trying to compare sales of products across the two chains so that you can decide which product types to grow and which to shrink or eliminate based on the relative performances across the two chains. Given this kind of analysis, you can probably ignore differences based on days and concentrate on time at the month level or higher (something you couldn’t do if you were looking to compare absolute sales amounts or sales spikes based on specific hours or days).

Q1                                                          Product Price Groups
                                                        Cheap    Medium    Expensive
Sales_chain1 ÷ (sales_chain1, Product.pricegroup.all)     .2       .3         .5
Sales_chain2 ÷ (sales_chain2, Product.pricegroup.all)     .6       .3         .1

Figure 6.17 Comparison of Ratios of Low- to High-Priced Products across Two Chains of Stores

However, you do need to find a common denominator for products. The best way to do this is by looking for common attributes in your product dimension tables. The attributes you look for should reflect your intended analysis. So if you suspected that one of the newly acquired chains had a more naturally upscale clientele while the other was more successful with bargain hunters, you would want to use the price attribute from each of the product dimensions to create a common denominator in the form of a dimension based on product price groupings. Thus, you would wind up with an integrated schema as shown next, where the Sales_chain1 variable represents data from one chain, the Sales_chain2 content represents data from the second chain, and the product dimension has been segmented by the price-group attribute:

(Time.Calendar_period. ⊗ Product.Price_group.) ~ Sales_chain1, Sales_chain2

Specifically, you might want to compare the ratio of sales for each product price group as a percentage of total sales for the chain to determine if one of the chains had a significantly higher percentage of its sales coming from low- or high-priced products. Figure 6.17 illustrates what the results of such a calculation might look like for one quarter. The two relevant variables are the ratio of each product price group’s sales to all price groups’ sales for each of the two chains, as shown here.

Sales_chain1 ÷ (sales_chain1, Product.pricegroup.all)
Sales_chain2 ÷ (sales_chain2, Product.pricegroup.all)
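A compact sketch of this calculation, in plain Python with made-up sales figures chosen so that the resulting shares match Figure 6.17, looks like this:

# Common-denominator comparison by price group: each group's sales as a share of
# the chain's all-group total.

chain_sales_by_group = {
    "chain1": {"cheap": 200.0, "medium": 300.0, "expensive": 500.0},
    "chain2": {"cheap": 600.0, "medium": 300.0, "expensive": 100.0},
}

for chain, by_group in chain_sales_by_group.items():
    total = sum(by_group.values())                          # Product.pricegroup.all
    shares = {group: amount / total for group, amount in by_group.items()}
    print(chain, shares)
# chain1 {'cheap': 0.2, 'medium': 0.3, 'expensive': 0.5}
# chain2 {'cheap': 0.6, 'medium': 0.3, 'expensive': 0.1}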

Summary

In this chapter you saw why a so-called cube or hypercube represents a contiguous collection of facts or propositions, where the contiguity is defined in terms of the types used as locators in the cube. Every cell in a well-defined cube or populated schema or model can be true or false. You learned the steps you need to take to ensure your schemas are well defined.


You learned how to distinguish between different types of invalid data and how to correctly process invalid data so as to avoid creating incorrect derivations. You were also shown how to identify and treat separately the naturally distinct domains in your data or models, and you learned how to bring them together when necessary into a multidomain model.


CHAPTER 7

Multidimensional Formulas

In the same way that enzymes are a critical component of all the metabolic processes that occur in the human body, so too are formulas a critical component of all multidimensional analysis. All derived data values are defined in terms of formulas. Formulas are used for aggregating, allocating, comparing, analyzing, explaining, and inferring. For example, net sales may be defined by a formula such as gross sales minus returns, and sales of business products may be defined as the sum of sales of computer products, fax machines, and photocopiers. Next year’s projected sales may be defined as the average sales growth multiplied by the current sales. Not only are all derived data created through formulas, but, as is frequently the case, multiple interconnected formulas may be necessary to do the job.

This chapter is divided into two sections. The first section explains how to think about formulas within a multidimensional context. The main formula-affecting complexities that need to be worked through are hierarchies, multiple overlapping formulas, application ranges, and multiple partially overlapping cubes. The second section provides you with a set of reusable formula building blocks that you can use to construct your own multidimensional models.


Formulas in a Multidimensional Context

Formula Hierarchies or Dependency Trees

As you saw in Chapter 5, most dimensions have a hierarchical structure. This is true even with a simple example. Consider the schema that follows and the figures that represent the hierarchies of the types used as locators in the schema:

(Store. ⊗ Time. ⊗ Product. ⊗ Account.) ~ Dollar_Amount

Figures 7.1 through 7.4 show all the members for each of the four locator dimensions. Stores and time have single hierarchies. Products, with two separate hierarchies, have the most complex structure. What exactly do these hierarchies represent? In Chapter 5, I spoke about the way hierarchical dimensions facilitated calculating (especially aggregating) and navigating or referencing. For example, all the store-level sales data may aggregate up the store dimension.

Figure 7.1 The dimensional structure for the time dimension: Spring comprises the months January, February, and March.

A Financial Accounts Dimension

1. Sales

2. Cost of Goods Sold

3. Gross Margin = 1 - 2

4. Sales and Marketing

5. Research and Development

6. General and Administrative

7. Operating Expenses = 4 + 5 + 6

8. Earnings Before Interest and Taxes (EBIT) = 3 - 7

Figure 7.2 The dimensional structure for the account dimension.

Figure 7.3 The dimensional structure for the store dimension: All Stores divides into Midwest and East Coast regions containing cities such as Milwaukee, St. Paul, Chicago (with a New Chicago Store), Columbus, Cambridge, Amherst, and Boston, with individual stores (for example, Columbus’s stores S1, S2, and S3) at the leaf level.

You can refer to all the stores in a particular city (assuming that stores connect to cities) as the children or first-level members below that city. In other words, it seems that the hierarchical structure of a dimension determines both aggregations and referencing. Although this is frequently true, it is not always true. The way data is calculated within a hypercube need not share the same paths or structure as the way data is referenced. This is a very important point and one worth exploring. The contents or data associated with the city Columbus in the store dimension of Figure 7.3, for example, is the sum of the data associated with the three stores in Columbus:

(Contents, Store.Columbus) = Sum(Contents, Store.(s1, s2, s3))

This is an example in which the calculation structure for a member is the same as the referential structure. Now look at the “New Chicago Store” member of the store dimension in Figure 7.3. It represents a store that is being built but has not yet opened. Let’s say you want to project what the revenues for this store might be. Figure 7.5 shows the formula for calculating the value of projected sales in the new Chicago store as the average of the known revenues for other nearby cities (perhaps modified by an appropriate weighting function). Notice that the dependency relationships for this formula also look like a hierarchy (otherwise known as a dependency tree). Thus, the formula hierarchy for a dimension member need not be the same as its referential hierarchy (that is, the formula for a member can use inputs that do not come from that member’s children in the hierarchy).
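The contrast can be sketched in a few lines of Python (the store members and figures are invented): Columbus is derived from its own children, while the New Chicago Store is derived from members that are not its children at all.

# Referential aggregation versus a formula that departs from the hierarchy.

columbus_store_sales = {"s1": 120.0, "s2": 95.0, "s3": 140.0}
nearby_city_projections = {"Milwaukee": 900.0, "St. Paul": 820.0, "Columbus": 1010.0}

# Calculation path follows the referential hierarchy: a parent sums its children.
columbus_sales = sum(columbus_store_sales.values())                      # 355.0

# Calculation path departs from it: the unopened store has no children, so its
# projection averages other members of the store dimension (unweighted here).
new_chicago_projection = sum(nearby_city_projections.values()) / len(nearby_city_projections)

print(columbus_sales, new_chicago_projection)                            # 355.0 910.0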

Figure 7.4 The dimensional structure for the product dimension: All Products has two hierarchies, one grouping the products wine, juice, coffee, caviar, and carrots into Liquids and Solids, and the other grouping them into High Margin and Low Margin.

Projected Revenue in Chicago = Average(Proj. Rev. in City 1, Proj. Rev. in City 2, Proj. Rev. in City 3)
Projected Revenue in City 3 = Average(Proj. Rev. in Store 1, Proj. Rev. in Store 2, Proj. Rev. in Store 3)

Figure 7.5 Dependency relationships form a hierarchy.

Look closely at Figures 7.1 to 7.5. Can you spot another instance where the formula hierarchy for a member is not the same as the referential hierarchy? How about gross margin, operating expenses, and EBIT in the account dimension in Figure 7.2? Each of those account variables depends on other account variables in the accounts dimension, yet all the account variables occupy a single dimension level.

When you define the parent/child relationships or hierarchies in the dimensions of whatever schema you are working with for an application, you are initially defining the referential structure of the dimensions. For all but your variables or variables dimension, that referential structure will frequently, though not always, be the same as the formulaic structure. In contrast, the formulas in the variables dimension may deviate substantially from their referential structure.

Finally, the dependency relationships between the members of a dimensional hierarchy may vary as a function of the other dimension members with whom they are intersecting. Notwithstanding the fact that many vendors and journalists loosely refer to input and derived members, the distinction between measured and derived applies to cells or data and not members. An extreme example of this is shown in Figure 7.6. For the variable “Direct Costs”, the derivation path goes from the leaves to the root of the product hierarchy. On the other hand, the variable “allocated indirect costs” takes the opposite derivation path, going from the root of the hierarchy toward the leaves because costs are allocated from the top down. Given the variable nature of dependency relationships within a dimension, the most precise way to represent them is relative to actual data points. Figure 7.7 shows the relationship between input and derived values or data points.

Now that you’ve seen how the term hierarchy has both a referential and a dependency meaning and that formulas define dependency hierarchies, let us explore how formulas operate in a multidimensional environment.

Cell- and Axis-Based Formulas

In a traditional spreadsheet environment, the formulas in terms of which derived cells are defined apply directly to the cell. In Figure 7.8, the value for total sales appearing in cell F16 is defined in terms of a formula for the contents of cell F16; specifically,

F16 = Sum(F11 to F15).

Figure 7.6 Derivation paths may flow in any direction: direct costs aggregate from the individual products up to All Products, while allocated indirect costs flow from All Products down toward the individual products.

Figure 7.7 Input and derived values: a grid of sales and costs by all-company, city, and store locations in which each data point is marked as either input (i) or derived (d).

Sales of Ski Equipment by Region

                 West (C)   North (D)   East (E)   Total (F)
   11 skis          450        750         800        2000
   12 bindings      150        300         250         700
   13 boots         300        450         500        1250
   14 poles          50        125         100         275
   15 goggles        75        200         150         425
   16 total        1025       1825        1800        4650

Every aggregated cell carries its own formula; for example, the total in cell D16 = sum(D11:D15) and the total in cell F16 = sum(F11:F15).

Figure 7.8 Cell-based formulas.

It is no secret that cell formulas can get numerous and messy as worksheets become complex. Given that OLAP applications can easily produce millions or billions of derived numbers (whether or not materialized), only an extreme individual would try to create and manage that many cell-based formulas. In a multidimensional environment, formulas are usually defined in terms of the dimensions or axes. Axis formulas may be looked at as category- or set-based formulas. (Statisticians sometimes call these margin formulas.) Referring again to Figure 7.8, instead of creating one formula for the cell that defines the value of total sales for all products (that is, the formula for cell F16), you would define a formula for total sales in terms of each type of sale (along the variables dimension or axis) and a formula for all products in terms of individual products (along the product dimension or axis).

Figure 7.9 OLAP tools use margin or axis-based formulas: on the same ski equipment grid, the region total is defined once as sum(West..East) along the region axis and the ski equipment total is defined once as sum(skis..goggles) along the product axis.

The formula for total sales might be equal to the sum of direct plus indirect sales. The formula for all products might be equal to the sum of skis through goggles. The formula for total sales of all products is defined by the combination of the two separate formulas. An example of axis-based formulas is shown in Figure 7.9.

In a three-dimensional cube consisting of 10 members per dimension, a single formula applied to a single member, such as a formula for the allproduct member of the Product dimension, applies across all the cells in the cube that have an allproduct component. Specifically, this would amount to 100 cells. The number 100 is reached by taking the product of the number of members in each of the other (nonproduct) dimensions, as shown in Figure 7.10.
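The fan-out of a single axis formula is easy to see in a short sketch (plain Python; the member names and random data are illustrative only): one rule attached to an allproduct member populates one derived cell for every combination of the other two dimensions.

# One member formula, many cells: allproduct = sum of products, evaluated once
# per (time, variable) combination in a 10 x 10 x 10 cube.

import itertools
import random

times = [f"t{i}" for i in range(10)]
variables = [f"v{i}" for i in range(10)]
products = [f"p{i}" for i in range(10)]

cube = {(t, v, p): random.random() for t, v, p in itertools.product(times, variables, products)}

for t, v in itertools.product(times, variables):
    cube[(t, v, "allproduct")] = sum(cube[(t, v, p)] for p in products)

derived_cells = [key for key in cube if key[2] == "allproduct"]
print(len(derived_cells))   # 100 derived cells from a single member formula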

Precedences

The formula for any particular cell is a function of the combination of all the dimension-specific formulas that apply to the cell. In other words, the value in a typical derived cell is the result of several interacting formulas. In many cases, as with Figure 7.10, the order in which dimension-specific formulas are applied to a cell does not affect the result of the calculation because the calculation was defined purely in terms of sums. However, in other cases, where the individual formulas are not commutative, the order or precedence of calculation does affect the value of the cell. The remainder of this subsection explores the issue of calculation precedence.

Figure 7.10 Formulas for a given member apply to all members of all other dimensions: in a cube with 10 time members, 10 variable members, and 10 product members, a single formula attached to one member applies across the 100 cells defined by the other two dimensions.

Figure 7.11 Pure operations may be computed in any order: panels of (A) pure summation, (B) pure differences, and (C) pure ratios each give the same result whether the column dimension or the row dimension is computed first.

The following series of examples in Figure 7.11 shows how sums, differences, and ratios may be taken in any order. The left panel in each case performs the calculation with the column dimension first. The right panel performs the same calculation with the row dimension first. Note that, in all cases, the same result is achieved.

When a cell is defined in terms of combinations of different operators, the result of the calculation may depend on the order of the operations. In general, you can combine addition with subtraction in any order, and you can combine multiplication and division in any order, but you cannot combine addition or subtraction with division or multiplication without paying attention to the order. Consider the following examples that illustrate these principles.

Figure 7.12 shows actual sales, planned sales, and the difference between actual and planned for two products. Notice that whether we obtain the variance by first summing the actual and planned product sales and then taking the difference of the sums, or whether we obtain the variance by taking the difference of actual and planned sales for each product and then summing the differences, we get the same variance values. Figure 7.13 shows the actual and planned total purchase price, tax rate and tax paid, and the ratio between the actual and planned values for each of these. Notice how, again, it doesn’t make any difference whether we multiply the tax rate by the product price and then divide actual by plan, or we take the ratios of the actual and planned values first and then multiply them.

Figure 7.14 shows actual sales, planned sales, and variance in sales, except that this time variance is defined as actual divided by planned. Note how it does make a difference for the derived cell “variance in total sales” whether it was calculated as the ratio between total actual and total planned sales or whether it was calculated as the sum of the ratios of individual product variances.

Figure 7.12 Mixing sums and differences: whether the actual and planned sales for skis and boots are summed first and then differenced (170 − 160) or differenced per product first (20 and −10) and then summed, the total variance is 10.

Figure 7.13 Mixing products and ratios: the actual/plan ratio for tax paid is 1.5 whether it is computed from the tax amounts (45 ÷ 30) or as the product of the price and tax rate ratios (1.25 × 1.2).

Figure 7.14 Mixing sums and ratios: the actual/plan ratio for both products is 1.06 when computed from the totals (170 ÷ 160) but 2.03 when the individual product ratios (1.20 and .83) are summed.
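The Figure 7.14 numbers can be checked in a couple of lines of Python (illustrative only), which makes the noncommutativity plain:

# Ratio of sums versus sum of ratios for the Figure 7.14 data.

actual = {"skis": 120, "boots": 50}
planned = {"skis": 100, "boots": 60}

ratio_of_sums = sum(actual.values()) / sum(planned.values())        # 170 / 160
sum_of_ratios = sum(actual[p] / planned[p] for p in actual)         # 1.20 + 0.83

print(round(ratio_of_sums, 2), round(sum_of_ratios, 2))             # 1.06 2.03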


Figure 7.15 Mixing products and differences: the variance in tax paid is 15 when computed as the difference of the actual and planned tax amounts (45 − 30) but 1.5 when computed as the product of the price and tax rate differences (30 × .05).

Finally, Figure 7.15 shows actual tax, planned tax, and variance in tax, except that this time variance is defined as the difference between actual and planned. Notice how it makes a difference whether the derived cell “variance in tax paid” was calculated as the difference of products or as the product of differences.

Measurements that derive from other quantities are frequently sensitive to the order in which they are calculated. Ratios (including flows, or measures of one type per unit of another type), aggregate percentages, normalized indicators, and multiplicatively combined variables are all sensitive to calculation order. Ratios, aggregate percentages, and normalized indicators are usually calculated on summarized data and are averaged instead of summed. For example, if you had a data set describing the market penetration of your product in a variety of countries, you would not sum the percentages to a regional level to determine market penetration on a regional basis. You would either sum the data that went into computing the market penetration to the regional level and calculate market penetration at that level, or, if perhaps you possessed only market penetration figures, you would define the market penetration per region as the average of the penetrations per country. If you went with averages, they would most likely be weighted, where the weighting was a function of the market size in each country. But this tendency is not a rule. For example, there may be instances where a ratio, such as profitability, is computed before being averaged. This produces an unweighted average profitability, which, in some cases, may be what you need.

Multiplicatively related variables need to be combined before they are aggregated. For example, if you want to calculate the lost productivity across all workers in your organization as a result of local area network (LAN) downtimes, assuming you track downtime and system users, you would want to first multiply the length of downtime per system failure by the number of persons affected and then sum that number for lost productivity per failure across all system failures. You would get the wrong answer by first summing the number of downtime hours and the number of system users and then multiplying those two sums together. Although tendencies exist, since almost any ordering of operations will produce a mathematically meaningful result, you really need to know what you are trying to calculate before assigning precedences to formulas.
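Both rules of thumb can be sketched in a few lines of Python; the penetration rates, market sizes, and downtime figures below are invented for illustration.

# Rule 1: ratios such as market penetration are averaged (usually weighted by
# market size), not summed, when rolled up to a region.
countries = {
    "country_a": (0.10, 1_000_000),   # (penetration, market size)
    "country_b": (0.30, 200_000),
}
weighted_penetration = (
    sum(pen * size for pen, size in countries.values())
    / sum(size for _, size in countries.values())
)
print(round(weighted_penetration, 3))   # 0.133, not the 0.40 a naive sum would give

# Rule 2: multiplicatively related measures are combined before being aggregated.
failures = [(2.0, 50), (0.5, 400)]      # (hours of downtime, persons affected)
lost_person_hours = sum(hours * people for hours, people in failures)      # 300.0
wrong_answer = sum(h for h, _ in failures) * sum(p for _, p in failures)   # 1125.0
print(lost_person_hours, wrong_answer)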

Default Precedence Orders

Recall that formulas are typically associated with individual members of a dimension and then combined, by the tool, for the calculation of a cell. When using a multidimensional tool, if the value for a derived cell depends on the application order of the formulas that apply to the cell, you need some method for deciding which formulas to apply in which order.


ASSIGNING DEFAULT PRECEDENCE

Several complications are involved in algorithmically assigning the correct precedence for a series of dimension formulas that overlap at a cell. Before you can determine which dimension formula should be evaluated in which order, you need to assign some sort of precedence ordering to each dimension formula. In other words, there needs to be a table of precedences stating that addition comes before division (this being the type of rule that would enable a multidimensional tool to guess that an allproduct margin is to be calculated by taking the ratio of allproduct sales and allproduct costs, rather than the sum of individual product margins). However, dimension formulas may themselves be complexes of formulaic clauses. For example, the formula:

Variance = (Actual - Plan) / Plan

contains both a subtraction operator and a division operator. Given that subtraction and division have a different precedence, how do you assign a precedence to the formula as a whole? There is no right or wrong answer for the general case that vendors can use to ensure that the right thing happens in all cases. But somewhere (in the tool or in your head) there need to be rules for assigning precedence for dimension formulas that contain multiple formulaic clauses, each of which has a different precedence level. If you create complex formulas in your dimensions and if you are using a tool that does not force you to specify the order of calculation of the formulas, you should be aware of how the tool automatically assigns precedence levels to overlapping formulas. You should verify that aggregate values represent what you are trying to calculate. What follows are some general classifications of formulas into precedence levels. The higher the level, the later the formula is calculated. These are listed in Table 7.1. Following the table, note the listing of several operators that cannot be assigned a precedence level without knowing what is being calculated. For example, the operator “max” has no natural precedence level.

Table 7.1 General Rules of Thumb for Assigning Precedence Levels

OPERATOR                                                      PRECEDENCE LEVEL

Input values for which there are no formulas                          1
Count of nonmissing cells in a range                                  2
Multiplication, unary minus                                           3
Addition, subtraction, sum                                            4
Average                                                               5
Division, exponentiation                                              6
Logical and relational operators such as equal to,                   7
  not equal to; less than, greater than; and, or



Imagine that you keep track every month of which week during the month had the greatest sales. Each month would have a sales value for the maximum weekly sales. Suppose that you store this information in a simple cube composed of a variable and time dimension where time consists of weeks, months, quarters, and years. Furthermore, you define time so that all its members sum up to the next highest level. At the year level, the maximum sales could tell you two equally useful things, depending on its precedence level relative to the sum operator associated with the time dimension. If the summation operator of the time dimension were calculated first, it would tell you the greatest weekly sales value during the past year. But if the maximum operator were calculated first, it would tell you the sum of the 12 weekly maximums.
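The sidebar’s point can be seen directly in a short sketch (plain Python; the weekly sales are randomly generated for illustration): the two orderings of max and sum produce two different, differently useful year-level numbers.

# Max versus sum precedence at the year level.

import random

random.seed(0)
weekly_sales = {(month, week): random.randint(50, 150)
                for month in range(1, 13) for week in range(1, 5)}

# Maximum applied over all of the year's weeks: the single best week of the year.
best_week_of_year = max(weekly_sales.values())

# Monthly maxima computed first, then summed up the time dimension.
sum_of_monthly_maxima = sum(
    max(sales for (m, _), sales in weekly_sales.items() if m == month)
    for month in range(1, 13)
)

print(best_week_of_year, sum_of_monthly_maxima)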

For example, precedence may be set on a dimension-wide basis (that is, time before products before variables) and then overridden on a member-by-member basis (as you will see in the next section). There is no general rule for deciding what precedence order to apply in every case. In practice, of course, as with the previous example involving the sum of a set of ratios or percentages, rules of thumb can help you decide the order in which to compute aggregates, but once again, rules of thumb are no substitute for knowing what you are trying to calculate.

The practical methods offered by vendors vary widely. Some tools are fairly intelligent and do a good job of guessing the order in which to apply formulas. Other tools are not as intelligent, but offer a simple declarative method for users to specify the type of number any derived cell should contain. For example, by declaring that the cell for “all-product sales variance” is a ratio, the tool would know to calculate the variance ratio last. Still other tools, though flexible, are unintelligent and offer no more than a procedural approach for defining formula precedence.

Operators whose precedence level needs to be assigned on a case-by-case basis include the following:

■■ Max
■■ Min
■■ First
■■ Last
■■ Coefficient
■■ Variance
■■ Least squares intercept and slope
■■ Standard deviation

In practice, most multidimensional calculation engines apply only one rule to calculate a particular cell because the inputs to the derived cell are, relatively speaking, intermediate results from yet other calculations as opposed to raw data. (In toto, however, the cell’s value is still the result of several chained formulas.) For example, as shown in Figure 7.14, the total variance cell (actual/plan), as seen in each of the two panels, is calculated in two different ways using two different single formulas each applied to a different input. In the left panel, it is calculated by applying the one formula “ratio” to the sum of actual product plus the sum of planned product sales. In the right panel, it is calculated by applying the one formula “sum” to the ratio of actual to planned ski sales plus the ratio of actual to planned boot sales.

Multidimensional Formulas

Now that you’ve seen how multidimensional formulas typically apply to dimension members, how they define their own dependency hierarchies, and how precedence issues need to be resolved when multiple noncommutative formulas apply to the same cell, you’re ready to explore the various kinds or flavors of multidimensional formulas. Unless you are building a very simple application, typical multidimensional models consist of systems of interrelated equations or formulas.

My favorite metaphor for equation building is that of designing a bridge to span a river gorge. The job of the engineer is to span the gorge given whatever constraints apply to the solution, such as time to completion, cost, load, durability, or maintenance requirements. Once completed, the bridge creates a live, bidirectional link between the two sides of the gorge. Similarly, the job of a multidimensional application builder is to span the gap between data sources and end-user requirements, which lie on opposite sides of the gorge. Like the engineer, the application builder needs to design a system (to be constructed of equations rather than steel and cement) subject to whatever constraints apply to the solution, such as time to completion, cost, load, speed of execution, durability, or maintenance requirements. Like the bridge, once completed, the application creates a live, bidirectional link between the two sides of the gorge.

When the engineer confronts a new gorge project, he or she brings to it a grab bag or tool chest of prior project-based knowledge. Depending on the needs of the client, the engineer can design a variety of types of bridges to span the gorge and, for any type of bridge (such as a beam, arch, or suspension), a variety of different flavors or styles. The same is true for a multidimensional application builder. There are a number of different kinds of formulas and a number of different kinds of flavors, styles, or complexities of formulas that any competent multidimensional application builder should have as a part of her or his grab bag of formula tools. Although the specific syntax and distribution of the formulas vary from tool to tool (that is, some tools may force you to put more of the formulas in calculation scripts; others may force you to be totally explicit about the calculation order), the essence of what you’re doing does not. For example, if you know you need to take the ratio of the top and bottom quartiles of a multivariate index in order to derive a certain type of decision-oriented data set, you will need to create a multivariate index, calculate the quartile values, and take the ratio of the top and bottom quartiles regardless of the tool you happen to be using. If each of the variables that go into defining the multivariate index is defined on a different set of units (for example, number of returns per month, revenue per employee, and employee turnover), you will have to relativize them in order to integrate them into a common index regardless of the tool you use.

turnover), you will have to relativize them in order to integrate them into a common index regardless of the tool you use. The purpose of this section is to present you with a tool-neutral multidimensional formula tool chest. Although I’ve done my best to arrange the formula types in the most useful way I can, I do not consider the arrangement to be fixed in any way as the things that are being ordered are themselves complexes. Thus, so long as you retain, and can recall when required, the appropriate formulas for the task at hand, it makes no difference how you clump them. That said, I distinguish between formula types (conceived of bidirectionally) —such as aggregation/allocation, ratios/products, sort- ings/randomizings, and cross-dimensional, cross-cube and cross-source2 selections— and formula complexities or flavors—such as simple, weighted, data or position con- ditional, overridden, ranged, and chained. As with the bridge metaphor, most every formula complexity can apply to any formula. In other words, formula kinds and for- mula flavors have an approximate M-N relationship with each other and thus are best thought of as belonging to two separate formula aspect dimensions.

Anatomy of a Multidimensional Aggregation Formula

Consider the increasingly familiar schema

(Store. ⊗ Time. ⊗ Product.) ~ Sales, Cost

where the leaves of Store, Time, and Product are connected in the usual fashion to the distinct members of the rows of their respective columns in a source table. The first content formula, which should also be familiar by now, seeds the values of sales and costs from their respective columns (C4 and C5) in the source table, T3, in the following manner:

Sales, L.leaf. << T3C4 Row.
Cost, L.leaf. << T3C5 Row.

Now consider the first aggregation formula that you might express:

Expression 7-1

Sales, L.leaf.above. = Sum(Sales, L.leaf.)

In words, it says that the value of Sales for all locations above the leaves is equal to the sum of the sales values in the leaf locations under the nonleaf location being evaluated. Expression 7-1 illustrates the basic principles that you need to keep in mind regardless of the type or complexity of formulas you may be trying to write.

First, every type or variable in every formula has an associated application or existence range. In Chapter 6, I spoke of the application range for a variable, which you can see here is the location range associated with the left-hand side of the equation. The equation is telling us how to calculate the value of sales for those locations that are nonleaf. You can also see that there is a location range associated with the right-hand side of the equation. In other words, the values for sales at nonleaf locations are taken from the values for sales at leaf locations. (The values for sales at leaf locations were taken from the values of a certain column in a certain table.) Most OLAP tools adhere to the convention that, unless otherwise stated, the application range of the left-hand side of the equation is all locations and the location range for all right-hand side terms is the same as that of the left-hand side. When the location range of any of the right-hand side terms is different from that of the left-hand side term, that differencing, as with any computer programming, can be relative or absolute. Since equations get executed on a cell-by-cell basis, typically speaking the data used to calculate the value of the left-hand side of the equation is taken from the same intersection of all location dimensions, or the cell, that is being calculated. Of course, there are many exceptions to this. When, as I will get into later, you are working with even reasonably complex equations, especially when the data is sourced from multiple cubes, just figuring out the appropriate location range for the left-hand side of the equation can take a substantial amount of effort and is prone to error. As a result, I strongly advocate that even if the tool you are using doesn’t force you to explicitly declare all location ranges, you be explicit in writing your equations until you are sure they are working properly. Thus, for the remainder of this section, I will frequently write out the location range for all the types in the equation.

Second, as discussed in Chapter 5, every type in every equation has an associated metric or units. Although I know of no OLAP tool that forces you to declare your units, when working with systems of equations it is all too easy to create nonsense by mixing units. Your tool’s compiler may not complain since, as far as it is concerned, all your types may be defined as floating-point numbers or decimals. But you should be careful. I highly recommend that when writing equations, you identify the units associated with every type and, separately from the expression of the equation in your tool’s dimension and/or variable definition space, you record the units of each term in the equation and the operations connecting them, and verify that the operations, when applied to the units on the right-hand side of the equation, create a type whose units are what you specified on the left-hand side. Of course, as shown in expression 7-2, adding units to expression 7-1 seems almost too obvious. The sum of a set of dollar values is of course (“surprise!”) another dollar value, but, I assure you, unit bookkeeping is good practice and, as with location ranges, is equally prone to error as the equations get complex.

Expression 7-2

Sales {$}, L.leaf.above. = Sum(Sales {$}, L.leaf.)

For the remainder of this book, I will identify units within curly braces { } following the type name, as done previously. Given that there are three levels in each of the three locator dimensions, Store, Time, and Product, there are 27 distinct aggregation combinations for the schema as a whole. And given that the Location.leaf level is one of those combinations, expression 7-2 applies to 26 distinct aggregation levels. Thus, for example, the combination of Geography.Region, Time.Quarter, and Product.brand denotes an aggregation level that is at L.leaf.above, so the aggregation formula for sales applies and is executed.
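Although the book’s notation is tool-neutral, it can help to see the same aggregation spelled out in executable form. The following minimal sketch uses Python with the pandas library; the table, member names, and one-level hierarchies are hypothetical stand-ins for T3 and the Store, Time, and Product rollups rather than anything defined above.

import pandas as pd

# Hypothetical leaf-level source rows (standing in for table T3, columns C4 and C5).
leaves = pd.DataFrame({
    "store":   ["S1", "S1", "S2", "S2"],
    "month":   ["Jan", "Feb", "Jan", "Feb"],
    "product": ["Skis", "Skis", "Boots", "Boots"],
    "sales":   [100.0, 150.0, 200.0, 300.0],   # Sales {$}
    "cost":    [60.0, 90.0, 120.0, 180.0],     # Cost {$}
})

# One-level-up hierarchies for each locator dimension (hypothetical members).
store_region  = {"S1": "East", "S2": "West"}
month_quarter = {"Jan": "Q1", "Feb": "Q1"}
product_brand = {"Skis": "Alpine", "Boots": "Alpine"}

# Sales, L.leaf.above. = Sum(Sales, L.leaf.), evaluated at the aggregation level
# (Store.Region, Time.Quarter, Product.Brand).
rollup = (
    leaves.assign(region=leaves["store"].map(store_region),
                  quarter=leaves["month"].map(month_quarter),
                  brand=leaves["product"].map(product_brand))
          .groupby(["region", "quarter", "brand"], as_index=False)[["sales", "cost"]]
          .sum()
)
print(rollup)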

Formula Types

If you are already familiar with one or more OLAP tools, then you know that most tools distribute calculations between the dimension definitions, where, typically speaking, simple arithmetic is employed, and calculation scripts, where more complex and sometimes procedural calculations are defined. There are several reasons for this, not the least of which is tradition. Early tools such as TM1 began their life as fast and flexible aggregation machines. Other types of calculations were not supported in the tool. Rather, it was anticipated that users would define additional calculations in client-side tools, typically spreadsheets. There is no logical or application-developer reason why this needs to be the case; therefore, I will, in this section, ignore those distinctions of where the calculation is defined and focus instead on the calculations themselves. In Chapter 19, however, I explore some of the main differences in the method of formula specification between OLAP products.

Selection Formulas

Although you may not think of them as formulas, selections are formulas. They are commands to retrieve some subset of a set of values based on some criteria. In fact, all formulas that return values (as opposed to just assigning them) are selection formulas. As such, selection formulas range from the extremely simple to the extremely complex. Against our by now illustrious schema of

(Store. ⊗ Time. ⊗ Product.) ~ Sales, Cost

we can query for Sales, which selects all sales values. We can query for all sales values that meet some constraint, such as sales of certain products, as with the query

Sales, Product.Toys

Or we can query for sales of certain products for some time period as with

Sales, Product.Toys, Time.year.2000

Unless you are building scripts with an OLAP tool, you would rarely resort to writing out a statement to perform a simple selection. Most client tools directly provide this kind of functionality through their interface. You would typically point and click against a matrix view or a tree view of a dimension.
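For concreteness, the same three selections can be expressed as simple row filters. This is a minimal sketch in Python with pandas against a hypothetical sales table, not a recommendation of any particular tool’s syntax.

import pandas as pd

sales = pd.DataFrame({
    "store":   ["S1", "S1", "S2", "S2"],
    "year":    [2000, 2001, 2000, 2001],
    "product": ["Toys", "Toys", "Games", "Toys"],
    "sales":   [500.0, 650.0, 400.0, 700.0],
})

all_sales = sales["sales"]                                    # Sales
toy_sales = sales[sales["product"] == "Toys"]                 # Sales, Product.Toys
toys_2000 = sales[(sales["product"] == "Toys") &
                  (sales["year"] == 2000)]                    # Sales, Product.Toys, Time.year.2000
print(toys_2000)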

Cross-Dimensional Selections

The next stage of complexity for selection formulas is to select the members of one dimension based on the values of some other intersecting dimension. For example, you might wish to see those stores of a particular size, those customers who live in a certain area, or those products that are most often bought by certain customers or that are most often bought at certain times of the year.

Expression 7-3

P/E {Ratio}, Stock {Name}, Stock.(Marketcap > 1 Billion)., Time.Year.

This formula asks for the price earnings ratio and associated stock on a year-by-year basis for those stocks whose market capitalizations on a year-by-year basis are greater than one billion.

P/E.decile {Decile} , Stock {Name}, Time.2000 , Stock.(P/E.decile.1, Time.1999).

This formula asks for the year 2000 price/earnings decile and stock name for those stocks that were in the top decile in 1999.

P/E {Ratio}, Stock {Name}, (P/E.Time.2000 ÷ P/E.Time.1999 > 2).

This formula asks for the price/earnings ratio and stock name for those stocks whose year 2000 price/earnings were more than double their year 1999 price/earnings.
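A minimal, hypothetical sketch of the same cross-dimensional selections in Python with pandas follows; the stock names and values are invented for illustration.

import pandas as pd

stocks = pd.DataFrame({
    "stock":     ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "year":      [1999, 1999, 1999, 2000, 2000, 2000],
    "pe":        [10.0, 30.0, 12.0, 25.0, 28.0, 11.0],
    "marketcap": [0.5e9, 2.0e9, 3.0e9, 0.8e9, 2.5e9, 3.5e9],
})

# P/E and stock name, per year, for stocks whose market cap that year exceeds $1 billion.
large = stocks[stocks["marketcap"] > 1e9][["year", "stock", "pe"]]

# Stocks whose year-2000 P/E is more than double their year-1999 P/E.
wide = stocks.pivot(index="stock", columns="year", values="pe")
doubled = wide[wide[2000] / wide[1999] > 2]
print(large)
print(doubled)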

Allocation Formulas

Allocation formulas take data at higher levels in a multidimensional hierarchy and distribute it down the hierarchy as a function of some other data value already at the lower level. Imagine that, in addition to the sales and costs previously defined, you also had a total marketing budget defined, possibly by manual input or through a statement like

Marketing_Budget {$}, Geog.all, Time.Year.this, Product.all = $1 Million

which says that the total marketing budget is defined for all regions (Geog.all), for this year, and for all products. An allocation formula would take the $1,000,000 and distribute it or allocate it to, for example, the individual products for all stores and for this year. If there are 1,000 individual products, the $1,000,000 will be split 1,000 ways. The question is: How will it be split? What’s the allocation function? With our schema, a natural allocation function would be to allocate this year’s marketing dollars as a function of last year’s sales. If a product accounted for 5 percent of last year’s sales, it would receive 5 percent of this year’s marketing budget (a rather Darwinian marketing strategy). The following formula would define such an allocation strategy:

Marketing_Budget {$}, Geog.all, Time.Year.this, Prod.individual. =
((Sales {$}, Time.Year.(this-1)) /
(Sales {$}, Geog.all, Time.Year.(this-1), Product.all)) *
(Marketing_Budget {$}, Product.all)

It says the marketing budget for any individual product across all stores for this year is equal to the sales for that product last year divided by the sales for all products last year (this defines the ratio of the total marketing budget to be allocated to the product) multiplied by the marketing budget for all products for this year. Notice how I only added the location specifier for a type when the range, on a type-specific basis, was different from that of the left-hand side of the equation. Notice how the ratio of two dollar amounts yields a rational number that, when multiplied by a dollar amount, returns a dollar amount.

Allocation functions can be defined on every dimension. So it would have been possible to allocate marketing resources as a function of how well the products sold and as a function of how well each geography fared.
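The following sketch shows the allocation logic in executable form. It is a simplified, hypothetical example in Python with pandas: three products instead of 1,000, and no store or time dimensions, so only the proportional split itself is illustrated.

import pandas as pd

# Hypothetical product-level sales for last year: Sales {$}, Time.Year.(this-1).
last_year_sales = pd.Series(
    {"ProdA": 50_000.0, "ProdB": 30_000.0, "ProdC": 20_000.0},
    name="sales_last_year")

total_budget = 1_000_000.0    # Marketing_Budget {$}, Product.all, Time.Year.this

# Each product receives its share of last year's sales times this year's total budget.
allocation = (last_year_sales / last_year_sales.sum()) * total_budget
print(allocation)   # ProdA 500000, ProdB 300000, ProdC 200000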

Ratio and Product Formulas

There are two main kinds of ratio functions that mirror the two main types of adjacencies within hierarchical dimensions. Ratios can be taken between cells that are hierarchically separated, which I call resolutional ratios, or not hierarchically separated, which I call positional ratios. The previous allocation example made use of a resolutional ratio. Any kind of ratio of some value to an ancestor’s value, as with the ratio of an individual product’s sales to all products’ sales, the ratio of an individual company’s earnings to its industry’s earnings, or the ratio of a single employee’s payroll to that of his or her company, is an example of a resolutional ratio. Since resolutional ratios are part-whole ratios, no resolutional ratio can ever be greater than one.

Any kind of ratio between members or instances that are treated as contents and are not hierarchically related is a positional ratio. For example, the productivity ratio between two firms, the salary ratio between two employees, or the distance ratio between two ski jumpers are all examples of positional ratios. If you return to the formula schema and add a derived content to it called Margin and set it equal to the ratio of Sales to Costs, as shown here in two alternate definitions, you can see syntactic examples of positional ratios:

Margin {R}, Geog. , Time. , Product. = Sales {$} / Cost {$}

Margin {R}, L.leaf. = Sales {$} / Cost {$}

Positional ratios can assume any rational value. Note that the ratio of two dollar-denominated variables yields a rational number. If Margin had been defined as the difference between Sales and Costs, then it would have been denominated in dollars.

Margin {$}, L.leaf. = Sales {$} - Cost {$}

Whether you need to see comparisons as ratios or differences depends on what you intend to do with the resulting information. For example, with a variety of companies, in identifying which company may provide a better return on investment, you probably want to look at margins as percentages, since the absolute size of the company is independent of the company’s profitability. However, if you are interested in determining which company has the resources to penetrate a new market, you probably want to look at absolute dollar margins since the resource requirements are absolute dollars.
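The distinction can be made concrete with a small sketch in Python with pandas; the products and values are hypothetical.

import pandas as pd

df = pd.DataFrame({
    "product": ["Skis", "Boots"],
    "sales":   [1000.0, 500.0],
    "cost":    [600.0, 400.0],
})

df["margin_ratio"] = df["sales"] / df["cost"]            # positional ratio, Margin {R}
df["margin_dollars"] = df["sales"] - df["cost"]          # difference, Margin {$}
df["share_of_sales"] = df["sales"] / df["sales"].sum()   # resolutional (part-whole) ratio, never > 1
print(df)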

Ordering Formulas

Ordering operations pervade analytical applications. Every time you query for the top 10 customers (as defined by earnings per customer), the bottom 5 products (as defined by sales), or the top 6 skiing sites (as defined by the average annual snowfall), you’re performing an ordering operation—actually an ordering followed by a selection. The typical reasons why you would order or transform the order of the instances of a type are as follows:

■■ For queries, as in querying for the top or bottom N stocks
■■ For pattern detection, as in looking for a correlation between stock price volatility (the stock’s beta coefficient) and the log of the market capitalization of the company
■■ For integration of disparate data, as in creating composite indexes
■■ For visualization, as in creating a surface plot of earnings per share varying over two dimensions defined by industry and companies_ordered_by_size

As described in Chapter 5, any nominal dimension can be ordered through its association with another already ordered variable. Thus, a nominal type such as Store can be ordinally or cardinally arranged by size or profitability; a nominal type such as Customers can be ordered by duration, and a nominal type such as Products can be ordered by cost. This ordering of nominal types through their association with cardinal types is most frequently performed within the context of a dimension specification and leads to a named ordering for the dimension such as Store.salesorder. Furthermore, any cardinal variable can be transformed or reordered for analytical purposes. Thus, the instances of the cardinal type per capita income can be ordered according to the log of their value so as to facilitate comparisons between countries of widely disparate per capita incomes. Although from an operational perspective, and as presented in this section, ordering operations are the same regardless of the context, the reordering of cardinal types or quantitative variables is typically done by analysts within the context of defining formulas, whether in an OLAP or statistical package, the latter offering far more types of orderings than an OLAP product.


Methods of Ordering

The three main methods for creating orderings are rank-based, normalization-based, and rank-preserving transformation-based. To illustrate the differences between these orderings, consider the following simple base schema against which the orderings are created:

(Product. , Time.all) ~ Sales

Table 7.2 illustrates one set of possible values. It is best to think of ordering as a two-part process. One part, which is necessary, is the creation of the ordering based on some already ordered dimension. This is the ranking or normalizing. The other part, which typically, but not necessarily, occurs is the arrangement of the instances of a nominal dimension according to the ordered instances of the ordering dimension. This is the arrangement of products according to their ordered sales.

Rank-style orderings assign a rank number to the dimension value or type instance. They are ordinal in nature. If there are N elements in a dimension, each one will be assigned a value from 1 to M, where M <= N. Ties are typically given the same value, with the appropriate number of rank numbers skipped afterwards. Rank-style orderings are useful for all kinds of top and bottom N queries. Table 7.3 shows the top-to-bottom sales rankings and the associated product name and sales, and may be thought of as resulting from the following query:

Descending Rank (Sales) {Rank}, Product {Name}, Sales {$}, Time.all

Normalization or one-based orderings map a set of dimension values to a value between zero and one. Since gaps are preserved in one-based orderings, they result in a cardinal series. Again, there are different ways of creating one-based orderings, but typically the difference between the minimum and maximum value is calculated and set to near one (thus allowing for new values to appear that are greater than the current maximum). Every other value is then assigned an offset from zero to one.

Table 7.2 Alltime Sales per Product

PRODUCT    SALES {$}, TIME.ALL

A          250
B          750
C          500
D          2000
E          1500
F          3000
G          750


Table 7.3 Sales Rank and Associated Product Name and Sales Amount

RANK PRODUCT NAME SALES {$}

1    F    3000
2    D    2000
3    E    1500
4    G    750
5    B    750
6    C    500
7    A    250

Table 7.4 Sales Normalization Ordering and Associated Product Name and Sales Amount

NORMALIZATION ORDERING    PRODUCT NAME    SALES {$}

1      F    3000
.64    D    2000
.45    E    1500
.18    G    750
.18    B    750
.09    C    500
0      A    250

Table 7.4 shows the top-to-bottom one-based sales ordering and the associated product name and sales, and may be thought of as resulting from the following query:

Descending Normalization ordering (Sales) {Rank}, Product {Name}, Sales {$}, Product.*, Time.all

Rank-preserving (RP) transformation orderings transform a dimension value by some function to another value without changing the rank of the value. The most common form of RP transformation ordering is logarithmic. The reason for applying a log function (typically natural, base 2, or base 10) is to facilitate a comparison between clumps of widely disparate values, whether the disparities represent company sizes, personal incomes, numbers of customers, or amounts of pollution. If you didn’t apply some form of transformation, the cost of seeing some range of values clearly would be the inability to see other ranges clearly. For example, if you wanted to compare the evolution of the Dow Jones Industrial Average and the NASDAQ Composite index by looking at average quarterly values for the past 15 years and you didn’t first perform some type of transformation, you would either have a hard time seeing differences at the low range because all values would look like zeros (as illustrated in Figure 7.16), or at the high range because all values would look like they were increasing to infinity. Table 7.5 shows the natural log of sales in descending order and the associated product name and sales, and may be thought of as resulting from the following query:

Descending Natural Log (Sales) {Log}, Product {Name}, Sales {$}, Product., Time.all

Figure 7.16 EMC versus MSFT Composite highlighting high-range differences.

Table 7.5 Natural Log of Sales, Associated Product Name, and Sales Amount

NATURAL LOG OF SALES PRODUCT NAME SALES {$}

8.01    F    3000
7.60    D    2000
7.31    E    1500
6.62    G    750
6.62    B    750
6.21    C    500
5.52    A    250
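The three orderings can be reproduced from the Table 7.2 data with a few lines of Python using pandas and NumPy. The sketch below is illustrative only; note that the tie between products B and G is broken arbitrarily, just as it is in the tables above.

import numpy as np
import pandas as pd

# The Table 7.2 values.
t = pd.DataFrame({"product": list("ABCDEFG"),
                  "sales":   [250, 750, 500, 2000, 1500, 3000, 750]})

t["rank"] = t["sales"].rank(method="first", ascending=False).astype(int)  # ties broken by order of appearance
t["normalized"] = (t["sales"] - t["sales"].min()) / (t["sales"].max() - t["sales"].min())
t["ln_sales"] = np.log(t["sales"])

print(t.sort_values("sales", ascending=False).round(2))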

Aggregate Orderings

Binning, bucketing, and the creation of quartiles, pentiles, sextiles (sounds like a rock band), and deciles are all examples of aggregate orderings. Within an OLAP environment, the most typical use of aggregate orderings is the creation of aggregation levels within a dimension, based on the cardinal attributes of the dimension (or the cardinal types that are in a one-to-one association with the type for which the aggregate ordering is being created). Thus, the members of a customer dimension and an associated sales per customer type could be used to create an aggregate ordering for customers defined in terms of 10 ranges of equal value called deciles. Such groupings are typically either defined in terms of equal value ranges, in which case the number of customers per group may vary, or in terms of equanumerous groupings, in which case the value ranges for each group may vary. As illustrated next, aggregate groupings can be referred to just like any other type instance set.

Avg(P/E) {R} , Time.2000, Stock. P/E.decile.1

This queries for the average P/E for all stocks in the top decile of P/E for 2000.

Avg (P/E.decile.1){R} / Avg(P/E.decile.2){R} , Time.2000

This queries for the ratio of the average P/E for companies in the first versus the second decile for the year 2000.
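A minimal sketch of an equanumerous aggregate ordering in Python with pandas follows; the 100 stocks and their P/E values are randomly generated for illustration.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
stocks = pd.DataFrame({
    "stock": [f"S{i:03d}" for i in range(100)],
    "pe":    rng.uniform(5, 60, size=100).round(1),
})

# Equanumerous groupings: 10 deciles of 10 stocks each, decile 1 = highest P/E.
stocks["pe_decile"] = pd.qcut(stocks["pe"].rank(method="first", ascending=False),
                              q=10, labels=range(1, 11)).astype(int)

avg_top = stocks.loc[stocks["pe_decile"] == 1, "pe"].mean()          # Avg(P/E), decile 1
ratio = avg_top / stocks.loc[stocks["pe_decile"] == 2, "pe"].mean()  # decile 1 vs. decile 2
print(round(avg_top, 2), round(ratio, 2))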

Creating a Composite Indicator

There are many reasons why you may want to create a composite indicator. From the Dow Jones Industrial Average, to the government’s index of leading economic indicators, to United Nations measures of global well-being, to any kind of organizational metric such as a balanced scorecard or value imperative, indicators represent attempts to combine individual values that were sourced from multiple variables into a single composite variable or indicator. Over the years I have worked on numerous indicator projects. They can get very complex, with hundreds of variables funneling through several levels into a single indicator or collection of indicators. Regardless of whether there are 2, 200, or 2,000 variables merging into an indicator, anyone creating indicators needs to figure out how the individual variables are going to be weighted as they merge into that single indicator (set). As the astute reader will by now have guessed, it is the relative weighting where all the action takes place.

A simple example of a customer service quality indicator starting with five underlying variables is shown in Table 7.6. Given these underlying variables, you need to order them in some way. Typically you would use a normalization-based ordering, thus mapping them into a value between zero and one. Finally, you need to assign each normalized variable a weight relative to the other variables going into the indicator. These two additional columns are shown in Table 7.7.

Table 7.6 Underlying Variables for a Composite Customer Service Quality Indicator

VARIABLES MEASURED UNITS

Time to respond to customer                    Seconds
Usefulness of info given                       1-10
# customers seeking service                    Integer/time
Sales                                          $/time
Usefulness of info learned from customers      1-10

Table 7.7 Addition of Normalizing Values and Weights

VARIABLES                                      MEASURED UNITS    NORMALIZED VALUE RANGE    WEIGHT

Time to respond to customer                    Seconds           0-1                       .3
Usefulness of info given                       1-10              0-1                       .3
# customers seeking service                    Integer/time      0-1                       .1
Sales                                          $/time            0-1                       .1
Usefulness of info learned from customers      1-10              0-1                       .2

Once you have the normalized values and weights, and assuming this indicator applies to a set of departments over time (and defining the metric indicator as {Ind}), you can define the Customer Service Quality (CSQ) indicator in the following way:

CSQ {Ind}, Dept., Time. = (.3 * Time_to_Respond + .3 * Usefulness_info_given + .1 * Customers_Seeking_Advice + .1 * Sales + .2 * Usefulness_of_Info_Learned)
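To make the construction concrete, here is a minimal sketch in Python with pandas that normalizes hypothetical department-level inputs and applies the Table 7.7 weights. Whether a “lower is better” variable such as response time should be inverted before normalizing is an application decision the sketch does not take a position on.

import pandas as pd

# Hypothetical departments and raw values for the five underlying variables.
depts = pd.DataFrame({
    "dept":               ["D1", "D2", "D3"],
    "time_to_respond":    [30.0, 90.0, 60.0],     # seconds
    "usefulness_given":   [8.0, 6.0, 9.0],        # 1-10
    "customers_seeking":  [120.0, 80.0, 200.0],   # per period
    "sales":              [5000.0, 7000.0, 6000.0],
    "usefulness_learned": [7.0, 5.0, 8.0],        # 1-10
}).set_index("dept")

def normalize(s):
    # Map each variable onto the 0-1 range (min-max normalization).
    return (s - s.min()) / (s.max() - s.min())

weights = {"time_to_respond": 0.3, "usefulness_given": 0.3,
           "customers_seeking": 0.1, "sales": 0.1, "usefulness_learned": 0.2}

csq = sum(w * normalize(depts[col]) for col, w in weights.items())
print(csq.rank(ascending=False).astype(int))   # departments ranked by CSQ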

Once you have the CSQ indicator, you can analyze departments in terms of CSQ with the following query that asks for the department chief of each of the top five departments in terms of CSQ:

Department_Chief, Department.(CSQ.Rank.1-5).

Of course, the department chiefs of those departments not in the top five, and especially those in the bottom ranks, will no doubt protest that the composite indicator does not adequately reflect their particular customer philosophies as evidenced by the way in which the individual variables were weighted. An analytically savvy department chief might point out that if the weighting for Time_to_Respond were lowered to 0.1 and the weighting for Customers_Seeking_Advice were raised to 0.3, then her or his department’s ranking would increase from 25 to 5. Thus, if you ever find yourself creating composite indicators, be prepared to analyze the change in ranking (in this case for a particular department) as a function of changes in the relative weighting of the variables. Although there are a variety of approaches to performing sensitivity analysis on the weightings of composite indicators, regardless of the approach, there are a few steps that must be taken. Continuing with the previous example, let’s assume that the analytically savvy department chief convinced the COO that the relative variable weightings were biased and that a more balanced weighting, as expressed in an alternative indicator definition, would look as follows:

NEW_CSQ {Ind}, Dept., Time. = (.1 * Time_to_Respond + .3 * Usefulness_info_given + .3 * Customers_Seeking_Advice + .1 * Sales + .2 * Usefulness_of_Info_Learned)

The next step is to recalculate every department’s indicator value and then rank departments on the basis of their NEW_CSQ value as shown:

Rank {Int} , Dept {Name} , NEW_CSQ.Drnk.1-5

Let’s say that this department does indeed rise to number five under the new indicator. How would the COO convince himself or herself that the new weighting was appropriate? One way is to look at the departments that were the most affected by the change in the indicator weighting definition, as with the following expression:

Dept {Name} , (CSQ.rank - NEW_CSQ.rank).rank.1 to 5

If the departments that gained or lost the most under the new indicator definition appear, ex post facto, to be deserving of their new rankings, then there is a good chance that the analytically savvy department chief was right in asking for different weightings. If the COO strongly disagrees with some of the new sinkers or swimmers, then chances are the COO won’t agree with the reranking of the department in question.

Cross-Cube

As shown in Chapter 6, most real-world data sets consist of multiple partially overlapping hypercubes. Therefore, as an application builder or analyst, you need to know how to create formulas that reference data across multiple cubes of nonidentical dimensionality. As you will see in Chapters 13 through 17, cross-cube formulas can get pretty complex. Here you are introduced to the basics of cross-cube formulas.

SALES

                           January    February
Neptune     Price {$/kg}        15          20
            Sales {kg}         100         150
            Revenue {$}       1500        3000
            Costs {$}
Poseidon    Price {$/kg}        10          15
            Sales {kg}         200         300
            Revenue {$}       2000        4500
            Costs {$}

MANUFACTURING RULES

                                        Tuna    Salmon
Neptune     Fish Requirements {G/kg}     750       250
Poseidon    Fish Requirements {G/kg}     250       750

MATERIAL COSTS

                          Jan    Feb
Tuna     Price {$/kg}      10     15
Salmon   Price {$/kg}       5     10

Figure 7.17 A simple fishcake manufacturing schema and model.

Consider the example schema and model in Figure 7.17. It represents a highly simplified data set for a producer and seller of fishcakes. There are three given cubes: one for product sales, one for manufacturing rules, and one for materials cost. The sales cube is composed of six types in the following LC schema:

(Fishcake. ⊗ Time.month.) ~ Price {$/Kg}, Sales {Kg}, Revenue {$}, Cost {$}

As you can see in the figure, there is input data for price and sales, and easily calculated data for revenues. As an analyst, your goal is to calculate the cost of fishcakes sold, making appropriate use of the data in the other two cubes. The manufacturing rules cube is composed of three types in the following LC schema:

(Fishcake. ⊗ Fish.) ~ Requirements {Grams/Kg}

The materials cost cube is also composed of three types in the following LC schema:

(Fish. ⊗ Time.) ~ Price {$/Kg}

A simple way to calculate the cost of fishcakes sold per month is to define it as the sum of the cost of fish consumed per fishcake per month over all fish. The cost of fish consumed per fishcake per month can then be defined as the sum of the quantity of each fish consumed in a month times the price of the fish that month, again over all fish. This method of chunking the problem results in the definition of the two intermediate variables, Fish_Consumed {Kg} and Cost_of_Fish_Consumed {$}, defined against an intermediate cube as shown in Figure 7.18.

(Fishcake. ⊗ Fish. ⊗ Time.) ~ Fish_Consumed {Kg}, Cost_of_Fish_Consumed {$}

Where

Fish_Consumed {Kg} = [Manufacturing Rules]: Requirements {G/Kg} * [Sales]: Sales {Kg}

Cost_of_Fish_Consumed {$} = Fish_Consumed {Kg} * [Material Costs]: Price {$/Kg}

The cost variable in the sales cube is then defined as

Cost {$} = [Intermediate cube]: (Cost_of_Fish_Consumed, Fish.all)

The appropriate filled-in cost values are shown in Figure 7.19. Even though this was a simple example (actually the simplest example that could illustrate the point), it highlights the basic principles and semantic ambiguity of cross-cube formulas that occur over and over again no matter the complexity of the application.

FISH COST

                                                Tuna    Salmon
Neptune     Jan    Fish consumed {kg}
                   Cost of fish consumed {$}
            Feb    Fish consumed {kg}
                   Cost of fish consumed {$}
Poseidon    Jan    Fish consumed {kg}
                   Cost of fish consumed {$}
            Feb    Fish consumed {kg}
                   Cost of fish consumed {$}

Figure 7.18 An intermediate cube.

SALES

                           January    February
Neptune     Price {$/kg}        15          20
            Sales {kg}         100         150
            Revenue {$}       1500        3000
            Costs {$}          875        2062
Poseidon    Price {$/kg}        10          15
            Sales {kg}         200         300
            Revenue {$}       2000        4500
            Costs {$}         1250        3375

Figure 7.19 The sales cube with filled-in cost values.

As regards this semantic ambiguity, the astute reader may be wondering what principle allowed me to connect:

■■ Sales data, which is not dimensioned by fish, with the Fish_Consumed variable, which is dimensioned by fish
■■ Manufacturing rules data, which is not dimensioned by time, with the Fish_Consumed variable, which is dimensioned by time

Is it as simple as plugging in a dummy single instance for every type otherwise missing from a variable, thus adding a Fish.all member to the location structure of the sales variable and a Time.all member to the location structure of the requirements variable as follows?

(Fishcake. ⊗ Time. ⊗ Fish.all) ~ Sales {Kg} ,

(Fishcake. ⊗ Fish. ⊗ Time.all) ~ Requirements {G/Kg}

Although the calculation of Fish_Consumed will turn out correctly by adding dummy dimensions to sales and requirements, the introduction of a new dimension has implications that may not be true. The dangerous implication is that a variable, once defined for any member of a dimension, may subsequently be allocated or otherwise calculated for other members of the dimension. Thus, the requirements variable that originally was not dimensioned by time could now be thought of as varying by time. That is pretty straightforward. Recipes change over time. In retrospect, one can see that the absence of the time dimension from the original formulation of requirements didn’t mean that requirements were not dimensioned by time, just that requirements were constant over time. But what about the Fish dimension as it pertains to the sales variable? Is sales dimensioned by fish? For every instance of sales, there is an associated fishcake and time, but there is no associated fish. It is not accurate to think of fishcake sales as being constant over fish in the same way that requirements are constant over time. You would be creating nonsense to relax the assumption of constancy and let fishcake sales vary over fish.

So how does one define a Fish_Consumed variable without getting into potential nonsense? Luckily, the principle of joining dissimilar cubes by taking the shared or intersecting dimensions as locators and each collection of nonshared or disjunct dimensions as contents, which I described in Chapter 6, is all you need to work through this problem. The only dimension shared between sales and manufacturing rules is fishcake, so you can rewrite the sales, requirements, and materials cost schemas as follows:

Sales: (Fishcake.) ~ (Time. ~ Sales)
Manufacturing Rules: (Fishcake.) ~ (Fish. ~ Requirement)
Materials Cost: (Fishcake.) ~ ((Time. ⊗ Fish.) ~ Fish_Consumed)

The formula for Fish_Consumed looks as follows:

Fish_Consumed {Kg}, Time., Fish., Fishcake. = (Sales {Kg} , Time.) * (Requirements {G/Kg}, Fish.)

Figure 7.20 shows how for each location defined by an individual fishcake (the common denominator) there exists an array A of time-specific sales values, an array B of fish-specific requirements values, and an array C of fish- and time-specific Fish_Consumed values. Note how no additional dimensions were ever attributed to either the sales or requirements variables.

From the preceding example you can see there are three basic principles to the construction of cross-cube formulas, consisting of a single underlying constraint and two consequences. The fundamental constraint in cross-cube formulas is that at the moment of interaction between two or more variables or contents that originate in different cubes, all contents must share the same location dimensions. As you saw earlier when defining calculations across multiple cubes with differing dimensionalities, this was accomplished in two ways: either by defining an intermediate cube join consisting of the union of all dimensions and assigning a dummy dimension to every variable lacking in any of the dimensions in the union, or by creating a join cube whose dimensions consist of only the intersecting dimensions and whose contents consist of each collection of nonshared dimensions. The two consequences are that data are either distributed or aggregated across cubes of dissimilar dimensions (assuming that the data isn’t selected from a single member). Thus, you saw that the fishcake sales data were distributed across the requirements data to calculate Fish_Consumed and the fish costs data were aggregated to calculate the fishcake costs.
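The fishcake example can be reproduced end to end with a short sketch in Python using pandas, joining each pair of cubes only on the dimensions they share. The output matches the cost values in Figure 7.19; this is one possible implementation, not the book’s own.

import pandas as pd

sales = pd.DataFrame({"fishcake": ["Neptune", "Neptune", "Poseidon", "Poseidon"],
                      "month":    ["Jan", "Feb", "Jan", "Feb"],
                      "sales_kg": [100, 150, 200, 300]})

requirements = pd.DataFrame({"fishcake": ["Neptune", "Neptune", "Poseidon", "Poseidon"],
                             "fish":     ["Tuna", "Salmon", "Tuna", "Salmon"],
                             "g_per_kg": [750, 250, 250, 750]})

fish_price = pd.DataFrame({"fish":  ["Tuna", "Tuna", "Salmon", "Salmon"],
                           "month": ["Jan", "Feb", "Jan", "Feb"],
                           "price": [10, 15, 5, 10]})

# Join on the shared dimension only (fishcake), then look up prices by fish and month.
mid = sales.merge(requirements, on="fishcake")
mid["fish_consumed_kg"] = mid["sales_kg"] * mid["g_per_kg"] / 1000.0
mid = mid.merge(fish_price, on=["fish", "month"])
mid["cost"] = mid["fish_consumed_kg"] * mid["price"]

# Aggregate over fish to get the fishcake cost per month.
cost = mid.groupby(["fishcake", "month"], as_index=False)["cost"].sum()
print(cost)   # Neptune: Jan 875, Feb 2062.5; Poseidon: Jan 1250, Feb 3375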

Formula Complexities

Now that you’ve seen the basic types of formulas—selection, aggregation and allocation, ratio and product, ordering, aggregate ordering, and cross-cube—it is time to see how any of these basic formulas can be modified, altered, or otherwise adapted through the application of weighting, ranges, conditionals, manual overrides, and finally chaining.


Neptune:
  Sales {kg}:            Jan. 100    Feb. 150          (array A)
  Requirements {G/kg}:   Tuna 750    Salmon 250        (array B)
  Fish consumed {kg}:                Tuna    Salmon    (array C)
                         Jan.          75        25
                         Feb.       112.5      37.5

Poseidon:
  Sales {kg}:            Jan. 200    Feb. 300          (array A)
  Requirements {G/kg}:   Tuna 250    Salmon 750        (array B)
  Fish consumed {kg}:                Tuna    Salmon    (array C)
                         Jan.          50       150
                         Feb.          75       225

Figure 7.20 Dimensioning fishcake.

Weighted Formulas

Any time you modify the values of the terms or variables in an equation to reflect some additional factor (as with the composite indicator example used previously), you are weighting the terms in the equation by that factor. Typically, the weighting factor is expressed as a resolutional ratio and is intended to reflect the relative proportion of the additional factor possessed by each variable.

Thus, one might weight the average per capita caloric intake given for each country by the relative population of the country in order to calculate an average value for per capita caloric intake at the regional or continental level weighted by the relative population in each country. If you don’t assign a population-based weighting to each country, each country’s per capita caloric intake will be treated equally for the purposes of calculating the continental average. Thus, if the per capita caloric intake for China were 2,000 and for Bhutan 1,600 (and assuming them to be the only two countries in Asia), without weighting, one would calculate the average for Asia as 1,800, whereas with weighting by population, and assuming the population of China to be 100 times that of Bhutan, the weighted average would be 1,996. Given a geography dimension with the two levels, Continent and Country, the LC formula for the overall per capita caloric intake for each continent might look as follows:

Weighted_PC_Caloric_Intake {Calories/Day}, Geog.Continent. =
Sum ((PC_Caloric_Intake, Geog.Continent.Country.) *
(Population, Geog.Country. / Population, Geog.Country.Continent))

You can see by inspection that the weighting factor (Population, Geog.Country. / Population, Geog.Country.Continent) is a resolutional ratio.
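In executable form, the weighted average for the two-country example can be sketched as follows (Python with pandas; the relative populations are the ones assumed in the text).

import pandas as pd

countries = pd.DataFrame({
    "continent":   ["Asia", "Asia"],
    "country":     ["China", "Bhutan"],
    "pc_calories": [2000.0, 1600.0],
    "population":  [100.0, 1.0],     # relative populations (China = 100 x Bhutan)
})

# Weighted average = sum(intake * population) / sum(population), per continent.
tmp = countries.assign(weighted=countries["pc_calories"] * countries["population"])
by_cont = tmp.groupby("continent")[["weighted", "population"]].sum()
print(by_cont["weighted"] / by_cont["population"])   # Asia ~ 1996.04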

Ranged Formulas

The most common examples of range queries are running or moving averages and cumulative totals. Thus, we speak of three-month moving averages or sales to date (a cumulative total), or the ratio of sales to date for this year versus last year. Range queries require at least an ordinally ordered dimension for the range specification. By far the most common dimension against which to create ranges is time. As you should recall from Chapter 5, any dimension with ordinal properties can support range referencing. The following examples of range queries are defined against the simple schema (Time.) ~ Sales, where the structure of the time dimension is illustrated in Figure 7.21. A Cumulative_Sales_to_Date content or variable could look as follows:

Cumulative_Sales_to_Date , Time.day. = Sum (Sales , Time.(Year.this).Day.(first to this))

LEVEL NAMES INSTANCES PER LEVEL

Year       (2000, 2001, 2002 ...)
Quarter    (1st, 2nd, 3rd, 4th)
Month      (January, February, March ...)
Day        (Jan 1st, Jan 2nd, Jan 3rd ...)
Hour       (9 am, 10 am, 11 am ...)
Minute     (1, 2, 3 ... 60)

Figure 7.21 Time dimension structure.

Notice how the previous formula defines a cumulative sales variable on a year-by-year basis [that’s the Time.(Year.this) fragment]. When triggered, it looks for the year value of the time dimension of whatever model it is defined in, and then within that year creates up to 366 distinct cumulative sums [as defined by the Day.(first to this) fragment]. Since a new year begins after December 31, this particular cumulative sum function will start over again. How would you change this formula so that it never stopped accumulating? A three-month moving average sales would look as follows:

Three_month_moving_avg_sales, Time.Month. = Avg(Sales, Time.Month.((this-2) to this))

Notice how the range specification in the previous formula, Time.Month.((this-2) to this), unlike the one before, is always three months. Unfortunately (and even though the SQL/OLAP extensions do support range queries), some OLAP products still do not directly support range queries. For those that don’t, the user is typically forced to create a custom aggregation per distinct range. In other words, it could require up to 366 separately defined cumulative sum formulas to calculate the range of cumulative sums defined above for a single year.
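Both range formulas have direct analogs in most data-manipulation environments. The following minimal sketch uses Python with pandas on six months of hypothetical sales.

import pandas as pd

months = pd.period_range("2001-01", periods=6, freq="M")
sales = pd.Series([100, 120, 90, 110, 130, 150], index=months, name="sales")

cumulative_to_date = sales.groupby(sales.index.year).cumsum()   # restarts each year
three_month_avg = sales.rolling(window=3).mean()                # months (this-2) to this

print(pd.DataFrame({"sales": sales,
                    "cum_to_date": cumulative_to_date,
                    "3mo_avg": three_month_avg}))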

Conditional Formulas

Conditional formulas can be categorized by the type of condition involved and whether the condition applies to the definition of the right-hand side of the equation or to the execution of the formula. For conditional expressions that affect the definition of the right-hand side of the formula, the two basic types of conditions are position-based and data-based.

Position-Based

There are many occasions when a single-named formula, such as indirect costs, needs to have several different right-hand side equations depending on where in the multidimensional model the formula is being applied. The single-named variable indirect costs may be calculated differently in different countries or for different product lines or at different times of the year. Regardless of the dimension(s) in which the definition of the formula varies, and regardless of the tool being used, you need to identify each position-based conditional. This may be done with a series of if/then statements, case statements, or separate application range-specific formula declarations (one per right-hand side declaration), as used here:

"Indirect costs" , Geography.France = Total salary / 2

“Indirect costs” , Geography.Spain = Direct costs / 4

A robust multidimensional system should also let you specify the positional conditions with any combination of relative dimensional referencing, such as “if member x is the child of member y,” or “if member x has any children,” and so on. For example, the indirect costs for producing furniture may be different from the indirect costs for producing clothing; using relative referencing, a solid multidimensional language would allow for formulas that express the following:

"Indirect costs" , Product.furniture.atunder = Total salary / 2

"Indirect costs" , Product.clothing.atunder = Direct costs / 3

Conditional formulas frequently have both an application range-specific set of formulas and a single general formula. For the previous example, this would amount to the existence of a single general formula for indirect costs, say Total salary / 1.5, intended to cover all those positions for which no specific positional override has been defined. Typically, you would like to give preference to position-based conditionals as a function of the narrowness or specificity of the declared application range. Thus, any specific position-based conditional should take precedence over a single general formula. For example, a position-based conditional that applies to Europe will be overridden by a position-based conditional that applies to Denmark. OLAP-enabled tools differ substantially in the way you need to ensure that local ranges override more general ones. For those tools that do not distinguish differing degrees of locality of reference, you may need to put each application-range-specific conditional in a particular order in your formula space (such as specific formulas before general ones) to ensure that the appropriate condition is triggered.
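One way to realize the narrowest-range-wins behavior outside of any particular OLAP tool is sketched below in Python; the members, ancestor lists, and formulas are hypothetical and merely mirror the Europe/Denmark precedence example.

def indirect_costs(member, ancestors, total_salary, direct_costs):
    # ancestors lists the member's lineage from most to least specific,
    # for example ["Denmark", "Europe", "Geography.all"].
    specific_rules = {
        "Denmark": lambda: total_salary / 2,
        "Europe":  lambda: direct_costs / 4,
    }
    for m in [member] + ancestors:          # the narrowest matching range wins
        if m in specific_rules:
            return specific_rules[m]()
    return total_salary / 1.5               # single general formula

print(indirect_costs("Copenhagen", ["Denmark", "Europe"], 900.0, 400.0))   # 450.0 (Denmark rule)
print(indirect_costs("Madrid", ["Spain", "Europe"], 900.0, 400.0))         # 100.0 (Europe rule)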

Data-Based

The other major type of conditional formula is data-based. Formulas for prices, costs, and physical depreciation often vary as a function of some data-driven conditional. For example, product prices charged to regular customers may vary as a function of yearly cumulative sales for that customer. Electricity costs may vary as a function of the number of kilowatt hours consumed per month. And the physical depreciation of a bridge, as measured by the needed maintenance costs per month, may vary as a function of the average daily load on the bridge. Consider the formula for the pool of funds available for bonuses, which may be conditioned on the percentage variance in direct sales between actual and plan. A formula for this type of data-driven conditional would look something like the following:

“Bonus pool” , “variance of actual to plan direct sales” < 0.95, = 0

“Bonus pool” , “variance of actual to plan direct sales” > 0.95 and < 1.2, = 0.0001 * “total direct sales revenue”

“Bonus pool” , “variance of actual to plan direct sales” >= 1.2, = 0.1 * “total direct sales revenue”
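A minimal sketch of the bonus-pool conditional follows, in Python; the thresholds and rates are taken from the formulas above, and the handling of the 0.95 boundary (which the formulas leave unspecified) is an assumption.

def bonus_pool(actual_to_plan_variance, total_direct_sales_revenue):
    # Tiered, data-driven conditional: the rate depends on the variance value.
    if actual_to_plan_variance < 0.95:
        return 0.0
    if actual_to_plan_variance < 1.2:
        return 0.0001 * total_direct_sales_revenue
    return 0.1 * total_direct_sales_revenue

print(bonus_pool(0.90, 1_000_000))   # 0.0
print(bonus_pool(1.00, 1_000_000))   # 100.0
print(bonus_pool(1.30, 1_000_000))   # 100000.0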

Both position-based and data-based types of conditional formula exceptions are required for any reasonable OLAP solution.

Formula Triggering Conditionals

In addition to conditioning the right-hand side of a formula on specific data or position states, the triggering of a formula may also be conditioned on various system states. In an aggregation-style reporting application where many aggregates are created, a common business rule is to condition the triggering of an aggregation on the existence of a certain percentage of the underlying cells, defined either in terms of quantity or in terms of some weighting rule, that have data to be aggregated. For example, in an international economics application one might condition the triggering of a continent-level aggregation on the existence of more than 75 percent of the population-weighted countries having data. Those continents where less than 75 percent of the underlying country populations have data would show as missing. As is common with relational databases, formulas may also be triggered by the existence of certain events.

Overridden Formulas

Overridden formulas are akin to event-based conditionals where the event is the manual override of a derived value. Take, for example, the following schema:

(Time. ⊗ Geog.) ~ Sales

Imagine that you are the eastern regional sales manager and you need to submit your regional sales figures for the first quarter of this year. The sales model has been in production for several years and store-level data, once entered by the store managers, automatically rolls up to regions and countries. So at the regional level, sales is a derived variable as follows:

Sales, Geog.Region. = Sum (Sales , Geog.Store.)

Now imagine that all the stores in the company except for a few stores in your region have reported their sales and you have not implemented a missing data inferencing routine. You have a pretty good idea what the sales figures should be for the missing stores and are willing to input proxies for them, but you can’t alter the store-level base data. What’s a regional manager to do? Well, the astute manager would want to manually override the regional sales value for his or her region, thus temporarily blocking the aggregation of the store-level data from his or her region until all stores had reported. Then the overridden value could be aggregated with the rest of the regional sales values. Another reason why the regional sales manager might want to override an otherwise derived value is if it becomes known that some input value that can’t be changed in the short term was incorrect. Several OLAP tools support manual overrides.

Chained Formulas

It is common for analytical queries in business or other domains to be composed of recursive collections or chains of formulas. In words (and LC), examples include the following:

■■ A financial analyst who wanted to compare the aggregate price earnings ratios between two portfolios for a given year, given a holdings dimension with individual stocks as leaves and portfolios as (possibly overlapping) groupings of individual stocks:

“The ratio of overall P/E between portfolios one and two” =
Sum((P/E, Portfolio1.Stock.) * ((Number_of_Shares_Held * Price_per_Share), Portfolio1.Stock.)) ÷
Sum((P/E, Portfolio2.Stock.) * ((Number_of_Shares_Held * Price_per_Share), Portfolio2.Stock.))

■■ A sales analyst who wanted to compare the change in average earnings over the past year accruing to the top five selling products in each of the top five selling stores with the change in average earnings over the past year accruing to the five most average selling products for those same stores:

“Ratio of top selling to average selling products” =
( (Avg(Earnings), Time.Year.this, Product.(Sales.Drnk.(1 to 5).(Store, Sales.Drnk.(1 to 5), Store.all, Time.Year.this))) /
(Avg(Earnings), Time.Year.(this-1), Product.(Sales.Drnk.(1 to 5).(Store, Sales.Drnk.(1 to 5), Store.all, Time.Year.(this-1)))) ) /
( (Avg(Earnings), Time.Year.this, Product.(Sales.Distance_from_Mean.Arnk.(1 to 5).(Store, Sales.Drnk.(1 to 5), Store.all, Time.Year.this))) /
(Avg(Earnings), Time.Year.(this-1), Product.(Sales.Distance_from_Mean.Arnk.(1 to 5).(Store, Sales.Drnk.(1 to 5), Store.all, Time.Year.(this-1)))) )

Summary

In the first section of this chapter, you saw that formulas operate in a more abstract fashion in a multidimensional environment than in a traditional spreadsheet environment. And rather than attaching to columns as with SQL databases, multidimensional formulas can apply to any member or instance of any dimension in a model. Because there can exist one formula per member per dimension, frequently more than one formula applies to a particular cell. In these cases a precedence rule needs to be invoked to determine which formula (or ordering of formulas) will be evaluated for the cell. In the second section, you were exposed to various kinds of reusable multidimensional formula building blocks including aggregations, allocations, ratios, products, and cross-dimensional and cross-cube formulas, as well as the ways in which they get weighted, conditionally modified, overridden, and chained. The formula building blocks you saw and the ways in which they may be modified will be of value to you regardless of the particular OLAP environment you may work with.

CHAPTER 8

Links

Cubes need some linkage to the external data world if they are going to receive data. Those linkages may be as simple as a table import facility that is periodically invoked. All data processing products, from spreadsheets to statistics packages, have these capabilities. Unlike typical end-user applications, however, OLAP products are regularly called upon to process large amounts of periodically refreshed data. A manual approach to maintaining synchronicity between data sources and cubes is unacceptable in the long run. OLAP systems need to have the capability to establish persistent links with external sources of data (and dimensions and attributes), such that when changes occur in the external data sources or on a timed basis, they are automatically brought into the cube. Any dimension modifications or data calculations need to be automatically and incrementally performed.

Because OLAP products and applications are generally separate from the systems that input external data (in other words, data-generating events connect to a host of operational systems that in turn may connect to a data warehousing environment that finally connects to an OLAP system), links also serve as transformation functions. They indicate how to transform table- or spreadsheet-formatted data and meta data into dimensional data and meta data.

At the time of this writing, it is still the case that in a typical OLAP application where the data is not being sourced from a well-designed data warehouse, over half (and frequently two-thirds to three-quarters) of the total energy spent on building a solution is spent on designing and implementing the input links! This is because the way data is modeled in source systems may be quite different from the way it’s modeled in the OLAP system. As yet, no easy methods or tools exist for performing the necessary schema transformations.


Whether the tools that you use support links that you can just declare and have work, or whether you need to code the transformations by hand, the transformations would be the same. If your tool supports the ability to declare links without procedural code, then the functions it provides should map cleanly to the ones described here. If you need to code a script or program by hand, then the descriptions of link types can serve as its high-level design. Though I was involved in designing an OLAP product in the mid-90s that had as one of its goals the creation of extensive link capabilities, as of this writing, few OLAP tools offer such links.

Links can be read only or read/write. Although I focus here on the traditional read aspects of links, there is certainly a need for bidirectional links. As OLAP systems become more integrated with transactional systems, the results of analyzing data, such as finding an optimal price point for a product, will want to be fed back into operational systems, or at least the data warehouse, if there is one. Some products do support read/write links.

Links can be static or dynamic. Static links are not capable of processing changes made to the source. They are used only by the multidimensional tool when it loads or refreshes its information from the source. Dynamic links maintain an open connection between the source and the cube wherein changes to the source are propagated to the cube.

Links provide a persistent infrastructure for importing and exporting data and meta data. They vary as a function of the type of information brought into the cube and the type of data structure from which the information is obtained. The limiting factors for linking are the OLAP model’s ability to process small amounts of change efficiently and the link processor’s ability to detect changes efficiently. As illustrated in Figure 8.1, change may come from a variety of sources. The remainder of this chapter focuses on table links.

[Figure 8.1 shows a data cube (dimensioned by Store, Product, and Time) receiving changes from direct user interaction, transaction systems, a data warehouse, and spreadsheets.]

Figure 8.1 Changes may come from many sources.

Types of Links

There are three basic kinds of links:

■■ Structure links
■■ Attribute links
■■ Content links

A structure link is used to extract structural information for a dimension, identifying the members and their referential or hierarchical structure (if any). An attribute link is used to map attribute information to the members of a dimension. Content links are used to map data into the cube. Figure 8.2 shows a classification scheme for links. (They will be treated in depth later in this section, but Figure 8.2 should help you get a sense of them for now.)

Tables that contain structural information (member identity and hierarchical relationships) are linked to a dimension with structure links. A structure link connects a dimension with one or more structuring columns in a table. Two basic forms of such tables follow from the two basic types of hierarchies described in Chapter 5. In a parent-child table, a member may appear as the child of another member in one row and as the parent of other members in other rows. In a level table, there may be two or more columns that identify members at different hierarchical levels. Structure tables are described in more detail later in this chapter. It is possible (particularly for variables) that no hierarchy is specified. In this case, having all the members and being able to tell them apart is as much structure as is required from the table. Figure 8.3 shows structure links between a parent-child table representing the product dimension and the product dimension of the model. Note the hierarchical members of the product dimension. Figure 8.4 shows structure links between a lookup table and the levels of a dimension.

[Figure 8.2 classifies links as follows: structure links (dimension-oriented), with parent-child, flat, and level forms; attribute links (dimension-oriented); and content links (cube-oriented), which split into identifier links (row-member, table-member, column-member) and data (cell) links.]

Figure 8.2 Classification of link types.


Hierarchical dimension meta data:

All products
    Liquids: Wine, Juice, Coffee
    Solids: Caviar, Carrots

Parent-child table form for hierarchical dimension meta data:

Parent          Child
All products    Liquids
All products    Solids
Liquids         Wine
Liquids         Juice
Liquids         Coffee
Solids          Caviar
Solids          Carrots

Figure 8.3 Parent-child structure inputs to hierarchical dimension of an OLAP model.

Product Hierarchy (level links connect each level to a column of the lookup table):

All Products:    All products
Prod. type:      Liquids, Solids
Prod. name:      Wine, Juice, Coffee, Caviar, Carrots

Product lookup table:

Prod. ID    Product name    Product type    Product color    Product package
1           Wine            Liquids         Red              Bottle
2           Juice           Liquids         Orange           Carton
3           Coffee          Liquids         Brown            Vacuum
4           Caviar          Solids          Black            Jar
5           Carrots         Solids          Orange           Cellophane

Figure 8.4 Level links.
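A minimal sketch of what a structure link must accomplish, in Python with pandas: derive members, leaves, and lineage from the parent-child table of Figure 8.3. The function names are hypothetical, and a real link processor would also handle flat and level tables.

import pandas as pd

parent_child = pd.DataFrame({
    "parent": ["All products", "All products", "Liquids", "Liquids",
               "Liquids", "Solids", "Solids"],
    "child":  ["Liquids", "Solids", "Wine", "Juice", "Coffee", "Caviar", "Carrots"],
})

parent_of = dict(zip(parent_child["child"], parent_child["parent"]))
members = set(parent_child["parent"]) | set(parent_child["child"])
leaves = members - set(parent_child["parent"])      # members that never appear as parents

def lineage(member):
    # Walk up the hierarchy from a member to the root.
    path = [member]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

print(sorted(leaves))      # ['Carrots', 'Caviar', 'Coffee', 'Juice', 'Wine']
print(lineage("Coffee"))   # ['Coffee', 'Liquids', 'All products']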

Tables containing information other than structural relationships that are associated with members can have that information brought into the OLAP model through attribute links. Frequently, this attribute information will be part of a meta data table that also contains structural information, and the one table would be connected to the multidimensional model with both types of links. Each attribute link will connect an attribute associated with the members of a dimension to a column identifying the members and a column identifying values for the attribute. Attribute links are shown in Figure 8.5.

Tables containing values for variables in the hypercube’s dimensional space need to be attached with content links. While the structure and attribute links connect a dimension and a table independently of any cube, the content links all connect to a table in the context of a cube. Two different types of content links are required to connect a table to a cube because different links extract member identifiers and variable values from rows. Figure 8.6 shows the content links between a multidimensional model and an input table.

Tables may require aggregation prior to entry in a cube. For example, a multidimensional model that has weeks as its lowest level of time may need to link to a data source that has daily summaries or individual transaction information. In this case, the necessary aggregations would be included in the overall link information. In general, if there end up being duplicate combinations of members in the rows, the data-bearing fields should be aggregated. For example, take a cube with sales variables and store, time, and product dimensions, and a linked data table containing transaction number, store, time, product, quantity, and sale price. One would not typically link the transaction number field into the OLAP model. (Although there are times when you want to maintain transaction-level connections, which I describe in the application section of this book within the context of BatchID tracking.) Because one would expect more than one sales transaction for many store, day, and product combinations (there may be six different records that all contain Store 3, May 11 2001, Dom Perignon), one would want to specify the summing of the quantity and averaging of the sale price fields as part of the links.

Figure 8.5 Attribute links. (Attribute links connect the Product color and Product package columns of the product lookup table to the Color and Package attributes of the product dimension's members, with the Product name column identifying the members.)

Figure 8.6 Content links from an input table to a cube. (The input table has columns Time, Store, Product, Cash sales, and Credit card sales, with rows such as January/1/wine/$500/$750; content links map these rows into the store 1 cash and credit card sales cells of the cube by month and product.)

It is also possible to link data and hierarchical meta data from the same table, as illustrated in Figure 8.7. Certainly, if this is the form of your data tables, a multidimensional tool should be able to make use of the information. However, it is not recommended to maintain your data tables in this form because any organizational changes to your source data, such as reorganizing your product groups or changing the reporting structure for the corporation, will result in a need to define new links between the input tables and the multidimensional cube. This occurs because the columns (that is, stores, cities, and regions) define the particular form of the hierarchy. Any change to the hierarchy, such as adding a new product group, needs to be reflected in a change to the column structure of the input tables, which in turn requires a redefinition of the links between the cube and the input tables. It does provide a convenient form for packaging a client-sized cube of data, though.

Even if the table is providing multiple kinds of information, each link is to one kind of information only (structure, attributes, or content). One reason for this is that a structure link is used to define the members of a dimension, while the attribute and content links extract information for members already in a dimension. Procedurally, the dimension must be constructed before data values can be identified with its members. In some notation, a single construct could represent both, but we would say that it represents the combination of a separate structure link and content link.

Figure 8.7 Linking data and meta data from the same table. (A single table supplies the Store dimension levels (Store, City, State), the Time dimension levels (Week, Month, Quarter), and the data (Sales, Costs).) A sample of the table:

    Store  City  State  Week  Month  Qtr  Sales  Costs
    1      LA    CA     1     Jan    1    500    400
    1      LA    CA     2     Jan    1    450    400
    1      LA    CA     3     Jan    1    300    400
    1      LA    CA     4     Jan    1    400    350
    2      NYC   NY     1     Jan    1    450    400
    2      NYC   NY     2     Jan    1    300    350

Structure Tables and Links

Because the variety of links mirrors the variety of source data forms, I frequently discuss link types and data source form types together. The two basic forms of structure tables and links, as mentioned previously, are parent-child and level.

A parent-child table can define any type of member hierarchy, symmetric or asymmetric, and it is the easiest type of table for maintaining dynamic meta data definitions. All the information about the hierarchical structure of the dimension is defined in terms of the values of rows in the parent-child table, so a hierarchy can be completely rearranged just by tweaking the rows of the table. (The dimension tables in a star schema may contain a column for parent member, providing the necessary information for a single hierarchy. In this case, they function as a parent-child table as well as providing other attribute information.)

A parent-child table connects to a dimension via a single parent-child structure link between the dimension and the parent and child columns. Variations on the basic parent-child link can occur, as required by specific tools. For example, if a tool required hierarchies to be named, then each link would also need to connect to the named hierarchy of the dimension (if all rows of the table were for the same hierarchy) or to a column of the table whose field values contain the appropriate hierarchy name.

Recall the discussion of dimension hierarchies in Chapter 5. Note that parent-child relationships alone do not indicate semantic levels such as month/year or store/city. They can indicate that January, February, and March are all children of Quarter One; but they cannot define January, February, and March as months that collectively share a relationship with quarters and with days. Thus, the table columns in a pure parent-child table cannot fully specify levels in a multidimensional model.

A level table is a table of model structure information (such as a lookup or dimension table of the kind we saw in Chapter 2) that contains separate columns wherein each column is connected to a different level in a leveled hierarchy. For example, separate columns for City, State, and Region, or Company Name, SIC Code, and Industry can clearly identify not just hierarchical relationships between members, but also how the levels relate to each other. In this type of table, the names of the columns involved in identifying hierarchy levels will correspond to the levels in the multidimensional model. While level tables are flexible enough for moving members around in a given set of levels, they are less flexible than parent-child tables for adding and removing levels of hierarchy. Of course, the flexibility of the parent-child tables comes at the expense of not being able to express interlevel relationships adequately.

Tables containing named levels need level structure links, one link for each level. If the levels in the table do not map into named levels in the dimension, but simply into a member hierarchy, then each link connects from a level's column to the dimension. If the levels in the table correspond to named levels in the dimension, then each level link connects from a level's column to the corresponding level in the dimension.

Occasionally, a dimension simply won't have any hierarchy (for example, a dimension of scenarios, or sales channels, and so on). If it is necessary to link this dimension to a table, then a level link for a single level will suffice. For a table containing only two named levels, like county and state, that is connected to a dimension without named levels, parent-child and level links can be used interchangeably.

The hierarchy pseudolevels described in Chapter 5 may be represented in a parent-child link table because they are described in terms of parents and children. Defining pseudolevels through parent-child links may significantly add to the number of parent-child records in the table because each member that is part of a pseudolevel will have a record for its connection to the pseudolevel root member as well as a record for its referential hierarchy. (Fortunately, using encoded values to identify members means each parent-child record is quite small.)
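As a rough sketch of how a link processor might consume the two structure-table forms, the following Python fragment builds a parent-to-children map from a parent-child table and derives (level, member, parent) triples from a level table. The table layouts echo Figures 8.3 and 8.7; the function names and row formats are assumptions for illustration only.

    from collections import defaultdict

    def load_parent_child(rows):
        """rows: iterable of (parent, child) pairs, as in Figure 8.3."""
        children = defaultdict(list)
        for parent, child in rows:
            children[parent].append(child)    # the hierarchy is wholly defined by row values
        return children

    def load_level_table(rows, level_columns):
        """rows: dicts keyed by column name; level_columns: ordered list such as
        ["State", "City", "Store"], outermost level first (Figure 8.7 style)."""
        members = set()                       # (level, member, parent) triples
        for row in rows:
            parent = None
            for level in level_columns:
                member = row[level]
                members.add((level, member, parent))
                parent = member               # the next column nests inside this one
        return members

    # Using the product rows of Figure 8.3:
    product_rows = [("All products", "Liquids"), ("All products", "Solids"),
                    ("Liquids", "Wine"), ("Liquids", "Juice"), ("Liquids", "Coffee"),
                    ("Solids", "Caviar"), ("Solids", "Carrots")]
    hierarchy = load_parent_child(product_rows)   # hierarchy["Liquids"] -> ["Wine", "Juice", "Coffee"]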

Data Tables and Content Links

Four different types of content links are required to connect any table to an OLAP schema for the purpose of mapping data into the cube:
■■ Row-member
■■ Column-member
■■ Cell
■■ Table-member

Row-Member Links

Row-member links are required to associate member names contained in each row of the table with the right members in the model. A row-member link connects a dimension of the cube and a column in the table whose field value in each row contains the name of a member in that dimension. If the dimension is a variable dimension, then it is quite likely that the table is a type zero table and that a cell link would also be used (see Figure 8.10, later in this chapter). It is also possible to see a type one table with the variable names occupying their own column. It is most likely a grave semantic error to make more than one member link between a particular field and any cube, as this would mean that the data is associated with members in two dimensions that share the same name.

Variations on the basic member links may be required from time to time. For example, denormalized data warehouse summary tables may contain multiple levels of members, one per column: Company, SIC Code, and Sector would each be in their own column. Rows containing company-level summary data will have values in each of these columns. Rows containing an SIC-code-level summary will have a NULL company name and values for each SIC code and sector, and rows containing sector-level summaries will have NULL in both the Company Name and SIC Code fields. There may be an additional column that gives the name of the level to which the summary data pertains, which would assist a SQL programmer in querying the table. The column containing the relevant member name varies from row to row in the table, and a member link that conditionally examines fields (perhaps based on the level name contained in another field) would be used.
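A conditional member link of that kind might be resolved per row roughly as below; the column names follow the Company/SIC Code/Sector example, NULLs are assumed to arrive as None, and the function is purely illustrative.

    def resolve_summary_member(row):
        """Return (level, member) for a denormalized summary row by taking the
        deepest non-NULL level column."""
        for column in ("Company", "SIC Code", "Sector"):   # deepest level first
            if row.get(column) is not None:
                return column, row[column]
        raise ValueError("row names no member at any level")

    # resolve_summary_member({"Company": None, "SIC Code": "7372", "Sector": "Technology"})
    # -> ("SIC Code", "7372")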

Column-Member Links

Whereas row-member links associate columns with dimensions and the field values with members, a column-member link is required to associate a table's column with a particular variable and its field values with values for that variable. Each column-member link will connect a member of a dimension and a column in a table bearing data for a particular variable. Examples of such columns would be ones named Number of Units Sold, Total Sales, and so on. Column-member links are used to identify the data-bearing fields in type one or type two tables. Column-member links and cell links are mutually exclusive (because a table cannot be both a type zero and a type one); if a cube uses even one column-member link to connect to a table, then it cannot use a cell link, and vice versa.

If a table has columns that do not correspond to one variable, but rather to a multidimensional intersection (type two tables), then one column-member link per dimension is required. For example, to link a table containing the columns Actual Sales, Planned Sales, Actual Costs, and Planned Costs to a model that had a scenario dimension with Actual and Planned members requires two column-member links for each of these columns. Each column would have a column-member link to the appropriate scenario member and a column-member link to the appropriate variable.

Figures 8.8 and 8.9 show row-member and column-member links connecting two different types of type one tables to a cube.
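The effect of the two column-member links per column in the type two case can be sketched as follows: one input row is unpivoted into several identified cell values. The column-to-member mapping restates the Actual/Planned example above; the function and data layout are hypothetical.

    # Each data column of the type two table carries two column-member links:
    # one to a scenario member and one to a variable member.
    COLUMN_LINKS = {
        "Actual Sales":  {"Scenario": "Actual",  "Variable": "Sales"},
        "Planned Sales": {"Scenario": "Planned", "Variable": "Sales"},
        "Actual Costs":  {"Scenario": "Actual",  "Variable": "Costs"},
        "Planned Costs": {"Scenario": "Planned", "Variable": "Costs"},
    }

    def cells_from_type_two_row(row, row_member_columns):
        """row: dict of column name -> field value; row_member_columns: the columns
        covered by row-member links, e.g. ["Time", "Store", "Product"]."""
        base = {column: row[column] for column in row_member_columns}   # members named in the row
        for column, members in COLUMN_LINKS.items():
            yield dict(base, **members), row[column]                    # one cell value per data column

    # One row therefore yields four identified values, such as
    # ({"Time": "1-1-96", "Store": 1, "Product": "jeans",
    #   "Scenario": "Actual", "Variable": "Sales"}, 120)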

Figure 8.8 A typical type one table attached to a cube by row-member and column-member links. (Row-member links connect the Time, Store, and Product columns to their dimensions; column-member links connect the Sales and Cost columns to the Sales and Costs members of the variable dimension, which also contains Variance.) The input table:

    Time    Store  Product  Sales  Cost
    1-1-96  1      jeans    $150   $100
    1-1-96  1      shirts   $300   $250
    1-1-96  2      jeans    $150   $120
    1-1-96  2      shirts   $250   $240

Figure 8.9 A less typical type one table attached to a cube by row-member and column-member links. (Row-member links connect the Time, Store, Product, and Variable columns to their dimensions; column-member links connect the Actual and Plan columns to the Actual and Plan members of the scenario dimension, which also contains Variance.) The input table:

    Time    Store  Product  Variable  Actual  Plan
    1-1-96  1      jeans    sales     $120    $100
    1-1-96  1      shirts   sales     $300    $250
    1-1-96  2      jeans    sales     $150    $200
    1-1-96  2      shirts   sales     $250    $250

Cell Links

Sometimes, the column bearing data in a table will not correspond to any dimension that you have defined in the model, and all of the dimensions of the cube link to the table with member and table-member links (see Figure 8.11). Even the variable names are in a column. This would correspond to a type zero table. In order to associate the values with the variables, you need to create a cell link. A cell link connects the cube as a whole to a table column whose field values are data for cells that are otherwise completely specified by the other links. (Logical implications of this were explored in Chapter 4.) Figure 8.10 shows a type zero table connected to a cube via row-member links and a cell link.

Table-Member Links

One more type of link is required to connect data-bearing tables to a cube: a link that represents information that isn't in the table at all! Let's call this a table-member link. What kinds of information wouldn't be in a table? Well, a data table extracted from an operational system will generally contain actual values from the operations. (I hope!) If this is feeding into a planning cube that contains a scenario dimension having Plan, Actual, and Variance members, then the entire table is associated with the Actual member in that cube. But there is nothing in the table, probably not even in its name, that we can use to assign it to its place in the model. Hence, we need this fourth type of link. It may be that more than one table-member link is required to connect a table to a cube.

Figure 8.10 A type zero table attached to a cube by row-member links and a cell link. (Row-member links connect the Time, Store, Product, and Variable columns to their dimensions; a cell link connects the Value column to the cube's cells.) The input table:

    Time    Store  Product  Variable  Value
    1-1-96  1      jeans    sales     $120
    1-1-96  1      shirts   costs     $100
    1-1-96  2      jeans    sales     $300
    1-1-96  2      shirts   costs     $250

Figure 8.11 A type zero table completely connected using a table-member link. (The same type zero table as in Figure 8.10 is connected by row-member links and a cell link, with a table-member link attaching the whole table to the Actual member of the scenario dimension, which also contains Plan and Variance.)

For example, each table may contain the actual data for an individual line of business within the corporation, and the cube consolidates their results into the planning model. Figure 8.11 shows a type zero table of actual values connected to a cube containing scenarios using a table-member link.

Preaggregation

If the table being linked needs to be aggregated during the process of reading, then the aggregation can be specified as an extension to the linking process. The aggregation should be specified as part of the linking to ensure that the same aggregation is performed each and every time the links are used. However, the aggregation operation to be performed is not strictly a part of the link. Think of it as an attachment to the linked field or fields. All that needs to be specified is the aggregation operator (sum, average, and so on) on each field to be aggregated. The fields to be aggregated are the data fields; if you have already established links, they are the ones with either a column-member link or a cell link.

The reason aggregation operations need to be kept separate from individual links is most apparent when considering the column-member links of a type two table. A type two table will have two column-member links connecting each table column to members in two different dimensions, and it would be inconsistent to specify one preaggregation for one dimension and a different preaggregation for the other. What is being aggregated is the data of a column. It makes more sense to attach the aggregation to the columns of the table rather than to any specific link.
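For the store/time/product example above, the preaggregation attached to the linked data columns amounts to something like the following, expressed here with pandas; the column names and values are illustrative.

    import pandas as pd

    # Transaction-grain source: several rows can share a store/time/product combination.
    transactions = pd.DataFrame({
        "store":      [3, 3, 3],
        "time":       ["2001-05-11"] * 3,
        "product":    ["Dom Perignon"] * 3,
        "quantity":   [2, 1, 4],
        "sale_price": [95.0, 99.0, 92.0],
    })

    # The aggregation operators attach to the data-bearing (linked) columns:
    # sum the quantity, average the sale price.
    preaggregated = (transactions
                     .groupby(["store", "time", "product"], as_index=False)
                     .agg({"quantity": "sum", "sale_price": "mean"}))
    # The cube now receives one row per store/time/product combination.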

Summary

Links between multidimensional decision support and source systems are a fact of life. They define the method for maintaining a persistent (uni- or bidirectional) connection between a multidimensional schema and changing external data sets. I discussed content links, structure links, attribute links, and mixed-type links. I focused on the import capabilities of links in this chapter because they are the most critical and because most commercial OLAP environments are import oriented. Nevertheless, links will become more bidirectional as the results of multidimensional processing get automatically fed back into production databases. Although there is nothing inherently multidimensional about links, the need to have robust dynamic links and the ability to incorporate incremental changes efficiently into an OLAP schema are important features for successful multidimensional systems. Multidimensional systems deal with sufficiently large quantities of data that a manual approach to maintaining synchronicity between data sources and schemas is unacceptable in the long run. As of the time of this writing, most of the multidimensional vendors were working hard to improve the seamlessness of the links between their schemas and the typical relational database as source.

When a linked data table is read, the process of using the links can be summarized as follows: Any table-member and column-member links determine a set of model members that stays the same throughout the entire reading of the table. These members may be extracted from the model side of the link for use in identifying values of variables. Then, for each row, the fields for each column linked by a row-member link are examined and the members that they name are determined. If the table has a cell link, then that column contains a single identified value for the variable identified through one of the row-member links, and the variable's value can be extracted from this field. If the table has column-member links instead, then for each column containing a column-member link, the members for its link (or links, in the case of a type two table) are collected, and an identified value for a variable can be extracted from the column's field. (A code sketch of this reading loop appears at the end of this summary.)

A schema cannot be considered fully linked to a data table until the following occurs:
■■ Each dimension of the schema either is directly connected to a column or has some member connected to either a column or to the table itself.
■■ Each meta data field and each data field of the table, deemed relevant to the schema, has been connected.

Given the types of links discussed in this chapter, you should be able to describe how to move virtually any type of data from tables to multidimensional models (or populated schemas).
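The reading process just summarized can be restated as a code sketch. It assumes a dictionary-per-row table, a variable dimension named "Variable", and a cube object with a write(coords, variable, value) method; none of these names belong to any particular product.

    def read_linked_table(rows, table_member_links, row_member_links,
                          column_member_links, cell_link, cube):
        """table_member_links: dimension -> member fixed for the whole table.
        row_member_links: dimension -> table column naming that dimension's member.
        column_member_links: data column -> {dimension: member} pairs (empty for type zero).
        cell_link: the data column of a type zero table, else None."""
        fixed = dict(table_member_links)                      # constant for the entire read
        for row in rows:
            coords = dict(fixed)
            coords.update({dim: row[col] for dim, col in row_member_links.items()})
            if cell_link is not None:
                variable = coords.pop("Variable")             # named by a row-member link
                cube.write(coords, variable, row[cell_link])
            else:
                for col, members in column_member_links.items():
                    cell_coords = dict(coords, **members)     # add the column's linked members
                    variable = cell_coords.pop("Variable")    # the column's variable member
                    cube.write(cell_coords, variable, row[col])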

CHAPTER 9

Analytic Visualization

Data visualization is a complex multifaceted subject. On the technology end, data visualization includes
■■ Graphics accelerators
■■ Graphics libraries
■■ Graphic controllers
■■ Graphic development environments
■■ Graphic applications

On the research end, it includes
■■ Visual perception
■■ Rendering and representation algorithms
■■ Exploratory data analysis

Many excellent articles and books have been dedicated to it, a number of which are cited in this chapter. The reason I included a section on data visualization in the first edition of OLAP Solutions, and the reason I include a substantially expanded treatment of visualization here, is that for most, if not all, decision support applications data visualization is an extremely valuable resource. You need to tap into data visualization if you are going to get the most value out of your applications (with the ultimate goal of making better decisions faster).


■■ Appropriate visualization can improve your ability to discover patterns or relationships in your data, which leads to better predictions that, when combined with the goals of the business, lead to more successful business decisions as measured by the degree to which the goals are achieved.
■■ For all the available literature on data visualization, there is still a lack of objective information about the appropriate uses of data visualization for decision support applications. (Supporters, frequently vendors, tend to exaggerate its usefulness.)

Consider how quickly we visually recognize common everyday objects. We identify doors, hallways, furniture, people, and places with ease. To our visual cortex, these objects of everyday experience are actually complex visual patterns that our brains discriminate from vast amounts of raw visual input equivalent to many megabytes of numeric or textual data per object. Luckily our brains are much faster at processing visual information than numeric data, or we could spend days just trying to recognize ourselves in the mirror.

If our brains are so adept at processing visual information, why do we still spend so much time in the business world thinking with numbers? Is it that some tasks are better performed with numbers as opposed to charts and graphs? Are we missing huge opportunities to operate more efficiently? As you'll discover by reading this chapter, both answers are yes. This chapter will answer the following questions:
■■ What is visualization anyway?
■■ When is visualization appropriate?
■■ How does visualization work to convey meaning?
■■ What are typical decision-oriented things to visualize?
■■ How do you select the appropriate visualization for the job?
■■ How does one visualize higher-dimensional data?
■■ What are some typical business examples of higher-dimensional data visualization?

Along the way, I've included a variety of images selected from several of the popular visualization packages available on the market. On my companion web site, you will find a resource page pointing you to most of the visualization vendors with whom I am familiar.

What Is Data Visualization?

Before jumping into the topic of data visualization, I want to be clear about what I mean when using the term visualization or data visualization. Towards that end, it is useful to distinguish between two broadly different types of visualization: analog visualization and symbolic visualization.

The key factor with analog visualization is that the image or visual representation is an analog or topology-maintaining transformation of the original object or event that is represented. Photographic images, such as Figure 9.1, and control windows are examples of analog visualization. Because they attempt to capture as much visible detail as can be seen in the original thing or event, we call them realist.

Maps, such as the one in Figure 9.2, are also classic examples of analog representation, but because they only attempt to capture certain details or features (such as city locations and the boundaries between land and sea), we call them abstracted.

Figure 9.1 Photographic image of FCI’s Lulu and Thor.

Figure 9.2 Map of Greek city states and their neighbors during the time of Herodotus.

Although neither form of analog visualization corresponds to what we typically think of when we use the term data visualization, maps are useful for business analysis of spatial relations such as the selection of new sites for retail outlets, and thus we treat them in this chapter as a part of the overall data visualization arsenal. Realist analog representations include medical imaging, product display, environmental monitoring, and package design.

Most standard business visualization, including bar charts, line graphs, and pie charts, falls under the category of basic symbolic representation. For example, although the pie chart representation of a corporation's assets in Figure 9.3 preserves critical information (namely, the ratios between the assets), the asset ratios do not literally exist next to each other, much less in a circular shape as depicted in the pie chart. This is why the pie chart is called a symbolic representation.

The rich 3D representation of web site activity in Insert I.1 is also symbolic, but the image itself is more complex in that it shows substantially more information than the pie chart in Figure 9.3. Likewise, other complex renderings such as parallel coordinates, nested charts, multiattribute pie charts, and 3D glyphs fall under the guise of rich symbolic representations.

Thus, this chapter focuses on symbolic representations, both basic and sophisticated, and abstracted analog representations. Realist analog representations, since they are rarely used in general business applications, are not discussed.

Figure 9.3 Pie chart.

The Semantics of Visualization

In Chapter 3, you saw how higher-dimensional data sets could be represented in terms of multidimensional matrices. Although multidimensional grid-based views are more powerful and more flexible than their spreadsheet antecedents, they are still composed of grids of data, primarily numbers. Grid-based number representations satisfy many business uses where the emphasis is on knowing how much or on seeing exact values; in essence, traditional reporting.

Whether in business or in science, traditional analyses are concerned with such things as finding upper and lower ranges (such as top and bottom quartiles), group attributes (such as means and modes), correlations, and trends. When end-user emphasis is on performing analysis, especially exploring relationships within the displayed data, graphical views are more popular and more productive than numeric views. Why do graphical displays work better than tabular, numeric ones for exploring analytical relationships? This section explores the difference between graphic and tabular visualization.

Graphic versus Tabular-Numeric Representation

The term graphical has been used in a variety of ways, so I need to spend a little time distinguishing these meanings before defining the relationship between graphical and tabular displays.

Edward Tufte, in his seminal book The Visual Display of Quantitative Information, defines data graphics in the following way: “Data graphics visually display measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading and color.”1 In other words, for Tufte, everything visual qualifies as graphics (not necessarily good graphics, but graphics nonetheless). Although I agree that all forms of visual expression should come together in the graphical presentation of information, this definition is too broad to help us distinguish graphic displays, for which we all have an intuitive feel, from tabular displays.

A little less broadly, you may interpret graphics as visual displays that contain no numbers or words. Although numbers and words frequently make their way into the header and legend of a graphic, this definition holds for the body of most business graphics. Pie charts, bar charts, line graphs, and the like are composed of lines, points, colors, symbols, and shapes. We will come to use this definition by the end of the section, but to understand the essence of why graphics work, we need to begin with a more restrictive definition.

In its most restrictive form, graphics may be interpreted as visual displays that contain only points and sets of points (such as lines, shapes, areas, and volumes). Graph theory operates within these bounds, as do unadorned maps and typical geometric plots. We will begin our discussion with this definition because the essence of what makes graphical displays better for certain types of information may be found by exploring the properties of points and lines.

Points and Lines

To see how much easier it is to spot analytical patterns when data is graphically portrayed, and to see why this is so, consider Figures 9.4 and 9.5. Each of the figures is divided into two panels: an a panel that represents a tabular display and a b panel that represents a graphic display of the same data.

Figure 9.4 shows sales data in conjunction with cost data for the purpose of break-even analysis. See how much easier it is in the b panel to identify the changing relationship between sales and costs over time when the information is portrayed graphically. Although your eye might be able to scan the table in Figure 9.4 (a) to discover that break-even occurred in July, it is unlikely that you would notice from just looking at the numbers that at the break-even point, costs were leveling off while sales were accelerating.

It is easier to describe the relationship between sales and costs over time from a graphic perspective because with the graphic display, the relationship information is explicitly given. For example, a natural description of the line segment between points P and Q on the cost curve in Figure 9.4 (b) would say something like "Costs are continuing to increase, but at a diminishing rate of increase." You would never see this relationship at a glance from looking at the rows between P and Q in the table in Figure 9.4 (a).

In contrast to looking for relationship information, if your goal were to know for any particular time the exact value of sales or costs, that information would be more immediately given with a tabular-numeric display. For example, to see what the company costs were in September, you would have to draw a line from September to the cost curve to find the cost point for September (shown by the dotted line in panel b), and then draw another line from that point on the cost curve to the y axis to see the dollar value of the costs at that time. In the tabular display, you would locate the row for September and immediately read the associated cost value.

Figure 9.4 Break-even analysis. (Panel a is the data table below; panel b plots the sales and costs curves over time, marking points P and Q on the cost curve and the profit and loss regions around the July break-even.)

    Time  Sales  Costs
    Jan    480    593
    Feb    490    605
    Mar    510    630
    Apr    540    655
    May    570    670
    Jun    620    680
    July   690    690
    Aug    770    695
    Sept   860    700
    Oct    960    707
    Nov   1070    715
    Dec   1180    725

Figure 9.5 Cities with high sales growth. (Panel a is the table below; panel b plots the same cities on a map of the United States.)

    City          Latitude  Longitude
    Atlanta       33.44 N    84.23 W
    Boise         43.36 N   116.12 W
    Eureka        40.48 N   124.09 W
    Jacksonville  30.19 N    81.39 W
    Miami         25.46 N    80.11 W
    Mobile        30.41 N    88.02 W
    New Orleans   29.57 N    90.04 W
    Portland      45.31 N   122.40 W
    Seattle       47.36 N   122.19 W
    Spokane       47.39 N   117.25 W

Let us look at another example. Figure 9.5 shows cities in the top quartile of sales growth with their corresponding longitude and latitude. Once again, simple inspection of the tabular view in the figure does not reveal any pattern to the cities with sales growth in the top quartile. In contrast, with the graphical view panel, recognizing the pattern (the location relationship between the cities) that sales are growing fastest in the Northwest and Southeast is trivial.

Why is it so much easier to see location relationships with a graphic map display than with a tabular numeric display? There are three reasons:
■■ The map display uses locations on the computer screen to display location information about the data.
■■ The dimensionality of the location information about the data is preserved in the dimensionality of the location information on the computer screen. In other words, the two-dimensional longitude/latitude relationship between cities is preserved in a two-dimensional map display, but lost in a one-dimensional tabular one. For example, with the map display, moving vertically on the screen corresponds to moving in latitude; moving horizontally corresponds to moving in longitude. With the tabular display, however, there is no correspondence between moving up and down rows and moving in either longitude or latitude. This dimensional preservation is sometimes (and in this book will be) called an analog, as opposed to symbolic, representation. See the sidebar "Graphical Misrepresentations" for more information.
■■ The quantitative properties of the map display, such as the relative distance between points, which can be processed with minimal effort by the brain, correspond to quantitative properties of the data. For example, the distance between Jacksonville and Atlanta, which appears to be about the same on the map as the distance between Jacksonville and Miami, represents the fact that those distances are, in reality, roughly equivalent.

Summarizing this short excursion into the difference between graphic and numeric displays, we can say that the meaning of a graphical token (in the restrictive sense described earlier) is its location on the display screen. The meaning has nothing to do with the content or form of the token. In other words, each point, whether in isolation as in Figure 9.5 (b) or as a part of a line segment as in Figure 9.4 (b), has the same content or internal structure or form. Every point is the same as every other point. The difference in meaning between the points comes from the location of the point on the screen.

In contrast, the meaning of a numerical token is entirely dependent on the form of the token and has nothing to do with its placement on the screen. Every number token is distinct from every other. In other words, what makes the number 2 different in meaning from the number 1 or the number 3 is just that the form or shape of the number 2 is distinct. It doesn't matter where the number 2 is located; it always means 2.

Thus, there are two fundamentally different ways of using tokens to represent or convey meaning on a display screen: using screen location and using screen content or form. Numbers convey meaning by virtue of their form. Graphic tokens convey meaning by virtue of their location. A token may represent two fundamentally different types of expressions: value and relationship. As we saw in Figure 9.4, numeric tokens represent their values more directly than the relationships between their values, while graphic tokens represent the relationships between their values more directly than their actual values. Figure 9.6 summarizes these key distinctions.

Colors and Shapes

Colors and shapes, although frequently associated with graphical displays because they show relationship information more easily than value information, work like numbers in that they derive their meaning more from their form on the screen (that is, the particular color or particular shape) than from their location. They were left out of the initial discussion because they have attributes of both numbers and graphics, as shown in our next example.

Figure 9.7 shows another map of the United States, this time colored on a state-by-state basis according to per-capita income quintiles. (In the world of geographic information systems, or GIS, these are called thematic maps.) As stated in the legend, the lowest quintile is shown by the lightest shade of gray, the next lowest quintile by the next lightest shade of gray, and so on.

Figure 9.6 The key distinctions between numeric and graphic displays. (A two-by-two grid of location-based versus form-based meaning against representing values versus representing relationships: points and lines carry location-based meaning and represent relationships; numbers carry form-based meaning and represent values.)

Figure 9.7 U.S. per capita income by state (shades based on quintiles). (A map of the United States shaded state by state, with a legend running from Quintile 1 through Quintile 5.)

GRAPHICAL MISREPRESENTATIONS

A serious problem associated with graphical representations is the ease with which they can misrepresent the underlying data.3 Two major causes of misrepresentation are described here, and both are avoidable.

1. Collapsing multiple data dimensions into a single location-based display dimension, as shown in Figure 9.8. When the graph and the data do not share the same structure, the graph can create false impressions about the data. The apparent periodic dip in sales across the x axis in Figure 9.8 (b) is caused by the collapse onto one dimension of two dimensions' worth of locator information: stores and months. By representing the data in two locator dimensions as a series of line graphs with one for each store, as shown in Figure 9.8 (c), or as a plane in the z dimension, as shown in Figure 9.9, it is easy to see a constant upward trend over time for all stores.

2. Using any display technique, especially areas, that changes apparent value in amounts or proportions that are different from the changes to the underlying data. Tufte illustrated his book with many fine examples of misleading graphics. Figure 9.10 purports to show changes in sales of apples using the height of an apple to represent the sale of apples. Note how much larger the apple looks for 1996 than for 1995 even though sales just doubled. That is because the area of the apple quadruples when its height doubles.




Figure 9.8 False graphical impressions. (Panel a is the data table of store and week costs below. A straightforward graphing of the data in (a) produces the single chart of labor, material, and overhead costs by store-week shown in (b); a more meaningful picture, one chart of weekly costs per store, is presented in (c).)

    Store    Week    Labor  Material  Overhead
    Store A  Week 1   70     50        12
    Store A  Week 2   72     55        12
    Store A  Week 3   73     60        17
    Store A  Week 4   74     65        20
    Store B  Week 1   50     45        15
    Store B  Week 2   52     47        15
    Store B  Week 3   53     51        18
    Store B  Week 4   55     53        18
    Store C  Week 1   25     30         5
    Store C  Week 2   30     36         6
    Store C  Week 3   35     41         7
    Store C  Week 4   40     47         8


Figure 9.9 Using the z dimension. (A surface of sales plotted over a time axis, January through May, and a store axis, s1 through s5.)

Figure 9.10 Beware of the big apple. (Apple icons scaled by height depict sales of $1 million in 1995 and $2 million in 1996; the taller apple's area grows fourfold.)

Light gray in Georgia means the same as light gray in Montana. The meaning of each shade is independent of the location of the shade on the map. Instead, it is entirely a function of the shade itself. The shades are directly analogous to groupings or low-order numbers, in this case quintiles. However, unlike numbers, it is easy to detect relationships between states by looking at shades. In this example, it is easy to see that zones of high per-capita income occur in clusters along the East Coast, the West Coast, and the Great Lakes. In this sense, shades are analogous to points and lines. Figure 9.11 shows a version of Figure 9.6 updated to reflect the properties of shades, shapes, and symbols.

Looking back over the previous figures, you should notice that although Figures 9.4 (b) and 9.5 (b) were both graphical, they seemed to convey different kinds of information. Figure 9.4 (b) answered a comparison question. It used two lines in an xy plot to accomplish that. In addition to xy plots, bar graphs and pie charts (like the ones in Figures 9.12 and 9.13) are useful for showing comparisons between values.

Figure 9.11 The key distinctions between numeric, graphic, and colored displays. (The grid of Figure 9.6 updated: points and lines carry location-based meaning and represent relationships; numbers carry form-based meaning and represent values; shades, colors, and shapes carry form-based meaning and represent relationships.)

Figure 9.12 Comparison of sales and costs over time. (A chart of sales and costs by month, January through December.)

Figure 9.13 Comparison of product sales as a percentage of total sales. (A pie chart of January product sales for Store 3, with slices for socks, shirts, shoes, tables, lamps, desks, and chairs.)

Figure 9.5 (b) identified certain cities that met particular criteria by using a map to display city indicators. In addition to maps with symbolic indicators, topographic maps (and isosurfaces) like the one shown in Figure 9.14 are useful for identifying value ranges. The topographic map is ideal for identifying peaks, valleys, flat areas, and steep areas. Note, by the way, that any variable, not just geoaltitude, can have peaks and valleys and steeps and flats. The map could just as easily portray sales volume per booth in a large trade show, or pollution levels, or economic zones.

In addition to comparing and identifying values and value ranges, it is common to want to identify value changes over space and time. "What part of our network has the most bottlenecks during the course of a typical day?" is a question about value changes over space and time. "Where do our products go after they leave the warehouse?" is a question about the direction and volume of shipments over space and time. Arrows and directed graphs (especially when superimposed on maps), such as the one shown in Figure 9.15, are useful for showing value changes over space and time. Figure 9.15 shows, for each of PTT's warehouses, the relative volume of product shipments in each direction. Arrow locations represent where the shipments go, and arrow thickness represents the transported tonnage per month.

Abstract Grid Maps

Finally, let’s look again at grid displays in the light of the previous discussion. Grid displays, like colors, bear some of the characteristics of graphics. For example, Figure 9.16 represents a data set of store profits by store and time. In the table display, the rows are ordered by time and within time, by store. For the graphical display, the stores on the store axis are arranged in descending order of average customer traffic per unit area of store space. The time axis is ordered chronologically. In both panels the store times with highest profitability are highlighted.

Looking at the grid panel in Figure 9.16 (b), it is easy to see that profitability is highest around December and around medium-traffic stores. You would never detect this in the table panel in Figure 9.16 (a).

Figure 9.14 Topographic map of the Lost River Valley. (Elevation contours around Dimension Creek with spot heights of 3,024, 3,770, 4,173, and 5,430.)

Figure 9.15 The volume per path of PTT product shipments. (Arrows superimposed on a map; the thickness of the curves represents tons transported per month.)

Figure 9.16 Store profits by store and time. (Panel a is a table of profit margin by month and store, with rows such as Aug/store 1/2.0% through Oct/store 3/3.1%; panel b is a grid of the same data with months Aug through Apr across the top and stores 1 through 7, ordered by traffic, down the side. Cells representing profit margins of less than 3% are not colored.)

Why is that? As long as the two dimensions in the grid represent two key dimensions of the data, and as long as the grid dimensions are ordered according to some meaningful criteria, such as the chronological and store traffic criteria shown in the grid panel of Figure 9.16 (b), the grid behaves like an abstract map or coordinate system. (The exact properties of the grid are a function of the ordering characteristics of the dimensions, such as whether the dimensions are ordinally or cardinally ordered. This is addressed in Chapter 5.)
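A grid of this kind is straightforward to produce with ordinary tooling. The sketch below, using pandas and matplotlib, pivots a flat profit-margin table into a month-by-store grid whose store axis is ordered by a traffic measure; all column names are illustrative assumptions.

    import matplotlib.pyplot as plt
    import pandas as pd

    def abstract_grid(frame, traffic, month_order):
        """frame: columns month, store, margin; traffic: store -> traffic measure;
        month_order: months listed chronologically."""
        grid = frame.pivot(index="store", columns="month", values="margin")[month_order]
        grid = grid.loc[sorted(grid.index, key=lambda s: -traffic[s])]   # stores by descending traffic
        fig, ax = plt.subplots()
        ax.imshow(grid.values, aspect="auto")                 # one shaded cell per store/month
        ax.set_xticks(range(len(month_order)))
        ax.set_xticklabels(month_order)
        ax.set_yticks(range(len(grid.index)))
        ax.set_yticklabels(list(grid.index))
        ax.set_xlabel("Month")
        ax.set_ylabel("Store (descending traffic)")
        return fig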

Picking the Appropriate Visualization Forms

Now that you've seen how visualization works and why graphical images, for some types of queries, convey more meaningful information than numeric grids, you may be feeling that although it's a great idea to use visualization, it is just too hard. Tools are pretty manual and force you to pick a visualization metaphor. What's a good way to pick the visualization?

There are several criteria. One is that the logical form or underlying type structure of the information should correspond with the logical form or underlying type structure of the visualization. Well, how do you figure that out? If you've read this book to this point, you have the tool set. It is time to apply what you've learned about type structures to the classification of visual forms.

The two tables that follow describe the underlying type structure of different visual forms typically offered by vendors. Table 9.1 describes the logical structure of entire forms such as line graphs, pie charts, and so forth. The two essential descriptors are the number of types in each form and the combinatorial relationship between the types. The purpose of assigning type structures to visual forms is that the same type structures are used to parse queries. This is what gives you the ability to select the appropriate visual form for the display of an analytical query result.

Table 9.2 describes the internal structure of each type within each form. In other words, once you know how many types are used in a visual form and their relationship, more information remains to be known about each type. How many instances or members can it support, and are these instances nominal, ordinal, or cardinal? Once again, this type of information can be ascertained from a data source either through querying the schema files or through direct inspection.

NOTE Parallel coordinates have an interesting form that warrants further explanation. With parallel coordinates, every instance of every associated type is shown; these are the vertical segments and are represented by the T.-style types in the first type structure shown in Table 9.1 for parallel coordinates. The trailing type is a binary type, either on or off for any possible combination of vertical segment instances. In this sense, parallel coordinates are an N-dimensional abstraction of a scatterplot. However, the various instances are shown alongside each other, and in a situation where, for example, you had three locator dimensions with 100 instances each, they would form a dimensional space of 1,000,000 cells that in turn could support a 1,000,000-row fact table, while the parallel coordinate representation would only show 300 instances. It's spatially economical, but not fully representational.

Table 9.1 The Type Structure of Different Visual Forms

    REPRESENTATION NAME               NUMBER OF TYPES  TYPE STRUCTURE           ROTATABILITY
    Line graph                        2                T1. T2                   No
    Pie chart                         2                T1. T2                   xy
    Bar chart                         2                T1. T2                   No
    1D tiled bar chart                3                (T1. ᮏ T2.)T3            No
    Superimposed line graph           3                (T1. ᮏ T2.)T3            No
    Surface plot                      3                (T1. ᮏ T2.)T3            xyz
    Scatter plot                      3                (T1. ᮏ T2.)T3            xy
    Event trails                      3                (T1. ᮏ T2.)T3            No
    Colored line graph                3                (T1.) (T2, T3)           No
    3D pie chart                      3                (T1.) (T2, T3)           xyz
    2D tiled bar chart                4                (T1. ᮏ T2. ᮏ T3.)T4      No
    Colored surface plot              4                (T1. ᮏ T2.)(T3, T4)      xyz
    1D tiled superimposed line graph  4                (T1. ᮏ T2. ᮏ T3.)T4      No
    Colored 3D pie chart              4                (T1.) (T2, T3, T4)       xyz
    Parallel coordinates *            4-N              (T1.,T2.,T3.,T4.)T5      No
                                      4-N              (T1.)T2, T3, T4, T5

Parallel coordinates are ideally suited for data sets that have many contents, but only a small number (one or two) of locator dimensions. In the case where there is only one locator, the number of facts is equal to the number of instances in the locator, and content-content relationships are most clearly shown.

Table 9.2 The Type Structure of the Types in Each Visual Form

    NAME                              TYPE                     CARDINALITY  ORDERING
    Line graph                        1. L - X axis            High         Cardinal, ordinal
                                      2. C - Y axis            High         Cardinal
    Colored line graph                1. L                     High         Cardinal, ordinal
                                      2. C1                    High         Cardinal, ordinal
                                      3. C2                    Low          Nominal
    Superimposed line graph           1. L1 - X axis           High         Cardinal
                                      2. L2 - lines            Low          Nominal
                                      3. C - Y axis            High         Cardinal
    1D tiled superimposed line graph  1. L1 - X axis           High         Cardinal
                                      2. L2 - lines            Low          Nominal
                                      3. L3 - tiling           Low-medium   Nominal, ordinal
                                      4. C - Y axis            High         Cardinal
    Pie chart                         1. L - slice number      Low          Nominal
                                      2. C - area per slice    Medium       Cardinal
    3D pie chart                      1. L - slice number      Low          Nominal
                                      2. C1 - area per slice   Medium       Cardinal
                                      3. C2 - height           Medium       Cardinal, ordinal
    Colored 3D pie chart              1. L - slice number      Low          Nominal
                                      2. C1 - area per slice   Medium       Cardinal
                                      3. C2 - height           Medium       Cardinal, ordinal
                                      4. C3 - color            Low          Nominal, ordinal
    Bar chart                         1. L - X axis            Low-medium   Nominal, ordinal
                                      2. C - Y axis            Medium       Cardinal, ordinal, not nominal
    1D tiled bar chart                1. L1 - inner X axis     Lowest       Nominal
                                      2. L2 - outer X axis     Low-medium   Nominal, ordinal
                                      3. C - Y axis            Medium       Cardinal, ordinal
    Surface plot                      1. L1 - X                High         Cardinal, ordinal
                                      2. L2 - Y                High         Cardinal, ordinal
                                      3. C - Z                 High         Cardinal, ordinal
    Colored surface plot              1. L1 - X                High         Cardinal, ordinal
                                      2. L2 - Y                High         Cardinal, ordinal
                                      3. C1 - Z                High         Cardinal, ordinal
                                      4. C2 - color            Low          Nominal, ordinal
    Scatter plot                      1. L1 - X                High         Cardinal, ordinal
                                      2. L2 - Y                High         Cardinal, ordinal
                                      3. C - points            Low          Binary
    Parallel coordinates              1. L - locator segment   Low-medium   Cardinal, ordinal
                                      2. C1 - segment          Medium       Cardinal, ordinal
                                      3. C2 - segment          Medium       Cardinal, ordinal
                                      4. C3 - segment          Medium       Cardinal, ordinal

Other kinds of structural information that can be taken into account include whether the visual form can be rotated or whether a type can be scrolled, as well as nonstructural heuristics (such as previously used visual forms and individual preferences) that, while important and ultimately worth considering, are not crucial to the present understanding and thus will not be treated here.
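To suggest how Tables 9.1 and 9.2 could drive automatic form selection, the sketch below matches a query result's type profile against a small catalog of form descriptions. The catalog paraphrases a few rows of the tables, and the matching rule is deliberately simplified; everything here is an illustrative assumption rather than a complete treatment.

    # Each entry: how many locator and content types the form accepts, plus the
    # weakest ordering its content types can tolerate (nominal < ordinal < cardinal).
    FORM_CATALOG = {
        "pie chart":               {"locators": 1, "contents": 1, "ordering": "cardinal"},
        "bar chart":               {"locators": 1, "contents": 1, "ordering": "cardinal"},
        "line graph":              {"locators": 1, "contents": 1, "ordering": "cardinal"},
        "superimposed line graph": {"locators": 2, "contents": 1, "ordering": "cardinal"},
        "surface plot":            {"locators": 2, "contents": 1, "ordering": "cardinal"},
        "colored surface plot":    {"locators": 2, "contents": 2, "ordering": "ordinal"},
    }
    ORDER_RANK = {"nominal": 0, "ordinal": 1, "cardinal": 2}

    def candidate_forms(locator_types, content_types):
        """locator_types / content_types: lists of (name, ordering) pairs taken
        from the type structure of the query result."""
        weakest = min(ORDER_RANK[ordering] for _, ordering in content_types)
        for form, spec in FORM_CATALOG.items():
            if (len(locator_types) == spec["locators"]
                    and len(content_types) == spec["contents"]
                    and weakest >= ORDER_RANK[spec["ordering"]]):
                yield form

    # A query locating Sales by Month offers the one-locator, one-content forms:
    # list(candidate_forms([("Month", "ordinal")], [("Sales", "cardinal")]))
    # -> ["pie chart", "bar chart", "line graph"]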

Using Data Visualization for Decision Making

Given that visualization is most useful for decision-support-style decision making, the question remains, “Where does visualization fit into the decision-making process?” To answer that question, recall from Chapter 1 how data-driven decisions are made. Goals are a function of human factors. In most cases, the goals of a corporation are set by its board of directors and senior executives. Data visualization does not play a significant role in goal setting. In contrast, predictions are a function of correlations, relationships, or patterns between descriptions. As we saw in the previous section, visualization is crucial to the discovery of patterns or relationships between data elements or facts.

Business Pattern Analysis

The discovery of patterns or relationships is an essential step in the prediction-making process. Patterns are local, slowly changing regularities whose validity needs to be continually monitored. In any domain with which a user has acquired some familiarity, the user is actually thinking in terms of patterns. Thus, wool buyers are familiar with the visual and textural patterns that define good wool, cocoa futures traders are familiar with the buying patterns that indicate good times to buy or sell, and national campaign managers are familiar with the demographic patterns that indicate good regions to perform test marketing.

The two figures that follow illustrate common patterns superimposed on collections of data or facts. Figure 9.17 illustrates exponential growth and Figure 9.18 illustrates linear growth. Note how the patterns in each case represent a central tendency or trend about which the data points vary. The interplay between pattern and data points represents the concepts of curve fitting and variance. Seeing data within the context of a pattern enables characteristic values, variances, and extremes to be highlighted. Good visualization can show the approximate variance in the data points around a central tendency and how the variance may change over time.

Although analysts frequently look for patterns as part of a decision-support process, in any established domain, certain relationships need to be understood ahead of time. In these situations, many activities revolve around monitoring the degree to which new events fit established patterns. Typical activities include classification and exceptions tracking. Thus, an insurance analyst who is already familiar with patterns of fraudulent behavior might superimpose new claim data on a known pattern to see whether a new claim looks suspicious. This is illustrated in Figure 9.19.

Patterns in one collection of data can also be correlated with other patterns to highlight understood behaviors. In the figures that follow, the seasonal pattern of large household appliance purchases is seen to have evolved from resembling the purchasing pattern for small appliances (Figure 9.20) to resembling the known pattern of purchasing for luxury goods, as seen in Figure 9.21.


Figure 9.17 Exponential growth.


Figure 9.18 Linear growth.


Figure 9.19 Patterns of fraudulent behavior.

Figure 9.20 Small versus large appliances.

Figure 9.21 Correlation.

The remainder of this section explores how visualization depicts the common business patterns of constancy, change, and context.

Patterns of Constancy

Relationships of constancy tend to be some of the first and last relationships we discover. For example, the distance in Figure 9.22 between the upper line representing new computer prices and the lower line representing the prices of 18-month-old computers is easily seen in the graph and appears to be fairly constant over time. Understanding the constant price differential between first- and second-generation computers, for any point in time, helps to predict future revenues for today's first-generation machines, and thus guide decisions regarding machine production and the length of time between different generations of machines.

Another type of constant relationship is brought out through the use of grouping or clustering. Instead of a constant difference, Figure 9.23 shows a constant boundary that traces the movement of a customer in a large department store. The 52-minute trace of a customer indicates three distinct clusters of time spent browsing, with equal amounts of time in three specific areas: electronics, books, and furniture.

Figure 9.22 PC prices.

Figure 9.23 Constant boundaries.

Patterns of Change

Understanding change, be it in relative costs, employees, product lines, underlying technologies, tax and trade laws, or consumer spending habits, is essential to business growth. The more accurately we can assign a pattern to that change, the better we are able to predict future changes. Patterns of change are sometimes called trends. The two main kinds of change that visualization helps to depict are periodic and nonperiodic changes.

Seasonal demand fluctuations, as illustrated in Figure 9.24, are typical of periodic change. A clothing store owner who opened a new shop in the fall might experience strong sales for three months, and then notice a slowdown in January. Knowing that sales had a seasonal periodic shift would instill confidence that sales will pick up again. Knowing how to visualize the sales to date and the estimated future sales cycle would help determine how much product to order for the coming months.

Although the stock market follows the up and down movement of business cycles, stock market prices do not vary seasonally (as retail sales do) and are considered to be nonperiodic. Technical traders use analytical visualization to portray stock trade volumes and prices across time as shown in Figure 9.25. Regardless of the trading strategy employed, visualization allows movements of and relationships between key indicators to be highlighted. If, for example, the given strategy recommends selling as volume plateaus, now might be a good time to sell.

Figure 9.24 Clothing sales.

Figure 9.25 Stock market.

Contextual Patterns

Another attribute of visualization is the ability to clearly show multiple levels of context or resolution at once. Our eyes automatically perform aggregation with visual images, as shown in the examples that follow.

As illustrated in Figure 9.26, the display of local sales information within a map of regional sales information is a typical example of showing level relationships. Note how easy it is to see that although local sales are high relative to the surrounding region indicated by the county labeled A, the surrounding region's sales in the area labeled B are low relative to their national region. This extra context information, suggesting there may be regional problems, could lead to a decision to visit regional headquarters to learn why things are going so poorly.

In this section, we briefly explored how visualization can be used to highlight the most commonly recurring business patterns. Table 9.3 summarizes these uses with typical applications for the retail sector, several examples of which were included previously, and for the utilities sector.

Visualizing Multidimensional Business Data

Basic graphics, such as line graphs and pie charts, show at most two or three dimensions of information. Yet most business users routinely work with OLAP data sets composed of five or more dimensions.

Figure 9.26 Alabama sales by county (quantile classification: under $100,000; $100,000–499,000; $500,000–1 million; over $1 million; county A and its surrounding region B are highlighted).

Table 9.3 Examples of Commonly Occurring Business Patterns

                                                     RETAIL                   UTILITIES
Pattern discovery              Constancy             Margins                  Fixed costs
Change                         More periodic         Seasonal fluctuations    Daily and seasonal demand
                               Less periodic         Interest rates           Variable costs
Context                        Hierarchical          City/state sales         City demand/state demand
                               Location relations    Top 10 stores            Top 10 energy consuming centers
Fact/pattern relationships     Classification        Upscale market           Preusage spike
Pattern/pattern relationships  Correlations          Appliances/luxuries      Urban/rural

For example, a campaign management data set might be composed of the following dimensions: product, time, channel, sales representative, and promotion type. This raises the question of the best way to graphically visualize more than two or three dimensions of the data. This section describes some of the main techniques for graphically visualizing multidimensional business data, summarized as follows:
■■ Use basic graphics to visualize small slices or subsets of your data.
■■ Replicate or tile a basic graphic element in one or two screen axes with one or more nested dimensions per axis.
■■ Use a more complex visualization technique.

Subsetting

Even though typical business data may have six or more dimensions, it is always possible to take a slice or subset of the data and use basic visualization techniques for the lower dimensions. For example, as shown in Figure 9.27, a basic line graph may represent sales by time for a five-dimensional data set consisting of stores, times, products, scenarios, and variables. The pattern for the Ridgewood store sales is easily visualized, and may well satisfy your needs if your sole responsibility is a particular product-store-customer combination. Relative to the underlying data set consisting of many products, stores, and customer types, however, the information contained in this basic visualization represents a very small subset of the underlying data and is analogous to the information retrieved by sticking a glue-tipped needle into a haystack.

Figure 9.27 Sales by time (store: Ridgewood; product: beds; scenario: actual; customer type: 65+ female).
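As a rough illustration of this kind of slicing (the data, dimension members, and column names below are invented for the example, not taken from any actual application), a five-dimensional data set held in a pandas DataFrame can be reduced to the single sales-by-time line of Figure 9.27 by fixing four of the five dimensions:

import pandas as pd

# Hypothetical five-dimensional data set: store, time, product, scenario, customer type.
df = pd.DataFrame({
    "store":     ["Ridgewood", "Ridgewood", "Ridgewood", "Maplewood"],
    "month":     ["2001-01",   "2001-02",   "2001-03",   "2001-01"],
    "product":   ["beds",      "beds",      "beds",      "beds"],
    "scenario":  ["actual",    "actual",    "actual",    "actual"],
    "cust_type": ["65+ female"] * 4,
    "sales":     [12_400, 11_900, 13_250, 9_800],
})

# Fix four dimensions (store, product, scenario, customer type) and keep time free:
# the result is the one-dimensional slice that a basic line graph can show.
slice_ = df.query(
    "store == 'Ridgewood' and product == 'beds' "
    "and scenario == 'actual' and cust_type == '65+ female'"
).set_index("month")["sales"]

print(slice_)          # sales by month for one store/product/scenario/customer type
# slice_.plot()        # a basic line graph of the slice, as in Figure 9.27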

Repeating or Tiled Patterns

What if you were interested in how changes in sales across stores and products varied across time? You might be trying to spot products for which there was less seasonal variance in sales. To do this, you would need to visualize a much greater percentage of your data. If you still wanted to look at lower-dimensional images, it would take many of them in a repeating or tiled pattern to display the relevant information.

Insert I.3 shows a tiled image of dollar sales over a 15-week period. The image is divided into bins by food type and store number. An additional measure is added with coloring indicating certain kinds of promotions. There are many insights one can immediately glean from this image. For example, salty snacks and frozen foods account for a much larger part of the sales than candy and soft drinks. We can see that the sea green promotion seemed to help soft drink sales in all stores, but more so in store number 1. The light green promotion seems to have had a follow-through effect in the third week in store number 2, but not in the other stores. The red promotion seems, at best, to have added no value. Some of the weekly profiles look different for different stores. Salty snacks seem to rise and then fall over this period for store 2, but seem far less predictable in the other stores. One can also note that the sales seem to complement each other: taken singly, sales would be quite variable, but together total sales are far more even.

The tiled image in Insert I.4 shows multidimensional nesting. The sex dimension splits the sample into two charts on the x-axis, one for males and one for females. Each is in turn split into seven age ranges: 21–25, 26–30, 31–35, 36–40, 41–45, 46–50, and 51–55. Education is further nested on the x-axis with colors corresponding as follows: yellow for 3–5 years, green for 6–8 years, aqua for 9–11 years, blue for 12 years, red for 13–15 years, and violet for 16–18 years. The y-axis gives income, with the mean at the middle of each bar and one standard deviation at the ends.

Several interesting patterns are highlighted by this image. Men make more money than women in all categories. More education is generally better. Incomes generally go up with age, but for all education ranges, women's income falls off after 51; only in the lowest education range does this happen for men. Men's salaries in the highest education range grow substantially and consistently, while for women the bars clearly decelerate, as if bumping up against a glass ceiling. One anomaly to note is that women in the fifth age range, 41–45, are better off if they have only 3–5 years of education than if they have 6–8 or 9–11. We can also see that women at their peak (the highest earning bar, at 16–18 years of education and 46–50 years of age) make about the same as a man 51–55 years of age with only 12 years of education, and about the same as a man with 13–15 years of education who is only 36–40 years old.
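A tiled display along these lines can be sketched with widely available charting libraries. The following example assumes seaborn is available and invents a small weekly sales table, so that one line chart is repeated per food-type/store combination, in the spirit of Insert I.3:

import pandas as pd
import seaborn as sns

# Hypothetical weekly sales, dimensioned by food type and store number.
data = pd.DataFrame([
    {"week": w, "store": s, "food_type": f, "sales": 1000 + 40 * w + 150 * s + 25 * len(f)}
    for w in range(1, 16)
    for s in (1, 2, 3)
    for f in ("salty snacks", "frozen foods", "candy", "soft drinks")
])

# One small line chart per (food type, store) cell: rows tile one dimension,
# columns tile the other, and every tile shares the same time axis.
grid = sns.relplot(
    data=data, x="week", y="sales",
    row="food_type", col="store",
    kind="line", height=1.8, aspect=1.5,
)
grid.savefig("tiled_sales.png")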

Examples of More Complex Data Visualization Metaphors

Most uses of multidimensional visualization rely on the techniques shown in the previous two sections. Keeping dimensions as pages serves to simplify the multidimensional space so that it can be visualized with normal-dimensional visualizations.

Repeating a normally dimensioned graph across a grid serves to extend the dimensional reach of a display. In contrast, higher-dimensional visualization techniques show multivariate, multidimensional relationships in a single nonpartitioned image. The examples below are all taken from real business applications. They can all be implemented with today's standard desktop hardware and widely available data visualization packages.

Product Sales Analysis

For most organizations, product sales analysis is a classic OLAP application. Ideally, as a user you would like to be able to look at several variables or contents at once while retaining a multidimensional and hierarchical perspective. Inserts I.2 and I.5 represent sales and pricing data dimensioned by product, store, and time. In the two insert views, stores are represented as pages. The hierarchies in the Product and Time dimensions are visible to the user, and in addition to the 3D bars in the cells, the back panels show aggregate sales values in each of the two dimensions. In the cell portion of the view, the height of each bar represents median product price, and the color of each bar represents sales in dollars. In the back panel portion, the height of each bar represents the sales value. (This is why the food bar on the back panel of I.2 is so much taller than the other bars.) Thus each view shows two contents across two levels of each of two dimensions.

In Insert I.2, you can see an initial view of the data with Time on the x-axis broken down by quarters and Product on the y-axis broken down by Food, Drink, and Nonconsumables. The image illustrates a mouse brushing over the Food sales for Q2, causing the sales figure to be shown. Insert I.5 shows the drill-down on Q1 and Nonconsumables with the mouse brushing over the Household sales for the month of March.

The visualization style shown in these two inserts is a natural extension of numeric grid-based OLAP views and allows for dynamic visual analysis of multivariate, multidimensional data (so long as the dimensions have reasonably small cardinality), with a minimal increase in the complexity of the basic visual metaphor (in this case, bar charts). One could also imagine tiling these images to provide for higher-dimensional representation.

Credit Card Application Analysis

Insert I.6 uses glyphs in a three-dimensional space to show the results of analyzing credit card applications. Colored spheres show approved cardholders. In the original application, historical information on each cardholder could be obtained by clicking on each of the spheres. The three axes show application information—debt ratio, net worth, and work duration. The size of the sphere represents the salary and the color represents the credit limit for that cardholder. A white sphere, used to depict a new applicant, can be assigned a credit limit based on its 3D position relative to the historical database information. We can see, for example, in Insert I.6 that the credit applicant has a debt ratio of about .25 and has worked for about 10 years. We can also see that most of his/her neighbors are colored green with some blue, so we would expect the applicant to be assigned about a $1,700 credit limit.

There are a number of interesting things you can see from this visualization. Work duration seems to be the most important variable for determining the amount of credit offered, because the left-to-right movement along the work duration axis approximates the color shading from blue to red of credit offered. Salary level (the size of the balls) has far less effect than you might expect. You can tell that salary is not a major determinant of credit because the largest balls appear at all credit levels. You can also see an interesting exception or outlier: there is a blue ball with one of the largest ball sizes (representing salary) at about the 12-year work duration mark, with a debt ratio of only .12 and a comparatively low net worth. This individual is surrounded by green balls, all of whom have a lower salary and are given credit limits on the order of $2,500 to $3,000, while the outlying individual with the big salary only gets about a $500 credit limit. Clearly, the bank knows something about this customer that does not show up on this chart, though trouble is hinted at by the person's comparatively low net worth.
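A glyph view of this kind is easy to approximate with a general-purpose plotting library. The sketch below uses made-up applicant data (not the data behind Insert I.6) and encodes salary as marker size and credit limit as color in a debt-ratio/net-worth/work-duration space:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 200

# Made-up cardholder data: three positional attributes plus two encoded ones.
debt_ratio = rng.uniform(0.0, 0.6, n)
net_worth = rng.uniform(10_000, 500_000, n)
work_years = rng.uniform(0, 25, n)
salary = rng.uniform(20_000, 150_000, n)
credit_limit = 500 + 200 * work_years + rng.normal(0, 300, n)   # toy relationship

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
sc = ax.scatter(
    debt_ratio, net_worth, work_years,
    s=salary / 1_000,        # sphere size encodes salary
    c=credit_limit,          # color encodes credit limit
    cmap="coolwarm",
)
ax.set_xlabel("Debt ratio")
ax.set_ylabel("Net worth")
ax.set_zlabel("Work duration (years)")
fig.colorbar(sc, ax=ax, label="Credit limit ($)")
plt.savefig("credit_glyphs.png")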

Web Site Traffic Analysis

Inserts I.7 and I.8 illustrate how a collection of several related visual perspectives can be used to analyze web traffic. A solid analysis needs to explore which promotions stimulated the most buyers, when buyers bought the most, and what path they took to complete their purchase. The data in this example comes from the e-commerce site of a large catalog retailer.

Insert I.7 shows the initial display for a promotional study, with color tied to promotion type. The large blue bar in the focus bar chart shows that the vast majority of the visitors came via Catalog, the company's own catalog. The correlation bar chart shows that most visitors were Browsers who did not buy anything; a small fraction were Abandoners, visitors who put items in their shopping basket but did not complete the checkout process. Finally, the Buyers represent about 5 percent of the total number of visitors. The time dimension bar chart shows visitors by day for the one-week snapshot under study.

The upper-left bar chart in Insert I.8 shows that after selecting and excluding visitors who come from the company's own catalog, the four top-producing banner ads were on Yahoo, FreeShop, DoubleClick, and AOL. Proceeding further by changing the correlation bar chart to show visits by country, and selecting and excluding the .coms, .nets, and .edus, the upper-left bar chart in Insert I.8 shows that Canada and Italy are the two foreign countries that have sent the most visitors. By switching the time dimension to show hits by hour, the lower-left graph in Insert I.8 shows that visitors followed a two-hump (or bimodal) daily traffic pattern. Activity is lowest during the early morning hours, peaks around dinnertime, decreases slightly, and peaks again at 2 A.M. This pattern suggests that many visitors are shopping from work in the early afternoons and late into the early morning. Alternatively, the surge of late evening traffic may be foreign shoppers in other time zones.

The lower-right chart shows a detailed listing of the Abandoners. One important goal for web site traffic analysis is to understand why some shoppers abandon the buying process. Focusing on the Entry_URL and Exit_URL columns, the most common entry points for Abandoners are index.html, head.html, and vnav.html; the most common exit points are cgi-bin/showBasket.html and cgi-bin/finit.asp. This suggests that customers are abandoning because they do not like what's in their basket, or perhaps because the Show Basket command executes too slowly, or because the search routine is too complex. Perhaps the interface for removing shopping basket items needs to be simplified.

In this section, you have seen several successful snapshot examples of business visualization in use. In each case, the visualizations highlighted key patterns that were essential to the business problem at hand and allowed the analyst to work with new information in terms of how it relates to previously detected patterns.

Summary

In this chapter you learned:
1. Points and lines convey meaning in terms of their location, whereas numbers, colors, and shapes convey meaning in terms of their form.
2. Points and lines, and colors and shapes, represent relationships better than direct values, whereas numbers represent direct values better than relationships.
3. How to analyze the information representation capabilities of different visual forms and connect them with the information representation needs of analytical query results.
4. The part of the decision-making process most affected by visualization is the discovery and use of patterns or relationships for classifying new events; creating groupings of events; identifying exceptions, constants, and trends; and forming predictions.
5. How to think in terms of different types of classically occurring patterns, such as constancy, periodic and nonperiodic change, and context.
6. When basic business graphics are offered in an interactive environment and combined with the ability to tile images, they can help business users understand many multidimensional business patterns.
7. About some more advanced methods of visualization that are available to represent complex analytical patterns.

CHAPTER 10

Physical Design of Applications

If it were possible to store unlimited amounts of data, and if it were possible to perform any search, external data fetch, or computation in an instant, there would be no need for end-users or administrators of multidimensional information systems to pay any attention to the physical design of their applications. This is because the physical design of a system affects its performance, not the truth of its input or derived data. Of course, no matter how fast the hardware or how many bytes we can cram on a disk, we always seem to come up with applications that require even more data and more calculations that need to run blazingly fast. Memory in all forms, processor speed, and bandwidth will always be relatively scarce, and product vendors and application developers will always have the incentive to create reasonably efficient physical system designs (although many software products and applications are less efficient than their ancestral versions in the 1970s, when there were severe system resource constraints).

The tricky thing about physical design is that there are myriad system levels that impact physical performance. Although decisions that impact the state of the art at each level (such as the discovery of new materials for making computers or new configurations of L2 cache and CPUs for SMP machines) come from specialists in each field, decisions about which combination from the available choices results in the best total physical design of a decision-support application could all be made by, or fall under the jurisdiction of, one person.


For example, the CTO of a well-funded high-tech start-up in the process of building its IT infrastructure would be responsible for making (or approving) decisions regarding the choice of hardware components (such as processor and memory type, because the fastest is not always the best choice), hardware configuration (such as SMP versus MPP for the enterprise servers), operating systems (such as UNIX or Windows 2000), enterprise system architecture (such as client/server versus peer-to-peer), application software (such as Oracle RDB/MDDB versus Microsoft RDB/MDDB or Oracle RDB/Hyperion MDDB), and application design (such as using a larger number of simpler cubes with more query-time cube joins or using fewer, more complex cubes with fewer query-time cube joins). There are at least seven distinct system levels that affect total system performance:

■■ Materials: Gears (the materials of Babbage's first difference engine), vacuum tubes, individual transistors, micro transistors etched in silicon, gallium arsenide, upcoming optical computers, quantum computers
■■ Hardware configuration: MPP, SMP, uniprocessor, processor speed, main memory size, L2 cache size, processor width, bus speed, disk speed
■■ Operating systems: Thread management, partition management, file caching
■■ Enterprise system architecture: Network topology, bandwidth, protocols
■■ Application software: Threadedness, disk/RAM based, cache intelligence
■■ Application development: Cube design, sparse dimension combination selections
■■ End use: Formula creation, query load

Typically speaking, by the time the application developer gets to make physical design decisions about the application—such as where to store low-level data, how many derived values to precalculate, and so forth—the company's enterprise architecture (including network topology and the distribution of source data), the operating system, and the enterprise hardware exist as fixed or reasonably fixed constraints. Given the user requirements and the system constraints, the application developer may then get to select the software vendor whose product can best meet those constraints. I suspect that many of you have been in situations where you were asked to optimize the physical design of the application layer for a decision-support system where the biggest performance impediment was a poorly designed network or inappropriate hardware and/or software.

Notwithstanding the limited range of decision making enjoyed by a typical application developer (although the decisions the application developer makes can have a huge impact on performance), it is useful to know the spectrum of choices that can, at least in theory, be made to affect physical performance. Therefore, this chapter is organized in terms of how data storage (with associated index methods) and calculations can be distributed for optimum performance. The chapter looks at data distribution within a machine, across machines, across applications, and across partitions. It also looks at calculation distribution across time, across application tiers, and across applications. Within each area, where appropriate, it identifies who, along the decision spectrum, typically gets to make the choices.

Data Distribution

Within Machines

Given a query, a data set and application against which the query is posed, and a machine on which the data and application reside, there are basically three places in the machine where the data that satisfies the query can exist: disk, main memory, and L2 cache. Each of these places has a different natural speed of retrieval. Currently, typical disk access speeds are on the order of 10 milliseconds; typical main memory speeds are on the order of 100 megahertz, and L2 cache is on a par with the CPU itself, thus hovering around the gigahertz range.

Although we're used to talking in terms of milliseconds for disk speeds, once you realize that a 10-millisecond access corresponds to a rate of only about 100 operations per second while 100-megahertz memory corresponds to 100 million, it is clear that by far the largest jump in access time is between disk and main memory. The relative time difference to respond to two queries, where one can be answered entirely from memory and the other from disk, can be as much as 1,000,000 to 1! That's right; main memory can be 1,000,000 times faster than disk. To be fair, if the disk-resident data is well clustered, a single disk access may retrieve thousands of query-responding data elements and the disk access overhead will be distributed across all the data elements, thus reducing the average time difference between disk and memory access to something on the order of 100 or 1,000 to 1—still a huge difference. Of course, in real-world situations many other components affect total query time, such as screen rendering and network access time, whose overheads may be hundreds of times larger than that of any disk access and which can apply to both memory-based and disk-based queries, thus effectively erasing the performance difference between the two mediums.

Issues regarding L2 cache, such as its size, its mapping to CPUs (some SMP configurations have multiple CPUs sharing a single L2 cache), and the algorithms for deciding what to keep in L2 cache are determined by the hardware manufacturer. If your decision support system is otherwise physically optimized, the testing we have done in our lab suggests that using hardware with more and faster L2 cache (such as Xeon versus non-Xeon processors within Intel's Pentium family) can improve performance by up to a factor of two. Issues regarding the use of disk versus memory for storing the data in use are made by the software vendors and by application developers, and are discussed later in this chapter.

From the moment data is accessed, manipulated, and persistently stored, some disk, some main memory, and some L2 cache are bound to get used. Thus, all OLAP products use all three forms of memory. Distinctions should be made in terms of relative emphases.
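To put these orders of magnitude side by side, here is a small back-of-the-envelope calculation using the illustrative access times quoted above (assumed round numbers, not measurements):

# Rough comparison of disk versus memory access using the figures quoted above.
DISK_SEEK_S = 0.010        # ~10 ms per random disk access (assumed)
MEMORY_ACCESS_S = 1e-8     # ~10 ns per memory access, i.e., 100 MHz-class memory (assumed)

def per_value_disk_cost(values_per_access: int) -> float:
    """Average disk cost per data value when one access returns a clustered block."""
    return DISK_SEEK_S / values_per_access

for cluster in (1, 1_000):
    ratio = per_value_disk_cost(cluster) / MEMORY_ACCESS_S
    print(f"{cluster:>5} values per disk access -> disk is ~{ratio:,.0f}x slower than memory")

# Prints roughly 1,000,000x for unclustered access and ~1,000x when each access
# returns 1,000 well-clustered values, matching the estimates in the text.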

All analytical software products (at least all the ones I'm aware of) have some concept of active, live, accessible, or otherwise in-use data that is typically called a model, database, file, or even world. Many OLAP and other analytical software products, especially those where end-users are assumed capable of directly using the software, were designed to operate in main memory, typically on the end-users' machines. Spreadsheets work this way, as do the client-side versions of several OLAP products such as TM/1 and FreeThink. The only time the disk was accessed was when models were loaded from or saved to disk, or if virtual memory was being used. Assuming you have sufficient memory to hold all your base data and indexes, and plenty of head room for dynamic calculations, a memory-based application will always outperform a disk-based one.

If your entire application easily fits in memory and there isn't too much data, many physical performance issues become less important. There is less of a need to optimize which derivations are precalculated, although you will still notice a lag in response time once the number of cells touched per calculation is in the millions or tens of millions. Thus, even in a RAM-only environment you may still benefit from creating and storing some intermediate aggregations. The lessened need to optimize the design of RAM-resident models means it is easier to build an application, and the reduced need to precalculate aggregates means it is faster to build the model, and the model will consume less disk space because fewer precalculated aggregates are stored. However, it may be slower to activate the model since all that data will need to be loaded from disk, and the money you saved on disk space will be more than chewed up by the additional money you will need to spend on RAM.

Even if you can afford all the memory that money can buy, you simply can't buy enough memory to store all the data associated with an enterprise-sized OLAP application whose raw data may consist of tens, hundreds, thousands, or even millions of billions of data values that, with indexes, can easily consume several terabytes (or petabytes) of addressable space. Thus, in-memory OLAP applications may be either too expensive or insufficiently scalable to meet your company's needs. From my experience, the most common reason in-memory OLAP solutions don't work is that there is simply too much data to manage.

Since many applications are based on more data than can fit in memory, those applications need to keep some of the data—typically the vast majority—on disk. Although not constrained by memory, disk-resident applications still need plenty of memory for storing indexes, performing calculations, and keeping a cache of recently used data.

Some of the more complex OLAP products allow the data storage and indexing structures to be chosen at the partition level. A single database may use a variety of storage and access methods on disk and in memory. Usually, this choice is transparent to the application and can be easily altered after the application has been built. This enables expert tuners to determine which parts of the application are RAM based, which parts are disk based, which parts are RDBMS based, and which parts are combinations of all of them. This flexibility may seem attractive (and ceteris paribus, I would always opt for having more rather than less flexibility), but tuning facilities are a double-edged sword: they require effort to use, and it is just as likely that they will be used incorrectly as optimally. The ideal is for the system to provide intelligent defaults and plenty of manual tuning capabilities. This may seem complex enough, but in the real world of computer hardware, the distinction between disk and RAM is often fuzzier than you might expect.
Since all disk-based OLAP products use some form of data caching, the astute reader may be wondering what the difference is between an in-memory OLAP product and the cache portion of a disk-resident product. If the cache portion of a disk-resident product performs like an in-memory product, why buy an in-memory product? The former would appear to be a superset of the latter. It turns out that not all caches are created equal.

Most data caches found in OLAP products are lightly indexed, and read-only or incrementally addable to (meaning that a new piece of information can be added to the cache, such as a new day's sales figures, but data in the cache cannot be changed), and even then only for a single user. By contrast, most in-memory OLAP products provide multiuser read/write access and full editing capabilities, with edit results appearing for all users.

Within an Application

Within a machine and within an application, there are many options for how data is distributed. The two types of options that you need to be aware of are storage and indexing options for the data (as provided for by your tools), and the mapping of any particular schema to a particular set of storage and indexing options (the use you make of your tools). There can be order-of-magnitude performance differences between different choices for both options. Since most OLAP products have only a small number of methods—possibly only one method—for storing and indexing data, when you buy an OLAP product you are effectively selecting that small number of methods for storage and indexing from the dozens or more commercially available options and the hundreds of methods that have been written about but never implemented. This is why you will typically be best served by understanding the physical needs of your application before choosing software.

Making Use of Storage and Indexing Options

The two major goals of developers of multidimensional database storage and access routines have been fast access and efficient storage of sparse data. As a result, most multidimensional databases store data in sets of small, relatively dense arrays, and indexes into the arrays. For example, given our standard schema of

(Geog. ⊗ Time. ⊗ Product.) ~ Sales, Cost

one approach might store collections of time-ordered arrays of sales and costs with existent combinations of stores and products used to define the indexes. The array approach contrasts with the table approach, which stores all the data as tables and adds, for access purposes, additional purely indexical information.

Since many of the real-world decisions that you are likely to make as an application developer in the area of storage and access methods can be described in terms of trade-offs between the number and density of stored arrays, and since the choices you make can have a substantial impact on the performance of your application, let's explore in more detail some of the choices you are likely to encounter. By now, many readers are proficient at thinking in terms of the LC language. I illustrate the difference between the table and array approaches in LC next. The indexed array approach can be represented as follows:

((Time, Store) , Content.p) ~ (Content.(Prod.order).)

The subexpression on the left ((Time, Store) , Content.p) says to take every combination or tuple of Time and Store for which there exists at least one content or variable, in this case sales or costs. For every one of those combinations, associate an array of contents ordered by product. If there are 10 products and 2 contents, there will be 20 content values composed of 10 2-value pairs, with the pairs ordered in terms of some default product ordering. In contrast, the table approach looks like the following:

(Row.) ~ (((Time, Store, Prod) , Cont.p) ~ (Cont.))

(Store.) ~< row.store.n
(Time.) ~< row.time.n
(Prod.) ~< row.prod.n

The top expression describes the fact table and says there exists one row for each Time-Store-Product tuple where at least one content has a value. For each of those rows, some value (including perhaps null) is attached for each content. If the existence of any one content means that all the other contents are likely to have a value, then the fact table will be pretty dense. The three minischemas underneath describe each of the dimension indexes into the fact table. Each one says that for each unique value in the dimension there are N nonoverlapping rows where the value of the dimension in the fact table is the same as in the dimension table.
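To make the contrast concrete, here is a minimal Python sketch of the two layouts for a tiny invented data set. The indexed-array layout keys on the (time, store) combinations that actually exist and stores a product-ordered array of (sales, cost) pairs for each; the table layout stores one row per existing (time, store, product) tuple:

PRODUCT_ORDER = ["beds", "sofas", "lamps"]          # default product ordering

# Indexed-array layout: the index holds only (time, store) combinations that
# actually occur; each entry is a dense array with one (sales, cost) slot per
# product in PRODUCT_ORDER (None marks an empty slot).
indexed_arrays = {
    ("Jan", "Ridgewood"): [(12_400, 7_100), None, (950, 400)],
    ("Jan", "Maplewood"): [None, (8_300, 5_200), None],
}

# Table (fact-table) layout: one row per (time, store, product) tuple for which
# at least one content exists, with a column per content.
fact_table = [
    ("Jan", "Ridgewood", "beds",  12_400, 7_100),
    ("Jan", "Ridgewood", "lamps",    950,   400),
    ("Jan", "Maplewood", "sofas",  8_300, 5_200),
]

def sales_from_arrays(time, store, product):
    cell = indexed_arrays[(time, store)][PRODUCT_ORDER.index(product)]
    return None if cell is None else cell[0]

def sales_from_table(time, store, product):
    for t, s, p, sales, cost in fact_table:
        if (t, s, p) == (time, store, product):
            return sales
    return None

assert sales_from_arrays("Jan", "Ridgewood", "lamps") == sales_from_table("Jan", "Ridgewood", "lamps") == 950

In a real product the fact table would, of course, be indexed rather than scanned; the point of the sketch is only the difference in what gets stored.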

Now that you've seen the essence of the difference between table- and array-based storage approaches, let's continue drilling down into the options for array storage. Consider the following schema:

(Geog. ⊗ Time. ⊗ Product. ⊗ Scenario.) ~ Sales, Cost

What are your choices for storing and accessing this data in array form? What are the trade-offs? Think about it. You have 16 choices for distributing the four dimensions across the outer index and the inner arrays, as shown in Figure 10.1. How do you choose? What are, or should be, the factors guiding your choice?

There are basically three types of factors, with many, many permutations. Those factors are as follows:
■■ Storage space
■■ Response time
■■ Total cost (dollars spent on the system, including human time setting it up and managing it)

Although the ideal solution, as articulated by the wishful thinking department, takes no space, provides instantaneous responses, and is free, reality diverges considerably from this ideal. Nonetheless, these three dimensions govern most data distribution decisions. Assuming you are trying to improve on an existent physical design, you will certainly snag all options that take less space, are faster, and cost less; and you will certainly avoid all options that take more space, are slower, and cost more.

Number of dimensions in the index:   4   3   2   1   0
Number of dimensions in the arrays:  0   1   2   3   4
Number of combinations:              1   4   6   4   1

Option: index dimensions / array dimensions
 1: G, T, P, S / (none)
 2: G, T, P / S
 3: G, P, S / T
 4: T, P, S / G
 5: G, T, S / P
 6: G, T / P, S
 7: G, P / T, S
 8: G, S / P, T
 9: T, P / G, S
10: T, S / G, P
11: P, S / T, G
12: G / T, P, S
13: T / G, P, S
14: P / G, T, S
15: S / G, T, P
16: (none) / G, T, P, S

Figure 10.1 Storage distribution options.
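The sixteen options are simply the possible subsets of the four dimensions assigned to the index, with the remaining dimensions going to the arrays; a few lines of Python enumerate them and reproduce the 1-4-6-4-1 counts shown in the figure:

from itertools import combinations

DIMS = ("G", "T", "P", "S")

options = []
for index_size in range(len(DIMS), -1, -1):          # 4, 3, 2, 1, 0 dimensions in the index
    for index_dims in combinations(DIMS, index_size):
        array_dims = tuple(d for d in DIMS if d not in index_dims)
        options.append((index_dims, array_dims))

print(len(options))                                   # 16
for i, (idx, arr) in enumerate(options, start=1):
    print(f"{i:2}: index={idx or '()'}  arrays={arr or '()'}")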

there aren’t too many options like that. Most options represent trade-offs in these three dimensions. Typically speaking, you are trying to improve some factor while minimizing the degradation to the other factors. Thus, you may be trying to improve storage space effi- ciency for a minimum loss in response time and cost. Or you may be trying to improve response time for a minimum degradation in storage efficiency or cost. Or you may be trying to improve cost for a minimum degradation in storage efficiency and response time. Figure 10.2 illustrates a typical time/space trade-off for some cost constraint.


Figure 10.2 A typical time/space trade-off. (The chart plots response time against space used per data point: within a typical middle range, both space and time performance are acceptable; beyond it, further attempts to reduce space result in increasingly slower response times, and further attempts to reduce response time result in increasingly larger amounts of space used.)

The largest issues affecting the choice of how to store data are the patterns of sparsity and the most likely use scenarios. Although the mechanisms differ, a number of disk-based products let you control which dimensions to put in the arrays, in what nesting order, and which to put in the indexes. Typically, the arrays all have the same shape. This is why you can avoid storing the dimension values. For example, if there are 100 geographical sites and 100 time periods dimensioning the variables sales and costs, and times are nested within stores, the array itself will only store values for sales and costs.

It is the offset into the array that determines which geog/time combination is referred to. Obviously, this can only happen if all the arrays are dimensioned in exactly the same way. Since every array will have a slot for a sales and a costs value for each of the 10,000 geog/time combinations, any missing or inapplicable sales or costs still need to be stored in their appropriate slot, though with a token indicating missing or not applicable. (Strictly speaking, this isn't true, since it is possible to compress most of the emptiness out of the arrays. Compression may slow the operations down or speed them up in some cases, depending on when and how the compression is done.) Since the arrays don't typically compress out sparse cells (at least when they are decompressed into memory), the ideal physical design defines the arrays so that they are relatively dense. The sparsity is brought out to the indexes because only actual dimensional intersections ever become an index entry.
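A minimal sketch of the offset arithmetic just described, using the dimension sizes from the example and assuming times are nested within stores and contents (sales, cost) are innermost:

N_STORES, N_TIMES, N_CONTENTS = 100, 100, 2      # contents: sales, cost

def slot(store_idx: int, time_idx: int, content_idx: int) -> int:
    """Offset of one cell in a dense array whose dimensions are nested as
    store (outermost), then time, then content (innermost)."""
    return (store_idx * N_TIMES + time_idx) * N_CONTENTS + content_idx

array = [None] * (N_STORES * N_TIMES * N_CONTENTS)   # 20,000 slots, most left empty
array[slot(17, 42, 0)] = 12_400                       # sales for store 17, time 42
array[slot(17, 42, 1)] = 7_100                        # cost for the same cell

# Because all the arrays share this shape, no dimension values need to be stored:
# the position alone identifies the store/time/content combination.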

Striving towards dense arrays tends to make them smaller and more numerous. Since spatial and response time overhead are associated with accessing each array, the ideal physical design also minimizes the number and increases the size of arrays, subject to the first constraint. Clearly, the two ideals are mutually conflicting. Thus, physical design is a problem of constrained optimization that frequently produces a set of equally good, yet differently optimized, solutions.

In addition to the array size and numerosity issue, there is also an issue of dimension nesting within an array. The ideal physical design will nest dimensions so that the innermost dimensions, which are the ones whose instances are most physically adjacent in storage, are the ones most likely to be accessed at the same time. If a set of 1,000 data values comes from 1,000 adjacent slots in an array, there is a far greater likelihood that all that data will fit on a single disk page and be accessed in a single disk access. On the other hand, if those 1,000 data values were spread out across vast regions of a large array, it could take up to 1,000 disk accesses to retrieve that same data. For example, if the time dimension is the outermost dimension, performing a period-to-period comparison may require accessing two different pages.

Now consider the same schema we looked at previously, but with the following data existence pattern descriptions:

■■ There are 100 stores, 100 times, 1,000 products, 3 scenarios, and 300,000 instances of each sales and cost variable.
■■ Most products are only sold at a few sites for any one time.
■■ Most stores only carry a third of the overall products.
■■ For the products carried by a store, they are sold about half the time.
■■ If there are actual values of sales or costs for a product, there are almost always planned values.
■■ The overall data sparsity is 99 percent.

Incidentally, data existence patterns like the simple one described previously are something that, as an application developer, you need to know and frequently need to discover on your own before you can make physical design decisions.

Since we know that the data is very sparse, it wouldn't make any sense to store all the dimensions in one large array because of all the wasted space it would create. Nor would it make any sense to store all the dimensions in the indexes, because it would take up a huge amount of index space and because it would create 100,000 data arrays. Since the products sold vary by time for a given store and by store for a given time, there's no way to define a reasonably dense array using the product dimension, so it probably belongs in the index. If we kept just products in the index and left Geog, Time, and Scenario in the arrays, we would still have pretty sparse arrays because there would be one array with a full set of Store-Time-Scenario slots for every product that sold at least once—which we assume is every product. So actually, this design wouldn't compress out any of the sparsity. The same could be said of any design that puts only one dimension in the indexes, so long as each instance of the dimension is used at least once in the data set.

Since putting only one dimension in the index didn't generate any sparsity reduction, how about putting three dimensions in the index? There are four possibilities.

Two of them, namely Time-Geog-Scenario and Time-Product-Scenario, do not reduce sparsity much. The first combination would only eliminate an intersection if some store sold no product for some time, which is unlikely. The second combination would only eliminate those intersections where a product was never sold in any store under any scenario, which is also unlikely.

The other two possibilities, Time-Geog-Product and Scenario-Geog-Product, both significantly reduce sparsity. Time-Geog-Product reduces sparsity wherever a particular product is not sold at a particular time for a particular store. Scenario-Geog-Product reduces sparsity wherever a particular product is never sold. The first of these two will compress out more sparsity than the second. However, since there are 300,000 instances of each variable and it is assumed that the Scenario dimension is dense against all the others (a pretty realistic assumption), the first option will create 100,000 dense teeny-weeny arraylets that use the Scenario instances to order the slots for sales and costs. Although Scenario-Geog-Product will leave more sparsity in the arrays, because about half the arrays will have sales of some product for some store for some time but not for other times, there will be far fewer arrays—closer to 1,000.

If minimizing sparsity is your goal, then the optimal choice would be placing Time-Geog-Product in the index, as no two-dimension combination of dimensions in the indexes can possibly reduce more sparsity. However, if you are willing to accept some sparsity, or if your software has good array compression algorithms, or if the benefit of strong sparsity reduction is overcome by the cost of maintaining and computing with so many arraylets, then your choice may be the second of the three-dimension options, namely Scenario-Geog-Product. If that is the case, then there is a two-dimension option that should work even better. Since the Scenario dimension is known to be very dense against all other dimensions, there would be a decrease in the number of arrays without any loss of sparsity reduction by combining the Scenario with the Time dimension in the arrays and leaving the Geog and Product dimensions in the indexes. Thus, out of 16 possible ways of laying out four dimensions, two ways of slightly different characteristics emerge as equal contenders, depending on the constraints of the application.

Different OLAP products offer different capabilities for designing efficient storage and access methods. For example, Hyperion's Essbase lets the application developer designate certain dimensions as sparse or dense, with the union of the sparse dimensions being used as indexes. Oracle's Express lets the application developer declare combinations of dimensions as sparse to form indexes into dense arrays, or to merge sparse dimension combinations into a single so-called conjoint dimension, which is then used to dimension other dense arrays. Microsoft's Analysis Services uses a compressed record-based storage that is more akin to RDBMS storage. Applix's TM/1 lets the application developer specify the nesting order of dimensions. However, regardless of the specific techniques offered by a vendor, and as evidenced by the previous example, you, the application developer, need to know the distribution of data across the various dimensions of your application before you can make informed choices about the best method to organize those dimensions for storage and retrieval.
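One way to ground this kind of reasoning is to measure candidate designs directly against a sample of existing cells. The sketch below uses a tiny invented sample (not the 100-store example above) and, for each candidate set of index dimensions, counts how many arrays would be created and how densely they would be filled:

# A tiny invented sample of existing cells: (geog, time, product, scenario).
existing = {
    ("store1", "t1", "prodA", "actual"), ("store1", "t1", "prodA", "plan"),
    ("store1", "t2", "prodB", "actual"), ("store1", "t2", "prodB", "plan"),
    ("store2", "t1", "prodB", "actual"), ("store2", "t1", "prodB", "plan"),
}
DIMS = ("Geog", "Time", "Product", "Scenario")
cardinality = {d: len({cell[i] for cell in existing}) for i, d in enumerate(DIMS)}

def evaluate(index_dims):
    """Number of arrays and average fill for one choice of index dimensions."""
    idx_pos = [DIMS.index(d) for d in index_dims]
    keys = {tuple(cell[i] for i in idx_pos) for cell in existing}
    slots_per_array = 1
    for d in DIMS:
        if d not in index_dims:
            slots_per_array *= cardinality[d]
    fill = len(existing) / (len(keys) * slots_per_array)
    return len(keys), slots_per_array, fill

for candidate in (("Time", "Geog", "Product"), ("Scenario", "Geog", "Product"), ("Geog", "Product")):
    arrays, slots, fill = evaluate(candidate)
    print(f"index={candidate}: {arrays} arrays x {slots} slots, {fill:.0%} full")

Even on this toy sample the pattern in the text shows through: putting Time-Geog-Product in the index leaves fully dense (but more numerous) arraylets, while leaving Scenario and Time in the arrays trades some sparsity for fewer arrays.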
Now that you’ve learned how to think about storing multidimensional data, let’s turn to some of the storage and access methods offered by the vendors.

Storage and Access Methods

Although researchers into and developers of multidimensional databases have created many innovative storage and indexing routines over the years, they are leveraging the same body of principles as any other database researchers and developers. A classic text on database internal methods for both multidimensional and general databases is Readings in Database Systems, 2nd ed.1 What follows is just enough information for you to
■■ Appreciate some of the challenges faced by product developers.
■■ Ask the right questions when talking to vendors.
■■ Select the appropriate type of indexing for your needs, either by selecting the product that offers what you need or by selecting a method within a product that offers more than one.

MULTIDIMENSIONAL INDEXING

Multidimensional indexing has been an active area of research for a long time for geographic and spatial systems, for knowledge bases, and for other domains with multiple independent keys. The basic B-tree index has been extended to multiple dimensions with Kd-trees and R-trees, for example. Efficient multidimensional hashing can be implemented using grid files, and BANG files (balanced and nested grid files) form a tree-like hash index. Merging the structures for indexing with computation processes, the hierarchical cube forest structure stores aggregate values in a clustered index (an index structure that contains the actual data, as opposed to pointing to the data in a separate table), but also determines a storage/computation trade-off in a fashion similar to that described in "Implementing Data Cubes Efficiently."2 While these may be implemented in a specialized multidimensional database, they are just indexing techniques similar to those in use in all commercial RDBMSs. They index records in the same way that simple B-trees and one-dimensional hash indexes do already, so they could be applied and accessed through SQL in an RDBMS. Even when records are Z-ordered or Hilbert-numbered, as described next, the clustering is spread out, but not eliminated.

In Z-ordering, a single numerical key is synthesized from each dimension's member key to form a number that assists range searching in a hypercube. Rather than combining the keys one after another, the bits that compose the keys from each dimension are interleaved to form a binary-coded Z-number. Z-numbering tends to place records that are near each other in the dimensional space near each other on the disk, in all dimensions. This technique was first developed to support spatial querying for geographic and image databases. The clustering has the interesting property that no dimension is poorly clustered, and no dimension is ideally clustered either. (Contrast this with one-dimensional arrays, which do a great job of clustering on one dimension, but provide no clustering on any other dimension.) All points within a subcube are found between the corner of the hypercube with the lowest Z-number and the corner with the highest Z-number. With Z-ordering, certain types of multidimensional range queries will get more optimal access than they would with arrays, and no dimension should suffer badly compared with other sortings of records. However, there will be some bias towards the dimension that appears first in the order of interleaving, and some bias away from the last dimension in the order of interleaving.3

Hilbert numbering is conceptually similar and has been proven to more evenly cluster records, but is computationally more intensive to perform. Instead of a Z-number formed from interleaving bits, a Hilbert number represents a step in a fractal walk through the dimensional space. It is called a Hilbert number because the walk is in the form of a Hilbert curve. Hilbert numbering has been part of a considerable body of research and development on multidimensional indexing.4

Another specialized storage and access method, which works only when your database has a time dimension (which is nearly always), treats the time series as the fundamental unit to be indexed. (This is akin to putting all the nontime dimensions in the index and storing a collection of compressed time series arrays.) Accrue's Pilot Decision Support Suite (formerly Pilot's Analysis Server), for example, understands time as a series and can automatically do time conversions across a variety of different calendar systems. But there is a cost associated with storing time series as entities: it is slower to perform cross-section (that is, cross-array) calculations.
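As a rough sketch of the bit interleaving behind Z-ordering (two dimensions and small integer keys assumed here; production implementations generalize to more dimensions and wider keys):

def z_number(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two dimension keys into a single Z (Morton) number."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # bits of y go to odd positions
    return z

# Cells that are close in both dimensions get close Z-numbers, so sorting
# records by Z-number clusters them on disk in all dimensions at once.
cells = [(3, 4), (3, 5), (2, 4), (12, 30)]
for x, y in sorted(cells, key=lambda c: z_number(*c)):
    print(x, y, bin(z_number(x, y)))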

Since disk storage is ultimately two dimensional and memory is ultimately one dimensional, multidimensional products cannot treat different dimensions as physically equivalent, even though from a computational perspective they may be treated as logically equivalent. There will always be some dimensions across which data are more easily retrieved than others. Given a particular set of data elements to be stored, the basic trade-offs are minimizing average seek time versus minimizing worst-case seek time versus minimizing best-case seek time. Storage is more typically optimized for the first two options. Depending on the database, there may be choices to be made about the indexing technique to be used. Most multidimensional databases handle this automatically, but the system designer sometimes has a choice. Table 10.1 is a partial list of the basic multidimensional structures found in commercial OLAP products. This list starts with the most basic multidimensional structures and moves to some of the more unusual, dynamic structures.

Across Machines

Many, but certainly not all, OLAP applications are deployed for multiple users to share within a departmental or an enterprise framework. Thus, there are typically a variety of machines existing in some kind of network topology, and the OLAP data needs to be appropriately distributed.

Table 10.1 Commercial Multidimensional Structuring Techniques

Array / no compression or indexing. Speed: the fastest possible method for read and write, regardless of the number of dimensions. Space: suitable for small or dense structures (>30 percent density); no overhead for keys or indexes. Comments: well suited for complex modeling calculations with modest quantities of data; often used for data temporarily held in RAM.

Array / empty pages suppressed. Speed: very fast for read, good for write. Space: suitable for "clumpy" data on disk; good for density >10 percent. Comments: a typical mainframe database technique where large pages of data could be moved rapidly between disk and memory.

Array / strings of repeated values compressed. Speed: slow for read, very slow for write; unsuitable for random access. Space: good for planning data (with many repeated or empty values), less good for actuals. Comments: a good way of handling disk-based dynamic compression with block transfers and unpacking in memory.

Fixed record table / hashing. Speed: good for read and write with moderate data quantities. Space: efficient dynamic sparsity in all but one dimension. Comments: good dynamic sparsity handling in all but the columns dimension.

Fixed record table / B-tree. Speed: good for read and write. Space: good for moderately sparse data without too many dimensions. Comments: good for moderate data quantities.

Fixed record table / bitmap. Speed: good for read and write. Space: good for moderately sparse data with more dimensions than B-tree. Comments: good for moderate to large data quantities.

Fixed record table / sorted. Speed: good for read, very bad for write; unaffected by sparsity. Space: best with small numbers of dimensions. Comments: good for large, very sparse, read-only data sets.

Fixed record table in an RDBMS / as used in the RDBMS. Speed: acceptable for occasional read and write; can be slow for bulk storage. Space: effectively no net space used if the data was stored in the RDBMS anyway. Comments: good for integration with other systems and warehouses, and for using the RDBMS's data management facilities.

Variable length record structure / sorted and compressed records. Speed: acceptable read, very slow write. Space: excellent data compression of randomly sparse data on disk. Comments: an excellent way of storing, but not processing, very sparse data on disk.

Variable length record structure / sorted records, with only differences stored. Speed: slow to read, completely unsuitable for incremental write; updates must be stored separately and merged occasionally. Space: highly flexible storage of tagged individual cells; efficient for applications with up to a few million cells. Comments: a specialized technique for storing mixed data types multidimensionally.

Adapted from Pendse and Creeth (1995).

The issues surrounding the optimal distribution of data across machines for OLAP applications are largely the same as for any other complex application. The major complicating factors that you need to keep in mind when optimizing the distribution of data across machines in OLAP applications are the massive amount of latent data and data reduction operations, the occasional pocket of data expansion operations, the existence of low-bandwidth links—especially wireless—and the existence of many remote users who only occasionally connect to a network.

Basically, there are three choices for storing data:
■■ Source sites
■■ Use sites
■■ Intermediate sites

The major factors that influence decisions of how best to distribute data across machines are as follows:
■■ Machine functionality
■■ Bandwidth over different links in the network
■■ Machine location(s) of the major sources of data
■■ Machine location(s) of the major queries for data
■■ Locations of firewalls

Machine Functionality

The classic distinction everybody makes is between thick and thin hardware clients. In the old days, the distinction was between mainframe computers and dumb terminals. The thinner the clients, the more data and processing needs to happen on a server. Although thin-client strategies are common in web applications, I find that they inhibit all but the most rudimentary browse queries. One reason for this is that analytical sessions are composed of sets of context-maintaining queries while the HTTP-based query model is stateless.

It is also useful to distinguish between degrees of server functionality—especially when deciding whether an OLAP application and its associated data should reside on the same physical machine as a source application, typically relational, or whether the OLAP application should reside on a separate machine. The former case obviously demands a more powerful machine while the latter requires fast transfer rates between machines. The emergence of storage area networks may also have an impact on these decisions.

Bandwidth over Different Links in the Network

Clearly, you need to be careful about the transfer speed of the network connections between machines that routinely transfer large amounts of data. Although typical corporate LANs transfer data at 100 Mbps, the burgeoning wireless connections are still 1,000 times slower, and we still haven't eliminated modem connections from hotel rooms. So, the queries that work fine at the office may fail miserably in a WAN environment. You may want to adopt different data distribution strategies for different network types. For example, you may provide more replicated data for remote clients, assuming they can occasionally plug in to a LAN, so that their remote queries require fewer data transfers. Or you may even reduce the number of cells returned in a query.
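A quick calculation makes the point; the link speeds below are the approximate figures quoted above, treated as assumptions rather than measurements:

# Rough transfer times for a 25 MB result set over links of different speeds.
RESULT_SET_BYTES = 25 * 1_000_000

links_bits_per_s = {
    "office LAN (100 Mbps)": 100_000_000,
    "wireless (~100 Kbps)": 100_000,
    "hotel modem (56 Kbps)": 56_000,
}

for name, bps in links_bits_per_s.items():
    seconds = RESULT_SET_BYTES * 8 / bps
    print(f"{name}: {seconds:,.0f} seconds")

# Roughly 2 seconds on the LAN but about 2,000 seconds (over half an hour) on a
# 100 Kbps wireless link, which is why remote clients may need replicated data
# or smaller query results.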

The Machine Location(s) of the Major Sources and Uses of Data

If your OLAP applications are getting most of their base data from an enterprise warehouse, and if there are multiple departments all using OLAP applications, and there are lots of users, you will probably get better performance by creating department-specific OLAP stores or marts. However, if you are using departmental stores, you need to be aware of the amount of data that is likely to be transferred between the enterprise warehouse and the departmental store. You may want to precalculate many derived cells at the departmental level (which you can do during off hours), to avoid having end-user queries trigger requests to the enterprise warehouse that require large data transfers during high-traffic business hours.

The Location of Firewalls

The major impact of firewalls in an enterprise is to define a security layer, limiting access from one side of the firewall to the other. The access control can be simple or sophisticated, and the technical capabilities are advancing quickly. Firewalls help to divide networks into separate security areas, or to bridge two or more separate networks together. As such, they are commonly associated with TCP/IP internetworking and with accessing data across the Internet. Two of the most common uses of a firewall are to prevent one side from initiating certain types of communications with the other, and to allow many devices on one side to access a small range of TCP/IP addresses on the other side.

Every OLAP system whose components can communicate over TCP/IP networking is capable of being distributed across the Internet. In practice, the use of firewalls complicates this, as firewalls typically are set up to block external access to all but a few well-known logical ports, whereas the OLAP system was probably attempting to use other, more obscure ports in its native communications in order to avoid conflict with the well-known ones. Many network administrators are reluctant to open ports, even if only to be used by a particular application, for fear that this will open unknown security holes. However, there is most often at least one port open from the outside to the inside, and that is the port that web servers listen to for HTTP traffic (port 80). Accessing data outside a firewall is, for many users and administrators, synonymous with providing web (HTTP) access to the data.

Across Applications (ROLAP and HOLAP)

Traditionally, OLAP products have used their own proprietary databases, optimized to suit the particular applications for which the products were intended to be used. Although an organization's data may be held in SQL tables, the SQL tables are a part of the organization's analysis-based, decision-oriented information processing system and not a part of its operational systems. Thus, even if an OLAP application stores all its data in a relational database, it will be working on a separate copy of the data, not the live transaction database. This allows it to be physically optimized for decision support.

In principle, it is not difficult to store multidimensional data in SQL tables; after all, this is often how the data originated. The problem is to provide fast access and flexible multidimensional manipulation. (Recall that we looked at the problems of defining multidimensional aggregations and analyses in Chapter 2.)

The most common form of relational storage used for OLAP is commonly referred to as a star schema. (Recall the discussion of star schemas in Chapter 2.) Star schemas and their associated variants (such as snowflake schemas) are essentially methods of optimizing SQL database storage for data access purposes. Rarely are real-world applications constructed out of a single fact table. If there are many variables, it may not be possible to store them all in a single fact table because of column limits in the underlying relational database. It may also be inefficient because the sparsity may differ greatly between groups of variables, or all the dimensions might not apply to all the variables; and the data may also come from multiple applications, making updates to a single large table inefficient. Thus, in large applications, it is quite common to partition the fact table between groups of variables based on their sparsity, which dimensions apply to which variables, and where the data comes from.

Once the table layout has been designed, the base tables populated, and the summary tables created, the problem of multidimensional manipulation still needs to be resolved. The major impediment, as we saw in Chapter 2, is that standard SQL is not equipped to specify multidimensional operations.

The chart from the OLAP Report in Figure 10.3 shows the optimum storage strategy for different application sizes and sparsities. Of course, the scales are only approximate guides and will depend on the hardware being used. Applications and products can be mapped into this space, and the degree to which an application fits a product's capacity and style can be checked. Products that allow multiple or hybrid structures cover more of the area, and some might cover the whole surface, although other factors (not least, price) may still make them unsuitable for some applications.

The area of the chart in which memory-based applications are the best fit is based on the reckoning that a memory-based system will always outperform (and provide more dynamic calculations than) a disk-based system whenever it can hold everything in real RAM. However, for very dense examples, disk-based products are nearly as good, but require less RAM and are therefore cheaper. Conversely, for very, very sparse applications, a hybrid solution of memory plus disk and/or RDBMS is probably best.
This is because very sparse data typically has a higher overhead for indexes and pointers, as the sparsity is driven by high dimensionality, so there are simply more pointers. Even if held in memory, the available RAM capacity is consumed faster.

(Figure 10.3 plots natural sparsity on the vertical axis, from 100 percent dense down to 0.000001 percent sparse, against the quantity of base data cells on the horizontal axis, from 10 thousand to 100 billion, and marks the regions best served by pure RAM storage, proprietary MDB storage, and relational or hybrid storage.)

Figure 10.3 Optimum storage architecture across applications.


The area of the chart in which proprietary disk-based systems are recommended reflects their ability to handle medium to large quantities of data most efficiently. For fairly large, but very sparse data or very, very large databases, a RDBMS storage strat- egy may be the only feasible option, possibly boosted by memory and MDB-based stor- age of some of the most frequently used information. Even if it is desired that most data be kept in RDBMS storage, some memory-based structures are desirable for good per- formance. The option to keep some data in a proprietary structure allows intermediate or temporary results to be held within the OLAP application with better access perfor- mance than would be possible with the RDBMS alone, and also allows more application control over data structures that do not need to be shared with other applications. It is not a bad idea to map any proposed early application into the chart, just to check that you understand it and how it relates to OLAP. Note that the log scales used on both axes make the memory area look somewhat larger than reality and the other two areas are somewhat compressed; however, this is not misleading because there are far more small than large OLAP applications. The horizontal axis shows the number of real base data cells stored, not the theoretical number that could be stored or those that are calculated from the base data. The position of the scale will depend on the amount of RAM available, so the scale should be regarded as an approximate guide, not a pre- cise measure.
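To place a proposed application on the chart, you need exactly those two coordinates: how many base cells are actually stored and how sparse the space is. The following is a minimal sketch of estimating both from a relational fact table; the table and column names are illustrative, not taken from any particular product.

    -- Base cells = rows actually stored in the fact table.
    -- Natural sparsity = stored rows divided by the theoretical cell count,
    -- which is the product of the dimension sizes.
    SELECT COUNT(*) AS base_cells,
           100.0 * COUNT(*) /
             ( CAST((SELECT COUNT(*) FROM store_dim)   AS FLOAT)
             * CAST((SELECT COUNT(*) FROM product_dim) AS FLOAT)
             * CAST((SELECT COUNT(*) FROM time_dim)    AS FLOAT) ) AS density_pct
    FROM sales_fact;

The density percentage corresponds to the vertical (natural sparsity) axis of Figure 10.3, and the base cell count to the horizontal axis.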

Across Partitions

Some OLAP products that are designed to handle larger quantities of data have implemented techniques to manage the data by dividing storage into partitions. Each partition provides a separate storage area for a subset of the data. In addition, most RDBMSs have the ability to logically partition tables into two or more separate storage areas. This can provide a number of benefits for querying, loading data, and maintaining the database.

Ideally, partitioning is a purely physical feature that can be driven by the logic of the overall model, but otherwise does not impact the logic. Partitions may be defined based on data load patterns, how data is accessed in frequent queries, or how different calculations are carried out in different regions of a larger schema. OLAP servers differ in their support for what can be partitioned and the degree to which the partitioning affects the model design.

The benefit of being able to partition the model is the capability to concentrate information that is likely to be used together into physical regions on disk (and memory and CPU when partitions are distributed across machines). If data can be loaded into only one partition out of a database, the total disk activity may be less and the efficiency of disk activity may be higher. If a query only retrieves data from one partition, then the number of bytes read from disk to locate and ultimately retrieve the data may be less.
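Partitioning syntax is product-specific, but as a sketch, a fact table that is declaratively range-partitioned by time might look like the following in a PostgreSQL-style dialect (the names and date ranges are illustrative). Loading a month of data, or answering a query restricted to one month, then touches only one partition.

    CREATE TABLE sales_fact (
        time_id    DATE    NOT NULL,
        store_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        sales      NUMERIC,
        units      NUMERIC
    ) PARTITION BY RANGE (time_id);

    -- One partition per month; each is a separate physical storage area.
    CREATE TABLE sales_fact_2002_01 PARTITION OF sales_fact
        FOR VALUES FROM ('2002-01-01') TO ('2002-02-01');
    CREATE TABLE sales_fact_2002_02 PARTITION OF sales_fact
        FOR VALUES FROM ('2002-02-01') TO ('2002-03-01');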

Calculation Distributions

Although there are many ways to distribute the storage of data, the choices are all spatial. In contrast, calculations can be distributed across space and across time.


Temporal

The bottom line question you need to answer is which, if any, derivations do you calculate in advance of any particular query and which, if any, do you calculate only in response to a user's query? The fact that there can be thousands of times as much derivable data as input data is a strong disincentive against blindly precalculating everything. The fact that single queries for high-level aggregates may derive from billions of input data cells is a strong disincentive against calculating everything at query time. Thus, the majority of OLAP applications perform best when you precalculate some, but not all, derivations.

As far as precalculation is concerned, there are a few choices of event-based times. The earliest time you can calculate is at load time. Many applications offer the choice of performing aggregations at load time. Typically, load-time aggregating is good for bulk, simple aggregations where the dependency trees for the calculations match those of the dimensions. (Please see Chapter 7 for the description of dependency versus referential dimension trees.)

Batch update-based precalculation is a variant of load time. Frequently, large applications are updated in a batch process on either a daily or weekly basis during a fixed update window. Unless the organization maintains shadow cubes, which is necessary for 24/7 (3+ continent) operations, cube browsing will be unavailable during the update window, so the more calculations are performed at update time, the longer the downtime will be.

User update-based precalculation is much closer to request time, but not quite the same. Whenever users are entering data for planning and simulation purposes, small updates to the data may translate into many updates to derived values. Whole enterprise planning scenarios can be driven by a few numbers. The impacted derivations haven't been queried for yet, so calculating them is a form of precalculation.

Finally, calculations can be made when the user requests them. The user may not know that a particular value, such as Eastern region total margins in dollars, hasn't yet been calculated. The user simply has to request it. The request then triggers the calculation.

Optimizing the management of aggregate tables in an SQL database is an area of active research. Aggregate tables take up storage space, and reduce the time it takes to return aggregated results to a query. Given some knowledge of how big the base and aggregate tables are, a judicious compromise between storage and computation can be made. The ideal location of this knowledge would be in the SQL optimizer itself, but this would require either extending SQL to describe the relationships between the aggregated views and the base table, or overhauling the internal catalog and optimizer to automatically recognize these relationships.

SAMPLING

An alternative to speeding up queries by calculating a derivation in advance is to calculate a sample-based aggregation at request time. For some users, a result that is nearly exact will be good enough, especially given a high enough speedup. Sample-based aggregates are supported in some RDBMSs and some OLAP tools. Unfortunately, multidimensional sample-based aggregates are more difficult to calculate than one might think.5
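As a sketch of the idea, several RDBMSs expose a TABLESAMPLE clause; the exact syntax, and the statistical guarantees of the sample, vary by product, and the names below are illustrative.

    -- Read roughly 1% of the fact rows and scale the aggregate back up.
    -- The result is an estimate, and estimates computed independently at
    -- different aggregation levels will not necessarily be consistent.
    SELECT product_id,
           SUM(sales) * 100 AS estimated_sales
    FROM sales_fact TABLESAMPLE SYSTEM (1)
    GROUP BY product_id;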

However, this knowledge is also useful to products that know the structure held in the tables and generate their own SQL. The paper "Implementing Data Cubes Efficiently" by Harinarayan, Rajaraman, and Ullman6 was an important description of research results for optimization techniques.

Other temporal considerations include the following:

■■ How often data needs to be updated, and whether this is in large batches or individual cells. (Batch updates are simple because only a single task has to be run and all the updates can be merged in at once, and any database restructuring can occur at the same time; individual cell updates are much more complex because they need fast random write access, even to hitherto empty cells, which not all tools can support. A sketch of a cell-level write follows this list.)

■■ How often the database dimensions change. (This will affect the extent to which the dimensions are made data driven.)
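As promised above, here is a sketch of what a cell-level write looks like in relational terms, using the SQL MERGE statement (available in several, though not all, RDBMSs; the table and column names are illustrative). A batch update, by contrast, is typically a bulk load or INSERT ... SELECT followed by a rebuild of the affected summaries.

    -- Write one budget cell, whether or not a row for that cell already exists.
    MERGE INTO budget_fact AS t
    USING (VALUES (200201, 17, 42, 1250.00))
          AS s (time_id, store_id, product_id, amount)
       ON t.time_id = s.time_id
      AND t.store_id = s.store_id
      AND t.product_id = s.product_id
    WHEN MATCHED THEN
        UPDATE SET amount = s.amount
    WHEN NOT MATCHED THEN
        INSERT (time_id, store_id, product_id, amount)
        VALUES (s.time_id, s.store_id, s.product_id, s.amount);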

Across Applications and Utilities

In addition to being able to distribute calculations over time, in a typical decision support architecture you also need to decide which applications you are going to use for certain calculations. Relative to an OLAP application, the basic issues are deciding which calculations, if any, are to be performed on your source database, which are to be performed in any data transformation software, and which, if any, are to be performed on a supplementary analytic tool. The source database may be relational, or a mainframe database, or a real-time process historian. Frequently, data is aggregated and other simple calculations are performed during the process of transforming the data between systems in an ETL (extract, transform, and load) step. The OLAP tools provide varying degrees of computational ability, but more advanced analytical calculations may require the services of a statistical package or other specialized system.

Using the RDB

If SQL Group By operations are to be used to generate derived values, it is most efficient not to mix multiple levels of consolidation together in a table. This means that for each combination of hierarchical level in each dimension that is deemed to be worth precalculating, a separate summary table will be needed. However, these in turn are likely to be partitioned on the same lines as the base tables, so the potential number of precalculated summary tables can rise very rapidly. The largest applications can include thousands of summary tables. These need to be kept in line with the data and comprehensively revised if, for example, product structures change.
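As a sketch, one such summary table, holding a single combination of levels (product by month, across all stores), might be created as follows; CREATE TABLE ... AS syntax varies by RDBMS, and the names are illustrative.

    -- Each summary table holds exactly one combination of hierarchical levels,
    -- so no table mixes levels of consolidation.
    CREATE TABLE sales_by_product_month AS
    SELECT f.product_id,
           t.month_id,
           SUM(f.sales) AS sales,
           SUM(f.units) AS units
    FROM sales_fact f
    JOIN time_dim t ON t.time_id = f.time_id
    GROUP BY f.product_id, t.month_id;

A separate table would be needed for, say, product group by month or product by quarter, which is how the number of summary tables multiplies so quickly.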

Calculations in ETL Packages

Some ETL packages can perform aggregations and simple calculations on data as the data is moved from source to target. Simple row-wise calculations (multiplying two fields, or converting a NULL to a zero if some condition is true) are readily performed. They may also provide light aggregation of data as well—for example, collapsing a set of similar transaction lines to a single line representing all the units of a single product.

Special-purpose sorting packages, frequently used as part of the overall ETL process, can sometimes also produce larger-scale aggregates as well. These packages are typically not considered ETL products in and of themselves, but can be critical to the overall process of getting data loaded and indexed for analysis.
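When the ETL tool pushes its work down to the database, the same row-wise and lightly aggregating transformations can be written directly in SQL; the following sketch uses illustrative names.

    -- Row-wise: treat a missing discount as zero and compute a net amount.
    -- Light aggregation: collapse individual transaction lines into one row
    -- per product per day before loading.
    SELECT tx_date,
           product_id,
           SUM(units) AS units,
           SUM(units * unit_price * (1 - COALESCE(discount, 0))) AS net_sales
    FROM transaction_lines
    GROUP BY tx_date, product_id;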

Client Application versus Server Calculations

The client application a user sits down to may in fact add even more layers of software on top of whatever OLAP server and client components already exist. For example, some systems use a spreadsheet interface to OLAP client components that in turn access relational or MD servers. Additional calculations may be embedded in spreadsheet formulas to act on retrieved analytical results. Alternatively, custom C++ or Java code may perform further manipulations of data in other client interfaces. Such additional calculations are usually performed either for user convenience (the user knows spreadsheet formulas, but lacks the knowledge to alter the queries that filled the spreadsheet in the first place), or because the underlying OLAP system lacks the technical ability to perform the calculation in a timely fashion.

Common Configurations

So far in this chapter, you’ve seen a variety of ways and some of the decision heuristics for distributing data and calculations in an attempt to optimize, or at least satisfice7 the physical design of your application. In this last section, you will see some of the most common application distribution configurations I’ve seen adopted by organizations who’ve implemented OLAP solutions. From a wide brush stroke perspective, they dif- fer as a function of whether they are departmental or enterprise solutions, whether they are purely multidimensional or mixed-multidimensional relational, whether they are thick-client or web-browser based, and whether they are internal or cross the firewall.

Departmental: Relational Data Mart with a Multidimensional Client

Many corporations have relationally based departmental data marts to which they have given end-users data access through OLAP query tools. Although the query tools differ as a function of whether they reside purely on the client or whether a part of the query tool is server resident, the tools typically create single user query result caches, which means that the more end-users there are, the more server memory is needed to store each of the user's query results. Products will differ as a function of whether some or all of that cache is client resident and whether the cache is incrementally modifiable.

Since the source data is relational and since there is no significant multidimensional processing done on the server, this type of configuration does not lend itself to sophis- ticated client calculations beyond those whose required data is small enough that it can be efficiently sent across the network to the client. There will typically be 5 to 30 users in 1 to 3 job functions that need to access reports swiftly to assist their managers. There is likely only one server involved for the data mart, which will usually be ded- icated to running the RDBMS. In a smaller situation, the RDBMS may be used for other applications besides the data mart. Each user will have his or her own client computer to access the data. The multidimensional client software can be based on the client tier or based on the server and accessed through a web interface. In the event of client- tier multidimensional software, end-user computers are likely to be slightly more pow- erful than the standard computer (or at least have more RAM) owing to the client-side multidimensional processing. By far, the most frequent type of derivation is summation. Some number of differ- ences and ratios will also be used. Simpler dimensional referencing is common along the Time dimension, for example, to perform period-to-period comparisons.
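Where the data mart's RDBMS supports SQL window functions, a period-to-period comparison of this kind can be pushed to the server rather than computed on the client; the following is a sketch over a hypothetical product-by-month summary table.

    -- Month-over-month change and ratio of sales for each product.
    SELECT product_id,
           month_id,
           sales,
           sales - LAG(sales) OVER (PARTITION BY product_id
                                    ORDER BY month_id) AS change_vs_prior,
           sales / NULLIF(LAG(sales) OVER (PARTITION BY product_id
                                           ORDER BY month_id), 0) AS ratio_vs_prior
    FROM sales_by_product_month;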

Departmental: Multidimensional Server and Client

Many OLAP solutions, especially in the financial sector or function, replaced or complemented what were previously spreadsheet-based applications with source data coming directly from users as with budgeting and planning, or with source data coming from financial transaction systems. In these situations, the solution configuration typically contains a multidimensional departmental server connected to a set of multidimensional clients.

Whether the client and server tiers are serviced by a single multitier product or by two products communicating through an API (which is typically public, though not necessarily published), the multidimensional server typically provides robust dimensional structuring and calculations. It is usually possible for end-users to specify calculations on the client that are executed on the server with the appropriate result set (or a cursor into a server-resident query result cache) funneled back to the client.

This type of configuration is especially useful when end-users actually need to perform sophisticated analyses, and not just view and drill down on previously created result sets. The number of users will still be in the range of 5 to 35. The reporting needs will be similar to that of other departments, except that when financial reporting is involved there tends to be less of a need for user interaction and more of a need to lay reports out in a complex and precise fashion owing to management needs.

The calculations performed in a departmental solution with a multidimensional server are typically more involved than those performed with a multidimensional client to a relational data source. More types of dimensional referencing are provided and used—for example, many flavors of allocations, cumulative and rolling sums, and averages.
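Whichever tier performs them, cumulative and rolling sums are simple to state; in SQL terms (a sketch over a hypothetical organizational-unit-by-month summary table) they look like this.

    -- Year-to-date and trailing three-month sales per organizational unit.
    SELECT org_unit_id,
           year_id,
           month_id,
           SUM(sales) OVER (PARTITION BY org_unit_id, year_id
                            ORDER BY month_id) AS ytd_sales,
           SUM(sales) OVER (PARTITION BY org_unit_id
                            ORDER BY year_id, month_id
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_3_months
    FROM sales_by_org_month;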

Enterprise: Relational Warehouse, Multidimensional Midtier Server, and Multidimensional Client

As more and more corporations have deployed enterprise data warehouses and as many corporations have deployed some form of departmental OLAP solutions, it is also common to see multidimensional clients connecting to multidimensional servers acting as midtier intermediaries. An enterprise analytical system may have from 100 to 5,000 users in this type of architecture. The users will be spread across a spectrum of job responsibilities and divisional areas, including finance, marketing, production, distribution, customer relations, and more. Each department of the organization will have a user profile and application needs similar to the departmental scenarios described earlier. In addition, senior management will likely have a simplified interface to a set of indicators drawn from all areas of the organization, and there will be a set of more advanced analysts in one or more of the departments.

The relational warehouse may be in one or several large servers, most likely high-end UNIX boxes. There are likely to be several multidimensional midtier servers, still likely to be on midsized servers that may be running UNIX or a Windows NT derivative. The multidimensional servers draw data from one or more of the RDBMS warehouse servers as needed. Each end-user has his or her own client computer, a higher-end PC.

While the warehouse can be very large, the specialized OLAP application systems are usually directed at strategic subsets of interest in the different functional and departmental areas. The data warehouse itself performs some of the OLAP functionality. For example, summary tables are maintained, frequently by a package that specializes in sorting and aggregation (as opposed to by SQL SELECT queries). The warehouse itself is fed by an ETL system that lightly aggregates records together as transactions are retrieved from source systems. As with the other configurations, the majority of calculations are aggregations, along with a number of differences and ratios. Traditional statistics and nonparametric statistics are likely performed on separate application servers, as the capabilities of the database servers are likely reserved for the relational and OLAP systems.

Enterprise: Relational Warehouse, Multidimensional Midtier Server, Web Server, and Multidimensional Thin Client

The situation of an enterprise warehouse and multidimensional midtier server(s) with web servers and thin multidimensional clients is largely similar to the preceding enterprise warehouse scenario. The very same multidimensional servers may be involved in supporting both cases.

Extending the system with a web server and thin-client software enables a set of additional users and usage scenarios. It facilitates the use of the warehouse by internal users at remote locations, such as in hotel rooms. Portal applications become possible as well.

Suppliers and customers can access analytics to assist them in supply chain decisions. Some organizations have also created or shifted to providing analytical portals as their primary business.

From the point of view of physical distribution of the software, the thin-client approach is more compelling in an enterprise than in a single department. Software upgrades are localized to a smaller set of servers rather than a large number of desktops. Thin-client software is not always as robust as client-side software, though, and there is additional latency and consumption of server and network resources in producing and transmitting the HTML, XML, and image files as necessary to web browsers. (Note that when components or Java applets are activated on the client side, the client cannot really be considered thin anymore. Web deployment can still require clients as thick as any other.)

Summary

In this chapter, you saw how data storage, with associated index methods, and calculations can be distributed for optimum performance. You were exposed, by way of example, to the reasoning associated with selecting some storage form over another for a particular data set, and you learned that you need to know something about the data's sparsity before you can make appropriate storage decisions. The chapter also explained how data can and ought to be distributed within a machine, across machines, across applications, and across partitions. You learned how calculations should be distributed across time, across application tiers, and across applications. Finally, you were exposed to a variety of common physical configurations.

PART Three

Applications

All our faculties can never carry us farther than barely to observe that particular objects are constantly conjoined together and that the mind is carried by customary transition from the appearance of one to the belief of the other.

DAVID HUME

CHAPTER 11

Practical Steps for Designing and Implementing OLAP Models

There are many tasks associated with the design and implementation of a multidimensional information system. At a high level, they can be summed up as problem definition, tool selection, and solution implementation. In an idealized world, tool selection should follow problem definition because, as we saw in Chapter 10, tools are generally optimized for particular uses. Some tools, for example,

■■ Work exclusively as a client or server
■■ Need to load all data into main memory
■■ Assume that persistent data is stored in an SQL database
■■ Have built-in knowledge of time, accounting, or currencies
■■ Assume that certain dimensions—such as time, location, and/or products—are present
■■ Have less functionality, but are simpler to use within their intended domain
■■ Are harder to use, but are more general in their applicability

In reality, however, tool selection may precede a full understanding of the problem and be a function of many things that have nothing to do with the logical or physical aspects of that problem, including tool price, the geographical representation of the vendor, the friendliness of the salesperson, the size and reputation of the vendor, the quality of the vendor's collateral, and the existence of vendor champions within the organization. Thus, your tool selection process may affect how you design and implement your Online Analytical Processing (OLAP) model.



Regardless of the factors that influenced tool selection, regardless of whether you are building a model for yourself or others, and regardless of whether you are rapidly prototyping a solution or designing a logical model prior to implementation, you still need to go through the distinct steps of defining cubes, dimensions, hierarchies, members, formulas, and data links. Let us call these model-building steps. While it is true that there are substantial differences between tools in terms of the way you think through and perform these steps, the way the steps are chunked, and the order in which they are performed, you still need to pass through the same basic steps.

This chapter describes and explains the practical steps associated with designing and implementing OLAP models. It is written more as a guide than as a methodology. As such, it is not meant to generate any of the partisan fervor typified by proponents of methodologies such as De Marco, Yourdon, Martin, Rumbaugh, Booch, Shlaer-Mellor, and so on. The guidelines should serve you regardless of the tools you are using or the type of multidimensional system you are implementing.

So, for whom are these guidelines written? These guidelines are written for the designer and implementer of the OLAP model. They make no assumption about the relationship between the designer/implementer and the end-user(s). In other words, you may be designing a model to be used by others, or you may be designing a model for yourself. In either case, you need to design and implement it, and should define the core of the model before defining any ad hoc additions. The guidelines also make no assumptions about whether tool selection has or has not yet occurred. While on average, it may be that tool selection has already occurred and you are now ready to design and implement an actual solution (which may not be your first), you may want to first define your problem space and then specify the requirements that the tool you select needs to fulfill.

The guidelines are more detailed about the attributes of a well-constructed model than the order in which a model is constructed. As you will see, there are multiple starting points and sequences for building models, and different tools encourage you to build models in different orders as well. Only the specifics of a situation can serve to identify which starting point makes the most sense in that situation. In general, you want to begin where you are most confident of what is or what needs to be done and work out from there. For example, it might be that you are most confident of the data sources, but you are not yet sure of all the purposes people will use them for. This would result in a more bottom-up, data-to-function modeling approach. It might be that you know the types of calculations that need to be performed, but are less confident of the data sources to feed those computations. This is more of a top-down, function-to-data approach. You may have invested in datawarehousing and want to explore the value-add to your end-users of defining an OLAP layer on top of your relational star schemas. You may be coming from a spreadsheet environment where you have been performing some calculations and have some data, but want to generalize the calculations for model extensibility. What is critical is that you design a properly connected structure. In this sense, there is no right or wrong starting point. Rather, there are right and wrong connections.
(Software model construction enjoys a relative freedom of sequencing in comparison to physical model construction where you cannot, for example, build a skyscraper from the top down.)


Nevertheless, this chapter needs to be in some order. To correspond with what is typically the easiest way to learn about an OLAP solution, this guideline chapter is presented in an essentially top-down fashion, starting with cubes and ending with formulas that attach to members. The guidelines take you through the major steps of designing and, to a much lesser degree, implementing a multidimensional information system. At a high level those steps include the following:

■■ Understanding the current and ideal data flow
■■ Defining cubes
■■ Defining dimensions, members, and links
■■ Defining dimension levels and/or hierarchies
■■ Defining aggregations and other formulas

Physical issues are addressed, whenever relevant, together with logical issues.1 The reason implementation issues are only slightly addressed is because many of them are extremely tool specific. Finally, these guidelines are presented as a series of descriptions as follows:

■■ Things that typically occur
■■ Things you need to do and watch out for
■■ Rules of thumb for making certain decisions
■■ Questions that you should ask either yourself or those for whom you are designing the system

User Requirements

Understand the Current Situation

Regardless of how you go about designing a model, you first need to understand the problem situation for which you are attempting to provide a solution, in terms of what is actually going on and in terms of what ought to be going on.2 As you saw in the last section, there are logical and physical aspects to what is going on. You need to be aware of both. From a tool perspective, there could be any number of different initial situations, such as

■■ A bunch of analysts accessing the same worksheets or many separate partially redundant worksheets
■■ A data warehouse database or several linked databases from which analytical data is directly accessed or perhaps staged in a data mart
■■ Several legacy databases running operational programs from which extracts are taken through the use of homegrown routines

■■ Only an SQL database serving as data warehouse with a bunch of SQL report writing tools on the client side, probably with a large supporting IT shop

Regardless of the specific tools being used, you need to learn about the user's schemas and any schemas relevant to the source data. You also need to gain an understanding of relevant business rules such as rules about performance thresholds, data access, or the event-based distribution of information. The rules might be encoded or in the minds of key personnel.

As a solution provider, you need to talk with all the persons who have something to do with the system that existed at the time you arrived on the scene, as well as everyone who has a part in the system you are going to design and implement. These may include power end-users, casual end-users, data entry operators, system administrators, persons responsible for the data sources, and management. You will need to discover not only what people are currently doing, but also what they would like to be doing.

As you learn about the situation, you will identify problems and loosely classify them according to whether they have more to do with the availability of data, needed derivations, or representation forms (in which case they are more logical), or whether they have more to do with speed of access, amount of storage required, maximum number of concurrent users, operating system problems, network jams, or where the data is coming from (in which case they are more physical).

If you are the kind of person who was brought in to put out fires or stop hemorrhaging (pick your metaphor), there's typically an immediately apparent problem such as wrong budget totals or multihour response time for what should be multisecond queries. In these cases, you can jump over the problem definition and go straight to solution design.

Useful Questions to Ask

In any real situation, there is a dynamic that will exist between you and those for whom you are trying to build a solution (assuming those are different people). That dynamic will largely determine the degree to which you can ask questions of your choosing or insist that the eventual solution consumers respond to your questions once asked. This is especially true when the person(s) you are questioning or interviewing is a client of yours or closer to your CEO than you.

Also, once you are in a dialogue, information may be forthcoming in an unpredictable manner. A question about the number of users may trigger a response about slow response times, difficulties logging on, or weaknesses in data integrity. Since problems tend to occupy people's minds, it is not uncommon for interviewees to jump right into their problems while you're still trying to understand the context. You may only get a chance to learn about the context, or actual situation, after the fact.

Thus, the following questions about the actual situation, problems, and constraints on a solution should be taken as rough guidelines that highlight the types of information you eventually need to have, rather than a recipe that needs to be followed in a particular order.

Questions about the Actual Situation

1. With what frequency do which types of users use the system? (Questions 1.1 through 1.7 pertain to each type of user.)
   1.1 How many users are there for each type?
   1.2 What kinds of dialogue does each type of user have with the system?
   1.3 How much data does each type of user examine during a typical session?
   1.4 How much data travels across the network in response to typical queries for each type of user?
   1.5 What are the categories of information typically browsed by each type of user?
   1.6 What kind of end-user tools are being used to browse and analyze the data?
   1.7 How many distinct views are needed per type of end-user?
2. Overall, how much data is input to the system?
3. How many distinct data sources are there?
   3.1 For each source, what is its schema?
   3.2 How much data is there?
   3.3 At what rate are the links refreshed?
   3.4 What integration and refinement issues were involved with the source data?
4. What ad hoc computations are typically performed on the server?
5. What computations are preperformed on the server?
6. What computations are typically performed on the client?
7. What machines, operating systems, and network configuration(s) are used?

At this point, you should be able to draw or fill in a sources and uses diagram like the one shown in Figure 11.1. It is a convenient way to picture important information. The sources and uses diagram represents what is important for sources and end-users. It shows what kinds of data enter the system and who is using the data. It is, in effect, a simple logical model capable of summarily portraying the current situation.

Questions about Problems

You need to gain an understanding of problems that users are experiencing. As stated elsewhere, although it is useful to chunk problems into the two rough categories of logical and physical, be aware that many specific problems will have a mixture of logical and physical components. An easy way to cull out the most physical problems is to ask yourself if the problem would go away if only there were an infinite amount of infinitely fast hardware of infinite bandwidth. If the answer is yes, the problem is largely physical; if the answer is no, the problem is largely logical.

(The example diagram shows two sources and two user types. S1 is daily sales data: 1 million rows of 32 numeric columns in M204 on an IBM MVS host. S2 is weekly marketing data: 100,000 rows of 128 numeric columns in Oracle 7.3 on UNIX. U1 is a marketing analyst using Excel on a Windows 95 Compaq PC, producing a daily report, a weekly report, and weekly OLAP browsing. U2 is a regional manager using Forest & Trees and Approach on a Dell NT PC, with a similar mix of daily and weekly reports and weekly browsing.)
Figure 11.1 A sources and uses diagram.

Logical Problems

For each of the following questions to which your respondents answer yes, there exists a logical or physical problem:

■■ Is it hard to define certain multidimensional calculations such as profitability by product, market, and time? If so, how are they currently defined?
■■ Is it hard to drill down from summaries to underlying details?
■■ Is it hard to change views once they are created?
■■ Are users stuck with hard-to-interpret names?
■■ Is a lot of end-user intelligence required to operate the application programs?
■■ Are the graphics lacking?
■■ Are reports filled with zeros?
■■ Is there any inconsistent data?
■■ Does text need to be, without currently being able to be, associated with data?

Physical Problems

■■ Is end-user data access fast enough? How fast does it need to be?
■■ Can end-users generate queries without too much IT support?
■■ Is there enough hardware and software capacity at the server and client ends?

■■ Is a lot of client or server processing required for what seem like simple queries?
■■ Is there a lot of network traffic in response to client queries?
■■ Does the system support an adequate number of users for read/write operations?

Information about Constraints

Some of the relevant information you need in order to propose a solution may be expressed by participants in the form of constraints on a solution. For example, the users for whom you are working to provide a solution may not think of their operating system as a problem, but if they want to use an OLAP tool that needs to run on NT while the relevant database source is running on UNIX, then the OS could be a problem. A typical solution might include the following types of physical system constraints:

■■ Type of large-scale machine: SMP or MPP
■■ Type of end-user machines: uniprocessor or dual processor
■■ Operating system: Windows 2000 or Solaris
■■ Network topology: client/server or peer-to-peer
■■ Other software that needs to interact with the solution: GIS, statistics packages, or visualization tools
■■ Number of system users: tens, hundreds, thousands, and so on
■■ Data set sizes on the server: gigabytes, terabytes, and so on
■■ Application refresh rates: every minute, hour, day, or week
■■ Valid data types (numeric and non-numeric)

Requirements Documentation

If you have passed through the previous stages and can answer most or all of the questions contained therein, you have learned what the situation looks like at the source and for the end-users. You have learned about logical and physical problems with the current state of affairs, and you have learned about various constraints on a solution. You are ready now to learn about and suggest the user requirements that need to be met.

All application projects should have some form of user requirements documentation. User requirements, or simply requirements, represent a description of what any model must do in terms of logical or physical capabilities in order to successfully solve the identified problems. There are as many different kinds of requirements as there are applications.

Any measurable attribute of an application, such as the following, can be a part of the user requirements:

■■ The identification of data sources
■■ The number of end-users given concurrent access
■■ The specific calculations or views supported
■■ The speed of calculation
■■ The specific algorithms used for data encryption
■■ The data transfer rate over a network

User requirements are an indispensable tool for communication between users, developers, and managers whether that communication takes place before, during, or after development. The only way to resolve disputes over the degree to which some developer(s) did or did not build the agreed-upon application to some agreed-upon set of characteristics is to have mutually agreed-upon documentation (that is, the user requirements) defining those characteristics. For medium to large organizations and medium to large projects, user requirements may take months to create. Lots of politics may be involved. There are likely to be several iterations. A final document can easily be over 100 pages.

Since any and every aspect of a multidimensional model can be included in a requirements document and since user views and specifications of data sources typically are included, it is tempting to say that user requirements and solution design are one and the same. However, this would be incorrect. As detailed as they might be, it is always possible, and typically probable, that in order to meet all the user requirements, some unanticipated model structures not otherwise mentioned in the requirements documentation—this can be any combination of dimensions, cubes, variables, or data sources—will be required.

Since user requirements documentation is not specific to OLAP solutions and the items covered in a requirements document are also covered in a model design, and since the purpose of this chapter is to teach you how to build solutions, I have not included any specific examples of requirements documentation here.

Solution Design

Logical Model Definition3

Given an understanding of the current situation, known problems, user requirements, and constraints on a solution, how do you define a multidimensional model to meet the user requirements within the given constraints and solve all known problems? Is there a specific order to the steps for defining a multidimensional model? Are there certain things that must be specified?

From the 100,000-foot perspective, you need to define your model in terms of one or more cubes such that data sources (or links to data sources) and derived data outputs (or user views) are represented within the cubes, and all derived data are definable in terms of input data.

(The figure shows two multidimensional type structures over the dimensions Stores, Product, Scenario, Time, and variables: one for Sales and one for Cost, with the connected levels marking where data enters each cube and arrows marking the direction of calculation.)

Figure 11.2 Multidimensional type structure (MTS) represents data transformations.

This ensures that it is possible to generate the data outputs from the data sources. Figure 11.2 shows a version of the multidimensional data structures presented in Chapter 3, which are used here to represent data transformations. The vertical line segments still represent dimensions, but this time each segment represents a granularity level rather than an individual member. The connected points represent the specific granularity levels where the data entered the cube. The arrows represent the direction of calculation. For example, in the top panel, sales data can be seen entering the base level of the cube and aggregating upwards, while in the lower panel, cost data can be seen entering the top levels of store and product and the lowest level of time, and can be seen being allocated down stores and products but aggregated up in time.

So how do you define such cubes? There are a variety of steps that can be followed in a variety of orders. Figure 11.3 summarizes the main items that need to be defined, how often each item needs to be defined, and the order in which each item type should be and is typically defined as a function of the design environment (OLAP software versus some documenting repository) and, for each design environment, the particular starting point. There are three different kinds of starting points per environment.

The sequences in Figure 11.3 should not be taken as absolute. Rather, they are rough examples of real differences in sequencing that frequently occur on at least the first pass through model creation. Since the entire process is iterative (requirements understanding, solution design, and implementation), the differences between the starting points may fade after the first iteration. Nevertheless, I think it is useful to realize why there are different starting points and, perhaps most importantly, to understand that it is possible and useful—and unfortunately not common practice—to create a design document prior to and alongside any tool-specific model implementation.

(The figure is a table whose rows are the items to be defined (cubes, dimensions, dimension members and hierarchies, variables, aggregations, other derivations, links, and source data), together with how frequently each is defined (per application, per cube, or per dimension), and whose columns give a typical definition sequence for each design environment: OLAP software working from data in an RDB with a star schema, OLAP software working from a spreadsheet, and a repository or design document worked top-down/bottom-up or user-back/source-forward.)
Figure 11.3 A sample variety of sequences for defining an OLAP model.

NOTE For those sequences defined in OLAP software, the earliest point at which the cube(s) could be processed is indicated by the bold font used for the sequence number.

Since the rise of datawarehousing beginning in the early 1990s, the most common sequence for defining OLAP solutions has been working directly in OLAP software from data in a relational database (RDB) with a star schema. There, most typically, data is identified in one or more fact and associated dimension tables. Then working from within an OLAP environment, the warehouse data is linked into an OLAP model where links are created between dimension tables and OLAP cube dimension definitions. This linking usually defines most of the dimensional hierarchies as well. In many products, one would process the dimensions at this point and create a product-specific internal and persistent representation of the dimensions and hierarchies. The dimensions are combined into cubes. The cubes may also be instantiated by the software. Then variables are defined with default aggregation routines in place. At this point, the cubes' data can be processed, which is to say the links can be activated and data can be brought into the cubes and aggregated. Finally, while working from within the OLAP environment, additional derivations may be defined and calculated.

The next most common sequencing is working with OLAP software from data in a spreadsheet. This frequently occurs in financial applications where the current state is defined by small groups of users struggling to share worksheets for a common application. The main difference between working from an RDB-based star schema and working from spreadsheets is that star schemas, because they are already regularized, map cleanly to OLAP cubes. (Of course, the more complicated the star schema, the more complicated the cube structure.) In contrast, spreadsheet models can be (and often are) arbitrarily irregular. Thus, when working from spreadsheets, one typically needs to spend time analyzing the worksheets looking for sets of row and column names that could be OLAP dimensions. Once these are found, variables are picked out of the cell values and combined with sets of dimensions—typically but not necessarily all present along with the cell values—to form cubes of variables. Then, typically, aggregations are defined within the OLAP environment, links to the worksheets are created, data is read in, and further derivations are defined. Alternatively, the derivations are defined before the links are created and the aggregations are created afterward.

Although it occurs in scientific research, working from models prior to data is uncommon in the business world with the possible exception of demo creation where models are built first followed by the loading in of some data that is typically disguised and expanded to fill the needs of the demo.

Currently, it is not common practice for organizations to create a repository for their OLAP designs independent of the software environment within which the design is implemented. OLAP solutions are still being built like spreadsheet solutions. This is unfortunate because implemented models (aka applications) of any reasonable degree of complexity are usually not self-explanatory. Typically, there isn't even a full script of the application in the software.
Rather, key design decisions are reflected in disjoint outline structures, dialogue boxes, radio buttons, and other mouse clicks. Once the application is built, any disputes about derivations or other functionality may be diffi- cult to resolve because the requirements were insufficiently specified. Then, after a few years have gone by and everyone who built the original models has moved on, anyone subsequently charged with altering the model may be in for an unpleasant foray into application archeology. The good thing about design repositories is that the basic technology for creating and maintaining them is available in word processors, charting tools, and simple data- bases. Given all the energy that goes into developing and then fixing analytical appli- cations, I believe that full model design documentation should be a requirement for projects of any reasonable degree of complexity—if only to maintain clarity of com- munication between developers, users, and management. In this respect, there are two basic types of path directions: top down or bottom up, and user back or source for- ward. Of course, they may be combined. A design may proceed top down for a bit, then


proceed from the user back, then drill down a bit more, then incorporate source infor- mation, then go bottom up, and so on. Since the purpose of this section is to make sure you can think through the various issues surrounding the design of an OLAP solution, since I openly advocate the use of design documentation as a part of the solutions process, and since the top-down approach is frequently the easiest one to understand for communication purposes, I use it to organize this section.

HIGH- AND LOW-LEVEL LOGICAL MODELS A logical multidimensional model spans several levels of abstraction that, if you are coming from the relational world, might be thought of as different things. A high-level logical model represents the essential elements of the model as understood by its users. For example, a typical entity relationship diagram represents a high-level logical model outlining the basic entities, attributes, and relationships in a model. This is illustrated in Figure 11.4. A low-level logical model contains sufficient detail to design a physical model. For example, the set of normalized relations shown in Figure 11.5 contains all the logical elements of the model. Every key and nonkey attribute as well as every datum or data point is represented in this low-level logical model.

(The diagram shows entities such as company, vehicles, models, model_unit, dealer, dealership, address, dealer_price, sales_territory, sales_tx, sales_visit, prospect, person, employee, task, and work_history, connected by relationships such as makes, comes_in, sold_as, sold_at, located_at, works_for, and works_on. Only the tables corresponding to the shaded boxes appear in Figure 11.5.)

Figure 11.4 A typical entity relationship model.


(The figure lists normalized relations for dealer, address, dealership, dealer_price, vehicles, models, person, and employee, each with its key and nonkey attributes, column types, and a sample row.)

Figure 11.5 A low-level schema.

Whereas entity relationship models and table schemas are typically considered to be different models, the high- and low-level logical views that they define are all a part of any one logical multidimensional model. In the multidimensional world, cubes, dimensions, and hierarchies form the high-level, user-oriented picture of the model. Members, their relationships, and formulas paint the low-level logical picture. Furthermore, typical relational schemas whether high- or low-level do not depict the data transformations that take place. In contrast, a multidimensional model more explicitly represents the entire data transformation process.
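As a sketch, two of the normalized relations from Figure 11.5 could be expressed as SQL DDL roughly as follows; the column names and widths follow the figure, and the full schema also includes the address, dealer_price, vehicles, models, person, and employee tables.

    CREATE TABLE dealer (
        dealer_no             INTEGER PRIMARY KEY,
        dealer_name           VARCHAR(30),
        number_of_dealerships INTEGER
    );

    CREATE TABLE dealership (
        dealership_no   INTEGER PRIMARY KEY,
        dealer_no       INTEGER REFERENCES dealer (dealer_no),
        address_no      INTEGER,  -- references the address table in the full schema
        dealership_name VARCHAR(30)
    );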

Cubes and Dimensions

The first thing you need to design is the logical cube and dimension structure for the model. Even if you are using a hypercube product, which may force you to put all data that needs to interact in a single physical cube, it is useful to see the logical cube structure

of your data. The benefit to seeing the logical structure is that it reveals more about the representation, transformation, and viewing of data than any physical structure. Although one can talk about designing cube structures without mentioning vari- ables and although OLAP tools let you create dimension structures and connect them to form cubes absent of any information about the variables, since the purpose of the dimensions (or types used as locators) is to individuate the values of the variables, you need to think about the variables in order to properly identify the dimensions. Vari- ables (or types used as contents) are what get represented, stored, accessed, manipu- lated, and otherwise vary according to some set of dimensions. The way to proceed is to identify the variables or types of things tracked, measured, or otherwise calculated, and group them according to the locator dimensions by which they vary. For example, while keeping things to single domain models, with a sales application, the variables might include sales, costs, and units sold, while the dimensions might be time, market, and product. With a financial application, the variables might include account balance and credit rating, while the dimensions might be account number, branch number, account type, and time. With a budget planning application, the vari- ables might include projected revenues, head count, overhead, and allocated expenses, while the dimensions might be organizational unit, time, and scenario. If you are talking to users, they should be able to tell you what the basic contents are and how they vary. If you are looking at data sources in a table form consisting of key columns and attribute columns, look for the nonkey attributes as indicative of the variables and look for the keys as indicative of the dimensions. If all the data seems to be in one column, which would likely be called a data or value column, then look for one of the columns (generally immediately to the left of the value column) to be a variables dimension consisting of names of variables.4 If your starting point includes base tables, you can pull some, if not all, the basic dimensions from the base tables using links. (Recall from Chapter 8 that there may be dimensional information that is table wide and thus not captured in any table.) For example, if you are given a table with stores, time, product, sales, and costs, you would most likely define stores, times, and products as dimensions, and define sales and costs as members of a variables dimension. The only differences between the types of tools are whether you define stores, time, and products as dimensions in an OLAP system or as lookup table dimensions with an SQL database, and whether costs and sales are members of a variables dimension or are treated as individual variables to be dimen- sioned by stores, time, and product. Either way, they mean the same thing. Some OLAP products even build the cube for you once you have the source data in table form. That table form may vary between vendors, but will generally be either type zero or type one. Sometimes a base table holds the leaf-level data and some other tables hold hierar- chy information so the source tables get connected to the leaf level of the dimensions as defined in the hierarchy tables. If there are multiple tables, look at them and ask yourself whether the facts in each table belong with each other. Is there some key, like time, place, or scenario, that is implicit in one table but explicit in the other? 
Are the key columns between the tables related to each other in a way that it would make sense to have them combined into a single cube? Does one table deal with the Northwest while another table describes the

South? If there are multiple tables and you cannot figure out how to merge their key structures, and the data between the two tables cannot be brought into a common set of dimensions, chances are the tables should be modeled as separate cubes. To be sure, you should perform the density test that was introduced in Chapter 5. If your source data is in spreadsheets, think about how the dimensions are expressed in the spreadsheets. Are there pages or worksheets, or distinct areas within a worksheet? The values that identify each page, sheet, or file may form a dimension. For example, each page may represent a separate store. Spreadsheets need not be so regular, however. A single worksheet may contain multiple axes, levels of granularity, and even cubes worth of data. See the PDF appendix “Building an OLAP Solution from Spreadsheets” on the companion Web site. Building an OLAP solution is like building a bridge connecting two sides of a gorge. One side has user requirements in the form of use cubes; the other side has data sources in the form of source cubes. The model is the bridge connecting the two sides. When working on what is clearly a multicube model where there may be multiple source cubes, multiple intermediate cubes, and multiple end-user cubes, I nearly always start with one of two endpoints: sources or uses. It reduces the chance for error. Recall from Chapter 6 that when there are multiple cubes, those cubes need to be joined along identically structured common dimensions for any kind of cross-cube analysis. Sometimes, especially if the OLAP work follows a datawarehousing pro- gram, the common dimensions used for joining different cubes will be or have been identically structured, in which case they are typically called conformed. But what do you do when those common dimensions are not identically structured? I prefer to model all variables that share the same dimensions—even when they do not share the same cells—as a part of the same cube. This kind of logical reduction reduces the number of cubes in my model. (Some multicube models that employ five or ten physical cubes can be logically represented as one or two cubes. In these cases, the physical cubes come closer to representing physical partitions than information content differences.) Assuming one can create derivations of the different variables that map them to the same locations within the cube, it highlights the fact that data residing in the same cube can be meaningfully compared. (This is why the schemas shown throughout the application section of this book are written in a staggered fash- ion with contents that vary in a subset of the schemas’ total set of dimensions—but are applicable to all the dimensions—shown indented and below those contents that vary in all the dimensions.) For example, consider modeling a data set consisting of international sales data and macroeconomic information. Imagine that the sales data is collected at the store level every week, while the macroeconomic information is collected at the national level every month. Depending on the multidimensional tool you were using, you may wind up creating a different cube for each data set. But logically speaking, when designing a model, I would always find it simpler to define them as a part of the same cube. 
Fur- thermore, being in the same cube, it is easy to see how I can compare the sales data with the macroeconomic data by aggregating my store and week level sales data to the country by month level and then meaningfully compare changes in national level sales with changes in macroeconomic indicators. 288 Chapter 11
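
To make that comparison concrete, here is a minimal, product-neutral Python sketch of the aggregation step. The table layouts, member names, and numbers are hypothetical and stand in for whatever your source cubes actually contain.

import pandas as pd

# Hypothetical store/week-level sales and country/month-level macro data
sales = pd.DataFrame({
    "store":   ["S1", "S1", "S2", "S2"],
    "country": ["US", "US", "US", "US"],
    "month":   ["2001-01", "2001-02", "2001-01", "2001-02"],
    "sales":   [100.0, 120.0, 80.0, 90.0],
})
macro = pd.DataFrame({
    "country":   ["US", "US"],
    "month":     ["2001-01", "2001-02"],
    "indicator": [101.3, 102.1],
})

# Aggregate sales up to the country-by-month level ...
country_month = sales.groupby(["country", "month"], as_index=False)["sales"].sum()

# ... so that both contents share the same locations and can be compared directly
combined = country_month.merge(macro, on=["country", "month"])
combined["sales_change"] = combined.groupby("country")["sales"].pct_change()
combined["indicator_change"] = combined.groupby("country")["indicator"].pct_change()
print(combined)

The point is not the particular library; it is that once the two contents are mapped to a common set of dimension levels, the comparison becomes a one-line derivation rather than a cross-cube join.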

Although data may belong in the same logical cube, performance considerations such as refresh requirements, cube-necessitated network transfers, or data set sizes, combined with the fact that a memory-based tool may force you to load an entire model into memory, may dictate keeping the data in separate physical cubes.

You also need to be aware of what are called attributes. Recall from Chapters 5 and 7 that an attribute is just a content that varies mostly, but not entirely, as a function of the members of one dimension. Attributes are usually found in lookup tables associated with, or as a part of, dimension tables. Most applications make heavy use of attributes. OLAP tools differ significantly in their support for attributes. (See the guideline questions in Chapter 18.) In a retail model where stores, time, product, and variables are dimensions, there will be many facts about the members of each of the dimensions. With stores, for example, the facts might include square footage, rent, address, manager names, phone number, and so on. Logically speaking, these variables or facts belong at the all-product-by-store-by-sometime level of the cube. They belong at the store level because facts like address vary from store to store. They belong at the all-product level because the address of a store is independent of the products carried by the store, but they belong at the sometime or not-totally-specified time level because the facts can change over time. Addresses change, managers change, rents change, and square footage can change, and you may want to keep a history of the values.

Depending on how tightly integrated your OLAP solution is with a backend data warehouse, you may simply leverage the warehouse attribute tables, you may want to create separate attribute cubes, you may load the attributes as constants (but be prepared to convert them to yearly or quarterly variables as they change), or, if tracking the changes is important, you may want to maintain them in a time-stamped attribute table.

The thing to keep in mind is that, given a set of dimensions cross-joined in a schema, contents may and frequently do vary by any combination of the dimensions (as with fourth normal form relations and what are called M-to-N relations in the datawarehousing world; see Chapter 16). So make sure you are open to seeing contents vary by any combination of dimensions. If, after that, you see that there are large numbers of contents that, for the purposes of the model you are building, effectively only vary in one dimension, and you like the term attribute, then by all means use it.

At the end of this step, you should be able to identify all the cubes that define the model and all the dimensions that make up each cube. Figure 11.6 illustrates what a cube structure may look like. Notice that this is a high-level logical view of a model. It shows the number of variables, the number of locator dimensions, and the approximate number of members in each dimension. It does not show any information about the number of levels per dimension or any information about data flows. Figure 11.7 shows the same high-level information for a multicube model.

Before moving on to the next step, you should also create a dynamic image of the proposed solution. In addition to knowing the cube-defining collections of dimensions and variables, you also need to have an understanding of the data flows between the cubes and a rough idea of the amount of data each cube represents.

Figure 11.6 A high-level view of your model. (Ten sales/cost variables located by Scenario (1s), Time (100s), Store (100s), and Product (10,000s) dimensions.)

Figure 11.7 A high-level MTS view of a multicube. (Dimensions shown: Store (100s), Time (100s), Task (1s), Employee (100s), Supplier (100s), and Product (10,000s), with two sets of variables.)

Figure 11.8 shows the same kind of high-level information displayed in Figure 11.7, plus information about flows, sparsity, and data set sizes. The view in Figure 11.8 shows that exchange rate information from the exchange cube is used to calculate derived values in the purchases cube (presumably converting input local prices to derived dollar prices) and that information from the recipe cube is used to determine how much inventory is required to meet given amounts of production. The figure shows that purchases flow into inventory, which flows into production, and that the amount of data held in the exchange and recipe cubes is smaller than that held in the other three.

Refining the Number of Dimensions

Whether you defined your cubes based on existing data sources or whether you defined them based on what you think you want to model, you should spend some time reviewing the dimensional choices you made before proceeding to solidify them with hierarchies and members.

Figure 11.8 A high-level MTS view of a multicube with data flows and data set sizes. (Cubes shown: purchases, inventory, production, exchange, and recipes, with input and derived contents and data flows among them.)

The main way you modify the dimensional structure of a cube is by adding and subtracting dimensions. If one or more cubes are very sparse, if the dimensional combinations that produce the sparsity are nominal, and if the tool you are working with does not provide automatic support for sparsity management, you may want to combine two or more dimensions whose intersections are not everywhere meaningful into a single dimension composed exclusively of valid intersections, with a higher cardinality than either of the two original dimensions. The benefit of combining dimensions is to eliminate meaningless values and lower the growth of derived data in the cube. The main downside of combining dimensions is the loss of efficiency when dealing with changes that occur in a single dimension (such as needing to make multiple formula entries in a combined dimension where one entry would have sufficed if the dimensions had been broken out).

The two main reasons for adding dimensions are

■■ To take into account some factor that was previously unaccounted for, such as time or place
■■ To represent co-factors present in a single list of elements with separate dimensions that can vary independently of one another

As for unaccounted-for information, your source data may come from multiple tables where each table represents a different time or place. In such cases, the time or place of the data may be implicit to the data and thus not explicitly included in any fields associated with the table (other than, perhaps, the table name). There is no real downside to taking this information into account. If you didn't, you would not be able to put it into a single cube.

Regarding co-factors, you might have a table whose attribute columns are labeled actual direct sales, actual indirect sales, actual salaries, actual material costs, actual other costs, planned direct sales, planned indirect sales, planned salaries, planned material costs, and planned other costs. All your sales and cost information is either actual, planned, or a calculated variance. Every actual and planned value is for a sales value or a cost value. Sales and costs, which can be thought of as members of an accounts dimension, are thus co-factors along with actuals and plans, which can be thought of as members of a scenario dimension. By representing this information in terms of the two dimensions accounts and scenario, it is easier to define formulaic relationships that pertain to accounts or scenarios, and it is easier to visually represent accounts and scenarios along separate axes on a display screen. If the number of original members is very large, say one million, and if the number of members in each of the two created dimensions is roughly equivalent, then the number of members in each of the two new dimensions will be approximately equal to the square root of the number of original dimension members. In other words, one dimension with 1,000,000 members could be represented by two dimensions with 1,000 members per dimension.
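
As a rough illustration of splitting such co-factors apart, the following Python sketch unpivots a hypothetical flat table and breaks each column label into scenario and account dimensions; the column names and figures are invented for the example.

import pandas as pd

# Hypothetical flat table: one column per scenario/account combination
flat = pd.DataFrame({
    "month": ["2001-01", "2001-02"],
    "actual direct sales":  [100.0, 110.0],
    "actual salaries":      [40.0, 42.0],
    "planned direct sales": [95.0, 105.0],
    "planned salaries":     [41.0, 41.0],
})

# Unpivot, then split each column label into two co-factor dimensions:
# scenario (actual/planned) and account (direct sales, salaries, ...)
long = flat.melt(id_vars="month", var_name="label", value_name="value")
long[["scenario", "account"]] = long["label"].str.split(" ", n=1, expand=True)
long = long.drop(columns="label")

# Now a variance formula can be written once per account across scenarios
pivot = long.pivot_table(index=["month", "account"], columns="scenario", values="value")
pivot["variance"] = pivot["actual"] - pivot["planned"]
print(pivot)

Once the two dimensions exist separately, a single variance formula per account covers every scenario pairing, rather than one formula per original column.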

When Dimensions Change over Time

There is a Buddhist saying that the only thing constant is change. The real question for each dimension is, What is the rate of change? At one extreme, some dimensions may not change during their lifetime in a single model, especially if that model is a single server-based application of short duration. But for models that evolve with the users, change in at least one dimension is certain. The most dramatic changes usually occur with product dimensions, organization dimensions, employee dimensions, and marketing geography dimensions. For example, products are added and subtracted, product lines are reorganized, employees come and go, and corporations change their reporting hierarchies.

There is no one right way to model these changes. You can keep multiple copies of dimensions and build separate cubes for each dimensional version. You can keep a single dimension in a single cube that represents the union of each dimension version. You can also keep explicit dimension versions in a single cube. The second and third options require direct support from a product. The first you could implement with most any product.

The datawarehousing world has long dealt with changing dimensions. Ralph Kimball wrote about various methods for tracking these changes.5 An OLAP application that works with changing dimensions whose definitions reside in external tables, typically data-warehouse-based, needs to refresh its dimension structures from those tables. However, you still need to decide how the OLAP application is going to handle comparisons that involve the dimensions in two or more states. For example, if the changing dimension in question is a product dimension and you want to perform any kind of time-based analysis, you need to decide how to deal with the fact that some products that existed for the early time periods were discontinued and some products that didn't exist during the early time periods were subsequently introduced.

Clearly, there will be some kinds of questions that are meaningless given the changing dimension. For example, any query for a growth rate in sales over the time period in question for a product that did not exist for the entire time period will be meaningless. (It violates the application range of the sales variable.) Although queries about specific products may be meaningless, a way to make use of the sales information about all the products that did not exist for the full length of time is to abstract the individual products that do not exist for the full time range into groups that do, and then perform inter-temporal comparisons on the groups. Thus, for example, while specific women's shoes may have been introduced and discontinued over time, the category women's shoes remained stable. Queries about the growth rate of women's shoes are perfectly meaningful even though the specific shoes that comprise the category have changed.
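
The following Python sketch makes that abstraction concrete; the products, category, and sales figures are hypothetical, and the only point is that the growth rate is computed at the stable category level rather than for churned products.

import pandas as pd

# Hypothetical sales with product churn: "pump 2" only exists in 2002
sales = pd.DataFrame({
    "year":     [2001, 2001, 2002, 2002],
    "product":  ["pump 1", "loafer 1", "pump 2", "loafer 1"],
    "category": ["womens shoes", "womens shoes", "womens shoes", "womens shoes"],
    "sales":    [50.0, 30.0, 60.0, 35.0],
})

# Product-level growth is meaningless for churned products, but the stable
# category level supports a well-defined inter-temporal comparison
by_category = sales.groupby(["category", "year"])["sales"].sum().unstack("year")
growth = (by_category[2002] - by_category[2001]) / by_category[2001]
print(growth)  # womens shoes: 0.1875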

Links

Assuming that you can identify the source data, that the source data corresponds to the leaf level of the model, and that there are no references to higher levels of aggregation in the source tables, it is useful to give the links from the cube to the data source a trial run at this stage.

NOTE If your source tables have aggregation levels defined in them or if some of the source data represents aggregate information, you will need to have defined these levels in the cube prior to linking, or you will need to use a tool capable of defining levels through the linking process.

The good thing about bringing in some or all of the data at this stage is that it gives you the ability to test the dimensionality of the model you have created before you have defined a lot of other structures that might all have to be revised in the event that the dimensionality of the model needs to be changed. Since the data is there, it also lets you define and test variable formulas. With most of the hierarchies not yet present, there should be no danger of a time- and storage-consuming database explosion complicating the picture. The more you are sure of the hierarchies that need to be created in each dimension, as well as of the dimensions themselves, the less difference it makes whether you define the links and bring in data before or after defining the hierarchies.

Dimension Hierarchies

As you saw in Chapter 5, dimensions may, but do not necessarily, have a hierarchical structure. For example, products, time, and place usually have hierarchical structures. Scenario typically does not have a hierarchical structure. The question remains: What kind of hierarchical structure do the dimensions have? Again, as you saw in Chapter 5, most hierarchies have more of a ragged or leveled nature. Most tools have either a ragged or leveled bias.

If you are working from hierarchical source tables, the types of dimension hierarchies may already be given to you. If you are working from data tables and there are levels in the tables, they will be found as the names of columns in the base data tables or in lookup or dimension tables associated with the base tables. Any levels that were used in the situation as you found it have to be accounted for somewhere in the system. It also may be the case that users want you to implement levels that were not a part of their base data. If this is so, you need to learn the parent/child relationships that define each level. For an initial pass, it will suffice to know the dimension to which a level applies and its relative hierarchical position. In other words, region as a geographical level may be defined as in between cities and states. If your source tables are ragged, look for specific parent-child tables to use to define the dimension hierarchy.

If you know that hierarchies need to exist in the OLAP model that do not exist in the source tables, ask yourself what those hierarchies are based on. Frequently, they are based on attribute groupings such as price, color, size, or weight for a product dimension. They may also be based on business requirements not yet electronically encoded as, for example, with a marketing analyst's new way of grouping competitors or channels.

At the end of this second stage, you will still have a high-level logical model, but it will contain more detail relative to the end of the first stage. Think of the process as similar to increasing the magnification on a microscope that is focused on an onion skin. Starting from a relatively high level where many cells are in view, as the magnification is increased, at first everything blurs, but then a new, more detailed view of the interior of a cell comes into focus. There are fixed levels of detail at which interesting structures appear. The same is true with a dimensional model.

Multiple Hierarchies

When the instances of a dimension need to be aggregated by different groupings, you need to create multiple hierarchies. Most tools allow for multiple aggregation paths. The grouping principle for each path should be identified and the hierarchies named. (Even in a dimension with a single hierarchy, the principle should be identified.) For example, products may be grouped by generalizations of product type, by classifications of size or color, by the demographic groups they are targeted to, and so forth.6 Attributes associated with the members will provide clues to useful hierarchies.

Tools vary substantially in their support for multiple hierarchies. Since tools may support different types of hierarchy structures depending on the type of dimension, identifying the aggregation paths needed will help to form the strategy later on for expressing the structure and computations. Some tools require the designation of one of the hierarchy paths as the consolidation path; other paths can exist to organize members, but do not play the same role in computing aggregations.

When forming the hierarchies, depending on a tool's structuring capabilities, you may need to decide whether to create multiple root members to define the hierarchies. For example, you may consider the typical grouping of products by levels of generalization of product type to be the primary aggregation path of interest, and may be interested in aggregating up to an all-product member. In the same dimension, you could have other aggregation paths that group products by the demographic group for which they are designed, or by pricing or size categories, for example. If all leaf members feed into the product-by-price-category path, then an all-product-by-price-category member will generate an aggregation redundant with the all-product-by-type member's. However, the highest-level by-price-category members will each appear in the model to be a separate path, practically indistinct from the highest-level by-size-category path members as well.

Deciding between Multiple Levels in a Dimension and Multiple Dimensions

Imagine a data table whose columns consist of products, stores, and product groups. Clearly, you would keep products and stores as separate dimensions. But would you do the same with product groups? What would make you decide? Generally, you want to examine the cardinality relationship between the members of the columns, in this case, product groups and products. The more that relationship is M to N, the more it makes sense to treat product groups as a separate dimension. In contrast, the more their relationship is 1 to N, the more sense it makes to treat product groups as a separate level of the products dimension.

Cardinality relationships, like dimensionality, are sometimes in the eye of the beholder. Look at the following data set, shown in Table 11.1, for an imaginary company that produces six products belonging to three categories. At first, it may seem that a 1-N relationship exists between product categories and products, specifically that there are two products for every one product category, and thus that product categories belong as a level within an overall product dimension. But notice that there is the same number of products within each product category. Instead of thinking of each product as an individual with a unique name, you could think of each product as a relatively numbered member of a product category. This is shown in Table 11.2.

Table 11.1 Products and Product Categories

PRODUCT     PRODUCT CATEGORY
Triglets    Foobars
Spiglets    Foobars
Derbots     Doobars
Berbots     Doobars
Crumpets    Goobars
Flumpets    Goobars

Table 11.2 Relative Numbering of Products

PRODUCT     PRODUCT CATEGORY
1           Foobars
2           Foobars
1           Doobars
2           Doobars
1           Goobars
2           Goobars

Table 11.3 Products and Categories as Separate Dimensions

            CATEGORY
PRODUCT     FOOBARS     DOOBARS     GOOBARS
1
2

By re-representing the products as relatively numbered within each category, we have converted a 1-N relationship between product categories and products into an M-N relationship. As an M-N relationship, products and product categories could be represented as two separate dimensions (see Table 11.3).

The thing to watch out for is whether the lower-level members, in this case individual products, can be beneficially represented as relatively enumerated items within a higher-level grouping. To answer yes, there should be some similarity between products that have the same relative enumeration per category. Perhaps products are organized within each category by price or margin, so that it makes sense to talk about products 1 or 2 in general. Also, if adding a product 3 at a later time would mean adding a product 3 to each product category, then products and categories make better candidates for separate dimensions.

One of the potential benefits (assuming you answered yes to the previous question) of splitting up levels into separate dimensions is the ability to represent them on separate axes of a display screen. In other words, most OLAP tools are not able to display two levels of one model dimension as two dimensions on a computer screen. You also need to ensure that there are roughly the same number of relatively enumerated items in each higher-level grouping; otherwise, you will generate a lot of meaningless intersections. Time is an example of a dimension whose levels are frequently represented as separate dimensions, especially weeks and months, where weeks are cast as the first, second, third, and fourth week of the month.
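
If you want to check the cardinality relationship mechanically rather than by eye, a small Python sketch along the following lines will do; the table contents mirror the hypothetical products and categories above.

import pandas as pd

# Hypothetical product/category table (as in Table 11.1)
rows = pd.DataFrame({
    "product":  ["Triglets", "Spiglets", "Derbots", "Berbots", "Crumpets", "Flumpets"],
    "category": ["Foobars", "Foobars", "Doobars", "Doobars", "Goobars", "Goobars"],
})

# How many categories does each product roll up to?  A maximum of 1 means the
# relationship is 1-to-N (category is a level); more than 1 suggests M-to-N
# (category is a candidate for a separate dimension).
cats_per_product = rows.groupby("product")["category"].nunique().max()
products_per_cat = rows.groupby("category")["product"].nunique()

print("max categories per product:", cats_per_product)       # 1 -> level
print("equal group sizes:", products_per_cat.nunique() == 1)  # supports relative numbering

A maximum of one category per product argues for a level; if the check also shows equal, similarly ordered groups, the relative-numbering trick that converts the relationship to M-N becomes an option.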

Disconnected Hierarchies

So far, the dimensions we have looked at in this book have either been loose collections of elements, such as those found in a variables dimension, or they have been composed of one or more hierarchies. The implicit assumption with hierarchical dimensions that contain multiple hierarchies is that the hierarchies are connected through at least some of the members. You should be aware that this need not be the case.

A single product dimension with a hierarchical structure sometimes splits into two or more disconnected hierarchies. This could be the result of a product reorganization where, for some period of time, a new product division did not feed into the main product root. It could also be the result of a mistake; perhaps an intermediate product category was inadvertently deleted. In either event, a situation would be produced where the root member of the dimension did not connect to all the leaf nodes in the dimension. This may not be apparent with large dimensions. So when you are working with large dimensions, you need to verify that either all the leaf members do indeed connect to the root member or that the dimension contains an intentionally disconnected hierarchy.

When a Dependent Member Is a Parent versus a Peer

How should you represent a dependent member, such as a total cost that depends on a variety of particular costs? Should you represent it as a parent of the individual costs or as a sibling to the individual costs? Does it make any difference? Most of the time with dimensions like time, products, geography, and employees, you define levels within a dimension and portray dependent members as relative parents within a hierarchy. The biggest single reason why it is worth the effort to define hierarchies is that they funnel the masses of individual members into the higher-level categories that we think about for decision making. In a product dimension with 30,000 individual products, 100 product categories, and 10 brands, it is easier and more natural to think about and define rules for the 100 product categories and 10 brands that are the basis for most of our decision making than to think about and define rules that explicitly consider the 30,000 individual products.

When there are not too many members in the dimension, you will have more flexibility if you represent dependent members as peers. This commonly occurs in variables and scenario dimensions. For example, some OLAP products make it difficult to have data enter at a nonleaf level. If you have some data that is not normally provided at the most detailed level, or you need to enter estimates by hand for some of the dependent members for which you do not yet have data, it would be easier if the member were a leaf member rather than a nonleaf member. Also, it may be easier to compute members without needing to override built-in aggregation formulas if the members are all peers. Figure 11.9 shows a simple variables dimension in two forms. Figure 11.9a shows all the members as peers and Figure 11.9b shows the dependent members as parents.

Figure 11.9 Showing dependents as parents or as peers. (a) All members as peers: dir costs, ind costs, tot costs, dir sales, ind sales, tot sales, and margins side by side; easier when edits are more irregular and formulas are more complicated but fewer in number. (b) Dependent members as parents: margins above tot costs and total sales, which in turn are parents of dir/ind costs and dir/ind sales; easier when edits are more regular and formulas are simpler but more numerous.

Dimension Members

If you want to think of a dividing line between higher- and lower-level multidimensional models, a good place to imagine that line is at the boundary between hierarchies and individual members. In a dimension like products or customers, there may be five or ten levels or generations, while there may be thousands or even millions of base-level members. A diagram of all 2,000,000 customers serves no practical purpose, certainly not as an abstraction. There are too many members. A useful model needs to be a scaled-down version of the real thing, not a replica.

It may be useful to list a representative sample of the members of a dimension to make sure that there is no confusion, or it may be useful to define a test model with a sample list of members. But when it comes to working with all the members in a dimension with many members, the best you can do is assure yourself that the member list is sufficiently clean. If it is a customer list, for example, you want to be confident that the same customers are not listed multiple times. As discussed in Chapter 1, data (or in this case what some would call meta data) cleansing is a part of the warehousing process. To reiterate, the techniques presented in this book on multidimensional systems assume that the base data exists and is of good quality.

In a typical large model, no more than one or two dimensions have such high cardinality. This means that most cubes have one or more low-cardinality dimensions. Scenario and market dimensions are generally of low enough cardinality that it is useful to enumerate all of their members. For example, there may only be ten markets into which a particular group of products is sold, or there may only be four scenarios for the purposes of budgeting.

In addition to distinguishing between low- and high-cardinality dimensions, it is useful to keep the variables dimension separate from all the others, assuming you are using a tool that purports to treat all dimensions generically and keeps the names of contents or variables in a variables (or more properly variables name) dimension. Remember, the members of the variables dimension, whether there are 5 or 500, will have most of the nonsummation formulas in the cube. They are, after all, the things whose values you are tracking in the model. Even if there are 500, as with a large chart of accounts, whoever is using that information will need to be aware of each of the 500 accounts. Variables usually need to be dealt with on an individual basis.

Looking over the base tables, there are two basic ways (as shown in Chapter 8) that the dimension member information can be represented: embedded within data tables, in which case the dimensional information will be represented by column headings in a level-based way, or free standing in parent-child tables. The cells defined by the intersections of dimension members times the number of variables per cell will be at least as numerous as the number of base data points to enter the system. In cases where the source data represents transactions, such as actual products sold, the number of data points brought in to a model may only be a small percentage of the total number of cells in the model. The cube in this situation would be called sparse.

Relationships between Members

For each of the dimensions in the cube you are designing, how do the members relate to each other? Assume you are working with a stores dimension. What is the relationship between stores? Are they simply enumerated members in a list, or is there some additional information that serves to differentiate them? The reason you would want to capture this information is for the purpose of pattern searches, extrapolations, and other analyses. (Recall the distinctions made in Chapter 5 between nominal, ordinal, and cardinal orderings and the analysis distinctions in Appendix D for Chapter 11.) For example, if you know how your sales have changed by month for the past three years, you could make a projection for the next month based on past experience. This is because time intervals are cardinally ordered. You can define a formula that says

(sales, time x+1) = (sales, time x) + (projected change in sales, time x)

You could not define such a formula for a dimension where the members were just nominally ordered elements of a list. Nor could you explore correlations between variables unless your identifier dimensions were cardinally ordered.
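
For a concrete sense of why the ordering matters, here is a minimal Python sketch of the projection formula above, using an invented series of monthly sales.

# Hypothetical monthly sales, cardinally ordered by time
monthly_sales = [100.0, 104.0, 109.0, 115.0]

projected_change = monthly_sales[-1] - monthly_sales[-2]  # naive estimate of the next change
next_month = monthly_sales[-1] + projected_change
print(next_month)  # 121.0

# The same arithmetic has no meaning over a nominally ordered dimension such as
# a list of store names: there is no "store after Boston" to project into.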

Changing Names of Members

Sometimes dimension members have coded names, like SKU codes, in the source tables that end-users do not want to be forced to work with. You should verify for all the dimensions whether the member names need to be given aliases for the benefit of end-users. If you feel it is necessary, you should make sure that aliases are supported by whatever OLAP tool you invest in and check to see what the impact is of changing names on your existing model structures. For example, with some products, you would need to manually change all formulas that depend on members whose names have changed. In such cases, you want to define your aliases as early in the model design process as possible.

The Decision Context

The previous steps have led to a dimensional structure for your data. The task of defining formulas remains. Before launching into them, you may want to step back for a minute and ask yourself, perhaps more thoroughly than you have before, the following questions: What is the purpose of the model you are building? What kinds of formulas are you defining to achieve that purpose? Are you trying to construct a picture of what is going on in your company or organization? Are you trying to analyze certain events, such as the decrease in market share of your product, in order to discover what may have been the cause? Are you trying to predict how sales, costs, and margins are going to evolve over the next twelve months? Are you, perhaps, trying to come up with policy recommendations?

Recall the discussion in Chapter 1 of the four stages of decision functions: descriptive, explanatory, predictive, and prescriptive. Although OLAP tools were seen to focus on derived descriptions, your application needs may require additional decision functions. Assuming a single domain model, you would build all decision functions within a single cube. The movement from descriptive formulas to prescriptive formulas represents a series of incremental increases in analytical complexity within the same cube. Prescriptive formulas rely on inferential or predictive formulas that rely on explanatory formulas that rely on descriptive formulas. There is no way to build a predictive formula that does not contain, implicitly or explicitly, an explanatory formula and a descriptive formula. All formulas may share the same cube-defined dimensional structure. Of course, if you are working with a multidomain model, there may be many cubes in general, and your decision functions may be spread across all of them.

Formulas

A major factor that affects the analyses you will need to create is whether you are the end-user or whether you are defining a system for end-users to perform their own analyses. If you are preparing a system to be used by others, your goal should be to create an environment that facilitates analysis. If you are the end-user, you will still want to perform your ad hoc analysis on top of more basic, preparatory analysis. Either way, you need to perform basic aggregations and analysis.

A key component of multidimensional modeling is the definition of formulas or derivations, especially aggregation formulas. A formula can be as simple as the sum of two numbers of a single variable organized according to a single dimension or as complex as a multicube system of chained, conditionally weighted equations. For most applications, the majority of formulas are basic sums and averages. Figure 11.10 shows an MTS view of simple aggregation formulas across three hierarchical dimensions for data entering at the leaf level.

There is no a priori calculation order for models. As you saw in Chapter 7, you need to know what you are trying to calculate in order to determine precedence order on a formula-by-formula basis. Thus, although you will typically perform aggregations before other types of calculations, and they are presented first in the text that follows, there are so many exceptions to this ordering that I wouldn't even call it a rule of thumb. Be careful, however, if you know that you need to perform leaf-level derivations prior to aggregating. Not all products support leaf-level calculations.

Aggregation Formulas

When creating a logical model, you need to identify all the dimensions that are principally used for aggregation. With the exception of scenarios (and variables, if using a variables dimension), most dimensions, such as products, customers, time, geography, and departments, are hierarchical and responsible for generating simple aggregations. A typical aggregation definition that you might find or have occasion to define in an OLAP tool for the dimension products might state that all products sum into their parents. Recalling that derivation is an attribute of data and not of members, you should realize that this actually means that all data (in the absence of overriding formulas) sums across products.

Figure 11.10 Showing aggregation dimension formulas in the MTS. (Variables V1 through V7 aggregated by sum along the stores, products, and time dimensions.)

Since OLAP models can become quite complex, I recommend incrementally verifying them (and, if need be, debugging them). Toward that end, I suggest that you test out the basic aggregation functions before putting in the fancier stuff. A good way to do this is to compute aggregates along dimensions for which the aggregation function is a pure and simple function, like sum, for variables (that is, members of the variables dimension) that have no attached formula.

Basic Variables and Other Nonaggregation Formulas

Once you have assured yourself that the basic aggregation functions are working, it is time to add any further formulas that are a part of the core model. The core model in this sense means any data that

■■ Most users will want to look at
■■ Is most frequently queried
■■ Is a part of your first deliverable
■■ Might feed further analysis

Typically, such formulas are found in the variables dimension as ratios, products, and differences. They can also occur in the time dimension, where it is not unusual to see interperiod ratios. Unless your formulas are complex within single dimensions, you may not want to test them on a dimension-by-dimension basis except for verifying that the syntax is correct and that they say what you think they are supposed to say. Of course, it is possible to run aground anywhere, but it is more likely that if problems are going to arise, they will arise when formulas in one dimension intersect with formulas in another dimension (as discussed in the section on formula precedence in Chapter 7).

Functions Attached to Dimensions versus Functions in Rules/Scripts

The decision to attach functions to the members of a dimension as opposed to placing them in a separate calculation rule or script is related to the product that you use, but exists for all products. If the computation you want to express is summation, then the choice is likely arbitrary, since all products support summation in the hierarchies, and could be driven by performance considerations. If the computation is a ratio or involves several terms with different operators, it may not be possible to express it in the dimension structure as opposed to a script.

Qualified Referencing of Data

As you are writing formulas, be aware of all the hierarchically qualified references. For example, the formula Margin = Sales - Cost in a two-dimensional cube whose other dimension is geographical involves nonhierarchical references. At the store level, store margins are calculated in terms of store sales and costs; at the city level, they use city sales and costs. You should also keep in mind when you are making relative hierarchical references (as is the case with the term parent) and when you are making absolute hierarchical references (as is the case with any named level, such as cities or brands).

Consider the formula Contribution = Sales / (Sales, Geog.parent). At the store level, in a hierarchy where cities follow stores, this formula calculates the ratio of one store's sales to all the sales made in the city. Conceptually, it is straightforward, but unless you've been writing OLAP applications before, it is easy to introduce a mistake. There may be cases in an asymmetric hierarchy where stores connect to cities in some places, but to states or regions in others. If what you want to calculate at the store level is the contribution of the store to the city, then you will get the wrong answer for stores that connect directly to states. (The correct answer would be the number one as opposed to the fraction of state or regional sales accounted for by the store.) In such a case, you would want to either add a city node to your geography hierarchy or define a conditional formula for contribution.
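
The following Python sketch shows one way to express that conditional contribution over a small, invented asymmetric hierarchy; it is not tied to any particular OLAP product's formula language.

# Hypothetical asymmetric geography hierarchy and sales values
parent = {
    "Store A": "Boston",
    "Store C": "Boston",
    "Store B": "MA",        # store connected directly to the state, no city node
    "Boston":  "MA",
    "MA":      "Northeast",
}
level = {"Store A": "store", "Store B": "store", "Store C": "store",
         "Boston": "city", "MA": "state", "Northeast": "region"}
sales = {"Store A": 40.0, "Store C": 20.0, "Store B": 40.0,
         "Boston": 60.0, "MA": 100.0, "Northeast": 100.0}

def contribution_to_city(member):
    # A naive Contribution = Sales / (Sales, Geog.parent) would divide Store B's
    # sales by the state total; the conditional branch returns 1.0 instead.
    p = parent.get(member)
    if p is not None and level[p] == "city":
        return sales[member] / sales[p]
    return 1.0

print(contribution_to_city("Store A"))  # 0.666..., Store A's share of Boston
print(contribution_to_city("Store B"))  # 1.0, not 0.4 of the state total

Adding a city node under MA for Store B would make the simple Sales / (Sales, Geog.parent) form correct everywhere and let you drop the conditional.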

Calculation Precedence

Recall from Chapter 7 that the need to decide calculation precedence arises whenever formulas that originated in different dimensions produce different results as a function of the order in which they are applied. The common case is a summation formula in one dimension intersecting with a ratio formula in another dimension. The sum of a set of ratios is not equal to the ratio of a set of sums. It is easy to make mistakes here. On the positive side, when mistakes are made, numbers are usually thrown off by a large factor and are therefore easy to spot as erroneous when performing a manual check.

You should initially look for incorrect results in a trial calculation by reviewing aggregates whose values you can associate with a particular range. For example, an average employee age may need to fit between 18 and 65. A value of 2,500 would signal an aggregation error. The most common error is due to incorrect precedence. Frequently, a variable that needed to be averaged got summed.

As described in Chapter 7, there are many operational combinations, such as summing and dividing, where the operational precedence makes a difference. Unlike the testing of simple aggregations, where the aggregation function is given by the formula attached to the aggregation dimension, the aggregation function needs to be determined in cases where operational precedence is an issue. I suggest working through the calculation of a small number of cells when there is doubt.
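
The ratio-versus-sum point is easy to see with a couple of invented numbers; the sketch below is just that worked check in Python.

# A minimal sketch of the precedence problem: a ratio formula (price = sales/units)
# intersecting a summation formula along products (hypothetical numbers)
products = [
    {"sales": 100.0, "units": 10.0},   # unit price 10
    {"sales": 300.0, "units": 10.0},   # unit price 30
]

sum_of_ratios = sum(p["sales"] / p["units"] for p in products)                          # 40.0
ratio_of_sums = sum(p["sales"] for p in products) / sum(p["units"] for p in products)   # 20.0

# Only one of these is the total-level value you mean; the wrong precedence
# typically throws the number off by a large, easy-to-spot factor.
print(sum_of_ratios, ratio_of_sums)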

Formula Complexities

Unlike formula precedence issues, which can arise in an otherwise exceptionless cube, formula complexities generally arise from the existence of known exceptions to the regularity defined by a cube. For example, the market value of a stock means something different for a private portfolio (where it is a function of the number of shares held in the stock multiplied by the price per share) than for an industry basket such as the Dow Jones Industrial Average or the S&P 500 (where it is a function of the number of shares issued and outstanding of the stock multiplied by the price per share). Certain managers may need to be alerted if a flow indicator drops below a certain level. The price charged to a customer might be a function of the cumulative sales to that customer. In cases of missing values for some variables and not others, tests for existence may be required to combine them or aggregate correctly. A pool of bonus funds may only be available if actual sales exceed plans by a certain amount. Some members of a particular dimension, such as the shirt member of a product dimension, may have values for attributes, such as color, that do not apply to other members of that dimension, such as wrenches.

As you saw in Chapter 7, there are many types of formula complexities, and errors here will usually be more subtle than with incorrect precedence, so you should manually verify, at least for a few cases, that any formula complexities are working properly.
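
As a simple illustration of this kind of member-dependent exception, here is a hedged Python sketch of the market value case; the member names, classification, and figures are hypothetical.

# Hypothetical members whose "market value" formula depends on their kind
holdings = [
    {"name": "ACME",  "kind": "portfolio", "shares_held": 200, "price": 25.0},
    {"name": "INDEX", "kind": "basket", "shares_outstanding": 1_000_000, "price": 25.0},
]

def market_value(member):
    # The exception: portfolio holdings use shares held, baskets use shares outstanding
    if member["kind"] == "portfolio":
        return member["shares_held"] * member["price"]
    return member["shares_outstanding"] * member["price"]

for member in holdings:
    print(member["name"], market_value(member))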

Deciding When and Where to Compute

This topic was discussed in Chapter 10, and it is one of the areas where there is a large divergence between types of tools. Tools that work with SQL databases, for example, generally handle issues of pre-aggregation differently from MDDB tools. You may need to create tables of aggregates in SQL and then connect the aggregate tables to the associated OLAP tool. Several tools provide aggregation materialization assistance. Most OLAP tools (with the exception of those tools that operate entirely on dynamically calculated results) provide an option that lets you define certain aggregations to be performed at request time. Whatever tool you use, you will want to give some thought as to which aggregates need to be precalculated and which do not.

More Complex Aggregations and Analysis

As stated at the beginning of this chapter, there are many different ways you can sequentially proceed with the design and implementation of an OLAP model. Depending on your situation, you may or may not need to think through more complex issues. On average, however, you will most likely have to first work through all the issues of basic aggregation. The issues placed in this section were the ones that I felt arise for some of the people some of the time. This section does not discuss analytical techniques such as how to create a decision tree, perform a cluster analysis, or define a constrained optimization. Rather, it focuses on issues of derived descriptions that appear in multidimensional environments and can hinder the analytical process.

Nonleaf Input Data

Data does not always enter a cube at the leaf or base level. In Chapter 7, we looked at some examples of cost allocations running down the same product hierarchy for which sales information was aggregated. The cost data entered the cube at the top of the hierarchy (in the form of total allocatable costs) and was allocated downwards. Remember that the attributes input and derived are a function of data and not of dimension members. From a practical perspective, you need to be able to spot data that does not enter the cube at the base level, and you need to know when formulas rely on input data that comes from a nonleaf level, especially when leaf-level data exist.

One reason for being aware of data that needs to enter at a nonleaf level is that not all products are equally graceful at accepting nonleaf-level input data. As discussed earlier in this chapter, there are many occasions where data that all share the same dimensional structure do not all share the same aggregation levels. Recall the example of a sales cube where sales data were entered at the store level and where macroeconomic information was entered at the national level. Assuming your product can handle nonleaf-level inputs, there shouldn't be any problem here so long as the data only enters at the nonleaf level.

Things get trickier when data for the same variable enter at both a leaf and a nonleaf level. Think about a sales model where sales data normally enters a cube at the store level and aggregates upwards. Now imagine that the VP of sales needs to make a presentation that includes current figures for all regional sales and their aggregates, but that all of the relevant store-level data has not yet entered the system. The easiest solution is for each regional sales manager to manually enter a best guess for the regional sales totals and then aggregate those regional sales guesstimates to the national level. To do this properly, a multidimensional system would need to be able to aggregate sales from the regional level only, where the estimates were placed; if the system took the base-level data it had and ignored the estimates, it would get incorrect results. You may also need to suppress the aggregation of base-level sales data until all stores have reported their sales. Again, if this capability is important to you, you need to be sure that the tools you use offer it.

When formulas need to access data from multiple levels within the cube, I find it useful to write formulas, if only for my own mental model, that include the aggregation level of the inputs to the formula. For the previous example, I would write the following formula:

Sales , Geography.countries. = Sum (Sales , Geography.regions.)
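
A minimal Python sketch of that level-qualified roll-up, with hypothetical regions and figures:

# Regional estimates entered at a nonleaf level while store-level (leaf) sales
# are still incomplete
regional_estimates = {"Northeast": 500.0, "Southeast": 420.0, "West": 610.0}
store_sales = {"Store A": 120.0}   # partial; most stores have not reported yet

# Aggregate to the country level from the regional estimates only; rolling up
# the incomplete leaf data instead would badly understate the national figure
national_sales = sum(regional_estimates.values())
print(national_sales, "vs incomplete leaf total:", sum(store_sales.values()))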

Nominal versus Ordinal versus Cardinal Analysis

In a cube with a store, time, product, and sales variables dimension, you could compare sales across any combination of stores, products, and time. In this sense (and as we saw in Chapter 7), the analysis of variables takes place relative to the cube's dimensional structure. The question remains: What about more sophisticated analyses?

Your ability to perform more sophisticated analyses, such as correlations, regressions, clusterings, and projections, is limited by the quantitative structure of your dimensions and your data. (Please see Chapter 20 for a discussion of dimensional analysis.) In other words, are your data and dimensions nominal, ordinal, or cardinal? Typically, with the exception of a time dimension that generally has a cardinal or quantitative structure, all your dimensions are nominal. This means that there is no quantitative relationship between the members of the dimension. In contrast, and with the exception of ranks and other groupings, all of your numeric data will be cardinal. Since nominally, ordinally, and cardinally ordered data look the same, you need to be careful. The most important thing is to make sure that you do not try to perform an analysis that requires cardinal dimensions and data, such as a regression, with anything other than cardinal dimensions and data. For example, you would not want to run a regression of sales against costs along a typical nominally structured product dimension. You would get a result, but it would be meaningless.

Sparsity

Most multidimensional models contain some sparsity. Sparsity was treated in Chapter 6. When designing and implementing an OLAP system, you need to know the following information:

■■ How sparse is the data?
■■ Which dimensional combinations are the sparse ones; which are the dense ones?
■■ What kind of sparsity exists?

Depending on the tool you are using, you may or may not have to deal with the first and second questions. Some tools automatically figure out the amount of sparsity in the data and adjust the storage accordingly (see Chapter 10). Other tools make you decide whether or not your data is significantly sparse and, if so, which dimensional combinations are the most sparse.

You can answer the first question by calculating the number of intersections at the base level of the model (which equals the product of the number of leaf elements in each dimension) and dividing that into the number of data points at the base level per variable. This will give you the density, and hence the sparsity, at the base level of the model on a variable-by-variable basis.

The second question is tougher to answer, and may not have a clean answer. Another way of phrasing it is as follows: What dimensions can you use to partition the cube such that you are left with relatively dense subcubes? The dimensions of the dense subcubes are then called the dense dimensional combinations, and the dimensions used for partitioning are called the sparse dimensional combinations. In any event, it helps to have an understanding of the data in order to address the question.

You can take a stab at the third question by finding the sparse entries, substituting the terms zero, missing, and not applicable, and seeing which one makes the most sense. As with the second question, you may need to know something about the data in order to figure out what the sparsity means. In a retail model, for example, where you might see a sparse intersection at widgets, January, Florida, and sales, you might guess that the sparsity meant the number zero. It probably would. But without a meta data description defining the meaning of sparse intersections, it could have meant anything. It could be that widgets are not sold in Florida, or that the data for widgets or Florida is missing, or that zero widgets were sold in Florida in January.

Typically, for retail data, sparse intersections refer to zeros. If, however, there is a customer dimension, the sparse intersections will most likely denote not applicable. This is because most customers only do business with a small number of outlets. With financial data, sparse intersections frequently refer to nonapplicable indicators, such as change in share price over the last 12 months for a company that has only been public for 6 months. In marketing and transportation models, sparsity is also frequently a measure of nonapplicability. There may be certain products that are not sold through certain channels and transportation segment combinations for which no transportation exists. In socioeconomic models, sparse intersections frequently denote missing data.

Make sure you understand any assumptions made by your tool regarding the interpretation of absent cells and zero-valued cells. Products that don't distinguish between zero and meaningless can run into problems when batches of data whose value is zero are loaded into the system as an update or correction to previously loaded data for the same set of cells whose initially loaded value was not zero. (Imagine loading on Friday afternoon corrections to home furnishings product sales entries made Friday morning where, for some home furnishings, sales of $2,000 were incorrectly reported, and the correct value of zero was keyed in later that day.) The product may treat the zero-valued cells as meaningless and ignore them and thus fail to overwrite the initially entered nonzero-valued cells.

Here again, if you are not sure of what you are doing, first check to make sure that there is sparsity in the data. Whereas for storage purposes you can ignore sparsity of less than 30 percent or so, for calculation purposes even one sparse cell can throw off your calculations. Since some products assume that sparse intersections mean a zero value, you need to verify what they really do mean. Then, after you have defined your aggregation functions, manually verify that the aggregates make sense. Since you need to look at every input number to perform a manual check, you should carve out a small enough slice that you can check each number. Once you have carved out a working slice, check to make sure that if sparse cells are nonapplicable, they were left out of the calculation. This means that they do not enter the denominator in a calculation of averages. In a calculation of sums, you should also do a count of meaningful cells so that the viewer of the aggregate results knows what percentage of the underlying cells went into the aggregate. If working with missing data, check to make sure that proxies were used for the missing values or that the missing values were eliminated from the calculation.
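
Here is a minimal Python sketch of that base-level density and sparsity check; the leaf counts and data-point count are hypothetical.

# Hypothetical leaf-member counts per dimension
leaf_members = {"store": 400, "product": 12_000, "time": 104, "scenario": 1}

base_intersections = 1
for count in leaf_members.values():
    base_intersections *= count            # 499,200,000 base-level cells per variable

data_points_per_variable = 25_000_000      # rows of input data for one variable

density = data_points_per_variable / base_intersections
print(f"density: {density:.1%}, sparsity: {1 - density:.1%}")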

Auditing

The last topic that will be dealt with is the creation of auditable models. When someone is using the model, you want her/him to be able to easily discover for any piece of data whether it is input or derived.

■■ If it is input, ask the following questions:
   ■■ Where did the data originate?
   ■■ Are there multiple sources?
   ■■ Are there conflicts between sources?
   ■■ How often is the source data refreshed?
   ■■ Who has write authority for that data?
   ■■ Who has read permission?
■■ If it is a derived data value, ask the following questions:
   ■■ Who defined the formulas?
   ■■ What is the type of formula?
   ■■ What kind of formula complexity is there?
   ■■ What kind of change history for the formula is there?

You will want to know if there are any comments related to any part of the model, such as dimensional structures, hierarchies, aggregation formulas, and so on. The best thing you can do is keep track of all relevant meta data as you are designing the model, and be aware of the meta data you lack and how you intend to acquire it. Since not all products offer the same support for auditing and annotation, you should check to see whether your product offers audit trails in the form of dependency tracking and, if so, to what level of detail dependencies are tracked. (Dependency tracking is the tracing of inputs to a particular cell and/or outputs from a particular cell.)

Summary

You have by this point defined the logical schema for the multidimensional model that will serve as the basis of your proposed solution. During the course of defining the model, you have figured out, or generated the necessary information to figure out, how many distinct cubes will be required, how much base data there is per cube, how much data will be derived, the sources for the base data, how often the base data will be refreshed, how many calculations will be performed, how many of the calculations will be complex, and how much sparsity is at the base level. You have also prepared to document the formulas and the pedigree of your data as it flows through the model.

CHAPTER 12

Introduction to the Foodcakes Application Example

Preface to the Second Edition

Those readers who are familiar with the first edition of OLAP Solutions may recall that it used actual software against real data sets and took the reader through a keystroke-by-keystroke tutorial in each of the application chapters. In this edition, by contrast, there is no such live tutorial.

Although many readers enjoyed the hands-on tutorials (and I am a firm believer in connecting theory and practice), there are enough major differences in the way Online Analytical Processing (OLAP) products work that it is difficult to teach the fundamentals of OLAP application design (such as dimension, hierarchy, and multidomain formula definitions) while using any one product. I tried in the first edition to encourage the reader to abstract from the specific steps taken in the product used for that book. But without the existence of different OLAP products to use for contrast, the hands-on tutorials, though certainly better than no hands-on experience at all, were closer to a product lesson than a technology lesson. These days, there is an increasing number of after-market OLAP tool-specific books, at least for the more popular products.

In this edition, I have tried to provide the best of both worlds: product-neutral application descriptions (defined in the Located Contents [LC] language), which should hone your application building skills if you think along with the dialog (in Chapters 13 through 17), and comparisons of some of the major products and languages commercially available (in Chapter 19).

Towards that end, the next several chapters take you through the major steps and concomitant thought processes of designing an enterprisewide OLAP solution for a vertically integrated international producer of foodcakes. Chapter 17 (the answers to which are on the companion Web site) is one big, fully integrated multicubelet calculation example. (Here cubelets are multidimensional cubes with all the structural complexity of full cubes, but with just enough dimension members and data points to support a small number of manually performable calculations.) The chapter takes you on a cross-enterprise journey from an activity-based management perspective, beginning with product sales and ending with materials purchasing, in order to calculate the earnings or losses incurred by a company during the sale of a particular product. In contrast, each of Chapters 13 through 16 represents the working through of the dimension and cube design and the key formula creation for a particular business process (specifically sales and marketing, purchasing, materials inventory, and activity-based management). Each of these chapters begins with basic issues and moves on to more advanced topics. Because the more advanced application material builds on the simpler material that precedes it, I suggest that advanced readers at least skim the early sections of each application chapter, so they have sufficient context to follow at whatever entry point they choose.

Introduction to the Foodcakes International Application

Foodcakes International (FCI) is a privately held, vertically integrated company that produces and sells two lines of foodcakes: fish based and vegetarian. FCI purchases raw materials in over 20 countries, has manufacturing facilities on 3 continents, and has 300 distribution centers in over 75 countries from which it sells its products to over 10,000 stores. Last year, for the first time, FCI's annual revenues topped $500,000,000 (U.S.). The company has a well-known, reputable brand image and good market penetration in most of its operating countries.

FCI would like to go public. However, product quality is inconsistent, the company is slow to adapt to changing markets, and earnings are erratic relative to other packaged food-producing companies of similar size. The company was warned that until it improves the consistency of its operations, the public would be unlikely to support an IPO.

In an effort to improve the company's operating performance consistency prior to going public, FCI initiated a datawarehousing project. All facets of the company were involved in the project. Operational systems from purchasing, materials and product inventory, production, materials and product distribution, human resources, sales and marketing, customer relations, and financial management contributed in an orderly fashion to FCI's data warehouse.

Some of the queries that the data warehouse was intended to support, and the subject area in the warehouse from which each query is answered, are as follows:

Purchasing:
■■ How much raw material was purchased per time period?
■■ How much raw material was purchased per market?

Materials inventory:
■■ How much of some raw material is there in inventory for some date?
■■ How much raw material arrived in inventory on some date?
■■ How much of some raw material left inventory during some time period?

Production:
■■ How much of some binder ingredient was consumed during some time period for some manufacturing plant?
■■ How much of some fish was consumed producing some kind of fishcake in all plants for some time period?
■■ How much of some kind of veggie cake was produced in some plant for some time period?

Distribution:
■■ How long does it take to transport foodcakes between some production facility and some distribution center?
■■ When did a particular shipment of foodcakes leave inventory?
■■ How much quality deterioration happens with fish transported from some market to some inventory center?

Sales:
■■ How many kilos of some foodcake were sold for some time period?
■■ How many kilos of foodcakes were sold to some group of stores for some time period?
■■ How much revenue occurred for some foodcakes for some time period?

During the course of the datawarehousing project, as FCI analysts and managers continued to share their information requirements with the application developers, it became clear to at least one senior manager, whom we'll call Jane, that many of the important questions that FCI needed to answer, but could not, were multidomain derivations that crossed business processes, time, and space. These questions included "How much does FCI actually earn from the sale of a particular foodcake?" and "What are the major drivers of production cost variance?"

Although the basic query and reporting tools that were a part of the original warehousing project specification provided minimal OLAP functionality in the form of pivot tables and drill-down capabilities, Jane knew they were not capable of providing all the information that FCI decision makers needed. As soon as she realized there were important classes of queries whose support was not anticipated in the initial design (and recognizing that now wasn't the time to voice her concerns), she got together quietly with her senior analysts and did some brainstorming on important decisions that the company needed to make before floating an IPO and on the types of information

that would help to improve those decisions. Some of those decisions and associated information requirements are found in the list that follows:

Important, More Regularly Occurring Decisions Include

Decision type How to distribute fish purchases across markets
■■ Information requirements Factors influencing the relative cost of fish
■■ Sources Purchasing, exchange rates, materials distribution, production requirements
■■ For example Should we temporarily stop buying sole from France?

Decision type How to reduce inventory waste
■■ Information requirements Track fish aging and sales by fish
■■ Sources Purchasing, production, inventory, sales
■■ For example What fishcakes do we need to increase production of, and which can we sell for the highest margins, so as to use up the most cod?

Decision type How to improve production employee productivity
■■ Information requirements Track employee type and production quality
■■ Sources Human resources, mill-level production process, production Quality Assurance (QA)
■■ For example Which employees have the greatest positive impact on veggie cake production costs?

Decision type How to reduce earnings volatility per foodcake
■■ Information requirements Sales volume and discounts, distribution costs per fishcake, production costs, inventory costs, purchasing costs, exchange rates, labor and physical asset depreciation
■■ Sources All subject areas
■■ For example Which fishcakes should we stop regularly producing because of unacceptably high volatility in earnings?

Important, Less Regularly Occurring Decisions Include

Decision Where to build the next product distribution center
■■ Information requirements Product transportation costs, growth trends for stores
■■ Sources Production, distribution, sales, and marketing
■■ For example Should we increase the size of our distribution center in Hartford or build a new distribution center, and if so where?

Decision Where to seek new outlets
■■ Information requirements Dynamic market characteristics, production and distribution costs

■■ Sources All subject areas
■■ For example Should we look to sell to stores in Brazil even though we have no distribution center near that part of the world, or should we expand our presence in Japan?

Given how clearly Jane and her analysts could see that FCI desperately needed solid multidomain query and analysis capabilities, Jane asked the CEO for permission to engage in a skunk-works project whose goal was to implement a working prototype of a full-fledged, enterprisewide, activity-based-management-style OLAP solution. Once completed, it would be demonstrated to the senior management team. Assuming the prototype was a success, it would then be fully implemented as a part of FCI's overall datawarehousing environment. The CEO was thrilled and gave Jane and her team his full support.

As Jane and her team knew that FCI managers need, above all, to understand the drivers of earnings volatility (whether a function of raw materials chosen, purchasing site, production method, distribution, marketing, or sales), the application developers will need to fully connect all links in the corporate value chain so that top-level questions can be answered on a timely and accurate basis.

With no room for error, Jane contacted FCI's two most senior and respected application developers, Lulu and Thor, and made them equally responsible for designing and implementing the prototype solution. Because Lulu and Thor typically work as senior design architects with sole design responsibility, they are not used to working on an equal footing with a colleague. As a result, they frequently took and argued for different approaches during the prototype design process. You, the reader, will learn the most from these application chapters by following along with their dialogue and reasoning through the issues on your own, especially when the text tells you to think about it.

Because the purpose of this section is to illustrate how OLAP technology can be used across the enterprise and to help you think more effectively about modeling both single and multidomain areas (as opposed to working through a detailed treatment of all subject areas, as would be necessary if this were a data model resource book), I have tried to maintain a balance between detail and illustration.

Of the nine modules identified as a part of FCI's enterprise schema, which are illustrated in Figure 12.1 (and which are already a simplification over what would exist in reality), I selected four modules for detailed treatment in this section:
■■ Sales and marketing
■■ Purchasing and currency exchange
■■ Inventory throughput and costing
■■ Activity-based management

Because the activity-based management analysis depends on data derived from all of the other eight modules, I included summary treatments in that chapter for each of the business processes not given a chapter of its own, namely
■■ FCI's raw materials transport from purchasing sites to materials inventory co-located with production centers

Figure 12.1 FCI's business process relationships. [Diagram: management sits above the business processes — material purchasing and exchange, material shipping, material inventory, production, product shipping, product inventory, and sales and marketing — each supported by its own transaction systems; arrows indicate data flow and possibly connected transaction systems.]

■■ FCI’s machine-level production ■■ FCI’s product distribution from production center to distribution center ■■ FCI’s finished product inventory at its distribution centers ■■ FCI’s product distribution from its distribution centers to its customers, the retail stores The criteria for deciding whether a module was to be given full or passing treatment was based on length and relevance. For reasons of length, I could only afford to include four detailed modules at the most. As to relevance, I picked for detailed treatment those four modules that I believed would be the most relevant to the reader in terms of developing OLAP problem-solving skills. As to working through the examples, because I included a fully computational model of FCI in Chapter 17 (at least as it relates to calculating the total costs and earnings for a particular batch of foodcakes sold on a particular day in a particular store), the other chapters in this section rely on numerical illustrations. Keep in mind that the formulas are correct (or at least are intended to be), so you will be able to apply what you learn in these chapters to your own design efforts. I simply didn’t include enough numerical data points for you to manually calculate the variety of query results whose query formula- tions are described in Chapters 13 through 16. Thus, the data in the result set views that follow most of the queries in Chapters 13 through 16 are illustrative. CHAPTER 13

Purchasing and Currency Exchange

Background Issues

Foodcakes International (FCI) purchases fish in over 100 markets in 20 countries. Fish are purchased in local currency, for which local bank accounts are established. Currently, expenditures are reported in local currency on a monthly basis and converted to dollars at headquarters using average monthly exchange rates. Although local purchasing agents are attuned to the day-to-day price fluctuations of fish and other materials, FCI corporate managers realize that the total purchasing cost of fish is a function of the local price and the exchange rate relative to U.S. dollars, and that day-to-day fluctuations in exchange rates can be just as large as the daily movements in local currency-denominated material prices. Furthermore, they also realize that local purchasing agents maintain local currency-denominated bank accounts whose cost relative to U.S. dollars is more a function of the price at which the currency was bought than of the exchange rate on the day the currency was spent. Thus, FCI wants to add a currency exchange module that distinguishes exchange rates from cost of currency. The exchange rate used needs to be the same one used by the local banks with which FCI's purchasing agents conduct business. When implemented, the module will enable corporate managers to compare relative material prices across countries and markets, and make corporate decisions to shift purchasing between locations on a timely/daily basis.1



Data Sources

Purchasing

The two sources of purchasing data are bills of sale that record actual purchasing transactions and exchange rate tables that record official exchange rates. Since FCI initiated its data warehousing project, it has transformed the bill-of-sale data from a collection of transaction header and detail tables to the purchasing fact table and associated dimension tables shown in Figure 13.1.

Note how the process of transforming the bill-of-sale data added location information. This is because out on the wharf, where FCI's purchasing agents are scrambling for the best fish they can find, the bills of sale they exchange with local fishermen do not include such obvious information as town and country. Location information, which is implicit at the source, clearly needs to be incorporated into the system for subsequent use. In contrast, every bill of sale includes the name of the supplier, such as Max's Fishing Boats, which is left out of the purchasing table because FCI typically buys fish from several sources in each town where it conducts business and physically aggregates all the fish of each type purchased before sending the fish to its materials distribution center. So from a business process perspective, FCI is not capable of tracking the supplier for a particular day's purchase. FCI does perform quality testing at the docks, however, and has always felt comfortable with its ability to identify quality problems from specific suppliers before the fish is aggregated with the fish of other suppliers. The key point to take away here is that many information-modeling decisions reflect underlying business processes. You can't always change one without changing the other.

Figure 13.1 Global purchasing fact table. [Diagram: Time, Market, and Food source dimension tables linked to a Purchasing Facts table whose measures are Qty. purchased, Price {$/kg}, and Local_currency_spent {Lcurr}.]


Exchange Rates

Building an exchange rate cube is a new endeavor for FCI. Prior to this warehousing initiative, FCI only kept a table of current exchange rates at HQ for accounting purposes. In order to implement the currency exchange module as described, FCI needed to input daily purchases for local currency relative to $US from each of the banks with whom FCI buys and sells currency.

Purchases Cube

Since FCI is starting with a fact table, there is already an explicit LC structure that we can leverage for defining a purchases cube. As you can see from the purchases fact table, there are three locator dimensions: Time, Market, and Foodsource. The three measured contents are quantity purchased, price per kilogram in the local currency, and local currency spent. These are discussed in the following sections.

Locator Dimensions

Time

Although Time is by nature a cardinally ordered type, for FCI there are enough hierarchies and attributes associated with Time that are not formulaic (and which thus need to be explicitly stored), such as the dates of holidays and the demarcation of fiscal periods (which change from country to country), that FCI treats Time as an explicit dimension by storing actual instances of time at all levels of granularity. This enables FCI to store irregular attributes in a classic time dimension attribute table. FCI also chose to represent time as a date data type with formulaic extension capabilities, rather than as a character string, thus enabling it to define time-based formulas that rely on time's inherent cardinal properties. The central accounting department is responsible for the maintenance of all fiscal period definitions. (Parts of FCI's business resulted from acquisitions of companies that continued to be run as wholly owned subsidiaries with their own fiscal periods.) Corporate human resources (again, in conjunction with the appropriate local divisions) is responsible for the maintenance of holiday periods.

Geography

Since the Geography dimension is composed of the names of places as opposed to their longitude and latitude, FCI represents it as an explicit, nominally ordered dimension called Geog.Market. Figure 13.2 lists a few of the attributes that FCI associates with its geography dimension.

Geog.Market
    Name_of_Agent
    Address_of_Agent
    Phone_Number_of_Agent
    Population
    Currency

Figure 13.2 Geog.Market attributes.

Foodsource

FCI started its life as a fishcake manufacturer whose daily purchases were predominantly fish. It's not that they didn't use flour and flavorings, but these ingredients were purchased from a much smaller number of suppliers and locations, and because these other ingredients did not spoil so readily, they were purchased on a much more infrequent basis as well. Then, FCI expanded production to include vegetarian cakes whose ingredients, while not fish, were still often exotic, purchased from many different countries (especially the spices), and were more time-sensitive than other filler ingredients. Thus, the Foodsource dimension represents the union of what had been the fish dimension plus an earlier materials and later vegetarian dimension. Although most of the dimension is explicitly defined, FCI has traditionally maintained a formulaically defined alternate hierarchy based on price. A few of the attributes FCI maintains for its Foodsource dimension are listed in Figure 13.3. With the three purchasing dimensions defined and with their instance sources explicitly defined, FCI can create an empty cube with the following schema.

(Geog. × Foodsource. × Time.) ~

Foodsource
    Calories
    Fat
    Vitamins
    Carbohydrates
    Seasonal

Figure 13.3 Foodsource attributes.

Input Contents

There are three contents normally input into the purchases cube: local price, quantity purchased, and local currency spent. Each of the contents is defined in the following code. Following standard practice, Thor defined the content's source first. Each content or variable definition includes on the first line the name of the content in double quotes (as is standard with SQL) and the root metric or units in curly braces { }. Subsequent lines, which are indented one tab, define the application range restriction, with one type's restriction per line. On the same line as the last application range restriction, the identity or linking function is included with the source.

"Local_price" {Lcurr/kg}, Time.day. , Geog.market. , Fish.individual_fish., << Purchasing Table

"Qty_purchased" {kg}, Time.day. , Geog.market. , Fish.individual_fish., << Purchasing Table

"Local_currency_spent" {Lcurr}, Time.day. , Geog.market. , Fish.individual_fish. , << Purchasing Table

Each of the input formulas states that the values for the variable being defined come from the purchasing table and are linked to the leaf level of each of the cube dimensions. Note that although Local_price, Qty_purchased, and Local_currency_spent are all three input, they are interdeterministic, so that a filter could be run on them (as will be shown later) to make sure there were no mistakes in inputting data or spending currency. Figure 13.4 shows some representative prices paid for fish.

Basic Derivations

Aggregations

With the dimensions and input variables defined, Thor went on to define basic aggregations. As with formulas beyond input source declarations, there now exists a right-hand side of the equation. Thor, continuing the same style of formula writing, places one right-hand side variable expression (that is, a content plus any location modifiers) per line and indents them all one tab relative to their left-hand side location ranges. For example, aggregations for local price were defined as follows:

"Local_price" {Lcurr/kg}, L.leaf.above. = Sum(Price {Lcurr /kg}, L.leaf.) 318 Chapter 13

                                           Time.Day
Geog.Market.     Foodsource.Fish.          Feb 1    Feb 2    Feb 3    Feb 4
Vancouver        Price Salmon {Lcurr/kg}    7.25     8.00     7.00     8.50
                 Price Cod {Lcurr/kg}       7.75     6.50     7.00     6.95
                 Price Bass {Lcurr/kg}      7.25     8.00     8.50     7.00
Prince Rupert    Price Trout {Lcurr/kg}     9.00    10.75     9.25    10.00
                 Price Shrimp {Lcurr/kg}    9.50     9.00     9.25     9.75
                 Price Tuna {Lcurr/kg}      9.50     8.50     9.25     9.00

Figure 13.4 Representative fish prices.

This says that local prices are summed across all dimensions, which is a typical default in most OLAP products. Figure 13.5 shows a representative view of aggregate prices. Compare the values in Figure 13.5 with those of Figure 13.4. What do you think? Luckily, Lulu was awake and realized something was wrong before any of Jane’s analysts saw the view. She told Thor he made a mess of his first aggregation and that aggregate prices needed to be averaged, not summed. Furthermore, she pointed out, local prices can’t average between different currencies so local price doesn’t apply above the country level of the Geography dimension. Finally, when local prices are averaged, they need to reflect or be weighted by the relative amounts of foodsource purchased. What do you think the aggregation formula for local prices should look like? Think about it. Lulu rewrote the aggregation formula for local prices as follows:

"Local_price" {Lcurr/kg}, Time.leaf.above., Foodsource.leaf.above., Geog.(leaf.above and country.atunder). , =

Sum( ("Local price" {Lcurr/kg}, L.leaf.) x ("Qty_Purchased" {Kg} , L.leaf.) / ("Qty_Purchased" {Kg}) ) Purchasing and Currency Exchange 319

                                           Time.Day
Geog.Market.     Foodsource.Fish.          Feb 1    Feb 2    Feb 3    Feb 4
Canada           Price Salmon {Lcurr/kg}    200      210      190      185
                 Price Cod {Lcurr/kg}       190      195      201      213
                 Price Bass {Lcurr/kg}      175      178      182      169
Mexico           Price Trout {Lcurr/kg}     199      201      213      189
                 Price Shrimp {Lcurr/kg}    162      191      187      200
                 Price Tuna {Lcurr/kg}      167      181      159      193

Figure 13.5 Aggregate prices.

This says that local prices average according to the relative amounts of foodsource purchased at each price, up the whole time hierarchy, and up the whole fish hierarchy, but only up the Geography hierarchy until they reach the country level. Note the weighting factor:

(Qty_Purchased {Kg} , L.leaf.) / (Qty_Purchased {Kg}).

The denominator in the expression (Qty_Purchased {Kg}) needs to reflect the dimension within which the aggregate is being created. Thus, for example, if weighted average local prices were being sought for a particular market over time for a particular fish, at any aggregate level of time, the expression (Qty_Purchased {Kg}) would be evaluated for that same aggregate level of time and thus would pick up the appropriate denominator to create the weighting expression. Figure 13.6 shows a representative view of corrected aggregate prices.

Humbled (slightly) by his mistake with a deceptively easy derivation, Thor decided to proceed more slowly with future derivations and, at the risk of belaboring the obvious, always double-check the aggregation function and the application range restrictions. All readers are encouraged to follow Thor's example.
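To make the weighting mechanics concrete, here is a minimal sketch in Python (pandas) of the quantity-weighted average that Lulu's corrected formula expresses. The frame, column names, and numbers are hypothetical illustrations rather than FCI's actual schema, and the roll-up stops at the country level, mirroring the application range restriction above.

import pandas as pd

# Hypothetical leaf-level purchases of one fish in one country.
leaf = pd.DataFrame({
    "country": ["Canada"] * 4,
    "market":  ["Vancouver", "Vancouver", "Prince Rupert", "Prince Rupert"],
    "fish":    ["Salmon"] * 4,
    "day":     ["Feb 1", "Feb 2", "Feb 1", "Feb 2"],
    "local_price_per_kg": [7.25, 8.00, 7.10, 7.90],    # {Lcurr/kg}
    "qty_purchased_kg":   [100.0, 50.0, 200.0, 25.0],  # {Kg}
})

# Weighted average up to the country level:
# Sum(price x qty over the leaves) / Sum(qty over the leaves).
leaf["lcurr_spent"] = leaf["local_price_per_kg"] * leaf["qty_purchased_kg"]
by_country = leaf.groupby(["country", "fish", "day"]).agg(
    lcurr_spent=("lcurr_spent", "sum"),
    qty_kg=("qty_purchased_kg", "sum"),
)
by_country["weighted_avg_price"] = by_country["lcurr_spent"] / by_country["qty_kg"]
print(by_country["weighted_avg_price"])

Note that the products are summed first and divided once at the end, which is the more compact form Lulu argues for later in the chapter.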

                                           Time.Day
Geog.Market.     Foodsource.Fish.          Feb 1    Feb 2    Feb 3    Feb 4
Canada           Price Salmon {Lcurr/kg}    7.25     7.00     8.00     7.00
                 Price Cod {Lcurr/kg}       7.75     8.50     8.00     7.75
                 Price Bass {Lcurr/kg}      7.00     7.25     7.50     7.25
Mexico           Price Trout {Lcurr/kg}    10.00     9.75     9.50    10.25
                 Price Shrimp {Lcurr/kg}    9.25     9.25     9.50     9.25
                 Price Tuna {Lcurr/kg}      9.00     8.75     9.00     9.25

Figure 13.6 Corrected aggregate prices.

So how would you aggregate quantity purchased? Does it have any restrictions? Thor couldn't think of any. He could imagine asking how many kilos of raw food material FCI purchased last year or last month or yesterday. He could imagine asking how many kilos of fish were bought from South American markets last month or how many kilos of whitefish were bought from around the world. It does seem to genuinely sum, without restriction, across all dimensions. Thus, the aggregation formula for quantity purchased is as follows:

"Quantity purchased" {Kg} L.leaf.above. =

Sum("Quantity purchased" {Kg}, L.leaf.)

What about the aggregation formula for local currency spent? Does it have any restrictions? It seems to sum appropriately. But, like local prices, it only applies to the country level and lower in the Geography dimension. Its formula is as follows:

"Local currency spent" {Lcurr/kg} , Time.leaf.above., Foodsource.leaf.above., Geog.(leaf.above. and country.atunder.). =

Sum("Local currency spent"{Lcurr/kg}, L.leaf.) Purchasing and Currency Exchange 321

Other Calculations

Thor was heartened by his recent aggregation successes and wanted to continue creating useful analytic derivations. First let's define a calculation to spot all purchasing events where the money spent is inconsistent with the stated prices and quantities purchased. Lulu proposed defining a formula that she called Data_check that is assigned the value 0 if everything is consistent and otherwise is assigned the value 1, as follows (noting the use of the assignment operator |=):

"Data_check" =

If ( "Local currency spent" {Lcurr} ÷ ("quantity bought" {Kg} x "Local Price" {Lcurr/Kg}) ) = 1 |= 0 Else |= 1

Now it is easy to count the number of times that purchasing data may contain an error as, for example, with the following formula:

Cnt(Data_check.1)
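Here is a minimal sketch of the same consistency check outside the LC language, assuming hypothetical column names and an arbitrary rounding tolerance; it also previews the per-market inconsistency count used in the ranking that follows.

import pandas as pd

purchases = pd.DataFrame({
    "market": ["Vancouver", "Vancouver", "Marseille"],
    "fish":   ["Salmon", "Cod", "Sole"],
    "qty_purchased_kg":     [100.0, 80.0, 60.0],
    "local_price_per_kg":   [7.25, 7.75, 9.10],
    "local_currency_spent": [725.0, 620.0, 550.0],  # last row is inconsistent
})

# Data_check is 0 when spent == qty x price (within rounding), else 1.
expected = purchases["qty_purchased_kg"] * purchases["local_price_per_kg"]
purchases["data_check"] = (
    (expected - purchases["local_currency_spent"]).abs() > 0.01
).astype(int)

print(purchases["data_check"].sum())                   # Cnt(Data_check.1)
print(purchases.groupby("market")["data_check"].sum()  # inconsistency count per market
               .sort_values(ascending=False).head(5))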

If you’re looking to identify those markets that seem to be plagued with the greatest number of inconsistencies, a formula fragment like the following will catch it:

Market.(Cnt(Data_check.1)).Drank.(1 to 5)

This formula will identify those markets across all fish and times with the greatest total counts of inconsistencies. Now what about looking for sources of volatility? Lulu suggested looking for the fish (actually fish names, which is why the stated metric for fish is {N} for names or nominal) with the greatest price swings over the year, with a query formula like the following:

"Fish" {N}, "Country" {N}, (("Local Price" {Lcurr} , Time.year.this).max - ("Local Price" {Lcurr}, Time.year.this).min).Drank.(1 to 5)

This should return the five fish-country tuples with the greatest price swing over the year. What do you think? Thor noticed a fatal problem. There is no way to meaningfully compare price changes across countries without first relativizing them in some way. Depending on an OLAP tool's defaults, a formula like this will either not compile or it will return those countries with the smallest denominated currencies, like those shown in Figure 13.7. The obvious way to look for the fish-country tuples is to convert all local currency measurements into dollars and then compare. But Lulu and Thor don't have access to the exchange rate cube yet.

Page: Time.2000
Column: ((Price, Time.year.this).max - (Price, Time.year.this).min)

Drank    Foodsource.Fish {N}    Country {N}
1        Swordfish              Italy
2        Catfish                Japan
3        Bass                   Ecuador
4        Tilapia                Mongolia
5        Salmon                 Madagascar

Figure 13.7 Results of initial volatility query.

So what else can they do? Lulu pointed out that she can create a relative price for each fish-country tuple by dividing the actual price paid for any specific time by the average for the year for that country, as with the following formula, whose domain is that of the decimals {D}:

"Relative price" {D}, Time.leaf. , Foodsource.Fish.leaf. , Geography.Market., =

"Local Price" {Lcurr} ÷ ("Local Price" {Lcurr}, Time.Year)

Figure 13.8 shows a set of local currency fish prices, the yearly average price, and associated relative prices. Armed with relative fish prices, we can go on to look for those fish with the greatest relative price movements during the year, those countries with the greatest relative price movements across all fish during the year, and those country-fish pairs with the greatest relative price movements. To calculate those fish with the greatest relative price movements, Thor said it is first necessary to combine fish purchases across different countries, which requires weighting the relative price movements by the amount of fish purchased, as with the following formula:

Page: Time.Feb2
Column: Geog.Market.Canada.LM5

Foodsource.fish    Price {Lcurr/kg}    (Price {Lcurr/kg}, Time.1999)    Relative Price {D}
Salmon             11.77               5.56                             2.12
Tuna               12.94               6.75                             1.92
Bass               11.75               6.34                             1.85

Figure 13.8 Local and relative fish prices.

"Relative price" {D}, Foodsource.fish.leaf. , Geog.all , Time.year. =

Sum ( ("Relative price" {D} , Geog.leaf.) x (("Quantity purchased" {Kg} , Geog.leaf.) ÷ ("Quantity purchased" {Kg})) )

Recall that a weighted average is a sum of weighting factors times the item to be averaged, which is why the weighted average relative price is calculated as a sum. Also note how the calculation context shifted for Geography during the course of the formula. Thus, we began with the all level of Geography on the left-hand side of the equation. This made it necessary to explicitly state that we wanted to sum relative prices from the leaf level of Geography. Since the context was set by the left-hand side, we again needed to qualify Geography in the clause ("Quantity purchased" {Kg}, Geog.leaf.), but finally, for the denominator, since we wanted quantity purchased for all countries (this being a resolutional ratio), we didn't have to add any qualifier since the context, as set by the left-hand side, was already all countries. What do you think of the preceding formula? Can you think of a way to express the same formula that requires fewer operations? Lulu can. Why are you creating all those ratios, she asked Thor. Why not just sum the series of products of price and quantity_purchased and then divide the entire sum, just once, by the total quantity purchased? The formula looks like this:

"Relative price" {D}, Foodsource.fish.leaf. , Geog.all , Time.year. =

324 Chapter 13

Sum ( ("Relative price" {D} , Geog.leaf.) x ("Quantity purchased" {Kg} , Geog.leaf.) ) ÷ "Quantity purchased" {Kg}

Thor admitted that Lulu’s version was more efficient, but when working with small amounts of data, and especially when performing spot calculations by hand, he pre- ferred his form of expression because it showed the weighting factor:

(("Quantity purchased" {Kg} , Geog.leaf.) Ϭ ("Quantity purchased" {Kg})) as a discrete ratio for each term. With Lulu’s more efficient version, the individual weighting ratios are not calculated. Thor went on to calculate the fish with the greatest relative price movements as with the following formula, the results of which are shown in Figure 13.9.

"Fish" {N}, Geog.all , (("Relative price" {D} , Time.year.this).max - ("Relative price" {D} , Time.year.this).min).Drank.(1 to 5)

Lulu asked Thor whether the formula has to apply to the all member of the Geography dimension or whether it could apply lower down the hierarchy as well. What do you think?

It depends on what you're trying to do, replied Thor. If FCI wants to know the fish with the greatest relative price movement in the world, then the formula stands as it is.

Page: Time.2000
Column: ((Relative_Price, Time.year.this).max - (Relative_Price, Time.year.this).min)

Drank    Foodsource.Fish
1        Skate
2        Bass
3        Sword_Fish
4        Salmon
5        Rainbow_Trout

Figure 13.9 Top five ranked fish by weighted relative price change.


However, if FCI took a more decentralized approach to purchasing, it would want to know for each purchasing region (however defined) which fish had the greatest relative price movements per region. So how would you define the formula in a way that left the geography unspecified, asked Lulu. Think about it. Two formulas need to generalize. First, the formula for relative prices that works for the all level of the Geography dimension needs to be generalized to work for any level above the leaf, as shown here:

"Relative price" {D}, Foodsource.fish.leaf. , Time. =

Sum ( ("Relative price" {D} , Geog.leaf.) x ("Quantity purchased" {Kg} , Geog.leaf.) ) ÷ "Quantity purchased" {Kg}

Then the formula for finding the fish with the greatest relative price movements needs to be generalized in the same way. The resulting formula, shown next, will list the fish with the greatest relative price movements for any geographic territory above the market level. The query results are shown in Figure 13.10.

"Fish" {N}, Geog.leaf.above. , ((Relative price , Time.year.this).max - (Relative price , Time.year.this).min).Drank.(1 to 5)

Page: Time.2000
Column: ((Relative_Price, Time.year.this).max - (Relative_Price, Time.year.this).min)

Drank    Foodsource.Fish
1        Swordfish
2        Skate
3        Rainbow Trout
4        Salmon
5        Bass

Figure 13.10 Fish with greatest relative price movements.

Although Lulu and Thor realize they will need to analyze the exchange rate cube before they can fully answer their questions of cost fluctuation, there are still a number of purchasing cube-specific derivations whose results will help them in their analysis. FCI knows that one of the secrets to stabilizing its input cost structure is to diversify its input sources. The first step to doing that is to know its current state of diversification and concentration. The following items are illustrative (though by no means exhaustive) of the kinds of calculations needed for FCI managers to have a solid understanding of their current state of concentration and diversification:
■■ The countries from which FCI buys the greatest number of different fish over the year
■■ The countries from which FCI buys the most of some particular fish, such as shrimp
■■ The fish that are bought from the greatest number of different countries

How many countries are we looking for, asked Thor. Think about it, replied Lulu. The purpose of these queries is to give FCI purchasing managers a better handle on foodsource volatility. We don't know whether there is 1 country or 50 countries from which FCI purchases a lot of different kinds of fish. One way to approach the problem is to generate a graph of the number of distinct fish purchased by country for all countries and, through visual inspection, select those countries that are at or near the top in terms of the number of distinct fish bought per year. Another way is to start someplace, say with the top five countries, look at them, and then, depending on the drop off between the top-ranked and the fifth-ranked country, either restrict, enlarge, or simply keep that selection.

Since I don't have my graphing module running right now, let's take the second approach, said Thor. What you have to do is calculate for each country the number of different fish for which the quantity purchased is greater than zero and then select the top five ranked countries in terms of the number of distinct fish bought (see the following formula). For starters, let's look at how countries fared in 2000. The results are shown in Figure 13.11. Incidentally, said Thor, I'm going to use two separate expressions—one to calculate the distinct number of fish bought per country and the second to rank countries by the distinct number of fish bought. For the record, I could have combined both in a single expression, but it would have been harder to understand or reuse.

"Distinct_number_of_fish_bought"{Integer I}, Time.year.2000 , Foodsource.all , Geog.country. =

Cnt("Foodsource.leaf"{N} ,(Quantity purchased {Kg} > 0))

"Drank" , "Country"{N} ,"Distinct_number_of_fish_bought" {I} , Distinct_number_of_fish_bought.Drank.(1 to 5) , Time.year.2000

Hey, wait a minute, said Lulu. In your first expression, on the left-hand side of the equation, you set the application range in the Foodsource dimension to the all member. But then on the right-hand side you set it equal to the leaves. What's that about?

Page: Time.2000

Drank    Country {N}       Distinct_number_of_fish_bought {I}
1        Ecuador           81
2        Mexico            79
3        South Africa      75
4        Australia         73
5        Spain             71

Figure 13.11 Top five countries in terms of the number of distinct fish bought per country.

Good question, replied Thor. If you look closely, you'll notice that on the left-hand side of the equation, the type Foodsource is being treated as a locator or dimension, and, yes, the application range in the Foodsource dimension is set to the all member. After all (no pun intended), for each country per year there can be only one value for the distinct number of fish bought. However, on the right-hand side of the equation, the same type, Foodsource, is being used as a variable or content. In other words, we are not setting it to some value in order to constrain something else; rather, we are looking for its value, or rather a function of its value, based on other constraints—in this case, that the quantity purchased is greater than zero. Since we're using Foodsource as a content on the right-hand side of the equation, we needed to identify it as such by surrounding the term in double quotes, as we do for all types used as variables, and by including its metric or units, in this case simply {names}. Thanks, said Lulu. Now let me write the second formula, the results of which are shown in Figure 13.12.

"Country" {N} , Time.all , ("Quantity purchased" {Kg} , Geog.country. , Foodsource.shrimp).max

Parsing the expression from the inside, it says take the quantity purchased of shrimp for each country:

("Quantity purchased" {Kg} , Geog.country. , Foodsource.shrimp)

Take the maximum quantity purchased from that:

("Quantity purchased" {Kg} , Geog.country. , Foodsource.shrimp).max

Then take the country associated with the maximum:

"Country" {N}, Time.all , ("Quantity purchased" {Kg} , Geog.country , Foodsource.shrimp).max 328 Chapter 13

Drank    Country {N}
1        United States
2        Ecuador
3        Italy
4        Japan
5        Australia

Figure 13.12 Top five countries for shrimp purchases.
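For comparison, here is a minimal sketch of the same selection logic in pandas, with hypothetical purchase quantities chosen only to reproduce the ordering shown in Figure 13.12; the .max step in the LC expression corresponds to idxmax, while the figure's top-five view corresponds to taking the head of the sorted sums.

import pandas as pd

purchases = pd.DataFrame({
    "country": ["United States", "Ecuador", "Italy", "Japan", "Australia", "Spain"],
    "fish":    ["Shrimp"] * 6,
    "qty_purchased_kg": [90_000, 85_000, 60_000, 55_000, 40_000, 12_000],
})

# Total shrimp bought per country across all time, ranked.
shrimp = (purchases[purchases["fish"] == "Shrimp"]
          .groupby("country")["qty_purchased_kg"].sum()
          .sort_values(ascending=False))

print(shrimp.idxmax())  # the single country with the maximum, as in the formula
print(shrimp.head(5))   # the top five, as displayed in Figure 13.12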

Thor insisted on writing the formula for the third item. He began with the following:

"Distinct_number_of_countries_from_which_fish_is_bought" {I}, Foodsource.fish.leaf. , Geog.all , Time.year.atabove. =

Cnt("Geog.country" {N} , ("Quantity purchased" {Kg} > 0))

Thor added, before we can list the fish that are bought from the greatest number of countries, we need to be able to count the number of countries from which fish is bought. So that's what I did. Lulu furrowed her brow. Your formula doesn't correspond to the third item, said Lulu. Why did you define an application range in terms of years when the original item made no reference to time? Shouldn't you be defining the formula at the all-time level? What do you think?

From my experience, replied Thor, end-users do not always fully specify exactly what they need to have calculated. Typically, the specification of formulas whose outputs satisfy their requirements takes a few iterations, with the end-user(s) involved at each iteration. I considered applying the formula for getting the distinct count of countries to the all-time level, but believed that if the FCI manager who initially made this request were present and we had had the chance to speak, she or he would almost certainly have agreed that the distinct count of countries variable is even more useful if calculated at the year level in addition to the all-time level. Don't forget, I didn't just apply the formula to the year level, but rather to the year-and-above level. So my formula specification will return a superset of what was originally asked for.

Thor continued. The top five fish in terms of the number of different source countries, and that number of source countries, are shown in Figure 13.13 by year and all-time levels as a result of the following query:

"Foodsource.Fish.leaf" {N} , Distinct_number_of_countries_from_which_fish_is_bought.Drank.(1to5) , Time.year. Purchasing and Currency Exchange 329

Page: Time.2000

Drank    Foodsource.Fish {N}    Distinct_number_of_countries_from_which_fish_is_bought {I}
1        Tuna                   32
2        Salmon                 28
3        Shrimp                 25
4        Trout                  20
5        Whitefish              17

Figure 13.13 The top five fish in terms of the number of different source countries and that number of source countries by year and all-time.

Suddenly, without warning, Jane walked in. So what have you learned, she inquired? We've learned that some of the fish that FCI relies on most heavily for production have relatively high price volatilities as defined by changes in local currency prices. Furthermore, some of those heavily used fish are only sourced from a small number of different countries. Interesting, said Jane. I like the way this project has gone so far and suspect that by the time it's finished we'll have created lots of empirically grounded suggestions for ways that FCI can reduce the volatility of its cost structure.

Exchange Rate Cube

The Dimensions

Prior to this warehousing initiative, FCI only kept a table of current exchange rates at HQ for accounting purposes. Thus, building an exchange rate cube is a new endeavor. At first glance, the dimensions of the exchange rate cube should be a subset of the dimensions of the purchases cube—namely Geography and Time. Exchange rates, after all, should not vary as a function of the raw material purchased. However, since in many countries FCI negotiates with several different institutions in an attempt to get the best exchange rates, it would seem that the Geography dimension of the Xrate cube should have a leaf level composed of financial institutions. Of course, by the same reasoning, if the Geography dimension of the Xrate cube has leaf-level members composed of financial institutions, then why shouldn't the Geography dimension for the purchases cube have a leaf level composed of raw material suppliers?

In this case, should there be one or two different Geography dimensions? Think about it.

Thor is a big fan of Einstein anecdotes, especially that models should be as simple as possible but no simpler. He says there is no inherent difference between financial institutions and fishermen—both are suppliers. Thus, there should only be one Supplier_Geography dimension that includes all suppliers.

Lulu is also a big fan of Einstein, but thinks that Thor's solution, while simple from a dimension definition standpoint, leads to overly complex formulas because the financial formulas won't apply to the raw material suppliers and the raw materials purchasing formulas won't apply to the financial institutions. This means that every formula will have to have a location range qualifier such as the following:

"Dollars per local currency" {$/Lcurr} , Geog. businesstype.bank. = Some formula

Lulu pointed out that regardless of whether the qualifier is based on an attribute table, as with the inferred Businesstype attribute table, or whether the qualifier is based on defining specialized nodes within the Supplier_Geography dimension and then referring to all members under the financial or raw materials node, some qualification will have to take place for every formula. The increased simplicity of the Supplier_Geography dimension is more than overcome by the increased complexity of the purchasing and exchange rate formulas. Thus, the simplest overall approach would be to define a single Geography dimension for countries and their groupings, a single markets and raw materials supplier dimension, and a single bank dimension. Then, for the application, one would join the markets and raw materials suppliers to the countries for the purchases cube, and join the banks and the countries for the exchange rate cube. (This is how the dimensions are defined in the calculation example in Chapter 17.) Thor conceded.

Now what about the cube itself? FCI would like to do bidirectional currency translation because sometimes it buys dollars with local currency and sometimes it buys local currency with dollars. (Keep in mind that the exchange rate cube will eventually be used to calculate how local currency received for selling foodcakes translates into U.S. dollars.) Lulu figured that the From-To concept popular with shipping models would apply to currency translation and suggested an exchange rate cube that used a From and To currency defined as follows:

(Time. × Geog.From. × Geog.To.) ~ "Exchange rate" {R}

Where If Geog.From = Geog.To Then Exchange rate = n/a

She added the where clause because exchange rates are not applicable for a country selling its own currency to itself. Since there's a From and a To country, there's no need to use a separate content for currency purchases and currency sales. Both concepts are subsumed under the single content "Exchange rate," as illustrated in Figure 13.14.

Page: Time.Feb.19.2000
                               Geog.To
Geog.From          Canada    France    Mexico    US
Canada             n/a       4.05      0.90      0.13
France             0.25      n/a       0.22      0.53
Mexico             1.12      4.52      n/a       0.12
United States      7.65      1.89      8.54      n/a

Figure 13.14 Lulu's view of the exchange rate cube.

Thor was impressed, but felt obliged to point out that FCI wasn't interested in analyzing currency exchanges between any two countries, but only between any one country and the United States. Thus, not only is the diagonal not applicable, but all the cells except those along the two edges defined by the United States crossed with all other countries would be empty, as illustrated in Figure 13.15. Since they were empty, aggregation formulas would either have to define specific application ranges or lots of defaults for aggregating missing data values. Either way, it's a lot of unnecessary and error-prone work. Thus, Thor proposed that for FCI it would be simpler to use only a single Geography dimension and to use two separate contents: Dollars per local currency, which captures the exchange rate for buying U.S. dollars with a particular local currency from a particular bank, and Local currency per dollar, which captures the exchange rate for buying a particular local currency with U.S. dollars at a particular bank.

Page: Time.Feb.19.2000
                               Geog.To
Geog.From          Canada    France    Mexico    US
Canada                                           0.13
France                                           0.53
Mexico                                           0.12
United States      7.65      1.89      8.54      n/a

Figure 13.15 Thor's view of the exchange rate cube.

Thor's schema looks as follows:

(Geog. × Time.) ~ "Dollars per local currency" {$/Lcurr} , "Local currency per dollar" {Lcurr/$}

Input Contents and Aggregations

Given Thor’s proposed schema, which Lulu was forced to admit was the more useful, there were two base variables that they both agreed needed to be input by the banks on a daily basis. At this point, you should be able to easily define their associated aggre- gation formulas.

"Dollars per local currency" {$/Lcurr} , Time.day. , Geography.bank. << source..

"Dollars per local currency" {$/Lcurr} , Time.day.above. , Geog.(bank.above and Country.atunder). =

Avg ( "Dollars per local currency" {$/Lcurr} , Time.day. , Geog.bank. )

"Local currency per dollar" {Lcurr/$} , Time.day. , Geography.bank. << source

"Local currency per dollar" {Lcurr/$} , Time.day.above. , Geog.(bank.above and Country.atunder). =

Avg ("Local currency per dollar" {Lcurr/$} , Time.day. , Geog.bank. )
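Under the stated convention that banks report one average rate per day, the aggregations above average bank-level daily rates up the Time hierarchy and up Geography to the country level. Here is a minimal sketch with hypothetical banks and rates:

import pandas as pd

rates = pd.DataFrame({
    "country": ["France"] * 4,
    "bank":    ["Banque A", "Banque B", "Banque A", "Banque B"],
    "day":     pd.to_datetime(["2000-06-01", "2000-06-01", "2000-06-02", "2000-06-02"]),
    "usd_per_lcurr": [0.131, 0.133, 0.132, 0.134],  # {$/Lcurr}
})

# Country-level daily rate: average across that country's banks for the day.
daily_country = rates.groupby(["country", "day"])["usd_per_lcurr"].mean()

# Country-level monthly rate: average of the daily bank rates for the month.
monthly_country = (rates.assign(month=rates["day"].dt.to_period("M"))
                        .groupby(["country", "month"])["usd_per_lcurr"].mean())

print(daily_country)
print(monthly_country)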

Lulu agreed with the basics, but pointed out that Thor's model was underspecified. What, she asked, was going to be input by the banks on a daily basis: average rates, opening rates, or closing rates? When each day were the banks going to submit their information: at the start of the day, at some arbitrary time or times during the day, or at the end of the day? Thor replied that the banks were to submit their numbers once at the end of every day and that the exchange rates represented the average for the day. This is because FCI had requested daily-level tracking. Thor pointed out that should FCI management want, it would be easy to add an opening rate and a closing rate content (effectively transaction-level information), or a high and low rate for that matter, for each of the two exchange rate variables, as shown in Figure 13.16. Given this full suite of contents, the expression, whose result set is shown in Figure 13.17, which returns, for example, the lowest closing exchange rate for dollars per local currency for each country for the past year, would look as follows:

"Country" {N} , "Day" {N} , "Dollars per local currency_closing" {D}, (Dollars per local currency_closing , Time.year.this.day.)min Purchasing and Currency Exchange 333

Country    Bank    Day
    Local currency per dollar: Opening, Low, High, Closing, Average
    Dollars per local currency: Opening, Low, High, Closing, Average

Figure 13.16 Thor's proposed exchange rate table.

Country    Day         Dollars / local currency
France     Jan 5       0.13
Japan      Feb 9       0.008
Canada     March 15    0.61

Figure 13.17 Low closing exchange rate per country.

However, unless FCI specifically wanted to track opening and closing rates at the transaction level, it was easy enough to see, down to the day level, the opening and closing (average daily) positions for time levels above the day. For example, as shown in Figure 13.18, the opening and closing average daily exchange rates for the month are simply the average exchange rate for the first day of the month and the average exchange rate for the last day of the month.
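Here is a minimal sketch of one reading of that rule, assuming a hypothetical frame of one average rate per country per day: the month's opening and closing positions are taken as the daily averages on the first and last reported days of the month.

import pandas as pd

daily = pd.DataFrame({
    "country": ["France"] * 3,
    "day": pd.to_datetime(["2000-06-01", "2000-06-15", "2000-06-30"]),
    "avg_usd_per_lcurr": [0.131, 0.139, 0.150],  # one average rate per day
})

# Monthly opening/closing derived from daily averages: first and last day.
daily = daily.sort_values("day")
daily["month"] = daily["day"].dt.to_period("M")
monthly = (daily.groupby(["country", "month"])["avg_usd_per_lcurr"]
                .agg(opening="first", closing="last"))
print(monthly)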

Derived Contents

Recalling that FCI ultimately wants to compare cost fluctuations across time, countries, and fish, and that the total cost fluctuation is a function of local price changes and exchange rate changes, Lulu and Thor need to define some relative exchange rate formulas.


Page: Time.2000
Variable: {$/Lcurr}
                          Day                   Month
Country                   June 1    June 30     June
France      Opening       0.131     0.150       0.141
            Closing       0.135     0.148       0.142
Japan       Opening       0.008     0.010       0.009
            Closing       0.009     0.011       0.010
Canada      Opening       0.612     0.613       0.613
            Closing       0.609     0.614       0.612

Figure 13.18 Opening and closing exchange rates.

Like the relative prices in the purchasing cube, the next two formulas create ratios between the exchange rate on any one day and the average exchange rate for the year.

"Relative local currency per dollar" {D} ,
    Time.day. ,
    Geography.bank. =
    "Local currency per dollar" {Lcurr/$} ÷ ("Local currency per dollar" {Lcurr/$}, Time.Year)

"Relative dollars per local currency" {R} , Time.day. , Geography.bank. =

"Dollars per local currency" {$/Lcurr} ÷ "Dollars per local currency" {$/Lcurr}, Time.Year

As stated at the beginning of this chapter, FCI knew that the actual cost of local currency was not determined by the exchange rate on the day of the materials purchase, but rather by the exchange rate that was utilized to acquire the currency that was used to buy the materials. The model, as it stands now, is akin to FCI invoking a currency exchange function at the moment of purchase, whereas FCI, as described by its managers, maintains a local currency reserve in every country with which it conducts business. So how best to model it?

Thor suggested he add currency purchased and currency on hand variables to the exchange rate cube, which now might be better called a currency cube. Lulu pointed out that by modeling currency on hand, currency was being treated like a product with its own inventory. So why not add these currency-purchasing variables to the purchases cube? What would you do?

Given that the Geography dimension was separated from both the raw materials suppliers and the currency providers, it wouldn't make sense to add currency purchasing to the purchases cube unless FCI wanted to adopt Thor's original unified supplier dimension. Since that unified dimension was already rejected because of the unnecessary increases in formula complexity, and since the dimensions of currency purchasing matched the dimensions of the exchange rate cube, Thor persuaded Lulu that the overall model would be simpler if the currency purchases were added to the exchange rate cube, the variables for which are shown in Figure 13.19.

By tracking actual currency purchases, said Lulu, you can calculate a weighted average exchange rate that applies to all the currency on hand for any particular day, which in turn is used to calculate the effective rate that applies to any one day's purchase. How would you define such a calculation? Lulu took a stab at it with the following formula:

"Effective exchange rate" {Lcurr/$} , Geog.country. , Time.day. =

Sum ( "Local currency purchased" {Lcurr}) Ϭ Sum ("Dollars spent per local currency purchased" {$})

Before she could put her marker down, she realized that that formula would never work. Of course, it's not the amount of local currency purchased today that matters, but the cumulative amount of local currency purchased and the dollars spent acquiring it.

Contents
    Dollars per local currency {$/Lcurr}
    Local currency per dollar {Lcurr/$}
    Local currency purchased {Lcurr}
    Dollars spent per local currency purchased {$}
    Dollars purchased {$}
    Local currency spent per dollars purchased {Lcurr}

Figure 13.19 Combined contents for currency purchases and exchange rates.

So she amended her formula as follows. What do you think?

"Effective exchange rate" { Lcurr/$} , Geog.country. , Time.day. =

Cumsum("Local currency purchased" {Lcurr} , Time.day.first- this) Ϭ Cumsum ("Dollars spent" {$} , Time.day.first-this)

Thor pointed out that this formula only works until the first time local currency is actually spent acquiring raw materials, which, given the expectation that raw materials are purchased almost every day, means the formula is not acceptable. You need some additional contents to make the calculation, he said, and suggested that they first list the variables that play a role in the calculation of an effective exchange rate before trying to write out the formulaic relationships. An effective exchange rate is a function of some measure of the following factors:
■■ Cumulative local currency purchased
■■ Cumulative dollars spent acquiring the local currency
■■ Local currency spent acquiring raw materials
■■ Local currency on hand

The first two measures can be combined to create a transaction-oriented exchange rate. The third measure combined with the first two should give a measure of local currency on hand. Of course, as with other stock measures, one needs to specify the part of the day to which the measure applies. In this case, Lulu and Thor agree they want a currency on hand variable defined for the beginning of the day with a formula as follows.

"Lcurr on hand_opening", Geog.country. , Time.day. =

(("Local currency purchased" {Lcurr} - "Local currency spent" {Lcurr}), Time.day.this-1) + ('Lcurr on hand_opening', Time.day.this-1)

"Lcurr on hand_opening", Geog.bank. Time.day.first = X

This says that the currency on hand at the beginning of the day is equal to the currency on hand at the beginning of the previous day plus additions minus subtractions. Of course, as with any other time-lagged formula, some initial value needs to be assigned to a designated first time.

Now, what about the effective rate? From the way "Lcurr on hand_opening" was calculated, it seems that the effective rate should also be a time-lagged function with today's effective rate a modification of the previous day's rate. Thor suggested the following:

"Effective rate" {Lcurr/$}, Geog.bank. , Time.day. =

("Lcurr on hand_opening" {Lcurr} ÷ ("Lcurr on hand_opening" {Lcurr} + "Lcurr purchased" {Lcurr}) x ("Effective rate" {Lcurr/$} , time.day.this-1) + "Lcurr purchased" {Lcurr} ÷ ("Lcurr on hand_opening" + "Lcurr purchased") x ("Lcurr purchased" ÷ "Dollars spent" )

In words, this says that the effective rate for purchases made during the day is the weighted average of the effective rate that applied to the local currency on hand at the beginning of the day and the exchange rate that applied to the currency bought that day prior to the expenditure of the local currency on hand. Thor added an extra level of indent to the exchange rate variables so as to highlight the essential structure of the formula as a weighted average.

Lulu had taken accounting and wasn't so ready to let Thor have the last word. She pointed out that if you're going to take the inventory approach, why not go all the way and use either a FIFO or a LIFO approach? Thor admitted that such an approach could work, but that the cost of implementing it might not be worth the benefit to FCI.

With a better modeling of effective exchange rates under their belts, Lulu and Thor finally feel that they sufficiently understand the inner workings of the purchasing and currency exchange processes to begin effectively combining them to look at total cost fluctuations for FCI.
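Before moving on, here is a minimal sketch, in Python, of the two time-lagged formulas above, run for a single country with hypothetical numbers. The first day's effective rate is seeded with that day's purchase rate, which is one possible choice for the initial value the text says must be assigned; a FIFO or LIFO treatment, as Lulu suggests, would track individual currency lots instead of a single weighted rate.

import pandas as pd

# Day-by-day currency ledger for one country (hypothetical numbers).
days = pd.DataFrame({
    "day": pd.date_range("2000-06-01", periods=4),
    "lcurr_purchased": [10_000.0, 0.0, 5_000.0, 0.0],    # {Lcurr}
    "dollars_spent":   [1_250.0, 0.0, 650.0, 0.0],       # {$} spent buying Lcurr
    "lcurr_spent":     [0.0, 4_000.0, 2_000.0, 3_000.0], # {Lcurr} spent on materials
})

opening_balance = 0.0   # designated first-day value (the "= X" above)
effective_rate = None   # {Lcurr/$}, undefined until currency is first bought

rows = []
for r in days.itertuples():
    # Rate on today's currency purchase, if any.
    todays_rate = r.lcurr_purchased / r.dollars_spent if r.dollars_spent else None
    total_lcurr = opening_balance + r.lcurr_purchased
    if total_lcurr > 0:
        prior = effective_rate if effective_rate is not None else todays_rate
        # Weighted average of the rate carried by the opening balance and the
        # rate on currency bought today -- the structure of Thor's formula.
        effective_rate = ((opening_balance / total_lcurr) * prior
                          + (r.lcurr_purchased / total_lcurr) * (todays_rate or prior))
    rows.append((r.day, opening_balance, effective_rate))
    # Tomorrow's opening balance: today's opening + purchases - spending.
    opening_balance = opening_balance + r.lcurr_purchased - r.lcurr_spent

print(pd.DataFrame(rows, columns=["day", "lcurr_on_hand_opening", "effective_rate"]))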

Combined Purchasing and Exchange Rates

The first thing Lulu and Thor want to do is define a relative total cost variable that reflects changes in local prices and effective exchange rates. Where does this variable belong—in the purchasing cube, in the exchange rate cube, in both places, or yet somewhere else? Where would you put it?

Thor pointed out (as was described in Chapter 6) that from a logical perspective you want to put a variable in a cube of appropriate dimensionality. If one of the source cubes has the dimensionality of a new derived variable, then that variable belongs in the corresponding source cube. But Lulu was quick to add that it is common for derived variables to be dimensioned by a combination of dimensions that do not correspond to any of the source cubes. When this is the case, she said it is best to define a new derived cube to serve as the schema for the new variables. In either event, most OLAP tools force you to think one way or the other.

Since dollar prices are dimensioned by Time, Geography.Market, and Foodsource, it would seem they belong in the purchases cube. For clarity of exposition, Thor said he likes to include the cube name associated with the defined variable on the left-hand side of the equation as shown here:

[Purchasing]: "Dollar price"{$}, Time.day., Geog.Market., Foodsource.leaf. =

"Local price" {Lcurr} X [Currency Exchange]: "Effective Rate" {$/Lcurr}, Geog.Country

Hey, said Lulu, shouldn’t the Geog.country term on the last line of the formula have a dot at the end of it? Isn’t it working as a locator? What do you think? Thor replied that he had meant exactly what he wrote. The dollar price for a fish at a market is going to be the local price at the market times that country’s effective exchange rate, and the way to reference the country for some market is Geog.country without the period. Geog.country. would have meant for every country and been mean- ingless in that expression. He went on to explain that since local price comes from the purchases cube, there is no need to add the reference. Similarly, the effective exchange rate is being pulled from the same day for which the dollar price is being calculated and so no further qualifier is needed. Only the exchange rate needed to be qualified. Looking for sources of cost variance, Thor decided to look at the fish most bought by FCI. After all, the erratic performance for FCI wasn’t going to be affected by fish that were hardly bought, rather by the fish that accounted for a major portion of FCI’s fish expenditures. The way to express this is as follows:

"Foodsource.Fish".("Dollars_Spent" {$}, Time.Year.this).Drank.(1 to 5)

This says select those fish for which the dollars spent on those fish across all countries this year was in the top five. But you also need to decide how you want to see the results, said Lulu, who suggested the expression result be represented as a simple table as shown in Figure 13.20. According to Lulu, it's easy to see that FCI spends the most money on salmon, shrimp, tuna, cod, and whitefish. Then Thor pointed out that dollars spent isn't the same thing as volatility. Total volatility is a function of dollars spent and fluctuations in dollar prices. How would you calculate FCI's hot volatility spots? (This exercise is left for you.)

Page: Time.2000
Column: Drank

                        1         2         3         4         5
Foodsource.Fish {N}     Salmon    Shrimp    Tuna      Cod       Whitefish
Dollars_Spent {$}       540,000   510,000   480,000   460,000   420,000

Figure 13.20 The top five fish by money spent.
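For readers who want to see the descending-rank (Drank) selection spelled out procedurally, here is a rough Python sketch. The dollar figures for the five fish come from Figure 13.20; the two extra entries (Perch, Bass) are made-up filler so the top-five cut actually filters something.

# A rough procedural equivalent of the Drank.(1 to 5) selection.
dollars_spent = {
    "Salmon": 540_000, "Shrimp": 510_000, "Tuna": 480_000,
    "Cod": 460_000, "Whitefish": 420_000,
    "Perch": 180_000, "Bass": 150_000,        # made-up filler entries
}
top_five = sorted(dollars_spent.items(), key=lambda kv: kv[1], reverse=True)[:5]
for rank, (fish, spent) in enumerate(top_five, start=1):
    print(rank, fish, spent)                  # 1 Salmon 540000, ... 5 Whitefish 420000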

Summary

Lulu and Thor learned that FCI has significant room for improving the stability of its purchasing costs both in terms of which fish it buys and in terms of the countries from which it makes its purchases. In gaining that understanding, they had to wrestle with a number of modeling and analysis issues: tradeoffs between dimensional simplicity and formula complexity, time-recursive formulas, upper hierarchical limits on aggregation, nested ranking expressions, aggregating opening and closing balances, weighted average exchange rates, and relative values.

CHAPTER 14

Materials Inventory Analysis

In this chapter, by following Lulu and Thor within the context of materials inventory analysis, you will learn how to:

■■ Model stocks and flows

■■ Add a new dimension to a cube

■■ Map data from the prior cube to the new one with the added dimension

■■ Calculate the quantity on hand of raw materials when those materials age (which is a function that is recursive in two dimensions)

■■ Calculate when shortages in one center are more than offset by overstocking in other areas

■■ Calculate which age buckets are used up and what remains after daily orders are sent to production

■■ Handle inventory sparsity

■■ Create variables with restricted application ranges

■■ Calculate the costs and productivity of storing materials per weight and time period

Foodcakes International (FCI) maintains an inventory of food source materials in cold storage facilities co-located with its production centers. It currently has 10 inventory and production centers in 7 countries on 3 continents. Fish and other food source materials arrive on a daily basis from the myriad markets where FCI does its purchasing. As a general rule, when materials are acquired from multiple sources, they are shipped to the nearest inventory center, subject, of course, to variations in production requirements.

Although FCI historically lumped all inventory costs into general overhead, in Jane's skunk works effort to empirically understand the major reasons for FCI's erratic earnings, she asked Lulu and Thor to measure, within reason, the activity-based cost of inventory in order to understand the degree to which it varies by site, fish, employee, or any other important factor—ultimately in order to make appropriate inventory decisions affecting cost (and quality).

Food source materials arrive in a variety of forms such as pallets of boxes, individual cartons, and 20-kilo bags. As each unit of food source is unloaded from its truck, it is tagged with an arrival date. FCI has always tracked the flow of materials into and out of inventory. With the exception of statistically random partial inventory audits, actual stocks have historically been derived. Once a year, FCI does a complete inventory audit and reconciliation. The interdependency between inventory stocks and flows is the same as the interdependency shown in Chapter 13 between food source quantities purchased, prices, and currency spent. Any two constrain the third. (Flows received plus quantity on hand determines flows out, flows received plus flows out determines quantity on hand, and flows out plus quantity on hand determines flows received.)

Although many of FCI's vegetarian food sources are as perishable as fish, Jane decided in this pilot phase to have Lulu and Thor concentrate on improving FCI's ability to assign activity-based inventory costs to fish, their most perishable of raw food sources. In order to keep fish moving, FCI uses a first-in, first-out or FIFO method for honoring fish requests from production. This ensures that all requests are met by using up the oldest fish first. When packers are fulfilling fish requests, they check fish quality in two ways:

■■ By comparing the arrival date with that day's date. (Any fish that has spent more than six days in inventory is put aside to be sold on a secondary market for pet food.)

■■ By sniffing the fish, a practice for which the packers have a highly developed skill.

The Inventory Throughput Cube

Data Sources

The four main data sources for the inventory cube are data collected from food materials arriving at the inventory site, data collected while shipping out food materials to meet production requests, data collected from unexpected damage reports, and data collected when auditing the inventory. An example of each of these data sources is shown in Figures 14.1, 14.2, 14.3, and 14.4.

Date      Material    Qty received {Kg}    Qty rejected {Kg}
1/1/01    Cod         100                  0
1/1/01    Tuna        200                  5
1/3/01    Cod         500                  15
1/3/01    Perch       200                  10

Figure 14.1 Food_source shipment arrival table for FCI Vancouver.

Date      Material    Qty sent to production    Qty over age    Qty spoiled
1/1/01    Cod         200                       10              10
1/1/01    Tuna        500                       20              0
1/2/01    Perch       100                       0               0
1/2/01    Cod         200                       25              10

Figure 14.2 Food_source sent to production table for FCI Vancouver.

Date      Material    Qty damaged {Kg}    Reason for damage
1/1/01    Cod         100                 Dropped fish on ground
1/1/01    Tuna        20                  Supervisor's cat snuck into freezer
1/2/01    Cod         10                  Spilled grease

Figure 14.3 Unexpected Food_source damage table for FCI Vancouver.

Date      Material    Qty derived on hand    Qty counted on hand    Reconciliation
1/1/01    Cod         250                    250                    0
1/1/01    Tuna        100                    110                    10
1/1/01    Perch       500                    495                    (5)

Figure 14.4 Annual reconciliation table for FCI Vancouver.

As with the purchasing data, FCI maintains inventory data in its data warehouse where missing dimensions such as site are added.

Dimensions

Thor wanted to use three dimensions for modeling inventory: Food_source, Time, and Site. Food_source and Time are identical with the versions you saw for purchasing. Since Site is yet another geography-style dimension, Thor defined and maintained it as a separate unilevel nominal list and then appended its members as children of the country level of the country geography dimension as shown in Figure 14.5 (and in Chapter 17).


Geog.MarketSite    Country
Guadalajara        Mexico
Gloucester         United Kingdom
Seattle            United States
Miami              United States
Vancouver          Canada
Barcelona          Spain
Hamburg            Germany
Marseille          France

Figure 14.5 Appending sites to the country level of the geography dimension.

Lulu asked about all the countries in which they conduct purchasing but where there are no inventory sites. FCI purchases fish in over 50 countries, she said. Why not keep track of them in the new Geography dimension? Won't that make it easier to join the inventory cube with cubes from other FCI business processes down the road? What do you think?

Thor shook his head. You're forgetting about the sparsity, he said.

No, I'm not, Lulu replied. If we use the full country dimension, we'll be able to see all those countries where we buy fish but don't have any inventory and production facilities. For those countries, we'll just see zeros that we won't even store. Nor will we have to see those zeros on the screen as our client tools all have suppress zero functions. So the amount of extra disk space consumed for maintaining the complete reference list of countries in the inventory cube is insignificant.

Don't be too sure, Thor said. Either you're going to have to restrict the application ranges of all the inventory variables to just the seven countries where FCI currently has inventory facilities, which will needlessly increase the complexity of their definition, or there are going to be a lot of sites with the value zero for amounts of fish received, amounts of fish on hand, and amounts of fish sent.

Lulu broke in, but the application won't store those values, Thor. They won't affect anything.

What about queries for the average amount by site of fish received or of fish on hand? asked Thor. Won't the application count all the zero-valued sites in the denominator thus throwing off all measures of average stocks or flows?

Maybe the OLAP tools in your department lack the appropriate sophistication, said Lulu, but in our department we standardized on an OLAP tool that will simply ignore all those meaningless zeros. So queries for average amounts of stock or flows will be correctly calculated.

Thor smiled and said, what about the meaningful zeros? What about all the occasions when sites received no fish, sent no fish, or otherwise have no fish on hand? Those zeros must be counted. If they are ignored, your calculations will be incorrect! I challenge you to show me a way to distinguish meaningful zeros from meaningless zeros without defining an application range (as was described in Chapter 6) that sets the boundary between meaningless and meaningful intersections, cells, or locations relative to a given set of dimensions with their potential intersections, and one or more variables that share the same application range.

Lulu conceded.

Input Contents

There are 10 different input contents to the inventory cube coming from the 4 sources mentioned. Seven of them are described in the following sections, relative to the Inventory dimensions described previously and shown here in schema form:

(Foodsource.  Site.  Time.)~

Declaration of Content Sources

The input variables are grouped in their definition by the external table they are sourced from.

"Qty received" {Kg} , "Qty rejected at receipt" {Kg}), L.leaf. << Food_source shipment arrival table

"Qty over age" {Kg} , "Qty sent to production" {Kg} , "Qty spoiled" {Kg} , L.leaf. << Food_source sent to production table

"Qty damaged" {Kg}, L.leaf. << Unexpected Food_source damage table

"Qty counted at audit" {Kg} , L.leaf. <

Declaration of Basic Aggregations

All contents sum up all hierarchies except Qty counted at audit, which doesn’t sum in the time dimension.

"Qty counted at audit" , Geog.Site.above. , Food_source.leaf.above. , Time.day. =

Sum("Qty counted at audit" , Geog.Site.leaf. , Food_source.leaf.)

Compared to the purchase and exchange rate cubes, the inventory cube seems like a piece of cake. With the exception of the annual audit, every input content has a unit metric of kilograms, is measured at the leaf level of all dimensions, and sums up all dimensions. Are we missing anything? Thor wondered.

Well, said Lulu, we could track the type of container for each kind of fish. Some come in boxes on pallets; others come in individual cartons. Still others come in sacks. There may be a correlation between the type of container in which the fish arrives and either the cost of its handling in inventory or the likelihood of damage, as from dropping. How would you track container type?

Shouldn't container type be stored in the fish dimension table as an attribute? said Thor.

That depends, replied Lulu. How does container type vary?

Yes, that is the key question, said Thor. You only want to store it as a dimension attribute if it essentially only varies by fish. I could just as easily imagine that container type varies by fish, purchasing market, and even time. Some weeks we receive Salmon from Juneau in 100-pound crates; other weeks we receive it in 50-pound plastic sacks.

Thor continued. Assuming for the moment that container type varies in all three dimensions, how would you model it, Lulu? Would you create a container type dimension? Would you create a new variable per container type? What would you do? Think about it.

I wouldn't use any of your options, Lulu told Thor. I would create a single new variable called received container type whose values were the possible container types. Then I would apply that variable to the leaf level of the inventory cube. If FCI changes container type for fish once they're in inventory, I would create one container type variable for each stage of inventory; otherwise, I would use just one. The values of the container type variable would be a nominal string derived from whatever supply contracts have the container type information. Then I would create a dimension table for the container type variable and associate any relevant properties such as capacity, strength, cost, and so forth. But since we don't have access to that supply information now, let's leave that modeling for after the pilot is completed.

Viewing Basic Aggregations

Now that you've seen the basic dimensions and variables, let's look at a few simple aggregations to get a feel for the inventory throughput cube. Figures 14.6 and 14.7 show the results of some basic aggregations. Figure 14.6 illustrates a typical aggregate view of quantity received by site and by quarter for all fish, as follows:

"Qty received", Geog.site. , Time.quarter. , Foodsource.all Materials Inventory Analysis 347

Page: Qt_Received {kg}, Foodsource.all
Column: Geog.Site.

                 United States          Canada       France
Time.quarter.    Miami      Seattle     Vancouver    Marseille    La Rochelle
2001 Q2          25150      31400       48130        8500         2070
2001 Q1          20040      24620       34100        7620         2510
2000 Q4          22580      25180       39280        7920         2460

Figure 14.6 Aggregating Qt_Received.

Figure 14.7 shows the following:

"Qty sent to production" {Kg}, Geog.all , Time.quarter. , Foodsource.fishtypes.

Page: Qt_Sent {kg}, Foodsource.all
Column: Geog.Site.

                 United States          Canada       France
Time.quarter.    Miami      Seattle     Vancouver    Marseille    La Rochelle
2001 Q2          25150      31400       48130        8500         2070
2001 Q1          20040      24620       34100        7620         2510
2000 Q4          22580      25180       39280        7920         2460

Figure 14.7 Aggregating Qty Sent to production.

Enough with this easy stuff, said Thor. Let's do something challenging.

Not so fast, replied Lulu. Even with just the basic input contents, there are many useful aggregation-style queries that can now be answered and whose answers may help us understand how FCI can improve its materials inventory throughput processing. What aggregation-style queries would you pose of the inventory cube? Think about it. What follows are seven of Lulu's queries first in words and then in code, with each query accompanied by an illustrative view of the result set.

1. Total fish lost last month (see Figure 14.8):

"Total fish lost" {Kg}, site. , Food_source.all , Time.month.prev =

"Qty rejected" + "Qty over aged" + "Qty damaged" + "Qty spoiled"

Page: Time.2001.July, Foodsource.all
Column: Geog.Site.

                   United States             Canada       France
                   Miami     Gloucester      Vancouver    Marseille    La Rochelle
Qty rejected       4,000     2,000           1,200        3,000        2,500
Qty over aged      3,000     3,000           1,500        2,000        3,200
Qty damaged        2,000     1,000           1,400        3,200        3,300
Qty spoiled        4,000     3,000           1,500        2,300        2,000
Total fish lost    13,000    9,000           5,600        10,500       11,000

Figure 14.8 Total fish lost.

2. The fish that perished the most this quarter (see Figure 14.9):

"Fish" {N} , "Site" {N} , Time.quarter.prev , Total fish lost.max Materials Inventory Analysis 349

Page: Time.2001.Q2
Column: Geog.Site.

                        United States        Canada       France
                        Miami     Boston     Vancouver    Marseille      La Rochelle
Total_fish_lost.max     Salmon    Shrimp     Salmon       French Sole    Tuna

Figure 14.9 The fish that perished the most.

3. The site that processes the largest quantity of fish (see Figure 14.10):

"Site" {N} , Time.year. , Food_source.all , Fish received.max

Page: Qty_Received.max, Foodsource.all
Column: Time.year

                 1998      1999         2000
InventorySite    London    Hong Kong    Vancouver

Figure 14.10 The site that processes the largest quantity of fish.

4. The site that processes the largest number of kinds of fish (see Figure 14.11):

"ProductionSite" {N} , Time.year. , Foodsource.all Count(Foodsource.Fish. , (Qty_Received > 0)).max

Page: Foodsource.all
Column: Time.year

                   1998         1999         2000
Production Site    Vancouver    Hong Kong    Vancouver

Figure 14.11 The site that processes the largest number of kinds of fish.

5. The site with the least amount of losses (see Figure 14.12):

"Site" {N} , Time.year. , Foodsource.all , Total_fish_lost.min

Page: Total_Fish_Lost.min, Foodsource.all
Column: Time.year

                  1998            1999         2000
Inventory Site    Buenos Aires    Hong Kong    Vancouver

Figure 14.12 The site with the least amount of losses.

6. The site with the greatest amount of variance in the quantity of fish throughput during the year (see Figure 14.13):

"Site" {N} , Time.year, variance.(Foodsource.all).max

Page: Variance.(Foodsource.all)
Column: Time.year

              1998       1999    2000
"Site" {N}    Detroit    Nice    Cologne

Figure 14.13 The site with the greatest amount of variance in the quantity of fish throughput during the year.

7. The site that processes the most salmon (see Figure 14.14):

"Site" {N} , Time.year. , Foodsource.salmon , Fish_received.max Materials Inventory Analysis 351

Page: Qty_Received.max, Foodsource.Salmon
Column: Time.year

                 1998         1999     2000
InventorySite    Marseille    Miami    Boston

Figure 14.14 The site that processes the most salmon.

It looks like inventory sites vary considerably in terms of the regularity of their throughput, the number of kinds of fish throughput, and the amount of losses. It also looks like fish vary in terms of the average amount of losses. For some reason, there seems to be a lot of damaged salmon. What about quantity on hand, said Thor, and what about losses as a percentage of throughput? Those are excellent questions, said Lulu, and it’s about time we started trying to answer them.

More Analytical Derivations

Losses as a percentage of throughput are pretty straightforward to calculate, said Thor. Impress me, replied Lulu. How would you calculate by site and year the fish with the greatest relative losses? Thor's answer is as follows (see Figure 14.15):

"Fish" {N} , Site. , Time.Year. , Food_source.Fish.all , (Qty rejected + Qty damaged + Qty spoiled + Qty over aged) / Qty received).max

Quantity on Hand

How would you calculate the quantity on hand, asked Lulu? Remember, notwithstanding inventory audits, we don't directly measure stocks. That's easy, replied Thor. The quantity on hand is just the difference between flows in and flows out. I wrote the formula out. Do you agree with Thor? Think about it.

'Qty on hand' {Kg}, L.leaf. =

Qty received - (Qty rejected + Qty over aged + Qty damaged + Qty spoiled + Qty sent to production)

Page: Time.2001, Foodsource.Fish.all, ((Qty rejected + Qty damaged + Qty spoiled + Qty over aged) / Qty received).max
Column: Geog.Site.

              United States        Canada       France
              Miami     Boston     Vancouver    Marseille      La Rochelle
"Fish" {N}    Tuna      Salmon     Salmon       French Sole    Shrimp

Figure 14.15 Losses as percentage of throughput.

So you see, said Thor, you just take all the inflows, namely the quantity received, and subtract out all the outflows, namely fish lost for any reason, plus fish sent to production, and you're left with the quantity of fish on hand.

Is that so, said Lulu. I think you're forgetting something. What do you think? Isn't the Qty on hand also a function of the previous day's Qty on hand? Surely, the formula needs to be recursive in time. As long as I'm criticizing your formula, can you tell me what time of day Qty on hand applies to? Is it Qty on hand at the beginning of the day, at the end of the day, the average for the day, or the Qty on hand after every change in quantity? Your formula isn't just wrong, Thor; it's also unclear.

Lulu offered to take a stab at it. First, said Lulu, I would make Qty on hand refer to the quantity on hand at the beginning of the day. It could have referred to any time, but I had to choose some time, and I felt it would be most useful to know the amount of fish on hand at the beginning of every day. Thus, all the additions and subtractions to the quantity of fish on hand that happen later on during the same day affect the next day's value for Qty on hand. The additions and subtractions to the quantity of fish on hand that affect any particular day's value represent events that happened the day before. Finally, the recursive property of Qty on hand, depending on its previous day's value, can only hold true for days other than the first day in the model because obviously there's no day that is previous to the first day. Thus, the recursively defined Qty on hand only applies to days 2 thru N, and the formula for Qty on hand for day 1 is a simple input as shown:

"Qty on hand" {Kg}, Site. , Food_source.fish.leaf. , Time.day.first.after = Materials Inventory Analysis 353

(Qty on hand , Time.day.this-1) + (Qty received - (Qty rejected + Qty over aged + Qty damaged + Qty spoiled + Qty sent to production) , Time.day.this-1)

"Qty on hand" {Kg}, Site. , Food_source.fish.leaf. , Time.day.first << input

So the quantity on hand at the beginning of any given day is equal to the previous day’s quantity on hand plus new receipts minus the sum of material sent and all lost materials that previous day.
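The time-lagged recursion is easy to mirror in procedural code. Here is a minimal Python sketch, assuming one flow record per day per (site, fish) and an externally supplied opening balance for day 1; the field names and figures are illustrative.

# A sketch of the time-recursive opening-balance calculation.

def opening_balances(first_day_on_hand, daily_flows):
    """daily_flows holds received, rejected, over_aged, damaged, spoiled and sent
    for days 1..N; returns Qty on hand at the start of each day."""
    on_hand = [first_day_on_hand]
    for f in daily_flows[:-1]:                       # day N's flows only affect day N+1
        outflow = (f["rejected"] + f["over_aged"] + f["damaged"]
                   + f["spoiled"] + f["sent"])
        on_hand.append(on_hand[-1] + f["received"] - outflow)
    return on_hand

flows = [
    {"received": 100, "rejected": 0, "over_aged": 0, "damaged": 5, "spoiled": 0, "sent": 60},
    {"received": 80,  "rejected": 5, "over_aged": 0, "damaged": 0, "spoiled": 5, "sent": 70},
    {"received": 90,  "rejected": 0, "over_aged": 10, "damaged": 0, "spoiled": 0, "sent": 50},
]
print(opening_balances(200, flows))                  # [200, 235, 235]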

Stock Management

What about stocking issues, asked Thor? How can you tell when a food_source is out of stock? Is there enough information to judge whether a food_source that was out of stock in one place was available in others and how often that occurs? Is it a total rarity or does it occur with enough frequency that something needs to be done about it? Those are good questions, said Lulu. So long as we’re on the topic, I would also like to know which food sources are about to run out so that any necessary changes can be made before the stock runs dry. Thor proposed the following out of stock variable:

"Out of stock" , L. = "Qty on hand" - "Qty requested"

Does this make sense to you? It didn't to Lulu. She pointed out that the concept of out of stock was a condition that represented a certain relationship between Qty on hand and Qty requested, namely that the latter was greater than the former. Thus, out of stock should be treated as a condition or binary variable: true if Qty on hand is less than Qty requested, and false otherwise.

Out of stock.true = (Qty on hand - Qty requested < 0) Out of stock.false = (Qty on hand - Qty requested >= 0)
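A tiny Python sketch makes the point that out of stock is a condition, not a quantity; the function name is illustrative.

def out_of_stock(qty_on_hand: float, qty_requested: float) -> bool:
    # True when the request exceeds what is on hand, matching Lulu's definition.
    return qty_on_hand - qty_requested < 0

print(out_of_stock(300, 500))   # True  (a 200 unit shortage)
print(out_of_stock(500, 500))   # False (exactly enough on hand)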

How would you figure out when fish that was out of stock in one site was available elsewhere? asked Thor. This is a tougher query, replied Lulu. I need to think about the desired result form first as shown in Figure 14.16 with some hypothetical data. Lulu continued. The query breaks down into several components:

■■ Determining every date-site-fish tuple that experienced a shortage

■■ Determining every other site that had at that time and for that fish an excess greater than or equal to the amount of the shortfall

■■ Determining the amount of that excess


Date          Site         Fish      Shortage {Kg}    Sites available    Excess per site
January 5     Vancouver    Salmon    500              Boston             800
                                                      Nuremberg          750
                                                      Seattle            900
January 12    Seattle      Tuna      300              Boston             350
                                                      San Diego          450
                                                      Nice               400

Figure 14.16 Desired result form for out of stock query.

I will address them in that order. The first query component

"Time" {N}, "Site" {N}, "Fish" {N} , L. , Out of stock.true

returns the set of time-fish-site tuples where some fish was out of stock for some time for some site. The second query component itself has multiple subcomponents, of which the first

"Time" {N}, "Site" {N}, "Fish" {N} , "Shortage" AS (Qty requested - Qty on hand), L. , Out of stock.true

returns the amount of shortage, defined within the query as the difference between Qty requested and Qty on hand.

Thor was impressed. That's pretty good, Lulu, but how would you figure out the other sites that had appropriate excess capacity? Lulu was clearly thinking about the problem. What would you do? Think about it. After a few minutes, Lulu went up to the white board and wrote out the following query formula:

"Time" {N}, "Site.n" {N}, "Fish" {N} , "Shortage" {Kg} , – (Site.n). , (Qty requested - Qty on hand , Site.n) < (Qty on hand - Qty requested , Site.n–) L. , Out of stock.true

In words, it says to return the time-site-fish tuples where some shortfall as defined by Out of stock.true exists, and to return along with each tuple the name of every other site (or Site.n–) where the excess capacity for the same fish on the same day, defined as (Qty requested - Qty on hand, Site.n) < (Qty on hand - Qty requested, Site.n–), was greater than the shortfall in the original site.

Thor's eyes were starting to glaze over. What's wrong? asked Lulu. I'm not sure about your last query, replied Thor. I understand what you're doing, but you may be concatenating too many query fragments. The cardinality of the time-site-fish tuples where some shortfall as defined by Out of stock.true exists may be quite different from the cardinality of sites with excess capacity as defined by (Site.n–, (Qty requested - Qty on hand, Site.n) < (Qty on hand - Qty requested, Site.n–)). In other words, for every instance of a time-site-fish tuple there may be several other sites where excess fish was available. Doesn't it make sense to separate those fragments into two queries and then join them? What do you think?

Lulu responded. Although it would indeed be easy to separate the fragments into two separate queries, the information value comes from seeing the shortage tuples joined with the excess tuples, and by defining two separate queries, I will still have to create a join to connect them in a single analytical view; the effort required to join the results of the two queries will be at least as difficult as the effort required to join them in a single view within one query. Thus, so long as my tool set lets me define single complex queries whose result sets are defined by collections of tuples of potentially unequal cardinalities, I will do just that. So as to make it a little more concrete, Thor, let me show you how I expect the result to look. (See Figure 14.17.)

Finally, Lulu said, now that you're comfortable with this kind of query, we can see the amount of excess capacity for each other site with the following query. The bold subexpression Qty on hand - Qty requested is what returns the data we are looking for. Basically, it says that in addition to the name of every site.n– that has excess capacity where site.n has a shortage, the query also needs to return the amount of that excess capacity. The final query is shown next and in Figure 14.18.

"Time" {N} , "Site.n" {N}, "Fish" {N} , "Shortage" {Kg} , (Site.n– , Excess AS 'Qty on hand - Qty requested' , (Qty requested - Qty on hand , Site.n) < (Qty on hand - Qty requested , Site.n–) L. , Out of stock.true

After analyzing the data, Lulu and Thor felt they had found a disturbing number of inventory throughput management problems ranging from chronic overcapacity in some fish for some sites to sporadic undercapacity in other fish for other sites. Even worse, they found areas, such as the ones just noted, where the shortfall in one site was more than compensated for by excess capacity in the same fish for one or more other sites.
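To show the shortage-to-excess matching procedurally, here is a Python sketch. The records are hypothetical, loosely echoing the numbers in Figure 14.18; this is an illustration of the logic, not the LC query itself.

# For every (day, site, fish) that is out of stock, list the other sites whose
# excess of the same fish on the same day covers the shortfall.

records = [
    # (day, site, fish, qty_on_hand, qty_requested)
    ("Jan 5",  "Vancouver", "Salmon", 1_000, 1_500),
    ("Jan 5",  "Boston",    "Salmon", 2_300, 1_500),
    ("Jan 5",  "Seattle",   "Salmon", 2_000, 1_100),
    ("Jan 12", "Seattle",   "Tuna",     700, 1_000),
    ("Jan 12", "Nice",      "Tuna",   1_400, 1_000),
]

for day, site, fish, on_hand, requested in records:
    shortage = requested - on_hand
    if shortage <= 0:
        continue                                     # this site is not out of stock
    excess_sites = [
        (s, oh - rq)                                 # (other site, its excess)
        for d, s, f, oh, rq in records
        if d == day and f == fish and s != site and oh - rq >= shortage
    ]
    print((fish, day, site, shortage), excess_sites)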

Systematic Age Tracking

It was pretty clear that Lulu and Thor needed to find a better way to systematically track fish aging, and that way was going to consist of changes to FCI's business processes and concomitant changes to their dimensional model. Hopefully, Jane was going to agree with their assessment of the need. Look again at the inventory cube.

Shortage tuples                           Excess sites
Salmon, January 5, Vancouver, 500 Kg      Boston
                                          Nuremberg
                                          Seattle
Tuna, January 12, Seattle, 300 Kg         Boston
                                          San Diego
                                          Nice

Figure 14.17 Single complex queries whose result sets are defined by collections of tuples.

Shortage tuples                           Excess sites    Excess amounts {Kg}
Salmon, January 5, Vancouver, 500 Kg      Boston          800
                                          Nuremberg       750
                                          Seattle         900
Tuna, January 12, Seattle, 300 Kg         Boston          350
                                          San Diego       450
                                          Nice            400

Figure 14.18 The final query.

What would you do to systematically track aging? FCI already assigns a date of arrival to each shipment of received fish. But that information is only used by the packers so they can follow the FIFO inventory practice and discard overaged fish. How could this information be better leveraged? Would you add a date variable to the inventory cube? Think about it.

Changing the Cube

Lulu pointed out that regardless of how it was done, the quantity on hand variable was going to have to be differentiated by age. That is to say, FCI needs to know not just how much fish they have in stock, but how much of each age of fish they have in stock. Thor suggested, and Lulu agreed, they need to define an aging dimension and add it to the inventory cube. There are three basic steps to adding and using an Age dimension:

1. Define the dimension and add it to the existing location structure, void of data.

2. Do whatever rework is required to map the old data into the new location structure.

3. Add any appropriate new formulas defined in the new age-tracking inventory cube.

The Age dimension in itself is simple, they thought. It should be defined as a cardinal type with the same number of leaf-level instances as the maximum number of days that fish can age in inventory before needing to be gotten rid of. This is shown in Figure 14.19.

Before proceeding any further, Lulu wanted to know what the entire process looked like. What are the steps, which, once taken, will result in a new age-tracking inventory cube? Practically speaking, said Thor, adding a new dimension is not like adding a new variable. Because the dimensionality of data is central to its definition, to its physical layout in storage, and to its method of access, nearly every OLAP tool forces you to reload and recalculate the data, essentially rebuild a cube, as a result of changing its dimensionality.

So, Thor continued, the process of adding a new dimension consists of defining the new dimension and adding it to a new version of the old schema. In other words, defining a new and empty cube. Then you redefine the old inventory formulas taking into account the new dimension and being careful to associate the formulas with the right member(s) of the Age dimension in the newly dimensioned cube. Once you've tested the newly defined cube you can then map the data from the old inventory cube (or its sources) into the new inventory cube. When that has been done successfully, you can connect the new inventory cube to all the cubes that were connected to the old inventory cube. This includes both cubes that referenced the old inventory cube and cubes previously referenced by the old inventory cube. Again, all links will need to reflect the new dimension. When the new inventory cube has been shown to correctly process new input data, including new fish received, intermediate losses reported, new requests, and fish sent to production, as well as correctly calculate all derivations and correctly connect to all appropriate cubes, then and only then can the old inventory cube be safely deleted.

1 day old 2 days old 3 days old 4 days old 5 days old 6 days old

Figure 14.19 The Age dimension.

Having defined the new Age dimension, it is time to redefine the inventory variables in light of it. This can also be used to test whether the Age dimension is properly defined. Let's begin with Quantity received, recalling first its definition in the old inventory cube:

"Qty received " {Kg} , L.leaf. << Input

"Qty received" , L.leaf.above. = Sum("Qty received " , L.leaf.)

Redefining Quantity Received

The first question, said Thor, is where does Qty received connect in the Age dimension? Where would you make the connection? Think about it. That's easy, said Lulu. Fish are received as one day old. This is shown as follows:

"Quantity received", Time.day. , Geog.site. , Food_source.fish.leaf. , Age.1 << input

The metric is the same for the old and the new Qty received. But what about aggregating? Does Qty received sum over the Age dimension? What do you think? Lulu and Thor both agreed it doesn't. On the contrary, given FCI's current business model where fish arrives from purchasing within 24 hours, the variable Qty received only applies to the one-day-old member of the Age dimension. However, Thor pointed out, by adding an Age dimension, FCI could, in the future, differentiate between different ages of fish as they arrived at inventory. This would point to date-tagging the time of purchase, but would only be useful if some fish took more than 24 hours to arrive at inventory. Then, with that business process, Qty received would aggregate over the Age dimension as well.

Redefining Quantity Requested

The next variable they looked to redefine was Qty requested, the amount of fish requested by production to be sent from inventory. Recall from the old inventory model that Qty requested can be larger than quantity on hand, in which case there's a shortage. To which member of the Age dimension does Qty requested connect? Lulu was tempted to say it connects to Age.1 just like quantity received, but she knew that wasn't quite right. After all, lots of fish stay in inventory and, when sent, have several possible ages between one and six days. Thor was pondering the question. What do you think?

Thor eventually said that Qty requested doesn't connect to any of the Age members because there's a mistake in the Age dimension. It lacks an all member. When the folks at production are requesting fish, they aren't requesting fish of any particular age. Subject only to the FIFO inventory management principle, the folks at production don't dimension their requests by age. So after amending the Age dimension to reflect the all member, Thor wrote down the definition of the Qty requested variable.

"Qty requested" , Time.day. , Geog.site. , Food_source.fish.leaf. , Age.all << input

Redefining Quantity Sent

The next variable to redefine was Quantity sent. Now that FCI is systematically tracking fish age, they need to specify for every collection of fish sent to production how much of each age of fish was sent. So how do you put that in a formula? Think about it. It may be trickier than you suspect.

Thor said, it's a good idea—especially when a formula looks complex—to write out a formula prototype to identify the variables upon which the formula is defined. So what are the input variables for Quantity sent? To abstract away the extraneous, he noted that the formula should be the same for all fish and for all dates and for all sites. So those dimensions can be safely ignored for the moment. The real game seems to take place between the inventory variables mentioned and the Age dimension. Lulu agreed with Thor and started writing the following prototype function:

"Quantity Sent" = F(( Quantity requested , Age.all) , Quantity on hand )

Then, to help her think through the formula, she worked through the calculation in her mind. The quantity requested comes in at the all member of the Age dimension. Since FCI uses a FIFO method, the quantity requested then needs to be compared with the oldest quantity on hand, namely the six-day-old fish. If there is enough fish in the six-day-old bin to satisfy the requirement, then the quantity sent from the six-day-old bin is equal to the quantity requested at the all level. Putting this fragment in LC, she wrote the following:

If ("Qty on hand" {Kg} , Age.6) >= ("Qty requested" {Kg} , Age.all) Then ("Qty sent" {Kg} , Age.6) = ("Qty requested" {Kg} , Age.all)

But what if the six-day-old Qty on hand is less than the quantity requested? How is that handled? Well, for sure, the six-day-old bin would be cleaned out and the remaining quantity needed to fulfill the request would be compared with the quantity on hand of five-day-old fish. Thor suggested the following:

If ("Qty on hand" {Kg} , Age.6) < ("Qty requested" {Kg} , Age.all) Then ("Qty sent" {Kg} , Age.6) = ("Qty on hand" {Kg} , Age.6) and

If ("Qty on hand" , Age.5) >= ("Qty requested", Age.all - Qty sent, Age.6) Then "Qty sent" , Age.5 = ("Qty requested", Age.all - "Qty sent", Age.6) 360 Chapter 14

and

If ("Qty on hand" , Age.5) < ("Qty requested", Age.all - "Qty sent", Age.6) Then "Qty sent" , Age.5 = "Qty on hand" , Age.5

and

If ("Qty on hand" , Age.4) >= ("Qty requested", Age.all - ("Qty sent", Age.6 + "Qty sent" , Age.5)) Then "Qty sent" , Age.4 = ("Qty requested", Age.all - ("Qty sent", Age.6 + "Qty sent" Age.5))

and so on until we get to Qty on hand, Age.1 at which point if there isn't enough fish in the one-day-old bin, there isn't enough fish in inventory and there exists a shortage.

Even Lulu was impressed with Thor's formulation. She suggested, and this is good practice, to write out the inputs and outputs for a relevant piece of formula in a simple table or worksheet or cubelet so the workings of the formula can really be thought about, as shown in Figure 14.20.

Thor had several comments once he got a chance to look at an example of the formula workings. In the formula, there is a decrementing term, namely the Qty requested minus the sum of quantities sent of different ages. This quantity, which is what's compared with the amount in every bin of some age, is a type of intermediate quantity. Even though no one is interested in looking at just that quantity, it might be easier to define an intermediate or working variable to capture the notion so as to avoid having to write a separate formula clause for each day of age in the Age dimension. What would happen, for example, if FCI wanted to track aging for all the more slowly perishable vegetarian ingredients and added 90 days to the Age dimension? In its current form, the formula would get unwieldy pretty quickly.

Age    Qty on hand {Kg}    Qty requested    Qty sent {Kg}
1      0
2      0
3      500                                  250
4      300                                  300
5      450                                  450
6      200                                  200
All    1450                1200

Figure 14.20 The throughput for a formula cubelet.

Lulu suggested the following:

For Age.this = max to min

While "Qty sent" , Age.this = "Qty on hand" , Age.this And While Sum ("Qty sent" , Age.max-this) < "Qty requested" , Age.all

If "Qty on hand" , Age.this >= ("Qty requested" , Age.all - Sum("Qty sent" , Age.max-this))

Then "Qty sent" , Age.this = ("Qty requested" , Age.all - Sum("Qty sent" , Age.max-this))

Else

"Qty sent" , Age.this = "Qty on hand" , Age.this

And what about the remainders, asked Thor? We are, after all, modeling a depletion transaction. Where does the new Qty on hand get represented? Is it captured on the next day's balance? Or do we need to represent it with an additional variable, such as a post-depletion Qty on hand?

Lulu replied that the answer depends on FCI's business processes. If we assume that Quantities received and requested come in only once per day (as we are assuming here), then, even if users are querying the system around the clock, FCI can calculate a quantity on hand at beginning of the day variable based on the previous day's transactions as soon as the day's quantities sent, quantities lost, and quantities received have been registered. Since no transaction will alter those quantities until the following day, FCI has the data to calculate the next day's Quantity on hand at the beginning of the day. So FCI doesn't need to create post-transaction balances.

Redefining Quantity on Hand

So, said Lulu, what is the new definition of Quantity on hand? Thor replied, let’s first recall the old definition stripped of the complexities of the various sources of lost fish:

"Qty on hand" {Kg}, Site. , Food_source.fish.leaf. , Time.day.first.after. =

("Qty on hand" , Time.day.this-1) + ("Qty received" - ("Qty lost for any reason" + "Qty sent to production") , Time.day.this-1)

"Qty on hand" {Kg}, Site. , Food_source.fish.leaf. , Time.day.first << input 362 Chapter 14

All you need to do, said Thor, is add the Age dimension to each of the two formulas, as I’ve done next, making sure to refer to Age.leaf since there’s also an Age.all. What do you think of Thor’s formulas?

"Qty on hand" {Kg}, Site. , Food_source.fish.leaf. , Time.day.first.after. , Age.leaf. = ("Qty on hand" , Time.day.this-1) + ("Qty received" - ("Qty lost for any reason" + "Qty sent to production") , Time.day.this-1)

"Qty on hand" {Kg}, Site. , Food_source.fish.leaf. , Time.day.first , Age.leaf. << input

Lulu pointed out several problems with the new definitions as stated. First of all, in the new cube, Quantity received is input solely at the 1 member of the Age dimension. Quantity sent does happen on a day-old-by-day-old basis so that part of the formula remains. But then how do additions occur? asked Thor. Good question, said Lulu. Think about it.

Additions to fish of some age occur through the aging process itself. This implies that Quantity on hand for any age fish needs to be a recursive function in two dimensions: Time and Age, as shown here and in Figure 14.21:

"Qty on hand", Site. , Food_source.fish.leaf. , Time.day.first.after. , Age.1.after. = ("Qty on hand" , Time.day.this-1 , Age.this-1) - ("Qty lost for any reason" + "Qty sent to production") , Time.day.this-1)

Furthermore, said Lulu, there are two separate starting cases that you need to be aware of and which I show you next: the first day, which happens once for the whole cube, and the first day of age, which happens for every fish as it is received:

"Qty on hand" {Kg}, Site. , Food_source.fish.leaf. , Time.day.first , Age.1 << input Materials Inventory Analysis 363

"Qty on hand", Site. , Food_source.fish.leaf. , Time.day.first.after. , Age.1 = Qty received, Age.1

Redefining the Loss Variables

Now we can add back the various loss variables. Quantity rejected at receipt, like Quantity received, is input for the one-day-old member of the Age dimension.

"Quantity rejected at receipt" , Time.day. , Geog.site. , Foodsource.fish.leaf. , Age.1 << input

                                          Jan 23    Jan 24    Jan 25
Vancouver    Whitefish    Qty_Sent {kg}      2,000     2,100     2,050
                          Qty_Received {kg}  1,900     2,200     2,250
                          Qty_Lost {kg}        100       125        75
                          Qty_On_Hand {kg}   1,600     1,575     1,700
             Tuna         Qty_Sent {kg}      1,500     1,400     1,450
                          Qty_Received {kg}  1,400     1,450     1,500
                          Qty_Lost {kg}        100        75       125
                          Qty_On_Hand {kg}   1,400     1,375     1,300
Boston       Bass         Qty_Sent {kg}      1,720     1,620     1,670
                          Qty_Received {kg}  1,620     1,670     1,720
                          Qty_Lost {kg}        175       150       200
                          Qty_On_Hand {kg}   1,620     1,520     1,370
             Tuna         Qty_Sent {kg}      1,845     1,745     1,795
                          Qty_Received {kg}  1,745     1,795     1,845
                          Qty_Lost {kg}        200       175       225
                          Qty_On_Hand {kg}  22,650    22,775    22,950

Figure 14.21 Inventory variables by fish, site, and time.


Quantity over age is equivalent to a seven-day-old quantity on hand variable. Lulu even suggested that a seventh day be added to the day-old dimension where all quantities on hand on the seventh day are shipped to a secondary market. But Thor responded that adding a seventh day to the Age dimension would mean that all the depletion formulas that start with the oldest fish first would now have to start with the second oldest. Besides, he said, there's an easy way to express over-aged fish. So what should be the formula? Think about it. Thor suggested the following:

"Overaged fish", Time.day. , Site.leaf, Material.leaf. , Age.6 = "Quantity on hand" - "Quantity sent" This basically says that any six-day-old fish left over after whatever fish is to be sent to production has been sent is (given there’s only one transaction per day) over age. (This formulation, and assuming no fancy use of the Naturals as the domain, thereby eliminating the possibility of negative integers, will produce zero-valued and negative numbers for all cases where no fish is overaged.) What about quantity damaged or spoiled? asked Lulu. Fish can be damaged or spoiled at any time. You can just show me the formula for quantity damaged.

"Quantity damaged" , Site.leaf. , Time.day. , Foodsource.fish.leaf.T , Age. << input E So now we’re ready to define the fullA equation for Quantity on hand given the method of depletion and the various ways that fish are lost as shown here: M "Qty on hand", Site. , F Foodsource.fish.leaf. , Time.day.first.after. , L Age.1.after. = Y ("Qty on hand" , Time.day.this-1 , Age.this.-1) - ("Quantity damaged" + "Quantity spoiled" + "Qty sent to production" , Time.day.this-1 , Age.this.-1)

"Qty on hand", Site. , Food_source.fish.leaf. , Time.day.first , Age.1 << input

"Qty on hand", Site. , Food_source.fish.leaf. , Time.day.first.after. , Age.1 = "Qty received", Age.1


Auditing

Relative to basic inventory tracking, auditing is straightforward, said Lulu. Although we calculate the quantity of fish on hand based on measurements of flows input and output, we could audit those numbers by measuring any and all quantities on hand. FCI decided to perform random sample audits in addition to the year-end audit. All audits have the following form:

"Quantity on hand audited" {Kg} , Time.day. , Site.leaf. , Foodsource.fish.leaf. , Age.leaf. << input

"Reconciliation" {Kg}, Time.day. , Site.leaf. , Foodsource.fish.leaf. , Age.leaf. = "Quantity on hand" - "Quantity on hand audited"

Derived Variables Now that we’ve mapped the basic inventory variables from the ageless inventory cube to the age-tracking cube, said Thor, I, for one, am ready to create some analytically use- ful derived variables. Lulu and Thor then proceeded to take turns writing out analyti- cal formulas and showing sample views of their results as follows (see Figures 14.22 through 14.25). 1. Oldest fish and freshest fish sent to production

("Fish", Age.min), ("Fish" , Age.max), Foodsource.all, Time.Day., Geog.MaterialsInventorySite., ("Qty sent to production", Time.Day.this > 0)

Time.July1       United States          Canada       France
                 Miami       Boston     Vancouver    Marseille    La Rochelle
Fish, Age.min    Tuna        Salmon     Salmon       Swordfish    Cod
Fish, Age.max    Bluefish    Whitefish  Catfish      Bass         French Sole

Figure 14.22 Oldest fish and freshest fish sent to production.

2. Average fish age in materials inventory:

(Sum(Qty on hand x Age, Age.) ÷ (Qty on hand, Age.all)), Foodsource.Fish.all, Geog.MaterialsInventorySite., Time.Day.

Alternatively, and to accentuate the sum-of-series aspect to the weighted average, they wrote the following:

Sum ((("Qty on hand" Ϭ ("Qty on hand , Age.all)) x Age) , Age.) , Foodsource.Fish.all, Geog.MaterialsInventorySite., Time.Day.

Time.July1                           United States        Canada       France
                                     Miami     Boston     Vancouver    Marseille    La Rochelle
Sum(Qty on hand x Age.this, Age.)
  ÷ (Qty on hand, Age.all)           4.2       4.5        4.9          3.5          3.7

Figure 14.23 Average fish age in materials inventory.
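For readers who prefer to see the weighted average written out procedurally, here is a short Python sketch; the quantities by age bucket are hypothetical.

def average_age(on_hand_by_age: dict[int, float]) -> float:
    total = sum(on_hand_by_age.values())             # Qty on hand, Age.all
    if total == 0:
        return 0.0
    return sum(age * qty for age, qty in on_hand_by_age.items()) / total

print(round(average_age({1: 100, 2: 150, 3: 250, 4: 300, 5: 450, 6: 200}), 2))   # 4.0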

3. Fish most needed to be used to avoid throwing any away:

("Fish" {N}, Qty on hand.max), "Qty on hand" {Kg} Age.6, Foodsource.Fish., Geog.MaterialsInventorySite., Time.Day.

Time.July1, Age.6            United States          Canada       France
                             Miami       Boston     Vancouver    Marseille    La Rochelle
Fish, Qty on hand.max        Bluefish    Whitefish  Catfish      n/a          n/a
Qty_on_hand {kg}             75          65         70           0            0

Figure 14.24 Fish needed to be used to avoid throwing any away.

4. Material inventory sites with the greatest waste/throughput ratio:

"MaterialsInventorySite", "Rank", Time.day., Age.all, Food_source.all , ("Qty lost" /"Qty_Throughput"), Geog.all).Drnk.(1 to 3)

                                   July 1           July 2       July 3
Drank.1 Material Inventory {N}     Vancouver        Boston       London
Drank.2 Material Inventory {N}     San Francisco    Detroit      Nuremberg
Drank.3 Material Inventory {N}     Nice             Hong Kong    Nice

Figure 14.25 Material inventory sites with the greatest waste/throughput ratio.

Costing

The two major differences in granularity for which inventory costs can be calculated are site by time-aggregated costs and employee-task-level costs. The basic framework for the former method of tracking costs consists of a two-dimensional location structure defined by time and site and all the relevant cost variables such as energy, financial, maintenance, and labor. Once tracked, costs can then be compared with the inventory throughputs achieved by site and by time in order to calculate the costs per unit of fish moved through inventory and to calculate the true, activity-based cost of fish as it arrives at a production facility.

The latter method of tracking costs requires the measurement of what each inventory employee is doing at a particular time, in effect, treating the inventory process as a project to be managed. The time dimension would have to be resolved to the hour or even minute level. Site-specific employees would have to be connected to specific tasks at specific times such as loading and unloading fish. Once in place, this would allow FCI to calculate, among other things, fish losses by employee group, perhaps finding that employees with certain attributes are more or less likely to drop fish or misevaluate spoiled fish.

As useful as this information may prove to be, FCI did not think that it was as important as creating an employee-level understanding of production. So, forced to prioritize, FCI took a breadth first approach to the company-wide performance indicators it was creating and was willing to leave the detailed inventory employee mapping project to a later time.

Site by Time-Level Costing

All right, said Thor, where do we begin? Lulu replied that the two dimensions needed for inventory costing, site-based geography, and day-based time are identical with the versions already being used for throughput analysis. So we only need to think about the input variables. Lulu went on to say that the basic strategy is straightforward. Collect all cost variables at their natural level of time granularity and combine them into a total cost variable at some appropriately fine level of time granularity. Then compare fish throughput with total costs for that same level of time granularity. After talking to the folks in accounting, Lulu was given the following month-level summary of cost accounts pertaining to every inventory site:

■■ Number of employees {int}

■■ For each paygrade, number of employee hours per pay grade {int}

■■ Maintenance costs {$}

■■ Facility costs {$}

■■ Energy costs {$}

For the analyst's purposes, each of the variables was input at the site-by-month level. Figure 14.26 shows representative data for three sites. Note that all sites have three different paygrades, though the dollar amounts per paygrade vary considerably from site to site.

Variable            Site 1    Site 2     Site 3
Month/rent          5,000     6,500      6,000
No. of employees    19        22         20
Hrs paygrade 1      2,400     2,600      2,000
$/hr paygrade 1     17        19         22
Hrs paygrade 2      600       800        550
$/hr paygrade 2     38        42         45
Hrs paygrade 3      50        100        40
$/hr paygrade 3     60        75         85
Maintenance         5,000     6,000      5,000
Facility            10,000    14,000     12,000
Energy              5,000     4,000      4,000
Total costs         86,600    114,500    93,150

Figure 14.26 Costs by site.

This is because base labor rates vary substantially across continents. All sites report in terms of U.S. dollars based on average monthly effective exchange rates.

We are now ready to assign operating costs to throughput. Thor suggested bringing the cost data into the throughput cube since the throughput cube was already performing most of the calculations and it was a good idea to have specialized calculation cubes. Lulu said, you want to balance the calculations and so it's better to put the combined formulas in the cost cube. How would you do it? Would you bring operating costs into the inventory throughput cube or vice versa as our analysts suggested? Or would you build a new cube? More importantly, how would you decide? Think about it.

Remember, whenever you are defining multicube calculations, the best way to avoid confusion is to stay focused on what you're trying to do. The way to do that is to begin by specifying the dimensionality of the result set and then working backwards to join up with your starting data sets. Lulu began drawing out a table representation of the two multicube variables she was interested in calculating (see Figure 14.27). The top expression, $cost, Asset.all, comes from the Cost cube and represents the aggregation of all the line item costs. The bottom expression, Qty thruput, Food_source.all, Age.all, represents the sum of all fish of all ages throughput. Although there are a variety of ways of defining throughput, a simple method based on averaging the quantity of fish received and sent was used.

Thor said, so long as the multicube expressions are correctly dimensioned by site and by time it doesn't make any logical difference whether the cost data is brought into the throughput cube or vice versa, or whether all the data is put into a new cube. (When it does make a difference—seen often in the cube calculation exercise in Chapter 17—is when the dimensions of the new variable do not correspond to the dimensions of any existing cube and thus some join or merge cube is required.) Practically speaking, since many front-end tools make it easier to view the contents of a single cube at a time, you would typically define the multicube variable in that cube whose other variables are most likely to be queried at the same time.

Time.month / Geog.site    Site1                                      Site2
January                   V1: $cost, Asset.all                       V1: $cost, Asset.all
                          V2: Qty thruput, Food_source.all, Age.all  V2: Qty thruput, Food_source.all, Age.all
February                  V1: $cost, Asset.all                       V1: $cost, Asset.all
                          V2: Qty thruput, Food_source.all, Age.all  V2: Qty thruput, Food_source.all, Age.all

Figure 14.27 Combining costs and throughput. V1 = variable one. V2 = variable two.

In this case, FCI is most likely to query for cost per kilo of fish information at the same time it queries for other costs, so the multicube cost per kilo variables were defined in the cost cube as shown here:

[Materials Cost]: "Cost per kilo of fish processed in inventory"{$/Kg} , Geog.site. , Time.month. = "Total costs" / [Materials Throughput]: "Quantity thruput" , Age.all , Foodsource.fish.all

Other subtotal cost variables, such as labor, management, or utility, can be defined in the same way. Thor noted that productivity variables also have the same structure as with the following:

[Materials Cost]: "No. of labor hours per kilo of fish processed in inventory" {Hrs/Kg} , Geog.site. , Time.month. = (Paygrade 1 hours + Paygrade 2 hours + Paygrade 3 hours) / [Materials Throughput]: Quantity sent , Age.all , Foodsource.fish.all

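To make the shape of these ratio formulas concrete, here is a minimal Python sketch. The labor, maintenance, facility, and energy figures echo Site 1 in Figure 14.26; the throughput quantity and all variable names are invented for illustration and are not part of FCI's actual model.

# Hypothetical sketch of the cost-per-kilo and hours-per-kilo ratios for one
# site-month. Numbers follow Site 1 of Figure 14.26 except the throughput,
# which is assumed.
paygrade_hours = {"pg1": 2_400, "pg2": 600, "pg3": 50}
hourly_rates   = {"pg1": 17,    "pg2": 38,  "pg3": 60}

labor_cost  = sum(paygrade_hours[p] * hourly_rates[p] for p in paygrade_hours)
other_costs = 5_000 + 10_000 + 5_000          # maintenance + facility + energy
total_cost  = labor_cost + other_costs        # 86,600 as in the figure

qty_thruput_kg = 120_000.0                    # assumed site-month throughput {Kg}

print(f"cost per kilo:  {total_cost / qty_thruput_kg:.3f} $/Kg")
print(f"hours per kilo: {sum(paygrade_hours.values()) / qty_thruput_kg:.4f} Hrs/Kg")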
Lulu was underwhelmed with Thor’s formulas. You got it right, Thor, she said, but those aren’t the most riveting formulas I’ve seen today. Can’t you show me something exciting? Thor replied that they didn’t have access to any of the detailed employment information that would be required to connect actual employees to specific tasks. But Lulu persisted. Isn’t there any other derivable and useful information that can be gleaned from the data? Think about it. Thor was still working on the problem when Lulu said that even though they don’t have access to activity-level data, they can still approximate the daily storage cost of fish. This can be done by allocating inventory costs to the day level and then combin- ing daily storage costs with daily throughput quantities to estimate the cost in dollars to store one kilogram of fish for one day. Then they can leverage the data based on the Age dimension to figure out on a day-by-day and fish-by-fish basis the total cost of storing some particular type of fish based on its number of days spent in inventory. Clearly, the inventory cost per kilo of fish that remained in storage only one day is less than the cost per kilo of fish that remained in storage six days. Even if all the employee’s time is allocated to processing fish received and fish sent, we can still assign some storage cost through the facility, maintenance, and energy bills. Assuming here that one-third of the site is dedicated to loading and unloading and the rest is ded- icated to storage, Lulu said we can create a rough formula for assigning incremental time-based storage costs as follows:

[Cost]: "Cost per kilo of fish processed in inventory" {$/Kg} , Geog.site. , Time.day. =
(2/3 × (Costs , Asset.all)) / [Throughput]: ((Quantity sent + Quantity received)/2) , Age.all , Material.all

Not bad, said Thor. However, I don’t think you’ve correctly captured the total costs of inventory. Do you agree with Thor? If so, what is he referring to? Lulu wasn’t sure. Thor continued. You need to loosen up on your formula and lump all costs together (that is, not pretend to capture loading and unloading), in which case your formula simplifies to the following:

[Cost]: "Cost per kilo of fish processed in inventory"{$/Kg-day} , Geog.site. , Time.day. =

(Costs , Asset.all) / [Throughput]: (Quantity sent + Quantity received)/2 , Age.all , Material.all

As written, this formula enables you to assign a daily cost of inventory to the day's throughput of fish. That cost is expressed in terms of dollars per kilogram-day. The total cost of storing any fish for any amount of time is thus the sum of the costs per kilogram-day times the amount of fish in storage per day for every day the fish in question is in storage.

However, as written, the formula does not distinguish storage costs from loading and unloading costs. If, for example, the costs of loading and unloading were significantly higher than the costs of storage, then the average cost per day would be highest for fish that spent the least amount of time in storage. Or if there were a discrete cost-generating event that occurred at the receiving or sending end, your formula, as written, would not pick it up. Rather those discrete loading and unloading costs would be combined with the overall storage costs. Lulu chimed in, so how should we calculate those costs, Mr. Fancy Pants? How would you?

Blushing, Thor continued, if you want to differentiate storage costs from the costs of loading and unloading, you first of all need to track loading costs separately from unloading costs. FCI is not currently doing this, so my approach cannot be implemented at the current time. Assuming for the moment that FCI does track these costs separately, my approach requires breaking up the total inventory cost into three components: cost of receiving, cost of storing, and cost of sending. The tricky part is that the method of calculating the inventory cost of fish sent to production is different than calculating the inventory cost of fish received. Can you say why that is?

Calculating the inventory cost for a particular lot of fish of a particular age is much simpler than calculating the inventory cost of fish for a lot of fish entering inventory. This is because a single lot of fish of a single age, sent to production, can trace back to a single lot received on a single day, whereas a single lot received on a single day may be sent out on multiple days. Since the concept is the same, let me just focus on calculating backwards from quantities sent to production. Given a quantity of a particular type of fish, Fish.x, of a particular age, Age.x, (greater than or equal to one day old since fish arrives one day old) sent to production on a particular day, Time.day.x , and

leveraging the previous formula for calculating storage cost of fish on hand, the total inventory cost for that fish may be calculated as follows.

("Qty sent" , Time.day.x , Foodsource.fish.x , Age.x) / ("Qty sent" , Time.day.x , Foodsource.all , Age.all) X ("Sending costs" , Time.day.x) +

If "Age" , Foodsource.fish.x >= 1 then
( ("Qty on hand" , Foodsource.fish.x , Age.x , Time.Age.(1 to x)) /
  ("Qty on hand" , Foodsource.fish.all , Age.all , Time.(Age.(1 to x) , Foodsource.fish.x) )
  X ("Storage costs" , Time.Age.(1 to x)) )
Else 0

+ ("Qty received" , Time.day.Age.1) / ("Qty received" , Foodsource.fish.all) X ("Receiving costs" , Time.day.Age.1)

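Before Thor walks through the middle term, here is a small Python sketch of the same three-part allocation for a single lot sent to production. The function and all of its arguments are hypothetical; the point is only to show the structure: a receiving share, one storage share per day in inventory, and a sending share.

from typing import List

def lot_inventory_cost(qty_lot: float, qty_all_sent: float, sending_cost: float,
                       lot_on_hand: List[float], all_on_hand: List[float],
                       storage_cost: List[float],
                       qty_lot_received: float, qty_all_received: float,
                       receiving_cost: float) -> float:
    """Allocate sending, per-day storage, and receiving costs to one lot."""
    send_share = (qty_lot / qty_all_sent) * sending_cost
    storage_share = sum((lot / all_) * cost
                        for lot, all_, cost in zip(lot_on_hand, all_on_hand, storage_cost))
    receive_share = (qty_lot_received / qty_all_received) * receiving_cost
    return send_share + storage_share + receive_share

# A lot that spent three days in inventory (all figures invented):
print(lot_inventory_cost(500, 5_000, 400.0,
                         [500, 500, 500], [20_000, 18_000, 22_000], [900.0, 900.0, 950.0],
                         500, 6_000, 350.0))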
If the fish was sent out the same day it was received, then the middle cost expression drops out, added Thor. I see where you're going, said Lulu. But could you explain the middle term in more detail?

Sure, replied Thor. The top two lines evaluate as an array of ratios between the quantity of fish.x on hand and the total quantity of all fish on hand. There exists one ratio per day that fish.x is in storage. The term that gives us that array is the Time.Age.(1 to x). In words, it says to take the set of times for which the age (of fish.x) ranges from one day old to x days old where x days old is the age of fish.x at the time it is sent to production. Then for the same set of dates, that ratio is multiplied by the total costs of storage, and the resulting set of daily storage costs per fish.x is then summed to generate the total cost of storage for fish.x.

Although I appreciate your clear explanation, said Lulu, it's interesting to note that even here, you can't really assign elemental cost events to the right fish. For example, if there's a cost spike due to any unforeseen event that affects any receiving of fish on a particular day, that cost increment, though captured, is evenly assigned to all fish received that day. If you want the ability to track the ultimate sources of dips and spikes, then you need to maintain a connection to your event tracking or transaction database.

As useful as it is to track costs, even if they're not totally elemental, said Lulu, I think that in all this rush to detail we left out some interesting cost-like derivations. Maybe we can use the differing amounts of time spent in inventory, the differing amounts of fish on hand, and some measure of inventory capacity to come up with a measure of overall inventory productivity as with the following formula beginning with its prototype:

Inventory productivity = F( 1/time spent in inventory , percent utilization of storage facilities as defined by the maximum fish on hand for the year)

In other words, the more productive a site, the less time fish spends in inventory and the more fish, as a percentage of the maximum amount of fish storable, is on hand. Since there are two distinct productivity measures, it makes sense to define them separately. One such formulation and corresponding result set looks, for each, as follows (see Figure 14.28):

"Inventory volume productivity" {R}, Time.month. , Geog.site. =

Avg ("Quantity on hand" , Time.day. / "Quantity on hand.max" , Time.day.this.year)

Time.July                          United States         Canada       France
                                   Miami     Boston      Vancouver    Marseille    La Rochelle
Inventory Volume Productivity      0.73      0.81        0.89         0.79         0.69

Figure 14.28 Inventory capacity productivity.

In words, the volume productivity of inventory space is defined as the average ratio of daily quantity on hand divided by the maximum day’s quantity on hand for the year. The formula for inventory turns is shown as follows (see Figure 14.29):

"Inventory turn productivity" {R}, Time.month. , Geog.site. , Foodsource.fish. =

Avg ((Quantity sent, Age.1 / Quantity sent , Age.all) , Time.day , Foodsource.fish.)

Time.July                          United States         Canada       France
                                   Miami     Boston      Vancouver    Marseille    La Rochelle
Inventory Turn Productivity        0.15      0.13        0.25         0.19         0.10

Figure 14.29 Inventory turn productivity.

In words, the inventory turn productivity is defined as the average percent of fish sent out that is one day old. Note that the evaluation of inventory turns creates a calculation for every fish-month-site tuple.
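A brief Python sketch of both productivity measures may help; every number below is invented, and the daily granularity is collapsed into simple lists rather than a real time dimension.

# Volume productivity: average of (daily quantity on hand / max on hand this year).
daily_on_hand = [14_600, 15_100, 16_200, 15_800, 14_900]      # {Kg}
max_on_hand_this_year = 20_000                                 # {Kg}
volume_productivity = sum(q / max_on_hand_this_year for q in daily_on_hand) / len(daily_on_hand)

# Turn productivity: average share of each day's shipments that is one day old.
daily_sent = [(1_200, 8_000), (900, 7_500), (1_500, 8_200)]    # (qty sent at Age.1, qty sent at Age.all)
turn_productivity = sum(age1 / total for age1, total in daily_sent) / len(daily_sent)

print(f"inventory volume productivity: {volume_productivity:.2f}")
print(f"inventory turn productivity:   {turn_productivity:.2f}")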


Summary

In this chapter, FCI's analysts figured out how to model inventory stocks and flows, how to add a new dimension to a cube, how to map data from the prior cube to the new one with the added dimension, and how to calculate the quantity on hand of raw materials when those materials age (which is a function that is recursive in two dimensions). They also figured out how to calculate when shortages in one center were more than offset by overstocking in other areas, how to calculate which age buckets are used up and what remains after daily orders are sent to production, how to handle inventory sparsity, how to create variables with restricted application ranges, and how to calculate the costs and productivity of storing materials per weight and time period.

CHAPTER 15

FCI’s Sales and Marketing

In this chapter, you will learn how an OLAP solution can be brought to bear on sales and marketing. By following Lulu and Thor as they wrestle with whether to model package type as a dimension or a variable, you will see different ways of aggregating variables that don't simply sum in all dimensions. You will learn about different ways of inputting data that has gaps in it, and will see how to perform mix variance analysis for sales in multiple dimensions. You will learn about the best way to model a marketing expenditure variable that captures source expenses at different levels of granularity, how to aggregate marketing expenditures, and how to allocate them to the product level and compare allocated costs with sales. You will also learn how to create a customer value state model.

Foodcakes International (FCI) needs to understand how and why sales of their prod- ucts vary by time, by store, by customer type, and by foodcake. The company knows that in order to accomplish this, it needs to gain insight into the ultimate consumers of its products. After all, FCI does not sell its products directly to consumers. Rather, it sells its foodcakes to stores who, in turn, sell them to consumers. In order to get a chance to analyze its final customers, FCI proposed, as a part of its warehousing initia- tive, and nearly all its client-stores agreed, to create a store-resident extranet supply chain application that exposed the client-store foodcake transaction data, with mini- mal delay to FCI. With appropriate security, each client store, by contributing its own data could see FCI-generated analyses of its future sales and comparisons of its sales with aggregates of other similar stores. No store’s actual data were visible to any entity other than FCI. In addition, FCI offered to sponsor some comarketing activities with the stores. The stores therefore expected to benefit from additional sales and better


Date          Time    Register   Transaction ID   Total amount {$}
Jan 8, 2001   09:11   4          90821            43.21
Jan 8, 2001   09:12   2          30111            62.18
Jan 8, 2001   09:14   4          90823            12.06

Figure 15.1 Transaction summary.

Transaction ID   Item name                             Price {$/unit}   Qty_Sold {unit}   Total_Revenue {$}   Discount {$}
90814            Gourmet Fishcakes Sea Supreme 8-pk    12.95            3                 38.85               5.00
90818            Gourmet Fishcakes Poseidon 8-pk       11.95            3                 35.85               4.00
90821            Diet Fishcakes Neptune 8-pk           10.95            3                 32.85               3.00

Figure 15.2 Detail records.

inventory management. To kick off this effort, FCI’s skunk works sponsoring manager, Jane, sent Lulu and Thor (in their role as customer-facing consultants) to their sole client store in Cambridge, Massachusetts. The first thing shown to FCI was the summary and detail transaction records for Cambridge as illustrated in Figures 15.1 and 15.2. When asked if they needed to see anything beyond the foodcake transactions, FCI responded that in addition to gaining access to the detailed foodcake transactions, they would need to see the transaction summaries to understand what percentage of the customer’s food basket was dedi- cated to foodcakes. Beyond that, if they were going to understand the basket of prod- ucts likely to be purchased at the same time as foodcakes—knowledge that is necessary in order to design effective sales promotional campaigns—they would have to see the entire set of detailed transactions. The Cambridge store understood this, but re- sponded that for now FCI should focus on understanding sales of foodcakes. To get the ball rolling, Lulu wanted to build a straightforward cube to calculate the various rollups of the store’s sales data.

Cube Dimensions

The sales and marketing cube has four dimensions (described in more detail in Chapter 13) that interrelate in a standard Cartesian product:

■■ Time. Lulu used FCI's standard time dimension that is level-based, begins with days on the lowest level, and has a week rollup and a month quarter year rollup. The dimension covers three years' worth of days.
■■ Geography. Consistent with earlier design decisions, Lulu joined the FCI-maintained list of stores with the FCI-maintained country-based geography dimension to create a store-based geography dimension.
■■ Foodcakes. Currently, FCI sells 30 different kinds of foodcakes broken out into three types: Gourmet, Diet, and Vegetarian.
■■ Scenario. FCI's standard reporting includes three scenarios: Plan, Actual, and Variance.

Cube Input Variables

The data collected at the stores that is now being funneled to FCI is composed of the input variables described in the following sections.

Number of Packages Sold: Qty_Sold {pkg}

The stores track sales by package. FCI produces two different package types: four-packs and eight-packs. Lulu asked at this point whether it made more sense to treat the difference between four-packs and eight-packs as a variables difference or a dimension difference. What do you think?

Dimension or Variable?

After thinking about it for a minute, Thor said that if all foodcakes are sold as either four- or eight-packs and given only one kind of quantity sold variable, then it was more efficient to treat the pack type distinction as a variables distinction rather than a Foodcake dimension distinction. This is because if the pack type distinction were to be defined on the Foodcake dimension, it would require doubling the number of leaf-level elements in that dimension. From a process perspective, they would need to either add a new level to the foodcake dimension called the SKU or package level, where each combination of four- and eight-packs rolls up to what are the now existing leaves, or every current leaf would have to be edited to become either a four- or eight-pack and then the other kind of package would have to be added alongside. In either event, a minimum of 60 new declarations would have to be performed.

In contrast, continued Thor, tracking the distinction between package types in the variables can be done by creating two quantity sold variables in place of the one they currently have. However, if FCI sold different kinds of packages for different kinds of foodcakes, this variable-based strategy wouldn't work. They would either have to define additional variables with foodcake-based constrained application ranges or they would have to represent package distinctions on the leaves of the foodcake dimension. Do you agree with Thor?

The Package Dimension

Package.all
    Package.4pack
    Package.8pack

Figure 15.3 Pack dimension.

Lulu responded, I like your reasoning, Thor, and I agree that adding two new vari- ables is better than adding 60 new leaves, but you left out an important and ultimately, I think, the best option: adding a new package-type dimension to the cube. That’s really the most efficient way to go. The problem with adding the two new variables is that over time the cube is likely to expand in the number of base variables and thus the two variables will eventually become 20 or 40 variables. In contrast, if we add a new, package-type dimension, it will be unaffected by increases in the number of things we track. Rather, it will only grow if FCI adds new package types, which is a less likely sce- nario. I guess the most important decision factor is knowing the likely sources of future variance, said Thor. The less that is known about future variances, the less can be said about what constitutes an optimal method of modeling the present. At this point, we’re only arguing about different physical ways of representing the same logical schema. I’m okay with treating the distinction between package types as a dimension distinc- tion defined in terms of two leaf-level instances: four-pack and eight-pack that connect in a Package.all root instance as illustrated in Figure 15.3. After rebuilding their sales and marketing cube, Lulu and Thor continued with the process of defining variables.

Return to Qty Sold {pkg}

Qty_sold is input at the leaf level of time, foodcake, package type, and geography. Its metric is defined as Integer number of packages {pkg}. Actual data are input for the actual member of the scenario dimension; plan data are input at the plan element.

"Qty_Sold" {pkg}, Geog.leaf., Time.leaf., Foodcake.leaf., Package.leaf., Scenario.actual << [input from store tables summed to the day level]

"Qty_Sold" {pkg}, Geog.leaf. , Time.leaf. , Foodcake.leaf. ,

Package.leaf. , Scenario.plan << [input from FCI planning data summed to the day level]

Both actual and planned Qty_Sold {pkg} sum up all but the scenario dimension. The exclusion of the scenario dimension from aggregation is handled differently as a func- tion of how the OLAP tool used lets the user define dimensions. If the tool forces the user to define an all member for every dimension, then the user needs to find a way to exclude the scenario dimension from any aggregations. This could be done with a for- mula as simple as the following:

"Qty_Sold" {pkg}, Geog.leaf.above. , Time.leaf.above. , Foodcake.leaf.above. , Package.leaf.above. , Scenario.actual =

Sum( Qty_Sold {pkg}, Geog.leaf., Time.leaf., Foodcake.leaf. , Package.leaf. )

In other words, the application range of the variable is restricted to the actual member of the scenario dimension. Alternatively, if the OLAP environment allows for dimensions to be defined without an all member, then the scenario dimension can be ignored for the purposes of aggregation as with the following formula:

"Qty_Sold" {pkg}, Geog.leaf.above. , Time.leaf.above. , Foodcake.leaf.above. , Package.leaf.above. =

Sum( Qty_Sold {pkg}, Geog.leaf, Time.leaf, Foodcake.leaf, Package.leaf )
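The same idea in a minimal Python sketch: leaf cells are summed along every dimension except scenario, so actual and plan roll up separately. The cell coordinates and quantities are invented.

from collections import defaultdict

# (geog, day, foodcake, package, scenario) -> Qty_Sold {pkg}
leaf_cells = {
    ("Cambridge", "2001-01-08", "Neptune", "4pack", "actual"): 120,
    ("Cambridge", "2001-01-08", "Neptune", "8pack", "actual"): 80,
    ("Boston",    "2001-01-08", "Neptune", "4pack", "actual"): 95,
    ("Cambridge", "2001-01-08", "Neptune", "4pack", "plan"):   110,
}

rollup = defaultdict(int)
for (geog, day, cake, pack, scenario), qty in leaf_cells.items():
    # collapse every dimension except scenario
    rollup[("Geog.all", "Time.all", "Foodcake.all", "Package.all", scenario)] += qty

print(dict(rollup))   # actual and plan totals are never added together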

List Price per Package {$/pkg}

The list price per package is the suggested retail price for a package of FCI's foodcakes. FCI is very conscious of its brand image and although it regularly runs promotions on its products in conjunction with its client-stores, the company does not typically allow retailers to initiate discounts on its foodcakes.

Managing Price Changes

FCI maintains price change information in a Type 2 dimension as illustrated in Figure 15.4. Thor noticed right away that it was going to take a bit of transformation to bring the data into the cube. Can you see what the problem is? Thor explained that the problem

Foodcake        Date     Wholesale_Price {$}   List_Price {$}
Neptune.4pack   Jan 02   5.75                  6.90
Neptune.4pack   Jan 31   5.60                  6.55
Neptune.8pack   Jan 02   11.35                 13.10

Figure 15.4 Price change table.

arises because in the pricing table, package type information was a part of the item description. This is equivalent to defining package type as a leaf-level element of the foodcake dimension. Unfortunately, neither the relational nor the OLAP tool was well equipped for this kind of transformation. Thor figured he needed to transpose the pack suffixes currently part of the foodcake name with the Wholesale (W) and List (L) price columns, effectively creating two columns for every one that existed before. But was it that simple? Lulu didn’t think so. She told Thor that his ability to perform the transposition was a function of whether FCI changed its prices for both package types at the same time. To illustrate her point, she showed what FCI’s price table would look like if transposed according to Thor’s design. Lulu’s illustration of Thor’s proposed design is shown in Figure 15.5. There’s bound to be a lot of sparsity in this type of table design, she said. A row is going to have valid values in either the 4pack columns or the 8pack columns but not both. Thor agreed, but with the qualification that if FCI changed its prices for both types of packages on the same dates, then the table would be dense and hence appro- priate. True enough, said Lulu, but currently FCI’s business process creates price changes for each package type independently, and FCI shouldn’t change its business process just to make their job easier. Thus, given the way the Foodcake dimension was defined in their sales model, the analysts were left with two basic choices. On the one hand, they could replicate pricing data and keep it all in a single table like the previous table. The advantage of this approach is that there is just one table to maintain and link with. The disadvantage is that it is harder to tell when prices have changed since there is a row for all package types whenever any one of them has changed. On the other hand, they could create a

Foodcake   Date     4pack W price   4pack L price   8pack W price   8pack L price
Neptune    Jan 02   5.75            6.90
Neptune    Jan 31   5.60            6.55
Neptune    Jan 02                                   11.35           13.10

Figure 15.5 Thor's proposed design.

separate table for each package type. The advantage with this approach is that every row in both tables represents a genuine price change and the tables are in fourth normal form, thus less susceptible to update anomalies. The disadvantage is that there are more tables to link to. Since for FCI, now, there are only two different package types, no one felt that having two pricing tables was a hardship and so they chose to use a different pricing table for each package type, as shown in Figures 15.6 and 15.7.

After they'd created the two tables and looked at them, Lulu wondered aloud, you know, Thor, I'm not sure that breaking up the package changes is the right way to go either. What happens if FCI creates more package types? It could be hard to maintain. Isn't there a way we can have the best of both worlds? Can't we leverage the package type dimension we created? What do you think?

I got it, said Thor! Why not make package type a column header, thus implicitly using it as a dimension? That's it, said Lulu. I don't know why we didn't think of that before. This is illustrated in Figure 15.8. To bring the pricing information into the sales cube, Thor designed the following input formula:

"List_price"{$/pkg}, Foodcake.leaf. , Time.day. , Geog.all , Package. , Scenario.actual << [Price change table]: L price

Foodcake   Date     4pack W price {$}   4pack L price {$}
Neptune    Jan 02   5.75                6.90
Neptune    Jan 31   5.60                6.55
Poseidon   Jan 31   5.25                6.15

Figure 15.6 Four-pack price change table.

Foodcake   Date     8pack W price {$}   8pack L price {$}
Neptune    Jan 15   6.25                8.75
Poseidon   Jan 15   7.25                9.50
Neptune    Feb 05   5.75                8.15

Figure 15.7 Eight-pack price change table.

Foodcake   Package type   Date     W price {$}   L price {$}
Neptune    4 pack         Jan 02   5.75          6.90
Neptune    8 pack         Jan 15   6.25          8.75
Poseidon   8 pack         Jan 15   7.25          9.50
Neptune    4 pack         Jan 31   5.60          6.55
Poseidon   4 pack         Jan 31   5.25          6.15
Neptune    8 pack         Feb 05   5.75          8.15

Figure 15.8 The Foodcake price change table.

This should work, he said. All the formula needs to do is pull the right values out of the price change table. Do you agree? Lulu didn't. Wait a minute, she said. How is this going to work? It seems like there are two distinct cases:

■■ When leaf members of the time dimension match a date value in the price change table
■■ When leaf members of the time dimension do not match any date value in the price change table

The first case is easy. That's when you want to input the appropriate row. But what about the second case? As stated, the cube would register a missing value for all those dates where there was no new pricing information in the table. But what you want is to keep the most recent value until such time as it changes. Lulu suggested the following approach, which works for both types of packages:

"List_Price" {$/pkg}, Foodcake.leaf. , Time.day. , Package. , Geog.all , Scenario.actual << [Price change table]: L price

Else If missing List_Price {$/pkg}, Time.day.this = List_Price {$/pkg}, Time.day.-1

In other words, for any particular day, if there is a value in the price change table, it gets inserted into the cube. But if there is no such value (that is, there was no price change that day), then the price value in the cube is set equal to what it was the day before. FCI’s Sales and Marketing 383
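Lulu's rule is a carry-forward (fill-down) of the last known price. A small Python sketch, with invented dates and prices, shows the mechanics:

# Price-change rows exist only on days when the price actually changed.
price_changes = {"Jan 02": 6.90, "Jan 31": 6.55}     # Neptune.4pack list price {$/pkg}
calendar = ["Jan 01", "Jan 02", "Jan 03", "Jan 30", "Jan 31", "Feb 01"]

list_price = {}
last_seen = None                       # no price known before the first change
for day in calendar:
    if day in price_changes:           # case 1: a matching date in the table
        last_seen = price_changes[day]
    list_price[day] = last_seen        # case 2: keep the prior day's value

print(list_price)
# {'Jan 01': None, 'Jan 02': 6.9, 'Jan 03': 6.9, 'Jan 30': 6.9, 'Jan 31': 6.55, 'Feb 01': 6.55}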

Retail Dollar Sales {$}

Retail dollar sales is also an input variable defined as follows:

"Retail dollar sales" {$} , Time.leaf. , Foodcake.leaf. , Geog.leaf., Scenario.actual << input [Sales Fact table]: Retail dollar sales

The variable sums in all dimensions except scenario as defined here:

"Retail dollar sales" {$} , Time.leaf.above. , Foodcake.leaf.above. , Geog.leaf.above. = Sum( "Retail dollar sales", Time.leaf., Foodcake.leaf., Geog.leaf., Scenario.actual)

Discount {$}

Promotions of any kind are registered as discounts in the stores’ detailed transaction records as shown in Figure 15.1.

"Discount"{$} , Time.leaf., Foodcake.leaf., Geog.leaf., Scenario.actual << [input transaction detail summed to the day level]

The variable sums in all dimensions except scenario as defined here:

"Discount"{$}, Time.leaf.above., Foodcake.leaf.above. , Geog.leaf.above. = Sum(Discount{$}, Time.leaf., Foodcake.leaf., Geog.leaf., Scenario.actual)

Discount Share {R}

In order to keep track of the reason why a certain discount was in effect and how to apportion the discounted dollars between client-store and FCI, store-clients also kept track of the discount share. The value of the discount share is a fraction denoting the discounted revenue accruing to the store. A value of one means that the store assumes all of the lost revenue. A value of zero means that FCI assumes all of the lost revenue.


Everything in between identifies each party's relative share. Since FCI does not allow its client-stores to independently run promotions, there aren't any discount shares with a value of one. But there are several in between. Lulu said it's easier to see algebraically and so wrote out the following equation:

(Wholesale price - (FCI's Discount Share × Discount)) + (Markup - (store's Discount Share × Discount)) = List price - discount

In other words, using the fact that wholesale price plus markup equals list price, we can subtract the discount from both sides wherein on the left-hand side, the discount is broken down into FCI’s share of the discount and the client store’s (inverse) share of the discount.

Aggregating Moving Percentages

So how would you aggregate the Discount_Share variable, asked Thor? Certainly you wouldn't just sum it. We all know you don't sum percentages. I guess it depends on what kind of aggregate information you're looking for, said Lulu. It would seem that FCI management needs to know aggregate amounts of discounted dollars, which is to say for any given amount of store sales, how many more dollars would have been taken in if there had been no discount? Also, FCI needs to know on average how much discounts represented as a percentage of sales. FCI also needs to know of the discounts given, on average, what percent of the discount was assumed by FCI? This last figure, Lulu proposed, needed to be weighted by total sales. Do you agree?

Thor didn't. Why total sales? Why not total discount, asked Thor? Since either calculation can be made, you really need to think for a minute about the decisions that need to be made with the information. If you weight aggregate promotion type by total sales, you will skew the aggregate towards stores that have large amounts of sales. However, you will be missing the fact that different stores offered different amounts of discounts. The dollars lost from list price sales are not reflected in total sales. Since Discount_Share, as a variable, is intended to capture the percent of the discounted revenue assumed by FCI versus the store, Discount_Share should be weighted by total discount and not total sales. This is illustrated in Figure 15.9. Thor was convinced that Discount_Share needed to be weighted by total discount,

but now saw that this weighted average needed to be calculated from the transaction level on out. In other words, the same weighted average calculated by FCI in the OLAP environment had to begin with each store's transaction data.

Store           Total_Revenue {$}   Total Discount {$}   Discount_Share
Cambridge       $1,000              $100                 20%
Boston          $1,000              $500                 40%
Massachusetts   $2,000              $600                 30% or 36.7%

Figure 15.9 Two different weighting schemes.
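The two weighting schemes in Figure 15.9 can be reproduced with a few lines of Python; the store rows come straight from the figure, and everything else is arithmetic.

stores = [
    {"name": "Cambridge", "revenue": 1_000, "discount": 100, "share": 0.20},
    {"name": "Boston",    "revenue": 1_000, "discount": 500, "share": 0.40},
]

by_sales    = sum(s["share"] * s["revenue"]  for s in stores) / sum(s["revenue"]  for s in stores)
by_discount = sum(s["share"] * s["discount"] for s in stores) / sum(s["discount"] for s in stores)

print(f"Discount_Share weighted by total sales:    {by_sales:.1%}")      # 30.0%
print(f"Discount_Share weighted by total discount: {by_discount:.1%}")   # 36.7%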


Dollars Owed to FCI

The Dollars_Owed_FCI are a function of the wholesale price, the number of packages sold, the total amount of discount, and the percentage of discount assumed by FCI. Lulu suggested the following formula that she generalized to accommodate all package types:

"Dollar_Owed_FCI" {$} , Geog.store. , Time.week. , Foodcake.all , Package.all , Scenario.actual = Wholesale_Price {$/pkg} × Qty_Sold {pkg} - ( Discount {$} × ( 1 - Discount_Share {R} ) )

Do you think Lulu got it right? Thor didn't. He said that she did a good job representing the deductions from the simple product of wholesale price and quantity of packages sold that FCI, in essence, needed to reimburse to each store. With a dense array of data, such a formula would come up with the right answer as shown in Figure 15.10.

Combining Time-Lagged Flows

In practice, there was always going to be a time lag between the sales of foodcakes from FCI's distribution center to the stores and the sales of foodcakes from the stores to final customers. This is because every client store maintains an inventory and the foodcakes they sell during any particular week reflect purchases they made from FCI some time before. Thus, to calculate dollars owed FCI for the purposes of invoicing client stores, what was agreed on between FCI and the stores who resell its foodcakes is for FCI to combine the current week's sales of foodcakes from its distribution centers to the stores (Wholesale price {$/pkg} × Qty_Sold {pkg}) with FCI's current week's contribution to retail discounts (accruing from sales to the store from some time in the past). In addition, Lulu knew she had to convert the package-based quantities into kilograms, as the rest of FCI was based on quantities in kilograms of raw materials and

Store       Wholesale price {$/pkg}   List price   Qty_Sold {pkg}   Total revenues   Total discount   Discount share   Discount dollars owed store   Dollars owed FCI
Cambridge   5.50                      6.95         100              595              100              20               20                            530
Allston     5.50                      6.95         50               250              50               50               25                            250
Brighton    5.50                      6.95         200              1100             200              0                0                             1100

Figure 15.10 Foodcake.Neptune, Package Type.4pack, Scenario.actual.

finished foodcakes. For the present analyses, Lulu was happy to populate the OLAP model with daily aggregates from relational transaction tables as shown here:

"Qty sold" {pkg}, L.leaf. << Select Qty sold From join Summary, Detail transaction tables Group by Day , Foodcake

Analyzing Sales

Now that Lulu and Thor have set up the basic framework, they want to generate some of the typical aggregate reports that FCI senior management expects from them. Given the basic framework, Lulu feels that these aggregate reports are straightforward to define as illustrated in the next several panels, Figures 15.11 through 15.17, comprised of a word description of a query, the associated Located Contents (LC) expression, and a sample view of the result.

1. Total revenue of foodcakes by store, month, and package type:

"Total_Revenue" {$} , Store. , Time.month. , Package. , Foodcake.all

Page: Foodcake.all
Column: Time.month
Row: Geog.Store., Foodcake.Package.

Total_Revenue {$}              April     May       June
Cambridge     4pack            25,700    27,400    26,100
              8pack            34,400    35,300    35,200
Boston        4pack            33,300    32,400    32,800
              8pack            31,800    30,600    31,000
Allston       4pack            19,100    20,700    20,800
              8pack            22,700    21,300    20,200

Figure 15.11 Total revenues 1.

2. Total revenue of foodcakes by foodcake and by month:

"Total_Revenue" {$} , Foodcake., Time.month. , Package.all , Store.all

Page: Store.all, Package.all
Column: Time.month
Row: Foodcake.

Total_Revenue {$}    April     May       June
Neptune              18,000    20,100    19,900
Poseidon             17,500    22,000    18,900
Seabreeze            19,000    21,000    20,200
SailorsdeLite        13,500    15,400    16,000

Figure 15.12 Total revenues 2.

3. The top five stores in terms of monthly revenue, and their monthly quantity sold:

"Store", "Qty_Sold" {kg}, Foodcake.all, Time.month., Total_Revenue.Drank.(1 to 5)

Page: Foodcake.all
Column: Time.month.
Row: Total_Revenue.Drank.

          April                         May
Drank.    Store         Qty_Sold {kg}   Store         Qty_Sold {kg}
1         New York      28,000          Hong Kong     28,500
2         Hong Kong     27,550          New York      27,250
3         London        25,300          London        24,900
4         Sydney        24,900          Mexico City   24,800
5         Mexico City   24,000          Sydney        24,000

Figure 15.13 Total revenues 3.

4. The top five selling foodcakes for each of the top three stores in April:

"Store.(Total_Revenue, Foodcake.all).Drank.(1 to 3)", "Foodcake.(Qty_Sold, Store.this).Drank.(1 to 5)", Time.month.April

Page: Time.month.April
Row: Drank. Store., Drank. Foodcake. {N}

1   New York     1   Neptune
                 2   LibertyWave
                 3   Seabreeze
                 4   MarinerDiet
                 5   GourmetLite
2   Hong Kong    1   SailorsdeLite
                 2   Poseidon
                 3   YoshisDream
                 4   SushiLite
                 5   NeptuneGourmet
3   London       1   Seabreeze
                 2   Poseidon
                 3   DrakesDiet
                 4   GourmetLite
                 5   HenrysFish

Figure 15.14 Top three stores.

Could you parse that last one for me, Lulu? asked Thor. I liked your use of nested expressions. Sure, replied Lulu. The query starts by asking for the top three stores in terms of total_revenue and for each of those stores the top five selling foodcakes. Lulu continued with her analytical formulas.

The stores with the lowest variance in monthly sales are defined as follows:

"Store" , "Total_Revenue" {$}, "Variance",2 Time.month., Variance(Total_Revenue, Foodcake.all, Time.month.this).min

Page: Variance(Total_Revenue, Foodcake.all, Time.month.this).min
Column: Time.month
Row: Geog.Store, Variables { }

                      April       May       June
Geog.Store.           Plymouth    Cancun    Kinshasa
Total_Revenue {$}     4,000       3,000     3,750
Variance              3,000       1,000     1,000

Figure 15.15 Min variance.

5. The stores with the highest variance:

"Store", "Total_Revenue" {$}, "Variance", Time.month., Variance(Total_Revenue, Foodcake.all, Time.month.this).max

Page: Variance(Total_Revenue, Foodcake.all, Time.month.this).max
Column: Time.month
Row: Geog.Store, Variables { }

                      April        May          June
Geog.Store.           New York     Hong Kong    London
Total_Revenue {$}     240,000      245,000      23,500
Variance              1,200,000    1,495,000    1,330,000

Figure 15.16 Max variance in revenues.

6. The stores with the greatest variance in dollars owed to FCI:

"Store", "Dollars_Owed_FCI","Variance", Time.month., Variance(Dollars_Owed_FCI {$}, Foodcake.all, Time.month.this).max

Page: Variance(Dollars_Owed_FCI {$}, Foodcake.all, Time.month.this).max
Column: Time.month
Row: Geog.Store, Variables { }

                         April        May          June
Geog.Store.              New York     Hong Kong    London
Dollars_Owed_FCI {$}     200,000      199,000      198,000
Variance                 1,160,000    1,395,000    1,220,000

Figure 15.17 Max variance in dollars owed to FCI.

Marketing

Comfortable with their analysis of FCI's sales, Lulu and Thor turned their attention to FCI's marketing. The first thing they wanted to know was FCI's percentage of market share. How do you think we should go about it? asked Thor, who had only limited experience in market analysis. What would you do?

Market Share

The first thing we need to figure out is the granularity, based on available data sources, at which we are defining the markets, Lulu responded. This applies to geography, products, and time. Are we talking about national markets, regional markets, or even local markets? Is there data for fishcakes broken out from vegetarian cakes? Is there maybe data for diet cakes? Or are we going to have to make inferences about the market size for each of our product lines of foodcakes based on available data for generic food patties? For the moment, I'm assuming that we can leverage existing market data. If I'm wrong, we may have to commission a market study.

Luckily, Lulu's assumption was correct. Market data existed for fishcakes and for vegetarian cakes by region. The following formulas are used to calculate FCI's share of the vegetarian foodcakes market on a regional and monthly basis.

"Market size"{$} , Foodcake.vegetarian , Time.month. , Geography.region., Scenario.actual << [Source marketing data]

"FCI market share"{R} , Foodcake.vegetarian , Time.month. , Geography.region., Scenario.actual = Sales {$} / Market size {$}

The formulas look simple enough, said Thor. But why are you restricting the application range of the FCI market share variable to vegetarian foodcakes? Shouldn't the formula apply to all foodcakes? You're right, replied Lulu, in that the formula for market share applies to all foodcakes for which there is market data. However, for the moment we only have data for vegetarian cakes. If we omitted the Foodcake.vegetarian application range restriction, we might create a lot of incorrect zeros for market shares in those foodcake products for which we have sales data but for which no market sizing data is available.

Do you agree with Lulu? Thor wasn't convinced. But, Lulu, I thought that by restricting the application range of market size in its definition to vegetarian foodcakes the ratio of sales to market size in the market share definition would automatically be restricted to the application range of the denominator, in this case, vegetarian foodcakes.

Good point, Thor, replied Lulu. Different tools will handle this differently. There are three types of relevant sparsity complications here: market size is blank, sales is zero, or sales is missing. In the first case, where market size is blank as with the market size for fishcakes, we want the formula for market share to return a not applicable if queried for any fishcake locations. Most tools will insert a zero or a blank that becomes a zero in calculation for the market size variable for those cells for which no data was loaded (the fishcake cells). In this case the calculation of market share becomes a ratio between some nonzero number and a zero. You might get a divide-by-zero error or simply NA if querying for market share for nonvegetarian foodcakes. This result is acceptable.

In the second case, where actual sales is zero (as there may be pockets of regions in the world where people do not consume either fish or vegetarian foodcakes), the calculation of market share should correctly return a zero. However, in the third case, where, for some reason, the stores in a region or the region itself has not reported its sales, so that sales is missing but forthcoming, most tools without further logic would erroneously coerce the blank sales cell into a zero and thus incorrectly return a zero market share value where the correct answer would have been missing. Again (recalling the discussion on sparsity from Chapter 6), only by defining an application range can a tool correctly interpret when a blank cell refers to not applicable and when a blank cell refers to missing. Furthermore, the distinction between empty or nonexistent cells that denote missing and empty or nonexistent cells that denote zero can only be made on a schema-by-schema basis.
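A small Python sketch of the three sparsity cases, with invented numbers; a real OLAP tool would rely on the application range rather than an explicit flag, so the in_range argument below is purely illustrative.

def market_share(sales, market_size, in_range):
    """Distinguish not-applicable, genuinely zero, and missing market share."""
    if not in_range or market_size is None:
        return "N/A"        # no market-sizing data exists for this product
    if sales is None:
        return "missing"    # sales not yet reported; do not coerce to zero
    return sales / market_size

print(market_share(12_000, 150_000, True))    # normal case: 0.08
print(market_share(0, 150_000, True))         # genuine zero sales: 0.0
print(market_share(None, 150_000, True))      # unreported sales: 'missing'
print(market_share(9_000, None, False))       # fishcakes, no market data: 'N/A'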

Marketing Expenditures

FCI has traditionally engaged in marketing at a brand, product line, and store level. For this prototype effort, Jane had asked Lulu and Thor to come up with a rough estimate of the amount of dollars spent in active marketing on a per-product basis. Jane had made available FCI's marketing expense data. Since the marketing expense data was dimensioned by the same dimensions (though not all the same members or levels) as the sales data, it was loaded into the same cube.

How do you want to load in the data? asked Lulu. Should we create a distinct variable for each kind of marketing expense or should we load all the different marketing expenses into a single marketing expense variable? What do you think? How would you decide?

Thor responded that his decision was based on whether there would be any occasions when the different marketing expenses would not aggregate together. If the brand, product line, and store marketing expenses always sum together whether for purposes of aggregation or for allocation to the product level, then I would prefer to load all the marketing expenses into a single variable. But what about differences in measurement processes? said Lulu. What if the way that FCI calculates brand marketing expenses makes certain assumptions that aren't made for product-line marketing expenses? Thor replied that any annotations that needed to be made could be made within a single variable on a location-range-by-location-range basis. The central issue, continued Thor, is whether the values for each of the marketing expenses can be freely combined. Are total marketing expenses equal to the sum of brand, product line, and store marketing, yes or no? If the answer is no, then whatever transformations need to occur to enable the aggregation of expense data need to occur regardless of whether the expense data resides in a single variable or in three variables. Do you agree with Thor?

Lulu didn't. You're missing something, Thor. You're going to have problems aggregating if you map all the input data to a single variable. Let's assume that you did load all the inputs into different location ranges of a single variable as shown here:

"Marketing expenses"{$} , Geog.store. , Foodcake.all , Time.month. << [input from store by month store marketing data]

"Marketing expenses" {$}, Geog.region. , Time.month. , Foodcake.all << [input from region by month brand marketing data]

"Marketing expenses", Geog.region. , Time.month. , Foodcake.(vegetarian + fish). << [input from region by month product line marketing data] FCI’s Sales and Marketing 393

Now when it comes to aggregating, you might try writing a formula like this one:

"Marketing expenses" , Geog.store.above. , Time.month.above. , Foodcake.(vegetarian + fish).atabove. = Sum( Marketing expenses, Geog.store., Time.month., Foodcake.(vegetarian + fish))

The problem is that you can't really aggregate the variable marketing expenses because the higher-level inputs (in this case brand) interfere with the aggregation of lower-level data (in this case store and product line). (Mapping distinct inputs into differing location ranges of an OLAP variable works when the location ranges are not hierarchically related.) Any aggregation function would want to put the value of the sum of store and product line data into a region by all product cell except that it will find the input value for brand expenses in those cells. You could treat the brand-level data as manual overrides of the other expenses or you could overwrite the brand-level data, but there's no way to combine the store and product-line data at the appropriate region by month by allproduct level with the brand data entered there within a single variable.

You're right Lulu, Thor conceded; I guess I really don't have that much experience working with marketing data. So let's do it right this time, said Lulu. We need to bring in each kind of input data into a separate variable as shown here:

"Store_Marketing expenses" , Geog.store. , Foodcake.all , Time.month. << [input from store by month store marketing data]

"Brand_Marketing expenses" , Geog.region. , Time.month. , Foodcake.all << [input from region by month brand marketing data]

"Product line_Marketing expenses" , Geog.region. , Time.month. , Foodcake.(vegetarian + fish) << [input from region by month product line marketing data]


Then we need to combine the variables. The simplest way is to define the aggregations for each variable separately and then combine the variables at any aggregation level through summation as shown next:

"Store_Marketing expenses" , Geog.store.above. , Time.month.above. , Foodcake.all = Sum( Store_Marketing expenses, Geog.store., Foodcake.all , Time.month.)

"Brand_Marketing expenses" , Geog.region.above. , Time.month.above. , Foodcake.all = Sum( Brand_Marketing expenses , Geog.region. , Time.month., Foodcake.all ) "Product line_Marketing expenses" , Geog.region.above. , Time.month.above. , Foodcake.all = Sum( Product line_Marketing expenses , Geog.region. , Time.month. , Foodcake.(vegetarian + fish).)

Now we can define the combined marketing expenses as follows:

"Combined marketing expenses" , Geog.store. , Time.month.atabove. , Foodcake.(vegetarian + fish).atabove. = Sum ( Store_Marketing expenses + Brand_Marketing expenses + Product line_Marketing expenses )

Allocating Marketing Expenses to Product Sales

We need to allocate all the incurred marketing expenses to the sale of foodcake prod- ucts, said Lulu. How would you handle it, Thor? I would like to allocate the combined marketing expenses to the store by product by month level, but I suspect I’m leaving out a step. Do you agree with Thor? What do you think? Lulu replied, you’re right, Thor. You are leaving something out. Unfortunately, we can’t reuse our combined marketing variable. The problem with the combined mar- keting variable is that it, properly, combines product-type-specific (product line) and product invariant (brand and store) marketing expenses, whereas we need to allocate each of these expenses separately to their respective product categories. We don’t want to count fishcake marketing expenses in our calculation of marketing expenses for veg- etarian cakes!


Also, to be clear, given the granularity level of our marketing expense data, I would suggest that we calculate marketing expenses on a per-product basis commensurate with the product line distinctions. Thus, our results will show marketing costs as a percentage of product sales per store averaged by month and by product type (fishcake or vegetarian cake). Lulu said, I would suggest the following formula with its result set shown in Figure 15.18.

"Marketing costs as a percentage of product sales" , Geog.store. , Foodcake.(Fish + Vegetarian). , Time.month =

( Store_Marketing expenses / Total_Revenue {$}, Foodcake.(Fish + Vegetarian) ) + ( Brand_Marketing expenses {$}, Geog.store.region. / ( Count(Region.down1.) X Total_Revenue {$}, Foodcake.(Fish + Vegetarian). ) ) + ( Product line_Marketing expenses {$} , Geog.store.region. / ( Count(Region.down1.) X Total_Revenue {$}, Foodcake.(Fish + Vegetarian) ) )

Figure 15.18 Marketing expenses. (Result set for the preceding formula: rows are Geog.Store. (Cambridge, Allston) by Foodcake. (Fishcakes, Veggie); columns are Time.month (January, February) showing Total_Revenue {$}, Mkt_Expenses {$}, and Mkt_Expenses/Total_Revenue; page: Scenario.actual.)

Value States

Thor asked Lulu if she thought that Jane would be satisfied with what they've done. Why wouldn't she? replied Lulu. We did everything she asked us to do for sales and marketing. But I have the feeling that there's more that we could be doing. It must be possible to leverage the sales data we collect and the costs data we're generating (as described in the product costing section of Chapter 16) to come up with a measure of earnings per transaction and then to use earnings per transaction to look at customers in terms of their earnings profile.

I think you're talking about value state models, said Lulu. They're a useful way of looking at sales and customers over time; I just attended a class about them. What exactly are value state models? Thor asked. Lulu proceeded to explain.

Quality, Thor, unlike death and taxes, is not a constant. This applies to everything and everybody whether customers, products, suppliers, workers, or managers. One year's profitable customer or product is another year's money loser. Suppliers who are reliable one quarter may be unreliable another. Since quality varies, it is possible to track changes in quality over time and classify things or processes (whatever it is whose quality we are measuring) by the way their quality is changing or not changing. Classifying and tracking quality changes is a necessary first step to creating any kind of analysis of the causal factors that promote increases, decreases, or stability in quality over time. The most systematic way to look at quality changes is by creating what may be called a quality or value state model.

Thing or process being tracked   Time 1   Time 2
1                                High     High
2                                Medium   Medium
3                                Medium   Medium
4                                High     Medium
5                                Medium   Low
6                                Low      Medium

Figure 15.19 A traditional view.

From/To    Low    Medium    High
High              T4        T1
Medium     T5     T2, T3
Low               T6

Figure 15.20 A value state view.

To illustrate the difference between tracking quality per thing or process over time, and tracking the identities of things or processes by their quality changes, Thor, consider the following diagrams shown in Figures 15.19 and 15.20. In the first diagram, Figure 15.19, you can see what I call the traditional view where quality is treated as a measure that varies by thing or process and time. Notice how the measures of quality (in this case, High, Medium, and Low) are the contents of cells defined by the intersection of time and thing or process. The basic schema in the traditional view is as follows:

(Thing. × Time.) ~ Quality

So far, Lulu, you're not impressing me, said Thor. There's nothing new about tracking quality. But I realize you're itching to show me the next diagram. That's right, said Lulu. Hold on to your seat while you look at Figure 15.20. This is the value state model view. Notice how we're treating the identities of things (or processes) labeled as Tn as measures that vary in terms of the way their quality is changing. The basic schema in the value state view is as follows:

((Fromtime , value). × (Totime , value).) ~ Thing {Name}
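The inversion can be sketched in a few lines of Python using the six things from Figure 15.19; the grouping below reproduces the cells of Figure 15.20.

from collections import defaultdict

quality = {   # thing -> (quality at time 1, quality at time 2), from Figure 15.19
    "T1": ("High", "High"),     "T2": ("Medium", "Medium"),
    "T3": ("Medium", "Medium"), "T4": ("High", "Medium"),
    "T5": ("Medium", "Low"),    "T6": ("Low", "Medium"),
}

value_state = defaultdict(list)              # (from state, to state) -> things
for thing, (t1, t2) in quality.items():
    value_state[(t1, t2)].append(thing)

for (frm, to), things in sorted(value_state.items()):
    print(f"{frm:>6} -> {to:<6}: {', '.join(things)}")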

I think I'm getting it, said Thor. I understand, for example, that we would be treating customers as the so-called things that exist in value space. So instead of just looking at

specific customers and analyzing their quality using a value state model, we could also look at the pattern of movement of customers over time and see, for example, that in the last year the number of customers who remain high quality for more than 12 months has dropped or increased by 25 percent, or that there are twice as many high-quality customers who drop to average as there are average customers who rise to high quality. Still, could you get into a little more detail? We really should build one for FCI. Sure, replied Lulu. At its simplest, a customer value state model requires some collection of customer- and time-specific quality values. However, since the most com- mon and fungible quality indicator is earnings (notwithstanding the various account- ing tricks that can artificially manipulate earnings), I will talk specifically about earnings-based customer value state models. The information requirements for building an earnings-based customer value state model include access to sales transaction data for which fully discounted revenue per item per customer is recorded (that is to say, stated revenue net any discounts) and for which the total costs (including all attributable overhead) per item per customer are also recorded. These are shown in Figures 15.21 and 15.22. Otherwise, it is impossible

Transaction ID    T123
Date and time     10/19/2001 10:10 AM
Location          Cambridge
Total revenues    $150
Customer ID       C456
Total costs       $130
Total earnings    $20

Figure 15.21 Transaction summary table.

Transaction ID              T123
Item                        Smiling Buddha 6-pack Veggie cakes
Total cost per item         $4.50
List price                  $5.95
Discount                    $0.45
Qty of item sold            3
Item revenue                $16.50
Item transaction earnings   $3.00

Figure 15.22 Transaction detail table.

to derive earnings per customer, transaction, and products. These earnings derivations are necessary for the creation of any earnings-based customer value state model. Luckily we've been creating just this kind of information during our skunk works project for Jane. (The calculation of total costs is shown in Chapter 16 and is also the focus of the calculation exercise in Chapter 17.)

The major steps for creating an earnings-based customer value state model include:

1. Calculate and assign a total cost to every item sold.
2. Subtract any discounts, explicit or hidden, to arrive at a net value for revenues in the transaction detail table.
3. Sum the costs and revenues from the detail table into the summary table.
4. Calculate earnings per transaction per time per location.
5. Sum the costs and revenues along the time dimension of the summary table to that level of time required to see a single time-specific value state of the customer.

The fifth step requires knowledge of FCI. Basically, we need to figure out the time granularity over which a customer value state is applicable. So here's a question for you, Thor. Who are the customers? What are our choices? What do you think?

Thor replied, we have two basic choices: stores as customers or final foodcake purchasers as customers. Since we're already doing a comarketing study with these stores, I suggest we get their approval to jointly create a value state for the final foodcake purchasers. We can always create a store-based value state model afterward. With final purchasers as customers, I think that the appropriate level of granularity is monthly. Lulu agreed and continued.

At the end of step 5, you have an intermediate, still traditional-looking customer value model that uses customer ID and time as dimensions and that uses customer value as its measure. The steps continue as follows:

6. For any n time periods of interest at the granularity chosen in step 5, create at least three ordering bins that correspond to high, medium, and low earnings per customer per time period; assign each customer to an earnings bin. This is illustrated in Figure 15.23.

Cust ID   Time   Earnings bin
1         1      Medium
1         2      Medium
2         1      Medium
2         2      High
3         1      High
3         2      Medium

Figure 15.23 Intermediate traditional view.
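Step 6 can be sketched as follows in Python, using the fixed (absolute) earnings boundaries that Lulu argues for next; the customer IDs, periods, and cutoff values are all invented.

def earnings_bin(earnings, low_cutoff=50.0, high_cutoff=200.0):
    """Assign one customer-period earnings figure to a fixed-boundary bin."""
    if earnings >= high_cutoff:
        return "High"
    if earnings >= low_cutoff:
        return "Medium"
    return "Low"

customer_period_earnings = {("C456", "2001-10"): 240.0,
                            ("C456", "2001-11"): 130.0,
                            ("C789", "2001-10"): 35.0}

bins = {key: earnings_bin(e) for key, e in customer_period_earnings.items()}
print(bins)    # with absolute boundaries, a bin may legitimately be empty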

Keep in mind that there are several different flavors of ordering bins to choose from. If you pick a relative ordering such as ranking, you are sure to get some customers in every ordering group you create. Like a curved exam in school, a relative ordering will guarantee that some customers are always high valued, some medium valued, and some low valued. Furthermore, the numeric definition of the boundaries between the different bins may change over time. For me, said Lulu, this is less useful than using a fixed-boundary ordering. Why is that, asked Thor? Think about it.

Lulu responded, if we use a relative ordering, customers will be evenly distributed in value space. But we're trying to analyze the differences in distribution such as the lack of long-term, high-value customers or the fact that three-fourths of the customers who drop from high to medium quality eventually rise to high value again. Thus, we want an absolute value-based ordering wherein the number of customers for any particular value bin may vary and even be zero. The steps continue as follows:

7. Build a dimensional model out of any two-to-n time-specific earnings bins used as dimensions.

Here's where the inversion happens, Thor. Up until now, earnings have been treated as a derived variable, and earnings bins were groupings of a derived variable. To create our value state model, we need to treat two or more time-state-specific earnings bins as low-cardinality dimensions relative to which we will analyze a variety of customer behaviors. Value state models are a concrete class of application that rely on dimensional symmetry. Since our tool set doesn't provide native support for dimensional symmetry, we need to create a satellite analytical fact table that uses earnings bins as dimensions and counts of customers as measures. From this new fact table we can create our value state model for which a view is shown in Figure 15.24.

Looking at things in terms of a value state makes it easy to see that slightly more than a third of the customers have been and are continuing to provide only medium value. Furthermore, that is over three times the number of customers in any other value state. Over half the customers either were of medium value or have become medium in value. Naturally, knowing just the number of customers in each value state isn't enough to make key decisions. You also need to know things like the dollars sold to each category of customer, average transaction size, and so forth. A small number of customers can account for a large share of the revenue.

From/To   Low   Medium   High
High        2       10     10
Medium     10       35     10
Low        11       11      1

Figure 15.24 Earnings-based customer value state model with percentage counts of customers in each state.

From/To   Low   Medium   High
High        4       20     60
Medium     30       75     20
Low        15       15      0.01

Figure 15.25 Earnings-based value states with sales {$100s} in each state.

ables for which it is useful to view their values in terms of how they vary in value state space. Here you can see that although only 10 percent of FCI’s customers maintained a high value state over the last two time periods (as shown in Figure 15.25), they accounted for about a quarter of the company’s sales. I really like this way of looking at things in terms of value states, said Thor. But I think you’re missing something. How do you deal with the transition between being a customer and being a noncustomer? Good question, replied Lulu. I learned about that, too. By creating a no-value bin in the earnings dimension, it is possible to explicitly model these noncustomers. Of course, there remain the business decisions, especially on the ex-customer side of defining the difference between a highly inactive customer and a noncustomer. FCI management needs to make those decisions. Regardless of how those definitions are made, you will wind up with a value state model like the one illustrated in Figure 15.26. Once you’ve gotten your data into a value state model, there are myriad queries to pose of the newly structured information, such as the following: ■■ What percentage of FCI’s customers are stable high value? ■■ What percentage of FCI’s customers have changed from high to medium value? ■■ What percentage of FCI’s sales accrues to stable medium-value customers? ■■ What percentage of FCI’s total earnings is accounted for by stable high-value customers?

From/To        Noncustomer   Low   Medium   High
High                           2        8      8
Medium                         8       35      9
Low                  6         8       11      1
Noncustomer          4

Figure 15.26 Earnings-based customer value state model that includes noncustomers with percentage counts of customers in each state.

■■ What percentage of FCI's customers are improving in value?
■■ What are the attribute characteristics of these customers?
■■ Which customers are declining in value?
■■ What are the attribute characteristics of these customers?
Once FCI management has gained new insights into its customers and their changing value states, they are armed with the appropriate information to take decisive actions, such as:
■■ Sending different kinds of mailings to customers in different value states
■■ Offering special discounts to customers who recently transitioned from a high- to a medium-value state
■■ Offering long-term service relationships to customers who transitioned from low to medium value several periods ago and have stayed medium value since then
■■ Analyzing the attributes of customers in each value category, building a predictive model of customers likely to increase or decrease in value, and then using that model to identify customers for focused marketing activity
Thanks, Lulu, said Thor. You've really clarified in my mind what value state models are, why they are valuable, and how to build them. I can see that by using value state models we can help management to classify customers according to the dynamics of their relationship with FCI, make targeted decisions about how to treat different value-classes of customers, and better leverage CRM investments that they may have already made.

Summary

You've now seen how an OLAP solution can be brought to bear on sales and marketing. In this chapter you've seen Lulu and Thor wrestle with whether to model package type as a dimension or a variable, you learned different ways of aggregating variables that don't simply sum in all dimensions, you've seen different ways of inputting data that has gaps in it, you were shown how to perform sales mix variance analysis in multiple dimensions, you watched Lulu and Thor argue about and resolve the best way to model a marketing expenditure variable that captures source expenses at different levels of granularity and then aggregate it, you've seen how to allocate marketing costs to the product level and compare allocated costs with sales, and you've learned how to create a customer value state model.

CHAPTER 16

FCI’s Activity-Based Management

In this chapter you will learn how to model an entire value chain by creating and then combining a business asset dimension with a business process dimension. You will see that management can be represented both within each business process and at the all-process level. You will learn how to combine batch-level production tags with the leaf level of a product dimension to create a cross-value-chain product-batch-tracking capability. You will learn how to calculate the incremental activity-based costs on a per business process and per kilo basis for product distribution, product inventory, product shipping, batch level production, materials inventory, materials shipping, and purchasing. Along the way you will learn about the need to create hybrid batch-containing join cubes for calculating materials inventory and shipping costs on a per batch basis. You will see many different kinds of multicube formulas including formulas whose time values depend on variable-constrained events from other cubes, and formulas that create sets of moving cumulative ranges. You will see how to model different kinds of input-output situations depending on the combinatorial relationships between the inputs and the outputs. And you will learn about modeling M/N relationships. Most of the FCI managers had been with the company since its early days. Their familiarity with the company's inner workings was based on hands-on experience. Most of the managers had an intuitive feel for their customers and suppliers. They all knew fish; not all were as familiar with the vegetarian side of the business. If everyone does his or her part, the founder said, the company as a whole will thrive. Their stove-piped reporting was consistent with this management philosophy. Each business function manager acted as a semi-autonomous agent. HQ only received high-level activity summaries. So long as FCI managers had intimate knowledge of each business function and so long as business was booming, this stove-piped approach worked fine, but



as FCI grew, and as FCI was exposed to more dynamic market forces, it was harder and harder for the managers to make the right decisions, or sometimes even see the problems, given only high-level business function reporting. On the other hand, it didn't make any sense for management to simply immerse themselves in the details of each business function. That, after all, was the responsibility of the business function managers, and there was so much detailed activity data that decisions based on anything short of full-time devotion to the details were bound to miss the mark. What's a corporate manager to do? One day at a regular corporate management meeting, Jane, one of the new technical managers, said she had a confession to make. She said that for the past five months she had been secretly working with many of the business function managers to implement an enterprisewide OLAP layer on top of their data warehousing system. The prototype was now sufficiently complete to show to FCI's senior executives. The purpose of the system, said Jane, was to solve FCI's top management information problems. Those problems were
■■ The inability to compare and analyze cost, throughput, and quality information at differing levels of granularity across time, space, employees, capital, business functions, and other appropriate business dimensions
■■ The inability to equate activity-based costs and quality with revenues

Who, asked Jane rhetorically, can state with precision the total activity-based cost of manufacturing a particular kilo of foodcake sold on a particular day at a particular store? Which business function is responsible for the greatest amount of cost volatility? What is the principal source of product quality problems? What impact would employee training have on product quality or costs? Who can quantify the volatility in FCI's customer base? How can we claim with a straight face to be doing customer analytics without being able to quantify the marginal contribution to earnings made by a customer on a transaction-by-transaction basis? If we can't quantify the customer's marginal contribution, how can we pretend to be able to quantify the customer's total contribution? Everyone nodded when Jane said that these and other similar questions are often raised by FCI's managers, but due to inappropriate information systems, their answers have remained beyond the purview of management—until now. Jane went on to say that she may have been the instigator and funder of this project, but she didn't do the work. All the credit for this effort should go to Lulu and Thor (see Figure 9.1), two of FCI's rising stars who designed and implemented the prototype enterprise OLAP layer. Since they were the brains behind the project, Jane said she had invited them to come at this time so that if any managers had questions beyond her grasp of the application, Lulu and Thor would be there to provide an appropriate response.

Jane’s Presentation

As much as we love to see people eating our foodcakes, said Jane, at the end of the day if we couldn’t make money at it, we would have to be doing something else. Making

money means that on average we receive more money from the customer—which for us is a store that buys our foodcake—than it costs us to produce that same foodcake. So, isn't it only natural that we should want to know how much it costs to produce a particular foodcake and how much we make or lose on the sale of that foodcake to a store/customer? That's not how we do things today. Even though we all see monthly reports detailing how much money we earned, which reports are broken down by parts of the world and by foodcake, and even though we see reports detailing how we did compared to the plan, the calculations in those reports are based on standard values for foodcake production costs and standard values for operating expenses, which include everything from shipping to inventory costs. Can anyone tell me the formula for calculating standard earnings that we use in our mix variance analysis for sales? It's based on the average earnings per foodcake for the previous year. So how does FCI track the earnings differences between foodcakes, asked one of the managers. We do it by taking the average foodsource cost over the year, which is our material cost of goods sold, and subtracting it from the average product sales price for the year to come up with the average gross margin. Then we subtract total overhead, which is a fancy word for all other expenses including production, inventory, and distribution, apportioned over the total volume of production. The result is our standard earnings per product per year. How are we supposed to make informed decisions to reduce our earnings volatility if we assume away the volatility in our cost structure by using year-level aggregate costs for foodsources and assuming all other costs are the same for all foodcakes sold? Why not use standard values for revenues? The managers laughed at that one. That's right, Jane said. It's as ludicrous to use standard values for costs as it is for revenues. For the next 60 minutes that our CEO has been gracious enough to let me use, I will do my best to show you how our prototype implementation of an enterprise OLAP layer enables us to track corporate activity, analyze the sources of erratic cost and quality performance, figure out total product costs, equate activity-based costs with revenues, and finally apply that knowledge to the modeling of customer behavior. Towards that end, I will say a few words about the business approach we took and take you through a cost discovery journey that begins with product sales and crosses every business process in FCI, ending with the purchasing of raw materials.
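To make the standard-earnings arithmetic just described concrete, here is a small hedged sketch in Python. The numbers are invented for illustration and are not FCI data; the point is only that one average price, one average material cost, and one flat overhead apportionment produce a single standard earnings figure that hides all of the volatility Jane is complaining about.

    # Illustrative annual figures for one foodcake; all values are made up.
    avg_sales_price_per_kg = 11.00       # average wholesale revenue per kg for the year
    avg_foodsource_cost_per_kg = 4.50    # average material cost of goods sold per kg
    total_overhead = 3_000_000           # production, inventory, distribution, etc. ($)
    total_volume_kg = 1_000_000          # total kilograms of this product produced

    avg_gross_margin = avg_sales_price_per_kg - avg_foodsource_cost_per_kg   # 6.50 $/kg
    overhead_per_kg = total_overhead / total_volume_kg                       # 3.00 $/kg
    standard_earnings_per_kg = avg_gross_margin - overhead_per_kg            # 3.50 $/kg

    print(f"standard earnings: {standard_earnings_per_kg:.2f} $/kg")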

The Approach

As a first step toward calculating the earnings we make or lose on a kilo of foodcake, Jane said, I want you to think about the result set. Assuming we could calculate the dollar value of the earnings1 for a particular kilo of foodcake, how would those earnings be dimensioned? The CFO raised his hand. That's easy, he said, earnings should be dimensioned by time, by foodcake, and by store (as shown in the following schema and illustrated in Figure 16.1).

(Time.day. ᮏ Foodcake. ᮏ Geog.store.) ~ earnings


Figure 16.1 Foodcake earnings by time, foodcake, and store.

Is that so, said Jane. Why not dimension earnings by production facility or production machine or by inventory center? Don't you want to be able to calculate how the earnings you made on a foodcake were affected by the facility where it was produced? She drew the relevant schema on the white board.

(Time.day. ᮏ Foodcake. ᮏ Geog.store. ᮏ Production Facility. ᮏ Machine. ᮏ Inventory center.) ~ earnings

Well for one thing, replied the CFO (who had some experience in data modeling), there doesn't exist the same sort of M/N relationship between production facilities, machines, and inventory centers as there does between time, foodcakes, and stores. However, I would like to see how earnings are affected by the particular route followed by a foodcake through production and distribution. How can you let me see the impact of various business processes on costs without overdimensioning earnings?

Business Process and Asset Dimensions

Jane was still parsing the question when she nodded at Lulu to take over. That's a great question, said Lulu. In order to calculate the impact of various business processes on costs, FCI needs to create a general business process dimension and a general asset dimension. The business process dimension is composed of each of the business processes carried out by FCI (as shown in Figure 16.2). Hey, wait a minute, exclaimed one of the managers. Aren't you forgetting something? Where are we? Doesn't management count as a business process? That's a good question, replied Lulu. Don't think that I'm making any political statements here. Of course management counts. The issue is how best to represent it for analytical purposes.

FCI Business Processes
Foodsource purchasing and currency exchange
Foodsource shipments
Foodsource inventory
Production
Product transport to distribution centers
Product inventory
Product distribution to stores
Product sales and marketing

Figure 16.2 FCI business process dimension.

Although, as you can see, I chose not to represent management as a business process, it would be easier for us to discuss how I did represent management after I explain how assets were modeled. FCI assets include everything that gets used by any business process. The main asset categories that form the basis of FCI's asset dimension are listed in Figure 16.3. Assets and business functions exist in an M/N relationship with each other, said Lulu. For every one of FCI's business processes, whether it be purchasing or distribution, we can calculate the amount of physical resources and the dollar value of the resources used for each type of asset on any particular day. Over the past couple of months, one of the things we made sure to do for each business process, which I will show you shortly, was to calculate the dollar value of assets used per day per kilogram of throughput. It all sounds fine, said one of the managers, but I still don't see management as you'd promised.

FCI Assets
Throughput
Facilities
Equipment
Labor
Utilities
Financial
Other

Figure 16.3 Top-level members of FCI's asset dimension.

I didn’t promise to highlight management as a distinct dimension member or Type instance. I promised to explain where management fits in and now I will. Although man- agement is a crucial business function and although managers are crucial employees, it doesn’t make sense to represent them as either a separate leaf-level element of the busi- ness process dimension or as a separate type of asset. There are several reasons for this. Most management activity either occurs within or is directed to a particular busi- ness process. For example, in addition to the VP of products who sits at corporate headquarters, every production facility has a production manager and several line managers on site. All these people are performing management activities. The energy they expend should be accounted for in the business process for which they expend their energy, here it’s production. The reason their activity doesn’t seem to occur as a separate asset utilization in the Asset dimension is simply because we only represented the top level of assets in the Asset dimension. If we drill down one level (for most asset categories), it would look like Figure 16.4.

FCI Assets
Throughput: Fish, Vegetarian, Binders, Flavors, Foodcakes, Material supplies
Facilities: Stationary, Moving
Equipment: Stationary, Moving
Labor: Low skilled, Technical, Midmanagement, Senior management
Utilities: Electricity, Phone, Water, Gas
Financial
Other

Figure 16.4 A drill-down view of FCI's asset dimension.

By not creating a separate asset structure for corporate offices and by not creating a separate leaf-level business function for management, it encourages us to try to appor- tion as much as possible of management’s time and assets to specific business pro- cesses. What about finance and legal or business relationships? It seems arbitrary to map them to a particular business process, said one of the managers. I still think you need a separate business process for management. Lulu smiled and said, you are right in the sense that there are management activities that do not belong to any one business process. However, the way to represent that isn’t by creating a separate leaf element called management in the Business Process dimension, but by leveraging the multilevel representation capabilities inherent in a dimensional model, and thus representing those management activities that don’t belong in any one business process as belonging to the All level of the Business Process dimension. The cost of FCI’s management is thus the cost of labor, utilities, facilities, supplies, and equipment used for the management of each specific business process plus the cost of all the assets used for the All process.

Calculating Total Cost of Goods Sold

Now we can begin to answer the first question that Jane posed a few minutes ago. The first thing to recognize is that revenues and costs, as the components of earnings, are each dimensioned differently at the source. Measured revenues, rather than earnings as someone had suggested, are dimensioned by store, time, and foodcake.

(Time. ᮏ Store. ᮏ Foodcake.) ~ Revenue

Measured costs are dimensioned by business process, asset, and time.

(Time. ᮏ Business process. ᮏ Asset.) ~ Costs

So how do we connect these two cubes in a way that allows us to calculate costs per foodcake when the only dimension they seem to share is Time? Think about it. What else can we track both at the time of sale and through other business processes?

Tracking Batchids at the Point of Sales

How about batch number, said Lulu? Isn't that what connects everything? When a particular foodcake is being produced on a specific machine at a specific time, it is given a batch number. We know, for example, through our transaction systems that batch 123 refers to a batch of Neptune fishcakes produced in Vancouver on January 16 between 9:00 A.M. and noon. We also know that batch 123 was shipped to Cambridge on the afternoon of January 16 and arrived in Cambridge the morning of January 17 where it was added to the store's stock. The store uses a FIFO inventory principle and so, based on the store's sales records, the last 25 packages of Neptune from the 150 packages sold on January 17 came from batch 123.

So how do you add batch number to the revenue and cost models? Batch number gets added in several different places on the revenue side: shipping received, stock on hand, and transaction-level sales. When a store receives its daily shipment, the ship- ment has a unique identifier. Furthermore, each shipment has a detailed inventory list of batchids included in the shipment and the quantity of, and type of foodcake included in the batch. Thus, for example, when the Cambridge store acknowledges the receipt of its ship- ment on the morning of January 17, it explicitly represents the batch identity of all foodcakes received. But how does it do that? What are the options? Lulu explained that they are currently building a dynamic hybrid Foodcake-Batchid dimension from deliv- ery and sales transaction data, and then reprocessing the sales cube every night. But they are also looking into other options as well. A glance around the room revealed enough glazed eyes to make Lulu put out an example. Currently, Lulu said, FCI creates a modified Foodcake dimension wherein for each leaf in the foodcake dimension on a day-by-day and store-by-store basis, there is inserted the id numbers of every batch that was either received, remains on hand, or was sold on that day for that store. In LC, the expression for such a modified dimension is as follows: Define the new dimension Foodcake-Batch as

Foodcake.leaf.(Time.day.this , Store.this ,(Qty received or Qty sold or Qty on hand) >0) ~< Batchid

The schema is then

(Time.day. ᮏ Geog.store. ᮏ Foodcake-batch.) ~ Qty received , Qty sold, Price, . . .

An example view of sales data that incorporates Batchid as a dimension can be found in Figure 16.5. Figure 16.5 shows that on January 17 the Cambridge store received 100 units of Nep- tune from batch 123 and 50 units of Poseidon from batch 161. It also shows that 100 units of Neptune were sold from the previous batch 142, which exhausted that batch, and 80 units of Poseidon from the previous batch 188 were sold, as well as 10

Jan17,            Neptune              Poseidon
Cambridge       B123      B142       B161      B188

Qty received     100         0         50         0
Qty sold          50       100         10        80
Price           5.95      5.95       5.50      5.50
Revenue       297.50       595         55       440
Qty on hand       50         0         40         0

Figure 16.5 Tracking Batchids of sales in Cambridge.

units from the newly arrived batch 161. It is important to realize that the identities and number of batchids attached to the leaves of the foodcake dimension are constantly changing. Thus, if we were to look at the same view from a neighboring store, it might resemble Figure 16.6. So how do you build such a hybrid dimension, asked one of the managers. I understand the definition you put up on the white board, but what kind of information do we need to actually create it? Good question, said Lulu. You need access to your transaction figures for shipping and sales, and you have to either leverage or create some kind of relation between foodcakes and batchids. Using the batch identifier information, FCI can track the age of different batches at the stores and even order the batchids by age. Since each batchid can (but doesn't have to) change from day to day and from store to store, it doesn't make any sense to pose queries about how some variable with a particular value for some batchid changes its value across time or stores because the batchid won't exist. In other words, queries like "How does Qty received for Neptune batch 213 change from Arlington to Cambridge" don't make any sense because the batchid 213 only exists for Arlington on that day. Most of the managers were impressed, but the CFO pressed on. How does the model handle foodcakes for which no batches were shipped or exist in inventory? For example, how would you ask for the price of a foodcake on a particular day for which no batches were on hand in any store? Lulu was thinking through how to respond when Thor stepped forward. He thanked the CFO for pointing out a whiteboard simplification that needed to be dealt with in the actual application. The way we handle cases where no batches are present in the system is by distinguishing those foodcakes for which some batch exists and those for which no batch exists. The hybrid dimension definition above only applied to the case where batches exist. The complete definition of the hybrid dimension is thus: Define the new and complete Foodcake-Batch dimension "Hfoodcake" as

Foodcake.leaf.(Time.day.this , Store.this ,(Qty received or Qty sold or Qty on hand) >0) ~< Batchid Union Foodcake.leaf.(Time.day.this , Store.this ,(Qty received and Qty sold and Qty on hand) = 0) ~ Batchid.n/a
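For readers who prefer to see the nightly rebuild outside of LC, here is a hedged Python sketch of the same idea. The record layout and names are invented for illustration; this is not FCI's actual build process. Each foodcake leaf gets one hybrid member per batch with any received, sold, or on-hand quantity on that day at that store, and a Batchid.n/a member when there are none.

    # Hypothetical one-day, one-store transaction records:
    # (foodcake, batchid, qty_received, qty_sold, qty_on_hand)
    records = [
        ("Neptune.4pack", "B123", 100, 50, 50),
        ("Neptune.4pack", "B142", 0, 100, 0),
        ("Poseidon.4pack", "B161", 50, 10, 40),
    ]
    all_foodcakes = ["Neptune.4pack", "Poseidon.4pack", "SailorsDelite.4pack"]

    def hybrid_members(records, all_foodcakes):
        """Return the Foodcake-Batch (Hfoodcake) members for this day and store."""
        by_foodcake = {}
        for foodcake, batchid, received, sold, on_hand in records:
            if received > 0 or sold > 0 or on_hand > 0:
                by_foodcake.setdefault(foodcake, []).append(batchid)
        members = []
        for foodcake in all_foodcakes:
            batchids = by_foodcake.get(foodcake)
            if batchids:                       # at least one active batch today
                members.extend(f"{foodcake}.{b}" for b in batchids)
            else:                              # no activity: use the n/a token
                members.append(f"{foodcake}.Batchid.n/a")
        return members

    print(hybrid_members(records, all_foodcakes))

Either way, the inputs are the ones Lulu names: shipping and sales transaction figures plus some relation between foodcakes and batchids.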

Jan17,            Neptune              Poseidon
Arlington       B213      B102       B209      B165      B276

Qty received     100         0         50         0        50
Qty sold          50       100         10        80         0
Price           5.95      5.95       5.50      5.50         0
Revenue       297.50       595         55       440         0
Qty on hand       50         0         40         0        50

Figure 16.6 Tracking Batchids of sales in Arlington.

Batchid.n/a, meaning not applicable (the token that was introduced in Chapter 5), is reserved for those locations where there are otherwise no foodcakes received, sold, or on hand. This is illustrated in Figure 16.7. But if the batchid leaf is not applicable for Poseidon cakes, how can there be any prices or quantities, asked one of the managers. Thor responded that all the variable values applied equally well to the foodcake level of the Hfoodcake dimension. For example, the number 0 in the Qty received row under Poseidon cakes means that no Poseidon cakes were received. Furthermore, the stated price of $5.50 refers to the stated price of Poseidon cakes as listed in the Price attribute of the Fishcake dimension figure and which is true regardless of whether there are any Poseidon cakes in stock. Lulu added that the screen is less than fully perspicuous as the regularity of the dimension representation forces us to show a nonapplicable batchid leaf as opposed to showing nothing at all (as with the following recasting of the same screen representation, which took a little custom user interface work to create). Lulu’s version is shown in Fig- ure 16.8.

Jan18,            Neptune              Poseidon
Arlington       B213      B102        B.NA

Qty received     100         0           0
Qty sold          50       100           0
Price           5.95      5.95        5.50
Revenue       297.50       595           0
Qty on hand       50         0           0

Figure 16.7 Thor’s version of tracking not-applicable Batchids in Arlington.

Jan18,            Neptune              Poseidon
Arlington       B213      B102        B.NA

Qty received     100         0           0
Qty sold           5        10           0
Price           5.95      5.95        5.50
Revenue        29.75      59.5           0
Qty on hand       95        13          43

Figure 16.8 Lulu's version of tracking not-applicable Batchids in Arlington.

ORDINAL BATCH

The only down side of this approach, said Thor, is that the sales cube needs to be reprocessed every day because one of its dimensions, the hybrid foodcake dimension, changes every day. Also, if you’re trying to look at batch management across time or stores, as with queries such as

■■ What is the average amount of foodcake in each existent batch?
■■ How often are foodcakes sold without following the FIFO principle?

it can be awkward to formulate because there is no general concept of batch. One way to make it easier to compare batches across time and store, and avoid the need to reprocess the cube every day is to create an ordinal batch dimension and treat Batchid as a variable as shown in the schema below.

(Time. ᮏ Store. ᮏ Foodcake. ᮏ Batchordinal.) ~ Batchid , Qty received, Qty sold , Qty on hand

The creation and use of the Batchordinal dimension, said Lulu, is very much like the creation and use of the Age dimension in inventory. The number of ordinal batches in the Batch dimension should be picked to reflect the maximum number of batches for any foodcake that any store is likely to have on hand at the same time. Since foodcakes are perishable, since the stores get daily shipments, and since they use a FIFO inventory principle, a five-member Batchordinal dimension should be sufficient. Batches arrive at a store in Batchordinal order with the youngest batch being assigned the batch1 position. Unlike the inventory Age dimension where all fish of some age increases one day in age per day, with an ordinal batch dimension, batches only age in terms of their ordinal bin number, with the arrival of younger batches.
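As a rough sketch of the Batchordinal bookkeeping (illustrative only, with invented batch ids; it is not drawn from FCI's tool set), one might keep, per store and foodcake, a short list ordered youngest first, so that an arriving batch takes the batch1 slot and pushes older batches to higher ordinals:

    MAX_ORDINALS = 5   # a five-member Batchordinal dimension, as suggested above

    def receive_batch(ordinal_slots, new_batchid):
        """The arriving (youngest) batch takes the batch1 slot; older batches shift up."""
        ordinal_slots.insert(0, new_batchid)
        del ordinal_slots[MAX_ORDINALS:]       # anything beyond batch5 falls off
        return ordinal_slots

    # Batchid is now a variable dimensioned by (store, foodcake, batchordinal).
    slots = []                                  # slots[0] is batch1, slots[1] is batch2, ...
    for arriving in ["B101", "B123", "B156"]:   # batches arriving on successive days
        receive_batch(slots, arriving)

    for position, batchid in enumerate(slots, start=1):
        print(f"batch{position}: {batchid}")    # batch1: B156 (youngest), batch3: B101 (oldest)

Because the ordinal slots are fixed, the cube itself does not need to be reprocessed when batch ids change; only the Batchid variable values do.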

Thor concluded, and the management seemed to nod in agreement, that the batch-specific Foodcake dimension finally allows FCI to connect changes in sales and marketing with changes through the entire value chain in FCI and beyond. The hybrid Foodcake-Batch dimension also enables FCI to exert finer control over product quality management than ever before. For example, said Lulu, it is now straightforward to query for such things as the variance in quality attributes between different batches of the same foodcake.

Calculating Total Costs

Introduction

Now that you've seen how to maintain Batchid information right up to the point of sale, we can return to our primary question of calculating the total cost of production. All of the groundwork has been laid, said Thor. During these past months, we built an analytical layer on top of every major business process transaction system at FCI


including purchasing, currency exchange, materials transport, materials inventory, production, distribution, marketing, sales, and yes, even management. The basic strategy we used was to measure all quantities of asset utilization, prices per quantity of asset utilization, and costs of asset utilization within each business process as accurately as possible at whatever granularity the measurements naturally occur. For example, some labor costs are measured by the hour while some financial costs are measured by the month. Then we allocated all costs across whatever unit of process was appropriate. For example, distribution costs were calculated for kilograms of material or product moved one kilometer, while inventory costs were calculated for kilograms of material or product stored for one day. Production costs were calculated for kilograms of foodcake produced per machine per batch. Finally, we aggregated total costs per business process and per asset type. Though most of the audience looked pleased about the prospects of analyzing their business in such detail, the VP of Foodsources raised his hand. I don't see how you can talk about batches across all FCI business processes, he said. Aren't fish treated as a homogeneous commodity? Can this system really tell me where the Salmon came from that appear, for example, in the Neptune fishcakes sold on January 17 in Cambridge? Lulu thanked the VP for going straight to FCI's tracking limitations and, by association, the limitations of this analytical system. (You can't manage what you don't measure, Thor murmured to himself.) For those of you who didn't follow the question, let me explain. FCI attaches a Batchid to every discrete batch of foodcake that is produced. The Batchid identifies the production facility, the machine, the start and end time, and the foodcake produced. FCI does not track the foodsources that go into our foodcakes with the same concept of a batch. We do track, at materials inventory, how much of each type of foodsource came in on any day and the weighted average price paid per kilo of foodsource. We also track the cargo id for the shipment on which the foodsource arrived. But we lose track of the actual foodsource in three different places:
■■ At the market where the foodsource is bought
■■ In the transport vessel
■■ In inventory

Historically, FCI operates this way because once fish passed the inspection of our on-site purchaser, it was thought to be sufficiently homogeneous not to need any further distinction beyond its age. Thus, our South American materials inventory site does not distinguish between three-day-old Argentinean Bass and three-day-old Chilean Bass. Since FCI begins mixing fish as far back in the business process as the pier, FCI also mixes fish prices, appropriately weighted by relative quantities purchased. If FCI thought that it needed to be able to figure out for any foodcake sold not only the weighted average price per kilo per foodsource and its date of origin, but also the specific shipment or shipments of foodsource (such as French Bass bought in Marseilles from Pierre's fishing fleet), the company would need to substantially change not only its business practices, but also its fixed-asset structure. For example, instead of using fish transport vessels with one bin per fish type, FCI would need to use transport mechanisms with one bin per fish type per supplier. Instead of keeping date-tagged

bins in materials inventory, FCI would need to maintain separate supplier-tagged bins for every fish type and age in inventory. These practices would add a significant cost to FCI's total cost of purchasing, distribution, and inventory. It isn't clear at this time that such an improvement to the company's foodsource tracking capabilities would be recouped through enhanced sales, reduced costs, or improved management. From my perspective, added Thor, the reasons FCI might eventually want to add such levels of tracking would be to improve quality control rather than costs. Source-level tracking would enable us to connect a bad batch of foodcakes with a particular supplier. It would also enable us to figure out which batches of foodcakes are affected by some specific foodsource if we were to learn of problems with that foodsource. But your question was so insightful and to the point, continued Lulu, that I propose we illustrate the benefits of the enterprise OLAP layer we created by showing everyone in this room how to calculate and trace the incremental contributions to total earnings for two sets of foodcakes from two particular batches of the same kind of foodcake sold at two different stores in the same region on the same day. Before launching into the exercise, Thor offered a few words of caution. To those of you (and I'm really referring to the readers here) who intend to follow along with us on a step-by-step basis, especially when we write out expressions, let me just say that beyond the confines of this pedagogically oriented exposition, it would be unusual to write such expressions so as to follow the path of just one or two batches. Rather, it would be far more typical to query for and view sets of batches across ranges of times, locations, products, distribution paths, and so on. Thus, given our game plan of tracking and analyzing across all of FCI's business processes the incremental costs associated with two different batches of the same kind of foodcake, I will take the liberty of mixing up expressions and views, sometimes showing the results from a narrowly defined query and sometimes showing additional view context. Please keep in mind that, although you will see many of the calculations required to determine the total costs of foodcake production, you will not see all of them. Given time constraints, I think showing you representative calculations from across FCI's value chain is just as instructive and is a more efficient way to learn. (Remember, if you want to follow every single step with a pencil and perform every single calculation yourself, there is a full computational exercise in Chapter 17.) Also, since we are taking you through this cost-tracing odyssey on a cube-by-cube basis, you might get the false impression that someone needs to manually bring context into every business process. All the business processes are interconnected. We could have written a single expression to calculate the total activity-based costs and earnings accruing to any sales of any foodcakes in any stores at any time. For pedagogical reasons, we feel it is more beneficial to you, the managers (and readers), to be able to follow each business process independently.

The Enterprise Asset Utilization and Cost Cube

The first things we need to do, said Lulu, are define a business process and a business asset dimension and assemble them in a cost and activity-tracking schema as loosely

defined in the following code (that is, mentioning types of variables rather than defining specific variables):

(Time.day. ᮏ Business_process. ᮏ Asset. ) ~ physical throughput and financial cost variables

Thor continued, as you all understand how various asset utilization costs are calculated, I think it is most valuable for you to see how global asset and business process costs are calculated. Since we began this exercise by trying to figure out how much we earned or lost selling two different batches of the same kind of foodcake on a particular day at two different stores in the same region, we will follow the cost trail backwards beginning with sales and marketing.

Sales and Marketing

There are two basic pieces of information that FCI needs from the stores who sell its food- cakes: How many packages were sold (net any returns) and how much money is owed FCI. Recall that the amount of money owed FCI is a function of the wholesale price of the foodcakes and FCI’s share in whatever discounts are applied to the foodcakes. For simplicity, let’s assume that discounts are day wide and customer invariant, so we don’t have to look at actual transaction data at this point. The relevant schema (with derived variables underlined) is

[S&M] Schema: (Time.Day. ᮏ Geog.Store. ᮏ Hfoodcake.) ~ Qty_Sold{pkg}, List_Price{$/pkg}, Wholesale_Price{$/pkg}, Discount{$/pkg}, Discount_Share{R}, Total_Revenue{$}, Dollars_Owed_FCI{$}, Effective_Wholesale_Price{$/pkg}, Dollars_Owed_FCI_per_kg{$/kg}

Given this schema, let's estimate what we earned or lost for those foodcakes for which we sold the greatest number of packages from two different stores on a particular day, in a particular zone (say the Boston zone), on some specific day, such as January 18. To see the result set in context, we'll query for the most popular foodcakes from the days surrounding January 18 as well. The query results are shown in Figure 16.9.

"Foodcake", "Geog.Store" , Time.Jan.(17 to 19) , Geog.Zone.Boston_Area, Qty_Sold.max FCI’s Activity-Based Management 417

Page: Geog.Zone.Boston_Area, Qty_Sold.max

Time.Day.   Foodcake {N}            Geog.Store {N}
Jan17       Poseidon.8packs.        Glouchester
Jan18       Neptune.4pack.          Cambridge
Jan19       SailorsDelite.4pack.    Boston

Figure 16.9 Qty_Sold Max.

Neptune four packs turned up as the hottest selling foodcake in the Boston zone on January 18. Next, we asked for the top two selling batches of Neptune four packs and the stores in which they sold with the following query, whose results are shown in Figure 16.10.

"Hfoodcake.leaf", "Geog.Store" , Time.(Jan17 to Jan19) , Geog.Zone.Boston_Area , Product.Neptune.4pack.(Qty sold.Drank1-2)

Page: Geog.Zone.Boston_Area

Time.Day.   Drank   Hfoodcake {N}          Geog.Store {N}
Jan17       1       Neptune.4pack.B.90     Boston
            2       Neptune.4pack.B101     Wellesley
Jan18       1       Neptune.4pack.B123     Cambridge
            2       Neptune.4pack.B101     Allston
Jan19       1       Neptune.4pack.B156     Newton
            2       Neptune.4pack.B.95     Boston

Figure 16.10 Top two selling batches of Neptune foodcakes.

The result turned up Neptune four packs from batch 123 sold in Cambridge and Neptune four packs from batch 101 sold in Allston as the hottest selling batches of

Neptune foodcake; both sold on January 18. Next, we asked for the associated financial data with the following query, whose results are shown in Figures 16.11 and 16.12.

[S&M]: "Qty_Sold"{pkg}, "List_Price"{$/pkg}, "Wholesale_Price"{$/pkg}, "Discount"{$/pkg}, "Discount_Share"{R}, "Total_Revenue"{$}, "Dollars_Owed_FCI"{$}, "Effective_Wholesale_Price"{$/pkg}, "Dollars_Owed_FCI_per_kg"{$/kg}, Hfoodcake.Neptune.4pack.(B123 and B101), Time.Jan18, Geog.(Cambridge and Allston)

Page: Time.Jan18

Store.Hfoodcake.                    Cambridge              Allston
                                    Neptune.4pack.B123     Neptune.4pack.B101

Qty_Sold {pkg}                          50                     45
List_Price {$/pkg}                      6.95                   6.95
Wholesale_Price {$/pkg}                 5.50                   5.50
Discount {$}                            1.00                   0.75
Discount_Share {R}                      0.50                   0.50
Total_Revenue {$}                       298                    279
Effective_Wholesale {$/pkg}             5.00                   5.13
Dollars_Owed_FCI {$}                    250                    231
Dollars_Owed_FCI_per_kg {$/kg}          10.00                  10.25

Figure 16.11 [S&M] cube view.
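The derived variables in the [S&M] view can be reproduced with a few lines of arithmetic. The sketch below is a hedged reading rather than the schema's own definitions; in particular, the 0.5 kilograms-per-package weight is an assumption inferred from the view (50 four-packs corresponding to 25 kg), not a number stated anywhere in the schema.

    def sm_derived(qty_sold_pkg, list_price, wholesale_price, discount, discount_share,
                   kg_per_pkg=0.5):   # assumed package weight, inferred from the view
        total_revenue = qty_sold_pkg * (list_price - discount)
        effective_wholesale = wholesale_price - discount * discount_share
        dollars_owed_fci = qty_sold_pkg * effective_wholesale
        dollars_owed_per_kg = dollars_owed_fci / (qty_sold_pkg * kg_per_pkg)
        return total_revenue, effective_wholesale, dollars_owed_fci, dollars_owed_per_kg

    # Cambridge / Neptune.4pack.B123 column: (297.5, 5.0, 250.0, 10.0)
    print(sm_derived(50, 6.95, 5.50, 1.00, 0.50))
    # Allston / Neptune.4pack.B101 column: (279.0, 5.125, 230.6, 10.25), rounded in the view
    print(sm_derived(45, 6.95, 5.50, 0.75, 0.50))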

Business Process      Hfoodcake.                            Costs {$/Kg},   Wholesale Revenue {$/Kg},
                                                            Asset.All       Asset.all
Sales & Marketing     Neptune.4pack.B123. (OrderID.0169)    1.00            11.00
                      Neptune.4pack.B101. (OrderID.0170)    0.75            11.00

Figure 16.12 Activity-based costs of sales and marketing.

So, Lulu said, you can see that by counting FCI's promotion share and targeted marketing (described in Chapter 15) as costs, FCI spends nearly $1 per kilogram of Neptune fishcake on sales and marketing activities. I submit that the fundamental activity-based management question for FCI to answer is, "What was the total cost on a per kilogram basis for FCI to deliver those 50 packages of Neptune fishcakes to Cambridge for which it received $250, and the total cost to deliver the 45 packages to Allston for which it received $231?"

Transportation from Product Inventory to Stores

Working backwards, the next business process is the transportation that brought the foodcakes from the Hartford distribution center to the Cambridge and Allston stores. FCI owns and operates its own distribution centers. Furthermore, every distribution center can be stocked with any foodcake that any store may want. So, any particular store receives its foodcakes from one and only one distribution center. The relevant distribution schemas are as follows:

[Travel] Schema: ((Time.Day. ᮏ Geog.ProductInventorySite.) ~< Geog.Zone.) ~ VehicleID{integer}, Total_Distance{km}, Fixed_Cost{$}, Vehicle_Travel_Cost{$}, (Delivery_PlanID{integer} ~< Geog.Store{N}, Time.Hour.Quarter{qtr_hour}, Trip_Share{R}, OrderID{integer}, Order_Travel_Cost{$}, Travel_Cost_per_kg{$/kg})

[Vehicle] Schema: (Time.Day.atabove. ᮏ VehicleID.) ~ Cost_per_km{$/km}

[Order] Schema: (OrderID. ᮏ Hfoodcake.) ~ Qty{kg}

To figure out the cost per kilogram of transporting each of the two sets of Neptune fishcakes, Lulu said, we need to figure out the cost of the trip (assuming that both batches of foodcakes were delivered on the same day) and the share of the trip’s costs attributable to the delivery made to Cambridge/Allston, and the share of the Cambridge/Allston delivery costs attributable to the Neptune fishcakes delivered. All the managers nodded in approval. The expression for total trip cost is

[Travel]: Vehicle_Travel_Cost{$}= Fixed_Cost{$} + Total_Distance{km} x [Vehicle]: (Cost_per_km{$/km})

The query basically says that vehicle travel cost is equal to the fixed cost associated with the [Travel] schema plus a variable cost defined as the distance times the cost per kilometer, where the cost per kilometer is associated with a separate [Vehicle] schema.

The query to calculate the travel cost for the vehicle that our Neptune fishcakes traveled in is

[Travel]: Vehicle_Travel_Cost{$}, Time.Jan18, Geog.ProductInventorySite.Hartford, Geog.Zone.Boston_Area

The result set and its components are illustrated in Figure 16.13.

Page: Time.Jan18, Geog.ProductInventorySite.Hartford, Geog.Zone.Boston_Area

VehicleID.   Total_Distance {km}   Fixed_Cost {$}   Vehicle_Travel_Cost {$}
60           400                   150              250

Figure 16.13 Vehicle travel costs.
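The fixed-plus-variable structure of the trip cost can be checked against Figures 16.13 and 16.14 with a few lines of Python (an illustrative check, not part of the application itself):

    def vehicle_travel_cost(fixed_cost, total_distance_km, cost_per_km):
        """Trip cost = fixed cost from [Travel] + distance times the per-km rate from [Vehicle]."""
        return fixed_cost + total_distance_km * cost_per_km

    # VehicleID 60 on January 18: $150 fixed, 400 km at $0.25/km  ->  $250
    print(vehicle_travel_cost(150, 400, 0.25))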

One of the managers asked why the cost per kilometer was put in a separate VehicleID cube as shown in Figure 16.14. Why wasn’t it included in the Travel cube? Good question, said Thor. The reason we put VehicleID in a separate cube is because vehicles are not tied to routes. The vehicle that makes the Boston run one day may make the Springfield run another. However, the cost of operating the vehicle is a func- tion of the vehicle and not the route. Thus, from a normalization perspective, it makes more sense to maintain VehicleID, and associated vehicle properties, in a separate cube. Of course, if the source data emanating from warehouse figures is already denormalized, and robust rules are in place for working with missing and meaningless data, and so long as the schema is more for read than write purposes, maintaining a denormalized schema may be perfectly fine from a logical perspective. From a physi- cal perspective, it may lead to much improved query performance. In order to find out the Cost_per_km for all vehicles on the day that our Neptune fishcakes were transported, you could use a query like this

[Vehicle]: "Cost_per_km", VehicleID., Time.Jan18

Sample results of this query are shown in Figure 16.14. Lulu could see that questions were still lingering in some managers' heads. What do you think of the three distribution cubes?

Page: Time.Jan18

VehicleID.   Cost_per_km {$/km}
21           0.25
35           0.25
60           0.25
56           0.30
67           0.30

Figure 16.14 Vehicle costs per kilometer.

Another manager then asked why the delivery plan information was embedded in the Travel cube (using the ~< or 1/N operator defined in Chapter 4)? If it makes sense to break out the VehicleID information, why not the delivery plan information? Lulu responded that they could have broken out the delivery plan information into a sepa- rate cube, but decided that since the same department responsible for maintaining the Travel cube was responsible for maintaining the Delivery cube, and since delivery plan ids are unique identifiers that occur only once in the [Travel] schema whereas the Vehi- cleID occurs every time the vehicle is used, and since the delivery plan information is only accessed through the [Travel] schema (yes, she admitted that with a unique iden- tifier it would certainly be possible to access delivery plan information directly, but so far no one thinks in terms of delivery plan ids but rather the dates) and ship from and ship to locations—which after all is how the unique identifiers are generated—they decided to keep delivery plan information embedded within the [Travel] schema. As if these weren’t enough questions, another manager asked why the 1/N correla- tion that existed between Geog.ProductInventorySite and Geog.Zone locations was placed outside of the cross product of Time and Geog.ProductInventorySite. In other words, as he showed on the white board, why does the expression look like

((Time.Day. ᮏ Geog.ProductInventorySite.) ~< Geog.Zone.)

and not like

(Time.Day. ᮏ (Geog.ProductInventorySite. ~< Geog.Zone.))

What do you think? Thor responded to this question by saying that the reason ~< Geog.Zone. was included outside the parentheses was that although the set of ship-to zones always maintained a 1/N relationship with ship-from locations, the particular set of ship-to zones associated with any one ship-from location changes over time. As new outlet stores came on board and as new distribution centers were created, the optimal mapping of distribution centers to retail outlets varied accordingly.

Returning to the calculation at hand, the order travel cost and the order travel cost per kilogram are expressed in the following way:

[Travel]: "Order_Travel_Cost"{$}, OrderID. =

Tripshare{R} x Vehicle_Travel_Cost{$}

[Travel]: "Travel_Cost_per_kg"{$/kg}, OrderID. =

Order_Travel_Cost{$} ÷ [Order]: Qty{kg}, Hfoodcake.all

Notice how the formula definition for Travel_Cost_per_kg {$/kg} makes a reference to the [Order] schema to obtain the total quantity of kilograms in each order. Knowing that our foodcake packages, from Hfoodcake batches Neptune.4pack.B123 and Neptune.4pack.B101, arrive at their relevant stores through Delivery_PlanID.D36 in OrderID.0169 and OrderID.0170 respectively, we can express the query for the order travel cost and the order travel cost per kg as follows:

[Travel]: "Order_Travel_Cost"{$}, OrderID.(0169 and 0170), Time.Jan18, Geog.ProductInventorySite.Hartford, Geog.Zone.Boston_Area, Geog.Store.(Cambridge and Allston) [Travel]: "Travel_Cost_per_kg"{$/kg}, OrderID.(0169 and 0170)

The result set, with surrounding context, is shown in Figure 16.15. Kilograms per order from the [Order] schema for the Neptune batches of interest are shown here in Figure 16.16. Now, we can see how trips with a high tripshare or with large distances to travel or with high vehicle operating costs could have a substantial impact on the transportation costs between the product inventory site and the stores. Furthermore, the quantity in each order impacts the Travel_Cost_per_kg {$/kg}. Total costs per kilogram of distributing foodcake packages in these orders from our batches are shown in Figure 16.17.

Page: Time.Jan18, Geog.ProductInventorySite.Hartford, Geog.Zone.Boston_Area

Delivery_PlanID.D36
  OrderID.0169   Store                       Cambridge
                 Time                        6:00 A.M.
                 Trip_Share {R}              0.25
                 Order_Travel_Cost {$}       62.5
                 Travel_Cost_per_kg {$/kg}   0.25
  OrderID.0170   Store                       Allston
                 Time                        6:30 A.M.
                 Trip_Share {R}              0.2
                 Order_Travel_Cost {$}       50
                 Travel_Cost_per_kg {$/kg}   0.22

Figure 16.15 Calculating travel cost per kilogram.

OrderID.   Hfoodcake.              Qty {kg}
169        Neptune.4pack.B123          80
           Neptune.8pack.B127          75
           SeaBreeze.8pack.B43         90
           All                        250
170        Neptune.4pack.B101          85
           SeaBreeze.4pack.B25         75
           Poseidon.8pack.B120         65

Figure 16.16 Quantity of fish per order.
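Putting the trip share and the order weights together, here is a hedged sketch of the allocation arithmetic using the numbers from Figures 16.13, 16.15, and 16.16 (illustrative only, not the [Travel] schema's own formula text):

    def order_travel_costs(vehicle_travel_cost, trip_share, order_qty_kg):
        """Allocate a share of the trip to the order, then spread it over the order's weight."""
        order_travel_cost = trip_share * vehicle_travel_cost
        travel_cost_per_kg = order_travel_cost / sum(order_qty_kg)
        return order_travel_cost, travel_cost_per_kg

    # OrderID 0169 (Cambridge): 25% of the $250 trip spread over 250 kg  ->  (62.5, 0.25)
    print(order_travel_costs(250, 0.25, [250]))
    # OrderID 0170 (Allston): 20% of the trip spread over 85 + 75 + 65 kg  ->  (50.0, ~0.22)
    print(order_travel_costs(250, 0.20, [85, 75, 65]))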


Business Process                 Hfoodcake.                           Costs {$/Kg}, Asset.All
Product Distribution to Stores   Neptune.4pack.B123.(OrderID.0169)    0.29
                                 Neptune.4pack.B101.(OrderID.0170)    0.15

Figure 16.17 Activity-based costs of product distribution.

Product Inventory Costs

What about the time spent in Hartford? How much cost can we attribute to the time those foodcakes spent in inventory? As is typical across the enterprise, there are two relevant schemas: one for throughput or activity labeled [PI Throughput] and one for costs labeled [PI Cost].

[PI Throughput] Schema: (Time.Day.atabove. ᮏ Geog.ProductInventorySite. ᮏ Hfoodcake. ᮏ Age.) ~ Qty_Sent{kg}, Qty_Received{kg}, Qty_On_Hand{kg}, Qty_Throughput{kg}, Cost_per_kg{$/kg}

[PI Cost] Schema: (Time.Day.atabove. ᮏ Geog.ProductInventorySite. ᮏ Stuff.NonFoodAsset.) ~ Cost{$}

The main things we need to figure out are how long those specific Neptune cakes remained in inventory and what the cost per day of their stay was for those specific days. To address the first question we need to pose the following query that leverages our knowledge of when the foodcakes were shipped to the Boston area:

[PI Throughput]: "Age", Time.Jan18, Geog.ProductInventorySite.Hartford , Hfoodcake.Neptune4pack.(B123 and B101)

In words, it asks for the age of the Neptune four pack foodcake packages at the Hartford product inventory site on the day they left inventory and were shipped to the Boston area. Another way to look for the answer is by querying for the arrival date for the foodcakes as with the following query:

"Time.day" , Geog.ProductInventorySite.Hartford , Age.1 , Hfoodcake.Neptune.4pack(B101, B123)


Page: Time.Jan.18

Geog.ProductInventorySite.   Hfoodcake.             Age.1   Age.2   Age.3   Age.4
Hartford                     SeaBreeze.8pack.B205   Jan16   Jan17   n/a     n/a
                             Neptune.4pack.B123     Jan16   Jan17   Jan18   n/a
                             Neptune.4pack.B101     Jan15   Jan16   Jan17   Jan18
                             SeaBreeze.8pack.B41    Jan15   Jan16   Jan17   Jan18
                             Poseidon.4pack.B99     Jan17   Jan18   n/a     n/a

Figure 16.18 Viewing inventory age at time of shipping.

Since foodcakes arrive at inventory as one day old, as illustrated in Figure 16.18, foodcakes in batch 123 destined for Cambridge spent two days in Hartford, while the foodcakes in batch 101 destined for Allston spent three days in inventory.

"Time.day" , Geog.ProductInventorySite.Hartford , Age.1 , Hfoodcake.Neptune.4pack(B101, B123) = Jan15, Jan16

These are the answers we were expecting, said Thor, so we move along in our analysis. Next we need to find out the inventory costs during the time period that each set of Neptune cakes was in storage. First let's get a feel for the cost structure associated with product inventory. This can be accomplished with the following query whose results in context are shown in Figure 16.19:

[PI Cost]: "Cost"{$}, Time.Jan.(15 to 17), Geog.ProductInventorySite.(Hartford and Palo Alto), Stuff.NonfoodAsset.

In words, the query asks for the cost in dollars of all nonfood assets for two inventory sites and three dates. Next we can query for the specific inventory costs associated with each of the two batches of foodcakes. In words, the query asks for the costs for all nonfood assets in Hartford spent on the foodcakes in our two batches for the time

Page: Hfoodcake.all, Age.all

Geog.ProductInventorySite.   Stuff.NonFoodAsset.   Jan15   Jan16   Jan17
Hartford                     Labor                   912     912     912
                             Maintenance              50      50      50
                             Facility                200     200     200
                             Energy                  100     100     100
                             All                   1,262   1,262   1,262
Palo Alto                    Labor                   901     901     901
                             Maintenance              45      45      45
                             Facility                190     190     190
                             Energy                   95      95      95
                             All                   1,231   1,231   1,231

Figure 16.19 Cost of product inventory.

period ranging between January 18, the day both sets of foodcakes shipped out, and the particular date on which each subbatch (that is, each set of foodcakes produced in the same batch; remember that each batch is split up and sent to multiple inventory sites after leaving production) of foodcakes was one day old.

[PI Cost]: Cost {$}, Time.Day.(Jan18 to (Time.day.Age.1)), Hfoodcake.(B123, B101), Geog.ProductInventorySite.Hartford, NonFoodAsset.All

Since the inventory dates are not the same for each batch, we needed to refer to the time period by the date range defined by the period between the common ship out date, January 18, and their distinct arrival dates, defined by the day on which each subbatch was one day old. The results are shown in Figure 16.20. The next step, said Lulu, is to calculate the inventory cost for each of the two sets of foodcakes. We need to figure out how much product was throughput during each of the two time periods. There are several ways to do this. Recall from our in-depth discussion on inventory (in Chapter 14) that we decided to use the average of quantities

Hfoodcake.             Time.day.(Jan18 to (Time.day.Age.1))   Costs {$}, Asset.All
Neptune.4pack.B123     Jan 18 to 16                           2,524
Neptune.4pack.B101     Jan 18 to 15                           3,786

Figure 16.20 Cumulative inventory costs.

Hfoodcake.             Time.day.(Jan18 to (Time.day.(Age.1)))   Qty_Throughput {kg}
Neptune.4pack.B123     Jan 16 to 18                             45,150
Neptune.4pack.B101     Jan 15 to 18                             30,200

Figure 16.21 Cumulative product inventory throughput.

sent and received as a measure of quantity throughput. Thus, the formula for throughput is as follows: (The results are shown in Figure 16.21.)

[PI Throughput]: "Qty_Throughput" {kg}, Time.Day.(Jan18 to (Time.Day.(Age.1))), Hfoodcake.Neptune.4pack.(B123, B101), Geog.ProductInventorySite.Hartford, Hfoodcake.all, Age.all =

( Qty_Sent {kg} + Qty_Received {kg} ) ÷ 2

Lulu added that they could have also defined measures of quantity throughput and cost on a day-to-day basis, and then summed the individual costs per throughput per day. Hey wait a minute, said one of the managers. Aren't costs per throughput per day ratios? I thought you weren't supposed to sum ratios. Excellent question, said Lulu! The sum of the ratios (as shown in Chapter 7) is not equal to the ratio of the sums. However, we are not looking for the average cost per day, but rather the sum of the costs per day (effectively, a sum of numerators divided by a sum of denominators). If something costs $10 per day one day and $20 per day another day, it costs $30 per two days, or ($10 + $20) / (1 day + 1 day). Finally, said Lulu, we can define a new variable to calculate the costs of storing each of the subbatches of Neptune foodcakes by multiplying the ratio of

quantity throughput per batch to total throughput by the total costs for that time period as follows:

"Cost of storing Neptune foodcakes from batches 101 and 123 in orders 556 and 567" =

[PI Throughput]: (Qty_Sent {kg}, Hfoodcake.Neptune.4pack.(B123, B101), Time.Jan18)
÷ [PI Throughput]: (Qty_Throughput {kg}, Time.day.(Jan18 to (Time.day, Age.1, Hfoodcake.4pack.(B123 + B101))), Hfoodcake.all)
× [PI Costs]: Cost {$}, Time.day.(Jan18 to (Time.day, Age.1, Hfoodcake.Neptune.4pack.(B123 + B101))), Geog.ProductInventorySite.Hartford, NonFoodAsset.all

In words, the cost of storing the foodcakes from each of the Neptune batches during their slightly different length stays in Hartford is equal to all the costs incurred by Hartford inventory during those days multiplied by each of the ratios for the amount of Neptune batch 123 or 101 foodcakes throughput divided by the total number of foodcakes throughput in Hartford during that same time. Lulu tapped Thor on the shoulder; I think you're forgetting something, Thor, she said. What do you think? Just then, one of the managers raised her hand. Are you sure this is right? Aren't you assuming that all of the relevant Neptune cakes stored in inventory are sent to Cambridge or Allston? Don't we need to find the ratio of each batch of Neptune cakes sent to Cambridge or Allston, to all the Neptune batch 123 or 101 cakes received and stored? Couldn't we have done that directly in the formula above? You are absolutely right, replied Thor. I was hoping someone would catch that planted mistake. So let's write that formula over again using the right quantity values for each of the Neptune batches.

"Cost of storing Neptune foodcakes from batches 101 and 123 in orders 556 and 567" =

[S&M]: (Qty_Sold {kg}, Hfoodcake.Neptune.4pack.(B123 + B101), Time.Jan18, Geog.Store.(Cambridge, Allston))
÷ [PI Throughput]: (Qty_Throughput {kg}, Time.day.(Jan18 to (Time.day, Age.1, Hfoodcake.Neptune.4pack.(B123 + B101))), Hfoodcake.all)
x [PI Costs]: Cost {$}, Time.day.(Jan18 to (Time.day, Age.1, Hfoodcake.Neptune.4pack.(B123 + B101))), Geog.ProductInventorySite.Hartford, NonFoodAsset.all
x ((Age - 1), Time.Jan18, Hfoodcake.Neptune.4pack.(B123 + B101))

In words, it says to take the quantity sold from the sales and marketing cube and divide it by the total inventory throughput as calculated in the product inventory

Hfoodcake.            (Age-1), Time.Jan18    Qty_Sold {kg}    Cost {$}
Neptune.4pack.B123    2                      25.0             3.76
Neptune.4pack.B101    3                      22.5             5.66

Figure 16.22 Cost of storing foodcakes.

Cost {$/Kg}, Asset.
FCI_Business_Process.Product Inventory        All     Labor    Facility    Utilities
Hfoodcake.Neptune.4pack.B123                  0.15    0.06     0.06        0.03
Hfoodcake.Neptune.4pack.B101                  0.25    0.10     0.09        0.06

Figure 16.23 Asset costs of product inventory on a per kilogram basis.

throughput cube for each of the two time periods corresponding to the time period that batch 123 spent in inventory and the time period that batch 101 spent in inventory, and multiply each of the two ratios by the total costs of inventory over the time period each foodcake was in inventory. This yields cost per kilogram per day, which then needs to be multiplied by the number of days spent in inventory as was done in the bottom expression. The results are shown in Figure 16.22. Finally, we have the information to calculate the cost of product inventory on a per kilogram basis as shown in Figure 16.23.
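The allocation Lulu and Thor just described can be sketched in a few lines of Python. The function mirrors the shape of the revised formula (sold quantity over total throughput, times total inventory cost, times days in inventory); the inputs stand in for the [S&M], [PI Throughput], and [PI Costs] lookups, and the sketch is a simplification, so it will not necessarily reproduce the printed Figure 16.22 values exactly.

# Share of Hartford's inventory costs attributed to the foodcakes of one batch
# that were actually sold, scaled by the days the batch sat in inventory.
def cost_of_storing(qty_sold_kg, total_throughput_kg,
                    total_inventory_cost, days_in_inventory):
    share = qty_sold_kg / total_throughput_kg      # fraction of Hartford throughput
    return share * total_inventory_cost * days_in_inventory

# Batch 123: 25.0 kg sold, 45,150 kg total throughput, $2,524 cumulative cost,
# 2 days in inventory (values taken from Figures 16.20 through 16.22).
print(round(cost_of_storing(25.0, 45150.0, 2524.0, 2), 2))   # about 2.80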

Transport from Production to Product Inventory

The cost of the transport from FCI's production facilities to the Hartford distribution center is similar to the cost of the transport from the distribution center to the stores. But there are some differences. Can anybody suggest what they are, asked Lulu. What do you think? Think about the business differences, said Lulu. Since not all foodcakes are produced in all production facilities, a single distribution center may receive foodcakes from more than one production facility. How would you reflect that difference in the design of the [Distribution] schema, asked Lulu? I suppose I would define an M/N or Cartesian relationship between the ship from (Geog.ProductionSite.) and ship to (Geog.ProductInventorySite) dimensions rather than the 1/N relationship used in the final distribution to the stores, said one of the

managers. Great answer, said Lulu. That’s exactly right. So the schemas for product shipping would look like the following:

[Distribution] Schema: (Time.Day. ⊗ Geog.ProductionSite. ⊗ Geog.ProductInventorySite.) ~ Method {N}, Total_Distance {km}, Fixed_Cost {$}, CargoID {I}, Arrival_Time {qtr_hour}, Cargo_Distribution_Cost {$}, Distribution_Cost_per_kg {$/kg}

[Transport Method] Schema: (Time.Day.atabove. ⊗ Method.) ~ Cost_per_kg_km {$/(kg*km)}

[Cargo] Schema: (CargoID. ⊗ Hfoodcake.) ~ Qty_Shipped {kg}

Although the Geog.ProductionSite. and Geog.ProductInventorySite. relationships are more complex here than in the final product distribution, the CargoID is simpler than its equivalent in final distribution because each vehicle only makes one stop, namely at the product inventory site. Thus, the cost of transporting the Neptune cakes is the multiplicative product of the cost per kg*km for the transportation method of their respective CargoIDs and the quantity {kg} of Neptunes sold. Of course, since the two batches traveled on two different days, possibly using two different methods, the actual costs may be different for each batch. Since we have limited time, I'm just going to show you the specific formulas for calculating distribution costs per kilogram of product sold for one of the two batches, namely the Neptune cakes sold in Cambridge. Of course, the general formula for calculating distribution costs is the same for both batches; the only difference is the date of shipment. The Neptune cakes destined for Cambridge traveled on January 16; those destined for Allston traveled on January 15.

[Distribution]: "Cargo_Distribution_Cost" {$} = Fixed_Cost {$} + (Total_Distance {km} x [Cargo]: (Qty_Shipped {kg}, Hfoodcake.all) x [Transport Method]: (Cost_per_kg_km {$/(kg*km)})), Time.Jan16, Geog.ProductionSite.Vancouver, Geog.ProductInventorySite.Hartford

In words, the formula says that cargo distribution costs for the Neptune cakes des- tined for Cambridge are equal to fixed costs, taken from the Distribution cube, plus variable costs defined as distance times quantity shipped times the cost per kilogram and kilometer as taken from the Transport Method cube. The formula further specifies Vancouver as the ship-from location, Hartford as the ship-to location and January 16 as the arrival date. That cost is then combined with the ratio of kilograms of Neptune cakes sold taken from the Sales and Marketing cube to the kilograms shipped for all of CargoID.567 as defined in the formula fragment that follows:

[S&M]: (Qty_Sold {kg}, Hfoodcake.Neptune.4pack.B123, Time.Jan18, Geog.Store.Cambridge)
÷ (Qty_Shipped {kg}, CargoID.567)
x [Distribution]: (Cargo_Distribution_Cost {$}, CargoID.567)

Hey wait a minute, said one of the managers. I understand the formula, but where did you get that CargoID number? Lulu responded, good observation! I got that number from perusing the Product Distribution cube and knowing what date(s) I needed. If, however, I didn't know the CargoID, I could have asked for it in a relative way as with the following query:

[Distribution]: "CargoID" {I}, Geog.ProductionSite.Vancouver , Geog.ProductInventorySite.Hartford , Time.[PI Throughput]:(Time.day.this-1 , Age.1 , Hfoodcake.Neptune.4pack.B123)

Could you explain how you derived CargoID, asked one of the managers. Sure, said Thor. Remember, in the [Distribution] schema CargoID is a variable. For every unique date, production site, and product inventory site, there is exactly one CargoID. We know the production site and the product inventory site, but the dates, unless we want to manually pluck them from product inventory, need to be derived. Since it takes one day for products to be shipped from the production facility to the distribution center, the shipping date in the Distribution cube is one day prior to the day that the products arrived in inventory, which we track as the day on which the products are one day old. Thus, we're asking for the CargoID that corresponds to the date in the Inventory cube that is one day prior to the day when the batch was one day old. Figures 16.24 through 16.26 show that the distribution costs per kilogram are similar between the two different batches.

Column: Qty {kg}
Row: CargoID., Hfoodcake.

CargoID.567
  Neptune.4pack.B123     205
  Neptune.8pack.B127     150
  SeaBreeze.8pack.B43    190
  Hfoodcake.All          7,000

CargoID.556
  Neptune.4pack.B101     210
  SeaBreeze.4pack.B25    175
  Poseidon.8pack.B120    165
  Hfoodcake.All          8,500

Figure 16.24 Quantity transported per unique cargo shipment.

Column: Time.Day
Row: MethodID., Variables { }

                                                Jan15       Jan16
MethodID.Truck   Cost_per_kg_km {$/(kg x km)}   0.000089    0.000089
MethodID.Plane   Cost_per_kg_km {$/(kg x km)}   0.000160    0.000170
MethodID.Train   Cost_per_kg_km {$/(kg x km)}   0.000120    0.000110

Figure 16.25 Cost per kilogram and kilometer per method of transport.

Page: Geog.ProductionSite.Vancouver, Geog.ProductInventorySite.Hartford
Column: Time.Day
Row: Variables { }

                                      Jan15     Jan16
MethodID                              Plane     Plane
Total_Distance {km}                   5,000     5,000
Fixed_Cost {$}                        275       275
CargoID                               556       567
Arrival_Time                          Jan15     Jan16
Cargo_Distribution_Cost {$}           7,075     5,875
Distribution_Cost_per_kg {$/kg}       0.832     0.839

Figure 16.26 Distribution costs by day.
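The Cargo_Distribution_Cost formula can be checked by hand against the January 15 column of Figures 16.24 through 16.26. A minimal Python sketch using those published values (the function simply restates the fixed-plus-variable formula above):

# Cargo distribution cost = fixed cost + distance x quantity shipped x rate,
# then divided by the quantity shipped to get the per-kilogram figure.
def cargo_distribution_cost(fixed_cost, distance_km, qty_shipped_kg, cost_per_kg_km):
    return fixed_cost + distance_km * qty_shipped_kg * cost_per_kg_km

# January 15 shipment (CargoID.556): plane from Vancouver to Hartford.
fixed, distance, qty, rate = 275.0, 5000.0, 8500.0, 0.000160
total = cargo_distribution_cost(fixed, distance, qty, rate)
print(round(total), round(total / qty, 3))   # 7075 0.832, matching Figure 16.26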

As compared with the costs of final distribution, the costs of product shipment, getting foodcakes from the production site to the product inventory site, are substantially higher than the costs of moving them from the product inventory site to the store. That's because these two batches of Neptune cakes traveled by plane from Vancouver to Hartford as compared with the cakes' final journey by delivery truck from Hartford to the Boston zone. The total cost per kilogram for foodcakes in each batch is shown in Figure 16.27.

FCI_Business_Process.Transport from Production to Product Inventory
Hfoodcake.              Cost {$/Kg}, Asset.all
Neptune.4pack.B123      0.839
Neptune.4pack.B101      0.832

Figure 16.27 Transport costs per kilogram.

Production

So far we’ve traced our Neptune cakes back from the Boston zone where they were sold on January 18 to Vancouver where Neptune batch 123 was put on a plane on Jan- uary 16 and Neptune batch 101 was put on a plane on January 15. Our next task is to determine how much it cost to produce each of the two batches of Neptune cakes. Are you all still with me, asked Lulu? Most everyone nodded yes. One of the managers raised his hand. I don’t mean to ask a silly question, but can you reinforce for me the reason, as I can see from the blackboard you’re about to do again, why you show us all these schemas. Before Lulu could respond, the CEO stepped in. May I, he said. Cer- tainly, said Lulu. What Lulu and Thor are so ably doing is showing you, probably for the first time, the inner engine of our business. FCI is engaged in many activities in order to sell food- cakes. Although we don’t have to know every last detail of every process, it is essential that all of us in senior management understand how the activities we engage in relate to one another. The schemas are a succinct way of showing, for each business process, the major variables or facts we measure or track, and the major variables we need to derive in order to have the information required to make appropriate decisions. The formulas are then the specific way we calculate those derivations. I think you should all understand by now that the combination of the schemas and formulas reflect our current business practices at both a tactical and a strategic level. If there are changes that we think need to be made in our business, we can simulate them in our business model. For example, before this presentation is over, we may come to the conclusion that some business processes that we currently perform ourselves, such as product shipping, would be more efficiently performed through outsourcing or vice versa. Lulu then continued. There are two schemas we use to represent the production process. These are shown with the relevant variables. Note the reemergence of a batch ordinal or sequencing dimension (as discussed earlier in this chapter).

[Production] schema

(Time.Day. ⊗ Geog.ProductionSite. ⊗ Stuff. ⊗ Process. ⊗ BatchOrdinal.) ~

Qty_Input {kg}, Qty_Output {kg}, Production_Cost {$}, Production_Cost_per_kg {$/kg}

(Time.Day. ⊗ Geog.ProductionSite. ⊗ Stuff.all ⊗ Process. ⊗ BatchOrdinal.) ~

Start_Time {qtr_hour}, End_Time {qtr_hour}

(Time.Day. ⊗ Geog.ProductionSite. ⊗ Stuff.all ⊗ Process.all ⊗ BatchOrdinal.) ~

BatchID {string}

[Batch] Schema:

BatchID. ~ MachineID, Time.Day, Geog.ProductionSite, BatchOrdinal, (Start_Time, End_Time, Process.leaf.)
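To make the two production schemas concrete, here is a rough sketch of how a single cell of each might look if held as a plain Python record. The field names follow the schemas above; the quantity and cost values are invented placeholders, and only the machine, date, and site for batch 123 come from Figure 16.28.

# One cell of the [Production] cube: keyed by day, site, stuff, process, and
# batch ordinal, carrying the input/output and cost variables.
production_cell = {
    "Time.Day": "Jan16", "Geog.ProductionSite": "Vancouver",
    "Stuff": "Fish.Tuna", "Process": "Cleaning", "BatchOrdinal": 1,
    "Qty_Input_kg": 400.0, "Qty_Output_kg": 380.0,      # placeholder quantities
    "Production_Cost": 520.0,                            # placeholder cost
    "Production_Cost_per_kg": None,                      # derived later
}

# One record of the [Batch] cube: BatchID resolves to where and when the batch
# was produced, plus start/end times per leaf-level process.
batch_record = {
    "BatchID": "123", "MachineID": 1, "BatchOrdinal": 1,
    "Time.Day": "Jan16", "Geog.ProductionSite": "Vancouver",
    "Process_times": {"Cleaning": ("9:00 AM", "10:00 AM"),    # placeholder end time
                      "Packing": ("11:00 AM", "12:00 PM")},   # placeholder start time
}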

To calculate the total costs of production, we need to sum the dollar value of all the nonfoodsource assets consumed, which costs have already been allocated to the day level, such as electricity, capital equipment, labor, and so forth. Why aren't we counting the foodsource costs, asked one of the junior managers? Because that would be double counting, answered Thor. Foodsource costs are accounted for at purchasing time. The key thing to keep in mind is that some of the costs are directly measurable at production time, such as electricity used. But many other costs, such as capital costs and labor costs, are not measured on a daily basis, but only at the weekly or monthly level. Those costs then need to be allocated down to the minute level since production times are measured in minutes. Isn't that a little over precise, said one of the junior managers, somewhat sarcastically. Not at all, replied one of the senior production managers. Foodcake production is very capital intensive; 85 percent of the nonfoodsource costs are for capital depreciation. It costs almost $1,000 per hour per machine, whether it's running or not; that's almost $17 per minute, per machine. The senior production manager put it very succinctly, said Thor. Notwithstanding unplanned repairs, machine costs are essentially fixed. It's because those costs are so high that FCI management, I believe, has been toying with the idea of moving from two shifts per day to three. Another manager then asked what this Stuff dimension was all about. Why are you talking about stuff instead of foodsources? Good question, replied Lulu. Can anyone venture a guess? What do you think? When no one raised his or her hand, Lulu continued. One of the interesting things about production is that the output of one process is the input to another. Furthermore, during production most everything that FCI would call a foodsource or a nonfood asset or a supply or a product is either input or output or both. We toyed with the idea of creating separate variables for every kind of stuff input to or output from production, but that would have resulted in lots of confusion from such an abundance of variables. Since the input and output values are dimensioned the same regardless of whether it's fish input or electricity input, or whatever it is input or output, you don't want to treat different kinds of stuff as separate dimensions or as distinctions between variables. Thus, we chose to combine all the types of stuff that are either input to or output from any production process and call the combined dimension Stuff.

Team-Fly® FCI’s Activity-Based Management 435

Okay, I understand what this Stuff dimension is all about, said one of the managers. But I still don't follow your production schema. Why is there only one Stuff dimension? During my career, I have built many input-output models, and though I didn't use multidimensional tools to do my modeling, what I did was multidimensional and I always treated inputs and outputs as separate dimensions. The classic case is probably the input-output model of the economy that allows us to see which outputs of which economic sectors are used as inputs in others. If it were up to me, I would have created a [Production] schema with the following basic form:

(Time.Day. ⊗ Geog.ProductionSite. ⊗ Process. ⊗ BatchOrdinal. ⊗ Stuff.input. ⊗ Stuff.output.) ~ Qty {kg}

Do you agree with this manager? What do you think? Lulu was polite in her response. Yes, she said, you can certainly model some input-output equations by making an explicit input and output dimension, but you need to understand the combinatorial relationships between the inputs and the outputs before deciding on a schema form. In the same way that you've seen there are differences in the relationship between origin and destination in transportation schemas (we've seen both MxN and 1xN relationships), there are differences in the input-output relationships in production processes. With the exception of fish, which are both input and output in the cleaning process, most stuff is either input only, such as the nonfood assets or fish in mixing, or output only, such as foodcakes in mixing. By making the input-output distinction in the variables rather than in the dimensions, we were able to model this fact more easily by making the appropriate adjustments to our application ranges. The most difficult part was creating specialized location ranges for the input and output variables in terms of stuff and process. In other words, Qty_Input {kg} for Stuff.foodcakes only applies to Process.freezing&packing, and Qty_Output {kg} for Stuff.fish only applies to Process.cleaning. (These application range restrictions were not shown previously.) But there's no way around that if you want a useful model. Anyway, said Lulu, let's calculate what it cost to produce our two batches of Neptune foodcakes, beginning with finding the machine that produced the foodcakes and the date and time of production. She typed into her laptop the following query, whose output was projected on the wall, and shown in Figure 16.28:

Time.day, Geog.ProductionSite, MachineID, (Start_Time, Process.Cleaning), (End_Time, Process.Packing), Hfoodcake.Neptune.(B123, B101)

Batch    Time.day    Site         Machine    Start time (Process.cleaning)    Stop time (Process.packing)
101      Jan15       Vancouver    4          3:00 P.M.                        7:00 P.M.
123      Jan16       Vancouver    1          9:00 A.M.                        12:00 P.M.

Figure 16.28 Processing time by day and machine.

Already we can see that for some reason the Neptune foodcakes produced on Machine 4 on January 15 consumed 33 percent more time than the foodcakes on January 16. Since we all know that each batch produces about 1,000 kilograms and that each machine costs about $1,000 per hour, we can already guess that the cost of production for batch 101 was around $1 per kilogram higher than the cost of production for batch 123! Was that a one-time event, asked one of the managers, or is there some pattern based on the machine or day or employees that makes some batches systematically more expensive to produce than others? Good question, replied Lulu. Since we know where and on what machine each of the two batches was produced, we need to query for their relative production costs, as well as the relative production costs for the two machines in general. The general formulas for calculating total production costs and production costs per kilogram are straightforward and presented as follows:

[Production]: "Production_Cost"{$}, L.leaf.above. = Sum(Production_Cost{$}, L.leaf.)

[Production]: "Production_Cost_per_kg"{$/kg} = Production_Cost{$} ÷ (Qty_Output{kg}, Process.Packaging)
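Since, as noted a moment ago, each batch yields roughly 1,000 kilograms, the per-kilogram figures in Figure 16.29 follow directly from the asset costs. A minimal Python sketch of the two formulas, using the batch 123 asset costs from that figure and an assumed 1,000 kg of packaged output:

# Roll up production cost across assets, then divide by packaged output.
asset_costs_b123 = {"Capital": 2870.0, "Labor": 250.0,
                    "Facility": 100.0, "Utilities": 300.0}

production_cost = sum(asset_costs_b123.values())          # Production_Cost {$}
qty_output_packaging_kg = 1000.0                           # assumed batch output
cost_per_kg = production_cost / qty_output_packaging_kg    # Production_Cost_per_kg {$/kg}

print(production_cost, cost_per_kg)   # 3520.0 3.52, as in Figure 16.29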

Applying those formulas to the specific batches, we create the views shown in Figure 16.29. We can see that indeed the cost of production on a per kilogram basis was $0.88 higher for batch 101 than for batch 123! That’s a huge difference. Furthermore, nearly the entire cost differential was attributable to the increased capital costs for running

Page: Geog.ProductionSite.Vancouver
Column: Asset.
Row: MachineID (~ Hfoodcake.leaf), Variables { }

                                                   Capital    Labor    Facility    Utilities    All
Machine 1, Hfoodcake.B123
  Production Cost {$}                              2870       250      100         300          3520
  Avg. Cost of Production per Kilo {$/kg}          2.87       0.25     0.10        0.30         3.52
Machine 4, Hfoodcake.B101
  Production Cost {$}                              3640       290      120         350          4400
  Avg. Cost of Production per Kilo {$/kg}          3.64       0.29     0.12        0.35         4.40

Figure 16.29 Production costs by machine and batch.

Machine 4 the extra hour. By summing our production data to the quarter level and looking at average production time, number of machine stoppages, and average cost of production per kilogram, we can see if the high production cost for batch 101 was a one-time event or whether it is symptomatic of higher production costs in general for Machine 4. Figure 16.30 shows this information for the previous few quarters. It looks like a little of each, exclaimed one of the managers. Average production costs for Machine 4 are indeed higher than for the other machines in Vancouver, though not nearly as high as they were when producing batch 101. Maybe Machine 4 is prone to major problems. In any event, said the production manager, something needs to be done. The total costs per kilogram of processing our two batches are shown in Figure 16.31 and broken out by asset.

Page: Geog.ProductionSite.Vancouver
Column: Time.Year.Qtr
Row: MachineID., Variables { }

                                          2000.Q1    2000.Q2    2000.Q3
MachineID.1
  Avg. Production Time {hrs}              3.0        3.1        3.0
  Number of Machine Stoppages {N}         12         14         13
  Avg. Cost of Production {$/kg}          3.50       3.63       3.68
MachineID.4
  Avg. Production Time {hrs}              3.5        3.3        3.4
  Number of Machine Stoppages {N}         15         14         14
  Avg. Cost of Production {$/kg}          3.90       3.82       3.75

Figure 16.30 Quarter-level average production time, number of machine stoppages, and average cost of production.

FCI_Business_Process.Production
Column: Asset.                              All     Labor    Facility    Utilities    Capital
Neptune.4pack.B123   Costs {$/kg}           3.52    0.25     0.10        0.30         2.87
Neptune.4pack.B101   Costs {$/kg}           4.40    0.29     0.12        0.35         3.64

Figure 16.31 Cumulative production costs by batch.

Materials Inventory

As someone pointed out earlier, and as I then explained, FCI does not track BatchIDs before production. But though we can't tell which market the fish came from or which boat or supplier we bought it from, we do know how many lots of fish or other food supplies there are per batch produced and the ages of each lot and (from looking at the purchasing data) the weighted average price of each lot, and thus of all the foodsources used during production. We can figure out the relevant foodsource information by looking at the outflows from materials inventory. This is because all the foodsources used during production on any particular day came from that production center's materials inventory. Can anyone offer a logical consequence of this fact? After a minute's silence Lulu continued. This means that the inventory costs and the weighted average price of any foodsource used is the same for all production on a given day for a given production facility. Consider now the description of the materials inventory schemas.

Materials Inventory Schemas:

[MI Throughput] Schema: (Time.Day.atabove. ⊗ Geog.MaterialsInventorySite. ⊗ Stuff.Foodsource. ⊗ Age.) ~ Qty_Sent {kg}, Qty_Received {kg}, Qty_On_Hand {kg}, Qty_Throughput {kg}, Storage_Cost_per_kg {$/kg}, Cumulative_Qty_Throughput {kg}, Cumulative_Storage_Cost_per_kg {$/kg}

[MI Cost] Schema: (Time.Day.atabove. ⊗ Geog.MaterialsInventorySite. ⊗ Stuff.NonFoodAsset.) ~ Storage_Cost {$}, Cumulative_Storage_Cost {$}

[MI Batch] Schema: (Stuff.Foodsource. ⊗ Age. ⊗ BatchID.) ~ Qty_Used_in_Batch {kg}, Storage_Cost_in_Batch {$}, Storage_Cost_per_kg_in_Batch {$/kg}

The first thing we need to know is the amount of each type and age of fish sent to production on January 15 and 16 that was used in the production of those Neptune foodcakes. This is illustrated with the following query (which is for batch 123 produced on January 16 only):

[Material Inventory Thruput]: "Fish" {N}, "Qty sent to production" {Kg}, Age., Time.Jan15, Geog.Vancouver, [Production]: "Qty input" {Kg}, ((Machine, Start time.(process.cleaning)). [Batch]: BatchID.123) > 0

Age/Fish    Tuna    Salmon    Cod
1           0       0         0
2           0       0         0
3           0       0         0
4           200     300       200
5           150     200       250
6           50      0         0

Figure 16.32 Fish sent to production in Vancouver on January 15 that were used in batch 123 on January 16.

In words, it says to find the fish and the quantities that were sent to production on January 15 in Vancouver for every age bin from inventory where the quantity of fish input into the specific machine and process time used for the production of batch 123 was greater than 0. Alternatively stated, for every kind of fish that was used in the production of batch 123, what was the total amount of that fish sent that day to production by age bin. The results are illustrated in Figure 16.32. Could you slow down a bit Lulu, asked one of the managers. I'm trying to follow and everything made sense, but that seemingly innocent query just pulled in data from three different cubes. Going forward, could you explain your steps in a little more detail? Thor stepped in. I wasn't as close to the material inventory calculations as Lulu; I think I can move forward at a slower pace. Now that we know the fish sent to production for batch 123 (and as easily for batch 101), we can calculate the inventory cost of storing those fish, and then figure out what percentage of that cost accrues to the fish used for each of our batches of Neptune foodcakes. How would you calculate inventory costs? How would you express those calculations? Think about it. There are several cubes that all have information relevant to figuring out inventory costs for batch 123 (and 101): the Materials Inventory cube, the Production cube, and the Batch cube. In words, we want to represent the total cost of inventory for the foodsources for batch 123 as the sum of the inventory costs of each of the age groups of foodsources (where each of those costs is determined as the ratio of the amount of foodsource per lot to the total inventory costs for the time period each lot was held times the percentage of each lot used by batch 123). Can anyone tell me why I don't care about fish type? Think about it. Because inventory costs vary by the quantity of fish stored and the amount of time that fish was stored, but not by the type of fish stored, one of the managers replied. Yes, said Thor. That's exactly right. You can all see from Figure 16.33 that there are three different age groups from which fish was sourced: six-day-old, five-day-old, and four-day-old. I think we need to multiply the cost per kilogram per day of storing fish by the number of kilograms held per amount of time stored, said one of the managers. Do you agree? Thor didn't.

Page: Cumulative_Storage_Cost_per_kg {$/kg}, Geog.MaterialsInventorySite.Vancouver, Time.Jan18
Column: Foodsource.all, Age.
Row: Hfoodcake.

                        Age.4    Age.5    Age.6
Neptune.4pack.B123      0.43     0.53     0.64
Neptune.4pack.B101      0.39     0.43     0.53

Figure 16.33 Cumulative storage cost per kilogram per batch.

If we did that, said Thor, it would be equivalent to using a standard cost for inventory. While admittedly that might be close enough for this example, where we are combining fish stored on similar days (January 9-15, January 10-15, and January 11-15), we want to allow as much as possible for the chance of differential costs at the base level of our model. For example, if weekend labor rates are substantially higher (and January 9 and 10 fell on a weekend), that would show up. If there were a sudden jump in electrical rates on January 11, or if we negotiated a new labor contract, these changes to our cost structure would not appear if we assumed them away by adopting standard costs. Also remember, our mission is to ferret out the sources of erratic cost variances for FCI. Therefore, we calculate the cost of inventory for each age lot separately, and then sum them together. The first thing we need to do is figure out the cost of storage per day that applies to all fish of all ages, basically everything that was in house per day, as with the following expression. Keep in mind that [MIT] refers to the Materials Inventory Throughput cube and [MIC] refers to the Materials Inventory Cost cube.

[MIT]: "Storage_Cost_per_kg"{$/kg}, Foodsource.all, Age.all, Time.Day. = [MIC]: (Storage_Cost{$}, NonFoodAsset.all) ÷ Qty_Throughput{kg}

Next, we need to calculate the cumulative quantities of fish put through inventory (Qty_Throughput) and the cumulative storage costs for each of the time periods that each of the fish age groups spent in inventory. The general formula for cumulative quantity throughput is as follows:

[MIT]: "Cumulative_Qty_Throughput"{kg}, Time.Day., Age. =

Sum( Qty_Throughput{kg}, Time.Day.((this+1-Age.) to this) )

Nearly all the managers immediately raised their hands. What does the clause Time.Day.((this+1-Age.) to this) mean, they all asked? Good question, replied Thor. Can anyone describe what we're doing, if not the exact parsing of the clause? It looks like a range specification, said one of the managers. That's right, said Thor. We're using the range of fish ages sent to production to select a set of days for which we're calculating quantities throughput. Since fish enters materials inventory at an age of one day, we need to add one day to the current day (Day.this). In words, the clause says to take the set of days that ranges from Time.Day.(this+1-Age.) to Time.Day.this, keeping in mind that the period (.) at the end of the term Age means to take every age, thus ages 1 to 6. Thus, we're summing the throughput of fish in inventory for five ranges of days: January 10-15, 11-15, 12-15, 13-15, and 14-15. Next, we need to figure out the cumulative storage costs, which we do with the following formula. As you can see, the method of calculating the appropriate dates is the same as for the previous formula.

[MI Cost]: "Cumulative_Storage_Cost"{$}, Time.Day., Age. =

Sum( "Storage_Cost"{$}, Time.Day.((this+1-Age.) to this) )
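The Time.Day.((this+1-Age.) to this) clause simply builds, for each age, a trailing window of days that ends on the current day. A small Python sketch of both cumulative formulas, with invented daily throughput and storage-cost values (January 15 plays the role of Time.Day.this):

# Daily throughput {kg} and storage cost {$} for materials inventory (illustrative).
daily = {
    "Jan10": (9000.0, 180.0), "Jan11": (9500.0, 190.0), "Jan12": (8800.0, 176.0),
    "Jan13": (9200.0, 184.0), "Jan14": (9100.0, 182.0), "Jan15": (9400.0, 188.0),
}
days = list(daily)              # ordered Jan10 .. Jan15
this = days.index("Jan15")

def trailing_window(age):
    # Time.Day.((this + 1 - Age.) to this): the last `age` days, ending today.
    return days[this + 1 - age : this + 1]

for age in (4, 5, 6):
    window = trailing_window(age)
    cum_qty = sum(daily[d][0] for d in window)    # Cumulative_Qty_Throughput {kg}
    cum_cost = sum(daily[d][1] for d in window)   # Cumulative_Storage_Cost {$}
    print(age, window[0], cum_qty, cum_cost)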

The next step is to combine quantities throughput and costs to figure out the cumulative storage costs per kilogram as is done with the expression that follows:

[MIT]: "Cumulative_Storage_Cost_per_kg"{$/kg}, Foodsource.all, Time.Day., Age. = Sum("Storage_Cost"{$} ÷ "Qty_Throughput"{kg}){$/(kg*day)} x ("Age"{I} , Time.day.this)

What’s that multiplication doing at the end, asked one of the managers? Thor replied that they needed to do that because the ratio of cumulative costs per through- put yields an average cost per kilogram per day whereas we need to know the total cost of storage per kilogram for all the days that each age group spent in inventory. Fig- ure 16.33 shows the costs for each of the three ages of fish inventory used for batch 123 (and 101). Thor asked the managers, so how do we get from knowing the cost per kilogram for each age bin of fish used to make those Neptune cakes to knowing the cost for the fish actually used to make each particular batch? Can’t we just multiply the cost for each age lot by the ratio between the quantity used for batch 123 and the entire lot, asked one of the managers. That would give us three calculations—one each for Age.6, Age.5, and Age.4. Thor seemed okay with this tactic; Lulu furled her brow. What do you think? You almost got it, Lulu said. You’re right that the cost of fish storage is a function of the quantity of fish and the dates it is in storage, and thus is independent of the type of fish. So we only need to make three calculations: one for each age of fish used in 442 Chapter 16

production (6-, 5-, and 4-day-old fish). However, what we need to calculate for each age of fish used in production is the weighted average of the cost of storage per kilogram (which we calculated earlier) where the weighting is based on the ratio of fish used in production of that age divided by all the fish used in production. To do that, however, requires one extra manipulation, namely the creation of a hybrid cube [MI Batch] that joins the Materials Inventory Throughput cube and the Batch cube. There are still several formulas required, so I just wrote out one of the main and representative ones (the full listing of formulas is in Chapter 17) and the final view result (see Figure 16.34).

[MI Batch]: "Storage_Cost_in_Batch"{$}, Foodsource.all, Age.all = Sum( (Cumulative_Storage_Cost_per_kg{$/kg}, Time.([Batch]:(Time.Day.(this-1), BatchID.)) x Qty_Used_in_Batch{kg}), Age. )
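A sketch of the Storage_Cost_in_Batch sum, using the batch 123 quantities from Figure 16.32 and the per-kilogram costs from Figure 16.33. It is a simplification: it skips the [Batch] date lookup in the formula and the further per-batch allocation steps worked through in Chapter 17.

# Quantity of foodsource used in batch 123 by inventory age (kg),
# summed across fish types from Figure 16.32.
qty_used_by_age = {4: 200 + 300 + 200, 5: 150 + 200 + 250, 6: 50}

# Cumulative storage cost per kilogram by age for batch 123 (Figure 16.33).
cum_cost_per_kg = {4: 0.43, 5: 0.53, 6: 0.64}

# Storage_Cost_in_Batch {$}: sum over ages of (cost per kg x kg used in batch).
storage_cost_in_batch = sum(cum_cost_per_kg[age] * qty_used_by_age[age]
                            for age in qty_used_by_age)
print(round(storage_cost_in_batch, 2))   # 651.0 with these figures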

Materials Shipping

FCI, as you all know, currently outsources its materials shipping (from the food markets to the production facilities inventory) to Food Transport International, FTI. Our deal with FTI is simple: They charge us $0.10 per 100 kilograms per kilometer, and they guarantee that all of our foodsources are delivered from anywhere in the world to any of our production inventory facilities within 24 hours. Although we've always felt this was a good deal, now that we are calculating the true costs of doing business, we can compare what we pay FTI with what it costs us internally to ship our products from our production centers to our distribution facilities and from our distribution facilities to the stores. Even though they do all the shipping, we still maintain our own shipping records. On any given day, we know how much of each type of food is being shipped from every market and the production inventory center where the shipment is going. We also know the distance traveled on each route, the weight of each shipment, and the cost, to us, of each shipment. FTI's records of what they are shipping are reconciled both at the food markets from where they depart and at the production inventory center where they arrive.

FCI_Business_Process.Materials Inventory
Column: Asset.                          All     Labor    Facility    Utilities
Neptune.4pack.B123   Cost {$/Kg}        0.48    0.15     0.23        0.10
Neptune.4pack.B101   Cost {$/Kg}        0.50    0.15     0.25        0.10

Figure 16.34 Inventory activity costs per kilogram.

We’ve already established that FCI does not track specific purchases of fish and so we can’t know exactly what the transportation costs were. However, we feel we can get pretty close. What would you do, Lulu asked, looking straight at one of the managers, to calculate the cost of transporting the foodsources that went into a particular batch of Neptune cakes so as to make the best use of available information? Think about what FCI does know. FCI knows, for any given foodsource, every market in every country where that foodsource was purchased. FCI also knows the relative amount of that foodsource pur- chased in every market where it was purchased. Also, FCI knows the distance from every purchasing market to every production site inventory location. So, for example, the transportation cost of the six-day-old salmon used for any cakes produced in Vancouver on January 15 (and thus also the cakes in Neptune batch 123) is a function of the relative amount of salmon purchased in each market on January (15 minus 6) 9 and the distance traveled from each purchasing market to Vancouver. The total transportation costs for all the foodsources used in batch 123 is then the sum of the transportation costs for each food and age. There are over 10 distinct types of calculations that need to be performed to figure out the cost per kilogram of each of the batches of Neptune foodcakes produced. So I’ll just show you two representative ones that get you all thinking, and then jump (like in those Julia Child cooking shows) to the calculated value for cost of transportation for each of our two batches of Neptune foodcakes. (Again I encourage you to work through all of the calculations as presented in Chapter 17.) First, you should all look at the relevant schema as shown here:

Materials Shipping Schema

[MS Distribution] Schema: (Time.Day. ⊗ Geog.Market. ⊗ Geog.MaterialsInventorySite. ⊗ Stuff.Foodsource.) ~ Qty_Sent {kg}, Shipping_Cost {$}

(Time.Day.all ⊗ Geog.Market. ⊗ Geog.MaterialsInventorySite. ⊗ Stuff.Foodsource.all) ~ Distance {km}, Shipping_Cost_per_kg_km {$/(kg*km)}, Shipping_Cost_per_kg {$/kg}

Before I continue, can anyone suggest an approach to calculating material shipping costs? What's the first thing we probably need to do? I'll give you a hint. It's similar to what we had to do to figure out the cost of materials inventory. What would you do? Don't we need to join our Batch dimension with the materials shipping schema, asked one of the managers? Yes that's right, replied Lulu. The only way we can calculate material shipping costs for particular batches of foodcakes produced is by maintaining the links between production batches and material shipping. Here's the [MS Batch] schema.

[MS Batch] Schema: (Time.Day.atabove. ⊗ Geog.Market. ⊗ Stuff.Foodsource. ⊗ BatchID.) ~ Qty_Shipped_in_Batch {kg},


Shipping_Cost_in_Batch {$}, Shipping_Cost_per_kg_in_Batch {$/kg}, Foodsource_Cost_in_Batch {$}, NonFoodAsset_Cost_in_Batch {$}, Foodsource_Cost_per_kg_in_Batch {$/kg}, NonFoodAsset_Cost_per_kg_in_Batch {$/kg}

(Time.Day.all ⊗ Geog.Market.all ⊗ Stuff.Foodsource.all ⊗ BatchID.) ~ Geog.MaterialsInventorySite {N}

Another manager had a question. Why do we have to create a new schema? Why can't we just calculate shipping costs per fish used in some batch straight in the materials shipping schema? Good question, said Lulu. The reason, and this applies to why we needed to create a Materials Inventory Batch cube, is because the resources consumed and the costs associated with the shipping vary by time, geography, fish, and batch. The fact that we are creating derived variables that vary according to a Batch dimension means that we need to associate them with a schema that has those same dimensions. Thus, we need to create batch-containing schemas. Now, how would you calculate the cost of fish shipped per batch-specific production? Let's assume you've already calculated the cost of shipping per kilogram per purchasing market with the destination of materials inventory in Vancouver. Also, let's assume you already know how much fish purchased in each market on each relevant day was actually sent to Vancouver. After no one offered an answer, Thor spoke up. First, I'd calculate the total shipping cost from each purchasing market to Vancouver. Then I'd divide that amount by the amount of fish sent from each market to Vancouver to obtain the cost per kilogram of fish shipped. I would then multiply that number by the amount of fish used in the batch-specific production of Neptune cakes to figure out the total cost of shipping for each type of fish of each age used, as with the following expression:

[MS Batch]: "Shipping_Cost_in_Batch"{$} = [MSD]: (Shipping_Cost{$} ÷ Qty_Sent{kg}, Geog.Market.all, [MSB]: (Geog.MaterialsInventorySite)) x Qty_Shipped_in_Batch{kg}

So, jumping straight to the punch line you can see (as shown in Figure 16.35) that there was almost a 15 percent difference in the cost per kilogram of materials shipping between our two batches of Neptune foodcakes.

FCI_Business_Process.Materials Shipping
Hfoodcake.              Costs {$/Kg}, Asset.All
Neptune.4pack.B123      .84
Neptune.4pack.B101      .95

Figure 16.35 Activity cost for materials shipping.
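A minimal sketch of the Shipping_Cost_in_Batch logic Thor described: the total shipping cost from each market, divided by the quantity sent from that market to the Vancouver materials inventory, times the quantity of that foodsource actually used in the batch. The per-market numbers below are invented placeholders for the [MS Distribution] and [MS Batch] lookups.

# Per-market shipping data for foodsources routed to the Vancouver materials
# inventory (illustrative values only).
markets = {
    "Local Market 1": {"shipping_cost": 900.0, "qty_sent_kg": 6000.0, "qty_in_batch_kg": 450.0},
    "Local Market 2": {"shipping_cost": 400.0, "qty_sent_kg": 2500.0, "qty_in_batch_kg": 250.0},
}

def shipping_cost_in_batch(m):
    cost_per_kg = m["shipping_cost"] / m["qty_sent_kg"]   # $/kg shipped on that route
    return cost_per_kg * m["qty_in_batch_kg"]             # $ attributed to the batch

total = sum(shipping_cost_in_batch(m) for m in markets.values())
print(round(total, 2))   # 107.5 with these illustrative numbers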

Team-Fly® FCI’s Activity-Based Management 445

Purchasing Costs

Thor began, total purchasing costs have three main components:

■■ The cost in local currency of the foodsource purchased
■■ The cost of local currency as determined by exchange rates and the local money that FCI's supplier has in stock
■■ The asset utilization costs associated with purchasing operations

Since nearly all of you attended the FCI purchasing seminar earlier this morning (and readers hopefully read Chapter 13 on purchasing and exchange), I don't want to go over territory already covered. Rather, I would like to discuss the information contained in the business process by Asset Cost cube that is now completed for both batches 123 and 101 and shown in Figure 16.36. Thor and Lulu, said the CEO, I'd like to take over from here. Thank you very much for a job extremely well done. And thank you, Jane, for championing the whole project. I look forward to your prototype becoming fully operational so that FCI can get the grip it needs on its cost (and quality) volatility, and succeed with its business and its public offering. I should add for the record that I was skeptical at first about the value of the information we were going to find through this process. I wasn't sure that the benefit was worth the cost. If it hadn't been for Jane's insistence, I probably never would have given the okay. So now it is time for me to publicly admit that not only is the information provided by this enterprise warehousing and OLAP solution of extremely high value for everyone from middle management on up to and including senior management and myself, but I actually learned some things about this business along the way. Although I always knew that everything was connected across our entire value chain, it was only by working through the flow of derivations during some late-night sessions with Lulu and Thor that I came to appreciate the quantity and pattern of interdependencies in our business. The cost per kilogram of carrying out currency exchange and purchasing materials is shown in Figure 16.36. So now that we've got this incredibly informative view of our company, let's think about what it says, starting with the bottom line differences between the two batches. Whereas we made $0.75 per kilogram of Neptune batch 123 sold, which is still below average for FCI, we lost almost $0.50 per kilogram of batch 101! That's an amazing difference for two batches of the same product produced in the same facility on two successive days and sent to the same inventory facility and sold in the same area on the same day for the same price! You can also see that while over 50 percent of our costs come from purchasing material and production, amazingly almost 40 percent of our costs are tied up in distribution and inventory of various forms. We definitely need to improve the turn efficiency of our materials inventory. And it looks like our internal cost of final product distribution is substantially less than the prices we're paying for outsourcing our materials shipping. We may want to consider internalizing some of our materials shipping. In addition to showing costs, the cube view highlighted in bold those costs that were more than one standard deviation from the mean. You can see that several of the

Costs {$/Kg}
FCI Business Processes                                  B101      B123
Purchasing & Currency Exchange                          .50       .25
Materials Shipping                                      .95       .84
Materials Inventory                                     .50       .48
Production                                              4.40      3.52
Transport from Production to Product Inventory          .832      .839
Product Inventory                                       .25       .15
Product to Stores                                       .15       .29
Sales & Marketing                                       .75       1.00
Management                                              .25       .23
Revenue                                                 11.00     11.00
EBIT                                                    (.48)     .75

Figure 16.36 A summary of costs by business process combined with revenues.

specific costs for each of our two batches were more than one standard deviation from the mean! That's a huge amount of volatility. But now that, thanks to our OLAP solution, we can identify from where the volatility is arising, I am confident that we can take the appropriate actions to reduce it.

Legal Issues: A Reader Exercise

During the course of Lulu and Thor's skunk works project, Peter, the General Counsel for FCI, while working on a new case of importance to FCI, came across a company reference to Lulu and Thor as resident analytical experts. Peter approached the two and asked them for help in modeling and analyzing a current legal situation for FCI. According to the General Counsel, an unscrupulous tofu vendor was suing FCI for breach of contract claiming that FCI had committed to purchasing 2,000,000 kilograms of tofu during the recently ended calendar year, whereas they had in fact only purchased 1,000,000 kilograms. The vendor was requesting 150 percent of the contract amount with the last 50 percent going for punitive purposes. An adverse judgment could cost FCI several million dollars. Peter clearly didn't want to lose, nor did he think FCI deserved to lose the case. He countered that there were several provisions in the contract they signed that allowed FCI to reduce the amount of tofu it actually purchased. Absent an out-of-court settlement between the parties, FCI was preparing to go to court as a defendant and wanted to analyze past court cases to see if there were any identifiable patterns between court locations and/or times and/or judges and opinions, and thus if there were any courts, times, or judges that should be especially sought or avoided. In addition, FCI needed to track the hours used by its in-house attorneys, as well as the hours billed to FCI by external lawyers. As soon as Peter stepped out of the room, Thor confessed to Lulu that he had never modeled any legal process before. Lulu replied that she too had no experience. Before either of them could panic, Thor mentioned that he did have experience modeling healthcare provision from his previous job and that many of the issues they wrestled with in healthcare, including the multiplicity of doctors, diagnoses, and treatments associated with a single patient visit, were mirrored by the legal process they were about to model and analyze. For example, in a typical healthcare provision application, a single patient encounter at a single time in a single location may involve multiple physicians, each offering multiple diagnoses for which some treatment or set of treatments then needs to be established, followed through, and tracked. In a typical legal caseload application, there are for each case at some particular time and place, possibly several associated judges, lawyers, defendants, and plaintiffs. There can even be multiple times and places associated with a case. When monetary verdicts are reached, the dollar amounts must be properly allocated across those involved. Hours spent by attorneys must be properly aggregated across time and space without over or under counting. From a modeling perspective, the legal process is an absolute M/N mania. The naïve OLAP approach treats each person present at a case at some time in some place as an independent dimension. Everything works smoothly when there is just one person from each role associated with a case, as illustrated with the following schema:

(Case. ⊗ Time. ⊗ Courthouse. ⊗ Judge. ⊗ Defense attorney. ⊗ Prosecuting attorney. ⊗ Defendant. ⊗ Plaintiff.) ~ Judgement

The problem then arises when more than one judge or attorney or defendant is present at the case. It is better to plan for multiple instances of judges, lawyers, and defendants to be present from the beginning. The best way to achieve that is to store your data in fourth rather than third normal form. In fourth normal form every unique dimensional combination for which any data is associated is represented. The schema for one such solution is as follows:

(Case.) ~ when opened , Case attributes , Current status , (Case.closed.) ~ Judgment

(Case.) ~< (Times , courthouses , Judges , Defense Lawyers , Prosecuting Lawyers, defendants , plaintiffs)

(Judges.) ~ who appointed by , salary , yrs prior experience

(Lawyers.) ~ who retained by ,

(defendants.) ~ personal information

(plaintiffs.) ~ personal information

((Case. ⊗ Time. ⊗ Courthouse.) ~< Judges) ~ billable hours

((Case. ⊗ Time. ⊗ Courthouse.) ~< Defense attorneys) ~ billable hours

((Case. ⊗ Time.all , Courthouse.all) ~< Defense attorneys) ~ billable hours , share of proceeds

((Case. ⊗ Time. ⊗ Courthouse) ~< Prosecuting attorney) ~ hours spent

((Case. ⊗* Time. ⊗* Courthouse) ~< Defendants) ~ hours appeared

((Case. ⊗ Time. ⊗ Courthouse) ~< Plaintiffs) ~ hours appeared

Given these legal schemas, what formula(s) would you define for counting the number of billable hours per FCI (or other grouped) attorney? What formula(s) would you create for allocating judgments across different attorneys? What formula(s) would you create to compare judgments of particular types of cases with judges, attorneys, and courthouses? The schema for one such solution is illustrated in Figure 16.37.

(Figure 16.37 is a schema diagram: a central CASE entity (~ judgement) linked to Defense Attorneys, Prosecuting Attorney, Defendant, Plaintiff, Judge, Courthouse, and Time.)

Figure 16.37 Legal.
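As a starting point for the reader exercise, here is one possible way (a sketch, not the book's answer) to count billable hours per attorney from fourth-normal-form records without double counting: because hours are stored per unique (case, time, courthouse, attorney) combination, a simple group-by over attorneys is safe.

# Fourth-normal-form facts: one row per unique (case, time, courthouse, attorney).
from collections import defaultdict

billable_rows = [
    # (case, time, courthouse, attorney, billable_hours) -- illustrative data
    ("Case1", "2001-03", "Boston",    "Attorney A", 12.0),
    ("Case1", "2001-03", "Boston",    "Attorney B",  8.0),
    ("Case1", "2001-04", "Cambridge", "Attorney A",  5.0),
    ("Case2", "2001-04", "Boston",    "Attorney A",  9.0),
]

hours_by_attorney = defaultdict(float)
for case, time, courthouse, attorney, hours in billable_rows:
    # Each row's hours belong to exactly one attorney, so summing never double counts.
    hours_by_attorney[attorney] += hours

print(dict(hours_by_attorney))   # {'Attorney A': 26.0, 'Attorney B': 8.0}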

Summary

In this chapter you’ve learned how to model an entire value chain by creating, and then combining a business asset dimension with a business process dimension. You saw that management can be represented both within each business process and at the all-process level. You learned how to combine batch-level production tags with the leaf level of a product dimension to create a cross-value-chain product-batch-tracking capability. You learned how to calculate the incremental activity-based costs on a per business process and per kilogram basis for product distribution, product inventory, product shipping, batch-level production, materials inventory, materials shipping, and purchasing. Along the way you learned about the need to create hybrid batch- containing join cubes for calculating materials inventory and shipping costs on a per batch basis. You learned about many different kinds of multicube formulas, including formulas whose time values depend on variable-constrained events from other cubes, and formulas that create sets of moving cumulative ranges. You’ve also learned how to model different kinds of input-output situations depending on the combinatorial rela- tionships between the inputs and the outputs.

CHAPTER 17

A Computational Example

The purpose of this chapter is to provide a fully computational example of one of Foodcakes International's (FCI) most important queries, namely calculating the positive or negative earnings accruing to FCI for a particular transaction. The calculation is based on what it actually costs across the entire value chain to produce some particular foodcake (with the exception of management), in this case batch 123 of Neptune fishcakes, sold on January 18 at a particular store in Cambridge, Massachusetts, combined with the revenues actually due to FCI net of any discounts. I chose this type of query as the basis for the computational example in this chapter because it touches on every business process in FCI. Even though I simplified the cube structure somewhat as compared with the level of detail described in Chapters 13 through 16, the example in this chapter still contains 19 different cubelets across 9 different business process cubes. It is a serious multicubelet example. The cubelets are shown in the order that is required to answer the top-level query, beginning with the sales and marketing cube and working backwards to materials purchasing. The cost information is accumulated in a special management cube that shows asset utilization and costs by business process and by time and product. The fact that this example dwells on costs should in no way be taken to imply that only costs deserve to be treated in such a holistic fashion. Several cross-business-function variable categories deserve equal treatment, including quality and productivity. There was simply no room in this book for a more comprehensive example. Each of the cubelets is shown as a view where the specification of the view is given in LC and where the specific contents shown are sufficient to resolve the top-level query. Every variable shown is defined in the same section as where the cube view is given.


The data values for all input variables are given. The formulas for all derived variables are given as well. The sequence of operations that you need to perform to evaluate every derived variable is also given. It is your job to calculate the values of all the derived variables. The answers to all the exercises, that is to say the values for all the calculated variables, are available on my companion Web site. In addition, you can also find a more challenging version of the exercise, one I highly recommend you try, where only the names of the derived variables and the sequence of operations are specified, but you are left with the task of defining the formulas in addition to evaluating them. You can think of this chapter as a big exercise in Online Analytical Processing (OLAP) of multidimensional algebra. Working through the OLAP problems by hand is like calculating a least squares correlation coefficient by hand so you understand how the process works before relying on statistics packages where, after loading a data table, you simply press a button. Before jumping into the exercises you should review all the contextual information:

■■ Dimension definitions
■■ Schemas
■■ Mapping of dimensions to cubes
■■ Business rules

The dimension definitions tell you the names of all the dimensions used in the cube calculation exercise and they give you a flavor for the kinds of elements or instances that exist for each dimension (or type). The schemas describe each of the 19 cubes organized by business process and presented in sequential order starting with sales and marketing and working backwards to materials purchasing. Please note that the schemas in the cube calculation exercise do not exactly match the schemas created and used in the four preceding application chapters, though they are extremely similar. This is because the cube calculation exercise represents a somewhat simplified version of the schemas presented in the application chapters. In contrast with Chapters 13 through 16, where a large part of each chapter was the working through of the modeling complexities of adding dimensions or deciding whether to model something as a dimension or a variable, here you will actually be calculating. The somewhat simplified schemas made for substantially simpler calculations. Originally, I tried to keep the numbers simple enough that all the calculations could be done in your head, but as with any reasonably realistic example, there were too many additions and divisions to keep from needing to use a calculator.

How to Work through the Exercise

The way to work through the exercises is to start with the first equation in the first chunk in the section titled "FCI Cost-Revenue Analysis Calculation Steps." That equation is for total revenues in the sales and marketing cube. You will notice that the business process header "Sales and Marketing" also identifies the figure number (or numbers) with which the equation is associated. Thus, the way to proceed is to carry out the equation listed as equation number 1 relative to Figure 17.2a, which is the associated view of the sales and marketing cube. That view contains all the raw data you need, such as wholesale price, list price, and so forth, required to perform the calculation. When you have performed the required calculation, write your answer down in the appropriate cell (or download a copy of the exercise from my companion Web site and mark that up), and at the end of your calculations for each business process, write your answer for the cost per kilogram per business process in the management costing cube view, as seen in Figure 17.1. If you have already perused the exercise, you may have noticed that the business processes between production and sales are substantially less complicated than the processes between materials purchasing and production. This is not to say that pre-production processes are inherently more complicated than post-production processes, but as far as tracking production batches is concerned, and as was discussed in Chapter 16, prior to production there is no product that exists, so costing takes place relative to the collection of materials that eventually become a product. Substantial, real-world complexity was left in the exercise. Thus, you will need to calculate raw materials costs as a function of purchases made in a variety of markets at different times in different countries with different currencies and with different shipping methods and costs. I apologize in advance for the cumbersomeness of having to refer back and forth in the book between the calculation steps, the schemas, and the views. When I use this material for teaching, I give students a separate workbook for each of the schemas, calculations, and cube views, as well as the contextual information comprised of the business rules, the dimension definitions, and the mapping of dimensions to cubes. Then they spread each of the workbooks out on a desk where they can easily see all of them at the same time. Again, I have made most of this material available on my companion Web site.

FCI Dimension Definitions

The following dimensions are used in the exercise. Example members are shown in parentheses:

Geog.Continent (Europe, North America, Asia...)
Geog.Country (U.S., Canada, France, Mexico...)
Zone (Boston Zone, New York Zone...)
Store (Cambridge, Boston, Plymouth...)
Bank (Local Bank 1, Local Bank 2...)
Market (Local Market 1, Local Market 2...)
MaterialsInventorySite (Vancouver, Boston, Nuremberg...)
ProductionSite (Vancouver, Boston, Nuremberg...)
ProductInventorySite (Hartford, Palo Alto, Chicago...)
Time.Year (2000, 2001, 2002...)
Quarter (1st, 2nd, 3rd, 4th)


Month (January, February, March...)
Day (Jan 1st, Jan 2nd, Jan 3rd...)
Hour (9 am, 10 am, 11 am...)
Minute (1, 2, 3...60)
BusinessProcess (S&M, Product Inventory, Distribution, Production...)

Production.Process (Cleaning, Mixing, Freezing, Packaging)

Stuff. (Foodsource, Product, NonFoodAsset.atunder)
Foodsource.MajorType (Fish, Vegetables, Filler, Flavor...)
Fish (Tuna, Salmon, French Sole, Bass)
Vegetables (Carrots, Tofu, Lentils...)
Filler (Flour, Bread, Cheese...)
Flavor (Salt, Pepper, Cumin, Cardamom...)

Product.Foodcake (Fishcake, Veggiecake)
Fishcake (Neptune, Poseidon, Sea Breeze)
Package (4pack, 8pack...)
Veggiecake (Yoshi's Dream, Drake's Diet)
NonFoodAsset (Labor, Capital, Energy...)
Hfoodcake (Neptune.4pack.Batch123, Poseidon.8pack.Batch456...)
MachineID (1, 2, 3, 4...)
VehicleID (1, 2, 3, 4...)
CargoID (1, 2, 3, 4...)

BatchID (1, 2, 3, 4...)

OrderID (1, 2, 3, 4...)

BatchOrdinal (1, 2, 3, 4...)

Age (1, 2, 3, 4...)

Global Business Rules

The following rules are used, where appropriate, in the calculations. For example, the rule that nothing is ever 0 days old means that wherever a business process has an age dimension, such as materials inventory and product inventory, materials or product arrives in inventory at an age of 1 day:

■■ Age: Nothing is ever 0 days old; inventory and purchasing ages begin at the age of 1 day.

■■ Inventory: Throughput is calculated by summing quantity received and quantity sent and dividing by 2. This is not the only formula for calculating throughput; there are many. But it is the one used in this exercise.


■■ Transit: [PI-Stores]: Products leave product inventory in the evening and arrive in stores the next day; they are counted as part of inventory on the day they leave. [Prod-PI]: Products are sent to product inventory the evening they are produced, arriving the next day. They are counted as inventory (age 1) the day they arrive. [MI-Prod]: Materials leave materials inventory in the morning and are used in production the same day; they are not counted as inventory the day they leave, nor is their inventory age incremented on that day. [Purch-MI]: Materials arrive in materials inventory 1 day after the day they are purchased; they are counted as part of inventory, with age 1, the day they arrive.

■■ Stores: Each store receives shipment of at most one incoming product order per day.

■■ Production: During production, fish inputs are well mixed, so the age ratio of the fish output is the same as the age ratio of the (larger-quantity) input that produced it.

■■ ProductionSite: All materials inventory output is input to production at the same site: [MI]: (Qty_Sent, Time.Day.this) = [Production]: (Qty_Input, MachineID.all, BatchID.all, Time.Day.(this+1))

FCI Schemas The following conventions were adopted for the description of the schemas. First, the distinction was made (as first described in Chapter 6) between contents that vary in all dimensions and contents that vary in only some dimensions. Where this is the case, as for example with the Sales and Marketing schema, you will notice that all the contents that vary in all dimensions are listed first followed by a description of whatever subset of dimensions serves as the location structure for those contents that vary in some but not all dimensions. Second, all contents whose values you will derive are underlined. Third, cube name abbreviations are shown in square brackets at the top of each schema declaration.

Business Process Schemas

Sales and Marketing [S&M] Schema: (Time.Day.atabove Geog.Store.atabove Hfoodcake.atabove) ~ Qty_Sold{pkg}, Total_Revenue{$}, Dollars_Owed_FCI{$}, Dollars_Owed_FCI_per_kg{$/kg}

(Time.Day. Geog.Store. Hfoodcake.) ~ List_Price{$/pkg}, Wholesale_Price{$/pkg}, Discount{$/pkg}, Discount_Share{R}, Effective_Wholesale_Price{$/pkg}

Transportation from Product Inventory to Stores [Travel] Schema: ((Time.Day. Geog.ProductInventorySite.) ~< Geog.Zone.) ~ VehicleID{I}, Total_Distance{km}, Fixed_Cost{$}, Vehicle_Travel_Cost{$}, (Delivery_PlanID{I} ~< Geog.Store, Time.Hour.Quarter{qtr_hour}, Trip_Share{R}, OrderID{I}, Order_Travel_Cost{$}, Order_Travel_Cost_per_kg{$/kg})

[Vehicle] Schema: (Time.Day. VehicleID.) ~ Cost_per_km{$/km}

[Order] Schema: (OrderID. Hfoodcake.) ~ Qty_Ordered{kg}

Product Inventory [PI Throughput] Schema: (Time.Day.atabove Geog.ProductInventorySite.atabove Hfoodcake.atabove Age.atabove) ~ Qty_Sent{kg}, Qty_Received{kg}, Qty_On_Hand{kg}, Qty_Throughput{kg}, Cost_per_kg{$/kg}

[PI Cost] Schema: (Time.Day.atabove Geog.ProductInventorySite.atabove NonFoodAsset.atabove) ~ Cost{$}

Transportation from Production to Product Inventory [Distribution] Schema: (Time.Day. Geog.ProductionSite. Geog.ProductInventorySite.) ~ Method, Total_Distance{km}, Fixed_Cost{$}, CargoID{I}, Arrival_Time{qtr_hour}, Cargo_Distribution_Cost{$}, Distribution_Cost_per_kg{$/kg},

[Transport Method] Schema: (Time.Day.atabove Method.) ~ Cost_per_kg_km{$/(kg * km)}

[Cargo] Schema: (CargoID. Hfoodcake.) ~ Qty_Shipped{kg}

Production [Production] Schema: (Time.Day.atabove Geog.ProductionSite. Stuff. Process. BatchOrdinal. MachineID.) ~ Qty_Input{kg}, Qty_Output{kg}, Production_Cost{$},

(Time.Day. Geog.ProductionSite. Stuff.all Process. BatchOrdinal. MachineID.) ~ Start_Time{qtr_hour}, End_Time{qtr_hour}, Production_Cost_per_kg{$/kg}
(Time.Day. Geog.ProductionSite. Stuff.all Process.all BatchOrdinal. MachineID.) ~ BatchID{I}

[Batch] Schema: (BatchID.) ~ MachineID{I}, BatchOrdinal{I}, Time.Day, Geog.ProductionSite

Materials Inventory [MI Throughput] Schema: (Time.Day.atabove Geog.MaterialsInventorySite. Stuff.Foodsource. Age.) ~ Qty_Sent{kg}, Qty_Received{kg}, Qty_On_Hand{kg}, Qty_Throughput{kg}, Storage_Cost_per_kg{$/kg}, Cumulative_Qty_Throughput{kg}, Cumulative_Storage_Cost_per_kg{$/kg}

[MI Cost] Schema: (Time.Day.atabove Geog.MaterialsInventorySite. NonFoodAsset.) ~ Storage_Cost{$}, Cumulative_Storage_Cost{$}

[MI Batch] Schema: (Stuff.Foodsource. Age. BatchID.) ~ Qty_Used_in_Batch{kg}, Storage_Cost_in_Batch{$}, Storage_Cost_per_kg_in_Batch{$/kg}

Materials Shipping [MS Distribution] Schema: (Time.Day. Geog.Market. Geog.MaterialsInventorySite. Stuff.Foodsource.) ~ Qty_Sent{kg}, Shipping_Cost{$} (Time.Day.all Geog.Market. Geog.MaterialsInventorySite. Stuff.Foodsource.all) ~ Distance{km}, Shipping_Cost_per_kg_km{$/(kg * km)}, Shipping_Cost_per_kg{$/kg}

[MS Batch] Schema: (Time.Day.atabove Geog.Market. Stuff.Foodsource. BatchID.) ~ Qty_Shipped_in_Batch{kg}, Shipping_Cost_in_Batch{$}, Shipping_Cost_per_kg_in_Batch{$/kg}, Foodsource_Cost_in_Batch{$}, NonFoodAsset_Cost_in_Batch{$}, (Time.Day.all Geog.Market.all Stuff.Foodsource.all BatchID.) ~ Geog.MaterialsInventorySite, Foodsource_Cost_per_kg_in_Batch{$/kg}, NonFoodAsset_Cost_per_kg_in_Batch{$/kg}

Currency Exchange [Currency] Schema: (Time.Day.atabove Geog.Country.) ~ Local_Currency_Spent{Lcurr}, Local_Currency_Purchased{Lcurr} (Time.Day. Geog.Country.) ~ Local_Currency_on_Hand_Opening{Lcurr}, Local_Currency_Remaining{Lcurr}, Daily_Exchange_Rate{Lcurr/$}, Effective_Rate{Lcurr/$}, Dollars_Spent_for_Local_Currency_Purchased{$}

Purchasing [Purchasing Activity] Schema: (Time.Day. Geog.Market. Foodsource.) ~ Price{Lcurr/kg}, Qty_Purchased{kg}, Local_Currency_Spent{Lcurr}, Dollars_Spent{$}, Foodsource_Cost_per_kg{$/kg}

[Purchasing Cost] Schema: (Time.Day.atabove Geog.Market.atabove NonFoodAsset.) ~ NonFoodAsset_Cost{$}, NonFoodAsset_Cost_per_kg{$/kg}

Figure 17.1 Mapping of dimensions and queries to cubes. [The original figure is an X-mark matrix, not recoverable from the page layout, showing which of the dimensions Geog_Store, Geog_Bank, Geog_Market, Geog_Zone, Geog_MaterialsInventorySite, Geog_ProductionSite, Geog_ProductInventorySite, Time_Day, Stuff, Stuff_Product_Foodcake, Stuff_FoodSource, Stuff_NonFoodAsset, MachineID, Method, VehicleID, CargoID, BatchID, OrderID, Production_Process, BatchOrdinal, Hfoodcake, and Age participate in each of the 19 cubes ([S&M], [Travel], [Vehicle], [Order], [PI Throughput], [PI Cost], [Distribution], [Transport Method], [Cargo], [Production], [Batch], [MI Throughput], [MI Cost], [MI Batch], [MS Distribution], [MS Batch], [Purchasing Activity], [Purchasing Cost], [Currency]), grouped by business process.]

Cube Views

Sales and Marketing

Page: HFoodcake.

Row: Geog.Store. Row: Variables { }

Column: Time.Day.

                                                   Jan17     Jan18
                   Qty_Sold {pkg}                     55        50
                   List_Price {$/pkg}               6.95      6.95
                   Wholesale_Price {$/pkg}          5.50      5.50
                   Discount {$/pkg}                 1.00      1.00
Store.Cambridge    Discount_Share {R}               0.45      0.50
                   Total_Revenue {$}                 327
                   Effective_Wholesale {$/pkg}      5.05
                   Dollars_Owed_FCI {$}              278
                   Dollars_Owed_FCI_per_kg {$/kg}  10.10
                   Qty_Sold {pkg}                     80        85
                   List_Price {$/pkg}               7.15      7.15
                   Wholesale_Price {$/pkg}          5.50      5.50
                   Discount {$/pkg}                 1.00      1.00
Store.Boston       Discount_Share {R}               0.50      0.55
                   Total_Revenue {$}                 492       523
                   Effective_Wholesale {$/pkg}      5.00      4.95
                   Dollars_Owed_FCI {$}              400       421
                   Dollars_Owed_FCI_per_kg {$/kg}  10.00      9.90

Figure 17.2a [S&M] Cube view.

Product                 Weight
Neptune, eight pack     1,000 grams
Neptune, four pack        500 grams

Figure 17.2b Product Attribute view.

Transportation from Product Inventory to Stores

Page: Geog.ProductInventorySite.Hartford Page: Time.Jan18

Row: Geog.Zone. Row: VehicleID.

Column: Variables { }

                                            Total_Distance   Fixed_Cost   Vehicle_Travel_Cost
                                                      {km}          {$}                   {$}
Geog.Zone.Boston_Area       VehicleID.41               400          160
Geog.Zone.NewYork_Area      VehicleID.67               300          140                   230
Geog.Zone.Providence_Area   VehicleID.56               160          120                   168

Figure 17.3a [Travel] Cube view.

Page: Time.Jan18

Row: VehicleID.

Column: Variables { }

                 Cost_per_km {$/km}
VehicleID.21                   0.25
VehicleID.35                   0.25
VehicleID.41                   0.30
VehicleID.56                   0.30
VehicleID.67                   0.30

Figure 17.3b [Vehicle] Cube view.

Row: OrderID. Row: Hfoodcake.

Column: Variables { }

                                     Qty {kg}
              Neptune.4pack.B123           80
OrderID.0179  Neptune.8pack.B127           50
              SeaBreeze.8pack.B43          90
              All                         250
              Neptune.4pack.B123           55
OrderID.0180  SeaBreeze.4pack.B25          75
              Poseidon.8pack.B120          65
              All                         300
              Poseidon.4pack.B133          60
OrderID.0181  Neptune.8pack.B132           65
              Neptune.8pack.B127           70
              All                         160

Figure 17.3c [Order] Cube view.

Page: Time.Jan18
Page: Geog.ProductInventorySite.Hartford
Page: Geog.Zone.Boston_Area

Row: Delivery_PlanID. Row: OrderID. Row: Variables { }

                             Store                           Cambridge
                             Time                              6:00 am
              OrderID.0179   Trip_Share                           0.33
                             Order_Travel_Cost {$}
                             Travel_Cost_per_kg {$/kg}
                             Store                              Boston
Delivery_                    Time                              6:30 am
PlanID.D25    OrderID.0180   Trip_Share                           0.29
                             Order_Travel_Cost {$}               81.20
                             Travel_Cost_per_kg {$/kg}            0.27
                             Store                            Plymouth
                             Time                              7:15 am
              OrderID.0181   Trip_Share                           0.20
                             Order_Travel_Cost {$}               56.00
                             Travel_Cost_per_kg {$/kg}            0.35

Figure 17.3d [Travel] Cube view.


Product Inventory

Page: Time.Day

Row: Geog.ProductInventorySite.Hartford Row: Hfoodcake.

Column: Age.

                                                     Age.1    Age.2    Age.3

                               Neptune.4pack.B123    Jan17      n/a      n/a
                               Neptune.8pack.B127    Jan17    Jan18      n/a
ProductInventorySite.Hartford  SeaBreeze.8pack.B41   Jan16    Jan17    Jan18
                               Poseidon.4pack.B99    Jan16    Jan17      n/a
                               SeaBreeze.8pack.B05   Jan16    Jan17    Jan18

Figure 17.4a [PI Throughput] Cube view.


Page: Cost {$}

Row: Geog.ProductInventorySite. Row: NonFoodAsset.

Column: Time.Day.

                                       Jan16    Jan17    Jan18
                        Labor            912      912      912
                        Maintenance       50       50       50
ProductInventorySite.   Facility         200      200      200
Hartford                Energy           100      100      100
                        All            1,262    1,262    1,262
                        Labor            901      901      901
                        Maintenance       45       45       45
ProductInventorySite.   Facility         190      190      190
Palo Alto               Energy            95       95       95
                        All            1,231    1,231    1,231
                        Labor            990      990      990
                        Maintenance       70       70       70
ProductInventorySite.   Facility         230      230      230
St. Louis               Energy           120      120      120
                        All            1,410    1,410    1,410

Figure 17.4b [PI Cost] Cube view.

Page: Age.all Page: Hfoodcake.all

Row: Geog.ProductInventorySite. Row: Variables { }

Column: Time.Day.

                                              Jan16     Jan17     Jan18
                        Qty_Received {kg}    15,000    15,350    14,900
                        Qty_Sent {kg}        14,900    15,200    14,950
ProductInventorySite.   Qty_On_Hand {kg}        300       450       400
Hartford                Qty_Throughput {kg}  14,950              14,925
                        Cost_per_kg {$/kg}    0.084               0.085
                        Qty_Received {kg}    14,800    15,050    15,000
                        Qty_Sent {kg}        14,700    14,900    15,050
ProductInventorySite.   Qty_On_Hand {kg}        300       450       400
Palo Alto               Qty_Throughput {kg}  14,750    14,975    15,025
                        Cost_per_kg {$/kg}    0.083     0.082     0.082
                        Qty_Received {kg}    14,800    15,050    15,000
                        Qty_Sent {kg}        14,700    14,900    15,050
ProductInventorySite.   Qty_On_Hand {kg}        300       450       400
St. Louis               Qty_Throughput {kg}  14,750    14,975    15,025
                        Cost_per_kg {$/kg}    0.096     0.094     0.094

Figure 17.4c [PI Throughput] Cube view.

Transportation from Production to Product Inventory

Page: Geog.ProductionSite.Vancouver

Row: Geog.ProductInventorySite. Row: Variables { }

Column: Time.Day.

                                                                 Jan16
                                MethodID                         Plane
                                Total_Distance {km}              5,000
ProductInventorySite.Hartford   Fixed_Cost {$}                     275
                                CargoID                            567
                                Arrival_Time                     Jan16
                                Cargo_Distribution_Cost {$}
                                Distribution_Cost_per_kg {$/kg}
                                MethodID                         Plane
                                Total_Distance {km}              5,300
ProductInventorySite.Houston    Fixed_Cost {$}                     300
                                CargoID                            568
                                Arrival_Time                     Jan17
                                Cargo_Distribution_Cost {$}      7,508
                                Distribution_Cost_per_kg {$/kg}   0.88
                                MethodID                         Train
                                Total_Distance {km}              1,000
ProductInventorySite.Seattle    Fixed_Cost {$}                     250
                                CargoID                            569
                                Arrival_Time                     Jan18
                                Cargo_Distribution_Cost {$}      1,150
                                Distribution_Cost_per_kg {$/kg}   0.15

Figure 17.5a [Distribution] Cube view.

Column: Method.

Row: Variables { }

                                  Truck        Plane        Train
Cost_per_kg_km {$/(kg x km)}  0.0000890    0.0001600    0.0001200

Figure 17.5b [Transport Method] Cube view.

Row: CargoID. Row: Hfoodcake. Row: Variables { }

                                     Qty {kg}
             Neptune.4pack.B123            80
CargoID.567  Neptune.8pack.B127            50
             SeaBreeze.8pack.B43           90
             All                        7,000
             Neptune.4pack.B123            55
CargoID.568  SeaBreeze.4pack.B25           75
             Poseidon.8pack.B120           65
             All                        8,500
             Poseidon.4pack.B133           60
CargoID.569  Neptune.8pack.B132            65
             Neptune.8pack.B127            70
             All                        7,500

Figure 17.5c [Cargo] Cube view.

Production

Figure 17.6 [Production] Cube view. [The original figure spans several pages and could not be cleanly recovered from the page layout. For Page: Time.Jan16, Page: Geog.ProductionSite.Vancouver, Page: Machine.1, it shows rows by BatchOrdinal, Stuff, and Variables, and columns for Process.Cleaning, Process.Mixing, Process.(Freezing & Packaging), and Process.all, covering Qty_Input, Qty_Output, Qty_Discarded, start and stop times, Production_Cost by NonFoodAsset (Labor, Capital, Energy), and Production_Cost_per_kg. Among the legible values: BatchOrdinal.1 (BatchID 123) has Cleaning inputs of roughly 400 kg Tuna, 400 kg Salmon, and 200 kg French Sole, cleaned outputs of 360, 360, and 180 kg, 900 kg of packaged Neptune.4pack.B123 output, a total nonfood-asset Production_Cost of 3,165 $, and a Production_Cost_per_kg of 3.52 $/kg; BatchOrdinal.2 (BatchID 124) produces 1,200 kg of Poseidon.8pack.B124 at a Production_Cost of 3,165 $ and a Production_Cost_per_kg of 2.64 $/kg.]

Materials Inventory

Page: Geog.MaterialsInventorySite.Vancouver Page: Storage_Cost {$}

Row: Stuff.NonFoodAsset.

Column: Time.Day.

Jan10 Jan11 Jan12 Jan13 Jan14 Jan15 Jan16

Labor 1,820 1,820 1,820 1,820 1,820 1,820 1,820

Maintenance 200 200 200 200 200 200 200

Facility 900 900 900 900 900 900 900

Energy 500 500 500 500 500 500 500

All 3,420 3,420 3,420 3,420 3,420 3,420 3,420

Figure 17.7a [MI Cost] Cube view.

Figure 17.7b [MI Throughput] Cube view. [The original figure could not be cleanly recovered from the page layout. For Page: Geog.MaterialsInventorySite.Vancouver, it shows, over the columns Time.Jan10 through Jan15 and for Foodsource.Tuna, Salmon, FSole, and Foodsource.all, Qty_Sent{kg} broken out by Age.1 through Age.6 and Age.all, plus Qty_Received{kg}, Qty_Onhand{kg}, Qty_Throughput{kg}, and Storage_Cost_per_kg{$/kg} at Age.all; the throughput and storage-cost cells are among the values to be derived.]


Page: BatchID.123

Page: Qty_Used_in_Batch {kg}

Row: Foodsource.

Column: Age.

Age.6 Age.5 Age.4 Foodsource.Tuna Foodsource.Salmon

Foodsource.FSole

Foodsource.all

Figure 17.7c [MI Batch] Cube view.

Page: BatchID.123 Page: Foodsource.all

Row: Variables { }

Column: Age.

                                          Age.6    Age.5    Age.4
Storage_Cost_in_Batch, BatchID.123 {$}

Figure 17.7d [MI Batch] Cube view.

Page: BatchID.123 Page: Foodsource.all Page: Age.all

Row: Variables { }

Storage_Cost_in_Batch {$}

Storage_Cost_per_kg_in_Batch {$/kg}

Figure 17.7e [MI Batch] Cube view.


Materials Shipping

Page: Geog.MaterialsInventorySite.Vancouver Page: Foodsource.all Page: Time.Day.all

Row: Geog.Market. Row: Variables { }

                                                        Jan9     Jan10     Jan11

                  Shipping_Cost_per_kg_km
                  {$/(kg x km)}                        0.001     0.001     0.001
           LM1    Distance {km}                           50        50        50
Market.           Shipping_Cost_per_kg {$/kg}
Canada            Shipping_Cost_per_kg_km
                  {$/(kg x km)}                        0.001     0.001     0.001
           LM2    Distance {km}                          500       500       500
                  Shipping_Cost_per_kg {$/kg}
                  Shipping_Cost_per_kg_km
Market.           {$/(kg x km)}                       0.0005    0.0005    0.0005
France     LM1    Distance {km}                        8,000     8,000     8,000
                  Shipping_Cost_per_kg {$/kg}

Figure 17.8a [MS Distribution] Cube view.

Page: Geog.MaterialsInventorySite.Vancouver

Row: Geog.Market.

Row: Foodsource.

Row: Variables { }

Column: Time.Day.

                                                         Jan9     Jan10     Jan11
                Foodsource.Tuna     Qty_Sent {kg}         803       829       893
                                    Shipping_Cost {$}
         LM1    Foodsource.Salmon   Qty_Sent {kg}         331       341       368
Market.                             Shipping_Cost {$}
Canada          Foodsource.Tuna     Qty_Sent {kg}         268       276       298
                                    Shipping_Cost {$}
         LM2    Foodsource.Salmon   Qty_Sent {kg}         992     1,024     1,103
                                    Shipping_Cost {$}
Market.         Foodsource.FSole    Qty_Sent {kg}         126       195       210
France   LM1                        Shipping_Cost {$}

Figure 17.8b [MS Distribution] Cube view.

Page: BatchID.123 Page: Qty_Shipped_in_Batch {kg}

Row: Geog.Market.all Row: Foodsource.

Column: Time.Day.

Jan9 Jan10 Jan11 Market.all Foodsource.Tuna Foodsource.Salmon Foodsource.FSole

Figure 17.8c [MS Batch] Cube view.

Page: BatchID.123 Page: Geog.MaterialsInventorySite.Vancouver

Row: Geog.Market. Row: Foodsource. Row: Variables { }

Column: Time.Day.

Jan9 Jan10 Jan11

Foodsource.Tuna Qty_Shipped_ in_Batch{kg}

Foodsource.Salmon Qty_Shipped_ in_Batch{kg} LM1 Foodsource.all Qty_Shipped_ in_Batch{kg}

Foodsource.all Shipping_Cost_ Market. in_Batch{$} Canada Foodsource.Tuna Qty_Shipped_ in_Batch{kg}

Foodsource.Salmon Qty_Shipped_ in_Batch{kg} LM2 Foodsource.all Qty_Shipped_ in_Batch{kg}

Foodsource.all Shipping_Cost_ in_Batch{$}

Foodsource.FSole Qty_Shipped_ in_Batch{kg}

Market. Foodsource.all Qty_Shipped_ France LM1 in_Batch{kg}

Foodsource.all Shipping_Cost_ in_Batch{$}

Figure 17.8d [MS Batch] Cube view.

Page: BatchID.123 Page: Foodsource.all Page: Time.all Page: Geog.Market.all

Row: Variables { }

Shipping_Cost_in_Batch {$} Shipping_Cost_per_kg_in_Batch_ {$/kg}

Figure 17.8e [MS Batch] Cube view.

Currency and Exchange

Row: Geog.Country. Row: Variables { }

Column: Time.Day.

Jan8 Jan9 Jan10 Jan11

Local_Currency_on_Hand_ Opening {Lcurr} 25,000

Effective_Rate {Lcurr/$} 2.00

Local_Currency_Spent {Lcurr} 20,000 20,000 15,000 20,000

Country. Local_Currency_Remaining Canada {Lcurr}

Local_Currency_Purchased {Lcurr} 20,000 20,000 20,000 15,000

Dollars_Spent_for_Local_ Currency_Purchased {$}

Daily_Exchange_Rate {Lcurr/$} 2.00 1.75 2.00 1.85

Local_Currency_on_Hand_ Opening {Lcurr} 75,000

Effective_Rate {Lcurr/$} 7.60

Local_Currency_Spent {Lcurr} 50,000 50,000 30,000 50,000 Country. France Local_Currency_Remaining {Lcurr}

Local_Currency_Purchased {Lcurr} 25,000 40,000 50,000 40,000

Dollars_Spent_for_Local_ Currency_Purchased {$}

Daily_Exchange_Rate {Lcurr/$} 7.60 7.40 7.10 7.50

Figure 17.9 [Currency] Cube view.

Purchasing

Row: Geog.Market. Row: Variables { } Row: NonFoodAsset.

Column: Time.Day.

                                                           Jan9    Jan10    Jan11
                  NonFoodAsset_Cost {$}            Rent      55       55       55
                  NonFoodAsset_Cost {$}            Utilities 25       25       25
           LM1    NonFoodAsset_Cost {$}            Labor    455      455      455
                  NonFoodAsset_Cost {$}            Other     75       75       75
                  NonFoodAsset_Cost {$}            All      610      610      610
Market.           NonFoodAsset_Cost_per_kg {$/kg}  All
Canada            NonFoodAsset_Cost {$}            Rent      60       60       60
                  NonFoodAsset_Cost {$}            Utilities 30       30       30
           LM2    NonFoodAsset_Cost {$}            Labor    560      560      560
                  NonFoodAsset_Cost {$}            Other     70       70       70
                  NonFoodAsset_Cost {$}            All      720      720      720
                  NonFoodAsset_Cost_per_kg {$/kg}  All
                  NonFoodAsset_Cost {$}            Rent      60       60       60
Market.           NonFoodAsset_Cost {$}            Utilities 25       25       25
France     LM1    NonFoodAsset_Cost {$}            Labor    450      450      450
                  NonFoodAsset_Cost {$}            Other     80       80       80
                  NonFoodAsset_Cost {$}            All      615      615      615
                  NonFoodAsset_Cost_per_kg {$/kg}  All

Figure 17.10a [Purchasing Cost] Cube view.

Row: Geog.Market. Row: Foodsource. Row: Variables { }

Column: Time.Day.

                                                             Jan9     Jan10     Jan11
                               Price (L/kg)                     4         5         4
                               Qty_Purchased {kg}             803       829       893
              Foodsource.      Local_Currency_Spent {Lcurr}
              Tuna             Dollars_Spent {$}
                               Foodsource_Cost_per_kg {$/kg}
         LM1                   Price (L/kg)                     5         4         6
                               Qty_Purchased {kg}             331       341       368
              Foodsource.      Local_Currency_Spent {Lcurr}
              Salmon           Dollars_Spent {$}
Market.                        Foodsource_Cost_per_kg {$/kg}
Canada        Foodsource.all   Qty_Purchased {kg}           2,000     1,900     1,800
                               Price (L/kg)                     6         4         6
                               Qty_Purchased {kg}             268       276       298
              Foodsource.      Local_Currency_Spent {Lcurr}
              Tuna             Dollars_Spent {$}
                               Foodsource_Cost_per_kg {$/kg}
         LM2                   Price (L/kg)                     6         6         5
                               Qty_Purchased {kg}             992     1,024     1,103
              Foodsource.      Local_Currency_Spent {Lcurr}
              Salmon           Dollars_Spent {$}
                               Foodsource_Cost_per_kg {$/kg}
              Foodsource.all   Qty_Purchased {kg}           1,600     1,900     1,800

                                                             Jan9     Jan10     Jan11
                               Price (L/kg)                    30        30        33
Market.       Foodsource.      Qty_Purchased {kg}             126       195       210
France   LM1  FSole            Local_Currency_Spent {Lcurr}
                               Dollars_Spent {$}
                               Foodsource_Cost_per_kg {$/kg}
              Foodsource.all   Qty_Purchased {kg}           1,500     1,300     1,200

Figure 17.10b [Purchasing Activity] Cube view.

Page: BatchID.123

Row: Geog.Market. Row: Foodsource. Row: Variables { }

Column: Time.Day.

Jan9 Jan10 Jan11

Qty_Shipped_in_Batch {kg}

Foodsource_Cost_per_kg Foodsource. {$/kg} Tuna NonFoodAsset_Cost_per_kg {$/kg} LM1 Qty_Shipped_in_Batch {kg}

Foodsource. Foodsource_Cost_per_kg {$/kg} Salmon NonFoodAsset_Cost_per_kg Market. {$/kg} Canada Qty_Shipped_in_Batch {kg}

Foodsource. Foodsource_Cost_per_kg {$/kg} Tuna NonFoodAsset_Cost_per_kg {$/kg} LM2 Qty_Shipped_in_Batch {kg}

Foodsource. Foodsource_Cost_per_kg {$/kg} Salmon NonFoodAsset_Cost_per_kg {$/kg}

Qty_Shipped_in_Batch {kg}

Market. Foodsource. Foodsource_Cost_per_kg LM1 France FSole {$/kg}

NonFoodAsset_Cost_per_kg {$/kg}

Figure 17.10c [MS Batch] × [Purchasing Activity] Cube view.


Page: BatchID.123

Page: Foodsource.all

Page: Time.Day.all Page: Geog.Market.all

Row: Variables { }

Foodsource_Cost_in_Batch {$} Foodsource_Cost_per_kg_in_Batch {$/kg}

NonFoodAsset_Cost_in_Batch {$}

NonFoodAsset_Cost_per_kg_in_Batch {$/kg}

Figure 17.10d [MS Batch] x [Purchasing Activity] Cube view.

Costs Summary

Module           Calculated Cost or Revenue                           Revenue    Cost

[S&M]            Wholesale_Price {$/kg}
[S&M]            Wholesale_Price {$/kg} - Effective_Wholesale {$/kg}
[Travel]         Travel_Cost_per_kg {$/kg}
[PI Cost]        Cost_per_kg {$/kg}
[Distribution]   Distribution_Cost_per_kg {$/kg}
[Production]     Production_Cost_per_kg {$/kg}
[MI Cost]        Storage_Cost_per_kg_in_Batch {$/kg}
[MS Batch]       Shipping_Cost_per_kg_in_Batch {$/kg}
[MS Batch]       Foodsource_Cost_per_kg_in_Batch {$/kg}
[MS Batch]       NonFoodAsset_Cost_per_kg_in_Batch {$/kg}

Total Revenue for Batch123 {$/kg}

Figure 17.11 Cost summary of all business processes.


FCI Cost-Revenue Analysis Calculation Steps by Schema

Since each business process is treated separately from a formulaic standpoint, context, such as identifying the particular day that a particular batch arrived in inventory, is established at the beginning of each business-process-specific set of calculations. These context-setting calculations are generally labeled 0 within each set. They are typically just queries for arrival dates or batch numbers, and I frequently give the answer to the query below the statement of the query. When I felt it was instructive for the reader to follow a context-setting query, I wrote "see view" at the bottom of the query.

Also, if you were implementing these formulas as part of an enterprise OLAP solution, you would no doubt define fairly general formulas, such as formulas for calculating inventory cost in general, and then look up or navigate to the specific inventory costs for a particular batch of product. I tried to maintain the same style by defining formulas for calculating the various derived elements in general, followed, typically, by a query for the specific cells associated with the specific batch of product being tracked in this exercise. I used italics to identify the queries for the calculated cells whose general formula would have appeared immediately above. For example, formula number 1 in the sales and marketing business process articulates a general formula for calculating total revenue given the sales and marketing schema shown previously. Formula 1q is the specific query to return the specific cells associated with batch 123.

Sales and Marketing (see Figures 17.2a-b)

1. Daily revenue for a given store on a given day by package type

1) [S&M]: "Total_Revenue"{$}, Time.Day., Geog.Store., Hfoodcake.leaf. =
Qty_Sold{pkg} × ( List_Price{$/pkg} - Discount{$/pkg} )
1q) [S&M]: "Total_Revenue"{$}, Time.Jan18, Geog.Cambridge, Hfoodcake.Neptune.4pack.B123

2. FCI’s revenue per package from a given store on a given day

2) [S&M]: "Effective_Wholesale_Price"{$/pkg}, Time.Day., Geog.Store., Hfoodcake.leaf. =
Wholesale_Price{$/pkg} - ( Discount{$/pkg} × Discount_Share{R} )
2q) [S&M]: "Effective_Wholesale_Price"{$/pkg}, Time.Jan18, Geog.Cambridge, Hfoodcake.Neptune.4pack.B123

3. Daily revenue for FCI from a given store on a given day by package type

3) [S&M]: "Dollars_Owed_FCI"{$}, Time.Day., Geog.Store., Hfoodcake.leaf. =
Effective_Wholesale_Price{$/pkg} × Qty_Sold{pkg}
3q) [S&M]: "Dollars_Owed_FCI"{$}, Time.Jan18, Geog.Cambridge, Hfoodcake.Neptune.4pack.B123

4. Daily revenue per kg for FCI from a given store on a given day, by package type

4) [S&M]: "Dollars_Owed_FCI_per_kg"{$/kg}, Time.Day., Geog.Store., Hfoodcake.leaf. =
Effective_Wholesale_Price{$/pkg} ÷ [Foodsource]: Foodcake.Neptune.4pack.Weight{kg/pkg}
4q) [S&M]: "Dollars_Owed_FCI_per_kg"{$/kg}, Time.Jan18, Geog.Cambridge, Hfoodcake.Neptune.4pack.B123
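As a cross-check on formulas 1 through 4, here is a minimal Python sketch (mine, not part of the original exercise) that evaluates them for the Cambridge store on Jan17, using only inputs shown in Figures 17.2a and 17.2b; the variable names are mine, and the 0.5 kg package weight comes from the Neptune four-pack row of the product attribute view.

    # Inputs for Store.Cambridge, Time.Jan17 (Figure 17.2a)
    qty_sold = 55            # {pkg}
    list_price = 6.95        # {$/pkg}
    wholesale_price = 5.50   # {$/pkg}
    discount = 1.00          # {$/pkg}
    discount_share = 0.45    # {R}
    pkg_weight = 0.5         # {kg/pkg}; Neptune four pack = 500 grams (Figure 17.2b)

    total_revenue = qty_sold * (list_price - discount)                  # formula 1
    effective_wholesale = wholesale_price - discount * discount_share   # formula 2
    dollars_owed_fci = effective_wholesale * qty_sold                   # formula 3
    dollars_owed_fci_per_kg = effective_wholesale / pkg_weight          # formula 4

    print(round(total_revenue, 2))            # 327.25 (shown rounded to 327 in the view)
    print(round(effective_wholesale, 2))      # 5.05
    print(round(dollars_owed_fci, 2))         # 277.75 (shown rounded to 278)
    print(round(dollars_owed_fci_per_kg, 2))  # 10.1

The same chain, applied to the Jan18 column, yields the Cambridge cells you are asked to derive.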

Transportation from Product Inventory to Stores (see Figures 17.3a-d)

0. Given that the order ID for Neptune four-pack B123 = 0179

0q) "OrderID", Time.Jan18, Geog.ProductInventorySite.Hartford, Geog.Zone.BostonArea, Geog.Store.Cambridge =0179

1. Cost of transportation from an inventory site to a delivery zone along the delivery route for a given day

1) [Travel]: "Vehicle_Travel_Cost"{$}, Time.Day., Geog.ProductInventorySite., Geog.Zone. =
Fixed_Cost{$} + Total_Distance{km} × [Vehicle]: (Cost_per_km{$/km})
1q) [Travel]: "Vehicle_Travel_Cost"{$}, Time.Jan18, Geog.ProductInventorySite.Hartford, Geog.Zone.BostonArea

2. Transportation cost attributed to a given order

2) [Travel]: "Order_Travel_Cost"{$}, OrderID., Time.Day., Geog.ProductInventorySite., Geog.Zone. =
Trip_Share{R} × Vehicle_Travel_Cost{$}
2q) [Travel]: "Order_Travel_Cost"{$}, OrderID.0179, Time.Jan18, Geog.ProductInventorySite.Hartford, Geog.Zone.BostonArea

3. Transportation cost per kg for a given order

3) [Travel]: "Order_Travel_Cost_per_kg"{$/kg}, OrderID., Time.Day., Geog.ProductInventorySite., Geog.Zone. = Order_Travel_Cost{$} ÷ [Order]: (Qty_Ordered{kg}, Hfoodcake.all) 3q) [Travel]: Order_Travel_Cost_per_kg{$/kg}, OrderID.0179, Time.Jan18, Geog.ProductInventorySite.Hartford, Geog.Zone.BostonArea
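For concreteness, a small Python sketch (mine, not the book's notation) of formulas 1 through 3, using the New York zone figures from Figures 17.3a-b and order 0180 from Figures 17.3c-d; the 280 $ Boston-zone vehicle cost is not shown directly but is implied by the 81.20 $ and 0.29 trip share given for order 0180.

    # Formula 1: vehicle travel cost for the New York zone on Jan18 (Figures 17.3a-b)
    fixed_cost = 140.0       # {$}
    total_distance = 300.0   # {km}
    cost_per_km = 0.30       # {$/km}, VehicleID.67
    vehicle_travel_cost = fixed_cost + total_distance * cost_per_km   # -> 230, as shown

    # Formulas 2-3: order 0180 on the Boston-zone route (Figures 17.3c-d)
    trip_share = 0.29                    # {R}
    boston_vehicle_travel_cost = 280.0   # {$}; implied by 81.20 / 0.29
    order_travel_cost = trip_share * boston_vehicle_travel_cost       # -> 81.20
    qty_ordered_all = 300.0              # {kg}, OrderID.0180, Hfoodcake.all
    order_travel_cost_per_kg = order_travel_cost / qty_ordered_all    # -> ~0.27 {$/kg}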

Product Inventory (see Figures 17.4a-c)

0. When did Neptune b123 arrive in product inventory?

0q) [PI Throughput]: "Time.Day", Geog.ProductInventorySite.Hartford, Age.1, Hfoodcake.Neptune.4pack.B123
see view

1. What was total inventory cost that day?

1q) [PI Cost]: "Cost"{$}, NonFoodAsset.all, Time.Jan17, Geog.ProductInventorySite.Hartford see view

2. Quantity passing through an inventory site over a given period of time

2) [PI Throughput]: "Qty_Throughput"{kg}, Time.Day.atabove, Geog.ProductInventorySite.atabove., Hfoodcake.atabove., Age.atabove. =
( Qty_Sent{kg} + Qty_Received{kg} ) ÷ 2
2q) [PI Throughput]: "Qty_Throughput"{kg}, Time.Jan17, Geog.ProductInventorySite.Hartford

3. Daily storage cost per kg

3) [PI Throughput]: "Cost_per_kg"{$/kg}, Time.Day., Geog.ProductInventorySite., Hfoodcake.all, Age.all = [PI Cost]: (Cost{$}, NonFoodAsset.all) ÷ Qty_Throughput{kg} 3q) [PI Throughput]:"Cost_per_kg"{$/kg}, Time.Jan17, Geog.ProductInventorySite.Hartford
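A quick Python check of formulas 2 and 3 for Hartford on Jan16, using only values shown in Figures 17.4b-c (the variable names are mine):

    # Hartford on Jan16; all inputs appear in Figures 17.4b-c
    qty_received = 15_000.0   # {kg}
    qty_sent = 14_900.0       # {kg}
    pi_cost_all = 1_262.0     # {$}, [PI Cost] Cost at NonFoodAsset.all

    qty_throughput = (qty_sent + qty_received) / 2   # formula 2 -> 14,950 kg, as shown
    cost_per_kg = pi_cost_all / qty_throughput       # formula 3 -> ~0.084 $/kg, as shown

The Jan17 cells you are asked to derive follow from the same two lines applied to the Jan17 column.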

Transportation from Production to Product Inventory (see Figures 17.5a-c)

0. From which production site on which date was Neptune B123 shipped to Hartford?

0q) [Distribution]: "Geog.ProductionSite", "CargoID", "Time.Day", Arrival_Time.Jan17, Geog.ProductInventorySite.Hartford, [Cargo]:(Qty_Shipped{kg}, Hfoodcake.Neptune.4Pack.B123 > 0) Vancouver, 567, Jan16

1. Cost of distributing cargo from a production site to an inventory site

1) [Distribution]: "Cargo_Distribution_Cost"{$}, Time.Day., Geog.ProductionSite., Geog.ProductInventorySite. =
Fixed_Cost{$} + ( Total_Distance{km} × [Cargo]:(Qty_Shipped{kg}, Hfoodcake.all) × [Transport Method]: Cost_per_kg_km{$/(kg*km)} )
1q) [Distribution]: "Cargo_Distribution_Cost"{$}, Time.Jan16, Geog.ProductionSite.Vancouver, Geog.ProductInventorySite.Hartford
2) [Distribution]: "Distribution_Cost_per_kg"{$/kg}, Time.Day.atabove., Geog.ProductionSite.atabove., Geog.ProductInventorySite.atabove. =
Cargo_Distribution_Cost{$} ÷ [Cargo]: (Qty_Shipped{kg})
2q) [Distribution]: "Distribution_Cost_per_kg"{$/kg}, CargoID.567
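A quick Python check of formulas 1 and 2 for the Vancouver-to-Houston plane shipment on Jan16, using only inputs shown in Figures 17.5a-c (the variable names are mine):

    # Vancouver -> Houston by plane on Jan16 (Figures 17.5a-c)
    fixed_cost = 300.0          # {$}
    total_distance = 5_300.0    # {km}
    qty_shipped_all = 8_500.0   # {kg}, CargoID.568 at Hfoodcake.all
    cost_per_kg_km = 0.00016    # {$/(kg*km)}, Method.Plane

    cargo_distribution_cost = fixed_cost + total_distance * qty_shipped_all * cost_per_kg_km
    # formula 1 -> 7,508 $, as shown in Figure 17.5a
    distribution_cost_per_kg = cargo_distribution_cost / qty_shipped_all
    # formula 2 -> ~0.88 $/kg, as shown

The Hartford shipment (cargo 567) that carries batch 123 is evaluated the same way with its own distance, fixed cost, and 7,000 kg cargo weight.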

Production (see Figure 17.6)

0. When, where, and on what machine was batch 123 produced?

0q) [Production]: "Geog.ProductionSite"{N}, "Time.Day", "MachineID"{I}, "BatchOrdinal"{I}, BatchID.123 Vancouver, Jan16, 1, 1

1. Total production costs

1q) [Production]: "Production_Cost"{$}, Time.Jan16, Geog.ProductionSite.Vancouver, MachineID.1, BatchOrdinal.1, Stuff.NonFoodAsset.all, Process.all =
Sum( Production_Cost, Stuff.NonFoodAsset., Process.leaf. )

2. Nonfoodasset cost per kg of producing foodcakes

2) [Production]: "Production_Cost_per_kg"{$/kg}, Time.Day.atabove., Geog.ProductionSite.atabove., Stuff.all, Process.all, BatchOrdinal.atabove. =
Production_Cost{$} ÷ Qty_Output{kg}, Process.Freezing&Packaging

2. Nonfoodasset cost per kg of producing Neptune b123

2q) [Production]: "Production_Cost_per_kg"{$/kg}, Time.Jan16, Geog.Vancouver, MachineID.1, BatchOrdinal.1

3. Find all the fish used to make Neptune b123

3q) [Production]: "Qty_Input"{kg}, "Stuff.Foodsource."{N}, Time.Jan16, Geog.Vancouver, MachineID.1, BatchOrdinal.1, Process.Cleaning, (Qty_Input{kg} > 0)

Materials Inventory (see Figures 17.7a-e)

0. When did the materials for batch 123 leave materials inventory (using the transit rules)?

0q) [Batch]: "Geog.ProductionSite"{N}, "Time.Day.(this - 1)", BatchID.123 Vancouver, Jan15

1. Quantity passing through an inventory site over a given period of time

1) [MI Throughput]: "Qty_Throughput"{kg}, Foodsource.all, Time.Day.atabove., Geog.MaterialsInventorySite.atabove., Age.all =
( Qty_Sent{kg} + Qty_Received{kg} ) ÷ 2
1q) [MI Throughput]: "Qty_Throughput"{kg}, Time.(Jan10 to Jan16)., Geog.MaterialsInventorySite.Vancouver

2. Daily storage cost per kg

2) [MI Throughput]: "Storage_Cost_per_kg"{$/kg}, Foodsource.all, Time.Day., Age.all, Geog.MaterialsInventorySite. =
[MI Cost]: (Storage_Cost{$}, NonFoodAsset.all) ÷ Qty_Throughput{kg}

3. Total storage cost per kg for fish of a given age

3) [MI Throughput]: "Cumulative_Storage_Cost_per_kg"{$/kg}, Foodsource.all, Time.Day., Age. = Sum( Storage_Cost_per_kg{$/kg}, Time.Day.((this+1 - Age.) to this) ) 3q) [MI Throughput]: "Cumulative_Storage_Cost_per_kg"{$/kg}, Time.Jan15, Age.(4,5,6).

4. Quantity of fish from inventory used in a given batch

4) [MI Batch]: "Qty_Used_in_Batch"{kg}, Foodsource., Age.atabove., BatchID. =
[Production]: (Qty_Input{kg}, Process.Cleaning, BatchID.) × [MI Throughput]: ( (Qty_Sent{kg}, Age. ÷ Qty_Sent{kg}, Age.all), [Batch]:Time.Day.(this-1) )
4q) [MI Batch]: "Qty_Used_in_Batch"{kg}, Foodsource., Age., BatchID.123, (Qty_Sent{kg} > 0)

5. Cumulative cost of storing fish used in a given batch

5) [MI Batch]: "Storage_Cost_in_Batch"{$}, Foodsource., Age.atabove., BatchID. =
Sum( ( Cumulative_Storage_Cost_per_kg{$/kg}, [Batch]:Time.Day.(this-1) ) × Qty_Used_in_Batch{kg}, Age. )
5q) [MI Batch]: "Storage_Cost_in_Batch"{$}, BatchID.123

6. Cumulative cost per kg of storing all fish in a given batch

6) [MI Batch]: "Storage_Cost_per_kg_in_Batch"{$/kg}, Foodsource.all, Age.all, BatchID. =
Storage_Cost_in_Batch{$} ÷ [Production]: (Qty_Input{kg}, Process.Cleaning)
6q) [MI Batch]: "Storage_Cost_per_kg_in_Batch"{$/kg}, BatchID.123
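The subtle step in this block is formula 3's rolling time window. The following Python sketch (mine) illustrates only the window logic, with made-up daily storage rates; in the exercise the real daily rates come from dividing the Figure 17.7a costs by the Figure 17.7b throughputs.

    # Hypothetical daily storage costs per kg at one site, keyed by day of January
    # (illustration only; the exercise derives the real values from Figures 17.7a-b)
    daily_storage_cost_per_kg = {10: 0.10, 11: 0.11, 12: 0.10, 13: 0.12, 14: 0.11, 15: 0.10}

    def cumulative_storage_cost_per_kg(day, age):
        # Formula 3: sum the daily rate over the window Time.Day.((this + 1 - Age) to this),
        # that is, over the days the fish of that age has spent in inventory
        return sum(daily_storage_cost_per_kg[d] for d in range(day + 1 - age, day + 1))

    print(round(cumulative_storage_cost_per_kg(15, 4), 2))   # sums Jan12 through Jan15 -> 0.43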

Materials Shipping (see Figures 17.8a-e)

0.1q) [MI Batch]: "Time.Day.(this - Age.)", Foodsource.all, (Qty_Used_in_Batch{kg} > 0, BatchID.123)
Jan9, Jan10, Jan11
0.2q) [MS Distribution]: "Geog.Market", Geog.Vancouver, Foodsource.all, BatchID.123
Canada.LM1, Canada.LM2, France.LM1
1) MyGrouping = Geog.(Canada.LM1, Canada.LM2, France.LM1)

2. Cost per kg of distributing materials from a market to an inventory site

2) [MS Distribution]: "Shipping_Cost_per_kg"{$/kg}, Time.Day., Geog.Market., Geog.MaterialsInventorySite., Foodsource.all =
Shipping_Cost_per_kg_km{$/(kg*km)} × Distance{km}
2q) [MS Distribution]: "Shipping_Cost_per_kg"{$/kg}, Time.(Jan9 to Jan11)., Geog.Market.MyGrouping.down1., Geog.MaterialsInventorySite.Vancouver

3. Total cost of distributing material from a market to an inventory site

3) [MS Distribution]: "Shipping_Cost"{$}, Time.Day., Geog.Market., Geog.MaterialsInventorySite., Foodsource. =
Shipping_Cost_per_kg{$/kg} × Qty_Sent{kg}
3.1q) [MS Distribution]: "Shipping_Cost"{$}, Time.(Jan9 to Jan11)., Geog.Market.MyGrouping.down1., Geog.MaterialsInventorySite.Vancouver, Foodsource.(Tuna, Salmon, Fsole)., (Qty_Sent{kg} > 0)

4. Quantity of fish shipped from all markets in a given batch

4) [MS Batch]: "Qty_Shipped_in_Batch"{kg}, Time.Day., Geog.Market.all, Foodsource.,

BatchID. = [MI Batch]:( Qty_Used_in_Batch, Age.([Batch]:Time.Day.(this-1) - [MS Batch]:Time.Day.this ) )

4a. Quantity of fish shipped from a given market in a given batch

4a) [MS Batch]: "Qty_Shipped_in_Batch"{kg}, Time.Day., Geog.Market., Foodsource., Age., BatchID. =
Qty_Shipped_in_Batch{kg}, Geog.Market.all × [MS Distribution]: (Qty_Sent{kg}, Geog.Market. ÷ Qty_Sent{kg}, Geog.Market.all)

4.1 Qty shipped in batch from each market by fish type

4.1q) [MS Batch]: "Qty_Shipped_in_Batch"{kg}, Time.Day., Geog.Market., Foodsource., BatchID.123, (Qty_Shipped_in_Batch{kg} >0)

4.2 Total qty shipped in batch from each market

4.2q) [MS Batch]: "Qty_Shipped_in_Batch"{kg}, Time.Day., Geog.Market., Foodsource.all, BatchID.123, (Qty_Shipped_in_Batch{kg} >0)

5. Cost to ship fish used in a given batch

5) [MS Batch]: "Shipping_Cost_in_Batch"{$}, Time.Day., Geog.Market., BatchID., Foodsource. =
Qty_Shipped_in_Batch{kg} × [MS Distribution]: (Shipping_Cost_per_kg{$/kg})
5q) [MS Batch]: "Shipping_Cost_in_Batch"{$}, Foodsource., Time.Day., BatchID.123, (Qty_Shipped_in_Batch{kg} > 0)


6. Cost per kg to ship all fish in a given batch

6) [MS Batch]: "Shipping_Cost_per_kg_in_Batch"{$/kg}, Time.all, Geog.Market.all, Foodsource.all, BatchID. =
Shipping_Cost_in_Batch{$} ÷ Qty_Shipped_in_Batch{kg}
6q) [MS Batch]: "Shipping_Cost_per_kg_in_Batch"{$/kg}, BatchID.123
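Formula 4a's proportional split is the heart of this block. A minimal Python sketch (mine) shows the allocation using the Jan9 tuna Qty_Sent figures from Figure 17.8b; the 400 kg batch total is an assumed value used purely for illustration, not a number from the exercise.

    # Split the batch's tuna across markets in proportion to Jan9 shipments (Figure 17.8b).
    qty_sent = {"Canada.LM1": 803.0, "Canada.LM2": 268.0}   # {kg}, Foodsource.Tuna, Jan9
    qty_shipped_in_batch_all = 400.0                        # {kg}, Geog.Market.all (assumed)

    total_sent = sum(qty_sent.values())
    qty_shipped_in_batch = {market: qty_shipped_in_batch_all * qty / total_sent
                            for market, qty in qty_sent.items()}
    # -> Canada.LM1 about 299.9 kg, Canada.LM2 about 100.1 kg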

Currency Exchange (see Figure 17.9)

1. Currency available in a country at the start of a given day

1) [C]: "Local_Currency_on_Hand_Opening"{Lcurr} Time.Day., Geog.Country. = (Local_Currency_Purchased{Lcurr} + Local_Currency_Remaining{Lcurr}), Time.Day.(this-1) 1q) [C]: "Local_Currency_on_Hand_Opening"{Lcurr}, Time.(Jan8 to Jan11)., Geog.(Canada, France).

2. Currency remaining in a country after expenditures on a given day

2) [C]: "Local_Currency_Remaining"{Lcurr}, Time.Day., Geog.Country. =
Local_Currency_on_Hand_Opening{Lcurr} - Local_Currency_Spent{Lcurr}
2q) [C]: "Local_Currency_Remaining"{Lcurr}, Time.(Jan8 to Jan11)., Geog.(Canada, France).

3. Dollars spent on currency in a country on a given day

3) [C]: "Dollars_Spent_for_Local_Currency_Purchased"{$}, Time.Day., Geog.Country. =
Local_Currency_Purchased{Lcurr} ÷ Daily_Exchange_Rate{Lcurr/$}, Time.Day.this
3q) [C]: "Dollars_Spent_for_Local_Currency_Purchased"{$}, Time.(Jan8 to Jan11)., Geog.(Canada, France).


4. Effective exchange rate utilized for currency on hand at the start of a given day

4) [C]: "Effective_Rate"{Lcurr/$}, Time.Day., Geog.Country. =
Local_Currency_on_Hand_Opening{Lcurr}, Time.Day.this ÷ ( ( Local_Currency_Remaining{Lcurr} ÷ Effective_Rate{Lcurr/$} + Dollars_Spent_for_Local_Currency_Purchased{$} ), Time.Day.(this-1) )
4q) [C]: "Effective_Rate", Time.(Jan8 to Jan11)., Geog.(Canada, France).
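Because the effective rate is defined recursively, one worked day helps. The following Python sketch (mine) steps Canada from Jan8 to Jan9 using only the figures shown in Figure 17.9; formula 3 is applied here to the quantity of local currency purchased, which on Jan8 equals the quantity spent, so that choice does not affect this example.

    # Canada, stepping from Jan8 to Jan9; every input appears in Figure 17.9
    opening_jan8 = 25_000.0    # Local_Currency_on_Hand_Opening {Lcurr}
    effective_jan8 = 2.00      # Effective_Rate {Lcurr/$}
    spent_jan8 = 20_000.0      # Local_Currency_Spent {Lcurr}
    purchased_jan8 = 20_000.0  # Local_Currency_Purchased {Lcurr}
    daily_rate_jan8 = 2.00     # Daily_Exchange_Rate {Lcurr/$}

    remaining_jan8 = opening_jan8 - spent_jan8                      # formula 2 -> 5,000
    dollars_for_purchases_jan8 = purchased_jan8 / daily_rate_jan8   # formula 3 -> 10,000 $
    opening_jan9 = purchased_jan8 + remaining_jan8                  # formula 1 -> 25,000
    effective_jan9 = opening_jan9 / (remaining_jan8 / effective_jan8
                                     + dollars_for_purchases_jan8)  # formula 4 -> 2.00

Repeating the step for Jan10 and Jan11, with the changing daily rates, produces the effective-rate series the later purchasing formulas depend on.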

Purchasing (see Figures 17.10a-d)

1. NonFoodAsset cost per kg of foodstuff purchased

1) [Purchasing Cost]: "NonFoodAsset_Cost_per_kg"{$/kg}, Time.Day.atabove., Geog.Market.atabove., Foodsource.atabove. = NonFoodAsset_Cost{$} ÷ [Purchasing Activity]: (Qty_Purchased{kg}) 1q) [Purchasing Cost]: "NonFoodAsset_Cost_per_kg"{$/kg}, Time.(Jan9 to Jan11)., Geog.Market.MyGrouping.down1., Foodsource.all

2. Daily expenditure on a given food item at a given market

2) [Purchasing Activity]: "Local_Currency_Spent"{Lcurr}, Time.Day., Geog.Market., Foodsource. =
Price{Lcurr/kg} × Qty_Purchased{kg}
2q) [Purchasing Activity]: "Local_Currency_Spent"{Lcurr}, Time.(Jan9 to Jan11)., Geog.MyGrouping.down1., Foodsource.(Salmon, Tuna, FSole).

3. Dollar expenditure for foodstuffs purchased

3) [Purchasing Activity]: "Dollars_Spent"{$}, Time.Day., Geog.Market.atabove., Foodsource.atabove. =

Local_Currency_Spent{Lcurr} ÷ [C]: (Effective_Rate{Lcurr/$}, Geog.Market.Country) 3q) [Purchasing Activity]: "Dollars_Spent"{$}, Time.(Jan9 to Jan11)., Geog.MyGrouping.down1.

4. Expenditure per kg for foodstuffs purchased

4) [Purchasing Activity]: "Foodsource_Cost_per_kg"{$/kg}, Time.Day.atabove., Geog.Market.atabove., Foodsource.atabove. = Dollars_Spent{$} ÷ Qty_Purchased{kg} 4q) [Purchasing Activity]: "Foodsource_Cost_per_kg"{$/kg}, Time.(Jan9 to Jan11)., Geog.Market.MyGrouping, Foodsource.(Salmon,Tuna,Fsole).
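A quick Python check of formulas 2 through 4 for tuna at Canadian local market 1 on Jan9, using the Figure 17.10b inputs and the 2.00 Lcurr/$ effective rate worked out in the currency sketch above (the variable names are mine):

    # Tuna at Canadian local market 1 on Jan9 (Figure 17.10b)
    price = 4.0              # {Lcurr/kg}
    qty_purchased = 803.0    # {kg}
    effective_rate = 2.00    # {Lcurr/$}, Canada, Jan9 (from the currency sketch)

    local_currency_spent = price * qty_purchased            # formula 2 -> 3,212 Lcurr
    dollars_spent = local_currency_spent / effective_rate   # formula 3 -> 1,606 $
    foodsource_cost_per_kg = dollars_spent / qty_purchased  # formula 4 -> 2.00 $/kg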

5a. Quantity of fish shipped from a given market in a given batch

5a) [MS Batch]: "Qty_Shipped_in_Batch"{kg}, Time.Day., Foodsource.atabove., Geog.Market., BatchID. =
Qty_Shipped_in_Batch{kg}, Geog.Market.all × [MS Distribution]: (Qty_Sent{kg}, Geog.Market. ÷ Qty_Sent{kg}, Geog.Market.all)

5q. Qty shipped in batch from each market by fish type

5q) [MS Batch]: "Qty_Shipped_in_Batch"{kg}, Time.Day., Geog.Market., Foodsource.Fish., BatchID.123, (Qty_Shipped_in_Batch{kg} > 0)

6. Total expenditure for foodstuffs used in a given batch

6) [MS Batch × Purchasing Activity]: "Foodsource_Cost_in_Batch"{$}, Time.Day., Foodsource., Geog.Market.all, BatchID. =
Sum( ( [Purchasing Activity]: (Foodsource_Cost_per_kg{$/kg}) × Qty_Shipped_in_Batch{kg} ), Geog.Market. )
6q) [MS Batch]: "Foodsource_Cost_in_Batch"{$}, BatchID.123, Geog.Market.all, Time.Day.all, Foodsource.all

7. Total expenditure per kg for foodstuffs used in a batch

7) [MS Batch]: "Foodsource_Cost_per_kg_in_Batch"{$/kg}, Time.Day.all, Geog.Market.all, Foodsource.atabove., BatchID. =

Foodsource_Cost_in_Batch {$} ÷ Qty_Shipped_in_Batch {kg} 7q) [MS Batch]: "Foodsource_Cost_per_kg_in_Batch"{$/kg}, BatchID.123, Time.Day.all, Foodsource.all, Age.all

8. Total nonfood asset cost of foodstuffs used in a given batch

8) [MS Batch]: "NonFoodAsset_Cost_in_Batch"{$}, Time.Day.atabove, Geog.Market.atabove., Foodsource.atabove., Age., BatchID. =
Sum( ( [Purchasing Cost]: (NonFoodAsset_Cost_per_kg{$/kg}) × Qty_Shipped_in_Batch{kg} ), Geog.Market. )
8q) [MS Batch]: "NonFoodAsset_Cost_in_Batch"{$}, BatchID.123, Time.Day.all, NonFoodAsset.all, Age.all

9. Nonfood asset cost per kg of foodstuffs used in a batch

9) [MS Batch]: "NonFoodAsset_Cost_per_kg_in_Batch"{$/kg}, Time.Day.atabove., Geog.Market.atabove., Foodsource.atabove., Age., BatchID. =

NonFoodAsset_Cost_in_Batch{$} ÷ Qty_Shipped_in_Batch{kg}
9q) [MS Batch]: "NonFoodAsset_Cost_per_kg_in_Batch"{$/kg}, BatchID.123, Time.Day.all, Geog.Market.all, NonFoodAsset.all

PART Four

Further Issues

Why should it be possible to have grounds for believing anything if it isn’t possible to be certain?

LUDWIG WITTGENSTEIN

CHAPTER 18

Multidimensional Guidelines

The purpose of these feature guidelines is to describe the key areas of multidimensional functionality in terms of feature categories that any tool must provide and in terms of features that tools may or may not provide. When appropriate, within each category, features are ordered in terms of how basic or advanced they are. Although any tool should support basic features in all categories (and most do), it is not necessary for any tool to provide advanced functionality in all areas. Rather, it is important that you evaluate your needs and align those needs with the features offered in a tool. If your application is budgeting, you may require nonleaf level data entry and efficient incremental updating capabilities. If your application is route analysis for a shipping company, you may require very large dimension support and excellent sparsity management. If your application involves financial reporting across a basket of currencies, you may wish to use a tool that understands currency translations.

You will benefit the most from these guidelines if you have read the chapters in Part 2 and thought about some of the questions that you can ask yourself to assess your needs. The descriptions and questions in this chapter are meant to fine-tune your awareness.

The guidelines are divided into two broad categories: predominantly logical features and predominantly physical features. Logical features can be described independently, without regard to any particular hardware platform, operating system, number of users, physical storage methods, or network properties. How a product defines dimensions, hierarchies, formulas, links, and views are examples of logical features. Physical features can be described independently, without regard to any particular model that might be defined or analyzed. Physical features include how a product distributes computations across a network, the methods used to store and retrieve data,


and the speed and spatial efficiency attained, as well as what hardware and software platforms it can run on. Of course, as with most binary schemes, these classifications are rough labels covering what would be more accurately described as a spectrum, with the schema definition being most logical and the clumps of charge in the transistors being the most physical, which is why I inserted the term predominantly in the previous paragraph. Support for multiple users, for example, has both logical and physical implications. In addition, even though different features of products can be grouped into being more physical or more logical, there are many times when the natural and immediate follow-up question to a key logical attribute is a physical question, typically performance-related. Thus, from time to time I insert natural follow-up questions in the guidelines, regardless of the type of question.

Outline

At an overview level, the guidelines are structured as follows:

■■ Logical
   ■■ Core
      Structure
      Operations
      Representations
   ■■ Application
      Knowledge-oriented
      Process-oriented
■■ Physical
   ■■ Storage/access
   ■■ Computation
   ■■ Optimization
   ■■ Security
   ■■ Tiering
   ■■ Platform

Core Logical Features

The three principal categories of core logical features described here are structure, operations, and representations. How models and dimensions are defined and the types of relationships that can exist between members of a dimension are examples of structural issues. How data flows through a cube, the types of links that can be created, and the types of formulas that can be defined are examples of operational issues. How multidimensional data is displayed and browsed in either matrix or graphic form are examples of representational issues. This organization is similar but not identical to the relational model, which is usually defined in terms of data structure, data operations, and data integrity.1

Please note that wherever I use the term maximum (as in the maximum number of members per dimension, the maximum number of dimensions per cube, or the maximum number of data cells per cube), I mean it in the spectral sense of a sliding scale of practical maximums that vary as a function of hardware and other factors. So if you are using these guidelines to formulate questions, always connect your questions about tool limits with some measure of contextualized performance, such as, "On a dual-processor Pentium III with 2GB of memory, with a single user directly accessing the machine, what is the maximum number of base data values from which the tool can calculate aggregates and other derivations in under 10 seconds when the inputs are all held in memory?" This is especially necessary where the tool's performance depends on certain data or meta data being in memory and where your circumstances may not permit all the appropriate data to reside in memory. In these cases you will need to find out how performance drops when key regulating information needs to be paged in and out of memory or, conversely, what kind of hardware investment will be required to guarantee the kind of performance you are looking for.

Structure

Dimensions

General

1. Does the tool support formulaically defined dimension members, such as using a date stamp for time?
2. Does the tool support derived dimensions?
3. Are dimensions fully symmetrical?
   3.1 If not, what kinds of dimensions does the tool define?
      3.1.1 What are the different attributes for each kind of dimension?

Hierarchies

1. Does the tool support hierarchical dimensions?
2. If so, is it level-based, ragged hierarchy-based, or both?
3. If level-based, can it simulate raggedness through dummy members?
   3.1 If both, does it support both types of hierarchical relationships within a single dimension?
   3.2 Can it support multiple hierarchies within a single dimension?
      3.2.1 If so, is navigation across all hierarchies equivalent, or is there a principal hierarchy?
4. Do all the members of a dimension hierarchy have to be connected, or can they be unconnected?


Cardinality

1. What is the maximum number of dimensions that may exist in a database?
2. For each type of dimension recognized by the tool:
   2.1 How many members can it contain?
   2.2 If the dimension type is hierarchical:
      2.2.1 How many levels or generations can it contain?
      2.2.2 How many children can each parent have?

Versioning

1. Does the tool support aliases for dimension members?
2. Can different subsets of the same dimension be used in different cubes?
3. Can those different versions be updated at different times?
   3.1 Think about the case where the marketing department is responsible for maintaining the official list of products and this list changes every quarter. Sales analysts can't analyze the quarter's sales until the quarter has ended. So ideally they work on last quarter's product dimension until they have finished their analysis, at which point their product cube is updated to reflect the changes made by the marketing department.

Dimension Orderings

1. Can dimensions be ordinally or cardinally ordered?
   1.1 If so, can the orderings be named?
   1.2 Can you write formulas that involve the notion of the following?
      1.2.1 Previous member, next member
      1.2.2 First member, last member
      1.2.3 Distance between two members (cycle length = distance between min and min)
   1.3 If so, can you do it for dimensions other than time?

Cubes

1. Is the tool fully symmetrical? If not:
   1.1 Does the tool distinguish measures and dimensions?
      1.1.1 To what degree can they swap roles?
   1.2 Does it define a measures dimension?
   1.3 What extra capabilities are defined for the members of the measures dimension, such as data typing or aggregation function exceptions?
2. What is the maximum number of dimensions per cube?


3. What is the maximum number of intersections or cells per cube?
4. What is the maximum number of input data cells per cube?
5. What is the maximum number of total stored data cells per cube?
6. Which factors most affect cube performance?
   6.1 Can cubes be defined in terms of other cubes?
      6.1.1 If so, what is the maximum number of input cubes per derived cube?
      6.1.2 How long can the chain of derived cubes be?
         6.1.2.1 Can a derived cube contain a mixture of input and derived data?

Further Issue: Regularity

1. Can subsets of the cube have different dimensionality (additional or fewer dimensions) relative to the rest of the cube?
2. Can a cube be composed of subsets of defined dimensions as opposed to entire dimensions?

Models and Databases

1. Is the model or database composed of a single hypercube or of multiple interactive cubes?
   1.1 If multiple interactive cubes, how are dimensions shared between cubes?
      1.1.1 For multicube architectures, can cubes be joined along dimensions that are not identical?
         1.1.1.1 If so, how are the dimension member differences reconciled? (Lookup table...)
         1.1.1.2 What are the performance issues?
2. How are application ranges defined?
   2.1 Are all variables in a cube dimensioned in the same way?
   2.2 How are location ranges for terms on the right-hand side of an equation determined?
   2.3 Does the tool's formula language support multiple calculation contexts per expression?

Data Types or Domains

1. What is the range of numerical types supported?
2. Does it support character strings or other media, such as images, as data values?

3. Does it support data types on a variable-by-variable basis?
4. Does it recognize nominal, ordinal, and cardinal series?
   4.1 If so, is it just that sorting is an option for formatting?
   4.2 If so, does it apply to individual variables or to the cube as a whole?

Further Issues: Data Types

1. Does the tool support user-defined data types?
   1.1 If so, are they user-defined enumerated values, constrained forms of built-in types, or others?
   1.2 Can they be composites of multiple types, as in relations or objects?
   1.3 Can there be user-defined operations on those types?
   1.4 Can the data types inherit operations from other types?

Operations

Queries

1. Does the tool support selection based on dimension members or on characteristic attributes?
   1.1 Can the tool show the distinct values for an attribute?
      1.1.1 What advance work has to be done by a DBA?
2. Does the tool support selection based on an expression?
3. Does the tool support aggregations based on member attributes?
4. Does the UI or API support drilling down a hierarchy?
   4.1 Does the tool support the definition of custom drill paths?
      4.1.1 Does it display the path when in use?
5. Does the tool let the user create manual groupings of items in a dimension?

Iterative Query Capabilities

1. Does the tool support iterative queries?
   1.1 What kinds of iterative queries?
   1.2 Ranking/sorting/filtering of the result set?
   1.3 Can set operators be used?
   1.4 Further computation using the result set?
      1.4.1 Is the further query executed against the previous result set?
      1.4.2 Where is that result set maintained?
2. Can the tool save queries and reuse them in an iterative process?

Formulas and Calculations

1. Does data have to enter at the leaf level of dimensions?
2. If not, and formulas exist for rolling up the leaf-level data, can the user control when these aggregation formulas are overridden by manually entered data?
3. Can formulas reference data from multiple cubes?
   3.1 How is multicube referencing specified?
4. Can a formula reference data from a mixture of cubes and tables?
5. Are there any restrictions to the dimensional referencing within formulas?
6. If the tool distinguishes a measures dimension from other dimensions and forces every cube to have one measures dimension, and assuming the tool enables you to define formulas for the individual members of the measures dimension:
   6.1 Does the tool permit formulas to be added to nonmeasures dimensions?
      6.1.1 What are the restrictions on those formulas?
7. What numerical operations are provided?
8. Does it support range and conditional operators?
9. What logical operations are provided?
10. What set operations are provided?
11. What text operations are provided?
12. What other member operations are provided?
13. Is there a mechanism for extending the calculations with user-defined functions?
14. Can the formula compiler detect simultaneous equations?
   14.1 If they are detectable, can the system resolve them?
   14.2 If not, are they treated as circular?
      14.2.1 Can circular references be detected?
         14.2.1.1 If not, what happens?
15. Does the tool provide point-and-click choices when creating statements, or does the user have to write in functions? Where is the transition?
16. Does the tool provide a partial name-matching facility? Does it show you a list of possibilities when there is more than one?
17. How does the tool handle interdimensional precedence?
   17.1 Can you set precedence on a dimension-by-dimension basis?
   17.2 Can you override it on a cell-by-cell or member basis?
   17.3 Does the tool provide facilities for testing the impact of using different precedences?

18. How does the tool specify formula application ranges?
18.1 Does the tool provide for formula exceptions?
18.1.1 If so, what kinds?
18.1.2 How are they defined? Member-based and/or tuple-based?
19. Are attributes treated as different entities than variables?
19.1 If so, what data types may be used for attributes?
19.2 Can attributes be used to construct dimension levels as well?
20. Can the tool distinguish between the following?
20.1 Cube intersections whose data is missing and whose value is otherwise unknown
20.2 Intersections for which data would be meaningless
20.3 Intersections whose data, while not entered, is known to be equal to zero
20.3.1 If yes, does the tool process missing data differently from meaningless data?
20.3.2 Can the tool substitute data where data is missing? How is this done?
20.3.3 Can the tool substitute meaningful variables where a variable is meaningless?
21. In a client/server implementation, can calculations be designed on the client? If so, what range of calculations can be client-defined?
22. Can an end-user create what-if modifications to the server-resident cube?
22.1 Can those modifications be stored or shared?
22.1.1 Can they be written back to the database?

Links

1. Can a dimension be defined in terms of parent-child tables?
2. Can a dimension be defined in terms of SQL tables where each column in the table represents a level in a dimensional hierarchy?
3. Can a dimension be defined in terms of a snowflake (members in snowflake form)?
4. Can a dimension be defined in terms of a star schema, constellation, or federation?
5. Can the product be customized to access data from proprietary (nonrelational) databases?
6. For products that distinguish between member attributes and cube measures:
6.1 Can member attributes be defined on additional joined tables?
6.2 If so, what are the limits to the joins that the member attribute tables can have to the dimension tables?

7. For each dimension definition, must all members be in one table, or can the members be created from more than one table (horizontal partitioning of dimension tables for a given level)?
8. Can data and meta data be brought into a model from a single table?
9. Can the links parse a type two table?
10. Can the links parse a type one table?
11. Can the links read measures from rows as opposed to columns?
12. Can the links contain expressions, or can they only contain column names?
13. Are the links persistent?
14. Can the system link to multiple fact tables for different sets of keys for the same nonkey information (horizontal partitioning of fact table information)?
15. Can the system link to multiple fact tables for different sets of nonkey information (vertical partitioning of fact table information)?
15.1 If so, how are changes detected in the external data source?
16. Can formulas that are stored in external tables be referenced from within formulas in an MD model?
17. For products that create and manage their own nontabular data storage:
17.1 Can data from external tables be aggregated in an MD model without first duplicating the data?
17.2 Can data from an MD model be sent back to an external table?
17.3 Can data from external tables be joined with data in cubes via a command/query sent to the cube?
18. For products that are defined purely from RDBMS tables:
18.1 Can the system link to and recognize aggregate tables?
18.2 Can the system link to and recognize aggregate values in the primary fact table?

Batch Automation

1. Can recomputes be scheduled?
1.1 When multiple cubes and dimensions are scheduled together for recompute, is there an intelligent queue manager to optimize the ordering of recomputed items such as computing dimensions before cubes?
2. Can data automatically be fed into a cube via links and then (re)computed?
3. Can changes to model structure defined in external tables be brought automatically into a model and used for recomputation?
4. Are changes to member names automatically propagated throughout all formulas where the member name is used?

Optimizations and Efficiencies

1. How is cache coherency maintained when changes are made to the underlying data? What triggers a cache flushing?
1.1 Can the cache be incrementally updated?
1.2 Can the cache be edited?
2. Is it possible for the hypercube to be left in an inconsistent state as a result of a partial recalculation?
3. What is recalculated during a minimal recalc?
4. Does the tool enable an administrator to specify which aggregates get precomputed and stored and which get computed at request time?
4.1 Is this then transparent to the end-user?
5. Can the tool automatically figure out what aggregates are best to precompute and compute on the fly? What kind of algorithm does the tool use?
6. Can the tool figure out where a computation should take place—client or server or database?

Representations

When evaluating tools, make sure you sit down and use them. There are real differences in terms of how you browse and reorganize views. Most of these questions only apply to OLAP clients.

Views and Navigating

1. Does the application provide a standard matrix-style viewer?
1.2 If so, does the matrix viewer enable multiple cubes (of similar or different dimensionality) to be viewed together?
1.2.1 If so, can they be linked together for synchronized viewing?
2. If dimensions can be nested on a result matrix, can the user navigate by dimensions at the outer levels, or is navigation just at the lowest (cell) level?
3. Can the user seek or go to a particular member or cell in a report, or must the user scroll to every member or cell?
4. Does the matrix provide the capability to drill down and up without specifying a new query?
5. Where are result sets cached? Does every screen event generate a new query to the server or is there a local cache?
6. Does the viewer display descriptions (short or long) of elements?
7. Can the viewer set sections that will be sorted within sections for nonhierarchical data?
8. Can the tool save, post, recall, edit, and format reports?

9. What kind of output form manipulation is provided? Can users switch between different display metaphors? Can the user easily change from a table to a grid to chart display?
10. Does the OLAP tool support color coding based on the value of the data, such as color of text/number, cell, background, or report attributes?
11. Can the OLAP tool hide data in reports based on positional or value criteria?
12. Does the tool evaluate the response time or data load for a query and alert the user or stop the query when a query is long or large beyond some user- or system-defined limits?
12.1 If the user still wants to and has the authority to execute a large query, will the tool begin displaying results as they arrive or will the user be forced to wait for the entire result set to arrive?
12.2 For lengthy reports, can the user jump from one page to another with some form of goto command or does the user have to scroll through and wait for the display of every page in between?

Formatting

1. What options does the tool provide for default display formats to be associated with measures in the database?
2. What options does the tool provide for display formats to be associated with nonmeasures members?
3. Can it provide precedence for representation overlaps where they exist?
4. Can the suggested display formats be overridden at the browser?
5. Can the end-user change column width, a font for the heading, and/or the order of the columns?
6. Can report page headers be linked to the dimensional slices they represent?
7. What controls does the user have for specifying the display attributes of numeric values, such as page, model, variable, range, and so forth?
8. What kind of annotations are supported?

Noncore Logical

Knowledge-Domain

Time

Time is an aspect of any situation and most data sets.
1. Does the tool have any special time knowledge?
2. Does it understand calendars (fiscal, project, and retail)?

3. Can it compare time series of different periodicities?
4. Can it algorithmically generate time series positions?
5. If it offers time knowledge, can you still create your own time dimensions?
6. Can there be more than one time dimension in a single cube?
6.1 If so, can they both (or all) use a date data type?
7. Can you specify current week, month, quarter, and/or year and have it come out with proper data when next refreshed?
8. Can the tool correctly determine the appropriate "same period last year"?
9. Does it have the capability to understand appropriate relative times (that is, understand "last 13 weeks," and reasonably determine comparable period from prior year for comparison)?
10. Can special custom time periods be defined (for example, high season might start and stop at different dates for different businesses)?

Currency

1. Does the tool recognize currencies?
2. Can it perform currency translations?
2.1 Can it handle cost of exchange?
3. Can these be based on average exchange rate, last day, first day, lowest, and/or highest?

Personal Profiles

1. Can they be created?
1.1 If so, what information is maintained?
1.2 Is there an API to the profile?

Any Other Specialized Domains

1. Does the tool have prebuilt solutions or protosolutions for specific applications such as in retail, manufacturing, and/or finance?

What Languages Does the Tool Support?

1. For those languages that the tool supports, how much of the program is translated? Are the menu options, error messages, and help translated?

Process-Oriented Domains

Information Distribution

1. Can the tool alert members of a distribution list of the availability of a new report?
2. Can the tool store a client/user profile that can be edited by the distributor?
2.1 What does the distributor need to set up for the user profile?
3. What kinds of information can be tracked about usage?
3.1 How many times logged in?
3.2 Average number of users working concurrently?
3.3 What reports were run and how many times?
3.4 Which pages were viewed?
3.5 What datapoints were retrieved?

Application Building

1. What kind of application development environment is provided by the system:
1.1 Does the tool provide its own application development environment?
1.2 Does the tool leverage third-party application development environments (for example, Visual Basic, Enterprise JavaBeans, and so forth)?
2. What functions are covered by the application environment: analysis, links, model creation, and/or reporting?
3. To what degree is it declarative or procedural, textual or graphic?
4. Does it offer debugging facilities?
5. What templates or other prebuilt application fragments are provided?
6. Does the tool let you build miniquery applets?
6.1 Link dynamic expressions for prompting purposes?
6.2 Customize prompting?

Integrated DSS Capabilities

1. How well does the tool integrate with data-mining functions?
1.1 Which vendors/tools work with the OLAP tool? How are the tools communicating? Is it through the OLAP tool's API?
2. Does the mining work with cube data alone, base data held in SQL tables alone, a combination of the two, or something else?


3. Do you need to perform any preliminary transformations (such as type conversion, normalization, or scaling) on the data?
4. Can the data mining leverage the hierarchical MD structure?
4.1 For example, can a single data-mining operation such as association apply across multiple levels in, say, a product hierarchy?
5. Can OLAP dimensions be created from the outputs of data-mining operations?
5.1 Can the OLAP tool be used to browse the output of a data-mining operation such as affinity analysis?
6. Can documents and numeric cube slices be joined within a common dimensional framework?
6.1 For example, can the tool support searches for precanned reports by industry?

Agents and Exception or State Tracking

1. Does the tool offer agents or value tracking?
2. How are monitored values defined?

3. What kinds of values are tracked?

4. Can queries run in the background on the server?
5. How many background queries can run simultaneously (given a particular machine configuration)?
6. Can alerts be broadcast to a list of users?
7. Do broadcasts emanate from the server or does the client need to get involved?
8. Can attachments point to server-resident reports?

Friendliness Features

These features affect many areas from interfaces for browsing to defining formulas.
1. Does the tool enable you to form queries without executing them immediately?
2. Does the tool prompt the user for confirmation before performing large operations, such as deleting a cube?
3. Does the tool warn the user when an expensive query has been requested (for example, "This query will take 45 minutes to compute. Do you want to do this?")?
4. Can the user stop a query in progress?
5. Is there an undo button? How far back does it go?
6. Can the user have multiple interactive windows open simultaneously? Are changes made through one window propagated to the other windows?
7. Does the tool make good use of wizards?


Physical Features

Storage and Access

1. Is data in use stored on the client and/or server?
1.1 If on the client, what is kept on the client:
1.1.1 Dimensional information, cube cell data, and/or calculation results?
1.1.2 What about for Web browsers?
2. For any caches:
2.1 Can it be incrementally added to? (In other words, can a query that is partially answered by the existing cache use what is in the cache and only query the source for additional cells?)
2.2 Can any portions be modified without flushing the entire cache?
2.3 What aspects of server cache are multiread? For example, sometimes tools provide a multiread report cache, but otherwise have single user query caches.
2.4 What changes to the underlying data sources require cache flushing?
3. For any MDB storage:
3.1 What changes can be made without requiring a recalc of the MDB?
3.2 For R/OLAP style tools, is there any kind of persistent storage? If so, what is stored, and what is the form of that storage?
4. For any machine on which the application is running, what are the options for how data in use is distributed between disk and RAM, and between data cells and indexes?
5. Do pointers/indexes need to be stored in RAM? If they do, what happens to performance when the pointers/indexes can no longer fit in RAM?
6. Does dimensional meta data originating from source DBs need to be stored in MD server RAM? How much, if any, needs to be stored on the client?
7. Can the OLAP application store data in an external, typically SQL application?
7.1 If so, what aspects of the external application can be controlled?
8. Does the application support partitioned dimensions?
8.1 If so, what can they be distributed in: drives, machines, operating systems?
9. Depending on the type of storage, how much overhead is there per stored number?
9.1 Does this vary as a function of how much data and/or sparsity there is?
10. How are missing, meaningless, and zero-valued data stored?
11. How does the ratio of empty cells to actual values affect storage?
12. Can individual client queries be processed in parallel?

13. What specific functions are multithreaded?
14. Are there any access performance benchmarks that show how retrieval speed varies as a function of the number of datapoints against which the query is run?

Computations

1. What kinds of calculations must be performed prior to querying?
2. What kinds of calculations can be performed during the course of a query?
3. For calculations that can take place either before or during query processing, what facilities are there to figure out time-space tradeoffs?
4. Are there any performance benchmarks to show how long it takes to compute a screenful of numbers as a function of increases to the number of input cells per output cell and as a function of the total number of datapoints against which the query is run?
5. What happens to compute time when functions reference complex sets of inputs?
6. How much temporary storage is needed for server calculations?
6.1 What are the determinants of the amount of space needed?

Multitier Distribution

1. Can the user/DBA control where calculations are performed?
2. For Internet browser-based applications, what kinds of calculations can be performed on the client?
3. What computations can be defined on the client to be performed on the server?
3.1 What if the client is a thin browser?
4. Can the system figure out the most efficient way to distribute computations between the client, any middle tiers, and the server?
4.1. What are the determinants in its calculation?
5. What is the maximum number of concurrent clients?
6. Can multiple clients share the same server data?
7. Can the administrator set the query cache sizes?
8. Are query caches persistent between requests?
9. For the tools that support multiuser write:
9.1. Is updating multithreaded?
9.2 What happens if two or more users attempt to write to the same cell(s) at the same time?

9.3 If one user is writing to a cube, at what level is the cube locked against further writes?
10. Can clients browse the data cube at the same time that the cube is being recomputed?
11. What levels of data isolation are available to the client?

Optimizations and Efficiencies

DBMS Facilities

1. What kind of backup facilities does the tool offer?
2. What happens if system software goes down?
3. What happens if there is a hardware failure?
4. Is there a transaction log? Can partially finished transactions be rolled back?
5. Can the system ever wind up in an inconsistent state?

Platform Issues

1. What platforms are supported?
2. How is browser functionality provided?
2.1. Java applets, Active X?
3. How does client functionality compare with browser-based functionality?
4. How portable are applications across machines?

Multiuser Security

1. Does the tool offer security for both reads and writes?
2. Is security defined in a dimensional language?
3. To what level of granularity can security be defined: database/model, cube, dimension members, and cells?
4. If a data cell is secured to some level, are all its dependents automatically assigned the same level of security?
5. Can the system record when (and by whom) attempts were made to access or write secure data?
6. Can the user see the members for cells to which he or she has no access?
7. Is access granted or revoked or both?
8. Can security be associated with individuals and/or groups of individuals?
9. Can authentication be performed from the network identification?

CHAPTER 19

Product Language Comparisons

As you can see from Chapter 18, there are literally hundreds of questions that can be asked of any OLAP product and its associated language. Even semidetailed write-ups of product analyses, such as the one created by my colleague George Spofford (from which an excerpt was used in this chapter) that compared Microsoft's first release of its OLAP product with Oracle's Express version 6.3.0, can easily run over 60 pages. User manuals for products typically run in the many hundreds of pages. Furthermore, the product landscape is constantly changing. By the time you read this book, it is likely that any representations made about a particular vendor's product language will have already changed. Given these caveats, my reasons for including a chapter on product language comparisons are as follows:
■■ To show you via concrete examples that, yes, every product language does need to provide some mechanism for the various core functions you learned about in this book, such as relative referencing, missing data handling, or application range specification
■■ To show you how to leverage your understanding of the Located Contents (LC) model to help you interpret dimensional formulas from different products
■■ To give you a relative sense or feel for, as well as to contrast, the variety of ways that different OLAP product languages accomplish the same thing


Toward that end this chapter uses examples where possible to describe the different ways that four OLAP product languages—namely those from Oracle, Hyperion, Microsoft, and TM1—provide for relative referencing, basic formulas, application ranges, logical sparsity management, and formula precedence. Where appropriate, the way of specifying the expression in LC will be shown as well. To avoid confusion and at the same time allow for a comparison of LC, which is just a language, with a variety of product-specific languages, the column headings in the tables where language comparisons are made bear the titles "Language Name" and "Syntax." I have taken the liberty of referring to each commercially available product's language, within the confines of this chapter, by the name of its product, even though some vendors have given specific names to their product languages. This chapter in no way pretends to do a full comparison between these product languages, even in the aforementioned areas. There are plenty of other areas where these products can but will not be compared, such as how dimensions are defined, how schemas can be linked to data sources, how many physical optimization controls are provided, how the products use disk and memory, how they set up user caches, how they provide for security, and how they integrate with relational databases.

Kickoff Example

To drive home the point that different OLAP environments provide for substantially different ways to accomplish the same thing, consider the following extremely simple schema and associated formula, where the name of the schema is Sales Cube and where the Geography dimension has a store, region, and all level:

(Geography.)~ Sales

Now let's say you want to define a new variable as the ratio of some store's sales to the sales in that store's region, and you want the calculation to apply only to the store level of the Geography dimension because it makes no sense at any other level in the dimension. In LC you could use the calculation that follows:

"Region share" {R} , Geography.stores. = Sales / (Sales , Geography.region)

In Essbase the calculation could look as follows:

If (@Curlev(Geography) = 0)
   "Region share" = Sales / @ParentVal (Geography, Sales)
Else
   "Region share" = #Missing
Endif

In Express, the calculation could look as follows:

Limit Geography to Store_level 'Store'
Region_share = Sales / Sales (Store_Region)

In TM1, the calculation could look as follows:

[Region share] = If (ElLev ('Geography' , !Geography) = 0,
    [Sales] / DB('Sales Cube', ELPAR ('Geography' , !Geography , 1)),
    0
)

In Microsoft, the calculation could look as follows:

CREATE MEMBER [Measures].[Region share] AS
'IIF (
    [Geography].CurrentMember.Level IS [Geography].[Store],
    [Measures].[Sales] /
       ([Measures].[Sales],
        Ancestor([Geography].CurrentMember, [Geography].[Region])),
    NULL
)'

Even without getting into syntax details, you should be able to see that each of the product languages dealt in some way with the issues of referencing the parent of the sales value in the Geography dimension and limiting the application of the calculation to the store level of the Geography dimension, and that the ways used by each of the product languages were considerably different. Since relative referencing is one of the cornerstones of dimensional applications, and since there are a variety of relative referencing functions that need to be provided for, let's look a bit more closely at how these product languages provide for it.

Sample Schemata

I will use two different schemata as the basis for illustrating the products' language and concepts. (For ease of reading, and when context is clear, I will use the otherwise slightly different-meaning terms "product language," "product syntax," "language," and "product" interchangeably in this chapter.) In order to provide a common basis for comparing product language elements, I will use the simple dimension structure outlined in Figure 19.1. This dimension will be called "Product" (with editorial hindsight, perhaps not an optimal choice). In order to illustrate product languages in use in an application, I will rely on the APB-1 schema, which is the only application schema I am aware of that has been publicly implemented by four different OLAP tool vendors.

Referencing Examples

Each of these products' languages supports a different subset of OLAP dimension concepts, and also a different set of intradimensional referencing operators and semantics.

All products
   Appliances
      Widgets
         Electrical
         Solar
      Gadgets
         Big
         Little
   Food
      Sweets
         Cake
         Cookies
      Soups
         Lentil
         Tomato

(Levels, from top to bottom: All, Family, Category, SKU)

Figure 19.1 Sample product dimension outline.

Each of these products' languages supports referencing of members by name, and referencing relative to a member that is the current one in some calculation or query context. The ability to perform hierarchical referencing may be different for a product language depending on whether the point of reference is a specific named member or the current member in context. Analysis Services 2000 (abbreviated to AS2K) uses the MDX language, which is supported to varying degrees by other servers as well. The examples here will use standard MDX to the degree possible, and I will note deviations to Microsoft-specific syntax.
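For readers less familiar with MDX, a minimal query sketch may help orient the table entries that follow. It assumes a hypothetical cube named Sales Cube built on the Product dimension of Figure 19.1 with a single Sales measure; the cube and measure names are illustrative only and not taken from any vendor's sample database.

// Retrieve Sales for two named Product members (standard MDX)
SELECT
   { [Measures].[Sales] } ON COLUMNS,
   { [Product].[Widgets], [Product].[Gadgets] } ON ROWS
FROM [Sales Cube]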

Referencing a Specific Member

The simplest intradimensional reference is to refer to a specific member. For example, a formula may need to refer to the Widgets member. In some cases, the purpose may be to refer to the member; in other cases the purpose would be to refer to data at a cell that is associated with that member. In LC, referring to the member would be expressed as Product.Widgets. Each product language (or just language in the case of LC) provides a fairly straightforward way of doing this, as illustrated in Table 19.1. The next simplest intradimensional reference is to refer to the current member in a calculation or query context. For example, Table 19.2 includes examples in each language of referencing the current member in the Product dimension in a context that is expecting or can handle a current member.

Table 19.1 Referencing a Specific Member in a Dimension or Instance in a Type

LANGUAGE NAME       SYNTAX
LC                  Product.Widgets
TM1                 'Widgets'
Essbase             @Member ("Widgets")
AS2K/MDX            Product.Widgets
Express             'Widgets'

Table 19.2 Referencing the Current Member or Instance

LANGUAGE NAME       SYNTAX
LC                  Product.this
LC                  no syntax, implicit
TM1                 !Product
Essbase             Product
AS2K                Product.CurrentMember
AS2K                Product (the .CurrentMember function is implicit when omitted)
Express             No syntax, implicit

In TM1, the ! in front of a dimension's name indicates to return the name of the current member. This name is simply a string value and needs to be used in a context that will use the name to refer to the member. In Essbase, certain functions operate relative to the current member of a dimension, and all that is required is to list the name of that dimension. In MDX, the .CurrentMember function returns a reference to the member. It is the default function for a dimension, so simply listing Product is adequate as well.
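As a minimal sketch of how a current-member reference is typically used, the following hypothetical MDX calculated member divides each category's sales by the sales of the All products member; the Sales measure in the numerator is implicitly evaluated at Product.CurrentMember, and could equivalently be written as ([Measures].[Sales], [Product].CurrentMember). The level name [Product].[Category] and the Sales Cube are assumptions carried over from the earlier sketch.

// Share of total sales for each Category member of the sample Product dimension
WITH MEMBER [Measures].[Share of All Products] AS
   '[Measures].[Sales] / ([Measures].[Sales], [Product].[All products])'
SELECT
   { [Measures].[Sales], [Measures].[Share of All Products] } ON COLUMNS,
   [Product].[Category].MEMBERS ON ROWS
FROM [Sales Cube]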

Referencing a Parent Member

An extremely common and usually simple type of reference is to refer to a parent member within a hierarchy. We will do that relative to the member Widgets in the Product dimension. Although parent in a hierarchy is a simple concept, using it in a product's language immediately brings to light some striking differences in these languages. TM1, Essbase, and Analysis Services provide a single, hierarchical-dimension data structure. Members and hierarchical relationships between them are all contained within the dimension. Express, however, provides for hierarchies by associating a relation data structure to a dimension.


Furthermore, there are two different ways of using relations to implement a hierarchy, and depending on which way you choose, certain functionality and syntax are available or denied. One way is to relate two or more dimensions through a relation. This allows for the cleanest expression of level-based dimensions: Each Express dimension corresponds to a different level of the application's dimension, and relations contain the 1-N relationships between the parents in one level and the children in another. However, as far as Express is concerned, each dimension is different, so the set of dimensions the application is modeling and the set of dimensions Express contains are different. The other way to create a hierarchy in Express is to relate a dimension to itself. This allows an application's dimension to directly correspond to a single Express dimension. An application can use more robust hierarchical syntax that combines the dimension and the relations, and it can easily use members from more than one dimension. (Other differences will become apparent when we look at adjacency later on in this chapter.) For the purposes of describing the Express language as used to define a dimension per level, we will use three dimensions: Prod_Family, Prod_Category, and Prod_SKU. The relation Fam_Cat_R will relate each category to its family, and the Cat_SKU_R relation will relate each SKU to its category. For the Express language in an embedded-total dimension, we will use one dimension called product, with a Parent_Prod_R relation that relates each product to its parent in the hierarchy. Express provides both a simple syntax that can be used with relations, and a LIMIT() function that can be used to construct arbitrarily complicated sets of members. We will use both of these syntaxes when possible. Essbase provides separate functions to retrieve data associated with a parent and to retrieve a reference to the parent member (@ParentVal() and @Parent()). For this example, we are interested in accessing the parent member itself. In TM1, we need to provide an index for the parent, since a TM1 dimension is not strictly hierarchical. A member can have more than one parent, making it a network (directed acyclic graph) instead of a hierarchy (see Table 19.3).

Table 19.3 Referencing a Parent of a Specific Member

LANGUAGE NAME       SYNTAX
LC                  Widgets.up(1) or Widgets.Parent
TM1                 ELPAR('Product', 'Widgets', 1)
                      (the 1 means "take the first parent")
Essbase             @Parent (@Member ("Widgets"))
MDX                 Widgets.Parent
Express (using levels, and a relation named ProdFamily_R)
                    Fam_Cat_R (Prod_Category 'Widgets')
Express (using embedded-total dimension)
                    Parent_Prod_R (Product 'Widgets')
Express (using embedded-total dimension)
                    Limit (Limit Product To 'Widgets') To Parents Using Parent_Prod_R


Often, we want to retrieve the parent member relative to an arbitrary member in a calculation or context. Table 19.4 shows the syntax in each of the representative languages.

Table 19.4 Referencing a Parent of an Unspecified Member

LANGUAGE NAME       SYNTAX
LC                  Product.up(1) or Product.Parent
TM1                 ELPAR('Product', !Product, 1)
Essbase             @Parent(Product)
AS2K                Product.CurrentMember.Parent
AS2K                Product.Parent
Express (using levels, and a relation named ProdFamily_R)
                    Fam_Cat_R
Express (using embedded-total dimension)
                    Parent_Prod_R

Referencing the Set of Children

Selecting the children of a member is also a common operation in OLAP systems. This is an operation that returns a set of members rather than a single member. Unlike the other product languages described here, TM1 does not permit declarative access in its Rules language to the set of children for a member. It does permit access to the count of children and the Nth child, but no iteration primitives are provided to actually loop through the N children. The syntax for each of the commercial products is found in Table 19.5.

Table 19.5 Referencing the Children of a Member

PRODUCT             SYNTAX
LC                  Product.Widgets.down(1) or Product.Widgets.Children
TM1                 not possible
Essbase             @Children (Widgets)
AS2K                Product.Widgets.Children
Express (using levels, and a relation named ProdCategory_R)
                    Cat_SKU_R 'Widgets'
                      (implicitly selects the SKU members related to Widgets)
Express (using embedded-total dimension)
                    Parent_Prod_R 'Widgets'
Express (using embedded-total dimension)
                    Limit (Limit Product To 'Widgets') To Children Using Parent_Prod_R

Previous and Next Members (Lag and Lead)

Each of the product languages that we are contrasting makes the order of members explicit in a dimension. This ordering is available for use by an application and may be easier or more difficult to maintain. The ordering may be leveraged in a database and by expressions and queries. (For example, time-series analysis frequently makes use of the database ordering of members.) The simplest ordering-based operation is to access the previous or next member in the level, which assumes, as was shown in Chapter 5, at least an ordinal ordering. Surprisingly, OLAP products and languages vary significantly in how they support ordering and ordering per level. In TM1, and in embedded-total dimensions in Express, all members of a dimension share a single numbering system, regardless of level. For example, in a time dimension, all days, months, quarters, and years are numbered together in a flat dimension; the first 31 members might be days, the next member might be a month, the next 28 members days, the next member a month, and so on. Thus, simply saying "give me the next member" might not return a member from the same level; additional understanding needs to be built into the application logic to know when to access the next member and when to access some other member. In Express, using separate dimensions as separate levels of an application dimension avoids this issue. In Essbase, you cannot refer to members relative to a member. You can, however, refer to data from an ordinally related member. The function that looks up the next or previous member can take an arbitrary ordered list of members in which to look up the next or previous member. If no list is provided, the member might come from another level in the hierarchy, but usually the list will include all members from the appropriate level. In AS2K, ordinal relationships are maintained per level. This frees a developer from needing to consider the other levels of a hierarchy (see Table 19.6).

Table 19.6 Previous and Next Members

LANGUAGE NAME       SYNTAX
LC                  Time.Dec2001.-1
TM1                 DimIX('Time', 'Dec2001')-1
                      (requires logic to ensure that this index points to the correct level)
Essbase             @Prior ("Dec2001" @LevMbrs (Time, 3))
                      (cell value reference only)
AS2K                Time.Dec2001.PrevMember
Express (using levels)
                    Lag('Dec2001', 1, Months)
Express (using embedded-total dimension)
                    Lag('Dec2001', 1, TimePeriod)
                      (requires logic to ensure that this index points to the correct level)
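To illustrate why per-level ordering matters, here is a hedged MDX sketch of a simple period-over-period comparison; it assumes a Time dimension with month-level members named as in Table 19.6 (Nov2001 is assumed to exist alongside Dec2001) and the hypothetical Sales Cube used earlier. Because AS2K maintains ordering per level, PrevMember of Dec2001 is guaranteed to be another month rather than a quarter or year.

// Month-over-month change in Sales
WITH MEMBER [Measures].[Sales Change] AS
   '[Measures].[Sales] - ([Measures].[Sales], [Time].CurrentMember.PrevMember)'
SELECT
   { [Measures].[Sales], [Measures].[Sales Change] } ON COLUMNS,
   { [Time].[Nov2001], [Time].[Dec2001] } ON ROWS
FROM [Sales Cube]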

Referring to Ancestors

In addition to referring to parent members, we frequently need to refer to ancestor members above the parent. Each product provides some mechanism for this. Express can provide ancestor members at a particular named level within an embedded-total dimension only if additional information is provided that specifies the level for each member. This can be accomplished in more than one way. One way would be to include an additional dimension that lists the levels of the modeled dimension (call this the level dimension) and a relation between the level dimension and the main dimension that relates each member of the main dimension to the level dimension member representing its level. This is the approach that we will use when providing an example of choosing an ancestor member at a particular named level. The Express dimension whose members represent the product levels is called Product.Level, and the multidimensional relation between it and products is called ProdLevel_R (dimensioned by Product and Product.Level). A similar approach can be used in TM1, using what are called string cubes in that product. The TM1 string cube is called ProdLevels. The dimension whose members represent the product levels does not need to be named in the expression that references the ancestor. Essbase and Analysis Services provide a direct ancestor mechanism within their hierarchy structures. Table 19.7 shows the syntax in each of the commercial product languages.

The Essbase @Ancest() function only works on the current member of a dimension; it cannot be made to directly work on a named member. Essbase does provide the @Ancestors() function, which can work on a named member. However, @Ancestors() returns all ancestor members above the given member, so if you want one in particular, you need to strip away the unnecessary ones with the @Remove() function. Table 19.8 shows the syntax in each product.

The last example to illustrate referencing ancestors shows how to reference an ancestor at a named level, as opposed to some number of steps up. For the purpose of this example, we will reference the Family-level ancestor of the Lentil member. In Essbase, levels and generations can be assigned names, so that the Family level in our sample outline can be given the name Family. However, there is no direct syntax to determine the number associated with a name, nor are there functions that can indirectly arrive at the number of the next level down. By knowing the number corresponding to the depth or height in the hierarchy of Family members, though, we can use the numbers of the ancestors as shown in Table 19.9.

Table 19.7 Current Member Ancestors

LANGUAGE NAME       SYNTAX
LC                  Product.up(2)
TM1                 ELPAR('Product', ELPAR('Product', !Product, 1), 1)
                      (impossible to do otherwise)
Essbase             @Ancest (Product, @CurGen (Product) - 2)
AS2K                Ancestor (Product.CurrentMember, 2)
                      (this is a Microsoft extension; no way to access relative levels in standard MDX)
Express (using levels)
                    Fam_Cat_R Cat_SKU_R
Express (using embedded-total dimension)
                    ProdLevel_R ProdLevel_R
Express (using embedded-total dimension)
                    limit (limit (etproduct to parents using parent.etprod) to parents using parent.etprod)

Table 19.8 The Ancestor of a Particular Member

LANGUAGE NAME       SYNTAX
LC                  Product.Big.Up(2)
TM1                 ELPAR('Product', ELPAR('Product', 'Big', 1), 1)
                      (impossible to do otherwise)
Essbase             @Remove(@Ancestors ("Big", @Gen ("Big") - 2), @Ancestors ("Big", @Gen ("Big") - 1))
AS2K                Ancestor (Product.Big, 2)
                      (this is a Microsoft extension; no way to access relative levels in standard MDX)
Express (using levels)
                    Fam_Cat_R Cat_SKU_R 'Big'
Express (using embedded-total dimension)
                    ProdLevel_R ProdLevel_R 'Big'
Express (using embedded-total dimension)
                    limit (limit (limit (etproduct to 'Big') to parents using parent.etprod) to parents using parent.etprod)

Table 19.9 Named-Level Ancestor Referencing

LANGUAGE NAME       SYNTAX
LC                  Product.Lentil.Family.
TM1                 DB ('ProdLevels', DIMIX('Product', 'Lentil'), 'Family')
Essbase             @Remove(@Ancestors ("Lentil", 1), @Ancestors ("Lentil", 2))
AS2K                Ancestor(Product.Lentil, Product.Family)
Express (using levels)
                    Fam_Cat_R Cat_SKU_R 'Lentil'
Express (using embedded-total dimension)
                    ProdLevel_R (Product.Level 'Family' Product 'Lentil')

Descendants

The last form of hierarchical referencing I am going to explore in this chapter is referencing a set of descendants of a member. Two different collections of descendants are frequently referenced: all descendants of a member, and the leaf-level descendants. In the interests of chapter length, I won't go into the referencing of all descendants at a particular level. Table 19.10 shows the syntax required in each of the languages.

Table 19.10 Ragged Hierarchy Descendants Referencing

LANGUAGE NAME       SYNTAX
LC                  Products.Food.AtUnder.
TM1                 not possible
Essbase             @IDescendants (Food)
AS2K                Descendants (Product.Food)
Express (using levels)
                    not possible
Express (using embedded-total dimension)
                    Limit (Limit (Product To 'Food') To Descendants Using ProdTree_R)

To reference just the leaf-level descendants of the Food member, the Express embedded-total version of this selects all descendants and removes all parents within the list of descendants, leaving the leaf members. The Express level-wise version of this relies on knowing the number of intervening relations between Food and the leaf level. In order to perform the reference, the relations are chained together; from right to left, the categories corresponding to the family Food are selected and then the SKUs corresponding to those categories are selected. Table 19.11 shows the syntax required in each of the languages.

Table 19.11 Leaf Referencing

LANGUAGE NAME       SYNTAX
LC                  Product.Food.Leaf.
TM1                 not possible
Essbase             @Descendants (Food, 0)
AS2K                Descendants (Product.Food,,LEAVES)
                      (this is a Microsoft extension)
Express (using levels)
                    Cat_SKU_R Fam_Cat_R 'Food'
Express (using embedded-total dimension)
                    Limit( Limit (Limit (Product To 'Food') To Descendants Using ProdTree_R) Remove Parents Using ProdTree_R)

Although each tool provides syntax to perform most or all of the common referencing needs within a dimension, the structural and functional differences between products influence the implementation of the dimensions and as a result the syntax to employ as well.
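As one hedged example of putting a descendant reference to work, the following MDX sketch sums Sales over the leaf-level descendants of Food, using the LEAVES flag noted in Table 19.11 as a Microsoft extension and the hypothetical Sales Cube assumed in earlier sketches.

// Total Sales over the SKU-level (leaf) descendants of Food
WITH MEMBER [Measures].[Food Leaf Sales] AS
   'Sum(Descendants([Product].[Food], , LEAVES), [Measures].[Sales])'
SELECT
   { [Measures].[Food Leaf Sales] } ON COLUMNS
FROM [Sales Cube]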

Treatment of Missing and Inapplicable Cells (Instances)

Missing and inapplicable instances of types are handled in different ways between the products. Only one product that I am aware of even claims to handle the distinction between missing and inapplicable values (Media, from Speedware). Most products provide some recognition of nonvalues at cell locations. For example, Oracle Express uses NA, Hyperion Essbase uses #MISSING, and Microsoft Analysis Services has a NULL. Applix TM1 makes no distinction between nonvalue and 0; every cell in a cube has either a 0 or a nonzero value associated with it (unless the cube holds strings). But when a product does not provide the distinction between inapplicable and empty, you have no way to know what the lack of data signifies. Empty cells frequently arise as a result of no transactions taking place where they are applicable, and hence a zero value can be imputed. This is a common, rudimentary data compression typically used in databases and is the case that TM1 was designed to handle. Other systems have techniques to make it easier to treat nonvalues as 0 or another value; generally speaking, some automatic conversion to 0 is provided. Microsoft's MDX automatically treats NULL as zero in most cases; the NASKIP and NASKIP2 options in Oracle Express control whether NA values are converted to zero. Hyperion Essbase will treat #MISSING as either 0 or make the calculation result equal to #MISSING, depending on the operator or function it is being input to. Each of their languages provides one or more ways to test a cell for not having a value, so that function logic can be altered (for example, to ensure that the result of the calculation is itself not a value) or a substitution can be made.
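As a hedged sketch of the kind of test-and-substitute logic just described, the following MDX fragment treats an empty Returns cell as zero; the Returns measure and the Sales Cube are hypothetical, but IsEmpty() and IIF() are the standard MDX constructs for this purpose.

// Substitute zero wherever the Returns cell is empty
WITH MEMBER [Measures].[Returns or Zero] AS
   'IIF(IsEmpty([Measures].[Returns]), 0, [Measures].[Returns])'
SELECT
   { [Measures].[Returns], [Measures].[Returns or Zero] } ON COLUMNS,
   [Product].[Category].MEMBERS ON ROWS
FROM [Sales Cube]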

Handling Calculation Precedence

All OLAP products provide some means of controlling precedence of dimensional calculations. Given that most products provide at least two different mechanisms for calculations, the control is not always straightforward. For example, if some cells are calculated on-the-fly in response to a query, while others are precalculated in advance, there may be procedural issues as well as precedence issues between the two types of calculations. However, precalculation scripts make precedence straightforward, as you can simply calculate some cells earlier and other cells later. Some products (not discussed in this chapter) provide heuristic algorithms to determine precedence based on the typical use of operators, such as taking ratios after calculating differences and sums.

Oracle Express has relatively simple control over dimensional precedence. Calculation procedures perform all calculations for stored variables in the order you list them in the script. Models, which can calculate variables using equations on more than one dimension, enlist a simultaneous equation solver that does not try to resolve precedence. Formulas, which are calculated on-the-fly, only have one formula, and hence issues of precedence are reduced to issues of which formula depends on which other formula.

Essbase provides straightforward precedence control in calculation scripts as well by the user or DBA ordering the calculations to be performed. However, if the DBA requests Essbase to simply calculate the database using the dimensional formulas described in the dimensional meta data, an intricate ordering is used based on the following factors:
■■ Whether the dimension is an account, a time dimension, or another dimension
■■ Formulas associated with the members of an accounts dimension are resolved according to the order in which the account members are listed.
■■ Whether the dimension is treated as sparse or dense
■■ Whether the calculation should be carried out once or more than once ("two-pass")
Cells calculated on-the-fly in response to a query have a similar but not identical precedence as those that are precalculated.

TM1 provides a single ordered list of all calculation rules in a cube. The first rule that applies to a cell as the cell is being calculated is the rule used to calculate it. If there are general rules for larger regions of the cube, and exceptions to those rules for subregions of those regions, the exceptions need to be listed first to take effect. (This makes rule ordering and management critical to a TM1 developer.) Microsoft's AS2K uses a single list of calculation rules in a cube, where rules are ordered by a solve order number to determine their position in the list, and where ties in the list are resolved by the dimension's order in the cube.
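To make the solve-order mechanism concrete, here is a hedged MDX sketch of two hypothetical calculated members on different dimensions of the Sales Cube assumed earlier (the Cost measure is also an assumption). Because the ratio member carries the higher solve order, its formula is applied at the intersection of the two calculated members, so the result there is the ratio of summed Cost to summed Sales rather than the sum of two quarterly ratios.

// A calculated time member that sums two quarters
CREATE MEMBER [Sales Cube].[Time].[1995 H2] AS
   '[Time].[1995Q3] + [Time].[1995Q4]', SOLVE_ORDER = 10

// A calculated measure that takes a ratio; the higher solve order wins at the intersection
CREATE MEMBER [Sales Cube].[Measures].[Cost Ratio] AS
   '[Measures].[Cost] / [Measures].[Sales]', SOLVE_ORDER = 20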

Basic Formulas

To illustrate formulas, I will show them in the context of an application. As I mentioned earlier, the only standard application available is the OLAP Council's APB-1. The following is an illustrative section of the Oracle Express and Hyperion Essbase code implemented by the vendors themselves to perform the benchmark, along with an implementation of the benchmark schema that we performed at the DSS Lab in Microsoft's Analysis Services. The APB-1 benchmark has six model dimensions: Time, Customer, Product, Channel, Measure, and Scenario. The most instructive way to highlight the differences between products is to look at how they handle the more complicated forecasting calculations. Figure 19.2 shows the dimensions and levels, and illustrates the calculation requirements for computing forecast measures.

[Figure 19.2 diagram: three panels, each showing the Time dimension (Top, Year, Quarter, Month), the Customer dimension (Top, Retailer, Store), and the Product dimension (Top, Division, Line, Family, Group, Class, Code), marking the application ranges used to compute Forecast Sales from Budget Sales, Actual Year-to-Date Sales, and Budget Year-to-Date Sales.]

Figure 19.2 MDS of APB-1 allocation calculation application ranges.

The benchmark defines the forecasting requirements as follows: "Calculate total 1996 forecast values by the retailer level of the customer hierarchy as the annualized values of the second half actuals of 1995 increased by 15 percent. Allocate to the months in 1996 based on the 1996 budget. Allocate to the stores from the retailer based on the year-to-date actuals. Allocate to products based on year-to-date budget." (Implicitly, the all level of the product hierarchy is in effect until the last step.) As described there are three separate allocation steps. I will focus here on the last two time- and customer-based allocations. Also, I should mention that the total effort required to perform these calculations is distributed across the tools' application logic. Some effort is put into the immediate formula; some is put into dimensional formulas; some is encoded directly into the dimensional structures; and some is dispersed throughout the application. I'll begin by looking at the immediate formulas.

Scope, or range specification, is a crucial concept for all OLAP applications. The calculation requirements include both formula requirements (annualized values increased by 15 percent, allocated values based on year-to-date actuals) and application range specifications (for example, 1996 values by retailer by all product, months of 1996 by retailer by all product, months of 1996 by store by all product). The product implementations of this benchmark application afford insights into different techniques for specifying application ranges.

The code of Figure 19.3 is a subset of the formula that is associated with the Forecast scenario member in the Essbase database. The IF clauses in this overall formula for the Forecast member set the application range for each equation within it. The first three IF clauses ensure that the formula is only being executed at the Channel member, the Product member, and either the Customer member or its children (the retailer members). Remember, in Essbase a dimension is identified by a member with the dimension's name, and that member is typically the All member in that dimension. The fourth IF clause checks to see if the time is the 1996 member that corresponds to the first clause of the requirement. The fifth IF checks to see that the measure is one of the additive measures that will be allocated. For cells within this application range, data for the Forecast scenario is computed as 2.3 times the Q3 + Q4 values, which corresponds to annualizing that sum and increasing it by 15 percent. Note that specifying an intersection of members from more than one dimension is carried out by the -> operator in Essbase, so that "Actual->1995Q3" specifies data from (in LC) Scenario.Actual and Time.1995Q3. The first ELSE clause then checks to see, since the time member is not 1996, whether it is either under 1996 or at the 199606YTD member (used for calculating year-to-date values); if either of those conditions is true, then the annualized value is multiplied by the ratio of the budget scenario's data to the budget scenario's data at 1996, which implements the allocation to the months based on the 1996 budget. The second ELSE now sets up the allocation to stores from retailers, since the formula is not executing at either a retailer or the Customer root member; each store's value is allocated down from the customer parent based on the ratio of the actual year-to-date value to the customer parent's actual year-to-date value.
The allocation references the data value at the parent of the current Customer member using the @ParentVal() function once for the forecast data and once for the actual year-to-date data. All of this logic for the Forecast scenario member calculation uses conditional logic embedded within a formula to define application ranges.


IF (@ISMBR("Channel"))
  IF (@ISMBR("Product"))
    IF (@ISICHILD("Customer"))
      IF (@ISMBR("1996"))
        // annualized values increased by 15%
        IF (@ISMBR("Sales In Units", "Cost In Dollars", "Dollar Sales"))
          Forecast = (Actual->"1995Q3" + Actual->"1995Q4") * 2.3;
        ENDIF;
      ELSE
        IF (@ISDESC("1996") OR @ISMBR("199606YTD"))
          // allocate to the months based on 1996 budget
          Forecast = ((Actual->"1995Q3" + Actual->"1995Q4") * 2.3)
            * (Budget->Channel / Budget->Channel->"1996");
        ENDIF;
      ENDIF;
    ELSE
      IF (@ISDESC("1996") OR @ISMBR("199606YTD"))
        IF (@ISMBR("Sales In Units", "Cost In Dollars", "Dollar Sales"))
          // allocate to the stores based on year-to-date actuals
          Forecast = @PARENTVAL(Customer, Forecast)
            * (Actual->"199606YTD" / @PARENTVAL(Customer, Actual->"199606YTD"));
        ENDIF;
      ENDIF;
    ENDIF;
  ENDIF;
...

Figure 19.3 Essbase code from APB-1 forecast calculation.

In calculation scripts, used to precalculate portions of an Essbase database, other syntax is available to confine the application range of a formula. Let us look now at Oracle's implementation of the APB-1 in Express 6.3. The relevant section of Express Stored Procedure Language (SPL) is shown in Figure 19.4. Like the Essbase formula, Oracle's Express formula begins by identifying a scope and then defining the calculation to take place over that scope. Rather than being computed entirely on-the-fly through a run-time formula, Express is precalculating values. Thus, this is a script that is run, and the LIMIT command is being used to limit the scope within the script. This fragment of script uses entities that aren't necessary from a logic point of view but help increase its speed (this was a performance benchmark, after all), and we won't discuss them in detail. Note the statement limit CUST to CUST_LEVEL 'Retailer,' which sets the CUST dimension to all members corresponding to the retailer level; CUST_LEVEL is an Express relation that connects all members to their appropriate level.


limit PROD to PROD_TOP_R
limit CHAN to CHAN_TOP_R
limit CUST to CUST_LEVEL 'Retailer'
limit MEASURE to all
limit TIME to '1996'
_TimePer1 = '1995Q3'
_TimePer2 = '1995Q4'

" Compute first step of annualized second-half sales:

XFCST = ( ACTUAL( TIME _TimePer1) + ACTUAL( TIME _TimePer2) ) * 2 * 1.15

" Compute CUST_TOP: aggregate XFCST using FCST_C_AGG

limit CUST add CUST_TOP_R

" ‘Allocate to months, qtrs, ytd in 1996 based on the 1996 budget.’ limit TIME to descendants using TIME_TIME '1996' limit TIME add '1996Ytd' _TimePer1 = '1996' XFCST = XFCST(TIME _TimePer1) * (BUDGET/BUDGET(TIME _TimePer1))

" 'Allocate to the stores from retailer based current YTD actuals.' limit TIME add '1996' limit CUST to CUST_LEVEL 'Store' push TIME limit TIME to '1996Ytd' T_RATIO_TC = ACTUAL/ACTUAL(CUST CUST_CUST) pop TIME XFCST = XFCST(CUST CUST_CUST) * T_RATIO_TC ...

Figure 19.4 Oracle Express code from APB-1 forecast calculation.

After setting LIMITs on dimensions to the right scope, the first calculation step computes the value of an XFCST variable as the annualized total of Q3 and Q4 increased by 15 percent. In this implementation, ACTUAL is an Express variable that represents all data for the actual scenario. The command aggregate XFCST using FCST_C_AGG then performs an aggregation of this to the all-customer level (from the retailers). Then, the top customer is added into the scope through the limit CUST add CUST_TOP_R statement, and the time members in scope are changed to those below 1996 and the year-to-date member.

Within this scope, the first allocation stage is performed by another equation for XFCST. The scope is then modified by adding 1996 back in and resetting customers to the store level only. The time scope is then saved, the ratio for actuals to 1996 year-to-date actuals is calculated, and then the time scope is restored and the ratio applied. Note in the last line of Express code that the XFCST(CUST CUST_CUST) refers to the XFCST data at the parent level of the current customer, which in this database is the associated retailer. Although the syntax is different and the dimensional constructs also different, the operations and logic in one tool can be recognized clearly in the other.

Let us look at the same calculations in Microsoft's version of MDX. This implementation was created by us in the DSS Lab for our own use and, unlike the foregoing two examples, does not represent a vendor implementation. Microsoft provides a feature called cell calculations in their product, where each cell calculation specifies a formula to execute over a region of a cube. The initial calculation and the first two allocations are shown in MDX in Figure 19.5.

In each CREATE CELL CALCULATION statement that declares a calculation formula, the FOR clause at the beginning specifies the region of the cube over which the calculation is in effect. The calculation named [Step 1] is applied to the 1996 member of Time, the All members of Product and Channel, the retailer level of the Customer dimension, the forecast scenario only, and across all measures. (Cell calculations are subject to formula precedence in the same way as other calculations, so this doesn't mean that all measures will be calculated by this formula. The SOLVE_ORDER number at the end of the statement controls the precedence.) As with the other implementations, the number is calculated as 2.3 times the sum of actuals for Q3 and Q4 of 1995.

The calculation named [Step 2] is applied to all descendants of 1996 for retailers, forecast, across all measures, and the All members of the Product and Channel dimensions. It calculates its results as the 1996 value allocated by the ratio of budget to 1996 budget, as required. In order to draw its 1996 data from [Step 1], it uses Microsoft's CalculationPassValue() function to retrieve data from the prior calculation pass, which is the calculation defined in [Step 1]. It is defined as using CALCULATION_PASS_NUMBER 3, whereas [Step 1] uses CALCULATION_PASS_NUMBER 2, so it follows the calculation of [Step 1]. The calculation named [Step 3] is applied to the same range as [Step 2], except that it applies to the Store level of the Customer dimension, and it allocates data from the retailer-level ancestors of the customers according to actual year-to-date values. It draws from the results of [Step 2] by being placed in the next calculation pass (4) and using CalculationPassValue() to retrieve results from the [Step 2] pass (pass 3).

Application Ranges in Joined Cubes

For Microsoft's AS2K, within any regular cube, all measures are dimensioned in the same way. Within a virtual cube, while every measure appears in the meta data to be dimensioned by all of the dimensions of the virtual cube, data values will only be found within a space corresponding to their base cube's dimensionality. For Express, each variable and formula represents a hypercube. A formula or other equation that incorporates a dimension that one of its component variables or formulas

CREATE CELL CALCULATION CURRENTCUBE.[Step 1]
FOR '( {[Time].[1996]},
       {[Product].[All Product]},
       [Customer].[Retailer].MEMBERS,
       {[Channel].[All Channel]},
       {[Scenario].[Forecast]},
       {Measures.ALLMEMBERS} )'
AS 'CalculationPassValue (
       ( ([Time].[1995Q3], [Scenario].[Actual])
       + ([Time].[1995Q4], [Scenario].[Actual]) ) * 2.3,
       -1, RELATIVE )',
SOLVE_ORDER = 10,
FORMAT_STRING = '0.000',
CALCULATION_PASS_NUMBER = '2'

CREATE CELL CALCULATION CURRENTCUBE.[Step 2]
FOR '( DESCENDANTS([Time].[1996]),
       {[Product].[All Product]},
       {[Customer].[Retailer].MEMBERS},
       {[Channel].[All Channel]},
       {[Scenario].[Forecast]},
       {Measures.ALLMEMBERS} )'
AS 'CalculationPassValue (
       [Time].[1996] * [Scenario].[Budget] / ([Scenario].[Budget], [Time].[1996]),
       -1, RELATIVE )',
SOLVE_ORDER = 10,
FORMAT_STRING = '0.000',
CALCULATION_PASS_NUMBER = '3'

CREATE CELL CALCULATION CURRENTCUBE.[Step 3]
FOR '( DESCENDANTS([Time].[1996]),
       {[Product].[All Product]},
       [Customer].[Store].MEMBERS,
       {[Channel].[All Channel]},
       {[Scenario].[Forecast]},
       {[Measures].ALLMEMBERS} )'
AS 'CalculationPassValue (
       Ancestor ( [Customer].CurrentMember, [Customer].[Retailer] )
       * [Scenario].[Actual] / ([Scenario].[Actual], [Time].[199606Ytd]),
       -1, RELATIVE )',
SOLVE_ORDER = 10,
FORMAT_STRING = '0.000',
CALCULATION_PASS_NUMBER = '4'

Figure 19.5 MDX for APB-1 forecast calculation.

Microsoft's AS2K

              All Return Reason        Wrong Size           Damaged
              Sales      Returns       Sales    Returns     Sales    Returns
Jan 1999      2,519      137                    37                   100
Feb 1999      1,033      75                     45                   30
Mar 1999      1,180      48                     21                   27

Oracle Express

              All Return Reason        Wrong Size           Damaged
              Sales      Returns       Sales    Returns     Sales    Returns
Jan 1999      2,519      137           2,519    37          2,519    100
Feb 1999      1,033      75            1,033    45          1,033    30
Mar 1999      1,180      48            1,180    21          1,180    27

Figure 19.6 Application ranges for the Sales variable when combined with Returns.

does not have will access that data regardless of where in the dimension the data is accessed. For example, Figure 19.6 shows the application range for the sales variable when it is combined with the returns variable across the return reason dimension. In OLAP Services, sales is only available at the All Return Reason member of the return reason dimension, while in Express the same sales value is replicated across every return reason. What does this mean for the consumer of the combined data? Each system has its pros and cons. In AS2K, if you want to be able to combine the sales amount from a return reason, you need to specially reference it (usually with Microsoft's ValidMeasure() function, which exists for this purpose). In Express, you don't need to do this. However, in Express, when you are referencing the sales value across the returns dimension, you have no clue (other than going back to the meta data) that sales was not dimensioned by the return reason dimension as well.
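To make the two application-range behaviors concrete, here is a small Python/pandas sketch (not taken from either product) that builds both result grids by hand using the numbers of Figure 19.6; the member names and layout are illustrative only.

# Sketch: contrasting how a measure that is not dimensioned by Return Reason
# can be exposed across that dimension.
import pandas as pd

months = ["Jan 1999", "Feb 1999", "Mar 1999"]
reasons = ["All Return Reason", "Wrong Size", "Damaged"]

# Sales is dimensioned by Time only; Returns by Time x Return Reason.
sales = pd.Series([2519, 1033, 1180], index=months)
returns = pd.DataFrame({"Wrong Size": [37, 45, 21], "Damaged": [100, 30, 27]},
                       index=months)
returns["All Return Reason"] = returns.sum(axis=1)

# AS2K-like behavior: Sales appears only at the All member; elsewhere it is empty.
as2k = pd.DataFrame(index=months, columns=reasons, dtype=float)
as2k["All Return Reason"] = sales

# Express-like behavior: the same Sales value is replicated across every reason.
express = pd.DataFrame({r: sales for r in reasons})

print(returns)    # Returns has a value at every reason
print(as2k)       # NaN outside the All member
print(express)    # 2519/1033/1180 repeated down each column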

Summary

This chapter showed how different OLAP environments, regardless of their vendor of origin, need to deal with the same set of core issues, such as providing for the following:
■■ Rich dimensional structuring and referencing
■■ Application ranges
■■ Ways to deal with sparsity
■■ Ways of specifying formula precedence
It also, by way of example, gave you a taste for the different ways that vendors have tried to solve the same set of problems. Although there are many other aspects to full-blown OLAP products that were not discussed, the differences in these core modeling areas impact any application you might ever build.

CHAPTER 20

DSS Fusion

Chapter 1 described the various recursively linked functions that are required to connect data with decisions: descriptive, explanatory, predictive, and prescriptive. It then went on to show:
■■ How the core operations or data manipulation language (DML) of OLAP or multidimensional systems corresponds to what may be called derived descriptions
■■ That what is typically called data mining or statistics corresponds to the combination of explanatory and predictive modeling
■■ That what is typically called decision analysis or optimization corresponds to prescriptive modeling
Since all of these functions need to be performed in order to connect data with decisions, it should come as no surprise that single decision support applications [whether the product of a single vendor or of multiple vendors working through appropriate Application Programming Interfaces (APIs)] ought to provide for the entire suite of decision support functions. Furthermore, most classic business problems, such as capital budgeting, product marketing, customer relationship management, mix variance for production, site selection for physical premises, activity-based management, and the creation of key performance indicators, require a combination of at least OLAP, data mining, decision analysis, and visualization to be properly solved.
Although reasonable strides have been taken during the past 10 years, and more products are more integrated now than ever before, currently I am not aware of any publicly available fully unified decision support system. Perhaps the best example


today is that of the family of DB products coming from the intelligent database laboratory at the University of Illinois at Urbana-Champaign under the direction of Jiawei Han.
In this chapter, I add four additional logical decision support dimensions to the recursive decision function dimension described in Chapter 1:
■■ Media
■■ Something that might be called the degree of data- versus model-drivenness
■■ Degree of relevance
■■ Level of abstraction or degree of inferenceability
The resulting framework (itself a multidimensional model) encompasses all decision support functionality, including what is more typically called knowledge management (which I subsume under the rubric of Decision Support System [DSS]). In other words, any particular decision support application covers some decision function for some media with some degree of data- versus model-drivenness at some level of abstraction with some degree of relevance and some internal and external physical design (physical design being another dimension in decision support applications, which was discussed in Chapter 10). Furthermore, one can speak of integration along the decision function dimension, along the media dimension, along the data/model dimension, along the abstraction dimension, along the relevance dimension, and/or along the physical dimension.
Since the need for and challenges of creating unified decision support applications are by far more logical than physical, I will focus on the five logical dimensions in this chapter. Since, within the logical challenges, the industry is closest to providing for integration along the decision function dimension, and since I would like this chapter to point ahead but not too far ahead, I will spend the most time with decision function integration as well. Thus, this chapter:
■■ Provides an overview of a unified decision support framework
■■ Shows a variety of small, concrete examples of partial integration
■■ Describes a single example, coming from retail sales, of a more fully integrated decision support application
As there is still much work to be done in the decision support arena, the content of this chapter is a mix of things that are being done now and things that would be nice to do or see but are still a bit too cumbersome to perform using off-the-shelf tools.

Overview of a Unified Architecture

In this section, I present in incremental stages a unified framework for decision support. The framework leverages material presented in Chapters 1, 8, 9, and 11 through 17. It will then be used to contextualize or explain the specific examples that follow in the subsequent sections.

The Decision Function Dimension

For compactness of representation (and in keeping with the spirit of MTS), let's portray the decision function dimension described in Chapter 1 as an annotated line segment. This is shown in Table 20.1.

The Media Dimension

To this we now add a media dimension (covering the main media found in decision support applications) and show in Table 20.2 the combination of the two along with some typical functions for those intersections that have them. In Located Contents (LC), of course, we have simply created the following schema:

(Decision function. ⊗ Media.) ~ Functions

The table shows how the decision function dimension is most fully represented for numeric media. This is true even though multiple kinds of products are required to cover the full set of decision functions and even though those products, as you will see in this chapter, are insufficiently integrated. (Recall the counterproductive relationship between OLAP and data mining vendors described in Chapter 1.) You can also see (as I showed in Intelligent Enterprise, November 1998, Volume 1, #2) how the functional boundaries in the table need not correspond to product boundaries, as product features may extend beyond their principal categories.

Table 20.1 The Decision Function Dimension

DESCRIPTIVE EXPLANATORY PREDICTIVE PRESCRIPTIVE

Table 20.2 Combining the Decision Function and Media Dimensions

                  DESCRIPTIVE        EXPLANATORY                PREDICTIVE           PRESCRIPTIVE

Numbers           Aggregation        Clustering                 Model management     Monte Carlo

Text              Term extraction    Vectorspace dot product

Images            Edge detection     Feature comparison

Spatial images    Polygon mapping


Internal and External Representation Dimensions

Now let’s add, as shown in Figure 20.1, two additional and orthogonal features described in Chapters 8 and 9:

■■ Physical design or internal representation
■■ Visualization or external representation
What this diagram shows is that representation is orthogonal to both decision function and media. Visualization, or more properly external representation, is not a function that can or should only occur at some point in a decision support application. Everything at all times can be (and to a large extent is) externally represented. As you learned in Chapter 9, sometimes it is best to use a diagrammatic or visual method of representation and sometimes it is best to use a numeric display. Whichever form of representation is chosen, and regardless of whether a person is interacting with it, some method of representation is always being used. We can certainly speak of the method of representation used for any and all decision functions or media. You should realize that this means that the media chosen for external representation need not match the media of the data input. Thus, we can use words to represent pictures, pictures to represent words, pictures to represent sounds, or sounds to represent words. By the same token, physical design or internal representation is also independent of decision function and media, while at the same time every combination of media and decision function must have some physical representation.

[Figure 20.1 repeats the decision-function-by-media grid of Table 20.2, bracketed above by an axis labeled "External Representation or Visualization" and below by one labeled "Internal Representation or Physical Design."]

Figure 20.1 Decision function, media, and representation.


Source versus Derived Expressions

Now let's add, as shown in Figure 20.2, the distinction between source- or operations-oriented models and decision-oriented models that was made in Chapters 1 and 6 and relied upon in Part 3 (Chapters 11 through 17). For simplicity of exposition I show only two source-oriented models connecting to a single decision-oriented model here.

The DEER Cycle

Now let's contextualize, as shown in Figure 20.3, the entire representation process within the decision-execution-environment-representation, or DEER, cycle introduced in Chapter 1. What you can see from this diagram is that all of the decision support functions discussed so far belong to the representation stage of the DEER cycle. At the end of the day, the information derived through all the decision support analyses is still only in support of a decision whose maker (that is, the decision maker) may or may not follow the information derived. A manager, for example, is free to ignore any and all analytical reports. Once taken, the decision still needs to be executed, and its execution, occurring in an environment, may or may not produce the intended consequences that in turn need to be represented.1

[Figure 20.2 shows a decision-oriented model fed by two source event-oriented models, each drawn with its own external representation (visualization), decision functions, and internal representation (physical design), with data-generating source events at the bottom.]

Figure 20.2 Source event versus decision-oriented modeling.

[Figure 20.3 places the same source- and decision-oriented models inside the DEER cycle: representation feeds a decision, the decision is executed, execution takes place in an environment, and the environment is in turn represented.]

Figure 20.3 Adding the DEER cycle.

Conflict Resolution Using Degrees of Confidence

Although it is beyond the scope of this book to look at the various factors governing the proper execution of decisions once taken, or at the factors that govern what actually happens once they are executed, I do want to spend a few words on the first link: what happens between the derived information recommending a particular course of action to a particular decision maker and the decision taken by that same decision maker.

Consider the following example. You are tasked with creating sales forecasts for a particular store for your boss, the head of regional operations for a chain of retail outlets. Your boss needs to decide whether to increase the size of that particular store. Toward that end, you have access to and have analyzed literally trillions of data points. You've designed and implemented a sophisticated OLAP model and performed high-quality data mining to determine the most stable associations upon which to make your sales forecasts. After several weeks of serious effort, you've determined that sales are likely to increase by 20 percent during the next 12 months. At that rate of increase, your boss is likely to decide to increase store size.

However, unexpectedly, your boss chooses not to increase the store size. You later learn that during a corporate meeting of regional operations managers, your boss stated that she believed that sales were going to increase a few percentage points over the next few months and then plateau, remaining flat and possibly even decreasing, for at least the subsequent 12 months. When questioned about why she made such a gloomy forecast in the face of rapid historical growth, she replied, "My intuition tells me that Bill's Bargain Basement is going to open a competitive superstore in the empty mall a mile down the road." Four months later, that's exactly what happened.

Does your boss's overriding your predictive model invalidate the notion of decision functions? Think about it. To see why your boss's actions are consistent with the notion of decision functions, consider Figure 20.4, which uses a combination of the DEER cycle (showing just the representation and decision) and decision functions to represent the sequence of events just described. First you progressed through descriptive and explanatory functions to arrive at a predictive model for sales. This sequence is labeled A. At the same time, however, your boss was creating her own predictive model, labeled B. So what happened is that there were two different predictive models: one you created that was data-driven, and one your boss had in her head that we'll call, for the moment, model-driven. Furthermore, these two predictions did not agree, and your boss had more confidence in her prediction than in yours. As a result of going with a sales growth prediction of zero as opposed to 20 percent, your boss then made the decision not to expand the store size.

Reconciling Data- and Model-Driven Predictions

The existence of both data-driven and model-driven predictions reflects the fact that any decision function can be either data- or model-driven,2 which is why it gets treated as a dimension of a unified decision support framework. Many people, whether in daily life or, for some, in their roles as managers, use anecdotal evidence, primitive assumptions, convictions, core beliefs, and axioms (not to mention such undesirable forms of model-driven behavior as prejudice and stereotyping) to supply model-driven descriptions, explanations, or predictions.

Decision "Do we expand the store?"

AB

D E P P E D

Representation

Predicted Predicted sales growth sales growth 0% 20% My boss My computer system

Figure 20.4 Reconciling data- and model-driven predictions.

When data- and model-driven expressions are the same, the decision support process can continue unfettered. But when data- and model-driven expressions do not agree, how are the source conflicts reconciled? When should model-based expressions override data-driven expressions? When should data-driven expressions override model-driven ones? There are no absolute answers here, for the same reasons that nothing can be absolutely certain. That we need to treat some things as certain within the context of a particular argument or program or model is an entirely different matter.3 However, before saying anything, we need to look a little more closely at what it means to be model-driven.
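The chapter deliberately offers no formula for this reconciliation. Purely as an illustration of what "degrees of confidence" could look like mechanically, the following Python sketch records the two forecasts from the store example with hypothetical confidence weights and shows both a confidence-weighted blend and a simple "highest confidence wins" choice.

# Purely illustrative; there is no absolute rule for reconciling such conflicts.
def reconcile(predictions):
    """predictions: list of (source, predicted_growth, confidence in [0, 1])."""
    total = sum(conf for _, _, conf in predictions)
    blended = sum(p * conf for _, p, conf in predictions) / total
    winner = max(predictions, key=lambda t: t[2])   # highest-confidence source
    return blended, winner

preds = [("computer system", 0.20, 0.6),        # data-driven forecast
         ("boss's mental model", 0.00, 0.9)]    # model-driven forecast
blended, winner = reconcile(preds)
print(round(blended, 3), winner[0])   # 0.08 blended growth; the boss's model wins on confidence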

Model-Driven Is Knowledge-Based

A synonym for model-driven that I could have used, but intentionally did not because of all its connotations, is knowledge-driven or simply knowledge-based. The main connotation that gets in the way, but that will be dealt with now, is that of knowledge having something to do with text, whereas data has something to do with numbers. Although few readers may have ever found themselves saying these particular words, they are implicit in the data and knowledge management industries.

So-called knowledge management software, whether playing in the workgroup-enabling space, the portal space, the personal information manager space, or the document management space, is geared more toward working with textual information, while business intelligence or decision support software is more focused on working with numbers. Can this be true? Is the relationship between data and knowledge roughly equivalent to the relationship between numbers and text? Think about it.

Since a real unification of decision support requires the integration of so-called business intelligence and knowledge management systems, I will use this section to show that the relationship between data and knowledge has nothing to do with the distinction between numbers and text. Rather, the distinction is invariant with respect to media and decision function (which is why I treat it as yet another dimension in a unified decision support framework). The remainder of this section is based on a column I wrote for Intelligent Enterprise in April 2000 called simply "Data and Knowledge."

Levels of Abstraction Dimension

Consider the following example-pairs intended to show that the distinction between data and knowledge has nothing to do with whether the information is numeric or textual, or even visual. Each example pair shows a representative sample of what would typically be called data in the A section, and what would typically be called knowledge in the B section. The three example pairs represent numeric, textual, and visual media.

Numeric Media

Numeric So-Called Data

02/02/02   Sales   Cambridge   $10,000

This is a typical example of numeric data as it might appear in any query or reporting tool or spreadsheet. The $10,000 is the so-called numeric data, measure, fact, or variable; it represents the amount of sales in Cambridge on 02/02/02. Other examples include just about any fact table in a data warehouse: dollars spent as found in an expense report, budget dollars entered into spreadsheets, entries in the general ledger, and just about all filled cells in multidimensional databases.

Numeric So-Called Knowledge

Sales(tn) = x1 × Sales(tn-1) + x2 × Δ(interest rates) + x3 × Δ(prices)

Another, somewhat wordy, but still essentially numeric, example would be: "Stores with less than 10,000 products have top quintile earnings per product in areas where population density is less than 200 persons per square mile."
Most relationships as discovered through data mining or statistics represent numeric knowledge. This knowledge includes everything from sales forecasts, unemployment drivers, and price elasticity of demand to market-basket association coefficients, options pricing models, and customer segmentation analyses. Vast amounts of corporate knowledge are in numeric form.

Textual Media

Textual So-Called Data

■■ Customer Bob Jones had a bad encounter with service representative Charley Johnson at 3:00 p.m. on 02/02/02.
■■ The weather in Cambridge on 02/02/02 was sunny.
It's easy to see how each of these examples could have been portrayed as dimensional grids with a qualitative or text variable in the cells, namely the type of encounter in the first example and the type of weather in the second. In fact, most qualitative attributes for the dimensions in a data warehouse would qualify as textual data. This includes, for example, package type, product color, the name of the store manager, customer address, sex, and family status.

Textual So-Called Knowledge

■■ All of our competitors that are growing faster than us have poorer customer service.

■■ All plants absorb CO2 and emit O2.
Snippets of wisdom, rules of thumb, business procedures, best practices, and customs typically find their expression in textual form. A lot of textual-form corporate knowledge goes up and down the elevators every day. Collecting and sharing it is more of a cultural than technical challenge.

Visual Media

Visual So-Called Data

Although the information in Figure 20.5 is visual and we typically think of visual information as portraying knowledge and insight, in this case it shows no more than two simple data points.

Visual So-Called Knowledge

In Figure 20.6, the visual information shows relationships between product prices, earnings, and location. This graph is a classic example of visually represented knowledge.
These examples show that data and knowledge can come in any medium or form. The term "data" does not mean numeric, and the term "knowledge" does not mean textual. So, medium plays no role in distinguishing data from knowledge, but what does? From these examples, you can see that the concept of generality, abstraction, or inference support seems to play an important role in distinguishing knowledge from data. Single facts or assertions such as, "The Paris store sold 100 pairs of shoes yesterday," would seem to have very low inference support, and thus their support for decision-making would seem to be limited. Given only this sales figure, for example, the only thing you could determine is how many shoes were sold on that day in Paris.

[Figure 20.5 is a chart of widget sales for the Cambridge and Boston stores on Dec. 2, 1999.]

Figure 20.5 This information is visual, but not knowledge.

[Figure 20.6 plots earnings against product price, with separate curves for rural and urban locations.]

Figure 20.6 A classic example of visually represented knowledge.

Regardless of the medium of expression, single facts or assertions with low inference support typically function in the context of decision-making as what we typically call data.
In contrast, statements of the type, "Women's shoes are selling much faster than expected in Europe," are far more general, applying to many situations. They could thus enable all sorts of stocking and drill-down decisions for those stores where women's shoes are not selling well. General facts or assertions with high inference support typically function in the context of decision-making as what we typically call knowledge.

Data Versus Knowledge Dimension

Certainly, all of the preceding knowledge examples are more general in their scope than the corresponding data examples. So the degree of generality or abstraction does appear to be a factor in distinguishing data from knowledge. But is it the defining factor? Think about it as you consider the following two examples.

Imagine you are a midlevel analyst with a national security service and you witnessed some events including secret information being copied and handed over to foreign agents by the head of your agency. You now know that the head of the service is a secret agent for a foreign country. This piece of information is very specific; it has low inferencing capability, just like data. But it is very important. This one simple fact—no more general than "Bob Jones is a loyal customer"—if believed, would generate an entire sequence of actions: secret meetings with associates, attempts to discover the scope of the infiltration, possibly an assassination attempt, and certainly a search for the real president. By any normal use of the term knowledge, the fact that the head of a national security agency is a mole would be considered vital, strategic knowledge and would deserve a place in your knowledge management system (albeit with tight access privileges). So a statement doesn't have to have high inferencing capability to be knowledge.

Now imagine you are VP of marketing for an automobile company and you've just attended a seminar (albeit a rather low-quality seminar) where you learned that "small children don't buy your products." This statement has an extremely high inferencing capability. There are a huge number of small children in the world. For every small child you might encounter, you could infer that that child will not buy your cars, so don't waste your resources marketing to that child. However, it is fairly irrelevant because it would never apply to any decisions you are likely to make. I doubt you would store and manage this piece of information in your knowledge-management application.

Knowledge Defined

Thus, the notion of abstraction or generality is neither a necessary nor a sufficient condition for considering a piece of information as knowledge. In practical terms, a highly general but irrelevant assertion is not something you would want to expend energy collecting, storing, verifying, or managing for decision-making purposes. I'm not trying to suggest that we load our knowledge bases with masses of specific facts or that high inference-supporting assertions aren't extremely useful. Rather, I am

trying to bring home the point that, pragmatically speaking, knowledge4 should be thought of as comprising that body of assertions, both specific and general, that we believe to be true and that are most relevant to the decisions we need to make. So the difference between data and knowledge is not a matter of media or abstraction. The difference is functional. Consider the following examples of knowledge-playing assertions used for decision-making.

Examples of Knowledge Used for Decision Making

■■ A set of rules for calculating the likelihood that a potential borrower defaults on a loan, which may have been generated with a data-mining package, is relevant knowledge for deciding whether or not to loan someone money. In practice, data about a person would be submitted and, through the use of loan knowledge, a decision would be reached regarding whether to loan the money.
■■ A set of formulas, which may have been created in an OLAP environment, for calculating how to determine the earnings a product generates is relevant knowledge for making decisions about whether to increase or decrease production of that product.
■■ A set of procedures, which may have been created in a marketing document, for displaying packages in stores is relevant knowledge for making decisions about where to display products or even which stores to work with.

Concluding Remarks on Data and Knowledge

Metaphorically speaking, the relationship between data and knowledge is akin to that between a river and its bank. The former is best represented as a continuously moving flow of information or assertions, while the latter is best represented as a set of constraints that at any given time determines the direction of that flow. Furthermore, over time, the river will change the form of the bank. Our knowledge is constantly being affected by new data, and if you look closely, you will see that the river contains pieces of the bank and the bank contains water from the river. In that sense the concept of pure data or pure knowledge is as much a theoretical limit as the notions of pure river or pure bank. Thus, knowledge, especially the abstract component of knowledge that comprises the slowly changing local regularities derived from explanatory modeling, needs to be constantly tested and maintained. (Whether that testing is done hourly or yearly is, of course, a function of the specific domain.)

Knowledge is embedded in many different places within today's corporate information systems: formulas in OLAP systems, triggers in relational databases, visual patterns in GIS systems, predictive models in data-mining packages, classification rules and personal profiles in text engine-based applications, business procedures in rules-automation systems, and drivers in decision-analysis tools. All these various sources need to be integrated for unified decision support systems.

Concluding Remarks on a Unified Decision Support Framework

In this section, you've seen OLAP, as defined by the data definition language (DDL) and DML described in this book, contextualized within a larger decision support framework. That framework includes the set of functions required to connect data to decisions, various media, degrees of abstraction and relevance, and differences between input and derived data. The framework itself was bundled within a DEER cycle that encompasses the representational activity within a cycle of decisions, executions of decisions, the environment, and representation.

Smaller, Multifunction Tasks

Although no product may provide fully integrated capabilities, many products and applications have provided for some integration across some dimension(s). A sample variety of discrete multifunction capabilities that are either directly provided by some products or that can be done using today's tools and a bit of elbow grease is briefly outlined in the following section. The discussion begins with functions that integrate OLAP and data mining or statistics.

Integrating OLAP and Data Mining or Statistics

Testing for Independence between Dimensions

In practice, we are always viewing and analyzing variables (y's) with respect to locator dimensions (x's). We may view the value of a sales variable for some product dimension value and some time dimension value, or we may analyze changes in sales as a function of changes in product or time. All analysis can be built from these simple beginnings.

Viewing or specifying the change in some variable as a function of changes in some dimension is equivalent to defining a partial derivative for the variable's function. Add a little complexity and you can specify the second-order derivative as the change in change of some variable with respect to the change in change of some dimension. Going further, one can specify the ratio, or change in ratio, or change in change of ratio between two or more variables, still for some dimension, change in dimension, or change in change in some dimension. In other words, all views and analyses of variables are relative to some dimension value or location. This location serves the role of framework for viewing and analysis.

Now what happens if you are trying to analyze, for some customer, how sales to that customer varied across stores in a sales data set where store and customer are identifier dimensions, and it turns out that customer and store are correlated? What would it mean to analyze how sales to a customer varied exclusively across stores? You might


get an answer such as sales were at some value for one store, and zero for all other stores. But this isn't meaningful information about sales trends per customer. (Would you now go about trying to increase sales across the stores that experienced zero sales to that customer? Probably, you would not take such a course of action.) Rather, the information confirms that the customer shops at a particular store (which is to say that customer IDs are correlated with store IDs), and that changes observed for the variable sales are not totally attributable to changes in store. You will not be able to look at an accurate partial derivative of customer sales on a store until the influence of the store's correlation with customers has been totally compensated for, or else eliminated.

For the purposes of dimension-specific analysis of variance, it is important that changes in the values of any locator dimension are uncorrelated with changes in the values of any other locator dimension. To the degree that the locators are interdependent, they can no longer serve as separable factors.

The independence or dependence between different locator dimensions is a function of the use of the dimensions in a particular data set. In other words, two dimensions that may be independent in one data set can be dependent in another. So the best way to look at the degree to which your dimensions are independent is to examine the data sets from whence they came.

In general, and always relative to a particular data set, two dimensions can be coexistent and independent, coexistent and interdependent, or noncoexistent. In order to discover the degree to which your dimensions are independent, I suggest drawing a diagram of the cross product of the two dimensions. If the two dimensions are totally independent, every one of the possible intersections will be valid, as shown in Figure 20.7. If there are some dependencies between the two dimensions, this will show in a reduction of the number of valid intersections, as depicted in Figure 20.8. If the two dimensions are totally dependent, their relationship will appear as a line within the matrix, as shown in Figure 20.9. I also suggest using scatter plots and parallel coordinates to look at the dependencies between dimensions. Recall the use of parallel coordinates in Chapter 9.

Focusing for the moment on just two dimensions, if the dimensions are independent of each other, they can both be used as locators. When there is interdependence, any one should be used as a locator, and the other one as a variable.

[Figure 20.7 shades every cell of a store (S1–S6) by month (Jan–Apr) matrix; shaded area = applicability of intersection.]

Figure 20.7 Independent dimensions.


[Figure 20.8 shades only some cells of a store (S1–S6) by product (P1–P6) matrix; shaded area = applicability of intersection.]

Figure 20.8 Some dependencies between dimensions.

[Figure 20.9 shades only a diagonal line of cells in a matrix of stores grouped by zip code (S1–S6) by customers grouped by zip code (C1–C6); shaded area = applicability of intersection.]

Figure 20.9 Interdependent dimensions.

For example, if store and customer are correlated, you can treat stores as the dependent dimension with customer as the independent dimension, or vice versa. You might look at store facts per customer as a function of customer facts. This tells you information about the stores that different types of customers like. Alternately, you can treat customers as the dependent dimension with stores as the independent dimension. Here you would look at customer facts per store, like the income and education distribution of the customers for store X.

By treating customer and store as interdependent, you succeed in squeezing a maximum amount of information from your data set. Of course, things get more complicated as the number of dimensions increases because the degree of independence of two dimensions may be a function of the values of the other dimensions. Furthermore, dimensions may only be independent of other dimensions for a subset of their possible values. Some products may be correlated with some stores. Some customers may be correlated with some products. Some stores may be correlated with some times. Nevertheless, the principle of looking at interdependencies remains the same.
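As a concrete illustration of the cross-product check described above, the following Python sketch (with invented store, customer, and sales values) tabulates the store-by-customer grid of Figures 20.7 through 20.9 and, as one possible way to quantify the dependence, runs a chi-square test of independence on it.

# Sketch with made-up data: how correlated are the store and customer dimensions?
import pandas as pd
from scipy.stats import chi2_contingency

sales = pd.DataFrame({
    "store":    ["S1", "S1", "S2", "S2", "S3", "S3", "S1", "S2"],
    "customer": ["C1", "C1", "C2", "C2", "C3", "C3", "C1", "C2"],
    "amount":   [10, 12, 9, 14, 11, 8, 13, 10],
})

# Cross product of the two dimensions (Figures 20.7-20.9 expressed as counts):
grid = pd.crosstab(sales["store"], sales["customer"])
print(grid)
coverage = (grid > 0).values.mean()
print(f"{coverage:.0%} of store x customer intersections occur in the data")

# A chi-square test of independence on the contingency table quantifies the dependence.
chi2, p, dof, expected = chi2_contingency(grid)
print(f"chi2={chi2:.1f}, p={p:.3f}")   # a small p suggests the dimensions are dependent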

Combining OLAP Dimensions with Statistical Ordering

Regardless of the area, most analyses are performed with the aid of mathematics such as algebra, calculus, statistics, geometry, and sets. Different mathematical analyses have different requirements that the data needs to meet in order for the analysis to be performed. For example, given a list of 500 persons and their political affiliation, it makes no sense to calculate the average political party of the group. Political parties are values of a categorical or nominal variable. There is no way to take the average of a categorical variable because you cannot add or divide categories. You could, however, count the number of persons in each political party, create a histogram of those counts from which you could sort the political parties by their count, and identify the mode, or political party having the highest count.

This section describes the interplay of ordering and statistical analysis.5 It explores how the ordering of dimensions and variables affects the types of analyses that can be performed, and looks at how the process of analysis changes the ordering of the dimensions and variables (which in turn changes the types of analyses that can subsequently be performed). Recall the distinction made in Chapter 5 between nominally, ordinally, and cardinally ordered types. Since these orderings apply to types, they affect both locators and contents. Since expressions are combinations of locators and contents, any combination of ordering is possible. Thus, for example, you could have a cardinally organized nominal variable, such as the political party of the president (a nominal variable) represented over time (a cardinal dimension), or you could have a nominally organized cardinal variable, such as the average income of registered voters (a cardinal variable) per political party (a nominal dimension). Furthermore, the functions available to analyze data vary with the ordering of both the locators and the contents. Thus, the time series of the political party of the president would be analyzed differently from the party-organized income data.

Typical OLAP models are composed of nominally ordered locator dimensions such as products and stores (or rather the way products and stores are typically treated), one cardinally ordered dimension, time, and a set of cardinally ordered variables such as sales and costs. This is due to the domains where OLAP usage has sprung up and to the limitations of OLAP products. Looking ahead, OLAP environments will need to make greater use of cardinally ordered dimensions as well as nominally and ordinally ordered variables if they are to serve the needs of the enterprise business and analytical community. Cardinally ordered dimensions are critical for analysis of variance, and nominally ordered variables are commonly found in analytical models.

Orderings Change during the Analytical Process

An important part of the analytical process is the incorporation of newly learned information into the model. For example, you may start off with a nominally ordered list of countries and the per capita incomes in each country. Then, if you want to compute the per capita income for the region, you need to find out the population of each country so you can create correctly weighted average incomes. In doing this, however, you have added information to your country dimension. Instead of just a list of names, you now have a list of cardinal values, the population figures, attached to those names. A list of population-weighted countries is a cardinal dimension for which you can quantify the relationship between any two countries. Country A is twice the size of country B, et cetera. This means that for any variable you believe is correlated with country population, it makes sense to explore what that relationship is through the use of correlations and regressions. For example, it might be that you want to predict what the demand for roads or schools will be in some new country of a known population.

Sometimes, as was first described in Chapter 7, we need to transform variables by scaling the values of a variable so they fit between zero and one, or by taking the log of a set of values because that correlates better with something. For example, the log of a typical growing bacteria count has a linear relationship with time. Sometimes these new weightings, scalings, or logarithms permanently stick to the dimension. When they become incorporated, we get to use the new ordering to view and analyze other information.

                        Locators or Dimensions

                        NOMINAL                        ORDINAL                          CARDINAL
Variables or Contents

Nominal     Example:    Majority political             Majority political               Majority political
                        party by district              party by town income rank        party in town x by time
            Analyses:   Count, histogram, mode                                          Logistic regression (LR),
                                                                                        multiple LR

Ordinal     Example:    Party finish by town           Party finish by town             Party finish by time
                                                       income rank
            Analyses:   Ordinal rank, percentile       Kendall's correlation
                        rank, median, Kruskal-         coefficient of rank
                        Wallis test, Wilcoxon
                        two-sample test

Cardinal    Example:    Sales by store                 Sales by store ranked by size    Sales by time
            Analyses:   Sum, chi-square test*,         Running mean                     Standard deviation,
                        mean                                                            regression

*Note that means assume that the elements that were summed and whose count needs to be taken to complete the calculation of the mean are of equal size for the purposes of the calculation.

Figure 20.10 Ordering combinations and the analyses that work with them.

Figure 20.10 presents a matrix of dimension or locator orderings by variable or content orderings. For each intersection, such as nominal dimension by ordinal variable, a data set that exhibits those characteristics is described in the upper region of the cell (above the dotted line), and the names of standard statistical analyses that require at least that amount of ordering (if there are any) are listed in the lower region (below the dotted line). The statistical terms are described below the table. (Note that statistical routines can always work on data sets that are more ordered than the minimum amount of required ordering.) Here are descriptions of the analysis terms:
■■ Count. Compute the number of instances of each value. This is perhaps the most general operation. It works with nominal data. You can count colors, genders, races, and religions. It doesn't assume anything about how the instances are ordered.
■■ Histogram. Graphical technique of plotting the counts associated with each value. Ordinal and cardinal dimensions may indicate relevant relationships among the various counts.
■■ Mode. Determines the most frequently occurring value.
■■ Ordinal rank. Assigns a relative ranking to the data values (as in assigning medals to race times at a track meet).
■■ Percentile rank. Like an ordinal rank, but carries additional proportion-of-set information, as the percentile rank for a value in the set is the percentage of values in the total set that the value of interest is greater than.
■■ Median. Finds the middle value for the given values of the variable. The values of the variable generate the ordering required; hence, it can be used with nominally ordered dimensions.
■■ Kruskal-Wallis test. Tests for differences in the rank location of data that are grouped by some single classification. This lets you see if there is some significance to the relative rankings that the values get, based on the classification.
■■ Wilcoxon two-sample test. Tests to see if there are significant differences between two sets of data based on their ranks.
■■ Kendall's correlation coefficient of rank. Computes a coefficient of correlation between ordinal data values and their ordinal position within a dimension. Any correlation activity is correlating values of a dependent variable with their locations along an independent dimension, hence the need for this operation to use ordinal data in an ordinal dimension.
■■ Sum. Adds together all of the values for the variable. The ordering of the dimension should have no effect on the sum obtained.
■■ Chi-square test. Tests the significance of differences in values of a variable distributed across nominally divided baskets. Note that if this is performed on variables organized by an ordinal or cardinal dimension, the dimension must be divided into essentially nominal sets for the test to work.
■■ Mean. Computed by dividing the sum of the values by the count of the values. As with sums, the ordering of the dimension has no effect on the average

obtained. However, when you are averaging aggregates of unequal size (like averaging per capita income across several countries), you will generally want to weight the averages by the size-determining factor of the aggregates (see the short sketch following this list).
■■ Standard deviation. Computed using the mean of the values and the values themselves. Describes the variation between the values. Since it depends on the mean, it requires its input values to be suitable for means.
■■ Regression. Computes the coefficients of a function connecting the values of cardinal variables to their position along the dimension. The dimension needs to be cardinally ordered, so that change in variable per change in dimension position can make sense as an operation.
■■ Logistic regression. Computes the correlation between changes in the values or state of nominal variables and a cardinal-independent dimension. Basic logistic regression assumes one binary state variable. Other forms work with multistate variables and multiple variables.
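The following Python sketch (data invented) illustrates the point of the matrix and of the note on means: the mode is all you can safely ask of a nominal variable, the median suits an ordinal one, and a cardinal measure aggregated from unequal-sized members—like the per capita income example earlier—should be weighted by the size of those members.

# Sketch: the summary you may legitimately compute depends on the variable's ordering.
import statistics

party_by_district = ["Blue", "Red", "Blue", "Green", "Blue"]     # nominal
finish_by_town    = [1, 3, 2, 1, 2]                              # ordinal (ranks)
income_by_country = [("A", 30_000, 10e6),                        # cardinal:
                     ("B", 12_000, 60e6),                        # (name, per capita income,
                     ("C", 45_000, 5e6)]                         #  population)

print(statistics.mode(party_by_district))    # nominal: count/mode only
print(statistics.median(finish_by_town))     # ordinal: median and ranks make sense

# Cardinal, aggregated from unequal-sized members: weight by population.
total_pop = sum(pop for _, _, pop in income_by_country)
weighted = sum(inc * pop for _, inc, pop in income_by_country) / total_pop
print(round(weighted))                       # regional per capita income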

Multilevel Correlations

Most correlations can apply to more than one level of abstraction. For example, affinity coefficients can be evaluated at any level in any kind of product hierarchy. Financial account information can be obtained at almost any time granularity, such as day, month, or quarter. This is a classic example of integration between OLAP and directed data mining for the purposes of predictive modeling.
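As a small illustration (not from the book), the pandas sketch below computes the same correlation between two invented measures first at the day level and then rolled up to the month level of the Time dimension.

# Sketch: one relationship, evaluated at two levels of the Time dimension.
import pandas as pd

daily = pd.DataFrame({
    "date":  pd.date_range("1999-01-01", periods=120, freq="D"),
    "sales": range(120),
})
daily["returns"] = daily["sales"] * 0.05 + (daily["sales"] % 7)   # invented relationship

print(daily["sales"].corr(daily["returns"]))                      # day-level correlation

monthly = daily.groupby(daily["date"].dt.to_period("M"))[["sales", "returns"]].sum()
print(monthly["sales"].corr(monthly["returns"]))                  # month-level correlation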

Cluster Levels

Most types used as a dimension in some schema are associated with a significant number of attributes. A representative customer dimension and associated attributes are shown in Figure 20.11. When this is the case, and on a level-by-level basis, the dimension members can be clustered, as illustrated in Figure 20.12, according to patterns of similarity in their attribute values. The resulting grouping of dimension members can be used as a defining principle both for creating a set of members (the clusters) that represents a new level in the dimension, and for classifying new events as belonging to one new level member or another.

Customer    Age    Income    Occupation       Education    Purchases
Bill T.     23     45        teacher          16           little
Jane D.     34     53        lawyer           16           lots
June R.     32     27        doctor           20           lots
Sam T.      43     75        singer           12           some
Jack E.     54     87        cab driver       20           lots
Dave S.     21     56        thrill seeker    12           none

Figure 20.11 Connecting less directed mining and OLAP.

[Figure 20.12 shows the customer attribute table of Figure 20.11 feeding four clusters (cluster 1 through cluster 4) beneath the customer dimension.]

Figure 20.12 Using clusters for OLAP levels.

[Figure 20.13 shows sales values aggregated over Time for the cluster-based customer members C1 through C4 of the customer dimension.]

Figure 20.13 Aggregating data with cluster-based levels.

The enhanced dimension with the cluster-defined level can then be used, for example, to aggregate other data, as illustrated in Figure 20.13. This is a classic example of integration between OLAP and undirected data mining for the purposes of alternate aggregations.
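A minimal sketch of the Figure 20.11-to-20.13 flow, assuming scikit-learn and pandas are available, using attribute values like those in Figure 20.11 and invented sales figures: customers are clustered on their attributes and the cluster labels are then used as a new level for aggregation.

# Sketch: cluster customers on their attributes, then use the cluster labels
# as a new level of the customer dimension for aggregating sales.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.DataFrame({
    "customer":  ["Bill T.", "Jane D.", "June R.", "Sam T.", "Jack E.", "Dave S."],
    "age":       [23, 34, 32, 43, 54, 21],
    "income":    [45, 53, 27, 75, 87, 56],
    "education": [16, 16, 20, 12, 20, 12],
})

X = StandardScaler().fit_transform(customers[["age", "income", "education"]])
customers["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Invented fact data; the cluster column now behaves like any other level.
sales = pd.DataFrame({
    "customer": ["Bill T.", "Jane D.", "June R.", "Sam T.", "Jack E.", "Dave S."],
    "sales":    [120, 340, 310, 95, 410, 60],
})
print(sales.merge(customers[["customer", "cluster"]], on="customer")
           .groupby("cluster")["sales"].sum())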

Business-Driven Analytical Segmentation

Most businesses that have been around for a while have, through accumulated knowledge, manually segmented their world into a number of dimensions and a particular set of members and hierarchies in those dimensions. These are typically the same ones they use for business analysis. In other words, a business that chooses to roll up their product dimension into three basic kinds of products—clothing, furniture, and accessories—has manually segmented their data into these three categories for reporting and analytical purposes. This manual segmentation accomplishes the same purpose as statistical methods such as clustering. When manually generated business segmentations are used to partition statistical models, it is a classic example of integration between OLAP and data mining. Ideally, there should be some interplay between the manual and statistically driven methods of segmentation.

MD Visualization

Traditionally the province of scientific computing, the combination of data visualization and multidimensional meta data has begun appearing in some visualization products. For example, many visualization products provide for drill down and pivoting directly on the graphical representation (Sufer examples I2, T5). These capabilities are classic examples of integration between OLAP and data visualization.

MD Arrays for Text Retrieval

Documents are typically stored and retrieved using specialized text management systems. At the heart of any text management system is a method of calculating term and document relevance. These methods can be expressed, optimized, and run on relational and on multidimensional databases. Although running text analysis applications on relational databases provides for far superior data management and security features than are native to text systems, running those same text analysis applications on multidimensional databases provides for far more ways in which the original documents can be organized.6
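As a toy illustration of the point (not a description of any particular text engine), the numpy sketch below stores term counts in a term-by-document array and scores document relevance for a query with a vector-space dot product; the vocabulary and counts are invented.

# Sketch: a term x document count array and dot-product relevance scoring.
import numpy as np

terms = ["olap", "cube", "forecast", "allocation"]
docs  = ["doc1", "doc2", "doc3"]
counts = np.array([[3, 0, 1],       # rows: terms, columns: documents
                   [2, 1, 0],
                   [0, 4, 1],
                   [0, 2, 3]])

query = np.array([1, 0, 1, 0])      # terms appearing in the query: "olap forecast"
relevance = query @ counts          # vector-space dot product per document
ranking = sorted(zip(docs, relevance), key=lambda t: -t[1])
print(ranking)                      # doc2 scores 4, doc1 scores 3, doc3 scores 2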

MD Arrays for Data Mining

Jiawei Han showed in "Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes" that association rules could be most efficiently mined when the data was kept in multidimensional arrays.7

A Single More Fully Integrated Example8

Since the most common examples of DSS integration leverage OLAP, data mining, and visualization, I will focus on these with this last example. Imagine you have recently been hired as director of marketing for a chain of grocery stores with the mission of increasing storewide sales through improved customer service. Although the company has implemented a data warehouse for which there is extensive point-of-sale (POS) data, many transactions are anonymous, so you don't know the identity of your customers. You do, however, have an OLAP system in place that store and product managers are using to report and analyze sales, and you'd like to add data-mining tools to your arsenal of DSS technology. All you need is a good business problem whose solution would benefit from the use of data mining.

After calling around, you learn that one store is already using data mining to design promotions. The store manager is applying affinity analysis on POS data to discover which products sell well together.9 The results of affinity analysis may take the form of discovered rules or associations, such as: "If a customer buys tofu patties, there is a 60 percent chance that the customer will also buy buns." The support for this rule is 20 percent. Support refers to the ratio of the times tofu patties were purchased relative to the total number of transactions. Assuming that the store makes more money selling its own brand of bun than it does selling tofu patties, the affinity analysis may prompt the store manager to advertise a sale on tofu patties and put the buns near them.

You say to yourself, this information is useful, but it seems more tactical than strategic. Are there any broad patterns that you might be able to detect in your POS data that could lead you to a better understanding of your customers? How about trying to understand the reasons why customers shop at your store, as reflected in the collection of items they purchase in one visit? If you could understand your customers' goals when they enter your store, you might be able to help them fulfill their goals better and turn them into appreciative, loyal customers. After seeing how much you care about them, they might also be more willing to sign up for the customer cards you want to distribute.

Because you don't know what buying patterns you're seeking—just that you want to find actionable, homogeneous groups—you would probably want to use a less directed data-mining technique, a technique that doesn't need to be told what to look for. Traditional neural nets wouldn't work in this situation because they require some type of training pattern. A k-means approach, on the other hand, is relatively undirected; you can use it to cluster market baskets according to their similarities. Although there is no such thing as a wrong number of clusters, the patterns you discover and their usefulness to you are a function of the number of clusters you pick. So, even with a (supposedly) undirected algorithm, such as k-means, you must know something about your data (that is, you need to have already done some descriptive modeling before trying to mine it).
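Before moving on to clustering, here is a small Python sketch (with an invented basket list sized so that the arithmetic matches the numbers quoted above) showing how the 20 percent support and 60 percent confidence for the tofu-patties rule would be computed.

# Sketch: support and confidence for the rule "tofu patties -> buns"
# computed from a toy list of market baskets.
tofu_with_buns = [{"tofu patties", "buns"}] * 3
tofu_only      = [{"tofu patties", "soda"}] * 2
others         = [{"milk", "bread"}] * 20
baskets = tofu_with_buns + tofu_only + others       # 25 transactions in all

has_tofu = [b for b in baskets if "tofu patties" in b]
has_both = [b for b in has_tofu if "buns" in b]

support = len(has_tofu) / len(baskets)       # 5/25 = 20% of baskets contain tofu patties
confidence = len(has_both) / len(has_tofu)   # 3/5  = 60% of those also contain buns
print(f"support={support:.0%}, confidence={confidence:.0%}")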

K-MEANS

The k-means algorithm works as follows: Given a set of k initial-seed centroids (a kind of multidimensional mean) that can be composed of randomly chosen records, a k-means algorithm assigns each record to the cluster that is closest to it (as defined by the distance between the record and the centroid of each cluster). The centroid is, as you might expect, the spot at the center of the cluster. The algorithm makes a series of passes through the data. On each pass, it assigns every record to a cluster and recalculates the centroid of each cluster. Each new pass may change the cluster to which some records belong, and the process ends when the clusters no longer move between records.10 (Note that the preceding discussion assumed a particular type of k-means algorithm in which every record is assigned to one and only one cluster. These are sometimes called hard or nonoverlapping clusters. There are also soft, overlapping, or probabilistic clustering algorithms, which assign a probability function to each record so that it is assigned to every cluster with a different probability per cluster.) DSS Fusion 563 analysis, you should be prepared to apply your clustering algorithm repeatedly using a different value for k each time. So how do you apply this k-means algorithm to your POS data? Think about what you are trying to do: discover regular buying patterns that reflect different customer intentions. This is a pretty high-level pattern. To see this kind of pattern, you should work with partially aggregated data leveraging the aggregate members of your prod- uct dimension as defined in your OLAP application. Your OLAP application has been aggregating POS data in as many ways as you can imagine. You can view sales by product, product category, brand, or by group for indi- vidual days, weeks, months, quarters, or years across individual stores or any combina- tion of these. At one aggregate level of the product dimension, your OLAP environment distinguishes produce, meat, fish, cheese and dairy, coffee and tea, drinks, grains and pastas, convenience food, health products, and flowers. These are meaningful, high- level categories that represent business-driven segmentations of how your company reports and analyzes its data. Before going any further, you need to segment your data into at least two parts: training and testing. Because you intend to use the groupings you discover through the k-means approach to track customer behavior, you should be sure that the detected clusters are representative of your data as a whole and will efficiently classify your cus- tomers’ behavior. The most effective way to ensure this is to train the algorithm on one piece of your data, run it again on a second piece, and then compare the clusters found in the two trials. You can compare them with a multivariate, means-testing algorithm, such as Hotelling’s.11 For our example in Figure 20.14, the data can be used without scaling or weighting because every variable is measured in dollars, and a dollar spent on any product is, for our purposes, as valuable as a dollar spent on any other. If you did scale the variables in this example, your results could be misleading because you would be treating dol- lar differences in flower purchases 10 times higher than dollar differences in meat pur- chases. 
(However, because the range between minimum and maximum dollar amounts spent on meat was 10 times the range of amounts spent on flowers, differences in meat purchases will probably account for more cluster boundaries than differences in flower purchases.) After applying scales and weights as necessary, you must choose how many clusters to use for your analysis. In our example, because you are looking for broad classes of customer behavior, a small k value, such as five, is appropriate. (Remember, you can always change k and run the algorithm again.) Figure 20.14 lists six POS transactions and shows the amounts paid for products in each of the 10 categories. The sales amount within each product category can then be treated as a separate variable for the purposes of clustering and used as input into a k-means algorithm.

Trans. ID    Convenience    Paper Goods    Produce    Dairy    Meat/Fish    Total Purchases
101          12             0              18         7        22           70
102          0              0              20         9        30           85
103          16             0              0          2        0            18
104          0              3              10         8        5            31
105          0              8              0          6        40           75
106          10             2              8          9        22           51

Figure 20.14 Sample of dollar amounts purchased per category per transaction.
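The following is a bare-bones sketch of the hard k-means passes described in the sidebar, run over per-category dollar amounts like those in Figure 20.14. It is illustrative only: the basket values are the sample rows above, k is set to 3 because only six transactions are shown, and a real data mining environment would of course use its own implementation.

# Hard k-means, mirroring the passes described in the sidebar: assign each
# basket to its nearest centroid, recompute centroids, and stop when no basket
# changes cluster. Values are the illustrative rows from Figure 20.14.
import numpy as np

# Rows = transactions 101-106; columns = dollars spent on convenience,
# paper goods, produce, dairy, and meat/fish.
baskets = np.array([
    [12, 0, 18, 7, 22],
    [0, 0, 20, 9, 30],
    [16, 0, 0, 2, 0],
    [0, 3, 10, 8, 5],
    [0, 8, 0, 6, 40],
    [10, 2, 8, 9, 22],
], dtype=float)

def kmeans(data, k, seed=0, max_passes=100):
    rng = np.random.default_rng(seed)
    # Seed the centroids with k randomly chosen records.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.full(len(data), -1)
    for _ in range(max_passes):
        # Assign every record to its closest centroid (Euclidean distance).
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # no record changed cluster: done
            break
        labels = new_labels
        # Recompute each centroid as the mean of the records assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return labels, centroids

labels, centroids = kmeans(baskets, k=3)
print(labels)     # cluster assignment per transaction
print(centroids)  # centroid dollar values per category, analogous to Figure 20.15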


Cluster    Meat/Fish    Dairy    Drinks    Health    Grains/Pastas    Produce    Paper Goods    Convenience    Coffee/Tea    Snacks
1          13           20       10        8         6                10         1              15             1             5
2          55           10       45        1         5                10         20             10             2             40
3          5            5        12        8         8                5          5              30             5             20
4          35           37       25        23        33               45         5              12             3             10

Figure 20.15 Centroid dollar values per cluster.

As you are preparing your data for analysis, you will need to decide whether to scale and/or weight your variables (as was discussed in Chapter 7). K-means, like most algorithms, is sensitive to these issues. For example, if you are creating a cluster on customer attributes and you have columns for age, income, marital status, and job description, you need to map each of the variables to a common scale (say a linear normalized scale of zero to one). In addition to scaling the data, you must also determine the importance of the attributes relative to one another, such as age versus income. Figure 20.15 is a table of the computed centroid values for each category of product by cluster. Although you could analyze the clusters to see if there are discernible homogeneous purpose groups revealed through the clustering, the differences between the clusters (for reasons described in Chapter 9) are likely to show up better graphically than in a table of numbers. Insert I.9 is a graphic representation of the computed centroid values. It makes use of a technique called parallel coordinates that represents the value scale for each variable with a separate vertical line. It’s akin to the MTS diagrammatic method used to show variable values. Since, in our example, all the variables were measured in dollars, the segments don’t have to be scaled or weighted. If you look at the graphic image, you can identify distinct profiles that represent different inferred reasons for shopping. The line labeled A shows high values for meat and fish, drinks, paper goods, and snacks. Can you speculate what kind of purpose someone who bought those amounts of those items might have had? How about cookout parties? The line labeled B has high amounts for most categories and by far the highest amounts for produce. Maybe those are weekly shoppers. The line labeled C shows small amounts for everything but drinks and convenience foods. Maybe these are our lazy chefs! The line labeled D shows small amounts, though more than for the lazy chefs, for most items. Maybe these are single-meal shoppers. (In practice, there would also be an E line representing market baskets whose profile was different from the four to which we could attach significance.) Now that we’ve identified these clusters, we can use them as a new purpose dimension in our OLAP environment, as shown in Figure 20.16, in order to further aggregate our sales data. Next, you may want to mine this newly aggregated sales data to discover if there are any interesting correlations between dollars spent shopping and some combination of shopping purpose, time, and/or store. Using an association rules algorithm, you may discover that most shopping by weekly shoppers occurs on the weekends, while most single-meal shopping occurs during the early evening near the end of the week, and lazy chefs are most prominent in the urban areas.
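If your OLAP or visualization tool does not provide a parallel-coordinates view like Insert I.9 out of the box, one can be improvised from the Figure 20.15 centroid table. The sketch below uses pandas and matplotlib purely as an illustration; it is not how the book’s graphic was produced, and the cluster labels are simply the row numbers from the table.

# A quick parallel-coordinates rendering of the Figure 20.15 centroids:
# each vertical axis is one product category, each line is one cluster centroid.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

centroids = pd.DataFrame({
    "Cluster": ["1", "2", "3", "4"],
    "Meat/Fish": [13, 55, 5, 35],
    "Dairy": [20, 10, 5, 37],
    "Drinks": [10, 45, 12, 25],
    "Health": [8, 1, 8, 23],
    "Grains/Pastas": [6, 5, 8, 33],
    "Produce": [10, 10, 5, 45],
    "Paper Goods": [1, 20, 5, 5],
    "Convenience": [15, 10, 30, 12],
    "Coffee/Tea": [1, 2, 5, 3],
    "Snacks": [5, 40, 20, 10],
})

ax = parallel_coordinates(centroids, class_column="Cluster")
ax.set_ylabel("Centroid dollar value")
plt.show()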


Figure 20.16 Sales per trip purpose.

Taking what you’ve learned from your analysis, you may feel ready to begin to design a promotional strategy for your company. Let’s review for a minute what you’ve done. In support of a single business problem, namely promotion development, you used the following:
■■ An OLAP environment to partially aggregate transaction data according to your business-based aggregate product categories
■■ A data mining environment to cluster the partially aggregated data
■■ A data visualization environment to interpret the data mining results
■■ An OLAP environment to incorporate the cluster-based dimension for the purpose of aggregating the sales data
■■ A data-mining environment to look for patterns in the cluster-dimension aggregated sales data
In other words, you needed to use, at least, OLAP, data mining, and visualization to solve a single problem. I keep using the qualifier at least because in a real application there are even more technologies, as mentioned in the first section of this chapter, that can and ought to be brought to bear, even on this example of promotion development. For example, in the retail world, there exists purchasable product affinity data (the result of extensive data mining), organized by what’s called lifestyle segment, where each lifestyle segment represents a different product affinity cluster. You can also get hold of the count of families per lifestyle segment living in a particular area. Those areas can be quite small, such as census-sized block groups.

To fully leverage this information, you would typically use a geographic information system (GIS) to store the base data, calculate the predominant lifestyle segments per block group, calculate the distance each block group or more aggregated zone is from each store, and thus also calculate, given the types of products sold in your store, the areas of greatest concentration of families who have an affinity for the kinds of products you sell and who are close enough to your store to be or become customers. Figure 20.17, for example, shows the block groups surrounding three of your stores and the dominant lifestyle segment per block group. Typically, you would bring calculated variables such as distance to store into your numeric warehouse environment, where they would get picked up by either OLAP or data mining routines and used, for example, to create dimensional groupings representing either distances traveled (short, medium, long) and/or type of community (urban, rural), which in turn would get used for aggregations as illustrated in Figure 20.18. Thus, GIS should certainly be a part of your unified DSS infrastructure. The appropriate use of decision analysis (mentioned first in Chapter 1) would also help you create effective promotions.
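As a sketch of the distance-to-store derived variable just described, the following computes a great-circle distance from each block-group centroid to the nearest store and buckets it into a short/medium/long grouping. The coordinates, block-group IDs, and band thresholds are all hypothetical; in practice a GIS package would perform this calculation before the result is fed to the warehouse.

# Compute a "distance to store" derived variable and bucket it into the
# dimensional grouping used for aggregation. All locations are made up.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius ~6,371 km

stores = {"S1": (42.35, -71.06), "S2": (42.39, -71.02)}
block_groups = {"BG-001": (42.36, -71.05), "BG-002": (42.30, -71.20)}

def distance_band(km):
    # Grouping members: short, medium, long (thresholds are assumptions).
    if km < 2:
        return "short"
    if km < 10:
        return "medium"
    return "long"

for bg, (lat, lon) in block_groups.items():
    km = min(haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in stores.values())
    print(bg, round(km, 1), "km ->", distance_band(km))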

Figure 20.17 Dominant segments.

Figure 20.18 Sales by urban and rural.

For example, after discovering what you believe are reasonably stable patterns in your data so that you can create targeted promotions, you still need to figure out how to allocate limited marketing dollars to the suite of promotions you could create so as to maximize the likely bang for the buck. Towards that end, you would need to take into account possible actions by competitors, possible changes in the price of supplies, the costs and effectiveness of different channels for promotion (newspaper, radio, flyers, and so forth), any discounts you may wish to apply, and so on. Figure 20.19 shows a typical view from a decision analysis package where you identify the endogenous variables that you control, the exogenous variables you don’t control along with expected probability distributions for the possible outcomes for each of the exogenous variables, and the goal, in this case, profit. After running a few simulations you may come to the conclusion that the best promotional strategy is to hand out flyers in small neighborhoods with high concentrations of lazy chefs, and use a combination of radio and in-store promotions to reach the weekly shoppers, cookout party makers, and single-meal shoppers.
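To give a feel for what “running a few simulations” involves, here is a minimal Monte Carlo sketch over the influence diagram of Figure 20.19. Every distribution, price, cost, and response curve below is a made-up assumption, not something taken from the chapter’s data; a decision analysis package would let you specify these interactively.

# Monte Carlo over the Figure 20.19 influence diagram: promotion spend is the
# endogenous decision, competition and unit cost are exogenous uncertainties,
# and profit is the goal. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
n_trials = 10_000

def simulate_profit(promotion_spend):
    # Exogenous variables: competition pressure and unit cost are uncertain.
    competition = rng.normal(1.0, 0.15, n_trials)         # scales demand down/up
    unit_cost = rng.triangular(1.8, 2.0, 2.5, n_trials)   # $ per unit
    # Assumed promotion response curve with diminishing returns.
    baseline_units = 50_000
    lift = 1.0 + 0.3 * (1.0 - np.exp(-promotion_spend / 25_000.0))
    units = baseline_units * lift / competition
    price = 3.0                                           # $ per unit
    return units * (price - unit_cost) - promotion_spend

for spend in (0, 25_000, 50_000, 100_000):
    profit = simulate_profit(spend)
    print(f"spend ${spend:>7,}: mean profit ${profit.mean():>10,.0f}, "
          f"5th percentile ${np.percentile(profit, 5):>10,.0f}")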

Summary

This chapter presented an overview of a unified architecture for decision support, described a variety of small, real-world examples of partial DSS integration, and finished by more deeply exploring a single example taken from retail sales. Within the chapter you learned that the distinction between OLAP, data mining, and decision analysis, made first in Chapter 1, represents only one of several dimensions that together describe decision support applications. The other dimensions included the kind of media, degree of abstraction, degree of relevance, and whether the data was more source- or decision-oriented.

Nodes: Competition, Promotion, Sales units, Price per unit, Total unit cost, Promotion cost, Profit

Figure 20.19 An influence diagram.

Furthermore, the entire representational framework was subsumed in a single, representational stage of a holistic decision-making framework that included, in addition to the representation, the decision made, the execution of the decision, and the environment within which the consequences of the executed decision occur. When all these pieces are in place and work smoothly together (whether through the heroic efforts of your end-user and/or IT organization or through the efforts of a single vendor or that of many), organizations will be able to fully represent all relevant aspects of their world and make optimal decisions that reflect their locality and slowly changing nature, as well as the sensitivity to current facts of that bundle of regular and multilevel patterns we call knowledge.

APPENDIX A

Formula Index

Auditing truth and existence values, true/false defining a data check variable, 13.6 Booleans, “out of stock,” 14.12 inventory, 14.4 Dimensions with reconciliation, 14.22 Business Asset, 16.11 Derived variable definitions Business Process, 16.11 aggregate Hfoodcake, or Foodcake-Batch, 16.6 inventory productivity, 14.29–30 Fact feeds from sources local price, 13.4–6 inventory, 14.16 time-shared inventory storage, list price, 15.7 14.25–28 local price, 13.4 effective variables market share and marketing cost, exchange rate, 13.24–25 15.17–18 iterative quantity sold, 15.4 quantity sent, 14.17–18 accounting for time-lagged other distinct_number_of_countries_ flow, 15.10 from_which_fish_is_bought, 13.15 Queries recursive aggregate quantity on hand, 14.10–11, inventory age, 14.23 14.19–21 inventory loss, 14.6–8 relative complex exchange rate, 13.22 distribution cost for an order, 16.24 local price, 13.9–13 dynamic time ranges, 16.32


materials storage cost for a batch, inventory, 14.3 16.33 production, 16.27 multiple rankings, 15.12 sales and marketing, 15.2, 16.6, 16.9 nested expressions: high-revenue split into costs and revenue, 16.5 stores, 15.12–13 embedded 1–N relationships result sets defined by tuples, materials inventory (3 schemas), 14.13–14 16.29–30 specific: product storage cost for materials shipping (2 schemas), two orders from two batches, 16.34–35 16.21–22 product inventory (2 schemas), materials storage cost for a batch, 16.18 16.33 in product inventory transport, other 16.14, 16.16 full sales and marketing cube view, production (2 schemas), 16.25–26 16.13 sales and marketing, 16.12 marketing costs, 15.18–20 transport from product inventory to value-state enhanced, 15.26 stores (3 schemas), 16.14 variance, 15.14–15 transport from production to Schemas product inventory (3 schemas), complex M-N relationships 16.22–23 discussion of, 16.3–5 with embedded logic, 13.18 instances of, see multiple generic, 15.21 subschemas generic value-state, 15.21 discussion of multiple subschemas cumulative costs, 16.36 in materials shipping, 16.34 currency exchange, 13.20 in production, 16.26 earnings, 16.2–3

APPENDIX B

Standards in the OLAP Marketplace

Standard: Something established by authority, custom, or general consent —Merriam-Webster’s Collegiate Dictionary, Electronic Edition 1.5

As with many other areas of computing, different OLAP vendors’ products were first created to serve different niches and were based on different underlying techniques. Within their individual niches, there was no need to do things in the same way as was done in other niches, and if two vendors were aware of each other in the same niche, they were not inclined to collaborate on anything. Standards were eventually put in place in many other areas of computing (networking, relational access languages, storage communication protocols, system busses, and so forth), but standards have continued to prove elusive within the OLAP space. The tide is changing and perhaps strong standards will be available by the time the third edition is written, but as of the writing of this second edition, only some rather contentious beginning steps have been taken. What kinds of standards are relevant? Standards are most relevant in three key areas: data models, application programming interfaces (APIs), and benchmarks. A standard data model allows developers and users to work with applications without suffering from the incompatibilities between different vendors’ tools, particularly in multivendor environments. A standard API allows programmers to build applications that can be run on the products from different vendors. (To be an effective standard, an API presupposes a good standard data model too.) Standard benchmarks allow organizations to effectively compare product offerings for tasks, and are also very helpful for assisting the physical implementation of a particular system.

571 572 Appendix B

Data Models

As described in Chapter 2, OLAP or multidimensional modeling and analysis products sprang up independently of any particular data model. Once established as products, individual vendors had little self-interest in looking for or supporting a model that might indicate functionality gaps in their products. As described in Chapter 4, there are numerous benefits to having a well-grounded model. The LC model used throughout this book (whose core modeling constructs existed well before the development of the industry buzzword OLAP) represents my best attempt to provide such a standard model. I hope you have found it useful in this book. Time will tell how well it (or some other attempt) succeeds.

APIs

Significant effort has been expended in API standardization, albeit with mixed results. The OLAP Council, an organization that was formed to promote awareness of OLAP and standards for it, released two iterations of an API specification (MDAPI) that worked hard to minimize the effort required for any vendor to implement it, but for various reasons was ultimately only implemented by one server vendor and one client vendor. Several members of the OLAP Council went on to form a Java Community Process working group for an API called JOLAP, which is a spiritual successor to the MDAPI. The process was begun in June 2000, and drafts of the API specification have not been made public as of October 2001. The strongest effort towards standardizing APIs has been made by Microsoft, with its OLE DB for OLAP and XML for Analysis specifications. OLE DB for OLAP is an extension to Microsoft’s OLE DB, and specifies data structures and function interfaces along with the MDX language for executing DDL and DML. As a COM-based interface, it will only run on Win32 platforms, which are predominant among end-users. The programmatic aspect of OLE DB for OLAP allows client applications to retrieve dimension and cube metadata, initiate the execution of queries and commands, and retrieve results from cube queries. The MDX language provides a rich set of primitives for dimensional navigation (examples of which you saw in Chapter 19), and a data model that is in some ways tightly connected to the data model for Microsoft’s Analysis Services, and in some ways expands on it. (However, even here Microsoft breaks the rules of its own MDX language specification in nontrivial ways.) OLE DB for OLAP is a quasi- or near-standard in that, by general consent, a number of vendors are supporting it to varying degrees on both the client side and, notably, the server side. It is a proprietary specification in that it is solely Microsoft’s to keep or alter as it sees fit, but it is an open specification in that it is specified in detail and freely available for anyone to implement clients or servers for as they desire. (The specification is available at www.microsoft.com/data/oledb.) Microsoft has solicited input from vendors and end-users as it developed and evolved the specification, and the specification does have a few features in it that are not pertinent to its own products, but relevant to those from other vendors.

Currently, most OLAP tool use is by persons interacting with information. Circa 2000, the majority of these human beings were sitting down to computers running Microsoft Windows-family operating systems on Intel-based CPUs (Wintel machines). Going forward from 2001, more and more humans will be sitting down to non-Wintel machines (such as hand-held computers of many stripes). Additionally, more and more of the analytical functionality will not be accessed by humans, but by other computers, especially over the Internet. These machines will be running other operating systems for which the probability that Microsoft ports its APIs is essentially zero, including Solaris, Linux, HP/UX, AIX, and other Unix variants. To address these needs in one stroke, Microsoft, together with Hyperion Software, is also sponsoring the XML for Analysis specification as a standard. XML for Analysis is essentially OLE DB for OLAP, where the data structures are expressed in XML format rather than binary format. It is based on the simple object access protocol (SOAP), which provides a generalized way to invoke operations on a remote service and retrieve results, all through the exchange of XML. Interestingly, SOAP doesn’t depend on any particular lower-level transmission protocol, so while HTTP is the most likely protocol for sending and receiving, file transfer protocol (FTP) and simple mail transport protocol (SMTP) implementations exist, and others could be used if necessary. XML for Analysis may add features over time in comparison with OLE DB for OLAP; as this is written, the 1.0 specification exists and beta software exists, but there have been no commercial releases.

Benchmarks

Standard benchmarks are available in many areas of information processing. CPU performance is benchmarked in a number of ways, relational database performance is benchmarked for different applications, desktop systems are benchmarked for application suite performance, and so on. Some benchmarks have been specified through industry consortia, and others have been individually developed and become somewhat standard through wide application. The OLAP arena has had one significant effort to date to create a standardized benchmark, which has had some limited success. Benchmarks are useful for at least three purposes:
■■ Ascertaining performance for a given hardware/software combination
■■ Validating the correctness of an application implemented in a given hardware/software combination
■■ Providing data points from which to estimate a suitable hardware/software combination
Standard benchmarks allow for different vendors’ offerings to be compared within these three purposes. Note that the first and second purposes listed each have a different focus. Performance in terms of speed and bytes consumed is usually important (see Chapter 10 for considerations), and a great deal of attention sometimes needs to be spent in raising the speed and reducing the bytes. Equally important is whether or not the hardware/software combination can achieve the results at all, as bogus results, no matter how rapidly returned, are always unacceptable.


Since running benchmarks can be time consuming, a scientific analysis of a few benchmark results can be used to achieve the third purpose. In the relational database world, there are several standard benchmarks maintained under the auspices of the Transaction Processing Performance Council (TPC). The TPC-C benchmark is oriented towards transaction processing, while the TPC-H and TPC-R benchmarks (spinoffs of what was initially a single TPC-D benchmark) are oriented towards decision-support queries. The nature of the interaction with the database is different in each of the benchmarks, as they reflect different types of applications. In the OLAP world, the only public standard benchmark to date has been the Analytical Processing Benchmark (APB-1), produced by the OLAP Council. The benchmark was the instrument of performance claims between Applix, Hyperion, and Oracle during the late 1990s, and subsets of it have been run by other vendors seeking to establish their performance rankings as well. The benchmark was also used by a number of vendors to determine whether they provided sufficient OLAP functionality to be able to brand themselves as OLAP product vendors. The benchmark tests calculation and query performance for an OLAP server. Implementing the benchmark requires a server to be able to handle different structures of data in source tables, and to implement a moderately sophisticated set of allocations in the calculation logic. It uses a smaller number of dimensions (products, customers, channels, time, and scenario). The size of the dimensions can be varied for each run of the benchmark, and the density of the generated data can be adjusted as well. This allows different sizes of databases to be used. Additionally, the number of simultaneous client sessions can be specified. This allows testers to evaluate performance for large data set sizes independently from performance for large numbers of users. To be published according to APB-1 guidelines, the results of the benchmark must be audited, ensuring that the results are actually meaningful. However, one benchmark is not enough, as there are many other types of applications and usage scenarios in addition to the one modeled in the APB-1. Additionally, the APB-1 specifies that calculations must take place on the server and not on the client tier, which reflects a desire to make it a benchmark of server performance, but limits its usefulness as an application benchmark. Real-world applications can require many other data structures and calculations than those reflected in the APB-1. For example, it is very common to have ranking queries (top 10 products) in the real world, to specify sorting of results, to specify data-based conditions for data retrieval, and to make use of lower dimensional data (like so-called attributes—such as customer address). Real-world application configurations are usually more sophisticated than simply relying on a single powerful server. For example, if an OLAP vendor’s product distributes its processing on multiple tiers, including the client tier, then this should be allowable in an application benchmark. One of the notable aspects of the APB-1 benchmark has been that due to the requirements of the calculations, many OLAP-oriented servers have been unable to run the benchmark in its entirety.
Many applications have less sophisticated requirements as well, and a benchmark that has fewer computational requirements would allow standard comparison of a greater variety of tools. Additionally, budgeting and forecasting is one of the oldest applications for OLAP tools.

A real-world budgeting and forecasting system requires some degree of transaction processing within the analytical environment as well as perhaps sophisticated multidimensional calculations. It is my hope that standard benchmarks will emerge that more fully reflect the needs of real users across a wider variety of applications and domains.

APPENDIX C

LC Language Constructs

This appendix summarizes, without explaining, the LC syntax used throughout the book.

Operators and Markers

Unary {T.}: every instance of T .

Using dot notation, as introduced in Chapter 5, the specification of a member set may take place through the use of multiple term specifiers beginning with the name of the type, followed by a dot, followed by a further restriction such as the name of a level or specific instance. A dot at the end of a term specification string, as for example with Geog.USA.atunder., denotes the fact that what is specified is a set of instances and that all such instances are to be used/returned/considered. If the final token used to identify an instance set returns a single instance, there is no need to terminate the string with a dot.

{n/a,m,a,p} : logical status markers


As introduced in Chapter 6, it is important for an application to be able to distinguish between missing and meaningless data. The four logical LC status tokens accomplish this. Specifically, 'n/a' means not applicable, 'm' means missing, 'a' means applicable, and 'p' means present. They are appended to the end of a content specification and refer to whatever set of instances are specified to that point. For example, (Sales.m , Geog.USA.atunder.) says that sales values are missing for the USA and all its descendents.

{}, : Units or metric specifiers

As introduced in Chapter 7, it is good practice to specify the units of every variable. Commonly used units or metrics in this book were

Integers {I}
Rationals {R}
Nominals or names {N}
Dollars {$}
Currency {Curr}
Local currency {Lcurr}
Dollars per kilogram kilometer {$/Kg-Km}
Dollars per kilogram day {$/Kg-Day}

Binary Operators

■■ Assignment

assignment (1-1, yields, fill with, equals)

{~} : The instances on the right-hand side of the operator are assigned a one-to-one correspondence with the instances on the left-hand side.
{>>} : The expression on the left-hand side yields the result on the right-hand side.
{<<} : The schema on the left-hand side is filled with values taken from the source on the right-hand side.
{=} : Standard equality

{F} Standard assignment

■■ Arithmetic

{+, -, *, / , < , >} : standard arithmetic

■■ Structuring

{×} : Cartesian product
{~<} : The instances on the left-hand side are assigned a one-to-N correspondence with the instances on the right-hand side.

Intra-Type Referencing

Most of the intra-type referencing was introduced in Chapter 5.
■■ Hierarchical
{L} refers to the combination of all types used as locators within the schema. Thus the expression (Sales , L.leaf.) << Table x says that the variable sales at the leaf level of all the dimensions used as locators in the schema is sourced from Table x. For ragged hierarchies:

below, above, atbelow, atabove , down(N) , up(N) ,leaf , root

For leveled hierarchies:

instance.level, level.instances , uplevel(N) , downlevel(N),

■■ Nonhierarchical {this} refers to the current member. {this+ or - x} refers to referencing along an ordinally or cardinally arranged set of instances. {first , last} refers to the first or last member in an ordered sequence.

Ordering and Aggregating

■■ Aggregations

{sum( )} {count( )} {avg( )} {Type.min} or { Type.max}

■■ Ordering

{Arank , Drank} Ascending and descending rank

Defaults and Conventions

Application range of left-hand side (lhs) : all locations. Location range of right-hand side (rhs) : inherited from lhs. If an rhs L-range is specified, it can be relative to the lhs range.

APPENDIX D

Glossary

Aggregate: From the Latin term aggregare, meaning to collect in a flock. As a verb, it refers to the process of combining two or more data items into a single item. Summing a series of numbers is a typical example of aggregating. As a noun, it refers to any data item that is the result of an aggregation process, such as a total or average.

Algorithm: (n) 1699, either influenced by or borrowed from French algorithme, and perhaps separately formed in English as an alteration of algorism (the Arabic or decimal system of numerals), probably before 1200; influenced by Greek arithmos, or number.

Analysis: (n) Borrowing of Medieval Latin analysis, from Greek ana’lysis, a breaking up, from analy’ein, unloose (ana-up + ly’ein-loosen, untie). Like so many abstract terms, analysis traces back through Latin into Greek. It originally meant to take apart, as opposed to put together. Thinking required both analysis and synthesis. These days, it is common for the term analysis to refer to both traditional (decompositional) analysis and to synthesis.

Ancestor: Within a hierarchy, an ancestor of a member is found along the path between the member and the root (that is, parent, grandparent, and so on).

Antecedent: See ancestor.

API: Acronym for application programming interface. An API provides services that a software developer can write applications to use. Nearly all (if not all) OLAP products, and certainly all server products, have an API. Some products are popular enough that their own API is used by a variety of other vendors and solution providers. Recently, Microsoft has promoted an API called OLE DB for OLAP as a standard API that many client tools support as consumers and several server vendors support as providers. APIs generally consist of a specified set of callable functions and data structures that may be exchanged with the functions.


Applicable: Whether a variable makes semantic sense at a given coordinate (or location). For example, if a single hypercube’s measure dimension contains members for both inventory and employee measures, and an organizational dimension contains members for both product categories and job classifications, the inventory measures are inapplicable at job-classification-related cells, and the employee measures are inapplicable at product-category-related cells. A variable or content can be considered missing only at a cell in which it is considered applicable.

Application range: The subset of a schema to which a formula applies. Absent some qualifications (which qualifications are specified differently in different products—as shown in Chapter 19), multidimensional formulas apply to the entire schema.

Array: (physical) A storage method where the elements of the array are placed sequentially in a contiguous region of storage (disk or RAM). Element 1 is followed by element 2, which is followed by element 3, and so on. When all elements are the same size (as is usually the case), then the location for any element in the array can be directly computed by multiplying the size of an element by the array index number for the element; it can then be accessed by skipping ahead that many units from the beginning of the array. This is a fundamental structure for computers; the memory of a computer is modeled as an array of bytes, pages of records in databases are frequently organized as an array of records, and so on.

Attribute: Information associated with an object, frequently with the members of a dimension. Address, for example, may be an attribute of a store dimension.

B-tree: A hierarchical indexing technique. Each node represents a range of indexed values, and each level of the hierarchy starting at the top focuses in on a progressively smaller set of index values, so an entry may be looked up by zeroing in on its value from the top down. Unlike a binary tree, a parent element in a B-tree can have many children. It has many variations, including the B+-tree and the B*-tree, which place index information differently within the same basic structure.

Buffer: A region of memory that holds information being transferred from one place to another. In database terms, a buffer can hold information transferred from the disk or other storage medium. Because a computer can operate only on data held in RAM, some form of buffer is required to operate on data in a database. See also cache.

Cache: A set of buffers or in-memory data structures that hold information that is either also on disk, is slated to be placed on disk, or has just been read from disk. The set of buffers is held in memory so that further accesses to the information will not require reading it from disk, but can simply be read from memory. OLAP products vary considerably in terms of the sophistication of the cache. On one extreme are caches that are user and query specific. At the other extreme are caches that possess most of the properties of a full-fledged in-memory database.
Calculate: (v) Circa 1570, probably in part a back formation from calculation, and in part borrowed from late-Latin calculatus, past participle of calculare, from Latin calculus, reckoning or account; originally, a small stone used in counting, diminutive of calx (genitive calcis), small stone, limestone; in the late 1500s and 1600s, calculate replaced earlier calculen (before 1378); borrowed from Old French calculer and late-Latin calculare. Another form, calk, Middle English calken (probably before 1400), existed originally as a shortened form of calculen, and was in use at least into the 1650s.

Calculation: (n) Before 1393, borrowed from Anglo-French calculation, from late-Latin calculationem (nominative calculatio) from calculare, calculate.

Cell: For OLAP vendors who model variables as members of a variables dimension, the intersection of one member from each dimension in a hypercube. Associated with a value.

Child: In a hierarchy with distinct leaves and roots, a child of a member is any member that is one hierarchical unit toward the leaves from the given member.

Cleansing: The process of discovering and repairing anomalous data within data tables.

Computation: (n) Around 1408, computacion, borrowed from Old French computation, learned borrowing from Latin computationem (nominative computation) from computare, to count, sum up (com-together + putare-count, reckon, consider).

Compute: (v) 1631, borrowed from French computer, learned borrowing from Latin computare.

Consolidate: As a financial term, used to refer to complex aggregations that may involve internal transfers of physical and/or financial resources, intracompany transfer pricing, as well as currency translations.

Content: Within the context of a query or schema, contents are those types whose values are sought after relative to a set of locations. In this sense, the term content is roughly equivalent to variable or measure.

Descendent: Within a hierarchy, a given member’s descendents include any members found along the path between the member and the hierarchy’s leaves.

Dimension: A type used for locator purposes. A collection of instances, members, positions, or units of the same type. Synonyms from different disciplines include factor, axis, attribute, and variable. In a multi-dimensional data set, every data point is associated with one and only one member from each of multiple dimensions. Dimensions are frequently organized into one or more hierarchies in OLAP applications, though this is not a logical requirement.

Dimensional formula: A formula whose left-hand side is a dimension member. For example, given the term “profit” as a member of an accounts dimension, the formulas Profit = Sales - Costs or Profit, L. = Sales - Costs are examples of dimensional formulas.

Domain 1. Relational: The data type from which an attribute is drawn. For example, Sales may be drawn from the domain of positive integers. A type is a highly structured domain.

Domain 2. General and AI: A topic or subject matter, as in the domain of chemistry versus the domain of economics. The term multi-domain generally has this second sense as well.


Drill up/down: Navigating toward/away from the root of a hierarchy.

Form-based meaning: When the meaning of a symbol, such as a number, is a function of the form of the symbol.

Formula: (n) Before 1638, a set form of words used in a ceremony or ritual; borrowed from Latin formula, form, rule, method (literally, small form); diminutive of forma. The sense of prescription or recipe is first recorded in 1706, its use in mathematics in 1796, its use in chemistry in 1846.

Generation: A hierarchical distance measured from a member toward the leaves. See also level.

Granularity: See resolution.

Hash index: A technique whereby the key for a record in a database is converted to a number (through the use of what is called a hash function), which can then be used as an offset into an array of record locations to find the actual location of the record. Because all information in a computer (including text) may be treated as numbers, it becomes possible to treat the key values as a number or set of numbers that can be combined. This technique is much faster than using trees to index a set of records, but it scrambles the order; so it is not useful to, say, find all of the records whose product name begins with A and look them up in alphabetical order.

Hierarchy: An organization of members into a logical tree structure, with each member having at most one “parent” member and an arbitrary number of “children” members. The metrics associated with the instances of the hierarchy all need to be quantitatively comparable or the hierarchy is illegitimate.

Hypercube: A multi-dimensional schema formed from the cross-product of a number of dimensions, the combination of which is in one-to-one association with some number of types used as contents.

Identifier: Within the context of a data set, dimensions can be used as identifiers or keys, or as variables or things that get tracked.

Index: A structure used to facilitate the process of locating values. Optimum indexes vary as a function of many things, including value set sizes, frequency of usage, and ratio of updates to accesses.

Inverted table: A table where all the fields have been indexed.

Leaf: Any bottom-most member of a hierarchy (a member without children).

Level 1. Named set of equally weighted members: The members are capable of being referenced as a group either from above or from below in the hierarchy. Multiple levels need to be fully connected; otherwise they are named groups.

Level 2. Measure of hierarchical distance: In some OLAP products, a level is specified as the hierarchical distance or number of hierarchical steps from either a root member of the hierarchy or from the leaf members of the hierarchy to a particular member. These products may use the term level to mean members measured in one direction (from the hierarchy leaf members or root) and generation to mean members measured in the other direction.

Location: Any set of tuples defined in terms of the union of the locator dimensions.

Location-based meaning: When the meaning of a symbol, such as a point, is a function of where it is on a display.


Locator: Within the context of a query or schema, locators are those types whose values are given as constraints relative to a set of contents to be evaluated or returned relative to that set of constraints. In this sense, the term locator is roughly equivalent to dimension.

Meaningless: Cells in a hypercube for which no data would ever exist are meaningless or inapplicable.

Measure: A unit-bearing data type. Though literally it refers to measured data such as measured temperature or measured sales, the term has been used by several OLAP vendors as synonymous with the term variable, which also includes the notion of derived or nonmeasured values. See also variable or content.

Member: An element, position, or instance within a dimension or type.

Meta data: Data about data. Meta data falls into several different categories. The most common categories are descriptive, usage, source, and structural. Ultimately, the distinction between data and metadata is functional.

Missing: Cells in a hypercube for which data is applicable, but does not currently exist.

Multi-cube: A multi-dimensional construct formed from two or more hypercubes that each share one or more dimensions in common.

Nest: With respect to row and column displays, nesting one dimension under another means that all the members of the nested dimension appear for each member of the nesting dimension. Another way to look at it is that the nesting dimension represents the primary sort while the nested dimension represents a secondary sort.

OLAP: An acronym that stands for On-Line Analytical Processing. It is meant to contrast with OLTP. The key aspects are that OLAP is analysis-based and decision-oriented.

OLTP: An acronym meaning On-Line Transaction Processing.

Orient: The mapping of logical dimensions to the screen dimensions of row, column, and page.

Override: Manual substitution; dimensional formulas and precedences that apply to many cells sometimes need to be overridden for a subregion of cells.

Page (disk): A unit of information that can be read from a disk or written to a disk. While a disk is theoretically capable of transferring just a single bit at a time to or from memory, in practice bytes are written and read some multiples at a time. On IBM PC compatibles and UNIX systems, the operating system may transfer 512 bytes in one operation, while on a mainframe computer the number might be 8,192 bytes (8 KB). Because this has an impact on the efficiency of transferring information between memory and disk, database programs will transfer pages of information to and from disk as well. A database program may use pages of data that are the same size as the pages the operating system uses, or pages that are a multiple of the size of the operating system’s.

Page dimension: The set of dimensions in a hypercube that are not displayed across either the rows or columns, but that are necessarily displayed as pages where one and only one member of each page dimension is displayed at a time.

Parent: In a hierarchy with distinct leaves and roots, the parent of a member is the member that is one hierarchical unit toward the root from the member.

Peer: In a hierarchy, all members at the same level are considered to be peers. Note the multiple definitions for level; depending on the definition of level used, the peer group will be different.

Pivot: Rearranging the orientation of logical dimensions on the screen at query time.

Precedence: When multiple formulas defined in different dimensions intersect at a cell, the precedence of each formula determines the order in which it is calculated, or in the case where only one formula is actually applied to the cell, which formula is applied.

Refinement: The integration of names, units, and forms required for the creation of global data sets.

Replication: A form of data duplication where only changes are propagated from place to place.

Resolution: A level whose members are defined in terms of a common scale or granularity.

Root: The top-most member of a hierarchy. In a dimension with multiple hierarchies, there will be multiple roots.

Scale: See resolution.

Siblings: All children of the same parent.

Star schema: An arrangement of tables in a relational database where a central fact table is connected to a set of dimension tables, one per dimension. The name star comes from the usual diagrammatic depiction of this schema with the fact table in the center and each dimension table shown surrounding it, like the points on a starburst.

Tree: Synonym for hierarchy. A binary tree is a tree where each node may have zero, one, or two children. A B-tree (see definition) is a hierarchically structured search index.

Type: The single underlying structure capable of accounting for hierarchies and formulas and whose differentiated use within the context of a query or schema corresponds to the terms locator or “dimension” and content or fact, variable, and measure.

Type one table: A table where one dimension (usually the variables dimension) has its members placed across the columns as column headings, and all the other dimensions have their names as column headings and their members defined down the rows.

Type zero table: A table where all the dimensions have been placed as the headers of columns whose data runs down the rows and a value dimension becomes the heading for the one data column.

Value: The quantity for a single instance of a variable. For example, $30 could be a value for a sales variable.

Variable: A unit-bearing data type, either measured or derived. Sales, costs, profits, and interest rates are typical variables. Depending on the OLAP tool, variables may be treated on an individual basis and kept separate from the dimensions according to which they are organized, or they may be collected together in a single variables dimension.

Virtual memory: A technique for allowing software in a computer to address more memory than is installed in the system. This allows programs that require a large amount of RAM to operate on systems that have only smaller amounts of RAM. Data that will not fit into actual RAM is moved onto or off of disk as needed, without the software being aware that the disk was involved. Generally, this is a feature of the operating system that is transparent to the programs running on it.

APPENDIX E

The Relationship between Dimensions and Variables

In Chapter 3, I discussed the multi aspect of multidimensionality more than the dimensional aspect. Remember how we distinguished between logical and physical dimensions? Angle-based dimensions as found in a cube were physical dimensions and spatially limited to three. In contrast, logical dimensions apply to any situation and have no inherent upper limits. So the key to understanding the multi aspect of multidimensionality is understanding the difference between logical and physical dimensions.

NOTE The term orthogonal has been used in the industry to mean generic, as in, if all dimensions are generically the same, they are orthogonal. This is an unfortunate misuse of the term, especially because the term’s real use, as defined by its mathematical origins, has to do with the concept of independence between dimensions, which, as we saw in the text, is an important concept in the OLAP world.

The dimensional aspect is just as important as the multi aspect. What makes something dimensional or a dimension? How do we identify the dimensional structure of a data set? Are columns in a flat file always dimensions? In a business sense, the term is frequently used to mean a perspective. But what exactly is a perspective?1 All this could make it seem that a dimension is a property of the organization of a data set because, as we saw in Chapter 3, the same data can be organized in multiple ways, in which case the dimensionality is in the eye of the beholder.


Or, is the dimensionality of a data set somehow a property of the data independent of how it is organized? It certainly seemed that dimensions were equated with groups of things such as all the departments, stores, or employees in a company, and that such things are to some degree observer independent. But if dimensions equate with groups, are all groups dimensions? The purpose of this appendix (which represents a slightly revised version of section one from Chapter 15 of the first edition of OLAP Solutions) is to complement the introduction to the LC model (from Chapter 14) and its particular take on the relationship between dimensions (or locators) and variables (or contents), with their common grounding in types, by looking, somewhat informally, at that relationship from two different and popularly taken perspectives. First, we look at the difference between locators or dimensions and contents or variables. I will especially draw attention to the inconsistencies in positions that maintain that there is no distinction to be made. Subsequently, this appendix looks at the underlying sameness between locators and contents (what you learned to call types in Chapter 4). If you are happy with your current understanding of the relationship between locators and contents as coexistent, type-grounded functional distinctions found within any query or response, and are not interested in exploring it any further, you may safely skip this appendix.

The Basis for the Distinction between Contents and Locators

As we saw briefly in Chapter 3, the basis for the distinction between dimensions and variables arises from a combination of the activity of describing (and manipulating) events and the events themselves. Look at the urban commercial planning example in Figure E.1. It shows a table containing data about times, neighborhoods, stores, and sales. Can you visualize the data source? Let us say that the data came from a small-sized city neighborhood and the stores that do business within its boundaries. Figure E.2 shows a map of the neighborhood and the location of its stores and their type. The urban planning authorities monitor the neighborhood once a month.

Months    Neighborhood    Store    Sales    Foot traffic
Jan       Pearl Square    C1       200      4000
Feb       Pearl Square    C1       75       2000
Mar       Pearl Square    C1       125      2250
Apr       Pearl Square    C1       150      3000
:         :               :        :        :
Dec       Pearl Square    C1       225      5000

Figure E.1 Urban planning data.

[Map of the Pearl Square neighborhood showing store locations. KEY: SP: Sports; HF: Home furnishings; BG: Baked goods; JB: Juice bar; F: Food store; C: Clothing store; R: Restaurant; H: Hardware; P: Photography; B: Bar; S: Ski Shop]

Figure E.2 Pearl Square neighborhood.

What You Need to Take into Account

If you had the responsibility for building a commercial monitoring system for a senior analyst with the urban planning authorities, what aspects, components, or factors of the neighborhood’s situation would you need to know or take into account (assuming all the neighborhood’s businesses were cooperative)?

Contents or Variables

Certainly, you would have to take store sales into account or you couldn’t talk about having a commercial monitoring system. Unless you are exceptionally clairvoyant, you can only do this by tracking, measuring, or otherwise recording the fact that something was sold for each time and place that a sale is made. Everything that you track in a situation is called a content or variable (in that situation). By way of comparison, in a financial situation, line items or accounts usually function as variables. In a manufacturing situation, worker productivity and error rates may serve as variables. In a transportation situation, passenger miles, cost, and revenue per passenger mile may serve as variables. Note how the urban planner’s variables reflect what the urban planner is trying to do. A different entity with different purposes may track different variables in the same situation. For example, given the same events within which the urban planner is tracking sales, a local resident might be tracking what stores s/he enters and how much money s/he spends per store. Now let’s say you’ve accumulated some instances of your sales variables and it is time to analyze and report on them. What might you want to know?

You might want to know the value for a particular instance of a particular variable, say the sales variable for the dry cleaners in January. You might want to know how sales compared between two times, say last week and this week. You might want to know how many sales go to young customers compared with old customers. You might want to know which businesses are doing the best and the worst in the neighborhood. You might want to know many things. But what could you ever know unless you could differentiate or identify specific instances or groups of instances of the variables you originally measured? You wouldn’t be able to ask for the first measurement taken, or the measurements taken from store x, or the measurements made Tuesday, or the measurements of product y sales, or even the first sequential measurement in data storage. You would never be able to formulate any questions about your variables, not even a question about their total or average. For even a question about totals assumes it is possible to differentiate the instances of one variable from the instances of another. So, strictly speaking, unless the instances of your variables can be individuated, your variables are totally and completely useless.

Locators or Dimensions

Of course, you would never measure a series of variables without retaining the ability to somehow identify their instances. If you were tracking sales in the example given in the previous section, it is likely that the specific instances of sales would be identified by time, store, and product. Each set of identifying factors—that is, the set of all stores, the set of all times, or the set of all products—is a locator dimension of the situation. In contrast with variables, the locator dimensions of an information-generating situation have values that you already know. Refer back to Figure E.1, the simple row and column layout of commercial data, and ask yourself the following question: “What are the months for which I am tracking sales?” Don’t you somehow know this? Don’t you need to know this to specify a particular report? If Figure E.1 was what you wanted to see, it must be that it represents the response to a command, such as “Show me the sales for January through June in the green neighborhood,” or the answer to a question such as “What are the sales for January through June?” However phrased, the time value January through June identifies which sales instances you are interested in seeing. In other words, unless you are really doing data archeology, a typical data set can be looked at as a collection or set of answers to a given question. In Chapter 4, we showed how this looks in type language. Since dimensions have values you already know, a stream of dimension values to which no other information was attached couldn’t possibly tell you anything new. Dimensions are as meaningless without variables as variables are meaningless without dimensions. Luckily, you would never think to build a model without including them both. (For a refutation of a pseudo-counter example, please see the following sidebar.) Variables (or contents) and dimensions (or locators) are at the core of any MD modeling taxonomy (regardless of whether or not variables are given any special status in the MD product that you use). As I showed in Chapter 4, locators and contents represent basic functions or roles that types may fill. Rules and scripts, objects, entities and relationships, generalized blackboards, and logic denote data modeling taxonomies in their respective domains. Modeling taxonomies are at the heart of what the AI community calls knowledge representation.2, 3

POSSIBLE COUNTER EXAMPLES TO THE NEED FOR DISTINGUISHING BETWEEN LOCATORS AND CONTENTS

Possible counter example 1: As a possible counter example to the need for distinguishing locators (or dimensions) and contents (or variables), consider a situation where you are given a commercial model for which you didn’t even know the dimensions, much less their values. You could start your queries into the model with the question “What are the dimensions of my model?” To which the response might be “months, stores, products, and customers.” (Note, this is how the meta data discovery of any OLAP API typically works.) Following this, you could formulate the question “For what months do we have commercial data?” To which the response might be “January, February, and March.” Thereafter, you could formulate more typical queries such as “What are the sales for January, February, and March?” While on the surface this might seem like a counter example to the assertion that we need to know the values of our dimensions, closer inspection reveals otherwise. Recall that a dimension is only a dimension relative to a variable. Nothing is inherently a dimension or a variable. Time, which usually serves as a dimension, can just as easily serve as a variable. For example, in a model that tracks the average amount of time between feeding sessions for humans, time would be the variable, and age, culture, or geography might be dimensions. When you asked the question “What are the dimensions of my model?” the term dimension was serving as a variable, and the term model was serving as a dimension. Imagine that you had several models, not just a commercial model, and consider questions such as “Are there any dimensions in my commercial model that are shared by my finance model?” or “What are the dimensions of my human resources model?” All these questions treat model as a locator or dimension for the term dimension, which is acting like a variable and whose unknown values are what these meta models are tracking. When you asked the question “What are the time values for which I have commercial data?” (which could be rephrased as “What are the members of the time dimension in my commercial model?”), the term member was serving as a variable, and the term dimension was serving as a dimension. Here, in other words, you are tracking sets of member values as identified by their dimension name. So, the counter example isn’t a counter example because there isn’t a single model. There are three models. Model one is the commercial model we’ve been talking about all along. Its variables are sales and costs; its dimensions are time, store, and product. To ask questions about particular sales values, you have to know the value of the time, store, or product whose sales values you want to know. Model two is a meta data model. Its variables are dimension members; its dimensions are dimension names. To ask questions about the members of a dimension, you have to know for what dimension you are asking the question. Finally, model three is also a meta data model. Its variables are dimension names; its dimensions contain model names. To ask questions about the names of dimensions in a model, you have to know for what model you are asking the question.


POSSIBLE COUNTER EXAMPLES TO THE NEED FOR DISTINGUISHING BETWEEN LOCATORS AND CONTENTS (continued)

Possible counter example 2: Of course, one can imagine extreme cases where a seemingly pure one-dimensional stream of dimension values constitutes useful information. For example, consider a model that just tracked, for a moment in time, which products were being carried. Figure E.3 shows such a model. It would appear to contradict the rule that both a variable and a dimension are required for a useful model. Kimball, in his book The Data Warehouse Toolkit, called this a factless fact table.4 However, the seemingly one-dimensional model is not really one dimensional. In addition to the product dimension, there is an implicit second dimension bearing one member that acts as a variable. That dimension might be called an existence dimension. Its one member might be called status. The status variable is Boolean, capable of two values: yes, meaning the product is available, and no. The model consists of the intersection of the product and the existence dimensions. The cell values would thus contain the binary values yes and no. This is shown in Figure E.4. Notice how all the values are yes. In other words, the only products listed in Figure E.3 were the available ones.

Products
Sniglets
Foozbars
Dingbots
Foodles
Smuglets
Grimples

Figure E.3 List of products carried.

Products     Status
Sniglets     Yes
Foozbars     Yes
Dingbots     Yes
Foodles      Yes
Smuglets     Yes
Grimples     Yes

Figure E.4 Revealing the existence dimension.


Products     Status
Sniglets     Yes
Tugles       No
Foozbars     Yes
Bonkles      No
Dingbots     Yes
Foodles      Yes
Jimbas       No
Smuglets     Yes
Grimples     Yes
Nerplas      No

Figure E.5 List of products and their status.

It is this convention (as opposed to showing all products that did or could ever exist along with their existence value of yes or no) that eliminates the need to explicitly show the existence dimension. Figure E.5 shows what the model would look like with all products, available and unavailable. The concept of an implicit dimension is analogous to the concept of an implicit part of speech. The interjection wow, for example, seems to carry meaning when used in a real-world situation, yet apparently violates the need for a subject and predicate. Of course, implicitly, wow parses into a subject-predicate form meaning, approximately, this thing/event here now is unusual.
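To make the implicit existence dimension concrete, here is a minimal Python sketch (not from the original text; the carried products are taken from Figure E.3 and the full product domain from Figure E.5). It simply locates a Boolean status variable by product, along the lines of Figure E.5; printing only the products whose status is yes reproduces the convention of Figure E.3.

# Full product domain (Figure E.5) and the products actually carried (Figure E.3).
all_products = ["Sniglets", "Tugles", "Foozbars", "Bonkles", "Dingbots",
                "Foodles", "Jimbas", "Smuglets", "Grimples", "Nerplas"]
carried = {"Sniglets", "Foozbars", "Dingbots", "Foodles", "Smuglets", "Grimples"}

# Making the implicit existence dimension explicit: a Boolean status variable
# located by product, as in Figure E.5.
status = {product: ("Yes" if product in carried else "No") for product in all_products}
for product, value in status.items():
    print(product, value)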

Benefits to Thinking in Terms of Variables and Dimensions

By this point you should be convinced that there is no escaping the distinction between variables and dimensions. But what would you be giving up by not making the distinction? Maybe it's no worse than not distinguishing Cabernet Sauvignon grapes from Merlot grapes when ordering wine. After all, you can still ask for red and get what you want. As stated at the beginning of this appendix, the difference between dimensions and variables is going to be secured and then deepened. The reason for doing this is that all further development of MD analysis depends on this distinction. Since not all products single out variables, there is some feeling that variables are somehow optional or that they are just another dimension, like products. As I have mentioned

elsewhere, you can ignore the distinction between dimensions and variables for the purposes of display, but not for analysis.

Better Connection to Logic and Analysis

Dimensions, as opposed to variables, provide the logical notion of uniqueness, identity, object, or reference. In database terms, this is called a key. The concept of key is a central, and logically grounded, tenet of the relational model. In fact, it is law number one in the relational model version 2.5 MD models need this distinction for the same reason that relational models need it. It provides criteria for meaningful and meaningless statements. In other words, a database, MD or relational, is just a fancy word for a big collection of statements capable of being true, false, missing, and meaningless. A necessary, though not sufficient, condition for meaningfulness is the coexistence of an identifier (or key) and a variable (or nonkey) attribute. Furthermore, when it comes to handling missing and meaningless data, keys or identifiers get handled differently from nonkeys or variables. (Missing keys, for example, are not allowed in the relational model whereas missing attributes are allowed.6) There is no way to properly handle missing and meaningless data without making the distinction at some point. Dimensions and variables are to modeling behavior as gravity is to physical behavior. In the same way that you can live your life in cognitive ignorance of gravity, but your physical being cannot ignore or break the laws of gravity, you can live your life in cognitive ignorance of the difference between dimensions and variables, but your cognitive being cannot ignore or break the laws of dimensions (or locators) and variables (or contents). Incidentally, these kinds of laws should not be confused with politically based laws such as tax rates, gun control, abortion, and highway speed limits. You can only break human laws. What follows is a brief description of the benefits to thinking in terms of variables and dimensions.

The Common Grounding of Dimensions and Variables

You have just seen how dimensions and variables are both a necessary part of any OLAP model. At the same time, I kept mentioning that the distinction between dimensions and variables is one of function or role, and not one of kind. In other words (as I showed in Chapter 4), anything that can be used as a dimension in one query or calculation can be used as a variable in another and vice versa. Let's explore this common grounding further. First of all, let's look at a simple two-column table of time and sales as shown in Figure E.6. Figure E.6 represents a simple sales table. The time column is the key column or dimension. Sales is the attribute column or variable. The table shows how many sales were made for each time period. Normally, each column would be defined in terms of a previously declared data type as illustrated in Figure E.7. What about a more objective criterion for differentiating locators (or dimensions) and contents (or variables) within the context of a data-generating event? Is there one?

Identifier: Time (hour)    Variable: Sales (100s of dollars)
10:00                      5
11:00                      3
12:00                      4
13:00                      4
14:00                      4
15:00                      6
16:00                      3
17:00                      3
18:00                      4
19:00                      2

Figure E.6 A simple sales table.

Possible Time Values: any integer > 0 representing a number of minutes
Possible Sales Values: any integer > 0 representing sales in dollars

Figure E.7 Meta data definition of the time and sales dimension.

It turns out that there is a more public or objective way of distinguishing between dimensions and variables that is consistent with the common sense definition given earlier. Within the context of a data-generating event, the mapping of types to the roles of variable and dimension can be objectively made based on the relationships between the distribution of their values in a data set and their definition at the meta data level. In most cases, there is a clear difference in the distributions between what were used as dimensions versus variables in the generation of a data set, and, when there is no difference between the distributions, there is no logical way of distinguishing which types were used as dimensions and which were used as variables in the generation of that data set. The more a type is usable as a dimension in a data set, the more the unique values of the dimension in the data set are in one-to-one correspondence with a contiguous range of the values of the dimension's definition in meta data, and the more the number of instances of each dimension value in the table is a constant. When there are multiple types used as dimensions, this creates the Cartesian product that was introduced in Chapter 4 and is used throughout the book.

An easy way to see, within the context of a single query result set, whether a type is more naturally used as a dimension or as a variable is to inspect a histogram of the type's values as found in the data set. If the histogram for a type, like the one for time in Figure E.6, shows a constant height for frequency, the type is being used in that data set as a dimension. This is shown in Figure E.8. If the histogram for a type, like sales in Figure E.6, shows a nonconstant, or even better, an interesting distribution, the type is being used as a variable. This is shown in Figure E.9. To drive home the point that dimensions can be used as either locators or as contents, let's look at some nontypical data sets that, nevertheless, stem from the same sales and time meta data that you saw in Figure E.7.

[Histogram omitted: # of occurrences by time (hours), 10:00 through 19:00; every bar has the same height of 1.]

Figure E.8 A histogram for time as a dimension.

[Histogram omitted: # of occurrences (0 to 5) by hourly sales in 100s of dollars (2, 3, 4, 5, 6).]

Figure E.9 A histogram for sales as a variable.
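As a rough, purely illustrative sketch of the histogram test just described, the following Python fragment assumes the hourly sales table of Figure E.6 held in memory as a list of (time, sales) rows; the helper functions are my own names, not part of any OLAP API. A flat histogram is only a heuristic signal, of course; as the examples that follow show, the same type can produce either shape depending on how the data was generated.

from collections import Counter

# The hourly sales table from Figure E.6, as (time, sales) rows.
rows = [("10:00", 5), ("11:00", 3), ("12:00", 4), ("13:00", 4), ("14:00", 4),
        ("15:00", 6), ("16:00", 3), ("17:00", 3), ("18:00", 4), ("19:00", 2)]

def histogram(rows, column):
    """Frequency of each distinct value found in one column of the data set."""
    return Counter(row[column] for row in rows)

def looks_like_dimension(rows, column):
    """A flat histogram (every value occurs equally often) suggests a locator."""
    counts = set(histogram(rows, column).values())
    return len(counts) == 1

print(histogram(rows, 0))             # every hour occurs once: flat
print(looks_like_dimension(rows, 0))  # True: time is acting as the dimension
print(histogram(rows, 1))             # e.g. the value 4 occurs four times: varied
print(looks_like_dimension(rows, 1))  # False: sales is acting as the variable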

The first one will show sales as a dimension with time as a variable. The second will show time as both a dimension and as a variable. A histogram will be shown for each type. Imagine a sales-based fundraising effort whose goal is to raise $10,000. And, for the first $1,000, the time that each increment of $100 was raised was recorded so as to indicate the rate of progress and what times were more effective for fundraising than others. A table with this information is presented in Figure E.10. The sales column represents $100 intervals (of fundraising sales); the time column represents the time when the sales amount was achieved. The elapsed time column represents the amount of time elapsed since the last $100 increment. Notice how the sales histogram (see Figure E.11) is the one with the constant frequency while the histogram for time (see Figure E.12) is the one whose frequency follows a more typical distribution. Finally, let's look at an example where time, in the same sense of clock-measured time, is used as both a dimension and a variable in the same table. Consider the table in Figure E.13. The first time column represents the time on earth taken at fixed intervals. It is used as a basis for the dimension. The second time column represents the time taken on a spaceship as it is leaving the earth. Imagine that the speed of the spacecraft is precisely known so that the distance of the spacecraft from the earth can be calculated from the time it has been gone, and that the earth sends a signal to the craft at fixed intervals, as measured by the earth. The spacecraft records the time it received each signal and (after correcting for its distance from the earth and other factors) records its local time for each signal reception. Assuming we are close to possessing warp technology (that is, the ability to travel at or near the speed of light), the purpose of the recording is to test whether, and if so by how much, time slows down as the spacecraft approaches the speed of light.

Sales ($100s)    Time     Elapsed Time
0                9:00     0
1                9:30     30
2                10:00    30
3                10:15    15
4                10:30    15
5                11:30    60
6                12:30    60
7                1:00     30
8                1:30     30
9                1:45     15
10               2:15     30

Figure E.10 Table of time by sales.

[Histogram omitted: # of occurrences by cumulative sales ($100s), 0 through 10; every bar has the same height of 1.]

Figure E.11 Histogram for sales as a dimension.

[Histogram omitted: # of occurrences (0 to 6) by elapsed time between $100 sales increments, in minutes (15, 30, 45, 60).]

Figure E.12 Histogram for time as a variable.

The third and fourth columns represent elapsed time on the earth and on the spaceship, respectively. Entirely derived from columns one and two, the elapsed time columns are what actually get graphed. Notice how the earth time column has a dimension-like appearance. This is shown in its constant frequency histogram in Figure E.14. Meanwhile, the same time type, albeit in a different use as spacecraft time, shows a variable-like character. Time seems to be slowing down with a limiting point at 10:56. This is shown in the histogram in Figure E.15.

Earth time    Space time    Elapsed time (earth)    Elapsed time (space)
10:00         10:00         N/A                     N/A
10:10         10:10         10                      10
10:20         10:20         10                      10
10:30         10:30         10                      10
10:40         10:38         10                      8
10:50         10:46         10                      8
11:00         10:52         10                      6
11:10         10:54         10                      2
11:20         10:56         10                      2
11:30         10:56         10                      0

Figure E.13 Table of spacecraft time by earth time.

[Histogram omitted: # of occurrences by cumulative elapsed time in minutes, 10 through 100; every bar has the same height of 1.]

Figure E.14 Histogram of earth time as dimension.

From the above examples you can see five important things:
■■ The distinction between locators and contents (or dimensions and variables) in a data set represents different uses of a meta data-based dimension and is a reflection of the original data-generating event. A type that is used as a locator in one data-generating event may be used as a variable in another.

[Histogram omitted: # of occurrences (0 to 3.5) by elapsed time in spaceship minutes per 10 earth minutes.]

Figure E.15 Histogram of spacecraft time as a variable.

■■ When a type is used as a dimension in a table, its values exist in 1-1 correspondence with the values of (a contiguous range of) the dimension in its meta data definition.
■■ When no dimension, or all the dimensions, in a table have values that exist in 1-1 correspondence with the meta data dimension values, there is no natural locator that can be picked out from amongst the columns in the data set based purely on its values.
■■ The histogram for a type used as a dimension is typically flat, whereas the histogram for a type used as a variable may be (and hopefully is) more varied.
■■ The number of applicable instances of each type used as a variable is equal to the number of locations as defined by the number of intersections in the cross product of each of the types used as dimensions (see the sketch following this list).
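The last point can be sketched in a few lines of Python; the dimension members below are hypothetical and simply stand in for whatever the meta data defines.

from itertools import product

# Hypothetical dimension member lists.
months = ["Jan", "Feb", "Mar"]
stores = ["Downtown", "Mall"]
products = ["Sniglets", "Foozbars"]

# The locations of the model are the intersections of the cross product of the
# dimensions; each location can hold one instance of every variable.
locations = list(product(months, stores, products))
print(len(locations))   # 3 * 2 * 2 = 12 possible instances per variable
print(locations[0])     # ('Jan', 'Downtown', 'Sniglets')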

Summary

In this informally structured appendix, you saw that MD models are collections of locators (or dimensions) and contents (or variables) and that dimensions and variables represent different uses of types. We tried several different ways to get around thinking in terms of variables and dimensions, but could not. Even when we thought we had gotten rid of variables, they surfaced under the guise of some kind of implicit variable. You also saw an objective basis for distinguishing between natural locators and natural contents within the context of a query result set.

APPENDIX F

Toward a Theoretically Grounded Model for OLAP and Its Connection to the Relational Model and Canonical Logic

Although a detailed critique of the limits of the relational model and canonical logic would require another book, since this book makes use of the Located Content (LC) model as a theoretically grounded candidate standard model for OLAP, I thought it would be useful here in this appendix to outline some of the challenges of articulating any OLAP data model, how any candidate standard OLAP model ought to relate to the relational model, and how any candidate model’s underlying logic ought to relate to canonical logic.

Products, Data Models, and Foundations

A data model consists of structuring and operational definitions (frequently called a data definition and manipulation language) that apply to all the data that could ever be described or manipulated by software adhering to the model. Data models benefit the community of users, product developers, and researchers by providing a common language of expression independent of any particular tool and in terms of which any tool may be easily understood. In addition, for researchers especially, the language gets reified, thus becoming a thing to be thought about independently of any particular product or problem. A data model is like a map you can reason with distinctly from the terrain it depicts. This is the essence of abstract data-model-based research. When the data model integrates elements from logic, it provides a basis for the design of deductive databases where, given any database state, some other set of states may be logically deduced.



[Figure omitted: a four-level diagram. Logico-mathematical level: canonical logic (linguistic elements, rules for creating propositions, distinguishing true/false, inferential consistency). Data model level: the relational model (Codd's relational model; tuples, attributes, relations; relation manipulation rules). Interproduct language level. Product level: relational products such as Oracle 9i, IBM DB2, Microsoft, and Sybase Adaptive Server Enterprise 12.5, whose feature sets (duplicate rows, vendor-specific SQL) may or may not agree with their relational model.]

Figure F.1 The relationship between products, data models, and foundations.

Let's contextualize the concept of a data model by considering the spectrum of levels relevant to data models. Figure F.1 is an illustration of that spectrum. There is a products level, which refers to all the software products available in the marketplace that you can use; a common product language level, which refers to any industry-standard API specification; a data model level, which refers to things like the relational model; and, finally, a logico-mathematical level, which refers to things like matrix algebra and predicate calculus. When you consider this spectrum from a broad perspective, it becomes clear that a data model is sandwiched between individual products and standard application programming interfaces (APIs) on one end, and math and logic on the other. For example, relational database products, such as those produced by Oracle, IBM, Microsoft, and Sybase, can be interacted with through industry standard SQL. Furthermore, they are all grounded in a common logical data model—the relational model of data. The relational model was first introduced by E. F. Codd in his seminal 1970 paper "A Relational Model of Data for Large Shared Data Banks."1 It introduced a set of primitive data structures including domains, relations, tuples, and attributes. It also introduced a set of primitive operations including joins, products, restrictions, and projections. These primitive structures and operations are grounded on the primitive structures and operations of canonical logic (and the theory of relations) including functions, arguments, and connectives. It is interesting to note that the relational model was developed and debated before any products were created that claimed to be relational.

Logic and the Relational Model

Logic, of any provenance, is concerned with information elements that are the smallest bearers of meaning. In the same way that all molecules and compounds are composed of physical elements, all data sets and publicly exchanged thought are composed of information elements. Although logic offers a theory of how any one information element can be true or false, and how arbitrary groups of information elements may connect and be operated on in a truth-state consistent manner, the relational model focuses on a particular group of information elements called relations, and how relations may be operated on in such a way that the output of any relational operation performed on a relation is always another relation. This is crucial to the relational model, because the benefits of the model derive from the benefits of representing data in terms of relations. Chief among the benefits is the preservation of truth-states or consistency across update operations. In other words, no edit operation to a relational table should ever unintentionally add, remove, or change the value of any existent information. Because relations are groups of information elements, the laws of logic must apply to them.

Where Do OLAP Models Branch from?

Consider the following questions. Is there a multidimensional model of data separate from that of the relational model? If there is not, then how do multidimensional structures and operations follow from relational primitives? If there is, what is it, and how does it relate to canonical logic? Lastly, can canonical logic support a multidimensional model, or are new foundations required? Chronologically speaking, multidimensional products were developed in the absence of any standard data model. There was, of course, geometry, which had become multidimensional in the early 1800s, and the matrix algebra that appeared later. These probably were a part of the awareness of individuals who created OLAP products. Ex post facto, you should at least be able to construct a standard data model of what all these products have in common. Within that collection of common structures and operations, you can distinguish basic or primitive features from ones that are constructed. This would enable researchers, developers, and users to speak a common language and think about issues independently of particular products. As you could see from my critique of implicit models in Chapter 4, the task is not as simple as it might sound because different products differ substantially over fundamental issues. Examples would be whether all dimensions are equivalent or whether variables should be treated differently, and whether multiple levels are a part of the same dimension or whether they each belong in a separate dimension. As stated in Chapter 1, the core logical requirements of any multidimensional model need to include:
■■ Rich dimensional structuring with hierarchical referencing
■■ Efficient specification of dimensions and calculations

In Chapter 4, I added the requirement of symmetry. Finally, subsumed within those requirements (though these aren't discussed until Chapter 6) are the additional requirements of providing:
■■ Procedures for handling logical sparsity
■■ The capability for data to flow in any direction within the model
As you saw in Chapter 2, SQL databases (even with the SQL/OLAP extensions to SQL 99) have a hard time meeting the first two requirements. As you'll learn, the reason has less to do with the relational model than with the canonical logic upon which it is (at least partially) based.

Multidimensional Features Are Not Directly Supported by the Relational Model

The relational model focuses on structural operations more than data operations (the basic operations of relational algebra, such as joins, Cartesian products, projections, and restrictions, are all structural operations), and on base data more than derived data. The mathematical functions used to define the values of one attribute of a relation in terms of the values of another attribute of the same relation attach to columns. Functions in the relational model apply most naturally in a relation- and column-specific sense. Oddly enough, the main differences between the relational model and a multidimensional model have nothing to do with the multiple dimensions. From the moment you have multiple columns in a relation, the relation is multidimensional. As you saw in Chapter 2, the key differences are that the multidimensional model focuses on hierarchical relationships, and that it focuses on efficiently defining data derivations by attaching formulas to dimension members. Let's look at these differences more closely. As was described in Chapter 2, for example, the relational algebra does not efficiently define intradimensional comparisons because they are generally performed between rows within a column. It is far more suited to defining interattribute calculations that generally involve intercolumn comparisons. SQL is so weak on this point that in order to perform interrow calculations in SQL, people frequently transform, in a roundabout fashion, the interrow problem into an intercolumn problem, and then solve the intercolumn problem. Thus, as illustrated in Figure F.2, core features of multidimensional products are not directly supported by the relational model. Does this mean that any multidimensional model, such as the LC model, is irreconcilable with the relational model, or is there some way both models can be united? Think about it.
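To see the roundabout transformation in miniature, consider the following sketch, which uses Python with an in-memory SQLite database; the table and column names are invented for illustration. The interrow question of month-to-month change is answered by first turning the previous row into extra columns through a self-join and then performing an ordinary intercolumn subtraction. Window-style extensions to SQL reduce some of this awkwardness but, as noted above, do not remove the deeper structural issues.

import sqlite3

# Hypothetical monthly sales data; month is an integer index so that
# "previous month" can be expressed as a join condition.
rows = [(1, 50), (2, 60), (3, 55), (4, 70)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (month INTEGER PRIMARY KEY, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# The interrow comparison becomes an intercolumn one: the self-join places the
# previous month's amount on the same row, where ordinary subtraction applies.
query = """
SELECT cur.month,
       cur.amount - prev.amount AS change
FROM sales AS cur
JOIN sales AS prev ON prev.month = cur.month - 1
ORDER BY cur.month
"""
for month, change in con.execute(query):
    print(month, change)   # 2 10, 3 -5, 4 15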

Multidimensional Features Are Not Directly Supported by Canonical Logic5

The question remains, "Are core multidimensional features supported by canonical logic?" If they are, then, to the degree that the relational model is as complete as the canonical logic, multidimensional features should be derivable from the relational model. If they are not, then some other logic or some extensions to canonical logic are needed to ground multidimensional features.

[Figure omitted: the level diagram of Figure F.1 repeated, with relational products (Oracle 7.3, Sybase, IBM DB2) and multidimensional products (Lightship, Essbase, Express) at the product level; only the relational products are grounded in the relational model and canonical logic above them.]

Figure F.2 The relational model does not adequately support multidimensional features.

If some extensions are made to canonical logic, then would it be possible to unify a multidimensional (MD) and relational model under a more comprehensive logic? The biggest problem that canonical logic has trying to support uniquely multidimensional features, as described later, is its lack of hierarchical dimensions, its lack of symmetry, and its lack of a theory of meaning capable of being used as a procedure for parsing missing and inapplicable data. The latter creates problems for the relational model (where debate is still ongoing as to how best to process invalid data) as much as for OLAP, where sparsity, as shown in Chapter 6, is a significant issue.

NOTE In the discussion that follows, set theory is treated synonymously with (or as extensionally equivalent to) logic.

Hierarchies

On the one hand, it could be argued that level, hierarchy, or resolution is the principal ordering feature of set theory, characterizing as it does the primitive relationship between element and set. On the other hand, multiple iterations of a many-to-one relationship do not generate a resolutional order. Thus, for example, set theory has no way to distinguish between a 1-to-6 hierarchical relationship, such as a state and its political subunits, and a 1-to-6 relationship that defined an object, such as a table and six sibling-like neighbors such as chairs. It is precisely the primitive nature of the set-membership function that obstructs the construction of a resolutional axis for relating elements, sets, and sets of sets. In addition, the set-membership relation isn't transitive: to be a member of a set is not, by virtue of that relation, to be a member of a set of that set.
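The non-transitivity of set membership, and the contrast with a resolutional part-of relation, can be seen in a few lines of Python; the geographic names are merely illustrative.

# Set membership does not carry upward: being a member of a set does not make
# you a member of a set of that set.
city = "Boston"
cities = frozenset({"Boston", "Worcester"})
sets_of_cities = frozenset({cities})
print(city in cities)           # True
print(city in sets_of_cities)   # False: membership is not transitive
print(cities in sets_of_cities) # True

# A part-of (resolutional) relation, by contrast, chains naturally.
parent = {"Boston": "Massachusetts", "Massachusetts": "USA"}
def ancestors(member):
    chain = []
    while member in parent:
        member = parent[member]
        chain.append(member)
    return chain
print(ancestors("Boston"))      # ['Massachusetts', 'USA']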

MEREOLOGY

Mereology is the only school of axiomatic set theory that even addresses these concerns. Specifically, mereology treats of part-whole relations, modifying the standard theory in such a way as to capture some aspects of resolutional relationships. The basic modifications are simple: The singleton (unit class) is postulated as the primitive element; the parts of a class (or set) are defined to be all and only its subclasses. Classically, because transitivity was not an automatic feature of the set-element relation, part-whole relations (where transitivity is a given) were detached from and subordinated to the broader context of set-membership. David Lewis, in his Parts of Classes,6 inverts this and treats part-whole relations as the more primitive. In his calculus, x is a member of a class y only when x is a member of a singleton that is a part of y. Moreover, we can adapt standard iterative set theory to formally characterize this relationship.

What is clear is that the distinction made in the multidimensional world between ancestral relationships of parent and child versus sibling relationships is not made in canonical logic.

Handling Sparsity: Theory of Meaning

Canonical logic also has a problem defining what constitutes an information element or unit of meaning. Typically speaking, valid information elements are defined according to some syntactic conditions wherein only certain (or well-formed) formulas are considered valid. All, and only, valid formulas possess a truth value that, in the classical two-valued case, may be either true or false. The problem is that there are many seemingly valid syntactic expressions that do not appear to possess any meaning, or that appear to be both true and false. Logicians call the latter expressions paradoxes. The liar paradox, which many readers may have seen in college, is one such example. A modern version of it is "This sentence is false." If the sentence is true, then it is false; but if it is false, then it is true. The lack of an adequate theory of meaning is a problem for canonical logic that appears in all applications, not just OLAP. In the OLAP world, as shown in Chapter 6, for example, a hypercube typically defines many empty cells (it is important to know which cells are empty and meaningless, and which cells are empty and missing). In the relational world, it is clear that canonical logic has nothing to say about handling invalid propositions. This is why, as described in Chapter 6, the debate continues in the relational world between advocates of two-, three-, four-, and many-valued logics. There is no canonical way to decide the issues. So canonical logic, with its inability to define levels or account adequately for meaning, does not provide a comprehensive foundation for multidimensional models.

Symmetry

Finally, canonical logic does not support symmetry between what it calls functions (contents) and arguments (locators). The distinction between functions and arguments is primitive; there is no common substrate.

[Figure omitted: the level diagram again, now with an MD model placed at the data model level alongside the relational model; canonical logic at the logico-mathematical level supports the relational model and its products (Oracle 7.3, Sybase, IBM DB2) but not the MD model or MD products (Lightship, Essbase, Express).]

Figure F.3 Canonical logic does not support a multidimensional model.

In the predicate calculus, the attempt to switch roles requires using what's called second-order predicate calculus.

A Tractarian Approach

The term tractarian refers to any of several approaches to language and logic based on the Tractatus Logico-Philosophicus7 by the Austrian philosopher Ludwig Wittgenstein (incidentally number eight in Time magazine's list of the greatest minds of the twentieth century for those readers who are not familiar with the name).8 Wittgenstein wrote the work in the early part of the century, shortly after his studies and collaboration with Bertrand Russell at Cambridge. The final goal of his book was to delineate clearly what can be expressed by means of propositions, these being the proper subject matter for logical treatment. Along the way, it laid the groundwork for distinguishing propositions from nonpropositions by means of usage- or functionally-based criteria. Perhaps because of the difficulty of the work, few of his innovations were incorporated within the main body of canonical logic. Although it goes beyond the scope of this book to dive into and do serious battle for his ideas (interested readers are encouraged to read the bibliography's papers on the topic by this author and colleagues), I felt the need to at least minimally describe an approach to logic possessing the requisite properties for supporting multidimensional functionality (as well as relational functionality) and for resolving the problems that plague canonical logic.

LC logic is the name given to a Tractarian-consistent logic that resulted from research conducted over the past 20 years by a number of individuals, including me. Thus, in Chapter 6, the method for handling missing and inapplicable data is attributed to something called LC logic.

Brief Synopsis of LC logic

As relates to the major area of overlap between canonical logic and any multidimensional data model, LC logic includes the following:
■■ It distinguishes positional and resolutional adjacencies in dimension (or type) structures.
■■ It handles logical sparsity by incorporating a mechanically grounded method for distinguishing meaningful propositions from nonpropositions, and works only on valid propositions while properly treating the two basic types of nonpropositions.
■■ It is fully symmetrical.
■■ It supports omni-directional data flows.

LC Type Structures

The LC model’s underlying logic, LC logic, articulates a principle for ordering, linking, or relating propositions (or N-tuples with an LC structure) that translates into the MD world as a primitive topology for dimensions (described in Chapter 5). This principle is grounded in the primitive concept of adjacency. There are two primitive types of adjacency: instance or position such as between two cities or persons or between two measurements made in meters; and metric or resolution such as between cities and a state or measurements made in meters with measurements made in angstroms.
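One possible, and purely hypothetical, way to picture the two kinds of adjacency is the following Python sketch. It is not the LC model's own notation; it merely encodes order within a level as positional adjacency and a parent mapping across levels as resolutional adjacency, using invented geographic names.

# Hypothetical encoding of a geography type with two levels.
geography = {
    "levels": ["city", "state"],                            # resolutional order
    "members": {
        "city": ["Springfield", "Worcester", "Boston"],     # positional order
        "state": ["Massachusetts"],
    },
    "parent": {                                             # resolutional adjacency
        "Springfield": "Massachusetts",
        "Worcester": "Massachusetts",
        "Boston": "Massachusetts",
    },
}

def positional_neighbors(member, level):
    """Members immediately before and after a member within its own level."""
    members = geography["members"][level]
    i = members.index(member)
    left = members[i - 1:i] if i > 0 else []
    right = members[i + 1:i + 2]
    return left + right

print(positional_neighbors("Worcester", "city"))  # ['Springfield', 'Boston']
print(geography["parent"]["Worcester"])           # a resolutional step up: 'Massachusetts'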

THE RELEVANCE OF RESOLUTION OR SCALE TO LOGIC

Traditionally, canonical logic has ignored these distinctions, since it claims to treat only the combinatorial connections between propositions (and, or, not, and so forth), irrespective of their meaning. And surely concerns about adjacency must belong to their semantics, or meaning. However, the logical treatment of two propositions p and q cannot be the same when the meaning of q is part of the meaning of p, as when the two are inferentially independent. This is more evident in the case of predicate logic, which deals specifically with the cardinality relationships between sets and members. According to the LC way of thinking, to say that something is a part of something else (in the resolutional sense of part) is to express a tautology. Similarly, the relationship between an operation and its results (as used, positionally speaking, to express a series of integers) is also tautologous. And tautologies are precisely the domain of logic. Ignoring the relevance of adjacency means the logician must abstain from specifying any criteria for well-formedness (that is, the use of those p's and q's in real-world situations).

Meaning and Sparsity

Traditionally, the term proposition has been used to indicate the meaning of a sentence, following on the observation that two or more sentences may mean the same thing, or (though less often recognized) that the same sentence can have two or more meanings. However, the claim that a consistent logic and a viable theory of representation somehow follow from a theory of propositions is not widely acknowledged. Canonical logic treats sentences, propositional tokens or signs, well-formed formulae, character strings, and propositions indiscriminately (according to various definitions of well-formedness). Historically, the sentence/proposition distinction has only been invoked in the evident cases where a sentence does not uniquely specify a proposition, or an apparently well-formed sign doesn't yield a proposition at all. One reason is that logicians are inclined to believe meaningfulness can be ascertained from the sign itself—or else by what criteria would we call a system formal? A second reason is that the notion of proposition has never been well-defined. Aristotle characterized logic within the context of an affirming/denying game (the dialectic in its original sense) and defined propositions as the primitive units of this game. He further diagnosed a certain compositeness of type as their defining character, distinguishing an assertion from what was making the assertion. Yet since Aristotle equated these types with the grammatical distinctions of predicate and subject respectively, he thereby sanctioned the subsequent focus on indicative sentences. Logicians ever since have been devising ways of preserving that focus. Bertrand Russell, who early on recognized that grammatical form does not always indicate logical form, is a case in point. His theory of descriptions was an attempt to uncover the hidden logical form of sentences whose grammatical subject no longer exists, and so maintain their place in the truth-functional calculus—otherwise, well-formed but meaningless (that is, non-truth-functional) sentences might infect the calculus as a whole. One of the insights behind Wittgenstein's Tractatus that is leveraged in LC logic is that well-formedness on the page—however it is defined—is not a strong enough condition to guarantee meaningfulness. The self-referentially interpreted "This sentence is false" (which if true is, as it claims, false; but if false, is thereby true) is a case in point. It is on all accounts well-formed, but if allowed into the calculus, it would undermine the complementarity or mutual exclusivity requirement of truth and falsity. Both LC and canonical logic, then, begin with the insight that certain well-formed sentences do not take a truth value. LC, however, reasons about these sentences without having to reason with them; that is, without bringing them into the calculus. Because sentences are themselves located contents (as perceptible facts: character strings, tokens, and so forth), it is possible to mechanically parse their representational forms from the outside. One distinguishes the genuine from the putative propositions by attempting to make the terms perform their prototype functions of location and content specification. Within a two-valued propositional calculus, then, the only presupposition is that propositions alone are treated; and it is expressed in the following manner: Propositions occur in other propositions only as the arguments of truth-functions. P is false is identical to the truth-function p̄ (or not p).
So, even in the case of "This very proposition is false," however strongly the phrase "This very proposition" insists that it is a

proposition, it fails the LC compositeness query. Nor can one retreat to the claim that the phrase names a proposition, namely this very proposition is false, because this string of tokens has to submit to the same query—what proposition? Neither this very proposition nor the entire "This very proposition" is false are (on the standard reading of these terms) possible values of p in that p is false (that is, neither can be its own truth-argument in the logical formula p̄). In LC logic, then, meaninglessness never infects a logical formula, since it never enters the calculus. Despite appearances, the expression, "If the square root of 2 is blue, then Bill is in the bar," is not an instance of if p then q. However, it is not the logical formula that is called into question (as if its applicability were no longer global), but the substitution of a nonproposition for the propositional variable p. And if an element of a proposed molecular compound fails the test, the element may be eliminated, generally without having to delete the entire expression. Thus, LC logic takes a firm stand on the issue of invalid propositions and can serve as a useful support for the OLAP (and relational) requirements of logical sparsity handling (as it does in Chapter 6).

Symmetry

In contrast to the referential approach in canonical logic, where both the grammatical Aristotelian and logical Russellian subjects and predicates are distinguished referentially (that is, by reference to individual substances and universals, for the former, and to individuals of acquaintance and sense-data-properties for the latter), LC logic is based on a Tractarian-style functional distinction of types. To this end, it distinguishes two primary or prototype functions necessary and sufficient for determining a proposition: location specification and content specification.9

NOTE The terminology is somewhat reversed in logical parlance, where the term “variable” is usually reserved for the locator (the x of a propositional function f(x)). In a logic, however, both prototype functions (content and location) are expressed by types or variables (the f and x respectively of quantified expressions).

Data Flows

The coexistent position and resolution adjacency structuring of LC dimensions (or types) relativizes all statement references, and naturally supports an information model (supported by many but by no means all OLAP products on the market) where data flows from anywhere to anywhere. Allocations, inferences, leads, and lags are as natural as aggregations. In contrast, canonical logic, as evidenced by Russell's empiricism, his logical atomism, and his Theory of Descriptions (a decompositional approach to transforming ordinary language into collections of quantified logical statements and assertions of primitive sense data), tacitly assumes that data enters at some base level and then propagates upward.

Summary

In this appendix, you've seen that the core features of any multidimensional model are not cleanly supported by either the relational model or the canonical logic upon which the relational model is at least partially based. You've also seen how a Tractarian approach to logic, combined with the primitive notion of positional and resolutional adjacencies, is sufficient (and is used in this book) to ground core multidimensional features. Returning to the question of rapprochement between a multidimensional model and a relational model, in my opinion, all core features of any multidimensional model, including the LC model, can be cleanly integrated with the relational model. The point of integration is the relational notion of domains or what in the LC model I called a type. Types are a method of domain structuring. (It's how we positioned our first product when talking to relational vendors in the early 1990s.) The relational model has historically been mute on the internal structure of a domain beyond simply calling it a pool of potential values (can the pool be hierarchical, can it have any derived values or other domain references?). I believe that if the core features of a type-based multidimensional model were incorporated into the relational model's notion of domain, and products were built that supported that model, we would then have a fully integrated or unified MD/relational database product. And such a product and its underlying model would in turn be grounded in a Tractarian rather than canonical approach to logic.

APPENDIX G

Codd’s 18 Features

Although multidimensional concepts have been applied to computing for over 30 years (the APL language used multidimensional arrays in 1964), the term OLAP first arose in September 1993 when Codd, Codd, and Salley wrote a white paper entitled "Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate."1 Due to Dr. Codd's renown as the founder of the relational data model, the white paper generated a lot of attention. However, the attention was not unanimously positive. There were several problems with the paper. Originally published in Computerworld2 as if it were the product of unbiased research, the paper's credibility was severely damaged when it was discovered that it had been funded by one particular company, Arbor Software, and, if taken at face value, purported to show that only Arbor's product met a significant percentage of Codd's 12 Rules.3 Computerworld later wrote a retraction stating that it was a mistake to have published the article as if it were unbiased research.4 The paper attempted to leverage Codd's original definition of the relational model that began with a list of 12 rules (later expanded to more than 300). One of the central tenets of the relational model was the need to separate logical issues, such as the relational schema of a model, from issues of physical implementation, such as how the various relations should be stored and accessed on a computer. The relational model was a logical model. A cube considered as a logical structuring of data is the same regardless of whether it is instantiated in relational tables on disk using a Star schema, or whether it resides

615 616 Appendix G

in a server as a RAM cube, or whether it sits on a client tool, or whether it is partitioned between several physical devices. Codd’s white paper, however, clearly confused logical issues with physical issues. One of the original 12 rules, for example, states that a product needs to have physical facilities for handling sparse matrices. (There is nothing wrong with taking a stand on physical issues, but physical issues must be distinguished from logical issues. If you’re going to mention physical issues, you need to be systematic about it. You need efficient array processing and data access as much as you need efficient handling of sparsity.) The 12 rules for OLAP, as originally published, followed 10 pages of background information. In one crucial area, multidimensional aggregation, the background pages distinguished between variable and nonvariable dimensions. (Actually, nonvariable dimensions were called parameters.) It is certainly reasonable to suppose that the 12 rules were intended to represent the crystallization into rule form of a more prosaic description of the same issues. Yet in the body of the 12 rules, rule number 6 for example, stated that all dimensions were equivalent in both structure and operational capabilities. The advocacy of generic dimensionality would seem to contradict the background material that preceded it. It clearly did not follow from the work Dr. Codd performed in the late 1970s on issues of aggregation. In 1978, for example, Dr. Codd published a paper entitled “Extending the Database Relational Model to Capture More Meaning.”5 This paper attempted to define a minimal set of constructs required to generate all types of semantic abstractions. At no point did the paper even attempt to reduce the set to a single, generic dimension. Another problem with the paper, if read as an academic work, which is how the paper presented itself (though that is no longer how it is treated), is Codd’s definition of a dimension. On page 11 of the original 1993 paper, Codd defines a dimension in the following way: “The highest level in a data consolidation path is referred to as that data’s dimension.” This would imply that each consolidation path in a multipath hier- archy is a separate dimension because there is a top level for each consolidation path. If a given set of leaf nodes, say stores, connects to three different consolidation paths, cities, store-types, and store-sizes, do stores belong to three dimensions, or do the three consolidated dimensions share the same base level, in which case the dimension does not include the leaf nodes? At the very least, Codd’s notion of a dimension is not clearly defined. Reservations notwithstanding, Codd’s efforts certainly brought a substantial amount of mass attention to multidimensional problems and tools, and assigned the easily pronounceable buzzword, OLAP, to the ensemble. In any event, the 12 original rules, later reclassified as features, were expanded to 18 in May 1995 (although you had to pay to be exposed to them) and they deserve to be mentioned, if only for their his- torical interest. What follows is a brief discussion of Codd’s original 12 rules. They are shown loosely dumped (by me) into the categories basic, reporting, and dimensional. The feature numbers correspond to the way they were numbered in the orignal white paper. Codd’s 18 Features 617

Basic Features

Feature 10: Intuitive Data Manipulation

Referring to consolidation path reorganization, drilling down and across, and so on, Codd advocated direct action on the cells of a model as the preferred method of intuitive data manipulation. Furthermore, cell-based action is fine for working with high data but not always optimal for working with meta data—the latter being more often worked on via outline editors. People do not all share the same concept of intuitive. Tools should allow for a variety of interaction methods, from mouse gestures to dialogue boxes and command lines.

Feature 3: Accessibility

According to Codd, the OLAP tool should present a single, logical view of enterprise data originating from a variety of sources. The source of the data in the OLAP model should be transparent to the user. I agree with this, but I would add that end users may also be a source of data.

Feature 5: Client/Server Architecture

Codd states that an OLAP product should be capable of running on the server in a client/server architecture and that client tools should be able to hook up to the server with a minimal amount of effort. For OLAP servers, the latter requirement makes sense, though it begs the question of interoperability standards. (If the OLAP API gets adopted, the latter requirement, at least for browsing, should be easier to fulfill.)

Feature 2: Transparency

Codd states that an OLAP product should act as a transparent support to the end-user's customary front-end tool. While I agree with the goal of minimizing the learning curve for new users of an OLAP product, and while it is admirable to leverage as much as possible from existing client tool investments, literally speaking, transparency is a myth. The way an OLAP server hooks up to a client tool such as a spreadsheet is via an add-in menu box offering function selections such as drill down, dimension member selection, cascade, and pivot, which are meaningless concepts within the spreadsheet paradigm.

Feature 8: Multi-User Support

According to Codd, OLAP tools should support concurrent read/write operations, integrity, and security. I agree with this view.

Reporting Features

Feature 11: Flexible Reporting

Codd seems to be advocating the ability to display a cube's dimensions mapped to any combination of rows, columns, and pages. However, Codd treats rows, columns, and pages equally, which is not possible. While Codd states that any subset of the members of a dimension should be able to be mapped to any row, column, or page, as shown in Chapter 3, it is possible only to display a single member at a time in the page dimension. To do otherwise would be to convert the page into a row or column.

Feature 4: Consistent Reporting Performance

Codd suggests that reporting performance should remain constant as the number of dimensions or size of the database increases. However, there are limits to reporting performance unless the data is all precomputed and stored in RAM. Assuming the data is disk based (a rational assumption if the question is how to maintain consistent reporting as the size of the database increases), all data is clustered in some way on disk. Thus, there will always be some queries that necessitate more disk accesses than others. Also, absolute speed is more important than relative change in speed.

Feature 7: Dynamic Sparse Matrix Handling

I agree with Codd that, in an ideal situation, the physical organization of the data should reflect real data characteristics. And as the data characteristics change over time, so too should the data's physical organization.

Dimensional Features

Feature 1: Multidimensional Conceptual View

Codd states that the user's view of the enterprise is multidimensional in nature and thus the user-analyst's conceptual view of OLAP models should also be multidimensional in nature. In an informal way, this is certainly true. It is fine as a loose requirement, but it assumes that the term dimension has been defined.

Feature 6: Generic Dimensionality

Codd states that each dimension must be equivalent in both its structure and operational capabilities and that additional operations may be granted to selected dimensions. Beyond the fact that the definition is internally inconsistent (once additional operations have been granted to certain dimensions they are arguably no longer equivalent), I have voiced elsewhere my disagreement with this feature.6

Feature 12: Unlimited Dimensions and Aggregation Levels

I consider this to be a pretty vacuous feature. Aside from the fact that anything implemented on a finite computer is ipso facto limited (the internal machine representation of the dimensional meta data will always place some finite limit on the number of dimensions, levels, and/or cells), the limits of most OLAP servers far exceed most users' practical needs.

Feature 9: Unrestricted Cross-Dimensional Operations

I agree with Codd where he states that OLAP tools' DML "must allow calculation and data manipulation across any number of data dimensions and must not restrict or inhibit any relationship between data cells regardless of the number of common data attributes each cell contains." As a cautionary note, in a world of cross-dimensional calculations, it is very easy to define member-specific formulas that, when combined at a cell, produce a meaningless cell formula. Beyond the simple summations and ratios, it is also easy to imagine dimension-specific formulas for which there is no correct precedence ordering when the formulas are combined across dimensions.

Notes

Chapter 1

1. The term OLAP originated in more of a marketing than a technical context. It was coined by Arbor Corporation in 1993. By using Ted Codd’s name (Codd was the cre- ator of the Relational Database Model) and having him coin 12 so-called rules that bore a remarkable resemblance to Arbor’s feature set at the time, they hoped to give OLAP legitimacy. By using an otherwise meaningless though pronounceable acronym, they hoped to make it sound less mathematical than the term “multidimensional database.” As a result, the term OLAP lacks the definitional rigor of such neighboring terms as “the relational data model.” I debated for a while whether to use the term OLAP in this book. After all, the fundamental technology behind OLAP is multidimensional. Yet there is an essential activity or process that corporations around the world perform every day that could be partially referred to by the term OLAP. Other parts of that process could be referred to by the terms data mining, decision analysis, data visualization, or even data transformation. And that activity, which consists of analyzing business operations with an eye towards making timely and accurate analysis-based decisions, has no official name. It’s as if you wanted to talk to people about the benefits of a good mattress in a language that had no word for sleeping. For this reason, I take the time to define analysis-based, decision-oriented informa- tion processing, or ABDOP, as the natural, larger, and more general category of pro- cessing within which OLAP refers to a particular component. By cleanly defining OLAP within the scope of a well-grounded category, OLAP becomes well grounded


and OLAP, like transaction processing, deserves to be a permanent part of our computing environment. 2. The first public debate was between Jim Dorian, then CEO of Arbor, and Michael Saylor, then CEO of Microstrategy, during a DCI Datawarehouse conference in New York City in 1995. 3. DCI Data Warehouse Conference in San Jose, April 1997. 4. Thomsen, August 1997. 5. The rapid ascension of purpose-defining pseudo-technical buzzwords has obscured some of the real technical advances that have taken place over the past 10 years. To understand just how extreme the technology industry is compared to others, consider the following analogy. Automotive technology, from engine design to fuel, braking, and suspension sys- tems, has improved dramatically over the last 100 years. Yet for all the changes that have occurred (including many specialized motor vehicles such as jeeps, trucks, and sports cars), the root purpose—driving—has remained constant. We were driving in 1901 and we are still driving in 2002. Imagine a parallel universe where no sooner did the automotive industry convince you of the need to drive (and you invested in driving tools, a.k.a. cars) than it con- vinced you of the need for independent locomotion (and sold you independent loco- motors—which bore a curious resemblance to cars). Then, no sooner did you adopt independent locomotion technology than the industry convinced you that what you really needed was autonomous spatial translation and proceeded to sell you autonomous spatial translators (which also bore a curious resemblance to cars). Adopt- ing new technologies would be very confusing in such an alternate universe because useful innovations (such as automatic transmissions, power steering, and anti-lock brakes) would be lost in a sea of marketing jargon. Unfortunately, business-computing software seems to operate under this parallel universe principle. The terms business intelligence, data warehousing, and decision support all refer to the same purpose. This author has at various times been involved in industry discussions trying to decide whether to define OLAP, data mining, and other analytical technologies as com- plementary to data warehousing or as a part of data warehousing. In early drafts of this second edition, I debated whether to simply use the terms data warehousing or decision support or even business intelligence instead of ABDOP. In the end I decided to stay with ABDOP, not in an attempt to turn it into a popular acronym, but because of its probable lack of popularity (I only use it as a starting point for defining OLAP); its meaning is likely to remain constant over the life of this book, whereas the public meaning of the terms data warehousing, decision support, and business intelligence are likely to continue changing. 6. www.informs.org 7. www.kdnuggets.com 8. Han, 2001. Notes 623

Chapter 2

1. There were also a number of forward-looking desktop products during this time. Javelin, eventually bought by IRI, had the concept of time series and understood how to translate variables between different time periodicities. And TM/1 (for Table Manager 1) was the first multidimensional desktop spreadsheet, though at the time it called itself a relational spreadsheet.
2. Codd, February 1974.
3. Stemple, et al., 1988.
4. Shipman, March 1981.
5. Smith, J.M., and Smith, D.C.P., 1977.
6. Codd, 1978.
7. Celko, 1995; Date and Darwen, 1994.
8. Technically, it is possible through the use of many lookup tables and macros to drive the whole process. However, the resulting model is fragile and difficult to define and maintain. I know this from experience. My former company created such macro-driven models for all the popular spreadsheets to show just how difficult the process was.
9. Kimball, January 1996.
10. Red Brick Software's Red Brick Warehouse implements a proprietary variant of SQL called RISQL that explicitly supports the computation of cumulative sums and the selection of rows based on ranking (top- or bottom-N) and quartile criteria. Chris Date, in a series of articles published in Database Programming and Design, explores the area of "quota queries" (queries based on ranking criteria) and how SQL might be extended to support them.
11. Gray, et al., 1995. (The data-cube style of aggregation introduced in this paper is sketched in the small example following these notes.)
12. "Final Proposed Draft . . ." SQL, November 1999; Zemke, et al., May 5, 1999.
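As an illustration only, the following short Python sketch (mine, not drawn from any of the cited papers or products) computes the full set of Group-By subtotals that a CUBE-style aggregation produces over a tiny, made-up sales table; the column names and rows are assumptions used purely for illustration.

    # For every subset of the grouping dimensions, roll the measure up,
    # using "ALL" as the placeholder for a collapsed dimension.
    from itertools import combinations
    from collections import defaultdict

    rows = [
        {"store": "Boston", "product": "cake", "sales": 10},
        {"store": "Boston", "product": "pie",  "sales": 5},
        {"store": "Lyon",   "product": "cake", "sales": 7},
    ]
    dims, measure = ["store", "product"], "sales"

    cube = defaultdict(int)
    for r in rows:
        for n in range(len(dims) + 1):
            for keep in combinations(dims, n):
                key = tuple(r[d] if d in keep else "ALL" for d in dims)
                cube[key] += r[measure]

    for key in sorted(cube):
        print(key, cube[key])
    # prints all eight (store, product) subtotals, e.g. ('ALL', 'ALL') 22

The same result in SQL would require a union of separate GROUP BY queries unless the dialect supports a CUBE or grouping-sets extension, which is precisely the gap the cited work addresses.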

Chapter 3

1. Abbott, 1983. 2. Tractatus Logico-Philosophicus, 2.161. 3. This term was first mentioned in The OLAP Report published in 1995. See Pendse and Creeth, 1995, pp. 52–64.

Chapter 4

1. The word dimension is a rich and, for that reason, semantically loaded term. During most of its 2500-year history, it was used predominantly by mathematicians and carried a largely geophysical or three-dimensional meaning. With the advent of abstract geometries and algebras over the last few centuries, that meaning has expanded into spaces that are more-than-three- or multi-dimensional. It wasn't until the early 19th century that mathematicians such as Möbius, Cauchy, and Clifford began theorizing about abstract geometrical spaces independent of the three-dimensional limitations of physical space (see Smith 1959). Today's practical discussions of how dimensions and variables (as a generalization of the concept of measure) should be treated in a multi-dimensional system have their roots going back over two millennia. The term dimension traces back through the old French of the time of Descartes to the Latin dimetiri, meaning a way of measuring (though many dictionaries translate dimetiri simply as "to measure"), which in turn traces back to the Greek. The Greeks had two very different terms for dimension and measure. To the Greeks, these were two different concepts. The concept of measure was a practical concept, and the Romans, being a practical people, easily adopted the concept. But the concept of dimension was another story. The Greeks wondered a great deal about the world. They espoused myriad ontologies. The Romans, in contrast, were not known for their philosophical enterprises. I imagine the Greek word for dimension may have confused them. For the Romans, there was nothing to think about. So to the translators of the time, the Greek word for dimension must only have had meaning in the concrete sense of measuring. Thus the Latin term for the Greek concept of dimension became wed to the concept of measurement. In fact, for the next 2000 years, with the exception of geometers who maintained a distinction between the meaning of dimension and the meaning of measure, the term dimension became synonymous with the concepts of measurement and number. The dimension of an object referred to its size, as it does today when we talk about the dimension or magnitude of a problem.
2. I use the term issue-specific approaches to OLAP modeling instead of model because the term model implies that you are treating the whole of what needs to be dealt with for OLAP, whereas the former term makes it clear that we are looking at how products work on an issue-by-issue or approach-by-approach basis.
3. Some authors have tried to formalize the distinction between data and meta data by defining a model that connects or maps sets of dimensional intersections in the meta data to sets of elements in the data. In this way, by defining two types of structures—data and meta data—the data becomes part of the overall system. Of course, it then becomes necessary to add all the same conceptual machinery required by the explicit dimension measure approach to provide for queries about the meta data constrained by the data. If the distinction between data and meta data is maintained, but the role-switching machinery is not in place, then closure is violated whenever a query is posed about the meta data because the result of a query, a set of meta data, is not a part of the data.
4. The term "functional" refers to any of several approaches to language, math, and logic based on the Tractatus Logico-Philosophicus by the Austrian philosopher Ludwig Wittgenstein (incidentally number eight in Time magazine's list of the greatest minds of the twentieth century, for those readers who are not familiar with the name, and ahead of Russell, who was a major contributor to what is now considered canonical math and logic). Wittgenstein wrote the work in the early part of the twentieth century, shortly after his studies and collaboration with Bertrand Russell at Cambridge. The major goal of his book was to delineate what can be expressed by means of propositions, these being the proper subject matter for logical treatment. Along the way, it laid a groundwork for distinguishing propositions from nonpropositions by means of usage or functionally based criteria. Perhaps because of the difficulty of the work, few of his innovations were incorporated within the main body of canonical logic, though he is remembered as having been the inventor of truth tables.
5. Although beyond the scope of this book, mathematically speaking, types are rich enough to represent expressions in linear algebra, topology, differential geometry, graph theory, sets, and tensor manifolds. Media-wise, anything can be a type, from numbers or text to audio signals or arbitrary bit patterns. In addition, types can behave like object classes and are capable of inheritance, encapsulation, polymorphism, and messaging.

Chapter 5

1. One could argue that the list of valid instances could be internal to the dimension, and thus that verification need not be external. However, this misses the fundamental point that regardless of whether the list of valid instances exists "within some region defined as being a part of or internal to the dimension" or whether the list of valid instances exists in a clearly external source table or user's manual entry, the point is that the list of valid instances is just that—a list. This contrasts with the internal or generative rule-based definition of the cardinal series. (One can always define rules that behave like list matchers in the sense that the rule is defined by a list of valid instances.) To be an integer is not to be any particular token. No domain defined on the integers needs to pre-store all integer values. Rather, the integers are defined by a generative or process-oriented rule capable of constructing any particular integer.
2. Except in the case of high-cardinality dimensions where large numbers of instances need to directly connect to a parent instance, such as a listing of the individual customers that connect to a city, and where the tool has a limit on the number of child-to-parent connections, thus forcing the user to insert dummy intermediate parent instances.
3. As this section progresses from nominal through ordinal to cardinal series, the astute reader may feel that I am overloading the = sign, having it behave as an identity operator for nominal series and a position-testing operator for ordinal and cardinal series. After all, does it really make sense to talk about two nominal instances being equal, as in the instance "sports car" of the type vehicle being equal to the instance "sports car" of another type vehicle, in the same way that two cardinal instances may be said to be equal, as in the instance 5 of the type integer being equal to the sum of the instances 2 and 3 of the type integer? My answer here is yes, it does make sense to use the = sign or equality operator in both cases. There are critical distinctions to be made, such as between translation-based equality testing, essentially list-based comparisons, where an instance is validated as belonging in a type because it matched some instance in that type, and transformation-based equality testing, where some set of operations is performed on the instances of one type, say 1 + 1 + 1 + 1, to test whether the expression is equal to 4. Furthermore, one can use different methods of equality testing at different times for the same expression. For example, the expression 2 + 3 = 5 can be tested by comparing the symbol 5 with a symbol stored in an answer table, in which case, at that moment, the equality testing for the expression only relied on a translation-based method of equality testing as found in any nominal series. The same expression 2 + 3 = 5 can also be tested by carrying out an operation, such as a decrement by 1, on each of the two sides and then testing the equality of the number of times the decrement operator could successfully decrement each of the two sides. (A small sketch of both styles of testing follows these notes.)
4. We have chosen to use the terms type and instance because they are typically used for similar purposes in computer science and because they are less burdened with ongoing meaning than other terms. In addition, we introduce the term metric, and while recognizing that it represents a generalized notion of its typical mathematical usage, it is nonetheless the term that comes closest to representing the quantitative relationships embedded in hierarchical type structures.
5. The definition of a type in terms of instance-metric pairs is a logical definition that is independent of how the type is physically organized or represented to a user. The closest analogy is a domain in the relational sense, rather than an object, class, or template from an object-oriented (OO) perspective. (In fact, types are domains in the relational sense of the term.) That said, although one could use OO terms to describe types and their instance-metric pairs, since there are already fixed definitions for OO terms, since object-oriented constructs rely on mathematics and logic, and since OLAP applications are inference- and calculation-oriented, I thought it more useful to define types and hierarchies on a mathematical and logical rather than computer science foundation.
6. It is also possible to define mixed hierarchies with fully connected levels, where the hierarchical distance between the instances of the two levels is greater than one. Although I can imagine examples of such mixed hierarchies, I have not found any clients who demanded them for their applications, and their definition can seem a bit theoretical, so I choose to leave their discussion out of this book.
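As an illustration only, here is a small Python sketch (mine, not taken from any product) of the two testing styles named in note 3; the stored answer table and the decrement loop are stand-ins for translation-based and transformation-based equality testing respectively.

    # Translation-based testing: validate "2 + 3 = 5" by matching the claimed
    # symbol against a stored answer, as one matches tokens in a nominal series.
    ANSWER_TABLE = {"2 + 3": "5"}

    def translation_equal(expression: str, claimed: str) -> bool:
        return ANSWER_TABLE.get(expression) == claimed

    # Transformation-based testing: decrement each side by 1 until exhausted and
    # compare how many decrements each side allowed, as sketched in the note.
    def transformation_equal(left_terms, claimed: int) -> bool:
        left_count = 0
        for term in left_terms:
            while term > 0:
                term -= 1
                left_count += 1
        right_count = 0
        while claimed > 0:
            claimed -= 1
            right_count += 1
        return left_count == right_count

    print(translation_equal("2 + 3", "5"))          # True
    print(transformation_equal([1, 1, 1, 1], 4))    # True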

Chapter 6

1. See Appendix G.
2. For example, in a logical system with three values True, False, and Unknown, there is no good way to say "not" to all three values. In most systems offered, Not(True) will be defined as False, Not(False) will be defined as True, and Not(Unknown) will be defined as Unknown. Unfortunately, that means that sometimes Not(x) does not equal x, and sometimes Not(x) does equal x! In other systems, where Not(Unknown) is defined as False, Not(Not(x)) will sometimes equal x and sometimes not. All of a sudden, logic is not something that can be consistently applied. (A small sketch of this behavior follows these notes.) (See S. Shavel and E. Thomsen, "On Three-Valued Logic," from a paper presented at the sixteenth Wittgenstein International Symposium on Cognitive Science.)
3. See Codd, December 1986; and McGoveran, October 1993.
4. Codd, 1991, p. 182.
5. Shavel and Thomsen, 1993.
6. See the series of articles by David McGoveran in Database Programming and Design (December 1993; January 1994; February 1994; March 1994).
7. The need to formally distinguish between missing and meaningless data was shown in the paper "On Three-Valued Logic." Stated briefly, if two truth values (true and false) are employed, then missing data may be thought of as either true or false, while meaningless data may be thought of as neither true nor false.
8. That approach was described in "On Three-Valued Logic," cited previously. As a consultant, I developed a prototype system for performing these substitutions for a large institutional client.
9. That is, regardless of which method is employed by a particular product, under the hood the same referencing needs to occur, and the pitfalls that you, the analyst, might fall into are the same. And once you take steps to avoid any referencing pitfalls, such as defining your application ranges, the distinctions between the different product-based approaches begin to wither away.
10. Kimball (1997).
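As an illustration of the behavior described in note 2, here is a minimal Python sketch (mine, not any product's logic engine), using None as a stand-in for Unknown.

    TRUE, FALSE, UNKNOWN = True, False, None

    def not3(x):
        # Kleene-style negation: Not(Unknown) stays Unknown; True and False flip.
        if x is UNKNOWN:
            return UNKNOWN
        return not x

    for v in (TRUE, FALSE, UNKNOWN):
        print(v, not3(v), not3(v) == v)
    # True   False  False   <- Not(x) differs from x
    # False  True   False   <- Not(x) differs from x
    # None   None   True    <- Not(x) equals x: the inconsistency the note points out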

Chapter 7

1. Thanks to Rick Crandall for pointing out this example. 2. Cross-source selections are also known as links and are treated more carefully in Chapter 8.

Chapter 8

1. Strictly speaking, it is possible to define a table that would take only one kind of data link to extract the data into the cube, but you would never see such a table in real-world use.

Chapter 9

1. Tufte (1983), p. 9.
2. The one pseudoqualification is that of base numbers. I say pseudo because it is not a question of location. The sequence of digits 1011 means something very different in base 2 than in base 10. Nonetheless, given a base number for a string of digits, their meaning is entirely location independent. (A tiny sketch follows these notes.)
3. See, for example, the discussion on misleading graph types in Tufte (1983), pp. 69–72.
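As a one-line illustration of note 2 (mine, purely for the arithmetic point), the same digit string denotes different quantities under different bases, but once the base is fixed its meaning does not depend on where the string appears:

    digits = "1011"
    print(int(digits, 2), int(digits, 10))   # 11 1011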

Chapter 10

1. Stonebraker (1993).
2. Harinarayan, et al. (1996).

3. See Orenstein (1986) for more information on Z-ordering. (A small bit-interleaving sketch follows these notes.)
4. See, for example, Faloutsos and Roseman (1989) for more information on Hilbert numbering.
5. The typical product assumption is that sampling-based errors are evenly distributed across all dimensional combinations. This assumption, unfortunately, is unwarranted. What is far more likely is that the standard deviation for some measure varies across dimensions, such as products and/or regions and/or times. Furthermore, it is likely that the standard deviations cannot be calculated on a unidimensional basis because there are interdimensional dependencies. In other words, one couldn't simply calculate the standard deviation for sales by summing the standard deviations for regions, times, and products. Thus, a correct approach would need in some fashion to estimate the covariances between dimensions. Of course, such covariances could not be effectively calculated at too low a level as there would be too many. So the challenge becomes one of efficiently calculating as many covariances as possible. Given the need to estimate covariances, it is reasonable to try to identify the circumstances under which sampling is a more efficient route to sufficiently accurate answers than that of creating aggregate tables. In my opinion (and this would need further confirmation), correct sampling will bring the biggest bang for the buck the larger the source data table and the more queries are looking for aggregate values. This is because aggregate tables need to visit every source row and because low-level covariance matrices are prohibitively expensive. So assuming that sampling will work best on aggregate queries of larger data sets, the challenge is to efficiently sample the source data table and correctly compute the sample statistics, including some measure of dispersion. I suspect that what would be required is some form of stratified sampling (sampling that forces the inclusion of a sufficient quantity of every dimensional category of observation), combined with resampling for the determination of sample variance, and some incremental heuristic (which itself will involve resampling) for determining when enough samples of any particular dimensional category have been obtained per some previously defined variance threshold. Sampling efficiency will improve over time as the system learns, perhaps through clustering, which dimensional combinations are most efficient for the purposes of computing covariances. (In other words, where one might start with 50 members of a product dimension, 12 members of a time dimension, and 25 members of a region dimension, thus yielding 1,500 covariances to calculate, subsequent analysis may reveal that it is only necessary to track 50.)
6. Harinarayan, et al. (1996).
7. Simon, Herbert, "Managing in an Information-Rich World."
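As an illustration of the Z-ordering mentioned in note 3, here is a small Python sketch of mine (Hilbert numbering, by contrast, requires a more involved recursive construction and is not shown). Interleaving the bits of two dimension coordinates yields a single key whose sort order tends to keep multidimensionally close cells close together on disk; the 4-by-4 grid below is only a made-up example.

    def z_order(x: int, y: int, bits: int = 16) -> int:
        # Interleave the bits of x and y (x in even positions, y in odd).
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (2 * i)
            key |= ((y >> i) & 1) << (2 * i + 1)
        return key

    cells = sorted(((x, y) for x in range(4) for y in range(4)),
                   key=lambda c: z_order(*c))
    print(cells)
    # (0,0),(1,0),(0,1),(1,1),(2,0),(3,0),(2,1),(3,1), ... 2x2 blocks stay together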

Chapter 11

1. The difference between logical and physical models is crucial and worth exploring further. Logical models deal with structural and semantic relationships independently of how or where those relationships are stored and accessed. Logical design strives for the preservation of truth under all valid changes to the data and model. Physical models deal with the logistics of how and where data is stored, calculated, and accessed. Good physical design results in fast access and calculations stored with a minimum amount of space. As George Tillman wrote in his book A Practical Guide to Logical Data Modeling (1993), "The major purpose of logical data modeling is to communicate the end-users' view of the organization's data to those who will design the system. . . . The logical data model should not be tied to a particular architecture, technology, or product. Logical design breaks down a problem into its lowest logical components so that physical designers can build it back up into a physical interpretation of the logical problem."
2. This assumes you are trying to solve a problem. During the past several years, I have seen a number of organizations embark on exploratory OLAP applications, after having already built a relational data warehouse, because they wanted to see what things looked like. Sometimes those things were end-user reports; sometimes they were aggregations or other calculations. In these situations, model design was mostly dictated by the source warehouse schema, typically some star variant. Model implementation consisted of linking to dimension tables followed by defining a few variables and then loading them by linking to the fact tables.
3. A logical model in the OLAP world is quite different from a logical model in a classic relational database setting. In the relational world, logical data models do not model derived data. As you've seen throughout this book, one of the main problems for SQL, as the commercial implementation of a relational DDL/DML, is its inefficiency for specifying many types of derivations (especially complex multilevel aggregations and allocations) that need to be performed on a regular basis. In this sense, one of the chief missions for a multidimensional information system is to provide an environment for the clean specification of complex derivations. Any logical model of a multidimensional information system has to model the derivations as much as the raw inputs.
4. These types of tables and their links to a cube were defined in Chapter 8.
5. Kimball (1996), pp. 101–106.
6. The astute reader may be wondering how a single dimension can support multiple hierarchies without violating the rules for well-formed hierarchies defined in Chapter 5. How, for example, can a level based on product color, when used as a grouping principle for a product dimension, be quantitatively comparable with a level based on product name? When a seemingly single dimension supports multiple seemingly quantitatively noncomparable hierarchies, it is hiding the fact that there is always some additional table, such as an attribute table, that is the source of the data values used for the alternate groupings. For example, a product dimension may be associated with an attribute table composed of color and size. If the color attribute is used to create an alternate hierarchy or grouping, there is an implicit joining of the color attributes with the product dimension members at the leaf level so that each product could be thought of as a nominally ordered instance of a product color. If the color attribute values didn't attach to the leaf level of the product dimension, there would be no way to aggregate the product members into color groups. Thus, strictly speaking, each distinct association of leaf-level product members with attribute values represents a distinct type. (A small sketch of such an attribute-based grouping follows these notes.)
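As an illustration of note 6, here is a small Python sketch of mine; the product names, colors, and sales figures are made up, and the dictionaries stand in for the product dimension's leaf level and its attribute table.

    product_attributes = {          # leaf member -> attribute values
        "Choco Cake":  {"color": "brown",  "size": "large"},
        "Lemon Cake":  {"color": "yellow", "size": "small"},
        "Banana Cake": {"color": "yellow", "size": "large"},
    }

    sales = {"Choco Cake": 120, "Lemon Cake": 80, "Banana Cake": 95}

    # The implicit join at the leaf level: each product becomes an instance of a
    # color, and the alternate hierarchy aggregates along that grouping.
    by_color = {}
    for product, amount in sales.items():
        color = product_attributes[product]["color"]
        by_color[color] = by_color.get(color, 0) + amount

    print(by_color)   # {'brown': 120, 'yellow': 175}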

Chapter 13

1. In reality, were FCI to be that affected by daily fluctuations in exchange rates, the company would probably hedge its currency risk by acquiring longer-term exchange rate contracts. Also, although currency exchange impacts every facet of FCI's business and would normally be treated as a general process alongside other management functions, for these chapters I have only treated currency exchange within the context of purchasing.

Chapter 15

1. Note that the package type distinction is treated as a distinction within the Foodcake dimension for the calculation exercise in Chapter 17 in order to keep the formulas from getting overly complicated.
2. Variance is a measure of dispersion about the mean. It is typically calculated as the average of the squared differences between each value, in this case total_revenue, and the mean value. Because the differences, in this case dollars, are squared, the variance is expressed in units of $². Standard deviation is the square root of the variance. (A small numeric sketch follows these notes.)
3. This section on value states began as an article I wrote for Intelligent Enterprise magazine, December 5, 2001.
4. In "Symmetry Lost," Intelligent Enterprise, March 30, 1999, I wrote about the need for treating so-called dimensions and measures or variables as distinct use cases of the same underlying structure; there was no inherent difference between dimensions and measures.
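As a numeric illustration of note 2 (mine, using made-up total_revenue values in dollars):

    revenues = [100.0, 120.0, 90.0, 110.0]                              # dollars
    mean = sum(revenues) / len(revenues)
    variance = sum((r - mean) ** 2 for r in revenues) / len(revenues)   # units: $^2
    std_dev = variance ** 0.5                                           # back to $
    print(mean, variance, std_dev)   # 105.0  125.0  ~11.18
    # (A sample, as opposed to population, variance would divide by n - 1.)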

Chapter 16

1. Although from an accounting perspective there are many tricks that a corporation can use to artificially manipulate reported earnings, and although as a result there are many alternative measures of value being proposed, the purpose of these application chapters is to teach you how to create an activity-based management style application that spans several business processes. So, rather than espouse any particular formula for calculating net value, I choose to work with the most widely used (if sometimes abused) measure of net value—earnings.
2. Although we are only looking for two tuples, namely the Qty sold of batch 123 in Cambridge and the Qty sold of batch 101 in Allston, as phrased, Qty sold will be returned for both batches for both Cambridge and Allston. Since batch 101 is not applicable to Cambridge and batch 123 is not applicable to Allston, the not-applicable tuples will drop out of any aggregation, and ideally any view. The query was phrased as it was because A) it is a simpler query that way and B) it allowed for this opportunity to speak about the distinction between querying for specific tuples versus querying for a regular array. (A small sketch of this distinction follows these notes.)
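As an illustration of note 2, here is a small Python sketch of mine; the sites, batch IDs, and quantities are made up, and the marker object stands in for a "not applicable" cell.

    NA = object()                      # marker for "not applicable"

    qty_sold = {                       # sparse cube keyed by (site, batch)
        ("Cambridge", "batch 123"): 40,
        ("Allston",   "batch 101"): 55,
    }

    sites   = ["Cambridge", "Allston"]
    batches = ["batch 123", "batch 101"]

    # Regular-array phrasing: every (site, batch) combination is returned,
    # including the inapplicable ones.
    view = {(s, b): qty_sold.get((s, b), NA) for s in sites for b in batches}

    total = sum(v for v in view.values() if v is not NA)   # N/A cells drop out
    print(total)   # 95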

Chapter 18

1. Date (1995), pp. 57, 77.

Chapter 19

1. I use the term "product language" because there are many aspects to a product that can be evaluated, from its method of storing and accessing data, to its security, to its robustness, that are not covered by the product's language. Even though the LC model is just that—a model for which no product implementation exists—it is fair and instructive to compare LC language expressions with those of the various product languages.
2. For all these examples, I used the term could instead of would because there are many different ways within each product that the same formula could be expressed, depending on how the dimensions were defined, how the variables were defined, how any cubes might have been defined, where the formula logic occurs, and even what the previous formulas may have been.
3. The vendors who publicly ran the APB-1 benchmark used every trick at their disposal to speed up the performance of their respective products, so the code snippets included in this chapter are not exactly what you might do as an application developer were you to use one of the products. However, the snippets are close enough to what you would probably do and illustrative enough of how the product languages are designed that I felt it pedagogically beneficial to include them.

Chapter 20

1. For a more complete treatment of the DEER cycle, please see the series in Intelligent Enterprise Magazine (Thomsen, June 13, 2001; June 29, 2001; August 31, 2001).
2. Thomsen (October 1998).
3. Hegel, in Phenomenology of Spirit (1977), may have been the first to write about the fact that at any level of discourse or personal reflection, whatever is being discussed or deliberated is always relative to a background where some things are held as certain, but that what is held as certain at one level of discourse may be questioned (using, again, something else as certain) at another.
4. By the term knowledge, I also mean knowledge as differentiated from skills, where skills are learned sensory-motor patterns far too numerous to ever be spoken by a person in real time.
5. Statisticians, as first mentioned in Chapter 4, frequently use the term variable as a synonym (albeit internally unstructured) for what, in this book, has been called a type. They speak of dependent and independent variables, whereas we would speak of dependent types (those appearing on the left-hand side of an equation) and independent types (those appearing on the right-hand side of an equation). The term case means something analogous to record number.
6. Grossman, et al. (2000).
7. Kamber, et al. (August 1997).
8. Material for the first part of this section is from Thomsen (Oct. and Nov. 1997).
9. For a clear explanation of affinity analysis, see Berry and Linoff (1997).
10. The fact that centroids no longer move does not necessarily mean that you have found a set of globally ideal clusters. It is possible that you have found a set of clusters that are only locally ideal. The problem of finding local instead of global minima is common to many algorithms, including most neural net algorithms. There are ways to minimize the chances of this happening, such as by using stochastic searches, but success is not absolutely guaranteed unless you are willing to expend a lot of computing horsepower. For more information, see Masters (1995). (A small restart-based sketch follows these notes.)
11. Hotelling (1931).
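As a compact illustration of note 10 (mine, not drawn from Masters or any product), one common remedy for locally ideal clusters is simply to rerun k-means from several random starting centroids and keep the run with the lowest total within-cluster distance; the one-dimensional data and parameters below are made up.

    import random

    def kmeans(points, k, iters=50):
        centroids = random.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:
                i = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
                clusters[i].append(p)
            new = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
            if new == centroids:          # centroids no longer move
                break
            centroids = new
        cost = sum(min((p - c) ** 2 for c in centroids) for p in points)
        return centroids, cost

    points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 9.0, 8.8]
    best = min((kmeans(points, k=3) for _ in range(10)), key=lambda r: r[1])
    print(best)   # the restart with the lowest within-cluster cost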

Appendix E

1. Even in a business sense, the term perspective has a subtly complex meaning. What exactly does it mean to say that we need to examine our business from a customer perspective? It means that we need to look at all noncustomer aspects of the business, such as all types of sales of all types of products at all different times, in terms of how any of that varies as one looks at different groups of customers. For example, how have sales evolved over the last 12 months to persons under 30 versus persons over 30? In addition, and instead of being arbitrarily grouped, the members of dimensions may be ordered. There may, for example, be an income ordering placed on customers. One could then define how sales varies as a function of customer, for customers ordered by personal income.
2. See, for example: Minsky, Marvin (1975); Lenat and Guha (1990); Cohen (1982).
3. The goal of most, if not all, data modeling efforts over the years has been to mimic real-world structure. Of course, as Kant pointed out, mimicking real-world structure amounts to mimicking human perception (Kant, 1986). Locators and contents (or dimensions and variables) are a crucial component of any modeling effort because they stem from the way we structure thoughts (hence the connection to logic). The major reason MD systems appear intuitive is because they do their business the way we do ours.
4. Kimball (1996).
5. Codd (1991).
6. Ibid., p. 176.

Appendix F

1. Codd (February 1970).
2. More precisely, by "canonical" I mean "standard," nondeviant predicate logic (Principia-like) and axiomatic set theory (Zermelo-Fraenkel-like), with intended reference to W. V. Quine's "canonical notation" for refining scientific discourse. In the context of this (rather skeletal) overview, I refer to predicate logic and set theory interchangeably (such that Fx can be expressed as x ∈ F and vice versa). Also, I use "set" and "class" interchangeably. Set is usually given as a restricted version of class, but the reasons for these restrictions are precisely the ones addressed in LC logic as found in Shavel and Thomsen (1990).
3. Begriffsschrift, a formula language, modeled upon that of arithmetic, for pure thought. See Frege (1879).
4. Whitehead and Russell (1910).
5. Steve Shavel was the principal author of this last section.
6. Lewis (1991).
7. Wittgenstein (1961).
8. "The 20th Century's Greatest Minds." March 29, 1999. Time.
9. For a detailed treatment of these functions, please see Shavel and Thomsen (1990).

Appendix G

1. www.essbase.com/whitepaper/olap/olap.pdf
2. "He's Back," ComputerWorld, September 1993.
3. On-Line Analytical Processing: An IT Mandate, distributed by Arbor Software, p. 18.
4. Editorial page, ComputerWorld, October 1993.
5. Codd (1978).
6. "Letter to the Editor." DBMS.

Bibliography

Abbott, Edwin. 1983. Flatland. New York: HarperCollins.
Agrawal, Rakesh, Ashish Gupta, and Sunita Sarawagi. April 7–11, 1997. "Modeling Multidimensional Databases." Proceedings of the Thirteenth International Conference on Data Engineering. Birmingham, U.K.
Agarwal, Sameet, Rakesh Agrawal, Prasad M. Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, and Sunita Sarawagi. 1996. "On the Computation of Multidimensional Aggregates." Proceedings of the 22nd VLDB Conference. Mumbai (Bombay), India.
Arbor Software. On-Line Analytical Processing: An IT Mandate.
Bentley, Jon L. 1979. "Multidimensional Binary Search Trees in Database Applications." IEEE.
Berenson, Mark L., and David M. Levine. 1979. Basic Business Statistics: Concepts and Applications. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Bernstein, Philip A. "Repositories and Object Oriented Databases." Redmond, WA: Microsoft Corp.
Berry, M., and G. Linoff. 1997. Data Mining Techniques. John Wiley & Sons.
Boulton, Richard E.S., Barry D. Libert, and Steve M. Samek. 2000. Cracking the Value Code: How Successful Businesses Are Creating Wealth in the New Economy. New York: HarperCollins.
Brown, Judith R., Rae Earnshaw, Mikael Jern, and John Vince. 1995. Visualization: Using Computer Graphics to Explore Data and Present Information. New York: John Wiley & Sons.
Burn, R.P. 1985. Groups: A Path to Geometry. Cambridge University Press.


Celko, Joe. 1995. Instant SQL Programming. Chicago: Wrox Press Ltd.
Cleveland, William S. 1994. The Elements of Graphing Data. Summit, NJ: Hobart Press.
Codd, E. F. February 1974. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM.
———. 1978. "Extending the Database Relational Model to Capture More Meaning." Association for Computing Machinery.
———. December 1986. "Missing Information (Applicable and Inapplicable) in Relational Databases." Association for Computing Machinery, SIGMOD Record 15:4, pp. 53–78.
———. 1991. The Relational Model for Database Management: Version 2. Reading, MA: Addison-Wesley Publishing Company, Inc.
Cohen, Paul, Edward Feigenbaum, and Avron Barr. 1982. The Handbook of Artificial Intelligence, Volume 1. William Kaufmann Heuristech Press.
Cunto, Walter, Gustavo Lau, and Philippe Flajolet. "Analysis of KDT-trees: KDT-trees Improved by Local Reorganizations."
Dahl, Veronica. 1979. "Quantification in a Three-Valued Logic for Natural Language Question-Answering Systems." International Joint Conference on Artificial Intelligence, Tokyo.
———. 1980. "A Three-Valued Logic for Natural Language Computer Applications." Proceedings of the 10th International Symposium on Multiple-Valued Logic.
Date, C. J. 1994. "View Updating and Database Design or, Two for the Price of One."
———. 1995. An Introduction to Database Systems, 6th ed. Reading, MA: Addison-Wesley.
Date, C. J., and Hugh Darwen. 1994. A Guide to the SQL Standard. 3rd ed. Reading, MA: Addison-Wesley.
Dragoo, Bob. 1995. Real-Time Profit Management: Making Your Bottom Line a Sure Thing. New York: John Wiley & Sons.
Eccles, Robert G., Robert H. Herz, E. Mary Keegan, and David M.H. Phillips. 2001. The Value Reporting Revolution. New York: John Wiley & Sons.
Eckerson, Wayne W. 1996. "Approaches to OLAP: Making Sense of the Religious Wars Surrounding OLAP Implementations." Open Information Systems 11:2.
Eckerson, Wayne W. April 1999. "Criteria for Evaluating Business Intelligence Tools." TDWI.
Editors. 1995. Proceedings of the 21st International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann.
Erman, Lee D., and Victor R. Lesser. 1975. "A Multi-level Organization for Problem Solving Using Many, Diverse, Cooperative Sources of Knowledge." Computer Science Department, Carnegie-Mellon University.
Faloutsos, Christos, and Shari Roseman. 1989. "Fractals for Secondary Key Retrieval." University of Maryland, College Park.
Frank, Maurice. July 1994. "A Drill Down Analysis on MDDBs." DBMS.
Freeston, Michael. 1987. "The BANG File: A New Kind of Grid File." Munchen, West Germany: European Computer-Industry Research Center (ECRC).
Frege, Gottlob. 1879. Begriffsschrift. Reprinted in A Source Book in Mathematical Logic, 1879–1931, edited by Jean van Heijenoort (1967) by the President and Fellows of Harvard College. Cambridge, MA: Harvard University Press.
Gray, Jim, Adam Bosworth, Andrew Layman, and Hamid Pirahesh. October 4, 1995. "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals." Technical Report. Microsoft Research, Advanced Technology Division, Microsoft Corporation.

Gries, David, and Narain Gehani. 1977. "Some Ideas on Data Types in High-Level Languages." Association for Computing Machinery.
Grossman, David A., and Ophir Frieder. 1998. Information Retrieval: Algorithms and Heuristics. Kluwer Academic Publishers.
Grossman, David, M. Catherine McCabe, Jin Ho Lee, Abdur Chowdhury, and Ophir Frieder. 2000. "On the Design and Evaluation of a Multi-dimensional Approach to Information Retrieval." Illinois Institute of Technology.
Gupta, Ashish, Venky Harinarayan, and Dallan Quass. 1995. "Aggregate-Query Processing in Data Warehousing Environments." IBM Almaden Research Center, Stanford University.
Guttag, John. 1977. "Abstract Data Types and the Development of Data Structures." Association for Computing Machinery.
Han, Jiawei. March 1998. "Towards On-Line Analytical Mining in Large Databases." Sigmod Record, 27:1. Intelligent Database Systems Research Laboratory, School of Computing Science, Simon Fraser University.
Han, Jiawei, Yandong Cai, and Nick Cercone. February 1993. "Data-Driven Discovery of Quantitative Rules in Relational Databases." IEEE Transactions on Knowledge and Data Engineering, Volume 5.
Han, Jiawei, and Yongjian Fu. 1995. "Discovery of Multiple-Level Association Rules from Large Databases." School of Computing Science, Simon Fraser University.
Han, Jiawei, and Micheline Kamber. 2001. Data Mining: Concepts and Techniques. San Francisco, CA: Morgan Kaufmann Publishers.
Han, Jiawei, Micheline Kamber, and Jenny Chiang. February 1997. "Mining Multi-Dimensional Association Rules Using Data Cubes." Database Systems Research Laboratory, School of Computing Science, Simon Fraser University.
Harinarayan, Venky, Anand Rajaraman, and Jeffrey D. Ullman. 1996. "Implementing Data Cubes Efficiently."
Heath, Thomas, ed. 1956. The Thirteen Books of Euclid's Elements. New York: Dover Publications.
Hegel, Georg W.F. 1977. Phenomenology of Spirit. Oxford University Press.
"He's Back." September 1993. ComputerWorld.
Hotelling, H. 1931. "The Generalization of the Student's Ratio." Annals of Mathematical Statistics. Volume II: 360–378.
Inmon, W. H. 1992. Building the Data Warehouse. New York: John Wiley & Sons.
Johnson, Theodore, and Dennis Shasha. 1996. "Hierarchically Split Cube Forests for Decision Support: Description and Tune Design." Working Paper. http://citeseer.nj.nec.com/johnson96hierarchically.html.
Kamber, Micheline, Jiawei Han, and Jenny Y. Chiang. August 1997. "Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes." Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining. Newport Beach, CA: In press.
Kant, Immanuel. 1986. Philosophical Writings. Edited by Ernst Behler. The Continuum Publishing Company. Originally published in the preface to the Second Edition of Critique of Pure Reason.

Kaplan, Robert S., and David P. Norton. 1996. The Balanced Scorecard. Boston, MA: Harvard Business School Press.
Keller, Peter R., and Mary M. Keller. 1992. Visual Cues: Practical Data Visualization. Los Alamitos, CA: IEEE Computer Society Press.
Kimball, Ralph. September 1995. "The Database Market Splits." DBMS 8:10.
———. December 1995. "Data Warehouse Insurance." DBMS 8:13.
———. January 1996. "The Problem with Comparisons." DBMS 9:1.
———. January 1996. "A Freshman in Business Needs a Ph.D. in SQL." DBMS 9:1.
———. February 1996. "SQL Roadblocks and Pitfalls." DBMS 9:2.
———. 1996. The Data Warehouse Toolkit. New York: John Wiley & Sons.
Lauwerier, Hans. 1991. Fractals: Endlessly Repeated Geometry Figures. Princeton, NJ: Princeton University Press.
Ledgard, Henry F., and Robert W. Taylor. 1977. "Selected Papers from the Conference on Data: Abstraction, Definition, and Structure." Association for Computing Machinery.
Lehner, Wolfgang. 1998. "Modeling Large Scale OLAP Scenarios." University of Erlangen-Nuremberg, Dept. of Database Systems. Erlangen, Germany.
Lenat, Douglas B., and R.V. Guha. 1990. Building Large Knowledge-based Systems. Reading, MA: Addison-Wesley.
Lewis, David K. 1991. Parts of Classes. Oxford University Press.
Manning, Henry P. 1919. The Fourth Dimension Simply Explained. New York: Munn & Company Inc.
Mansfield, Edwin. 1983. Statistics for Business and Economics, 2nd ed. New York: W. W. Norton & Company.
Martin, Daniel. 1985. Advanced Database Techniques. Cambridge, MA: The MIT Press.
Masters, T. 1995. Advanced Algorithms for Neural Networks. New York: John Wiley & Sons.
McGoveran, David. October 1993. "Much Ado About Nothing." Database Programming and Design. 6:10 (46–53).
———. December 1993. "Nothing from Nothing (Or, What's Logic Got to Do With It?)." Database Programming and Design. 6:1 (32–41).
———. January 1994. "Classical Logic: Nothing Compares 2U." Database Programming and Design. 7:1 (54–61).
———. February 1994. "Nothing from Nothing: Can't Lose What You Never Had." Database Programming and Design. 7:2 (42–48).
———. March 1994. "Nothing from Nothing: It's in the Way That You Use It." Database Programming and Design. 7:3 (54–63).
McTaggert, James M., Peter W. Kontes, and Michael C. Mankins. 1994. The Value Imperative. New York: The Free Press.
Mendenhall, W., L. Ott, and R. F. Larson. 1983. Statistics: A Tool for the Social Sciences, 3rd ed. Boston: Duxbury Press.
Minsky, Marvin. 1975. "A Framework for Representing Knowledge." In P.H. Winston, ed. The Psychology of Computer Vision (211–277). New York: McGraw Hill.
Nielson, Gregory M., Hans Hagen, and Heinrich Muller. 1997. Scientific Visualization: Overviews, Methodologies, Techniques. IEEE, Inc.

Orenstein, Jack A. 1986. "Spatial Query Processing in an Object-Oriented Database System." Computer Corporation of America.
Osborn, Sylvia L., and T. E. Heaven. 1986. "The Design of a Relational Database System with Abstract Data Types for Domains." London, Ontario: The University of Western Ontario.
Pendse, Nigel, and Richard Creeth. 1995. The OLAP Report: Succeeding with Online Analytical Processing. Norwalk, CT: Business Intelligence Inc.
Pindyck, Robert S., and Daniel L. Rubinfeld. 1981. Econometric Models and Economic Forecasts. New York: McGraw-Hill.
Raab, Francois. 1995. "TPC Benchmark™ D (Decision Support) Working Draft 9.0."
Raden, Neil. October 30, 1995. "Data, Data Everywhere." Information Week.
Reiner, David. February 1998. "Sold on Database Marketing." DBMS and Data Programming and Design.
Rich, Charles. 1982. "Knowledge Representation Languages and Predicate Calculus: How to Have Your Cake and Eat It Too." The Artificial Intelligence Laboratory, MIT.
Richards, E.G. 1998. Mapping Time: The Calendar and Its History. New York: Oxford University Press, Inc.
Robert, Paul. 1967. Petit Robert: Dictionnaire alphabétique et analogique de la langue française. Paris: Société du Nouveau Littré.
Saracco, Cynthia Maro. 1998. Universal Database Management: A Guide to Object/Relational Technology. San Francisco, CA: Morgan Kaufmann Publishers.
Sarawagi, Sunita, and Michael Stonebraker. 1994. "Efficient Organization of Large Multidimensional Arrays." Computer Science Division, University of California at Berkeley.
Shavel, Steve, and Erik C. Thomsen. 1990. "A Tractarian Approach to Information Modeling." Proceedings of the 14th International Wittgenstein Symposium on the Foundations of Logic. Vienna: Verlag Holder-Pichler-Tempsky.
———. 1993. "On Three-Valued Logic." Power Thinking Tools, Theoretical Background.
———. 1994. "A Functional Basis for Tractarian Number Theory." Northampton, MA: Power Thinking Tools, Inc., Power Thinking Tools: Academic Background.
Shipman, David. March 1981. "The Functional Data Model and the Data Language DAPLEX." ACM Transactions on Database Systems, 6: 140–173.
Simon, Herbert. 1997. "Managing in an Information-Rich World" in Models of Bounded Rationality, Volume 3. Boston: MIT Press.
Smith, David Eugene, ed. 1959. A Source Book in Mathematics. New York: Dover Publications.
Smith, Henry C. 1985. "Database Design: Composing Fully Normalized Tables from a Rigorous Dependency Diagram." Association for Computing Machinery.
Smith, John Miles, and Diane C. P. Smith. 1977. "Database Abstractions: Aggregation and Generalization." Association for Computing Machinery.
Sokal, Robert R., and F. James Rohlf. 1995. Biometry: The Principles and Practice of Statistics in Biological Research, 3rd ed. New York: W. H. Freeman and Company.
Stamen, Jeffrey P. October 1993. "Structuring Database for Analysis." IEEE Spectrum, 30:10.

Stemple, David, Adolfo Socorro, and Tim Sheard. 1988. "Formalizing Objects Using ADABPTL." Advances in Object-Oriented Database Systems, 2nd International Workshop.
Stigler, Stephen M. 1986. The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: The Belknap Press of Harvard University Press.
Stonebraker, Michael, ed. 1993. Readings in Database Systems. San Francisco: Morgan Kaufmann.
Sullivan, Dan. 2001. Document Warehousing and Text Mining. New York: John Wiley & Sons.
Thomsen, Erik C. 1992. "Synthesizing Knowledge from Large Data Sets: The Need for Flexible Aggregation." Power Thinking Tools, IT Europe. London, England.
———. August 1997. "Beyond the MOLAP/ROLAP Wars." Database Programming and Design. 78–79.
———. October 1997. "I Shop Therefore I Am." Database Programming and Design.
———. November 1997. "Is Anybody Out There." Database Programming and Design.
———. October 1998. "Decision Alchemy." Intelligent Enterprise Magazine.
———. November 1998. "Defining a Functional Approach to DSS." Intelligent Enterprise Magazine.
———. March 30, 1999. "Symmetry Lost." Intelligent Enterprise Magazine.
———. June 13, 2001. "Information Impact: Business Analytics Revealed." Intelligent Enterprise Magazine.
———. June 29, 2001. "Information Impact: Business Analytics Revealed, Part 2." Intelligent Enterprise Magazine.
———. August 31, 2001. "Information Impact: Business Analytics Revealed, Part 3." Intelligent Enterprise Magazine.
———. December 5, 2001. "Never the Same." Intelligent Enterprise Magazine.
Thomsen, Erik C., and Steve Shavel. 1993. "A Tractarian Basis for Number Theory." Proceedings of the 15th International Wittgenstein Symposium on the Philosophy of Mathematics. Power Thinking Tools. Vienna: Verlag Holder-Pichler-Tempsky.
Tillman, George. 1993. A Practical Guide to Logical Data Modeling. New York: McGraw-Hill.
Tufte, Edward R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
"The 20th Century's Greatest Minds." March 29, 1999. Time.
Van de Geer, John P. 1971. Introduction to Multivariate Analysis for the Social Sciences. W.H. Freeman & Co.
Whitehead, A.N., and Bertrand Russell. 1910. Principia Mathematica. Cambridge University Press.
Wilkinson, Leland. 1999. Statistics and Computing: The Grammar of Graphics. Springer-Verlag New York Inc.
Wittgenstein, Ludwig. 1921. Tractatus Logico-Philosophicus. English edition, 1961. London: Routledge & Kegan Paul.
Zelazny, Gene. 1985. Say It with Charts: The Executive's Guide to Successful Presentations. Homewood, IL: Business One Irwin.
Zemke, Fred, Krishna Kulkarni, Andy Witkowski, and Bob Lyle. May 5, 1999. "Introduction to OLAP Functions."

Online Papers

"Final Proposed Draft Amendment-Database Language SQL-Amendment 1: On-Line Analytical Processing (SQL/OLAP)." November 1999. http://www.array.nl/DBM/art99/sql_olap.pdf
http://www.cse.iitb.ernet.in:8000/proxy/db/~dbms/Data/Papers-Other/SQL1999/OLAP-99-154r2.pdf
http://www.dwinfocenter.org/standard.html

Index

NUMBERS quantity requested, 358 2D data, 47 quantity sent, 359–360 3D data sets, 49 aggregate objects, 31 aggregate ordering, 187 A aggregate tables, databases, 38–40 ABDOP (Analysis-Based Decision- aggregate views, spreadsheets, 37 Oriented Processing), 8, 13–14 aggregating base data, abstract data types, 31 spreadsheets, 36 abstract grid maps, 227 aggregating moving percentages, FCI access methods, data storage, 257–258 application example, 384 activity-based management, FCI aggregations FCI application example, 317 application example, 403–404 input contents, 332 actual situation questions, OLAP formulas model design, 277 multidimensional, 178–179 affinity analysis, 562 OLAP model design, 300 Age dimension allocating marketing expenses to FCI application example, 356 product shares, FCI application loss variables, 363–364 example, 394–395 quantity on hand, 361–362 allocation formulas, 181 quantity received, 358 analog visualization, 217

643

644 Index

analysis-based decision-oriented batch automation, multidimensional software, 12 guidelines, 509 analytic visualization, 216 bidirectional links, 202 analytical awareness, ideal model business pattern analysis, data criteria, 85 visualization, 233 analytical derivations, FCI application business processes, FCI application example, basic aggregations, example, 406 351–352 business-driven analytical analytical views, 64 segmentation, 560 ancestor/descendent hierarchical relationships, 110 C angularity, 54 calculating earnings, FCI application applicability testing, 147 example, 451–452 applications calculating total cost of goods sold, building, multidimensional FCI application example, 409 guidelines, 513 calculating total costs, FCI application data distribution, 251 example, 413–414 design, 247–248 calculation distributions, 264 distribution configurations, 267 across applications, 266 ranges, 143–145 temporal, 265–266 arithmetic operations, cardinal dimensions, 100 calculations T defining in OLAP model, 75 Asset dimension, FCI application example, 406 FCI application example, 321–328 E asymmetric hierarchies, 111 A multidimensional guidelines, 507 attributes precedence, 171–173 general information processing, 6 MOLAP model design, 301 LC model, 72 via ETL systems, 266 links, 203–204 cardinalF dimensions, arithmetic OLAP model design, 288 operations,L 100 auditing cardinal series,Y 96 FCI application example, 365 cardinality OLAP model design, 305 dimensions, 95 axis-based formulas, 169–170 multidimensional guidelines, 504 cardinally ordered instances, 98–99, B 123–124 bandwidth, data distribution, 261 Cargo Cube view, 468 base data, deciding on form, cell links, data tables, 211 spreadsheets, 36 cell-based formulas, 169–170 base table views, databases, 40 chained formulas, 199 Base tables, 34 basic derivations, FCI application challenges to Global 2000 example, 317–320 corporations, 18

Team-Fly® Index 645 clients complex data visualization applications, server metaphors, 242 calculations, 267 composite indicators, 187–189 multidimensional, 268–269 computations, multidimensional cluster levels, 559 guidelines, 516 clustering constant concepts of OLAP, 5 relationships, 236 conditional formulas, 196 coexistent dimensions, cubes, 53 configurations of distribution color representations, 222, 225 applications, 267 Column dimensions, constant price differentials, 236 three-dimensional data sets, 50 constant scaling factors, leveled column-member links, Data dimensions, 125–126 tables, 209 constraint data, OLAP model design, combined purchasing and exchange 279 rates, FCI application example, content links, 203–205, 208 337–338 contextual patterns, data materials inventory analysis, 341 visualization, 239 combining dimensions, 58–59 core logical features, combining time-lagged flows, FCI multidimensional guidelines, application example, 385 502–503 comparing languages, 519 core logical requirements, OLAP, 21 ancestor references, 526–527 core physical requirements, APB-1 schemata, 521 OLAP, 23 application ranges in joined cost-revenue analysis calculation cubes, 536 equation, FCI application basic formulas, 532–536 example, 452–453 calculation precedence, 531 schema steps, 485–497 descendants, 529 costing, FCI application missing/inapplicable instances of types, 530 example, 367 ordering members, 525 cousins, instances, 118 references credit card application analysis, data children of members, 525 visualization, 243–244 parent members, 523 cross-application operations specific members, 522 calculation distribution, 266 schema/formula calculations, data distribution, 262 520-521 cross-cube formulas, 189–193 comparitive functions, 43 cross-dimensional selection formulas, completeness, ideal model 180 criteria, 84 cross-machine data distribution, 258 complex aggregations, OLAP model cross-partition data distribution, 264 design, 302 cubelets, 308 646 Index

cubes Data tables adding new dimensions, 148–150 cell links, 211 attributes column-member links, 209 dimensional coexistence, 52 content links, 208 independent dimensions, 53 row-member links, 209 joining with nonconformant table-member links, 211 dimensions, 161–162 data to meta data links, 206 multidimensional guidelines, 504 data types, multidimensional structures, OLAP model design, guidelines, 506 285–287 data visualization, 216–218 currencies, multidimensional business pattern analysis, 233 guidelines, 512 complex metaphors, 242 Currency Cube view, 479 contextual patterns, 239 currency exchange, FCI application credit card application analysis, example, 313 243–244 Currency Exchange schema cost- curve fitting, 233 revenue analysis decision making, 233 calculation, 494 multidimensional business curve fitting, 233 data, 241 patterns of change, 238 D patterns of constancy, 236 data and meta data, LC model, 88 product sales analysis, 243 data caches, light indexing, 251 repeating/tiled patterns, 242 data distribution subsetting, 241 across applications, 262 variance, 233 across machines, 258 Web site traffic analysis, 244 across partitions, 264 data warehouses, M to N bandwidth, 261 relations, 288 firewalls, 262 data-based formulas, 197 storage locations, 261 data-generating events, data within applications, 251 sets, 152 within machines, 249–250 data-oriented domain testing, single data mining domain schemas, 151 OLAP distinctions, 17 databases OLAP integration, 553–554 aggregate tables, 38, 40 data sets Base table views, 40 data-generating events, 152 difficulties two-dimensional data, 48 common source data set data sources, FCI application example, definitions, 34 314 providing OLAP functionality, data storage 32–34 access methods, 257–258 interrow calculations, 40 indexes, 251–256 joins, 41 Index 647

limitations, 29 attributes, 288 Lookup tables, 39 auditing, 305 multidimensional guidelines, 505 calculation precedence, 301 organization of definitions, 43 constraint data, 279 providing OLAP functionality, 38 cube structures, 285–287 relational, 31–32 decision contexts, 298 transposing rows and columns, 43 dependent members, 296 DBMS facilities, multidimensional dimensions guidelines, 517 changes over time, 291 DDL (Data Definition Language), hierarchies, 292–293 5, 31 members, 297 decision contexts, OLAP model structures, 285–287 design, 298 disconnected hierarchies, 295 decision cycles, 25–26 formulas, 299 decision examples, FCI application complexities, 302 function placement, 300 example, 310 links, 292 decision making, data logical model definitions, 280–283 visualization, 233 logical problems, 278 decision scope, 11 members decision stages, 15–16 names, 298 decision support processing, relationships, 297 differences from transaction multilevel dimensions versus processing, 8–10 multiple dimensions, 294–295 DEER cycles (Decision-Execution- multiple hierarchies, 293 Environment–Representation), nonaggregation formulas, 300 25–26 nonleaf input data, 303 default precedence, 174–175 physical problems, 278 Dependency trees, 166–169 qualified data referencing, 301 dependent members, OLAP model refining dimension number, 290–291 design, 296 requirements documentation, 279 derived contents, FCI application solution design, 280 example, 333–337 sophisticated analyses, 304 derived descriptions, decision stages, sparsity, 304–305 16 user requirements, 275–276 derived variables, FCI application deviant hierarchies, 129 example, 365–366 dimensional analysis, space, 125 designing applications, 247–248 dimensional coexistence, cube designing OLAP models, 273–275 attributes, 52 actual situation questions, 277 dimensional hierarchies, 107–108 aggregations dimensional hurdles complex, 302 logical versus physical dimensions, formulas, 300 52 648 Index
