Copyrighted Material
14_777099 bindex.qxp 6/2/06 3:55 PM Page 329

Index

A
account balances, summarization, 31
accounting systems, 3
accumulating snapshots
  made-to-order manufacturing, 265
  mortgage application example, 265–266
  summarization, 267–268
ad hoc query
  front-end tools, 121
  working without aggregate navigator, 150
Adamson, Chris (Data Warehouse Design Solutions), 3
administrators, as user input, 44
aggregate design
  accumulating snapshots
    made-to-order manufacturing, 265
    mortgage application example, 265–266
    summarization, 267–268
  aggregation points, 74
  audit dimension, 80
  base schema, 68–72
  bridge tables
    facts, allocation factors, 280
    invisible aggregates, 278
    multi-valued attributes, 276–278
    roles, 278–280
    summarization, 281
    uses for, 275
  change processing, 76
  conformance matrix, 70–72
  contracts, 282
  dimensions, 90–93
  discussed, 67, 257
  documentation
    attribute type, 103
    change data identification, 102
    column name, 102
    conformance matrix, 99, 101
    data types, 103
    dependent fact tables, 102
    dimension tables, 101–103
    fact tables, 103–106
    good design steps, 98
    growth rate, 102
    load frequency, 102
    location, 102
    pre-joined aggregates, 106–107
    rows, 102
    schema family, 99
    size, 102
    source columns, 103
    table name, 102
    table-level, 103–106
  drill-down capability, 76
  ETL process, 68
  fact tables
    audit dimension, 96
    counts, 94–95
    coverage tables, 271
    degenerate dimensions, 96
    factless events, 269–271
    grain statement, 93
    names and data types, 94
    source attributes, 97
    summarization, 95
  fuzzy grain declaration, 68–69
  grain identification, 68–69
  hierarchies, 76–79
  housekeeping columns, 78–80, 92
  level field, 69
  materialized views, 108
  naming conventions
    attributes, 87–88
    fact tables, 88–90
  natural keys, 74–75
  normalization, 287–289
  periodic snapshot design
    averages, 263–264
    bank account example, 259
    invisible aggregates for, 261–262
    less frequent snapshots, 264–265
    product inventory, 259
    semi-additive facts, 260–261
    transactions, 258–259
  pre-joined aggregates, 86–87, 98
  processing time, 78–79
  product dimension, 76–77
  query tables, 108
  rollup dimensions, 72
  separate table advantages, 85
  single schema design, 81, 83–84
  snowflake schema approach, 282–286
  source mapping, 75
  source system, 68
  third normal form, 287–289
  time-stamped dimensions
    date/time values, 274–275
    discussed, 272
    transaction dimensions, 273
  type 1 change, 76
  type 2 change, 76
aggregate navigator
  applications, 118
  automatic identification, 118
  back-end tools
    advantages/disadvantages, 130–133
    base star schema, 132
    discussed, 123
    fact tables, 130
    materialized query tables, 130
    middle-tier applications, 129
    pre-joined aggregates, 132
    proprietary extensions, 132
    query rewrites, 125
    single navigation system, 124
  base star schema, 120
  discussed, 2
  dynamic availability, 120–121
  front-end tools
    ad hoc query, 121
    advantages/disadvantages, 128–129
    dashboard data, 121
    data mining tools, 121
    desktop reporting software, 121
    metadata, 127
    query rewrites, 127
    single navigation approach, 122
    spreadsheets, 121
    SQL queries, 121
    statistical analysis, 121
  invisible aggregates, 26–27
  materialized query tables
    dimension tables and, 147–148
    discussed, 144
    as fact tables, 146–147
    as pre-joined aggregates, 145–146
  materialized views
    as dimension tables, 141–142
    dimensional hierarchies, 142–143
    as fact tables, 140–141
    as pre-joined aggregates, 137–139
    summary tables, 144
  Oracle RDBMS capabilities, 137
  performance add-on technologies, 134–136
  query rewrite process, 120
  reasons for, 116–117
  reporting tools, 121
  requirements summary, 126
  snowflake schema design, 118
  star schema support, 126
  views, 117–119
  working without, 148–150
aggregate selection criteria
  aggregate levels, 39–40
  automated alerts, 49
  batched reports, 54
  broadcasted reports, 54
  bursted reports, 54
  business requirement importance, 64
  cardinality, 62–63
  category values, savings across, 61
  chart data, 49
  configuration tasks, 56
  conformance matrix, 45–46
  dashboard data, 49
  day dimension, 37–38
  design potential, 41–45
  dimensionally thinking, 40
  discussed, 35
  disk input/output, 58
  drill-across reports, 46–47, 49, 51
  drill-up/drill-down capabilities, 43–44
  grain statement, 36–37, 42
  groups of dimension attributes, 42–43
  hierarchical viewpoint, 40
  incremental value, 64
  key users, needs for, 55
  large database considerations, 57
  month dimension, 37
  navigation capabilities, 56
  number of aggregate considerations, 56–57
  order entry system, 36
  performance benefits, 57, 64
  poor performers, 54–55
  power users, 55
  pre-joined aggregates, 39, 56
  product sales data, 50
  query patterns, 49
  regional reports, 51
  report details, 49, 54
  resource constraints, 64
  scheduled reports, 54
  skew, 60–61
  space allocation, 55–56
  subject area considerations, 45–48
  subreports, 51
  subscriptions, 54
  subtotal data, 49–50
  summarization, 37
  summary rows and pages, 49–51
  user benefits, 54
  user input, 44–45
  value measurements, 59–61
aggregate tables
  discussed, 2
  space consumed by, 56–57
aggregation principles, invisible aggregates, 27–28
applications, aggregate navigator, 118
architecture, data warehouse, 20–22
archives
  dimensional history, 295
  ETL process, 292
  monthly frequencies, 293, 296
  off-line data, 297–299
  purge versus, 297
  quarterly frequencies, 293–294
  summarization, 296
  weekly frequencies, 297
attributes
  aggregate design documentation, 103
  aggregate dimensions, 90–91
  naming conventions, 87–88
audit dimension
  aggregate design, 80
  fact table design, 96
automated alerts, aggregate selection criteria, 49
automatic identification, aggregate navigator, 118
automation routine, ETL process, 172
availability considerations, table, 114–116
averages
  importance of, 15
  periodic snapshot design, 263–264

B
back-end tools, aggregate navigator
  advantages/disadvantages, 130–133
  base star schema, 132
  discussed, 123
  fact tables, 130
  materialized query tables, 130
  middle-tier applications, 129
  pre-joined aggregates, 132
  proprietary extensions, 132
  query rewrites, 125
  single navigation system, 124
base fact tables, 25
base schema
  aggregate design, 68–72
  invisible aggregates, 25
  schema family, 25
base star schema
  aggregate navigator, 120
  back-end tools, 132
  data warehouse implementation design stage, 243
  ETL load process, 158–159, 192–193
batched reports, 54
best practices, 2
BI (business intelligence) software, 22
Boolean columns, dimension tables, 9
bridge tables
  facts, allocation factors, 280
  invisible aggregates, 278
  multi-valued attributes, 276–278
  roles, 278–280
  summarization, 281
  uses for, 275
broadcasted reports, 54
build stage, data warehouse implementation
  controlled iterations, 246
  deliverables and tasks, 246
  documentation, 248
  navigation testing, 249
  performance testing, 249
  refresh function, 248
  scope creep, 245
bursted reports, 54
business intelligence (BI) software, 22
business processes
  execution, operational systems, 3
  results, accurate measurement of, 27
business requirement importance, aggregate selection criteria, 64

C
caching, 166
calculations, ETL load process, 208–209
candidate aggregates. See aggregate selection criteria
cardinality, aggregate selection criteria, 62–63
category values, aggregate selection criteria, 61
change data identification
  aggregate design documentation, 102
  ETL load process, 156–157, 188–189
change history, surrogate keys, 9
change processing, aggregate design, 76
chart data, aggregate selection criteria, 49
clustering, 232
code-generating tools, ETL load process, 156
columns
  aggregate design documentation, 102
  dimension tables, 8
  fact tables, 10
  housekeeping, 78–80
compression, data warehouse implementation, 232
configuration tasks, aggregate selection criteria, 56
conformance matrix
  aggregate design, 70–72
  aggregate selection criteria, 45–46
  data warehouse development strategy stage, 236
  documentation, 99, 101
  ETL load process, 190–192
conformed rollup, 18–19
contracts, aggregate design, 282
controlled iterations, data warehouse implementation build stage, 246
counts, 15, 94–95
coverage tables, 271–272
CREATE DIMENSION statement, 141
CREATE MATERIALIZED VIEW statement, 139, 145
CREATE TABLE statement, 148
criteria-based tool selection, data warehouse implementation strategy state, 240
cube view, materialized query tables, 147
custom-build applications, operational systems, 20

D
dashboard data
  aggregate selection criteria, 49
  front-end tools, 121
data extraction, ETL load process, 195–197
data marts
  data warehouse implementation, 226
  defined, 19–20
  examples of, 45
  forecasted data, 45
  planning around conformed dimensions, 226
data mining tools, front-end tools, 121
data types
  aggregate design documentation, 103
  fact table design, 94
  pre-joined aggregate design, 98
data warehouse
  architecture, 20–22
  business process evaluation, 4
  implementation
    build stage, 245–249
    challenges, 231
    clustering, 232
    compression, 232
    data marts, planning around conformed dimensions, 226
    deployment, 250–252
    design stage, 241–244
    documentation, 250
    drill-across reports, 233
    end user education, 252
    enterprise-level scope, avoiding, 227–228
    final testing, 250–251
    follow-on projects, 232
    incremental, 226–230
    iterative design, 233–234
    JAD (joint application development), 233
    parallelization, 232
    partitioning, 232
    performance improvements, 230
    production, transitioning to, 250
    RAD (rapid application development), 233
    stovepipe solutions, avoiding, 229–230
    strategy stage, 234–238