Analysis Services Best Practices
Presented by Marco Russo
sqlbi.com

Who am I
• BI Expert and Consultant
• Problem Solving
• Complex Project Assistance
• Data Warehouse Assessments and Development
• Courses, Trainings and Workshops
• Microsoft Business Intelligence Partner
• Book Writer

Latest conferences
• PASS Europe 2009 – Neuss – Germany
• PASS 2009 – Seattle – USA
• SQL Conference 2010 – Milan – Italy
• TechEd 2010 – New Orleans – USA
• 24 Hours of PASS 2010 – Online
• PASS 2010 – Seattle – USA

Agenda
DATA SOURCE (RELATIONAL MODELING)
• Relational Schema
• Decoupling Layer
• Dimensional Patterns
• Slowly Changing Dimensions
• Junk Dimensions
• Parent-Child Hierarchies
• Role Dimensions
• Drill-through
• Calculation Dimensions

SQLBI Methodology
CONFIGURATION
[Diagram: Source OLTP DB, Mirror OLTP, Staging Area, ODS (Operational Data Store), Data Warehouse, Data Marts, OLAP Cubes, Custom Reports, Client Tools (Excel, ProClarity, …), Other Systems, Customers]

Relational Schema
SNOWFLAKE SCHEMA
• Analysis Services reads data from the Data Mart
• A Data Mart is not the Data Warehouse

Relational Schema
STAR SCHEMA

Relational Schema
STAR VS. SNOWFLAKE SCHEMA
Options for dimensions from a snowflake schema:
• Transform into a star schema by using views
• Transform into a star schema by using DSV queries
• Join tables in SSAS dimensions
• Referenced Dimension
Ideal solution:
• Use SQL views to generate a star schema
• The star schema eliminates ambiguity

Data Source Decoupling
USE VIEWS TO DECOUPLE DIFFERENT LAYERS OF A BI SOLUTION
DATA SOURCE (RELATIONAL MODELING)
[Diagram: layers OLTP, OLTP Mirror, DWH, Data Mart, Cubes, with Views between them. Labels: Copy, Select, Rename, Standardization, Integration, Cleansing, Selection, Filters, Joins, Aggregation, Presentation, User interface]

Data Source Decoupling
DECOUPLE SSAS MODEL – RDBMS STRUCTURE
RELATIONAL DATA SOURCE
• STAR SCHEMA
• SNOWFLAKE SCHEMA
• UNCONVENTIONAL SCHEMA
• VIEWS
DATA SOURCE VIEW
• VIEWS
• RELATIONSHIPS
• NO TABLES
• NO NAMED VIEWS
• NO CALCULATED COLUMNS
CUBES
• AGGREGATION
• PRESENTATION
• USER INTERFACE

Data Source Decoupling
DEFINE VIEWS FOR SSAS CUBES
Model
• One view for each dimension
• Namespace = cube name (Common for shared dimensions, if necessary)
• Convert a snowflake into a star schema for SSAS
Columns
• Number of the view's columns = number of attributes of the dimension
• Column name = attribute name
• Use a naming convention for attribute keys
• Simple calculations or string operations in views
• Avoid calculated columns in the DSV
• Default values when necessary
• Eliminates NULL (e.g. in case of a LEFT JOIN)
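The view guidelines above (one view per dimension, column name = attribute name, defaults that eliminate NULLs from a LEFT JOIN) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, and the tables `Product` and `ProductCategory`, the view name `Sales_Product`, and the `Sales_` prefix standing in for a cube-named namespace are all hypothetical, not taken from the slides.

```python
import sqlite3

# Hypothetical snowflake tables; all names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductCategory (
    ID_Category  INTEGER PRIMARY KEY,
    CategoryName TEXT
);
CREATE TABLE Product (
    ID_Product  INTEGER PRIMARY KEY,  -- surrogate key
    COD_Product TEXT,                 -- business key
    ProductName TEXT,
    ID_Category INTEGER REFERENCES ProductCategory(ID_Category)
);
INSERT INTO ProductCategory VALUES (1, 'Bikes');
INSERT INTO Product VALUES (10, 'P-10', 'Road Bike', 1);
INSERT INTO Product VALUES (11, 'P-11', 'Gift Card', NULL);

-- One view for the Product dimension: it flattens the snowflake into a
-- star-style table, names each column after the attribute it feeds, and
-- replaces the NULL produced by the LEFT JOIN with a default value.
CREATE VIEW Sales_Product AS
SELECT
    p.ID_Product                    AS ID_Product,
    p.COD_Product                   AS COD_Product,
    p.ProductName                   AS Product,
    COALESCE(c.CategoryName, 'N/A') AS Category
FROM Product AS p
LEFT JOIN ProductCategory AS c
    ON c.ID_Category = p.ID_Category;
""")

# The cube side would read only from the view, never from the base tables.
rows = conn.execute(
    "SELECT ID_Product, COD_Product, Product, Category "
    "FROM Sales_Product ORDER BY ID_Product"
).fetchall()
for row in rows:
    print(row)
```

Because the dimension is read only through the view, the underlying tables can be renamed or restructured without touching the cube definition, as long as the view keeps the same columns.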

Data Source Decoupling
WHY USE VIEWS?
• Decoupling
• Stored in DB
• Textual interface
• Easy to change
• Database objects
• Can be optimized

SURROGATE KEY, ATTRIBUTE KEY, GROUPING, BANDING

Surrogate Keys
WHY YOU SHOULDN'T EXPOSE SURROGATE KEYS
Benefits
• Independence from application keys
• SCD modeling
• Star schema optimization
Drawbacks with Analysis Services
• MDX query with surrogate keys
  • Don't have a semantic value
  • Are invalidated if the cube is reprocessed (e.g. nightly one-shot processing)
• Impact on custom security and reporting

Surrogate Keys
PATTERN FOR NOT EXPOSING SURROGATE KEYS IN SSAS
Customer dimension
• _CustomerKey
  • KeyColumns: ID_Customer (surrogate key)
  • NameColumn: (none)
  • AttributeHierarchyVisible: False
  • AttributeRelationship
    • Attribute: Customer
    • Cardinality: One
• Customer
  • KeyColumns: COD_Customer (business key)
  • NameColumn: CustomerName
  • AttributeRelationship
    • SCD Type I – all other attributes are related to this one
    • SCD Type II – evaluate on a case-by-case basis (impact on KeyColumns)

Surrogate Keys
PATTERN ATTRIBUTE HIERARCHY FOR SCD TYPE I
[Diagram: Customer Dimension attributes _CustomerKey, Customer, Address, City, Country, Category, Area; the legend distinguishes AttributeHierarchyVisible = True from AttributeHierarchyVisible = False]
Hierarchies
• Country / City / Customer
• Category / Customer
• Area / Customer

Surrogate Keys
PATTERN ATTRIBUTE HIERARCHY FOR SCD TYPE II
[Diagram: Customer Dimension attributes _CustomerKey, Customer, Address, Customer-City, City, Country, Customer-Category, Category, Customer-Area, Area; the legend distinguishes AttributeHierarchyVisible = True from AttributeHierarchyVisible = False]
Hierarchies
• Country / City / Customer (Customer-City)
• Category / Customer (Customer-Category)
• Area / Customer (Customer-Area)

Attribute Keys
PATTERN FOR CREATING NATURAL HIERARCHIES FROM CUSTOM HIERARCHIES
Use a unique key for the attribute
• e.g. Subcategory code if it is unique
• Don't create one in ETL if it doesn't exist (use a composite key instead)
If it doesn't exist, use an attribute with a composite key
• Set of keys corresponding to the attribute hierarchy
• Set AttributeHierarchyVisible = False
Use decoupling views to generate columns for appropriate descriptions
• Natural hierarchy: you may reference the parent's name
• Browsable attribute: use a simple description
• e.g. «February 2010» in the hierarchy, «February» in the attribute

Attribute Keys
PATTERN FOR PRODUCT DIMENSION
[Diagram: attributes _ProductKey, Product (surrogate key), Category, Color, CategoryColor, ColorCategory; the legend distinguishes AttributeHierarchyVisible = True from AttributeHierarchyVisible = False]
Hierarchies
• Color / Category (CategoryColor) / Product
• Category / Color (ColorCategory) / Product
CategoryColor
• KeyColumns: COD_Category, COD_Color
• NameColumn: Category
ColorCategory
• KeyColumns: COD_Color, COD_Category
• NameColumn: Color

Attribute Keys
PATTERN FOR DATE DIMENSION
[Diagram: attributes Date (surrogate key), DayYearMonthly, DayYearWeekly, Month, MonthYear, Quarter, QuarterYear, Week, WeekYear, Year, YearWeekly; the legend distinguishes AttributeHierarchyVisible = True from AttributeHierarchyVisible = False]
Hierarchies
• Year / Quarter (QuarterYear) / Month (MonthYear) / Day (DayYearMonthly)
• Year / Month (MonthYear) / Day (DayYearMonthly)
• Year (YearWeekly) / Week (WeekYear) / Day (DayYearWeekly)

Grouping
ATTRIBUTE GROUPING – AUTOMATIC VS. MANUAL
Automatic grouping
• DiscretizationMethod
  • EqualAreas (same number of elements)
  • Clusters (Data Mining algorithm)
  • Automatic
• DiscretizationBucketCount
  • Number of groups
Manual grouping
• Business logic in the view for the dimension on the Data Mart

    CASE
        WHEN Weight IS NULL OR Weight < 0 THEN 'N/A'
        WHEN Weight < 10 THEN '0-10Kg'
        WHEN Weight < 20 THEN '10-20Kg'
        ELSE '20Kg or more'
    END

Banding
CATEGORIZE A MEASURE ACCORDING TO RANGES OF VALUES
Direct relationship
• Range granularity is the same as dimension granularity
• Impact of range change
  • Fact table ETL
  • Dimension ETL
  • Reprocess Dimension
  • Reprocess Cube
Indirect relationship
• Discretization and constant granularity
• Impact of range change
  • Dimension ETL
  • Reprocess Dimension

SSAS DESIGN FOR SCD TYPE I AND TYPE II

Type I SCD
ISSUES WITH PROCESS UPDATE
Process Update
• Doesn't detect duplicate key errors
  • Process Full would fail according to the error handling settings
• Flexible aggregations are regenerated
Existing MDX Queries
• Unique Name may change
• MDXMissingMemberMode – Ignore (Default)
  • Missing elements are ignored
  • Rows/columns disappear without a warning
• MDXMissingMemberMode – Error
  • Need to change the query
  • Some clients require rebuilding the query from scratch

Type II SCD
SSAS MODELING
Process Dimension
• Process Add instead of Process Update
• Attribute relationship: always Rigid
SSAS Dimension Modeling
• There is no specific dimension type for SCD
• Surrogate key CustomerKey
  • CustomerSCD attribute – (Primary key attribute)
• Business key CustomerAlternateKey
  • Customer attribute

Type II SCD
ATTRIBUTE RELATIONSHIP AND HIERARCHIES
"Standard" Modeling
• Only «visible» attributes
• CustomerSCD should be invisible
  • Otherwise it may be confusing for the end user
• Country-City-Customer hierarchy is not natural
"Ideal" Modeling
• Invisible attribute "Customer in City"
• Composite key City + CustomerAlternateKey
• Used only in the Country-City-Customer hierarchy at the Customer level

CONSOLIDATE INDEPENDENT ATTRIBUTES

Junk Dimension
MODELING
Junk dimension in relational modeling
• Ideal solution for performance and consumed space
• A new attribute requires reprocessing the fact table
  • Unless a valid default exists for existing data
Junk dimension only on cube
• Create the dimension based on a view
• Use CROSS JOIN
• Set the Attribute Key hierarchy to invisible
• Also creates combinations of values that are never used
• A new attribute always requires a cube reprocess

Junk Dimension
PERFORMANCE AND SEMANTIC
Reduce aggregation complexity
• Fewer dimensions
• Same number of attributes, but optimization on AttributeKey is more efficient
• Better performance in both query and process
Usability
• Improves if the dimension name is intuitive
• Use more than one junk dimension if a single name is not good
• Carefully choose names
  • Otherwise you might invalidate existing queries by changing them later

LIMITATIONS OF STANDARD PARENT-CHILD HIERARCHIES, MODELING MULTIPLE HIERARCHIES

Parent-Child Hierarchies
LIMITATIONS
Functional limits
• Only one parent-child hierarchy per dimension
• Must use the dimension's Attribute Key
• Unique name for a member includes surrogate keys
Performance issues
• Increased processing time
• MDX calculation is more complex (especially SCOPE, autoexists and CurrentMember)
• Slower query response (good up to a few thousand members)
Alternative
• Parent-child naturalizer in BIDS Helper

Parent-Child Hierarchies
STANDARD PARENT-CHILD VS MULTIPLE HIERARCHIES
A multiple hierarchy IS NOT a parent-child with just some more nodes

Parent-Child Hierarchies
STANDARD PARENT-CHILD HIERARCHIES IN ANALYSIS SERVICES
Pros
• Internal