Master Data Management Aligning Data, Process, and Governance Donna Burbank Global Data Strategy, Ltd. April 23rd , 2020
Follow on Twitter @donnaburbank Twitter Event hashtag: #DAStrategies Copyright Global Data Strategy, Ltd. 2020 Simplifying Advanced Data Workloads with NoSQL Data Management for Modern Data Demands
David Jones-Gilardi Developer Advocate @DataStax NoSQL and Cassandra: Foundational Capability
LEGACY DATA REAL-TIME, DISPARATE, INTEGRATION STREAMING, EVENTS SILO’D DATA
2 Modern Data Diversity and Complexity
LEGACY DATA REAL-TIME, DISPARATE, INTEGRATION STREAMING, EVENTS SILO’D DATA
DATA SECURITY / UNPREDICTABLE HYBRID, MULTI, SOVEREIGNTY SCALE INTER-CLOUD
3 Modern Data Brings Workload Complexity
> Traditional Siloed Today’s Related Data and Data and Workload Complex Management Workloads Data Management Evolution { : } 1980’s 2000’s 2020’s So, how do we solve for mixed workloads?
How complex is your query?
• Simple - Single Partition/Single Index Lookup, Single Iteration • Complex - Full Scan, Large Aggregation, Unknown Iterations • In between - Multiple Partition/Indexes, Aggregations, or Multiple Iterations
How fast do you need it?
• Machine Time < the time it takes to interrupt a user process • Human Time < time a user will wait • Offline Time is Everything else
6 Mixed Workload Coverage – Customer 360 Queries
1. Find me Dave Offline 2. Find me all people with similar fast Analytics 5 names to ‘Dave’ 3 3. Tell me if there are duplicate Daves Human 4. Find how Dave and Jenn are fast DSE 4 connected Graph 5. Find how influential Dave is in Response time CQL Search my application 2 Machine 6. Show Dave what songs are 6 fast 1 trending for anyone with the same preferences while he Stream Processing looks for a song to play in his Simple Complex mobile app New Opportunities
BLENDED WORKLOADS AT SCALE WITH DATASTAX
Recommendation Law Fleet Fraud IoT Systems Enforcement Management
Anomaly detection Act on sensors Your preferences Bad actor identification Vehicle tracking and connected and analyze and your network’s and criminal and path components the network preferences network activity optimization
Manage Seamlessly with one Database Graph, Analytics, Search, Advanced Security,
Stream Processing, In-memory Engine 8 Simplified Data Complexity SINGLE CLOUD
A Single Data Platform
Mixed Workloads at Scale Scale apps for complex HYBRID ON data & workloads with CLOUD PREMISES ease Real-time Intelligence Access the full value of your ever-changing data Multi-DC, Multi- Platform Deployment and MULTI- / INTER-CLOUD operations wherever
9 you choose to deploy Learn about Cassandra and TinkerPop today! academy.datastax.co m @DataStaxDevs
datastax.com/download s @DataStax
10 Thank you Master Data Management Aligning Data, Process, and Governance Donna Burbank Global Data Strategy, Ltd. April 23rd , 2020
Follow on Twitter @donnaburbank Twitter Event hashtag: #DAStrategies Copyright Global Data Strategy, Ltd. 2020 Donna Burbank
technology. In past roles, she has served in She has worked with dozens of Fortune 500 key brand strategy and product companies worldwide in the Americas, management roles at CA Technologies and Europe, Asia, and Africa and speaks regularly Embarcadero Technologies for several of the at industry conferences. She has co- leading data management products in the authored two books: Data Modeling for the market. Business and Data Modeling Made Simple with ERwin Data Modeler and is a regular As an active contributor to the data contributor to industry publications. She can management community, she is a long time be reached at Donna is a recognised industry expert in DAMA International member, Past President [email protected] information management with over 20 years and Advisor to the DAMA Rocky Mountain Donna is based in Boulder, Colorado, USA. of experience in data strategy, information chapter, and was awarded the Excellence in management, data modeling, metadata Data Management Award from DAMA management, and enterprise architecture. International. Her background is multi-faceted across Donna is also an analyst at the Boulder BI consulting, product development, product Train Trust (BBBT) where she provides advice management, brand strategy, marketing, and gains insight on the latest BI and and business leadership. Analytics software in the market. She was on She is currently the Managing Director at several review committees for the Object Global Data Strategy, Ltd., an international Management Group’s for key information information management consulting management and process modeling company that specializes in the alignment of notations. business drivers with data-centric Follow on Twitter @donnaburbank
Global Data Strategy, Ltd. 2020 Twitter Event hashtag: #DAStrategies 2 DATAVERSITY Data Architecture Strategies This Year’s Lineup
• January 23 Emerging Trends in Data Architecture – What’s the Next Big Thing? • February 27 Building a Data Strategy - Practical Steps for Aligning with Business Goals • March 26 Cloud-Based Data Warehousing – What's New and What Stays the Same • April 23 Master Data Management – Aligning Data, Process, and Governance • May 28 Data Governance and Data Architecture – Alignment and Synergies • June 25 Enterprise Architecture vs. Data Architecture • July 22 Best Practices in Metadata Management • August 27 Data Quality Best Practices • September 24 Data Virtualization – Separating Myth from Reality • October 22 Data Architect vs. Data Engineer vs. Data Modeler • December 1 Graph Databases: Practical Use Cases
Global Data Strategy, Ltd. 2020 3 What We’ll Cover Today
• Master Data Management (MDM) provides organizations with an accurate and comprehensive view of business-critical data such as Customers, Products, Vendors, and more. • While mastering these key data areas can be a complex task, the value of doing so can be tremendous – from real-time operational integration to data warehousing & analytic reporting. • This webinar provides practical strategies for gaining value from your MDM initiative, while at the same time assuring a solid architectural and governance foundation that will ensure long- term, enterprise-wide success.
Global Data Strategy, Ltd. 2020 4 MDM is Part of a Wider Data Strategy A Successful Data Strategy links Business Goals with Technology Solutions
“Top-Down” alignment with business priorities
Managing the people, process, policies & culture around data
Leveraging & managing data for strategic advantage
Coordinating & integrating disparate data sources
“Bottom-Up” management & inventory of data sources
Copyright 2020 Global Data Strategy, Ltd
Global Data Strategy, Ltd. 2020 5 www.globaldatastrategy.com What is Master Data? Definition
• Master Data is the consistent and uniform set of identifiers and extended attributes that describes the core entities of the enterprise including customers, prospects, citizens, suppliers, sites, hierarchies and chart of accounts (sic).
• Master data management (MDM) is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official shared master data assets.
- Source Gartner
Global Data Strategy, Ltd. 2020 6 What is Master Data? Real-world examples
The $1M cheese slice The $2M baby bottle The “dead” living organism
Which Dr. Smith is credentialled Which Michael Jones is the How do we define Regions, Markets, for heart surgery? high-net worth customer? Locations, Catchments, Sites, etc.?
Global Data Strategy, Ltd. 2020 7 What is Master Data? What is Reference Data?
Master Data Reference Data
Address Line 1 Address Line 2 One person’s Master Data is City another person’s Reference Data… vs. State
How do we define Regions, Markets, AL Locations, Catchments, Sites, etc.? AK AR AZ CA CO ..etc. Global Data Strategy, Ltd. 2020 8 … but Don’t make this overly Pedantic …
Global Data Strategy, Ltd. 2020 9 Understanding Your Customer A 360 Degree View through Data
Address = Pontresina, Switzerland
Occupation = Ski Instructor Purchased €500 in outdoor gear in 2015
Member of Loyalty Program since 2010 100% of purchases online
Stefan Krauss Age = 31 Top Finisher in Engadin Ski Marathon 2010-2015 Prefers Text Message Global Data Strategy, Ltd. 2020 10 Understanding Your Customer A 360 Degree View through Data
Address = Zurich, Switzerland Occupation = Banker
Purchased €3.500 in outdoor gear in 2019
Member of Loyalty Program since 1990 75% of spending is while Stefan Krauss on holiday Age = 62
Football Fan 100% of spending in store
Global Data Strategy, Ltd. 2020 Prefers Physical Mail 11 Transaction Data vs. Master Data
Consider the following retail transaction data
Customer Date Product Code Price Quantity Location Stefan Kraus 1/2/2017 Scarpa Telemark Ski Boot SC1279 €250 1 St. Moritz, CH Donna Burbank 1/5/2017 Scarpa Telemark Ski Boot SCU1289 $150 1 Boulder, CO Stefan Kraus 1/2/2017 North Face Down Jacket NF8392 €450 1 Zurich, CH Stefan Kraus 1/2/2017 Garmin Sports Watch GM29384 €200 2 Zurich, CH Wendy Hu 3/4/2017 Prana Yoga Pant PN82734 $51 5 New York, NY Joe Smith 4/1/2017 Garmin Sports Watch GM29384 $150 1 Albany, NY
Transaction Data Master Data • Describes an action (verb): E.g. “buy” • Describes the key entities (nouns), e.g. Customer, Product, • May include measurements about the action: (Who, When, Location What, How Many, Where, How Much, etc.) • Provides attributes & context for these nouns • E.g. Stefan Kraus, 1/2/2017/, Scarpa Telemark Ski Boot, St. • e.g. Wendy Hu, age 25, female, resident of New York, NY, Moritz, CH, €250 Customer since 2005, preferred customer card, etc.
Global Data Strategy, Ltd. 2020 12 Transaction Data vs. Master Data Reference Data: Country Codes Reference Data: State Codes
Customer Date Product Code Price Quantity Location Stefan Kraus 1/2/2017 Scarpa Telemark Ski Boot SC1279 €250 1 St. Moritz, CH Donna Burbank 1/5/2017 Scarpa Telemark Ski Boot SCU1289 $150 1 Boulder, CO Transaction Stefan Kraus 1/2/2017 North Face Down Jacket NF8392 €450 1 Zurich, CH Data Stefan Kraus 1/2/2017 Garmin Sports Watch GM29384 €200 2 Zurich, CH Wendy Hu 3/4/2017 Prana Yoga Pant PN82734 $51 5 New York, NY Joe Smith 4/1/2017 Garmin Sports Watch GM29384 $150 1 Albany, NY
Master Data: Master Data: Location Master Data: Product Customer
Global Data Strategy, Ltd. 2020 13 Master Data Overview
Supply CRM In-Store Finance Marketing Online Sales Sales Chain
Publish & Subscribe
Data Stewardship Validation Data Quality & Matching
Data Model MDM Data Lookup ETL Warehouse “Golden Record” BI & Reporting Reference End User Applications Data Sets
Global Data Strategy, Ltd. 2020 14 Master Data Overview Each system has its own unique functionality and associated data model.
First Name First Name Customer ID First Name First Name First Name
Family Name Postal Code First Name Middle Name Family Name Family Name
Address Line 1 Family Name Family Name Address Line 1 Address Line 1
Address Line 2 Account # Address Line 2 Address Line 2 Email Supply City CRM In-Store Credit Balance Finance Twitter ID Marketing City Online City Chain State Sales Gender State Sales State
Phone Credit Card # Country Publish & Subscribe Email Phone Phone
Spouse
Data Stewardship Validation Data Quality & Matching
Data Model MDM Data Lookup ETL Warehouse “Golden Record” BI & Reporting Reference End User Applications Data Sets
Global Data Strategy, Ltd. 2020 15 Master Data Overview
First Name First Name Customer ID First Name First Name First Name
Family Name Postal Code First Name Middle Name Family Name Family Name
Address Line 1 Family Name Family Name Address Line 1 Address Line 1
Address Line 2 Account # Address Line 2 Address Line 2 Email Supply City CRM In-Store Credit Balance Finance Twitter ID Marketing City Online City Chain State Sales Gender State Sales State
Phone Credit Card # Country Publish & Subscribe Email Phone Phone
Spouse
Data Stewardship The MDM data model is a selected Validation Data Quality super/subset of the source system models. & MatchingCustomer ID First Name
Family Name Data Model Address Line 1 Data AddressMDM Line 2 Lookup ETL Warehouse “GoldenCity Record” BI & Reporting State Reference End User Applications Country Data Sets Country Codes Phone State Codes Email
Global Data Strategy, Ltd. 2020 16 Master Data Overview
First Name = John First Name = Jack Customer ID = 123 First Name = J. First Name = Johnn First Name = John
Family Name = Smith Postal Code = 10101 First Name = John Middle Name Family Name Family Name = Smith
Address Line 1 = 101 Main ST Family Name = Smith Family Name = Smith Address Line 1 = 101 Main Street Address Line 1 = 101 Main St
Address Line 2 Account # Address Line 2 = Apt 2 Address Line 2 Email = [email protected] Supply City = Anywhere CRM In-Store Credit Balance Finance Twitter ID = @johnsMarketing City = Plano Online City = Anywhere Chain State = Texas Sales Gender = Male State = TX Sales State = TX
ZIP = 10101 Credit Card # Country = USA Publish & Subscribe Phone = 555 927 1212 Phone = +1 555 927 1212 Phone = x1212 Email = [email protected] Matching Rules help create a Spouse = Mary “Golden Record” Data Stewardship Validation Data Quality & Matching Customer ID = 123 First Name = John Family Name = Smith Data Model Address Line 1 = 101 Main ST Data MDM Address Line 2 = Apt 2 Lookup ETL Warehouse “Golden Record”City = Anywhere BI & Reporting State = TX Reference Postal Code = 10101 End User Applications Data Sets Country = USA
Phone = +1 555 927 1212
Email = [email protected] Global Data Strategy, Ltd. 2020 17 Data Model / Database Keys for Matching Rules
• Candidate attribute combinations for matching are often aligned with the primary and alternate keys from the logical data model. Matching on Primary Key
Customer Ideally, if all systems use the same unique identifier, matching is easier. Customer ID But this isn’t often realistic in “real world” systems.
Matching on Alternate Keys • First, match on Date of Birth + SSN • Then, match on SSN + Last Name • Etc.
Global Data Strategy, Ltd. 2020 18 Fuzzy Matching
• Fuzzy matching logic can also be used, which is particularly helpful in matching string fields such as names and addresses, where human error or different entry standards between systems can cause slight variations in similar values, e.g. • “101 Main St” vs. “101 Main Street” • “John Smith” vs. “J Smith” • In addition synonyms can be created to assist with matching, for example • “St”, “St.”, “Street”, etc. for addresses • “Tim”, “Timothy” for names and nicknames. • When using fuzzing matching, data quality thresholds can be defined for auto approval. • Match scores are created for each fuzzy match, for example .9 would indicate a strong match and .2 a weak one. • Using these scores as a guide, thresholds can be defined for which matches can be auto-approved, which can be auto-rejected, and which need human review from a data steward.
Global Data Strategy, Ltd. 2020 19 Master Data Overview
Supply CRM In-Store Finance Marketing Online Sales Sales Chain
Publish & Subscribe
“Human in the Loop” – Data Stewards can validate match Data Stewardship Validation candidates. Data Quality & Matching
Data Model MDM Data Lookup ETL Warehouse “Golden Record” BI & Reporting Reference End User Applications Data Sets
Global Data Strategy, Ltd. 2020 20 Master Data Overview
Supply CRM In-Store Finance Marketing Online Sales Sales Chain
Publish & Subscribe
Applications can reference Data Stewardship the “Golden Record” for Validation Data Quality lookup. & Matching
Data Model MDM Data Lookup ETL Warehouse “Golden Record” BI & Reporting Reference End User Applications Data Sets
Global Data Strategy, Ltd. 2020 21 Master Data Overview
Supply CRM In-Store Finance Marketing Online Sales Sales Chain
Publish & Subscribe
Data Stewardship Validation Data Quality & Matching
Data Model MDM Data Lookup ETL Warehouse “Golden Record” BI & Reporting Reference MDM can feed the dimensional model for End User Applications Data Sets the data warehouse (e.g. customer, location, etc.)
Global Data Strategy, Ltd. 2020 22 MDM is not Reporting or Analytics -- It can be a Source
Data Warehouse BI & Reporting e.g. Customer by Region
MDM “Golden Record”
Reference Data Sets
Graph Database Social Network Analysis
Data Lake Social Media Sentiment Analysis Global Data Strategy, Ltd. 2020 23 Governance & Business Process for MDM
• While the implementation of the hub and population strategies is complex, more complex is understanding the business processes and governance processes around the populating and publishing systems.
• In fact, the top two reasons for failure of MDM systems cited by the Gartner analyst group1 are :
Failure of IT to Align With Delaying or Mismanaging Business Process Improvements Information Governance and Document Business Value Implementation
1 Top Four Reasons Your MDM Program Will Fail, and How to Avoid Them, Gartner, 2016, ID: Global Data Strategy, Ltd. 2020 G00223675, by Bill O’Kane. Note: The remaining two reasons are: Failure to Manage Initial Master 24 Data Quality & Defining Transactional (Fact) Data as Master Data Successful MDM Combines Data, Process, and Accountability
• System Architecture & data flow • Data models & hierarchies Data • Match/merge and survivorship rules Architecture • Data integration & design
Master Data Management • Accountability & stewardship • Business process models • Business rule validation • Data mapping to process Business Data • Conflict resolution • CRUD and usage matrices Process Governance & • Business Prioritization • Optimizing business process Alignment Stewardship for data improvement
Global Data Strategy, Ltd. 2020 The Importance of Business Process Identifying key data dependencies in core business processes
• Process models are a helpful tool for describing core business processes (e.g. BPMN). • “Swimlanes” outline organizational considerations • Data can be mapped to key business processes to understand creation & usage of information. • Understanding business process is critical to Data Governance • Who is using data? • How is it used in business processes? • Are there redundancies, conflicts, etc.?
Global Data Strategy, Ltd. 2020 26 CRUD Matrix – Understanding Data Usage Create, Read, Update, Delete
• CRUD Matrices shows where data is Created, Read, Updated or Deleted across the various areas of the organization • This can be a helpful tool in data governance & data quality to determine route cause analysis. Users, Departments, and/or Systems
Product Supply Chain Marketing Finance Development Accounting
Product Assembly Instructions C R
Product Components C R Data entities or attributes Product Price C U R
Product Name C U,D
Etc.
Global Data Strategy, Ltd. 2020 27 The Intersection of MDM and Data Governance Successful MDM require governed alignment between business & technical roles
Enterprise-wide Business Prioritization Roles Business Technical
Enterprise Conceptual Map to Customer & Logistics Prioritize Subject Areas / MDM Prioritize Geo Rollout Business Data Owners Enterprise Data Business Data Model Journey Domains (e.g. Product) Architect Analyst
Business & Technical Alignment & Implementation Business-level
o Business Rules Data Domain Steward Business Process Enterprise Data Business o Match-Merge Criteria Steward Architect Analyst o Survivorship Rules Logical Model o Harmonization Strategy Business Process Data Steward Tasks & Workflow Workflow Regional/Geo MDM Steward Architect Technical-level Data Quality Data Integration MDM Platform Admin o Data Profiling o Data Integration o o DB Admin Data Quality Remediation o CRUD Matrix o o MDM Platform Admin Data Quality Physical Model Augmentation (e.g. address) o Publish & Subscribe System MDM o Data Quality Dashboards Steward Architect Specialist
Data Integration Specialist Global Data Strategy, Ltd. 2020 www.globaldatastrategy.com Optimizing Restaurant Revenue through Menu Data Managing the Data that Runs the Business • An international restaurant chain realized through its digital strategy that: • While menus are the core product that drives their business… • They had little control or visibility over their menu data • Menu data was scattered across multiple systems in the organization from supply chain to kitchen prep to marketing, restaurant operations, etc. • Menu data was consolidated & managed in a central hub: • Master Data Management created a “single view of menu” for business efficiency & quality control • Data Governance created the workflow & policies around managing menu data • Process Models & Data Mappings were critical • Business Process diagrams to identify the flow of information • CRUD Matrixes to understand usage, stewardship & ownership
Product Creation & Menu Display & Supply Chain Point of Sale & Testing Marketing Restaurant Operations
Global Data Strategy, Ltd. 2020 www.globaldatastrategy.com Summary
• Interest in Master Data Management (MDM) is on the rise as more organizations look to gain a common, consistent source for their core data assets (Customer, Product, Supplier, Employee, etc.)
• Successful MDM is part of a wider data strategy and requires integration with:
• Data Architecture
• Business Process Alignment
• Data Governance & Stewardship
• Getting this combination right can have a positive impact on the success of the business.
Global Data Strategy, Ltd. 2020 30 DATAVERSITY Data Architecture Strategies Join us next month
• January 23 Emerging Trends in Data Architecture – What’s the Next Big Thing? • February 27 Building a Data Strategy - Practical Steps for Aligning with Business Goals • March 26 Cloud-Based Data Warehousing – What's New and What Stays the Same • April 23 Master Data Management – Aligning Data, Process, and Governance • May 28 Data Governance and Data Architecture – Alignment and Synergies • June 25 Enterprise Architecture vs. Data Architecture • July 22 Best Practices in Metadata Management • August 27 Data Quality Best Practices • September 24 Data Virtualization – Separating Myth from Reality • October 22 Data Architect vs. Data Engineer vs. Data Modeler • December 1 Graph Databases: Practical Use Cases
Global Data Strategy, Ltd. 2020 31 About Global Data Strategy, Ltd Data-Driven Business Transformation
• Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. • Our passion is data, and helping organizations enrich their business opportunities through data and information. • Our core values center around providing solutions that are: • Business-Driven: We put the needs of your business first, before we look at any technology solution. • Clear & Relevant: We provide clear explanations using real-world examples. • Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography. • High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of technical expertise in the industry. Business Strategy Data Strategy Aligned With
Visit www.globaldatastrategy.com for more information
Global Data Strategy, Ltd. 2020 32 Questions? • Thoughts? Ideas?
Global Data Strategy, Ltd. 2020 www.globaldatastrategy.com 33