The Latest in the Database – Metadata Relationship
Donna Burbank, Global Data Strategy Ltd.
Database Now! Online Conference, July 19, 2017

Donna Burbank

Donna is a recognised industry expert in information management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership.

She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past roles, she has served in key brand strategy and product management roles at CA Technologies and Embarcadero Technologies for several of the leading data management products in the market.

As an active contributor to the data management community, she is a long time DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was recently awarded the Excellence in Data Management Award from DAMA International in 2016. She was on the review committee for the Object Management Group’s (OMG) Information Management Metamodel (IMM) and the Business Process Modeling Notation (BPMN). Donna is also an analyst at the Boulder BI Brain Trust (BBBT), where she provides advice and gains insight on the latest BI and Analytics software in the market.

She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two books: Data Modeling for the Business and Data Modeling Made Simple with ERwin Data Modeler and is a regular contributor to industry publications. She can be reached at [email protected]. Donna is based in Boulder, Colorado, USA.

Follow on Twitter @donnaburbank
Today’s hashtag: #DBNow

Global Data Strategy, Ltd. 2017

Agenda
What we’ll cover today
• Emerging Trends in Metadata Management
• The Business Value of Metadata Management
• Metadata as Part of Wider Enterprise Data Management
• Metadata Isn’t Just for Relational Databases Anymore
• Technical Innovation & Best Practices in Managing Metadata

Metadata is Hotter than Ever
A Growing Trend
In a recent DATAVERSITY survey, over 80% of respondents stated that: Metadata is as important, if not more important, than in the past.
What is Metadata?
Metadata is Data In Context
Metadata is the “Who, What, Where, Why, When & How” of Data

Who
• Who created this data?
• Who is the Steward of this data?
• Who is using this data?
• Who “owns” this data?
• Who is regulating or auditing this data?

What
• What is the business definition of this data element?
• What are the business rules for this data?
• What is the security level or privacy level of this data?
• What is the abbreviation or acronym for this data element?
• What are the technical naming standards for database implementation?

Where
• Where is this data stored?
• Where did this data come from?
• Where is this data used & shared?
• Where is the backup for this data?
• Are there regional privacy or security policies that regulate this data?

Why
• Why are we storing this data?
• What is its usage & purpose?
• What are the business drivers for using this data?

When
• When was this data created?
• When was this data last updated?
• How long should it be stored?
• When does it need to be purged/deleted?

How
• How is this data formatted? (character, numeric, etc.)
• How many databases or data sources store this data?
Metadata is Needed by Business Stakeholders
Making business decisions on accurate and well-understood data
80% of users of metadata are from the business, according to the recent DATAVERSITY survey.
Business users often “get” metadata more than IT does!
Poor Metadata Management Can be Expensive

• On average, organizations waste 15-18% of their budgets dealing with data problems. (Source: Experian)
• 56% of UK marketing organizations say managing data quality is a “significant challenge”. (Source: UK Marketing Today)
• The US economy loses $3.1 trillion a year due to poor data quality. (Source: Artemis Ventures)
• Correcting poor data quality is a Data Scientist’s least favorite task, consuming on average 80% of their working day. (Source: Forbes 2016)
• In the US, 6.9 billion pieces of mail are undeliverable annually because of address issues. (Source: US Postal Service)
A Very Expensive Example - NASA

• On September 23, 1999, NASA lost the $125 million Mars Climate Orbiter spacecraft after a 286-day journey to Mars.
• Missing metadata was the culprit.
• Thruster data was sent in English units of pound-force seconds (lbf s) instead of Metric units of newton-seconds (N s).
• This metadata inconsistency caused thrusters to fire incorrectly, sending the craft off course – 60 miles in all (96.56 km).
• In addition to the cost of the orbiter, there were:
  – Brand and Reputational Damage
  – Lost Opportunities for research on the Martian atmosphere & climate
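The unit mismatch described above can be illustrated with a small sketch in which every reading carries its unit as metadata, so a consumer never has to guess. This is only an illustration under assumptions: the `Impulse` class and its field names are hypothetical and are not taken from any NASA system.

```python
from dataclasses import dataclass

# Conversion factor: one pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A thruster impulse reading that carries its unit as metadata (hypothetical type)."""
    value: float
    unit: str  # expected to be "N s" or "lbf s"

    def to_newton_seconds(self) -> float:
        # The unit metadata decides whether a conversion is needed.
        if self.unit == "N s":
            return self.value
        if self.unit == "lbf s":
            return self.value * LBF_S_TO_N_S
        # Refuse to guess when the metadata is missing or unrecognised.
        raise ValueError(f"Unknown unit: {self.unit!r}")


reading = Impulse(value=10.0, unit="lbf s")
print(round(reading.to_newton_seconds(), 2))  # 44.48
```

Without the `unit` field, the value 10.0 would be read as 10 N s, understating the impulse by a factor of roughly 4.45.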
Human Metadata
Avoid the dreaded “I just know”

• Much business metadata and the history of the business exists in employees’ heads.
• It is important to capture this metadata in an electronic format for sharing with others.

Example of knowledge held in someone’s head: “Part Number is what used to be called Component Number before the acquisition.”

Places to capture it:
• Business Glossary
• Metadata Repository
• Data Models
• Etc.
Metadata is Part of a Larger Enterprise Landscape
A Successful Data Strategy Requires Many Inter-related Disciplines
“Top-Down” alignment with business priorities
Managing the people, process, policies & culture around data
Leveraging & managing data for strategic advantage
Coordinating & integrating disparate data sources
“Bottom-Up” management & inventory of data sources
Metadata Use Cases – Now & In the Future

• Business Use Cases for Metadata are evolving, according to the DATAVERSITY Emerging Trends survey.
• The “Top 5” are changing: less BI/DW and Software Dev, and more Big Data & Data Science.
• Data Governance, Master Data Management, and Data Quality remain constants.

[Charts: use cases Now vs. Future]

Types of Metadata – Now & In the Future

• The types of metadata sources being managed are also evolving.
• Business Glossaries & Data Warehouses remain constants.
• Data Quality & Big Data Platform sources are growing.

[Charts: metadata sources Now vs. Future]
Metadata isn’t just for Relational Databases anymore…

• There are many Sources and Types of Metadata:
  – Relational databases
  – Application Code
  – Data Models
  – Data Transformation / ETL Tools
  – Text Documents
  – Spreadsheets
  – XML
  – Data Quality Tools
  – Open Data
  – Business Process Models
  – Internet of Things (IoT)
  – Business Intelligence (BI) Tools
  – Photos / Images
  – ERP, CRM, and Packaged Applications
  – Social Media
  – Big Data platforms
  – COBOL Copybooks
  – Graph Databases
  – Etc.… there are many more
Relational Database Metadata

• The technical structure of a relational database is defined by DDL (data definition language). It describes the structure / schema for how data is stored in a database.
• A Glossary or Data Dictionary generally stores the business metadata.

Technical Metadata (DDL)

    CREATE TABLE EMPLOYEE (
        employee_id     INTEGER      NOT NULL,
        department_id   INTEGER      NOT NULL,
        employee_fname  VARCHAR(50)  NULL,
        employee_lname  VARCHAR(50)  NULL,
        employee_ssn    CHAR(9)      NULL);

    CREATE TABLE CUSTOMER (
        customer_id       INTEGER       NOT NULL,
        customer_name     VARCHAR(50)   NULL,
        customer_address  VARCHAR(150)  NULL,
        customer_city     VARCHAR(50)   NULL,
        customer_state    CHAR(2)       NULL,
        customer_zip      CHAR(9)       NULL);

Business Metadata (Glossary or Data Dictionary)

    Term      Definition
    Employee  An employee is an individual who currently works for the organization or who has been recently employed within the past 6 months.
    Customer  A customer is a person or organization who has purchased from the organization within the past 2 years and has an active loyalty card or maintenance contract.

Data

    John Smith
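As a rough illustration of how the technical and business metadata above might be read side by side, the sketch below creates a shortened version of the EMPLOYEE table in an in-memory SQLite database and reads its column metadata back from the catalog. SQLite and the `GLOSSARY` dictionary are assumptions made for the example; the slides do not name a specific database product.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EMPLOYEE (
        employee_id    INTEGER NOT NULL,
        department_id  INTEGER NOT NULL,
        employee_fname VARCHAR(50),
        employee_lname VARCHAR(50)
    )
    """
)

# Business metadata kept separately from the DDL (hypothetical glossary store).
GLOSSARY = {
    "EMPLOYEE": "An individual who currently works for the organization "
                "or who has been employed within the past 6 months.",
}


def describe(table):
    """Return (column name, declared type, is NOT NULL) for each column."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [(name, col_type, bool(notnull)) for _, name, col_type, notnull, _, _ in rows]


columns = describe("EMPLOYEE")
print(GLOSSARY["EMPLOYEE"])
for name, col_type, required in columns:
    print(name, col_type, "NOT NULL" if required else "NULL")
```

The catalog answers the "how is this data formatted" questions, while the glossary entry answers "what does this term mean"; neither source alone gives the full context.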
Data Models are a Good Source of Metadata

• Data Models are another good source of both business & technical metadata for relational databases.
• They store structural metadata as well as business rules & definitions.

[Figure: Customer entity in a data model, labeled with Technical Metadata and Business Metadata]

    Customer
        Customer_ID     CHAR(18)  NOT NULL
        First Name      CHAR(18)  NOT NULL
        Last Name       CHAR(18)  NOT NULL
        City            CHAR(18)  NULL
        Date Purchased  CHAR(18)  NULL
ERP, CRM and Packaged Application Metadata

• Packaged applications such as CRM and ERP systems (e.g. Salesforce, PeopleSoft, etc.) are typically based on a relational database system.
• Therefore, there is important metadata about both the physical table structures as well as the business names & definitions.

[Figure: Technical Metadata and Business Metadata in a packaged application]
NoSQL – Key Value Databases

• NoSQL Databases are often optimal solutions for flexibility & performance in certain scenarios.
• One common type of NoSQL database is the key-value database (e.g. Redis, Oracle NoSQL, etc.).
• They can support extremely high volumes of records & state changes per second through distributed processing and distributed storage.
• Use cases include managing user sessions in web applications, online gaming, online shopping carts, etc.
• Metadata for NoSQL databases is typically minimal or non-existent.
• The structure & metadata are generally determined by the application code, not within a database or metadata structure.

    Key      Value
    1839047  John Doe, Prepaid, 40.00
    9287320  01/01/2008, 50.00, Green
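The point that structure lives in the application code can be sketched with the two records above. In this illustration a plain Python dictionary stands in for the key-value store, and the field names in `ACCOUNT_FIELDS` are hypothetical: the store itself holds no such names, which is exactly the problem.

```python
# A plain dictionary standing in for a key-value store.
store = {
    "1839047": "John Doe, Prepaid, 40.00",
    "9287320": "01/01/2008, 50.00, Green",
}

# The only "schema" is what the application assumes (hypothetical field names).
ACCOUNT_FIELDS = ("customer_name", "plan_type", "balance")


def read_account(key):
    """Interpret a stored value using the application's assumed field order."""
    parts = [part.strip() for part in store[key].split(",")]
    return dict(zip(ACCOUNT_FIELDS, parts))


print(read_account("1839047"))
# The same code applied to the second key returns a misleading result,
# because nothing in the store says that record has a different layout.
print(read_account("9287320"))
```

The second call runs without error but labels a date as a customer name, which shows why undocumented structure is risky once more than one application reads the store.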
NoSQL Metadata – Document Databases

• Document databases are a popular way to store unstructured information flexibly (e.g. multimedia, social media posts, etc.).
• Each Collection can contain numerous Documents, which could all contain different fields.

    {
      type: "Artifact",
      medium: "Ceramic",
      country: "China"
    }

    {
      type: "Book",
      title: "Ancient China",
      country: "China"
    }

    (Example from docs.mongodb.com)

• Some data modeling can be done, and some data modeling tools support document databases (e.g. MongoDB).
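Because documents in one collection can have different fields, a common first step is to profile the collection and derive a field inventory from the data itself. The sketch below does this for the two example documents, using plain Python dictionaries rather than a real document database; the function name `infer_fields` is an assumption for illustration.

```python
from collections import defaultdict

# The two example documents, held in a list that stands in for a collection.
collection = [
    {"type": "Artifact", "medium": "Ceramic", "country": "China"},
    {"type": "Book", "title": "Ancient China", "country": "China"},
]


def infer_fields(documents):
    """Build a simple field inventory: observed value types and occurrence count."""
    fields = defaultdict(lambda: {"types": set(), "count": 0})
    for doc in documents:
        for name, value in doc.items():
            fields[name]["types"].add(type(value).__name__)
            fields[name]["count"] += 1
    return dict(fields)


profile = infer_fields(collection)
for name, info in sorted(profile.items()):
    print(name, sorted(info["types"]), f"{info['count']}/{len(collection)} documents")
```

Fields such as `medium` and `title` appear in only one of the two documents, so the inventory makes the optional fields visible even though no schema was declared.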
Big Data Platform Metadata

• Big Data platforms (e.g. Hadoop-based) are typically based on a distributed file system (HDFS).
• As a result, the detailed structure that is found in a relational database platform does not exist.
• Metadata still exists for these platforms, however.

Technical Metadata
• Tree structure of HDFS directories
• Directory and file attributes (ownership, permissions, quotas, replication factor, etc.)
• Metadata about logical data sets (e.g. format, statistics, etc.)
• Data ingest & transformation lineage

Business Metadata
• Description of file
• Tags

There are components that allow you to add structure within the Hadoop ecosystem (e.g. Hive).
COBOL Copybook Metadata

• What is a COBOL Copybook?
  – In COBOL, a copybook file is used to define data elements that can be referenced by many programs
• What is COBOL Copybook Metadata?
  – structure, definition
Metadata Describes structure & format of data
The demand for COBOL & legacy metadata is growing, according to the recent DATAVERSITY survey.
Graph Relationships

• Graph databases are ideal for analyzing metadata relationships between objects and finding patterns in those relationships.
• Common use cases for graph relationship metadata analysis include:
  – Fraud detection - e.g. financial transactions
  – Threat detection - e.g. email and phone patterns
  – Marketing - e.g. social media connections, product recommendation engines
  – Network optimization - e.g. IoT, Telecommunications
• In a graph database, “the metadata is the database”.
XML Metadata

• What is XML?
  – XML (Extensible Markup Language) is used to store and transport data. It’s often a complement to HTML, which is used to format the data.
• What is XML Metadata?
  – Similar to DDL, an XML Schema (XSD) defines the structure & format of data

[Figure: an XML Schema labeled “Metadata” beside XML documents labeled “Data”]
JSON Metadata

• What is JSON?
  – JSON (JavaScript Object Notation) is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and web application, as an alternative to XML.
• What is JSON Metadata?
  – structure, definition

For example, assume we have a JSON based product catalog. This catalog has a product which has an id, a brand, a price, and an optional set of tags.

Data: Example Product in the API

    {
      "id": 127849,
      "brand": "Super Cooler",
      "price": 12.50,
      "tags": ["camping", "sports"]
    }

Context Needed (i.e. Metadata)
• Can the ID contain letters?
• What is a brand?
• Is a price required?
• Etc.

Metadata: JSON Schema

    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "title": "Product",
      "description": "A retail product from Acme's online catalog",
      "type": "object",
      "properties": {
        "id": {
          "description": "The unique identifier for a product",
          "type": "integer"
        },
        "brand": {
          "description": "The brand name of the product as shown in the online catalogue",
          "type": "string"
        },
        "price": {
          "type": "number"
        },
        "tags": {
          "type": "array",
          "items": { "type": "string" },
          "minItems": 1
        }
      },
      "required": ["id", "brand", "price"]
    }
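To show how schema metadata answers the questions above ("Can the ID contain letters?", "Is a price required?"), here is a minimal hand-rolled check using only the Python standard library. It is a sketch that covers just the `required` list and the top-level `type` keywords of the Product schema; it is not a complete JSON Schema validator, and the `check` function is a hypothetical helper.

```python
import json

# A trimmed copy of the Product schema: only the parts this sketch inspects.
schema = {
    "properties": {
        "id": {"type": "integer"},
        "brand": {"type": "string"},
        "price": {"type": "number"},
        "tags": {"type": "array"},
    },
    "required": ["id", "brand", "price"],
}

# Maps JSON Schema type names to Python checks (bool is excluded from numbers).
TYPE_CHECKS = {
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "array": lambda v: isinstance(v, list),
}


def check(product, schema):
    """Return a list of problems found; an empty list means the checks passed."""
    errors = []
    for name in schema["required"]:
        if name not in product:
            errors.append(f"missing required property: {name}")
    for name, rules in schema["properties"].items():
        if name in product and not TYPE_CHECKS[rules["type"]](product[name]):
            errors.append(f"{name} should be of type {rules['type']}")
    return errors


product = json.loads(
    '{"id": 127849, "brand": "Super Cooler", "price": 12.50, "tags": ["camping", "sports"]}'
)
print(check(product, schema))                                    # []
print(check({"id": "A127", "brand": "Super Cooler"}, schema))
```

The second call reports that `price` is missing and that `id` is not an integer, which are the answers a consumer of the API could otherwise only guess at.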
IoT Metadata

• What is the IoT?
  – The Internet of Things (IoT) is a network of physical devices that are able to share data over a network.
• What is IoT Metadata?
  – Metadata is necessary to provide context around the readings generated by IoT devices, e.g. units of measure, type of measurement, etc.
Is 140 the temperature of my stove or my max heart rate on
140
Document Metadata

• Document metadata provides context & background on the contents & purpose of the document.
• Often found in the “Properties” menu

User-defined
• Tags & Categories for search & organization
• Description/Comments about the contents of the document

System-defined
• Date/Time stamp when document was created, modified, etc.
• Author of the document
Image Metadata

• Metadata is critical for locating images online, as well as identifying copyright information, etc.
• Some information is system-generated, while other information is user-defined.

Technical Metadata (Embedded in Photo)
    Camera:        Apple iPhone 6 Plus
    Lens:          iPhone 6 Plus back camera 4.15mm f/2.2, Shot at 4.2 mm
    Digital Zoom:  5.006134969×
    Exposure:      Auto exposure, Program AE, 1/7,937 sec, f/2.2, ISO 32
    Flash:         Auto, Did not fire
    Date:          April 13, 2016 5:35:53PM (timezone not specified) (1 month, 11 days, 14 hours, 14 minutes, 46 seconds ago, assuming image timezone of US Pacific)
    File:          3,264 × 2,448 JPEG (8.0 megapixels), 800,782 bytes (782 kilobytes)

Descriptive Metadata (User Defined)
    Title:     DATAVERSITY EDW 2016 San Diego
    Keywords:  EDW 2016, San Diego, Bay Photos
    Location:  San Diego

Administrative Metadata (User Defined)
    Author:     Donna Burbank
    Copyright:  None
    Licensing:  None
Social Media Metadata

• Metadata from Social Media, such as Twitter, can support trend and sentiment analysis, for example.

Embedded Metadata
• Date/Time
• Location
• Language
• Author
• Device Used
• ID of Tweet
• Text/Content of Tweet
• Etc.

[Figure: example tweet annotated with its Hash Tag and # of Retweets (100)]
Open Data Metadata

• What is Open Data?
  – Open Data is data that can be freely used and redistributed by anyone
• What is Open Data Metadata?
  – Metadata provides the context that makes open data usable and credible.

Questions that metadata answers about the data:
• When was it created or updated?
• When was it published?
• Who published it?
• What is the intended usage?
• How often is it refreshed?
• What are the security or usage restrictions?
• What keywords categorize this data?

[Figure labels: Data, Feedback loop]
Business Process Model Metadata

• Business Process Models describe key activities within the organization.
• Linking these processes to the data that is Created, Read, Updated, or Deleted (CRUD) is important to understanding data usage.
Business Process Model CRUD Matrix Customer Order Account Invoice Product Receive Customer Order R C C, R Process Customer Order C,R,U R,U R Fill Order R,U R,U R,U Send Invoice R,U R,U C
Architectural Options for Metadata Management

• The following are common architectural options for metadata management within & between organizations.
• There is no “one size fits all” approach.
• They can be used together within the same organization.

Options:
• Central, Enterprise-wide Metadata Repository
• Tool or Purpose-Specific Repository
• Metadata Exchange & Registry
• Publication & Sharing
[Diagram labels: Information Sharing & Standards; Reports; Web Portal; Integration & Export; Business Glossary; Matching & Population; ETL Tool; Reuse Logic; Interfaces; Metamodel(s); Database; Data Modeling Tool; Data Dictionary; BI Tool; Metadata Storage (Database); Etc.]
Data Lineage
Data Warehousing Example

• In the data warehouse example below, metadata for CUSTOMER exists in a number of tools & data stores.

[Diagram labels: Business Glossary; Logical Data Model; Physical Data Model (CUSTOMER); Dimensional Data Model (CUSTOMER); ETL Tool; Database Table (CUST); Database Table (TBL_C1); BI Tool; Sales Report]
Metadata Discovery, Lineage & Matching

• Methods for metadata discovery, lineage, and matching have evolved over the years, and can include the following:
• Matching Rules: Matching rules & logic based on the structure of the data storage, e.g.
  – “Customer” can match to “CUST”, “CUSTOMER”, etc. in a relational database table
  – Database columns are the same if they have the same name & data type
• Patterns in Data Values: AI & Pattern Matching, e.g.
  – Emails generally include the format [email protected]/.org, etc.
  – A consistent pattern of values such as NNN-NN-NNNN is likely a set of social security numbers
• Image Pattern Recognition: Detection of similar patterns in unstructured documents based on signal & image processing at the byte level, e.g.
  – These two documents both look like services contracts -> classify accordingly
• Human-defined Tagging: Many systems allow user-defined “tags”, e.g.
  – Document or web page tagging for Search (e.g. “furry kittens”)
  – AWS tags for classification
Reuse & Matching Rules

• Reuse & Matching Rules help:
  – Rationalize common objects
  – Establish linkages between related objects
• Is it the same table if it has the same name (e.g. CUSTOMER)?
  – …if it has an approved variation of that name (e.g. CUST)
  – …if it has the same columns (e.g. first name, last name, gender)
  – …if the columns are in the same order
  – …if it has the same description
  – Etc.

Rationalization
• Business definition: “A customer is a person or organization with an active account.”
• The CUSTOMER table on Oracle is represented by a Physical data model of the same object.
• The data model has a description of the table, so let’s create a single object combining both sources.

Lineage
• The CUSTOMER table on Oracle is represented by a Physical data model of the same object.
• Transformation rules are applied to the table and it is integrated with other sources to create the CUSTOMER table in the Staging area.
• This table is then transformed to the dimensional data warehouse CUSTOMER table.
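A matching rule of the kind described above can be sketched in a few lines. This example treats two table definitions as the same object when their names resolve to the same canonical name and their columns agree on name and data type, regardless of column order. The `APPROVED_VARIATIONS` mapping and the table layouts are hypothetical; a real metadata tool would make the rule set configurable.

```python
# Hypothetical list of approved name variations, mapped to the canonical name.
APPROVED_VARIATIONS = {"CUST": "CUSTOMER", "TBL_CUST": "CUSTOMER"}


def canonical_name(name):
    """Resolve a table name to its canonical form, ignoring case and padding."""
    upper = name.strip().upper()
    return APPROVED_VARIATIONS.get(upper, upper)


def same_table(a, b):
    """Rule: same (or approved) name AND same columns by name and data type."""
    if canonical_name(a["name"]) != canonical_name(b["name"]):
        return False
    cols_a = {(col.lower(), dtype.upper()) for col, dtype in a["columns"]}
    cols_b = {(col.lower(), dtype.upper()) for col, dtype in b["columns"]}
    return cols_a == cols_b  # sets, so column order is deliberately ignored


oracle_table = {
    "name": "CUSTOMER",
    "columns": [("first_name", "VARCHAR(50)"), ("last_name", "VARCHAR(50)")],
}
staging_table = {
    "name": "cust",
    "columns": [("LAST_NAME", "varchar(50)"), ("FIRST_NAME", "varchar(50)")],
}
print(same_table(oracle_table, staging_table))  # True
```

Whether column order or descriptions should also count is a policy decision; stricter rules reduce false matches but also miss legitimate linkages.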
AI & Pattern Matching

• Many metadata solutions can detect patterns in the data and infer classification and linkage based on the data values themselves.

Example values and the inferred classification:
• [email protected] -> Classify as: email addresses
• 98748 Elkhorn Way, 18 Winding Court Rd -> Classify as: physical addresses
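Classification from data values can be approximated with simple regular expressions, as in the sketch below. This is a rule-based stand-in, not the AI techniques commercial tools may use; the patterns, the 80% threshold, and the function names are all assumptions chosen for illustration, and the address pattern in particular only recognises a handful of street suffixes.

```python
import re
from collections import Counter

# Illustrative patterns only; real tools use far richer rules or trained models.
PATTERNS = [
    ("social security number", re.compile(r"\d{3}-\d{2}-\d{4}")),
    ("email address", re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")),
    (
        "physical address",
        re.compile(r"\d+\s+\w+(\s+\w+)*\s+(Way|Rd|Road|St|Street|Ave|Avenue)\.?", re.IGNORECASE),
    ),
]


def classify_value(value):
    """Return the first label whose pattern matches the whole value, else None."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(value.strip()):
            return label
    return None


def classify_column(values, threshold=0.8):
    """Label a column when at least `threshold` of its values share one label."""
    if not values:
        return None
    counts = Counter(classify_value(v) for v in values)
    label, hits = counts.most_common(1)[0]
    if label is not None and hits / len(values) >= threshold:
        return label
    return None


print(classify_column(["98748 Elkhorn Way", "18 Winding Court Rd"]))  # physical address
print(classify_column(["123-45-6789", "987-65-4321"]))                # social security number
```

Classifying the column rather than individual values tolerates a few malformed entries while still producing one label that can be recorded as metadata.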
Image Pattern Recognition

• Some vendors are using signal and image processing techniques to classify unstructured data.
• Based on byte-level patterns in the documents, patterns & classifications can be inferred.
Classify as: Bank Check
Tagging

• Many solutions have the concept of creating user-defined “tags” that classify & follow the data.
• A tag is a non-hierarchical keyword that can be assigned to a piece of information, often for search & classification (i.e. think “tagging photos”).

Amazon S3
• For Amazon S3 Buckets, user-defined metadata tags can be stored in a series of key-value pairs. For example, security classifications & retention policies could be managed as shown below.
• Tags can be assigned at the bucket and file level.
• Metadata can be assigned when uploading an object, manually, via PUT/POST requests, or via the REST or SOAP APIs. Metadata tags can also be retrieved via the API.
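The key-value nature of tags can be sketched without calling any cloud service. The example below builds a tag set for a security classification and a retention policy and searches an in-memory catalog by tag. The tag keys, values, and file names are hypothetical; the `TagSet` list of `Key`/`Value` pairs mirrors, to the best of my understanding, the shape that the S3 object tagging API accepts, but no AWS call is made here.

```python
def to_tag_set(tags):
    """Convert a plain dict of tags into a list of Key/Value pairs."""
    return {"TagSet": [{"Key": key, "Value": value} for key, value in sorted(tags.items())]}


def find_by_tag(catalog, key, value):
    """Return the names of objects whose tags contain the given key-value pair."""
    return sorted(name for name, tags in catalog.items() if tags.get(key) == value)


# Hypothetical objects and tags, for illustration only.
catalog = {
    "contracts/2017-services.pdf": {
        "security-classification": "confidential",
        "retention-policy": "7-years",
    },
    "marketing/brochure.pdf": {
        "security-classification": "public",
        "retention-policy": "1-year",
    },
}

print(to_tag_set(catalog["marketing/brochure.pdf"]))
print(find_by_tag(catalog, "security-classification", "confidential"))
```

Because tags are flat key-value pairs rather than a hierarchy, agreeing on a controlled list of keys and allowed values matters more than the storage mechanism.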
Summary

• Metadata is more important than ever
• Metadata is part of wider enterprise data management
• Core use cases around data governance, data quality, and master data management remain
• New use cases around Big Data and data science are emerging
• There are a variety of sources of metadata across the organization
  – Metadata isn’t just for relational databases
  – The concept of “database” is evolving
• There are a number of Technical Innovations & Best Practices in Managing Metadata, including
  – Matching Rules
  – Patterns in Data Values
  – Image Pattern Recognition
  – Tagging
About Global Data Strategy, Ltd
Data-Driven Business Transformation

• Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.
• Our passion is data, and helping organizations enrich their business opportunities through data and information.
• Our core values center around providing solutions that are:
  – Business-Driven: We put the needs of your business first, before we look at any technology solution.
  – Clear & Relevant: We provide clear explanations using real-world examples.
  – Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and geography.
  – High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of technical expertise in the industry.

Business Strategy Aligned With Data Strategy
Visit www.globaldatastrategy.com for more information
Contact Info

• Email: [email protected]
• Twitter: @donnaburbank @GlobalDataStrat
• Website: www.globaldatastrategy.com
White Paper: Emerging Trends in Metadata Management
Free Download

• Download from www.dataversity.net
• Also available on www.globaldatastrategy.com
Lessons in Data Modeling Series
This Year’s Line Up

• January 26th: How Data Modeling Fits Into an Overall Enterprise Architecture
• February 23rd: Data Modeling and Business Intelligence
• March: Conceptual Data Modeling – How to Get the Attention of Business Users
• April: The Evolving Role of the Data Architect – What does it mean for your Career?
• May: Data Modeling & Metadata Management
• June: Self-Service Data Analysis, Data Wrangling, Data Munging, and Data Modeling
• July: Data Modeling & Metadata for Graph Databases (Next week!)
• August: Data Modeling & Data Integration
• September: Data Modeling & MDM
• October: Agile & Data Modeling – How Can They Work Together?
• December: Data Modeling, Data Quality & Data Governance

DATAVERSITY Training Center
Online Training Courses

Metadata Management Course
• Learn the basics of Metadata Management and practical tips on how to apply metadata management in the real world. This online course hosted by DATAVERSITY provides a series of six courses including:
  – What is Metadata
  – The Business Value of Metadata
  – Sources of Metadata
  – Metamodels and Metadata Standards
  – Metadata Architecture, Integration, and Storage
  – Metadata Strategy and Implementation
• Purchase all six courses for $399 or individually at $79 each.
• Other courses available on Data Governance & Data Quality
Visit: http://training.dataversity.net/lms/
Questions?
Thoughts? Ideas?