
The Latest in the Relationship
Donna Burbank
Global Data Strategy Ltd.
Database Now! Online Conference
July 19, 2017

Donna Burbank

Donna is a recognised industry expert in [...] management with over 20 years of experience in data strategy, information management, data modeling, metadata management, and enterprise architecture. Her background is multi-faceted across consulting, product development, product management, brand strategy, marketing, and business leadership.

She is currently the Managing Director at Global Data Strategy, Ltd., an international information management consulting company that specializes in the alignment of business drivers with data-centric technology. In past roles, she has served in key brand strategy and product management roles at CA and [...] for several of the leading products in the market.

As an active contributor to the data management community, she is a long time DAMA International member, Past President and Advisor to the DAMA Rocky Mountain chapter, and was recently awarded the Excellence in Data Management Award from DAMA International in 2016. She was on the review committee for the Object Management Group’s (OMG) Information Management Metamodel (IMM) and the Business Process Modeling Notation (BPMN). Donna is also an analyst at the Boulder BI Brain Trust (BBBT) where she provides advice and gains insight on the latest BI and [...] in the [...].

She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored two [...]: [...] for the Business and Data Modeling Made Simple with [...], and is a regular contributor to industry publications. She can be reached at [email protected]. Donna is based in Boulder, Colorado, USA.

Follow on Twitter: @donnaburbank
Today’s hashtag: #DBNow

Global Data Strategy, Ltd. 2017

Agenda
What we’ll cover today

• Emerging Trends in Metadata Management
• The Business Value of Metadata Management
• Metadata as Part of Wider Enterprise Data Management
• Metadata Isn’t Just for Relational Anymore
• Technical Innovation & Best Practices in Managing Metadata

Metadata is Hotter than Ever
A Growing Trend

In a recent DATAVERSITY survey, over 80% of respondents stated that: Metadata is as important, if not more important, than in the past.

What is Metadata?

Metadata is Data In Context

Metadata is the “Who, What, Where, Why, When & How” of Data

Who
• Who created this data?
• Who is the Steward of this data?
• Who is using this data?
• Who “owns” this data?
• Who is regulating or auditing this data?

What
• What is the business definition of this data element?
• What are the business rules for this data?
• What is the security level or [...] level of this data?
• What is the abbreviation or acronym for this [...]?
• What are the technical naming standards for database implementation?

Where
• Where is this data stored?
• Where did this data come from?
• Where is this data used & shared?
• Where is the [...] for this data?
• Are there regional privacy or security policies that regulate this data?

Why
• Why are we storing this data?
• What is its usage & purpose?
• What are the business drivers for using this data?

When
• When was this data created?
• When was this data last updated?
• How long should it be stored?
• When does it need to be purged/deleted?

How
• How is this data formatted? (character, numeric, etc.)
• How many databases or data sources store this data?

Metadata is Needed by Business Stakeholders
Making business decisions on accurate and well-understood data

80% of users of metadata are from the business, according to the recent DATAVERSITY survey.

Business users often “get” metadata more than IT does!

Poor Metadata Management Can be Expensive

• On average, organizations waste 15-18% of their budgets dealing with data problems. (Source: Experian)
• 56% of UK marketing organizations say managing [...] is a “significant challenge”. (Source: UK Marketing Today)
• The US economy loses $3.1 trillion a year due to poor data quality. (Source: Artemis Ventures)
• Correcting poor data quality is a Data Scientist’s least favorite task, consuming on average 80% of their working day. (Source: Forbes 2016)
• In the US, 6.9 billion pieces of mail are undeliverable annually because of address issues. (Source: US Postal)

A Very Expensive Example - NASA

• On September 23, 1999, NASA lost the $125 million Mars Climate Orbiter spacecraft after a 286-day journey to Mars.
• Missing metadata was the culprit.
• Thruster data was sent in English units of pound-force seconds (lbf s) instead of metric units of newton-seconds (N s).
• This metadata inconsistency caused thrusters to fire incorrectly, sending the craft off course by 60 miles in all (96.56 km).
• The losses went beyond the cost of the orbiter:
  – Brand and reputational damage
  – Lost opportunities for [...] on the Martian atmosphere & climate
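The unit mix-up described above can be illustrated with a minimal Python sketch in which every reading carries its unit as metadata and conversion is explicit. The `Impulse` type and `to_newton_seconds` function are hypothetical names invented for this illustration, not part of any NASA system; the conversion factor is the standard 1 lbf = 4.4482216152605 N.

```python
from dataclasses import dataclass

# Standard conversion: 1 pound-force second = 4.4482216152605 newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """A thruster impulse reading that carries its unit as metadata (hypothetical type)."""
    value: float
    unit: str  # expected to be "N s" or "lbf s"


def to_newton_seconds(reading: Impulse) -> float:
    """Convert a reading to newton-seconds, refusing units it does not recognise."""
    if reading.unit == "N s":
        return reading.value
    if reading.unit == "lbf s":
        return reading.value * LBF_S_TO_N_S
    raise ValueError(f"Unknown unit metadata: {reading.unit!r}")


# A bare 10.0 would be misread as 10 N s; with the unit attached it converts correctly.
sent = Impulse(value=10.0, unit="lbf s")
received = to_newton_seconds(sent)
```

Without the `unit` field the receiver has to guess, which is the "missing metadata" failure on the slide; with it, an unexpected unit fails loudly instead of silently.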

Human Metadata
Avoid the dreaded “I just know”

• Much business metadata and the history of the business exists in employees’ heads.
• It is important to capture this metadata in an electronic format for sharing with others.
• Avoid the dreaded “I just know”.

“Part Number is what used to be called Component Number before the acquisition.”

[Figure: knowledge held by people is captured in a Business Glossary, Metadata Repository, Data Models, etc.]

Metadata is Part of a Larger Enterprise Landscape
A Successful Data Strategy Requires Many Inter-related Disciplines

“Top-Down” alignment with business priorities

Managing the people, process, policies & culture around data

Leveraging & managing data for strategic advantage

Coordinating & integrating disparate data sources

“Bottom-Up” management & inventory of data sources

Metadata Use Cases – Now & In the Future

• Business use cases for metadata are evolving, according to the DATAVERSITY Emerging Trends survey.
• The “Top 5” are changing: less BI/DW and Software Dev, and more [...] & [...].
• Data Governance, Master Data Management, and Data Quality remain constants.

[Charts: use cases Now vs. Future]

Types of Metadata – Now & In the Future

• The types of metadata sources being managed are also evolving.
• Business Glossaries & Data [...] remain constants.
• Data Quality & Big Data Platform sources are growing.

[Charts: metadata types Now vs. Future]

Metadata isn’t just for Relational Databases anymore…

• There are many sources and types of metadata:
  – Relational databases
  – Application Code
  – Data Models
  – [...] / ETL Tools
  – Text Documents
  – XML
  – Data Quality Tools
  – Open Data
  – Business Process Models
  – Internet of Things (IoT)
  – Business Intelligence (BI) Tools
  – Photos / Images
  – ERP, CRM, and Packaged Applications
  – Big Data platforms
  – COBOL Copybooks
  – Graph Databases
  – Etc.… there are many more

Metadata

• The technical structure of a relational database is defined by DDL (Data Definition Language). It describes the structure / schema for how data is stored in a database.
• A Glossary or Data Dictionary generally stores the business metadata.

Technical Metadata (DDL)

    CREATE TABLE EMPLOYEE (
        employee_id    INTEGER NOT NULL,
        department_id  INTEGER NOT NULL,
        employee_fname VARCHAR(50) NULL,
        employee_lname VARCHAR(50) NULL,
        employee_ssn   CHAR(9) NULL);

    CREATE TABLE CUSTOMER (
        customer_id      INTEGER NOT NULL,
        customer_name    VARCHAR(50) NULL,
        customer_address VARCHAR(150) NULL,
        customer_city    VARCHAR(50) NULL,
        customer_state   CHAR(2) NULL,
        customer_zip     CHAR(9) NULL);

Business Metadata (Glossary or Data Dictionary)

Term        Definition
Employee    An employee is an individual who currently works for the organization or who has been recently employed within the past 6 months.
Customer    A customer is a person or organization who has purchased from the organization within the past 2 years and has an active loyalty card or maintenance contract.

Data

John Smith
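As a rough illustration of how technical metadata can be read back out of a database and paired with business metadata, the sketch below loads the EMPLOYEE DDL into an in-memory SQLite database and queries its catalog with `PRAGMA table_info`. SQLite is used here only because it ships with Python; the slide does not name a database product, and the `technical_metadata` helper and `GLOSSARY` dictionary are hypothetical names for this example.

```python
import sqlite3

DDL = """
CREATE TABLE EMPLOYEE (
    employee_id    INTEGER NOT NULL,
    department_id  INTEGER NOT NULL,
    employee_fname VARCHAR(50) NULL,
    employee_lname VARCHAR(50) NULL,
    employee_ssn   CHAR(9) NULL
);
"""

# Business metadata: the definition is taken from the glossary on the slide.
GLOSSARY = {
    "EMPLOYEE": "An employee is an individual who currently works for the "
                "organization or who has been recently employed within the "
                "past 6 months.",
}


def technical_metadata(conn, table):
    """Return (column name, declared type, nullable) for each column of a table."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is: cid, name, type, notnull, dflt_value, pk
    return [(name, col_type, not notnull) for _, name, col_type, notnull, _, _ in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
columns = technical_metadata(conn, "EMPLOYEE")

for name, col_type, nullable in columns:
    print(f"{name:15} {col_type:12} {'NULL' if nullable else 'NOT NULL'}")
print("Definition:", GLOSSARY["EMPLOYEE"])
```

The structure comes from the database catalog and the definition from the glossary, and the table name is the only thing linking them. Maintaining that link is much of the work of metadata management.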

Data Models are a Good Source of Metadata

• Data Models are another good source of both business & technical metadata for relational databases.
• They store structural metadata as well as business rules & definitions.

Technical Metadata (Customer entity)

Customer_ID       CHAR(18)   NOT NULL
First Name        CHAR(18)   NOT NULL
Last Name         CHAR(18)   NOT NULL
City              CHAR(18)   NULL
Date Purchased    CHAR(18)   NULL

[Figure: the same data model also carries Business Metadata]

ERP, CRM and Packaged Application Metadata

• Packaged applications such as CRM and ERP (e.g. Salesforce, PeopleSoft, etc.) are typically based on a relational database.
• Therefore, there is important metadata about both the physical table structures and the business names & definitions.

[Figure: Technical Metadata and Business Metadata]

NoSQL – Key Value Databases

• NoSQL databases are often optimal solutions for flexibility & performance in certain scenarios.
• One common type of NoSQL database is the key-value pair database (e.g. [...], Oracle NoSQL, etc.).
• They can support extremely high volumes of records & state changes per second through distributed processing and distributed storage.
• Use cases include: managing sessions in web applications, online gaming, online shopping carts, etc.
• Metadata for NoSQL databases is typically minimal or non-existent.
• The structure & metadata are generally determined by the application code, not within a database or metadata structure.

Key        Value
1839047    John Doe, Prepaid, 40.00
9287320    01/01/2008, 50.00, Green
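To make the point about application code concrete, here is a small Python sketch in which a plain dictionary stands in for the key-value store. The store holds only opaque strings; the meaning of each position lives in a layout table inside the application. The field names in `LAYOUTS` are assumptions invented for this illustration, since the slide does not say what the values represent.

```python
# The store itself holds only opaque strings; it has no schema.
store = {
    "1839047": "John Doe, Prepaid, 40.00",
    "9287320": "01/01/2008, 50.00, Green",
}

# The "metadata" lives in application code. These field names are assumptions
# made for illustration; the slide does not say what the values mean.
LAYOUTS = {
    "1839047": ("name", "plan", "balance"),
    "9287320": ("date", "amount", "colour"),
}


def decode(key):
    """Interpret a stored value using the layout the application knows for that key."""
    fields = [part.strip() for part in store[key].split(",")]
    return dict(zip(LAYOUTS[key], fields))


account = decode("1839047")
```

If `LAYOUTS` is lost or changes without notice, the stored values are still there but nobody can say what they mean.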

NoSQL Metadata – Document Databases

• Document databases are popular ways to store unstructured information in a flexible way (e.g. multimedia, social media posts, etc.).
• Each Collection can contain numerous Documents, which could all contain different fields.

    {type: “Artifact”, medium: “Ceramic”, country: “China”}
    {type: “[...]”, title: “Ancient China”, country: “China”}

• Some data modeling can be done, and some data modeling tools support this (e.g. MongoDB). *

* Example from docs.mongodb.com
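Because documents in one collection can differ, a common first step is to inventory which fields actually occur. The sketch below does this over plain Python dictionaries rather than a live database; `field_inventory` is a hypothetical helper, and the second document's `type` value is a placeholder because it is not legible in the source slide.

```python
from collections import defaultdict

documents = [
    {"type": "Artifact", "medium": "Ceramic", "country": "China"},
    # The "type" value below is a placeholder; it is not legible in the source slide.
    {"type": "?", "title": "Ancient China", "country": "China"},
]


def field_inventory(docs):
    """Map each field name to how many documents contain it and the value types seen."""
    inventory = defaultdict(lambda: {"count": 0, "types": set()})
    for doc in docs:
        for field, value in doc.items():
            inventory[field]["count"] += 1
            inventory[field]["types"].add(type(value).__name__)
    return dict(inventory)


inventory = field_inventory(documents)
for field, info in sorted(inventory.items()):
    print(f"{field:8} in {info['count']} of {len(documents)} documents, types: {sorted(info['types'])}")
```

Fields that appear in only some documents (`medium`, `title`) show up straight away. Without a declared schema, this kind of inferred inventory is often the only structural metadata on hand.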

Big Data Platform Metadata

• Big Data platforms (e.g. Hadoop-based) are typically based on a distributed file system (HDFS).
• As a result, the detailed structure that is found in a relational database platform does not exist.
• Metadata still exists for these platforms, however:
  – Technical Metadata
    · Tree structure of HDFS directories
    · Directory and file attributes (ownership, permissions, quotas, replication factor, etc.)
    · Metadata about logical data sets (e.g. format, [...], etc.)
    · Data ingest & transformation lineage
  – Business Metadata
    · Description of file
    · Tags
  – There are components that allow you to add structure within Hadoop (e.g. Hive).

COBOL Copybook Metadata

• What is a COBOL Copybook?
  – In COBOL, a copybook file is used to define data elements that can be referenced by many programs.
• What is COBOL Copybook Metadata?
  – structure, definition

Metadata Describes structure & format of data

The demand for COBOL & legacy metadata is growing, according to the recent DATAVERSITY survey.

Graph Relationships

• Graph databases are ideal for analyzing metadata relationships between objects and finding patterns in those relationships.
• Common use cases for graph relationship metadata analysis include:
  – Fraud detection, e.g. financial transactions
  – Threat detection, e.g. [...] and phone patterns
  – Marketing, e.g. social media connections, product recommendation engines
  – Network optimization, e.g. IoT, Telecommunications
• In a graph database, “the metadata is the database”.

XML Metadata

• What is XML?
  – XML (Extensible Markup Language) is used to store and transport data. It’s often a complement to HTML, which is used to format the data.
• What is XML Metadata?
  – Similar to DDL, an XML Schema (XSD) defines the structure & format of data.

[Figure: an XML document (data) for an order shipment, “Ship to: John Smith, 123 Main ST, Boise, USA”, shown next to the XSD (metadata) that defines its structure]

JSON Metadata

• What is JSON?
  – JSON (JavaScript Object Notation) is a minimal, readable format for structuring data. It is used primarily to transmit data between a server and web application, as an alternative to XML.
• What is JSON Metadata?
  – structure, definition

For example, assume we have a JSON-based product catalog. This catalog has a product which has an id, a brand, a price, and an optional set of tags.

Data (example product in the API)

    {
      "id": 127849,
      "brand": "Super Cooler",
      "price": 12.50,
      "tags": ["camping", "sports"]
    }

Context needed (i.e. metadata)

• Can the ID contain letters?
• What is a brand?
• Is a price required?
• Etc.

Metadata (JSON Schema)

    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "title": "Product",
      "description": "A retail product from Acme's online catalog",
      "type": "object",
      "properties": {
        "id": {
          "description": "The unique identifier for a product",
          "type": "integer"
        },
        "brand": {
          "description": "The brand name of the product as shown in the online catalogue",
          "type": "string"
        },
        "price": {
          "type": "number"
        },
        "tags": {
          "type": "array",
          "items": { "type": "string" },
          "minItems": 1
        }
      },
      "required": ["id", "brand", "price"]
    }
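The questions on this slide ("Can the ID contain letters?", "Is a price required?") are the kind a schema answers mechanically. The Python sketch below checks only two things, required properties and top-level types, using a hand-written `check` function. It is a simplified stand-in written for this illustration, not a full JSON Schema validator; a dedicated library would be the usual choice for real validation.

```python
# A trimmed copy of the product schema from the slide (descriptions omitted).
SCHEMA = {
    "title": "Product",
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "brand": {"type": "string"},
        "price": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    },
    "required": ["id", "brand", "price"],
}

# Mapping from JSON Schema type names to Python types (covers only the names used here).
JSON_TYPES = {
    "integer": int,
    "number": (int, float),
    "string": str,
    "array": list,
    "object": dict,
}


def check(instance, schema):
    """Return a list of problems; an empty list means these limited checks passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property: {name}")
    for name, rules in schema.get("properties", {}).items():
        if name not in instance:
            continue
        value = instance[name]
        expected = JSON_TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name}: expected {rules['type']}")
    return problems


product = {"id": 127849, "brand": "Super Cooler", "price": 12.50, "tags": ["camping", "sports"]}
good = check(product, SCHEMA)
bad = check({"id": "A12", "price": 12.50}, SCHEMA)
```

Here `good` is an empty list, while `bad` reports that `brand` is missing and that `id` must be an integer. So the schema says no, the ID cannot contain letters, and yes, a price is required.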

IoT Metadata

• What is the IoT?
  – The Internet of Things (IoT) is a network of physical devices that are able to [...] data over a network.
• What is IoT Metadata?
  – Metadata is necessary to provide context around the readings generated by IoT devices, e.g. units of [...], type of measurement, etc.

Is 140 the temperature of my stove or my max heart rate on my run?

140

Document Metadata

• Document metadata provides context & background on the contents & purpose of the document.
• It is often found in the “Properties” menu.

User-defined
• Tags & Categories for search & organization
• Description/Comments about the contents of the document

System-defined
• Date/Time stamp when document was created, modified, etc.
• Author of the document

Image Metadata

• Metadata is critical for locating images online, as well as identifying copyright information, etc.
• Some information is system-generated, while other information is user-defined.

Technical Metadata (Embedded in Photo)
Camera:        Apple iPhone 6 Plus
Lens:          iPhone 6 Plus back camera 4.15mm f/2.2, shot at 4.2 mm
Digital Zoom:  5.006134969×
Exposure:      Auto exposure, Program AE, 1/7,937 sec, f/2.2, ISO 32
Flash:         Auto, Did not fire
Date:          April 13, 2016 5:35:53PM (timezone not specified) (1 month, 11 days, 14 hours, 14 minutes, 46 seconds ago, assuming timezone of US Pacific)
File:          3,264 × 2,448 JPEG (8.0 megapixels), 800,782 bytes (782 kilobytes)

Descriptive Metadata (User Defined)
Title:         DATAVERSITY EDW 2016 San Diego
Keywords:      EDW 2016, San Diego, Bay Photos
Location:      San Diego

Administrative Metadata (User Defined)
Author:        Donna Burbank
Copyright:     None
Licensing:     None

Social Media Metadata

• Metadata from social media, such as Twitter, can help identify trends and [...], for example.

Embedded Metadata
• Date/Time
• Location
• Language
• Author
• Device Used
• ID of Tweet
• Text/Content of Tweet
• Etc.

[Figure: example tweet annotated with Hashtags and # of Retweets (100)]

Open Data Metadata

• What is Open Data?
  – Open Data is data that can be freely used and redistributed by anyone.
• What is Open Data Metadata?
  – Metadata provides the context that makes open data usable and credible.

[Figure: questions surrounding the Data, with a feedback loop]
• When was it created or updated?
• When was it published?
• Who published it?
• What is the intended usage?
• How often is it refreshed?
• What are the security or usage restrictions?
• What keywords categorize this data?

Business Process Model Metadata

• Business Process Models describe key activities within the organization.
• Linking these processes to the data that is Created, Read, Updated, or Deleted (CRUD) is important to understanding data usage.

[Figure: Business Process Model]

CRUD Matrix
Data columns: Customer, Order, Account, Invoice, Product
(Each process step below lists its cell values in the order they appear; the column each value belongs to is not recoverable from the flattened table.)
Receive Customer Order:  R | C | C,R
Process Customer Order:  C,R,U | R,U | R
Fill Order:              R,U | R,U | R,U
Send Invoice:            R,U | R,U | C

Architectural Options for Metadata Management

• The following are common architectural options for metadata management within & between organizations.
• There is no “one size fits all” approach.
• They can be used together within the same organization.

[Figure: three architectural options: (1) Central, Enterprise-wide Metadata Repository; (2) Tool or Purpose-Specific Repository; (3) Metadata Exchange & Registry. Diagram labels: Publication & Sharing, Information Sharing & Standards, Reports, Web Portal, Integration & Export, Business Glossary, Matching & Population, ETL Tool, Reuse Logic, Interfaces, Metamodel(s), Database, Data Modeling Tool, Data Dictionary, BI Tool, Metadata Storage (Database), etc.]

Data Warehousing Example

• In the data warehousing example below, metadata for CUSTOMER exists in a number of tools & data stores.

[Figure: CUSTOMER metadata flowing across a Business Glossary, Logical, Physical, and Dimensional Data Models (CUSTOMER), ETL Tools, Database Tables (including CUST and TBL_C1), a BI Tool, and a Sales Report]

Metadata Discovery, Lineage & Matching

• Methods for metadata discovery, lineage, and matching have evolved over the years, and can include the following:
• Matching Rules: matching rules & logic based on the structure of the [...], e.g.
  – “Customer” can match to “CUST”, “CUSTOMER”, etc. in a relational database table
  – Database columns are the same if they have the same name & data type
• Patterns in Data Values: AI & pattern matching, e.g.
  – Email addresses generally include the format [email protected]/.org, etc.
  – A consistent pattern of values such as NNN-NN-NNNN is likely a social security number
• Image Pattern Recognition: detection of similar patterns in unstructured documents based on signal & image processing at the byte level, e.g.
  – These two documents both look like services contracts -> classify accordingly
• Human-defined Tagging: many systems allow user-defined “tags”, e.g.
  – Document tagging for search (e.g. “furry kittens”)
  – AWS tags for classification

Reuse & Matching Rules

• Reuse & Matching Rules help:
  – Rationalize common objects
  – Establish linkages between related objects
• Is it the same table if it has the same name (e.g. CUSTOMER)?
  – …if it has an approved variation of that name (e.g. CUST)
  – …if it has the same columns (e.g. first name, last name, gender)
  – …if the columns are in the same order
  – …if it has the same description
  – Etc.

Rationalization
The CUSTOMER table on Oracle is represented by a Physical data model of the same object. The data model has a description of the table (“A customer is a person or organization with an active account.”), so let’s create a single object combining both sources.

Lineage
The CUSTOMER table on Oracle is represented by a Physical data model of the same object. Transformation rules are applied to the table and it is integrated with other sources to create the CUSTOMER table in the Staging area. This table is then transformed to the dimensional CUSTOMER table.
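The matching questions above translate fairly directly into code. The following Python sketch treats two tables as the same object when their names resolve to the same canonical name and their columns agree, optionally in the same order. The `APPROVED_VARIATIONS` table and both function names are assumptions made for this illustration; a real metadata tool would have its own rule configuration.

```python
# Approved name variations are an assumption for illustration, based on the
# CUSTOMER / CUST example on the slide.
APPROVED_VARIATIONS = {"CUSTOMER": {"CUSTOMER", "CUST"}}


def canonical_name(table_name):
    """Resolve a table name to its canonical form using the approved variations."""
    upper = table_name.upper()
    for canonical, variations in APPROVED_VARIATIONS.items():
        if upper in variations:
            return canonical
    return upper


def same_object(table_a, table_b, require_same_order=False):
    """Each table is a (name, [column names]) pair; compare by name rule, then columns."""
    name_a, cols_a = table_a
    name_b, cols_b = table_b
    if canonical_name(name_a) != canonical_name(name_b):
        return False
    norm_a = [c.lower() for c in cols_a]
    norm_b = [c.lower() for c in cols_b]
    if require_same_order:
        return norm_a == norm_b
    return set(norm_a) == set(norm_b)


oracle_table = ("CUSTOMER", ["first_name", "last_name", "gender"])
staging_table = ("CUST", ["last_name", "first_name", "gender"])

loose_match = same_object(oracle_table, staging_table)
strict_match = same_object(oracle_table, staging_table, require_same_order=True)
```

With these inputs the loose rule matches and the strict rule does not, which shows why the choice of rule (name only, columns, column order, description) has to be an explicit decision rather than an accident of the tool.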

AI & Pattern Matching

• Many metadata solutions can detect patterns in the data and infer classification and linkage based on the data values themselves.

[email protected] -> Classify as: email addresses

98748 Elkhorn Way
18 Winding Court Rd -> Classify as: physical addresses
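A very simple, rule-based version of value-pattern classification can be sketched with regular expressions. It is much cruder than the AI-driven detection the slide refers to, and the patterns, the 80% threshold, and the `classify_column` name are all assumptions chosen for illustration. It covers email addresses and the NNN-NN-NNNN social security pattern; physical addresses are left out because they need more than a single regular expression.

```python
import re

# Illustrative patterns only; real-world detection needs far more robust rules.
PATTERNS = {
    "email address": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "social security number": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def classify_column(values, threshold=0.8):
    """Return the first label whose pattern matches at least `threshold` of the values."""
    values = [v for v in values if v]
    if not values:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return label
    return None


emails = classify_column(["jane@example.com", "raj@example.org", "li@example.net"])
ssns = classify_column(["123-45-6789", "987-65-4321"])
addresses = classify_column(["98748 Elkhorn Way", "18 Winding Court Rd"])
```

The threshold lets a column be classified even when a few values are dirty. Columns that match no pattern, such as the street addresses here, come back as `None` and are left for another technique or for a person to tag.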

Image Pattern Recognition

• Some vendors are using signal and image processing techniques to classify documents.
• Based on byte-level patterns in the documents, patterns & classifications can be inferred.

Classify as: Bank Check

Tagging

• Many solutions allow users to create user-defined “tags” that classify & follow the data.
• A tag is a non-hierarchical keyword that can be assigned to a piece of information, often for search & classification (think “tagging photos”).

Amazon S3
• For S3 buckets, user-defined metadata tags can be stored in a series of key-value pairs. For example, security classifications & retention policies could be managed this way.
• Tags can be assigned at the bucket and file level.
• Metadata can be assigned when uploading an object, manually, via PUT/POST requests, or via the REST or SOAP API. Metadata tags can also be retrieved via the API.
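As a sketch of what such key-value tags look like in practice, the Python below builds and reads the `TagSet` structure that the boto3 S3 client accepts in calls such as `put_object_tagging` and `put_bucket_tagging`. No AWS call is made here, so the example runs without credentials. The tag keys and values (`classification`, `retention`) are hypothetical examples of the security and retention tags the slide mentions, and the two helper functions are names invented for this illustration.

```python
def to_tagging(tags):
    """Convert a plain dict into the {'TagSet': [{'Key': ..., 'Value': ...}]} shape."""
    return {"TagSet": [{"Key": key, "Value": value} for key, value in tags.items()]}


def from_tagging(tagging):
    """Convert a TagSet structure (as returned when tags are read back) into a dict."""
    return {tag["Key"]: tag["Value"] for tag in tagging["TagSet"]}


# Hypothetical tags for a security classification and a retention policy.
tags = {"classification": "confidential", "retention": "7-years"}
tagging = to_tagging(tags)

# With boto3 installed and credentials configured, the structure could be sent with
# something like the following (bucket and key names are placeholders):
#   s3 = boto3.client("s3")
#   s3.put_object_tagging(Bucket="my-bucket", Key="reports/q1.csv", Tagging=tagging)

round_trip = from_tagging(tagging)
```

Keeping tags as a flat dictionary in application code and converting at the edge makes it easy to apply the same classification vocabulary to other tagged resources.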

Summary

• Metadata is more important than ever
• Metadata is part of wider enterprise data management
  – Core use cases around data governance, data quality, and master data management remain
  – New use cases around Big Data and data science are emerging
• There are a variety of sources of metadata across the organization
  – Metadata isn’t just for relational databases
  – The concept of “database” is evolving
• There are a number of technical innovations & best practices in managing metadata, including:
  – Matching Rules
  – Patterns in Data Values
  – Image Pattern Recognition
  – Tagging

About Global Data Strategy, Ltd
Data-Driven Business Transformation

• Global Data Strategy is an international information management consulting company that specializes in the alignment of business drivers with data-centric technology.
• Our passion is data, and helping organizations enrich their business opportunities through data and information.
• Our core values center around providing solutions that are:
  – Business-Driven: We put the needs of your business first, before we look at any technology solution.
  – [...] & Relevant: We provide clear explanations using real-world examples.
  – Customized & Right-Sized: Our implementations are based on the unique needs of your organization’s size, corporate culture, and [...].
  – High Quality & Technically Precise: We pride ourselves in excellence of execution, with years of technical expertise in the industry.

[Figure: Business Strategy Aligned With Data Strategy]

Visit www.globaldatastrategy.com for more information

Contact Info

• Email: [email protected]
• Twitter: @donnaburbank, @GlobalDataStrat
• Website: www.globaldatastrategy.com

White Paper: Emerging Trends in Metadata Management
Free Download

• Download from www.dataversity.net
• Also available on www.globaldatastrategy.com

Lessons in Data Modeling Series
This Year’s Line Up

• January 26th: How Data Modeling Fits Into an Overall [...]
• February 23rd: Data Modeling and Business Intelligence
• March: Conceptual Data Modeling – How to Get the Attention of Business Users
• April: The Evolving Role of the Data Architect – What does it mean for your Career?
• May: Data Modeling & Metadata Management
• June: Self-Service [...], Data Wrangling, Data Munging, and Data Modeling
• July: Data Modeling & Metadata for Graph Databases (Next week!)
• August: Data Modeling & [...]
• September: Data Modeling & MDM
• October: Agile & Data Modeling – How Can They Work Together?
• December: Data Modeling, Data Quality & Data Governance

DATAVERSITY Training Center
Online Training Courses

Metadata Management Course
• Learn the basics of Metadata Management and practical tips on how to apply metadata management in the real world. This online course hosted by DATAVERSITY provides a series of six courses:
  – What is Metadata
  – The Business Value of Metadata
  – Sources of Metadata
  – Metamodels and Metadata Standards
  – Metadata Architecture, Integration, and Storage
  – Metadata Strategy and Implementation
• Purchase all six courses for $399 or individually at $79 each.
• Other courses are available on Data Governance & Data Quality.

Visit: http://training.dataversity.net/lms/

Questions?
Thoughts? Ideas?
