GDPR Readiness and Lineage for Oracle Cloud

www.dimensionality.ch @Nephentur freenode | obihackers

Who am I?

• Oracle ACE Director Business Analytics
• Oracle Analytics since 2001 (nQuire + Peregrin acquisitions by Siebel)
• Speaker at OpenWorld, KScope, User Groups and open-source conferences
• Part-time blogger on Analytics, BI, DWH, Data Science (http://dimensionality.ch)
• Full-time IRC (freenode | #obihackers)
• ODC and OCCC community advocate
• Trainer for Oracle University since 2006

500+ Technical Experts Helping Peers Globally

3 Membership Tiers:
• Oracle ACE Director
• Oracle ACE
• Oracle ACE Associate

Connect: bit.ly/OracleACEProgram, [email protected], Facebook.com/oracleaces, @oracleace

Nominate yourself or someone you know: acenomination.oracle.com

“Thanks” to this guy

GDPR - Yes, this is still a topic

GDPR

25 May 2018

GDPR – the stuff you’ve heard 500 times

Still didn’t hear of any big lawsuit yet, right?

General Data Protection Regulation
– Approved by EU Parliament in April 2016
– Enforcement date: 25 May 2018 (fines started from that date)

It is already in place!

Some key points to remember:
– The same across the European Union
– “Personal data”: any information relating to a person who can be identified (directly or indirectly)
– Fines: a lot of money! Up to €20 million or 4% of global annual revenue (the greater of the two)
– Data Protection Officer
– Privacy management
– Breach & Notification
– Data subject access requests
– Data retention
– Right to be forgotten

So everybody did everything correctly and we’re cool, right? Not really, no…

GDPR

Trying to keep it simple:
• Know where the data is stored in your company
• Know who has access (you can’t allow full DB access anymore)

Over the last 12-24 months GDPR has been a key topic at conferences, mainly in one track:
• Which DB stores what?
• Who has access to the DB?

ERP/CRM streams also covered the topic, as they are often the entry point where data is gathered

Overconfidence

The DB stores all data? GDPR compliance is easy!
I control my DB, I control security in my DB, I can do auditing on it. Nothing to worry about.

GDPR compliant

Analytics and GDPR articles of law

• Article 6 – Lawfulness of processing
• Article 18 – Right to restriction of processing
• Article 21 – Right to object
• Article 22 – Automated individual decision-making, including profiling

Analytics and GDPR issues

• Multiplication across technologies and places
• Purposeless storage
• “I may need it” syndrome
• “IT takes too long” syndrome
• Post-fact data modelling / mashups
• Data prep / data enrichment
• Citizen data science capabilities

Pretty much all the cool toys we always wanted. Thanks, GDPR…

Analytics and GDPR issues

—also known as data enrichment— isn't new. In fact, I can almost guarantee that every analytics deployment out there has their users doing some kind of data preparation to support their visualizations.” -- Barry Mostert, Oracle

And then there’s “Analytics”…

Oracle (Autonomous) Analytics Cloud

Oracle Analytics Cloud

• Oracle’s complete suite of Platform Services (PaaS) for unified analytics in the cloud

• Delivered entirely in the cloud:
  ‣ No infrastructure footprint
  ‣ Flexibility to immediately scale up or down
  ‣ Simplified, metered licensing

• Several options to suit your needs:
  ‣ Oracle or customer/partner managed
  ‣ Functionality bundled into 3 editions

Functionalities

OAC supports every type of analytics workload

• Classic enterprise BI:
  ‣ Analysis & dashboarding
  ‣ Published reporting
  ‣ Enterprise Performance Management

• Modern departmental/personal discovery:
  ‣ Extended data mashup & modelling
  ‣ Data preparation, exploration & visualisation
  ‣ Data science &

Classic Enterprise BI

• Similar User Experience to OBIEE 12c
  – Centrally maintained & governed
  – Semantic model remains key
• Interactive Dashboards
  – Ideal for KPI measurement & monitoring
  – Guided navigation paths
• BI Publisher
  – Highly formatted, burst outputs
• Action Framework
  – Navigation actions
  – Scheduled agents

Modern Data Discovery

• Data Preparation
  – Acquire data from multiple connections
  – Apply data enrichments prior to analysis
  – Define repeatable preparation flows
• Data Visualisation
  – Create visual insights rapidly
  – Construct narrated storyboards
  – Share findings
• Machine Learning
  – Build & train ML models
  – Apply model to new data sets

Mobile Options

• Mobile Web & BI Mobile App
  – All DV projects will auto-render on mobile devices
  – The heritage mobile app supports all OAC content
• Synopsis Mobile App
  – Automatic Excel/CSV ingestion & analysis
  – Extending to all DV supported sources
• Day by Day
  – Included within Enterprise Edition
  – Search driven analytics
  – Voice recognition allows you to verbalise questions
  – Embedded learning enables a tailored experience

Two Service Options

Analytics Cloud / Autonomous Analytics Cloud

Services managed by Oracle / Services managed by you:
• Backup & Recovery
• Service Monitoring
• Patching & Upgrades
• Test & Production instances

Based on Oracle Cloud Infrastructure Classic / Based on Oracle Cloud Infrastructure (OCI)

Out of scope … for the moment

* source neo4j

Topic set, time to dive into graphs …

THIS is not a “Graph”

Graph Database – What’s that?

[Figure: anatomy of a property graph. Labelled parts: vertex (node), vertex ID, vertex properties, edge, directed edge, edge ID, edge label, edge properties]

Graph Database key factors

• No classic “model first”
• No predefined schema needed
• Completely flexible and extensible
• Alleviates painful (relational) shortcomings

Graph Database – What’s that?

A familiar example of graphs and graph analytics is travelling from a location A to a location B: finding the shortest path between 2 nodes of the graph
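The A-to-B example can be sketched in a few lines of Python. This is only an illustration, assuming an unweighted, directed graph and a made-up road network; it uses a breadth-first search, which finds a path with the fewest edges, and is not the algorithm of any particular graph product.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    # Build an adjacency list from directed (source, target) pairs.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    # Breadth-first search: a node is first reached via a shortest path.
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical road network between made-up locations.
roads = [("A", "C"), ("A", "D"), ("C", "E"), ("D", "B"), ("E", "B")]
print(shortest_path(roads, "A", "B"))  # ['A', 'D', 'B']
```

With weighted edges (distances, travel times) a weighted algorithm such as Dijkstra's would be needed instead.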

Graphs for Auditing – Data Lineage

Can you imagine using a graph for auditing from a GDPR point of view?

• Data is first created / inserted (inside or outside the company)
• Data is moved around
• Data is transformed
• Data is consumed by users or other processes

Data lineage is a perfect match for a graph. Data lifecycle steps can be tracked and navigated, node by node, following edges and using properties.
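A minimal sketch of that idea, assuming a tiny in-memory property graph: vertices carry properties, edges carry a label, and a traversal answers "where does this piece of personal data end up?". All object names, labels and properties below are hypothetical.

```python
# Vertices carry properties; edges carry a label, as in a property graph.
vertices = {
    "crm.customer.email": {"type": "column", "personal_data": True},
    "stg.customer.email": {"type": "column", "personal_data": True},
    "dwh.dim_customer.email": {"type": "column", "personal_data": True},
    "report.churn_list": {"type": "report"},
    "user.analyst_1": {"type": "user"},
}
edges = [
    ("crm.customer.email", "moved_to", "stg.customer.email"),
    ("stg.customer.email", "transformed_into", "dwh.dim_customer.email"),
    ("dwh.dim_customer.email", "used_by", "report.churn_list"),
    ("report.churn_list", "consumed_by", "user.analyst_1"),
]

def downstream(start):
    """Return every vertex reachable from start by following edges forward."""
    reached, stack = [], [start]
    while stack:
        node = stack.pop()
        for source, _label, target in edges:
            if source == node and target not in reached:
                reached.append(target)
                stack.append(target)
    return reached

# Everything that ends up holding or seeing the CRM e-mail address.
for name in downstream("crm.customer.email"):
    print(name, vertices[name])
```

The same traversal run against the edges in reverse would answer the opposite question: where did the data shown in a given report come from?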

Graphs for Auditing – Analytics

[Figure: lineage graph model for the analytics platform]
Physical column (Physical layer) –[mapped to]– Logical column (BMM layer) –[reference]– Presentation column (Presentation layer) –[reference]– …
Dashboard (Catalog) –[contains]– Dashboard page (Catalog) –[contains]– Analysis (Catalog)
Catalog ACL
LDAP User (LDAP) –[member of]– LDAP Group (LDAP) –[member of]– Application role (Security)

Data Lineage on Steroids

This is a perfect match with a graph database to track data lineage!

For your information, data lineage graph size:

OBIEE (12.2.1.1.0) Sample Application v607:
– 45'700 nodes
– 105'406 edges

OBIA (BIAPPS 10.2) RPD + Catalog on OAC (no security):
– 850'393 nodes
– 1'717'554 edges

Graphs for Auditing – ETL / ELT

Breaks down into:
1. Source(s)
2. Target(s)
3. Transformations

Perfect match for a graph database to track data lineage!

Graphs for Auditing – Database

A database can be seen as…
• A set of schemas
• A schema can have one or many tables
• A table has columns
• Various users/schemas can have access to some objects
• Objects can be used by other objects
  – Synonyms, views etc.
• Users run queries using objects
• Users can generate “new” data from the results of queries

Again – perfect match for a graph database to track data lineage!

Graph analysis…in the database

Graph in Oracle DB – Creation

What you need:
• Oracle Database 12c R2 or newer
• Extended Data Types (to allow VARCHAR2 values longer than 4’000 bytes)

BEGIN
  OPG_APIS.CREATE_PG('sa607', 4, 8, '');
END;

Tables created for the graph:
GE$ : edges of the graph
VT$ : vertices of the graph
GT$ : graph skeleton
IT$ : text index metadata
SS$ : graph snapshots

Graph in Oracle DB – Loading

The graph can be loaded with SQL, using standard “INSERT” statements into the tables

• The graph can also be loaded with Java / Python using one of the “utility” methods of the OraclePropertyGraphUtils class
• In Python this can be done with JPype or with GraalVM, released by Oracle not long ago (warning: Python support in GraalVM is still limited and “fragile”)

• Example utility methods:
  – OraclePropertyGraphUtils.convertRDBMSTable2OPV
  – OraclePropertyGraphUtils.convertRDBMSTable2OPE
  – OraclePropertyGraphUtils.convertCSV2OPV
  – OraclePropertyGraphUtils.convertCSV2OPE

• More methods exist to generate 2 files in the OPV/OPE format (flat text files: one for vertices, one for edges)

Graph in Oracle DB – Querying

• Normal tables, normal queries
• Specialised graph algorithms are supported: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/spgdg/OPG_APIS-reference.html

Graph in Oracle DB – Query limitations

Not much of use can be done with classic SQL beyond:
– Analysing edge/vertex properties and labels
– Counting
– Finding simple connections like (A) –[connected to]-> (B)
– More complex paths require hierarchical queries, as each edge “maps” a source vertex to a target vertex
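To illustrate why multi-hop paths need hierarchical queries, here is a small sketch using Python's built-in sqlite3 module and a recursive common table expression. The `edges` table and its `svid` / `el` / `dvid` columns are a simplified, hypothetical stand-in and not the real structure of the Oracle graph tables; on Oracle the equivalent would be a recursive WITH clause or CONNECT BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for an edge table: source vertex, edge label, target vertex.
conn.execute("CREATE TABLE edges (svid TEXT, el TEXT, dvid TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("physical_col", "mapped to", "logical_col"),
        ("logical_col", "reference", "presentation_col"),
        ("presentation_col", "reference", "analysis"),
        ("other_col", "mapped to", "other_logical"),
    ],
)

# A single hop is a plain filter on the edge table.
one_hop = conn.execute(
    "SELECT dvid FROM edges WHERE svid = 'physical_col'"
).fetchall()

# Multi-hop paths need recursion: each step joins the edge table again.
paths = conn.execute(
    """
    WITH RECURSIVE walk(vid, depth, path) AS (
        SELECT 'physical_col', 0, 'physical_col'
        UNION ALL
        SELECT e.dvid, w.depth + 1, w.path || ' -> ' || e.dvid
        FROM walk AS w
        JOIN edges AS e ON e.svid = w.vid
        WHERE w.depth < 10  -- depth guard so a cycle cannot loop forever
    )
    SELECT depth, path FROM walk ORDER BY depth
    """
).fetchall()

for depth, path in paths:
    print(depth, path)
```

Every extra hop is another self-join on the edge table, which is why this kind of question is more comfortable in a graph query language or a graph tool.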

Graph in Oracle DB – Advantages

• Standard manipulation of data with SQL can be useful (mass updates)
• SCD2-like management of data with effective date columns keeps the graph smaller (instead of full snapshots all the time)
  – NOT supported by a “vanilla graph”
• Standard backup and restore (remember: it’s “just” tables in the DB … kind of)
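The SCD2-like idea can be sketched as follows: instead of storing a full snapshot of the graph per load, each edge version carries a validity interval, and the graph "as of" a date is obtained by filtering. The column layout, the open-ended date marker and the group names are assumptions made for this illustration only.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional marker for "still valid"

# Each edge version: source, label, target, effective_from, effective_to.
edge_versions = [
    ("Leslie Emerson", "member of", "Sales", date(2018, 1, 1), date(2018, 6, 30)),
    ("Leslie Emerson", "member of", "Finance", date(2018, 7, 1), OPEN_END),
    ("Sales", "member of", "BI Consumer", date(2018, 1, 1), OPEN_END),
]

def edges_as_of(day):
    """Return the (source, label, target) edges that were valid on the given day."""
    return [
        (source, label, target)
        for source, label, target, valid_from, valid_to in edge_versions
        if valid_from <= day <= valid_to
    ]

print(edges_as_of(date(2018, 3, 1)))   # graph as it was in March
print(edges_as_of(date(2018, 9, 1)))   # graph as it was in September
```

Only changed edges get a new row, so an auditor can still rebuild the graph as it looked on any given day without keeping one copy per load.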

Connecting all the pieces – Visual Graph Analysis for GDPR

Graph Analysis with Cytoscape

[Figure: Cytoscape view of the lineage graph, with the Catalog and RPD regions labelled]

From 45'700 nodes with 105'406 edges to 85 nodes with 218 edges in seconds

OAC – Catalog Structures and Shortcuts

OAC – RPD Aliases

OAC – Security

ANY Active Directory, LDAP, DB user/group store etc.

OAC – Security Inheritance

What groups/application roles is “Leslie Emerson” part of directly or indirectly?
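In graph terms this question is a walk along the “member of” edges. A minimal sketch, assuming a hypothetical membership structure (the group and role names below are invented, only the user name comes from the slide):

```python
# "member of" edges: principal -> groups / application roles it belongs to.
member_of = {
    "Leslie Emerson": ["Sales EMEA"],
    "Sales EMEA": ["Sales"],
    "Sales": ["BI Consumer"],
    "BI Consumer": ["Authenticated User"],
}

def memberships(principal):
    """Split a principal's memberships into direct and inherited ones."""
    direct = set(member_of.get(principal, []))
    inherited, todo = set(), list(direct)
    while todo:
        current = todo.pop()
        for parent in member_of.get(current, []):
            # Skip anything already recorded, so cycles cannot loop forever.
            if parent != principal and parent not in direct and parent not in inherited:
                inherited.add(parent)
                todo.append(parent)
    return direct, inherited

direct, inherited = memberships("Leslie Emerson")
print("direct:", sorted(direct))
print("inherited:", sorted(inherited))
```

Keeping direct and inherited memberships apart matters for auditing: revoking access means removing the direct edge that grants it, which is not always the one an administrator expects.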


OAC – Data Sets

The Excel mafia is alive and kicking

Conclusion: Graphs on OAC alleviate GDPR

The free structure of a graph, which represents information as connections between nodes, makes it possible to store any kind of data lineage: from a DB, ETL or Analytics system

Graph analysis can be performed with multiple languages and tools: visually or by code/script

Not only for GDPR: graphs can represent any kind of information

Self-service GDPR

[Figure labels: Cloud, Visualization, OAC, Graphs, Analytics, Security, Data Science]