GDPR Readiness and Data Lineage for Oracle Analytics Cloud
www.dimensionality.ch @Nephentur freenode | obihackers slide 1 Who am I?
• Oracle ACE Director Business Analytics
• Oracle Analytics since 2001 (nQuire + Peregrin acquisitions by Siebel)
• Speaker at OpenWorld, KScope, User Groups and open-source conferences
• Part-time blogger on Analytics, BI, DWH, Data Science (http://dimensionality.ch)
• Full-time IRC (freenode | #obihackers)
• ODC and OCCC community advocate
• Trainer for Oracle University since 2006
500+ Technical Experts Helping Peers Globally
3 Membership Tiers:
• Oracle ACE Director
• Oracle ACE
• Oracle ACE Associate
Connect: bit.ly/OracleACEProgram, [email protected], Facebook.com/oracleaces, @oracleace
Nominate yourself or someone you know: acenomination.oracle.com
“Thanks” to this guy
GDPR - Yes, this is still a topic
GDPR
25 May 2018
GDPR – the stuff you’ve heard 500 times
General Data Protection Regulation
– Approved by the EU Parliament in April 2016
– Enforcement date: 25 May 2018 (fines started from that date)
– It is already in place!
Some key points to remember:
– The same across the European Union
– “Personal data”: any information relating to a person who can be identified (directly or indirectly)
– Fines: a lot of money! Up to €20 million or 4% of global revenue (the greater of the two)
– Data Protection Officer
– Privacy management
– Breach & Notification
– Data subject access requests
– Data retention
– Right to be forgotten
Still didn’t hear of any big lawsuit yet, right? So everybody did everything correct and we’re cool, right?
Not really, no…
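As a quick illustration of the fine ceiling quoted above (the greater of €20 million or 4% of global revenue), here is a minimal sketch. The function name and the sample revenue figures are made up for the example, and it only computes the upper bound, not what a regulator would actually impose.

```python
def max_gdpr_fine(annual_global_revenue_eur: float) -> float:
    """Upper bound of the fine: the greater of EUR 20 million or 4% of revenue."""
    return max(20_000_000.0, 0.04 * annual_global_revenue_eur)


# Hypothetical companies: for the smaller one the EUR 20 million figure is the
# larger amount, for the bigger one the 4% figure is.
small_company = max_gdpr_fine(100_000_000)
large_company = max_gdpr_fine(2_000_000_000)
print(small_company, large_company)
```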
GDPR
Trying to keep it simple:
• Know where the data is stored in your company
• Know who has access (can’t allow full DB access anymore)
Over the last 12-24 months GDPR has been a key topic at conferences … mainly in the database track:
• Which DB stores what?
• Who has access to the DB?
ERP/CRM streams also covered the topic as they often are the entry point where data is gathered
Overconfidence
The DB stores all data? GDPR compliance is easy!
I control my DB, I control security in my DB, I can do auditing on it. Nothing to worry about.
GDPR compliant
Analytics and GDPR articles of law
• Article 6 – Lawfulness of processing
• Article 18 – Right to restriction of processing
• Article 21 – Right to object
• Article 22 – Automated individual decision-making, including profiling
Analytics and GDPR issues
• Multiplication across technologies and places
• Purposeless storage
• “I may need it” syndrome
• “IT takes too long” syndrome
• Post-fact data modelling / mashups
• Data prep / data enrichment
• Citizen data science capabilities
Pretty much all the cool toys we always wanted. Thanks, GDPR…
Analytics and GDPR issues
“Data preparation—also known as data enrichment— isn't new. In fact, I can almost guarantee that every analytics deployment out there has their users doing some kind of data preparation to support their visualizations.” -- Barry Mostert, Oracle
And then there’s “Analytics”…
Oracle (Autonomous) Analytics Cloud
Oracle Analytics Cloud
• Oracle’s complete suite of Platform Services (PaaS) for unified analytics in the cloud
• Delivered entirely in the cloud:
  ‣ No infrastructure footprint
  ‣ Flexibility to immediately scale up or down
  ‣ Simplified, metered licensing
• Several options to suit your needs:
  ‣ Oracle or customer/partner managed
  ‣ Functionality bundled into 3 editions
Functionalities
OAC supports every type of analytics workload
• Classic enterprise BI:
  ‣ Analysis & dashboarding
  ‣ Published reporting
  ‣ Enterprise Performance Management
• Modern departmental/personal discovery:
  ‣ Extended data mashup & modelling
  ‣ Data preparation, exploration & visualisation
  ‣ Data science & machine learning
Classic Enterprise BI
• Similar User Experience to OBIEE 12c
  – Centrally maintained & governed
  – Semantic model remains key
• Interactive Dashboards
  – Ideal for KPI measurement & monitoring
  – Guided navigation paths
• BI Publisher
  – Highly formatted, burst outputs
• Action Framework
  – Navigation actions
  – Scheduled agents
Modern Data Discovery
• Data Preparation
  – Acquire data from multiple connections
  – Apply data enrichments prior to analysis
  – Define repeatable preparation flows
• Data Visualisation
  – Create visual insights rapidly
  – Construct narrated storyboards
  – Share findings
• Machine Learning
  – Build & train ML models
  – Apply models to new data sets
Mobile Options
• Mobile Web & BI Mobile App
  – All DV projects will auto-render on mobile devices
  – The heritage mobile app supports all OAC content
• Synopsis Mobile App
  – Automatic Excel/CSV ingestion & analysis
  – Extending to all DV supported sources
• Day by Day
  – Included within Enterprise Edition
  – Search driven analytics
  – Voice recognition allows you to verbalise questions
  – Embedded learning enables a tailored experience
Two Service Options
Out of scope … for the moment
Two options: Analytics Cloud and Autonomous Analytics Cloud
Services managed by Oracle or by you:
– Backup & Recovery
– Service Monitoring
– Patching & Upgrades
– Test & Production instances
Underlying platform: Oracle Cloud Infrastructure Classic or Oracle Cloud Infrastructure (OCI)
* source neo4j
Topic set, time to dive into graphs …
THIS is not a “Graph”
Graph Database – What’s that?
[Figure: anatomy of a property graph, labelled with: vertex (node), vertex ID, vertex properties, edge, directed edge, edge ID, edge label, edge properties]
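The elements named in the diagram (vertices and edges, each with an ID and properties, plus a label and a direction on edges) can be sketched as a small in-memory structure. This is only an illustration: the class names, method names and sample values are hypothetical and are not the API of any graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: int                                   # vertex ID
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    eid: int                                   # edge ID
    source: int                                # vertex ID the edge starts from
    target: int                                # vertex ID the edge points to
    label: str                                 # edge label
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Schema-free container: any vertex or edge can carry any properties."""

    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vid, **properties):
        self.vertices[vid] = Vertex(vid, dict(properties))
        return self.vertices[vid]

    def add_edge(self, eid, source, target, label, **properties):
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[eid] = Edge(eid, source, target, label, dict(properties))
        return self.edges[eid]

    def out_edges(self, vid, label=None):
        """Directed edges leaving a vertex, optionally filtered by label."""
        return [e for e in self.edges.values()
                if e.source == vid and (label is None or e.label == label)]


g = PropertyGraph()
g.add_vertex(1, name="CUSTOMER.EMAIL", kind="Physical column")
g.add_vertex(2, name="Customer Email", kind="Logical column")
g.add_edge(100, 1, 2, "mapped to", since="2018-05-25")
print(g.out_edges(1, "mapped to"))
```

No schema is declared up front: a new property or a new edge label is just another key or string, which is the flexibility the following slide refers to.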
Graph Database key factors
• No classic “model first”
• No predefined schema needed
• Completely flexible and extensible
• Alleviates painful (relational) shortcomings
Graph Database – What’s that?
Everyday examples of graphs and graph analytics appear when travelling from a location A to a location B: finding the shortest path between 2 nodes of the graph.
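To make the A-to-B example concrete, here is a minimal sketch of a shortest-path search using breadth-first search on an unweighted, directed graph. The road network is invented for the example; real routing would use edge weights (distance, time) and an algorithm such as Dijkstra's instead.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search: returns the path with the fewest edges, or None."""
    previous = {start: None}          # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical locations: two routes lead from A to B, one shorter than the other.
roads = {
    "A": ["C", "D"],
    "C": ["E"],
    "D": ["B"],
    "E": ["B"],
}
route = shortest_path(roads, "A", "B")
print(route)
```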
Graphs for Auditing – Data Lineage
Can you imagine using a graph for auditing from a GDPR point of view?
• Data is first created / inserted (inside or outside the company)
• Data is moved around
• Data is transformed
• Data is consumed by users or other processes
Data lineage is a perfect match for a graph. Data lifecycle steps can be tracked and navigated, node by node, following edges and using properties.
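A minimal sketch of "navigating node by node, following edges": each lifecycle step becomes a labelled edge, and a traversal from one data element lists everything that happens to it downstream. All object names and edge labels below are hypothetical.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source, edge label, target).
LINEAGE = [
    ("crm.customer.email", "moved to", "staging.customer.email"),
    ("staging.customer.email", "transformed into", "dwh.dim_customer.email_hash"),
    ("dwh.dim_customer.email_hash", "consumed by", "report: Churn overview"),
    ("crm.customer.email", "consumed by", "export: newsletter.csv"),
    ("erp.invoice.amount", "moved to", "dwh.fact_sales.amount"),
]


def downstream(edges, start):
    """Return every lineage step reachable from `start`, following edge direction."""
    outgoing = defaultdict(list)
    for source, label, target in edges:
        outgoing[source].append((label, target))
    seen, stack, steps = {start}, [start], []
    while stack:
        node = stack.pop()
        for label, target in outgoing[node]:
            steps.append((node, label, target))
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return steps


steps = downstream(LINEAGE, "crm.customer.email")
for source, label, target in steps:
    print(f"{source} -[{label}]-> {target}")
```

Answering "where did this personal data end up?" then amounts to one traversal, regardless of how many systems the data crossed.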
Graphs for Auditing – Analytics
[Figure: lineage graph model for analytics, with these nodes and edge labels]
• Physical column (Physical layer) –[mapped to]– Logical column (BMM layer) –[reference]– Presentation column (Presentation layer)
• Dashboard (Catalog) –[contains]– Dashboard page (Catalog) –[contains]– Analysis (Catalog)
• A second “reference” edge, between the presentation layer and the catalog objects
• Catalog ACL
• LDAP User (LDAP) –[member of]– LDAP Group (LDAP) –[member of]– Application role (Security)
Data Lineage on Steroids
This is a perfect match with a graph database to track data lineage!
For your information, data lineage graph size:
OBIEE (12.2.1.1.0) Sample Application v607 : – 45'700 nodes – 105'406 edges
OBIA (BIAPPS 10.2) RPD + Catalog on OAC (no security) : – 850'393 nodes – 1'717'554 edges
Graphs for Auditing – ETL / ELT
Breaks down into:
1. Source(s)
2. Target(s)
3. Transformations
Perfect match for a graph database to track data lineage!
Graphs for Auditing – Database
A database can be seen as…
• A set of schemas
• A schema can have one or many tables
• A table has columns
• Various users/schemas can have access to some objects
• Objects can be used by other objects
  – Synonyms, views etc.
• Users run queries using objects
• Users can generate “new” data from the results of queries
Again – perfect match for a graph database to track data lineage!
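The database structure listed above can be turned into graph edges quite mechanically. The sketch below assumes a small hand-written catalog description (schema, table, view, grant and user names are all hypothetical); in practice the same information would come from the data dictionary, which is not queried here.

```python
# Hypothetical catalog description used only for this illustration.
CATALOG = {
    "schemas": {"SALES": {"CUSTOMERS": ["ID", "NAME", "EMAIL"]}},
    "views": {"SALES.V_CONTACTS": ["SALES.CUSTOMERS"]},
    "grants": {"REPORTING_USER": ["SALES.V_CONTACTS"]},
}


def catalog_to_edges(catalog):
    """Flatten schemas, tables, columns, views and grants into labelled edges."""
    edges = []
    for schema, tables in catalog["schemas"].items():
        for table, columns in tables.items():
            edges.append((schema, "has table", f"{schema}.{table}"))
            for column in columns:
                edges.append((f"{schema}.{table}", "has column",
                              f"{schema}.{table}.{column}"))
    for view, sources in catalog["views"].items():
        for source in sources:
            edges.append((view, "uses", source))
    for user, objects in catalog["grants"].items():
        for obj in objects:
            edges.append((user, "has access to", obj))
    return edges


def reachable(edges, start):
    """All nodes reachable from `start` by following edges in their direction."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for source, _label, target in edges:
            if source == node and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


db_edges = catalog_to_edges(CATALOG)
visible = reachable(db_edges, "REPORTING_USER")
print(sorted(visible))
```

The user was only granted a view, yet the traversal shows that the underlying EMAIL column is reachable through it, which is the kind of indirect access a plain list of grants does not reveal.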
Graph analysis…in the database
Graph in Oracle DB – Creation
What you need:
• Oracle Database 12c R2 or newer
• Extended Data Types (to allow VARCHAR2 values longer than 4’000 bytes)

-- Arguments: graph name, degree of parallelism, number of hash partitions, tablespace
BEGIN
  OPG_APIS.CREATE_PG('sa607', 4, 8, '');
END;

Tables created for the graph:
• GE$ : edges of the graph
• VT$ : vertices of the graph
• GT$ : graph skeleton
• IT$ : text index metadata
• SS$ : graph snapshots
Graph in Oracle DB – Loading
The graph can be loaded by SQL, doing standard “INSERT” into the tables
Graph in Oracle DB – Loading
• The graph can be loaded from Java / Python using one of the “utility” methods of the OraclePropertyGraphUtils class
• In Python this can be done with JPype or with GraalVM, released by Oracle not long ago (warning: Python support in GraalVM is still limited and “fragile”)
• Example utility methods:
  – OraclePropertyGraphUtils.convertRDBMSTable2OPV
  – OraclePropertyGraphUtils.convertRDBMSTable2OPE
  – OraclePropertyGraphUtils.convertCSV2OPV
  – OraclePropertyGraphUtils.convertCSV2OPE
• More methods exist to generate the 2 files in the OPV/OPE format (flat text files: one for vertices, one for edges)
Graph in Oracle DB – Querying
• Normal tables, normal queries
• Support for specialised graph algorithms: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/spgdg/OPG_APIS-reference.html
Graph in Oracle DB – Query limitations
Not much that is useful can be done with classic SQL:
– Analyse edge/vertex properties and labels
– Counting
– Find simple connections like (A) –[connected to]-> (B)
– More complex paths require hierarchical queries, as each edge only “maps” a source vertex to a target vertex
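To illustrate the difference between a simple connection and a multi-hop path in SQL, here is a sketch using Python's built-in sqlite3 module so that it runs anywhere. The `edges` table and its columns are a simplified, hypothetical stand-in, not the layout of the Oracle GE$ table, and the recursive `WITH` clause plays the role of the hierarchical query mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified edge table: source vertex ID, destination vertex ID, edge label.
conn.execute("CREATE TABLE edges (svid INTEGER, dvid INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, "connected to"), (2, 3, "connected to"),
     (3, 4, "connected to"), (5, 6, "connected to")],
)

# Simple case: direct neighbours of vertex 1 need only a plain filter.
direct = [row[0] for row in conn.execute("SELECT dvid FROM edges WHERE svid = 1")]

# Multi-hop case: walk outwards from vertex 1 with a recursive query.
# The depth limit guards against endless loops if the graph contains cycles.
rows = conn.execute(
    """
    WITH RECURSIVE walk(vid, depth) AS (
        SELECT dvid, 1 FROM edges WHERE svid = ?
        UNION
        SELECT e.dvid, w.depth + 1
        FROM edges e JOIN walk w ON e.svid = w.vid
        WHERE w.depth < 10
    )
    SELECT vid, MIN(depth) FROM walk GROUP BY vid ORDER BY vid
    """,
    (1,),
).fetchall()

print(direct)
print(rows)
```

Even for this tiny graph the multi-hop query is noticeably heavier than the one-hop filter, which is why dedicated graph tooling becomes attractive for lineage questions.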
Graph in Oracle DB – Advantages
• Standard manipulation of data by SQL can be useful (mass updates)
• SCD2-like management of data with effective date columns, to keep the graph smaller (instead of full snapshots all the time)
  – NOT supported by “vanilla graph”
• Standard backup and restore (remember: it’s “just” tables in the DB … kind of)
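The SCD2-like idea can be sketched as follows: instead of storing a full snapshot of the graph per load, each edge carries effective dates, and a point-in-time view is a simple filter. The field names (`valid_from`, `valid_to`), the open-ended marker and the sample memberships are assumptions made for this illustration.

```python
from datetime import date

OPEN_END = date.max  # marks an edge that is still current

# Hypothetical edges with effective dates instead of one snapshot per load.
edges = [
    {"source": "leslie", "label": "member of", "target": "Sales",
     "valid_from": date(2018, 1, 1), "valid_to": date(2018, 6, 30)},
    {"source": "leslie", "label": "member of", "target": "Finance",
     "valid_from": date(2018, 7, 1), "valid_to": OPEN_END},
]


def edges_as_of(edges, day):
    """Edges that were effective on the given day."""
    return [e for e in edges if e["valid_from"] <= day <= e["valid_to"]]


def close_edge(edges, source, label, target, last_day):
    """End-date the current version of an edge instead of deleting it."""
    for e in edges:
        if ((e["source"], e["label"], e["target"]) == (source, label, target)
                and e["valid_to"] == OPEN_END):
            e["valid_to"] = last_day


on_gdpr_day = [e["target"] for e in edges_as_of(edges, date(2018, 5, 25))]
in_december = [e["target"] for e in edges_as_of(edges, date(2018, 12, 1))]
close_edge(edges, "leslie", "member of", "Finance", date(2018, 12, 31))
next_year = edges_as_of(edges, date(2019, 1, 1))
print(on_gdpr_day, in_december, next_year)
```

History is kept for auditing ("who had access on that date?") while only changed edges add rows.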
Connecting all the pieces – Visual Graph Analysis for GDPR
Graph Analysis with Cytoscape
[Figure: Cytoscape view of the lineage graph, with the Catalog and RPD regions labelled]
From 45700 nodes with 105406 edges, to 85 nodes with 218 edges in seconds
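The reduction described above boils down to keeping only the part of the graph connected to the object of interest. The sketch below shows one way to express that filter in code, assuming edges point in the direction the data flows; it is not how Cytoscape works internally, and all node names are hypothetical.

```python
from collections import defaultdict

# Hypothetical lineage edges, pointing from physical columns towards dashboards.
EDGES = [
    ("phys: CUSTOMER.EMAIL", "log: Customer Email"),
    ("log: Customer Email", "pres: Email"),
    ("pres: Email", "analysis: Contact list"),
    ("analysis: Contact list", "page: Contacts"),
    ("page: Contacts", "dashboard: Sales"),
    ("phys: ORDERS.AMOUNT", "log: Revenue"),
    ("log: Revenue", "pres: Revenue"),
    ("pres: Revenue", "analysis: Revenue trend"),
    ("analysis: Revenue trend", "page: Overview"),
    ("page: Overview", "dashboard: Sales"),
]


def walk(adjacency, start):
    """Nodes reachable from `start` in the given adjacency mapping."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def lineage_subgraph(edges, focus):
    """Keep only edges lying upstream or downstream of the focus node."""
    forward, backward = defaultdict(list), defaultdict(list)
    for source, target in edges:
        forward[source].append(target)
        backward[target].append(source)
    keep = {focus} | walk(forward, focus) | walk(backward, focus)
    return [(s, t) for s, t in edges if s in keep and t in keep]


subgraph = lineage_subgraph(EDGES, "pres: Email")
for source, target in subgraph:
    print(source, "->", target)
```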
OAC – Catalog Structures and Shortcuts
OAC – RPD Aliases
OAC – Security
ANY Active Directory, LDAP, DB user/group store etc.
OAC – Security Inheritance
What groups/application roles is “Leslie Emerson” part of directly or indirectly?
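The question above is a transitive "member of" lookup. A minimal sketch follows; apart from the user name taken from the slide, every group and application role name is hypothetical, as is the membership structure.

```python
# Hypothetical "member of" edges: principal -> groups/application roles.
MEMBER_OF = {
    "Leslie Emerson": ["Sales Analysts"],
    "Sales Analysts": ["All Analysts", "BI Consumer"],
    "All Analysts": ["BI Content Author"],
    "BI Content Author": ["BI Consumer"],
}


def memberships(member_of, principal):
    """Return (direct, indirect) memberships by following 'member of' edges."""
    direct = set(member_of.get(principal, []))
    found, stack = set(direct), list(direct)
    while stack:
        for parent in member_of.get(stack.pop(), []):
            if parent not in found:   # also protects against circular nesting
                found.add(parent)
                stack.append(parent)
    return direct, found - direct


direct, indirect = memberships(MEMBER_OF, "Leslie Emerson")
print("direct:  ", sorted(direct))
print("indirect:", sorted(indirect))
```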
OAC – Data Sets
The Excel mafia is alive and kicking
Conclusion: Graphs on OAC alleviate GDPR
The free structure of a graph, representing information as connections between nodes, makes it possible to store any kind of data lineage: from a DB, ETL or Analytics system
Graph analysis can be performed with multiple languages and tools: visually or by code/script
Not only for GDPR: graphs can represent any kind of information
Self-service GDPR
[Figure labels: Cloud, Visualization, OAC, Graphs, Analytics, Security, Data Science]