V.2.2
Beyond the Models: Applying Semantic Technologies Across the Enterprise
Eric Little, PhD
Chief Data Officer, OSTHUS
[email protected]
The Current Situation Across Enterprises
Many challenges exist for data to be captured, integrated and shared:
• Data silos
• Incompatible instruments and software systems, proprietary data formats
• Legacy architectures are brittle and rigid
• SME knowledge resides in people’s heads, with little common vocabulary
• Data schemas are not explicitly understood
• Lack of common vision between business units and scientists
Slide 2 Big Data’s Impacts
The challenge of big data is here – and it is growing
• By 2020 there will be 2.3 Zettabytes of annual traffic on the Internet (ZB = 1,000,000,000,000,000,000,000 bytes)
• The volume of business data worldwide is estimated to double every 1.2 years
• Since 2012, more than 90 percent of the Fortune 500 have funded big data initiatives
• 100 terabytes of data are uploaded daily to Facebook
• Data production will be 44 times greater in 2020 than it was in 2009
• If each Gigabyte in a Zettabyte were a brick, 258 Great Walls of China (made of 3.8B bricks) could be built
Storing/retrieving that amount of data is one challenge … analyzing even a fraction of it is an even bigger challenge
Slide 3 The Common Big Data Fallacy
Hypothesis:
If I have more data at my fingertips – then I will have more answers
Well…. Actually….. No.
One major hurdle: “Real-world data […] is messy data, filled with inconsistencies, potential biases, and noise.”
Copping & Li, Harvard Business Review, Nov 29, 2016
Need a new approach to Big Data
Slide 4 Understanding the 4V’s of Big Data
• Mathematical clustering techniques provide clear advantages
• Semantic approaches provide clear advantages
• Majority of Big Data analytics technologies treat these two V’s
• Normally the focus – performance is critical to success
• Data complexity is increasing – model complexity
• Uncertainty abounds – requires statistics and probabilities
• Big Data analysis is more than just size
Slide 5 Moving to Smart Data – Enter Semantics
Smart data can be added to existing systems
Does not require replacement of existing tech
Smart data provides a separation of:
• Model Layer
• Data Layer
Link to the model layer
Leave data in place
Smart data links information from the models to instance-level data
Smart Data uses metadata in order to capture logical context about data
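The separation of model layer and data layer can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the source names (`lims_db`, `instrument_csv`), the field names, the concept names, and the helper `values_for` are hypothetical and do not come from the slides or from any specific product.

```python
from dataclasses import dataclass

# Data layer: records stay in place, in their native shape.
# Source names, field names, and values are hypothetical.
DATA_LAYER = {
    "lims_db": [
        {"smp_id": "S-001", "ph_val": 7.2},
        {"smp_id": "S-002", "ph_val": 6.8},
    ],
    "instrument_csv": [
        {"SampleID": "S-001", "Temp_C": 21.5},
    ],
}


@dataclass(frozen=True)
class Link:
    """Metadata tying one model-layer concept to one field in one source."""
    concept: str
    source: str
    field: str


# Model layer: shared concepts plus the metadata that links them to fields.
MODEL_LAYER = [
    Link("Sample.identifier", "lims_db", "smp_id"),
    Link("Sample.identifier", "instrument_csv", "SampleID"),
    Link("Sample.pH", "lims_db", "ph_val"),
    Link("Sample.temperature", "instrument_csv", "Temp_C"),
]


def values_for(concept):
    """Resolve a concept to instance-level values without reshaping the sources."""
    results = []
    for link in MODEL_LAYER:
        if link.concept != concept:
            continue
        for record in DATA_LAYER[link.source]:
            if link.field in record:
                results.append((link.source, record[link.field]))
    return results


print(values_for("Sample.identifier"))
```

Adding a new source in this sketch means adding `Link` entries to the model layer; the existing records and the systems that hold them are not modified.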
Slide 6 Semantic Spectrum of Knowledge Organization Systems
Sources
• Deborah L. McGuinness. "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors, Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.
• Michael Uschold and Michael Gruninger. "Ontologies and semantics for seamless connectivity". SIGMOD Rec. 33, 4 (December 2004), 58-64. DOI=http://dx.doi.org/10.1145/1041410.1041420
• Leo Obrst. "The Ontology Spectrum". Book section in Roberto Poli, Michael Healy, Achilles Kameas, "Theory and Applications of Ontology: Computer Applications". Springer Netherlands, 17 Sep 2010.
• Leo Obrst and Mills Davis. "Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities". 2008.
Slide 7 Ontologies capture important logical structures
Ontologies provide a background for computations
Humans logically structure their world
Ontologies help to capture that structure
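One way to picture how an ontology serves as background for a computation is a class hierarchy that lets a query find things nobody labelled explicitly. The sketch below is a simplified stand-in for subclass reasoning, not a full ontology language; the class names, instance names, and function names are hypothetical.

```python
# Hypothetical class hierarchy: child -> parent ("is a kind of").
SUBCLASS_OF = {
    "HPLC": "Chromatograph",
    "Chromatograph": "Instrument",
    "MassSpectrometer": "Instrument",
    "Instrument": "Device",
}

# Hypothetical instance data: individual -> the class asserted for it.
INSTANCE_OF = {
    "hplc-07": "HPLC",
    "ms-02": "MassSpectrometer",
    "fridge-1": "Device",
}


def ancestors(cls):
    """Return cls followed by every class it is transitively a subclass of."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def instances_of(cls):
    """Find individuals that belong to cls, either asserted or inferred."""
    return sorted(
        name for name, asserted in INSTANCE_OF.items()
        if cls in ancestors(asserted)
    )


print(instances_of("Instrument"))
```

Here `hplc-07` is only ever declared to be an `HPLC`, yet a query for `Instrument` returns it, because the hierarchy supplies the background knowledge that every HPLC is a chromatograph and every chromatograph is an instrument.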
Background Beliefs
But where does Machine Learning fit in?
Slide 8 The Growth of Analytics is Changing the Game
The power of analytics is now just beginning to be felt
Moore’s Law pertaining to processing is not the problem
Focus on the growth of Analysis:
• From 1988-2003 computer processing speed grew by 1000x
• In the same period algorithm development grew by 43,000x
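To put the two growth figures on a comparable footing, they can be converted to average yearly multipliers. The sketch below assumes constant compound growth over the 15-year window, which is a simplification made here and not a claim from the slide; the function name `annual_factor` is illustrative.

```python
YEARS = 2003 - 1988  # the 15-year window quoted on the slide


def annual_factor(total_growth, years=YEARS):
    """Constant yearly multiplier that compounds to total_growth over `years`."""
    return total_growth ** (1.0 / years)


hardware = annual_factor(1_000)     # processing speed, 1000x overall
algorithms = annual_factor(43_000)  # algorithm development, 43,000x overall

print(f"hardware:   x{hardware:.2f} per year")
print(f"algorithms: x{algorithms:.2f} per year")
```

Under that assumption, hardware improves by a factor of roughly 1.6 per year while algorithms roughly double each year.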
Advanced analytics are increasingly adopted in mid-market organizations and large enterprises
Slide 9 Machine Learning and Deep Learning on Image Data
As data sources continue to increase – so too do new algorithmic approaches
Data Variety & Veracity are driving new innovations
More data is now better
Specialized hardware is evolving to match needs
Switch to Deep Learning Approach
From: https://medium.com/@anthony_sarkis/the-age-of-the-algorithm-why-ai-progress-is-faster-than-moores-law-2fb7d5ae7943
Slide 10 THE MOVE FROM BIG DATA TO BIG ANALYSIS
SEMANTICS STATISTICAL MACHINE LEARNING REASONING
Slide 11 Big Analysis Requires Hybrid Architectures
[Figure: hybrid architecture. Components shown: Dashboards & Reports, Analytics, Integration Layer, Semantic DBs, Cloud DBs (NoSQL), Structured Data, Unstructured Docs]
Slide 12 Two Extremes of a Spectrum of Possible Solutions for Big Data
Data Warehouse
+ Proven enterprise technology
- Big DWHs require too great an effort
- Not all data is suitable for rigid DWHs
Data Lake
+ Great flexibility and very little effort to store all sorts of data
- Data lakes are too loose a construct
- Tremendous efforts on retrieval
Slide 13 Make Data FAIR (Findable, Accessible, Interoperable, Reusable)
Analytics Tools: simulations, statistics, reasoning
Visualization: dashboards, exploration, search
Reporting: regulatory, internal, external
…
Data Science (machine learning, text analytics, clustering etc.)
Lightweight Semantic Integration Layer (semantic RMDM, APIs, semantic indexing, data annotation, catalogues, metadata and linking)
…
Data sources: Instrument Data, Operational DBs, Semi-structured Data, Unstructured Documents, Semantic Graph DB (Knowledge Graph), Linked Open Data & Open APIs
Slide 14
Slide 15 Enter LeapAnalysis
Slide 16 True Federated Analytics Across The Enterprise
Queries, Rules, Patterns, etc.
Big Analysis Concept: Semantics + Statistics LeapAnalysis
Data sources: Ref Data, NoSQL, Excel
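The general idea of federation, querying each source where it lives and merging answers at request time, can be sketched as follows. This is a toy illustration under stated assumptions and does not describe how LeapAnalysis is implemented: an in-memory SQLite table stands in for a relational database, a list of dicts stands in for a NoSQL document store, and all table, field, and function names are hypothetical.

```python
import csv
import io
import sqlite3

# Three hypothetical sources, each left in its native form.
CSV_TEXT = "patient_id,age\nP1,34\nP2,61\n"

sql = sqlite3.connect(":memory:")  # stand-in for a relational database
sql.execute("CREATE TABLE visits (pid TEXT, diagnosis TEXT)")
sql.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("P1", "asthma"), ("P2", "diabetes")],
)

DOCS = [{"patient": "P2", "notes": "follow-up in 6 weeks"}]  # stand-in for a document store


# One small adapter per source translates native fields into shared names.
def from_csv():
    for row in csv.DictReader(io.StringIO(CSV_TEXT)):
        yield {"patient": row["patient_id"], "age": int(row["age"])}


def from_sql():
    for pid, diagnosis in sql.execute("SELECT pid, diagnosis FROM visits"):
        yield {"patient": pid, "diagnosis": diagnosis}


def from_docs():
    for doc in DOCS:
        yield {"patient": doc["patient"], "notes": doc["notes"]}


ADAPTERS = [from_csv, from_sql, from_docs]


def federated_lookup(patient):
    """Ask every source at query time and merge what each one knows."""
    merged = {"patient": patient}
    for adapter in ADAPTERS:
        for record in adapter():
            if record["patient"] == patient:
                merged.update(record)
    return merged


print(federated_lookup("P2"))
```

No data is copied into a central store ahead of time in this sketch; supporting another kind of source means writing one more adapter.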
Any kind of data source can be supported directly
Slide 17 Main Topics to Consider
• Companies must speed up the process of integrating data
• Cleaning or integrating data before you know its value is wasteful
• Making data just “smart” can make it very slow
• The world is moving to decentralization
• Virtualization
• Federation
• Complex problem solving
• Pattern/model reuse
Slide 18 SPARQL Query
prefix core:
[Figure labels: Subject Domain; Domain Models; Data Integration Model; LA Alignment Store (MongoDB); Open Source Data. Aligned Data Sources: Reference Model (Virtuoso RDF), Patient Data (CSV File System), Patient Data (MSSQL), LA Alignment Store (MongoDB)]
CONNECTING DATA, PEOPLE AND ORGANIZATIONS
Contact Information:
Email: [email protected]
Web: www.osthus.com, www.biganalysis.com
Twitter: OntoEric
Slide 19