V.2.2

Beyond the Models: Applying Semantic Technologies Across the Enterprise

Eric Little, PhD
Chief Data Officer, OSTHUS
[email protected]

The Current Situation Across Enterprises

Many challenges exist for data to be captured, integrated and shared
• Data silos
• Incompatible instruments and software systems, proprietary data formats
• Legacy architectures are brittle and rigid
• SME knowledge resides in people’s heads, little common vocabulary
• Data schemas are not explicitly understood
• Lack of common vision between business units and scientists

Slide 2

Big Data’s Impacts

The challenge of big data is here, and it is growing
• By 2020 there will be 2.3 Zettabytes of annual traffic on the Internet (ZB = 1,000,000,000,000,000,000,000 bytes)
• The volume of business data worldwide is estimated to double every 1.2 years
• Since 2012, more than 90 percent of the Fortune 500 have funded big data initiatives
• 100 terabytes of data is uploaded daily to Facebook
• Data production will be 44 times greater in 2020 than it was in 2009

If each Gigabyte in a Zettabyte were a brick, 258 Great Walls of China (made of 3.8B bricks) could be built.

Storing and retrieving that amount of data is one challenge. Analyzing even a fraction of it is an even bigger challenge.
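The brick comparison on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, which the slide's definition of a zettabyte suggests, and takes the 3.8B brick count at face value. Under those assumptions the result lands in the low 260s, the same order as the 258 quoted on the slide; the exact figure depends on the brick count and byte convention used.

```python
# Figures quoted on the slide; decimal (SI) units are an assumption.
BYTES_PER_ZETTABYTE = 10**21
BYTES_PER_GIGABYTE = 10**9
BRICKS_PER_GREAT_WALL = 3_800_000_000  # "3.8B bricks" from the slide

# One brick per gigabyte: how many bricks does one zettabyte give?
gigabytes_per_zettabyte = BYTES_PER_ZETTABYTE // BYTES_PER_GIGABYTE

# How many Great Walls could be built from that many bricks?
great_walls = gigabytes_per_zettabyte / BRICKS_PER_GREAT_WALL

print(f"{gigabytes_per_zettabyte:,} gigabytes per zettabyte")
print(f"about {great_walls:.0f} Great Walls per zettabyte of bricks")
```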

Slide 3

The Common Big Data Fallacy

Hypothesis:

• If I have more data at my fingertips, then I will have more answers

Well…. Actually….. No.

One major hurdle: “Real-world data […] is messy data, filled with inconsistencies, potential biases, and noise.”

Copping & Li, Harvard Business Review, Nov 29, 2016

Need a new approach to Big Data

Slide 4

Understanding the 4V’s of Big Data

• Big Data analysis is more than just size
• Majority of Big Data analytics approaches treat these two V’s
• Normally the focus: performance is critical to success
• Data complexity is increasing: model complexity
• Uncertainty abounds: requires statistics and probabilities
• Semantic technologies provide clear advantages
• Mathematical clustering techniques provide clear advantages

Slide 5

Moving to Smart Data: Enter Semantics

Smart data can be added to existing systems
• Does not require replacement of existing tech

Smart data provides a separation of:
• Model Layer
• Data Layer

Link to the model layer
• Leave data in place
• Smart data links information from the models to instance-level data

Smart Data uses semantics in order to capture logical context about data
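The separation between a model layer and a data layer can be pictured with a minimal sketch. Everything named here is hypothetical and chosen for illustration: the concept names, the field names, and the sample rows are not from the talk. The point is only that the records stay where they are, while a small set of links lets a question asked at the model level reach instance-level values.

```python
# Data layer: records stay in their existing system.
# A list of dicts stands in for that system; the values are invented.
EXISTING_ROWS = [
    {"pid": "P1", "ht_cm": 170, "wt_kg": 65},
    {"pid": "P2", "ht_cm": 182, "wt_kg": 90},
]

# Model layer: each concept points to its broader concept (hypothetical names).
BROADER = {
    "Height": "BodyMeasurement",
    "Weight": "BodyMeasurement",
    "BodyMeasurement": "Observation",
}

# Links from model concepts to instance-level fields in the data layer.
LINKS = {"Height": "ht_cm", "Weight": "wt_kg"}


def is_a(concept, ancestor):
    """Walk up the broader-concept chain to test whether concept falls under ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = BROADER.get(concept)
    return False


def values_for(ancestor):
    """Return (row id, concept, value) for every linked field under ancestor."""
    results = []
    for concept, field in LINKS.items():
        if is_a(concept, ancestor):
            for row in EXISTING_ROWS:
                results.append((row["pid"], concept, row[field]))
    return results


# Asking for the broader concept reaches both columns without touching the rows.
for record in values_for("BodyMeasurement"):
    print(record)
```

Asking for "BodyMeasurement" returns both height and weight values even though no column carries that name, which is the kind of logical context the model layer contributes.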

Slide 6

Semantic Spectrum of Knowledge Organization Systems

Sources
• Deborah L. McGuinness. "Ontologies Come of Age". In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors, Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.
• Michael Uschold and Michael Gruninger. "Ontologies and semantics for seamless connectivity". SIGMOD Rec. 33, 4 (December 2004), 58-64. DOI=http://dx.doi.org/10.1145/1041410.1041420
• Leo Obrst. "The Ontology Spectrum". In Roberto Poli, Michael Healy, and Achilles Kameas, editors, Theory and Applications of Ontology: Computer Applications. Springer Netherlands, 17 Sep 2010.
• Leo Obrst and Mills Davis. "Semantic Wave 2008 Report: Industry Roadmap to Web 3.0 & Multibillion Dollar Market Opportunities". 2008.

Slide 7

Ontologies capture important logical structures

Ontologies provide a background for computations

Humans logically structure their world

Ontologies help to capture that structure

Background Beliefs

But where does Machine Learning fit in?

Slide 8

The Growth of Analytics is Changing the Game

The power of analytics is now just beginning to be felt
• Moore’s Law pertaining to processing is not the problem

Focus on the growth of analysis:
• From 1988 to 2003, computer processing speed grew by 1,000x
• In the same period, algorithm development grew by 43,000x
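To put the two growth figures on a common footing, they can be converted into yearly factors. This is a rough sketch that assumes smooth compound growth over the 15-year window, which is a simplification and not something the slide claims; the function name is made up for the example.

```python
# Figures quoted on the slide for 1988 to 2003.
YEARS = 2003 - 1988          # 15-year window
HARDWARE_GAIN = 1_000        # processing speed
ALGORITHM_GAIN = 43_000      # algorithms


def annual_factor(total_gain: float, years: int) -> float:
    """Yearly multiplier that compounds to total_gain over the given years."""
    return total_gain ** (1 / years)


hardware_per_year = annual_factor(HARDWARE_GAIN, YEARS)
algorithm_per_year = annual_factor(ALGORITHM_GAIN, YEARS)

# If the two gains multiply, this is the overall speed-up.
combined_gain = HARDWARE_GAIN * ALGORITHM_GAIN

print(f"hardware:   x{hardware_per_year:.2f} per year")
print(f"algorithms: x{algorithm_per_year:.2f} per year")
print(f"combined:   x{combined_gain:,} over {YEARS} years")
```

Under that assumption, hardware improves by a bit under 60 percent a year while algorithms roughly double every year.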

Advanced analytics are increasingly adopted in mid-market organizations and large enterprises

Slide 9

Machine Learning and Deep Learning on Image Data

As data sources continue to increase, so too do new algorithmic approaches

Data Variety & Veracity are driving new innovations

More data is now better

Specialized hardware is evolving to match needs

Switch to Deep Learning Approach

From: https://medium.com/@anthony_sarkis/the-age-of-the-algorithm-why-ai-progress-is-faster-than-moores-law-2fb7d5ae7943

Slide 10

THE MOVE FROM BIG DATA TO BIG ANALYSIS

SEMANTICS STATISTICAL MACHINE LEARNING REASONING

Slide 11

Big Analysis Requires Hybrid Architectures

[Figure: hybrid architecture diagram. Labels: Dashboards & Reports, Analytics, Integration Layer, Structured Data, Unstructured Docs, Semantic DBs, Cloud DBs (NoSQL)]

Slide 12

Two Extremes of a Spectrum of Possible Solutions for Big Data

Data Warehouse
+ Proven enterprise technology
- Big DWHs require too great an effort
- Not all data is suitable for rigid DWHs

Data Lake
+ Great flexibility and very little effort to store all sorts of data
- Data lakes are too loose a construct
- Tremendous efforts on retrieval

Slide 13

Make Data FAIR (Findable, Accessible, Interoperable, Reusable)

Analytics Tools: simulations, statistics, reasoning
Visualization: dashboards, exploration, search
Reporting: regulatory, internal, external, …

Data Science (machine learning, text analytics, clustering etc.)

Lightweight Semantic Integration Layer (semantic RMDM, APIs, semantic indexing, data annotation, catalogues, metadata and linking)

• Instrument Data
• Operational DBs
• Semi-structured Data
• Unstructured Documents
• Semantic Graph DB (Knowledge Graph)
• Linked Open Data & Open APIs

Slide 14

Enter LeapAnalysis

True Federated Analytics Across The Enterprise

Queries, Rules, Patterns, etc.

Big Analysis Concept: Semantics + Statistics

LeapAnalysis

Ref Data, NoSQL, Excel

Any kind of data source can be supported directly

Main Topics to Consider

• Companies must speed up the process of integrating data
• Cleaning or integrating data before you know its value is wasteful
• Making data just “smart” can make it very slow
• The world is moving to decentralization
• Virtualization
• Federation
• Complex problem solving
• Pattern/model reuse

How LeapAnalysis Works

[Figure labels: User, Query Workspaces, Query Response]

SPARQL Query

    prefix core: xxxxxxxxxxxxxxxx
    prefix bdm: xxxxxxxxxxxxxxxx
    prefix rdf:
    SELECT ?subject ?gender ?indication ?age ?height ?weight
    WHERE {
      ?subject rdf:type core:Subject .
      ?subject core:hasGender ?gender .
      ?subject core:hasIndication ?indication .
      ?subject bdm:hasAge ?age .
      ?subject bdm:hasHeight ?height .
      ?subject bdm:hasWeight ?weight .
    }
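One way to picture how a query like this could be answered without first moving the data is a small federation sketch. This is not LeapAnalysis code and makes no claim about how the product works internally. It is an illustration under stated assumptions: an in-memory CSV string stands in for a CSV file system, sqlite3 stands in for a relational source, a plain dict stands in for an alignment store, and all field names and sample values are invented. Only the property names (core:hasGender, bdm:hasAge, and so on) come from the query.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV source (sample values are invented).
CSV_TEXT = "subject_id,gender,indication\nS1,F,asthma\nS2,M,diabetes\n"

# Stand-in for a relational source; sqlite3 is used here for illustration only.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE vitals (subject_id TEXT, age INTEGER, height REAL, weight REAL)"
)
db.executemany(
    "INSERT INTO vitals VALUES (?, ?, ?, ?)",
    [("S1", 34, 165.0, 60.5), ("S2", 51, 180.0, 88.0)],
)

# Stand-in for an alignment store: model property -> (source, field).
ALIGNMENTS = {
    "core:hasGender": ("csv", "gender"),
    "core:hasIndication": ("csv", "indication"),
    "bdm:hasAge": ("sql", "age"),
    "bdm:hasHeight": ("sql", "height"),
    "bdm:hasWeight": ("sql", "weight"),
}


def fetch_csv(field):
    """Read one column from the CSV source, keyed by subject id."""
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    return {row["subject_id"]: row[field] for row in reader}


def fetch_sql(field):
    """Read one column from the SQL source, keyed by subject id."""
    # field comes from ALIGNMENTS only, never from user input.
    rows = db.execute(f"SELECT subject_id, {field} FROM vitals")
    return dict(rows.fetchall())


FETCHERS = {"csv": fetch_csv, "sql": fetch_sql}


def federated_select(properties):
    """Return one merged record per subject that appears in every source."""
    columns = {}
    for prop in properties:
        source, field = ALIGNMENTS[prop]
        columns[prop] = FETCHERS[source](field)
    subjects = set.intersection(*(set(col) for col in columns.values()))
    return [
        {"subject": s, **{p: columns[p][s] for p in properties}}
        for s in sorted(subjects)
    ]


result = federated_select(list(ALIGNMENTS))
for record in result:
    print(record)
```

The design choice worth noting is that each source is read in place through its own fetcher, and only the alignment mapping knows which property lives where, so adding a source means adding a fetcher and a few alignment entries.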

[Figure: LeapAnalysis architecture. Labels: Subject Domain; Domain Models; Data Integration Model; LA Alignment Store (MongoDB); Open Source Data; …]

Aligned Data Sources
• Reference Model (Virtuoso RDF)
• Patient Data (CSV File System)
• Patient Data (MSSQL)
• LA Alignment Store (MongoDB)

CONNECTING DATA, PEOPLE AND ORGANIZATIONS

Contact Information:

Email: [email protected]
Web: www.osthus.com, www.biganalysis.com
Twitter: OntoEric

Slide 19