Data Analytics in the Era of Cloud Computing

Raghu Ramakrishnan, Technical Fellow, CTO for Data

An Example of Data and AI: Intelligent Local Queries

Query: Where's the nearest fruit smoothies
Location: Omaha, Nebraska

Search queries, views, click-throughs, …

World graph
• People
• Places
• Things
• Actions

2B+ entities
130B+ Web pages
50B+ facts

Web pages, Web documents, Images, …

Microsoft’s Internal Big Data Service

Microsoft’s internal data lake

• A data lake for all teams
• Good developer tools
• Batch, Interactive, Streaming, ML
• Used across Office, Xbox, Azure, Windows, Ads, Bing, …
• Production jobs and experimentation

Enabling business growth @Microsoft:
• Office productivity revenue (45% YoY)*
• Intelligent Cloud (100% YoY)*
• Bing search share doubles

Azure Data Lake Store: A data lake for everyone
• Microsoft’s serverless Big Data platform
• Fully aligned with Hadoop ecosystem and standards, with full support for Hadoop tools and engines as well as unique Microsoft capabilities
• Migrated to ADLS
• 1P = 3P

By the numbers
• 9+ Exabytes of data, 8+ billion files
• 100Ks of physical servers
• Millions of interactive queries
• Huge streaming pipelines
• 100Ks of daily batch jobs
• 15K+ developers
• 300+ teams

Tesla’s Digital Feedback Loop

[Figure: Data & AI (CLOUD) linked to Customers and Products by Signal and Action arrows]

Signal
• Customer interaction data
• Driving telemetry captured through instrumented vehicles (EDGE)

Forums: "My Model S takes the curve easily at 45, but I frequently come up on cars who take it at 30, and I have to disengage and hit the brake myself" (Jnsparke)

Issues identified by combining customer and product signal

Action
• Closing the loop with customers
• Pushes updates for Autopilot over the air

Tesla’s Digital Feedback Loop

[Figure: Data & AI linked to People, Customers, Products, and Operations]

The Digital Feedback Loop

[Figure: Data & AI exchanging Signal and Action with People, Customers, Products, and Operations]

1 Signal
2 Intelligence
3 Action

Improved decisions at the edge, guided by data analysis in the cloud

Autonomous action with intermittent connectivity

USACE NavPortal and Real-time Dredging

[Figure: map views labeled ESRI and Bing]

➢ NavPortal

Data Analytics Trends

➢ Digital loops across Cloud and Edge
➢ Unified Insights and Governance

© Microsoft Corporation

Challenges

[Figure: People, Customers, Products, and Operations, each labeled DATA]

1 Large portions of the enterprise are not digitized
34% of organizations indicate poor data reliability is an obstacle for growth, while 28% cite insufficient data systems

2 Data is siloed across different parts of the organization
31% indicate data silos prevent their enterprise from maximizing value

3 Generating insights requires a high degree of technical expertise
30% cite lack of analytical talent as a hindrance to generating insights

Source: PwC Trusted Data Optimization Pulse Survey

Organizational Silos

Many sources and types of data

Siloed Analytic Tools

NavPortal Architecture

Data Silos Optimized for Different Tasks

• Data Lake: Spark, Hive, ML…
• Analytics-optimized: Azure SQL DW
• Update-optimized: Azure SQL DB
• Document model: Azure Cosmos DB

Layers shown: Meta data (×4), XACT_STATE (×3), Governance (×4)

Real Scenarios Involve Many Tasks!

[Figure: analytics pipeline built from many separate services]
Data sources: LOB, CRM, Graph, Image, Social, IoT
Stages: INGEST, EXPLORE, PREP & TRAIN, MODEL & SERVE, REPORT, STORE
Services shown: Azure SQL DW, AAS, Azure Data Factory, Azure Databricks, Azure ML, Power BI, Azure Cosmos DB, Azure SQL Database, ADLA, ASA, IoT Hub, Event Hub, Azure Data Catalog, Azure HDInsight, Azure Data Lake Storage Gen2

[Figure: the same pipeline with Azure Synapse Analytics]
Data sources: LOB, CRM, Graph, Image, Social, IoT
Stages: INGEST, EXPLORE, PREP & TRAIN, MODEL & SERVE, REPORT, STORE
Services shown: Azure Synapse Analytics, Power BI, Azure Data Lake Storage Gen2

https://azure.microsoft.com/en-us/services/synapse-analytics/

Scalability: All TPC-H Queries at 1 PB Scale!

Elastic DQP – Unlimited Scale

Data in the Cloud

Network latency
Elastic compute
Elastic storage

Unified Data Suite and Governance

Global apps

Engines: Spark, Hive, ML…; SQL; Azure Cosmos DB
Storage: Data Lake; Analytics-optimized; Update-optimized; Document storage

Layers shown: Meta data (×4), XACT_STATE (×3), Governance (×1)

The Trusted Cloud

More certifications than any other cloud provider

GLOBAL
• ISO 27001
• ISO 27018
• ISO 27017
• ISO 22301
• SOC 1 Type 2
• SOC 2 Type 2
• SOC 3
• CSA STAR Self-Assessment
• CSA STAR Certification
• CSA STAR Attestation

US GOV
• Moderate JAB P-ATO
• High JAB P-ATO
• DoD DISA SRG Level 2
• DoD DISA SRG Level 4
• DoD DISA SRG Level 5
• SP 800-171
• FIPS 140-2
• Section 508 VPAT
• ITAR
• CJIS
• IRS 1075

INDUSTRY
• PCI DSS Level 1
• CDSA
• MPAA
• FACT UK
• Shared Assessments
• FISC Japan
• HIPAA / HITECH Act
• HITRUST
• GxP 21 CFR Part 11
• MARS-E
• IG Toolkit UK
• FERPA
• GLBA
• FFIEC

REGIONAL
• Argentina PDPA
• EU Model Clauses
• UK G-Cloud
• China DJCP
• China GB 18030
• China TRUCS
• Singapore MTCS
• Australia IRAP/CCSL
• New Zealand GCIO
• Japan My Number Act
• ENISA IAF
• Japan CS Mark Gold
• Spain ENS
• Spain DPA
• India MeitY
• Canada Privacy Laws
• Privacy Shield
• Germany IT Grundschutz workbook