Data Analytics In the Era of Cloud Computing
Raghu Ramakrishnan, Technical Fellow, CTO for Data
A Microsoft Example of Data and AI: Intelligent Local Queries
Query: “Where's the nearest fruit smoothies”
Location: Omaha, Nebraska
Search queries, views, click-throughs, …
World graph • People • Places • Things • Actions
2B+ entities • 130B+ Web pages • 50B+ facts
Web pages, Web documents, Images, …
Microsoft’s Internal Big Data Service
Microsoft’s internal data lake
• A data lake for all teams
• Good developer tools
• Batch, Interactive, Streaming, ML
• Used across Office, Xbox, Azure, Windows, Ads, Bing, Skype, …
• Microsoft’s serverless Big Data platform
• Production jobs and experimentation
• Fully aligned with the Hadoop ecosystem and standards, with full support for Hadoop tools and engines as well as unique Microsoft capabilities
Azure Data Lake Store: A data lake for everyone
• Migrated to ADLS
• 1P = 3P (internal first-party teams use the same service as third-party customers)
Enabling business growth @Microsoft
• Office productivity revenue (45% YoY)*
• Intelligent Cloud (100% YoY)*
• Bing search share doubles
By the numbers
• 9+ Exabytes of data, 8+ billion files
• 100Ks of physical servers
• Millions of interactive queries
• Huge streaming pipelines
• 100Ks of daily batch jobs
• 15K+ developers
• 300+ teams
Tesla’s Digital Feedback Loop
Data & AI in the CLOUD, linking Customers and Products
• Signal from customers: customer interaction data, e.g., forums. Jnsparke: “My Model S takes the curve easily at 45, but I frequently come up on cars who take it at 30, and I have to disengage and hit the brake myself”
• Signal from products: driving telemetry captured through instrumented vehicles (EDGE)
• Issues identified by combining customer and product signal
• Action toward customers: closing the loop with customers
• Action toward products: pushing Autopilot updates over the air
• The loop also connects Operations and People to Data & AI
The Digital Feedback Loop
Data & AI sits at the center, exchanging signal and action with Customers, Products, Operations, and People.
1. Signal
2. Intelligence: Data & AI
3. Action
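The three steps of the loop can be illustrated with a small sketch that mirrors the Tesla example: customer signal (forum posts) and product signal (vehicle telemetry) are counted per issue, intelligence keeps only the issues that both signals corroborate, and action emits one follow-up per issue. Everything here is hypothetical: the record types `ForumPost` and `TelemetryEvent`, the shared issue labels (which assume an upstream step has already tagged posts and events), the thresholds `min_posts` and `min_events`, and the action strings. It illustrates the idea only and is not how Tesla or Microsoft implement it.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record types: the slides do not define any schema.
@dataclass
class ForumPost:
    user: str
    issue: str  # issue label assumed to come from an upstream text classifier

@dataclass
class TelemetryEvent:
    vehicle_id: str
    issue: str  # issue label assumed to come from on-vehicle (edge) event detection

def signal(posts, events):
    """1. Signal: count how often each issue appears in customer and product data."""
    return Counter(p.issue for p in posts), Counter(e.issue for e in events)

def intelligence(customer_counts, product_counts, min_posts=2, min_events=3):
    """2. Intelligence: keep only issues that BOTH signals corroborate."""
    return sorted(
        issue
        for issue in customer_counts
        if customer_counts[issue] >= min_posts and product_counts[issue] >= min_events
    )

def action(issues):
    """3. Action: one follow-up per confirmed issue (placeholder strings)."""
    return [f"push OTA update for {issue}; notify affected customers" for issue in issues]

posts = [ForumPost("u1", "curve_speed"), ForumPost("u2", "curve_speed"), ForumPost("u3", "wipers")]
events = [TelemetryEvent(f"v{i}", "curve_speed") for i in range(5)] + [TelemetryEvent("v9", "wipers")]
plan = action(intelligence(*signal(posts, events)))
print(plan)  # ['push OTA update for curve_speed; notify affected customers']
```

Requiring agreement between the two sources is one simple way to reduce false alarms: the lone `wipers` complaint, backed by a single telemetry event, does not trigger an update on its own.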
Improved decisions at the edge, guided by data analysis in the cloud
Autonomous action with intermittent connectivity
USACE NavPortal and Real-time Dredging (Esri)
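One common way to get autonomous action under intermittent connectivity is a store-and-forward pattern: the edge device decides locally using the latest rule it received from the cloud, buffers its observations, and uploads them (pulling a refreshed rule) whenever a connection is available. The sketch below assumes a dredging-style rule that compares a measured channel depth with a target depth; `EdgeNode`, `target_depth_m`, and the `"dredge"`/`"skip"` decisions are hypothetical and are not taken from NavPortal.

```python
from collections import deque

class EdgeNode:
    """Hypothetical edge device: decides locally, syncs with the cloud when connected."""

    def __init__(self, target_depth_m, buffer_size=10_000):
        self.target_depth_m = target_depth_m     # rule last received from the cloud
        self.outbox = deque(maxlen=buffer_size)  # pending uploads; oldest dropped when full

    def on_reading(self, depth_m):
        """Act immediately, without waiting for the network."""
        decision = "dredge" if depth_m < self.target_depth_m else "skip"
        self.outbox.append((depth_m, decision))
        return decision

    def sync(self, connected, new_target_depth_m=None):
        """If connected, upload buffered readings and adopt a refreshed rule."""
        if not connected:
            return []  # keep buffering; nothing leaves the device
        uploaded = list(self.outbox)
        self.outbox.clear()
        if new_target_depth_m is not None:
            self.target_depth_m = new_target_depth_m
        return uploaded

node = EdgeNode(target_depth_m=12.0)
decisions = [node.on_reading(d) for d in (11.5, 12.4, 11.9)]  # made while offline
print(decisions)  # ['dredge', 'skip', 'dredge']
print(node.sync(connected=False))  # []
print(len(node.sync(connected=True, new_target_depth_m=12.5)))  # 3
```

The bounded `deque` is a deliberate trade-off in this sketch: if the device stays offline long enough to fill the buffer, the oldest readings are dropped rather than exhausting memory.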
Data Analytics Trends
➢ Digital loops across Cloud and Edge (e.g., NavPortal, Bing)
➢ Unified Insights and Governance
© Microsoft Corporation
Challenges
1. Large portions of the enterprise are not digitized
34% of organizations indicate poor data reliability is an obstacle to growth, while 28% cite insufficient data systems
2. Data is siloed across different parts of the organization
31% indicate data silos prevent their enterprise from maximizing value
3. Generating insights requires a high degree of technical expertise
30% cite lack of analytical talent as a hindrance to generating insights
(Diagram: separate DATA stores for People, Customers, Products, and Operations)
Source: PwC Trusted Data Optimization Pulse Survey
NavPortal Architecture
• Organizational silos
• Many sources and types of data
• Siloed analytic tools
Data Silos Optimized for Different Tasks
• Spark, Hive, ML… on the Data Lake
• Azure SQL DW: analytics-optimized
• Azure SQL DB: update-optimized
• Azure Cosmos DB: document model
Each store keeps its own metadata and its own governance; three of the four also manage their own transaction state (XACT_STATE).
Real Scenarios Involve Many Tasks!
Data sources: LOB, CRM, Graph, Image, Social, IoT
Stages: INGEST → STORE → EXPLORE → PREP & TRAIN → MODEL & SERVE → REPORT
Services: Azure Data Factory, Event Hub, IoT Hub, ASA, Azure Data Lake Storage Gen2, Azure HDInsight, Azure Databricks, ADLA, Azure Data Explorer, Azure ML, Azure SQL DW, Azure SQL Database, Azure Cosmos DB, AAS, Power BI, Azure Data Catalog
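The same scenario with Azure Synapse Analytics:
Data sources: LOB, CRM, Graph, Image, Social, IoT
Stages: INGEST → STORE → EXPLORE → PREP & TRAIN → MODEL & SERVE → REPORT
Services: Azure Synapse Analytics over Azure Data Lake Storage Gen2, with Power BI for reporting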
https://azure.microsoft.com/en-us/services/synapse-analytics/
Scalability: All TPC-H Queries at 1PB Scale!
Elastic DQP (distributed query processing): Unlimited Scale
Data In the Cloud
Diagram: elastic compute and elastic storage, connected over the network (network latency)
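The idea behind elastic distributed query processing with compute separated from storage can be shown with a generic scatter-gather aggregation: data partitions live in shared storage, any number of workers each compute a partial aggregate over the partitions they are handed, and a coordinator merges the partials. This is a minimal sketch of that textbook technique, not of Synapse's actual engine; threads stand in for compute nodes, and `PARTITIONS`, `partial_sum`, and `run_query` are hypothetical. The merge step is valid only for decomposable aggregates such as SUM and COUNT (AVG can be carried as a sum plus a count).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Shared "storage": partitions any worker can read (hypothetical (key, value) rows).
PARTITIONS = [
    [("A", 10.0), ("B", 5.0)],
    [("A", 2.5), ("C", 1.0)],
    [("B", 4.0), ("C", 3.0), ("A", 1.5)],
]

def partial_sum(partition):
    """Worker: SUM(value) GROUP BY key over one partition (a partial aggregate)."""
    acc = Counter()
    for key, value in partition:
        acc[key] += value
    return acc

def run_query(partitions, num_workers):
    """Coordinator: scatter partitions to workers, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds values key by key
    return dict(sorted(total.items()))

print(run_query(PARTITIONS, num_workers=1))  # {'A': 14.0, 'B': 9.0, 'C': 4.0}
print(run_query(PARTITIONS, num_workers=3))  # same answer with more compute
```

Changing `num_workers` changes only how much compute is applied; the partitions never move, which is what allows compute and storage to scale independently in this model.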
Unified Data Suite and Governance
Global apps
• Spark, Hive, ML… on the Data Lake
• SQL: analytics-optimized and update-optimized
• Azure Cosmos DB: document storage
Each of the four stores keeps its own metadata, and three keep their own transaction state (XACT_STATE), but a single governance layer spans them all.
Microsoft Azure: The Trusted Cloud
More certifications than any other cloud provider
GLOBAL: ISO 27001, ISO 27018, ISO 27017, ISO 22301, SOC 1 Type 2, SOC 2 Type 2, SOC 3, CSA STAR Self-Assessment, CSA STAR Certification, CSA STAR Attestation
US GOV: Moderate JAB P-ATO, High JAB P-ATO, DoD DISA SRG Level 2, DoD DISA SRG Level 4, DoD DISA SRG Level 5, SP 800-171, FIPS 140-2, Section 508 VPAT, ITAR, CJIS, IRS 1075
INDUSTRY: PCI DSS Level 1, CDSA, MPAA, FACT UK, Shared Assessments, FISC Japan, HIPAA / HITECH Act, HITRUST, GxP 21 CFR Part 11, MARS-E, IG Toolkit UK, FERPA, GLBA, FFIEC
REGIONAL: Argentina PDPA, EU Model Clauses, UK G-Cloud, China DJCP, China GB 18030, China TRUCS, Singapore MTCS, Australia IRAP/CCSL, New Zealand GCIO, Japan My Number Act, ENISA IAF, Japan CS Mark Gold, Spain ENS, Spain DPA, India MeitY, Canada Privacy Laws, Privacy Shield, Germany IT Grundschutz workbook