Google Data Analytics Solutions Overview
Happy Aloha Friday!
Workshop #1: Data Analytics & Visualization
Daniel Liu, Google
hacc.hawaii.gov
April 24, 2020

Welcome from ETS
Marc Masuno, Cyber Security Manager

Logistics
01 Welcome from ETS
02 1:00 PM to 2:30 PM
03 About Google Meet
04 Introduction to the Google Team
05 Introduction to the HACC Committee
06 Workshop
07 Q & A

Confidential & Proprietary

Google Meet
Option 1: Join Hangouts Meet. Meeting ID: meet.google.com/odb-krud-dvu
Option 2: Phone Numbers (US) 475-329-7374 PIN: 438 611 547#

The Google Team
● Daniel Liu, Cloud Customer Engineer
● Amanda Stange, Account Executive
● Rob Grace, Cloud Customer Engineer

The HACC Committee

Google Data Analytics and Visualization Solutions Overview
April 24, 2020, 01:00 PM to 02:30 PM
https://hacc.hawaii.gov/
Daniel Liu, Customer Engineer

Agenda
01 Data Challenges
02 Our Approach to Data Analytics
03 Modernize Your Data Warehouse
04 Big Data & Hadoop
05 Analyze Streaming Data in Real Time
06 Data Visualization Tools
07 Predictive Analytics & Machine Learning
08 How to Get Started with GCP

Google Mission Statement
Organize the world’s information and make it universally accessible and useful

Data Volume Growth
Digital Information Measurement Unit (survey in 2009)
● 2K - A typewritten page
● 5M - The complete works of Shakespeare
● 10M - One minute of high-fidelity sound
● 2T - Information generated on YouTube in one day
● 10T - 530,000,000 miles of bookshelves at the Library of Congress
● 20P - All hard disk drives in 1995
● 700P - Data of 700,000 companies with revenues less than $200M
● 1E - Combined Fortune 1000 company database (1P each)
● 1E - Next 9000 world company databases (average 100T each)
● 1Z - 1000E (zettabyte; grains of sand on beaches)
● 100Y - Yottabytes; addressable memory, 128-bit

Global Datasphere
Survey by IDC
● IDC defines the "global datasphere" as "the quantification of the amount of data created, captured, and replicated across the world."

Core tenets
1 If users can’t spell, it’s our problem.
2 If they don’t know how to form the query, it’s our problem.
3 If they don’t know what words to use, it’s our problem.
4 If they can’t speak the language, it’s our problem.
5 If there is not enough content on the web, it’s our problem.
6 If the web is too slow, it’s our problem.

Machine Learning is the new ground for gaining competitive edge & creating business value
Competitive advantage ranked as the top goal of machine-learning projects for 46% of IT leaders, and 50% of adopters can quantify ROI
● 2X more data-driven decisions
● 5X faster decisions than others
● 3X faster execution
*Source: MIT Survey 2017; n=375; Bain Consulting Study

First Step in This Journey Begins with Data
“Every Company will be a Data Company”
*Source: Wired, Bloomberg, Fortune, McKinsey

01 Data Challenges

Data is Everything
● Companies win or lose based on how they use it
● Governments make right or wrong decisions based on the data they process
● You make your personal decisions based on the data you collect

Data analytics is still too hard
● <1%: Unstructured data
● <50%: Structured data
* Harvard Business Review magazine; May-June 2017

Data complexities
Unstructured data accounts for 90% of enterprise data
● Legacy applications
● Data silos everywhere
● Changing view on value of data
● Regulatory environment
● Limited skills, hard to recruit
*Source: IDC, Wired
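The shorthand on the Data Volume Growth slide (K, M, T, P, E, Z, Y) can be checked with a little arithmetic. The sketch below is not part of the original deck: it assumes the prefixes are decimal (powers of 1000), which the slide's own "1Z - 1000E" line suggests, and the helper name `to_bytes` is made up for illustration.

```python
# Decimal (SI) byte prefixes matching the slide's shorthand: K, M, T, P, E, Z, Y.
# Assumption: powers of 1000, as implied by the slide's "1Z - 1000E".
# G (giga) is included for completeness even though the slide does not use it.
PREFIX_EXPONENT = {"K": 3, "M": 6, "G": 9, "T": 12, "P": 15, "E": 18, "Z": 21, "Y": 24}


def to_bytes(shorthand: str) -> int:
    """Convert slide shorthand such as '700P' or '1Z' into a byte count."""
    number, prefix = shorthand[:-1], shorthand[-1].upper()
    return int(number) * 10 ** PREFIX_EXPONENT[prefix]


# The slide states that 1Z equals 1000E.
zetta_in_exa = to_bytes("1Z") // to_bytes("1E")

# 1,000 company databases at 1P each add up to exactly 1E.
fortune_1000_total = 1000 * to_bytes("1P")

# 9,000 databases averaging 100T each add up to 900P,
# which the slide rounds up to 1E.
next_9000_total = 9000 * to_bytes("100T")

print(zetta_in_exa)                          # 1000
print(fortune_1000_total == to_bytes("1E"))  # True
print(next_9000_total // to_bytes("1P"))     # 900
```

Under this assumption the "Next 9000 world company databases" entry works out to about 0.9E, so the slide's "1E" for that row reads as a rounded figure.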
Challenges with Big Data Projects
1 Complexity of building and maintaining a Big Data system
2 Capture and store all data for all business functions
3 Continuously accommodating greater data volumes and new data sources
4 Finding value in existing data very easily
5 Reducing the time from data collection to action
6 Hurdles to innovate and iterate with Big Data
7 Collaboration within or across organizations with consistent ease of use
8 Keep your data secure
9 Keep system reliable/running

If you want to unlock the power of your data, you need a customer data platform, not just new tools.
“If Your Organization Isn’t Good at Analytics, It’s Not Ready for AI”
*Source: Harvard Business Review

02 Our Approach to Data Analytics

15+ Years of Tackling Big Data Problems
● Google Papers: GFS, MapReduce, BigTable, Dremel, Flume Java, Spanner, Millwheel, Dataflow, Tensorflow
● Google Cloud Products: BigQuery, Pub/Sub, Dataflow, Bigtable, ML
● Open Source
● Timeline: 2002, 2004, 2005, 2006, 2008, 2010, 2012, 2014, 2015, 2016

Serverless data analytics
From infrastructure to platform for insights
● Diagram labels: Monitoring, Performance tuning, Resource provisioning, Utilization improvements, Handling growing scale, Deployment & configuration, Reliability, Analysis and insights

Enterprise Challenges in Data to ML Journey
● Data Silos and Legacy Systems: Limits decision-making and is time consuming
● Missing Out on Real-Time Insights: Rear-view approach causes business anxiety
● Lacks How-To Predict Business Outcomes: Depends on guts for predicting the unknown

Key Solutions Powered by Cloud
● Data Warehouse (for data silos and legacy system limitations): Modern data warehousing, which builds the foundation for AI
● Streaming Data Analytics (for missing out on real-time insights because of a rear-view approach): Process streaming data along with batch data to generate real-time insights
● Predictive Analytics / ML (for predicting unknown business outcomes): Anticipate customer needs and automate delivery with machine intelligence

Complete foundation for data lifecycle
● Stages: Data ingestion at any scale; Reliable streaming data pipeline; Data warehousing and data lake; Advanced analytics
● Products: Cloud Dataproc (Hadoop, Spark), Cloud Pub/Sub, Data Transfer Service, Cloud Dataflow, BigQuery, Cloud ML Engine, Google Data Studio, Cloud Storage, Cloud IoT Core, Apache Beam, Cloud Dataprep (Trifacta), Tensorflow, Sheets, Cloud Composer (Apache Airflow)

03 Modernize Your Data Warehouse
Get all your business data in one place for faster and comprehensive analysis

Data warehousing for AI-driven business
● 90’s, Data warehouses: From 1st-gen EDWs, increased data collection and analysis has helped build more data-driven businesses.
● 00’s, BI foundations: Data warehousing formed the foundation of reporting and business intelligence.
● Now, Cloud data warehousing: BigQuery represents a fundamentally different approach to cloud data warehousing.
● Next, AI foundations: We’re working to make BigQuery the foundation for organizations that will leverage machine intelligence in their businesses.

Google Cloud Data Warehouse: Four Typical Flows
● Diagram labels: ETL, Analyze, Cloud Dataflow, BigQuery, Data Storage, Cloud Storage, Relational Data

What is BigQuery?
Google Cloud Platform’s enterprise data warehouse for analytics
● Petabyte-scale storage and queries
● Encrypted, durable and highly available
● Convenience of standard SQL
● Fully managed and serverless
● Real-time analytics on streaming data

BigQuery: architecture
Serverless. Decoupled storage and compute for maximum flexibility.
● SQL:2011 compliant
● Replicated, distributed storage (99.9999999999% durability)
● Highly available cluster compute (Dremel)
● Distributed memory shuffle tier
● Petabit network
● Streaming ingest
● Free bulk loading
● REST API
● Web UI, CLI
● Client libraries in 7 languages

Introducing BigQuery ML
Making machine learning accessible

BigQuery ML empowers data analysts and data scientists
● Execute ML initiatives without moving data from BigQuery
● Iterate on models in SQL in BigQuery to increase development speed
● Automate model selection and hyperparameter tuning

Analyze GIS data in BigQuery with familiar SQL
● Accurate spatial analyses with Geography data type over GeoJSON and WKT formats
● Support for core GIS functions (measurements, transforms, constructors, etc.) using familiar SQL

Unlock big data for all users with BigQuery & Sheets
gsuite.google.com/bq-sheets
“For analysts spread across the globe, this is a blessing. They can now collaborate easily with a streamlined flow for sharing their insights.” -- Nikhil Mishra @ Yahoo

See your BigQuery data in one click with Data Studio Explorer
Tight