Building Data Science Ecosystems for Smart City and Smart Commerce

Dr. Alex Liu Chief Data Scientist Analytics Services

1 © 2017 IBM Corporation Alex Liu Introduction

▪ Chief Data Scientist – Analytics Services at IBM ▪ A Data Scientist Thought Leader ▪ Chief Data Scientist for a few corporations before joined IBM ▪ Taught advanced data analytics for the University of South California and the University of California at Irvine ▪ Consulted for the United Nations, Ingram Micro … ▪ M.S. and Ph.D. from Stanford University

2 © 2017 IBM Corporation 3

Data Science: Turning Data into VALUE With Models

Data Science produces insights/values via a complicated processes a big set of tools

BigInsights Cloudant dashDB (HDFS) (DBaaS) (Analytics)

Swift SQDB (Object (Managed Storage DB2) )

3 © 2017 IBM Corporation Data Science projects return very valuable results but a lot failed

▪Netflix, for example, integrates data science into each part of their business; they estimate a billion dollars in incremental value from their personalization and recommendation alone. ▪Knight Capital Group, for instance, lost $440 million in 45 minutes after a mistake in updating a model (New York times). ▪Gartner estimated that 60% of big data projects fail in 2016, and in 2017.

4 © 2017 IBM Corporation Reasons why data science projects failed

▪Wrong data ▪Wrong question ▪Week stakeholders commitment ▪Lack of diverse expertise per CIO DIVE

▪Many research leads to conclude mismatch among question, data, model, researchers, tools … as main reason ▪Due to bad coordination of parts involved

5 © 2017 IBM Corporation 6 WHY - Data Science very complicated Very Complicated enough just for Model Building Stage • More than 50 different models: SVM, Neural Net, Decision Trees/Forests, Naïve Bayes, Regression, SMO, k-nearest Neighbor, Clustering, Rules, …

• Combinatorially explosive number of parameter choices per algorithm: kernel type, pruning strategy, number of trees in a forest, learning rate, …

• Wide variation in performance across different algorithm implementations (e.g., SPSS vs Python vs WEKA vs SPARK …)

• User-Defined algorithms

• Substantial cost in user and compute time • User spends time on trying new combinations and parameters • Computational cost for training a single SVM can exceed 24h • Selection commonly based on data scientist bias

• Each additional pipeline stage increases complexity dramatically! 6 © 2017 IBM Corporation 7

WHY - challenges for managing data scientists

▪High turn over rate of data scientists (average tenure 18 months) ▪Ever developing new technologies to master ▪Lack of training for data scientists

Data Prep Exploration

Modeling

Evaluation Deployment

7 © 2017 IBM Corporation A data science ecosystem approach A data science ECOSYSTEM has three basic elements

1) Data portal, 2) Data Science platform, 3) Data Science community

© 2017 IBM Corporation The defining characteristics of an ecosystem - mutuality & orchestration

Markets comprise entities that operate Ecosystems comprise entities that operate out of individual self-interest out of orchestrated, mutual shared-interest

A set of individuals or organizations who A set of individuals or organizations who exchange products or services within an formally or informally operate together to environment governed by the laws of supply produce something of greater value for the and demand mutual benefit of the ecosystem as a whole

Ecosystems exists because operating in an orchestrated environment, participants can deliver more value within the ecosystem acting together than acting alone

9 © 2017 IBM Corporation Value creation in traditional data science tends to be linear; value creation in ecosystems tends to be networked …

Value creation in a value chain Value creation in an ecosystem

Cost Cost Cost Cost Cost plus plus plus plus plus return

▪ Value creation is incremental as organizations ▪ Value capture reflects a networked, dynamic, cover costs plus some return on assets everyone-to-everyone process of exchange ▪ Value capture reflects an additive, sequential ▪ Ecosystems produce more value as a whole, process of exchange than the sum of the individual participants acting independently

10 © 2017 IBM Corporation Ecosystems can yield substantial benefits

Increase Success Ratio of Data Science Projects Embrace ecosystems’ strategic potential

New capabilities Improved access Improved Agility

Ecosystems enable organizations Ecosystems support greater Ecosystems support quick to access critical capabilities that access to new or different creation of new types of products, they would otherwise have resources such as new talents, with different combinations of difficulty obtaining new tools, new data sets organizations and assets

11 © 2017 IBM Corporation 1) An ecosystem for city data science

Applications Events Data Social Media Optimizing Operations Solutions

101 010 Analytical Connecting all Insights for 101 the data Smart Cities scientists from a DS community IoT Data Platform ~ IBM DSX

12 © 2017 IBM Corporation RMDS Communities at IBM Glendale

▪Pasadena/Glendale Meetup Community ▪Local face to face community – more than 1100 members ▪https://www.meetup.com/RMDS_LA/ ▪https://www.linkedin.com/groups/1895501 has 29K participants Aim to create an environment for utilizing big data analytics to make smart cities

2 former CDOs and the current CDO are members, also presented their work here.

13 © 2017 IBM Corporation Data.gov citizen data science ecosystem with open data 105,000+ collections 349 citizen apps 500,000 data resources 175 agencies 450 APIs

Source: City of LA Mayor’s Tech Advisor Presentation at RMDS Meetup.

14 © 2017 IBM Corporation a data science platform – the IBM Data Science Experience - now IBM Studio Local IBM Data Science Experience

Community Open Source IBM Added Value

• Find tutorials and datasets • Code in Scala/Python/R/SQL • Watson Machine Learning

• Connect with Data Scientists • Jupyter Notebooks • SPSS Modeler Canvas

• Ask questions • RStudio IDE and Shiny • Advanced Visualizations

• Read articles and papers • Apache Spark • Projects and Version Control

• Fork and projects • Your favorite libraries • Managed Spark Service

Powered by IBM Watson Data Platform

* Closed beta

15 https://datascience.ibm.com/ © 2017 IBM Corporation 16 IBM Confidential Overview

IBM Data Science Experience (DSX – now Watson Studio Local) is a social environment where you can collaborate to solve data challenges with the best tools and latest expertise.

▪ Community ▪ Projects ▪ Notebooks ▪ Data Asset ▪ Model Management ▪ Admin console

17 © 2017 IBM Corporation Forrester Research Ranks IBM as a Leader in Multimodal Predictive Analytics and Machine Learning IBM puts AI to work. IBM Watson is a vast umbrella of technologies and solutions, one of which is Watson Studio, a PAML solution. Watson Studio was designed from the ground up to aesthetically blend SPSS-inspired workflow capabilities with open source machine learning libraries and notebook- based interfaces.

It is designed for all collaborators — business stakeholders, data engineers, data scientists, and app developers — who are key to making machine learning models surface into production applications. Watson Studio offers easy integrated access to IBM Cloud pretrained machine learning models such as Visual Recognition, Watson Natural Language Classifier, and many others.

It is a perfectly balanced PAML solution for enterprise data science teams that want the productivity of visual tools and access to the latest open source via a notebook-based coding interface.

Source: “The Forrester WaveTM: Multimodal Predictive Analytics And Machine Learning Solutions, Q3 2018”, Forrester Research, September 2018 Los Angeles City as the Best Using Data in the United States

19 http://www.ibm.com/weather EX2 – IBM Weather Data 1km Visible (GOES-R will be even better) EX: Weather Data Serving Retails Weather Data + WATSON Studio + RMDS Community A data science ecosystem with weather data

Applications Weather Data Transaction Optimizing Operations Solutions

101 010 Analytical Connecting all Insights for 101 the data Smart scientists from Commerces a DS community IoT Data Platform ~ IBM DSX Five steps for building successful data science ecosystems

1 Know your ▪ Understand your value chain ecosystem and identify new ▪ Understand your ecosystem opportunities

5 2 ▪ Define measurement ▪ Identify and prioritize Measure your Recognize your model and collect data success and decide capabilities and ecosystem value pools ▪ Refine business model next steps identify your gaps and ecosystem partnerships

4 3 ▪ Choose the right partner ▪ Understand your capabilities Make connections Identify ecosystem relative to your business ▪ Engage in the right way in pursuit of your value and how you objectives might capture it model ▪ Determine what to invest in and what to partner for Notices and disclaimers

Copyright © 2018 by International Business Machines Corporation (IBM). as illustrations of how those customers have used IBM products and No part of this document may be reproduced or transmitted in any form the results they may have achieved. Actual performance, cost, savings or other without written permission from IBM. results in other operating environments may vary.

U.S. Government Users Restricted Rights — use, duplication or disclosure References in this document to IBM products, programs, or services does not restricted by GSA ADP Schedule Contract with IBM. imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as Workshops, sessions and associated materials may have been prepared by of the date of initial publication and could include unintentional technical or independent session speakers, and do not necessarily reflect the typographical errors. IBM shall have no responsibility to update this views of IBM. All materials and discussions are provided for informational information. This document is distributed “as is” without any warranty, purposes only, and are neither intended to, nor shall constitute legal or other either express or implied. In no event shall IBM be liable for any damage guidance or advice to any individual participant or their specific situation. arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. It is the customer’s responsibility to insure its own compliance with legal IBM products and services are warranted according to the terms and requirements and to obtain advice of competent legal counsel as to conditions of the agreements under which they are provided. the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions IBM products are manufactured from new parts or new and used parts. the customer may need to take to comply with such laws. IBM does not In some cases, a product may not be new and may have been previously provide legal advice or represent or warrant that its services or products will installed. Regardless, our warranty terms apply.” ensure that the customer is in compliance with any law.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented

IBM Analytics University 2018 Notices and disclaimers continued

Information concerning non-IBM products was obtained from the suppliers Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management of those products, their published announcements or other publicly available System™, FASP®, FileNet®, Global Business Services®, sources. IBM has not tested those products in connection with this Global Technology Services®, IBM ExperienceOne™, IBM SmartCloud®, IBM publication and cannot confirm the accuracy of performance, compatibility Social Business®, Information on Demand, ILOG, Maximo®, or any other claims related to non-IBM products. Questions on the MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, capabilities of non-IBM products should be addressed to the suppliers of PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, those products. IBM does not warrant the quality of any third-party PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, products, or the ability of any such third-party products to interoperate with Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, IBM’s products. IBM expressly disclaims all warranties, expressed or StoredIQ, Tealeaf®, Tivoli® Trusteer®, Unica®, urban{code}®, Watson, implied, including but not limited to, the implied warranties of WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of merchantability and fitness for a particular, purpose. International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might The provision of the information contained herein is not intended to, and be trademarks of IBM or other companies. A current list of IBM trademarks is does not, grant any right or license under any IBM patents, copyrights, available on the Web at "Copyright and trademark information" at: trademarks or other intellectual property right. www.ibm.com/legal/copytrade.shtml. IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS,

IBM Analytics University 2018