Data Science in Industry

Ellie Dobson, Pivotal


Network intrusion detection
Networks in telecommunications

What has industry ever done for us?

Theory-driven approach: ‘start with the system and work towards the data’
Data-driven approach: ‘start with the data and work towards the system’

Descriptive Analytics: What happened?
Diagnostic Analytics: Why did it happen?
Predictive Analytics: What will happen?
Reactive Analytics: How can we make it happen?
(Chart: value against complexity, rising from descriptive to reactive analytics)

The value of data over time
Big Data: Size
drive automated, low-latency actions in response to events of interest

draw insights from a large historical dataset…

… and use them to make split-second decisions on real-time data

Fast Data: Speed
(Timescale: year+, year, month, day, s, ms, µs)
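The split between slow learning on historical data and fast decisions on live data can be sketched in a few lines of Python. This sketch rests on assumptions that are not in the talk. The "insight" is only the mean and standard deviation of a numeric stream. An "event of interest" is any reading more than k standard deviations from that mean, a simple z-score rule standing in for a real model. The names `fit_baseline`, `is_event_of_interest` and `k` are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Summary statistics learned offline from historical data (assumed model)."""
    mean: float
    stdev: float


def fit_baseline(history: list[float]) -> Baseline:
    # Slow path: a pass over the full historical dataset (hours to months of data).
    return Baseline(statistics.fmean(history), statistics.stdev(history))


def is_event_of_interest(value: float, baseline: Baseline, k: float = 3.0) -> bool:
    # Fast path: constant work per event, so it can keep up with live data.
    # Hypothetical rule: flag readings more than k standard deviations from the mean.
    return abs(value - baseline.mean) > k * baseline.stdev


history = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
baseline = fit_baseline(history)  # mean 100.0, sample stdev 2.0
for reading in [101.0, 99.5, 130.0]:
    if is_event_of_interest(reading, baseline):
        # Placeholder for an automated, low-latency action.
        print(f"action triggered for reading {reading}")
# prints: action triggered for reading 130.0
```

The expensive step runs once per batch of history, while the per-event check is a subtraction and a comparison. That asymmetry is what lets the same insight drive decisions at millisecond timescales.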

Predictive maintenance
Payment fraud
Formula 1 racing

Telemetry, Car setup, Traffic, Weather
(Plot: true positive rate against false positive rate)
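A plot of true positive rate against false positive rate is a ROC curve. The Python sketch below computes one from a classifier's scores. It assumes that a higher score means "more likely positive", for example a likely fault or fraud, and that labels are 1 for positives and 0 for negatives. The names `roc_points` and `auc` are hypothetical helpers, not from the talk or any specific library. Each distinct score is tried as a threshold, and one (FPR, TPR) point is recorded per threshold.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A case is predicted positive when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.4], [1, 1, 0, 1, 0])
print(pts[-1], round(auc(pts), 4))  # prints: (1.0, 1.0) 0.8333
```

A curve that bows further towards the top-left corner gives more true positives for each false positive. The area under it summarises this as one number, where 0.5 corresponds to random guessing.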

Big Data
Complex Data

Operational Data
Commercial & Social Data
Dark Data
Public Data

TOOLKIT

1 Find Data

2 Write Code
Editing Tools
• Vi/Emacs
• Smultron
• TextWrangler
• Eclipse
• Notepad++
• IPython
• Sublime
Languages
• SQL
• Bash scripting
• C++
• C#
• Java
• Python
• R

3 Run Code
Platforms
• Greenplum DB
• Pivotal HD
• Hadoop (other)
• SAS HPA
• AWS
Interfaces
• pgAdminIII
• psql
• psycopg2
• Terminal
• Cygwin
• Putty
• Winscp

4 Write Code for Big Data
In-Database
• SQL
• PL/Python
• PL/Java
• PL/R
• PL/pgSQL
Hadoop
• Pig
• Hive
• Java
• Spark

5 Implement Algorithms
Libraries
• MADlib
• Java: Mahout
• Python: numpy, scipy, scikit-learn, Pandas
• R: (Too many to list!)
• Text: OpenNLP, NLTK, GPText
• C++: opencv
Programs
• Alpine Miner
• Rstudio
• MATLAB
• SAS
• Stata

6 Show Results
Visualization
• python-matplotlib
• GraphViz
• python-networkx
• Gephi
• D3.js
• R (ggplot2, lattice, shiny)
• Tableau
• Excel

7 Collaborate
Sharing Tools
• Chorus
• Confluence
• Socialcast
• Github
• Google Drive & Hangouts

Fashion analytics
Video analysis
Call centre analysis

What has industry ever done for us?

Thanks for listening
