Microsoft Explained

Dane Stubben
QuintilesIMS Database Manager
Dane@KelsandDane

TAILGATE
When: Between sessions
Where: Parking lot
Why:
. Get up
. Move around
. Network
. Grab a drink or snack

What is R?

. Powerful statistical programming language
. Data Visualization tools
. Scalable to Big Data
. Most widely used data analysis software
. Used by 2M+ data scientists, statisticians, and analysts
. Thriving open-source community
. Leading edge of analytics research
. Provides a suite of operators for calculations on arrays, lists, vectors, and matrices.
. Provides graphical facilities for data analysis and display.

3 | 11/20/2016 | @2016 KelsandDane All Rights Reserved

History of R

. 1993: Research Project
  . Ross Ihaka and Robert Gentleman – Auckland, NZ
. 1995: Open-source Release
  . Compatible w/ the Bell Labs S statistical language
. 1997: R Development Core Team
. 2000: R 1.0.0 Release
. 2003: R Foundation
. 2004: First international user conference
. 2007: Revolution Analytics founded
. 2013: Revolution R Open Release
. 2015: Microsoft acquires Revolution Analytics

Microsoft R Editions

. Microsoft R Open
  . Open-source R Distribution
  . Enhanced and distributed by Revolution Analytics
. SQL Server R Services or R Services (In-Database)
  . Built-in Advanced Analytics
  . Standalone Server Capability
  . Integrated w/ SQL 2016 Enterprise
  . Used to develop and deploy R packages in a development environment
. Microsoft R Server
  . Microsoft R Server for Red Hat Linux, SUSE Linux, Teradata DB, Hadoop on Red Hat
  . Microsoft R Server Developer Edition
  . Integrated w/ SQL 2016 Enterprise

Microsoft R Editions

. Microsoft R Client
  . Separate, free installer
  . Develop solutions that can be deployed to R Services (In-Database) or Microsoft R Server running on Windows, Teradata, or Hadoop
. Microsoft Data Science Virtual Machine
  . Azure VM pre-installed and configured with common data analytics and machine learning tools:
    . Microsoft R Server Developer Edition
    . Anaconda Python distribution
    . Jupyter notebook (w/ R, Python kernels)
    . Visual Studio Community Edition
    . Power BI desktop
    . SQL Server 2016 Developer Edition
    . Machine Learning tools:
      . Computational Network Toolkit (CNTK)
      . Vowpal Wabbit
      . XGBoost
      . Rattle
      . MXNet
    . Libraries in R and Python for Azure Machine Learning
    . Git

I Did The Math

. Microsoft R Open
  . Multi-threaded
. Microsoft R Server
  . Capacity
  . Handles large datasets and models
  . Speed
  . Overcome R’s traditional memory limits
  . Parallelize across cores and nodes
  . Minimize data movement w/ in-database
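
A rough illustration of the "parallelize, then combine" idea: each worker summarizes its own slice of the data, and only the small summaries are merged. This is a conceptual Python sketch, not Microsoft R Server code or its API. The names `partial_stats` and `parallel_mean` are hypothetical, and a thread pool is assumed only to keep the example short and portable.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Summarize one chunk as (count, sum): a tiny result that is cheap to move."""
    return len(chunk), sum(chunk)


def parallel_mean(values, n_workers=4, chunk_size=3):
    """Split values into chunks, summarize the chunks concurrently, then combine."""
    if not values:
        raise ValueError("no values to summarize")
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Threads keep the sketch simple; CPU-bound work in CPython would need
    # processes (or separate machines) to get a real speed-up.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    count = sum(n for n, _ in partials)
    total = sum(s for _, s in partials)
    return total / count


print(parallel_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 5.5
```

The point of the design is that workers return a count and a sum rather than the raw rows, so the combine step stays small no matter how large each slice is.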

RStudio

Microsoft R Client

R Services (In-Database)

R Libraries / Components

. ScaleR
  . Collection of proprietary functions in Microsoft R Client and R Server used for practicing data science at scale
  . Works on both small and large datasets
  . Enables analysis of very large data sets that would otherwise exceed the memory and processing capabilities of the machine.
. DeployR
  . Turns R scripts into analytic web services
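
The chunk-at-a-time idea behind analyzing data larger than memory can be sketched in a few lines. This is a conceptual illustration in Python, not ScaleR code or its API. The function name `chunked_mean`, the `delay` column, and the small in-memory sample standing in for a large file are all hypothetical.

```python
import csv
import io
from itertools import islice


def chunked_mean(rows, column, chunk_size=1000):
    """Return the mean of column, holding at most chunk_size rows in memory."""
    count = 0
    total = 0.0
    while True:
        # Pull the next chunk from the row iterator; earlier chunks are discarded.
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            break
        count += len(chunk)
        total += sum(float(row[column]) for row in chunk)
    if count == 0:
        raise ValueError("no rows to summarize")
    return total / count


# Hypothetical stand-in for a file too large to load at once.
sample = io.StringIO("delay\n10\n20\n30\n40\n")
reader = csv.DictReader(sample)
print(chunked_mean(reader, "delay", chunk_size=2))  # 25.0
```

Only running totals survive between chunks, so memory use depends on `chunk_size` rather than on the size of the data set.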

Microsoft Data Science Virtual Machine

Advanced Analytics w/ Data Science

Data Science Focus

. Big Data Engineering
. Data Visualization
. Advanced Analytics
. Cybersecurity
. Healthcare
  . Preventative Policies (Reactive vs Proactive)
  . Virulent outbreak
  . Intervention Deployment
  . Personalize Healthcare Delivery
. Artificial Intelligence / Machine Learning
  . Deep Learning
  . Computer Vision
  . Natural Language Processing
  . Autonomous Systems (Robots!)
. GIS
. Research
. Politics
. Finance
. Marketing
. Education
. Sports Analytics

Data Science Skillset

Mathematics / Statistics: Linear Algebra, Statistics, Logic, Discrete Optimization, Calculus

Computer Languages: R, Python, Spark / Scala, Data Visualization, Machine Learning, Artificial Intelligence, Julia, Java, T-SQL, MATLAB, Octave

Computer/DB Database: SQL, Hadoop, Oracle, MongoDB

Business Skills: SSIS / BI, Data Modeling, Communication, Analytical, Curiosity, Continuous Learner, Tenacity, Adaptability

Tools: R, RStudio, Microsoft R Open, Microsoft R Client, Microsoft R Server, R Services (In-Database), Microsoft Data Science VM, SAS, Tableau

Build-out Training Environment

R Environment

Microsoft Environment Route:
. SQL R Services – Download and Install Feature in SQL Server 2016 Developer
. Microsoft SQL R Developer – Download and Install

R System Route:
. R – Download and Install
OR
. RStudio Server – Download and Install

Build-out Training Environment

R Client

Microsoft Environment Route:
. Microsoft R Client – Download
. RStudio – Download and Install

R System Route:
. RStudio – Download and Install

Build-out Training Environment

R Examples

MRAN Tutorials R Services (In-Database) Data Science Walkthroughs R Examples

Microsoft R References

. Microsoft R
. MRAN Packages
. R Project for Statistical Computing
. Comprehensive R Archive Network (CRAN)
. R Services for SQL Server 2016 (YouTube)
. R Services for SQL Server 2016 KB
. Data Science with SQL Server R Services

Community and Learning

. Twitter
  . @MicrosoftR
  . @revoDavid
  . @RBloggers
  . @BecomingDataSci
  . @RWomenTaskForce
. Online Community
  . PASS Big Data Virtual Chapter
  . PASS Data Science Virtual Chapter
  . PASS Women in Technology Virtual Chapter
  . Microsoft R Server Tiger Team <- How cool of a team name is that?
. Online Training
  . DataCamp
  . Pluralsight
  . Coursera
  . edX
  . Lynda.com <- Omaha/Lincoln libraries have free access
  . Microsoft Virtual Academy
  . MIT

Becoming a Data Scientist References

. Data Science Learning Camp
. Doing Data Science
. The Data Science Handbook

Thank our Sponsors!

Questions?

Dane Stubben
Twitter: @kelsanddane
URL: KelsandDane
Email: [email protected]

Event & Session Evals – ONLINE ONLY
Event: http://www.sqlsaturday.com/552/EventEval.aspx
Session: http://www.sqlsaturday.com/552/sessions/sessionevaluation.aspx
