Microsoft R Explained
Dane Stubben
QuintilesIMS Database Manager
Dane@KelsandDane

TAILGATE
- When: Between sessions
- Where: Parking lot
- Why:
  - Get up
  - Move around
  - Network
  - Grab a drink or snack

What is R?
- Powerful statistical programming language
- Data visualization tools
- Scalable to Big Data
- Most widely used data analysis software
- Used by 2M+ data scientists, statisticians, and analysts
- Thriving open-source community
- Leading edge of analytics research
- Provides a suite of operators for calculations on arrays, lists, vectors, and matrices
- Provides graphical facilities for data analysis and display

3 | 11/20/2016 | @2016 KelsandDane All Rights Reserved

History of R
- 1993: Research Project
  - Ross Ihaka and Robert Gentleman – Auckland, NZ
- 1995: Open-source Release
  - Compatible w/ the S statistical language (Bell Labs)
- 1997: R Development Core Team
- 2000: R 1.0.0 Release
- 2003: R Foundation
- 2004: First international user conference
- 2007: Revolution Analytics founded
- 2013: Revolution R Open Release
- 2015: Microsoft acquires Revolution Analytics

Microsoft R Editions
- Microsoft R Open
  - Open-source R Distribution
  - Enhanced and distributed by Revolution Analytics
- SQL Server R Services or R Services (In-Database)
  - Built-in Advanced Analytics
  - Standalone Server Capability
  - Integrated w/ SQL 2016 Enterprise
  - Used to develop and deploy R packages in a development environment
- Microsoft R Server
  - Microsoft R Server for Red Hat Linux, SUSE Linux, Teradata DB, Hadoop on Red Hat
- Microsoft R Server Developer Edition
  - Integrated w/ SQL 2016 Enterprise

Microsoft R Editions
- Microsoft R Client
  - Separate, free installer
  - Develop solutions that can be deployed to R Services (In-Database) or Microsoft R Server running on Windows, Teradata, or Hadoop
- Microsoft Data Science Virtual Machine
  - Azure VM pre-installed and configured with common data analytics and machine learning tools:
    - Microsoft R Server Developer Edition
    - Anaconda Python distribution
    - Jupyter notebook (w/ R, Python kernels)
    - Visual Studio Community Edition
    - Power BI Desktop
    - SQL Server 2016 Developer Edition
    - Machine Learning tools:
      - Computational Network Toolkit (CNTK)
      - Vowpal Wabbit
      - XGBoost
      - Rattle
      - MXNet
    - Libraries in R and Python for Azure Machine Learning
    - Git

I Did The Math
- Microsoft R Open
  - Multi-threaded
- Microsoft R Server
  - Capacity
    - Handles large datasets and models
  - Speed
    - Overcome R's traditional memory limits
    - Parallelize across cores and nodes
    - Minimize data movement w/ in-database

RStudio

Microsoft R Client

R Services (In-Database)

R Libraries / Components
- ScaleR
  - Collection of proprietary functions in Microsoft R Client and R Server used for practicing data science at scale
  - Works on both small and large datasets
  - Enables analysis of very large datasets that would otherwise exceed the memory and processing capabilities of the machine
- DeployR
  - Turns R scripts into analytic web services

Microsoft Data Science Virtual Machine

Advanced Analytics w/ Data Science

Data Science Focus
- Big Data Engineering
- Data Visualization
- Advanced Analytics
- Cybersecurity
- Healthcare
  - Preventative Policies (Reactive vs Proactive)
  - Virulent outbreak
  - Intervention Deployment
  - Personalize Healthcare Delivery
- Artificial Intelligence / Machine Learning
  - Deep Learning
  - Computer Vision
  - Natural Language Processing
  - Autonomous Systems (Robots!)
- GIS
- Research
- Politics
- Finance
- Marketing
- Education
- Sports Analytics

Data Science Skillset
Categories: Computer/DB Tools; Computer Languages; Database; Mathematics / Statistics; Business Skills
Skills: Linear Algebra, R, SQL, SSIS / BI, R, Statistics, Python, Hadoop, Data Modeling, RStudio, Logic, Spark / Scala, Oracle, Communication, Microsoft R Open, Discrete Optimization, Data Visualization, MongoDB, Analytical, Microsoft R Client, Curiosity, Calculus, Machine Learning, Continuous Learner, Microsoft R Server, Artificial Intelligence, Tenacity, R Services (In-Database), Julia, Adaptability, Microsoft Data Science VM, Java, SAS, T-SQL, Tableau, MATLAB, Octave

Build-out Training Environment: R Environment
- Microsoft Environment Route:
  - SQL R Services – Download and Install Feature in SQL Server 2016 Developer
  - Microsoft SQL R Developer – Download and Install
- R System Route:
  - R – Download and Install
  - OR RStudio Server – Download and Install

Build-out Training Environment: R Client
- Microsoft Environment Route:
  - Microsoft R Client – Download
  - RStudio – Download and Install
- R System Route:
  - RStudio – Download and Install

Build-out Training Environment: R Examples
- MRAN Tutorials
- R Services (In-Database) Data Science Walkthroughs
- R Examples

Microsoft R References
- Microsoft R
- MRAN Packages
- R Project for Statistical Computing
- Comprehensive R Archive Network (CRAN)
- R Services for SQL Server 2016 (YouTube)
- R Services for SQL Server 2016 KB
- Data Science with SQL Server R Services

Community and Learning
- Twitter
  - @MicrosoftR
  - @revoDavid
  - @RBloggers
  - @BecomingDataSci
  - @RWomenTaskForce
- Online Community
  - PASS Big Data Virtual Chapter
  - PASS Data Science Virtual Chapter
  - PASS Women in Technology Virtual Chapter
  - Microsoft R Server Tiger Team <- How cool of a team name is that?
- Online Training
  - DataCamp
  - Pluralsight
  - Coursera
  - edX
  - Lynda.com <- Omaha/Lincoln libraries have free access
  - Microsoft Virtual Academy
  - MIT

Becoming a Data Scientist References
- Data Science Learning Camp
- Doing Data Science
- The Data Science Handbook

Thank our Sponsors!

Questions?
Dane Stubben
Twitter: @kelsanddane
URL: KelsandDane
Email: [email protected]

Event & Session Evals – ONLINE ONLY
Event: http://www.sqlsaturday.com/552/EventEval.aspx
Session: http://www.sqlsaturday.com/552/sessions/sessionevaluation.aspx
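
Example: chunk-wise processing
The R Libraries / Components slide says ScaleR enables analysis of datasets that would otherwise exceed the memory of the machine. The slides do not show how ScaleR does this, so the following is only a rough base-R sketch of the general idea of processing data in blocks. It is not ScaleR code; the function name `chunked_mean`, the temporary file, and the chunk size are all assumptions made for illustration.

```r
# Illustrative only: compute a mean while holding one block of rows in memory.
# This is plain base R, NOT the ScaleR API; chunked_mean is a hypothetical helper.
chunked_mean <- function(con, chunk_rows = 1000L) {
  total <- 0   # running sum of all values seen so far
  n <- 0L      # running count of values seen so far
  repeat {
    # Read the next block of lines from the already-open connection.
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break   # no more data
    values <- as.numeric(lines)
    total <- total + sum(values)
    n <- n + length(values)
  }
  total / n
}

# Hypothetical data: the integers 1..10000, one per line, in a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(as.character(seq_len(10000)), path)

con <- file(path, open = "r")
result <- chunked_mean(con, chunk_rows = 1000L)
close(con)
unlink(path)

print(result)
```

Only the running sum and count survive between blocks, so memory use depends on the chunk size rather than on the size of the file. Statistics that can be updated block by block in this way are the ones that lend themselves to this style of processing.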