
DRAFT for discussion only

Data Science Tech Talk

10/5/17

Creating a Data Science Center of Excellence at GSA

The Business: FAS, PBS, OGP, OHRM, OCFO

GSA IT

Core Business Analytics Teams: FAS, PBS, OGP, OHRM, OCFO

Application Business Executive

Data Steward

Application Development

Why is this important?

Data Science is a concept to unify statistics, data analysis and their related methods in order to understand and analyze actual data use cases.

1. Data Modeling
2. Programming
3. Visualization/Dashboarding/Reporting
4. Access & Authentication
5. Portal Management
6. Analysis, Story Telling & Briefings

Legend: Business function; Data Science COE; IT function. Numbers 1-6 indicate steps within the proposed Data Science COE training program.

Data Science Practitioner Training: Training Courses scheduled to be delivered to the agency beginning FY18

Data Science Practitioner Training: 13 required courses spread throughout FY18

2. Programming & Modeling
Tools: R & Python
Description: Leveraging R & Python to run statistical and mathematical scenarios

3. Data Analytics & Reporting
Tools: Tableau & MSTR
Description: Using a suite of analytics tools, data is exported to create high-level, detailed dashboards to present & analyze data

5. Content Management
Tools: D2D, Drupal
Description: Building and tracking content on the D2D portal; includes adding and editing existing datasets, dashboards & data models

6. Story Telling & Briefing
Tools: OCE
Description: Internal & external presentations around customer data. Skilled in transforming customer & market insight into materials to help GSA and the business lines forecast

Curriculum Pathway:
• Introduction: Identifying your business problem
• Intro to Stats
• Beginner R
• Intermediate R
• Beginner Python
• Intermediate Python
• Data Modeling
• Data Engineering
• Data Visualization
• Beginner Tableau
• Intermediate Tableau
• Advanced Tableau
• Beginner MicroStrategy
• Intermediate MicroStrategy
• Advanced MicroStrategy
• D2D Content Management
• Human Centered Design

Mandatory intro

*Note: Calendar subject to change

SENSITIVE & PRE-DECISIONAL, NOT FOR EXTERNAL DISTRIBUTION

Data Science training aims to foster proficiency in the power of data; COE certification will greatly enhance skillsets

Training Course: Programming & Modeling
Skills Obtained:
• Perform basic R/Python programming language functions and code writing
• Be proficient in statistical analysis
• Run R/Python coded programs

Training Course: Analytics & Reporting
Skills Obtained:
• Wrangle and clean data to prepare for report building
• Create visualizations based on business needs & customer requirements

Training Course: Content Management
Skills Obtained:
• Create and edit new Reports, Datasets, Documents, and Articles for their Customer

Training Course: Story Telling & Briefing
Skills Obtained:
• Transform customer and market insights into materials to help GSA and the business lines forecast and anticipate customer needs

Business Value:
✓ Less reliance on contract support for basic analysis
✓ Enhancing skill sets of Federal employees
✓ Increase in self service
✓ Opening doors to new opportunities

Legend: In planning stages; Currently active

Intro to Data Science Virtual Desktop (DSVD)

○ DSVD is a virtual GSA desktop that can be used to access D2D components, and the D2D Data Warehouse

○ GSA’s DSVD provides software tools, data, underlying server hardware and network infrastructure.
■ Users need access to a machine.
■ Specific datasets, tools, other content and components are available to the DSVD user by default.
■ Request process in place to obtain approval for non-public content access.

DSVD Tools

• Putty
• Cygwin Shell
• Alfresco
• Notepad++
• WinSCP
• FileZilla
• Python
• Oracle JDK
• D2D Staging Data Warehouse
• MySQL Workbench
• Jasper
• Jasper Studio
• MicroStrategy Developer
• Oracle SQL Developer
• GIT
• AWS Command Line Interface
• Microsoft SQL Server Management Studio
• Anaconda
• RStudio

Languages: SQL, Python, R, Java, UNIX Scripting

Production Access

November launch

Upcoming D2D Component Access: JBoss Fuse, JBoss DataVirt, JBoss BPMS, Pentaho DI

Data Science Curriculum Pathway

Before entering the program:

Prep, Month 1, Month 2, Month 3, Month 4, Month 5, Month 6

• Data Science Overview
• Human-centered Design
• Intro to Statistics
• Intro Python, Intermediate Python, Python for Data Science
• Data Modeling, Data Engineering, Data Visualization
• Intro R, Intermediate R, Fund. Methods for Data Science in R
• Beginner MicroStrategy, Intermediate MicroStrategy, Advanced MicroStrategy
• Beginner Tableau, Intermediate Tableau, Advanced Tableau
• Beginner Content Mgmt
• Capstone

Legend: Programming & Modeling; Analytics & Reporting; Content Management; Career Tools; Electives - Pick 1 or more; Optional

Links

Data Science Practitioner Interest Form
https://docs.google.com/forms/d/e/1FAIpQLSeDQez3GJjHB0vo8Z5yJXstXuBSFBAyQOw1k2E1TbaOno1Ang/viewform

Register for the Oct 20th event
https://open.gsa.gov/events/data-science-codealong

View the class info on GitHub (work in progress)
https://github.com/GSA/training-pathway-data-practitioner

View the Data Science Practitioner program info (coming soon)
https://tech.gsa.gov/

Additional questions can be sent to [email protected]!

CHIEF TECHNOLOGY OFFICE