DRAFT for discussion only
Data Science Tech Talk
10/5/17
Creating a Data Science Center of Excellence at GSA
The Business: FAS, PBS, OGP, OHRM, OCFO
GSA IT
Core Business Analytics Teams: FAS, PBS, OGP, OHRM, OCFO
Application Business Executive
Data Steward
Application Development
Why is this important?
Data Science is a concept that unifies statistics, data analysis, and their related methods in order to understand and analyze actual data use cases.
1 Data Modeling
2 Programming
3 Visualization/Dashboarding/Reporting
4 Access & Authentication
5 Portal Management
6 Analysis, Story Telling & Briefings
Legend: Business function | Data Science COE | IT function. The numbers 1 to 6 indicate steps within the proposed Data Science COE training program.

Data Science Practitioner Training: Training courses scheduled to be delivered to the agency beginning in FY18
Data Science Practitioner Training: 13 required courses spread throughout FY18
2 Programming & Modeling
Tools: R & Python
Description: Leveraging R & Python to run statistical and mathematical scenarios
Introduction (mandatory intro): Intro to Stats
Curriculum Pathway: Beginner Python, Intermediate Python, Beginner R, Intermediate R, Data Modeling, Data Engineering, Data Visualization

3 Data Analytics & Reporting
Tools: Tableau & MSTR (MicroStrategy)
Description: Exporting data to a suite of analytics tools to create high-level and detailed dashboards that present and analyze data
Curriculum Pathway: Beginner Tableau, Intermediate Tableau, Advanced Tableau, Beginner MicroStrategy, Intermediate MicroStrategy, Advanced MicroStrategy

5 Content Management
Tools: D2D, Drupal
Description: Building and tracking content on the D2D portal, including adding and editing existing datasets, dashboards & data models
Curriculum Pathway: D2D Content Management

6 Story Telling & Briefing
Tools: OCE
Description: Internal & external presentations around customer data; skilled in transforming customer & market insight into materials that help GSA and the business lines forecast
Introduction (mandatory intro): Identifying your business problem, Human Centered Design

*Note: Calendar subject to change
SENSITIVE & PRE-DECISIONAL NOT FOR EXTERNAL DISTRIBUTION
Data Science training aims to build proficiency in harnessing the power of data; COE certification will greatly enhance skill sets
Training Course | Skills Obtained | Business Value

Programming & Modeling
• Perform basic R/Python programming language functions and write code
• Be proficient in statistical analysis and run R/Python coded programs
✓ Less reliance on contract support for basic analysis

Analytics & Reporting
• Wrangle and clean data to prepare for report building
• Create dashboard visualizations based on business needs & customer requirements
✓ Enhancing skill sets of Federal employees

Content Management
• Create and edit new Reports, Datasets, Documents, and Articles for their customers
✓ Increase in self-service

Story Telling & Briefing
• Transform customer and market insights into materials that help GSA and the business lines forecast and anticipate customer needs
✓ Opening doors to new opportunities
Status legend: In planning stages | Currently active

Intro to Data Science Virtual Desktop (DSVD)
○ DSVD is a virtual GSA desktop that can be used to access D2D components and the D2D Data Warehouse.
○ GSA’s DSVD provides software tools, data, underlying server hardware, and network infrastructure.
■ Users need access to a machine.
■ Specific datasets, tools, and other content and components are available to the DSVD user by default.
■ A request process is in place to obtain approval for access to non-public content.

DSVD Tools
Tools: Putty, Cygwin Shell, Alfresco, Notepad++, WinSCP, FileZilla, Jasper, Jasper Studio Developer, Python, Anaconda, RStudio, Oracle Java JDK, D2D Staging Data Warehouse, MySQL Workbench, MicroStrategy Developer, Oracle SQL Developer, Microsoft SQL Server Database Management Studio, GIT, AWS Command Line Interface
Languages: SQL, Python, R, Java, UNIX Scripting
Access: Production Access, Database Access, Python, R
November launch: Pentaho
Upcoming D2D Component Access: JBoss Fuse, JBoss DataVirt, JBoss BPMS, Pentaho DI

Data Science Curriculum Pathway
Timeline: Prep (before entering the program), then Month 1 through Month 6
Course tracks:
• Data Science Overview; Human-centered Design
• Intro to Statistics; Capstone
• Intro Python; Intermediate Python; Python for Data Science; Data Modeling; Data Engineering; Data Visualization
• Intro R; Intermediate R; Fundamental Methods for Data Science in R
• Beginner MicroStrategy; Intermediate MicroStrategy; Advanced MicroStrategy
• Beginner Tableau; Intermediate Tableau; Advanced Tableau
• Beginner Content Mgmt
Course categories: Career Tools; Programming & Modeling; Analytics & Reporting; Content Management; Optional
Electives: pick 1 or more
Links
Data Science Practitioner Interest Form: https://docs.google.com/forms/d/e/1FAIpQLSeDQez3GJjHB0vo8Z5yJXstXuBSFBAyQOw1k2E1TbaOno1Ang/viewform
Register for the Oct 20th event: https://open.gsa.gov/events/data-science-codealong
View the class info on GitHub (work in progress): https://github.com/GSA/training-pathway-data-practitioner
View the Data Science Practitioner program info (coming soon): https://tech.gsa.gov/
Additional questions can be sent to [email protected]!
CHIEF TECHNOLOGY OFFICE