
FACT SHEET

SAS® Data Preparation
Empower analysts to quickly prepare data for analytics in a self-service, point-and-click environment

What does SAS® Data Preparation do?
SAS Data Preparation provides an interactive, self-service environment for users who need to access, blend, shape and cleanse data to prepare it for reporting or analytics.

Why is SAS® Data Preparation important?
SAS Data Preparation saves time on the preliminary tasks done to prepare data for reporting and analytics. Its intuitive interface provides point-and-click actions for critical functions – no coding or SQL skills required. With simplified data preparation tasks seamlessly defined as part of the activities involved in analytics processing, users can spend more time analyzing data and less time preparing it.

For whom is SAS® Data Preparation designed?
It’s designed for business analysts, citizen data scientists and other nontechnical users. Data scientists and IT can use the same interface to prepare reusable plans for business analysts.

To answer time-sensitive business questions, organizations need fast access to consistent, trusted data that they can use for analytics. Without it, they may not be able to respond quickly enough to market and customer requirements. But most organizations have massive volumes of data spread across silos. This raw data often contains errors or is duplicated, outdated or lacks identifiers needed to merge sources. Preparing it for analytics can consume up to 80 percent of an analyst’s time.

It’s a frustrating issue for business and IT. Nontechnical users lack the skills to move and transform data to make it ready for analytics. Alternatives require extensive coding, SQL or scripting knowledge, and training in data engineering for extract, transform, load (ETL) tools. In most cases, IT has to provision data for business users when they could have focused on more strategic activities. And business users have to wait in line for IT to create their data sources before they can get data in the right form for analytics.

Many organizations want to give business users direct access to data to free IT from never-ending custom data requests and improve everyone’s productivity. Through its self-service tools, SAS Data Preparation empowers business users to take vetted data from IT and customize it for any report or analysis they need.

Built on SAS® Viya®, the intuitive, visual interface of SAS Data Preparation1 makes it easy for business users to quickly prepare data without coding or help from IT. The software runs in a fast, in-memory distributed environment. This frees IT from the mundane task of provisioning data, and business analysts and data scientists get relevant results that drive faster business insights. The interface automatically generates code that can be scheduled to ensure currency with source system refreshes. Templates can be defined and reused, promoting sharing and collaboration.

Key Benefits
• Boost productivity through self-service data preparation. No specialized skills or coding are required to access, merge and shape data, and data preparation tasks are defined within the same visual experience – automatically integrated with downstream analytics and reporting tasks.
• Gain efficiency through reusability, collaboration. Automatically generated code and defined transformations can be shared with IT and scheduled to run with each source code update. Data preparation tasks can be saved in projects, then shared and reused by others.
• Empower analytics users with fast results. Prebuilt transformations and functions assist users as they explore data, refine it and explore some more. And with in-memory distributed processing and parallel I/O, responses can be delivered in near-real time.
• Reduce total cost of ownership. Make the most of your existing resources by giving them a visual, interactive interface that guides them through routine reporting and analytics data preparation tasks, with software that requires very little training.

1 SAS Visual Analytics (sold separately) is a required product for SAS Data Preparation.

Product Overview
With the volume and variety of data available today, business analysts need to curate data to answer specific questions. This requires different views of the data, which often needs to be examined in different ways, multiple times a day. Even when IT has prepared and cleansed the data for them, analysts still need to iteratively examine and prepare it further for their particular needs.

SAS Data Preparation provides the type of ad hoc environment today’s analytics professionals crave. With its simple, interactive user interface designed for self-service data preparation, nontechnical users have flexibility to integrate data from virtually any source they need, cleanse it and prepare it for analysis quickly and easily. Data can be loaded in memory so multiple users will share the same view simultaneously. Users’ data preparation tasks are fully integrated with downstream reporting and analytics processing – all from the same intuitive interface. Market-validated and capabilities are prebuilt for quick data vetting and correction. And the seamless, consistent user experience extends across the entire analytics life cycle.

Easy-to-use capabilities
With SAS Data Preparation, it’s easy to access, integrate, browse and cleanse data. Visually explore external data sources, and big data stores like Hadoop and data in SAS Viya. Create connections to external data sources on the fly – curate what you need, when you need it. And get fast insight into the data by profiling physical metadata information – column names, data types, encoding, column and row counts.
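To make the idea of profiling physical metadata concrete, here is a minimal Python sketch that reports column names, column count, row count and a rough per-column type tally for a small CSV sample. It uses only the Python standard library and is not the SAS Data Preparation interface or any SAS API; the function names (`infer_type`, `profile_csv`) and the sample data are hypothetical and chosen for illustration only.

```python
import csv
import io
from collections import Counter


def infer_type(value: str) -> str:
    """Classify a raw text value as 'empty', 'int', 'float' or 'str'."""
    if value.strip() == "":
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "str"


def profile_csv(text: str) -> dict:
    """Collect basic physical metadata for CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    type_counts = {name: Counter() for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            # Short rows yield None for missing cells; treat those as empty.
            type_counts[name][infer_type(row[name] or "")] += 1
    return {
        "column_names": columns,
        "column_count": len(columns),
        "row_count": row_count,
        "type_counts": {n: dict(c) for n, c in type_counts.items()},
    }


# Illustrative sample data, not taken from the fact sheet.
SAMPLE = "customer_id,name,balance\n1,Ada,10.5\n2,Grace,\n3,Linus,7\n"
profile = profile_csv(SAMPLE)
print(profile)
```

A mixed tally for one column (for example, integers, floats and empty values side by side) is the kind of signal a profile surfaces before any cleansing begins.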

You can access data from flat files, relational data sources, social media sources, SAS data sets, Apache Hadoop, Teradata, CSV files, text files and other sources. Technical users who prefer to code can access the SAS Data Quality routines from SAS code or from third-party coding languages, like Python.

Figure 1. Explore data accessed from multiple sources.

Speed and scalability
High-performance, high-quality data fuels high-performing results. With SAS Data Preparation, users can interactively blend and shape data in near-real time, without having to wait on batch processes. Data preparation functions can be loaded in parallel and processed in memory. For some sources, processing can be pushed to run where the data resides – speeding execution of SAS code, minimizing data movement and delivering rapid responses.
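The benefit of pushing processing to where the data resides can be illustrated with a small, generic Python sketch. It uses the standard-library `sqlite3` module as a stand-in data source and compares fetching every row and filtering locally with letting the source apply the filter. This is an analogy under stated assumptions, not how SAS Data Preparation is implemented; the table, the function names and the data are hypothetical.

```python
import sqlite3

# Stand-in "source system": an in-memory SQLite table with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "APAC", 80.0), (3, "EMEA", 45.5), (4, "AMER", 300.0)],
)


def fetch_all_then_filter(connection, region):
    """Move every row out of the source, then filter on the client side."""
    rows = connection.execute(
        "SELECT order_id, region, amount FROM orders ORDER BY order_id"
    ).fetchall()
    kept = [(oid, amt) for oid, reg, amt in rows if reg == region]
    return kept, len(rows)  # result, number of rows moved


def filter_at_source(connection, region):
    """Let the source apply the filter so only matching rows are moved."""
    rows = connection.execute(
        "SELECT order_id, amount FROM orders WHERE region = ? ORDER BY order_id",
        (region,),
    ).fetchall()
    return rows, len(rows)  # result, number of rows moved


local_rows, local_moved = fetch_all_then_filter(conn, "EMEA")
pushed_rows, pushed_moved = filter_at_source(conn, "EMEA")
print(local_rows == pushed_rows, local_moved, pushed_moved)
```

Both approaches return the same result, but the second moves only the matching rows, which is the data-movement saving the paragraph above describes.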

Visual interface for self-service data preparation
Business analysts and data scientists can use the wizard-based interface to access, integrate, view, filter, join, transform, cleanse and query data. Each transformation is designed to guide users through the data orchestration process so they can easily understand the impact of how any single data preparation task affected results.

Figure 2. Object lineage shows the relationships between different objects.

Variety of prebuilt transformations
Several types of prebuilt transformations are included in SAS Data Preparation – column-based, row-based, code-based and multiple-input-based transformations. These prebuilt transformations assist with filtering, blending, shaping, remediating and standardizing data.

Built-in data quality
Out of the box, SAS Data Preparation includes SAS Data Quality functions to help create analytics-ready data. Functions include profiling, casing, standardizing, parsing, identification analysis and more. Users can generate column-based and table-based basic and advanced profile metrics to uncover data quality issues and get insights into the data itself. Data quality and other data preparation tasks are accessible from coding interfaces other than SAS, including Python.2

Data governance and lineage
SAS Data Preparation lets users explore the relationships between data sources, data objects and actions taken on the data – so it’s easy to trace pipeline activity.

Collaboration, reuse and automation
With SAS Data Preparation, users can prepare data for their specific analysis, then save and share transformations so they can be reused later. Templates can be defined from a point-and-click interface – or from a coding environment – defining best practices for others to use. Template code can also be scheduled as part of IT processing to keep prepared data current with refreshes.

Key Features

Data and metadata access
• Use any authorized internal source, accessible external data sources and data held in memory in SAS Viya.
• View a sample of a table or file loaded in the in-memory engine of SAS Viya, or from data sources registered with SAS/ACCESS®, to see data you want to work with.
• Quickly create connections to and between external data sources.
• Access physical metadata information like column names, data types, encoding, column count and row count to gain further insight into the data.
• Data sources and types include:
  • Access to more than 20 data sources and types, including relational , social sources, etc.

Data provisioning
• Parallel load data from supported data sources into memory simply by selecting them – no need to write code or have experience with an ETL tool.*3
• Reduce the amount of data being copied by performing row filtering or column filtering before the data is provisioned.

Guided, interactive data preparation
• Transform, blend, shape, cleanse and standardize data in an interactive, visual environment.
• Easily understand how a transformation affected results, getting visual feedback in near-real time through the distributed, in-memory processing of SAS Viya.
• Quickly extract document content and perform text identification and extraction using batch text analysis.

Column-based transformations
• Save data plans for quick data preparation jobs (through support for wide tables).
• Use column-based transformations to standardize, remediate and shape data without configuring:
  • Change case, convert column, rename, remove, split, trim whitespace, custom calculations.

Row-based transformations
• Use row-based transformations to filter and shape data.
• Create analytical-based tables using the transpose transformation to prepare the data for analytics and reporting tasks.
• Create simple or complex filters to remove unnecessary data.

Code-based transformations
• Write custom code to transform, shape, blend, remediate and standardize data.
• Write simple expressions to create calculated columns, write advanced code or reuse code snippets for greater transformational flexibility.
• Import custom code defined by others, sharing best practices and collaborative productivity.

Multiple-input-based transformations
• Use multiple-input-based transformations to blend and shape data.
• Blend or shape one or more sets of data together using the guided interface – there’s no requirement to know SQL or SAS.

Data profiling
• Profile data to generate column-based and table-based basic and advanced profile metrics.
• Use the table-level profile metrics to uncover data quality issues and get further insight into the data itself.
• Drill into column-level profile metrics and see visual graphs of pattern distribution and frequency distribution results.
• Use a variety of data types/sources (listed previously) to profile data from Twitter, Facebook, Google Analytics or YouTube.

2 Such third-party interfaces to SAS are available for download from GitHub.
3 Data cannot be sent back to the following data sources: Twitter, YouTube, Facebook, Google Analytics, Esri; it can only be sourced from these sites.

Key Features (continued)

Data quality processing4

Data cleansing
• Find like records and group together logically.
• Use locale- and context-specific parsing and field extraction definitions to reshape data and uncover additional insights.
• Use the extraction transformation to identify and extract information (e.g., name, gender, field, pattern, identify, email and phone number) in a specified column.
• Generate match codes on data that can be used to perform fuzzy matching.
• Standardize data with locale- and context-specific definitions to transform data into a common format, like casing.

Identity definition
• Create a unique identity for each row with the unique ID generator.
• Analyze column data using locale-specific rules to determine gender or context.
• Use identification analysis to analyze the data and determine its context, and to identify the subject data in each column.
• Use gender analysis to determine the gender of a name using locale-specific rules.

System and job monitoring
• Use integrated monitoring capabilities for system- and job-level processes.
• Understand how many processes are running, how long they’re taking and who is running them.
• Easily filter through all system jobs based on job status (running, successful, failed, pending and cancelled).
• Access job error logs to help with root-cause analysis and troubleshooting.

Data import and data preparation job scheduling
• Create a data import job from automatically generated code to perform a data refresh using the integrated scheduler.
• Schedule data explorer imports as jobs so they will become an automatic, repeatable process.
• Specify a time, date, frequency and/or interval for the jobs.

• Identify, find and sort data by tagging columns and tables.
• Create multiple views with different tabs, and save the organization of those views.
• Explore relationships between accessible data sources, data objects and jobs.
• Use the relationship graph to visually show the relationships that exist between objects.

Plan templates and project collaboration
• Use data preparation plans (templates), with one or more sources of data, to improve productivity.
• Reuse the templates by applying them to different sets of data to ensure that data is transformed consistently to adhere to enterprise data standards and policies.
• Rely on team-based collaboration through a project hub used with SAS Viya projects.

Cloud data exchange
• Move data from on-premises locations to SAS Viya running in a private or public cloud.
• Preprocess data locally to reduce the amount of data that needs to be moved to remote locations.
• Use a command-line interface (CLI) for administration and control.
• Use cloud data exchange to securely and responsibly negotiate your on-site firewall.

TO LEARN MORE »
To learn more about SAS Data Preparation system requirements, download white papers, view screenshots and see other related material, please visit: sas.com/data-preparation.

4 Supported data quality transformations rely on SAS Quality Knowledge Base Locales, a locale-specific library of data quality functions available in over 30 locales, included with SAS Data Preparation.
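As a rough illustration of the match-code idea listed under data cleansing, the following Python sketch derives a simplified key from each value and groups values that share a key. It is a toy normalization written for this example. It is not the SAS Quality Knowledge Base algorithm, and the noise-word list, the function names (`match_code`, `cluster`) and the sample names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical noise words to ignore; a real locale definition would be far richer.
NOISE_TOKENS = {"mr", "mrs", "ms", "dr", "inc", "corp", "ltd", "co"}


def match_code(value: str) -> str:
    """Build a toy match code: lowercase, drop punctuation and noise words,
    sort the remaining tokens, then join them in upper case."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    kept = sorted(t for t in tokens if t not in NOISE_TOKENS)
    return "".join(kept).upper()


def cluster(records):
    """Group records whose toy match codes are identical."""
    groups = defaultdict(list)
    for record in records:
        groups[match_code(record)].append(record)
    return dict(groups)


# Illustrative values only.
names = ["Smith, John", "john smith", "Dr. John  Smith",
         "Jane Doe", "DOE, JANE", "Acme Corp."]
clusters = cluster(names)
for code, members in clusters.items():
    print(code, members)
```

Values that differ only in casing, punctuation, word order or a title collapse to the same key, so like records can be found and grouped without exact string equality.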

To contact your local SAS office, please visit: sas.com/offices

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2018, SAS Institute Inc. All rights reserved. 109216_G91145.1018