A VA Executive's Guide to Data Transformation

Technology Strategy Notes Vol 3, Issue 6 June 2016 A VA Executive’s Guide to Data Transformation Office of Technology Strategies (TS), Architecture, Strategy & Design (ASD) Introduction Both data mapping and code generation are The TS office within OI&T’s Architec- Have you ever needed to transfer data critical in data querying and data migration, between computer systems? Perhaps you ture, Strategy & Design (ASD) interacts two common processes that require data have performed a data query, a request not only with the ASD pillar offices, but transformation. for data from a database? If so, you prob- also with multiple stakeholders within ably noticed differences in how data is Data Querying OI&T and with strategic offices across digitally formatted and stored. To mitigate the enterprise. TS works closely with IT A data query is the request for the retrieval of risks in achieving strategic data initiatives, and business owners to capture busi- organizations may need to transform those data from a database. Users might run a data ness rules and provide technical guid- differences. query to compile and analyze data of a partic- ance as it relates to Data Sharing ular subject for research purposes, or to re- across the enterprise, specifically for This TS Note focuses on data mapping and trieve all files related to a particular person. interagency operability. code generation as tools for transforming Data querying usually relies on one of three data for data queries and data migrations. methods: It explores database transformations for developing data warehouses, with exam- Searching a database menu using system- ples of how databases are transformed at defined parameters VA. Specifying the fields and values of a sys- Standard Query Language (SQL) & tem-generated blank record Not Only SQL (NoSQL) Data Transformation Writing a query in the stylized coding lan- The most common method to organize a Data transformation is the process of con- guage of the system database is Standard Query Language verting a set of data values from a source, (SQL), pronounced as “sequel.” SQL is a or original, format to the required format Each method of running a data query varies in programming language used to retrieve of a new destination. It involves two inter- complexity and flexibility. The level of com- or update data with a relational database related concepts: data mapping and code plexity is determined by the level of special- management system – a system in generation. ized knowledge that the user needs to run the which data is organized into columns and query; flexibility refers to the user’s ability to rows. Each row is identified by a unique Data mapping shows how data from one define the parameters. key; each column is labeled by a value information system should match data on or attribute that is shared by all rows. Data Migration another system. It provides a visual repre- For example, a sales ledger would in- sentation of the process for linking differ- When an enterprise needs to migrate, merge, clude a column that identifies the num- ent configurations of data, or data models. or consolidate data, due to server replace- ber of units sold, with rows that list each This is useful when an enterprise seeks to ment, for example, it may process an entire unique commodity for sale. extract data from one source to consoli- database. Within the hierarchy of data, related date, or combine, it with another, or to data is joined in fields; related fields create A second common method used to or- consolidate multiple separate databases. records; records create files; and files create a ganize databases is an enhanced SQL- Code generation is the development of the database. based coding language called Not Only technical instructions, or code, which is SQL (NoSQL), a non-relational, distribut- required to transform data. Code is gener- The decision on how the enterprise will trans- ed, and scalable database format. NoSQL ated by a compiler software application, a form the data values in each source database is also open source, meaning that its program that reads the source language of is critically influenced by the way the source is source code is available with a license the data and then translates it into the organized and the business needs of the en- from the copyright holder. Unlike rela- destination language. terprise for its destination database. tional database management systems, 1 Technology Strategy Notes Vol 3, Issue 6 June 2016 A VA Executive’s Guide to Data Transformation Office of Technology Strategies (TS), Architecture, Strategy & Design (ASD) NoSQL is modeled to rely on systems that are not defined by tabular column and row relations. NoSQL systems often use tags associated with the individ- ual data files or key-values that map to the file’s loca- tion in the database. Data Warehousing A data warehouse is creat- ed by interlinking separate databases from disparate sources as a central repository of integrated data that allows users to access Figure 1 – Data Transformation with Business Intelligence Results records, enabling greater data sharing and data analysis. Access (HDA). HDA is a data layer that VA applies to its enterprise shared services – the cost-efficient centralized business operations Creating an enterprise shared data warehouse involves a process that are used by multiple divisions within the enterprise. HDA will of data transformation called Extract, Transform, and Load (ETL). allow VA to configure a standardized enterprise schema, or model, for The data is extracted from various distinct sources and trans- data aggregation, whereby data is gathered and expressed in sum- formed through cleaning, filtering, and validating. The ETL pro- mary form for sharing. Inside the HDA infrastructure, an application cess allows for the transformation of heterogeneous database called a data ingester performs transformation on data as it migrates sources into a single, central repository. from local databases. For more information on HDA at VA, please check out the Hybrid Data Access Enterprise Design Pattern. Data Transformation at VA There are several specific applications that VA relies on to trans- At VA, most applications and programs rely on data transformation form data. One is the eCRUD, the electronic Create, Read, Up- processes to migrate or query data across different source and desti- date or Delete service designed as a part of the Veterans Life- nation formats. As in industry, this is necessary because most sepa- time Electronic Record (VLER) Data Access Service (DAS) project rate business lines have historically built, maintained, and stored that provides users with an interface to perform operations on their data on local databases. Through data transformations, the pub- data access layers. It enables in-house COTS, commercial off-the lic and private sectors are able to achieve goals of data integration -shelf applications, to gain access to authoritative data sources and interoperability - the ability of different information technology through the use of an application programming interface (API). systems and software applications to communicate, exchange data, These APIs provide a level of abstraction to developers so that and use the information that has been exchanged. The more data- they don’t require physical access to databases to obtain data. informed an enterprise, the more enabled it is to make effective deci- For DAS, eCRUD is responsible for transforming data placed into sions. DAS data stores. The use of the eCRUD functionality will expand to serve as a data access mediator for many different VA data- Read more on data transformation and related topics in the Office of bases and data file types. Technology Strategies’ TS Notes and Enterprise Design Patterns. If you have any questions about data transformation, don’t hesitate to Another recent development at VA is the adoption of Hybrid Data ask TS for assistance. 2 .

Load more