WHITE PAPER

LEVERAGING VIRTUALIZATION FOR TEST DATA MANAGEMENT

Vikas Dewangan, Senior Technology Architect, Infosys

Abstract

Database virtualization is an emerging trend in test data management (TDM) and is set to change how non-production data is provisioned. Database virtualization tools offer efficient virtual cloning of production databases for software application development and testing, without having to physically replicate the entire database. Coupled with powerful features such as point-in-time snapshots, write-backs to the source database, and rolling back changes, these tools have the potential to bring significant cost savings and efficiency improvements. In this paper, we explore the concept of database virtualization, industry tool features, and how these tools can be leveraged to improve TDM.

Introduction

Virtualization technologies have been around for a while and were first introduced in the era of mainframe systems. However, it is only in the recent past, due to advances in technology, that virtualization has been gaining popularity. The term ‘virtualization’ refers to encapsulating a particular hardware or software resource so that multiple systems can make use of it. By means of virtualization, hardware resources can be utilized more effectively through consolidation, resulting in significant cost savings. Further, virtualization leads to savings in hardware costs, and the reduced hardware footprint lowers energy consumption and supports green IT goals. Given these advantages, virtualization is likely to see increasing adoption in the years to come. There are a number of popular virtualization technologies today, such as server, storage, database and desktop virtualization. Of these, database virtualization is the focus of this paper.

The field of test data management (TDM) concerns itself with the provisioning of production-like data in non-production environments, such as those used for application development and testing, and is thus a related area. Database virtualization is an emerging trend in TDM, with the potential to offer significant cost advantages. In this paper we explore how database virtualization can be leveraged to increase TDM effectiveness.

External Document © 2018 Infosys Limited

Traditional approach

The common approach taken by most enterprises today is to have physical copies of the production database in each non-production environment. Non-production environments include environments for development, testing, and training.

It is not uncommon to find as many as 10 to 12 such production copies in an enterprise. These physical copies may contain either an entire copy or a subset of the production database. With each additional environment needed, there is an increase in physical storage requirements, database administration needs and the time required to set up these databases. Refreshing these non-production databases is a time-consuming and costly exercise. Often, due to contentions, the availability of development and test environments becomes a bottleneck in the software development life cycle and can lead to increased overall turnaround times. In summary, the traditional approach is complex and costly.

Database virtualization: a better approach

With the coming of age of database virtualization, a gamut of opportunities for optimization of non-production environments has opened up. Several enterprises have started taking cognizance of this trend and are actively exploring how best they can leverage this technology to derive maximum business value.

So what exactly is database virtualization and how does it work? In a database virtualization-based approach, the virtualization engine creates multiple virtual clones of the production database. These virtual copies of the database can be utilized in the same way as the actual physical copies. They are transparent to the applications in these environments, which continue to function with the virtual database instances in exactly the same manner as they did with the physical non-production databases. There is no need to create subsets of production data, and entire datasets can be made available to the relevant applications.

Presented below is a simplified view of the traditional data refresh process for non-production environments compared with a database virtualization-based approach. It should be noted that in some cases there may be an intermediate database between the production and non-production environments, which may be used for masking or sub-setting purposes.

[Figure: Traditional ETL approach for non-production environments. The production database feeds a database ETL / subsetting utility, which loads Dev Environment 1, Test Environment 2 and Test Environment 3. Involves creating physical copies of the production database.]

[Figure: Database virtualization approach for non-production environments. The production database feeds a database virtualization engine, which provisions Virtual Clone 1, Virtual Clone 2 and Virtual Clone 3. Involves creating virtual clones of the production database.]

Key features of database virtualization tools

Today, there are a number of vendors offering products in the database virtualization space. Let us take a look at some of the important features and benefits that many, though not all, of these products provide:

1. Provision database clones: The functionality of rapidly provisioning virtual database clones is at the heart of all database virtualization tools. These virtual clones can typically be created in a matter of hours, as opposed to physical databases, which may take up to several days to set up and load with data.

2. Resource segregation: Each virtual database instance is a private copy and is unaffected by changes made to other virtual clones (‘ring fencing’).

3. Lower resource utilization: Database virtualization tools implement highly optimized compression algorithms to minimize the in-memory footprint of each cloned database instance. Usually, data that is duplicated across clones is maintained as a common block across instances, while modified data is stored privately for each clone, leading to optimization of the memory footprint.

4. Ease of refresh: Unlike refreshing physical databases, refreshing an existing virtual database clone with the latest production data is relatively simple and straightforward. The clones being in-memory instances, the refresh process is much faster and more efficient as there are no physical writes. Most database virtualization engines will refresh the virtual instance with changes only, rather than purging and refreshing the entire database.

5. Point-in-time snapshots: Often, in test environments, there is a need to take snapshots of multiple databases at the same point in time to ensure consistency of the test data. In the traditional approach, extracting data from multiple databases seldom happens simultaneously and there is some time difference between data extraction processes. This leads to referential integrity issues across databases (as several transactions may have taken place during this time difference), which have to be fixed in non-production environments. With database virtualization engines, it is possible to obtain a virtual clone of each source database as of a specific point in time. This maintains the referential integrity and consistency of data across the provisioned virtual database instances.

6. Synchronization with source database: The updates made to the virtual database clones can be written back to the intermediate database if required. For example, this might be required if the data in the virtual clone has been augmented for testing purposes and a need arises to replicate this to other test environments. This feature can also be used to help maintain a ‘gold copy’ of the database.

7. Self-service: Virtualization tools, once set up, are easy to use and can reduce the dependency on database administrators (DBAs) when compared to the traditional approach, where the physical copy of the database needs to be refreshed. Further, the TDM team, as well as developers and testers, can be provided the facility to obtain, via self-service, virtual copies of the given source databases.

8. Roll back to a previous state: Most database virtualization tools provide a feature for rolling back the database state to a previous date. This is especially useful for testing teams. Often, test data gets corrupted or changed after a test cycle. By using the roll back feature, the test data can be restored to a previous consistent state to enable another round of testing.

9. Roll forward: After rolling back, some database virtualization tools provide the facility to roll forward, if required, to a specified date in the database timeline.

10. Ease of administration: Database virtualization products centralize all administrative tasks, which eases administration. There is no need to account for differences in environment configurations in each non-production environment where the target database is to be deployed.

These features have the potential to bring significant cost savings and efficiency improvements from a TDM perspective.

TDM use cases

TDM processes can be broadly divided into data refresh and data provisioning processes. The data refresh process involves refreshing a non-production database with required datasets. The data provisioning process is concerned with mining this data and allocating identified records to end users. Database virtualization is applicable to the former process, namely data refresh.
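Several of the features listed earlier (common blocks shared across clones with private changes per clone, point-in-time snapshots, roll back and roll forward) can be pictured as one mechanism, copy-on-write. The Python sketch below is a minimal illustration under stated assumptions, not a description of how any particular product is implemented: the names VirtualClone, provision_clones and restore, and the idea of numbered blocks holding strings, are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping


@dataclass
class VirtualClone:
    """Hypothetical clone: one shared read-only image plus private changes."""

    base: Mapping[int, str]                               # common blocks, shared by all clones
    delta: Dict[int, str] = field(default_factory=dict)   # changes private to this clone
    snapshots: List[Dict[int, str]] = field(default_factory=list)

    def read(self, block_id: int) -> str:
        # A private change wins; otherwise fall through to the shared block.
        if block_id in self.delta:
            return self.delta[block_id]
        return self.base[block_id]

    def write(self, block_id: int, value: str) -> None:
        # Copy-on-write: only the private delta changes, never the shared image.
        self.delta[block_id] = value

    def snapshot(self) -> int:
        # Record the current private state and return its position on the timeline.
        self.snapshots.append(dict(self.delta))
        return len(self.snapshots) - 1

    def restore(self, snapshot_id: int) -> None:
        # Roll back (or forward) to any recorded point on this clone's timeline.
        self.delta = dict(self.snapshots[snapshot_id])

    def private_block_count(self) -> int:
        # Number of blocks this clone stores on its own.
        return len(self.delta)


def provision_clones(source_blocks: Dict[int, str], count: int) -> List[VirtualClone]:
    # Freeze one point-in-time image of the source and share it between all clones.
    image = MappingProxyType(dict(source_blocks))
    return [VirtualClone(base=image) for _ in range(count)]


if __name__ == "__main__":
    production = {0: "customers", 1: "orders", 2: "invoices"}
    dev, test = provision_clones(production, 2)

    clean_state = test.snapshot()
    test.write(1, "orders changed during a test cycle")
    print("test sees:", test.read(1))
    print("dev still sees:", dev.read(1))

    test.restore(clean_state)
    print("test after roll back:", test.read(1))
```

In this sketch each clone stores only what it has changed, which is why adding a clone costs little, and rolling back amounts to swapping the private changes for an earlier recorded copy. Real tools work at the storage or database level and add compression, write-back and access control, none of which is modeled here.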

Let us look at some relevant cases of how database virtualization can be leveraged for the data refresh process in TDM:

a) Database virtualization in non-production environments

As we have seen, in the traditional approach, each non-production database is a physical copy or subset of the production copy. With a database virtualization tool, the non-production environments are virtual database clones. An added advantage here is that if the data in any environment is corrupted, say after a round of testing, it is very easy to revert the data back to its original state. Some database virtualization tools provide support for data masking, which is a key requirement of most TDM implementations. A simplified depiction of this scenario is shown below:

[Figure: Non-production environment setup by virtual cloning of production database. Elements shown: Production Database, database virtualization engine with data masking, Virtual Clone 1, Virtual Clone 2, Virtual Clone 3.]

b) Database virtualization with data masking

Data masking is often a key TDM requirement, and not all database virtualization tools support it. In this case, an intermediate database can be set up, which will store the masked data and serve as the source database for the database virtualization engine. The intermediate database might be a full or partial copy of the production database. Data masking can be achieved by using either industry-standard TDM tools or custom scripts.

[Figure: Non-production environment setup with data masking and virtual cloning. Elements shown: Production Database, data extraction / subsetting, data masking utility, Intermediate Database, database virtualization engine, Virtual Clone 1, Virtual Clone 2, Virtual Clone 3.]

c) Database virtualization with data masking and synthetic data creation

In practice, there may be several data requirements from data consumers that cannot be fulfilled from production databases, as the data is unavailable. This may be the case, for example, in new development programs that may require data for testing. In this case, a synthetic data creation tool can be used to generate data, which in turn can be inserted into the intermediate database. As the intermediate database is being used as the source for the database virtualization engine, the virtual database clones will reflect the generated data in addition to the production data.

[Figure: Non-production environment setup with synthetic data generation and virtual cloning. Elements shown: Production Database, data extraction / subsetting, data masking utility, data generation, Intermediate Database, database virtualization engine, Virtual Clone 1, Virtual Clone 2, Virtual Clone 3.]

Benefits for TDM

The key benefits that database virtualization can provide for TDM are:

1. Faster test data setup and refresh: Database virtualization can be leveraged for rapidly provisioning a database clone from a source database, for example production, disaster recovery or a gold copy. Similarly, refreshing a virtual database clone can be executed quite quickly.

2. Cost savings: Test databases, being virtual clones, require far less physical storage space than full physical copies. Database virtualization tools do not maintain full copies of the data; only the changes in data are tracked and logged. This results in significant savings in hardware costs.

3. Increased reuse: Most database virtualization tools provide point-in-time snapshots. Thus, if there is a need, for example, to repeat a round of testing or a specific test case, it is easy to roll back the database to a previous state.

4. Increased test environment availability: As virtual database clones have significantly lower resource utilization than physical databases, it is feasible to have a larger number of test databases at any given point in time. This can significantly reduce environment contentions and increase productivity.

5. Enable TDM for DevOps: DevOps teams can rapidly provision data for dev and test environments using self-service features.

Best practices

While considering database virtualization for TDM, the following best practices should be followed:

1. Conduct product fitment: There are several robust database virtualization products available in the market today, with significantly varying feature sets and technology compatibility. Hence, it is imperative to carry out a comprehensive product fitment analysis before selecting the right product, keeping in mind current and long-term needs.

2. Protect sensitive data: Production databases often contain sensitive information that, if exposed, can result in serious data breaches. It is imperative to have a mechanism in place to mask sensitive information so that the virtual database clones provisioned for development and testing are properly desensitized. One strategy could be to leverage a database virtualization product that inherently supports data masking, or to set up a staging database that has desensitized data as the source database for the virtualization engine (instead of using the production database directly as the source).

3. Plan for data reusability: As database states can be easily rolled back and rolled forward, a data reuse strategy should be put in place to effectively utilize this feature for relevant scenarios (for example, regression testing cycles).

4. Enable self-service: Relevant development and testing team members should be given access to create virtual database clones to decrease the dependency on DBAs.

5. Monitor performance metrics: Key database performance metrics, such as SQL query response time, need to be monitored before and after virtualization, so that corrective action may be taken accordingly.

Database virtualization product examples

Listed below are a few database virtualization tools available in the market today, which can create virtual database clones. As this technology is still maturing and evolving, the databases supported by these vendors are likely to increase soon:

#  Tool                              Databases supported
1  Delphix DaaS                      Oracle, SQL Server, Sybase, DB2, PostgreSQL, MySQL
2  Actifio Copy Data Virtualization  Oracle, MS SQL Server
3  NetApp FlexClone                  Oracle
4  EMC XtremIO                       Oracle, MS SQL Server
5  VMware vFabric Data Director      Oracle, SQL Server, MySQL
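The staging-database strategy mentioned under "Protect sensitive data" can be illustrated with a small custom script. The following Python sketch is one possible approach under stated assumptions, not a method prescribed by this paper or by any of the listed tools: it uses keyed hashing (HMAC-SHA256) so that the same input always maps to the same masked token, which keeps values joinable across tables. The table name customers, its columns, the key, and the functions mask_value and build_staging are hypothetical, and in-memory SQLite databases stand in for the production and staging databases.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key for illustration only; a real key belongs in a secrets store.
MASKING_KEY = b"demo-key-not-for-real-use"


def mask_value(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace a value with a deterministic keyed token.

    The same input and key always give the same token, so masked values
    still match each other wherever they appear.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked_" + digest[:12]


def build_staging(production: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the (hypothetical) customers table into a staging database, masking as it goes."""
    staging = sqlite3.connect(":memory:")
    staging.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    rows = production.execute("SELECT id, name, email FROM customers").fetchall()
    for customer_id, name, email in rows:
        # Keys are copied unchanged; sensitive columns are replaced by tokens.
        staging.execute(
            "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
            (customer_id, mask_value(name), mask_value(email)),
        )
    staging.commit()
    return staging


if __name__ == "__main__":
    production = sqlite3.connect(":memory:")
    production.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    production.execute(
        "INSERT INTO customers VALUES (1, 'Asha Rao', 'asha@example.com')"
    )
    production.commit()

    staging = build_staging(production)
    for row in staging.execute("SELECT id, name, email FROM customers"):
        print(row)
```

The staging database produced this way would then serve as the source for the virtualization engine, so no clone is ever provisioned from unmasked production data. Which columns count as sensitive, and whether keyed hashing is an acceptable masking technique, depends on the data and on applicable regulations.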

Conclusion

As compared to the traditional approach, database virtualization offers significant opportunities in terms of cost optimization and efficiency improvements in non-production environments. The features offered by these tools are especially useful for databases used for development and testing purposes. The key benefits of leveraging database virtualization for TDM are faster test data setup, cost savings in disk space and increased reusability of test data. Moreover, the ability to provide self-service, coupled with features like ease of refresh, point-in-time snapshots, write-backs to the source database and the ability to roll back any changes made, can be a game-changer in the effective management of test data. Enterprises would be well advised to take a serious look at database virtualization tools, irrespective of whether they have a formal TDM implementation in place.

For more information, contact askus@infosys.com

© 2018 Infosys Limited, Bengaluru, India. All Rights Reserved. Infosys believes the information in this document is accurate as of its publication date; such information is subject to change without notice. Infosys acknowledges the proprietary rights of other companies to the trademarks, product names and such other intellectual property rights mentioned in this document. Except as expressly permitted, neither this documentation nor any part of it may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, printing, photocopying, recording or otherwise, without the prior permission of Infosys Limited and/or any named intellectual property rights holders under this document.
