LEVERAGING DATABASE VIRTUALIZATION for TEST DATA MANAGEMENT Vikas Dewangan, Senior Technology Architect, Infosys
Total Page:16
File Type:pdf, Size:1020Kb
WHITE PAPER LEVERAGING DATABASE VIRTUALIZATION FOR TEST DATA MANAGEMENT Vikas Dewangan, Senior Technology Architect, Infosys Abstract Database virtualization is an emerging trend in test data management (TDM) and is all set to bring about a revolution in the provisioning of non-production data. These tools offer efficient virtual cloning of production databases for software application development and testing, without having to physically replicate the entire database. Coupled with powerful features like taking point-in- time snapshots, write backs to the source database, and rolling back changes; these tools have the potential to bring in significant cost savings and efficiency improvements. In this paper, we explore the concept of database virtualization, industry tool features, and how these tools can be leveraged to improve TDM. Introduction Virtualization technologies have been savings. Further, virtualization leads to The field of test data management (TDM) around for a while and were first savings in hardware costs; thus resulting concerns itself with provisioning of introduced in the era of mainframe in decreased energy consumption and production-like data in non-production systems. However, it is only in the recent increased green technology. Given these environments, such as those used for past, due to advances in technology, advantages, virtualization will continue application development and testing that virtualization has been gaining to see a trend of increasing adoption in and is thus a related area. Database popularity. The term ‘virtualization’ refers the years to come. There are a number virtualization is an emerging trend in TDM, to encapsulating a particular hardware of popular virtualization technologies with the potential of offering significant or software resource so that multiple today, such as server, storage, database cost advantages. In this paper we will systems can make use of it. By means of and desktop virtualization. Out of these, explore how database virtualization can virtualization, hardware resources can database virtualization is the focus of this be optimally leveraged to increase TDM be utilized more effectively through paper. effectiveness. consolidation, resulting in significant cost External Document © 2018 Infosys Limited External Document © 2018 Infosys Limited Traditional approach to create subsets of production data and maintained as a common block across entire datasets can be made available to the instances, while modified data is stored The common approach taken by most relevant applications. privately for each clone, leading to enterprises today is to have physical copies optimization of the memory footprint of the production database in each non- Presented below is a simplified view of the production environment. Non-production traditional data refresh process for non- 4. Ease of refresh: Unlike refreshing environments include environments for production environments compared with a physical databases, refreshing an software development, testing, and training. database virtualization-based approach. It existing virtual database clone with should be noted that in some cases there the latest production data is relatively It is not uncommon to find even 10 – 12 may be an intermediate database between simple and straightforward. The clones such production copies in most enterprises. the production and non-production being in-memory instances, the refresh These physical copies may contain environments, which may be used for process is much faster and efficient either an entire copy or a subset of the masking or sub-setting purposes. as there are no physical writes. Most production database. With each additional environment needed there is an increase Traditional ETL approach Database virtualization approach in physical storage requirements, database for non-production environments for non-production environments administration needs and the time required Dev Virtual Clone 1 to setup these databases. Refreshing these Environment 1 non-production databases is a time- Production Production Database Test Virtual Clone 2 consuming and costly exercise. Often, due to Environment 2 Database Database ETL Database contentions, the availability of development / sub-settingutility virtualization engine and test environments becomes a Test Environment 3 Virtual Clone 3 Involves creating physical copies Involves creating virtual clones of the bottleneck in the software development of the production database production database life cycle and can lead to increased overall turnaround times. In summary, the database virtualization engines will traditional approach is complex and costly. Key features of database refresh the virtual instance with changes virtualization tools only, rather than purging and refreshing Database virtualization – a Today, there are a number of vendors the entire database offering products in the database better approach 5. Point-in-time snapshots: Often, in virtualization space. Let us take a look at test environments, there is a need to With the coming of age of database some of the important features and benefits take snapshots of multiple databases virtualization, a gamut of opportunities that these products (though not all) provide: for optimization of non-production at the same point in time to ensure environments has opened up. Several 1. Provision database clones: The consistency of the test data. In the enterprises have started taking cognizance functionality of rapidly provisioning traditional approach, extracting data of this trend and are actively exploring how virtual database clones is at the heart of from multiple databases seldom best they can leverage this technology for all database virtualization tools. These happens simultaneously and there deriving maximum business value. virtual clones can typically be created in is some time difference between a matter of hours, as opposed to physical data extraction processes. This leads So what exactly is database virtualization databases, which may take up to several to referential integrity issues across and how does it work? In the case of a days to setup and load data databases (as several transactions database virtualization-based approach, may have taken place during this time the virtualization engine creates multiple 2. Resource segregation: Each virtual difference), which have to be fixed in virtual clones of the production database. database instance is a private copy and non-production environments. With These virtual copies of the database can is unaffected by changes made to other database virtualization engines, it be utilized in the same way as the actual virtual clones (‘Ring fencing’) is possible to obtain a virtual clone physical copies. They will be transparent 3. Lower resource utilization: Database of each source database against a to the applications in these environments virtualization tools implement highly- specific point-in-time. This results in and will continue to function with these optimized compression algorithms to maintaining the referential integrity virtual database instances in exactly the minimize the in-memory footprint of and consistency of data across the same manner as they did with the physical each cloned database instance. Usually provisioned virtual database instances. non-production databases. There is no need data that is duplicate across clones is External Document © 2018 Infosys Limited External Document © 2018 Infosys Limited 6. Synchronization with source self-service, virtual copies of the given centralization of all administrative tasks database: The updates made to the source databases. leading to ease of administration. There virtual database clones can be written is no need to account for differences 8. Roll back to a previous state: back to the intermediate database in environment configurations in each Most database virtualization tools if required. For example, this might non-production environment where provide a feature for rolling back the be required if the data in the virtual the target database is to be deployed. database state to a previous date. This clone has been augmented for testing is especially useful for testing teams. These features have the potential to bring purposes and a need arises to replicate Often, test data gets corrupted or significant cost savings and efficiency this to other test environments. This changed after a test cycle. By using the improvements from a TDM perspective. feature can also be used to help roll back feature, the test data can be maintain a ‘gold copy’ of the database. restored to a previous consistent state TDM use cases 7. Self-service: Virtualization tools, to enable another round of testing. once setup, are easy to use and can TDM processes can be broadly divided 9. Roll forward: After rolling back, some reduce the dependency on database into data refresh and data provisioning database virtualization tools provide administrators (DBAs) when compared processes. The data refresh process involves the facility to roll forward, if required, to the traditional approach where the refreshing a non-production database with to a specified date in the database physical copy of the database needs required datasets. The data provisioning timeline. to be refreshed. Further, the TDM team process is concerned with mining this data as well as developers and testers, can 10. Ease of administration: Database and allocation of identified records to end be provided the facility to obtain, via virtualization products offer