Leveraging Automated Data Validation to Reduce Software Development Timelines and Enhance Test Coverage

Cognizant 20-20 Insights | July 2013

By industrializing data validation, QA organizations can accelerate time-to-market and improve the quality and quantity of data analyzed, thus boosting credibility with end users and advancing business transformation initiatives.

Executive Summary

Quality teams across enterprises expend significant effort comparing and validating data across multitier databases, legacy systems, data warehouses, etc. This activity is error-prone and cumbersome, and it often results in defect leakage because the data is heterogeneous, complex and voluminous. It also requires testers with advanced database skills.

Whenever time-consuming manual data validation impacts project schedules, testing efforts are squeezed to get the project back on schedule. Where each and every bit of data is of paramount importance, this is not an acceptable solution since it puts the business at significant risk. Automating data validation is an optimal solution to address these challenges.

In our view, this requires quality assurance (QA) organizations to industrialize the data validation process. Therefore, we have created an enterprise solution that streamlines the data validation process and assists in the identification of errors that typically emerge across the IT landscape.

Our enterprise data testing framework overcomes the impediments of traditional test automation solutions and provides the ability to test end-to-end data. This solution is intuitive and customizable based on data testing needs.

This white paper illustrates how our new dataTestPro solution can be applied across industries to enhance the data validation process. It also describes how this solution significantly reduced manual effort and improved QA effectiveness when deployed at a leading insurance organization.

Challenges in Data Validation: An Industry Perspective

Homogeneity in test data is a thing of the past. In fact, heterogeneity of data is now the norm across all industries. A decade ago, a data pool of 10 million records was considered to be large. Today, the volume of data stored by enterprises is often in the range of petabytes or even exabytes. The reasons:

• An increase in mergers and acquisitions, which results in tremendous data redundancy and vast pools of data requiring validation of complex extract, transform and load (ETL) logic.
• A greater need for data center migrations.
• An increased management focus on data and data-driven decision-making related to business intelligence (BI) initiatives.

While there is an abundance of heterogeneous data, there is also a high probability of inherent errors associated with erroneous data, excessive duplication, missing values or conflicting definitions.

These errors lead to cost overruns, missed deadlines and, most important, a negative impact on the credibility of the data provider. Industry research suggests that only three out of 10 organizations view their data to be reliable.[1] Data-related problems result in an average loss of roughly $5 million annually; in fact, it is estimated that about 20% of these companies experience losses in excess of $20 million annually.[2]

However disconcerting these losses are, the intriguing and often unanswered question is: "How are these losses accounted for?"

Having an appropriate answer to this question would improve business operations and aid strategic decision-making. For data warehousing and data migration initiatives, data validation plays a vital role in ensuring overall operational effectiveness.

Challenges in Data Validation

Data testing is substantially different from conventional testing. Quality teams across organizations expend a great deal of effort in making comparisons on huge volumes of data. This entails:

• Identifying data quality issues: Most organizations validate far less than 10% of their data through the use of sampling. This means at least 90% of their data is untested. Since bad data likely exists in all databases, improving testing coverage is vital.
• Heterogeneous data: Source data is usually extracted from various sources, including Excel spreadsheets and CSV and XML files, as well as flat files and columns and rows from multiple database vendors' software. After extraction, the transformations of the ETL process need to be verified. This can be scripted, but the effort is often time-consuming and cumbersome.
• Increased testing timelines: Test cycles take longer to complete when performed manually, particularly when testing large pools of data.
• Test scheduling: Testing teams need to execute data comparison tests after ETL processing, where the tests are executed at a specified time. For this process, manual execution is induced by a trigger. This affects the scheduling process.

Quality engineers working on data warehousing projects are continuously scouting for ways to automate the ETL testing process, but with limited success. The reason: the difficulty of standardizing processes across technologies, complex architectures and multi-layered designs. In addition, automation tools are expensive and require an up-front investment and an extensive learning curve, which prolongs time-to-value.

An Integrated Approach to Data Validation

To address the aforementioned demands of data validation, we have created a comprehensive solution that can be used in a variety of data testing areas such as data warehousing, data migration testing and database testing. Our solution, dataTestPro, facilitates the QA process in enterprise information environments and significantly reduces manual effort, delivering reduced costs for various "data testing" needs by provisioning for and managing automated data validations.

As an industry-agnostic approach, dataTestPro has been successfully implemented at large engagements in the banking, brokerage, insurance and retail industries, and is delivering immense business benefits to our clients (see the Quick Take sidebar).

Eliminating Data Quality Issues

Our experience with clients shows that organizations can eliminate data quality issues by performing the following actions:

• Streamlining and accelerating data validation.
• Providing an intuitive, comprehensive and integrated workbench.
• Ensuring a faster, high-quality test execution cycle.
• Preventing data anomalies by early detection of defects in the testing lifecycle.
• Facilitating extensive reuse of test components, reducing time-to-market and simplifying the test management process.
• Increasing test coverage.

dataTestPro validates complex data transformations, providing full coverage of data and thereby doing the following:

• Performing 95% of the data validation process, with enhanced coverage and reduced risk.
• Providing insightful reports highlighting detailed data differences, down to the individual character level.
• Automating the entire testing process, from scheduling to execution to reporting.

Comparing Heterogeneous Data

dataTestPro compares data between different files and databases after data migration or reconciliation. Source data is typically pulled from various sources such as Excel, CSV, XML, flat file types (any type of delimited file or fixed-width files) and different databases, such as Oracle, Microsoft and any other JDBC-compliant database.

Increasing Testing Speed

Regression testing of data validation is automated by our solution, thereby solving the problem of "the need for speed." Based on our experience at multiple engagements, data comparisons and reporting are 80 times faster than a manual process.

Test Scheduling

Comprehensive automation, from scheduling tests through execution, data comparison and reporting, is provided in our solution, thereby increasing the speed of testing, as well as reducing the cycle time and workload of the tester. The solution provides the tester with the option of scheduling tests to run at a specified time.

Execution Steps

Our solution automates data validation as follows:

• Test case creation: Test cases for various data comparison and validation scenarios can be created using a data mapping exercise. This solution also provides the user an option to create test suites and execute multiple test cases in a single framework execution.
• Execute and report: Upon completion of the test cycle, informative summary and detailed reports are generated. The user interface is simple; for example, QA analysts from a non-technical background can configure tests to operate in different modes for different types of comparisons.

Quick Take: dataTestPro in Action

Our solution was implemented at a U.S.-based leading insurance and annuities provider. The client faced data validation challenges due to huge data volumes and insufficient test coverage. A typical data transaction involved 1.5 GB of transactional data and over 1,200 ETL transactions. The client was unable to implement standard tools available in the market due to the amount of data originating from heterogeneous data sources. We implemented dataTestPro in two different projects. With this solution, we were able to reduce data validation efforts by 75% and improve test coverage by 80%. Overall, we reduced time-to-market from

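To make the idea in "Comparing Heterogeneous Data" concrete, the following is a minimal sketch of a file-to-database comparison that reports differences down to the character position. It is not dataTestPro's implementation, which this paper does not disclose. The function names (first_difference, compare_csv_to_table), the policy table, the choice of SQLite as the target database and the sample rows are all hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3


def first_difference(a, b):
    """Return the index of the first differing character, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other; they diverge where the shorter ends.
        return min(len(a), len(b))
    return None


def compare_csv_to_table(csv_text, conn, table, key):
    """Compare CSV rows against a database table, matching rows on a key column.

    Assumption: `table` is a trusted identifier (it is interpolated into SQL).
    """
    source = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}

    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cursor.description]
    target = {}
    for values in cursor:
        # Normalize every database value to a string so both sides are comparable.
        row = {c: "" if v is None else str(v) for c, v in zip(columns, values)}
        target[row[key]] = row

    report = []
    for k in sorted(source.keys() | target.keys()):
        if k not in target:
            report.append((k, "missing in target", None, None))
        elif k not in source:
            report.append((k, "missing in source", None, None))
        else:
            for col in columns:
                s = source[k].get(col) or ""
                t = target[k][col]
                pos = first_difference(s, t)
                if pos is not None:
                    report.append((k, col, pos, (s, t)))
    return report


# Hypothetical sample data: a CSV extract (source) and a migrated table (target).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (id TEXT PRIMARY KEY, holder TEXT, premium TEXT)")
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?)",
    [("1", "Ann Lee", "120.50"), ("2", "Bob Ray", "99.00"), ("4", "Dee Fox", "75.25")],
)
csv_text = "id,holder,premium\n1,Ann Lee,120.50\n2,Bob Roy,99.00\n3,Cy Moss,80.00\n"

report = compare_csv_to_table(csv_text, conn, "policy", "id")
for entry in report:
    print(entry)
```

Each report entry names the key, the column (or a missing-row note), the character position of the first mismatch and the two values. Comparing every row this way, rather than a sample, is one simple route to the fuller coverage the paper argues for; a production tool would also need type-aware comparison and streaming for large volumes.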