
Economic Value Validation
Quantifying the Value of Enterprise-grade Data Lake Management with Bedrock

By Nik Rouda, Senior Analyst and Adam DeMattia, Research Director

July 2016

© 2016 by The Enterprise Strategy Group, Inc. All Rights Reserved. EVV Report: Quantifying the Value of Enterprise-grade Data Lake Management with Bedrock

Contents

Executive Summary
Market Overview
Zaloni Bedrock: Qualitative Examples of Customer Benefits
    Dramatically Reduced Time to Implement Bedrock versus Custom-developed Tools
    Ease of Offloading Ongoing Enhancements and Operational Management
    Increases Analyst Productivity while Lowering Expertise Required
    Reduced Time to Insight Adds to the Bottom Line
Zaloni Bedrock: An Economic Value Validation
    Methodology
    Economic Value Model Overview
    Default Scenario
Economic Value Validation Results
    Summary of Results
    TCO Analysis
    Benefits Analysis
The Bigger Truth
Appendix A

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.

Executive Summary

Organizations are rapidly adopting Hadoop as the foundational technology for building “data lakes” (i.e., large clusters of servers that house an organization’s data from which analytics and insights can be extracted). However, as many organizations have found, the data lake can quickly result in a quagmire if enterprise-grade tools to manage the data pipeline are not in place. ESG was engaged to develop a detailed economic model to quantify the value of using Zaloni’s Bedrock data lake management platform to build and curate a clean data lake compared with a present mode of operation (PMO) representative of internally developing and integrating custom and commercial ETL and data warehousing solutions to sit on top of the Hadoop cluster.

Analysis Highlights, Typical Use Case:
• Modeled 671% return on investment
• Estimated six-month payback period
• Nearly $9.3M in incremental business benefits enabled

The model and accompanying analysis presented in this report are intended to help organizations determine the relative costs and benefits of leveraging Zaloni to empower a Hadoop data lake compared with layering custom-developed toolsets on top of the Hadoop cluster. The economic value model builds upon in-depth interviews with real-world Zaloni customers and Zaloni technical stakeholders, a review of publicly referenceable customer case studies, and ESG’s quantitative market research library covering the data management market.

As discussed in the following pages, based on ESG’s analysis, Zaloni offers an extremely compelling and economically efficient method for building and maintaining enterprise-ready data lakes.
For many of the use cases ESG examined, Zaloni was modeled to lower costs for the organization while also adding a significant amount of incremental business value compared with what was expected in the PMO scenario. This was achieved through the avoidance of employing a large development team to build and maintain the tools needed to support the data lake, as well as the value from faster time to insight, attaining more valuable insights, and the ability to democratize analytics by allowing analysts and subject matter experts to query and extract data without IT assistance and development changes. In fact, the replacement of custom-developed and warehousing solutions with Zaloni Bedrock, in a typical use case, yields an estimated 671% ROI over a three-year time horizon, lowering total cost of ownership (TCO) by about $1.6M while adding incremental IT efficiencies in the range of $4.5M, user productivity benefits in the range of $3.5M, and time-to-value and delay-avoidance benefits in the range of $1.3M.

For organizations looking for an economically efficient way to build and manage a Hadoop data lake, Zaloni offers an extremely compelling value proposition. This report summarizes the rigorous research ESG conducted to quantify the costs and benefits of Zaloni and communicates the results of this analysis.

Market Overview

In the early days of big data, the focus was on delivering distributed storage and analytics functionality aligned with the now famous “Vs” of data: volume, velocity, and variety. Speed, scale, and support for different sources of data were the main goals. Although building a big data platform with these attributes was a good start, it was not wholly sufficient. Platforms that meet only these demands will remain limited in usefulness for an enterprise, hence the derogatory usage of terms like “science project” or “data swamp” to describe impractical or incomplete offerings. What’s missing then? Rather a lot, actually.
For a data lake to evolve from a nice concept into a valuable utility for the business, there are additional needs. These include some less flashy but more mission-critical capabilities, which may be the province of the data steward. While data scientists get the glory of building a new analytics model, and data architects put together the appropriate environment, the data steward looks after all the other essential elements of governance. Governance includes topics like:

• Data definitions and classification: Ingesting data at scale into the lake is critical, but doing so in a way in which metadata is captured and searchable is crucial to reducing time to insight for end-users.


• Data quality and lineage: Users must have confidence that the data they are working with has fidelity to the source. Many analyses are simply not feasible if visibility into where the data is coming from and how it is used cannot also be reported on.
• Data ingestion and preparation: The data lake must be able to ingest data regardless of source. Additionally, the ability to automate and orchestrate workflows like watermarking data, converting formats, and integrating data is critical to keeping organizations from becoming bogged down in a data swamp.
• Usage policy and controls: While providing access to centrally stored data is a key aim of the data lake, it is also important to maintain access control. Providing automatic means to mask or tokenize data from users who do not have permission prevents sensitive data from being overly democratized.
• Self-service extraction: Putting data in the hands of end-users is the end goal of all big data initiatives. The most efficient way to do so is to remove IT and development organizations as the bottlenecks on the path to insight. End-users should be able to query and extract data without submitting a request to IT, which may take weeks or months to process, in order to unlock the value of data in a timely fashion.

Sadly, not all businesses have a dedicated data steward with the time and resources to handle all these issues. Worse, few big data platforms have fully embraced these requirements yet, but it’s obvious that any serious effort will have to address these topics. Otherwise, the organization is exposed to all kinds of serious risks, ranging from sorely misleading analyses to the severe repercussions of sensitive data breaches. The business ramifications of building a data lake without these common-sense controls are frightening. So why don’t more vendors and organizations start with these concepts front and center? Perhaps because this is the less exciting detail work, but also perhaps because it’s hard to do with existing tools.
The Hadoop community has now recognized that this maturity is table stakes for success. Various distributions are beginning to realize the need and fill in the gaps, but with great differences in functionality today. Complicating matters is the fact that 40% of potential Hadoop adopters expect to have multiple distributions plus some open source software thrown into the mix.[1] Partial coverage is better than nothing, but managing distinct frameworks, each with its own functionality, increases effort and complexity without adding much assurance of complete governance. Uniform coverage of the above topics is mandatory; there is no comfort in having “sort of” clean data, “somewhat” locked down access, or “kind of” secure data lakes. Complete and consistent coverage is difficult to manage, yet business and regulatory requirements demand a solution be found.

Zaloni has set out to fill this gap with its Bedrock platform, a data pipeline platform that allows the organization to build and maintain data lakes at immense scale and in an operationally efficient manner, while providing the access controls needed to satisfy business requirements. The remainder of this document outlines the outcomes of ESG’s research into the economic results Zaloni customers are achieving.

[1] Source: ESG Research Report, Enterprise Big Data, BI, and Analytics Trends: Redux, to be published.

Zaloni Bedrock: Qualitative Examples of Customer Benefits

As discussed, Zaloni aims to deliver an innovative data management platform that allows organizations to build actionable data lakes. Bedrock enables building and maintaining a data lake at scale without requiring inflexible and expensive custom development efforts that fall short of delivering the data quality, lineage, and security functionality that enterprises demand.

Key Customer Benefits Summary:
• Dramatically reduced time to implement Bedrock versus custom-developed tools
• Ease of offloading ongoing enhancements and operational management
• Increases analyst productivity while lowering expertise required
• Reduced time to insight adds to the bottom line

Clearly, leveraging Bedrock should have a number of positive economic implications for organizations. To accurately and defensibly quantify these benefits, real-world experiences must be gathered, vetted, and interpreted. To accomplish this goal, ESG interviewed current Bedrock customers to better understand their usage of, and the benefits associated with, Bedrock in order to inform and validate the assumptions used in ESG’s EVV modeling. Based on these interviews, ESG concludes that the benefits of deploying Bedrock compared with alternative means of building and maintaining the data pipeline are numerous and diverse. ESG’s findings with respect to customer benefits are presented quantitatively in the EVV scenario analysis discussed in this report, but they are also summarized qualitatively (in the customers’ own words) in this section.

Dramatically Reduced Time to Implement Bedrock versus Custom-developed Tools

Developing data management tools from scratch with enterprise-grade data quality and security capabilities is hard. Add to that the fact that the Hadoop technology underpinning modern data lake endeavors is nascent and developers with the requisite skills are scarce and expensive, and the draw of commercially developed data management technologies is clear. In fact, one of the first (and oftentimes foremost) values that Zaloni Bedrock offers to organizations is the ability to build the data lake fast:

“We estimated that we would need 2-3 developers working on the data lake initiative full-time. We also would have needed a senior architect above them. We hoped to get it done in 6 months but knew in reality it would take over a year. With Zaloni we were up and running in a matter of a few weeks.”

“Zaloni has a high level of knowledge, when it comes to Hadoop they know the space. By working with them we were able to hit production in 3 months, doing it on our own could have taken years, potentially.”

Moreover, as several customers recounted, even if the organization fiercely wanted to develop its own tools and was willing to wait for those homegrown tools to mature before deploying them to production, actually finding developers and engineers with the expertise to develop for a Hadoop environment may have been the gating factor. For this reason, working with a vendor like Zaloni, with its foundation in big data consulting and professional services, holds a lot of value.

“There is a shortage of skilled resources that know Hadoop. That was our single biggest barrier to building our own tools. My team was very good at things like Teradata and SQL, but they did not know HDFS or MapReduce. We were Hadoop challenged.”

“For every 300 data engineers out there maybe one or two has really good Hadoop experience. We estimated 6-8 months, and $2-3M development to roll out just the basics, assuming we could find the people.”

Ease of Offloading Ongoing Enhancements and Operational Management

Another aspect many customers considered before selecting Zaloni was the development burden of maintaining tools after they were built. As any homegrown application owner can attest, the initial lift of standing up the application often pales in comparison to the development resources required to continually evolve the application over time to account for new feature and functionality requests, all while ensuring subsequent code additions are compatible with the initial roll out. Working with Zaloni removes this burden from the organization. Moreover, future enhancements are not limited to the features an organization’s internal users request. Rather, Zaloni’s entire customer base helps shape the product roadmap:

“I would need at least two engineers to support and maintain applications over time. This equates to at least $300K in staff OPEX vs. our licensing cost with Bedrock. Not to mention, we are able to leverage enhancements that serve Zaloni’s entire community of customers, not just what our internal people request.”

“Sure there were the costs to initially build the environment, but we added to that the ongoing costs from having to build our own operational dashboards and developing our own support tools. That math further tipped the scales in Zaloni’s favor.”

Increases Analyst Productivity while Lowering Expertise Required

Of course, the desired end state of any big data initiative is to unlock the value of data. This is not accomplished simply by making the IT organization’s life better. While that tactical consideration does call out economic differences between working with a vendor like Zaloni and an organization deciding to “roll its own” data management platform, the strategic consideration is deriving value from the data in a more effective fashion. The Bedrock customers that ESG spoke with talked at length about how they are able to make analysts more productive as a result of a better curated and easier to interact with data lake. Bedrock allows for more self-service in the data lake environment, helping to put data directly in the hands of end-users and removing IT and developers as a gateway to the data:

“Bedrock’s graphical user interface allows non-technical users, those that are business analysts who should be closest to the data, [to] load and query data themselves. Tools we would have built internally and even commercial products require a SQL loader and knowledge [of] ETL processes to use well.”

“The data catalog is very important when managing data sources and files at scale. Having a searchable catalog has helped business analysts a lot. They don’t have to worry about data naming conventions, for example. They could just find what they need with keyword searches and browsing.”

Reduced Time to Insight Adds to the Bottom Line

Beyond increasing the productivity of analysts, there is a significant amount of money on the table for organizations that use those analyses to directly drive revenue. These implications were spoken to at length by several of the Zaloni users ESG spoke with:

“The 2-3 man years we saved for building out our data lake translate to something more like 6 months or 1 year of actual calendar time. Having the data lake up and running over that calendar time led directly to over $500K in top line revenue.”

“Analyst throughput is key to our business model because we actually sell data. We have a team of 50-75 revenue producing analysts. If they are 10% more productive, which I would argue is a conservative number, it translates to a lot of money.”

These insights are just a sampling of the benefits Zaloni customers reported to ESG. The remainder of this paper discusses the process of quantifying these benefits in ESG’s Economic Value Model and discusses the model outputs for a hypothetical customer scenario.

Zaloni Bedrock: An Economic Value Validation

Methodology

For a discussion of the research and modeling methodology that ESG adhered to in the process of writing this report, please see Appendix A.

Economic Value Model Overview

As noted in Appendix A, ESG’s EVV methodology compares two scenarios: The first is an organization that elects to build and manage its Hadoop data lake with Zaloni; the second is the PMO. The basic profiles for each scenario are:

• Zaloni scenario: In this scenario, the customer is leveraging Zaloni’s Bedrock software, orchestrating its Hadoop nodes to ingest, process, tag, and transform data in the data lake. The software is licensed based on the number of processor cores across all nodes in the Hadoop cluster. The model takes into account all environment costs including those associated with Hadoop nodes; a Hadoop software distribution subscription; the Zaloni software subscription; maintenance and support over time for hardware; data center infrastructure costs associated with power and cooling the cluster; and IT staff requirements to administer the cluster.
• PMO scenario: In this scenario, the customer is using a mix of commercial and homegrown data preparation, ETL, and data warehousing tools (skewing to the homegrown end of the spectrum) to ingest, process, tag, and transform data in the Hadoop cluster. The model takes into account all environment costs including those associated with Hadoop nodes; a Hadoop software distribution subscription; hardware and software product costs for commercial data engineering, preparation, and warehousing tools; maintenance and support over time for those tools; data center infrastructure; and data engineering and development resources to implement and administer the cluster and associated tools.

The costs and benefits used as the basis of comparison between both scenarios include:

• In the Zaloni scenario, hardware costs include costs for the nodes in the Hadoop cluster. This compares with the hardware costs in the PMO for the Hadoop cluster, plus the hardware costs associated with any commercial data preparation and warehousing solutions selected.
• In the Zaloni scenario, software costs include costs for the Hadoop distribution running on nodes plus the Zaloni software subscription. This compares with the software costs in the PMO for running Hadoop on the cluster, plus the software costs associated with any ETL and warehousing software licenses estimated for the environment.
• Data center infrastructure costs include power, cooling, and rack space opportunity costs incurred to support both the Zaloni and PMO environments, the lion’s share of which are driven by the Hadoop nodes present.
• Maintenance and support costs estimated to be incurred on an annual basis either for hardware (nodes and data warehousing hardware, if applicable) deployed, or software to be maintained (not applicable in the Zaloni use case, which is supported under a software subscription).
• Relative staff personnel costs to employ FTEs to develop, support, and operate the Zaloni and PMO environments.
• IT efficiency improvements, which measure improvements in administration, development, and user support over time enabled by the analytics solution selected.
• User productivity improvements, which measure improvements in analytics availability, self-service capabilities, and frequency of helpdesk tickets submitted enabled by the analytics solution selected.
• Analytics time-to-value measurements, which include the value derived from the elimination of development delays associated with analyst requests enabled by the solution.


Simply put, ESG’s model estimates the likely cost and potential benefits of supporting analytics requirements using Zaloni Bedrock or the PMO.

Default Scenario

To illustrate the relative costs and benefits of leveraging Zaloni versus the custom-developed PMO, ESG developed a set of model inputs representative of a typical big data use case. The use case in question consists of a 50-node Hadoop cluster housing 50 TB of data. This cluster is driving analytics for an analyst user community of seven within the organization initially, which is estimated to triple over the next three years to 21 analysts.

To account for the value enabled by a self-service Zaloni environment, which allows non-experts to use Zaloni’s GUI to automate ingestion, create workflows, and build queries, each analyst in the PMO scenario is assumed to submit five requests per year for modifications to the analytics environment to internal development teams. Ten percent of requests are expected to be delayed an average of 1.5 months. Each month of delay is assumed to carry with it an opportunity cost of $10,000. Additionally, analysts are estimated to be marginally more productive in the Zaloni scenario, creating 10% more analyses per year when supported by a cleaner, easier to interact with data lake.

To account for the more specialized and cumbersome nature of managing a suite of different custom-developed and commercial data preparation and data warehousing tools, the PMO scenario assumes a total of nine administrators and engineers are employed to manage the complete solution initially, including the Hadoop cluster. ESG’s model assumes that the more cohesive and easier to manage Zaloni environment will be managed by a team of only three administrators.

Moreover, although the PMO is considered to be composed of chiefly internally developed tools, material commercial product costs are estimated for the PMO in the areas of ETL, data preparation, and data warehousing hardware and software. These product costs include $50K in annual data engineering, preparation, and ETL software licensing and $450K in annual data management spend. These costs (and associated capabilities) are replaced in the Zaloni scenario by Zaloni software licensing costs. These and other key assumptions can be reviewed in tabular format in Table 1.

Table 1. Key Default Use Case Assumptions

Parameter | Default Use Case
Nodes in Hadoop cluster | 50
Capacity of Hadoop cluster | 50 TB
Initial number of analysts supported by the environment | 7
Additional number of analysts to be supported over the next three years | 14
Analytics environment development requests submitted per analyst per year | 5
Percentage of development requests that are delayed / Average development time required for delayed requests | 10% / 1.5 months
Relative number of administrators and engineers required to manage Hadoop plus custom-developed and commercial tool environment vs. Hadoop plus Zaloni environment | 9 / 3
Percentage of data assumed to be enriched when Zaloni is in place | 25%
Relative increase in analyses per analyst when Zaloni is in place | 10%
Time horizon of the analysis | 3 years
Average annual salary for an IT administrator | $80,000
Average annual salary for a data engineering or development resource | $150,000
Average annual salary for a business analyst | $100,000
Labor burdening rate to account for fully loaded staff costs | 40%
Source: Enterprise Strategy Group, 2016.
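
A few of the default inputs combine through simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it covers just the first year with the initial seven analysts, and it is not a reproduction of ESG's model, whose internal formulas are not published in this report.

```python
def burdened_salary(base_salary, burden_rate):
    """Fully loaded annual cost of one employee."""
    return base_salary * (1 + burden_rate)


def annual_delay_cost(analysts, requests_per_analyst, delayed_share,
                      delay_months, cost_per_month):
    """Opportunity cost of delayed development requests in one year."""
    delayed_requests = analysts * requests_per_analyst * delayed_share
    return delayed_requests * delay_months * cost_per_month


# Inputs taken from the default use case described above.
admin_cost = burdened_salary(80_000, 0.40)      # IT administrator
analyst_cost = burdened_salary(100_000, 0.40)   # business analyst
year_one_delay = annual_delay_cost(7, 5, 0.10, 1.5, 10_000)

print(f"Loaded administrator cost: ${admin_cost:,.0f}")
print(f"Loaded analyst cost:       ${analyst_cost:,.0f}")
print(f"Year-one delay cost (PMO): ${year_one_delay:,.0f}")
```

With these inputs the loaded costs come to $112,000 and $140,000, and the year-one delay cost in the PMO scenario to $52,500. How the analyst population ramps from 7 to 21 across the three years is not specified here, so later years are left out.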

Economic Value Validation Results

Summary of Results

With the model parameters tuned to the default assumptions in Table 1, ESG’s analysis concludes that the net benefits of implementing Zaloni to support the organization’s analytics environment greatly outweigh the associated costs. Table 2 shows the modeled return on investment (ROI), project payback period, net present value (NPV), annual total cost of ownership (TCO), and annual benefit over the time horizon for a Zaloni deployment compared with a similarly sized approach involving a reliance on a mix of custom-developed and commercial ETL and data warehousing tools layered on the Hadoop cluster. The following section details the most compelling findings from this analysis as they relate to the costs and benefits associated with Zaloni and how they differ from alternative analytics approaches.

Table 2. Economic Value Summary, Zaloni versus the PMO

Scenario | Project ROI | Payback Period (months) | Net Present Value (NPV) | Annual TCO | Annual Benefit
Zaloni | 671% | 6 | $5,565,220 | $576,933 | $4,449,372
PMO | 20% | 32 | ($106,999) | $1,130,283 | $1,357,504
Source: Enterprise Strategy Group, 2016.

Annual TCO

Annual TCO is the sum of all the cost categories included in the analysis, averaged over three years. As displayed in Table 2, the annual TCO for Zaloni is estimated as $576,933, a significant 49% savings compared with the PMO. However, TCO should be only one part of the customer consideration when weighing available data analytics approaches. As shown in Table 2 and discussed in this report section, the lower costs associated with Zaloni are augmented by significant benefits in the areas of increased IT efficiency, improved user productivity, and improved analytics time to value.

Annual Benefit

Annual benefit is the sum of all the estimated benefit categories included in this analysis, averaged over three years. As displayed in Table 2, the annual benefit associated with Zaloni is estimated as $4,449,372, compared with $1,357,504 for the PMO. The modeled annual costs and benefits for both scenarios are shown graphically in Figure 1.

Figure 1. Annual TCO and Benefit, Zaloni versus the PMO

[Figure 1: two bar charts comparing the Zaloni Data Management Platform with the PMO. Comparative Annual Benefit (axis $0 to $5,000,000): 228% increase. Comparative Annual Cost (axis $0 to $1,200,000): 49% decrease.]

Source: Enterprise Strategy Group, 2016.
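
The percentage differences quoted for Figure 1 can be checked directly against the annual figures in Table 2. The following is a minimal sketch; the helper name `percent_change` is hypothetical, and the three-year incremental benefit is derived here by multiplying the annual averages by three, which is an assumption about how the averages were formed.

```python
def percent_change(new, old):
    """Percentage change from old to new (negative means a decrease)."""
    return (new - old) / old * 100


# Annual figures from Table 2.
zaloni = {"tco": 576_933, "benefit": 4_449_372}
pmo = {"tco": 1_130_283, "benefit": 1_357_504}

tco_change = percent_change(zaloni["tco"], pmo["tco"])
benefit_change = percent_change(zaloni["benefit"], pmo["benefit"])

# Assumption: three-year totals are the annual averages times three.
incremental_benefit_3yr = (zaloni["benefit"] - pmo["benefit"]) * 3

print(f"Annual TCO change:     {tco_change:.0f}%")
print(f"Annual benefit change: {benefit_change:+.0f}%")
print(f"Three-year incremental benefit: ${incremental_benefit_3yr:,}")
```

The results round to a 49% decrease in annual cost and a 228% increase in annual benefit, and the three-year incremental benefit of $9,275,604 lines up with the "nearly $9.3M" cited in the Executive Summary.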


ROI

ROI is a profitability ratio for investments. It is calculated by dividing the net benefits of an investment (i.e., the total benefits minus the associated costs) by the total cost of the investment. A positive ROI indicates that total benefits exceed the costs of the investment. As displayed in Table 2, the modeled ROI for Zaloni using the inputs defined in Table 1 is 671% (a significantly higher ROI than is estimated for the PMO, representing the pain many organizations feel when using traditional and custom-developed toolsets to attempt to maintain their Hadoop data lake, or swamp).
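
The ROI definition above can be applied to the Table 2 figures as a quick check. This is a sketch under one assumption: three-year totals are taken as the annual averages multiplied by three. The function name `roi_percent` is hypothetical.

```python
def roi_percent(total_benefits, total_costs):
    """ROI = (total benefits - total costs) / total costs, as a percentage."""
    return (total_benefits - total_costs) / total_costs * 100


YEARS = 3  # time horizon of the analysis

# Annual benefit and annual TCO from Table 2, scaled to the full horizon.
zaloni_roi = roi_percent(4_449_372 * YEARS, 576_933 * YEARS)
pmo_roi = roi_percent(1_357_504 * YEARS, 1_130_283 * YEARS)

print(f"Zaloni ROI: {zaloni_roi:.0f}%")
print(f"PMO ROI:    {pmo_roi:.0f}%")
```

Under that assumption the calculation rounds to 671% for Zaloni and 20% for the PMO, matching Table 2.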

Payback Period

Payback period is an estimate of when a customer will start to see a positive return from the analytics solution she selects; it measures benefits achieved over time and costs incurred over time and indicates the investment’s breakeven point. As displayed in Table 2, the expected payback period for a Zaloni deployment in an environment described by the inputs in Table 1 is roughly six months (significantly shorter than the payback period estimated for the PMO, which is not modeled to reach a breakeven point until 32 months into the deployment).
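
To make the breakeven idea concrete, here is a minimal sketch of a payback calculation. It assumes a single upfront cost and a constant monthly net benefit, which is a simplification; the report does not publish the month-by-month cash flows behind its six-month and 32-month results, so the inputs below are hypothetical and are not taken from ESG's model.

```python
def payback_month(upfront_cost, monthly_net_benefit, horizon_months=36):
    """Return the first month in which cumulative net cash flow is
    non-negative, or None if breakeven is not reached in the horizon."""
    cumulative = -upfront_cost
    for month in range(1, horizon_months + 1):
        cumulative += monthly_net_benefit
        if cumulative >= 0:
            return month
    return None


# Hypothetical inputs for illustration only.
fast = payback_month(upfront_cost=600_000, monthly_net_benefit=100_000)
slow = payback_month(upfront_cost=600_000, monthly_net_benefit=10_000)

print(fast)  # breakeven month for the first hypothetical project
print(slow)  # None: no breakeven within 36 months
```

A real model would replace the constant monthly benefit with the actual schedule of costs and benefits over time.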

Net Present Value (NPV)

NPV is a measure that calculates the difference between the present value of cash returns and the present value of cash outflows associated with a project. It assumes a discount rate to calculate the present value of future returns. This metric is commonly used in accounting organizations to evaluate projects; initiatives with positive NPVs are generally considered to be worthwhile investments. As displayed in Table 2, the modeled NPV for Zaloni using the inputs defined in Table 1 is in excess of $5M (note that the PMO scenario yields a negative NPV, which is further evidence of the pain associated with custom-developed analytics approaches to Hadoop).
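
The NPV definition above corresponds to the standard discounted cash flow sum. The sketch below shows the mechanics only: the discount rate and cash flows are hypothetical, because the report states neither the rate nor the yearly cash flow schedule ESG used, so this does not reproduce the Table 2 values.

```python
def npv(discount_rate, cash_flows):
    """Net present value of a series of cash flows.

    cash_flows[0] occurs today (undiscounted); cash_flows[t] occurs at
    the end of year t and is discounted by (1 + discount_rate) ** t.
    """
    return sum(cf / (1 + discount_rate) ** t
               for t, cf in enumerate(cash_flows))


# Hypothetical project: an upfront outlay followed by three years of returns.
flows = [-1_000_000, 500_000, 500_000, 500_000]

print(f"NPV at 10%: ${npv(0.10, flows):,.0f}")
print(f"NPV at 0%:  ${npv(0.0, flows):,.0f}")
```

A higher discount rate shrinks the present value of later returns, which is why a project with positive undiscounted net benefits can still show a small or negative NPV.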

TCO Analysis

For the hypothetical customer scenario described in Table 1, the itemized three-year TCO for Zaloni (compared with the TCO estimated for the PMO) is displayed in Table 3. As shown, from a TCO perspective, Zaloni is expected to be significantly less expensive than the PMO over a three-year time horizon.

Table 3. Three-year TCO, Zaloni versus the PMO

Category | Zaloni | PMO
Hardware | $250,000 | $469,156
Software | $922,800 | $904,870
Infrastructure | $27,500 | $28,600
Maintenance and Support | $135,000 | $660,974
Professional Services | $17,500 | $15,000
Staff Personnel | $378,000 | $1,312,250
Total three-year costs | $1,730,800 | $3,390,850

Source: Enterprise Strategy Group, 2016.
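
The totals in Table 3 follow from summing the categories, and they tie back to the annual TCO in Table 2 when divided by the three-year horizon. A short sketch of that reconciliation (the dictionary layout is just one convenient way to hold the table):

```python
# Three-year costs from Table 3 as (Zaloni, PMO) pairs.
tco = {
    "Hardware": (250_000, 469_156),
    "Software": (922_800, 904_870),
    "Infrastructure": (27_500, 28_600),
    "Maintenance and Support": (135_000, 660_974),
    "Professional Services": (17_500, 15_000),
    "Staff Personnel": (378_000, 1_312_250),
}

zaloni_total = sum(z for z, _ in tco.values())
pmo_total = sum(p for _, p in tco.values())
savings = pmo_total - zaloni_total

print(f"Zaloni three-year TCO: ${zaloni_total:,}")
print(f"PMO three-year TCO:    ${pmo_total:,}")
print(f"Three-year savings:    ${savings:,}")
print(f"Zaloni annual TCO:     ${zaloni_total / 3:,.0f}")

# Per-category difference (positive means Zaloni is cheaper).
for category, (z, p) in tco.items():
    print(f"{category:<25} ${p - z:>10,}")
```

The category sums reproduce the stated totals of $1,730,800 and $3,390,850, a three-year difference of $1,660,050. Staff personnel and maintenance account for most of that gap, while software and professional services are slightly higher in the Zaloni scenario.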

Major Cost Differences for Zaloni and the PMO

• Hardware: It is important to note that in ESG’s model, since both environments include an identical Hadoop investment, the hardware and software costs incurred by the customer are assumed to be identical for that part of the environment. However, those costs are still included in the scope of the model because those costs, while identical, are required to enable the overall analytics solution. With respect to hardware specifically, ESG’s model assumes a per-Hadoop-node hardware cost of $5,000. In both scenarios, the hypothetical organization is assumed to incur a $250K hardware cost to stand up the Hadoop cluster.


Costs begin to diverge due to a major differentiator of Zaloni. That is, Zaloni upsets the typical big data analytics workflow expected with custom-developed and commercial tools. While the process of loading raw data into the Hadoop cluster is identical, in a traditional workflow, data must still be prepared with extract, transform, and load processes to make it query-able. Then, the data must be loaded into a data warehouse so that it can be arranged into query-able tables. In the PMO, this workflow includes a mix of custom-developed and commercial ETL and warehousing. Despite the PMO being skewed heavily toward the homegrown end of the spectrum, ESG’s model still estimates a hardware expense allocated to data warehousing solutions.

ESG’s model correlates total data warehousing product costs (software, hardware, and maintenance) in the PMO scenario to the size of the Hadoop cluster to be supported. By default, ESG assumes an organization will incur $45,000 in annual warehousing costs for every five Hadoop nodes in the cluster in the PMO scenario. In ESG’s view, this is a conservative figure that aligns to the likely costs an organization would incur leveraging relatively minimal commercial data warehousing investments and relying heavily on internal development. The result is a total of $1.35M in data warehousing costs over three years in this scenario. ESG’s model allocates these costs into hardware, software, and maintenance with the assumption that maintenance will be incurred as an annual expense equal to 18% of hardware and software capital purchases. Upfront hardware costs are estimated to make up 20% of upfront capital purchases, and software makes up the remaining 80%. The resulting hardware cost ascribed to the PMO’s data warehousing solution supporting a 50-node Hadoop cluster is $219,156.
The incremental data warehousing hardware expenses estimated for the PMO make up the delta between the modeled hardware costs in the two scenarios and account for the 47% reduction in hardware costs estimated by ESG in this analysis.

• Software: As discussed, Zaloni Bedrock removes the need for separate ETL and warehousing toolsets. Instead, Bedrock provides a platform to ingest, track, and scale the data within the data lake. Bedrock automates tagging of files as they are loaded into the Hadoop cluster and can even run transformations as data is ingested. Orchestration is provided throughout, and if a job fails, users can quickly identify where the issue occurred and what went wrong. Bedrock, acting as an end-to-end platform for data lake management, replaces all of the software functionality traditionally provided by point products for ETL, data quality, metadata management, masking, and tokenization with a much more integrated and agile delivery mechanism. The impact on software costs when comparing a scenario where commercial and custom-developed tools are used with a scenario where Zaloni is used is clear: Any spend dedicated to ETL and other point products for data management software licensing is replaced with a subscription to Zaloni. For the scenario ESG examined in this report, ESG’s model allocates Zaloni licenses for 400 cores in the cluster (assuming 8 cores per node). The total software cost for Zaloni for this configuration over three years is $772,800. Another $150K over three years is added to account for Hadoop licensing, yielding a total software cost of $922,800 over three years in the Zaloni scenario. By contrast, ESG’s model estimates that to support the 50-node Hadoop environment, an organization is likely to spend in excess of $97K on data preparation software and nearly $660K on additional data management software to provide data quality, metadata management, cataloguing, and capabilities.
Again, these are conservative figures that reflect an approach to big data analytics that heavily leverages in-house development. Adding the same $150K in Hadoop cluster software costs yields a total software spend of nearly $905K over three years, with over $800K of that being an upfront capital expenditure on software. The Zaloni scenario, by comparison, allows much of the cost to be deferred (as an annual subscription) over the time horizon, lowering the cost barrier to entry.

• Maintenance and support: As discussed, the organization is anticipated to save significantly on data lake hardware and software by choosing to invest in Zaloni as opposed to traditional and custom-developed tools. However, it is important to note that these savings are compounded by the consumption model of
the solution. Traditional tools require significant upfront costs when bought, but they also carry significant annual maintenance costs; ESG’s model uses a best-practice assumption of 18% of capital costs. By contrast, Zaloni software follows a subscription model in which maintenance and support are covered within the subscription cost. The only maintenance cost estimated by ESG’s model in the Zaloni scenario is hardware maintenance for Hadoop nodes. Based on the $250K hardware capital expenditure estimated in the Zaloni scenario, a $45K annual maintenance charge is estimated, yielding total maintenance costs of $135K over the three-year time horizon. In the PMO scenario, several additional components require maintenance over time: ETL and data management software and associated hardware must be supported. Based on the capital costs estimated, annual data warehousing hardware maintenance is estimated at $39,448 and annual data warehousing software maintenance is estimated at $118,344. Additionally, annual ETL software maintenance is estimated at $17,532. In total, with Hadoop software maintenance included, maintenance and support costs for the PMO are estimated in excess of $620K. In this analysis, Zaloni is estimated to reduce an organization’s maintenance cost over three years by 80%.

• Staff: In addition to significant hardware and maintenance savings, Bedrock’s differentiated architecture and fully integrated platform approach to building and maintaining a data lake also enables significant savings related to the staff required to operate the environment. In a traditional environment, multiple development experts are required, from data scientists and Hadoop experts to manage ETL processes and tools, through data warehouse architects and administrators to manage and the data warehouse.
By contrast, Bedrock shifts the development and maintenance burden to the vendor and greatly diminishes the need for specialized labor to develop and manage each tier of the analytics solution. ESG’s model represents this shift by reducing the number of full-time equivalents (FTEs) needed in the environment. For an analytics environment consisting of 50 Hadoop nodes and a suite of custom-developed and commercial tools, ESG’s model estimates that five junior-level administrators and four more specialized engineers would be required initially. To manage a comparable Hadoop-plus-Bedrock environment, however, ESG’s model estimates that only one junior-level administrator and two more specialized engineers would be required initially. Moreover, to account for analytics growth and complexity over time, administrator headcount in the PMO is assumed to grow to eight FTEs by the end of the three-year time horizon, while in the Zaloni scenario administrator headcount is estimated to grow to only two FTEs. Similarly, the number of engineering FTEs in the PMO scenario is estimated to grow to seven, compared with just three in the Zaloni scenario. ESG’s model also applies burdening assumptions (to account for the fully burdened cost of labor) and productivity correction assumptions (to account for the true amount of time an FTE can dedicate to operational tasks). In total, over the full time horizon, staffing costs are estimated at $378K in the Zaloni scenario and $1.3M in the PMO scenario. The decision to invest in Zaloni to build and maintain the data lake is thus estimated to reduce staff costs by 71% over three years.
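
The cost figures quoted in this section can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not ESG’s model: it takes inputs stated in the text (50 nodes, $5,000 per node, $45,000 per five nodes per year, 18% annual maintenance, the stated $219,156 warehousing hardware cost, and the stated staffing totals) and recomputes the headline values. All variable names are hypothetical, and the staffing totals are rounded inputs, so the resulting percentage is approximate.

```python
# Illustrative cross-check of the TCO figures quoted in this section.
# This is not ESG's model; names are hypothetical and the inputs are the
# values stated in the text for the default 50-node, three-year scenario.

NODES = 50
YEARS = 3
NODE_HARDWARE_COST = 5_000            # per-Hadoop-node hardware cost
WAREHOUSING_PER_FIVE_NODES = 45_000   # annual PMO warehousing cost per five nodes
MAINTENANCE_RATE = 0.18               # annual maintenance as a share of capital costs
PMO_WAREHOUSE_HARDWARE = 219_156      # PMO warehousing hardware cost, as stated
CORES_PER_NODE = 8
ZALONI_SUBSCRIPTION = 772_800         # three-year Zaloni software cost, as stated
HADOOP_LICENSING = 150_000            # three-year Hadoop licensing, as stated
ZALONI_STAFF_COST = 378_000           # three-year staffing cost (rounded in the text)
PMO_STAFF_COST = 1_300_000            # three-year staffing cost (rounded in the text)

# Hardware: both scenarios stand up the same Hadoop cluster; the PMO adds
# data warehousing hardware on top of it.
hadoop_hardware = NODES * NODE_HARDWARE_COST
pmo_hardware = hadoop_hardware + PMO_WAREHOUSE_HARDWARE
hardware_reduction = 1 - hadoop_hardware / pmo_hardware

# PMO data warehousing: scales with cluster size and recurs every year.
pmo_warehousing = WAREHOUSING_PER_FIVE_NODES * (NODES // 5) * YEARS

# Software: Zaloni is licensed per core, plus Hadoop licensing.
zaloni_cores = NODES * CORES_PER_NODE
zaloni_software = ZALONI_SUBSCRIPTION + HADOOP_LICENSING

# Maintenance: in the Zaloni scenario only Hadoop hardware is maintained.
zaloni_maintenance = hadoop_hardware * MAINTENANCE_RATE * YEARS
pmo_warehouse_hw_maintenance = PMO_WAREHOUSE_HARDWARE * MAINTENANCE_RATE  # per year

# Staff: relative reduction implied by the two rounded totals.
staff_reduction = 1 - ZALONI_STAFF_COST / PMO_STAFF_COST

print(f"Hadoop hardware (both scenarios): ${hadoop_hardware:,.0f}")
print(f"PMO warehousing over three years: ${pmo_warehousing:,.0f}")
print(f"Hardware cost reduction:          {hardware_reduction:.0%}")
print(f"Zaloni licensed cores:            {zaloni_cores}")
print(f"Zaloni scenario software cost:    ${zaloni_software:,.0f}")
print(f"Zaloni scenario maintenance:      ${zaloni_maintenance:,.0f}")
print(f"PMO warehouse hardware maint./yr: ${pmo_warehouse_hw_maintenance:,.0f}")
print(f"Staff cost reduction:             {staff_reduction:.0%}")
```

Under these inputs the sketch recovers the $250K hardware cost, the $1.35M warehousing total, the 47% hardware reduction, the $135K Zaloni maintenance total, the $39,448 annual warehousing hardware maintenance, and the 71% staff reduction reported in the text.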

Benefits Analysis

Potential customers evaluating different approaches to building out a data lake must be cognizant of the benefits they will achieve from that technology solution. In this analysis, those benefits are broken down into IT efficiency savings, user productivity improvements, and analytics time-to-value improvements. The three-year incremental benefits for Zaloni Bedrock compared with the PMO alternative ESG developed are displayed in Table 4. As shown, ESG’s modeled assumptions and analysis result in significant incremental business benefit for an organization electing to invest in Zaloni.

Table 4. Three-year Incremental Benefits, Zaloni versus the PMO

Incremental Business Benefit Category          Zaloni Bedrock

IT efficiency improvements                         $4,530,586
    System administration                            $829,448
    Customer support                                 $583,955
    Analytics development and customization        $3,117,184
User productivity improvements                     $3,451,576
    Impact of platform administration                $925,330
    Improvements in reporting and analysis         $2,497,705
    Improvements in platform support                  $28,540
Analytics time to value improvements               $1,293,441
Total incremental benefits over three years        $9,275,604

Source: Enterprise Strategy Group, 2016.
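
Because Table 4 reports rounded dollar amounts, its subtotals can differ slightly from the sum of their line items. The short Python sketch below is an illustrative consistency check, not part of ESG’s model: it re-adds the published line items and reports the gap against each published subtotal and the grand total. The dictionary layout and the function name are hypothetical.

```python
# Illustrative consistency check on Table 4 (values copied from the table).
# The data layout and helper name are hypothetical, not part of ESG's model.

TABLE_4 = {
    "IT efficiency improvements": {
        "stated": 4_530_586,
        "items": {
            "System administration": 829_448,
            "Customer support": 583_955,
            "Analytics development and customization": 3_117_184,
        },
    },
    "User productivity improvements": {
        "stated": 3_451_576,
        "items": {
            "Impact of platform administration": 925_330,
            "Improvements in reporting and analysis": 2_497_705,
            "Improvements in platform support": 28_540,
        },
    },
    "Analytics time to value improvements": {
        "stated": 1_293_441,
        "items": {},  # reported as a single line with no sub-items
    },
}
STATED_TOTAL = 9_275_604


def rounding_gaps(table, stated_total):
    """Return stated-minus-recomputed differences for each subtotal and the total."""
    gaps = {}
    for category, entry in table.items():
        if entry["items"]:
            gaps[category] = entry["stated"] - sum(entry["items"].values())
    gaps["Total"] = stated_total - sum(entry["stated"] for entry in table.values())
    return gaps


gaps = rounding_gaps(TABLE_4, STATED_TOTAL)
for name, gap in gaps.items():
    print(f"{name}: {gap:+d}")
```

Every gap is within one dollar, which is consistent with line items that were rounded independently before being published.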

Major Benefit Differences for Zaloni versus the PMO

Benefits were calculated based on observations and estimates related to the value of Zaloni, as compared with the PMO, obtained through ESG qualitative research with real-world Zaloni customers, literature reviews, and in-depth interviews with technical stakeholders at Zaloni. As discussed, ESG’s model quantifies significant cost savings in the area of staffing when comparing Zaloni with the PMO. However, ESG’s model goes beyond purely cost-centric measures related to IT. It also quantifies specific workflows and tasks that are expected to be improved through the deployment of Zaloni, along with the value of empowering IT and technical resources assigned to the ownership of analytics solutions to be more productive with their time. Key IT efficiency benefit assumptions for Zaloni compared with the PMO are:

• System administration: In the PMO approach to data analytics on Hadoop, multiple complex systems are included in the solution build. Elements like ETL tools and data warehouse solutions must be separately developed, configured, deployed, integrated, expanded, and upgraded over time. These processes are eliminated in the Zaloni scenario through the operation of a single platform (Bedrock), which is incrementally enhanced by Zaloni over time and delivered to the organization in the form of new releases. Thus, the time and effort expended on these tasks can in turn be utilized for other processes that deliver value for the organization. To capture this difference, ESG’s model measures expected architecture and planning workflows, deployment workflows, and moves, adds, and changes (MAC) needed for each component over time in the environment. These workflows are correlated to the scale of the Hadoop cluster supported; the larger the Hadoop cluster, the greater the quantified time lost to component administration in the PMO scenario.
For example, for every Hadoop node in the cluster, an incremental three-and-a-half-hour ETL setup cost is configured. For the scenario modeled, which consists of 50 Hadoop nodes initially, 175 man-hours are allocated just to ETL tool setup and configuration. This logic is followed for planning, setup (both initially and as new Hadoop nodes are added to the cluster), and ongoing update workflows (assuming two updates per year for each solution component) across ETL and data warehousing solution components. In total, ESG’s model estimates the increased efficiency observed in the Zaloni use case to be in excess of $829K over the full three-year time horizon.

• Customer support: Beyond the setup, configuration, and efficiencies created for the IT organization, ESG’s model estimates significant value related to customer support. This benefit area encapsulates two separate benefits associated with Zaloni.

First, because Bedrock is a tightly integrated, enterprise-grade platform, resolving issues for users is assumed to be less burdensome than resolving helpdesk tickets in the PMO environment. In the PMO scenario, multiple IT specialists may need to be involved to determine the root cause of a given user’s issue, and fixes may require multiple stakeholders to take action or develop a feature enhancement. To represent this improvement, ESG’s model utilizes a constant assumption that each user will submit 0.5 helpdesk tickets related to the analytics solution per year, per server in the Hadoop cluster. Helpdesk events are scaled by the number of Hadoop nodes to reflect that larger, more complex environments require additional user hand-holding. However, ESG’s model assumes that tickets submitted by users require one-third the IT resources to resolve in the Zaloni scenario: 30 man-minutes per event versus 90 man-minutes in the PMO. As the number of users leveraging the data lake grows over time, the delta between Zaloni and the PMO grows commensurately.

The other component of this value area is the IT efficiency created through the use of a single platform versus the integration of separate data management solutions. As any IT administrator can attest, managing a heterogeneous environment with many solutions deployed brings issues when something goes wrong. It can be hard to track down the issue, and in the case of homegrown applications it often takes stakeholders from multiple development teams to determine what the true issue is and who owns the responsibility of implementing the fix. In single-vendor environments, IT can often take advantage of the “one throat to choke” to shorten the time and frustration associated with break/fix events. ESG’s model quantifies this advantage by assuming that one issue requiring development resources will be submitted for each server in the Hadoop cluster each year.
Again, this assumption is scaled by the number of nodes to represent that larger environments are harder to support than smaller environments. However, in the Zaloni example, 30 man-minutes per event are allocated to resolving the issue. By contrast, in the PMO scenario, 120 man-minutes are allocated. In the aggregate, ESG’s model allocates a total of nearly $600K in increased IT efficiency over three years to account for the ability to more easily respond to user tickets and more efficiently implement fixes when issues arise.

• Analytics development and customization: This paper has already discussed, through customer anecdotes, how building a data lake with custom-built tools can take months or even years of both labor and calendar time. Unfortunately, for nearly all organizations approaching analytics this way, this is far from a one-time pain point. In the PMO scenario, subsequent improvements and scaling of the data lake can involve additional development resources over time. For dynamic organizations truly looking to capitalize on and unlock the value of their data, this development and iteration process is a non-starter. Zaloni improves upon, or in many cases eliminates, the development tasks organizations have come to expect when doing analytics on Hadoop. This differentiator delivers value to the organization in multiple ways: users are more productive, analytics are available for use sooner, and IT and development groups no longer need to spend time wrangling data and coding tools. The IT efficiency benefit tied to analytics development and customization deals with this last benefit area. To quantify this benefit, ESG’s model first quantifies the likely man-months dedicated to analytics development in the PMO scenario.
For a growing analytics environment initially supporting seven analysts, a population that will triple over the time horizon, ESG’s model assumes that the man-months dedicated to analytics development will grow from 62 in the first year of operation to 153 in the final year of the time horizon. ESG’s model then conservatively assumes that development requirements in the Zaloni scenario will be halved. This allows the organization to redeploy specialized developer resources to other projects, adding value for the organization. In the aggregate, the organization selecting Zaloni is estimated to increase development efficiency by over $3M over the time horizon.

Beyond IT efficiency, ESG’s model also attempts to quantify improvements in user workflows and tasks expected to be enabled via Zaloni. Key user productivity benefit assumptions for Zaloni compared with the PMO are:

• Impact of platform administration: As noted, since Zaloni is a single, easily administered platform, ESG’s economic model ascribes a significant benefit to the IT organization compared with the administrative burden estimated for a collection of traditional ETL and data warehousing tools. However, ESG’s model also estimates the trickle-down value to the user community from being supported by a solution that is easier to develop, deploy, configure, expand, and enhance over time. In the Zaloni scenario, initial setup activities and expansion activities over time (as new nodes are added to the Hadoop cluster) are estimated to have 1/20th the impact on users in terms of the availability of the analytics environment. Additionally, intermittent outages related to system upgrades and MAC events are estimated to be halved. Across the total analyst user community (initially seven analysts, growing to 21), the increase in productivity estimated to be enabled by Zaloni due to less platform unavailability during setup, configuration, and administrative events is in excess of $925K.

• Improvements in reporting and analysis: Zaloni enables increased levels of self-service analytics for end users through the use of Bedrock’s data catalog and Query Builder features. ESG’s model, in turn, quantifies the increase in analyst productivity (i.e., the ability to ask more valuable questions). To do so, ESG’s model accounts for both an increase in the number of analyses a single analyst can conduct (10%) and an increase in the value (25%, by default) to the organization brought by the answers, which are enabled by data agility and quality (i.e., the ability to quickly and easily report on data’s quality and provenance without looping in IT support).
Across the entire analyst community (in this scenario, starting with seven analysts and growing to 21 over the time horizon), the incremental value delivered to the organization through the improvement in analyst reporting and analysis is nearly $2.5M over three years.

• Improvements in platform support: ESG’s model previously quantified the IT value advantage of being able to more quickly and easily resolve user issues in the Zaloni scenario compared with the PMO scenario. However, it is important to note that user constituents are also impacted: They lose less productive time as a result of not having to wait for resolution. ESG’s model uses the same 30-minute versus 90-minute relative average issue resolution time for these quantifications. In total, the analyst community is estimated to save about $28.5K in productive time over the three-year time horizon.

The final area of benefit quantified by ESG’s economic model is improvements in analytics time to value:

• Analytics time to value improvements: The key drivers of analytics time to value are the speed with which the data lake can be built and the avoidance of development delays when responding to analysts’ requests. However, rather than quantifying lost productive time, these measures are representative of the opportunity cost of waiting for development teams to deliver data to utilize in decision making. This is a nebulous topic, but one central to the value of big data analytics. In ESG’s model, the deployment of the data lake is assumed to be much faster with Zaloni than with do-it-yourself development. This assumption was validated by every customer ESG spoke with. To estimate the incremental value associated with faster time to market, ESG’s model ascribes an intrinsic value to the data lake, which is correlated to the number of Hadoop nodes in the cluster.
For a 50-node cluster, ESG’s model uses a default annual value of slightly greater than $2M; ESG acknowledges that the intrinsic value of the data lake will vary immensely from organization to organization, depending on use case. Next, ESG’s model compares the value of shrinking the time to production for the data lake from several months to several weeks. As a result, initial time to value improvements expected in the Zaloni use case slightly exceed $1M.

Furthermore, in the PMO scenario, whenever an analyst submits a development request to enable alterations to the underlying data lake, there is a multistep process to enable that iteration. When juggling the competing requests made among a growing analyst community, it is inevitable that some requests will be delayed. These delays are effectively eliminated in the Zaloni scenario because all the data in the data lake is automatically tagged, sourced, protected, and available to be queried from the moment it is added to the cluster. To quantify the value associated with this difference, ESG’s economic model ascribes a monthly opportunity cost of $10,000 to delayed analyst requests. Additionally, ESG’s model assumes that 10% of analyst data customization requests will be delayed and that the average delay is 1.5 months. In
total, ESG’s model estimates that the value of avoiding delays for the analyst community is on the order of $242K over three years.

The Bigger Truth

If big data is to be an accepted part of serious business, rather than a limited effort, IT departments will need to offer the organization a path that incorporates enterprise-grade governance, is fast to deploy, and is easy to maintain. Best practices are still being defined across the industry, but the problems that exist with traditional homegrown approaches must be solved today. Finding a more efficient way to systematically address these issues is imperative and can carry with it a large payoff. Thankfully, Zaloni has made this challenge a priority and has developed a proven platform to address the weaknesses of data lakes for the enterprise. Struggling to recreate by hand all the functionality of the Zaloni approach is a sucker’s game. As attested to by Zaloni’s own customers, it would take more effort, more time, and more cost than simply beginning with a well-designed, comprehensive, and distribution-agnostic solution. Businesses that want to start using advanced analytics methods to derive insights and value from untapped data sources as a result of big data initiatives, but that need to start from a safe, agile foundation, should consider Zaloni’s Bedrock before embarking on a massive custom development initiative that is likely to end in frustration.

Appendix A

For this project, ESG adhered to the following research and modeling methodology:

• ESG conducted initial market research across Zaloni and other relevant big data technology vendors to assess current market trends, vendor value claims, and the purchase considerations that are most important and relevant to customers evaluating analytics solutions, specifically those to be used in conjunction with a Hadoop cluster.

• Based on the results of this initial research, ESG subsequently identified a “present mode of operation” (PMO), effectively a traditional approach that customers may take to meet their data analytics requirements, against which the costs and benefits of utilizing Zaloni were to be compared. For this analysis, the PMO is a blended average of legacy and homegrown tools (and the staff required to develop and operate these tools) for data preparation and warehousing functionalities that would be required in an alternative Hadoop big data use case.

• ESG then conducted a series of in-depth interviews with systems engineering, service and support, and technical marketing representatives from Zaloni. Additionally, and more importantly, ESG conducted four in-depth interviews with Zaloni customers to understand trends and outcomes within in-production environments. The data collected in these interviews was used to refine assumptions built into the model related to current customer environments and the direct and indirect costs and benefits attributable to Zaloni. Product marketing collateral, configuration guides, and case studies of customers were also used to identify specific IT and user considerations and the labor burden (in both time and cost) associated with those considerations.
This research helped to inform ESG’s understanding and analysis of Zaloni adoption drivers, usage trends, and the operational and financial benefits that customers can realize.

• Once the economic model was finalized and all research complete, ESG modeled a default scenario that is designed to demonstrate the relative costs and benefits of Zaloni in a typical big data environment. Those results were then compared with model outcomes for a similar-scale PMO solution. The results for this default scenario are described in this paper.

Please note that the data and conclusions presented in this report regarding the costs and benefits associated with implementing and utilizing Zaloni reflect the output of ESG’s economic value analysis based on the specific use case and default scenario assumptions modeled for this report. ESG acknowledges that changes to these assumptions will lead to a different set of results and, as such, advises IT professionals to use this report as one validation point in a comprehensive financial analysis process prior to making a purchase decision. Zaloni provided current standard pricing and product information to ESG. Other IT equipment and labor cost assumptions were obtained from publicly available sources such as IT vendor and channel partner websites and published price lists.

© 2016 by The Enterprise Strategy Group, Inc. All Rights Reserved.

20 Asylum Street | Milford, MA 01757 | Tel: 508.482.0188 Fax: 508.482.0128 | www.esg-global.com