<<

The past, present and future of data warehousing

HOW UNDERSTANDING FOUR DECADES OF TECHNOLOGY EVOLUTION CAN HELP YOU CHOOSE THE RIGHT SOLUTION TODAY

1 What’s inside

3 The need for the modern 4 The genesis of data warehousing 6 Scaling the data warehouse 7 The data warehouse in a box 8 Data warehousing and the big data explosion 10 The best and worst of both worlds reveal the future 11 The cloud revolution 12 The need for modern data warehousing 13 From the past, to the present, to your organization’s future 14 A time to move forward

2 THE CHAMPION GUIDES The need for the modern data warehouse Every technology evolution tells a story

THE RISE OF THE DATA WAREHOUSE The evolution of data warehousing also reveals a direct path The forces that gave rise to data warehousing in the 1980s are just you’re on today: the need for a powerful, simple and affordable as important today. But from then until now, history reveals the data warehouse built for the cloud to store and analyze all your data benefits and drawbacks of the traditional data warehouse, the NoSQL in one location. options once seen as the replacement for data warehousing and, more recently, the cloud versions of these solutions. Each iteration solved In this ebook, we’ll detail these key insights you can use today an important problem but this evolution proves technology never to champion modern data warehouse technology within your caught up with the data demands of the enterprise. organization, helping it become a data-driven enterprise:

More recently, “big data” hype has stirred healthy debate as to the 1. How traditional data warehousing defined the needs for today’s relevance of the data warehouse. The emergence of NoSQL solutions modern data warehouse. in the past decade urged technology and data professionals to 2. How the success and limitations of NoSQL systems such as consider alternatives to the legacy data warehouse. Hadoop revealed the limitations of legacy data warehouses.

Understanding this history reveals why data warehousing technology 3. How the cloud revealed what’s possible beyond “cloud-washing” is even more relevant, not less, today. The history also shows what an on-premise data warehouse or NoSQL system.

a modern data warehouse must deliver: the best of legacy data 4. Why the modern data warehouse is the first solution to meet the warehouses, the best of NoSQL and big data systems, the true data and analytics demands of today’s enterprise. benefits of cloud platforms and much more.

3 THE CHAMPION GUIDES

The genesis of data warehousing 1980 The data warehouse 1985 is born

Why data warehousing emerged 1990 The rise as the engine of data analytics of parallel processing

1995

The 1980s ushered in the relational . And it was good. WHAT ARE SOME OF THE MAIN DIFFERENCES, AND SIMILARITIES, Appliances BETWEEN AN OLTP DATABASE AND A DATA WAREHOUSE? catalyze the Companies around the world used it to record what was happening growth of 2000 data marts in their organizations. The technology became so prevalent that OLTP DATABASE enterprises began using relational to report on and • A database designed for rapid storage and retrieval of small sets analyze their business. However, IT groups soon discovered that a of current data records in support of transactions and interactions growing appetite for reporting and analytics had overwhelmed their within an enterprise. 2005 transactional databases. • Organizes data in tables and columns, and allows users access via structured (SQL). Hadoop and NoSQL are These new workloads were very different from the online transaction • Designed to handle quick, real-time activity such as entering a introduced processing (OLTP) of relational databases. Enterprises started to customer name, recording a sale, and recording all accounting 2010 activity of that sale. Demand offload these functions to separate databases optimized for reporting for power, flexibility, and analytics. Hence, the data warehouse was born. • Analysis is relegated to simple, static reports often driven by IT. & elasticity

Introduction of DATA WAREHOUSE cloud data 2015 • A database designed to store and process historical data from warehousing multiple sources inside and outside the enterprise, including the A fresh approach to the data cloud, for deep analysis. warehouse • Organizes data in tables and columns, and allows users access 2020 via SQL. The future: Built-for-the- • Optimized for loading, integrating and analyzing very large cloud data warehouses amounts of data. • Designed for descriptive, diagnostic, predictive and prescriptive analysis.

4 4 THE CHAMPION GUIDES

WHAT BENEFITS EMERGED FROM STORING AND ANALYZING 1980 DATA IN A SEPARATE SYSTEM?

• Eased the burden of reporting from transactional systems. The data warehouse • Produced more business-friendly data results. 1985 is born • Provided access to a wider array of reports more quickly. • Integrated valuable data from across the enterprise for richer insights.

1990 WHAT WERE THE LIMITATIONS OF THESE EARLY DATA The rise of parallel WAREHOUSE SYSTEMS? processing

• Getting data into the data warehouse was slow and problematic, often occurring only once a day and during 1995 off-peak hours. • Having to plan and pay for the highest expected resource usage, when that period may only last a month, week or just Appliances catalyze the one day of the year. growth of 2000 data marts • These systems reached capacity quickly, impacting performance. • Only the most technically astute could administer the data warehouse. 2005 • The technology limited the breadth and depth of insight enterprises could act upon. Hadoop and NoSQL are introduced

2010 Demand for power, LESSON LEARNED flexibility, Data warehousing has demands and requirements that & elasticity

OLTP databases and systems not designed for those Introduction of cloud data demands could not satisfy. Two decades later, “big 2015 warehousing

data” systems such as Hadoop also came up short, A fresh approach to the data which we’ll discuss below. warehouse

2020 The future: Built-for-the- cloud data warehouses

5 4 THE CHAMPION GUIDES Scaling the data warehouse 1980

The data warehouse 1985 is born

The rise of parallel processing 1990 The rise of parallel processing

1995

The amount of data and the complexity of investment of research and development by collect, integrate and query even larger data Appliances workloads grew steadily throughout the database vendors. sets than ever before. But the complexity of catalyze the growth of 1990s, taxing the data warehouse beyond configuring and administering these systems 2000 data marts its limits. The early architectures for data The first evolution of MPP was shared-disk: an required exponentially more time than warehousing weren’t much different than MPP cluster of servers connected to a shared executing the actual analytic queries against OLTP databases: a big monolithic server disk array. Shared-disk architectures retain the data warehouse. In addition, scaling for attached to a big bunch of disk. simple storage management by centralizing complex queries and more than just a few 2005 data but with the tradeoff of a performance users could reduce performance to a crawl. Hadoop and As data volumes grew, the architecture of the bottleneck between storage and compute. NoSQL are traditional data warehouse couldn’t keep up. Next came the shared-nothing architecture: an introduced An individual server could only be so large MPP cluster without a central disk, with data 2010 LESSON LEARNED Demand before you experienced diminishing returns distributed across the cluster. Shared-nothing for power, Parallelism is a critical tool to flexibility, on performance at a rapidly escalating cost. architectures avoid the bottleneck between & elasticity addressing scale, and scaling additional The cost and complexity to upgrade to newer, compute and storage but still carried the computing resources more effectively. Introduction of cloud data faster server and storage options was also tradeoff of complicated storage management. 2015 But parallelism didn’t completely warehousing painful. For example, resizing the system requires solve the problem of getting data- A fresh approach redistributing and re-replicating data. to the data driven insights to users and quickly. warehouse These factors led to the emergence of highly In addition, the complexity of these parallel architectures and massively parallel A MARGINAL STEP FORWARD 2020 architectures created new problems. The future: processing technology known as MPP systems. What were the benefits of this next iteration Built-for-the- cloud data The transition to MPP required a major of the data warehouse? Enterprises could warehouses

6 THE CHAMPION GUIDES The data warehouse in a box 1980

The data Data warehouse appliances attempt to democratize data access warehouse 1985 is born

As data warehousing grew in importance in environment system to manage: multiple data these appliances, the growth in data meant the late 1990s and early 2000s, more and repositories, housing multiple copies and enterprises would often exceed the finite

more parts of the business demanded access versions of data, spread across many different available storage of the appliance, which 1990 The rise to data for reporting and analytics. These physical locations. could be as low as 10TB of storage. When of parallel growing demands on data warehousing led to enterprises reached this limit, it was near processing increasingly complex systems, and lots of time WHY DID ENTERPRISES EMBRACE THE DATA impossible to simply increase the storage WAREHOUSE APPLIANCE? spent configuring, maintaining and deploying unless the organization undertook a 1995 an overburdened data warehouse. • Reduced system administration. complicated, disruptive migration to a • Most appliance vendors used MPP larger hardware configuration. architectures for high performance. Appliances To try and relieve some of the load on the catalyze the growth of data warehouse, enterprises began setting up • Users achieved faster query performance This constraint, which plagues on-premises 2000 data marts additional copies of data in the central data over non-appliance solutions. data warehouses of all types, remains a warehouse. This gave smaller groups inside the • Faster path to go-live by eliminating significant issue for enterprises today. Cloud, enterprise their own sandbox, or data mart: complex, piecemeal processes to purchase, which we’ll discuss later, emerged as the first install and configure hardware and a subset of the data warehouse to serve a real opportunity to enable unlimited storage 2005 software. Reduced system administration specific group or line of business (LOB). and scalability without the high cost and meant a faster path to go-live. headache of scaling seen with traditional Hadoop and NoSQL are APPLIANCES CATALYZE THE data warehouses. introduced GROWTH OF DATA MARTS WHAT DID ENTERPRISES SEE AS THE DRAWBACKS? 2010 Supporting this trend, IT vendors started Demand for power, Proprietary platforms made migration off offered pre-configured data warehouse • LESSON LEARNED flexibility, the system difficult. & elasticity appliances to simplify data mart set-up. Making a data warehouse easier • No flexibility around configuring the ratio Introduction of Customers could get up and running, without to get up and running, and use, is cloud data of storage vs compute. 2015 warehousing the headache of purchasing and configuring critical to supporting broader access • Scaling up the system required a “forklift” A fresh approach individual servers and installing and configuring to data insights. But that solution to the data replacement and migration project. warehouse database software. shouldn’t lead to fragmented data • They often served as data marts, creating silos, generating significant data 2020 separate repositories of inconsistent data. The future: The enabled a inconsistencies and requiring Built-for-the- cloud data significant expansion of data access across significant resources to manage. THE PITFALLS OF LOW STORAGE LIMITS warehouses the enterprise. However, these devices also created a much more complicated data In addition to the inherent drawbacks of

7 THE CHAMPION GUIDES

Data warehousing and the big data explosion 1980 The data warehouse 1985 is born

1990 In stomps the elephant: The rise of parallel Hadoop and NoSQL processing

1995

Appliances catalyze the growth of In the late 2000s, the cost of creating and designed to handle these types of data, new for these systems has largely remained 2000 data marts storing data rapidly declined. As a result, new NoSQL systems emerged such as Hadoop, inadequate, and the complexity of these applications emerged, generating exponentially an open source, Java-based programming systems is still high. more data than ever seen previously. Social framework that supports the processing media applications attracted millions of users, and storage of extremely large data sets in a WHAT WAS HADOOP DESIGNED FOR? 2005 capturing every click, , share and post. distributed computing environment. Originally, • Collecting large amounts of varying data types in their raw forms and storing them Hadoop and Facebook has two billion users and generates NoSQL meant “No SQL”, which evolved to “not NoSQL are in a single data storage repository often introduced nearly one petabyte of data each day. Other only SQL”. called a data lake. 2010 applications now generate huge amounts of log Demand • Easily scaling storage but at low cost. for power, data or machine data. Eventually, people realized they needed to use flexibility, SQL on those NoSQL systems to leverage the & elasticity NEW SEMI-STRUCTURED DATA TYPES tools and skills they already had. That demand WHAT WASN’T HADOOP DESIGNED FOR? Introduction of cloud data The data explosion spawned new structures led to the creation of a plethora of SQL and • Optimized analytics 2015 warehousing

for data beyond the traditional and rigid SQL-like interfaces for NoSQL systems. • High concurrency A fresh approach to the data relational formats. These structures that • Easy access to query the data using warehouse emerged were flexible and could express These developments led many to question standard SQL 2020 more complex relationships, including semi- whether the data warehouse would become The future: Built-for-the- structured data such as JSON, Avro and irrelevant, replaced by NoSQL systems. cloud data XML. Since existing data warehouses weren’t However, SQL support and performance warehouses

8 4 THE CHAMPION GUIDES

The upside? Hadoop revealed the traditional data warehouse 1980 had not evolved fast enough to address this new, tsunami of data. Hadoop and other, NoSQL systems addressed the The data challenge of getting all types of relational and nonrelational warehouse 1985 is born data into a single repository – something traditional data warehouses were never designed for.

However, once IT stored the data in Hadoop, enterprises 1990 were left wondering how to transform this data into a format The rise of parallel for fast, easy access by business people to perform deep but processing efficient analysis. Data warehousing was built upon this premise decades before, and the need still remained. 1995

HADOOP DIDN’T REPLACE THE DATA WAREHOUSE

Ultimately it became clear that Hadoop couldn’t replace the Appliances catalyze the data warehouse. If the data warehouse had become obsolete, growth of 2000 data marts everyone would be writing Java programs in MapReduce or data frames in Spark. We learned that Hadoop is a toolkit, not an off-the-shelf solution designed for analytics. Data loading with Hadoop is fairly straightforward. But getting insight from 2005 data stored in Hadoop requires highly technical individuals with

unique skills, which are in short supply. Hadoop and NoSQL are introduced THE FALLOUT? ENTERPRISES FOLLOWED ONE OF THREE APPROACHES: 2010 Demand Keeping their traditional data warehouse for power, • flexibility, • Trying to work within the constraints of Hadoop & elasticity Deploying both a data warehouse and Hadoop to advance Introduction of • cloud data closer to data nirvana: the data-driven enterprise. 2015 warehousing

A fresh approach to the data warehouse

2020 The future: Built-for-the- cloud data warehouses

9 4 THE CHAMPION GUIDES

The best and worst of both worlds reveal the future 1980

The data warehouse 1985 is born

Enterprises demand power, flexibility, elasticity 1990 The rise of parallel processing

Data warehouse and NoSQL vendors continue significant challenges to integrate and analyze Enterprises still needed a better, affordable and 1995 to evolve their offerings, a testament to the that data. elastic architecture to meet their ever-changing varied methods today’s enterprises demand to needs for data storage and compute resources.

better serve their customers, streamline their Enterprises now want the best of both worlds: They also needed to optimize their costs and Appliances catalyze the organizations and lead their industries. Overall, The flexibility of easily loading diverse data into avoid pre-configuring their data warehouse for growth of 2000 these demands fall into the following categories: a system that doesn’t require transforming that peak performance that may only last a month, data marts data first. They also want a system that can week or day of the year. They needed to scale POWER handle changing data, while also having high- up, down and out (concurrency) automatically or Users want the benefits of what the performance access to that data with the skills with the click of a button, and pay only for the 2005 traditional data warehouse embodied, and and tools they already have. resources they used.

more. They want to perform today’s analytics Hadoop and NoSQL are without the constraints of yesterday’s ELASTICITY introduced offerings. They want the compute resources Many technology vendors have simply moved LESSON LEARNED 2010 and concurrency to easily and affordably their on-premises data warehouse or NoSQL A modern data platform has to Demand for power, execute all sorts of analytics, including data solutions to the cloud and offer it as a service. support traditional and non-relational, flexibility, & elasticity exploration, BI and reporting, ad hoc analysis, This shift change nothing about the underlying or semi-structured, forms of data. It should also affordably meet the on- Introduction of event-driven analytics, predictive analytics and architecture of these solutions. All of the cloud data 2015 data-driven applications. on-premises constraints remained demand business needs for storage warehousing and compute. Finally, it should allow A fresh approach • Pre-defined and limited compute and to the data warehouse FLEXIBILITY storage resources people to use tools and skills based NoSQL systems created the ability to load all Competition between user queries and data on SQL, the dominate query language • 2020 in enterprise environments. The future: of an enterprise’s data into a single repository. integration activities Built-for-the- These data lakes largely overcame the challenge cloud data • Compute and storage located on the same warehouses of housing the multitude of varying data types. cluster, preventing you from scaling one Yet, these data sets remained in their original, without also scaling the other raw formats. As discussed above, this created • A host of other architectural impediments

10 THE CHAMPION GUIDES

The cloud revolution 1980

The data warehouse 1985 is born Data warehousing and cloud computing 1990 The rise of parallel processing

Today’s big revolution is cloud computing. At WHAT DO ENTERPRISES MISS OUT ON WITH A The cloud transformed the notion of what’s 1995 first, cloud was used for low-priority workloads DATA WAREHOUSE NOT BUILT FOR THE CLOUD? possible when architecting and building a data such as testing and development. Its initial value • Effortless and infinite scaling of cloud warehouse. Traditional vendors understandably proposition was immediate access to storage compute and storage resources. took the most direct path by simply moving Appliances catalyze the and compute without purchasing hardware. • Independent scaling of compute and their on-premises data warehouse solutions growth of 2000 data marts storage so you only pay for what you use. to the cloud. Their customers experienced a As enterprises considered the merits of cloud • True elasticity to scale up, down and few benefits. However, these vendors beyond easy access to cheaper resources, cloud out, automatically or with a simple click had only scratched the surface of what’s software solutions emerged. However, these of a button. possible with a true, built-for-the-cloud 2005 first-generation, managed services used the • Support for an infinite number of concurrent data warehouse architecture. users without competing for resources. same technology and architecture enterprises Hadoop and Loading and querying data simultaneously NoSQL are used inside their on-premises data centers. The • introduced without degrading performance. only major change was shifting the management LESSON LEARNED 2010 of the technology from the enterprise to a • Transferring the responsibilities of sizing, Demand Most data warehouse and NoSQL for power, cloud software provider. balancing and tuning the data warehouse, flexibility, offerings are simply copied to the & elasticity and enabling security, to the cloud vendor. cloud, or, “cloud washed” versions of Introduction of WHAT WERE THE BENEFITS OF THESE EARLY existing options. To take advantage cloud data 2015 warehousing CLOUD SOLUTIONS, INCLUDING EARLY CLOUD CLOUD: TODAY’S CHOICE FOR DATA ANALYTICS of cloud, a solution must be built for DATA WAREHOUSING SOLUTIONS? A fresh approach As the cloud matured, it became clear it was a the cloud, from the ground up. to the data warehouse • Minimized the huge, upfront purchase of a new development platform, one with unique traditional, on-premises solution. capabilities. Fast forward to today: The cloud is 2020 now considered a viable option, and often the The future: • Depending on the solution, some or all Built-for-the- default option, for a wide array of workloads, cloud data of hardware and software maintenance warehouses transfers to the data warehouse vendor. including data analytics. • The time to implement could be much less without having to purchase and install an on-premises solution.

11 THE CHAMPION GUIDES

The need for modern data warehousing 1980

The data The warehouse brought full circle warehouse 1985 is born

In recent years, four major impacts have Existing cloud and on-premises architectures • The demand for better technology has converged to both necessitate and enable a have reached their limits. Therefore, data always exceeded current offerings. 1990 The rise fresh approach to data warehousing that truly warehousing must evolve to keep up with • Enterprises want all their data in one of parallel enables the data-driven enterprise: demand. A crucial part of that evolution is location and securely accessed by all users. processing the need to develop a new architecture that • Four decades of innovations have • History: Data warehousing and noSQL can support unlimited growth in data volume, crystallized the architecture and technology 1995 needed to enable the data-driven technologies both paved the way for the workload intensity and concurrency. modern data warehouse. enterprise.

• Data: The amount of data generated in the Appliances Another critical component is support for catalyze the last two years has eclipsed all other data growth of diverse data: traditional, relational data and 2000 data marts created in previous times. And it’s coming LESSON LEARNED in all forms. semi-structured data. Ease of use matters. Without it, all potential data users can’t have History reveals the modern data • Demand: Enterprises want the simplicity, warehouse must leverage the concurrency and affordability to store, access to effective data analytics. “Why not architecture and affordability of integrate and analyze all useful data by all cloud?” has become the default question but 2005 business data users, around the clock. answering it effectively requires a solution the cloud, the flexibility of NoSQL built for the cloud technology and the power of Hadoop and • Cloud: The architecture to efficiently NoSQL are scale unlimited compute and storage, the traditional warehouse. introduced enabling data analytics for unlimited, THE MODERN DATA WAREHOUSE: 2010 data-driven insight. Demand BUILT FOR THE CLOUD for power, flexibility, The innovations of traditional data & elasticity KEY CONSIDERATIONS FOR CHOOSING warehousing and subsequent NoSQL systems Introduction of A DATA WAREHOUSE SOLUTION TO MEET cloud data span four decades. As technology vendors 2015 TODAY’S NEEDS warehousing iterated over and over, a number of themes So what are the implications for today’s data A fresh approach emerged that were as important at the birth of to the data platform buyers? Data warehousing is more warehouse the data warehouse as they are today: relevant than ever. SQL has proved to be a 2020 critical requirement for any data platform, and • Enterprises still strive today for what they The future: wanted 30 years ago: data-driven insight Built-for-the- SQL remains at the core of data warehousing. cloud data Each solution paved some of the path to warehouses History has shown that NoSQL options will not • enabling deeper and wider data analytics. eliminate the data warehouse.

12 THE CHAMPION GUIDES

1980

The data warehouse 1985 is born

1990 The rise of parallel processing 5 FIVE QUALITIES THAT DEFINE A DATA WAREHOUSE BUILT FOR THE CLOUD: From the past, to the present, 1995

Complete SQL database Supports the to your organization’s future tools millions of business users already Appliances catalyze the know how to use today. The built-for-the-cloud data warehouse growth of 2000 enables the data-driven enterprise data marts Zero management Reduces complexity with built-in performance so there’s no infrastructure to tweak, no knobs to Architectures of systems from the past Armed with the knowledge of 2005 turn and no tuning required. just can’t be re-engineered to deliver the evolution of the modern data

all these capabilities. Acquiring all these warehouse, you and your organization Hadoop and NoSQL are All your data Analyze multiple petabytes qualities in a single solution is possible can be champions for a data-driven introduced of structured and semi-structured data only with the invention of a brand new enterprise. Give your organization the 2010 data warehouse architecture built from architecture and technology of a data Demand (JSON, XML, Avro) to quickly extract for power, the ground up for the cloud. warehouse built for the cloud. flexibility, critical insight. & elasticity

Introduction of All your users An architecture that cloud data 2015 supports an unlimited number of warehousing A fresh approach concurrent users and applications to to the data gain access to the data warehouse warehouse without eroding performance. 2020 The future: Built-for-the- cloud data Pay only for what you use Scale storage warehouses separate from compute, up and down, transparently and automatically.

13 THE CHAMPION GUIDES

1980

The data warehouse 1985 is born

1990 The rise A time to of parallel processing move forward 1995 A path for modern

data-driven organizations Appliances catalyze the growth of 2000 data marts BECOME A CHAMPION TODAY WITH MODERN CLOUD DATA WAREHOUSING

Data warehousing has been re-thought and 2005 reborn in the cloud for the modern, data-driven Hadoop and organization. Find out how you can challenge the NoSQL are status quo and become an IT champion that creates introduced the future, giving users the benefits they dream of 2010 Demand with data warehousing built for the cloud. for power, flexibility, & elasticity

Take the next step toward writing the next chapter Introduction of cloud data of your organization’s data warehousing history. 2015 warehousing

A fresh approach to the data Visit www.snowflake.net warehouse

2020 The future: Built-for-the- cloud data warehouses

14