The Bloor Group

THE BIG DATA INFORMATION ARCHITECTURE
An Analysis of the Consequences of the Big Data Trend
Robin Bloor, Ph.D. & Rebecca Jozwiak
RESEARCH REPORT
"We are drowning in information, and starving for knowledge." ~ John Naisbitt
What's With All This "Big Data"?

The Babylonians who walked the earth in 3800 BC – nearly six thousand years ago – took a regular census. They didn't just count people; they also counted livestock and volumes of commodities like wool. Clay tablets were their means of recording data, and their CPU was an abacus. No doubt at that time, a census was big data indeed.

"Big Data" is why computers exist. Whether we consider the U.S. census of 1890, which was processed on punched cards by Herman Hollerith's famous tabulating machine, or the code-breaking computers of World War II, which leveraged parallel computation – "Big Processing" – computers have continually evolved to better manage data.

This frames the two main dimensions of large computer workloads: either they involve sifting through a great deal of data, or they involve doing a large amount of processing. In reality, Big Data is a poor description of this computing duality, but it is the one that has captured the headlines, so it is the one we have to use.

The IT industry generates and harvests more data every year. It's been that way from the beginning. Roughly speaking, data grows at about 55% per year. If you do the math, this means that data volumes grow by about 10x every 6 years or so. This increase is suspiciously in line with Moore's Law, which has delivered 10x in computer power every six years since Gordon Moore made his wonderful and surprising observation.

On one hand, the capacity of the technology increases; on the other, it gets used. We might thus conclude that what is now happening is just "same old, same old," but in fact this is not true at all.

To see why, we need to take a broad look at what has happened in computing in the past, and what is happening now.
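The growth arithmetic above is easy to verify. A minimal sketch, using the report's own rough 55% figure (at that rate, "about 10x every 6 years" is in fact slightly conservative):

```python
import math

# A quick check on the growth arithmetic. The 55% annual growth rate
# is the report's own rough estimate, not a measured figure.
rate = 0.55

# Cumulative growth factor after six years: (1 + rate) ** 6
six_year_factor = (1 + rate) ** 6
print(f"Growth after 6 years: {six_year_factor:.1f}x")   # ~13.9x

# Years to reach exactly 10x: solve (1 + rate) ** n = 10 for n
years_to_10x = math.log(10) / math.log(1 + rate)
print(f"Years to reach 10x: {years_to_10x:.2f}")         # ~5.25 years
```

So a steady 55% annual rate actually delivers 10x in a little over five years; "every 6 years or so" is the report's round figure.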
The Technology Curve and Its Demise

The evidence suggests that since about 1960 the IT market has been expanding at a dramatic but nevertheless predictable rate. The expansion has been dramatic because it has been exponential rather than linear. As human beings we are comfortable with the idea of linear growth; we can represent it as an even upward slope that yields a predictable improvement every year. We are less comfortable with predictable exponential growth. Even though the improvement is regular, we tend to underestimate its impact.

It was in an effort to capture this exponential technology improvement that we came up with a graphic representation of it, shown in Graph 1 on the following page. It is a fairly complex graph, which illustrates in a general way the response time of computer applications plotted against the IT workload they present.

The vertical axis is logarithmic, meaning that each unit (marked in black) represents 10 times the previous unit: i.e., 0.01 seconds, 0.1 seconds, 1 second, 10 seconds, 100 seconds, and so on. We could have extended the graph (and hence the area labeled real time) below the 0.01-second line, but we have chosen to truncate it there.

The horizontal axis is not logarithmic. No specific units are shown for workload because there is no obvious way to measure the workload of an application. Sometimes applications take a long time because the CPU is busy, sometimes because a great deal of data is being accessed, and sometimes because network latency is an added factor. The use of resources varies.
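The earlier point about underestimating steady exponential growth can be made concrete with a quick sketch. The 55% rate is borrowed from the data-growth figure earlier in the report; the linear increment of one unit per year is an arbitrary baseline for comparison:

```python
# Contrast linear with exponential improvement to show why steady
# exponential growth is easy to underestimate. Both start at 1.0.
linear, exponential = 1.0, 1.0
for year in range(12):
    linear += 1.0          # grows by a fixed unit each year
    exponential *= 1.55    # grows by a fixed *percentage* each year

print(f"After 12 years -- linear: {linear:.0f}x, "
      f"exponential: {exponential:.0f}x")
# The two look broadly similar for the first few years; by year 12
# the exponential curve is roughly 15 times further along.
```

This is why a logarithmic axis is the natural choice for the vertical scale of Graph 1: on a log scale, steady exponential improvement appears as a straight line.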