INFORMATION Management
INFORMATION MANAGEMENT
Engineering & Technology, September 2012, www.EandTmagazine.com

THE LARGING-UP OF BIG DATA

‘Big data’ is a buzz-term that is resonating big-time with IT solutions providers and end-user organisations. But are ‘big data’ applications really so different from the business intelligence and analytics tools that have been around for decades? Martin Courtney investigates.

THE TERM ‘BIG DATA’ has been getting much exposure in IT circles over the last year or two, on a scale that is bound to cause seasoned industry-watchers to sniff the air for the familiar aroma of industry hyperbole. There is the customary amount of hype, of course, but there is more to it than the covert repackaging and repurposing of existing products.

In one sense ‘big data’ is a classic misnomer. The implication is that the volume of electronic information being generated and stored is now so large that existing database systems are no longer able to handle it.

It is certainly true that the world is generating data on an unprecedented scale, and it is going to escalate as trends such as machine-to-machine applications roll out. However, it is not so much its size as the diversity of formats that data now comes in – particularly unstructured sources like text, email, instant messages, Web pages, audio files, phone records, videos – and what people want to do with it that presents the bigger problem.

“Most vendors are now realising that big data has actually very little to do with databases and more to do with information management,” according to Clive Longbottom, director of analyst firm Quocirca. “Eighty per cent of an organisation’s data is now electronic, yet 80 per cent of that is not held in a database, so cannot be dealt with just by throwing a big database at it.” It is a question of “how you pull data from a Microsoft Office or whatever into an environment where it can be dealt with”, Longbottom believes.

“Companies have always done big data – escalating amounts of information – but that is not really the definition of the term. It is more about the variety of the data and the velocity at which it comes at you,” explains Dr David Schrader, director of marketing and strategy at data warehousing software firm Teradata. “Traditional customers may have a lot of data in tabular format – customer credit ratings tables, for example – which they need to join together in a variety of ways. For some customers it’s megabytes, gigabytes, terabytes – the biggest with petabytes, like eBay, say.” However, with entities like the Web, and social media sites like LinkedIn, the kind of analytics on those data sets are semi-structured. Schrader says it is “hard to force them into a relational database. It is far easier if you have database systems with the required speed to be up and running already to handle non-relational database data, systems able to run queries in parallel”.

Compliance versus intelligence

Maintaining separate storage systems to handle all those different forms of data is generally inefficient, particularly if an individual or organisation wants to exploit all of the information it stores to use or for meaningful insight, and to do that fast enough to make the most of any business opportunity the exercise might subsequently provide. Most organisations keep data archived for compliance and regulatory purposes, at least on a temporary basis, before deleting. But others see the value in the information itself, and apply business intelligence and analytics tools to pull out statistics and patterns which they can turn to their advantage before discarding it.

Archiving data as insurance against potential e-discovery requests is relatively easy as the organisation does not need to know precisely what information is being kept, only that they can search it if necessary, while modest investment in the required capacity is easily offset against the cost of potential litigation.

Arvind Krishna is IBM’s general manager of information management, which, like Teradata, EMC, Oracle, and a host of other software application vendors, is making a big play for big data customers, albeit from a slightly different approach (see ‘Big Data Goes To Work’, p75). He recalls the case of a big utility industry customer in the US running a power plant offering nuclear and fossil fuel. “It had a bunch of systems from 20-30 years ago, and wanted to cut down storage and IT costs, but because of compliance and regulation it had to keep the old systems going to show the auditor what systems they were running to avoid accidents,” Krishna says. “Now it can use metadata to search them, build a new archive [to house them] and keep it in a place where they can easily query it, and shut down all the stuff sitting in the main database. It can be much more cost effective than having two systems where there is some accountability, and can pay for itself in six months.”

Quocirca’s Longbottom agrees: “If this [stored data] is going to be something about people’s mortgages, for example, we need to be able to prove how we put everything together to prove that opportunity, so when mis-selling cases hit the headlines it is maintaining that auditability as well.”

Onboard the ‘big data’ bandwagon

When applying business intelligence and analytics tools to large repositories of structured and unstructured data on a regular basis, there is a danger that companies will spend time and money on new systems that are able to sift through information on an industrial scale, only to find that the data contains little or no value to the business anyway. As such, there are certain industries that are far more likely to gain advantage from big data projects than others, with healthcare, retail, utilities, and transport sectors top of the list.

We are already seeing the healthcare sector benefitting, because it has so much information that is not in databases, or is spread across multiple databases. Longbottom argues that the retail sector “could do a lot with it because it has lots of stuff held in databases around loyalty cards, for example, and they often want to be pulling data in from social networks to get a better idea of what customers and prospects are thinking”.

‘Most of an organisation’s data cannot be dealt with just by throwing a great big database at it’
Clive Longbottom, Quocirca

Business intelligence (BI): Computer-based techniques used in identifying, extracting, and analysing business data, such as sales revenue by products and/or departments, or by associated costs and incomes. BI technologies provide historical, current and predictive views of business operations. Common functions of business intelligence technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining and predictive analytics. Until recently, BI applications have been seen mainly as the preserve of very large enterprises and organisations.

Big data: Describes data sets that have grown so large that they become awkward to work with using on-hand database management tools. Typical difficulties include capture, storage, search, sharing, analytics, and visualising. This trend continues because of the benefits of working with larger and larger data sets, allowing analysts to discern and validate trends, such as tracking (and preventing) the spread of diseases.

Data mining: Interdisciplinary field of computer science that describes the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems.

Data analytics (DA): The science of examining raw data with the purpose of drawing conclusions about that information. Data analytics is used in many industries to allow companies and organisations to make better business decisions and in the sciences to verify or disprove existing models or theories. Data analytics is distinguished from data mining by the scope, purpose, and focus of the analysis. Data miners sort through huge data sets using sophisticated software to identify undiscovered patterns and establish hidden relationships. Data analytics focuses on inference, the process of deriving a conclusion based solely on what is already known by the researcher.

SOURCES: WIKIPEDIA, TECHTARGET, E&T RESEARCH.

Table 1: Organisations are seeing data volumes increase, with unstructured data looking set to grow even faster than structured data in some cases, according to a survey by IT industry analyst Freeform Dynamics. [Chart not reproduced. Survey question: ‘What level of growth are you seeing in the following types of data within your organisation?’ Rows: structured data (e.g. tabular in RDBMSs); unstructured data (e.g. documents, messages, multi-media, etc.). Scale: 5 – Extremely high growth to 1 – No growth, plus Unsure, shown as a share of responses from 0% to 100%.]

Table 2: Lack of clear return on investment is one key reason why so few organisations are extracting value from information held outside systems designed for handling structured data. [Chart not reproduced. Survey question: ‘Considered overall, to what degree does your organisation exploit its information assets for analysis and decision making purposes?’ Rows: structured data; unstructured data. Scale: 5 – Fully to 1 – Poorly, plus Unsure, shown as a share of responses from 0% to 100%.]

WWW.FREEFORMDYNAMICS.COM

He adds: “The utility companies have masses of data that is not being mined correctly, and they are not pulling in external

“It is also about situational awareness in real-time – British Airways uses similar tools to replan operations in the event that a volcano blows and screws up [its schedules], with information on grounded planes, crew and passengers all at their

website,” he says, “but alongside traditional measures like sales or net promoter scores, you can now capture user tweets, which do not use tabular data, and get back an idea about who is happy with a new product, and who is not happy.