Technologies for Data
Total Page:16
File Type:pdf, Size:1020Kb
This report can be found at www.acola.org.au © Australian Council of Learned Academies Technologies for Data Kirsty Douglas Working paper April 2015 This working paper contributes to Securing Australia’s Future (SAF) Project 05. 1 This report can be found at www.acola.org.au © Australian Council of Learned Academies Table of contents Table of contents ............................................................................................................................ 2 Technologies for data ............................................................................................................................. 3 Overview ......................................................................................................................................... 3 Information in a digital age ................................................................................................................. 6 The ages of information ................................................................................................................ 12 The rise of a fourth paradigm? ..................................................................................................... 14 The age of interoperability ........................................................................................................... 17 Technologies for data in an age of interoperability .......................................................................... 18 ‘Big’ data capture, collection, and analysis ................................................................................... 22 Open data...................................................................................................................................... 25 Technologies for research data ..................................................................................................... 28 Policy issues and implications ....................................................................................................... 30 Summary ........................................................................................................................................... 33 References ........................................................................................................................................ 34 2 This report can be found at www.acola.org.au © Australian Council of Learned Academies Technologies for data Governments, intelligence agencies, businesses, time, some stochastic – with different and citizens must adapt rapidly to an environment properties of stability and longevity, and where data is both a commodity and a currency. different importance. The data accumulated from centuries of If the current ‘flood’ of data can be observation, and rapid improvements in digital distinguished from previous so-called interoperability in the last 50 years, continue to information revolutions, perhaps it is better transform the ways in which people understand characterised as an age of interoperability and describe the world, and the uses to which rather than a new information age. information is put. Countries and organisations The status of data is ambivalent. One which develop and foster technologies to ensure researcher’s data is another’s metadata; one that data can efficiently, reliably and economically person’s technology is another’s be stored, curated, managed, retrieved, accessed, infrastructure; the browsing and purchase shared, and protected have an opportunity to histories of an online community are a capitalise on this global data ecosystem. commodity and a resource to other users; and the size of a ‘big’ dataset in one field of Overview research, such as sociology, can be trifling in This paper addresses some of the another, such as astronomy. opportunities of three interconnected types of In the face of this pervasive interoperability technology for data: data analytics (‘big data’); and classificatory instability, regulators, users technologies for open data; and infrastructure and curators of data collections, and designers for collaborative research data. and administrators of data infrastructure may The notion of an information age or a data find it difficult to balance preservation, flood is not new. storage, and use with the management of However, the rise of digital information and digital traces, personal information harvested communication technologies (ICT) in the late from online behaviour and sensors. 20th century has been described as If ICT is a general purpose technology, then engendering an intellectual, social and data are its general-purpose constituents. The economic transformation ‘akin to the impact of technologies for data stems from emergence of literacy’. the interactions of data with other technology With the advent of digital programmable and with other data, ensuring a multiplicative computers in the mid-20th century, data effect: as 'data intermediaries', technologies outputs from simulations of complex for data can themselves be the mechanisms phenomena have increased enormously. by which things interoperate. More recently, improvements in networking Merely having more data does not translate and interoperability (broadly, the capacity for automatically to having better information. systems, organisations, processes and ‘Big data’ analysis is subject to the same limits products to work together) have provided as any statistical analysis. unprecedented access to information about The various uses of data creates tensions that the behaviour of people interacting with feed into a democratic deficit, whereby data online platforms. that is collected and generated for some The emergence of digital ICT as a general purposes is being co-opted for others, purpose technology (a GPT is a generic including by elected representatives. technology, widely used across the economy, For example, data producers (i.e. citizens) may with many different uses and spillover effects, have trouble accessing data about themselves, after (Lipsey et al., 2005) over the last 50 years while non-state entities use this data for has been accompanied by a huge increase in commercial or criminal purposes, and state the amount of information generated, agencies may themselves be complicit in the collected, stored, and shared. corruption of infrastructure. In an environment of networked sensors and Technologies to store, manage, access, ubiquitous internet access, virtually all human analyse, and share data can optimise the use activity is now a source of data. These data of large and complex datasets and ensure have different characteristics – some their preservation, providing Australian streaming, some stable, some updated over researchers and industries with a competitive 3 This report can be found at www.acola.org.au © Australian Council of Learned Academies advantage. But to realise these opportunities, consumers and producers of data need to be policymakers and users need to understand able to trust that their information is in safe the limits of data and data technologies, and hands. 4 This report can be found at www.acola.org.au © Australian Council of Learned Academies Keywords access, analyse, big data, capture, collect, data, disseminate, generate, ICT, information, information age, interoperability, infrastructure, metadata, open data, research data, re-use, share, standards, store, technologies for data, use ______________________________ This paper discusses the history of information and technologies for information, challenging the idea that the late 20th and early 21st centuries are distinguished from other periods for being an information or data age; the notion of a digital revolution and the development of a digital economy and its increasing entanglement with the rest of the economy (see also (Douglas, 2015); types of data problems including ‘big’ data, open data, research data collections, and the interlinked problems of privacy, security and surveillance; and the technologies that are needed to store, manage, access, analyse, share and regulate data to optimise the use of these collections for Australia. The paper does not dwell on the distinctions between data and information. For these purposes, data are items of ‘information considered collectively’, typically ‘used for reference, analysis, or calculation’, or ‘something given or granted; something known or assumed as fact, and made the basis of reasoning; an assumption or premise from which inferences are drawn’.1 So data and information describe or represent the world. Since at least the 1960s, data as raw numbers and facts have sometimes been distinguished from information as processed data, particularly in the domains of knowledge management and informatics, but this distinction is not particularly useful here. Data need infrastructure, or technology, for their capture, generation, collection, manipulation, analysis, use, dissemination, and storage. This data infrastructure may include data itself. Metadata as a type of data that characterises other data is a kind of infrastructure, for example (Research Data Infrastructure Committee, 2014). Legal and regulatory frameworks, licenses, and industry standards at all points of the data lifecycle and across sectors are also technologies and infrastructure for data (Korte, 2014) (see Table 2). In 2014 the Australian Government Minister for Communications, the Hon. Malcolm Turnbull MP, observed that ‘the cost of storing, transmitting and processing or analysing data is cheaper today than it has