10.00 am Daniel Austin, Head, Research and Development, Smart Services CRC

Welcome and Introduction

10.30 am Gordon Bell, Researcher Emeritus, Microsoft Research

Gordon Bell is a Researcher Emeritus at Microsoft and a former Vice President of R&D at Digital, where he led the development of the first mini- and time-sharing computers. Digital introduced VAX Clusters as a scale-out architecture. As the first NSF Director for Computing (CISE), he led the creation of the NREN (Internet). Bell has worked on and written articles and books about computer architecture, high-tech start-up companies, and life logging. He is a member of the American Academy of Arts and Sciences, the National Academy of Engineering, the National Academy of Sciences, and the Academy of Technological Sciences and Engineering, and received the 1991 National Medal of Technology. He is a founding trustee of the Computer History Museum, Mountain View, CA.

http://research.microsoft.com/en-us/people/gbell/

Big Data: A Contact Sport?

Like many information technologies, Big Data is one of those phrases (aka buzzwords) that computer science eventually has to deal with and adopt, as systems and the resulting applications become just too “big” to ignore. In a September CACM note, Michael Stonebraker summarized BD as the 3Vs (volume, velocity, and variety) that drive interesting technologies that are ripe for computer science: Big Volume/Small Analytics; Big Volume/Big Analytics; Big Velocity (real time for trading, commerce, surveillance); and Big Variety (fusion of multiple databases).

Stories of the misuse of data using all sorts of analytics are almost enough to inhibit work on actual data and to support only efforts on the downside, e.g. how to assure anonymity for personal health data. The biggest concern for computer science should be “just any data”: this requires working collaboratively WITH fields that actually have data, such as the social and physical sciences, medical and health, and environmental applications.

This is an especially interesting period as another decade-old buzzword, the “Internet of Things”, comes into existence driven by the plethora of sensors and low cost, wireless, sensor networks.

11.30 am Emeritus Professor Margaret Jackson, RMIT University

Emeritus Professor Margaret Jackson, RMIT University, researches and publishes in the areas of protection of electronic information and privacy. She is currently a senior research fellow of the Smart Services Co-operative Research Centre, leading research in the area of online privacy and the use of social media by courts.

She is the author of three books: ‘Electronic Information and the Law’ (2011) (with Marita Shelly), ‘Practical Guide to the Protection of Confidential Business Information’ (2006) and ‘Hughes on Data Protection in Australia’ (2001), all published by Thomson Reuters. ‘Identity, Anonymity and Privacy’, co-authored with Dr Gordon Hughes, will be published by Thomson Reuters in late 2014.

De-identified Data, Identifiable Data and Data Protection Regulations

Data protection principles originated in the late 1970s, before the WWW and the internet. Their developers did not foresee the growth of massive global databases, nor the phenomenon of social media.

This presentation examines how current data protection principles apply to big data and discusses whether the data protection principles are still relevant today. Data protection laws do not generally apply to unidentifiable or anonymous personal data, and one of the concerns about big data is that unidentifiable data can, when mixed with other data, become personally identifiable again. How should the data protection principles be applied in such cases?

Also discussed is the possible impact of the proposed European Union ‘right to be erased’, which is likely to come into effect in early 2015. While it will have no significance in Australia, it may impose new obligations on data collectors in Europe and the US.
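The concern that unidentifiable data can become identifiable again when mixed with other data can be made concrete with a small sketch. The records, field names and the `link` helper below are all invented for illustration; the sketch assumes an exact match on three quasi-identifiers (postcode, birth year, sex) and is not drawn from the presentation itself.

```python
# Hypothetical data: every name, field and value below is invented for illustration.
deidentified_health = [
    {"postcode": "3000", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "3000", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "3053", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

public_register = [
    {"name": "A. Example", "postcode": "3000", "birth_year": 1975, "sex": "F"},
    {"name": "B. Sample", "postcode": "3000", "birth_year": 1982, "sex": "M"},
    {"name": "C. Placeholder", "postcode": "3053", "birth_year": 1990, "sex": "F"},
]

# Fields that are not names but can single a person out when combined.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Build the tuple of quasi-identifier values used to join the two data sets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, identified):
    """Return (name, diagnosis) pairs where a row matches exactly one named person."""
    names_by_key = {}
    for person in identified:
        names_by_key.setdefault(key(person), []).append(person["name"])
    matches = []
    for row in anonymous:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # a unique match re-identifies the row
            matches.append((names[0], row["diagnosis"]))
    return matches


reidentified = link(deidentified_health, public_register)
print(reidentified)
```

In this toy case two of the three "anonymous" rows are linked back to a name, even though neither data set on its own pairs a name with a diagnosis.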

12.00 pm Professor Timos Sellis, RMIT University

Timos Sellis is a Professor at RMIT University in the School of Computer Science and Information Technology, with expertise in Data Management.

His research interests include data streams, peer-to-peer database systems, personalization, the integration of Web and databases, and spatio-temporal database systems. He has published over 200 articles in refereed journals and international conferences in the above areas and has been an invited speaker at major international events. He has also participated in and co-ordinated several national and European research projects.

Prof. Sellis is a recipient of the prestigious Presidential Young Investigator (PYI) award, given by the President of the USA to the most talented new researchers (1990), and of the VLDB 1997 10 Year Paper Award (awarded to the paper published in the proceedings of the VLDB 1987 conference that had the biggest impact in the field of database systems in the decade 1987-97). He was the president of the National Council for Research and Technology of Greece (2001-2003), and in November 2009 he was awarded the status of IEEE Fellow for his contributions to database query optimization and spatial data management.

Big data infrastructures: Exploiting the power of big data

A revolution is going on these days in Computer Science around the Big Data theme. Many state that big data computing is perhaps the biggest innovation in computing in the last decade. Big data does not refer only to large volumes; the number of data sets, their diversity and their arrival rates are also challenges to data-driven science and engineering. Our ability to process data, to store it and, most of all, to understand it and drive decisions and discoveries is at the heart of research. The Australian Academy of Science, in its recently published report on "Future Science - Computer Science", sets the general theme "Meeting the scale challenge" and notes that "we need to develop new paradigms to address the needs of the data-rich society. In particular, computer scientists need to develop theory and algorithms that will lead to the production of new tools and techniques that are required to advance the state of the art in managing and making sense of data."

In our work we look at novel means of managing, analyzing, and extracting useful information from large, diverse, distributed and heterogeneous data sets in order to advance the development of new data analytic tools and algorithms, and facilitate scalable, accessible, and sustainable data infrastructures. In this talk we will emphasize our view on how to build such big data infrastructures and the interesting research problems that arise.

12.30 pm Dr Heather Horst, VC Senior Research Fellow, RMIT University

Dr Heather Horst is a VC Senior Research Fellow and Director of the Digital Ethnography Research Centre in the School of Media and Communication at RMIT University. Prior to joining RMIT, she held research positions in interdisciplinary research centres at the University of Southern California, University of California, Berkeley and University of California, Irvine, where she worked on large, collaborative research projects. She currently sits on the editorial boards of New Media & Society, Mobile Media & Communication, Information Technologies & International Development and the Emerald Studies in Media and Communication.

An anthropologist by training, Heather's research focuses upon understanding how digital media, technology and other forms of material culture mediate relationships, communication, learning, mobility and our sense of being human. Her books examining these themes include The Cell Phone: An Anthropology of Communication (Horst and Miller, 2006, Berg), Kids Living and Learning with New Media: Findings from the Digital Youth Project (Ito, Horst, et al. 2009, MIT Press), Hanging Out, Messing Around and Geeking Out: Kids Living and Learning with Digital Media (Ito, et al. 2010, MIT Press) and, most recently, Digital Anthropology (Horst and Miller, Eds., 2012, Berg). Her current research, supported by an ARC Discovery Grant, two ARC Linkage Grants and the Smart Services CRC, explores transformations in the telecommunications industry and the emergence of new mobile media practices such as mobile money and locative media across the Asia-Pacific.

Email: [email protected] Web: http://heatherhorst.com/

Ethnographic perspectives on big data

The emergence of 'big data' has led researchers within and beyond the academy to reconsider what, how and when to relate to the 'big data' phenomenon. Chris Anderson famously declared big data ‘the end of theory’. Corporations began imagining new forms of market research and understandings of users through the collection, collation and connection of metadata collected about users through their everyday ‘behaviors’ and practices. Governments focused upon the potential of big data to glean new information on issues ranging from healthcare to economic activity and security. Academics and others hurriedly developed new visualization software and other conceptual tools to keep pace with the speed and scale of the new data ecology. Yet the hype around big data has also garnered critics who question the impact of big data in the everyday lives of people across the globe who remain excluded, the ability of big data to capture everyone’s experience, and the uncritical positivism of big data proponents who posit big data as truth or reality, which discredits other forms of knowledge production.

This talk reflects upon the ways in which ethnographers across a range of disciplinary contexts are engaging with the idea and practice of ‘big data’. Rather than posit big data as antithetical to qualitative and interpretive approaches, I will compare three areas that shape the texture and patterns of knowledge production across the two domains. The first involves the scale of data collection and generation, what I’ll term the big data(set) vs. little data(set) debate. The second involves a distinction between conscious and unconscious data collection, and the final tension involves issues of time and temporality. Through a discussion of the approaches to knowing and ‘legitimate’ forms of data across ethnography and big data, the focus upon the textures of data may forge a path forward for more productive debate and engagement.

1.30 pm Professor Supriya Singh, Deputy Head, Research, GSBL, RMIT University

Prof. Supriya Singh is the Professor of Sociology of Communications at the Royal Melbourne Institute of Technology (RMIT) University. She is also the deputy head of Research, Graduate School of Business and Law, RMIT University. She is the Project Leader of Smart Services Cooperative Research Centre and Program Leader of Community , Global Cities Research Institute, RMIT University. Her research is focused on the sociology of money and banking, user-centred design of information and communication technologies, qualitative methodology and the domestic aspects of .

Big Data for Financial Services: The Consumer Perspective

The excitement around big data is coming mainly from large organisations. They see the possibility of increasing their offerings to customers and their profitability by up to 5 per cent. The picture is one of getting large amounts of transactional data and being able to analyse it in real time, leading to offers being made to the customer on the basis of increased knowledge of what the customer is doing, where he or she is, and the context of his or her history with the organisation. It is a bit like a bank manager playing golf with his important customers and working out deals. Except that this is going to happen to men and women, and it will happen to people with small or large savings and credit portfolios.

It is already being accepted that in this world of big data, where lots of things are uncertain, the financial organisation will know more about you and me irrespective of the fact that we have not given explicit permission. The questions that are being asked are: Will customers withdraw from the big financial organisations with the capacity to exploit big data and take their business to small boutique organisations without the time or the resources to go the big data route? Will customers want to be paid in some concrete way for the use of their big data? Will big data become individualised like other developments in the ICT field? Much of what is written about big data is in the realm of future implications and profitability. In studying the consumer perspective in the context of everyday life, we will ask: How important is big data in a consumer’s life in varied social and cultural contexts? Is it empowering the consumer or intruding into his or her private spaces? In what ways can consumers get value from big data?

2.00 pm Damian Paull, CEO, Australasian Retail Credit Association

Damian is responsible for ensuring that ARCA builds its capacity as a trusted advocate on behalf of its members and provides thought leadership that brings about positive change in the retail credit environment. Over the past 12 months, Damian has led ARCA to a position recognised by Government and the Attorney General’s Department as the leading industry association involved in the current credit reporting reforms. Damian has a diverse and varied background including education, law enforcement, risk and compliance, operations, sustainability and industry self-regulation.

Credit Reporting: will more information change consumer behaviour?

Credit markets have changed significantly over the past 15 years as automation and the desire by consumers to obtain credit instantly have created distance between the credit applicant and credit assessment. Credit markets operating with poor information ultimately impact consumers and the organisations that provide credit. (In a study of bankruptcy files it was identified that only eight per cent of consumers had fully disclosed debt relationships and outstanding balances when applying for credit.) The cost of this poor information, or “information asymmetry”, can result in credit rationing, hidden over-indebtedness and financial exclusion from responsible credit markets for consumers who are, but for a default, able to borrow.

The way that credit reporting information is reported in Australia is changing in an effort to balance the information used to assess credit worthiness. The new reforms make it possible for credit providers to collect, use and disclose more information, thus creating a better picture of a consumer's credit performance, and one that may be far more holistic than is currently available. If the reforms are to deliver the expected benefits, eligible organisations will need to contribute credit information and consumers will need to change their behaviour; primarily, consumers will need to make their credit repayments on time. In an economy where consumers have a basic understanding of a credit report, the next 6-12 months will see a rise in awareness and potentially a shift in how financial commitments are prioritised and managed.

2.30 pm Dr Lawrence Cavedon, Senior Lecturer, RMIT University

Lawrence Cavedon is a Senior Lecturer in the School of Computer Science and Information Technology at RMIT University; he is also a Senior Researcher at NICTA, Australia's ICT Research Centre of Excellence, where he is part of the Biomedical Informatics team, which works with Health and Life Sciences partners to develop new data- and text-analytic techniques for biomedical applications. Until July 2005, Lawrence worked at , where he led development of spoken dialogue systems for complex applications and environments, usually in partnership with industry and US-agency partners. Prior to his position at Stanford, Lawrence worked at VerticalNet Inc., a Silicon Valley e-commerce technology company, where he led development of the first commercial Semantic Web Services platform.

Roles for language technology and text mining for next-generation healthcare

New Big Data initiatives in Health and Bioinformatics, including the increasing ubiquity of EHRs and cheaper gene sequencing, are creating enormous volumes of data as well as opportunity for enormous impact in the Health industry. In particular, this increasing data offers the potential to develop data analytic technologies to predict risk of disease, to prevent conditions by detecting early indicators, to monitor symptoms, and to personalise treatments that maximise effectiveness in specific population groups.

While genomic and image data are widely considered to be sources of the “biggest” Health data from individuals, much Health data, particularly in EHRs, is still in unstructured or semi-structured form, as text narratives. Such data must typically be processed using Language Technologies (LT) to convert human-authored text into actionable data that can be processed by computational analytic techniques, e.g., data mining algorithms. LT can also be used to extract valuable information from a “stream” of clinical reports or other texts, enabling real-time automated monitoring. The extracted information is much more compact and problem-focused than the original text reports.
We describe research at NICTA – conducted in collaboration with clinical research partners at Alfred Health, Peter MacCallum Cancer Centre, Melbourne Health, and Barwon Health – on technology for mining text in EHRs, for developing analytic techniques to support monitoring, prediction, decision support, and biomedical discovery.
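As a rough illustration of converting a stream of text reports into compact, problem-focused records, the sketch below scans free text for a small watch list of terms and applies a very naive negation check. The watch list, the negation cues, the sample reports and the function names are all assumptions made for this example; it is a toy and does not represent the techniques developed in the research described above.

```python
import re

# Hypothetical watch list and negation cues; real clinical language technology is far richer.
WATCH_TERMS = ("fever", "chest pain", "shortness of breath")
NEGATION_CUE = re.compile(r"\b(no|denies|without)\b")


def extract_findings(report_id, text):
    """Turn one free-text report into compact (report_id, term, negated) records."""
    findings = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for term in WATCH_TERMS:
            position = lowered.find(term)
            if position == -1:
                continue
            # Naive negation check: a cue word earlier in the same sentence.
            negated = bool(NEGATION_CUE.search(lowered[:position]))
            findings.append((report_id, term, negated))
    return findings


def monitor(reports):
    """Process reports one at a time, as they would arrive in a stream."""
    for report_id, text in reports:
        yield from extract_findings(report_id, text)


# Invented sample reports.
reports = [
    ("r1", "Patient reports fever and shortness of breath. Denies chest pain."),
    ("r2", "No fever overnight. Comfortable at rest."),
]

all_findings = list(monitor(reports))
alerts = [finding for finding in all_findings if not finding[2]]
print(alerts)
```

Each report collapses to a handful of tuples, which is the sense in which extracted information is more compact than the source text; anything beyond keyword spotting (synonyms, scope of negation, temporality) is outside this sketch.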

3.00 pm Dr Marta Poblet Balcell, VC Research Fellow, RMIT University

Dr. Marta Poblet is an Associate Professor and VC Senior Researcher at RMIT University. She is one of the co-founders of the Institute of Law and Technology at the Autonomous University of and past researcher at ICREA (Catalonia). She holds a Juris from the Stanford University (2002) and a Masters degree in International Legal Studies (Stanford Law School, 2000). Her research interests are law and conflict resolution, ADR-ODR, mobile technologies for development, and crowdsourced crisis mapping, and she has published over 30 scientific articles on these topics in international journals and books. Marta is also a team coordinator of the Standby Volunteer Task Force, an international network of crisis mappers working in humanitarian response.

Big Data for disaster and emergency management: information challenges and ethical issues

Disaster management can be broadly understood as a set of public policies, protocols, social practices and individual and collective behaviors to deal with different types of natural events (e.g. earthquakes, bushfires, floods, droughts) or human-induced emergencies (e.g. oil spills, nuclear incidents, gas explosions). In the last few years, the explosion of user-generated content and social media has been changing the way organizations and authorities deal with emergencies and disasters. The potential of processing social media data in the phases of early warning and immediate response is huge. Big data can help to provide the big picture, while at the same time offering granular, real-time information at the local level. However, processing such an amount of information can largely exceed the capacities of most agencies and response organizations, especially in the immediate aftermath of a disaster.

To face these new informational challenges, organizations are leveraging the power of the crowds. The massive adoption of mobile technologies, the explosion of social media and user-generated contents enabled by the Web 2.0, and the popularization of digital maps have lowered the barriers for citizens and communities to get involved in emergency management. But, in addition to informational challenges, the use of big data from social media in disaster management also raises a number of legal and ethical issues: What are the potential legal and ethical issues relevant to technology providers, communities, volunteers, and citizens at large when engaging in crowdsourced disaster management? Which standards, best practices, and regulatory models are to be set? This contribution will provide a brief overview of this emerging domain.

3.45 pm David Schaefer, School of Global, Urban and Social Studies, RMIT University

David Schaefer is a sessional tutor in the School of Global, Urban and Social Studies at RMIT University, and a non-residential research fellow at the Centre for Air Power Studies in New Delhi. He is an MA graduate from the Department of War Studies in King's College London, and the 2013 winner of the Australian Defence Business Review Young Writer's Prize for his essay on intelligence reform in Australia, which will be published in the journal Security Challenges. Some of the ideas being presented today are drawn from this research.

Big Data and Strategic Assessment: the case of Intelligence and National Security

The growing volume of data is a defining challenge for the modern intelligence profession. With the rapid spread of communications technology, and such a large amount of information transferred via social media, the task is increasingly to extract valuable signals relevant for national security from all the noise. To date, this has been centred on “meta-data” led counter-terrorism. Surveillance efforts have been developed to sift through electronic information on a massive scale, raising legal questions about privacy. As yet, however, little work has been done to tease out the implications for intelligence assessment, which is the task of surveying strategic threats to national security.

Big Data will have to be carefully applied in the intelligence context. That being said, new ideas about “Long Data” offer grounds for optimism. In particular, intelligence assessment can benefit from a more sustained, quantitative analysis of socio-economic development in other countries where political instability might be of concern to Australia. This would help to anticipate dynamics, such as coups or the outbreak of intra-state violence, which might be relevant for national security. A case in point for Australia is the intervention in the Solomon Islands dispute in 2003, which the Flood Inquiry into intelligence cited as an instance when more thorough assessment was needed in advance of the situation. In the future, Big Data might fill this gap.

The paucity of historical data will be a handicap, as quantitative methods are only beginning to be applied to socio-economic problems. But it is something that should receive more attention from government and, as time goes by, be incorporated into the practice of intelligence assessment.

4.15 pm Dr Xiuzhen Zhang, RMIT University, and David Savage, RMIT University

Xiuzhen (Jenny) Zhang is a senior lecturer in the School of Computer Science and IT at RMIT. Her research interests include data mining and data analytics. She has published extensively, and has been a reviewer for several journals and served on the program committees of several international conferences in these areas. She is currently working on a research project on big network analytics funded by the Australian Research Council.

David Savage is a PhD candidate at RMIT. His work focuses on the detection of criminal behaviour in social networks. He is currently working on anomaly detection techniques for identifying evidence of criminal behaviour from the pattern of interactions within the network. David has recently made the transition to Computer Science from Ecology, where he studied invasion ecology using simulation models of invasive pests.

Anomaly detection in big networks

Large volumes of data have been accumulated, and often they are inter-connected. Many big networks have been formed from connected data, in various domains. Examples include the Web, financial transaction networks, biological protein-protein interaction networks, and online social networks. Anomalous behaviours on networks can signify irregularities like web spam, financial transaction fraud, unusual motifs in protein-protein interaction networks, intrusions in computer networks, or fraud in online social networks.

Anomalies have been generally defined as “an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism” (Hawkins, 1980). Depending on the structure and information available on networks, different features/statistics derived from networks have been used for anomaly detection.

Detecting anomalies on big networks is vital for many applications but is also challenging. Networks can be static or dynamic, where links in networks change with time. From another perspective, networks can be unlabelled or labelled, where nodes and links are labelled with types. Different techniques applied to the same network can produce completely different results. Due to the lack of information on the ground truth of anomalies in big networks, it is difficult, if not impossible, to verify or generalise about the effectiveness and applicability of different techniques.

We explore algorithms for anomaly detection in big networks and analyse their effectiveness and efficiency on different types of networks. We examine different approaches to anomaly detection, with a focus on their application to fraud detection on social networks. In addition to real-world big networks, we also build an artificial network dataset for benchmarking anomaly detection algorithms. Through our experiments we demonstrate the strengths and limitations of anomaly detection algorithms, and identify possible directions for improving these algorithms for applications in fraud detection.
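To make the idea of a network-derived statistic concrete, here is a minimal sketch that uses node degree as the feature and flags nodes whose degree deviates strongly from the rest, in the spirit of the Hawkins definition quoted above. The toy graph, the `degree_anomalies` function and the z-score threshold of 3 are assumptions chosen for illustration; this is one simple baseline on a static, unlabelled network, not one of the algorithms evaluated in the talk.

```python
from collections import Counter
from statistics import mean, pstdev


def degree_anomalies(edges, threshold=3.0):
    """Flag nodes whose degree z-score exceeds the threshold (a deliberately simple baseline)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    values = list(degree.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the degrees
    if sigma == 0:
        return []  # every node has the same degree, so nothing stands out
    return sorted(
        node for node, d in degree.items() if abs(d - mu) / sigma > threshold
    )


# Artificial network: a ring of ten ordinary nodes, plus one "hub" linked to all of them.
ring_nodes = [f"n{i}" for i in range(10)]
ring = [(ring_nodes[i], ring_nodes[(i + 1) % 10]) for i in range(10)]
hub_links = [("hub", node) for node in ring_nodes]

flagged = degree_anomalies(ring + hub_links)
print(flagged)
```

Here every ring node ends up with degree 3 and the hub with degree 10, so only the hub crosses the threshold. A different feature or threshold could flag different nodes on the same graph, which is the sensitivity to technique that the abstract points out.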

4.45 pm Jonathan O’Donnell, Senior Advisor, Research Grant Development, DSC, RMIT University

When I began work at RMIT, my boss pointed to a cable disappearing into the wall. "That's the Internet," she said. "You're the youngest. You like computers. Learn all about it, and tell us what it can do."

That was in 1990, just before the Web was invented. For me, exploring the Internet was like living in the science fiction books that I had loved as a kid.

So I've been learning about it ever since and helping people to work out what it can do.

How big is your data?

The problem with terms like 'big data' is that they get bandied about without anyone really agreeing what the term means. Are all books in a library 'big data'? What about all the entries for all the authors that have ever written anything in Australia? Is that big data? Is that bigger than all the profiles that are on Facebook or Weibo? What about all the people who are tagged on all the photos on Facebook? What about all the 'photos', all the medical scans made by a hospital? Or all the scans of the heavens made by NASA? Photos of the universe: surely that is 'big data'?

Maybe it is, but does it matter? People care about personal data, sensitive data, financial data and medical data a lot more than they do about astronomical data. Do we need a tighter definition of 'big data' for the data that people care about? Or should it be defined by what people do with the data? If you are using a medical scan to see if I have a disease, then I'm probably OK with that. But I don't want you to use that data to stop me from getting a job, particularly if it is just based on the probability that I might get a disease. We can use large sets of data to accurately predict people's name, gender, sexual preference, country of residence, listening, reading and purchasing habits, even the breed of their dog. Do we need to put a tighter definition around what big data is, depending on how it is used?
It turns out we need to know all these things. We need to understand how big 'big data' really is. We need to understand what we can do with it. And we really need to understand how people feel about that.

5.15 pm Professor Clive Morley, Deputy Head, Learning and Teaching, Graduate School of Business and Law

Research has been mostly in applied quantitative data analysis. Contributions of a methodological and theoretical nature have also been made, primarily in the area of tourism demand analysis. Current research interests are in the areas of applied data analysis, tourism demand modelling, forecasting and strategic analysis techniques, and epistio-phenomenological media paradigms.

Statistical issues in analysing big data

Statistical inference testing and estimation techniques were devised for samples which were implicitly not as large as the Big Data data sets. These are so large that very small differences can be calculated to be “significant” using conventional statistical tools. Whether such significant differences are meaningful or useful is another question. The multiple comparisons issues with repeated testing in a data mining approach are acknowledged, but essentially fudged in the literature devoted to this problem. The data used in Big Data analyses are not really a sample in many cases, and could more properly be considered more like a population census. Inference is therefore not relevant. The tools of data description, structure and exploration come to the fore.
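The point that very small differences become “significant” in very large samples can be shown with a short calculation. The sketch below assumes a two-group comparison of means with a common, known standard deviation and a true difference of one hundredth of a standard deviation; the numbers and function names are illustrative assumptions, not figures from the talk.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_for_mean_difference(diff, sd, n_per_group):
    """z statistic for a difference in means between two equal-sized groups with common sd."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return diff / standard_error


# Assumed effect: a difference of 0.01 standard deviations, negligible in practical terms.
DIFF, SD = 0.01, 1.0

results = {}
for n in (1_000, 1_000_000):
    z = z_for_mean_difference(DIFF, SD, n)
    results[n] = (z, two_sided_p(z))
    print(f"n per group = {n:>9,}: z = {z:.2f}, p = {results[n][1]:.3g}")
```

With 1,000 respondents per group the difference is nowhere near significant (z is about 0.22), while with 1,000,000 per group the same difference gives z of about 7.07 and a vanishingly small p-value. The effect size has not changed at all, which is why significance on its own says little about whether the difference is meaningful or useful.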

Some of the concerns with issues such as data mining unguided by theory are illustrated with a simple example. Taking a completely a-theoretical approach, regress Gender on a lot of other, attitudinal, variables in a data set. The model derived predicts 100% of respondents’ gender correctly. The problems with this are two-fold. Firstly, it is, of course, not just a-theoretic but complete nonsense.

The second concern is that the perfect fit of the model means that the fitted combination of the “explanatory” variables can be a perfect proxy for Gender. So, for example, a marketing segmentation or hiring decision derived from these variables – not including gender – can be essentially discriminating by gender, even though gender is not included in the decision basis, and even though this may be illegal in certain circumstances. The legal defence might argue that gender was not used as a discriminating variable, but the reality is that it essentially is. In this example that is clear, in other cases it might not be. In a large data exercise, it may well not be realised or known. Illegal outcomes can be achieved thus in sophisticated, non-obvious, perhaps unwitting, ways.
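One way to check for the proxy effect described above is to compare decision outcomes by gender even though the decision rule never reads the gender field. The sketch below uses eight invented respondents, three hypothetical attitudinal questions (`q1`, `q2`, `q3`) and an assumed fitted rule; none of these come from the talk, and the rule merely stands in for whatever combination a regression might produce.

```python
# Hypothetical survey rows: q1..q3 are invented attitudinal answers (1 = agree, 0 = disagree).
respondents = [
    {"q1": 1, "q2": 1, "q3": 0, "gender": "M"},
    {"q1": 1, "q2": 0, "q3": 0, "gender": "M"},
    {"q1": 0, "q2": 1, "q3": 0, "gender": "M"},
    {"q1": 1, "q2": 1, "q3": 1, "gender": "M"},
    {"q1": 0, "q2": 0, "q3": 0, "gender": "F"},
    {"q1": 0, "q2": 0, "q3": 1, "gender": "F"},
    {"q1": 1, "q2": 0, "q3": 1, "gender": "F"},
    {"q1": 0, "q2": 1, "q3": 1, "gender": "F"},
]


def shortlist(person):
    """Assumed decision rule: uses only attitudinal answers and never reads 'gender'."""
    return person["q1"] + person["q2"] - person["q3"] >= 1


def selection_rates(rows, decide, group_field):
    """Share of each group that the decision rule selects."""
    selected, totals = {}, {}
    for row in rows:
        group = row[group_field]
        totals[group] = totals.get(group, 0) + 1
        selected[group] = selected.get(group, 0) + (1 if decide(row) else 0)
    return {group: selected[group] / totals[group] for group in totals}


rates = selection_rates(respondents, shortlist, "gender")
print(rates)
```

No single question separates the two groups in this toy data, yet the combined score selects every man and no woman. Auditing outcomes by group in this way is one means of noticing such a proxy in a large data exercise where it might otherwise go unrealised.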