School of Journalism and Mass Communication Faculty of Economic and Political Sciences

Big Data, Impact on Society

BY Aikaterini Dardoumpa

A thesis submitted in partial fulfilment of the requirements for the degree of

MASTER OF DIGITAL MEDIA, COMMUNICATION AND JOURNALISM
Specialization: Digital Media, Culture and Communication

Supervisor: Assistant Prof. Dimitra Dimitrakopoulou
May 2018

CONTENTS

ABSTRACT
INTRODUCTION
METHODOLOGY
CHAPTER ONE: DEFINING BIG DATA
1.1 Brief History: From Data to Big Data
1.1.1 The Datasets of the past
1.1.2 The Information Age
1.2 How is Big Data defined?
CHAPTER TWO: DATA REVOLUTION
2.1 Data Revolution as a social phenomenon
2.1.1 The Data Revolution Era
2.1.2 Data Revolution and Social Sciences
2.2 Social (Big) Data
2.3 Big Data gets political
2.3.1 Social Movements and disasters’ loudspeaker
2.4 Big Data Utility
2.5 Predictions and insights
CHAPTER THREE: CASE STUDY: SMART TRIKALA
3.1 Data and the City
3.2 The Case of Trikala
3.2.1 About Trikala
3.2.2 The vision
3.3 The current situation
CHAPTER FOUR: DISTRESS ABOUT BIG DATA
4.1 Concerns about Big Data Social Research
4.2 Privacy
4.2.1 Privacy Concerns
4.3 Information Privacy
4.4 Profiling
4.5 Europe for Data Privacy
CHAPTER FIVE: CONCLUSIONS - LIMITATIONS - FUTURE RESEARCH
REFERENCES AND BIBLIOGRAPHY

CONTENTS OF FIGURES

Chapter 1
Figure 1.1 What is a zettabyte?
Figure 1.2 The Three Big Data Vs
Figure 1.3 The Four Big Data Vs
Chapter 2
Figure 2.1 Data Shared in per minute
Figure 2.2 Port-au-Prince Crisis Map
Chapter 3
Figure 3.1 Urban Big Data
Figure 3.2 Screenshot of the website e-politis
Figure 3.3 ATM-style certificate spot
Figure 3.4 Check APP
Figure 3.5 SmartGuru Application
Figure 3.6 Street Lights Visualization
Figure 3.7 Smart Trikala Control Room
Chapter 4
Figure 4.1 How decides what ads to show
Figure 4.2 Sex of the respondents
Figure 4.3 Age of the respondents
Figure 4.4 Do you care if your personal data is being used?
Figure 4.5 Do you change your privacy settings?
Figure 3.2 Composition of the European Data Protection Board


Abstract

Data have become a torrent flowing into almost every aspect of our everyday life. Companies, among others, churn out a burgeoning volume of transactional data, capturing trillions of bytes of information about their customers, suppliers, operations and policies. Big Data is now everywhere: in every sector, in every economy, in every organization and for every user of digital technology. There are many ways in which Big Data can be used to analyze and predict various behaviors and situations. Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools (Snijders et al., 2012) or traditional data processing applications. For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. Big Data is not just about the size of data; it also involves data variety, data velocity and more. More and more organizations are adopting Big Data analytics, and this is not incidental. Big Data is silently creating a major shift in society that can only be seen if we look from afar. What is the role of a social researcher in the Information Age? What kind of data do we share? What is the impact on our everyday lives? How do our cities change? These questions arise from all the new technologies we are facing and have to cope with. This dissertation tries to answer them from a Big Data perspective. With the explosion of sensors, smart devices and social networking, data has become complex, because it includes not only structured traditional relational data but also semi-structured and unstructured data. Early adopters of Big Data technology such as Facebook and LinkedIn are good examples of companies that deploy Big Data analytics. The same analytics are also adopted by cities such as Chicago, Barcelona and Seoul.
A wide variety of techniques and technologies has been developed and adapted to aggregate, manipulate, analyze, and visualize Big Data. These techniques and technologies draw on several fields, such as Artificial Intelligence (AI) and the Internet of Things (IoT), with the ambition of solving contemporary problems. This paper is not about the technicalities surrounding Big Data but mostly about the social impact of this new research tool, as well as the opportunities arising within a theoretical framework.

Key words: Big Data, Society, Smart cities, Data privacy


Introduction

In the near future every object on the planet will be producing data, including our homes, our cars, and even our bodies and our cities. Almost everything we do today leaves a trail of digital exhaust, a continual stream of texts (blogs), location data (GPS) and other information that will survive long after all of us have departed. We are now exposed to as much information in a single day as our 15th century ancestors were exposed to in their entire lifetime. Big Data is usually the term used to describe a large and complex collection of data that is difficult to process using available database management tools or traditional data processing applications (Pesenson, Pesenson, & McCollum, 2010). The challenges include capturing, maintaining, storing, searching, distributing, transferring, analyzing, and displaying the data. These days, Big Data is not only about social networking and machine-generated web logs; it is about the optimisation of our lives. Organizations and enterprises will discover answers to questions that they could never afford to ask before, and Big Data will help them recognise questions that they never knew how to pose (Needham, 2013). Nonetheless, we need to be very careful, because in this vast ocean of data there is a frighteningly accurate image of us: where we live, where we travel, what we purchase and what we talk about (Ellison, 2016). It is all being documented and stored forever. This is the story of a phenomenal revolution, the Data Revolution, that is sweeping almost invisibly through our lives, and of how planet Earth is beginning to build up a nervous system with each of us acting as a human sensor¹. Everything we create these days, whether phones, computers, cars or refrigerators, gives off data. Information is being extracted from toll booths, parking spaces, internet searches, Facebook posts, phones, tablets, photos and videos.
Every single thing that we do leaves a digital mark. More data has been processed in the last two years than in the previous 3,000 years (Ebbes and Stourm, 2017). The more information we get, the greater the problems we see will be. Since 2012 the size limit of data sets that are feasible to process over a reasonable period

1 PBS documentary “The Human Face of Big Data”. Link: https://www..com/watch?v=m9D-v6r3NJQ&t=1913s


of time has been measured in exabytes² (one quintillion, or 10^18, bytes)³. Scientists regularly run up against limitations caused by large data sets in many research fields, including meteorology, genomics, complex physics simulations, and biological and environmental research. Now we are starting to face the same problems in social science research, because more and more social data are becoming available. Every powerful instrument has a dark side (Walker, 2016). Anything that aims to alter the world must, by definition, be able to change it for the worse as much as for the better. Everything exceptional must have its negative counterpart. The most common problems are the possible invasion of privacy and excessive surveillance. When the discussion turns to Big Data, the world might not be ready to accept it, because everybody talks about it but very few really understand it. Data can be used in any number of ways that we are not always aware of. The less we know about how that data is used, the less power we will have in the coming society. Getting to know the best ways to use Big Data could have an important impact on future societies (Anderson and Rainie, 2012). What is considered "Big Data" varies depending on the goals of the organization that manages the data and on the capabilities of the applications traditionally used to process and analyze data in each domain. For some organizations, encountering hundreds of gigabytes of data for the first time may create a need to revise data management methods and turn to more digitized solutions. For others, tens of thousands of terabytes will be needed before the data size grows big enough to be of interest, meaning that different organizations consider different data volumes to be Big Data. One organization, or “organism”, that is going to use Big Data is the Smart City. A smart city aims to make its citizens’ lives easier and the environment more sustainable.
To date, cities have not used Big Data as much as they could. The case study of Trikala presented later in this thesis describes the current infrastructure and proposes potential uses of Big Data within it. According to Bernard Marr (2016), what we call Big Data today will simply become the new normal in a few years’ time, when all businesses and government organizations use large volumes of data to improve what they do and how they do it.

2 Retrieved from: http://highscalability.com/blog/2012/9/11/how-big-is-a-petabyte-exabyte-zettabyte-or-a-yottabyte.html
3 Retrieved from: https://www.collinsdictionary.com/dictionary/english/exabyte


Methodology

This dissertation investigates the ways in which Big Data is affecting society, and to what depth. It concentrates on Big Data because the term has been a buzzword for the last decade (Puyvelde, 2017). The thesis is based mostly on a literature review, because the literature is of considerable size and new material appears on the web every day. To verify the bibliographic findings, the qualitative method of a case study was adopted. Various sources were used for the literature review, such as newspaper articles, books, video documentaries and websites. Most of the sources were retrieved from Google Scholar and Elsevier. Some of the keywords used were: Big Data, Analytics, Society, New technologies, Smart cities and Data Revolution. The large amount of information about Big Data led to the composition of four chapters: on the history, the Data Revolution, privacy concerns and, finally, a case study of the Smart City of Trikala. The first chapter covers the definitions and characteristics of Big Data. An effort was made to use simple language and to explore as many of the definitions found in the bibliographic research as possible. The second chapter is about the ubiquitous presence of Big Data, the Data Revolution and how societies experience it. Real-life examples of Big Data are given in order to validate the theory. The third chapter is about “the dark side” (Cukier, 2014; Hayers, 2015). It was considered appropriate to address the matter of data privacy extensively, due to the increasing interest in the matter⁴ (Long, 2018). A sample of 86 people was asked whether they take the time to review their privacy settings and whether they know how much of their personal information they are sharing online. Next comes a short presentation of the GDPR, because it is a major change in how our data is used in Europe. Last, the case study examines the close relationship between Big Data and Smart Cities.
Various surveys conclude that smart cities are the way we are going to survive the 21st century (Graham, 2014; JUPITER and McKinsey, 2017). The connection is not always

4 EMPHASIS ON PRIVACY AND DATA PROTECTION AT DATAWORKS SUMMIT, Retrieved from: https://www.protegrity.com/emphasis-on-privacy-and-data-protection-at-dataworks-summit/


obvious, because we usually pay attention to what is happening and not to how it is happening within a smart city framework. Trikala, the main focus of the case study in this dissertation, is claimed to be the first Digital City of Greece, which is the reason it was chosen. The municipality of Trikala has applied a variety of new technologies for the benefit of the citizens on the one hand and of the municipal authorities on the other. Every service, platform and application is examined separately and compared with a corresponding application in other cities, analysing the utility of Big Data for each of them. The point is to underline that Big Data is an important factor in the success of Smart Cities and, by extension, of new societal structures. This means that the chapter does not present how the City of Trikala is operating but what could be achieved if Big Data analytics were fully functioning in its infrastructure: a prospective approach to their combined potential. An effort was made to present different points of view on the matter and to create a full picture of the current situation.


Chapter 1

Defining Big Data

1.1 Brief history: From Data to Big Data

Data has been an essential part of human evolution for thousands of years. We, as a species, are pattern solvers by default and use data as a tool. In order to understand why Big Data is important, a short presentation of earlier data collections and their significance is given below.

1.1.1 The datasets of the past

Humanity gathers information about its surroundings in order to improve its life. The earliest humans passed on information through verbal communication. As our brains and human communities evolved, humans began to keep track of the information they acquired. The first examples of this data preservation date from over 35,000 years ago, in cave paintings. This was the hunter-gatherer age, so it makes sense that most of the paintings were of animals: a way to pass down knowledge of the types of animals to hunt and the weapons used. Compared with a long game of broken telephone, recording was a much more efficient way to pass down information, and it allowed us to do three important things that had not been possible before: first, the rapid expansion of knowledge; second, the preservation of acquired knowledge over generations; and third, building on past knowledge in order to gain deeper insights.


This accumulation of knowledge continued for thousands of years as our ancestors became more advanced and developed more powerful tools and survival instincts⁵.

Around 3,000 to 5,000 years ago, another milestone in human data collection was reached. Whereas previously only information related to human survival had been gathered and passed on, at this point history and stories began to be recorded more frequently in various civilizations, primarily because people were getting better at communicating ideas and expressing thoughts. As a result, spoken and, more importantly, written languages became more sophisticated, and the first written documents appeared in Egypt⁶ ⁷. Moving forward to the year 300 BC, we come across the Great Library of Alexandria⁸, the most important ancient example of a dataset in human history. It is considered to have been the world’s largest data center until it was destroyed by the Romans in the year 48 BC⁹. Rarely before or since has a government allocated so much of its gross national product to the acquisition of knowledge. Every ship entering its harbor was searched, not for treasures but for books that could be copied and stored there. Estimates of the number of scrolls contained in the library before it was destroyed range from half a million to one million. Those scrolls contained many of the core pillars of mathematics, science, geography, philosophy and more, written by some of the greatest minds in history, such as Euclid, Pythagoras and Socrates, to list a few. From the knowledge in the scrolls came astounding conclusions, such as how to calculate the mass of the earth, or mathematical formulas that we still use today to map out the universe and everything in between. The Great Library of Alexandria contained unbounded wisdom and knowledge. The way the library operated demonstrates the importance of accumulating vast amounts of information, and how deciphering that information can produce meaningful results that contribute further to our ability to solve various problems.

5 Retrieved from: https://www.humancondition.com/freedom-other-adjustments-adventurous-adolescence/
6 Retrieved from: https://sites.utexas.edu/dsb/tokens/the-evolution-of-writing/
7 Retrieved from: https://www.khanacademy.org/humanities/world-history/world-history-beginnings/origin-humans-early-societies/a/learning-about-prehistory-article
8 Retrieved from: http://www.newworldencyclopedia.org/entry/Alexandria_Library
9 Retrieved from: https://ehistory.osu.edu/articles/burning-library-alexandria


There are also other examples of datasets over the years that have contributed to understanding and getting to know the world around us. For instance, the astronomical dataset really began to take off in the 1500s with Nicholas Copernicus, Galileo and Johannes Kepler. These astronomers used years of earlier data on the movement of celestial bodies compiled by the Greek astronomer Ptolemy¹⁰ and, by tracking the motion of the planets over the course of years, were able to derive that, instead of celestial bodies orbiting the earth, which was the common belief back then, it was in fact the earth and the planets that were orbiting the Sun¹¹. Another is the microscopic dataset, which became public in the mid-1600s with Anton Van Leeuwenhoek and Robert Hooke¹². Through the use of the first microscopes, humanity was able to see and make observations beyond the macroscopic: bacterial cells and even microscopic life were observed. Without the microscopic dataset we would not have modern medicine or the majority of the technologies used nowadays. Last but not least is the quantum dataset, which began at the dawn of the 20th century with the theories of great minds such as Einstein, Max Planck, Niels Bohr, Ernest Rutherford and others¹³. The ability to interact with individual atoms through the use of the scanning tunneling microscope is behind the significant advances in technology today, making it possible to fit smaller transistors on chips. This dataset is still in its infancy, giving birth to new jobs such as nano-engineering, and there is still much to explore. These datasets are just a small subset of all the knowledge acquired in the last 2,000 years and of all the information gathered since the cave paintings.
The potential to collect, store, and analyze data has increased significantly with advances in digital technology (Dorasamy and Pomazalova, 2016). The datasets of the past constitute a small blip compared to the information we currently have at our fingertips, meaning that we each have our own personal library.

1.1.2 The Information Age

Since 1990, humanity has entered the so-called Information Age, also known as the Computer Age, Digital Age, or New Media Age¹⁴. This era brought about a period in which people

10 Retrieved from: http://www.polaris.iastate.edu/EveningStar/Unit2/unit2_sub1.htm
11 Retrieved from: http://www.astronomytrek.com/who-discovered-the-earth-moves-around-the-sun/
12 Retrieved from: http://www.history-of-the-microscope.org/history-of-the-microscope-who-invented-the-microscope.php
13 Retrieved from: https://www.amnh.org/exhibitions/einstein/legacy/quantum-theory/
14 Retrieved from: https://historyoftechnologyif.weebly.com/information-age.html


could access information and knowledge easily. The year 1990 corresponds to the birth of the World Wide Web (WWW)¹⁵ and the start of the Information Age: the Data Revolution. In 1986 approximately 2.5 exabytes of data existed. Over the following seven years, up to 1993, which included three years of the public existence of the web, the amount of data increased more than sixfold and reached 15.8 exabytes. For reference, an exabyte is one billion gigabytes and a zettabyte is one thousand exabytes. The next key moment on the way to Big Data came in the mid-1990s with the creation of search engines, a way to sift through the already vast amounts of data on the young web. Over another seven years, from 1993 to 2000, as search engines matured and Web 1.0 started to go mainstream, the amount of data grew roughly three and a half times, to 55 exabytes. According to Eric Schmidt (2010), former CEO of Google, the amount of data collected between the dawn of humanity and 2003 is equivalent to the amount we now produce every two days. In the seven years from 2000 to 2007, the data grew nearly sixfold again, with the total data in existence reaching 300 exabytes. Information added to the web up to this point was primarily the digitization of all human knowledge: written, spoken and experiential. However, 2007 was a pivotal turning point for Big Data, as Sensor Data¹⁶ began to occupy more space on the web and Web 2.0 started to go mainstream. Also in 2007, the transformation to our now digitized mobile world began: the Mobile Era (Meeker, 2010). The release of the iPhone¹⁷, not surprisingly, was the result of this evolution, packed with sensors such as an accelerometer, a gyroscope and more. In less than seven more years, the data had grown fifteenfold, with 4,500 exabytes, or 4.5 zettabytes, of data produced. Total data officially passed the zettabyte mark sometime in 2010.
The worldwide adoption of mobile phones has sparked an exponential explosion in data creation and has been the catalyst for the field of Big Data to evolve. Beyond mobile phones, internet-connected devices are evolving into a field of their own with the Internet of Things

15 Retrieved from: https://en.wikipedia.org/wiki/World_Wide_Web
16 Definition of Sensor Data: Sensor data is the output of a device that detects and responds to some type of input from the physical environment. The output may be used to provide information or input to another system or to guide a process. (last visit: 23/4/2018) https://internetofthingsagenda.techtarget.com/definition/sensor-data
17 Retrieved from: http://historycooperative.org/the-history-of-the-iphone/


(IoT), which in turn correlates with the increasing number of sensors that can digitize the physical world into data even further¹⁸. A great example, which the world has increasingly witnessed over the past few years, is the digitalization of our daily activities, such as the Big Data trend in the health community. Millions of people digitize and track themselves through the use of smart watches and other health tracking devices (BDV, 2016). We have over six billion data points sitting in our genomes alone (Robison, 2015)¹⁹. In addition, according to a 2017 report²⁰ by We Are Social and Hootsuite, half of the world’s population has access to the internet, and this number is predicted to keep growing. As the number of connected users increases, the exponential production of data will continue to increase as well (Cave, 2017). By 2020, it is estimated that 40 to 44 zettabytes of data will have been produced in total, which is almost 5,200 gigabytes for each person in the world²¹. By 2025, production will reach 200 zettabytes (Mearin, 2012). We are primed to reach the yottabyte era by 2030 (Sasaki, Fujitsu). For reference, a yottabyte is equal to 1,000 zettabytes. It is interesting to note that, at the time of writing, there is no official unit beyond the yottabyte, although the tech industry is considering the term brontobyte to represent a thousand yottabytes and geobyte to represent a thousand brontobytes²².
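
The per-person figure quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses decimal (SI) units, and the world population of 7.6 billion is an assumption introduced here, not a figure taken from the sources cited.

```python
# Decimal (SI) storage units expressed in bytes.
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21
YOTTABYTE = 10**24

# ASSUMPTION: a world population of about 7.6 billion people.
# This number is illustrative and does not come from the text.
ASSUMED_WORLD_POPULATION = 7_600_000_000


def gigabytes_per_person(total_zettabytes, population=ASSUMED_WORLD_POPULATION):
    """Return the average number of gigabytes per person for a global total."""
    total_bytes = total_zettabytes * ZETTABYTE
    return total_bytes / population / GIGABYTE


if __name__ == "__main__":
    for total in (40, 44):
        print(f"{total} ZB -> about {gigabytes_per_person(total):,.0f} GB per person")
```

Under this assumed population, 40 zettabytes works out to a little over 5,000 gigabytes per person, which is of the same order as the figure quoted in the text.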

18 Retrieved from: https://www.forbes.com/sites/avigilon/2014/10/13/the-digital-physical-world-in-2020/#2d00459d2811
19 Reid Robison, 2015, retrieved from: https://www.linkedin.com/pulse/how-big-human-genome-megabytes-base-pairs-reid-robison/
20 https://wearesocial.com/special-reports/digital-in-2017-global-overview
21 https://www.computerworld.com/article/2493701/data-center/by-2020--there-will-be-5-200-gb-of-data-for-every-person-on-earth.html
22 http://itknowledgeexchange.techtarget.com/storage-disaster-recovery/exabyte-zettabyte-yottabyte-then-what-opinions-vary/


The following graph helps to put the measurements above into perspective:

Figure 1.1 What is a zettabyte? Source: http://www.missqt.com/what-is-a-zettabyte/

Apart from the long-term estimates of the data to be produced, it is also worth paying attention to the increase in the data produced daily. In 2002 less than a fortieth of an exabyte was produced in one day; by 2017 this number had climbed to 2.5 exabytes. At this rate of growth, by 2025 nearly 120 exabytes of data will be produced every day. In practice, 90% of all the data that we as a human race had ever accumulated was produced from 2010 to 2013. Those numbers indicate that by 2020 we will be able to create that amount of data in just one year. Data production is an accelerating exponential trend with no end in sight.
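
To make the idea of an accelerating trend concrete, the daily figures quoted above can be turned into implied yearly growth rates. This is only a rough sketch that takes the quoted numbers at face value; reading "less than a fortieth of an exabyte" as exactly 1/40 is a simplifying assumption made here.

```python
def annual_growth_factor(start_value, end_value, years):
    """Constant yearly multiplier that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years)


# Daily data production in exabytes, as quoted in the text.
# ASSUMPTION: "less than a fortieth of an exabyte" is taken as exactly 1/40.
DAILY_EXABYTES = {2002: 1 / 40, 2017: 2.5, 2025: 120}

if __name__ == "__main__":
    years = sorted(DAILY_EXABYTES)
    for first, last in zip(years, years[1:]):
        factor = annual_growth_factor(
            DAILY_EXABYTES[first], DAILY_EXABYTES[last], last - first
        )
        print(f"{first}-{last}: about {(factor - 1) * 100:.0f}% growth per year")
```

Read this way, the quoted figures imply yearly growth of a little over a third between 2002 and 2017 and of more than 60% between 2017 and 2025, which is what an accelerating trend looks like in numbers.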


1.2 How is Big Data defined?

So far in this paper we have been talking about Big Data without having defined the term. Ward and Barker (2013) wrote a survey of Big Data definitions and, again, Gil Press (2014), in an article in Forbes, listed 12 possible definitions of Big Data. Those sources make obvious the ubiquitous nature of Big Data and give an understanding of how complicated the Big Data ecosystem is. In an attempt to grasp the concept of data, it is important to analyze its origin. The Latin word dare, which means “to give”, is the ancestor of the modern word data, so originally data means “given” (Strong, 2015). Instead of that meaning, however, we usually refer to data as something taken (Kitchin, 2014): extracted through observations, computations, experiments, and record-keeping. According to Kitchin, the right word for data in science should be captum (from the Latin word capere, which means “to take”) instead of datum and, by extension, capta instead of data (Kitchin, 2014). So, when we talk about data, we talk about what has been taken and is ready to be analyzed. The potential benefits of Big Data are real and significant, (..), there remain many technical challenges that must be addressed to fully realize this potential (Agrawal et al., 2012). There are many definitions of Big Data, but in this chapter it was considered practical to analyze the most common one. This does not mean that it is the only definition that finds application in the Big Data environment; there are many more aspects of Big Data to be covered. The term “Big Data” first appeared in the 1990s (Cox & Ellsworth, 1997). Initially the idea was that the volume of information had grown so large that the quantity being examined no longer fit into the memory that computers use for processing, so engineers needed to revamp the tools they used for analyzing it all (Viktor Mayer-Schonberger, 2013).
The first known and widely accepted definition of Big Data was made public in 2001 with the 3Vs (Laney, 2001), which are still considered to be the core pillars: Volume, Variety and Velocity. Of course, this definition does not work for all Big Data studies (Schroeder, 2014). There are Big Data cases in which we come across just one of the Vs, or some extra characteristics.


The definition proposed by Gartner, Inc. is:

“Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”²³

In order to gain a better understanding of Big Data, we will examine those three Vs in more detail.

Figure 1.2 The three “Big Data” Vs Source: https://www.opservices.com/big-data-analytics/

1) The volume, or size, is synonymous with "big" in the Big Data definition (Gartner). Volume is a relative term: some small businesses are likely to have just gigabytes or terabytes of stored data, as opposed to the petabytes or exabytes of data that large global businesses and public organizations have. Big Data volume depends on the available datasets. The volume of data will continue to grow, regardless of the size of the organization that uses and/or produces it, so the term also applies to rapidly growing volume. There is a natural tendency for all types of organizations, businesses or even households to store data of

23 Retrieved from: https://www.gartner.com/it-glossary/big-data


all kinds: financial data, medical data, environmental data and so on. Many of these enterprise datasets today are within the terabyte range, but petabytes or even exabytes will soon be available (Lovalekar, 2014). The corresponding growth of data is faster than the growth of available storage capacity (Gandomi and Haider, 2015).

2) Variety indicates different types of data, ranging from video to data logs, for instance, and with different structures (Blazquez & Domenech, 2017). Within a Big Data framework we come across structured, unstructured or semi-structured data. The data comes from various sources and in a variety of types (Löfgren, Gravem and Haraldsen, 2011). Variety also reflects the inconsistency of data flows: information can flow from multiple different locations, and the flows can vary over time (with daily, seasonal and event-triggered peak loads that can be challenging to manage). The term also refers to patterns and relations within the information mélange. Ever since the explosion of microcontrollers, smart devices and social networking sites, data has become complex, because it includes not only structured traditional data but also semi-structured and unstructured data.

Structured Data: This is a tool that has been optimized since the birth of the web. This type describes data that is grouped into a relational schema (e.g., rows and columns within a standard database). Its regular layout and consistency allow it to answer simple queries and yield useful information, based on the parameters and operational needs of an organization, and it is usually very simple to understand. The real power of Big Data lies in being able to sift through various sets of data, both structured and unstructured, understand them and derive correlations and conclusions from them (Lohr, 2012). Big Data is considered to be the microscope used to peer deeper into the world around us (Tufekci, 2014).
To make that clear, we return to an earlier example: the astronomical dataset. All the data on the motion of the planets would have been useless by itself, but the conclusion derived from it, that the earth and all the other planets orbit the sun, validated the dataset. So first the data had to be accumulated, then a conclusion had to be drawn from the dataset and, finally, the conclusion had to be verified by gathering more data and by having other astronomers confirm it as well. The hard part of Big Data research nowadays is to verify the conclusions drawn from datasets and to present accurate practices to the real world, which is essential (Hammer, 2017).


Depending on how they envisage data, they will draw different correlations from it, not all of them are correct or have real world applications. When working with structured data, languages such as Standard Query Language (SQL) have been around for nearly two decades to assist in recognizing patterns in datasets (Chen and Zhagn, 2014). As Big Data volume and variety has change, additional data frameworks such as Hadoop have been introduced. At this point a hypothetical example of an e-shop website is considered useful in better understanding of Big Data.

There is a structured database with three columns:

1) Gender of users
2) Categories of products
3) Ratings given to the purchased product

By analyzing the data, analysts could derive which categories each gender favors, based on the ratings given to the products purchased in those categories. To validate their findings, these results would be plotted over the years and, by doing so, even more patterns would be observed, such as the favored categories changing with the season and the same cycle repeating year after year. Working with real data is not as simple as in the example. Working with structured data is typically fairly simple when it comes to transactions, but the real difficulty arrives when data comes from multiple sensors or other ambiguous datasets.
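The e-shop example can be sketched in a few lines of code. What follows is only a minimal illustration, assuming a hypothetical table named purchases with the three columns described above and a handful of invented rows; it uses Python's built-in sqlite3 module to run a SQL GROUP BY query and does not come from any real e-shop.

```python
import sqlite3

# Hypothetical purchase records: (gender, category, rating on a 1-5 scale).
ROWS = [
    ("F", "books", 5), ("F", "books", 4), ("F", "garden", 2),
    ("M", "books", 3), ("M", "garden", 5), ("M", "garden", 4),
]

def average_ratings(rows):
    """Return {(gender, category): mean rating} using a SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (gender TEXT, category TEXT, rating INTEGER)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT gender, category, AVG(rating) FROM purchases "
        "GROUP BY gender, category ORDER BY gender, category"
    )
    result = {(g, c): avg for g, c, avg in cursor}
    conn.close()
    return result

def favourite_category(averages, gender):
    """Pick the category with the highest mean rating for one gender."""
    candidates = {c: avg for (g, c), avg in averages.items() if g == gender}
    return max(candidates, key=candidates.get)

averages = average_ratings(ROWS)
print(averages)
print(favourite_category(averages, "F"), favourite_category(averages, "M"))
```

With real transaction data the same query would simply run over many more rows; the difficulty described in the text lies in data that does not arrive in such tidy columns.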

Semi-structured Data: This is a form of structured data that does not fit into a clear and stable schema. The data are self-describing and contain labels or other markers that impose hierarchies of records and fields within the data. Examples include web logs and social media feeds.
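A short sketch may help to show what "labels or other markers" mean in practice. It assumes a hypothetical social media feed in JSON format with two invented posts; the field names (user, text, tags, location) and the coordinates are illustrative only.

```python
import json

# Two hypothetical social media posts; the second has fields the first lacks.
RAW_FEED = """
[
  {"user": "anna", "text": "Market day in Trikala", "tags": ["market"]},
  {"user": "nikos", "text": "Road closed", "location": {"lat": 39.55, "lon": 21.77}}
]
"""

def field_names(record, prefix=""):
    """List every label in a record, following nested objects."""
    names = []
    for key, value in record.items():
        path = prefix + key
        names.append(path)
        if isinstance(value, dict):
            names.extend(field_names(value, path + "."))
    return names

posts = json.loads(RAW_FEED)
for post in posts:
    print(post["user"], sorted(field_names(post)))
```

Each record carries its own labels, so the two posts can be read without a fixed table layout, even though they do not share the same fields.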

Unstructured data: This type of data consists of forms that cannot be easily categorized into the relevant tables for analysis or discussion. Unstructured data is more convoluted, and the vast majority of data on the web is unstructured: correlations cannot be easily seen or derived, and the data is constantly morphing, expanding and evolving. Most unstructured data is essentially considered static noise at this point (Fan et al., 2006), but this is changing rapidly. Examples of unstructured data include images, audio and video files, social media data, satellite images,


scientific data and more. These data types and sets are extremely difficult to convert into actionable insights. Deriving any conclusions from unstructured data is a long, manual and painstaking process. There is also a major shortage of the skills needed to analyze unstructured data, such as programming, business intelligence, data visualisation and statistics (Wowczko, 2015).

3) The Velocity, or speed, of the data in relation to the frequency of its production and delivery is also a feature of Big Data. The conventional understanding of velocity concerns the speed at which data arrives and is stored, and how quickly it can be retrieved. In the context of Big Data, velocity must also be applied to data in motion: the speed at which data flows and how fast it is produced. Some data can change dynamically after it is created, and some can be nearly obsolete within a few seconds. In order to extract or update information, the analysis of such data has to be performed immediately. This requires capacity to be available for near real-time processing by analytic methods. The various information flows and the spread of networked microcontrollers have led to a steady data flow at a rate that is impossible to manage with traditional systems, which leads to the view that velocity applies to the rate of analysis of Big Data as well. In today’s growing mobile society, people demand real-time results (Bloom, 2015). This becomes obvious if we consider that many of us have experienced frustration when a web page takes more than a few moments to load. With the size and complexity of the big datasets yet to be analyzed, even a supercomputer would take days, months or even years to derive any conclusion from the data, let alone deliver it in real time to devices across the world.
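One common way to analyze data "in motion" is to keep only a short, recent window of events in memory instead of storing everything first. The sketch below is an illustration under simple assumptions: the class name SlidingWindowCounter is hypothetical, timestamps are given in seconds and arrive in non-decreasing order, and the arrival times are invented.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the number of events still in the window."""
        self.timestamps.append(timestamp)
        # Drop events that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)

counter = SlidingWindowCounter(window_seconds=10)
arrivals = [0, 1, 2, 9, 11, 12, 30]  # invented arrival times in seconds
counts = [counter.add(t) for t in arrivals]
print(counts)
```

Because old events are discarded as soon as they leave the window, the memory needed depends on the arrival rate rather than on the total amount of data ever produced.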

Following the same train of thought, a fourth V has been added more recently: the Value/Veracity/Quality of the data, which means that the inclusion of external and heterogeneous data, however big, still raises questions about the accuracy and completeness of datasets (M. Chen, Mao, & Liu, 2014).


Figure 1.3 The four “Big Data” Vs

Source: http://www.zarantech.com

4) Veracity of the data refers to the quality of the data and often to the tracking of its provenance and usage (Wang et al., 2015). It is considered to be a disadvantage of Big Data because, when Volume and Velocity increase, Veracity, meaning trust and confidence in the data, drops (Saha and Srivastava, 2014). It is sometimes referred to as validity, or as volatility when referring to the lifetime of the data. Veracity is very important for making Big Data operational, because Big Data (mostly the unstructured data) can be noisy, that is, contain meaningless data (Sunitha et al., 2013), and uncertain. Value is the achievement of a result that is important, worthwhile or useful to an organization; it measures how data assists in the decision-making process (Bajaj et al., 2014). Value can be revenue, profit margin, cost reduction, time, efficiency, simplification, accuracy, innovation, expansion, public relations, environmental impact or corporate responsibility (Chen et al., 2014). Big Data can be full of biases and abnormalities, and it can be imprecise (Zheng, 2016). Although Big Data provides many opportunities to make data-enabled decisions, the evidence provided by data is only valuable if the data is of satisfactory quality. Data is of no value if it is not accurate. The results of Big Data analysis are only as good as the data being analyzed. This is often described in analytics as “junk in equals junk out”. There are many different ways to define data quality. In the context of Big Data, quality can be defined as a function of several variables; the accuracy of the data, the


trustworthiness or reliability of the data source, and how the data was generated are all important factors that affect the quality of the data. As George Firican (2017) explains, knowledge of the data's veracity helps in better understanding the risks associated with analysis and business decisions based on this particular dataset.
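The idea that quality is a function of a few measurable variables can be illustrated with a small check. This is a sketch under stated assumptions, not an established quality metric: the function quality_report, the field temp_c, the plausible range and the sensor readings are all hypothetical.

```python
# Hypothetical sensor readings; None marks a missing value.
READINGS = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s2", "temp_c": None},
    {"sensor": "s3", "temp_c": 480.0},  # implausible for outdoor air
    {"sensor": "s4", "temp_c": 19.0},
]

def quality_report(records, field, low, high):
    """Return the share of records that are present and within [low, high]."""
    total = len(records)
    present = [r[field] for r in records if r.get(field) is not None]
    plausible = [v for v in present if low <= v <= high]
    return {
        "completeness": len(present) / total,
        "plausibility": len(plausible) / total,
    }

report = quality_report(READINGS, "temp_c", low=-50.0, high=60.0)
print(report)
```

Checks of this kind say nothing about how trustworthy the source is or how the data was generated; they only make the "junk in" part visible before any analysis is run.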

As humans, we can only follow patterns to a certain depth before things stop making sense to us. With the increasing popularity of machine learning, this is starting to change rapidly. As stated earlier, Big Data acts as a kind of microscope, allowing us to peer deeper into the world and to draw correlations and conclusions from the patterns we see, through the use of powerful algorithms and machine learning.


Chapter 2

Data Revolution

The Big Data Era has quietly descended on many communities, from governments and e-commerce to health organizations (H. Chen, Chiang, & Storey, 2012). The Internet offers extraordinary potential for the expression of citizen rights, and for the communication of human values (Castells, 2003). We are only just starting to see the value that social media data has in sparking social innovation. Social Big Data is, without a doubt, going to play an important role in the coming years or even decades; for that reason, it is important to examine and comprehend the relevant aspects (Hoven et al., 2014). This chapter is about the way Big Data has entered the Social Sciences, as well as the world around us, and how societies are experiencing this (r)evolution.

2.1 Data Revolution as Social Phenomenon

Even back in the early 2000s it was obvious that a scientific revolution was about to break out. Andrew Abbott (2000), in his article “Reflections on the Future of Sociology”, notes:

“There is little question that a gradual revolution in the nature of knowledge is taking place: a slow eclipsing of print by visual representation, a move towards knowledge that is more experimental and even aleatory, an extensive commodification of important parts of previously esoteric knowledge.”


2.1.1 The Data Revolution Era

Changes in social networking and the omnipresent use of the World Wide Web (WWW) in daily life, as well as improvements in computational power and data storage, have had impressive effects on data production, consumption and analysis. Social networks, sensors and data infrastructures are generating a massive amount of new data (big data, big corpora, linked data, open data, etc.) that is readily available for the analysis of societies (Ovadia, 2013). This is the phenomenon referred to as the “data deluge” (The Economist, 2010), which is able to radically change both social and individual behavior. Others, such as Kitchin (2014), characterize the present time as the “Data Revolution Era”. Moreover, Martha Stone (2014) argues that the Big Data Revolution did not happen by accident: falling prices for digital media storage and bandwidth, the explosion of digital devices including smartphones and tablets, and the exponential growth of digital media accessed by audiences have created the perfect conditions for this surge in Big Data strategies and implementations. Also, Andreas Weigend24, in his article “Social Data Revolution(s)” (2009), maintains that data exploitation has entered a new era in which businesses as well as users are turning to using and providing more “honest data”. By “honest data” we refer to the situation in which users give reliable feedback on online products, services and surveys. “Honest data” means that there is potential for more accurate analysis and, consequently, better results.

Lohr (2012) quotes Gary King, director of Harvard’s Institute for Quantitative Social Science, on the Data Revolution:

“It’s a revolution; we’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”

24 Andreas Weigend is the former Chief Scientist at Amazon.com and an expert in data mining and computational marketing. He currently teaches the graduate course Data Mining and Electronic Commerce at Stanford University, and the executive course Technology, Information and Innovation in Shanghai. As an independent consultant, he now helps data-intensive organizations make strategic decisions based on analytics and metrics. https://hbr.org/2009/05/the-social-data-revolution.html


2.1.2 Data Revolution and Social Sciences

A more long-term impact of the Data Revolution has proved to be its adoption by, and by now its necessity for, the Social Sciences (Ridgway, 2015). The reason is that this technological evolution strengthens the empirical base of the social disciplines and also promotes interdisciplinary collaboration between different areas of science, enhancing the integration of data and methods. Only by mixing social theory and computation, data and modelling in an innovative way can social scientists contribute to a clearer vision of human beings and society. The Data Revolution allows social scientists to reconsider previous sociological questions and analyze them from a different perspective. Now, instead of asking a question of a small sample of people, social scientists can put the same question to a vast amount of data. The equation is simple: larger datasets = larger sample = more accurate (and honest?) results. Amaturo and Aragona (2014) define the Data Revolution as the sum of disruptive social and technological transitions that are changing the routines of production, management and analysis of data once consolidated within the different scientific disciplines. Aaron Koblin (creative director, Google) is of the opinion that (Big) Data grants us extra senses, expands upon our ability to perceive the world and, at the end of the day, gives us the opportunity to make things tangible again and to get a perspective on ourselves both as individuals and as a society.
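The "larger sample = more accurate" part of the equation above can be made concrete with the textbook margin of error for a proportion. The sketch below assumes a simple random sample and a 95% confidence level (z = 1.96); these are assumptions of the illustration, and they cover sampling error only, not the "(and honest?)" part of the question.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (100, 10_000, 1_000_000):
    print(n, round(margin_of_error(n), 4))
```

The margin shrinks with the square root of the sample size, so a sample one hundred times larger is only ten times more precise.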

2.2 Social (Big) Data

The “Big Data” definition adopted by the American Association for Public Opinion Research (2015) seems more appropriate when it comes to commenting on the relationship between the Social Sciences and Big Data:

“The term ‘Big Data’ is an imprecise description of a rich and complicated set of characteristics, practices, techniques, ethical issues, and outcomes all associated with data”

These new types of data are unquestionably of fundamental importance for social science. Personal data has been hailed as the “new oil” of the twenty-first century, and the benefits to policy, society, and public opinion research are undeniable (Foster, 2017). A great amount of data is produced daily by Social Network Sites (SNS). This massive production through


Social Media leads to the conclusion that this data is Social Data, and it is regarded as key to crucial insights into human behavior (Boyd and Crawford, 2012). A proposed definition of Social Network Sites (SNS) is that they are web-based services that allow individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection, and view and traverse their list of connections and those made by others within the system (Boyd & Ellison, 2011). Social Media encourage users to express their feelings and opinions about any kind of topic. Therefore, the information they contain is to some extent a reflection of what happens in society. The book “Debates in the Digital Humanities” (2012) notes that, for the first time, we can follow the imaginations, opinions, ideas and feelings of hundreds of millions of people. We can see the images and the videos they create and comment on, monitor the conversations they are taking part in, read their blog posts and tweets, navigate their maps, listen to their track lists, and follow their trajectories in physical space. The most extraordinary feature is that we do not need to ask their permission to do this, since they themselves encourage us to do so by making all of this data public. Indeed, the term “Social Big Data” is becoming popular to refer to data generated by SNS and blogs (Bello-Orgaz, Jung, & Camacho, 2016). The amount of data produced in SNS is huge and leaves a lot of room for analysis and research. Every minute on Facebook, 510,000 comments are posted, 293,000 statuses are updated and 136,000 photos are uploaded25, and on Twitter (according to the last update, almost three years ago) 500 million tweets are posted every day26. Social Big Data comes from joining the efforts of the two previous domains: Social Media and Big Data.
Therefore, Social Big Data will be based on the analysis of vast amounts of data that could come from multiple distributed sources, but with a strong focus on social media (Bello-Orgaz et al., 2016). Depeige (2017) explains that the rise of social media, combined with the emergence of new technologies, has made it possible to adopt a new approach to understanding individuals and society at large, erasing the long-standing dichotomy between large sample sizes (quantitative studies) and in-depth analysis (qualitative studies). These numbers illustrate that Big Data analysis is becoming increasingly widespread in the social sciences and is often based on social media data, such as data from Facebook and Twitter, along with mobile phone data.

25 https://zephoria.com/top-15-valuable-facebook-statistics/ (Last visit: 11/03/2018)
26 https://blog.hootsuite.com/twitter-statistics/ (Last visit: 11/03/2018)


The following graph portrays the amount of data produced every minute. It is worth considering all the data we contribute as individuals every day in order to realize how Big Data is compiled.

Figure 2.1 Data shared in Social Media per minute

Source: https://gr..com/pin/283726845247077736/?lp=true

Another perspective for Social Big Data is their potential to transform our ability to look at (human) behavior over time (Strong, 2015). This kind and amount of data allows researchers to relax the atomistic assumptions that are imposed by reliance on random samples (Golder & Macy, 2014), and to model social life as relationships among actors (Macy & Willer, 2002). Big Data offers the humanistic disciplines a new way to claim the status of quantitative science and objective method. It makes many more social spaces quantifiable (Boyd &


Crawford, 2012). For that reason, more attention is being paid to SNS as sources of data potentially useful in forecasting social variables (Blazquez & Domenech, 2017). Even though the data we collect from SNS is huge and valuable, the truth is that “Social media is just a small part of the social data universe - one of the many data sources that represent the front end of the process. The back end is when you bring together the data from different sources” (Krivda, 2011). Since ancient Greece, scientists have observed that people have the need to know what comes next, the need to make the right choice, the need to be a step ahead. Society has not changed that much: people still crave insights, and that is what makes Social Big Data so important. The emergence of big data from social media has influenced the study of human behavior in the same way that the introduction of the microscope and the telescope influenced biology and astronomy respectively: it has produced a qualitative shift in the scale, scope and depth of possible analysis. Such a dramatic surge requires a mindful and systematic examination of its methodological implications, including tradeoffs, biases, strengths and weaknesses (Tufekci, 2014). Tricia Wang, a technology ethnographer, maintains that Big Data has the power to deliver such predictions only if Big Data systems and analytics work together with social scientists27. This opinion contrasts with Anderson's (2008) view that we are facing “The End of Theory”; in particular, he argued that modern society will forget about disciplines such as taxonomy, ontology and psychology because, as he wrote, “With enough data, the numbers speak for themselves”.
Ten years after the publication of his article, Anderson has been proven wrong, because this collaboration between Big Data and the social sciences is already happening. As Lev Manovich mentions, many computer scientists are working with large social data sets; they call their new field “social computing” (Manovich, 2007). According to the definition provided by the website of the Third IEEE International Conference on Social Computing28 (2011), this new scientific field refers to “computational facilitation of social studies and human social dynamics as well as design and use of information and communication technologies that consider social context” (“Social Computing”) (Gold, 2012).

27 https://www.ted.com/talks/tricia_wang_the_human_insights_missing_from_big_data
28 Official site of IEEE 2011: https://ieeexplore.ieee.org/xpl/meostRecentIssue.jsp?reload=true&punumber=6112285


Two major areas in which social scientists can assist, based on decades of involvement and struggle with end users, are inference and attention to data quality (Foster, 2017). The computational turn offers the humanities an incredibly important opportunity to study the contemporary transformation of society (Es and Schäfer, 2018).

2.3 Big Data gets political

Social Big Data has the power to transform the relationship we have with governments and authority. Big Data empowers individuals to act for the common good, either as protesters or as contributors to local issues. Everything we want to learn is just a click away, and the same is true for everything we want to share. We can transmit an idea or a piece of news almost instantaneously through our social media profiles or an online application, but most people do not realise the power behind that click. It is common to think of change as a bad thing, but this time we are witnessing the dawn of a new age of participation, where people can participate in society in ways that were unthinkable in the past.

2.3.1 Social Movements and Disasters’ loudspeaker

Social media, once again, hold a distinguished position because they have brought users together and enabled the sharing of knowledge between people of different communities, cultures, countries and continents. Social media, as explained earlier in this chapter, have given many people, especially those belonging to marginalized and minority groups, the ability to voice their opinions and concerns. In many different ways, social media have enabled the creation of the first global village, where everyone is a “virtual” neighbor of everyone else (Strømmen-Bakhtiar, 2012). Users are able to create online communities and exchange information. With Big Data analytics it is possible to calculate the size of those communities, understand their purpose and act on it. Social movements in the Information Age are essentially mobilized around cultural values. The struggle to change the codes of meaning in the institutions and practice of society is the essential struggle in the process of social change in the new historical context. Some of the most important social movements of our time, such as nationalist or religious movements, are very old in their principles, but they take on a new meaning when they become trenches of


cultural identity to build social autonomy in a world dominated by homogeneous, global information flows. The Internet, and especially social media, become essential media of expression and organization for these kinds of protest, which coincide in a given time and space, make their impact through the media world, and act upon institutions and organizations through the repercussions of their impact on public opinion. These are movements to seize the power of the mind, not state power. Because power increasingly functions in global networks, largely bypassing the institutions of the nation-state, movements are faced with the need to match the global reach of the powers that be with their own global impact on the media, through symbolic actions. In other words, the globalization of social movements is a distinct, and much more important, phenomenon than the movement against globalization. The Internet provides the material basis for these movements to engage in the production of a new society (Castells, 2003). Democratization movements existed long before technologies such as mobile phones and the internet came into wide usage. With these technologies, however, people sharing an interest in democracy built extensive networks, created social capital and organized political actions; virtual networks materialized in the streets. Brave citizens shared their opposition to authoritarian rule, and digital media helped to accelerate the pace of revolution and build its constituency. Digital media and Big Data techniques serve as an “information equalizer,” allowing for both the telling of compelling stories and the management of all the small communications and logistics tasks that must happen in concert if an uprising is to succeed. Social media allow protesters to organize their thoughts and to synchronize their actions (Shirky, 2011). Digital media provided the important new tools that allowed social movements to accomplish political goals that had previously been unachievable.
And judging by the reactions of dictators and other desperate political elites, digital media have become an important part of a modern counterinsurgency strategy (Hussain, 2013). Social networks can benefit society enormously in one more way: they can help in disasters. Through social networks, information about disasters can be disseminated quickly among the people. The most prominent and widely used social media networks, such as Twitter and Facebook, play an important role in the propagation of information of many different kinds. The widespread use of hashtags makes it easy to follow the latest developments. In case of any disaster or catastrophic crisis, the faster


spread of information through these sites could be decisive in saving lives and providing assistance for the further course of action (Baloch et al., 2016). Some of the disasters in which Social Media, and by extension Big Data, played an active part are the Haiti Earthquake (Mier, 2015), the Japanese Earthquake in 2011 (Millham and Thakur, 2017), Hurricane Sandy in 2012, the Nepal Earthquake in 2015 (Baloch et al., 2016) and more. In these cases, Big Data analytics helped in discovering patterns, in recognizing whether the informants were eyewitnesses or not, in finding out where help was most needed, and in spreading the word widely. Below is a crisis map by Digital Jedi29 pinpointing the damage and resulting needs across the Haitian capital of Port-au-Prince.

Figure 2.2 Port-au-Prince Crisis Map

Source: https://www.brookings.edu/blog/techtank/2015/02/19/digital-humanitarians-big-data-and-disaster-response/

Those maps were built using Big Data analytics. The head of FEMA (Federal Emergency Management Agency)30, Craig Fugate (2012), referred to these crisis maps, which use Big Data analytics, as the most detailed and useful tools available to the humanitarian community.
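To give a rough idea of how reports can be turned into a map of where help is most needed, the sketch below groups geotagged messages into grid cells and ranks the cells by the number of reports. It is an illustration only and not the method used for the maps above: the coordinates, the messages, the cell size and the function names grid_cell and hotspots are all invented.

```python
from collections import Counter

# Hypothetical geotagged reports: (latitude, longitude, text).
REPORTS = [
    (18.543, -72.339, "building collapsed"),
    (18.544, -72.338, "people trapped"),
    (18.547, -72.336, "need water"),
    (18.594, -72.307, "road blocked"),
]

def grid_cell(lat, lon, cell_size=0.01):
    """Map a coordinate to the integer index of its grid cell."""
    return (int(lat // cell_size), int(lon // cell_size))

def hotspots(reports, cell_size=0.01):
    """Count reports per grid cell, busiest cells first."""
    counts = Counter(grid_cell(lat, lon, cell_size) for lat, lon, _ in reports)
    return counts.most_common()

ranked = hotspots(REPORTS)
print(ranked[0])
```

A real crisis map would also have to decide which reports are credible, for instance whether the sender is an eyewitness, before counting them.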

29 Digital Jedi website: http://digitaljedi.info/
30 FEMA website: https://www.fema.gov/


2.4 Big Data Utility

Big Data can be exploited by many sectors, bringing a wave of innovation and productivity gains. The discussion of the impact of Big Data centers particularly on the application of sophisticated technologies such as Hadoop and HANA for managing large volumes of data, rather than on intermediate software or infrastructure. Therefore, the adoption of Big Data technologies always emanates from the analysis dimension, which in turn guides the adoption of the underlying supporting technologies. Most of the time, when people talk about Big Data, they talk about the "commercial side": how businesses can use it for advertising or marketing strategies. However, a significant part of Big Data is used by consumers; people themselves are utilizing data. The interesting fact is that the algorithms used and all the sophisticated processing are almost invisible to consumers. The results are distilled so that consumers see only a small part of the available data, but that information is exactly what they need. Large organizations have exploited the power of data analysis for a long time. However, consumer service companies are finding new ways to use Business Intelligence to benefit individuals, meaning their audience, consumers or registered users. Websites like Amazon and Netflix started this trend, using sophisticated Business Intelligence (or data mining) to understand and suggest things that consumers might want to buy or watch. One thing that makes all of this feasible is the growing availability of large public and private sources of information. Government agencies and companies such as Facebook Inc., Google Inc. and Twitter Inc. offer Application Programming Interfaces (APIs), which allow other software developers to access and use their data. Although consumers are worried about how the spread of their personal data may affect their private lives, many find ways to take advantage of new, easily accessible data services.
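The kind of suggestion described above ("things that consumers might want to buy") can be illustrated with a very simple co-occurrence count. This is a toy sketch and makes no claim about how Amazon or Netflix actually work: the baskets are invented and the function recommend is hypothetical.

```python
from collections import Counter

# Hypothetical purchase baskets, one set of items per customer.
BASKETS = [
    {"novel", "bookmark", "lamp"},
    {"novel", "bookmark"},
    {"novel", "lamp"},
    {"kettle", "tea"},
]

def recommend(item, baskets, top_n=2):
    """Suggest the items most often bought together with `item`."""
    together = Counter()
    for basket in baskets:
        if item in basket:
            together.update(basket - {item})
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(together.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]

print(recommend("novel", BASKETS))
print(recommend("tea", BASKETS))
```

The consumer sees only the short list of suggestions, while the counting over all baskets stays out of sight, which is the sense in which the processing is invisible to consumers.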
Big Data provides significant comfort and functionality to consumers, but it has the same effect on the business world. Big Data is overturning the way people trade. Many people have encountered Big Data in commerce through Google's search results and related searches. For instance, whenever someone types a search into Google or a similar search engine, the results appear, but ads also appear at the same time31. Advertisements are displayed based on Google's

31 Google AdWords: http://adwords.google.com/intl/el_gr/start/how-it-works/search-ads/?subid=gr-el-ha-g-aw-c-0-b4_xx_txx_xx_


criteria and information about the end user. Google places ads based on the user's profile and displays the elements that the user is most likely to respond to. This is accomplished by processing a very large amount of available data. According to McKinsey (2011), and later Fanning and Grant (2013), there are five ways to unlock the potential of Big Data to create value:

Creating transparency: Simply making Big Data more easily accessible to interested parties in a timely manner can generate enormous value. In the public sector, for instance, making relevant information more easily accessible across otherwise separate departments can quickly reduce search and processing time. In manufacturing, integrating data from research and development, engineering and production units to exploit modern technology can significantly reduce the time needed for a product to reach the market and improve quality.

Enabling experimentation to discover needs, expose variability and improve performance: As organizations create and store more transaction data in digital form, they can collect more precise and detailed performance data (in real or near real time) on everything from product inventories to sick or personal days taken by staff. Information Technology (IT) allows organizations to synchronize processes and then conduct controlled experiments. Using data to analyze variability in performance, whether naturally occurring or generated by controlled experiments, and to understand its underlying causes enables leaders to drive performance to higher levels.

Segmenting populations to customize actions: Big Data allows organizations to create highly specific segments and to tailor products and services to meet the needs of each. This approach is well known in marketing and risk management, but it can be revolutionary elsewhere, for example in the public sector, where an ethos of treating all citizens in the same way is commonplace. Even consumer goods and service companies that have used segmentation for many years are beginning to implement even more

xx_bau_non!o2~119893495-230493462685-kwd-22649792327&utm_campaign=gr-el-ha-g-aw-c-0-b4_xx_txx_x x_xx_bau_non!o2~119893495-230493462685-kwd-22649792327


sophisticated Big Data techniques, such as real-time retail pricing, aimed at promotion and advertising.

Replacing/supporting human decision-making with automated algorithms: Sophisticated methods of analysis can significantly improve decision-making, minimize risks and uncover valuable information that would otherwise remain hidden. Such methods have applications for organizations ranging from tax authorities, which can use automated risk engines to flag candidates for further examination, to retailers, which can use algorithms to optimize decision processes such as the automatic fine-tuning of inventories and pricing in response to real-time purchases in the store or via the internet. In some cases, decisions will not necessarily be automated but augmented by the analysis of huge, complete sets of data using Big Data techniques and technologies, rather than small samples that someone can personally handle and understand with spreadsheets (Lalovich, 2014). Decision-making may never be the same. Some organizations are already making better decisions by analyzing entire sets of data from customers, employees or even sensors embedded in products.

Innovating in new business models, products and services: Big Data enables companies to create new products and services, enhance existing ones and devise entirely new business models. Manufacturers use data obtained from the use of actual products to improve the development of the next generation of products and to create innovative after-sales service offerings, such as proactive maintenance and preventive measures that minimize failures. Real-time location mapping has created an entirely new set of location-based services, from navigation to the pricing of property and accident insurance based on where and how people drive their vehicles.
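Of the ways listed above, the controlled experiment is the easiest to show in code. The sketch below compares the conversion rates of two hypothetical versions of a web page with a standard two-proportion z statistic; the visitor and conversion counts are invented, and the 1.96 threshold assumes a two-sided test at the 5% level.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / std_err

# Hypothetical experiment: old page (A) versus new page (B).
z = two_proportion_z(conv_a=200, n_a=4000, conv_b=260, n_b=4000)
print(round(z, 2), "significant at 5%" if abs(z) > 1.96 else "inconclusive")
```

The point of the experiment is that the difference between A and B is produced deliberately, so variation in performance can be attributed to the change rather than to chance or to outside factors.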

2.5 Predictions and Insights

Uncertainty about the future, increasingly intense environmental changes, and consumer trust, buyer loyalty and voter devotion that decrease year after year owing to the economic environment are the conditions that increasingly create the need for businesses, organizations and even governments to foresee their future prospects. They want to predict the effect a crisis will have upon their sales or policies, what plans to make, what goals they should set and how to create a development strategy in an


environment full of challenges, based on data. Accurate predictions are important for achieving both strategic and operational objectives. Forecasting, or in other words Data-Driven Decision-making (DDD) (Brynjolfsson et al., 2011), identifies the drivers that contribute to development over a long-term horizon. The resulting analytical model quantifies the consequences of these drivers and provides forecasts in both volume and value. At the same time, it takes into account a large number of factors such as economic indicators, demographic changes, and socio-economic and consumer/market trends. The benefit for those involved is a better distribution of resources according to the factors that drive demand for a brand or category, and consequently greater productive and operational efficiency and greater stability in creating and executing plans. Manufacturers and retailers can hold the ideal stock of goods for the sales expected each year. They can respond more flexibly to consumer preferences, increase profitability and Return on Investment (ROI)³² through data-based plans, and at the same time reduce the risk of falling sales through timely and informed adjustments to promotional activities. This approach makes it possible to build scenarios with a tool in which variables can be changed, for example a 10% increase in promotional activity next year, and to see the result in volume and value. In addition, forecasting allows a better joint strategy with retailers, for example deciding which retail chains manufacturers should target the following year to achieve their goals in sales, value, market share and profitability. It also allows more effective placement and communication of the brand through an integrated understanding of the factors that affect purchasing behavior.
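The scenario idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that sales volume responds linearly to a change in promotional activity; the function names, the elasticity parameter and the figures in the usage lines are hypothetical and are not taken from Brynjolfsson et al. (2011) or from any real forecasting tool. The ROI helper follows the footnote definition (return divided by cost).

```python
from typing import Tuple


def roi(net_return: float, cost: float) -> float:
    """Return on investment as a ratio: return divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return net_return / cost


def promo_scenario(baseline_volume: float, unit_price: float,
                   promo_change: float, promo_elasticity: float) -> Tuple[float, float]:
    """Project sales volume and value for a change in promotional activity.

    promo_change is a fraction (0.10 means +10%). promo_elasticity is a
    hypothetical coefficient: the fractional change in volume per unit
    fractional change in promotion. The linear response is an assumption.
    """
    volume = baseline_volume * (1 + promo_elasticity * promo_change)
    value = volume * unit_price
    return volume, value


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 units at a price of 2.0, promotion up 10%.
    volume, value = promo_scenario(1000.0, 2.0, 0.10, 0.5)
    print(f"projected volume: {volume:.0f}, projected value: {value:.2f}")
    print(f"ROI on a 200 spend returning 50: {roi(50.0, 200.0):.0%}")
```

Changing `promo_change` or `promo_elasticity` and re-running the function is the code equivalent of adjusting a variable in the scenario tool and reading off the new volume and value.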
Marketers and other consultants now know in which elements of the marketing mix they should invest to achieve their goals. Forecasting also contributes to a long-term increase in profitability and acceptance through an understanding of the external factors that affect organizations, such as the economic environment and demographic change. For example, if a manufacturer produces yogurt in a country where the number of elderly people is growing very quickly, there may be a need to create yogurt

32 Return on Investment (ROI) is a performance measure used to evaluate the efficiency of an investment or to compare the efficiency of a number of different investments. ROI measures the amount of return on an investment relative to the investment’s cost. To calculate ROI, the benefit (or return) of an investment is divided by the cost of the investment. The result is expressed as a percentage or a ratio. https://www.investopedia.com/terms/r/returnoninvestment.asp (last visit: 30/4/2018)


enriched in calcium. Such examples also apply to public policies and governmental strategies. Naturally, Big Data cannot create insights and value without proper handling. KPMG (2015) concludes that eight essential points determine whether analytics proposals increase value:

● The issues to be addressed must be defined
Companies and organizations usually have a problem but do not know which analytical model they will have to "run" in order to solve it, and sometimes they do not even know that a Data Analysis solution exists. The person called on to interpret the data has to find the most important points in the customer's problem, on which the analytical model should focus.

● Insights do not come just from the data
There should be cross-sectoral cooperation (analyzing the data and working across silos to map this back) to identify the issues and challenges a company is facing.

● The analyst must go beyond specific solutions
Analytical model projects should not be limited to one concrete solution but should form part of an integrated business strategy and support investment decisions. According to Christian Rast, KPMG Global Head of Data and Analytics: “At the end of the day, growth and productivity are about adding value to the organization, something that Data Analysis excels at.”

● Determination of value
Value varies depending on the problem to be solved. It is one thing when the aim is to reduce cost, another when it is risk management, and another again when the need is to improve the customer experience of a product or service.

● Focus on customers
The analyst should think about how to use new data types and algorithms to automate the decision-making procedure, so that the customer gets better service.

● Ask the right questions
Data Analysis should not be carried out for its own sake. The results (insights) should be prioritized by understanding their potential value, not only in terms of “who gets what” but also in terms of “speed and complexity.” That is the point where social computing is involved,


as mentioned in the previous chapter, because asking the right questions usually requires a theoretical background to support it.

“Companies are getting really good at collecting data, but they are having big problems connecting it. Analyzing a single source of data will never drive real value; it takes multiple streams of data to get real insight.”

- Nova Spivack, CEO, Bottlenose

● Measuring success
Use successes to fund more projects and to spread experience and knowledge across the organization.

● Engaging stakeholders from the early stages of a project
Make the value of the analytical model understandable to the organization and to investors, who increasingly see it as a transformative strategy rather than just a tool that brings greater insight into their existing business problems.

Although KPMG is a professional services company, its key points apply to all sorts of organizations, from small businesses to governmental and not-for-profit organizations, because they outline how Big Data Analytics works across a wide range of settings.


Chapter 3

Case Study: Smart Trikala

A city is an important social structure. Since this thesis discusses how Big Data affects society, this case study describes how Smart Cities use, or will use, Big Data. The city of Trikala serves as the example.

3.1 Data and the city

With sensors, almost everything is quantifiable, from the moisture in the soil to radiation in the atmosphere to the heartbeat and breathing of a newborn. Yet, even now, much of our world remains to be digitized: from mapping the human body at the cellular level and better understanding the genome to in-depth mapping of the oceans, the list goes on and on (Weldon, 2016). The growing number of internet-connected devices, driven by mainstream adoption, smart cities and more, will help with this digitization. Individually these products contribute greatly to our personal wellbeing, but their real power appears when hundreds of millions of people contribute data and patterns begin to emerge across the world. Likewise, the involvement of Big Data in Smart Cities is useful for implementing and improving policies. Taking in data affects a person as an individual, and the individual in turn affects the data. This cycle makes us actors in a larger system. Combining datasets in this way can give insights into, for example, the crime rate and the quality of the road network, and can help a city differentiate its priorities. All these possibilities bring the realization that cities are responsive organisms and citizens their vital organs. There are all kinds of new approaches and tools that politicians can use to get citizens involved, to brainstorm and to gather new ideas. In fact, politicians get to use the power of


networking. The internet, social media and political applications are based on networks and make it significantly easier than before to spread news and to report disasters or malfunctions. At this point, data generation or volume is no longer the issue (McLellan, 2015). The new concern is to understand the variety in the datasets and to be able to use them by deriving verifiable conclusions and correlations that have practical real-world applications (Bhadani et al., 2016).

The definition of the term “Smart City” is still under construction, and the term is used to describe various approaches to the “Smart City Situation”. The flexibility of the definition gives each city the opportunity to define its programs, policies and procedures according to its own local set of priorities and needs (Doherty, 2014). For this case study the data-driven definition by Cisco is adopted, which considers smart cities to be those that adopt “scalable solutions that take advantage of information and communications technology (ICT) to increase efficiencies, reduce costs, and enhance quality of life” (Falconer and Mitchell, 2012).

Big Data is the term used to describe very large, complex, rapidly changing datasets (Gurin, 2014). Smart cities produce Big Data because they become smarter by collecting and analysing data (Babu and Swathi, 2018). Big Data, as mentioned before, is produced by the Internet of Things (IoT), sensors, mobile phones, smart devices, online applications and many other sources, and is considered essential for smart cities because it can be used to improve city life and public services (Tomas, 2016). Big Data can help reduce emissions and pollution. Street sensors will measure total traffic and total emissions at different times of day. The data can be sent to a central unit that coordinates with the traffic police. Traffic can then be managed or diverted to less congested areas to reduce carbon dioxide emissions in a particular area. Parking problems can also be better addressed: cars will carry sensors that guide them to the nearest available parking spaces. The environment will be cooler and greener, with less energy consumption.³³ Big Data analytics are certainly reshaping and enriching our experience of how cities can function and be managed, planned and developed (Bibri, 2018).
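As a rough illustration of how a central unit might use street-sensor counts to decide where traffic should be diverted, the following sketch averages vehicle counts per street and flags the streets above a threshold. The reading format, the function name, the street labels and the threshold are all hypothetical; the sources cited above do not describe a specific algorithm.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One hypothetical sensor reading: (street, hour of day, vehicles counted).
Reading = Tuple[str, int, int]


def congested_streets(readings: Iterable[Reading], threshold: float) -> List[str]:
    """Return the streets whose average vehicle count exceeds the threshold."""
    counts: Dict[str, List[int]] = defaultdict(list)
    for street, _hour, vehicles in readings:
        counts[street].append(vehicles)
    return sorted(street for street, values in counts.items()
                  if mean(values) > threshold)


if __name__ == "__main__":
    sample = [("Street A", 8, 120), ("Street A", 9, 80), ("Street B", 8, 30)]
    # Streets returned here would be candidates for diverting traffic away from.
    print(congested_streets(sample, threshold=90))
```

A real deployment would work on streaming data and would also weigh emissions readings, but the aggregation step has the same shape.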

33 Retrieved from: https://www.myota.gr/index.php/k2-tags/2013-02-13-13-23-56/140-2013-03-19-04-55-08/11393-big-data (last visit: 18/5/2018)


Big Data analytics for smart cities are still immature (Sustainable Cities and Society, 2017), and in developing cities the reality is that operations are uncoordinated and data capture is still a heavily manual process (Deloitte, 2015). However, as a city evolves toward informatization and intelligence, numerous information bases and data centers emerge, which should be properly interconnected to form urban big data (Pan et al., 2016). The same paper defines the term as follows: “Urban big data is a massive amount of dynamic and static data generated from the subjects and objects including various urban facilities, organizations, and individuals, which have been being collected and collated by city governments, public institutions, enterprises, and individuals using a new generation information technologies”. Urban big data enables a highly granular, longitudinal, whole-system understanding of a city system or service (Kitchin, 2016).

Figure 3.1 Urban Big Data

According to the World Health Organization, 70% of the world’s population (more than six billion people) will live in urban areas by 2050.³⁴ With such a massive population, billions of devices will also communicate with each other, producing an overwhelming amount of Big Data (Rathore et al., 2016). Smart cities are emerging all around the globe and show that technology can be very helpful for citizens as well as for city authorities. Several cities are trying to integrate these “smart” solutions; successful examples include Amsterdam, Bologna and Singapore (Sanseverino, Sanseverino and Vaccaro, 2017). This study, though, examines the case of Trikala, because it is one of the first cities in Greece that took the initiative to become smarter. According to Anthopoulos and Tsoukalas (2005), the smart city of Trikala is built on a four-layer architecture: a) Infrastructure Layer, b)

34 World Health Organization, retrieved from: http://www.who.int/bulletin/volumes/88/4/10-010410/en/


Service Layer, c) End-user Layer and d) Information Layer. The main concern here is the Information Layer, which holds all the data produced and exchanged within the framework of the digital city. It includes the database needed to store the information, as well as the mechanisms that ensure the safety and integrity of the data at every stage (production, storage, exchange, access). It is also responsible for safeguarding copyright and tries to overcome interoperability and communication problems with systems already running in the city (Tsarchopoulos, 2013). The Information Layer is closely connected with Big Data because it holds information that matches the Big Data characteristics: Volume, Variety, Value and Velocity. It has an immediate impact on society because all this data is collected in order to create a more efficient city environment. This case study is not just about the current use of Big Data; it is mostly about its potential uses. It aims to present how Big Data can be utilized by the infrastructure already in place. The information presented below was collected from articles published in the Greek press, the website of the Municipality of Trikala, observations of the Trikala Check APP and ParkGuru applications, and the collaborators’ websites.

3.2 The case of Trikala

3.2.1 About Trikala

The city of Trikala has been using new technologies since 2004. In that year C. Folias, then Greek Vice Minister of Economics, proclaimed Trikala the "First Digital City of Greece".³⁵ Since then, the city has been building infrastructure and providing services that aim to create and implement applications based on Information and Communication Technologies (ICT).³⁶ Trikala is the capital of the homonymous Prefecture and was built on the site of ancient Trikki.³⁷ Asklipios, one of the most distinguished doctors of ancient times, started and

35 Retrieved from: http://www.govtech.com/security/E-Trikala-The-First-Greek-Digital-City.html (last visit: 17/5/2018)
36 Retrieved from: http://www.greeknewsagenda.gr/index.php/topics/business-r-d/6357-thinking-of-a-greek-smart-city-think-of-trikala (last visit: 16/5/2018)

37 Retrieved from: http://www.govtech.com/security/E-Trikala-The-First-Greek-Digital-City.html


practiced pharmaceutics here. The Lithaios River (the stream of Lithi, “forgetfulness” in mythology) flows through the city of Trikala. Its Central Bridge was built in 1886 in France by French engineers and is made of metal. The banks of the Lithaios have been landscaped and lined with exceptionally tall plane trees and wild chestnuts, creating a unique green view in the heart of the city. The city centre is marked by historical Byzantine and Ottoman landmarks, impressive urban planning, spacious squares, parks, pedestrian streets and exhibition halls, such as the Tsitsanis Gallery, devoted to Vasilis Tsitsanis, one of Greece's most prominent composers and lyricists, who came from Trikala and was especially influential in the field of ‘Rebetika’ (urban blues). The city lies 331 km NW of Athens and 215 km SW of Thessaloniki. A stone’s throw away is the world-famous, breathtaking site of Meteora, a unique geological phenomenon. Unlike many cities and towns in Greece, Trikala is flat, a fact that has helped the town become one of the most bike-friendly in the country. Three cycle paths run through the city, and bicycles for public use are provided by the municipality at certain information points. One of Trikala’s most charming cycle paths runs alongside the Lithaios River with its clean, cool breeze, swans and ducks.

3.2.2 The vision

According to the Municipality’s official website, its essential concern is the usefulness of technological applications for the city’s inhabitants. With a population of about 76,000, the overall goal is to maximize collective benefits by implementing policies that reduce cost and resource consumption and that engage more effectively and actively with citizens. A 21st-century city must make technological progress accessible to its citizens, changing elements of their everyday lives for the better, including quality of life, ease of transportation and cost of living (Dubow, 2017). One of the city’s major achievements is the large-scale demonstration of six driverless public transport vehicles. The demonstration lasted 5 months (September 2015 – February 2016) in the framework of the CityMobil2 pilot project. CityMobil2 is a European project on automated mobility; its full title is “Cities demonstrating cybernetic mobility”. The project researched Automated Road Transport Systems (ARTS) and their acceptance by citizens (Final Report Summary - CITYMOBIL2).


The six CityMobil2 buses operated as a complement to the rest of the city’s public transportation system on a specific route within the city centre. Trikala was the only city where the demonstration took place in the city centre (Panteliadis and Sidiropoulos). During the pilot implementation, 1,490 routes were run, 3,580 kilometers were traveled and more than 12,000 passengers were transported.³⁸ Greece was the first EU country to pass, at a very early stage, national legislation allowing automated transportation. The results of this demonstration have been really useful not only for the city of Trikala but for every city that intends to automate its urban transport system. Italy, France, Switzerland, Finland and Spain also took part in the project. According to the Final Report Summary of the project, substantial data was collected during the demonstrations about vehicle performance, which is valuable for further improvement of the vehicles, especially in localisation and obstacle detection. A strong base of operating knowledge and of technologies generated by a comprehensive approach to evaluation has been developed which places the manufacturers.

3.3 Current situation

The Municipality of Trikala was recommended by a group of distinguished companies (of national and international renown) to host a large-scale pilot project that makes available to city residents services that utilise advanced technological achievements. Collaborators to the project are UNIXFOR, Vodafone, ENGIS by enstruct, SYNERGASIA S.A., Cisco, Egritos GROUP, Space Hellas, Viva Wallet, ParkGuru, e-trikala, ITM Intelligent, KAYKAS and SiEBEN.

The results of this collaboration in the Municipality are the following:

Free Wireless Network of Trikala is an initiative of the Municipality of Trikala in cooperation with e-trikala SA. Implementation began in October 2005 with the aim of providing free Internet access to all citizens. The entire commercial centre of the city is now covered by free wireless internet access. The wireless network has enhanced

38 Retrieved from: http://www.businessnews.gr/article/33066/interamerican-apologismos-toy-citymobil2-sta-trikala


existing city infrastructure, since it is necessary for the operation of the other applications, and it also offers additional security for users’ internet connections. Users can connect to the municipal wireless network quickly and easily in different ways, for example through their social media accounts. The Municipal Authority intends to use the data that the wireless network collects through the Marera application to inform residents about cultural events and activities in the Municipality and to help them enjoy their time in the city. Furthermore, in collaboration with the local Trade Association or other interested parties, business and consumer activity are promoted through targeted offers or other promotional activities. The Marera application, provided by SiEBEN, gathers information about the age and sex of the users, their location, the time they spend in the city centre and how often they connect to the free network. Marera Analytics uses database marketing, data mining and advanced analytic techniques for lead scoring³⁹ and allows the municipality to follow the network’s statistics in real time through fully functional dashboards. In this way, the municipality has access to data that can support data-driven decision-making. It is up to the municipality to share the information with the city's Trade Association, so that together they can create a different marketing approach for their market.

E-dialogos is an innovative open dialogue page. It was funded through the “Politeia” Programme of the “Greek Ministry of Interior, Public & Local Administration”, concerning the reform and modernization of Public Administration (FEK 30-01-2001), and by the “Region of Thessaly”.⁴⁰ The platform is a step towards e-governance and e-democracy. E-dialogos promotes the idea of e-citizenship. 
The purpose of the platform is to enable citizens, as well as those living and working in the Municipality of Trikala, to participate in planning and implementing their city's policies and actions. E-citizenship is a recent term in political science that describes the new ways in which citizens can engage in contemporary political activities. It is linked with romanticized ideals of deliberative democracy and “thick” citizenship, which according to Chadwick (2008) have very little to do with each other.

39 Info from official Marera website: http://www.marera.io/features (last visit: 17/5/2018)
40 Retrieved from: https://onlinepolitics.wordpress.com/edialogos/ (last visit: 15/5/2018)


E-citizenship is closely related to the Smart City phenomenon. The analysis of extensive datasets can have substantive effects: digital data can improve our understanding of urban dynamics and help plan interventions to improve city life (González-Bailón, 2013). Meanwhile, the Municipality of Trikala has the opportunity to collaborate creatively and productively with citizens by conducting Electronic Surveys, gathering Electronic Signatures and holding Electronic Consultations, with the ultimate goal of designing and implementing political actions. E-dialogos was nominated by the European Commission as a finalist project for the European eGovernment Awards 2009, announced during the 5th Ministerial eGovernment Conference, which took place on 18-20 November 2009 in Malmö, Sweden. The lessons learned from running the e-dialogos project were crucial. Vasilis Goulandris, who was responsible for the methodological design of e-dialogos, mentioned in a 2009 interview that such projects are not only technological; their philosophy and methodology are the elements that make them unique. The e-dialogos platform is no longer accessible, because it was built as a pilot project. E-politis came to “replace” e-dialogos, offering e-deliberation, e-polls and more.

Figure 3.2 Screenshot of the website: http://trikalacity.e-politis.gr/


Tele-care (τηλε-πρόνοια in Greek) is a project that has created a network for digital welfare, using telematics infrastructure (transmission of information over long distances), operating in the Municipality of Trikala. The platform includes the creation of a Health File in order to provide health and welfare services, using contemporary IT and communications technologies, to a selected number of citizens in need. It is based on the development of systems that monitor health indicators. This solution complements other support actions for vulnerable social groups, such as “Help At Home”, towards the ultimate goal of providing comprehensive primary healthcare services to these groups. Its actions can also be extended through the establishment of Preventive Medicine Centres (open to broader population groups, such as young people and athletes) and the development of systems that notify specific groups of residents about emergencies (e.g. residents with first aid and cardiopulmonary resuscitation training).

e-KEP (Automated Citizens Service Centre). Special ATM-style machines offer residents the option, at any time of the day or night, to request and print out municipal clearance certificates, civil register certificates and other related municipal documents quickly, simply and easily.

Figure 3.3 Image of the ATM-style certificate spot. Source: http://www.ert.gr/radiotileorasi/poiotita-zois-radiotileorasi/kan-to-opos-ta-trikala/


Residents will be identified via their Resident Card. The goal is for more sophisticated e-services to be activated soon, which will also allow residents to submit and pick up documents that need to be filed with the Municipal Authority. The petitions and related documentation will be communicated directly, through the electronic records, to the appropriate Municipal Directorate. Using the e-KEP, interested parties will be able to print out the official responses to their requests.

Mobile Check App. Residents can send their requests directly to the Municipal Authority through the Check App for mobile phones. This comprehensive application is available for free on Google Play and the App Store. Its basic function is to log and monitor the progress of residents' petitions. To use the application, users have to either create an account or log in with their social media accounts. The fast growth of social networking sites (SNS) allows users to be connected and has created a new generation of people, a new kind of society, who are enthusiastic about interacting, sharing and collaborating through these sites (Bello-Orgaz, 2016). E-citizens, by providing data, help their communities to evolve and, why not, to improve.

Figure 3.4 checkAPP

The application is linked to the “20000” comprehensive residents’ service platform and directs petitions straight to the competent Municipal Department. Demosthenis, the project behind the number “2000”, is a citizen service system for managing complaints concerning the Municipality of Trikala. Specialized personnel receive citizens' requests by telephone free of charge, by e-mail, or in person at the offices of the Demosthenis project. Moreover, the Check APP covers basic information needs by displaying announcements and events posted on the Municipality’s website. It also functions as a tourist guide, highlighting points of interest on a map and displaying handy information such as useful telephone numbers, pharmacies open late and gas stations. If this collaboration between citizens and City Hall continues, in a few years our experience of the city will be totally different from what we are used to. The central idea is that citizens report every little problem they face in everyday life. In turn, the city collects all the data, combines it, runs algorithms on it and works out ways to intervene in the system. Patterns emerge, and the city is able to react to those patterns, as Jennifer Palhka (2016) explains. Although the Check APP is useful and user-friendly, it has only 100 downloads on Google Play and 7 reviews, which indicates that it is not very popular among the citizens of Trikala.

Smart Parking System. A system has been implemented that allows the identification, imaging and monitoring of designated parking spaces in the city centre. Specialised sensors have been installed on the road surface of Othonos and Garibaldi streets, with each sensor corresponding to one discrete, delineated parking spot. The sensor provides feedback to the network’s controllers by sending a signal when the spot becomes occupied or unoccupied. Residents can be informed in real time about the availability of parking spots in the selected area, both through the parking app for smartphones and through signs that can be installed at central points around the city. Traffic control authorities also receive real-time information about instances of illegal parking. The application also offers the option to pay for parking.
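The one-sensor-per-spot design described above can be sketched as a small occupancy tracker. This is only an illustration of the idea: the class, its method names and the spot identifiers are hypothetical, and nothing here reflects the actual software used in Trikala.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParkingZone:
    """Tracks occupancy for a set of sensor-equipped parking spots."""
    occupied: Dict[str, bool] = field(default_factory=dict)

    def on_sensor_signal(self, spot_id: str, is_occupied: bool) -> None:
        """Record the latest signal sent by the sensor of one spot."""
        self.occupied[spot_id] = is_occupied

    def free_spots(self) -> List[str]:
        """List the spots currently reported as unoccupied."""
        return sorted(spot for spot, taken in self.occupied.items() if not taken)


if __name__ == "__main__":
    zone = ParkingZone()
    zone.on_sensor_signal("othonos-1", True)     # hypothetical spot identifiers
    zone.on_sensor_signal("othonos-2", False)
    zone.on_sensor_signal("garibaldi-1", False)
    print(zone.free_spots())
```

Both the mobile app and the street signs mentioned in the text would read from the same `free_spots()` view, which is why keeping a single source of occupancy state is a natural design choice.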


Figure 3.5 ParkGuru Application

ParkGuru provides smart parking management covering 45 general and disabled parking spots; it allows residents to locate unoccupied parking spots, to be charged and to pay for parking. It also facilitates the management of reward schemes and monitoring using advanced mobile technologies. The application covers other Greek cities, too. It has a 4.4-star rating on the Play Store and more than 10,000 downloads. As the screenshots show, it covers only two main streets of Trikala and does not include private car parks.

Integrated Intelligent Transport System. This project has a direct impact on the everyday life of the citizens of Trikala. City traffic data is now managed with the help of inductive loops. The fleet and the city bus network are monitored, providing immediate information on traffic conditions. The system includes smart bus stops with digital signs that inform citizens about bus routes and arrival times, as well as digital signs showing the available parking spots. Citizens can organize their daily routines and enjoy “good methods” on a par with European standards.


Traffic lights operation monitoring system. Electronic controllers installed at the city’s intersections constantly monitor the operation of the traffic lights, report any potential breakdown, provide information about light bulb malfunctions per direction and signal (red – orange – green), and notify the control centre online or send a text message to the competent employee. This application can go even further by interconnecting smart traffic lights and signals across the traffic grid in order to gain the maximum possible amount of information and provide informed and fast services. This requires the use of real-time Big Data analytics (Nuaimi et al., 2015).

Environmental Conditions Monitoring System. Energy and natural resources are expensive to produce, distribute and manage. They also have a huge environmental impact, which grows larger when they are used unwisely. Big Data and analytics aid in planning for peak-load variations across physical locations and times of day so that resources are provisioned appropriately, and analytics can help make real-time adjustments to the distribution of energy based on actual demand on the ground (Patil, 2016). Such a system has been installed in the building of the Regional Unit of Trikala. Using special equipment for environmental readings (such as the concentrations of air pollutants and particulate matter, and noise levels), the quality of the atmosphere can be evaluated and any potential impact on public health can be assessed. The application also displays real-time standardised indexes of environmental quality that allow comparative evaluations (benchmarking), real-time alerts, and the identification of trends that could or should lead to specific measures.

Smart Lighting System. A system has been implemented to manage municipal street lighting; it has achieved energy savings of over 60% compared to conventional lighting systems. Mr. Nikos Lamprogeorgos (2017), during the official presentation of the progress of the Smart Trikala project,⁴¹ said that calculations showed energy savings reaching 70%. Existing conventional lighting systems have been replaced by new LED lighting systems along a representative street of the intra-urban street network (Othonos Street). The goal, according to the plan, is to replace all existing sodium lamps in street lighting with Light Emitting Diodes (LEDs) by 2030.

41 Official progress presentation of Smart Trikala: https://www.youtube.com/watch?v=c_JgB0ICpsE


Furthermore, a wireless control system has been installed, which offers the capacity for early malfunction detection, “smart” intervention scheduling and dynamic lighting adjustment (when, where and to the extent needed), in order to achieve maximum energy savings and to improve visibility for drivers, cyclists and pedestrians. Smart lighting will be provided by replacing 24 lighting fixtures with LEDs equipped with motion sensors. These lights optimise energy use through a smart function: they activate when they detect motion. The smart lighting also detects sunrise and sunset. This means that streetlights are no longer switched on and off according to a schedule but according to actual needs.

Fully networked lighting management systems generate considerable amounts of information (in other words, Big Data) and can collect and compare metrics, such as energy usage per square foot and operating hours, among the interconnected devices. This is a valuable business intelligence benefit that provides comprehensive visibility into key performance indicators across an entire portfolio. The keys are the ability to access, aggregate and normalize the data. Even when data is available, it must be normalized; otherwise operational blindness occurs. Normalized data provides a standardized method for analyzing metrics and highlighting how operations can be improved across a range of building types, locations and sizes, while still providing drill-down visibility into road-specific operational data (DIGITAL LUMENS, 2010). Cell phones, cloud data storage, embedded intelligence and “Big Data” from better, cheaper sensors are beginning to deliver unprecedented control, responsiveness and efficiency in lighting through features that did not exist in legacy lighting systems, such as dimming and hue change (McClellan and Jimenes, 2018).
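The normalisation of lighting metrics discussed above can be illustrated with a minimal sketch. It assumes hypothetical per-street readings (energy in kWh, number of fixtures, operating hours); none of these names or figures come from the Smart Trikala project. Dividing energy by fixture count and operating hours yields a figure, average watts per fixture, that can be compared across streets of different sizes.

```python
from dataclasses import dataclass


@dataclass
class StreetReading:
    street: str
    energy_kwh: float       # energy used over the period (hypothetical value)
    fixtures: int           # number of lamps on the street
    operating_hours: float  # hours the lamps were switched on in the period


def watts_per_fixture(r: StreetReading) -> float:
    """Normalise raw energy use to the average power drawn per fixture, in watts."""
    return 1000 * r.energy_kwh / (r.fixtures * r.operating_hours)


# Illustrative readings only; they are not measurements from Trikala.
readings = [
    StreetReading("Street A (sodium)", energy_kwh=1200.0, fixtures=40, operating_hours=300.0),
    StreetReading("Street B (LED)", energy_kwh=216.0, fixtures=24, operating_hours=300.0),
]

for r in readings:
    print(f"{r.street}: {watts_per_fixture(r):.0f} W per fixture")
```

Without the normalisation step, Street A would simply look "more expensive" because it has more lamps; the per-fixture figure is what makes the two streets comparable.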
Smart Lighting is an efficient way to promote sustainable energy in an urban area. Cities can extract new value from their infrastructure while gathering essential data to document progress toward sustainable practices. Barcelona is one example of another city that uses Smart Lighting.

The comprehensive Geographic Information System (GIS) includes Business Intelligence (BI) with broad capabilities, providing management at every level with tools to facilitate well-informed decision-making for the Municipality, as well as giving residents easy access to the data. It includes, among other things, apps for Urban Planning data, Urban Planning Archives, Technical Projects, Municipal Property, Signage, Traffic Lights, Street Lighting,


and points of interest. Furthermore, as part of a pilot project, waste collection routes will be analysed and optimised.

The Cisco Smart+Connected Digital Platform (CDP) has been installed. A smart city gathers Big Data every day. Analysing and processing this information is valuable for the city’s future strategies. Its use consists of the collection, the analysis and the visualization of the data. The CDP is a comprehensive IT system that utilises the advantages of the Internet of Things (IoT) and manages different surveillance and information applications, while also feeding into third-party systems through application programming interfaces (APIs). The platform collects, stores, normalises and visualises the data produced by the above structures and applications and makes them available for analysis to parties interested in using them to benefit the city’s residents and businesses. Data analytics converts data into knowledge and exposes hidden patterns in the dataset. According to IBM, data analytics provides advantages that fall into four basic categories:
● Business Intelligence
● Performance Management
● Predictive analytics
● Analytical Decision Management

Next comes data visualization. The main diagram types used in a Smart City concept are charts for the relationship between data points, comparison between total scores, real-time monitoring, presentation of parts of a whole, text analysis and map representation.


Figure 3.6 Street Lights Visualization. Source: https://trikalacity.gr/smart-trikala/

“Smart City” Control Centre. A control centre for all “Smart City” services was established on the ground floor of City Hall.

Figure 3.7 Smart Trikala Control Room. Source: https://trikalacity.gr/smart-trikala/


Terminals were installed to monitor the following systems:

1. The Cisco Smart+Connected Digital Platform is designed to display the data it collects on one admin screen.
2. GIS displays spatial – urban planning data and points of interest in the Municipality of Trikala.
3. Traffic light operation monitoring system. It offers online monitoring of malfunctions and blown light bulbs at the city’s intersections that are regulated by traffic lights.
4. Municipal vehicle traffic recording system.
5. Terminal for monitoring the operation of wireless network hubs for free Wi-Fi access.
6. Solenoid valves monitoring and regulating system – Municipal Water and Sanitation Utility.
7. Recording and monitoring of the progress of residents’ petitions.
8. Posting of Municipality of Trikala open data.

Future Applications

Smart Waste Management. Using sensors, the waste collection centre can be notified in real time about the level of waste in trash cans. The goal is to optimise the routes of waste collection vehicles, especially in the commercial centre.

Traffic Conditions Analysis through CCTV. The CCTV cameras that will be installed to manage parking will also serve to monitor and analyse traffic conditions in the city. In this way, the authorities will be able to manage traffic effectively and react immediately to unforeseen incidents that cause delays in road traffic. The method of implementing this service will be discussed, taking into consideration the opinion of the Personal Data Protection Authority. Also to be investigated is the possibility of implementing a system to monitor vehicle access onto roads where access is not permitted (e.g. pedestrian walkways) through license plate recognition.

Controlled Parking. Using video analytics technologies, the system monitors parking availability, illegal parking in sensitive areas (e.g. disability parking locations, pedestrian crossings, double-parking), and compliance with any eventual paid parking systems, while ensuring protection of personal


data in full compliance with current legislation. The CCTV cameras are installed on the street lighting columns and each camera surveys multiple parking spots.
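The Smart Waste Management idea described under Future Applications (bins report their fill level and collection routes are optimised accordingly) can be sketched as follows. This is only an illustration under stated assumptions: the bin names, coordinates, the 75% threshold and the greedy nearest-neighbour ordering are all hypothetical and are not taken from the Trikala plans, which do not specify a routing method.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Bin:
    bin_id: str
    x: float     # position in arbitrary map units (hypothetical)
    y: float
    fill: float  # reported fill level, from 0.0 (empty) to 1.0 (full)


def bins_to_collect(bins, threshold=0.75):
    """Keep only the bins whose reported fill level reaches the threshold."""
    return [b for b in bins if b.fill >= threshold]


def greedy_route(depot, bins):
    """Order bins by repeatedly driving to the nearest bin not yet visited."""
    route, position, remaining = [], depot, list(bins)
    while remaining:
        nearest = min(
            remaining,
            key=lambda b: hypot(b.x - position[0], b.y - position[1]),
        )
        route.append(nearest.bin_id)
        position = (nearest.x, nearest.y)
        remaining.remove(nearest)
    return route


# Illustrative sensor readings only.
readings = [
    Bin("market-1", 1.0, 1.0, 0.90),
    Bin("park-2", 5.0, 5.0, 0.20),
    Bin("square-3", 2.0, 3.0, 0.80),
    Bin("station-4", 0.5, 4.0, 0.95),
]
route = greedy_route((0.0, 0.0), bins_to_collect(readings))
print(route)
```

A greedy ordering is easy to follow but does not guarantee the shortest route; a production system would more plausibly use a dedicated vehicle-routing solver.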


Chapter 4

Distress about Big Data

Big Data finds application in several domains, such as astronomy and other e-sciences, that do not use personal data, so privacy is not an issue there. Privacy issues arise when Big Data is applied to more sociological domains, like the , consumer and business analytics and governmental surveillance (Uzonwanne et al., 2014). The way Big Data is used by companies and academics gives it great power over societies. This can be hard to notice, because the change takes a long time to become obvious and rapid technological progress does not allow everyone to keep pace. This chapter is about the areas in which data do affect communities and individuals: privacy, security, prediction possibilities and decision-making processes.

4.1 Concerns about “Big Data” Social Research

Up to this point, only the positive effects of “Big Data” entering the social sciences have been discussed; of course, there is another side to the subject as well. According to Carolin Gerlitz, in an interview given for the book “The Datafied Society” (2018), what is sometimes not taken into consideration is that digital media, including social media, are shaped by standardization: users can only do what the platforms allow them to. All the Social Media features (likes etc.) give the user the


opportunity to react with prestructured emotions in the front end while at the same time creating similarly prestructured information (data points). In addition, digital reactions on social networking sites (SNS) are not only standardized but also limited. We can therefore conclude that the technology exists to aggregate the data, in this case the users’ digital reactions, but we still have difficulty giving meaning to them, because each user can like, share or comment on the same post in a totally different context. The researcher’s preliminary challenge is to define the research data: researchers need to understand what they are counting before reassembling data points into new metrics, in order to be able to work with social media data. This process should be guided by theory, a research question or prior explorations. All of the above shows what makes this very interesting collaboration between sciences challenging.

The scope of computational social science tools and the explosion in the corpus of data, free of the ethical restraints placed on researchers, raise serious questions about how those who control the data and the corresponding infrastructure will influence society (Dorasamy and Pomazalova, 2017, with examples). Social computing could potentially be used to “exploit”, in a way, internet users. Although Social Media Big Data is a powerful mechanism that helps social scientists in their work and traces patterns in human communication, the nature of social and cultural cooperation is complex and should be examined from various perspectives, which makes human empathy necessary to observe the little things that machines fail to. This is both a significant opportunity and a huge challenge that social scientists need to overcome.

Maybe we have reached the point where we should test all previous sociological practices and conclusions, to confirm or refute everything we consider common knowledge in the social sciences, and examine whether the theories have actual practical implications in the new social structures that are emerging.

4.2 Privacy

According to Warren (1890), "Privacy" is defined as a person’s right to be left alone when they wish. In this regard, protection of privacy is understood as the combination of "solitude" and "non-intrusion". It therefore acquires two dimensions. One refers to the private thinking, ownership and actions of the individual, and the other relates to other people's data and how they affect us. In everyday life, this kind of privacy is respected and protected through social rules. Similarly, "privacy" needs to be defined and delimited in the digital world as well, with rules similar to those of the physical world. This approach is therefore mainly about the social dimension of privacy. The privacy of data is a huge concern, and one that increases in the context of Big Data (Agrawal et al., 2012).

4.2.1 Privacy Concerns

When surfing online, it is quite usual to come across the word “Privacy”, which is hardly surprising if we consider all the personal data we share. According to European law⁴², personal data is any information that relates to an identified or identifiable living individual. Different pieces of information which, collected together, can lead to the identification of a particular person also constitute personal data⁴³. Examples include name, surname, home address, e-mail address and location data. We, the internet users, most of the time do not pay as much attention as we should in order to be informed and protected online. Privacy is all about not collecting data, or at least exercising some control over who knows what. Big Data, on the other hand, is about collecting, storing and analyzing as much information (data) as possible. That puts Big Data and personal privacy on a collision course (Reno, 2012).

Websites commonly take an audience-based approach, using their users’ personal information and preferences to offer personalized products and services based on big data analytics. Think about the recommendation feature on Netflix and the personalized advertisements on Google and Facebook. In such cases Big Data is used to create and analyze profiles of us (Uzonwanne et al., 2014), which means that we are studied at a very deep level. This applies to most of the well-known digital brands. Jer Thorp (2016) says that all Facebook users are involved in a transaction in which they donate their data to Facebook; Facebook in turn sells those data, and in return the users get the service that allows them to post pictures and comment to their friends. It is not a free service: users pay for it by giving Facebook access to their data.

42 Definition of personal data, European Commission, retrieved from: https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en
43 Retrieved from: https://orra.rutgers.edu/gcpr


The screenshots below show how Facebook Ads work:

Figure 4.1 How Facebook decides what ads to show

In brief, Facebook asks “permission” to use our data (location, active hours, likes etc.) in order to optimise the ads we see. This makes it possible to maximize the profits of the advertised business and to shape our consumer consciousness in specific ways.

Newell and Marabelli (2015) argue that the practice of combining personal data sources can reveal very personal and sensitive information that is at risk of being released. Once the data is collected, it is hard to guarantee that it will only be used for a specific purpose (Reno, 2012). Such intense engagement with personal data raises many concerns related to privacy, identity theft, illegal discrimination, unjust classification (Günther, Rezazade Mehrizi, Huysman, & Feldberg, 2017), and even “exploitation of the vulnerable” (Newell & Marabelli, 2015). The previous paragraphs frame the privacy issue for further discussion and make it obvious that there are also moral reasons for protecting personal data and for providing direct or indirect control over access to those data. Van den Hoven (2008) distinguished some of the moral reasons why privacy should be protected, all of which call for limiting and constraining access to personal data and providing individuals with control over their data:


● Prevention of harm: Unrestricted access by others to one's passwords, characteristics, and whereabouts can be used to harm the data subject in a variety of ways.
● Informational inequality: Personal data have become commodities. Individuals are usually not in a good position to negotiate contracts about the use of their data and do not have the means to check whether partners live up to the terms of the contract. Data protection laws, regulation and governance aim at establishing fair conditions for drafting contracts about personal data transmission and exchange and at providing data subjects with checks and balances and guarantees for redress.
● Informational injustice and discrimination: Personal information provided in one sphere or context (for example, health care) may change its meaning when used in another sphere or context (such as commercial transactions) and may lead to discrimination and disadvantages for the individual.
● Encroachment on moral autonomy: Lack of privacy may expose individuals to outside forces that influence their choices.

4.3 Information Privacy

In what computer scientists call "Information Privacy" (van den Hoven et al., 2016), the term "privacy" is defined as the right to determine what personal information is available to whom and to what extent. This definition thus focuses on the absolute control that individuals have over their personal data, their conversations and their actions. The data owners choose what information they will reveal and to whom, and this is their own individual responsibility. This approach therefore focuses more on personal data as property and hence acquires a technical orientation. It is worth observing that these approaches complement each other. If we take into account what has been mentioned in this chapter, it is feasible to approach, to a large and satisfactory degree, the concept of "Privacy in the Digital Environment". It comprises, on the one hand, the protection of privacy in the online environment, that is, the protection of individuals’ multiple digital identities, their online/electronic activities, their communication, their copyrights and the electronic devices through which they enter this environment, and, on the other hand, the protection of their personal data, sensitive or not⁴⁴.

Additionally, the requirements for protecting "privacy" in various information systems, as defined and used broadly by the scientific community, which obviously also apply to "Big Data" systems, are the following:
Confidentiality: which concerns protection from disclosure of data to unauthorized entities.
Integrity: which concerns the protection of data from unauthorized insertion, modification or deletion.
Availability: which concerns protection against data being unavailable to their owner when the owner needs them.
Authenticity: which concerns securing the identity of each entity involved in any action.
Non-Repudiation: which concerns protection against an entity denying that it carried out a particular activity in which it in fact participated.
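As an illustration of two of these requirements, integrity and authenticity, the following minimal sketch uses Python's standard `hmac` and `hashlib` modules to attach a keyed tag to a message and to detect later modification. The key and the sensor message are hypothetical values chosen for the example; the sketch is not drawn from any system described in this thesis.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real system would load it from secure storage.
SECRET_KEY = b"demo-key-not-for-production"


def sign(message: bytes) -> str:
    """Compute an HMAC-SHA256 tag that depends on both the key and the message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str) -> bool:
    """Check the tag in constant time; any change to the message makes this fail."""
    return hmac.compare_digest(sign(message), tag)


reading = b"sensor=air-quality;pm10=18"
tag = sign(reading)

print(verify(reading, tag))                        # True: message is unchanged
print(verify(b"sensor=air-quality;pm10=81", tag))  # False: message was altered
```

Because both parties hold the same key, a keyed tag of this kind does not provide non-repudiation; that requirement generally calls for digital signatures based on a private key held by one party only.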

4.4 Profiling

At this point, Big Data technology is able to provide a safer society through tracking systems, cameras everywhere and even motion sensors able to identify every move we make. These possibilities can affect an individual’s privacy, but the exponential diffusion of tracking software embedded in social networks, together with the sensors and cameras in many other digital devices, suggests that it will be hard for governments (or organizations) to regulate how responsibly individuals use the technologies that enable tracking (Newell & Marabelli, 2015). There is a strange feeling among people that they are being “followed” or heard. To use Jennifer Pahlka’s⁴⁵ words, it is like “this device in my hand knows who I am, it can somewhat anticipate what I want and where I am going”. More or less, this is the essence of profiling. Profiling is an inductive way to generate knowledge, and Big Data may be used in profiling the user (Hildebrandt 2008). Profiling is based on past behavior and it mostly helps the commerce

44 Retrieved from: https://en.wikipedia.org/wiki/Information_privacy (last visit: 1/5/2018)
45 Jennifer Pahlka is Founder and Executive Director, Code for America: https://www.codeforamerica.org/people/jennifer-pahlka


business. However, profiling might also be used by organizations or possible future governments that have discrimination against particular groups on their political agenda, in order to find their targets and deny them access to services, or worse. All this information might be used to profile citizens and to base decisions upon such profiles (van den Hoven et al., 2016). Social media data are frequently used in profiling procedures. A big data research project conducted in November 2014 by Gerwin van Schie, Irene Westra & Mirko Tobias Schäfer concludes that online profiles are better seen as representations of people, not the people themselves, and that, depending on the social medium being investigated, many users may provide fake or false information. Receiving informed consent from the whole population of a social network or service is therefore unrealistic. The results imply that profiling might not be as successful as it promises, because it often depends on questionable data.

This leads us back to privacy issues. Nowadays, staying anonymous is nearly impossible. Everything we do on the internet can be connected to us and our activities in real life. Whoever has this data is able to connect it and create a story about us. We are quickly moving to the point where governments and companies will have access to all the data (not just social media data) and use it to profile everybody. This may sound like a conspiracy theory, but it raises valid concerns about how populations will react. This kind of surveillance also poses psychological dangers, because being worried about one’s everyday choices may have unhealthy consequences. The reduction of privacy through digitization is an important matter for discussion between citizens and their representatives because, despite the positive results this technology might have, it can “judge” wrongly.
These privacy violations will not, in a way, stop at location data but will reach the level of the individual. Privacy is a huge concern, not just for companies dealing with big data but for society as a whole. The dangers of privacy violation are more or less the same as always: breaches, discrimination and unfair analysis. The privacy issue with Big Data is the possibility and potential to know more about people than they know about themselves (Weathington, 2017). Kenneth Cukier⁴⁶ offers the following reflection: what is at stake in the Big Data era is not privacy, which was a small-data-era problem. Big Data can erode the idea of personal responsibility.

46 Kenneth Cukier talk “Big Data is Better Data”. Retrieved from: https://www.ted.com/talks/kenneth_cukier_big_data_is_better_data


The real challenge is to safeguard free will, moral choice, human volition and human agency, which are cornerstones of the modern worldview. The age of big data will require new rules to safeguard the sanctity of the individual (Cukier and Schonberger, 2013). Big Data raises major questions about a loss of human autonomy, which arises from deterministic knowledge being applied to human behaviour (Schroeder, 2014).

4.5 Europe for Data Privacy

As already outlined, data privacy is a major topic of discussion. Increasingly sophisticated algorithms emerge and delve into our habits. In theory, you can change your privacy settings and share as little data as possible, but the vast majority of users may not take the extra time to change the default settings. This claim derives from a small-scale survey I conducted among my co-workers, friends and people I randomly met on the street. I asked 86 people aged 18-50 whether they change the privacy settings on their social media accounts and whether they have ever read the privacy policy of the websites they visit. The majority, about 72% of the respondents (62/86), replied not only that they do not change their privacy settings or read privacy policies, but also that it does not matter to them whether their personal information is used by the websites and the companies.

The following graphs give a more detailed picture of the answers given:


Figure 4.2 Sex of the respondents

Figure 4.3 Age of the respondents

The age and sex of the respondents are important because different groups have different points of view on technology-related matters, owing to their different levels of familiarity with technology.

Figure 4.4 Do you care if your personal data is being used?


Figure 4.5 Do you change your privacy settings?

Almost 35% of the respondents tend to care about their data being used, but only 25% spend time changing their privacy settings. This suggests a possible lack of sufficient motivation and of understanding of what those settings are about.

Of those who answered that they do change their privacy settings, 16/24 were women and 7/16 were aged 31-40.
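The shares quoted in this section can be recomputed from the raw counts with a few lines of code. The counts 62 and 86 are taken from the text; the figure of 24 respondents who change their settings is an assumption, inferred as 86 - 62 and consistent with the "16/24" quoted above. The percentages in the text are rounded and partly read from the figures, so small differences are to be expected.

```python
def share(count: int, total: int) -> float:
    """Return count/total as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


TOTAL = 86                  # respondents interviewed (from the text)
NO_CHANGE = 62              # do not change settings (from the text)
CHANGE = TOTAL - NO_CHANGE  # assumption: the remaining 24 do change them
WOMEN_AMONG_CHANGE = 16     # from the "16/24" quoted in the text

print(share(NO_CHANGE, TOTAL))            # 72.1
print(share(CHANGE, TOTAL))               # 27.9
print(share(WOMEN_AMONG_CHANGE, CHANGE))  # 66.7
```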


In an attempt to guarantee citizens’ data privacy and fill the legislative gap, the European Union is introducing new legislation on their behalf. Within the framework of the European Union, the European Data Protection Board was established. The Board is responsible for ensuring the application of data protection law across the EU and effective cooperation among Data Protection Authorities.

The diagram below shows the composition of the European Data Protection Board:

Figure 4.6 Composition of the European Data Protection Board. Source: https://www.i-scoop.eu/gdpr/european-data-protection-board-edpb/

In April 2016, the General Data Protection Regulation (GDPR) was approved by the EU Parliament; it replaces the former Data Protection Directive 95/46/EC. The GDPR concerns all online exchanges and businesses and, from its date of application (25/05/2018), introduces seven new regulations/rights⁴⁷:

47 More information at: europa.eu/dataprotection


● A right to receive clear and understandable information about who is processing your data, what data they are processing and why they are processing it.
● A right to request access to the personal data an organisation has about you.
● A right to request one service provider to transmit your personal data to another service provider, e.g. when switching from one internet social network to another, or switching to another cloud provider.
● A right ‘to be forgotten’. You will be able to ask for your personal data to be deleted if you no longer want it to be processed and there is no legitimate reason for a company to keep it. For example, when you type your name into an online search engine and the results include links to an old newspaper article about a debt you paid long ago, you will be able to ask the search engine to delete the links (unless you are a public figure or the general public’s interest in accessing the information outweighs your interest in removing the article).
● In cases when companies need your consent to process your data, they will have to ask you for it and clearly indicate what use will be made of your personal data. Your consent must be an unambiguous indication of your wishes and be provided by an affirmative action by you. So, the companies won’t be able to hide behind long legalistic terms and conditions that you never read.
● If your data is lost or stolen, and if this data breach could harm you, the company causing the data breach will have to inform you (and the relevant data protection supervisory authority) without undue delay. If the company doesn’t do this, it can be fined. Recent attacks, such as WannaCry, Meltdown and Spectre, or the case show how important this new right is.
● Better protection of children online. Children may be less aware of the risks and consequences of sharing data and are less aware of their rights. This is why any information addressed specifically to a child will need to be adapted to be easily accessible, using clear and plain language.


Chapter 5

Conclusions - Limitations - Future Research

Over the last few years, the arrival of "Big Data" has brought rapid developments in various areas of modern life, such as commerce, science, health, research, business and domestic security. It is a major trend that is not yet fully explored or defined, which makes its significance difficult to comprehend. Data now flows from all over the world in every possible direction. We can confirm the impact Big Data has on society by observing the increasing production of data and the new areas in which it finds application. Big Data is used by the largest part of the scientific and business community and brings more direct and documented results, because the numbers are used to better evaluate both things and procedures. Indeed, innovative technologies have introduced new ways to efficiently manage the vast amount of data generated almost in real time from multiple sources, such as IoT sensors, social media, mobile and/or fixed devices, satellites, positioning devices (GPS), etc. The purpose of all of these technologies is to extract valuable information from this "Information Chaos" (Gantz and Reinsel, 2011) and to identify new associations or new, unexpected uses of these data, some of which are social science research, health projects, business intelligence, disaster relief, empowering active citizenship and making Smart Cities more efficient and sustainable.

The commercial impacts of Big Data have the potential to generate significant productivity growth for a number of vertical sectors. Big Data presents an opportunity to create unprecedented business and organizational advantages and better service delivery through automated decision-making and deeper data analysis. We are moving into a new era of governance and democracy that can be based on active citizenship, where people become involved in the governmental process, taking responsibility for their government’s actions and for whatever else is happening, either globally or locally. If this could be achieved, it would hold great promise for the future. can be helpful for organizations to redesign their policies to address public issues (Garg and Chatterjee, 2014). Mobile applications like the Check APP of Smart


Trikala are moving in this e-government direction. By analyzing the data provided, cities can discover interesting insights that can help them operate smoothly. The smart lighting, parking and monitoring systems produce data that assist in reducing energy consumption, regulating traffic congestion and monitoring environmental changes (a major problem of our time). These automated analytics make life in cities easier because citizens have real-time information and can also contribute to further improvements.

This study concentrated on the impact of Big Data on society and, mostly through the literature review, it becomes obvious that Big Data affects many sectors. 86 people between 19 and 44 years of age were interviewed about the way they feel and act regarding their data privacy, and the results showed that most of them were unaware of or indifferent to the matter. Since the legal framework around data privacy is changing, it would be interesting to carry out a more extensive survey on the topic. A sample could consist of teenagers and young adults, because they are more active internet users. Smart cities are a concept that, from my point of view, has great potential for using Big Data. However, the only case examined was that of Trikala, and there may not be sufficient evidence of the use of Big Data. One suggestion is to take Big Data results from various cities and analyze them in depth in order to examine whether they are actually used and to what extent.

Even if the effects of "Big Data" and "Big Data Analytics" applications are mostly positive for various sectors of the modern lifestyle, serious issues relating to systems’ safety and personal data privacy arise and should be taken into serious consideration. A regulatory framework for big data is essential. That framework must be constructed with a clear understanding of the ravages that have been wrought on personal interests by the reduction of information to data, its centralization, and its expropriation. The results of the small-scale survey conducted for the purposes of this dissertation lead to the conclusion that people are not as informed as they should be, usually do not pay attention to what information they give away to service providers, and are not careful. This could be a major problem in the case of a breach or a targeted promotion. This concern led to the new GDPR from the European Union, in an attempt to protect personal data and, as Cukier (2014) suggests, even free will. The GDPR is about correcting and supplementing previous legislative gaps and creating an ethic regarding personal data usage.


In conclusion, there is much more to discover and discuss about Big Data. There is no question about whether Big Data is important, but there are a lot of questions about ethics and usage. Big Data is not just a buzzword, and the difficulty in defining the term is due to its wide range of applications.


REFERENCES AND BIBLIOGRAPHY

"Big Data Revolution" - PBS Documentary, 2016. https://www.youtube.com/watch?v=bIY3LUZ7i8Y&index=9&list=PLFoEDFL9w8RXOUZSSKnpjdWFL7UyMvvgV

Abbott, A., 2000. Reflections on the Future of Sociology. Contemporary Sociology 29, 296.

Agrawal, D., Bernstein, P., Bertino, E., Davidson, S., Dayal, U., Franklin, M., ... Widom, J., 2012. Challenges and Opportunities with Big Data: A white paper prepared for the Computing Community Consortium committee of the Computing Research Association. http://cra.org/ccc/resources/ccc-led-whitepapers/

Anderson, C., 2008. The end of theory. Wired, 16. Retrieved from: http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

Anderson, J., & Rainie, L., 2012. Main Findings: Influence of Big Data in 2020. Retrieved from: http://www.pewinternet.org/2012/07/20/main-findings-influence-of-big-data-in-2020/

Agre, P.E., 1994. Surveillance and capture: Two models of privacy. The Information Society 10, 101–127.

Babu, T., Swathi, P., 2018. Internet of Things (IoT) & Big Data Analytics for Smart Cities-A Case Study. SSRN Electronic Journal.

Bello-Orgaz, G., Jung, J.J., Camacho, D., 2016. Social big data: Recent achievements and new challenges. Information Fusion 28, 45–59.

Bhadani, A., Jothimani, D., 2016. Big data: Challenges, opportunities and realities. In Singh, M.K., & Kumar, D.G. (Eds.), Effective Big Data Management and Opportunities for Implementation (pp. 1-24). Pennsylvania, USA: IGI Global.

Bibri, S.E., 2018. Transitioning from Smart Cities to Smarter Cities: The Future Potential of ICT of Pervasive Computing for Advancing Environmental Sustainability. The Urban Book Series Smart Sustainable Cities of the Future 535–599.

Blazquez, D., Domenech, J., 2018. Big Data sources and methods for social and economic analyses. Technological Forecasting and Social Change 130, 99–113.

Boyd, D., & Crawford, K., 2012. Critical Questions for Big Data. Information, Communication & Society, 15(5), 1–5. https://doi.org/10.1126/science.1243089

Boyd, D.M., Ellison, N.B., 2007. Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication 13, 210–230.

Brynjolfsson, E., Hitt, L.M., Kim, H.H., 2011. Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? SSRN Electronic Journal.

Cave, A., 2017. What Will We Do When The World's Data Hits 163 Zettabytes In 2025? Forbes. URL: https://www.forbes.com/sites/andrewcave/2017/04/13/what-will-we-do-when-the-worlds-data-hits-163-zettabytes-in-2025/#483b4058349a

Chadwick, A., Howard, P.N., 2008. Routledge handbook of internet politics. Routledge.

Chen, C.P., Zhang, C.-Y., 2014. Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Information Sciences 275, 314–347.

Chen, H., 2009. AI, E-government, and Politics 2.0. University of Arizona.

Chen, H., Chiang, R. H. L., & Storey, V. C., 2012. Business Intelligence and Analytics: From Big Data to Big Impact. Management Information Systems Quarterly, 36(4), 1165–1188. https://doi.org/10.1145/2463676.2463712

Chen, M., Mao, S., Liu, Y., 2014. Big Data: A Survey. Mobile Networks and Applications 19, 171–209.

Cleveland, W.S., 2014. Data science: An action plan for expanding the technical areas of the field of statistics. Statistical Analysis and Data Mining: The ASA Data Science Journal 7, 414–417.

Cox, M., Ellsworth, D., n.d. Application-controlled demand paging for out-of-core visualization. Proceedings. Visualization '97 (Cat. No. 97CB36155).

Doherty, P., 2014. Smart Cities Big, Infinite Data. Retrieved from https://www.linkedin.com/pulse/20141123151132-11775600-smart-cities-big-infinite-data/

Dorasamy, N., Pomazalová, N., 2016. Social Impact and Social Media Analysis Relating to Big Data. Data Science and Big Data Computing 293–313.

EDialogos: Ελληνικό χρώμα στα ευρωπαϊκά βραβεία «e-gov» [Greek colour at the European "e-gov" awards] | Kathimerini. (n.d.). Retrieved from: http://www.kathimerini.gr/74727/article/texnologia/diadiktyo/edialogos-ellhniko-xrwma-sta-eyrwpaika-vraveia-e-gov

Falconer, G., Mitchell, S., 2012. Smart City Framework; A Systematic Process for Enabling Smart+Connected Communities. Cisco.

Fan, W., Wallace, L., Rich, S., Zhang, Z., 2006. Tapping the power of text mining. Communications of the ACM 49, 76–82. doi:10.1145/1151030.1151032

Firican, G., 2017. The 10 Vs of Big Data. Retrieved from: https://tdwi.org/articles/2017/02/08/10-vs-of-big-data.aspx

Foster, I., 2017. Big Data and Social Science: a Practical Guide to Methods and Tools. CRC Press, Taylor & Francis Group.

Fugate, C., 2012. The state of FEMA

Gandomi, A., Haider, M., 2015. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management 35, 137–144. doi:10.1016/j.ijinfomgt.2014.10.007

Gantz, J., Reinsel, D., 2011. Extracting Value From Chaos. IDC IVIEW. Retrieved from: https://www.emcgrandprix.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf

Garg, Y., Chatterjee, N., 2014. Sentiment Analysis of Twitter Feeds. Big Data Analytics Lecture Notes in Computer Science 33–52.

Gohar, M., Muzammal, M., Rahman, A.U., 2018. SMART TSS: Defining transportation system behavior using big data analytics in smart cities. Sustainable Cities and Society 41, 114–119.

Gold, M.K., 2012. Debates in the digital humanities. University of Minnesota Press.

Golder, S.A., Macy, M.W., 2014. Digital Footprints: Opportunities and Challenges for Online Social Research. Annual Review of Sociology 40, 129–152.

González-Bailón, S., 2013. Social science in the era of big data. Policy & Internet 5, 147–160.

Günther, W.A., Mehrizi, M.H.R., Huysman, M., Feldberg, F., 2017. Debating big data: A literature review on realizing value from big data. The Journal of Strategic Information Systems 26, 191–209.

Hammer, C., Kostoch, D., Quiros, G., STA Internal Group, 2017. Big Data: Potential, Challenges, and Statistical Implications. IMF Staff Discussion Note.

Hayes, W., 2015. The Dark Side Of Big Data [WWW Document]. Forbes. URL https://www.forbes.com/sites/willhayes/2015/09/14/the-dark-side-of-big-data/#6bfb5c173d1d

Hildebrandt, M., 2008. Defining Profiling: A New Type of Knowledge? Profiling the European Citizen 17–45.

Hill, K., 2016. In depth: Big data, smart cities. Retrieved from https://www.rcrwireless.com/20160210/internet-of-things/in-depth-big-data-smart-cities-tag23-tag99

Hoven, J. van, 2008. “Information technology, privacy, and the protection of personal data”, in Information technology and moral philosophy, J. Van Den Hoven and J. Weckert (eds.), Cambridge: Cambridge University Press, pp. 301–322.

Hoven, J. van den, Blaauw, M., Pieters, W., Warnier, M., 2014. Privacy and Information Technology. Stanford Encyclopedia of Philosophy. URL https://plato.stanford.edu/entries/it-privacy/

Howard, P.N., Hussain, M.M., 2013. Digital Media and the Arab Spring. Democracy’s Fourth Wave? 17–34.

Hypotheses in the Era of Big Data, 2015. Social Big Data Mining 46–65.

Implementing Automated Road Transport Systems in Urban Settings, 2018.

Ingram, M., 2010. Mary Meeker: Mobile Internet Will Soon Overtake Fixed Internet. Gigaom. URL https://gigaom.com/2010/04/12/mary-meeker-mobile-internet-will-soon-overtake-fixed-internet/

Ismail, A., 2016. Utilizing big data analytics as a solution for smart cities. 2016 3rd MEC International Conference on Big Data and Smart City (ICBDSC).

Japec, L., Kreuter, F., Berg, M., Biemer, P., Decker, P., Lampe, C., Lane, J., O’Neil, C., Usher, A., 2015. Big Data in Survey Research. Public Opinion Quarterly 79, 839–880.

Kitchin, R., 2014. The Data Revolution: Big Data, Open Data, Data Infrastructures & Their Consequences.

Kitchin, R., 2017. The data revolution: big data, open data, data infrastructures & their consequences. Sage.

Kling, R., Castells, M., 2002. The Internet Galaxy: Reflections on the Internet, Business, and Society. Academe 88, 66.

KPMG International, 2015. Going beyond the data: turning data from insights into value. KPMG International Cooperative.

Krivda, C., 2011. Socialization of data. Teradata Magazine 11 (2): 38–41

Lake, P., Drake, R., 2014. The Future of IS in the Era of Big Data. Information Systems Management in the Big Data Era, Advanced Information and Knowledge Processing 267–288.

Lalovich, P., n.d. What is the Big Data? [WWW Document]. AskTheHRguy.com. URL http://www.askthehrguy.com/2014/03/what-is-big-data.html

Lane, J., 2012. O Privacy, Where Art Thou?: Protecting Privacy and Confidentiality in an Era of Big Data Access. Chance 25, 39–41.

Laney, D., 2001. META Delta. Application Delivery Strategies, 949(February 2001), 4. https://doi.org/10.1016/j.infsof.2008.09.005

Lauro, N. C., Amaturo, E., Aragona, B., & Marino, M., 2017. Data Science and Social Research. Studies in Classification, Data Analysis, and Knowledge Organization.

Lazer, D., Kennedy, R., King, G., Vespignani, A., 2014. The Parable of Google Flu: Traps in Big Data Analysis. Science 343, 1203–1205.

Lohr, S., 2012. The Age of Big Data [WWW Document]. The New York Times. URL https://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html

Lovalekar, S., 2014. Big Data an emerging trend in future, Department of IT, SIES, School of Technology. Neruk, Novi Mumbai, India

Macy, M.W., Willer, R., 2002. From Factors to Actors: Computational Sociology and Agent-Based Modeling. Annual Review of Sociology 28, 143–166.

Manovich, L., 2007. How to Follow Global Digital Cultures, or Cultural Analytics for Beginners. New York, (May 2007), 1–12.

Marr, B., 2016. Big Data in Practice (Use Cases) - How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results. John Wiley & Sons, Inc, 320. https://doi.org/10.1080/21670811.2015.1074863

Mayer-Schönberger, V., Cukier, K., 2013. Big data: a revolution that will transform how we live, work and think. John Murray.

McKinsey & Company., 2011. Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, (June), 156. https://doi.org/10.1080/01443610903114527

McLellan, C., 2015. The internet of things and big data: Unlocking the power. ZDNet. URL: https://www.zdnet.com/article/the-internet-of-things-and-big-data-unlocking-the-power/

Mearian, L., 2012. By 2020, there will be 5,200 GB of data for every person on Earth. Computerworld. URL: https://www.computerworld.com/article/2493701/data-center/by-2020--there-will-be-5-200-gb-of-data-for-every-person-on-earth.html

Meier, P., 2015. Digital humanitarians: how big data is changing the face of humanitarian response. CRC Press.

MGI Big Data Full Report [WWW Document], n.d. Scribd. URL: https://www.scribd.com/document/71593694/MGI-Big-Data-Full-Report

Naur, P., 1974. Concise survey of computer methods.

Needham, J., 2013. Disruptive possibilities: How big data changes everything. O'Reilly.

Newell, S., Marabelli, M., 2015. Strategic opportunities (and challenges) of algorithmic decision-making: A call for action on the long-term societal effects of ‘datification.’ The Journal of Strategic Information Systems 24, 3–14.

Nuaimi, E.A., Neyadi, H.A., Mohamed, N., Al-Jaroodi, J., 2015. Applications of big data to smart cities. Journal of Internet Services and Applications 6.


Office of Research Regulatory Affairs, 2018. Retrieved from https://orra.rutgers.edu/gcpr

Oguro, K., 2016. Big Data- Key to the 4th Industrial Revolution. Innovation in the Global Economy – 3.

Ovadia, S., 2013. The Role of Big Data in the Social Sciences. Behavioral & Social Sciences Librarian 32, 130–134.

Pan, Y., Tian, Y., Liu, X., Gu, D., Hua, G., 2016. Urban Big Data and the Development of City Intelligence. Engineering 2, 171–178.

Pesenson, M.Z., Pesenson, I.Z., Mccollum, B., 2010. The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch. Advances in Astronomy 2010, 1–16.

Press, G., 2014. 12 Big Data Definitions: What's Yours? Retrieved from: https://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/2/#547472825b3f

Prospect of Big Data Technologies in Healthcare, 2016. The Human Element of Big Data 265–280.

Van Puyvelde, D., Coulthart, S., Hossain, M.S., 2017. Beyond the buzzword: big data and national security decision-making. International Affairs 93(6), 1397. URL https://academic.oup.com/ia/article-abstract/93/6/1397/4111109?redirectedFrom=PDF

Rathore, M.M., Ahmad, A., Paul, A., Rho, S., 2016. Urban planning and building smart cities based on the Internet of Things using Big Data analytics. Computer Networks 101, 63–80.

Reno, J., 2012. Introduction to the Big Data Issue. CA Technology Exchange, 3(2). Retrieved from: http://www.arcserve.com/~/media/Files/About Us/CATX/introduction-to-big-data.pdf and http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.434.7867&rep=rep1&type=pdf#page=26

Robison, R., 2015. How big is the human genome? (In megabytes, not base pairs) Retrieved from: https://www.linkedin.com/pulse/how-big-human-genome-megabytes-base-pairs-reid-robison/

Saha, B., Srivastava, D., 2014. Data quality: The other face of Big Data. 2014 IEEE 30th International Conference on Data Engineering. doi:10.1109/icde.2014.6816764

Sanseverino, E.R., Sanseverino, R.R., Vaccaro, V., 2017. Smart cities atlas: western and eastern intelligent communities. Springer.

Schafer, M.T., Es, K.V., 2018. The datafied society: studying culture through data. Amsterdam University Press.

Schroeder, R., 2014. Big Data and the brave new world of social media research. Big Data & Society 1, 205395171456319.

Shirky, C., 2011. The Political Power of Social Media. Foreign affairs.

Siegler, M.G., 2010. Eric Schmidt: Every 2 Days We Create As Much Information As We Did Up To 2003. TechCrunch. URL: https://techcrunch.com/2010/08/04/schmidt-data/?guccounter=1

Stone, M., 2014. Big Data for Media (report), Reuters Institute for the study of Journalism

Strommen-Bakhtiar, A., 2012. An essay on the emerging political economy and the future of the social media. 2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST).

Strong, C., 2016. Humanizing big data: marketing at the meeting of data, social science and consumer insight. Kogan Page Stylus.

Sunitha, L., Raju, B., Sunil, M., Srinivas, B., 2013. A comparative study between Noisy Data and Outlier Data in data mining,


Taming the Realm of Big Data Analytics: Acclamation or Disaffection?, 2016. The Human Element of Big Data 3–15.

Tufekci, Z., 2014. Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls. In ICWSM ’14: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media, 2014.

Townsend, A.M., 2013. Smart cities: big data, civic hackers, and the quest for a new utopia. W.W. Norton & Company.

Gurin, J., 2014. Big data and open data: what's what and why does it matter? [WWW Document]. The Guardian. URL https://www.theguardian.com/public-leaders-network/2014/apr/15/big-data-open-data-transform-government

Wang, J., Crawl, D., Purawat, S., Nguyen, M., Altintas, I., 2015. Big data provenance: Challenges, state of the art and opportunities. 2015 IEEE International Conference on Big Data (Big Data).

Warren, S.D., Brandeis, L.D., 1890. The Right to Privacy. Harvard Law Review 4, 193.

Weathington, J., 2017. Big data privacy is a bigger issue than you think. TechRepublic. Retrieved from: https://www.techrepublic.com/article/big-data-privacy-is-a-bigger-issue-than-you-think/

Weldon, M.K., 2016. The future X network: a Bell Labs perspective. CRC Press.

What Is Big Data? - Gartner IT Glossary - Big Data [WWW Document], 2016. Gartner Inc. URL https://www.gartner.com/it-glossary/big-data

Wowczko, I., 2015. Skills and Vacancy Analysis with Data Mining Techniques. Informatics 2, 31–49. doi:10.3390/informatics2040031


The data deluge, 2010. The Economist. URL https://www.economist.com/node/15579717

Zheng, J., 2016. UCSD Introduction to Big Data Week 1 & 2 review. Retrieved from: https://jingwen-z.github.io/introduction-bd-week1-2/

Extra References:

We're Making Our Terms and Data Policy Clearer, Without New Rights to Use Your Data on Facebook. (n.d.). Retrieved from https://newsroom.fb.com/news/2018/04/terms-and-data-policy/

What is personal data?, 2018. Retrieved from: https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en

20 Examples Of ROI And Results With Big Data. (n.d.). Retrieved from: https://content.pivotal.io/blog/20-examples-of-roi-and-results-with-big-data

https://www.mckinsey.com/~/media/McKinsey/Industries/Capital%20Projects%20and%20Infrastructure/Our%20Insights/Voices%20on%20Infrastructure%20Turning%20the%20smart%20city%20opportunity%20into%20reality/Voices-December-2017-WEB.ashx

https://newsroom.intel.com/wp-content/uploads/sites/11/2018/03/smart-cities-whats-in-it-for-citizens.pdf

Other adjustments that developed during Adventurous Adolescent Mean | FREEDOM. 2016. Retrieved from: https://www.humancondition.com/freedom-other-adjustments-adventurous-adolescence/

Teradata, 2018. Teradata/Big Data: http://bigdata.teradata.com/?logo=BDLOGO (last visit: 30/4/2018)

The Internet of Things, Intelligent Lighting & Big Data: What You Need to Know- White paper (2010), DIGITAL LUMENS
