Web Science Framework: Introduction to Web Science

Lucia Ciofi

12 November 2014, DINFO - University of Florence, Italy

Outline

• Web science: introduction, basic concepts
• A brief rationale of the evolution of the Web
• Current trends:
  – Internet of Things
  – Open data and Linked Data

Before We Ask “What is Web Science?”

White, B. (2008, November 9). The Emergence of Web Science.

“What is ‘The Web?’” (1/2)

White, B. (2008, November 9). The Emergence of Web Science.

“What is ‘The Web?’” (2/2)

• The Web is an application (client + server) built on top of the TCP/IP stack, providing a system of interlinked documents.
• With a Web browser, one can access Web servers hosting Web pages that may contain multimedia objects and navigate between them using hyperlinks.

– A distributed document delivery system implemented using application-level protocols on the Internet
– A tool for collaborative writing and community building
– A framework of protocols that support e-(applications)
– A network of co-operating computers interoperating using HTTP and related protocols to form a ‘subnet’ of the Internet
– A large, cyclical, directed graph made up of Web pages and links
– A Giant Global Graph (GGG): computers, documents, people
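The graph view in the last two definitions can be made concrete with a small sketch. The page names and links below are invented for illustration. The code stores a tiny "Web" as an adjacency mapping and checks whether following hyperlinks can lead back to the starting page, which is what makes the graph cyclical.

```python
from collections import deque

# Hypothetical pages and hyperlinks, invented for illustration only.
links = {
    "home.html": ["about.html", "news.html"],
    "about.html": ["home.html"],
    "news.html": ["story.html"],
    "story.html": [],
}

def reachable(start, graph):
    """Return the set of pages reachable from `start` by following links."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        page = queue.popleft()
        if page in seen:
            continue
        seen.add(page)
        queue.extend(graph.get(page, []))
    return seen

def on_cycle(page, graph):
    """A page lies on a cycle if it can reach itself via at least one link."""
    return page in reachable(page, graph)
```

Links are directed: here "story.html" can be reached from "home.html", but not the other way round.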

Adapted from White, B. (2008, November 9). The Emergence of Web Science.

“Which Science for the Web?”

The perspective of Computer Science (CS)

• The Web is often viewed by people in the CS field as an application running on top of the Net, rather than as an entity unto itself
• Computing has made significant contributions to the Web. Our everyday use of the Web depends on fundamental developments in CS that took place long before the Web was invented
• Whether in CS studies or in information-school courses, the Web is often studied exclusively as the delivery vehicle for content, technical or social, rather than as an object of study in its own right.

Hendler, J., Shadbolt, N., Hall, W., Berners-Lee, T. and Weitzner, D. 2008. Web Science: An Interdisciplinary Approach to Understanding the Web. Communications of the ACM, 51 (7), 60-69.

Is Web Science part of CS?

• What place might Web Science occupy within the family of computing disciplines?
  – Initially defined as “the science of decentralized information systems” by Berners-Lee et al. in 2006 [1, 2]
  – It has established itself as a rapidly evolving and fundamentally interdisciplinary field of study [3, 4]
• Explorations and discussions of the relationship between Web Science and computer science [5, 6, 7]

From: Su White, Michalis N. Vafopoulos. Web Science: Expanding the Notion of Computer Science. SIGCSE’12, February 29–March 3, 2012, Raleigh, North Carolina, USA. Copyright 2012 ACM.

Web Science signals a whole new way of thinking about computer science?

Web Science

• Web Science is a new field of science that involves multi-disciplinary study and inquiry aimed at understanding the Web and its relationships to us
• "Social machines" (Berners-Lee and Fischetti, 1999)
  – this includes the underlying technology but also the rules, policies and organizational structures that are used to manage the technology
  – e.g. MediaWiki in Wikipedia

The Web Science Butterfly

Philosophy

“Philosophy” was added to the butterfly by explicit agreement declared at the first Web Science Conference, held in March 2009.

Web Science

• It is the study of an engineered technology (the Web) and the inter-related impacts of that technology on human, social and organizational domains.
• The study of Web Science is fundamentally interdisciplinary, since it incorporates enquiry into what constitutes the Web, alongside how and why practices and organizations have emerged from, or are modified by, the wider interaction of society with the Web
• Web Science is distinct from the study of web technologies

From: Su White, Michalis N. Vafopoulos. Web Science: Expanding the Notion of Computer Science. SIGCSE’12, February 29–March 3, 2012, Raleigh, North Carolina, USA. Copyright 2012 ACM.

• The Web Science Trust (WST) is a charitable body
  – its aim is to support the global development of Web Science
  – it is hosted by the University of Southampton
• The Web Science Trust began life in 2006 as the “Web Science Research Initiative” (WSRI)
• The ambition was to coordinate and support the study of the decentralised information system that is the Web
• WSRI’s activities have focused on
  – (i) articulating a research agenda for the broader scientific community
  – (ii) coordinating the development of Web Science educational material and curricula
  – (iii) engaging in thought leadership for this emerging field

Video Showcase

1) Noshir Contractor @ WebSci09

http://www.youtube.com/watch?v=drD7lqvK2CQ&feature=PlayList&p=02AA1FFE5693D5AC&index=7 (from minute 6:40: introduction and background)

http://www.youtube.com/watch?v=aF_f-n0o4oM&feature=PlayList&p=02AA1FFE5693D5AC&index=6

Web Science: An interdisciplinary approach to understanding the World Wide Web

James Hendler, Nigel Shadbolt, Wendy Hall, Tim Berners-Lee, and Danny Weitzner. Communications of the ACM (cover story), July 2008.

What is it?

• Physical science is commonly regarded as an analytic discipline that aims to find laws that generate or explain observed phenomena
• CS is predominantly (though not exclusively) synthetic, in that formalisms and algorithms are created in order to support specific desired behaviors
• Web science deliberately seeks to merge these two paradigms.
• The Web needs to be studied and understood as a phenomenon but also as something to be engineered for future growth and capabilities.

A significant interplay

The social interactions enabled by the Web put demands on the Web applications behind them, in turn putting further demands on the Web's infrastructure.

(Fig. 1 in Hendler, J., Shadbolt, N., Hall, W., Berners-Lee, T., and Weitzner, D. 2008. Web science: an interdisciplinary approach to understanding the web. Commun. ACM 51, 7 (Jul. 2008), 60-69.)

It’s an Issue of Scale (1/2)

• At the micro scale, the Web is an infrastructure of artificial languages and protocols; it is a piece of engineering
• It is the interaction of human beings creating, linking, and consuming information that generates the Web’s behavior as emergent properties at the macro scale
• These emergent properties are often surprising and require new analytic methods to be understood

Hendler, J., Shadbolt, N., Hall, W., Berners-Lee, T., and Weitzner, D. 2008.

It’s an Issue of Scale (2/2)

Hendler, J., Shadbolt, N., Hall, W., Berners-Lee, T., and Weitzner, D. 2008.

Challenges for Web Science (1/2)

1. What are the fundamental theoretical properties of social machines, and what kinds of algorithms are needed to create them?
2. What are the underlying architectural principles to guide the design and efficient engineering of new Web infrastructure components for this social software?
3. How can we extend the current Web infrastructure to provide mechanisms that make the social properties of information sharing explicit and that guarantee that the uses of this information conform to the relevant social policy expectations?
4. How do cultural differences affect the development and use of social mechanisms on the Web? As the Web is now truly "World Wide," the properties desired by one culture may be seen as counter-productive by another. Can Web infrastructure help in bridging cultural divides and/or increase cross-cultural understanding?

Hendler, J., Shadbolt, N., Hall, W., Berners-Lee, T., and Weitzner, D. 2008.

Challenges for Web Science (2/2)

• How can we understand and develop?
  – The “dot-com” business models
  – Effective social networking environments
  – New “real-time” tools (Twittering,
  – Etc., etc.
• How can we address?
  – trustworthiness, reliability, and tacit expectations
  – Internationalization
  – privacy, copyright, and other legal rules
• How will/can the Web affect the way we “do” science, education, governance, communication, etc.?
• Why has the […] not yet arrived?
• How will a “Web of objects” (Internet of Things) operate?
• How will we address the issue of digital identity?

(Miller, 2009)

A Rationale of the Web

Towards a new Web

Milestones of Web evolution

• Internet: allowed programmers to communicate without concern for the network of cables through which the communication had to flow
• WWW: allowed users to work with a set of interconnected documents without concern for the details of the computers storing and exchanging them
• Semantic Web (?): will allow users to refer to real-world objects without concern for the underlying documents in which these things, abstract and concrete, are described

Web 1.0

• The web initially made it possible for people to publish and access documents online
• Metcalfe’s Law or ‘network effect’
  – The higher the number of users publishing documents on the web, the higher the value of the web to its users
• Conditions
  – the adoption of standards
  – the ease of navigation with simple tools such as web browsers
  – the availability of content
• Early web users were primarily consumers of online resources or services
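Metcalfe's Law is commonly stated as the value of a network growing with the square of the number of its users. As a hedged formalization, with notation introduced here rather than taken from the slides (V is the value, n the number of users, k an unspecified positive constant), the value is taken to be proportional to the number of possible pairwise connections:

```latex
V(n) = k \cdot \frac{n(n-1)}{2} \sim \frac{k}{2}\, n^{2} \quad \text{for large } n
```

Under this reading, doubling the number of users roughly quadruples the value, which is one way to express the 'network effect' named above.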

Web 1.0: from Infrastructural Requirements

It was all about how to connect PCs

Web 2.0

• Read/write, two-way: anyone can be a publisher
• ‘Web 2.0’ online tools and services such as blogs, forums, wikis and social networks led to many users adding content to the web
  – creating value on Web 2.0 social platforms
• Social Web
• New business models have emerged
• A new generation of services that combine data from different sources to provide added-value services
  – mashups

Image credit: catspyjamasnz

W. Hall, N. Shadbolt, T. Tiropanis, K. O’Hara and T. Davies, Open data and charities, 2012

Web 2.0: from Social Interactions

It was all about how to connect people

Issues from Web 2.0

• Privacy
• Trust
• Security
• Intellectual property

What about the future?

• Many hypotheses about what the new Web could be
• Many of these visions have failed or, rather, we think they will take much more time to be realized
  – hot trends have declined and new ones have emerged
• Web 3.0?

Web 3.0?

• “People keep asking what Web 3.0 is. I think maybe when you've got an overlay of scalable vector graphics […] on Web 2.0 and access to a semantic Web integrated across a huge space of data […]” Tim Berners-Lee, 2006

• “The Web of Openness. A web that breaks the old siloes, links everyone everything everywhere, and makes the whole thing potentially smarter.” Greg Boutin, May 2009

• “The Web 3.0 term misleads organizations by implying that a new version of the web is upon us.” Anthony Bradley, Gartner, April 2009

Richard MacManus, Web 3.0 or Not, There's Something Different About 2009

Web 3.0?

• If Web 2.0 was about user generated content and social applications such as YouTube and Wikipedia, then Web 3.0 is about open and more structured data - which essentially makes the Web more 'intelligent'.

• Web 3.0 is an amorphous term, and possibly one that people shouldn't even attempt to use. Nevertheless, it's clear to us that the time for structured data has come. We're beginning to see it in the current wave of Linked Data sets being released, and in the support that big companies, like Google and Yahoo, are showing for structured data.

Richard MacManus, Understanding the New Web Era: Web 3.0, Linked Data, Semantic Web, May 14 2009

Web Trends Beyond Web 2.0

• Beyond PC
  – Internet of Things
• Data challenges
  – Open Data
  – Big Data

Internet of Things

D. Miorandi, S. Sicari, F. De Pellegrini, and I. Chlamtac. 2012. Survey Internet of things: Vision, applications and research challenges. Ad Hoc Netw. 10, 7 (September 2012), 1497-1516. DOI=10.1016/j.adhoc.2012.02.016

Internet of Things (IoT)

• IoT is an umbrella keyword
• Extension of the Internet and the Web into the physical realm
  – by means of devices with embedded
    • identification
    • sensing capabilities (sensor: input device)
    • actuation capabilities (actuator: output device)
• From infrastructure network to interconnected ‘‘smart’’ objects

IoT: Things

• Smart objects (or simply things)
  – embedding of electronics into everyday physical objects
• Smart objects are entities that:
  – Have a physical embodiment
  – Have a minimal set of communication functionalities
  – Possess a unique identifier
  – Are associated with at least one name and one address
  – Possess some basic computing capabilities
  – May possess means to sense physical phenomena
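The properties in this list can be read as the fields of a small data structure. The sketch below is only an illustration: the class name, field names, identifier, and address are all hypothetical, and the sensor is a stub that returns a fixed value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class SmartObject:
    """Illustrative model of a smart object; all names here are hypothetical."""
    identifier: str   # unique identifier
    name: str         # at least one name
    address: str      # at least one address
    location: str     # stands in for the physical embodiment
    # Optional means to sense physical phenomena, keyed by phenomenon name.
    sensors: Dict[str, Callable[[], float]] = field(default_factory=dict)

    def read(self, phenomenon: str) -> Optional[float]:
        """Basic computing capability: sample a sensor if the object has one."""
        sensor = self.sensors.get(phenomenon)
        return sensor() if sensor is not None else None

    def message(self, phenomenon: str) -> dict:
        """Minimal communication functionality: build a message to be sent."""
        return {
            "from": self.identifier,
            "address": self.address,
            "phenomenon": phenomenon,
            "value": self.read(phenomenon),
        }

# A made-up thermostat with a stub temperature sensor.
thermostat = SmartObject(
    identifier="urn:example:thing:42",
    name="hallway thermostat",
    address="192.0.2.10",
    location="hallway",
    sensors={"temperature": lambda: 21.5},
)
```

Sensing is optional in the list above, so `read` returns `None` for a phenomenon the object cannot sense.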

IoT: System-level perspective

• From a system-level perspective, the IoT can be looked at as
  – a highly dynamic and radically distributed networked system
  – composed of a very large number of smart objects producing and consuming information
• An incremental process
  – starting from existing technologies and applications
  – e.g. from identification technologies such as RFID (Radio Frequency Identification)

IoT: System-level features

• Features to take into account in the IoT domain:
  – Device heterogeneity
  – Scalability
  – Ubiquitous data exchange through proximity wireless technologies
  – Energy-optimized solutions
  – Localization and tracking capabilities
  – Self-organization capabilities
  – Semantic interoperability and data management
  – Embedded security and privacy-preserving mechanisms

Open Data

A new era for the Web?

W. Hall, N. Shadbolt, T. Tiropanis, K. O’Hara and T. Davies, Open data and charities, 2012

Open Data: Introduction

• All modern organizations rely on data
  – often this data is held in disparate databases and spreadsheets stored on servers, laptops or USB sticks
• How to unlock the value of data
  – It is when content is openly accessible and linked that network effects are generated, which add value to both the content and to the network as a whole

What is Open Data?

• “Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.”

http://okfn.org/opendata

Open Data: Definition

• Open data is defined by three criteria
• A dataset is open data if:
  1. it is made accessible online
  2. it is published in an open machine-readable format
     • CSV format as an example
  3. it is licensed to allow others to re-use it
     • Open Database Licence (ODbL)
     • Creative Commons Zero (CC0) licence
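The three criteria can be read as a simple checklist. The following sketch applies them to a dataset description. The `Dataset` record, the function name, and the sets of accepted formats and licences are assumptions made for this example; they are not an official or exhaustive list.

```python
from dataclasses import dataclass

# Assumed, non-exhaustive sets chosen for this example only.
OPEN_FORMATS = {"csv", "json", "xml", "rdf"}
OPEN_LICENCES = {"ODbL", "CC0"}

@dataclass
class Dataset:
    url: str      # where the dataset is published ("" if it is not online)
    fmt: str      # file format, e.g. "csv"
    licence: str  # licence identifier, e.g. "CC0"

def is_open_data(ds: Dataset) -> bool:
    """Check the three criteria: online, open machine-readable format, open licence."""
    online = ds.url.startswith(("http://", "https://"))
    machine_readable = ds.fmt.lower() in OPEN_FORMATS
    reusable = ds.licence in OPEN_LICENCES
    return online and machine_readable and reusable
```

For instance, a CSV file published at a hypothetical `https://example.org/data.csv` under CC0 passes all three checks, while the same data released only as a PDF fails the second one.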

Open Data Advantages

• Value does not come from hoarding or selling raw data but from transforming it into actionable information and knowledge: putting it into use
• Data should be usable by anyone, not just the data owner
  – e-science data
  – government data

Open Data Gov

• 100 open government data initiatives worldwide: http://datacatalogs.org
• Government datasets cover a wide range of areas of activity including transport, health, agriculture, business, law and education
• Main motivations
  – Independent third parties are often able to develop tools using government data far more quickly and cheaply than if government developed them directly
  – When the data is public, errors can be spotted, and corrected, more easily
  – The transparency agenda

Open for Business?

• Enabling third parties to use data and offer value-added services to users
  – e.g. price comparison websites
• Crowdsourcing added-value applications that further enhance their offerings
  – e.g. Twitter

Consuming Open Data

• A wide range of off-the-shelf tools to combine different datasets
  – data analytics
  – data visualization
• Open data tools
  – Google Spreadsheets
  – Google Fusion Tables
  – Google Refine

W. Hall, N. Shadbolt, T. Tiropanis, K. O’Hara and T. Davies, Open data and charities, 2012

Many kinds of (open) data

Davies (2012): Untangling the open data debate: definitions and implications. Available from: www.opendataimpacts.net/2012/03/untanglingthe

• Open data
  – Definition: Datasets that are made accessible in non-proprietary formats under licences that permit unrestricted re-use (OKF - Open Knowledge Foundation, 2006). Open government data involves governments providing many of their datasets online in this way.
  – Potential implications: Third-parties can innovate with open data, generating social and economic benefits. Citizens and advocacy groups can use open government data to hold state institutions to account. Data can be shared between institutions with less friction.
• Big data
  – Definition: Data that requires ‘massive’ computing power to process (Crawford and Boyd, 2011). Massive computing power, originally only available on supercomputers, is increasingly available on desktop computers or via low cost cloud computing.
  – Potential implications: Companies and researchers can ‘data mine’ vast data resources, to identify trends and patterns. Big data is often generated by combining different datasets. Digital traces from individuals and companies are increasingly captured and stored for their potential value as ‘big data’.
• Raw data
  – Definition: Primary data, as collected or measured direct from the source. Or: data in a form that allows it to be easily manipulated, sorted, filtered and remixed.
  – Potential implications: Access to raw data can allow journalists, researchers and citizens to ‘fact check’ official analysis. Programmers are interested in building innovative services with raw data.
• Real-time data
  – Definition: Data measured and made accessible with minimal delay. Often accessed over the web as a stream of data through APIs (Application Programming Interfaces).
  – Potential implications: Real-time data supports rapid identification of trends. Data can support the development of ‘early warning systems’ (e.g. Google Flu Trends; Ushahidi). ‘Smart systems’ and ‘smart cities’ can be configured to respond to real-time data and adapt to changing circumstances.
• Linked data
  – Definition: Datasets are published in a format (for instance RDF) facilitating the use of URLs (web addresses) to identify the elements they contain, with links made between datasets (Berners-Lee, 2009; Shadbolt, Hall and Berners-Lee, 2006).
  – Potential implications: A ‘web of linked data’ emerges, supporting ‘smart applications’ (Allemang and Hendler, 2008) that can follow the links between datasets. This provides the foundations for the Semantic Web.
• Personal/private data
  – Definition: Data about an individual that they have a right to control access to. Such data might be gathered by companies, governments or other third-parties in order to provide a service to someone, or as part of regulatory and law-enforcement activities.
  – Potential implications: Many big and raw datasets are based on aggregating personal data, and combining them with other data. Effective anonymisation of personal data is difficult, particularly when open data provides the pieces for ‘jigsaw identification’ of private facts about people (Ohm, 2010).

Linked Data

A vision within Web Science and a bottom-up approach to the Semantic Web

Video Showcase

2) TBL @ TED 2009

http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

Linked Data: Let A Thousand Flowers Bloom

• Linked Data enables data to be opened up and connected so that people can build interesting new things from it. (via Tim Berners-Lee)

Linked Data is Blooming; ReadWriteWeb, May 2009

What is Linked Data?

• "Linked Data offers a new medium to link structured data that is then more machine-readable." However, Greg Boutin added that Linked Data "does not by itself add any semantic meaning to the information, but it better carries that semantic information once you have it."

• "So, while Linked Data is not semantic, creating links at the data level paves the way to a true Semantic Web." (Greg Boutin, Tying Web 3.0, the Semantic Web and Linked Data Together - Part 2/3: Linked Data is a Medium)

• More specifically, Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."

(Chris Bizer, Tom Heath, Kingsley Idehen, Tim Berners-Lee WWW 2008: Linked Data on the Web)

LD Principles

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs, so that they can discover more things.

– The essence of Linked Data comes down to a URI that delivers the following in one go:
  1. Named Reference (a Web Space name for something)
  2. Conduit to an address (URL) that exposes the Description of a Named Thing in a negotiated representation ((X)HTML, RDFa, N3, RDF/XML, etc.)
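The four principles and the idea of a negotiated representation can be illustrated without any network access. In the sketch below, a dictionary stands in for a Web server: the `example.org` URIs, the property names, and the function names are all invented for this example. N-Triples is used as the machine-readable representation only because it is simple to write by hand; it is not one of the formats listed above.

```python
# Hypothetical data: the example.org URIs and property names are invented.
DESCRIPTIONS = {
    "http://example.org/id/florence": {
        "http://example.org/prop/label": "Florence",
        # A link to another URI, as asked for by principle 4.
        "http://example.org/prop/country": "http://example.org/id/italy",
    },
    "http://example.org/id/italy": {
        "http://example.org/prop/label": "Italy",
    },
}

def look_up(uri, accept="text/html"):
    """Return a description of the named thing in the requested representation."""
    description = DESCRIPTIONS[uri]  # principle 3: provide useful information
    if accept == "application/n-triples":
        lines = []
        for prop, value in description.items():
            # URIs go in angle brackets, plain values become quoted literals.
            obj = f"<{value}>" if value.startswith("http://") else f'"{value}"'
            lines.append(f"<{uri}> <{prop}> {obj} .")
        return "\n".join(lines)
    # Otherwise fall back to a minimal HTML rendering for human readers.
    items = "".join(f"<li>{p}: {v}</li>" for p, v in description.items())
    return f"<ul>{items}</ul>"

def linked_uris(uri):
    """Principle 4: the other URIs that this description links to."""
    return [v for v in DESCRIPTIONS[uri].values() if v.startswith("http://")]
```

The same URI thus serves as a name and as a way to reach a description, and the `accept` argument plays the role that the HTTP `Accept` header would play in a real lookup.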

from: Kingsley Idehen, ReadWriteWeb comment to The Web of Data: Creating Machine-Accessible Information, April 19, 2009

Bridging the gap: Semantic Web – Web of Data

Semantics: Provide models that can be used to represent expressive semantic descriptions (OWL) of application domains and provide inferencing power for Web and non-Web applications that need A

Data: Linking data entries using URIs powered by RDF and SPARQL helps to create Web applications and portals that use REST-based models, integrating data from multiple sources without need of a preexisting schema

The Classic Web

Single information space, built on:
1. URIs
  – globally unique IDs
  – retrieval mechanism
2. Hyperlinks
  – are the glue that holds everything together

[Figure: Web browsers and search engines operating over HTML documents A, B and C connected by hyperlinks]

Chris Bizer, Tom Heath, Kingsley Idehen, Tim Berners-Lee WWW 2008: Linked Data on the Web

Linked Data

Use Semantic Web technologies to

1. publish structured data on the Web,
2. set links between data from one data source to data within other data sources.

[Figure: Things in data sources A, B, C, D and E connected by typed links]

Chris Bizer, Tom Heath, Kingsley Idehen, Tim Berners-Lee WWW 2008: Linked Data on the Web

Applications (1/2)

What can I do with this?

[Figure: Linked Data browsers, Linked Data mashups and search engines operating over Things in data sources A, B, C, D and E connected by typed links]

Chris Bizer, Tom Heath, Kingsley Idehen, Tim Berners-Lee WWW 2008: Linked Data on the Web

Applications (2/2)

What can I do with this?

• Once the data sets are interconnected (i.e. link to each other like websites), a machine can traverse this independent web of noiseless, structured information to gather semantic knowledge of arbitrary entities and domains.
  – The result is a massive, freely accessible knowledge base forming the foundation of a new generation of applications and services.
  – The data sets can currently be accessed in heterogeneous ways; for example, through a semantic web browser or by being crawled by a search engine.
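The traversal described here can be sketched as a breadth-first walk over triples. The two data sets, the `ex:` and `geo:` prefixes, and every property name below are hypothetical; real Linked Data would use full HTTP URIs and fetch each description over the Web rather than from in-memory lists.

```python
from collections import deque

# Two hypothetical data sets; every name and property here is invented.
DATASET_A = [
    ("ex:Florence", "ex:locatedIn", "ex:Italy"),
    ("ex:Florence", "ex:sameAs", "geo:Firenze"),  # typed link into data set B
]
DATASET_B = [
    ("geo:Firenze", "geo:partOf", "geo:Tuscany"),
    ("geo:Tuscany", "geo:label", "Tuscany"),
]

def gather(start, datasets):
    """Breadth-first traversal collecting every triple reachable from `start`."""
    triples = [t for ds in datasets for t in ds]
    subjects = {s for s, _, _ in triples}
    seen, found = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s != node:
                continue
            found.append((s, p, o))
            # Follow the link only if the object is itself described somewhere.
            if o in subjects and o not in seen:
                seen.add(o)
                queue.append(o)
    return found
```

With both data sets available, starting from `ex:Florence` also collects the facts about `geo:Firenze` and `geo:Tuscany`; with data set A alone the walk stops after two triples, which is the practical difference that the typed link makes.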

W3C Linking Open Data Project

• Community effort to
  – publish existing open license datasets as Linked Data on the Web
  – interlink things between different data sources

Chris Bizer, Tom Heath, Kingsley Idehen, Tim Berners-Lee WWW 2008: Linked Data on the Web

The LOD Cloud

Collectively, the data sets consist of over 4.5 billion RDF triples, which are interlinked by around 180 million RDF links (March 2009).

Typically, a data set contains knowledge about a particular domain, like books, music, encyclopedic data, companies