Semantic Web 1 (2012) 1–16 1 IOS Press
An Architecture of a Distributed Semantic Social Network
Sebastian Tramp a,∗, Philipp Frischmuth a, Timofey Ermilov a, Saeedeh Shekarpour a and Sören Auer a a Universität Leipzig, Institut für Informatik, AKSW, Postfach 100920, D-04009 Leipzig, Germany E-mail: {lastname}@informatik.uni-leipzig.de
Abstract. Online social networking has become one of the most popular services on the Web. However, current social networks are like walled gardens in which users do not have full control over their data, are bound to specific usage terms of the social network operator and suffer from a lock-in effect due to the lack of interoperability and standards compliance between social networks. In this paper we propose an architecture for an open, distributed social network, which is built solely on Semantic Web standards and emerging best practices. Our architecture combines vocabularies and protocols such as WebID, FOAF, Se- mantic Pingback and PubSubHubbub into a coherent distributed semantic social network, which is capable to provide all crucial functionalities known from centralized social networks. We present our reference implementation, which utilizes the OntoWiki application framework and take this framework as the basis for an extensive evaluation. Our results show that a distributed social network is feasible, while it also avoids the limitations of centralized solutions.
Keywords: Semantic Social Web, Architecture, Evaluation, Distributed Social Networks, WebID, Semantic Pingback
1. Introduction platform or information will diverge. Since there are only a few large social networking players, the Web Online social networking has become one of the also loses its distributed nature. According to a recent most popular service on the Web. Especially Facebook comScore study1, Facebook usage times already out- with its 600M+ million users creates a web inside the number traditional Web usage by factor two and this Web. Drawing on the metaphor of islands, Facebook divergence is continuing to increase. is becoming more like a continent. However, users are We argue that social networking must be distributed locked up on this continent with hardly any opportu- and users should be empowered to regain control over nity to communicate easily with users on other islands their data. The currently vast oceans between social and continents or even to relocate trans-continentally. networking continents and islands should be bridged Users are bound to a certain platform and hardly have by high-speed connections allowing data and users to the chance to migrate easily to another social network- travel easily and quickly between these places. In fact, ing platform if they want to preserve their connections. we envision the currently few social networking conti- Once users have published their personal information nents to be complemented by a large number of smaller within a social network, they often also lose control islands with a tight network of bridges and ferry con- over the data they own, since it is stored on a single nections between them. company’s servers. Interoperability between platforms In this paper we describe the main technological is very rudimentary and largely limited to proprietary ingredients for a distributed semantic social network APIs. In order to keep data up-to-date on multiple plat- (DSSN) as well as their interplay. The semantic rep- forms, users have to modify the data on every single resentation of personal information is facilitated by a
*Corresponding author. 1http://allthingsd.com/20110623/ WebID: http://sebastian.tramp.name the-web-is-shrinking-now-what/
1570-0844/12/$27.50 c 2012 – IOS Press and the authors. All rights reserved 2 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network
WebID profile. The WebID protocol allows the use of Service Decoupling. A second fundamental design a WebID profile for authentication and access control principle is the decoupling of user data from services purposes. Semantic Pingback facilitates the first con- as well as applications [8]. It ensures that users of the tact between users of the social network and provides network are able to choose between different services a method for communication about resources (such and applications. In addition, this principle helps users as images, status messages, comments, activities) on of the DSSN to distinguish between their own data, the social network. Finally, PubSubHubbub-based sub- which they share with and license to other people and scription services allow obtaining near-instant notifica- services, and foreign data, which they create by using these services and which they do not own. tions of specific information as WebID profile change sets and activity streams from people in one’s so- Protocol Minimalism. The main task for social net- cial network. Combined these standards, protocols and working protocols is to communicate RDF triples be- technologies provide all necessary ingredients to re- tween DSSN nodes, not to enforce a specific work flow alize a distributed social network having all the cru- nor an exact interpretation of the data. This constraint cial social networking features provided by centralized ensures the extensibility of the data model and keeps ones. the overall architecture clean. This article is structured as follows: We present our 2.2. Data Layer reference architecture of a distributed social semantic network in Section 2. Our implementation based on The data layer comprises two main data structures: the OntoWiki framework is demonstrated in Section 3, resources for the description of static entities and feeds while the evaluation results of our approach are dis- for the representation and publishing of events and ac- cussed in Section 4. Finally, we give an overview of re- tivities. lated work in Section 5 and conclude with an outlook 2.2.1. Resources on future work in Section 6. We distinguish between three main categories of DSSN resources: WebIDs for persons as well as appli- cations, data artefacts and media artefacts. The prop- 2. Architecture of a Distributed Semantic Social erties, conditions and roles in the network of these re- sources are described in the next paragraphs. Network WebID [17]2 is a best practice recently conceived in In this section we describe the DSSN reference ar- order to simplify the creation of a digital ID for end chitecture. After introducing a few design principles users. Since its focus lies on simplicity, the require- ments for a WebID profile are minimal. In essence, on which the architecture is based, we present its dif- a WebID profile is a de-referenceable RDF document ferent layers, i.e. the data, protocol, service and appli- (possibly even an RDFa-enriched HTML page) de- cation layers. The overall architecture is depicted in scribing its owner3. That is, a WebID profile con- Figure 1. tains RDF triples which have the IRI identifying the owner as subject. The description of the owner can be 2.1. Basic Design Principles performed in any (mix of) suitable vocabulary (-ies), but FOAF [6] has emerged as the ‘industry standard’ Our DSSN architecture is based on the following for that purpose. An example WebID profile compris- three design principles. ing some personal information (lines 8-12) and two rel:worksWith4 links to co-workers (lines 6-7) is Linked Data. The main protocol for data publishing, shown in Listing 1. retrieval and integration is based on the Linked Data principles [3]. All of the information contained in the 2The latest spec is available at http://webid.info/spec/. DSSN is represented according to the RDF paradigm, 3The usage of an IRI with a fragment identifier allows the indi- made de-referencable and interlinked with other re- rect identification of an owner by reference to the (FOAF) profile sources. This principle facilitates heterogeneity and document. 4Taken from RELATIONSHIP: A vocabulary for describing enables the distribution of data and services on the relationships between people at http://purl.org/vocab/ Web. relationship. S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 3
Profile Bookmark Foto ... Blog Manager Collection Sharing Application Layer 4 ping 2 create update 5 search push subscribe 7 3 delegeate access to Ping Update Search Push 6 Service Layer 4 announce 1 create update 5 index 1 announce 7 read access
WebIDs Activity History Streams Feeds Data & Media announce Artefacts 1 Resources Feeds Data Layer
Fig. 1. Architecture of a Distributed Semantic Social Network (without protocol layer): (1) Resources announce services and feeds, feeds announce services – in particular a push service. (2) Applications initiate ping requests to spin the Linked Data Network. (3) Applications subscribe to feeds on push services and receive instant notifications on updates. (4) Update services are able to modify resources and feeds (e.g. on demand of an application). (5) Personal and global search services can index resources and are used by applications. (6) Access to resources and services can be delegated to applications by a WebIDs, i.e. the application can act in the name of the WebID owner. (7) The majority of all access operations is executed through standard web requests.
Apart from the main focus on representing user pro- parts of the profile data and can add or change some files, our architecture extends the WebID concept by of the profile information, e.g. create activities or cre- facilitating two additional tasks: service discovery and ate and link images. Applications on the DSSN are access delegation. also identified by using WebID profiles, but are not Service discovery is used to equip a WebID with re- described as a person but as an application. They can lations to trusted services which have to be used with act on behalf of a person but rely on delegated access that WebID. The usage of the WebID itself ensures rights for such an activity. This process is described in that an agent can trust this service in the same way as Section 2.3.1. she trusts the owner of the WebID. The most impor- tant service in our DSSN architecture is the Seman- Data Artefacts are resources on the Web which are tic Pingback service, which we describe in detail in published according to the Linked Data principles. Section 2.3.2. An example service relation is shown in Data artefacts includes posts, comments, taggings, ac- Listing 4. tivities and other Social Web artefacts which have been In addition to this service, we introduce access del- created by services and applications on the Web. Most egation for the WebID protocol. WebID access delega- of them are described by using specific web ontologies 6 tion is an enhancement to the current WebID authenti- such as SIOC [5], Common Tag or Activity Streams cation process in order to allow applications to access in RDF [10]. resources and services on behalf of the WebID owner, Media Artefacts are also created by services and ap- but without the need to introduce additional applica- plications but consist of two parts – a binary data part tion certificates in a WebID. We describe the WebID which needs to be decoded with a specific codec, and protocol as well as our access delegation extension in a meta-data part which describes this artefact. Usually, detail in Section 2.3.1. such artefacts are audio, video and image files, but of- Agents and Applications play an important role in fice document types are also frequently used on the today’s social networks5 . They have access to large Social Web. Media artefacts can be easily integrated into the DSSN by using the Semantic Pingback mech- 5Social network games such as FarmVille can have more than 80 million users (according to appdata.com), which constantly cre- ate activities in their social network. 6http://commontag.org/Specification 4 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network
work. In our DSSN architecture, each activity is cre- 1 @prefix rdfs:
user8. The relying party then fetches the WebID and @prefix rsa:
Resource Layer Linking Resource Linked Resource (Source) (Target) (typed) linking 1
5 autodiscovery Publisher fetches 3
2 6 announces (updates) observes (notifies) 7 Pingback Client 4 Pingback Server (Link Propagator) RPC request RPC Layer
Link Publisher Link Reveiver
Fig. 2. Architecture of the Semantic Pingback approach: (1) A linking resource links to another (Data) Web resource, here called linked resource. (2) The Pingback client is either integrated into the data/content management system or realized as a separate service, which observes changes of the Web resource. (3) Once the establishing of a link has been noted, the Pingback client tries to auto-discover a Pingback server from the linked resource. (4) If the auto-discovery has been successful, the respective Pingback server is used for a ping. (5) In order to verify the retrieved request (and to obtain information about the type of the link in the semantic case), the Pingback server fetches (or de-references) the linking resource. (6 + 7) Subsequently, the Pingback server can perform a number of actions such as updating the linked resource (e.g. adding inverse links) or notifying the publisher of the linked resource (e.g. via email).
ple establish a reference to them or their work, thus the seamless connection and interlinking of resources facilitating social interactions. It also allows to pub- on the Social Web with resources on the DSSN. An lish backlinks automatically from the original WebID extension of our example profile with Semantic Ping- profile (or other content, e.g. status message) to com- back functionality making use of an external Semantic ments or references of the WebID (or other content) Pingback service is shown in Listing 4. elsewhere on the Web, thus facilitating timeliness and As requested by our third DSSN design paradigm coherence of the Social Web. (protocol minimalism), Semantic Pingback is a generic As a result, the distributed network of WebID pro- data networking protocol which allows to spin rela- files, RDF resources and social websites can be much tions between any two Social Web resources. In the tighter and timelier interlinked by using the Semantic context of the DSSN Architecture, Semantic Pingback Pingback mechanism than conventional websites, thus is used in particular for friending, commenting and rendering a network effect, which is one of the ma- tagging. jor success factors of the Social Web. Semantic Ping- Friending is the process of establishing a symmetric back is completely downwards compatible with the foaf:knows relation between two WebIDs. A rela- conventional Pingback implementations, thus allowing tionship is approved when both persons publish this re- lation in their WebIDs. A typical friending work flow can be described by the following steps: 22 @prefix ping:
This basic model of communication can be applied personal search / index service of a subscribed user to different events and activities in the social network. (see next section) Table 1 lists some of the more important pingback Resource Synchronization is an additional commu- events. In any case, the owner of these resources can nication scheme based on feeds. It is required, espe- be informed about the event and in most cases specific cially in distributed social networks, to take into ac- actions should be triggered (refer to Section 3 for more count that relevant data is highly distributed over many details). locations, and access to and querying of this data can 2.3.3. PubSubHubbub be very time-consuming without caching. A properly PubSubHubbub is a web-hook-based publish/sub- connected resource synchronization tackles this prob- scribe protocol, as an extension to Atom and RSS, lem by allowing users to subscribe to changes of cer- which allows for near instance distribution of feed en- tain resources over PubSubHubbub. For a WebID, this tries from one publisher to many subscribers. Since process can be included into the friending process, feed entries are not described as RDF resources, Pub- while for other resources a user can subscribe man- SubHubbub is not the best solution as a transport pro- ually (e.g. for a group resource to receive updates of tocol for a DSSN from our perspective. However, Pub- its membership relations). Since resource synchroniza- SubHubbub with atom feeds is widely in use and has tion is not an issue which was raised in the context of good support in the web developer community which DSSN at first, we have built our data model as Linked is why we decided to use it in our architecture. Similar Data update logs [1] based on the work previously pub- 13 to Semantic Pingback, it is agnostic to its payload and lished by the Triplify project . can be used for all publish/subscribe communication connections. 2.4. Service Layer The main work flow of establishing a PubSubHub- bub connection is to advertise a hub service in an ex- Services are applications which are part of the isting feed. A subscriber follows this link and requests DSSN infrastructure (in contrast to applications from a subscription on this feed. If the feed changes, the the application layer). WebIDs can be equipped with feed publisher informs the hub service which instantly different services in order to allow manipulation and broadcasts the changes to all subscribers12. The main other actions on the user’s data by other applications. advantage of this communication model is to avoid As depicted in Figure 1, we have defined four essential frequent and unnecessary pulls of all interested sub- services for a DSSN. scribers from this feed and to allow a faster broadcast Ping Service to the subscriber. The ping service provides an endpoint for any in- In the DSSN architecture, two specific feeds are im- coming pingback request for the resources of a user. portant and interlinked with a WebID to allow sub- First and foremost, it is used with the WebID for scriptions to them: activity feeds which are used for ac- friending but also for comment notification and discus- tivity distribution and change set feeds which are used sions. for resource synchronization. One application instance can provide its services for multiple resources. In a minimal setup, a ping ser- Activity Distribution is a fundamental communica- vice provides only a notification service via email. In tion channel for any social network. A personal activ- a more complex setup, the ping service has access to ity feed publishes the stream of all activities on social the update service of a user (via access delegation) and network resources (artefacts and WebIDs) with a spe- can do more than notification. cific user as the actor. These activities can and should A ping service can be announced with the ping:to be created by any application which is allowed to up- relation as shown in Listing 4. date the feed (ref. access delegation). In addition, ac- tivity feeds can be created for data and media arte- Push Service facts in order to allow object-centered push notifica- The push service is used for activity distribution tion. Typically, activity feed updates are pushed to a and resource synchronization. Both introduced types of feeds announce its push service in the same way
12During the subscription a callback endpoint is supplied by the subscribing endpoint, which is later used for pushing the data. 13http://triplify.org 8 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network
Table 1 Pingback activities distinguished by types of resources and relations.
source resource relation target resource description
WebID (foaf:Person) foaf:knows WebID (foaf:Person) friending sioct:Comment sioc:about foaf:Image commenting an image (or any other resource) sioc:Post sioc:reply_of sioc:Post replying to a (friends) post ctag:Tag ctag:tagged * any resource is tagged by a user aair:Activity aair:activityObject * any resource is object of an activity as using the rel="hub" link in the feed head. Since 2. In addition to public search services, we want to both types of feeds are valid atom feeds, a DSSN push emphasize the importance of private search ser- service can be a standard PubSubHubbub-based in- vices in our architecture. Private search services stance. are used in order to have a fast resource cache not To equip social network resources with its cor- only for public, but also for private data which a responding activity and change set feeds, we have user is allowed to access. defined two OWL object properties which are sub- A private search service is used for all users and properties of the more generic sioc:feed relation application queries from applications which act from the SIOC project [5]: dssn:activityFeed on behalf of the user. The underlying resource and dssn:syncFeed. index of a private search service is used as a In addition to these RDF properties, DSSN agents callback for all push notifications from feeds to should pay attention to the corresponding HTTP which the user has subscribed. That is, she is able header fields X-ActivityFeed and X-SyncFeed, to query over the latest up-to-date data by using which are alternative representations of the OWL ob- her private search service. In addition, she can ject properties to allow the integration of media arte- query for data which has never been public and facts without too much effort. is published for a few people only. Search and Index Service Since private search services are used by applica- Search and index services are used in two different tions which act on behalf of the user, they must be contexts in the DSSN architecture. WebID-protocol-enabled. That is, they accept requests from the user and her delegated agents only. In addi- 1. They are used to search for public web resources, tion, applications need to know which private search which are not yet part of a user’s social network. service should be accessed on behalf of the user. This These search services are well-known semantic mode is again made possible by providing a link from search engines as Swoogle [7] or Sindice [22]. the WebID to the search service15 They use crawlers to keep their resource cache In our architecture we assume that search services up-to-date and provide user interfaces as well as accept SPARQL queries. However, this assumption is application programming interfaces to integrate not true for all public Semantic Web search engine at and use their services in applications. the moment. In our architecture, these public services are used to search for new contacts as well as other arte- Update Service facts in the same way as people can use a stan- Finally, an update service provides an interface to dard web search engine. The main advantage in modify and create user resources in terms of SPARQL using Semantic Web search engines lies in their update queries. In the same way as private search ser- ability to use graph patterns for a search14.
15In our prototypes we use a simple OWL object prop- 14A motivating example in our context is the search for erty dssn:searchService, which is a sub-property of resources of type foaf:Person, which are related to the dssn:trustedService. We assume that such an easy vocabu- DBpedia topic dbpedia:DataPortability (e.g. with the lary is only the first step to a fully featured service auto-discovery foaf:interest object property). ontology and consider all dssn terms as unstable. S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 9 vices, update services are secured by way of the We- plex social network application is described in the next bID protocol and accept requests only by the user itself section. and by agents in access delegation mode16. Typical examples of how to use this service are the creation of activities on behalf of the user or the mod- 3. Implementation ification of the user’s WebID, e.g. by adding a new foaf:knows relation. We provide a prototypical implementation of our ap- In the next section, we give a detailed description of proach which utilizes the OntoWiki application frame- the service interplay and usage by applications. work [2] which is also the basis for the evaluation in Section 4. All features were realized by employing the 2.5. Application Layer extension mechanisms offered by OntoWiki. The prototype described here, can be summarized as Social Web applications create and modify all kinds a WebID provider with an integrated communication of resources for a user. In our architecture, they have to hub. The following features are implemented so far: use the trusted services which are related to a WebID – Users can create and manage their WebID profiles instead of their own. Since access to these services is and any other Linked Data-enabled resource. exclusively delegated by the user to an application, the – Users can make friends, subscribe to their activ- 17 user has full control over her data . To illustrate how ities and profile updates and receive changes in- DSSN applications work with the WebID and its ser- stantly on change. vices, we will describe a simple foto-sharing applica- – Users can search and browse for friends and ac- tion: tivities inside her social network as well as fil- When a user creates an account on this service, she ter these resources by facets based on object and uses her WebID for the first login and delegates ac- datatype properties. cess to this application. The application analyses the – Users can comment on and subscribe to any WebID and discovers the trusted services and some DSSN resource which is equipped with a Se- meta-data of the user (e.g. name, short bio and depic- mantic Pingback service or an PubSubHubbub- tion). The user uploads her first image to the appli- enabled activity stream. cation, which, as a result, creates a new activity with – Users receive notifications if someone comments the user’s update service18. Furthermore, it equips the or links her WebID and send a pingback notifica- newly uploaded image with the pingback service of the tion. user; thus enabling the image for backlinks and com- In the next subsections we describe the implementa- ments. New comments can arrive from everywhere on tion decisions made to allow these functionality. An the web, but the application also provides its own com- commented screenshot of the central activity stream menting service (integrated in the image web view). If interface can be seen on Figure 3. another user writes a comment on this image, a data artefact is created in the namespace of the application 3.1. Creating and Updating Data Artefacts and a ping request is sent to the user’s pingback service (since this service is related to the image). Creating and managing Linked Data enabled RDF This simple example demonstrates the interplay and resources can be achieved without modifying the On- rules of the DSSN service architecture. A more com- toWiki basic functionality. We employ the RDFau- thor [21] JavaScript widget library in order to auto- 16We defined dssn:updateService as a relation between a matically create forms out of RDFa annotated HTML WebID and an update service documents19. Without any user effort, a newly created 17At the moment we distinguish only between access and no ac- cess to a service. As an extension, we can imagine that a private resource will be Linked data enabled if it shares the search service can handle access on parts of the private Social Graph namespace of the OntoWiki installation. In addition differently (an online game does not need to know which other ac- to that, we added support for WebID authentication tivities you pursue on the web). Access policies for RDF knowledge as well as for certificate creation by implementing a bases is a topic of ongoing research and we hope that the results of this research area can be adapted here. 18This activity is instantly pushed to all of the user’s friends but is 19Each output page which is created by OntoWiki is RDFa en- not part of the work of the image publishing service. hanced. 10 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network
2 1 3
6
4 5
Fig. 3. Screenshot of the OntoWiki DSSN activity stream view. The interface elements are: (1) The Share it! activity creation module where users can post status notes and share links and media artefacts. (2) The main interface view tab to switch between configuration screen, profile manager, friending interface and the activity stream. (3) The activity filter and search module, to allow a facet based browsing of activities. (4) The events module, which queries the cache social network data for birthdays and other events. (5) The activity stream, which is the result of the SPARQL query modified by the filter module. (6) The generic wiki interfaces to create any type of Linked Data resource (e.g. comments). dedicated WebID extension. Since there is quite some tion is stored in a separate graph, a synchroniza- cryptographic processes involved, this extension needs tion is trivial. some prior configuration steps, e.g. the OntoWiki in- 4. Finally, we subscribe to two feeds when they are stance needs to be configured as a SSL/TLS enabled published within the users profile: (1) The users web application. activity feed, which we use to create the news feed for the user. The incoming atom activity en- 3.2. Maintaining Network Connections try were transformed to AAIR resources and im- ported to an additional activity RDF graph. (2) In order to maintain network connections (aka. The history feed for the friends profile. This feed friending) we execute the following steps in our imple- is employed in order to keep the cached WebID mentation: profile data up-to-date.
1. When a user enters a WebID inside the friending 3.3. Ignoring Activities and WebIDs module, we add a new statement (foaf:knows) into the RDF graph containing the users profile Since a user is normally not interested in all activi- data. This automatically generates a Pingback re- ties of his friends (e.g. certain games), there needs to quest, so the new friend will be notified. be a feasibility to remove certain activities from the 2. We create a new RDF graph, which will act as a timeline. We therefore employ EvoPat [13], a pattern- cache for the WebID profile data about that par- based approach for the evolution and refactoring of ticular friend. We also configure this new Knowl- RDF knowledge bases. EvoPat is integrated also as edge Base in such a way, that it is imported into an OntoWiki extension and in conjunction with our the users graph automatically. DSSN implementation we use this component to keep 3. We fetch the data from the friends WebID by users timeline clean. employing the Linked Data principles. We cache Each time a user selects a Hide all activities ... but- that data in the newly generated graph in order to ton next to an activity, we create a new pattern, which enhance the performance. Since those informa- subsequently matches such statements in the graph and S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 11 removes them. Possible hide patterns are generated 4.1. Social Web Acid Test from the activity data itself, for instance ... from this user, ... of this language or ... about this object. Once The Social Web Acid Test (SWAT) is an integration new data for a given user is added, we re-apply the pat- use case test conceived by the Federated Social Web tern. In this way it also applies for future changes of Incubator Group of the W3C. Currently, only the first the knowledge base. and very basic level of the test (SWAT020) has been de- veloped and described completely. Nevertheless, those 3.4. Generating and Distributing Activities parts of the next level (SWAT1) which are currently published are discussed here too. A user is able to generate different kinds of activ- ities in our implementation. In the current state we SWAT0: The objectives of the first SWAT level have support three types of activities: status updates, photo clearly been described in the following use case21: sharing as well as link recommendations. We imple- mented a small extension, which displays a Share It! module inside the OntoWiki user interface. As a re- 1 User A takes a photo of user B from her phone and posts it. sult the user can quickly access this functionality and 2 User A explicitly tags the photo with user B. sharing is made really easy. Once the user generates 3 User B gets notified that she is in a photo. an activity, her activity feed is updated and subscribers 4 User C who follows user A gets the photo. to that feed are delivered with the new content by the 5 User C leaves a comment on the photo. push service. 6 User A and user B get notified about the comment. 3.5. Pingback Integration Listing 5: Social Web Acid Test - Level 0 All activities are represented as Linked Data re- sources and are equipped with a Pingback service as well as a feed. Thus, users can comment on any re- Utilizing all technologies described before, our source in their own social application and addition- DSSN architecture passes the SWAT0 without any ally subscribe to changes to that resource (e.g. com- problems. The following enumeration describes the ments by others). Each time someone comments on a details: resource (or otherwise uses it) externally, a Pingback request is send to the owning OntoWiki instance. Con- 1. User A takes a photo of user B and uploads sequently the publisher gets notified and is able to react it (e.g. with the example application from Sec- again on that new activity, which facilitates conversa- tion 2.5): The web space returns a link to the tions in a distributed manner. If a user is subscribed to user’s pingback server in the HTTP header of the a resource activity feed (which is automatically done, uploaded image. once he comments on a particular resource), she gets 2. User A explicitly tags the photo with user B: This notified about other comments, even if she is not the is done by creating a tag resource which links currently commenting person or the owner of the re- both to the image and to the WebID of user B. source. A pingback client sends a ping request to all of these resources after publishing the tag on the web. 4. Evaluation 3. User B is notified that she is on a photo: The We divided our evaluation process into two inde- notification is created by the pingback service of pendent parts: Firstly, the qualitative evaluation part User B who has received a request from the tag- aims to prove the functionality of the DSSN architec- ging application which was used by User A. ture by assessing use cases from the social web acid test (Section 4.1). Secondly, the quantitative evalua- 20http://www.w3.org/2005/Incubator/ tion part aims to prove the real-world usefulness of federatedsocialweb/wiki/SWAT0 21 our prototypical implementation by testing the perfor- For this use case the following assumptions are made: (1) Users employ at least two (ideally, three) different services each of which mance and distribution of data in the social network. is built with a different code base. (2) Users only need to have one This evaluation is carried out by using a social network account on the specific service of their choice. (3) Ideally, partici- simulation approach (Section 4.2). pants A, B, and C use their own sites (personal URLs). 12 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network
4. User C, who follows user A, receives the photo: The workflow of this evaluation framework is de- User C is instantly provided with an update in picted in Figure 4. It has a pipeline architecture which her activity stream, informing her about the new is built by using three additional tools that help us to image. generate and manage the replay agent test data. In a 5. User C leaves a comment on the photo: This is first step, we use GRR (Generating Random RDF [4]) done in the same way as publishing the Tag. to generate the users’ profile and activities data. We 6. User A and user B are notified about the com- enhanced GRR by adding a post-generation and data ment: User A will be notified because her ping- adjustment component to it23. The generated data is back service informs her about this ping. User B loaded into the profile distributor which splits the data will be notified only if she has subscribed to the into separate user profiles, each with personal data and activity feed of the photo provided that it exists. activities. After splitting the data into single parts, the profile distributor passes each part on to a correspond- 22 SWAT1 is currently not finally defined , so an eval- ing replay agent. Upon call the replay agent instanti- uation can be a rough sketch only. The next SWAT ates a new OntoWiki-based DSSN node with the given level will require a few different use cases which intro- user profile. After the user profile has been loaded, and duce some new Social Web concepts. However, most because all user activities have timestamps, actions of of the user stories are satisfied already as a conse- users are simulated over time by using the received ac- quence of the fully distributed nature of the DSSN ar- tivity feed data. chitecture (e.g. data portability and social discovery). In order to generate the data as much as possible The more interesting user stories are: (1) The Private similar to real data, we decided to achieve an analysis content and Groups use cases will require a distributed of social network activities from our friends and col- ACL management. Some ideas on using WebIDs for leagues, and use the outcome of this analysis to gener- group ACL management are already published with ate artificial data. dgFOAF [15] and we think that this is a good starting point for further research. (2) The Social News use case 4.3. Quantitative Analysis of Social Network introduces a new vote activity. Since our architecture Activities applies schema agnostic social network protocols, this new type of activity can be communicated as any other The original raw input data for this analysis was activity. gathered from the Facebook accounts of several AKSW research group members. We created a desktop appli- 4.2. Evaluation Framework Architecture cation which utilizes the Facebook desktop API to ex- tract the data of one’s account. After a user logged into Our next aim was to evaluate the prototypical imple- this application by using her credentials, the applica- mentation. We decided to use a simulation approach tion extracted all of the user’s activities, friends and all in which activities are created artificially on the social activities from friends that were accessible by using network rather than arranging a user evaluation, which the API. The extracted data was fetched in JSON and strongly depends on the graphical user interface. In a later parsed and converted into a MySQL tables to sim- first step, we added an additional service for remote plify its analysis. The extracted data contains 6053 in- procedure calls to the OntoWiki DSSN node as well as dividual users. In a given time frame of a month, these an execution client (replay agent) which can execute users have generated 13682 different activities, each of activities remotely controlled and based on input activ- which was tagged with one of the following four pos- ity data in RDF. Then we designed and implemented sible types: status, link, video, music and photo. The an evaluation framework which is able to create ac- statistical analysis was performed upon the extracted tivity data randomly, but based on statistical parame- data to create a prototype of an average user. ters, and serve it to multiple instances of the OntoWiki Testing each type of activity for statistical distribu- DSSN node which use the data to simulate the social tion, showed that all types of activities follow normal network. distribution but with different values of mean and stan- dard deviation. Table 2 illustrates that normal distribu- tion parameters belong to each type of activity. 22Available online at http://www.w3.org/2005/ Incubator/federatedsocialweb/wiki/SWAT1_use_ cases (receive 29.07.2011). 23GRR+ is available at https://github.com/AKSW/GRR. S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 13
1 2 3 4
GRR+ Generator Profile Data Network and activity config Activities Profile Distributor Replay Agents
Fig. 4. Workflow of the social network evaluation framework: (A) Based on a given qualitative and quantitative configuration of profiles and activities, a random knowledge base is generated. (B) The knowledge base is split up into a part for every single profile and distributed to a list of server machines. (C) For each profile, a replay agent is initialized which will create and update a specific profile as well as activities on a single social network node. (D) The social network node (in our case an OntoWiki instance) connects to other nodes as well as creates activities for the social network in the same way as it would be achieved under control of a human user.
Table 2 4.4. Data Generation and Testbed Configuration Distribution parameters as well as counts and percentages
Type Mean SD Count Percentage The data generation was split into one part for each user group, each with its own characteristics and tar- Status 215.3225 75.5157 6675 49% Link 114.1290 43.5291 3538 26% get numbers of resources. Table 4 lists some quantita- Photo 55.1935 18.8262 1711 13% tive characteristics of the generated data in the form of Video 53.9354 19.9063 1672 12% a SPARQL graph pattern. As can be noticed, the data Music 0.1612 0.4543 5 0% generated for testing represents a perfect value distri- bution based on the statistical analysis.
The analysis of the frequency of users’ activities per Table 4 month showed that the range of the number of activi- Quantitative characteristics of the generated social network data in ties per user is between 1 and 24 and the average num- the form of distinct SPARQL query counts ber is 2. We categorized users in terms of the frequency Graph pattern Distinct count of their activities into 3 groups which is shown in ta- ble 3. As can been seen, a high percentage of users are ?s a foaf:Person 100 in the inactive user category and a few users have a ?s a :Activity 1000 high number of posted activities. ?s :activityVerb :Post 500 ?s :activityVerb :Bookmark 200 Table 3 ?s :activityVerb :MediaContent 300 Different categories of users based on the frequency of their ?s foaf:knows ?o 1400 activities. ?s foaf:member ?o 100
Category Activities / Month Percentage
High 17-24 0.792 Medium 9-16 5.104 As input rules for data generation, GRR takes three Low 1-8 94.102 files with specific parameters: – A generator file that describes the rules for creat- We used this statistics to generate artificial data at ing the instances of classes a large scale. First, two pools of data, i.e. activity and – A sampler function input file that determines user pool, were generated. An activity pool was gener- URIs for classes as well as values for specific ated with respect to both proportion and distribution of predicates each type of activity, and the user pool is based on con- – A type-property mapping input file that maps the sidering the percentage of users in each category. At properties, which are described in the sampler last, activities were assigned to users through a random function input, to specific instances of different process. classes 14 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network
Listing 6 shows an example generator file use for @prefix :
25http://status.net/ more difficult to steal large amounts of private 26http://diso-project.org/ data. Also, security is ensured through public re- 27http://elgg.org/ view and testing of open-standards and to a lesser 28http://foocorp.org/projects/social/ extend through obscurity due to closed propri- 29 http://themineproject.org/ etary implementations. 30http://opensource.appleseedproject.org/ 31http://onesocialweb.org/ – Data ownership. Users have full ownership and 32http://buddypress.org/ control over the use of their data. They are not re- 33http://cliqset.com/ stricted to ownership regulations imposed by their 34 http://gowalla.com/ social network provider. Instead DSSN users can 35https://posterous.com/ 36http://buddycloud.com/ implement fine-grained data licensing options ac- 37http://projectdanube.org cording to their needs. 16 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network
– Extensibility. The representation of social net- data publication from relational databases. In Juan Quemada, work resources like WebIDs and data artefacts is Gonzalo León, Yoëlle S. Maarek, and Wolfgang Nejdl, editors, not limited to a specific schema and can grow Proceedings of the 18th International Conference on World 38 Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009, with the needs of the users . pages 621–630. ACM, 2009. – Reliability. Again due to the distributedness the [2] Sören Auer, Sebastian Dietzold, and Thomas Riechert. On- DSSN is much less endangered of breakdowns or toWiki - A Tool for Social, Semantic Collaboration. In Isabel F. cyberterrorism, such as denial-of-service attacks. Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel – Freedom of communication. As we observed re- Schwabe, Peter Mika, Michael Uschold, and Lora Aroyo, edi- tors, The Semantic Web - ISWC 2006, 5th International Seman- cently during the Arab Spring social networks can tic Web Conference, ISWC 2006, Athens, GA, USA, Novem- play a crucial role in attaining and defeating civil ber 5-9, 2006, Proceedings, volume 4273 of Lecture Notes in liberties. A DSSN with a vast amount of nodes is Computer Science, pages 736–749, Berlin / Heidelberg, 2006. much less endangered of censorship as compared Springer. to centralized social networks. [3] Tim Berners-Lee. Linked Data - Design Issues. website. last change: 2009/06/18; retrieved: 2011/07/25. However, the work presented in this article can only [4] Daniel Blum and Sara Cohen. Grr: Generating Random RDF. be a first step of a larger research and development In Grigoris Antoniou, Marko Grobelnik, Elena Paslaru Bontas Simperl, Bijan Parsia, Dimitris Plexousakis, Pieter De Leen- agenda aiming at realizing a truly distributed social heer, and Jeff Z. Pan, editors, The Semanic Web: Research and network based on semantic technologies. Our imple- Applications - 8th Extended Semantic Web Conference, ESWC mentation, for example, based on the OntoWiki tech- 2011, Heraklion, Crete, Greece, May 29 - June 2, 2011, Pro- nology platform is currently only a proof-of-concept ceedings, Part II, volume 6644 of Lecture Notes in Computer implementation. For a widespread use, the usability, Science, pages 16–30. Springer, 2011. [5] John G. Breslin, Stefan Decker, Andreas Harth, and Uldis Bo- scalability and the multi-client capabilities have to be jars. SIOC: an approach to connect web-based communities. improved. Also, the distributed realization of social International Journal of Web Based Communities, 2(2):133– networking apps as implemented in centralized social 142, 2006. networks through APIs (e.g. Open Social) has to be [6] D. Brickley and L. Miller. FOAF Vocabulary Specifica- investigated. We also expect, that the combination of tion. Namespace Document 2 Sept 2004, FOAF Project, 2004. http://xmlns.com/foaf/0.1/. standards and protocols as described in this article will [7] Li Ding, Timothy W. Finin, Anupam Joshi, Rong Pan, R. Scott be implemented in a number of additional platforms Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, and Joel (e.g. Wordpress, Elgg, Diaspora) thus making them Sachs. Swoogle: a search and metadata engine for the semantic first-class nodes of the DSSN. web. In David A. Grossman, Luis Gravano, ChengXiang Zhai, Otthein Herzog, and David A. Evans, editors, Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management, Washington, DC, USA, Novem- Acknowledgments ber 8-13, 2004, pages 652–659. ACM, 2004. [8] Maxwell Krohn, Alex Yip, Micah Brodsky, Robert Morris, and We would like to thank our colleagues from AKSW Michael Walfish. A World Wide Web Without Walls. In 6th research group (in particular Jonas Brekle and Nadine ACM Workshop on Hot Topics in Networking (Hotnets), At- lanta, GA, November 2007. Jänicke) for their helpful comments and inspiring dis- [9] S. Langridge and I. Hickson. Pingback 1.0. Tech- cussions during the development of this approach. This nical report, http://hixie.ch/specs/pingback/ work was partially supported by a grant from the Euro- pingback, 2002. pean Union’s 7th Framework Programme provided for [10] Michele Minno and Davide Palmisano. Atom Activity Streams the project LOD2 (GA no. 257943). RDF mapping. NoTube Project, 2010. http://xmlns. notu.be/aair/. [11] A. Passant and P. Mendes. sparqlPuSH: Proactive notifica- tion of data updates in RDF stores using PubSubHubbub. In References SFSW2010, 2010. [12] Alexandre Passant, John G. Breslin, and Stefan Decker. Re- [1] Sören Auer, Sebastian Dietzold, Jens Lehmann, Sebastian thinking Microblogging: Open, Distributed, Semantic. In Hellmann, and David Aumueller. Triplify: Light-weight linked Boualem Benatallah, Fabio Casati, Gerti Kappel, and Gustavo Rossi, editors, Web Engineering, 10th International Confer- ence, ICWE 2010, Vienna, Austria, July 5-9, 2010. Proceed- 38We already experimented with activities like git commits and ings, volume 6189 of Lecture Notes in Computer Science, comments on lines of source code – both usecases integrate very well pages 263–277. Springer, 2010. because of the schema-agnostic transport protocols and the Linked [13] Christoph Rieß, Norman Heino, Sebastian Tramp, and Sören Data paradigm. Auer. EvoPat – Pattern-Based Evolution and Refactoring of S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 17
RDF Knowledge Bases. In Proceedings of the 9th Interna- back. In P. Cimiano and H.S. Pinto, editors, Proceedings of the tional Semantic Web Conference (ISWC2010), Lecture Notes EKAW 2010 - Knowledge Engineering and Knowledge Man- in Computer Science, Berlin / Heidelberg, 2010. Springer. agement by the Masses; 11th October-15th October 2010 - Lis- [14] P. Saint-Andre. Extensible Messaging and Presence Protocol bon, Portugal, volume 6317 of Lecture Notes in Artificial Intel- (XMPP): Core. Request for Comments 3920, IETF, October ligence (LNAI), pages 135–149, Berlin / Heidelberg, October 2004. http://www.ietf.org/rfc/rfc3920.txt. 2010. Springer. [15] F. Schwagereit, A. Scherp, and S. Staab. Representing Dis- [21] Sebastian Tramp, Norman Heino, Sören Auer, and Philipp tributed Groups with dgFOAF. In ESWC2010, pages 181–195, Frischmuth. RDFauthor: Employing RDFa for collaborative June 2010. Knowledge Engineering. In P. Cimiano and H.S. Pinto, edi- [16] Seok-Won Seong, Jiwon Seo, Matthew Nasielski, Debangsu tors, Proceedings of the EKAW 2010 - Knowledge Engineer- Sengupta, Sudheendra Hangal, Seng Keat Teh, Ruven Chu, ing and Knowledge Management by the Masses; 11th October- Ben Dodson, and Monica S. Lam. PrPl: a decentralized so- 15th October 2010 - Lisbon, Portugal, volume 6317 of Lecture cial networking infrastructure. In Proceedings of the 1st ACM Notes in Artificial Intelligence (LNAI), pages 90–104, Berlin / Workshop on Mobile Cloud Computing & Services: Social Heidelberg, October 2010. Springer. Networks and Beyond, MCS ’10, pages 8:1–8:8, New York, [22] Giovanni Tummarello, Renaud Delbru, and Eyal Oren. NY, USA, 2010. ACM. Sindice.com: Weaving the Open Linked Data. In Karl Aberer, [17] M. Sporny, S. Corlosquet, T. Inkster, H. Story, B. Harbu- Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung- lot, and R. Bachmann-GmÃijr. WebID 1.0: Web identifica- Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, tion and Discovery. Unofficial draft, August 2010. http: Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and //payswarm.com/webid/. Philippe Cudré-Mauroux, editors, The Semantic Web, 6th In- [18] H. Story, B. Harbulot, I. Jacobi, and M. Jones. FOAF+SSL: ternational Semantic Web Conference, 2nd Asian Semantic RESTful Authentication for the Social W. In SPOT2009, 2009. Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, [19] Henry Story, Andrei Sambra, and Sebastian Tramp. Friending November 11-15, 2007, volume 4825 of Lecture Notes in Com- On The Social Web. In Federated Social Web Europe 2011, puter Science, pages 552–565. Springer, 2007. Berlin June 3rd-5th 2011, 2011. [20] Sebastian Tramp, Philipp Frischmuth, Timofey Ermilov, and Sören Auer. Weaving a Social Data Web with Semantic Ping-