<<

Semantic Web 1 (2012) 1–16 1 IOS Press

An Architecture of a Distributed Semantic Social Network

Sebastian Tramp a,∗, Philipp Frischmuth a, Timofey Ermilov a, Saeedeh Shekarpour a and Sören Auer a a Universität Leipzig, Institut für Informatik, AKSW, Postfach 100920, D-04009 Leipzig, Germany E-mail: {lastname}@informatik.uni-leipzig.de

Abstract. Online social networking has become one of the most popular services on the Web. However, current social networks are like walled gardens in which users do not have full control over their data, are bound to specific usage terms of the social network operator and suffer from a lock-in effect due to the lack of interoperability and standards compliance between social networks. In this paper we propose an architecture for an open, distributed social network, which is built solely on Semantic Web standards and emerging best practices. Our architecture combines vocabularies and protocols such as WebID, FOAF, Se- mantic and PubSubHubbub into a coherent distributed semantic social network, which is capable to provide all crucial functionalities known from centralized social networks. We present our reference implementation, which utilizes the OntoWiki application framework and take this framework as the basis for an extensive evaluation. Our results show that a distributed social network is feasible, while it also avoids the limitations of centralized solutions.

Keywords: Semantic Social Web, Architecture, Evaluation, Distributed Social Networks, WebID, Semantic Pingback

1. Introduction platform or information will diverge. Since there are only a few large social networking players, the Web Online social networking has become one of the also loses its distributed nature. According to a recent most popular service on the Web. Especially Facebook comScore study1, Facebook usage times already out- with its 600M+ million users creates a web inside the number traditional Web usage by factor two and this Web. Drawing on the metaphor of islands, Facebook divergence is continuing to increase. is becoming more like a continent. However, users are We argue that social networking must be distributed locked up on this continent with hardly any opportu- and users should be empowered to regain control over nity to communicate easily with users on other islands their data. The currently vast oceans between social and continents or even to relocate trans-continentally. Users are bound to a certain platform and hardly have networking continents and islands should be bridged the chance to migrate easily to another social network- by high-speed connections allowing data and users to ing platform if they want to preserve their connections. travel easily and quickly between these places. In fact, Once users have published their personal information we envision the currently few social networking conti- within a social network, they often also lose control nents to be complemented by a large number of smaller over the data they own, since it is stored on a single islands with a tight network of bridges and ferry con- company’s servers. Interoperability between platforms nections between them. Compared with the currently is very rudimentary and largely limited to proprietary prevalent centralized social networks such an approach APIs. In order to keep data up-to-date on multiple plat- has a number of advantages: forms, users have to modify the data on every single

*Corresponding author. 1http://allthingsd.com/20110623/ WebID: http://sebastian.tramp.name the-web-is-shrinking-now-what/

1570-0844/12/$27.50 c 2012 – IOS Press and the authors. All rights reserved 2 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network

– Privacy. Users of the DSSN can setup their own much less endangered of censorship as compared DSSN node or chose a DSSN node provider with to centralized social networks. particularly strict privacy rules in order to ensure In this paper we describe the main technological a maximum of privacy. This would facilitate a ingredients for a distributed semantic social network competition of Social Network operators about (DSSN) as well as their interplay. The semantic rep- the privacy rules most beneficial for users. Cur- resentation of personal information is facilitated by a rently, due to the monopoly or duopoly of So- WebID profile. The WebID protocol allows the use of cial Networks, privacy regulations are often more a WebID profile for authentication and access control driven by the commercialization interests. purposes. Semantic Pingback facilitates the first con- – Data security. Due to the distributed nature it is tact between users of the social network and provides more difficult to steal large amounts of private a method for communication about resources (such data. Also, security is ensured through public re- as images, status messages, comments, activities) on view and testing of open-standards and to a lesser the social network. Finally, PubSubHubbub-based sub- extend through obscurity due to closed propri- scription services allow obtaining near-instant notifica- etary implementations. As is confirmed very fre- tions of specific information as WebID profile change quently, centralized solutions are always more sets and activity streams from people in one’s so- endangered of attacks on data security. Even cial network. Combined these standards, protocols and with the best technical solutions in place, insider technologies provide all necessary ingredients to re- threats can hardly be prevented in a centralized alize a distributed social network having all the cru- setting but can not cause that much harm to a cial social networking features provided by centralized DSSN. ones. – Data ownership. Users have full ownership and This article is structured as follows: We present our control over the use of their data. They are not re- reference architecture of a distributed social semantic stricted to ownership regulations imposed by their network in Section 2. Our implementation based on social network provider. Instead DSSN users can the OntoWiki framework is demonstrated in Section 3, implement fine-grained data licensing options ac- while the evaluation results of our approach are dis- cording to their needs. Also, here a DSSN would cussed in Section 4. Finally, we give an overview of re- facilitate a competition of DSSN node providers lated work in Section 5 and conclude with an outlook for the most liberal and beneficial data ownership on future work in Section 6. regulations for users. – Extensibility. The representation of social net- work resources like WebIDs and data artefacts is 2. Architecture of a Distributed Semantic Social not limited to a specific schema and can grow Network with the needs of the users2. Although extensi- bility is also easily to realize in the centralized In this section we describe the DSSN reference ar- setting (as is confirmed by various APIs, e.g., chitecture. After introducing a few design principles Open Social), a centralized Social Network set- on which the architecture is based, we present its dif- ting could easily prohibit (or censor) certain ex- ferent layers, i.e. the data, protocol, service and appli- tensions for commercial (or political) reasons. cation layers. The overall architecture is depicted in – Reliability. Again due to the distributedness the Figure 1. DSSN is much less endangered of breakdowns or cyberterrorism, such as denial-of-service attacks. 2.1. Basic Design Principles – Freedom of communication. As we observed re- cently during the Arab Spring social networks can Our DSSN architecture is based on the following play a crucial role in attaining and defeating civil three design principles. liberties. A DSSN with a vast amount of nodes is Linked Data. The main protocol for data publishing, retrieval and integration is based on the Linked Data 2We already experimented with activities like git commits and comments on lines of source code – both usecases integrate very well principles [4]. All of the information contained in the because of the schema-agnostic transport protocols and the Linked DSSN is represented according to the RDF paradigm, Data paradigm. made de-referencable and interlinked with other re- S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 3

Profile Bookmark Foto ... Manager Collection Sharing Application Layer 4 2 create update 5 search push subscribe 7 3 delegeate access to Ping Update Search Push 6 Service Layer 4 announce 1 create update 5 index 1 announce 7 read access

WebIDs Activity History Streams Feeds Data & Media announce Artefacts 1 Resources Feeds Data Layer

Fig. 1. Architecture of a Distributed Semantic Social Network (without protocol layer): (1) Resources announce services and feeds, feeds announce services – in particular a push service. (2) Applications initiate ping requests to spin the Linked Data Network. (3) Applications subscribe to feeds on push services and receive instant notifications on updates. (4) Update services are able to modify resources and feeds (e.g. on demand of an application). (5) Personal and global search services can index resources and are used by applications. (6) Access to resources and services can be delegated to applications by a WebIDs, i.e. the application can act in the name of the WebID owner. (7) The majority of all access operations is executed through standard web requests. sources. This principle facilitates heterogeneity as well ensures the extensibility of the data model and keeps as extensibility and enables the distribution of data and the overall architecture clean an reliable. services on the Web. The resulting overall distributive character of the architecture fosters reliability and free- 2.2. Data Layer dom of communication and leads to more data security by design. The data layer comprises two main data structures: resources for the description of static entities and feeds Service Decoupling. A second fundamental design for the representation and publishing of events and ac- principle is the decoupling of user data from services tivities. as well as applications [10]. It ensures that users of the network are able to choose between different ser- 2.2.1. Resources vices and applications. As a result, this principle al- We distinguish between three main categories of lows an even more distributed character of the Social DSSN resources: WebIDs for persons as well as appli- Network which stresses the same issues as distributed cations, data artefacts and media artefacts. The prop- Linked Data. In addition, this principle helps users of erties, conditions and roles in the network of these re- sources are described in the next paragraphs. the DSSN to distinguish between their own data, which they share with and license to other people and ser- WebID [20]3 is a best practice recently conceived in vices, and foreign data, which they create by using order to simplify the creation of a digital ID for end these services and which they do not own. This turns users. Since its focus lies on simplicity, the require- the un-balanced power structure of centralized Social ments for a WebID profile are minimal. In essence, Networks upside down by strictly settling the owner- a WebID profile is a de-referenceable RDF document ship of the data to the user side and allowing access to (possibly even an RDFa-enriched HTML page) de- that data in an opt-in way, which leads to more privacy. scribing its owner4. That is, a WebID profile con- Protocol Minimalism. The main task for social net- 3 http://webid.info/spec/ working protocols is to communicate RDF triples be- The latest spec is available at . 4The usage of an IRI with a fragment identifier allows the indi- tween DSSN nodes, not to enforce a specific work flow rect identification of an owner by reference to the (FOAF) profile nor an exact interpretation of the data. This constraint document. 4 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network

tains RDF triples which have the IRI identifying the tion is an enhancement to the current WebID authenti- owner as subject. The description of the owner can cation process in order to allow applications to access be performed in any mix of suitable vocabularies resources and services on behalf of the WebID owner, and FOAF [7] as the fundamental ‘industry standard’ but without the need to introduce additional applica- which can be extended5. An example WebID profile tion certificates in a WebID. We describe the WebID comprising some personal information (lines 8-12) and protocol as well as our access delegation extension in two rel:worksWith6 links to co-workers (lines 6- detail in Section 2.3.1. 7) is shown in Listing 1. Agents and Applications play an important role in today’s social networks7. They have access to large parts of the profile data and can add or change some 1 @prefix rdfs: . 2 @prefix foaf: . ate and link images. Applications on the DSSN are 3 @prefix rel: also identified by using WebID profiles, but are not . 4 a foaf: described as a person but as an application. They can Person; act on behalf of a person but rely on delegated access 5 rdfs:comment "This is my public profile only, more information available with FOAF+SSL"; rights for such an activity. This process is described in 6 rel:worksWith , Section 2.3.1. 7 ; Data Artefacts are resources on the Web which are 8 foaf:depiction ; published according to the Linked Data principles. 9 foaf:firstName "Philipp"; foaf:surname " Frischmuth"; Data artefacts includes posts, comments, taggings, ac- 10 foaf:mbox ; created by services and applications on the Web. Most 11 foaf:phone ; 12 foaf:workInfoHomepage . such as SIOC [6], Common Tag8 or Activity Streams Listing 1: A minimal WebID profile with personal in RDF [12]. information and two worksWith relations to other Media Artefacts are also created by services and ap- WebIDs. plications but consist of two parts – a binary data part which needs to be decoded with a specific codec, and a meta-data part which describes this artefact. Usually, Apart from the main focus on representing user pro- such artefacts are audio, video and image files, but of- files, our architecture extends the WebID concept by fice document types are also frequently used on the facilitating two additional tasks: service discovery and Social Web. Media artefacts can be easily integrated into the DSSN by using the Semantic Pingback mech- access delegation. anism, which is described in Section 2.3.2, and a link Service discovery is used to equip a WebID with re- to a push-enabled activity stream. An example foto- lations to trusted services which have to be used with sharing application is described in Section 2.5. that WebID. The usage of the WebID itself ensures that an agent can trust this service in the same way as 2.2.2. Feeds she trusts the owner of the WebID. The most impor- Feeds are used to represent temporally ordered in- tant service in our DSSN architecture is the Seman- formation in a machine-readable way but are usually tic Pingback service, which we describe in detail in not expressed as RDF. Feeds are widely used on the Section 2.3.2. An example service relation is shown in Web and play a crucial role in combination with the Listing 4. PubSubHubbub protocol to enable near real-time com- In addition to this service, we introduce access del- munication between different services. In the context egation for the WebID protocol. WebID access delega- of the DSSN architecture, two types of feeds are worth considering:

5In theory, FOAF can be replaced by another vocabulary but as a grounding for semantic interoperability, we suggest to use it 7Social network games such as FarmVille can have more than 80 6Taken from RELATIONSHIP: A vocabulary for describing million users (according to appdata.com), which constantly cre- relationships between people at http://purl.org/vocab/ ate activities in their social network. relationship. 8http://commontag.org/Specification S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 5

Activity feeds describe the latest social network ac- tion, a relying party can assert that the accessing user tivities of a user in terms of an actor - verb - owns a certain WebID. Furthermore, the WebID proto- object triple where activity verbs are used as types col can provide access control functionality for social of activities (e.g. to post, to share or to bookmark a networks shaped by WebIDs in order to regulate access specific object)9. Activity feeds can be used to produce to certain information resources for different groups a merged view of the activities of one’s own social net- of contacts (e.g. as presented with dgFOAF [18]). An work. In our DSSN architecture, each activity is cre- example of a WebID profile, which is annotated with ated as a Linked Data resource (i.e. a DSSN data arte- a public key, is shown in Listing 2. This WebID pro- facts), which links to the actor and object of the activ- file contains additionally a description of an RSA pub- ity. In addition, each activity is equipped with a Ping- lic key (line 15), which is associated to the WebID by back Server in order to allow receiving reactions on using the cert:identity property from the W3C this activity (called ) and thus to spin a con- certificates and crypto ontology (line 19). tent network between these artefacts.

History feeds are used to allow the syndication of 13 @prefix rsa:. change sets of specific resources between a publisher 14 @prefix cert: . 15 [] a rsa:RSAPublicKey; of a resource and many subscribers of the resource. 16 rdfs:comment "used from my smartphone ..."; History feeds describe changes in RDF resources in 17 cert:identity ; terms of added and deleted statements which are boxed 18 rsa:modulusc "C41199E ... 5AB5"^^cert:hex; in an feed entry. A subscriber’s social network 19 rsa:public_exponent "65537"^^cert:int. application can use this information to maintain an ex- Listing 2: An extension of the minimal WebID act copy of the original resource for caching and query- from Listing 1: Description of an RSA public ing purposes. History feeds are in particular important key, which is associated to the WebID by using for the syndication of changes of WebID profiles (e.g. the cert:identity property from the W3C if a contact changes its phone number). certificates and crypto ontology. 2.3. Protocol Layer

The protocol layer consists of the WebID identity Nevertheless the described approach requires the protocol and two networking protocols which pro- user to access a secured resource directly, e.g. through vide support for two complete different communica- a web browser which is equipped with a WebID- tion schemes, namely resource linking and push noti- enabled certificate. However, in the scenario of a dis- fication. tributed social network this arrangement is not always the case. If, for example, the software of user A needs 2.3.1. WebID (protocol) to update its local cache of user B’s profile, it will do so From a more technical perspective the WebID pro- by fetching the data in the background and not neces- tocol [20] (formerly known as a best practice [21]) sarily when user A is connected. An obvious solution incorporates authentication and trust into the WebID would be to hand out a WebID-enabled certificate to concept. The basic idea is to connect an SSL client the software (agent), but then the user needs to create certificate with a WebID profile in a secure manner a dedicated certificate for all tools that have to access and thus allowing owners of a WebID to authenticate secured information and simultaneously allows all par- against 3rd-party websites with support for the WebID ticipating tools to "steal" her identity, which is not the protocol. The WebID (i.e. a de-referencable URI) is, preferred solution from a security perspective. therefore, embedded into an X.509 certificate by using the Subject Alternative Name (SAN) extension. The document, which is retrieved through the URI, con- 20 @prefix dssn: . tains the corresponding public key. Given that informa- 21 dssn:deputy .

9Activity streams (http://activitystrea.ms) are Atom Listing 3: Access delegation through the format extensions to describe activity feeds. It is extensible in a way dssn:deputy property. that allows publishers to use new verb or object type IRIs to identify site-specific activities. 6 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network

To resolve this dilemma, we have extended the We- vertisement of a lightweight RPC service13, in the RDF bID protocol by adding support for access delegation. document, HTTP or HTML header of a certain Web By delegating access to an agent, a user allows a par- resource, which should be called as soon as a (typed ticular agent to deputy access-secured information re- RDF) link to that resource is established. The Semantic sources. The agent itself authenticates against the re- Pingback mechanism enables people, but also authors lying party by using it’s own credentials, e.g. by em- of RDF content, a weblog entry or an article in gen- ploying the WebID protocol, too. Additionally, it sends eral, to obtain immediate feedback, when other peo- a X-DeputyOf HTTP header, which indicates that ple establish a reference to them or their work, thus a resource is accessed on behalf of a certain WebID facilitating social interactions. It also allows to pub- user10. The relying party then fetches the WebID and lish backlinks automatically from the original WebID checks for a statement as shown in Listing 311. profile (or other content, e.g. status message) to com- If such a statement is found and the relying party ments or references of the WebID (or other content) trusts the accessing agent, access to the secured re- elsewhere on the Web, thus facilitating timeliness and source is then granted. Given that the user is always coherence of the Social Web. able to modify the information provided by her We- As a result, the distributed network of WebID pro- bID, she stays in control with regard to the delegation files, RDF resources and social websites can be much of access to other parties. In contrast to other access tighter and timelier interlinked by using the Semantic control solutions such as OAuth12, a user only has to Pingback mechanism than conventional websites, thus maintain a central resource, her WebID. rendering a network effect, which is one of the ma- 2.3.2. Semantic Pingback jor success factors of the Social Web. Semantic Ping- The purpose of Semantic Pingback [23] in the con- back is completely downwards compatible with the text of DSSN architecture is twofold: conventional Pingback implementations, thus allowing the seamless connection and interlinking of resources – It is used to facilitate the first contact between two on the Social Web with resources on the DSSN. An WebIDs and establish a new connection (Friend- extension of our example profile with Semantic Ping- ing). back functionality making use of an external Semantic – It is used to ping the owner of different social Pingback service is shown in Listing 4. network artefacts if there are activities related to As requested by our third DSSN design paradigm these artefacts (e.g. commenting on a blog post, (protocol minimalism), Semantic Pingback is a generic tagging an image, sharing a website from the data networking protocol which allows to spin rela- owner). tions between any two Social Web resources. In the The Semantic Pingback approach is based on an ex- context of the DSSN Architecture, Semantic Pingback tension of the well-known Pingback technology [11], is used in particular for friending, commenting and which is one of the technological cornerstones of the tagging. overwhelming success of the in the Social Web. The overall architecture is depicted in Figure 2. 13In fact, we experimented with different service endpoints, and The Semantic Pingback mechanism enables bi- as a result we now prefer simple HTTP post requests which are not directional links between WebIDs, RDF resources as compatible with standard XML-RPC pingbacks. This is described in well as weblogs and websites in general (cf. Figure 1). more detail in [22]. It facilitates contact/author/user notifications in case a link has been newly established. It is based on the ad- 22 @prefix ping: . 23 ping:to < http://pingback.aksw.org>. 10We have recently discussed the motivation and solution of this extension with a few members of the W3Cs WebID incubator group Listing 4: Extension of the minimal WebID profile and hope that they will extend their specification regarding access delegation. from Listing 1: Assignment of an external Semantic 11The dssn:deputy relation and other related schema re- Pingback service which can be used to ping this sources introduced in this paper are part of the DSSN namespace, specific resource.) which is available at http://purl.org/net/dssn/. 12http://oauth.net/ S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 7

Resource Layer Linking Resource Linked Resource (Source) (Target) (typed) linking 1

5 autodiscovery Publisher fetches 3

2 6 announces (updates) observes (notifies) 7 Pingback Client 4 Pingback Server (Link Propagator) RPC request RPC Layer

Link Publisher Link Reveiver

Fig. 2. Architecture of the Semantic Pingback approach: (1) A linking resource links to another (Data) Web resource, here called linked resource. (2) The Pingback client is either integrated into the data/content management system or realized as a separate service, which observes changes of the Web resource. (3) Once the establishing of a link has been noted, the Pingback client tries to auto-discover a Pingback server from the linked resource. (4) If the auto-discovery has been successful, the respective Pingback server is used for a ping. (5) In order to verify the retrieved request (and to obtain information about the type of the link in the semantic case), the Pingback server fetches (or de-references) the linking resource. (6 + 7) Subsequently, the Pingback server can perform a number of actions such as updating the linked resource (e.g. adding inverse links) or notifying the publisher of the linked resource (e.g. via email).

Friending is the process of establishing a symmetric tries from one publisher to many subscribers. Since foaf:knows relation between two WebIDs. A rela- feed entries are not described as RDF resources, Pub- tionship is approved when both persons publish this re- SubHubbub is not the best solution as a transport pro- lation in their WebIDs. A typical friending work flow tocol for a DSSN from our perspective. However, Pub- can be described by the following steps: SubHubbub with atom feeds is widely in use and has good support in the web developer community which – Alice publishes a foaf:knows relation to Bob is why we decided to use it in our architecture. Similar in their WebID. to Semantic Pingback, it is agnostic to its payload and – Alice’s WebID hosting service pings Bob’s We- can be used for all publish/subscribe communication bID because of this new statement. connections. – Bob receives a message from his Pingback Ser- The main work flow of establishing a PubSubHub- vice. bub connection is to advertise a hub service in an ex- – Bob approves of the relation by publishing it in isting feed. A subscriber follows this link and requests his WebID, which sends again a ping back to Al- ice. a subscription on this feed. If the feed changes, the feed publisher informs the hub service which instantly This basic model of communication can be applied broadcasts the changes to all subscribers14. The main to different events and activities in the social network. advantage of this communication model is to avoid Table 1 lists some of the more important pingback frequent and unnecessary pulls of all interested sub- events. In any case, the owner of these resources can scribers from this feed and to allow a faster broadcast be informed about the event and in most cases specific to the subscriber. actions should be triggered (refer to Section 3 for more In the DSSN architecture, two specific feeds are im- details). portant and interlinked with a WebID to allow sub- 2.3.3. PubSubHubbub scriptions to them: activity feeds which are used for ac- PubSubHubbub is a web-hook-based publish/sub- scribe protocol, as an extension to Atom and RSS, 14During the subscription a callback endpoint is supplied by the which allows for near instance distribution of feed en- subscribing endpoint, which is later used for pushing the data. 8 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network

Table 1 Pingback activities distinguished by types of resources and relations.

source resource relation target resource description

WebID (foaf:Person) foaf:knows WebID (foaf:Person) friending sioct:Comment sioc:about foaf:Image commenting an image (or any other resource) sioc:Post sioc:reply_of sioc:Post replying to a (friends) post ctag: ctag:tagged * any resource is tagged by a user aair:Activity aair:activityObject * any resource is object of an activity tivity distribution and history feeds which are used for the application layer). WebIDs can be equipped with resource synchronization. different services in order to allow manipulation and other actions on the user’s data by other applications. Activity Distribution is a fundamental communica- tion channel for any social network. A personal activ- As depicted in Figure 1, we have defined four essential ity feed publishes the stream of all activities on social services for a DSSN. network resources (artefacts and WebIDs) with a spe- Ping Service cific user as the actor. These activities can and should The ping service provides an endpoint for any in- be created by any application which is allowed to up- coming pingback request for the resources of a user. date the feed (ref. access delegation). In addition, ac- First and foremost, it is used with the WebID for tivity feeds can be created for data and media arte- friending but also for comment notification and discus- facts in order to allow object-centered push notifica- sions. tion. Typically, activity feed updates are pushed to a One application instance can provide its services personal search / index service of a subscribed user for multiple resources. In a minimal setup, a ping ser- (see next section) vice provides only a notification service via email. In Resource Synchronization is an additional commu- a more complex setup, the ping service has access to nication scheme based on feeds. It is required, espe- the update service of a user (via access delegation) and cially in distributed social networks, to take into ac- can do more than notification. count that relevant data is highly distributed over many A ping service can be announced with the ping:to locations, and access to and querying of this data can relation as shown in Listing 4. be very time-consuming without caching. A properly connected resource synchronization tackles this prob- Push Service lem by allowing users to subscribe to changes of cer- The push service is used for activity distribution tain resources over PubSubHubbub. For a WebID, this and resource synchronization. Both introduced types process can be included into the friending process, of feeds announce its push service in the same way while for other resources a user can subscribe man- as using the rel="hub" link in the feed head. Since ually (e.g. for a group resource to receive updates of both types of feeds are valid atom feeds, a DSSN push its membership relations). Since resource synchroniza- service can be a standard PubSubHubbub-based in- tion is not an issue which was raised in the context of stance. DSSN at first, we have built our data model as Linked To equip social network resources with its cor- Data update logs [2] based on the work previously pub- responding activity and history feeds, we have de- lished by the Triplify project15. fined two OWL object properties which are sub- properties of the more generic sioc:feed relation 2.4. Service Layer from the SIOC project [6]: dssn:activityFeed and dssn:syncFeed. Services are applications which are part of the In addition to these RDF properties, DSSN agents DSSN infrastructure (in contrast to applications from should pay attention to the corresponding HTTP header fields X-ActivityFeed and X-SyncFeed, 15http://triplify.org which are alternative representations of the OWL ob- S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 9 ject properties to allow the integration of media arte- In our architecture we assume that search services facts without too much effort. accept SPARQL queries. However, this assumption is not true for all public Semantic Web at Search and Index Service the moment. Search and index services are used in two different contexts in the DSSN architecture. Update Service Finally, an update service provides an interface to 1. They are used to search for public web resources, modify and create user resources in terms of SPARQL which are not yet part of a user’s social network. update queries. In the same way as private search ser- These search services are well-known semantic vices, update services are secured by way of the We- search engines as Swoogle [9] or Sindice [25]. bID protocol and accept requests only by the user itself They use crawlers to keep their resource cache and by agents in access delegation mode18. up-to-date and provide user interfaces as well as Typical examples of how to use this service are the application programming interfaces to integrate creation of activities on behalf of the user or the mod- and use their services in applications. ification of the user’s WebID, e.g. by adding a new In our architecture, these public services are used foaf:knows relation. to search for new contacts as well as other arte- In the next section, we give a detailed description of facts in the same way as people can use a stan- the service interplay and usage by applications. dard web search engine. The main advantage in using Semantic Web search engines lies in their 2.5. Application Layer ability to use graph patterns for a search16. 2. In addition to public search services, we want to emphasize the importance of private search ser- Social Web applications create and modify all kinds vices in our architecture. Private search services of resources for a user. In our architecture, they have to are used in order to have a fast resource cache not use the trusted services which are related to a WebID only for public, but also for private data which a instead of their own. Since access to these services is user is allowed to access. exclusively delegated by the user to an application, the 19 A private search service is used for all users and user has full control over her data . To illustrate how application queries from applications which act DSSN applications work with the WebID and its ser- on behalf of the user. The underlying resource vices, we will describe a simple foto-sharing applica- index of a private search service is used as a tion: callback for all push notifications from feeds to When a user creates an account on this service, she which the user has subscribed. That is, she is able uses her WebID for the first login and delegates ac- to query over the latest up-to-date data by using cess to this application. The application analyses the her private search service. In addition, she can WebID and discovers the trusted services and some query for data which has never been public and meta-data of the user (e.g. name, short bio and depic- is published for a few people only. tion). The user uploads her first image to the appli- cation, which, as a result, creates a new activity with Since private search services are used by applica- the user’s update service20. Furthermore, it equips the tions which act on behalf of the user, they must be WebID-protocol-enabled. That is, they accept requests dssn:trustedService. We assume that such an easy vocabu- from the user and her delegated agents only. In addi- lary is only the first step to a fully featured service auto-discovery tion, applications need to know which private search ontology and consider all dssn terms as unstable. service should be accessed on behalf of the user. This 18We defined dssn:updateService as a relation between a mode is again made possible by providing a link from WebID and an update service 19 the WebID to the search service17 At the moment we distinguish only between access and no ac- cess to a service. As an extension, we can imagine that a private search service can handle access on parts of the private Social Graph 16A motivating example in our context is the search for differently (an online game does not need to know which other ac- resources of type foaf:Person, which are related to the tivities you pursue on the web). Access policies for RDF knowledge DBpedia topic dbpedia:DataPortability (e.g. with the bases is a topic of ongoing research and we hope that the results of foaf:interest object property). this research area can be adapted here. 17In our prototypes we use a simple OWL object prop- 20This activity is instantly pushed to all of the user’s friends but is erty dssn:searchService, which is a sub-property of not part of the work of the image publishing service. 10 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network newly uploaded image with the pingback service of the toWiki basic functionality. We employ the RDFau- user; thus enabling the image for backlinks and com- thor [24] JavaScript widget library in order to auto- ments. New comments can arrive from everywhere on matically create forms out of RDFa annotated HTML the web, but the application also provides its own com- documents21. Without any user effort, a newly created menting service (integrated in the image web view). If resource will be Linked data enabled if it shares the another user writes a comment on this image, a data namespace of the OntoWiki installation. In addition artefact is created in the namespace of the application to that, we added support for WebID authentication and a ping request is sent to the user’s pingback service as well as for certificate creation by implementing a (since this service is related to the image). dedicated WebID extension. Since there is quite some This simple example demonstrates the interplay and cryptographic processes involved, this extension needs rules of the DSSN service architecture. A more com- some prior configuration steps, e.g. the OntoWiki in- plex social network application is described in the next section. stance needs to be configured as a SSL/TLS enabled web application.

3. Implementation 3.2. Maintaining Network Connections

We provide a prototypical implementation of our ap- In order to maintain network connections (aka. proach which utilizes the OntoWiki application frame- friending) we execute the following steps in our imple- work [3] which is also the basis for the evaluation in mentation: Section 4. All features were realized by employing the extension mechanisms offered by OntoWiki. 1. When a user enters a WebID inside the friending The prototype described here, can be summarized as module, we add a new statement (foaf:knows) a WebID provider with an integrated communication into the RDF graph containing the users profile hub. The following features are implemented so far: data. This automatically generates a Pingback re- – Users can create and manage their WebID profiles quest, so the new friend will be notified. and any other Linked Data-enabled resource. 2. We create a new RDF graph, which will act as a – Users can make friends, subscribe to their activ- cache for the WebID profile data about that par- ities and profile updates and receive changes in- ticular friend. We also configure this new Knowl- stantly on change. edge Base in such a way, that it is imported into – Users can search and browse for friends and ac- the users graph automatically. tivities inside her social network as well as fil- 3. We fetch the data from the friends WebID by ter these resources by facets based on object and employing the Linked Data principles. We cache datatype properties. that data in the newly generated graph in order to – Users can comment on and subscribe to any enhance the performance. Since those informa- DSSN resource which is equipped with a Se- tion is stored in a separate graph, a synchroniza- mantic Pingback service or an PubSubHubbub- tion is trivial. enabled activity stream. 4. Finally, we subscribe to two feeds when they are – Users receive notifications if someone comments published within the users profile: (1) The users or links her WebID and send a pingback notifica- activity feed, which we use to create the news tion. feed for the user. The incoming atom activity en- In the next subsections we describe the implementa- try were transformed to AAIR resources and im- tion decisions made to allow these functionality. An ported to an additional activity RDF graph. (2) commented screenshot of the central activity stream The history feed for the friends profile. This feed interface can be seen on Figure 3. is employed in order to keep the cached WebID profile data up-to-date. 3.1. Creating and Updating Data Artefacts

Creating and managing Linked Data enabled RDF 21Each output page which is created by OntoWiki is RDFa en- resources can be achieved without modifying the On- hanced. S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 11

2 1 3

6

4 5

Fig. 3. Screenshot of the OntoWiki DSSN activity stream view. The interface elements are: (1) The Share it! activity creation module where users can post status notes and share links and media artefacts. (2) The main interface view tab to switch between configuration screen, profile manager, friending interface and the activity stream. (3) The activity filter and search module, to allow a facet based browsing of activities. (4) The events module, which queries the cache social network data for birthdays and other events. (5) The activity stream, which is the result of the SPARQL query modified by the filter module. (6) The generic wiki interfaces to create any type of Linked Data resource (e.g. comments).

3.3. Ignoring Activities and WebIDs sharing as well as link recommendations. We imple- mented a small extension, which displays a Share It! Since a user is normally not interested in all activi- module inside the OntoWiki user interface. As a re- ties of his friends (e.g. certain games), there needs to sult the user can quickly access this functionality and be a feasibility to remove certain activities from the sharing is made really easy. Once the user generates timeline. We therefore employ EvoPat [15], a pattern- an activity, her activity feed is updated and subscribers based approach for the evolution and refactoring of to that feed are delivered with the new content by the RDF knowledge bases. EvoPat is integrated also as push service. an OntoWiki extension and in conjunction with our DSSN implementation we use this component to keep 3.5. Pingback Integration users timeline clean. Each time a user selects a Hide all activities . . . but- All activities are represented as Linked Data re- ton next to an activity, we create a new pattern, which sources and are equipped with a Pingback service as subsequently matches such statements in the graph and removes them. Possible hide patterns are generated well as a feed. Thus, users can comment on any re- from the activity data itself, for instance . . . from this source in their own social application and addition- user, . . . of this language or . . . about this object. Once ally subscribe to changes to that resource (e.g. com- new data for a given user is added, we re-apply the pat- ments by others). Each time someone comments on a tern. In this way it also applies for future changes of resource (or otherwise uses it) externally, a Pingback the knowledge base. request is sent to the owning OntoWiki instance. Con- sequently the publisher gets notified and is able to react 3.4. Generating and Distributing Activities again on that new activity, which facilitates conversa- tions in a distributed manner. If a user is subscribed to A user is able to generate different kinds of activ- a resource activity feed (which is automatically done, ities in our implementation. In the current state we once he comments on a particular resource), she gets support three types of activities: status updates, photo notified about other comments, even if she is not the 12 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network

currently commenting person or the owner of the re- 1. User A takes a photo of user B and uploads source. it (e.g. with the example application from Sec- tion 2.5): The web space returns a link to the user’s pingback server in the HTTP header of the 4. Evaluation uploaded image. We divided our evaluation process into two inde- 2. User A explicitly tags the photo with user B: This pendent parts: Firstly, the qualitative evaluation part is done by creating a tag resource which links aims to prove the functionality of the DSSN architec- both to the image and to the WebID of user B. ture by assessing use cases from the social web acid A pingback client sends a ping request to all of test (Section 4.1). Secondly, the quantitative evalua- these resources after publishing the tag on the tion part aims to prove the real-world usefulness of web. our prototypical implementation by testing the perfor- 3. User B is notified that she is on a photo: The mance and distribution of data in the social network. notification is created by the pingback service of This evaluation is carried out by using a social network User B who has received a request from the tag- simulation approach (Section 4.2). ging application which was used by User A. User C, who follows user A, receives the photo: 4.1. Social Web Acid Test 4. User C is instantly provided with an update in The Social Web Acid Test (SWAT) is an integration her activity stream, informing her about the new use case test conceived by the Federated Social Web image. Incubator Group of the W3C. Currently, only the first 5. User C leaves a comment on the photo: This is and very basic level of the test (SWAT022) has been de- done in the same way as publishing the Tag. veloped and described completely. Nevertheless, those 6. User A and user B are notified about the com- parts of the next level (SWAT1) which are currently ment: User A will be notified because her ping- published are discussed here too. back service informs her about this ping. User B SWAT0: The objectives of the first SWAT level have will be notified only if she has subscribed to the clearly been described in the following use case23: activity feed of the photo provided that it exists. SWAT1 is currently not finally defined24, so an eval- 1 User A takes a photo of user B from her phone uation can be a rough sketch only. The next SWAT and posts it. 2 User A explicitly tags the photo with user B. level will require a few different use cases which intro- 3 User B gets notified that she is in a photo. duce some new Social Web concepts. However, most 4 User C who follows user A gets the photo. of the user stories are satisfied already as a conse- 5 User C leaves a comment on the photo. quence of the fully distributed nature of the DSSN ar- 6 User A and user B get notified about the comment. chitecture (e.g. data portability and social discovery). The more interesting user stories are: (1) The Private Listing 5: Social Web Acid Test - Level 0 content and Groups use cases will require a distributed ACL management. Some ideas on using WebIDs for group ACL management are already published with Utilizing all technologies described before, our dgFOAF [18] and we think that this is a good starting DSSN architecture passes the SWAT0 without any point for further research. (2) The Social News use case problems. The following enumeration describes the introduces a new vote activity. Since our architecture details: applies schema agnostic social network protocols, this new type of activity can be communicated as any other 22 http://www.w3.org/2005/Incubator/ activity. federatedsocialweb/wiki/SWAT0 23For this use case the following assumptions are made: (1) Users employ at least two (ideally, three) different services each of which is built with a different code base. (2) Users only need to have one 24Available online at http://www.w3.org/2005/ account on the specific service of their choice. (3) Ideally, partici- Incubator/federatedsocialweb/wiki/SWAT1_use_ pants A, B, and C use their own sites (personal URLs). cases (receive 29.07.2011). S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 13

1 2 3 4

Triplify Profile Data Network and Activity Dump Activities Profile Distributor Replay Agents

Fig. 4. Workflow of the social network evaluation framework: (1) An activity and profile graph is preprocessed and dumped as RDF. (2) The knowledge base is split up into a part for every single profile and can be distributed to a list of server machines. (3) For each profile, a replay agent is initialized which will create and update a specific profile as well as activities on a single social network node. (4) The social network node (in our case an OntoWiki instance) connects to other nodes as well as creates activities for the Social Network in the same way as it would be achieved under control of a human user.

4.2. Evaluation Framework Architecture phones and small virtual hosts) to medium and heavy class systems (e.g. cloud instances, hosted services and Our next aim was to evaluate the architecture in full root servers). Each of these nodes accesses only quantitative terms. We decided to use a simulation ap- a small part of the complete Social Network graph proach in which activities are created artificially (but since the information is shared only with the connected based on real data, see Section 4.4) on the Social Net- nodes. work rather than arranging a user evaluation, which Consequently, we need to pose the following ques- strongly depends on the graphical user interface. In a tions in our quantitative evaluation: first step, we added an additional service for remote procedure calls to the OntoWiki DSSN node as well as 1. How many incoming triples need to be cached an execution client (replay agent) which can inject ac- from an averagely connected node in a week of tivities remotely controlled and based on input activity average Social Network activity? data in RDF. Then we used the public Twitter dataset 2. Are the queries fast enough to provide the data described in [1] and transformed it to an RDF graph as for the user interface from a triple store on such a base for a replay of activities of type status note. weak hardware? The workflow of the evaluation framework is de- To answer these questions, we created a testbed picted in Figure 4. It has a pipeline architecture which where we can analyse each node’s triple store. is built by using three additional tools that help us to generate and manage the generated replay agent test data. The generated data was processed by the pro- 4.4. Data Generation and Testbed Configuration file distributor which splits the data into separate user profiles, each with personal data and activities. Af- In a first step, we used Triplify [2] to generate the ter splitting the data into single parts, the profile dis- activity data from the relational database. The database tributor passes each part on to a corresponding replay consists of 2.3 Mio public tweets fetched from 1701 agent. Upon call the replay agent can instantiate a new Twitter accounts over a time frame of two months. For OntoWiki-based DSSN node with the given user pro- each posted tweet, we created a status note resource file and activities. description and added two SIOC properties to link the creator account and publish the creation timestamp. In 4.3. Evaluated Questions addition to that, we linked each status note to the Twit- ter terms of services to demonstate the usage of licens- Based on the proposed architecture, a DSSN will be ing in our architecture (the data ownership issue from distributed over hundreds of servers. These Social Net- the Introduction). Listing 6 shows an example status work nodes will have hardware specifications which note resource taken from the database26. can range from very light-weight (e.g. plug comput- ers as proposed by the FreedomBox25 project, smart- 26The Listings in this section use the prefixes aair, dct, sioc, rdfs, xsd, atom and foaf which are well known or already ref- 25http://www.freedomboxfoundation.org/ ered in this paper. The base prefix swj refers to the namespace 14 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network

Query Q1 asks for an ordered list of the last ten status 1 swj:o5516682621620224 a aair:Note; 2 rdfs:seeAlso ; fetching a resource list based on a given user defined 3 sioc:created_at "2010-11-19T08:04:14"^^xsd: dateTime; configuration. The query is used for area 2 on Figure 3 4 sioc:has_creator swj:youngglobal; and is typically followed by a query which fetches the 5 dct:license ; 6 aair:content "I’m at Norwood (241 W 14th St, btw needed data of exactly these ten activities (rather than 7th & 8th, New York) w/ 5 others. http://4sq doing both in one single query). .com/66OndN". 7 8 swj:a5516682621620224 a aair:Activity; 9 atom:published "2010-11-19T08:04:14"^^xsd: dateTime; 1 SELECT DISTINCT ?r 10 aair:activityActor swj:youngglobalPerson; 2 WHERE { 11 aair:activityVerb aair:Post; 3 ?r a aair:Activity. 12 aair:activityObject swj:o5516682621620224. 4 ?r atom:published ?pub. 5 ?r aair:activityObject ?aairObject. 6 ?aairObject a aair:Note. Listing 6: Example status note resource and 7 FILTER corresponding activity. 8 (?pub >= "2010-12-01T00:00:00"^^xsd:dateTime) 9 FILTER 10 (?pub <= "2010-12-01T23:59:59"^^xsd:dateTime) 11 } 12 ORDER BY ?pub 13 LIMIT 10 For the creation of the corresponding activities we intepreted re-tweets as sharing and (original) tweets Listing 8: Ordered list of the last ten status posts. as posting activities and assigned different activity verbs based on the data. Each activity is linked to a FOAF person resource (without WebID specific en- hancements) which we additionally enriched with a Query Q2 is used to build the facet based exploration random foaf:birthday. module depicted in area 3 on Figure 3. It asks for all values of a specific exploration facet of the activities of a given time frame. This specific query asks for the 1 swj:youngglobal a sioc:UserAccount; 2 sioc:name "youngglobal"; used verbs in the selected set of activities (possible 3 sioc:account_of swj:youngglobalPerson; 4 rdfs:seeAlso . verbs are post, share, comment etc.). 5 6 swj:youngglobalPerson a foaf:Person; 7 foaf:name "youngglobal"; 8 foaf:knows swj:RdubuchePerson; 1 SELECT DISTINCT ?activityVerb 9 foaf:birthday "01-31"; 2 WHERE { 10 foaf:depiction ; 5 ?r aair:activityObject ?aairObject. 11 foaf:account swj:youngglobal. 6 ?aairObject a aair:Note. 7 ?r aair:activityVerb ?activityVerb. Listing 7: Example user account and FOAF person 8 FILTER 9 (?pub >= "2010-12-01T00:00:00"^^xsd:dateTime) resource. 10 FILTER 11 (?pub <= "2010-12-01T23:59:59"^^xsd:dateTime) 12 }}

Listing 9: A list of verbs connected to a list of Listing 7 shows the user account and FOAF person activities. resource which is linked from the activity in Listing 6. In order to evaluate the query performance on dif- ferent types of DSSN nodes in our architecture, we ex- tracted four exemplary queries, which are crucial for Query Q3 is used to fetch the list of the next five rendering the user interface depicted in Figure 3. upcoming birthdays together with the associated per- son. Since foaf:birthday values have a string http://dssn.lod2.eu/SWJ2012/, where the generated data datatype, comparison is also done by string order here. set is available for download. This query is used for area 4 on Figure 3. S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 15

tuoso 6.1.4 backed OntoWiki DSSN node on a dual 1 SELECT DISTINCT ?person ?bday 2 WHERE { core 2.4GHz system, with 4GB of RAM and an SSD. 3 ?person a foaf:Person. A FreedomBox is a personal server running on a 4 ?person foaf:birthday ?bday. 5 FILTER (xsd:string(?bday) >= xsd:string("01-29")) low-end system in an area where the users privacy can 6 } be preserved (e.g. as a DSL router in his household). 7 ORDER BY ASC(?bday) 8 LIMIT 5 No-one else has access to the system which runs 24 hours a day in the same way a server does. In this cate- Listing 10: List the next five upcoming birthdays. gory, we used a virtual machine with 1GB of memory, one core and a 25% CPU limitation from the server system above. In addition to that, we limited the triple store process to 300MB of RAM. Query Q4 is used to prepare a human readable la- Smartphones bel for all the resources which are currently visible in are very important for Social Network the interface. This includes schema resource as well activities today, but they are used mostly as a thin as instance data. The query uses a list of resources client without a backend. We argue that connection (line 5) and a list of possible label attributes (line 6) stability and battery issues will soon be solved in and fetches them in a vertical result set. The client re- the near future and smartphones can be used as first class DSSN nodes. In this category, we used an in- ceives the data and has to select a value based on an 27 ordered internal list (e.g. a foaf:name value is pre- browser JavaScript API store based on rdfQuery and ferred over an rdfs:label value, because the latter deployed the data and the store without a frontend on is more general). This query strategy is especially use- an iPhone 4S. ful in combination with incomplete data (e.g. use the foaf:nick if you do not have a foaf:name). 4.5. Results and Discussion

As described in Section 4.4, the testbed consists of 1 SELECT DISTINCT ?s ?p ?o 1701 DSSN node profiles with 2.3 Mio activities. We 2 FROM 3 WHERE { first looked, how this data was shared over these DSSN 4 OPTIONAL {?s ?p ?o.} nodes to get a feeling, which amount of data these 5 FILTER(sameTerm(?s, <...>) || sameTerm(?s, <...>) || ... ) nodes have to store. Regarding this, two characteristic 6 FILTER(sameTerm(?p, skos:prefLabel) || sameTerm(?p, indices are important: The number of activities of an dc:title) || sameTerm(?p, dct:title) || sameTerm(?p, foaf:name) || sameTerm(?p, aair: account and the number of related accounts which will name) || sameTerm(?p, sioc:name) || sameTerm(? receive these activities. p, rdfs:label) || sameTerm(?p, foaf: accountName) || sameTerm(?p, foaf:nick) || Figure 5 shows a scatter plot where each account sameTerm(?p, foaf:surname) || sameTerm(?p, is one point while the x axis represents the number skos:altLabel))} of foaf:knows relations to other persons from the Listing 11: Ask for all known title attributes for a given testbed network and the y axis is the amount of triples list of resources. which are produced with the node’s frontend (profile triple and activity triple). We cleaned the data by eliminating outlier in both dimensions. The average size of outgoing triples for Since one of the ideas of a distributed Social Net- all profiles after one week of activity is around 1500 work is the usage of low-end hardware, which can be triple. The average amount of related contacts in the achieved by everyone or which already exists in most cleaned data is 200. We used these values as an artifi- households as DSL router or WLAN access points, we cial point in the graph and identified one single account defined three prototypical categories of DSSN nodes which had the smallest distance to this point. This ac- where we wanted to test the query performance: A server is a typical host in a computing center count was used for the query evaluation. foaf:knows which can be used for a rental fee per month. Privacy is This account has 212 relations with moderatly preserved on such a system since the com- other accounts, shared 113 foreign notes and posted puting center staff can access the system. This category 5 original notes in one week. With this activities, the of DSSN node is used mostly by people with a strong technical background. In this category, we tested a Vir- 27https://github.com/alohaeditor/rdfQuery 16 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network

cache at the right places and pre-calculate many inter- face elements fostering a faster user experience.

50000 ● As a nice spin-off, this evaluation demonstrates how ● ●● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● Social Network federation can be achieved based on ●● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ●●● ●●● ●●● ●● ● ●● ● ●● ● ●●●●●● ●●● ●● ● ● ● ● ●●●●● ● ● ● ● ● ● ●●● ●● ●● ●● ● ● ●● ●● ● ●● ● ●● ● ● ● ● ● ●●●●● ●●●● ● ● ●● ● semantic interoperability. If a user wants to federate ● ● ● ●●● ●●●● ●●●● ● ● ●●●● ● ● ● ●●●● ●● ● ●●●●● ●● ●● ●● ●● ● ●●● ●●●● ● ● ● ●● ●● ● ●● ●● ●●● ●●●●●●●●●● ●● ●●● ● ● ●● ● ●● ●●●●● ●●●●●● ● ●● ●● ● ● ● ● ●●● ●●● ●●● ●●●●●●●●●●● ●●●●●● ●● ● ●● ●●●●● ●● ●●●●●●●● ●● ●● ●●● ● ● his DSSN node with other Social Networks, a semantic 5000 ● ●● ● ● ●● ●● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●● ● ●● ● ●●●●●●●●●●●●●●● ●● ● ●● ●● ● ● ●● ●●● ●●●●●●●●●● ●●●●● ● ●●●●●● ● ● ● ● ● ● ●●●●● ● ●●● ● ●●● ● ●●● ● ● ● ●●● ●● ●●●●●●● ●●●●●●●●●● ●●●●● ● ● ●● ●●●●● ● ●●●●●●●●●●●●●● ●●● ●● ● ●● ● ● ● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●● ●●●●● ● ●● agent can easily fetch and write data from Social Net- ●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●● ●● ● ● ●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●● ● ● ● ● ●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●●● ● ● ● ● ●● ●●●●●●●●●●●●●●●●●●● ●●●●●●● ● ●● ●●●● ●●●● ● ● ●●●●● ●●●●●●●●●●●●●●●●● ●●●●●● ●● ● ● ● ● ● ● ● ●●●●●●●●●● ●●●●●●●●●●●●●●● ● ● ● ● ● ●●●● ●● ●●●●●●●●●●●●●●●●●●●● ● ●●●●●● ● ●● work APIs, translate it to RDF and publish it as Linked ●● ●●●●●● ●●●● ●●●● ● ●●●●●●●●● ●●● ● ● ● ● ● ● ●●●●●●●●●●●●●● ● ●● ● ● ● ● ●●● ● ●●●●●●●●●●●● ●●●●●●●●●●●●● ● ● ●● ● ● ●●●●●●●●●●●●●● ●●●●●● ● ●●● ●●●● ● ● ● ●● ●●●●●●●●●●●●●●●● ● ●● ●●●● ● ● ●● ● ● ●●● ●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●● ●● ●● ● ●● ● ● ● ●● ●●● ●●●●● ●●●●● ●●●● ● ● ● Data including the discoverable services described in ● ● ●● ●●●●●●●●●●●● ● ●●● ● ● ●● ● 500 ● ● ● ●● ●●● ● ● ●●●● ● ● ● ● ● ●●●●●●●●● ●● ●●●●●●●● ●● ● ●● ●●● ●● ●● ●● ●●●● ●●● ● ● ● ● ●●●● ●●●● ●● ●● ●● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● Section 2. This can even be an integral part of its own ● ● ● ●●●●●●●● ● ● ● ●●●● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ●●●●● ●●●● ● ● ● ● ● ● ●●● ●● ● ●● ●● ●● ● ● ● ● ●● ●● ●●●●●● ●● ●● ● ●● ●● ● ● ●● ● DSSN node and only accessible for him.

produced triple in one week ● ● ● ● ● ●● ● ● ●

100 ● ●● ● ● ● ● ● ● ●●●● ● ● ●● ● ● ● ●

50 ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ●● 5. Related Work

10 ● ●●● ●●●● ●●● ● ●● ●●● ● ●●● ●

1 10 100 1000 10000 Most social web applications operate as silos of in- foaf:knows relations formation. This model addresses some drawbacks like lack of interoperability between applications, having one-single ownership model, inability in fully export- Fig. 5. Scatter plot with logarithmic axes of related accounts vs. the ing data, as well as opacity in using private data like number of created outgoing triples after one week of Social Network how data can be either used or transmitted. These activity (taken from [1], uncleaned). challenges are the reasons for addressing a distributed model. Various architectures for achieving a federated owner produced 1425 triples (incl. his profile). In addi- social network were proposed and numerous projects tion to that, he received 396346 triples from his friends based on those were developed with respect to a conve- over Linked Data and PubSubHubbub. nient and secure way for users. With the advent of the We used this graph and executed the introduced Semantic Web, research in this field was adapted for queries on the three different systems, described in taking advantages of machine-readable data and on- Section 4.4. Table 2 shows the average execution time tologies. We can roughly divide the related work into in milliseconds after 5 runs for each query. distributed social networks on the Web 2.0 and dis-

Table 2 tributed social networks on the Semantic Web. Runtime (in ms) of different evaluation queries Distributed social networks on the Web 2.0. The dis- Query Server FreedomBox Smartphone tributed social network model emerged to deal with challenges due to centralized models. The founda- Q1 (posts) 35 71 41325 tion of the distributed model lays on a set of stan- Q2 (facets) 150 312 37558 dards and technologies. This set of standards and Q3 (birthdays) 29 36 11324 protocols which together referred to as Open Stack Q4 (titles) 522 2205 n/a contains Rss, PubSubHubbub, Webfinger, ActivityS- treams, Salmon, OAuth authorization, OpenID authen- The results show, that querying this data is possi- tication, Portable Contacts, Wave Federation Proto- ble at least on real triple stores and with a moderate col, OpenSocial widget, XRD and OStatus. For in- time frame. Using a DSSN node on a dedicated server stance, the Webfinger protocol enables users to make should even work with much more data. The results email addresses valuable by adding metadata. Some of from the smartphone demonstrate a gap of working the projects which were developed using these tech- triple store implementations for HTML5 applications. nologies are: StatusNet28, DiSo29, Elgg30, GNU So- The huge amount of data which is received by DSSN nodes in a realistic environment should be con- 28http://status.net/ trolled not only by faster triple stores and more hard- 29http://diso-project.org/ ware, but also by implementing smart interfaces which 30http://elgg.org/ S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 17 cial31, The Mine Project32, Appleseed33, OneSocial- Distributed Social Networks on Semantic Web. Pri- Web34, BuddyPress35, Cliqset36, Gowalla37, Poster- marily, attempts have been concentrated on the confor- ous38, DSNP39, OneSocialWeb40. These projects differ mation of traditional technologies such as microblog- in the employed protocols and federation policy. As an ging, or pingback for Semantic Web. example StatusNet is a platform which Thus, these adapted technologies can form the basis extends OStatus for providing federated status updates, for a new generation of social networks. SMOB is a and OneSocialWeb aims at connecting social networks semantic and distributed microblogging framework in- by employing XMPP for instance messaging. troduced in [14]. It presents some main requirements Centralized models benefit from short development for using microblogging at a large scale, i.e. machine- time, no required routing, central control and storage readable metadata, a decentralized architecture, open as well as data mining capabilities. In addition to that, data as well as re-usability and interlinking of data. using distributed models solely is subject to disadvan- These challenges have been addressed in SMOB by us- tages e.g. reachability of interlinked data or the main- ing RDF(a)/OWL data, distributed Hubs for exchang- tenance of many peers. While a federated model as a ing information and a sync protocol (based on SPAR- hybrid model improves some of those disadvantages, QL/Update over HTTP) and interlinking components. some of them still remain. There are many differ- sparqlPuSH [13] utilizes the PubSubHubbub protocol ent views for tackling distributed social network chal- to broadcast RDF query result updates. In this project, lenges on the way to a federation in the Web 2.0. A a SPARQL query which is associated with the user is well-known view is the network of networks, which at first created and monitored in an RDF triple store. employs existing protocols and standards for provid- The registered user is notified whenever the result set ing foundations based on which networks can easily changes. communicate with each other. Another view is using Two important prerequisites for establishing a dis- mashups. Mashups are web applications that combine tributed social network on the Semantic Web is firstly data from more than one service provider to create new 41 transforming social network data to RDF and sec- services. An example is the buddycloud project , run- ondly aggregating the exported datasets by linking be- ning an inbox server as a centralized manager over fed- tween person instances in different datasets. For the erated social networks. This server aggregates all the former prerequisite, an appropriate ontology is essen- posts and updates from social networks to which a user tial for representing social data. FOAF (Friend of a has subscribed. Here again, the XMPP protocol [17] is Friend) [8], which specifies how to describe personal used for messaging and federation. information and relationships with other people in a A user-centric architecture, used in Danube42, em- social network, is well suited for this purpose. Also ploys individuals for personal data and relations. It SIOC (Semantically Interlinked Online Communities) allows them to manage their relationships with each project43 extends FOAF in order to describe rich social other and with vendors. Another analogous view has data. The work presented in [5] uses SIOC for express- been proposed by PrPl [19] as a person-centric, social- ing the content of blogging and bookmarks. For the lat- networking infrastructure, where a person’s data is log- ter prerequisite, a graph matching model is needed for ically collected in one place, and social-networking providing linkages. The work which has been carried applications can be executed in a distributed manner out by [16] describes a method for interlinking user without a central service. profiles from distributed social networks like Face- book, MySpace and Twitter. The core idea is to gener- 31 http://foocorp.org/projects/social/ ate RDF graphs, which then can be interlinked based 32http://themineproject.org/ 33http://opensource.appleseedproject.org/ on the corresponding user identifiers in each graph. 34http://onesocialweb.org/ Beyond these activities, an infrastructure is neces- 35http://buddypress.org/ sary to which these technologies can be employed. Our 36 http://cliqset.com/ current work is a pioneer effort in integrating current 37http://gowalla.com/ 38https://posterous.com/ technologies into a coherent architecture to form a dis- 39http://www.complang.org/dsnp/ tributed social network in a semantic-based context. 40http://onesocialweb.org/ 41http://buddycloud.com/ 42http://projectdanube.org 43http://sioc-project.org/ 18 S.Tramp et al. / An Architecture of a Distributed Semantic Social Network

6. Conclusions and Future Work Gonzalo León, Yoëlle S. Maarek, and Wolfgang Nejdl, editors, Proceedings of the 18th International Conference on World In this article we described our reference architec- Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009, pages 621–630. ACM, 2009. ture and proof-of-concept implementation of a dis- [3] Sören Auer, Sebastian Dietzold, and Thomas Riechert. On- tributed social network based on semantic technolo- toWiki - A Tool for Social, Semantic Collaboration. In Isabel F. gies. Compared with the currently prevalent central- Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel ized social networks this approach has a number of Schwabe, Peter Mika, Michael Uschold, and Lora Aroyo, edi- advantages regarding privacy, data security, data own- tors, The Semantic Web - ISWC 2006, 5th International Seman- tic Web Conference, ISWC 2006, Athens, GA, USA, Novem- ership, extensibility, reliability and freedom of com- ber 5-9, 2006, Proceedings, volume 4273 of Lecture Notes in munication. However, the work presented in this ar- Computer Science, pages 736–749, Berlin / Heidelberg, 2006. ticle can only be a first step of a larger research Springer. and development agenda aiming at realizing a truly [4] Tim Berners-Lee. Linked Data - Design Issues. website. last distributed social network based on semantic tech- change: 2009/06/18; retrieved: 2011/07/25. [5] Uldis Bojars, Alexandre Passant, Richard Cyganiak, and John nologies. Our implementation, for example, based on Breslin. Weaving sioc into the web of linked data. In Pro- the OntoWiki technology platform is currently only ceedings of the WWW 2008 Workshop Linked Data on the Web a proof-of-concept implementation. For a widespread (LDOW2008), Beijing, China, 2008. use, the usability, scalability and the multi-client capa- [6] John G. Breslin, Stefan Decker, Andreas Harth, and Uldis Bo- bilities have to be improved. Also, the distributed re- jars. SIOC: an approach to connect web-based communities. International Journal of Web Based Communities, 2(2):133– alization of social networking apps as implemented in 142, 2006. centralized social networks through APIs (e.g. Open [7] D. Brickley and L. Miller. FOAF Vocabulary Specifica- Social) has to be investigated. We also expect, that the tion. Namespace Document 2 Sept 2004, FOAF Project, 2004. combination of standards and protocols as described in http://xmlns.com/foaf/0.1/. this article will be implemented in a number of addi- [8] Dan Brickley and Libby Miller. FOAF vocabulary specifica- tion. Technical report, FOAF project, May 2007. tional platforms (e.g. Wordpress, Elgg, Diaspora) thus [9] Li Ding, Timothy W. Finin, Anupam Joshi, Rong Pan, R. Scott making them first-class nodes of the DSSN. Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, and Joel Sachs. Swoogle: a search and metadata engine for the semantic web. In David A. Grossman, Luis Gravano, ChengXiang Zhai, Acknowledgments Otthein Herzog, and David A. Evans, editors, Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management, Washington, DC, USA, Novem- We would like to thank our colleagues from AKSW ber 8-13, 2004, pages 652–659. ACM, 2004. research group (in particular Jonas Brekle, Nadine [10] Maxwell Krohn, Alex Yip, Micah Brodsky, Robert Morris, and Jänicke and Norman Heino) for their helpful com- Michael Walfish. A Without Walls. In 6th ments and inspiring discussions during the develop- ACM Workshop on Hot Topics in Networking (Hotnets), At- lanta, GA, November 2007. ment of this approach. This work was partially sup- [11] S. Langridge and I. Hickson. Pingback 1.0. Tech- ported by a grant from the European Union’s 7th nical report, http://hixie.ch/specs/pingback/ Framework Programme provided for the project LOD2 pingback, 2002. (GA no. 257943). [12] Michele Minno and Davide Palmisano. Atom Activity Streams RDF mapping. NoTube Project, 2010. http://xmlns. notu.be/aair/. [13] A. Passant and P. Mendes. sparqlPuSH: Proactive notifica- References tion of data updates in RDF stores using PubSubHubbub. In SFSW2010, 2010. [1] Fabian Abel, Ilknur Celik, Geert-Jan Houben, and Patrick [14] Alexandre Passant, John G. Breslin, and Stefan Decker. Re- Siehndel. Leveraging the semantics of tweets for adap- thinking Microblogging: Open, Distributed, Semantic. In tive faceted search on twitter. In Lora Aroyo, Chris Welty, Boualem Benatallah, Fabio Casati, Gerti Kappel, and Gustavo Harith Alani, Jamie Taylor, Abraham Bernstein, Lalana Kagal, Rossi, editors, Web Engineering, 10th International Confer- Natasha Fridman Noy, and Eva Blomqvist, editors, The Seman- ence, ICWE 2010, Vienna, Austria, July 5-9, 2010. Proceed- tic Web - ISWC 2011 - 10th International Semantic Web Con- ings, volume 6189 of Lecture Notes in Computer Science, ference, Bonn, Germany, October 23-27, 2011, Proceedings, pages 263–277. Springer, 2010. Part I, volume 7031 of Lecture Notes in Computer Science, [15] Christoph Rieß, Norman Heino, Sebastian Tramp, and Sören pages 1–17. Springer, 2011. Auer. EvoPat – Pattern-Based Evolution and Refactoring of [2] Sören Auer, Sebastian Dietzold, Jens Lehmann, Sebastian RDF Knowledge Bases. In Proceedings of the 9th Interna- Hellmann, and David Aumueller. Triplify: Light-weight linked tional Semantic Web Conference (ISWC2010), Lecture Notes data publication from relational databases. In Juan Quemada, in Computer Science, Berlin / Heidelberg, 2010. Springer. S.Tramp et al. / An Architecture of a Distributed Semantic Social Network 19

[16] Matthew Rowe. Interlinking distributed social graphs. In Pro- back. In P. Cimiano and H.S. Pinto, editors, Proceedings of the ceedings of the WWW 2009 Workshop Linked Data on the Web EKAW 2010 - Knowledge Engineering and Knowledge Man- (LDOW2009), 2009. agement by the Masses; 11th October-15th October 2010 - Lis- [17] P. Saint-Andre. Extensible Messaging and Presence Protocol bon, Portugal, volume 6317 of Lecture Notes in Artificial Intel- (XMPP): Core. Request for Comments 3920, IETF, October ligence (LNAI), pages 135–149, Berlin / Heidelberg, October 2004. http://www.ietf.org/rfc/rfc3920.txt. 2010. Springer. [18] F. Schwagereit, A. Scherp, and S. Staab. Representing Dis- [24] Sebastian Tramp, Norman Heino, Sören Auer, and Philipp tributed Groups with dgFOAF. In ESWC2010, pages 181–195, Frischmuth. RDFauthor: Employing RDFa for collaborative June 2010. Knowledge Engineering. In P. Cimiano and H.S. Pinto, edi- [19] Seok-Won Seong, Jiwon Seo, Matthew Nasielski, Debangsu tors, Proceedings of the EKAW 2010 - Knowledge Engineer- Sengupta, Sudheendra Hangal, Seng Keat Teh, Ruven Chu, ing and Knowledge Management by the Masses; 11th October- Ben Dodson, and Monica S. Lam. PrPl: a decentralized so- 15th October 2010 - Lisbon, Portugal, volume 6317 of Lecture cial networking infrastructure. In Proceedings of the 1st ACM Notes in Artificial Intelligence (LNAI), pages 90–104, Berlin / Workshop on Mobile Cloud Computing & Services: Social Heidelberg, October 2010. Springer. Networks and Beyond, MCS ’10, pages 8:1–8:8, New York, [25] Giovanni Tummarello, Renaud Delbru, and Eyal Oren. NY, USA, 2010. ACM. Sindice.com: Weaving the Open Linked Data. In Karl Aberer, [20] M. Sporny, S. Corlosquet, T. Inkster, H. Story, B. Harbu- Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung- lot, and R. Bachmann-GmÃijr. WebID 1.0: Web identifica- Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, tion and Discovery. Unofficial draft, August 2010. http: Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and //payswarm.com/webid/. Philippe Cudré-Mauroux, editors, The Semantic Web, 6th In- [21] H. Story, B. Harbulot, I. Jacobi, and M. Jones. FOAF+SSL: ternational Semantic Web Conference, 2nd Asian Semantic RESTful Authentication for the Social W. In SPOT2009, 2009. Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, [22] Henry Story, Andrei Sambra, and Sebastian Tramp. Friending November 11-15, 2007, volume 4825 of Lecture Notes in Com- On The Social Web. In Federated Social Web Europe 2011, puter Science, pages 552–565. Springer, 2007. Berlin June 3rd-5th 2011, 2011. [23] Sebastian Tramp, Philipp Frischmuth, Timofey Ermilov, and Sören Auer. Weaving a Social Data Web with Semantic Ping-