Design and Implementation of a Federated Social Network
Total Page:16
File Type:pdf, Size:1020Kb
Hochschule f¨urTechnik und Wirtschaft Dresden Diploma Thesis Design and Implementation of a Federated Social Network Stephan Maka June 1st 2011 Supervised by Prof. Dr. rer. nat. Ralph Großmann Acknowledgements In gratitude to Simon Tennant for providing a vision I also want to thank the following persons for proofreading and their valuable feed- back: Maria Maka, Johannes Schneemann, Jan Dohl, Christian Fischer, Holger Kaden, and Frank Becker. License This original work is free to distribute under the terms of Creative Commons. Please give the author attribution. Contents Contentsi List of Tables.................................. iii List of Figures.................................. iii 1 Introduction1 1.1 Motivation.................................1 1.2 buddycloud................................2 2 Aspects of Social Networks5 2.1 User Profiles................................5 2.2 Social Connections............................6 2.3 Topology Models.............................6 2.3.1 Centralized Topology.......................7 2.3.2 Decentralized Topology......................8 2.3.3 Federated Topology........................9 2.4 Privacy Considerations.......................... 12 2.4.1 Access Control.......................... 12 2.4.2 Data Ownership.......................... 12 2.5 Data Portability.............................. 13 3 Requirements 15 3.1 Functional Requirements......................... 15 3.1.1 Transport Layer.......................... 15 3.1.2 Content Model.......................... 17 3.1.3 Content Distribution as a Publish-Subscribe System...... 18 3.1.4 Access Control & Roles...................... 19 3.1.5 Operations............................. 20 3.1.6 Interaction with Mobile Users.................. 23 3.2 Non-functional Requirements...................... 24 3.2.1 Scalability............................. 24 3.2.2 Open Source............................ 25 4 System Design 27 4.1 Use of XMPP............................... 27 4.1.1 Transport Layer.......................... 27 4.1.2 Identity.............................. 29 4.1.3 XMPP Server........................... 29 4.1.4 Federated Topology........................ 30 4.1.5 Causal Ordering.......................... 32 4.2 XMPP Components............................ 32 4.2.1 Flexibility............................. 32 4.2.2 Scalability............................. 33 i 4.3 XEP-0060 Publish-Subscribe....................... 33 4.3.1 Affiliations............................. 34 4.4 Content Formats............................. 35 4.4.1 ATOM............................... 35 4.4.2 Discussion Threading....................... 36 4.4.3 Activity Stream Semantics.................... 36 4.4.4 Geolocation............................ 38 4.4.5 Normalization........................... 39 4.4.6 Metadata............................. 40 4.5 buddycloud Protocol........................... 40 4.5.1 Conventions............................ 40 4.5.2 Partial Data Transfer....................... 42 4.5.3 Operations............................. 44 4.6 Comparison to Other Federated Systems................ 51 4.6.1 OStatus.............................. 52 4.6.2 OneSocialWeb........................... 55 5 Implementation 57 5.1 Setup Prerequisites............................ 57 5.1.1 Required DNS Configuration................... 57 5.1.2 BOSH Endpoint Deployment.................. 58 5.2 Service Component............................ 58 5.2.1 Network Services as Proxies................... 58 5.2.2 Database Backend........................ 59 5.2.3 Controller............................. 60 5.3 Web-based Client............................. 61 5.3.1 Client-Side Model......................... 62 5.3.2 In-Browser Caching........................ 62 5.3.3 Synchronization.......................... 63 6 Evaluation 65 6.1 Suitability of XMPP........................... 65 6.1.1 Binary Data............................ 65 6.2 Suitability of Publish-Subscribe..................... 66 6.2.1 Inboxes.............................. 66 6.2.2 Privacy Considerations...................... 68 6.2.3 Message Archive Management.................. 68 6.3 Anonymous Browsing........................... 68 6.4 Future Work................................ 69 Bibliography 71 ii Lists of tables and figures List of Tables 2.1 Topologies of example legacy and contemporary Internet systems..7 4.1 Mapping model entities of example activity streams to feed formats. 35 4.2 Fields of XEP-0080: User Location................... 39 4.3 Most prominent metadata fields with their identifiers.......... 40 4.4 Subscription states for the different node access models........ 47 List of Figures 2.1 Mutual-only subscriptions........................6 2.2 Unilateral subscriptions..........................6 2.3 Centralized topology...........................7 2.4 Decentralized topology..........................9 2.5 Federated topology............................ 10 3.1 Content model.............................. 17 3.2 Operations use cases by user roles.................... 21 4.1 Use case overview of XMPP servers................... 30 4.2 Federated & client-server topologies in XMPP............. 31 4.3 Multiple load-balancing components................... 33 4.4 Discovering a user's channel service.................... 44 4.5 Subscription state transitions for the authorize access model.... 51 4.6 References in WebFinger discovery................... 52 5.1 Proxy model of a buddycloud channel service............. 58 5.2 Service-side data model.......................... 59 5.3 Control flow of the service controller.................. 61 5.4 Client-side data model.......................... 62 6.1 Synchronization parties with direct reachability but no inbox..... 67 6.2 Synchronization with an inbox...................... 67 iii 1 Introduction 1.1 Motivation The Internet has witnessed three milestones that helped it reach out to the general public. Starting in the 1990s, the World-Wide Web turned information into visually appealing but structured documents that can reference each other regardless of the underlying network topology. The linking paradigm allows anyone to browse the global information space by just following pointers, where, from a user's point of view, the most significant indication of different providers is visual style. Enter Web 2.0, turning previously static documents into user-friendly interactive applications that allow many more people to publish documents and contribute to information; not just those with sufficient technical skills. Lowering this entry barrier is essential for the paradigm shift from a distinction between publishers and viewers towards services that focus on user-generated content. Gradually evolving the user-centered concept of Web 2.0, user relationships have become the driving force of Social Content. Replacing constant information sources by personal connections adds a highly personal flavour to the global Internet, just like interactions in our everyday lives. It also led to recognition of the high eco- nomical value social sharing yields: people are much more likely to buy products recommended by friends they know, and probably trust, than through conventional advertising. With over 600 million active users1, Facebook is the number one Internet ap- plication nowadays. Many popular Web sites have been extended by Facebook widgets that allow users to recommend (\like") information to their social circle. In advertising, companies resort from Internet domain names to service-specific identi- fiers. The citizens who organized protests for the Egyptian revolution of 2011 have been dubbed Facebook Generation due their widespread usage of the social network service. Democratic societies have come a long way to value pluralistic facilities. Therefore we must ask: is it right that one or a few companies dominate the communication of the general public? Is it not dangerous that these monopolies concentrate power over the fabric of our society? There are even downsides from a technical perspective: in the 1960s the Internet was built deliberately decentralized to withstand partial destruction. In 2011, Egypt had been disconnected from the rest of the global Internet to impede further use of popular social networking services, effectively driving more activists onto the streets. A national network could have continued to function in the absence of foreign services, but by relying on these central institutions, it was rendered useless. Internet services need to become decentralized again, just like the World-Wide 1As of January 2011, Facebook reaches out to almost one tenth of the entire human population. 1 1.2. BUDDYCLOUD Web prevailed over the walled gardens of AOL and Compuserve. Users must be able choose their service providers freely. If in doubt, educated users should be able to run their own infrastructure to not depend on foreign institutions. 1.2 buddycloud buddycloud is an existing social network service. Its content model revolves around a few main sets of functionality: • Users have associated channels where they post status updates, making it some kind of an online diary. Friends who are interested may comment on these posts. This aspect provides interaction with a much more conversational nature over a single-publisher model. • Focussing on mobile usage, users also publish their geographical location to their friends for awareness. • Similar to channels, a mood status update can be published. In contrast to the channel model, mood content does not allow comments by others. It is status publication deliberately