Semantic Web Update

Interlinking the Social Web with Semantics

Uldis Bojārs, John G. Breslin, Vassilios Peristeras, Giovanni Tummarello, and Stefan Decker, DERI, National University of Ireland, Galway

ne of the most visible trends on the Web is the emergence of Social Web sites, Owhich help people create and gather knowledge by simplifying user contribu- Applying Semantic tions via blogs, tagging and , wikis, podcasts, and online social networks.

Web technologies The Social Web has enabled community-based knowledge acquisition with efforts such

to the Social Web as Wikipedia, demonstrating the “wisdom of the ity. The Scientific American article by Tim Berners- crowd” in creating the world’s largest online ency- Lee, , and defined the Se- can lead to a Social clopedia. Although defining which structures or ab- mantic Web as “an extension of the current Web in stractions belong to the Social Web is difficult, they which information is given well-defined meaning, , all facilitate collaboration and sharing among users, better enabling computers and people to work in although usually on single sites. cooperation.”1 During the last couple years, much creating a network Current online-community sites are isolated from effort has gone into defining standards for data in- one another, like islands in a sea. Various discus- terchange and interoperation. The Semantic Web of interlinked and sions might contain complementary knowledge and technology stack is well defined, enabling the cre- discussions—parts of the answer a person is look- ation of and associated vocabularies. The semantically rich ing for—but people participating in one discussion Semantic Web effort is in an ideal position to make can’t readily access information about related dis- Social Web sites interoperable. Applying Semantic knowledge. cussions elsewhere. As more and more Social Web Web frameworks such as SIOC (Semantically In- sites, communities, and services come online, the terlinked Online Communities) and FOAF (Friend- lack of interoperation among these data silos or of-a-Friend) to the Social Web can lead to a Social “stovepipes” becomes obvious. The potential syner- Semantic Web (see Figure 1), creating a network of gies among many sites, communities, and services interlinked and semantically rich knowledge. are expensive to exploit, and their data are diffi- cult and cumbersome to link and reuse. The main What is SIOC? reason for this lack of interoperation is that for the The SIOC initiative aims to enable the integra- most part in the Social Web, common standards still tion of online-community information. SIOC (pro- don’t exist for knowledge and information exchange nounced “shock”) provides a Semantic Web ontol- and interoperation. ogy for representing rich data from Social Web sites However, the Semantic Web effort aims to provide in RDF.2 It has recently achieved broad adoption in the tools needed to define extensible, flexible stan- a variety of commercial and open source software dards for information exchange and interoperabil- applications (see the sidebar “Some Recent SIOC

May/June 2008 1541-1672/08/$25.00 © 2008 IEEE 29 Published by the IEEE Computer Society

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. Semantic Web Update

2007, 16 organizations submitted the SIOC Ontology to the W3C, which published it Social Web (see www.w3.org/Submission/2007/02). Social Semantic Web Wikis, blogs, SIOC, Twine, DBPedia social networks The SIOC ontology The ontology consists of the SIOC Core Ontology (containing 11 classes and 53 properties, http://rdfs.org/sioc/spec) and the SIOC Services and SIOC Types mod- ules. The SIOC Core Ontology defines the Semantic Web URLs, HTML, HTTP RDFS, OWL, SPARQL main concepts and properties required to describe semantic information from online Syntax Semantic communities. Figure 2 shows the ontology’s main terms. We chose SIOC’s basic con- Figure 1. The Social Semantic Web. Applying Semantic Web technology to the Social cepts to be as generic as possible so that we Web could create a network of interlinked, semantically rich knowledge and bring could describe many different kinds of user- the Social Web to its full potential. generated content. We originally created the SIOC Core On- tology using the terms for describing Web- Some Recent SIOC Applications based discussion areas such as blogs and mes- sage boards. For example, users create posts Companies are using Semantically Interlinked Online Communities (SIOC) in a (sioc:Post) organized in forums (sioc:Forum), variety of application domains: which are hosted on sites (sioc:Site). In paral- • OpenLink’s Data Spaces product provides access to SIOC instance data (think lel with the evolution of new kinds of Social open social graph++) from a range of applications including blogs, wikis, ag- Web sites, these concepts became subclasses gregated feeds, shared bookmarks, discussions, photo galleries, and brief- of higher-level concepts that were added to cases (for example, WebDAV file servers). SIOC as it evolved: data spaces (sioc:Space, a • Engage, a community information application from Talis, combines SIOC place where data resides), containers (sioc: in its schematics with the Simple Knowledge Organization System (SKOS) Container, used for grouping items together), for knowledge organization and Friend-of-a-Friend (FOAF) for person and content items (sioc:Item). These classes description. • The Seesmic “video microblogging” service has also adopted the SIOC ontol- let us structure the information in online- ogy as one of its open-platform formats. community sites and distinguish between • Talk Digger, a Web service from Zitgist LLC, helps people find, follow, and en- different kinds of objects. Properties defined ter conversations on the Web. It exports all their data using SIOC. in SIOC let us describe relations between • ImageMatters LLC recently announced gnizr, an open source social-book- objects and their attributes. For example, marking and mashup application that exports saved bookmarks using SIOC in combination with a tag ontology. • the sioc:has_reply property links reply posts Many other open source applications of SIOC are coming from the Web devel- to the content to which they’re replying, oper community. OpenQabal, an open source social-networking and collabora- • the sioc:has_creator and :maker properties tion platform, supports SIOC, allowing Roller, JavaBB, and other component ap- link user-generated content to additional plications to become part of the “SIOC-o-sphere.” information about its authors, and SIOC descriptions of forums are also being used for teaching and learning: • the sioc:topic property points to a resource • Faculty Academy’s Fishtank project leverages RDF’s structure and searching describing the topic of content items (for power. example, their categories and tags). • IkeWiki, a knowledge-engineering wiki, allows discussions represented using the SIOC ontology (following a forum style with threaded views) to be at- The high-level concepts sioc:Space, sioc:Con- tached to wiki pages. tainer, and sioc:Item are at the top of the SIOC • Swaml (Semantic Web Archive of Mailing Lists) uses SIOC as its base ontology. class hierarchy; most other SIOC classes The project also includes Buxon, a sioc:Forum visor written in PyGTK. are subclasses of these. A data space (sioc: • Tom Morris has created a third-party RDF exporter for the Twitter microblog Space) is a place where data resides, such service that uses SIOC (to represent all microblog entries) and FOAF (to de- as a Web site, personal desktop, or shared file space. It can be the location for a set of Container(s) with content Item(s). Subclasses Applications”). For example, it’s commonly user-generated content from such sites, of Container can be used to further specify used in conjunction with the FOAF vocabu- SIOC enables new usage scenarios for on- typed groupings of Item(s) in online com- lary for expressing personal-profile and so- line-community site data and lets develop- munities. The class sioc:Item is a high-level cial-networking information. ers build innovative semantic applications concept for content items; it describes user- By becoming a standard way to express on top of the existing Social Web. In mid- created content.

30 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. We usually use these high-level concepts subClassOf has_scope as abstract classes from which we can de- Space Site Role rive other SIOC classes. We need them to ensure that SIOC can evolve and be applied to specific domain areas where definitions has_space has_parent has_host has_function of the original SIOC classes such as sioc:Post or sioc:Forum can be too narrow. For example, Container Forum User an address book, which describes a collec- subClassOf tion of social and professional contacts, is a type of sioc:Container, but it’s not the same as a has_container has_container has_member discussion forum. subClassOf Let’s look at some subclasses of these Item Post Usergroup high-level SIOC concepts from the SIOC has_creator Types module. has_reply topic SIOC modules The SIOC Types module defines more spe- Topicopic cific subclasses of the SIOC Core concepts that we can use to describe the structure and types of content found in Social Web sites. Figure 2. The main classes and properties in the Semantically Interlinked Online This module defines subtypes of SIOC ob- Communities (SIOC) ontology. The ontology describes common terms corresponding jects needed to more precisely represent to the application data found in online communities. various elements of online-community sites (for example, sioc_t:MessageBoard is a subclass + of sioc:Forum). It also introduces new sub- SIOC + FOAF SKOS classes for describing different kinds of So- foaf: cial Web objects in SIOC. In addition, the Person module points to existing ontologies suit- foaf: foaf: holdsOnlineAccount holdsOnlineAccount able for describing these objects’ details (for User User instance, a sioc_t:ReviewArea might contain foaf: Review(s), described in detail using the Re- holdsOnlineAccount view Vocabulary). Examples of SIOC Core User foaf: User Post foaf: holdsOnlineAccount Post Post Ontology classes and their corresponding holdsOnlineAccount SIOC Types module subclasses include ChatChannel Weblog Post Post • sioc:Container: AddressBook, AnnotationSet, Audio- Post Post Weblog User topic topic Channel, BookmarkFolder, Briefcase, EventCalendar; MessageBoard Post Post Post • sioc:Forum: ChatChannel, MailingList, Message- User skos:Concept Board, Weblog; and foaf: MailingList skos: account_of isSubjectOf • sioc:Post: BlogPost, BoardPost, Comment, Instant- Agent Post Message, MailMessage, WikiArticle. skos: skos:Concept isSubjectOf skos: ? Post narrower Community sites typically publish Web MessageBoard service interfaces for programmatic search Post and content management services (typically SOAP or REST [Representational State Figure 3. Relationships between the SIOC, Friend-of-a-Friend (FOAF), and Simple Transfer]). These services can be generic Knowledge Organization System (SKOS) ontologies. This example shows how (with standardized signatures covering in- SIOC uses FOAF terms to describe person-centric data and uses a SKOS schema to put and output message formats) or service represent topic hierarchies and relationships. specific (where service signatures are unique to specific functions performed, as in cur- nitions. The sioc_s:service_definition property re- laries, leading to better data interoperabil- rent Web 2.0 API usage patterns). The SIOC lates a sioc:Service to its full Web service defini- ity. The SIOC ontology follows this prac- Services module lets us indicate that a Web tion (for example, in WSDL). tice by reusing the FOAF vocabulary to service is associated with (located on) a sioc: describe person-centric data and the Dublin Site or part of the sioc:Site—for example, a sioc: Relationships between SIOC Core (DC) vocabulary to describe proper- Forum. This module provides a simple way to and other ontologies ties of SIOC content items. Figure 3 shows tell others about a Web service and shouldn’t One of the Semantic Web best practices is some of the relationships between SIOC be confused with detailed Web service defi- the reuse of existing ontologies and vocabu- and other vocabularies. A person (described

May/June 2008 www.computer.org/intelligent 31

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. Semantic Web Update

What SIOC is not Useful SIOC Resources SIOC tries to represent the information of online-community sites and human com- These resources will be useful in finding out more about the Semantically Inter- munication. However, unlike other repre- linked Online Communities (SIOC) initiative: sentation approaches (for example, OWL Time3), SIOC isn’t aiming at a domain’s ax- The SIOC project page: iomatization. We deliberately designed the http://sioc-project.org The SIOC W3C member submission: SIOC ontology to capture existing informa- www.w3.org/Submission/2007/02 tion that’s mostly present in current Web in- A SIOC developer mailing list: formation systems and to provide a low en- http://groups.google.com/group/sioc-dev try barrier for users and developers. SIOC’s An relay chat channel about SIOC: design lets us export information relatively Channel #sioc on IRC server irc.freenode.net simply from many sources and thus gen- A comprehensive list of SIOC applications: erate a critical mass of information. This http://rdfs.org/sioc/applications leads quickly to the Web community adopt- The SIOC Browser prototype: ing SIOC, providing additional exporters, http://sparql.captsolo.net/browser and writing applications. SIOC aims at a The Semantic Radar extension for Firefox: https://addons.mozilla.org/en-US/firefox/addon/3886 minimal consensus of a given domain rather PingTheSemanticWeb.com’s namespace statistics page: than a complete specification. http://pingthesemanticweb.com/stats/namespaces.php SIOC is listed here as the fourth-most-used namespace out of more than 500. SIOC-enabled According to this service, more than 100,000 RDF documents use SIOC on the applications and tools Web. SIOC gives different online-community sites a common format for expressing their data in a rich, interlinked form. This interconnec- tion of Social Web content using Semantic Producers Collectors Consumers Web technologies can lead to many interest- Add-ons and functions SIOC crawlers and RDF browsers and other ing possibilities on the individual and com- for exporting SIOC from aggregate storage custom SIOC explorers munity level. Taking advantage of the infor- existing apps of data SIOC detectors and mation represented in SIOC, developers can SIOC from semistructured Indexers of SIOC clipping applications build various browsers and applications on data or queriable APIs instances without top of SIOC. full storage Applications with native Reuse and import for data storage of SIOC data portability requirements The SIOC food chain Figure 4 shows some application types in the Bypassing apps by directly Food chain of applications Graphical viz of derived mapping RDBMs to SIOC SIOC networks SIOC food chain, illustrating where SIOC data are being produced, collected, and consumed. This isn’t exhaustive but does Figure 4. The SIOC food chain. Many types of applications produce, collect, cover most available SIOC application types and consume SIOC data. (http://rdfs.org/sioc/applications). SIOC data producers include by foaf:Person) will usually have a number of tent items and are represented in SIOC us- online accounts (sioc:User) on different online- ing a sioc:topic property. SIOC doesn’t require • application add-ons and modules that re- community sites. They use these accounts to the values of sioc:topic to be in a particular on- use and build on existing internal func- create content (sioc:Item or sioc:Post). The class tology (apart from the value being a URI). tions to export SIOC data in RDF for- sioc:User is a subclass of foaf:OnlineAccount, and It’s left up to information system architects mat (for example, the SIOC exporters for the foaf:holdsOnlineAccount property links a per- to choose the most appropriate ontology to WordPress or Drupal); son to his or her online accounts. SIOC con- represent topics in each case. One approach, • sources of semistructured data or queri- tent items (shown as Post objects in Figure 3) illustrated in Figure 3, uses the Simple able APIs that output data in formats are described using properties from SIOC, Knowledge Organization System (SKOS) that can be converted to SIOC and FOAF, and DC. By using FOAF to point schema to represent topic hierarchies and augmented with other RDF data (such to multiple social-media site accounts reg- their relationships. as Swaml [Semantic Web Archive of istered to a user and using SIOC to express The W3C Member Submission docu- Mailing Lists] for mailing list mail- user-generated content on these sites, we can ments for SIOC contain more detailed in- boxes, the Flickr2RDF tool, and Sioku, aggregate the content created by a person all formation on the relationships between which leverages the Jaiku microblog- across the Social Web. SIOC and other RDF ontologies. (For more ging API); Topics are usually present on the Social information on SIOC, see the sidebar “Use- • applications that store semantic data na- Web as categories and tags assigned to con- ful SIOC Resources.”) tively and use SIOC as one of their repre-

32 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. sentation formats (for example, Talis En- Generating the data son), posts and comments (sioc:Post and sioc_t: gage); and SIOC export tools are a class of applica- Comment), and the Web site structure (sioc:Site • schema-mapping tools that directly ac- tion that produces SIOC RDF data from on- and sioc:Forum). cess MySQL or other relational data- line-community sites, often implemented base stores—bypassing default applica- as plug-ins to a site’s specific content- Reusing SIOC data tion access methods—to generate SIOC management system. By enriching social- As the Social Web begins to generate more metadata (for example, OpenLink Data community sites with SIOC RDF exporters, SIOC data, SIOC data consumers can reuse Spaces). we can automatically create high-quality this information—for example, to provide data without screen-scraping or recon- better tools for finding related information To help produce SIOC data from new ap- structing relationships from the informa- across online-community sites or to trans- plications, developers have also created re- tion visible on Web pages. fer rich information about content items be- usable data-export APIs for PHP, Ruby on SIOC export plug-ins are available for tween those sites. The pilot implementation Rails, and Java. several common content creation platforms. of a SIOC import plug-in for WordPress The SIOC data these applications pro- These SIOC exporters represent informa- (http://wiki.sioc-project.org/w/SIOC_ duce can be discovered, collected, stored, tion about every content item on a site in Import_Plugin) is one application for reus- or indexed in an intermediary step or RDF, making all the main information at a ing SIOC data between social-media sites. consumed directly by end-user applica- site available in RDF and ready for reuse. Regular bloggers can reuse SIOC data tions.4 Collectors can include Semantic by installing this plug-in to the WordPress Web spiders (“scutters”) or crawlers that blog engine. A user begins by supplying the are specifically tailored to gather SIOC URL of some SIOC data. The plug-in then data. These crawlers can store SIOC in- The SIOC Explorer aggregates retrieves the data from this URL, extracts stances from multiple sources in a sin- all the content items, and posts them to the gle data store (such as SWSE [Semantic data from various online- blog. While the pilot implementation pro- Web Search Engine], Swoogle, Zitgist, cesses only one SIOC data file at a time, it and the SIOC Crawler). Other collec- community sites and lets can be easily extended to mirror all content tors include indexers that store URLs of from a blog or forum site. sites where developers can access SIOC users browse and explore Tools such as this SIOC import plug-in data instances (for example, Sindice and let us transfer content items between not PingTheSemanticWeb.com). only the same type of sites (for instance, SIOC data consumers include all the disparate information blogs) but also different types of social-me- dia systems (for example, between a mailing • generic browsers for RDF data (such as in an integrated manner. list, expressed in SIOC using Swaml, and a Tabulator and Disco); blog). By combining SIOC data import and • browsers customized to display SIOC export tools, we enable various scenarios data (for example, SIOC Explorer and Most SIOC exporters also use RDF autodis- for data portability of social-media contri- Buxon); covery links (that is, a link to an RDF docu- butions (described later), letting people save • extensions for Web browsers that detect ment that’s inserted in the HTML HEAD ele- their contributions and move them between the presence of SIOC and other RDF ment of Web pages on the site) that enable different online-community sites. data; tools such as the Semantic Radar plug-in for • extensions for pinging data-indexing ser- Firefox (https://addons.mozilla.org/en-US/ Faceted exploration of SIOC data vices or reusing portions of community firefox/addon/3886) to automatically dis- The SIOC Explorer (https://launchpad. contributions elsewhere (for example, the cover this content. net/sioc-ex) aggregates data from vari- Semantic Radar); The WordPress SIOC export plug-in, de- ous online-community sites and lets users • applications that can import SIOC veloped for a popular blogging platform, is browse and explore all the disparate infor- data—for example, for portability of data one of the most popular SIOC export tools. mation in an integrated manner.5 This ap- contributions between user accounts on A number of different online-community plication’s core is BrowseRDF—a domain- different systems or for migrating a com- sites have similar export tools (http://rdfs. independent, faceted navigation system for munity from one site to another (for ex- org/sioc/applications/#creating-others). We RDF data that provides a generic view of all ample, the SIOC importer for Word- developed SIOC APIs to lower the barrier RDF data associated with SIOC concepts Press); and for entry and to help people who are un- (for example, the foaf:maker relation between • applications with visualizations oriented familiar with the Semantic Web to write authors and posts). The application aggre- toward SIOC- graphs (for ex- SIOC exporters for their community sites. gates SIOC content into a local RDF store ample, Alexandre Passant’s “SIOCal One such library, the SIOC API for the PHP and provides various ways to view the con- network” browser [http://apassant.net/ scripting language, lets users easily manip- tent and associated data. The start screen home/2006/06/sioc-browser/index.php/ ulate SIOC data through PHP objects and lets users choose from a list of available network], which displays SIOC comment methods, and renders the data into RDF/ sioc:Forums, with data coming from SIOC- structures in terms of a network of their XML. The API creates and exports SIOC enabled systems such as online-community creators). concepts about authors (sioc:User and foaf:Per- forums, blogs, and mailing lists.

May/June 2008 www.computer.org/intelligent 33

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. Semantic Web Update

use similar tags and subscribe to the same blogs. SIOC and FOAF can be used together (www.mkbergman.com/index.php?p=372) to describe the objects in this social network of users, with FOAF describing information about the people and SIOC describing user accounts and their user-generated content. All this information, integrated across the Social Web, lets us build a picture of all the objects that a user has created, interacted with, and commented on across different social-media sites, from which the links be- tween the users themselves emerge. The Social SIOC Explorer7 is an exten- sion of SIOC Explorer that lets us see and explore social relations on the Social Web as manifested via user-generated content. The rich data structure, including links be- tween an original post and its replies and links between a post and additional infor- mation about its author, is collected from SIOC data producers. The Social SIOC Ex­ Figure 5. Faceted exploration of SIOC RDF data. The SIOC explorer lets users browse plorer can use this information to mine re- a broad view of disparate information in an integrated way. lations between people on the Web—for example, to find a set of people who have When viewing posts from an individual as the inverse join or the existential join. participated in a discussion with or com- forum or a group of forums, the user sees The SIOC Explorer is built on the Ruby mented on the content created by a certain a list of posts in reverse chronological or- on Rails framework for Web application user (see Figure 7 on page 36). der. Each post is summarized (see Figure development. It uses three components for In addition, this application contains a 5), but the user can expand it to read the consuming and processing Semantic Web component that analyzes social networks full content. Clicking on a post’s creator data. The first is ActiveRDF for mapping using information about the content cre- shows all posts (including comments and RDF data onto programmatic objects. The ated by users, extracts social-context infor- replies) written by this person across all second is BrowseRDF, a faceted browsing mation from the SIOC data, and allows vi- forums; clicking on a topic shows all posts engine that enables navigation of large Se- sualization of this information in the user tagged with this topic, again across all fo- mantic Web data sets without domain-spe- interface. rums. In contrast to ordinary feed readers, cific navigation knowledge. The third com- such lateral browsing works across all dif- ponent is the SIOC RDF crawler, which SWPop: Live, embeddable ferent types of community forums that can crawls, extracts, normalizes, and integrates SIOC aggregation be described in SIOC. For instance, click- SIOC data from various community sites. SWPop is a framework for creating seman- ing on the user “Elias Torres” will show tic-data-powered plug-ins that can be em- not only his blog posts but also his emails Exploring people’s social bedded in Web applications. It demonstrates and contributions to Internet relay chats. connections via objects a synergy of applications for collecting and Finally, Figure 5’s left side shows a ge- Object-centered sociality (www.zengestrom. consuming SIOC data. Using SWPop, de- neric faceted navigation interface, including com/blog/2005/04/why_some_social.html) velopers have created SIOC-powered wid- relevant facets that aren’t already shown as refers to the hypothesis that people are con- gets for the WordPress blog engine, which part of the default browsing interface. The nected through the objects they create and enhances the blogging platform with infor- application builds facets dynamically at view collaborate on. On the Social Web, people mation derived from harvested SIOC data. time, showing the properties and values de- are related through user-generated content When the SWPop plug-in is activated, click- rived from actual data as well as the proper- and annotations. Figure 6 shows a conceptual ing on highlighted links in the Web applica- ties that might not be known at system de- illustration of this idea using a model of con- tion will activate pop-ups that show a sum- sign time. Some facets (like the year) contain tent “circles” created by a person via multiple mary of information and discussions from only “simple” values, whereas complex fac- online accounts.6 Connections are formed the entire “SIOC-o-sphere” (all available ets such as maker or topic can be further ex- between these circles by people creating sim- sources of SIOC data). panded to show subsequent subfacets (see ilar content or using similar annotations. Let’s look at an example of a forum the bottom left of Figure 5). Application de- For example, Bob and Carol are con- dedicated to work opportunities for Web- velopers can customize the facet navigation nected via bookmarked URLs that they technology experts. Suppose the Human to their needs and can exclude or include cer- both have annotated and through events that Resources Department is interested in hir- tain facets or certain advanced operators such they’re both attending, and Alice and Bob ing a Semantic Web consultant and posts a

34 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. tent on d con comm content on com ste un sted mu po ity spain po nit ’s s ’s y s ce ite ol it li s ar es A r You C r You ick Tu ick Tu Fl be Fl be

r al ol ca ce ice ar ro li r C lo a lice’s IDs h arol’s IDs A O C

P P

g o o g ice aro Al r C l r A d c d o O a l a o c . c l i . i

r a a o

g c

c g o

r

i

s s

e

n l

l

a

i

n

t t

o

i

A s s

C

m

m

o

o

c

c

p

p

U

U

C

l

a

a

o

l

a

i

r

r

i

c

o

a

c e

i

c

l

l

r

h

A

o

B B

s s

l l

o o

u u

. .

g g

o o

l l

i i

i i

n n

c c

i i

e e

. .

l l

s s

e e

d d

content on comm sted uni po ty s b’s ite Bo s

de s l.ic ne io. gli us lo B

R tr ob er er b ob’s IDs to ro B

U

p

c Bob

R o

m o t s

t r b

i n

e

s

e

g

a

b r

t

c .

o

o

o

d

R

r

o

g

P

pets

R

o

t b

r

e

e

r

b

t

o

o

r

F

e

l

i

b

c

u k

T r

u

o

Y

Figure 6. Object-centered sociality. The Social Web connects users through user-generated content and annotations. request for this position. Many people re- Technically, SWPop plug-ins for Web ap- which keeps a frequently updated index of ply with questions and proposals. With the plications operate as follows. The site owner all the RDF resources on the Semantic Web.8 SWPop plug-in, a user clicks on an au- adds SIOC RDF data export functionality to Sindice’s public API provides various ways thor’s name (for instance, “John”) and sees the site. When a new post is created on the to locate RDF sources—for example, by text a popup window with topics and posts John site, WordPress pings the Sindice Seman- keywords, URI, or inverse functional prop- has recently created anywhere on SIOC- tic Web search engine. SWPop enriches the erties, such as the FOAF SHA1 hash of an enabled online-community sites (see Figure rendered Web page by adding a microfor- email address in this case (mbox_sha1sum). 8 on page 36). The user can then navigate mat containing a SHA1 hash of the author’s The SWPop back end uses these APIs across his posts and restrict results to a given email and the SWPop JavaScript (JS) code, to locate RDF information related to the topic, if needed. Similar cross-site navigation which will skin the with an post’s author on the basis of the SHA1 hash is possible starting from links that are men- SWPop information popup. of his email address. The information re- tioned in a post or from topics (keywords). When the page is loaded, the SWPop JS turned might include articles and comments This not only is interesting for casual back- detects the microformat and performs an made by the poster across the SIOC-o- ground checks but also introduces the con- Ajax call to the back-end SWPop applica- sphere (with data about posts such as topic, cept of reputation portability by enabling tion. The back-end application is based on date, and subject), details about the poster others to quickly assess the value of John’s the Sindice Semantic Web search engine (such as photos) from other sources (for statements given his previous distributed on- (see “Indexers of SIOC Instances without example, FOAF profile files), and social- line activity. Full Storage” in Figure 4, www.sindice.com), network information. SWPop collects all

May/June 2008 www.computer.org/intelligent 35

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. Semantic Web Update

is that little effort is required to install them and to tap into the existing vault of informa- tion described in SIOC and other Semantic Web vocabularies. The application’s client- side part is lightweight; the SWPop back end and large RDF indexing applications such as Sindice do all the “heavy lifting.”

SIOC-related initiatives Various working-group initiatives are pro- moting SIOC and tailoring it for their com- munity domains, including social-network portability, collaborative work spaces, and life-science discourse. The initiatives de- scribed here have found value in SIOC and are finding it useful in their projects.

Data portability using SIOC The DataPortability working group (www. Figure 7. The Social SIOC Explorer’s aggregate view of information about a specific dataportability.org) was recently established person. The screen shows the total number of replies to that person’s posts to look at ways to port data from one social- calculated across multiple sites on which he or she participates. media service to another. It aims to docu- ment the best practices for integrating exist- ing open standards and protocols to enable end-to-end data portability between online tools, vendors, and services. Another of its goals is to let users move, share, and control their identity, photos, videos, and all other forms of personal data. One sample data portability scenario in- volves using the Yadis communications pro- tocol to discover a particular person’s iden- tity. The protocol returns a Yadis XRDS (Extensible Resource Descriptor Sequence) document indicating which identities that person prefers to use and what services those identities are held on. Then, we can use the WRFS (Web Relational File Sys- tem) abstraction model to find out what con- tainers the returned identities hold on those services (http://dataportability.pbwiki.com/ WRFS%20Prototype%20Workspace). SIOC is an ideal representation method for describing all the content items a per- son has created (via his or her user accounts) Figure 8. The SWPop framework. The back end locates RDF information about John, on various social-media sites and the struc- including recent activity on SIOC-enabled online-community sites. ture contained therein. A combination of the FOAF vocabulary and SIOC is well suited to describe the relationship between a per- this information into a named graph reposi- loon, the server will create an aggregate son and multiple social-media site accounts tory on top of which Sindice performs anal- of post titles, sorted by date, along with this person owns. ysis and queries. recent topics about which the poster has In Figure 9, Bob holds user accounts on On the basis of analysis results, SW- been writing from all sources of SIOC various social Web sites (two shown for clar- Pop adds a speech balloon with posts and data. It will then send this data to the SW- ity, but there could be many more). Via those other information about the person, col- Pop JS using JavaScript Object Notation accounts, he creates content items on those lected from across the Social Web—for (http://swm.deri.org:8080/swpop/samples/ sites (usually in containers of some sort, such example, next to the poster’s name on the thread2.html). as a bookmark folder, personal blog, mes- Web page. When a visitor clicks the bal- A benefit of applications such as SWPop sage board, or image gallery). He should be

36 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. People Services and Containers and user accounts content items

Alice foaf:Document sioc:Container / sioct:BookmarkFolder Roberto’s bookmarks del.icio.us foaf:knows sioc:owner_of sioc:Item / annotea:Bookmark foaf:accountServiceHomepage sioc: sioc:User DataPortability.org sioc:creator_of container_of Roberto sioc:creato sioc:Item / annotea:Bookmark r_of sioc: ount WebCamp.org sAcc container_of foaf:Person foaf:hold sioc:Forum / sioct:Weblog Bob foaf:accountServiceHomepage foaf:Document Robert Rodriguez’s blog foaf:hol sioc:Post / sioct:BlogPost sioct:Comment ner_of dsA WordPress.com sioc:ow Happy New sioc:has_reply ccount Brrr! foaf:knows f Year! sioc:creator_o sioc:User sioc:Post / sioct:BlogPost Carol sioc:creator_o robertr f My Christmas Wish List … sioc: container_of

Figure 9. Bob’s user accounts on two object-centered social networks. Via these and other Social Web sites, Bob creates content items in various containers. able to port not only his social graph (in this which raises the issue of mapping between dars, forums, and mailing lists. These CWE case, his connections to Alice and Carol) but these formats. We hope that existing work platforms integrate CWE tools and related also his personal containers or sets of content from the Semantic Web domain—such as information, enhancing intraorganizational items and perhaps even associated comment ontology mapping and Grddl (a technique knowledge storage and processing. replies. The vocabulary terms (foaf:knows, sioc: for gleaning resource descriptions from di- With the introduction of these CWE plat- User, and so on) are shown in blue. alects of languages)—will significantly ad- forms, the interoperability problems were SIOC is more that just a way to represent dress this issue. adequately addressed internally. However, at personal data containers. The DataPortabil- For this initiative, SIOC is an open stan- the interorganizational level, problems re- ity working group and others are explor- dard for describing user-created content and main. As legacy and isolated systems, these ing methods for porting not just small user- thus enabling data portability. Because it’s platforms remain informational silos that centric data sets but whole sets of commu- based on Semantic Web technologies such can’t communicate with the rest of the nity data. We created SIOC to provide a way as RDF, SIOC data can be easily mixed with world. From an organizational perspective, to describe the content from online commu- other RDF data formats (such as DOAP this creates insurmountable barriers to in- nities (mailing lists, message boards, and [Description of a Project], Review, and so formation exchange with peer organizations. so on). While people soon used it for blogs on) that more specialized community sites From the single user’s perspective, it creates and more recently for other personal sets of might require. a situation in which information that might Web 2.0-type content items, it has the con- exist in separate systems can’t be automati- cepts needed to describe the structure and Semantically interlinking CWEs cally combined and must be collected and contents of a community site as a whole. If Collaborative-work-environment (CWE) plat- processed manually. someone runs a community site and decides forms such as Lotus Notes, Microsoft Share- The case of e-professionals (for example, to port a group from one place to another, he Point, and BSCW (Be Smart—Cooperate engineers, lawyers, and researchers) who or she can use SIOC to describe the existing Worldwide) provide integrated work spaces work concurrently on multiple projects community site’s structure and content to to support knowledge workers inside the en- that are supported by different CWE plat- recreate it on a different information system. terprise. During the last 20 years, the main forms is quite interesting. How can they One challenge for DataPortability is the driver for developing such platforms has been find a comprehensive list and links to all mapping of different data standards. This the need for interoperability among numer- documents (and posts) that have been up- initiative can use related but different data ous applications that were initially built sepa- loaded (or edited) during the last week in representation methods for various scenar- rately to support specific tasks, such as tools all the projects (and platforms) in which ios (for instance, XFN/hCard and FOAF), for project and document management, calen- they might be involved? And how can they

May/June 2008 www.computer.org/intelligent 37

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. Semantic Web Update

Posst Interest PPost Post Topic Interest Postt Topic Interest PostPo Posst Post Topic Article PostP Post Location Article Post Article rticlecle Product Location ArticArticleA Article Poststs e Article ArArticleA Social Web e Location Article Semantic Web ticletic Articticle Artic ticlecle nd more special ArticArti ic a ized cif vo pe ca -s ries and s bu in bula upp l a ca le ar m vo m ie o g en s D n t ti a c r of core v ry e be oc m n m ab n nu u o o l la d ImImage c l u r r Location a i l e e e t m s s n S FOAF I Location Image SIOC ImageIma r Image a Imagge d Image SKOS n e l G a Creator e c Creator ImImamageage o DC F v D o c R ab IImageIm ul ary ats Microform

Figure 10. A possible “crystallization point” for domain-specific ontologies. SIOC will bring multiple domain ontologies together to better describe user content.

identify links to resources categorized— ated in the previous stage, annotate the and accessible by the SIOC4CWE Explorer for example, as “architecture”—that might internal data and export them as SIOC from the moment it exports SIOC-annotated be stored in all these different platforms? RDF data. Because these tools use the data. Developing a SIOC exporter is a com- Currently, the Ecospace Integrated Proj- same SIOC ontology, the internally kept paratively easy task that could give a huge ect is addressing these types of problems. data acquires the same semantics and return on investment for favorite CWE The project has adopted the SIOC ontology can be understood and processed by any platforms, because it provides a semantic to provide the basis for much-needed mul- RDF-based application. bridge to the outer world. The Ecospace tiplatform integration and to allow cross- 3. Our research institute developed a spe- Integrated Project is now looking at the project querying and access to this semanti- cialized SIOC4CWE Explorer for nav- development of a SIOC exporter for SAP cally interlinked information.9 This has oc- igating and querying aggregated SIOC NetWeaver. curred in three stages: data from heterogeneous shared work- spaces in a unified way. This gives the Bio-zen and the art of scientific 1. Concepts (such as document, folder, user, user a single interface to query all the community maintenance or post) that exist in the CWE domain projects he or she participates in regard- The bio-zen initiative (http:// and appear in the involved platforms— less of whether they’re stored in BSCW, neuroscientific.net/index.php?id=43) is at- namely, BSCW and Business Collabo- BC, or any other platform. tempting to represent on the Semantic rator (BC)—were mapped to the SIOC Web data, information, and knowledge from ontology. This was feasible because we So, queries such as those presented ear- research in all facets of the life sciences. It developed SIOC to cover online commu- lier for e-professionals can be answered. aims to unify information that’s now scat- nities, a domain quite related to CWE. The user can directly access documents, tered through a multitude of different data In this way, SIOC provides a metalan- posts, calendars, contacts, and any other structures, exchange formats, and data- guage for the CWE domain. kind of resource that were previously bases. As part of this, SISC (Semantically 2. The consortium developed SIOC ex- “locked” inside each separate platform. Interlinked Scientific Communities, http:// porters for BSCW and BC. These tools, Following this approach, any CWE plat- neuroscientific.net/sisc/sisc_introduction. based on the conceptual mappings cre- form can become interoperable with others htm) aims to improve how we currently rep-

38 www.computer.org/intelligent IEEE INTELLIGENT SYSTEMS

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. resent and communicate scientific data and knowledge. The Authors Bio-zen and SISC employ the SIOC, FOAF, DC, Creative and Science Com- Uldis Bojārs is a researcher and PhD student in the Digital Enterprise Research Institute at the mons, Open Biomedical Ontologies, and National University of Ireland, Galway. His work focuses on the SIOC (Semantically Interlinked Online Communities) project. His research interests include the Semantic Web, online commu- Health Care and Life Sciences ontologies nities, and social software. Bojārs received his master’s in computer science from the University and technologies. The initiative’s creators of Latvia and is a member of the ACM. Contact him at [email protected]. adapted SIOC to represent basic scientific discourse in scientific publications and on John G. Breslin is a researcher and adjunct lecturer in the Digital Enterprise Research Insti- tute at the National University of Ireland, Galway. His research interests include social software, the Web. According to Matthias Samwald, online communities, and the Semantic Web. Breslin received his PhD in electronic engineer- one of the creators, they chose SIOC as a ing from the National University of Ireland and is a member of the IEEE. Contact him at john. base ontology because it provides “an excel- [email protected]. lent tool to describe scientific discourse in a practical, Web-centric manner.” Vassilios Peristeras is a research fellow and adjunct lecturer in the Digital Enterprise Re- search Institute at the National University of Ireland, Galway. His research interests include e- government and e-participation, collaborative work environments, social software, and the Se- Future “shock” mantic Web. He received his PhD in electronic government from the University of Macedonia. Following on from the SIOC ontology’s suc- Contact him at [email protected]. cessful dissemination and implementation Giovanni Tummarello is a research fellow and adjunct lecturer in the Digital Enterprise Re- in over 40 modules and applications, we search Institute at the National University of Ireland, Galway, where he leads the Data Intensive plan to extend our SIOC research by look- Infrastructures research unit. His research has focused on computer science, computational in- ing at more specific application domains. telligence, and multimedia semantics. He leads or led projects such as DBin, Sindice, and Se- The SIOC project has a stable ontology and mantic Web Pipes and started the Semantic Web Applications and Perspectives conference se- a good foundation of tools using it, but we ries. Contact him at [email protected]. need to figure out what the requirements are Stefan Decker is a professor at and director of the Digital Enterprise Research Institute at the in particular domains, whether they involve National University of Ireland, Galway. His research interests include the Semantic Web, digital interlinking real-estate communities or ex- libraries, and the social semantic desktop. Decker received his PhD in computer science from the tracting personal skills from user-generated University of Karlsruhe. Contact him at [email protected]. content. We are now targeting a number of use cases to see how to augment SIOC with domain-specific terms. Some domains of interest include facilitating customer health natural-language-processing techniques) is To develop new solutions and enable inno- support groups (patient support from peers available about a content item, we can de- vation, networks are necessary—networks on the Web), providing structured represen- scribe this information in RDF and attach it of people and organizations having access tations of professional scientific discourse, to existing SIOC data about this item. to networks of knowledge. Although knowl- and modeling or navigating interactions in We’ll also investigate Social Semantic edge is inherently strongly interconnected, prelegislative consultation. Web solutions for enhancing discussion top- current information fabrics don’t reflect this An interesting dimension is the applica- ics on social software systems such as fo- interconnectedness, so they’re not optimal tion of SIOC in CWE platforms. SIOC’s rums, blogs, wikis, and podcasts with seman- for supporting the development of solutions initial purpose was to link open online tics, to yield what we might term “semantic and innovation. The lack of interconnected- communities on the Web. CWEs are using dialectic topics.” This research will build on ness hampers basic information manage- SIOC in a fundamentally different environ- SIOC, SALT (Semantically Annotated La- ment and problem-solving capabilities such ment: linking legacy, closed, and proprie- TeX, http://salt.semanticauthoring.org), and as finding, creating, and deploying the right tary systems. We intend to reuse this expe- other efforts to represent argumentative dis- knowledge at the right time for problem rience for applying SIOC-based solutions in cussions such as IBIS (Issue-Based Informa- solving and collaboration. other similar areas. The more general area tion Systems), Diligent, and Compendium to We can direct the combination of Seman- of enterprise application integration seems create a structure for defining the topics and tic Web and Social Web collaboration tech- to be a challenging candidate where in- argumentative nature of future discussions. nologies toward a universal collaboration teroperability issues are critical. It will also provide methods to classify ex- and networked knowledge infrastructure, We expect SIOC to act as a “crystalli- isting topic discussions. The application do- enabling capabili- zation point” along with FOAF, DC, and mains of scientific discourse (as in SISC) ties as expressed by visionaries such as Van- SKOS (see Figure 10) in bringing various and legislative debate are of particular inter- nevar Bush10 and Doug Engelbart.11 For the domain ontologies together to better de- est here. most part, their ideas remained just a vision scribe user-generated content both on the for far too long because the necessary foun- Social Web and in enterprise environments. dational technologies hadn’t been invented. SIOC provides a framework to which we Figuratively speaking, these ideas were pro- can attach further details about content ystematic access to knowledge is criti- posing jet planes when the rest of the world items. If additional information (for exam- Scal for solving today’s problems on an had just invented bicycles. Starting with ple, embedded in the item or retrieved using individual, organizational, and global level. the Semantic Web and the Social Web, the

May/June 2008 www.computer.org/intelligent 39

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply. Semantic Web Update

foundations to make these visions a reality now exist.

Acknowledgments We acknowledge the contributions of Fabio Call Corneti and Adam Westerski on SWPop and of Eyal Oren and Benjamin Heitmann on the SIOC (Semantically Interlinked Online Com- for munities) Explorer. Science Foundation Ireland Articles funded this work under grant SFI/02/CE1/I131, Semantic Scientific Knowledge Integration as did Ecospace under grant FP6-IST-5 35208.

This special issue aims to report on the state of the art in semantic e-science. References We’re interested in original research papers that bridge the semantic-technolo- 1. T. Berners-Lee, J.A. Hendler, and O. gies community with the scientific-information-technology community in the area Lassila, “The Semantic Web,” Scientifi c American, vol. 284, no. 5, 2001, pp. 34 43. of knowledge integration, and we welcome papers on practice and theory. All − 2. J.G. Breslin et al., “Towards Semantically papers should include some discussion of evaluation and impact. Potential topics Interlinked Online Communities,” Proc. include, but aren’t limited to, European Semantic Web Conf. (ESWC 05), LNCS 3532, Springer, 2005, pp. 500−514. • foundational aspects of science knowledge representation and integration, 3. J.R. Hobbs and F. Pan, “An Ontology of • evaluation of science ontologies for use in knowledge integration, Time for the Semantic Web,” ACM Trans. • semantically enabled information architectures and infrastructure supporting Asian Language Processing, vol. 3, no. 1, 2004, pp. 66 85. scientific research, − 4. U. Bojārs et al., “An Architecture to Dis - • methodologies for integrating semantics within existing scientific research cover and Query Decentralized RDF infrastructure, Data,” Proc. Scripting for the Semantic • semantically enabled science applications involving knowledge integration Web Workshop at ESWC, CEUR Workshop Proc., 2007, http://CEUR-WS.org/Vol-248/ (aimed at readership beyond a single discipline), paper11.pdf. • semantic environments for building and integrating science applications, 5. B. Heitmann and E. Oren, “Leveraging • use case studies for semantic interdisciplinary science applications, Existing Web Frameworks for a SIOC Explorer to Browse Online Social Com- • ontology-enhanced search and integration of scientific information, munities,” Proc. Scripting for the Semantic • ontology-enhanced science workflow tools involving knowledge integration, Web Workshop at ESWC, CEUR Workshop • provenance-aware semantic e-science tools and applications, Proc., 2007, pp. 52−61, http://CEUR-WS. org/Vol-248/paper6.pdf. • explanation services for semantic e-science applications, and 6. J.G. Breslin and S. Decker, “The Future of • Semantic Web services supporting knowledge integration in e-science. Social Networks on the Internet: The Need for Semantics,” IEEE Internet Computing, Important Dates vol. 11, no. 6, 2007, pp. 86−90. 7. U. Bojārs, B. Heitmann, and E. Oren, “A Submissions due for review: 28 July 2008 • Acceptance notification: 3 Nov. 2008 Prototype to Explore Content and Context Final version submitted: 17 Nov. 2008 • Issue publication: Jan. 2009 on Social Community Sites,” Proc. 1st Conf. Social Semantic Web (CSSW), LNI Submission Guidelines P-113, 2007, pp. 47−58. 8. G. Tummarello, R. Delbru, and E. Oren, Submissions should be 3,000 to 7,500 words (counting a standard figure or “Sindice.com: Weaving Open Linked table as 200 words) and should follow the magazine’s style and presentation Data,” Proc. 6th Int’l Semantic Web Conf. (ISWC), LNCS 4825, Springer, 2007. guidelines (see www.computer.org/portal/pages/intelligent/mc/author.html). 9. K. Ning et al., “A SIOC-Enabled Explorer References should be limited to 10 citations. To submit a manuscript, access of Shared Workspaces,” CSCW/Web 2.0 the IEEE Computer Society Web-based system, Manuscript Central, at https:// Workshop 10th European Conf. (ECSCw mc.manuscriptcentral.com/cs-ieee. 07), 2007. 10. V. Bush, “As We May Think,” Atlantic Monthly, vol. 176, no. 1, 1945, pp. 101−108. Questions? 11. D.C. Engelbart, Augmenting Human Intel - Contact the guest editors at [email protected]. lect: A Conceptual Framework, tech. re - port AFOSR-3233, Stanford Research Inst., 1962. The #1 AI Magazine For more information on this or any other com- www.computer.org/intelligent IEEE puting topic, please visit our at www.computer.org/csdl.

40 www.computer.org/intelligent Ieee InTeLLIGenT SySTeMS

Authorized licensed use limited to: UNIVERSITY OF ALBERTA. Downloaded on October 1, 2009 at 14:10 from IEEE Xplore. Restrictions apply.