Quick viewing(Text Mode)

Perspectives on Social Tagging

Perspectives on Social Tagging

Perspectives on Social Tagging

Ying Ding and Elin K. Jacob School of Library and Information Science, Indiana University, 1320 E. 10th Street, Bloomington, IN 47405. E-mail: {dingying, ejacob}@indiana.edu

Zhixiong Zhang Library of Chinese Academy of Sciences, Beijing, China. E-mail: [email protected]

Schubert Foo Division of Information Studies, Nanyang Technological University, Singapore. E-mail: [email protected]

Erjia Yan, Nicolas L. George, and Lijiang Guo School of Library and Information Science, Indiana University, 1320 E. 10th Street, Bloomington, IN 47405. E-mail: {eyan, nlgeorge, lijguo}@indiana.edu

Social tagging is one of the major phenomena transform- pages was beyond the skill level of many users, it was diffi- ing the from a static platform into an cult for the user to publish data on the Web. More important, actively shared information space. This paper addresses resources published in this environment consisted of simple various aspects of social tagging, including different views on the nature of social tagging, how to make use character strings that adhered to a prescribed syntactic for- of social tags, and how to bridge social tagging with mat, and the machine had no way to interpret the meanings otherWeb functionalities; it discusses the use of facets to of the strings because they did not have machine-readable facilitate browsing and searching of tagging data; and it semantics embedded in them. presents an analogy between bibliometrics and tagomet- The current Web environment is the second-generation rics, arguing that established bibliometric methodologies can be applied to analyze tagging behavior on the Web. Web, often referred to as the Social Web. The phrase Based on the Upper Ontology (UTO), a Web crawler “Social Web” was introduced in 1998 by Peter Hoschka to was built to harvest tag data from Delicious, , and stress the social function of the Web (Hoschka, 1998). The YouTube in September 2007. In total, 1.8 million objects, Social Web is an open and globally distributed data shar- including bookmarks, photos, and videos, 3.1 million tag- ing network that links people, organizations, and concepts. gers, and 12.1 million tags were collected and analyzed. Some tagging patterns and variations are identified and The scope of the Social Web is extended here to include any discussed. Web-related technologies, phenomena, or developments that enhance the social nature of the Web. Web 2.0 is a substrate of the Social Web that provides Introduction platforms and technologies (e.g., wikis, , tags, RSS The World Wide Web (Web) is evolving from a static plat- feeds) for online communication and collaboration. Com- form into an actively shared information space. Participation munication on the Web has morphed from one-way commu- of the user in the first generation Web (Web 1.0) consisted pri- nication to human-to-human communication, and the Web marily of browsing online content.Although documents were has become a convenient platform for users to publish and connected through hyperlinks, allowing the user to move eas- share information. The user not only browses Web 2.0 but ily from one resource to another, the averageWeb 1.0 user was is also actively involved in online communication, includ- locked into a one-way process of communication, much like ing the publishing of resources, the tagging of bookmarks, reading a book, and was not actively involved either in online and the sharing of images and videos. Online publishing in discussions or information sharing. Because writing HTML Web 2.0 is now so easy that anyone who can type can publish. This has spurred users of all ages, from teenagers to seniors, to become involved in Web-based communication. One of the Received November 3, 2008; revised June 4, 2009; accepted June 26, 2009 most popular ways of communicating via Web 2.0 is tagging, © 2009 ASIS&T • Published online 17 August 2009 in Wiley InterScience the act of adding keywords to online resources. Tagging is one (www.interscience.wiley.com). DOI: 10.1002/asi.21190 of the major players in the evolution of the World Wide Web,

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 60(12):2388–2401, 2009 TABLE 1. Summary of three generations of the World Wide Web.

Traditional Web (Web1.0) Social Web (Web2.0) Semantic Web (Web3.0)

User Browsing Browsing Browsing Publishing Publishing Organizing Organizing Interacting Communication style One-way One-way One-way (e.g., reading a book) Human-to-human Human-to-human (e.g., sharing) Human-to-machine (e.g., query-answering) Data Resources (syntactic data): Resources, tags, metadata: Resources, tags, metadata: – content and format mixed – content and format separated – content and format separated – documents hyperlinked – data are linked – ontological data – data are semantically linked Data contributor Webmaster or experienced user Average user Average user and web agents Linking data Hyperlinks Different types of hyperlinks Semantic links Adding data Composing HTML pages Online publishing tagging Online publishing tagging machine generated data and metadata contributing to the change from a collection of hyperlinked of social tagging with other Web functionalities. The major documents to a hyperlinked Web of Data. research focus of this paper is the possibility of using the What is the future of the World Wide Web? While some Upper Tag Ontology (UTO) to profile the features of three talk of Web 3.0, others speak enthusiastically of the Seman- different social tagging networks and the tagging behaviors tic Web. Regardless of what the next generation Web is to be of their users so as to distinguish between them. The section called, it may have features that may be inferred based on the Approaches to Tagging presents a brief review of different evolutionary trajectory of the World Wide Web. It is likely approaches to the study of tags and social tagging and intro- that the communication style of the Web will be human to duces the UTO as one approach to modeling social tagging machine. For example, a user may be able to interact with data. Section 3 discusses the Universal Tag Identifier (UTI) the Web in the manner described by Tim Berners-Lee, Jim and its importance for uniquely identifying a tag and for Hendler, and Ora Lassila in their 2001 article in Scientific being able to reference tags. Section 4 shows the potential American. It may be difficult for users to determine if they link between social tagging and bibliometrics, and provides are communicating with another human being or with a Web an example of co-tag analysis by using data from Delicious. agent. In the current Web 2.0 environment, human-created Section 5 summarizes some features of three social tagging metadata, including tags and other social data (e.g., FOAF in networks: Delicious, Flickr, andYouTube. Section 6 outlines RDF/XML), is growing daily, and even more machine pro- directions for future work with social tagging networks. cessable metadata will be available on the future Web (see Table 1 for a summary of the three generations of the World Approaches to Tagging Wide Web). The trend toward introduction of additional data about Sinha (2005) approaches tagging from the perspective Web resources will include not only the descriptive meta- of cognitive science, finding the “wisdom of crowds” data of universal schemes such as the Dublin Core but also (Surowiecki, 2004) in this new social phenomenon. Sinha metadata about domains or services. Machines will also con- argues that taggers enjoy being embedded in a social envi- tribute data that have been generated automatically through ronment, being watched by others, and receiving feedback inferences based on predefined semantics. Moreover, these on their actions. Furthermore, social tagging can lead to the data will no longer be solitary annotations but will be inter- formation of like-minded groups, thereby enabling social dis- linked (Miller, 2008). Based on the principles for linking covery and the development of connections among taggers. open data proposed by Tim Berners-Lee in 2007 (see Link These emergent groups will frequently demonstrate a wide Open Data initiative available at http://esw.w3.org/topic/ diversity of opinions, thereby providing multiple avenues of SweoIG/TaskForces/CommunityProjects/LinkingOpenData), access to resources. The act of tagging reflects an individual’s more and more semantic data will be available to link con- conceptual associations and enables loose coordination, but cepts or instances using owl:sameAs or foaf:knows. These it does not enforce a single interpretation of a tag or concept. powerful semantic links will weave the current Web into a More important, tagging works because it strikes a balance Web of semantically linked social data. between the individual and the social. Suchanek, Vojnovic, This paper discusses various aspects of social tagging, and Gunawardena (2008) conducted a social tagging analy- including different views on the nature of social tagging, how sis of 65,000 Delicious bookmarks and a user study of over to make use of social tags, and how to bridge the phenomenon 4,000 participants. They found that the more popular a tag is,

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 2389 DOI: 10.1002/asi the more likely it is to be meaningful. Hammond, Hannay, user-generated vocabularies work as well as they do: The Lund, and Scott (2005) discuss user motivations for tagging, freedom of natural language, with its everyday familiarity aligning them with popular tagging sites. They contend that and loose associations, is more intuitive and requires less these motivations range from the “selfish” to the altruistic: cognitive effort than making a decision about how well a from tagging of one’s own content for personal use (e.g., predefined category captures the content of a resource or Flickr) or tagging one’s content for the use of others (e.g., represents the immediate needs of the user. In the future, Technorati and HTML metatags) to tagging another’s content it may be feasible to combine these approaches to meta- for one’s personal use (e.g., Delicious, CiteULike, Connotea) data creation in order to achieve optimal representation and or tagging the content of another for the use of others (e.g., retrieval of resources. Indeed, it may be possible to harvest Wikipedia). Lawley (2008) presents another view of tagging user-generated tags in order to develop controlled vocabu- when she cautions that it can be easily manipulated to pro- laries that speak the language of specific user groups and duce negative social consequences such as making political communities. statements or promoting personal interests (Blood, 2005). In social tagging applications, tags are used for navi- Tagging is a primary means for adding metadata to gating, for browsing, and for retrieving resources. A social resources in the Web 2.0 environment. Mathes (2004) offers tagging system often provides tag recommendation mech- an historical perspective on tagging when he describes dif- anisms (generally based on co-occurrence) that are used ferent methods used to generate metadata. In libraries, meta- both to facilitate the identification of appropriate tags for data has traditionally been created by trained professionals a resource and to encourage consolidation of the tagging using standards such as AACR2, formats such as Machine- vocabulary across users. Recent work on more specialized Readable Cataloging (MARC), and controlled vocabularies topics, such as structure mining of user-generated vocabular- such as the Library of Congress Classification (LCC), the ies, has attempted to visualize trends (Dubinko et al., 2006), Dewey Decimal Classification (DDC), and the Library of to identify patterns (Schmitz, Hotho, Jaschke, & Stumme, Congress Subject Headings (LCSH). These professionally 2006) in tagging behavior, or to rank terms in a vocabu- created metadata records form the basis for user displays lary (Hotho, 2006). Xu, Fu, Mao, and Su (2006) introduced in online public access catalogs (OPACs) and are generally a collaborative approach to tag suggestion that is based on of high quality, but they are time-consuming to create and the HITS algorithm (Kleinberg, 1999) and computes a mea- thus impractical for handling the large volume of resources sure of the goodness of tags that are iteratively adjusted available on the Web. Web-based Metadata has also been gen- using a reward-penalty algorithm based on collective use. erated by content creators using HTML metatags as well as Fountopoulos (2007) designed the RichTags system, which a myriad of metadata schemes, including Dublin Core (DC), uses Semantic Web technologies in an attempt to overcome the Text Encoding Initiative (TEI), and the VRA Core 4.0 the weaknesses of conventional social tagging systems. (VRA) developed by the Visual Resources Association, to Modeling of social tagging behaviors can help to organize name but a few. All too often, however, there is a lack of tagging data and to interlink it with data from other social standardization and content creators who have not been for- applications, and several tag ontologies have been devised mally trained in metadata generation, leading to inadequate, for this purpose (e.g., Gruber, 2007; Kim, Passant, Breslin, inaccurate, or “noisy” resource descriptions. More important, Scerri, & Decker, 2008). The UTO (Ding et al., submitted) is in both of these approaches the process of metadata genera- an upper level ontology for social tagging that is designed to tion is frequently disconnected from the actual users of the circumvent the complexity and potential redundancy inher- resources. ent in user-generated tagging vocabularies. UTO is based on Based on the claim that social tagging reflects the vocabu- Gruber’s (2007) suggestion that an ontology can be used to lary and conceptual associations of users (Mika, 2005), it can model tagging data, but it extends this idea by focusing on be a practical approach to allow users to generate metadata ontology alignment and on the integration of tagging data (Shirky, 2005). Unlike formal taxonomies and classifica- with other social metadata. In a formal definition of UTO, let tion schemes, however, which specify explicit relationships O be the ontology, among terms, the accumulation of tags from individual con- O = (C, ) (1) tributors is “metadata for the masses” and reveals the digital equivalent of “desire lines” (Merholz, 2004) created in the where C ={ci,i∈ N} is a finite set of concepts and physical landscape by purposeful foot traffic. This collective ={(ci,ck), i, k ∈ N} is a finite set of relations established vocabulary reflects the interests of users; more important, among the concepts in C. In UTO, C ={Tag, Tagging, this vocabulary speaks the same language as users. With Object, Tagger,Source, Date, Comment, Vote}, (Figure 1). their uncontrolled nature and organic growth, user-generated   vocabularies have the ability to adapt quickly to changes in = hasRelatedTag, hasTag, hasObject, hasSource, both the needs and the vocabulary of users. Barriers to entry hasDate, hasCreator,hasComment, hasVote and the cognitive effort required to use such vocabularies are very low. Indeed, Butterfield (2004) claims that the lack of The emphasis in UTO is on the structure of tagging behav- hierarchy, of synonym control, and of semantic precision— iors rather than the meaning of the tags themselves. It is a the hallmarks of controlled vocabularies—are precisely why simple and easy-to-use ontology that is designed to be able to

2390 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 DOI: 10.1002/asi FIG. 1. The Upper Tag Ontology (UTO). integrate metadata from one social tagging site with metadata Because predefined tag lists are similar to traditional from other social tagging sites. controlled vocabularies—indeed, some tag lists could be provided by utilizing existing vocabularies such as LCSH or DDC—some pundits may argue that one of the primary advantages of social tagging is that tags are natural language Universal Tag Identifier (UTI) keywords and thus uncontrolled, allowing users to select any As social tagging becomes more popular, it is important word(s) to represent an online resource. This argument is to find efficient ways to identify, manage, and reference the most powerful when applied to individual taggers who are increasing abundance and diversity of tag data. Identification interested in representing resources for their own use. But is the key enabler for data integration because identifica- social networks are platforms not only for storing one’s own tion of tags on a global level can facilitate the disambiguation bookmarks but also for sharing those bookmarks. When the of tag meanings and uses. Efforts are currently under way intent of a tagger is to share resources with other users, natural to support the global identification of tags. The OpenID language keywords can sometimes hamper retrieval. initiative (http://openid.net/) allows Web users to have one Using an identifier for a tag or drawing tags from a con- Web account for logging into different websites: Using a trolled vocabulary facilitates sharing and reuse of resources Yahoo! account, for example, an individual can log in to as well as tags. Social tagging and traditional indexing are AOL, Livejournal, DIGG, etc. The MOAT project provides similar in that the objective of both activities is to pro- a framework for taggers to produce semantically annotated vide access to and support retrieval of a group of resources content by using URLs of existing resources from DBPe- that share similar features. The primary difference between dia (available at http://dbpedia.org/), geonames (available tagging and indexing is the source responsible for assign- at http://www.geonames.org/), or any other well-accepted ing a descriptor, be it a natural language tag or a term knowledge base that offers predefined meanings for tags. It from a controlled vocabulary. Using descriptors from a is also suggested that, rather than having users type in tags, controlled vocabulary ensures shared meaning and clean it should be possible for them to select tags from predefined data, while using natural language keywords often leads to lists of tags (descriptors) such as WordNet, LCSH, DDC, ambiguity and noisy data. Despite some well-known disad- Wikipedia, or any well-known social tagging network such as vantages of controlled vocabularies (e.g., the potential lack Delicious. Each term in these controlled tag lists would have of concurrency among terms and the difficulties that can be a UTI—a tag URI that would simplify reuse of or reference associated with updating and maintaining a controlled vocab- to that particular tag. ulary), they offer definite advantages for social tagging and

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 2391 DOI: 10.1002/asi FIG. 2. Mock-up of a system combining tags and subject headings. should be an integral component of a combined approach: Like the hypertext links of Web 1.0, tagging is a form of controlled vocabularies can be used to supplement natural citation in the Web 2.0 environment, and various methods language tagging, and natural language tags supplied by from citation analysis can be adapted to the tagging envi- users can be used to enrich and extend controlled vocabu- ronment. Bibliometrics provides methodologies to map and laries. For example, when a user bookmarks ski resorts with measure scholarly communication based on citation behav- the tag ski, the LCSH descriptor Dry slope skiing could be ior, and similar methodologies can be applied as tagometrics added to disambiguate the natural language tag and allow to track and measure social communication online based on future users to evaluate the appropriateness of the resource tagging behavior. (see Figure 2). Citation analysis and co-citation analysis are two biblio- FaceTag (available at http://www.facetag.org/) is a work- metric methods that can be applied to social tagging (Kipp & ing prototype that demonstrates how the flat structure of Campbell, 2006): Bibliometric measures used to investi- user-generated tags can be combined with a faceted vocab- gate author co-citation relationships, co-word occurrence, ulary to enrich an information system by incorporating and co-journal citations can be extended to the analysis of relationships among tags (see Figure 3). If users are averse to relationships among taggers, tags, and objects. Other meth- the use of tags from controlled vocabularies because it seems ods commonly employed in bibliometric analysis can also to undermine individual freedom of choice, incorporating a be deployed to investigate tagging behavior, including clus- simple but predefined faceted vocabulary can improve access tering, multidimensional scaling, and factor analysis (see to resources by supporting shared semantics. Through the Table 2). simple addition of four basic facets (resource type, theme, Additional approaches that appear promising for analyz- people, and purpose), users can supplement their own tags ing tagging behaviors include Formal Concept Analysis and with descriptors chosen from the appropriate facet(s), and Support Vector Machine. Formal Concept Analysis (FCA) users searching or browsing Web resources can clearly view is a theory of data analysis based on concept lattices that both the natural language tags and the faceted vocabulary that was introduced by Rudolf Wille in 1982 (Wille, 1982). has been assigned to each resource. FCA is used to identify conceptual structures among terms and has been applied in many fields including, among oth- ers, medicine, psychology, musicology, linguistics, software From Bibliometrics to Tagometrics engineering, civil engineering, and ecology. Priss (2007) pro- Tagging serves the function of semantic reference and is vides an excellent survey of the application of FCA in infor- thus one of the most important activities on the Social Web. mation science. The primary strength of FCA is its ability to

2392 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 DOI: 10.1002/asi FIG. 3. FaceTag screenshot. FaceTag uses green (Resource type), yellow (Theme), red (People), and blue (Purpose) highlighting for tags assigned to resources in the retrieval set, allowing the searcher to identify tags from particular facets in a controlled vocabulary.

TABLE 2. Bibliometrics vs. tagometrics.

Bibliometrics Tagometrics

Object of analysis Citation Tag Ranking Authors, journals, subjects Taggers, objects, tags Citing object Scholarly papers Online objects Citing purpose Giving credit Organizing, sharing, retrieving Methods Citation and co-citation analysis tag and co-tag analysis Application Scientific evaluation, impact analysis, retrieval Social impact analysis, retrieval Hybrid Tagging publications (e.g., Connotea, CiteULike, BibSonomy)

reveal hidden relationships between objects and attributes, to support query reformulation. Like clustering, FCA is not construct the extension and intension of formal concepts, and good at handling large datasets; however, unlike clustering, to generate graphical visualizations of the inherent structural which is a purely statistical approach, FCA introduces logical relationships among concepts. FCA can facilitate navigation subsumption and inheritance into the analysis and thus has and browsing and can be used to generate tag lattices that the potential to be a powerful component of tagometrics.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 2393 DOI: 10.1002/asi Support Vector Machine (SVM) is a supervised learning TABLE 3. Tag clusters in Delicious. technique used in machine learning for both classification Cluster Tags and regression. SVM algorithms are based on the principle of structural risk minimization (Cortes & Vapnik, 1995). 1 ajax, c, code, development, html, java, library, net, python, In classification, SVM maps the input space into a high rails, rudy dimensional feature space and constructs an optimal separat- 2 dictionary, English, language, literature, writing 3 comic, entertainment, film, forum, japan, Japanese, movie, ing hyperplane to create a maximal margin classifier, where radio, streaming, television, tv margin represents the minimal distance from the separating 4 calculator, conversion, convert, converter, currency, euro, hyperplane to its closest data points. Although co-citation is exchange generally based on clustering methods, the result of SVM can 5 account, bank, banking, bill, consumer, credit, deal, doctor, improve both the understanding and labeling of clusters. The financial, healthcare, insurance, loan, medical, medicare, medicine, savings basic differences between clustering and SVM classification 6 air, apartment, building, cleaning, do, fire, guide, house, are that the data points used in clustering are unlabeled and housing, move, rental, safety, studio the resulting clusters and data in the clusters are not ordered, 7 Black, blue, brown, fairy, flower, gratis, leather, line, neo, pink, while the data points in SVM classification are labeled and the red, skull, stripes, style, Sweden, Swedish, vintage, white, data in classes are ordered. However, like clustering and FCA, yellow 8 culture, history, philosophy, politics, religion SVM is not good at dealing with large datasets. For this 9 astronomy, earth, geography, german, map, nasa, space, world reason, some researchers have used hierarchical clustering 10 font, illustration, inspiration, portfolio, typography algorithms to provide better training sets in order to make SVM scalable for handling large data sets (Yu,Yang, & Han, 2003; Awad, Khan, Bastani, & Yen, 2004). Traditionally, multidimensional scaling has been used in addressing topics related to geography; Cluster 10 includes bibliometrics to visualize clustered data. Other methods such five tags relevant to formatting. as Self Organization Maps (SOMs) and Pathfinder Networks It is possible to draw some interesting insights based on (PFNets) can also be used to map citation data (Kohonen, this clustering of tags. For example, unlike traditional index- 1989; Schvaneveldt, 1990; White, Buzydlowski, & Lin, ing, which relies on nouns and noun phrases, taggers used 2000). Combining co-citation analysis with scientific impact adjectives to tag bookmarks, as demonstrated by the col- evaluations such as the h-index evaluation (Hirsch, 2007) ors and patterns of Cluster 7. Additionally, tags related to might be an interesting approach to explore for application currency conversion (Cluster 4), banking (Cluster 5), and to social tagging data. housing (Cluster 6) demonstrate a remarkable homogeneity that is not characteristic of many of the other clusters. Tagometric Analysis of Delicious Tagging Features of the Popular Social Networks Co-tag analysis was conducted using 9.3 million tags har- vested from Delicious (see following section). The tags were In order to test the utility of UTO in harvesting tagging normalized using WordNet: for example, related nouns and data, a Web crawler was constructed that would apply the adjectives were combined (e.g., America and American → elements of the UTO in the collection of tagging data from America); stop words were removed (e.g., an, a, and, etc.); three major social tagging websites: Delicious, Flickr, and different verb forms were combined (e.g., go, goes, going → YouTube. go); and uppercase and lowercase were synchronized (e.g. API, → API). Tags with a frequency of occurrence of Crawling and Integrating Tagging Data more than 90 were then selected to form a co-tag matrix. This resulted in a matrix of ∼10,000 × 10,000. Clustering analysis In September 2007, Delicious was crawled to retrieve data was applied to the co-tag matrix using an X-Means algorithm, about taggers, tags, and bookmarks. The crawler began with an unsupervised clustering algorithm that allows the speci- the Delicious tag cloud at http://delicious.com/tag and visited fication of the minimum and maximum number of clusters every tag contained in this tag cloud. For example, for TagA generated during the training (Pelleg & Moore, 2000). Table 3 in the tag cloud the crawler visited http://delicious.com/tagA presents 10 primary clusters generated during the analysis. and parsed the HTML code to grab information about book- Cluster 1 contains 11 tags referencing programming lan- marks, taggers, and related tags. For each bookmark the guages; Cluster 2 has five tags representing natural language crawler then went to http://delicious.com/url and crawled topics. Cluster 3 has 11 tags dealing primarily with entertain- the history of the bookmark, focusing on which taggers ment and entertainment media. Cluster 4 contains seven tags had tagged this bookmark on which date(s). After gathering relevant to currency conversion; Cluster 5 contains 16 tags data about all of the bookmarks on the first page of TagA, generally related to banking and economic issues. Cluster 6 the crawler continued to visit the second and subsequent includes 13 tags focused on housing-related topics; Cluster 7 pages for TagA, performing the same tasks. The crawler then gathers 19 tags referencing colors and patterns; and Clus- repeated this process with subsequent tags in the tag cloud ter 8 has five tags relevant to culture. Cluster 9 has eight tags until it had visited all of the tags in the cloud. Following the

2394 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 DOI: 10.1002/asi TABLE 4. Tag data for Delicious, Flickr, and YouTube.

Social network Objects Taggers Tags Tag/object Tag/tagger Object/tagger

Delicious 996,748 2,787,860 9,282,058 9.31 3.33 0.36 Flickr 295,837 153,778 1,351,201 4.57 8.79 1.92 YouTube 527,924 185,975 1,443,924 2.74 7.76 2.84 Total 1,820,509 3,127,613 12,077,183 5.54 6.63 1.71

same crawling method, data were also collected from Flickr TABLE 5. Top 20 tags in Delicious for the years 2005, 2006, and 2007. and YouTube in September 2007. In total, the data retrieved Rank 2005 2006 2007 contain 21 million RDF triples for Delicious, 2.3 million RDF triples for Flickr, and 2.2 million RDF triples for YouTube; 1 /blogs blog/blogs blog/blogs Table 4 shows the details of these different datasets. 2 programming programming design In total, our dataset contains around 1 million bookmarks, 3 software software software 4 music design programming 2.8 million taggers, and 9.3 million tags from Delicious, 5 design reference reference 0.3 million photos, 0.2 million taggers, and 1.4 million tags 6 web music tools from Flickr, and 0.5 million videos, 0.2 million taggers, and 7 reference web Web2.0 1.4 million tags from YouTube. The average number of tags 8 java tools web per object ranges from 2.74 (YouTube) to 9.31 (Delicious). 9 art art video 10 tools java music The average number of tags per tagger uses ranges from 11 linux video art 3.33 (Delicious) to 8.79 (Flickr). The average number of 12 news Web2.0 linux objects tagged per tagger ranges from 0.36 (Delicious) to 13 linux webdesign 2.84 (YouTube). In considering these averages, it should be 14 science news howto noted that, when users upload bookmarks to Delicious, title 15 search tutorial free 16 games howto tutorial is a required field, but tag is not. Thus, there may be many 17 research imported news bookmarks with titles but without tags. 18 technology development development 19 security research opensource Profiling Three Major Social Tagging Networks1 20 video internet java In order to generate portraits of tag use and the compo- sition of tag vocabularies in Delicious, Flickr, and YouTube, the data from each social network were analyzed indepen- level of general interest that spans all 3 years, evidence from dently using three time frames (2005, 2006, 2007). actual tagging behaviors supports the popular assumption that Delicious is a social network for individuals interested Delicious. Table 5 shows the 20 most frequently assigned in the Web and programming skills. Furthermore, the tags tags in Delicious for the years 2005, 2006, and 2007. These introduced in 2006 and 2007 indicate a growing interest in tag sets appear to be relatively stable across the 3 years. free or open-source resources as well as tutorials and how-to The tags xml, science, search, games, technology, and secu- resources that support learning programming languages or rity appear among the top 20 tags for 2005 but are dropped applications and developing new computer skills. from the lists of top 20 tags for 2006 and 2007; and the tags Figure 4 shows the evolution of dominant topical tags imported, research, and internet are dropped from the list used in the Delicious social network for the period 2005– of top 20 tags for 2007. The tags development, howto, tuto- 2007. The tag Web2.0 shows the highest peak in both 2006 rial, and Web2.0 appear in the lists for both 2006 and 2007, and 2007: The raw frequency with which Web2.0 was used and webdesign, free, and opensource are introduced in 2007, to tag bookmarks increased 16 times in 2006 and 76 times pointing to the emergence of new trends in user interests. in 2007 when compared with its raw tagging frequency in Overall, 85% of the top 20 tags are stable across 2006 and 2005. The tags showing the most dramatic increase in raw tag- 2007, indicating that a shared social vocabulary may be ging frequency from 2006 to 2007 were webdesign, free, and emerging in Delicious. Web2.0, indicating growing interest in these topics on the part A profile of Delicious users can be generated through anal- of Delicious taggers. The three tags with the least impressive ysis of the lists of popular tags. The dominance of tags such increase in raw tagging frequency from 2006 to 2007 were as blog, web, programming, and design indicate key interests java, programming, and music. The tag programming drops of Delicious users who are tagging bookmarks to store or to from second position in 2005 and 2006 to fourth position in share. While the tags music, video, art, and news indicate a 2007. The tag music does demonstrate a more dramatic drop in popularity—from fourth position in 2005, to sixth posi- 1 This section appeared in D-LIB Online Magazine: http://www.dlib.org/ tion in 2006, and to tenth position in 2007—but the fact that dlib/march09/ding/03ding.html Last.fm became one of the more popular social networks for

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 2395 DOI: 10.1002/asi FIG. 4. Evolution of the top 20 tags in Delicious for the period 2005–2007. The line with diamonds represents the increase in tag frequency from 2005 to 2006 (tag frequency for 2006/tag frequency for 2005). The line with squares represents the increase in tag frequency from 2005 to 2007 (tag frequency for 2007/tag frequency for 2005). The line with triangles represents the increase in tag frequency from 2006 to 2007 (tag frequency for 2007/tag frequency for 2006). sharing music during this period may help to explain why autumn and fall (2007). In addition, users also favor tagging tagging with music decreased from 2005 through September photographs with the time of day (or lighting conditions), 2007. especially when the photographs are night views. With the exception of tags in the categories year, color, and location, Flickr. Table 6 shows the 20 most frequently used tags in the top 20 tag sets differ widely across the 3 years. Flickr for the years 2005, 2006, and 2007. In sharp contrast to Flickr taggers frequently assign informal tags to pho- the more topical tagging culture of Delicious, Flickr taggers tographs (e.g., me), indicating that users may be tagging like to tag their photographs with dates, locations, colors, photographs for purposes of storing and retrieving them for and seasons. Favorite locations in Flickr include Hong Kong their own use rather than with any intent to share them with (2005), Germany (2005), USA (2006 and 2007), London others. When tagging photographs, users tend to empha- (2005–2007), California (2006), and Japan (2007). Favorite size the eye-catching features of an image such as color, color tags are orange (2005), blue (2006 and 2007), red (2006 subject (e.g., sky, water, beach, and specific locations), and and 2007), green (2006 and 2007), and black-and-white light conditions (e.g., night and nightview). Nonetheless, time (i.e., bw in 2007). Most frequently used tags for seasons are (i.e., year, season, or month), locations, and colors are the major features of images tagged by users. It would be use- TABLE 6. Top 20 tags in Flickr for the years 2005, 2006, and 2007. ful to analyze the tagging culture of Flickr in greater detail given that annotating images is an important area for image Rank 2005 2006 2007 retrieval.2 1 2005 usa 2007 Figure 5 shows the temporal history of tag popularity in 2 d70 california canon Flickr for the period 2005–2007. In 2005 and 2006, tagging 3 tsimshatsui (hong kong) 2006 nature was not particularly popular in the Flickr community, with 4 hongkong cameraphone autumn 5 nightview celltagged art total tags of 3,598 in 2005 and 23,066 in 2006. However, as 6 germany zonetag nikon tagging became more popular on the Web tagging behavior 7 newkie sanfrancisco water changed dramatically in Flickr. In 2007, 1,324,537 tags were 8 ragbrai blue bw assigned by Flickr taggers from January through September, 9 art light red ∼50 times more tags than were assigned for all of 2006. Raw 10 wonder sky blue 11 night urban sky tagging frequency for cannon, the second most popular tag 12 buttersweet red japan in 2007, increased 203.5 times over its total use in 2006; 13 15fav sea fall but fall, the thirteenth most popular tag in 2007, showed the 14 central me beach greatest jump, increasing 672.5 times over its raw frequency 15 light water portrait of assignment in 2006. 16 marco nature london 17 london marco night 18 apargioides london green 2 19 orange green usa An interesting example of ongoing research on social annotation of 20 ads1 music november images and videos is GWAP, the “games with a purpose” project at Carnegie Mellon available at http://www.gwap.com/gwap/

2396 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 DOI: 10.1002/asi FIG. 5. Evolution of the top 20 tags in Flickr for the period 2005–2007. The line with diamonds represents the increase in tag frequency from 2005 to 2006 (tag frequency for 2006/tag frequency for 2005). The line with squares represents the increase in tag frequency from 2005 to 2007 (tag frequency for 2007/tag frequency for 2005). The line with triangles represents the increase in tag frequency from 2006 to 2007 (tag frequency for 2007/tag frequency for 2006).

Interestingly, an analysis of tagged photographs indicates videos, humor, sex, and girls, apparently reflecting the broad that there are two major communities of Flickr taggers: One interests of the general Web community. community contains nonprofessional photographers who use Tagging activity in YouTube increased dramatically Flickr as a platform for sharing photographs with friends between 2005 and 2007. The total number of tags assigned in and family and tag images so that they can be retrieved by YouTube increased from 4,735 in 2005, to 366,147 in 2006, to others; the second community consists of professional pho- 1,073,042 in 2007: Tag use was 78.7 times greater in 2006 and tographers who do not tag often but who frequently provide 236.7 times greater in 2007 than it was in 2005. Compared comments on photographs taken by other professionals. with 2005, the tag for the current year showed the greatest increase in use in 2007, followed by new and sex/sexy, while dance showed the least increase between 2005 and 2007. YouTube. Table 7 shows the 20 most popular tags in The tag set inYouTube appears to be more stable than that of YouTube for the years 2005, 2006, and 2007. The topics that Flickr for the same time period, indicating, perhaps, that areas are most frequently tagged in this social network are music, of user interest have remained fairly steady for the social web community as a whole (see Figure 6). In Delicious, Flickr, and YouTube, social tagging behav- TABLE 7. Top 20 tags in YouTube for the years 2005, 2006, and 2007. iors increased greatly between 2005 and 2007. More and Rank 2005 2006 2007 more users are relying on social tagging applications to index online resources for future retrieval by themselves and by 1 music the the others. Analysis of these tagging behaviors offers insights 2 funny funny music into the culture of a social network and may be able to iden- 3 video music funny 4 the video video tify emerging trends and topics of increasing interest to a 5 dance live girl community. 6 crazy of of 7 commercial comedy sexy 8 dancing dance live Tagging Patterns 9 live rock dj 10 AMV cat 2007 Based on a review of the 1,300 most frequently assigned 11 fun Halloween dance tags in the combined set of tags from Delicious, Flickr, 12 guitar love hot and YouTube for the years 2005, 2006, and 2007, certain 13 hot girl comedy tagging behaviors were identified (see Table 8; for an alpha- 14 girl movie rock 15 japan dj love betical listing of all 1,300 tags, see Ding et al., 2009). For 16 anime in and example, taggers often date objects by assigning tags for 17 Halloween sexy sex the current year or month; this is especially prevalent in 18 cat and in Flickr, where the current year is always among the top-ranked 19 halo fight new tags (see Table 6). Taggers also like to identify geograph- 20 of you cat ical locations by assigning tags for cities, countries, and

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 2397 DOI: 10.1002/asi FIG. 6. Evolution of the top 20 tags in YouTube for the period 2005–2007. The line with diamonds represents the increase in tag frequency from 2005 to 2006 (tag frequency for 2006/tag frequency for 2005). The line with squares represents the increase in tag frequency from 2005 to 2007 (tag frequency for 2007/tag frequency for 2005). The line with triangles represents the increase in tag frequency from 2006 to 2007 (tag frequency for 2007/tag frequency for 2006).

TABLE 8. Tagging preferences.

Tagging features Tag examples

Time 2005, 2006, 2007, November Geo-location Africa, America, Amsterdam, Berlin, Argentina, Japan, China, Chicago, Chile, England Scientific domains Bioinformatics, biology, ecology, ecommerce Religion Bible, Buddhism, Christianity, Islam Programming Ajax, algorithm, C, C++, eclipse, hp, ibm, intel, j2ee, java, javascript, lisp, perl, python, ruby IT Database, datamining, enterprise2.0, itunes, ontology, owl, rdf, semanticweb, semweb, wordpress, xml, xslt Company Apple, , eBay, Google, Microsoft, Nokia, Nikon, Oracle, w3c, Yahoo Opinion Bad, cool Color Black, bw, dark, white Name/celebrity Britney, Spears, Bush, George, James, John, Michael, David Social networks , Flickr, Secondlife, YouTube, Delicious, Wikipedia, Myspace

Note. Tagging preferences are indicated by the 1,300 most frequently occurring tags in the combined set for Delicious, Flickr, and YouTube for the years of 2005, 2006, and 2007. even continents. Scientific domains, such as bioinformatics, the meaning these names convey is not simply an explicit spe- biology, and ecology are also among the most frequently cific reference to individuals but carries an implicit reference occurring tags, as are different religions including Buddhism, to certain social topics and/or social trends. It is also interest- Christianity, and Islam. Computer programming languages, ing to see that taggers frequently use both the first name and IT topics, and large IT companies are also represented, in the last name of an individual as distinct tags rather than con- all likelihood because of the strong technological orientation catenating the names into a single tag. However, this may be of Delicious users. And opinion terms are also used as tags due to lack of understanding on the part of some taggers that for bookmarks, photos, and videos, indicating that taggers blank spaces are used to separate tags in most social tagging often feel the need to express their opinions regarding selected Websites. objects. Review of the most frequently assigned tags also pointed Colors are popular tags in Flickr, not only to represent up variations in the usage of tags that can confound analysis the dominant color(s) in a photograph but also to specify the (see Table 9). For example, there are numerous instances of medium, as evidenced by the occurrence of the tag bw for both plural and singular forms of a given tag (e.g., animal and black-and-white photographs; the names of the social net- animals, girls and girl, etc.). Acronyms and abbreviations are works themselves are often used as tags. But most interesting also used in tagging, especially for common computer terms. is the appearance among the most frequently occurring tags Surprisingly, taggers frequently use conjunctions (e.g., and), of the names of well-known celebrities (e.g., Britney, Spears) prepositions (e.g., of, in), and articles (e.g., the) to tag objects, and politicians (e.g., George, Bush), indicating, perhaps, that as can be seen in the list of most frequently used tags in

2398 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 DOI: 10.1002/asi TABLE 9. Tag variations.

Tag variation Tag examples

Plural vs. singular animal(s), application(s), band(s), boy(s), car(s), cartoon(s), cat(s), girl(s) Conjunction, prepositions, etc. and, at, all, for, from, of, on, one, out, the, to Acronyms and abbreviations api, au, apps, bw, cms, cs, css, de, dc, dhtml, dj, diy, dns, dom, drm, dvd, el, en, fanfic, gps, gui, hci, hdr, hp, nyc, , rpg, Compound words audiobooks, blackandwhite, cameraphone, celltagged, creativecommons, datamining, dotnet, enterprise2.0, filesharing, filesystem, firefox:bookmarks, firefox:rss, firefox:toolbar, googlemaps, graphicdesign, howto, losangeles, newmedia, newyork, projectmanagement, rubyonrails Variant spellings and languages color/colors/colour, cultura, economica, educacion, foto, fotographia, fotos, het, humor/humour, musik, musica Variant word forms Blog/blogger/blogging/blogs, bookmarking/bookarmks, barsil/brazil, conversion/convert/converter, el/elearning/ e-learning, hack/hacking/hacks, lifehack/lifehacker/lifehacks, newyork/nyc, opensource/open-source, podcast/ podcasting/podcasts, process/processing, product/production/productivity/products, san-sanfrancisco/sf, tag/tags/tagging

YouTube (see Table 7). This would seem to be counterintu- by indicating that a particular photograph is a favorite image. itive, given that resources are generally tagged for sharing Flickr does provide the ability for users to tag photos uploaded or retrieval; however, the repeated occurrence of these word by friends, but this severely constrains the group of users who forms may signal a lack of understanding on the part of some can actually tag a resource. Flickr is a closed self-tagging taggers that phrases cannot be used as tags since social tag- system for resource contributors and their close friends, not a ging systems generally identify separate tags by the presence community-based system where tagging of resources is open of blank spaces. In recognition of this, some taggers have to all Flickr users. YouTube is similar to Flickr in that con- learned to tag with compound terms, such as blackandwhite, tributors to YouTube can tag the objects they have uploaded, googlemaps, losangeles, and projectmanagement. but the general user is limited to voting for a resource by Tags such as todo and toread are not only compounds but assigning “stars.” personalized reminders in the form of verb infinitives. Per- These differences in tagging rights have generated other haps most problematic are the variant word forms used by differences among the three tagging systems—differences in different taggers to refer to the same or very similar con- the nature and types of tags and in the role of tags in the sys- cepts. These tag variations reflect individual preferences for tem itself (Marlow et al., 2006). Based on analysis of the top describing online resources, but they also underscore the 20 tags in each of these three social networks, tags used in primary difference between the that evolve Delicious are more content-oriented and more closely related on social tagging sites and the controlled vocabularies that to the topics of bookmarks. Tags in Flickr serve as annota- have traditionally been used in formal indexing systems (see tion that represents features of photographs, such as color, Universal Tag Identifier section, above). While these indi- year, and location. In contrast, the tags assigned to resources vidualized folksonomies support retrieval by the original in YouTube tend to be more diverse in that they often rep- tagger, they require more effort and creativity on the part resent not only the content and the features of a video, the of the searcher, who must think of all possible variations on medium (e.g., video), but also the genre (e.g., music, dance, a term or concept in order to retrieve resources on a social comedy, etc.). bookmarking site. In Delicious, tags are used to organize, to retrieve, and to share bookmarks. Indeed, tags play a major role—perhaps, the major role—in Delicious, which relies on tags to support Tagging Features in Delicious, Flickr, and YouTube sharing and retrieval of bookmarks among users. In contrast, When comparing these three social networks, Delicious tags in Flickr are more of a side effect: whether or not to tag demonstrates the tightest connection to the use of tags as a photograph is left to the choice of the user. The absence extended information about resources. In Delicious, users can of tags is not problematic for Flickr users since photographs tag an available online resources with the tag or tags of can be searched by title, comments, or votes. Similarly, tag- their own choice, and an object can be tagged many times ging does not play a major role in YouTube: the popularity and by multiple users, thereby indicating that it “belongs” of a resource is not based on the number of taggers but on (or is more relevant) to the Delicious community as a user ratings, as many YouTube contributors do not actually whole. Similar social networks include CiteULike and Con- tag their videos, which are shared through the comments and notea, where online resources are bibliographical records, votes of other users. and LibraryThing where online resources are books. Social tagging behavior is also related to the community Such tagging sites are very different from Flickr, where that comprises each social network. Delicious attracts a group tagging of content is generally done by the user who has of users interested in IT-related topics. These people tend uploaded the object-as-photograph. The major activities of to focus on the content of resources and tagging provides a other users are to “comment” on or “vote” for these resources means for them to summarize this content. Naturally, then,

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 2399 DOI: 10.1002/asi tagging is the key functionality for sharing and retrieving lead to incomplete or ambiguous descriptions. In comparison, resources. In Flickr, users appear to be aligned with one of two user-generated metadata in the form of social tags enables distinct groups: a community of professional photographers individual users not only to organize resources for their own who want to share their arts for comment and feedback from use but also to share with like-minded users (Mathes, 2004). other professionals or the community of users who are inter- Although social tags do not constitute a controlled vocab- ested in managing their personal photos and sharing them ulary, a contributor who does not use the preferred tagging with their close friends. In contrast, the user community of vocabulary of the social network will make it difficult for YouTube perhaps can be characterized as a snapshot of the others to find the resources they have tagged. This feedback entire Web community. YouTube users are people from all in the form of shared tagging creates a communicative loop over the world, and they represent all age groups as well as a between users and metadata, suggesting that users may actu- wide range of different interests. Many of these users do not ally be negotiating the evolving meaning of a term through tag their videos, and if they do, the “tags” are frequently lit- their individual choices of tags assigned to online objects. eral descriptions, which may account for the high incidence Tagging in large social networks may also lead to the cre- of tags such as video, the, of, and in (see Table 7). In similar ation of a local social culture. This is aptly demonstrated by fashion to searching for photographs on Flickr, searching for the history of the tag flicktion, which was created by Andrew videos on YouTube relies primarily on the titles of a video Losowsky in 2004. Although flicktion was originally used by or its creator. But users come to YouTube with very differ- Losowsky to tag his own photographs, the referent of the term ent expectations: While sharing appears to be the dominant evolved as other members of the Flickr community began to purpose for uploading resources, the role of tagging is over- use flicktion to tag images that had a short fiction (e.g., a shadowed, if not eclipsed, by a focus on rating videos and “fiction in Flickr” or “flickr fiction”) attached to it. commenting on them. Social tagging has the potential to contribute to traditional solutions for organizing and browsing information as well as monitoring trends, and user incentives can play an important Conclusion role in the design of tagging systems. The different social Tagsare an emerging form of inductive or bottom-up social tagging features of Delicious, Flickr, and YouTube point to metadata that is generated by the synergy of collectives of corresponding differences in the dynamics of interaction and users. Social tagging is a new way of storing and retrieving participation among users. Delicious supports tagging where resources online and of sharing those resources with others. anyone can tag any resource on the Web, while Flickr facili- The phenomenon of tagging not only creates new data about tates personal tagging where only the owner of the image or online resources, but it also generates new fields for explo- his close friends can assign a tag. This leads to a very dif- ration. The link between social tagging and bibliometrics ferent design model for each of these social networks. It also offers potential for the application of well-tested bibliometric leads to different social features in the tagging vocabularies methodologies to extend and enhance current research efforts that predominate in these networks. Because this research has in the area of tagometrics. been limited to an analysis of tags from three popular social The UTO can be used to model unstructured tagging data tagging networks (i.e., Delicious, Flickr, and YouTube), the on the social Web, and alignment of UTO with other sys- resulting profiles of tagging networks and their users are tems of social semantics can help pave the way for data limited and may not be representative of all social tagging integration, data management, and query formulation. Using networks. Nonetheless, system designers should be aware integrated data about tags and tagging behavior that has of such differences and take them into consideration when been modeled with UTO, new applications can be developed planning the architecture of a tagging system. to enhance social communication, to extend the capabili- Social tagging generates massive accumulations of data ties of social computing, and to facilitate online resource that are said to reflect “the wisdom of crowds” (Surowiecki, selection and ranking. Furthermore, by applying citation and 2004). Creatively managed, these data can lead to the devel- co-citation analysis for the investigation of tagging behav- opment of a variety of interesting applications. The potential ior, it will be possible to build more effective recommender exists for social tagging to build bridges between disci- systems. plines and to enhance social communication as well as social Social tags are user-generated metadata. In popular social computing. To this end, future research should focus on networks with millions of users—networks such as Delicious, the development, implementation, and evaluation of these Flickr, and YouTube—tagging may lead to the emergence new applications as well as on how they may enhance com- of a social vocabulary that reflects the content features and munication and contribute to a sense of community among communication needs of the social network. Traditionally, taggers. the metadata created by professionals (e.g., the classification schemas and other controlled vocabularies used in libraries Acknowledgments and other organizations) is not scalable, making it impractical for application in large collections of resources such as those The authors thank the University of Innsbruck, where data found on the Web, and author-generated metadata that uses collection and analysis were conducted, and Ioan Toma and schemes such as the Dublin Core Metadata Element Set often Michael Fried, of the University of Innsbruck, who provided

2400 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 DOI: 10.1002/asi invaluable technical support. The authors also thank Blaise Lawley, L. (2005, January 20). Social consequences of social tagging. Cronin and three anonymous reviewers for insightful com- Message posted to http://many.corante.com/archives/2005/01/20/social_ ments on an early draft of the paper. We also thank Mike consequences_of_social_tagging.php Retrieved May 16, 2009. Losowsky, A. (2004, September 21). The doorbells of Florence Thelwall for valuable comments and support. project. Retrieved July 31, 2009, from http://losowsky.com/doorbells/ interviews.html Marlow, C., Naaman, M., Boyd, D., & Davis, M. (2006). HT06, tagging References paper, taxonomy, Flickr, academic article, to read. In P. J. Nürnberg (Chair), Proceedings of the 17th Conference on Hypertext and Hyper- Awad, M., Khan, L., Bastani, F., & Yen, I.L. (2004). An effective sup- media (pp. 31–40). New York: ACM Press. port vector machines (SVMs) performance using hierarchical clustering. Mathes, A. (2004). Folksonomies—Cooperative classification and commu- In Proceedings of the 16th IEEE International Conference on Tools nication through shared metadata. Retrieved May 16, 2009, from http:// with Artificial Intelligence (pp. 633–667). Los Alamitos, CA: IEEE www.adammathes.com/academic/computer-mediated-communication/ Press. folksonomies.html Berners-Lee, T., Hendler J., & Lassila, O. (2001). The semantic web. Merholz, P. (2004). Metadata for the masses. Retrieved May 16, 2009, from Scientific American, 284(5), 34–43. http://www.adaptivepath.com/publications/essays/archives/000361.php Blood, R. (2005, January 18). Rebecca’s pocket. Message posted to Mika, P. (2005). Ontologies are us: A unified model of social networks and http://www.rebeccablood.net/archive/2005/01.html#11technorati semantics. InY.Gil, E. Motta,V.R.Benjamins, & M.A. Musen (Eds.), Pro- Butterfield, S. (2004, August 4). I am sharing this with you. Message ceedings of Fourth International Semantic Web Conference 2005 (ISWC posted to http://www.sylloge.com/personal/2004/08/folksonomy-social- 2005) (pp. 522–536). Berlin, Germany: Springer. classification-great.html Miller, P. (2008). Sir Tim Berners-Lee talks with Talis about the semantic Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learn- web. Retrieved May 16, 2009, from http://talis-podcasts.s3.amazonaws. ing, 20(3), 273–297. com/twt20080207_TimBL.html Ding, Y., Jacob, E., Caverlee, J., Fried, M., & Zhang, Z. (2009). Profiling Pelleg, D., & Moore, A.W. (2000). X-means: Extending K-means with social networks: A social tagging perspective. D-Lib, 15(3/4). Retrieved efficient estimation of the number of clusters. In Proceedings of the May 16, 2009 from http://www.dlib.org/dlib/march09/ding/03ding.html 17th International Conference on Machine Learning (pp. 727–734). San Ding, Y., Jacob, E., Fried, M., Toma, I., Yan, E., & Foo, S. (Submitted). Francisco: Morgan Kaufmann. Upper Tag Ontology (UTO) for integrating social tagging data. Priss, U. (2007). Formal concept analysis in information science. Annual Dubinko, M., Kumar, R., Magnani, J., Novak, J., Raghavan, P., &Tomkins,A. Review of Information Science and Technology, 40, 521–543. (2006). Visualizing tags over time. In Proceedings of the 15th Inter- Schmitz, C., Hotho, A., Jaschke, R., & Stumme, G. (2006). Mining asso- national World Wide Web Conference (pp. 193–202). New York: ACM ciation rules in folksonomies. In V. Batagelj, H.H. Bock, A. Ferligoj, & Press. A. Ziberna (Eds.), Data science and classification: Proceedings of the Fountopoulos, G.I. (2007). RichTags: A social semantic tagging 10th IFCS Conference, Studies in Classification, Data Analysis, and system. Unpublished master’s thesis, University of Southampton, Knowledge Organization (pp. 261–270). Berlin, Germany: Springer. Southampton, UK. Schvaneveldt, R.W. (1990). Pathfinder associative networks: Studies in Gruber, T. (2007). Ontology of : A mash-up of apples and knowledge organization. Norwood, NJ: Ablex. oranges. International Journal on Semantic Web & Information Sys- Shirky, C. (2005). Ontology is overrated: Categories, links, and tags. tems, 3(2). Retrieved May 16, 2009, from http://tomgruber.org/writing/ Retrieved May 16, 2009, from http://shirky.com/writings/ontology_ ontology-of-folksonomy.htm overrated.html Hammond, T., Hannay, T., Lund, B., & Scott, J. (2005). Sinha, R. (2005, October 1). A social analysis of tagging. Message tools (I): A general review. D-Lib Magazine, 11(4). Retrieved May 16, posted to http://blog.jackvinson.com/archives/2005/10/01/a_cognitive_ 2009, from http://www.dlib.org/dlib/april05/hammond/04hammond.html analysis_of_tagging.html Retrieved May 16, 2009. Hirsch, J.E. (2007). Does the h-index have predictive power? Proceedings Suchanek, F.M., Vojnovic, M., & Gunawardena, D. (2008). Social tags: of the National Academy of the Sciences of the United States of America Meaning and suggestions. In Proceedings of the 17th ACM Conference 104, 19193–19198. on Information and Knowledge Management (pp. 223–232). New York: Hoschka, P. (1998). CSCW research at GMD-FIT: From basic groupware to ACM Press. the Social Web. ACM SIGGROUP Bulletin, 19(2), 5–9. Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter Hotho, A., Jäschke, R., Schmitz, C., & Stumme, G. (2006). Information than the few and how collective wisdom shapes business, economies, retrieval in folksonomies: Search and ranking. InY.Sure and J. Domingue societies and nations. New York: Random House Large Print. (Eds.), The semantic web: research and applications (Vol. 4011). White, H.D., Buzydlowski, J., & Lin, X. (2000). Co-cited author maps as Heidelberg, Germany: Springer. interfaces to digital libraries: Designing pathfinder networks in the human- Kim, H., Passant, A., Breslin, J., Scerri, S., & Decker, S. (2008). Review and ities. In Proceedings of the IEEE International Conference on Information alignment of tag ontologies for semantically-linked data in collaborative Visualisation (pp. 25–32). Washington, DC: IEEE Computer Society. tagging spaces. In Proceedings of the Second International Confer- Wille, R. (1982). Restructuring lattice theory: an approach based on hierar- ence on Semantic Computing (pp. 315–322). Los Alamitos, CA: IEEE chies of concepts. In I. Rival (Ed.), Ordered sets (pp. 445–470). Dordrecht, Press. Netherlands: Reidel. Kipp, M.E., & Campbell, D.G. (2006). Patterns and inconsistencies in Xu, Z., Fu, Y., Mao, J., & Su, D. (2006, May). Towards the semantic web: collaborative tagging systems: An examination of tagging practices. Collaborative tag suggestions. Paper presented at the Proceedings of the In Proceedings of the American Society for Information Science and Collaborative Web Tagging Workshop at the International World Wide Technology, 43, 1–18. Web Conference (WWW 2006), Edinburgh, Scotland. Kleinberg, J.M. (1999). Authoritative sources in a hyperlinked environment. Yu, H., Yang, J., & Han, J. (2003). Classifying large data sets using SVMs Journal of the ACM, 46(5), 604–632. with hierarchical clusters. Proceedings of the Ninth ACM SIGKDD Kohonen, T. (1989). Self-organization and associative memory. New York: International Conference on Knowledge Discovery and Data Mining Springer. (pp. 306–315). New York: ACM Press.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—December 2009 2401 DOI: 10.1002/asi