Why We : Understanding Usage and Communities

Akshay Java Xiaodan Song University of Maryland Baltimore County NEC Laboratories America 1000 Hilltop Circle 10080 N. Wolfe Road, SW3-350 Baltimore, MD 21250, USA Cupertino, CA 95014, USA [email protected] [email protected] Tim Finin Belle Tseng University of Maryland Baltimore County NEC Laboratories America 1000 Hilltop Circle 10080 N. Wolfe Road, SW3-350 Baltimore, MD 21250, USA Cupertino, CA 95014, USA fi[email protected] [email protected]

ABSTRACT provided by several services including Twitter2,Jaiku3 and 4 Microblogging is a new form of in which more recently . These tools provide a light-weight, users can describe their current status in short posts dis- easy form of communication that enables users to broadcast tributed by instant messages, mobile phones, or the and share information about their activities, opinions and Web. Twitter, a popular microblogging tool has seen a lot status. One of the popular microblogging platforms is Twit- of growth since it launched in October, 2006. In this paper, ter [29]. According to ComScore, within eight months of its we present our observations of the microblogging phenom- launch, Twitter had about 94,000 users as of April, 2007 [9]. ena by studying the topological and geographical properties Figure 1 shows a snapshot of the first author’s Twitter home- of Twitter’s . We find that people use mi- page. Updates or posts are made by succinctly describing croblogging to talk about their daily activities and to seek one’s current status within a limit of 140 characters. Top- or share information. Finally, we analyze the user intentions ics range from daily life to current events, news , and associated at a community level and show how users with other interests. IM tools including Gtalk, Yahoo and MSN similar intentions connect with each other. have features that allow users to share their current status with friends on their buddy lists. Microblogging tools facili- tate easily sharing status messages either publicly or within Categories and Subject Descriptors asocialnetwork. H.3.3 [Information Search and Retrieval]: Information Search and Retrieval - Information Filtering; J.4 [Computer Applications]: Social and Behavioral Sciences - Economics

General Terms Social Network Analysis, User Intent, Microblogging,

1. INTRODUCTION Microblogging is a relatively new phenomenon defined as “a form of blogging that lets you write brief text updates (usu- ally less than 200 characters) about your life on the go and send them to friends and interested observers via text mes- saging, (IM), email or the web.” 1.Itis

1http://en.wikipedia.org/wiki/Micro-blogging

Permission to make digital or hard copies of all or part of this work for Figure 1: An example Twitter homepage with up- personal or classroom use is granted without fee provided that copies are dates talking about daily experiences and personal not made or distributed for profit or commercial advantage and that copies interests. bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific 2http://www.twitter.com permission and/or a fee. 3 http://www.jaiku.com Joint 9th WEBKDD and 1st SNA-KDD Workshop ’07 , August 12, 2007 , 4 San Jose, California , USA . Copyright 2007 ACM 1-59593-444-8...$5.00. http://www.pownce.com Compared to regular blogging, microblogging fulfills a need to understand the user intentions and community structure for an even faster mode of communication. By encourag- in microblogging. From our analysis, we find that the main ing shorter posts, it lowers users’ requirement of time and types of user intentions are: daily chatter, conversations, thought investment for content generation. This is also one sharing information and reporting news. Furthermore, users of its main differentiating factors from blogging in general. play different roles of information source, friends or informa- The second important difference is the frequency of update. tion seeker in different communities. On average, a prolific bloger may update her once ev- ery few days; on the other hand a microblogger may post The paper is organized as follows: in Section 2, we describe several updates in a single day. the dataset and some of the properties of the underlying social network of Twitter users. Section 3 provides an anal- With the recent popularity of Twitter and similar microblog- ysis of Twitter’s social network and its spread across geogra- ging systems, it is important to understand why and how phies. Next, in Section 4 we describe aggregate user behav- people use these tools. Understanding this will help us ior and community level user intentions. Section 5 provides evolve the microblogging idea and improve both microblog- a taxonomy of user intentions. Finally, we summarize our ging client and infrastructure software. We tackle this prob- findings and conclude with Section 6. lem by studying the microblogging phenomena and analyz- ing different types of user intentions in such systems. 2. DATASET DESCRIPTION Much of research in user intention detection has focused on Twitter is currently one of the most popular microblogging understanding the intent of a search queries. According to platforms. Users interact with this system by either using a Broder [5], the three main categories of search queries are Web interface, IM agent or sending SMS updates. Members navigational, informational and transactional. Understand- may choose to make their updates public or available only to ing the intention for a search query is very different from friends. If user’s profile is made public, her updates appear user intention for content creation. In a survey of bloggers, in a “public timeline” of recent updates. The dataset used Nardi et al. [26] describe different motivations for “why in this study was created by monitoring this public timeline we blog”. Their findings indicate that are used as a for a period of two months starting from April 01, 2007 to tool to share daily experiences, opinions and commentary. May 30, 2007. A set of recent updates were fetched once Based on their interviews, they also describe how bloggers every 30 seconds. There are a total of 1,348,543 posts from form communities online that may support different social 76,177 distinct users in this collection. groups in real world. Lento et al. [21] examined the im- portance of social relationship in determining if users would Twitter allows a user, A, to “follow” updates from other remain active in a blogging tool called Wallop. A user’s re- members who are added as “friends”. An individual who is tention and interest in blogging could be predicted by the not a friend of user A but “follows” her updates is known as a “follower”. Thus friendships can either be reciprocated or comments received and continued relationship with other 5 active members of the community. Users who are invited by one-way. By using the Twitter developer API , we fetched people with whom they share pre-exiting social relationships the social network of all users. We construct a directed tend to stay longer and active in the network. Moreover, cer- graph G(V,E), where V represents a set of users and E tain communities were found to have a greater retention rate represents the set of “friend” relations. A directed edge e due to existence of such relationships. Mutual awareness in exists between two users u and v if user u declares v as a social network has been found effective in discovering com- a friend. There are a total of 87,897 distinct nodes with munities [23]. 829,053 friend relation between them. There are more nodes in this graph due to the fact that some users discovered In computational linguists, researchers have studied the prob- though the link structure do not have any posts during the lem of recognizing the communicative intentions that un- duration in which the data was collected. For each user, we derlie utterances in dialog systems and spoken language in- also obtained their profile information and mapped their terfaces. The foundations of this work go back to Austin location to a geographic coordinate, details of which are [2], Stawson [32] and Grice [14]. Grosz [15] and Allen [1] provided in the following section. carried out classic studies in analyzing the dialogues be- tween people and between people and computers in coopera- 3. MICROBLOGGING IN TWITTER tive task oriented environments. More recently, Matsubara This section describes some of the characteristic properties [24] has applied intention recognition to improve the per- of Twitter’s Social Network including it’s network topology formance of automobile-based spoken dialog system. While and geographical distribution. their work focusses on the analysis of ongoing dialogs be- tween two agents in a fairly well defined domain, studying user intention in Web-based systems requires looking at both 3.1 Growth of Twitter the content and link structure. Since Twitter provides a sequential user and post identifier, we can estimate the growth rate of Twitter. Figure 2 shows In this paper, we describe how users have adopted a spe- the growth rate for users and Figure 3 shows the growth rate cific microblogging platform, Twitter. Microblogging is rel- for posts in this collection. Since, we do not have access to atively nascent, and to the best of our knowledge, no large historical data, we can only observe its growth for a two scale studies have been done on this form of communication month time period. For each day we identify the maximum and information sharing. We study the topological and geo- value for the user identifier and post identifier as provided graphical structure of Twitter’s social network and attempt 5 http://twitter.com/help/api Twitter Growth Rate (Users) Twitter Growth Rate (Posts) 6500000 70000000 Growth of Users Growth of Posts 65000000 6000000 60000000

5500000 55000000

50000000 5000000 45000000

40000000 Max PostID

Max UserID 4500000 35000000

4000000 30000000

25000000 3500000 20000000

3000000 15000000 1-Apr 7-Apr 14-Apr 21-Apr 29-Apr 5-May 11-May 1-Apr 7-Apr 14-Apr 21-Apr 29-Apr 5-May 11-May April - May 2007 April - May 2007

Figure 2: Twitter User Growth Rate. Figure shows Figure 3: Twitter Posts Growth Rate. Figure shows the maximum userid observed for each day in the the maximum post ID observed for each day in the dataset. After an initial period of interest around dataset. Although the rate at which new users are March 2007, the rate at which new users are joining joining the network has slowed, the number of posts Twitter has slowed. are increasing at a steady rate. by the Twitter API. By observing the change in these val- has been shown that many properties including the degree ues, we can roughly estimate the growth of Twitter. It is distributions on the Web follow a power law distribution interesting to note that even though Twitter launched in [19, 6]. Recent studies have confirmed that some of these 2006, it really became popular soon after it won the South properties also hold true for the [31]. by SouthWest (SXSW) conference Web Awards6 in March, 2007. Figure 2 shows the initial growth in users as a result Property Twitter WWE of interest and publicity that Twitter generated at this con- Total Nodes 87897 143,736 ference. After this period, the rate at which new users are Total Links 829247 707,761 joining the network has slowed. Despite the slow down, the Average Degree 18.86 4.924 number of new posts is constantly growing, approximately Indegree Slope -2.4 -2.38 doubling every month indicating a steady base of users gen- Outdegree Slope -2.4 NA erating content. Degree correlation 0.59 NA Diameter 6 12 Following Kolari et al. [18], we use the following definition Largest WCC size 81769 107,916 of user activity and retention: Largest SCC size 42900 13,393 Clustering Coefficient 0.106 0.0632 Definition A user is considered active during a week if he Reciprocity 0.58 0.0329 or she has posted at least one post during that week. Table 1: Twitter Social Network Statistics Definition An active user is considered retained for the given week, if he or she reposts at least once in the following X weeks. Table 1 describes some of the properties for Twitter’s social network. We also compare these properties with the corre- Due to the short time period for which the data is available sponding values for the Weblogging Ecosystems Workshop and the nature of Microblogging we decided to use X as a (WWE) collection [4] as reported by Shi et al. [31]. Their period of one week. Figure 4 shows the user activity and study shows a network with high degree correlation (also retention for the duration of the data. About half of the shown in Figure 6) and high reciprocity. This implies that users are active and of these half of them repost in the fol- there are a large number of mutual acquaintances in the lowing week. There is a lower activity recorded during the graph. New Twitter users often initially join the network last week of the data due to the fact that updates from the on invitation from friends. Further, new friends are added public timeline are not available for two days during this to the network by browsing through user profiles and adding period. other known acquaintances. High reciprocal links has also been observed in other online social networks like Livejour- 3.2 Network Properties nal [22]. Personal communication and contact network such as cell phone call graphs [25] also have high degree corre- The Web, blogosphere, online social networks and human lation. Figure 5 shows the cumulative degree distributions contact networks all belong to a class of “scale-free net- [27, 8] of Twitter’s network. It is interesting to note that works” [3] and exhibit a “small world phenomenon” [33]. It the slopes γin and γout are both approximately -2.4. This 6http://2007.sxsw.com/ value for the power law exponent is similar to that found for Indegree Distribution of Twitter Social Network 1 10 Indegree γ = −2.412 User Activity and Retention in 0 10 60000

50000 −1 10

40000

K) −2 30000 ≥ 10 P(x

20000

Number of Users −3 10

10000

−4 0 10 12345678 Week

Retained Users Active Users −5 10 0 1 2 3 4 10 10 10 10 10 Indegree K (a) Indegree Distribution

Outdegree Distribution of Twitter Social Network 2 Figure 4: Twitter User Activity and Retention 10 Outdegree γ = −2.412 1 out Continent Number of Users 10 North America 21064 0 Europe 7442 10 Asia 6753 −1 Oceania 910 10 K) South America 816 ≥

P(x −2 Africa 120 10 Others 78 −3 Unknown 38994 10

−4 Table 2: Table shows the geographical distribution 10 of Twitter users. North America, Europe and Asia −5 10 have the highest adoption of Twitter. 0 1 2 3 4 10 10 10 10 10 Outdegree K (b) Outdegree Distribution the Web (typically -2.1 for indegree [11]) and blogosphere (-2.38 for the WWE collection). Figure 5: Twitter social network has a power law 3.3 Geographical Distribution exponent of about -2.4 which is similar to the Web Twitter provides limited profile information such as name, and blogosphere. bio, timezone and location. For the 76K users in our collec- tion about 39K had specified locations that could be parsed correctly and resolved to their respective latitude and longi- 7 continent links than across continents. This is consistent tudinal coordinates (using Yahoo! Geocoding API ). Fig- with observations that the probability of friendship between ure 7 and Table 2 shows the geographical distribution of two users is inversely proportionate to their geographic prox- Twitter users and the number of users in each continent. imity [22]. Twitter is most popular in US, Europe and Asia (mainly Japan). Tokyo, New York and San Francisco are the major In Table 4, we compare some of the network properties cities where user adoption of Twitter is high [16]. across these three continents with most users: North Amer- ica, Europe and Asia. For each continent the social network Twitter’s popularity is global and the social network of its is extracted by considering only the subgraph where both the users crosses continental boundaries. By mapping each user’s source and destination of the friendship relation belong to latitude and longitude to a continent location we can extract the same continent. Asian and European communities have the origin and destination location for every edge. Table 3 a higher degree correlation and reciprocity than their North shows the distribution of friendship relations across major American counterparts. Language plays an important role continents represented in the dataset. Oceana is used to is such social networks. Many users from Japan and Span- represent Australia, New Zealand and other island nations. ish speaking world connect with others who speak the same A significant portion (about 45%) of the Social Network still language. In general, users in Europe and Asia tend to have lies within North America. Moreover, there are more intra- higher reciprocity and clustering coefficient values in their 7http://developer.yahoo.com/maps/ corresponding subgraphs. Twitter Social Network Scatter Plot 10000 from-to Asia Europe Oceana N.A S.A Africa Correlation Coefficient = 0.59 Asia 13.45 0.64 0.10 5.97 0.005 0.01 Europe 0.53 9.48 0.25 6.16 0.17 0.02 1000 Oceana 0.13 0.40 0.60 1.92 0.02 0.01 N.A 5.19 5.46 1.23 45.60 0.60 0.10 S.A 0.06 0.26 0.02 0.75 0.62 0.00 100 Africa 0.01 0.03 0.00 0.11 0.00 0.03 Outdegree (follows) Table 3: Table shows the distribution of Twitter 10 social network links across continents. Most of the social network lies within North America. (N.A = North America, S.A = South America) 1 1 10 100 1000 10000 Indegree (followed by)

Property N.A Europe Asia Figure 6: Scatter plot showing the degree correla- Total Nodes 16,998 5201 4886 tion of Twitter social network. A high degree corre- Total Edges 205,197 42,664 60519 lation signifies that users who are followed by many Average Degree 24.15 16.42 24.77 people also have large number of friends. Degree Correlation 0.62 0.78 0.92 Clustering Coefficient 0.147 0.54 0.18 Distribution of Twitter Users Across the World ° 90 N Percent Reciprocity 62.64 71.62 81.40

° Table 4: Network properties of social networks 45 N within a continent. Europe and Asia have a higher reciprocity indicating closer ties in these social net-

° 0 works. (N.A = North America) ° ° ° ° ° ° ° ° ° 180 W 135 W 90 W 45 W 0 45 E 90 E 135 E 180 E

° 45 S high hub scores have relatively low authority scores, such as

° dan7, startupmeme, and aidg. They follow many other users 90 S while have less friends instead. Based on this rough cate- gorization, we can see that user intention can be roughly categorized into these 3 types: information sharing, infor- Figure 7: Figure shows the global distribution of mation seeking, and friendship-wise relationship. Twitter users. Though initially launched in US Twitter is popular across the world. After the hub/authority detection, we identify communi- ties within friendship-wise relationships by only considering the bidirectional links where two users regard each other as 4. USER INTENTION friends. A community in a network can be vaguely defined In this paper, we propose a two-level framework for user in- as a group of nodes more densely connected to each other tention detection. First, we used the HITS algorithm [17] to than to nodes outside the group. Often communities are find the hubs and authorities in the network. Hubs and au- topical or based on shared interests. To construct web com- thorities have a mutually reinforcing property and are com- munities, Flake et. al. [12] proposed a method using HITS puted as follows: H(p) represents the hub value of the page pandA(p) represents the authority value of a page p.  Authority(p)= Hub(v) User Authority User Hub v∈S,v→p Scobleizer 0.002354 Webtickle 0.003655 Twitterrific 0.001765 Scobleizer 0.002338 And ev 0.001652 dan7 0.002079  JasonCalacanis 0.001557 startupmeme 0.001906 Hub(p)= Authority(u) u∈S,p→u springnet 0.001525 aidg 0.001734 bloggersblog 0.001506 lisaw 0.001701 chrispirillo 0.001503 bhartzer 0.001599 Table 5 shows a listing of top ten hubs and authorities. From darthvader 0.001367 bloggersblog 0.001559 this list, we can see that some users have high authority ambermacarthur 0.001348 JasonCalacanis 0.001534 score, and also high hub score. For example, Scobleizer, Ja- sonCalacanis, bloggersblog, and Webtickle who have many followers and friends in Twitter are located in this category. Table 5: Top 10 Hubs and Authorities in Twitter. Some users with very high authority scores have relatively Some of the top authorities are also popular blog- low hub score, such as Twitterrific, ev, and springnet. They gers. Top hubs include users like startupmeme and have many followers while less friends in Twitter, and thus aidg which are microblogging versions of a blogs and are located in this category. Some other users with very other web sites. and maximize flow/minimize cut to detect communities. In ties. In GSPN’s bio, he mentioned he is the Producer of the social network area, Newman and Girvan [13, 7] proposed a Generally Speaking Network8; while in pcamarata’s metric called modularity to measure the strength of the com- bio, he mentioned he is a family man, a neurosurgeon, and munity structure. The intuition is that a good division of a a a podcaster. By looking at the top key terms of these two network into communities is not merely to make the num- communities, we can see that the focus of the green com- ber of edges running between communities small; rather, the munity is a little more diversified: people occasionally talk number of edges between groups is smaller than expected. about podcasting, while the topic of the red community is Only if the number of between group edges is significantly a little more focused. In a sense, the red community is like lower than what would be expected purely by chance can we a professional community of podcasting while the green one justifiably claim to have found significant community struc- is a informal community about podcasting. ture. Based on the modularity measure of the network, op- timization algorithms are proposed to find good divisions of Figure 10 illustrates five communities connected by Scobleizer, a network into communities by optimizing the modularity who is a Tech geek blogger. People follow his posts to over possible divisions. Also, this optimization process can get technology news. People in different communities share be related to the eigenvectors of matrices. However, in the different interests with Scobleizer. Specifically, AndruEd- above algorithms, each node has to belong to one commu- wards, Scobleizer, daryn, and davidgeller get together to nity, while in real networks, communities often overlap. One share video related news. CaptSolo et al. have some inter- person can serve a totally different functionality in different ests on Semantic Web. AdoMatic et al. are engineers and communities. In an extreme case, one user can serve as the have interests with coding related issues. information source in one community and the information seeker in another community. Studying intentions at a community level, we observe users participate in communities which share similar interests. In- People in friendship communities often know each other. dividuals may have different intentions for joining these com- Prompted by this intuition, we applied the Clique Perco- munities. While some act as information providers, oth- lation Method (CPM) [28, 10] to find overlapping commu- ers are merely looking for new and interesting information. nities in networks. The CPM is based on the observation Next, we analyze aggregate trends across users spread over that a typical member in a community is linked to many many communities, we can identify certain distinct themes. other members, but not necessarily to all other nodes in the Often there are recurring patterns in word usages. Such same community. In CPM, the k-clique-communities are patterns may be observed over a day or a week. For exam- identified by looking for the unions of all k-cliques that can ple Figure 11 shows the trends for the terms “friends” and be reached from each other through a series of adjacent k- “school” in the entire corpus. While school is of interest cliques, where two k-cliques are said to be adjacent if they during weekdays, friends take over on the weekends. The share k-1 nodes. This algorithm is suitable for detecting the dense communities in the network.

Here we list a few specific examples of how communities form in Twitter and why users consist of these communities - what user intentions are in each community. Figure 8 illustrates a representative community with 58 users closely communicat- ing with each other through Twitter service. The key terms they talk about include work, Xbox, game, and play. It looks like some users with gaming interests getting together to discuss the information about certain new products on this topic or sharing gaming experience. When we go to specific users , we also find this type of conversation: “BDazzler@Steve519 I don’t know about the Jap PS3’s. I think they have region encoding, so you’d only be able to play Jap games. Euro has no ps2 chip” or “BobbyBlackwolf Play- ing with the PS3 firmware update, can’t get WMP11 to share MP4’s and the PS3 won’t play WMV’s or AVI’s...Fail.” We also noticed that users in this community also share with Figure 11: Daily Trends for terms “school” and each other their personal feeling and daily life experiences “friends”. The term school is more frequent during in addition to comments on “gaming”. Based on our study the early week, friends take over during the week- of the communities in Twitter dataset, we observed that this end. is a representative community in Twitter network: people in one community have certain common interests and they also share with each other about their personal feeling and log-likelihood ratio is used to determine terms that are of daily experience. significant importance for a given day of the week. Using a technique described by Rayson and Garside [30], we create Using CPM, we are able to find how communities connected a contingency table of term frequencies for a given day and to each other by overlapped components. Figure 9 illustrates the rest of the week. two communities with podcasting interests where GSPN and pcamarata are the ones who connected these two communi- 8 http://ravenscraft.org/gspn/home Key Terms just:273 com:225 work:185 like:172 good:168 going:157 got:152 time:142 live:136 new:133 xbox:125 tinyurl:122 today:121 game:115 playing:115 twitter:109 day:108 lol:10 play:100 halo:100 night:90 home:89 getting:88 need:86 think:85 gamerandy:85 ll:85 360:84 watching:79 want:78 know:77

Figure 8: An example of a “gaming” community who also share daily experiences.

Day Rest of the Week Total 5. DISCUSSION Freq of word a b a+b Following section presents a brief taxonomy of user inten- Freq of other words c-a d-b c+d-a-b tions on Twitter. The apparent intention of a Twitter post Total c d c+d was determined manually by the first author. Each post was read and categorized. Posts that were highly ambigu- Comparing the terms that occur on a given day with the ous or for which the author could not make a judgement were histogram of terms for the rest of the week, we find the most placed in the category UNKNOWN. Based on this analysis descriptive terms. The log-likelihood score is calculated as we have found following are some of the main user intentions follows: on Twitter: ∗ ∗ a ∗ b LL =2 (a log( E1 )+b log( E2 )) • Daily Chatter Most posts on Twitter talk about daily ∗ a+b ∗ a+b routine or what people are currently doing. This is the where E1=c c+d and E2=d c+d largest and most common user of Twitter Figure 12 shows the most descriptive terms for each day • Conversations In Twitter, since there is no direct way of the week. Some of the extracted terms correspond to for people to comment or reply to their friend’s posts, recurring events and activities significant for a particular early adopters started using the @ symbol followed by day of the week for example “school” or “party”. Other a username for replies. About one eighth of all posts terms are related to current events like “easter” and “EMI”. in the collection contain a conversation and this form of communication was used by almost 21% of users in the collection. Key Terms going:222 just:218 work:170 night:143 bed:140 time:139 good:137 com:130 lost:124 day:122 home:112 listening:111 today:100 new:98 got:97 gspn:92 watching:92 kids:88 morning:81 twitter:79 getting:77 tinyurl:75 lunch:74 like:72 podcast:72 watch:71 ready:70 tv:69 need:64 live:61 tonight:61 trying:58 love:58 cliff:58 dinner:56

Key Terms just:312 com:180 work:180 time:149 listening:147 home:145 going:139 day:134 got:126 today:124 good:116 bed:114 night:112 tinyurl:97 getting:88 podcast:87 dinner:85 watching:83 like:78 mass:78 lunch:72 new:72 ll:70 tomorrow:69 ready:64 twitter:62 working:61 tonight:61 morning:58 need:58 great:58 finished:55 tv:54

Figure 9: Example of how two communities connect to each other

• Sharing information/URLs About 13% of all the posts Despite infrequent updates, certain users have a large in the collection contain some URL in them. Due to number of followers due to the valuable nature of their the small character limit a URL shortening service like updates. Some of the information sources were also TinyURL9 is frequently used to make this feature fea- found to be automated tools posting news and other sible. useful information on Twitter.

• Reporting news Many users report latest news or com- • Friends Most relationships fall into this broad cate- ment about current events on Twitter. Some auto- gory. There are many sub-categories of friendships mated users or agents post updates like weather re- on Twitter. For example a user may have friends, ports and new stories from RSS feeds. This is an in- family and co-workers on their friend or follower lists. teresting application of Twitter that has evolved due Sometimes unfamiliar users may also add someone as to easy access to the developer API. a friend.

• Information Seeker An information seeker is a person Using the link structure, following are the main categories who might post rarely, but follows other users regu- of users on Twitter: larly.

• Information Source An information source is also a Our study has revealed different motivations and utilities hub and has a large number of followers. This user of microblogging platforms. A single user may have multi- may post updates on regular intervals or infrequently. ple intentions or may even serve different roles in different 9http://www.tinyurl.com communities. For example, there may be posts meant to com:175 twitter:134 just:133 like:86 good:82 tinyurl:75 time:74 new:74 jasona:73 going:68 day:63 don:61 work:58 think:56 ll:54 scottw:54 today:52 hkarthik:50 nice:49 getting:47 got:47 really:46 yeah:44 need:43 watching:41 love:41 night:40 home:40

com:198 twitter:132 just:109 tinyurl:87 going:59 blog:56 like:55 good:51 new:50 url:50 day:49 people:46 com:93 twitter:74 time:45 today:45 :42 just:35 new:32 don:41 think:40 night:38 tinyurl:29 going:24 ll:38 need:35 got:33 ll:22 blog:21 ireland:33 great:31 looking:29 :21 don:21 work:29 thanks:28 video:26 leo:21 flickr:21 like:19 video:18 google:18 today:18 feeds:18 getting:16 yeah:16 good:15 people:15

com:93 twitter:76 tinyurl:34 just:32 new:28 video:26 going:24 ll:22 jaiku:22 blog:21 leo:21 like:19 com:121 twitter:76 just:50 don:19 gamerandy:19 yeah:18 ustream:43 tv:42 live:42 google:17 live:16 people:16 today:39 hawaii:36 day:33 got:16 know:15 time:15 new:33 time:33 good:33 video:32 leo:30 work:30 like:28 watching:28 tinyurl:28

Figure 10: Example Communities in Twitter Social Network. Key terms indicate that these communities are talking mostly about technology. The user Scobliezer connects multiple communities in the network. update your personal network on a holiday plan or a post in improving them and adding new features that would re- to share an interesting link with co-workers. Multiple user tain more users. In this work, we have identified different intentions have led to some users feeling overwhelmed by types of user intentions and studied the community struc- microblogging services [20]. Based on our analysis of user tures. Currently, we are working on automated approaches intentions, we believe that the ability to categorize friends of detecting user intentions with related community struc- into groups (e.g. family, co-workers) would greatly benefit tures. the adoption of microblogging platforms. In addition fea- tures that could help facilitate conversations and sharing 7. ACKNOWLEDGEMENTS news would be beneficial. We would like to thank Twitter Inc. for providing an API to their service and Pranam Kolari, Xiaolin Shi and Amit 6. CONCLUSION Karandikar for their suggestions. In this study we have analyzed a large social network in a new form of social media known as microblogging. Such net- 8. REFERENCES works were found to have a high degree correlation and reci- [1] J. Allen. Recognizing intentions from natural language procity, indicating close mutual acquaintances among users. utterances. Computational Models of Discourse, pages While determining an individual user’s intention in using 107–166, 1983. such applications is challenging, by analyzing the aggregate [2] J. Austin. How to Do Things with Words.Oxford behavior across communities of users, we can describe the University Press Oxford, 1976. community intention. Understanding these intentions and [3] A.-L. Barabasi and R. Albert. Emergence of scaling in learning how and why people use such tools can be helpful random networks. Science, 286:509, 1999. Weblogs and Social Media (ICWSM 2007),March 2007. [19] R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for emerging cyber-communities. Computer Networks (Amsterdam, Netherlands: 1999), 31(11–16):1481–1493, 1999. [20] A. Lavallee. Friends swap , and frustration - new real-time messaging services overwhelm some users with mundane updates from friends, March 16, 2007. [21] T. Lento, H. T. Welser, L. Gu, and M. Smith. The ties that blog: Examining the relationship between social ties and continued participation in the wallop Figure 12: Distinctive terms for each day of the week weblogging system, 2006. ranked using Log-likelihood ratio. [22] D. Liben-Nowell, J. Novak, R. Kumar, P. Raghavan, and A. Tomkins. Geographic routing in social networks. Proceedings of the National Academy of [4] Blogpulse. The 3rd annual workshop on weblogging Sciences,, 102(33):11623–1162, 2005. ecosystem: Aggregation, analysis and dynamics, 15th [23] Y.-R. Lin, H. Sundaram, Y. Chi, J. Tatemura, and conference, May 2006. B. Tseng. Discovery of Blog Communities based on [5] A. Broder. A taxonomy of web search. SIGIR Forum, Mutual Awareness. In Proceedings of the 3rd Annual 36(2):3–10, 2002. Workshop on Weblogging Ecosystem: Aggregation, [6] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, Analysis and Dynamics, 15th World Wid Web S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Conference, May 2006. Graph structure in the web. In Proceedings of the 9th [24] S. Matsubara, S. Kimura, N. Kawaguchi, international World Wide Web conference on Y. Yamaguchi, and Y. Inagaki. Example-based Speech Computer networks : the international journal of Intention Understanding and Its Application to In-Car computer and telecommunications netowrking, pages Spoken Dialogue System. Proceedings of the 19th 309–320, Amsterdam, The Netherlands, The international conference on Computational Netherlands, 2000. North-Holland Publishing Co. linguistics-Volume 1, pages 1–7, 2002. [7] A. Clauset, M. E. J. Newman, and C. Moore. Finding [25] A. A. Nanavati, S. Gurumurthy, G. Das, community structure in very large networks. Physical D. Chakraborty, K. Dasgupta, S. Mukherjea, and Review E, 70:066111, 2004. A. Joshi. On the structural properties of massive [8] A. Clauset, C. R. Shalizi, and M. E. J. Newman. telecom call graphs: findings and implications. In Power-law distributions in empirical data, Jun 2007. CIKM ’06: Proceedings of the 15th ACM international [9] Comscore. http://www.usatoday.com/tech/ conference on Information and knowledge webguide/2007-05-28-social-sites_N.htm. management, pages 435–444, New York, NY, USA, [10] I. Derenyi, G. Palla, and T. Vicsek. Clique percolation 2006. ACM Press. in random networks. Physical Review Letters, [26] B. A. Nardi, D. J. Schiano, M. Gumbrecht, and 94:160202, 2005. L. Swartz. Why we blog. Commun. ACM, [11] D. Donato, L. Laura, S. Leonardi, and S. Millozzi. 47(12):41–46, 2004. Large scale properties of the webgraph. European [27] M. E. J. Newman. Power laws, pareto distributions Physical Journal B, 38:239–243, March 2004. and zipf’s law. Contemporary Physics, 46:323, 2005. [12] G. W. Flake, S. Lawrence, C. L. Giles, and F. Coetzee. [28] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Self-organization of the web and identification of Uncovering the overlapping community structure of communities. IEEE Computer, 35(3):66–71, 2002. complex networks in nature and society. Nature, [13] M. Girvan and M. E. J. Newman. Community 435:814, 2005. structure in social and biological networks, Dec 2001. [29] J. Pontin. From many tweets, one loud voice on the [14] H. Grice. Utterers meaning and intentions. . The New York Times, April 22, 2007. Philosophical Review, 78(2):147–177, 1969. [30] P. Rayson and R. Garside. Comparing corpora using [15] B. J. Grosz. Focusing and Description in Natural frequency profiling, 2000. Language Dialogues. Cambridge University Press, New [31] X. Shi, B. Tseng, and L. A. Adamic. Looking at the York, New York, 1981. blogosphere topology through different lenses. In [16] A. Java. http://ebiquity.umbc.edu/blogger/2007/ Proceedings of the International Conference on 04/15/global-distribution-of-twitter-users/. Weblogs and Social Media (ICWSM 2007),March [17] J. M. Kleinberg. Authoritative sources in a 2007. hyperlinked environment. Journal of the ACM, [32] P. Strawson. Intention and Convention in Speech 46(5):604–632, 1999. Acts. The Philosophical Review, 73(4):439–460, 1964. [18] P. Kolari, T. Finin, Y. Yesha, Y. Yesha, K. Lyons, [33] D. J. Watts and S. H. Strogatz. Collective dynamics of S. Perelgut, and J. Hawkins. On the Structure, ‘small-world’ networks. Nature, 393(6684):440–442, Properties and Utility of Internal Corporate Blogs. In June 1998. Proceedings of the International Conference on