
THE LONG-TAILS IN CONTENT SERVICES: HOW THE STRUCTURE OF HYBRID NETWORKS SHAPE CONTENT POPULARITY AND RELATED DECISION-MAKING

by

NIKHIL SRIKRISHNA SRINIVASAN

Submitted in partial fulfillment of the requirements

For the degree of Doctor of Philosophy

Dissertation Adviser: Dr. Kalle Lyytinen

Information Systems Department

CASE WESTERN RESERVE UNIVERSITY

January, 2013

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

Nikhil Srikrishna Srinivasan

candidate for the Doctor of Philosophy degree *.

(signed) Dr. Kalle Lyytinen

(chair of the committee)

Dr. Kalle Lyytinen

Dr. Fred Collopy

Dr. Youngjin Yoo

Dr. Samer Faraj

Dr. Jagdip Singh

(date) 12/15/2012

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Tables

List of Figures

Acknowledgements

Abstract

1. Pushing the Envelope of Popularity: The Role of Multi-Modal Networks in IT Mediated Environments

1.1 Introduction

1.1.1 Research Questions

1.1.2 Contribution of Thesis

1.2 Organizing Framework and Literature

1.2.1 Guiding Literature

1.2.2 Network Theory and Methods

1.2.3 Networks in Marketing

1.2.4 Cognition in Groups and Networks

1.2.5 Group Decision-making

1.3 Theoretical Development

1.3.1 Emergent Popularity

1.3.2 Popularity in IT based Environments

1.3.3 Networks in Popularity

1.4 Research Methodology

1.4.1 Study Site and Phenomenon Description

1.4.2 Methodology and Research Design

1.4.3 Validity and Reliability Assurance

1.4.4 Summary and Discussion

1.5 Conclusion

2. Case Study Analysis (Racing to the head: A dynamic analysis of long-tail networks in services)

2.1 Introduction

2.2 Literature Review

2.2.1 Network and Diffusion of Information

2.2.2 Networks and Viral Marketing of Online Information

2.3 Long-tail in Social Bookmarking Services

2.4 Research Methodology

2.4.1 Analytic Method

2.4.2 State Diagrams

2.5 Case Descriptions

2.5.1 Case 1: Big Brother Protect

2.5.2 Case 2: Firefox Memory Leak

2.5.3 Case 3: Ludios

2.5.4 Case 4: Camcorder Information

2.6 Findings

2.6.1 Structural Analysis

2.6.2 Dynamic Analysis

2.7 Discussion and Summary

3. Quantitative Analysis (Explaining Network Growth and Popularity in Hybrid Content Networks)

3.1 Introduction

3.2 Popularity, Cognition Sharing and Hybrid Networks

3.2.1 Tags as Cognition Artifacts

3.2.2 Forming Socio-Technical Networks through Tags

3.3 Research Design and Method

3.3.1 Research Goals

3.3.2 Research Site

3.3.3 Construct and Variable Measurement

3.3.4 Data Cleaning and Collection

3.3.5 Sampling Strategy and Process

3.3.6 Descriptive Statistics and Data Distribution

3.3.7 Hypothesis Testing

3.4 Findings

3.5 Discussion and Conclusion

3.5.1 Managerial Implications

3.5.2 Limitations

3.5.3 Future Work

4. Discussion and Conclusion

4.1 Introduction

4.2 Summary of Findings

4.2.1 Lessons from Case Studies

4.2.2 Lessons from Field Study

4.2.3 Concluding Remarks

4.3 Limitations

4.4 Future Research

4.5 Managerial Implications

5. Appendix

6. Bibliography

List of Tables

Table 1 – Summary of Guiding Literature

Table 2 – Case Categories and Descriptions

Table 3 – Betweenness measures for BBP

Table 4 – Network Connectedness measures for BBP

Table 5 – Betweenness measures for FML

Table 6 – Network Connectedness measures for FML

Table 7 – Network Connectedness measures for Ludios

Table 8 – Betweenness measures for CamcorderInfo

Table 9 – Network Connectedness measures for CamcorderInfo

Table 10 – Node Centrality Interpretation across cases

Table 11 – Node Class Interpretation across cases

Table 12 – Network Connectedness Interpretation across cases

Table 13 – Control Variables for the research questions

Table 14 – Construct and measures for dependent variables

Table 15 – Constructs and measures for the independent variables

Table 16 – Comparison of Study and Reference samples

Table 17 – Descriptive statistics for the independent variables in RQ1

Table 18 – Descriptive statistics for the Independent variables for RQ2

Table 19 – Shapiro-Wilks Normality test

Table 20 – Results of Logit Analysis

Table 21 – Analysis of popularity Cox Regression

Table 22 – Survival Table

List of Figures

Figure 1 – Long-tail distribution of bookmarks

Figure 2 – Peaking Characteristic of bookmark

Figure 3 – Social Bookmarking System Operations

Figure 4 – Long-tail distribution of bookmarks

Figure 5 – Decision tree for case selection

Figure 6 – Steady State of Social Bookmarking System

Figure 7 – State of various Objects in the Social Bookmarking System

Figure 8 – Big Brother Protect

Figure 9 – Firefox Memory Leak

Figure 10 – Ludios

Figure 11 – Camcorder Info

Figure 12 – State diagrams for “Rapid” Popular Content

Figure 13 – State diagrams for “Slow” Popular Content

Figure 14 – State diagrams for “Unpopular” Content

Figure 15 – Social Bookmarking System Operation

Figure 16 – Sampling based on location within the long-tail of content popularity

Figure 17 – Generation of network data and measures

Figure 18 – Survival function at the mean of covariates

Acknowledgements

This dissertation is the result of the effort of several individuals who have contributed in a variety of ways. I would like to first and foremost thank my dissertation committee, whose input has contributed greatly to the thesis and without whose support and encouragement this thesis may not have seen the light of day. I would like to express my gratitude to the committee chair, Dr. Kalle Lyytinen, for his guidance through this process and his unwavering commitment in assisting me. I would also like to express my gratitude to my colleagues and friends who have been my travelling partners in this process. Finally, I would like to express my deepest thanks and gratitude to my family, whose support and encouragement have been invaluable in making this dissertation possible.


The Long-Tails in Content Services: How the Structure of Hybrid Networks Shape Content Popularity and Related Decision-Making

Abstract

by

NIKHIL SRIKRISHNA SRINIVASAN

This thesis examines the role that socio-technical networks play in driving content popularity. Socio-technical networks are conceptualized as the infrastructure through which distributed individuals make and share cognition. This making and sharing of cognition through socio-technical networks results in an emergent distributed decision-making process. This process is embedded within the context of the socio-technical networks, and consequently the structure of the networks influences it. The emergent distributed decision-making process in turn influences the popularity of content. This thesis examines the question of which factors of socio-technical networks influence the popularity of content.


I explore this question through a combination of a case study and a quantitative field study. The multi-method approach allows me to explore both the static and the dynamic factors that influence the popularity of content embedded within socio-technical networks. I find that technical artifacts play an important role in the sharing of cognition within content networks, and that socio-technical network structures influence both the popularity of content and the time that content takes to emerge as popular.

This thesis has implications for both research and practice. It moves beyond an examination of the consequences and implications of long-tail behaviors to the structural characteristics that underlie such distributions. It also serves the knowledge management community by emphasizing the role of representational or classification systems in managing and disseminating knowledge. It adds to the IT literature by elaborating on the description of long-tail characteristics of IT mediated networks. The implications for managers are clear. Managers should focus on bridging large networks and making sure that participants in these networks have relationships with each other. While we might appreciate the phrase “let a thousand flowers bloom”, managers must not miss the forest for the trees. This work suggests that managers, in addition to encouraging the growth of new ideas, should also ensure that these ideas are disseminated as widely as possible through cohesive and connected networks.


1. Pushing the Envelope of Popularity: The Role of Multi-Modal Networks in IT Mediated Environments

1.1 Introduction

The following string of hexadecimal digits: 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0 appeared on websites such as Slashdot.com and digg.com in the early weeks of March 2007 and created a storm of controversy in the user communities of these sites. This string comprises the key for decoding HD-DVD titles encoded by the AACS (Advanced Access Content System) content protection system that was developed by the MPAA (Motion Picture Association of America). Publishing this code compromised the resources invested in this protection system and the large number of HD-DVD titles that had implemented it. The sequence was posted on several occasions before the first week of March 2007, but was taken down due to MPAA takedown threats. The string was posted on digg.com, a community where users participate in evaluating or “digg-ing” content. A “digg” is an evaluation of the worthiness of content, and successive “diggs” indicate that the content is popular. Popular content is placed on the front page of the site, where it receives more “eyeball” hits, or views, which further increase its popularity. Digg.com received several takedown notices from the MPAA and subsequently removed the content under the threat of a lawsuit. The user community reacted strongly and treated the takedown of content as an act of censorship and a First Amendment issue. The user community banded together to keep posting the string and “digging” the content so that it continuously moved to the home page of the digg site and went on to become the most popular content. Several commentaries on sites such as Wired and Slashdot that chronicled this user rebellion also received attention and rose in popularity. The management of the website had a decision to make: whether to protect the site from a lawsuit or to avoid alienating its user community. The management relented after overwhelming responses from the users, decided that the voice of the community was more important, and rolled the dice on the legal troubles. Currently digg.com is operational, and the MPAA declined to pursue the legal matter any further. However, the story is illustrative of a broader point: networks of individuals can make decisions without overt collaboration, and these networks drive fads, trends and even the emergence of specific forms of content as highly popular.

This story is not unique; it is one instance in which seemingly powerless individuals coalesce using electronic means and organize around ideas in which they have an interest. Similar phenomena are found on sites and virtual worlds such as MySpace, Del.icio.us, and Second Life. These environments provide individuals with a platform to share opinions, ideas and content, and with a community that may be receptive to these ideas. A community structure emerges in these environments as each individual organizes himself or herself around topics of interest. This thesis focuses on and explores the characteristics of the networks and technologies through which content becomes popular. This form of collective behavior is the result of the intertwining of individual decisions through sharing that reconciles and reflects their opinions. As a result, organizations seek to tap into the collective wisdom (Hempel 2007; Kozinets et al. 2008) of the masses by leveraging the technologies that groups such as open-source movements use. This thesis sheds light on the dynamic behaviors of networks, their organizing processes and their influence on emergent collective decision-making.

This phenomenon is better understood when contrasted with processes of knowledge evaluation in organizations. Knowledge management focuses on establishing ‘knowledge’ best practices (Alavi et al. 2001; Hansen et al. 1999; Szulanski 1996) within the organization. These best practices are determined by domain experts and decision-makers within the knowledge domain (Hollingshead 2000; Libby et al. 1987). The domain experts and decision-makers are individuals who are familiar with the organizational culture, the context and the intricacies of the problem addressed by the best practice. After evaluating the basket of practices, the domain experts determine the best practice that applies widely and deeply to organizational processes. These best practices serve as frameworks that can be applied across the rest of the organization in similar problem domains. This form of decision-making about organizational knowledge relies on a set of individuals making decisions for the organization on the value of specific knowledge (Duboff 2007). Thus a few individuals have a great deal of influence over the type of content or knowledge that members of the organization consume and the practices that they deploy in response to organizational challenges. In contrast to this top-down view of diffusion, I focus on a bottom-up, emergent, distributed approach. This approach views individuals as being distributed in space and time. Emergence refers to the non-linear processes through which new information and content reach individual and collective levels (Markus et al. 2002), and it implies that the process outcomes are not pre-specified or hierarchically controlled. Distributed refers to the geographical dispersion of individuals around the globe, their distribution across multiple organizations, and their differing knowledge bases and socio-economic contexts. Individuals use heterogeneous technologies to mediate their interactions with each other, and these technologies take an active role in shaping the process through which content reaches the distributed context.

Several frameworks may be employed to examine this phenomenon and some have already been applied to understand collective processes. One framework that may be applied to this phenomenon is that of knowledge based organizing (Conner et al. 1996; Grant 1996; Tsoukas 1996), a second is that of group decision-making (Davis et al. 1996; Fulk et al. 1991; Poole et al. 1986; Soubie et al. 2005) and a third perspective may be that of distributed cognition and social organizing (Hutchins 1995; Hutchins et al. 1996; Weick 1979; Weick 1984; Weick et al. 1993). These frameworks highlight different aspects of the social processes of organizing, its outcomes and the socio-psychological characteristics that may influence its outcomes.


In an information and knowledge based economy, the focus is to understand the creation, transformation and sharing of knowledge and information by individuals and organizations (Alavi et al. 2001). Individuals are immersed in a steady flow of content and constantly struggle with techniques and methods to manage this information (Brown et al. 2000). A variety of social and technological solutions have been brought to bear on this problem: communities of practice, best practices, knowledge rating systems, and corporate directory systems, to mention just a few of the available techniques (Alavi et al. 2001; Brown et al.). Centralized schemes, as mentioned, are the ones most often used to determine the value of information. These schemes rely on expert practitioners and are dominant because they offer economies of scale and take advantage of the specialization of knowledge to reduce the cost of determining the value of information.

A more moderated view of determining the value of information draws upon a socio-psychological perspective (March 1997; Moscovici 1985; Salancik et al. 1978). This view situates information and knowledge in the context of an ecology of actors and institutions. Value determinations for information are not developed in isolation; rather, they are moderated by social interactions and contextual cues. This view paints an image of an individual actor who is enmeshed in an ecology, has multiple identities, and holds inconsistent preferences (March 1994). In this view, relationships among the actors create ecological properties that are not attributable to the behavior of any individual actor (March 1997). Taking the ecological view even further, an emerging body of literature suggests that decision-making in organizations is a collective and dynamic phenomenon that is distributed among individual actors and artifacts (Boland Jr et al. 1995; Hutchins 1991; Tsoukas 1996; Weick et al. 1993). In this view, the individual decision does not emerge from the cognitive process of the mind alone; it is based on the structure of relations that the individual has with the actors and artifacts present in the environment. The artifact, in this view, plays an active role in shaping cognition and decisions. The artifact is not only a conduit between individuals, as previously understood (Butler 2001; Daft et al. 1986; Rice et al. 1991), but forms an active element in the network that interacts with other actors (Kane et al. 2008).

The rapid growth of technologies that provide collaboration and cheap communication has made possible an era in which consensus in decision-making and participatory governance are more feasible (Lee et al. 2003). The use of distributed and participatory decision-making mechanisms to determine popularity is made possible by electronic environments. In such environments, the popularity and value of information are determined by the consumers who share and view it. The distributed and emergent nature of the process that determines value among the consumers also makes the structure in which the content is embedded important. Thus information technology has transformed the manner in which individuals interact and organize to establish value and come to a consensus on what they consider valuable and popular.

Organizing has been examined in the context of organizations and distributed work (Orlikowski 2002); such work has examined the nature of the work practice without a focus on the social structures within which the work is accomplished. Furthermore, organizing processes have been examined in the context of co-located work, where they play an important role in coordination (Weick et al. 1993). Information technology artifacts may play a significantly different role in the organizing process where the process itself might influence the information that is used for organizing. Furthermore, the organizing process, in addition to improving the work practice, may also lead to specific outcomes such as popularity. In such contexts, multi-modal networks are formed through interactions among individuals, content and technology artifacts (Kim 2006; Soubie et al. 2005).

Multi-modal networks are a type of network that consists of individual actors and heterogeneous artifacts (Kane et al. 2008; Monge et al. 2003a). This conceptual network allows us to examine the social and technological elements of such systems simultaneously. Past studies have examined the role of the IT artifact in structuring the organizing and distributed collaborative decision-making process (Baker 2002; Ching et al. 1992; Dennis et al. 2001; Kim 1990; Kim 2006; Nault 1998; Soubie et al. 2005; Sutcliffe et al. 2001). IT artifacts and individual actors are brought together through networks. Furthermore, these networks allow us to examine the non-linear and dynamic characteristics of an iterative process of content evaluation and network creation (Fischhoff et al. 1997a; Simon 1996). Interactions in such networks influence the process and the popularity outcomes of collaborative processes. Whether and how IT artifacts, as active elements, interact with other actors in shaping outcomes in such hybrid networks has not been explored. Consequently, the central research question that drives this thesis is:

RQ -- How do collective organizing processes and distributed decision-making influence popularity in an information technology mediated environment?

As previously mentioned, the process of popularity and organizing can be examined through several lenses, for instance social cognition, group decision-making and knowledge based organizing. These lenses highlight different aspects of the same phenomenon, i.e., organizing and popularity. In analyzing this phenomenon I draw on the long history of social, marketing and management science research on social networks and groups. In particular, I draw on previous work in the group dynamics literature that studied the process (Fredrickson 1986; Janis 1972; Poole et al. 1986; Poole et al. 1989; Segal 1982) by which decisions are made in groups; the group support systems literature (Dennis et al. 1993; DeSanctis et al. 1987b; Huber 1990; Kiesler et al. 1992; Valacich et al. 1994) that examined the role of technology as an aid in group decision-making; research on decision-making practice that examined how the organization’s structure and the manner in which decisions are made influence decision outcomes; and research on organizational information processing and on how information asymmetry plays a critical role in shaping decision outcomes. I also recognize the value of a growing body of research that examines communications using the lens of infrastructures (Bowker et al. 1999; Star et al. 1996), distributed cognition (Hutchins 1995; Hutchins et al. 1996), and structuration theory (DeSanctis et al. 1994). These literatures relate the embedding of artifacts into practices that subsequently influence individual decision processes. The concepts of nodes and links present in the network literature have been used to study distributed cognition (Boland et al. forthcoming; Hutchins et al. 1995; Kane et al. 2008). These studies find that the structure of networks of heterogeneous actors influences decisions in organizations and other collectives. This complements the traditional social network research that has looked at the effects of networks on decision-making in varied contexts, including inter-organizational relationships (Granovetter 1973), strategic decision-making in organizations (Carpenter et al. 2001), institutional influence (Mizruchi et al. 2001), alliance and coalition formation (Mizruchi et al. 1998) and individual media choice (Chang et al. 2001; Rice et al. 1991).

Networks have been used to describe the underlying social and technological structures that characterize groups, communities and organizations. Researchers have used the network lens to analyze macro-level outcomes such as innovation, job-seeking, information diffusion and social capital (Barabási et al. 2002; Borgatti et al. 2003; Brass et al. 1998; Burt 1997b; Buskens et al. 1999; Cross et al. 2001; Hansen 2002; Lai et al. 2002; McPherson et al. 2001; Podolny et al. 1997; Wellman et al. 1996). In this thesis, the processes by which popularity is determined take place in the context of networked individuals and artifacts and their structures. Organizing and popularity arise from the interactions between individuals and artifacts in the network through non-linear dynamic processes. The network lens allows me to move between macro-level network outcomes and the micro-level individual interactions that contribute to the structures that generate the emergent outcomes.

A key characteristic of the process by which popularity is created is the assignment of value to information and content, and the embedding in them of the cognitive structures of independent individual actors. In contexts where the value of content is determined by the masses, individuals are situated in idiosyncratic knowledge involvements. Such involvements give them a unique view on the importance and value of information and knowledge that centralized systems do not have. This form of practice and decision-making is evident in open-source communities (Lakhani et al. 2003; Von Krogh et al. 2003), where decisions regarding specific pieces of the application are left to the developers who have the most experience (O’Mahony et al. 2007; Sharma et al. 2002) with that portion of the application. Centralized mechanisms to determine the value of content quickly become problematic when dealing with decisions regarding diverse information (Brown et al. 2000) and knowledge bases (O’Mahony et al. 2007). Using groups of individuals to determine the value of information and content is quickly becoming the norm in contemporary IT environments, with a variety of sites employing IT features to get consumers to judge the value of information. Organizations such as IBM are also investing in technologies to evaluate the information and knowledge available within the organization through the diverse knowledge bases of their employees.

These efforts at the collective determination of value are visible in the consumption behaviors of individuals. Specifically, collective value is visible through the consumption of some information more than others. While the focus of research and practice is on the ways in which this can be employed, there has been little focus on the structures that underlie these collectives and the manner in which they influence decision-making and consumption. Thus, while we know that collaboration is important (Fleming 2007) in determining the value of innovation, the structures underlying these collective efforts of valuation are still unclear. This thesis is an effort to understand how these underlying structures influence collective decision-making.

I now refine the broad research question posed into specific research questions to explore this topic in further detail.

1.1.1 Research Questions

In this thesis I attempt to answer the overall research question in the context of a (social) bookmarking service where individuals make collective decisions about the value of content by independently “voting” through bookmarking the content they deem valuable. “Voting” is a form of decision-making where individuals filter their individual preferences through a collective process that takes place across the whole network. The multi-modal network in social bookmarking services consists of individuals, the content bookmarked and the tags that are used to describe the content. Tags signal the nature of the content bookmarked and form part of the cognitive network. They are analogous to reputation systems that are used to signal the quality of information. Reputation systems convey information about the quality of individuals, content, products, etc. (Resnick et al. 2002). Reputation systems influence trust, customer choice and buying patterns on sites such as eBay, and the systems are part of the information environment (Pavlou et al. 2006). Tags, on the other hand, are similar to a taxonomy or a set of keywords, and they are used to characterize the content of the bookmark. Within the context of decision-making, tags serve to signal the nature of the content. An individual’s decision to bookmark specific content takes place within the hybrid multi-modal network, and the tags that he or she uses are influenced by his or her positional, relational and structural characteristics within the network.
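The hybrid network of individuals, content and tags described above can be made concrete with a small illustration. The following Python sketch is not drawn from the study's data or tooling; the event format, the function names `build_hybrid_network` and `degree_by_mode`, and the sample users and URLs are all hypothetical. It assumes that each bookmarking act can be recorded as a (user, content, tag) triple and that each act links the three nodes pairwise.

```python
from collections import defaultdict


def build_hybrid_network(events):
    """Build an undirected three-mode graph from (user, url, tag) events.

    Nodes are (mode, name) pairs so that a user and a tag sharing the
    same label remain distinct. This representation is an assumption
    made for illustration only.
    """
    adjacency = defaultdict(set)
    for user, url, tag in events:
        u, c, t = ("user", user), ("content", url), ("tag", tag)
        # One bookmarking act links user-content, user-tag and tag-content.
        for a, b in ((u, c), (u, t), (t, c)):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_by_mode(adjacency, mode):
    """Return the number of distinct neighbours for every node of one mode."""
    return {name: len(nbrs) for (m, name), nbrs in adjacency.items() if m == mode}


# Hypothetical bookmarking events.
events = [
    ("alice", "example.org/a", "privacy"),
    ("bob", "example.org/a", "privacy"),
    ("bob", "example.org/a", "security"),
    ("carol", "example.org/b", "video"),
]
network = build_hybrid_network(events)
content_degree = degree_by_mode(network, "content")
tag_degree = degree_by_mode(network, "tag")
```

In this toy example the content item bookmarked by two users under two tags has more neighbours than the item bookmarked once, which is the kind of structural difference the thesis relates to popularity.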

An exploratory examination of popular bookmarking services suggests that collective decisions concerning the value of content are a function of the frequency of bookmarking. This frequency generally follows a long-tail distribution (Figure 1) and can be described as a heavy-tailed distribution that follows a power law.

Figure 1. Long-tail distribution of bookmarks
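One simple way to see whether a set of bookmark counts has the head-and-tail shape of Figure 1 is to measure how much of the total bookmarking activity is captured by the most bookmarked items. The sketch below is illustrative only: the function name `head_share`, the 10% cut-off and the sample counts are assumptions, not values taken from the study.

```python
def head_share(bookmark_counts, head_fraction=0.1):
    """Fraction of all bookmarks held by the top `head_fraction` of items.

    `bookmark_counts` holds one total per content item. The head size is
    rounded down but is never smaller than one item.
    """
    ranked = sorted(bookmark_counts, reverse=True)
    if not ranked:
        return 0.0
    head_size = max(1, int(len(ranked) * head_fraction))
    total = sum(ranked)
    return sum(ranked[:head_size]) / total if total else 0.0


# Hypothetical counts for ten content items.
counts = [500, 120, 40, 12, 8, 5, 3, 2, 1, 1]
share = head_share(counts, head_fraction=0.1)
```

A share close to 1 for a small head fraction indicates a strongly skewed distribution, while a share close to the head fraction itself indicates that bookmarks are spread evenly.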


In such a distribution a small proportion of content receives a large number of bookmarks while most content receives very few. Exploratory observation of the content that receives a large number of bookmarks further suggests that there is a ‘peak’ with a distinctive deflection in the collective bookmarking (Figure 2). This deflection signals a sudden change in the bookmarking activity associated with the specific content. It signals the introduction of a node that changes the rate of attachment to the network and pushes the content towards popularity.

Figure 2. Peaking Characteristic of bookmark
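The deflection described above can be located in a bookmarking time series in many ways. As a minimal sketch, and not the procedure used in this thesis, the following Python code takes the deflection to be the day with the largest increase in bookmarks over the previous day. The function name `find_deflection` and the daily counts are hypothetical.

```python
def find_deflection(daily_counts):
    """Return the index of the day with the largest day-over-day increase.

    Returns None when fewer than two days are available. Ties resolve
    to the earliest such day.
    """
    if len(daily_counts) < 2:
        return None
    jumps = [daily_counts[i] - daily_counts[i - 1] for i in range(1, len(daily_counts))]
    best = max(range(len(jumps)), key=jumps.__getitem__)
    return best + 1  # jumps[i] compares day i + 1 with day i


# Hypothetical daily bookmark counts for one content item.
daily = [1, 0, 2, 1, 1, 45, 60, 30, 10]
deflection_day = find_deflection(daily)
# Days elapsed before the deflection, under this simple definition.
dormant_days = deflection_day
```

Under this definition the number of days before the deflection gives a rough measure of how long the content stayed quiet before bookmarking accelerated.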

In this study I explore this dynamic social bookmarking process as an instance of emergent distributed decision-making by analyzing the structures of hybrid networks consisting of individuals, tags and content. Specifically, I ask the following two questions.


Research Question 1: What are the characteristics of hybrid networks that help distinguish related content that belongs to the head from that which belongs to the tail of a long tail distribution?

By answering this question I identify a set of structural properties within hybrid networks that act as antecedents of emergent decision-making outcomes, i.e., the popularity of the content. Connections among the heterogeneous elements in these hybrid networks influence the manner in which collective decisions materialize from the actions of individuals. By understanding these structural variables I expect to be able to predict the outcomes of collective decisions based on a limited set of initial network conditions (Hutchins et al. 1995)1.

Research Question 2: For content belonging to the head region of the long-tail, what are the characteristics of the hybrid networks that determine the time it takes for this content to emerge as popular?

There is a point in the life of content at which bookmarking activity by a certain individual triggers massive and rapid bookmarking behavior by others. This deflection point seems to be similar to the critical point or tipping point observed in other complex systems. In prior research (Barabási et al. 1999a; Barabási et al. 2002), such rapid growth in the network is the result of preferential attachment on the part of others in the network to this individual. Prior to this deflection point there is a dormant period in the bookmarking behaviors. I will look for structural characteristics of hybrid networks that can explain the length of this dormant period. This question allows me to examine the relationships between network characteristics and the temporal behaviors of bookmarking. I will also look for additional temporal patterns that might exist in distributed decision-making in social bookmarking settings. By examining such temporal dynamics I hope to explain the efficiency (duration) of the materialization of the collective decision, which is analogous to decision-making time in group decision-making settings. Furthermore, by understanding these characteristics one may be able to predict and identify nodes that are critical for the rapid dissemination of content. Identifying these nodes allows us to “seed” content in the most fertile spots rather than distributing it randomly.

1 In a similar manner, Hutchins demonstrates through simulations of connectionist networks that shared lexicons can emerge from the interactions of individuals.
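The preferential attachment mechanism referred to above can be illustrated with a short simulation. The sketch below is a generic, simplified growth model in which each new node links to one existing node chosen with probability proportional to its current degree. It is not the model estimated in this thesis, and the function name `preferential_attachment`, the network size and the seed are assumptions chosen for illustration.

```python
import random


def preferential_attachment(n_nodes, seed=0):
    """Grow a network in which each new node attaches to one existing node,
    chosen with probability proportional to that node's current degree.
    """
    if n_nodes < 2:
        raise ValueError("n_nodes must be at least 2")
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in `endpoints` once per incident edge, so a
    # uniform draw from this list is a degree-proportional draw.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return edges, degree


edges, degree = preferential_attachment(200, seed=42)
# Nodes sorted from most to least connected.
hubs = sorted(degree, key=degree.get, reverse=True)
```

In runs of this kind a few early nodes typically accumulate many links while most nodes keep a single link, which mirrors the head-and-tail pattern discussed in this section.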

1.1.2 Contribution of the Thesis

This thesis contributes to research on networks and communication, and to practice, in several ways. It develops a hybrid network model of the process by which the popularity of content is determined. It goes beyond a discussion of collaboration (Fleming 2007) to look at the structures that result from such collaboration and their influence on popularity. It examines how individual decision-making is influenced by characteristics of the hybrid network and how emergent decisions materialize from individual decision-making behaviors. This model is developed in the context of social bookmarking, where multiple decisions about the value of content are combined into an emergent decision.

It adds to the IT literature by elaborating on the description of long-tail characteristics of IT mediated networks. It theorizes about the hybrid network

structures that underpin long-tail distributions. It further theorizes about the socio-technical characteristics of nodes that are critical to the emergence of the long tail. Collaboration is critical to the emergence of long-tail structures in innovative contexts, and the success of an innovation is often based on the collaborative relations that individuals and organizations form (Fleming 2007).

However, we have yet to understand the specific forms of collaborative structures that influence the nature of such long-tail distributions, and this thesis can contribute to such an understanding. To practice, this study contributes by moving the focus of knowledge research from content and best practices to the broader dynamic network context within which content is placed. This study hopes to move from “what do I put in content/knowledge?” to “where do I place content/knowledge?” Thus this study emphasizes the planting of a seed of knowledge and its diffusion, in addition to recognizing the importance of the seed itself. It also contributes to practice by examining the phases through which content moves to popularity and the structural features of the underlying networks. Practitioners can subsequently design interventions in such networks to influence the manner in which they develop and, consequently, the content evaluative processes in these networks. Furthermore, practitioners can also embed specific IT artifacts in the content environment to aid decision-making and the articulation of the cognitive structures that assist in the collective emergent decision-making process.

In the following sections of this chapter of the thesis, I summarize existing literature on group dynamics, group decision-making, social networks,

information classification and knowledge transfer to guide the process of theoretical development. I subsequently develop the broad research model and describe the constructs of interest. The concepts used in the thesis are introduced in this section, and the mechanisms that describe the popularity process in emergent decision-making contexts are described. The roles of the IT artifacts in the domain are described along with the manner in which they contribute to the formation of the hybrid network structure. In the section on research methodology I describe the site of data collection for this thesis and discuss how it serves in answering the research questions. This chapter also details the data collection process. I also describe the companion pieces to the theoretical development: a case study and a quantitative analysis that builds on the results from the case study. In the case study I perform a detailed analysis of specific cases of network growth and change to understand how popularity is determined in multi-modal networks. This helps refine the constructs that are critical for detailed theory development and testing. The quantitative analysis expands on the findings of the qualitative study by formulating a set of hypotheses about network characteristics and popularity that are tested using systematic sampling and statistical analysis.

1.2 Organizing Framework and Literature

The framework of organizing is used to examine work and decision-making in distributed settings and has been popular within management circles since the late 1970s. Organizing in the context of the management of organizations developed in the work of Weick (1979), with his seminal book on the social processes in

groups (e.g., loose coupling) and the psychological processes within the individual mind (e.g., satisficing) that aid in the organizing process. While not the predominant theme, these ideas have been part of management vocabulary and thinking for the last four decades and have been influential in the manner in which organizations are researched and structured (Anderson 2006b). This organizing framework is the basis on which the literature review and theoretical development take place in this thesis.

Organizing is intertwined with the process of sense-making on the part of organization members (Weick et al. 2009) as they notice, understand, share and respond to the actions and behaviors of individuals around them and to the organizational environment. Sense-making takes place through the process in which organizational circumstances are transformed into words, and organizing takes place through the spoken and written texts generated by organizations.

Organizing is subsequently the experience of being thrown into an ongoing, unknowable and unpredictable stream of experience (Weick et al. 2009). Sense-making takes place in the process of work as individuals engage in their knowledge involvements, and this process, when subsequently articulated and shared, forms the infrastructure around which organizing occurs (Dutton et al. 2006). Organizing is fundamentally a distributed and emergent concept, as reflected in the various contexts of its use. Organizing takes place in making sense of pain and loss in the community as members of the community identify and organize around the source of pain (Dutton et al. 2006) and attempt to find ways to alleviate it. Organizing takes place even within certain

bounded structures such as jazz and classical music (Hatch 1999). While organizations may have certain degrees of structure, such as a jazz quartet, organizing may take place in the empty space between the structures. The organizing takes place in the process of playing, hearing and feeling the jazz ensemble and not as a prearranged process. Innovation may also be enabled through the organizing process (Chesbrough et al. 2002). Organizing processes are specifically suited for situations where autonomous innovation, i.e. innovation that can be pursued independently, is the goal of the organization. In the context of this thesis, the organizing lens sheds light on the constituent elements of the process and the role of technology in organizing and distributed decision-making.

Organizing in the context of distributed knowledgeable individuals suggests that theory and methods that examine the processes and interactions of distributed individuals may shed light on the research questions. For instance, literature in network theory and methods describes how network characteristics influence processes and outcomes at individual and collective levels. Nodes, ties and structural characteristics influence a variety of individual and organizational outcomes such as knowledge sharing (Favela 1997; Hansen 2002; Reagans et al. 2003; Singh 2005; Teigland et al. 2009), team performance (Cummings et al. 2003; Lamertz 2006; Sparrowe et al. 2001; Steyer et al. 2006a), and social capital (Burt 1997b; Gargiulo et al. 2000; Reagans et al. 2003; Tsai et al. 1998; Wasko et al. 2005), and the network literature helps identify characteristics relevant to the research questions. In addition to the social network lens, literature on viral marketing has examined the role of individuals

and relationships in information diffusion through large collectives, and it also aids in theorizing. This literature also examines the flow of information through word-of-mouth networks, the role of cues and signals in information flow, and the process by which peripheral members propagate information in networks. Factors such as norms, cohesiveness, and leadership affect individual decision-making in group contexts, and I examine their analogs in a network context. Influence processes that exist in face-to-face groups might not have direct analogs in distributed digital environments, but the relevant concepts may describe how interpersonal influence shapes decisions in distributed settings.

1.2.1 Guiding Literature

To describe the effect of information technology on the organizing process, literature on distributed cognition and classification systems serves as a guide to understanding the cognition sharing and emergent popularity process in IT environments. IT serves as a generative tool (Avital et al. 2009) for new organizational structures and allows them to rapidly take advantage of new opportunities (Lucas Jr et al. 1994). Literature on distributed cognition argues that cognitive processes are distributed across individuals and artifacts, and I look at how classification systems serve as these artifacts in an electronic environment. These disparate literatures are synthesized to understand the processes and mechanisms that underpin the popularity process, the decision-making behaviors of individuals and the characteristics of networks that influence the process.

Table 1. Summary of Guiding Literature

| Literature Stream | Area of Focus | Domain Outcomes | Determinants |
| --- | --- | --- | --- |
| Group Research | Decision-Making Processes | Effectiveness, Efficiency | Phase models (Forsyth 1990; Poole et al. 1989), Procedural mechanisms (Davis et al. 1996), Task Characteristics (Segal 1982; Zigurs et al. 1998) |
| | Structure | Efficiency, Effectiveness, Decisional Stress, Participation, Decision Outcomes | Norms (Dennis et al. 2003; Kraut et al. 1998), Cohesiveness (Janis 1972), Leadership, Isolation |
| | Information Processing | Group Success, Group Effectiveness, Media Assessment and Use, Group Memory | Communication (DeSanctis et al. 1987b; Kim 2006; Matzat 2004; Soubie et al. 2005), Social Cues (Salancik et al. 1978), Proximity (Ahuja et al. 2003), Size |
| Marketing | Influence | Product adoption, Information Diffusion, Time of adoption | Centrality (Subramani et al. 2003), Interpersonal Relationships (Brooks Jr 1957; Phelps et al. 2005), Strength of relationships (Iacobucci et al. 1992; Ouyang et al. 2004), Signals/Cues (Herbig 1996), Frequency (Herbig 1996) |
| Social Networks | Ties, Positions, Structure | Social capital, Information Diffusion, Organizational Interlocks | Network Closure (Coleman 1988; Coleman 1990), Structural Holes (Burt 1992a; Burt 1999; Burt 1997a), Weak Ties (Constant et al. 1996; Granovetter 1973; Hansen 1999; Pickering et al. 1995), Positional Similarity (Burkhardt et al. 1990), Reachability, Homophily (Ibarra et al. 1993), Strong/Kin ties, Centrality (Ahuja et al. 2003; Friedkin 1991; Ibarra et al. 1993; Mizruchi et al. 1998) |
| Science | | Redundancy, Path length | Preferential Attachment (Barabási et al. 1999a; Barabási et al. 2002; Jeong et al. 2003), Clustering (Watts 1999a; Watts 1999b; Watts et al. 1998), Weak Ties (Strogatz 2001) |
| Distributed Cognition | Influence | Group/Team effectiveness, Job effectiveness, System design | Mental models (Boland et al. 1994), Collective mind (Weick et al. 1993), Team cognition (Hutchins 1995; Hutchins et al. 1996; Jun et al. 2007) |

1.2.2 Network Theory and Methods

The network literature has several facets that help shed light on the research questions. The network method extracts structural features of networks and relates these features to broader phenomena such as those raised in the research questions. It also allows me to resolve analytical tensions by examining relationships between artifacts and social actors in a symmetric manner (Latour 1999). I start with a review of this literature stream since it is most directly relevant to the research questions posed.

An examination of the findings from the network literature suggests that a lot of attention has been paid to relationships (Burt 1999; Burt 1997b; Burt 2000; Coleman 1988; Granovetter 1983; Granovetter 1973; Putnam 1995) and to the positional characteristics of individuals (Ahuja et al. 2003; Cheng-Min et al. ; Ebadi et al. 1984; Hossain et al. 2009; Ibarra et al. 1993; Lamertz 2006) and organizations. However, while the configuration of ties and the content passed through them are important, the synthesis of these two perspectives is hard and results in a compromise of one or the other or both. For instance, studies of information diffusion have usually examined specific types of networks, such as kinship networks, giving priority to the structure of the network over the content that is passed through it. The review also suggests that little attention is paid to the dynamics of the network. One reason might be a limitation of the method, since social network analysis captures snapshots of the social structure. The dynamic nature of networks has typically been examined through the use of

simulation models² that rely on state and process information as key parameters in the models. I also find recent findings on properties of networks such as small worlds (Davis et al. 2003; Watts 1999a; Watts 1999b; Watts et al. 1998) and preferential attachment (Barabási et al. 2002; Barabási et al. 1999b; Jeong et al. 2003) useful in explaining emergent phenomena in networks. These findings are important in the context of popularity and decision-making since they shed light on the processes via which nodes attach to networks and influence subsequent network evolution. The findings also aid in examining specific contingent characteristics (Burt 1997a), such as network content, in examining organizing and the popularity process. Technological mediation of social networks creates new relationships between nodes and subsequently new forms of networks. Technological mediation also implies that technology artifacts are coming to occupy the node positions that have traditionally been occupied by social actors. Thus individuals and technologies come together to form multi-modal networks (Kane et al. 2008), and network research is increasingly focusing on such networks.
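
As an illustration only, the preferential attachment mechanism referred to above can be sketched in a few lines of Python. The sketch assumes the simplest growth rule, in which one new node arrives per step and forms a single tie to an existing node chosen with probability proportional to that node's current degree. The function name `grow_network`, the seed and the network size are hypothetical choices for the example and are not taken from the studies cited.

```python
import random
from collections import Counter


def grow_network(n_nodes: int, seed: int = 0) -> Counter:
    """Grow a network by preferential attachment and return node degrees.

    Assumption: each new node forms exactly one tie to an existing node,
    chosen with probability proportional to that node's degree.
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes to form the first tie")
    rng = random.Random(seed)
    # Every tie contributes both of its endpoints to this list, so a node
    # appears once per tie it holds; a uniform draw from the list is
    # therefore a degree-proportional draw over nodes.
    endpoints = [0, 1]
    degree = Counter({0: 1, 1: 1})
    for new_node in range(2, n_nodes):
        target = rng.choice(endpoints)
        endpoints.extend([new_node, target])
        degree[new_node] += 1
        degree[target] += 1
    return degree


degree = grow_network(1000, seed=1)
# A few early nodes typically accumulate many ties while most keep one or two.
print(degree.most_common(5))
```

Inspecting the most connected nodes in such a run shows the kind of skewed degree distribution that the preferential attachment literature associates with network growth.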

1.2.3 Networks in Marketing

Marketing literature has examined the influence of network position during information transmission and diffusion. While social network and organizational research highlights the role of central members (Beattie 2002) in information and knowledge diffusion, literature on viral marketing illustrates that both central

2 An exception is Venkatraman, N. and C. H. Lee (2004). "Preferential linkage and network evolution: A conceptual model and empirical test in the US video game sector." Academy of Management Journal 47(6): 876-892.

(Brooks Jr 1957; Phelps et al. 2005; Vilpponen et al. 2006) and peripheral members (2005) play an important role in propagating information. Individuals may have different motivations for diffusing and propagating information in consumer networks and subsequently may play different roles in networks (2003). Viral marketing has also been examined in terms of word-of-mouth networks, and this work has examined the success of word-of-mouth phenomena in generating popular outcomes. Word of mouth takes the form of reviews (Chevalier et al. 2006) and indications of valence (Liu 2006) that, when conveyed by individuals and groups that are highly regarded, influence the decision-making processes (Duan et al. 2008) of others and result in popular and unpopular outcomes. Critic reviews are both predictors and influencers of ticket sales for movies, and they tend to influence sales in the early period after the movie launch rather than the later period (Basuroy et al. 2003). Word-of-mouth studies typically examine the effects of volume, valence and ratings rather than the structures in which such volume, valence and ratings are embedded. To the extent that structures of the industry influence word-of-mouth effects, star ratings for actors and directors mediate or moderate the timing effects of release and influence movie sales (Ainslie et al. 2005). However, this approach narrowly focuses on the individual critic or review as the focal means through which influence and word of mouth are propagated, without examining the broader structure through which the diffusion takes place.

Furthermore, marketing literature also highlights the importance of signals, cues and type of content in decision-making, similar to the concept of “stickiness” (Gladwell 2002). From this section I can conclude that peripheral structures are

created by members who are influenced by the signals propagated in the environment (Bhuian 1997; Sia et al. 2002; Stafford 1996). The structures that these peripheral members generate form around core individual nodes, and peripheral members monitor these core nodes and attempt to mimic them in the process of information diffusion. Consequently, the structure of the periphery is important in a study of popularity and decision-making since peripheral members are influenced by the signals conveyed by central members and in turn influence the dissemination of information.

1.2.4 Cognition in Groups and Networks

The theory of distributed cognition systems (Hutchins 1991; Hutchins 1995; Hutchins et al. 1996) illustrates that sharing cognition through the use of artifacts results in better coordination and sharing of knowledge resources (Bowker et al. 2000; Madhavan et al. 1998; Weick et al. 1993). Cognition is a process that takes place across a variety of representational media, and knowledge and information are propagated across these states. These representational states can include written media, graphical media, computer representations, etc. Cognition embedded in artifacts is not a stable and fixed entity, but rather one that is continuously in the process of change. This cognitive change is a function of the individuals that interact with the artifact and is visible through representations such as classifications, as illustrated by Bowker and Star (2000). Cognitive structures embed themselves within information technology systems and are also subject to change through group processes (Argyres

1999b; Crowston et al. 1998; Rogers et al. 1994). Change is especially important in the context of popularity since changes in artifacts change the decisions of individuals in unexpected ways.

1.2.5 Group Decision-Making

Group research broadly focuses on two aspects of decision-making: the process by which decisions come about and the structure of the groups that make decisions. Processes refer to the stages that a group goes through before it reaches a decision. This emphasis on process focuses on the sequence of events that lead to the group decision or on any individual aspect of the processes within group decision-making. This has also been referred to as decision-development (Forsyth 1990). There is no single uniform process for a group to go through, since groups evolve differently and work through problems differently.

A combination of rules and shared social norms influences the group decision-making process and its outcomes through procedural mechanisms. One such form of procedural mechanism is polling or voting. Davis, Hulbert and Au (1996) show through computer simulations that the timing of the poll can influence the decision outcome. However, group members also take group structure into account in their deliberations (Schultz 1999). By structure I refer to the characteristics of the individuals and the group that are engaged in the decision-making process. Poole and Roth (1989) find that group structural variables influence decision-development processes. They measure group structure with variables such as group size, cohesiveness, etc., and find that group structural variables influence decision processes to a

greater degree than task characteristics. Relational communication influences the group structure and group characteristics such as norms, cohesiveness, satisfaction and groupthink (Keyton 1999). Several non-communicative means, such as non-verbal cues conveyed by the individual and the environment, influence the group’s performance, effectiveness and decision-making. Status hierarchies within groups also influence information processing within them. Salancik and Pfeffer (Salancik et al. 1978) developed the social information processing theory, which states that individual decisions and attitudes are not developed in isolation. The influence mechanism in social information processing is the social relationships in the environment. These social relations are observed through the various network linkages that individuals have (Bovasso 1996; Meyer 1994).

Several studies (Dennis et al. 2001; DeSanctis et al. 1987a; Poole et al. 1990; Trauth et al. 2000; Zigurs et al. 1998) examine the effect of information and communication technologies on the group decision process. Since groups are always engaged in information sharing and information processing, the manner in which information is processed can help explain group success and effectiveness. Since information is essential to the ability of a group to perform tasks and solve problems, group discussions are also referred to as collective information processing. Collective information processing is the degree to which information, ideas or cognitive processes are shared, and are being shared, among group members and how this sharing of information affects both individual and group level outcomes. This stream of research sheds light on how information processing takes place within collectives. Information processing in the context of

emergent distributed decision-making will rely on the relational cues from individuals in the environment, and the efficacy of such cues is based on the proximity and equivalence of individuals. Communication changes the perceived valence of a piece of information. Pieces that were not considered important by group members might receive more significance after communication among members. Group support systems (GSS) also increase the intensity of communication in groups (Connolly et al. 1990; Gallupe et al. 1992; Jessup et al. 1990). They increase participation among all participants on average (McLeod 1992) as they weaken group norms of not speaking out against authority (Chidambram et al. 1993). Position-to-position relationships conveyed through organizational structure are broken with the use of GSS, and this changes the dynamics of the group. GSS also result in a reduction of tie strength within the group (Kiesler et al. 1992) since they reduce the typical norms that exist in face-to-face groups.

1.3 Theoretical Development

In this section I examine the concept of popularity and articulate a process through which popularity emerges from the interactions within a socio-technical, multi-modal network.

1.3.1 Emergent Popularity

Popularity is a multi-level emergent process via which autonomous independent actions by actors (such as decisions communicated through

information technology) are transformed into collective patterns and behaviors.

Emergence has a rich history and has been discussed in biological, social and organizational contexts. Emergence in complex systems arises from self-organization rather than from an imposition of external order. Interactions between micro-agents and macro-levels of the system usher forth novel patterns, structures and processes that are not present in the agents but arise from their interactions (Goldstein 2000). Emergence creates new patterns in social organizations such as culture, meaning, relationships, decision processes, etc. (Truex et al. 1999). New patterns emerge through a process of social negotiation and consensus building that takes place between individuals in a variety of contexts. Emergence continuously recreates organizational populations (Chiles et al. 2004) through mechanisms such as spontaneous fluctuations, positive feedback loops, coordinating mechanisms and recombinations of resources. Collective systems usage (Burton-Jones et al. 2007) and team cognition (Jun et al. 2007) in organizations are described as emergent phenomena. Emergence is a characteristic of groups as well (Katz 1993) and is represented in Gersick’s (1991) models of groups in real situations.

Consequently, the process by which the collective decision emerges is non-linear, and the outcome of the process cannot be pre-determined. Thus emergence in the popularity process is “the transformation of multiple individual decisions through non-linear socio-technical processes to create new patterns and outcomes at both individual and collective levels of the system.”

Distributed characterizes the social context in which decision-making and emergent popularity develop. The actors of the system in which decision-making takes place are dispersed both geographically and temporally (Kim 2006; Soubie et al. 2005). It also characterizes the individual decision-makers, who may be stratified by occupation, education, income and a wide variety of other individual characteristics (Ching et al. 1992). Distribution also characterizes the domains in which the decision-making takes place. The object of the decision can exist at the intersection of multiple information domains that influence the decision-making behaviors of individuals. Thus this environment is characterized by a unique set of processes and social structure characteristics that influence decision-making and the emergent popularity process.

1.3.2 Popularity in IT Based Environments

Information technology is the enabler of communication between individuals in distributed settings. Consequently, IT influences both the individual decision and the influence of individuals on others in the decision-making process, as well as the process of emergence itself (Huber 1990). Thus information technologies mediate the process of individual decision-making and emergent popularity. Popularity and individual decision-making are examined in the identification, adoption and classification of information and content by distributed individuals.

Cheaper storage and increased computing capacity have made it possible to store, classify and manage greater amounts of information. Due to the

increasing scale and scope of the information that needs to be classified, there is a movement towards the decentralization of the classification process. Classification is pushed to the individuals who are the most knowledgeable about the information and the domain. The knowledgeable individuals are those that interact with the information on a regular basis and are involved in the practice of creating and using that information. As classification is decentralized, local and in-practice meanings of the information and knowledge emerge. The emergent classifications have meanings to the individuals in local contexts. This emergent classification is referred to as a “folksonomy”. A folksonomy is the classification scheme that is generated by a distributed group of individuals for the knowledge domain. Folksonomies relate tags to an individual’s understanding of the knowledge domain. The folksonomy emerges as multiple individuals participate in creating a shared and emergent topological classification scheme.

Decision-making and group processes literature finds that the social structure of groups and the flow of information within them influence the decision-making process. In contemporary IT based environments, decision-making is influenced by the task and the cues that individuals receive from the environment (Moscovici 1985; Salancik et al. 1978). Technology has increased the scope and scale of what needs to be classified but has also provided innovative solutions to the classification challenge (Brown et al. 2000). Embedded in information systems are classifications that coordinate the activities of individuals (Argyres 1999a). These forms of structural cues influence member decisions and actions (Moscovici 1976; Salancik et al. 1978). These structural cues are similar to

discussions of “signals” in the marketing and advertising literature, which are external information cues (Herbig 1996). Individual decision-making is an emergent process where cues in the environment influence the decisions of individuals. Individuals look for decisions made by others similar to themselves and hence look for structural similarity cues in the emergent distributed decision-making environment (Ching et al. 1992).

1.3.3 Networks in Popularity

Multi-modal networks refer to those networks that have a combination of social actors and technological nodes (Kane et al. 2008). Social actors are nodes that make decisions about content and create a folksonomy. These actors are of the type typically represented in formal social network analysis. By technological nodes I refer to the tags or the structure of a classification scheme. Tags used to construct folksonomies are cognitive repositories for the individuals making decisions. Such cognitive devices have been considered as technological artifacts (Argyres 1999b) in literature dealing with distributed cognition (Hutchins 1991; Hutchins et al. 1996) and information classification (Bowker et al. 1999; Star 2002). These cognitive artifacts persist over time and influence subsequent decisions. As cognitive artifacts, tags are interpretations by individuals. They are signals that serve as extrinsic information cues to the nature of the content (Herbig 1996). As they are a type of classification, they mediate the relationships between the social actors that create them. Networks are formed not by nodes alone but by a combination of nodes and relationships between the various

nodes (Wasserman et al. 1994). The relationship between the social actors and technological nodes in this network is that of co-presence. Co-presence implies that the nodes are present in the same environment and that relationships exist as part of the co-presence. Communication among the actors is not necessary to establish the relationship; the presence of nodes in the same location is sufficient (Wasserman et al. 1994). Such forms of relationship are referred to as affiliation-based relationships, where relationships exist based on affiliations to organizations or other entities. In this case the affiliation-based relationship exists because the social actors and technological nodes are affiliated with the same content.
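
As an illustration only, the co-presence relationship described above can be sketched in Python. The sketch assumes that each bookmarking event is recorded as a (user, content, tag) triple and that any two nodes, whether social actors or tags, are tied when they are affiliated with the same piece of content. The record layout, the function name `co_presence_ties` and the sample data are hypothetical and do not describe the data set used in this thesis.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bookmarking events: (user, content, tag).
bookmarks = [
    ("alice", "url1", "python"),
    ("bob", "url1", "code"),
    ("carol", "url2", "python"),
]


def co_presence_ties(events):
    """Return the set of undirected ties implied by shared content.

    Nodes are labelled ("user", name) or ("tag", name) so that social
    actors and technological nodes remain distinguishable in one network.
    """
    nodes_by_content = defaultdict(set)
    for user, content, tag in events:
        nodes_by_content[content].add(("user", user))
        nodes_by_content[content].add(("tag", tag))
    ties = set()
    for nodes in nodes_by_content.values():
        # Sorting gives each undirected tie a single canonical ordering.
        for a, b in combinations(sorted(nodes), 2):
            ties.add((a, b))
    return ties


ties = co_presence_ties(bookmarks)
for tie in sorted(ties):
    print(tie)
```

In this toy data alice and bob are tied without ever communicating, because both bookmarked the same content, while alice and carol are not tied even though the tag "python" appears on both pieces of content; the tag is tied to each of them separately.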

Decision-making has been studied in contexts as varied as groups, teams, organizations and natural systems. In such contexts emergence has resulted in collective system usage, team cognition, information organizational structures (Johannisson 1987), organizing processes (Weick et al. 1993) and more. The folksonomy is a result of a distributed decision-making process. In addition to a folksonomy, the process results in a long-tail distribution for information that has been organized and classified. Such distributions are seen on sites such as Amazon.com and Barnes & Noble, where purchase decisions of individuals are based on purchases by others (Anderson 2006a; Brynjolfsson et al. 2006). This means that the more individuals purchase a book, the more likely it is that others will purchase it as well. Each individual decision to purchase contributes to the subsequent purchase decision. The cumulative outcome of these decisions results in a frequency distribution on a set

of information, content or objects. The frequency count represents the number of individuals making consumption decisions on that information content or object.

The long-tail distribution represents an ordered frequency distribution, from high to low, on a set of information, content or objects (Brynjolfsson et al. 2006). A long-tail distribution is the result, or the emergent “pattern”, of a decision-making process. The long-tail distribution is also referred to as a power-law distribution, and this distribution can be segmented into two regions: the head, which contains objects with a high frequency, and the tail, which contains objects with lower frequency.
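
As an illustration only, the ordered frequency distribution and its segmentation into a head and a tail can be sketched in Python. The sketch assumes that each consumption decision is recorded as the identifier of the object chosen. The cut-off rule, under which the head is the smallest set of top-ranked objects that accounts for a given share of all decisions, is a hypothetical choice made for the example; the text above does not prescribe where the head ends. The function names and sample data are likewise hypothetical.

```python
from collections import Counter


def rank_frequency(decisions):
    """Return (object, count) pairs ordered from most to least frequent."""
    return Counter(decisions).most_common()


def split_head_tail(ranked, head_share=0.5):
    """Split a ranked distribution into head and tail objects.

    Assumption: the head is the smallest prefix of the ranking whose
    counts sum to at least `head_share` of all decisions.
    """
    total = sum(count for _, count in ranked)
    head, running = [], 0
    for obj, count in ranked:
        if running >= head_share * total:
            break
        head.append(obj)
        running += count
    tail = [obj for obj, _ in ranked[len(head):]]
    return head, tail


# Made-up decisions: object "a" is chosen far more often than the rest.
decisions = ["a"] * 8 + ["b"] * 3 + ["c"] * 2 + ["d", "e", "f"]
ranked = rank_frequency(decisions)
head, tail = split_head_tail(ranked, head_share=0.5)
print(head)  # prints ['a']
print(tail)  # prints ['b', 'c', 'd', 'e', 'f']
```

The sketch only orders and partitions the counts; it does not test whether the distribution actually follows a power law.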

In this section I described the individual decision-making and emergent popularity process. I argued that this process is used to evaluate informational content and to generate folksonomies. This process results in long-tail distributions. Multi-modal networks characterize the relationships between the social and technological nodes in an emergent distributed decision-making context. These networks influence the long-tail distributions that result from the process, and I use the characteristics of multi-modal networks to explain specific characteristics of the long-tail distribution, about which I develop specific propositions.

1.4 Research Methodology

In this section I describe the phenomenon of interest and the context in which I examine the process of collective decision-making and popularity. I describe the process by which the site of the study is selected and its relevance to examining popularity and emergent decision-making processes. Furthermore, I describe the criteria and process of data collection from the study site and also discuss how concerns of validity are addressed in this thesis.

1.4.1 Study site and Phenomenon Description

The phenomenon of emergent decision-making and popularity is visible on several websites on the internet, and hybrid networks play a significant role in the process. Sites including content aggregators, user-submitted services, and social bookmarking services such as digg.com, delicious.com, StumbleUpon, Furl and many more display such behavior. User-submitted content services (digg.com) aggregate a variety of content and provide recommendations, while bookmarking services (delicious.com) have a narrower focus on representing individual bookmarking activity. The user-submitted content services bring together many sources of information and allow individuals to vote on the information but do not allow users to comment on the information itself. Thus aggregator services are typically one-dimensional, i.e. limited to voting, when it comes to interacting with the user.

Bookmarking services, on the other hand, allow users to share content with other individuals and give them the opportunity to add tags and comments to the information and content they choose to share.

Social bookmarking is a phenomenon where individuals share content through the use of bookmarks and use tags to organize them. As such, social bookmarking services allow users to employ the artifacts in the ecology of the social bookmarking space to deploy and maintain cognitive structure in the space and also to communicate with other users in that space. This thesis employs social bookmarking services as the site for content services to investigate the research questions.

The site for this thesis must display characteristics of emergent decision-making, popularity and hybrid networks in order to enquire into the research questions. Specifically the site must display a. decision-making behavior on the part of individuals, b. distribution in knowledge and domain expertise of individuals and c. the emergence of collective decision-making. I discuss each in turn.

a. Individual Decision-making – Bookmarks are made by individuals in the social bookmarking service. Bookmarking is a choice of the individual and represents the determination of the value of content. Bookmarks are a choice of specific pieces of content among others. Thus they represent decisions by individuals in the social bookmarking system on the value of the content. Content is also tagged by individuals, and tagging is a form of scaffolding used to organize the content. The choice of tags that get applied to content reflects the individual’s cognitive structure about the domain. Tags that are used to categorize the content are decisions about the cognitive scaffolding applied to content. Thus individuals’ bookmarking and tagging activities represent decision-making behavior.

b. Distribution – Distribution needs to be established in the social bookmarking domain. One dimension of distribution is the knowledge domains across which content is present in social bookmarking domains. This dimension of distribution also includes the representational activities of individuals regarding content, such as tags. Bookmarking behavior is not limited to any one individual and many can bookmark any piece of content. These individuals who view and evaluate content are distributed in time and across the geographical space of the web. Consequently, distribution also includes individuals who are distributed both geographically and temporally in the social bookmarking system.

c. Emergence – Emergence is the process through which popularity is developed in social bookmarking environments. Bookmarking continues over time and generates an emergent decision about the value of content. This process is non-linear and varies for each individual piece of content. In addition to generating an emergent decision about the value or popularity of a piece of content, the process also generates a folksonomy around that piece of content. All content and associated bookmarks display emergent decision-making activity.

In addition to the previous characteristics, cases in emergent distributed decision-making contexts need to display long-tail characteristics. Random sampling is not adequate since the population does not display a normal distribution for popularity. Consequently, to get a representative sample of a population that exhibits long-tail characteristics, purposive sampling of the population needs to be employed.
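The sampling argument can be made concrete with a short sketch that draws separately from the head and the tail rather than from the population as a whole. The function name `purposive_sample`, the `popular_cutoff` threshold and the toy counts are hypothetical; they stand in for whatever rule separates popular from non-popular content at the study site.

```python
import random


def purposive_sample(counts, popular_cutoff, n_head, n_tail, seed=0):
    """Draw n_head items from the head stratum and n_tail from the tail.

    counts: dict mapping item -> number of bookmarks.
    popular_cutoff: assumed threshold separating head from tail.
    """
    ordered = sorted(counts)  # fixed order so the seeded draw is repeatable
    head_pool = [item for item in ordered if counts[item] >= popular_cutoff]
    tail_pool = [item for item in ordered if counts[item] < popular_cutoff]
    rng = random.Random(seed)
    head = rng.sample(head_pool, min(n_head, len(head_pool)))
    tail = rng.sample(tail_pool, min(n_tail, len(tail_pool)))
    return head, tail


# Toy population: 3 heavily bookmarked items and 20 rarely bookmarked ones.
counts = {f"popular_{i}": 500 + i for i in range(3)}
counts.update({f"rare_{i}": 1 + (i % 3) for i in range(20)})
head, tail = purposive_sample(counts, popular_cutoff=100, n_head=2, n_tail=5)
print(head, tail)
```

A simple random draw of seven items from this population would often contain no head items at all, which is the problem that sampling by region avoids.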


Several bookmarking services are available (footnote 3) that feature different types of content with proprietary/public features. Individuals can bookmark research content and share it through services like IBM Lotus Connections, Diigo and Connectbeam. Other services provide bookmarking for users in specific regions of the world. All such sites provide bookmarking, tagging, sharing and note-taking services. Despite the number of services available, delicious.com serves as the template for most such services, and most local and regional social bookmarking services imitate this first social bookmarking service.

Delicious.com is a social bookmarking web service designed for storing, sharing, and discovering web bookmarks. On this service individuals start by creating accounts on the system (Figure 3), after which they download a toolbar that integrates with the user’s web browser and provides access to features of the website. As part of their browsing behavior individuals use this toolbar to save content as a bookmark on the social bookmarking service. This bookmark is saved as part of the user’s personal set of bookmarks and also shared with the broad community. On choosing to bookmark content the user is provided with an interface to describe the content and assign tags. Users can also bookmark and tag content that has been shared by others. Other features available to users include:

a. The service lists the most recent bookmarks by users that individuals can browse.

b. The service provides users with the ability to add users as part of their network, allowing users to track bookmarking activities of other users.

c. The service provides a listing of the most popular bookmarks. Popularity is determined by the number of bookmarks that a piece of information or content receives over a period of time. If a piece of content is bookmarked by several individuals in quick succession it moves higher on the popularity list generated by the system. The list of popular content changes on a regular basis as content comes in at a steady stream.

d. The service also provides a list of popular tags that are being used by users in the form of a “tag cloud”.

3 http://en.wikipedia.org/wiki/List_of_social_bookmarking_websites#Social_bookmarking
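The popularity listing described in item c can be sketched as a count of bookmarks per piece of content within a recent time window. This is only an illustration under stated assumptions: the function name `popular_list`, the window lengths and the sample records are hypothetical, and the actual ranking rule used by delicious.com is not documented here.

```python
from collections import Counter
from datetime import datetime, timedelta


def popular_list(bookmarks, now, window=timedelta(days=7), top_n=3):
    """Rank URLs by the number of bookmarks received in the window before `now`.

    bookmarks: iterable of (url, timestamp) pairs, one per bookmarking event.
    """
    cutoff = now - window
    recent = Counter(url for url, when in bookmarks if cutoff <= when <= now)
    # Most bookmarked first; ties broken by URL for a stable order.
    return sorted(recent.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Hypothetical bookmarking events.
bookmarks = [
    ("website_a", datetime(2007, 2, 1, 9, 0)),
    ("website_d", datetime(2007, 2, 5, 11, 50)),
    ("website_d", datetime(2007, 2, 5, 12, 30)),
    ("website_b", datetime(2006, 3, 12, 17, 10)),
    ("website_a", datetime(2007, 2, 3, 21, 0)),
    ("website_a", datetime(2007, 1, 20, 13, 0)),
]
now = datetime(2007, 2, 5, 13, 0)
last_day = popular_list(bookmarks, now, window=timedelta(days=1))
last_week = popular_list(bookmarks, now)
print(last_day)
print(last_week)
```

With a one-day window only the content bookmarked twice in quick succession appears on the list, which mirrors the description that rapid successive bookmarking moves content up the popularity list.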

Data about bookmarking behavior and tags is collected through two methods. Delicious.com provides an application programming interface (API) that allows individuals to download bookmarks and tags for pieces of content. This API is used to create a database of bookmarks, tags and individuals. The site is also screen-scraped to collect information about bookmarks, tags and individuals.

Figure 3 depicts the use process of delicious.com and also the bookmark and tag storage system. Within the delicious.com system, each bookmark made by the user is associated with the tags that are used to describe it and with date and time stamps, so that bookmarks, users and content are tracked over time. This information is compiled to generate a list of popular bookmarks.

Popular bookmarks represent those pieces of content that have been bookmarked by multiple individuals with increasing frequency. A list of popular bookmarks is published by the social bookmarking systems to represent pieces of content that many users are interested in currently.

[Figure 3, image not reproduced. Panels: (1) User Interface System: Individuals A and B visit a website and bookmark it through Toolbars A and B, producing Bookmark A (Tags A, B, C, D) and Bookmark B (Tags X, Y, B, D). (2) Delicious Bookmark Storage System: Individuals A to D, the time and date of each bookmarking event, Websites A to D, and the tags attached to each website. (3) Delicious Popular List, in decreasing order: Website D (2 bookmarks in the same day, over the period of an hour), Website A (3 bookmarks over 1 week, over several time periods), Website B (2 bookmarks over 5 months, over several time periods), Website C (1 bookmark).]

Figure 3. Social Bookmarking System Operations

1.4.2 Methodology and Research Design

This thesis uses multiple methods to investigate the research questions. It employs a combination of qualitative case study based analyses and quantitative field studies. These studies seek first and foremost to highlight salient characteristics of networks that contribute to popularity and emergent decision-making behavior through the case study and then engage in quantitative analyses of hypothesis testing associated with factors that contribute to popularity and influence decision making associated with content.

The first research question enquires into the characteristics of socio-technical networks that contribute to and distinguish the head and tail regions of the long-tail. This question requires an investigation of the process of how socio-technical networks contribute to the long-tail of popularity and the structures that these processes result in. Furthermore, the question examines a combination of contemporary and historical events, as popularity of information content is driven both by past behavior of socio-technical networks and by contemporaneous events. This combination of historical and contemporaneous data (Yin 2003) leads me to employ the case study as the analytic method for the first research question. The case study method allows me to examine the set of individual decisions about content as part of a larger emergent decision process within the socio-technical infrastructure of a social bookmarking system. Furthermore, the case study method allows me to examine the social network in combination with the technical infrastructure, as the method allows an examination of multiple elements and variables in the analysis (Eisenhardt 1989). The case study method also allows me to leverage several theoretical domains (Yin 2003), ranging from group decision-making to cognition and social network theories, to examine the mechanisms and processes present within the emergent popularity process.

The case study analysis explores the static and dynamic characteristics of the network based on the selection of specific cases. Multi-case comparison (Yin 2003) through the case study method allows me to compare and contrast the characteristics of the socio-technical networks. This allows the use of a combination of qualitative and quantitative tools to provide an in-depth examination of the phases of the network as it saturates with individual decisions regarding the case. This analysis identifies transition points in the network and develops state diagrams that aid in identifying the consequences of the introduction of new nodes and structures. The use of state diagrams in the analysis of multiple cases establishes construct validity as described by Yin (2003) by identifying the types of changes the multiple cases go through. In the case study analysis I identify cases that originate from different regions of the long-tail distribution of popularity. The sampling of the cases is for both theoretical and replicative purposes. Sampling is both purposive, in obtaining representatives of the entire pool of content, and replicative across select cases.

The multi-case study performed here is intended to elicit contrasting results (Yin 2003) for similar theoretical constructs across all cases. Therefore cases are selected from the head of the long-tail of popularity, from the tail region of the long-tail of popularity and from an intermediate region of this distribution. Cases from these regions allow comparison of the different characteristics of the networks. I also examine how these networks differ from one another over time and identify the specific characteristics that initiate popularity behavior in some cases versus others. The data collected as part of the case study may be considered archival records in the language of Yin (2003). The analysis techniques employed include pattern matching, examining rival explanations such as other network concepts, and time series analysis (Yin 2003).

Based on the results of the case study analysis, propositions as to the specific effects of network characteristics on popularity and decision-making behavior are developed (Eisenhardt 1989). The case studies provide a window into the cross-sectional and longitudinal behavior of the networks in which content is embedded. Cross-sectional and longitudinal examinations of the case content correspond closely with the within-case and cross-case pattern identification described by Eisenhardt (1989). While the case studies provide discrete samples which allow for an in-depth examination of bookmarking and tagging behavior of various individuals, they do not allow development of generalizable propositions about the effects of these network characteristics on the broad population of content.

I also employ a field study to complement the case study analysis. The field study builds on the results of the case study by evaluating the theoretical propositions of the case study and complementing the results with theoretical insights from a review of literature. The field study may also be considered a natural experiment (Shadish et al. 2002), as I employ natural variation in the sample population to create the comparison groups that distinguish regions of the long-tail distribution. The field study is conducted on data collected through the delicious.com application programming interface and screen scraping tools. In order to answer the first research question I sample 50 data points from the head of the long-tail distribution and 50 data points from the tail of the long-tail distribution. The sampling of data from both regions of the distribution serves to build on the analysis of the cases selected from those regions in the case study and to generalize the results from the case study to the population. Furthermore, the selection and analysis of more cases through quantitative techniques allows for both replication and an examination of the means through which content popularity is influenced. The data points for popular content are drawn from the head region of the long-tail distribution. The selection of data points from the tail of the distribution is performed on content that is not on the popular list at the time of data collection. A characteristic of the long-tail distribution is that the tail portion of the distribution does not display a large variance in frequency. As such, a random sampling of cases from the tail population adequately represents that portion of the distribution. The main sample for the quantitative field study is compared to an evaluation sample to determine the representativeness of the main sample used. A logistic analysis

(Cohen et al. 1983) performed on the 100 data points is used to confirm the effects of specific network characteristics that explain differences between the head and tail portions of the long-tail distribution.

The second research question inquires about the characteristics of hybrid networks that explain duration to popularity for content at the head of the long-tail distribution. This question is evaluated as part of the quantitative field study. Since the field study examines a larger sample than the case study, it provides the context within which to develop causal constructs that influence duration to popularity. The sample of 50 data points that originate from the head of the long-tail distribution, which vary in their duration to popularity, is employed to examine this question. The structural characteristics of the hybrid networks for the 50 data points at the head of the long-tail distribution are used to determine which of them influence the length of dormancy prior to the deflection point. A survival analysis (Efron 1988; Hosmer et al. 2011; Tabachnick et al. 2001) is performed on hybrid network characteristics prior to the deflection point to determine those that successfully explain the duration to popularity. The dependent construct in this analysis is a continuous variable that measures duration to popularity, and consequently survival analysis is appropriate to explain the variation in this construct. Duration to popularity represents a time-to-event variable, and time-to-event analysis is performed through survival techniques (Efron 1988). The continuous nature of the duration to popularity variable implies that the effect of independent variables and covariates on the dependent variable may be evaluated.
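To show what treating duration to popularity as a time-to-event variable involves, the following is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is an illustrative stand-in rather than the model used in this thesis: evaluating the effect of network covariates would call for a regression-type survival model, and the durations, the `observed` flags and the function name `kaplan_meier` are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Estimate the probability that content is still not popular at each event time.

    durations: days until popularity, or until observation ended.
    observed: True if popularity was reached, False if the case was censored.
    Returns a list of (time, survival_probability) pairs.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    index = 0
    while index < len(pairs):
        time = pairs[index][0]
        events = 0
        removed = 0
        # Group all cases that share the same duration.
        while index < len(pairs) and pairs[index][0] == time:
            events += 1 if pairs[index][1] else 0
            removed += 1
            index += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((time, survival))
        at_risk -= removed
    return curve


# Hypothetical durations (in days) for five pieces of content.
durations = [10, 20, 20, 35, 50]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for time, probability in curve:
    print(time, round(probability, 3))
```

Each step of the resulting curve corresponds to one or more pieces of content reaching popularity; comparing such curves across groups with different network characteristics is one simple way to see whether those characteristics are associated with shorter dormancy.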

1.4.3 Validity and Reliability Assurance

Validity is the approximate truth that can be assigned to the inferences made from this thesis. Internal validity concerns the inferences made about the relationships between the causal variables. Internal validity ensures that there are no explanations for the causal relationships other than the variables specified. One manner in which internal validity is ensured is through appropriate sampling procedures. In this thesis indirect manipulation is performed by sampling from two separate regions of the long-tail distribution. Such purposive sampling helps ensure the internal validity of the thesis. Furthermore, to rule out other causal variables such as content type, I sample from multiple domains in both the head and tail regions of the long-tail (power-law) distribution. Sampling from multiple content domains reduces the influence of domain type in this thesis.

Another form of validity is construct validity, which refers to the accuracy with which the constructs are represented in their operationalization. In this thesis the variables are constructed by relying on network tools with pre-programmed formulae for the constructs. Discriminant validity ensures that the constructs being measured are distinct and unique. Discriminant validity does not apply to this thesis since the constructs and their operationalizations represent unique characteristics of the hybrid network. While the constructs might be related, they do not overlap.

Reliability refers to the similarity in estimates of a variable if multiple measurements of it are taken. In the case of social network data, reliability is hard to establish since social networks change over time (Wasserman et al. 1994). Reliability is also ensured by examining the responses that individuals provide. In this thesis I rely on second-hand network data and consequently this issue is less relevant. In this thesis, reliability refers to the accurate representation of the network based on the data collection procedures and the valid collection of affiliation network data. I have an established manual method that is used in the collection of test data and that is complemented by a web script that automates the data collection technique. This web script is the primary technique used to collect the data and supplements the manual technique. Hybrid network matrices are generated with Excel macros, which provide a reliable manner to generate affiliation-based networks. These procedures ensure reliability in this thesis.
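The construction of an affiliation-based network matrix can be sketched as follows. This is not the Excel macro procedure used in the thesis; it is a plain Python illustration under the assumption that the collected data can be reduced to (individual, tag) pairs for one piece of content, and the names `affiliation_matrix` and `shared_tags` are hypothetical.

```python
def affiliation_matrix(pairs):
    """Build a binary individual-by-tag matrix from (individual, tag) pairs."""
    individuals = sorted({person for person, _ in pairs})
    tags = sorted({tag for _, tag in pairs})
    row = {person: i for i, person in enumerate(individuals)}
    col = {tag: j for j, tag in enumerate(tags)}
    matrix = [[0] * len(tags) for _ in individuals]
    for person, tag in pairs:
        matrix[row[person]][col[tag]] = 1  # 1 means the individual used the tag
    return individuals, tags, matrix


def shared_tags(matrix):
    """Individual-by-individual matrix counting the tags each pair has in common."""
    size = len(matrix)
    return [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(size)]
        for i in range(size)
    ]


# Hypothetical tagging data, loosely following the labels in Figure 3.
pairs = [
    ("individual_a", "tag_a"), ("individual_a", "tag_b"), ("individual_a", "tag_d"),
    ("individual_b", "tag_b"), ("individual_b", "tag_d"), ("individual_b", "tag_x"),
]
individuals, tags, matrix = affiliation_matrix(pairs)
overlap = shared_tags(matrix)
print(tags)
print(matrix)
print(overlap)
```

Because the matrix is rebuilt deterministically from the same pairs, repeating the procedure on the same collected data yields the same network, which is the sense of reliability discussed above.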

1.4.4 Summary and Discussion

This thesis enquires into the nature of popularity on social bookmarking services. Two specific questions about the nature of popularity are posed in this thesis: the first question inquires into the defining characteristics of popular vs. non-popular content on social bookmarking services, and the second question inquires into the defining characteristics that determine or explain the duration to popularity on these social bookmarking services. These questions are answered through the combined use of a case study and a field study.

Both these studies shed light on the nature of content in social bookmarking services. Specifically they highlight the changing role of tags and individuals and network structures in influencing content popularity.

Case Study Summary

A case study was conducted on four unique and distinct cases of content popularity growth which originated in different regions of the long-tail distribution. These cases are employed as representative instances of broader classes of typical content in social bookmarking. In addition to originating from different regions of the long-tail distribution, the cases also have unique characteristics in terms of their popularity, their duration to popularity, their tags and their individual and network structures. These distinct and unique characteristics allowed me to compare and contrast various types of content in terms of their popularity. The case study analysis then employed a combination of qualitative and quantitative analyses to examine the differences between the cases. Two types of quantitative analysis were performed on the cases: a structural analysis and subsequently a dynamic analysis.

All content networks are characterized by individual-tag combinations that create hybrid networks. These individual-tag combinations vary across the cases and may influence the popularity and duration to popularity. The structural analysis identifies three types of content based on the case selection, i.e. the popular, the slow popular and the non-popular. The first type of content is the popular content or the “sprinter”. Sprinters rise to popularity, and when they do so they rise very quickly. Sprinters are characterized by individual-tag combinations that are very tightly knit, whereby individuals connect to similar tags or similar combinations of tags. These connections established between individuals and tags assist individuals in developing scaffolding that structures their cognitive domain. This cognitive scaffolding is shared between individuals and forms a tight network through which understandings and decisions about content are developed and shared. The second type of content is the slow popular or the “slow winner”. This type of content rises to popularity but takes a larger amount of time than the sprinters to do so. Such slow winners take several months to rise to popularity and sometimes exhibit spurts of bookmarking activity that may be mistaken for popularity but are not so. Similar to the sprinters, the slow winners also rely on individual-tag combinations to form the hybrid network. These combinations of individuals and tags, however, do not rapidly gather new nodes. The cognitive scaffolding generated by the tags and tag combinations is unstructured, without a central nucleus or nuclei to bind them. Consequently, without a unified cognitive structure to bind the individual-tag combinations of the hybrid network, multiple disjointed individual-tag hubs dominate the long duration prior to popularity in such content. Popularity emerges in the slow winner when the disjointed individual-tag hubs develop bridges between them that integrate the disparate cognitive structures. The third type of content identified on social bookmarking services is the unpopular content or the “dead duck”. Dead ducks do not display bridges between disjointed individual-tag hubs. Dead ducks in fact display a greater number of isolated nodes and individual-tag combinations than the sprinters and the slow winners.
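The structural contrast between disjointed hubs, bridges and isolated nodes can be illustrated by counting connected components in a small individual-tag network. The sketch below is a hypothetical example in plain Python, not the network tooling used in the case study; the node labels and the helper name `components` are assumptions.

```python
from collections import defaultdict


def components(edges, nodes=()):
    """Return the connected components of an undirected individual-tag network."""
    neighbours = defaultdict(set)
    for node in nodes:
        neighbours[node]  # register nodes that may have no ties at all
    for left, right in edges:
        neighbours[left].add(right)
        neighbours[right].add(left)
    seen, result = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over everything reachable from start
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        result.append(group)
    return result


# Two disjointed individual-tag hubs plus one isolated individual.
edges = [
    ("person:a", "tag:design"), ("person:b", "tag:design"),
    ("person:c", "tag:css"), ("person:d", "tag:css"),
]
before = components(edges, nodes=["person:e"])

# A single bridging tie: person b also applies the tag used by the second hub.
after = components(edges + [("person:b", "tag:css")], nodes=["person:e"])

isolated = [group for group in after if len(group) == 1]
print(len(before), len(after), len(isolated))
```

In this toy network one bridging tie merges the two hubs into a single component, which is the kind of integration associated above with the slow winner, while the isolated node that remains corresponds to the pattern attributed to dead ducks.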

Findings from the case study support discussions in information systems and organizational literature and modify them in the context of popular content in IT mediated contexts. While Burt (Burt 1999; Burt 1997b; Burt 2000) has established the importance of “bridging nodes” and network connectivity to social capital, the findings from the case study suggest that the type of “bridging node” matters to the emergence of content popularity. Specifically, nodes of type “individual” inhibit the emergence of the content as popular while nodes of type “tag” contribute to the emergence of popular content. This is especially evident in the case of the “slow winner”, where the content rose to popularity after a node of type “tag” attained a relatively high betweenness score and a prominent position in the network. In addition to supporting and extending Burt’s work, this study also suggests that network closure, i.e. highly connected networks (Coleman 1988; Coleman 1990), followed by a weakening of the network bonds, can lead to content emerging as popular. This finding is evident in the case of the “slow winner”, where network connectedness has a high steady value but drops before the content rises to popularity. This in effect reverses the process of network closure argued by Burt (2000), who suggests that networks with structural holes move towards closure after the establishment of bridging ties. This study also argues against the benefit of highly connected networks for the spreading of information. Highly connected networks resist adding new nodes and developing new relationships. Information flow within such networks is stagnant and restricted to the existing set of nodes, and the growth pattern in such networks is slow and steady. This conclusion is evident from the analysis of the network surrounding the case of the “dead duck”.

While the findings above extend and complement existing work on network characteristics and their influence on individuals and organizations, they do so in the context of content popularity and as such need to be examined in this light. While the substantive literature has little to say about the influence of network characteristics on content popularity, it does discuss how specific network structures act to generate emergent outcomes (Barabási et al. 1999b; Katz 1993; Monge et al.). This case study in a similar vein attempts to highlight how network and node characteristics shape the emergent outcome of content popularity in online social bookmarking contexts.

Quantitative Analysis Summary

The quantitative analysis is performed on 100 unique and distinct samples of content. These cases were selectively sampled from the head and tail region of the long tail distribution. In addition to originating from different regions of the long-tail distribution, the cases also have unique characteristics in terms of their popularity, their duration to popularity, their tags and their individual and network structures.

The results show that networks in the head region of the long-tail have a greater number of clique structures than those in the tail region. Network cohesiveness, however, is not significantly different between the two. This suggests that networks in the head region are likely to have a varied group of individuals that associate with the disconnected regions of these networks but never get truly connected to the core of the network, and remain peripheral. Such clusters exist in both the head and tail regions of the long-tail distribution. The results also indicate that network cohesion does influence the duration to popularity: popular content achieves popularity within a shorter duration when the networks in which the content is embedded have low network cohesion scores.


The results also expand the findings and discussions in the research on networks, long-tail and popularity (Fleming 2007). Network cohesion has an inverse and significant relationship with duration to popularity and confirms the results of the case study which showed that high network connectedness was a predominant feature among non-popular content.

Overall, the thesis verifies several findings in the literature and expands them. With respect to the network literature, the results support Coleman’s argument (1986) on network closure and the value of tightly knit networks for information diffusion and information redundancy. Tightly knit networks are not open to new sources of information and resist change from new entrants. In contrast, less cohesive networks are able to incorporate new sources of information through the entry of individuals as brokers (Burt 1997). Burt (1992a) accordingly suggests that weak ties to a closed network by nodes allow networks to be bridged. These weak ties, when established by critical nodes, expand the network and change the nature of the network evolution.

This thesis also examined the role of a technical artifact, tags, in the evolution of socio-technical networks. The concept of socio-technical networks has been developed in the literature by Monge and Contractor (2003), Yoo et al. (2007), and others. The technical artifact in most cases has been a physical device in the form of databases, CAD systems or health information systems that mediate relationships between individuals. In this thesis I theorize on the role of a cognitive artifact in a virtual space, i.e. tags. These cognitive artifacts serve as a mode of communication and a coordination device among distributed actors, in a similar manner as Post-it notes on a fridge communicate and coordinate activities among family members. I find that these tags form the nexus around which other nodes organize themselves through high levels of centrality and connectedness.

This thesis also contributes to literature on the long-tail and popularity. The extant literature (Brynjolfsson et al. 2006, Lew 2008, Anderson 2006, Elberse 2008, Rubinson 2008, Shepherd 2008) on the long-tail has examined its characteristics, but not its generative mechanisms. This thesis theorized that socio-technical networks contribute to the mechanisms that lead to long-tail behaviors. These findings form an important contribution to the literature on the long-tail as they examine for the first time generative mechanisms rather than focusing on economic consequences.

1.5 Conclusion

In conclusion, the goal of this thesis is to examine the popularity of content through a structural lens of networks. I have briefly reviewed literature within the domains of group decision-making, cognition and network theory to examine the role that specific characteristics have on decision-making regarding information and content.

I adopt a structural perspective on decision-making regarding information and content on social media sites and web 2.0 platforms. I examine the validity of the structural approach through the use of a case study and a longitudinal field study. I have described the methods, tools and techniques employed in both studies and have summarized the findings that have emerged from them. I conclude by discussing the implications of the findings to both research and practice.

In the following sections of the thesis I proceed to describe and conduct the case study, and to set up and conduct the field study that relies on data collected from third-party websites. I conclude with a discussion of the implications of my findings, based on both the case study and the field study, for research and practice. I also discuss the limitations of this thesis in both the theoretical and the methodological domains. I discuss some of the other methodologies that might have been applied and that could illuminate the phenomenon in a different light. I also discuss the extensions that may be made to this work as part of future research.


2. Case Study: Racing to the Head: A dynamic analysis of long-tail networks in Social Bookmarking Services

2.1 Introduction

The Internet has radically changed the way we create, store, search, and consume information. YouTube, for example, has emerged as a platform where anyone with Internet access and an inexpensive digital video camera can create and upload short video clips and compete with traditional broadcasting companies. Similarly, millions of bloggers offer their news reports, commentaries, novels and poems. Indeed, the Internet has leveled the field and given millions of previously unknown individuals and obscure small entrepreneurs a platform to express their ideas, promote their products and offer their services. At the same time, however, it is known that only a disproportionately small percentage of the information available on the Internet ever reaches what is known as the “head” of the distribution (Anderson 2006a; Brynjolfsson et al. 2006); most of it remains in the “tail” and is looked at or heard of only a few times, if at all. Often dubbed “the long tail” (Anderson 2006), this represents an ordered frequency distribution, from high to low, on a set of information, content or objects (Brynjolfsson et al. 2006)4.

Studies examining such distributions have typically looked at the effect of the long-tail on the consumption of goods and information that exist at the tail region of the distribution (Anderson 2006a; Brynjolfsson et al. 2006; Elberse 2008; Rubinson 2008). Other studies have characterized populations of specific phenomena as long-tails (Lew 2008; Shepherd 2008). Very few studies have examined the factors contributing to the distribution itself, and those that have did so in very specific contexts such as innovation (Fleming 2007). What remains unknown is precisely how some content emerges in the head while other content remains in the tail. Are there any systematic differences between the content in the head and the content in the tail? In this study, we take a network perspective (Barabasi et al. 1999; Burt 1992b; Monge et al. 2003b) in answering these questions. In particular, we look at the dynamic patterns by which certain content emerges in the head while other content remains in the tail. Specifically, the primary questions that we seek to address are: What are the different dynamic patterns by which on-line content emerges (or fails to emerge) in the head region? What are the characteristics of networks that affect these dynamic patterns?

We examine these questions in the context of a social bookmarking service where individuals assess the value of a particular piece of content by independently “voting” for it, that is, by bookmarking the content using a shared bookmarking service. Content that receives a large number of “votes” forms the head of the distribution. We conduct multiple case studies, taking advantage of longitudinal archival data on five different pieces of content on the popular social bookmarking site delicious.com. Case studies allow us to approach the data in a grounded manner and perform an in-depth longitudinal analysis of specific instances in the population rather than of the population itself.

4 In the balance of the paper, we will simply focus on on-line information. However, the same basic principles apply to other forms of on-line content, products and services.

The next sections describe the literature that has examined similar questions and the theoretical development based on this literature. We then describe the method employed in this study and summarize its findings. We conclude by discussing the implications of the findings for what we currently know about the question and by comparing them with previous findings.

2.2 Literature Review

In this review of the literature I shall examine how socio-technical networks have been examined in the context of the diffusion of information and the implications of network characteristics on this process.

2.2.1 Network and Diffusion of Information

Over decades, network scholars have examined how the structures of networks contribute to information flow and social influence in various settings. A network consists of nodes and ties. Past research on networks has found that node characteristics play a very important role in the social influence process. In particular, past studies found that the centrality of a node (the degree to which a node is connected to other nodes in the network) is important as it characterizes the immediate consequence of the presence of an actor in the network. For example, early adopters of technology have greater power and centrality in organizational networks, and that centrality influences adoption and diffusion processes (Burkhardt et al. 1990). In addition, network scholars have found that a network’s overall structural characteristics, including diameter, density, and hierarchy, also play an important role in the behavior of the network (Wasserman et al. 1994). One variable of particular importance is cohesiveness, which describes the extent to which the network is connected. For example, cohesiveness influences the status of individuals in organizations (Lamertz 2006). Information and knowledge spread to a greater number of nodes if the network is connected (Nerur et al. 2005). Thus the network characteristic of cohesiveness can play an important role in influencing the region of the distribution in which the content and information is present and thus influence team performance.

Network scholars have also found that tie strength between nodes plays an important role. For example, Carpenter and Westphal (2001) find that ties among board members contribute to information flow and influence the decision-making of individuals. Furthermore, these ties result in structures that explain the similarity in organizational behaviors and the mimetic nature of organizations (Carpenter et al. 2001; Pfeffer 1972; Westphal et al. 2001). Patterns of ties result in interlocking structures that explain the diffusion of information across organizations and help reduce uncertainty through the sharing of information and pooling of resources. Similarly, organizational behavior in response to turbulent environments is based on patterns of network ties among corporate board members (Haunschild et al. 1998). Tie strength is also found to influence the speed at which information disseminates within social networks: strong ties, and kin ties in particular, tend to spread information faster than non-kin ties (Lai et al. 2002). Taken together, network studies have shown that information flow in networks is influenced by the centrality of key actors whose behaviors are mimicked by others, and that the speed of information flow in these networks is influenced by the cohesiveness of the network and the strength of ties.

2.2.2 Network and Viral Marketing of On-line Information

Recently, the popularity of on-line information has been examined in the marketing area. Researchers have started looking at the networks in which on-line information is embedded in order to understand its viral behavior (Goldenberg et al. 2001; Subramani et al. 2003). This is a departure from their traditional focus on the characteristics of the product itself (Stafford 1996). The intuition behind this move is that relationships in networks provide a form of validation that signals the value of that product, information or content. For instance, interpersonal networks are especially important when certain types of innovations and products are being marketed (Beattie 2002), especially where the direct experience of adoption is less important (Lai et al. 2002). Furthermore, Steyer et al. (2006c) find that network characteristics rather than critical mass effects are important for the continuing diffusion of information among consumers in word-of-mouth networks on the Internet.

As the diffusion of content in such networks relies on individuals to a large extent, a careful initial seeding of information is important to on-line viral dissemination (Phelps et al. 2005). Vilpponen et al. (2006) found that the centrality of the initial target node in the network and the strength of the tie influence the diffusion of on-line information. Furthermore, studies found that the characteristics of influencers play an important role in viral marketing as well (Subramani et al. 2003). In addition to central individuals, secondary and tertiary customer strings also contribute towards the spread of information (Hogan et al. 2005). In summary, the key to the effective dissemination of on-line information lies in the hands of a few central individuals and the strength of their relationships.

In addition to network characteristics, cues in information and social environments such as credibility indicators or seller-buyer ratings can also influence the diffusion of information. For example, credibility indicators of content such as ratings are indicators of quality; they shape the search for and evaluation of that content and serve as cues in the content space (Poston et al. 2005). Ratings of buyers and sellers on e-commerce sites influence the number of customers communicating with them, and negative ratings were more influential in shaping behavior than positive ratings (Standifird 2001). Indeed, the success of rating systems as effective indicators of quality determines the success of such e-commerce based stores (Komiak et al. 2008).


Although these studies based on a network perspective have provided useful insights into the viral nature of on-line information diffusion, they do not address the problem of the long-tail. The phenomenon being addressed distinguishes content that succeeds from that which fails. While the marketing literature examines successful viral campaigns, it does not address which characteristics of networks distinguish a successful campaign from a failed one. Nor is this distinction addressed by the literature on networks, which typically examines only one form of content and information. Furthermore, viral marketing focuses on the role of word-of-mouth networks in traditional media based campaigns and not on the role of information technology mediated networks. Finally, the causal mechanisms that contribute to the phenomenon of content popularity and the long-tail have not been examined in the literature. Most of the research done in this domain examines the implications of the long-tail and content popularity. In contrast, this research focuses on the mechanisms and structures that contribute to and generate the long-tail distribution. We now discuss the theoretical concepts that we use in our case analysis. Appendix A describes a few of the theoretical concepts discussed in this paper.

2.3 Long-tail in Social Bookmarking Services

Long-tail distributions for on-line information can be seen on sites like Amazon.com, where purchase decisions by individuals are influenced by previous purchases by others (Anderson 2006a; Brynjolfsson et al. 2006). A long-tail distribution thus emerges from distributed individual choices. The emerging pattern of these distributed decisions can be depicted as a histogram. The long-tail distribution can be segmented into two regions: a head that contains objects with a high frequency and a tail that contains objects with lower frequency (Brynjolfsson et al. 2006).

In this study, we focus on the long-tail distribution in social-bookmarking services that are used in the identification, adoption and classification of on-line information by distributed individuals. Traditionally, classification systems are repositories for ordered information generated by the members of a community. The act of classification itself is the sorting of objects, artifacts, and ideas into distinct buckets (Bowker et al. 1999). In addition to ordering the world, classifications are also cognitive devices through which communication is enabled (Star 2002). Thus classifications are decision support devices, as they support the coordination of activities of distributed actors.

Tagging and folksonomy are concepts used to describe the open-ended classification systems that are used in social-bookmarking services. A folksonomy is the classification scheme that is generated by a distributed group of individuals for a knowledge domain. Folksonomies rely on ‘tags’, or user descriptions of information or knowledge, as the classification scheme. The process of assigning tags to information is known as ‘tagging’. Because folksonomies are generated by distributed individuals sharing a domain, the tags that they can use to describe the domain and its pieces of content are diverse and open-ended.

Folksonomies relate tags to an individual’s understanding of the knowledge domain. Folksonomies have value as they are shared and co-constructed within a community. The folksonomy emerges as multiple individuals participate in creating a shared and emergent topological classification scheme.

Tags used to construct folksonomies are cognitive repositories for the individuals making decisions. Such cognitive devices have been considered as technological artifacts (Argyres 1999b) in the literature dealing with distributed cognition (Hutchins 1991; Hutchins et al. 1996) and information classification (Bowker et al. 1999; Star 2002). These cognitive artifacts persist over time and influence individual behavior. As cognitive artifacts, “tags” are interpretations by individuals and serve as extrinsic information cues to the nature of the content (Herbig 1996). As they are a type of classification, they mediate the relationships between the social actors that create them. Similar to classification schemes that are generated through communication within and across communities of practice, folksonomies are also generated through communication. The communication takes place through the articulation of individual taxonomies by the individuals in the community. This process takes place over time as individual members articulate, apply, evaluate and re-articulate the emergent folksonomy.

The information space in a social bookmarking service comprises individuals who use tags to make sense of and systematize this space. As a consequence, the information space of a social bookmarking service inherently forms a multi-modal network, consisting of individual actors and tags as nodes, and the ties among them. The multi-modal network is based on the communication among individuals through these tags. Thus, the formation of ties in this network is mediated through the tags. This multi-modal network in social bookmarking systems is a form of affiliation network (Monge et al. 2003b). An affiliation network is formed by the participation of individuals in organizations, events and other social activities. Their participation creates relationships based on the overlapping occurrence of the same individuals at different events. Analogous to such affiliation networks, we can construct a multi-modal network of the information space in a social bookmarking service based on the affiliation-based relationships that individuals have with the tags used for on-line information.

We characterize the combination of the social actors and technical artifacts as a multi-modal network. By multi-modal networks I refer to networks that have a combination of social actors and technological nodes. Social actors are nodes that make decisions about content. These actors are of the type typically represented in formal social network analysis. By technological nodes we refer to “tags” or the structure of a classification scheme. The characteristics of these multi-modal networks influence the decision-making behavior of individuals and we explore node and network characteristics as explanatory factors in the emergence of long-tail behavior.
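As an illustration only, the construction of such a multi-modal network can be sketched in Python. The record format (an individual paired with a list of tags) and the function names `build_two_mode_network` and `shared_tag_pairs` are hypothetical and are not taken from the study's data or tools; the sketch simply shows individuals and tags as two node types, with ties between individuals arising only through shared tags.

```python
from collections import defaultdict


def build_two_mode_network(bookmarks):
    """Build an undirected individual-tag network.

    bookmarks: iterable of (individual, tags) records (assumed format).
    Nodes are labelled ("individual", name) or ("tag", name) so that the
    two node classes never collide.
    """
    adjacency = defaultdict(set)
    for individual, tags in bookmarks:
        person = ("individual", individual)
        adjacency[person]  # register the individual even if no tag is used
        for tag in tags:
            tag_node = ("tag", tag)
            adjacency[person].add(tag_node)
            adjacency[tag_node].add(person)
    return dict(adjacency)


def shared_tag_pairs(adjacency):
    """Return pairs of individuals who are affiliated through a common tag."""
    pairs = set()
    for node, neighbours in adjacency.items():
        if node[0] != "tag":
            continue
        people = sorted(name for _, name in neighbours)
        for i, first in enumerate(people):
            for second in people[i + 1:]:
                pairs.add((first, second))
    return pairs


# Hypothetical records, for illustration only.
example = [("u1", ["privacy", "tools"]), ("u2", ["privacy"]), ("u3", [])]
network = build_two_mode_network(example)
```

In this toy input, "u1" and "u2" become related only because both use the tag "privacy", which mirrors the tag-mediated tie formation described above.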

Node Class: Node class refers to the types of nodes that are present in the network and the role they have in shaping the movement of specific content to popularity. Two specific types of nodes are present in the network. The first type of node is the individual. The individual is the bookmarker and poster of content on the social bookmarking site. The other type of node is the technology node embodied by the tag. The tag is the technological artifact that individuals use to articulate and convey their cognition. The relative network characteristics of these two types of nodes influence the rate of change for the content and the region of the long-tail distribution that the content is situated in.

In the next section I examine the contributory role of these node and network characteristics in the long-tail of content through the methodology of case study.

2.4 Research Methodology

We use the case study method to answer the research questions. Case studies are an effective tool in the study of networks over time, since the collection of quantitative data about longitudinal networks poses several challenges (Wasserman et al. 1994). Longitudinal network studies suffer from the loss of respondents over time, insufficient responses in initial and subsequent periods, and a limited number of analytic techniques to analyze the data. To mitigate these challenges, longitudinal studies take the form of simulations where the real network data serve as base parameters for the simulation (Carrington et al. 2005). Longitudinal network studies (Fowler et al. 2008; Kossinets et al. 2006; Snijders 2002) also either examine an instantiation of a longitudinal network and the effect of its characteristics on certain outcomes, or focus on the population of networks and extrapolate specific characteristics of these populations. These challenges argue for the use of more detailed data-based methodologies, which the case study methodology provides (Yin 2003).

This study examines instantiations of networks rather than the population of networks. It focuses on the characteristics of each instantiation and relates them to the long-tail outcome. The goal of this study is to gain an in-depth understanding of the characteristics of the network that contribute to the emergence of popular content. Case studies lend themselves to this purpose as they enable researchers to ground themselves in the data and perform an in-depth analysis (Eisenhardt 1989) on a sample small enough to allow them to reach theoretical generalizations that can later be extended to larger data sets or the population (Yin 2003). This form of analysis is also similar to that found in clinical case studies (Hersen 2002). The goal of clinical case studies is to demythologize what happens during a case and to present to readers what really happens during the clinical process. Based on the case presentation, the clinician discusses assessment and analytic strategies and how they fit into a larger theoretical scheme (Hersen 2002). Clinical cases are also used as teaching tools for reflection and discussion, which are subsequently compared with the treatment or service provided.

The social bookmarking service that serves as the study site for this paper is del.icio.us (http://del.icio.us). This site tracks the bookmarking behavior of individuals and associates that behavior with specific content and tags. Individuals make multiple bookmarks in the social bookmarking system, where the act of bookmarking is a determination made by the individual as to the value of the content. These bookmarks represent a choice of select pieces of content among many others. Thus bookmarks made by individuals in the social bookmarking system represent a decision on the value of the content to the individual. Emergent distributed decision-making behavior is visible in the materialization of collective decisions on tags and content through the popular lists.

A data set of 18 cases was compiled to confirm the long-tail nature of content on the social bookmarking site. These cases represented a broad sample from the long-tail distribution, and based on the representational distribution of these cases I conclude that the population of content on the social bookmarking site is part of this distribution. The exponential distribution of the long-tail for frequency of bookmarking is depicted in Figure 4.

Figure 4. Long-tail distribution of bookmarks

From this sample of 18 cases I used systematic and representative sampling to generate the list of cases for the purposes of this study. Sampling for the case studies is based on 2 characteristics: the region of the tail the content comes from and the rate of change as it moved to popularity. The first of these characteristics is the dependent variable used in the study and emerges from the long-tail distribution. The dependent variable in this study is popularity, and its operationalization is based on the region of the long-tail distribution that the content is present in. Thus the sampling strategy operationalizes the dependent variable in this study, which is content popularity.

The sampling strategy can be operationalized as a decision tree, depicted in Figure 5. The decision tree depicts the broad area of the distribution that the population of content can be present in.

Content
    Popular Content (High region)
        Short duration (fast): Case 1, 2
        Long duration (slow): Case 3
    Unpopular Content (Low region): Case 4

Figure 5. Decision tree for case selection

The region of the distribution that the content comes from is characterized as “low” or “high”, where the “high” region is the 20% that represents the head of the distribution and the “low” region is the remaining 80% that did not achieve popularity. The rate of change characterizes the period over which content moves towards popularity. The rate of change can be either “fast” or “slow” as content moves to popularity.
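The region assignment can be illustrated with a short Python sketch. It assumes that the 20% refers to the top-ranked 20% of content items by bookmark count, which is one possible reading of the description above; the function name `classify_regions` and the sample counts are hypothetical.

```python
def classify_regions(bookmark_counts, head_percent=20):
    """Label each content item "high" (head) or "low" (tail).

    bookmark_counts: dict mapping a content item to its bookmark count.
    head_percent: share of items, ranked by count, treated as the head
    (assumption: the 20% in the text is a share of items).
    """
    # Rank by descending count; ties are broken by item name for determinism.
    ranked = sorted(bookmark_counts, key=lambda item: (-bookmark_counts[item], item))
    # Integer ceiling of len(ranked) * head_percent / 100, avoiding float error.
    head_size = (len(ranked) * head_percent + 99) // 100
    return {
        item: ("high" if rank < head_size else "low")
        for rank, item in enumerate(ranked)
    }


# Hypothetical bookmark counts for ten pieces of content.
counts = {"a": 100, "b": 50, "c": 20, "d": 10, "e": 5,
          "f": 4, "g": 3, "h": 2, "i": 1, "j": 1}
regions = classify_regions(counts)
```

The rate of change ("fast" or "slow") is not included in the sketch because no numeric threshold for it is given here.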

For the purpose of this study I examine 4 cases from the distribution to answer the research question. These cases were chosen as representative of the specific characteristics. The cases are listed in Table 2.


Table 2. Case Categories and Description

| Case Category | Content Title | Content Description |
| Sprinter | Big Brother Protect (BBP) | Set of software tools for protecting individual privacy on the WWW |
| Sprinter | Firefox Memory Leak Fix (FML) | A technical description of how to correct memory leaks in the Firefox web browser |
| Slow Winner | Greek God Family Tree (Ludios) | A parentage tree for Greek gods |
| Dead Duck | Camcorder Info | A site providing camcorder reviews |

I use contractions of the case names such as FML, BBP, and Ludios as I discuss the cases in the remainder of this paper. I have selected one case each for two branches of the decision tree, the unpopular (dead ducks) and the popular but slow climbers (slow winners), and have selected 2 cases for the quadrant of the popular (sprinters). While I argue that network characteristics account for differences in rate of change and region of the long-tail distribution, these differences may also be due to exogenous factors such as the timing of content introduction. To reduce the influence of exogenous factors, I use 2 cases for the sprinters, as the selection of cases at different points in time helps mitigate exogenous environmental factors.

The research question for this study primarily addresses the difference between content that falls within the head and content that falls within the tail of the long-tail distribution. While some of the cases fall in the other quadrants, the analysis does not specifically discuss the differences across those cases, although it might highlight some of them.

2.4.1 Analytic Method


I perform an exploratory analysis of these cases to examine the characteristics of networks that contribute to the emergence of popular content. In this exploratory analysis I look at node and network characteristics and their change to explain differences in popularity. Specifically, I look to explain the popularity of content and the rate at which this popularity emerges. I employ several network measures from the literature to examine these cases. The cases were first analyzed using measures of characteristics that were dominant in the literature (Wasserman et al. 1994). Measures of node and network centrality have been used in the literature to explain opinion formation (Burt 1999) and information diffusion in groups (Buskens et al. 1999; Cavusoglu et al. 2010; Romero et al. 2011), and thus a centrality analysis can shed light on the popularity of content. Measures of node characteristics such as degree centrality and of network characteristics such as density were employed before converging on the characteristics described below. Characteristics were discarded or selected based on the extent to which they were theoretically justified (Yin 2003) and served to answer the research question.

Node Characteristics: The operationalization that I choose for node characteristics is betweenness. Betweenness measures the position of the node in relation to the other nodes that exist on the network. Nodes that lie on paths that connect sections of the network together have a higher betweenness score, indicating that they have more power and influence in the network. I use this measure rather than degree centrality since it provides a relative measure of centrality, whereas degree provides an absolute measure of node connectivity.
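For readers unfamiliar with the measure, a minimal sketch of unnormalized betweenness for an undirected, unweighted network is given below. It follows the standard shortest-path (Brandes-style) computation and is offered as an illustration under those assumptions; it is not a description of the software used to produce the scores reported in this study, and the function name `betweenness_centrality` is hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized shortest-path betweenness for an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours;
    every neighbour must itself be a key of the dict.
    """
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_counts = dict.fromkeys(adjacency, 0)
        path_counts[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    path_counts[neighbour] += path_counts[node]
                    predecessors[neighbour].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = dict.fromkeys(adjacency, 0.0)
        while order:
            node = order.pop()
            for pred in predecessors[node]:
                share = path_counts[pred] / path_counts[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    # Each unordered pair of endpoints was visited from both ends.
    return {node: score / 2.0 for node, score in scores.items()}


# Hypothetical example: one tag linking three individuals.
star = {"privacy": {"u1", "u2", "u3"},
        "u1": {"privacy"}, "u2": {"privacy"}, "u3": {"privacy"}}
star_scores = betweenness_centrality(star)
```

In the toy example the tag lies on every shortest path between the three individuals, so it alone receives a non-zero score, which is the sense in which a bridging node is said to connect sections of the network.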

Node Class: Node class refers to the two specific types of nodes present in the network. The first type of node is the individual. The individual is the bookmarker and poster of content on the social bookmarking site. The other type of node is the technology node embodied by the tag. The tag is the technological artifact that individuals use to articulate and convey their cognition. The relative network characteristics of these two types of nodes influence the rate of change for the content and the region of the long-tail distribution that the content is situated in.

Network Characteristics: I operationalize network structure through the use of the connectedness measure. Connectedness describes the extent to which the different parts of the network are linked to each other. A higher connectedness measure suggests that the network is connected to a greater extent and that a greater number of paths exist between nodes in the network.
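One common way to compute such a measure is as the proportion of node pairs that can reach each other through some path. The sketch below uses that definition as an assumption, since the exact formula is not stated here; the function name `connectedness` and the example network are hypothetical.

```python
def connectedness(adjacency):
    """Share of node pairs joined by at least one path (0.0 to 1.0).

    adjacency: dict mapping each node to the set of its neighbours in an
    undirected network; every neighbour must itself be a key.
    """
    node_count = len(adjacency)
    if node_count < 2:
        return 0.0  # no pairs exist, so nothing can be connected
    visited = set()
    reachable_pairs = 0
    for start in adjacency:
        if start in visited:
            continue
        # Collect the connected component that contains `start`.
        component = {start}
        frontier = [start]
        while frontier:
            node = frontier.pop()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    frontier.append(neighbour)
        visited |= component
        size = len(component)
        reachable_pairs += size * (size - 1) // 2
    total_pairs = node_count * (node_count - 1) // 2
    return reachable_pairs / total_pairs


# Hypothetical early-stage network: four individuals, each with one own tag.
isolated = {}
for i in range(4):
    isolated[f"u{i}"] = {f"tag{i}"}
    isolated[f"tag{i}"] = {f"u{i}"}
early_score = connectedness(isolated)
```

Under this definition a network made of several isolated individual-tag clusters scores close to zero, and the score rises towards one as individuals attach to shared tags.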

The characteristics discussed above are developed for several points in time as the cases moved to popularity. In the case of the “sprinters”, these network characteristics were developed for all days due to the relatively short duration over which the content moves to popularity. Thus for the cases of BBP and Firefox memory leak, I develop the measures for each day as the content moved to popularity. However, in the cases of the “slow winners” and the “dead ducks”, the time points were chosen based on sampling increments of 25% over the duration measured. While the specific time points are not at exactly 25% increments, they are the time points with bookmarking activity closest to the 25% increments.
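The selection of time points nearest to the 25% increments can be sketched as follows. The input format (day offsets on which bookmarking activity occurred) and the function name `sample_time_points` are assumptions made for illustration.

```python
def sample_time_points(activity_days, fractions=(0.25, 0.50, 0.75, 1.00)):
    """Pick, for each fraction of the observed duration, the nearest day
    on which bookmarking activity actually occurred.

    activity_days: non-empty list of day offsets with bookmarking activity.
    """
    first, last = min(activity_days), max(activity_days)
    duration = last - first
    chosen = []
    for fraction in fractions:
        target = first + fraction * duration
        # Ties are resolved in favour of the earlier day.
        nearest = min(activity_days, key=lambda day: (abs(day - target), day))
        chosen.append(nearest)
    return chosen


# Hypothetical activity days over a 100-day observation window.
points = sample_time_points([0, 3, 10, 26, 49, 52, 77, 100])
```

With the hypothetical input above, the targets fall on days 25, 50, 75 and 100, and the function returns the active days closest to each target.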

2.4.2 State Diagrams

In addition to the use of node and network characteristics to analyze the cases, I use state diagrams (Ali et al. 2001; Kim et al. 1999) to describe the socio-technical nature of the social bookmarking system. State diagrams are typically used to describe the behavior of systems. The major components in such diagrams are the states through which the objects or elements of the system pass. Another component is the transitions between states, represented by directed arrows. Thus the states and the transitions together represent broad structural and dynamic features of the system. While typically only one object is represented in a state diagram, the analysis below brings together several objects that exist in the space of a social-bookmarking system. The steady state of the social bookmarking system and its objects is depicted in figure 6.


[State diagram: Steady State of a Bookmarking System. Objects and states: Content [bookmarked], Individual [bookmark], Tag [assign], Network [Unconnected]. Transition labels: Enter, Register, Login; Bookmarks; Describe; Uses.]

Figure 6. Steady State of Social Bookmarking System

We see from the steady state that there are 4 objects present in this socio-technical system: tags, individuals, content and the network. Tags refer to the descriptive identifiers that are used to label and categorize content. Individuals are the actors that perform bookmarking activities in the system, and Content refers to the information that is present in the system and on which the bookmarking activities are performed. The steady state of the system starts with the user entering, registering and logging into the social bookmarking system. Subsequently, the individual bookmarks the content and uses tags to describe the associated content. The Individual and Tag states are recursive, since the bookmarking activity of any individual generates additional bookmarking activity in the system, and the use of any tag generates further use of that tag by individuals and the creation of new tags. The states of the various objects that are part of the system are depicted below in figure 7.


| Content States | Content [bookmarked] | Content [bookmarked, popular] | Content [bookmarked, unpopular] |
| Individual States | Individual [bookmark] | Individual [bookmark, central] | Individual [bookmark, peripheral] |
| Tag States | Tag [assign] | Tag [assign, central] | Tag [assign, peripheral] |
| Network States | Network [Unconnected] | Network [Increasing Connectedness] | Network [Connected] | Network [Highly Connected] |

Figure 7. State of various Objects in the Social Bookmarking System

I now describe these 4 cases and subsequently characterize the dynamics of the networks the content is embedded in as they moved towards popularity through the use of state diagrams.

2.5 Case Descriptions

2.5.1 Case 1: Big Brother Protect (Suite of Security Tools)

Big Brother Protect (BBP) is a suite of security tools that is freely available on the Internet. This piece of content was introduced to the social bookmarking site delicious in May ‘06. This set of software tools and services was made freely available to the average individual who was concerned about their privacy and wished to anonymize themselves on the Internet. This piece of content took 7 days to emerge as popular on the social bookmarking website, and the evolution of the socio-technical network over those 7 days is depicted below in figure 8. The different time periods for the case are labeled T0-T6. This case belongs to the bottom right quadrant, that is, it is a “sprinter”. It achieved popularity very quickly, i.e. in 7 days, and belongs to the head region of the long-tail distribution.

[Network diagrams: multi-modal network for BBP at time periods T0, T1, T2, T3, T4, T5 and T6]

Figure 8. Big Brother Protect

The images in figure 8 display the evolution of the multi-modal network for BBP over the period of 7 days as it moves towards popularity. This network reaches popularity in a week and does so at an exponential rate. It starts out with the creation of a bookmark about the content by a user. The user also has a choice to assign tags to describe that piece of content. Over the next 3 time periods, T1 to T3, this individual is joined by others in describing the same piece of content using different sets of tags, which results in 4 individuals isolated from each other using 4 separate sets of tags. The individuals in the above images are depicted as “red circles” and the tags are depicted as “maroon squares”. At T3 we have 4 separate sets of individual-tag combinations as the foundation for the multi-modal network to develop around. At T4 we see that there is substantial development in the network as individuals attach to specific tags and broaden the existing individual-tag network. T5 shows the exponential and rapid growth of the individual-tag network, with a rapid infusion of individuals bookmarking the content and developing the individual-tag multi-modal network, which continues in period T6. T5 and T6 also show that the network has stabilized around certain central tags and individuals in terms of the core-periphery structure of the network. The network, which was dynamic and in flux until period T4, stabilized rather quickly as a core network of individuals and tags was established.

Node Characteristics

An examination of degree centrality for the period T0-T4 is not informative since the network is to a large extent disconnected. Nodes (tags and individuals) are not connected to each other, which makes the degree centrality measure meaningless. Degree centrality for the nodes in periods T4-T6 increases over this period; the increase is substantial between T4 and T5 and slower from that point on.

Betweenness as a measure of centrality gives a more meaningful indication of the connectivity of nodes in the network. Betweenness in the network is established for T4, T5 and T6, when large parts of the network are connected. This provides greater validity than in earlier periods, when the network is disconnected.

Table 3. Betweenness measures for BBP

Day 4                                       Day 5                                       Day 6
Node                          Betweenness   Node                          Betweenness   Node                          Betweenness
Privacy                       0.562         privacy                       21.473        privacy                       23.53
Encryption                    0.444         encryption                    11.1          encryption                    12.301
Security                      0.292         Security                      10.101        Security                      11.484
tools                         0.24          tools                         5.891         tools                         6.622
del.icio.us/darkshadow46236   0.212         web                           2.72          del.icio.us/janboi            3.826
del.icio.us/shoroco           0.17          del.icio.us/majella77         2.044         web                           2.857
del.icio.us/stanklees         0.101         mobile                        2.025         mobile                        2.229
tech                          0.093         usb                           1.817         usb                           1.876
usb                           0.088         firefox                       1.554         del.icio.us/majella77         1.702
hacks                         0.085         del.icio.us/gurutc            1.423         firefox                       1.598

The table above shows the betweenness of the top 10 nodes for days 4, 5 and 6. We see that both individuals and tags contribute to the betweenness of the network, with tags dominating the list. The top 4 nodes are tags and consistently remain the nodes with the highest betweenness scores. However, we see that actors and tags exchange positions among the significant betweenness scores over days 4 to 6. Actor “darkshadow46236” has a significant betweenness score as the network starts to grow on day 4, but over the next 2 days his score diminishes in importance as other actors establish connections and start to play central roles in connecting the network together. Similarly, “shoroco” and “stanklees” also move from central roles to lesser ones in the multi-modal network. Actors such as “majella77” and “gurutc” play a greater role in the betweenness of the network on days 5 and 6 as the network saturates. We see one of the early individuals in the formation of the network, “stanklee”, continue to play a significant role in the network over days 4-6. Thus, while degree centrality as a measure of node significance shows tags emerging as significant nodes with a growing number of attachments, betweenness centrality highlights the importance of nodes relative to other nodes as the network grows.
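The betweenness scores discussed above can be illustrated with a minimal sketch. It computes raw (unnormalized) shortest-path betweenness for a small two-mode network of individuals and tags using Brandes' algorithm. The node names and ties below are hypothetical and are not taken from the BBP data, and the scaling applied to the scores in Table 3 is not stated in this section, so the numbers here are not comparable to the table.

```python
from collections import deque


def betweenness(edges):
    """Raw betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    score = {n: 0.0 for n in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths (sigma).
        order = []
        preds = {n: [] for n in adj}
        sigma = {n: 0 for n in adj}
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {n: 0.0 for n in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {n: s / 2 for n, s in score.items()}


# Hypothetical individual-tag ties: user b uses two tags and so bridges them.
ties = [
    ("user:a", "tag:privacy"),
    ("user:b", "tag:privacy"),
    ("user:b", "tag:security"),
    ("user:c", "tag:security"),
    ("user:d", "tag:tools"),  # isolated individual-tag pair
]
scores = betweenness(ties)
```

In this toy network the shared tags and the individual who uses both of them receive non-zero scores, while the isolated individual-tag pair scores zero, which mirrors why betweenness is uninformative while the network is still disconnected.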

Node Class

The two classes of nodes, tags and individuals, have differing roles in the network depicted. An examination of the diagrammatic representation of the network over the seven days reveals that the tags play a significant role as central actors in the network. This is evident from an examination of centrality measures as well, since tags represent over 70% of the high betweenness nodes over days 4, 5 and 6. Tags also dominate as the nodes that have high values for degree centrality. In period T4, tags comprise 10 of the top 20 nodes with the highest betweenness centrality. This proportion increases in periods T5 and T6.

There is, however, movement within the tags themselves, as tags that are more general dominate over those that are more specific. Tags such as “firefox”, which relate the toolset to a specific browser, are relegated to secondary status, while tags such as “Security” are more general and describe the toolset in broader terms.
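The share of tags among the high betweenness nodes can be checked directly against the top-10 lists in Table 3. The sketch below does this for days 4 and 5. Treating every node whose name starts with `del.icio.us/` as an individual is an assumption made for illustration, based on how nodes are labeled in the table.

```python
# Top-10 betweenness nodes for days 4 and 5, as listed in Table 3.
day4_top10 = [
    "Privacy", "Encryption", "Security", "tools",
    "del.icio.us/darkshadow46236", "del.icio.us/shoroco",
    "del.icio.us/stanklees", "tech", "usb", "hacks",
]
day5_top10 = [
    "privacy", "encryption", "Security", "tools", "web",
    "del.icio.us/majella77", "mobile", "usb", "firefox",
    "del.icio.us/gurutc",
]


def is_individual(node):
    # Assumption: individuals appear as del.icio.us user URLs, tags as bare words.
    return node.startswith("del.icio.us/")


def tag_share(nodes):
    """Fraction of the listed nodes that are tags."""
    tags = [n for n in nodes if not is_individual(n)]
    return len(tags) / len(nodes)


day4_share = tag_share(day4_top10)
day5_share = tag_share(day5_top10)
```

Under this labeling, tags make up 7 of the 10 listed nodes on day 4 and 8 of the 10 on day 5.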

Network Characteristics

The table below provides the connectedness measure for network connectivity over time.

Table 4. Network Connectedness measures for BBP

                day0     day1     day2     day3     day4     day5     day6
Connectedness   0        0        0.0001   0.0001   0.018    0.5276   0.5872

The network characteristics indicate that there is a significant increase in the connectedness of the network from day 4 to day 5. This increase in connectedness can be traced to the increase in betweenness scores of a few key nodes in the network to which a number of individuals connected. Specifically, the increased betweenness of those nodes subsequently influenced the connectedness of the network, as reflected in the measures. Thus the primary factor in moving this content to popularity was the presence of a node or a set of nodes that bridged critical paths in the network and increased network connectedness.
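The effect of a bridging node on connectedness can be sketched as follows. This section does not state how the connectedness measure was computed, so the sketch assumes one common definition: the fraction of unordered node pairs that can reach each other. The node names and ties are hypothetical.

```python
def connectedness(nodes, edges):
    """Fraction of unordered node pairs that lie in the same component.

    Assumed definition for illustration; the measure behind Table 4 may differ.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    sizes = {}
    for n in parent:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1

    total = len(parent)
    if total < 2:
        return 0.0
    reachable_pairs = sum(s * (s - 1) // 2 for s in sizes.values())
    return reachable_pairs / (total * (total - 1) // 2)


# Hypothetical network: a five-node chain plus an isolated individual-tag pair.
nodes = ["user:a", "user:b", "user:c", "user:d",
         "tag:privacy", "tag:security", "tag:tools"]
ties = [
    ("user:a", "tag:privacy"),
    ("user:b", "tag:privacy"),
    ("user:b", "tag:security"),
    ("user:c", "tag:security"),
    ("user:d", "tag:tools"),
]
before = connectedness(nodes, ties)
# user:d now also uses an established tag, bridging the two components.
after = connectedness(nodes, ties + [("user:d", "tag:privacy")])
```

A single tie from the isolated individual to an already central tag joins the two components, and the measure rises from 11/21 to 1.0 in this toy case.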

2.5.2 Case 2: Firefox Memory Leak (A solution guide)

This case is that of “Firefox Memory Leak” (FML), a problem associated with the popular web browser Firefox. This web browser had a problem that made it consume increasing amounts of memory over time as the application continued to run. This problem led several browser users to uninstall the application. However, a few evangelists of Firefox developed a work-around for this problem by supplying information to users about how to control the memory consumption of the browser. This piece of content described the method for doing so. This piece of content took 6 days to emerge as popular on the social bookmarking website, and the evolution of the socio-technical network over those 6 days is depicted below. This case belongs to the bottom right quadrant, that is, it is a “sprinter”. It achieved popularity very quickly, in 6 days, and belongs to the head region of the long-tail distribution.

[Figure 9 panels: network snapshots for periods T0, T1, T2, T3, T4, T5 and T6]

Figure 9. Firefox Memory Leak

Figure 9 displays the evolution of the multi-modal network for Firefox Memory Leak over the period of 7 days as it moves towards popularity. This network reaches popularity in a week. As we see, at time T0 several users bookmarked the piece of content and added several tags describing its function. In the above images the individuals are depicted as “red circles” and the tags as “maroon squares”.

Node Characteristics

As with the previous case, the degree centrality of tags outweighed that of the individuals. Over the period of a week, tags accumulated a greater number of relations than individuals. This was expected, since the individuals establish relationships with the tags and the network connects through them. Again, we look to betweenness to explain the significance of specific nodes in the network. The table below describes the betweenness scores for the 7 days that it took for this content to reach popularity.

Times 3, 4 and 5 are excluded since no nodes were added to the network on those dates. The betweenness scores show us that tags play an important role in connecting the various regions of the network, with a few tags such as “firefox”, “memory”, “tips” and “Hacks” playing a significant role as the network continues to evolve. However, we also see that the relationships that the individuals establish with specific tags change their significance in the network. We see that users “blndcat”, “5ndime” and “Hervey” establish a centrality in the network that prevails as the content moves to popularity. They do this by establishing connections to the appropriate tag combinations.

Table 5. Betweenness measures for FML

Day 0                                         Day 1                                         Day 2
Node                            Betweenness   Node                            Betweenness   Node                            Betweenness
reference                       0.011         firefox                         0.08          firefox                         0.155
http://del.icio.us/iketola      0.008         reference                       0.027         reference                       0.028
firefox                         0.006         http://del.icio.us/anusharaji   0.019         http://del.icio.us/anusharaji   0.023
memory                          0.006         http://del.icio.us/Runky.Funky  0.015         http://del.icio.us/Runky.Funky  0.02
http://del.icio.us/donghaima    0.005         memory                          0.015         memory                          0.019
http://del.icio.us/mcdave       0.005         http://del.icio.us/buddydvd     0.01          http://del.icio.us/buddydvd     0.012

Day 6                                         Day 7
Node                            Betweenness   Node                            Betweenness
firefox                         27.537        firefox                         40.841
memory                          2.214         memory                          3.443
tips                            1.613         tips                            2.641
Hacks                           1.365         Hacks                           2.296
http://del.icio.us/5ndime       1.069         http://del.icio.us/blndcat      2.037
tweak                           1.063         tweak                           1.519
http://del.icio.us/Hervey       0.883         http://del.icio.us/5ndime       1.139
browser                         0.787         optimization                    1.042
optimization                    0.675         browser                         1.04
http://del.icio.us/menyx        0.612         http://del.icio.us/Hervey       1.015

Node Class

Over the period T1 to T6 we see a steady increase in the number of users bookmarking this piece of content. We see from the images that the network is primarily connected through a few specific tags. T5 and T6 show the exponential and rapid growth of the individual-tag network, with a rapid infusion of individuals bookmarking the content. T5 and T6 also show that the network has stabilized around certain central tags and individuals in terms of the core-periphery structure of the network. Again we see that in this network nodes of class “tag” dominate as the primary means through which the network is connected. We see a few nodes of type “individual” with betweenness scores indicating that they are significant in connecting the network. However, because the network changes rapidly over these few days, these individuals fluctuate in their positions and new individuals rapidly take their place.

Network Characteristics

The table below provides the connectedness measure for the duration the content took to reach popularity.

Table 6. Network Connectedness measures for FML

                day0     day1     day2     day3     day4     day5     day6
Connectedness   0.0003   0.0014   0.0023   0.0026   0.0026   0.0068   0.3648

As with the previous case, the connectedness measure indicates that there is a sudden increase in the connectivity of the network from day 5 to day 6, as evidenced by the increasing betweenness scores of specific nodes such as “firefox” and “memory”. Similar to the previous case, we find that the network became more connected after the introduction of specific nodes that bridged holes in the network structure.

2.5.3 Case 3: Ludios (The family Tree for Greek Gods)

This case is that of Ludios (http://ludios.org/greekgods/), a site that documents the parentage of gods in Greek mythology. This piece of content took 948 days to reach popularity on the social bookmarking site and was bookmarked in 26 periods over those 948 days. I have not presented all the days of bookmarking activity on the site, only the first and last few days. This case is from the head region of the long-tail distribution but took a significantly long time to achieve popularity. This is the case of a slow winner, content that achieves popularity but takes a longer period of time to do so.

[Figure 10 panels: network snapshots for periods T0, T1, T2, T7, T8, T9, T15, T16, T17, T24, T25 and T26]

Figure 10. Ludios

Figure 10 represents different periods of time as the content moves to popularity. We see that in the first few days after this content was introduced into the site, several individuals bookmarked it in isolation, using different tag combinations to relate to the same content. Over time, I find that people start connecting to similar tag combinations. These individual-tag combinations successfully move the content to popularity. We see at time T26 that the network revolves around one primary tag, with a large number of individuals connecting to this tag. Around this core set of tag and individuals are other sets of tags that are used in combination with the dominant tag. Around this third layer we find other individuals and tag combinations.

Node Characteristics

This network makes it difficult to develop the node characteristics of betweenness and centrality. This is because the network takes a significantly long time to rise to popularity, and several portions of the network, while connected to a significant number of tags, remain unconnected to the broader network, as can be seen from the images. The slow rate of change in the network also implies that the node characteristics are slow to change.

Node Class

From the diagrammatic representations of this network we see a large number of nodes of both types. Nodes of type “individual” and “tag” are both present to a significant extent in this network. While ultimately at T26 we see that nodes of type “individual” are present to a greater extent than nodes of type “tag”, we notice that a greater proportion of the tags are present in the periphery of the network, with predominantly one tag dominating the central core. Thus, network connectivity was only achieved in this network through this one node of class “tag”.

Network Characteristics

The table below provides the network characteristics for this piece of content for specific days during its rise to popularity.

Table 7. Network Connectedness measures for Ludios

                day165   day521   day897   day948
Connectedness   0.6206   0.6043   0.4416   0.8276

From the connectedness measures we see that the network is highly connected in its early periods, but as individuals bookmark the content over subsequent periods, the number of tags increases and the connectivity in the network decreases. New individual-tag combinations are introduced to the network that reduce its overall connectivity and increase the time it takes to reach popularity. This process reverses in later periods as individuals start connecting to the tags and tag sets that are predominant in the network.

While in this case I do not examine network centrality, owing to the long duration the content took to reach popularity, the network connectedness measures show that connectivity in the network was not the predominant issue. However, a qualitative examination of the data for the period in which the content did rise to popularity shows that the introduction of specific nodes, unrelated to bridging the network, provided the catalyst for moving this network to popularity.

2.5.4 Case 4. Camcorder Info (camcorder review site)

This case is that of http://www.camcorderinfo.com/, a review site that offers reviews of various camcorder devices in the market. Its reviews cover a wide range of camcorder products, and the site sees an uptick in browsing behavior especially during periods of shopping and gift giving. The site itself is professional looking and its reviews seem to focus on a broad set of characteristics of camcorder devices. A brief examination of user comments on the site also suggests that the reviews are helpful and are actively sought by the members of the site. This piece of content received over 1500 bookmarks over a period of 4 years. It was first bookmarked in February of 2004 and thereafter was bookmarked on a slow but regular basis. At the time the data about this case was collected, it had been bookmarked over a span of 1515 days. This is the case of a “dead duck”.


Figure 11. Camcorder Info

Figure 11 depicts the whole network for “Camcorder Info” after 1515 days of bookmarking. We see that this network comprises several sets of individuals and tags; however, this network did not reach popularity.

Node Characteristics

The table below provides centrality measures at 4 points in time for the above content. These 4 days were chosen as samples since they approximately represent the 25% marks over the 1515 days of bookmarking.

Table 8. Betweenness measures for CamcorderInfo

Day 391                        Day 696                        Day 1004                       Day 1515
Node             Betweenness   Node             Betweenness   Node             Betweenness   Node             Betweenness
video            36.559        video            25.815        video            30.955        video            29.48
ruski            29.892        camcorder        24.387        camcorder        24.794        camcorder        26.153
rickl            20.645        reviews          16.503        reviews          16.671        reviews          16.457
camcorder        19.785        camcorders       7.8           camcorders       7.026         camcorders       8.472
jacobee          8.817         dv               5.753         shopping         5.789         shopping         4.609
prashantrane     8.817         Ravogt           5.532         dv               4.999         camera           4.179
arnons           4.516         shopping         5.023         photography      3.899         dv               3.346
dv               4.516         tech             2.949         victorpanlilio   2.641         photography      3.103
tech             0.215         blair            2.656         review           2.43          cameras          1.531

From the betweenness centrality measures we see that there is a substantial difference in betweenness and in the nodes present between day 391 and day 696: new nodes are introduced that decrease the betweenness centrality of previously existing nodes such as “video” and increase the betweenness centrality of nodes such as “camcorder”. This indicates that the network is still in a state of flux. Between day 696 and day 1515 we see that, while nodes have established their relative positions of significance in the network, there are still fluctuations in the betweenness measures. This indicates that new nodes are being introduced into the network that do not connect to preexisting ones but instead seek new connections.

Node Class

In this network we find that in earlier time periods nodes of type “individual” and “tag” both have significant roles to play. For instance, in the time period Day391 both classes have equal representation among the top betweenness scores in the network. Nodes of class “tag” increase and come to dominate the proportional representation among the top betweenness scores, but the overall betweenness scores in the network for that class fluctuate from day696 to day1004. This suggests that the “tag” class has a greater representation in the network, but the significance of class “individual” is not to be underestimated in this network.

Network Characteristics

Network characteristics for the days sampled are presented below in table 9.

Table 9. Network Connectedness measures for CamcorderInfo

                day391   day696   day1004  day1515
Connectedness   0.5222   0.7912   0.8141   0.7944

We see from the connectedness measures for the 4 days that the network fluctuates between day 391 and day 696 but subsequently settles down over days 696, 1004 and 1515. The network is connected to a large extent over the entire duration, indicating a steady rate of incoming bookmarks and connectivity of individuals to tags. We observe high betweenness scores for individuals and high network connectivity for the whole life of this bookmark. However, it is these characteristics that inhibit this network's rise to popularity.

2.6 Findings

In this section we perform a comparative analysis of cases 1 to 4 and subsequently employ state diagrams to illustrate the differences in structural characteristics and growth dynamics among the cases. The comparative analysis of cases 1 to 4 serves to highlight the structural differences in the networks, and the state diagrams highlight the dynamic characteristics of the network behaviors that lead content to rise to popularity at different rates.

2.6.1 Structural Analysis

I use the sampling strategy described previously to develop contrasts between the different cases as they relate to the different branches in the decision tree. Table 10 provides the interpretation of the node centrality characteristics over the 4 cases. Table 11 provides the interpretation of the node class characteristics over the 4 cases. Table 12 provides the network connectedness measures across the different cases. We will use these tables to interpret the structural differences.

Table 10. Node Centrality Interpretation across cases

Type          Case                  Centrality Interpretation
Sprinter      Firefox Memory Leak   Increasing betweenness scores
Sprinter      Big Brother Protect   Increasing betweenness scores
Slow Winner   Ludios                (none given)
Dead Ducks    Camcorder Info        Fluctuating and high betweenness scores

Table 11. Node Class Interpretation across cases

Type          Case                  Node Class Interpretation
Sprinter      Firefox Memory Leak   Multiple nodes of type “tag” dominate
Sprinter      Big Brother Protect   Multiple nodes of type “tag” dominate
Slow Winner   Ludios                Multiple nodes of type “tag” but only one node of type “tag” dominated
Dead Ducks    Camcorder Info        Fluctuating nodes of type “tags” and “individuals”

Table 12. Network Connectedness Interpretation across cases

Type          Case                  T0       T1       T2       T3       T4       T5       T6       Interpretation
Sprinter      Firefox Memory Leak   0.0003   0.0014   0.0023   0.0026   0.0026   0.0068   0.3648   Sudden increase in connectedness
Sprinter      Big Brother Protect   0        0        0.0001   0.0001   0.018    0.5276   0.5872   Sudden increase in connectedness

Type          Case                  day165   day521   day897   day948   Interpretation
Slow winner   Ludios                0.6206   0.6043   0.4416   0.8276   Steady to sudden increase in connectedness

Type          Case                  day391   day696   day1004  day1515  Interpretation
Dead ducks    Camcorder Info        0.5222   0.7912   0.8141   0.7944   Steady network connectedness
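The “sudden increase” reading for the two sprinters can be reproduced from the connectedness values in Table 12. The following minimal sketch finds the largest period-to-period change for each sprinter. The function name and the data layout are illustrative choices, not part of the original analysis.

```python
# Connectedness per period (T0..T6) for the two sprinter cases, from Table 12.
connectedness_by_case = {
    "Firefox Memory Leak": [0.0003, 0.0014, 0.0023, 0.0026, 0.0026, 0.0068, 0.3648],
    "Big Brother Protect": [0, 0, 0.0001, 0.0001, 0.018, 0.5276, 0.5872],
}


def largest_jump(values):
    """Return (from_period, to_period, change) for the largest single-step rise."""
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    i = max(range(len(changes)), key=changes.__getitem__)
    return i, i + 1, changes[i]


jumps = {case: largest_jump(vals) for case, vals in connectedness_by_case.items()}
for case, (start, end, change) in jumps.items():
    print(f"{case}: largest rise from T{start} to T{end} ({change:.4f})")
```

For FML the largest rise falls between T5 and T6, and for BBP between T4 and T5, which matches the days singled out in the case descriptions.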

“Slow Winners” and “Sprinters”

I make a distinction between slow winners and sprinters in terms of the time it takes the content to reach popularity. The case of the slow winner was “Ludios”, a site that discussed the ancestry of Greek gods. The sprinter type was represented by 2 cases, “FML” and “BBP”. FML and BBP emerged as popular in a week, while Ludios took approximately two and a half to three years to achieve the same popularity. From the difference in networks we see that individual-tag combinations form rapidly in the case of the sprinters but take a significant time to form in the case of the slow winners. The rate of growth of the content relates to these individual-tag combinations.

Tags in the network help individuals structure their cognitive space. They help them gain an understanding of the domain. Individuals who introduce the content attempt to frame it and use tags that emerge from their local and specific understanding of the domain. However, as others also bookmark and tag the content, new and more generic tags emerge, and new individual-tag relationships are formed. The individual-tag combinations help connect parts of the network to each other and assist in the formation of the domain and related contributors. These individual-tag combinations are important in explaining the rate at which content achieves success or popularity. In the case of the slow winner, Ludios, we see that the individuals remain attached to isolated and differing tag sets. These differing tag sets isolate the individuals in the network and subsequently inhibit the formation of a common cognitive space. In the case of the sprinters, however, the individuals in the network converge quickly, over a period of a few days, on specific tags and tag sets that ideally represent the content and domain. Furthermore, in the case of FML and BBP (“sprinters”) there were intersections between unconnected parts of the network. These took the form of bridging individuals who connect the previously unconnected regions of the network. These bridging individuals connect to or use tags that already exist in the content/bookmark space.

“Dead Ducks” and “Sprinters”

Dead ducks and sprinters are polar opposites of each other in both the popularity component and the duration component. Dead ducks are those that exist in the tail region of the long tail, and in that portion of the distribution a network takes substantially longer to emerge. In contrast, sprinters move from the tail to the head in a substantially shorter time.

The case of the dead duck is that of “Camcorder Info”. This content acquired a large number of bookmarks over a substantially long period of time, while the sprinters were “FML” and “BBP”. We see from the measures of network connectivity that the network for the “dead duck” was actually substantially connected for the measurement period, whereas for the sprinters network connectivity increased dramatically during the rise to popularity. This suggests that steady network connectedness is detrimental to content reaching popularity. On the other hand, central actors that jump in and form bridges by attaching themselves to specific tag sets, thereby increasing network connectedness, substantially increase the likelihood of content popularity.

This idea is similar to that of structural holes as elaborated by Burt (Burt 1999; Burt 1997b; Burt 2000). Nodes bridging structural holes are prominent in the network since they bring together disconnected regions of the network. Other nodes start building ties to these bridges in the network. Furthermore, preferential attachment suggests that individuals will connect to those who are well connected. Thus preferential attachment behavior on the part of new entrants in the network strengthens the bridges in the network and further moves content towards popularity while adding new nodes.

Dead ducks, on the other hand, already have a connected network with relatively high betweenness scores. This is analogous to the closed networks described by Coleman (Coleman 1988; Coleman 1990). Closed networks are resistant to new entrants and at times actively resist them. Closed networks admit those that share similar conceptions, values and norms, and the growth of such networks is therefore gradual rather than spontaneous and non-linear. Such networks do not lend themselves to producing highly popular content, and as such bookmarking displays a gradual behavior of ‘densing’ the network.

“Dead ducks” and “Slow Winners”

Dead ducks and slow winners share the same type of characteristics in the network structure. Both types of content have a high degree of network connectedness. This degree of network connectedness persists over time.

Similar to the case of dead ducks described previously, connectedness is an indicator of closed networks that work against the movement of content to popularity.

2.6.2 Dynamic Analysis

The comparative analysis conducted may also be represented in the form of state diagrams that depict the user actions that transition content from unpopularity to popularity. The three major characteristics of networks and related actions that were found to influence the popularity of content and the duration to popularity are: 1) the betweenness of nodes in the network, 2) the node classes that dominate the network and 3) the level of connectedness of the network. I describe the differing effects of these characteristics in the three branches of the decision tree.

“Rapid” Popular Content

Figure 12 depicts the series of states through which the socio-technical network progresses to reach popularity in the cases of Firefox Memory Leak and Big Brother Protect (cases 1 and 2). The rate of change in the network and in the popularity of the content is also rapid. The initial state of the content, shared by all new content in the system, depicts the network as unconnected. Individuals are bookmarking content in the system and adding tags. This is depicted as phase 1 in Figure 12.

Bookmarking behavior on the part of individuals, however, results in increasing betweenness scores associated with a specific class of nodes, i.e. tags. This is depicted as phase 2 in Figure 12. The increasing betweenness is the outcome of persistently selecting and using specific “tags” that act as cognitive markers in the individual’s representational system. When similar sets of markers become shared by a larger number of individuals, the object “tag” achieves a high betweenness score, indicating that it is shared by many and helps connect them together. The use of “tag” class nodes with high betweenness results over time in a connected network. In particular, the network becomes connected through the bridging ties established between individuals by the “tag” class. Furthermore, the recursive nature of the use of “tag” reinforces the bridging nature of the “tag” node class. This is depicted as phase 3 in Figure 12. The recursive use of similar representational markers, “tags”, results in increasingly recursive bookmarking activity and leads to the content reaching the popular state. This is depicted as phase 4 in Figure 12, where the network is connected, the “tag” class bridges the network, and the content is popular.

[Figure 12 panels: four state diagrams, Phase 1 (initial state) through Phase 4. Each relates Individual [bookmark], Tag [assign], Content and Network through the relations Enter/Register/Login, Bookmarks, Describes and Uses. Across the phases Content moves from [bookmarked] to [popular], Network from [Unconnected] to [Connected], and Tag gains [high betweenness].]

Figure 12. State diagrams for “Rapid” Popular Content

“Slow” Popular Content

Figure 13 depicts the series of states that the network progresses through for “slow” popular content, as represented by the case Ludios (case 3). After the introduction of content by individuals, two processes take place in parallel in the system. The first process is the use of combinations of nodes of the “tag” class and the second is the increasing betweenness of nodes of class “tag”. While in the case of the “rapid” popular content there were specific nodes of class “tag” that dominated the network, in the case of “slow” popular content there are multiple nodes of class “tag” that have high betweenness scores. Each of these nodes is recursively used and bridges mainly separate regions of the network. Thus, while the betweenness of the nodes increases in the network, it is mitigated by the number of high betweenness nodes in the network. This state of recursive use of multiple nodes of class “tag” can be prolonged by the representational systems employed by the individuals performing the bookmarking activity. Representational systems that remain diverse and separate prolong this period, while converging and congruent representational systems move the bookmarking system to phase 3. Convergent use of a node of class “tag” takes place after a critical mass around a coherent representational system has been achieved. Subsequent to this, the content behaves like “rapid” popular content.

[Figure 13 panels: four state diagrams, beginning with an initial state and a changed state. Each relates Individual [bookmark], Tag [assign], Content and Network through the relations Enter/Register/Login, Bookmarks, Describes and Uses. Across the states Content moves from [bookmarked] to [Popular], Network from [Unconnected] through [Highly Connected] to [Connected], Tag from [increasing betweenness] to [high betweenness], and Individual shows [fluctuating betweenness] in one of the later states.]

Figure 13. State Diagrams for “Slow” Popular Content

Unpopular Content

Figure 14 depicts the state diagrams for unpopular content. These are the state diagrams for the case Camcorder Info (case 4). After the introduction of content we see that tags again become the predominant nodes in the network and that they have fluctuating betweenness scores. The recursive use of the tags moves the network from an unconnected state to a connected state. However, this connected network is far from stable, and the fluctuation in betweenness scores for the tags continues. We see a combination of node class “individual” and node class “tag” forming the core of the network. This form of steady network connectedness and fluctuating node betweenness suggests that, while bridges are formed across the network, they are constantly changing, and thus the central core of the network is never established. These processes result in a state where content does not move to the head of the long-tail distribution and remains “unpopular”.

122

Initial State Changed State

Content Content

[bookmarked] [bookmarked]

e

s e s

b

b k

k i

i r

r r

r a a c c

s

m s m e

k e k

o D o D o o B B Enter, Register, Login Network Network [Unconnected] [Unconnected] Tag Individual Tag Individual [assign, [bookmark] [assign] [bookmark] fluxtuating betweenness] Uses Uses

Content Content

[bookmarked] [Unpopular]

s e e

s k b i

k b r i

r r

r a

a c c

m s s

m k e

k e

o D o D o o B B Network Network [Steady [Steady Connectednes Connectednes Individual s] Individual s] Tag Tag [bookmark, [bookmark, [assign, [assign, steady fluctuating fluctuating fluxtuating betweenness] betweenness] betweenness] betweenness] Uses Uses

Figure 14. State Diagrams for “Unpopular” Content


2.7 Discussion and Summary

These findings support several results in the literature and modify them in the context of popular content. While Burt (Burt 1999; Burt 1997b; Burt 2000) has established the importance of “bridging nodes” to network connectivity and social capital, these findings suggest that specific types of “bridging nodes” contribute to the emergence of content popularity. Specifically, nodes of type “individual” inhibit the emergence of the content as popular, while nodes of type “tag” contribute to the emergence of popular content. This is especially evident in the case of the “slow winner”, where the content rose to popularity after a node of type “tag” attained a relatively high betweenness score and a prominent position in the network. In addition to supporting and extending Burt’s work, this study also suggests that network closure, i.e. highly connected networks (Coleman 1988; Coleman 1990), followed by a weakening of the network bonds, can lead to content emerging as popular. This finding is evident in the case of the “slow winner”, where network connectedness has a high steady value but drops before the content rises to popularity. This in effect reverses the process of network closure argued by Burt (Burt 2000), who suggests that networks with structural holes move towards closure after the establishment of bridging ties. This study also argues against the benefit of highly connected networks for the spreading of information. Highly connected networks resist the addition of new nodes and the development of new relationships. Information flow within such networks is stagnant and restricted to the existing set of nodes, and the growth pattern in such networks is slow and steady. This conclusion is evident from the analysis of the network surrounding the case of the “dead duck”.

While the findings above extend and complement existing work on network characteristics and their influence on individuals and organizations, they do so in the context of content popularity, and the findings need to be examined in this light. While the substantive literature has little to say about the influence of network characteristics on content popularity, it does discuss how specific network structures contribute to generating specific emergent outcomes (Barabási et al. 1999b; Katz 1993; Monge et al.). This study, in a similar vein, attempts to highlight how network and node characteristics shape the emergent outcome of content popularity in online social bookmarking contexts.

Summary

The four cases reveal characteristics of evolving networks that help explain the manner in which content reaches popularity. While the literature has suggested the characteristics of networks and the role of individuals in spreading information, it has not highlighted their specific contribution to the emergence of popular content. The cases support the literature in the notion that centrality is an important concept in networks. But degree centrality does not contribute to an understanding of networks and their role in emerging content popularity. Degree centrality in emerging networks does not provide an indication of a node’s relative significance in the network, and node betweenness is a more relevant and appropriate measure. Betweenness highlights the role of nodes in shaping the path to content popularity and helps us distinguish content that moves to popularity. The popular cases display increasing betweenness scores for the different node classes over time. Both “tags” and “individuals” display increasing node betweenness scores in the case of the sprinters. Both these classes of nodes, however, have fluctuating betweenness scores for “dead ducks”. Fluctuating betweenness scores indicate that the node positions in the network are fluid and that the relative influence of any specific node is uncertain.
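The contrast between degree centrality and betweenness drawn above can be illustrated with a minimal sketch. The toy hybrid network, its tag and user names, and the helper `betweenness` are hypothetical and are not taken from the case data; the function is a plain implementation of Brandes' algorithm for unweighted, undirected graphs, not the tool used in the study.

```python
from collections import deque

def betweenness(adj):
    """Unnormalised betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}       # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                      # accumulate dependencies in reverse order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # every unordered pair was counted from both endpoints
    return {v: b / 2 for v, b in score.items()}

# Hypothetical hybrid network: users u1..u6 connected to the tags they assign.
edges = [("python", "u1"), ("python", "u2"), ("python", "u3"),
         ("design", "u4"), ("design", "u5"), ("design", "u6"),
         ("howto", "u3"), ("howto", "u4")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

degree = {v: len(neighbours) for v, neighbours in adj.items()}
scores = betweenness(adj)
print(degree["python"], degree["howto"])   # 3 2
print(scores["python"], scores["howto"])   # 13.0 16.0
```

In this toy network the tag "howto" has the lowest degree among the tags but the highest betweenness, because it is the only bridge between the two groups of users. This is the kind of structural position that degree centrality alone does not reveal.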

Examination of differences in node class between the cases suggests that the node class “tags” plays a significant role in the rise of content to popularity. Specifically, in the case of the “sprinters”, nodes of type “tag” are the dominant node type and help the content reach popularity in a short time. However, in the case of the “slow winner”, while there are many nodes of type “tag”, the content rises to popularity only after one of them comes to dominate the network by obtaining a relatively high betweenness score and a central position in the network. In the case of the “dead ducks”, both classes of nodes, “individuals” and “tags”, were present in the network, but neither played a significant role in shaping the manner in which the content evolved.

Network connectedness also played a significant role across the different cases. In the case of the “sprinters” we see an unconnected network being connected through the use of “tags”, while in the case of the “slow winner” we find that steady connectivity in the initial periods of the network fades and the network subsequently becomes connected in a manner similar to that of the sprinter. This suggests the introduction of a focal node that moves the network out of a steady state. However, in the case of the “dead ducks” we find that network connectedness is relatively high during the life of the content and, while it fluctuates, it does so within a narrow band. Based on the discussion and comparative analysis using state diagrams, we can next articulate propositions about network characteristics and the role they play in distinguishing content in the head of the long-tail distribution (popular) from content in the tail of the distribution (unpopular).

Proposition 1a. Content in the head region of the long-tail distribution is embedded in networks with high connectedness and high node betweenness.

Proposition 1b. Content in the tail region of the long-tail distribution is embedded in networks with fluctuating network connectedness and fluctuating node betweenness.

Proposition 1a follows from the first two state diagrams and suggests that high network connectedness and node betweenness scores are important for popular content. Proposition 1b follows from the third state diagram and suggests that fluctuating network connectedness and node betweenness scores are detrimental to popular content. Furthermore, case study 4 demonstrates that the greater the number of dominant nodes of class “tag” and class “individual” in a network, the longer it takes for the network to become well connected. The fluctuating dominance of different classes exacerbates the effect of the fluctuating node betweenness scores and does not lead to content popularity.


Proposition 2. The rate at which the content moves to popularity is mediated by the number of nodes of the class “tag” that have high betweenness scores.

However, the rate at which content emerges as popular is mediated by the network characteristics. This follows from state diagram 2 and case study 3, which demonstrates that the greater the number of dominant nodes of class “tag” in a network, the longer it takes for the network to become well connected. This, in turn, slows the rate at which the content emerges as popular.

Limitations

The study has several limitations that accompany such research. The first limitation is the number of cases that were employed in each branch of the sampling decision tree. A greater number of cases would increase the confidence in the generalizations (Yin 2003) being made, and as such this research needs to be followed by work examining a broader population of popular content. Furthermore, the network and node characteristics were developed at time increments of approximately 25%. Sampling at more frequent time intervals might provide a more nuanced understanding of the dynamics of network evolution. Finally, this study does not delve into the characteristics of the content itself and seeks a structural explanation of content popularity. Future work should also examine the contributory role of content characteristics in explaining the evolution of popular content.

Implications

These findings have implications for research and practice. While research (Anderson 2006a; Brynjolfsson et al. 2006; Elberse 2008) on long-tail structures has typically focused on the nature of the long-tail by examining several instances of it, little is known about the systems and structures that generate such behavior. Our findings suggest that network structures contribute to long-tail behavior and that the content or information that is part of the long-tail can move from the tail to the head based on the changing network dynamics of the socio-technical system. The research also serves the knowledge management community by emphasizing the role of representational systems and related classification systems (Hansen et al. 2001; Poston et al. 2005) in managing and disseminating knowledge. The role of tags in the system suggests that the storage and representational systems for knowledge need to be closely examined (Star 2002). The research adds to the IT literature by elaborating on the description of long-tail characteristics of IT mediated networks. It theorizes about the multi-modal network structures that underpin long-tail distributions and about the socio-technical characteristics of nodes that are critical to the long-tail. This paper also serves marketing and consumer research (Phelps et al. 2005; Subramani et al. 2003) by identifying the characteristics of networks that should be analyzed to examine viral behavior in consumer markets. It goes beyond the traditional cross-sectional examination of consumer networks for viral behavior. We perform a longitudinal case-based analysis and thus identify how the change in networks over time results in viral behavior.

Future Research

While this paper examines the dynamic nature of networks and their contributory role in the long-tail nature of content and information through the use of case studies, cross-sectional work is the predominant method through which the long-tail has been examined. As such, a cross-sectional study of the population of networks across long-tail distributions will help generalize the findings of this study to broader populations. Furthermore, this paper does not examine the behavioral motivations of the individuals who are present in such networks. Some explanations explored in the literature use preferential attachment and other mechanisms (Albert et al. 2002; Barabási et al. 2002; Jeong et al. 2003) to explain tie formation. Whether these mechanisms hold true in long-tail systems is an open question and needs further examination. Finally, consumption of content is based on both the structure of the networks in which the content is present and the type of content itself. As such, we suspect that the type of information, along with the structure of the networks, influences the popularity and rate of popular growth for information and content. The interaction effects of content type with network characteristics on popularity and rate of popularity are still an open question at this point.


3. Explaining Network Growth and Popularity in Hybrid Content Networks: A Quantitative Study

3.1 Introduction

As part of their everyday life, individuals make decisions on information consumption that are influenced by the social structures that surround them. The effects of these structures are observed in traditional social settings by examining the normative influences that they have on individual decision-making (Ching et al. 1992; Davis et al. 1996; Fischhoff et al. 1997b; Poole et al. 1986; Segal 1982). However, contemporary environments are saturated with social information technologies such as Facebook, LinkedIn, MySpace, and more (Parameswaran et al. 2007). Individuals use these technologies to communicate and establish relationships with others, and the structure of these relationships subsequently influences the consumption behavior of engaged individuals. The result of these relationships is emergent behaviors such as the coordination of collective decisions on information consumption and its value (Parameswaran et al. 2007). While we may assume that structures formed in social information technologies influence behavior, we know little of the characteristics of the structures and the social information technologies that influence such consumption behaviors.

In this study I focus on networks created by (social) bookmarking and user-submitted content services and ask what drives popularity in these services (Szabo et al. 2010) and what explains the growth of content networks (Ping 2009). This study is motivated by the observation that consumption of content in social computing environments is increasingly part of contemporary life, while at the same time few systematic studies have sought to expose mechanisms that can explain content popularity or network growth. The long-tail distribution observed in these networks is a common phenomenon in social behaviors, including the consumption of content online and sales of movies, books and music (Brynjolfsson et al. 2006; Lew 2008; Anderson 2006; Elberse 2008; Rubinson 2008; Shepherd 2008). More broadly, these are examples of behaviors where power law mathematics can be used to describe their structure and dynamics (see e.g. Taleb 2007). Long-tail distributions are common when an individual’s decisions are influenced by his or her knowledge of the purchases of others (Anderson 2006a; Brynjolfsson et al. 2006). Accordingly, the more individuals purchase a specific piece of content, the more likely it is that others will purchase it as well, and each individual’s purchase contributes to subsequent decisions. The cumulative outcome of these decisions can be depicted as a histogram of the frequency of use of information, content, etc. The long-tail distribution, then, is a frequency distribution over content, ordered from high to low (Brynjolfsson et al. 2006). A long-tail distribution shows the emergent “pattern” of distributed decision-making. These distributions are segmented into two areas: a head that contains high-frequency objects and a tail that contains low-frequency objects. Objects in the head area are consumed by a large number of individuals, while those in the tail area are consumed by a small number of individuals. The distribution represents the proportional popularity of each commodity in the potential user community. Popularity thus measures the rate of consumption of a commodity over a fixed duration of time.
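The ordered frequency distribution and its head and tail regions can be sketched as follows. The save counts and the function `rank_long_tail` are hypothetical, and the cutoff used here (the head is the top-ranked content that accounts for the first half of all saves) is an assumption made only for illustration; the text does not fix a numerical boundary between head and tail.

```python
from collections import Counter

def rank_long_tail(saves, head_share=0.5):
    """Order content by save frequency (high to low) and split it into head and tail.

    An item is placed in the head while the saves accumulated by the items
    ranked before it are still below `head_share` of all saves (assumed cutoff).
    """
    counts = Counter(saves)
    total = sum(counts.values())
    head, tail, running = [], [], 0
    for item, n in counts.most_common():   # most_common() sorts from high to low
        if running < head_share * total:
            head.append((item, n))
        else:
            tail.append((item, n))
        running += n
    return head, tail

# Hypothetical log with one entry per save of a piece of content.
saves = ["a"] * 60 + ["b"] * 20 + ["c"] * 10 + ["d"] * 5 + ["e"] * 3 + ["f"] * 2
head, tail = rank_long_tail(saves)
print(head)   # [('a', 60)]
print(tail)   # [('b', 20), ('c', 10), ('d', 5), ('e', 3), ('f', 2)]
```

A different cutoff, for example a fixed number of top-ranked items, would change which items fall in the head but not the ordering itself.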

I argue that the genesis of specific forms of distributions in consumption patterns is partially created by the properties and structures of the socio-technical networks constituted by social bookmarking. Such socio-technical networks comprise, in particular, systems of representation, modes of communication, individuals leveraging the technologies, and specific features of technological artifacts. We argue that the examination of long-tails and their antecedents is critical to understanding the nature and dynamics of consumer-created content, from viral videos to music use or the popularity of specific texts.

Such long-tail structures pervade many forms of content whose creation and consumption are driven by both consumers and producers. Earlier studies (Anderson 2006a; Brynjolfsson et al. 2006; Elberse 2008; Rubinson 2008) have examined such distributions with an eye on the effects of IT on content consumption within the long-tail form, especially consumption located in its tail. Consequently, studies on the long-tail have focused on the value of investing in and entering niche markets. While such studies theorize about the inherent value of harvesting the long-tail, they help us little in understanding how such structures are formed. Recently, some studies (Lew 2008; Shepherd 2008) have looked at characterizing content populations as long-tails. Only a few studies (see e.g. Fleming 2007) have examined factors that contribute to the form of the distribution. Fleming (2007), in examining the long-tail of innovations, finds that innovators who collaborate to a greater extent are far more likely to produce successful innovations. These successful collaborations take place by exploiting the types and nature of the relationships in which the collaborators are embedded; they are typically cross-disciplinary and involve multi-disciplinary teams (Powell et al. 1996). Other studies (Chesbrough et al. 2002) that have examined successful innovation do so by examining characteristics (Sherif 2004) of individual innovators or the context of innovation diffusion (Attewell 1991). However, as a large and growing number of long-tail distributions prevail in on-line content and information consumption, it is instrumental to examine how different structures contribute to such distributions.

The earlier case studies reveal that in social bookmarking services the content’s location in alternative regions of the long-tail can be predicted using network measures associated with hybrid networks consisting of tags, users and content, including node betweenness, node class and network connectedness. These characteristics distinguish between three classes of content: 1) those that grow rapidly to become popular (sprinters); 2) those that grow to become popular over a prolonged period (slow winners); and 3) those that fail to emerge as popular at all (dead ducks). These three classes of content, displaying varying degrees of specific network characteristics, provided an initial tentative model describing the structures and dynamics that contribute to content’s location and its dynamics. The question remains, however, whether such results can be generalized and whether content networks in general exhibit structural characteristics that explain the content’s popularity. We will next conduct a quantitative study that focuses on validating and refining the original model using a larger sample of content networks whilst controlling for extraneous effects such as network size.

In this section our goal is to carry out a quantitative study to validate, refine and expand the initial model of explaining network popularity and its growth as proposed in the earlier case studies. The two research questions we examine are:

RQ1: What characteristics of hybrid content networks predict content’s location in the long-tail?

RQ2: What characteristics of hybrid content networks influence the temporal duration at which content potentially moves to high popularity?

To answer these questions, a cross-sectional sample is drawn from content networks following a long-tail distribution. This sample is derived from the delicious.com web site, with data points drawn from both the head and tail regions of its long-tail. For the duration analysis, data points drawn from the head region of the distribution are sorted into two categories: rapid popular and slow popular content.


The remainder of the section is organized as follows. In the next section we review relevant theory and research and formulate the hypotheses that guided the study. In the subsequent section, research design, data collection and analysis are explained. We then report main findings. We conclude by discussing limitations, related research and implications for theory and practice.


3.2 Popularity, Cognition Sharing and Hybrid Networks

3.2.1 Tags as Cognition Artifacts

Bookmarking systems are repositories for organized content generated by a community. The acts of identifying and classifying content with tags during bookmarking ‘sort’ content or ideas into distinct ‘buckets’ (Bowker et al. 1999). In addition to organizing the ‘content’, identification and classification support members’ shared cognition (Star 2002). Tags thus act as cognitive artifacts for sharing knowledge between individuals and social worlds.

As cognitive artifacts, “tags” convey interpretations and serve as extrinsic cues to the nature of the content (Herbig 1996). As a form of classification, these tagging systems mediate the relationships between the social actors that create them and their cognition within the rest of the community. As shared artifacts they persist over time and influence individuals’ cognitive behaviors. Similar cognitive devices have been considered as ways to engage in distributed cognition (Hutchins 1991; Hutchins et al. 1996) and create information classifications (Bowker et al. 1999; Star 2002). Information systems are also replete with classifications used to coordinate activities (Argyres 1999a). For instance, Argyres (1999b) describes information systems as a means to share “technical grammars” that help organize design worlds between designers and different organizations. Bowker and Star (1999) likewise emphasize the role of definitions in generating classifications for diseases.


The tagging activity is typically arranged around simple classifications, like treasure hunt lists where teams divide the work and coordinate activities based on each item’s geography, etc. to find the items (Bowker et al. 1999). Tagging consequently creates evolving and open-ended classifications, often denoted as folksonomies. One reason for this is that a folksonomy is created ‘bottom up’ by distributed and autonomous individuals who constantly classify emerging content through tagging. These individuals share an overlapping and loosely coupled ‘knowledge’ domain, and they use tags to identify and describe the shared content as it evolves. Due to the autonomy and distribution of the individuals, the classification remains fundamentally open-ended. Tags and related user descriptions therefore serve as the foundational logic for the classification, and folksonomies reflect the group’s evolving understanding of the knowledge domain. Their value for individuals comes from the evolving taxonomic structure that helps organize information across the community.

3.2.2 Forming Socio-technical Networks through Tags

The content space in bookmarking systems comprises individuals who employ tags to make sense of and systematize the content. Consequently, the content space is inherently socio-technical in nature, consisting of social actors and technical artifacts (tags). Thus ties among individuals are mediated through the tags that relate to the content space. Similar tie formation between actors can be found in traditional social networks, where such structures are referred to as affiliation networks (Wasserman and Faust 1994). Affiliation networks are formed by participating in organizations or social activities. This participation creates relationships between individuals based on the overlapping occurrence of the same individuals at different events. Analogous to such affiliation networks, bookmarking thus forms networks through the affiliational relationships that individuals create with tags and through the sharing of those tags.
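The way affiliation with shared tags induces ties between individuals can be sketched as a projection of a two-mode (user, tag) network onto its users. The bookmark data, the user names and the function `project_users` are hypothetical; the sketch only illustrates the co-affiliation logic described above and is not the procedure used to build the study's networks.

```python
from collections import defaultdict
from itertools import combinations

def project_users(bookmarks):
    """Tie two users together once for every tag that both of them have used."""
    users_by_tag = defaultdict(set)
    for user, tag in bookmarks:
        users_by_tag[tag].add(user)
    ties = defaultdict(int)
    for users in users_by_tag.values():
        # every pair of users co-present on the same tag gains one unit of tie weight
        for a, b in combinations(sorted(users), 2):
            ties[(a, b)] += 1
    return dict(ties)

# Hypothetical (user, tag) affiliations taken from bookmarking activity.
bookmarks = [("ann", "video"), ("bob", "video"),
             ("ann", "howto"), ("bob", "howto"), ("cat", "howto")]
ties = project_users(bookmarks)
print(ties[("ann", "bob")])   # 2
print(ties[("ann", "cat")])   # 1
```

Here "ann" and "bob" are tied more strongly than "ann" and "cat" because they are co-present on two tags rather than one, even though none of the users contacted each other directly.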

The resulting combination of social actors and technical artifacts forms a multi-modal network, i.e. a network that connects social actors and technological nodes (tags) (Kane et al. 2008). Social actors represented in the networks make decisions about content, whereas the tags that build up the structure of classifications connect to actors and content. We accordingly contend that the characteristics of these hybrid networks both reflect and influence the decision-making of individuals and what content becomes popular. Therefore we need to explore the node and network characteristics in these hybrid networks in order to understand the factors that influence long-tail behavior.

We posit that the content in the head region of the long-tail will have a greater number of nodes in its related tag and actor networks, and that these networks consequently have a larger number of relationships. While a greater number of nodes does not necessarily mean a greater number of relationships, hybrid networks typically contain several node types, and therefore affiliation relationships between different node types will occur with greater frequency. This increasing frequency is a result of the diversity of nodes as individuals seek out new sources of information. As noted, tags represent cognitive markers of new information sources, and related affiliation-based relationships establish bridges with new sources of information. An increase in the number of tags in the network thus represents an increase in new information sources. Actor nodes seek out new information sources, increasing the number of relationships between social actor nodes and technological “tag” nodes. Relationships among the nodes reflect the affiliations actors form through the tags they use to identify and classify the content. Relationships among nodes are thus not random but are due to socio-structural cues available in the environment. Affiliation-based relationships through tags are also greater in number, since they result from co-presence rather than from direct contact.

The socio-structural cues available in tags influence decision-makers through similarity and proximity (Ibarra et al. 1993; Marsden et al. 1993; Meyer 1994; Salancik et al. 1980). Influence through similarity resembles structural equivalence in a network, where decision-makers adopt the attitudes and opinions of individuals that have relationships in the network similar to their own (Meyer 1994). Influence through proximity suggests that social actors are influenced by those with whom they have close relationships (Meyer 1994). We suggest that similarity and structural equivalence form the basic mechanisms through which decision-makers are influenced in bookmarking networks.
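One simple way to quantify how similar two individuals' tag relationships are is the Jaccard overlap of their tag sets. This measure is an assumption introduced here for illustration; the text does not specify how structural equivalence is operationalized, and the users and tags below are hypothetical.

```python
def jaccard(tags_a, tags_b):
    """Share of tags that two users have in common, out of all tags either one uses."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0                      # convention: two empty profiles are not similar
    return len(a & b) / len(a | b)

# Hypothetical tag profiles of three users.
tags_by_user = {
    "ann": {"video", "howto", "camera"},
    "bob": {"video", "howto"},
    "cat": {"recipes"},
}
sim_ann_bob = jaccard(tags_by_user["ann"], tags_by_user["bob"])
sim_ann_cat = jaccard(tags_by_user["ann"], tags_by_user["cat"])
print(round(sim_ann_bob, 3))   # 0.667
print(sim_ann_cat)             # 0.0
```

Under this reading, "ann" and "bob" occupy nearly equivalent positions with respect to the tags, while "cat" shares no relationships with either of them.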

Influence through similarity flows from the similarity in relationships with individuals or artifacts in the network. These linkages are visible to other individuals in the environment through tagging actions (Hutchins et al. 1996). This is analogous to organizations adopting the same technologies and practices as those that are in similar industries or that have the same set of inter-organizational ties (Haveman 1993). Employee perceptions based on notions of power and similarity also follow a similar logic (Ibarra et al. 1993). The affiliation-based relationships in social bookmarking environments are formed with the tags that individuals use. In most cases individuals use several tags to describe any piece of content. Structural equivalence in such environments reflects the extent to which the tag structures used by individuals are similar to each other. Individuals with similar configurations of tags and associated tie structures tend to coalesce. Network closure (Burt 2000; Coleman 1988) suggests that individuals will have strong relationships with those they trust. These relationships are forged by informational and structural cues available in the environment. The relationships are reciprocal in nature and echo interpersonal attraction (Brass 1985). A great number of such relationships generates homophily in the network structure, i.e. connections between similar people grow at a higher rate than those between dissimilar people (McPherson et al. 2001). Since connections between individuals grow at a greater rate in homophilic networks, we suggest the following:

Proposition 1: Networks in the head region of long tail have a greater number of structurally similar nodes and subsequently display greater cohesion than networks in the tail region of the distribution5.

5 This is in proportion to the overall number of nodes in the respective networks.

As the number of structurally similar people increases in a social space, it tends to give rise to niches or cliques (McPherson et al. 2001). Cliques are defined as three or more nodes with strong or frequent interactions and are present in most networks, but smaller networks such as team networks (Joshi 2006) are likely to have stronger relationships among all members compared to larger networks such as organizational networks, which have a combination of weak and strong relationships. This suggests the following:

Proposition 2: Networks in the head region of the long tail have greater number of clusters (cliques) than networks in the tail region of the distribution.
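Counting cliques of three or more nodes can be sketched with the basic Bron-Kerbosch enumeration of maximal cliques. The example network is hypothetical and one-mode (for instance, users tied through shared tags), since a purely two-mode user-tag network contains no triangles; treating every tie as strong is also an assumption made for the illustration.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)          # nothing can extend it: maximal clique
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}    # v has been fully explored
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques

# Hypothetical one-mode network: two triangles joined by the tie c-d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Keep only cliques with three or more nodes, following the definition in the text.
cliques3 = [c for c in maximal_cliques(adj) if len(c) >= 3]
print(sorted(sorted(c) for c in cliques3))   # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The bridging tie between "c" and "d" forms a maximal clique of size two and is therefore excluded from the count, leaving two cliques in this network.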

The duration to popularity refers to the period prior to the introduction of a critical node in the network. Critical nodes are bridging individuals or tags in a hybrid network. These critical nodes bridge unconnected and distinct parts of the network. As they connect parts of the network together, the nodes introduce new information, through tags and shared cognitive structures, to the networks being bridged. The criticality of the node is determined by its ability to share cognitive information with its networks. The introduction of new information leads to increased shared cognition between individuals across the network. Since the fit of a node with a specific network is based on its location within the network structure and its congruence with the network, sharing cognitive structures between nodes of different networks increases the fit of any specific node with the network. This is an extension of the “rich get richer” logic, or preferential attachment (Albert et al. 2002).

While sharing cognitive structures is important to bridging nodes, these same structures also lead to strong relationships, similar nodes and cohesive, tightly knit networks. Thus the shared cognitive structures within a network do not allow new nodes to be introduced, since the network and its relationships are closed to external nodes (Joshi 2006), making it hard for any new node to join and share cognitive information (Coleman 1988). Closed networks are hard to penetrate, and penetration is hard even for critical nodes. Models of density-dependence suggest that large numbers of firms operating in a market tend to decrease the number of new entrants (Hannan et al. 1988). Furthermore, Burt (1992a) suggests that nodes with weak ties in a closed network allow multiple networks to be bridged. The weak ties, when established, expand the network and change the direction of the network’s evolution. The establishment of these ties depends on the closure properties of the network that dictate its subsequent growth. Thus we propose:

Proposition 3: Networks with greater cohesiveness will increase the duration to popularity for content in the head region of the long tail distribution.


3.3 Research Design and Method

3.3.1 Research Goals

The research questions inquire into the characteristics and dynamics of networks in which popular content becomes embedded. Such content networks pose an interesting challenge to network analysis, since they combine social actors with technological artifacts, which makes analysis and interpretation challenging (Monge and Contractor 2003). Yet traditional social network methodologies are increasingly being applied to understand the growth and behavior of such multi-modal networks (Kane and Alavi 2008; Monge and Contractor 2003).

We apply quantitative methods (Cohen et al. 1983) to test and generalize the propositions (P1-P3). The reason for using these quantitative methods is to provide a stronger theoretical justification and statistical generalizability.

Several quantitative methods are available. Survey techniques (Wasserman et al. 1994) can provide the sample size and population required, but surveys are difficult to administer without a direct contact with the population. Interviews can also be employed to collect network data, but again they require direct contact with the population. This is however difficult to create as the population is too dispersed geographically. Furthermore, associating specific usernames applied in bookmarking to real individual identities poses insurmountable challenges as it requires contacting individuals using both survey and interview based methods.


Since methods involving direct contact with such a dispersed population pose numerous challenges, we rely on publicly available web sites as the source of the data. We use ‘direct’ samples from existing ‘stored’ networks as represented by bookmarks available on the Internet. The techniques employed ease the collection effort while providing relatively valid and reliable data to address the research questions. One reason to use these methods is that social bookmarking and related social computing sites provide application programming interfaces (APIs) to mine network data effectively.

While there might be several services aggregating content, not all of them track the popularity of the content. Consequently, from the pool of content services we examined those that track content popularity. Furthermore, this information on popularity must be publicly available so that we can aggregate the data. The specific information that we are interested in on these sites is the tags that individuals use to describe content. As noted, these artifacts serve as repositories of cognition, and affiliations with them allow individuals to form networks. Filtering content in this manner offers data to address our first research question, i.e. content popularity in online services. To filter the popular content we need to examine the extent to which these services provide network and temporal information about specific content that has emerged as popular. This information is crucial to addressing the second research question about the duration. The information we seek from the sites consists of records that track the popularity of the content as defined by the following data: the number of individuals that saved that content, the time period in which they did so, and affiliated information that the individual might associate with that content.

3.3.2 Research Site

A variety of content networks that meet the requirements above are available on the Internet. Some examples are content aggregators such as digg, bookmarking services such as delicious and furl, and user submitted services such as stumbleupon. User submitted content services aggregate such content and provide recommendations, while bookmarking services have a narrower focus by representing individual bookmarking activity. Bookmarking services optionally allow users to share their content with other individuals. Furthermore, bookmarking services offer users the option to add tags and notes to the content they choose to bookmark.

A variety of bookmarking services are available6 for tagging different types of content with varying proprietary/public features. Individuals and organizations bookmark research content and share it through services like IBM Lotus Connections, Diigo or Connectbeam. Other services provide bookmarking for users in specific regions of the world. All such sites provide bookmarking, tagging, sharing and note-taking functionality. Despite the large number of services available, delicious.com serves as the template for most services. Most local and regional social bookmarking services therefore imitate this first social bookmarking service. It is a bookmarking web service designed for storing, sharing, and discovering web bookmarks. The site was founded by Joshua Schachter in 2003 and acquired by Yahoo! in 2005. It had more than 1 million users by the 25th of September 2006, and by Nov 6th 2008 it had more than five million users and 180 million unique URLs bookmarked7. In contrast to specialized content services, delicious users can bookmark and share any content. In addition, delicious.com provides APIs that allow add-on services and invite developers to access the delicious database of bookmarked content.

6 http://en.wikipedia.org/wiki/List_of_social_bookmarking_websites#Social_bookmarking

This API is highly functional when compared to other similar sites, and provides an invaluable tool for data collection. Consequently, we selected delicious.com (http://delicious.com) as the study site. The reach of the delicious service (i.e. the % of all web users) has grown steadily from 0.18% to 0.40% over the last six months8. Users typically spend 3 minutes on the site with an average of about 3-5 page views per visit. Users on the site are typically between 25 and 34 years of age, with a declining proportion of older user groups. The users typically have a graduate school education and browse the site predominantly from work9. The site tracks individuals registered with the service and their associated bookmarks and tags. Individuals can bookmark multiple URLs, and they decide what to bookmark based on the value of the content. While the value ascribed to the bookmark is individually determined, the bookmark also serves as a cognitive device in cases where the individual wishes to return to a piece of content.

7 http://en.wikipedia.org/wiki/Delicious_%28website%29; http://blog.delicious.com/blog/2006/09/million.html ; http://blog.delicious.com/blog/2008/11/delicious-is-5.html

8 http://www.alexa.com/siteinfo/icio.us

9 http://www.alexa.com/siteinfo/icio.us

[Figure 15: diagram with three panels. (1) User Interface System: individuals visit a website and bookmark it through a toolbar, attaching tags. (2) Delicious Bookmark Storage System: individuals, bookmarking time and date, websites and tags. (3) Delicious Popular List: websites in decreasing order of bookmarking rate, from Website D (2 bookmarks in the same day) through Website A (3 bookmarks over 1 week) and Website B (2 bookmarks over 5 months) to Website C (1 bookmark).]

Figure 15. Social Bookmarking System Operation

Figure 15 depicts how delicious.com is used and how social bookmarks progress into popularity lists. This use process corresponds closely with the data acquired to analyze research questions 1 and 2. From it we are able to distinguish popular from unpopular content, and to trace the duration and process through which popular content emerged from bookmarking activity.


Individuals start by creating an account on the bookmarking system. After the account creation they download a toolbar that provides them access to the features of the website. The toolbar forms part of the user’s browser capability. Users save content as a bookmark on the bookmarking site by using this toolbar. The bookmark is saved as part of the user’s personal set of bookmarks and is also shared socially. This browser-based toolbar forms the primary infrastructure used to perform bookmarking activity.

This functionality comes into play when the user visits some piece of content on the internet. Typically, the individual chooses to bookmark the content based on the value he or she places in it. The bookmark is created using the toolbar in the browser. In addition to the bookmark itself, the user is provided with tagging functions that help manage the bookmarks, similar to the folder structures on a computer hard drive. These tags help organize bookmarks and are shared with the community of users. Users can also browse delicious.com for content and can choose to bookmark and tag content that has already been shared by other individuals. Other available features include lists of the most recent bookmarks, the ability to add others to one’s personal network, and a listing and ranking of the most popular bookmarks. On delicious.com, popularity is determined by the number of bookmarks that a piece of content has received over a specified duration of time.

Figure 15 also depicts the bookmark and tag storage system. Within the delicious.com system, each bookmark made by the user is associated with the tags that are used to describe it. In addition, the system associates date and time stamps with the bookmarks so that bookmarks, users and content can be tracked. Based on this tracking information the system generates a list of popular bookmarks.

3.3.3 Construct and Variable Measurement

Earlier case studies indicated that measures of network connectedness, node betweenness and node class were determining factors in whether content became popular and at what rate it moved to popularity. Associated measures were constructed for the analysis using the network analysis tool UCINET 6 (Borgatti et al. 2002).

Control Variables

Measures that characterize the specific content bookmarked, such as the number of individuals that bookmark the content and the number of tags used to describe it, were calculated to serve as control variables. These control variables can vary across the sample and can be considered similar to organization size or the age of an individual in other contexts. It is important to control for these effects since we wish to examine the effects of network structure on popularity behavior rather than the effects of population characteristics.

Table 13. Control Variables for the research questions

| Control Variable Name | Related Research Question | Related Propositions | Construct Definition | Construct Operationalization |
|---|---|---|---|---|
| Number of Individuals | Research Question 1, 2 | Propositions 1, 2, 3 | Number of individuals bookmarking | Count the number of individuals that bookmark the piece of content |
| Number of Tags | Research Question 1, 2 | Propositions 1, 2, 3 | Number of tags used | Count the total number of tags used by individuals to characterize the content |

Dependent Variables

Two dependent variables were used in testing the hypotheses. The first dependent variable, a measure of content popularity, was operationalized as a binary dummy, with a value of 1 indicating that the content reached popularity and 0 indicating that it did not. This dependent variable is relevant to testing propositions 1 and 2. The second dependent variable was a measure of the duration it took a piece of content to reach popularity. This duration is measured from the time the content is introduced into the system to the day it reaches popularity.

Table 14. Construct and measures for dependent variables

| Dependent Variable Name | Related Research Question | Related Proposition | Construct Definition | Construct Operationalization |
|---|---|---|---|---|
| Popularity | Research Question 1 | Proposition 1 | Popularity is a measure of the number of bookmarks content receives in a predetermined period of time10. | Binary dummy variable with 1 indicating popularity and 0 indicating non-popularity |
| Duration to Popularity | Research Question 2 | Proposition 2 and 3 | Duration to popularity is a measure of the number of bookmarks a content receives relative to the duration the content has been present on the site. | Duration measured in the number of days to reach popularity |

10 http://www.deliciousforum.com/forum/comments.php?DiscussionID=2468&page=1#Item_0

Independent variables

Network cohesion is a measure of the extent to which the nodes in the network are connected to one another (Wasserman et al. 1994). This measure was used as the major independent variable (IV). A higher value of network cohesion indicates that the network is highly connected. Network cohesion is operationalized through a network centralization measure for each of the data points in the sample. Node betweenness is a measure of the extent to which nodes lie on a critical path through the network. Nodes that are present in critical paths have higher node betweenness scores. The measures for the specific constructs were constructed using UCINET v6. Clique measures are constructed by counting the number of maximal, completely connected sub-graphs of at least a specified size. The minimum specified clique size was 3.
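As an illustration of the betweenness and clique measures described above, the following minimal sketch computes average node betweenness and the cliques of size 3 or more for a four-person toy network (ties A-B, B-C, B-D and C-D). It is written in plain Python rather than UCINET, which the study actually used; the function names and the edge list are assumptions made for illustration, and UCINET's normalization options may differ.

```python
from collections import deque

# Toy undirected network (assumed for illustration): A-B, B-C, B-D, C-D.
EDGES = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "D")]


def build_adjacency(edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies in reverse search order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


def maximal_cliques(adj, min_size=3):
    """Enumerate maximal cliques (Bron-Kerbosch) with at least min_size nodes."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                found.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


adj = build_adjacency(EDGES)
bt = betweenness(adj)
avg_betweenness = sum(bt.values()) / len(bt)
cliques = maximal_cliques(adj, min_size=3)
```

In this toy network only node B lies on shortest paths between other nodes, and only B, C and D form a clique of the minimum size, so the clique count would be 1.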

Table 15. Constructs and measures for the independent variables

| Independent Variable Name | Related Research Question | Related Proposition | Construct Definition | Construct Operationalization |
|---|---|---|---|---|
| Network Cohesion | Research Question 1, 2 | Proposition 1 and 3 | Network cohesion is the extent to which nodes within the network share similar characteristics and subsequently coalesce together. | Network cohesion measured through a network centralization index |
| Average Node Betweenness | Research Question 2 | Proposition 1 | Node betweenness is a measure of the extent to which an actor lies on the paths connecting other actors in the network. | Measure constructed by averaging node betweenness over all network nodes |
| Number of Cliques | Research Question 1 | Proposition 2 | Cliques are a measure of the extent to which clustering is taking place in the network | The number of cliques in the network |

In addition to these measures, a network density measure was also calculated. Network density serves to highlight global interconnectedness, and its inclusion in the analysis aids in examining the effects of global interconnectedness on popularity behavior. Network density is defined as the ratio of the number of relationships that exist in the network to the number of possible relationships in the network. It provides a broad measure of the extent to which the nodes are connected to each other.
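The density ratio defined above, together with one common centralization index, can be sketched as follows. This is an illustrative plain-Python version, not the UCINET routine used in the study; the choice of Freeman's degree centralization is an assumption, since the text does not state which centralization index was computed, and the function names are hypothetical.

```python
def density(n_nodes, n_ties):
    """Existing ties divided by possible ties in an undirected network."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_ties / possible


def degree_centralization(degrees):
    """Freeman degree centralization: 0 for a regular graph, 1 for a star.

    Assumed index for illustration; needs at least three nodes.
    """
    n = len(degrees)
    max_degree = max(degrees)
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


# Toy network with ties A-B, B-C, B-D, C-D (assumed example).
toy_density = density(n_nodes=4, n_ties=4)
toy_centralization = degree_centralization([1, 3, 2, 2])
```

For the toy network, 4 of the 6 possible ties exist, so both the density and the degree centralization come to two thirds.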

3.3.4 Data Collection and Cleaning

The data about content, individuals and tags can be collected through three methods, which rely on either automated services or manual approaches. The first automated method is the use of the delicious.com application programming interfaces11 (APIs). The APIs allow developers and users to create applications that illustrate bookmarking behavior and tag use. The second approach is automated screen scraping. Screen-scraping applications are not supported by the delicious.com APIs and, if misused, can result in delicious.com banning the originating IP addresses. These applications use the structure of the webpage to parse the related HTML and extract the data embedded in it. They are usually written in a web based programming language and employ parsing modules and complex free-text queries that use alphanumeric operators and associative rules to obtain the required data. The third method is manual screen scraping, in which the user browses the site by hand and uses copy/paste functions to extract and save the data. The manual approach is highly tedious and was used only as the first step to create and validate automated scripts.

The study employed all three approaches at different stages of the data collection. The APIs were used to collect information about content popularity. Automated scripts running on a server then used this information to scrape the bookmarking history of popular content on delicious.com and to extract all data about the individuals and tags involved and their changes over time. Manual screen scraping complemented the automated approaches when necessary, as in some cases the script was locked out of the site for a short period of time.

11 http://delicious.com/help/api
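The screen-scraping step described above can be illustrated with a short sketch that parses a bookmark-history fragment into records of user, date and tags. The markup, class names and sample values below are entirely hypothetical, since the structure of the delicious.com pages and the scripts actually used are not given in the text; only the general technique of walking the HTML structure is shown.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real page structure is not documented here,
# so these class names and values are placeholders.
SAMPLE_HTML = """
<ul>
  <li class="bookmark"><span class="user">user_a</span>
      <span class="date">2007-02-05</span>
      <a class="tag">design</a><a class="tag">css</a></li>
  <li class="bookmark"><span class="user">user_b</span>
      <span class="date">2007-09-11</span>
      <a class="tag">css</a></li>
</ul>
"""


class BookmarkParser(HTMLParser):
    """Collect one record (user, date, tags) per bookmark list item."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # field that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "bookmark":
            self.records.append({"user": None, "date": None, "tags": []})
        elif css_class in ("user", "date", "tag") and self.records:
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        record = self.records[-1]
        if self._field == "tag":
            record["tags"].append(text)
        else:
            record[self._field] = text
        self._field = None

    def handle_endtag(self, tag):
        # Text after a closing tag must not be attributed to a field.
        self._field = None


parser = BookmarkParser()
parser.feed(SAMPLE_HTML)
records = parser.records
```

Each resulting record pairs one individual with a bookmarking date and a list of tags, which is the form needed later to build lists of unique individuals and tags per piece of content.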

The collected data points were cleaned as follows. We used multiple text mining and editing tools, including free-text analysis software (TextPad, http://www.textpad.com/), Excel 2007 functions such as Text to Columns and if-else statements, and prepackaged extensions to Excel such as the ASAP add-on package (http://www.asap-utilities.com/). The last of these provides sorting, text and numeric cleaning functionality for Excel tables. These tools were used to provide column headers, partition the dataset based on the level of popularity, segment the tag groups, and create lists of unique tags and individuals for each of the cases for input to network programs.

3.3.5 Sampling Strategy and Process

The collected data included both popular and unpopular content. Overall, the data points covered a bookmarking period of 2 years from September 2007 to September 2009. These data points were collected at random time intervals during this period to avoid exogenous characteristics of any specific period such as US elections, Oscar events, etc.

These points were divided into 2 groups of 100 data points and 20 data points, respectively. The first group was used for hypothesis testing and the second group was used for testing sample representativeness, as discussed below. The sampling of the relevant data was based on two characteristics: the region of the tail distribution the content comes from, and the rate of change as it moved to popularity. The first characteristic (the location) implies that the sampling is based on the dependent variable, which emerges from examining the long-tail distribution. The sampling strategy can thus be operationalized as a decision tree, as depicted in Figure 16, which shows how the popularity of content can be located.


Figure 16. Sampling based on location within the long-tail of content popularity

The region of the distribution that the content comes from is characterized as “low” or “high”, where the “high” region is the 20% that represents the head of the distribution and the “low” region is the remaining 80% that did not achieve popularity. The rate of change characterizes the length of the period over which content moves towards popularity. The rate of change can be either “fast” or “slow” in terms of how long it takes for a piece of content to move to popularity. The sample was therefore split based on the strategy depicted in Figure 16. The popular list generated by delicious.com (http://delicious.com/popular) was used as the aggregator for popular content. This list was sampled intermittently over the duration of the study period to obtain 50 data points of popular content. For every popular content data point sampled during this period, an unpopular content data point was also collected. The unpopular data point was identified by comparing the number of days the content had been on the site with the number of bookmarks it had received over that duration. Popular content receives a large number of bookmarks from individuals over the duration it is present on the site, while unpopular content receives a lower number of bookmarks. Using this process, 50 data points each were collected for the popular and unpopular samples. The 50 popular data points were not segmented further; instead, the number of days the content took to emerge as popular was retained as a continuous variable. This continuous variable (duration to popularity) has sufficient variation, as described in Table 16, to examine research question 2 and test proposition 3.

Sample Representativeness

Given the large potential population to draw upon, 100 million unique pieces of content12, it is necessary to ensure the representativeness of the sample. Several approaches to evaluating the representativeness of a sample are available. One approach compares groups across demographic factors based on the treatment conditions employed, to ensure there is no self-selection bias (Goetz et al. 1984). Another approach examines the representativeness of a sample by generating similarity measures between the collected sample and separate, distinct new collections (Cao et al. 2002). Studies have also examined the difference between randomized population sampling strategies and those that employ a random assignment and treatment model, and conclude that hybrid models that combine aspects of random selection and subsequent assignment are better (Ludbrook et al. 1998). The approach to testing sample representativeness in this case was a permutation analysis of differences across means during the assignment process (Ludbrook et al. 1998). This study thus compares the means of measures of the study sample with the means of a reference sample collected for the representativeness analysis.

12 http://en.wikipedia.org/wiki/Delicious_%28website%29; http://blog.delicious.com/blog/2006/09/million.html ; http://blog.delicious.com/blog/2008/11/delicious-is-5.html
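A permutation analysis of a difference in means, of the general kind referred to above, can be sketched as follows. This is a generic two-sided permutation test written for illustration; it is not the exact procedure or software used in the study, the function name is hypothetical, and the durations are made-up values rather than the study's data.

```python
import random


def permutation_test_mean_diff(sample_a, sample_b, n_permutations=10000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and the share of random relabelings
    whose absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(sample_a)
    observed = sum(sample_a) / n_a - sum(sample_b) / len(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_permutations


# Made-up durations to popularity in days; not the study's data.
analysis_sample = [12, 40, 95, 300, 7, 150, 60, 22]
reference_sample = [15, 35, 110, 280, 9]
observed_diff, p_value = permutation_test_mean_diff(analysis_sample, reference_sample)
```

A large p-value would indicate that the difference between the two sample means is no larger than what random relabeling of the pooled cases typically produces, which is the sense in which a reference sample supports representativeness.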

As noted 20 additional data points were randomly collected and similar measures were generated for the reference sample to ensure that the analysis sample is representative of the larger population. The measures include the duration to popularity, the number of individuals that bookmark the content and the number of tags that are used to bookmark the content. The mean and standard deviation for the sample of 20 is provided alongside the analysis sample in Table 16. In addition we also find that while the means are different the differences are not statistically significant suggesting that the sample and the broad population of delicious.com share similar characteristics. Hence the analysis sample can be regarded as a representative sample of the population.

The Levene test for equality of variances is provided below in table 16.

Table 16. Comparison of Study and Reference samples

Descriptive Statistics

| Sample | Statistic | Duration to Popularity (days) | Number of People bookmarking | Number of Tags |
|---|---|---|---|---|
| Analysis Sample | Sample Size | 100 | 100 | 100 |
| | Mean | 107.70 | 379 | 111 |
| | Std. Deviation | 208.518 | 261.673 | 43.500 |
| Representative Sample | Sample Size | 20 | 20 | 20 |
| | Mean | 115.29 | 372 | 104 |
| | Standard Deviation | 214.368 | 240.791 | 49.60 |

Statistical Test for Equality of Variance in Samples (Levene’s Test for Equality of Variances)

| F | Significance | T statistic |
|---|---|---|
| 0.006 | 0.942092 | .017032 |

Generation of Network structures from Data

The creation of network structures was conducted as follows. First using unique lists of tags and individuals, links that highlighted co-occurrences of tags between individuals were used to create a network structure for each data point.

The process is illustrated in Figure 17.


[Figure 17: diagram with three panels. (1) Individuals and Tags: Individuals A to D and the tags each attaches to Website A. (2) Translating diagram to table: an individual-by-tag affiliation matrix with a 1 where an individual used a tag. (3) Creating social network from co-occurrences: an individual-by-individual matrix and the resulting network, with ties A-B, B-C, B-D and C-D.]

Figure 17. Generation of network data and measures

Individuals bookmark pieces of content and link tags to them. Consequently, individuals become connected through these tags. As tags are shared ‘socially’ in the system, they are interpreted as shared cognitive devices representing individuals’ shared cognition about a domain. This shared cognition through tags is interpreted as a relationship in the subsequent network analysis. Since more than one tag can be used to describe a piece of content, individuals using similar tags create overlapping and interconnected relationships. A network is created when multiple individuals use similar tags. In network terminology, an affiliation is created between users and tags, where the users have relationships with each other through the use of similar tags. The network literature (Wasserman et al. 1994) refers to these as two-mode networks, where the mode refers to the type of node, i.e. there are two different types of nodes. Here the first mode represents the actors and the second represents the tags that the individuals are affiliated with. Since matrix formats are typically used to represent networks, the first mode here forms the rows and the second mode forms the columns. Thus the affiliation network can be represented as entries between individuals (rows) and tags (columns) over time for a piece of content. Next, such affiliation networks are converted into one-mode relationship networks among individuals, in which two individuals are tied when they share tags; the resulting matrix is symmetric, with the same mode on both rows and columns. Figure 17 depicts the transformation of individuals’ associations with tags into such a one-mode relationship network. This can be done using automated tools provided by network analysis software such as UCINET 6.
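The conversion from an individual-by-tag affiliation matrix to an individual-by-individual network can be sketched as below. This is a minimal plain-Python illustration, not the UCINET 6 procedure used in the study; the tag sets are assumed values, chosen only so that the resulting ties reproduce the A-B, B-C, B-D and C-D pattern of Figure 17, and the function names are hypothetical.

```python
# Illustrative tag sets (assumed, not the study's data).
TAGS_BY_INDIVIDUAL = {
    "A": {"tag_a", "tag_b", "tag_c", "tag_d"},
    "B": {"tag_b", "tag_d", "tag_x", "tag_y"},
    "C": {"tag_x", "tag_p", "tag_q", "tag_r", "tag_e"},
    "D": {"tag_y", "tag_p", "tag_q", "tag_r", "tag_z", "tag_w"},
}


def affiliation_matrix(tags_by_individual):
    """Two-mode matrix: rows are individuals, columns are tags (1 = used)."""
    people = sorted(tags_by_individual)
    tags = sorted(set().union(*tags_by_individual.values()))
    rows = [[1 if t in tags_by_individual[p] else 0 for t in tags]
            for p in people]
    return people, tags, rows


def project_to_one_mode(rows):
    """One-mode matrix: cell (i, j) counts the tags shared by i and j.

    Equivalent to multiplying the affiliation matrix by its transpose
    and setting the diagonal to zero.
    """
    n = len(rows)
    shared = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared[i][j] = sum(a * b for a, b in zip(rows[i], rows[j]))
    return shared


def dichotomize(matrix):
    """Binary tie: 1 if two individuals share at least one tag."""
    return [[1 if cell > 0 else 0 for cell in row] for row in matrix]


people, tags, rows = affiliation_matrix(TAGS_BY_INDIVIDUAL)
shared = project_to_one_mode(rows)
adjacency = dichotomize(shared)
```

The valued matrix `shared` keeps the number of common tags, while `adjacency` records only whether a tie exists; whether the study analyzed valued or binary ties is not stated in the text.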

3.3.6 Descriptive statistics and data distributions

Descriptive statistics for network density, network cohesion and network connectivity that are relevant for the analysis of RQ 2 and Hypothesis 3 were calculated next. Based on the means and standard deviation of the variables, we observed sufficient variance in the variables to be included in the analysis and to provide meaningful results. P-P plots were further used to verify the normality of the variables. Here the normal distribution is indicated by the data being present along the diagonal of the X and Y axis of the P-P plot (Appendix B). The descriptive statistics for the independent variables for RQ 1 are provided below.


Table 17. Descriptive statistics for independent variables for RQ1

Descriptive Statistics

| Statistic | Network Cohesion | Number of Cliques |
|---|---|---|
| Sample Size | 100 | 100 |
| Mean | 0.0035 | 327.41 |
| Std. Deviation | 0.007 | 256.95 |

Correlations

| Variables | Network Cohesion | Number of Cliques |
|---|---|---|
| Network Cohesion | 1 | 0.173(0.2) |
| Number of Cliques | 0.173(0.2) | 1 |

Table 18. Descriptive statistics for Independent variables for RQ2

Descriptive Statistics

| Statistic | Network Density | Network Cohesion | Network Connectivity |
|---|---|---|---|
| Sample Size | 50 | 50 | 50 |
| Mean | 0.006759 | 0.0053 | 0.5963 |
| Std. Deviation | 0.002 | 0.003 | 0.29 |

Correlations

| Variables | Network Density | Network Cohesion | Network Connectivity |
|---|---|---|---|
| Network Density | 1 | 0.181 | 0.416 |
| Network Cohesion | 0.181 | 1 | -0.051 |
| Network Connectivity | 0.416 | -0.051 | 1 |


Finally, assumptions about normality and heteroscedasticity had to be verified (Cohen et al. 1983) for hypothesis testing. A Shapiro-Wilk test was used to test for normality of the independent and dependent variables. We found that the independent variables were normally distributed while the dependent variable “Duration to popularity” was not. The results are reported in Table 19, where non-significance indicates that the data are normally distributed. P-P plots were also constructed for the number of people and the number of tags to verify the normality of these two variables. The plots suggested that both were normally distributed; the Shapiro-Wilk test, however, indicates that the number of people bookmarking specific content is not normally distributed across the data. P-P plots were also constructed for the duration to popularity to verify its power-law nature, which is indicated by the data lying along the diagonal of the X and Y axes of the P-P plot. The P-P plot for the duration to popularity verified the power-law nature of the sample (Appendix A).

Table 19. Shapiro-Wilk Normality Test

| Names of Variables | Shapiro-Wilk Statistic | Significance |
|---|---|---|
| Count of Number of People | 0.665525 | 0.0000*** |
| Count of number of Tags | 0.955378 | 0.3528 |
| Network Cohesion | 0.96989 | 0.6643 |
| Network Connectedness | 0.899478 | 0.0210 |
| Network Density | 0.984699 | 0.9644 |
| Duration to Popularity | 0.584316 | 0.0000*** |

* = significant at 0.1 level, ** = significant at the 0.05 level, *** = significant at the 0.005 level


3.3.7 Hypotheses testing

Logistic regression (Cohen et al. 1983; Tabachnick et al. 2001) and survival analysis (Efron 1988; Hosmer et al. 2011) were used to test propositions 1-3 by estimating the coefficients of the measures included in the regression models. The dependent variable in propositions 1 and 2 is categorical, and consequently I employ regression methods that allow for categorical dependent variables. Logistic regression analysis allows for categorical dependent variables such as, in this case, popular versus unpopular content. If the dependent variable had more than 2 categories, I would employ a multinomial regression analysis. Such an analysis could be performed if the popular content were broken down into further intervals based on the number of days it took to become popular. This analysis is, however, outside the scope of this study and will not be performed.
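For reference, the standard binary logit specification that such an analysis estimates can be written as below. The notation ($p_i$ for the probability that case $i$ is popular, $x_{ik}$ for its $k$-th covariate, $\beta_k$ for the coefficients) is introduced here for illustration and is not taken from the original analysis.

```latex
\[
\log\frac{p_i}{1-p_i} \;=\; \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik},
\qquad
p_i \;=\; \Pr(\text{popular}_i = 1 \mid x_i)
\;=\; \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{k=1}^{K} \beta_k x_{ik})\bigr)}
\]
```

Under this form, a positive $\beta_k$ means that higher values of the covariate raise the log-odds of content being popular, and $\exp(\beta_k)$ is the corresponding odds ratio for a one-unit increase.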

In the case of proposition 3, duration to popularity is a continuous variable. Duration to popularity is a measure of the number of days it takes for content on the social bookmarking service to emerge as popular. This duration is measured as a period of n days, where n is the number of days after initial submission when the content reached popularity. The distribution follows a power law: a large number of pieces of content take long durations of time to become popular, while a few take shorter durations. This results in a censored data set, and the analysis methodology therefore needs to account for the right-censored nature of the dependent variable, “duration to popularity”. Cox regression models (Efron 1988; Hosmer et al. 2011) are designed for the analysis of time to an event or time between events, and can be used to examine both left and right censored data sets. Cox regression is also referred to as survival analysis (Tabachnick et al. 2001), since one of its popular applications is to examine the time from diagnosis of a terminal illness to death, i.e. survival. Consequently, Cox regression survival analysis will be employed for the analysis of proposition 3.
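The general form of the Cox proportional hazards model is sketched below for reference. The notation is introduced here for illustration: $h(t \mid x_i)$ is the hazard of content $i$ reaching popularity at day $t$, $h_0(t)$ is an unspecified baseline hazard, and $x_{ik}$ are the covariates; which covariates enter the model is determined by the analysis itself, not by this sketch.

```latex
\[
h(t \mid x_i) \;=\; h_0(t)\,\exp\!\Bigl(\sum_{k=1}^{K} \beta_k x_{ik}\Bigr),
\qquad
\frac{h(t \mid x_{ik} + 1)}{h(t \mid x_{ik})} \;=\; \exp(\beta_k)
\]
```

In this form a positive $\beta_k$ raises the hazard of reaching popularity and therefore corresponds to a shorter expected duration to popularity, while a negative $\beta_k$ corresponds to a longer one.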

Several other methods may be applied to test the effect of various independent variables on the binary outcome dependent variable in the case of propositions 1 and 2. ANOVA (Cohen et al. 1983) may be used to distinguish the two groups when the experimental condition applied is a manipulation. In this case, however, ANOVA is insufficient, since the goal of the analysis is to explain the outcome dependent variable as a function of continuous independent covariates rather than to use it as a grouping variable. In addition to ANOVA, a profile analysis may also be used to find the characteristics along which the groups differ. Univariate and multivariate t-tests may also be performed to distinguish the two groups on the continuous independent variables. These methods, however, do not help with the current analysis, since the propositions theorize about the existence and the direction of the effect of the independent variables on the dependent variable.


In the case of proposition 3, several other event analysis methods may be employed to examine duration-to-event variables, such as Kaplan-Meier survival analysis and life tables (Efron 1988; Hosmer et al. 2011). These methods, however, do not lend themselves to the analysis of duration to popularity. Kaplan-Meier survival analysis does not allow the examination of the effects of continuous covariates on the dependent variable, duration to popularity, and allows only categorical groups as the factor by which the duration to event is analyzed. Life tables are similarly unsuitable, since they rely on categorical factors to examine duration-to-event data and, furthermore, convert the duration data into intervals, thereby reducing the variance in the duration to popularity dependent variable.

3.4 Findings

Propositions 1 & 2: Popularity Analysis

Popularity is measured as a binary dependent variable where the value of 0 represents non-popular content and 1 represents popular content. This coding is based on the nature of the sampling process, where popularity is a dummy variable representing the region of the long-tail distribution from which the case originates. I employ the logistic regression analysis methods of the PASW/SPSS 18 package to examine the popularity variable. Tests were conducted at the 0.05 level of significance. Three logistic analyses are run, with nested models within each analysis. The nested models are the constant-only model; the model with constant and control variables; the model with constant, controls and the proportional tagging variable; and the final model with the constant, controls, proportional tagging and independent variables (see Table 20). The control variables in these models are the number of individuals bookmarking and the number of tags used. Proportional tagging is the number of tags per individual for each case. The creation of this variable will be discussed as part of the discussion of the analysis results. In the three logistic regression runs I compare the effect of specific data and variable characteristics on parameter estimates, model fit and significance. Additional variables may be developed as part of this analysis to account for certain effects.

Table 20. Results of Logit Analysis

| Analysis No | Analysis Name | Model No | Model Name | -2LL | AIC | Cox and Snell R2 | Variables | B | S.E. | Sig |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Baseline | 1 | Constant only | 138.6 | | | Constant | 0 | 0.2 | 1.00 |
| | | 2 | Constant and Controls | 31.07 | 37.07 | 0.659 | Constant | -6.80 | 1.59 | 0.00** |
| | | | | | | | count_people | 0.003 | 0.007 | 0.643 |
| | | | | | | | count_tags | 0.108 | 0.031 | 0.001*** |
| 2 | Tag controlled Model | 1 | Constant only | 138.6 | | | Constant | 0 | 0.2 | 1.00 |
| | | 2 | Constant and Controls | 31.06 | 37.06 | 0.659 | Constant | -6.80 | 1.59 | 0.00** |
| | | | | | | | count_people | 0.003 | 0.007 | 0.643 |
| | | | | | | | count_tags | 0.108 | 0.031 | 0.001*** |
| | | 3 | Constant, Controls, Proportional Tagging | 31.04 | 39.04 | 0.659 | Constant | -6.361 | 3.129 | 0.042** |
| | | | | | | | count_people | .001 | .017 | .956 |
| | | | | | | | count_tags | .116 | .064 | .070* |
| | | | | | | | prop_tags | -1.475 | 9.182 | .872 |
| 3 | Zscored Whole Model with Endo Effects | 1 | Constant only | 138.6 | | | Constant | | | |
| | | 2 | Constant, Controls, Proportional Tagging and Endo | 35.76 | 43.76 | 0.643 | Constant | 2.231 | .813 | .006** |
| | | | | | | | count_people | 9.570 | 2.706 | .000*** |
| | | | | | | | prop_tags | 3.066 | .922 | .001** |
| | | | | | | | tags_per_cliq | -.444 | .836 | .595 |
| | | 3 | Constant, Controls, Proportional Tagging, Independent Vars | 11.74 | 23.74 | 0.719 | Constant | 3.747 | 2.539 | 0.14 |
| | | | | | | | count_people | -1.404 | 6.350 | .825 |
| | | | | | | | prop_tags | 2.848 | 3.803 | .454 |
| | | | | | | | tags_per_cliq | 15.590 | 8.041 | .053* |
| | | | | | | | net_cohesion | 3.164 | 2.185 | .148 |
| | | | | | | | nos_cliques | 15.286 | 7.904 | .048** |

* = significant at 0.1 level, ** = significant at the 0.05 level, *** = significant at the 0.005 level

Table 20 provides the results of analysis 1, which is the baseline analysis. The dependent variable is the binary coded popularity variable. The 3 models run are the constant only model, the constant and controls model, and the model with the constant, controls and the IVs, i.e. network cohesion and number of cliques. I find that there is a significant difference between the constant only model and the controls only model. I find that the number of tags has a positive and significant effect on the dependent variable, popularity. This suggests that the number of tags in a case has a positive effect on the popularity of content.
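The fit statistics in Table 20 can be related to one another with a short calculation. The following is a minimal sketch, assuming the textbook definitions of AIC (-2LL plus twice the number of estimated parameters) and of the Cox and Snell R-squared, and assuming a total sample of 100 cases, the figure given in the limitations section. The function names are illustrative and are not part of the SPSS output. Under these assumptions the reported values for the constant and controls model (AIC 37.07, R-squared 0.659) appear to be recoverable from the -2LL values alone.

```python
import math


def aic(neg2ll: float, n_params: int) -> float:
    """Akaike information criterion: -2LL plus twice the number of estimated parameters."""
    return neg2ll + 2 * n_params


def cox_snell_r2(neg2ll_null: float, neg2ll_model: float, n_cases: int) -> float:
    """Cox and Snell pseudo R-squared: 1 - exp(-(D_null - D_model) / n)."""
    return 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n_cases)


N_CASES = 100        # assumption: total sample size, taken from the limitations section
NEG2LL_NULL = 138.6  # -2LL of the constant only model (Table 20)

# Constant and controls model: constant, count_people, count_tags -> 3 parameters
controls_aic = aic(31.07, n_params=3)
controls_r2 = cox_snell_r2(NEG2LL_NULL, 31.07, N_CASES)

# Final model of analysis 3: constant plus five covariates -> 6 parameters
full_aic = aic(11.74, n_params=6)
full_r2 = cox_snell_r2(NEG2LL_NULL, 11.74, N_CASES)

print(f"controls: AIC={controls_aic:.2f}, R2={controls_r2:.3f}")
print(f"full:     AIC={full_aic:.2f}, R2={full_r2:.3f}")
```

Because the Cox and Snell statistic depends only on the drop in -2LL relative to the constant only model, it is best read as a relative measure of improvement between nested models rather than as a proportion of variance explained.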

However, this is an absolute number of tags for each case. The number of tags attached to a piece of content may depend on the number of people that actually bookmark it. Cases with a large number of individuals bookmarking a piece of content may have a large number of tags associated with them, resulting in a significant positive effect for the count_tags variable on the DV, popularity. To mitigate this effect, a ratio variable was constructed that depicts the number of tags per individual for a case. This variable is referred to as prop_tags, or the proportional representation of tags per individual.

Analysis 2 reflects the effect of the prop_tags variable in the model. We see that the addition of prop_tags does not significantly change the model, but it does reduce the influence of the count_tags variable to non-significance at the 0.05 level. In subsequent analysis, the prop_tags variable is included to mitigate the effect of count_tags in the model. Furthermore, as part of this analysis, it was determined that standardized measures will be employed for the independent and control variables due to the large standard errors and the different scales of the variables.

Endogenous effect Correction: In addition to the prop_tags variable, there may be endogenous effects in the model. Endogenous effects refer to the constructs and variables in the model that change over time and have a mutually constitutive relationship with the dependent variable (Chintagunta et al. 2010). These effects need to be accounted for so that we can distinguish the effects of the independent variables from any potential endogenous effects in the model. In the current analysis endogenous effects may take the following form: the number of tags used for a piece of content increases over time, and the number of tags in prior time periods may influence the number of tags used in subsequent time periods and also the number of cliques that result in the network, since clique formation is around the tag structures. Subsequently, the number of cliques influences the popularity of the content. This endogenous effect can be accounted for by taking the ratio of the number of tags to the number of cliques for a specific case. This provides an average value for the number of tags per clique in the final network for each case, which remains relatively time invariant. This variable is referred to as tags_per_cliq, for which a standardized score is constructed as well.
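The construction of the two ratio variables and their standardized scores can be sketched as follows. This is an illustration only: the four cases are invented numbers rather than data from the study, the helper name zscores is hypothetical, and the use of the sample standard deviation for standardization is an assumption about how the z-scores were produced.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


# Hypothetical cases; the counts below are invented for illustration only.
cases = [
    {"count_people": 10, "count_tags": 25, "nos_cliques": 5},
    {"count_people": 400, "count_tags": 120, "nos_cliques": 30},
    {"count_people": 50, "count_tags": 40, "nos_cliques": 8},
    {"count_people": 200, "count_tags": 60, "nos_cliques": 12},
]

for case in cases:
    # Tags per individual: removes the dependence of tag counts on audience size.
    case["prop_tags"] = case["count_tags"] / case["count_people"]
    # Tags per clique: average tag load of a clique in the final network.
    case["tags_per_cliq"] = case["count_tags"] / case["nos_cliques"]

z_prop = zscores([case["prop_tags"] for case in cases])
z_tpc = zscores([case["tags_per_cliq"] for case in cases])

for case, zp, zt in zip(cases, z_prop, z_tpc):
    print(case["prop_tags"], case["tags_per_cliq"], round(zp, 3), round(zt, 3))
```

Standardizing puts the ratio variables on a common scale with the other covariates, which is the stated motivation for using z-scored measures in analysis 3.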

Since the 2 constructed variables prop_tags and tags_per_cliq employ the count_tags for each case, there is a high likelihood that certain effects are getting re-estimated in the model. Consequently, the count_tags variable is removed from the model since the effects of the number of tags are accounted for in the prop_tags and tags_per_cliq variables. The variables and changes discussed are reflected in analysis 3 in Table 20. I find that there are significant differences between a constant only model and a model with the controls and the proportional and endogenous variables (Models 1 and 2). I find that both the count of people and the proportion of tags are significant in explaining the dependent variable. Thus controlling for these effects in the model for popularity makes sense in the analysis. I also find that the endogenous effect in model 2 is not significant.

In a comparison of Models 2 and 3, i.e. the controls model and the controls and independent variables model, I find that the significant effects of people and the proportional tags construct are mitigated by the number of cliques in the cases and the average number of tags per clique for each case. This suggests that an endogenous effect is present in the model and is significant at the 0.1 level. After controlling for this endogenous effect in the analysis, there is an effect of the number of cliques on the popularity of content at the 0.05 level of significance. I also find that network cohesion does not have a significant effect on the popularity of content.

Thus: Proposition 1 is unsupported by the analysis. Proposition 2 is supported at the 0.05 level of significance since the sign on the coefficient is positive. Furthermore, we see that the addition of the independent variables yields a Cox and Snell R-squared of 0.719 for the model. While the R-squared of a logistic analysis does not necessarily indicate the explanatory power of an analysis, it allows me to argue for the improved explanatory capability of the independent variables. From a comparison of Models 2 and 3 we see an improved explanatory effect of approximately 8% (71.9%-64.3%) based on the addition of the independent variables. Based on the significance of nos_cliques we can conclude that nos_cliques can potentially account for 8% of the probability that content is popular. The implications of these results will be discussed in the discussion section.

Proposition 3: Duration to Popularity Analysis

I employed the Cox regression survival analysis methods provided in PASW/SPSS 18 to examine the duration to popularity variable. Three models are analyzed with duration to popularity as the dependent variable. Model 1 is the null model, where time is the only covariate in the model; Model 2 includes time as a covariate and also includes the number of bookmarks and the number of tags as control variables; and Model 3 includes the time, control and independent variables. The log-likelihoods of the 3 models are examined to determine whether the models account for a significant proportion of the variance in the duration to popularity. Table 21 below provides the results from the 3 models.

Table 21. Analysis of popularity Cox Regression

| Model | -2 Log-Likelihood | ChiSq Ratio | Sig | Variables in the Model | Coefficient | SE | Sig | Exp(B) |
|---|---|---|---|---|---|---|---|---|
| Null Model | 303.716 | | | Time only | | | | |
| Control Only Model | 302.257 | 1.612 | 0.482 | Time only | | | | |
| | | | | Number of People | 0.00027 | 0.001 | 0.667 | 1.000 |
| | | | | Number of Tags | 0.00208 | 0.004 | 0.604 | 1.002 |
| Complete Model | 292.686 | 10.994 | 0.048* | Time only | | | | |
| | | | | Number of People | 0.00021 | 0.001 | 0.764 | 1.000 |
| | | | | Number of Tags | 0.003 | 0.005 | 0.589 | 1.003 |
| | | | | Network Cohesion | -65.604 | 23.37 | 0.005*** | 0.033 |
| | | | | Network Connectedness | -0.907 | 0.693 | 0.191 | 0.404 |
| | | | | Network Density | -0.651 | 61.63 | 0.992 | 0.552 |

* = significant at 0.1 level, ** = significant at the 0.05 level, *** = significant at the 0.005 level

The null model is the baseline hazard model where the change in the dependent variable, duration to popularity, is based on time alone and all covariates are set to zero. All subsequent models are compared with the baseline model to examine if they explain more than the time variant model. In a comparison of the controls only model with the null model, I find that the addition of the control variables is not sufficient to distinguish the model from the time variant one alone. Furthermore, the control variables “number of people” and “number of tags” do not have a statistically significant effect on duration to popularity, and consequently the odds ratio (Exp(B)) is meaningless. In model 3, i.e. the complete model, the independent variables network cohesion, network connectedness and network density are added to the model. The comparison of the -2 Log Likelihood values of the null and complete models (303.716 and 292.686 respectively) indicates that this model is statistically significant at the 0.05 level and is different from the null model, and thus one or more variables in this model aid in explaining the duration to popularity.
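The comparison of nested models through their -2 Log Likelihood values can be illustrated with a likelihood-ratio test. The sketch below is a minimal illustration, not the SPSS procedure itself: it assumes that the change in -2LL between nested models follows a chi-square distribution with degrees of freedom equal to the number of added covariates, and it implements the chi-square tail probability only for even degrees of freedom, where a closed form exists. Applied to the null and controls only models (two added covariates), it appears to reproduce the significance value of 0.482 reported in Table 21.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(neg2ll_reduced: float, neg2ll_full: float, df: int):
    """Return the change in -2LL between nested models and its chi-square p-value."""
    stat = neg2ll_reduced - neg2ll_full
    return stat, chi2_sf_even_df(stat, df)


# Null model versus controls only model (adds number of people and number of tags).
stat, p = likelihood_ratio_test(303.716, 302.257, df=2)
print(f"change in -2LL = {stat:.3f}, p = {p:.3f}")
```

A p-value this large is consistent with the statement that the control variables alone do not distinguish the model from the null model.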

Model 3, i.e. the complete model, supports proposition 3 (α < 0.05). A positive sign on a coefficient would suggest that the covariate increases the hazard, or the likelihood of reaching popularity sooner, and consequently a negative coefficient suggests the inverse. The coefficient for network cohesion is negative, which suggests that network cohesion increases the duration to reach popularity. Network connectedness and network density are not significant in the complete model. Exp(B) provides the odds ratio or the hazard ratio in the model, where values greater than 1 indicate positive hazard, or increased odds of the event occurring, while values less than 1 indicate negative hazard, or increased odds of the event not occurring. The significant hazard ratio reported for network cohesion (Exp(B) = 0.033) is well below 1, which suggests that a unit increase in network cohesion lowers the hazard of reaching popularity, i.e. leads to a longer duration to popularity.
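The relationship between a Cox regression coefficient and Exp(B) can be made concrete with a small calculation. The sketch below assumes the usual definition of the hazard ratio as the exponential of the coefficient; the function name is illustrative. The network connectedness and number of tags coefficients from Table 21 are used purely to show the arithmetic, since neither is statistically significant in the complete model.

```python
import math


def hazard_ratio(coefficient: float, delta: float = 1.0) -> float:
    """Multiplicative change in the hazard for a change of `delta` in the covariate."""
    return math.exp(coefficient * delta)


# Coefficients as reported in Table 21 (used here only to show the arithmetic).
hr_connectedness = hazard_ratio(-0.907)
hr_tags = hazard_ratio(0.00208)

# A ratio below 1 lowers the hazard (longer time to popularity); above 1 raises it.
pct_change_connectedness = (hr_connectedness - 1.0) * 100.0

print(f"connectedness: Exp(B) = {hr_connectedness:.3f} "
      f"({pct_change_connectedness:.1f}% change in hazard per unit)")
print(f"number of tags: Exp(B) = {hr_tags:.3f}")
```

Read this way, Exp(B) is a multiplier on the hazard rather than a probability, which is why values below 1 correspond to longer durations to popularity.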

Table 22. Survival Table

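| Time | Baseline Cum Hazard | Survival (at the means of covariates) | SE (at the means of covariates) |
|---|---|---|---|
| 1 | .879 | .667 | .051 |
| 2 | 1.201 | .575 | .058 |
| 4 | 1.291 | .551 | .060 |
| 5 | 1.588 | .481 | .063 |
| 8 | 1.695 | .457 | .064 |
| 9 | 1.809 | .434 | .065 |
| 10 | 1.928 | .411 | .065 |
| 11 | 2.053 | .388 | .065 |
| 12 | 2.183 | .365 | .065 |
| 13 | 2.321 | .343 | .065 |
| 25 | 2.471 | .320 | .064 |
| 27 | 2.629 | .297 | .063 |
| 33 | 2.795 | .275 | .062 |
| 44 | 2.970 | .254 | .061 |
| 61 | 3.155 | .233 | .059 |
| 82 | 3.356 | .213 | .057 |
| 101 | 3.571 | .193 | .055 |
| 172 | 3.810 | .172 | .053 |
| 198 | 4.088 | .152 | .050 |
| 214 | 4.403 | .131 | .047 |
| 258 | 4.754 | .112 | .044 |
| 267 | 5.167 | .092 | .040 |
| 288 | 5.666 | .073 | .035 |
| 308 | 6.257 | .056 | .031 |
| 350 | 6.967 | .040 | .025 |
| 401 | 7.866 | .027 | .020 |
| 561 | 8.987 | .016 | .014 |
| 578 | 11.227 | .006 | .006 |
| 948 | . | .000 | . |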

The survival table provides a measure of the extent of the population that has not yet achieved popularity at various time intervals. The baseline cumulative hazard provides the ratio for the time only model with the continuous covariates set to zero. At the means of the covariates, the survival rate provides the percentage of the population of popular content that has not yet reached popularity. The means of the covariates are count_people = 402, count_tags = 117.240, net_cohesion = 0.010, net_connec = 0.584, and net_density = 0.007.

From Table 22, I note that 34% of the population of popular content achieves popularity within 1 day and that approximately 59% of popular content achieves popularity within 10 days. After this period the change is gradual, with smaller proportions of cases distributed at various time intervals. The interpretation of these results will be provided in the discussion section of this paper.
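The link between the baseline cumulative hazard and the survival rate at the means of the covariates can be sketched with the standard Cox proportional hazards relation, S(t | x) = exp(-H0(t) * exp(x'B)). The code below is a minimal illustration under that assumption, using the covariate means and the complete model coefficients as reported. Because the reported means and coefficients are rounded, the computed survival values should only approximate the survival column of the table.

```python
import math

# Covariate means and complete-model coefficients as reported in the text and tables.
means = {"count_people": 402, "count_tags": 117.240, "net_cohesion": 0.010,
         "net_connec": 0.584, "net_density": 0.007}
coefs = {"count_people": 0.00021, "count_tags": 0.003, "net_cohesion": -65.604,
         "net_connec": -0.907, "net_density": -0.651}

# Baseline cumulative hazard (covariates set to zero) at selected days.
baseline_cum_hazard = {1: 0.879, 2: 1.201, 10: 1.928}


def survival_at(covariates, coefficients, cum_hazard):
    """Cox model survival: exp(-H0(t) * exp(linear predictor))."""
    linear_predictor = sum(coefficients[k] * covariates[k] for k in coefficients)
    return math.exp(-cum_hazard * math.exp(linear_predictor))


survival_day_1 = survival_at(means, coefs, baseline_cum_hazard[1])
survival_day_10 = survival_at(means, coefs, baseline_cum_hazard[10])

# Share of popular content that has already reached popularity by each day.
popular_by_day_1 = 1.0 - survival_day_1
popular_by_day_10 = 1.0 - survival_day_10

print(f"day 1:  survival = {survival_day_1:.3f}, popular = {popular_by_day_1:.3f}")
print(f"day 10: survival = {survival_day_10:.3f}, popular = {popular_by_day_10:.3f}")
```

The complement of the survival rate is the cumulative share of content that has become popular, which is the quantity quoted in the text for day 1 and day 10.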

DFBetas were also calculated as part of this analysis to provide a measure of influence for the various cases in the models for different covariates. The DFBetas can be used to identify influential cases that may influence the final model and that are subsequently eliminated from the analysis or taken into other analysis models. One case was identified as influential in this analysis, with high DFBeta values. When this case was eliminated from the dataset and subsequent analysis, the values of the other covariates and the model did not change significantly enough to merit removal, and the case was reintroduced into the dataset. The chart below provides the cumulative survival time at the means of the covariates and is a graphical depiction of the survival table (Figure 18). By probing the table we can see that there are a few cases that become popular quickly, some take an average amount of time of about 10-20 days, and a large number of them take greater than that duration to gain popularity.

Figure 18. Survival function at the mean of covariates
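The idea behind the DFBeta screening described above, namely measuring how much a coefficient changes when a single case is left out, can be illustrated in a simplified setting. The sketch below deliberately swaps the Cox model for a simple least-squares slope so that it stays self-contained; the data are invented, the function names are hypothetical, and the values are not comparable to the DFBetas produced by SPSS.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def leave_one_out_changes(xs, ys):
    """Change in the slope when each case is removed in turn (full minus reduced)."""
    full = ols_slope(xs, ys)
    changes = []
    for i in range(len(xs)):
        reduced = ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(full - reduced)
    return changes


# Invented data: the last case sits far from the trend of the others.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]

changes = leave_one_out_changes(xs, ys)
most_influential = max(range(len(changes)), key=lambda i: abs(changes[i]))

for i, change in enumerate(changes):
    print(i, round(change, 3))
print("most influential case index:", most_influential)
```

A case with a large change relative to the others is a candidate for the kind of sensitivity check reported here, where the model is re-estimated without the case and the results are compared.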

3.5 Discussion and Conclusion

These findings offer partial support for the case study findings in chapter 2, and also elaborate on them. The case studies in chapter 2 found that duration to popularity is more common in networks with low initial connectedness and nodes of tag type in the network, and especially nodes of tag type with high betweenness scores. The results of the quantitative analysis show that networks in the head region of the long-tail do not have a greater number of structurally similar nodes as compared to the tail region of the long-tail distribution (disconfirmed proposition 1). The number of cliques, however, is significantly different between the two (confirmed proposition 2).

The results of propositions 1 and 2 illustrate a few points. Firstly, cohesiveness is not only a property of networks in which popular content is embedded. Unpopular content may also be embedded in cohesive networks. This implies that content of both types, popular and unpopular, is able to attract individuals that subsequently form cohesive sub-networks around those pieces of content. Secondly, while both types of content are able to attract individuals, popular content attracts individuals from a diverse range of interests represented through their tagging behavior. Consequently, a greater number of clique substructures are likely to form for popular content. This is evident from the significant effects for number of cliques and tags per clique in analysis 3. Endogenous effects are a concern in such analysis because they may have mutually constitutive relationships with the dependent variable; after such effects are accounted for, the number of cliques has a significant effect.

The results of proposition 3 also bolster the results from the analysis of propositions 1 and 2. The results of proposition 3 suggest that while network cohesion does have an effect on duration to popularity, the effect is negative, or in the opposite direction. Furthermore, the model that network cohesion is part of is statistically significant and serves to explain the duration to popularity of content. The findings of the model are that highly cohesive networks are not conducive to generating rapidly popular content. Higher values of network cohesion increase the duration to reach popularity. The survival table also illuminates the differences between various types of popular content. Approximately 34% of content sampled achieved popularity rapidly, within one day of the content being introduced into the content platform and social network. Approximately 60% of content achieves popularity within 10 days. This suggests that most content that achieves popularity does so quite rapidly, while other content takes longer to achieve popularity.

Putting together the results of propositions 1 and 3, we find that while network cohesion does not influence the popularity of content, it does influence the duration that a piece of popular content takes to achieve popularity. This effect is in the opposite direction and implies that highly cohesive networks are detrimental to achieving popularity quickly. The results of proposition 2 suggest that network cliques are influential in the determination of popular and non-popular content. Popular content has a greater number of cliques compared to non-popular content, and the number of cliques significantly determines the popularity of content. These results expand the findings and discussions in the literature on networks, long-tail and popularity (Fleming 2007).

Overall, the study verifies and disconfirms several findings in the literature and expands on them. To the network literature, the results augment Coleman’s argument (1986) on network closure and the value of tightly knit networks for information diffusion and information redundancy. Tightly knit networks are not very flexible to new sources of information and resist change to new entrants. Less cohesive networks are able to incorporate new sources of information through the entry of individuals as brokers (Burt 1997). Accordingly, Burt (1992a) suggests that weak ties to a closed network by nodes allow networks to be bridged. These weak ties, when established by critical nodes, expand the network and change the nature of the network evolution. While popular content and unpopular content cannot be distinguished on the basis of network cohesion, network cohesion does influence duration to popularity. Less cohesive networks are able to incorporate new members and information into the existing network rapidly, leading to a rapid change in popularity.

This also suggests that the “fit get richer” logic, which holds that node introduction is based on the fitness between a node and the network, does not always hold true. While greater fit between a node and the network increases node attachment, it does so within the restricted population of nodes that might bear closer correspondence, or homophily, to the network. Yet, networks that are rooted in specific forms of content, for instance Linux coders, have a wider population of nodes to draw from. If the fitness logic were to apply to Linux content, one might expect to find only technical coders participating in this network. In a content-based network, however, we find a broader population of nodes that have interest, and hence attachment is not only a matter of fitness with the established affiliation-based relationships. This lack of attachment based on fitness actually results in a larger population that is drawn to the content and consequently results in a shorter duration to popularity for popular content, as evidenced by the confirmatory results of proposition 2 and proposition 3. Lack of attachment due to fitness is also displayed in the larger number of cliques that are present in the popular content. Cliques are representative of various groupings of actor and tag combinations. A larger number of cliques implies that popular content draws from heterogeneous populations of users from a variety of domains whose representations of the content may not be congruent with one another. These heterogeneous populations look for like-minded individuals through the use of tags as signifiers of underlying cognitive schema. The search for and assignment of these tags to associate oneself with like-minded individuals and networks results in clique structures for popular content.

This paper also examined the role of an artifact, tags, in the evolution of socio-technical networks. The concept of socio-technical networks has been developed in the literature through the works of Monge and Contractor (2003), Yoo et al. (2007), and others. The technical artifact in most cases has been a physical device in the form of databases, CAD systems, or health information systems that mediate relationships between individuals. In this paper I theorize on the role of a cognitive artifact in a virtual space, i.e. tags. These cognitive artifacts serve as a mode of communication and a coordination device among distributed actors, in a similar manner as Post-it notes on a fridge communicate and coordinate activities among family members. We find that these tags form the nexus around which other nodes organize themselves through high levels of centrality and connectedness. Tag structures as cognitive artifacts are also a way to attract and maintain the attention of various distributed groups of individuals and are significant in establishing the popularity of content. Tag based classification schemes are flat classification structures that allow individuals to place their own cognitive schemata on content. Such flat classification schemes allow for multiple interpretations of the content space, resulting in attention and engagement from the users of a content community.

This paper also contributes to the literature on the long-tail and popularity. The extant literature on the long-tail (Brynjolfsson et al. 2006, Lew 2008, Anderson 2006, Elberse 2008, Rubinson 2008, Shepherd 2008) has examined its characteristics, but not its generative mechanisms. This paper theorized that socio-technical networks contribute to the mechanisms that lead to long-tail behaviors. We find that several network characteristics such as clique structures, connectedness and centrality indeed play an important role in distinguishing between types of content that show up in the head region from those that show up in the tail. We also find that duration to popularity is influenced by specific generative network characteristics. Network cohesion, measured through distance, i.e. the number of paths to get from one corner of the network to another, has a negative and significant effect on the duration to popularity. Accordingly, networks that are highly cohesive take a longer duration to reach popularity. Such highly cohesive networks are tightly knit and have established norms, conventions and language (Coleman 1986). As such they are able to transmit information internally very easily, but are more resistant to external sources of information. In contrast, networks that start out as less cohesive are able to incorporate new and varied sources of information and also grow content in popularity fast. These findings form an important contribution to the literature on the long-tail as they examine for the first time generative mechanisms rather than focusing on economic consequences.

3.5.1 Managerial Implications

The implications for managers are clear. Managers should focus on bridging large networks and making sure that participants in these networks have relationships with each other. While we might appreciate the phrase “let a thousand flowers bloom” [13], managers must not forget the forest for the trees. This work suggests that managers, in addition to encouraging the growth of new ideas, should also ensure that these ideas are disseminated as widely as possible. For instance, in the context of music, it would be important to relate new pieces of music to existing ones to get them to be popular (P1). In addition, it is essential that these new pieces of music not be cast into a clique or genre, so that they can be broadly interpreted (P2 and case study results). These findings can similarly be interpreted in other contexts of information content: books, music and video.

Furthermore, to get ideas to spread widely it is important to ensure that the networks in which these ideas are embedded are cohesive and connected. Connected networks are formed through a process of articulation and coordination of the cognitive structures of individuals. Such articulation and coordination aid in establishing relationships between individuals. Unconnected networks will result in stagnant ideas that will not spread through the network and subsequently remain unpopular.

[13] http://www.phrases.org.uk/meanings/226950.html

3.5.2 Limitations

This study has several limitations that affect its generalizability. This study was primarily run on content-based networks and its generalizability to other types of networks needs to be evaluated on a case by case basis. Furthermore, this paper has not examined the role of timing or the context of introduction on popularity, or the rate of popularity of content. The effect of context should not be underestimated, but there are few mechanisms, if any, to measure the context and timing of introduction in a quantitative manner and relate them to the specific characteristics of the sample. Sample size might also be a concern since the sample size used to represent the head and long-tail of the distribution was 100. While the potential population is several hundred thousand, we felt that limiting the sample size was important since a very large sample can provide confirmatory power in place of meaningful interpretation. In addition to concerns about the data, there may be other network variables whose addition would increase the explanatory power of the model. Furthermore, this analysis does not consider the timing of the various pieces of content in the sample. As such there might be exogenous factors in the environment that have not been accounted for in the model.

3.5.3 Future Work

Future analysis should focus on increasing the size of the sample as another method to achieve representativeness. Researchers often sample larger parts of the population to establish representativeness and that method should also be verified in this case. Future work should also include exogenous factors such as timing of content introduction in conjunction with variables that model the environment in that period. That would control and account for the influence of exogenous characteristics in the popularity and duration to popularity analysis. In addition to the influence of exogenous characteristics, it might be fruitful to examine how these exogenous factors interact with the characteristics of the network to drive popularity. This type of work would be analogous to work in viral marketing that examines the interaction of type of content and the structures best suited for that content (Hogan et al. 2005; Phelps et al. 2005).

Further work in the domain of the long-tail should examine the tail region of the distribution in more detail. The tail end of the long-tail distribution has been examined in several studies and this study finds that network structures underlie this region of the distribution as well. Since this region of the distribution consists of niche markets, organizations are not invested in understanding the social structures that underlie the niches since they are relatively small. However, individuals in these niches are part of structures that spread information within the niche efficiently and thus it is vital to examine such niches in greater detail. Furthermore, organizations find differing success in such niches, and examining the structural differences between niches is vital to marketing in them.

4. Discussion and Conclusion

4.1 Introduction

The goal in this thesis was to identify, in distributed IT environments, characteristics of socio-technical networks that influence content popularity and duration to popularity. I argued that tags and folksonomy serve as cognitive devices for individuals in networks to embed their cognition. As cognition is articulated, embedded and shared through these artifacts, affiliation-based relationships are formed between individuals that employ similar cognitive devices or structures. Similar cognitive structures imply similarity or homophily in the network, i.e. that nodes are similar to each other in terms of their piece of content. This homophily results in groups coalescing together around a select set of cognitive structures. This behavior results in patterns of interlocking relationships between individuals through shared cognitive structures that subsequently result in specific network structures. The shared cognitive structures enumerated by individuals are part of decisions made and shared by individuals on content. The networks of individuals and related shared cognitive structures subsequently form socio-technical infrastructures that influence the popularity of content and the duration to achieve popularity. I investigate these characteristics of networks and the structures that they are embedded in through a series of research questions by generating, testing and validating several hypotheses about antecedents of popularity and duration to popularity in these environments. We employ a series of case studies followed by a field study into characteristics of networks that influence popularity and duration to popularity.

4.2. Summary of Findings

Two studies were conducted to examine the manner in which distributed decision-making took place through IT artifacts and the manner in which the socio-technical network surrounding the information and content influenced the popularity of the content. The findings of these studies are discussed and implications for research and practice are drawn.

4.2.1. Lessons from Case Studies

To examine the process by which content emerges as popular and the formation of network relationships through shared cognitive structures we examine several instances of content. I specifically examine various forms of content in relation to popularity and their duration to popularity. Based on the characterization of the popularity of content as part of long-tail behavior we sample cases from different regions of the long-tail to examine in detail the manner in which the specific network and cognitive structures change and are shaped over time. State diagrams are employed to describe the various phases through which the network transitions as individuals articulate and share their cognitive structures.

Three types of characterizations were developed for the different classes of content popularity: dead-ducks, sprinters and slow winners. Dead-ducks refer to cases of content where there was no movement to popularity, and they exist at the tail end of the long-tail distribution. Instead of garnering a lot of attention from individuals in the IT environment, the content attracted few individuals, and cognitive activity around that piece of content is relatively stagnant. While individuals might occasionally attach to a dead-duck, it is the activity around the cognitive structures that is a significant determinant in the move to popularity.

The dead-ducks were contrasted with the sprinters and the slow winners. The slow winners were content that had steady and fluid activity around the cognitive structures. The networks of individuals around these cognitive structures were connected for extended periods of time, but these slow winners only became popular once there was a disconnection or disruption in the cognitive structures surrounding the content. This disruption resulted in a connective individual joining the network and bridging the disconnected cognitive structures. The bridging of this disconnection subsequently becomes the network and cognitive structure around which the broad network coalesces. This bridging activity also spurs increased tagging activity around the content and subsequently results in the content becoming popular. The sprinters, when contrasted with the slow winners, indicate the presence of substantially differing cognitive structures when the content is introduced into the IT environment. The differing structures reflect the diverse opinions of the individuals engaging with the content and knowledge domain. The opinions are reflected in the articulations, or the tags and folksonomy, that individuals used to represent the knowledge domain. Thus conflicting cognitive structures or representations are the key to triggering bookmarking and representing new cognitive structures. Coalescing around one cognitive structure is achieved by individuals reconciling the different structures and the network establishing relationships to similar tags.

Findings from the case studies support several findings in the literature about the role of network structures in influencing the opinions of individuals, in groups and in social networks. However, these findings need to be modified in the context of the content networks studied in this thesis. Though Burt (Burt 1999; Burt 1997b; Burt 2000) has established the importance of generic “bridging nodes” to network connectivity and social capital, findings from the case studies suggest that specific types of “bridging nodes” contribute more to the emergence of content popularity. Nodes of type “individual” inhibit the emergence of the content as popular, while nodes of type “tag” contribute to the emergence of popular content. This is evident in the case of the “slow winner”, where the content rose to popularity only after a node of type “tag” had a relatively high betweenness score and a prominent position in the network. This analysis extends Burt’s work by examining the role of different types of artifacts playing bridging roles in networks (Kane et al. 2008). This is significant since it suggests that in IT mediated environments nodes other than the social ones have significant bridging roles (Kane et al. 2008). Tags convey representations of cognitive structures that can be easily shared and thus offer the main mode through which individuals communicate about a domain (Romero et al. 2011). As such, the tags and associated cognitive structures become key brokers (Hutchins et al. 1996) in creating network structures that influence content popularity.
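To make the notion of a tag acting as a bridging node concrete, the following is a minimal sketch of how a betweenness score could be computed on a small two-mode network of users and tags. The network, the node names and the helper functions are hypothetical illustrations, not data or tooling from the thesis; the scoring uses Brandes' shortest-path algorithm for unweighted, undirected graphs.

```python
from collections import defaultdict, deque

def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # each undirected pair was counted from both endpoints
    return {v: c / 2 for v, c in score.items()}

# Hypothetical user-tag ties: two user groups joined only by the tag "howto".
edges = [
    ("u1", "linux"), ("u2", "linux"),
    ("u2", "howto"), ("u3", "howto"),
    ("u3", "design"), ("u4", "design"),
]
scores = betweenness(build_adjacency(edges))
top_bridge = max(scores, key=scores.get)
print(top_bridge, scores[top_bridge])
```

In this toy network the tag that spans both user groups receives the highest score, which is the pattern the slow-winner case describes for tag-type nodes.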

In addition, the case study suggests that network closure, i.e. highly connected networks (Coleman 1988; Coleman 1990), followed by a weakening of the network bonds, can lead to content emerging as popular. This finding is evident in the case of the “slow winner”, where network connectedness initially has a steady high value but drops before the content rises to popularity. This in effect develops a more nuanced contingency for the process of gaining network closure (Burt 2000), as it suggests that networks with structural holes move towards closure after the establishment of bridging ties. Furthermore, this case study argues against the benefit of highly connected networks for spreading information effectively. Highly connected networks resist the addition of new nodes and relationships. Information flow within such networks becomes stagnant and restricted to the existing set of nodes, and the growth pattern in such networks is slow and steady. This conclusion is evident from the analysis of the network surrounding the case of the “dead duck”.
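The drop in connectedness described above can be illustrated by tracking simple structural measures across network snapshots. The sketch below assumes one common definition of connectedness (the share of node pairs joined by some path) alongside plain density; the thesis's exact operationalization is not restated here, and the two snapshots are invented for illustration.

```python
from itertools import combinations

def density(nodes, edges):
    """Share of possible undirected ties that are actually present."""
    n = len(nodes)
    return 0.0 if n < 2 else 2 * len(edges) / (n * (n - 1))

def connectedness(nodes, edges):
    """Share of node pairs that can reach each other through some path."""
    parent = {v: v for v in nodes}

    def find(v):
        # union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    pairs = list(combinations(nodes, 2))
    linked = sum(1 for a, b in pairs if find(a) == find(b))
    return linked / len(pairs) if pairs else 0.0

# Hypothetical snapshots of the same four nodes at two points in time.
nodes = ["a", "b", "c", "d"]
early = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]  # tightly knit
later = [("a", "b"), ("c", "d")]                          # bonds weakened

for label, snapshot in (("early", early), ("later", later)):
    print(label, round(density(nodes, snapshot), 3),
          round(connectedness(nodes, snapshot), 3))
```

Computing such measures at each observation point gives a trajectory against which the timing of a content item's rise can be compared.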

The findings from the case study extend and complement existing work on network characteristics and their influence on individuals and organizations, and they do so in the context of content popularity. While the substantive literature has little to say about the influence of network characteristics on content popularity, it does discuss how specific network structures act to generate emergent outcomes (Barabási et al. 1999b; Katz 1993; Monge et al.). The case study analysis, in a similar vein, attempts to highlight how network and node characteristics shape the emergence of popular content in online social bookmarking.

4.2.2. Lessons from a Quantitative Field Study

The research questions posed in this thesis were also examined through a quantitative field study where hypotheses were tested on a broader sample. The quantitative analysis allowed me to control for certain exogenous characteristics that I was not able to specifically control in the case analyses.

The field study performed Logit and Hazard (Efron 1988) analyses on content sampled from the head and tail regions of the long-tail distribution, after verifying that the sample was representative of the population.

Subsequent analysis examined the extent to which specific network characteristics distinguish between content in the head and tail of the long-tail distribution. The results show that networks in the head region have a greater number of cliques and a greater number of tags per clique. However, the cohesiveness of the network is not significantly different between the networks in the tail and head. Moreover, the tags that are embedded within the cliques do not serve to connect the cliques together. The disconnected regions of the network never truly get connected to the core of the network and remain disconnected and peripheral. Such clusters exist in both the head and tail regions of the long-tail distribution. This also suggests that tags in the tail region do not serve to connect distinct clusters of users together.
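As an illustration of the two clique measures discussed above, the sketch below enumerates maximal cliques with the Bron-Kerbosch algorithm and then counts cliques and tags per clique. The small network, the "tag:" naming convention and the minimum clique size of three are assumptions made for the example; they are not the thesis's data or thresholds.

```python
from collections import defaultdict

def maximal_cliques(adj):
    """Yield maximal cliques as frozensets (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # choose the pivot that covers the most remaining candidates
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

# Hypothetical ties among users and tags; "tag:" marks tag-type nodes.
edges = [
    ("u1", "u2"), ("u1", "tag:linux"), ("u2", "tag:linux"),
    ("u3", "u4"), ("u3", "tag:design"), ("u3", "tag:css"),
    ("u4", "tag:design"), ("u4", "tag:css"), ("tag:design", "tag:css"),
    ("u2", "u3"),  # a single tie between the two groups
]
adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Assumption: only cliques of three or more nodes are counted.
cliques = [c for c in maximal_cliques(adj) if len(c) >= 3]
tag_counts = [sum(1 for v in c if v.startswith("tag:")) for c in cliques]
tags_per_clique = sum(tag_counts) / len(cliques)
print(len(cliques), tags_per_clique)
```

Note that in this example the tie joining the two cliques runs between users, not tags, mirroring the observation that tags embedded within cliques need not connect the cliques to one another.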

The duration to popularity analysis suggests that network cohesion, which is not significant in influencing the popularity of content, does influence the duration to popularity. Thus a variable that is non-significant in influencing popularity influences the time it takes for content to achieve popularity. Network cohesion makes it harder for content to achieve popularity: the more cohesive the network, the longer it takes for content to achieve popularity. This suggests that popular content with a large number of tags within the cliques is not served by this characteristic beyond a certain threshold. Too many tags within cliques with popular content will bridge the network too cohesively, which results in longer durations to popularity.
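The relationship between cohesion and duration can be pictured with a simple time-to-event summary. The sketch below is not the hazard model estimated in the thesis; it substitutes a Kaplan-Meier estimate of the share of content "not yet popular" over time, computed separately for two hypothetical groups. The durations, the grouping into low and high cohesion, and the censoring flags are invented for illustration.

```python
from fractions import Fraction

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time with at least one observed event.

    durations: time until popularity or until observation ended
    observed:  True if popularity was reached, False if censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    surv = Fraction(1)  # exact arithmetic avoids rounding at thresholds
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_t = [e for d, e in data if d == t]
        events = sum(same_t)
        if events:
            surv *= Fraction(at_risk - events, at_risk)
            curve.append((t, surv))
        at_risk -= len(same_t)
        i += len(same_t)
    return curve

def median_time(curve):
    """First time at which half or more of the items have become popular."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None

# Hypothetical durations (in days) for content in two kinds of networks.
low_cohesion = kaplan_meier([2, 3, 3, 5], [True, True, True, True])
high_cohesion = kaplan_meier([4, 6, 8, 8], [True, True, True, False])

median_low = median_time(low_cohesion)
median_high = median_time(high_cohesion)
print(median_low, median_high)
```

A longer median time in the more cohesive group is the pattern the duration analysis reports; a regression-based hazard model would additionally control for other covariates.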

Overall, the quantitative analysis verifies several findings in the literature and expands them. The result adds to Coleman’s argument (1986) on network closure and the value of tightly knit networks. Tightly knit networks are not flexible to new sources of information and resist change as new entrants try to join. As such, less cohesive networks, through the entry of individuals as brokers (Burt 1997), are better able to incorporate new sources of information. Burt (1992a) accordingly suggests that weak ties to a closed network allow multiple networks to be bridged. These weak ties, when established, expand the network and change the trajectory of the network evolution. This also suggests that the “fit get richer” logic, in which node introduction in networks is based on the fitness between a node and the network, does not always hold. While greater fit between a node and the network increases node attachment, it does so within a restricted population that might bear closer correspondence or homophily to the network. Networks that are rooted in specific forms of content, for instance Linux coders, have a wider population of nodes to draw from. If the fitness logic were to apply to Linux content, one might find only technical coders as participating individuals in this network; however, in a content-based network we find a broader population of nodes that have greater interest, and hence attachment is not only a matter of fitness and established affiliation-based relationships.

The quantitative analysis also finds that socio-technical networks contribute to the mechanisms that lead to long-tail behavior. The technical characteristics of the network, represented by the tag structures, are important to the popularity of content and, by influencing the cohesion of the network, indirectly responsible for the duration to popularity. Multi-modality is an important component of content networks, and it is reflected in the relationships among tags, content and users that contribute to popularity.

Network characteristics such as clique structures and cohesion play an important role in distinguishing the types of content that show up in the head region from those that show up in the tail. I also find that duration to popularity is influenced by network characteristics. These findings are an important contribution to the literature on the long-tail, as they examine generative mechanisms rather than focus on phenomenological descriptions.

4.2.3. Concluding remarks


The results from the case study and the field study highlight the emergent nature of popularity. Popularity of content (Szabo et al. 2010) is a function of the multi-modal networks in which the content is embedded. One may not be able to influence the popularity of content directly, but one can create the conditions for it. This emergent nature of popularity is reflected in the manner in which network cohesion, while not driving popularity, does influence the duration to popularity.

Tags play an important role (Tisselli 2010) in determining the duration to popularity. Tags are demonstrated to serve as socio-cognitive devices that help in the establishment and maintenance of relational structures within the content network. They mediate individual interaction and consequently play a pivotal role in structuring the network.

The quantitative analysis highlights the role of socio-technical networks in popularity and, in the chosen context, content popularity. The case studies in chapter 2 found that duration to popularity is more common in networks with low initial connectedness and with nodes of tag type in the network, especially nodes of tag type with high betweenness scores. The quantitative analysis offers partial support for the case study findings and also elaborates on them. The case studies showed that high network connectedness was a predominant feature of non-popular content, which also reinforces the inverse relationship found in the quantitative analysis. While network density was not examined in the qualitative analysis, the findings in the quantitative analysis suggest that higher network density is good for rapidly popular content.

4.3. Limitations

The thesis has several limitations. The first is the small number of cases employed in the case analysis in each branch of the decision tree. A greater number of cases would increase confidence in the theoretical generalizations being made from the case data and provide more detailed insights into the way network structures evolve. Furthermore, the network and node characteristics were developed at time increments of approximately 25% in the case analysis. Sampling at more frequent time intervals might provide a more nuanced understanding of the dynamics of network evolution.

The quantitative analysis also has its own limitations. Since the quantitative analysis was primarily run on content-based networks, its generalizability to other types of networks is limited. Furthermore, the quantitative analysis did not examine the role of timing, or the context of content introduction in the network, on popularity or duration to popularity. The effect of context should not be underestimated, but there are few mechanisms, if any, to measure the context and timing of introduction in a quantitative manner and relate them to the characteristics of the sample. Sample size might also be a concern, since the sample representing the head and long-tail of the distribution was limited to 100. While the potential population is several hundred thousand, we felt that limiting the sample size was necessary since a very large sample can provide a biased confirmatory power to our interpretation. Representativeness of the sample was established through methods described previously. Future analysis should therefore focus on increasing the size of the sample in a controlled manner. Researchers often sample larger parts of the population to establish representativeness, and that method should also be validated. Future work should also include new exogenous factors, such as timing of content introduction, in conjunction with variables that model the environment in that period. That would control and account for the influence of exogenous characteristics in the popularity and duration to popularity analysis. In addition to the influence of exogenous characteristics, it might be fruitful to examine how these exogenous factors interact with the characteristics of the network to drive popularity. This type of work would be analogous to work in viral marketing that examines the interaction of type of content and the structures best suited for that content (Brooks Jr 1957; Steyer et al. 2006b; Subramani et al. 2003).

Furthermore, this thesis does not delve into the characteristics of the content itself. Instead, it seeks a structural explanation of content popularity. Future work can also examine the contributory role of content characteristics in explaining the evolution of popular content. Other theories that can be employed to contrast with a network approach include a Diffusion of Innovations analysis (Abrahamson 1991; Rogers 1995; Valente 1996) that can be customized to content-based networks. While such theoretical explanations were not explored in this thesis, they may offer insight into other contributory mechanisms.

Other research methods may also be applied to the analysis of popularity in content-based networks. Analysis of popularity may be run through the use of simulated data. Simulated data would be especially useful for the dynamic analysis of popularity, as this thesis was only able to perform a dynamic analysis based on a few sample cases of popularity. The application of advanced computational techniques and dynamic network analysis may shed additional light on the processes and characteristics of networks that influence popularity at a more granular level (Carley et al. 2007). Tools for dynamic network analysis are still in the process of development and hold the potential for shedding new light on the evolution and growth of networks.

The static and dynamic analysis of popularity may be performed on other leading content sharing sites such as digg.com and buzzfeed.com. These sites display voting information from users on the popularity of content, but rather than focusing on the two dimensions of popularity and non-popularity, the sites impose further dimensions of content classification, such as surprising, novel, funny and more. These generative and emergent categories from users may highlight additional patterns in the rise of certain types of content. Furthermore, the sampling applied in this thesis focused on a limited set of content from a large population to avoid the problem of a large sample size resulting in biased confirmatory results (Royall 1986). However, one may also approach this problem by sampling from a large population and correcting for the large sample size by using statistical methods (Cohen et al. 1983).

4.4. Future Research


The findings of this thesis have several implications for research and practice. While earlier research (Anderson 2006a; Brynjolfsson et al. 2006; Elberse 2008) on long-tail structures has chiefly focused on the nature and consequences of the long-tail, little is known about the systems and structures that generate such behavior. There is limited support for the role of collaboration in social networks having effects on the positions in the long-tail (Fleming 2007). Our findings suggest that network structures can contribute to long-tail behaviors and that content that is part of the long-tail can move from the tail to the head based on changing network dynamics. Future research on long-tail structures needs to examine the environments in which these structures are embedded. While Brynjolfsson (2006), Elberse (2008) and others examine the long-tail and discuss the investment potential in such distributions, they need to comprehend the underlying networks that produce such behavior. Such distributions, while spontaneous, also have underlying structures, and targeted investments at focal nodes in such structures can greatly influence the outcome of such distributions. Furthermore, while collaboration (Fleming 2007) is critical in long-tail behavior, collaboration with specific focal nodes can result in vastly greater returns; consequently, research should closely examine the returns to collaboration at various regions of the long-tail distribution.

This thesis also serves the knowledge management community (Bailey et al. 2010; Hester 2008; Phang et al. 2009) by emphasizing the role of representational or classification systems (Hansen et al. 2001; Poston et al. 2005) in managing and disseminating knowledge. The role of tags in the system suggests that the storage and representational systems for knowledge need to be closely examined (Bowker et al. 2000; Star 2002). This thesis examines the role of classification structures in mediating fluid and emerging cognitive understanding of a domain. This is a significant expansion, since earlier work (Bowker et al. 2000; Star 1989) has highlighted the manner in which cognition is shared through stable and fixed cognitive structures. This thesis, in contrast, has highlighted the manner in which fluid and emergent structures, folksonomies, aid in the dynamic articulation and shaping of cognitive structures in “Internet Time” (MacCormack et al. 2001). Future work on classification structures and folksonomies should examine the stability of networks that form around such structures. While this thesis has examined network formation around cognitive articulations expressed through tags and folksonomies, it has not examined the stability of these network structures, and their permanence should be examined in future work.

This thesis adds to the IT literature (Brynjolfsson et al. 2006; Elberse 2008; Fleming 2007) by elaborating on the description of long-tail characteristics of IT mediated networks. It theorizes about the hybrid network structures (Kane et al. 2008) that underpin long-tail distributions and about the socio-technical characteristics of nodes that are critical to the long-tail. Further work in this area needs to examine the roles that various types of nodes play in IT mediated networks. While Kane (Kane et al. 2008) and Monge (Monge et al. 2003a) both theorize about IT as nodes in the network, they do not characterize the specific nature of these nodes. This thesis argues that tags and folksonomies as IT nodes aid in sharing cognitive structures. However, this is only one of many possible roles of IT artifacts as nodes in a network, and subsequent work needs to examine other possible roles that IT may have in a network when characterized as a node.

This thesis also serves the marketing and consumer research (Phelps et al. 2005; Subramani et al. 2003) community by identifying the characteristics of networks that need to be analyzed to examine viral behaviors in consumer markets. In this sense the thesis goes beyond the traditional cross-sectional examination of consumer networks for viral behaviors by conducting both a longitudinal case-based analysis and a cross-sectional quantitative analysis to identify how change in networks results in viral behavior. This is not the typical word-of-mouth phenomenon that marketing research (Brooks Jr 1957; Hogan et al. 2005; Phelps et al. 2005; Steyer et al. 2006a) has focused on. This thesis finds that cues in the environment, tags and folksonomies, are important to information diffusion in consumer networks. Consequently, marketing and advertising research should also focus on the effect that cues in the information space have on word-of-mouth effects. Peripheral network structures, also of interest to marketing researchers, may include technological artifacts in addition to social nodes that are also conduits of information. The role of such information-related structures should therefore also be examined in marketing and related research.

4.5. Managerial Implications


This study contributes to managerial practice by moving the focus of knowledge research from content and best practices to the broader dynamic context within which content is placed. This study identifies “where do I place content/knowledge?” and distinguishes it from “what do I put in content/knowledge?”, which is the typical focus of marketing research. The implications for managers on the basis of this work are clear. Managers should focus on bridging large networks and making sure that participants in these networks have relationships with each other. While we might appreciate the phrase “let a thousand flowers bloom” [14], managers must not forget the forest for the trees. This work suggests that managers, in addition to encouraging the growth of new ideas, should also ensure that these ideas are disseminated as widely as possible. To get these ideas spread widely, it is important to ensure that the networks in which these ideas are embedded are cohesive and connected. Unconnected networks will result in stagnant ideas that will not spread through the network and will subsequently remain unpopular.

Managers should also be aware of the potential of content sharing networks to act as a generative engine of new ideas and domains. Content sharing networks that are populated by users provide a means of generating new content and also serve as the organizing mechanism for that content. Consequently, managers have to provide the information technology infrastructure for individuals to collaborate and create content rather than focusing on the imposition of pre-existing categories and codes. Information technology should be regarded as a means for individuals to express their individuality rather than as a tool for the control of individuals. Tags, and content categorized through the tags, represent the users of the technology engaging in meaning-making and sense-making processes (Weick et al. 2009). On information technology platforms such as content sharing sites, this sense-making is a collective process and should be leveraged by managers as a way to organize content. This implies that managers can take these emergent categories and examine patterns of emergence to aid in the management of content and information within the organization. Managers of social media platforms can use the emergent processes of popularity to customize and tailor the content that they push to users, based on the examination of similarities of tag and content interests, so that users are served with content that is engaging to them.

14 http://www.phrases.org.uk/meanings/226950.html


5. APPENDIX

APPENDIX A Theoretical Concepts

Concept: Definition

Social Bookmarking: Social bookmarking describes the phenomenon where individuals collectively contribute and classify/order content.

Long-tail: Long-tail describes the sections of a distribution that follows a power law. The distribution can be divided into two regions, the head and the tail. The head refers to the region of the distribution that has high-frequency populations and the tail refers to the low-frequency region of the distribution.

Multi-modal Networks: A multi-modal network refers to a socio-technical network that has a combination of individual actors and technological artifacts as nodes. Relationships in such networks are based on communication linkages or affiliation patterns among the nodes.

Popularity: Popularity is the outcome of a process of decision-making behavior on the part of individuals. Individuals' decision-making behavior contributes to the collective valuation that determines the value of a commodity, content or other items.

Tags: Tags are the terms or concepts used to organize a domain or content. They are the structure on which flat classification systems are based.

Folksonomy: Folksonomy refers to the social and collectively generated classification systems that are created through the use of tags. Folksonomies are emergent classifications that are generated and consumed by members of the community.


APPENDIX B

P-P Plot for the duration to popularity

The P-P plot for duration of popularity was constructed for a power-law distribution.

[Figure: Normal P-P Plot of Pop_duration. Horizontal axis: Observed Cum Prob; vertical axis: Expected Cum Prob; both axes run from 0.0 to 1.0.]

P-P plot for Number of people and Number of tags

The P-P plots were constructed for a normal distribution

[Figure, two panels: Normal P-P Plot of Count_people; Normal P-P Plot of Count_tags. Horizontal axis: Observed Cum Prob; vertical axis: Expected Cum Prob; both axes run from 0.0 to 1.0.]


P-P plot for Network Cohesion, Network Connectivity and Network Density respectively

The P-P plots were constructed for a normal distribution

[Figure, three panels: Normal P-P Plot of Net_Cohesion; Normal P-P Plot of Net_Connec; Normal P-P Plot of Net_Density. Horizontal axis: Observed Cum Prob; vertical axis: Expected Cum Prob; both axes run from 0.0 to 1.0.]
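For readers who wish to reproduce plots of this kind, the following is a minimal sketch of how the points of a normal P-P plot can be computed. The sample values are hypothetical, and the plotting-position formula (i - 0.5)/n is an assumption; statistical packages differ in the convention they use.

```python
from statistics import NormalDist, mean, stdev

def pp_points(sample):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    xs = sorted(sample)
    n = len(xs)
    # fit a normal distribution using the sample mean and standard deviation
    fitted = NormalDist(mean(xs), stdev(xs))
    observed = [(i - 0.5) / n for i in range(1, n + 1)]  # empirical positions
    expected = [fitted.cdf(x) for x in xs]               # fitted normal CDF
    return list(zip(observed, expected))

# Hypothetical measurements for one variable.
sample = [1, 2, 3, 4, 5]
points = pp_points(sample)

# Largest vertical distance from the 45-degree line; small values suggest
# that the normal distribution describes the sample reasonably well.
max_gap = max(abs(obs - exp) for obs, exp in points)
print(points, max_gap)
```

Points that fall close to the diagonal indicate agreement between the sample and the assumed distribution; the same procedure can be applied with a different fitted distribution by replacing the CDF.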


6. BIBLIOGRAPHY

Abrahamson, E. "Managerial fads and fashions: the diffusion and rejection of innovations," Academy of Management Review 1991, pp 586-612.
Ahuja, M., Galletta, D., and Carley, K. "Individual Centrality and Performance in Virtual R & D Groups: An Empirical Study," Management Science (49:1) 2003, pp 21-38.
Ainslie, A., Drèze, X., and Zufryden, F. "Modeling movie life cycles and market share," Marketing Science (24:3) 2005, pp 508-517.
Alavi, M., and Leidner, D. E. "Review: Knowledge management and knowledge management systems: Conceptual foundations and research issues," MIS Quarterly (25:1) 2001, pp 107-136.
Albert, R., and Barabási, A. L. "Statistical mechanics of complex networks," Reviews of Modern Physics (74:1) 2002, pp 47-97.
Ali, J., and Tanaka, J. "Implementing the dynamic behavior represented as multiple state diagrams and activity diagrams," Journal of Computer Science & Information Management (JCSIM) (2:1) 2001, pp 24-34.
Anderson, C. The long tail Hyperion, 2006a.
Anderson, M. H. "How can we know what we think until we see what we said?: A citation and citation context analysis of Karl Weick's the social psychology of organizing," Organization Studies (27:11) 2006b, p 1675.
Argyres, N. "The Impact of Information Technology on Coordination: Evidence from the B-2 "Stealth" Bomber," Organization Science (10:2) 1999a, pp 162-180.
Argyres, N. S. "The Impact of Information Technology on Coordination: Evidence from the B-2 "Stealth" Bomber," Organization Science (10:2) 1999b, pp 162-180.
Attewell, P. "Technology diffusion and organizational learning: The case of business computing," Organization Science (3:1), February 1991, pp 1-19.
Avital, M., and Te'eni, D. "From generative fit to generative capacity: exploring an emerging dimension of information systems design and task performance," Information Systems Journal (19:4) 2009, pp 345-367.


Bailey, D. E., Leonardi, P. M., and Chong, J. "Minding the gaps: Understanding technology interdependence and coordination in knowledge work," Organization Science (21:3) 2010, pp 713-730.
Baker, G. "The Effects of Synchronous Collaborative Technologies on Decision Making: A Study of Virtual Teams," Information Resources Management Journal (15:4) 2002, pp 79-93.
Barabási, A., and Albert, R. "Emergence of Scaling in Random Networks," Science (286:5439) 1999a, p 509.
Barabási, A., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., and Vicsek, T. "Evolution of the social network of scientific collaborations," Physica A: Statistical Mechanics and its Applications (311:3-4) 2002, pp 590-614.
Barabasi, A.-L., and Albert, R. "Emergence of scaling in random networks," Science (286) 1999, pp 509-512.
Barabási, A. L., and Albert, R. "Emergence of Scaling in Random Networks," Science (286:5439) 1999b, p 509.
Basuroy, S., Chatterjee, S., and Ravid, S. A. "How critical are critical reviews? The box office effects of film critics, star power, and budgets," Journal of Marketing (67:4) 2003, pp 103-117.
Beattie, J. "Undercover advertisers: You may never see these stealth marketing tactics coming," ABC News (22) 2002.
Bhuian, S. N. "Marketing cues and perceived quality: Perceptions of Saudi," Journal of Quality Management (2:2) 1997, p 217.
Boland, D., Lyytinen, K., and Yoo, Y. "Wakes Of Innovation In Project Communities: The Case Of Digital 3-D Representations In Architecture, Engineering And Construction," forthcoming.
Boland Jr, R., and Tenkasi, R. "Perspective Making and Perspective Taking in Communities of Knowing," Organization Science (6:4) 1995, pp 350-372.
Boland, R. J., Tenkasi, R. V., and Te'eni, D. "Designing information technology to support distributed cognition," Organization Science (5:3) 1994, pp 456-475.
Borgatti, S., and Cross, R. "A Relational View of Information Seeking and Learning in Social Networks," Management Science (49:4) 2003, pp 432-445.


Borgatti, S. P., Everett, M. G., and Freeman, L. C. "Ucinet for Windows: Software for Social Network Analysis," Analytic Technologies, Harvard, MA, 2002.
Bovasso, G. "A Network Analysis of Social Contagion Processes in an Organizational Intervention," Human Relations (49:11) 1996, p 1419.
Bowker, G., and Star, S. Sorting Things Out: Classification and Its Consequences MIT Press, 1999.
Bowker, G. C., and Star, S. L. Sorting things out: classification and its consequences The MIT Press, 2000.
Brass, D., Butterfield, K., and Skaggs, B. "Relationships and Unethical Behavior: A Social Network Perspective," The Academy of Management Review (23:1) 1998, pp 14-31.
Brass, D. J. "Men's and Women's Networks: A Study of Interaction Patterns and Influence in an Organization," The Academy of Management Journal (28:2) 1985, pp 327-343.
Brooks Jr, R. ""Word-of-Mouth" Advertising in Selling New Products," Journal of Marketing (22:2) 1957, pp 154-161.
Brown, J., and Duguid, P. "Organizing Knowledge," REFLECTIONS (1:2).
Brown, J., and Duguid, P. The Social Life of Information Harvard Business School Press, 2000.
Brynjolfsson, E., Hu, J., and Smith, M. D. "From Niches to Riches: The Anatomy of the Long Tail," Sloan Management Review (47:4) 2006, pp 67-71.
Burkhardt, M., and Brass, D. "Changing Patterns or Patterns of Change: The Effects of a Change in Technology on Social Network Structure and Power," Administrative Science Quarterly (35:1) 1990.
Burt, R. Structural Holes: The Social Structure of Competition Harvard University Press, Cambridge, MA, 1992a.
Burt, R. "The Social Capital of Opinion Leaders," The ANNALS of the American Academy of Political and Social Science (566:1) 1999, pp 37-54.
Burt, R. S. Structural Holes: The Social Structure of Competition Harvard University Press, Cambridge, MA, 1992b.
Burt, R. S. "The Contingent Value of Social Capital," Administrative Science Quarterly (42:2), June 1997a, pp 339-365.
Burt, R. S. "A note on social capital and network content," Social Networks (19) 1997b, pp 355-373.


Burt, R. S. "Structural Holes versus Network Closure as Social Capital," in: Social Capital: Theory and Research, N. Lin, K.S. Cook, R.S. Burt and A.d. Gruyter (eds.), 2000, p. 1.
Burton-Jones, A., and Gallivan, M. J. "Towards a Deeper Understanding of System Usage in Organizations: A Multilevel Perspective," MIS Quarterly (31:4), December 2007, pp 657-679.
Buskens, V., and Yamaguchi, K. "A New Model for Information Diffusion in Heterogeneous Social Networks," Sociological Methodology (29) 1999, pp 281-325.
Butler, B. S. "Membership Size, Communication Activity and Sustainability: A Resource-Based Model of Online Social Structures," Organization Science (12:4) 2001.
Cao, Y., Williams, D. D., and Larsen, D. P. "Comparison of Ecological Communities: The Problem of Sample Representativeness," Ecological Monographs (72:1) 2002, pp 41-56.
Carley, K. M., Diesner, J., Reminga, J., and Tsvetovat, M. "Toward an interoperable dynamic network analysis toolkit," Decision Support Systems (43:4) 2007, pp 1324-1347.
Carpenter, M. A., and Westphal, J. D. "The Strategic Context of External Network Ties: Examining the Impact of Director Appointments on Board Involvement in Strategic Decision Making," The Academy of Management Journal (44:4) 2001, pp 639-660.
Carrington, P. J., Scott, J., and Wasserman, S. Models and methods in social network analysis Cambridge University Press, 2005.
Cavusoglu, H., Hu, N., Li, Y., and Ma, D. "Information Technology Diffusion with Influentials, Imitators, and Opponents," Journal of Management Information Systems (27:2), Fall 2010, pp 305-334.
Chang, H., and Johnson, J. "Communication networks as predictors of organizational members’ media choices," Western Journal of Communication (65:4) 2001, pp 349-369.
Cheng-Min, C., and Kuen-Shiou, Y. "How Industry Network and Hierarchy Positions Influence Innovation in Global Semiconductor Industry," Academy of Management Proceedings, Academy of Management, pp. N1-N6.
Chesbrough, H. W., and Teece, D. J. "Organizing for innovation: when is virtual virtuous?," Harvard Business Review (80:8) 2002, pp 127-134.


Chevalier, J. A., and Mayzlin, D. "The effect of word of mouth on sales: Online book reviews," Journal of Marketing Research (43:3) 2006, pp 345-354.
Chidambaram, L., and Jones, B. "Impact of Communications Medium and Computer support on Group Perceptions and Performance: A comparison of Face-to-Face and Dispersed Meetings," MIS Quarterly (17:4), Dec 1993, pp 465-491.
Chiles, T. H., Meyer, A. D., and Hench, T. J. "Organizational Emergence: The Origin and Transformation of Branson, Missouri's Musical Theaters," Organization Science (15:5) 2004, pp 499-519.
Ching, C., Holsapple, C., and Whinston, A. "Reputation, Learning and Coordination in Distributed Decision-Making Contexts," Organization Science (3:2) 1992, pp 275-297.
Chintagunta, P. K., Gopinath, S., and Venkataraman, S. "The effects of online user reviews on movie box office performance: Accounting for sequential rollout and aggregation across local markets," Marketing Science (29:5) 2010, pp 944-957.
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. "Applied multiple regression/correlation analysis for the behavioral sciences," 1983.
Coleman, J. S. "Social Capital in the Creation of Human Capital," The American Journal of Sociology (94) 1988, pp S95-S120.
Coleman, J. S. Foundations of Social Theory Belknap Press of Harvard University Press, Cambridge MA, 1990.
Conner, K. R., and Prahalad, C. K. "A resource-based theory of the firm: Knowledge versus opportunism," Organization Science (7:5) 1996, pp 477-501.
Connolly, T., Jessup, L. M., and Valacich, J. S. "Effects of Anonymity and Evaluative Tone on Idea Generation in Computer-Mediated Groups," Management Science (36:6) 1990, p 689.
Constant, D., Sproull, L., and Kiesler, S. "The Kindness of Strangers: The Usefulness of Electronic Weak Ties for Technical Advice," Organization Science (7:2) 1996, pp 119-135.
Cross, R., Parker, A., Prusak, L., and Borgatti, S. "Knowing what we know: Supporting knowledge creation and sharing in social networks," Organizational Dynamics (30:2) 2001, pp 100-120.


Crowston, K., and Kammerer, E. E. "Coordination and collective mind in software requirements development," IBM Systems Journal (37:2) 1998, pp 227-245.
Cummings, J., and Cross, R. "Structural Properties of work groups and their consequences for performance," Social Networks (25:3) 2003, p 197.
Daft, R. L., and Lengel, R. H. "Organizational information requirements, media richness and structural design," Management Science (32:5), May 1986, pp 554-571.
Davis, G., Yoo, M., and Baker, W. "The small world of the American corporate elite, 1982-2001," Strategic Organization (1:3) 2003, pp 301-326.
Davis, J. H., Hulbert, L., and Au, W. T. "Procedural Influence on Group Decision Making," Communication and Group Decision Making 1996.
Dennis, A., Wixom, B., and Vandenberg, R. "Understanding fit and appropriation effects in group support systems," MIS Quarterly (25:2) 2001, pp 167-193.
Dennis, A. R., and Garfield, M. J. "The adoption and use of GSS in project teams: Toward more participative processes and outcomes," MIS Quarterly (27:2), June 2003, p 289.
Dennis, A. R., and Valacich, J. S. "Computer Brainstorms: More Heads Are Better Than One," Journal of Applied Psychology (78:4) 1993, pp 531-537.
DeSanctis, G., and Gallupe, R. B. "A Foundation for the Study of Group Decision Support Systems," Management Science (33:5) 1987a, pp 589-609.
DeSanctis, G., and Gallupe, R. B. "A Foundation for the Study of Group Decision Support Systems," Management Science (33:5), May 1987b, pp 589-610.
DeSanctis, G., and Poole, M. S. "Capturing the complexity in advanced technology use: Adaptive structuration theory," Organization Science (5:2) 1994, pp 121-147.
Duan, W., Gu, B., and Whinston, A. B. "The dynamics of online word-of-mouth and product sales--An empirical investigation of the movie industry," Journal of Retailing (84:2) 2008, pp 233-242.
Duboff, R. S. "The wisdom of (expert) crowds," Harvard Business Review (85:9) 2007.


Dutton, J. E., Worline, M. C., Frost, P. J., and Lilius, J. "Explaining compassion organizing," Administrative Science Quarterly (51:1) 2006, pp 59-96.
Ebadi, Y. M., and Utterback, J. M. "The Effects of Communications on Technological Innovation," Management Science (30:5) 1984, pp 572-585.
Efron, B. "Logistic Regression, Survival Analysis, and the Kaplan-Meier Curve," Journal of the American Statistical Association (83:402) 1988, p 11.
Eisenhardt, K. "Building Theories from Case Study Research," The Academy of Management Review (14:4) 1989, pp 532-550.
Elberse, A. "Should You Invest in the Long Tail?," Harvard Business Review (86:7/8) 2008, pp 88-96.
Favela, J. "Capture and Dissemination of Specialized Knowledge in Network Organizations," Journal of Organizational Computing and Electronic Commerce (7:2&3) 1997, pp 201-226.
Fischhoff, B., and Johnson, S. "The possibility of distributed decision making," Organizational Decision Making 1997a.
Fischhoff, F. E., and Johnson, S. "Organisational decision-making," Cambridge: Cambridge University Press, 1997b.
Fleming, L. "Breakthroughs and the 'Long Tail' of innovation," MIT Sloan Management Review (49:1) 2007, p 69.
Forsyth, D. R. Group Dynamics Thomson Brooks/Cole, 1990.
Fowler, J. H., and Christakis, N. A. "The dynamic spread of happiness in a large social network," BMJ: British Medical Journal (337) 2008, p a2338.
Fredrickson, J. "The Strategic Decision Process and Organizational Structure," The Academy of Management Review (11:2) 1986, pp 280-297.
Friedkin, N. "Theoretical Foundations for Centrality Measures," The American Journal of Sociology (96:6) 1991, pp 1478-1504.
Fulk, J., and Boyd, B. "Emerging theories of communication in organizations," Journal of Management (17:2), June 1991, pp 407-446.
Gallupe, R. B., Dennis, A. R., Cooper, W. H., Valacich, J., Bastianutti, L. M., and Nunamaker, J. "Electronic BrainStorming and Group Size," The Academy of Management Journal (35:2), June 1992, pp 350-369.


Gargiulo, M., and Benassi, M. "Trapped in Your Own Net? Network Cohesion, Structural Holes and the Adaptation of Social Capital," Organization Science (11:2), March-April 2000, pp 183-196.
Gersick, C. J. G. "Revolutionary Change Theories: A Multilevel Exploration of the Punctuated Equilibrium Paradigm," The Academy of Management Review (16:1) 1991, pp 10-36.
Gladwell, M. The Tipping Point Time Warner, New York, 2002.
Goetz, E. G., Tyler, T. R., and Cook, F. L. "Promised Incentives in Media Research: A Look at Data Quality, Sample Representativeness, and Response Rate," Journal of Marketing Research (21:2) 1984, pp 148-154.
Goldenberg, J., Libai, B., and Muller, E. "Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth," Marketing Letters (12:3) 2001, pp 211-223.
Goldstein, J. "Emergence: A Construct Amid a Thicket of Conceptual Snares," Emergence (2:1) 2000, pp 5-22.
Granovetter, M. "The strength of weak ties: A network theory revisited," Sociological Theory (1) 1983, pp 201-233.
Granovetter, M. S. "The Strength of Weak Ties," The American Journal of Sociology (78:6), May 1973, pp 1360-1380.
Grant, R. M. "Toward a knowledge-based theory of the firm," Strategic Management Journal (17:10) 1996, pp 109-122.
Hannan, M. T., and Freeman, J. "The Ecology of Organizational Mortality: American Labor Unions, 1836-1985," The American Journal of Sociology (94:1) 1988, pp 25-52.
Hansen, M. "The Search-Transfer Problem: The Role of Weak Ties in Sharing Knowledge across Organization Subunits," Administrative Science Quarterly (44:1) 1999, pp 82-85.
Hansen, M., and Haas, M. "Competing for Attention in Knowledge Markets: Electronic Document Dissemination in a Management Consulting Company," Administrative Science Quarterly (46:1) 2001, pp 1-28.
Hansen, M. T. "Knowledge networks: Explaining effective knowledge sharing in multiunit companies," Organization Science (13:3), May-June 2002, pp 232-248.
Hansen, M. T., Nohria, N., and Tierney, T. "What's your strategy for managing knowledge?," Harvard Business Review (77:2) 1999, pp 106-116.


Hatch, M. J. "Exploring the empty spaces of organizing: how jazz can help us understand organizational structure," Organization Studies (20:1) 1999, pp 75-100.
Haunschild, P. R., and Beckman, C. M. "When Do Interlocks Matter?: Alternate Sources of Information and Interlock Influence," Administrative Science Quarterly (43:4) 1998, pp 815-818.
Haveman, H. A. "Follow the Leader: Mimetic Isomorphism and Entry into New Markets," Administrative Science Quarterly (38:4) 1993.
Hempel, J. "Tapping the Wisdom of the Crowd," Business Week Online (27) 2007.
Herbig, P. "Market signalling: a review," Management Decision (34:1) 1996, pp 35-45.
Hersen, M. "Rationale for Clinical Case Studies: An Editorial," Clinical Case Studies (1:1) 2002, pp 1-3.
Hester, A. "Innovating with organizational wikis: factors facilitating adoption and diffusion of an effective collaborative knowledge management system," ACM, 2008, pp. 161-163.
Hogan, J., Lemon, K., and Libai, B. "Quantifying the Ripple: Word-of-Mouth and Advertising Effectiveness," Journal of Advertising Research (44:03) 2005, pp 271-280.
Hollingshead, A. B. "Perceptions of Expertise and Transactive Memory in Work Relationships," Group Processes & Intergroup Relations (3:3) 2000, p 257.
Hosmer, D. W., Lemeshow, S., and May, S. Applied survival analysis: regression modeling of time to event data Wiley-Interscience, 2011.
Hossain, L., and Wu, A. "Communications network centrality correlates to organisational coordination," International Journal of Project Management (27:8) 2009, pp 795-811.
Huber, G. P. "A Theory of the Effects of Advanced Information Technologies on Organizational Design, Intelligence, and Decision-Making," Academy of Management Review (15:1), Jan 1990, pp 47-71.
Hutchins, E. "Organizing Work by Adaptation," Organization Science (2:1) 1991, pp 14-39.
Hutchins, E. Cognition in the Wild Bradford Book, 1995.
Hutchins, E., and Hazlehurst, B. "How to invent a lexicon: the development of shared symbols in interaction," Artificial Societies: The Computer Simulation of Social Life 1995, pp 157-189.


Hutchins, E., and Klausen, T. "Distributed cognition in an airline cockpit," Cognition and Communication at Work 1996, pp 15-34.
Iacobucci, D., and Hopkins, N. "Modeling Dyadic Interactions and Networks in Marketing," Journal of Marketing Research (29:1) 1992, pp 5-17.
Ibarra, H., and Andrews, S. "Power, Social Influence, and Sense Making: Effects of Network Centrality and Proximity on Employee Perceptions," Administrative Science Quarterly (38:2) 1993.
Janis, I. L. Victims of groupthink: a psychological study of foreign-policy decisions and fiascoes Houghton Mifflin, 1972.
Jeong, H., Neda, Z., and Barabasi, A. L. "Measuring preferential attachment in evolving networks," Europhysics Letters (61:4) 2003, pp 567-572.
Jessup, L. M., and Connolly, T. "The Effects of Anonymity on GDSS Group Process With an Idea-Generating Task," MIS Quarterly (14:3) 1990, p 313.
Johannisson, B. "Beyond Process and Structure: Social Exchange Networks," International Studies of Management and Organization (17:1) 1987, pp 3-23.
Joshi, A. "The Influence of Organizational Demography on the External Networking Behaviour of Teams," The Academy of Management Review (31:3) 2006, pp 583-595.
Jun, H., Butler, B. S., and King, W. R. "Team Cognition: Development and Evolution in Software Project Teams," Journal of Management Information Systems (24:2) 2007, pp 261-292.
Kane, G. C., and Alavi, M. "Casting the Net: A Multimodal Network Perspective on User-System Interactions," Information Systems Research (19:3) 2008, pp 253-272.
Katz, J. A. "The Dynamics of Organizational Emergence: A Contemporary Group Formation Perspective," Entrepreneurship: Theory and Practice (17:2) 1993.
Keyton, J. "Relational Communication in Groups," The Handbook of Group Communication Theory and Research 1999.
Kiesler, S., and Sproull, L. "Group Decision Making and Communication Technology," Organizational Behaviour and Human Decision Processes (52) 1992, pp 92-123.
Kim, K. K. "Task characteristics, decentralization, and the success of hospital information systems," Information and Management (19:2) 1990, pp 83-93.


Kim, Y. "Supporting Distributed Groups with Group Support Systems," Journal of Organizational and End User Computing (18:2) 2006, pp 20-37.
Kim, Y. G., Hong, H. S., Bae, D. H., and Cha, S. D. "Test cases generation from UML state diagrams," IET, 1999, pp. 187-192.
Komiak, P., Komiak, S. Y. X., and Imhof, M. "Conducting International Business at eBay: The Determinants of Success of e-Stores," Electronic Markets (18:2) 2008, pp 187-204.
Kossinets, G., and Watts, D. J. "Empirical analysis of an evolving social network," Science (311:5757) 2006, pp 88-90.
Kozinets, R. V., Hemetsberger, A., and Schau, H. J. "The Wisdom of Consumer Crowds: Collective Innovation in the Age of Networked Marketing," Journal of Macromarketing (28:4) 2008, p 339.
Kraut, R., Rice, R., Cool, C., and Fish, R. "Varieties of Social Influence: The Role of Utility and Norms in the Success of a New Communication Medium," Organization Science (9:4) 1998, pp 437-453.
Lai, G., and Wong, O. "The tie effect on information dissemination: the spread of a commercial rumor in Hong Kong," Social Networks (24:1) 2002, pp 49-75.
Lakhani, K. R., and Von Hippel, E. "How open source software works: “free” user-to-user assistance," Research Policy (32:6) 2003, pp 923-943.
Lamertz, K. "Organizational Citizenship Behaviour as Performance in Multiple Network Positions," Organization Studies (27:1) 2006, pp 79-102.
Latour, B. "On recalling ANT," in: Actor network theory and after, J. Law and J. Hassard (eds.), Blackwell Publishers, Boston, Mass, 1999.
Lee, G., and Cole, R. "From a Firm-Based to a Community-Based Model of Knowledge Creation: The Case of the Linux Kernel Development," Organization Science (14:6) 2003, pp 633-649.
Lew, A. A. "Long Tail Tourism: New Geographies for Marketing Niche Tourism Products," Journal of Travel & Tourism Marketing (25:3/4) 2008, pp 409-419.
Libby, R., Trotman, K. T., and Zimmer, I. "Member Variation, Recognition of Expertise, and Group Performance," Journal of Applied Psychology (72:1) 1987, p 81.
Liu, Y. "Word of mouth for movies: Its dynamics and impact on box office revenue," Journal of Marketing (70:3) 2006, p 74.


Lucas Jr, H. C., and Baroudi, J. "The role of information technology in organization design," Journal of Management Information Systems (10:4) 1994, p 23.
Ludbrook, J., and Dudley, H. "Why Permutation Tests Are Superior to t and F Tests in Biomedical Research," The American Statistician (52:2) 1998, pp 127-132.
MacCormack, A., Verganti, R., and Iansiti, M. "Developing products on 'Internet time': The anatomy of a flexible development process," Management Science (47:1) 2001, pp 133-150.
Madhavan, R., and Grover, R. "From embedded knowledge to embodied knowledge: new product development as knowledge management," The Journal of Marketing (62:4) 1998, pp 1-12.
March, J. G. "Understanding how decisions happen in organizations," Organizational Decision Making 1997.
Markus, M. L., Majchrzak, A., and Gasser, L. "A design theory for systems that support emergent knowledge processes," MIS Quarterly (26:3), September 2002, p 179.
Marsden, P., and Friedkin, N. "Network Studies of Social Influence," Sociological Methods & Research (22:1) 1993, p 127.
Matzat, U. "Academic Communication and Internet Discussion Groups: Transfer of Information or Creation of Social Contacts?," Social Networks (26:3) 2004, pp 221-255.
McLeod, P. L. "An Assessment of the Experimental Literature on Electronic Support of Group Work: Results of a Meta Analysis," Human-Computer Interaction (7) 1992, pp 257-280.
McPherson, M., Smith-Lovin, L., and Cook, J. M. "Birds of a Feather: Homophily in Social Networks," Annual Review of Sociology (27:1) 2001, pp 415-444.
Meyer, G. "Social Information Processing and Social Networks: A Test of Social Influence Mechanisms," Human Relations (47:9) 1994, p 1013.
Mizruchi, M. S., and Potts, B. B. "Centrality and power revisited: actor success in group decision making," Social Networks (20:4) 1998, pp 353-387.
Mizruchi, M. S., and Stearns, L. B. "Getting Deals Done: The Use of Social Networks in Bank Decision-Making," American Sociological Review (66:5) 2001, pp 647-671.


Monge, P., and Contractor, N. "Emergence of Communication Networks," Urbana (51), p 61801.
Monge, P., and Contractor, N. Theories of Communication Networks Oxford University Press, 2003a.
Monge, P. R., and Contractor, N. S. Theories of communication networks Oxford University Press, New York, 2003b.
Moscovici, S. Social influence and social change Academic Press New York, 1976.
Moscovici, S. "Social influence and conformity," Handbook of social psychology (2) 1985, pp 347-412.
Nault, B. "Information Technology and Organization Design: Locating Decisions and Information," Management Science (44:10) 1998, pp 1321-1335.
Nerur, S., Sikora, R., Mangalaraj, G., and Balijepally, V. "Assessing the Relative Influence of Journals in a Citation Network," Communications of the ACM (48:11) 2005, pp 71-74.
O'Mahony, S., and Ferraro, F. "The emergence of governance in an open source community," Academy of Management Journal (50:5) 2007, pp 1079-1106.
Orlikowski, W. J. "Knowing in practice: Enacting a collective capability in distributed organizing," Organization Science (13:3) 2002, pp 249-273.
Ouyang, M., and Grant, S. "Mechanism of Network Marketing Organizations Expansion as Pyramid Structures," Journal of Management Research (4:3) 2004, pp 138-146.
Parameswaran, M., and Whinston, A. B. "Research issues in social computing," Journal of the Association for Information Systems (8:6) 2007, pp 336-350.
Pavlou, P. A., and Dimoka, A. "The Nature and Role of Feedback Text Comments in Online Marketplaces: Implications for Trust Building, Price Premiums, and Seller Differentiation," Information Systems Research (17:4) 2006, pp 392-414.
Pfeffer, J. "Size and Composition of Corporate Boards of Directors: The Organization and its Environment," Administrative Science Quarterly (17:2) 1972, pp 218-228.
Phang, C. W., Kankanhalli, A., and Sabherwal, R. "Usability and Sociability in Online Communities: A Comparative Study of Knowledge Seeking and Contribution," Journal of the Association for Information Systems (10:10) 2009, pp 721-747.
Phelps, J., Lewis, R., Mobilio, L., Perry, D., and Raman, N. "Viral Marketing or Electronic Word-of-Mouth Advertising: Examining Consumer Responses and Motivations to Pass Along Email," Journal of Advertising Research (44:04) 2005, pp 333-348.
Pickering, J. M., and King, J. L. "Hardwiring Weak Ties: Interorganizational Computer mediated Communication, Occupational Communities and Organizational Change," Organization Science (6:4) 1995, pp 479-486.
Ping, W. "Popular Concepts beyond Organizations: Exploring New Dimensions of Information Technology Innovations," Journal of the Association for Information Systems (10:1) 2009, pp 1-30.
Podolny, J., and Baron, J. "Resources and Relationships: Social Networks and Mobility in the Workplace," American Sociological Review (62:5) 1997, pp 673-693.
Poole, M. S., and Baldwin, C. L. "Developmental processes in group decision-making," Communication and group decision-making 1986, pp 35-61.
Poole, M. S., and DeSanctis, G. "Understanding the Use of Group Decision Support Systems: The Theory of Adaptive Structuration," Organizations and Communication Technology 1990, pp 173-193.
Poole, M. S., and Roth, J. "Decision development in small groups V," Human Communication Research (15:4) 1989, pp 549-589.
Poston, R., and Speier, C. "Effective use of knowledge management systems: a process model of content ratings and credibility indicators," MIS Quarterly (29:2) 2005, pp 221-244.
Powell, W. W., Koput, K. W., and Smith-Doerr, L. "Interorganizational Collaboration and the Locus of Innovation: Networks of Learning in Biotechnology," Administrative Science Quarterly (41:1) 1996, pp 116-145.
Putnam, R. "Bowling Alone: America's Declining Social Capital," Journal of Democracy (6:1) 1995, pp 65-78.
Reagans, R., and McEvily, B. "Network Structure and Knowledge Transfer: The Effects of Cohesion and Range," Administrative Science Quarterly (48:2) 2003, pp 240-267.


Resnick, P., and Zeckhauser, R. "Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay’s Reputation System," The Economics of the Internet and E-Commerce (11) 2002, pp 127-157.
Rice, R., and Aydin, C. "Attitudes toward New Organizational Technology: Network Proximity as a Mechanism for Social Information Processing," Administrative Science Quarterly (36:2) 1991.
Rogers, E. M. Diffusion of innovations Free Press, 1995.
Rogers, Y., and Ellis, J. "Distributed cognition: an alternative framework for analysing and explaining collaborative working," Journal of Information Technology (9) 1994, pp 119-119.
Romero, D. M., Meeder, B., and Kleinberg, J. "Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter," ACM, 2011, pp. 695-704.
Royall, R. M. "The effect of sample size on the meaning of significance tests," American Statistician 1986, pp 313-315.
Rubinson, J. "Editorial: Marketing in the Era of Long-Tail Media," in: Journal of Advertising Research, 2008, pp. 301-302.
Salancik, G., and Pfeffer, J. "A Social Information Processing Approach to Job Attitudes and Task Design," Administrative Science Quarterly (23:2) 1978, pp 224-253.
Salancik, G. R., and Pfeffer, J. "Effects of Ownership and Performance on Executive Tenure in US Corporations," The Academy of Management Journal (23:4) 1980, pp 653-664.
Schultz, B. G. "Improving Group Communication Performance: An Overview of Diagnosis and Intervention," in: The Handbook of Group Communication Theory and Research, L.R. Frey, D.S. Gouran and M.S. Poole (eds.), Sage Publications, 1999.
Segal, U. A. "Micro-Behaviors in Group Decision-Making: An Exploratory Study," Journal of Social Service Research (5:1/2) 1982.
Shadish, W. R., Cook, T. D., and Campbell, D. T. Experimental and quasi-experimental designs for generalized causal inference Wadsworth Cengage Learning, 2002.
Sharma, S., Sugumaran, V., and Rajagopalan, B. "A framework for creating hybrid-open source software communities," Information Systems Journal (12:1) 2002, pp 7-25.
Shepherd, C. "The long tail of training," e.learning age 2008, pp 26-26.


Sherif, K. "Managing technology and administration innovations: Four case studies on software reuse," … of the Association for Information Systems 2004.
Sia, C.-L., Tan, B. C. Y., and Wei, K.-k. "Group Polarization and Computer Mediated Communication: Effects of Communication Cues, Social Presence and Anonymity," Information Systems Research (13:1) 2002.
Simon, H. The Sciences of the Artificial MIT Press, 1996.
Singh, J. "Collaborative networks as determinants of knowledge diffusion patterns," Management Science (51:5) 2005, pp 756-770.
Snijders, T. A. B. "The statistical evaluation of social network dynamics," Sociological Methodology (31:1) 2002, pp 361-395.
Soubie, J., and Zarate, P. "Distributed Decision Making: A Proposal of Support Through Cooperative Systems," Group Decision and Negotiation (14:2) 2005, pp 147-158.
Sparrowe, R. T., Liden, R. C., Wayne, S. J., and Kraimer, M. L. "Social Networks and the Performance of Individuals and Groups," Academy of Management Journal (44:2) 2001, pp 316-325.
Stafford, M. R. "Tangibility in Services Advertising: An Investigation of Verbal versus Visual Cues," Journal of Advertising (25:3), Fall 1996, pp 13-28.
Standifird, S. S. "Reputation and e-commerce: eBay auctions and the asymmetrical impact of positive and negative ratings," Journal of Management (27:3) 2001, pp 279-295.
Star "Institutional Ecology, 'Translations' and Boundary Objects: Amateurs and Professionals in Berkeley's Museum of Vertebrate Zoology, 1907-39," Social Studies of Science (19:3) 1989, p 387.
Star, S., and Ruhleder, K. "Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces," Information Systems Research (7:1) 1996, pp 111-134.
Star, S. L. "Got Infrastructure? How Standards, Categories and Other Aspects of Infrastructure Influence Communication," 2nd Social Study of IT, London School of Economics, 2002.
Steyer, A., Garcia-Bardidia, R., and Quester, P. "Online Discussion Groups as Social Networks: An Empirical Investigation of Word-of-Mouth on the Internet," Journal of Interactive Advertising (Spring 2006) 2006a, pp 51-59.
Steyer, A., Garcia-Bardidia, R., and Quester, P. "Online discussion groups as social networks: An empirical investigation of word-of-mouth on the internet," Journal of Interactive Advertising (6:2) 2006b, pp 51-59.
Steyer, A., Garcia-Bardidia, R., and Quester, P. G. "Online discussion groups as social networks: An empirical investigation of word-of-mouth on the internet," 2006c.
Strogatz, S. "Exploring complex networks," Nature (410) 2001, pp 268-276.
Subramani, M., and Rajagopalan, B. "Knowledge-sharing and influence in online social networks via viral marketing," Communications of the ACM (46:12) 2003, pp 300-307.
Sutcliffe, K., and McNamara, G. "Controlling Decision-Making Practice in Organizations," Organization Science (12:4) 2001, pp 484-501.
Szabo, G., and Huberman, B. A. "Predicting the Popularity of Online Content," Communications of the ACM (53:8) 2010, pp 80-88.
Szulanski, G. "Exploring internal stickiness: Impediments to the transfer of best practice within the firm," Strategic Management Journal (17) 1996, pp 27-43.
Tabachnick, B. G., Fidell, L. S., and Osterlind, S. J. "Using multivariate statistics," 2001.
Teigland, R., and Wasko, M. "Knowledge transfer in MNCs: Examining how intrinsic motivations and knowledge sourcing impact individual centrality and performance," Journal of International Management (15:1) 2009, pp 15-31.
Tisselli, E. "thinkflickrthink: A Case Study on Strategic Tagging," Communications of the ACM (53:8) 2010, pp 141-145.
Trauth, E. M., and Jessup, L. M. "Understanding Computer Mediated Discussions: Positivist and Interpretive Analyses of Group Support Systems Use," MIS Quarterly (24:1), March 2000, pp 43-79.
Truex, D. P., Baskerville, R., and Klein, H. "Growing systems in emergent organizations," Communications of the ACM (42:8) 1999, pp 117-123.
Tsai, W., and Ghoshal, S. "Social Capital and Value Creation: The Role of Intrafirm Networks," Academy of Management Journal (41:4) 1998, pp 464-476.
Tsoukas, H. "The firm as a distributed knowledge system: a constructionist approach," Strategic Management Journal (17:1) 1996, pp 11-25.


Valacich, J. S., Dennis, A. R., and Connolly, T. "Idea generation in computer-based groups: A new ending to an old story," Organizational Behavior & Human Decision Processes (57:3), Mar 1994, pp 448-467.
Valente, T. W. "Social network thresholds in the diffusion of innovations," Social Networks (18:1) 1996, pp 69-89.
Vilpponen, A., Winter, S., and Sundqvist, S. "Electronic Word-of-Mouth in Online Environments: Exploring Referral Network Structure and Adoption Behavior," Journal of Interactive Advertising (6:2) 2006, pp 71-86.
Von Krogh, G., Spaeth, S., and Lakhani, K. R. "Community, joining, and specialization in open source software innovation: a case study," Research Policy (32:7) 2003, pp 1217-1241.
Wasko, M., and Faraj, S. "Why should I share? Examining Social Capital and Knowledge Contribution in Electronic Networks of Practice," MIS Quarterly (29:1) 2005, pp 35-57.
Wasserman, S., and Faust, K. Social Network Analysis: methods and applications Cambridge University Press, 1994.
Watts, D. "Networks, Dynamics, and the Small-World Phenomenon," AJS (105:2) 1999a, pp 493-527.
Watts, D. Small Worlds: The Dynamics of Networks Between Order and Randomness Princeton University Press, 1999b.
Watts, D., and Strogatz, S. "Collective dynamics of 'small-world' networks," Nature (393:6684) 1998, pp 409-410.
Weick, K. The Social Psychology of Organizing McGraw-Hill Publishing Co, 1979.
Weick, K. "Toward a model of organizations as interpretation systems," Academy of Management Review (9:2) 1984, pp 284-295.
Weick, K. E., and Roberts, K. H. "Collective mind in organizations: Heedful interrelating on flight decks," Administrative Science Quarterly (38:3), Sep 1993, pp 357-381.
Weick, K. E., Sutcliffe, K. M., and Obstfeld, D. "Organizing and the process of sensemaking," Handbook of Decision Making (16:4) 2009, p 83.
Wellman, B., and Wetherell, C. "Social Network Analysis of Historical Communities: Some questions from the past to the present," The History of the Family (1:1) 1996, pp 97-121.


Westphal, J. D., Seidel, M. D. L., and Stewart, K. J. "Second-Order Imitation: Uncovering Latent Effects of Board Network Ties," Administrative Science Quarterly (46:4) 2001, pp 717-747.
Yin, R. Case Study Research: design and methods Sage Publications Inc, 2003.
Zigurs, I., and Buckland, B. K. "A Theory of Task/Technology Fit and Group Support Systems Effectiveness," MIS Quarterly, Sept 1998.
