Academic team formation as evolving hypergraphs
Carla Taramasco, Jean-Philippe Cointet, Camille Roth

To cite this version:

Carla Taramasco, Jean-Philippe Cointet, Camille Roth. Academic team formation as evolving hypergraphs. Scientometrics, Springer Verlag, 2010, 85 (3), pp.14. 10.1007/s11192-010-0226-4. hal-00474160

HAL Id: hal-00474160 https://hal.archives-ouvertes.fr/hal-00474160 Submitted on 19 Apr 2010

Academic team formation as evolving hypergraphs

Carla Taramasco∗,†,‡ Jean-Philippe Cointet∗,†,§ Camille Roth∗,†,¶,‖

[preprint — paper to appear in scientometrics]

Abstract

This paper quantitatively explores the social and socio-semantic patterns of constitution of academic collaboration teams. To this end, we broadly underline two critical features of social networks of knowledge-based collaboration: first, they essentially consist of -level interactions which call for team-centered approaches. Formally, this induces the use of hypergraphs and n-adic interactions, rather than traditional dyadic frameworks of interaction such as graphs, binding only pairs of agents. Second, we advocate the joint consideration of structural and semantic features, as collaborations are allegedly constrained by both of them. Considering these provisions, we propose a framework which principally enables us to empirically test a series of hypotheses related to academic team formation patterns. In particular, we exhibit and characterize the influence of an implicit group structure driving recurrent team formation processes. On the whole, innovative production does not appear to be correlated with more original teams, while a polarization appears between groups composed of experts only or non-experts only, altogether corresponding to collectives with a high rate of repeated interactions.

1 Introduction

The mechanisms of academic collaboration are the focus of a long and established of research (Katz & Martin, 1997), from qualitative studies on cooperation and co-optation behaviors (Crane, 1969; Chubin, 1976; Latour & Woolgar, 1979) to more quantitative approaches (deB. Beaver & Rosen, 1978–1979; deB. Beaver, 1986; Melin & Persson, 1996). The latter includes network-based studies, which generally aim at understanding the structural determinants and patterns of collaboration (Mullins, 1972; Newman, 2001; Barabási et al., 2002; Moody, 2004; Wagner & Leydesdorff, 2005; Leahey & Reikowsky, 2008). In this case, the quantitative formal framework of choice is the social network of dyadic interactions, addressing questions related to how ego-centered characteristics, in the broad sense, influence the likelihood of being involved in a collaboration.

The Team Level and Networks

Network studies, specifically in the context of scientific collaboration, indeed often focus on the level of the individual in spite of a large amount of work on the question of (Lott & Lott, 1965; Bollen & Hoyle, 1990; Friedkin, 2004). There are wider implications of this focus on the ego-centered level:

• By aiming at describing individual behavioral patterns, this perspective may overlook the influence of characteristics expressible at the meso-level of the team itself. In particular, by focusing on dyadic interactions and relational patterns between ego and alter(s), the presence of ego in a given collaboration is interpreted as a function of the characteristics of ego and those of alter(s), and of the characteristics of the various dyads between ego and alter(s).

• Further, the creation of a group results from a complex agreement and arrangement between all its members, who jointly decide to collaborate. As such, even when assuming that the behavior of ego may depend on non-dyadic, team-level characteristics, interpreting team formation processes as a sum of individual rationalities may oftentimes seem difficult, or irrelevant. Put differently, there are regularities in team formation processes which are difficult to ascribe specifically back to individuals; it may appear more natural and consistent to appraise the underpinnings of group formation at the group level.1

∗ ISC-PIF (Institut des Systèmes Complexes – Paris-Île-de-France). 56, rue Lhomond, 75005 Paris, France.
† CREA (CNRS/Ecole Polytechnique, France). CREA/ENSTA, 45 Bd Victor, 75015 Paris, France
‡ DECOM (Universidad de Valparaiso, Chile). Avenida Gran Bretaña, 1091 Playa Ancha. Valparaiso, Chile
§ INRA SenS (Sciences en Société) - IFRIS. 5, Bd Descartes. 77420 Champs-sur-Marne, France
¶ CAMS (CNRS/EHESS, France). EHESS/CNRS, 54 Bd Raspail, 75006 Paris, France
‖ CRESS (U. Surrey, GB). University of Surrey, Guildford GU2 7XH, United Kingdom
Emails: [email protected] (corresponding author) [email protected] [email protected]
1 Note that what we call a “team” here actually relates to a group that is involved in the production of an academic

paper, i.e. the team of coauthors that produces it; it does not correspond to the more or less explicit notion of team that may exist in some research labs.

To sum up, when dyadic frameworks are involved, collaboration teams are appraised under the lens of multiple one-to-one interactions. It should be no surprise: literature is itself overwhelmingly concerned with dyadic links. However, a sizeable portion of , starting with Simmel (1898), has long been concerned with wider frameworks of interactions, or so-called “social circles”, which some authors have formalized to take directly into account non-dyadic relationships: Breiger (1974, 1990), for instance, proposed to use bipartite graphs to represent and analyze ties between actors and social groups. Focusing on the group-level, Ruef (2002) quantitatively examined the contribution of several factors including , status, or ethnicity, in the preferential constitution of business founding teams. In a review study, Freeman (2003) explored various approaches previously adopted in mathematical sociology to model two-mode data in order to account for the presence of subsets of people participating altogether in (subsets of) identical events.

In this respect, it therefore first appears that academic collaboration choices and dynamics should be characterized by investigating the meso-level of team formation. More precisely, it should be fruitful to focus on teams rather than pairs of agents interacting together, thus advocating the use of hypergraphs or bipartite graphs rather than traditional frameworks based on graphs. Hypergraphs indeed feature hyperlinks which connect arbitrary numbers of agents, while graphs feature links which connect only pairs of agents. In other words, considering hypergraphs avoids making the superfluous and plausibly debatable assumption that teams are equivalent to complete subgraphs featuring one-to-one interactions between all their members (i.e. assuming for instance that a triad is equivalent to three dyads).

Hybrid Networks of Actors and Concepts

Secondly, collaboration massively depends on cognitive properties, in particular some cognitive fit between team members, as agents plausibly compose teams in order to gather complementary competences. For instance, some economic models of knowledge creation consider matching rules based on the similarity of agent profiles, as elements of a vector space, to explain economic network structure (Cowan et al., 2002). In other words, equal attention should be given to social and semantic features, which are traditionally left apart in the literature, although the of -driven interactions has been underlined in numerous works (McPherson et al., 2001).

Our main hypothesis is that one cannot correctly understand the underlying social processes if both social and semantic dimensions of, e.g., scientific activity, are not considered as two interdependent dynamics (Roth, 2006; Roth & Cointet, 2010). Going further, we construe scientific dynamics as made of groupings of both agents and : the epistemic dynamics, i.e. the scientific knowledge construction, is made of events which simultaneously involve compounds of actors and concepts. In line with the program introduced by Callon (1986), we will appraise scientific dynamics as made of constant reconfiguration and re-negotiation of of both humans and non-humans. In this respect and more broadly, in addition to focusing on teams, we thus advocate the enrichment of the notion of team by considering teams as joint groupings of both agents and semantic items.

Knowledge-based teamwork

The interest in the social of academic communities also has a broader reach. As a knowledge production arena, science is indeed likely to share features found in other collaborative knowledge creation contexts.

(i) Collaboration in knowledge production systems.
This issue may shed light, to some extent, on the interaction processes underlying, broadly, collaborative knowledge production. These contexts indeed define a particularly common class of social networks of collaboration, where agents jointly and collectively interact for purposes of knowledge production, in the broad sense. This encompasses activist groups and political epistemic communities (Ruggie, 1975; Haas, 1992), scientific communities (deB. Beaver & Rosen, 1978–1979; Laband & Tollison, 2000; Jones et al., 2008; Stokols et al., 2008; Leahey & Reikowsky, 2008) and more specifically research projects (Larédo, 1995, 1998), open-source development communities (Kogut & Metiu, 2001) and discussion lists and forums (Constant et al., 1996; Welser et al., 2007), wiki platform-mediated communities (Bryant et al., 2005; Levrel, 2006), artists gathering for a theater performance (Uzzi & Spiro, 2005) or making a movie (Faulkner & Anderson, 1987; Ramasco et al., 2004), board members making collective decisions (Davis & Greve, 1996).

(ii) Collaboration in teams.
This kind of relatively autonomous collaboration mode has to be understood in a context where traditionally vertical and hierarchical organizations have recently been functioning in increasingly horizontal and networked ways (Powell, 1990; Miles & Snow, 1996; Smangs, 2006). This contemporary so-called “network governance” involves dynamic coalitions of actors both at organizational and individual levels, increase of teamwork and frequent group reconfigurations (Jones et al., 1997). This shift is particularly perceptible in contexts where agents are relatively free to group to form casual alliances and where collaboration some- appears to be self-organized.

In this respect, science appears to be a prototypical case of such teamwork-based systems (deB. Beaver, 1986; Adams et al., 2005; Wuchty et al., 2007): scientific knowledge production essentially involves events where researchers jointly work to manipulate and introduce concepts. It is additionally one of the most accomplished contexts of knowledge-based collaboration, as well as one of the most explicit, by its very stigmergic nature2: papers indeed constitute a concrete, often public instance of these gatherings and therefore provide an opportunity to understand the impact of these collaborations on the dynamics of science. On the empirical side, we thus rely on large bibliographic databases.

As such, our approach does not pretend to embrace the whole complexity of knowledge-intensive organizations, in particular the intricate co-evolutionary processes existing between formal organizations and more local team-based and individual-based decisions (Lazega et al., 2008). However, the methodology we propose is able to shed some original light on portions of the dynamics of these knowledge production systems.

The paper is organized as follows: in Sec. 2, we present the framework and support several hypotheses on socio-semantic team-based collaboration, Sec. 3 introduces the protocol and methods, while Sec. 4 presents the results, which we then discuss in light of the initially proposed hypotheses.

2 Framework

As follows from the introduction, we hence argue that two features are key in extending the understanding of, on the one hand, collaboration networks, and on the other hand and additionally, knowledge production networks:

1. Group effects underlie and partially determine dyadic interactions: affiliation to teams of collaboration, membership in identical epistemic communities, for instance, structure and influence the very formation of these interactions.

2. In the case of social networks of knowledge, these underlying groups are both social (work communities) and semantic (epistemic communities). In particular, the choice of collaboration partners is likely to highly depend on cognitive similarity.

More to the point, in terms of strictly social and strictly semantic associations, we first aim at checking the following simple hypotheses, by comparing what happens empirically with what would have happened if teams had been formed strictly by chance (i.e. by comparing empirical teams with a null-model featuring random compositions of teams).

(H1). Teams with a high rate of interaction repetition should be more likely, as could be expected because of social cohesion (Bollen & Hoyle, 1990; McPherson & Smith-Lovin, 2002; Friedkin, 2004) or organizational constraints (Rodriguez & Pepe, 2008).

(H2). Teams where a high proportion of concepts are repeatedly associated should be more likely, as assumed by co-word analysis (Callon et al., 1986; Noyons & van Raan, 1998), where frequent associations of terms are supposed to define conceptual cores and field boundaries.

(H3). Papers with a higher semantic originality (i.e. new association of concepts) should be those where there is a higher number of new interactions.3 Put differently, as suggested by social and semantic repetitions assumed by H1 and H2, teams with a high number of repeated interactions should tend to produce papers that have smaller semantic/topical originality; which in some sense belong to a narrower subfield of research (Leahey & Reikowsky, 2008).

Then, we appraise the socio-semantic composition of teams. We more precisely focus on the distinction between agents who are already familiar with some concepts involved in the interaction, and those who are not. This approach will more broadly inform us about the cognitive specialization of teams.

(HI). Because of both scientific specialization (Chubin, 1976) and homophily (McPherson et al., 2001; Stokols et al., 2008), teams gathering around a given topic should generally involve more individuals knowledgeable about this given topic.

2 “Stigmergic”: that is, leaving traces susceptible to guide the work of others. For an extensive discussion of this notion, see Karsai & Penzes (1993).
3 As Callon (1994, p.414) sums up from the existing literature, “The more numerous and different these heterogeneous collectives are, the more the reconfigurations produced are themselves varied”

(HII). Teams with a balanced composition of experts in a given field should produce more innovation (Ancona & Caldwell, 1992), which in terms of networks could be translated into:

• more semantic originality, i.e. novel associations of concepts,
• more social originality, i.e. novel interactions between agents.

3 Protocol and methods

In line with this focus on socio-semantic aspects, we will thus endeavor to exhibit how new teams are formed by considering both social and conceptual past acquaintances of scientists involved in new collaborations. We will concretely describe the semantic dimension in terms of attributes qualifying topics of interest of authors and the social dimension as structural and relational properties in the dynamic collaboration network, which altogether will enable us to confirm or refute the previous set of hypotheses.

3.1 Datasets

Our empirical analysis focuses on collaboration databases, which reveal a large part of the underlying collaboration activity, including social links between individuals or conceptual acquaintances of each individual (i.e. details regarding which topics which agents are familiar with). These datasets provide temporal on teams, gathering agents and the topics they work on, assuming that topics are described by the very terms used in paper abstracts. For each dataset, we focus on a set of no more than a hundred relevant terms. These terms are selected with the help of an expert of the corresponding field and are such that they appropriately cover the most significant topics of each field.

We use the following datasets, defined either from a semantic perspective (using e.g. field names) or from a social perspective (using e.g. scientific assemblies), and involving both large and small communities:

1. Embryologists working within a given and well-determined subfield, the zebrafish, over a period of 20 years (1985–2004). Data was extracted from the publicly available database Medline, which eventually yields a dataset of 6,145 articles (13,084 authors, 71 word classes).

2. Scientists working on rabies, from the same kind of MedLine extraction as for zebrafish embryologists; the observed period spans from 1985 to 2007. This ends up with 4648 events (9684 authors, 70 word classes).

3. Scientific committee members for JEMRA meetings4: this dataset includes the publications of an initial set of 168 scientists involved in these meetings, gathered from 1985 to 2007. This ends up with 5893 papers (15375 authors, 69 word classes).

4. Scientific committee members for JECFA meetings5: similarly, publications of an initial set of 178 scientists are gathered from 1985 to 2007. This ends up with 8685 papers (21195 authors, 85 word classes).

3.2 Hypergraph-based definitions

Now, these agents and concepts formally define an evolving hypergraph where each article is a hybrid hyperlink gathering both authors and the topics involved in the collaboration, as partly exemplified by Fig. ??.

In what follows, we describe comprehensively our formal framework (Sec. 3.2.1), which, basically, allows us to gather both agents and concepts in a dynamic setting and to define which agents are new, or not (newcomers vs. veterans), which concepts are new, or not (novelties vs. standards), and which agents have used which concepts in the past, or not (neophytes or experts).

Building upon these definitions, we will then propose a series of hypergraphic measures (Sec. 3.2.2), that is, measures at the level of teams, or non-dyadic measures, which cover the proportion of experts in a given collaboration (expertise ratio) and the originality of participants in a team (hypergraphic repetition, i.e. describing to what extent a team does gather agents, or concepts, which were jointly associated, at the team-level, in previous periods). For instance, a team with an expertise ratio of one will be such that all agents are experts; a team with a hypergraphic repetition of one, in terms of agents, will be such that all its agents will have altogether previously collaborated (it is zero in case none of the agents have previously been associated).

Then, we present a (Sec. 3.2.3) for computing how much the empirical data diverges from a random setting with a comparison between the actual observed data and a uniform null-model of hypergraph . Put simply, we will appraise how much teams with, e.g., a given hypergraphic repetition ratio, are forming significantly more often than could be expected by chance. This latter tool will be the cornerstone of the empirical testing of hypotheses 1-2-3 & I-II.

4 Joint FAO/WHO Expert Meetings on Microbiological Risk Assessment, http://www.fao.org/ag/agn/agns/jemra index en.asp
5 Joint FAO/WHO Expert Committee on Food Additives, http://www.who.int/ipcs/food/jecfa

3.2.1 Objects

Hypergraphs.
Formally, a hypergraph features nodes and hyperlinks, which describe n-adic interactions among any subset of nodes. It is therefore a generalization of the notion of graph whose links only describe dyadic interactions, i.e. between pairs of nodes. As such, any hyperlink corresponds to any grouping of agents and any kind of social circle: it may describe social events, organizations, , teams, etc. A hypergraph is also isomorphic to a bipartite graph, where agents on one side are connected to various affiliations, groups or events on the other side; as such, it is a structure which reifies the duality of social groups (Breiger, 1974; Freeman, 2003). See Fig. ??.

Beyond the simple of the structure of such networks, several studies have endeavored to reconstruct structural properties typically induced by the hypergraphic setting (namely, that agents interact within groups of some sort) rather than using dyadic interactions only: in this direction Newman et al. (2001); Ramasco et al. (2004); Guimera et al. (2005), inter alia, examine the structure of a social network whose dyadic links stem from teams: team composition is first empirically appraised then stylized and used as a basis for what essentially is a addition process. In these models however the focus remains on dyadic relationships or dyadic interaction behaviors, rather than truly hypergraphic measures.

In contrast, the focal level of analysis of the present study is the hypergraph and its hyperlinks.

Epistemic hypergraphs.
To bind the social and semantic aspects, we introduce the notion of epistemic hypergraph $\mathcal{E}_t$ using:

(i) a set of agents $A$,
(ii) a set of concepts $C$, and
(iii) the epistemic hypergraph itself $\mathcal{E} \subseteq \mathcal{P}(A \cup C)$, describing the joint appearance of agents and concepts, and henceforth the usage of the latter by the former, where each collaboration is a hyperlink $e \in \mathcal{P}(A \cup C)$.

As such, an “epistemic hypergraph” is properly defined by a triple $(A, C, \mathcal{E})$. Dynamic epistemic hypergraphs are indexed with time, $\mathcal{E}_t$, and are considered to be growing: t

$E \subseteq A \cup C$, the projection of $e$ on the set $E$ is noted $e^E = e \cap E$. For instance, the fact that all hyperlinks contain at least one agent translates as $\forall e,\ e^A \neq \emptyset$.

We can thus define a (dynamic) collaboration hypergraph $\{e^A \mid e \in \mathcal{E}_t\} = \mathcal{A}_t \subseteq \mathcal{P}(A)$ whose hyperlinks connect team members, and a semantic hypergraph $\{e^C \mid e \in \mathcal{E}_t\} = \mathcal{C}_t \subseteq \mathcal{P}(C)$ whose hyperlinks are sets of concepts mentioned in a given collaboration. In particular, $\mathcal{A}_t$ is isomorphic to a bipartite graph of collaboration, traditional in the literature (Newman et al., 2001; Guimera et al., 2005).

Neophytes and newcomers.
We say that an agent $a$ is, at $t$, a “neophyte in a given $c \in C$” if s/he has never used $c$ at $t$: formally, if $\nexists e \in \mathcal{E}_{t-1},\ \{a, c\} \subseteq e$. Otherwise, s/he is called an “expert”.

We say that an agent $a$ is a “newcomer” if s/he has never published before $t$, which is equivalent to saying that $\nexists e \in \mathcal{E}_{t-1},\ a \in e$. Otherwise, s/he is called a “veteran”.

Similarly, we say that a concept $c$ is a “novelty” at $t$ if all agents are neophytes in this concept: $\nexists e \in \mathcal{E}_{t-1},\ c \in e$. Otherwise, it is a “standard”.

3.2.2 Measures

Homogeneity of teams and expertise ratio.
Given these basic concepts, we may first examine the composition of teams using a simple hypergraphic measure pertaining to the composition of teams in terms of a simple proportion of experts: “how much are teams made of people familiar or not with a given concept which is used by the team?”.

We call this proportion expertise ratio, noted “ξ”; for example, a paper on “ants” where half of the authors already worked on ants has a ratio of expertise in “ants” of .5. Formally, the expertise ratio $\xi_{c,t}(e)$ in concept $c \in e^C$ at $t$ of team $e$ is given by:

$$\xi_{c,t}(e) = \frac{|\{a \in e^A \mid a \text{ is an expert in } c\}|}{|\{a \in e^A\}|}$$

This notion, derived from the composition of a given team in terms of experts vs. neophytes in a given concept, expresses the socio-conceptual homogeneity of a team. See Fig. ??.
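As a rough illustration of the definitions of Sec. 3.2.1 and of the expertise ratio, the following Python sketch stores past hyperlinks as sets of labels and computes the proportion of experts in a team for one concept. The function names, the string labels and the toy data are assumptions made for this example only; they do not come from the original study.

```python
from typing import FrozenSet, Iterable, List

# A hyperlink mixes agent labels and concept labels (illustrative encoding).
Hyperlink = FrozenSet[str]


def is_expert(agent: str, concept: str, past: List[Hyperlink]) -> bool:
    """True if at least one past hyperlink contains both the agent and the concept."""
    return any(agent in e and concept in e for e in past)


def expertise_ratio(team_agents: Iterable[str], concept: str,
                    past: List[Hyperlink]) -> float:
    """Proportion of the team's agents who are experts in `concept`."""
    agents = list(team_agents)
    if not agents:
        raise ValueError("a team must contain at least one agent")
    experts = sum(1 for a in agents if is_expert(a, concept, past))
    return experts / len(agents)


# Toy history: alice and bob already published on "ants", carol on "bees".
past = [frozenset({"alice", "bob", "ants"}), frozenset({"carol", "bees"})]
ratio = expertise_ratio({"alice", "carol"}, "ants", past)
print(ratio)
```

In this toy case, half of the new team has already used the concept, which mirrors the “ants” example given in the text.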

conceptual originality by describing the proportion of new associations of concepts in a paper.6

More precisely, in the dyadic case, an interaction is said to be repeated if the two nodes already jointly appeared in a previous collaboration. We extend this notion to the hypergraphic case:

• We first say that a set of nodes has “previously co-occurred” if there is at least one previously-existing (< t) hyperlink including this set. We define the corresponding function $\rho_t$ as follows:

$$\rho_t(e) = \begin{cases} 1 & \text{if } \exists e' \in \mathcal{E}_{t-1},\ e \subseteq e' \\ 0 & \text{otherwise.} \end{cases}$$

Thus, for instance, if $a$ and $a'$ never collaborated at $t$, we have $\rho_t(\{a, a'\}) = 0$.

• The notion of hypergraphic repetition is properly defined for veteran agents and/or standard concepts: by definition, repetition cannot occur with newcomers or novelties. Therefore, in the following formulas, hyperlinks $e$ must be such that $\forall x \in e,\ \exists e' \in \mathcal{E}_{t-1}$ such that $x \in e'$. In other words, we ensure the use of such hyperlinks by considering, $\forall e \in \mathcal{E}_t$, truncated hyperlinks $\bar{e}$ restrained to the set of previously-existing nodes, i.e.:

$$\bar{e} = e \cap \bigcup_{e' \in \mathcal{E}_{t-1}} e'$$

We then compute the hypergraphic rate of repetition for a hyperlink $e \in \mathcal{E}_t$ as the proportion of subsets of this hyperlink that have previously co-occurred:

$$r_t(e) = \frac{1}{2^{|\bar{e}|} - |\bar{e}| - 1} \sum_{\substack{e' \subseteq \bar{e} \\ |e'| \geq 2}} \rho_t(e') = r_t(\bar{e})$$

Depending on the objectives, it might be appropriate to weight the relative importance of each subset of hyperlink $e$ in the sums, for instance according to their size: for a discussion on weighting functions, see Appendix A.

Let us consider the following example: given a new collaboration $e$ forming at $t$, $r_t(e^C)$ thus measures its hypergraphic concept repetition, i.e. how much the concepts of $e^C$ have been jointly associated, altogether, in previous periods. Eventually, we may plot the distribution of such values $r_t$ for all teams, as shown in Fig. 1. Put simply, it shows that about a third of teams have a hypergraphic conceptual repetition of 1, i.e. all their concepts $e^C$ have already jointly been used in the past.

[Plot panel: Zebrafish; horizontal axis: repetition rate, binned from 0 to 1; vertical axis: logarithmic scale from 10^-3 to 10^0.]

Figure 1: Empirical distribution of the hypergraphic repetition rate for concepts, $r_t(e^C)$.

3.2.3 Estimating propensities of team formation

Null-model of hypergraph.
A null-model of new teams based on agents (resp. concepts) is defined such that, at each period $t$, we randomly create new teams respecting empirically-observed numbers of agents (resp. concepts) and their respective numbers of team participations. What is fundamentally randomized is the exact composition of teams in terms of who is collaborating with whom: in our null-model, team members are basically reshuffled. Put differently, the null-model expresses the composition of teams as would be happening by chance.

In other words and more practically,

• we empirically measure:
  1. the size of new teams appearing at $t$, i.e. the distribution of $|e^A|$ (resp. $|e^C|$) for $e \in \Delta\mathcal{E}_t$,
  2. for every element $x \in A$ (resp. $x \in C$), the number of times it appears in newly-formed teams, i.e.: $|\{e \in \Delta\mathcal{E}_t \text{ such that } e \ni x\}|$

• we then generate an artificial, uniformly random set of new teams $\widetilde{\Delta\mathcal{E}}_t \subset \mathcal{P}(A \cup C)$ which respects above-mentioned distributions, that is:
  1. same distribution of sizes of new hyperlinks,
  2. same distribution of participations of elements in these new hyperlinks.

In the remainder, we examine and compare the properties of the empirical $\Delta\mathcal{E}_t$ and the randomly-created $\widetilde{\Delta\mathcal{E}}_t$.

Propensity.
In particular, we define the propensity of team formation with respect to a given function $f$ of a hyperlink (e.g. the hypergraphic rate of repetition) as, for each possible value $x$ of the function, the ratio between the observed number of new hyperlinks (events) $e$ such that $f(e) = x$ and the randomly-created number of such events:

$$\Pi_t(x) = \frac{|\{e \in \Delta\mathcal{E}_t \text{ such that } f(e) = x\}|}{|\{e \in \widetilde{\Delta\mathcal{E}}_t \text{ such that } f(e) = x\}|} \qquad (1)$$

Obviously, if this quantity is above 1 for a certain value of $x$, we say that this type of team empirically occurs more than expected; otherwise, less.

4 Results

We may now empirically appraise hypotheses 1-2-3 & I-II.

4.1 Simulation of the null-model

We start by measuring the propensity of team formation, first with respect to simple expertise ratios and, second, with respect to hypergraphic repetition rates. To this end, we simulate 2,500 instances of above-defined null-model-based epistemic hypergraphs, which are therefore random hypergraphs.7 We then compare the composition of teams thus obtained with that of the empirical data.

Expertise ratio: socio-semantic homogeneity/heterogeneity

Distinguishing agents who have already been associated with a concept (“experts”) and agents who are not yet associated (“neophytes”), we thus assess whether real teams involve agents of mixed backgrounds or not, relatively to a randomly-built set of teams. Details of this comparison are displayed on Fig. 2 for the zebrafish case, which illustrates the composition of teams for various levels of expertise ratios, in both the real and random cases. Corresponding propensities, for both cases, are shown on Fig. 3: their shapes are consistent across all datasets and consist of a U-shaped curve above 1 for extreme values of expertise ratios (towards 0 and towards 1) and below 1 for central values (typically, from 0.1 to ca. 0.4–0.5).

Empirically, we thus observe that there is a significantly high propensity of formation of teams composed of either experts only or newcomers only, with a significantly lower propensity for mixed teams. Teams involving a mixed proportion of experts and newcomers are thus less frequent than they should be.

Hypergraphic rate of repetition: social or semantic homogeneity/heterogeneity

Measuring now propensities of group formation with respect to hypergraphic rates of repetition, we can empirically exhibit the existence and influence of an implicit group structure which drives recurrent team formation. This group structure exists along the two above-mentioned dimensions:

• Social homogeneity/heterogeneity: With respect to agents, the hypergraphic rate of repetition measures the extent to which a team features repeated interactions among former collaborators. Once again, our results have to be compared to the null hypothesis for which teams are formed randomly. Figure 4–top features the corresponding propensities which are several orders of magnitude higher than 1 for teams with a non-negligible proportion of such repetitions (r > .1)

• Conceptual homogeneity/heterogeneity: Similarly, we measure the propensity of team formation with respect to repeated concept associations, addressing the following issue: “are there cores of concepts which are likely to be recurrently associated, given that they were previously jointly used in previous papers?” Results, shown on Fig. 4–bottom, demonstrate again (and even in a stronger fashion than in the social case) that there is a significant towards gathering groups of concepts which were previously associated.

4.2 Discussion of hypotheses

It is now possible to review and check the aforementioned hypotheses. As follows from Fig. 4, it is clear that (H1) and (H2) are quantitatively confirmed: teams with a high proportion of interaction repetitions or with a high proportion of repeated conceptual associations are much more likely than should be expected by chance.

Additionally, and irrespective of the simulation model, we check if there is a correlation between semantic and social hypergraphic rates of repetition. As shown on Fig. 5, there seems to be no correlation between social and semantic originality in a collaboration (in our datasets, which come from varied backgrounds but are also focused on particular epistemic communities). This invalidates (H3): in other words, contrarily to , new semantic associations do not stem more from original teams than from repeated teams. In other words, semantic innovation is as likely from agents who, globally, previously collaborated, as from new collaborations.8

6 In which case, new concept associations are new with respect to the whole system, consistently with the social case: i.e. this refers to concept associations which never existed in any paper of the preceding periods.
7 For reasons of computational complexity, we consider event sizes not greater than 10 agents and 10 concepts; with this constraint we still consider no less than 89% of the total original number of teams.
8 This does not mean, however, that the backgrounds of

[Figure 2 plots, Zebrafish: probability (log scale, 10^-4 to 10^0) of each expertise ratio bin (0, ]0,0.1[, [0.1,0.2[, …, [0.9,1[, 1); left panel: observed, right panel: theoretical.]

Figure 2: Probability distribution of the expertise ratio on all teams aggregated over all years and all concepts (left: observed, right: theoretical). The computation of propensities below will be based on the ratio of such observed distributions over theoretical ones.
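The ratio mentioned in this caption can be illustrated with a minimal sketch. It assumes that the observed and null-model teams have already been reduced to lists of expertise ratios; the function names, the bin labels and the choice to skip bins that the null model never produces are hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter


def ratio_bin(x):
    """Label of the bin containing ratio x, following the bins of Fig. 2."""
    if x == 0:
        return "0"
    if x == 1:
        return "1"
    if x < 0.1:
        return "]0,0.1["
    k = min(int(x * 10), 9)  # index of the tenth containing x
    return f"[{k / 10},{(k + 1) / 10}["


def distribution(values, bin_of):
    """Empirical probability of each bin among `values`."""
    counts = Counter(bin_of(v) for v in values)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def propensity(observed, simulated, bin_of):
    """Observed probability divided by null-model probability, per bin.

    Assumption of this sketch: bins that never occur in the simulated
    data are skipped, since the ratio is undefined there.
    """
    p_obs = distribution(observed, bin_of)
    p_null = distribution(simulated, bin_of)
    return {b: p_obs.get(b, 0.0) / p for b, p in p_null.items()}
```

A value above 1 for a bin then reads as "this type of team occurs more than expected under the null model", and a value below 1 as less than expected.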

[Figure 3 plots, one panel per dataset (Zebrafish, Rabies, JECFA, JEMRA): propensity (log scale, 10^-1 to 10^2) for each expertise ratio bin (0, ]0,0.1[, [0.1,0.2[, …, [0.9,1[, 1).]

Figure 3: Propensity for proportions of experts per article, from our real data vs expected from our random theoretical model — averaged over all years, then over all concepts. (Error bars correspond to 95% confidence intervals with respect to concept averages.)
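The expertise ratio behind these curves can be sketched as follows. This is only an illustration under stated assumptions: an agent counts as an "expert" on a concept if she took part in an earlier event using that concept, events are given in chronological order as (agents, concepts) pairs, and the names `expertise_ratio` and `replay` are hypothetical.

```python
def expertise_ratio(team, concept, history):
    """Share of `team` members already associated with `concept`.

    history maps each agent to the set of concepts used in earlier events.
    """
    experts = sum(1 for agent in team if concept in history.get(agent, set()))
    return experts / len(team)


def replay(events):
    """Return (concept, expertise ratio) pairs for chronologically ordered events.

    Each event is a pair (agents, concepts). The history is updated only
    after the event has been scored, so that an event does not make its
    own members experts.
    """
    history = {}
    scored = []
    for agents, concepts in events:
        for concept in sorted(concepts):
            scored.append((concept, expertise_ratio(agents, concept, history)))
        for agent in agents:
            history.setdefault(agent, set()).update(concepts)
    return scored
```

With this convention, a ratio of 0 corresponds to an "all-neophytes" team and a ratio of 1 to an "all-experts" team for the concept considered.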

previous collaborators who are causing semantic innovation should necessarily be similar (semantic innovation might indeed come from repeated collaboration with individuals who have varied semantic backgrounds).

As regards expertise, (HI), “teams gathering around a given topic should involve more individuals knowledgeable about it”, is partially confirmed and partially contradicted by the empirical results.

[Figure 4 plots, one panel per dataset (Zebrafish, Rabies, JECFA, JEMRA), for agents (top) and concepts (bottom): propensity Π (log scale, 10^-2 to 10^5 at the top, 10^-2 to 10^3 at the bottom) for each hypergraphic repetition ratio bin r (0, ]0,0.1[, [0.1,0.2[, …, [0.9,1[, 1).]

Figure 4: Propensity of team formation (random hypergraph vs. real data) with respect to hypergraphic repetition ratios for agents (top) and concepts (bottom). (Values are averaged over all years, error bars correspond to 95% confidence intervals with respect to these averages.)

Firstly, teams with a high proportion of experts in a concept involved in the collaboration are much more likely, as shown on the right side of each graph in Fig. 3, whose values are significantly above 1. Yet, secondly, teams with a very small proportion of experts regarding a concept, i.e. a high proportion of neophytes, are also significantly more likely, suggesting that part of the use of new concepts is also due to teams almost completely new to such concepts (even if, as (H1) shows, these very teams are still more likely to stem from repeated collaborations). Put bluntly, new concept usage, and thus part of innovation, appears to stem both from teams significantly ignorant of such concepts and from teams globally knowledgeable about such concepts.

[Figure 5 plot, one curve per dataset (JECFA, JEMRA, Rabies, Zebrafish): y-axis from 0 to 1.5; x-axis bins [0,0.16[, [0.16,0.33[, [0.33,0.5[, [0.5,0.66[, [0.66,0.83[, [0.83,1].]

Figure 5: Average semantic hypergraphic repetition ratio (y-axis) for a given range of social hypergraphic repetition ratio (x-axis). (Error bars correspond to 95% confidence intervals with respect to averages on each repetition ratio bin (in abscissa), e.g. [0, 0.16[.)

From this observation that “all-experts” and “all-neophytes” teams are more likely, we may expect that such teams stem from underlying groups (either still working on the same topic, or working on a new topic, respectively) and thus have a higher social hypergraphic repetition ratio. Similarly, those teams stemming from underlying groups are likely to carry out normal, specialized science and have a higher semantic hypergraphic repetition ratio (or lower originality). Figure 6 sheds light on these issues by comparing average hypergraphic repetition ratios with expertise ratios. In particular, we observe that teams with a balanced composition of experts have a higher social originality (lower social hypergraphic repetition ratio), yet semantic originality remains constant across various values of expertise ratios. This partially confirms (HII) as regards social originality and partially invalidates it as regards semantic originality: indeed, social originality is increased when there is a mixed proportion of experts, but semantic originality is not.

5 Concluding remarks

We presented a formal framework to appraise the underpinnings of collaboration formation with a hypergraphic approach which encompasses both the meso-level of teams and the joint dynamics of social and semantic features. This allowed the quantitative estimation of the relative strength of social and semantic patterns behind academic team formation, by empirically studying several communities of scientists and estimating how the composition of teams, both cognitively and socially, diverges from a null hypothesis where collaborators and/or topics would be randomly chosen.

We could thereby confirm several hypotheses, and invalidate others, which had been established in a relatively qualitative fashion in the literature, or in a possibly misleading dyadic form. More precisely, our measurements suggest a mechanism of team formation based on (i) a high likelihood of repeating previous collaboration patterns, not only dyadic but also n-adic interactions (n ≥ 3), and (ii) a noticeable confinement of groups of individuals, whose collaborations appear to depend largely on previous team memberships, and, similarly, a noticeable semantic confinement where associations of concepts depend largely on the repetition of previous associations. On the whole, however, the originality of a paper does not seem to stem from an original composition of the underlying team, while a polarization appears between groups made of experts only or made of non-experts only, which altogether correspond to collectives exhibiting a high rate of repeated interactions.

Perspectives on models of academic collaboration. Taking into account an implicit group structure, both at a social and at a socio-semantic level, as evidenced by the data, is likely to faithfully account for the structure of academic collaboration networks. Indeed, the underlying low-level dynamics is plausibly closer to hypergraphic team formation mechanisms than would be allowed by a design based on dyadic interactions only. As said before, this should not lead to a lack of organizational thinking regarding the underpinnings of scientific production: beyond the step that constitutes our present contribution, an exhaustive approach to this type of collaboration mechanism would indeed have to involve both epistemic hypergraphs and organizational features. In this respect, while we claim and show that hypergraphs make it possible to capture some interesting processes of team-based, knowledge-intensive production systems, we also emphasize that the richness of organizational mechanisms should not be overshadowed by this formalism.

In line with our results, it should also be possible to determine which features, at the team level, favor better collaborations, not only in terms of semantic originality, but also in terms of […] and of output, in a broad sense.

Acknowledgements. This work was partially supported by the Future and Emerging Technologies programme FP7-COSI-ICT of the European Commission through project QLectives (grant no.:

[Figure 6 plots, one panel per dataset (Zebrafish, Rabies, JECFA, JEMRA): y-axis from 0 to 1; x-axis bins [0,0.16[, [0.16,0.33[, [0.33,0.5[, [0.5,0.66[, [0.66,0.83[, [0.83,1].]

Figure 6: Average hypergraphic repetition ratios (y-axis) with respect to expertise ratios (x-axis): social (dashed line) and semantic (solid line) cases. (Error bars correspond to 95% confidence intervals with respect to averages on each expertise ratio bin (in abscissa), e.g. [0, 0.16[.)
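The binned averages and error bars of Figs. 5 and 6 could be computed along the following lines. This is a sketch only: the six equal-width bins follow the axis labels of the figures, but the normal approximation (1.96 standard errors) for the 95% confidence interval and the name `binned_mean_ci` are assumptions, since the text does not state how the intervals were obtained.

```python
import math
from statistics import mean, stdev


def binned_mean_ci(pairs, n_bins=6):
    """Mean of y and 95% half-width per equal-width bin of x in [0, 1].

    pairs: iterable of (x, y), e.g. (expertise ratio, repetition ratio).
    Returns one entry per bin: (mean, half_width), or None for an empty bin.
    The last bin is closed on the right, so x = 1 falls into it.
    """
    groups = [[] for _ in range(n_bins)]
    for x, y in pairs:
        groups[min(int(x * n_bins), n_bins - 1)].append(y)

    result = []
    for ys in groups:
        if not ys:
            result.append(None)
            continue
        # Normal approximation; a single observation gets a zero half-width.
        half = 1.96 * stdev(ys) / math.sqrt(len(ys)) if len(ys) > 1 else 0.0
        result.append((mean(ys), half))
    return result
```

Each returned pair corresponds to one point and its error bar; bins with few observations would call for a more careful interval than this approximation.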

231200). We thank David Chavalarias and several anonymous reviewers for their useful comments.

A Weighting functions

A weighted hypergraphic repetition rate could be written as follows:

\[
r_t(e) = \frac{\displaystyle\sum_{\substack{e' \subseteq e \\ |e'| \geq 2}} w_e(|e'|) \cdot \rho_t(e')}{\displaystyle\sum_{i \in \{2, \dots, |e|\}} w_e(i) \binom{|e|}{i}}
\]

where $w_\cdot$ is a weight function (given $e$, $w_e : \mathbb{N} \to \mathbb{R}$) which makes it possible to give more or less weight to particular subset sizes.

For instance:

• taking $w_e(i) = 1$, i.e. actually no weighting, as has been used in the paper,

\[
r_t(e) = \frac{1}{2^{|e|} - |e| - 1} \sum_{\substack{e' \subseteq e \\ |e'| \geq 2}} \rho_t(e')
\]

• if instead $w_e(i) = i$, i.e. weighting proportional to the size of the considered subset,

\[
r_t(e) = \frac{1}{|e| \left( 2^{|e|-1} - 1 \right)} \sum_{\substack{e' \subseteq e \\ |e'| \geq 2}} |e'| \, \rho_t(e')
\]

• if finally $w_e(i) = \binom{|e|}{i}^{-1}$, i.e. weighting inversely proportional to the number of possible subsets of size $i$ in a set of size $|e|$,

\[
r_t(e) = \frac{1}{|e| - 1} \sum_{\substack{e' \subseteq e \\ |e'| \geq 2}} \frac{\rho_t(e')}{\binom{|e|}{|e'|}}
\]
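The weighted repetition rate of this appendix can be made concrete with a short sketch. It rests on an assumption that should be checked against the definition given earlier in the paper: ρ_t(e′) is taken to be 1 if the subset e′ is contained in at least one hyperedge observed before time t, and 0 otherwise. The function and weight names are hypothetical.

```python
from itertools import combinations
from math import comb


def uniform(i, n):
    """w_e(i) = 1: no weighting."""
    return 1.0


def by_size(i, n):
    """w_e(i) = i: weight proportional to the subset size."""
    return float(i)


def inverse_count(i, n):
    """w_e(i) = 1 / C(n, i): each subset size contributes equally overall."""
    return 1.0 / comb(n, i)


def repetition_rate(team, past_teams, weight=uniform):
    """Weighted hypergraphic repetition rate r_t(e) of `team`.

    team: agents (or concepts) of the new hyperedge e.
    past_teams: hyperedges observed before time t.
    Assumption: rho_t(e') = 1 if e' is included in some past hyperedge.
    """
    members = list(set(team))
    n = len(members)
    if n < 2:
        return 0.0  # no subset of size >= 2 exists
    past = [frozenset(p) for p in past_teams]

    numerator = 0.0
    denominator = 0.0
    for i in range(2, n + 1):
        w = weight(i, n)
        denominator += w * comb(n, i)
        for subset in combinations(members, i):
            s = frozenset(subset)
            if any(s <= p for p in past):
                numerator += w
    return numerator / denominator
```

For a team {a, b, c} whose only previously observed subset is {a, b}, the unweighted rate is 1/4, since one of the four subsets of size at least 2 is repeated. The enumeration is exponential in the team size, which remains tractable for events of at most 10 agents or concepts.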