arXiv:cond-mat/0308544 v1 26 Aug 2003

Network dynamics of ongoing social relationships

Petter Holme∗
Department of Physics, Umeå University, 901 87 Umeå, Sweden

Many recent large-scale studies of interaction networks have focused on networks of accumulated contacts. In this paper we explore social networks of ongoing relationships with an emphasis on dynamical aspects. We find a distribution of response times (times between consecutive contacts of different direction between two actors) that has a power-law shape over a large range. We also argue that the distribution of relationship duration (the time between the first and last contacts between actors) is exponentially decaying. Methods to reanalyze the data to compensate for the finite sampling time are proposed. We find that the degree distribution for networks of ongoing contacts fits better to a power-law than the degree distribution of the network of accumulated contacts does. We see that the clustering and assortative mixing coefficients are of the same order for networks of ongoing and accumulated contacts, and that the structural fluctuations of the former are rather large.

PACS numbers: 89.65.-s, 89.75.Hc, 89.75.-k

∗ Electronic address: [email protected]

I. INTRODUCTION

The recent development in database technology has allowed researchers to extract very large data sets of human interaction sequences. These large data sets are suitable to the methods and modeling techniques of statistical physics, and thus, the last years have witnessed the appearance of an interdisciplinary field between physics and sociology (1; 7; 13). More specifically, these studies have focused on network structure: in what ways social interaction networks deviate from completely random networks, and how this structure can emerge from individual behavior. Most of these recent large-scale social network studies have focused on networks of accumulated relationships.¹ In many cases, the network of interest is rather the network of ongoing social relationships: the dynamics of spreading of diseases (2), opinion formation (3), and fads (19) are often rather fast compared to the evolution of the network, and in such cases inactive relationships have no relevance. In social search processes (20), distant acquaintances can be helpful, but not all acquaintances a person has ever had. We also believe the network of ongoing contacts lies conceptually closer to the colloquial idea of a network of friends than what their accumulated contacts do. Furthermore, traditional social network studies (e.g. Refs. (5; 16; 17)) based on interviews or field surveys have mapped out the network of actors and ongoing contacts. The reason earlier studies have focused on the network of accumulated contacts is probably the complication that the time of a tie's cessation is less clear-cut than the time of its beginning. However, if the sampling time of the data set is very large compared to the network dynamics, then one can, at a time t in the interior of the sampling time span, approximate the network of ongoing relationships by the network of contacts that has occurred and will occur again. In the present paper we use this method to study the structure and structural fluctuations of networks of ongoing relationships. To justify that the sampling time is long enough compared to the time evolution of the network, we investigate the temporal structure of the relationships. The data sets we use are obtained from scientific collaborations (11), email exchange (8) and interaction within an Internet community (10).

¹ To our knowledge, the only type of (relatively) large instantaneous network figuring in recent physics literature is networks of corporate directors sitting in the same board, Ref. (6).

II. NOTATIONS AND NETWORK CONSTRUCTION

All our data sets take the form of lists of triples, or contacts, (v_A, v_B, t) meaning that v_A has interacted with v_B at time t. For the scientific collaboration networks the two first arguments are unordered; for the other two networks the interaction is directed. We call the set of contacts with the same two first elements (neglecting the order) a relationship between v_A and v_B. Our approximation of the graph of ongoing contacts at time t is then defined as G(t) = {V(t), E(t)} where V(t) is the set of vertices (or actors) that occur in a contact at a time earlier than t, and E(t) is the set of unordered pairs of vertices (v_A, v_B) where there exist contacts between v_A and v_B at times t′ and t″ such that t′ < t < t″.

For the network of scientific collaborations we use similar data as used in Ref. (11) (but sampled one year longer). This data is extracted from the preprint repository arxiv.org where scientists themselves can upload manuscripts. An edge between v_A and v_B means that v_A has appeared as a coauthor of a preprint together with v_B. The time the manuscript is uploaded is the time we say the collaboration has occurred.

The email network is the same data set as presented in Ref. (8) and consists of all in- and out-going email traffic to a server handling undergraduate students' email accounts in Kiel, Germany. The Internet community network is constructed from the same data set as in Ref. (10). Here an edge represents any of four different ways of contacts between users of the Swedish Internet community pussokram.com. This community is intended for romantic communication among adolescents and young adults.

For the email and pussokram.com networks one can define a direction for the contacts. In the study of network structure, however, we will consider the contacts as bidirectional. Statistics for the networks are presented in Table I.

TABLE I Statistics of the networks. Date notations have the format year-month-day hour:minute:second GMT. The number of ties does not include self-communication (e.g. self-addressed e-mails) but in "number of contacts" such communication is included.

                                e-prints              e-mail                pussokram.com
start of sampling               1995-01-01 06:00:00   2001-07-29 03:11:33   2001-02-13 14:39:25
end of sampling                 2001-01-01 05:31:00   2001-11-18 02:06:28   2002-07-10 15:28:00
sampling duration, t_stop       2192.0 days           112.0 days            512.0 days
number of actors, N             58 342                64 370                29 341
number of ties, M               294 901               97 425                115 684
number of contacts              530 481               447 543               536 276
relationship duration, t_dur    1532(30) days         187(5) days           129(10) days

[Figure 1: (a) p and p′ versus τ (days) on log-log axes for the e-print, e-mail and pussokram.com data; (b) p versus τ (days) on linear axes, e-mail; (c) p versus τ (days) on linear axes, pussokram.com.]

FIG. 1 Response time statistics. p′ is the frequency of response times in the data sets. p is the recalculated quantity compensated for the finite sampling time. (a) shows p for all data sets in log scales. The data is log-binned. p′ is plotted for the last five bins; elsewhere p′ and p overlap to a great extent. A blow-up of p (in linear scale) is shown in (b) (e-mails) and (c) (pussokram.com).

III. RELATIONSHIP DYNAMICS AND THE SPEED OF NETWORK CHANGE

Before we investigate the approximate network of ongoing contacts, as defined above, we discuss the speed of interaction and the validity of the approximation. First, we focus on the distribution of response times τ, the times between consecutive contacts of different direction² within a relationship. For the undirected e-print data we simply define τ as the time between consecutive uploads of e-prints within a relationship. We measure the τ-distribution of the data sets, p′, and also a quantity p where the effects of the finite sampling time are compensated for. An earlier study (9) has found a power-law like τ-distribution. As shown in Fig. 1(a) this picture is confirmed in the large scale. This stretched functional form makes the finite sampling time a problem as it imposes a cut-off on the recorded distribution p′. To compensate for this and construct a better approximation p to the real distribution, we use the formula P(B_τ) = P(A ∩ B_τ)/P(A|B_τ) where A is the event that a response interval that starts within the sampling interval I_t = [0, t_stop] also ends within I_t, and B_τ is the statement that the response time is τ. Now P(A ∩ B_τ) is just the frequency distribution of interval length as measured during the sampling. To find P(A|B_τ) we note that, if we assume that contacts occur with a constant rate (which is reasonable in a long term perspective for a system of a relatively constant number of actors), then a response interval ends within I_t with probability

$$ 1 - \frac{\tau}{t_{\mathrm{stop}}} . \tag{1} $$

However, the sizes of the communities need not be time independent. For response intervals involving an actor that enters the system at time t, Eq. (1) becomes 1 − (τ + t)/t_stop. Now we approximate the time an actor v enters the system with the first time t_v that v is involved in a contact, and get the formula:

$$ P(A \mid B_\tau) = a \sum_{v \in V} \Theta(t_{\mathrm{stop}} - t_v - \tau) \left[ 1 - \frac{t_v + \tau}{t_{\mathrm{stop}}} \right] \tag{2} $$

where Θ( · ) denotes the Heaviside function, and a is a normalizing constant. With the formula above, the compensated distribution is then p(τ) = p′(τ)/P(A|B_τ). p is plotted, along with the p′ values that differ most from p, in Fig. 1(a) (here a is chosen to make p coincide with p′ for small τ values rather than to normalize p). We note that p is straighter than p′ in the log-log scale (for at least the e-mail and pussokram.com curves), which suggests a power-law like behavior over a considerable range. (Of course there is eventually a cut-off, from the human life time if nothing else.) The e-print curve has a peculiar bend as it seems to shift exponent around τ = 300 days, an observation we hope future studies can explain. There is a conspicuous irregularity around τ = 1 day for the e-mail and pussokram.com curves. This was also observed in Ref. (9) and explained as an effect of people's everyday routines: the Kiel students read and reply to their e-mails at the same hour as the day before, the pussokram.com members log in after school or work, and so on. This effect is more visible in a linear scale, see Figs. 1(b) and (c). For the e-mail curve the peak at seven days is larger than the surrounding peaks, indicating that some emails are associated with weekly routines among the Kiel students and their contacts. This one-week peak cannot be seen in the pussokram.com curve, possibly reflecting that business (or university studies) has more weekly scheduled routines than leisure does.

² As mentioned we will focus on undirected networks later, but for comparison with other works we use directed contacts in this definition. The conclusion from an undirected definition would be the same.

Now we turn to the more central question about the speed of relationship cessation. Our central quantity is the number of relationships existing at time t_0 that still remain at time t (we assume 0 ≤ t_0 < t < t_stop), μ(t_0, t). This quantity can crudely be approximated with the number of relationships at t that existed at t_0 that will occur again before t_stop, μ′(t_0, t). The error in the approximation will be rather large for t close to t_stop. But, just as above, one can improve the approximation considerably. If one assumes that the response time distribution p(τ) applies to all relationships regardless of whether the relationship is new or old, then, during a time interval Δt, the change of μ can be written:

$$ \Delta\mu = \Delta\mu' + \mu\, \Delta\pi \tag{3} $$

where π(t) is the probability that a relationship that has its last recorded contact at time t actually continues after t_stop:

$$ \pi(t) = \sum_{\tau = t_{\mathrm{stop}} - t}^{\infty} p(\tau)\, \Delta\tau , \tag{4} $$

where the sum is over the bins of the p(τ) histogram. A change of variables gives:

$$ \Delta\pi(t) = \Delta t\, p(t_{\mathrm{stop}} - t) , \tag{5} $$

and finally a formula for integrating μ:

$$ \mu(t + \Delta t) = \frac{\Delta\mu'(t + \Delta t) + \mu(t)}{1 - \Delta t\, p(t_{\mathrm{stop}} - t)} . \tag{6} $$

We also need the factor a of Eq. (2), which is hard to estimate since we do not exactly know the long term behavior of p(τ). However, we note that for certain a the μ(t) curves are rather straight in a lin-log plot, see Fig. 2 (as opposed to the e-mail and pussokram.com curves, the e-print curve decays so slowly that a power-law form of μ(t) cannot be ruled out). This means that the characteristic duration time is well-defined: fitting to an exponential A exp(−t/t_dur) (A and t_dur are the two degrees of freedom) gives the characteristic durations t_dur of relationships displayed in Table I. To be able to approximate the network of ongoing contacts with the network of contacts that have happened and will happen again one would like t_dur ≪ t_stop to hold. We see that for the pussokram.com data t_stop is about four times as large as t_dur, which enables us to draw some conclusions using this approximation. The effective sampling times of the e-print and e-mail data are, however, so short that we exclude these from the latter section of this paper.

Now we take a brief look at the time evolution of the network sizes: n, the number of active actors at time t, and m, the number of edges in our approximate network of ongoing relationships. If the number of active users increases during the sampling period, the time evolution of n and m should be right-skewed, and this is indeed true for the e-print data (as seen in Figs. 3(a) and (b)). The e-mail and pussokram.com curves are more symmetric (the pussokram.com curve is indeed slightly left-skewed). We note that for pussokram.com, m is much less than M. The kinks of the e-mail curves are due to group or spam e-mails; the other quantities m, p and so on, are not affected by this.

IV. NETWORK STRUCTURE AND STRUCTURAL FLUCTUATIONS

Now we turn to the structure of the network of ongoing contacts, and the fluctuations of the structural measures. In this Section we only use the pussokram.com data (due to, as mentioned above, the large effective sampling time for this data set). We focus on three quantities that recently have received much attention. The first structural measure is the distribution of degree (number of edges to a vertex). The second quantity is the clustering coefficient C(G), where we use the traditional sociological definition

$$ C(G) = \frac{c_3(G)}{p_3(G)} \tag{7} $$

[Figure 2: μ and μ′ versus t (days) for (a) e-print, (b) e-mail, (c) pussokram.com.]

FIG. 2 The speed of network change. μ is the number of edges at time t_0 that still are present at time t > t_0. We choose t_0 = 0.2 t_stop.
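The quantities behind Fig. 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the paper: the function names are hypothetical, contacts are assumed to be (v_A, v_B, t) triples with t measured from the start of sampling, Δμ′(t + Δt) in Eq. (6) is read as μ′(t + Δt) − μ′(t), and the recursion is assumed to start from μ = μ′ at the first grid point (the text does not state an initial condition). The exponential fit is an ordinary least-squares line through (t, log y), which is one possible way to obtain A and t_dur.

```python
import math


def relationship_spans(contacts):
    """Map each unordered pair to (first, last) contact time; self-contacts are skipped."""
    spans = {}
    for a, b, t in contacts:
        if a == b:
            continue
        key = (a, b) if a <= b else (b, a)
        first, last = spans.get(key, (t, t))
        spans[key] = (min(first, t), max(last, t))
    return spans


def mu_prime(spans, t0, t):
    """mu'(t0, t): relationships begun before t0 that have a recorded contact after t."""
    return sum(1 for first, last in spans.values() if first < t0 and last > t)


def integrate_mu(mu_p, dt, p, t_start, t_stop):
    """Apply Eq. (6) on the grid t_i = t_start + i*dt, where mu_p[i] = mu'(t0, t_i).

    p is a callable response-time density p(tau).
    Assumption (not from the paper): mu equals mu' at the first grid point.
    """
    mu = [float(mu_p[0])]
    for i in range(1, len(mu_p)):
        t = t_start + (i - 1) * dt          # left end of the current step
        d_mu_p = mu_p[i] - mu_p[i - 1]      # change of mu' over the step
        mu.append((d_mu_p + mu[-1]) / (1.0 - dt * p(t_stop - t)))
    return mu


def fit_exponential(ts, ys):
    """Least-squares line through (t, log y); returns (A, t_dur) of A*exp(-t/t_dur)."""
    logs = [math.log(y) for y in ys]
    n = len(ts)
    mean_t = sum(ts) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(ts, logs))
    slope = sxy / sxx
    return math.exp(mean_l - slope * mean_t), -1.0 / slope
```

With p ≡ 0 the recursion returns μ′ unchanged, which is a convenient sanity check before plugging in a measured p(τ).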

Here c_3(G) is the number of representations³ of every 3-cycle (triangle) of G and p_3(G) is the number of representations of 3-paths. The third quantity is the assortative mixing coefficient (12)

$$ r = \frac{\langle k_1 k_2 \rangle - \langle k_1 \rangle \langle k_2 \rangle}{\sqrt{\langle k_1^2 \rangle - \langle k_1 \rangle^2}\, \sqrt{\langle k_2^2 \rangle - \langle k_2 \rangle^2}} \tag{8} $$

where averages are taken over E, and k_1 and k_2 are the degrees of an edge's first and second arguments as they appear in E.

The cumulative degree distribution of our approximate network of ongoing relationships, along with the corresponding data for the network of accumulated contacts, is plotted in Fig. 4(a). Just as for the accumulated network, our approximate network of ongoing relationships has a fat tailed degree distribution, but the network of ongoing relationships fits better to a single power-law. The stronger downward bend of P(k) for accumulated social contacts has been observed earlier (10; 11); maybe this larger correction to a power-law form is due to inactive edges. We note that even if the degree distribution fits very well to that of the Barabási-Albert model (4),⁴ the central ingredient in the Barabási-Albert model (the "preferential attachment") does not apply directly to the pussokram.com community. Preferential attachment means that a vertex acquires new edges with a rate proportional to its degree, but in the pussokram.com community the degree of a member is invisible to others (10).

[Figure 3: n and m versus t (days); panels (a) n, e-print; (b) m, e-print; (c) n, e-mail; (d) m, e-mail; (e) n, pussokram.com; (f) m, pussokram.com.]

FIG. 3 (a) shows the number of vertices n(t) and (b) shows the number of edges m(t) of the networks of contacts that have occurred and will occur again.

Next we turn to the time evolution of C and r. In Figs. 4(b) and (c) these are displayed for the whole sampling time. The earliest and latest times can, of course, be affected by the proximity to the borders of the sampling time frame; for our discussion we focus on the interval

³ If (v_1, v_2, v_3) is a triangle, then (v_2, v_3, v_1) is another representation of the same triangle. So the number of distinct triangles is c_3(G)/6.
⁴ For a case study of papers citing Ref. (4), see Ref. (15).

[Figure 4: (a) P(k) versus k on log-log axes; (b) C versus t (days); (c) r versus t (days); curves for ongoing and accumulated contacts.]

FIG. 4 Structure of the network of ongoing contacts. (a) shows the cumulative degree distribution of the network of ongoing contacts along with the degree distribution of the network of accumulated contacts. Error-bars are shown if larger than symbol size. The errors are calculated assuming that networks at times differing more than t_dur are independent. (b) shows the clustering coefficient as function of time for networks of ongoing and accumulated contacts. (c) shows the assortative mixing coefficient as function of time.
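As an illustration of how the curves in Fig. 4(b) and (c) could be produced, the following Python sketch builds the edge sets of ongoing and accumulated contacts at a time t and evaluates Eqs. (7) and (8). It is a sketch under stated assumptions, not the analysis code of the paper: all function names are hypothetical, contacts are assumed to be (v_A, v_B, t) triples, accumulated contacts are taken as all pairs with a contact at or before t, and for Eq. (8) each undirected edge is counted in both orientations (the paper instead uses the arguments "as they appear in E").

```python
from collections import defaultdict


def _pair(a, b):
    """Canonical form of an unordered pair."""
    return (a, b) if a <= b else (b, a)


def ongoing_edges(contacts, t):
    """Pairs with one contact strictly before t and one strictly after t."""
    before, after = set(), set()
    for a, b, s in contacts:
        if a == b:
            continue  # self-contacts do not form ties
        if s < t:
            before.add(_pair(a, b))
        elif s > t:
            after.add(_pair(a, b))
    return before & after


def accumulated_edges(contacts, t):
    """Pairs with at least one contact at or before t (assumed definition)."""
    return {_pair(a, b) for a, b, s in contacts if a != b and s <= t}


def adjacency(edges):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def clustering(edges):
    """C = c3/p3 of Eq. (7): ordered representations of triangles over 3-paths."""
    adj = adjacency(edges)
    c3 = p3 = 0
    for nbrs in adj.values():
        k = len(nbrs)
        p3 += k * (k - 1)  # ordered pairs of distinct neighbours
        c3 += sum(1 for u in nbrs for w in nbrs if u != w and w in adj[u])
    return c3 / p3 if p3 else 0.0


def assortativity(edges):
    """r of Eq. (8), with every edge counted in both orientations (assumption)."""
    adj = adjacency(edges)
    pairs = []
    for a, b in edges:
        ka, kb = len(adj[a]), len(adj[b])
        pairs += [(ka, kb), (kb, ka)]
    n = len(pairs)
    if n == 0:
        return float("nan")
    m1 = sum(k1 for k1, _ in pairs) / n
    m11 = sum(k1 * k1 for k1, _ in pairs) / n
    m12 = sum(k1 * k2 for k1, k2 in pairs) / n
    var = m11 - m1 * m1  # symmetrised: both degree variances coincide
    return (m12 - m1 * m1) / var if var > 0 else float("nan")
```

Evaluating `clustering(ongoing_edges(contacts, t))` and `assortativity(ongoing_edges(contacts, t))` on a grid of t values gives time series of the kind discussed in this Section; swapping in `accumulated_edges` gives the comparison curves.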

[t_dur, t_stop − t_dur] ≈ [129, 383] days. We see that both C and r are of the same order of magnitude for the networks of ongoing and accumulated contacts. These values of C and r are rather neutral in the sense that they can be expected from a random network with a skewed degree distribution (14). The fluctuations are rather large, especially for the clustering coefficient (with a standard deviation of around half the average value). An intriguing question for future studies is how dynamical systems on the networks are affected by strong structural fluctuations. Slightly outside our interval, at t ≈ 395, there is an upward jump in both C (from 0.023 to 0.041) and r (from −0.057 to −0.043) that is the result of a new contact between two of the most central actors. Such a new edge introduces a new triangle for every common neighbor of the two vertices, and can thus increase C substantially as two high-degree actors may have many neighbors in common. Such an event will, by definition, also give a large positive contribution to the assortative mixing. We can expect sudden jumps in many structural quantities for networks of ongoing relationships with fat tailed degree distributions, as the rare event of an edge appearing or disappearing between two of the most connected vertices will affect many structural measures (various kinds of centrality measures (18) are probably even more sensitive to such events). Unfortunately the sampling time is too short, despite the fast pussokram.com dynamics, to get good statistics for the autocorrelation function of C(t) and k(t) (it is consistent with a characteristic time of decay similar to t_dur).

V. SUMMARY AND CONCLUSIONS

In this paper we investigate networks of ongoing contacts from three large sets of social interaction data. We study the response time distribution and distribution of relationship duration. We reanalyze these quantities to compensate for the finite sampling time by supposing that the response time distribution is the same for all relationships, and the same throughout the duration of the relationship. We find a response time distribution that has a power-law like shape in the large scale, but has an informative small-scale structure reflecting the daily and weekly routines. The distribution of relationship duration is consistent with an exponential decay. This indicates that there is a well-defined characteristic duration time of a relationship, t_dur, and that if t_dur is much less than the sampling time t_stop the network of ongoing contacts can be reasonably well approximated by the network of contacts that have happened and will happen again. For one of our data sets, that of the Internet community pussokram.com, we have 4 t_dur ≈ t_stop. For this data set we compare the approximate network of ongoing contacts with networks of accumulated contacts, the common way of constructing social networks from interaction data. We find a degree distribution that fits much better to a power-law for the network of ongoing contacts than the network of accumulated contacts. The clustering coefficient and assortative mixing coefficients are of the same order, which, to some extent, justifies the use of network of accumulated contacts as a proxy for networks of ongoing contacts. The fluctuations in these quantities are, however, rather large, a fact that may have important consequences for dynamical systems. We hope these results will inspire more extensive longitudinal studies of interaction networks with fast dynamics, as well-converged data for relationship duration distribution and autocorrelation functions of structural quantities are within reach. We also point out the interplay between dynamical systems on the networks and the structural fluctuations as an interesting area of future studies.
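The finite-sampling compensation of the response time distribution summarized above (Section III, Eqs. (1) and (2)) can be sketched as follows. This is a minimal illustration rather than the procedure actually coded for the paper: the function names are hypothetical, times are assumed to be measured from the start of sampling so that the sampling interval is [0, t_stop], the binning of τ is left to the caller, and the constant a is fixed by requiring p = p′ in the first (smallest τ) bin, following the choice described for Fig. 1(a).

```python
from collections import defaultdict


def response_times(contacts):
    """Times between consecutive contacts of opposite direction within a relationship."""
    by_pair = defaultdict(list)
    for a, b, t in contacts:
        if a != b:
            by_pair[frozenset((a, b))].append((t, a))  # (time, sender)
    taus = []
    for events in by_pair.values():
        events.sort()
        for (t1, s1), (t2, s2) in zip(events, events[1:]):
            if s1 != s2:  # the direction changed, so this is a response
                taus.append(t2 - t1)
    return sorted(taus)


def first_contact_times(contacts):
    """t_v: first time each actor v is involved in any contact."""
    first = {}
    for a, b, t in contacts:
        for v in (a, b):
            first[v] = min(t, first.get(v, t))
    return list(first.values())


def completion_weight(tau, first_times, t_stop):
    """Sum over actors of Theta(t_stop - t_v - tau) * (1 - (t_v + tau)/t_stop)."""
    return sum(1.0 - (tv + tau) / t_stop
               for tv in first_times if tv + tau < t_stop)


def compensate(taus, p_obs, first_times, t_stop):
    """p = p'/P(A|B_tau), with a chosen so that p equals p' in the first bin."""
    weights = [completion_weight(tau, first_times, t_stop) for tau in taus]
    a = 1.0 / weights[0]
    return [po / (a * w) if w > 0 else float("nan")
            for po, w in zip(p_obs, weights)]
```

Here `taus` are bin centres and `p_obs` the measured frequencies p′ in those bins; bins with τ so large that no actor could have completed the interval inside the sampling window are returned as NaN rather than guessed.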

Acknowledgments

The author would like to thank Niklas Angemyr, Stefan Bornholdt, Holger Ebel, Michael Lokner, Mark Newman, and Christian Wollter for help with data acquisition; and Beom Jun Kim for comments and suggestions. The author was partly supported by the Swedish Research Council through contract no. 2002-4135.

References

[1] R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74 (2002), pp. 47–98.
[2] R. M. Anderson and R. M. May, Infectious diseases in humans, Oxford University Press, Oxford, 1992.
[3] D. B. Bahr and E. Passerini, Statistical mechanics of opinion formation and collective behavior, J. Math. Sociol. 23 (1998), pp. 1–27.
[4] A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science 286 (1999), pp. 509–512.
[5] A. Davis, B. B. Gardner, and M. R. Gardner, Deep South, University of Chicago Press, Chicago, 1941.
[6] G. F. Davis and H. R. Greve, Corporate elite networks and governance changes in the 1980s, Amer. J. Sociol. 103 (1997), pp. 1–37.
[7] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of networks, Adv. Phys. 51 (2002), pp. 1079–1187.
[8] H. Ebel, L.-I. Mielsch, and S. Bornholdt, Scale-free topology of e-mail networks, Phys. Rev. E 66 (2002), p. 035103.
[9] J.-P. Eckmann, E. Moses, and D. Sergi, Dialog in e-mail traffic. e-print cond-mat/0304433.
[10] P. Holme, C. R. Edling, and F. Liljeros, Structure and time-evolution of the Internet community pussokram.com. e-print cond-mat/0210514.
[11] M. E. J. Newman, Scientific collaboration networks. I. Network construction and fundamental results, Phys. Rev. E 64 (2001), p. 016131.
[12] M. E. J. Newman, Assortative mixing in networks, Phys. Rev. Lett. 89 (2002), p. 208701.
[13] M. E. J. Newman, The structure and function of complex networks, SIAM Rev. 45 (2003), pp. 167–256.
[14] M. E. J. Newman and J. Park, Why social networks are different from other types of networks. e-print cond-mat/0305612.
[15] F. Pütsch, Analysis and modeling of science collaboration networks. e-print cond-mat/0305112.
[16] A. Rapoport and W. J. Horvath, A study of a large sociogram, Behavioral Sci. 6 (1961), pp. 279–291.
[17] F. J. Roethlisberger and W. J. Dickson, Management and the Worker, Harvard University Press, Cambridge, MA, 1939.
[18] S. Wasserman and K. Faust, Social network analysis: Methods and applications, Cambridge University Press, Cambridge, 1994.
[19] D. J. Watts, A simple model of global cascades on random networks, Proc. Natl. Acad. Sci. USA 99 (2002), pp. 5766–5771.
[20] D. J. Watts, P. S. Dodds, and M. E. J. Newman, Identity and search in social networks, Science 296 (2002), pp. 1302–1305.