Temporal Dynamics of Scale-Free Networks

Erez Shmueli, Yaniv Altshuler, and Alex “Sandy” Pentland

MIT Media Lab {shmueli,yanival,sandy}@media.mit.edu

Abstract. Many social, biological, and technological networks display substantial non-trivial topological features. One well-known and much studied feature of such networks is the scale-free power-law distribution of nodes’ degrees. Several works further suggest models for generating complex networks which comply with one or more of these topological features. For example, the well-known Barabasi-Albert “preferential attachment” model tells us how to create scale-free networks. Since the main focus of these generative models is on capturing one or more of the static topological features of complex networks, they are very limited in capturing the temporal dynamics of the networks’ evolution. Therefore, when studying real-world networks, the following question arises: what is the mechanism that governs changes in the network over time? In order to shed some light on this topic, we study two years of data that we received from eToro: the world’s largest social financial trading company. We report three key findings. First, we demonstrate how the network topology may change significantly over time. More specifically, we illustrate how popular nodes may become far less popular, and emerging new nodes may become extremely popular, in a very short time. Then, we show that although the network may change significantly over time, the degrees of its nodes obey the power-law model at any given time. Finally, we observe that the magnitude of change between consecutive states of the network also presents a power-law effect.

1 Introduction

Many social, biological, and technological networks display substantial non-trivial topological features. One well-known and much studied feature of such networks is the scale-free power-law distribution of nodes’ degrees [4]. That is, the degree of nodes is distributed according to the following formula: $P[d] = c \cdot d^{-\lambda}$. As the study of complex networks has continued to grow in importance and popularity, many other features have attracted attention as well. Such features include, among others: short path lengths and a high clustering coefficient [12, 2], assortativity or disassortativity among vertices [10], community structure [8] and hierarchical structure [11] for undirected networks, and reciprocity [7] and triad significance profile [9] for directed networks.

Several works further suggested models for generating complex networks which comply with one or more of these topological features. For example, the well-known Barabasi-Albert model [4] tells us how to create scale-free networks. It incorporates two important general concepts: growth and preferential attachment. Growth means that the number of nodes in the network increases over time, and preferential attachment means that the more connected a node is, the more likely it is to receive new links. More specifically, the network begins with an initial connected network of $m_0$ nodes. New nodes are added to the network one at a time. Each new node is connected to $m \le m_0$ existing nodes with a probability that is proportional to the number of links that the existing nodes already have.

More sophisticated models for creating scale-free networks exist. For example, in [6], at each time step, apart from the $m$ new edges between the new node and the old nodes, $m_c$ new edges are created between the old nodes, where the probability that a new edge is attached to existing nodes of degrees $d_1$ and $d_2$ is proportional to $d_1 \cdot d_2$. A rewiring of edges produces a very similar effect [1].
That is, instead of creating connections between nodes in the existing network, at each time step, $m_r$ randomly chosen vertices lose one of their connections. In $m_{rr}$ cases, the free end is attached to a random vertex. In the remaining $m_{rp} = m_r - m_{rr}$ cases, the free end is attached to a preferentially chosen vertex.

The main focus of these generative models is on capturing one or more of the static topological features of complex networks. However, these models are very limited in capturing the temporal dynamics of the networks’ evolution. Therefore, when studying real-world networks, the following question arises: what is the mechanism that governs changes in the network over time?

In order to shed some light on this question, we studied two years of data (from 2011/07/01 to 2013/06/30) that we received from eToro: the world’s largest social financial trading company. We report three key findings. First, we demonstrate how the network topology may change significantly over time. More specifically, we illustrate how popular nodes may become far less popular, and emerging new nodes may become extremely popular, in a very short time. Then, we show that although the network may change significantly over time, the degrees of its nodes obey the power-law model at any given time. Finally, we observe that the magnitude of change between consecutive states of the network also presents a power-law effect.
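The growth and preferential attachment rule of the Barabasi-Albert model described in this section can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `barabasi_albert_edges`, the choice of a path as the initial connected network, and the use of repeated draws to obtain distinct targets are ours and are not taken from [4].

```python
import random


def barabasi_albert_edges(n, m, m0=None, seed=None):
    """Grow an undirected network of n nodes by preferential attachment.

    Returns a set of edges (u, v) with u < v. Nodes are labelled 0..n-1.
    """
    rng = random.Random(seed)
    if m0 is None:
        m0 = max(m, 2)
    if not (1 <= m <= m0 <= n) or m0 < 2:
        raise ValueError("need 1 <= m <= m0 <= n and m0 >= 2")

    edges = set()
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it selects a node with probability proportional to its degree.
    endpoints = []

    # Assumption: the initial connected network is a simple path on m0 nodes.
    for u in range(m0 - 1):
        edges.add((u, u + 1))
        endpoints.extend((u, u + 1))

    # Growth: add one node at a time and link it to m distinct existing nodes.
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # preferential attachment
        for old in targets:
            edges.add((old, new))
            endpoints.extend((old, new))
    return edges
```

For example, `barabasi_albert_edges(200, 2, m0=3, seed=1)` grows a 200-node network in which every new node arrives with two links. Note that this sketch only grows the network; nodes never lose links, which is exactly the limitation that motivates the temporal analysis in this paper.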

2 Datasets

Our data come from eToro: the world’s largest social financial trading company (see http://www.etoro.com). eToro is an online discount retail broker for foreign exchange and commodities trading, with easy-to-use buying and short-selling mechanisms as well as leverage of up to 400 times.

Similarly to other trading platforms, eToro allows users to trade between currency pairs individually (see Fig. ??). In addition, eToro provides a platform which allows users to watch the financial trading activity of other users (displayed in a number of statistical ways) and copy their trades (see Fig. 1). More specifically, users in eToro can place three types of trades: (1) Single trade: the user places a normal trade on their own; (2) Copy trade: the user copies one single trade of another user; and (3) Mirror trade: the user picks a target user to copy, and eToro automatically places all trades of the target user on behalf of the user.

Our data contain over 67 million trades that were placed between 2011/07/01 and 2013/06/30. More than 53 million of these trades are automatically executed mirror trades, fewer than 250 thousand are copy trades and roughly 13 million are single trades. The total number of unique traders is roughly 275 thousand and the total number of unique mirror operations is roughly 850 thousand (one mirror operation may result in several mirror trades).

Fig. 1. The eToro platform. Illustrating the trading portfolio of a single user (left) and the trading activity of all users (right).

In the remainder of this paper, we use these trades to construct snapshot networks, as described next. Given a start time s and an end time e, the snapshot network’s nodes consist of all users that had at least one trade open at some point in time between s and e. An edge from user u to user v exists if and only if user u was mirroring user v at some point in time between s and e. Figure 2 illustrates how the size of the eToro network grows over time in terms of both the number of nodes and the number of edges. For each day during the two-year period, a snapshot network is constructed, and the number of nodes and edges for that network are counted.
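The snapshot construction above can be sketched in a few lines of Python. The record types `Trade` and `Mirror`, their field names, and the use of integer timestamps are hypothetical; the actual eToro data schema is not described in this paper. A record is taken to fall inside the snapshot when its open interval intersects [s, e].

```python
from typing import NamedTuple


class Trade(NamedTuple):
    # Hypothetical trade record: the user and the interval the trade was open.
    user: str
    opened: int
    closed: int


class Mirror(NamedTuple):
    # Hypothetical mirror operation: follower mirrors target during an interval.
    follower: str
    target: str
    started: int
    stopped: int


def overlaps(start, stop, s, e):
    """True if the interval [start, stop] intersects the window [s, e]."""
    return start <= e and stop >= s


def snapshot_network(trades, mirrors, s, e):
    """Return (nodes, edges) of the snapshot network for the window [s, e]."""
    # Nodes: users with at least one trade open at some point in [s, e].
    nodes = {t.user for t in trades if overlaps(t.opened, t.closed, s, e)}
    # Edges: u -> v if u was mirroring v at some point in [s, e].
    edges = {(m.follower, m.target)
             for m in mirrors if overlaps(m.started, m.stopped, s, e)}
    return nodes, edges
```

Calling `snapshot_network` once per day, with s and e set to the start and end of that day, yields the daily node and edge counts of the kind plotted in Figure 2.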

[Two panels: Number of nodes (left) and Number of edges (right) versus Day, days 0 to 800]

Fig. 2. The size of the eToro network in terms of the number of nodes (left) and the number of edges (right) along time.

3 Results

First, we examined the in-degrees of nodes in the eToro network over the entire period of two years. As can be seen in Figure 3, the in-degree distribution presents a strong power-law pattern. Although quite expected, this result is non-trivial. One might expect to see a handful of users who are mirrored by all the others, but what we actually witness is a heavy tail of users with only a few followers each. This result is consistent with the observation in [3], where the authors demonstrate by simulation that the degree distribution of social-learning networks converges to a power-law distribution, regardless of the underlying social network topology.

[Plot: Density versus Degree on log-log axes; fitted exponent γ = 1.64]

Fig. 3. In-degree distribution of nodes in the entire eToro network. (The in-degree of a node depicts the number of mirroring traders for the trader represented by that node)
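As an illustration of how an exponent such as the γ reported in Figure 3 might be estimated, the sketch below counts in-degrees from a list of mirror edges and applies the continuous maximum-likelihood estimator discussed in [5], α = 1 + n / Σ ln(x_i / x_min). The paper does not state which fitting procedure produced its exponents, so this is an assumption; the function names are ours, and for integer degrees the continuous estimator is only an approximation.

```python
import math
from collections import Counter


def in_degrees(edges):
    """Count, for each node, the number of edges pointing at it.

    edges: iterable of (follower, target) pairs.
    """
    return Counter(target for _, target in edges)


def powerlaw_exponent_mle(values, xmin):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only observations with value >= xmin are used (the tail of the data).
    """
    tail = [x for x in values if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one observation strictly above xmin")
    return 1.0 + len(tail) / log_sum
```

A typical use would be `powerlaw_exponent_mle(in_degrees(edges).values(), xmin)`, where the choice of `xmin` is left to the analyst.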

Next, we investigated how the popularity of traders in eToro, in terms of the number of mirroring traders, changes over time. Fig. 4 illustrates the popularity of four traders. As can be seen in the figure, popular traders may become far less popular, and emerging new traders may become extremely popular, in a very short time. Note how this behavior differs significantly from the well-known “rich get richer” behavior.

[Four panels, one per trader: Number of mirroring traders versus Day, days 0 to 800]

Fig. 4. The in-degree of four nodes in the evolving eToro network. (Depicting the popularity of the four corresponding traders along time)

To illustrate this point further, we checked how similar different snapshots of the network are. Figure 5 presents the top 50 popular nodes for four different time periods: July-September 2011 (snapshot 1), January-March 2012 (snapshot 2), July-September 2012 (snapshot 3) and January-March 2013 (snapshot 4). That is, four three-month snapshots with three-month gaps in between. As can be seen in the figure, only 11 nodes that were included in the top 50 popular nodes of snapshot 1 remained in the top 50 popular nodes of snapshot 2; only 17 nodes that were included in the top 50 popular nodes of snapshot 2 remained in the top 50 popular nodes of snapshot 3; and only 19 nodes that were included in the top 50 popular nodes of snapshot 3 remained in the top 50 popular nodes of snapshot 4. That is, the network may change significantly over time.

[Four panels: Snapshot 1, Snapshot 2, Snapshot 3, Snapshot 4]

Fig. 5. The 50 most popular nodes in each one of the four snapshots. Green nodes represent nodes that are included in the 50 most popular nodes of the current snapshot but were not included in the previous one. Red nodes represent nodes that were included in the 50 most popular nodes of the previous snapshot but are not included in the current one. Blue nodes represent nodes that were included in both snapshots. The node’s circle area is proportional to its popularity.
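The comparison of the most popular nodes between two snapshots can be sketched as follows. The function names are ours, and popularity is taken to be the in-degree (number of mirroring traders) within the snapshot. Ties at the cut-off are broken by the order in which nodes are first seen, which is an arbitrary choice of this sketch.

```python
from collections import Counter


def top_k_nodes(edges, k=50):
    """Return the set of the k nodes with the highest in-degree.

    edges: iterable of (follower, target) pairs from one snapshot.
    """
    indegree = Counter(target for _, target in edges)
    return {node for node, _ in indegree.most_common(k)}


def classify_top_k(prev_edges, curr_edges, k=50):
    """Split the top-k nodes of two snapshots into the groups of Figure 5."""
    prev_top = top_k_nodes(prev_edges, k)
    curr_top = top_k_nodes(curr_edges, k)
    return {
        "stayed": prev_top & curr_top,   # in both snapshots (blue)
        "entered": curr_top - prev_top,  # new in the current snapshot (green)
        "dropped": prev_top - curr_top,  # only in the previous snapshot (red)
    }
```

The size of the `"stayed"` set corresponds to the overlap counts quoted above (11, 17 and 19 for the three consecutive pairs of snapshots).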

We then examined the degree distribution for each one of the four snapshots above. As can be seen in Figure 6, although the four snapshots differ significantly, the degree distribution of each one of them obeys the power-law model.

[Four panels, Snapshot 1 to Snapshot 4: Density versus Degree on log-log axes; fitted exponents γ = 1.52, 1.63, 1.64 and 1.65]

Fig. 6. Degree distribution for each one of the four snapshots that are shown in Figure 5

Next, we studied more carefully how the eToro network changes between consecutive days. More specifically, we measured the number of added edges (i.e., edges that did not appear on the previous day and appear on the current day) and the number of removed edges (i.e., edges that appeared on the previous day and do not appear on the current day). Since the size of the eToro network grows over time (see Fig. 2), we normalized the above quantities by dividing them by the number of edges that were present on the previous day. We found that the normalized magnitude of change between each two consecutive snapshots (according to each one of the two measures) follows a power-law distribution (see Figure 7).

[Two panels: Density versus Change on log-log axes; γ = 2.88 (added edges, left) and γ = 2.80 (removed edges, right)]

Fig. 7. Distribution of the normalized changes in the eToro network: added edges (left) and removed edges (right).
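The two normalized change measures behind Figure 7 can be written down directly from their definitions. The sketch below assumes that the daily snapshots are available as a list of edge sets, one per day; the function name is ours. Days whose predecessor has no edges are skipped, since the normalization is undefined there.

```python
def normalized_edge_changes(daily_edges):
    """Compute normalized added and removed edges for consecutive days.

    daily_edges: list of sets of (follower, target) pairs, one set per day.
    Returns a list of (added_fraction, removed_fraction) tuples, one per
    pair of consecutive days with a non-empty previous day.
    """
    changes = []
    for prev, curr in zip(daily_edges, daily_edges[1:]):
        if not prev:
            continue  # cannot normalize by an empty previous day
        added = len(curr - prev) / len(prev)    # new today, absent yesterday
        removed = len(prev - curr) / len(prev)  # present yesterday, gone today
        changes.append((added, removed))
    return changes
```

The histogram of the first (or second) component of these tuples over all days gives the empirical distribution whose tail is examined in Figure 7.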

In order to better understand this finding, we broke down the overall network changes into two smaller components. First, we measured the changes by taking into account only the nodes that were added and removed between the two consecutive days. That is, we considered only users that were not trading on the previous day but are trading on the current day, and users that were trading on the previous day but are not trading on the current day. As can be seen in the top two subfigures of Figure 8, the normalized number of added and removed nodes also follows a power-law distribution. That is, on most days, only a small number of nodes are added to or removed from the network, but occasionally a large number of nodes are added or removed. We repeated the same analysis taking into account only the edges for which at least one endpoint was added or removed. As can be seen in the bottom two subfigures of Figure 8, the result was again a power-law distribution.

Then, we measured the changes by taking into account only the nodes that existed on both of the two consecutive days. That is, we considered only users that were trading on the previous day and are also trading on the current day. As can be seen in Figure 9, even when only the common nodes are considered, the normalized number of added and removed edges follows a power-law distribution.

Our results were validated using the statistical tests for power-law distributions that were suggested in [5]. First, we applied the goodness-of-fit test. As can be seen in Table 1, the p-values for all cases are greater than 0.1, as required. Second, we tested alternative types of distribution.
As can be seen in the table, the distribution is more likely to be a truncated power-law than a general power-law in all cases (the GOF value is negative), and the result is significant in three out of eight of the cases (the p-values are lower than 0.05); the distribution is more likely to be a truncated power-law than an exponential, and the result is significant in five out of eight of the cases; and the distribution is more likely to be a truncated power-law than a log-normal in all cases, and the result is significant in five out of eight of the cases.
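The decomposition of daily changes described in this section (changes due to added or removed nodes versus changes among common nodes) can be sketched as below. This is our reading of the text, not the authors' code: the function name and the returned keys are hypothetical, and raw counts are returned because the normalization used for the node counts is not spelled out in the paper.

```python
def decompose_changes(prev_nodes, prev_edges, curr_nodes, curr_edges):
    """Split the change between two consecutive snapshots into components.

    Nodes are sets of users; edges are sets of (follower, target) pairs.
    Returns raw counts; the caller chooses how to normalize them.
    """
    added_nodes = curr_nodes - prev_nodes
    removed_nodes = prev_nodes - curr_nodes
    changed = added_nodes | removed_nodes

    def touches_changed(edge):
        # True if at least one endpoint was added or removed.
        return edge[0] in changed or edge[1] in changed

    added_edges = curr_edges - prev_edges
    removed_edges = prev_edges - curr_edges
    return {
        "added_nodes": len(added_nodes),
        "removed_nodes": len(removed_nodes),
        # Edges with an added or removed endpoint (cf. Figure 8, bottom).
        "added_edges_changed": sum(touches_changed(e) for e in added_edges),
        "removed_edges_changed": sum(touches_changed(e) for e in removed_edges),
        # Edges whose endpoints exist on both days (cf. Figure 9).
        "added_edges_common": sum(not touches_changed(e) for e in added_edges),
        "removed_edges_common": sum(not touches_changed(e) for e in removed_edges),
    }
```

By construction, the "changed" and "common" edge counts add up to the total numbers of added and removed edges used for Figure 7.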

4 Summary and Future Work

In this paper, we investigate how scale-free networks evolve over time. Studying a real-world network, we find that: (1) the network topology may change significantly over time, (2) the degree distribution of nodes in the network obeys the power-law model at any given state and (3) the magnitude of change between consecutive states of the network also presents a power-law effect.

Better understanding the temporal dynamics of scale-free networks would allow us to develop improved and more realistic algorithms for generating networks. Moreover, it would help us in better predicting future states of the network and estimating their probabilities. For example, it may help in bounding the probability that a given node remains popular over a certain period of time.

In future work we intend to check how the distribution of changes between consecutive states of the network influences the overall network’s performance. We hypothesize that in cases where the distribution of changes is closer to a power-law distribution, the overall network performance would be higher. Furthermore, we would like to investigate the mechanism that is responsible for the power-law shape of the distribution. Finally, we would like to suggest a generative model for networks based on the above findings.

[Four panels: Density versus Change on log-log axes; γ = 3.64 (added nodes, top left), γ = 3.24 (removed nodes, top right), γ = 3.13 (added edges, bottom left) and γ = 3.00 (removed edges, bottom right)]

Fig. 8. Distribution of the normalized changes in the eToro network, as reflected by the added and removed nodes: added nodes (top left), removed nodes (top right), added edges (bottom left) and removed edges (bottom right).

[Two panels: Density versus Change on log-log axes; γ = 2.87 (added edges, left) and γ = 2.61 (removed edges, right)]

Fig. 9. Distribution of the normalized changes in the eToro network, as reflected by the common nodes: added edges (left) and removed edges (right).

Fig.  Subfigure      xmin   alpha  Goodness  Power-Law vs.     Trunc. Power-Law  Trunc. Power-Law
                                   of Fit    Trunc. Power-Law  vs. Exponential   vs. Log-Normal
7     added edges    0.024  2.88   0.121     (-) 0.108         (+) 0.012         (+) 0.396
7     removed edges  0.025  2.80   0.207     (-) 0.012         (+) 0.008         (+) 0.000
8     added nodes    0.073  3.64   0.613     (-) 0.613         (+) 0.093         (+) 0.732
8     removed nodes  0.023  3.24   0.111     (-) 0.099         (+) 0.160         (+) 0.005
8     added edges    0.018  3.13   0.545     (-) 0.544         (+) 0.063         (+) 0.411
8     removed edges  0.012  3.00   0.110     (-) 0.108         (+) 0.159         (+) 0.006
9     added edges    0.014  2.87   0.123     (-) 0.039         (+) 0.027         (+) 0.032
9     removed edges  0.014  2.61   0.131     (-) 0.009         (+) 0.014         (+) 0.000

Table 1. Statistical tests for power-law distributions. The numbers in the three right columns represent the p-value, preceded by the sign of the GOF value in brackets.

References

1. Albert, R., and Barabási, A.-L. Topology of evolving networks: local events and universality. Physical Review Letters 85, 24 (2000), 5234.
2. Amaral, L. A. N., Scala, A., Barthélemy, M., and Stanley, H. E. Classes of small-world networks. Proceedings of the National Academy of Sciences 97, 21 (2000), 11149–11152.
3. Anghel, M., Toroczkai, Z., Bassler, K. E., and Korniss, G. Competition-driven network dynamics: Emergence of a scale-free leadership structure and collective efficiency. Physical Review Letters 92, 5 (2004), 058701.
4. Barabási, A.-L., and Albert, R. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512.
5. Clauset, A., Shalizi, C. R., and Newman, M. E. Power-law distributions in empirical data. SIAM Review 51, 4 (2009), 661–703.
6. Dorogovtsev, S. N., and Mendes, J. F. F. Scaling behaviour of developing and decaying networks. EPL (Europhysics Letters) 52, 1 (2000), 33.
7. Garlaschelli, D., and Loffredo, M. I. Patterns of link reciprocity in directed networks. Physical Review Letters 93, 26 (2004), 268701.
8. Girvan, M., and Newman, M. E. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (2002), 7821–7826.
9. Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M., and Alon, U. Superfamilies of evolved and designed networks. Science 303, 5663 (2004), 1538–1542.
10. Newman, M. E. Assortative mixing in networks. Physical Review Letters 89, 20 (2002), 208701.
11. Ravasz, E., and Barabási, A.-L. Hierarchical organization in complex networks. Physical Review E 67, 2 (2003), 026112.
12. Watts, D. J., and Strogatz, S. H. Collective dynamics of ‘small-world’ networks. Nature 393, 6684 (1998), 440–442.