Modeling the Dynamics of WeChat Social System

Chengxi Zang Peng Cui Linyun Yu CS Department, Tsinghua University CS Department, Tsinghua University CS Department, Tsinghua University [email protected] [email protected] [email protected] Tianyang Zhang Wenwu Zhu Hao Ye CS Department, Tsinghua University CS Department, Tsinghua University Corp., Shenzhen, China [email protected] [email protected] [email protected]

ABSTRACT can share their wonderful lives in the “moments” – a social feed ‘e phenomena in social systems, just like many laws of nature, function which allows users to post images, texts, videos, music, are of dynamic origins. For the €rst time, we systematically sum- etc., to receive comments and likes, and to repost the information to marize our experiences in dealing with dynamic phenomena/data others. Meanwhile, with 600 million active mobile payment users, in WeChat social system, which is one of the largest, most com- it o‚ers the largest payment service in China, covering the features plex social systems in the real world. A framework is proposed, like digital wallets, balance accounts, red packet, lucky money, cash which covers the pipeline of dealing with complex data, complex transfer, €nancial investment, etc. People pay through WeChat in phenomena, complex mechanisms, a spectrum of methodologies, restaurants, supermarkets or even vendors by scanning QR codes and various applications. By following our framework, we discuss rather than using cash or credit cards. Moreover, WeChat provides four main aspects of dynamics in WeChat social system, including ocial accounts, city services, mini programs, WeChat index, news human dynamics, network dynamics, group dynamics and cascade feed, search, and many more functions. With the convenient and dynamics. We abstract the data structures from the digital logs, personalized services provided by WeChat, people now exchange investigate the phenomena we observed from the data, explain the their WeChat accounts rather than the business cards when making possible mechanisms governing the phenomena, and give mod- friends or developing business. It’s a super app, an ecosystem, an els we design according to our methodology spectrum to answer app for “everything” in China and beyond. various “WHEN” related questions. We illustrate the “research net- What are the dynamics in WeChat? ‘e WeChat social sys- work” underlying our framework when applied to four dynamic tem is constantly changing over time in all the scenarios described scenarios. Typical applications are summarized and illustrated. ‘e above. When we text each other, add a new friend, post a photo, framework we give could potentially guide future researches on make a payment, join a group chat, make a conversation, scan a social dynamics related problems, sheding light on modeling and QR code, etc., all these behaviors change, develop over time, and designing data structures, explaining observed phenomena, infer- we call these temporally changing behavioral phenomena of each ring possible mechanisms, choosing and designing dynamic models, user as the dynamics of human behaviors, or Human Dynamics and pinpointing applications in the real-world social systems. for short. ‘e WeChat was €rst released in 2017 with only few users, and then developed to a super huge and complex social net- KEYWORDS work with a billion users and tens of billions of social links. A‰er registering WeChat accounts, new users build social links to the WeChat; Social Dynamics; Human Dynamics; Network Dynamics; existed users, and recommend the WeChat app to his friends who Group Dynamics; Cascade Dynamics; have not registered yet. We call the dynamic phenomena during the evolving process of the WeChat social network as the Network 1 INTRODUCTION Dynamics. Forming social group is an intrinsic human nature. Over What is WeChat? Words fall short of describing whatDRAFT WeChat the network, people build all kinds of groups with speci€c topics. really is. Originally, it appeared as a communication tool. Step by New group members join the group and existed members quit the step, it has become the largest (online) social network in China, group, leading to the group evolution dynamics, namely, Group with 963 million monthly active users in the second quarter of Dynamics. Social medias right now are the major platforms for us 2017. ‘rough WeChat, people communicate with each other via get informed. ‘e information is produced, spreaded and consumed various messaging methods, such as text message, voice message, over the social network, taking on information cascading phenom- group chat, video call, etc. Now, WeChat has gone far beyond a ena. We call the dynamic phenomena of the information cascading communication tool; it comes to be a lifestyle for many. People process as the Cascade Dynamics. We summarize all these dynamic phnomena in the social sytems as Social Dynamics (as shown in Permission to make digital or hard copies of all or part of this work for personal or Fig. 1-Dynamics panel). classroom use is granted without fee provided that copies are not made or distributed for pro€t or commercial advantage and that copies bear this notice and the full citation Why do the social dynamics matter? ‘e phenomena in on the €rst page. Copyrights for components of this work owned by others than ACM WeChat, just like many laws of nature, are of dynamic origins. ‘e must be honored. Abstracting with credit is permi‹ed. To copy otherwise, or republish, studies on the dynamics of WeChat social system, to the best of to post on servers or to redistribute to lists, requires prior speci€c permission and/or a fee. Request permissions from [email protected]. our knowledge, provide the €rst-ever and the largest empirical Conference’17, Washington, DC, USA investigations on the real-world dynamics of the human behaviors, © 2016 ACM. 978-x-xxxx-xxxx-x/YY/MM...$15.00 DOI: 10.1145/nnnnnnn.nnnnnnn Social Dynamics

Human Network Group Cascade

Dynamics Dynamics Dynamics Dynamics Dynamics

Event Binned Network

Time Data … Data Data

Heterogeneous Linked Dynamic

Stretched- Complex Correlated Complex Power-law Complex External Exponential Multiscale Joint Bursts Join Quit Holding-time Distribution Periodicity Shocks Distribution Distribution Distribution Distribution

Stochastic Population External Complex Minor Early Stage Power Law Bursts Relaxation Power Law Densification Cascades Shocks Growth Dominance Dominance Phenomena Growth Growth …

(High-Order) Decision Multiscale Self Circadian Modulation Word-of- Correlation Drop Out Stickness Making Memory Excitement Rhythm Effect Mouth Effect

Average Exogenous Preferential Environment Fizzling Information Network Structure Temporal Effect Triggers Attachment Limit Effect Flow Effect Heterogeneity Heterogeneity

Mechanism … Microscopic Level ↑Interpretability Randomness↓ Macroscopic Level

Agent- Stochastic Stochastic Priority Point Survival Differential Time Series Based Differential RNN/LSTM

Model Process Queue Process Analysis Equation Model Model Equation …

Outlier Policy Understanding Fitting Generator Prediction Pattern Mining Generality

Detection Making Apps …

Figure 1: Research Framework of Modeling Dynamics of WeChat Social System. ‡e social dynamics are generated by the hu- man behaviors in the nature, including human dynamics, network dynamics, group dynamics and cascade dynamics, etc. ‡e digital records are sampling data of the social dynamics. We attempt to understand and model the social dynamics by un- derstading the phenomena reƒected in the data records. Possible mechanisms accouting for the phenomena are the pivot to understand the origin of social dynamics, to choose and design suitable models. ‡e models are also degined and used for speci€c applications. network evolution, group evolution and information cascading phe- include spreads of social products, provisioning, social implications nomena. For example, we validate a lot of previous knowledges, of policy changes, churn prediction, policy making, and so on. All ranging from social science, economy, statistical physics to com- these questions are social-dynamics-centric. puter science, on the social dynamic phenomena. Some of them However, to understand and to model the dynamics of WeChat are consitent with what we €nd in WeChat, while many of them social system are very challenging. ‘e chanllges lie in the follow- disobey the realities we observed. For example, does the spread of ing €ve folds: WeChat really follow Bass model with S-shaped curve [33]? What’s Data Challenge 1 . Lack of detailed traces of human dynamics, more, indeed, we €nd a lot of new phenomena in WeChat (such network dynamics, group dynamics and cascade dynamics pre- a complex and huge real-world social system), which areDRAFT not yet vents previous understanding of social dynamics. ‘e WeChat known in literature. Understandings of the real-world social sys- o‚ers possibely the largest and the most detailed records of vari- tems, including data structure, phenomena, mechanisms, modeling ous dynamic changes of a real-world social systems. As shown in techniques, applications, etc., are very valuable for us to further the Fig. 1-Data panel, we have the network data, the time of each understand the constantly changing complex social systems in the event, and the aggregated data in various social scenarios. However, real world. On the other hand, a lot of issues are of vital importance challenges coexist with opportunities. ‘e records generated in for the industry. Can we predict the behaviors of next online pur- WeChat, including the network data, behavior data and so on, are chase of each user in WeChat? Can we spot the abonormal groups indeed “big data”. Besides, these “big data” are indeed complex, through their dynamic behaviors, which may possibly recruit ter- ranging from structure data to temporal data, from individual data rorist members, or spread illegal informations? How many new at microscopic level to population data at macroscopic level, from users will WeChat (or Facebook, Twi‹er, etc.) have next month network evolution data, cha‹ing behavior data, group formation (company valuation)? What are the potential popularity of a new data to information ƒow data, covering di‚erent social scenarios. WeChat verion when being released? What is the right timing of Phenomena Challenge. Refering to the main framework illus- releasing next version of WeChat, or an advertisment? Can we pre- trated in Fig. 1, the social dynamics are generated by the human dict if an post will be popular or not? Can we apply our knowledge 1 All the data that we could access were behavior data with anonymized identities. No deveolped in WeChat to other social systems? And other usefulness content data can be reached. Strict privacy policies are followed. behaviors in the nature, the digital records are sampling data of science, to understand and interpret the possible mechanisms un- the social dynamics, and we a‹empt to understand and model the derlying the phenomena we observed and cared about. We explain social dynamics by understading the phenomena reƒected in the what these mechanisms are. Furthermore, we show the complex digital records. However, we observe quite complex phenomena, as “network” underlying our research framework in the four social enumerated in the Fig. 1-Phenomena panel, ranging from structure dynamic scenarios, including the relationships within these mecha- phenomena to temporal phenomena, from individual phenomena nisms, the relationships between phenomena and mechanisms, and at microscopic level to population phenomena at macroscopic level, the models which are suitable to capture the targeted phenomna from individual behavior phenomena, network evolution phenom- based fundamentally on these mechanisms (Figs. 2-4). ena, group evolution phenomena to information ƒow phenomena, We give a spectrum of modeling frameworks (methodology) covering di‚erent social scenarios. ‘e complex phenomena are which can be potentially used to capture dynamic phenomena in heterogeneous, linked and dynamic. di‚erent realms, as shown in the Fig. 1-Model panel, ranging from Mechanism Challenge. Underlying the complex phenomena agent-based model (used in physics and social science), stochas- are the complex mechanisms which governing human and social tic process (physics and math), queuing theory (math and engi- behaviors, as shown in the Fig. 1-Mechanism panel. We cannot em- neer), point process (math), survival analysis (statistics, epidemiol- phasize enough the importance of mechanism understanding. First, ogy), stochastic di‚erential equation (€nance), di‚erential equation to understand the mechanisms is the major road to uncover social (physics and engineer), timeseries model (engineer and computer dynamics and social systems. Second, to understand mechanisms science), deep learning like RNN and LSTM (computer science) is the vital pivot for us to choose and design models to capture, and other techniques developed in data mining and machine learn- predict and change the realities. However, to understand intrinsic ing €elds. We summarize this spetrum of modeling framwroks mechanisms underlying complex phenomena are very challenging. according to their abilities of capturing microscopic phenomena We are limited and biased by what we learn, what we see, what or macroscopic phenomena, stochastic phenomena or determin- tools we use, and how we interpret. istic phenomena, temporal phenomena or structure phenomena, Model Challenge. Faced with such complex data, phenomena, and their interpretability. Usually, as shown in the spetrum of mechanisms and applications (discussed later), how to choose suit- methodologies in the Fig. 1-Model panel, if we want to capture the able model framework and then design speci€c models are very microscopic and stochastic phenomena, possibly with network in- challenging. Is there a model which can capture and reproduce the formations and with good interpretability, we tend to seach models phenoemena ranging from structure dimension to temporal dimen- le‰ward. On the other hand, if we want to capture the macroscopic sion, from individual at microscopic level to population at macro- and population level phenomena, with less randomness, less in- scopic level, from individual behavior, network evolution, group terpretability, and less emphasis on the intrinsic complexity and evolution to information ƒow, covering di‚erent social scenarios, heterogeneity among the population, we tend to models rightward. and be applied to di‚erent applications? Possibly the ansewer is We show our choice of the methodology and model design in each NO. following section. Application Challenge. Actually in the “Why do the social We summarize the commonly asked questions and applications dynamics ma‹er” paragraph, we illustrate a lot of questions or as: understanding, €‹ing, generator (simulation), prediction, pat- applications the academia and the industry may concern about. tern mining, outlier detection, generality, policy making and so on. ‘ese applications are proposed by di‚erent people and di‚erent ‘e research framework we give, illustrated in Figs. 1-4, could teams who care about di‚erent social scenarios, ranging from hu- potentially guide future researches on social dynamics related man dynamics, network dynamics, group dynamics and cascades problems, sheding light on modeling and designing data structure, dynamics, from microscopic level to macroscopic level. How to explaining observed phenomena, inferring possible mechanisms, cluster and abstract each speci€c task into a higher level problem choosing and design dynamic models and pinpointing applications is of vital importance and challenge. Besides, di‚erent models are in the real-world social systems. usually used for di‚erent tasks. How to choose suitableDRAFT model framework for speci€c applications are challenging. Our Contributions. For the €rst time, we try to summarize our experiences in dealing with social dynamic phenomena, data and 2 HUMAN DYNAMICS related problems occurring in (large-scale, real-world, complex) ‘e human dynamics try to capture the dynamics of individual hu- social systems, especially in WeChat social systems, into a princi- man behaviors, focusing on the timing of actions and their intrinsic pled framework (Fig. 1). Four major aspects of social dynamics are driven factors. For example, when will you add a new friend? When discussed (Figs. 2-4), including human dynamics, network dynam- will you response to your frineds in a conversation or a group chat? ics, group dynamics, and cascade dynamics. We abstract the data When will you post an orginal tweet? When will you retweet a records in these four social dynamic scenarios into data structures. message from your friends? When will you start or response an A‰er that, we summarize the new phenomena we observed in ? When will you purchase another prop in an online game? WeChat data and other social systems. We explain our understand- By following the pa‹ern, i.e. when will sb do something, we can ing of each phenomenon, and investigate the complex relationships enumerate hundreds of “WHEN” questions in the WeChat ecosys- underlying these phenomena in di‚erent social dynamic scenarios. tem. Furthermore, we want to know “WHY” these behaviors occur, We try our best to leverage interdisciplinary knowledges, cov- namely, mechanisms. ‘us, the studies on human dynamics are the ering social science, statistical physics, economy and computer building block of all kinds of social dynamic phenomena, and we try Power Law and Bursty 풇(흉) 2.3 Mechanisms and Models Complex Multiscale 풇(흉) Human Event Correlated Joint Distribution 풇(흉 , … , 흉 ) Time … ퟏ 풏 Dynamics Data Neglected Mechanisms for Bursts ‘e human dynamics describe the individual behaviors at micro- Complex Periodicity scopic level, exhibiting large randomness. Besides, the modeling Stretched- Complex Correlated Power-law Complex Exponential Multiscale Joint Bursts Distribution Periodicity … of human dynamics possibly serves as the microscopic building Distribution Distribution Distribution

Phenomena block of other dynamics, and we’d be‹er give it a good interpreta- tion. ‘us, we prefer model frameworks towards the microscopic

(High-Order) Decision Multiscale Self Circadian Modulation Correlation direction as shown in Fig. 2. Making Memory Excitement Rhythm Effect … Effect (1) Complex multiscale distributions. ‘e physicists explain the Mechanism power law f (τ ) by the decision making process, and they

Stochastic Agent-Based Stochastic Priority Point Survival Differential Time Series model this process by priority queue from queueing thoery Differential RNN/LSTM Model Process Queue Process Analysis Equation Model Equation …

Model or stochastic process [27]. As for the complex multiscale Microscopic Level ↑Interpretability Randomness↓ Macroscopic Level distribution, however, there is no such a model in literature. We try to make it a user-friendly building block for the fol- Figure 2: Human Dynamics. Our research network under- lowing models in more complex scenarios, and we choose lying the human dynamics. Phenomena and correspond- the tools developed in the survival analysis [16] to model ing mechanisms and models are linked by arrow lines with the complex distribution [36]. By modeling the hazard same color. ‡e links in the phenomena panel illustrate the rate λ(τ ) which captures the multiscale memory of human behaviors [36], we get complex distributions according to progress of researches based on new phenomena and thus ∫ t − λ(s)ds new understandings. the formula f (τ ) = λ(τ )e −∞ , together with princi- to €nd and model the shared phenomena among above questions. pled ways to learn the parameters and simulation random We illustrate the “research network” of human dynamics in Fig. 2. samples developed in the hazard rate space. (2) Correlated joint distribution. ‘e correlated joint distribu- 2.1 Data tion of inter-event times are due to the high-order-temporal Usually, the data of the human dynamics are recorded as a sequence correlations, or self excitement of previous behaviors [32]. However, the survival analysis models are designed for of event time Hn = (t1, t2,..., tn−1, tn), where ti is the time of the th the i.i.d. random variables. ‘us, we resort to more gen- i event. Some equivalent transformations of Hn are: Í eral tools which can model the high-order-temporal cor- • Couting process N (t) := t ≤t 1(0,t](ti ), or a sequence i relations and at the same time encompass the results of (t1 : 1, t2 : 2,..., tn : n). complex multiscale distributions modeled by hazard rate. • Inter-event time (IET) sequence Tn(τ1,τ2,...,τn−1), where We choose point process, say Hawkes process [17] and its τi = ti+1 − ti ,i = 1,...,n − 1. variants [32], to model the correlated events. Detailed in 2.2 Phenomena the network dynamics section. (3) Complex periodicity. ‘e complex periodicity exhibits in Due to the intrinsic complexity of human behaviors, the understand- di‚erent time scale, from second, minute, hour, day, week ing and modeling of human dynamics starts from investigating the to month, etc. ‘ese complex periodicity are derived from statistical properties of inter-event time, i.e., the probability denstity the multiscale memory modulated by the circadian rhythm function f (τ ). ‘e seminar studies on human dynamics found that of human behaviors [29]. We follow the point process the i) inter-event time τ follows ii) power law distribution f (τ ) ∝ τ −α framework and model the periodicity by a mixture model , leading to iii) bursty behaviors [5, 27]. However, the reality exhibits with 7 components to capture weekly periodicity, each one much more complexites: of which captures the daily periodicity [29]. ‘e modeling (1) Complex multiscale distributions. We €nd the inter-event of weekly periodicity and daily periodicity strikes a bal- time distribution f (τ ) exhibits complex multiscaleDRAFT distri- ance between the simplicity of models and the accuracy bution rather than the pure power law distribution in vari- of modeling complex periodical dynamics at a reasonable ous datasets [32, 36]. Besides, the power-law distribution time scale. is indistinguishable from stretched-exponential (Weilbull) (4) Bursty f (τ ). Previous explaination of bursty phenomena distribution, or other heavy-tailed distributions [9]. is that the burst is derived from heavy-tailed f (τ ) only [5]. (2) Correlated joint distribution. Furthermore, we found that However, other mechanisms, including self-excitement, the inter-event times are correlated, i.e., f (τ ,τ + ) , f (τ )f (τ + ) i i 1 i i 1 (high-order) temporal correlation also lead to bursts. We or even high-order-temporal correlations , i.e., f (τ ,τ ) , i i+k discuss this in the network dynamics section. f (τi )f (τi+k ), beyond consecutive inter-event times [32], indicating that modeling of f (τ ) only is inadequate [5, 27]. ‘us we try to capture the Hn (N (t) or Tn) directly rather 3 NETWORK DYNAMICS then f (τ ) only. ‘e network dynamics are the dynamics of the evolving process of (3) Complex periodicity. We observe complex periodical changes social networks, at both individual level and population level. As of Tn [29]. For example. the dynamics of we sending or the largest social network in China, to study the network dynamics reading messages change periodically during a day or a of WeChat is of vital importance for the academic interests: How week. does the real-world social network look like? How does WeChat Individual Degree • e(t): the number of new social links in WeChat at time t; Network Event Binned Population Node/Link Rate Network Time Data Population Node/Link Cumulative Dynamics Data • ( ) Microscopic Simulation E t : the cumulative social links in WeChat user by time t, where E(t) = Í e(t);.

Stochastic External Population Complex Power Law Bursts Relaxation Power Law Densification Growth Shocks Growth Growth …

Phenomena 3.2 Phenomena

(High-Order) Average Multiscale Self Exogenous Preferential Fizzling Environment ‘e network dynamics exhibit large complexities at di‚erent level, Correlation Effect Excitement Triggers Attachment Effect Limit …

Memory Effect ranging from individual level to population level. Mechanism (1) Individual degree dynamics ki (t). • ( ) Stochastic Stochastic power law growth. We €nd ki t exhibits var- Agent-Based Stochastic Priority Point Survival Differential Time Series Differential RNN/LSTM Model Process Queue Process Analysis Equation Model …

Equation ious nonlinear growth pa‹erns at long term, including Model Microscopic Level ↑Interpretability Randomness↓ Macroscopic Level accelerating power-law growth, linear growth, decel- erating power-law growth, etc. [32]. For example, Figure 3: Network Dynamics. Our research network under- active users or even outlier users exhibit accelerating lying the network dynamics. Phenomena and correspond- growth of ki (t). Besides, the dynamics of individuals ing mechanisms and models are linked by arrow lines with exhibit large randomness. same color. • Bursts. ‘e ki (t) exhibits bursty growth (bursts) at grow over time? Do the network dynamics of WeChat follow or short term[32]. For example, in China, we exchange disobey previous assumptions? Besides, the studies on network WeChat accounts instead of exchanging printed name dynamics of WeChat are also very import to industrial concerns: cards. When we a‹end a conference, we may add a lot How many new users will WeChat (or Facebook, Twi‹er, etc.) have of new friends within a short period of time, leading next month? How many social links are there next year? How does to bursts in ki (t). However, the bursts are derived each user add friends over time? How many new users will use from complex mechanisms rather than f (τ ) only. WeChat when Tencent releases a latest WeChat verion? etc. All (2) Population rate n(t) . We €nd n(t) exhibits recurring pat- these question are crucial for the company valuation, spreads of terns of external shocks followed by a long period of relax- social products, provisioning, social implications of policy changes, ation dynamics [under review]: churn prediction etc. • External shocks. We €nd a lot of dynamic changes We marjorly care about following four questions on network are triggered by external shocks. For example, the dynamics of WeChat: release of a major version of WeChat may inccur large (1) How does the number of friends ki (t) of user i change over shocks of n(t). ‘e promotion acitivities like the digi- time? tal red packet during Chinese New Year also leads to (2) How does the number of new users n(t) change over time a large shock of n(t). ‘ese phenomena are very com- a‰er the release of a new WeChat version at t0? mon, because WeChat is updated for iPhone, Android, (3) How does the totoal number of users N (t) in WeChat grow Windows Phone, etc., almost every month. over time? • Relaxation. We also €nd a‰er the external shock is a (4) How does the total number of social links E(t) in WeChat long period of relaxation regime, which accounts for grow over time? the majority of the lifetime between consecutive two We see the problem (1) is at microscopic level, namely for each shocks. individual, while problem (2-4) are at macroscopic level, namely (3) Population cumulative number N (t) and E(t). By examing for population. ‘e intrinsic di‚erences between the individual the cumulative number of nodes N (t) over time, we focus dynamics and populations dynamics lie in the di‚erences in data on trend rather than randomness. forms, phenomena, mechanisms and models, as shownDRAFT in Fig. 3 • (Population) Power law growth. ‘e previous network model [6, 19] all assume uniform growth of nodes 3.1 Data N (t). And, the celebrated Bass model [4, 20] which For each user i, we know the records (i, j, ti, j ), j = 1,...,ki which is widely used to model the spreads of innovation, represent the user i add user j as his/her friend at time ti, j . Log- and the Susceptible-Infected (SI) model [2] which is ically, the social link in WeChat is bidirectional. ‘us, we de€ne used to model the epidemic spreading, all produce the history of the adding friend dynamics of user i as H(i,k) = Sigmoid (S-) curve with exponential early growth of (t1,..., tk−1, tk ). N (t). And none of them studied the dynamics of link • ‘e individual degree dynamics, namely the number of E(t). However, reality disagrees. We €nd power law Í friends of useri, is a counting processki (t) := j ≥1 1(0,t](tj ). early growth of both node N (t) and link E(t), indicat- On the other hand, we also care about the population dynamics, ing their scaling relationships over time, implying a recorded as the binned number over time t = 1,...,T with time rather long period of consistent growth [31, 33]. scale hour, day, week or quarter etc., including: • Complex growth paˆern. However, the power law • n(t): the number of new WeChat user at time t; growth may reach the inƒection point due to limited • N (t): the cumulative WeChat user by time t, where N (t) = potential population. And the scaling relationship Ín(t); is origined from the critical state which is seldom reached strictly. ‘us, the real growth pa‹ern are com- middle-scale and long-scale, generating intense plex, possibly taking on stretched long-term growth activities followed by a long vacation, namely, pa‹ern before reach the upper limit [31, 33]. bursts. However, due to ignoring the correla- • Densi€cation of Links. ‘e growth of links are largely tions between IETs, this is an i.i.d. f (τ ) bursty neglected previously. ‘e links built between the mechanism. already-existing nodes leading to densi€cation phe- – (High-order-temporal) Correlation e‚ect and Self- nomenon. Further, we €nd the E(t) scales with N (t) Excitement. Given a Poisson process exhibiting in WeChat, taking on E(t) ∝ N (t)1.41 [31, 33]. . non-bursty growth, if we rearrange the IET se- quence as follows: make small value IET fol- 3.3 Mechanisms and Models lowed by smaller IET, and make large value Here we try to €gure out what are the mechanisms governing IET followed by larger IET, and then we also the phenomena we observed, and what are the suitable model to get bursty behaviors. ‘us, the correlations be- capture these mechanisms. ‘e thingkings leading to the model tween IETs can also generate bursts. Studies are shown. on Hawkes process shed light on one possible (1) Individual degree dynamics ki (t). explanation of these (high-order-temporal) cor- • Average e‚ect and Stochastic power law growth. relation e‚ect : the self excitement, describing – Average e‚ect. Starting from the macroscopic cumulation of the endogenous self-excitement level model, we €nd the power law growth is of previous events. ‘e intensity function of ( |H ) Í ( − due to the average e‚ect, which describes the the Hawkes model is λ t t = µ + ti

€zzling e‚ect. the quit records Qg,m = {(t1,uQ1 ),..., (tm,uQm )}, where tj is the ( ) th In all, we model the totoal number of nodes N t and links time of the j user uQj who quits the group g. Similarly, we have ( ) d N (t) β ( )( − ( )) two counting process which desscribe the join process and quit E t by di‚erential equations dt = θ N t N N t 0 t dE(t) β ( )( ( ( ) − )γ − E(t) ) d N (t) process respectively: and = θ N t α N t 1 ( ) + 2 re- dt t N t dt • Holding time τi of user i, the time interval between i quit spectively to capture the network dynamics at population and join time, τi = tQ − tJ . level. Our node model uncovers various growth frame- i i Í • Join couting process J(t) := ≤ 1( , ](ti ), or a sequence work, including exponential growth, power law growtrh, ti t 0 t (t1 : 1, t2 : 2,..., tn : n). stretched-exponential growth, Sigmoid (Logistic) growth, Í • it couting process Q(t) := ≤ 1( , ](tj ), or a sequence log-logistic growth and stretched-logistic growth. Our link tj t 0 t (t : 1, t : 2,..., t : m). model captures the power law growth of links and densi€- 1 2 m cation power law. We further explain our population level 4.2 Phenomena model by agent-based simulation and survival analysis model [31, 33]. According to the data we have, we can decompose the group dynam- ics into join group dynamics and quit group dynamics. ‘e group 4 GROUP DYNAMICS dynamics capture the mesoscopic dynamics over the network, im- plying mixture characteristics of both microscopic dynamics and macroscopic dynamics. Speci€cally, we €nd the join group dynam- Group Join Dynamics ics exhibit [35]: Group Quit Dynamics Group Event Microscopic Simulation Time … Dynamics Data (1) Di‚usion join dynamics. ‘e existed group members invite outsiders to join the group, exhibiting di‚usion-like ( word-

External Complex Join Quit Holding-time of-mouth, or epidemic-like) join dynamics. Shocks …

Distribution (2) External-shock join dynamics. Besides, a bunch of outsiders Phenomena may join the group simultaneously due to some external

Word-of- Exogenous triggers, taking on shock-like dynamics. Drop-Out Stickness … Mouth Trigger (3) Complex quit dynamics. How members quit the group Mechanism is largely unknown. We €nd that the holding time (the

Stochastic time between join and quit the group) distributions for Agent-Based Stochastic Priority Point Survival Differential Time Series Differential RNN/LSTM Model Process Queue Process Analysis Equation Model …

Equation di‚erent group exhibit large complexity and heterogeneity, Model Microscopic Level ↑Interpretability Randomness↓ Macroscopic Level ranging from exponential-like distribution, power-law-like distribution and inbetween cases. Figure 4: Group Dynamics. Our research network under- lying the group dynamics. Phenomena and corresponding 4.3 Mechanisms and Models mechanisms and models are linked by arrow lines with We €nd the join process exhibits collective dynamics while quit same color. DRAFTprocess exhibits large randomness. Usually, a person joins a group ‘e group dynamics are the evolution dynamics of social groups, due to invitations, while quits a group largely by the decision of which are the aggregated dynamics of individual behaviors which himself. ‘us, we choose the di‚erential equation, survival analysis are associated with a speci€c group (say, family group, classmates to capture the join and quit group dynamics, and agent-based model group, bussiness group, work team group, interest group, alumni to simulate microscopic process as shown in Fig. 4. group, etc.) in WeChat. ‘e group members join and quit groups, (1) Di‚usion join dynamics. ‘e di‚usion dynamics are due to leading to group evolution dynamics. For example, we want to word-of-mouth, or epidemic-like process. We model the know how many members will this bussiness group have next collective dynamics by di‚erential equation inspired by week? How many members will quit the group next month? ‘e Susceptible-Infected (SI) model and Susceptible-Infected- study of group dynamics is of vital importance. First, forming social Recovered (SIR) model. As for the microscopic simulation, group is an intrinsic human nature. Second, the group dynamics we follow the agent-based model of SI and SIR [2]. may inƒuence human dynamics, network dynamics, and informa- (2) External-shock join dynamics. In WeChat, we have a lot of tion ƒow dynamics. ‘ird, groups have semantics. ‘e study of ways to join a group. A bunch of outsiders may join the business group dynamics can provide insight into the evolution group simultaneously due to some external triggers, like of organizational structure of economic behaviors; the study of joining a new built group in a conference by face-to-face abnormal group dynamics can detect outlier behaviors, say the function, or scaning an online QR-code to join the group, etc. We model these shocks by adding mixture of dirac is the root node, Par(ui ) is the user who passed the information to delta functions into the di‚erential equation model. ui , denoted as the “parent” user, and t1 ≤ t2 ≤ ... ≤ tm. ‘e cascade (3) Complex quit dynamics. ‘e most complex and interest- data structure contains both structure and dynamic informations ing phenomena are the quit dynamics. New members [34]: join group collectively, however, existed members quit • Cascade structure: Cs = {(u1, Par(u1)),..., (um, Par(um)}. the group largely due to their own decisions, leading to • Cascade dynamics: Cd = {t1, t2,..., tm }, with counting ( ) Í ( ) ( large randomness. Furthermore, we €nd the stickness of process Cd t := tj ≤tm 1(0,t] tj , or a sequence t1 : 1, t2 : staying in a group, indicating an existed member tends 2,..., tm : m). to stay in the group longer if he/she has already stayed 5.2 Phenomena there for a long time. In contrast, new members tend to ‘e dynamic spreading process of information ƒow over network quit the group quickly. We model this temporal stickness is a complex phenomenon: by survival analysis model and then incorporate it to the (1) Cascading Phenomenon. ‘e information a user may get is di‚erential equation model [35]. inƒuenced by his/her friends, and thus cascading phenom- ‘e combination of population dynamics and individual dynamics enon occurs naturally. are the key characteristics of group dynamics. ‘us, for the €rst (2) Minor dominance. Although a information cascades reaches time, we combine the di‚erential equation models and survival a lot of people, each one in the cascades contributes to the analysis models to capture mesoscopic group dynamics[35]. total cascade size signi€cantly di‚erent. It is intuitive that 5 CASCADE DYNAMICS an active user with thousands of friends can inƒuence more friends than that of an inactive user with several friends.

Mimic Cascading Process Cascade Event ‘e size of a cascade is usually dominated by few nodes, Networked Survival Analysis Network Time Dynamics Data taking on star-like structure. (3) Early-stage dominance. We further €nd the dominant nodes Minor Early Stage Cascades Dominance Dominance … are also prone to join cascades in early stage, taking on

bursty growth dynamics. Phenomena 5.3 Mechanisms and Models Information Network Multiscale Structure Temporal Flow Effect Memory Heterogeneity Heterogeneity … Predictive modeling on cascades has aroused considerable interests

Mechanism recently. Earlier works focus on predicting the €nal size based on feature engineering on content, behavior dynamics and structure Stochastic Agent-Based Stochastic Priority Point Survival Differential Time Series Differential RNN/LSTM Model Process Queue Process Analysis Equation Model [8], important nodes [11], and so on. More ambitiously, we a‹empt Equation … Model Microscopic Level ↑Interpretability Randomness↓ Macroscopic Level to predict the evolving process of a cascade. A fundamental way to address this problem is to look into the microscopic mechanisms of how cascades take place. Figure 5: Cascade Dynamics. Our research network under- (1) Decomposing cascading phenomenon. In order to capture lying the information cascade dynamics. Phenomena and the cascading phenomenon, we decompose the macro- cas- corresponding mechanisms and models are linked by arrow cade into multiple one-hop subcascades. We adopt the lines with same color. survival model to capture dynamics of each subcascade, ‘e cascade dynamics are the dynamic cascading process of in- and then sum up each subcascades over the real network formation over networks. For example, in WeChat or any other by an agent-based model. social networks, the information you received is from one of your (2) Dominance and heterogeneity. ‘e minor dominace and friend, and you want to share it to your other friends, taking on early-stage dominance are derived from structure hetero- information cascading phenomena, or cascades for short.DRAFT Cascades geneity and temporal heterogeneity respectively. In order are very important in various network scenarios, partly because to capture the temporal heterogeneity, we use Weibull haz- of their ubiquity, and partly because of their various applications, ard rate to capture the heterogeneous temporal dynamics ranging from viral marketing, epidemic prevention, to predicting with di‚erent roo users. However, the traditional hazard the spreads of information. In WeChat and other Tencent social sys- model su‚ers two drawbacks in cascade scenario: a) some tems, say , we focus on how to predict the cascading users have very sparse or even no data on their subcas- process, ‘at is, given the early stage of an information cascade, cades; (2) the hazard rates of connected nodes are also can we predict its cumulative cascade size at any time [30]? correlated. ‘us, we can not directly apply the survival model to the cascading phenomenon. To address these, we 5.1 Data leverage the knowledge in social science and data mining Given a network G = hN, Ei, where N is a set of nodes and E is a methods, introducing interpretable covariates of each user, set of directed/undirected relationships. A piece of information can say their structure, behavior, etc., features, to regularize be originated from one node and spread to its neighboring nodes. the hazard rate. We then propose a Networked Weibull A cascade is typically formed by repeating this process. ‘erefore, Regression model to capture above ideas [30]. a cascade can be represented as following data structure: C = We combine network-based simulation, survival analysis with reg- {(u1, Par(u1), t1), (u2, Par(u2), t2),..., (um, Par(um), tm)}, where u1 ularizers to capture cascading process. 6 APPLICATIONS [10] Riley Crane and Didier Sorne‹e. 2008. Robust dynamic classes revealed by measuring the response function of a social system. PNAS 105, 41 (2008), 15649– ‘e €‹ing, generator/simulation, and prediction applications are 15653. the three basic methods to validate the proposed models in all the [11] Peng Cui, Shifei Jin, Linyun Yu, Fei Wang, Wenwu Zhu, and Shiqiang Yang. scenarios. However, by following the general methods like time 2013. Cascading outbreak prediction in networks: a data-driven approach. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge series models at macroscopic level, it’s dicult for us to interpret, discovery and data mining. ACM, 901–909. understand and to do policy making by human decisions. Besides, [12] Angelos Dassios, Hongbiao Zhao, and others. 2013. Exact simulation of Hawkes process with exponentially decaying intensity. Electronic Communications in they are not suitable to model non-linear, stochastic, and bursty Probability 18, 62 (2013), 1–13. social dynamics. However, by following our frameworks in all four [13] Alceu Ferraz Costa, Yuto Yamaguchi, Agma Juci Machado Traina, Caetano social dynamic scenarios, pivoted by phenomena and mechanism Traina Jr, and Christos Faloutsos. 2015. Rsc: Mining and modeling temporal activity in social media. In SIGKDD’15. ACM. procedures, our models give quite good performance, and at the [14] Alan G Hawkes. 1971. Spectra of some self-exciting and mutually exciting point same time provide interpretations for various scenarios we consid- processes. Biometrika 58, 1 (1971), 83–90. ered. Besides, pa‹ern mining and outlier detection are conducted [15] Hang-Hyun Jo, Juan I Pero‹i, Kimmo Kaski, and Janos´ Kertesz.´ 2015. Correlated bursts and the role of memory range. Physical Review E (2015). in the parameter space with interpratable physical meanings [30– [16] David G Kleinbaum and Mitchel Klein. 2010. Survival analysis. Vol. 3. Springer. 33, 35, 36]. As for the generality, our dynamic models are applied [17] Patrick J Laub, ‘omas Taimre, and Philip K Polle‹. 2015. Hawkes processes. arXiv preprint arXiv:1507.02822 (2015). to a wide range of social systems [30–33, 35, 36]. [18] Jure Leskovec, Lars Backstrom, Ravi Kumar, and Andrew Tomkins. 2008. Micro- scopic evolution of social networks (KDD ’08). ACM, 462–470. 7 CONCLUSIONS [19] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. 2007. Graph evolution: Densi€cation and shrinking diameters. TKDD (2007). In this paper, we systematically summarize our experiences in deal- [20] Vijay Mahajan, Eitan Muller, and Frank M Bass. 1990. New product di‚usion ing with social dynamic phenomena, data, and related problems in models in marketing: A review and directions for research. Œe journal of marketing (1990), 1–26. WeChat social system, which is one of the largest, most complex [21] R Dean Malmgren, Daniel B Stou‚er, Adilson E Mo‹er, and Lu´ıs AN Amaral. social systems in the real world. A framework covering the pipeline 2008. A Poissonian explanation for heavy tails in e- communication. PNAS 105, 47 (2008), 18153–18158. of dealing with complex data, complex phenomena, complex mech- [22] Jorge J More.´ 1978. ‘e Levenberg-Marquardt algorithm: implementation and anisms, a spectrum of methodologies, and various applications is theory. In Numerical analysis. Springer, 105–116. proposed. By following our framework, we mainly discuss four [23] Vincenzo Nicosia, Petra E Vertes,´ William R Schafer, Vito Latora, and Edward T Bullmore. 2013. Phase transition in the economically modeled growth of a aspects of social dynamics, including human dynamics, network cellular nervous system. PNAS 110, 19 (2013), 7880–7885. dynamics, group dynamics and cascade dynamics. We abstract the [24] Tohru Ozaki. 1979. Maximum likelihood estimation of Hawkes’ self-exciting data structures from the digital logs, investigate the phenomena point processes. Annals of the Institute of Statistical Mathematics 31, 1 (1979), 145–155. we observed from the data, explain the possible mechanisms gov- [25] JA Peacock. 1983. Two-dimensional goodness-of-€t testing in astronomy. erning the phenomena, and give the models we choose and design Monthly Notices of the Royal Astronomical Society 202, 3 (1983), 615–627. [26] Pedro Olmo S Vaz de Melo, Christos Faloutsos, Renato Assunc¸ao,˜ and Antonio according to our methodology spectrum of modeling dynamics. We Loureiro. 2013. ‘e self-feeding process: a unifying model for communication illustrate the speci€c “research network” of our reseach frameworks dynamics in the web. In Proceedings of the 22nd international conference on World when applied to four dynamic scenarios. Typical applications are Wide Web. ACM, 1319–1330. [27] Alexei Vazquez,´ Joao Gama Oliveira, Zoltan´ Dezso,¨ Kwang-Il Goh, Imre Kondor, summarized and illustrated. ‘e framework could potentially guide and Albert-Laszl´ o´ Barabasi.´ 2006. Modeling bursts and heavy tails in human future researches on social dynamics related problems, sheding dynamics. Physical Review E 73, 3 (2006), 036127. light on modeling and designing data structures, explaining ob- [28] Ye Wu, Changsong Zhou, Jinghua Xiao, Jurgen¨ Kurths, and Hans Joachim Schellnhuber. 2010. Evidence for a bimodal distribution in human commu- served phenomena, inferring possible mechanisms, choosing and nication. PNAS 107, 44 (2010), 18803–18808. designing dynamic models, and pinpointing applications in the [29] Linyun Yu, Peng Cui, Chaoming Song, Tianyang Zhang, and Shiqiang Yang. 2017. A Temporally Heterogeneous Survival Framework with Application to real-world social systems. Social Behavior Dynamics. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1295–1304. REFERENCES [30] Linyun Yu, Peng Cui, Fei Wang, Chaoming Song, and Shiqiang Yang. 2015. From [1] Rodrigo Augusto da Silva Alves, Renato Martins Assuncao, and Pedro Olmo Stan- Micro to Macro: Uncovering and Predicting Information Cascading Process with cioli Vaz de Melo. 2016. Burstiness Scale: A Parsimonious Model forDRAFT Character- Behavioral Dynamics. In IEEE International Conference on Data Mining, ICDM. izing Random Series of Events. In Proceedings of the 22nd ACM SIGKDD. ACM, [31] Chengxi Zang, Peng Cui, and Christos Faloutsos. 2016. Beyond Sigmoids: ‘e 1405–1414. NetTide Model for Social Network Growth, and Its Applications. In Proceedings [2] Roy M Anderson, Robert M May, and B Anderson. 1992. Infectious diseases of of the 22Nd ACM SIGKDD (KDD ’16). ACM, 2015–2024. humans: dynamics and control. Vol. 28. Wiley Online Library. [32] Chengxi Zang, Peng Cui, Christos Faloutsos, and Wenwu Zhu. 2017. Long [3] Valerio Arnaboldi, Marco Conti, Andrea Passarella, and Robin Dunbar. 2013. Short Memory Process: Modeling Growth Dynamics of Microscopic Social Con- Dynamics of personal social relationships in online social networks: a study on nectivity. In Proceedings of the 23rd ACM SIGKDD International Conference on twi‹er. In Proceedings of the €rst ACM conference on Online social networks. ACM, Knowledge Discovery and Data Mining. ACM, 565–574. 15–26. [33] C. Zang, P. Cui, C. Faloutsos, and W. Zhu. 2018. On Power Law Growth of Social [4] Robert B Banks. 1994. Growth and di‚usion phenomena: mathematical frameworks Networks. IEEE Transactions on Knowledge and Data Engineering PP, 99 (2018), and applications. Vol. 14. Springer Science & Business Media. 1–1. DOI:h‹p://dx.doi.org/10.1109/TKDE.2018.2801844 [5] Albert-Laszlo Barabasi. 2005. ‘e origin of bursts and heavy tails in human [34] Chengxi Zang, Peng Cui, Chaoming Song, Christos Faloutsos, and Wenwu Zhu. dynamics. Nature 435, 7039 (2005), 207–211. 2017. antifying Structural Pa‹erns of Information Cascades. In Proceedings of [6] Albert-Laszl´ o´ Barabasi´ and Reka´ Albert. 1999. Emergence of scaling in random the 26th International Conference on WWW Companion. 867–868. networks. science 286, 5439 (1999), 509–512. [35] Tianyang Zhang, Peng Cui, Christos Faloutsos, Yunfei Lu, Hao Ye, Wenwu Zhu, [7] Ginestra Bianconi and A-L Barabasi.´ 2001. Competition and multiscaling in and Shiqiang Yang. 2016. Come-and-go pa‹erns of group evolution: A dynamic evolving networks. EPL (Europhysics Leˆers) 54, 4 (2001), 436. model. In Proceedings of the 22nd ACM SIGKDD International Conference on [8] Justin Cheng, Lada Adamic, P Alex Dow, Jon Michael Kleinberg, and Jure Knowledge Discovery and Data Mining. ACM, 1355–1364. Leskovec. 2014. Can cascades be predicted?. In Proceedings of the 23rd interna- [36] Tianyang Zhang, Peng Cui, Chaoming Song, Wenwu Zhu, and Shiqiang Yang. tional conference on World wide web. ACM, 925–936. 2016. A multiscale survival process for modeling human activity pa‹erns. PloS [9] Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. 2009. Power-law one 11, 3 (2016), e0151473. distributions in empirical data. SIAM review 51, 4 (2009), 661–703.