Interval Signature: Persistence and Distinctiveness of Inter-Event Time Distributions in Online Human Behavior
Total Page:16
File Type:pdf, Size:1020Kb
Interval Signature: Persistence and Distinctiveness of Inter-event Time Distributions in Online Human Behavior Jiwan Jeong Sue Moon School of Computing School of Computing KAIST KAIST [email protected] [email protected] ABSTRACT all individuals in order to explicate or model the human be- Interval patterns, or inter-event time distributions that oc- havior [3, 13, 17, 25, 26, 36, 41]. On the contrary, we take cur in human activity, have long been an interest of many another angle and examine the diversity in individual distri- researchers studying human dynamics. While previous stud- butions of inter-event times, which we call interval pattern in ies have mostly focused on characterizing the aggregated this work. How does an individual’s interval pattern change inter-arrival patterns or finding universal patterns across all over time? Does it remain consistent or fluctuate from time individuals, we focus on the diversity among the patterns of to time? If the pattern is persistent, is it distinctive enough di↵erent individuals; the goal of this paper is to understand to characterize one from the others? how persistent an individual’s interval pattern is and how In this work we use four online datasets—Wikipedia edit distinctive it is from those of the others. We use Wikipedia, history, me2DAY, Twitter, and Enron email—to study the me2DAY, Twitter, and Enron email data to study the inter- interval patterns of users across di↵erent platforms. In par- val patterns of online human behavior. Our analysis reveals ticular, we have the entire history of Wikipedia and me2DAY, that individuals have robust and unique interval signatures. which enables us to study longitudinal changes over the years The interval pattern of a user tends to persist over years, in individuals’ interval patterns. even after coming back from a long hiatus of inactivity, Our analysis reveals that individuals have persistent and despite considerable change in circadian rhythms. Further- distinct interval patterns in all four platforms. For a given more, the interval patterns of individuals are highly distinct user, the interval pattern persists over years, even after com- from that of others. We put our new findings in practical ing back from a long hiatus of inactivity or despite consid- use of identifying users. erable change in circadian rhythms. Even more, the interval pattern of an individual is highly distinct from those of oth- ers. Also, we show that abrupt changes in interval patterns Keywords are mostly due to instant bursts rather than changes in per- Inter-event time; human behavior; personal characteristic; sisting trends. This result implies that the interval pattern behavioral signature; interval signature; is a coherent personal characteristic. Based on our findings, we use interval patterns in user 1. INTRODUCTION identification and linking an individual’s accounts. Our pro- Vibrant human dynamics manifests in rich patterns and totype implementation demonstrates that the interval pat- leads to complex inter-event times of individual human ac- tern is a good behavior feature distinguishing one from the tions. We access numerous online services of this 21st cen- others. In this era of digital footprints, our findings from tury countless times a day and leave traces of our activities this work have far-reaching consequences. We show that an online. Let us take Facebook as an example. An individ- interval pattern is representative of a person, distinguishing ual may update Facebook status very frequently, say, every one from the rest, and adds a new dimension in capturing other minute, or only daily and weekly, or only after a long one’s personality or preference. hiatus of a few months. If we capture the inter-event time of a user’s activities in a distribution, what does the distri- 2. DEFINING TERMS WITH SAMPLES bution tell us? Figure 1 displays Twitter timelines and the longitudinal Previous studies have mostly focused on analyzing aggre- inter-tweeting time distributions for three famous computa- gated inter-arrival patterns from the perspective of a service tional social scientists, Lada Adamic, Albert-Laszlo Barabasi, provider [1, 2, 10, 20, 42], or finding universal patterns across and Nicolas Christakis. In addition, on our project web page1 we uploaded more samples from many other researchers and 10 celebrities who actually run their own Twitter accounts c 2017 International World Wide Web Conference Committee (IW3C2), [11]. In this section, we introduce the terms we use through- published under Creative Commons CC BY 4.0 License. out this paper, using these samples. WWW 2017 Companion, April 3–7, 2017, Perth, Australia. ACM 978-1-4503-4913-0/17/04. 2.1 Intervals, Windows, and the Patterns http://dx.doi.org/10.1145/3041021.3051115 An interval or inter-event time, ⌧, is the time gap between two consecutive actions by the same individual on a single 1 . https://j1wan.github.io/interval-signature/ 1585 (a) W 1 W 2 W 3 W 4 2007−03−26 2009−07−16 2011−05−06 2013−02−02 2016−08−14 Tweeting Time Index 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.2 Density 0.1 Density 0.1 Density 0.1 Density 0.1 Density 0.0 0.0 0.0 0.0 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr log(intervals, 2) log(intervals, 2) Inter−tweeting Time log(intervals, 2) log(intervals, 2) (b) W 1 W 2 W 3 W 4 2009−07−13 2010−12−27 2011−12−24 2012−09−27 2016−09−27 Tweeting Time Index 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.2 Density 0.1 Density 0.1 Density 0.1 Density 0.1 Density 0.0 0.0 0.0 0.0 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr log(intervals, 2) log(intervals, 2) Inter−tweeting Time log(intervals, 2) log(intervals, 2) (c) W 1 W 2 W 3 W 4 2014−11−05 2015−03−12 2015−07−05 2016−01−23 2016−10−08 Tweeting Time Index 0.3 0.3 0.3 0.3 0.2 0.2 0.2 0.2 Density 0.1 Density 0.1 Density 0.1 Density 0.1 Density 0.0 0.0 0.0 0.0 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr 1sec 1min 1hr 1d 1w 1yr log(intervals, 2) log(intervals, 2) Inter−tweeting Time log(intervals, 2) log(intervals, 2) Figure 1: Longitudinal changes in tweeting interval patterns for famous computational social scientists: (a) Lada Adamic, (b) Albert-Laszlo Barabasi, and (c) Nicolas Christakis. online service. In timelines in Figure 1, each vertical line Gaussian kernel density estimation with Sheather and Jones’ represents when the tweet was created, and a length between bandwidth selector [15, 31]. (See Appendix A.1 for detail.) two neighboring vertical lines represents an interval. With providing evidence that each individual’s interval pat- A window is a time period containing a set of consecutive tern is persistent and distinctive, we will call the shape as intervals. We refer to the number of intervals in a window as interval signature. the size and the time duration as the length. For example, in Figure 1, we set each window, from W1 to W4, to have 2.2 Distances between Patterns and Users (n 1)/4 intervals where n is the total number of tweets b − c A noteworthy observation in the samples is the consis- written by each user. Their lengths in time di↵er from one tency in each individual’s interval patterns. Throughout years another. In the sample figures, we split the windows to have of time, the shape of interval patterns of a single user re- an equal number of intervals rather than an equal length in mains consistent over windows, while the shapes are quite time. That is, we fix the size of the windows constant. If we di↵erent from each other. Then, how can we quantify the set the windows to be of fixed length in time, some windows self-consistency and cross-individual diversity? would contain zero or only a few intervals during the period We begin by measuring the distance between two interval of low activity. We want all the windows to have a sufficient patterns. Given any two windows, we calculate the distance number of samples. between those interval patterns using Jensen-Shannon dis- The interval pattern of a window is its inter-event time tance [7, 19, 21]. (See Appendix A.2 for detail.) distribution in log-scale. We transform the discrete time in- Then, we call the distance between a user’s two di↵erent tervals to a continuous probability density function using windows as self-distance, dself. The smaller the self-distance 1586 is, the more consistent the individual’s interval pattern is. For our previous work [12], we cleaned the me2DAY data 4 In Figure 1, each user has 4 windows, thus we have 2 =6 and labeled spam-suspicious accounts analyzing the con- self-distances for each user; the average self-distance, dself , tents. We excluded these accounts from our main analysis. is 0.13 0.02, 0.11 0.03, and 0.12 0.04 for Adamic,h i Barabasi,± and Christakis,± respectively. For± the comparison, 3.3 Twitter we refer the distance between di↵erent users’ windows as Twitter, with its openness, has become probably the most reference distance, dref.