
Social ties and checkin sites: Connections and latent structures in Location Based Social Networks

Sudhir B Kylasa, Elec. and Comp. Engg. Dept, Purdue University, West Lafayette, Indiana 47907
Giorgos Kollias, IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
Ananth Grama, Computer Science Dept, Purdue University, West Lafayette, Indiana 47907

Abstract: Location Based Social Networks (LBSNs) integrate location-based facilities with social connectivity for delivering a variety of services, enhancing user experience, emergency/disaster management, and streamlining business processes. A number of recent research efforts have studied relationships between geolocation and social connectivity, social connectivity and preferences, and node attributes and strength of social ties. These efforts have successfully demonstrated prediction of various attributes based on social connectivity, mobility, dynamic checkin information etc., including prediction of user location as well as future checkin locations.

In this paper, we study the relationship between shared checkin locations and the structure and nature of social ties. We argue that typical LBSNs are in fact composed of layers of networks of varying structure and function, and that it is possible to deconvolve these networks through effective statistical analysis of shared checkins. In this context, we pose and validate the following hypotheses: (i) a large number of shared checkins implies social connectivity; however, social connectivity does not imply a statistically large number of shared checkins; (ii) entities in social ties that share a large number of checkins tend to be strongly clustered. We hypothesize that such strong ties (for example, family ties, friendships etc.) carry higher influence compared to weaker ties (mere acquaintances) in the social network; and (iii) social ties that have statistically fewer shared checkins (weak ties) tend to be less clustered than the underlying (baseline) network. We hypothesize that such ties (for example professional ties, friends of friends, acquaintances etc.) carry less influence. We present statistical models and validate our hypotheses on real datasets. Our conclusions can significantly enhance flow of information and influence in the network by suitably leveraging the distinct relationships captured in the deconvolved networks.

Keywords: Location Based Social Networks, Friends network, Social connectivity, Probabilistic estimation of social connectivity

I. INTRODUCTION

Location Based Social Networks (LBSNs) integrate geospatial data with social connectivity to enable users to register location information, time tags, and share preferences. The rich data model of LBSNs can be abstracted into a number of disparate views: a mapping of users to locations, locations to events and associated times, and users to other users. Interdependencies between these views, either commonalities (base information) or divergence (true relation-specific content), reveal interesting and important latent structures in the network. These structures can be leveraged for enhancing user experience, as well as optimizing flow of information and influence.

In this paper, we present a statistical model and detailed analysis, using real data, of the interplay between individual user attributes (checkin location information) and social ties. In particular, we examine the extent to which shared checkins are indicative of social connections and vice versa. We argue that typical LBSNs are composed of multi-tiered networks that display markedly distinct structural and functional properties. We investigate the use of shared checkin information for deconvolving the network, and analyze the properties of these deconvolved networks in detail. Typically, a social network can be viewed as a superposition of different sub-networks with statistically different properties (clustering coefficient, etc.). We show how one can identify constituent sub-networks, and how properties of these sub-networks can be leveraged for specific network functions.

Using a Bayesian approach, we first assess checkin location sharing as a social connectivity predictor. Based on real datasets, we show that a social tie is probabilistically implied by large shared checkin counts. However, the inverse is not true: social connectivity does not probabilistically imply geolocation similarity. We also demonstrate how social connectivity can be augmented by shared checkin location data to deconvolve networks. We define a set of discrete intervals on the number of shared checkins. We use these intervals to deconvolve the base LBSN into separate layers. We argue that these layers represent networks of different strength of social ties. Our argument is based on a sequence of interesting observations regarding the network layers. We demonstrate that the network layers corresponding to high shared checkin counts show a high degree of triadic closure (clustering coefficient). This suggests strong social ties in these layers. In contrast, network layers corresponding to low shared checkins tend to be sparser, with lower degrees of triadic closure than the base networks. This suggests that these network layers code weak ties: acquaintances, friends of friends, etc.

Deconvolution of the LBSN in this manner has significant utility. Network layers corresponding to strong ties carry information and influence more efficiently than lower network layers. This enables targeted information dissemination, leading to higher network utilization. Our work provides a general framework within which other node attributes (other than checkin information) can be used to generalize from node to pairwise to (sub)network structure and function.

The rest of the paper is organized as follows: Section II initiates our discussion with an overview of related research. Section III presents definitions and notation used in the rest of the paper. Section IV formalizes the main propositions put forth in the paper. Section V presents detailed analyses of real datasets to validate our propositions. Section VI draws conclusions and outlines avenues for future research.

The authors would like to acknowledge the US National Science Foundation Grants CSR 1422338 and OIA 0939370.

II. RELATED WORK

Current research on LBSNs can be broadly classified into the following areas: (i) analyses of spatial properties of social networks, (ii) inference and prediction of user attributes from social context (social ties, geotagged photos, etc.), (iii) inference and prediction of attributes of social ties, such as mobility patterns, distance etc., and (iv) inference and prediction of social ties and future checkins based on temporal aspects of social networks (time of checkins, status updates, mobility patterns etc.). We briefly summarize results in each of these categories and put our results in context.

A large body of research [1]–[8] argues that distance is a controlling factor in a social network, and exhibits a power-law relationship with different exponents. Scellato et al. [9] study social-spatial properties of networks and propose a statistical model for the heterogeneity of social triads as a function of distance and probability of a link in a social triad. Other researchers [8], [10]–[12] conclude that social ties in highly connected groups tend to span shorter distances, and are more probable, compared to their long range counterparts. Kaltenbrunner et al. [8] study the effect of geographic distance on online social interactions, and conclude that spatial proximity greatly impacts formation of social links; however, once formed, other factors determine how messages are exchanged on these links. Pelechrinis and Krishnamurthy [13] analyze the nature of checkin locations (venues) in the context of social ties, and how distance affects the behavior of social ties.

A number of efforts [6], [14], [15] focus on prediction of user location solely based on the information readily available from the underlying social network. Jurgens [16] assigns home location to users based on a multi-variate median distribution, using ground truth of his/her social ties. Jahanbakhsh et al. [1] use a power-law distribution for predicting social ties, and clustering to assign home locations. Other efforts [17], [18] employ probabilistic models to assign values to attributes in user profiles. Spatio-temporal mining algorithms and analysis of status updates, in addition to geographic and economic factors, are used to study unobserved context between people and locations [19], [20].

Wang et al. [21] use human mobility patterns, their proximity in social networks, and their correlation to predict social ties. Hu et al. [22] propose an approach for detecting geographic communities in mobile social networks. This approach relies on spatial proximity and community structure in mobile social networks. Scellato et al. [23] analyze biodiversity of place as a factor in link-forging between two users, along with common checkins, within two-hop or n-hop networks. They formulate a supervised learning approach for link prediction as a binary classification problem.

Gu et al. [2] present analysis based on a combination of geo-sensitive textual features for improved text-based location estimation. Cho et al. [3] explore human geographic movement in relation to social ties to analyze future checkins of a user and effects of distance between users on future checkins in a typical social network. Noulas et al. [4] analyze checkin dynamics to study the spatio-temporal patterns of user mobility. In particular, they use temporal checkin information to deduce user mobility patterns for a recommender system to enrich user experience. Chang et al. [24] present a model for predicting future checkins based on past checkins, time of checkin, and user demographics.

Chang et al. [24] show that an increasing number of shared checkins results in increasing friendship probabilities. However, the relationship between social ties and number of shared checkins is not investigated. Pelechrinis and Krishnamurthy [13], using affiliation networks, draw similar conclusions as well. Their primary contribution is the interplay between the nature of a checkin location and social ties. They show that checkin locations have higher clustering coefficients among friends, when compared to non-friends. Focal closure, which considers pairs of users and their social ties, as opposed to social closure, is used extensively in drawing these conclusions. In our research, social closures, in which social ties between a set of users are considered, play a pivotal role in defining characteristics of a social subgraph.

Our focus in this work differs from prior approaches in that we analyze the impact of node (checkin) information on the aggregate structure and function of the network. We show that there is a strong link between checkin information and strength of ties, and that this link can be used to identify latent structures in networks.

III. NOTATION AND DEFINITIONS

We initiate our analysis by defining two related networks over the set of users $U$ and the set of checkin locations $C$:

• The social (friends) graph $G_F = (V_F, E_F)$, with vertices in $V_F = U$, in which an (undirected) edge $\langle u_i, u_j \rangle \in E_F$ connects two users $u_i, u_j$ iff they are socially related (i.e. friends): a user-user network. Setting $|U| = n$, its adjacency matrix $A_F$ is an $n \times n$ matrix with $A_F[i, j] = 1$ iff $u_i, u_j$ are friends, and zero otherwise. The matrix is symmetric, with $|E_F| = f$ nonzero entries in its upper triangular part.

• The checkin graph $G_C = ((V_{C_1}, V_{C_2}), E_C)$, which is a bipartite graph with edges $\langle u_i, l_j \rangle \in E_C$ connecting a user $u_i \in U (= V_{C_1})$ with any of its checkin locations $l_j \in C (= V_{C_2})$: a user-location network. Setting $|C| = m$, its adjacency matrix $A_C$ is an $n \times m$ matrix with $A_C[i, j] = 1$ iff $u_i$ has checked in at $l_j$, and zero otherwise.

The following additional notation is used in the analyses:

• $F$: the event that two users are friends
• $C_k$: the event that two users share $k$ ($k = 0, 1, 2, \ldots$) checkin locations
• $Pr(C_k)$: the probability that two users share $k$ checkin locations
• $Pr(C_k|F)$: the probability that two users share $k$ checkin locations given that they are friends
• $Pr(F)$: the probability that two users are friends
• $Pr(F|C_k)$: the probability that two users are friends, given that they share $k$ checkin locations

For a fixed ordering of users in $A_F$ and $A_C$, we compute the matrix product $M_C = A_C A_C^T$. It follows that $M_C(i, j) = k$ iff users $u_i$ and $u_j$ share $k$ checkin locations. $Pr(C_k)$ is then the number of entries with $M_C(i, j) = k$ (e.g., restricting to $j > i$), normalized over $\frac{n(n-1)}{2}$. In a related fashion, to compute $Pr(C_k|F)$ we can restrict the number of entries with $M_C(i, j) = k$ to only friend-user pairs, i.e., for $\langle u_i, u_j \rangle \in E_F$ (e.g., for $j > i$), and also normalize over $f$. Furthermore, $Pr(F) = f / \frac{n(n-1)}{2}$.

[Figure 2: plots of $Pr(C_k|F)$ vs. number of shared checkins; panels (a) Brightkite Dataset, (b) Gowalla Dataset, (c) Yelp Dataset]

Fig. 2: Probability of $k$ shared checkin locations given social connectivity; the first 30 non-vanishing conditional probability terms $Pr(C_k|F)$ are shown in these plots for brevity.

[Figure 3: plots of $Pr(F|C_k)$ vs. number of shared checkins; panels (a) Brightkite Dataset, (b) Gowalla Dataset, (c) Yelp Dataset]

Fig. 3: Probability of social connectivity given $k$ shared checkin locations; the first 30 non-vanishing conditional probability terms $Pr(F|C_k)$ are shown in these plots for brevity.

IV. ANALYZING SOCIAL CONNECTIVITY AND SHARED CHECKINS

We now discuss a number of hypothesized behaviors in LBSNs, put forth as a set of propositions; we defer their experimental validation to the next section.

We address a number of questions in this context: How does the checkin graph $G_C$ relate to the friends graph $G_F$? In particular, how does checkin location sharing (randomly selected users checking in at the same location) affect their social connectivity? Conversely, do social ties between users imply statistically significant checkin location overlaps?

Users in a network exhibit varying degrees of social interaction, leading to varying numbers of shared checkin locations among themselves. One of the basic principles governing relationships in typical social networks suggests that we form social bonds (connections) with individuals similar to us [25]. Socialization and social influence play pivotal roles in shaping the structure of social networks. The influence of social connections on user behavior (in our case, checkin information sharing) can be understood on the basis of these basic principles [25]. A typical social tie would encourage checkins to locations of mutual interest, for example a friend recommending places of common interest. Alternately, checkins can be motivated by common activities constrained over a geographic region, for example participating in a group activity, common places of employment etc.

Conversely, social ties are not implied in cases where users share low to moderate numbers of checkins. It could be the case that people share checkins simply because they live in the same area. Consequently, their checkin options would then be shared by definition, without implying a social connection per se. However, by intuition, the distribution $Pr(F|C_k)$ should also be skewed towards larger values of $k$; social ties would naturally account for excessive checkin location sharing between users.
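The probabilities $Pr(F)$, $Pr(C_k)$, $Pr(C_k|F)$ and $Pr(F|C_k)$ discussed in this section can be estimated by direct counting over user pairs. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `shared_checkin_probabilities` and its input layout (a dict of per-user location sets and a set of friend pairs) are hypothetical, it uses only the standard library, and it enumerates all pairs, which is only practical for small inputs. The size of the intersection of two users' location sets plays the role of the entry $M_C(i, j)$ of $A_C A_C^T$.

```python
from collections import Counter
from itertools import combinations


def shared_checkin_probabilities(checkins, friends):
    """Estimate Pr(F), Pr(C_k), Pr(C_k|F) and Pr(F|C_k) by counting pairs.

    checkins: dict mapping user -> set of checkin locations (hypothetical layout)
    friends:  set of frozenset({u, v}) pairs; every pair is assumed to consist
              of two distinct users that appear in `checkins`
    Assumes at least two users and at least one friend pair.
    """
    users = sorted(checkins)
    n = len(users)
    n_pairs = n * (n - 1) // 2          # number of unordered user pairs
    f = len(friends)                    # number of social edges

    all_counts = Counter()              # k -> user pairs sharing k locations
    friend_counts = Counter()           # k -> friend pairs sharing k locations
    for u, v in combinations(users, 2):
        k = len(checkins[u] & checkins[v])   # plays the role of M_C(i, j)
        all_counts[k] += 1
        if frozenset((u, v)) in friends:
            friend_counts[k] += 1

    pr_f = f / n_pairs
    pr_ck = {k: c / n_pairs for k, c in all_counts.items()}
    pr_ck_given_f = {k: c / f for k, c in friend_counts.items()}
    pr_f_given_ck = {k: friend_counts[k] / all_counts[k] for k in all_counts}
    return pr_f, pr_ck, pr_ck_given_f, pr_f_given_ck
```

For every $k$ observed among friend pairs, the returned values satisfy Bayes rule, $Pr(F|C_k) = Pr(C_k|F)\,Pr(F)/Pr(C_k)$, which gives a simple consistency check on the counts. For datasets of the size used in the paper, a sparse matrix product would presumably replace the explicit pair loop.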

[Figure 4: histograms of number of users vs. clustering coefficient; panels (a) Zero shared checkins, (b) Between (and including) 1 and 5 shared checkins, (c) More than 5 shared checkins]

Fig. 4: Clustering coefficient of Brightkite dataset. Dotted line indicates the mean clustering coefficient for each case.
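The three panels correspond to layers of the friends graph obtained by binning each social tie by its number of shared checkins (zero, 1 to 5, more than 5). A minimal sketch of that binning step follows; the function name `deconvolve`, the `bounds` parameter and the input layout are hypothetical and chosen for illustration only, not taken from the authors' code.

```python
def deconvolve(friend_edges, checkins, bounds=(0, 5)):
    """Split social ties into layers by their number of shared checkin locations.

    friend_edges: iterable of (u, v) social ties
    checkins:     dict mapping user -> set of checkin locations
    bounds:       increasing upper bounds of the intervals; the default (0, 5)
                  yields three layers: k == 0, 1 <= k <= 5, and k > 5
    Returns a list of edge lists, one per interval, in increasing order of k.
    """
    layers = [[] for _ in range(len(bounds) + 1)]
    for u, v in friend_edges:
        # Users without recorded checkins are treated as sharing nothing.
        k = len(checkins.get(u, set()) & checkins.get(v, set()))
        # Number of bounds exceeded by k = index of the interval containing k.
        layer_index = sum(k > b for b in bounds)
        layers[layer_index].append((u, v))
    return layers
```

Each returned edge list can then be treated as a separate subgraph whose clustering coefficient distribution is analyzed on its own.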

[Figure 5: histograms of number of users vs. clustering coefficient; panels (a) Zero shared checkins, (b) Between (and including) 1 and 5 shared checkins, (c) More than 5 shared checkins]

Fig. 5: Clustering coefficient of Gowalla dataset. Dotted line indicates the mean clustering coefficient for each case.

We summarize these observations in the following proposition:

Proposition 1 $Pr(C_k|F)$ may not be high for nodes that share a large number of checkin locations $k$. $Pr(F|C_k)$ is high for pairs of users that share a large number of checkin locations $k$.

Bayes rule connects these conditional probabilities as follows:

$$Pr(F|C_k) = \frac{Pr(C_k|F)}{Pr(C_k)} \, Pr(F) \qquad (1)$$

We observe that two friends are more likely to share more than a few checkin locations (homophily in play) than two randomly selected users. In other words, $Pr(C_k|F) > Pr(C_k)$. So their quotient $\frac{Pr(C_k|F)}{Pr(C_k)} > 1$ can be interpreted as the factor by which checkin sharing boosts friendship, i.e., it scales $Pr(F)$ to produce $Pr(F|C_k)$ in Bayes rule. Furthermore, $Pr(F)$ is a constant relative to $k$, so the functional dependence of $Pr(F|C_k)$ and of the boosting factor $\frac{Pr(C_k|F)}{Pr(C_k)}$ on $k$ are identical. On the other hand, $Pr(C_k)$ is expected to decrease for sufficiently large $k$; $Pr(C_k)$ is shaped by the distribution of user degrees in $G_C$ and the geographic extent of their checkin targets, both naturally upper bounded. This qualitatively explains the limiting behavior of $Pr(F|C_k)$ and $Pr(C_k|F)$ for large $k$.

Triadic closure states that if two people have a common friend, then there is an increased likelihood that they will become friends themselves at some point in the future [25]. Triadic closure draws on opportunity, incentive, and trust. Two people with a common friend may have more opportunities for social interactions compared to two randomly selected people. Social psychology also notes the existence of latent stress between social ties that do not form triadic closure [25]. Furthermore, laws of socialization and social influence imply a sense of relatively high induced trust between friends, as compared to randomly selected individuals.

For these reasons, we expect a statistical dependence between the structures of our two networks $G_F$ and $G_C$: the existence of an edge $\langle u_i, u_j \rangle$ in $G_F$ correlates with the number of paths of length 2 connecting $u_i, u_j$ in $G_C$. One way to further explore this connection is to use properties of one network to partition the other, and then explore potential correlations between these partition-inducing properties and the characteristics of the resulting subnetworks.

To this end, we use ranges $K_1, K_2, \ldots, K_l$ of shared checkins $k$ from $G_C$, successively spanning larger values of $k$, to partition $G_F$ into $G_{F|K_1}, G_{F|K_2}, \ldots, G_{F|K_l}$, and then analyze the triadic closure of these social partitions. In particular, we model the strength of a social tie as the number of shared checkins between its two end-points, and compute the distribution of clustering coefficients in the social subgraphs. For a node $u$ with $d_u$ social edges $u_1, u_2, \ldots, u_d$, the number of connections between its friends is upper bounded by $\frac{d_u(d_u-1)}{2}$. This is also reflected in the number of social triads $t(u)$ that include $u$ (endpoints of a social tie sharing a common friend). If all of his/her friends are connected, then $u$'s first-hop social neighborhood would be a clique; its clustering coefficient $c(u)$ is then simply an indication of how close this neighborhood is to becoming a clique, i.e., whether one's friends are also friends with each other:

$$c(u) = \frac{2\,t(u)}{d_u(d_u - 1)} \qquad (2)$$

We hypothesize that social partitions with larger numbers of shared checkins tend to have clustering coefficient distributions progressively more skewed towards larger values of $c(u)$. This reflects the trend of social relations being naturally organized into groups of varying social density: strong ties are expected to belong to a more clustered group than weak ties (potentially originating from completely different aspects of a user's social life).

We formulate this hypothesis in the form of the following proposition:

Proposition 2 Social connections that share a large number of checkin locations tend to be strongly clustered.

We also argue that the converse proposition holds as well:

Proposition 3 Social connections that share fewer checkins tend to be less clustered, compared to the underlying social network.

The situation is illustrated in Figure 1, where two social subgraphs around a user are aligned with the propositions (larger number of social triads for larger numbers of shared checkin locations).

Fig. 1: Users (yellow, black and red nodes) are connected with social edges (solid lines) and check in at locations (box, triangle, rectangle and ellipse, to facilitate shared grouping), associated to them with dashed arrows. The “black” user shares ≤ 3 locations with his/her “yellow” friends and > 3 locations with his/her “red” friends. Respective parts of $G_{F|k \le 3}$ (bottom left) and $G_{F|k > 3}$ (bottom right) are also given.

V. EXPERIMENTAL RESULTS

TABLE I: Unique number of users and checkin locations of the datasets used in our experiments.

Dataset       Unique Checkins   Unique users   Social edges
Brightkite    1,104,692         50,686         194,090
Gowalla       4,017,525         107,067        456,760
Yelp          961,076           70,817         151,516

In this section, we present detailed experimental results using the Gowalla, Brightkite and Yelp datasets. The Brightkite and Gowalla datasets are obtained from [26]. Brightkite has 50,686 nodes and Gowalla has 107,067 nodes. These are two location based social networking websites, and the datasets were collected using their public API. The Yelp dataset was obtained from [27]. It is a restricted dataset covering only users located in Phoenix, Arizona; please see Table I for details on these datasets. For the Brightkite and Gowalla datasets, we restrict checkin coordinates to the [−70°, 70°] band on Earth's surface, which contains the bulk of them. The statistics of the post-processed datasets are also summarized in Table I. We convert (longitude, latitude) pairs into geohashes and fix the discretization level to the first three characters of this representation. The Yelp dataset's checkin locations are used in their existing form in all the experiments.

All the software is written in Python. The numpy and scipy packages are used for numerical computation, pandas for data analysis, and geohash for geo-encoding tasks; a further package is used for network analysis. The code for all the experiments presented in this section can be downloaded from [28].

A. Validation of Propositions: Results and Discussion

We now present detailed analyses of the datasets in support of the propositions in Section IV.

1) Proposition 1: Recall that this proposition states that a large number of shared checkins is probabilistically indicative of social connectivity. Conversely, social connectivity does not imply a statistically large number of shared checkins. Figure 2 illustrates the probability of shared checkins given social connectivity ($Pr(C_k|F)$) between a pair of users in each of these three datasets. From these figures we can clearly notice the inverse relation between the number of shared checkin locations, $k$, and the probability of them being shared by two users connected by a social tie. Given a social tie, its endpoints are most likely to share no checkin locations or only a few, rather than a large number of them: the percentage of social ties that share at most five checkins is 96.9%, 98.9% and 92.8% for Brightkite, Gowalla and Yelp respectively. This remark (large fractions of edges share a very small number of checkins) is even more pronounced for arbitrary user pairs: computing over $Pr(C_k)$, 99.8%, 99.9% and 93.1% of pairs of nodes share at most five checkins for Brightkite, Gowalla and Yelp respectively. The shift towards higher numbers of shared checkins for friend pairs (as compared to arbitrary user pairs) is also the basic reason for the significantly larger number of shared checkins on average among friend pairs in all three datasets: for Brightkite, Gowalla and Yelp these are respectively 1.38, 1.16,

and 1.39 (while for arbitrary user pairs these averages are 0.13, 0.07, and 0.76). Because the related $Pr(C_k|F)$ and $Pr(C_k)$ distributions decrease rapidly with $k$ (they nearly coincide by vanishing for large $k$), these averages can be well approximated by summing only over low shared-checkin counts.

Figure 3 presents the probability of a social tie given that the end points of that tie share $k$ checkin locations ($Pr(F|C_k)$). As the number of shared checkin locations increases, two users checking in at the same locations are more likely to be socially connected as well. In all three datasets we observe that the number of pairs of users who check in at the same locations becomes small as the number $k$ of shared checkin locations increases. However, it is precisely this subset of pairs of users for which it is highly probable that any of its elements are social ties. This relates to the socialization and social influence principles discussed in Section IV: homophily dictates that individuals tend to socialize with others who are more like themselves. This drives the level of social interactions (in this case leading to more common checkin locations). The more two individuals are alike, the more likely it is that they will be socially connected.

One way to quantify this trend is by summing the conditional probabilities ($Pr(F|C_k)$) for the number of shared checkins $k$ in some low, medium and large location sharing ranges for all our datasets. As an example we choose $k < 10$, $10 \le k \le 20$ and $k > 20$ respectively. These sums are 0.001, 0.01 and 0.065 (for Brightkite), 0.004, 0.017 and 0.098 (for Gowalla) and 0.0007, 0.049 and 0.139 (for Yelp). It is evident that these sums grow considerably (by one to two orders of magnitude in all cases) for successively larger checkin sharing ranges.

Based on these figures we conclude the following: a large number of shared checkins statistically implies social connectivity. On the other hand, social connectivity does not imply a statistically large number of shared checkins.

2) Proposition 2: Figures 4, 5 and 6 illustrate the clustering coefficients of the three datasets when users of a social tie share zero, up to 5, and more than 5 checkins, along with the mean clustering coefficient in each of these figures. From Figures 4(a), 4(b), and 4(c) we notice that with a reasonably large number of shared checkins, the social connections between the nodes tend to be more clustered compared to the social connections that do not have any shared checkins. The case in which nodes of a social connection share a few checkins falls between these two extremes. A similar phenomenon is evident in Figures 5(a), 5(b), and 5(c), and in Figures 6(a), 6(b) and 6(c). Based on these figures, we draw the following conclusion: social connections that share a statistically large number of checkins tend to be more clustered compared to connections which share at most a few checkins. This suggests an alternate basis for social connectivity: for example, such connections may be indicative of family ties. We hypothesize that such links carry higher weight in the social graph. Figure 7 plots a subgraph from the Yelp dataset and its deconvolution into three layers. Figure 7(b) is clearly a sparse component of the original subgraph, in which the social ties share no checkins. As we increase the number of shared checkins to the interval [1, 5] we notice an increase in the density of the subgraph, and for the interval [> 5] the network layer becomes denser still, as evident in Figure 7(d). This offers a validation of the proposition and corroborates the data presented in Figures 4, 5 and 6.

[Figure 6: histograms of number of users vs. clustering coefficient; panels (a) Zero shared checkins, (b) Between (and including) 1 and 5 shared checkins, (c) More than 5 shared checkins]

Fig. 6: Clustering coefficient of Yelp dataset. Dotted line indicates the mean clustering coefficient for each case.

(a) Subgraph from Yelp dataset with social ties. (b) Edges not sharing any checkin locations from the Yelp subgraph

(c) Edges sharing between (and including) 1 and 5 checkin locations from the Yelp subgraph (d) Edges sharing more than 5 checkin locations from the Yelp subgraph

Fig. 7: A subgraph from the Yelp dataset and its deconvolution into three layers based on the intervals of shared checkins between the end points of the edges in this subgraph.

3) Proposition 3: We further analyze the case of few shared checkins: Figures 8 and 9 plot the relation between social connections that share at most one checkin and all the underlying social ties for the three datasets. Figures 8(a), 8(b), and 8(c) plot the histogram of clustering coefficients of social connections sharing at most one checkin. We observe that the majority of social connections that have at most one shared checkin tend to be less clustered when compared to the clustering coefficients of all the underlying social connections in Figures 9(a), 9(b), and 9(c). From these two sets of figures, we conclude the following: social connections that have statistically fewer shared checkins tend to be less clustered, compared to the underlying social network. We hypothesize that such links carry less weight in the social network. This clearly relates to the idea of weak ties discussed in Section IV. Weak ties that connect two users who do not share many common interests also play an important role in shaping the structure of a typical social network. Weak ties help bridge the gaps (structural holes) in these social networks. Since they play the role of bridges between densely connected parts of the social networks, their clustering coefficients tend to be on the lower end of the scale [25].

VI. CONCLUSIONS AND FUTURE WORK

We put forth and validate a number of important hypotheses relating to LBSNs in this paper. Using established statistical methods and real-world data, we relate user checkins to social ties and social groups. We use checkin frequencies, along with social connections, to identify latent structures. We analyze these latent structures for their clustering coefficients, and show that these structures exhibit statistically significantly higher clustering coefficients than the underlying network. It follows that these subnetworks provide more efficient channels for flow of information and influence in the network, which can be leveraged in a number of applications.

In continuing work, we are exploring techniques for shaping the structure of the network, in particular how one can seed strong ties in specific neighborhoods. Our results suggest that a larger number of shared checkins would result in such dense neighborhoods. It follows therefore that an effective shaping mechanism would be to incentivize shared checkins. The key question here is one of cost effectiveness and impact on the eventual network structure.

REFERENCES

[1] K. Jahanbakhsh, V. King, and G. C. Shoja, “They know where you live!” February 2012.
[2] H. Gu, H. Hang, Q. Lv, and D. Grunwald, “Fusing text and friendships for location inference in online social networks,” IEEE Web Intelligence and Intelligent Agent Technology, 2012.
[3] E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: User movement in location-based social networks,” KDD, 2011.
[4] A. Noulas, S. Scellato, C. Mascolo, and M. Pontil, “An empirical study of geographic user activity patterns in foursquare,” Association for the Advancement of Artificial Intelligence, 2011.
[5] H. Gao and H. Liu, “Data analysis on location-based social networks,” Arizona State University, Tech. Rep.

[Figure panels: (a) Brightkite Dataset, (b) Gowalla Dataset, (c) Yelp Dataset. x-axis: Clustering Coefficient (0.0 to 1.0); y-axis: Number of users.]

Fig. 8: Clustering Coefficient of edges sharing at most 1 checkin location. Dotted line indicates the mean clustering coefficient for each of the datasets.

[Figure panels: (a) Brightkite Dataset, (b) Gowalla Dataset, (c) Yelp Dataset. x-axis: Clustering Coefficient (0.0 to 1.0); y-axis: Number of users.]

Fig. 9: Clustering Coefficient of edges of the underlying social network. Dotted line indicates the mean clustering coefficient for each of the datasets.

[6] L. Backstrom, E. Sun, and C. Marlow, “Find me if you can: Improving geographical prediction with social and spatial proximity,” WWW, 2010.
[7] J. Illenberger, K. Nagel, and G. Flotterod, “The role of spatial interaction in social networks.”
[8] A. Kaltenbrunner, S. Scellato, Y. Volkovich, D. Laniado, D. Currie, E. J. Jutemar, and C. Mascolo, “Far from the eyes, close on the web: Impact of geographic distance on online social interactions,” WOSN, 2012.
[9] S. Scellato, A. Noulas, R. Lambiotte, and C. Mascolo, “Socio-spatial properties of online location-based social networks,” Association for the Advancement of Artificial Intelligence, 2011.
[10] Y. Volkovich, S. Scellato, D. Laniado, C. Mascolo, and A. Kaltenbrunner, “The length of bridge ties: Structural and geographic properties of online social interactions,” Association for the Advancement of Artificial Intelligence, 2012.
[11] J. McGee, J. Caverlee, and Z. Cheng, “A geographic study of tie strength in ,” CIKM, 2011.
[12] S. Scellato, “Beyond the : the geo-social revolution,” SIGWEB, 2011.
[13] K. Pelechrinis and P. Krishnamurthy, “Location affiliation networks: Bonding social and spatial information,” in Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 2011.
[14] D. Rout, D. P. Pietro, K. Bontcheva, and T. Cohn, “Where’s @wally? a classification approach to geolocating users based on their social ties,” ACM Conference on Hypertext and Social Media, 2013.
[15] C. A. D. Jr, G. L. Pappa, D. R. R. de Oliveira, and F. de L. Arcanjo, “Inferring the location of twitter messages based on user relationships,” Transactions in GIS, vol. 15(6), pp. 735–751, 2011.
[16] D. Jurgens, “That’s what friends are for: Inferring location in online social media platforms based on social relationships,” Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media.
[17] A. Sadilek, H. Kautz, and J. P. Bigham, “Finding your friends and following them to where you are,” WSDM, 2012.
[18] R. Li, S. Wang, H. Deng, R. Wang, K. Chen, and C. Chang, “Towards social user profiling: Unified and discriminative influence model for inferring home locations,” KDD, 2012.
[19] S. Abrol, L. Khan, and B. Thuraisingham, “Tweeque: spatio-temporal analysis of social networks for location mining using graph partitioning,” International Conference on Social Informatics, 2012.
[20] Z. Cheng, J. Caverlee, K. Lee, and D. Z. Sui, “Exploring millions of footprints in location sharing services,” Association for the Advancement of Artificial Intelligence, 2011.
[21] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A. L. Barabasi, “Human mobility, social ties, and link prediction,” KDD, 2011.
[22] D. Hu, S. Chen, L. Tu, and B. Huang, “Detecting geographic community in ,” IEEE Green Computing and Communications, 2012.
[23] S. Scellato, A. Noulas, and C. Mascolo, “Exploiting place features in link prediction on location-based social networks,” KDD, 2011.
[24] J. Chang and E. Sun, “Location: How users share and respond to location-based data on social networking sites,” Association for the Advancement of Artificial Intelligence, 2011.
[25] D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a highly connected world. Cambridge University Press, 2010.
[26] [Online]. Available: http://snap.stanford.edu/data/loc-brightkite.html
[27] [Online]. Available: http://www.yelp.com/dataset_challenge/
[28] [Online]. Available: http://www.cs.purdue.edu/~skylasa/are-we-friends.html