Bayesian nonparametric model for sparse dynamic networks

Konstantina Palla, François Caron and Yee Whye Teh

23rd of February, 2018 University of Glasgow

Konstantina Palla 1 / 38

BIG PICTURE

Interested in networks

Examples

• Simple (undirected) graphs
• Social network
• Evolving
• Protein-protein interaction
• Internet

BIG PICTURE

Build a statistical model that captures properties of real-world networks

Sparsity

$$N_t^{(e)} = o(N_t^2)$$

with $N_t^{(e)}$ the number of edges and $N_t$ the number of nodes at time $t$.

BIG PICTURE

Build a statistical model that captures properties of real-world networks

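Exchangeability

$$(Z_{ij}) \overset{d}{=} (Z_{\pi(i)\pi(j)}) \quad \text{for any permutation } \pi \text{ of } \mathbb{N}$$

where $Z$ is the adjacency matrix and $Z_{ij} \in \{0, 1\}$, $(i, j) \in \mathbb{N}^2$.

[Figure: illustration of $P \overset{d}{=} P$ under a relabelling of the nodes.]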

> Fundamentally important concept in modelling!

BIG PICTURE

Exchangeability of the classical discrete adjacency matrix

Aldous-Hoover representation theorem for exchangeable arrays (Aldous, 1981; Hoover, 1979)

$$(Z_{ij}) \overset{d}{=} (f(U_i, U_j, U_{ij}))$$

with $U_i, U_{ij} \overset{iid}{\sim} U(0, 1)$ and $f : [0, 1]^3 \to \{0, 1\}$ a random function

Several network models fall under this framework
• Erdős-Rényi, stochastic block model, infinite relational models, etc.
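As an illustration of the representation above, the following minimal Python sketch samples a symmetric graph with $Z_{ij} = 1[U_{ij} < g(U_i, U_j)]$, one standard special case of $f(U_i, U_j, U_{ij})$. The function names and the link-probability function `graphon` are assumptions made for this example and do not come from the slides.

```python
import random


def sample_exchangeable_graph(n, graphon, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with Z_ij = 1[U_ij < graphon(U_i, U_j)]."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]      # node-level uniforms U_i
    z = [[0] * n for _ in range(n)]           # diagonal stays 0 (no self-loops here)
    for i in range(n):
        for j in range(i + 1, n):
            u_ij = rng.random()               # pair-level uniform U_ij
            z[i][j] = z[j][i] = int(u_ij < graphon(u[i], u[j]))
    return z


def erdos_renyi(p):
    """Constant link probability: every pair links with probability p."""
    return lambda x, y: p


def two_block_model(p_in, p_out):
    """Two latent blocks split at 0.5; p_in within a block, p_out across blocks."""
    return lambda x, y: p_in if (x < 0.5) == (y < 0.5) else p_out


if __name__ == "__main__":
    z = sample_exchangeable_graph(200, erdos_renyi(0.1), seed=1)
    n_edges = sum(map(sum, z)) // 2
    print("edges:", n_edges, "density:", n_edges / (200 * 199 / 2))
```

With a fixed link-probability function the expected number of edges grows like $n^2$, which is the dense regime discussed on the next slide.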

BIG PICTURE

The problem

> Corollary of the Aldous-Hoover theorem

Graphs represented by an exchangeable matrix are either trivially empty or dense and thus not appropriate for most real applications

> “.. It shows that most Bayesian models of network data are inherently misspecified” (Orbanz and Roy, 2015)

MAIN IDEAS

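Graph as a point process over $\mathbb{R}_+^2$ (Caron and Fox, 2014)

Completely Random Measures
• $W = \sum_{i=1}^{\infty} w_i \delta_{\theta_i}$, a random measure on $\mathbb{R}_+$
• $w_i$: sociability
• $\theta_i$: embedding of node $i$ on $\mathbb{R}_+$

Point process

$$Z = \sum_{i,j} z_{ij} \delta_{(\theta_i, \theta_j)}$$

• $z_{ij} = 1$ if there is a link between $\theta_i$ and $\theta_j$
• Edge between $\theta_i$ and $\theta_j$ represented by points at $(\theta_i, \theta_j)$ and $(\theta_j, \theta_i)$
• $z_{ij} = f(w_i, w_j)$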

MAIN IDEAS

Graph as a point process over $\mathbb{R}_+^2$ - Exchangeability

Exchangeability in this continuous space (Kallenberg, 2005)

$$(Z(A_i \times A_j)) \overset{d}{=} (Z(A_{\pi(i)} \times A_{\sigma(j)})), \quad (i, j) \in \mathbb{N}^2$$

for any disjoint measurable sets Ai and Aj of R+

MAIN IDEAS

Why a point process representation?

> Sparsity. Construction based on a completely random measure (CRM); $\Gamma P(\alpha, \tau, \lambda)$.
> Exchangeability. Representation theorem by Kallenberg for jointly exchangeable point processes on the plane $\mathbb{R}_+^2$ (Kallenberg, 2005).

PROPOSED MODEL

A. Use a Gamma process, associate each node to a sociability parameter $w_{ti} \in \mathbb{R}_+$

$$W_t = \sum_i w_{ti} \delta_{\theta_i}, \quad W_t \sim \Gamma P(\alpha, \tau, \lambda)$$

$(W_t)_{t=1,2,\ldots,T}$ also a dynamic process.

Mean measure $\nu(dw\, d\theta) = \alpha w^{-1} e^{-\tau w} dw\, \lambda(d\theta)$
Lévy measure $w^{-1} e^{-\tau w} dw$
$\alpha > 0$, $\tau > 0$ and $\lambda$ is the Lebesgue measure.
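To make the Gamma process concrete, here is a minimal Python sketch of a finite-dimensional approximation on a bounded interval $[0, L]$. It assumes the mean measure stated above, under which the total mass on $[0, L]$ is Gamma distributed with shape $\alpha L$ and rate $\tau$. The function name, the interval length `length` and the truncation level `n_atoms` are assumptions of this sketch; it is an approximation, not the exact simulation scheme from the appendix.

```python
import random


def sample_gamma_process(alpha, tau, length, n_atoms=1000, seed=0):
    """Approximate draw of W restricted to [0, length] as a list of (theta_i, w_i).

    Uses n_atoms i.i.d. weights w_i ~ Gamma(shape=alpha*length/n_atoms, rate=tau)
    at uniform locations, so the total mass is exactly Gamma(alpha*length, tau).
    """
    rng = random.Random(seed)
    shape = alpha * length / n_atoms
    atoms = []
    for _ in range(n_atoms):
        theta = rng.uniform(0.0, length)            # node location on [0, length]
        w = rng.gammavariate(shape, 1.0 / tau)      # gammavariate takes a scale, not a rate
        atoms.append((theta, w))
    return atoms


if __name__ == "__main__":
    atoms = sample_gamma_process(alpha=2.0, tau=1.0, length=5.0)
    total = sum(w for _, w in atoms)
    largest = sorted(atoms, key=lambda a: -a[1])[:5]
    print("total mass:", total)
    print("five most sociable nodes (theta, w):", largest)
```

Most of the approximate weights are tiny and a few carry nearly all the mass, which mirrors the infinite-activity behaviour of the Lévy measure $w^{-1} e^{-\tau w} dw$.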

PROPOSED MODEL

B. Define interactions between each pair of nodes allowing influence from the past

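$$n_{tij} = n_{tij}^{new} + n_{tij}^{old}$$

$$n_{tij}^{new} \mid W_t \sim \begin{cases} \text{Poisson}(2 w_{ti} w_{tj}), & i \neq j \\ \text{Poisson}(w_{ti}^2), & i = j \end{cases}$$

$$n_{tij}^{old} \mid n_{t-1,ij} \sim \text{Binomial}(n_{t-1,ij}, e^{-\rho \Delta t})$$

$e^{-\rho \Delta t} \in [0, 1]$; forgetting parameter. Each interaction has a lifetime distributed from a geometric distribution representing the time two individuals remember about it.

C. Decide for a link $z_{tij} = 1(n_{tij} > 0)$

$$Z_t = \sum_{i,j} 1(n_{tij} > 0)\, \delta_{(\theta_i, \theta_j)}$$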
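Steps B and C can be sketched for a finite set of nodes as follows. This is a minimal Python illustration, assuming the sociabilities at time $t$ are given as a list and that counts are stored once per unordered pair $(i, j)$ with $i \le j$. The function names and the dictionary layout are assumptions of this sketch.

```python
import math
import random


def poisson(rng, lam):
    """Poisson(lam) draw by counting unit-rate exponential arrivals before time lam."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def step_interactions(w, n_prev, rho, dt=1.0, seed=0):
    """One time step of the interaction model.

    w      : list of sociabilities w_ti
    n_prev : dict {(i, j): n_{t-1,ij}} with i <= j
    Returns (n_t, links) where links = {(i, j) : n_tij > 0}.
    """
    rng = random.Random(seed)
    keep = math.exp(-rho * dt)                     # survival probability of an old interaction
    n_t, links = {}, set()
    for i in range(len(w)):
        for j in range(i, len(w)):
            rate = w[i] ** 2 if i == j else 2.0 * w[i] * w[j]
            n_new = poisson(rng, rate)             # new interactions
            n_old = sum(rng.random() < keep for _ in range(n_prev.get((i, j), 0)))  # thinning
            if n_new + n_old > 0:
                n_t[(i, j)] = n_new + n_old
                links.add((i, j))                  # z_tij = 1
    return n_t, links


if __name__ == "__main__":
    n_t, links = step_interactions([1.0, 0.5, 2.0], {(0, 1): 3}, rho=0.5, seed=1)
    print(n_t, sorted(links))
```

Setting `rho=0` keeps every past interaction, so links can only appear, while a very large `rho` makes each graph depend on the current sociabilities alone.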

PROPOSED MODEL

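[Figure: graphical model over $W_t$, $n_t^{new}$, $n_t^{old}$, $n_t$ and $Z_t$ at times $t-1$, $t$, $t+1$.]

$$W_t = \sum_i w_{ti} \delta_{\theta_i}$$

$$n_{tij} = n_{tij}^{new} + n_{tij}^{old}$$

$$n_{tij}^{new} \mid W_t \sim \begin{cases} \text{Poisson}(2 w_{ti} w_{tj}), & i \neq j \\ \text{Poisson}(w_{ti}^2), & i = j \end{cases}, \qquad n_{tij}^{old} \mid n_{t-1,ij} \sim \text{Binomial}(n_{t-1,ij}, e^{-\rho \Delta t})$$

$$Z_t = \sum_{i,j} 1(n_{tij} > 0)\, \delta_{(\theta_i, \theta_j)}$$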

What about the evolution of the sociability parameters, i.e. $(W_t)_{t=1,2,\ldots,T}$?

SOCIABILITY PARAMETERS

Model the latent sociabilities through a time-dependent random measure. For $t = 1, \ldots, T$

$$W_t = \sum_{i=1}^{\infty} w_{ti} \delta_{\theta_i}$$

Aim: Construct a dependent sequence $(W_t)_{t=1,\ldots,T}$ which marginally follows a Gamma process

$$W_t \sim \Gamma P(\alpha, \tau, \lambda)$$

How? Use the construction in Pitt and Walker (2005) (also used in Caron and Teh (2012)).

SOCIABILITY PARAMETERS

> Pitt and Walker (2005), Caron and Teh (2012) introduce an auxiliary random measure Ct with conditional law

$$C_t \mid W_t = \sum_{k=1}^{\infty} c_{tk} \delta_{\theta_k}, \qquad c_{tk} \mid W_t \sim \text{Poisson}(\phi w_{tk})$$

$\phi > 0$ dependence parameter.

> Define the law of Wt+1|Ct to coincide with the law of Wt|Ct.

> Since the laws of $W_{t+1}$ and $W_t$ given $C_t$ coincide and $W_t \sim \Gamma P(\alpha, \tau, \lambda)$, we have $W_{t+1} \sim \Gamma P(\alpha, \tau, \lambda)$ (Gamma process marginals).

[Figure: graphical model of the chain over $W_{t-1}, W_t, W_{t+1}$ and the auxiliary measures $C_{t-1}, C_t, C_{t+1}$.]

SOCIABILITY PARAMETERS

Define the conditional laws Wt|Ct and Wt+1|Ct

Proposition [Caron and Teh (2012)]. Suppose the law of Wt is ΓP(α, τ, λ). The conditional law of Wt given Ct is then:

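$$W_t \mid C_t = \sum_{k=1}^{\infty} w_{tk} \delta_{\theta_k} + W_{t*}$$

where $W_{t*}$ and $\{w_{tk}\}_{k=1}^{\infty}$ are all mutually independent. The law of $W_{t*}$ is given by a gamma process, while the masses are conditionally gamma,

$$W_{t*} \sim \Gamma P(\alpha, \tau + \phi, \lambda), \qquad w_{tk} \mid C_t \sim \text{Gamma}(c_{tk}, \tau + \phi)$$

Define the law of $W_{t+1} \mid C_t$ to coincide with the law of $W_t \mid C_t$

$$W_{t+1} \mid C_t = \sum_{k=1}^{\infty} w_{t+1,k} \delta_{\theta_k} + W_{t+1*}$$

$$W_{t+1*} \sim \Gamma P(\alpha, \tau + \phi, \lambda), \qquad w_{t+1,k} \mid C_t \sim \text{Gamma}(c_{tk}, \tau + \phi)$$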
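The stationarity argument can be checked numerically on the total mass of a bounded set $A$. The minimal Python sketch below assumes $W_t(A) \sim \text{Gamma}(a, \tau)$, where `a` is a hypothetical name for the shape that the Gamma process assigns to $A$. Summing the construction above over the atoms in $A$ then gives $C_t(A) \mid W_t \sim \text{Poisson}(\phi W_t(A))$ and $W_{t+1}(A) \mid C_t \sim \text{Gamma}(a + C_t(A), \tau + \phi)$. Only this scalar chain is simulated, not the full measure-valued process.

```python
import random


def poisson(rng, lam):
    """Poisson(lam) draw by counting unit-rate exponential arrivals before time lam."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_total_mass(a, tau, phi, n_steps, seed=0):
    """Simulate the stationary chain W_1(A), ..., W_T(A) with Gamma(a, tau) marginals."""
    rng = random.Random(seed)
    w = rng.gammavariate(a, 1.0 / tau)                    # W_1(A) ~ Gamma(a, rate tau)
    path = [w]
    for _ in range(n_steps - 1):
        c = poisson(rng, phi * w)                         # auxiliary count C_t(A) | W_t
        w = rng.gammavariate(a + c, 1.0 / (tau + phi))    # W_{t+1}(A) | C_t
        path.append(w)
    return path


if __name__ == "__main__":
    path = simulate_total_mass(a=2.0, tau=1.0, phi=1.0, n_steps=20000, seed=3)
    print("empirical mean:", sum(path) / len(path), "stationary mean a / tau:", 2.0 / 1.0)
```

Larger values of `phi` make consecutive values of the chain more strongly correlated while leaving the Gamma marginal unchanged.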

REVIEW

[Figure: graphical model of the full dynamic model at times $t-1$, $t$, $t+1$, with sociabilities $W_t$, auxiliary counts $C_t$, interactions $n_t^{new}$, $n_t^{old}$, $n_t$ and graphs $Z_t$, together with the hyperparameters $\alpha, \tau$, $\phi$ and $\rho$.]

PARAMETERS φ, τ, α AND ρ

• φ tunes the correlation of the sociabilities of each node over time; larger values correspond to higher correlation and smoother evolution of the weights;

$$E[W_{t+1} \mid W_t] = \frac{\phi}{\phi + \tau} W_t + \frac{\tau}{\phi + \tau} \lambda$$

• τ is a global scaling parameter that tunes the overall level of the sociabilities, and thus the size of the network;
• α tunes the variability between the sociability parameters; lower values correspond to higher variability;
• ρ tunes the rate at which edges may disappear; larger values correspond to faster disappearance.

[Figure panels: (a) φ = 20, (b) φ = 2000.]

SPARSITY

> $W_t(\mathbb{R}_+) = \infty$, so infinite number of edges on $\mathbb{R}_+^2$ (unrealistic).
> Restrictions $Z_{t,\alpha}$ of $Z_t$ to the box $[0, \alpha]^2$
• Recall: the base measure λ in the $\Gamma P(\alpha, \tau, \lambda)$ is the Lebesgue measure and as such $\lambda([0, \alpha]) = \alpha$.
> $N_{t,\alpha}$ number of nodes and $N_{t,\alpha}^{(e)}$ number of edges at time $t$ and for a given α.

At time $t$

$$N_{t,\alpha}^{(e)} = o(N_{t,\alpha}^2)$$

Proof based on
• The number of edges grows quadratically with α: $N_{t,\alpha}^{(e)} = \Theta(\alpha^2)$ and
• the number of nodes grows superlinearly with α: $N_{t,\alpha} = \omega(\alpha)$.

(Caron and Fox, 2014)

EXPERIMENTS - REUTERS TERROR NEWS DATASET [1]

• T = 66 days after the 09/11/01 attack, K ≈ 13K words.
• Observed $n_{tij}^{new}$ interactions. Assumed $n_{tij}^{old} = 0$, $\forall t \in T$.

Figure: (Top) Mean and 90% credible interval for the weights of chosen words over time. (Bottom) Observed number of interactions.

[1] http://vlado.fmf.uni-lj.si/pub/networks/data/CRA/terror.htm

EXPERIMENTS - FACEBOOK DATASET [2]

• Links $Z_t$ observed for T = 50 days among K ≈ 5K nodes from the Facebook New Orleans network.
• Only appearance of nodes, not disappearance, i.e. ρ = 0.

Figure: (Left) Mean and 90% credible interval for the sociability of some nodes. (Right) Observed degree over time for the same nodes.

[2] http://konect.uni-koblenz.de/networks/facebook-wosn-links

EXPERIMENTS - WIKIPEDIA DATASET [3]

• Links $Z_t$ observed for T = 50 days among K ≈ 6K articles in the French Wikipedia.
• Full model used.

Figure: (Left) Mean and 90% credible interval for the weights of some articles. (Right) Observed degree over time for the same articles.

[3] http://konect.uni-koblenz.de/networks/link-dynamic-simplewiki

CONCLUSION

• Constructed a prior for time-varying networks in discrete time.

• Used an auxiliary random measure of counts $C_t$ to allow for Gamma process marginals over the sociabilities.

• Used a link generating mechanism that lets past interactions influence the decision of a link at time t.

• The model generates sparse graphs.

• Experimental results on 3 real world datasets underlined the model’s efficiency.

FUTURE WORK

• Extend the model to the case of Generalized Gamma process marginals.
> By choosing the Lévy measure characterizing the GGP, we can construct graphs ranging from sparse to dense (Caron and Fox, 2014).

Generalised Gamma Process GGP(α, τ, σ) → sparsity tuned by a single parameter σ

[Figure panels: GGP(100, 2, 0), GGP(100, 2, 0.5), GGP(100, 2, 0.8).]

arXiv paper: 1607.01624

Thank you!

APPENDIX - COMPLETELY RANDOM MEASURES

(Kingman, 1967)

$W$ is a CRM if, for any countable collection $A_1, A_2, \ldots$ of disjoint measurable sets,

• the random variables $W(A_1), W(A_2), \ldots$ are independent
• $W(\cup_j A_j) = \sum_j W(A_j)$
• the distribution of $W([t, s])$ depends on $t - s$

Facts
• $W = \sum_{i=1}^{\infty} w_i \delta_{\theta_i}$

APPENDIX - GAMMA PROCESS

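Gamma process $\Gamma P(\alpha, \tau, \lambda)$

Completely random measure on Θ with Lévy intensity

$$\rho(dw) = w^{-1} e^{-\tau w} dw$$

on the space $[0, \infty)$. λ is the base measure and α the concentration parameter, such that $\lambda(\Theta) = \alpha$.

$$G := \sum_{i=1}^{\infty} w_i \delta_{\theta_i} \sim \Gamma P(\alpha, \tau, \lambda)$$

Countably infinite collection of pairs $\{\theta_i, w_i\}_{i=1}^{\infty}$ sampled from a Poisson process over $\mathbb{R}_+ \times \Theta$ with intensity $\nu(dw, d\theta) = \rho(dw) H(d\theta)$.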

APPENDIX - GAMMA PROCESS

[Figure: Poisson point representation of the Gamma process, with axes Θ and $\mathbb{R}_+$.]

APPENDIX - CONTINUOUS-TIME FORMULATION USING SUPERPROCESS (I)

> So far we considered a discrete-time model; graphs $Z_t$ observed at times $t_1 < \ldots < t_T$.
> Use a dynamic model over continuous time, e.g. when the time interval between observations is not constant.

Use the Dawson-Watanabe superprocess (Watanabe, 1968; Dawson, 1975; Ethier and Griffiths, 1993)

Figure: Evolution of the weights as a realisation from the Dawson-Watanabe superprocess - Continuous sample paths.

APPENDIX - CONTINUOUS-TIME FORMULATION USING SUPERPROCESS (II)

> Let $(W(t))_{t \geq 0}$ be a Dawson-Watanabe superprocess and $W(t) = \sum_{i=1}^{\infty} w_{ti} \delta_{\theta_i}$.
> Define the birth-death point process (linear death immigration):

• Interactions $n_{ij}(t)$ follow a non-homogeneous Poisson process with rate $w_i(t) w_j(t)$.
• Each interaction has a lifetime with distribution Exponential(ρ).

> $z_{ij}(t) = I(n_{ij}(t) + n_{ji}(t) > 0)$

$$\Pr(n_{ij}(t + dt) = k + 1 \mid n_{ij}(t) = k) = \begin{cases} w_i(t) w_j(t)\, dt + o(dt), & i \neq j \\ w_i^2(t)\, dt + o(dt), & i = j \end{cases} \quad \text{[birth]}$$

$$\Pr(n_{ij}(t + dt) = k - 1 \mid n_{ij}(t) = k) = \rho\, n_{ij}(t)\, dt + o(dt) \quad \text{[death]}$$

$$\Pr(n_{ij}(t + dt) = k \mid n_{ij}(t) = k) = 1 - (w_i(t) w_j(t) + \rho\, n_{ij}(t))\, dt + o(dt)$$

$$\Pr(n_{ij}(t + dt) > k + 1 \text{ or } n_{ij}(t + dt) < k - 1 \mid n_{ij}(t) = k) = o(dt)$$
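The birth-death dynamics above can be simulated for a single pair of nodes with a standard event-driven (Gillespie-style) scheme. The minimal Python sketch below makes a simplifying assumption that is not in the slides: the sociabilities are held constant, so the birth rate `birth_rate` (standing for $w_i w_j$) does not vary with time. The function name and arguments are hypothetical.

```python
import random


def simulate_birth_death(birth_rate, rho, t_end, n0=0, seed=0):
    """Simulate n_ij(t) on [0, t_end] with births at rate birth_rate and deaths at rate rho * n.

    Returns the jump times and the count held from each jump time onwards.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [0.0], [n0]
    while True:
        total_rate = birth_rate + rho * n
        if total_rate <= 0.0:
            break                                   # nothing can happen any more
        t += rng.expovariate(total_rate)            # waiting time to the next event
        if t >= t_end:
            break
        if rng.random() < birth_rate / total_rate:
            n += 1                                  # birth: a new interaction
        else:
            n -= 1                                  # death: one interaction is forgotten
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    times, counts = simulate_birth_death(birth_rate=4.0, rho=2.0, t_end=50.0, seed=0)
    linked = sum(1 for c in counts if c > 0)
    print("events:", len(times) - 1, "states with a link:", linked)
```

With a constant birth rate the count has a Poisson stationary distribution with mean `birth_rate / rho`, which gives a quick sanity check on long runs.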

APPENDIX - EXACT SIMULATION OF THE DYNAMIC GRAPH IN DISCRETE-TIME

Two step procedure

A. Simulate the sociabilities at each time t, i.e. (Wt)t=1,...,T .

B. Draw the interactions ntij and construct the graphs (Zt)t=1,...,T .

[Figure: graphical model of the full dynamic model at times $t-1$, $t$, $t+1$, with hyperparameters $\alpha, \tau$, $\phi$, $\rho$ and variables $W_t$, $C_t$, $n_t^{new}$, $n_t^{old}$, $n_t$, $Z_t$.]

APPENDIX - EXACT SIMULATION OF THE DYNAMIC GRAPH IN DISCRETE-TIME (A)

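Simulate the evolution of the sociabilities

Given $C_t = \sum_{i=1}^{K_t} c_{ti} \delta_{\theta_i}$, sample exactly from the Markov chain of latent counts $C_{t-1} \to C_t \to C_{t+1} \to \ldots$

$$W_{t+1} \mid C_t = \sum_{i=1}^{K_t} w_{t+1,i} \delta_{\theta_i} + W_{t+1*}$$

$$C_{t+1} \mid W_{t+1} = \sum_{j=1}^{K_t} c_{t+1,j} \delta_{\theta_j} + C_{t+1*}$$

$w_{t+1,i} \mid C_t \sim \text{Gamma}(c_{ti}, \tau + \phi)$, $W_{t+1*} \sim \Gamma P(\alpha, \tau + \phi, H)$ and $c_{t+1,i} \mid W_{t+1} \sim \text{Poisson}(\phi w_{t+1,i})$.

> Sample new atoms in $C_{t+1}$:

$$w_{t+1*} \sim \text{Gamma}(\alpha, \tau + \phi)$$

$$c_{t+1*} \sim \text{Poisson}(\phi w_{t+1*})$$

$$\Psi_{c_{t+1*}} \sim \text{CRP}(\alpha)$$

$\Psi_{c_{t+1*}}$ a partition over the $c_{t+1*}$ counts.

APPENDIX - EXACT SIMULATION OF THE DYNAMIC GRAPH IN DISCRETE-TIME (A)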

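[Figure: graphical model of the chain over $W_{t-1}, W_t, W_{t+1}$ and $C_{t-1}, C_t, C_{t+1}$, with hyperparameters $\alpha, \tau$ and $\phi$.]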

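Given the latent counts $C_{t-1}$, $C_t$, let $\{\theta_i^* ; C_t(\{\theta_i^*\}) + C_{t-1}(\{\theta_i^*\}) > 0\}$ be the atoms with at least one count:

$$W_t = \sum_{i=1}^{K} w_{ti} \delta_{\theta_i^*} + W_{t*}$$

$$w_{ti} \sim \text{Gamma}(c_{ti} + c_{t-1,i}, \tau + 2\phi), \qquad W_{t*} \sim \Gamma P(\alpha, \tau + 2\phi, H)$$

Note that $W_{t*}$ contains only atoms that are alive at time $t$ and not present at any other time. Let $c^* = \sum_{i=1}^{K} (c_{ti} + c_{t-1,i})$:

$$W_t \mid C_t, C_{t-1} \sim \Gamma P\left(\alpha + c^*, \tau + 2\phi, \frac{H + \sum_{i=1}^{K} (c_{ti} + c_{t-1,i}) \delta_{\theta_i^*}}{\alpha + c^*}\right)$$

Can we now sample the new interactions $n_{tij}$?

APPENDIX - EXACT SIMULATION OF THE DYNAMIC GRAPH IN DISCRETE-TIME (B)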

Sample interactions and construct graph

$$W_t \mid C_t, C_{t-1} \sim \Gamma P\left(\alpha + c^*, \tau + 2\phi, \frac{H + \sum_{i=1}^{K} (c_{ti} + c_{t-1,i}) \delta_{\theta_i^*}}{\alpha + c^*}\right)$$

Sample $n_t^{new}$ using the urn scheme:

1. Sample the total mass $w_t^* \sim \text{Gamma}(\alpha + c^*, \tau + 2\phi)$.
2. Sample the number of new interactions $d_t^* = \sum_{ij} n_{tij}^{new}$ at time $t$: $d_t^* \sim \text{Poisson}(w_t^{*2})$.
3. Let $(U'_1, \ldots, U'_{2d_t^*}) = (U_{11}, U_{12}, \ldots, U_{d_t^* 1}, U_{d_t^* 2})$ be the nodes that the interactions refer to. For $n = 1, \ldots, 2d_t^*$ sample the nodes as

$$U'_{n+1} \mid U'_1, \ldots, U'_n \sim \frac{H}{\alpha + c^* + n} + \frac{1}{\alpha + c^* + n} \sum_{k=1}^{n} \delta_{U'_k} + \frac{1}{\alpha + c^* + n} \sum_{i=1}^{K} (c_{ti} + c_{t-1,i}) \delta_{\theta_i^*}$$

with $\{U'_k\}_{k \leq n}$ the set of distinct nodes sampled so far.
4. $(U'_{2k-1}, U'_{2k})$ for $k = 1, \ldots, d_t^*$ gives the set of new interactions $n_t^{new}$.
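The four steps of the urn scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions: existing nodes are labelled $0, \ldots, K-1$ with `counts[i]` standing for $c_{ti} + c_{t-1,i}$, a draw from $H$ is represented only by a fresh integer label (the location $\theta$ itself is not sampled), and $H$ is taken to have total mass $\alpha$ so that the urn weights sum to $\alpha + c^* + n$. The function names are hypothetical.

```python
import random


def poisson(rng, lam):
    """Poisson(lam) draw by counting unit-rate exponential arrivals before time lam."""
    count, total = 0, rng.expovariate(1.0)
    while total < lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def sample_new_interactions(alpha, tau, phi, counts, seed=0):
    """Return the new interactions at time t as a list of node-label pairs."""
    rng = random.Random(seed)
    k_existing = len(counts)
    c_star = sum(counts)
    # Step 1: total mass w* ~ Gamma(alpha + c*, rate tau + 2 phi).
    w_star = rng.gammavariate(alpha + c_star, 1.0 / (tau + 2.0 * phi))
    # Step 2: number of new interactions d* ~ Poisson(w*^2).
    d_star = poisson(rng, w_star ** 2)
    # Step 3: sample the 2 d* endpoints sequentially from the urn.
    endpoints, next_label = [], k_existing
    for n in range(2 * d_star):
        options, weights = ["new"], [alpha]
        if n > 0:
            options.append("repeat")
            weights.append(n)
        if c_star > 0:
            options.append("existing")
            weights.append(c_star)
        kind = rng.choices(options, weights=weights)[0]
        if kind == "new":                           # fresh node drawn from H
            endpoints.append(next_label)
            next_label += 1
        elif kind == "repeat":                      # copy one of the n earlier draws
            endpoints.append(endpoints[rng.randrange(n)])
        else:                                       # node with counts at t or t-1
            endpoints.append(rng.choices(range(k_existing), weights=counts)[0])
    # Step 4: consecutive endpoints form the interactions.
    return [(endpoints[2 * k], endpoints[2 * k + 1]) for k in range(d_star)]


if __name__ == "__main__":
    print(sample_new_interactions(alpha=1.0, tau=1.0, phi=0.5, counts=[3, 1, 0], seed=0))
```

Labels at or above `len(counts)` mark nodes that enter the graph through a new interaction, while nodes with a zero count can only be reached again through a fresh draw from $H$.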

APPENDIX - POSTERIOR CHARACTERIZATION

• Observe T graphs $Z_1, \ldots, Z_T$ with $\{\theta_i^*\}_{i=1}^{K}$ the set of unique nodes.
• $w_{ti} = W_t(\{\theta_i^*\})$, $i = 1, \ldots, K$ and $w_{t*} = W_t(\Theta \setminus \{\theta_1^*, \ldots, \theta_K^*\})$.
• $c_{ti} = C_t(\{\theta_i^*\})$, $i = 1, \ldots, K$ and $c_{t*} = C_t(\Theta \setminus \{\theta_1^*, \ldots, \theta_K^*\})$.
• $m_{ti} = \sum_{j=1}^{K} (n_{tij}^{new} + n_{tji}^{new}) \geq 0$ for $i = 1, \ldots, K$.

Theorem 1
The posterior distribution given the auxiliary counts $\{m_{ti}\}_{i=1}^{K}$, $\{c_{ti}\}_{i=1}^{K}$ and $c_{t*}$ is a CRM with fixed atoms:

$$w_{t*} \sum_{i=1}^{\infty} \tilde{P}_{ti} \delta_{\tilde{\theta}_i} + \sum_{i=1}^{K} w_{ti} \delta_{\theta_i^*}$$

where $\tilde{\theta}_i \sim H$, and the weights $(\tilde{P}_i)_{i=1,2,\ldots}$, with $\tilde{P}_1 > \tilde{P}_2 > \ldots$ and $\sum_{i=1}^{\infty} \tilde{P}_i = 1$, are distributed from a Poisson-Kingman distribution (Pitman, 2003, Definition 3 p.6) with Lévy intensity $\rho(dw) = w^{-1} e^{-\tau w} dw$, conditional on $w_{t*}$

$$(\tilde{P}_{ti}) \mid w_{t*} \sim PK(\rho \mid w_{t*}).$$

> Characterization related to that for normalized random measures by James, Lijoi, and Prünster (2009).

APPENDIX - POSTERIOR ON THE SOCIABILITIES

The weights $(w_{t1}, \ldots, w_{tK}, w_{t*})$ are jointly dependent conditional on $n_{tij}^{new}$ with the following posterior distribution:

$$p(w_{t1}, \ldots, w_{tK}, w_{t*} \mid D_t, \{c_{ti}\}_{i=1}^{K}, \{c_{t-1,i}\}_{i=1}^{K}, c_{t*}, c_{t-1*}) \propto \left[ \prod_{i=1}^{K} w_{ti}^{m_{ti} + c_{ti} + c_{t-1,i} - 1} \right] e^{-\left(\sum_{i=1}^{K} w_{ti} + w_{t*}\right)^2} e^{-(\tau + 2\phi) \sum_{i=1}^{K} w_{ti}} \times g^*(w_{t*})$$

where

$$g^*(w_{t*}) = \frac{w_{t*}^{\alpha + c_{t*} + c_{t-1*}} e^{-2\phi w_{t*}} g(w_{t*})}{\int_0^{\infty} w^{\alpha + c_{t*} + c_{t-1*}} e^{-2\phi w} dw}$$

is a gamma tilted stable distribution and $g$ is the distribution of the total mass of a Gamma process.

BIBLIOGRAPHY I

D. J. Aldous. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis, 11(4):581–598, 1981.
F. Caron and E. B. Fox. Sparse graphs using exchangeable random measures. Technical report, arXiv:1401.1137, 2014.
F. Caron and Y. W. Teh. Bayesian nonparametric models for ranked data. In Advances in Neural Information Processing Systems, 2012.
D. A. Dawson. Stochastic evolution equations and related measure processes. Journal of Multivariate Analysis, 5(1):1–52, 1975.
S. N. Ethier and R. C. Griffiths. The transition function of a measure-valued branching diffusion with immigration. Stochastic Processes. A Festschrift in Honour of Gopinath Kallianpur (S. Cambanis, J. Ghosh, R. L. Karandikar and P. K. Sen, eds.), pages 71–79, 1993.
D. N. Hoover. Relations on spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ, 1979.
L. F. James, A. Lijoi, and I. Prünster. Posterior analysis for normalized random measures with independent increments. Scandinavian Journal of Statistics, 36(1):76–97, 2009.

BIBLIOGRAPHY II

O. Kallenberg. Probabilistic symmetries and invariance principles. Springer, 2005.
J. Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
P. Orbanz and D. M. Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Trans. Pattern Anal. Mach. Intelligence (PAMI), 37(2):437–461, 2015.
J. Pitman. Poisson-Kingman partitions. Lecture Notes-Monograph Series, pages 1–34, 2003.
M. K. Pitt and S. G. Walker. Constructing stationary models using auxiliary variables with applications. Journal of the American Statistical Association, 100(470):554–564, 2005.
S. Watanabe. A limit theorem of branching processes and continuous state branching processes. Journal of Mathematics of Kyoto University, 8(1):141–167, 1968.
