YEP Workshop on Information Diffusion on Random Graphs, March 27, 2019

Social influence and viral phenomena

Voting mobilization: A Facebook study

• Voting mobilization [Bond et al., Nature 2012]
  – On election day, Facebook showed users a message with the faces of friends who had voted
  – The message generated an estimated 340K additional votes among the 60M users tested

Influence Propagation Modeling and the Influence Maximization Task
• Studies stochastic models of how influence propagates in social networks
  – and their properties, e.g. submodularity
• Influence maximization: given a budget k, select at most k nodes in the network as seeds to maximize the influence spread of the seeds
  – Applications in viral marketing, diffusion monitoring, rumor control, etc.

Outline of This Talk

• Basic concepts: influence diffusion models, influence maximization task, submodularity, greedy algorithm • Scalable algorithm based on reverse influence sampling (RIS) • Influence-based measures – Shapley centrality – Single Node Influence (SNI) centrality • Other models and tasks

Independent cascade (IC) model
• Directed graph G = (V, E)
• Each edge (u, v) has an influence probability p(u, v)
• Initially the seed nodes in S₀ are activated
• At each step t, each node u activated at step t − 1 activates its out-neighbor v independently with probability p(u, v)
• Influence spread σ(S): expected number of activated nodes
• Corresponds to bond percolation
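The IC dynamics above can be simulated directly. A minimal Python sketch, assuming the graph is given as an adjacency map `u -> [(v, p(u, v)), ...]` (the representation and function name are mine, not from the talk):

```python
import random

def simulate_ic(graph, seeds, rng=None):
    """One run of the independent cascade model.

    graph: dict mapping u -> list of (v, p) out-edges with influence
    probability p; seeds: initially activated nodes (the set S0).
    Returns the set of all nodes activated during the diffusion.
    """
    rng = rng or random.Random(0)
    active = set(seeds)
    frontier = list(active)          # nodes activated in the previous step
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # each newly activated u gets one independent chance on (u, v)
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active
```

Averaging `len(simulate_ic(...))` over many runs estimates the influence spread σ(S).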

Linear threshold (LT) model

• Each edge (u, v) has an influence weight w(u, v):
  – when (u, v) ∉ E, w(u, v) = 0
  – Σ_u w(u, v) ≤ 1
• Each node v selects a threshold θ_v ∈ [0, 1] uniformly at random
• Initially the seed nodes in S₀ are activated
• At each step, each inactive node v checks whether the total weight of its active in-neighbors is at least its threshold θ_v; if so, v is activated
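The LT dynamics above can likewise be sketched in Python, assuming each node's in-edges are given as `v -> [(u, w), ...]` (representation and function name are mine):

```python
import random

def simulate_lt(in_edges, seeds, rng=None):
    """One run of the linear threshold model.

    in_edges: dict mapping v -> list of (u, w) in-edges; for each v the
    weights must sum to at most 1. Each node draws its threshold theta_v
    uniformly from [0, 1]; v activates once the total weight of its
    active in-neighbors reaches theta_v.
    """
    rng = rng or random.Random(0)
    theta = {v: rng.random() for v in in_edges}
    active = set(seeds)
    changed = True
    while changed:                   # iterate until no node newly activates
        changed = False
        for v, edges in in_edges.items():
            if v in active:
                continue
            if sum(w for u, w in edges if u in active) >= theta[v]:
                active.add(v)
                changed = True
    return active
```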

Interpretation of IC and LT models

• The IC model reflects simple contagion, e.g. information or virus spread
• The LT model reflects complex contagion, e.g. product adoption and innovations (activation needs social affirmation from multiple sources [Centola and Macy, AJS 2007])
• More general models have been studied: the triggering model, general threshold models, the decreasing cascade model, etc.
  – Note: not all models correspond to reachability on random graphs; e.g. the general threshold model corresponds to random hyper-graphs (ongoing research)

Influence maximization

• Given a social network, a diffusion model with given parameters, and a number 푘, find a seed set 푆 of at most 푘 nodes such that the influence spread of 푆 is maximized. • NP-hard • Based on submodular function maximization • [Kempe, Kleinberg, and Tardos, KDD’2003]

Submodular set functions
• Submodularity of a set function f : 2^V → ℝ:
  – for all S ⊆ T ⊆ V and all v ∈ V ∖ T, f(S ∪ {v}) − f(S) ≥ f(T ∪ {v}) − f(T)
  – diminishing marginal returns
  – an equivalent form: for all S, T ⊆ V, f(S ∪ T) + f(S ∩ T) ≤ f(S) + f(T)
• Monotonicity of a set function f: for all S ⊆ T ⊆ V, f(S) ≤ f(T)
• The influence spread function σ(S) is monotone and submodular in the IC model (and many other models)
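The diminishing-returns inequality can be checked by brute force on a small coverage function. A sketch (function names are mine, for illustration only):

```python
from itertools import combinations

def coverage(sets, S):
    """f(S) = size of the union of the subsets indexed by S."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

def is_submodular(sets):
    """Brute-force check: f(S + v) - f(S) >= f(T + v) - f(T)
    for all S subset of T and all v outside T."""
    V = list(sets)
    for r in range(len(V) + 1):
        for T in combinations(V, r):
            for s in range(r + 1):
                for S in combinations(T, s):
                    for v in set(V) - set(T):
                        gain_S = coverage(sets, set(S) | {v}) - coverage(sets, set(S))
                        gain_T = coverage(sets, set(T) | {v}) - coverage(sets, set(T))
                        if gain_S < gain_T:
                            return False
    return True
```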

Example of a submodular function and its maximization problem
• Set coverage
  – each element u of the ground set is a subset of some base elements
  – coverage f(S) = |⋃_{u∈S} u|
  – f(S ∪ {v}) − f(S): additional coverage of v on top of S
• k-max cover problem
  – find k subsets that maximize their total coverage
  – NP-hard
  – a special case of influence maximization in the IC model

Submodularity of influence diffusion models

• Based on equivalent live-edge graphs


• Diffusion dynamics correspond to a random live-edge graph, in which edges are randomly removed
• Pr(set A is activated given seed set S) = Pr(set A is reachable from S in the random live-edge graph)

Random live-edge graph for the IC model and its reachable node set
• Random live-edge graph in the IC model
  – each edge is independently selected as live with its influence probability

• The pink node set is the active node set reachable from the seed set in a random live-edge graph
• The equivalence is straightforward (it is essentially bond percolation)

Random live-edge graph for the LT model and its reachable node set
• Random live-edge graph in the LT model
  – each node selects at most one incoming edge, with probability equal to that edge's influence weight
• The pink node set is the active node set reachable from the seed set in a random live-edge graph
• The equivalence is based on uniform threshold selection from [0, 1] and linear weight addition
• Not exactly bond percolation

Submodularity of influence diffusion models (cont'd)

• Submodularity of R(⋅, G_L), the set of nodes reachable in a live-edge graph G_L:
  – for any S ⊆ T ⊆ V and v ∈ V ∖ T,
  – if u is reachable from v but not from T, then u is reachable from v but not from S
  – so the marginal contribution of v w.r.t. T is at most its marginal contribution w.r.t. S
• Hence, R(⋅, G_L) is submodular
• Therefore, the influence spread σ(S) is submodular in the IC model

Greedy algorithm for submodular function maximization
1: initialize S = ∅
2: for i = 1 to k do
3:   select u = argmax_{w ∈ V∖S} [f(S ∪ {w}) − f(S)]
4:   S = S ∪ {u}
5: end for
6: output S
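The pseudocode above, in Python, with the set function passed as an oracle (a sketch; the function name is mine):

```python
def greedy_max(f, V, k):
    """Greedy maximization of a monotone set function f over ground set V:
    repeatedly add the element with the largest marginal gain f(S + w) - f(S)."""
    S = set()
    for _ in range(k):
        best, best_gain = None, float('-inf')
        for w in sorted(V - S):       # sorted only to make tie-breaking deterministic
            gain = f(S | {w}) - f(S)
            if gain > best_gain:
                best, best_gain = w, gain
        S.add(best)
    return S
```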

Property of the greedy algorithm

• Theorem: If the set function f is monotone and submodular with f(∅) = 0, then the greedy algorithm achieves a (1 − 1/e) approximation ratio; that is, the solution S found by the greedy algorithm satisfies:

f(S) ≥ (1 − 1/e) · max_{S′ ⊆ V, |S′| = k} f(S′)
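The guarantee can be sanity-checked on a small coverage instance by comparing greedy against the brute-force optimum (a toy check under my own naming, not from the talk):

```python
import math
from itertools import combinations

def cover(sets, S):
    """Coverage f(S) = |union of the subsets in S|: monotone submodular, f(empty) = 0."""
    u = set()
    for i in S:
        u |= sets[i]
    return len(u)

def greedy(sets, k):
    """Greedy maximization of the coverage function; ties broken by sorted order."""
    S = set()
    for _ in range(k):
        S.add(max(sorted(set(sets) - S), key=lambda w: cover(sets, S | {w})))
    return S

def brute_force_opt(sets, k):
    """Optimal size-k value by exhaustive search (exponential, toy instances only)."""
    return max(cover(sets, set(T)) for T in combinations(sets, k))
```

By the theorem, `cover(sets, greedy(sets, k)) >= (1 - 1/math.e) * brute_force_opt(sets, k)` must hold on every instance.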

Hardness of Influence Maximization and Influence Computation
• In the IC and LT models, influence maximization is NP-hard
  – IC model: reduction from the set cover problem
• In the IC and LT models, computing the influence spread σ(S) for any given S is #P-hard [Chen et al. KDD'2010, ICDM'2010]
  – IC model: reduction from the s-t connectedness counting problem
• Implication of the #P-hardness of computing σ(S):
  – the greedy algorithm needs adaptation, using Monte Carlo simulations

MC-Greedy: estimating influence spread via Monte Carlo simulations
• For any given seed set S:
• Simulate the diffusion process from S for R times (R should be large)
• Use the average number of active nodes over the R simulations as the estimate of σ(S)
• Can estimate σ(S) to arbitrary accuracy, but requires large R
  – A theoretical bound can be obtained using the Chernoff bound
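The estimator is just repeated IC simulation. A minimal sketch, again assuming the graph representation `u -> [(v, p)]` (names are mine):

```python
import random

def estimate_spread(graph, S, R=10000, rng=None):
    """Monte Carlo estimate of sigma(S) under the IC model:
    average the number of activated nodes over R independent diffusions."""
    rng = rng or random.Random(1)
    total = 0
    for _ in range(R):
        active = set(S)
        frontier = list(S)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph.get(u, []):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / R
```

By a Chernoff bound, the error shrinks as O(1/sqrt(R)), which is why R must be large.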

Theorems on the MC-Greedy algorithm

• Polynomial time, but can be very slow: 70+ hours on a 15k-node graph

Simulation on Real Network NetHEPT

• Two edge-probability settings: uniform IC with p = 0.01, and weighted IC with p(u, v) = 1/d_v
• NetHEPT: collaboration network on arXiv
• MC-Greedy[20000] is the best
• MC-Greedy[200] is worse than Degree
• Random is the worst

Probabilists' View vs. Computer Scientists' View on Diffusion

Subject
  – Probabilists: (stochastic) diffusion on random networks
  – Computer scientists: (stochastic) diffusion on fixed networks (often equivalent to deterministic diffusion on random sub-networks of the fixed network)
Network
  – Probabilists: family of random networks (n → ∞, e.g. configuration model), infinite lattice, etc.
  – Computer scientists: fixed network with arbitrary topology
Diffusion models
  – Probabilists: percolation, SIR, SIS, etc.
  – Computer scientists: independent cascade (equivalent to bond percolation), linear threshold, triggering, general threshold, etc.
Goal
  – Probabilists: reveal properties of the diffusion, e.g. condition of the phase transition
  – Computer scientists: optimization, e.g. influence maximization
Method and tools
  – Probabilists: probabilistic analysis, Markov processes, branching processes, concentration inequalities
  – Computer scientists: submodularity analysis, submodular maximization
Focus
  – Probabilists: probabilistic analysis, phase transition condition, size distribution, etc.
  – Computer scientists: algorithm design, efficiency, approximation ratio

Outline of This Talk

• Basic concepts: influence diffusion models, influence maximization task, submodularity, greedy algorithm • Scalable algorithm based on reverse influence sampling (RIS) • Influence-based centrality measures – Shapley centrality – Single Node Influence (SNI) centrality • Other models and tasks

Ways to improve scalability
• Fast deterministic heuristics
  – Utilize model characteristics
  – MIA/IRIE heuristics for the IC model [Chen et al. KDD'10, Jung et al. ICDM'12]
  – LDAG/SimPath heuristics for the LT model [Chen et al. ICDM'10, Goyal et al. ICDM'11]
• Monte Carlo simulation based
  – Lazy evaluation [Leskovec et al. KDD'2007]: reduce the number of influence spread evaluations
• New approach based on Reverse Influence Sampling (RIS)
  – First proposed by Borgs et al. SODA'2014
  – Improved by Tang et al. SIGMOD'14, '15 (TIM/TIM+, IMM), Nguyen et al. SIGMOD'16 (SSA/D-SSA), Nguyen et al. ICDM'17 (SKIS), Tang et al. SIGMOD'18 (OPIM)

Key Idea: Reverse Influence Sampling

• Reverse reachable (RR) sets (using the IC model as an example):
  – Select a node v uniformly at random; call it the root
  – From v, simulate the diffusion in reverse: every edge direction is reversed, with the same probability
  – The set of all nodes reached (including v) is the reverse reachable set R (rooted at v)
• Intuition:
  – If a node u often appears in RR sets, then using u as a seed yields large influence --- RR sets efficiently collect evidence of influencers
• Technical guarantee: for any seed set S, σ(S) = n ⋅ Pr{S ∩ R ≠ ∅}
• [Borgs et al. SODA'2014]
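Sampling one RR set only needs the reverse adjacency map. A sketch, assuming the representation `v -> [(u, p(u, v))]` for incoming edges (names are mine):

```python
import random

def random_rr_set(nodes, in_edges, rng=None):
    """Sample one reverse-reachable set under the IC model.

    nodes: list of all nodes; in_edges: dict v -> list of (u, p) for each
    original edge (u, v). Pick a uniform root, then run the cascade over
    reversed edges with the same probabilities.
    """
    rng = rng or random.Random(0)
    root = rng.choice(nodes)
    rr = {root}
    frontier = [root]
    while frontier:
        nxt = []
        for v in frontier:
            for u, p in in_edges.get(v, []):
                if u not in rr and rng.random() < p:
                    rr.add(u)
                    nxt.append(u)
        frontier = nxt
    return rr
```

The fraction of sampled RR sets that intersect a seed set S, multiplied by n, estimates σ(S).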

RIS Illustration


• Collect many RR sets
• Greedily find the top k nodes that cover the most RR sets
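The max-cover step over the collected RR sets can be sketched as follows (a node "covers" an RR set if it belongs to it; the function name is mine):

```python
def select_seeds(rr_sets, k):
    """Greedily pick up to k nodes that together cover the most RR sets."""
    covered = [False] * len(rr_sets)
    seeds = []
    for _ in range(k):
        count = {}                     # node -> number of uncovered RR sets containing it
        for i, rr in enumerate(rr_sets):
            if covered[i]:
                continue
            for u in rr:
                count[u] = count.get(u, 0) + 1
        if not count:                  # every RR set already covered
            break
        best = max(count, key=count.get)
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return seeds
```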

How to Decide the Number of RR Sets --- IMM: Influence Maximization via Martingales
• Estimate a lower bound on the optimal influence spread
  – Repeatedly halve the estimate, doubling the number of RR sets
  – Use greedy on the RR sets to get a lower-bound solution
  – Verify whether it is close to the estimate
  – Then generate the final number of RR sets
• Use greedy on the RR sets to find k nodes that cover the most RR sets

IMM Theoretical Result

• Theorem: For any ε > 0 and ℓ > 0, IMM achieves a (1 − 1/e − ε) approximation of influence maximization with probability at least 1 − 1/n^ℓ. The expected running time of IMM is O((k + ℓ)(m + n) log n / ε²) --- near linear in the graph size.
• Martingale-based probabilistic analysis
  – RR sets are not independent --- early RR sets determine whether later RR sets are generated --- they form a martingale

IMM Empirical Result

• LiveJournal: blog network
  – n = 4.8M
  – m = 69.0M
• A second social network
  – n = 3.1M
  – m = 117.2M
• ε = 0.5, ℓ = 1
• IC model, p(u, v) = 1/d_v
  – d_v: in-degree of v

RIS Summary

• Advantages
  – Theoretical guarantee
  – The RIS approach can be applied to many other situations
  – Easily tuned between theoretical guarantee and practical efficiency (by tuning ε)
• Issues
  – Memory bottleneck (need to store all RR sets)
• Different RIS-based algorithms improve on different ways of estimating the number of RR sets needed

Scalable Influence Maximization Trilemma

[Figure: trilemma triangle with corners "Quality guarantee", "Memory efficiency", and "Time efficiency"; Monte Carlo greedy algorithms, RIS-based algorithms, and graph heuristics each sit near a different corner, with a "?" at the center. From "Influence Maximization on Social Graphs: A Survey".]

Outline of This Talk

• Basic concepts: influence diffusion models, influence maximization task, submodularity, greedy algorithm • Scalable algorithm based on reverse influence sampling (RIS) • Influence-based centrality measures – Shapley centrality – Single Node Influence (SNI) centrality • Other models and tasks

Influence-based Centrality Measures

• Network centrality is a key concept in network analysis
• Most existing network centralities are structure-based: degree centrality, closeness centrality, betweenness centrality, etc.
• When we care about influence propagation in the network, we should look into influence-based centrality [Chen and Teng, WWW'2017]:
  – Define two influence-based centralities: Shapley centrality and Single-Node-Influence centrality
  – Provide an axiomatic study of the two centrality measures
  – Provide a scalable algorithmic framework for computing the two

Cooperative Game Theory and Shapley Value
• Measures individual power in group settings
• Cooperative game over V = [n], with characteristic function τ : 2^V → ℝ
  – τ(S): cooperative utility of set S
• Shapley value φ, the average marginal utility under a random order:
    φ_v(τ) = E_π[τ(S_{π,v} ∪ {v}) − τ(S_{π,v})] = (1/n!) Σ_{π∈Π} (τ(S_{π,v} ∪ {v}) − τ(S_{π,v}))
  – Π: set of permutations of V
  – S_{π,v}: subset of V ordered before v in permutation π
• Enjoys a unique axiomatic characterization
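For tiny games, the definition can be evaluated exactly by averaging over all n! orders. A brute-force sketch, only for illustration (names are mine):

```python
from itertools import permutations

def shapley(V, tau):
    """Exact Shapley value: for each order pi, credit each player v with its
    marginal utility tau(S + v) - tau(S) over the set S of players preceding it."""
    V = list(V)
    phi = {v: 0.0 for v in V}
    count = 0
    for pi in permutations(V):
        S = set()
        for v in pi:
            phi[v] += tau(S | {v}) - tau(S)
            S.add(v)
        count += 1
    return {v: x / count for v, x in phi.items()}
```

The values sum to τ(V) (the efficiency axiom), which makes a handy sanity check.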

Shapley Centrality
• Node v's Shapley centrality is the Shapley value of the influence spread function: ψ_v^Shapley = φ_v(σ)
  – Treat the influence spread function as a cooperative utility function
• Measures a node's irreplaceable power in groups
• More precisely, a node's marginal influence under a random order
• Shapley centrality can be uniquely characterized by five axioms (omitted)
• A scalable algorithm for Shapley centrality computation exists, based on the RIS approach

Key Observation Linking RR Sets with Shapley Value

• Let R be a random RR set; then ψ_u^Shapley = n ⋅ E_R[𝕀{u ∈ R}/|R|]
• If u is not in the RR set R rooted at v, u has no marginal influence on v
• If u is in R rooted at v:
  – If u is ordered after any other node of R in a random permutation, u has no marginal influence on v
  – If u is ordered before all other nodes of R in a random permutation, u has marginal influence 1 on v; this happens with probability 1/|R|
• The root v is chosen uniformly, so the total marginal influence is multiplied by n
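This observation gives a streaming estimator: each RR set contributes 1/|R| to every node it contains, and no RR set ever needs to be stored. A sketch (the function name is mine):

```python
def shapley_from_rr_sets(rr_sets, n):
    """Estimate Shapley centrality: psi_u ~= n * average over RR sets R of
    1{u in R} / |R|. rr_sets can be any iterable, consumed one set at a time."""
    score = {}
    m = 0
    for rr in rr_sets:
        share = 1.0 / len(rr)          # each member gets an equal 1/|R| share
        for u in rr:
            score[u] = score.get(u, 0.0) + share
        m += 1
    return {u: n * s / m for u, s in score.items()}
```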

Scalable Algorithm for Shapley Centrality

• Uses a similar algorithmic structure as IMM
• The same algorithmic structure can be used to compute other influence-based centralities, such as Single-Node-Influence centrality and propagation-distance-based centrality [Chen, Teng and Zhang, 2018]
• A big advantage over RIS-based influence maximization algorithms:
  – No memory overhead --- no need to store RR sets:
    • Generate one RR set R; for each node u ∈ R, accumulate its score by 1/|R|

Outline of This Talk

• Basic concepts: influence diffusion models, influence maximization task, submodularity, greedy algorithm • Scalable algorithm based on reverse influence sampling (RIS) • Influence-based centrality measures – Shapley centrality – Single Node Influence (SNI) centrality • Other models and tasks

Example 1: Influence Propagation with Negative Opinions
• Quality factor q
  – If a node is positively influenced, with probability q it turns positive and with probability 1 − q it turns negative
  – Both positive and negative influence propagate as in the IC model
  – Negative influence only activates nodes into the negative state
• Models negative opinion due to quality defects
  – Models negativity bias: people are more likely to believe negative opinions than positive opinions
• Satisfies submodularity; can be made scalable
• [Chen et al. SDM'2011]

Example 2: Influence Blocking Maximization

• Two competitive items A and B – A wants to block the propagation of B as much as possible – Application: rumor control • Competitive diffusion model – Competitive IC model: may not be submodular – Competitive LT model: submodular • [Budak et al. WWW’2011, He et al. SDM’2012]

Example 3: Complementary Diffusion Model
• Two items A and B, with global adoption parameters (GAP)
  – q_{A|∅}: probability of adopting A when nothing has been adopted yet
  – q_{B|∅}: probability of adopting B when nothing has been adopted yet
  – q_{A|B}: probability of adopting A when B is already adopted
  – q_{B|A}: probability of adopting B when A is already adopted
  – q_{A|∅} ≥ q_{A|B} and q_{B|∅} ≥ q_{B|A}: mutually competitive
  – q_{A|∅} ≤ q_{A|B} and q_{B|∅} ≤ q_{B|A}: mutually complementary
• Diffusion follows the IC model
• Self-maximization and complementary-maximization tasks
• Boundary cases are submodular; other cases are not
  – Apply sandwich optimization for non-submodular cases
• [Lu et al. SIGMOD'2016, Zhang and Chen, TCS'2018]

Conclusion and Future Work

• Influence maximization has rich internal problems and external connections to study
  – many optimization, learning, and game-theoretic studies can be instantiated on the influence maximization task
• Many possible new directions beyond those summarized here
  – Non-submodular influence maximization (e.g. [Zhang et al. KDD'14, Chen et al. EC'15, Lu et al. SIGMOD'16, Lin et al. ICDE'17, Li et al. NIPS'18])
  – Influence maximization in dynamic networks
• Influence maximization with phase transition / percolation?
• Needs validation on large-scale real social networks

Reference Resources

• Search "Wei Chen Microsoft" for my papers and talk slides
• Monograph: "Information and Influence Propagation in Social Networks", Morgan & Claypool, 2013
• KDD'12 tutorial on influence spread in social networks
• A recent survey on influence maximization [Li et al. TKDE'2018]

Thanks!

Proof of the theorem

S_0^g = ∅;  s_i: the i-th element selected by the greedy algorithm;  S_i^g = S_{i−1}^g ∪ {s_i}
S*: an optimal set;  S* = {s_1*, …, s_k*};  S_j* = {s_1*, …, s_j*} for 1 ≤ j ≤ k

f(S*) ≤ f(S_i^g ∪ S*)                                           /* by monotonicity */
      ≤ f(S_i^g ∪ {s_k*}) − f(S_i^g) + f(S_i^g ∪ S*_{k−1})      /* by submodularity */
      ≤ f(S_{i+1}^g) − f(S_i^g) + f(S_i^g ∪ S*_{k−1})           /* by the greedy choice */
      ≤ k (f(S_{i+1}^g) − f(S_i^g)) + f(S_i^g)                  /* by repeating the above k times */

Rearranging the inequality: f(S_{i+1}^g) ≥ (1 − 1/k) f(S_i^g) + f(S*)/k.
Multiplying both sides by (1 − 1/k)^{k−i−1} and adding up all inequalities:
f(S_k^g) ≥ Σ_{i=0}^{k−1} (1 − 1/k)^{k−i−1} ⋅ f(S*)/k = (1 − (1 − 1/k)^k) f(S*) ≥ (1 − 1/e) f(S*).

How to Decide the Number of RR Sets --- IMM: Influence Maximization via Martingales
• Input:
  – Graph G = (V, E), |V| = n, |E| = m
  – p(u, v): edge probabilities
  – ε: approximation error
  – ℓ: confidence level
• Phase 1: estimate a lower bound LB of OPT_k, then compute the number of RR sets needed, θ, depending on n, k, ε, ℓ
  – halve the estimate of LB: n/2, n/4, n/8, …
  – use max-cover on the RR sets (as in Phase 3) to check whether LB > n/2^i
• Phase 2: generate a total of θ random RR sets
  – Could reuse the RR sets generated in Phase 1, but this needs a careful calculation (a bug in the original algorithm that I recently fixed)
• Phase 3: use the greedy algorithm to find the set S that (approximately) covers the most RR sets

Shapley Centrality: A Realistic Connection with Node Self-Activation
• Node self-activation scenario: a social network user exposed to some general marketing campaign information can be self-activated, without being selected as a seed
  – Also, activation time may vary, depending on how active each user is
• Modeling self-activation:
  – Each user u has a self-activation rate λ_u:
    • how frequently u logs onto the network
    • Poisson distribution with rate λ_u
  – Each user u has a self-activation probability α_u: when exposed to a marketing campaign, how likely u is to become active and propagate the campaign

Shapley Centrality: A Realistic Connection with Node Self-Activation (Cont'd)
• Suppose we select k seed nodes:
  – Each selected seed node does not change its self-activation behavior
  – But the users they activate downstream are beneficial to us, so we want to maximize the number of downstream users they are the first to activate
• Result 1: If self-activation rates are all equal and self-activation probabilities are 1, the top-k seed nodes are exactly the users with the top-k Shapley centralities
• Result 2: For general self-activation rates and self-activation probabilities, we can extend Shapley centrality and the RIS-based algorithm to do efficient computation
• Result 3: If seed selection changes seed behavior (e.g. boosting its self-activation probability to 1), it becomes a submodular maximization problem; we can still use the RIS approach, but need memory to store the RR sets
• This is ongoing collaboration with Lichao Sun and Albert Chen.
