Exact Computation of Influence Spread by Binary Decision Diagrams

Takanori Maehara 1,2), Hirofumi Suzuki 3), Masakazu Ishihata 3)
1) Shizuoka University 2) RIKEN Center for Advanced Intelligence Project 3) Hokkaido University

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License.
WWW 2017, April 3-7, 2017, Perth, Australia.
ACM 978-1-4503-4913-0/17/04.
http://dx.doi.org/10.1145/3038912.3052567

ABSTRACT

Evaluating influence spread in social networks is a fundamental procedure for estimating the word-of-mouth effect in viral marketing. There is an enormous number of studies on this topic; however, under the standard stochastic cascade models, the exact computation of influence spread is known to be #P-hard. Thus, the existing studies have used Monte-Carlo simulation-based approximations to avoid exact computation.

We propose the first algorithm to compute influence spread exactly under the independent cascade model. The algorithm first constructs binary decision diagrams (BDDs) for all possible realizations of influence spread, then computes influence spread by dynamic programming on the constructed BDDs. To construct the BDDs efficiently, we designed a new frontier-based search-type procedure. The constructed BDDs can also be used to solve other influence-spread related problems, such as random sampling without rejection, conditional influence spread evaluation, dynamic probability update, and gradient computation for probability optimization problems.

We conducted computational experiments to evaluate the proposed algorithm. The algorithm successfully computed influence spread on real-world networks with a hundred edges in a reasonable time, which is infeasible for the naive algorithm. We also conducted an experiment to evaluate the accuracy of the Monte-Carlo simulation-based approximation by comparing it with the exact influence spread obtained by the proposed algorithm.

Keywords

viral marketing; influence spread; enumeration algorithm; binary decision diagram

1. INTRODUCTION

1.1 Background and Motivation

Viral marketing is a strategy to promote products by giving free (or discounted) items to a selected group of highly influential individuals (seeds), in the hope that through word-of-mouth effects, a significant product adoption will occur [8, 27]. To maximize the number of adoptions, Kempe, Kleinberg, and Tardos [19] mathematically formalized the dynamics of information propagation and proposed an optimization problem, referred to as the influence maximization problem. Several cascade models have been proposed, and the most commonly used one is the independent cascade model, proposed by Goldberg, Libai, and Muller [10, 11]. In this model, the individuals are affected by information that is stochastically and independently propagated along edges in the network from the seeds (Section 2.1). To date, significant efforts have been devoted to the development of efficient algorithms for the influence maximization problem [1, 4–7, 25, 26, 31].

Here we consider the computational complexity of the influence maximization problem. Under the independent cascade model, the expected size of influence spread is a non-negative submodular function [19]; thus, a (1 − 1/e)-approximate solution can be obtained by using a greedy algorithm [24]. However, the evaluation of influence spread is #P-hard [4] because it contains the problem of counting s-t connected subgraphs [32]. Thus, all existing studies avoided the exact computation and employed the Monte-Carlo simulation-based approximation, which simulates the dynamics of information propagation sufficiently many times (e.g., Ω(1/ε²)) to obtain an accurate (e.g., 1 ± ε) approximation of influence spread [25] (Section 6).

In this study, we tackle, for the first time, the problem of computing influence spread exactly under the independent cascade model. As the problem is #P-hard, we are interested in an algorithm that runs on small real-world networks (i.e., having a few hundred edges) in a reasonable time. The motivations for this study are as follows.

• Influence spread over small networks is practically important. Because real social networks often consist of many small communities, it is reasonable to consider each community separately or consider only the inter-community network.

• When we wish to rank vertices according to their influence spread, we need to compute the values accurately. Monte-Carlo simulation cannot be used for this purpose because it requires Ω(1/ε²) samples for a 1 ± ε approximation; thus, ε < 10⁻⁵ is impossible in practice. On the other hand, an exact method can be used because its complexity does not depend on the desired accuracy.

• Exact influence spread helps to analyze the quality of Monte-Carlo simulation. Although many experiments using Monte-Carlo simulation have been conducted, none have been compared with the exact value because there is no algorithm that can compute this value.

• Establishing a practical algorithm for the fundamental #P-hard problem is an interesting and important task in computer science.

1.2 Contributions

In this study, we provide the following contributions.

• We propose an algorithm to compute influence spread exactly under the independent cascade model. Note that this is the first attempt to compute this value exactly (Section 3).

• The proposed algorithm enumerates all spread patterns using binary decision diagrams (BDDs). Then, it computes influence spread by dynamic programming on the BDDs. Here, we have designed a new frontier-based search method, which constructs the BDD for s-t connected subgraphs efficiently (Section 3.2). This is the main technical contribution of this study.

• We conducted computational experiments to evaluate the proposed algorithm (Section 5). We obtained the exact influence on real-world and synthetic networks with a hundred edges in reasonable times. We also compared the obtained exact influence with the one obtained using the Monte-Carlo simulation.

In addition, using the constructed BDDs, we can also solve the following influence-spread related problems (Section 4).

• Random sampling from the set of realizations that successfully propagate information helps to understand the route of influence spread. We can perform this without rejection by using the BDD.

• The conditional expectation of the influence spread under the influenced (and non-influenced) conditions on some vertices can be used to measure the effect of a conducted viral promotion from a small number of observations. This value is efficiently computed by the BDDs.

• When the activation probability changes, we can efficiently update the influence spread.

• The derivatives of the influence spread with respect to the activation probabilities can be computed. This is used to implement a gradient method for the influence spread optimization problem.

2. PRELIMINARIES

2.1 Independent Cascade Model for Influence Spread

The independent cascade model [10, 11] is the most commonly used stochastic cascade model for social networks.

Let G = (V, E) be a directed graph with vertices V and edges E. Each edge e ∈ E has activation probability p(e). Each vertex is either active or inactive. Note that inactive vertices may become active, but not vice versa. Here, an active vertex is considered "influenced."

Suppose that information is propagated from S ⊆ V, which is called the seed set. Initially, all vertices are inactive. Then, propagation over the network is performed as follows. First, each seed u ∈ S is activated. When u first becomes active, it is given a single chance to activate each currently inactive neighbor v with probability p((u, v)). This process is repeated until no further activations are possible. The expected number of activated vertices after the end of the process is called the influence spread, which is denoted as σ(S).

There is a useful interpretation of influence spread with this model. We select each edge e ∈ E independently with probability p(e). Then, we obtain an edge set F. We then consider the induced subgraph G[F] = (V, F), which is a network consisting of only the selected edges. Here, let σ(S; F) be the number of vertices reachable from some u ∈ S on G[F]. Then, we obtain the following:

\[ \sigma(S) = \mathbb{E}[\sigma(S; F)] = \sum_{F \subseteq E} \sigma(S; F)\, p(F) \tag{1} \]

where

\[ p(F) = \prod_{e \in F} p(e) \prod_{e' \in E \setminus F} \bigl(1 - p(e')\bigr). \tag{2} \]

We use this formula to compute the influence spread.

2.2 Binary decision diagram

As discussed in Section 3, the exact evaluation of (1) involves enumerating S-t connecting subgraphs, which are the subgraphs having a path from S to t. To maintain exponentially many such subgraphs, we use the binary decision diagram (BDD), which is a data structure that represents a Boolean function compactly based on Shannon decomposition. Note that a Boolean function can be used to represent a set family as its indicator function.

A BDD is a directed acyclic graph D = (N, A) with node set N and arc set A.¹ It has two terminals 0 and 1. Each non-terminal node α ∈ N is associated with a variable e ∈ E, and has two arcs called the 0-arc and the 1-arc. The nodes pointed to by the 0-arc and the 1-arc are referred to as the 0-child and the 1-child (denoted by α0 and α1), respectively. A BDD represents a Boolean function as follows: A path from the root node to the 1-terminal represents a (possibly partial) variable assignment for which the represented Boolean function is True. As the path descends to a 0-arc (1-arc) from a node, the node's variable is assigned to False (True).

A special type of BDD, i.e., the reduced ordered binary decision diagram (ROBDD) [3], is frequently used in practice. A BDD is ordered if different variables appear in the same order on all paths from the root. A BDD is reduced if the

¹ To avoid confusion, we use the terms "vertex" and "edge" to refer to a vertex and edge in the original graph G, and "node" and "arc" to refer to a vertex and edge in the BDD D.
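As an illustration of the interpretation in Section 2.1, equations (1) and (2) can be evaluated directly on a very small graph by summing over every edge subset F. The sketch below is a naive exponential-time enumeration, not the BDD-based algorithm proposed in this paper; the function names and the three-vertex example graph are assumptions introduced only for this example.

```python
from itertools import product


def reachable_count(vertices, live_edges, seeds):
    """Return sigma(S; F): the number of vertices reachable from the seeds in G[F]."""
    adj = {v: [] for v in vertices}
    for u, v in live_edges:
        adj[u].append(v)
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)


def exact_influence_spread(vertices, prob, seeds):
    """Evaluate equation (1) by enumerating all 2^|E| edge subsets F."""
    edges = list(prob)
    total = 0.0
    for mask in product([False, True], repeat=len(edges)):
        p_f = 1.0  # p(F) from equation (2)
        live = []
        for e, chosen in zip(edges, mask):
            if chosen:
                p_f *= prob[e]
                live.append(e)
            else:
                p_f *= 1.0 - prob[e]
        total += reachable_count(vertices, live, seeds) * p_f
    return total


# Hypothetical example: a -> b, b -> c, a -> c, each with probability 0.5.
vertices = ["a", "b", "c"]
prob = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}
sigma = exact_influence_spread(vertices, prob, {"a"})
```

Because the loop visits 2^|E| subsets, this approach is only usable for a handful of edges, which is the limitation that motivates a compact representation such as the BDD.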

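The BDD semantics described in Section 2.2 can be sketched with a small hand-built diagram. The example below assumes a hypothetical three-edge graph with edge variables "ac", "ab", and "bc", for which a and c are connected when "ac" is selected or both "ab" and "bc" are selected. The `Node` class, the function names, and the memoized recursion are illustrative assumptions; the recursion is a generic Shannon-decomposition computation of the probability that the function is True, not necessarily the dynamic programming used by the authors.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    """Non-terminal BDD node; the terminals are the Python values False and True."""
    var: str
    lo: "Union[Node, bool]"  # 0-child: followed when var is False
    hi: "Union[Node, bool]"  # 1-child: followed when var is True


def evaluate(node, assignment):
    """Follow 0-arcs and 1-arcs from the root until a terminal is reached."""
    while isinstance(node, Node):
        node = node.hi if assignment[node.var] else node.lo
    return node


def probability(node, p, memo=None):
    """Probability that the function is True when each variable is True with p[var]."""
    if memo is None:
        memo = {}
    if isinstance(node, bool):
        return 1.0 if node else 0.0
    key = id(node)
    if key not in memo:
        memo[key] = ((1.0 - p[node.var]) * probability(node.lo, p, memo)
                     + p[node.var] * probability(node.hi, p, memo))
    return memo[key]


# Ordered BDD with variable order ac, ab, bc for: ac OR (ab AND bc).
n_bc = Node("bc", False, True)
n_ab = Node("ab", False, n_bc)
root = Node("ac", n_ab, True)

p = {"ac": 0.5, "ab": 0.5, "bc": 0.5}
prob_connected = probability(root, p)
```

Note that the path through the 1-arc of the root reaches the 1-terminal without testing "ab" or "bc", which is an instance of the partial assignment mentioned in the text; the skipped variables contribute a factor of 1 to the probability.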