Index

G(n, m) model, 107
G(n, p) model, 106
k-clan, 183
k-clique, 182
k-club, 183
k-means, 156
k-plex, 180

actor, see node
Adamic and Adar measure, 324
adjacency list, 32
adjacency matrix, 31
adopter types, 229
adoption of hybrid seed corn, 229
affiliation network, 44
Anderson, Lisa R., 218
Asch conformity experiment, 216
Asch, Solomon, 216
assortativity, 255
  measuring, 257
  nominal attributes, 258
  ordinal attributes, 261
  significance, 258
attribute, see feature
average path length, 102, 105

behavior analysis methodology, 320
betweenness centrality, 78
BFS, see breadth-first search
big data, 13
big data paradox, 13
bipartite graph, 44
botnet, 171
breadth-first search, 47
bridge, 45
bridge detection, 60

cascade maximization, 225
causality testing, 321
centrality, 70
  betweenness centrality, 78
    computing, 78
  closeness centrality, 80
  degree centrality, 70
    gregariousness, 70
    normalized, 71
    prestige, 70
  eigenvector centrality, 71
  group centrality, 81
    betweenness, 82
    closeness, 82

    degree, 81
  Katz centrality, 74
    divergence in computation, 75
  PageRank, 76
Christakis, Nicholas A., 245
citizen journalist, 11
class, see class attribute
class attribute, 133
clique identification, 178
clique percolation method, 180
closeness centrality, 80
cluster centroid, 156
clustering, 155, 156
  k-means, 156
  hierarchical, 191
    agglomerative, 191
    divisive, 191
  partitional, 156
  spectral, 187
clustering coefficient, 84, 102
  global, 85
  local, 86
cohesive subgroup, see community
cold start problem, 286
collaborative filtering, 289, 323
  memory-based, 289
    item-based, 292
    user-based, 290
  model-based, 289, 293
collective behavior, 328
collective behavior modeling, 333
common neighbors, 324
community, 171
  detection, 175
  emic, see explicit
  etic, see implicit
  evolution, 193
  explicit, 173
  implicit, 174
community detection, 175
  group-based, 175, 184
  member-based, 175, 177
    node degree, 178
    node reachability, 181
    node similarity, 183
community evaluation, 200
community membership behavior, 317
commute time, 327
confounding, 255
contact network, 237
content data, 316
content-based recommendation, 288
cosine similarity, 92, 183, 291
covariance, 94, 261

data point, see instance
data preprocessing, 138
  aggregation, 138
  discretization, 138
  feature extraction, 138
  feature selection, 138
  sampling, 138
    random, 139
    stratified, 139
    with/without replacement, 139
data quality, 137
  duplicate data, 137
  missing values, 137
  noise, 137
  outliers, 137
data scientist, 12

data sparsity, 287
decision tree learning, 141
degree, 25, 28
  distribution, 29
degree centrality, 70, 264
  gregariousness, 70
  prestige, 70
degree distribution, 29, 102
dendrogram, 191
densification, 194
depth-first search, 46
DFS, see depth-first search
diameter, 40
  shrinkage, 196
diffusion of innovation, 214
diffusion of innovations, 228
Dijkstra, Edsger, 49
Dijkstra’s algorithm, 49
diminishing returns, 318
directed graph, 28, 33
distinguishing influence and homophily, 276
  edge-reversal test, 277
  randomization test, 278
  shuffle test, 276

early adopters, 229
early majority, 229
Eckart-Young-Mirsky theorem, 294
edge, 27
  betweenness, 191
  directed, 27
  incident, 35
  loop, 28, 32
  self-link, 32
  signed, 34
  traverse, see visit
  undirected, 27
  visit, 36
edge list, 32
edge-reversal test, 277
eigenvector centrality, 71
entropy, 143
epidemics, 214, 236
  infected, 238
  recovered, 238
  removed, 238
  SI model, 239
  SIR model, 241
  SIRS model, 244
  SIS model, 242
  susceptible, 238
Erdős, Paul, 107
Euclidean distance, 155
evaluation dilemma, 13
evolutionary clustering, 198
external-influence model, 232

F-measure, 202, 307
false negative, 201
false positive, 201
feature, 133
  categorical, see nominal
  continuous, 133
  discrete, 133
  interval, 134
  nominal, 134
  ordinal, 134
  ratio, 134
feature selection, 319
Ford-Fulkerson algorithm, 55
Frobenius norm, 293
fully-mixed technique, 237

giant component, 108, 194

Gilbert, Edgar, 106
Girvan-Newman algorithm, 191
global clustering coefficient, 85
Granger causality, 321
Granger, Clive W. J., 321
graph
  k-connected, 188
  bipartite graph, 44
  bridge, 45
    detection, 60
  complete graph, 42
  component, 38
  connected, 38
  connectivity, 35
    circuit, see tour
    cycle, 36
    path, 36
    tour, 36
    trail, 36
    walk, 36
  cut, 185
  densification, 194
  density, 190
  diameter, 40
  directed, 28
  directed graph, 33
  edge, 25, 27
  empty graph, 33
  forest, 40
  minimum cut, 185
  mixed graph, 33
  multigraph, 34
  node, 25, 26
  null graph, 33
  partition, 44
  planar graph, 43
  regular graph, 44
  representation
    adjacency list, 32
    adjacency matrix, 31
    edge list, 32
  shortest path, 39
    Dijkstra’s algorithm, 49
  signed graph, 34
  simple graph, 34
  strongly connected, 38
  subgraph, 30
    minimum spanning tree, 41
    spanning tree, 41
    Steiner tree, 42
  traversal, 45
    breadth-first search, 47
    depth-first search, 46
  tree, 40
  undirected graph, 33
  weakly connected, 38
  weighted graph, 34
graph Laplacian, 187
graph traversal, 45
  breadth-first search, 47
  depth-first search, 46
group, see community
group centrality, 81
  betweenness, 82
  closeness, 82
  degree, 81

herd behavior, 214
hitting time, 327
Holt, Charles A., 218
homophily, 255, 274
  measuring, 274
  modeling, 274

ICM, see independent cascade model

in-degree, 28
independent cascade model, 222
individual behavior, 315
  user-community behavior, 316
  user-entity behavior, 316
  user-user behavior, 316
individual behavior modeling, 322
influence, 255, 264
  measuring, 264
    observation-based, 264
    prediction-based, 264
  modeling, 269
influence flow, 266
influence modeling, 269
  explicit networks, 269
  implicit networks, 271
information cascade, 214, 221, 323
information pathways, 78
information provenance, 249
innovation characteristics, 228
innovators, 229
instance, 133
  labeled, 133
  unlabeled, 133
internal-influence model, 232
intervention, 214, 220, 227, 235, 245

Jaccard similarity, 92, 183, 324

Katz centrality, 74
  divergence in computation, 75
Katz model, 230
Katz, Elihu, 230
KDD process, see knowledge discovery in databases
Kendall’s tau, 308
knowledge discovery in databases, 131
Kronecker delta function, 258

labeled data, 140
laggards, 229
Laplacian, 187
largest connected component, 108
late majority, 229
levels of measurement, 133
LIM, see linear influence model
linear influence model, 269, 334
linear threshold model, 269, 322
link data, 316
link prediction, 299, 323
  node neighborhood-based methods, 324
  path-based methods, 326
local clustering coefficient, 86
logistic function, 276
logistic growth function, 240
logistic regression, 333
logit function, 152
loop, 28, 32
low-rank matrix approximation, 293
LTM, see linear threshold model

MAE, see mean absolute error
Mahalanobis distance, 155
matrix factorization, 301
maximum bipartite matching, 57
mean absolute error, 305
measurement, see feature
measuring homophily, 274
Milgram, Stanley, 105
minimum spanning tree, 41

  Prim’s algorithm, 52
mixed graph, 33
mixed-influence model, 232
modeling homophily, 274
modularity, 189, 259
multi-step flow model, 230
mutual information, 203

naive Bayes classifier, 144
nearest neighbor classifier, 146
neighborhood, 28
network flow, 53
  augmenting path, 56
  capacity, 54
  capacity constraint, 54
  flow, 54
  flow conservation constraint, 54
  flow network, 54
  Ford-Fulkerson algorithm, 55
  residual capacity, 56
  residual network, 56
  source and sink, 54
network measure, 69
network segmentation, 194
NMI, see normalized mutual information
node, 26
  adjacent, 35
  connected, 38
  degree, 25, 28
  in-degree, 28
  neighborhood, 28
  out-degree, 28
  reachable, 38
noise removal fallacy, 13
normalized cut, 185
normalized mean absolute error, 305
normalized mutual information, 203
nuke attack, 287

observation, see instance
obtaining sufficient samples, 13
out-degree, 28
overfitting, 301

PageRank, 76, 264
parametric estimation, 273
Pearson correlation coefficient, 94, 263, 290
Perron-Frobenius theorem, 72
Poisson distribution, 112
Power Iteration Method, 98
power-law distribution, 30, 103, 272, 276, 333
precision, 201, 307
preferential attachment measure, 325
preferential attachment model, 333
prestige, 70
prominence, see prestige
purity, 203
push attack, 287

quasi-clique, 191

random graph, 106
  average path length, 114
  clustering coefficient, 112
  degree distribution, 111
  evolution, 108
  phase transition, 109

random walk, 36
randomization test, 278, 321
rank correlation, 268
Rapoport, Anatol, 106
ratio cut, 185
raw data, 131
recall, 201, 307
reciprocity, 87
recommendation to groups, 297
  least misery, 297
  maximizing average satisfaction, 297
  most pleasure, 298
recommender systems, 285
regression, 322
regular equivalence, 94, 184
regular graph, 44
  ring lattice, 115
regular ring lattice, 115
regularization term, 302
relationships, see edge
relaxing cliques, 180
  k-plex, 180
RMSE, see root mean squared error
Rogers, Everett M., 228
root mean squared error, 305
rooted PageRank, 327
Rényi, Alfred, 107

scale-free network, 105
self-link, see loop, 32
sender-centric model, 222
sentiment analysis, 164
shortest path, 39
  Dijkstra’s algorithm, 49
shuffle test, 276
SI model, 239
similarity, 91
  cosine similarity, 92
  Jaccard similarity, 92, 94
  regular equivalence, 94
  structural equivalence, 92
SimRank, 328
singleton, 194
singular value decomposition, 152, 293
SIR model, 241
SIRS model, 244
SIS model, 242
six degrees of separation, 105
small-world, 105
small-world model, 115, 333
  average path length, 118
  clustering coefficient, 118
  degree distribution, 117
Social Atom, 12
social balance, 89
social correlation, 276
social media, 11
social media mining, 12
Social Molecule, 12
social network, 25
social similarity, 255
social status, 35, 90
social tie, see edge
sociomatrix, see adjacency matrix
Solomonoff, Ray, 106
sparse matrix, 32
Spearman’s rank correlation coefficient, 268, 308
spectral clustering, 187
star, 194
Stevens, Stanley Smith, 133
Strogatz, Steven, 115

structural equivalence, 92
submodular function, 225
supervised learning, 140
  classification, 141
    decision tree learning, 141
    naive Bayes classifier, 144
    nearest neighbor classifier, 146
    with network information, 147
  evaluation
    k-fold cross validation, 154
    accuracy, 154
    leave-one-out, 154
  evaluations, 153
  regression, 141, 150
    linear, 151
    logistic, 152
SVD, see singular value decomposition

tabular data, 133
tag clouds, 205
target data, 131
term frequency-inverse document frequency, 135
test data, 140
TF-IDF, see term frequency-inverse document frequency
theory of scales, 133
training data, see labeled data
transitivity, 83
  clustering coefficient, 84
    global, 85
    local, 86
Trotter, Wilfred, 216
true negative, 201
true positive, 201

undirected graph, 33
unsupervised learning, 155
  evaluation, 158
    cohesiveness, 159
    separateness, 159
    silhouette index, 160
user migration, 329
user-item matrix, 289

vector-space model, 135
vectorization, 135
vertex, see node

Watts, Duncan J., 115

Zachary’s karate club, 173
