Index

A-Priori Algorithm, 224, 225, 231 Archive, 144 Accessible page, 199 Ask, 204 Accumulated sum, 513 Association rule, 217, 219 Accuracy impurity, 508 Associativity, 27, 45 Action, 44 Attribute, 33 Activation function, 531 Auction, 305 Activation map, 549 Augmentation of data, 567 Active learning, 470 Austern, M.H., 79 Ad-hoc query, 146 Authority, 204 Adjacency matrix, 375 Average, 156 Adomavicius, G., 352 Advertising, 17, 128, 216, 293 B-tree, 292 Adwords, 302 Babcock, B., 174, 292 Affiliation-graph model, 383 Babu, S., 174 Afrati, F.N., 77, 78, 426 , 539 Agglomerative clustering, see Hierar- Backpropagation through time, 561 chical clustering Backstrom, L., 426 Aggregation, 34, 38 Bad point, 489 Agrawal, R., 250 Bag, 40, 85 Alexandrov, A., 78 Balance Algorithm, 305 Algorithm, 1 Balazinska, M., 78 All-pairs problem, 64, 68 Band, 100 Alon, N., 174 Bandwidth, 22 Alon-Matias-Szegedy Algorithm, 158 Basket, see Market basket, 214, 216, Amplification, 113 217, 246 Analytic query, 60 Batch gradient descent, 495 AND-construction, 113 Batch learning, 469 Anderson, C., 352, 353 Bayes net, 5 Andoni, A., 141 BDMO Algorithm, 283 ANF, see Approximate neighborhood Beer and diapers, 218 function Bell, R., 353 ANF Algorithm, 419 Bellkor’s Pragmatic Chaos, 322 Apache, 24, 25, 78, 79 Bengio, Y., 569 Approximate neighborhood function, 419 Berges, C.J.C., 570 Arbib, M., 569 Bergmann, R., 78 Arc, 405 Berkhin, P., 212


Berrar, D.P., 461 Budiu, M., 79 Betweenness, 363 Bulk-synchronous system, 51 BFR Algorithm, 266, 269 Burges, C.J.C., 521 BFS, see Breadth-first search Burrows, M., 78 Bi-clique, 370 Bucketing, 559 Bias vector, 530 Bickson, D., 79 Caffe, 530, 569 Bid, 303, 305, 312, 313 Candidate itemset, 227, 240 Big data, 1 Candidate pair, 81, 100, 231, 234 BigTable, 77 Carey, M., 77, 78 Bik, A.J.C., 79 Categorical feature, 464, 511, 517 Binary Classification, 464 Cell state, 562 Biomarker, 217 Centroid, 255, 258, 264, 267, 271 Bipartite graph, 299, 359, 369, 370 Chabbert, M., 353 BIRCH Algorithm, 292 Chain rule, 542 Birrell, A., 79 Chandra, T., 78 Bitmap, 232, 233 Chang, F., 78 Block, 13, 21, 192 Channel, 548 Blockeel, H., 521 Characteristic matrix, 89 Blocking property, 43, 49 Charikar, M.S., 141 Blog, 200 Bloom filter, 152, 230 Chaudhuri, S., 141 Bloom, B.H., 174 Checkpoint, 52 Blum, A., 521 Chen, M.-S., 250 Bohannon, P., 78 Child, 364 Boldi, P., 426 Cholera, 4 Bonferroni correction, 6 Chowdhury, M., 79 Bonferroni’s principle, 5, 6 Chronicle data model, 173 Bookmark, 198 Chunk, 24, 240, 270 Boral, H., 427 CineMatch, 349 Borkar, V., 77, 78 Class, 464 Boser, B., 570 Classification loss, 537 Bottou, L., 521 Classifier, 330, 463 BPTT, see Backpropagation through Click stream, 145 time Click-through rate, 297, 303 Bradley, P.S., 292 Clique, 369 Breadth-first search, 363 Cloud computing, 17 Breiman, L., 20 Cluster computing, 21, 22 Brick-and-mortar retailer, 216, 320, 321 Cluster tree, 278, 279 Brin, S., 212 Clustera, 77 Broad matching, 305 Clustering, 4, 17, 253, 337, 355, 361, Broder, A.Z., 20, 141, 212 463 Bu, Y., 78 Clustroid, 258, 264 Bucket, 10, 149, 164, 168, 230, 283 CNN, see Convolutional neural network Budget, 304, 311 Collaboration network, 358

Collaborative filtering, 5, 18, 84, 293, Cyclic permutation, 99 319, 333, 359 Cylinder, 13 Colossus, 24 Czajkowski, G., 79 Column-orthonormal matrix, 443 Combiner, 27, 189, 191 DAG, see Directed acyclic graph Communication cost, 22, 54, 403 Darts, 152 Community, 18, 355, 366, 368, 399 Das Sarma, A., 78 Commutativity, 27, 45 Das, T., 79 Competitive ratio, 17, 298, 301, 306 Dasgupta, A., 426 Complete graph, 369, 370 , 1 Compressed set, 270 , 1 Compute graph, 540 Data stream, 17, 244, 282, 296, 484 Compute node, 21, 22 Data-stream-management system, 144 Computer game, 327 , 17 Concept, 444 Datar, M., 142, 174, 292 Concept space, 449 Datar-Gionis-Indyk-Motwani Algorithm, Conductance, 397 163 Confidence, 218, 219 Dave, A., 79 Content-based recommendation, 319, De Raedt, L., 521 324 Dead end, 179, 182, 183, 205 Convergence, 475 Dean, J., 78 Convexity, 517 Decaying window, 169, 246 Convolutional layer, 549, 551 Decision forest, 515, 518 Convolutional neural network, 18, 523, Decision tree, 18, 330, 464, 467, 468, 527, 548 505, 518 Cooper, B.F., 78 , 3, 18, 523 Coordinates, 254 Deerwester, S., 460 Cortes, C., 521, 570 Degree, 371, 400 Cosine distance, 107, 117, 325, 330, Degree matrix, 376 450 Dehnert, J.C., 79 Counting ones, 162, 283 del.icio.us, 326, 359 Covering an output, 66 Deletion, 108 Craig’s List, 294 Denker, J.S., 570 Craswell, N., 317 Dense matrix, 31, 452 Credit, 364 Density, 263, 265 Cristianini, N., 521 Density of edges, 396 Cross entropy, 533, 538 Depth-first search, 417 Cross-Validation, 469 Determinant, 431 Crowdsourcing, 470 Development set, 468 CUR-decomposition, 429, 452 DeWitt, D.J., 78 CURE Algorithm, 274, 278 DFS, see Distributed file system Currey, J., 79 Diagonal matrix, 443 Curse of dimensionality, 256, 280, 503, Diameter, 263, 265, 406 518 Diapers and beer, 216 Cut, 374, 375 Difference, 34, 37, 41

Dimension table, 60 Ethernet, 21, 22 , 18, 340, 429, Euclidean distance, 105, 119, 502 503 Euclidean space, 105, 109, 254, 255, 258, 274 Directed acyclic graph, 364 Directed graph, 405 Ewen, S., 78 Discard set, 270 Exclusive-or, 529 Disk, 13, 221, 255, 278 Explainability, 3 Disk block, see Block Exploding derivative, 531 Display ad, 294, 295 Exploding gradient, 562 Distance measure, 105, 253, 361, 538 Exponential linear unit, see ELU Distinct elements, 154, 157 Exponentially decaying window, see De- Distributed file system, 21, 24, 214, caying window 221 Extrapolation, 501 DMOZ, see Open directory Document, 82, 86, 217, 254, 313, 325, Facebook, 18, 198, 356 326, 466 Fact table, 60 Document frequency, see Inverse doc- Failure, 23, 30, 43, 51 ument frequency Faloutsos, C., 427, 461 Domain, 202 False negative, 100, 111, 239 Dot product, 107, 544 False positive, 100, 111, 152, 239 Drineas, P., 460 Family of functions, 112 Dropout, 566 Fang, M., 250 Dryad, 77 Fayyad, U.M., 292 DryadLINQ, 77 Feature, 278, 324–326 Dual construction, 360 , 470 Dubitzky, W., 461 Feature vector, 464, 517 Dumais, S.T., 460 Feedforward network, 531 Dup-elim task, 49 Fetterly, D., 79 Dying ReLU, 535 Fikes, A., 78 File, 23, 24, 221, 239 e, 13 Filter, 44, 549 Edit distance, 108, 110 Filtering, 151 Eigenpair, 431 Fingerprint, 125 Eigenvalue, 179, 377, 430, 440, 441 First-price auction, 305 Eigenvector, 179, 377, 430, 435, 440 Fixedpoint, 114, 204 ELU, 535 Flajolet, P., 174 Email, 358 Flajolet-Martin Algorithm, 155, 419 Energy, 448 Flatmap, 44 Ensemble, 331 Flattening, 547 Ensemble methods, 516 Flink, 77, 78 Entity resolution, 122, 123 Flow graph, 42 Entropy, 508, 537 Forget gate, 563 Equijoin, 34 Fortunato, S., 426 Erlingsson, I., 79 Fotakis, D., 426 Ernst, M., 78 Franklin, M.J., 79

French, J.C., 292 GPU, see Graphics processing unit, 556 Frequent bucket, 231, 233 Gradient descent, 18, 48, 348, 386, 492, Frequent itemset, 5, 214, 224, 226, 370, 526, 529, 531, 539 463 Granzow, M., 461 Frequent pairs, 225 Graph, 51, 64, 356, 399, 406 Frequent-items table, 226 Graphics processing unit, 530 Freund, Y., 521 GraphLab, 77 Freytag, J.-C., 78 GraphX, 79 Friends, 356 Greedy algorithm, 296, 297, 300, 304 Friends relation, 59 GRGPF Algorithm, 278 Frieze, A.M., 141 GroupByKey, 45 Frobenius norm, 433, 447 Grouping, 26, 34, 38 Fukushima, K., 569 Grouping attribute, 34 Fully connected layer, 527 Groupon, 359 Furnas, G.W., 460 Grover, R., 78 GRU, see Gated recurrent unit Gaber, M.M., 20 Gruber, R.E., 78 Ganti, V., 141, 292 Guestrin, C., 79 Garcia-Molina, H., 20, 212, 250, 292, Guha, S., 292 427 Gunda, P.K., 79 Garofalakis, M., 174 Gyongi, Z., 212 Gate, 562 Gated recurrent unit, 564 Hadamard product, 544, 563 Gaussian elimination, 180 Hadoop, 25, 79 Gehrke, J., 174, 292 Hadoop distributed file system, 24 Generalization, 469 HaLoop, 51, 77 Generated subgraph, 370 Hamming distance, 74, 109, 117 Genre, 324, 336, 350 Hanazawa, T., 570 GFS, see Google file system Harris, M., 350 Ghemawat, S., 78 Harshman, R., 460 Gibbons, P.B., 174, 427 Hash function, 10, 87, 92, 100, 149, GINI impurity, 508 152, 155 Gionis, A., 142, 174 Hash join, 37 Giraph, 77, 78 Hash key, 10, 312 Girvan, M., 426 Hash table, 10, 12, 13, 223, 230, 233, Girvan-Newman Algorithm, 363 234, 312, 314, 400 Global minimum, 342 Haveliwala, T.H., 212 GN Algorithm, see Girvan-Newman Al- HDFS, see Hadoop distributed file sys- gorithm tem Gobioff, H., 78 Head, 413 Golub, G.H., 460 Heavy hitter, 400 Gonzalez, J., 79 Heise, A., 78 Google, 176, 187, 302 Hellerstein, J.M., 79 Google file system, 24 Henderson, D., 570 Google+, 356 Henzinger, M., 142

Hidden layer, 525 Input, 64, 464 Hidden state, 558 Input gate, 563 , 255, 257, 275, Input layer, 525 338, 361 Insertion, 108 Hinge loss, 491 Instance-based learning, 467 Hinton, G.E., 569, 570 Interest, 218 HITS, 204 Internet Movie Database, 324, 350 Hive, 77, 79 Interpolation, 501 Hochreiter, S., 569 Intersection, 34, 36, 40, 85 Hoger, M., 78 Into Thin Air, 323 Hopcroft, J.E., 417 Inverse document frequency, 9, see TF.IDF Hopfield, J.J., 569 Inverted index, 176, 294 Horn, H., 79 Ioannidis, Y.E., 426 Howard, R.E., 570 IP packet, 145 Howe, B., 78 Isard, M., 79 Hsieh, W.C., 78 Isolated component, 182 Hub, 204 Item, 214, 216, 217, 320, 336, 337 Hubbard, W., 570 Item profile, 324, 327 Huber loss, 536 Itemset, 214, 222, 224 Hueske, F., 78 Hyperbolic tangent, 532, 562 Jaccard distance, 104, 106, 112, 325, Hyperlink-induced topic search, see HITS 504 Hyperparameter, 535 Jaccard similarity, 82, 91, 104, 199 Hyperplane, 486 Jacobian, 541 Hyracks, 77 Jacobsen, H.-A., 78 Jagadish, H.V., 174 Identical documents, 130 Jahrer, M., 353 Identity matrix, 431 Jeh, G., 426 IDF, see Inverse document frequency Joachims, T., 521 Image, 145, 325, 326 Join, see Natural join, 45, see Multi- ImageNet, 548, 553, 569 way join, see Star join, 401 IMDB, see Internet Movie Database Join task, 49 Imielinski, T., 250 Immediate subset, 242 K-means, 266 Immorlica, N., 142 K-partite graph, 359 Important page, 176 Kahan, W., 460 Impression, 294 Kalyanasundaram, B., 318 Impurity, 507 Kamm, D., 350 In-component, 181 Kang, U., 427 Inaccessible page, 199 Kannan, R., 460 Independent rows or columns, 443 Kao, O., 78 Index, 11, 400 Karlin, A., 298 Indyk, P., 141, 142, 174 Kaushik, R., 141 Information integration, 6 Kautz, W.H., 174 Initialize clusters, 267 Kernel, 552

Kernel function, 498, 502 Livny, M., 292 Key component, 149 Local minimum, 342 Key-value pair, 25, 27 Locality, 356 Keyword, 303, 331 Locality-sensitive family, 116 KL-divergence, 538 Locality-sensitive function, 112 Kleinberg, J.M., 212 Locality-sensitive hashing, 1, 81, 100, Knuth, D.E., 20 111, 326, 504 Koren, Y., 353 Log likelihood, 387 Krioukov, A., 78 Logarithm, 13 Krizhevsky, A., 569 Logistic sigmoid, 532 Kumar, R., 20, 79, 212, 426 Long short-term memory network, 18, Kumar, V., 20 523, 562 Kyrola, A., 79 Long tail, 216, 320, 321 Longest common subsequence, 108 Label, 356, 464 Low, Y., 79 Lagrangean multipliers, 58 Lower bound, 68 Landauer, T.K., 460 Lower hyperplane, 487 Lang, K.J., 426, 570 LSH, see Locality-sensitive hashing Laplacian matrix, 376 LSTM, see Long short-term memory Layer, 523 network Lazy evaluation, 46 LCS, see Longest common subsequence Ma, J., 79 Leaf, 365 , 2, 18, 330, 463 Leaky ReLU, 535 Maggioni, M., 460 , 472, 545 Maghoul, F., 20, 212 LeCun, Y., 569, 570 Mahalanobis distance, 273 Leich, M., 78 Mahoney, M.W., 426, 460 Leiser, N., 79 Main memory, 221, 222, 230, 255 Length, 158, 405 Malewicz, G., 79 Length indexing, 131 Malik, J., 427 Leser, U., 78 Manber, U., 142 Leskovec, J., 426, 427 Manhattan distance, 106 Leung, S.-T., 78 Manning, C.P., 20 Li, P., 142 Many-many matching, 126 Likelihood, 381 Many-many relationship, 64, 214 Lin, S., 142 Many-one matching, 126 Linden, G., 353 Map, 44 Lineage, 47 Map task, 25, 27 Linear equations, 180 Map worker, 28, 30 Linear separability, 471, 475 Mapping schema, 65 Linear transitive closure, 410, 415 MapReduce, 21, 25, 30, 189, 191, 241, Link, 33, 176, 190 287, 401, 408, 483 Link matrix of the Web, 205 Margin, 485 Link spam, 195, 199 Market basket, 5, 17, 213, 214, 221 Littlestone, N., 521 Markl, V., 78

Markov process, 179, 182, 390 Modeling, 1 Martin, G.N., 174 Moments, 157 Master controller, 25, 27, 28 Monotonicity, 224 Master node, 24 Montavon, G., 521 Matching, 299 Moore-Penrose pseudoinverse, 453 Matias, Y., 174 Most-common elements, 169 Matrix, 31, see Transition matrix of Motwani, R., 142, 174, 250, 292 the Web, see Stochastic ma- Mueller, K.-R., 521 trix, see Substochastic matrix, Multiclass classification, 464, 479 189, 204, see Utility matrix, Multidimensional index, 503 340, see Adjacency matrix, see Multihash Algorithm, 234 Degree matrix, see Laplacian Multiplication, 31, see Matrix multi- matrix, see Symmetric matrix plication, 189, 204 Matrix multiplication, 38, 39, 48, 69 Multiset, see Bag Matrix of distances, 441 Multistage Algorithm, 232 Matthew effect, 16 Multiway join, 56, 402 Max pooling, 553 Mumick, I.S., 174 Maximal itemset, 224 Mutation, 111 Maximal matching, 299 Maximum-likelihood estimation, 381 Name node, see Master node McAuley, J., 427 Natural join, 34, 37, 38, 55 McCauley, M., 79 Naughton, J.F., 78 Mean, see Average Naumann, F., 78 Mean squared error, 536 Navathe, S.B., 250 Mechanical Turk, 470 Near-neighbor search, see Locality-sens- Median, 156 itive hashing Mehta, A., 318 Nearest neighbor, 18, 464, 468, 497, Melnik, S., 427 518 Merging clusters, 258, 261, 272, 276, Negative border, 242 281, 285 Negative example, 472 Merton, P., 20 Neighbor, 390 Miller, G.L., 427 Neighborhood, 406, 418 Minhashing, 82, 90, 103, 106, 113, 326 Neighborhood profile, 406 Minibatch gradient descent, 496 Netflix challenge, 2, 322, 349, 516 Minicluster, 270 Network, see Social network Minsky, M., 522 Neural net, 18, 467, 523, 524 Minutiae, 125 Neuron, 527 Mirrokni, V.S., 142 Newman, M.E.J., 426 Mirror page, 83 Newspaper articles, 128, 313, 322 Mitzenmacher, M., 141 Node, 508, 523 ML, see Machine learning Node pruning, 514 MLE, see Maximum-likelihood estima- Non-Euclidean distance, 264, see Co- tion sine distance, see Edit distance, MNIST dataset, 547 see 
Hamming distance, see Jaccard distance Model, 381

Non-Euclidean space, 278, 280 Parallelism, 513 Norm, 105, 106 Parametric ReLU, 535 Norm penalty, 565 Parent, 364 Normal distribution, 269 Park, J.S., 250 Normalization, 333, 335, 346 Partition, 374 Normalized cut, 375 Pass, 222, 225, 233, 238 NP-complete problem, 369 Path, 405 Numerical feature, 464, 509, 517 Paulson, E., 78 PCA, see Principal-component analy- O’Callaghan, L., 292 sis Off-line algorithm, 296 PCY Algorithm, 230, 233, 234 Olston, C., 79 Pedersen, J., 212 Omiecinski, E., 250 , 18, 463, 467, 471, 517, 523, On-line advertising, see Advertising 524 On-line algorithm, 17, 296 Perfect matching, 299 On-line learning, 469 Permutation, 90, 99 On-line retailer, 216, 294, 320, 321 Peters, M., 78 Onose, N., 78 Phishing, 2 Open directory, 196, 470 PIG, 77 OR-construction, 114 Pigeonhole principle, 369 Orr, G.B., 521 Piotte, M., 353 Orthogonal vectors, 256, 434 Pivotal condensation, 431 Orthonormal matrix, 443, 448 Plagiarism, 83, 217 Orthonormal vectors, 435, 438 Pnuts, 77 Out-component, 181 Point, 253, 283 Out-degree, 416 Point assignment, 255, 266, 362 , 255 Polyzotis, A., 77 Output, 64, 464 Output gate, 563 Pooling function, 553 Output layer, 525 Pooling layer, 527, 553 Overfitting, 331, 348, 467, 468, 481, Position indexing, 133, 135 506, 514, 518, 565 Positive example, 472 Overlapping Communities, 381 Positive integer, 168 Overture, 303 Powell, A.L., 292 Owen, A.B., 142 Power iteration, 431, 432 Own pages, 200 Power law, 14 Predicate, 330 Paepcke, A., 142 Prefix indexing, 132, 133, 135 Page, L., 175, 212 Pregel, 51, 77 PageRank, 4, 17, 31, 32, 49, 175, 177, Principal eigenvector, 179, 431 189 Principal-component analysis, 429, 436 Pairs, see Frequent pairs Priority queue, 261 Palmer, C.R., 427 Priors, 383 Pan, J.-Y., 427 Privacy, 296 Papert, S., 522 Probe string, 133

Profile, see Item profile, see User pro- Regularization parameter, 490 file Reichsteiner, A., 461 Projection, 34, 36 Reina, C., 292 Pruhs, K.R., 318 Relation, 33 Pseudoinverse, see Moore-Penrose pseu- Relational algebra, 33 doinverse ReLU, 534, 562 Puz, N., 78 Replication, 24 PyTorch, 530, 570 Replication rate, 61, 68 Representation, 278 Quadratic programming, 492 Representative point, 275 Query, 146, 165, 287 Representative sample, 149 Query example, 498 Reservoir , 174 Quinlan, J.R., 522 Residual PageRank, 393 Resilient distributed dataset, 43 R-tree, 292 Response, 549 Rack, 22 Restart, 391 Radius, 263, 265, 406 Retained set, 270 Raghavan, P., 20, 212, 426 Revenue, 304 Rahm, E., 427 Rheinlander, A., 78 Rajagopalan, S., 20, 212, 426 Ripple-carry adder, 168 Ramakrishnan, R., 78, 292 RMSE, see Root-mean-square error Ramsey, W., 317 RNN, see Random hyperplanes, 117, 326 Robinson, E., 78 Random interconnection layer, 527 Rocha, L.M., 461 Random surfer, 176, 177, 182, 196, 390 Root-mean-square error, 322, 341, 447, Randomization, 238 536 Rank, 442 Rosa, M., 426 Rarest-first order, 313 Rosenblatt, F., 522 Rastogi, R., 174, 292 Rounding data, 335 Rating, 320, 323 Row, see Tuple RDD, see Resilient distributed dataset Row-orthonormal matrix, 448 Reachability, 407, 408, 415 Rowsum, 278 Recommendation system, 18, 319 Rectified linear unit, see ReLU Royalty, J., 78 Recurrent neural network, 18, 523, 557 Rumelhart, D.E., 570 Recursion, 49 Recursive doubling, 411, 415 S-curve, 102, 111 Reduce task, 25, 27 Saberi, A., 318 Reduce worker, 28, 30 Salihoglu, S., 78 Reducer, 27, 45 Sample, 238, 242, 245, 247, 267, 275, Reducer size, 61, 67 279 Reed, B., 79 Sampling, 148, 162 Regression, 464, 502, 518 Saturation, 531 Regression loss, 536 Savasere, A., 250 Regularization, 565 Sax, M.J., 78

SCC, see Strongly connected compo- Six degrees of separation, 408 nent, see Strongly connected Sketch, 119 component Skew, 28 Schapire, R.E., 521 Sliding window, 146, 162, 169, 283 Schelter, S., 78 Smart transitive closure, 413, 415 Schema, 33 Smith, B., 353 Schmidhuber, J., 569 SNAP, 426 Schutze, H., 20 Social Graph, 356 Score, 123 Social network, 18, 355, 356, 429 Search ad, 294 Softmax, 533 Search engine, 187, 203 SON Algorithm, 240 Search query, 145, 176, 198, 294, 312 Source, 405 Second-price auction, 305 Source node, 389 Secondary storage, see Disk Space, 105, 253 Selection, 33, 35 Spam, see Term spam, see Link spam, Seminaive evaluation, 409, 411, 413, 358, 470 414 Spam farm, 199, 202 Sensor, 145 Spam mass, 202, 203 Sentiment analysis, 471 Spark, 41, 43, 51, 77, 79 Set, 89, 131, see Itemset Sparse matrix, 31, 90, 91, 189, 190, 320 Set difference, see Difference Spectral partitioning, 374 Shankar, S., 78 Spider trap, 182, 185, 205 Shawe-Taylor, J., 521 Split, 46 Shenker, S., 79 Splitting clusters, 281 Shi, J., 427 SQL, 22, 33, 77, 79 Shikano, K., 570 Squared error, 536 Shim, K., 292 Squares, 404 Shingle, 86, 103, 128 Srikant, R., 250 Shivakumar, N., 250 Srivastava, U., 78, 79 Shopping cart, 216 Standard deviation, 271, 273 Shortest paths, 52 Standing query, 146 Siddharth, J., 142 Stanford Network Analysis Platform, Sigmoid, 532 see SNAP Signature, 82, 89, 91, 103 Star join, 60 Signature matrix, 92, 100 Stata, R., 20, 212 Silberschatz, A., 174 Statistical model, 2 Silberstein, A., 78 Status, 313 Similarity, 5, 17, 82, 213, 326, 334 Steinbach, M., 20 Similarity join, 62 Step function, 530, 531 Simonyan, K., 554 Stochastic gradient descent, 348, 495 Simrank, 389 Stochastic matrix, 179, 431 Singleton, R.C., 174 Stoica, I., 79 Singular value, 443, 447, 448 Stop clustering, 259, 263, 265 Singular-value decomposition, 340, 429, Stop words, 9, 88, 128, 217, 325 442, 452 Stratosphere, 77

Stream, see Data stream Teleport set, 196, 197, 202, 391 Strength of membership, 387 Teleportation, 186 Stride, 551 Tendril, 181 String, 131 Tensor, 48, 546 Striping, 32, 189, 191 TensorFlow, 41, 79, 530, 538, 543, 570 Strong edge, 358 Term, 176 Strongly connected component, 181, 417 Term frequency, 9, see TF.IDF Strongly connected graph, 179, 406 Term spam, 176, 199 Substochastic matrix, 182 Test set, 468, 475 Suffix length, 135 TF, see Term frequency Summarization, 4 TF.IDF, 9, 325, 467 Summation, 168 Theobald, M., 142 Sun, J., 461 Thrashing, 191, 230 Supercomputer, 21 Threshold, 102, 171, 214, 240, 244, 471, Superimposed code, see Bloom filter, 477 173 TIA, see Total Information Awareness Supermarket, 216, 238 Timestamp, 163, 284 Superstep, 52 , 463, 465 Toivonen’s Algorithm, 242 Toivonen, H., 250 Support, 214, 239, 240, 242, 244 Tomkins, A., 20, 79, 212, 426 Support vector, 486 Tong, H., 427 Support-vector machine, 18, 463, 468, Topic-sensitive PageRank, 195, 202 485, 517 Toscher, A., 353 Supporting page, 200 Total Information Awareness, 6 Suri, S., 427 Touching the Void, 323 Surprise number, 158 Training example, 464 Sutskever, I., 569 Training rate, 475 SVD, see Singular-value decomposition Training set, 463, 464, 470, 480 SVM, see Support-vector machine Transaction, see Basket Swami, A., 250 Transformation, 44 Symmetric matrix, 377, 430 Transition matrix, 391 Szegedy, M., 174 Transition matrix of the Web, 178, 189, 190, 192, 429 Tag, 326, 359 Transitive closure, 49, 407 Tail, 413 Transitive reduction, 418 Tail length, 155, 419 Transpose, 205 Tan, P.-N., 20 Transposition, 111 Target, 405 Tree, 260, 278, 279, see Decision tree Target page, 200 Triangle, 399 Tarjan, R.E., 417 Triangle inequality, 105 Task, 23 Triangular matrix, 223, 232 Taxation, 182, 185, 200, 205 Tripartite graph, 359 Taylor expansion, 14 Triples method, 223, 232 Taylor, M., 317 TrustRank, 202 Telephone call, 358 Trustworthy page, 202

Tsourakakis, C.E., 427 Wang, W., 142 Tube, 182 Warneke, D., 78 Tuple, 33 Weak edge, 358 Tuzhilin, A., 352 Weaver, D., 78 Twitter, 18, 313, 356 Web structure, 181 Tzoumas, K., 78 Weight, 471, 526 Weiner, J., 20, 212 Ullman, J.D., 20, 77–79, 250, 292, 426 Whizbang Labs, 3 Undirected graph, see Graph Widom, J., 20, 79, 174, 292, 426 Union, 34, 36, 40, 85 Wikipedia, 358, 470 Unit vector, 430, 435 Williams, R.J., 570 Universal set, 131 Window, see Sliding window, see De- , 463 caying window Upper hyperplane, 487 Windows, 13 User, 320, 336, 337 Winnow Algorithm, 475 User profile, 328 Word, 217, 254, 325 Utility matrix, 320, 323, 340, 429 Word count, 25, 44 UV-decomposition, 340, 350, 429, 496 Worker process, 28 Workflow, 42, 49, 54 VA file, 503 Working store, 144 Valduriez, P., 427 Validation set, 468, 567 Xiao, C., 142 Van Loan, C.F., 460 Xie, Y., 461 Vanishing gradient, 561 Vapnik, V.N., 521 Yahoo, 303, 326 Variable, 158 Yang, J., 427 Vassilvitskii, S., 427 Yerneni, R., 78 Vazirani, U., 318 York, J., 353 Vazirani, V., 318 Yu, J.X., 142 Vector, 31, 105, 109, 179, 189, 204, Yu, P.S., 250 205, 254 Yu, Y., 79 Vernica, R., 78 VGGnet, 553 Zaharia, M., 79 Vigna, S., 426 Zero padding, 551, 559 Vitter, J., 174 Zhang, C.H., 142 Volume, 397 Zhang, H., 461 Volume (of a set of nodes), 375 Zhang, T., 292 von Ahn, L., 327, 353 Zipf’s law, 15, see Power law von Luxburg, U., 427 Zoeter, O., 317 Voronoi diagram, 498 Zussman, A., 554

Waibel, A., 570 Wall, M.E., 461 Wall-clock time, 55 Wallach, D.A., 78 Wang, J., 350