
Novel Frameworks for Mining Heterogeneous

and Dynamic Networks

A dissertation submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

in the

Department of Electrical and Computer Engineering and Computer Science

of the College of Engineering

by Chunsheng Fang

B.E., Electrical Engineering & Information Science, June 2006

University of Science & Technology of China, Hefei, P.R.China

Advisor and Committee Chair: Prof. Anca L. Ralescu

November 3, 2011

Abstract

Graphs serve as an important tool for discrete data representation. Recently, graph representations have made possible very powerful techniques, such as manifold learning, kernel methods, and semi-supervised learning. With the advent of large-scale real-world networks, such as biological networks (disease networks, drug target networks, etc.) and social networks (the DBLP co-authorship network, Facebook friendships, etc.), machine learning and data mining algorithms have found new application areas and have contributed to advancing our understanding of the properties and phenomena governing real-world networks.

When dealing with real world data represented as networks, two problems arise quite naturally:

I) How to integrate and align the knowledge encoded in multiple heterogeneous networks? For instance, how do we find similar genes across co-disease and protein-protein interaction networks?

II) How to model and predict the evolution of a dynamic network? A real-world example: given N yearly snapshots of an evolving social network, how to build a model that can capture the temporal evolution and make reliable predictions?

In this dissertation, we present an innovative graph embedding framework which identifies the key components of modeling the evolution in time of a dynamic graph. Unlike many state-of-the-art graph link prediction and modeling algorithms, it formulates the link prediction problem from a geometric perspective that can capture the dynamics of the intrinsic continuous graph manifold's evolution. It is attractive due to its simplicity and its potential to relax the mining problem into a feasible domain, enabling standard machine learning and regression models to utilize historical graph time series data.


To address the first problem, we first propose a novel probability-based similarity measure, which led to promising applications in content-based image retrieval and image annotation, followed by a manifold alignment framework to align multiple heterogeneous networks, which demonstrates its power in mining biological networks.

Finally, the dynamic graph mining framework generalizes most current graph-embedding dynamic link prediction algorithms. Comprehensive experimental results on both synthesized and real-world datasets demonstrate that our proposed algorithmic frameworks for multiple heterogeneous networks and dynamic networks can lead to a better and more insightful understanding of real-world networks. Scalability of our algorithms is also addressed by employing the MapReduce cloud computing architecture.


Acknowledgements

Doctoral study is a long and winding journey. I could not have earned the lifelong prefix "Dr." in front of my name without help of many kinds.

My first appreciation goes to my academic adviser, Anca Ralescu, who possesses all the qualities of a great adviser one can imagine. She supported my research explorations over a wide spectrum of problems during the four-plus years I spent at UC; she had the patience to advise me, from high-level guidance down to every single detail of my research; her warm encouragement helps her students broaden their horizons and stay abreast of the state of the art in machine learning research; and she willingly shares her experience, from research to everyday life, such as careful planning and sustained effort toward a goal. These will be my lifelong treasures.

My next appreciation goes to my dissertation committee. I took almost all the advanced-level CS courses taught by Kenneth Berman, Yizong Cheng, and Fred Annexstein. These courses built the foundation of my computer science knowledge; without their distinguished lectures, it would have been much more difficult for an Electrical Engineering graduate to step into the shrine of Computer Science. Professor Anil Jegga helped me with research;

Prof. Dan Ralescu always had a way to illustrate abstract mathematical concepts as vividly as his cello melody.

During my research assistant period at the Biomedical Informatics division of Cincinnati Children's Hospital Medical Center, Prof. Jason Lu and Prof. Aaron Zorn supported my research and guided me into the horizon of computational biology. Their talents in exploring cutting-edge interdisciplinary research territories of machine learning, bioinformatics, and medical imaging mining, together with their resource management and academic writing skills, significantly helped me shape my technical problem-solving abilities.


Thanks to the world-leading co-op/internship programs offered by the University of Cincinnati, I was fortunate enough to have two precious internship experiences:

In April 2011, I began a Software Design Engineer internship at one of the world's best and most competitive IT companies, Amazon.com. I spent 3 wonderful months of learning at Amazon's Seattle headquarters and eventually delivered my project to production. Special thanks are owed to my mentor Catalin Constatin and the whole Payment Platforms team, and also to the Amazon TRMS research team, who gave me many valuable comments after my invited seminar talk.

My next internship role was as a research scientist at another outstanding company, Riverain Medical Group. Working together with the great R&D team, supervised by Jason F. Knapp, we were able to push the medical imaging products to the next milestone in industry.

I appreciate the CS department interim director Prof. Prabir Bhattachaya, acting head Prof. Raj Bhatnagar, CSGSA, GSGA, and all the related administrative personnel, such as Julie Muenchen, for supporting my presenting our research at world-class research conferences such as ACM SIGKDD@San Diego, NIPS@Vancouver, ICPR@Tampa, etc.

The full list of UC friends to thank might fill another volume of a dissertation. To name a few, but not limited to:

Ravikumar Sugandharaju, who organized the afternoon coffee discussions for hacking the world's most challenging technical coding problems; friends in the Machine Learning and Computational Intelligence (MLCI) research group led by my adviser, including Mojtaba Kohram and Mohammad Rawashdeh: many ideas were generated during the brainstorming in the MLCI weekly meetings; Minlu Zhang, Jingyuan Deng, Xiao Zhang, Xiangxiang Meng, Chen Lu, Yingying Wang, etc., with whom I had great discussions about bioinformatics and statistics; the DQE study group in 2008, Vaibhav Pandit, and Aravind Ranganathan, who made our summer-long preparation for the comprehensive exams so much fun. Friends I met at research conferences have also given me much inspiration on how to solve research problems.

Also, I'd like to express my appreciation to the professors and friends in China who supported my application to graduate school in the USA: Prof. Stan Z. Li (CAS), Prof. Zhongfu Ye (USTC), Prof. Peijiang Yuan (CAS), Prof. Jirong Chen (USTC), Prof. Shoumei Li (BJUT), and my friends (all Ph.D.s now) in NLPR, CASIA: Meng Ao, Zhen Lei, Shengcai Liao, Ran He, Xiaotong Yuan, Dong Yi, Weishi Zheng, Rui Wang, etc.

At the end, I wish to thank my family: my beloved wife Junjun Yu, for her warm love, support, and culinary delicacies that fit right into my appetite over the years; my father, mother, grandma, and all the other family members on the other side of this planet have been my driving forces. My life journey started under their love, guidance, and inspiration, which prepared me for everything that came my way.

In retrospect, I would like to summarize my Ph.D. journey with this quote:

Far and away the best prize that life offers is the chance to work hard at work worth doing.

-- Theodore Roosevelt, Labor Day address, 1903


To my family,

and

My late grandfather


Contents

Chapter 1 Problem Statement and Introduction
  1.1 Mining Heterogeneous Domain Knowledge Problem
  1.2 Mining Dynamic Network Problem
  1.3 Spectral graph theory and manifolds
  1.4 Graph embedding
  1.5 Similarity measure for heterogeneous data
  1.6 Predictive models
  1.7 Roadmap of the Dissertation
Chapter 2 Related work
  2.1 Combining similarity in image retrieval
  2.2 Graph-based knowledge transfer
  2.3 Manifold learning for visualization
  2.4 Mining Multiple Graphs
  2.5 Mining Dynamic Graphs
Chapter 3 Probability-Based Similarity Approach For Combining Heterogeneous Domain Knowledge
  3.1 Proposed approach: ProbSim
  3.2 Probability-based approach to similarity evaluation versus Euclidean distance
  3.3 Heterogeneity due to different underlying distributions
  3.4 Combining probability-based similarity across features
  3.5 Content based image retrieval results
  3.6 Image annotation application results
Chapter 4 Framework for mining multiple heterogeneous networks
  4.1 Manifold alignment with Procrustes for heterogeneous networks
  4.2 Experimental dataset
  4.3 Experimental results
Chapter 5 Framework for Mining Dynamic Networks
  5.1 Spectral Regression with Low-Rank Approximation Approach for Link Prediction in Dynamic Graphs [40, 41]
  5.2 Key components of the dynamic network mining framework [46]


Chapter 6 Experimental Results of Mining Dynamic Networks
  6.1 Experiment Data: DBLP co-authorship network
  6.2 Validating Manifold Alignment
  6.3 Gravitational collapse of trajectories
  6.4 Trajectory Modeling Results
  6.5 Vertex Behavior Modeling
  6.6 Identifying Exemplar authors
  6.7 Reconstructing the Predicted Network
  6.8 Link Prediction performance analysis
  6.9 Cloud computing performance analysis
Chapter 7 Discussion and future work
  7.1 Framework for mining heterogeneous networks
  7.2 Framework for mining dynamic networks
Publications during Doctoral study
Bibliography
Appendix I: More results for ProbSim image annotation
Appendix II: DBLP Core author set


List of Figures

1-1 Road map of the dissertation

2-1 How LLE unrolls the "Swiss roll" nonlinear data

2-2 LGC on the "two moon" classification dataset

2-3 Static methods

3-1 PDF of L1 distances for the RGB histogram feature in the Corel5K dataset

3-2 ProbSim aggregation CBIR result for the "baby" query image

3-3 Using only color features

3-4 Using only texture features

3-5 Using all color and texture features

3-6 ProbSim outperforms JEC

3-7 kNN neighborhood size and Precision, Recall, Classification rate

4-1 The framework for mining multiple heterogeneous networks

4-2 Analogy to the Fourier transform in signal processing

4-3 Manifold alignment by Procrustes analysis

4-4 Graph embedding and manifold alignment on synthetic networks G1 and G2

4-5 An example orphan disease-gene network

4-6 Aligning co-disease and protein-protein interaction networks

4-7 The zoomed-in subgraph inferring gene FBLN1 is similar to FGA, LYZ, etc.

4-8 Another case study of 1st-neighbor genes: CRYAB, NR4A2, ACTN4

5-1 Graph Spectrum Regression Link Prediction Algorithm

5-2 Synthetic dynamic graphs and their time series

5-3 The framework for the link prediction problem in temporal social networks


5-4 The evolution of graph manifolds along time

6-1 One sample DBLP XML entry for a co-authored publication article

6-2 The DBLP co-authorship core set network snapshot at year 1995

6-3 Trajectories of four different alignments for the real-world DBLP data set

6-4 Gravitational collapse of trajectories to singularity

6-5 Scatter plots of the estimated and predicted coordinates vs. the true coordinates

6-6 Trajectory behavior analysis

6-7 Trajectory behavior clustering for the Procrustes alignment

6-8 An example of how affinity propagation works

6-9 Convergence of the AP algorithm on the Procrustes manifold alignment to the 1st-year DBLP dataset

6-10 Sixty-six exemplar authors computed by affinity propagation

6-11 Ground-truth embedding at the last year and the predicted embedding

6-12 Link prediction accuracy on EES: manifold alignment vs. random guess

6-13 Major components of the MapReduce architecture

6-14 Hadoop: HDFS NameNode, DataNode, MapReduce JobTracker, etc.


List of Tables

3-1 Result of similarity evaluation and aggregation using ProbSim and methods from the literature

4-1 Network statistics of orphan disease networks

4-2 ToppGene results for the inferred gene list

6-1 Estimation and prediction errors for the four alignments in linear and quadratic nested random-effect models

6-2 More information about all authors of interest in this study

6-3 Performance of the algorithm with different alignment and regression methods compared to random

6-4 Scalability performance on Hadoop


Chapter 1 Problem Statement and Introduction

It is not knowledge, but the act of learning, not possession but the act of getting there,

which grants the greatest enjoyment.

-- Carl Gauss, Letter to Bolyai, 1808

We start with preliminary background knowledge and problem definitions that guide the reader toward understanding our contributions.

1.1 Mining Heterogeneous Domain Knowledge Problem

1.1.1 Motivation

The history of humankind is the history of knowledge, invention, and dreams, all towards the goal of continuously improving the human condition. With the advent of computer-based information processing, knowledge discovery and accumulation have been greatly facilitated, while at the same time new challenges have appeared. Integrating interconnected pieces of knowledge from and across multiple domains has emerged as one of the most interesting, useful, and yet difficult problems to be addressed, for which new paradigms and algorithms are necessary.

The research described in this dissertation is motivated by this need and puts forward a collection of tools for analysis of multiple heterogeneous networks.

Two key problems in the area of multiple heterogeneous networks are considered:

1. Prediction and prioritization of nodes (graph vertices) in multiple networks.


2. Identification of nodes with high topological correspondence among multiple heterogeneous networks.

Examples of the former include the transcriptional regulation networks (Transcription Factor - Target Gene networks) controlling surfactant homeostasis in the lung, which will be introduced in more detail later in this dissertation. The latter is illustrated by the following: given two gene networks constructed from heterogeneous knowledge domains, one from pathway sharing and the other from mouse phenotype, is it possible to align the topologies of these two graphs and identify the topologically related genes? Identifying such pathway- and phenotype-related genes can help researchers focus on the subset of genes most relevant to a disease or developmental process, and generate further testable biological hypotheses.

Interestingly, the first problem can be restated as a machine learning problem, more precisely in terms of manifold learning and semi-supervised learning. The second problem turns out to be the classical computer science problem of "graph isomorphism" [1], whose complexity remains unknown; its variant, "subgraph isomorphism", is known to be NP-complete [2], and the problem therefore warrants an approximate and scalable algorithm.

1.1.2 Problem Statement

We begin with two definitions that lead to the problem statement.

Definition 1 (Graph). A graph is a pair of sets, G = (V, E), where V is the set of all vertices and E is the edge set, E = {(v1, v2) | v1, v2 ∈ V such that v1 and v2 are connected}.

By default, in this dissertation we consider undirected graphs, that is, (v1, v2) ∈ E if and only if (v2, v1) ∈ E.


Definition 2 (Graph isomorphism). Two graphs G1 = (V1, E1) and G2 = (V2, E2) are said to be isomorphic if there exists a bijective mapping m: V1 → V2 such that for all i, j,

(vi, vj) ∈ E1 ⟺ (m(vi), m(vj)) ∈ E2.

No polynomial-time algorithm is known for finding such a mapping between two arbitrary graphs, although the problem is also not known to be NP-complete.

Reformulated, the problem of mining heterogeneous domain knowledge is the problem of approximating the graph isomorphism problem by embedding the discrete graphs into a continuous manifold.

Thus the problem of Mining Heterogeneous Domains can be restated as follows: given two matrix representations of graphs, X1 and X2, find the mapping m which minimizes the Frobenius norm ‖X1 − m(X2)‖_F. The matrix representation corresponds to embedding the graph into a continuous space.

1.2 Mining Dynamic Network Problem

1.2.1 Motivation

With the recent advent of large-scale social and biological networks, link prediction in temporal networks has emerged as an increasingly important research problem. Several survey papers have summarized recent progress in link prediction [3] and link mining [4]. A thorough survey [3] summarizes various "static" methods that explore different graph distance metrics on one network snapshot and try to predict the next state of the network.

Alternatively, recent research focuses more on formulating the link prediction problem in a "dynamic" way, for example as a time series regression model that accommodates historical data [4, 5]. This is done in a more general framework by modeling the whole graph's historical dynamics, providing a more insightful understanding of the link prediction problem.

1.2.2 Problem Statement

We first define the concept of a dynamic graph and then state the problem.

Definition 4 (Dynamic graph). A dynamic graph Gt with horizon T is an unweighted, undirected graph with vertex set Vt and edge set Et, where t = 1, 2, …, T. Here Gt denotes the graph at time moment t.

In the work described here, the dynamic aspect of the graph is given exclusively by changes in the edge set, more precisely by the addition of new links. This means that the vertex set is constant, Vt = V, and hence Gt = (V, Et), with E1 ⊆ E2 ⊆ … ⊆ ET.

The link prediction problem for a dynamic graph: given a dynamic graph {Gt | t = 1, …, T} as defined in Definition 4, the link prediction problem consists in predicting G_{T+1}, the state of the graph at time T+1.

The approach to the link prediction problem described in this work makes use of concepts and results from spectral graph theory.

1.3 Spectral graph theory and manifolds

The concepts defined in the previous sections live in the discrete world. Fortunately, there are interesting theories that connect them to the continuous universe, one of which is spectral graph theory.

Definition 5. Given a graph G, its adjacency matrix A is defined as

A_ij = 1 if (v_i, v_j) ∈ E, and A_ij = 0 otherwise.

When G is undirected the adjacency matrix is symmetric, in which case its eigenvalues are real.

Definition 6. Let G be a graph with adjacency matrix A, and let d_i denote the degree (number of edges) of vertex v_i.

(1) The Laplacian matrix L is defined as:

L_ij = 1 if i = j; L_ij = −1/√(d_i d_j) if (v_i, v_j) ∈ E; L_ij = 0 otherwise.

(2) The unnormalized Laplacian matrix U is defined as:

U_ij = d_i if i = j; U_ij = −1 if (v_i, v_j) ∈ E; U_ij = 0 otherwise.

(3) The graph spectrum of L is defined as the set of all eigenvalues of L, denoted λ_0 ≤ λ_1 ≤ … ≤ λ_{N−1}.

It can readily be seen that L = D^{−1/2} U D^{−1/2}, where D is the diagonal degree matrix, i.e., L_ij = U_ij / √(d_i d_j).

Throughout this work we assume that the graph G is connected, in which case d_i > 0 for all vertices.
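For concreteness, the following minimal NumPy sketch builds the unnormalized Laplacian U = D − A and the normalized Laplacian L = D^{−1/2} U D^{−1/2} of Definitions 5-6 for a toy graph; the function name `laplacians` and the example graph are ours, for illustration only.

```python
import numpy as np

def laplacians(A):
    """Unnormalized and normalized graph Laplacians from a
    symmetric 0/1 adjacency matrix A (Definitions 5-6)."""
    d = A.sum(axis=1)                 # vertex degrees d_i
    U = np.diag(d) - A                # unnormalized Laplacian: U = D - A
    s = 1.0 / np.sqrt(d)              # assumes a connected graph, so d_i > 0
    L = s[:, None] * U * s[None, :]   # L = D^{-1/2} U D^{-1/2}
    return U, L

# Toy 4-cycle graph
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
U, L = laplacians(A)
spectrum = np.linalg.eigvalsh(L)      # graph spectrum of L (real, since L is symmetric)
```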

Definition 7. The harmonic eigenfunction f of the graph Laplacian L is defined as the solution of the equation

L f = λ f.

More specifically, f is the solution of the Rayleigh quotient problem defined on the Laplacian L:

f = argmin (fᵀ L f) / (fᵀ f), taken over f orthogonal to the trivial (constant) eigenvector.


Remark: Theoretically, the graph Laplacian is a special case of the Laplace-Beltrami operator on a continuous Riemannian manifold M, whose first nontrivial eigenvalue satisfies

λ_1 = inf_f ( ∫_M |∇f|² ) / ( ∫_M |f|² ),  subject to ∫_M f = 0.

This is an important result as it connects discrete representations with continuous counterparts.

Definition 8. The Laplacian operator is a second-order differential operator in n-dimensional Euclidean space:

Δf = ∇ · (∇f),

where ∇· is the divergence and ∇f is the gradient.

The Laplacian operator has some nice properties, which result in an important manifold learning algorithm, the Laplacian Eigenmap [6], built on the relation between the Laplacian operator defined on a Riemannian manifold and the graph Laplacian matrix.

Definition 9 (Graph diameter). Let G be a graph with vertex set V = {v_i, i ∈ {1, …, N}}.

(1) The distance between two vertices is defined as the length of the shortest path connecting them:

d(v_i, v_j) = min P(v_i, v_j),

where P(v_i, v_j) denotes the set of lengths of all possible paths connecting v_i to v_j.

(2) The diameter of G, diam(G), is defined as the largest vertex distance in G:

diam(G) = max { d(v_i, v_j) | v_i, v_j ∈ V }.


1.4 Graph embedding

Now we move to the methodology for representing a discrete graph in a Euclidean coordinate system that preserves pairwise distances or similarities.

Definition 10. A manifold [7] is a topological space that is locally Euclidean, i.e., around every point there is a neighborhood that is topologically the same as the open unit ball in R^n.

Definition 11. The generic graph embedding problem is the following [8]. In a very general setting, the edges of the graph G may be weighted; in this case the adjacency matrix A is replaced by a weight matrix W. Embedding a graph G with vertex set V = {v1, …, vm} and weight matrix W in the d-dimensional Euclidean space R^d amounts to finding y_i ∈ R^d such that y_i "represents" v_i.

A good embedding should preserve relationships, such as node distance or similarity, from the original graph representation.
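As an illustration of Definition 11, the sketch below computes a spectral, Laplacian Eigenmap-style embedding from an adjacency matrix. The helper name and the specific choice of the normalized Laplacian are our assumptions for this sketch, not a prescription from the text.

```python
import numpy as np

def spectral_embedding(A, d=2):
    """Embed a connected graph with adjacency matrix A into R^d using the
    eigenvectors of the normalized Laplacian with the d smallest nonzero
    eigenvalues (Laplacian Eigenmap style)."""
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)                     # eigenvalues ascending
    Y = vecs[:, 1:d + 1]                               # skip the trivial eigenvector
    return Y                                           # row i is y_i, representing v_i
```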

1.5 Similarity measure for heterogeneous data

Similarity measurement is a fundamental research problem in machine learning. With the emergence of multiple-domain knowledge, how to evaluate similarity among heterogeneous data becomes more and more important.

Definition 12. Heterogeneous data are multidimensional data whose components lie in different domains. The key word in the preceding statement is different, and the treatment of heterogeneity depends on what is meant by it. The components x_i may lie in different spaces (e.g. real, categorical, ordinal, sets), or their domains may all be subsets of the same space. In the former case the meaning of different is rather obvious, while in the latter it is not necessarily so. For example, suppose that all the domains are subsets of the real numbers, i.e. D_i ⊂ R. Then one way to define D_i different from D_j is to say that D_i ∩ D_j = Ø, or, relaxing this requirement, that they overlap very little. This study considers a particular case of heterogeneity, when data may come from the same domain, even the same range of values, but from different underlying distributions.

1.6 Predictive models

In machine learning tasks such as link prediction, one important task is to model the properties of the observations, such as their distribution, so that we can make predictions about the future. This is a very general category of algorithms; here we briefly review several related predictive models.

1.6.1 Regression

Definition 13 (Regression [9]). A method for fitting a curve (not necessarily a straight line) through a set of points using some goodness-of-fit criterion. The most common type of regression is linear regression, defined as

y = Xβ + ε,

where y and X are given as observations and the model parameter β needs to be estimated by minimizing the least-squares error. More specifically,

β̂ = argmin_β ‖y − Xβ‖².

Note that regression is a special case of the Generalized Linear Model [10].
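A minimal worked example of Definition 13, using NumPy's least-squares solver on synthetic data; the data, dimensions, and coefficient values are arbitrary illustrations.

```python
import numpy as np

# y = X beta + noise; estimate beta by least squares (Definition 13).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # 100 observations, 3 features
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)     # noisy observations

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # argmin_beta ||y - X beta||^2
```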


1.6.2 Time Series ARMA

Unlike the regression model, time series analysis models temporal signals, such as daily stock prices or the Dow Jones index.

Definition 14 (ARMA model [5]). Given a time series X_t, t = 1, 2, …, T, ARMA(p, q) refers to the model with p autoregressive terms and q moving-average terms, defined as

X_t = c + ε_t + Σ_{i=1}^{p} φ_i X_{t−i} + Σ_{i=1}^{q} θ_i ε_{t−i},

where ε_t denotes white noise and c, φ_i, and θ_i are model parameters which need to be optimally computed.

The ARMA model is fitted by least-squares regression to find the parameter values that minimize the error term. To avoid over-fitting, it is generally considered good practice to find the smallest values of p and q which provide an acceptable fit to the data; the auto-correlation and partial auto-correlation functions are good analysis tools for this.
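As a simplified illustration of such fitting, the sketch below estimates only the autoregressive part of ARMA(p, q) by least squares and produces a one-step-ahead forecast; a full ARMA fit would also estimate the moving-average terms. The helper names are ours.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of the AR(p) part of an ARMA model:
    x_t ~ c + sum_i phi_i * x_{t-i}. Returns (c, phi)."""
    T = len(x)
    X = np.column_stack([np.ones(T - p)] +
                        [x[p - i:T - i] for i in range(1, p + 1)])  # lagged regressors
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef[0], coef[1:]

def predict_next(x, c, phi):
    """One-step-ahead forecast of x_{T+1} from the fitted AR coefficients."""
    p = len(phi)
    return c + phi @ x[-1:-p - 1:-1]   # most recent p values, newest first
```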

1.6.3 Kalman filtering

Kalman filtering is a kind of Bayesian filtering for trajectory prediction. It assumes an underlying linear state-space model that keeps emitting observations polluted by Gaussian noise. It has been successfully applied in video object tracking [11].

Definition 15. The Kalman filter [10] models a linear dynamic system, e.g. tracking moving vehicle trajectories, assuming Gaussian noise in a Bayesian filtering state-space model:

x_k = F_k x_{k−1} + B_k u_k + w_k,

where

- F_k is the state transition model, applied to the previous state x_{k−1};

- B_k is the control-input model, applied to the control vector u_k;

- w_k is the process noise, assumed drawn from a zero-mean Gaussian with covariance Q_k.
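A minimal sketch of one predict/update cycle of the filter in Definition 15, dropping the control term B_k u_k for brevity; the function name and variable layout are ours.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.
    x: state mean, P: state covariance, z: new observation,
    F: state transition model, H: observation model,
    Q/R: process/measurement noise covariances."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```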

1.7 Roadmap of the Dissertation

The following chapters of this dissertation are organized as shown in Figure 1-1.


Figure 1-1 Road map of the dissertation


Chapter 2 Related work

“A near neighbor is better than a distant cousin.”

-- Nai’an Shi, “Outlaws of the Marsh”, 14th century

In this chapter we summarize related research that inspires our work.

2 . 1 Combining similarity in image retrieval

Standard distance combining technique L1, L2 distances are widely used as distance measures.

However, for multi-dimensional data, the heterogeneity problem is more critical since we are willing to combine data channels that are from totally different sources, e.g. different image fea- tures.

For illustration purposes, and to put the discussion of similarity measures on a more concrete basis, we refer to problems in image processing, understanding, and retrieval. However, the concepts and approaches suggested apply to other domains as well.

2.1.1 Joint Equal Contribution (JEC) [12]

Each feature contributes equally to the image distance; JEC [12] scales each feature distance by its empirically estimated upper and lower bounds to lie within [0, 1]. Denoting the original distance between images I_i and I_j along feature k as d_k(i, j) and the scaled distance as d̃_k(i, j), the combined distance is

d(i, j) = (1/N) Σ_{k=1}^{N} d̃_k(i, j),   (1)

where k indexes the image features, k = 1, …, N.
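A minimal sketch of the JEC combination of equation (1), assuming one precomputed N × N distance matrix per feature; the helper name is ours.

```python
import numpy as np

def jec_distance(feature_dists):
    """Joint Equal Contribution: scale each per-feature distance matrix to
    [0, 1] by its empirical min/max, then average so every feature
    contributes equally (equation (1))."""
    scaled = []
    for d in feature_dists:                  # one N x N matrix per feature
        lo, hi = d.min(), d.max()
        scaled.append((d - lo) / (hi - lo))  # empirical rescaling to [0, 1]
    return sum(scaled) / len(scaled)         # equal-weight combination
```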

2.1.2 L1-Penalized Logistic Regression (Lasso)

Lasso [13, 14] is another L1-based approach to capturing image similarity, applying regression to a training set containing pairs of similar and dissimilar images as positive and negative samples. Using the L1 norm enforces sparsity.

We treat this problem by adapting the probability-based similarity approach to heterogeneous data. For example, in the context of image similarity, multiple image features that are naturally heterogeneous, such as texture features, color histograms, and key point descriptors, are individually evaluated and then aggregated. Chapter 3 provides a detailed description of this approach.

2.2 Graph-based knowledge transfer

2.2.1 Manifold learning and label propagation algorithms

Recently, manifold learning has attracted much attention due to its promise of capturing the low-dimensional manifold embedded in a high-dimensional space. Unlike traditional linear subspace approaches, such as PCA and LDA, manifold learning can model nonlinear patterns.

The core of these approaches is based on graph theory: a neighborhood similarity graph is associated with the data, and the algorithms exploit the properties of this graph.

A classic example that illustrates the power of manifold learning is the "Swiss roll" data (shown in Figure 2-1). One of the first manifold learning algorithms, Locally Linear Embedding [15, 16], can "unroll" the Swiss roll dataset and bring it down to a lower-dimensional space, capturing the implicit underlying structure encoded in the high-dimensional data.


Figure 2-1. How LLE unrolls the “Swiss roll” nonlinear data. [16]

Another interesting machine learning family which has also emerged recently is semi-supervised learning. Traditional machine learning algorithms can be roughly categorized into unsupervised learning and supervised learning; the former seeks structure in unlabeled data, such as clustering, while the latter aims to learn a discriminative model that can differentiate positively and negatively labeled data, such as the Support Vector Machine, Linear Discriminant Analysis, Decision Trees, Bayesian classifiers, etc.

Semi-supervised learning (SSL) is a compromise between unsupervised and supervised learning, originating from most real-world machine learning scenarios: we can only afford a small set of labeled data, but we also have a huge amount of unlabeled data. The problem then becomes: how can those unlabeled data facilitate the machine learning algorithms?

Among the SSL algorithms, one typical approach is the "local and global consistency" (LGC) algorithm by Zhou et al. The algorithm iterates a simple propagation rule over a similarity graph:


Local and Global Consistency algorithm

Several key components of this algorithm, which are also essential in most machine learning algorithms, are as follows:

• Similarity measure. In this particular approach, a graph kernel is defined as a Gaussian kernel on the local distance relationships among neighboring data points.

• How to propagate the labels. An iterative, convergent scheme is adopted, as sketched below.
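A minimal sketch of the LGC iteration F ← αSF + (1 − α)Y, assuming a precomputed affinity matrix W; the parameter values and helper name are illustrative, not from the original text.

```python
import numpy as np

def lgc(W, Y, alpha=0.99, n_iter=400):
    """Local and Global Consistency label propagation (Zhou et al.).
    W: symmetric affinity matrix (e.g., Gaussian kernel, zero diagonal);
    Y: n x c initial label matrix (one-hot rows for labeled points, zeros otherwise)."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt          # symmetrically normalized affinity
    F = Y.astype(float).copy()
    for _ in range(n_iter):                  # F <- alpha * S F + (1 - alpha) * Y
        F = alpha * S @ F + (1 - alpha) * Y
    return F.argmax(axis=1)                  # predicted label = highest confidence
```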

Figure 2-2. The LGC algorithm on the "two moon" classification dataset. The convergence of the iterative algorithm, with t increasing from 1 to 400, is shown from (a) to (d). Note that the initial label information is diffused along the moons.


2.3 Manifold learning for visualization

Another advantage of manifold learning is its use for visualization. In [17] the authors proposed a way of visualizing fruit fly embryo images in the embedding space.

2.4 Mining Multiple Graphs

Previous work on inferring regulatory networks is mainly based on microarray experiments, including differential equations [18], linear models [19, 20], Boolean networks [21], and Bayesian network-based approaches [22-24]. Differential equations and stochastic models simulate the dynamics of regulatory systems with detailed descriptions, but the model complexity of differential equations restricts these methods to small-scale and often well-studied regulatory systems. Linear models are robust and scalable for large datasets, but cannot capture possible non-linear relationships. The Boolean network is the simplest model for predicting regulatory networks from gene expression data; however, information is lost when transforming gene expression levels into Boolean values. Bayesian network-based methods can infer causal relationships between TFs and TGs, but are more computationally complex than simpler models.

Semi-supervised learning is well-suited to regulatory relationship inference. Regulatory networks are well-structured, with high modularity [25]. TFs tend to form modules to co-regulate downstream TGs, and TGs tend to form modules to be co-regulated [26]. In addition, only a small number of known TF-TG regulations (labeled data) exist, due to the expense of experimental validation, and the remaining TF-TG regulations (unlabeled data) need to be inferred based on their similarity to known regulations.

A recent paper by Hwang and Kuang proposed a novel algorithm named MINProp that can propagate labels among multiple heterogeneous networks [27]. MINProp models both homogeneous networks and bipartite heterogeneous cross-domain networks, e.g. a disease-gene network, as affinity matrices. Given an initial label for a node in a homogeneous network, the algorithm iteratively propagates the label information through bipartite links among heterogeneous networks, as well as through edges in homogeneous networks, based on the calculated affinity matrices. The confidence value of having the label, for each node in each network, is updated iteratively until convergence; a node is then assigned the label with the highest corresponding confidence value.

The convexity of this algorithm guarantees convergence within a few steps, and the bipartite associations can be unraveled. MINProp achieves promising results on gene-to-disease phenotype association prediction, which is a bipartite link prediction and prioritization problem similar to the one tackled in this study. We therefore used MINProp as a baseline for evaluating the performance of both methods.

2.5 Mining Dynamic Graphs

Now we switch gears from multiple networks to a new dimension: the temporal direction. There are two major categories of methodologies in state-of-the-art research in this field: static and dynamic.

2.5.1 Static Methodologies

A comprehensive seminal survey paper [3] summarizes the algorithms in Figure 2-3, which compute distances within a graph. Notably, all these methods take only a single snapshot of a social network and predict the next state from it. The limitation is obvious: none of these methods model the temporal dynamics.


Figure 2-3. Static methods summarized in [3]

2.5.2 Dynamic Methodologies

As summarized in [28], a graph distance metric is essential for encoding local and global consistency, but it alone is not sufficient to capture the dynamics over time. Notably, a paradigm shift has gradually been under way since 2010, as more and more temporal social network data become available, together with the need for a more general model that utilizes the historical data instead of simply ignoring it.

In order to utilize historical network snapshots under a general framework, a common approach is to convert the discrete graphs into a continuous space while preserving distance constraints among vertices, which can be achieved by graph embedding. Most existing continuous link prediction algorithms are in fact different derivations of graph embedding: MDS [29] uses principal components of the distance matrix; spectral embedding [6] uses eigenvectors of the Laplacian matrix; graph feature tracking [5] uses a linear embedding defined by a linear mapping.

To ensure a reliable model for estimation and prediction, the stability of the graph-valued time series must be guaranteed. We submit that this stability aspect did not attract sufficient attention in the previous literature. By contrast, in the framework proposed here, we point out that reliable prediction is achieved only when a key graph feature is preserved in the process of graph embedding.

Since graph embedding results in a nonlinear subspace, we refer to this constraint as a manifold constraint; more precisely, each graph embedding can be viewed as a smooth manifold in the higher-dimensional space.

However, the geometry of each individual embedding, such as position, orientation, and scale, can be arbitrary, making the link prediction problem difficult. Manifold alignment, i.e. alignment of the manifolds corresponding to different time points, results in smaller variance and a more stationary time series. This in turn contributes to more accurate estimation and prediction.

To address all of the above, we propose a general framework in Chapter 5 which formulates this problem as trajectory prediction in a continuous graph manifold space and decomposes it into four essential components: graph embedding, manifold alignment, trajectory prediction, and graph reconstruction.


Chapter 3 Probability-Based Similarity Approach For Combining Heterogeneous Domain Knowledge

In this chapter, we present the probability-based similarity for combining heterogeneous domain knowledge, more specifically, with applications to image retrieval and annotation.

3.1 Proposed approach: ProbSim

Evaluation of similarity is an important aspect of machine learning algorithms. When the data are heterogeneous, evaluating similarity becomes a somewhat difficult issue. The approach to similarity pursued in this chapter makes use of a probabilistic framework [30, 31], introduced as follows. Let x = (x1, …, xn) denote a multidimensional data point, where x_i ∈ D_i and D_i is a domain. We say that x is heterogeneous if ∀ i ≠ j, i, j = 1, …, n, D_i and D_j are "different". Note that "different" may mean that the domains D_i come from different universes (e.g. real numbers, categorical data), or that the D_i are different (disjoint) subsets of the same universe. Following the probability-based approach to similarity, we aim to

 Find a mapping from distance to probability, within a specific domain;

 Aggregate the probability-based similarity across heterogeneous domains.

3.2 Probability-based approach to similarity evaluation versus Euclidean distance

For ease of notation, X denotes an attribute taking values in a domain D_X, and a, b ∈ D_X denote two of its values. X is assumed to be a random variable with values in a space D_X endowed with a distance measure d; X is distributed according to the distribution function F, that is, for x ∈ D_X, F(x) = P(X ≤ x). Then, following [32-34], the similarity between the values a and b of X is defined as

Sim_F(a, b) = P( d(X, Y) ≥ d(a, b) ),   (1)

where X, Y are independent, identically distributed (iid) according to F. In order to use (1) to calculate Sim_F(a, b), one must first find the probability distribution of d(X, Y). For example, if D_X = R and d(X, Y) = |X − Y|, the distribution of |X − Y| must be computed. The complexity of this computation depends on the distribution function F. Alternatively, using the well-known result from probability theory that if X has distribution F then F(X) is distributed U[0, 1], equation (1) can be replaced by

Sim_F(a, b) = P( |F(X) − F(Y)| ≥ |F(a) − F(b)| ) = (1 − |F(a) − F(b)|)²,   (2)

since for U, V iid uniform on [0, 1], P(|U − V| ≥ u) = (1 − u)².

While equations (1) and (2) both define a similarity measure, the extent to which they agree depends on how close the distributions F and U are.

3.3 Heterogeneity due to different underlying distributions

Consider a multidimensional data set of points x = (x1, …, xn), where x_i ∈ D_i = D, endowed with a distance d, and x_i distributed according to distribution function F_i, where F_i ≠ F_j when i ≠ j. The most common way of evaluating proximity is directly from d. For example, using the Euclidean distance d_E, upon normalization the similarity between a and b can be defined as

Sim_E(a, b) = 1 − d_E(a, b) / M,   (3)

where M = max{ d_E(x, y) | x, y ∈ D } is the maximum distance between values of X. To take the distribution underlying the data into consideration, the similarity must be defined directly from this distribution, as described below.

Assume now that each attribute has a different underlying distribution: a_i has a Normal distribution with mean μ_i and variance σ_i². Let G_{μi,σi}(x) = P(X ≤ x) denote the cumulative distribution function of X distributed Normally with mean μ_i and variance σ_i². Then, using d(x, y) = |x − y|, equation (1) becomes

Sim_i(a, b) = P( |X − Y| ≥ |a − b| ) = 2 ( 1 − G_{0, σi√2}(|a − b|) ),   (4)

since X − Y is Normal with mean 0 and variance 2σ_i².

It is obvious that knowing the distribution (CDF) is critical for each heterogeneous feature. However, in practice it is often unknown. Instead, the Empirical Cumulative Distribution Function (ECDF) can be obtained from the data, which in turn can be used to approximate a closed-form CDF.
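A minimal sketch of estimating equation (1) with the ECDF, assuming a precomputed pairwise distance matrix for one feature; the helper name is ours.

```python
import numpy as np

def probsim_from_ecdf(dist_matrix):
    """ProbSim via the empirical CDF of pairwise distances: Sim(a, b) is the
    probability that a random pair is at least as far apart as a and b
    (equation (1)), estimated from all pairwise distances for this feature."""
    n = dist_matrix.shape[0]
    iu = np.triu_indices(n, k=1)
    pool = np.sort(dist_matrix[iu])          # empirical distribution of d(X, Y)
    ranks = np.searchsorted(pool, dist_matrix, side='left')
    return 1.0 - ranks / len(pool)           # estimate of P(d(X, Y) >= d(a, b))
```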

3.4 Combining probability-based similarity across features

After mapping distances into probabilities, we face the question: how should these probabilities be combined? We investigate here two different ways of combining across features. Assume the similarities along k features are computed as S_1, …, S_k (recall that these are actually probabilities). Assuming independence among features (an assumption which would be satisfied for features such as color and texture), the overall similarity between two data points a and b is obtained by multiplying the individual similarities:

Sim(a, b) = ∏_{i=1}^{k} S_i,   (6)

where a = (a_1, …, a_k), b = (b_1, …, b_k), and S_i = Sim_F(a_i, b_i).

Alternatively, the Fisher transformation, whose statistic follows a chi-square distribution, can be employed to aggregate the probabilities:

T = −2 Σ_{i=1}^{k} ln S_i ~ χ²_{2k}.   (7)

Furthermore, the aggregated similarity is the corresponding p-value:

Sim(a, b) = P( χ²_{2k} ≥ T ).   (8)

Note that the case when one of the S_i = 0 is treated by replacing it with a very small value ε (see (Popovici 2008) for a discussion of this case).
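A minimal sketch of both aggregation rules, equations (6)-(8), including the ε guard discussed above; the helper name is ours.

```python
import numpy as np
from scipy.stats import chi2

def aggregate(sims, eps=1e-12):
    """Combine per-feature ProbSim values (each a probability) for one pair:
    product rule (equation (6)) and Fisher's chi-square method (equations (7)-(8))."""
    s = np.clip(np.asarray(sims, dtype=float), eps, 1.0)  # guard against S_i = 0
    prod = s.prod()                           # equation (6)
    T = -2.0 * np.log(s).sum()                # T ~ chi-square with 2k degrees of freedom
    fisher = chi2.sf(T, df=2 * len(s))        # p-value: the aggregated similarity (8)
    return prod, fisher
```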

3.5 Content based image retrieval results

The performance of the probability-based similarity measure is illustrated on two applications: content-based image retrieval (CBIR) and image annotation.

This section presents results of using the probability-based similarity in content-based image retrieval [6], based on both color and texture image features: the normalized color histogram and the normalized rotation-invariant local binary pattern (LBP) histogram [3]. An image thus encoded is a heterogeneous data point. The 1,000-image dataset is selected from the PSU 10,000 low-resolution web-crawled miscellaneous collection [1]. The experiments below show that the probability-based similarity, with aggregation using the p-value of the χ² distribution, produces better retrieval results than retrieval based on a single image feature (color or texture).

The experiment consists of two stages, offline training and online query.


Figure 3-1. PDF of L1 distances for the RGB histogram feature in the Corel5K dataset.

3.5.1 Offline Training

In this stage, the probability-based similarity is estimated from the image feature vectors, for each of the heterogeneous features. Each attribute takes histograms as values, which upon normalization can be considered discrete probability distributions. Therefore, the distance between two histograms is evaluated using a distance between probability distributions; in particular, Jeffrey's distance (a symmetric version of the Kullback-Leibler distance) is used:

d_KL(p, q) = Σ_{i=1}^{n} (p_i − q_i) ln(p_i / q_i),

where p = {p_1, …, p_n}, q = {q_1, …, q_n} are two histograms.

Therefore, the first step is to compute d_KL separately for all possible pairs of color features and of texture features. This generates two 1000 × 1000 distance matrices. By sorting all elements of the distance matrix, the empirical distribution function of d_KL is estimated; Figure 3-1 shows these distributions. Using equation (1), the similarity along the color and texture attributes can be easily computed. Finally, the χ² aggregation defined in (8) is used to combine the two heterogeneous similarities.

This step is not necessary for building an efficient CBIR system, but for the current study it helps visualize how the images in the database are clustered. The resulting similarity matrix is shown in Figure 5. Note that this stage requires O(N²) complexity.

3.5.2 Online Query Retrieval

For a query image, the system first extracts the color and texture features. The probability-based similarities between the query image and each image in the dataset, along each of the heterogeneous features, are then computed and aggregated via the chi-square method. The top k most similar images from the image database are returned. A significant improvement in retrieval quality is illustrated by the results for the "baby" query (image id 462).

Figure 3-2. ProbSim aggregation CBIR result for the "baby" query image. Left: color feature only.

3.6 Image annotation application results

Image annotation is another good application of our proposed approach: it seeks an annotation algorithm that can assign labels to unlabeled images based on a labeled image training set.

Its appeal for image annotation rests on the following assumptions:

• Similar images are more likely to share the same keywords.

• Multiple and heterogeneous image features are more likely to effectively capture meaningful similarity between two images.


3.6.1 Image Features

Some simple image features are used to describe the low-level image statistics. Since color and texture are the most common low-level visual cues for image representation, they are also used in this study. We use the following features:

Color:

• RGB histogram, 1 × 48 vector (each channel has 16 bins);

• HSV histogram, 1 × 48 vector;

Texture:

• Haar wavelet, 1 × 32 vector;

• Gabor wavelet, 1 × 32 vector.

3.6.2 k-Nearest Neighbor (kNN) Label Transfer

A greedy two-stage label transfer baseline algorithm is proposed in [35], as follows (see the sketch after this list):

• For a query image I_q, the k nearest-neighbor images are retrieved based on the probability-based similarity; they are denoted { I_1, I_2, …, I_k }, where I_1 is the most similar image to I_q.

• Labels are transferred from the annotations of the retrieval set to the query image I_q:

  o Sort the annotations of I_1 by their global frequency estimated from the training set;

  o Transfer the n highest-ranking annotations of I_1 to I_q. If |I_1| < n, proceed to the next step; otherwise done;

  o From the annotations of { I_2, …, I_k }, select the highest-ranking n − |I_1| annotations based on their local frequency.

We apply this simple but efficient baseline algorithm to our image annotation.
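A minimal sketch of this two-stage transfer. For self-containment, the global frequency of stage 1 is approximated here by annotation frequency within the retrieval set, which is our simplifying assumption rather than the exact procedure of [35].

```python
from collections import Counter

def label_transfer(neighbors, n=5):
    """Greedy two-stage label transfer. `neighbors` is a list of annotation
    lists for the k retrieved images, most similar first. Global frequency
    is approximated by frequency inside the retrieval set (an assumption)."""
    local_freq = Counter(w for anns in neighbors for w in anns)
    # Stage 1: take up to n annotations from the most similar image I_1.
    first = sorted(neighbors[0], key=lambda w: -local_freq[w])[:n]
    if len(first) >= n:
        return first
    # Stage 2: fill the remaining slots from the rest of the neighborhood.
    rest = [w for anns in neighbors[1:] for w in anns if w not in first]
    rest = sorted(set(rest), key=lambda w: -local_freq[w])
    return first + rest[:n - len(first)]
```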

We evaluate the ProbSim annotation algorithm on the de facto benchmark dataset Corel5K [12], which consists of 5,000 annotated images collected from the larger Corel CD set. Among these images, 4,500 samples are randomly selected as the training set, and 500 are left as the testing set.


Each result reported here is the average of 5 runs. Some statistics for Corel5K are listed below:

 Each image has on average 3.5 and no more than 5 annotations.

 The dictionary contains 374 words but only 260 words are used.

 5,000 images fall into 50 folders, each of which contains 100 conceptually similar images.

3.6.3 Performance evaluation

For performance evaluation, we focus on the following measures:

• Precision (P): the annotation precision for a keyword is defined as the number of images correctly assigned the keyword, divided by the total number of images predicted to have the keyword.

• Recall (R): the annotation recall is defined as the number of images correctly assigned the keyword, divided by the number of images assigned the keyword in the ground-truth annotation.

• Classification (C): the folder id can roughly be taken as a class label, so classification performance can be approximated by checking whether the image predicted as most similar shares the query's folder id. Note that this metric is not standard in the image annotation literature; we use it only as another point of view on system performance.

• Recalled Keywords (N+): the number of keywords recalled by the system, measuring the ability to cover the whole dataset.


Experimental results are reported in Table 3-1 and Figure 3-7. Table 3-1 shows the precision, recall, classification, and recalled-keywords results when each of the combination methods ProbSim-Chi2 and ProbSim-Prod is used. We compare these results with results reported in the literature, focusing on three different aspects:

• Performance along each feature separately.

• Performance reported in a recent publication [35].

• Our proposed ProbSim with Chi2 aggregation, ProbSim-Chi2, and with the product aggregation, ProbSim-Prod.

Table 3-1: Result of similarity evaluation and aggregation using ProbSim and methods from the literature.

Method         P%    R%    N+     C%
RGB            20    23    110    -
HSV            18    21    110    -
Haar            6     8     53    -
Gabor           8    10     72    -
SML            23    29    137    -
CorrLDA         6     9     59    -
Lasso          24    29    127    -
JEC            27    32    139    -
ProbSim-Chi2   25.1  35.5  102.6  35.2
ProbSim-Prod   25.4  36.5  106.5  35.1

We can see from Table 3-1 that:

• ProbSim-Prod is slightly better than ProbSim-Chi2.

• ProbSim has competitive or similar precision and recalled-keywords performance.

• ProbSim outperforms the others with much better recall performance.


3.6.4 Dependence on the kNN neighborhood size

kNN is employed to select the most important candidates from the query image's neighborhood; therefore, the size of the neighborhood may influence the annotation result. Figure 3-7 shows the relationship between the kNN neighborhood size and the precision, recall, and classification rates.

We can see that as the size of the neighborhood grows:

• the recall rate becomes higher, because more coverage is achieved by adding more neighbors;

• the precision rate falls a little, because some incorrect annotations may be brought in by wrong neighbors;

• the classification rate seems independent of the neighborhood size.

3.6.5 ProbSim-Annotation results

Comprehensive experiments and results are illustrated in this section. In each of the figures below, the file ID in the Corel5K dataset appears above the image. The first line below the image consists of the predicted annotations, while the second line consists of the ground-truth annotations.

Comparing ProbSim using only color or texture features with using all features: in Figures 3-3 and 3-4, the retrieved neighbors are not good enough to transfer the annotations to the query image. Using all features (Figure 3-5), we obtain a higher-quality annotation result; note that "water" is brought in.

Comparing ProbSim with JEC: though ProbSim and JEC have similar precision, it is still worthwhile to compare them in different scenarios. In Figure 3-6, ProbSim is better than JEC, since JEC brings in the totally unrelated word "boats".

More results are available in Appendix I.


Figure 3-3. Using only color features.

Figure 3-4. Using only texture features.

Figure 3-5. Using all color and texture features.

Figure 3-6. ProbSim outperforms JEC.


Figure 3-7: kNN neighborhood size and Precision, Recall, Classification rate

Chapter 4 Framework for mining multiple heterogeneous networks

Entities should not be multiplied without necessity.

-- Ockham’s razor

This chapter acts as a bridge from the probability-based similarity measure on heterogeneous data to a new "continent": the mining of multiple heterogeneous networks, and dynamic network modeling and prediction. The foundation of the two proposed algorithmic frameworks is manifold learning.

A key component in multiple heterogeneous network mining is how to find the topological correspondence among networks. A recent paper [36] employs topological network alignment to uncover biological function and phylogeny. This field is still challenging due to its complexity.

Figure 4-1. The framework for mining multiple heterogeneous networks.

Figure 4-2. Analogy to the Fourier transform in signal processing, which shares the same spirit as our approach.

4.1 Manifold alignment with Procrustes for heterogeneous networks

Manifold alignment finds the correspondence between two seemingly disparate datasets [37]. It aligns the manifolds using a set of correspondences sampled from the smooth underlying data manifold, and it constitutes a critical component of our proposed framework, due to the variability of the manifolds across time points. Procrustes alignment [37] seeks the isotropic dilation and the rigid translation, reflection, and rotation needed to best match two embeddings X and Y, by optimizing the objective

Q_opt = argmin_Q ‖X − k Y Q‖_F,

where k = tr(Σ) / tr(YᵀY) and Σ is the diagonal matrix of singular values of the singular value decomposition (SVD) YᵀX = UΣVᵀ, which also gives Q = UVᵀ.

With the Procrustes alignment matrix Q, correspondence between two graph manifolds can be established with respect to different procedures. We denote the aligned manifold at time point t as

Z_t = k Y_t Q ∈ R^{M×K}.
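A minimal NumPy sketch of this Procrustes alignment, following the objective above; the helper name is ours, and both embeddings are assumed to be centered.

```python
import numpy as np

def procrustes_align(X, Y):
    """Align embedding Y onto X: find the orthogonal transform Q and
    isotropic scale k minimizing ||X - k Y Q||_F (centered embeddings)."""
    U, sigma, Vt = np.linalg.svd(Y.T @ X)   # SVD of Y^T X = U Sigma V^T
    Q = U @ Vt                              # optimal rotation/reflection
    k = sigma.sum() / np.trace(Y.T @ Y)     # optimal isotropic scale
    return k * Y @ Q                        # aligned manifold Z
```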

Figure 4-3. Manifold alignment example by Procrustes analysis [37].

32

Figure 4-4 further illustrates the embedding and alignment of two graphs and the effect of adding one edge. The bottom left plot shows the embeddings of G1 and G2 without any alignment (circles for G1, crosses for G2). The bottom right plot shows the alignment of G1 and G2: corresponding nodes from G1 and G2 almost coincide; moreover, the addition of the edge (3, 5) pulls nodes 3 and 5 of G2 closer.


Figure 4-4. Graph embedding and manifold alignment on synthetic networks G1 and G2.


4.2 Experimental dataset

This section investigates manifold alignment by the Procrustes method on multiple heterogeneous biological networks. The real-world dataset comes from the orphan (rare) disease domain, from a recent publication of a study carried out at Cincinnati Children's Hospital Medical Center [38]. Specifically, the data sets are two heterogeneous networks of genes, based on (a) shared disease and (b) protein-protein interaction (PPI).

One example subnetwork is illustrated in Figure 4-5; it is related to rare diseases such as familial Alzheimer's disease, amyloidosis, etc. The complete network statistics are available in Table 4-1.

Figure 4-5. An example orphan disease-gene network. Red nodes represent orphan diseases, while green nodes are the causal genes. Node size is proportional to vertex degree.


Table 4-1. Network statistics of orphan disease networks.

Network type                        #Vertices (#genes)  #Vertices shared  #Edges shared  #Edges in total  Edge meaning
Co-disease                          734                 448               2221           4817             Genes that share the same disease
Protein-Protein Interaction (PPI)   1173                448               194            3072             Proteins that interact

4.3 Experimental results

We perform data pre-processing as follows:

Data preparation:

1. Extract the largest connected component from the co-disease and PPI gene-gene networks.

2. Compute the all-pairs shortest path distance kernel on each of them.

3. Get the common (overlapping) gene set of the two networks.

4. Extract from each network the submatrix of step 2 corresponding to the common gene set, denoted D1 and D2; these are the distance kernels of interest.

5. Compute the graph embeddings using MDS on D1 and D2, resulting in Y1 and Y2 (a minimal sketch of classical MDS follows this list).
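As referenced in step 5, here is a minimal sketch of classical MDS on a distance kernel D; the helper name is ours.

```python
import numpy as np

def classical_mds(D, d=2):
    """Classical MDS: embed points so Euclidean distances approximate the
    shortest-path distance matrix D (the distance kernel from step 4)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]         # top-d eigenpairs
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```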

Then, given a gene set of interest (GOI), we perform the following steps to infer similar genes in the embedding space. Note that the dataset consists of both a training set and a testing set: the GOI constitutes the training set, from which the manifold alignment model is estimated; the testing set is constructed using the 1st neighborhood of each GOI gene, defined on its corresponding network.

Case study for an input cluster:

1. Extract the known cluster of gene names, denoted the genes of interest (GOI).

2. Perform manifold alignment with the Procrustes method on the GOI subsets of Y1 and Y2, denoted X1 and X2.

3. Visualize X1 and the aligned X2. We expect the manifold alignment to minimize the distance between the point clouds of X1 and X2 by optimally transforming X2.

4. Evaluate the fitness of the alignment.

5. Apply this manifold transformation to the 1st-order neighborhoods of X1 and X2.

6. Evaluate the results by investigation in the embedding space.

The evaluation is done on:

• a global metric: the sum of squared errors (SSE) between the co-disease and aligned PPI networks in the embedding space;

• a local metric: a case study of 2 genes in the 1st neighborhood of the GOI in each network, quantitatively measured by gene functional enrichment using the ToppGene suite [39].

The results are illustrated in the next series of figures. The sub-network is related to a rare disorder called amyloidosis, which occurs when amyloid proteins build up in various organs. Amyloid is an abnormal protein usually produced by bone marrow cells that can be deposited in any tissue or organ (heart, kidneys, liver, spleen, nervous system, and digestive tract). In our result, the inferred gene FBLN1 is clustered with the gene network related to amyloidosis and related conditions, suggesting that it could be a potential novel candidate gene for amyloidosis and related phenotypes. Interestingly, when we checked whether there are any "connections" between FBLN1 and the original GOI (related to amyloidosis), we found some shared interactions. For instance, LYZ (from cluster 1) and FBLN1 both interact with a common protein, ELN1. Likewise, FBLN1 and FGA (from cluster 1) both interact with a common protein, NID1. It is to be noted that NID1 is not in our data set. Additionally, FBLN1 has been reported to interact directly with the amyloid precursor protein (APP), and mutations in this gene have been implicated in cerebroarterial amyloidosis and autosomal dominant Alzheimer disease. We believe that this approach would be useful in identifying potential novel candidates (diseases and also drug targets), including modifier genes.



Figure 4-6. Aligning the co-disease (related to the Alzheimer disease pathway) and protein-protein interaction networks. Each node is a gene with a different color code. Each edge denotes the heterogeneous knowledge: sharing a common disease in the co-disease network; an interacting protein pair in the PPI network. The original co-disease (red) and PPI (green) networks are not well aligned.


Figure 4-7, the zoomed-in subgraph, inferring that gene FBLN1 is similar to FGA, LYZ, etc., based on the distance in the embedding space. Note that the model is trained on the GOI but applied to the 1st-order neighborhood of the GOI in each network.

Table 4-2, ToppGene results for the inferred gene list in Figure 4-7.

Category | ID | Name | P-value | Hit in Query List
GO: Cellular Component | GO:0005615 | extracellular space | 0 | B2M, TTR, LYZ, GSN, FBLN1, APOA1, IGHG1, FGA, CST3
Drug | D008627 | Mercuric Chloride | 0.016561 | B2M, LYZ, FBLN1, APOA1, FGA
Pubmed | 14718574 | The human plasma proteome: a non-redundant list developed by combination of four separate sources. | 0.000009 | TTR, GSN, FBLN1, APOA1, FGA
Interaction | int:FGB | FGB interactions | 0.006876 | FBLN1, FGA
Interaction | int:APP | APP interactions | 0.012743 | HTRA2, GSN, FBLN1
Interaction | int:ELN | ELN interactions | 0.026 | LYZ, FBLN1
Interaction | int:NID1 | NID1 interactions | 0.038383 | FBLN1, FGA


Figure 4-8. Another case study of 1st-order neighbor genes: CRYAB, NR4A2, ACTN4. Using the same functional enrichment analysis with the ToppGene suite, we found that CRYAB, NR4A2, and ACTN4 are commonly related to "Genes up-regulated in NHEK cells (normal epidermal keratinocytes) after UVB irradiation", with a p-value of 0.03.


Chapter 5 Framework for Mining Dynamic Networks

"A good hockey player plays where the puck is. A great hockey player plays where the puck is going to be."

--- Wayne Gretzky, via Steve Jobs

As pointed out in Chapter 2, a paradigm shift is underway in the field of social network mining, according to which a "static" view of networks is replaced by a "dynamic" view. This chapter presents such an approach, in which the Laplacian of a dynamic graph (a graph evolving in time) is utilized as a predictor of the graph evolution. We discuss the theoretical foundations of the approach and their optimality, followed by a more general framework that identifies the key components of the dynamic graph link prediction problem.

5.1 Spectral Regression with Low-Rank Approximation Approach for Link Prediction in Dynamic Graphs [40, 41]

The first important issue is to decide exactly what needs to be extracted from the network such that:

(i) the network can be effectively tracked and predicted in time;

(ii) the predicted structure used to (re)construct a network state is consistent with the network evolution.

1 This work was originally published in NIPS 2010 and IEEE Transactions on Intelligent Systems, 2011.


For example, various features, such as the node degrees of the graph representing the network, can be used in conjunction with an iterative regression solver to predict the graph features as the network evolves in time [5]. Alternatively, spectral approaches attempt to model graph evolution using polynomial curves [42]; these assume that the eigenvectors remain stable over time. Finally, a combined time series ARMA model and low-rank approximation approach for estimating the eigenvectors of the Laplacian matrix at each time point has been proposed in [40].

5.1.1 Algorithm

Definition (Low-Rank Approximation of the graph Laplacian). Given the N × N graph Laplacian matrix L, its low-rank (K << N) approximation at time moment t is given by

$$\hat{L}_t = \sum_{k=1}^{K} \lambda_{t,k}\, x_{t,k}\, x_{t,k}^{T}$$

where $x_{t,k}$ and $\lambda_{t,k}$ denote, respectively, the orthonormal eigenvectors and the corresponding eigenvalues of the graph Laplacian matrix at time t. One can observe that (refer to [43] for the proof)

$$\|L_t - \hat{L}_t\|_F \to 0 \quad \text{as } K \to N,$$

where $L_t$ is the actual graph Laplacian at time t and $\|\cdot\|_F$ denotes the Frobenius norm. This shows that as the rank increases, the low-rank approximation naturally tends toward the actual Laplacian matrix; thus, if enough eigenvectors are chosen, a good approximation of the Laplacian matrix is obtained.
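As a quick numerical illustration of this property, the following is a minimal numpy sketch using a toy triangle graph; the rank values are arbitrary and the helper name is ours, not the experimental code:

```python
import numpy as np

def low_rank_laplacian(L, K):
    """Rank-K approximation from the K largest-magnitude eigenpairs."""
    lam, X = np.linalg.eigh(L)              # eigh: L is symmetric
    top = np.argsort(np.abs(lam))[-K:]
    return (X[:, top] * lam[top]) @ X[:, top].T

L = np.diag([2, 2, 2]) - (np.ones((3, 3)) - np.eye(3))   # Laplacian of K3
for K in (1, 2, 3):
    print(K, np.linalg.norm(L - low_rank_laplacian(L, K), 'fro'))
    # the Frobenius error is non-increasing and reaches 0 by K = N
```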

Many approaches to the link prediction problem focus on defining similarities between the vertices in the graph [3, 4]. These include statistical methods to predict the overall properties of the graph [44]. However, most algorithms based on these approaches still perform poorly. A more general framework with fewer assumptions on the graph topology should improve the predictive power in dynamic graphs. A natural choice for studying graph evolution over time is a time series approach. In time series analysis [45], statistical models capture a sequence of successive data points, which are then used for prediction purposes. In particular, the Auto-Regressive Moving Average (ARMA) model is a sophisticated time series model for regression and prediction of temporal signals. The first task in our approach to link prediction is to identify those eigenvectors which are useful in solving the prediction problem.

Stated as an optimization problem, one obtains the Rayleigh quotient problem, which is optimally solved by the eigenvectors corresponding to the smallest eigenvalues of the Laplacian. However, the contribution of these eigenvectors to the reconstruction of the graph Laplacian, and therefore of the graph, is very limited. The smallest eigenvectors actually reflect the graph-cut property, which encodes cluster structure. On the other hand, the largest eigenvectors (corresponding to the largest eigenvalues) preserve the neighborhood structure and contribute more to the reconstruction of the graph. Therefore, the K largest eigenvectors of the graph Laplacian are selected, and the behavior of their elements over time is modeled using ARMA. The next issue is to determine their corresponding eigenvalues. From empirical data we observe that when the dynamic graph evolves gradually (i.e., few edges are added or removed at each successive time moment), the cumulative relative change in the eigenvalues is very small (about 6%). This suggests using the spectrum of the Laplacian at time T for the low-rank approximation at time T + 1.

Now, from the predicted eigenvectors and the spectrum of the graph at time T, a low-rank approximation of the graph at time T + 1 is created as a weighted sum of outer products of eigenvectors. Since the predicted graph eigenvectors are optimal in the sense of minimizing the least squares error of the objective function in the ARMA model, the result is a good estimator of the graph Laplacian, as illustrated in the next subsection. The above discussion is captured by the algorithm shown in Figure 5-1.

Figure 5-1. Graph Spectrum Regression Link Prediction Algorithm
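Since the algorithm itself appears only as a figure, the following condensed Python sketch illustrates its flow, assuming `Laplacians` (a hypothetical name) holds the T observed Laplacian matrices as numpy arrays; K and the ARMA order (2, 1) are illustrative choices, and consistent sign/ordering of the eigenvectors across time is assumed:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

K = 5
eigvals, eigvecs = [], []
for L in Laplacians:                     # spectral decomposition per snapshot
    lam, X = np.linalg.eigh(L)
    eigvals.append(lam[-K:])             # K largest eigenvalues
    eigvecs.append(X[:, -K:])            # ...and their eigenvectors

V = np.stack(eigvecs)                    # shape (T, N, K)
V_next = np.empty(V.shape[1:])
for i in range(V.shape[1]):              # regress each eigenvector element
    for k in range(K):
        fit = ARIMA(V[:, i, k], order=(2, 0, 1)).fit()
        V_next[i, k] = fit.forecast(1)[0]

# Low-rank reconstruction reusing the spectrum of the last observed snapshot.
L_pred = (V_next * eigvals[-1]) @ V_next.T
```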

5.1.2 Illustrative experiments

To illustrate the algorithm, we synthesize a dynamic graph with 100 nodes, consisting of 10 graphs at 10 time points. The degree $\deg_t(1)$, $t = 1, \ldots, 10$, increases in time, simulating an active node that attaches to other nodes. In these experiments, node 1 is successively attached to nodes 51, 52, ..., 60. Thus $\deg_t(1)$ increases from 0 to 10, and $\deg_t(50 + t) = 1$, $t = 1, \ldots, 10$. The resulting states of the dynamic graph, a sequence of ten graphs, are shown in Figure 5-2.


Figure 5-2. Synthetic dynamic graphs (above) and the time series (below) of four nodes, for the 2nd smallest and the largest eigenvectors.

From Figure 5-2 it can be observed that the changes in the 2nd smallest eigenvector for node 1 at each time moment, as new links are added to this node, are much larger than those of the corresponding eigenvector entries for nodes 50, 51, 52. This behavior is consistent with the activity (link addition) over time of each of these nodes.

From this experiment, we can see how time series models can be utilized to model the dynamics of a temporal graph sequence. However, can we push this further and generalize the concept of mapping the discrete graph into a continuous space? For instance, can we reduce the temporal variance of the "graph features" in Figure 5-1 to make a more reliable prediction?

We explore this problem in the next section and propose a more general algorithmic framework for mining dynamic social networks.

5.2 Key components of the dynamic network mining framework [46]

In the previous section, we proposed using the low-rank approximation of the Laplacian matrices of a dynamic graph sequence. Going one step further, toward a more general framework for mining dynamic networks, we identify the key components of an algorithmic framework, as shown in Figure 5-3.

Figure 5-3. The framework for the link prediction problem in temporal social networks

2 This work was originally published in ACM SIGKDD (Knowledge Discovery and Data Mining), 2011.


Figure 5-4. The evolution of graph manifolds over time. Each graph is represented as a manifold embedded in a high-dimensional space at each time point. The two dots are graph vertex embeddings sampled from the underlying smooth manifolds, whose correspondences are linked by red and blue lines as time series. Note that each manifold only preserves the topology within its own time point and does not necessarily correspond to the neighboring time points. This observation indicates the importance of aligning the manifolds before we model them as time series.

5.2.1 Algorithm for mining dynamic networks

Formally, the steps shown in Figure 5-3 are captured by the following high-level algorithm:


Algorithmic Framework for Mining Dynamic Networks

Input: A dynamic graph sequence $G_1, G_2, \ldots, G_T$.

Parameters: a graph distance or kernel: any PSD graph distance matrix or any Mercer kernel can be adopted, such that local and global distance constraints are preserved. The shortest-path kernel is adopted here.

Output: Predicted $G_{T+1}$.

- Step 1 (Graph Embedding). For each graph, apply Multi-Dimensional Scaling, or another graph embedding algorithm (e.g., one using the graph spectrum), to map the graph into Euclidean space (or another continuous space);

- Step 2 (Manifold Alignment). Align each embedding in the embedded space, with different choices of alignment algorithms as in Section 2;

- Step 3 (Trajectory Prediction). For each graph vertex in the embedded space, model its trajectory over time using any time series regression model, e.g., a linear model, ARMA, etc. After this step, the graph embedding is optimally predicted;

- Step 4 (Graph Reconstruction). Construct the predicted graph from the pairwise distances in the embedded space (a compact end-to-end sketch follows this list).
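The following is a compact end-to-end sketch of the four steps, assuming `graphs` (a hypothetical name) holds the snapshots $G_1, \ldots, G_T$ as networkx graphs over a fixed vertex set; the 2-dimensional embedding and the per-vertex linear trajectory model are illustrative choices, not the exact experimental settings:

```python
import numpy as np
import networkx as nx
from sklearn.manifold import MDS
from scipy.spatial import procrustes

def embed(G, dim=2):
    """Step 1: MDS on the all-pairs shortest-path distance matrix."""
    D = np.asarray(nx.floyd_warshall_numpy(G))
    return MDS(n_components=dim, dissimilarity='precomputed').fit_transform(D)

Y = [embed(G) for G in graphs]

# Step 2: align every embedding (including the first) to the frame of year 1;
# scipy's procrustes returns the standardized, optimally transformed copy.
Z = [procrustes(Y[0], Yt)[1] for Yt in Y]

# Step 3: per-vertex, per-dimension linear trajectory regression.
T, (M, K) = len(Z), Z[0].shape
t = np.arange(T)
Z_next = np.empty((M, K))
for m in range(M):
    for k in range(K):
        coef = np.polyfit(t, [Z[s][m, k] for s in range(T)], deg=1)
        Z_next[m, k] = np.polyval(coef, T)

# Step 4: reconstruct the predicted graph from pairwise embedded distances.
D_next = np.linalg.norm(Z_next[:, None, :] - Z_next[None, :, :], axis=-1)
```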

5.2.2 Graph Embedding

Graph embedding is the first component of the framework. Its goal is to represent the graph as a point cloud in some continuous space where its evolution can be tracked using conventional regression techniques. To achieve this goal, many of the available methods in Multi-Dimensional Scaling (MDS) can be used.


MDS is generally used for visualization of a dissimilarity matrix in some smooth space. A good review of the available methods and techniques can be found in [12]. MDS methods can be classified into metric and non-metric methods. In metric MDS, the dissimilarity matrix must represent a metric on the embedding space. The MDS algorithm used here is classical metric MDS, where the metric is the shortest-path distance between two nodes in the original graph.

After this step, each graph snapshot $G_t$ is optimally embedded into a real matrix of M vertices by K MDS dimensions, that is, $Y_t \in \mathbb{R}^{M \times K}$.
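For concreteness, classical metric MDS reduces to an eigendecomposition of the double-centered squared-distance matrix; below is a numpy-only sketch under the assumption that D is the M × M shortest-path distance matrix:

```python
import numpy as np

def classical_mds(D, K=2):
    """Embed an M x M distance matrix into K dimensions."""
    M = D.shape[0]
    J = np.eye(M) - np.ones((M, M)) / M          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # Gram matrix from distances
    lam, X = np.linalg.eigh(B)
    top = np.argsort(lam)[-K:]                   # K largest eigenvalues of B
    return X[:, top] * np.sqrt(np.maximum(lam[top], 0.0))
```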

5.2.3 Manifold Alignment

Manifold alignment finds the correspondence between two seemingly disparate datasets [37]. It constitutes a critical component of our framework, due to the variability of the manifolds across time points.


Procrustes alignment [37] seeks the isotropic dilation and the rigid translation, reflection, and rotation needed to match embeddings X and Y, by optimizing the following objective function:

$$Q^{\mathrm{optimal}} = \arg\min_{Q} \|X - kYQ\|_F$$

where $k = \mathrm{tr}(\Sigma)/\mathrm{tr}(Y^T Y)$ and $\Sigma$ is the diagonal matrix of the SVD of $Y^T X$.

With the Procrustes alignment matrix Q, the correspondence between two graph manifolds can be computed under different procedures. We denote the aligned manifold at time point t as $Z_t \in \mathbb{R}^{M \times K}$.
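A direct numpy sketch of this objective, assuming X and Y are centered M × K embeddings:

```python
import numpy as np

def procrustes_align(X, Y):
    """Return Z = k * Y @ Q, with Q = U V^T and k = tr(Sigma) / tr(Y^T Y),
    where U Sigma V^T is the SVD of Y^T X."""
    U, sigma, Vt = np.linalg.svd(Y.T @ X)
    Q = U @ Vt                                   # optimal rotation/reflection
    k = sigma.sum() / np.trace(Y.T @ Y)          # isotropic dilation
    return k * Y @ Q
```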


5.2.4 Trajectory Estimation and Prediction

This component carries out the modeling and prediction of the entire framework. Since we have formulated the temporal graph as time series data, we can employ statistical methods, e.g., Generalized Linear Models, Kalman filtering, etc.

To avoid overfitting and to keep our model sufficiently simple, we use linear and quadratic regressions for trajectory estimation and prediction of the embedded and aligned vertex manifold coordinates. Each vertex's coordinate time series is centered and then fitted into a nested random effect regression model. Both linear and quadratic fittings are considered:

Linear:

$$Z_{tmk} = \beta_{mk}\,t + \varepsilon_{tmk}, \qquad \beta_{mk} \sim N(0, \tau_k^2), \quad \varepsilon_{tmk} \sim N(0, \sigma_k^2) \qquad (4)$$

Quadratic:

$$Z_{tmk} = \beta^{(1)}_{mk}\,t + \beta^{(2)}_{mk}\,t^2 + \varepsilon_{tmk} \qquad (5)$$

for snapshots (time) $t = 1, 2, \ldots, T$, vertices $m = 1, 2, \ldots, M$, and dimensions $k = 1, 2, \ldots, K$. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are used for the selection of the regression model (linear or quadratic). To compare and select alignment methods, we define the learning error as the Euclidean distance between the true and the learned positions of a vertex:

$$e_{mt} = \Big(\sum_{k=1}^{K} (Z_{t,m,k} - \hat{Z}_{t,m,k})^2\Big)^{1/2} \qquad (6)$$

Furthermore, the quantities $e_{mt}$ are weighted to produce a more sensitive error function:

$$e^{w}_{mt} = \frac{e_{mt}}{\Big(\sum_{k=1}^{K} Z_{t,m,k}^2\Big)^{1/2}} \qquad (7)$$


Indeed, $e^{w}_{mt}$ is more robust to random errors on the boundaries of the whole network and puts more penalty on prediction algorithms that perform relatively worse inside the center of the data clusters.
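A small numpy sketch of the error metrics (6) and (7), where Z_true and Z_hat are assumed to be the M × K true and learned position matrices at one time point:

```python
import numpy as np

def learning_errors(Z_true, Z_hat):
    e = np.linalg.norm(Z_true - Z_hat, axis=1)   # Eq. (6), per vertex
    e_w = e / np.linalg.norm(Z_true, axis=1)     # Eq. (7), weighted
    return e, e_w
```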

5.2.5 Graph Reconstruction

Prior to this step, an optimal predicted graph embedding $X_{T+1}$ for $G_{T+1}$ has been computed. To reconstruct the graph, we need to ensure that the graph kernel derived from $X_{T+1}$ is positive semi-definite (PSD) [9]. In our framework, we circumvent this potential issue by building the graph from the pairwise distances in $X_{T+1}$, which is guaranteed to be PSD.

It is worth mentioning that the nonlinearity in the data is captured by the manifold learning, especially by the graph distance kernel, which captures geodesic distances along the manifold. Hence, although we eventually solve the optimization problem as a linear system, we are inherently dealing with highly nonlinear data.

Extensive experimental results, in which we quantitatively evaluate our proposed algorithmic framework, are reported in the next chapter.


Chapter 6 Experimental Results of Mining Dynamic Networks

"The purpose of computing is insight, not numbers."

--- Richard Hamming

Extensive experiments on real-world social networks evaluate the effectiveness of each component and demonstrate the promise of our framework for link prediction. Furthermore, several functionalities derived from our framework, such as the visualization of temporal social networks as trajectories and vertex behavior modeling, are demonstrated.

6.1 Experiment Data: DBLP co-authorship network

Social network analysis with dynamic networks is still a young research area, and very few datasets are available. The DBLP (Digital Bibliography & Library Project) data set is one of these precious resources and is used in our experimentation. One example DBLP XML entry is given in Figure 6-1. The data set is analyzed from 1995 to 2004, and an author set is chosen in which every author must have contributed in at least 8 of the 10 years under study. From this set of authors, the CORE set is chosen as the largest connected component of the first snapshot (1995). This results in a CORE set of 2,538 authors at 10 different time points. An illustration of a co-authorship graph snapshot in year 1995 is given in Figure 6-2.

3 http://dblp.uni-trier.de/


<article key="journals/expert/FangRK11">
  <author>Chunsheng Fang</author>
  <author>Anca L. Ralescu</author>
  <author>Mojtaba Kohram</author>
  <title>Spectral Regression with Low-Rank Approximation for Dynamic Graph Link Prediction.</title>
  <pages>48-53</pages>
  <year>2011</year>
  <volume>26</volume>
  <journal>IEEE Intelligent Systems</journal>
  <number>4</number>
  <ee>http://doi.ieeecomputersociety.org/10.1109/MIS.2011.44</ee>
  <url>db/journals/expert/expert26.html#FangRK11</url>
</article>

Figure 6-1, one sample DBLP XML entry for a co-authored publication article. In this study, we are interested in the <author> entries, from which we extract the co-author relationship, forming a bipartite graph for each article; another entry of interest is <year>, which helps to construct the dynamic network.

Figure 6-2, the DBLP co-authorship core set network snapshot in year 1995. Each vertex denotes an author in the core set; each edge denotes at least one article co-authored by the two corresponding authors.


6.2 Validating Manifold Alignment

The proposed framework also provides an innovative visualization of temporal social networks, as Figure 6-3 illustrates. Each vertex can be viewed as a trajectory along the time axis, while each layer represents the graph embedding in each year.

Four important observations can be made from Figure 6-3:

1. Of all the trajectories evolving with time, those without manifold alignment have high temporal variance, with large fluctuations;

2. All three alignments reduce the trajectory variance, to different degrees;

3. Procrustes alignment performs the best, both in the magnitude and in the fluctuation of the variance;

4. Figures 6-3 and 6-4 reveal another interesting phenomenon, not accounted for by previous methods: the gravitational collapse of trajectories.

In the current formulation of the link prediction problem, the dynamic graph has a fixed vertex set. The edge set increases in time, as edges are never deleted (for the DBLP example, this means that if two authors have been linked, they remain linked). It then follows that the diameters of successive graphs (successive network snapshots) are eventually monotonically decreasing, as the number of shortest paths increases. The graph embedding and the trajectories of each vertex reflect this property. By analogy with the astronomical gravity effect, which attracts mass until eventually every atom collapses into a singularity, we call this phenomenon the gravitational collapse of trajectories. In the social network context, this corresponds to the convergence of the graphs representing the network to a complete graph.


Figure 6-3. Trajectories of four different alignments for the real-world DBLP data set with 2,538 core authors: no alignment; alignment to the previous year, recursively; alignment to the 1st year; Procrustes alignment to the 1st year. In all panels, each horizontal layer shows the 2D graph embedding of one year. Each corresponding vertex (author) is linked by line segments going upward. As the arrows in the 1st panel point out, without alignment the trajectories have huge variance and fluctuate dramatically. All three alignments reduce the trajectory variance to different degrees, among which Procrustes performs the best.

Figure 6-4. Gravitational collapse of trajectories to a singularity. Left panel: trajectories after manifold alignment for the real-world DBLP data set with 2,538 core authors. Right panel: the conceptual idea of all trajectories converging to a singularity (complete graph). The collapsing phenomenon can be visually inspected, indicating that the diameters of the graphs are shrinking.


6.3 Gravitational collapse of trajectories

We observe an interesting phenomenon unaccounted for by previous methods. We refer to it as gravitational collapse of trajectories.

In our formulation, we consider a dynamic graph with a fixed vertex set whose edge set increases monotonically over time. Obviously, the diameter of each graph snapshot asymptotically decreases, due to the increasing number of shortest paths. This change is captured by the graph embedding and, furthermore, is reflected by the trajectories of each vertex, which tend to get closer as time passes. This phenomenon is similar to the astronomical gravity effect, which attracts mass until eventually everything collapses into a singularity; in our social network setting, it hints at convergence to a complete graph.

6.4 Trajectory Modeling Results

Both linear and quadratic regressions are applied for the estimation and prediction of the author coordinates in year 2004. We fit the model with the first two primary MDS dimensions. For each type of aligned data, the linear fitting has better AIC and BIC scores. Table 6-1 summarizes the means and standard deviations of the learning errors for the four types of alignment algorithms. The Procrustes alignment yields the smallest mean weighted error. It is not surprising that the data without any alignment has errors significantly larger than those from any alignment.

Figure 6-5 presents scatter plots of the ground-truth coordinates vs. the estimated and predicted coordinates for the network in 2004. Prediction has wider variance than estimation, due to the exclusion of the last year's data. Alignment plays a significant role in modeling the dynamic graph. The Procrustes alignment method with linear random effect regression performs well in both estimation and prediction, which strongly supports the claim that, under this framework, the nature of the dynamic social network is sufficiently captured by a simple model.


Table 6-1. Estimation and prediction errors for the four alignments in linear and quadratic nested random effect models.

Alignment Method | Regression | AIC | BIC | Estimation Error (6) | Estimation Error, Weighted (7) | Prediction Error (6) | Prediction Error, Weighted (7)
Without manifold alignment | Linear | 192926.1 | 192942.2 | 1.7473 (1.0435) | 1.2271 (0.7722) | 2.3604 (1.4096) | 1.6576 (1.0432)
Without manifold alignment | Quadratic | 209312.2 | 209328.1 | 1.1329 (0.6530) | 0.8544 (1.9637) | 2.5351 (1.4613) | 1.9120 (4.3943)
Affine transform to previous year | Linear | 184649.8 | 184665.8 | 0.4283 (0.2988) | 0.3353 (0.5095) | 0.5785 (0.4036) | 0.4529 (0.6883)
Affine transform to previous year | Quadratic | 203406.2 | 203422.0 | 0.6214 (0.4009) | 0.5524 (2.2990) | 1.3905 (0.8972) | 1.2361 (5.1446)
Affine transform to the 1st year | Linear | 161244.7 | 161260.7 | 0.4043 (0.3362) | 0.2767 (0.6975) | 0.5461 (0.4541) | 0.3738 (0.9423)
Affine transform to the 1st year | Quadratic | 183140.3 | 183156.2 | 0.4938 (0.3750) | 0.3702 (2.0993) | 1.1051 (0.8392) | 0.8283 (4.6977)
Procrustes alignment | Linear | 162879.5 | 162895.6 | 0.4071 (0.3324) | 0.2716 (0.6748) | 0.5500 (0.4491) | 0.3669 (0.9116)
Procrustes alignment | Quadratic | 184128.5 | 184144.3 | 0.5073 (0.3750) | 0.3677 (1.7034) | 1.1352 (0.8390) | 0.8227 (3.8117)


Figure 6-5. Scatter plots of the estimated and predicted coordinates vs. the true coordinates for the DBLP data in 2004. Top: estimations of the core author coordinates in 2004 using the whole time series. Bottom: predictions of the core author coordinates in 2004 using the whole time series except 2004. Prediction has wider variance than estimation. Alignment plays a significant role in modeling the dynamic graph, especially Procrustes alignment.


Figure 6-6. Trajectory behavior analysis. The trajectories of the seven authors with the highest vertex degrees are inspected. Interestingly, YM, NA, DP, MN, MS are all from Israel, while HGM and HER are from different institutions. We observe that those five Israeli authors have more similar patterns than the other two.

Table 6-2. More information about all authors of interest in this study

Author | Institution | URL
Yishay Mansour | Tel Aviv University, ISRAEL | http://www.math.tau.ac.il/~mansour
Noga Alon | Tel Aviv University, ISRAEL | http://www.tau.ac.il/~nogaa
Micha Sharir | Tel Aviv University, ISRAEL | www.math.tau.ac.il/~michas
David Peleg | Weizmann Institute of Science, ISRAEL | www.wisdom.weizmann.ac.il/~peleg/
Moni Naor | Weizmann Institute of Science, ISRAEL | www.wisdom.weizmann.ac.il/~naor
Hector Garcia-Molina | Stanford, USA | http://infolab.stanford.edu/people/hector.html
Hesham El-Rewini | SMU, USA | http://lyle.smu.edu/~rewini/lab.html


6.5 Vertex Behavior Modeling

Another functionality enabled by this framework is vertex behavior modeling, obtained by clustering the trajectories as shown in Figure 6-7. Using K-means clustering, we compute 10 clusters which group together authors with similar temporal behaviors. Similar trajectories indicate co-evolution patterns of authors.

To get more insight into the clustering, Figure 6-6 takes a closer look at the trajectories of the seven authors with the highest vertex degrees. Interestingly, five of these vertices have similar trajectories. These vertices correspond to authors who are all from Israel. The behavior is different for the two remaining authors, who are each from a different institution.

6.6 Identifying Exemplar Authors

With the temporal signatures encoded by our framework, another innovative analytic capability is to identify the "exemplar authors" among the massive author set. A recently developed method, Affinity Propagation [47], is adopted.

Affinity Propagation is a message-passing algorithm which can not only cluster a dataset in an unsupervised manner but also identify the "exemplar data point", i.e., the most central data point of each cluster. It keeps propagating "responsibility" and "availability" messages among all data points until it converges.
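A minimal sketch of this step using scikit-learn's AffinityPropagation, where `trajectories` (a hypothetical name) is an authors-by-features array of the temporal signatures:

```python
from sklearn.cluster import AffinityPropagation

ap = AffinityPropagation().fit(trajectories)
exemplar_rows = ap.cluster_centers_indices_   # row indices of exemplar authors
labels = ap.labels_                           # cluster membership per author
```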


Figure 6-7. Trajectory behavior clustering for the Procrustes alignment. The first row illustrates 10 K-means clusters of temporal trajectory behaviors of the DBLP core authors for the 1st MDS dimension. The second row is for the 2nd MDS dimension. The cluster center is superimposed on each plot.


Figure 6-8, An example of how affinity propagation works.

Figure 6-9. Convergence of AP algorithm on the Procrustes manifold alignment to 1st year DBLP dataset.


Figure 6-10, the sixty-six exemplar author set computed by the affinity propagation algorithm, with only the first three dimensions of the embedding space visualized. Each exemplar is annotated with its author name and forms a satellite pattern together with all other authors in the same cluster.

It is worth mentioning that the "exemplar authors" in each cluster are not necessarily "exemplars" in academia, but only the centers of clusters, based on the AP message-passing heuristics. Different from K-means clustering, AP does not need the number of clusters to be specified; it automatically seeks the optimal number of clusters.

6.7 Reconstructing the Predicted Network

To reconstruct the predicted network, we take the following steps (a sketch follows this list):

1. Predict the graph embedding for the last time point;

2. Collect those pairs of vertices which are not connected by an edge in the graph at the preceding time point;

3. Sort the vertex pairs collected at the previous step by their distance: pairs of vertices that are closest in Euclidean space could be potential edges;

4. Prioritize links based on the distances computed above.
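A small sketch of steps 2-4, assuming G_last (a hypothetical name) is the last observed networkx snapshot whose integer node labels index into the predicted distance matrix D_next from the framework sketch:

```python
import itertools

# Step 2: vertex pairs not yet linked in the last observed snapshot.
candidates = [(u, v) for u, v in itertools.combinations(G_last.nodes(), 2)
              if not G_last.has_edge(u, v)]

# Steps 3-4: rank candidates by predicted embedded distance, closest first.
candidates.sort(key=lambda uv: D_next[uv[0], uv[1]])
predicted_links = candidates[:100]     # top-ranked candidate edges
```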

The reconstruction results are compared against a set of edges created randomly, as shown in Table 6-3. The results of each embedding method with each regression method are reported in terms of how much better they are than a random guess. The probability of a correct random guess of an edge's existence is 0.013%. The consistent superiority of our framework over random guessing suggests that the trend of the graph evolution is meaningfully captured. Lastly, the Procrustes alignment method along with linear regression proved to be the best performing predictor, which is consistent with the previous section.

Table 6-3. Performance of the algorithm with different alignment and regression methods, reported as the factor of improvement over random guessing.

Alignment Method | Linear | Quadratic
Without manifold alignment | 4.49 | 3.91
Affine transform to previous year | 4.57 | 3.93
Affine transform to the 1st year | 4.78 | 3.68
Procrustes alignment | 4.81 | 3.76

6.8 Link Prediction performance analysis

In this link prediction experiment, we compare the performance of our predicted results to a random baseline. This section consists of two experiments.


6.8.1 Predicting the embedding

First, we illustrate how our predictive model captures the dynamics when trained on the preceding time series of graphs. We train our model with T-1 years of graph snapshots, as stated in the previous sections, and compare the predicted T-th year graph embedding with the ground truth. As illustrated in Figure 6-11, they are visually very similar in the color-coded distributions of all graph vertices. This hints at a promising prediction, which we evaluate further in the next subsection.

Figure 6-11. Ground truth embedding at the last year, and the embedding at the last year predicted with Procrustes manifold alignment. Each circle is color-coded by its author ID as defined in our experiment (see Appendix II).

6.8.2 Evaluating the link prediction accuracy

We define the concept of the "Emerging Edge Set" (EES), the difference between the edge sets at time points T and T-1:

EES(T) = E(T) - E(T-1)


A good predictive model, trained on all previous years' graph snapshots, should predict correctly on this set. Based on the embedding result obtained in the previous subsection, we reconstruct the predicted pairwise distance matrix of year T and count how many of the predicted edges fall into the EES.
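A tiny sketch of this evaluation, assuming the edge sets are given as collections of canonically ordered (u, v) tuples and predicted_links is as in the reconstruction sketch above:

```python
def ees_accuracy(E_T, E_Tm1, predicted_links):
    """Fraction of predicted links falling in EES(T) = E(T) - E(T-1)."""
    ees = set(E_T) - set(E_Tm1)
    return sum(e in ees for e in predicted_links) / len(predicted_links)
```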

As Figure 6-12 indicates, our proposed algorithm performs substantially better than the random-guess baseline as the number of predicted links generated by the model increases, which suggests that our model captures the actual dynamics in the temporal graph sequence.

Figure 6-12. Link prediction accuracy on the EES: manifold alignment vs. random guessing. The X-axis indicates the number of predicted links generated by each model, and the Y-axis denotes the percentage accuracy in predicting the Emerging Edge Set. The random baseline performance is averaged over 10 runs at each threshold.


6.9 Cloud computing performance analysis

"In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers."

--- Grace Hopper

In the last decade, the computing industry witnessed a successful paradigm shift in high-performance computing: the appearance of cloud computing. This insightful quote from Grace Hopper, the US Navy officer and computer scientist, points out the essence of cloud computing through an interesting analogy: building a supercomputer from commodity computers.

Given the distributed nature of the cloud computing architecture, one of the most successful software computing frameworks is MapReduce, proposed by Google. The MapReduce framework is inspired by functional programming and consists of two major components: mappers and reducers. The mappers take the input data and crunch it into intermediate <key, value> pairs; then, through the shuffling module, which is basically a hash function, these pairs are theoretically uniformly distributed to the set of reducers. Eventually the reducers collect the results and write them to a distributed file system. MapReduce is intrinsically a load-balanced computing framework and is a powerful tool for breaking down Big Data, such as Google's large-scale computations.


Figure 6-13. Major components of MapReduce architecture.

Hadoop [48] is one of the most successful open-source implementations of MapReduce. Figure 6-14 illustrates the skeleton of Hadoop. The name node and data nodes maintain the data and I/O access, taking into account the redundancy and reliability of the distributed data. The job tracker and task trackers manage the MapReduce computation workflow.

Figure 6-14. Hadoop: HDFS Namenode, Datanode, MapReduce JobTracker, etc.


In this section, we utilize the power of MapReduce for the massive vertex behavior modeling experiment, which is essentially the application of a clustering algorithm to the ensemble of embedded temporal trajectories. We implemented the well-known K-means algorithm on the Hadoop MapReduce framework. The K-means clustering algorithm is a special case of the EM (Expectation-Maximization) algorithm, which iterates over these two major steps (a MapReduce sketch follows this list):

- E-step: Assign each data point to the closest of the K centers.

- M-step: For all data points with the same cluster ID, recompute their center. Go to the E-step if not converged and the maximum number of iterations has not been reached.
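The two steps translate naturally into mappers and reducers; the following is a toy in-memory Python sketch of one iteration, not the actual Hadoop job:

```python
import numpy as np
from collections import defaultdict

def mapper(point, centers):
    """Assignment (E-step): emit (id of the nearest center, point)."""
    cid = int(np.argmin(np.linalg.norm(centers - point, axis=1)))
    return cid, point

def reducer(cid, points):
    """Update (M-step): recompute the center of one cluster."""
    return cid, np.mean(points, axis=0)

def kmeans_iteration(data, centers):
    shuffled = defaultdict(list)         # stands in for the shuffle phase
    for p in data:
        cid, p = mapper(p, centers)
        shuffled[cid].append(p)
    for cid, pts in shuffled.items():
        centers[cid] = reducer(cid, pts)[1]
    return centers
```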

In our preliminary experiment, we collect 100- and 400-dimensional embeddings for each author, resulting in files of about 22MB and 87MB, respectively. The Hadoop runtimes are reported in the table below. We can extrapolate that as the file size grows, more mappers are used, and more speedup can be obtained in larger-scale computations.

Table 6-4: Scalability performance on Hadoop.

 | Dimension of each author | File size | #Mapper | #Reducer | Time in sec/iteration
1 | 100 | 22MB | 54 | 1 | 10
2 | 400 | 87MB | 78 | 2 | 10
Ratio (2/1) | | 4.0 | 1.4 | |


Chapter 7 Discussion and future work

"I could be bounded in a nutshell and count myself king of infinite space."

--- Hamlet, Act 2, Scene 2

7.1 Framework for mining heterogeneous networks

We proposed a framework for mining heterogeneous networks using graph embedding and manifold alignment. The experimental results on biological networks show its promise in mining multiple heterogeneous networks by aligning them in the high-dimensional embedding space. Since the pairwise graph distance is preserved in the embedding space, this framework provides an innovative way of exploring multiple heterogeneous network data.

Future work can be categorized into theoretical and practical aspects:

- Theoretical aspect: Is our framework capturing properties specific to biological networks, or more intrinsic topological properties present in other networks, such as heterogeneous social networks (Twitter followers and Facebook friends)?

- Practical aspect: Can we extend this framework to knowledge domains beyond biology? One instance is to align multiple social networks, e.g., Twitter and Facebook.

7.2 Framework for mining dynamic networks

The work presented in this dissertation falls into the area of intelligent information processing. To begin with, we addressed the issue of defining measures of similarity for heterogeneous data. The issue of heterogeneity also arises in connection with (large-scale) network-based data, such as social networks, extracted from various domains. In our study of such networks, we were mainly concerned with the problem of link prediction, which, in fact, reflects the evolution of the network. Moreover, our main objective was to formulate and demonstrate a general framework for solving the problem of link prediction. We were motivated to search for such a framework by the diversity of existing approaches and solutions to this problem. Thus, the question of deriving a general framework becomes interesting. As a result, we formulated the problem of link prediction in a network in a new way, in which network evolution in time is captured and predicted. This required a dynamic approach, capable of accounting for this evolution (as opposed to approaches based on node similarities in a static snapshot of the network). This led us to consider a time series approach. However, the challenge for a time-series-like treatment is to extract suitable, useful network characteristics. Embedding the graphs underlying the network (at each time moment) into a continuous space, resulting in a nonlinear subspace or manifold, provided the tool for such characteristics. Essential to this approach, ensuring reliable prediction and estimation, is the step of manifold alignment. Experimental results support our approach in both (i) the need for the alignment, and (ii) estimation and prediction reliability.

Manifold alignment is also shown to be a powerful tool in the analysis of multiple heterogeneous networks which share the same set of nodes. This type of application is illustrated in Chapter 4 of this dissertation, on biological networks associated with the same set of genes.

Although the proposed framework identifies the key components in constructing a good link prediction model, its potential has not been fully exploited.

Future work will address issues such as:

(i) What combinations of these key components will theoretically guarantee a good link prediction result?

(ii) Are there theoretical bounds for link prediction based upon this framework?

(iii) We notice that there are always outliers in the distribution in the embedding space. According to the graph embedding, those outliers are graph nodes that are distant from the majority. What do those outliers hint at?

(iv) Most social networks share the same property: they are scale-free [49], which means the node degree distribution follows a power law:

$$P(k) \propto k^{-\gamma}$$

In our framework, we do not specialize for this property. Can the scale-free property help model and predict dynamic networks? These questions remain as future work.


Publications during Doctoral study

Journal & Book Chapter:

- Chunsheng Fang, M. Kohram, Anca Ralescu, "Spectral Regression with Low-Rank Approximation for Dynamic Graph Link Prediction", IEEE Transactions on Intelligent Systems, 2011.

- M. Zhang, J. Deng, Chunsheng Fang, X. Zhang, Jason Lu, "Molecular Network Analysis and Applications", Chapter 11 of "Knowledge-Based Bioinformatics", John Wiley & Sons, Ltd, July 2010.

- Chunsheng Fang, Anca Ralescu, "Online Gaussian Mixture Model for concept modeling and discovery", International Journal of Intelligent Technologies and Applied Statistics, 2008.

Peer Reviewed Conference:

- Chunsheng Fang, M. Kohram, X. Meng, Anca Ralescu, "Framework for Link Prediction and Vertex Behavior Modeling in Social Networks via Graph Embedding", ACM SIGKDD Social Network Analysis workshop, 2011.

- Chunsheng Fang, Judd Storrs, Anca Ralescu, Jing-Huei Lee, Jason Lu, "Detecting Parkinson's brain changes using local feature based regional SVM ensemble on MRI images", Human Brain Mapping 2011, Quebec, Canada, June 2011.

- Chunsheng Fang, Jason Lu, Anca Ralescu, "Graph Spectra Regression with Low-Rank Approximation for Dynamic Graph Link Prediction", NIPS 2010 Workshop on Low-rank Methods for Large-scale Machine Learning, Vancouver, Canada, December 2010.

- Chunsheng Fang, Anca Ralescu, Jason Lu, "Curve Profiling Feature: Compact Representation for BDGP Embryo Gene Expression Pattern Mining", IEEE International Conference on Data Mining 2010 (ICDM 2010), Sydney, Australia, December 2010.

- Minlu Zhang, Chunsheng Fang, Jason Lu, "Integrative scoring approach to identify transcriptional regulations controlling lung surfactant homeostasis", IEEE International Conference on Data Mining 2010 (ICDM 2010), Sydney, Australia, December 2010.

- Chunsheng Fang, Anca Ralescu, "ProbSim-Annotation: a novel image annotation algorithm using probability based similarity", 20th Midwest Artificial Intelligence & Cognitive Science Conference (MAICS), Fort Wayne, Indiana, April 18-19, 2009.

- Chunsheng Fang, Anca Ralescu, "Experiments on Probability based Similarity Measures Applied to Image Similarity", 19th International Conference on Pattern Recognition (ICPR 2008), Sensing Web workshop, Tampa, FL, December 7-11, 2008.

Bibliography:

1. West, D.B., Introduction to Graph Theory, 2nd ed. Prentice-Hall, Englewood Cliffs, NJ, 2000.
2. Cook, S.A., The complexity of theorem-proving procedures. Proc. 3rd ACM Symposium on Theory of Computing, 1971.
3. Liben-Nowell, D., Kleinberg, J., The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019-1031, 2007.
4. Getoor, L., Diehl, C.P., Link Mining: A Survey. SIGKDD Explorations, 7(2), 2003.
5. Richard, E., Baskiotis, N., Evgeniou, T., Vayatis, N., Link Discovery using Graph Feature Tracking. NIPS 2010, Vancouver, Canada, December 2010.
6. Belkin, M., Niyogi, P., Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 2003.
7. Rowland, T., "Manifold." From MathWorld--A Wolfram Web Resource, created by Eric W. Weisstein. http://mathworld.wolfram.com/Manifold.html.
8. He, X., Ji, M., Bao, H., Graph Embedding with Constraints. IJCAI, 2009.
9. Weisstein, E.W., Regression. MathWorld--A Wolfram Web Resource. http://mathworld.wolfram.com/Regression.html, 2011.
10. Ravishanker, N., Dey, D.K., A First Course in Linear Model Theory. Chapman and Hall/CRC, 2002.
11. Matthies, L., Szeliski, R., Kanade, T., Kalman filter-based algorithms for estimating depth from image sequences. Technical report CMU-RI-TR-88-1, Carnegie Mellon University, The Robotics Institute, Pittsburgh, PA, 1988.
12. Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.A., Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. European Conference on Computer Vision, 2002.
13. Tibshirani, R., Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
14. Hastie, T., Tibshirani, R., Friedman, J., The Elements of Statistical Learning. Springer-Verlag, 2001.
15. Belkin, M., Niyogi, P., Semi-supervised learning on Riemannian manifolds. Machine Learning, 56(1-3):209-239, 2004.
16. Roweis, S.T., Saul, L.K., Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 2000.
17. Fang, C., Ralescu, A., Lu, J., Curve Profiling Feature: Compact Representation for BDGP Embryo Gene Expression Pattern Mining. IEEE International Conference on Data Mining, 2010.
18. de Hoon, M., Imoto, S., Miyano, S., Inferring gene regulatory networks from time-ordered gene expression data using differential equations. Discovery Science, Proceedings, 2534:267-274, 2002.
19. van Someren, E.P., Wessels, L.F., Reinders, M.J., Linear modeling of genetic networks from experimental data. Proc Int Conf Intell Syst Mol Biol, 8:355-366, 2000.
20. van Someren, E.P., et al., Genetic network modeling. Pharmacogenomics, 3(4):507-525, 2002.
21. Li, P., et al., Comparison of probabilistic Boolean network and dynamic Bayesian network approaches for inferring gene regulatory networks. BMC Bioinformatics, 8(Suppl 7):S13, 2007.
22. Bar-Joseph, Z., Analyzing time series gene expression data. Bioinformatics, 20(16):2493-2503, 2004.
23. Kim, S., Imoto, S., Miyano, S., Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data. Biosystems, 75(1-3):57-65, 2004.
24. Imoto, S., et al., Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. J Bioinform Comput Biol, 2(1):77-98, 2004.
25. Zhang, M., et al., Molecular network analysis and applications. In Knowledge-Based Bioinformatics: From Analysis to Interpretation, G. Alterovitz and M. Ramoni, Editors. Wiley, 2010.
26. Xu, Y., et al., A systems approach to mapping transcriptional networks controlling surfactant homeostasis. BMC Genomics, 2010. In press.
27. Hwang, T., Kuang, R., A heterogeneous label propagation algorithm for disease gene discovery. SIAM International Conference on Data Mining, 2010.
28. Vishwanathan, S.V.N., et al., Graph Kernels. Journal of Machine Learning Research, 2008.
29. Young, F.W., Hamer, R.M., Theory and Applications of Multidimensional Scaling. Erlbaum Associates, Hillsdale, NJ, 1994.
30. Ralescu, A.L., Visa, S., Popovici, S., A Stochastic Treatment of Similarity. IPMU, 2010.
31. Popovici, S., On evaluating similarity between heterogeneous data. Master's thesis, University of Cincinnati, 2008.
32. Le, S., Ho, T., Measuring the similarity for heterogeneous data: An ordered probability-based approach. LNAI, 3245:129-141, 2004.
33. Ralescu, A., Popovici, S., Ralescu, D., On evaluating the proximity between heterogeneous data. Proceedings of the Nineteenth Midwestern Artificial Intelligence and Cognitive Science Conference, MAICS-2008, Cincinnati, OH, 2008.
34. Popovici, S., On evaluating similarity between heterogeneous data. Master's thesis, University of Cincinnati, 2008.
35. Makadia, A., Pavlovic, V., Kumar, S., A new baseline for image annotation. European Conference on Computer Vision, 2008.
36. Kuchaiev, O., Milenkovic, T., Memisevic, V., Hayes, W., Przulj, N., Topological network alignment uncovers biological function and phylogeny. J. R. Soc. Interface, 2010.
37. Wang, C., Mahadevan, S., Manifold alignment using Procrustes analysis. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), 2008.
38. Zhang, M., et al., The orphan disease networks. Am J Hum Genet, 88(6):755-766, 2011.
39. Chen, J., et al., Improved human disease candidate gene prioritization using mouse phenotype. BMC Bioinformatics, 8:392, 2007.
40. Fang, C., Lu, J., Ralescu, A., Graph Spectra Regression with Low-Rank Approximation for Dynamic Graph Link Prediction. NIPS 2010 Workshop on Low-rank Methods for Large-scale Machine Learning, Vancouver, Canada, December 2010.
41. Fang, C., Kohram, M., Ralescu, A., Spectral Regression with Low-Rank Approximation for Dynamic Graph Link Prediction. IEEE Transactions on Intelligent Systems, 2011.
42. Kunegis, J., Fay, D., Bauckhage, C., Network growth and the spectral evolution model. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM '10), 2010.
43. Chung, F.R.K., Spectral Graph Theory. Regional Conference Series in Mathematics, American Mathematical Society, Providence, RI, 1997.
44. Huang, Z., Link Prediction Based on Graph Topology: The Predictive Value of Generalized Clustering Coefficient. ACM LinkKDD, 2006.
45. Box, G.E.P., Jenkins, G.M., Reinsel, G.C., Time Series Analysis: Forecasting & Control, 3rd ed. Pearson, 1994.
46. Fang, C., Kohram, M., Meng, X., Ralescu, A., Framework for Link Prediction and Vertex Behavior Modeling in Social Networks via Graph Embedding. ACM SIGKDD Social Network Analysis workshop, 2011.
47. Frey, B.J., Dueck, D., Clustering by Passing Messages Between Data Points. Science, 315:972-976, 2007.
48. White, T., Hadoop: The Definitive Guide. O'Reilly, Sebastopol, CA, 2009.
49. Barabasi, A.L., Scale-free networks: a decade and beyond. Science, 325(5939):412-413, 2009.


Appendix I: More results for ProbSim image annotation


Appendix II: DBLP Core author set

1, Peter A. Ng 36, Refik Molva 71, Tamal K. Dey 106, Rajiv Gupta

2, Maria Luisa Bonet 37, Sybille Hellebrand 72, Andreas Paepcke 107, Venkatesh Ganti

3, Colin McDiarmid 38, 73, Rolf Klein 108, Peter Baumgartner

4, Philip R. Cohen 39, Shalom Tsur 74, Alessandra Di Pierro 109, Yin-Feng Xu

5, Chee-Keng Yap 40, Ahmed Bouajjani 75, Alexander Aiken 110, Masafumi Yamashita

6, Yacov Yacobi 41, Jean-Marie Jacquet 76, J. Alfredo Sánchez 111, Jie Liang

7, Philip N. Klein 42, C. Greg Plaxton 77, Radha Jagadeesan 112, Danny De Schreye

8, Roman Slowinski 43, Edward A. Fox 78, Cheng-Zhong Xu 113, Éva Tardos

9, Teresa H. Y. Meng 44, Jaeho Lee 79, William E. Weihl 114, Anil Kumar

10, Chin-Chuan Han 45, Mounir Hamdi 80, Jaap-Henk Hoepman 115, Alvin R. Lebeck

11, Bhavani M. Thuraisingham 46, Dawson R. Engler 81, Andre Schröter 116, Peter Widmayer

12, Agostino Cortesi 47, Joost Engelfriet 82, Mark Craven 117, Kurt Keutzer

13, Geoff A. W. West 48, Betty Salzberg 83, James A. Hendler 118, Andrew Rau-Chaplin

14, David Lee 49, Vijay V. Raghavan 84, Craig G. Nevill-Manning 119, Stefano Leonardi

15, Debasis Mitra 50, Bart Selman 85, Mark A. Franklin 120, José M. Vidal

16, Dhiraj K. Pradhan 51, Mary W. Hall 86, Tamara Sumner 121, Martin Odersky

17, Baruch Awerbuch 52, Thomas Fahringer 87, Flaviu Cristian 122, David S. Doermann

18, Paul Ammann 53, Pei Cao 88, Gene Golovchinsky 123, Yuen-Tak Yu

19, Elizabeth Burd 54, Wei Kuan Shih 89, Carlo Colombo 124, David L. Dill

20, Marco Bernardo 55, Kuen-Jong Lee 90, Marshall W. Bern 125, Joseph L. Mundy

21, Andrea C. Arpaci-Dusseau 56, 91, Hanspeter Pfister 126, Sudipta Bhawmik

22, Thomas Linke 57, Jan A. Bergstra 92, Arie E. Kaufman 127, N. Ranganathan

23, Jennifer Preece 58, Youngsoo Shin 93, Xiaoyang Mao 128, Dragutin Petkovic

24, Yonit Kesten 59, Jeff Edmonds 94, Bongki Moon 129, Arvind Gupta

25, Jeannette M. Wing 60, 95, José E. Moreira 130, Giuliano Antoniol

26, Reinhard Klein 61, Hector Garcia-Molina 96, Marco Di Natale 131, Xiaowei Xu

27, Prabhakar Ragde 62, Michael O. Rabin 97, S. S. Ravi 132, Jan Van den Bussche

28, Marco Ajmone Marsan 63, Enric Pastor 98, Carl A. Gunter 133, Mogens Nielsen

29, José M. Cela 64, Laura M. Haas 99, Steffen Lange 134, Ming-Tat Ko

30, H. Venkateswaran 65, Jyh-Charn Liu 100, Enrico Macii 135, Martin Keim

31, 66, Byung Suk Lee 101, Yannis Manoussakis 136, Fernando C. N. Pereira

32, Marie-Odile Cordier 67, Clifford A. Shaffer 102, Nick Roussopoulos 137, James E. Tomayko

33, James F. Kurose 68, Hartmut Noltemeier 103, Daniel Marcu 138, Wolfgang Thomas

34, Harry G. Mairson 69, Daniel A. Spielman 104, Dominique Barth 139, Roger Espasa

35, Rohit Parikh 70, Rastislav Bodík 105, Nick McKeown 140, Peter Brezany


141, Adam Kowalczyk 179, Albert G. Greenberg 217, Yuval Rabani 255, Kurt Schneider

142, Tomás Feder 180, Zhengyou Zhang 218, Soo-Mook Moon 256, Gabor Karsai

143, Nitin H. Vaidya 181, Svetha Venkatesh 219, Andreas Butz 257, Andrew S. Grimshaw

144, Ronitt Rubinfeld 182, Michael Nicolaidis 220, Marian Bubak 258, Frank K. H. A. Dehne

145, Saumya K. Debray 183, Ruy J. G. B. de Queiroz 221, Michael Fisher 259, Jaejin Lee

146, Beate Bollig 184, Jurek Czyzowicz 222, Pietro Manzoni 260, Dennis Heimbigner

147, Shankar Krishnan 185, Tova Milo 223, Dean M. Tullsen 261, David B. Lomet

148, Gérard Boudol 186, Christos D. Zaroliagis 224, Jitender S. Deogun 262, Shamkant B. Navathe

149, Piero Mussio 187, David A. Cohen 225, Torben Hagerup 263, Scott F. Smith

150, Raymond T. Ng 188, Gary S. H. Tan 226, Sridhar Hannenhalli 264, Manish Parashar

151, Susan J. Eggers 189, Tetsuo Asano 227, Roberto Segala 265, Richard I. Hartley

152, Tzvetan Ostromsky 190, Jan Chomicki 228, Scott A. Mahlke 266, Gerda Janssens

153, Amnon Ta-Shma 191, Kenneth Y. Yun 229, Laks V. S. Lakshmanan 267, Baruch Schieber

154, Geppino Pucci 192, Maristella Matera 230, Michael B. Jones 268, Luca Cabibbo

155, Flaminia L. Luccio 193, Hisao Tamaki 231, Grammati E. Pantziou 269, Daniel Barbará

156, Miki Hermann 194, Bruce Hendrickson 232, Giuseppe Serazzi 270, Stephane Perennes

157, Daniel A. Menascé 195, Milos D. Ercegovac 233, Yoram Singer 271, Maria Cristina Pinotti

158, Imre Bárány 196, Donald Kossmann 234, Christian Bessière 272, Jian Yang

159, Eugene W. Myers 197, Ling Liu 235, Mike W. T. Wong 273, Adam L. Buchsbaum

160, Christos Levcopoulos 198, Fatos Xhafa 236, Jin-yi Cai 274, Shmuel Sagiv

161, Mateo Valero 199, Kedar S. Namjoshi 237, Radu Marculescu 275, Ran Raz

162, Matthew Dickerson 200, Mark H. Overmars 238, Thomas Schwentick 276, Raghu Ramakrishnan

163, Gautam Das 201, Chih-wen Hsueh 239, John Darlington 277, Pranav Ashar

164, Patrick H. Madden 202, Bruce M. Maggs 240, Michael H. Böhlen 278, Joel Wein

165, Luigi Cinque 203, Andrew Zisserman 241, Mordecai J. Golin 279, Amin Vahdat

166, Carolyn L. Talcott 204, Alberto Belussi 242, Jim Gemmell 280, Eric A. Brewer

167, Israel Cidon 205, Randolph Y. Wang 243, Piyush Mehrotra 281, Vineet Gupta

168, Stéphane Grumbach 206, Shiyu Zhou 244, John Hershberger 282, Scott Pakin

169, Abdel Ejnioui 207, Yuan-Shin Hwang 245, Dima Grigoriev 283, Jwu E. Chen

170, Ugo Vaccaro 208, David A. Forsyth 246, 284, Andrea Pietracaprina

171, Ronald J. Vetter 209, Gian Luigi Ferrari 247, Daniel Bleichenbacher 285, Jaeyoung Choi

172, Subhash Suri 210, Miklós Ajtai 248, Micah Adler 286, Oleg Sokolsky

173, Paul Erdös 211, Oren Etzioni 249, Stephen M. Smith 287, Steffen Lempp

174, Yehoshua Sagiv 212, Jon G. Riecke 250, Kozo Sugiyama 288, Arvind

175, Efim B. Kinber 213, Lisa Hellerstein 251, Wojciech Penczek 289, Michael Colón

176, Verónica Dahl 214, Bonnie E. John 252, Ernesto Damiani 290,

177, Zhen Zhang 215, Richard E. Ladner 253, Petr Savický 291, Paul M. B. Vitányi

178, Thomas Kunz 216, Robert C. Holte 254, Victor Vianu 292, Mauricio J. Serrano


293, Gary T. Leavens 331, Kyu-Young Whang 369, Prithviraj Banerjee 407, Mani B. Srivastava

294, Doug Baldwin 332, David Grove 370, Jussi Myllymaki 408,

295, Bruce A. Reed 333, Lauri Hella 371, Robert H. Sloan 409, Hui Li

296, Bernard Mourrain 334, 372, Tomoyuki Uchida 410, Anil Nerode

297, Peter Wegner 335, Juan A. Garay 373, Randal E. Bryant 411, Gordon D. Plotkin

298, Marco Comini 336, Amit P. Sheth 374, Joachim Biskup 412, Alexandru Nicolau

299, Johannes Gehrke 337, Daniel S. Weld 375, Oscar H. Ibarra 413, Edward L. Robertson

300, Katherine A. Yelick 338, Renato de Mori 376, Wojciech Maly 414, Ákos Lédeczi 301, Rong Lin 339, David G. Kirkpatrick 377, L. Bruce Richmond 415, Daniel M. Dias 302, Han Wang 340, Ton Kloks 378, Matteo Fischetti 416, Orna Kupferman 303, P. S. Thiagarajan 341, Peter M. Dew 379, Yves Lespérance 417, Pavol Hell 304, Jennifer L. Welch 342, Tei-Wei Kuo 380, R. Iris Bahar 418, Esko Ukkonen 305, László A. 343, Luca Trevisan 381, Liang-Hua Chen Székely 419, Haiko Müller 344, Esther M. Arkin 382, Chris J. Myers 306, Paul G. Spirakis 420, Debashis Saha 345, Jürgen Teich 383, Ayumi Shinohara 307, Mary Beth Rosson 421, Zhiyong Liu 346, Philippe Besnard 384, Li Gong 308, Sergio Greco 422, Rina Dechter 347, Maxime Crochemore 385, Xuding Zhu 309, Sridhar Rajagopalan 423, Evgenia Smirni 348, Miriam Di Ianni 386, Larry S. Davis 310, Jan M. Rabaey 424, Craig Gotsman 349, Hadas Shachnai 387, Pattie Maes 311, Radek Vingralek 425, Joseph Cheriyan 350, John P. Lehoczky 388, Wei Sun 312, Ravi S. Sandhu 426, Zoltán Füredi 351, 389, Lenwood S. Heath 313, Val Tannen 427, Leen Stougie 352, Keith H. Bennett 390, Ronald L. Graham 314, Christian S. Jensen 428, Avishai Wool 353, Andrea E. F. Clementi 391, Yosi Ben-Asher 315, John Mylopoulos 429, Erich Schikuta 354, Jim Griffioen 392, Pekka Kilpeläinen 316, David Zuckerman 430, Yossi Azar 355, Divesh Srivastava 393, Philip H. S. Torr 317, Franco P. Preparata 431, Leszek Gasieniec 356, Kyuseok Shim 394, Boris Aronov 318, Timothy M. Chan 432, Fabio Grandi 357, Jerzy Wasniewski 395, Sharad Malik 319, Larry J. Stockmeyer 433, Ouri Wolfson 358, Lei He 396, A. Min Tjoa 320, D. Joseph 434, Anup Basu 359, David A. Carr 397, Miriam Leeser 321, Xiaotie Deng 435, Gerald-P. Glombitza 360, Hugo Krawczyk 398, Carl Pixley 322, R. Ravi 436, Pierluigi Crescenzi 361, Daniel J. Lehmann 399, Rephael Wenger 323, Pierangela Samarati 437, Siddhartha Chatterjee 362, Richard R. Muntz 400, Giovanni Pighizzini 324, Joshua D. Guttman 438, Hiroshi Motoda 363, Nadav Eiron 401, Joseph M. Hellerstein 325, Pascal Van Hentenryck 439, Harpreet S. Sawhney 364, Patrick E. O'Neil 402, Martin Hofmann 326, Josef Küng 440, 365, Craig Chambers 403, Jonathan D. Cohen 327, Rusins Freivalds 441, Zoran Duric 366, 404, Jiawei Han 328, Soumen Chakrabarti 442, Douglas R. Stinson 367, Cheng-Ta Hsieh 405, Klaus Jansen 329, Atul Adya 443, Jan Kratochvíl 368, Arnon Rosenthal 406, 330, Alexander L. Wolf 444, Ruud M. Bolle


445, Brian Unger 483, Thomas Schiex 521, Zsolt Tuza 559, Miodrag Potkonjak

446, Paola Campadelli 484, Richard B. Bunt 522, Ioannis G. Tollis 560, Yiannis Aloimonos

447, Tosiyasu L. Kunii 485, Emilio L. Zapata 523, Gregory Gutin 561, Carlo Blundo

448, Sérgio Lifschitz 486, Martin Vetterli 524, Amr El Abbadi 562, Kazuo Iwama

449, Debanjan Saha 487, Hiroshi Nagamochi 525, William C. Regli 563, Toshihiro Fujito

450, Ravi Kumar 488, Timo Ojala 526, Rajeev Raman 564, Andrey A. Mironov

451, Michael Molloy 489, Philippe Picouet 527, Roch Guérin 565, Aloysius K. Mok

452, L. Paul Chew 490, Gerald Quirchmayr 528, Marc Stamminger 566, Paul Pettersson

453, Andy D. Pimentel 491, Anne Rogers 529, Stefan Savage 567, Tamar Eilam

454, Sally Floyd 492, Dipak Ghosal 530, Nadia Busi 568, Kiem-Phong Vo

455, Jun-Wei Hsieh 493, Michael T. Goodrich 531, Ashim Garg 569, Frank Piessens

456, Rachid Deriche 494, Marios D. Dikaiakos 532, Daniel Cohen-Or 570,

457, Louxin Zhang 495, Ian F. Akyildiz 533, Madhavan Mukund 571, Cormac Flanagan

458, Yoram Ofek 496, Luis Díaz de Cerio 534, Stefano Rizzi 572, Torsten Suel

459, Richard B. Tan 497, Arif Ghafoor 535, Wim Van Laer 573, Lusheng Wang

460, Seungjoon Park 498, Emilia Rosti 536, Yoav Shoham 574, Chiara Renso

461, David Wonnacott 499, P. Krishnan 537, Koenraad De Bosschere 575, Catuscia Palamidessi

462, Carsten Damm 500, Hirohisa Hirukawa 538, Ricard Gavaldà 576, Chunnian Liu

463, Sugih Jamin 501, Susan T. Dumais 539, Stephen J. Maybank 577, Leora Morgenstern

464, Riichiro Mizoguchi 502, Asit Dan 540, Yike Guo 578, Naphtali Rishe

465, David W. Walker 503, Thomas Eiter 541, Edith Schonberg 579, Shikharesh Majumdar

466, Eugene Asarin 504, Luigi Palopoli 542, Ian H. Witten 580, Tibor Jordán

467, Jay Hoeflinger 505, Mike P. Papazoglou 543, Foto N. Afrati 581, Wolfgang Dittrich

468, Bernard Courtois 506, Guozhu Dong 544, Susanne Graf 582, Frank Puppe

469, Edward Y. Chang 507, C. Murray Woodside 545, Ehud Rivlin 583, Anant Agarwal

470, Kazuhisa Makino 508, Rahul Simha 546, Peter Z. Revesz 584, Monika Rauch Henzinger

471, Sérgio Vale Aguiar 509, Dennis B. Smith 547, Takeshi Tokuyama 585, Neri Merhav Campos 510, Hanan Lutfiyya 548, Alessandro Fantechi 586, Mauro Gaspari 472, Lars Arge 511, Raphael Rom 549, Neal E. Young 587, Prathima Agrawal 473, Jordi Domingo-Pascual 512, Giuseppe Psaila 550, Walid G. Aref 588, Kalev Kask 474, Neal Lesh 513, William I. Gasarch 551, John Kubiatowicz 589, Nicolas Halbwachs 475, Wayne D. Gray 514, Amos Beimel 552, Marc J. van Kreveld 590, Evripidis Bampis 476, Yuri Breitbart 515, Jeffrey S. Rosenschein 553, Helmut Pottmann 591, Anwar Elwalid 477, Peter Jeavons 516, Kimberly Keeton 554, Martin E. Dyer 592, Frederic T. Chong 478, Dinesh Bhatia 517, Jan Krajícek 555, Richard J. Lipton 593, Yishay Mansour 479, Gustavo Alonso 518, Hans L. Bodlaender 556, Thomas Thierauf 594, Ragunathan Rajkumar 480, Shi-Kuo Chang 519, Hussein M. Abdel-Wahab 557, Joseph Naor 595, Ettore Merlo 481, Markus Stumptner 520, Alberto Caprara 558, David P. Williamson 596, J. V. Tucker 482, Alain J. Mayer

82

597, Dimitris Papadias 635, Pavel Zezula 673, Jude W. Shavlik 711, Joe Kilian

598, Pavel Pudlák 636, Stan Sclaroff 674, Felipe Cucker 712, Wojciech Szpankowski

599, Ilan Shimshoni 637, Jirí Matousek 675, Philippe Codognet 713, Vishwani D. Agrawal

600, Alex Kondratyev 638, Maria J. García de 676, Meichun Hsu 714, Roger Mohr la Banda 601, Jen-Yao Chung 677, 715, 639, Faith E. Fich 602, Giuseppe Santucci 678, Marcos A. Kiwi 716, Tor Helleseth 640, Ravi Sundaram 603, Ravi Janardan 679, Allen B. Tucker 717, Jorng-Tzong Horng 641, Hassan Gomaa 604, Yasuhito Mukouchi 680, François Llirbat 718, Rob A. Rutenbar 642, Maurizio Tucci 605, Mary Shaw 681, Dan Halperin 719, Marco Roccetti 643, C. Lee Giles 606, Raymond A. Paul 682, Deborah Estrin 720, Manuel M. T. Chakravarty 644, Sven Schuierer 607, Miguel Valero- 683, Joseph Pasquale 721, Alfredo De Santis García 645, Danilo Montesi 684, Jörg Keller 722, Michael I. Schwartzbach 608, Mohan Kumar 646, Yonatan Aumann 685, Paul Molitor 723, Hasan M. Jamil 609, David B. Shmoys 647, John G. Cleary 686, William Buxton 724, David Wai-Lok Cheung 610, Maurizio Gabbrielli 648, Ion Muslea 687, Kotagiri Ramamohanarao 725, Giorgio Ghelli 611, Gagan Agrawal 649, Jie Wang 688, George C. Polyzos 726, Siau-Cheng Khoo 612, 650, Esther Pacitti 689, Keith Edwards 727, Lefteris M. Kirousis 613, Garth Isaak 651, Rhan Ha 690, Goetz Graefe 728, Umakishore Ramachandran 614, Shinichi Shimozono 652, Byung Ro Moon 691, Michael J. Maher 729, Arne Andersson 615, Sue Whitesides 653, Joseph S. B. Mitchell 692, Peter W. O'Hearn 730, George Nagy 616, 654, Somesh Jha 693, Lawrence Rauchwerger 731, Matti Pietikäinen 617, Michael J. Kearns 655, Paul W. Goldberg 694, Marc Gyssens 732, 618, Symvonis 656, Ilan Newman 695, Roger L. Wainwright 733, Fabio Gadducci 619, Adrian Segall 657, Marcelo J. Weinberger 696, 734, Hesham El-Rewini 620, Jayanth Majhi 658, Ahmed K. Elmagarmid 697, Thorsten Altenkirch 735, Ellen Sentovich 621, Sheng-Tzong Cheng 659, Daniel Jackson 698, Peter Feldmann 736, Robert E. Schapire 622, Kim Marriott 660, Alex Delis 699, Roger S. Barga 737, Leen Torenvliet 623, Philip S. Yu 661, Marius Zimand 700, Peter Eades 738, Hans-Jörg Schek 624, Surajit Chaudhuri 662, Remzi H. Arpaci-Dusseau 701, John F. Roddick 739, Graham Brightwell 625, Susanne E. Hambrusch 663, Richard Cole 702, Michael J. Dinneen 740, Jean Roman 626, Niraj K. Jha 664, Ben Shneiderman 703, Zhongcheng Li 741, Clifford Stein 627, Michael Lindenbaum 665, Bernd Becker 704, Ayellet Tal 742, Jim Gray 628, Cathy H. Wu 666, Jens Knoop 705, Hing Leung 743, David Avis 629, Thomas Ball 667, Yannis Theodoridis 706, Thomas Stricker 744, Jeffrey Dean 630, Scott T. Leutenegger 668, Aart Blokhuis 707, Paul D. Coddington 745, Jayashree Saxena 631, Pawel Gburzynski 669, B. S. Manjunath 708, Franz Aurenhammer 746, Jerzy W. Grzymala-Busse 632, Vivek Tiwari 670, 709, Vassilis J. Tsotras 747, Doug Burger 633, William Aiello 671, 710, Costas S. Iliopoulos 748, Miklos Santha 634, Maurizio Talamo 672,

83

749, Diego Calvanese 787, Wang Yi 825, Zhi-Li Zhang 863, Indrakshi Ray

750, Tiziana Catarci 788, Esteban Feuerstein 826, Yi Pan 864, David P. Dobkin

751, Genoveffa Tortora 789, Sheng Yu 827, Elizabeth D. Mynatt 865, Péter Komjáth 752, Stanley M. Sutton Jr. 790, Chung Keung Poon 828, Moshe Y. Vardi 866, Bruno Courcelle 753, Torsten Schaub 791, Gary L. Miller 829, Roberto Gorrieri 867, Georges G. Grinstein 754, Nicola Leone 792, Jeffrey Shallit 830, Patrick Solé 868, Amitabh Varshney 755, Bernhard Mitschang 793, Peter F. Sweeney 831, James A. Storer 869, Rance Cleaveland 756, Denis Trystram 794, Ben Kao 832, Mark R. Greenstreet 870, Frieder Stolzenburg 757, Stasys Jukna 795, Benjamin Melamed 833, Cosimo Laneve 871, James W. Hong 758, Jeffrey D. Ullman 796, Paul Tarau 834, Pietro Pala 872, Gregory Dudek 759, Winfried Hochstättler 797, Kenichi Yoshida 835, Frank M. Shipman III 873, Mark D. Hill 760, Roger King 798, Byron Dom 836, Riccardo Focardi 874, Kostas Kontogiannis 761, Michael K. Reiter 799, Peter W. Shor 837, Hazel Everett 875, 762, Shigeru Yamashita 800, Vivek S. Borkar 838, Roberto Giacobazzi 876, Michael L. Scott 763, Hector J. Levesque 801, Cory J. Butz 839, Ömer Egecioglu 877, Russell Impagliazzo 764, Claudia V. Goldman 802, Barbara Pernici 840, Matthias Westermann 878, Hui Zhang 765, Chung-Kuan Cheng 803, Furio Honsell 841, Timo Raita 879, Rudolf Freund 766, Karol Myszkowski 804, Hans-Joachim Wunderlich 842, Elmar Schömer 880, Jörg Vogel 767, Ronen Basri 805, Kazue Sako 843, Elsa L. Gunter 881, Alexandru Mateescu 768, Yacov Hel-Or 806, Leandros Tassiulas 844, Chung-Sheng Li 882, Toby J. Teorey 769, Johannes Köbler 807, Shigeo Takahashi 845, Giuliana Vitiello 883, Sandeep K. S. Gupta 770, Mark de Berg 808, Jörg-Rüdiger 846, Prasad Tetali Sack 884, Setsuo Arikawa 771, János Pach 847, Uri Zwick 809, S. Lennart Johnsson 885, Jean-Claude Bermond 772, Zhiyong Peng 848, Felice Balarin 810, Andrew F. Monk 886, Yair Bartal 773, Longin Jan Latecki 849, Kiyoung Choi 811, Steve Plimpton 887, András Frank 774, Annalisa Bossi 850, Marcel-Catalin Rosu 812, Sheila S. Hemami 888, Henri E. Bal 775, Tamás Linder 851, Orna Grumberg 813, Leonard J. Seligman 889, Rodney G. Downey 776, Jonathan L. Herlocker 852, Jianer Chen 814, Soyeon Park 890, Tsong Yueh Chen 777, Jorge Lobo 853, Gary D. Hachtel 815, Robert C. Williamson 891, Christino Tamon 778, Helen M. Edwards 854, Manuel V. Hermenegildo 816, Cheng-Kok Koh 892, Kenneth Y. Goldberg 779, Mitsunori Ogihara 855, Jitendra Malik 817, Nicholas F. Maxemchuk 893, Martin Doerr 780, Rajmohan Rajaraman 856, Jean-Daniel Boissonnat 818, Yunshan Zhu 894, Ram Zamir 781, Toniann Pitassi 857, Giorgio C. Buttazzo 819, Edward W. Knightly 895, Augusto Celentano 782, Sarit Mukherjee 858, Klaus W. Wagner 820, Markus Stolze 896, Julio Villalba 783, Suresh Subramaniam 859, R. Srikant 821, Richard Furuta 897, Ronen I. Brafman 784, S. V. Raghavan 860, Th. Haniotakis 822, Guang R. Gao 898, Sven Oliver Krumke 785, 861, Tatsuaki Okamoto 823, Jon B. Weissman 899, Béla Bollobás 786, Osamu Watanabe 862, C. Y. Roger Chen 824, Sandeep K. Gupta 900, Malcolm Munro

84

901, Susan B. Davidson 939, Vijaya Ramachandran 977, 1015, Hubertus Franke

902, S. K. Michael Wong 940, Jordi Cortadella 978, George Samaras 1016, Dexter Kozen

903, Stefano Levialdi 941, Ralf Klamma 979, Stanislaw Jarecki 1017, Matthew K. Franklin

904, René van Oostrum 942, David A. Basin 980, Joan Feigenbaum 1018, Jianwen Su

905, Burkhard Monien 943, Giuseppe Liotta 981, Ralph J. Faudree 1019, Masami Hagiya

906, Dov M. Gabbay 944, Cyrus Shahabi 982, Adnan Aziz 1020, Michael Schrefl

907, Wil M. P. van der Aalst 945, Dinesh Manocha 983, 1021, Olivier Danvy

908, Baback Moghaddam 946, Ting-Chi Wang 984, Piotr J. Gmytrasiewicz 1022, Joseph O'Rourke

909, Arthur L. Liestman 947, José D. P. Rolim 985, András 1023, Giuseppe Castagna Gyárfás 910, Takeshi Shinohara 948, Robert B. Jones 1024, Sandip Kundu 986, David Bremner 911, Jesús Labarta 949, Peter C. Fishburn 1025, Edward A. Lee 987, Sang Hyuk Son 912, Eduard Ayguadé 950, Jitendra Khare 1026, Praveen Seshadri 988, Christian Laugier 913, Michele Flammini 951, Mitsuru Ikeda 1027, Nageswara S. V. Rao 989, Anil Maheshwari 914, Sanguthevar Rajasekaran 952, Jean Ponce 1028, David I. August 990, Vojin G. Oklobdzija 915, Annalisa De Bonis 953, Dyi-Rong Duh 1029, Scott R. Tilley 991, Lalit M. Patnaik 916, Jingyuan Zhang 954, Melvin A. Breuer 1030, Don S. Batory 992, Francis C. M. Lau 917, Thomas Hofmeister 955, Jia-Shung Wang 1031, Ruei-Chuan Chang 993, Charles Consel 918, Ralph Schäfer 956, Hans-Peter Lenhof 1032, Satish Rao 994, Mariette Yvinec 919, Koen Langendoen 957, 1033, Laure Blanc- 995, Daniela Fogli Féraud 920, Krishna Bharat 958, Irfan A. Essa 996, Anand Raghunathan 1034, Hannu Toivonen 921, John A. Keane 959, Jerzy Tyszer 997, Gordon Kurtenbach 1035, Marek Karpinski 922, Dirk Van Gucht 960, Marinus J. Plasmeijer 998, Vugranam C. Sreedhar 1036, Vikram S. Adve 923, John M. Carroll 961, Ishfaq Ahmad 999, Christopher D. Carothers 1037, Thomas A. Henzinger 924, Enrico Motta 962, Kenneth L. McMillan 1000, George Spanoudakis 1038, Alan Fekete 925, Christian Schindelhauer 963, Nancy A. Lynch 1001, Ghaleb Abdulla 1039, Penny E. Haxell 926, Jock D. Mackinlay 964, Heonshik Shin 1002, Norbert Ritter 1040, Andrew Y. Ng 927, Cinzia Bernardeschi 965, S. Seshadri 1003, Max J. Egenhofer 1041, L. Richard Carley 928, Ashfaq A. Khokhar 966, Robert L. Stevenson 1004, Ralf Klasing 1042, Jerome A. Rolia 929, Yoshihisa Shinagawa 967, Alan W. Brown 1005, Takayoshi Shoudai 1043, Alexei Sourin 930, Frits W. Vaandrager 968, Lila Kari 1006, Godfried T. Toussaint 1044, Hongji Yang 931, Dimitrios M. Thilikos 969, Marcelo Lubaszewski 1007, Desh Ranjan 1045, Elena Baralis 932, Ming L. Liou 970, 1008, Richard C. H. Connor 1046, Anne Condon 933, Richard T. Snodgrass 971, Niki Pissinou 1009, Janet L. Wiener 1047, Yaakov Kogan 934, Fausto Rabitti 972, Andrew C. Myers 1010, Adi Rosén 1048, Gio Wiederhold 935, Benjamin C. Pierce 973, Heikki Mannila 1011, Ronald Cramer 1049, Danièle Gardy 936, Enrico Pontelli 974, Giuseppe De Giacomo 1012, Martin Fürer 1050, Dieter Merkl 937, Jawahar Jain 975, 1013, Stephen A. Fenner 1051, Amir M. Ben-Amram 938, Fang Chen 976, Benny Sudakov 1014, Yuval Shavitt 1052, Philippe Nain

85

1053, Daniel A. Keim 1091, Ion Stoica 1129, King-Ip Lin 1167, Leonard McMillan

1054, Shlomi Dolev 1092, Rubens N. Melo 1130, Nader H. Bshouty 1168, Vladimir Gurvich

1055, John A. Miller 1093, W. J. Teahan 1131, Matthias Jarke 1169, John E. Savage

1056, Alan L. Selman 1094, Dimitrios Georgakopoulos 1132, A. Prasad Sistla 1170, Dhabaleswar K. Panda

1057, Josep Llosa 1095, Zahari Zlatev 1133, Alexander A. Pasko 1171, Jun Yang

1058, 1096, Maged M. Michael 1134, P. Vijay Kumar 1172, V. S. Subrahmanian

1059, Anthony Hunter 1097, Victor Neumann-Lara 1135, Lawrence S. Moss 1173, Jaime G. Carbonell

1060, 1098, Ashish Gupta 1136, Ivan Dimov 1174, Chin-Wen Ho

1061, Alexander Schrijver 1099, Osmar R. Zaïane 1137, Calvin Lin 1175, Michael W. Berry

1062, Charles J. Alpert 1100, Gösta Grahne 1138, Leonid Libkin 1176, Reinhard Männer

1063, Alberto Bertoni 1101, Imre Leader 1139, Giorgio Gambosi 1177, Stephen G. Eick

1064, Pratap Pattnaik 1102, Jennifer Widom 1140, Richard J. Wallace 1178, George S. Almasi

1065, Frank S. de Boer 1103, Thomas Tesch 1141, John Tromp 1179, Richard St.-Denis

1066, David Binkley 1104, Mojmír 1142, Elliot Soloway 1180, Tiziano Villa Kretínský 1067, Werner Winiwarter 1143, Wei-Ngan Chin 1181, Sujit Dey 1105, Panos Constantopoulos 1068, Job Zwiers 1144, Mahmoud Naghshineh 1182, G. Dick van Albada 1106, W. Melody Moh 1069, Anand Rajaraman 1145, Kim J. Vicente 1183, Juraj Hromkovic 1107, Jehoshua Bruck 1070, 1146, Nils Klarlund 1184, Yiyu Yao 1108, Nathan Linial 1071, Moreno Falaschi 1147, M. Paschalis 1185, Yinghua Min 1109, Luca de Alfaro 1072, Sunil Arya 1148, Bernhard Nebel 1186, Gábor Lugosi 1110, Alexander Tuzikov 1073, Michael E. Houle 1149, Pavel Tvrdík 1187, Maria Tortorella 1111, Jack Dongarra 1074, Yu-Kwong Kwok 1150, Carlo Zaniolo 1188, David A. Wood 1112, Xiaohua Hu 1075, Giuseppe Di Battista 1151, Steven S. Seiden 1189, Michael J. Feeley 1113, Eugene C. Freuder 1076, Heiko Schröder 1152, Long Quan 1190, Masahiro Fujita 1114, Sophie Cluet 1077, Wolfgang Kunz 1153, David Bruce Wilson 1191, Marc Snir 1115, Roland Wagner 1078, Jason Tsong-Li Wang 1154, Denis Thérien 1192, Gérard Verfaillie 1116, Yoshiharu Kohayakawa 1079, Richard Hull 1155, Martin L. Kersten 1193, Aidong Zhang 1117, Krithi Ramamritham 1080, John T. Baldwin 1156, Michael Theobald 1194, Sriram V. Pemmaraju 1118, Pawan Goyal 1081, James Abello 1157, Kunsoo Park 1195, Maurizio Martelli 1119, Stefano Ceri 1082, Edmund H. Durfee 1158, David Hung-Chang Du 1196, Curtis E. Dyreson 1120, Fabio Paternò 1083, Giovanna Guerrini 1159, Ronald J. Gould 1197, Stefano Bistarelli 1121, Cristian Calude 1084, Walter Unger 1160, 1198, David R. O'Hallaron 1122, Dekang Lin 1085, Alberto Del Bimbo 1161, Luc De Raedt 1199, Anoop Gupta 1123, Tomoyuki Yamakami 1086, Don Coppersmith 1162, James Frew 1200, James Bailey 1124, Gábor N. Sárközy 1087, Shmuel T. Klein 1163, Shibu Yooseph 1201, Oswald Drobnik

1125, Matthias Felleisen 1088, Srimat T. Chakradhar 1164, Dana S. Nau 1202, Camillo J. Taylor

1126, E. Allen Emerson 1089, Vijay Kumar 1165, Margaret-Anne D. Storey 1203, Naomi Nishimura

1127, Wlodzimierz Funika 1090, Jörg Liebeherr 1166, Manuel Palomar 1204, Alexandre Yakovlev

1128, Monique Laurent

86

1205, Martin Dietzfelbinger 1243, Sreejit Chakravarty 1281, Marina Lenisa 1319, Farnam Jahanian

1206, Manfred Kudlek 1244, Francine Chen 1282, Onn Shehory 1320, Bernhard Steffen

1207, Christian Icking 1245, Maria I. Sessa 1283, Matthew L. Ginsberg 1321, Ruth Silverman

1208, Raymond Reiter 1246, Yair Frankel 1284, J. Leon Zhao 1322, Miron Abramovici

1209, Giovanni Di Crescenzo 1247, Joe Marks 1285, Claude Thibeault 1323, Roberto Fiutem

1210, Renata Slota 1248, Kentaro Toyama 1286, Oswin Aichholzer 1324, Vittorio Maniezzo

1211, Christof Fetzer 1249, Biing-Feng Wang 1287, Wen-Syan Li 1325, Zhen Liu

1212, Brian L. Evans 1250, Sandy Irani 1288, Maciej Liskiewicz 1326, Johan Håstad

1213, Robert Godin 1251, Roberto Tamassia 1289, Bertrand Meyer 1327, Raimondo Schettini

1214, Lichan Hong 1252, Pietro Di 1290, Fahiem Bacchus 1328, Luc Longpré

1215, Rajeev Rastogi 1253, Deborah L. McGuinness 1291, Milind A. Sohoni 1329, Michel X. Goemans

1216, 1254, Laura Giordano 1292, Ramarathnam Venkatesan 1330, Peter M. A. Sloot

1217, Daniel J. Rosenkrantz 1255, Deborah Hix 1293, 1331, Sajal K. Das

1218, Jeffrey Scott Vitter 1256, Georg Gottlob 1294, Raj Acharya 1332, Mitchell Wand

1219, Francesco Ranzato 1257, Ulrich Kremer 1295, Tomás E. Uribe 1333, Kia Makki

1220, Vítor Santos Costa 1258, Sabina Rossi 1296, Quang-Tuan Luong 1334, Thomas Streicher

1221, Joachim Hammer 1259, Yih-Farn Chen 1297, 1335, Peter Winkler

1222, Robert P. Kurshan 1260, Sally Jo Cunningham 1298, Yahiko Kambayashi 1336, Pankaj Jalote

1223, 1261, Jim Melton 1299, José C. Monteiro 1337, Tero Harju

1224, 1262, Pedro C. Diniz 1300, Eli Gafni 1338, Lothar Thiele

1225, Meir Feder 1263, Shai Ben-David 1301, Derek L. Eager 1339, Johannes Sametinger

1226, Ethan L. Miller 1264, H. Ramesh 1302, Xiaoyang Sean Wang 1340, Hausi A. Müller

1227, Sophie Laplante 1265, Yong Meng Teo 1303, Aniello Cimitile 1341, Michael A. Palis

1228, Alberto Marchetti- 1266, Stefania Gnesi 1304, Klaus Pohl 1342, Kim B. Bruce Spaccamela 1267, Anders Dessmark 1305, Ming-Syan Chen 1343, Maria J. Serna 1229, Jeffrey B. Remmel 1268, Ulrich Furbach 1306, James Davis 1344, Ivan Stojmenovic 1230, Grady Booch 1269, Yi-Min Wang 1307, Ted Herman 1345, Jacob Slonim 1231, Alexander Moshe Rabino- vich 1270, 1308, Kåre J. Kristof- 1346, Howard Straubing fersen 1232, Yervant Zorian 1271, Thomas M. Conte 1347, 1309, Bart Demoen 1233, Richard M. Karp 1272, Walter Kern 1348, Dimitris Gizopoulos 1310, David A. Padua 1234, Arie Segev 1273, Maria Francesca Costabile 1349, Andreas Jakoby 1311, Paul B. Kantor 1235, James R. Goodman 1274, Henny Sipma 1350, Prabir Bhattacharya 1312, Nicoletta Cocco 1236, Maria Chiara Meo 1275, Vladimiro Sassone 1351, Kavita Ravi 1313, M. Frans Kaashoek 1237, Sanjit K. Mitra 1276, Devika Subramanian 1352, Wei-Ying Ma 1314, Peter Tiño 1238, 1277, Limsoon Wong 1353, Jocelyne Troccaz 1315, Martha E. Pollack 1239, Fabio Casati 1278, Jörg Rothe 1354, Andréa W. Richa 1316, Athman Bouguettaya 1240, Kun-Lung Wu 1279, Michele Boreale 1355, Paolo Bottoni 1317, Karl J. Lieberherr 1241, 1280, Matthew J. Katz 1356, Piero Fraternali 1318, Bozena Kaminska 1242, Pieter H. Hartel

87

1357, 1395, Ravi Jain 1433, Filomena Ferrucci 1471, Kyle Gallivan

1358, Kurt Maly 1396, Arun Hampapur 1434, Endre Boros 1472, Dilip D. Kandlur

1359, Attawith Sudsang 1397, Lydia E. Kavraki 1435, Elke A. Rundensteiner 1473, Michael Merritt

1360, Francesca Rossi 1398, Edward A. Bender 1436, Sibabrata Ray 1474, James R. Larus

1361, Ivan Damgård 1399, Giuseppe Pozzi 1437, Giorgio Ausiello 1475, David J. Musliner

1362, 1400, Injong Rhee 1438, Louis O. Hertzberger 1476, Carl-Johan H. Seger

1363, Krishna V. Palem 1401, Mirka Miller 1439, Rajeev Murgai 1477, Bruce W. Weide

1364, Weizhen Mao 1402, Robert Schreiber 1440, Lawrence Snyder 1478, Victor Y. Pan

1365, Orli Waarts 1403, Siang W. Song 1441, Chi-Ying Tsui 1479, Nicoletta De Francesco

1366, 1404, Thomas Wiegand 1442, Gerald E. Farin 1480, Sumit Roy

1367, Jozef Hooman 1405, Catherine C. Marshall 1443, Philip M. Long 1481, Gheorghe Paun

1368, Jacobo Torán 1406, Jan O. Pedersen 1444, Gadi Taubenfeld 1482, Jack Snoeyink

1369, Sara Comai 1407, Geoffrey Holmes 1445, George W. Fitzmaurice 1483, Philip A. Bernstein

1370, Kevin B. Theobald 1408, Silvana Castano 1446, Malcolm P. Atkinson 1484, Gary Bishop

1371, Frederic Desprez 1409, Pravin Varaiya 1447, Narain H. Gehani 1485, Rüdiger Reischuk

1372, Patrick Valduriez 1410, Gill Barequet 1448, Omran A. Bukhres 1486, Andrzej Lingas

1373, Jiri Sgall 1411, Richard A. Shore 1449, Amer Diwan 1487, Steven A. Demurjian

1374, Chau-Wen Tseng 1412, Reidar Conradi 1450, Laura Tarantino 1488, Jyrki Katajainen

1375, Margaret Martonosi 1413, László 1451, Helmut Alt 1489, Luc Vandeurzen Lovász 1376, Diana Marculescu 1452, Alexander Borgida 1490, Tzi-cker Chiueh 1414, Javier D. Bruguera 1377, Leonard J. Schulman 1453, Keshav Pingali 1491, Frédéric 1415, Kang G. Shin Andrès 1378, Bogdan S. Chlebus 1454, Hisashi Nakamura 1416, Bengt Jonsson 1492, Andrea Corradini 1379, Werner Nutt 1455, Faron Moller 1417, John Lillis 1493, 1380, 1456, Luc Devroye 1418, Francis Y. L. Chin 1494, Rob J. van Glabbeek 1381, Afonso Ferreira 1457, Oded Maler 1419, Paola Inverardi 1495, Komei Fukuda 1382, Barbara Simons 1458, Nicolò Cesa- 1420, Daniel J. Costello Jr. Bianchi 1496, Clarence A. Ellis 1383, Achim Kraiss 1421, Ken Kennedy 1459, Yi Deng 1497, Klara Kedem 1384, Phillip B. Gibbons 1422, Danny Krizanc 1460, Reinhard Klette 1498, Son T. Vuong 1385, Peter Auer 1423, Jon M. Kleinberg 1461, Davide Sangiorgi 1499, Larry Wilson 1386, Martín Abadi 1424, Guy Even 1462, Harrick M. Vin 1500, Il-Yeol Song 1387, Arnold L. Rosenberg 1425, Tarek F. Abdelzaher 1463, Sharad C. Seth 1501, Steve Lawrence 1388, Zahir Tari 1426, Sally A. Goldman 1464, Naoki Katoh 1502, Allan Borodin 1389, Bill Triggs 1427, Robert K. Brayton 1465, Bart Kuijpers 1503, Prashant J. Shenoy 1390, Willem P. de Roever 1428, Micha Sharir 1466, Amitava Mukherjee 1504, Donald F. Towsley 1391, Rajeev Alur 1429, Richard H. Schelp 1467, Burkhard Freitag 1505, Dewayne E. Perry 1392, Wilson C. Hsieh 1430, Chong-Sang Kim 1468, John H. Reif 1506, Peter J. Stuckey 1393, Gennaro Costagliola 1431, Eric Dubois 1469, Alistair Moffat 1507, H. V. Jagadish 1394, Hélène Fargier 1432, Saharon Shelah 1470, Mark Klein 1508, Daniel P. Lopresti

88

1509, Paolo Ciaccia 1547, Jean-Claude König 1585, Massimo Poncino 1623, Francesco Buccafurri

1510, Gad M. Landau 1548, Florin Sultan 1586, Candido Ferreira Xavier de 1624, Giuseppe Visaggio Mendonça Neto 1511, James A. Thom 1549, Ron Holzman 1625, Ewan D. Tempero 1587, Roland H. C. Yap 1512, Narendra Ahuja 1550, Alok N. Choudhary 1626, Stuart K. Card 1588, Simone Santini 1513, Michael Kishinevsky 1551, Eduardo D. Sontag 1627, Aya Soffer 1589, Wei Shu 1514, Vladimir V. Savchenko 1552, Vincenzo Ambriola 1628, Patrick W. Dymond 1590, Fangzhen Lin 1515, Reuven Cohen 1553, Stavros Tripakis 1629, Srikanth Venkataraman 1591, Arvind Krishnamurthy 1516, Yuan-Fang Wang 1554, Thomas C. Shermer 1630, Jack H. Lutz 1592, Harumi A. Kuno 1517, Carla D. Savage 1555, Crispin Cowan 1631, James C. French 1593, Donald Yeung 1518, Leo Mark 1556, Ramesh Viswanathan 1632, Larry Rudolph 1594, Evaggelia Pitoura 1519, Shubhendu S. Mukherjee 1557, Roberto Barbuti 1633, Raphael Yuster 1595, 1520, Anantha P. Chandrakasan 1558, Damian Niwinski 1634, Isabel F. Cruz 1596, Sang Lyul Min 1521, Michael Kaminski 1559, Olivier Devillers 1635, Aravind Srinivasan 1597, Paul C. Clements 1522, Shuvra S. Bhattacharyya 1560, Manfred A. Jeusfeld 1636, Venkata N. Padmanabhan 1598, Didier Rémy 1523, Stefan Näher 1561, Peter J. Denning 1637, Madhavan Swaminathan 1599, Wilhelm Schäfer 1524, Piero A. Bonatti 1562, Henry F. Korth 1638, David Mazières 1600, Philip Heidelberger 1525, Abhijit Chatterjee 1563, Cynthia A. Phillips 1639, Nicolai Vorobjov 1601, Satoshi Fujita 1526, Lenore Cowen 1564, Howard J. Karloff 1640, Egon Balas 1602, Jens Palsberg 1527, Bharat K. Bhargava 1565, Jan Prins 1641, Jeffrey C. Jackson 1603, Serdar Tasiran 1528, Rakesh Agrawal 1566, Christian Scheideler 1642, Dimitris Nikolos 1604, Charles Lee Isbell Jr. 1529, Douglas B. West 1567, Qiming Chen 1643, Rocco De Nicola 1605, Artur Czumaj 1530, Phillip M. Dickens 1568, Rajesh Bordawekar 1644, Wee Sun Lee 1606, Sándor P. Fekete 1531, Sven J. Dickinson 1569, Loren G. Terveen 1645, Robert W. Brodersen 1607, Stephan Olariu 1532, Mark Aagaard 1570, 1646, Joachim Hertzberg 1608, Scott E. Hudson 1533, 1571, Tomás Lang 1647, Gabriel M. Kuper 1609, Berthold Vöcking 1534, Giuseppe Manco 1572, Ulrich Pferschy 1648, Douglas W. Clark 1610, Sachin S. Sapatnekar 1535, Prakash Panangaden 1573, Klaus U. Schulz 1649, David S. Munro 1611, Sergio Yovine 1536, Pankaj Rohatgi 1574, Luis F. Romero 1650, Neil Hindman 1612, Naofumi Takagi 1537, Benno J. Overeinder 1575, Martin Plátek 1651, W. Kent Fuchs 1613, Stefano Paraboschi 1538, Alexander Birman 1576, Maurizio M. 1652, Joel H. Saltz Munafò 1614, Vladimir I. Levenshtein 1539, Stephan Merz 1653, Ramesh Jain 1577, Nelson Mendonça 1615, Sung-Yong Park Mattos 1540, Jien-Chung Lo 1654, Wolfram Wöß 1616, Alex Pentland 1578, Peter T. Wood 1541, Hanno Lefmann 1655, Seog Park 1617, David R. Karger 1542, Subodh Kumar 1579, Andrea Bianco 1656, Evelina Lamma 1618, Jouko A. 1580, Samir Khuller Väänänen 1543, Bhaskar DasGupta 1657,

1581, Michael Codish 1619, Christoph Meinel 1544, John Power 1658, Samuel P. Midkiff

1582, David W. Juedes 1620, Thomas P. Moran 1545, Andrew Thomason 1659, David J. Kriegman

1583, Ravi Mukkamala 1621, Tomasz Luczak 1546, 1660, Gerhard J. Woeginger

1584, Colleen Cool 1622, David A. Hull

89

1661, 1699, Lucy T. Nowell 1737, Flavio Corradini 1774, Paul D. Seymour

1662, Robert J. K. Jacob 1700, Sanjiv Kapoor 1738, Mark Guzdial 1775, Gregory Grefenstette

1663, Ding-Zhu Du 1701, Michael Brady 1739, Domenico Saccà 1776, Alexander Russell

1664, Joan P. Hutchinson 1702, Linda A. Macaulay 1740, Luca Lombardi 1777, Azriel Rosenfeld

1665, Alberto Apostolico 1703, Raoul Bhoedjang 1741, Haim Kaplan 1778, Bede Liu

1666, 1704, Monica Sebillo 1742, Jean-Claude Latombe 1779, Margrit Betke

1667, Jai Menon 1705, Michael J. Carey 1743, Jack Brassil 1780, Prasun Dewan

1668, Ashok K. Agrawala 1706, Anna Labella 1744, Søren Forchham- 1781, Wolfgang Straßer mer 1669, Tsan-sheng Hsu 1707, 1782, Juhani Karhumäki 1745, Alexander Kogan 1670, John M. Mellor-Crummey 1708, Abraham Silberschatz 1783, 1746, Chandra Chekuri 1671, Peter J. Haas 1709, Christian Huemer 1784, Davide Maltoni 1747, Vinay K. Chaudhri 1672, Ronald Parr 1710, Predrag R. Jelenkovic 1785, Leana Golubchik 1748, A. 1673, Joxan Jaffar 1711, Sang-Wook Kim 1786, Gustav Pomberger 1749, Zena M. Ariola 1674, Manfred K. Warmuth 1712, Mark Moir 1787, Michel Barbeau 1750, Giorgio Levi 1675, Peter Buneman 1713, George Varghese 1788, Moshe Sidi 1751, Per-Åke Larson 1676, Derick Wood 1714, Andrea Asperti 1789, Srinivasan Seshan 1752, Valeria De Antonellis 1677, Upamanyu Madhow 1715, Kuo-Chin Fan 1790, 1753, Gultekin Özsoyoglu 1678, Won Kim 1716, Markus Holzer 1791, Amotz Bar-Noy 1754, Gerti Kappel 1679, Thomas W. Reps 1717, Gigliola Vaglini 1792, Amnon Shashua 1755, Chung-Min Chen 1680, Anthony Jameson 1718, Patrick Chiu 1793, André Nies 1756, Pavel A. Pevzner 1681, Miroslaw Truszczynski 1719, Alan Roberts 1794, Krys Kochut 1757, Wen-mei W. Hwu 1682, Eli Upfal 1720, Thomas Rist 1795, Riccardo Torlone 1758, Yannis E. Ioannidis 1683, Sumanta Guha 1721, Kaizhong Zhang 1796, Carlo Tomasi 1759, Vijay Karamcheti 1684, Assaf Schuster 1722, Wolfgang Pree 1797, Vojtech Rödl 1760, Ruurd Kuiper 1685, Giri Narasimhan 1723, Tsai-Yen Li 1798, Osamu Maruyama 1761, Stephan Waack 1686, Michael L. Bushnell 1724, 1799, Guy Louchard 1762, Saman P. Amarasinghe 1687, Frank Stephan 1725, Vincenzo Auletta 1800, Nicholas Kushmerick 1763, Ph. Schnoebelen 1688, Christos Faloutsos 1726, Hubert de Fraysseix 1801, Gopal Gupta 1764, Lata Narayanan 1689, Gerardo Canfora 1727, Samuel R. Buss 1802, Geoffrey Fox 1765, Marti A. Hearst 1690, Daniela Rus 1728, Peter F. Sturm 1803, Elias Dahlhaus 1766, Prosenjit Bose 1691, Leonidas J. Guibas 1729, Guy E. Blelloch 1804, Endre Szemerédi 1767, Jacques Demongeot 1692, Ted Bapty 1730, Mabo Robert Ito 1805, Avi Pfeffer 1768, Harald Søndergaard 1693, Radu Horaud 1731, Min-You Wu 1806, H. Sebastian Seung

1694, Sanghoon Sull 1732, Jaswinder Pal Singh 1769, Frantisek Mráz 1807, Josep Torrellas

1770, David Garlan 1695, Michael L. Brodie 1733, Ran Canetti 1808, Santosh Vempala

1771, Hans Kleine Büning 1696, Gail E. Kaiser 1734, Nevin Heintze 1809, Nicholas J. Belkin

1772, Shahram Ghandeharizadeh 1697, Paola Mello 1735, Hong-Yuan Mark Liao 1810, Timos K. Sellis

1773, José L. 1698, Josep Díaz 1736, Johannes A. La 1811, Robert Harper Balcázar Poutré

90

1812, Harry B. Hunt III 1850, William Pugh 1888, Tiko Kameda 1926, Farhad Shahrokhi

1813, Laurie J. Hendren 1851, Arjen K. Lenstra 1889, Patrick Martin 1927, Klaus E. Schauser

1814, 1852, Jonathan E. Cook 1890, Cornelia Fermüller 1928, Anna Rita Fasolino

1815, Werner Damm 1853, Kenneth A. Hawick 1891, Nicholas C. Wormald 1929, N. Asokan

1816, Manuel Fähndrich 1854, Eduard Gröller 1892, Fan R. K. Chung 1930, Martin C. Cooper

1817, Michiel H. M. Smid 1855, Veda C. Storey 1893, Graham N. C. Kirby 1931, Vijay V. Vazirani

1818, Kenneth C. Sevcik 1856, Edmund M. Clarke 1894, Renato Lo Cigno 1932, Louiqa Raschid

1819, Kimberly C. Claffy 1857, Joseph Y. Halpern 1895, Gianfranco Bilardi 1933, Franco Turini

1820, Leonidas Georgiadis 1858, Jaspal Subhlok 1896, W. Keith Edwards 1934, Franz Baader

1821, Janak H. Patel 1859, Barbara M. Chapman 1897, Carl H. Smith 1935, Sukumar Ghosh

1822, Luisa Gargano 1860, Balaji Raghavachari 1898, Chandra M. R. Kintala 1936, Sanjay Jain

1823, Michael S. Hsiao 1861, Wu-chang Feng 1899, González 1937, Hiroshi Ishii

1824, Shmuel Zaks 1862, David E. Johnson 1900, Boaz Patt-Shamir 1938, Kenneth Zeger

1825, Itsik Pe'er 1863, William C. Hill 1901, Theis Rauhe 1939, André van der Hoek 1826, Nick Cercone 1864, S. Muthukrishnan 1902, Man Hon Wong 1940, Clement T. Yu 1827, Vassos Hadzilacos 1865, Andrew Chi-Chih Yao 1903, Minos N. Garofalakis 1941, Zdzislaw Pawlak 1828, Christoph Scholl 1866, Vijayalakshmi Atluri 1904, Hans-Peter Kriegel 1942, Janos Sztipanovits 1829, Paolo Dario 1867, Ying Zhao 1905, Colin Cooper 1943, Sunil M. Shende 1830, Alexander Zelikovsky 1868, 1906, Manfred Opper 1944, Constantine D. Poly- chronopoulos 1831, Bernard Chazelle 1869, Eyal Kushilevitz 1907, Martin Ester

1945, Andrzej Pelc 1832, Gerhard Brewka 1870, Michael E. Saks 1908, Carlo Combi

1833, Patrice Ossona de Mendez 1871, Yvon Savaria 1909, Andrew Turpin 1946, Jae Kyu Lee

1947, Don Kimber 1834, Germán Puebla 1872, Michael Dahlin 1910, Daniel A. Reed

1948, Perwez Shahabuddin 1835, Rolf Wanka 1873, Michael D. Smith 1911, Peter J. Nürnberg

1949, Sergio Verdú 1836, Peter A. Beerel 1874, Mark Handley 1912, Martin P. Ward

1950, Marc Antonini 1837, Cornelia Boldyreff 1875, James H. Anderson 1913, Simon S. Lam

1951, Roger D. Chamberlain 1838, Salim Hariri 1876, E. K. Park 1914, Stephen A. Cook

1839, Bernard Boigelot 1877, Costas Courcoubetis 1915, Harry Hsieh 1952, Luis Gravano

1953, Giorgio Delzanno 1840, Marek Rusinkiewicz 1878, 1916, John A. Stankovic

1954, Ariel Orda 1841, Vikraman Arvind 1879, Margaret H. Dunham 1917, Bill Jackson

1955, Dana Ron 1842, Ramakrishnan Srikant 1880, Andries E. Brouwer 1918, Lynn Wilcox

1956, Vivek Sarkar 1843, Joel L. Wolf 1881, Ravi Kannan 1919, James S. Plank

1957, Herbert Edelsbrunner 1844, 1882, Françoise Fabret 1920, Dario Maio

1845, Richard Y. Wang 1883, Mark H. Chignell 1921, Wei-Min Shen 1958, Kai Salomaa

1959, Tomasz Imielinski 1846, David B. Johnson 1884, Peter Pirolli 1922, Andrei Z. Broder

1960, Pietro Torasso 1847, Gilles Aubert 1885, Corrado Priami 1923, Arlindo L. Oliveira

1961, Heribert Vollmer 1848, Daniel G. Bobrow 1886, Qiang Zhu 1924, J. Gregory Morrisett

1962, S. Sudarshan 1849, Sandeep N. Bhatt 1887, Eric S. K. Yu 1925, Michele Bugliesi

1963, Elisardo Antelo

91

1964, Amitava Datta 2002, Barbara Catania 2039, Alexander A. Razborov 2077, Alan Sussman

1965, Kumiyo Nakakoji 2003, Qing Yang 2040, Naftali Tishby 2078, Spyros Tragoudas

1966, Donald Sannella 2004, San-qi Li 2041, James L. Schwing 2079, Josep Solé-Pareta

1967, Guy W. Mineau 2005, Miron Livny 2042, Andrew A. Chien 2080, Sushil Jajodia

1968, Ulrich Meyer 2006, Rene L. Cruz 2043, Moshe Tennenholtz 2081, Binay K. Bhattacharya

1969, Guy Kortsarz 2007, Viatcheslav P. Grishukhin 2044, Cheng-Shang Chang 2082, Alberto O. Mendelzon

1970, Edward P. K. Tsang 2008, Nimrod Megiddo 2045, Avrim Blum 2083, Robert Endre Tarjan

1971, 2009, Danny H. K. Tsang 2046, Jeffrey F. Naughton 2084, Umesh V. Vazirani

1972, Jorge Castro 2010, Erzsébet Csuhaj- 2047, Mark Horowitz 2085, J. Ian Munro Varjú 1973, Michel Barlaud 2048, Ting-Chuen Pong 2086, Zhixiang Chen 2011, Lui Sha 1974, Marcus J. Huber 2049, Paul A. Beardsley 2087, Fabio Somenzi 2012, Pierpaolo Degano 1975, Mary Lou Soffa 2050, Ron van der Meyden 2088, Yi-Bing Lin 2013, Luiz De Rose 1976, Gregory D. Abowd 2051, Harry Buhrman 2089, Daphna Weinshall 2014, Ugo Montanari 1977, David A. Patterson 2052, Fred S. Roberts 2090, Theo Härder 2015, Daniele Micciancio 1978, Hristo Djidjev 2053, Michael R. Fellows 2091, Thomas Zeugmann 2016, Paolo Tonella 1979, 2054, Evangelos Kranakis 2092, Edith Hemaspaandra 2017, Farid N. Najm 1980, Boon-Lock Yeo 2055, Richard Pollack 2093, María Alpuente 2018, Andrea Schaerf 1981, Howard J. Hamilton 2056, Stuart J. Russell 2094, Jacob A. Abraham 2019, Gerhard Weikum 1982, Enrico Vicario 2057, Yelena Yesha 2095, Hans Jürgen 2020, T. V. Lakshman Ohlbach 1983, Anne Rose 2058, David W. Murray 2021, Magnús M. 2096, Hari Balakrishnan Halldórsson 1984, 2059, Abraham Bookstein 2097, Prateek Sarkar 1985, Bernard Tourancheau 2022, Hiroki Ishizaka 2060, Petr Jancar 2098, Weiping Shi 2023, Monique Teillaud 1986, Pei-Hsin Ho 2061, Maurizio Lenzerini 2099, Pascal Koiran 2024, Dennis Shasha 1987, Giuseppe Persiano 2062, Fillia Makedon 2100, Jean-Pierre Tillich 2025, George G. Robertson 1988, Erich J. Neuhold 2063, A. Udaya Shankar 2101, Phillip G. Bradford 2026, François Larous- 1989, Steven M. Nowick 2064, Chris Clifton sinie 2102, 1990, Jürgen Dassow 2065, Paolo Nesi 2027, Lars R. Knudsen 2103, Juan J. Navarro 1991, Bülent Yener 2066, Ross Wilkinson 2028, Daniela Florescu 2104, William C. Chu 1992, Nancy M. Amato 2067, Myron Flickner 2029, Bruce Randall Donald 2105, Ah Chung Tsoi 1993, Idit Keidar 2068, Wonyong Sung 2030, Maria Luisa Sapino 2106, Michael Werman 1994, Tiziana Margaria 2069, Binhai Zhu 2031, Wojciech Ziarko 2107, Jin Yang 1995, Ron M. Roth 2070, Ramaswamy Ramanujam 2032, Drew V. McDermott 2108, Rolf Drechsler 1996, Dror G. Feitelson 2071, Péter L. 2033, Balachander Krishna- Erdös 2109, Sanjeev Khanna murthy 1997, Ming Li 2072, Filippo Lanubile 2110, Dimitrios Gunopulos 2034, Randeep Bhatia 1998, Alexander Repenning 2073, Susan Horwitz 2111, Jørgen Bang- 2035, Michael S. Waterman Jensen 1999, Jan Heering 2074, Edgar A. Ramos 2036, Jürgen Hesser 2112, Hervé 2000, Bonnie J. Dorr Brönnimann 2075, Alessandro Panconesi 2037, Prabhakar Raghavan 2001, 2113, Yehuda Afek 2076, Janusz Rajski 2038, T. C. Ting

92

2114, Sanjoy K. Baruah 2152, Justin Zobel 2190, Dieter Kratsch 2228, Michael A. Bauer

2115, Hans-Peter Seidel 2153, Alistair G. Sutcliffe 2191, Erik R. Altman 2229, Todd A. Proebsting

2116, V. Wiktor Marek 2154, Baudouin Le Charlier 2192, Cao An Wang 2230,

2117, Jean-Charles Régin 2155, Tuvi Etzion 2193, Günther Greiner 2231, Anna Ciampolini

2118, Manish Gupta 2156, Wei Lai 2194, Rolf Wiehagen 2232, Francesco Scarcello

2119, Scott A. Smolka 2157, Nikos I. Karacapilidis 2195, Robin Thomas 2233, Lawrence Chung

2120, Philip Wadler 2158, John D. Kececioglu 2196, Phokion G. Kolaitis 2234, P. David Stotts

2121, Elizabeth M. Rudnick 2159, Guido Araujo 2197, Amruth N. Kumar 2235, Hui Wang

2122, Nada Lavrac 2160, Erich Schweighofer 2198, Olivier D. Faugeras 2236, Wolfgang Klas

2123, Daniel DeMenthon 2161, Narendra V. Shenoy 2199, Jop F. Sibeyn 2237, Jos C. M. Baeten

2124, David E. Culler 2162, Marc Denecker 2200, Sanjeev Setia 2238, Friedhelm Meyer auf der Heide 2125, Shang-Hua Teng 2163, Steven McCanne 2201, Inderjit S. Dhillon 2239, Josep-Lluis Larriba-Pey 2126, Wojciech Rytter 2164, Peter L. Hammer 2202, Hong Va Leong 2240, Ronald Morrison 2127, David Maier 2165, Tapani Hyttinen 2203, Dhamin Al-Khalili 2241, Yu-Chee Tseng 2128, 2166, Klaus P. Jantke 2204, Sarit Kraus 2242, Antonio Si 2129, Yennun Huang 2167, Derek G. Corneil 2205, Marina Papatriantafilou 2243, James Aspnes 2130, Asim J. Al-Khalili 2168, Anna R. Karlin 2206, Alan M. Frieze 2244, Wolfgang Maass 2131, 2169, David K. Gifford 2207, Jonathan Walpole 2245, Vivek K. Goyal 2132, Diane H. Sonnenwald 2170, Kathy Ryall 2208, Miroslaw Kutylowski 2246, Thomas Seidl 2133, Ian D. Reid 2171, James W. Cooper 2209, Kenny Wong 2247, Letizia Tanca 2134, Wayne Wolf 2172, Krassimir Georgiev 2210, Martin Staudt 2248, Ibrahim Kamel 2135, Saddek Bensalem 2173, David M. Mount 2211, Monica S. Lam 2249, Yannis C. Stamatiou 2136, Bernd Gärtner 2174, Kevin Jeffay 2212, Elisabeth André 2250, Sharad Mehrotra 2137, Shimon Even 2175, Jan Paredaens 2213, Santone 2251, Peter L. Bartlett 2138, Satoru Kuhara 2176, Luca Console 2214, Bing Zeng 2252, Sandro Etalle 2139, Arun K. Somani 2177, 2215, Valerie King 2253, Rokia Missaoui 2140, Javier Esparza 2178, Sean W. Smith 2216, Siu-Wing Cheng 2254, Thomas R. Gross 2141, Charles U. Martel 2179, Teofilo F. Gonzalez 2217, David Peleg 2255, 2142, Shuichi Miyazaki 2180, Andrew B. Kahng 2218, Maribeth Back 2256, Michael Kaufmann 2143, Vern Paxson 2181, Ching-Chih Han 2219, Anselmo Lastra 2257, 2144, 2182, Marco Pistore 2220, Salvador Mir 2258, Riccardo Silvestri 2145, Kunle Olukotun 2183, Fabio Neri 2221, Nicola Olivetti 2259, Javier Pinto 2146, Vigyan Singhal 2184, Dimitrios Kagaris 2222, D. Sivakumar 2260, Edward W. Felten 2147, Neil A. M. Maiden 2185, Divyakant Agrawal 2223, Deborah A. Wallach 2261, Alfons Kemper 2148, Ichiro Suzuki 2186, Alessandra Raf- 2224, Ren-Hung Hwang faetà 2262, David Toman 2149, Renate Motschnig-Pitrik 2225, Jing Huang 2187, 2263, C. Mohan 2150, Ricardo Bianchini 2226, Andrea De Lucia 2188, Vassilios V. Dimakopou- 2264, Indradeep Ghosh los 2151, M. Sheelagh T. Carpendale 2227, Timothy Griffin 2265, Stephen D. Scott 2189, Antonella Carbonaro

93

2266, Jayant R. Haritsa 2304, Mendel Rosenblum 2342, Pedro A. Ramos 2380, Vittorio Scarano

2267, Marty Humphrey 2305, Vipul Kashyap 2343, Satish K. Tripathi 2381, Giuliano Pacini

2268, Pasquale Rullo 2306, Eric Rémila 2344, Sridhar Ramaswamy 2382, Zoé Lacroix

2269, Amihood Amir 2307, Jorge Urrutia 2345, Martin D. F. Wong 2383, Allison Woodruff

2270, Anand Sivasubramaniam 2308, Mark R. Tuttle 2346, Shyh-Kwei Chen 2384, Anatol Slissenko

2271, Alan Gibbons 2309, Douglas A. Cenzer 2347, Abdelsalam Helal 2385,

2272, Jan van Leeuwen 2310, Terence R. Smith 2348, John J. Leggett 2386, Gabriel Robins

2273, Ulrich Faigle 2311, Margo I. Seltzer 2349, Shay Kutten 2387, Kai Li

2274, Seth Copen Goldstein 2312, Alberto L. Sangiovanni- 2350, Fred Douglis 2388, Lance R. Williams Vincentelli 2275, Michel Deza 2351, Leonidas Fegaras 2389, Brian N. Bershad 2313, Jeffrey C. Mogul 2276, Zheng Zhang 2352, Yi-Jen Chiang 2390, Ming-Yang Kao 2314, Leon J. Osterweil 2277, Agata Ciabattoni 2353, Jenö Lehel 2391, Anna Gál 2315, Pankaj K. Agarwal 2278, Joseph A. Konstan 2354, Cláudio T. Silva 2392, Edmundo de Souza e Silva 2316, Kwei-Jay Lin 2279, Constance L. Heitmeyer 2355, J. D. Tygar 2393, Hirobumi Nishida 2317, Andrzej Duda 2280, Weiyi Meng 2356, Jane W.-S. Liu 2394, 2318, Christos H. Papadimitriou 2281, David E. Long 2357, Michelangelo Grigni 2395, Ming C. Lin 2319, Imrich Vrto 2282, Massoud Pedram 2358, Paolo Mancarella 2396, Henk L. Muller 2320, Calton Pu 2283, Eric Simon 2359, Asish Mukhopadhyay 2397, Frank Thomson Leighton 2321, Carlos Domingo 2284, Carey L. Williamson 2360, Virgilio Almeida 2398, Marc Noy 2322, Michael Thielscher 2285, Madhav V. Marathe 2361, Matthias Krause 2399, John Shawe-Taylor 2323, Yang Xiang 2286, Tao Yang 2362, Luigi Portinale 2400, Hava T. Siegelmann 2324, Danièle Beauquier 2287, Daniele Theseider 2363, José María 2401, Alexander Tuzhilin Dupré 2325, Robert E. Kraut Carazo 2402, Wolfgang Wahlster 2288, John C. S. Lui 2326, Henry A. Kautz 2364, David M. Nicol 2403, Elvira Mayordomo 2289, Joydeep Ghosh 2327, Umeshwar Dayal 2365, Steven H. Low 2404, Peter M. Schwarz 2290, Dana Randall 2328, Werner Retschitzegger 2366, Scott Hamilton 2405, 2291, Kim Guldstrand Larsen 2329, Luciano Lavagno 2367, David S. L. Wei 2406, Yannis Papakonstantinou 2292, Hiroki Arimura 2330, Toshihide Ibaraki 2368, Benny Chor 2407, Ramesh Hariharan 2293, Günther Pernul 2331, Steffen Hölldobler 2369, Benno Stein 2408, Rudolf Eigenmann 2294, Kai Hwang 2332, Philippas Tsigas 2370, Dorina C. Petriu 2409, Peter A. Dinda 2295, Xiaobo Li 2333, Uwe Engelmann 2371, Minerva M. Yeung 2410, Cormac J. Sreenan 2296, Chuan Yi Tang 2334, Gerhard Lakemeyer 2372, Ronald L. Rivest 2411, George T. Heineman 2297, Charles B. Owen 2335, Alon Efrat 2373, Shambhu J. Upadhyaya 2412, J. Eliot B. Moss 2298, Yoram Hirshfeld 2336, Gustaf Neumann 2374, Isaac Weiss 2413, Torleiv Kløve 2299, Jonathan C. L. Liu 2337, Sandip Sen 2375, Funda Ergün 2414, Richard Cleve 2300, Philipp Slusallek 2338, Rajarshi Mukherjee 2376, Nick Reingold 2415, Henry M. Levy 2301, Nikitas J. Dimopoulos 2339, Dennis McLeod 2377, C. Michael Overstreet 2416, John H. Reppy 2302, Shmuel Onn 2340, Kenneth A. Ross 2378, 2417, Tao Jiang 2303, 2341, Jon Crowcroft 2379, Richard Beigel

94

2418, Gen-Huey Chen 2456, Reiko Heckel 2494, 2532, Claudio Bettini

2419, Rainer Schuler 2457, Anantha Chandrakasan 2495, Alasdair Urquhart 2533, Charles E. Perkins

2420, Randy H. Katz 2458, Chung-Len Lee 2496, Mustafa Uysal 2534, Paul Beame

2421, Edward Omiecinski 2459, Sartaj Sahni 2497, Michael Leuschel 2535, Michael Mascagni

2422, Martin Anthony 2460, Sampath Rangarajan 2498, Tsau Young Lin 2536, Yuri Rabinovich

2423, Jovisa D. Zunic 2461, Pierre-Louis Curien 2499, Luitpold Babel 2537,

2424, Joseph Gil 2462, Ibrahim Matta 2500, Rutger F. H. Hofman 2538, Ching-Tien Ho

2425, Ming-Chien Shan 2463, H. C. M. Kleijn 2501, Ibrahim N. Hajj

2426, Alejandro López- 2464, André Raspaud 2502, David Gilbert Ortiz 2465, Ran El-Yaniv 2503, Neil Robertson 2427, Sampath Kannan 2466, Amir Herzberg 2504, Trevor Darrell 2428, Roberto De Prisco 2467, Martin Charles Golumbic 2505, Andrew Wolfe 2429, Johannes Blömer 2468, Michael S. Jacobson 2506, André de Korvin 2430, John Case 2469, Catherine Plaisant 2507, Sudha Ram 2431, Sergio Rajsbaum 2470, Paul L. Rosin 2508, S. Purushothaman Iyer 2432, Gerhard Fischer 2471, Krzysztof Diks 2509, David Harel 2433, Banu Özden 2472, Guido Moerkotte 2510, F. Javier Thayer 2434, Walter Willinger 2473, Sergio De Agostino 2511, Ian A. Mason 2435, Thomas E. Anderson 2474, 2512, Elisa Bertino 2436, Robert J. Hilderman 2475, Serge A. Plotkin 2513, Srinivas Devadas 2437, Maria Grazia Fugini 2476, James J. Lu 2514, Jana Koehler 2438, Hans W. Guesgen 2477, David Kinny 2515, Soonhoi Ha 2439, Ramaswamy Govindarajan 2478, Ondrej Sýkora 2516, Stefan Schirra 2440, Fadi Dornaika 2479, Noureddine Boudriga 2517, John K. Tsotsos 2441, Maurice Bruynooghe 2480, Gary MacGillivray 2518, Rudolf Fleischer 2442, Günter Rote 2481, Charles Wiles 2519, Martin Mundhenk 2443, Wojciech Plandowski 2482, Catriel Beeri 2520, Ramesh K. Sitaraman 2444, Arun Sharma 2483, Anh Nguyen-Tuong 2521, Lane A. Hemaspaandra 2445, Paolo Atzeni 2484, Paolo Terenziani 2522, Vladimir Estivill-Castro 2446, Hans-Peter Meinzer 2485, Peter Bro Miltersen 2523, Jérôme Lang 2447, Doron Peled 2486, Antonio Brogi 2524, D. Frank Hsu 2448, Christos Kaklamanis 2487, Vijay A. Saraswat 2525, Larry Kerschberg 2449, Froduald Kabanza 2488, Richard Fujimoto 2526, Piotr Berman 2450, Alan Mycroft 2489, Micah Beck 2527, Joaquim Gabarró 2451, J. Alison Noble 2490, Martin Middendorf 2528, Jenwei Hsieh 2452, Jason Cong 2491, Paolo Ciancarini 2529, Peter van Beek 2453, Lixia Zhang 2492, Yassine Lakhnech 2530, David W. Jacobs 2454, Ramamohan Paturi 2493, James Coplien 2531, John W. Byers 2455, Hans P. Zima

95