Generalized Median Graphs: Theory and Applications∗

Lopamudra Mukherjee1, Vikas Singh1, Jiming Peng2, Jinhui Xu1, Michael J. Zeitz3, Ronald Berezney3

1Department of Computer Science & Eng., State University of New York at Buffalo, {lm37,vsingh,jinhui}@cse.buffalo.edu
2Industrial & Enterprise Systems Eng., University of Illinois at Urbana-Champaign, [email protected]
3Department of Biological Sciences, State University of New York at Buffalo, {mjzeitz,berezney}@buffalo.edu

Abstract

We study the so-called Generalized Median graph problem, where the task is to construct a prototype (i.e., a 'model') from an input set of graphs. The problem finds applications in many vision (e.g., object recognition) and learning problems where graphs are increasingly being adopted as a representation tool. Existing techniques for this problem are evolutionary search based; in this paper, we propose a polynomial time algorithm based on a linear programming formulation. We present an additional bi-level method to obtain solutions arbitrarily close to the optimal in non-polynomial time (in the worst case). Within this new framework, one can optimize edit distance functions that capture similarity by considering labels as well as the graph structure simultaneously. In the context of our motivating application, we discuss experiments on molecular image analysis problems; the methods will provide the basis for building a topological map of all pairs of human chromosomes.

∗Research was supported in part by NSF grants CCF-0546509, IIS-0713489 and NIH grant GM 072131-23.

1. Introduction

Graphs are a natural choice in applications where information regarding object-to-object relationships needs to be encoded. They serve as an invariant structure descriptor in object recognition and for low level image representations. Recent literature in vision includes several examples of graph theoretic algorithms for some classical problems like segmentation, image denoising and stereo using ideas such as spectral methods, graph cuts and minimum spanning trees. In many applications, results from classical graph theory can be directly adapted to derive efficient solutions; an appropriate graph construction that encodes available information presents the main challenge here. In other cases, the construction strategy is somewhat simpler. However, issues such as distortion and noise propagate into the graph representations. For example, we may have multiple representations of the same object (or scene) based on the number of observations (or readings taken). We are then faced with the task of building a single composite model describing the scene in the best possible way. Problems of this form are broadly known as Prototype Learning, and require learning a model of a class given several of its members. The Generalized Median Graph problem asks the following question: given a set of graphs, what is a median graph (not necessarily from the input set) that is a good representative (or prototype) for the set?

We are particularly interested in this problem in the context of applications in biological image analysis, specifically in building a topological map of chromosome organization in the human cell nucleus. A proper understanding of the chromosomal organization will lead to insights into developmental changes and gene regulation and interaction; an all-important focus of ongoing research is on its relationship to the organization and mutations in the genome [5, 14]. Given sets of nuclear images exhibiting chromosomal organization, the task is to derive a "model" that is a good representation of the organizational relationship among chromosomes. The objective effectively reduces to finding a median graph for a set of attributed graphs from individual nuclear images. In addition, the generalized median graph problem is also a natural formalization for many problems arising in computer vision and structural pattern recognition as well as specific graph learning problems arising in drug design [6].

1.1. Problem Description

Let a labeled undirected graph $G$ be given as $G = (V, E, f_v, f_e)$, where

• $V$ is the vertex set, $E$ is the edge set,
• $L_V$ is the set of node labels, $L_E$ is the set of edge labels,
• $f_v : V \to L_V$ is a mapping from nodes to labels or weights, and
• $f_e : E \to L_E$ is a mapping from edges to labels or weights.

Let $\mathcal{G} = (G_1, G_2, \ldots, G_n)$ be a collection of graphs in arbitrary orientation with these properties:

• $\forall G_i = (V_i, E_i, f_v, f_e)$, $V_i \subseteq V$ and $E_i \subseteq E$.
• No restriction on the uniqueness of the vertex labels of $V_i$, i.e., $f_v(u_i) = f_v(v_i)$, $u_i, v_i \in V_i$ is permissible (and likely).
• No restriction on the cardinality of the graphs in $\mathcal{G}$, i.e., $|G_i| \neq |G_j|$, $G_i, G_j \in \mathcal{G}$ is permissible (and likely).

The median graph, $\bar{G}$, for $\mathcal{G} = \{G_1, \ldots, G_n\}$ must minimize the sum of distances as follows.

$$\bar{G} = \arg\min_{\hat{G}} \sum_{i=1}^{n} d(\hat{G}, G_i), \quad G_i \in \mathcal{G}, \tag{1}$$

where $d(\cdot, \cdot)$ is an appropriately defined 'distance' function. In the simplest case, $d(\cdot, \cdot)$ can be the cost of the fewest edit operations required to 'convert' one graph to the other. Alternative definitions of distance may reflect a similarity measure between a pair of graphs, as we will see shortly. When $\bar{G} \in \mathcal{G}$, the median graph is the set median; if we waive this requirement, we get the generalized median graph problem.

1.2. Previous works

Generalized Median graphs were recently introduced by Jiang, Münger, and Bunke [10]. Their solution involved a genetic search based algorithm; however, the work was significant because it provided a link between problems of model construction (given a set) and median graphs. A subsequent paper by Hlaoui and Wang [8] relied on certain application-specific hypotheses where local choices that improved the objective function were iteratively picked. The reader may notice that although the median graph problem allows a precise combinatorial graph theoretic definition, both existing techniques are soft computing based or heuristic approaches; no combinatorial solution paradigms are known (except the special case of certain types of trees [15]).

While the problem of generalized median graphs is still somewhat young, a strongly related problem, called the graph isomorphism problem, has been extensively studied by the computer vision and theoretical computer science communities. The earliest papers on this topic are due to Corneil and Gotlieb [4] and Ullmann [17]. The status of the problem is interesting: no polynomial time algorithms are known; at the same time, a formal proof of NP-hardness is also unknown. However, a number of approaches exist for solving various special cases of this problem. We will avoid an exhaustive discussion (see [16] for details) and focus only on a subset of algorithms that are relevant for putting the remainder of this paper in context.

In [18], Umeyama proposed an algorithm for graph matching based on the eigendecomposition of the adjacency matrix of a graph. The technique is efficient in practice but is only applicable to adjacency matrices with no repeated eigenvalues. The algorithm by Almohamad and Duffuaa in [2] also employs adjacency matrices for weighted graph matching optimization. It uses a Linear Program (LP, for short) formulation that nicely exploits the relationship between permutation matrices and graph isomorphism as follows.

$$\min \| A_{G_0} - P A_{G_1} P^T \|, \tag{2}$$

where $A_{G_0}$ and $A_{G_1}$ denote the adjacency matrices of the two edge weighted graphs and $P$ denotes a permutation matrix applied to one of the graphs. Hence, the graph matching problem reduces to finding a $P$ that minimizes the difference of one graph (say, $A_{G_0}$) with the permuted version of the other (say, $P(A_{G_1})$).

The algorithm in [2] cannot be directly applied to the isomorphism problem on general graphs with unequal numbers of vertices or on vertex labeled graphs. The general case of vertex labeled graphs with weighted edges adds a new realm of complexity to the problem. To see this, let us consider the factors contributing to the edit cost in general graphs. Normally, 'edits' are performed on nodes and edges and can be broadly classified as insertion and deletion of nodes ($f_v(u)$) and edges ($f_e(e)$), and substitution of nodes ($d_v(f_v(u_1), f_v(u_2))$) and edges ($d_e(f_e(e_1), f_e(e_2))$). The problem of generalized graph isomorphism becomes rather ill-posed for the case where the costs (for nodes and edges) are not defined in the same metric space. To motivate this argument, let us consider an illustrative example. In Fig. 1, we seek to match graphs $G_0$ and $G_1$ in a minimal edit cost sense. Assuming vertex substitution has unit cost (Hamming distance) and edge replacement costs are in $L_1$ or $L_2$ space, the matching will favor mapping $\triangle abc$ in $G_0$ to $\triangle def$ in $G_1$ instead of $\triangle abc$. Clearly, the matching is a trade-off between competing influences.

Figure 1: Matching with labels and weighted edges.

For the special case of graphs with labeled vertices and unweighted edges, Justice and Hero [11] very recently proposed a modification of the algorithm in [2] to define the Graph Edit Distance in terms of an Integer Linear Programming formulation. This technique allows $G_0$ and $G_1$ to have unequal numbers of labeled vertices as follows. An edit grid is constructed, given as a complete graph, $G_\Omega = (V_\Omega, E_\Omega)$ where $|V_\Omega| = N \le |V_0| + |V_1|$. The vertex labels belong to an alphabet, $\Sigma$. First, the nodes (and edges) on the edit grid are initialized to $\phi$ (and 0). Then, the initial graph $G_0$ is placed on this edit grid. The $\phi$-labeled vertices of the edit grid take the labels of the vertices of $G_0$ that are aligned with them. All edges in $G_\Omega$ (labeled 0) are converted to 1 if they denote a real edge in $G_0$. An edit operation on this standard placement of $G_0$ is then defined as either changing the label of a vertex from a character in $\Sigma$ to $\phi$, denoting deletion, or from $\phi$ to a character in $\Sigma$, denoting insertion. This amounts to permuting the standard placement of $G_0$ to minimize the following.

$$\min \sum_{i=1}^{N} \sum_{j=1}^{N} d(f_v(A_0^i), f_v(A_1^j)) P^{ij} + \frac{1}{2} \| A_0 - P A_1 P^T \|, \tag{3}$$

where $A_0$ and $A_1$ are the adjacency matrices of the standard placements of $G_0$ and $G_1$ on the edit grid, $A_0^i$ (and $A_1^j$) is the $i$th (and $j$th) vertex of $A_0$ (and $A_1$), and $d \in \{0, 1\}$ is the cost function for the vertices.

While vertex and edge edit costs are in the same space (binary), (3) does not adequately capture the notion of similarity distance. To see this, consider two to-be-matched input graphs with vertices having large degree ($\ge 3$). If vertices with mismatched labels from the two graphs are aligned, one naturally pays a cost of 1 for a single vertex pair mismatch. This is clearly small compared to the cost incurred when vertices having a large difference in degree are matched to each other (see (3)). A natural tendency of the algorithm, therefore, would be to match vertices with the same degree, instead of vertices with the same label. In fact, one may construct examples where such an approach may ignore labels almost completely in favor of same-degree-vertex alignments, especially if the vertices have high degree. It could be argued that this is in accordance with the edit cost definitions, but in most applications the semantic meaning of a vertex in a graph is as important as its structural relationship with other nodes. For example, in matching graphs that represent chemical compounds (e.g., Lewis structure), should we match a 'C' (carbon) to an 'H' (hydrogen) simply because they share bonds with the same number of atoms?

The analysis above indicates that using edit distance as a cost function in many applications yields a biased weighted matching where a degree mismatch has a higher penalty. Therefore, we must somehow re-weight the cost of replacing vertices to reflect the vertex labels as well as the associated edge information (structure) concurrently. We will discuss these issues next.

2. Main Ideas

2.1. A suitable cost function

Consider an image registration problem where the images have colored regions or certain segmented objects. The image's graph representation would naturally encode the regions as vertices with the respective colors as their labels. Additionally, edges between the vertices may denote some 'link' between their corresponding regions or relationships between objects as observed in the image. The registration problem then requires one to match the images (or their corresponding graph link structures) in a manner that matches similar colors (regions) while maximally preserving the graph topologies. By this interpretation, there is a cost associated with the extent of similarity of two colors. For example, matching a blue vertex with a green vertex has a higher cost compared to matching a blue vertex with a cyan vertex. This cost is different from edge weights; while label-to-label cost relates vertices in source and target graphs, edge weights usually denote the strength of a 'relation' between vertices in the same graph.

To address the ill-posedness problem outlined in §1.2 while preserving the intuitions discussed above, we consider a generalization of edit distances for matching graphs. We will focus on matching edges alone; fortunately, since edges are identified by their end vertices, it is possible to align vertices implicitly while aligning edges explicitly as we illustrate now. Assume we are given two to-be-matched vertex labeled graphs, $G_0 = (V_0, E_0, f_v, f_e)$ and $G_1 = (V_1, E_1, f_v, f_e)$. We are also given a matrix, $\mathcal{S}$, indicating dissimilarity between vertex labels. The entry $\mathcal{S}(i, j)$ indicates the extent of dissimilarity between labels (colors) $i$ and $j$; $\mathcal{S}(i, j) = 0$ if $i = j$. The presence or absence of a link between vertices is indicated by their edge weights, $f_e$, where $f_e \in \{0, 1\}$. An explicit match between edge $e_i = (a, b) \in G_0$ and $e_j = (c, d) \in G_1$ aligns the two vertices of the edges, i.e., $a \to c$, $b \to d$. Therefore, while we do not focus on matching vertices, we can indirectly 'charge' vertices with misaligned labels. This idea allows us to naturally derive the cost that must be paid when an edge $e_i = (a, b)$ (in source) is matched with another edge $e_j = (c, d)$ (in target), given as

$$\mathrm{cost}(e_i, e_j) = \mathcal{S}(a, c) + \mathcal{S}(b, d) + \| f_e(e_i) - f_e(e_j) \|, \tag{4}$$

where $\mathrm{cost}(e_i, e_j) = 0$ if both end vertices of the edge pair match perfectly. Otherwise, it reflects the sum of the costs of (i) misaligned vertices (based on the dissimilarity of aligned vertex labels) and (ii) the difference of weights of the edges (which is 0 unless it is matched to a null edge).

While (4) allows evaluating the cost due to vertex mismatches given two aligned edges as parameters, we must represent the sum of such costs for a pair of graphs in an algebraically computable manner in order to define what must be optimized. Suppose a vertex $u_1 \in G_0$ is aligned with $u_2 \in G_1$ such that $\mathcal{S}(u_1, u_2) > 0$. Each edge incident on $u_1$ pays a cost of $\mathcal{S}(u_1, u_2)$ due to a misaligned $u_1$. Thus, the total cost is $\deg(u_1) \cdot \mathcal{S}(u_1, u_2)$, where $\deg(\cdot)$ denotes the degree of a vertex. Similarly, the cost for all incident edges of $u_2$ is $\deg(u_2) \cdot \mathcal{S}(u_1, u_2)$. The total cost of one vertex misalignment pair ($u_1 \leftrightarrow u_2$) is then

$$D(u_1, u_2) = [\deg(u_1) + \deg(u_2)] \cdot \mathcal{S}(u_1, u_2). \tag{5}$$

Notice that all edges associated with $u_1$ and $u_2$ have paid only part of their cost (due to vertex misalignment on one side and reflecting the first term in (4)). However, the second vertex of edge $e_i$ (see second term in (4)) is automatically considered at the time of evaluation of the other pair of aligned vertices. The cost function that models this is

$$\min \sum_{i=1}^{N} \sum_{j=1}^{N} D(i, j) P^{ij} + \frac{1}{2} \| A_0 - P A_1 P^T \|. \tag{6}$$

The second term in (6) evaluates the cost of matching a 'real' edge to a 'non-existent' or null edge (see third term in (4)). But (6) as a measure of graph similarity is still not entirely accurate. It is perfect when an edge matches with a null edge; however, when an edge matches to another real edge, the cost due to $\mathcal{S}(u_1, u_2)$ is counted twice (for $u_1 \in G_0$ and for $u_2 \in G_1$). This overestimation must be deducted to reflect the actual cost. We will discuss this shortly.

For convenience of presentation, let us first generalize the notion of graph similarity distance (in (6)) in the context of median graphs. Addressing the overestimation issues discussed above in the generalized setup turns out to be significantly easier. Notice that when we have just two graphs, the cost calculated is the similarity of one graph to the permuted version of the other. In our formulation, all the input graphs are embedded (placed) in an edit grid, $G_\Omega$, $|G_\Omega| \ge N$. Here, a permutation is simply a change of placement from one position to the other. Hence, the generalized median is a graph on the edit grid that minimizes the distance of every graph in the input set (post-permutation) to itself. This distance is clearly zero when all input graphs are isomorphic. Extending this logic to the general case where the input graphs are not necessarily isomorphic, our objective is to find a set of permutations, one for each graph, such that the resultant set of placements are the same (or as close as possible). It can be verified that the mean of all such close placements will yield the generalized median for the set of input graphs. Because the permutation that must be applied to a certain graph is not known in advance, we must simultaneously permute all pairs of graphs and calculate the cost incurred. In the next section, we will introduce the integer program that models this intuition. We will then address the overestimation in (6). Throughout this paper, upper case letters ($A$) and upper case letters with a single subscript ($A_0$) will denote matrices; upper case letters with two subscripts or two superscripts will refer to individual matrix entries ($A_{ij}$, $A_0^{ij}$, and $A(i, j)$).

2.2. Cost when both graphs are permuted

The edit grid, $G_\Omega$, provides a common 'reference' frame in which the sum of variations (cost) between the permuted graphs can be defined and computed precisely. Let us first consider the cost definition when two graphs are simultaneously permuted onto $G_\Omega$ and then extend the definition for multiple graphs. If graphs $G_0$ and $G_1$ are permuted by $P_0$ and $P_1$ respectively, (6) is represented as follows.

$$\sum_{i=1}^{N} \sum_{k=1}^{N} \sum_{j=1}^{N} D(i, j) P_0^{ik} P_1^{kj} + \frac{1}{2} \| P_0 A_0 P_0^T - P_1^T A_1 P_1 \|, \tag{7}$$

where $P_0^{ik} = 1$ indicates that $v_i \in G_0$ is permuted to position $k$ on the edit grid. A triplet, $(i, j, k)$, such that $P_0^{ik} P_1^{kj} = 1$ implies that $v_i \in G_0$ is permuted to the position $k$, and position $k$ on the edit grid maps to $v_j \in G_1$, and so we must incur a cost, $D(i, j)$. The main difficulty in (7) involves the product term of two variable matrices, $P_0$ and $P_1$. Our linearization involves an additional variable $X \in \mathbb{R}^{N \times N \times N}$, where $X_{ijk}$ is 1 if $P_0^{ik} P_1^{kj} = 1$ and 0 otherwise. This naturally yields

$$P_0^{ik} + P_1^{kj} = 2 \implies X_{ijk} = 1. \tag{8}$$

(8) can be converted to regular linear constraints as

$$P_0^{ik} + P_1^{kj} \ge 2 X_{ijk}, \qquad P_0^{ik} + P_1^{kj} - 1 \le X_{ijk}. \tag{9}$$

Observe that when $P_0^{ik} + P_1^{kj} = 2$, $X_{ijk}$ must be 1 to simultaneously satisfy both constraints in (9); if $P_0^{ik} + P_1^{kj} \le 1$, $X_{ijk}$ must be 0. (8) and (9) are hence equivalent. Incorporating these into (7) gives

$$\sum_{i=1}^{N} \sum_{j=1}^{N} D(i, j) \Big( \sum_{k=1}^{N} X_{ijk} \Big) + \frac{1}{2} \| P_0 A_0 P_0^T - P_1^T A_1 P_1 \|. \tag{10}$$

We now revisit the overestimation issues from §2.1. To calculate the overestimation in (10), we first need to determine the edge-to-edge mappings between $G_0$ and $G_1$. This is given by the "1" entries in the dot product of $P_0 A_0 P_0^T$ and $P_1^T A_1 P_1$. The overestimation, say $C \in$

3. Linearization and Relaxation

In this section, we describe a relaxation technique to derive a polynomial time solvable version of the problem in §2.2. We illustrate the ideas using the familiar objective function in (2) in §1.2. The strategy applies readily to (15)-(20). We observe that the objective, $\sum_{ij} (A_{G_1} - P A_{G_2} P^T)_{ij}$, is equivalent to

$$\sum_{l} \big( \mathrm{vec}(A_{G_1}) - \mathrm{vec}(A_{G_2})(P \otimes P) \big)_l, \tag{21}$$
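As a concrete illustration of the edge-matching cost (4) and the vertex-misalignment cost (5) from §2.1, the following minimal sketch evaluates both on two tiny graphs. The dissimilarity table `S`, the toy graphs, and all function names are hypothetical and chosen only for illustration; unweighted edges with weights in {0, 1} are assumed, as in the text.

```python
from collections import defaultdict


def degrees(edges):
    """Vertex degrees of an undirected graph given as a list of (u, v) pairs."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def edge_match_cost(e_i, e_j, label0, label1, S, w_i=1, w_j=1):
    """Cost (4) for matching e_i = (a, b) in G0 with e_j = (c, d) in G1."""
    (a, b), (c, d) = e_i, e_j
    return S[label0[a]][label1[c]] + S[label0[b]][label1[d]] + abs(w_i - w_j)


def vertex_misalignment_cost(u1, u2, deg0, deg1, label0, label1, S):
    """Cost (5): D(u1, u2) = [deg(u1) + deg(u2)] * S(u1, u2)."""
    return (deg0[u1] + deg1[u2]) * S[label0[u1]][label1[u2]]


# Hypothetical label dissimilarities: blue is closer to cyan than to green.
S = {
    "blue":  {"blue": 0, "cyan": 1, "green": 3},
    "cyan":  {"blue": 1, "cyan": 0, "green": 2},
    "green": {"blue": 3, "cyan": 2, "green": 0},
}

# Toy graphs: G0 is a triangle, G1 is a path on three vertices.
edges0 = [(0, 1), (0, 2), (1, 2)]
label0 = {0: "blue", 1: "cyan", 2: "green"}
edges1 = [(0, 1), (1, 2)]
label1 = {0: "blue", 1: "green", 2: "green"}
deg0, deg1 = degrees(edges0), degrees(edges1)

# Matching edge (0, 1) of G0 to edge (0, 1) of G1 aligns blue->blue, cyan->green.
print(edge_match_cost((0, 1), (0, 1), label0, label1, S))
# Aligning vertex 1 of G0 (cyan, degree 2) with vertex 1 of G1 (green, degree 2).
print(vertex_misalignment_cost(1, 1, deg0, deg1, label0, label1, S))
```

Passing `w_j=0` to `edge_match_cost` models a match against a null edge, which adds the third term of (4).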

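The linearization in (8)-(9) of §2.2 can be checked by brute force. The sketch below assumes that the entries of $P_0$, $P_1$ and $X$ are all restricted to {0, 1}; the function names and the small example matrices are hypothetical. It enumerates the binary values of $X_{ijk}$ allowed by (9) and confirms that the only feasible value is the product $P_0^{ik} P_1^{kj}$, so the triple sum in (10) reproduces the first term of (7).

```python
from itertools import product


def feasible_x(p0, p1):
    """Binary values of X_ijk allowed by the two constraints in (9)."""
    return [x for x in (0, 1) if p0 + p1 >= 2 * x and p0 + p1 - 1 <= x]


def linearized_vertex_cost(D, P0, P1):
    """First term of (10): sum over i, j, k of D(i, j) * X_ijk."""
    n = len(D)
    total = 0
    for i, j, k in product(range(n), repeat=3):
        (x,) = feasible_x(P0[i][k], P1[k][j])  # exactly one feasible value
        total += D[i][j] * x
    return total


def bilinear_vertex_cost(D, P0, P1):
    """First term of (7): sum over i, k, j of D(i, j) * P0^{ik} * P1^{kj}."""
    n = len(D)
    return sum(D[i][j] * P0[i][k] * P1[k][j]
               for i, k, j in product(range(n), repeat=3))


# For every binary pair, the constraints force X to equal the product.
for p0, p1 in product((0, 1), repeat=2):
    assert feasible_x(p0, p1) == [p0 * p1]

# Hypothetical 2x2 example: G0 keeps its placement, G1 swaps its two vertices.
D = [[0, 5], [7, 0]]
P0 = [[1, 0], [0, 1]]
P1 = [[0, 1], [1, 0]]
print(linearized_vertex_cost(D, P0, P1), bilinear_vertex_cost(D, P0, P1))
```

If $X$ were allowed to be fractional, (9) alone would admit values in $[0, 0.5]$ whenever $P_0^{ik} + P_1^{kj} = 1$, which is why the check above assumes binary $X$.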
$$G^* = \arg\min_{\bar{G}} \sum_{i=1}^{K} d(\bar{G}, G_i)$$

We will avoid computing $G^*$ directly. Instead, we employ the lower bounding ideas in [9] to model the distance of every graph, $G_i$, to the generalized median by an unknown variable, $x_i$. Therefore, $x_i = d(G^*, G_i)$. Then, the problem can be modeled as follows.

$$\min \sum_{k=1}^{K} x_k \tag{26}$$

$$\begin{aligned}
\text{s.t.}\quad & x_k + d(G_k, G_l) \ge x_l, && \forall k, l, \\
& x_l + d(G_k, G_l) \ge x_k, && \forall k, l, \\
& x_k + x_l \ge d(G_k, G_l), && \forall k, l, \\
& x_i \ge 0, && \forall i \in [1, K].
\end{aligned} \tag{27}$$

(26)-(27) can be solved optimally in polynomial time.

Property 1 (from [9]) The sum of the entries of $x = (x_1, \ldots, x_K)^T$ denotes a lower bound on the distance of all graphs to the optimal generalized median.

Lemma 2 There must be a graph $G_a \in \mathcal{G}$ that satisfies

$$\sum_{k=1}^{K} d(G_k, G_a) \le 2 \sum_{k=1}^{K} d(G^*, G_k),$$

where $G^*$ is the optimal median graph for $\mathcal{G}$.

solved using branch-and-bound type methods (available in most commercial solvers like CPLEX) but may have non-polynomial running time in the worst case.

5. Experimental Results

We discuss experiments related to our motivating biological image analysis application. Results on a pattern recognition graph database and applications in drug design will appear in the longer version of the paper.

5.1. Applications in Biological Image Analysis

5.1.1 Brief Biological background

Chromosomes within the human cell nucleus are known to occupy discrete territories, and evidence from recent literature suggests that these might be functionally relevant because the position of these territories in nuclear space is 'organized' [5]. It shows a statistical correlation with gene density and the size of chromosomes [7]. In humans, larger chromosomes (numbers 1-10) are arranged on the periphery. Secondly, chromosomes may have non-random neighbors, a finding that might account for preferential translocations and interactions between specific chromosomes. Biologists suspect that such patterns are related to genomic evolution and cell-type-specific variability.
In this regard, several researchers [3, 13] have stressed the need for mapping the large-scale organization and distribution of chromatin in various cell lines. The standard approach relies on a sequence of experiments on a large number of cells. Chromosome to chromosome relationships are manually investigated using the acquired images (see Fig. 2, one for every cell) sequentially. A reliable topological 'model' can provide the necessary foundation for studying the effect of higher-order chromatin distribution on nuclear functions. Unfortunately, an arrangement map for all 23 chromosome pairs in a diploid human cell nucleus is lacking so far. Such models are needed for different cell types at various stages of the cell cycle and terminal differentiation [3].

Figure 2: A nuclear image with eight chromosome paints.

5.1.2 Chromosomal organization maps

Topological map derivation of chromosomes transforms naturally to the generalized median graph problem. Each individual chromosome (denoted as numbers in Fig. 2) is represented as a labeled vertex in a graph. Due to microscopic acquisition limitations, at most 8 pairs of chromosomes can be 'labeled' per cell; hence, our graph has 16 vertices with at most 2 vertices with the same vertex label (i.e., chromosome numbering). Spatial proximity (adjacency) between chromosomes is expressed as an edge between vertices. We focus on 8 pairs of the larger chromosomes (numbers 1, 2, 3, 4, 6, 7, 8, 9) from the human lung fibroblast cell line. The acquisition process was repeated for 37 cells overall. The raw images were segmented to create masks for individual chromosome pairs, and individual graph representations were derived (see Fig. 3).

To evaluate the goodness of our final 'model', we divide the cells into a training set (19 cells) and a test set (18 cells). We report on the following observations.

Optimality. Figure 3(g) shows the median graph obtained from the training set. The objective function value was 1.43 times the lowest possible cost for computing the median for this set of graphs (lower bound by (26)-(27)). This ratio for the set median of the {training, test, entire} set was {1.58, 1.6, 1.63}, indicating that the computed generalized median is a better representative for the set than any of the component graphs. Note: Exhaustive enumeration analysis suggests that a solution much closer to the lower bound is unlikely.

Substructure frequency. We generated groups of chromosomes which occur frequently as a chain or a sub-group (e.g., numbers (b, c, d)) by graph mining [1] and other software [12]. We then compared these to the results obtained by the median graph model for this cell line. Figure 3(h) shows groups (colored edges) with an occurrence frequency of ≥ 70%. Fig. 3(h) shows that almost all such sub-graphs align perfectly with the computed median graph, indicating that the model incorporates the essential information from the constituent graphs. Notice that while the median can fit all subgraphs, all subgraphs cannot build the model.

Structure Prediction. The basic question we wanted to answer was: given a new cell, can we predict the non-random part of its chromosomal organization? If yes, then the model serves its purpose and can be used to interpret biologically relevant information. To do this we calculated Jaccard's, Sorensen's, and Mountford's similarity of the median (from the training set) with each cell in the training/test set (considering attributed edges as set members). The results in Fig. 3(i) show that the similarity is only slightly worse for the test set w.r.t. the training set. Considering that Sorensen's is a percentage similarity measure, we can draw the following inference: given a new cell, we can predict close to 50% of its chromosome organization. This means ∼50% of chromosomal relationships are preserved!

6. Conclusions

This paper applies mathematical programming to design an optimization framework for the generalized median graph problem in the context of real world vision problems in biological image analysis. In addition, the concept has applications to drug design (structural pattern recognition) and other problems; we believe that the ideas will find general applicability in building exemplars from representations such as attributed skeletal graphs. The extension of the similarity metric is also of independent interest.

References

[1] IBM Frequent Subgraph Miner (http://www.alphaworks.ibm.com/tech/fsm).
[2] H. A. Almohamad and S. O. Duffuaa. A linear programming approach for the weighted graph matching problem. IEEE PAMI, 15(5):522–525, 1993.
[3] S. Boyle, S. Gilchrist, J. Bridger, N. L. Mahy, J. A. Ellis, and W. A. Bickmore. The spatial organization of human chromosomes within the nuclei of normal and emerin-mutant cells. Hum. Mol. Genet., 10:211–219, 2001.
[4] D. G. Corneil and C. C. Gotlieb. An efficient algorithm for graph isomorphism. J. ACM, 17(1):51–64, 1970.
[5] T. Cremer and C. Cremer. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nature Reviews Genetics, 2:292–301, 2001.


Figure 3: 2D sections of chromosome organization in 3 cell-images in (a)-(c) and their graphical representation in (d)-(f); a generalized median graph for a set of 19 cells in (g); all patterns (except one) with frequency of ≥ 70% across 19 cells (by graph-mining) were present in the generalized median graph and are shown as colored in (h); average of similarity measures of items in the test/training sets w.r.t. the generalized median and set median of those sets in (i).
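The set-based similarity measures summarized in Fig. 3(i) can be sketched as follows for two of the three measures named in the text, Jaccard's and Sorensen's, using their standard definitions. This is only an illustration under stated assumptions: each edge is identified by the unordered pair of chromosome numbers it joins (the exact edge attributes used in the experiments are not specified here), and the edge lists below are hypothetical, not data from the paper.

```python
def edge_set(edges):
    """Treat each undirected edge as an unordered pair of vertex labels."""
    return {tuple(sorted(e)) for e in edges}


def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def sorensen(a, b):
    """Sorensen similarity: 2 |A intersect B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


# Hypothetical edge lists over chromosome numbers (illustrative only).
median_edges = edge_set([(1, 2), (2, 3), (3, 4), (6, 7)])
cell_edges = edge_set([(2, 1), (3, 4), (4, 6)])

print(jaccard(median_edges, cell_edges))
print(sorensen(median_edges, cell_edges))
```

Averaging such scores over all cells in the training or test set would give one bar of a summary like Fig. 3(i), under the assumptions above.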

[6] P. W. Finn, S. Muggleton, D. Page, and A. Srinivasan. Pharmacophore discovery using the inductive logic programming system PROGOL. Machine Learning, 30(2-3):241–270, 1998.
[7] H. Tanabe et al. Evolutionary conservation of chromosome territory arrangements in cell nuclei from higher primates. PNAS, 99(7):4424–4429, 2002.
[8] A. Hlaoui and S. Wang. Median graph computation for graph clustering. Soft Comp., 10(1):47–53, 2006.
[9] X. Jiang and H. Bunke. Optimal lower bound for generalized median problems in metric space. In Proc. IAPR SSSPR, pages 143–152, 2002.
[10] X. Jiang, A. Münger, and H. Bunke. On median graphs: Properties, algorithms, and applications. IEEE PAMI, 23(10):1144–1151, 2001.
[11] D. Justice and A. Hero. A binary linear programming formulation of the graph edit distance. IEEE PAMI, 28(8):1200–1214, 2006.
[12] L. Mukherjee, V. Singh, J. Xu, K. Malyavantham, and R. Berezney. On mobility analysis of functional sites from time lapse microscopic image sequences of living cell nucleus. In MICCAI, pages 577–585, 2006.
[13] R. Nagele, T. Freeman, L. McMorrow, and H. Lee. Precise spatial positioning of chromosomes during prometaphase: Evidence for chromosomal order. Science, 270:1830–1835, 1998.
[14] L. Parada, P. G. McQueen, P. J. Munson, and T. Misteli. Conservation of relative chromosome positioning in normal and cancer cells. Curr. Biol., 12(19):1692–1697, 2002.
[15] C. A. Phillips and T. Warnow. The Asymmetric Median Tree: A New Model for Building Consensus Trees. Discrete Appl. Math., 71(1-3):311–335, 1996.
[16] S. Toda. Graph Isomorphism: Its Complexity and Algorithms. In Proc. FSTTCS, page 341, 1999.
[17] J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1):31–42, 1976.
[18] S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE PAMI, 10(5):695–703, 1988.