Quasi-Median Graphs from Sets of Partitions
Discrete Applied Mathematics 122 (2002) 23–35

H.-J. Bandelt (a), K.T. Huber (b,1), V. Moulton (b,*,2)

(a) Fachbereich Mathematik, Universität Hamburg, D-20146 Hamburg, Germany
(b) Physics and Mathematics Department (FMI), Mid Sweden University, S 851-70 Sundsvall, Sweden

Received 14 November 2000; received in revised form 4 September 2001; accepted 1 October 2001

(*) Corresponding author. Fax: 46-60-148-875. E-mail addresses: [email protected] (V. Moulton), [email protected] (H.-J. Bandelt), [email protected] (K.T. Huber).
(1) The author thanks the Institute of Fundamental Sciences, Massey University, and the New Zealand Marsden Fund for their support.
(2) The author thanks the Swedish Natural Science Research Council (NFR), Grant M12342-300, for its support.

PII: S0166-218X(01)00353-5

Abstract

In studies of molecular evolution, one is typically confronted with the task of inferring a phylogenetic tree from a set $X$ of sequences of length $n$ over a finite alphabet $\Sigma$. For studies that invoke parsimony, it has been found helpful to consider the quasi-median graph generated by $X$ in the Hamming graph $\Sigma^n$. Although a great deal is already known about quasi-median graphs (and their algebraic counterparts), little is known about the quasi-median generation in $\Sigma^n$ starting from a set $X$ of vertices. We describe the vertices of the quasi-median graph generated by $X$ in terms of the coordinatewise partitions of $X$. In particular, we clarify when the generated quasi-median graph is the so-called relation graph associated with $X$. This immediately characterizes the instances where either a block graph or the total Hamming graph is generated. © 2002 Published by Elsevier Science B.V.

1. Introduction

Given some finite alphabet $\Sigma$, the sequence space $\Sigma^n$ for $n \ge 1$ comprises all sequences of length $n$ over $\Sigma$, equipped with the Hamming distance (assuming that all letters in $\Sigma$ are equidistant). The sequence space $\Sigma^n$ can be regarded as a Hamming graph, being the $n$th Cartesian power of the complete graph on the vertex set $\Sigma$, with two sequences in $\Sigma^n$ adjacent exactly when they differ in precisely one position (i.e., coordinate). Such examples naturally arise in molecular biology, where they involve the nucleotide alphabet $\Sigma = \{A, G, T, C\}$ when DNA sequences are being considered.

In studies of molecular evolution, one is typically confronted with the task of inferring a phylogenetic tree contained in $\Sigma^n$ that connects a given set $X \subseteq \Sigma^n$ of sequences. For studies which invoke parsimony (that is, search for minimal Steiner trees), one usually assumes that the sequences have been condensed in the sense that there is no constant position left, where a constant position is one at which all sequences from $X$ share the same letter. Moreover, positions which have an identical distribution of letters among the members of $X$ up to permutations of $\Sigma$ are identified (possibly keeping the information of how many positions were merged as an integer weight).
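The condensing step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `position_grouping` and `condense` and the sample sequences are hypothetical and do not come from the paper. Two positions are treated as having the same letter distribution up to relabeling exactly when they group the sequences in the same way.

```python
from collections import defaultdict

def position_grouping(seqs, i):
    """Group sequence indices by the letter they carry at position i.

    The result ignores which letter labels which group, so two positions
    that agree up to a permutation of the alphabet give equal results.
    """
    groups = defaultdict(set)
    for idx, s in enumerate(seqs):
        groups[s[i]].add(idx)
    return frozenset(frozenset(g) for g in groups.values())

def condense(seqs):
    """Drop constant positions and merge positions with equal groupings.

    Returns a dict mapping each surviving grouping to an integer weight,
    the number of original positions merged into it.
    """
    weights = {}
    for i in range(len(seqs[0])):
        grouping = position_grouping(seqs, i)
        if len(grouping) < 2:  # constant position: all sequences agree
            continue
        weights[grouping] = weights.get(grouping, 0) + 1
    return weights

# Hypothetical data: three aligned sequences of length 5.
seqs = ["AAGTC", "ACGAC", "CCTAC"]
weights = condense(seqs)
```

In this toy input the last position is constant and is dropped, while positions 0 and 2 (and likewise 1 and 3) are merged, each pair contributing one grouping of weight 2.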
Thus, the essential information in this context is the non-trivial partition of $X$ that a particular position induces: each part is formed by all sequences sharing the same letter at this position. We may then recode the (condensed) sequences by assuming $\Sigma = \{0, \dots, k-1\}$, $k \in \mathbb{N}$, and that $\Sigma_i = \{0, \dots, k_i - 1\}$ for $1 \le i \le n$ (where necessarily $2 \le k_i \le n$) is the projection of $X$ onto the $i$th factor of $\Sigma^n$, so that for each $i = 1, \dots, n$ the pre-images $\{x \in X \mid x_i = j\}$ of the letters $j \in \Sigma_i$ constitute the $k_i$ parts of the $i$th partition of $X$. We may stipulate that the letters in each factor have been permuted in such a way that their natural order reflects some fixed linear order $<$ on $X$:

$$\min\{x \in X \mid x_i = 0\} < \min\{x \in X \mid x_i = 1\} < \cdots < \min\{x \in X \mid x_i = k_i - 1\}.$$

With this convention it is easy to see that there is a one-to-one correspondence between the subset $\Sigma_1 \times \cdots \times \Sigma_n$ of $\Sigma^n$ and the set $V_{\mathcal{R}}$ of all $\mathcal{R}$-maps, that is, the set of all maps $\varphi$ from the set $\mathcal{R}$ of all distinct non-trivial partitions of $X$ (associated with the $n$ sequence positions) to the power set $\mathcal{P}(X)$ of $X$ such that $\varphi(R) \in R$ holds for all $R \in \mathcal{R}$. Clearly the graph with vertex set $V_{\mathcal{R}}$, for which any two vertices $\varphi_1, \varphi_2 \in V_{\mathcal{R}}$ are adjacent if and only if $\varphi_1$ and $\varphi_2$ differ on exactly one partition in $\mathcal{R}$, is canonically isomorphic to the Hamming graph associated to $\Sigma_1 \times \cdots \times \Sigma_n$. We will therefore dub it the Hamming graph $H_{\mathcal{R}}$ associated to $\mathcal{R}$. Note that our correspondence also associates a unique $\mathcal{R}$-map $\varphi^x$ in $V_{\mathcal{R}}$ to each $x \in X$, for which $\varphi^x(R)$ is that part in $R$ containing $x$ ($R \in \mathcal{R}$).
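To make the objects above concrete, the following Python sketch enumerates the $\mathcal{R}$-maps of a small family of partitions, computes the map $\varphi^x$ of an element, and tests adjacency in $H_{\mathcal{R}}$. It is a minimal illustration, not the authors' code: the helper names (`order_parts`, `all_r_maps`, `phi`, `adjacent`), the encoding of an $\mathcal{R}$-map as a tuple of chosen parts, and the example partitions are all assumptions made here, and the elements of $X$ are taken to be integers so that the linear order is the usual one.

```python
from itertools import product

def order_parts(partition):
    """List the parts by increasing smallest element (the ordering convention)."""
    return sorted((frozenset(p) for p in partition), key=min)

def all_r_maps(partitions):
    """Every R-map, encoded as a tuple holding one chosen part per partition."""
    return list(product(*(order_parts(R) for R in partitions)))

def phi(partitions, x):
    """The R-map of x: for each partition, the part that contains x."""
    return tuple(next(p for p in order_parts(R) if x in p) for R in partitions)

def adjacent(phi1, phi2):
    """Adjacency in the Hamming graph: the maps differ on exactly one partition."""
    return sum(a != b for a, b in zip(phi1, phi2)) == 1

# Hypothetical example on X = {1, 2, 3, 4} with two partitions.
R1 = [{1, 2}, {3}, {4}]
R2 = [{1}, {2}, {3, 4}]
partitions = [R1, R2]
vertices = all_r_maps(partitions)  # 3 * 3 = 9 vertices of the Hamming graph
```

Here `phi(partitions, 3)` and `phi(partitions, 4)` differ only on the first partition, so the corresponding vertices are adjacent, whereas `phi(partitions, 1)` and `phi(partitions, 3)` differ on both.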
Returning to our above problem of inferring phylogenetic trees, it has been found that studying the collection of sequences contained in the quasi-median hull generated by $X$ in $\Sigma_1 \times \cdots \times \Sigma_n$ can be a powerful tool. These hulls, whose induced subgraphs are known as quasi-median graphs, are well understood [7,26,32], especially in the case that the alphabet under consideration has cardinality two, so that $X$ consists of binary sequences. In this special situation, quasi-median hulls/graphs are simply referred to as median hulls/graphs; see [5,6,9,21], where various results are given on these structures. By virtue of the above one-to-one correspondence, properties of the median hull generated by $X$ can be translated into ones using the terminology of partitions and $\mathcal{R}$-maps (see also [29]). For instance, the condition for an $\mathcal{R}$-map to be contained in the "Buneman graph" [12–14] is a straightforward consequence of median convexity properties: e.g., just combine [2, Proposition 2.3] (see also [31]) with [3, Lemma].

The following results provide a classic example of this. One calls two splits, that is, bipartitions, $R_1, R_2$ of $X$ compatible if and only if there exist some $A_1 \in R_1$ and some $A_2 \in R_2$ such that $A_1 \cup A_2 = X$, and incompatible otherwise (which is the case exactly when for every $A_1 \in R_1$ and $A_2 \in R_2$ one has $A_1 \cup A_2 \ne X$). Then it can be shown that the median graph $Q_{\mathcal{R}}$ (alias Buneman graph) associated to $\mathcal{R}$ is a tree if and only if every pair of splits in $\mathcal{R}$ is compatible, whereas it coincides with $H_{\mathcal{R}}$ if and only if every pair of splits in $\mathcal{R}$ is incompatible. Hence, the following natural question arises: can a similar approach be taken for elucidating the structure of quasi-median graphs in general and, if so, what would be an appropriate generalization for the notion of compatibility of splits to general partitions?
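The compatibility test for splits and the resulting dichotomy can be sketched as follows. This is a minimal illustration under stated assumptions: the names `compatible` and `classify`, the string labels returned, and the example splits are hypothetical, and the code only checks the pairwise conditions quoted above rather than constructing the Buneman graph itself.

```python
from itertools import combinations

def compatible(S1, S2, X):
    """Two splits are compatible if some A1 in S1 and A2 in S2 cover X."""
    return any(A1 | A2 == X for A1 in S1 for A2 in S2)

def classify(splits, X):
    """Apply the pairwise criterion to a collection of distinct splits.

    Returns "tree" if every pair is compatible, "hamming" if every pair is
    incompatible, and "neither" otherwise. With fewer than two splits there
    are no pairs to test and the function returns "tree".
    """
    flags = [compatible(S1, S2, X) for S1, S2 in combinations(splits, 2)]
    if all(flags):
        return "tree"
    if not any(flags):
        return "hamming"
    return "neither"

# Hypothetical splits of X = {1, 2, 3, 4}.
X = frozenset({1, 2, 3, 4})
s1 = [frozenset({1}), frozenset({2, 3, 4})]
s2 = [frozenset({1, 2}), frozenset({3, 4})]
s3 = [frozenset({1, 3}), frozenset({2, 4})]
```

For these data `s1` is compatible with both `s2` and `s3`, while `s2` and `s3` are incompatible, so `classify([s1, s2], X)` reports a tree, `classify([s2, s3], X)` reports the full Hamming graph, and the three splits together fall into neither extreme.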
It turns out that the right notion for this purpose was introduced in [15]: two distinct partitions $R_1$ and $R_2$ of $X$ are strongly compatible if there exist $B_1 \in R_1$ and $B_2 \in R_2$ with $B_1 \cup B_2 = X$. Different notions of (in)compatibility for two partitions $R_1, R_2$ are best described in terms of the bipartite intersection graph of their parts (where two parts are adjacent exactly when they intersect) [20]. Then $R_1$ and $R_2$ are compatible if and only if this intersection graph is a forest, that is, cycle-free [23–25]. Thus, $R_1$ and $R_2$ are incompatible if the intersection graph contains some cycle. We say that $R_1$ and $R_2$ are strongly incompatible if the intersection graph is a complete bipartite graph. In this extreme case every part of one partition is a transversal for the other partition. The following examples will illustrate these notions:

• $\{\{1,2\},\{3\},\{4\}\}$ and $\{\{1\},\{2\},\{3,4\}\}$ are strongly compatible.
• $\{\{1,2\},\{3\},\{4\},\{5\}\}$ and $\{\{1\},\{2\},\{3,4\},\{5\}\}$ are not strongly compatible but compatible.
• $\{\{1,2\},\{3,4\},\{5,6\}\}$ and $\{\{1,6\},\{2,3\},\{4,5\}\}$ are incompatible but not strongly incompatible.
• $\{\{1,2,3\},\{4,5,6\}\}$ and $\{\{1,4\},\{2,5\},\{3,6\}\}$ are strongly incompatible.

Given a collection $\mathcal{R}$ of non-trivial partitions of a finite set $X$, we prove the following results:

(A) An $\mathcal{R}$-map $\varphi$ is a vertex in the quasi-median graph $Q_{\mathcal{R}}$ if and only if for each pair $R_1, R_2$ of distinct strongly compatible partitions in $\mathcal{R}$ either $\varphi(R_1) = B_1$ or $\varphi(R_2) = B_2$ holds, where $B_1 \in R_1$ and $B_2 \in R_2$ are the (necessarily unique) parts with $B_1 \cup B_2 = X$ (see Theorem 1).
(B) The quasi-median graph $Q_{\mathcal{R}}$ is the Hamming graph $H_{\mathcal{R}}$ if and only if no two distinct partitions in $\mathcal{R}$ are strongly compatible (see Corollary 1).
(C) The quasi-median graph $Q_{\mathcal{R}}$ is a block graph (i.e., every maximal 2-connected subgraph of $Q_{\mathcal{R}}$ is a clique) if and only if every pair of distinct partitions in $\mathcal{R}$ is strongly compatible (see Theorem 2).

2. Quasi-median graphs in a partitions setting

In what follows we will assume that $X$ denotes a finite set and $\mathcal{R}$ a set of non-trivial partitions of $X$. A map $\varphi : \mathcal{R} \to \mathcal{P}(X)$ will be called an $\mathcal{R}$-map if $\varphi(R) \in R$ for all $R \in \mathcal{R}$, and $V_{\mathcal{R}}$ will denote the set of all $\mathcal{R}$-maps. Note that each element $x \in X$ may be thought of as being canonically contained in $V_{\mathcal{R}}$ as the map $\varphi^x$ defined by setting