Embedding Median Graphs Into Minimal Distributive -Semi-Lattices
Total Page:16
File Type:pdf, Size:1020Kb
Embedding median graphs into minimal distributive _-semi-lattices Alain Gély, Miguel Couceiro, Amedeo Napoli To cite this version: Alain Gély, Miguel Couceiro, Amedeo Napoli. Embedding median graphs into minimal distributive _-semi-lattices. NFMCP 2019 - 8th International Workshop on New Frontiers in Mining Complex Patterns in conjunction with ECML-PKDD 2019, Sep 2019, Wùzburg, Germany. hal-02912341 HAL Id: hal-02912341 https://hal.inria.fr/hal-02912341 Submitted on 5 Aug 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Embedding median graphs into minimal distributive _-semi-lattices Alain Gély1, Miguel Couceiro2, and Amedeo Napoli2 1 Université de Lorraine, CNRS, LORIA, F-57000 Metz, France 2 Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France {alain.gely,miguel.couceiro,amedeo.napoli}@loria.fr Abstract. It is known that a distributive lattice is a median graph, and that a distributive _-semi-lattice can be thought of as a median graph i every triple of elements such that the inmum of each couple of its ele- ments exists, has an inmum. Since a lattice without its bottom element is obviously a _-semi-lattice, using the FCA formalism, we investigate the following problem: Given a semi-lattice L obtained from a lattice by deletion of the bottom element, is there a minimum distributive _- semi-lattice Ld such that L can be order embedded into Ld? We give a negative answer to this question by providing a counter-example. Keywords: Median graph · Distributive lattice · order embedding· For- mal Concept Analysis. 1 Motivation Lattices and median graphs are two structures with many applications, in par- ticular in classication and knowledge discovery. Median graphs are especially used in biology, for example in phylogeny, for modeling inter-species liations. In phylogeny, one of the main problems is to nd evolution trees for representing ex- isting species from accessible DNA fragments. When several trees are leading to the same inter-species liations, the preferred ones are the most parsimonious, where the number of modications such as mutations for example, is minimal for the considered species. However, several possible parsimonious trees may exist simultaneously. Such a situation arises with inverse or parallel mutations, e.g., when a gene goes back to a previous state or the same mutation appears for two non-linked species. This calls for a generic representation of such a family of trees. Bandelt et al. [2,3] propose the notion of median graph to overcome this issue, since it was noticed that a median graph may encode all parsimonious trees. It is known that median graphs are related to lattices (see, e.g., [1,2]). Any distributive lattice is a median graph, and any median graph can be thought of as a distributive _-semi-lattice such that for all x; y; z such that the supremum of ech pair exists, then the supremum fx; y; zg also exists. Formal Concept Analysis (FCA) is based on lattice theory and can be used in classication and knowledge discovery. Uta Priss [13,14] made a rst attempt to use the algorithmic machinery of FCA and the links between distributive lattices and median graphs, to analyze phylogenetic trees. However, not every concept lattice is distributive, and thus FCA alone does not necessarily outputs median graphs. A transformation should be designed to build a median graph from a concept lattice. In [14] Uta Priss sketches an algorithm to convert any lattice into a median graph. The key step is to transform any lattice into a distributive lattice. However, how to transform a lattice into a distributive one is not detailled in these papers. In [4], Bandelt uses a data set from [15] to illustrate and evaluate median graph. In this introduction, we will re-use it to show the dierences between median graph and FCA approaches. The example is an extract of mitochondrial DNA for 15 Kung individuals from a Khoisan-speaking hunter-gathered popula- tion in southern Africa. For some sequences in mitochondrial DNA (nucleotide positions, denoted by a, b, ::: , j in table), a binary information indicates if a group of individuals owns the consensus version of the sequence (blanck value) or a variation for this sequence (× value). Eight individual groups are studied because some individuals share the same variants. For example, group 0 stands for 4 similar individuals in [4]. Individual group with no variation on any nu- cleotide positions (consensus group) is not shown on the table. These data are shown in Fig. 1 (Upper Left). For these data, the median graph is shown in Fig. 1 (upper right). Vertices are either individual group (numbered from 0 to 7) or latent vertices, added such that from a group to an adjacent one, only one variation exists for nucleotide sequences (parcimony principle supposes that there is no chance that two varia- tions arise exactly at the same moment for the same population in evolutionary process). This variation is indicated on edges. As an example, from consensus group to L4, the only variation occurs in sequence k, from L4 to 0 the only variation occurs in sequence j. As stated, this graph contains every parcimo- nious tree as covering tree. Median graph owns others good properties: remove edges labeled with a sequence variation produces two disconnected parts. One correspond to individuals with the variation, the other without the variation. Since data is a binary table, Formal Concept Analysis can be applied. The concept lattice obtained from the data is shown in Fig. 1 (Lower Left). In gen- eral, it does not correspond to a median graph. To build a median graph, a necessary condition is to have a distributive _-semi-lattice. In [9], based on the work of Birkho and FCA formalism, we propose an algorithm to compute such a semi-lattice (and the corresponding data table). The result of this algorithm, transforming a concept lattice into a median graph, is given in Fig. 1 (Lower Right). Since FCA is supported by a wide community, the main idea of these re- searches is to be able to use FCA results and softwares to deal with phylogenetic data and median graphs. Remark that, for this particular data set the algorithm nd the median graph computed by Bandelt in [4], unfortunately, in some cases the algorithm returns a distributive _-semi-lattice Ld that is not minimal: There exists Ld0 such that L can be embedded in Ld0 and Ld0 can be embedded Ld. The continuation of [9] is to search for an algorithm which outputs a mini- mal distributive _-semi-lattice. Since we look for minimality, a natural question arises: does a unique minimal (so, minimum) distributive _-semi-lattice Ld ex- ists? In this paper, we propose a counter-example, and then we show that a minimum distributive _-semi-lattice does not always exist. In the following section, we recall denitions and notation for the under- standing of this paper. We then sketch the limitations of our algorithm and show in Section 4 that a minimum distributive _-semi-lattice does not exist. We conclude this paper by some remarks and perspectives in Section 5. consensus a b c d e f g h i j k 0 × × × k j 1 × × a L4 L1 2 × × 3 × × h j k c d 4 × × × × 5 × L3 L2 0 / jk 2 / cj 6 × × × i e b j h f 7 × × × 3 / ai 1 / ae 6 / bhk 4 / hjk 7 / cfj 5 / d 01234567,{} 01234567,{} 0247,j 046,k 0247,j 13,a 46,h 04,gj 27,cj 13,a 46,hk 04,jk 27,cj 1,ae 3,ai 6,bh 4,ghj 7,cfj 5,d 1,ae 3,ai 6,bhk 4,hjk 7,cfj 5,d {},abcdefghi {},abcdefghi Fig. 1. Upper Left. Phylogenetic data of sequence variations for individual groups. Upper Right. Median graph obtained from the data ([4]). Lower Left. Concept lattice for phylogenetic data. Lower Right. Concept lattice corresponding to the median graph. The new concept (046; k) corresponds to L4. To obtain this lattice, data must be modied replacing column g by k. 2 Models: lattices, semi-lattices, median algebras and median graphs In this section we recall basic notions and notation needed throughout the paper. We will mainly adopt the formalism of [8], and we refer the reader to [6,7] for further background. In this paper, all sets are supposed to be nite. 2.1 Lattices and FCA (J (L); M(L); ≤) a b c (J (L); J (L); 6≥) 1(c) 2(d) 3(b) 1 × × 1 × × 2 × 2 × 3 × 3 × × Fig. 2. Upper Left. Standard context for lattice N5 Lower Left. N5, a non distribu- tive lattice. Upper Right. The context (J (N5); J (N5); 6≥) of an ideal (and so, dis- tributive) lattice. Lower Right. Ideal lattice for (J (N5); ≤) and this poset. Note that N5 can be order-embedded in this lattice. Concepts (X,Y) are maximal rectangles of the contexts. For an element e of the lattice, the corresponding concept (X; Y ) is X = fj 2 J (L) j j ≤ eg and Y = fm 2 M(L) j m ≥ eg. For example, the element with label d is the concept (13; d).