An Order Theoretic Point-Of-View on Subgroup Discovery Aimene Belfodil
Total Page:16
File Type:pdf, Size:1020Kb
An Order Theoretic Point-of-view on Subgroup Discovery Aimene Belfodil To cite this version: Aimene Belfodil. An Order Theoretic Point-of-view on Subgroup Discovery. Artificial Intelligence [cs.AI]. Université de Lyon, 2019. English. tel-02355529 HAL Id: tel-02355529 https://hal.archives-ouvertes.fr/tel-02355529 Submitted on 8 Nov 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THÈSE DE DOCTORAT DE L’UNIVERSITÉ DE LYON opérée au sein de L’INSA DE LYON ECOLE DOCTORALE N°512 MATHÉMATIQUES ET INFORMATIQUE (INFOMATHS) Spécialité/Discipline de doctorat: Informatique An Order Theoretic Point-of-view on Subgroup Discovery Soutenue publiquement le 30/09/2019 par AIMENE BELFODIL Devant le jury composé de: Pr. Bruno Crémilleux Université de Caen Rapporteur Pr. Bernhard Ganter Technische Universitaet Dresden Rapporteur Pr. Céline Robardet INSA Lyon Directrice de thèse Dr. Mehdi Kaytoue Infologic Co-directeur de thèse Dr. Peggy Cellier INSA Rennes Examinatrice Pr. Miguel Couceiro Université de Lorraine Examinateur Pr. Arno Siebes Universiteit Utrecht Examinateur Pr. Sergei O. Kuznetsov Higher School of Economics (Moscow) Invité Mr. Julien Zarka Mobile Devices Ingenierie Invité Département FEDORA – INSA Lyon - Ecoles Doctorales – Quinquennal 2016-2020 SIGLE ECOLE DOCTORALE NOM ET COORDONNEES DU RESPONSABLE CHIMIE CHIMIE DE LYON M. Stéphane DANIELE Institut de recherches sur la catalyse et l’environnement de Lyon http://www.edchimie-lyon.fr IRCELYON-UMR 5256 Sec. : Renée EL MELHEM Équipe CDFA Bât. Blaise PASCAL, 3e étage 2 Avenue Albert EINSTEIN [email protected] 69 626 Villeurbanne CEDEX INSA : R. GOURDON [email protected] E.E.A. ÉLECTRONIQUE, M. Gérard SCORLETTI ÉLECTROTECHNIQUE, École Centrale de Lyon AUTOMATIQUE 36 Avenue Guy DE COLLONGUE 69 134 Écully http://edeea.ec-lyon.fr Tél : 04.72.18.60.97 Fax 04.78.43.37.17 Sec. : M.C. HAVGOUDOUKIAN [email protected] [email protected] E2M2 ÉVOLUTION, ÉCOSYSTÈME, M. Philippe NORMAND MICROBIOLOGIE, MODÉLISATION UMR 5557 Lab. d’Ecologie Microbienne Université Claude Bernard Lyon 1 http://e2m2.universite-lyon.fr Bâtiment Mendel Sec. : Sylvie ROBERJOT 43, boulevard du 11 Novembre 1918 Bât. Atrium, UCB Lyon 1 69 622 Villeurbanne CEDEX Tél : 04.72.44.83.62 [email protected] INSA : H. CHARLES [email protected] EDISS INTERDISCIPLINAIRE Mme Emmanuelle CANET-SOULAS SCIENCES-SANTÉ INSERM U1060, CarMeN lab, Univ. Lyon 1 Bâtiment IMBL http://www.ediss-lyon.fr 11 Avenue Jean CAPELLE INSA de Lyon Sec. : Sylvie ROBERJOT 69 621 Villeurbanne Bât. Atrium, UCB Lyon 1 Tél : 04.72.68.49.09 Fax : 04.72.68.49.16 Tél : 04.72.44.83.62 [email protected] INSA : M. LAGARDE [email protected] INFOMATHS INFORMATIQUE ET M. Luca ZAMBONI MATHÉMATIQUES Bât. Braconnier 43 Boulevard du 11 novembre 1918 http://edinfomaths.universite-lyon.fr 69 622 Villeurbanne CEDEX Sec. : Renée EL MELHEM Tél : 04.26.23.45.52 Bât. Blaise PASCAL, 3e étage [email protected] Tél : 04.72.43.80.46 [email protected] MATÉRIAUX DE LYON M. Jean-Yves BUFFIÈRE Matériaux INSA de Lyon http://ed34.universite-lyon.fr MATEIS - Bât. Saint-Exupéry Sec. : Stéphanie CAUVIN 7 Avenue Jean CAPELLE Tél : 04.72.43.71.70 69 621 Villeurbanne CEDEX Bât. Direction Tél : 04.72.43.71.70 Fax : 04.72.43.85.28 [email protected] jean [email protected] MEGA MÉCANIQUE, ÉNERGÉTIQUE, M. Jocelyn BONJOUR GÉNIE CIVIL, ACOUSTIQUE INSA de Lyon Laboratoire CETHIL http://edmega.universite-lyon.fr Bâtiment Sadi-Carnot Sec. : Stéphanie CAUVIN 9, rue de la Physique Tél : 04.72.43.71.70 69 621 Villeurbanne CEDEX Bât. Direction jocelyn.bonjour @insa-lyon.fr [email protected] ScSo ScSo * M. Christian MONTES Université Lyon 2 http://ed483.univ-lyon2.fr 86 Rue Pasteur Sec. : Véronique GUICHARD 69 365 Lyon CEDEX 07 INSA : J.Y. TOUSSAINT [email protected] Tél : 04.78.69.72.76 [email protected] *ScSo : Histoire, Géographie, Aménagement, Urbanisme, Archéologie, Science politique, Sociologie, Anthropologie iii Publications presented in the Dissertation The contributions presented in this thesis appear in the following publications: International Conferences • Aimene Belfodil, Sergei O. Kuznetsov, Céline Robardet and Mehdi Kaytoue. Mining Con- vex Polygon Patterns with Formal Concept Analysis. In the International Joint Con- ference on Artificial Intelligence, IJCAI, pages 1425–1432, 2017. • Aimene Belfodil and Adnene Belfodil, Mehdi Kaytoue. Anytime Subgroup Discovery in Numerical Domains with Guarantees. In the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), pages 500–516, 20181. • Aimene Belfodil, Sergei O. Kuznetsov and Mehdi Kaytoue. On Pattern Setups and Their Completions. In Concept Lattices and Their Applications (CLA), pages 243–253, 2018. • Aimene Belfodil, Adnene Belfodil and Mehdi Kaytoue. Mining Formal Concepts us- ing Implications between Items. In the International Conference on Formal Con- cept Analysis (FCA), pages 173–190, 2019 (see https://hal.archives-ouvertes.fr/ hal-02080577/document). Submitted Papers in International Journals • Aimene Belfodil, Sergei O. Kuznetsov and Mehdi Kaytoue. On Pattern Setups and Pattern Multistructures. Submitted to the International Journal of General Systems (IJGS) (See https://arxiv.org/abs/1906.02963). Others • Aimene Belfodil, Mehdi Kaytoue, Céline Robardet, Marc Plantevit and Julien Zarka. Une méthode de découverte de motifs contextualisés dans les traces de mobil- ité d’une personne. Extraction et Gestion des Connaissances (EGC), pages 63–68, 2016. • Aimene Belfodil, Mehdi Kaytoue, Sergei O. Kuznetsov and Amedeo Napoli. On Ordered Sets in Pattern Mining. Tutorial at ECML/PKDD 2019 (See https://belfodilaimene. github.io/pattern-setups-tutorial/). • Aimene Belfodil and Adnene Belfodil, Anes Bendimerad, Philippe Lamarre, Celine Ro- bardet, Mehdi Kaytoue, Marc Plantevit. FSSD - A Fast and Efficient Algorithm for Subgroup Set Discovery. In the International Conference on Data Science and Advanced Analytics (DSAA), 2019. 1Rewarded by the ECML/PKDD’2018 award committee as the best student data mining paper of the year. v “... I can only say: je le vois, mais je ne le crois pas. ...”2 A Letter from Cantor to Dedekind, 29 June 1877 ([62] pp. 860) It is with these terms that Georg Cantor announced his proof to Richard Dedekind of one of the fundamental results in Set Theory. I wanted to start this dissertation with this Cantor astonishment of his findings where I can imagine the euphoria he had when he started to understand such things after years of effort. It is such a kind of euphoria I am looking for ... 2“... I can only say: I see it, but I don’t believe it. ...” TABLE OF CONTENTS Page 1 Introduction 1 1.1 On Description Languages.................................. 3 1.2 Ordering Description Languages.............................. 5 1.3 From Datasets to Interesting Subgroups ......................... 7 1.4 Enumeration and Subgroup Discovery .......................... 9 1.4.1 Finding all Subgroups................................ 9 1.4.2 Finding Interesting Subgroups .......................... 10 1.5 Contributions.......................................... 11 1.5.1 Patterns as Ordered Sets.............................. 11 1.5.2 Subgroup Exhaustive Enumeration........................ 12 1.5.3 Anytime Discriminative Subgroup Discovery.................. 13 1.6 Structure of the Thesis.................................... 15 I Patterns as Ordered Sets 17 2 Order-Theoretic Foundations 19 2.1 Elements of Set Theory.................................... 20 2.1.1 Basic Definitions................................... 20 2.1.2 Binary Relations................................... 20 2.1.3 On the Axiom of Choice (AC)............................ 21 2.2 Partially Ordered Sets .................................... 22 2.2.1 Basic Definitions................................... 22 2.2.2 Creating Partial Orders from Pre-orders..................... 25 2.2.3 Minimal, Minimum, Infimum and their Duals ................. 26 2.2.4 Summary ....................................... 29 2.3 Partially Ordered Sets Properties ............................. 31 2.3.1 Bounded Posets.................................... 31 2.3.2 Chain-Finite Posets ................................. 32 2.3.3 Chain-Complete Posets ............................... 32 vii viii TABLE OF CONTENTS 2.3.4 Lattices and Complete Lattices .......................... 33 2.3.5 Multilattices and Complete Multilattices .................... 38 2.3.6 Summary ....................................... 42 2.4 Morphisms on Partially Ordered Sets........................... 45 2.4.1 Basic Definitions................................... 45 2.4.2 Endomorphisms on Partially Ordered Sets ................... 47 2.4.3 Galois Connections.................................. 52 2.4.4 Poset Completions.................................. 53 3 Patterns as Ordered Sets 57 3.1 Elements on Pattern Languages .............................. 59 3.2 Formal