<<

An Introduction to Formal Analysis (FCA)

Amedeo Napoli Orpailleur Team Université de Lorraine, CNRS, Inria, LORIA, 54000 Nancy, France [email protected]

Tutorial at ICCS 2020 An Introduction to Formal Concept Analysis (FCA) Bolzano Summer of Knowledge, September 2020 Summary of the presentation

1 Introduction

2 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal and concept The structure of the concept lattice

3 Algorithms for discovering the concepts The Chein-Malgrange algorithm The Ganter algorithm

4 Pattern Structures

5 Functional Dependencies

6 Complements and References

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 2 / 118 Knowledge Discovery in Databases (KDD)

data Knowledge Discovery in Databases (KDD) consists in selection processing large volumes of data preparation in order to discover “patterns” prepared data that are significant, useful, and

reusable. data Three main steps: preparation, data mining, and interpretation. discovered patterns

KDD is iterative and interactive, interpretation i.e. it can be replayed and it is evaluation guided by an analyst. interpreted patterns From Data to Knowledge Units

data

Data have a context and KDD is selection knowledge oriented, depending on preparation domain knowledge, e.g. prepared data constraints, preferences. . . data At each step, domain knowledge mining can be embedded to guide KDD, e.g. interestingness measures, discovered patterns feature selection. . . interpretation The KDD loop can be extended evaluation with an additional step related to interpreted patterns the production of actionable knowledge(knowledge construction pattern step). representation

knowledge units Knowledge Discovery and Knowledge Representation

A parallel can be drawn with the “Knowledge Level” (Newell): the data level, the information level, the knowledge level, and the production of actionable knowledge for being reused by software agents. . . Knowledge Discovery and Knowledge Engineering are complementary. Introduction Knowledge Discovery and Knowledge Representation

Most of the time, KD is depending on domain knowledge: selecting patterns w.r.t. interest measures, similarity, thresholds, preferences. feature selection. . . Knowledge discovery, as a learning process, can be used for knowledge engineering, for example by returning implications, i.e. =⇒, and concept definitions, i.e. =⇒ and ⇐=. Actually, implications within the concept lattice provide a support for “partial” and “complete definitions” of concepts. A main idea underlying declarative knowledge representation and reasoning can be reused in KDD, i.e. Describe the problem and the solver will take care of the solution. On which formalism could rely such a “solver”?

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 6 / 118 Introduction Exploratory Knowledge Discovery based on FCA

Formal Concept Analysis (FCA) is a mathematical formalism based on lattice theory, classification and concept discovery. Moreover, FCA follows a human centered approach and supports exploration operations through the concept lattice. Discovery of concepts, i.e. classes of individuals with a description. Organization of concepts into a poset based on a subsumption relation. The poset supports exploration, e.g. information retrieval, visualization... How FCA can be a “Discovery Engine for Exploratory KDD”.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 7 / 118 Introduction The Contexts of Planets

The initial context of planets:

Planet Size Distance to Sun Moon(s) Jupiter large far yes Mars small near yes Mercury small near no Neptune medium far yes Pluto small far yes Saturn large far yes Earth small near yes Uranus medium far yes Venus small near no

The context of planets after plain scaling:

Planets Size Distance to Sun Moon(s) small medium large near far yes no Jupiter x x x Mars x x x Mercury x x x Neptune x x x Pluto x x x Saturn x x x Earth x x x Uranus x x x Venus x x x

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 8 / 118 The Concept Lattice of Planets Exploration and Visualization (LatViz) Navigation and Information Retrieval Interpretation of concepts and rules

Rules: “far −→ medium” with confidence 2/5, “small −→ near” with confidence 4/5. Implications: “no =⇒ near” and “near =⇒ small” with confidence 1. Introduction FCA: References, Books, Tools, Events, Applications...

The Formal Concept Analysis Homepage: https://upriss.github.io/fca/fca.html Books and Tutorials: Ganter and Wille, Belohlavek, Carpineto and Romano, Ganter and Obiedkov. . . Tools: Conexp and variations, Toscana, Latviz, Galicia, FCA Tools Bundle. . . Events: International Conference on Formal Concept Analysis (ICFCA), International Conference on Lattices and Applications (CLA), Workshop “What can FCA do for Artificial Intelligence?” (FCA4AI), International Conference on Conceptual Structures (ICCS). . . Applications: many many. . .

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 11 / 118 1 Introduction

2 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice The structure of the concept lattice

3 Algorithms for discovering the concepts The Chein-Malgrange algorithm The Ganter algorithm

4 Pattern Structures

5 Functional Dependencies

6 Complements and References A Smooth Introduction to Formal Concept Analysis

FCA, Formal Concepts and Concept Lattices

Marc Barbut and Bernard Monjardet, Ordre et classification, Hachette, 1970. Bernhard Ganter and Rudolf Wille, Formal Concept Analysis, Springer, 1999. Claudio Carpineto and Giovanni Romano, Concept Data Analysis: Theory and Applications, John Wiley & Sons, 2004. Bernhard Ganter and Sergei Obiedkov, Conceptual Exploration, Springer, 2016.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 13 / 118 A Smooth Introduction to Formal Concept Analysis The FCA process

The basic procedure of Formal Concept Analysis (FCA) is based on a simple representation of data, i.e. a binary table called a formal context. Each formal context is transformed into a mathematical structure called concept lattice. The information contained in the formal context is preserved. The concept lattice is the basis for data analysis. It is represented graphically to support analysis, mining, visualization, interpretation. . .

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 14 / 118 Pizzas/Toppings tomato garlic oregano mozza funghi delfia x x x regina x x x roma x x x x marta x x napoli x x x x castelli x x x A Smooth Introduction to Formal Concept Analysis The notion of a formal context

G/M m1 m2 m3 m4 m5 g1 x x x g2 x x x g3 x x x x g4 x x g5 x x x x g6 x x x

(G, M, I) is called a formal context where G (Gegenstände) and M (Merkmale) are sets, and I ⊆ G × M is a between G and M. G = {delfia, regina, roma, marta, napoli, castelli} M = {tomato, garlic, oregano, mozza, funghi} The elements of G are the objects, while the elements of M are the attributes, I is the incidence relation of the context (G, M, I).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 16 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Two derivation operators

For A ⊆ G and for B ⊆ M: 0 : ℘(G) −→ ℘(M) 0 : A −→ A0 A0 = {m ∈ M/(g, m) ∈ I for all g ∈ A} 0 : ℘(M) −→ ℘(G) with 0 : B −→ B0 B0 = {g ∈ G/(g, m) ∈ I for all m ∈ B}

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 17 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Computing the images of sets of objects and attributes

{g2}0 = {m1, m3, m4}:

Objects / Attributes m1 m2 m3 m4 m5 g1 x x x g2 x x x g3 x x x x g4 x x g5 x x x x g6 x x x

A0 = {m ∈ M/(g, m) ∈ I for all g ∈ A} B0 = {g ∈ G/(g, m) ∈ I for all m ∈ B}

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 18 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Computing the images of sets of objects and attributes

{m3}0 = {g1, g2, g3, g5, g6}:

Objects / Attributes m1 m2 m3 m4 m5 g1 x x x g2 x x x g3 x x x x g4 x x g5 x x x x g6 x x x

A0 = {m ∈ M/(g, m) ∈ I for all g ∈ A} B0 = {g ∈ G/(g, m) ∈ I for all m ∈ B}

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 19 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Computing the images of sets of objects and attributes

{g3, g5}0 = {m1, m2, m3, m4}:

Objects / Attributes m1 m2 m3 m4 m5 g1 x x x g2 x x x g3 x x x x g4 x x g5 x x x x g6 x x x

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 20 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Computing the images of sets of objects and attributes

{m3, m4}0 = {g2, g3, g5, g6}:

Objects / Attributes m1 m2 m3 m4 m5 g1 x x x g2 x x x g3 x x x x g4 x x g5 x x x x g6 x x x

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 21 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice The derivation operators and the

0 : ℘(G) −→ ℘(M) with A −→ A0 0 : ℘(M) −→ ℘(G) with B −→ B0 These two applications induce a Galois connection between ℘(G) and ℘(M) when sets are ordered by set inclusion.

A Galois connection is defined as follows: Let (P, ≤) and (Q, ≤) be two partially ordered sets. A pair of maps φ : P −→ Q and ψ : Q −→ P is called a Galois connection between P and Q if:

(i) p1 ≤ p2 =⇒ φ(p1) ≥ φ(p2) (decreasing).

(ii) q1 ≤ q2 =⇒ ψ(q1) ≥ ψ(q2) (decreasing). (iii) p ≤ ψ ◦ φ(p) and q ≤ φ ◦ ψ(q) (increasing).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 22 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Iterating the derivation

A0 = {m ∈ M/(g, m) ∈ I for all g ∈ A} B0 = {g ∈ G/(g, m) ∈ I for all m ∈ B} The derivation operators can be composed, i.e. iterated: starting with a set A ⊆ G, we obtain that A0 is a of M. Applying the second operator on this set, we get (A0)0, or A00 for short, which is a set of objects. Continuing, we obtain A000, A0000, and so on.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 23 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Iterating the derivation

Objects / Attributes m1 m2 m3 m4 m5 g1 x x x g2 x x x g3 x x x x g4 x x g5 x x x x g6 x x x

{g3}00 = {m1, m2, m3, m4}0 = {g3, g5} {g1, g3, g5}00 = {m2, m3}0 = {g1, g3, g5} {m3, m4}00 = {g2, g3, g5, g6}0 = {m1, m3, m4} {m3}00 = {g1, g2, g3, g5, g6}0 = {m3}

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 24 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Properties of the Derivation Operators

A0 = {m ∈ M/(g, m) ∈ I for all g ∈ A} B0 = {g ∈ G/(g, m) ∈ I for all m ∈ B} The derivation operators 0 satisfy the following rules: 0 0 A1 ⊆ A2 =⇒ A2 ⊆ A1 (decreasing). 0 0 B1 ⊆ B2 =⇒ B2 ⊆ B1 (decreasing).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 25 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Properties of the Derivation Operators

Extents: A ⊆ A00 and A0 = A000 (increasing and fix point). Extents: B ⊆ B00 and B0 = B000 (increasing and fix point). Proving the fixpoint : Let X be a subset either of objects or attributes (this is not important here). As X ⊆ X00, then (X00)0 ⊆ X0, i.e. X000 ⊆ X0, using the fact that the prime operator is decreasing. As X ⊆ X00, then X0 ⊆ (X0)00, i.e. X0 ⊆ X000, using the fact that the is increasing this time.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 26 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Examples

G/M m1 m2 m3 m4 m5 g1 x x x g2 x x x g3 x x x x g4 x x g5 x x x x g6 x x x

0 0 A1 ⊆ A2 =⇒ A2 ⊆ A1 0 0 B1 ⊆ B2 =⇒ B2 ⊆ B1 A ⊆ A00 and A0 = A000 B ⊆ B00 and B0 = B000

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 27 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice Two Closure Operators

The mappings A −→ A00 and B −→ B00, i.e. the composition of ’ and ’, are closure operators. Extensivity: for all A ⊆ G and B ⊆ M, we have that 00 00 A ⊆ A and B ⊆ B Monotonicity: 00 00 00 00 If A1 ⊆ A2 then A1 ⊆ A2 and if B1 ⊆ B2 then B1 ⊆ B2 Idempotency: 00 00 00 00 00 00 (A ) = A and (B ) = B 00 A is a closed set whenever A = A and B is a closed set whenever 00 B = B.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 28 / 118 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice The basic theorem of FCA

The concept lattice B(G, M, I) is a in which the infimum and the supremum are given by: The meet of two concepts: 00 (A1, B1) ∧ (A2, B2) = (A1 ∩ A2, (B1 ∪ B2) ) The join of two concepts: 00 (A1, B1) ∨ (A2, B2) = ((A1 ∪ A2) , B1 ∩ Bk) More generally: V T S 00 k∈K(Ak, Bk) = ( k∈K Ak, ( k∈K Bk) ) W S 00 T k∈K(Ak, Bk) = (( k∈K Ak) , k∈K Bk) Note: an intersection of closed sets is a closed set but a union of closed sets is not necessarily a closed set.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 29 / 118 A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice

The structure of the concept lattice

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 30 / 118 A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice The object concept

The name of the object g is attached to the “lower half” of the corresponding object concept γ(g) = ({g}00, {g}0). The object concept of an object g ∈ G is the concept ({g}00, {g}0) where {g}0 is the object intent {m ∈ M/(g, m) ∈ I} of g. The object concept of g, denoted by γ(g), is the smallest concept (for the lattice order) with g in its extent.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 31 / 118 A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice

Examples of object concepts:

γ(g4) = ({g4}00, {g4}0)

γ(g4) = ({g2, g3, g4, g5, g6}, {m1, m4})

γ(g1) = ({g1}00, {g1}0)

γ(g1) = ({g1}, {m2, m3, m5})

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 32 / 118 A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice The attribute concept

The name of the attribute m is located to the “upper half” of the corresponding attribute concept µ(m) = ({m}0, {m}00). Correspondingly, the attribute concept of an attribute m ∈ M is the concept ({m}0, {m}00) where {m}0 is the attribute extent {g ∈ G/(g, m) ∈ I} of m. The attribute concept of m, denoted by µ(m) is the largest concept (for the lattice order) with m in its intent.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 33 / 118 A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice

Examples of attribute concepts: µ(m1) = ({m1}0, {m1}00) µ(m1) = ({g2, g3, g4, g5, g6}, {m1, m4}) µ(m1) = µ(m4) µ(m2) = ({m2}0, {m2}00) µ(m2)({g1, g3, g5}, {m2, m3})

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 34 / 118 The reduced labeling

A reduced labeling may be used allowing that each object and each attribute is entered only once in a diagram. Reduced labeling: intuitively, the attributes are “at the highest” and the objects are “at the lowest”. The reduced labeling

The extent of concept #7 is composed marta, plus all objects which are in the extents of the descendants of #7, i.e. {regina, castelli} (#3) and then {roma, napoli} (#5). The intent of concept #5 is composed of all attributes which are in the intents of the ascendants of #5, i.e. garlic( #4), oregano( #2), and {tomato, mozza}( #7). The reduced labeling

The intent of concept #1 is composed of funghi and all the attributes which are in the intents of the ascendants of #1, i.e. garlic( #4), oregano( #2). When attribute funghi is present then attribute garlic and attribute oregano, i.e. there are two implications, namely funghi −→ garlic and garlic −→ oregano. The reduced labeling

For any concept (A, B) we have: g ∈ A ⇐⇒ γ(g) ≤ (A, B) When an object g is in the extent of a concept (A, B), the object concept γ(g) is always lower than (A, B) in the concept lattice. The reduced labeling

For any concept (A, B) we have: m ∈ B ⇐⇒ (A, B) ≤ µ(m) When an attribute m is in the intent of a concept (A, B), the attribute concept µ(m) is always higher than (A, B) in the concept lattice. The AOC-Poset or Object-Attribute-Concept Poset

The AOC-poset contains the object-concepts and the attribute-concepts, i.e. the main information for reconstructing the whole concept lattice. A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice Reconstructing a context from a concept lattice

Given a concept lattice, the context associated to the lattice can be read using the following general rule: (g, m) ∈ I ⇐⇒ γ(g) ≤ µ(m) Just as a set of concepts can be uniquely determined from a given context, so the context can be reconstructed from its concepts. The set G is the extent of the largest concept (G, G0). The set M is the intent of the smallest concept (M0, M). The incidence relation is given by: I = S{A × B/(A, B) ∈ B(G, M, I)} where B(G, M, I) denotes the set of all formal concepts of (G, M, I).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 41 / 118 Objects/Attributes funghi garlic oregano tomato mozza marta x x regina x x x castelli x x x roma x x x x napoli x x x x delfia x A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice

Irreducible Elements in a Lattice

Xavier Dolques, Florence Le Ber, Marianne Huchard. AOC-Posets: a Scalable Alternative to Concept Lattices for Relational Concept Analysis, in Proceedings of the Tenth International Conference on Concept Lattices and Their Applications, CEUR Workshop Proceedings 1062, pages 129–140, 2013. Anne Berry, Alain Gutierrez, Marianne Huchard, Amedeo Napoli and Alain Sigayret. Hermes: a simple and efficient algorithm for building the AOC-poset of a binary relation, Annals of Mathematics and Artificial Intelligence, 72(1-2):45–71, 2014.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 43 / 118 A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice The Join-Irreducible Elements in a Lattice

Let L be a lattice. An element x ∈ L is join-irreducible or ∨-irreducible is a ≤ x and b ≤ x imply a ∨ b < x. In other words, whenever x = a ∨ b then x = a or x = b: x cannot be the join of two different (lower) elements. x ∈ L is ∨-irreducible iff x has exactly one lower cover (immediate descendant in the lattice). J(L) denotes the set of the ∨-irreducible elements of L.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 44 / 118 A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice The Meet-Irreducible Elements in a Lattice

An element x ∈ L is meet-irreducible or ∧-irreducible if a ≤ x and b ≤ x imply a ∧ b ≤ x. In other words, whenever x = a ∧ b then x = a or x = b: x cannot be the meet of two different (larger) elements. x ∈ L is ∧-irreducible iff x has exactly one upper cover (immediate ascendant in the lattice). M(L) denotes the set of the ∧-irreducible elements of L.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 45 / 118 A Smooth Introduction to Formal Concept Analysis The structure of the concept lattice Atoms, co-atoms, and concept lattice

The upper neighbors of ⊥ in a lattice are called the atoms. The atoms are always ∨-irreducible elements in a lattice. The lower neighbors of > in a lattice are called the co-atoms. The co-atoms are always ∧-irreducible elements in a lattice. It can be shown that any complete lattice L is isomorphic to to the concept lattice C(J(L), M(L), ≤).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 46 / 118 An Example of Construction

Docs/Tops t1 t2 t3 t4 t5 t6 t7 d1 x x x x d2 x x x x d3 x x x d4 x x x x d5 x x x

The binary table K describes a set of five documents D = {d1, d2, d3, d4, d5} annotated by a set of seven topics T = {t1, t2, t3, t4, t5, t6, t7}. Firstly, we will draw the associated concept lattice L, and then we will make explicit the implications in L. Secondly, we will make explicit the ∨-irreducible and ∧-irreducible elements, and build the context (J(L), M(L), ≤). The DocsTops Concept Lattice. The DocsTops AOC-poset. The DocsTops AOC-poset.

M(L) = {#0, #3, #4, #5, #9, #10} J(L) = {#1, #7, #9, #11, #12} The DocsTops Reconstructed Context

Docs/Tops #0 #3 #4 #5 #9 #10 #1 x x x #7 x x x x #9 x x x #11 x x x #12 x x x Original and Reconstructed DocsTops Concept Lattices 1 Introduction

2 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice The structure of the concept lattice

3 Algorithms for discovering the concepts The Chein-Malgrange algorithm The Ganter algorithm

4 Pattern Structures

5 Functional Dependencies

6 Complements and References Algorithms for Discovering the Concepts There are many algorithms for discovering the concepts, some of them are also building the concept lattice at the same time. Chein - Malgrange: A simple algorithm for discovering formal concepts. Next-Closure: the algorithm designed by Bernhard Ganter which is explained below, one of the most important algorithms. Bordat: an algorithm for discovering formal concepts and building the concept lattice. Close By one (CbO): an algorithm close to NextClosure and designed by Sergei Kuznetsov. Norris algorithm, AddIntent (Sergei Obiedkov), Galois (Claudio Carpineto and Giovanni Romano), Nourine algorithm, Godin algorithm,... A reference paper: Sergei O. Kuznetsov and Sergei A. Obiedkov, Comparing performance of algorithms for generating concept lattices. Journal of Experimental and Theoretical Artificial Inteligence, 14:189–216, 2002. Algorithms for discovering the concepts The Chein-Malgrange algorithm An algorithm for computing the formal concepts

A rectangle in a binary table corresponds to a pair (X, Y) where X denotes a set of objects or an extension and Y denotes a set of attributes, or an . The size of a rectangle (X, Y) is the cardinality |X| of the extension X. The extension X and the intension Y are not necessarily an extent and an intent respectively.

A rectangle (X, Y) is contained in another rectangle (X1, Y1) whenever X ⊆ X1 and Y ⊆ Y1. A rectangle (X, Y) is maximal when it is not included in any other rectangle: any rectangle (X1, Y1) containing a maximal rectangle (X, Y) is such that X1 and/or Y1 contain at least a “void place”, i.e. a cell without a crossx.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 55 / 118 Algorithms for discovering the concepts The Chein-Malgrange algorithm

1. L1 := {(g, g0),/g ∈ G} (set of rectangles of size 1) 2. k := 1 3. While |Lk| > 1 do 3.1 Lk+1 := ∅ 3.2 For each {(A, B), (C, D)} ⊆ Lk do 3.2.1 F := B ∩ D 3.2.2 If there is E ⊆ G such that (E, F) ∈ Lk+1 3.2.2.1 then E := E ∪ A ∪ C 3.2.2.2 else Lk+1 := Lk+1 ∪ {(A ∪ C, F)} endif 3.2.3 If F = B then mark (A, B) in Lk endif 3.2.4 If F = D then mark (C, D) in Lk endif 3.3 Lk = Lk \{(V, W)/(V, W) is marked in Lk} 3.4 k := k + 1 endwhile S k 4. k L is the concept set

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 56 / 118 Algorithms for discovering the concepts The Chein-Malgrange algorithm An example of construction of a concept lattice

G/M a b c d e 1 x x x 2 x x x 3 x x x x 4 x x 5 x x x x 6 x x x

For a better readability we set: G = {1, 2, 3, 4, 5, 6} M = {a, b, c, d, e}

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 57 / 118 The rectangles of size 1: L1 = {(1, bce), (2, acd), (3, abcd), (4, ad), (5, abcd), (6, acd)} For simplifying, the rectangles are ordered in increasing intent size and “duplicates” or rectangles with the same intent are fused. L1 = {(4, ad), (26, acd), (1, bce), (35, abcd)} Build the rectangles of higher size by “union” of rectangles in L1 (empty intents are ignored): L2 = ∅ (4, ad) + (26, acd) −→ (246, ad) ; L2 = {(246, ad)} ; (4, ad) is marked in L1 and no more used. (26, acd) + (1, bce) −→ (126, c) ; L2 = {(126, c), (246, ad)} (26, acd) + (35, abcd) −→ (2356, acd) ; L2 = {(126, c), (246, ad), (2356, acd)} ; (26, acd) is marked in L1 and no more used. (1, bce) + (35, abcd) −→ (135, bc) ; L2 = {(126, c), (246, ad), (135, bc), (2356, acd)} Algorithms for discovering the concepts The Chein-Malgrange algorithm

L1 = {(1, bce), (35, abcd)} L2 = {(126, c), (246, ad), (135, bc), (2356, acd)} ; Build the rectangles of higher size by “union” of rectangles in L2 (empty intents are ignored): L3 = ∅ (126, c) + (135, bc) −→ (12356, c) ; L3 = {(12356, c)} ; (126, c) is marked in L2 and no more used. (126, c) + (2356, acd) −→ (12356, c) ; nothing to do. (246, ad) + (2356, acd) −→ (23456, ad) ; L3 = {(12356, c), (23456, ad)} ; (246, ad) is marked in L2 and no more used. L3 cannot be processed further.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 60 / 118 Algorithms for discovering the concepts The Chein-Malgrange algorithm

The set of maximal rectangles: L1 = {(1, bce), (35, abcd)} L2 = {(135, bc), (2356, acd)} ; L3 = {(12356, c), (23456, ad)} ;

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 61 / 118 Algorithms for discovering the concepts The Ganter algorithm

The Ganter algorithm for finding formal concepts

Bernhard Ganter and Sergei Obiedkov. Conceptual Exploration. Springer, 2016. Bernhard Ganter. Two Basic Algorithms in Concept Analysis. Preprint 831, Fachbereich Mathematik, Technische Hochschule Darmstadt, Juni 1984.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 62 / 118 Algorithms for discovering the concepts The Ganter algorithm Representing sets by bit vectors

We start by giving to the set of attributes M an arbitrary linear order, M = {m1 < m2 < ... < mn}. Every subset S ⊆ M can as usual be described by a row of crosses and blanks. For example, if M = {a < b < c < d < e < f < g}, the subset S := {a, c, d, f} will be written as: x x x x

Using this notation it is easy to check when a given set is a subset of another given set, and some other tests.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 63 / 118 Algorithms for discovering the concepts The Ganter algorithm The lectic order

We introduce a linear order over the of M, called the lectic order, denoted by ≤ –or < when eqaulity is not true– and defined as follows: Let A, B ⊆ M be two distinct subsets. We say that A is lectically smaller than B, if the smallest element in which A and B differ belongs to B. More formally, A < B :⇐⇒ ∃i(i ∈ B, i 6∈ A, ∀j < i(j ∈ A ⇐⇒ j ∈ B)). For example {a, c, e, f} < {a, c, d, f}, because the smallest element in which the two sets differ is d, and this element belongs to the larger set.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 64 / 118 Algorithms for discovering the concepts The Ganter algorithm Understanding the lectic order

The lectic order corresponds to the lexicographic order over the bit vectors related to the subets: {a, c, e, f} < {a, c, d, f} is expressed as 101011 < 101101 or: x x x x

x x x x The lectic order extends the subset-order, i.e., A ⊆ B −→ A ≤ B, but the converse is not true.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 65 / 118 Algorithms for discovering the concepts The Ganter algorithm Understanding the lectic order

Notation: A

In other words, A

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 66 / 118 Algorithms for discovering the concepts The Ganter algorithm Finding all closed sets

We consider the closure operator: A −→ A00 on the set of attributes M. This operator assigns to each subset A ⊆ M its closure A00 ⊆ M. One main task is to discover the list of all closures or closed sets (as we already discussed in the preceding section). A naive algorithm could be: “for each A ⊆ M check if A is closed”. But such an algorithm would require an exponential number of lookups in a list that may have exponential size!

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 67 / 118 Algorithms for discovering the concepts The Ganter algorithm Finding all closed sets

A better idea is to generate the closures in lectic order. Given a “starting closed set”, we will compute the lectically next closed set. Then no lookup in the list of found solutions is necessary (and it can be not even necessary to store the resulting list). For many applications it can be sufficient to generate the list elements on demand. Therefore we do not have to store exponentially many closed sets, but instead just one!

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 68 / 118 Algorithms for discovering the concepts The Ganter algorithm The ⊕-construction

To find the next closure of a given closed set, we define the following operation for A ⊆ M and mi ⊆ M: 00 A ⊕ mi := ((A ∩ {m1, ..., mi−1}) ∪ {mi})

For example, let A := {a, c, d, f} and mi := e.

We first remove all elements in A which are greater than or equal to mi and then we insert mi: x x x x

x x x

x x x x And we compute the closure of {a, c, d, e}.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 69 / 118 Algorithms for discovering the concepts The Ganter algorithm Computing the next closed set

Proposition.

1. If i 6∈ A then A < A ⊕ i

2. If B is closed and A

3. If B is closed and A

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 70 / 118 Algorithms for discovering the concepts The Ganter algorithm The next closed set

Using the proposition, the following theorem allows to characterize the next closed set: Theorem. The smallest closed set larger than a given set A ⊂ M with respect to the lectic order is A ⊕ i,

i being the largest element of M with A

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 71 / 118 Algorithms for discovering the concepts The Ganter algorithm Turning the theorem into an algorithm

Algorithm All Closures Input: A closure operator X −→ X00 on a finite set M. Output: All closed sets in lectic order.

begin First Closure; repeat Output A; Next Closure; until not success; end.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 72 / 118 Algorithms for discovering the concepts The Ganter algorithm Finding the first closed set

Algorithm First Closure Input: A closure operator X −→ X00 on a finite set M. Output: The closure A of the empty set.

begin A := ∅00; end.

Note: the closure of the empty set ∅00 is not empty when there exists an attribute which is in the description of all objects.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 73 / 118 Algorithms for discovering the concepts The Ganter algorithm Finding all other closed sets: Next-Closure Algorithm

Algorithm Next Closure Input: A closure operator X −→ X00 on a finite set M and a subset A ⊆ M. Output: A is replaced by the lectically next closed set.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 74 / 118 Algorithms for discovering the concepts The Ganter algorithm Finding all other closed sets: Next-Closure Algorithm

begin i := largest element of M; i := succ(i); success := false; repeat until success or i = smallest element of M. i := pred(i); if i 6∈ A then A := A ∪ {i}; B := A”; if B \ A contains no element < i then A:= B; success := true; else A := A \ {i}; end.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 75 / 118 Algorithms for discovering the concepts The Ganter algorithm Finding Formal Concepts

For any formal context (G, M, I), there are two closure systems: the set of extents is a closure system on G and as well the set of intents is a closure system on M. Therefore we can apply the Next closure algorithm to compute all extents or all intents, and thus all formal concepts of a formal context. Sometimes, the “Next closure” algorithm is called “Next extent” When applied to extents and “Next intent” when applied to intents.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 76 / 118 An example

G/M a b c d e 1 x x x 2 x x x 3 x x x x 4 x x 5 x x x x 6 x x x

1 Introduction

2 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice The structure of the concept lattice

3 Algorithms for discovering the concepts The Chein-Malgrange algorithm The Ganter algorithm

4 Pattern Structures

5 Functional Dependencies

6 Complements and References Pattern Structures

Scaling

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 80 / 118 Pattern Structures Conceptual scaling

The formal context is the basic data type of Formal Concept Analysis. However data are often given in form of a many-valued context. Many-valued contexts are translated to one-valued context via conceptual scaling. But this is not automatic and some arbitrary choices have to be made. Examples of scalings: Nominal: K = (N, N, =) Ordinal: K = (N, N, ≤) Interordinal: K = (N, N, ≤ ∪ ≥)

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 81 / 118 Pattern Structures A numerical example

G/M m1 m2 m3 g1 1 3 4 g2 2 2 3 g3 4 1 1 g4 2 2 1

Nominal Scaling:

G/M m1=1 m1=2 m1=4 m2=1 m2=2 m2=3 m3=1 m3=3 m3=4 g1 x x x g2 x x x g3 x x x g4 x x x

Interordinal Scaling:

G/M m1.lt.1 m1.gt.1 m1.lt.2 m1.gt.2 m1.lt.3 m1.gt.3 m1.lt.4 m1.gt.4 m2.lt.1 m2.gt.1 g1 x x x x x x g2 x x x x x x g3 x x x x x x x g4 x x x x x x

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 82 / 118 Pattern Structures

Pattern Structures

Bernhard Ganter and Sergei O. Kuznetsov. Pattern Structures and Their Projections, in Proceedings of the 9th International Conference on Conceptual Structures (ICCS-2001), LNCS 2120, pages 129–142, 2001. Mehdi Kaytoue, Sergei O. Kuznetsov, Amedeo Napoli and Sébastien Duplessis. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis, Information Science, 181(10):1989–2001, 2011. Mehdi Kaytoue, Victor Codocedo, Aleksey Buzmakov, Jaume Baixeries, Sergei O. Kuznetsov, and Amedeo Napoli. Pattern Structures and Concept Lattices for Data Mining and Knowledge Processing. Proceedings of the European Conference on and Knowledge Discovery in Databases (ECML PKDD), Lecture Notes in Computer Science 9286, pages 227–231, 2015.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 83 / 118 Pattern Structures Computing similarity between descriptions

Intersection considered as a similarity operator:

∩ behaves like a similarity operator: {m1, m2} ∩ {m1, m3} = {m1} m1 m2 m3

g1 × × g2 × × g3 × × g4 × × g5 × × × ∩ induces a partial ordering relation ⊆ as follows:

S1 ∩ S2 = S1 ⇐⇒ S1 ⊆ S2 {m1} ∩ {m1, m2} = {m1} ⇐⇒ {m1} ⊆ {m1, m2} ∩ has the properties of a meet u in a semi lattice, i.e. a commutative, associative and idempotent operation:

c u d = c ⇐⇒ c v d

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 84 / 118 Pattern Structures The definition of a Pattern Structure

A pattern structure (G, (D, u), δ) is composed of:

G a set of objects, (D, u) a semi-lattice of descriptions or patterns, δ a mapping such as δ(g) ∈ D describes object g.

The Galois connection for (G, (D, u), δ) is defined as:

The maximal description representing the similarity of a set of objects:

A = ug∈Aδ(g) for A ⊆ G

The maximal set of objects sharing a given description:

d  = {g ∈ G|d v δ(g)} for d ∈ (D, u)

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 85 / 118 Pattern Structures Standard FCA as a Pattern Structure (G, (D, u), δ)

Considering a standard formal context (G, M, I ): G is the set of objects, (D, u) corresponds to ℘(M) where M is the set of attributes. δ(g) corresponds to the description of g in terms of attributes.

m1 m2 m3

g1 × × The Galois connection: g2 × × g3 × × g4 × × g5 × × ×

A = ug∈Aδ(g) for A ⊆ G 0 0 0 {g1, g2} = g1 ∩ g2 = {m1, m2} ∩ {m1, m3} = {m1} d = {g ∈ G|d v δ(g)} for d ∈ (D, u) 0 0 {m1} = {gi ∈ G|{m1} ⊆ gi } = {g1, g2, g5}

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 86 / 118 Pattern Structures From FCA to Pattern Structures

A formal context (G, M, I ) is based on a A pattern structure (G, (D, u), δ) is set of objects G, a set of attributes M, based on a set of objects G, a meet and a binary relation I ⊆ G × M. semi-lattice of object descriptions Two derivation operators are defined as (D, u), and a mapping δ : G −→ D follows, ∀A ⊆ G, B ⊆ M: which associates a description to each object. Two derivation operators are defined as A0 = {m ∈ M|∀g ∈ A, (g, m) ∈ I } follows, ∀A ⊆ G, d ∈ (D, u):

A = ug∈Aδ(g) B0 = {g ∈ G|∀m ∈ B, (g, m) ∈ I } d = {g ∈ G|d v δ(g) A formal concept (A, B) verifies A0 = B 0 and A = B . A formal concept (A, d) verifies A = d Formal concepts are partially ordered and A = d w.r.t. inclusion of extents (or dually of Pattern concepts are partially ordered intents): w.r.t. inclusion of extents (or dually inclusion of intents): (A1, B1) ≤ (A2, B2) iff A1 ⊆ A2 (A1, d1) ≤ (A2, d2) iff A1 ⊆ A2

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 87 / 118 Pattern Structures Interval Pattern Structure

Let D be a set of intervals with integer bounds (for simplicity), let u be a meet operator defined on D as the convex hull of intervals:

[a1, b1] u [a2, b2] = [min(a1, a2), max(b1, b2)] [4, 5] u [5, 5] = [4, 5]

[a1, b1] v [a2, b2] ⇐⇒ [a2, b2] ⊆ [a1, b1] [4, 5] v [5, 5] ⇐⇒ [5, 5] ⊆ [4, 5]

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 88 / 118 Pattern Structures Interval Pattern Structures for classifying a numerical context

m1 m2 m3

g1 5 7 6 g2 6 8 4 g3 4 8 5 g4 4 9 8 g5 5 8 5

 {g1, g2} = ug∈{g1,g2}δ(g) = h5, 7, 6i u h6, 8, 4i = h[5, 6], [7, 8], [4, 6]i

h[5, 6], [7, 8], [4, 6]i = {g ∈ G|h[5, 6], [7, 8], [4, 6]i v δ(g)}

= {g1, g2, g5}

({g1, g2, g5}, h[5, 6], [7, 8], [4, 6]i) is a pattern concept

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 89 / 118 Interval pattern concept lattice

Highest concepts: largest extents and smallest intents (but the largest intervals), Lowest concepts: smallest extents and largest intents (but the smallest intervals), Problem: efficient pattern mining. Pattern Structures Some applications of pattern structures

Text mining with tree-based pattern structures. Artuur Leeuwenberg, Aleksey Buzmakov, Yannick Toussaint and Amedeo Napoli Exploring Pattern Structures of Syntactic Trees for Relation Extraction, in ICFCA 2015, LNAI 9113, 2015. Mining sequential data for analyzing patient trajectories (with selection of interesting concepts using stability measure). Aleksey Buzmakov, Elias Egho, Nicolas Jay, Sergei O. Kuznetsov, Amedeo Napoli and Chedy Raïssi. On Mining Complex Sequential Data by Means of FCA and Pattern Structures, in International Journal of General Systems (To be published), 2015. Information Retrieval and Recommendation. Victor Codocedo, Ioanna Lykourentzou and Amedeo Napoli. A semantic approach to concept lattice-based information retrieval, in Annals of Mathematics and Artificial Intelligence, 72:169–195, 2014. Discovery of Functional Dependencies. Jaume Baixeries, Amedeo Napoli and Mehdi Kaytoue. Characterizing functional dependencies in formal concept analysis with pattern structures, Annals of Mathematics and Artificial Intelligence, 72:129-149, 2014. and Triadic Analysis. Mehdi Kaytoue, Sergei O. Kuznetsov, Juraj Macko and Amedeo Napoli. Biclustering meets triadic concept analysis, Annals of Mathematics and Artificial Intelligence, 70(1-2):55-79, 2014. Victor Codocedo and Amedeo Napoli. Lattice-based biclustering using Partition Pattern Structures, in Proceedings of ECAI 2014, IOS Press, pages 213-218, 2014.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 91 / 118 h[4, 5], [5, 6]i = {g1, g3, g5} h[4, 5], [4, 7]i = {g1, g3, g5} h[4, 5], [4, 6]i = {g1, g3, g5} h[4, 6], [5, 6]i = {g1, g3, g5} h[4, 5], [5, 7]i = {g1, g3, g5} h[4, 6], [5, 7]i = {g1, g3, g5}

Pattern Structures for interval patterns

Interval patterns as (hyper) rectangles

m3 m1 m3

g 5 6 δ(g4) b 1 8 g2 6 4 g3 4 5 7 g4 4 8 g5 5 5 δ(g1) 6 b

δ(g3) 5 b b δ(g5)

4 b δ(g2)

3 m1 3 4 5 6

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 92 / 118 h[4, 5], [4, 7]i = {g1, g3, g5} h[4, 5], [4, 6]i = {g1, g3, g5} h[4, 6], [5, 6]i = {g1, g3, g5} h[4, 5], [5, 7]i = {g1, g3, g5} h[4, 6], [5, 7]i = {g1, g3, g5}

Pattern Structures Semantics for interval patterns

Interval patterns as (hyper) rectangles

m3 m1 m3

g 5 6 δ(g4) b 1 8 g2 6 4 g3 4 5 7 g4 4 8 g5 5 5 δ(g1) 6 b

δ(g3) 5 b b h[4, 5], [5, 6]i = {g1, g3, g5} δ(g5)

4 b δ(g2)

3 m1 3 4 5 6

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 92 / 118 h[4, 5], [4, 6]i = {g1, g3, g5} h[4, 6], [5, 6]i = {g1, g3, g5} h[4, 5], [5, 7]i = {g1, g3, g5} h[4, 6], [5, 7]i = {g1, g3, g5}

Pattern Structures Semantics for interval patterns

Interval patterns as (hyper) rectangles

m3 m1 m3

g 5 6 δ(g4) b 1 8 g2 6 4 g3 4 5 7 g4 4 8 g5 5 5 δ(g1) 6 b

δ(g3) 5 b b h[4, 5], [5, 6]i = {g1, g3, g5} δ(g5)

h[4, 5], [4, 7]i = {g , g , g } b 1 3 5 4 δ(g2)

3 m1 3 4 5 6

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 92 / 118 h[4, 6], [5, 6]i = {g1, g3, g5} h[4, 5], [5, 7]i = {g1, g3, g5} h[4, 6], [5, 7]i = {g1, g3, g5}

Pattern Structures Semantics for interval patterns

Interval patterns as (hyper) rectangles

m3 m1 m3

g 5 6 δ(g4) b 1 8 g2 6 4 g3 4 5 7 g4 4 8 g5 5 5 δ(g1) 6 b

δ(g3) 5 b b h[4, 5], [5, 6]i = {g1, g3, g5} δ(g5)

h[4, 5], [4, 7]i = {g , g , g } b 1 3 5 4 h[4, 5], [4, 6]i = {g1, g3, g5} δ(g2) 3 m1 3 4 5 6

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 92 / 118 h[4, 5], [5, 7]i = {g1, g3, g5} h[4, 6], [5, 7]i = {g1, g3, g5}

Pattern Structures Semantics for interval patterns

Interval patterns as (hyper) rectangles

m3 m1 m3

g 5 6 δ(g4) b 1 8 g2 6 4 g3 4 5 7 g4 4 8 g5 5 5 δ(g1) 6 b

δ(g3) 5 b b h[4, 5], [5, 6]i = {g1, g3, g5} δ(g5)

h[4, 5], [4, 7]i = {g , g , g } b 1 3 5 4 h[4, 5], [4, 6]i = {g1, g3, g5} δ(g2) h[4, 6], [5, 6]i = {g , g , g } 1 3 5 3 m1 3 4 5 6

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 92 / 118 h[4, 6], [5, 7]i = {g1, g3, g5}

Pattern Structures Semantics for interval patterns

Interval patterns as (hyper) rectangles

m3 m1 m3

g 5 6 δ(g4) b 1 8 g2 6 4 g3 4 5 7 g4 4 8 g5 5 5 δ(g1) 6 b

δ(g3) 5 b b h[4, 5], [5, 6]i = {g1, g3, g5} δ(g5)

h[4, 5], [4, 7]i = {g , g , g } b 1 3 5 4 h[4, 5], [4, 6]i = {g1, g3, g5} δ(g2) h[4, 6], [5, 6]i = {g , g , g } 1 3 5 3 m1 h[4, 5], [5, 7]i = {g1, g3, g5} 3 4 5 6

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 92 / 118 Pattern Structures Semantics for interval patterns

Interval patterns as (hyper) rectangles

m3 m1 m3

g 5 6 δ(g4) b 1 8 g2 6 4 g3 4 5 7 g4 4 8 g5 5 5 δ(g1) 6 b

δ(g3) 5 b b h[4, 5], [5, 6]i = {g1, g3, g5} δ(g5)

h[4, 5], [4, 7]i = {g , g , g } b 1 3 5 4 h[4, 5], [4, 6]i = {g1, g3, g5} δ(g2) h[4, 6], [5, 6]i = {g , g , g } 1 3 5 3 m1 h[4, 5], [5, 7]i = {g1, g3, g5} 3 4 5 6 h[4, 6], [5, 7]i = {g1, g3, g5}

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 92 / 118 Pattern Structures A condensed representation

Equivalence classes of interval patterns Two interval patterns with same image are said to be equivalent:

c =∼ d ⇐⇒ c = d

Equivalence class of a pattern d: [d] = {c|c =∼ d} with a unique closed pattern: the smallest rectangle and one or several generators: the largest rectangles. A pattern is closed iff it cannot be included in any other pattern with the same image. Dually, a pattern is a generator iff it does not include any other pattern with the same image.

In the example: 360 patterns ; 18 closed patterns ; 44 generators

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 93 / 118 1 Introduction

2 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice The structure of the concept lattice

3 Algorithms for discovering the concepts The Chein-Malgrange algorithm The Ganter algorithm

4 Pattern Structures

5 Functional Dependencies

6 Complements and References Functional Dependencies

Functional and Approximate Dependencies

Jaume Baixeries, Mehdi Kaytoue and Amedeo Napoli. Characterizing Functional Dependencies in Formal Concept Analysis with Pattern Structures, Annals of Mathematics and Artificial Intelligence, 72:129–149, 2014. Jaume Baixeries, Mehdi Kaytoue and Amedeo Napoli. Computing Functional Dependencies with Pattern Structures. Proceedings of The Ninth International Conference on Concept Lattices and Their Applications (CLA 2012), CEUR Workshop Proceedings 972, pages 175–186, 2012.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 95 / 118 Functional Dependencies Functional Dependencies: Notations and Vocabulary

id a b c d

t1 1 3 4 1 t2 4 3 4 3 t3 1 8 4 1 t4 4 3 7 3

Let U = {a, b, c, d} be a set of attributes and Dom = {1, 3, 4, 7, 8} be a set of numerical values. A tuple t is a function t : U 7→ Dom and a data table T is a set of tuples. T is represented as a matrix where is T = {t1, t2, t3, t4}. Given a tuple t ∈ T and X = {x1, x2,..., xn} ⊆ U, we have:

t(X ) = ht(x1), t(x2),..., t(xn)i

t2({a, c}) = ht2(a), t2(c)i = h4, 4i

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 96 / 118 Functional Dependencies Functional Dependencies: Definition

id a b c d

t1 1 3 4 1 t2 4 3 4 3 t3 1 8 4 1 t4 4 3 7 3

Let T be a data table (i.e. a set of tuples), U a set of attributes and X , Y ⊆ U. A functional dependency (FD) X → Y holds in T if:

∀ti , tj ∈ T : ti (X ) = tj (X ) ⇒ ti (Y ) = tj (Y )

For example, the functional dependencies a → d and d → a hold whereas a → c does not hold since t2(a) = t4(a) but t2(c) 6= t4(c).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 97 / 118 Functional Dependencies Functional Dependencies: an FCA characterization

Let us consider a table T with attributes in U taking values in Dom.

Let us build the formal context K = (B2(G), M, I ), where: G = T and M = U,

B2(G) = {(ti , tj ) | i < j and ti , tj ∈ T } is an ordered set of pairs of tuples from G, the incidence relation I is defined as:

(ti , tj ) I m ⇔ ti (m) = tj (m), for m ∈ M

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 98 / 118 Functional Dependencies From a Data Table to a Context

B2(G) = {(ti , tj ) | i < j and ti , tj ∈ T }

(ti , tj ) I m ⇔ ti (m) = tj (m) K a b c d id a b c d (t1, t2) × × t1 1 3 4 1 (t1, t3) × × × t2 4 3 4 3 (t1, t4) × t3 1 8 4 1 (t2, t3) × t4 4 3 7 3 (t2, t4) × × × (t3, t4)

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 99 / 118 Functional Dependencies Functional Dependencies as Implications

K a b c d

(t1, t2) × × (t1, t3) × × × (t1, t4) × (t2, t3) × (t2, t4) × × × (t3, t4)

A functional dependency X → Y holds in a table T if and only if 00 00 {X } = {X , Y } in the formal context K = (B2(G), M, I ), i.e. X → Y is an implication (Y = X 00 \ X ). ac → d holds since {a, c}00 = {a, c, d} and {a, c, d}00 = {a, c, d}, ab → d holds since {a, b}00 = {a, b, d} and {a, b, d}00 = {a, b, d}, a → bd does not hold since {a}00 = {a, d} and {a, b, d}00 = {a, b, d}.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 100 / 118 Functional Dependencies Functional Dependencies as Partitions

id a b c d

t1 1 3 4 1 t2 4 3 4 3 t3 1 8 4 1 t4 4 3 7 3

Let X ⊆ U be a set of attributes in a table T . Two tuples ti and tj in T are equivalent w.r.t. X when:

ti ∼ tj ⇐⇒ ti (X ) = tj (X )

i.e. the tuples are “in agreement” w.r.t. a given subset of attributes X . Then, the partition of T induced by X is a set of equivalence classes:

ΠX (T ) = {c1, c2,..., cm}

For example, we have that Πa(T ) = {{t1, t3}, {t2, t4}}

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 101 / 118 Functional Dependencies Functional Dependencies as Partitions

id a b c d

t1 1 3 4 1 t2 4 3 4 3 t3 1 8 4 1 t4 4 3 7 3

Part(T ) is the set of partitions of T and can be ordered thanks to the coarseness relation v between partitions:

0 0 ∀Pi , Pj ∈ Part(T ): Pi v Pj ⇐⇒ ∀c ∈ Pi : ∃c ∈ Pj : c ⊆ c

{{t1}, {t2}, {t3, t4}} v {{t1}, {t2, t3, t4}}. A functional dependency X → Y holds in T if and only if ΠY (T ) v ΠX (T ).

a → d holds since Πd v Πa, with Πa(T ) = {{t1, t3}, {t2, t4}} and Πd (T ) = {{t1, t3}, {t2, t4}} (and as well d → a holds too).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 102 / 118 Functional Dependencies Partitions and Equivalence Relations

A partition over a given set E is a set P ⊆ ℘(E) such that: S pi = E pi ∈P pi ∩ pj = ∅, for any pi , pj ∈ P with i 6= j.

There is a bijection between Part(T ) the sets of partitions and Equiv(T ) the set of equivalence relations of a set. This 1-1-correspondence between Part(T ) and Equiv(T ) is given by (e, e0) ∈ Equiv(T ) iff e and e0 belongs to the same equivalence class of Part(T ). For example, given P = {{1, 2, 3}, {4}}, one has the relation Equiv(P) = {(1, 2), (1, 3), (2, 3), (1, 1), (2, 2), (3, 3), (4, 4)} The set of equivalence relations Equiv(T ) can be ordered by inclusion as well as the set of partitions Part(T ).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 103 / 118 Functional Dependencies Ordering Partitions

A partition P1 is finer than a partition P2 (P2 is coarser than P1), written P1 v P2, if any subset in P1 is a subset of a subset in P2.

{{1, 3}, {2}, {4}} v {{1, 2, 3}, {4}}

Actually, the ordered set of equivalence relations (Equiv(T ), v) of a set T , and as well the ordered set of partitions of T ,(Part(T ), v), form a lattice. In >, all elements are equivalent and there is only one equivalence class. In ⊥, there are no two equivalent elements, i.e. each single element forms an equivalence class and there are |T | classes of equivalence.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 104 / 118 Functional Dependencies The meet and join of two partitions

The meet of two partitions is defined as the coarsest common refinement of two partitions, i.e. it is the intersection of the respective equivalence relations.

{{1, 3}, {2, 4}} u {{1, 2, 3}, {4}} = {{1, 3}, {2}, {4}}

The join of two partitions is defined as the finest common coarsening of two partitions, i.e. it is the transitive closure of the union of the respective equivalence relations.

{{1, 3}, {2}, {4}} t {{1, 2}, {3}, {4}} = {{1, 2, 3}, {4}}

In addition, the property P1 u P2 = P1 ⇐⇒ P1 v P2 holds for meets as well as the dual for joins. Since the set of all partitions over a set forms a lattice (D, u, t), it can be used as a description space of a pattern structure.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 105 / 118 Functional Dependencies A Partition Pattern Structure for Discovering FDs

Given the context K = (G, M, I ) –or the discretization K = (B2(G), M, I )– we build the partition pattern structure (M, (D, u), δ) as follows: M is the set of original attributes. (D, u) is the set of partitions over G provided with the partition intersection operation u. For m ∈ M, the description δ(m) is given by a partition over G such that any two elements g, h of the same class take the same values for the attribute m, i.e. m(g) = m(h).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 106 / 118 Functional Dependencies A Partition Pattern Structure for Discovering FDs

id a b c d m δ(m) ∈ (D, u)

t1 1 3 4 1 a {{t1, t3}, {t2, t4}} t2 4 3 4 3 b {{t1, t2, t4}, {t3}} t3 1 8 4 1 c {{t1, t2, t3}, {t4}} t4 4 3 7 3 d {{t1, t3}, {t2, t4}}

1. The original data, 2. the partition pattern structure, 3. the pattern concept lattice.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 107 / 118 Functional Dependencies A Partition Pattern Structure for Discovering FDs

(B, A) is a pattern concept of the partition pattern structure (M, (D, u), δ) if and only if (A, B) is a formal concept of the formal context (B2(G), M, I ) for all B ⊆ M, A ⊆ B2(G). The pattern concept ({b}, {{1, 2, 4}, {3}}) is equivalent to the formal concept ({(1, 2), (1, 4), (2, 4)}, {b}) A functional dependency X → Y holds in a data table T if and only if: {X } = {X , Y } in the partition pattern structure (M, (D, u), δ).

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 108 / 118 A Partition Pattern Structure for Discovering FDs

The pattern concept ({a, d}, {{1, 3}, {2, 4}}) gives the functional dependencies a → d and d → a as: {a} = {d} = {a, d} = {{1, 3}, {2, 4}}. A Partition Pattern Structure for Discovering FDs

The pattern concept ({a, b, d}, {{1}, {3}, {2, 4}}) gives the functional dependencies ab → d and bd → a as: {ab} = {bd} = {abd} = {{1}, {3}, {2, 4}}, but we do not have ad → b as {ad} 6= {abd}. The pattern concept ({a, c, d}, {{1, 3}, {2}, {4}}) gives the functional dependencies ac → d and cd → a as: {ac} = {cd} = {acd} = {{1, 3}, {2}, {4}} but we do not have ad → c as {ad} 6= {acd}. 1 Introduction

2 A Smooth Introduction to Formal Concept Analysis Derivation operators, formal concepts and concept lattice The structure of the concept lattice

3 Algorithms for discovering the concepts The Chein-Malgrange algorithm The Ganter algorithm

4 Pattern Structures

5 Functional Dependencies

6 Complements and References Complements and References Applications

Some reference papers: Jonas Poelmans, Dmitry I. Ignatov, Sergei O. Kuznetsov, and Guido Dedene, Formal concept analysis in knowledge processing: A survey on applications, Expert Systems with Applications, 40(16):6538–6560, 2013. Jonas Poelmans, Sergei O. Kuznetsov, Dmitry I. Ignatov, and Guido Dedene, Formal concept analysis in knowledge processing: A survey on models and techniques, Expert Systems with Applications, 40(16):6501–623, 2013. Victor Codocedo and Amedeo Napoli, Formal Concept Analysis and Information Retrieval – A Survey, in proceedings of the 13th International Conference on Formal Concept Analysis (ICFCA), Lecture Notes in Computer Science 9113, pages 61–77, Springer, 2015.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 112 / 118 Complements and References

Classification:... Visualization:... Information Retrieval and search: the Credo family by C. Carpineto and G. Romano, P. Eklund and collaborators (especially work on collection in museology), Camelis and related information retrieval systems by S. Ferré. . . Meta-search systems: Search Sleuth (metasearch system), Image Sleuth (search in collections of images), Mail Sleuth (plugin for e-mails). . . Construction of ontologies: G. Stumme, A. Maedche, P. Cimiano, A. Hotho, B. Sertkaya. . . M. Alam, A. Buzmakov, V. Codocedo, A. Napoli (definition discovery in RDF data). . . Linguistic applications, text understanding, semiotic. . . : U. Priss, Jonas Poelmans et al. (analysis of police reports). . .

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 113 / 118 Complements and References

Modeling classes and code analysis: R. Godin, P. Valtchev, M. Huchard, P. Cellier, G. Snelting. . . Recommendation: D. Ignatov, S.O. Kuznetsov. . . Bibsonomy: A. Hotho, G. Stumme (a web-service of social bookmarks. . . ) Discovery of epistemic communities: S.O. Kuznetsov, R. Missaoui, S. Obiedkov with C. Roth, H. Soldano... Access control models: S. Obiedkov... FCA, median graphs, distributive lattices. . . : U. Priss, M. Couceiro, A. Gély, A. Napoli. . . Applications in , , marketing. . . : F. Le Ber, M. Huchard, A. Buzmakov and many others......

The list cannot be exhaustive. . .

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 114 / 118 Complements and References Some Tools for building and visualizing concept lattices

First of all, check the Formal Concept Analysis Homepage: https://upriss.github.io/fca/fca.html as you will find a lot of useful information there. . . Conexp: http://sourceforge.net/projects/conexp Galicia Platform: http://www.iro.umontreal.ca/~galicia/ LatViz system: https://https://latviz.loria.fr/ Toscana platform: http://tockit.sourceforge.net/toscanaj/index.html FCA Tools Bundle: https://fca-tools-bundle.com/

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 115 / 118 Complements and References Main Events

ICFCA: 16th International Conference on Formal Concept Analysis, 29 June - 2 July 2021, Strasbourg https://icfca2021.sciencesconf.org/ CLA: 15th International Conference on Lattices and Applications, Tallinn, Estonia, June 29th - July 1st, 2020, https://cs.ttu.ee/events/cla2020/ CLA homepage: http://cla.inf.upol.cz/ FCA4AI: 8th Workshop “What can FCA do for Artificial Intelligence?”, co-located with ECAI 2020, August 29 2020 (https://fca4ai.hse.ru/2020) ICCS: International Conference on Conceptual Structures (we are there...) IJCRS: International Joint Conference on Rough Sets......

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 116 / 118 Complements and References Main Elements of Bibliography about Concept Lattices

Ordre et classification, Marc Barbut and Bernard Monjardet, Hachette, 1970. Introduction to Formal Concept Analysis, Radim Belohlavek, Palacky University, Olomouc, 2008 http://phoenix.inf.upol.cz/esf/ucebni/formal.pdf Concept Data Analysis: Theory and Applications, Claudio Carpineto and Giovanni Romano, John Wiley & Sons, 2004. Formal Concept Analysis, Bernhard Ganter and Rudolf Wille, Springer, 1999. Applied Lattice Theory: Formal Concept Analysis, Bernhard Ganter and Rudolf Wille, www.math.tu-dresden.de/~ganter/psfiles/concept.ps Formal Concept Analysis, Foundations and Applications, Bernhard Ganter, Gerd Stumme and Rudolf Wille editors, Lecture Notes in Computer Science 3626, Springer, 2005. Conceptual Exploration, Bernhard Ganter and Sergei Obiedkov, Springer, 2016.

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 117 / 118 Complements and References Other Books and Papers

Finite Ordered Sets: Concepts, Results and Uses, Nathalie Caspard, Bruno Leclerc and Bernard Monjardet, Cambridge University Press, 2012. General Lattice Theory, George Grätzer. Birkhäuser Basel, 2003. Introduction to Lattices and Order, B. A. Davey and H. A. Priestley, Cambridge University Press (2nd edition), 2012. Latticial Structures in Data Analysis, Vincent Duquenne, Theoretical Computer Science, 217(2):407–436, 1999. Formal Concept Analysis: From Knowledge Discovery to Knowledge Processing, Sébastien Ferré, Marianne Huchard, Mehdi Kaytoue, Sergei O. Kuznetsov and Amedeo Napoli. in A Guided Tour of Artificial Intelligence (Volume 2, Chapter 13), Springer, 2020. ...

Amedeo Napoli Tutorial on FCA ICCS 2020 – Sep. 18-20 2020 118 / 118