Arxiv:2106.06815V1 [Cs.LG]
Total Page:16
File Type:pdf, Size:1020Kb
Quantifying the Conceptual Error in Dimensionality Reduction Tom Hanika1,2 and Johannes Hirth1,2 0000-0002-4918-6374 0000-0001-9034-0321 1 Knowledge & Data Engineering Group, University of Kassel, Germany 2 Interdisciplinary Research Center for Information System Design University of Kassel, Germany [email protected], [email protected] Abstract Dimension reduction of data sets is a standard problem in the realm of machine learning and knowledge reasoning. They affect pat- terns in and dependencies on data dimensions and ultimately influence any decision-making processes. Therefore, a wide variety of reduction procedures are in use, each pursuing different objectives. A so far not considered criterion is the conceptual continuity of the reduction map- ping, i.e., the preservation of the conceptual structure with respect to the original data set. Based on the notion scale-measure from formal concept analysis we present in this work a) the theoretical foundations to detect and quantify conceptual errors in data scalings; b) an experimental in- vestigation of our approach on eleven data sets that were respectively treated with a variant of non-negative matrix factorization. Keywords: Formal Concept Analysis; Dimension Reduction; Conceptual Mea- surement; Data Scaling 1 Introduction The analysis of large and complex data is presently a challenge for many data driven research fields. This is especially true when using sophisticated analy- sis and learning methods, since their computational complexity usually grows at arXiv:2106.06815v1 [cs.LG] 12 Jun 2021 least superlinearly with the problem size. One aspect of largeness and complexity is the explicit data dimension, e.g., number of features, of a data set. Therefore, a variety of methods have been developed to reduce exactly this data dimension to a computable size, such as principal component analysis, singular value de- composition, or factor analysis [11]. What all these methods have in common is that they are based on the principle of data scaling [8]. A particularly challenging task is to apply Boolean factor analysis (BFA) [15], as the distinct feature values are restricted to either 0 (false) or 1 (true). For ex- ample, given the binary data set matrix K, the application of a BFA yields two Authors are given in alphabetical order. No priority in authorship is implied. binary data matrices S,H of lower dimension, such that S · H approximates K with respect to a previously selected norm k·k. The factor S can be considered as a lower dimensional representation of K, i.e., a scaling of K. The connection be- tween the scaling features of S and the original data features of K is represented by H. The quality of an approximation, and therewith the quality of a scale S, is usually scored through the Frobenius norm of K − SH, or other functions [3, 9], such as the Residual Sum of Squares or, in the binary setting, the Hemming distance. These scoring functions give a good impression of the extent to which the linear operator K is approximated by S ·H, yet, they are incapable to detect the deviation of internal incidence structure of S with respect to K, which we want to call conceptual scaling error. A well defined formalism from Formal Concept Analysis (FCA) [6] to analyze the resulting inconsistencies in (binary) data scaling is scale-measures [5, 7]. In this work, we build up on this notion and introduce a comprehensive framework for quantification of quantifying the conceptual errors of scales. The so introduced mathematical tools are capable of determining how many conceptual errors arise from a particular scaling S and pinpoint which concepts are falsely introduced or lost. For this we overcome the potential exponential computational demands of computing complete conceptual structures by employing previous results on the scale-measures decision problem [8]. We motivate our results with accompanying examples and support our results with an experiment on eleven data sets. 2 Scales and Data FCA Recap In the field of Formal concept analysis (FCA) [6, 14] the task for data scaling, and in particular feature scaling, is considered a fundamental step for data analysis. Hence, data scaling is part of the foundations of FCA [6] and it is frequently investigated within FCA [5, 7]. The basic data structure of FCA is the formal context, see our running example 3 KW Figure 1 (top). That is a triple (G, M, I) with non-empty and finite set G (called objects), finite set M (called attributes) and a binary relation I ⊆ G × M (called incidence). The tuple (g,m) indicates “g has attribute m in K”, when (g,m) ∈ I. Any context S = (H,N,J) with H ⊆ G, N ⊆ M and J = I ∩(H ×N) we call induced sub-context of K, and denote this relation by ≤, i.e., S ≤ K. The incidence relation I ⊆ G × M gives rise to the natural Galois connection between P (G) and P (M), which is the pair4 of operators ·′ : P(G) → P(M), A 7→ A′ = {m ∈ M | ∀a ∈ A: (a,m) ∈ I}, and ·′ : P(M) → P(G), B 7→ B′ = {g ∈ G | ∀b ∈ B : (g,b) ∈ I}, each called derivation. Using these operator, a formal concept is a pair (A, B) ∈P(G) ×P(M) with A′ = B and A = B′, where A and B are called extent and intent, respectively. 3 This restriction is in general not necessary. Since data sets can be considered finite throughout this work and since this assumption allows a clearer representation of the results within this work, it was made. 4 Both operators are traditionally denoted by the same symbol ·′ monocotyledon dicotyledon K has limbs breast feeds needs needs water lives on lives in can move W (L) (BF) chlorophyll (Ch) to live (W) land (LL) water (LW) (M) (MC) (DC) dog ×× ×× × fish leech × ×× corn ××× × bream × × ×× water weeds ×× × × bean × × × × frog × × × × × reed × ××× × W LL MLW Ch L MC fish leech DC BF ww corn dog bream bean frog reed Figure 1. This formal context of the Living Beings and Water data set (top) and its concept lattice (bottom). The two possible compositions of the derivation operators lead to two closure operators ·′′ : P(G) →P(G) and ·′′ : P(M) →P(M), and in turn to two closure spaces Ext(K) := (G,′′ ) and Int(K) := (M,′′ ). Both closure systems are repre- sented in the (concept) lattice B(K) = (B(K), ≤), where the set of all formal concepts is denoted by B(K) := {(A, B) ∈ P(G) ×P(M) | A′ = B ∧ B′ = A} the order relation is (A, B) ≤ (C,D) :⇔ A ⊆ C. An example drawing of such a lattice is depicted in Figure 1 (bottom). 2.1 Scales-Measures The basis for quantifying the conceptual error of a data scaling is the continuity of the scaling map with respect to the resulting closure spaces. We say a map map f : G1 → G2 is continuous with respect to the closure spaces (G1,c1) and −1 −1 (G2,c2) if and only if for all A ∈P(G2) we have c1(f (A)) ⊆ f (c2(A)). That is, a map is continuous iff the preimage of closed sets is closed. Within FCA this notion is captured by the following definition. Definition 1 (Scale-Measure (cf. Definition 91, [6])). Let K = (G, M, I) and S = (GS,MS, IS) be a formal contexts. The map σ : G → GS is called an S-measure of K into the scale S iff the preimage σ−1(A) := {g ∈ G | σ(g) ∈ A} of every extent A ∈ Ext(S) is an extent of K. plants land plants water plants S animals land animal water animal mammal W LW := C := M :=LL∧ plant := LW ∧ plant := LL ∧ animal := LW ∧ animal := animal ∧ BF dog × × × × fish leech ×× × × corn ×× × bream ×× × × water weeds ××× × bean ×× × frog ×× × × × reed ××× × × W W LL M Ch plant LW animal LW L land plant MC land animal ⇒ corn, bean water animal water plant fish leech DC mammal fish leech, water breast feeds water weedscorn bean dog bream dog bream weeds frog reed frog reed Figure 2. A scale context (top), its concept lattice (bottom right) for which idG is a scale-measure of the context in Figure 1 and the reflected extents σ−1[Ext(S)] (bottom left) indicated as in gray. The (scaling-)map σ can be understood as an interpretation of the objects from K using the attribute (features) of S. Hence, we denote σ−1[Ext(S)] := −1 S SA∈Ext(S) σ (A) as the set of extents that are reflected by the scale context . In Figure 2 we depicted an example scale-context S, where the attributes of the scale-context are constructed by conjunctions of attributes from KW , as seen in Figure 1. This scaling is based on the original object set G and we observe that S reflects twelve out of the nineteen concepts from B(KW). For any two scale-measures (σ, S), (ψ, T) we say [8] that (σ, S) is finer than (ψ, T), iff ψ−1(T) ⊆ σ−1[Ext(S)]. Dually we say then that (σ, S) is coarser than (ψ, T). From both relation an equivalence relation ∼ arises naturally. The set of all pos- sible scale-measures for some K, denoted by S(K) := {(σ, S) | σ is a S−measure of K}, is therefore ordered. Furthermore, it is known [8] that factorizing S(K) S K by ∼ leads to a lattice ordered structure S(K) = ( ( )/∼, ≤), called the scale- hierarchy of K. This hierarchy is isomorphic to the set of all sub-closure systems of Ext(K), i.e. {Q ⊆ Ext(K) | Q is a Closure System on G}, ordered by set in- clusion ⊆.