FORBIDDEN PATTERNS FOR ORDERED AUTOMATA

Ondřej Klíma(A), Libor Polák(A)

Department of Mathematics and Statistics, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic
{klima,polak}@math.muni.cz

Abstract

This contribution concerns decision procedures in the algebraic theory of regular languages. Among other techniques, various versions of forbidden patterns or configurations in automata are treated in the existing literature. Basically, one looks for certain subgraphs of the minimal automaton of a given language in order to decide whether the language belongs to a given significant class of regular languages. We survey numerous known examples and build a general theory covering most of the familiar ones. The chosen formalism differs from the existing ones, and the generalization to ordered automata enables us to reformulate some of the known examples in a uniform shape. We also describe certain sufficient assumptions on the forbidden pattern which ensure that the corresponding class of languages forms a robust class in the sense of natural closure properties.

1. Introduction

Certain significant classes of regular languages can be characterized by some kind of forbidden patterns, which cannot occur in an automaton recognizing the language. To recall some examples, we mention the results of Cohen, Perrin and Pin [3] concerning the restriction of linear temporal logic obtained by considering only the operators "next" and "eventually". The useful characterization obtained in that paper is that a language L is expressible in this logic, denoted by RTL, if and only if the minimal automaton of L does not contain the pattern from Figure 1. This characterization gave a polynomial time algorithm for testing whether the language recognized by an n-state deterministic automaton is RTL-definable (see Theorem 4.2 and its Corollary 4.3 in [3]). The technique of forbidden patterns was also used by Schmitz et al.
[4, 14, 15] for the first levels of the Straubing-Thérien hierarchy of star-free languages.

This paper focuses on a formal theory of forbidden patterns for deterministic finite automata, for which early formalisms were given in [3, 14]. For the purpose of this paper, the basic notion is a semiautomaton, which is a deterministic automaton without specified initial and final states. A pattern is then an (incomplete) semiautomaton over an auxiliary alphabet X with a marked pair of states. An example is shown in Figure 1, where the auxiliary alphabet is X = {x, y, z} and the marked pair of states is (ℓ, o).

[Figure 1: The forbidden pattern for RTL (states k, ℓ, m, n, o; edges labelled x, y, z; side condition x ≠ y).]

(A) Both authors were supported by the Czech Science Foundation under Grant No. GA15-02862S.

Now, one can consider the class of all regular languages for which there exists a DFA recognizing the language such that the automaton does not contain the pattern with the marked pair realized by two different states. In the existing literature, e.g. in [5, 7, 10], the approach of forbidden patterns is considered in various modifications: one state of the marked pair being final and the other one non-final, considering complete or incomplete automata, considering the minimal automaton or arbitrary automata, etc. We would like to develop a unified theory of forbidden patterns which explains some general behaviour of these patterns and of the classes defined by them, and to compare our formalism with those of the numerous known applications mentioned above.

Compared to the notion of forbidden patterns from [3, 14], where patterns are viewed as subgraphs of the underlying graph of the automaton under consideration, in our formalism we map a pattern into that automaton by a homomorphism of semiautomata.
Moreover, we consider a certain generalization which reflects one of the recent directions of research in the algebraic theory of regular languages, devoted to generalizations of the Eilenberg correspondence. Clearly, not all natural classes of regular languages are varieties. In particular, this is the case for some classes studied in papers where forbidden patterns were applied successfully. Here, we use a combination of three ideas extending the Eilenberg correspondence. Pin's modification [11] to positive varieties of languages (classes need not be closed under complementation) can be combined with Straubing's modification [18] to $\mathcal{C}$-varieties (classes are closed only under preimages in homomorphisms from a fixed category $\mathcal{C}$ of homomorphisms). In the mentioned papers, the corresponding classes of algebraic structures were syntactic ordered monoids and syntactic homomorphisms, respectively. Of course, in our paper we deal with automata, so we need to use the Eilenberg type correspondence between positive $\mathcal{C}$-varieties of regular languages and $\mathcal{C}$-varieties of ordered semiautomata studied by the authors in [8]. The latter notion is a modification of the $\mathcal{C}$-varieties of semiautomata introduced by Chaubard et al. [2] as $\mathcal{C}$-varieties of actions.

In this contribution, we show that every pattern in our formalism satisfying certain assumptions defines a $\mathcal{C}$-variety of ordered semiautomata. Then we explain that many examples of forbidden patterns from the literature are particular instances of our general concept. We also explain how a certain special scheme of patterns working with final states in the minimal automaton of a language can be translated into a pattern for the minimal ordered automaton of the language. Finally, we show how the well-known characterization of languages from level 3/2 of the Straubing-Thérien hierarchy of star-free languages can be expressed naturally using our notion.
Due to space limitations, omitted proofs can be found in the full version of this paper [9].

2. $\mathcal{C}$-Varieties of Ordered Semiautomata

The aim of this section is to give an overview of the necessary minimum concerning the Eilenberg type correspondence between positive $\mathcal{C}$-varieties of languages and $\mathcal{C}$-varieties of ordered semiautomata. This is the framework in which the notion of forbidden patterns is developed. Note that (positive) varieties can be seen as a special case, where $\mathcal{C}$ is the class of all homomorphisms.

We consider only regular languages in this contribution. The quotient of a language $L \subseteq A^*$ by words $u, v \in A^*$ is the set $u^{-1}Lv^{-1} = \{\, w \in A^* \mid uwv \in L \,\}$. In particular, left quotients are $u^{-1}L = \{\, w \in A^* \mid uw \in L \,\}$, $u \in A^*$. The empty word is denoted by $\lambda$.

To recall the definition of positive $\mathcal{C}$-varieties of languages from [18], we first need to explain the role of a category of homomorphisms $\mathcal{C}$. From the point of view of category theory, $\mathcal{C}$ is a category whose objects are the free monoids $A^*$ for non-empty finite alphabets $A$ and whose morphisms are certain monoid homomorphisms among them. We simplify the notation and consider $\mathcal{C}$ as a class of homomorphisms satisfying the following properties:
– For each finite alphabet $A$, the identity mapping $\mathrm{id}_A : A^* \to A^*$ belongs to $\mathcal{C}$.
– If $f : B^* \to A^*$ and $g : C^* \to B^*$ belong to $\mathcal{C}$, then the composition $gf : C^* \to A^*$ belongs to $\mathcal{C}$.

Examples of categories $\mathcal{C}$ which are used in this setting are: $\mathcal{C}_{all}$ consisting of all homomorphisms between free monoids, $\mathcal{C}_i$ consisting of all injective homomorphisms, and $\mathcal{C}_{ne}$ consisting of all non-erasing homomorphisms (here only $\lambda$ is mapped onto $\lambda$). Furthermore, by the preimage in $f : B^* \to A^*$ of a given $L \subseteq A^*$ we mean the set $f^{-1}(L) = \{\, v \in B^* \mid f(v) \in L \,\}$.

A positive $\mathcal{C}$-variety of languages $\mathcal{V}$ associates to every non-empty finite alphabet $A$ a class $\mathcal{V}(A)$ of regular languages over $A$ in such a way that
– $\mathcal{V}(A)$ is closed under quotients, finite unions and intersections and contains $\emptyset$ and $A^*$,
– $\mathcal{V}$ is closed under preimages in morphisms from $\mathcal{C}$.
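The notions of quotient, preimage and non-erasing homomorphism recalled above can be illustrated by a minimal sketch. It assumes, purely for illustration, that a language is a finite set of Python strings and that a homomorphism is given by a dictionary sending each letter to its image word; the function names `apply_hom`, `is_non_erasing`, `in_preimage` and `quotient` are hypothetical and do not come from the paper.

```python
def apply_hom(f, word):
    """Extend a letter-to-word map f (a dict) to a monoid homomorphism on words."""
    return "".join(f[letter] for letter in word)


def is_non_erasing(f):
    """A homomorphism is non-erasing when no letter is sent to the empty word."""
    return all(image != "" for image in f.values())


def in_preimage(f, language, v):
    """Test whether v belongs to f^{-1}(L), i.e. whether f(v) lies in L."""
    return apply_hom(f, v) in language


def quotient(language, u, v):
    """Compute u^{-1} L v^{-1} = { w | uwv in L } for a finite language L."""
    return {
        w[len(u):len(w) - len(v)]
        for w in language
        # The length test rules out words where prefix u and suffix v overlap.
        if len(w) >= len(u) + len(v) and w.startswith(u) and w.endswith(v)
    }


# Illustrative data (assumption): L over {a, b}, f from {c, d}* to {a, b}*.
L = {"ab", "aab", "ba"}
f = {"c": "ab", "d": "a"}

two_sided = quotient(L, "a", "b")   # words w with a.w.b in L
left = quotient(L, "a", "")         # left quotient a^{-1} L
```

Regular languages are infinite in general, so the finite-set representation only serves to make the definitions concrete; a realistic implementation would work on automata instead.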
As already mentioned, if we take $\mathcal{C} = \mathcal{C}_{all}$, we get exactly the notion of positive varieties of languages. When adding the condition "each $\mathcal{V}(A)$ is closed under complements", we get exactly the notion of $\mathcal{C}$-varieties of languages.

To introduce the notion of ordered automata, we first explain that the minimal DFA of a given language is implicitly ordered. Indeed, for a minimal DFA, one can assign to each state $q$ its future $F_q$, consisting of all words that are accepted when $q$ is taken as the initial state. Minimality implies that different states have different futures. Now, if we identify states with their futures, then the relation $\subseteq$ is an order on the minimal automaton (which is compatible with the action of every single letter, as we explain later). We prefer to fix this minimal (ordered) automaton of a given language, under the name canonical (ordered) automaton, using the construction of Brzozowski [1]: the canonical automaton of a regular language $L$ is $\mathcal{D}_L = (D_L, A, \cdot, L, F_L)$, where $D_L = \{\, u^{-1}L \mid u \in A^* \,\}$, $q \cdot a = a^{-1}q$ for each $q \in D_L$, $a \in A$, and $F_L = \{\, q \in D_L \mid \lambda \in q \,\}$. Then the canonical ordered automaton is $\mathcal{O}_L = (D_L, A, \cdot, L, F_L, \subseteq)$.

Before we introduce the notion of an ordered semiautomaton, we recall some basic terminology from the theory of ordered sets. By an order $\le$ on a set $M$ we mean a reflexive, antisymmetric and transitive relation. A subset $X$ of $M$ is called upward closed if, for every pair of elements $x, y \in M$, the conditions $x \le y$ and $x \in X$ imply $y \in X$. A mapping $f : M \to N$ between two ordered sets $(M, \le)$ and $(N, \le)$ is called isotone if $x \le y$ implies $f(x) \le f(y)$ for every pair of elements $x, y \in M$. For example, the action of each letter $a \in A$ in $\mathcal{O}_L$ is an isotone mapping, and the set $F_L$ of all final states is an upward closed subset with respect to $\subseteq$.
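The implicit order on a minimal DFA can be made concrete with a small sketch. It is not the Brzozowski construction itself: it assumes that a complete minimal DFA is already given by a transition dictionary, and it computes the pairs of states whose futures are related by inclusion using a standard fixed-point refinement (a pair is rejected as soon as some word is accepted from the first state but not from the second). The names `future_inclusion`, `is_isotone` and `is_upward_closed` are hypothetical.

```python
from itertools import product


def future_inclusion(states, alphabet, delta, finals):
    """Return all pairs (p, q) with F_p a subset of F_q in a complete DFA."""
    # (p, q) is "bad" when some word is accepted from p but not from q.
    # The empty word witnesses this when p is final and q is not.
    bad = {(p, q) for p, q in product(states, repeat=2)
           if p in finals and q not in finals}
    changed = True
    while changed:
        changed = False
        for p, q in product(states, repeat=2):
            if (p, q) in bad:
                continue
            # If a letter leads to a bad pair, then (p, q) is bad as well.
            if any((delta[p, a], delta[q, a]) in bad for a in alphabet):
                bad.add((p, q))
                changed = True
    return {(p, q) for p, q in product(states, repeat=2) if (p, q) not in bad}


def is_isotone(leq, alphabet, delta):
    """Check that every letter action preserves the relation leq."""
    return all((delta[p, a], delta[q, a]) in leq
               for (p, q) in leq for a in alphabet)


def is_upward_closed(leq, finals):
    """Check that p final and p <= q imply q final."""
    return all(q in finals for (p, q) in leq if p in finals)


# Illustrative minimal DFA (assumption): words over {a, b} containing an 'a'.
states = [0, 1]
alphabet = ["a", "b"]
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}
finals = {1}

leq = future_inclusion(states, alphabet, delta, finals)
```

In this example the future of state 0 is the language itself and the future of state 1 is all of $A^*$, so state 0 lies below state 1. If the input automaton is not minimal, the computed relation is only a preorder, since distinct states may then share the same future.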