Alternating Finite Automata and Star-Free Languages
Total Page:16
File Type:pdf, Size:1020Kb
Theoretical Computer Science 234 (2000) 167–176 www.elsevier.com/locate/tcs View metadata, citation and similar papers at core.ac.uk brought to you by CORE Alternating ÿnite automata and star-free languages provided by Elsevier - Publisher Connector Kai Salomaa ∗;1, Sheng Yu 2 Department of Computer Science, University of Western Ontario, London, Ont., Canada N6A 5B7 Received May 1997; revised November 1997 Communicated by G. Rozenberg Abstract For a given extended regular expression e we construct an equational representation of an alternating ÿnite automaton accepting the language denoted by e. For star-free extended regular expressions the construction yields a loop-free alternating ÿnite automaton. Also the inclusion in the opposite direction holds and, thus, we obtain a new characterization for the class of star-free languages. c 2000 Elsevier Science B.V. All rights reserved Keywords: Finite automata; Alternation; Star-free languages; Language equations 1. Introduction Alternating ÿnite automata (AFA), or Boolean automata, are a natural generaliza- tion of nondeterministic automata which provide a succinct representation for regular languages. This model was ÿrst considered in [4, 3], since then much work has been done on alternating (ÿnite) automata, see e.g., [5, 7–11, 16]. Systems of language equations can be used as a convenient and intuitively clear representation for alternating ÿnite automata [3, 6, 10, 11, 16]. In this paper we present a recursive construction that, for a given extended regular expression e, produces an equational representation Eqr(e) of an AFA that recognizes the language denoted by e. Extended regular expressions allow the use of complementation and intersection in addition to the operations employed by the standard regular expressions. A regular language is said to be star-free [12, 16] if it can be denoted by an extended regular expression without ∗-operators. The star-free languages form the lowest level ∗ Corresponding author. E-mail address: [email protected] (K. Salomaa) 1 Research supported by the Academy of Finland Grant 14018. 2 Research supported by the Natural Sciences and Engineering Research Council of Canada Grant OGP0041630. 0304-3975/00/$ - see front matter c 2000 Elsevier Science B.V. All rights reserved PII: S0304-3975(97)00047-4 168 K. Salomaa, S. Yu / Theoretical Computer Science 234 (2000) 167{176 of the extended star-height hierarchy. It is known that there exist regular languages that are not star-free but it remains an open problem whether there exist languages having extended star-height greater than one [2, 16]. An interesting characterization of star-freeness in terms of the noncounting property was given by Schutzenberger in 1965. A set S ⊆ ∗ is said to be noncounting if there exists an integer n>0 such that for all x; y; z ∈ ∗, xy nz ∈ S if and only if xy n+1z ∈ S. The characterization theorem of Schutzenberger [14, 15] states that a regular language is star-free if and only if it is noncounting. Meyer [13] showed that the family of group-free regular languages consists of exactly the star-free languages. In this paper, we obtain a new characterization for the family of star-free languages in terms of loop-free alternating ÿnite automata. Intuitively, an AFA A is said to be loop-free if there is a total order 6 on the state set of A such that any state p does not depend on any state q where q6p. We establish that if e is a star-free extended regular expression then the corresponding system of equations Eqr(e) represents a loop-free AFA. Also the converse inclusion holds and, thus, the loop-free AFA deÿne exactly the family of star-free languages. 2. Preliminaries Here we brie y recall the deÿnitions of extended regular expressions and alternating ÿnite automata. For a more detailed presentation and examples the reader is referred to [16]. The set of ÿnite words over an alphabet is ∗ and stands for the empty word. For L ⊆ ∗, L is the complement of L with respect to ∗. The cardinality of a ÿnite set A is |A|. We denote the set of non-negative integers by N0. The set of extended regular expressions E over an alphabet is the smallest subset of ( ∪{; ∅; +; ·; ∗; ¬; ∩; (; )})∗ that satisÿes the following conditions: (i) ∪{; ∅} ⊆ E. ∗ (ii) If e1;e2 ∈E then also (e1 + e2); (e1 ∩ e2); (e1 · e2); (¬e1) and (e1 ) belong to E. An expression e ∈ E is star-free if e does not contain any occurrence of the operator star-free ∗. The set of star-free extended regular expressions over is denoted E . The language denoted by an extended regular expression e ∈ E, L(e), is deÿned inductively as follows: (i) L(∅)=∅,L()={}, and L(a)={a}for a ∈ . (ii) Let e1;e2 ∈ . Then L((e1 + e2)) = L(e1) ∪ L(e2), L((e1 ∩ e2)) = L(e1) ∩ L(e2), ∗ ∗ L((e1 · e2)) = L(e1)L(e2), L((e1 )) = L(e1) , and L((¬e1)) = L(e1). We assume that the binary operators have precedence in the order +; ∩; · from lowest to highest and the unary operators are applied before the binary operators. A pair of parentheses may be omitted whenever the omission would not cause confusion. Also, often we do not write the symbol · denoting catenation in regular expressions. Note that the above deÿnition of extended regular expressions is not minimal be- cause the omission of, for instance, the empty set and the intersection operation would K. Salomaa, S. Yu / Theoretical Computer Science 234 (2000) 167{176 169 not change the family of languages deÿned by the (star-free) extended regular expressions. Alternating ÿnite automata, AFA, are an extension of nondeterministic ÿnite au- tomata. Informally, the operation of an AFA can be described as follows. When the automaton reads an input symbol a in a given state q, it will activate all states of the automaton to work on the remaining part of the input in parallel. Once the states have completed their tasks, q will compute its value by applying a Boolean function to the results obtained and pass on the resulting value to the state by which it was activated. A word w is accepted if the starting state computes the value of 1 and it is rejected otherwise. Below we brie y recall the deÿnition of AFA and their equational representation. For more details see [4, 7, 16]. We denote by the symbol B the two-element Boolean algebra B=({0; 1}; ∨; ∧; ¬; 0; 1). Let Q be a ÿnite set. Then BQ is the set of all mappings of Q into B. Note that u ∈ BQ can be considered as a vector of |Q| entries, indexed by elements of Q, with each entry Q being from B. For u ∈ B and q ∈ Q, we write uq to denote the image of q under u. An alternating ÿnite automaton (AFA) is a quintuple A =(Q; ; s; F; g) where Q is the ÿnite set of states, is the input alphabet, s ∈ Q is the starting state, F ⊆ Q is the set of ÿnal states, and g is a function from Q into the set of all functions from × BQ into B. Note that for each state q ∈ Q, g(q) is a function from × BQ into B, which we will denote by gq in the sequel. For each state q ∈ Q and a ∈ , we deÿne gq(a)to be the Boolean function BQ → B such that Q gq(a)(u)=gq(a; u);u∈B: Q Thus, for u ∈ B , the value of gq(a)(u), also gq(a; u), is either 1 or 0. Q Q We deÿne the function gQ : × B → B by putting together the |Q| functions Q Q gq : × B → B, q ∈ Q, as follows. For a ∈ and u; v ∈ B , gQ(a; u)=vif and only if gq(a; u)=vq for each q ∈ Q. For convenience, we will write g(a; u) instead of gQ(a; u) in the following. The characteristic vector of F, f ∈ BQ, is deÿned by the condition fq =1 ⇐⇒ q ∈ F: We extend g to a function of Q into the set of all functions ∗ ×BQ → B as follows: uq if w = ; gq(w; u)= 0 0 0 ∗ gq(a; g(w ;u)) if w = aw with a ∈ and w ∈ ; where w ∈ ∗ and u ∈ BQ. ∗ A word w ∈ is accepted by A if and only if gs(w; f) = 1, where s is the starting state and f is the characteristic vector of F. The language accepted by A is the set ∗ L(A)={w∈ |gs(w; f)=1}: 170 K. Salomaa, S. Yu / Theoretical Computer Science 234 (2000) 167{176 Example 2.1. We deÿne an AFA A =(Q; ; s; F; g) where Q = {q0;q1;q2},={a; b}, s = q0, F = {q2}, and g is given by State a b q0 q1 ∧ q2 0 q1 q2 q1 ∧ q2 q2 ¬(q1) ∧ q2 q1 ∨¬(q2) Let w = aba. Then w is accepted by the AFA A as follows: (aba; f)=g (ba; f) ∧ g (ba; f) gq0 q1 q2 =(gq1(a; f) ∧ gq2 (a; f)) ∧ (gq1 (a; f) ∨¬gq2(a; f)) (; f) ∧ (¬g (; f) ∧ g (; f))) ∧ (g (; f) =(gq2 q1 q2 q2 (; f) ∧ g (; f))) ∨¬(¬gq1 q2 ∧(¬f ∧f )) ∧ (f ∨¬(¬f ∧f )) =(fq2 q1 q2 q2 q1 q2 =(1∧(¬0∧1)) ∧ (1 ∨¬(¬0∧1))=1: Each AFA A =(Q; ; s; F; g) can be represented by a system of language equations of the following form: P Xq = a · gq(a; X )+q;q∈Q; (1) a2 where X is a vector of |Q| variables Xq indexed by the states q ∈ Q, and if q ∈ F; = q 0 otherwise; for each q ∈ Q.