Star Height Via Games

Star height via games Mikołaj Bojanczyk´ (University of Warsaw) Abstract—This paper proposes a new algorithm deciding the I would like to thank Nathanael¨ Fijalkow, Denis Kuperberg, star height problem. As shown by Kirsten, the star height prob- Christof Loding¨ and the anonymous referees for their helpful lem reduces to a problem concerning automata with counters, comments. This work has been supported by the Polish NCN called limitedness. The new contribution is a different algorithm for the limitedness problem, which reduces it to solving a Gale- grant 2014-13/B/ST6/03595. Stewart game with an !-regular winning condition. I. REDUCTION TO LIMITEDNESS OF COST AUTOMATA The star height of a regular expression is its nesting depth This section introduces cost automata, and describes a of the Kleene star. The star height problem is to compute reduction of the star height problem to a problem for cost the star height of a regular language of finite words, i.e. the automata, called limitedness. The exact reduction used in this smallest star height of a regular expression that defines the paper was shown in [Kir05] and is included to make the paper language. For instance, the expression (a∗b∗)∗ has star height self-contained. The idea to use counters and limitedness in two, but its language has star height one, because it is defined the study of star height dates back to the original proof of by the expression (a + b)∗ of star height one, and it cannot Hashiguchi. be defined by a star-free expression. Here we consider regular Cost automata and the limitedness problem: The lim- expressions without complementation, in which case star-free itedness problem concerns an automaton model with coun- regular expressions define only finite languages. ters, which in the literature appears under the following The star height problem is notorious for being one of the names: nested distance desert automaton in [Kir05]; hierar- most technically difficult problems in automata theory. The chical B-automaton in [BC06] and [Col09]; and R-automaton problem was first stated by Eggan in 1963 and remained open in [AKY08]. (The last model is slightly more general, being for 25 years until it was solved by Hashiguchi in [Has88]. equivalent to the non-hierarchical version of B-automata, but Hashiguchi’s solution is very difficult. Much work was later its limitedness problem is no more difficult.) Here, we simply devoted to simplifying the proof, resulting in important new use the name cost automaton. Define a cost automaton to be a ideas like the tropical semiring [Sim88a] or Simon’s Fac- nondeterministic finite automaton, additionally equipped with torisation Forest Theorem [Sim88b]. This work culminated a finite set of counters which store natural numbers. For each in a simplified proof, with an elementary complexity bound, counter c, there is a distinguished subset of transitions that which was given by Kirsten in [Kir05]. Kirsten introduces an increment counter c, and a distinguished subset of transitions automaton model, called a nested distance desert automaton, that reset counter c. The counters are totally ordered, say and shows that the star height problem can be reduced to they are called 0; : : : ; n, and every transition that resets or a decision problem for this automaton model, called the increments a counter c must also reset all counters 0; : : : ; c−1. limitedness problem. The reduction of the star height problem At the beginning of a run, all counters have value zero. to the limitedness problem is not difficult, and most of the Define the value of a run to be the smallest m such that effort in [Kir05] goes into deciding the limitedness problem. all counters have value at most m at all times during the run. Later solutions to the limitedness problem can be found in the In other words, this is the smallest m such that for every work of Colcombet on cost functions [Col09], or in the PhD counter c, there can be at most m transitions in the run that thesis of Torunczyk´ [Tor11]. All of these solutions are based increment counter c and which are not separated by a transition on variants of the limitedness problem. which resets counter c. A cost automaton is said to be limited The contribution of this paper is a conceptually simple over a language L if there exists a bound m such that for algorithm deciding the limitedness problem, and therefore also every word in L there exists a run of the cost automaton the star height problem. The algorithm is a reduction of the which is both accepting (i.e. begins in an initial state and ends limitedness problem to the Church Synthesis Problem, i.e. the in a final state) and has value at most m. The limitedness problem of finding the winner in a Gale-Stewart game with problem is to decide, given a cost automaton and a regular an !-regular winning condition. The reduction is quite simple, language L, whether the automaton is limited over L. The and reduces the star height problem to a problem that has been rest of Section I is devoted to proving the reduction stated in widely studied and is well understood. the following theorem. This is the same reduction that was used in Kirsten’s proof from [Kir05], a detailed proof can be To make the paper self-contained, we show how the star found in Proposition 6.8 of [Kir06]. height problem is reduced to the limitedness problem in Section I, and in Section II we present the new contribution, Theorem 1. The star height problem reduces to the limited- namely the algorithm for limitedness. ness problem. In the reduction we consider a normal form of regu- In the lemma below, we use the following notation: for ∗ lar expressions, which are called string expressions follow- w1; : : : ; wi 2 A and N1;:::;Ni ⊆ M, we write ing [Coh70]. The main idea is that in a string expression, a ∗ ∗ concatenation of unions is converted into a union of concate- w1N1 ··· wiNi ⊆ M nations. String expressions have two parameters, called (star) to be all possible elements of the monoid M that can be height and degree, and are defined by induction on height. A obtained by taking a product α(w1)n1 ··· α(wi)ni where each string expression of height 0 and degree at most m is defined nj is in the submonoid of M generated by Nj. to be any regular expression describing a finite set of words of length at most m. For h ≥ 1, a string expression of height Lemma 1. Let h; m 2 N. For every input word w, condition at most h and degree at most m is defined to be any finite 1 in the statement of the proposition is equivalent to: union of expressions of the form (*) there exists a number i ≤ m and subsets ∗ ∗ ∗ N1;:::;Ni ⊆ M w1f1 w2f2 ··· wifi such that w admits a factorisation where i is at most the degree m, w1; : : : ; wi are words of length bounded by the degree m, and f1; : : : ; fi are string ex- w = w1v1w2v2 ··· wivi pressions of height strictly smaller than h and same degree m. A straightforward induction proof, see Lemma 3.1 in [Coh70], such that the following conditions hold shows that every regular expression can be converted into a w N ∗ ··· w N ∗ ⊆ α(L) (1) string expression without increasing star height. Therefore, a 1 1 i i language has star height at most h if and only if it can be jwjj ≤ m for j 2 f1; : : : ; ig (2) m ∗ defined by a string expression of height at most h and some vj 2 ([Nj]h−1) for j 2 f1; : : : ; ig: (3) degree. The proof of the above lemma is omitted, as it follows rather The main observation in the reduction of the star height simply from the definition of string expressions. To complete problem to the limitedness problem is the following proposi- the proof of the proposition, we will show that there exists a tion, which says that for every star height h and every regular cost automaton, which admits a run of value m 2 if and language L, there is a cost automaton which maps an input N only if condition (*) in the above lemma is satisfied. When word to the smallest degree of a string expression that contains reading an input word w, the automaton proceeds as follows. the input word, has star height at most h, and is contained in L. It uses nondeterminism to guess a factorisation Proposition 1. For every regular language L ⊆ A∗ and h 2 N w = w v w v ··· w v ; one can compute a cost automaton such that for every word 1 1 2 2 i i w 2 A∗ and m 2 N, the following conditions are equivalent: as in the statement of the claim, and it also uses nondetermin- 1) there is a string expression e of height at most h and ism to guess the subsets N1;:::;Ni. The subset Nj is only degree at most m such that w 2 e ⊆ L. stored in the memory of the automaton while processing the 2) the automaton has a run on w with value at most m. block vj. Condition (1) is verified using the finite control of the automaton: when the automaton is processing the block Proof. The proposition is proved by induction on h. The wjvj, its finite memory stores Nj as well as induction base of h = 0 is straightforward: the cost automaton ∗ ∗ has one counter, has an accepting run if and only if the word w1N1 ··· wjNj ⊆ M: belongs to L, and the value of the counter in this run is the The remaining conditions are verified using counters.

Star Height Via Games

Language and Automata Theory and Applications

Some Results on the Generalized Star-Height Problem Jean-Eric Pin, Howard Straubing, Denis Thérien

Some Results on the Generalized Star-Height Problem

Languages of Generalised Star Height 1

Generic Results for Concatenation Hierarchies

INSTITUT FÜR INFORMATIK Finite Automata, Digraph Connectivity

Algorithms for Determining Relative Star Height and Star Height*

From Finite Automata to Regular Expressions and Back—A Summary on Descriptional Complexity

The Star-Height of a Finite Automaton and Some Related Questions

Open Problems About Regular Languages, 35 Years Later Jean-Eric Pin

On the Star-Height of Factor Counting Languages and Their Relationship To

Alternating Finite Automata and Star-Free Languages