Regular matching problems for infinite trees Carlos Camino FMI, Universität Stuttgart, Germany: [email protected] Volker Diekert FMI, Universität Stuttgart, Germany: [email protected] Besik Dundua Kutaisi International University and VIAM, Tbilisi State University: [email protected] Mircea Marin FMI, West University of Timişoara, Romania: [email protected] Géraud Sénizergues LaBRI, Université de Bordeaux, France: [email protected]

Abstract We study the matching problem of regular tree languages, that is, “∃σ : σ(L) ⊆ R?” where L, R are regular tree languages over the union of finite ranked alphabets Σ and X where X is an alphabet of variables and σ is a substitution such that σ(x) is a set of trees in T (Σ ∪ H) \ H for all x ∈ X . Here, H denotes a set of “holes” which are used to define a “sorted” concatenation of trees. Conway studied this problem in the special case for languages of finite words in his classical textbook Regular algebra and finite machines published in 1971. He showed that if L and R are regular, then the problem “∃σ∀x ∈ X : σ(x) 6= ∅ ∧ σ(L) ⊆ R?” is decidable. Moreover, there are only finitely many maximal solutions, the maximal solutions are regular substitutions, and they are effectively computable. We extend Conway’s results when L, R are regular languages of finite and infinite trees, and language substitution is applied inside-out, in the sense of Engelfriet and Schmidt (1977/78). More precisely, we show that if L ⊆ T (Σ ∪ X ) and R ⊆ T (Σ) are regular tree languages over finite or infinite trees,

then the problem “∃σ∀x ∈ X : σ(x) 6= ∅ ∧ σio(L) ⊆ R?” is decidable. Here, the subscript “io” in σio(L) refers to “inside-out”. Moreover, there are only finitely many maximal solutions σ, the maximal solutions are regular substitutions and effectively computable. The corresponding question

for the outside-in extension σoi remains open, even in the restricted setting of finite trees. In order to establish our results we use alternating tree automata with a parity condition and games.

2012 ACM Subject Classification Theory of computation → Formal languages and

Keywords and phrases Regular tree languages, infinite trees, Factorization theory, IO and OI

Preamble

The additional material in the appendix (Sec. 8) is not needed to understand the results in the main body of the paper. arXiv:2004.09926v5 [cs.FL] 14 Sep 2021 1 Introduction

1.1 Historical background Regular matching problems using generalized sequential machines were studied first by Ginsburg and Hibbard. Their publication [13], dating back to 1964, showed that it is decidable whether there is a generalized sequential machine which maps L onto R if L and R are regular languages of finite words. The paper also treats several variants of this problem. For example, the authors notice that the decidability cannot be lifted to context-free languages. Another paper in that area is by Prieur et al. [24]. It appeared in 1997 and studies the problem whether there exists a sequential bijection from a finitely generated free monoid to a given rational set R. Earlier, in the 1960’s Conway studied regular matching problems in the following variant of [13]: Let X , Σ be finite alphabets and L ⊆ (Σ ∪ X )∗, 2 Regular matching problems for infinite trees

∗ ∗ R ⊆ Σ∗. A substitution σ : X → 2Σ is extended to σ :Σ ∪ X → 2Σ by σ(a) = {a} for all a ∈ Σ \X . It is called a solution of the problem “L ⊆ R?” if σ(L) ⊆ R. In his textbook [6, Chapt. 6], Conway developed a factorization theory of formal languages. Thereby he found a nugget in theory: Given as input regular word languages L ⊆ (Σ ∪ X )∗ and R ⊆ Σ∗, it holds:

∗ 1. It is decidable whether there is a substitution σ : X → 2Σ such that σ(L) ⊆ R and ∅= 6 σ(x) for all x ∈ X . 2. Define σ ≤ σ0 by σ(x) ⊆ σ0(x) for all x ∈ X . Then every solution is bounded from above by a maximal solution; and the number of maximal solutions is finite. 3. If σ is maximal, then σ(x) is regular for all x ∈ X ; and all maximal solutions are effectively computable.

The original proof is rather technical and not easy to digest. On the other hand, using the algebraic concept of recognizing morphisms, elegant and simple proofs exist. Regarding the complexity, it turns out that the problem “∃σ : σ(L) ⊆ R?” is Pspace-complete if L and R are given by NFAs by [18, Lemma 3.2.3]. The apparently similar problem “∃σ : σ(L) = R?” is more difficult: Bala showed that it is Expspace-complete [1]. Conway also asked whether the unique maximal solution of the language equation Lx = xL is given by a substitution such that σ(x) is regular1. This question was answered by Kunc in a highly unexpected way: there is a finite set L such that the unique maximal solution σ(x) of Lx = xL is co-recursively-enumerable-complete [19]. A recent survey on language equations is in [20].

1.2 Conway’s result for trees

The present paper generalizes Conway’s result to regular tree languages. We consider finite and infinite trees simultaneously over a finite ranked alphabet ∆. We let T (∆) be the set of all trees with labels in ∆, and by Tfin(∆) we denote its subset of finite trees. More specifically, we consider finite ranked alphabets X of variables and Σ of function symbols. In order to define a notion of a concatenation we also need a set of holes H. These are symbols of rank zero. For simplicity, throughout the set of holes is chosen as H = {1,..., |H|}. We require (Σ ∪ X ) ∩ H = ∅. Trees in T (∆) are rooted and they can be written as terms x(s1, . . . , sr) where r = rk(x) ≥ 0 and si are trees. In particular, all symbols of rank 0 are trees. Words ∗ a1 ··· an ∈ Σ (with ai ∈ Σ) are encoded as terms a1(··· (an($)) ··· ) where the ai are function symbols of rank 1 and the only symbol of rank 0 is $ which signifies “end-of-string”. An ω infinite word a1a2 ... ∈ Σ is encoded as a1(a2(...)) and no hole appears. In contrast to the case of classical term rewriting, substitutions are applied at inner positions, too. In the word ∗ case it is clear what to do. For example, let w = xyx ∈ X with σ(x) = Lx and σ(y) = Ly, then we obtain σ(w) = LxLyLx. Translated to term notation, we obtain w = x(y(x($))), σ(z) = {u(1) | u ∈ Lz} for z ∈ {x, y} with the result σ(w) = LxLyLx($). On the other hand, in the tree case variables of any rank may exists. As a result, variables may appear at inner nodes as well as at leaves. Throughout, if t is any tree, then leafi(t) denotes the set of leaves labeled by the hole i ∈ H, and by ij we denote elements of leafi(t). In the following, a substitution means a mapping σ : X → 2T (Σ∪H)\H such that for all x ∈ X we have σ(x) ⊆ T (Σ∪{1,..., rk(x)}).A homomorphism (resp. partial homomorphism) is a substitution σ such that |σ(x)| = 1 (resp. |σ(x)| ≤ 1) for all x ∈ X . We write σ1 ≤ σ2 if 2 σ1(x) ⊆ σ2(x) for all x ∈ X ; and we say that σ is regular if σ(x) is regular for all x ∈ X .

1 There is a unique maximal solution since the union over all solutions is a solution. 2 There are several equivalent definitions for regular tree languages, e.g. see [5, 25, 22, 23, 27]. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 3

0 0 t = g t [1j ← t] = g 1 f f g t g a 1 a t

Figure 1 The left tree t0 has two positions labeled with hole 1 ∈ H. Holes define a composition of trees. The right tree is obtained by composing t0 with a tree t over the hole 1 which is denoted by 0 t [1j ← t].

Trees are represented graphically, too. For example, g(1, f(g(a, 1))) and g(t, f(g(a, t))) are represented in Fig. 1. We obtain g(t, f(g(a, t))) by replacing the positions labeled by hole 1 in g(1, f(g(a, 1))) by any rooted tree t. As soon as σ is not a partial homomorphism, one has to distinguish between “Inside-Out” (IO for short) and “Outside-In” (OI for short) as advocated and defined in [11, 12]. Our positive decidability results concern IO, only. The IO-definition σio(s) for a given tree s ∈ T (Σ ∪ X ) and a substitution σ has the following interpretation. First, we extend σ to a mapping from Σ ∪ X ∪ H to 2T (Σ∪H) by σ(f) = {f(1,..., rk(f))} for f ∈ Σ \X . Second, we use a term notation s = x(s1, . . . , sr) (for finite and infinite trees s) and we let σio(s) ⊆ T (Σ) to be a certain fixed point of the language equation [ σio(s) = {t[ij ← ti] | t ∈ σ(x) ∧ ∀ 1 ≤ i ≤ r : ti ∈ σio(si)} . (1)

The notation t[ij ← ti] in Eq.(1) means that for all i ∈ H and ij ∈ leafi(t) all leaves ij are replaced by ti where ti depends on i, only. It is a special case of the notation t[ij ← tij ] which says that for all i ∈ H and ij ∈ leafi(t) some tree tij is chosen and then each leaf ij is replaced by tij . This more general notation appears below when defining “Outside-In” substitutions. The idea to compute the elements in σio(s) is a recursive procedure which at each call first selects the tree t ∈ σ(x) (if x is the label of the root of s) and then it makes recursive calls for each hole i which appears as a label in t to compute the elements ti. After that, all positions ij in t which have the label i are replaced by the same tree ti. For finite trees the procedure terminates. For example, g(t, f(g(a, t))) = g(1, f(g(a, 1)))[1j ← t] in Fig. 1 represents an instance of an IO-substitution. For that, we let s = g(x, f(g(a, x))) where x is a variable of rank zero and σ(x) = {t1, t2}. Then σio(s) = {g(t1, f(g(a, t1))), g(t2, f(g(a, t2)))}. For infinite trees, the recursion is not guaranteed to stop. Running it for n recursive calls defines a Cauchy sequence in an appropriate complete metric space T⊥(Σ ∪ X ∪ H) where ⊥ plays the role of undefined if the procedure cannot select a tree because the corresponding set σ(x) happens to be empty. As explained above, the feature of IO is that every leaf ij in t ∈ σ(x) labeled with a hole i is substituted with the same tree ti ∈ σio(si). If we remove this restriction (that is: in Eq.(1) we replace t[ij ← ti] by t[ij ← tij ]), then we obtain the OI-substitution σoi(s) which can be much larger than σio(s). For example, for s = g(x, f(g(a, x))) and σ(x) = {t1, t2} we obtain

σoi(s) = {g(ti, f(g(a, tj))) | i, j ∈ {1, 2}} .

Clearly, σio(s) = σoi(s) if σ is a partial homomorphism. Thus, for a partial homomorphism ϕ we may write ϕ(s) without risking ambiguity. Another situation for σio(s) = σoi(s) is when duplications of holes do not appear. Duplication means that there are some x and t ∈ σ(x) where a hole i ∈ H appears at least twice in t. Duplications cannot appear in the traditional framework of words, but “duplication” is a natural concept for trees. Allowing duplications complicates the situation because it might happen that σio(L) is not regular, although L and σ are regular. Actually, this may happen even if σ is defined by a homomorphism ϕ : X → Tfin(Σ ∪ H). Recall that the notation Tfin(Σ ∪ H) refers to the subset of finite trees in T (Σ ∪ H). Examples of a homomorphism ϕ where ϕ(L) is not regular 4 Regular matching problems for infinite trees

are easy to construct. The classical example is L = {xn($) | n ∈ N} and ϕ(x) = f(1, 1). Then ϕ(L) is not regular. The corresponding trees of height 3 are depicted in Fig. 2. This led to

s3 = x ϕ(x) = f ϕ(s3) = f

x 1 1 f f

x f f f f

$ $ $ $ $ $ $ $ $

Figure 2 L = x∗($) is regular, but ϕ(L) is not.

the HOM-problem. The inputs are a homomorphism ϕ and a regular tree language L. The question is whether ϕ(L) is regular. The problem is decidable in the setting of finite trees by [14]. It is Dexptime-complete by [9]. Most of our work deals with regular tree languages. However, once the results are established for regular sets, we can easily push the results further to some border of decidability. We consider a class C of tree languages such that on input L ∈ C and a regular tree language K, the emptiness problem L ∩ K is decidable. For example, in the word case, the class C can be defined by the class of context-free languages, and then Conway’s result for finite words still holds if L is context-free and R is regular.

1.3 Statement of the main results Our main results can be found in Sec. 6. We are ready to formulate them now. Thm. 29 and can be rephrased as follows. First, the following decision problem is decidable

Input: Regular substitutions σ1, σ2 and tree languages L ⊆ T (Σ ∪ X ), R ⊆ T (Σ) such that R is regular and L ∈ C. Here C is as defined above such that emptiness of the intersection with regular sets is decidable. T (Σ∪H)\H Question: Is there some substitution σ : X → 2 satisfying both, σio(L) ⊆ R and σ1 ≤ σ ≤ σ2?

Second, we can effectively compute the set of maximal substitutions σ satisfying σio(L) ⊆ R and σ1 ≤ σ ≤ σ2. It is a finite set of regular substitutions. Cor. 31 shows that we can effectively compute the set of maximal substitutions σ satisfying σio(L) ⊆ R and σ(x) 6= ∅ for all x ∈ X . It is a finite set of regular substitutions. The requirement σ(x) 6= ∅ is rather natural and it reflects the original setting of Conway.

1.4 Roadmap to prove Theorem 29 Our proof is designed to be accessible for readers who are familiar with basic results about metric spaces and regular languages over trees.3 We don’t rely on (or use) the theory of monads. This is a categorial concept, which leads to a general notion of a syntactic algebra, see the arXiv-paper of Bojańczyk [3] for finite trees.4 For infinite trees notion of a syntactic algebra was given by Blumensath [2].

3 The simpler case of finite and infinite words can be found in Sec. 8.1 as a kind of warm-up exercise. 4 Conway’s result for finite trees follows from the theory of monads, too. Personal communication Mikołaj Bojańczyk, 2019. For the notion of a monad see [21, Chapters III to V]. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 5

In our paper we use nondeterministic finite (top-down) parity-tree automata to define an appropriate congruence of finite index. This is conceptually simple but there is no free lunch: the index of the congruence defined by an automaton is not guaranteed to be the smallest one.5

Having a convenient notion of a congruence, the next step is to define σio(s) for finite and infinite trees such that σio(s) is indeed the intended fixed point for Eq.(1). For finite trees the set σio(s) can be defined by induction on the size of s. Then σio(s) becomes the unique least fixed point of Eq.(1) satisfying σio(x) = σ(x) for all symbols of rank zero. For infinite trees the definition is more subtle. Given a tree s and a substitution σ we introduce a notion of a choice function. That is a function γ : Pos(s) → T (Σ ∪ H) ∪ {⊥} where Pos(s) is the set of positions (that is: vertices) of s such that the following holds. If u ∈ Pos(s) is labeled by x, then γ selects a tree γ(u) ∈ σ(x). If σ(x) = ∅, then γ(u) is not defined, which is denoted as γ(u) = ⊥. To each choice function we will associate a Cauchy sequence γn(s) in some complete metric space T⊥(Σ ∪ X ∪ H); and we let γ∞(s) = limn→∞ γn(s) be its limit. We can think of the space T⊥(Σ ∪ X ∪ H) as a union of the usual Cantor space T (Σ ∪ X ∪ H) together with an isolated point ⊥ which has distance 1 to every other point. Then we define

σio(s) = {γ∞(s) | γ is a choice function for s and γ∞(s) 6= ⊥} . (2)

It turns out that this definition coincides with the natural definition for finite trees, and it satisfies Eq.(1), too. Another crucial step on the road to show Thm. 29 is the following result. −1 If σ and R are regular, then the “inverse image” σio (R) = {s ∈ T (Σ ∪ X ) | σio(s) ⊆ R} is a regular set of trees. In order to prove this fact we use two well-known results. First, the class of regular tree languages can be characterized by alternating parity-automata, and the semantics of these automata can be defined by parity-games [22, 23, 5]. Second, parity-games are determined and have positional (= memoryless) winning strategies, [17].

2 Notation and preliminaries

We let N = {0, 1,...} denote the set of natural numbers, P = N\{0}, and P∗ to be the monoid of finite sequences of positive integers with the operation “.” and the neutral element . For r ∈ N we let [r] = {1, . . . , r}. We write 2S for the power set of S, and identify every element x ∈ S with the singleton {x}. A rooted tree is a nonempty, connected, and directed graph t = (V,E) with vertex set V and without multiple edges such that there is exactly one vertex, the root, without any incoming edge and all other vertices have exactly one incoming edge. As a consequence, for every vertex v ∈ V there is exactly one directed path from the root to v. Since there are no multiple edges we assume without restriction E ⊆ {(u, v) | u, v ∈ V, u 6= v}. If (u, v) ∈ E is an edge, then we say that v is a child of u, and u is the parent of v.A leaf of t is a vertex without children. Throughout, we restrict ourselves to directed graphs where the set of edges is (at most) countable. Hence, it is possible to encode the vertex set of a rooted tree as a subset of positions Pos(t) ⊆ P∗ satisfying the following conditions: ε ∈ Pos(t), and if u.j ∈ Pos(t), then both u ∈ Pos(t) and u.i ∈ Pos(t) for all 1 ≤ i ≤ j. Using this, we have root(t) = ε and edge set {(u, u.i) | u, u.i ∈ Pos(t)}. We are mainly interested in ordered trees: these are rooted trees where the children of a node are equipped with a “left-to-right” ordering. In such a case the “left-to-right breadth-first” ordering on positions can be represented by the length-lexicographical ordering on P∗. The size of a tree t is the cardinality of Pos(t). The level of a vertex u is the length of the unique directed path starting at the root and ending

5 With respect to worst case complexity this is more an advantage than a problem. “Generically” it is debatable whether it makes sense to spend any efforts in computing syntactic congruences. It doesn’t. 6 Regular matching problems for infinite trees

in u. Dually, the height of a vertex u is the length of the longest directed path starting at u. Leaves have height zero. The height of a tree is the height of its root. Typically, and actually without restriction, every position in a tree has a label in some set ∆. Such a tree t can be defined therefore through a mapping t : Pos(t) → ∆ where Pos(t) is the set of positions and t(u) ∈ ∆ is the label of the position u. The set of all trees with labels in ∆ is denoted by T (∆). Its subset of finite trees is denoted by Tfin(∆). More precisely, if Γ ⊆ ∆, then TΓ-fin(∆) is the set of trees where the number of positions with a label in Γ is finite. 0 0 Let t, t ∈ T (∆) and u ∈ Pos(t). We define the trees t|u and t[u ← t ] as usual:

∗ Pos(t|u) = {v ∈ P | u.v ∈ Pos(t)} with labeling t|u(v) = t(u.v), Pos(t[u ← t0]) = {u.u0 | u0 ∈ Pos(t0)} ∪ {v ∈ Pos(t) | u is not a prefix of v} , t[u ← t0](u.u0) = t0(u0) and t[u ← t0](v) = t(v) if u is not a prefix of v.

In almost all our cases (there is one exception in the proof of Prop. 13) the degree of vertices in a tree is bounded by a constant depending on ∆, and vertices have finite degree depending on their label. A ranked alphabet is a nonempty finite set of labels ∆ with a rank function rk : ∆ → N. Trees over a ranked alphabet have to satisfy the following additional constraint: ∀u ∈ Pos(t), if t(u) = x, then {1,..., rk(x)} = {i ∈ N | u.i ∈ Pos(t)}. Trees in T (∆) are represented by the set of terms, too. These are the ordered trees t ∈ T (∆). We adopt the standard notation of terms to denote trees: x(s1, . . . , sr) represents the tree s with s(ε) = x and s|i = si for all i ∈ [rk(x)]. Henceforth, if not otherwise specified, we let Ω = X ∪ Σ ∪ H be a ranked alphabet consisting of three sets: X is the set of variables, Σ is the set of function symbols, H is the set of holes. We require (X ∪ Σ) ∩ H = ∅ but X ∩ Σ 6= ∅ is not forbidden. Symbols a ∈ Σ with rk(a) = 0 are called constants. Holes are not constants, but they have rank 0, too. For simplicity, we assume H = [rmax] where rmax ∈ N satisfies rk(x) ≤ rmax for all x ∈ Σ ∪ X . To make the theory nontrivial, we assume that there is some x ∈ Σ ∪ X with rk(x) ≥ 1. In particular, T (Ω) contains finite trees where some leaves are labeled by a hole. For rk(x) ≥ 2 there are also infinite trees with this property. Given a tree t ∈ T (Ω) we denote by leafi(t) the set of leaves which are labeled by the hole i ∈ H. The length-lexicographical ordering of positions induced by P∗ is a well-order, which has the type of either a finite ordinal or the ordinal ω. This well-order induces a linear order on leafi(t), which can be represented by a downward-closed subset of P. That is, we may write leafi(t) = {ij | 1 ≤ j ≤ | leafi(t)|}. If | leafi(t)| = ∞, then this means leafi(t) = {ij | j ∈ P}. The term notation is also convenient for a concise notation of infinite trees using fixed point equations. For example, let f ∈ Σ be a function symbol of rank 2, then there is exactly one tree t ∈ T (Σ ∪ {1}) which satisfies the equation t = f(t, 1). It is depicted in Fig. 3. The set leaf1(t) is the set of all leaves. There is no leftmost leaf, but a rightmost leaf which is in turn the first one in the length-lexicographical ordering.

t = f(t, 1) = f Pos(t) = ε f 1 1 2 = 11 f 1 1.1 1.2 = 12 1 1.1.2 = 13

Figure 3 The tree (representing an “infinite comb”) t = f(t, 1) has infinitely many holes labeled ∗ by 1, but no leftmost hole. The set Pos(t) ⊆ P is depicted on the right. Following our convention, the subset leaf1(t) ⊆ Pos(t) is written as {11, 12, 13,...}.

Suppose that sets Tx ⊆ T (Σ ∪ [r]) and Ti ⊆ T (Σ) for i ∈ [r] are defined. Then we define

the set Tx[ij ← Tij ] ⊆ T (Σ) as the union over all trees tx[ij ← tij ] where tx ∈ Tx and

tij ∈ Tij . This is explained in more detail in Sec. 2.1. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 7

2.1 Substitutions: outside-in and inside-out for trees in TX -fin(Σ ∪ X ) The ranked alphabet Σ ∪ X contains function symbols and variables. As mentioned in the introduction we allow Σ ∩ X 6= ∅. In the following, if σ : X → 2T (Σ∪H) is a mapping which is specified on the set of variables, then we extend it to a mapping σ :Σ ∪ X → 2T (Σ∪H) by letting

σ(f) = {f(1,..., rk(f))} for all f ∈ H ∪ Σ \X . (3)

We say that a mapping σ : X → 2T (Σ∪H)\H is a substitution if σ satisfies the following additional property

σ(x) ⊆ T (Σ ∪ [rk(x)]) \ H for all x ∈ X . (4)

Note that (3) and (4) together imply that t ∈ T (Σ ∪ [rk(x)]) \ H for all t ∈ σ(x) and for all x ∈ Σ ∪ X . For all elements x ∈ Σ ∪ X of rank zero we have σ(x) ⊆ T (Σ). The set of substitutions is a partial order by letting σ ≤ σ0 if σ(x) ⊆ σ0(x) for all x ∈ X . A substitution σ is called a homomorphism (resp. partial homomorphism) if |σ(x)| = 1 (resp. |σ(x)| ≤ 1) for all x ∈ X . Since we identify elements and singletons we can also say that a homomorphism6 is specified by a mapping σ : X → T (Σ ∪ H) \ H. Recall that t[ij ← tij ] denotes the tree produced from s by replacing every position ij ∈ leafi(s) with the tree tij . According to [11, 12] there are two natural ways to extend a substitution σ to Tfin(Σ ∪ X ): outside-in (OI for short) and inside-out (IO for short). The corresponding notations are σoi and σio T (Σ∪H) respectively. Our goal is to extend σ to extensions σio and σoi from T (Σ∪H) to 2 such T (Σ) that T (Σ ∪ X ) maps to 2 . In this section we restrict ourselves to maps from TX -fin(Σ ∪ X ) to arbitrary subsets of T (Σ ∪ H). The inside-out-extension of σio including infinite trees relies on “choice functions”. We postpone the general definition of σio to Sec. 2.3. The corresponding extension of σoi can be found in Sec. 8.3. It is not used elsewhere. For a tree s = x(s1, . . . , sr) ∈ TX -fin(Σ ∪ X ), the sets of trees σio(si) and σoi(si) are defined by induction on the maximal level of a position labeled by some variable. If s ∈ T (Σ), then we let σoi(s) = σio(s) = {s}. Thus, we may assume that some variable occurs in s. For r = 0 we let σoi(s) = σio(s) = σ(x) ⊆ T (Σ). For r ≥ 1, the sets σio(si) ⊆ σoi(si) ⊆ T (Σ) are defined by induction for all i ∈ [r]. Hence, we can define

σio(s) = {tx[ij ← ti] | tx ∈ σ(x) ∧ ti ∈ σio(si)} (5)  σoi(s) = tx[ij ← tij ] tx ∈ σ(x) ∧ tij ∈ σoi(si) (6)

I Remark 1. For the interested reader we note that σio(s) 6= ∅ ⇐⇒ σoi(s) 6= ∅. Indeed, T (Σ) σio(s) 6= ∅ implies σoi(s) 6= ∅ because σio(s) ⊆ σoi(s) ⊆ 2 by definition. The other direction holds for r = 0. For r ≥ 1 the induction (on the maximal level) tells us that σoi(si) 6= ∅ implies σio(si) 6= ∅. J

There is more flexibility in OI than in IO because positions ij 6= ik of a hole i ∈ H may be substituted with σoi by different trees tij and tik , whereas with σio they are substituted by the same tree tij = tik . This means the i-th child of a position u in s is duplicated if there is some t ∈ σ(s(u)) where |leafi(t)| ≥ 2. Hence, σio(s) σoi(s) is possible because of “duplication”. Fig. 4 depicts an example for σio(s) 6= σoi(s). Here, x is a variable of rank 1 and z is a variable of rank 0 (playing the role of “end-of-string”). Note that in this situation ∗ ∗ (with σ(x) = f(1, 1) and σ(z) = {a, b}) neither σio(x (z)) nor σoi(x (z)) is regular, since n n n every tree in σio(x z) and in σoi(x z) is a full binary tree with 2 leaves. Reading the labels of the leaves from left-to-right reveals the difference. For n ≥ 1 and a 6= b the “leaf-language”

6 Homomorphisms σ such that σ(x) ∈/ H for all x ∈ X are called non-erasing by Courcelle in [8, page 117]. Thus, there is some difference in notation between our paper and [8] 8 Regular matching problems for infinite trees

s = x, σ(x) = f, f ∈ σoi(s) \ σio(s)

x 1 1 f f

z σ(z) = {a, b}, a a a b

n Figure 4 Let a 6= b be two constants and n ≥ 1. Duplication of the hole 1 yields |σio(x (z))| = 2 n n+1 n n and |σoi(x (z))| = 2 . In particular, σoi(x (z)) \ σio(x (z)) for all n ≥ 1.

n 2n 2n n n of σio(x z) is the two-element set {a , b } whereas the “leaf-language” of σio(x z) has 2 n elements. It is equal to {a, b}2 . Consider a modification of the example in Fig. 4 by letting σ(z) = ∅ (or any other subset ∗ of trees in T (Σ)) and σ(x) = {f(1, 1), a}. This leads to a striking situation where σio(x (z)) ∗ n is not regular but σoi(x (z)) is the set of all finite trees over {f, a}. Indeed, σio(x (z)) is the m 3 set of all full binary trees with 2 leaves for all 0 ≤ m < n. The set σio(x (z)) is depicted in Fig. 5. All leaves are labeled by a and all inner nodes are labeled by f. In contrast, n σoi(x (z)) is the set of all trees in T ({f, a}) with height less than n.

s3 = x σio(s3) = {a , f , f } x a a f f x z a a a a

∗ Figure 5 L = x (z) and σ(x) = {f(1, 1), a}. Then σio(L) is not regular, but σoi(L) = T ({f, a}).

Yet another variant is given by Σ = {f, a, b}, H = {1}, and X = {x, y} where rk(f) = 2, rk(x) = rk(y) = rk(a) = 1, and rk(b) = 0. Let σ(x) = t where t = f(t, 1) as in Fig. 3 and σ(y) = a∗(b). Then we obtain the following situation as depicted in Fig. 6.

In the sequel, we show that the domain of σio can be extended to T (Σ ∪ X ) such that equation (5) still holds for trees in TX -fin(Σ ∪ X ). We shall use two auxiliary concepts: complete metric spaces (Sec. 2.2) and choice functions (Sec. 2.3).

2.2 Complete metric spaces with ⊥ for “undefined” Let us introduce a special constant ⊥ ∈/ Ω which plays the role of “undefined”. The idea is that an empty set σio(s) is replaced by the singleton ⊥. We turn T (Ω ∪ {⊥}) into a metric space by defining a metric d, which makes sure that both sets, {⊥} and T (Ω), become clopen (= open and closed) subspaces in (T (Ω ∪ {⊥}), d). We let 2−∞ = 0 and define: ( 2− inf{|u| | u∈Pos(s)∩Pos(t)∧s(u)6=t(u)} if either s, t ∈ T (Ω) or both, s, t∈ / T (Ω), d(s, t) = 1 otherwise: s ∈ T (Ω) ⇐⇒ t∈ / T (Ω).

Note that every constant in T (Ω ∪ {⊥}), like the constant ⊥, has distance 1 to every other element in T (Ω ∪ {⊥}). As usual, T (Ω) is closed in T (Ω ∪ {⊥}) but, thanks to the definition of d, it is also open. Hence, each of the subsets T (Ω), {⊥} ∪ T (Ω), and their complements with respect to T (Ω ∪ {⊥}) are clopen subsets in (T (Ω ∪ {⊥}), d). The restriction to the clopen subspace {⊥} ∪ T (Ω) yields a compact ultra-metric space such that

d(s, t) = 2− inf{|u| | u∈Pos(s)∩Pos(t)∧s(u)6=t(u)} (7)

holds for all s, t ∈ {⊥} ∪ T (Ω). The ambient metric space (T (Ω ∪ {⊥}), d) is not complete and therefore not compact. Indeed, recall that, by our convention, there is some function C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 9

f f f an(b) f an1 (b) f an(b) f an2 (b) an(b) an3 (b)

Figure 6 For all n ∈ N there exists tn ∈ σio(xy(b)) as shown on the left. For all sequences (ni)i∈N there is a tree t ∈ σoi(xy(b) as on the right. The set σoi(xy(b)) is regular, but σio(xy(b)) is not. symbol of rank at least one. Hence, there exists an infinite tree s and a Cauchy sequence

(tn)n∈N in T (Ω ∪ {⊥}) \ T (Ω) such that (tn)n∈N converges to s in the usual Cantor-space, but n not in our metric. For example, assume rk(a) = 1, then the Cauchy sequence (a (⊥))n∈N does not have any limit in (T (Ω ∪ {⊥}), d). However, as ⊥ represents “undefined”, we n wish that limn→∞ a (⊥) = ⊥. There is a way to achieve that: we identify the clopen set T (Ω ∪ {⊥}) \ T (Ω) with the constant ⊥. Thereby ⊥ becomes an isolated point. To be precise, let us define the equivalence relation ∼ on T (Ω ∪ {⊥}) which is induced by letting ⊥ ∼ t for all t ∈ T (Ω ∪ {⊥}) \ T (Ω). Now, we have an(⊥) ∼ ⊥ for all n. Hence, the image of n the sequence in the quotient space (a (⊥))n∈N is equal to the constant sequence (⊥)n∈N with the obvious limit ⊥. The natural embedding of {⊥} ∪ T (Ω) into the quotient space T (Ω ∪ {⊥})/ ∼ induces an isometry between (T (Ω) ∪ {⊥}, d) and (T (Ω ∪ {⊥})/ ∼, d∼) where 7 d∼ is the canonical quotient metric on T (Ω ∪ {⊥})/ ∼.

2.3 Choice functions and the definition of σio for infinite trees

Henceforth, T⊥(Σ ∪ H) denotes the complete metric space (T (Σ ∪ H) ∪ {⊥}, d) which, by the previous subsection, is identified with the quotient space (T (Σ ∪ H ∪ {⊥})/ ∼, d∼). A choice function for s ∈ T (Σ ∪ X ) is a mapping γ : Pos(s) → T⊥(Σ ∪ H) such that

γ(u) ∈ {⊥} ∪ T (Σ ∪ [rk(s(u))]) \ H with γ(u) = f1,..., rk(f) if f = s(u) ∈ Σ \X .

0 0 For u ∈ Pos(s) we let γ|u : Pos(s|u) → T⊥(Σ∪H) be the mapping defined by γ|u(u ) = γ(u.u ). Note that γ|u is a choice function for the subtree s|u whenever u ∈ Pos(s) and γ is a choice function for s. For every s ∈ T (Σ ∪ X ) and choice function γ for s, we define the sequence of trees (γn(s))n∈N in T (Σ ∪ H ∪ {⊥}) as follows:

γ0(s) = γ(ε) and γn(s) = γ(ε)[ij ← (γ|i)n−1(s|i)] if n > 0. (8)

This yields a Cauchy sequence (γn(s))n∈N in T⊥(Σ ∪ H) and therefore, limn→∞ γn(s) exists. Since choice functions don’t map tree positions to holes, we have limn→∞ γn(s) ∈ {⊥}∪T (Σ).

T (Σ∪H)\H I Definition 2. Let σ : X → 2 be a substitution and s ∈ T (Σ ∪ X ). We define:

1. The tree γ∞(s) = limn→∞ γn(s) ∈ {⊥} ∪ T (Σ). 2. The set Γ(σ, s) by the set of choice functions for s satisfying for all u ∈ Pos(s) with x = s(u) the following condition: if σ(x) 6= ∅, then γ(u) ∈ σ(x), otherwise γ(u) = ⊥.

3. The set σio(s) = {γ∞(s) | γ ∈ Γ(σ, s)} \ {⊥}.

T (Σ∪H)\H I Proposition 3. Let σ : X → 2 be a substitution, s = x(s1, . . . , sr) ∈ T (Σ ∪ X ) be a tree, and γ ∈ Γ(σ, s) be a choice function. Then γ∞(s) = γ(ε)[ij ← (γ|i)∞(si)].

7 The interested reader may consult Sec. 8.4 for the definition of a quotient metric in general topology. 10 Regular matching problems for infinite trees

Proof. Since γ∞(s) = limn→∞ γn(s) and (γ|i)∞(si) = limn→∞(γ|i)n(si) for all i, we obtain

γ(ε)[ij ← (γ|i)∞(si)] = γ(ε)[ij ← lim (γ|i)n(si)] n→∞

= γ(ε)[ij ← lim (γ|i)n−1(si)] n→∞

= lim γ(ε)[ij ← (γ|i)n−1(si)] n→∞

= lim γn(s) = γ∞(s). n→∞

J

I Corollary 4. Let s = x(s1, . . . , sr) ∈ T (Σ ∪ X ) be a tree. Then we have σio(s) = {tx[ij ← ti] | tx ∈ σ(x) ∧ ti ∈ σio(si)} . In particular, for finite trees the new definition of σio(s) in Def. 2 agrees with the earlier one given in Eq.(5).

Proof. We use the notation as given in Def. 2.

[  σio(s) = γ(ε)[ij ← (γ|i)∞(si)] γ ∈ Γ(σ, s) by Prop. 3  = tx[ij ← (γ|i)∞(si)] tx ∈ σ(x), γ|i ∈ Γ(σ, s|i) trivial

= {tx[ij ← ti] | tx ∈ σ(x) ∧ ti ∈ σio(si)} by Def. 2

Hence, σio(s) = {tx[ij ← ti] | tx ∈ σ(x) ∧ ti ∈ σio(si)} as desired. J

The following examples show that σio(s) is not closed in T⊥(Σ), in general. Every (infinite) tree s ∈ T (Σ) can be written as a limit limn→∞ sn where sn are finite trees in T (Σ)

(assuming, as usual, that Σ contains a constant). Then there are Cauchy sequences (tn)n∈N with tn ∈ σio(sn) which do not converge to any tree in σio(s). In these examples x is a variable of rank one, Σ = {f, a, b} with a constant b, rk(a) = 1, rk(f) = 2, and where only the first hole 1 ∈ H is used. In the first example we have s = sn for all n.

n I Example 5. Consider s = x(b) and σ(x) = {a (1) | n ∈ N}. Then s = limn→∞ sn where (n) n (n) (n) n sn = s. We define tn = γ (s) = a (b) where γ ∈ Γ(s, σ), γ (ε) = a (1), and note that

(tn)n∈N is a Cauchy sequence with every tn ∈ σio(sn), but

ω n lim tn = a ∈/ σio(s) = {a (b) ∈ }. n→∞ N

The next example is a modification of Ex. 5 such that s becomes infinite and the sequence sn is increasing such that sn+1 = sn[1 ← x(1)].

ω n I Example 6. Let s = f(a , x(b)) and σ be defined by σ(x) = {a (1) | n ∈ N}. Then s = n n n limn→∞ sn where sn = f(a (1), x(b)). We have tn = f(a (1), a (b)) ∈ σio(sn) since for each n (n) (n) n we may use the choice function γ with γ (u) = a (b) where u = 2 ∈ Pos(s). The tn’s form ω ω ω n a Cauchy sequence with t = f(a , a ) = limn→∞ tn but t∈ / σio(s) = {f(a , a (b)) | n ∈ N}.

3 Regular tree languages

There are several equivalent definitions for regular languages of finite and infinite trees, see [5, 15, 25, 27]. Here, we will consider regular languages of finite and infinite trees from T (Σ ∪ X ∪ H ∪ {⊥}). Note that for Γ ⊆ ∆ ⊆ Σ ∪ X ∪ H ∪ {⊥}, the sets Tfin(∆) (more general: TΓ-fin(∆)) and T (∆) are regular subsets of T (Σ ∪ X ∪ H ∪ {⊥}). C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 11

3.1 Nondeterministic tree automata with parity acceptance We use parity-NTAs (nondeterministic tree automata with a parity acceptance condition) as the basic reference to characterize regular tree languages. The parity condition is used to accept infinite trees. In our definition a parity-NTA accepts sets of finite and infinite trees. Alternating tree automata with a parity acceptance condition will be considered later. Let ∆ be any ranked alphabet (finite as usual) with a rank function rk : ∆ → N.A parity-NTA over ∆ is specified by a tuple A = (Q, ∆, δ, χ) where Q is a nonempty finite set of states, δ is the transition relation, and χ : Q → C is a coloring with C = {1,,..., |C|}. Without restriction, we assume that |C| is odd. We have [ δ ⊆ Q × {f} × Qrk(f). (9) f∈∆ Thus, δ is a set of tuples p, f, p1, . . . , pr) where r = rk(f). The acceptance condition is defined as follows.

I Definition 7. Let A be a parity-NTA and let t ∈ T (∆). A run ρ of t is a relabeling of the positions of t by states which is consistent with the transitions. That is, ρ : Pos(t) → Q is a mapping such that for all u ∈ Pos(t) with

t(u) = f there is a transition (p, f, p1, . . . , prk(f)) ∈ δ such that ρ(u) = p and ρ(u.j) = pj for all j ∈ [rk(f)]. (See [15] for more details.) If ρ(ε) = p, then we say that ρ is a run of t at state p. Let ρ be a run and (u0, u1, u2,...) be an infinite path in Pos(t) such that uj+1 is a child of uj. The path is accepting if the number

lim inf(χρ(uk)) = max {min {χρ(ui) ∈ C | i ≥ k} | k ∈ N} k→∞ is even. This means that the minimal color appearing infinitely often on this path is even. The run ρ is accepting if all its all infinite directed paths are accepting. By RunA(t, p) we denote the set of accepting runs of t at state p. (If the context to A is clear, we might simply write Run(t, p).) If p ∈ Q, then we let L(A, p) = {t ∈ T (∆) | RunA(t, p) 6= ∅} . It is the accepted language of A at state p. For P ⊆ Q we define L(A, P ) by \ L(A, P ) = {L(A, p) | p ∈ P } . (10)

The set L(A, P ) ⊆ T (∆) is called the accepted language at the set P . The convention is as usual: L(A, ∅) = T (∆). For rk(f) = 0 there are no children; and we accept f at state p if and only if (p, f) ∈ δ. Therefore, no final states appear in the specification of parity-NTA. For a finite tree t ∈ T (∆) we have t ∈ L(A, p) if and only if there exists a run ρ of t with ρ(ε) = p since the parity condition holds vacuously. Thus, for trees in Tfin(∆) no other accepting condition is required than the existence of a run. The following fact is well-known, see for example [15]. We shall use Prop. 8 as the working definition for the present paper.

I Proposition 8. A language L ⊆ T (∆) is regular if and only if there is a parity-NTA A = (Q, ∆, δ, χ) and a state p ∈ Q such that L = L(A, p). It is a well-known classical fact that the class of regular tree languages forms an effective Boolean algebra [25]. This means that first, there is a parity-NTA accepting all trees in T (∆) and second, given two parity NTAs A1 and A2 with states p1 and p2 respectively, we can can effectively construct a parity NTA accepting L(A1, p1) \ L(A2, p2). In particular, Prop. 8 implies that for each parity-NTA A and all subsets P,P 0 ⊆ Q we can effectively construct a parity-NTA B with a single initial state q such that L(B, q) = L(A, P ) \ L(A, P 0). 12 Regular matching problems for infinite trees

4 Tasks and profiles

In this section we introduce the notions of a task and a profile. Throughout, we let H = {1,..., |H|} be a set of holes and Σ be a ranked alphabet such that rk(f) ≤ |H| for all f ∈ Σ. We also fix a parity-NTA B = (Q, Σ, δ, χ) such that L(B, p) ⊆ T (Σ) for all p ∈ Q. Every such B is extended to a parity-NTA BH by adding more transitions ensuring that every hole i ∈ H is accepted at all states. Thus, we let BH = (Q, Σ ∪ H, δH , χ) where δH = δ ∪ (Q × H). Clearly, L(BH , p) is regular and therefore the complement T (Σ ∪ H) \ L(BH , p) is regular, too. For the rest of this section, a run ρ of a tree t refers to

a tree t ∈ T (Σ ∪ H) and the NTA BH . Thus Run(t, p) = RunBH (t, p) if not stated explicitly otherwise. Let ρ be any run of a tree t : Pos(t) → Σ (not necessarily accepting) and let 0 0 0 0 α = (u0, u1,...), α = (u0, u1,...) be infinite directed paths in Pos(t) with ε = u0 = u0. We concentrate on infinite paths because these paths decide whether ρ is accepting. Since 0 directed paths in trees are directed away from the root, ui+1 is a child of ui and ui+1 is a 0 0 child of ui for all i ∈ N.) The paths α and α define infinite words over Σ and the run ρ defines infinite words ρ(α) and ρ(α0) over Q. Suppose each path is cut into infinitely many 0 finite and nonempty pieces wi and wi such that we can write ρ(α) = ρ(w0)ρ(w1) ··· and 0 0 0 0 + 0 ρ(α ) = ρ(w0)ρ(w1) ··· where all ρ(wi) and ρ(wi) are in Q . Assume that α satisfies the 0 parity condition. We aim for a condition based on the factors ρ(wi) and ρ(wi) which is strong enough to imply that the other infinite word α satisfies the parity condition, too.8 0 Naturally, such a condition is related to colors which appear in χρ(wi) and χρ(wi). More 0 0 precisely, let ci (resp. ci) denote the minimal color in χρ(wi) (resp. χρ(wi)) for i ∈ N. We 0 0 obtain two infinite words (c0, c1,...) and (c0, c1,...) over the alphabet C of colors. If we 0 compare locally each color ci with the corresponding color ci, then intuitively, with respect to the parity condition, an even color is better than an odd color. A small even color is better than a large even color. However, a large odd color is better than a small odd color. Based on that intuition, let us introduce the best-ordering best on the set {0,..., |C|}. Note that we explicitly include 0 which is not in χ(Q) and that |C| is odd by our convention. We let

0 best 2 best · · · best |C| − 1 best |C| best |C| − 2 best · · · best 3 best 1 (11)

Indeed, in the linear order best all even numbers are “better” than the odd numbers. Among even numbers smaller is better. Among odd numbers larger is better. 0 0 As a consequence, if ci best ci for all i ∈ N and if the parity condition holds for α , then it holds for α.

I Definition 9. A task is a tuple (p, ψ1, . . . , ψ|H|) with p ∈ Q and ψi : Q → {0} ∪ χ(Q) for 1 ≤ i ≤ |H|.A profile is a set of tasks.

The range of ψi in Def. 9 is a subset of the set {0,..., |C|} where 0 does not belong to χ(Q), but it is the best number in the linear order best. As we will see, for “satisfying” a task the value ψi(q) = 0 is better than any value in χ(Q), The best (resp. worst) value in χ(Q) is the smallest even (resp. odd) number. The best-ordering defines a partial order 0 0 0 0 on the set of tasks: we write (p, ψ1, . . . , ψ|H|) ≤ (p , ψ1, . . . , ψ|H|) if first, p = p and second, 0 ψi(q) best ψi(q) for all q ∈ Q and i ∈ H.

0 I Definition 10. Let τ = (p, ψ1, . . . , ψ|H|) be a task and t, t ∈ T (Σ ∪ H ∪ {⊥}) be trees.

1. A run ρ ∈ RunBH (t, p) satisfies τ if the following condition holds for all leaves ij ∈ leafi(t):

8 To find such a sufficient condition we just mimic the standard concept of a strongly recognizing morphism in the theory of ω-regular words as exposed for example in the appendix, Sec. 8.1. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 13

If cρ(ij) denotes the minimal color (w.r.t. to the natural order ≤ for natural numbers) on the path from the root of t to position ij ∈ leafi(t) in the tree χρ : Pos(t) → C, then we have cρ(ij) best ψi(ρ(ij)). We also write ρ |= τ in that case. Moreover, we write t |= τ if there exists some

ρ ∈ RunBH (t, p) such that ρ |= τ. Note that a tree containing the symbol ⊥ cannot satisfy any task because there is no run on such a tree.

2. A run ρ ∈ RunBH (t, p) defines τ if first, ρ |= τ and second, for all q ∈ Q and i ∈ H we have ψi(q) ≥ 1 ⇐⇒ ∃ij ∈ leafi(t): ψi(q) = cρ(ij). 3. The set of all tasks τ such that there is a tree t ∈ T (Σ ∪ H ∪ {⊥}) with t |= τ is denoted by T = TB. 4. By π(t) we denote the set of tasks τ such that t |= τ. In particular, τ ∈ π(t) implies τ ∈ T . 5. The set of all profiles π such that there is a tree t ∈ T (Σ ∪ H ∪ {⊥}) with π = π(t) is denoted by P = PB. 0 0 6. By ≡B we denote the equivalence relation which is defined by t ≡B t ⇐⇒ π(t) = π(t ).

Note that π(t) = ∅ if and only if RunBH (t, p) = ∅ for all p ∈ Q. Since ⊥ ∈/ T (Σ ∪ H), we have π(⊥) = ∅. The index of ≡B is finite since it is bounded by |P|. More precisely,

0 0 |Q|·(|C|+1)|H×Q| |{{t ∈ T (Σ ∪ H) | t ≡B t } | t ∈ T (Σ ∪ H)}| ≤ |PB| ≤ 2 . (12)

The value 0 in the range of ψi plays the following role: Let q ∈ Q and ρ |= (p, ψ1, . . . , ψ|H|) with ρ ∈ Run(p, t). Then ψi(q) = 0 implies {ij ∈ leafi(t) | ρ(ij) = q} = ∅. The condition is vacuously true if leafi(t) = ∅. The following proposition says that {t ∈ T (Σ ∪ H) | t |= τ} is effectively regular for every task τ ∈ TB. The proof of Prop. 11 is based on a product construction where the first component simulates the NTA BH and the second component remembers the minimal color appearing on paths in runs ρ of BH at states p.

I Proposition 11. Let τ = (p, ψ1, . . . , ψ|H|) ∈ TB be any task. Then the set of trees {t ∈ T (Σ ∪ H) | t |= τ} is effectively regular. More precisely, {t ∈ T (Σ ∪ H) | t |= τ} = L(Bτ , (p, χ(p))) where Bτ = (Q × C, Σ ∪ H, δτ , χτ ) is the following parity-NTA. For a color c ∈ C, a function symbol f ∈ Σ with r = rk(f), a hole i ∈ H, and states q, p1, . . . , pr ∈ Q we define δτ by the following equivalences:  (q, c), f, (p1, min{c, χ(p1)}),..., (pr, min{c, χ(pr)}) ∈ δτ ⇐⇒ (q, f, p1, . . . , pr) ∈ δB (13)  (q, c), i ∈ δτ ⇐⇒ c best ψi(q) (14)

The color of a pair (q, c) is defined by χτ (q, c) = χ(q).

Proof. Let pr1 : Q × C → Q, (p, c) 7→ p be the projection onto the first component. Let t ∈ T (Σ ∪ H). The aim is to show that pr1 defines for t a canonical bijection between accepting runs ρ at a state p in BH satisfying τ and the set of accepting runs ρτ at the state (p, χ(p)) in Bτ . This implies {t ∈ T (Σ ∪ H) | t |= τ} = L(Bτ , (p, χ(p))) and hence, {t ∈ T (Σ ∪ H) | t |= τ} is effectively regular. 0 The definition of δτ says that pr1ρ : Pos(t) → Q is a run in RunBH (t, p) for every 0 0 0 t ∈ T (Σ ∪ H) and every run ρ ∈ RunBτ (t, (p, χ(p))). Here, as usual, pr1ρ = pr1 ◦ ρ denotes the composition. Consider any nonempty directed path ε, u1, . . . , uk in t, then, by 0 induction on k, we see that ρ (uk) = (qk, ck) satisfies ck = min{c1, . . . , ck}. We claim that 0 0 pr1ρ |= τ. Indeed, the minimal color seen on every infinite directed path of the tree ρ t with 0 0 ρ (ε) = (p, χ(p)) is even. Since χτ (q, c) = χ(q), the same holds for pr1ρ . Together with the 0 definition of δτ ((q, c), i) we obtain the claim that pr1ρ |= τ. Thus, the projection onto the first component defines a mapping

0 0 0 pr1 : RunBτ (t, (p, χ(p))) → {ρ ∈ RunBH (t, p) | ρ |= τ} , ρ 7→ pr1(ρ ) = pr1ρ . (15) 14 Regular matching problems for infinite trees

We wish a bijection. Therefore, let us also define a mapping ρ 7→ ρτ in the other direction

from {ρ ∈ RunBH (t, p) | ρ |= τ} to RunBτ (t, (p, χ(p))). Given ρ ∈ RunBH (t, p) such that ρ |= τ, we define the run ρτ top-down beginning at the root with ρτ (ε) = (p, χ(p)). Now, assume that ρτ (u) = (q, c) is defined for u ∈ Pos(t) such that ρ(u) = q. Suppose t(u) = f

and rk(f) = r Then, thanks to ρ ∈ RunBH (t, p), there is some (q, f, p1, . . . , pr) ∈ δBH such that ρ(u.i) = pi for all 1 ≤ i ≤ r. However, since ρτ (u) = (q, c) is fixed there is exactly  one corresponding transition (q, c), f, (p1, min{c, χ(p1)}),..., (pr, min{c, χ(pr)}) ∈ δτ and we obtain ρτ (u.i) = min{c, χ(pi)} for all 1 ≤ i ≤ r. This is clear for f ∈ Σ by (13) and f = i ∈ H by (14) because ρ |= τ. Moreover, since every infinite path in the tree ρ satisfies the parity condition, the same is true for the run ρτ because the color of a state

χ(q, c) is defined by the first component: χ(q, c) = χ(q). Thus, ρτ ∈ RunBτ (t, (p, χ(p))) as 0 0 0 desired. A straightforward inspection shows (pr1ρ )τ = ρ for all ρ ∈ RunBτ (t, (p, χ(p))) and

pr1(ρτ ) = ρ for all ρ ∈ RunBH (t, p) where ρ |= τ. This shows that the mapping pr1 defined in (15) is bijective. We conclude {t ∈ T (Σ ∪ H) | t |= τ} = L(Bτ , (p, χ(p))). J

I Corollary 12. Let π be a profile, then {t ∈ T (Σ ∪ H ∪ {⊥}) | π(t) = π} is effectively regular. Proof. Let π = ∅, first. Then {t ∈ T (Σ ∪ H ∪ {⊥}) | π(t) = π} is the disjoint union of T (Σ ∪ H ∪ {⊥}) \ T (Σ ∪ H) (which is regular) and the set {t ∈ T (Σ ∪ H) | π(t) = π}. For π 6= ∅, we have {t ∈ T (Σ ∪ H ∪ {⊥}) | π(t) = π} ⊆ {t ∈ T (Σ ∪ H) | π(t) = π}. Thus, it is enough to show that {t ∈ T (Σ ∪ H) | π(t) = π} is regular. This set coincides with \ \ {t ∈ T (Σ ∪ H) | t |= τ} ∩ {t ∈ T (Σ ∪ H) | t 6|= τ} (16) τ∈π τ∈ /π

which is a finite intersection of languages, or complements of languages, from the set of languages L = {{t ∈ T (Σ ∪ H) | t |= τ} | τ ∈ τB}. By Prop. 11, the family L consists of regular tree languages. Since the class of regular tree languages in T (Σ ∪ H ∪ {⊥}) is an effective Boolean algebra [25], we conclude that {t ∈ T (Σ ∪ H ∪ {⊥}) | π(t) = π} is effectively regular too. J

I Proposition 13. Let B be a parity NTA with state set Q such that L(B, p) ⊆ T (Σ) for all 0 p ∈ Q and s ∈ T (Σ ∪ X ) be a tree. If γ, γ : Pos(s) → T⊥(Σ ∪ H) are two choice functions for 0 0 s such that γ (u) ≡B γ(u) for all u ∈ Pos(s), then γ∞(s) ∈ L(B, p0) ⇐⇒ γ∞(s) ∈ L(B, p0) for all states p0 of B.

In the following proof we encounter trees with infinite directed paths and where nodes may have an infinite degree: the β-sequences. This requires some attention.

Proof. Let s = x(s1, . . . , sr) and γ∞(s) ∈ L(B, p0). By symmetry, it is enough to show 0 γ∞(s) ∈ L(B, p0). Recall that γ∞(s) is the limit of trees γn(s). Fig. 7 depicts the basic relationship between paths in γn(s) and paths in s. For the path in s it shows positions u ∈ Pos(s) which can be identified with positions on the corresponding path in γn(s). Let us explain the relationship of the depicted paths in more detail. We begin with a directed path p in γn(s) starting at the root ε which defines t0 = γ(ε). The path p stays inside t0 or it leaves t0 using a hole labeled by some h1 ∈ H. Assuming the second case, we let u0 = ε 9 and h1 = u1 ∈ Pos(s) ∩ P. Hence, t1 = γ(u1) exists . The path p continues in the tree γn(s). Again, there are two possibilities: it stays inside t1 or it leaves t1 using a hole labeled by 2 some h2 ∈ H. If it leaves, we let u2 = h1 . h2 ∈ Pos(s) ∩ P . If possible, we continue by defining u3 ∈ Pos(s), t3 = γ(u3), and h3 ∈ H, etc. From the very beginning we face two cases: either the path stops at some leaf of γn(s) (which is also a leaf of tn) labeled by some

9 Observe that Pos(s) ∩ H = [r]. Thus, the choice function determines t1. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 15

u0 = ε h1 t0 = γ(u0) u1 = h1 s = h2 t1 = γ(u1) = γn(s)

u n−1 = h1. · · · .hn−1 hn tn−1 = γ(un−1) un = h1. · · · .hn hn+1 = i tn = γ(un)

ij ∈ leafi(γn(s))

Figure 7 A directed path in γn(s) starting at the root maps to a unique sequence of positions ε, h1, h1.h2 ... in s (indicated by dashed arrows). These positions of the sequence can be identified with positions in γn(s): the top positions of the small triangles (indicated by plain arrows from positions in s to positions in γn(s)). The depicted directed path ends in a leaf ij , but it could also stay inside tn or end at a leaf of tn which is a constant.

hn+1 ∈ H or there is some m ≤ n such that the path stays in tm. Fig. 7 shows a situation where p ends in a leaf of γn(s). This implies that there exists un.hn+1 ∈ Pos(s). Assume p stays in tm for some m ≤ n. This path can be finite or it can be infinite. Moreover, it may possibly stay inside tm although, perhaps, for all h ∈ H there exist positions um.h ∈ Pos(s). However, in that case, we do not define any position um+1.

Now, consider u ∈ Pos(s) and p ∈ Q such that RunBH (γ(u), p) 6= ∅. Then each run

ρ ∈ RunBH (γ(u), p) defines a task τρ = (p, ψ1, . . . , ψ|H|) by

ψ (q) = sup {c (i ) ∈ {0,..., |C|} | ∃i : q = ρ(i )} . (17) i best ρ j j j

10 The the existence of ρ implies that γ(u) |= τρ. The task τρ is minimal among all tasks τ such that γ(u) |= τ. The minimality of τρ implies that there is some leaf ij in γ(u) 0 where q = ρ(ij) if and only if ψi(q) ≥ 1. Moreover, Since γ(u) ≡B γ (u), there exists some 0 0 0 ρ ∈ RunBH (γ (u), p) such that ρ |= τρ. It follows that there is a mapping

0 0 θ : RunBH (γ(u), p) → RunBH (γ (u), p), ρ 7→ ρ (18)

0 0 such that ρ = θ(ρ) satisfies ρ |= τρ. 0 The definition of θ is crucial to prove γ∞(s) ∈ L(B, p0). Note that the shape of t = γ(u) and t0 = γ0(u) might be very different. For example, t can be finite whereas t0 is infinite, or 0 0 vice versa. The trees share however leafi(t) 6= ∅ ⇐⇒ leafi(t ) 6= ∅ because γ(u) ≡B γ (u). 0 Still, the cardinalities of leafi(t) and leafi(t ) might differ drastically. The rest of the proof uses an additional concept of an IO-prefix. Let t, tε, ti ∈ T⊥(Σ ∪ H) be trees for each i ∈ H where leafi(tε) 6= ∅. We say that tε is an IO-prefix of t if we can write t = tε[ij ← ti]. 0 In order to see γ∞(s) ∈ L(B, p0) we fix a run ρ ∈ Run(γ∞(s), p0)) witnessing γ∞(s) ∈ L(B, p0). Using this fixed ρ we construct a set of (finite or infinite) sequences α = 0 0 (u0, p0, ρ0) ··· (uk, pk, ρk) and β = (u0, p0, ρ0) ··· (uk, pk, ρk) such that for all i we have 0 0 ui ∈ Pos(s), pi ∈ Q, and ρi ∈ Run(γ(ui), pi) (resp. ρi ∈ Run(γ (ui), pi)). See Fig. 7 for an intuition about the positions ui. We require that uj+1 is child of uj for all 0 ≤ j < k. The set of sequences α (resp. β) form themselves a rooted tree where the root is the empty sequence. The parent of a nonempty sequence w0 ··· wk−1wk of length k is the prefix w0 ··· wk−1. As we will see later, each node in the tree defined by an α-sequence has at most |H × C| children whereas a node in the tree defined by a β-sequence may have infinitely many children. The

10 Minimal w.r.t. the natural partial order on tasks defined by best 16 Regular matching problems for infinite trees

number of children is actually equal to the size of the index sets Iα resp. Iβ as defined in (19) and (20) below. Every such sequence α = (u0, p0, ρ0) ··· (uk, pk, ρk) will define below a unique position ν(α) in γ∞(s) such that the subtree of γ∞(s) at position ν(α) has the tree γ(uk) as an 0 IO-prefix as defined above. The same will be true for a sequence β with respect to γ∞(s). 0 The position of the empty sequence ν(ε) is ε which is the root of γ∞(s) and γ∞(s). We define a sequence α1 = (u0, p0, ρ0) = (ε, p0, ρ0) such that ρ0 ∈ Run(γ(ε), p0) is induced by 0 0 ρ ∈ Run(γ∞(s), p0). This yields β1 = (ε, p0, ρ0) with ρ0 = θ(ρ0) as defined by (18). Now, let k ∈ N and assume that a sequence α = (u0, p0, ρ0) ··· (uk, pk, ρk) is already 0 0 defined. By induction, we also know a position ν(α ) ∈ Pos(γ∞(s)) where α is defined by 0 the equation α (uk, pk, ρk) = α. Let

Iα = {(i, q) ∈ H × Q | ∃ij ∈ leafi(γ(uk)) : ρk(ij) = q} . (19)

For each (i, q) ∈ Iα we define a triple (uk.i, q, ρ(i, q)). Here, uk.i is the i-th child of uk ∈ Pos(s). The children of α are defined by the sequences α · (uk.i, q, ρ(i, q)). Thus, there are at most |H × C| children in α-sequences. Before we define the run ρ(i, q), let us define the position 0 ν(α) ∈ Pos(γ∞(s)). By induction, the subtree at position ν(α ) ∈ Pos(γ∞(s)) has γ(uk) as an IO-prefix; and every leaf i` ∈ leafi(γ(uk)) defines a unique position in γ∞(s) in the 0 subtree at position ν(α ). We choose ν(α) as a leaf in γ(uk). For that we choose any leaf

ij ∈ leafi(γ(uk)) such that cρk (ij) = ψi(q) where ψi(q) refers to the task τρk defined in (17). Hence, the minimal color on the path from the root in γ(uk) to the chosen leaf ij induced by the run ρk is the greatest among all these colors among all i` ∈ leafi(γ(uk)) with respect 0 to best ordering. Since the root of γ(uk) is ν(α ) we can define ν(α) by the corresponding position of the chosen position ij. The subtree of γ∞(s) at position ν(α) has the tree γ(uk.i) as an IO-prefix. Hence, ρ induces a run ρ(i, q) ∈ Run(γ(uk.i), q); and (uk.i, q, ρ(i, q)) is defined by (i, q) ∈ Iα. Note that α has a child if and only if Iα 6= ∅. If Iα = ∅ the sequence α is a leaf in the corresponding tree. The definition of the tree of β-sequences is defined analogously. We let k ∈ N and assume 0 0 0 that a sequence β = (u0, p0, ρ0) ··· (uk, pk, ρk) is already defined such that ρi = θ(ρi) for all 1 ≤ i ≤ k. The mapping θ was defined in (18). By induction, we also know a position 0 ν(β) ∈ Pos(γ∞(s)). It might be that β has infinitely many children because we do not choose a particular leaf (as we did for α-sequences) but have to consider all leaves ij. More precisely, we define

0 0 0 Iβ = {(i, q, ij) ∈ H × Q × leafi(γ (uk)) | ij ∈ leafi(γ (uk)) ∧ ρk(ij) = q} . (20)

S 0 The set Iβ is in a canonical bijection with the union {leafi(γ (uk)) | i ∈ H}. For each 0 (i, q, ij) ∈ Iβ we define the run ρ (i, q) = θ(ρ(i, q)). (The run ρ(i, q) was defined above for the 0 α-sequence by choosing a particular leaf.) Thereby we obtain a triple (uk.i, q, ρ (i, q)). For 0 0 0 0 β = β · (uk.i, q, ρ (i, q)) the position ν(β ) ∈ Pos(γ∞(s)) is given in analogy to the position 0 0 0 ν(α ). That is, we consider the position ν(β) ∈ Pos(γ∞(s)). The subtree ν(β) has γ (uk) as 0 an IO-prefix; and we let ν(β ) = ν(β(i, q, ij)) the position which corresponds to the leaf ij, 0 Since Iβ is in a canonical bijection with the set of all leaves in γ (uk) labeled by some hole 0 0 i ∈ H, we see that we actually have defined runs ρn : Pos(γn(s)) → Q for all n ∈ N, hence a 0 0 run ρ : Pos(γ∞(s)) → Q. It remains to show that the run ρ0 is accepting. Since it is a run, it is enough to consider infinite paths w0. These paths result from a finite or infinite sequence β = 0 0 0 (u0, p0, ρ0)(u1, p1, ρ1) ··· . If the infinite path w is defined by some finite prefix of β, then 11 0 it is accepting because every finite prefix of β defines an accepting path of γ∞(s) at p0.

11 Accepting paths were defined in Def. 7. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 17

0 γ∞(s) γ∞(s) 0 u0 γ (u0) ≡B γ(u0) u0 0 u1 γ (u1) ≡B γ(u1) u1 ρ0 ρ 0 un γ (un) ≡B γ(un) un . . . .

Figure 8 The ui ∈ Pos(s) are the top positions of the small triangles according to Fig. 7. The run ρ defines a run ρ0. It satisfies the parity condition because every path in tree ρ has this property.

So, we may assume that β is infinite, too. Hence, w0 can be cut into infinitely many finite 0 0 0 0 0 paths w = w0w1 ··· where each wi is a finite path inside γ (ui). Each finite path belongs 0 0 0 0 to a run ρv0 of γ (u) at some position v ∈ Pos(γ∞(s)). It corresponds to the some run ρv of γ(u) at some position v ∈ Pos(γ∞(s)). In this way we obtain an infinite path in w = w0w1 ··· in γ∞(s) where each wi is a path in γ(ui). Recall that β was defined together 0 with a corresponding sequence α = (u0, p0, ρ0)(u1, p1, ρ1) ··· such that ρi = θ(ρi). Every 0 0 path in tree ρ : Pos(γ∞(s)) can be mapped to some accepting path in ρ : Pos(γ∞(s)). This 0 situation is depicted in Fig. 8. The task τv defined by ρv is satisfied by ρv0 . Hence, the 0 minimal color seen on the path from the root in γ (u) to a leaf ij is never greater in the best-ordering than the minimal color seen on the path from the root in γ(u) to a leaf ij. As a consequence, the minimal color seen infinitely often on the run defined by the path w0 is not greater in the best-ordering than the one defined by w. Since the minimal color seen infinitely often defined by w is even, the same is true for the minimal color seen infinitely often defined by w0. Let’s repeat the essential steps in the proof by looking at Fig. 8. A high-level interpretation of the picture shows us the top levels of two trees of possible infinite trees where the degree of nodes on the right is finite whereas the degree on of nodes on left can be infinite. On the 0 other hand, γ∞(s) is just a tree where every vertex has at most rmax children. In order to see 0 0 whether γ∞(s) ∈ L(B, p0) we have to consider all directed infinite path in γ∞(s). The left 0 tree in the picture shows such a path p . This path maps to a unique path in γ∞(s) which is chosen such that acceptance is made most difficult, but still the acceptance condition holds because we γ∞(s) has an accepting run ρ. We use the small triangles on the right to define local runs on the small triangles on the left. This is possible because γ(u) and γ0(u) satisfy the same set of tasks. The final step is to understand that the local runs on the left glue 0 together to an accepting run of γ∞(s). This was the goal. J

4.1 The saturation of σ

We continue with parity-NTAs B = (Q, Σ, δ, χ) and BH = (Q, Σ ∪ H, δH , χ) where, as above, δH = δ ∪ (Q × H).

T (Σ∪H)\H I Definition 14. Let σ : X → 2 be a substitution. The saturation of σ (w.r.t. B) T (Σ∪H)\H is the substitution σb : X → 2 defined by

0 0 σb(x) = {t ∈ T (Σ ∪ X ) \ H | ∃t ∈ σ(x): t ≡B t} .

Cleary, σ ≤ σb and σb(x) 6= ∅ imply σb(x) 6= ∅.

I Proposition 15. Let q0 ∈ Q be a state of the NTA B and L ⊆ T (Σ ∪ X ) be any subset. Then σb(x) is regular for all x ∈ X . Moreover, σio(L) ⊆ L(BH , q0) ⇐⇒ σbio(L) ⊆ L(B, q0). 18 Regular matching problems for infinite trees

S 0 0 Proof. We have σb(x) = π∈{π(t) | t∈σ(x)} {t ∈ T (Σ ∪ H) | π(t ) = π} . This is a finite union. Thus, σb(x) is regular by Cor. 12. Moreover, if σbio(L) ⊆ L(B, q0), then σio(L) ⊆ σbio(L) ⊆ L(B, q0) ⊆ L(BH , q0). Hence, let σio(L) ⊆ L(BH , q0). Since σio(L) ⊆ T (Σ), we may suppose σio(L) ⊆ L(B, q0). It suffices to show that, if s ∈ T (Σ ∪ X ) and σio(s) ⊆ L(B, q0), 0 0 0 then σbio(s) ∈ L(B, q0), that is, γ∞(s) ∈ L(B, q0) for all γ ∈ Γ(s, σb). Let γ ∈ Γ(s, σb). Then, by definition of σb, for every u ∈ Pos(s), where σ(s(u)) 6= ∅, we can choose a tree 0 γ(u) = t ∈ σ(s(u)) such that γ (u) ≡B γ(u). Thus, γ ∈ Γ(s, σ) and γ∞(s) ∈ L(B, q0) because 0 σio(s) ⊆ L(B, q0). By Prop. 11, we have γ∞(s) ∈ L(B, q0) too. Hence, σbio(s) ⊆ L(B, q0) as desired. J

T (Σ∪H)\H I Corollary 16. Let σ1, σ2 : X → 2 be regular substitutions such that σ1 ≤ σ2, R ⊆ T (Σ) be a regular tree language, and L ⊆ T (Σ ∪ X ) be any subset. Then there is an effectively computable finite set S2 of regular substitutions such that the following property T (Σ∪H)\H holds. For every σ : X → 2 satisfying both, σ1 ≤ σ ≤ σ2 and σio(L) ⊆ R 0 0 (resp. σio(L) = R), there is some maximal substitution σ ∈ S2 such that σio(L) ⊆ R 0 0 (resp. σio(L) = R) and σ ≤ σ ≤ σ2.

Proof. We may assume that R = L(B, q0). Throughout the proof, given any substitution σ, the notation σb refers to its saturation according to Def. 14. Moreover, by Prop. 15, every σb is a regular substitution and if σio(L) ⊆ R, then σio(L) ⊆ σbio(L) ⊆ R. Based on these facts, we define in a fist step the following set of substitutions.

T (Σ∪H)\H S0 = {σb | σ : X → 2 is a substitution}. (21)

|PB ×X | The set S0 is finite and effectively computable. Its cardinality is bounded by 2 . Since σ1 and σb are regular for every σb ∈ S0, we can effectively compute the following subset S1 of S0, too.

T (Σ∪H)\H S1 = {σb ∈ S0 | σ : X → 2 ∧ σ1 ≤ σ} = {σb ∈ S0} σ1 ≤ σ.b (22)

T (Σ∪H)\H The second equation in (22) uses σb = σb. Now, for every substitution σ : X → 2 satisfying σ1 ≤ σ ≤ σ2 we have σb ∈ S1. But σb ≤ σ2 might fail. However, since σ2 is regular, 0 0 we can calculate for each σb ∈ S1 the substitution σ such that σ (x) = σb(x) ∩ σ2(x) for all x ∈ X . Hence, the following finite set is effectively computable:

0 T (Σ∪H)\H 0 S2 = {σ : X → 2 | ∃ σb ∈ S1 ∀x ∈ X : σ (x) = σb(x) ∩ σ2(x)}. (23)

An easy reflection shows that for all σ such that σ1 ≤ σ ≤ σ2 and σio(L) ⊆ R there is some 0 0 0 substitution σ ∈ S2 such that σ ≤ σ ≤ σ2 and σio(L) ⊆ R. Since S2 is finite, there is a 0 0 0 maximal element σ in S2 such that σ ≤ σ ≤ σ2 and σio(L) ⊆ R. The assertion of Cor. 16 follows. J

I Remark 17. Note that Cor. 16 does not tell us how to decide whether there exists any substitution σ such that σio(L) ⊆ R or σio(L) = R. However, it tells us that if there exists such a substitution σ, then there there exist such a substitution in an effectively computable and finite set of regular substitutions. If this set is empty, then σio(L) ⊆ R does not hold. J

4.2 The specialization of σ

The purpose of this section is to add for each profile π = π(t) a specific finite tree tπ which satisfies the same profile and which depends on π, only. The roadmap for that is as follows. We enlarge the set Σ by various new function symbols in order to obtain a larger ranked alphabet ΣP . In particular, for each π ∈ PB there is a function symbol fπ and a finite tree tπ with tπ(ε) = fπ. The tree tπ is either a constant or it has height 2. Simultaneously, we C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 19

define an extension BP of BH such that for each t ∈ T (Σ ∪ H) with π = π(t) (w.r.t. the NTA BH ) we have π = π(t) = π(tπ) where π(t) and π(tπ) are defined with respect to BP . The application is for deciding whether a regular substitution σ : X → 2T (Σ∪H)\H satisfies σio(L) ⊆ L(B, q0). In the decision procedure, we compute, in a preprocessing phase, for each x the set of profiles Πx = {π(x) | t ∈ σ(x)}. The set Πx is defined w.r.t. BH , but this is the same as being defined w.r.t. BP . Hence, we can also write Πx = {π(x) | t ∈ σˇ(x)} where σˇ(x) is a finite set of finite trees tπ of height at most 2. The substitution σˇ yields the “specialization” of σ; and it is enough to decide σˇio(L) ⊆ L(B, q0). The formal definitions of fπ, ΣP and BP are in Def. 18 and Def. 19. Since every symbol fπ has a rank which only depends on π, but which will be defined through a tree t where 0 0 π = π(t), it is important to note that ∅= 6 π(t) = π(t ) implies leafi(t) 6= ∅ ⇐⇒ leafi(t ) 6= ∅ for all holes i ∈ H. This follows because if ∅= 6 π(t), then leafi(t) 6= ∅ if and only if for all tasks (p, ψ1, . . . , ψ|H|) ∈ π there is some q ∈ Q such that ψi(q) > 0. Indeed, let ρ ∈ Run(t, p) such that ρ |= τ. Then for all ij ∈ leafi(t) the run ρ yields some state ρ(ij) ∈ Q. Since 0 0 χρ(ij) > 0 we must have ψi(ρ(ij)) > 0. In particular, ∅= 6 π(t) = π(t ) implies H(t) = H(t ). Thus, for π ∈ P: if π = π(t), then H(t) depends on π but not on the chosen t.

I Definition 18. 1. For each t ∈ T (Σ ∪ H) we denote by H(t) the subset of holes i ∈ H such that leafi(t) 6= ∅. 2. The alphabet ΣP contains Σ and it contains, in addition, first a new function symbol $i of rank 1 for each i ∈ H, and second, for each t ∈ T (Σ∪H) and each profile ∅= 6 π = π(t) ∈ P there is a new function symbol fπ of rank |Q × H(t)|. In particular, if t is without holes, then fπ is a constant. We also include a new constant f∅ in case ∅ ∈ PB (but we will make sure that no run at any state of BP accepts f∅). 3. For each ∅= 6 π ∈ P we choose any tree t ∈ T (Σ ∪ H) such that π = π(t). Then we define the finite tree tπ which has the following structure. Its root is labeled by the symbol fπ, If t has no holes, then fπ is a constant. In the other case it has height 2. The children of the root are labeled by $i where i ∈ H(t); and if a vertex is labeled by $i, then its children are holes ij (and thus labeled by i ∈ H(t)). There are k = |Q × H(t)| holes, all of them belong to H(t), and each hole i ∈ H(t) appears exactly |Q| times (in some fixed order). The corresponding tree is depicted in Fig. 9. For H(t) 6= ∅ we have k ≥ 1, for H(t) = ∅ we have k = 0 and the picture shows the constant fπ.

We are ready to define an extension BP of the automaton BH such that for all t ∈ T (Σ ∪ H) the finite tree tπ satisfies t ≡BP tπ if and only if π(t) = π.

0 0 I Definition 19. We let BP = (Q , ΣP ∪ H, δP , χ ) denote the following parity-NTA. It is 0 an extension of BH . The set of states is Q = Q ∪ (C × Q × H). The coloring χ is extended to states in C × Q × H by χ0((c, q, h)) = c. The set of transitions δP of BP contains all transitions from BH (that is: δH = δ∪Q×H) and, in addition, for each ∅= 6 π(t) ∈ PB the following set of tuples.  Transitions p, fπ, (ps,j)(s,j)∈Q×H(t) for τ = (p, ψ1, . . . , ψ|H|) ∈ π such that

{ps,j | (s, j) ∈ Q × H(t)} = {(ψi(q), q, i) | q ∈ Q, i ∈ H(t), ψi(q) 6= 0} . (24)

The sequence (ps,j)(s,j)∈Q×H(t) is a tuple of length rk(fπ) and each state ps,j is a triple such that Eq.(24) holds. The tuple allows repetitions of entries ps,j.  Transitions (c, q, i), $i, q for all (c, q, i) ∈ C × Q × H(t).

The cardinality of the set {(ψi(q), q, i) | q ∈ Q, i ∈ H(t), ψi(q) 6= 0} in (24) varies. It can be any number between |H(t)| and rk(fπ); and it is not necessarily the same for all τ ∈ π. This is the reason, why triples (ψi(q), q, i) may occur several times. Note that for every τ ∈ π  there exists some transition p, fπ, (ps,j)(s,j)∈Q×H(t) in δP . 20 Regular matching problems for infinite trees

tπ = fπ p ∈ Q

$h1 ··· $hk (ψh1 (q1), q1, h1) ··· (ψhk (qk), qk, hk)

··· ··· h1 ∈ H(t) hk ∈ H(t) q1 ∈ Q qk ∈ Q

Figure 9 The tree tπ where k = rk(fπ) and an accepting run ρ on tπ for τ = (p, ψ1, . . . , ψ|H|) ∈ π.

I Lemma 20. Let t ∈ T (Σ ∪ H), π = π(t) ∈ PB, and tπ as defined above. Then for all tasks

τ ∈ TBP we have tπ |= τ ⇐⇒ τ ∈ π. In particular, t ≡BP tπ and π(tπ) = π.

Proof. For each τ = (p, ψ1, . . . , ψ|H|) ∈ π the NTA BP admits for tπ a successful run at p. This is clear for H(t) = ∅ because then tπ is a constant and (p, tπ) ∈ δP . For H(t) 6= ∅ the accepting run is depicted on the right of Fig. 9. It is accepting because Q × H ⊆ δH ⊆ δP . Hence, τ ∈ π implies tπ |= τ. The converse follows from the definition of δP . In particular, 0 tπ |= τ implies τ ∈ TB. Since Σ ⊆ ΣP and Q ⊆ Q , we have t ∈ T (ΣP ∪ H) and TB ⊆ TBP

and therefore, t ≡BP tπ. J

−1 −1 Prop. 22 reduces the computation of σio (R) to the computation of σˇio (R). Here, σˇ is the specialization of σ according to Def. 21 such that σˇ(x) is a subset of the finite set of finite trees {tπ | π ∈ PB}.

T (Σ∪H) I Definition 21. Let σ : X → 2 be a substitution and π ∈ P be a profile. The T (ΣP ∪H) specialization σˇ : X → 2 w.r.t. BP is defined by the substitution

σˇ(x) = {tπ ∈ T (ΣP ∪ H) | ∃t ∈ σ(x): π = π(t)} . (25)

Here, tπ is the tree defined according to Def. 18. That is, if π = π(t) for some t without holes, then tπ = fπ. Otherwise k ≥ 1 and tπ has height two according to Fig. 9 on the left-side.

Note that σ(x) = ∅ ⇐⇒ σˇ(x) = ∅ and t∅ ∈ σˇ(x) ⇐⇒ ∃t ∈ σ(x): t |= ∅.

0 0 I Proposition 22. Let R = L(B, q0) ⊆ T (Σ) and let ΣP and BP = (Q , ΣP ∪ H, δP , χ ) according to Def. 19. Let σ : X → 2T (Σ∪H)\H be any substitution and σˇ : X → 2T (ΣP ∪H) its specialization according to Def. 21. Then we have

{s ∈ T (Σ ∪ X ) | σio(s) ⊆ L(B, q0)} = {s ∈ T (Σ ∪ X ) | σˇio(s) ⊆ L(BP , q0)} . (26)

Proof. The equality {s ∈ T (Σ ∪ X ) | σio(s) ⊆ L(B, q0)} = {s ∈ T (Σ ∪ X ) | σbio(s) ⊆ L(BP , q0)} follows directly from the definition of BP . We apply Prop. 15 twice, once to B and once to BP . We obtain {s ∈ T (Σ ∪ X ) | σio(s) ⊆ L(BP , q0)} = {s ∈ T (Σ ∪ X ) | σio(s) ⊆ L(BP , q0)} as n o b

well as s ∈ T (Σ ∪ X ) σbˇio(s) ⊆ L(BP , q0) = {s ∈ T (Σ ∪ X ) | σˇio(s) ⊆ L(BP , q0)} . Note that σ (s) refers to σ(x) = {t0 ∈ T (Σ ∪ H) | ∃t ∈ σ(x): t0 ≡ t}. That is to bio b P BP

the saturation with respect to terms in T (ΣP ∪ H). By Lem. 20, we have tπ ≡BP t for every t ∈ σ(x) and profile π = π(t). Hence, {s ∈ T (Σ ∪ X ) | σio(s) ⊆ L(BP , q0)} = n o b

s ∈ T (Σ ∪ X ) σbˇio(s) ⊆ L(BP , q0) . Putting everything together we see

{s ∈ T (Σ ∪ X ) | σio(s) ⊆ L(B, q0)} = {s ∈ T (Σ ∪ X ) | σbio(s) ⊆ L(BP , q0)} n o

= s ∈ T (Σ ∪ X ) σbˇio(s) ⊆ L(BP , q0) = {s ∈ T (Σ ∪ X ) | σˇio(s) ⊆ L(BP , q0)} .

The last line uses Prop. 13. The result follows. J C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 21

5 Parity games

A directed (multi-)graph is a tuple G = (V,E, ι, o) where V is the set of vertices and the sets ι and o are functions from E to V . Here, ι(e) denotes the source of e and o(e) denotes the target of e. In our paper an arena is any tuple A = (V0,V1,E, ι, o) such that (V,E, ι, o) is a directed graph where V = V0 ∪ V1 and V0,V1 are disjoint. Phrased differently, every vertex is in exactly one subset Vi of V and we allow multiple edges between vertices. The maps ι and o are extended to all paths of the graph A in the natural way: if w = p0, e1, p1, . . . , em, pm is a path of A where m ≥ 0 and pi−1 = ι(ei), pi = o(ei) for all 1 ≤ i ≤ m, then we let ι(w) = p0 and o(w) = pm. A (board of a) parity game is defined by a pair (A, χ) where A is an arena and χ : V → C is a mapping to a set of colors C. Without restriction, we always assume that C = {1,..., |C|} and that |C| is odd. Let p0 ∈ V .A game at p0 is a finite or infinite sequence p0, e1, p1, e2 ... such that pi−1 = ι(e1), pi = o(ei) and ei ∈ E for i ≥ 1. Moreover, we require that a game is infinite unless it ends in 12 a sink. A sink is a vertex without outgoing edges . There are two players: P0 (Prover) and P1 (Spoiler). The rules of the game are as follows. It starts in the vertex p0. Let m ≥ 0 such that a path p0, e1, p1, . . . , em, pm is already defined with pm ∈ Vi. If pm is a sink, then player Pi loses. In the other case player Pi chooses an edge em+1 ∈ E such that pm = ι(em+1). Let pm+1 = o(em+1), then the game continues with p0, . . . , pm, em+1, pm+1. If the game does not end in a sink, then the mutual choices define an infinite se- quence. Prover P0 wins an infinite game if the least color which appears infinitely often in χ(p0), χ(p1),... is even. Otherwise, Spoiler P1 wins that game. A positional strategy for player Pi is a subset Ei ⊆ E such that for each u ∈ Vi there is at most one edge e ∈ Ei with ι(e) = u. Hence, every e ∈ Ei is an edge of the arena. However, the property that Ei is winning depends on the pair (ι(e), o(e)), only. Therefore, we can assume Ei ⊆ Vi × V . Each positional strategy defines a subarena Ai = (V,Ei ∪ {e ∈ E | ι(e) ∈ V1−i}). In the arena Ai player P1−i wins at a vertex p if and only if there exists some path starting at p which satisfies his winning condition defined above. Let Wi ⊆ V be the set of vertices p ∈ V of Ai where no path in Ai starting at p satisfies the winning condition of P1−i. Note that Wi depends on the positional strategy Ei. The set Wi contains all vertices where player Pi wins positional (also called memoryless) by choosing Ei. The set Wi is a set of winning positions for Pi in the original arena A, because player Pi is able to win, no matter how P1−i decides for pm ∈ V1−i on the next outgoing edge e ∈ E with ι(e) = pm.

I Theorem 23 ([17]). There exist positional strategies Ei = Vi × V ⊆ E for both players Pi such that their sets of winning positions Wi ⊆ V (w.r.t. the subarenas Ai) satisfy W1−i = V \ Wi. That is, V is the disjoint union of W1−i and Wi.

The theorem implies that for parity games there is no better strategy than a positional one. Thm. 23 is due to Gurevich and Harrington. Simplified proofs are e. g. in [15, 28].

5.1 Alternating tree automata

Let ∆ be a finite ranked alphabet with the rank function rk : ∆ → N. Nondeterministic tree automata are special instances of alternating tree automata. ATAs were introduced in [22] and further investigated in [23]. A parity-ATA for ∆ is a tuple A = (Q, ∆, δ, χ) where Q is a finite set of states, δ is the transition function, and χ : Q → C is a coloring with C = {1,,..., |C|} and where, without restriction, |C| is odd. For each (q, f) ∈ Q × ∆ there is exactly one transition which has the form q, f, Φ) where Φ is a positive Boolean formula

12 Some papers require that an arena has no sinks. 22 Regular matching problems for infinite trees

over the set [rk(f)] × Q and, moreover, Φ is written in disjunctive normal form13. This means that q, f, Φ) ∈ δ is written as _ ^  q, f, (dk, pk) (27)

j∈J k∈Kj

where J and Kj are finite sets and (dk, pk) ∈ [rk(x)] × Q. The first component dk ∈ [rk(f)] is also called a direction. With each j ∈ J there is an associated finite set of indices Kj. We use a syntax for Boolean formulae defined by a context-free grammar, but in the notation redundant brackets are typically removed as we did in (27). The syntax still allows repetitions of pairs (dk, pk). For example, we may have Kj = {k, `} with a conjunction  (dk, pk) ∧ (d`, p`) where (dk, pk) = (d`, p`). This will have no influence on the semantics which we define next, but it will introduce “multiple edges” in the corresponding game-arena, which will make the set of choices for Spoiler larger.14 By definition, let p ∈ Q, then L(A, p) is the set of trees s ∈ T (∆) such that Prover P0 has a winning strategy the following parity-game in the arena G(A, s) at vertex (ε, p). The set V0 of vertices belonging to Prover of the arena G(A, s) is defined by V0 = Pos(s)×Q. The color of (u, q) is χ(q). Next we define the set E0 of outgoing edges at (u, q) ∈ V0 by set of all (u, q, j) with j ∈ J and where J appears under the disjunction in (27) of the unique transition q,s (u), W V (d , p ) ∈ δ. We let ι(u, q, j) = (u, q) and we let j∈J k∈Kj k k o(u, q, j) = (u, q, j). We define the set of positions belonging to Spoiler by V1 = o(E0) and we let χ(u, q, j) = χ(q). Thus, target function o : E0 → V1 is surjective by definition and every vertex in V1 has an incoming edge. Note that for (u, q) ∈ V0 and any v ∈ V1 there is at most one edge e ∈ E with ι(e) = (u, q) and o(e) = v. No multiple edges occur here. This changes when we define the outgoing edges for vertices (u, q, j) ∈ V1. The outgoing edges are the quadruples (u, q, j, k) where k ∈ Kj for index sets under the conjunction in the transition q, s(u), W V (d , p ) ∈ δ. We define ι(u, q, j, k) = (u, q, j). The index k defines a j∈J k∈Kj k k pair (dk, pk) and then we let o(u, q, j, k) = (u.dk, pk). Thus, there can be many edges with the source (u, q, j) ∈ V1 and target (u.dk, pk). Thanks to the condition (dk, pk) ∈ [rk(x)] × Q we are sure that the position u.dk exists as soon as Kj 6= ∅. Let us phrase the definition of the arena from the perspective of the players if the game starts at some vertex (u, q) ∈ V0. Then Prover chooses, if possible, an index j ∈ J according to set E0. If J is empty, then Prover loses. (An empty disjunction is “false”.) In the other case, it is the turn of Spoiler P1, and the game continues at the vertex (u, q, j) belonging to P1 with color χ(q). If Kj is empty, then Spoiler loses. (An empty conjunction is “true”.) Otherwise, Spoiler chooses an index k ∈ Kj and the game continues at the vertex (u.dk, pk) ∈ V0 = Pos(s) × Q (which exists). That is, the game continues at the position of the dk-th child of u at state pk. Prover wins an infinite game if and only if the least color occurring infinitely often is even. A parity-NTA A is a special instance of an alternating automaton, where every transition W V  as in Eq.(27) has the special form q, f, j∈J i∈[rk(f)](i, pi) . For convenience we use only the traditional syntax that δ is a set of tuples q, f, p1, . . . , prk(f)). (28) The main results of alternating tree automata can be formulated as follows.

I Theorem 24 ([22, 23]). Let A be a parity-NTA as defined in Sec. 3.1 and p be a state. Then the following assertions hold.

13 Let S be any (finite) set. Then the set of positive Boolean formulae B+(S) is defined inductively. 1. The symbols ⊥ and > and all s ∈ S belongs to B+(S). (The elements of S are the atomic propositions.) 2. If ϕ, ψ ∈ B+(S), then (ϕ ∨ ψ) ∈ B+(S) and (ϕ ∧ ψ) ∈ B+(S). 14 This larger set of choices is explicitly used in the definition of wg in the proof of Lemma 26 C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 23

1. Viewing A as a special instance of a parity-ATA using (27) or using the traditional syntax (28) and its semantics according to Def. 7 yields the same set L(A, p). 2. Parity-ATAs characterize the class of regular tree languages: if L(A, p) is defined by a parity-ATA A at a state p, then we can construct effectively a parity-NTA B and a state q such that L(A, p) = L(B, q).

−1 5.2 Alternating tree automata for languages of type σio (R)

Thanks to Prop. 22 we know that σio(L) ⊆ L(B, q0) ⇐⇒ σˇio(L) ⊆ L(B, q0) where σˇio is the specialization of σ. Since σˇio(x) is, by construction, a finite set of finite trees, is enough to consider the concept of alternating tree automata in the setting where there is a finite set S T ⊆ Tfin(Σ ∪ H) \ H of finite trees such that {σ(x) | x ∈ X } ⊆ T . This is not essential but allows to use the traditional definition in (27) for transitions where the index sets J and Kj are assumed to be finite. I Remark 25. Recall that a choice function is defined for a substitution and yields an element Q Pos(s) in the cartesian product s∈T (Σ∪X ) T⊥(Σ ∪ H) . Let us show that the set of mappings Q Pos(s) X → T (Σ ∪ H) can be viewed as a subset of s∈T (Σ∪X ) T⊥(Σ ∪ H) . Indeed there is a natural embedding Y ι : T (Σ ∪ H)X → T (Σ ∪ H)Pos(s) (29) s∈T (Σ∪X )

Indeed, if ϕ ∈ T (Σ ∪ H)X and x = s(u), then ι(ϕ)(x) is defined canonically by ι(ϕ)(s(u)) = ϕ(x) for x ∈ X and by ι(ϕ)(s(u)) = f(1,..., rk(f)) for x = f ∈ Σ \X . It is also clear that ι is an embedding since x(1,..., rk(x)) ∈ T (Σ ∪ X ). Moreover, if ϕ ∈ T (Σ ∪ H)X is a homomorphism and γ = ι(ϕ), then ϕio(s) = {γ∞(s)} is a singleton. J

I Lemma 26. Let R ⊆ T (Σ) be a regular tree language and ϕ : X → Tfin(Σ ∪ H) \ H be a homomorphism to the set of finite trees. Let γ = ι(ϕ) be its canonically associated choice function as defined in (29). Then the set of trees

−1 ϕio (R) = {s ∈ T (Σ ∪ X ) | γ∞(s) ∈ R} is effectively regular.

Proof. We have R = L(B, q0) for some parity-NTA B = (Q, Σ, δ, χ) and q0 ∈ Q. Thus, it is enough to show that the set {s ∈ T (Σ ∪ X ) | γ∞(s) ∈ L(B, p)} is effectively regular for all p ∈ Q. As usual, we assume that χ : Q → {1,..., |C|} where |C| is odd. Let us define another parity-ATA A = (Q × C, Σ ∪ X , δA, χA) (where χA(q, c) = pr2(q, c) = c) such that for all p ∈ Q we have

{s ∈ T (Σ ∪ X ) | γ∞(s) ∈ L(B, p)} = L(A, (p, χ(p))). (30)

The set δA is defined by the following transitions _ ^  (p, c), x, i, (ρ(ij), cρ(ij)) (31) i ∈K ρ∈RunBH (γ(x),p) j ρ

The notation in (31) is as follows: x ∈ Σ ∪ X with γ(x) = x(1,..., rk(x)) and

Kρ = {ij ∈ Pos(γ(x)) | ∃ i ∈ H : ij ∈ leafi(γ(x))} (32)

Recall that for a tree t ∈ T (Σ ∪ H) and a run ρ ∈ RunBH (t, p), the color cρ(ij) denotes the minimal color in the tree ρ(t) which appears on the unique path from the root ε to the leaf ij.

If RunBH (γ(x), p) = ∅, then the disjunction is over the empty set, hence the transition in 24 Regular matching problems for infinite trees

(31) becomes (p, x, false). If RunBH (γ(x), p) 6= ∅ but γ(x) is a tree without holes, then the transition in (31) becomes (p, x, true).

Using Thm. 24 it is enough to show that for all p ∈ Q we have γ∞(s) ∈ L(B, p) if and only if Prover P0 has a winning strategy in the associated parity game for the arena G(A, s) at vertex (ε, (p, χ(p))) where A is the parity-ATA above. Note that the index set Kρ allows Spoiler to pick any hole which appears in γ(x).

One direction is easy. If γ∞(s) ∈ L(B, p), then there exists an accepting run ρ0 ∈ RunB(γ∞(s), p) and Prover can react to all choices of Spoiler by choosing ρ = ρx,p in the disjunction in (31) as parts of the global run ρ. Note that due to the second component cρ(ij) in the states, every infinite game α corresponds to a unique infinite directed path in ρ0(γ∞(s)) such that the minimal color occurring infinitely often on that infinite path is the same color as the the minimal color occurring infinitely often in game α. Prover’s winning strategy can also be explained by looking at the right tree depicted in Fig. 8. Prover begins by choosing the partial run of ρ0 on the top triangle. Spoiler has no other option than to choose some leaf ij. This leads to the second triangle, and Prover chooses again the partial run according to ρ0 etc. Since γ∞(s) ∈ L(B, p), Spoiler cannot win. For the other direction, let us assume that Prover has a winning strategy in the arena G(A, s) at vertex (ε, (p, χ(p))). Then Prover P0 has a positional winning strategy by [17].

We will show that this winning strategy of P0 defines an accepting run ρ0 ∈ RunB(γ∞(s), p). The positional strategy chosen by Prover P0 defines for each vertex belonging to P0 at most one outgoing edge, all other outgoing edges at that vertex are deleted from the arena. After the deletion of edges, Prover is released and without any further interaction in the game. The game becomes a solitaire game where the outcome depends only on the choices of Spoiler. Henceforth, all games are defined in a subarena G0(A, s) where all games at vertex (ε, (p, χ(p))) are won by Prover. Moreover, we may assume that all vertices in G0(A, s) are reachable from the vertex (ε, (p, χ(p))). Thus, without restriction, every vertex in G0(A, s) belonging to Prover has exactly one outgoing edge. In particular, since all nodes are reachable from (ε, (p, χ(p))), all finite games starting at any vertex in G0(A, s) are won by Prover, and if α is any directed infinite path starting at any vertex in G0(A, s), then the minimal color occurring infinitely often at vertices of α is even. Whenever a game reaches a vertex (u, (q, c)) via a finite directed path in G0(A, s) starting

at (ε, (p, χ(p))), then RunBH (γ(s(u)), q) 6= ∅, and the unique outgoing edge defines a run

ρ ∈ RunBH (γ(s(u)), q) 6= ∅ which depends on (u, q, c), only. For better reading we denote 15 the run ρ as ρ(u, q, c), too . Note that, if (q, c) = (ρ(ij), cρ(ij)), then the run ρ(u, q, c) does not reveal the position of the leaf ij, in general. That implies that P0 chose ρ(u, q, c) without knowing i or ij, in general. The outgoing edge (which defines ρ) ends at the vertex (u, (q, c), ρ) belonging to spoiler. In the following we assume without restriction that all games α start at vertices belonging to P0. Moreover, if α is finite, then α ends in a vertex belonging to Spoiler.

Our aim is construct an accepting run of γ∞(s). The run is constructed top-down by

induction. We use the following notation. Let u ∈ Pos(s). We say that ρ ∈ RunBH (γ(s(u)), q) is a partial run of the subtree γ∞(s|u) if the run labels a prefix closed subset of positions in γ∞(s|u) with states. The partial run labels the root with q. (Recall that (s|u) denotes the subtree of s rooted at u.) A decision sequence for Spoiler is a finite or infinite sequence

g(α) = (g1, g2,...) where each g` is of the form g` = (h`)j` with h` ∈ H for all ` ≥ 1 such that g(α) records all choices of Spoiler in the game α. The full information about the solitaire game α in G0(A, s) starting at (ε, (p, χ(p)) is encoded in the sequence g(α).

15 Thus, the same letter ρ denotes a function or the specific value of that function. The context makes clear what we mean. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 25

Let D(s) be set of all decision sequences of Spoiler. In the next step, we define four items for every finite prefix g = (g1, . . . , gk) of a decision sequence g(α) in D(s).

A path wg in γ∞(s). The other items are defined through the path wg. A position v(wg) in γ∞(s). A “terminal” position o(wg) = (u, (q, c)) in V0 = Pos(s) × (Q × C) such that the tree γ(s(u)) appears as an IO-prefix of the subtree γ∞(s|u).

For o(wg) = (u, (q, c)) a partial run ρg ∈ RunBH (γ(s(u)), q) of the subtree γ∞(s|u).

If g is the empty sequence, then no choice of Spoiler has been recorded. We let wg be the empty path, v(wg) = ε ∈ Pos(γ∞(s)), and o(wg) = (ε, (p, χ(p)) ∈ V0. The game α starts at that vertex (ε, (p, χ(p)) belonging to P0. The game defines a unique run ρ1 ∈ RunBH (γ(s(ε)), p). If no hole appears in γ(s(ε)), then γ∞(s) is equal to γ(s(ε)). We are done in this case: knowing ρε = ρ1, all four items are defined if g is the empty sequence. In the other case there exist some i ∈ H and ij ∈ leafi(γ(s(u))). Thus, the game α is 0 not finished; and Spoiler decides on some leaf labeled by a hole. Let g = (g1, . . . , gk+1) be corresponding prefix. By induction, wg is defined for g = (g1, . . . , gk). Let o(wg) = (u, (q, c)), and x = s(u). Hence, there is a tree tu = γ(x), and, by induction, tu is an IO-prefix of the subtree γ∞(s)|v of γ∞(s) rooted at the position v = v(wg) in γ∞(s). Note that u ∈ Pos(s) and v ∈ Pos(γ∞(s)). In particular, since tu is an IO-prefix, every position v in tu can be identified with a unique position pos(v) ∈ Pos(γ∞(s)) by the endpoint of the path wg.wu(v). Here, wu(v) denotes the unique directed path in tu from the root to the position v in tu. The choice gk+1 of spoiler defines a position ij ∈ leafi(tu) for some i ∈ [rk(x)]. We let wg0 be the unique path which starts at the the root has wg as prefix and stops in pos(ij) ∈ Pos(γ∞(s)). Then we define v(wg0 ) = pos(ij). It is the endpoint of the path wg0 . The unique outgoing

0 edge at o(wg) = (u, (q, c)) defines a run ρg ∈ RunBH (γ(s(u)), q). Since gk+1 is defined, the 0 0 position u.i ∈ Pos(s) exists. and we let o(wg0 ) = (u.i, (ρ (ij), cρ0 (ij))) where ρ = ρg0 . Note that the tree γ(s(u.i)) appears as an IO-prefix of the subtree γ∞(s|u.i). Thus, all desired items are defined for the sequence g0. s See Fig. 7 how a finite directed path w in γ∞(s) starting at the root defines a corresponding path in s. The figure also shows that w defines a unique position in v(w) ∈ Pos(γ∞(s)) via the arrows which start in γ∞(s), leave the tree on the left, and make a clockwise turn upwards to enter the tree from the right. This visualizes the formal definition of a path wg and its position v(wg) in γ∞(s). 0 0 0 0 Suppose that o(w) = (u, (q, c)) and o(w ) = (u, (q , c )) where w = wg and w = wg0 are finite prefixes of decision sequences for Spoiler. We claim that if (q, c) 6= (q0, c0), then the positions of v(w) and v(w0) are incomparable. That is, there is no directed path between 0 0 0 0 them. Indeed, if (q, c) = (ρ(ij), cρ(ij)) and (q , c ) = (ρ(ij0 ), cρ(ij0 )), then we must have 0 ij0 6= ij. The claim follows by induction on |w|. Details are left to the reader. Thus, the 0 0 runs ρ(u, q, c) ∈ RunBH (γ(s(u)), q) and ρ(u, q , c ) ∈ RunBH (γ(s(u)), q) (chosen by P0 in his positional strategy) might be different, for example because c 6= c0. Since the positions v(w) and v(w0) are incomparable, we can use ρ(u, q, c) to define a partial run of the subtree rooted 0 0 at v(w) ∈ Pos(γ∞(s)) and we can use ρ(u, q , c ) to define a partial run of the subtree rooted at v(w0). Let n ∈ N. Consider all finite directed paths w in γ∞(s) beginning at the root such that |w| ≤ n, and among them those which are induced by finite prefixes of decision sequences g of Spoiler. Let call them wg. Then the the runs ρg constructed above for wg yield partial runs ρen of γ∞(s)). Note that these partial runs are compatible: for every common positions of two such partial runs, the labels (being states) are the same. This follows because wg = wg0 implies ρ (v(w )) = ρ 0 (v(w 0 )). Moreover, the sequence (ρ ) has a well-defined limit g g g g en n∈N ρ = lim ρ . Indeed, for every k ≥ 0 there is some n such that ρ (v) = ρ (v) for all 0 n→∞ en k en enk n ≥ nk and all positions v ∈ Pos(γ∞(s)) having a distance less than k to the root. 26 Regular matching problems for infinite trees

Therefore, the set of all directed paths in the arena G0(A, s) starting at (ε, (p, χ(p))) yields ρ0 as a (totally defined) run ρ0 : Pos(γ∞) → Q such that ρ0(ε) = p. (We write ρ0 because its definition depends on the positional winning strategy of P0.) We have to show that the run ρ0 is accepting. For that we have to consider all directed paths in the tree defined by ρ0. If a path ends at a leaf, then it is accepting because ρ0 is a run. Every infinite path in ρ0 is due to some infinite game α starting at (ε, (p, χ(p))). The game defines a decision sequence g(α) and the run ρ0 labels all vertices on the unique infinite

direct path β which visits all positions v(wgk ) where gk runs over all finite prefixes of g. By construction, the minimal color c(α) appearing infinitely often in the game α is the same color as occurring infinitely often on β. Since Prover wins all games, c(α) is even. Hence, the run ρ0 is accepting. Thus, the assertion we were looking for. J

6 Main results: The inclusion problem into regular sets

Henceforth, if σ : X → 2T (Σ∪H)\H is a substitution and R ⊆ T (Σ) is a subset, then we let

−1 σio (R) = {s ∈ T (Σ ∪ X ) | σio(s) ⊆ R} . (33)

Eq.(33) extends the earlier notation used for homomorphisms; and following lemma generalizes Lem. 26 by switching from a homomorphism γ to a regular substitution σ.

T (Σ∪H)\H I Lemma 27. Let R ⊆ T (Σ) be regular and σ : X → 2 be a regular substitution. −1 Then the set σio (R) = {s ∈ T (Σ ∪ X ) | σio(s) ⊆ R} is effectively regular. Proof. Let us begin to prove the lemma with the notation Σ0, R0, and σ0. Thus, formally, 0 0 0 T (Σ0∪H)\H 0−1 0 we begin with R ⊆ T (Σ ) and σ : X → 2 . The aim is to show that σ io (R ) = 0 0 0 {s ∈ T (Σ ∪ X ) | σio(s) ⊆ R } is effectively regular. For that we introduce a new constant a and we define a larger alphabet than Σ0 by letting Σ to be the disjoint union of Σ0 and {a}. We have R0 ∪ σ0(x) ⊆ T (Σ0 ∪ H) \ H for all x ∈ X . Define σ(x) = σ0(x) ∪ (T (Σ) \ T (Σ0)) for all x ∈ X and R = R0 ∪ (T (Σ) \ T (Σ0)) ⊆ T (Σ). Then σ and R are regular. Moreover, R contains all trees which have a leaf labeled by a. For all s ∈ T (Σ0) we have

0 0 −1 σio(s) ⊆ R ⇐⇒ σio(s) ⊆ R ⇐⇒ s ∈ σio (R).

−1 −1 0 Moreover, if σio (R) is regular, then σio (R) ∩ T (Σ ) is regular, too. Thus, it is enough to prove the lemma under the assumptions that first, σ(x) 6= ∅ for all x ∈ X and second, there is a constant a ∈ Σ such that T (Σ) \ T (Σ \{a}) ⊆ R. Let R = L(B, q0) for some parity-NTA B. According to Def. 19 in Sec. 4.2 we embed the NTA B first into BH and then BH into the parity-NTA BP . We also embed T (Σ) into T (ΣP ) such that both, R ⊆ L(BP , q0) and for every profile π = π(t) ∈ PB there is some finite tree tπ ∈ Tfin(ΣP ) satisfying π(tπ) = π. Since every tree with a leaf labeled by a belongs to R, we may assume that a ∈ L(B, p) for all states p. Hence, if t |= τ ∈ TB, then t[ij ← a] |= τ ∈ TB, too. In the following let Re = L(BP , q0). Since σ(x) is regular for all x ∈ X we can compute the specialization σˇ of σ as defined in Def. 21. Using Eq.(26) in Prop. 22 we can state

∀s ∈ T (Σ ∪ X ): σio(s) ⊆ R ⇐⇒ σio(s) ⊆ Re ⇐⇒ σˇio(s) ⊆ Re (34)

Define a new set of variables by

0 X = {(x, π) ∈ (Σ ∪ X ) × PB | ∃t ∈ σ(x): π = π(t)} with rk(x, π) = rk(x). (35)

Note that for all f ∈ Σ there exists a variable f, π(f(1,..., rk(f))) ∈ X 0. That is why, below, it is enough to work with T (X 0) rather than with T (Σ ∪ X 0). We use the second component 0 0 of (x, π) ∈ X to define a homomorphism γ : X → Tfin(ΣP ∪ H) \ H by γ(x, π) = tπ where C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 27

tπ was defined is Fig. 9. (Recall that tπ satisfies π = π(tπ).) Then, by Eq.(25), for every x ∈ X the following equation holds.

σˇ(x) = {tπ ∈ T (ΣP ∪ H) | ∃t ∈ σ(x): π = π(t)} = {γ(x, π) | ∃t ∈ σ(x): π = π(t)} . (36)

Since σ(x) 6= ∅, the projection onto the first component (x, π) 7→ x defines a surjective 0 mapping ψ∞ : T (X ) → T (Σ) by the homomorphism

ψ : X 0 → T (Σ ∪ X ∪ H) \ H, (x, π) 7→ x(1,..., rk(x)). (37)

0 0 0 0 A top-down induction shows Pos(s ) = Pos(ψ∞(s )) for all s ∈ T (X ). Also note that −1 σˇio(s) = γ∞(ψ∞ (s)) for all s ∈ T (Σ ∪ X ). Indeed, consider any u ∈ Pos(s) with x = s(u). −1 −1 Then h (x) = {(x, π) | ∃t ∈ σ(x): t |= π}. Hence, γ(h (x)) = {tπ | ∃t ∈ σ(x): t |= π} = σˇ(x) where the last equation follows by Eq.(36). Therefore,

−1 −1 σio (R) =σ ˇio (Re) by (34) n o −1 = s ∈ T (Σ ∪ X ) γ∞(ψ∞ (s)) ⊆ Re n o −1 −1 = s ∈ T (Σ ∪ X ) ψ∞ (s) ⊆ γ∞ (Re) . n o −1 −1 −1 It enough to show that T (Σ ∪ X ) \ σio (R) = s ∈ T (Σ ∪ X ) ψ∞ (s) ∩ γ∞ (Re) 6= ∅ = 0 −1  −1 ψ∞ T (X ) \ γ∞ (Re) is regular. The set γ∞ (Re) is regular by Lem. 26. We can write 0 −1 0 0 0 0 T (Σ ∪ X ) \ γ∞ (Re) = L(A , p0) for a parity-NTA A = (Q, Σ ∪ X , δ , χ). It is therefore 0 enough to construct a parity-NTA A = (Q, Σ ∪ X , δ, χ) such that L(A, p0) = ψ∞(L(A , p0)). The construction of A is straightforward by using another transition relation. We define δ by the following equivalence where x ∈ Σ ∪ X :

 0 (p, x, q1, . . . , qr) ∈ δ ⇐⇒ ∃π ∈ PB : p, (x, π), q1, . . . , qr ∈ δ . (38)

The assertion follows. J By a class of tree languages we mean a family of sets C(∆), indexed by all finite ranked alphabets ∆, such that first, we have C(∆) ⊆ T (∆) and second, for every rank-preserving inclusion ϕ : ∆ → Γ, the tree-homomorphism defined by ϕ induces an inclusion C(∆) → C(Γ).

I Definition 28. We say that a class of tree languages C satisfies the property INREG, if the following inclusion problem into regular sets is decidable: For every finite ranked alphabet ∆, on input L ∈ C(∆) (given in some effective way) and a regular tree language K ⊆ T (∆) (given, say, by some parity-NTA) the problem “L ⊆ K?” is decidable.

I Theorem 29. Let C be a class of tree languages fulfilling the property INREG of Def. 28. Then the following decision problem is decidable.

Input: Regular substitutions σ1, σ2 and tree languages L ⊆ T (Σ ∪ X ), R ⊆ T (Σ) such that R is regular and L ∈ C(Σ ∪ X ). T (Σ∪H)\H Question: Is there some substitution σ : X → 2 satisfying both, σio(L) ⊆ R and σ1 ≤ σ ≤ σ2?

Moreover, we can effectively compute the set of maximal substitutions σ satisfying σio(L) ⊆ R and σ1 ≤ σ(x) ≤ σ2. It is a finite set of regular substitutions.

Proof. We let R = L(B, q0) for some parity NTA B. First, we check that σ1(x) ⊆ σ2(x) for all x ∈ X because otherwise there is nothing to do. By definition, we have σ1(f) = σ2(f) = 28 Regular matching problems for infinite trees

f(1,..., [rk(f)]) for all f ∈ Σ \X . Thus, without restriction, σ1(x) and σ2(x) are defined as T (Σ∪H)\H regular sets for all x ∈ Σ ∪ X with σ1 ≤ σ2. If there is any substitution σ : X → 2 satisfying both, σio(L) ⊆ R and σ1 ≤ σ ≤ σ2, then σ1 is the unique minimal substitution 0 satisfying that property. The substitution σ defined by σb1(x) ∩ σ2(x) satisfies that property, too. It belongs to the following effectively computable finite set S2 of regular substitutions

0 T (Σ∪H)\H 0 S2 = {σ : X → 2 | ∃ σ ∀x ∈ X : σ1(x) ⊆ σ (x) = σb(x) ∩ σ2(x)}. (39)

The set S2 is the same as defined above in (23) in the proof of Cor. 16. The notation in T (Σ∪H)\H Eq.(39) refers substitutions σ : X → 2 and their saturations σb w.r.t. the NTA B. Thus, if the set S2 is empty, then we can stop. The answer to the question in Thm. 29 is negative; and the set of maximal substitutions σ satisfying σio(L) ⊆ R and σ1 ≤ σ(x) ≤ σ2 is empty. Thus, we may assume S2 =6 ∅, which is computable by Cor. 16. Next, in a second phase we consider each element σ ∈ S2, one after another. By Lem. 27 we know that −1 for each σ ∈ S2, the set σio (R) is an effectively regular set. The assertions σio(L) ⊆ R −1 −1 and L ⊆ σio (R) are equivalent. Since L ∈ C we can check L ⊆ σio (R). Thus, we end 0 up with an effectively computable subset S2 of S2 such that there is some substitution T (Σ∪H)\H 0 σ : X → 2 satisfying both, σio(L) ⊆ R and σ1(x) ⊆ σ(x) ⊆ σ2(x) for all x ∈ X 0 0 if and only if S2 6= ∅. In case S2 = ∅ the answer to the question in Thm. 29 is negative, 0 again. Hence, without S2 6= ∅. The maximal substitutions σ satisfying both, σio(L) ⊆ R 0 0 0 and σ1(x) ⊆ σ(x) ⊆ σ2(x) for all x ∈ X are in S2. Since S2 is a nonempty, finite, and 0 computable set of regular substitutions, we can compute all maximal element(s) in S2. J Note that, in the above theorem, C can be chosen strictly larger than the class of all regular languages and still fulfill the hypothesis INREG. For example we can define C(∆) as consisting of all the regular languages in T (∆) augmented with all context-free languages over Tfin(∆) as defined in [7, 11, 16, 26]. I Remark 30. The hypothesis INREG cannot be removed in Thm. 29. For a counter-example we can choose the family C such that C(∆) consists of all regular subsets of T (∆) together with all the subsets T (∆) \ K where K is a context-free subset of Tfin(∆). The inclusion problem: “R ⊆ L?” for R regular and L context-free, reduces to the problem: T (∆∪H)\H ∃σ : X → 2 : σ(T (∆) \ L) ⊆ (T (∆) \ R) where σ1 = σ2 = H = ∅. But the inclusion of a regular language into a context-free language is undecidable, showing that the substitution-inclusion problem for the class C is undecidable. J The following corollary considers the special case where we demand that substitutions map variables to nonempty subsets of trees. This setting is rather natural and it is also the setting when Conway studied the problem for finite words. Cor. 31 is a straightforward consequence of Thm. 29 and the fact that saturated solutions are regular.

I Corollary 31. Let C be a class of tree languages fulfilling the property INREG as in Thm. 29. Then the following decision problem is decidable.

Input: Tree languages L ⊆ T (Σ ∪ X ), R ⊆ T (Σ) such that R is regular and L ∈ C(Σ ∪ X ). T (Σ∪H)\H Question: Is there some substitution σ : X → 2 satisfying both, σio(L) ⊆ R and σ(x) 6= ∅ for all x ∈ X ?

Moreover, we can effectively compute the set of maximal substitutions σ satisfying σio(L) ⊆ R and σ(x) 6= ∅ for all x ∈ X . It is a finite set of regular substitutions.

Proof. By definition of regularity, we have R = L(B, q0) for some parity-NTA B. We run the following nondeterministic decision procedure. For each x ∈ X we guess a profile πx ∈ PB. Then we check that there is some t ∈ T (Σ ∪ [rk(x)]) \ H such that πx = π(t). If there is no such t, then the assertion in the corollary has a negative answer for that guess. If there exists C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 29

such t, then we let σ1(x) = {t ∈ T (Σ ∪ [rk(x)]) \ H | π = π(t)} and σ2(x) = T (Σ∪[rk(x)])\H. We have σb1 = σ1 and therefore σ1 is regular by Prop. 15. We now apply Thm. 29. The assertion in the corollary is positive if and only if for some guess the application of Thm. 29 yields a positive answer with a set of maximal solutions. From these data we can compute the set of maximal solutions where σ(x) 6= ∅ for all x ∈ X . Indeed, let σ(x) 6= ∅ for all x ∈ X . Then there is some tx ∈ σ(x) and we can define πx = π(tx). The non-deterministic procedure can guess the mapping x 7→ πx and we obtain the maximal solutions for that guess. Simulating all guesses and collecting the maximal solutions for each guess yields the result. J

7 Conclusion and open problems

The main result of the paper is Thm. 29. We included the special case Cor. 31 because the assertion in the corollary corresponds exactly to the results for regular languages over finite words due to Conway as stated in the introduction, Sec. 1. The assertions of Thm. 29 and Cor. 31 are positive decidability results, but we don’t know any (matching) lower and upper complexity bounds except for a few special cases. Various other natural problems are open, too. For example, a remaining question is whether it is possible to derive positive results for the outside-in extension σoi. Another puzzling problem is that we don’t know how to decide for regular tree languages L and R whether there exists a substitution σ such that σio(L) = R. We have seen that if such a substitution σ exists, then σ is in a finite and effectively computable set of regular substitutions. Thus, the underlying problem has no existential quantifier: decide “σio(L) = R?” on input L, R, and σ where L, R, and σ are regular. If the answer is yes: σio(L) = R, then σio(L) is regular. So, one could try to solve that problem. Recall that the problem is decidable in the setting of finite trees and homomorphisms by [9, 14].

Actually, we don’t know how to decide the problem “σio(L) = T (Σ)?” where L ⊆ T (X ) and σ(x) = T (Σ ∪ [rk(x)]) for all x ∈ X . Both problems “∃σ : σio(L) = R?” and “∃σ : σoi(L) ⊆ R?” remain open, even if we restrict ourselves to deal with finite trees, only. It is also open whether better results are possible if we restrict R (or L and R) to smaller classes of regular tree languages like the class of languages with Büchi acceptance.

References 1 Sebastian Bala. Complexity of regular language matching and other decidable cases of the satisfiability problem for constraints between regular open terms. Theory of Computing Systems, 39(1):137–163, 2006. 2 Achim Blumensath. Regular tree algebras. Logical Methods in Computer Science, 16, February 2020. URL: https://lmcs.episciences.org/6101. 3 Mikołaj Bojańczyk. Recognisable languages over monads. ArXiv e-prints, abs/1502.04898, 2015. URL: http://arxiv.org/abs/1502.04898, arXiv:1502.04898. 4 Julius Richard Büchi. On a decision method in restricted second-order arithmetic. In Proc. Int. Congr. for Logic, Methodology, and Philosophy of Science, pages 1–11. Stanford Univ. Press, 1962. 5 Hubert Comon, Max Dauchet, Rémy Gilleron, Florent Jacquemard, Denis Lugiez, Christof Löding, Sophie Tison, and Marc Tommasi. Tree automata techniques and applications, 2007. http://www.grappa.univ-lille3.fr/tata. 6 John Horton Conway. Regular algebra and finite machines. Chapman and Hall, London, 1971. 7 Bruno Courcelle. A representation of trees by languages. II. Theoretical Computer Science, 7(1):25–55, 1978. URL: https://doi-org.docelec.u-bordeaux.fr/10.1016/0304-3975(78) 90039-7, doi:10.1016/0304-3975(78)90039-7. 30 Regular matching problems for infinite trees

8 Bruno Courcelle. Fundamental properties of infinite trees. Theoretical Computer Science, 25:95–169, 1983. URL: https://doi-org.docelec.u-bordeaux.fr/10.1016/0304-3975(83) 90059-2, doi:10.1016/0304-3975(83)90059-2. 9 Carles Creus, Adrià Gascón, Guillem Godoy, and Lander Ramos. The HOM problem is EXPTIME-complete. SIAM J. Comput., 45:1230–1260, 2016. URL: https://doi.org/10. 1137/140999104, doi:10.1137/140999104. 10 Volker Diekert, Manfred Kufleitner, Gerhard Rosenberger, and Ulrich Hertrampf. Discrete Algebraic Methods. Arithmetic, Cryptography, Automata and Groups. Walter de Gruyter, 2016. 11 Joost Engelfriet and Erik Meineche Schmidt. IO and OI. I. J. Comput. Syst. Sci., 15:328–353, 1977. URL: https://doi.org/10.1016/S0022-0000(77)80034-2, doi:10.1016/ S0022-0000(77)80034-2. 12 Joost Engelfriet and Erik Meineche Schmidt. IO and OI. II. J. Comput. Syst. Sci., 16:67–99, 1978. URL: https://doi.org/10.1016/0022-0000(78)90051-X, doi:10.1016/ 0022-0000(78)90051-X. 13 Seymour Ginsburg and Thomas N. Hibbard. Solvability of machine mappings of regular sets to regular sets. J. ACM, 11:302–312, 1964. URL: http://doi.acm.org/10.1145/321229.321234, doi:10.1145/321229.321234. 14 Guillem Godoy and Omer Giménez. The HOM problem is decidable. J. ACM, 60:23:1–23:44, 2013. URL: http://doi.acm.org/10.1145/2501600, doi:10.1145/2501600. 15 Erich Grädel, Wolfgang Thomas, and Thomas Wilke, editors. Automata, Logics, and In- finite Games: A Guide to Current Research, volume 2500 of Lecture Notes in Computer Science. Springer, 2002. URL: https://doi.org/10.1007/3-540-36387-4, doi:10.1007/ 3-540-36387-4. 16 Irène Guessarian. Pushdown tree automata. Mathematical Systems Theory, 16(4):237–263, 1983. URL: https://doi-org.docelec.u-bordeaux.fr/10.1007/BF01744582, doi:10.1007/ BF01744582. 17 Yuri Gurevich and Leo Harrington. Trees, automata, and games. In Harry R. Lewis, Barbara B. Simons, Walter A. Burkhard, and Lawrence H. Landweber, editors, Proceedings of the 14th Annual ACM Symposium on Theory of Computing, May 5-7, 1982, San Francisco, California, USA, pages 60–65. ACM, 1982. URL: https://doi.org/10.1145/800070.802177, doi:10. 1145/800070.802177. 18 Dexter Kozen. Lower bounds for natural proof systems. In Proc. of the 18th Ann. Symp. on Foundations of Computer Science, FOCS’77, pages 254–266, Providence, Rhode Island, 1977. IEEE Computer Society Press. 19 Michal Kunc. The power of commuting with finite sets of words. Theory of Computing Systems, 40:521–551, 2007. 20 Michal Kunc and Alexander Okhotin. Language equations. In Jean-Éric Pin, editor, Handbook of Automata, Vol. I, pages 765–800. EMS Publishing House, Berlin, 2021. Final version (2018) available at the homepage of Okhotin. 21 Saunders MacLane. Categories for the Working Mathematician. Springer-Verlag, New York, 1971. Graduate Texts in Mathematics, Vol. 5. 22 David E. Muller and Paul E. Schupp. Alternating automata on infinite trees. Theoretical Com- puter Science, 54:267–276, 1987. URL: https://doi.org/10.1016/0304-3975(87)90133-2, doi:10.1016/0304-3975(87)90133-2. 23 David E. Muller and Paul E. Schupp. Simulating alternating tree automata by nondeter- ministic automata: New results and new proofs of the theorems of Rabin, McNaughton and Safra. Theoretical Computer Science, 141:69–107, 1995. URL: https://doi.org/10.1016/ 0304-3975(94)00214-4, doi:10.1016/0304-3975(94)00214-4. 24 Christophe Prieur, Christian Choffrut, and Michel Latteux. Constructing sequential bijections. In Structures in logic and computer science, volume 1261 of Lecture Notes in Computer Science, pages 308–321. Springer, Berlin, 1997. URL: https://doi-org.docelec.u-bordeaux.fr/10. 1007/3-540-63246-8_19, doi:10.1007/3-540-63246-8_19. 25 Michael O. Rabin. Decidability of second-order theories and automata on infinite trees. Transactions of the American Mathematical Society, 141:1–35, 1969. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 31

26 Karl M. Schimpf and Jean H. Gallier. Tree pushdown automata. Journal of Computer and System Sciences, 30(1):25–40, 1985. URL: https://doi-org.docelec.u-bordeaux.fr/10. 1016/0022-0000(85)90002-9, doi:10.1016/0022-0000(85)90002-9. 27 Wolfgang Thomas. Automata on infinite objects. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, chapter 4, pages 133–191. Elsevier Science Publishers B. V., 1990. 28 Wiesław Zielonka. Infinite games on finitely coloured graphs with applications to automata on infinite trees. Theoretical Computer Science, 200:135–183, 1998. URL: https://doi.org/ 10.1016/S0304-3975(98)00009-7, doi:10.1016/S0304-3975(98)00009-7.

8 Appendix: some additional material

As noticed in the preamble to the present paper: it is not necessary to read anything in the appendix to understand the main results. L’annexe, c’est l’art pour l’art: ars gratia artis.

8.1 Conway’s result for finite and infinite words This section is meant for readers who are interested to see proofs of our main results in the special (and simpler) case of finite and infinite words. Let Σ and X be finite alphabets and let Σ∞ denote the set of finite and infinite words. That is Σ∞ = Σ∗ ∪ Σω where Σω is the set of infinite words. The aim is to give an essentially self-contained proof for (a generalization of) Conway’s result with respect to # ∈ {⊆, =} for subsets of Σ∞. For that we consider “regular constraints” in the spirit of Thm. 29. The proof for Σ∗ is given with all details. Next, we explain that the same approach works for infinite words, too. However, we are more sketchy and our proof uses the well-known result that regular ω-languages form an effective Boolean algebra [4]. This means that on input ω-languages L1,L2 (for example, specified by Büchi automata) we can construct Büchi ω ω automata for Σ , L1 ∪ L2, and Σ \ L1. There is a way to avoid an explicit complementation, for example by using ultimately period words as a witness to show that L1 \ L2 6= ∅.

Σ+ I Proposition 32. Let σ1, σ2 be mappings from Σ ∪ X to 2 such that for i = 1, 2 first, σ1(a) = {a} for every a ∈ Σ \X and second, σi(x) is a regular language for every x ∈ X . Then for regular languages L ⊆ (Σ ∪ X )∞ and R ⊆ Σ∞ the following assertions hold for # ∈ {⊆, =}.

1. The following decision problem is decidable.

“∃σ : σ(L) # R ∧ ∀x : σ1(x) ⊆ σ(x) ⊆ σ2(x)?” (40)

2. Define σ ≤ σ0 by σ(x) ⊆ σ0(x) for all x ∈ X . Then every solution σ of (40) is bounded from above by a maximal solution; and the number of maximal solutions is finite. Moreover, if σ is maximal, then σ(x) is regular for all x ∈ X ; and all maximal solutions are effectively computable.

Recall that a solution σe of (40) is maximal and above a solution σ if and only if first, σ ≤ σe 0 0 0 and second, if σ is any solution of (40) with σe ≤ σ , then σe = σ .

Proof. Clearly, we may assume without restriction that σ1(x) ⊆ σ2(x) for all x because otherwise there are no solutions. Finite words: The easiest situation is the original setting of Conway: L ⊆ (Σ ∪ X )∗ is arbitrary and R ⊆ Σ∗ is regular. In that case let B = (Q, Σ, δ, I, F ) be an NFA such that δ ⊆ Q × Σ × Q and R = L(B). Without restriction, Q = {1, . . . , n} with n ≥ 1. For p, q ∈ Q we denote by L[p, q] the set of words accepted by the NFA Bp,q = (Q, Σ, δ, {p}, {q}). Next, we consider the semiring of Boolean n×n matrices Bn×n where, as usual, B = ({0, 1}, max, min). We use the multiplication of matrices to define Bn×n as a monoid where the unit matrix is the 32 Regular matching problems for infinite trees

neutral element. For a letter a ∈ Σ we denote by Ma the matrix such that for all p, q we have ∗ Ma(p, q) = 1 ⇐⇒ a ∈ L[p, q]. Since Σ is a free monoid, the Ma’s define a homomorphism ∗ n×n ∗ µ :Σ → B such that for all w ∈ Σ we have Mw(p, q) = 1 ⇐⇒ w ∈ L[p, q] where −1 ∗ Mw = µ(w). We have µ (µ(R)) = R. Indeed, for u ∈ Σ let [u] be the set of words v such that µ(u) = µ(v). Then we have [ [ [ R = {u ∈ L[p, q] | p ∈ I, q ∈ F } = {[u] ⊆ Σ∗ | u ∈ L[p, q]} . (41) p∈I,q∈F The verification of (41) is straightforward from the definition of µ. The crucial observation is that the union in (41) is finite because the set {[u] ⊆ Σ∗ | u ∈ Σ∗} is finite. Its cardinality is n2 less than 22 .

Final Steps. We are almost done with the proof for finite words. We need a few more Σ∗ Σ∗ steps. Let σ : X → 2 be any substitution. Define its saturation σb : X → 2 by S ∗ σb(x) = {[u] ⊆ Σ | u ∈ σ(x)} . Then σ ≤ σb, the set σb(x) is regular (being a finite union of regular sets), and if σ(L) # R, then it holds σb(L) # R, too. Moreover, if σ is regular, then we n×n −1 can calculate σb because we can decide for all m ∈ B and x ∈ X whether µ (m) ⊆ σ(x). 0 0 Let us call a substitution σ be a candidate as soon as first, σ1(x) ⊆ σ (x) for all x ∈ X and second, u ∈ σ0(x) implies [u] ⊆ σ0(x). Since {[u] ⊆ Σ∗ | u ∈ Σ∗} is an effective list of finitely n2 many regular sets, the finite list C of candidates is computable: there are at most 22 |X | 0 candidates. Since we assume σ1 ≤ σ2, we compute for each candidate σ ∈ C another regular 0 substitution σe by σe(x) = σ (x) ∩ σ2(x). The point is that whenever the problem in (40) has Σ∗ any solution σ : X → 2 , then there is a maximal solution of the form σe which satisfies σe(L) # R. The class of regular languages is closed under substitution of letters by regular sets. This is a standard exercise in formal language theory. Hence, σe(L) is regular; and we can decide σe(L) # R for every σe. This is the same as to decide σe(L) ⊆ R. Thus, we have to ω 16 decide σe(L) ∩ (Σ \ R) = ∅. The test involves the complementation of R. So, we end up with a nonempty list L of substitutions σe satisfying σe(L) # R. The list L is finite and effectively given. Hence, we can compute all maximal elements. We are done.

Infinite words: Let us show that (using [4]) the case of infinite words can be explained in a similar fashion. For simplicity we restrict ourselves to substitutions where σ(x) is a non-empty subset of Σ+. (The other cases are not harder, but need more case distinctions.) The starting point are two ω-regular languages L ⊆ (Σ ∪ X )ω, and R ⊆ Σω. We use the fact that every ω-regular language can be accepted by a nondeterministic Büchi automaton. The syntax of a Büchi automaton is as for an NFA: B = (Q, Σ, δ, I, F ) where δ ⊆ Q × Σ × Q. A word w ∈ Σω is accepted if there are p ∈ I and q ∈ F such that B allows an infinite path labeled by w which begins in p and visits the state q infinitely often. Instead of working over the Booleans B, we consider the three-element commutative idempotent semiring S = ({0, 1, 2}, +, ·) where + = max and x · y = 0 if x = 0 or y = 0 and otherwise x · y = max{x, y}. Note that 0 is a zero and 1 is neutral in (S, ·). Let us define for n×n Q = {1, . . . , n} and a ∈ Σ the matrix Ma ∈ S by  0 if (p, a, q) ∈/ δ,  Ma(p, q) = 1 if (p, a, q) ∈ δ but {p, q} ∩ F = ∅,  2 otherwise: if (p, a, q) ∈ δ and {p, q} ∩ F 6= ∅.

2 The multiplicative structure (Sn×n, ·) yields a finite monoid with 3n elements. The matrices n×n ∗ n×n ∗ Ma ∈ S define a homomorphism µ :Σ → S . For all p, q ∈ Q and w ∈ Σ the

16 It is here where our exposition uses for ω-regular languages Büchi’s help. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 33

interpretation for Mw = µ(w) is as follows. We have Mw(p, q) 6= 0 if and only if there is a path labeled by w from state p to q. Moreover, Mw(p, q) = 2 if and only if there is a path labeled by w from state p to q which visits a final state. Let T ⊆ Sn×n be any subset, then µ−1(T ) is a regular language of finite words: Sn×n is the state set of a (deterministic!) NFA accepting µ−1(T ). In the terminology of ω-languages: µ strongly recognizes every ω L = L(Q, Σ, δ, {p}, {q}) ⊆ Σ where p, q ∈ Q. Strong recognition means that if u = x0x1 ··· and v = y0y1 ··· are infinite sequences of finite words such that µ(xi) = µ(yi) for all i ∈ N, then u ∈ L ⇐⇒ v ∈ L. This property is crucial in the following. The verification of this property is straightforward and left to the reader. We proceed as in the case of finite words. We let [u] = {v ∈ Σ+ | µ(u) = µ(v)}. Now, Σ+ consider any substitution σ : X → 2 \ ∅. Let us define its saturation σb by σb(x) = S + {[u] ∈ Σ | u ∈ σ(x)}. The saturation σb is regular. Moreover, if σ itself is regular, then we can compute σb effectively. (Note that this involves sets of finite words, only.) The number of n2 saturated substitutions is less than 23 |X |. We can calculate a list all saturated substitutions. + We also use the fact that if L ⊆ (Σ ∪ X ) is ω-regular and if σ : X → 2Σ is a regular substitution, then σ(L) is effectively ω-regular. Again, this fact is rather easy to see and it does not rely on the results of [4]. It is well-known that a language L of infinite words is ω-regular if and only if L is a finite union of languages UV ω where U and V are regular languages of nonempty finite words. This fact can be derived in essentially the same way as the corresponding statement for finite words showing that NFAs have the same expressive power as regular expressions. Now, we can use our knowledge on finite words: if L = S UV ω, then σ(L) = S σ(U)σ(V )ω. If σ is a regular substitution such that the empty word is not in any σ(x), then σ(L) ⊆ Σω is ω-regular because the class of regular subsets in Σ∗ is closed ω under regular substitutions. More precisely, let w = x0x1 · · · ∈ L ⊆ (Σ ∪ X ) such that σ(w) ∈ L(B). Then σ(w) is the set of all ω-words which have a factorization u = u0u1 ··· ω where for each i ∈ N we have ui ∈ σ(xi). Consider any ω-word v = v0v1 · · · ∈ Σ such that vi ∈ [ui] for all i ∈ N. Since µ it is strongly recognizable, as explained above, σ(w) ⊆ R implies v ∈ R. As a consequence, if σ(L)#R, then we have σb(L)#R, too. We now have all ingredients together to argue as in paragraph above “Final Steps” for finite words. However, to make the statements effective we use the fact that the family of ω-regular languages is an effective Boolean algebra. It is in this part where (according to our approach) Büchi’s result in [4] enters the scene: the complement of an ω-regular language is effectively ω-regular. See also the corresponding footnote (16) above where we speak about finite words and where we point to complementation. J

For more details about regular languages over infinite words (in the spirit of this section) we refer to the textbook [10]. In particular, the book presents the equivalence between ω-regular expressions and Büchi automata. It also shows how Büchi used a simple argument from Ramsey theory to derive his results in [4]. Of course, other textbooks or survey papers than [10] do the same.

8.2 About the existence of maximal solutions

The aim of this subsection is to show that the regularity hypothesis on R ⊆ T (Σ) cannot be removed from our main results. This is done in Ex. 34. The example uses the special case of infinite words, only.

Remember that TX -fin(Σ ∪ X ) (resp. TH-fin(Σ ∪ H)) denote the set of trees with only a finite number of occurrences of variables (resp. holes). Moreover, in Sec. 2.1 we used induction on the maximal level of a variable and Eq.(5) (resp. Eq.(6)) to define σio(s) (resp. σoi(s)) for s ∈ TX -fin(Σ ∪ X ) (with the additional exceptional case of a term without variable). Such 34 Regular matching problems for infinite trees

inductions can be viewed, as inductions on the following notion of norm of a term

ksk = sup{|u| + 1 | u ∈ pos(s) ∧ s(u) ∈ X }. (42)

I Proposition 33. Let L ⊆ TX -fin(Σ ∪ X ) and R ⊆ T (Σ) be arbitrary subsets, σ : X → 2T (Σ∪H)\H be a substitution, and # ∈ {⊆, =}. Then the following holds.

0 0 0 There exists a maximal substitution σ such that σ ≤ σ and σio(L)#R. 0 If, in addition, σ(x) ⊆ TH-fin(Σ ∪ H), then exists a maximal substitution σ such that 0 0 σ ≤ σ and σoi(L)#R, too. Proof. For the proof let e ∈ {io, oi} one of the two possible extensions. It is enough to see that ≤ is an inductive ordering on the set of substitutions σ : X → 2T (Σ∪H)\H such that σe(L)# R because then, every solution is bounded from above by some maximal solution thanks to Zorn’s lemma. For that we have to consider nonempty totally ordered subsets

(k) T (Σ∪H)\H T (Σ∪H)\H {σ : X → 2 | k ∈ K} ⊆ {σ : X → 2 | P(L)# R}. Here, the index set K is totally ordered satisfying σ(k) ≤ σ(`) ⇐⇒ k ≤ `. For x ∈ X 0  (k) 0 T (Σ∪H)\H we define σ (x) = S σ k ∈ K . This yields a substitution σ : X → 2 such that σ(k) ≤ σ0. We show by induction on ksk that if s ∈ L and t ∈ σ0(s), then t ∈ σ(k)(s) for some k ∈ K. For that we fix any index k0 ∈ K. Now, let s = x(s1, . . . , sr) ∈ L 0 0 0 and t = tx[ij ← ti] ∈ σio(s) (resp. t = tx[ij ← tij ] ∈ σoi(s)) such that tx ∈ σ (x) and 0 0 ti ∈ σio(si) (resp. tij ∈ σio(si)). For ksk = 0 and both possibilities e ∈ {io, oi}, we have (k0) 0 σe (s) = σe(s) = s. We are done for ksk = 0. In the other case, there are indices ki (ki) (kij ) (resp. kij) by induction such that ti ∈ σio (si) (resp. tij ∈ σoi (si)) for all 1 ≤ i ≤ rk(x). 0 (kx) (This is true for rk(x) = 0.) Since tx ∈ σ (x), there is some index kx such that tx ∈ σ (x). Let kt = max {kx, ki | 0 ≤ i ≤ rk(x)} (resp. kt = max {kx, kij | 0 ≤ i ≤ rk(x) ∧ ij ∈ leaf tx}). (kt) The maximum exists for both e = io and e = oi. We have kt ≥ k0 and t ∈ σe (s): the 0 induction step is achieved. Therefore, σe(s) ⊆ R. (k) Finally, if σe (L) = R for all k ∈ K, then for each t ∈ R there is some s ∈ L such that (kt) S (k) S 0 t ∈ σe (s). This implies R ⊆ {σe (s) | s ∈ L ∧ k ∈ K} = {σe(s) | s ∈ L} ⊆ R. Hence, S 0 {σe(s) | s ∈ L} = R: every solution is bounded from above by some maximal solution. J Ex. 34 shows that the statement of Prop. 33 might fail, when L contains some tree with infinitely many occurrences of some variable. We give such an example in the case where R is not regular and L is a set of ω-words. It fails for both extensions {io, oi} and for both comparison relations # ∈ {=, ⊆}.

I Example 34. Consider Σ = {a, b} and X = {x} where all three symbols have rank one. Hence, H = {1} and an infinite tree over Σ ∪ X encodes an infinite word in (Σ ∪ X )ω and vice 0 0 ω  ω k versa. We let L = L ∪R where L = {(ax) } and R = u ∈ Σ ∃k ≥ 1, b is no factor of u . 0 The singleton L is regular, but neither L nor R is not regular. Note that we have σio(L) = σoi(L) for every substitution σ(x) ⊆ T (Σ ∪ H) \ H. Clearly, σio(L) = R ⇐⇒ σio(L) ⊆ 0 0 R ⇐⇒ σio(L ) ⊆ R. There are many substitutions σ such that σio(L ) ⊆ R. For example, 0 ω let σ(x) = a(1), then σio(L ) = a . But there is no maximal substitution σ such that 0 σio(L) ⊆ R. Indeed, given any substitution σ(x) ⊆ Tfin({a, b, 1}) \ H such that σio(L ) ⊆ R, we can define n = max{k ≥ 0 | bk is a factor of σ(x)}. Then σ0(x) = σ(x) ∪ {bn+1(1)} is 0 ω strictly larger than σ, and it satisfies σio(L) ⊆ R, too. For example, if σio(L) = a , then we 0 0 ω obtain σ (x) = {a(1), b(1)} and σio(L) = (ab) . Comparing the two items in Prop. 33 we see that the “outside-in”-version for oi requires an additional assumption: every term σ(x) has finitely many holes, only. The next example shows that we cannot drop that hypothesis. C. Camino. V. Diekert, B. Dundua, M. Marin, and G. Sénizergues 35

I Example 35. Consider Σ = {f, a, b} and X = {x, y} where rk(f) = 2, rk(x) = rk(y) = rk(a) = 1, and rk(b) = 0. We are interested in the finite term x(y(b)). Let σ(x) = t where t = f(t, 1) be infinite comb as in Fig. 3 and σ(y) = a(b). The infinite tree σ(x) is regular and the accepted language of a deterministic top-down tree automaton with a Büchi-acceptance condition as defined for example in [27]. The singleton σoi(x(y(b))) is shown in Fig. 6 on the left with n = 1 and σ(y) = a(b). Similar as in Ex. 34 we let

 k R = t ∈ T ({f, a, b}) ∃k ≥ 1, a is no factor of any path in t .

0 0 Then σoi(x(y(b))) ⊆ R. However, there is no maximal substitution σ such that σ ≤ σ 0 00 and σoi(x(y(b))) ⊆ R. Indeed, given any such σ there is some n ∈ N such that σ (y) = 0 n 0 00 00 σ (y) ∪ {a (b)} is strictly larger than σ and σ still satisfies σoi(x(y(b))) ⊆ R. If we define 00 ∗ 00 σ (y) = a (b), then σoi(x(y(b))) is regular, but it is not a subset of R.

8.3 The outside-in extension σoi for infinite trees

Although the definition of σoi(s) for infinite trees is not used in the main body of the paper, we include a definition to have a reference for possible future research. Actually, the definition of σoi(s) is technically simpler than the one of σio(s) because no choice functions are used. Let us define σoi(L) for a set of trees which can be finite or infinite. Without restriction, we content ourselves to define σoi(L) for S ⊆ T (X ) where Σ∩X = ∅ and σ(x) ⊆ T (Σ∪[rk(x)])\H for all x ∈ X . The first step reduces the problem to define σoi(L) for a set L to the case S where L is a singleton simply be letting, as expected, σoi(L) = {σoi(s) | s ∈ L}. Thus, it is enough to define σoi(s) for an infinite tree s ∈ T (X ) because for a finite tree s we employ the definition in Eq.(6). The idea is to use a nondeterministic program which transforms s ∈ T (X ) into a “hybrid” tree of T (X ∪ Σ). During the process top-down more and more symbols appear in Σ and the subtrees (which are subtrees of s) appear at greater levels. Each run of the program defines at most one output. The set of all outputs over all runs defines the set σoi(s). The program does not need to terminate, but if it runs forever, then, in the limit, it defines (nondeterministically) a tree t ∈ T (Σ). The nondeterministic program nd is denoted as “σoi ”. The input for the program is any tree s ∈ T (X ). begin procedure Initialize a tree variable t := s ∈ T (Σ ∪ X ). Perform the following while-loop as long as t∈ / T (Σ).

1. Choose in the breadth-first order on Pos(t) the first position v such that x = s(v) ∈ X .

2. Denote the subtree t|v rooted at v as t|v = x(t1, . . . , tr). This implies rk(x) = r.

3. Choose nondeterministically some tx ∈ σ(x). If this is not possible because σ(x) = ∅, then EXIT without any output.

4. Replace in t the subtree t|v by tx[ij ← ti]. Recall that this means to replace in tx every ij ∈ leafi(tx) by the same tree ti where ti is the i-th child of the root in t|v. Moreover, if leafi(tx) 6= ∅, then 1 ≤ i ≤ rk(x). end procedure

Do not confuse the procedure with an inside-out extension. We visit positions in t labeled by a variable one after another and later choices of trees in σ(t(v)) are fully independent of each other. If the program terminates without using the EXIT branch, then we return the final tree t as the output t = σ(ρ, s) of the specific nondeterministic run ρ. If the program runs forever, then let tn be the value of t after performing the n-th loop of this run ρ. Since we always have tx ∈ T (Σ ∪ H) \ H, an easy reflection shows that there exists a unique infinite tree σ(ρ, s) = limn→∞ tn. Moreover, σ(ρ, s) ∈ T (Σ) . We then define σoi(s) = {σ(ρ, s) | ρ run over s} and Eq.(6) holds. 36 Regular matching problems for infinite trees

8.4 Quotient metrics This short subsection explains Footnote 7: our definition of a quotient metric agrees with the standard one in topology. Given a metric space (M, d) and any equivalence relation ∼ on M, there is a canonical definition of a quotient (pseudo-)metric d∼ on the quotient space M/∼. For x ∈ M let [x0] = {x0 ∈ M | x ∼ x0} denote its equivalence class. We associate to (M, d) a complete weighted graph with vertex set M/∼ and weight g([x], [y]) = inf {d(x0, y0) | x ∼ x0 ∧ y ∼ y0}. (So the weight might be 0 for [x] 6= [y].) Then we define d∼([x], [y]) by the infimum over all weights of paths in the undirected graph connecting [x] and [y]. The path can be arbitrarily long and still have weight 0. Clearly, d∼ is a pseudo metric (but not a metric, in general) satisfying

0 ≤ d∼([x], [y]) ≤ g([x], [y]) ≤ d(x, y).

If for each [x] there exists x0 ∈ [x] such that for all y we have

0 0 0 < g([x], [y]) = inf {d(x0, y ) | y ∼ y }

then g([x], [y]) = d∼([x], [y]) is a metric. In our situation it is a metric: we have (M, d) = (T (Ω⊥), d) and M/∼ results by identifying all trees s ∈ (T (Ω⊥), d) where some position in 0 Pos(s) is labeled by ⊥. To see this, just recall our definition of d(s, s ) for trees in T (Ω⊥):

( 0 0 1 if either s or s uses the symbol ⊥ but not both, d(s, s ) = ∗ 0 2− inf{|u|∈N | u∈N : s(u)6=s (u)} otherwise.