First-Order and Temporal Logics for Nested Words

Rajeev Alur Marcelo Arenas Pablo Barcel´o University of Pennsylvania PUC Chile Universidad de Chile

KoushaEtessami NeilImmerman LeonidLibkin University of Edinburgh University of Massachusetts University of Edinburgh

Abstract Pushdown systems (PDSs), Boolean Programs, and Re- cursive State Machines (RSMs), are equivalent abstract Nested words are a structured model of execution paths models of procedural programs, with finite data abstrac- in procedural programs, reflecting their call and return tion but unbounded call stack. Software model checking nesting structure. Finite nested words also capture the technology is by now thoroughly developed for checking structure of parse trees and other -structured data, such ω-regular properties of runs for these models, when the as XML. runs are viewed as ordinary words (see [5, 8, 1]). Unfor- We provide new temporal logics for finite and infi- tunately, temporal logic and ω-regular properties over ordi- nite nested words, which are natural extensions of LTL, nary words are inadequate for expressing a variety of prop- and prove that these logics are first-order expressively- erties of program executions that are useful in interproce- complete. One of them is based on adding a ”within” dural program analysis and software verification. These in- modality, evaluating a formula on a subword, to a logic clude Hoare-like pre/post conditions on procedures, stack CaRet previously studied in the context of verifying prop- inspection properties, and other useful program analysis erties of recursive state machines. The other logic is based properties that go well beyond ω-regular (see [2] for some on the notion of a summary path that combines the linear examples). On the other hand, many such program analy- and nesting structures. For that logic, both model-checking sis properties can easily be expressed when runs are viewed and satisfiability are shown to be EXPTIME-complete. as nested words. Runs of Boolean Programs and RSMs can Finally, we prove that first-order logic over nested words naturally be viewed as nested words once we add “summary has the three-variable property, and we present a tempo- edges” between matching calls and returns, and we can thus ral logic for nested words which is complete for the two- hope to extend model checking technology for procedural variable fragment of first-order. programs using richer temporal logics over nested words which remain tractable for analysis. These considerations motivated the definition of Visibly 1 Introduction Pushdown Languages (VPLs) [3] and the call-return tempo- ral logic CaRet [2]. CaRet is a temporal logic over nested words which extends LTL with new temporal operators that An execution of a procedural program can reveal not just allow for navigation through a nested word both via its ordi- a linear sequence of program states encountered during the nary sequential structure, as well as its matching call-return execution, but also the correspondence between each point summary structure. The standard LTL model checking al- during the execution at which a procedure is called and the gorithms for RSMs and PDSs can be extended to allow point when we return from that procedure call. This leads model checking of CaRet, with essentially the same com- naturally to the notion of a finite or infinite nested word ([4, plexity [2]. VPLs [3] are a richer class of languages that 3, 2]). A nested word is simply a finite or ω-word supplied capture MSO-definable properties of nested words. Re- with an additional binary matching relation which relates cently, results about VPLs have been recast in light of corresponding call and return points (and of course satisfies nested words, and in particular in terms of Nested Word Au- “well-bracketing” properties). Finite nested words offer an tomata [4] which offer a machine acceptor for (ω-)regular alternative way to view any data which has both a sequential nested words, with all the expected closure properties. string structure as well as a tree-like hierarchical structure. Examples of such data are XML documentsand parse trees. Over ordinary words, LTL has long been considered the

1 temporal logic of choice for program verification, not only a tableaux construction which translates an NWTL formula because its temporal operators offer the right abstraction into a Nested Word Automaton, enabling the standard au- for reasoning about events over time, but because it pro- tomata theoretic approach to model checking of Boolean vides a good balance between expressiveness (first-order Programs and RSMs with complexity that is polynomial in complete), conciseness (can be exponentially more suc- the size the model and EXPTIME in the size of the formula. cinct compared to automata), and the complexity of model- We then explore some alternative temporal logics, which checking (linear time in the size of the finite transition sys- extend variants of CaRet with variants of unary “Within” tem, and PSPACE in the size of the temporal formula). operators proposed in [2], and we show that these exten- This raises the question: What is the right temporal logic sions are also FO-complete. However, we observe that the for nested words? model checking and satisfiability problems for these logics The question obviously need not have a unique answer, are 2EXPTIME-complete. These logics are – provably – particularly since nested words can arise in various appli- more concise than NWTL, but we pay for conciseness with cation domains: for example, program verification, as we added complexity. already discussed, or navigation and querying XML doc- It follows from our proof of FO-completenessfor NWTL uments under “sequential” representation (see, e.g., [27]). that over nested words, every first-order formula with one However, it is reasonable to hope that any good temporal free variable can be expressed using only 3 variables. More logic for nested words should possess the same basic quali- generally, we show, using EF games, that 3 variables suffice ties that make LTL a good logic for ordinarywords, namely: for expressing any first order formula with two or fewer free (1) first-order expressive completeness: LTL has the same variables, similarly to the case of words [13] or finite trees expressive power as first-order logic over words, and we [19]. Finally, we show that a natural unary temporal logic would want the same over nested words; (2) reasonable over nested words is expressively complete for first-order complexity for model checking and satisfiability; and (3) logic with 2 variables, echoing a similar result known for nice closure properties: LTL is closed under boolean com- unary temporal logic over ordinary words [9]. binations including negation without any blow-up, and we Related Work. VPLs and nested words were introduced would want the same for a logic over nested words. Finally in [3, 4]. The logic CaRet was defined in [2] with the goal (and perhaps least easy to quantify), we want (4) natural of expressing and checking some natural non-regular pro- temporal operators with simple and intuitive semantics. gram specifications. The theory of VPLs and CaRet has Unfortunately, the logic CaRet appears to be deficient been recast in light of nested words in [4]. Other aspects of with respect to some of these criteria: although it is easily nested words (automata characterizations, games, model- first-order expressible, proving incompleteness – a widely checking) were further studied in [1, 4, 2, 16]. It was also believed conjecture – appears to be quite difficult. Also, observed that nested words are closely related to a sequen- some temporal operators in CaRet (such as the past-time tial, or “event-based” API for XML known as SAX [24] (as call modalities), motivated by program analysis, may not be opposed to a tree-based DOM API [7]). SAX representation viewed as particularly natural in other applications. There is very important in streaming applications, and questions is much related work in the XML community on logics for related to recognizing classes of nested words by the usual trees (see, e.g., surveys [14, 15, 28]), but they tend to have word automata have been addressed in [27, 6]. different kinds of deficiency for our purposes: they concen- While finite nested words can indeed be seen as XML trate on the hierarchical structure of the data and largely documents under the SAX representation, and while much ignore its linear structure; also, they are designed for finite effort has been spent over the past decade on languages trees. for tree-structured data (see, e.g. [14, 15, 28] for surveys), We introduce and study new temporal logics over nested adapting the logics developed for tree-structured data is not words. The main logic we consider, Nested Word Tem- as straightforward as it might seem, even though from the poral Logic (NWTL) extends LTL with both a future and complexity point of view, translations between the DOM past variant of the standard Until operator, which is inter- and the SAX representations are easy [26]. The main prob- preted over summary paths rather than the ordinary linear lem is that most such logics rely on the tree-based repre- sequence of positions. A summary path is the unique short- sentation and ignore the linear structure, making the nat- est directed path one can take between a position in a run ural navigation through nested words rather unnatural un- and some future position, if one is allowed to use both der the tree representation. Translations between DOM and successor edges and matching call-return summary edges. SAX are easy for first-order properties, but verifying nav- We show that NWTL possesses all the desirable proper- igational properties expressed in first-order is necessarily ties we want from a temporal logic on nested words. In non-elementary even for words if one wants to keep the particular, it is both first-order expressively complete and data complexity linear [10]. On the other hand, logics for has good model checking complexity. Indeed we provide XML tend to have good model-checking properties (at least

2 in the finite case), typically matching the complexity of word) induced by elements ℓ such that i ≤ ℓ ≤ j. If j 1 iff ϕ is true in position i − 1, ∗ a1 ...an ∈ Σ , and (µ, call, ret) is a matching on [1,n]. and ⊖µϕ is true in position j if j is a return position with A nested ω-word is a tuple w¯ = (w, µ, call, ret), where matching call i and ϕ is true at i. ω w = a1 . . . ∈ Σ , and (µ, call, ret) is a matching on N. The until/since operators depend on what a path is. In We say that a position i in a nested word w¯ is a call po- general, there are various notions of paths through a nested sition if call(i) holds; a return position if ret(i) holds; word. We shall consider until/since operators for paths that and an internal position if it is neither a call nor a return. are unambiguous: that is, for every pair of positions i and j If µ(i, j) holds, we say that i is the matching call of j, with i < j, there could be at most one path between them. and j is the matching return of i, and write c(j) = i and Then, with respect to any such given notion of a path, we r(i)= j. Calls without matching returns are pending calls, have the until and since operators with the usual semantics: and returns without matching calls are pending returns. A nested word is said to be well-matched if no calls or returns • (¯w,i) |= ϕUψ iff there is a position j ≥ i and a path are pending. Note that for well-matched nested words, the i = i0 < i1 <...

3 the current action i is executed, and its corresponding re- For example, in the figure below, h2, 4, 5i is a call path, turn. Formally, C(i) is the greatest matched call position h3, 4, 6, 7, 8, 10i is both an abstract and a summary path; ji whose matching call is before i. abstract path from 3 to 9). Definition 3.1 (Linear, call and abstract paths) Given positions i < j, a sequence i = i0

• a linear path if ip+1 = ip +1 for all p < k; 1 2 3 4 5 6 7 8 9 10 11 • a call path if ip = C(ip+1) for all p < k; • an abstract path if 4 Expressive Completeness

r(ip) if ip is a matched call ip+1 = In this section we study logics that are expressively com- otherwise. (ip +1 plete for FO, i.e. temporal logics that have exactly the same power as FO formulas in one free variable over finite and We shall denote until/since operators corresponding to infinite nested words. In other words, for every formula ϕ U S Uc Sc these paths by / for linear paths, / for call paths, of an expressively complete temporal logic there is an FO and Ua Sa for abstract paths. 1 / formula ϕ′(x) such that (¯w,i) |= ϕ iff w¯ |= ϕ′(i) for every Our logics will have some of the next/previous and un- nested word w¯ and position i in it, and conversely, for every ′ til/since operators. Some examples are: FO formula ψ(x) there is a temporal formula ψ such that w¯ |= ψ(i) iff (¯w,i) |= ψ′. • When we restrict ourselves to the purely linear frag- Our starting point is a logic NWTL (nested-word tempo- ment, our operators are and ⊖, and U and S, i.e. ral logic) based on summary paths introduced in the previ- precisely LTL (with past operators). ous section. We show that this logic is expressively com- • The logic CaRet [2] has the following operators: the plete (and of course remains expressively complete under next operators and µ; the linear and abstract untils the addition of operators present in logics inspired by veri- (i.e., U and Ua), the call since (i.e., Sc) and a previous fication of properties of execution paths in programs). This operator ⊖c that will be defined in Section 4.2. latter remark will be of importance later, when we study the complexity of model checking. Another notion of a path combines both the linear and We then look at logics close to those in verification lit- the nesting structure. It is the shortest path between two erature, i.e. with operators such as call and abstract until positions i and j. Unlike an abstract path, it decides when and since, and ask what needs to be added to them to get to skip a call based on position j. Basically, a summary path expressive completeness. We confirm a conjecture of [2] from i to j moves along successor edges until it finds a call that a within operator is what’s needed: such an operator position k. If k has a matching return ℓ such that j appears evaluates a formula on a nested subword. after ℓ, then the summary path skips the entire call from k to ℓ and continues from ℓ; otherwise the path continues as a 4.1 Expressive completeness and NWTL successor path. Note that every abstract path is a summary path, but there are summarypaths that are not abstract paths. The logic NWTL (nested words temporal logic) has next Definition 3.2 A summary path between i < j in a nested and previous operators, as well as until and since with re- word w¯ is a sequence i = i0

4 Proof sketch. Translation of NWTL into FO is quite property says that FO sentences over the usual, unnested, straightforward, but we show how to do it carefully, to get words can be evaluated without using the previous ⊖ and the 3-variable property. For the converse, we define yet an- since S operators. Let NWTLfuture be the fragment of σ other notion of path, called a strict summary path, that is NWTL that does not use S and the operators ⊖ and ⊖µ. different from summary paths in two ways. First, if it skips a call, it jumps from i not to r(i) but r(i)+1. Second, if it Proposition 4.4 There are FO sentences over nested words reaches a matched return position, it stops. We then look at that cannot be expressed in NWTLfuture. the logic NWTLs in which the semantics of until and since is modified so that they refer to strict summary paths. We then show that NWTLs ⊆ NWTL and FO ⊆ NWTLs. Proof sketch. Let w¯1 and w¯2 be two well-matched nested words, of length n1 and n2 respectively. We first show The former is by a direct translation. The proof of future FO ⊆ NWTLs is in two parts. First we deal with the fi- that, for every NWTL formula, there is an integer k nite case. We look at the standard translation from nested such that w¯1[i1,n1] ≡k w¯2[i2,n2] implies (¯w1,i1) |= ϕ iff words into binary trees. If a matched call position i is trans- (¯w2,i2) |= ϕ. Here ≡k means that Player II has a win in lated into a node s of a tree, then the first position inside the k-round Ehrenfeucht-Fra¨ıss´egame. This follows from expressive-completeness and properties of future formulas. the call is translated into the right successor of s, and the future linear successor of r(i) is translated into the left successor Using this, we show that there is no NWTL formula ⊖ of s. If i is an internal position, or an unmatched call or equivalent to µ⊤ ∧ µ a (checking whether the first return position, its linear successor is translated into the left position is a call, and the position preceding its matching 2 successor of s. With this translation, strict summary paths return is labeled a). become paths in a tree. Note also that adding all other until/since pairs to NWTL We next use until/since-based logics for trees from [18, does not change its expressiveness. That is, if we let + U S Uc Sc Ua Sa 25]. By a slight adaptation of techniques from these papers NWTL be NWTL + { , , , , , }, then: (in particular using the separation property from [18]), we Corollary 4.5 NWTL+ = FO. prove expressive completeness of a translation of NWTLs into a tree logic, and then derive expressive completeness of s Later, when we deal with model-checking, we shall NWTL for finite nested words. prove upper bound results for NWTL+ that, while expres- In the infinite case, we combine the finite case and the sively complete for FO, allows more operators. separation property of [18] with Kamp’s theorem and the separation property of LTL. Note that a nested ω-word 4.2 The within operator is translated into an infinite tree with exactly one infinite branch. A composition argument that labels positions of that branch with types of subtrees reduces each FO formula We now go back to the three until/since operators origi- to an LTL formula over that branch in which propositions nally proposed for temporal logics on nested words, based are types of subtrees, expressible in NWTLs by the proof on the the linear, call, and abstract paths. In other words, µ in the finite case. Using the separation properties, we then our basic logic, denoted by LTL , is show how to translate such a description into NWTLs. 2 ′ ′ Recall that FOk stands for a fragment of FO that consists ϕ, ϕ := ⊤ | a | call | ret | ¬ϕ | ϕ ∨ ϕ | ⊖ ⊖ of formulas which use at most k variables in total. First, ϕ | µϕ | ϕ | µϕ | U ′ S ′ Uc ′ Sc ′ Ua ′ Sa ′ from our translation from NWTL to FO we get: ϕ ϕ | ϕ ϕ | ϕ ϕ | ϕ ϕ | ϕ ϕ | ϕ ϕ Corollary 4.2 Over nested words, every FO formula with We now extend this logic with the following within op- at most one free variable is equivalent to an FO3 formula. erator proposed in [2]. If ϕ is a formula, then Wϕ is a for- mula, and (¯w,i) |= Wϕ iff i is a call, and (¯w[i, j],i) |= ϕ, Furthermore, for FO sentences, we can eliminate the where j = r(i) if i is a matched call and j = |w¯| if i is since operator. an unmatched call. In other words, Wϕ evaluates ϕ on a subword restricted to a single procedure. We denote such Corollary 4.3 For every FO sentence Φ over finite or in- µ finite nested words, there is a formula ϕ of NWTL that an extended logic by LTL + W. does not use the since operator Sσ such that iff w¯ |= Φ µ (¯w, 1) |= ϕ. Theorem 4.6 LTL +W = FO over both finite and infinite nested words. The previous operators ⊖ and ⊖µ, however, are needed even for FO sentences over nested words. This situation The inclusion of LTLµ + W into FO is routine. The con- is quite different thus from LTL, for which the separation verse is done by encoding NWTL into LTLµ + W.

5 CaRet and other within operators The logic CaRet, as states Q, a set of initial states Q0 ⊆ Q, a set of (linear) ac- defined in [2], did not have all the operators of LTLµ. In cepting states F ⊆ Q, a set of pending call accepting states fact it did not have the previous operators ⊖ and ⊖µ, and Fc ⊆ Q, a call-transition relation δc ⊆ Q × Σ × Q × Q, it only had linear and abstract until operators, and the call an internal-transition relation δi ⊆ Q × Σ × Q, a since operator. That is, CaRet was defined as return-transition relation δr ⊆ Q × Q × Σ × Q, and a pending-return-transition relation δpr ⊆ Q × Σ × Q. The ′ ′ ϕ, ϕ := ⊤ | a | call | ret | ¬ϕ | ϕ ∨ ϕ | automaton A starts in the initial state and reads the nested ϕ | µϕ | ⊖cϕ | word from left to right. The state is propagated along U ′ Ua ′ Sc ′ ϕ ϕ | ϕ ϕ | ϕ ϕ , the linear edges as in case of a standard word automaton. However, at a call, the nested word automaton propagates and we assume that a ranges over Σ ∪{pret}, where pret states along the linear edges and also along the nesting edge is true in pendingreturns(which is not definable with the re- (if there is no matching return, then the latter is required maining operators). Here ⊖ is the previous operator cor- c to be in F for acceptance). At a matched return, the new responding to call paths. Formally, (¯w,i) |= ⊖ ϕ if C(i) is c c state is determined based on the states propagated along the defined and (¯w, C(i)) |= ϕ. linear as well as the nesting incoming edges. A natural question is whether there is an expressively- Formally, a run r of the automaton A over a nested complete extension of this logic. It turns out that two within word w¯ = (a a . . . , µ, call, ret) is a sequence q , q ,... operators based on C and R (the innermost call and its re- 1 2 0 1 of states along the linear edges, and a sequence q′, for turn) functions provide such an extension. We define two i every call position i, of states along nesting edges, such new formulas Cϕ and Rϕ with the semantics as follows: that q0 ∈ Q0 and for each position i, if i is a call then ′ • (¯w,i) |= Cϕ iff w¯[j, i] |= ϕ, where j = C(i) if C(i) is (qi−1,ai, qi, qi) ∈ δc; if i is an internal, then (qi−1,ai, qi) ∈ ′ defined, and j =1 otherwise. δi; if i is a return such that µ(j, i), then (qi−1, qj ,ai, qi) ∈ δ ; and if i is an unmatched return then (q ,a , q ) ∈ δ . • (¯w,i) |= Rϕ if w¯[i, j] |= ϕ, where j = R(i) if R(i) r i−1 i i pr The run r is accepting if (1) for all pending calls i, q′ ∈ F , is defined, and j = |w¯| (if w¯ is finite) or ∞ (if w¯ is i c and (2) the final state for finite word of length , infinite) otherwise. qℓ ∈ F ℓ and for infinitely many positions i, qi ∈ F , for nested ω- Theorem 4.7 CaRet + {C, R} = FO over both finite and words. The automaton A accepts the nested word w¯ if it infinite nested words. has an accepting run over w¯. Nested word automata have same expressiveness as the The proof of this result is somewhat involved, and relies monadic second order logic over nested words, and the lan- on different techniques. The operators used in CaRet do not guage emptiness can be checked in polynomial-time [4]. correspond naturally to tree translations of nested words, and the lack of all until/since pairs makes a translation from 5.2 Tableau construction NWTL hard. We thus use a composition argument directly on nested words. We now show how to build an NWA accepting the satis- fying models of a formula of NWTL+. This leads to deci- 5 Model-Checking and Satisfiability sion procedures for satisfiability and model checking. Let us first consider special kinds of summary paths: In this section we show that both model-checking and summary-down paths are allowed to use only call edges satisfiability are single-exponential-time for NWTL. In fact (from a call to the first position inside the call), nesting we prove this bound for NWTL+, an FO-complete exten- edges (from a call to its matching return), and internal edges sion of NWTL with all of U, S, Uc, Sc, Ua, Sa. We use (from an internal or return position to a call or internal po- automata-theoretic techniques, by translating formula into sition). Summary-up paths are allowed to use only return equivalent automata on nested words. We then show that edges (from a position preceding a return to the return), a different expressively complete logic based on adding the nesting edges and internal edges. We will use Uσ↓ and within operator to CaRet requires doubly-exponential time Uσ↑ to denote the corresponding until operators. Observe for model-checking, but is exponentially more succinct. that ϕUσ ψ is equivalent to ϕ Uσ↑(ϕ Uσ↓ψ). Given a formula ϕ, we wish to construct a nested word 5.1 Nested word automata automaton Aϕ whose states correspond to sets of subformu- las of ϕ. Intuitively, given a nested word w¯, a run r, which ′ A nondeterministic nested word automaton is a linear sequence q0q1 . . . of states and states qi labeling (NWA) A over an alphabet Σ is a structure nesting edges from call positions, should be such that each (Q,Q0,F,Fc,δc,δi,δr,δpr) consisting of a finite set of state qi is precisely the set of formulas that hold at position

6 ′ cl i+1. The label qi is used to remember abstract-next formu- ψ ∈ (ϕ), ψ ∈ Ψ iff ψ ∈ Φ; (d)foreach µψ ∈ las that hold at position i and the abstract-previous formulas cl(ϕ), µψ 6∈ Φ. that hold at matching return. For clarity of presentation, we The transition relation ensures that the current symbol is focus on formulas with next operators and µ, and until over summary-down paths. It is easy to modify the con- consistent with the atomic propositions in the current state, struction to allow other types of untils and past operators. and next operators requirements are correctly propagated. Given a formula ϕ, the closure of ϕ, denoted by cl(ϕ), An atom Φ belongs to the set Fc iff Φ does not con- is the smallest set that satisfies the following properties: tain any abstract-next formula, and this ensures that, in an cl(ϕ) contains ϕ, call, ret, int, and ret; if either accepting run, at a pending call, no requirements are prop- agated along the nesting edge. For each until-formula ψ in ¬ψ, or ψ or µψ is in cl(ϕ) then ψ ∈ cl(ϕ); if ψ ∨ ψ′ ∈ cl(ϕ), then ψ, ψ′ ∈ cl(ϕ); if ψ Uσ↓ψ′ ∈ cl(ϕ), the closure, let Fψ be the set of atoms that either do not con- ′ Uσ↓ ′ Uσ↓ ′ cl tain ψ or contain the second argument of ψ. Then a nested then ψ, ψ , (ψ ψ ), and µ(ψ ψ ) are in (ϕ); AP and if cl and is not of the form (for any ), word w¯ over the alphabet 2 satisfies ϕ iff there is a run ψ ∈ (ϕ) ψ ¬θ θ cl then ¬ψ ∈ cl(ϕ). It is straightforward to see that the size of r of Aϕ over w¯ such that for each until-formula ψ ∈ (ϕ), cl(ϕ) is only linear in the size of ϕ. Henceforth, we identify for infinitely many positions i, qi ∈ Fψ. Thus, ¬¬ψ with the formula ψ. Theorem 5.1 For a formula of NWTL+, one can effec- cl ϕ An atom of ϕ is a set Φ ⊆ (ϕ) that satisfies: tively construct a nondeterministic Buchi¨ nested word au- O(|ϕ|) • For every ψ ∈ cl(ϕ), ψ ∈ Φ iff ¬ψ 6∈ Φ . tomaton Aϕ of size 2 accepting the models of ϕ. ′ cl ′ • For every formula ψ ∨ ψ ∈ (ϕ), ψ ∨ ψ ∈ Φ iff Since the automaton Aϕ is exponential in the size of ϕ, ′ (ψ ∈ Φ or ψ ∈ Φ). we can check satisfiability of ϕ in exponential-time by test- • For every formula ψ Uσ↓ψ′ ∈ cl(ϕ), ψ Uσ↓ψ′ ∈ Φ ing emptiness of Aϕ. EXPTIME-hardness follows from the iff either ψ′ ∈ Φ or (ψ ∈ Φ and ret 6∈ Φ and either corresponding hardness result for CaRet. σ↓ ′ σ↓ ′ (ψ U ψ ) ∈ Φ or µ(ψ U ψ ) ∈ Φ). Corollary 5.2 The satisfiability problem for NWTL+ is • Φ contains exactly one of the elements in the set EXPTIME-complete. {call, ret, int}. When programs are modeled by nested word automata These clauses capture local consistency requirements. A (or equivalently, pushdown automata, or recursive state Given a formula ϕ, we build a nested word automaton AP machines), and specifications are given by formulas ϕ of Aϕ as follows. The alphabet Σ is 2 , where AP is the set NWTL+, we can use the classical automata-theoretic ap- of atomic propositions. proach: negate the specification, build the NWA A¬ϕ ac- cepting models that violate , take product with the pro- 1. Atoms of ϕ are states of Aϕ; ϕ gram A, and test for emptiness of L(A) ∩ L(A ). Note 2. An atom is an initial state iff ; ¬ϕ Φ ϕ ∈ Φ that the program typically will be given more compactly, 3. For atoms Φ, Ψ and a symbol a ⊆ AP, (Φ, a, Ψ) is an say, as a Boolean program [5], and thus, the NWA A may internal transition of Aϕ iff (a) int ∈ Φ; and (b) for itself be exponential in the size of the input. p ∈ AP, p ∈ a iff p ∈ Φ; and (c) for each ψ ∈ + cl(ϕ), ψ ∈ Ψ iff ψ ∈ Φ; (d) for each µψ ∈ cl(ϕ), Corollary 5.3 Model checking NWTL specifications with µψ 6∈ Φ. respect to Boolean programs is EXPTIME-complete. 4. For atoms Φ, Ψ , Ψ and a symbol a ⊆ AP, l h 5.3 Checking within operator (Φ, a, Ψl, Ψh) is a call transition of Aϕ iff (a) call ∈ Φ; and (b) for p ∈ AP, p ∈ a iff p ∈ Φ; and (c) for We now show that adding within operators makes model- each ψ ∈ cl(ϕ), ψ ∈ Ψl iff ψ ∈ Φ; and (d) for cl checking doubly exponential. Given a formula ϕ of NWTL each µψ ∈ (ϕ), µψ ∈ Ψh iff µψ ∈ Φ. + or NWTL , let pϕ be a special proposition that does not 5. For atoms Φl, Φh, Ψ and a symbol a ⊆ AP, appear in ϕ. Let Wϕ be the language of nested words w¯ (Φl, Φh, a, Ψ) is a return transition of Aϕ iff (a) ret ∈ such that for each position i, (¯w,i) |= p iff (¯w,i) |= Wϕ. AP ϕ Φl; and (b) for p ∈ , p ∈ a iff p ∈ Φl; and (c) for We construct a doubly-exponential automaton B that cap- cl each ψ ∈ (ϕ), ψ ∈ Ψ iff ψ ∈ Φl; and (d) for tures W . First, using the tableau construction for NWTL+, cl ϕ each µψ ∈ (ϕ), µψ ∈ Φh iff ψ ∈ Φl. we construct an exponential-size automaton A that captures 6. For atoms Φ, Ψ and a symbol a ⊆ AP, (Φ, a, Ψ) is a nested words that satisfy ϕ. Intuitively, every time a propo- pending-return transition of Aϕ iff (a) ret ∈ Φ; and sition pϕ is encountered, we want to start a new copy of A, (b) for p ∈ AP, p ∈ a iff p ∈ Φ; and (c) for each and a state of B keeps track of states of multiple copies of

7 A. At a call, B guesses whether the call has a matching re- Remark: checking w¯ |= ϕ for finite nested words For turn or not. In the latter case, we need to maintain pairs of finite nested words, one evaluates the complexity of check- states of A so that the join at return positions can be done ing whether the given word satisfies a formula, in terms correctly. A state of B, then, is eitheraset ofstates of A ora of the length |w¯| of the word and the size of the formula. set of pairs of states of A. We explain the latter case. A pair A straightforward recursion on subformulas shows that for (q, q′) belongsto the state of B, while reading position i ofa NWTL formulas the complexity of this check is O(|w¯|·|ϕ|), nested word w¯, if the subword from i to the first unmatched and for both logics with within operators, CaRet + {C, R} return can take A from q to q′. When reading an internal and LTLµ + W, it is O(|w¯|2 · |ϕ|). symbol a, a summary (q, q′) in current state can be updated to (u, q′), provided A has an internal transition from q to 5.4 On within and succinctness u on a. Let B read a call symbol a. Consider a summary ′ (q, q ) in the current state, and a call-transition (q,a,ql, qh) We saw that adding within operators to NWTL+ in- of A. Then B guesses the return transition (ul, qh,b,u) that creases the complexity of model-checkingby one exponent. will be used by A at the matching return, and sends the In particular, there could be no polynomial-time translation ′ summary (ql,ul) along the call edge and the triple (b,u,q ) from NWTL+ + W to NWTL+. We now prove a stronger along the nesting edge. While processing a return symbol result that gives a space bound as well: while NWTL+ +W b, the current state of B must contain summaries only of has the same power as NWTL+, its formulae can be ex- the form (q, q) where the two states match, and for each ponentially more succinct than formulas NWTL+. That is, ′ + summary (b,u,q ) retrieved from the state along the nesting there is a sequence ϕn, n ∈ N, of NWTL + W formulas ′ edge, the new state contains (u, q ). Finally, B must enforce such that ϕn is of size O(n), and the smallest formula of + Ω(n) that Wϕ holds when pϕ is read. Only a call symbol a can NWTL equivalent to ϕn is of size 2 . For this result, contain pϕ, and when reading such a symbol, B guesses a we require nested ω-words to be over the alphabet 2AP . call transition (q0,a,ql, qh), where q0 is the initial state of + A, and a return transition (ul, qh,b,qf ), where qf is an ac- Theorem 5.6 NWTL +W is exponentially more succinct + cepting state of A, and sends the summary (ql,ul) along the than NWTL . call edge and the symbol b along the nesting edge. + The proof is based upon succinctness results in [9, 22], Lemma 5.4 For every formula ϕ of NWTL , there is a by adapting their examples to nested words. nested word automaton that accepts the language Wϕ and has size doubly-exponential in |ϕ|. 6 Finite-Variable Fragments Consider a formula ϕ of NWTL++W. For every within- ′ subformula Wϕ of ϕ, let ϕ be obtained from ϕ by substi- We have already seen that FO formulas in one free vari- tuting each top-level subformula Wψ in ϕ by the propo- able over nested words can be written using just three dis- sition pψ. Each of these primed formulas is a formula of + tinct variables, as in the case of the usual, unnested, words. NWTL . Then, if we take the product of the nested word For finite nested words this is a consequence of a tree rep- automata accepting Wϕ′ corresponding to all the within- resentation of nested words and the three-variable property subformulas ϕ, together with the nested word automaton for FO over finite trees [19], and for infinite nested words Aϕ′ , the resulting language captures the set of models of this is a consequence Theorem 4.1. ϕ. Intuitively, the automaton for Wϕ′ is ensuring that the In this section we prove two results. First, we give a truth of the proposition pϕ reflects the truth of the subfor- model-theoretic proof that FO formulas with zero, one, or mula Wϕ. If ϕ itself has a within-subformula Wψ, then the two free variables over nested words (finite or infinite) are automaton for treats it as an atomic proposition , and 3 3 ϕ pψ equivalent to FO formulas. Given the FO = FO col- the automaton checking pψ, running in parallel, makes sure lapse, we ask whether there is a temporal logic expressively that the truth of correctly reflects the truth of . 2 pψ Wψ complete for FO , the two-variable fragment. We adapt For the lower bound, the decision problem for LTL techniques from [9] to find a temporal logic that has the games can be reduced to satisfiability problem for formulas 2 same expressiveness as FO over nested words (in a vo- with linear untils and within operators [17], and this shows cabulary that has successor relations corresponding to the that for CaRet extended with the within operator, the satis- “next” temporal operators). fiability problem is 2EXPTIME-hard. We thus obtain: Proposition 5.5 For the logic NWTL+ extended with the 6.1 The three-variable property within operator W the satisfiability problem and the model checking problem with respect to Boolean programs, are We give a model-theoretic, rather than a syntactic, argu- both 2EXPTIME-complete. ment, that uses Ehrenfeucht-Fra¨ıss´egames and shows that

8 over nested words, formulas with at most two free vari- 6.2 The two-variable fragment ables are equivalent to FO3 formulas. Note that for finite nested words, the translation into trees, already used in the In this section, we construct a temporal logic that cap- proof of Theorem 4.1, can be done using at most three vari- tures the two-variable fragment of FO. Note that for fi- ables. This means that the result of [19] establishing the nite unranked trees, a navigational logic capturing FO2 is 3-variable property for finite ordered unranked trees gives known [20, 19]: it corresponds to a fragment of XPath. us the 3-variable property for finite nested words. We prove However, translating the basic predicates over trees into the that FO = FO3 over arbitrary nested words. vocabulary of nested words requires 3 variables, and thus we cannot apply existing results even in the finite case. 2 Theorem 6.1 Over finite or infinite nested words, every FO Since FO over a linear ordering cannot define the suc- cessor relation but temporal logics have next operators, we formula with at most 2 free variables is equivalent to an explicitly introduce successors into the vocabulary of . FO3 formula. FO These successor relations in effect partition the linear edges into three disjoint types; interior edges, call edges, and re- Proof: We look at infinite nested words since the finite turn edges, and the nesting edges (except those from a po- case was settled in [19]. It is more convenient to prove the sition to its linear successor) into two disjoint types; call- result for ordered unranked forests in which every subtree is return summaries, and call-interior-return summaries. finite. We translate a nested ω-word into such a forest is as • Si(i, j) holds iff j = i +1 and either µ(i, j) or i is not follows: when a call i with µ(i, j) is encountered, it defines a call and j is not a return. a subtree with i as its root, and j +1 as the next sibling (note c that this is different from the translation into binary trees we • S (i, j) holds iff i is a calland j = i+1 is not a return; used before). If i is an internal position, or a pending call • Sr(i, j) holds iff i is nota calland j = i+1 is a return. or return position, then it has no descendants and its next • Scr(i, j) holds iff µ(i, j) and there is a path from i to sibling is i +1. Matched returns do not have next sibling. j using only call and return edges. It is routine to define, in FO, relations  and  for desc sib cir holds iff and neither nor descendant and younger sibling in such a forest. Further- • S (i, j) µ(i, j) j = i +1 Scr(i, j). more, from these relations, we can define the usual ≤ and µ in nested words using at most 3 variables as follows. The Let T denote the set {c,i,r,cr,cir} of all edge types. In formulas for x ≤ y and µ(x, y) are given by addition to the built-in predicates St for t ∈ T , we add the transitive closure of all unions of subsets of these relations. (y desc x) ∨ ∃z x desc z ∧ ∃x (z ≺sib x ∧ y desc x) That is, for each non-empty set Γ ⊆ T of edge types, let t (y desc x) ∧ ∀z (z desc x) → z ≤ y) . SΓ stand for the union ∪ S , and let ≤Γ be the reflexive-  t∈Γ transitive closure of SΓ. Now when we refer to FO2 over  Thus, it suffices to prove the three-variable property for nested words, we mean FO in the vocabulary of the unary such ordered forests, which will be referred to as A, B, etc. predicates plus all the ≤Γ’s, the five successor relations, and Gv We shall use pebble games. Let m(A,a1,b1, B,b1,b2) be the built-in unary call and ret predicates. the m-move, v-pebble game on structures A and B where We define a temporal logic unary-NWTL that has future initially pebbles xi are placed on ai in A and bi in B. and past versions of next operators parameterized by edge Gv Player II has a winning strategy for m(A,a1,b1, B,b1,b2) types, and eventually operators parameterized by a set of iff A,a1,a2 and B,b1,b2 agree on all formulas with at edge types. Its formulas are given by: most v variables and quantifier-depth m. We know from ′ [13] that to prove Theorem 6.1, it suffices to show that, ϕ := ⊤ | a | call | ret | ¬ϕ | ϕ ∨ ϕ | t t Γ for all k, if Player II has a winning strategy for the game ϕ | ⊖ ϕ | 3Γϕ | ϕ G3 (A,a ,a ; B,b ,b ), then she also has a winning 3k+2 1 2 1 2 where a ranges over Σ, t ranges over T , and Γ ranges over strategy for the game Gk(A,a ,a ; B,b ,b ). k 1 2 1 2 non-empty subsets of T . The semantics is defined in the We show that Player II can win the k-pebble game by obvious way; for example, (¯w,i) |= 3Γϕ iff for some po- maintaining a set of 3-pebblesubgameson which she copies sition i ≤Γ j, (¯w, j) |= ϕ. Player I’s moves and decides on responses using her win- For an FO2 formula ϕ(x) with one free variable x, let ning strategy for these smaller 3-pebble games. The choice qdp(ϕ) be its quantifier depth, and for a unary-NWTL for- of these sub-games will partition the universe |A| ∪ |B| so mula ϕ′, let odp(ϕ′) be its operator depth. that each play by Player I in the k-pebble game will be an- swered in one 3-pebble game. This is similar to the proof Theorem 6.2 1. unary-NWTL is expressively complete that linear orderings have the 3-variable property [13]. 2 for FO2.

9 2. If formulas are viewed as DAGs (i.e identical sub- [3] R. Alur and P. Madhusudan. Visibly pushdown languages. formulas are shared), then every FO2 formula ϕ(x) In STOC’04, pages 202–211. can be converted to an equivalent unary-NWTL for- [4] R. Alur and P. Madhusudan. Adding nesting structure to mula ϕ′ of size 2O(|ϕ|(qdp(ϕ)+1)) and odp(ϕ′) ≤ words. In DTL’06, pages 1–13. 10 qdp(ϕ). The translation is computable in time [5] T. Ball and S. Rajamani. Bebop: A symbolic model checker polynomial in the size of ϕ′. for boolean programs. In SPIN’00, pages 113–130. 3. Model checking of unary-NWTL can be carried out [6] V. B´ar´any, C. L´oding, O. Serre. Regularity problems for vis- ibly pushdown languages. STACS 2006, pages 420–431. with the same worst case complexity as for NWTL. [7] Document Object Model DOM. http://www.w3.org/DOM. [8] J. Esparza and S. Schwoon. A BDD-based model checker 2 Proof sketch. The translation from unary-NWTL into FO for recursive programs. In CAV’01, pages 324–336. is standard. For the other direction we adapt techniques of [9] K. Etessami, M. Vardi, and T. Wilke. First-order logic with 2 [9]. Given an FO formula ϕ(x), the translation works a two variables and unary temporal logic. Information and follows. When ϕ(x) is of the form a(x), for a proposi- Computation 179(2): 279–295, 2002. tion a, it outputs a. The cases of Boolean connectives are [10] M. Frick, M. Grohe. The complexity of first-order and straightforward. The two cases that remain are when ϕ(x) monadic second-order logic revisited. LICS 2002, 215–224. ∗ ∗ is of the form ∃x ϕ (x) or ∃y ϕ (x, y). In both cases, we [11] G. Gottlob, C. Koch. Monadic datalog and the expressive say that ϕ(x) is existential. In the first case, ϕ(x) is equiv- power of languages for web information extraction. Journal alent to ∃y ϕ∗(y) and, viewing x as a dummy free variable of the ACM 51 (2004), 74–113. in ϕ∗(y), this reduces to the second case. [12] H.W. Kamp. Tense Logic and the Theory of Linear Order. In the second case, we can rewrite ϕ∗(x, y) as Ph.D. Thesis, UCLA, 1968. β(χ0(x, y), ..., χr−1(x, y), ξ0(x), ..., ξs−1(x), ζ0(y), [13] N. Immerman and D. Kozen. Definability with bounded ...,ζt−1(y)), where β is a propositional formula, each χi number of bound variables. Information and Computation, is an atomic order formula, and ξi’s and ζi’s are atomic or 83 (1989), 121-139. existential FO2 formulas with quantifier depth < qdp(ϕ). [14] N. Klarlund, T. Schwentick and D. Suciu. XML: model, In order to be able to recurse on subformulas of ϕ we have schemas, types, logics, and queries. In Logics for Emerg- ing Applications of Databases, Springer, 2003, pages 1–41. to separate the ξi’s from the ζi’s. For that, we consider mu- tually exclusive and complete order types that enumerate [15] L. Libkin. Logics for unranked trees: an overview. In ICALP possible order relations between x and y with respect to 2005, pages 35-50. different St’s. Under each order type, each atomic order [16] C. L¨oding, P. Madhusudan, O. Serre. Visibly pushdown formula evaluates to either ⊤ or ⊥. Furthermore, if τ is games. In FSTTCS 2004, pages 408–420. an order type, ψ(x) an FO2 formula, and ψ′ an equivalent [17] P. Madhusudan, personal communication. unary-NWTL formula, one can obtain a unary-NWTL for- [18] M. Marx. Conditional XPath, the first order complete XPath mula τhψi equivalent to ∃y(τ ∧ ψ(y)). Using this and the dialect. In PODS’04, pages 13–22. ′ ′ ′ hypothesis for ξi for i

10 A Proofs and Intermediate Results

Some terminology

The quantifier rank (or quantifier depth)ofan FO formula ϕ is the depth of quantifier nesting in ϕ. The rank-k type of a structure M over a relational vocabulary is the set {ϕ | M |= ϕ and the quantifier rank of ϕ is k}, where ϕ ranges over FO sentences over the vocabulary. It is well-known that there are finitely many rank-k types for all k, and for each rank-k type τ there is an FO sentence ϕτ such that M |= ϕτ iff the rank-k type of M is τ. Sometimes we associate types with formulas that define them. Many proofs in this paper make use of Ehrenfeucht-Fra¨ısse` (EF) games. This game is played in two structures, M and M′, over the same vocabulary, by two players, Player I and the Player II. In round i Player I selects a structure, say M, and ′ an element ci in the domain of M; Player II responds by selecting an element ei in the domain of M . Player II wins in k ′ rounds, for k ≥ 0, if {(ci,ei) | i ≤ k} defines a partial isomorphism between M and M . Also, if a¯ is an m-tuple in the ′ ′ domain of M and ¯b is an m-tuple in the domain of M , where m ≥ 0, we write (M, a¯) ≡k (M , ¯b) whenever Player II wins in k rounds no matter how Player I plays, but starting from position (¯a, ¯b). ′ ′ We write M ≡k M iff M and M have the same rank-k type, that is for every FO sentence ϕ of quantifier rank-k, ′ ′ M |= ϕ ⇔ M |= ϕ. It is well-known that M ≡k M iff Player II has a winning strategy in the k-round Ehrenfeucht-Fra¨ıss`e game on M and M′. In the proof of Theorem 6.1, we shall also use k-pebble games. In such a game, Player I and Player II have access k matching pebbles each, and each round consists of Player I either removing, or placing, or replacing a pebble in one structure, and Player II replicating the move in the other structure. The correspondencegiven by the matching pebbles should be a partial isomorphism. If Player II can play while maintaining partial isomorphism for m rounds, then the structures agree on all FOk sentences of quantifier rank up to m; if Player II can play while maintaining partial isomorphism forever, then the structures agree on all FOk sentences.

Proof of Theorem 4.1

We start with the easy direction NWTL ⊆ FO.

Lemma A.1 For every NWTL formula ϕ, there exists an FO formula αϕ(x) that uses at most three variables x,y,z such that for every nested word w¯ (finite or infinite), we have (¯w,i) |= ϕ iff w¯ |= αϕ(i).

Proof of Lemma A.1: The proof is by induction on the formulas and very simple for all the cases except Uσ and Sσ: for example,

α µϕ(x) = ∃y µ(x, y) ∧ ∃x (x = y ∧ αϕ(x)) . σ For translating U , we need a few auxiliary formulas. Our first goal is to define a formula γr(x, z) saying that x is R(z), i.e. the return of the innermost call within which z is executed. For that, we start with δ(y,z) = z

∃y (y = x ∧ δ(y,z)) ∧ ∀y (δ(y,z) → y ≥ x).

Likewise, we define γc(y,z) stating that that y equals C(z), that is, the innermost call within which z is executed. Now define

χ1(y,z) = ∃x γr(x, z) ∧ x ≤ y , χ2(x, z) = ∃y γc(y,z) ∧ y ≥ x and χ(x,y,z) as χ1(y,z) ∧ χ2(x, z). Then this formula says that the summary path from x to ydoes not pass through z, assuming x

αψ(x) ∨ ∃y y > x ∧ αϕ(x) ∧ ∃x (x = y ∧ αψ(x)) ∧ ∀z (x

11 In the proof of the other direction, FO ⊆ NWTL, we shall need two variants of summary paths. A Semi-Strict summary path between positions i and j, with i < j, in a nested word w¯, is a sequence i = i0

r(ip)+1 if ip is a matched call and j > r(ip) ip+1 = (ip +1 otherwise. That is, when skipping a call, instead of jumping to the matching return position, a semi-strict path will jump to its successor. A strict summary path is a semi-strict summary path i = i0 < i1 < i2 < · · · < ik = j in which no ip with p < k is a matched return position. In other words, a strict summary path stops if it reaches a matched return position. In particular there may be positions i < j in a nested word such that no strict summary path exists between them. The until/since operators Uσ Sσ Uσ Sσ for semi-strict summary paths and strict summary paths will be denoted by ss/ ss and s / s , respectively. Versions of Uσ Sσ Uσ Sσ Uσ Sσ ss s NWTL in which / are replaced by ss/ ss (or s / s ) will be denoted by NWTL as NWTL . We will use mret for ret∧⊖µ⊤, and mcall for call∧ µ⊤, to capture matching return and call positions, respectively. The proof is based on two lemmas.

Lemma A.2 NWTLs ⊆ NWTLss ⊆ NWTL.

Lemma A.3 FO ⊆ NWTLs.

This of course implies the theorem: NWTL ⊆ FO ⊆ NWTLs ⊆ NWTLss ⊆ NWTL. Note that as a corollary we also obtain NWTLs = NWTLss = FO. s ss Proof of Lemma A.2: For translating an NWTL formula ϕ into an equivalent formula αϕ of NWTL we need to express Uσ Uσ Uσ ss ψ s θ with ss, which is simply (αψ ∧ ¬ret) ssαθ, and likewise for the since operators. For translating each NWTL formula ϕ into an equivalent NWTL formula βϕ, again we need to consider only the case of until/since operators. The Uσ formula ψ ssθ is translated into

σ βθ ∨ βψ ∧ (βψ ∨ ret) ∧ (¬call → βψ) U   

(βψ ∨ ret) ∧ (¬call → βθ) ∧ (call → ( βθ ∨ µ βθ ∨ (¬ret ∧ γ))) , (1)   where γ is a formula defined as follows:

σ (βψ ∨ ret) ∧ (¬call → βψ) ∧ ( ret → call) U  

(βψ ∨ ret) ∧ (¬call → βθ) ∧ (call → ( βθ ∨ µ βθ))   Sσ ss The translation for ss is similar. The proof that the translation is correct is by induction on the structure of NWTL formulas. Again we need to consider only the case of until/since operators. Assume that ψ, θ are equivalent to βψ and Uσ βθ, respectively. We need to prove that ψ ssθ is equivalent to (1). We only show here that if (¯w,i) satisfies (1), then Uσ (¯w,i) |= ψ ssθ, as the other direction is similar. Given that (¯w,i) satisfies (1), either (¯w,i) |= βθ or (¯w,i) satisfies the Uσ right-hand-side of the outer disjunction of (1). Given that βθ and θ are equivalent, in the former case (¯w,i) |= ψ ssθ. Thus, assume that the latter case holds. Then (¯w,i) |= ψ, since ψ and βψ are equivalent, and there exists a summary path i = i0

(¯w,ik) |= (βψ ∨ ret) ∧ (¬call → βψ) 0 ≤ k

1. First, assume that ip is not a call position. Then given that (¯w,ip) |= (¬call → βθ), we have that (¯w,ip + 1) |= βθ. Only one semi-strict summary path with endpoints i and ip +1 can be obtained from the sequence i0 < i1 < · · · < ip < ip +1 by removing some positions; let i = j0 < j1 < · · · < jℓ = ip +1 be that semi-strict summary path. Then we have that (¯w, jℓ) |= θ since (¯w,ip + 1) |= βθ. Next we show that (¯w, jk) |= ψ for every k ∈ [0,ℓ − 1].

12 If k = 0, then the property holds since (¯w,i) |= ψ. Assume that k ∈ [1,ℓ − 1]. If jk is not a return position, then (¯w, jk) |= ψ since (¯w, jk) |= (βψ ∨ ret). If jk is a return position, then given that jk is in the semi-strict summary path from j0 to jℓ, we have that jk − 1 is also in this semi-strict summary path and is not a call position. Thus, given that (¯w, jk − 1) |= (¬call → βψ), we have that (¯w, jk) |= βψ. We conclude that (¯w, jk) |= ψ and, hence, Uσ (¯w,i) |= ψ ssθ.

2. Second, assume that ip is a call position and (¯w,ip) |= βθ ∨ µ βθ. Then there exists a position ip+1 > ip such that (¯w,ip+1) |= βθ and ip+1 is either ip +1 or the linear successor of the matching return of ip. Only one semi-strict summary path with endpoints i and ip+1 can be obtained from the sequence i0 < i1 < · · · < ip < ip+1 by removing some positions; let i = j0 < j1 < · · · < jℓ = ip+1 be that semi-strict summary path. Then we have that (¯w, jℓ) |= θ since (¯w,ip+1) |= βθ. Next we show that (¯w, jk) |= ψ for every k ∈ [0,ℓ − 1]. If k =0, then the property holds since (¯w,i) |= ψ. Assume that k ∈ [1,ℓ − 1]. If jk is not a return position, then (¯w, jk) |= ψ since (¯w, jk) |= (βψ ∨ ret). If jk is a return position, then given that jk is in the semi-strict summary path from j0 to jℓ, we have that jk − 1 is also in this semi-strict summary path and is not a call position. Thus, given that (¯w, jk − 1) |= (¬call → βψ), we have that Uσ (¯w, jk) |= βψ. We conclude that (¯w, jk) |= ψ and, hence, (¯w,i) |= ψ ssθ.

3. Third, assume that ip is a call position and (¯w,ip) |= (¬ret∧γ). Then there exists a path ip +1 = ip+1

(¯w,ik) |= (βψ ∨ ret) ∧ (¬call → βψ) ∧ ( ret → call) p +1 ≤ k < q (¯w,iq) |= (βψ ∨ ret) ∧ (¬call → βθ) ∧ (call → ( βθ ∨ µ βθ))

Next we show that if ip is a matched call with return jp, then iq < jp. On the contrary, assume that iq ≥ jp. On the contrary, assume that iq ≥ jp. Then there exists k ∈ [p +1, q] such that ik = jp. Given that ip+1 is not a return position, we have that q >p +1 and, therefore, ik − 1 is also a position in the path ip+1 < ip+2 ... < iq. But given that ik = jp is the matching return of ip and ip +1 ≤ ik − 1, we have that ik − 1 is not a call position. Thus, (¯w,ik − 1) 6|= ret → call, which contradicts the fact that ip+1

(a) Assume that iq is not a call position. Then given that (¯w,iq) |= (¬call → βθ), we have that (¯w,iq + 1) |= βθ. Furthermore, given that if ip is a matched call with return jp, then we necessarily have that iq < jp, we conclude that only one semi-strict summary path with endpoints i = i0 and iq +1 can be obtained from the sequence i0

(b) Finally assume that iq is a call position. Then given that (¯w,iq) |= (call → ( βθ ∨ µ βθ)), there exists a position iq+1 > iq such that (¯w,iq+1) |= βθ and iq+1 is either iq +1 or the linear successor of the matching return of iq. Furthermore, given that if ip is a matched call with return jp, then we necessarily have that iq < jp, we conclude that only one semi-strict summary path with endpoints i = i0 and iq+1 can be obtained from the sequence i0 < i1 < · · · < iq < iq+1 by removing some positions; let i = j0 < j1 < · · · < jℓ = iq+1 be that semi-strict summary path. Then we have that (¯w, jℓ) |= θ since (¯w,iq+1) |= βθ. Next we show that (¯w, jk) |= ψ for every k ∈ [0,ℓ − 1]. If k =0, then the property holds since (¯w,i) |= ψ. Assume that k ∈ [1,ℓ − 1]. If jk is not a return position, then (¯w, jk) |= ψ since (¯w, jk) |= (βψ ∨ ret). If jk is a return position, then given that jk is in the semi-strict summary path from j0 to jℓ, we have that jk − 1 is also in this semi-strict summary path and is not a call position. Thus, given that (¯w, jk − 1) |= (¬call → βψ), we have that (¯w, jk) |= βψ. We conclude that Uσ (¯w, jk) |= ψ and, hence, (¯w,i) |= ψ ssθ. This concludes the proof of the lemma. 2

Proof of Lemma A.3: We start with the finite case, and then show how the inclusion extends to nested ω-words.

13 As a tool we shall need a slight modification of a result from [25, 18] providing an expressively complete temporal logic for trees with at most binary branching. We consider binary trees whose domain D is a prefix-closed subset of {0, 1}∗, and we impose a condition that if s · 1 ∈ D then s · 0 ∈ D. When we refer to FO on trees, we assume they have two successor relations S0,S1 and the descendant relation  (which is just the prefix relation on strings) plus the labeling predicates, which include two new labels pcall and pret (for pending calls and returns). Each node can be labeled by either a letter from Σ, or by a letter from Σ and pcall, or by a letter from Σ and pret (i.e. labels pcall and pret need not be disjoint from other labels). We also consider the following logic TLtree:

ϕ := a | ϕ ∨ ϕ | ¬ϕ | ↓ϕ | ↑ϕ | →ϕ | ←ϕ | ϕU↓ϕ | ϕS↓ϕ where a ranges over Σ ∪{pcall, pret}, with the following semantics:

• (T,s) |= ↓ϕ iff (T,s · i) |= ϕ either for i =0 or i =1;

• (T,s · i) |= ↑ϕ iff (T,s) |= ϕ (where i is either 0 or 1);

• (T,s · 0) |= →ϕ iff (T,s · 1) |= ϕ;

• (T,s · 1) |= ←ϕ iff (T,s · 0) |= ϕ;

′ ′ ′ ′′ ′′ ′′ ′ • (T,s) |= ϕU↓ψ iff there exists s such that s  s , (T,s ) |= ψ, and (T,s ) |= ϕ for all s such that s  s ≺ s ;

′ ′ ′ ′′ ′′ ′ ′′ • (T,s) |= ϕS↓ψ iff there exists s such that s  s, (T,s ) |= ψ, and (T,s ) |= ϕ for all s such that s ≺ s  s.

Lemma A.4 (see [18]) For unary queries over finite binary trees, TLtree = FO.

This lemma is an immediate corollary of expressive completeness of logic Xuntil from [18] on ordered unranked trees, as for a fixed number of siblings, the until and since operators can be expressed in terms of the next and previous operators. The result of [18] applies to arbitrary alphabets, and thus in particular to our labelings that may use pcall and pret. Next we need a translation from nested words into binary trees, essentially the same as in [4]. For each nested word w¯ we have a tree Tw¯ and a function ιw-t :w ¯ → Tw¯ that maps each position of w¯ to a node of Tw¯ as follows:

• the first position of w¯ is mapped into the root of Tw¯;

• if s = ιw-t(i) then: 1. if i is an internal, or an unamcthed call, or an unmatched return, and is not the last position of w¯, then s has only child s · 0 and ιw-t(i +1)= s · 0;

2. if i is a matched call, then s has both children s · 0 and s · 1 and ιw-t(r(i)+1)= s · 0, and ιw-t(i +1)= s · 1. 3. if i is a matched return, then then s has no children.

Of course the Σ-labels of i and ιw-t(i) are the same. If i was a pending call, we label ιw-t(i) with pcall, and if i was a pending return, we label ιw-t(i) with pret. Note that ιw-t is a bijection, and that labels pcall and pret may only occur on the leftmost branch of Tw¯. The following is immediate by straightforward translations.

Claim A.5 For every FO formula ϕ(x) over nested words there is an FO formula ϕ′(x) over trees such that for every nested ′ word w¯ and a position i in it, we have w¯ |= ϕ(i) iff Tw¯ |= ϕ (ιw-t(i)).

Since FO = TLtree by Lemma A.4, all that remains to prove is the following claim.

Claim A.6 For every TLtree formula ϕ, there exists an NWTLs formula ϕ◦ such that for every nested word w¯ and every position i in it, we have ◦ (¯w,i) |= ϕ ⇔ (Tw¯, ιw-t(i)) |= ϕ.

14 This is now done by induction, omitting the obvious cases of propositional letters and Boolean connectives. We note that the path in the tree between ιw-t(i) and ιw-t(j) corresponds precisely to the strict summary path from i to j (that is, if such a strict summary path is i = i0,i1,...,ik = j, then ιw-t(i0), ιw-t(i1),...,ιw-t(ik) is the path from ιw-t(i) to ιw-t(j) in Tw¯). Hence, the translations of until and since operators are:

U ◦ ◦Uσ ◦ S ◦ ◦Sσ ◦ (ϕ ↓ψ) = ϕ s ψ , (ϕ ↓ψ) = ϕ s ψ . For translating next and previous operators, and pending calls/returns, define:

mcall ≡ µ⊤ (true in a matched call position);

mret ≡ ⊖µ⊤ (true in a matched return position).

Then the rest of the translation is as follows:

pcall◦ ≡ call ∧ ¬mcall pret◦ ≡ ret ∧ ¬mret ◦ ◦ ◦ ( ↓ϕ) ≡ ¬ret ∧ ϕ ∨ (call ∧ µ ϕ )

◦  ◦  ◦ ( ↑ϕ) ≡ ⊖ret ∧ ⊖⊖µϕ ∨ ⊖¬ret ∧ ⊖ϕ ◦  ◦   ( →ϕ) ≡ ⊖ret ∧ ⊖⊖µ ϕ ◦ ◦ ( ←ϕ) ≡ ⊖call ∧ ⊖ µ ϕ

Now with the proof completed for finite nested words, we extend it to the case of nested ω-words. Note that Claim A.5 continues to hold, and Claim A.6 provides a syntactic translation that applies to both finite and infinite nested words, and thus we need to prove an analog of Lemma A.4 for trees of the form Tw¯, where w¯ ranges over nested ω-words. If w¯ is an nested ω-word, then Tw¯ has exactly one infinite branch, which consists precisely of ιw-t(i) where i is an outer position, i.e., not inside any (matched) call. We say that i is inside a call if there exists a call j with a matching return k such that ji being its matching return, then the left successor of ιw-t(i) on the infinite path is w¯ ιw-t(j + 1). Furthermore, the subtree t (i) which has i as the root, its right child, and all the descendants of the right child is finite and isomorphic to Tw¯[i,j] (note that w¯[i, j] has no pending calls/returns). If i an internal or pending call/return outer position, we let tw¯(i) be a single node tree labeled as i in w¯. w¯ w¯ Let w¯ now be an nested ω-word. For each outer position i we let τm(i) be the rank-m type of t (i). If i is not a matched call, such a type is completely described by i’s label (which consists of a label in Σ and potentially pcall or pret). w¯ If j is not an outer position, and i is an outer position such that i < j ≤ k, where k is the matching return of i, then τm(j) is the rank-m type of (Tw¯[i, k], ιw-t(j)) (i.e., the rank of Tw¯[i, k] with a distinguished node corresponding to j). Next, for an nested ω-word w¯, let s be a node in Tw¯ such that s = ιw-t(i). Let i1,i2,... enumerate all the outer positions w¯ of w¯, and assume that ip is such that ip ≤ i

′ ′ ′ Claim A.7 Let w,¯ w¯ be two nested ω-words, and s = ιw-t(i),s = ιw-t(i ) two nodes in Tw¯ and Tw¯′ such that:

← ← ′ ′ (a) sm (¯w,s) ≡m sm (¯w ,s ); → → ′ ′ (b) sm (¯w,s) ≡m sm (¯w ,s ); ′ w¯ w¯ ′ (c) τm(i)= τm (i ). ′ Then (Tw¯,s) ≡m (Tw¯′ ,s ).

The win for Player II is straightforward. If i1,i2,... enumerate outer positions in w¯ and ip ≤ ii. Player II then selects j so that

15 ′ ′ the response is in tw¯ (j′) according to his winning strategy in games either (a) or (b) (if j is in tw¯(i), then j′ is in τ w¯ (i′)), ′ m and then, since the rank-m types of tw¯(j) and the chosen tw¯ (j′) are the same, selects the actual response according to the ′ w¯ w¯ ′ winning strategy t (j) ≡m t (j ). tree Next we show how Claim A.7 proves that FO is expressible in TL over infinite trees Tw¯. First note that being an outer node is expressible: since ←⊤ is true in right children of matched calls, then

αouter = ¬ ⊤S↓( ←⊤) is true if no node on the path to the root is inside a call, that is, precisely in outer nodes. Next note that for each rank-m type τ of a tree there is a formula βτ such that if s = ιw-t(i) is an outer node of Tw¯, then w¯ (Tw¯,s) |= βτ iff the rank-m type of t (i) is τ. If i is not a matched call, then such a type is uniquely determined by i’s label and perhaps pcall or pret, and thus is definable in TLtree. w¯ If i is a matched call, the existence of such a formula βτ follows from the fact the rank-m type of t (i) is completely ′ w¯ w¯ determined by the label of i and the rank-m type τ of the subtree t0 (i) of t (i) rooted at the right child of s (recall that the root has only (right) child, by the definition of tw¯(i)). Type τ ′ is expressible in FO and, since tw¯(i) is finite, is expressible in tree ′ TL as well by Lemma A.4. Furthermore, by the separation property of [18], it is expressible by a formula βτ ′ that does S ′ w¯ ′ tree not use ↑ and ↓. This means that (Tw¯,s · 1) |= βτ ′ iff the rank-m type of t0 (i) is τ . Hence, βτ is expressible in TL ′ as a Boolean combination of propositional letters from Σ and formulas ↓βτ ′ . Note that in this case, βτ does not use pcall and pret. ← → tree By Claim A.7, we need to express, for each node s = ιw-t(i), the rank-m types of sm (¯w,s) and sm (¯w,s) in TL over w¯ Tw¯, as well as the rank-m type of τ (i), in order to express a quantifier-rank m formula, as it will be a Boolean combination of such formulas. Given s, we need to define ιw-t(ip) – the outer position in whose scope s occurs – and then from that point evaluate two FO formulas, defining rank-m types of words over the alphabet of rank-m types of finite trees. By Kamp’s theorem, each such FO formula is equivalent to an LTL formula whose propositional letters are rank-m types of trees. → Assume we have an LTL formula γ expressing the rank-m type τ0 of sm (¯w,s). By Kamp’s theorem and the separation property for LTL, it is written using only propositional letters, Boolean connectives, and U (that is, no ⊖ and S). We now tree inductively take conjunction of each subformula of γ with ¬( ←⊤) (i.e., a TL formula which is true in left successors), tree replace LTL connectives and U by ↓ and U↓, and replace each propositional letter τ by βτ , to obtain a TL formula ′ ′ → γ . Then (Tw¯, ιw-t(ip)) |= γ iff sm (¯w,s) has type τ0. Thus, for a formula ′′ ′ ′ γ = αouter ∧ γ ∨ ¬αouterS↓(αouter ∧ γ )

→ is true in (Tw¯, ιw-t(i)) iff the rank-m type of sm(¯w,s) is τ0. ← The proof for sm (¯w,s) is similar. Since this word is finite, by Kamp’s theorem and the separation property, there is an LTL formula γ that uses ⊖, S, propositional letters and Boolean connectives such that γ evaluated in the last position of the word expresses its rank-m type. Since there is exactly one path from each node to the root, to translate γ into a TLtree formula ′ γ we just need to replace propositional letters by the corresponding formulas βτ , and ⊖ by ↑. Then, as for the case of → ′ ← sm (¯w,s), we have that γ evaluated in ιw-t(ip) expresses the type of sm (¯w,s). Then finally the same formula as in the case → of sm (¯w,s) evaluated in s expresses that type. tree w¯ w¯ Finally we need a TL formula that expresses τm(i), the rank-m type of t (i), when evaluated in (Tw¯, ιw-t(i)). We w¯ can split this into two cases. If αouter is true in ιw-t(i), then, as explained earlier, the rank-m type of t (i) is a Boolean combination of propositional letters, and thus definable. w¯ So we now consider the case when αouter is not true in ιw-t(i). Then τm(i) is given by a Boolean combination of formulas w¯ w¯ that specify (1) the label of ip, and (2) the rank-m type of (t0 (ip),s), the subtree of t (ip) rooted at the child of ip with s as tree w¯ a distinguished node. This type can be expressed by a formula γ in TL over t0 (ip) by [18]. Hence if in γ we recursively ′ tree w¯ take conjunction of each subformula with ¬αouter, we obtain a formula γ of TL that expresses the type of (t0 (ip),s) w¯ ′ when evaluated in (Tw¯,s). Thus, τ (i) is expressible by a Boolean combination of formulas γ and ¬αouterS↓(αouter ∧ a) where a is a propositional letter. This completes the proof of translation of FO into TLtree over nested ω-words, and thus the proof of the lemma. 2

Proof of Corollary 4.3

In the proof of Theorem 4.1, we show that every FO sentence over a nested word w¯ can be translated into an FO sentence tree tree over tree Tw¯, and then, by the separation property of TL [18] is equivalent to a TL formula that does not use S↓ and ↑.

16 tree s Sσ S ◦ ◦Sσ ◦ Then, given that in the translation of TL into NWTL we only use s in the rule (ϕ ↓ψ) = ϕ s ψ , we see that the s Sσ equivalent NWTL formula does not use s . Thus, given that in the proof of Lemma A.2, no since operator is used in the Uσ Uσ Uσ Uσ translations of s into ss and ss into , the corollary follows for the finite case. For the infinite case, we note that in the proof of Theorem 4.1, for the case of FO sentences we only need to specify the → S tree type of sm (¯w,s) where s is the root. Thus, one can see that in this case the use of ↓ in TL formulas is not required, and s Sσ hence the resulting formulas are translated into NWTL formulas without s .

Proof of Proposition 4.4

We shall look at finite nested words; the proof for the infinite case applies verbatim. To evaluate a formula ϕ of NWTLfuture in position i of a nested word w¯ of length n one only needs to look at w¯[i,n]. That is, if w¯ and w¯′ of length n and n′ respectively are such that w¯[i,n] =∼ w¯[i′,n′], then (¯w,i) |= ϕ iff (¯w′,i′) |= ϕ for every formula ϕ of NWTLfuture. future Furthermore, for every collection of NWTL formulas Ψ= {ψ1,...,ψl}, one can find a number k = k(Ψ) such that

′ ′ ′ ′ w¯[i,n] ≡k w¯[i ,n ] implies (¯w,i) |= ψp ⇔ (¯w ,i ) |= ψp, for all p ≤ l.

In particular, if br stands for the word of length r in which all positions are labeled b and the matching relation is empty, we derive that there are numbers k1 = k1(Ψ) and k2 = k2(Ψ) with k1 > k2 such that

k1 k2 b |= ψp ⇔ b |= ψp, for all p ≤ l. Now consider the following NWTL formula:

α = µ⊤ ∧ µ⊖a, saying that the first position is a call, and the predecessor of its matching return is labeled a. We claim that this is not expressible in NWTLfuture. Assume to the contrary that there is a formula β of NWTLfuture equivalent to α. Let Ψ be the collection of all subformulas of β, including β itself, and let k1 and k2 be constructed as above. We now consider two nested words w¯1 and w¯2 of length k1 k1 +2 whose underlying words are bab of length n = k1 +2, such that the matching relation µ1 of w¯1 has one edge µ1(1, 3), and the matching relation µ2 of w¯2 has one edge µ1(1,n +1 − k2). In other words, the only return position of w¯1 is k1 k2 r1 =3, and the only return position of w¯2 is r2 = n +1 − k2, and thus w¯1[r1,n]= b and w¯2[r2,n]= b . Further notice that for every i> 1 we have w¯1[i,n] =∼ w¯2[i,n]. Observe that (¯w1, 1) |= α and (¯w2, 1) |= ¬α. We now prove by induction on formulas in Ψ that for each such formula γ we have (¯w1, 1) |= γ iff (¯w2, 1) |= γ, thus proving that β and α cannot be equivalent.

• The base case of propositional letters is immediate. • The Boolean combinations are straightforward too. • Let γ = ψ. Then (¯w1, 1) |= γ ⇔ (¯w1, 2) |= ψ ⇔ (¯w2, 2) |= ψ ⇔ (¯w2, 1) |= γ,

since w¯1[2,n] =∼ w¯2[2,n].

• Let γ = µψ. Then (¯w1, 1) |= γ ⇔ (¯w1, 3) |= ψ ⇔ bk1 |= ψ ⇔ bk2 |= ψ ⇔ (¯w2,n +1 − k2) |= ψ ⇔ (¯w2, 1) |= γ, since ψ ∈ Ψ.

17 σ • Let γ = ϕU ψ. Assume (¯w1, 1) |= γ. Consider three cases.

Case 1: (¯w1, 1) |= ψ. By the hypothesis (¯w2, 1) |= ψ and we are done. σ σ Case 2: The witness for ϕU ψ occurs beyond the only return. Then (¯w1, 1) |= ϕ and (¯w1, r1) |= ϕU ψ. Since σ σ σ ϕU ψ ∈ Ψ we have (¯w2, r2) |= ϕU ψ, and by the hypothesis, (¯w2, 1) |= ϕ, so (¯w2, 1) |= ϕU ψ. σ Case 3: The witness for ϕU ψ occurs inside the call. Since for every position i > 1 we have (¯w1,i) |= ϕ iff σ (¯w2,i) |= ϕ and likewise for ψ, the same summary path witnesses ϕU ψ in w¯2.

Thus, (¯w2, 1) |= γ.

Now assume (¯w2, 1) |= γ. In the proof of (¯w1, 1) |= γ is the same as above in Cases 1 and 2. For Case 3, assume that in the path which is a witness for ϕUσ ψ the position in which ψ is true is the 2nd or the 3rd position in the word. Then the same path witnesses (¯w1, 1) |= γ, as in the proof of Case 3 above. Next assume it is a position with index j higher than 3 (which is still labeled b) where ψ first occurs. Then ϕ must be true in all positions i with 3 ≤ i ≤ j in w¯2. Hence ϕ is true in all such positions in w¯1 as well, and thus the summary path in w¯1 that skips the first call (i.e. jumps from 1 σ to 3) witnesses ϕU ψ. Hence, in all the cases (¯w2, 1) |= γ implies (¯w1, 1) |= γ, which completes the inductive proof, and thus shows the inexpressibility of α in NWTLfuture. 2

Proof of Theorem 4.6

The translation from LTLµ + W into FO is similar to the translation used in the proof of Theorem 4.7. To prove the other direction, we show how to translate NWTL into LTLµ + W. More precisely, for every formula ϕ in NWTL, we show how to µ construct a formula αϕ in LTL + W such that for every nested word w¯ (finite or infinite) and position i in it, we have that (¯w,i) |= ϕ if and only if (¯w,i) |= αϕ. µ Since LTL includes the same past modalities as NWTL, αϕ is trivial to define for the atomic formulas, Boolean combi- nations and next and previous modalities:

α⊤ := ⊤,

αcall := call,

αret := ret,

αa := a,

α¬ϕ := ¬αϕ,

αϕ∨ψ := αϕ ∨ αψ,

α ϕ := αϕ,

α µϕ := µαϕ,

α⊖ϕ := ⊖αϕ, ⊖ α⊖µϕ := µαϕ.

Thus, we only need to show how to define αϕUσψ and αϕSσψ. Formula αϕUσ ψ is defined as: a c c αϕUσψ := αϕU (αψ ∨ (αϕ ∧ αψ) ∨ W 3(αψ ∧ ⊤ ∧ (¬ret → ⊖(βS γ)) ∧ (ret → ⊖µ(βS γ)))), where β and γ are formulas defined as:

a β := αϕS (αϕ ∧ ¬ret ∧ ⊖((αϕ ∧ ¬ret)S(αϕ ∧ call))), γ := ¬⊖⊤.

Moreover, formula αϕSσψ is defined as: c a αϕSσψ := δS (αϕS αψ), where δ is a formula defined as:

a δ := αϕS (αϕ ∧ ¬ret ∧ ⊖((αϕ ∧ ¬ret)Scall)). This concludes the proof of the theorem.

18 Proof of Theorem 4.7

In order to prove Theorem 4.7 we use the composition argument presented below. Let w¯ be a nested word and i an element in w¯. Let c1,...,cm, where m ≥ 0, be all elements in w¯ such that, for each j ∈ [1,m], cj < i and there is an element rj such that µ(cj , rj ) and i ≤ rj . Assume without loss of generality that c1

• The element a0 is labeled with the tuple whose first component is the rank-k type of (¯w[1,c1 − 1], 1) and whose second component is the rank-k type of (¯w[r1, ∞], 1) if m 6=0; otherwise, it is labeled with the tuple whose first component is the rank-k type of (¯w[1,i − 1], 1) and whose second component is the rank-k type of (¯w[i, ∞], 1)

• for each 0

• if m 6=0 then the element am is labeled with the the tuple whose first component is the rank-k type of (¯w[cm,i − 1], 1) and whose second component is the rank-k type of (¯w[i, rm − 1], 1). The following is our composition argument:

′ Lemma A.8 (Composition Method) Let w¯1 and w¯2 be two nested ω-words, and let i and i be two elements in w¯1 and ′ w¯2, respectively, that share the same label in Σ, and such that i is a call (resp., return) iff i is a call (resp., return). Then ′ ′ Ωk(¯w1,i) ≡k+2 Ωk(¯w2,i ) implies (¯w1,i) ≡k (¯w2,i ).

Proof: First we need to introduce some terminology. Let w¯ be a nested ω-word and i a position in w¯. Assume elements c1,...,cm, r1,...,rm are defined as above. With each element s of w¯ we associate an element [s] of Ωk(¯w,i) as follows:

• If m 6= 0 and s belongs to w¯[0,c1 − 1] or w¯[r1, ∞] then [s] is the first element of Ωk(¯w,i). In such case we say that w¯[0,c1 − 1] and w¯[r1, ∞] are the left and right intervals represented by [s], respectively. If m = 0 and s belongs to w¯[0,i − 1] or w¯[i, ∞] then [s] is also the first element of Ωk(¯w,i). In such case we say that w¯[0,i − 1] and w¯[i, ∞] are the left and right intervals represented by [s], respectively.

• If m 6=0 and s belongs to w¯[cm,i − 1] or w¯[i, rm − 1] then [s] is the last element of Ωk(¯w,i). In such case we say that w¯[cm,i − 1] and w¯[i, rm − 1] are the left and right intervals represented by [s], respectively.

• If m 6=0 and s belongs to w¯[cℓ,cℓ+1 − 1] or w¯[rℓ+1, rℓ − 1], for some 1 ≤ ℓ

(Ωk(¯w1,i), [p1],..., [pj ]) ≡k−j+2 ′ (Ωk(¯w2,i ), [q1],..., [qj ]).

19 L R By the way the strategy is defined, for each ℓ ∈ [1, j], if a¯ℓ and a¯ℓ are the subtuples of (a1,...,aj) containing the elements L R ¯L ¯R from (a1,...,aj ) that belong to [aℓ] and [aℓ] , respectively, then the corresponding subtuples bℓ and bℓ of (b1,...,bj) L R contain the elements from (b1,...,bj ) that belong to [bℓ] and [bℓ] , respectively. Further, by definition of the strategy, we L L L ¯L R R R ¯R also have that ([aℓ] , a¯ℓ , 1) ≡k−j ([bℓ] , bℓ , 1) and ([aℓ] , a¯ℓ , 1) ≡k−j ([bℓ] , bℓ , 1) (where 1 represents the first element of the interval). We now show how to define Player II’s response in the round j +1. Let us assume without loss of generality that for round ′ j +1 of the game on (¯w1,i) and (¯w2,i ), Player I picks an element aj+1 in w¯1 that belongs to the left interval represented by [aj+1] (all the other cases can be treated in a similar way). Player II response bj+1 in w¯2 is defined as follows. First, there ′ must be an element [s] in Ωk(¯w2,i ) such that

(Ωk(¯w1,i), [p1],..., [pj], [pj+1]) ≡k−j+1 ′ (Ωk(¯w2,i ), [q1],..., [qj ], [s]),

L where pj+1 = aj+1. The latter, together with the way that the strategy is defined, implies that there is an element b in [s] L ′ L ′ ′ such that ([aj+1] , a¯ ,aj+1, 1) ≡k−j−1 ([s] , ¯b , b, 1), where a¯ is the subtuple of (a1,...,aj) containing all the elements in L ′ (a1,...,aj) that belong to [aj+1] and ¯b is the corresponding subtuple of (b1,...,bj ). We then set bj+1 = b. We show by induction that, for each j ≤ k, if (a1,...,aj) and (b1,...,bj) are the first j moves played by Player I ′ and Player II on (¯w1,i) and (¯w2,i ), respectively, according to the strategy defined above, then ((a1,...,aj ), (b1,...,bj)) ′ ′ defines a partial isomorphism between (¯w1,i) and (¯w2,i ). This is sufficient to show that (¯w1,i) ≡k (¯w2,i ). Assume j =0. Then it follows from the statement of the lemma that i and i′ share the same label in Σ, and that i is a call (resp., return) iff i′ is a call (resp., return). Assume that the property holds for j. Also assume without loss of generality that for the round j +1 of the game on ′ (¯w1,i) and (¯w2,i ), Player I picks an element aj+1 in w¯1 that belongs to the right interval represented by [aj+1] (all the other cases can be treated in a similar way). We prove that bj+1 as defined above preserves the partial isomorphism. It is ′ not hard to see that aj+1 = i iff bj+1 = i . Indeed, assume first that i 6= rm. Then [aj+1] is the last element of Ωk(¯w1,i), ′ ′ and Ωk(¯w1,i) ≡k+2 Ωk(¯w2,i ) implies that [bj+1] is the last element of Ωk(¯w2,i ). Since aj+1 = i is the first element of R R ′ [aj+1] , bj+1 is the first element of [bj+1] , which is i . Assume now that i = rm. Then both the last elements of Ωk(¯w1,i) ′ ′ and Ωk(¯w2,i ) are of the form (τ, τε), where τε is the rank-k type of the empty nested word. Thus, i is also a non-pending ′ return, and [aj+1] is the penultimate element of Ωk(¯w1,i). Since, Ωk(¯w1,i) ≡k+2 Ωk(¯w2,i ), [bj+1] is the penultimate ′ ′ element of Ωk(¯w2,i ). But since i is a non-pending return, it implies that the first element of the right interval associated ′ ′ with the penultimate element of Ωk(¯w2,i ) is also bj+1 = i . Further, it is also clear that the label of aj+1 in w¯1 is a iff the label of bj+1 in w¯2 is a, for each a ∈ Sigma. Next we consider the remaining cases.

L ′ L ′ ′ • Assume that aj+1 ∈ call. Then ([aj+1] , a¯ ,aj+1, 1) ≡k−j−1 ([bj+1] , ¯b ,bj+1, 1), where a¯ is the subtuple of L ′ (a1,...,aj) containing all the elements in (a1,...,aj) that belong to [aj+1] and ¯b is the corresponding subtuple of (b1,...,bj ). This immediately implies that bj+1 ∈ call. The converse is proved analogously.

• Assume that aj+1 ∈ ret. This is similar to the previous case.

R • First, assume that aj+1 < aℓ holds for some ℓ ∈ [1, j]. Since aj+1 belongs to [aj+1] , we have that aℓ belongs R to [aℓ] and, thus, we only need to consider the cases [aℓ] = [aj+1] and [aj+1] < [aℓ]. If [aℓ] = [aj+1], then L L ([aℓ] ,aℓ,aj+1) ≡0 ([bℓ] ,bℓ,bj+1) and, therefore, bj+1 < bl also holds. If [aj+1] < [aℓ], then [bj+1] < [bℓ] and, L L thus, bj+1

Second, assume that aℓ

• First, assume that µ(aj+1,aℓ) holds for some ℓ ∈ [1, j]. Since aj+1 belongs to the right interval represented by [aj+1], L L we have that [aℓ] = [aj+1]. Thus, given that ([aℓ] ,aℓ,aj+1) ≡0 ([bℓ] ,bℓ,bj+1), we conclude that µ(bj+1,bℓ) holds.

Second, assume that µ(aℓ,aj+1) holds for some ℓ ∈ [1, j]. We need to consider two cases. If both aℓ and aj+1 belong L L to the same interval, then ([aℓ] ,aℓ,aj+1) ≡0 ([bℓ] ,bℓ,bj+1), and thus, µ(bℓ,bj+1) holds. If aℓ and aj+1 belong to

20 αa(x) := Pa(x),

αcall(x) := call(x),

αret(x) := ret(x),

αint(x) := ¬call(x) ∧ ¬ret(x),

αpret(x) := ret(x) ∧ ¬∃yµ(y,x),

α¬ϕ(x) := ¬αϕ(x),

αϕ∨ψ(x) := αϕ(x) ∨ αψ(x), ∃ ∧ ¬∃ ∧ ∧ α ϕ(x) := y (x

αϕUψ(x) := ∃y ((x

∀z(z

αϕUaψ(x) := ∃y ((x

∀z(z

αϕScψ(x) := αψ(x) ∨ ∃y (y

∀z (((z = x) ∨ (αcall(z) ∧ z

Figure 1. Translating CaRet + {C, R} into FO.

L R distinct intervals, then [aℓ] = [aj+1]+1, aℓ is the first element of [aℓ] and aj+1 is the first element of [aj+1] . Thus, ′ given that (Ωk(¯w1,i), [aℓ], [aj+1]) ≡1 (Ωk(¯w2,i ), [bℓ], [bj+1]), it is the case that [bℓ] = [bj+1]+1. Furthermore, given L L that the rank-k type of [aℓ] is the same as the rank-k type of [bℓ] with the first element distinguished as a constant, we L R conclude that bℓ is the first element of [bℓ] . In the same way, we conclude that bj+1 is the first element of [bj+1] and, therefore, µ(bℓ,bj+1) holds.

This concludes the proof of the lemma. 2

We now present the proof of Theorem 4.7.

Proof of Theorem 4.7: We first show that every CaRet + {R, C} formula ϕ is equivalent to an FO formula αϕ(x) over nested words, that is, for every nested word w¯ (finite or infinite), we have (¯w,i) |= ϕ iff w¯ |= αϕ(i). The translation is standard and can be done by recursively defining αϕ(x) from ϕ as shown in Figure A. In the following we assume that the FO formula θ(x)[y,z] is the relativization of θ(x) to elements in the interval [y,z], that is, θ(x)[y,z] is obtained from θ(x) by replacing each subformula of the form ∃uβ with ∃u(y ≤ u∧u ≤ z ∧β) and each subformula of the form ∀uβ with ∀u(y ≤ u∧u ≤ z → β). We now show the other direction, that is, FO ⊆ CaRet +{R, C}. We extend the vocabulary with an extra atomic predicate min interpreted as the first element of each nested word w¯. We start by proving this result for FO sentences (that is, we prove that for every FO sentence ϕ there is an CaRet + {R, C} formula ψ, such that w¯ |= ϕ iff (¯w, 1) |= ψ), and then extend it to the case of FO formulas with one free variable. Let ϕ be an FO sentence. We use induction on the quantifier rank to prove that ϕ is equivalent to an CaRet + {R, C} formula. For k = 0 the property trivially holds, as ϕ is a Boolean combination of formulas of the form Pa(min), min < min,

21 min = min, and µ(min, min). We now prove for k +1 assuming that the property holds for k. Since every FO sentence of quantifier rank k +1 is a Boolean combination of FO sentences of the form ∃xϕ(x), where ϕ(x) is a formula of quantifier rank k, we just have to show how to express in CaRet + {R, C} a sentence of this form. Let Γ be the set of all rank-k types of nested words over alphabet Σ. By induction hypothesis, for each τ ∈ Γ there is an CaRet + {R, C} formula ξτ such that (¯w, 1) |= ξτ iff the rank-k type of w¯ is τ. It is not hard to see then that there is an ℓ ℓ ℓ ℓ CaRet + {R, C} formula ξτ such that (¯w, 1) |= ξτ iff the rank-k type of w¯ is τ, where w¯ is the nested word obtained from ℓ w¯ by removing its last element (note that in order to construct ξτ we require the temporal operator ). Let Λ be thesetof allrank-(k+2) types of words over alphabet Γ×Γ. We first construct, for each λ ∈ Λ, an CaRet+{R, C} formula αλ over alphabet Σ such that,

(¯w,i) |= αλ ⇐⇒ the rank-(k + 2) type of Ωk(¯w,i) is λ, for each nested word w¯ and position i of w¯. Fix λ ∈ Λ. From Kamp’s theorem [12] there is an LTL formula βλ over alphabet Γ × Γ such that a word s satisfies βλ evaluated on its last element iff the rank-(k + 2) type of s is λ. By the separation property, we can assume that βλ only mentions past modalities ⊖ and S. Moreover, given that ϕ S ψ ≡ ψ ∨ (ϕ ∧ ⊖(ϕ S ψ)), we can also assume that βλ is a Boolean combination of formulas of the form either ϕ or ⊖ψ, where ϕ does not mention any temporal modality and ψ is an arbitrary past LTL formula. Thus, since CaRet + {R, C} is closed under Boolean combinations, to show how to define αλ from βλ, we only need to consider two cases: (1) βλ is an LTL formula over Γ × Γ without temporal modalities, and (2) βλ is of the form ⊖ψ, where ψ is an arbitrary past LTL formula over Γ × Γ. Next we consider these two cases.

◦ ◦ • Assume that βλ is an LTL formula without temporal modalities. Then αλ is defined to be βλ, where ( ) is defined recursively as follows. Given (τ, τ ′) ∈ Γ × Γ,

′ ◦ ℓ ℓ (τ, τ ) := (¬ret ∧Cξτ ∧ Rξτ ′ ) ∨ ℓ (ret ∧Cξτ ∧ ξτε ),

with τε the rank-k type of the empty word. Furthermore, if ψ and ϕ are LTL formulas without temporal modalities, then

(¬ϕ)◦ := ¬ϕ◦, (ϕ ∨ ψ)◦ := ϕ◦ ∨ ψ◦.

⋆ • Assume that βλ is a formula of the form ⊖ϕ, where ϕ is an arbitrary past LTL formula. Then αλ is defined to be βλ, where ( )⋆ is defined recursively as follows. Given (τ, τ ′) ∈ Γ × Γ,

′ ⋆ ℓ ℓ (τ, τ ) := (⊖c⊤∧Cξτ ∧ µRξτ ′ ) ∨ ℓ (¬⊖c⊤∧Cξτ ∧ µRξτ ′ )

Furthermore, if ψ and ϕ are past LTL formulas, then

(¬ϕ)⋆ := ¬ϕ⋆, (ϕ ∨ ψ)⋆ := ϕ⋆ ∨ ψ⋆, ⋆ ⋆ (⊖ϕ) := ⊖c ϕ , (ϕ S ψ)⋆ := ϕ⋆ Sc ψ⋆.

Now, let ∃xϕ(x) be an FO sentence such that the quantifier rank of ϕ(x) is k. Then, from our composition method ϕ(x) ′ can be expressed in CaRet+{R, C} as the formula λ∈Λ′ αλ, where Λ ⊆ Λ is the set of all rank-(k +2) types of words over alphabet Γ × Γ that belong to {Ωk(¯w,i) | w¯ |= ϕ(i)}. Thus, ∃xϕ(x) can be expressed as the following CaRet + {R, C} U W formula: ⊤ ( λ∈Λ′ αλ). This concludes the proof of the theorem. Finally, from the composition method and the previous proof we see that the equivalence FO = CaRet + {R, C} holds for unary queriesW over nested words. 2

22 Proof of Theorem 5.6

From the FO completeness of NWTL+, we have that NWTL+ + W can be translated into NWTL+. We show that at least + an exponential blow-up is necessary for such translation. More precisely, we construct a sequence {ϕn}n≥1 of NWTL +W + Ω(n) formulas of size O(n), such that the shortest NWTL formula that is equivalent to ϕn is of size 2 . Our proof is a + modification of similar proofs given in [9, 22]. Assume Σ = {a0,...,an}, and let ϕn be the following NWTL + W σ formula (here, 2σθ and θ are abbreviations for ¬(⊤Uσ¬θ) and ⊤Sσθ, respectively):

n σ σ σ σ 2 call → W2 ( (ai ↔ (ai ∧ ¬⊖⊤))) → (a0 ↔ (a0 ∧ ¬⊖⊤)) . i=1   ^  It is not hard to see that w¯ |= ϕn iff for every positions i, j in w¯ such that µ(i, j) holds, if position ℓ in w¯[i, j] coincides with i on a1,...,an, then ℓ also coincides with i on a0. It is shown in Theorem 5.1 that for each NWTL+ formula α, the language

Lα = { w¯ | w¯ is an nested ω-word such that w¯ |= α } is recognized by a nondeterministic nested word automaton with 2O(|α|) states. Thus, it is enough to show that every such 2Ω(n) automaton for Lϕn requires at least 2 states. Let A be a nondeterministic nested word automaton for Lϕn . Assume that Σ\{a0} n b0,...,b2n−1 is an enumeration of the symbols in 2 . For every K ⊆{0,..., 2 − 1} let w¯K be the word c0 · · · c2n−1 over alphabet 2Σ, where for each i ≤ 2n − 1:

bi i ∈ K ci = (bi ∪{a0} otherwise

n ω It is not hard to see that for each K ⊆ {0,..., 2 − 1}, the nested ω-word (¯wK , µ), where µ = {(cj ,c3·2n−1−j) | 0 ≤ n ω 1 2 1 2 1 j ≤ 2 − 1}, is such that (¯wK , µ) |= ϕn. Let (qK , qK ) and (qK′ , qK′ ) be pairs of states such that (1) A is in states qK and 2 ω n n ω 1 qK in an accepting run for (¯wK , µ) after reading 2 and 2 · 2 symbols from w¯K , respectively, and (2) A is in states qK′ 2 ω n n ω and qK′ in an accepting run for (¯wK′ , µ) after reading 2 and 2 · 2 symbols from w¯K′ , respectively. Next we show that 1 2 1 2 ′ 1 2 1 2 ′ ω ′ (qK , qK ) 6= (qK′ , qK′ ) if K 6= K . On the contrary, assume that (qK , qK ) = (qK′ , qK′ ). Then A accepts (¯wK w¯K w¯K , µ ), ′ n ω ′ where µ = {(cj ,c3·2n−1−j ) | 0 ≤ j ≤ 2 − 1}, which is a contradiction since (¯wK w¯K′ w¯ , µ ) 6|= ϕn. Given that the n K n number of different K’s is 22 , the latter implies that the number of different pairs of states of A is at least 22 . Thus, if A n n−1 Ω(n) has m states, then m2 ≥ 22 and, hence, m ≥ 22 . Therefore, the number of different states of A is at least 22 . This concludes the proof of the theorem.

Proof of Theorem 6.1

As we mentioned already, in the finite case this is a direct consequence of [19] so we concentrate on the infinite case. It is more convenient for us to prove the result for ordered unranked forests in which a subtree rooted at every node is finite. The way to translate a nested ω-word into such a forest is as follows: when a matched call i with µ(i, j) is encountered, it defines a subtree with i as its root, and j +1 as the next sibling (note that this is different from the translation into binary trees we used before). If i is an internal position, or a pending call or a pending return position, then it has no descendants and its next sibling is i +1. Matched returns do not have next sibling, nor do they have any descendants. The nodes in the forest are labeled with call, ret, and the propositions in Σ, as in the original nested word. It is routine to define, in FO, relations desc and sib for descendant and younger sibling in such a forest. Furthermore, from these relations, we can define the usual ≤ and µ in nested words using at most 3 variables as follows. For x ≤ y, the definition is given by

(y desc x) ∨ ∃z x desc z ∧ ∃x z ≺sib z ∧ y desc x and for µ(x, y), by   (y desc x) ∧ ∀z (z desc x) → ∃x(x = z ∧ x ≤ y) . Thus, it suffices to prove the three-variable property for such ordered forests, which will be referred to as A, B, etc. We Gv shall use pebble games. Let m(A,a1,b1, B,b1,b2) be the m-move, v-pebble game on structures A and B where initially

23 Gv pebbles xi are placed on ai in A and bi in B. Player II has a winning strategy for m(A,a1,b1, B,b1,b2) iff A,a1,a2 and B,b1,b2 agree on all formulas with at most v variables and quantifier-depth m. We know from [13] that to prove Theorem 6.1, it suffices to show the following,

G3 Claim A.9 For all k, if Player II has a winning strategy for the game 3k+2(A,a1,a2; B,b1,b2). Then she also has a Gk winning strategy for the game k(A,a1,a2; B,b1,b2). We will show how Player II can win the k-pebble game by maintaining a set of 3-pebble sub-games on which she will copy Player I’s moves and decide on good responses using her winning strategy for these smaller 3-pebble games. The choice of these sub-games will partition the universe |A| ∪ |B| so that each play by Player I in the k-pebble game will be answered in one 3-pebble game. This is similar to the proof that linear orderings have the 3-variable property [13]. G3 The subgames, m(A,a1,a2; B,b1,b2), that Player II maintains will all be vertical in which a2 desc a1 and b2 desc b1 hold, or horizontal in which a1 ≺sib a2 and b1 ≺sib b2 hold. The following lemma gives the beginning strategy of Player II in which she replaces an arbitrary game configuration with a set of configurations each of which is vertical or horizontal.

G3 ′ ′ ′ ′ Lemma A.10 If Player II wins m+4(A,a1,a2; B,b1,b2). Then there are points a1,a2 from A and b1,b2 from B such that G3 ′ ′ ′ ′ G3 ′ ′ ′ Player II wins the horizontal game m+2(A,a1,a2; B,b1,b2) and the vertical games m+2(A,ai,ai; B,bi,bi) for i =1, 2.

Proof: For this proof since A and B are fixed, we will describe a game only by listing the chosen points, e.g., (a1,a2; b1,b2). G3 We simulate two moves of the game, m+4(a1,a2; b1,b2), in which we choose Player I’s moves and then Player II answers according to her winning strategy. Let u + v denote the least common ancestor of u and v. First, we have Player I place ′ ′ pebble x3 on a1, the unique child of a1 + a2 that is an ancestor of a1. (Note that if a1 = a1 then this move can be skipped ′ ′ and similarly for the second move if a2 = a2.) Player II answers by placing x3 on some point b1. Second, Player I should ′ ′ move pebble x1 from a1 to a2, the unique child of a1 + a2 that is an ancestor of a2. Player II moves x1 to some point b2. Since Player II has moved according to her winning strategy, we have that she still has a winning strategy for the three ′ ′ ′ games in the statement of the lemma. Furthermore, since a1 and a2 are siblings and we have two remaining moves, b1 and ′ 2 b2 must be siblings as well.

Using Lemma A.10 we initially partition the universe according to four subgames:

′ • (ar,ap; br,bp) with domain everything not below ap or bp. Here ap = a1 + a2, i.e., the parent of a1, bp = b1 + b2, i.e., ′ the parent of b1 and ar and br are the roots of A and B, (the roots are not necessary but then the subgames are all on horizontal or vertical pairs), or

′ ′ ′ ′ • (a1,a1; b1,b1) with domain everything below a1 or b1, ′ ′ ′ ′ • (a2,a2; b2,b2), with domain everything below a2 or b2, ′ ′ ′ ′ • (a1,a2; b1,b2), with the remaining domain. We now have to explain, inductively, how all moves of Player I in the k-pebble game are answered by Player II and how, in the process, the universe is further partitioned. We inductively assume that Player II has a winning strategy for each of the 3-pebble, m-move sub-games. There are two cases: Vertical: Player I places a new pebble on a point a that is in the domain of a vertical game: (a1,a2; b1,b2). We thus know that a1 is a proper ancestor of a. The interesting case is where neither of a and a2 is above the other so, without loss of ′ generality, assume that a

24 Horizontal: In this case, we have the configuration, (a1,a2; b1,b2), consisting of a pair of siblings. The only interesting case occurs when Player I puts a new pebble on some vertex, a, s.t. a1

Proof of Theorem 6.2

The translation from unary-NWTL into FO2 is standard and can be done with negligibleblow-upin the size of the formula, so we concentrate on the other direction. The proof generalizes the proof of an analogous result for unary temporal logic over words from [9]. Given an FO2 formula ϕ(x) the translation procedure works a follows. When ϕ(x) is atomic, i.e., of the form a(x), it ′ ′ outputs a. When ϕ(x) is of the form ψ1 ∨ ψ2 or ¬ψ—we say that ϕ(x) is composite—it recursively computes ψ1 and ψ2, ′ ′ ′ ′ ∗ ∗ or ψ and outputs ψ1 ∨ ψ2 or ¬ψ . The two cases that remain are when ϕ(x) is of the form ∃x ϕ (x) or ∃y ϕ (x, y). In both cases, we say that ϕ(x) is existential. In the first case, ϕ(x) is equivalent to ∃y ϕ∗(y) and, viewing x as a dummy free variable in ϕ∗(y), this reduces to the second case. In the second case, we can rewrite ϕ∗(x, y) in the form

∗ ϕ (x, y)= β(χ0(x, y), .., χr−1(x, y), ξ0(x), .., ξs−1(x), ζ0(y), .., ζt−1(y)) where β is a propositional formula, each formula χi is an atomic order formula, each formula ξi is an atomic or existential 2 2 FO formula with qdp(ξi) < qdp(ϕ), and each formula ζi is an atomic or existential FO formula with qdp(ζi) < qdp(ϕ). In order to be able to recurse on subformulas of ϕ we have to separate the ξi’s from the ζi’s. We first introduce a case distinction on which of the subformulas ξi hold or not. We obtain the following equivalent formulation for ϕ:

( (ξi ↔ γi) ∧ ∃yβ(χ0,...,χr−1,γ0,...,γs−1, ζ0,...,ζt−1)) . s i

• Ψ0 is x = y.

t • For each t ∈ T , Ψt is S (x, y).

t t • For each t ∈ T , Φt is ∃z (S (x, z) ∧ z < y).

• Let o = t1,t2,...tk be a sequence over T such that 2 ≤ k ≤ 5, all ti’s are distinct, and a call never appears before return (that is, if ti = c then tj 6= r for j>i). Then Ψo stands for

′ ′ t1 T1 ′ t2 ′ T2 ′ Tk ∃z1,z1,z2,z2,...zk ( S (x, z1) ∧ z1 ≤ z1 ∧ S (z1,z2) ∧ z2 ≤ z2 ∧···∧ zk ≤ y )

where for 1 ≤ i ≤ k, the set Ti equals the set {t1,t2 ...ti}, but with r removed if both c and r belong to this set.

25 We claim that these order types are mutually exclusive and complete, and are expressible in unary-NWTL (and hence, in FO2). First, let us show that the order types form a disjoint partition, meaning for all pairs (x, y) such that x ≤ y, we have exactly one of these relationships holding true. To see this, suppose x

τ τ ( (ξi ↔ γi) ∧ ∃y(τ ∧ β(χ0 ,...,χr−1, γ, ζ))) . s i

′ ′ • For the order type Ψ0, τhψ i is ψ itself.

′ t ′ • For each t ∈ T , for the order type Ψt, τhψ i is ψ .

′ t t {t} ′ • For each t ∈ T , for the order type Φt, τhψ i is 3 ψ .

′ t1 T1 t2 Tk ′ • For order type Ψo, where o = t1,t2,...tk is a sequence over T , τhψ i is 3 · · · 3 ψ , where for 1 ≤ i ≤ k ≤ 5, the set Ti equals the set {t1,t2 ...ti}, but with r removed if both c and r belong to this set.

′ The case corresponding to past opeartors is analogous. Our procedure will therefore recursively compute ξi for i

′ τ τ ′ ′ ( (ξi ↔ γi) ∧ τ β(χ0 , .., χr−1, γ, ζ0(x),...,ζt−1(x)) ) . (2) γ∈{⊤,⊥}si

26