An Algebraic Characterisation of First-Order Logic with Neighbour

Amaldev Manuel (Indian Institute of Technology Goa) and Dhruv Nevatia (Chennai Mathematical Institute)

arXiv:2105.09368v1 [cs.LO] 19 May 2021. 978-1-6654-4895-6/21/$31.00 ©2021 IEEE

Abstract. We give an algebraic characterisation of first-order logic with the neighbour relation, on finite words. For this, we consider languages of finite words over alphabets with an involution on them. The natural algebras for such languages are involution semigroups. To characterise the logic, we define a special kind of semidirect product of involution semigroups, called the locally hermitian product. The characterisation theorem for FO with neighbour states that a language is definable in the logic if and only if it is recognised by a locally hermitian product of an aperiodic commutative involution semigroup and a locally trivial involution semigroup. We then define the notion of involution varieties of languages, namely classes of languages closed under Boolean operations, quotients, involution, and inverse images of involutory morphisms. An Eilenberg-type correspondence is established between involution varieties of languages and pseudovarieties of involution semigroups.

I. INTRODUCTION

We give an algebraic characterisation of a logic over finite words, namely the first-order logic with the neighbour relation. Let A be a finite alphabet. Formulas of the logic FO with neighbour, FO(A, min, max, N), are interpreted over finite words over the alphabet A. The constants min and max denote the first and last positions of a given word respectively. The atomic formulas of the logic are the following. The predicate P_a(x), for a ∈ A, denotes that the position x is labelled by the letter a. The binary predicate N(x, y) denotes that x and y are neighbours, i.e., either x + 1 = y or y + 1 = x. Finally, we have the equality predicate x = y. The set of formulas of the logic is closed under Boolean operations and first-order quantification, i.e., ϕ ∨ ψ, ϕ ∧ ψ, ¬ϕ, ∀x ϕ, and ∃x ϕ are also formulas of the logic, if ϕ and ψ are formulas of the logic. For example, the formula

P_a(min) ∧ P_b(max) ∧ ∀x∀y (N(x, y) → (P_a(x) ↔ P_b(y)))

defines the language a(ba)*b over the alphabet {a, b}. The language defined by a formula ϕ, denoted as L(ϕ), is the set of all words satisfying the formula ϕ.

Before we go into the characterisation problem of FO(A, min, max, N), it is interesting to note the expressibility of related logics. The neighbour and between predicates (the ternary predicate bet(x, y, z) is true if position y is strictly between positions x and z) were first studied in [8], [9]. Although one could express that a position is one of the endpoints using bet as well as N, the constants min and max themselves are not expressible using bet and N. Since both bet and N are their own left-to-right dual, if the constants min and max are not used then languages definable using these predicates are closed under the reverse operation. For the sake of good algebraic properties, we include the constants min and max in our vocabulary. Next we state the results of [8], [9] in this setting.

The monadic second-order logic MSO(A, min, max, N) is the extension that also allows monadic second-order quantification: ∃Xϕ and ∀Xϕ are also formulas whenever ϕ is a formula of the logic. The formula ∃Xϕ is true if there is a set of positions X that satisfies the formula ϕ. For example, the formula

∃X∀x∀y (N(x, y) → (X(x) ↔ ¬X(y))) ∧ X(min) ∧ ¬X(max)

defines all the words of even length. It turns out that MSO(A, min, max, N) defines precisely all regular languages, i.e., it has the same expressive power as MSO with the successor relation, MSO(A, +1), by the Büchi-Elgot-Trakhtenbrot theorem. The relationship between the predicates N and bet is analogous to that between their oriented counterparts, namely the successor relation (x + 1 = y) and the order relation (x < y). Each of N and bet is definable in terms of the other by a monadic second-order formula; thus MSO(A, min, max, bet) also defines all regular languages. Likewise, FO(A, min, max, bet) defines precisely all aperiodic regular languages, that is, the languages definable in the logic FO(A, <). In fact, this parallelism between FO(A, min, max, bet) and FO(A, <) extends to their quantifier alternation hierarchies.

However, the parallel described so far breaks down in the case of FO(A, min, max, N). There are languages expressible in FO(A, +1) that are not expressible in the former logic. For instance, the language

L = c*abc*

over the alphabet {a, b, c} is expressible in the logic FO(A, +1). But, using an Ehrenfeucht-Fraïssé argument [7], [8], it can be shown that L is not definable in FO(A, min, max, N). Thus, we have the question of characterising the languages definable in this logic.

Using Hanf's theorem [7] from finite model theory, it is possible to give language-theoretic characterisations of both FO(A, +1) and FO(A, min, max, N). For t > 0, we define the equality with threshold t on the set ℕ of natural numbers by

$$ i =^{t} j \;:=\; \begin{cases} i = j & \text{if } i < t, \\ i \ge t \text{ and } j \ge t & \text{otherwise.} \end{cases} \qquad (1) $$

The word y ∈ A^+ is a factor of the word u ∈ A^+ if u = xyz for some x, z in A^*. We use ♯(u, y) to denote the number of times the factor y appears in u, i.e. the number of pairs (x, z), where x, z ∈ A^*, such that u = xyz.

Definition 1. Let ≈_k^t, for k, t > 0, be the equivalence on A^*, whereby two words u and v are equivalent if either they both have length at most k − 1 and u = v, or otherwise they have
1) the same prefix of length k − 1,
2) the same suffix of length k − 1,
3) and the same number of occurrences, up to threshold t, for all factors of length ≤ k, i.e. ♯(u, y) =^t ♯(v, y) for each word y ∈ A^+ of length at most k.

A language is locally threshold testable (or LTT for short) if it is a union of ≈_k^t classes, for some k, t > 0. Locally threshold testable languages are precisely the class of languages definable in FO(A, +1) [1], [23].

Since the neighbour predicate N is definable using the successor relation in first-order logic, the FO(A, min, max, N)-definable languages form a subclass of LTT. But this inclusion is strict, as we have seen.

We define a coarser equivalence ^r≈_k^t by counting factors only up to reverse. Let ♯^r(w, v) denote the number of occurrences of v or v^r in w, i.e. the number of pairs (x, y), where x, y ∈ A^*, such that w = xvy or w = xv^r y.

Definition 2. Let k, t > 0. Two words w, w′ ∈ A^* are ^r≈_k^t-equivalent if |w| < k and w = w′, or w, w′ both are of length at least k, and they have
1) the same prefix of length k − 1,
2) the same suffix of length k − 1, and
3) ♯^r(w, v) =^t ♯^r(w′, v) for each word v ∈ A^+ of length [...]

[...] characterisation of FO(A, +1) is a deep result [1], [5], [15], [26] in the theory of finite semigroups. The first observation is that LTT is a variety of languages, that is, a class of languages that is closed under Boolean operations, quotients with respect to words, and inverse images under homomorphisms. Eilenberg's variety theorem states that varieties of languages correspond to pseudovarieties of semigroups; a pseudovariety of finite semigroups is a set of finite semigroups that is closed under finite direct products, subsemigroups and quotients. The characterisation of LTT proceeds by first writing the class as a semidirect product of the pseudovarieties of aperiodic commutative monoids (Acom) and righty trivial semigroups (D, the pseudovariety of semigroups that satisfy the equation se = e for all elements s and idempotents e). This step is the algebraic analogue of synthesising an automaton for an LTT language as a cascade of a scanner and an acceptor [1]. The next step in the proof uses the framework of categories for obtaining the identities capturing the semidirect product [21], [22], [25].

In the case of wLRTT, this approach does not work, because wLRTT does not form a variety of languages. For instance, the language (xy)*ab(xy)* over the alphabet {a, b, x, y} is in wLRTT. Its inverse image under the morphism a ↦ a, b ↦ b, c ↦ xy is the language c*abc* over the alphabet {a, b, c}. As we have seen, this language is not in wLRTT. Therefore, wLRTT is not closed under inverse images of morphisms, and hence is not a variety of languages. This necessitates a framework, akin to varieties and operations on varieties, to study the class wLRTT.

Our results. It was already observed in [8] that, to characterise wLRTT, one needs to extend semigroups with an involution (such semigroups are also called ⋆-semigroups). An involution on a semigroup S is an operation ⋆ such that (a⋆)⋆ = a and (ab)⋆ = b⋆a⋆, for each a, b ∈ S. The involution operation is a generalisation of the reversal operation on words. An involutory alphabet A is a finite alphabet A with a bijection † on it. The map † extends to words over A as (a_1 ... a_n)† = a_n† ... a_1†, where each a_i ∈ A.
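The threshold equality (1) and the equivalences of Definitions 1 and 2 can be made concrete with a small Python sketch. It is an illustration only, not part of the paper: the function names are hypothetical, the length bound in item 3 of Definition 2 is assumed to be "at most k" as in Definition 1, and whenever either word is shorter than k the two words are simply required to be equal. Only factors that occur in one of the two words are checked, since every other factor has count zero on both sides.

```python
def threshold_eq(i: int, j: int, t: int) -> bool:
    """Equality with threshold t, as in (1): exact below t, 'both at least t' otherwise."""
    return i == j if i < t else j >= t


def count_factor(u: str, y: str, up_to_reverse: bool = False) -> int:
    """Number of positions of u where y occurs (or y or its reverse, if requested)."""
    targets = {y, y[::-1]} if up_to_reverse else {y}
    n = len(y)
    return sum(1 for i in range(len(u) - n + 1) if u[i:i + n] in targets)


def factors_up_to(u: str, k: int) -> set:
    """All non-empty factors of u of length at most k."""
    return {u[i:i + n] for n in range(1, k + 1) for i in range(len(u) - n + 1)}


def equivalent(u: str, v: str, k: int, t: int, up_to_reverse: bool = False) -> bool:
    """Definition 1 (up_to_reverse=False) or Definition 2 (up_to_reverse=True).

    Assumption: if either word is shorter than k, the words must be equal.
    """
    if len(u) < k or len(v) < k:
        return u == v
    # Same prefix and same suffix of length k - 1 (empty when k = 1).
    if u[:k - 1] != v[:k - 1]:
        return False
    if u[len(u) - (k - 1):] != v[len(v) - (k - 1):]:
        return False
    # Factors occurring in neither word count zero on both sides, so skip them.
    candidates = factors_up_to(u, k) | factors_up_to(v, k)
    return all(
        threshold_eq(
            count_factor(u, y, up_to_reverse),
            count_factor(v, y, up_to_reverse),
            t,
        )
        for y in candidates
    )


# "ccabcc" is in c*abc*, while "ccbacc" is not.
separated_by_def1 = not equivalent("ccabcc", "ccbacc", k=3, t=2)
merged_by_def2 = equivalent("ccabcc", "ccbacc", k=3, t=2, up_to_reverse=True)
```

With k = 3 and t = 2, the words ccabcc and ccbacc are distinguished by the equivalence of Definition 1 (the factor ab occurs in only one of them) but identified once factors are counted up to reverse. This single pair is consistent with, though of course does not prove, the statement that c*abc* is definable in FO(A, +1) but not in FO(A, min, max, N).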
