Bounded Languages Described by GF(2)-Grammars

Bounded languages described by GF(2)-grammars Vladislav Makarov∗ February 17, 2021 Abstract GF(2)-grammars are a recently introduced grammar family that have some un- usual algebraic properties and are closely connected to unambiguous grammars. By using the method of formal power series, we establish strong conditions that are necessary for subsets of a∗b∗ and a∗b∗c∗ to be described by some GF(2)-grammar. By further applying the established results, we settle the long-standing open question of proving the inherent ambiguity of the language f anbmc` j n 6= m or m 6= ` g, as well as give a new, purely algebraic, proof of the inherent ambiguity of the language f anbmc` j n = m or m = ` g. Keywords: Formal grammars, finite fields, bounded languages, unambiguous grammars, inherent ambiguity. 1 Introduction GF(2)-grammars, recently introduced by Bakinova et al. [3], and further studied by Makarov and Okhotin [15], are a variant of ordinary context-free grammars, in which the disjunction is replaced by exclusive OR, whereas the classical concatenation is replaced by a new operation called GF(2)-concatenation: K L is the set of all strings with an odd number of partitions into a concatenation of a string in K and a string in L. There are several motivating reasons behind studying GF(2)-grammars. The first of them they are a class of grammars with better algebraic properties, compared to ordinary grammars and similar grammar families, because the underlying boolean semiring logic was replaced by a logic of a field with two elements. As we will see later in the paper, that arXiv:1912.13401v3 [cs.FL] 16 Feb 2021 makes GF(2)-grammars lend themselves very well to algebraic manipulations. The second reason is that GF(2)-grammars provide a new way of looking at unambiguous grammars. For example, instead of proving that some language is inherently ambiguous, one can prove that no GF(2)-grammar describes it. While the latter condition is, strictly speaking, stronger, it may turn out to be easier to prove, because GF(2)-grammars have good algebraic properties and are also closed under symmetric difference, meaning that more tools can be used in the proof. ∗Saint-Petersburg State University. Supported by Russian Science Foundation, project 18-11-00100. 1 Finally, GF(2)-grammars generalize the notion of parity nondeterminism to grammars. Recall that the most common types of nondeterminism that are considered in complexity theory are classical nondeterminism, which corresponds to an existence of an accepting computation, unambiguous nondeterminism, which corresponds to an existence of unique accepting computation and parity nondeterminism, which corresponds to number of accepting computations being odd. Parity complexity classes, mainly ⊕L and ⊕P , are actively studied in the complexity theory. Perhaps most famously, the latter is used in the proof of Toda’s theorem, which statement does not concern any parity classes directly. Similarly to the parity complexity classes, the notion of ⊕FA was considered before [17], as a counterpart to NFA and UFA (unambiguous finite automaton). Tying to the theme of connections between unambiguous and GF(2)-grammars, classical and parity nondeterminism can be seen as two different generalisations of unambiguous nondeterminism: if the number of accepting computations is 0 or 1, then it is positive (classical case) and odd (parity case) at the same time; the same is not true for larger numbers, of course. The main result of this paper is Theorem 4.2 that establishes a strong necessary con- ∗ ∗ ∗ ditions on subsets of a1a2 : : : ak that are described by GF(2)-grammars. Theorem 4.1, a special case of Theorem 4.2, implies that there are no GF(2)-grammars for the languages n m ` n m ` L1 := f a b c j n = m or m = ` g and L2 := f a b c j n 6= m or m 6= ` g. As a consequence, both are inherently ambiguous. For L1, all previously known ar- guments establishing its inherent ambiguity were combinatorial, mainly based on Ogden’s lemma. Proving inherent ambiguity of L2 was a long-standing open question due to Autebert et al. [2, p. 375]. There is an interesting detiail here: back in 1966, Ginsburg and Ullian fully characterized bounded languages described by unambiguous grammars in terms of semi-linear sets [12, Theorems 5.1 and 6.1]. However, most natural ways to apply this characterization suffer from the same limitation: they mainly rely on words that are not in the language and much less on the words that are. Hence, “dense” languages like L2 leave them with almost nothing to work with. Moreover, L2 has an algebraic generating function, meaning that analytic methods cannot tackle it either. In fact, Flajolet [6], in his seminal work on analytic methods for proving grammar ambiguity, refers to inherent ambiguity of L2 as to a still open question (see page 286). 2 Basics The proofs that we will see later make heavy use of algebraic methods. For the algebraic parts, the exposition strives to be as elementary and self-contained as possible. Hence, I will prove a lot of lemmas that are by no way original and may be considered trivial by someone with good knowledge of commutative algebra. This is the intended effect; if you consider something to be trivial, you can skip reading the proof. If, on the other hand, you have some basic knowledge of algebra, but still find some of the parts to be unclear, you may 2 contact me and I will try to find a better wording. The intended “theoretical minimum” is being at least somewhat familiar with concepts of polynomials, rational functions and formal power series. But let us recall the definition and the basic properties of GF(2)-grammars first. This section is completely based on already published work: the original paper about GF(2)- operations by Bakinova et al. [3] and the paper about basic properties of GF(2)-grammars by Makarov and Okhotin [15]. Hence, all the proofs are omitted; for proofs and more thorough commentary on definitions refer to the aforementioned papers. If you are already familiar with both of them, you may skip straight to the next section. GF(2)-grammars are built upon GF(2)-operations [3]: symmetric difference and new operation called GF(2)-concatenation: K L = f w j the number of partitions w = uv; with u 2 K and v 2 L; is odd g From a syntactical standpoint, GF(2)-grammars do not differ from ordinary grammars. However, in the right-hand sides of the rules, the normal concatenation is replaced with GF(2)-concatenation, whereas multiple rules for the same nonterminal correspond to symmetric difference of given conditions, instead of their disjunction. Definition ([3]). A GF(2)-grammar is a quadruple G = (Σ; N; R; S), where: • Σ is the alphabet of the language; • N is the set of nonterminal symbols; • every rule in R is of the form A ! X1 ::: X`, with ` > 0 and X1;:::X` 2 Σ [ N, which represents all strings that have an odd number of partitions into w1 : : : w`, with each wi representable as Xi; • S 2 N is the initial symbol. The grammar must satisfy the following condition. Let Gb = (Σ; N; R;b S) be the correspond- ing ordinary grammar, with Rb = f A ! X1 :::X` j A ! X1 ::: X` 2 R g. It is assumed that, for every string w 2 Σ∗, the number of parse trees of w in Gb is finite; if this is not the case, then G is considered ill-formed. Then, for each A 2 N, the language LG(A) is defined as the set of all strings with an odd number of parse trees as A in Gb. Theorem A ([3]). Let G = (Σ; N; R; S) be a GF(2)-grammar. Then the substitution A = LG(A) for all A 2 N is a solution of the following system of language equations. A = i X1 ::: X` (A 2 N) A!X1 ::: X`2R Multiple rules for the same nonterminal symbol can be denoted by separating the alternatives with the “sum modulo two” symbol (⊕), as in the following example. 3 Example 2.1 ([3]). The following GF(2)-linear grammar defines the language f a`bmcn j ` = m or m = n; but not both g. S ! A ⊕ C A ! aA ⊕ B B ! bBc ⊕ C ! Cc ⊕ D D ! aDb ⊕ Indeed, each string a`bmcn with ` = m or with m = n has a parse tree, and if both equalities hold, then there are accordingly two parse trees, which cancel each other. 2n Example 2.2 ([3]). The following grammar describes the language f a j n > 0 g. S ! (S S) ⊕ a The main idea behind this grammar is that the GF(2)-square S S over a unary alphabet doubles the length of each string: L L = f a2` j a` 2 L g. The grammar iterates this doubling to produce all powers of two. As the previous example illustrates, GF(2)-grammars can describe non-regular unary languages, unlike ordinary grammars. We will need the classification of unary languages describable by GF(2)-grammars in the following Sections. Definition. A set of natural numbers S ⊆ N is called q-automatic [1], if there is a finite automaton over the alphabet Σq = f0; 1; : : : ; q − 1g recognizing base-q representations of these numbers. Let Fq[t] be the ring of polynomials over the q-element field GF(q), and let Fq[[t]] denote the ring of formal power series over the same field.

Bounded Languages Described by GF(2)-Grammars

The Mathematics of Syntactic Structure: Trees and Their Logics

Hierarchy and Interpretability in Neural Models of Language Processing ILLC Dissertation Series DS-2020-06

The Complexity of Narrow Syntax : Minimalism, Representational

What Was Wrong with the Chomsky Hierarchy?

Reflection in the Chomsky Hierarchy

Noam Chomsky

The Evolution of the Faculty of Language from a Chomskyan Perspective: Bridging Linguistics and Biology

Complexity of Natural Languages

Representations and Characterizations of Languages in Chomsky Hierarchy by Means of Insertion-Deletion Systems

Calibrating Generative Models: the Probabilistic Chomsky-Schutzenberger¨ Hierarchy∗

Learnability and Language Acquisition

Chomsky Hierarchy Summary