LNCS 7381, Pp

Multi-Tilde-Bar Derivatives

Pascal Caron, Jean-Marc Champarnaud, and Ludovic Mignot

LITIS, Université de Rouen, 76801 Saint-Étienne du Rouvray Cedex, France
{pascal.caron,jean-marc.champarnaud,ludovic.mignot}@univ-rouen.fr

Abstract. Multi-tilde-bar operators allow us to extend regular expressions. The associated extended expressions are compatible with the structure of Glushkov automata and they provide a more succinct representation than standard expressions. The aim of this paper is to examine the derivation of multi-tilde-bar expressions. Two types of computation are investigated: Brzozowski derivation and Antimirov derivation, as well as the construction of the associated automata.

1 Introduction

Regular expression word derivatives were introduced by Brzozowski [5] in order to compute language quotients via expression derivatives: for any word $w$, the language denoted by the derivative of a regular expression $E$ w.r.t. $w$ is the left quotient of the language denoted by $E$ w.r.t. $w$. Regular expression derivation plays a fundamental role in automata theory. In particular, under the assumption that the set $D$ of all the derivatives of a regular expression $E$ is finite, it is possible to construct an FA (finite automaton) with $D$ as its set of states that recognizes the language denoted by $E$. Word derivatives handle unrestricted regular expressions; they are themselves expressions and they provide a DFA (deterministic finite automaton), provided that the ACI (associativity, commutativity and idempotence) properties of the sum of two expressions are used. Alternative types of derivation have been designed since Brzozowski's seminal work. Partial derivatives, due to Antimirov [2], only address simple regular expressions; they are sets of expressions and they provide both a DFA and an NFA (non-deterministic finite automaton).
Antimirov derivatives have recently been extended to unrestricted regular expressions [10]; extended partial derivatives are sets of sets of expressions and they provide a DFA, an NFA and an AFA (alternating finite automaton) [11]. Some derivations are based on the linearization of the (simple) input expression: let us cite the continuations of Berry and Sethi [4], the c-continuations of Champarnaud and Ziadi [14] and the derivatives of Ilie and Yu [18]. Let us mention that Antimirov derivation has been extended to the case of weighted rational expressions [21,13].

As reported in [2], the concept of derivation has been successfully used to investigate the properties of regular expressions [17,15,7,20,3,1]. More recently, Brzozowski introduced a new approach for studying the state complexity of regular languages, based on the counting of their quotients (or of their derivatives) [6]. Moreover, derivatives provide a useful tool to implement regular matching algorithms [23,16], or scanner generators as reported in [22].

N. Moreira and R. Reis (Eds.): CIAA 2012, LNCS 7381, pp. 321–328, 2012. © Springer-Verlag Berlin Heidelberg 2012

A closely related topic is the derivation of new operators that extend regular expressions. For example, the computation of the derivatives of an approximate regular expression (which denotes a language at a bounded distance from a given language) has been presented in [12].

The aim of this paper is to investigate the derivation of the multi-tilde-bar expressions introduced in [8,9]. These expressions are built upon simple operators and multi-tilde-bar operators; their main interest is that they are compatible with the structure of Glushkov automata and more succinct than standard expressions. We provide formulae for the computation of word and partial derivatives of multi-tilde-bar expressions and investigate the properties of these derivatives.
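As background for the word derivatives discussed above, here is a minimal Python sketch of the classical Brzozowski derivation for simple regular expressions built from 0, 1, symbols, sum, catenation and star. The class names (`Empty`, `Eps`, `Sym`, `Sum`, `Cat`, `Star`) and the helpers `nullable`, `deriv` and `matches` are illustrative assumptions, not notation from the paper. The sketch does not handle multi-tilde-bar operators and applies no ACI simplification, so it illustrates the quotient property only and does not yield a finite set of derivatives.

```python
from dataclasses import dataclass

# Hypothetical AST for simple regular expressions (names are assumptions).
@dataclass(frozen=True)
class Empty:            # the expression 0, denoting the empty language
    pass

@dataclass(frozen=True)
class Eps:              # the expression 1, denoting {ε}
    pass

@dataclass(frozen=True)
class Sym:              # a single symbol a
    a: str

@dataclass(frozen=True)
class Sum:              # F + G
    f: object
    g: object

@dataclass(frozen=True)
class Cat:              # F · G
    f: object
    g: object

@dataclass(frozen=True)
class Star:             # F*
    f: object

def nullable(e):
    """Return True iff the empty word belongs to L(e)."""
    if isinstance(e, (Eps, Star)):
        return True
    if isinstance(e, (Empty, Sym)):
        return False
    if isinstance(e, Sum):
        return nullable(e.f) or nullable(e.g)
    return nullable(e.f) and nullable(e.g)  # Cat

def deriv(e, a):
    """Derivative of e w.r.t. the symbol a (no ACI simplification)."""
    if isinstance(e, (Empty, Eps)):
        return Empty()
    if isinstance(e, Sym):
        return Eps() if e.a == a else Empty()
    if isinstance(e, Sum):
        return Sum(deriv(e.f, a), deriv(e.g, a))
    if isinstance(e, Cat):
        left = Cat(deriv(e.f, a), e.g)
        return Sum(left, deriv(e.g, a)) if nullable(e.f) else left
    return Cat(deriv(e.f, a), e)  # Star

def matches(e, w):
    """w is in L(e) iff the derivative of e w.r.t. w is nullable."""
    for a in w:
        e = deriv(e, a)
    return nullable(e)
```

For instance, with `e = Cat(Star(Sum(Sym("a"), Sym("b"))), Cat(Sym("a"), Sym("b")))`, which denotes the words over {a, b} ending in ab, `matches(e, "abab")` holds while `matches(e, "ba")` does not.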
The next section gathers classical notions concerning regular languages, regular expressions and finite automata; it also recalls the definition and main properties of multi-tilde-bar operators. The quotient of the language denoted by a multi-tilde-bar expression is defined in Section 3. Section 4 is devoted to the computation of the Brzozowski derivatives of an extended expression and Section 5 to the computation of the Antimirov derivatives. In both cases, the construction of the associated automaton is provided.

2 Preliminaries

We recall some definitions and notation concerning regular languages, regular expressions, finite automata and multi-tilde-bar expressions. For further details about these topics, we refer to classical books such as [24].

Languages, Regular Expressions and Automata

An alphabet is a finite set of symbols. Given an alphabet $\Sigma$, any subset of $\Sigma^*$ is a language over $\Sigma$. The set of regular languages over $\Sigma$ is denoted by $\mathrm{Reg}(\Sigma^*)$ and is defined as the smallest family of languages containing $\emptyset$ and $\{a\}$ for every symbol $a$ in $\Sigma$ and closed under union, catenation and Kleene star. A regular expression $E$ over an alphabet $\Sigma$ is inductively defined by $E = 0$, $E = 1$, $E = a$, $E = (F + G)$, $E = (F \cdot G)$, $E = (F^*)$, with $a$ a symbol in $\Sigma$, and $F$ and $G$ two regular expressions over $\Sigma$. The language denoted by a regular expression is inductively defined by $L(0) = \emptyset$, $L(1) = \{\varepsilon\}$, $L(a) = \{a\}$, $L(F + G) = L(F) \cup L(G)$, $L(F \cdot G) = L(F) \cdot L(G)$ and $L(F^*) = L(F)^*$, with $a$ a symbol in $\Sigma$, and $F$ and $G$ two regular expressions over $\Sigma$. By construction, the language denoted by a regular expression is regular. The alphabetic width $|E|$ of $E$ is the number of occurrences of symbols of $\Sigma$ appearing in $E$.

A finite automaton $A$ is a 5-tuple $(\Sigma, Q, I, F, \delta)$ where $\Sigma$ is an alphabet, $Q$ is a finite set of states, $I \subset Q$ a set of initial states, $F \subset Q$ a set of final states and $\delta \subset Q \times \Sigma \times Q$ a set of transitions. The set $\delta$ can be seen as a function from $Q \times \Sigma$ to $2^Q$ defined by $q' \in \delta(q, a) \Leftrightarrow (q, a, q') \in \delta$.
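The view of the transition set as a function from $Q \times \Sigma$ to $2^Q$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `as_function` and `accepts` are hypothetical, states and symbols are arbitrary hashable values, and acceptance is tested by folding the function over the input word starting from the set of initial states.

```python
from collections import defaultdict

def as_function(transitions):
    """Turn a set of triples (q, a, q') into a map (q, a) -> set of q'."""
    delta = defaultdict(set)
    for q, a, q2 in transitions:
        delta[(q, a)].add(q2)
    return delta

def accepts(initial, final, transitions, word):
    """Fold delta over the word from I; accept iff a final state is reached."""
    delta = as_function(transitions)
    current = set(initial)
    for a in word:
        # Union of delta(q, a) over all states q reached so far.
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & set(final))

# Hypothetical NFA for the words over {a, b} that end with "ab".
TRANSITIONS = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}
```

With this automaton, `accepts({0}, {2}, TRANSITIONS, "aab")` holds, whereas `accepts({0}, {2}, TRANSITIONS, "ba")` does not.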
The domain of the function $\delta$ can be extended to $2^Q \times \Sigma^*$ by setting, for all $Q' \subset Q$, $\delta(Q', \varepsilon) = Q'$, $\delta(Q', a) = \bigcup_{q \in Q'} \delta(q, a)$ and $\delta(Q', a \cdot w) = \delta(\delta(Q', a), w)$ for every word $w$ in $\Sigma^*$. The language recognized by the automaton $A$ is the set $L(A) = \{w \in \Sigma^* \mid \delta(I, w) \cap F \neq \emptyset\}$. A language $L$ is recognizable if there exists an automaton that recognizes it. The set of recognizable languages over $\Sigma$ is denoted by $\mathrm{Rec}(\Sigma^*)$. Kleene's theorem [19] asserts that $\mathrm{Reg}(\Sigma^*) = \mathrm{Rec}(\Sigma^*)$. Consequently, for every regular language $L$, there exist an automaton $A$ and an expression $E$ such that $L = L(E) = L(A)$.

The Multi-Tilde-Bar Operators [8,9]

The unary operators tilde, denoted by $\widetilde{\ \cdot\ }$, and bar, denoted by $\overline{\ \cdot\ }$, are defined for every expression $E$ by $L(\widetilde{E}) = L(E) \cup \{\varepsilon\}$ and $L(\overline{E}) = L(E) \setminus \{\varepsilon\}$. They are extended to multi-tilde-bar operators, which are applied to a list of expressions, according to the following definitions.

Let $n$ be a positive integer. For convenience, the list $(E_1, \ldots, E_n)$ of expressions is denoted by $E_{1,n}$. Similarly, a catenation $E_1 \cdots E_n$ is denoted by $E_{1 \cdots n}$. The set of integers $\{1, \ldots, n\}$ is denoted by $[\![1,n]\!]$. The set of pairs $(i, j)$ such that $1 \leq i \leq j \leq n$ is denoted by $[\![1,n]\!]^2_{\leq}$. The set of finite lists of pairs in $[\![1,n]\!]^2_{\leq}$ is denoted by $\mathcal{S}_n$.

Let $S$ be a list in $\mathcal{S}_n$ and let $k$ be in $[\![1,n]\!]$. The list $S_{\leq k}$ (resp. $S_{\geq k}$) is defined by $S_{\leq k} = ((i, f) \in S \mid f \leq k)$ (resp. $S_{\geq k} = ((i - k + 1, f - k + 1) \mid (i, f) \in S \wedge i \geq k)$). Notice that a renumbering is performed in the computation of $S_{\geq k}$. A list $S$ is said to be free if for all pairs $(i, f)$, $(i', f')$ in $S$ such that $(i, f) \neq (i', f')$, $[\![i,f]\!] \cap [\![i',f']\!] = \emptyset$.

Let $L_1, \ldots, L_n$ be $n$ nonempty regular languages over $\Sigma$ and $w$ be a word in $L_1 \cdots L_n$. A sequence $(w_1, \ldots, w_n)$ satisfying $w_1 \cdots w_n = w \wedge \forall k \in [\![1,n]\!], w_k \in L_k$ is said to be a split-up of $w$ over $(L_1, \ldots, L_n)$.

Multi-tilde-bar operators are a natural combination of multi-tilde and multi-bar operators [9]. The respective roles of tildes and bars are made explicit in the following two definitions.

Definition 1.
Let $(w_1, \ldots, w_n)$ be a split-up of a word $w$ over a list of languages $(L_1 \cup \{\varepsilon\}, \ldots, L_n \cup \{\varepsilon\})$. Let $T$ be a free list in $\mathcal{S}_n$. The sequence $(w_1, \ldots, w_n)$ is generated by the list $T$ if, for every $k$ in $[\![1,n]\!]$, $w_k = \varepsilon$ if $k \in \bigcup_{(i,f) \in T} [\![i,f]\!]$ and $w_k \in L_k$ otherwise.

Bars are used to forbid some combinations of tildes. Consequently, the satisfaction of a bar by a sequence has to be defined with a list of tildes as a context.

Definition 2. Let $E_{1,n}$ be a list of $n$ expressions. Let $(w_1, \ldots, w_n)$ be a split-up of a word $w$ over $(L(E_1) \cup \{\varepsilon\}, \ldots, L(E_n) \cup \{\varepsilon\})$ generated by a free list $T$ in $\mathcal{S}_n$. Let $b = (i, f)$ be a pair in $[\![1,n]\!]^2_{\leq} \setminus T$. The bar $b$ is said to be satisfied by $(w_1, \ldots, w_n)$ w.r.t. $T$ if at least one of the following three conditions holds: (1) there exists a pair $t$ in $T$ such that $t$ overlaps $b$, (2) there exists a pair $t$ in $T$ such that $b$ is included in $t$, (3) $w_i \cdots w_f \neq \varepsilon$.

According to the two previous definitions, the language denoted by a multi-tilde-bar expression can be expressed as follows:

Definition 3 ([8]). Let $E_{1,n}$ be a list of expressions over an alphabet $\Sigma$ and $L$ be the list $(L(E_1) \cup \{\varepsilon\}, \ldots, L(E_n) \cup \{\varepsilon\})$ of languages.
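The list operations and Definitions 1 and 2 can be illustrated with a small Python sketch. All function names are hypothetical, pairs are 1-indexed tuples `(i, f)`, and each language is assumed to be given as a finite Python set of strings, which is a simplification for illustration only. The text does not define "overlaps" in this excerpt; the sketch assumes it means that the two intervals intersect while neither the bar is included in the tilde nor the tilde in the bar.

```python
def restrict_le(pairs, k):
    """S_{<=k}: keep the pairs (i, f) with f <= k."""
    return [(i, f) for i, f in pairs if f <= k]

def restrict_ge(pairs, k):
    """S_{>=k}: keep the pairs with i >= k, renumbered to start at 1."""
    return [(i - k + 1, f - k + 1) for i, f in pairs if i >= k]

def is_free(pairs):
    """A list is free if the intervals of distinct pairs are disjoint."""
    covered = set()
    for i, f in set(pairs):
        block = set(range(i, f + 1))
        if covered & block:
            return False
        covered |= block
    return True

def is_generated(words, langs, tildes):
    """Definition 1: w_k is empty under a tilde, and in L_k elsewhere."""
    covered = {k for i, f in tildes for k in range(i, f + 1)}
    return all(
        (w == "") if k in covered else (w in lang)
        for k, (w, lang) in enumerate(zip(words, langs), start=1)
    )

def bar_satisfied(bar, words, tildes):
    """Definition 2, with the assumed reading of 'overlaps'."""
    i, f = bar
    for ti, tf in tildes:
        bar_in_tilde = ti <= i and f <= tf
        tilde_in_bar = i <= ti and tf <= f
        intersects = ti <= f and i <= tf
        if bar_in_tilde:                      # condition (2)
            return True
        if intersects and not tilde_in_bar:   # condition (1), assumed meaning
            return True
    return "".join(words[i - 1:f]) != ""      # condition (3)
```

For example, with `tildes = [(1, 1), (2, 2)]` and `words = ("", "", "c")`, the bar `(1, 2)` is not satisfied: both tildes lie inside the bar and $w_1 w_2 = \varepsilon$. With `tildes = [(1, 2)]`, the bar `(2, 3)` is satisfied under the assumed reading of condition (1).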
