and Completeness 15-150: Principles of Functional Programming – Lecture 17

Giselle Reis

A regular expression is a finite way to describe a potentially infinite set of strings in such a way that it is possible to decide whether a given string belongs to this set or not. In other words, it defines a pattern for strings. If r is a regular expression, the set of strings that match its pattern is called r’s language and is denoted by L(r). Regular expressions can be built in many ways; we will use five basic constructors for our examples:

1. a single character (c);

2. the empty string (1);

3. concatenation of regular expressions (r1r2);

4. alternation of regular expressions (r1 + r2);

5. the Kleene star (r∗).

The language of a regular expression constructed from each of those operators can be inductively defined as follows:

    L(c) = {c}
    L(1) = {“”}
    L(r1r2) = {s1s2 | s1 ∈ L(r1) and s2 ∈ L(r2)}
    L(r1 + r2) = {s | s ∈ L(r1) or s ∈ L(r2)}
    L(r∗) = {“”} ∪ {s1s2 | s1 ∈ L(r) and s2 ∈ L(r∗)}

Using these definitions (and continuations) we have defined a function that decides whether a string belongs to the language of a regular expression:

    datatype regex =
        Char of char
      | One
      | Times of regex * regex
      | Plus of regex * regex
      | Star of regex;

    fun match (Char c) s k =
          (case s of
             c' :: l => c = c' andalso k l
           | [] => false)
      | match One s k = k s
      | match (Times (r1, r2)) s k =
          match r1 s (fn rest => match r2 rest k)
      | match (Plus (r1, r2)) s k =
          match r1 s k orelse match r2 s k
      | match (Star r) s k =
          k s orelse
          match r s (fn rest => rest <> s andalso match (Star r) rest k);

    fun accept r s = match r (String.explode s) (fn rest => rest = []);

But how do we know this function is correct? First of all we need to define what it means to be correct. We want our implementation to give no false positives and no false negatives. In other words, accept r s evaluates to true iff s ∈ L(r). This can be split into two desiderata:

1. If s ∈ L(r) then accept r s = true.

2. If s ∉ L(r) then accept r s = false.

Desideratum 1 is called completeness. Intuitively, it states that the function accept is complete, in the sense that it will accept every s in the language L(r). Desideratum 2 is called soundness. Intuitively, it states that accept is sound, i.e., it will not return true for some s that is not in L(r). Take a while to think about these definitions. What happens if our function is sound but not complete? What if it is complete but not sound?
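To make the two questions concrete, here is a minimal sketch of two degenerate acceptors. The names acceptNone and acceptAll are hypothetical and not part of the lecture's code; they only illustrate that each property alone is easy to satisfy and useless by itself.

```sml
datatype regex =
    Char of char
  | One
  | Times of regex * regex
  | Plus of regex * regex
  | Star of regex;

(* Hypothetical acceptors, for illustration only. *)

(* Never says true: sound (no false positives) but not complete. *)
fun acceptNone (_ : regex) (_ : string) = false;

(* Always says true: complete (no false negatives) but not sound. *)
fun acceptAll (_ : regex) (_ : string) = true;

val r = Char #"a";

(* "a" is in L(r), yet acceptNone rejects it: a false negative. *)
val rejectsMember = acceptNone r "a";

(* "b" is not in L(r), yet acceptAll accepts it: a false positive. *)
val acceptsNonMember = acceptAll r "b";
```

A useful matcher needs both properties at once, which is why the rest of the notes prove them separately.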

Showing the correctness of accept amounts to showing it is sound and complete. For our proofs, it will be easier if we state everything in terms of positive results. So by taking the contrapositive (see footnote 1) of desideratum 2, we have:

1. If s ∈ L(r) then accept r s = true.

2. If accept r s = true then s ∈ L(r).

If we want to be really precise, modifying the statements this way requires a proof that the function accept is total. We will simply assume this for now and overlook this detail (but it is good to know!).

Our function accept calls match with the same r and s plus a new parameter, a continuation k. Let’s state the properties we want to prove in terms of match, then. For simplicity we are seamlessly converting char lists into strings and vice-versa.
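The role of the continuation can be seen on a small input. The sketch below repeats the definitions from above so that it is self-contained; the value names (seesRest, needsEmpty, acceptsAb) are chosen here for illustration only.

```sml
datatype regex =
    Char of char
  | One
  | Times of regex * regex
  | Plus of regex * regex
  | Star of regex;

fun match (Char c) s k =
      (case s of
         c' :: l => c = c' andalso k l
       | [] => false)
  | match One s k = k s
  | match (Times (r1, r2)) s k =
      match r1 s (fn rest => match r2 rest k)
  | match (Plus (r1, r2)) s k =
      match r1 s k orelse match r2 s k
  | match (Star r) s k =
      k s orelse
      match r s (fn rest => rest <> s andalso match (Star r) rest k);

fun accept r s = match r (String.explode s) (fn rest => rest = []);

val s = String.explode "ab";

(* Char #"a" consumes the prefix p = "a"; the continuation is applied to u = "b". *)
val seesRest = match (Char #"a") s (fn u => u = [#"b"]);   (* true *)

(* The same match fails if the continuation insists that nothing is left. *)
val needsEmpty = match (Char #"a") s (fn u => u = []);     (* false *)

(* accept uses exactly that "nothing left" continuation on the whole regex. *)
val acceptsAb = accept (Times (Char #"a", Char #"b")) "ab"; (* true *)
```

This is why the theorems below talk about a split s = pu: match r consumes some prefix p and hands the remainder u to k.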

Theorem 1 (Completeness). For every r: regex, s: char list and k: char list -> bool, if there exist p: char list and u: char list such that s = pu, p ∈ L(r) and k u = true, then match r s k = true.

Initially one could think that a simple structural induction on r would do the job (given the way the function is defined). Although this works for most cases, it fails for r∗, since one of the recursive calls is performed on the same regex. For this reason we also need to consider the string s, which gets shorter when the recursive call on r∗ is made. This is achieved by performing a proof by lexicographic induction on the pair (r, s). This means that we can apply the induction hypothesis either on r′ < r, or on r′ = r and s′ < s.
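The need for the second component of the pair can be observed by running the Star case. The sketch below repeats the definitions so that it runs on its own; the chosen regexes and value names are illustrative assumptions.

```sml
datatype regex =
    Char of char
  | One
  | Times of regex * regex
  | Plus of regex * regex
  | Star of regex;

fun match (Char c) s k =
      (case s of
         c' :: l => c = c' andalso k l
       | [] => false)
  | match One s k = k s
  | match (Times (r1, r2)) s k =
      match r1 s (fn rest => match r2 rest k)
  | match (Plus (r1, r2)) s k =
      match r1 s k orelse match r2 s k
  | match (Star r) s k =
      k s orelse
      match r s (fn rest => rest <> s andalso match (Star r) rest k);

fun accept r s = match r (String.explode s) (fn rest => rest = []);

(* Star (Char #"a"): every recursive call on the same regex Star (Char #"a")
   receives a strictly shorter string, so the pair (r, s) decreases. *)
val aaa = accept (Star (Char #"a")) "aaa";   (* true *)

(* Star One: the inner regex One consumes nothing, so rest = s.
   The guard rest <> s then blocks the recursive call on (Star One, s),
   which would otherwise repeat with the very same arguments. *)
val emptyOk = accept (Star One) "";          (* true *)
val nonEmpty = accept (Star One) "a";        (* false *)
```

In both runs the recursive call on Star r only happens with a string different from s, which matches the second alternative of the lexicographic order (same regex, smaller string).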

About quantifiers. The treatment of quantifiers depends on whether the statement is an assumption or a proof goal. If we are using a fact ∀x.P(x), then we can safely replace x by any concrete object we want, since we know (by assumption) that it holds for any x. On the other hand, if we are proving a fact ∀x.P(x), then we need to keep this x generic if we want to actually prove that. The treatment of existential quantifiers is dual. In the proof below, we distinguish quantified variables that we can specialize from quantified variables that need to remain generic.
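The four situations described above can be written down as tiny formal proofs. The following is a sketch in Lean 4, which is not the language of these notes; the predicates P and Q and the numbers are placeholders chosen for illustration.

```lean
-- Using a universal fact: we may instantiate x with any concrete object.
example (P : Nat → Prop) (h : ∀ x, P x) : P 3 := h 3

-- Proving a universal goal: x must stay generic (here, the bound variable x).
example (P Q : Nat → Prop) (h : ∀ x, P x → Q x) (hp : ∀ x, P x) : ∀ x, Q x :=
  fun x => h x (hp x)

-- Dually, proving an existential goal: we choose a concrete witness.
example : ∃ n : Nat, n + 1 = 3 := ⟨2, rfl⟩

-- Using an existential fact: the witness x is generic, we know only P x.
example (P : Nat → Prop) (Q : Prop) (h : ∃ x, P x) (hq : ∀ x, P x → Q) : Q :=
  match h with
  | ⟨x, hx⟩ => hq x hx
```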

Proof. The proof proceeds by lexicographic induction on the pair (r, s). There will be two base cases (1, c) and three inductive cases (r1r2, r1 + r2, r∗).

• Base cases:

1. r = 1

To show: if s = pu, p ∈ L(1) and k u = true, then match One s k = true.

Assumptions:

H1 s = pu
H2 p ∈ L(1) = {“”}
H3 k u = true

According to the definition of match:

    | match One s k = k s

From H2, p = “”, and so from H1, s = u. Since k u = true by H3, we have k s = true. Therefore match One s k = true.

Footnote 1: The contrapositive of A → B is ¬B → ¬A.

2. r = c

To show: if s = pu, p ∈ L(c) and k u = true, then match (Char c) s k = true.

Assumptions:

H1 s = pu
H2 p ∈ L(c) = {c}
H3 k u = true

According to the definition of match:

    fun match (Char c) s k =
          (case s of
             c' :: l => c = c' andalso k l
           | [] => false)

From H2 and H1, p = c and s = c :: u. As s is not empty, the evaluation of the Char case falls on the first option c' :: l, where c' = c and l = u. This reduces to c = c andalso k u. The first conjunct is trivially true and the second one holds from H3. Therefore match (Char c) s k = true.

• Inductive cases:

1. r = r1r2

IH1: if s = pu, p ∈ L(r1) and k u = true, then match r1 s k = true.

IH2: if s = pu, p ∈ L(r2) and k u = true, then match r2 s k = true.

To show: if s = pu, p ∈ L(r1r2) and k u = true, then match (Times (r1, r2)) s k = true.

Assumptions:

H1 s = pu
H2 p ∈ L(r1r2)
H3 k u = true

From the definition of L(r1r2) and H2, we derive:

H4 p = p1p2
H5 p1 ∈ L(r1)
H6 p2 ∈ L(r2)

Instantiating IH2 with the string p2u, and using H6 and H3, we have the necessary assumptions to apply it:

H7 match r2 (p2 @ u) k = true

To apply IH1, we can choose s = p1(p2u), which holds by H1 and H4, and from H5 we have the second assumption p1 ∈ L(r1). But we still need a function k′ such that k′ (p2 @ u) = true. We do have such a function from H7! So we can simply define k′ = (fn rest => match r2 rest k). Now we can apply IH1 and we obtain:

H8 match r1 (p1 @ (p2 @ u)) (fn rest => match r2 rest k) = true

Since s = p1 @ (p2 @ u), this is precisely the body of match for the Times case:

    | match (Times (r1, r2)) s k =
        match r1 s (fn rest => match r2 rest k)

Therefore match (Times (r1, r2)) s k = true.

2. r = r1 + r2

IH1: if s = pu, p ∈ L(r1) and k u = true, then match r1 s k = true.

IH2: if s = pu, p ∈ L(r2) and k u = true, then match r2 s k = true.

To show: if s = pu, p ∈ L(r1 + r2) and k u = true, then match (Plus (r1, r2)) s k = true.

Assumptions:

H1 s = pu
H2 p ∈ L(r1 + r2)
H3 k u = true

According to the definition of match:

    | match (Plus (r1, r2)) s k =
        match r1 s k orelse match r2 s k

From the definition of L(r1 + r2) and H2 we derive:

H4 p ∈ L(r1) or p ∈ L(r2).

We have two sub-cases:

(a) H4 is p ∈ L(r1). We can use H1, H4 and H3 to satisfy the assumptions of IH1 and thus obtain:

H5 match r1 s k = true

Therefore match (Plus (r1, r2)) s k = true because of the first disjunct in the body.

(b) H4 is p ∈ L(r2). We can use H1, H4 and H3 to satisfy the assumptions of IH2 and thus obtain:

H5 match r2 s k = true

Therefore match (Plus (r1, r2)) s k = true because of the second disjunct in the body.

3. r = r′∗

IH1: if s = pu, p ∈ L(r′) and k u = true, then match r' s k = true.

IH2: for all s′ < s, if s′ = pu, p ∈ L(r′∗) and k u = true, then match (Star r') s' k = true.

To show: if s = pu, p ∈ L(r′∗) and k u = true, then match (Star r') s k = true.

Assumptions:

H1 s = pu
H2 p ∈ L(r′∗)
H3 k u = true

From H2 we know that either p = “” or p = p1p2 such that p1 ∈ L(r′) and p2 ∈ L(r′∗). We will consider the two cases:

(a) p = “”:

In this case s = u and k s = k u = true (from H3), therefore the disjunction in the body of match for the Star case will hold.

(b) p = p1p2:

Assumptions:

H4 p = p1p2
H5 p1 ∈ L(r′)
H6 p2 ∈ L(r′∗)

Again there are two cases: either p2u < s (and thus p1 ≠ “”) or p2u = s (and p1 = “”):

i. p1 = “”

The hypotheses then become:

H1 s = p2u
H2 p2 ∈ L(r′∗)
H3 k u = true
H4 p = p2
H5 “” ∈ L(r′)
H6 p2 ∈ L(r′∗)

Again there are two choices: either p2 = “” or p2 ≠ “”. Let’s analyse both of them carefully. If p2 = “”, then we fall back to the previous case, where p = “”, and this is solved. If p2 ≠ “”, then p = p2 is a non-empty string in L(r′∗) (by H2), so it can be decomposed again according to the definition of L(r′∗), and we can pick a decomposition whose first component, the string consumed by r′, is non-empty. We then fall on the next case. So this is a non-case. Observe that, because of the second justification, we can safely add the restriction rest <> s to the code without sacrificing completeness.

ii. p1 ≠ “”

Then p2u < s and, instantiating IH2 with the string p2u and using H6 and H3, we have the necessary assumptions to use it, thus obtaining H8:

H7 p2 @ u <> s
H8 match (Star r') (p2 @ u) k = true

To apply IH1, we can choose s = p1(p2u) and we have p1 ∈ L(r′) from H5. All we need to do is construct a function k′ such that k′ (p2 @ u) = true. This function can be obtained by combining H7 and H8: k′ = (fn rest => rest <> s andalso match (Star r') rest k). Hence by using IH1 we get:

H9 match r' s (fn rest => rest <> s andalso match (Star r') rest k) = true

According to the definition of match:

    | match (Star r) s k =
        k s orelse
        match r s (fn rest => rest <> s andalso match (Star r) rest k);

H9 is the second disjunct of this body, therefore match (Star r') s k = true.
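The hypotheses used in the Times and Star cases can be checked on one concrete instance each. This is only a sanity check of a single instance under assumed inputs (the strings p1, p2, u and the continuation k below are chosen for illustration), not a substitute for the proof.

```sml
datatype regex =
    Char of char
  | One
  | Times of regex * regex
  | Plus of regex * regex
  | Star of regex;

fun match (Char c) s k =
      (case s of
         c' :: l => c = c' andalso k l
       | [] => false)
  | match One s k = k s
  | match (Times (r1, r2)) s k =
      match r1 s (fn rest => match r2 rest k)
  | match (Plus (r1, r2)) s k =
      match r1 s k orelse match r2 s k
  | match (Star r) s k =
      k s orelse
      match r s (fn rest => rest <> s andalso match (Star r) rest k);

(* Times case with r1 = a, r2 = b, p1 = "a", p2 = "b", u = "c". *)
val timesCase =
  let
    val (r1, r2) = (Char #"a", Char #"b")
    val (p1, p2, u) = ([#"a"], [#"b"], [#"c"])
    fun k rest = (rest = u)                               (* H3: k u = true *)
    val s = p1 @ (p2 @ u)
    val h7 = match r2 (p2 @ u) k                          (* from IH2 *)
    val h8 = match r1 s (fn rest => match r2 rest k)      (* from IH1 *)
    val goal = match (Times (r1, r2)) s k
  in
    (h7, h8, goal)
  end;

(* Star case (b.ii) with r' = a, p1 = "a", p2 = "a", u = "b". *)
val starCase =
  let
    val r' = Char #"a"
    val (p1, p2, u) = ([#"a"], [#"a"], [#"b"])
    fun k rest = (rest = u)                               (* H3: k u = true *)
    val s = p1 @ (p2 @ u)
    val h7 = (p2 @ u <> s)
    val h8 = match (Star r') (p2 @ u) k                   (* from IH2 *)
    val h9 = match r' s (fn rest => rest <> s andalso match (Star r') rest k)
    val goal = match (Star r') s k
  in
    (h7, h8, h9, goal)
  end;
```

Every component of both tuples evaluates to true, mirroring the chain of hypotheses H7, H8 (and H9) that leads to the goal in each case.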

Soundness guarantees we do not have false positives:

Theorem 2 (Soundness). For every r: regex, s: char list and k: char list -> bool, if match r s k = true, then there exist p: char list and u: char list such that s = pu, p ∈ L(r) and k u = true.
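The theorem promises a witness split s = pu whenever match succeeds. As a rough sketch, the suffix u can be observed at run time by recording the argument on which the final continuation succeeds. The helper witness below is hypothetical and not part of the lecture's code; that the recovered prefix really lies in L(r) is what the theorem asserts, not something this code checks.

```sml
datatype regex =
    Char of char
  | One
  | Times of regex * regex
  | Plus of regex * regex
  | Star of regex;

fun match (Char c) s k =
      (case s of
         c' :: l => c = c' andalso k l
       | [] => false)
  | match One s k = k s
  | match (Times (r1, r2)) s k =
      match r1 s (fn rest => match r2 rest k)
  | match (Plus (r1, r2)) s k =
      match r1 s k orelse match r2 s k
  | match (Star r) s k =
      k s orelse
      match r s (fn rest => rest <> s andalso match (Star r) rest k);

(* Hypothetical helper: returns SOME u for the suffix u on which k succeeded,
   or NONE if the match fails. *)
fun witness r s k =
  let
    val found = ref NONE
    (* Behaves like k, but remembers the suffix it accepted. *)
    fun k' u = k u andalso (found := SOME u; true)
  in
    if match r s k' then !found else NONE
  end;

val s = String.explode "aab";
val result = witness (Star (Char #"a")) s (fn u => u = [#"b"]);

(* The prefix p is whatever precedes the recorded suffix u in s. *)
val prefix =
  case result of
    SOME u => SOME (List.take (s, length s - length u))
  | NONE => NONE;
```

Here result is SOME [#"b"] and prefix is SOME [#"a", #"a"], that is, p = “aa” ∈ L(a∗) and u = “b”, as the theorem predicts.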

Proof. • Base cases:

1. r = 1

To show: if match One s k = true, then s = pu, p ∈ L(1) and k u = true.

Definition of match for One:

    | match One s k = k s

Since match One s k = true we know that k s = true. Taking p = “” and u = s we have that s = pu, p ∈ L(1) (because “” ∈ L(1) by definition) and k u = k s = true.

2. r = c

To show: if match (Char c) s k = true, then s = pu, p ∈ L(c) and k u = true.

Definition of match for Char:

    fun match (Char c) s k =
          (case s of
             c' :: l => c = c' andalso k l
           | [] => false)

Since match (Char c) s k = true, it must have evaluated to the first case, i.e., s = c' :: l and c = c' andalso k l = true. Instantiating p with c and u with l, we have: s = pu, p ∈ L(c) (because c ∈ L(c)) and k u = k l = true.

• Inductive cases:

1. r = r1r2

IH1: if match r1 s k = true, then s = pu, p ∈ L(r1) and k u = true.

IH2: if match r2 s k = true, then s = pu, p ∈ L(r2) and k u = true.

To show: if match (Times (r1, r2)) s k = true, then s = pu, p ∈ L(r1r2) and k u = true.

Definition of match for Times:

    | match (Times (r1, r2)) s k =
        match r1 s (fn rest => match r2 rest k)

H1 match r1 s (fn rest => match r2 rest k) = true

Using IH1 with the continuation (fn rest => match r2 rest k) we obtain:

H2 s = pu
H3 p ∈ L(r1)
H4 match r2 u k = true

From H4 and IH2:

H5 u = p′u′
H6 p′ ∈ L(r2)
H7 k u′ = true

From H3 and H6 we conclude pp′ ∈ L(r1r2). By H2 and H5, s = (pp′)u′, so we can choose the prefix pp′ and the suffix u′, and the desired result follows.

2. r = r1 + r2

IH1: if match r1 s k = true, then s = pu, p ∈ L(r1) and k u = true.

IH2: if match r2 s k = true, then s = pu, p ∈ L(r2) and k u = true.

To show: if match (Plus (r1, r2)) s k = true, then s = pu, p ∈ L(r1 + r2) and k u = true.

Definition of match for Plus:

    | match (Plus (r1, r2)) s k =
        match r1 s k orelse match r2 s k

As match (Plus (r1, r2)) s k = true, there are two cases:

(a) match r1 s k = true

Then, by IH1, we know:

H1 s = p′u′
H2 p′ ∈ L(r1)
H3 k u′ = true

From H2 we conclude that p′ ∈ L(r1 + r2), so we can choose s = p′u′ and all the needed conclusions follow.

(b) match r2 s k = true

Analogous to the previous case, using IH2.

3. r = r′∗

IH1: if match r' s k = true, then s = pu, p ∈ L(r′) and k u = true.

IH2: for all s′ < s, if match (Star r') s' k = true, then s′ = pu, p ∈ L(r′∗) and k u = true.

To show: if match (Star r') s k = true, then s = pu, p ∈ L(r′∗) and k u = true.

Definition of match for Star:

    | match (Star r) s k =
        k s orelse
        match r s (fn rest => rest <> s andalso match (Star r) rest k);

H1 match (Star r') s k = true

There are two cases:

(a) k s = true

In this case we can instantiate p with “” and u with s, and we have s = pu, “” ∈ L(r′∗) (by definition) and k u = k s = true.

(b) match r' s (fn rest => rest <> s andalso match (Star r') rest k) = true

Using IH1 we obtain:

H2 s = pu
H3 p ∈ L(r′)
H4 u <> s andalso match (Star r') u k = true

By H4 we know that u <> s, and since u is a suffix of s (by H2), then u < s. Together with the second conjunct from H4, we can apply IH2, thus obtaining:

H5 u = p′u′
H6 p′ ∈ L(r′∗)
H7 k u′ = true

By H3 and H6, we have that pp′ ∈ L(r′∗). By H2 and H5, s = (pp′)u′, so taking the prefix pp′ and the suffix u′ we have the desired result.
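Taken together, the two theorems say that accept agrees with membership in L(r). One way to spot-check this on small inputs is to compare accept against a brute-force membership test written directly from the inductive definition of L(r). The functions splits, inLang and agrees below are hypothetical helpers for this sketch. Note that inLang only tries non-empty first components in the Star case, relying on the observation made in the completeness proof that this loses no strings.

```sml
datatype regex =
    Char of char
  | One
  | Times of regex * regex
  | Plus of regex * regex
  | Star of regex;

fun match (Char c) s k =
      (case s of
         c' :: l => c = c' andalso k l
       | [] => false)
  | match One s k = k s
  | match (Times (r1, r2)) s k =
      match r1 s (fn rest => match r2 rest k)
  | match (Plus (r1, r2)) s k =
      match r1 s k orelse match r2 s k
  | match (Star r) s k =
      k s orelse
      match r s (fn rest => rest <> s andalso match (Star r) rest k);

fun accept r s = match r (String.explode s) (fn rest => rest = []);

(* All ways to write a list as prefix @ suffix. *)
fun splits [] = [([], [])]
  | splits (x :: xs) =
      ([], x :: xs) :: List.map (fn (p, q) => (x :: p, q)) (splits xs);

(* Brute-force membership in L(r), following the inductive definition. *)
fun inLang (Char c) s = (s = [c])
  | inLang One s = null s
  | inLang (Times (r1, r2)) s =
      List.exists (fn (p, q) => inLang r1 p andalso inLang r2 q) (splits s)
  | inLang (Plus (r1, r2)) s = inLang r1 s orelse inLang r2 s
  | inLang (Star r) s =
      null s orelse
      List.exists
        (fn (p, q) => not (null p) andalso inLang r p andalso inLang (Star r) q)
        (splits s);

(* accept and the reference test give the same answer on str. *)
fun agrees r str = (accept r str = inLang r (String.explode str));

(* (a(b + 1))* : sequences of "a" and "ab". *)
val r = Star (Times (Char #"a", Plus (Char #"b", One)));
val samples = ["", "a", "b", "ab", "aab", "aba", "abb", "ba"];
val allAgree = List.all (agrees r) samples;
```

Such a check can only cover finitely many strings and regexes, so it complements the proofs above rather than replacing them.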
