Refinement Type Inference via Horn Constraint Optimization⋆

Kodai Hashimoto and Hiroshi Unno

University of Tsukuba {kodai, uhiro}@logic.cs.tsukuba.ac.jp

Abstract. We propose a novel method for inferring refinement types of higher-order functional programs. The main advantage of the proposed method is that it can infer maximally preferred (i.e., Pareto optimal) refinement types with respect to a user-specified preference order. The flexible optimization of refinement types enabled by the proposed method paves the way for interesting applications, such as inferring a most-general characterization of inputs for which a given program satisfies (or violates) a given safety (or termination) property. Our method reduces such a type optimization problem to a Horn constraint optimization problem by using a new refinement type system that can flexibly reason about non-determinism in programs. Our method then solves the constraint optimization problem by repeatedly improving a current solution until convergence via template-based invariant generation. We have implemented a prototype inference system based on our method, and obtained promising results in preliminary experiments.

1 Introduction

Refinement types [5, 20] have been applied to safety verification of higher-order functional programs. Some existing tools [9, 10, 16–19] enable fully automated verification by refinement type inference based on invariant generation techniques such as abstract interpretation, predicate abstraction, and CEGAR. The goal of these tools is to infer refinement types precise enough to verify a given safety specification. Therefore, types inferred by these tools are often too specific to the particular specification, and hence have limited applications. To remedy the limitation, we propose a novel refinement type inference method that can infer maximally preferred (i.e., Pareto optimal) refinement types with respect to a user-specified preference order. For example, let us consider the following summation function (in OCaml syntax):

let rec sum x = if x = 0 then 0 else x + sum (x - 1)

A refinement type of sum is of the form (x : {x : int | P(x)}) → {y : int | Q(x, y)}. Here, P(x) and Q(x, y) respectively represent the pre- and postconditions of sum. Note that the postcondition Q(x, y) can refer to the argument x as well as the return value y. Suppose that we want to infer a maximally-weak predicate for

⋆ This work was supported by Kakenhi 25730035, 23220001, 15H05706, and 25280020.

P and a maximally-strong predicate for Q within a given underlying theory. Our method allows us to specify such preferences as type optimization constraints: maximize(P), minimize(Q). Here, maximize(P) (resp. minimize(Q)) means that the set of the models of P(x) (resp. Q(x, y)) should be maximized (resp. minimized). Our method then infers a Pareto optimal refinement type with respect to the given preferences. In general, however, this kind of multi-objective optimization involves a trade-off among the optimization constraints. In the above example, P may not be weakened without also weakening Q. Hence, there often exist multiple optima. Actually, all the following are Pareto optimal refinement types of sum.¹

(x : {x : int | x = 0}) → {y : int | y = x}   (1)
(x : {x : int | true}) → {y : int | y ≥ 0}   (2)
(x : {x : int | x < 0}) → {y : int | false}   (3)

Our method further allows us to specify a priority order on the predicate variables P and Q. If P is given a higher priority over Q (we write P @ Q), our method infers the type (2), whereas we obtain the type (3) if Q @ P. Interestingly, (3) expresses that sum is non-terminating for any input x < 0. The flexible optimization of refinement types enabled by our method paves the way for interesting applications, such as inferring a most-general characterization of inputs for which a given program satisfies (or violates) a given safety (or termination) property. Furthermore, our method can infer an upper bound of the number of recursive calls if the program is terminating, and can find a minimal-length counterexample path if the program violates a safety property. Internally, our method reduces such a refinement type optimization problem to a constraint optimization problem where the constraints are expressed as existentially quantified Horn clauses over predicate variables [1, 11, 19].
The constraint generation here is based on a new refinement type system that can reason about (angelic and demonic) non-determinism in programs. Our method then solves the constraint optimization problem by repeatedly improving a current solution until convergence. The constraint optimization here is based on an extension of template-based invariant generation [3, 7] to existentially quantified Horn clause constraints and prioritized multi-objective optimization. The rest of the paper is organized as follows. Sections 2 and 3 respectively formalize our target language and its refinement type system. The applications of refinement type optimization are explained in Section 4. Section 5 formalizes Horn constraint optimization problems and the reduction from type optimization problems. Section 6 proposes our Horn constraint optimization method. Section 7 reports on a prototype implementation of our method and the results of preliminary experiments. We compare our method with related work in Section 8 and conclude the paper in Section 9. An extended version of the paper with proofs is available online [8].

¹ Here, we use quantifier-free linear arithmetic as the underlying theory and consider only atomic predicates for P and Q.

(E-Op)     E[op(ñ)] −→D E[⟦op⟧(ñ)]
(E-Let)    E[let x = v in e] −→D E[[v/x]e]
(E-App)    E[f ṽ] −→D E[[ṽ/x̃]e]  if D(f) = λx̃.e and |x̃| = |ṽ|
(E-If)     E[ifz n then e1 else e2] −→D E[ei]  where i = 1 if n = 0 and i = 2 otherwise
(E-Rand∀)  E[let x = ∗∀ in e] −→D E[[n/x]e]
(E-Rand∃)  E[let x = ∗∃ in e] −→D E[[n/x]e]

Fig. 1. The operational semantics of our language L

2 Target Language L

This section introduces a higher-order call-by-value functional language L, which is the target of our refinement type optimization. The syntax is defined as follows.

(programs) D ::= {fi ↦ λx̃i.ei} (i = 1, . . . , m)
(expressions) e ::= x | e1 e2 | n | op(e1, . . . , e_ar(op)) | ifz e1 then e2 else e3 | let x = e1 in e2 | let x = ∗∀ in e | let x = ∗∃ in e
(values) v ::= n | f ṽ
(eval. contexts) E ::= [ ] | E e | v E | ifz E then e1 else e2 | let x = E in e

Here, x and f are meta-variables ranging over variables. n and op respectively represent integer constants and operations such as + and ≥. ar(op) expresses the arity of op. We write x̃ (resp. ṽ) for a sequence of variables xi (resp. values vi) and |x̃| for the length of x̃. For simplicity of the presentation, the language L has integers as the only data type. We encode the Boolean values true and false respectively as non-zero integers and 0. A program D = {fi ↦ λx̃i.ei} (i = 1, . . . , m) is a mapping from variables fi to expressions λx̃i.ei, where λx̃.e is an abbreviation of λx1. · · · λx|x̃|.e. We define dom(D) = {f1, . . . , fm} and ar(fi) = |x̃i|. A value f ṽ is required to satisfy 1 ≤ |ṽ| < ar(f). The call-by-value operational semantics of L is given in Figure 1. Here, ⟦op⟧ represents the integer function denoted by op. Both expressions let x = ∗∀ in e and let x = ∗∃ in e generate a random integer n, bind x to it, and evaluate e. They are, however, interpreted differently in our refinement type system (see Section 3). We support these expressions to model various non-deterministic behaviors caused by, for example, user inputs, inputs from communication channels, interrupts, and thread schedulers. We write −→∗D to denote the reflexive and transitive closure of −→D.
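To make the operational semantics of Figure 1 concrete, the following is a minimal big-step evaluator sketch for the first-order, recursion-free core of L, with expressions encoded as nested tuples. The encoding and all names here are our own illustration, not part of the paper; both kinds of random-integer expression draw from a caller-supplied stream, since they behave identically operationally and differ only in how the type system treats them.

```python
def eval_expr(e, env, rand):
    """env maps variable names to integers; rand is an iterator supplying
    the integers produced by `let x = *A in e` (demonic) and
    `let x = *E in e` (angelic); the two only differ in the type system."""
    tag = e[0]
    if tag == "int":                       # integer constant n
        return e[1]
    if tag == "var":                       # variable x
        return env[e[1]]
    if tag == "op":                        # op(e1, ..., e_ar(op))
        _, f, args = e
        return f(*(eval_expr(a, env, rand) for a in args))
    if tag == "ifz":                       # ifz e1 then e2 else e3 (0 selects the then-branch)
        _, e1, e2, e3 = e
        taken = e2 if eval_expr(e1, env, rand) == 0 else e3
        return eval_expr(taken, env, rand)
    if tag == "let":                       # let x = e1 in e2
        _, x, e1, e2 = e
        return eval_expr(e2, {**env, x: eval_expr(e1, env, rand)}, rand)
    if tag == "rand":                      # let x = * in e (both kinds)
        _, x, body = e
        return eval_expr(body, {**env, x: next(rand)}, rand)
    raise ValueError("unknown expression: %r" % (tag,))

# let x = * in (ifz x then 0 else x + 1)
example = ("rand", "x",
           ("ifz", ("var", "x"),
            ("int", 0),
            ("op", lambda a, b: a + b, [("var", "x"), ("int", 1)])))
```

For instance, running `example` with the stream `iter([0])` takes the then-branch, while `iter([41])` takes the else-branch and returns 42.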

3 Refinement Type System for L

In this section, we introduce a refinement type system for L that can reason about non-determinism in programs. We then formalize refinement type optimization problems (in Section 3.1), which generalize ordinary type inference problems. The syntax of our refinement type system is defined as follows.

(refinement types) τ ::= {x | φ} | (x : τ1) → τ2
(type environments) Γ ::= ∅ | Γ, x : τ | Γ, φ
(formulas) φ ::= t1 ≤ t2 | ⊤ | ⊥ | ¬φ | φ1 ∧ φ2 | φ1 ∨ φ2 | φ1 ⇒ φ2
(terms) t ::= n | x | t1 + t2 | n · t
(predicates) p ::= λx̃.φ

An integer refinement type {x | φ} equipped with a formula φ for type refinement represents the type of integers x that satisfy φ. The scope of x is within φ. We often abbreviate {x | ⊤} as int. A function refinement type (x : τ1) → τ2 represents the type of functions that take an argument x of the type τ1 and return a value of the type τ2. Here, τ2 may depend on the argument x and the scope of x is within τ2. For example, (x : int) → {y | y > x} is the type of functions whose return value y is always greater than the argument x. We often write fvs(τ) to denote the set of free variables occurring in τ. We define dom(Γ) = {x | x : τ ∈ Γ} and write Γ(x) = τ if x : τ ∈ Γ. In this paper, we adopt formulas φ of the quantifier-free theory of linear integer arithmetic (QFLIA) for type refinement. We write |= φ if a formula φ is valid in QFLIA. Formulas ⊤ and ⊥ respectively represent the tautology and the contradiction. Note that atomic formulas t1 < t2 (resp. t1 = t2) can be encoded as t1 + 1 ≤ t2 (resp. t1 ≤ t2 ∧ t2 ≤ t1) in QFLIA. The inference rules of our refinement type system are shown in Figure 2. Here, a type judgment ⊢ D : Γ means that a program D is well-typed under a refinement type environment Γ. A type judgment Γ ⊢ e : τ indicates that an expression e has a refinement type τ under Γ. A subtype judgment Γ ⊢ τ1 <: τ2 states that τ1 is a subtype of τ2 under Γ. The formula ⟦Γ⟧ occurring in the rules ISub and Rand∃ is defined by ⟦∅⟧ = ⊤, ⟦Γ, x : {ν | φ}⟧ = ⟦Γ⟧ ∧ [x/ν]φ, ⟦Γ, x : (ν : τ1) → τ2⟧ = ⟦Γ⟧, and ⟦Γ, φ⟧ = ⟦Γ⟧ ∧ φ. In the rule Op, ⟦op⟧Ty represents a refinement type of op that soundly abstracts the behavior of the function ⟦op⟧. For example, ⟦+⟧Ty = (x : int) → (y : int) → {z | z = x + y}.
J K J KAll the rules except Rand∀ and Rand∃ for random integer generation are essentially the same as the previous ones [18]. The rule Rand∀ requires e to have τ for any randomly generated integer x. Therefore, e is type-checked against τ under a type environment that assigns int to x. By contrast, the rule Rand∃ requires e to have τ for some randomly generated integer x. Hence, e is type- checked against τ under a type environment that assigns a type {x | φ} to x for some φ such that fvs(φ) ⊆ dom(Γ ) ∪ {x} and |= Γ ⇒ ∃x.φ. For example, x : int ` let y = ∗∃ in x + y : {r | r = 0} is derivableJ K because we can derive x : int, y : {y | y = −x} ` x + y : {r | r = 0}. Thus, our new type system allows us to reason about both angelic ∗∃ and demonic ∗∀ non-determinism in higher-order functional programs. We now discuss properties of our new refinement type system. We can prove the following progress theorem in a standard manner. Theorem 1 (Progress). Suppose that we have ` D : Γ , dom(Γ ) = dom(D), 0 0 and Γ ` e : τ. Then, either e is a value or e −→D e for some e . Γ ` D(f): Γ (f) Γ ` e1 : τ1 (for each f ∈ dom(D)) Γ, x : τ1 ` e2 : τ2 x 6∈ fvs(τ2) (Prog) (Let) ` D : Γ Γ ` let x = e1 in e2 : τ2

Γ (x) = {ν | φ} Γ, x : int ` e : τ x 6∈ fvs(τ) (IVar) (Rand∀) Γ ` x : {ν | ν = x} Γ ` let x = ∗∀ in e : τ

Γ (x) = (ν : τ1) → τ2 fvs(φ) ⊆ dom(Γ ) ∪ {x} (FVar) |= Γ ⇒ ∃x.φ Γ ` x :(ν : τ1) → τ2 Γ, x : {x | φ}J ` Ke : τ x 6∈ fvs(τ) (Rand∃) Γ, x : τ1 ` e : τ2 Γ ` let x = ∗ in e : τ (Abs) ∃ Γ ` λx.e :(x : τ1) → τ2 Γ ` e1 : {ν | φ} Γ ` e1 :(x : τ1) → τ2 Γ, φ ∧ ν = 0 ` e2 : τ Γ ` e2 : τ1 Γ, φ ∧ ν 6= 0 ` e3 : τ (App) (If) Γ ` e1 e2 : τ2 Γ ` ifz e1 then e2 else e3 : τ

Ty Γ ` n : {ν | ν = n} (Int) Γ ` op <:(x1 : τ1)→...→(xm : τm)→τ ΓJ `Kei : τi (for each i ∈ {1, . . . , m}) Γ ` e : τ 0 Γ ` τ 0 <: τ (Op) (Sub) Γ ` op(e1, . . . , em): τ Γ ` e : τ 0 Γ ` τ1 <: τ1 |= Γ ∧ φ ⇒ φ 0 0 1 2 Γ, ν : τ1 ` τ2 <: τ2 J K (ISub) (FSub) Γ ` {ν | φ } <: {ν | φ } 0 0 1 2 Γ ` (ν : τ1) → τ2 <:(ν : τ1) → τ2

Fig. 2. The inference rules of our refinement type system

We can also show the type preservation theorem in a similar manner to [18].

Theorem 2 (Preservation). Suppose that we have ⊢ D : Γ and Γ ⊢ e : τ. If e is of the form let x = ∗∃ in e0, then we get Γ ⊢ e′ : τ for some e′ such that e −→D e′. Otherwise, we get Γ ⊢ e′ : τ for any e′ such that e −→D e′.

3.1 Refinement Type Optimization Problems

We now define refinement type optimization problems, which generalize refinement type inference problems addressed by previous work [9, 10, 15–19]. We first introduce the notion of refinement type templates. A refinement type template of a function f is the refinement type obtained from the ordinary ML-style type of f by replacing each base type int with an integer refinement type {ν | P(x̃, ν)} for some fresh predicate variable P that represents an unknown predicate to be inferred, and each T1 → T2 with a (dependent) function refinement type (x : τ1) → τ2. For example, from an ML-style type (int → int) → int → int, we obtain the following template.

(f :(x1 : {x1 | P1(x1)}) → {x2 | P2(x1, x2)}) →

(x3 : {x3 | P3(x3)}) → {x4 | P4(x3, x4)}

Note here that the first argument f is not passed as an argument to P3 and P4 because f is of a function type and never occurs in QFLIA formulas for type refinement. A refinement type template of a program D with dom(D) = {f1, . . . , fm} is the refinement type environment ΓD = f1 : τ1, . . . , fm : τm, where each τi is the refinement type template of fi. We write pvs(ΓD) for the set of predicate variables that occur in ΓD. A predicate substitution θ for ΓD is a map from each P ∈ pvs(ΓD) to a closed predicate λx1, . . . , xar(P).φ, where ar(P) represents the arity of P. We write θΓD to denote the application of a substitution θ to ΓD. We also write dom(θ) to represent the domain of θ. We can define ordinary refinement type inference problems as follows.

Definition 1 (Refinement Type Inference). A refinement type inference problem of a program D is a problem to find a predicate substitution θ such that ⊢ D : θΓD.

We now generalize refinement type inference problems to optimization problems.

Definition 2 (Refinement Type Optimization). Let D be a program, ≺ be a strict partial order on predicate substitutions, and Θ = {θ | ⊢ D : θΓD}. A predicate substitution θ ∈ Θ is called Pareto optimal with respect to ≺ if there is no θ′ ∈ Θ such that θ′ ≺ θ. A refinement type optimization problem (D, ≺) is a problem to find a Pareto optimal substitution θ ∈ Θ with respect to ≺.

In the remainder of the paper, we will often consider type optimization problems extended with user-specified constraints and/or templates for some predicate variables (see Section 4 for examples and Section 5 for formal definitions). The above definition of type optimization problems is abstract in the sense that ≺ is only required to be a strict partial order on predicate substitutions. We below introduce an example concrete order, which was already explained informally in Section 1 and is adopted in our prototype implementation described in Section 7.
The order is defined by two kinds of optimization constraints: the optimization direction (i.e., maximize or minimize) and the priority order on predicate variables.

Definition 3. Suppose that

– P = {P1, . . . , Pm} is a subset of pvs(ΓD),
– ρ is a map from each predicate variable in P to an optimization direction d that is either ↑ (for maximization) or ↓ (for minimization), and
– @ is a strict total order on P that expresses the priority.² We below assume that P1 @ · · · @ Pm.

We define a strict partial order ≺(ρ,@) on predicate substitutions that respects ρ and @ as the following lexicographic order:

θ1 ≺(ρ,@) θ2 ⟺ ∃i ∈ {1, . . . , m}. θ1(Pi) ≺ρ(Pi) θ2(Pi) ∧ ∀j < i. θ1(Pj) ≡ρ(Pj) θ2(Pj)

Here, a strict partial order ≺d and an equivalence relation ≡d on predicates are defined as follows.

– p1 ≺d p2 ⟺ p1 ⊑d p2 ∧ p2 ⋢d p1,
– p1 ≡d p2 ⟺ p1 ⊑d p2 ∧ p2 ⊑d p1,
– λx̃.φ1 ⊑↑ λx̃.φ2 ⟺ |= φ2 ⇒ φ1, and λx̃.φ1 ⊑↓ λx̃.φ2 ⟺ |= φ1 ⇒ φ2.

² If @ were partial, the relation ≺(ρ,@) defined shortly would not be a strict partial order. Our implementation described in Section 7 uses topological sort to obtain a strict total order @ from a user-specified partial one.

Example 1. Recall the function sum and its type template with the predicate variables P, Q in Section 1. Let us consider optimization constraints ρ(P) = ↑, ρ(Q) = ↓, and P @ Q, and predicate substitutions

– θ1 = {P ↦ λx. x = 0, Q ↦ λx, y. y = x},
– θ2 = {P ↦ λx. ⊤, Q ↦ λx, y. y ≥ 0}, and
– θ3 = {P ↦ λx. x < 0, Q ↦ λx, y. ⊥}.

We then have θ2 ≺(ρ,@) θ1 and θ2 ≺(ρ,@) θ3, because (λx. ⊤) ≺↑ (λx. x = 0) and (λx. ⊤) ≺↑ (λx. x < 0). □
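The lexicographic order of Definition 3 can be sketched by brute force: QFLIA implication is replaced by enumeration over a small integer range, and the range bound, function names, and encodings below are our own illustrative assumptions.

```python
from itertools import product

DOM = range(-25, 26)  # finite stand-in for the integers (our assumption)

def implies(p, q, arity):
    # |= p => q, checked by enumeration over DOM^arity
    return all(q(*v) for v in product(DOM, repeat=arity) if p(*v))

def sqsubseteq(p1, p2, arity, d):
    # p1 [=up p2  iff |= phi2 => phi1;  p1 [=down p2  iff |= phi1 => phi2
    return implies(p2, p1, arity) if d == "up" else implies(p1, p2, arity)

def strict(p1, p2, arity, d):
    return sqsubseteq(p1, p2, arity, d) and not sqsubseteq(p2, p1, arity, d)

def equiv(p1, p2, arity, d):
    return sqsubseteq(p1, p2, arity, d) and sqsubseteq(p2, p1, arity, d)

def precedes(th1, th2, order):
    """Lexicographic order of Definition 3; order lists
    (name, arity, direction), highest priority first."""
    return any(
        strict(th1[pv], th2[pv], ar, d)
        and all(equiv(th1[q], th2[q], a, dd) for q, a, dd in order[:i])
        for i, (pv, ar, d) in enumerate(order))

# Example 1: P maximized, with priority over Q minimized (P @ Q).
order = [("P", 1, "up"), ("Q", 2, "down")]
th1 = {"P": lambda x: x == 0, "Q": lambda x, y: y == x}
th2 = {"P": lambda x: True,   "Q": lambda x, y: y >= 0}
th3 = {"P": lambda x: x < 0,  "Q": lambda x, y: False}
```

On the substitutions of Example 1, `precedes(th2, th1, order)` and `precedes(th2, th3, order)` hold, matching θ2 ≺(ρ,@) θ1 and θ2 ≺(ρ,@) θ3.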

4 Applications of Refinement Type Optimization

In this section, we present applications of refinement type optimization to the problems of proving safety (in Section 4.1) and termination (in Section 4.3), and disproving safety (in Section 4.4) and termination (in Section 4.2) of programs in the language L. In particular, we discuss precondition inference, namely, inference of a most-general characterization of inputs for which a given program satisfies (or violates) a given safety (or termination) property.

4.1 Proving Safety

We explain how to formalize, as a type optimization problem, a problem of inferring a maximally-weak precondition under which a given program satisfies a given postcondition. For example, let us consider the following terminating version of sum.

let rec sum’ x = if x <= 0 then 0 else x + sum’ (x-1)

In our framework, a problem to infer a maximally-weak precondition on the argument x for a postcondition x = sum' x is expressed as a type optimization problem to infer sum''s refinement type of the form (x : {x | P(x)}) → {y | x = y} under an optimization constraint maximize(P). Our type optimization method described in Sections 5.2 and 6 infers the following type.

(x : {x | 0 ≤ x ≤ 1}) → {y | x = y}

This type says that the postcondition holds if the actual argument x is 0 or 1.
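The inferred precondition can be sanity-checked by brute force over a finite test range (the range is our own choice): x = sum' x holds exactly for the arguments 0 and 1.

```python
def sum_prime(x):
    # the terminating version sum' from the text
    return 0 if x <= 0 else x + sum_prime(x - 1)

# arguments in the test range for which the postcondition x = sum' x holds
satisfying = [x for x in range(-20, 21) if sum_prime(x) == x]
```

Over this range, `satisfying` is exactly `[0, 1]`, agreeing with the inferred precondition 0 ≤ x ≤ 1.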

Example 2 (Higher-Order Function). For an example of a higher-order function, consider the following.

let rec repeat f n e = if n <= 0 then e else repeat f (n-1) (f e)

By inferring repeat's refinement type of the form

(f : (x : {x | P1(x)}) → {y | P2(x, y)}) → (n : int) → (e : {e | P3(n, e)}) → {r | r ≥ 0}

under optimization constraints ρ(P1) = ↓, ρ(P2) = ρ(P3) = ↑, and P3 @ P2 @ P1, our type optimization method obtains

(f :(x : {x | x ≥ 0}) → {y | y ≥ 0}) → (n : int) → (e : {e | e ≥ 0}) → {r | r ≥ 0}

Thus, our type optimization method can infer maximally-weak refinement types for the function arguments of a given higher-order function that are sufficient for it to satisfy a given postcondition. ut
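The result of Example 2 can be checked on finite ranges (the ranges and the choice of f are our own): when f maps non-negatives to non-negatives and e ≥ 0, the result of repeat f n e is non-negative for every n.

```python
def repeat(f, n, e):
    # direct transcription of the OCaml repeat from Example 2
    return e if n <= 0 else repeat(f, n - 1, f(e))

# f = (+1) maps non-negatives to non-negatives, matching the inferred
# argument type; check the inferred postcondition r >= 0 over the range
all_nonneg = all(repeat(lambda x: x + 1, n, e) >= 0
                 for n in range(-3, 12) for e in range(0, 8))
```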

4.2 Disproving Termination

In a similar manner to Section 4.1, we can apply type optimization to the problem of inferring a maximally-weak precondition for a given program to violate the termination property. For example, consider the function sum in Section 1. For disproving termination of sum, we infer sum's refinement type of the form (x : {x | P(x)}) → {y | ⊥} under an optimization constraint maximize(P). Our type optimization method infers the following type.

(x : {x | x < 0}) → {y | ⊥}

The type expresses that sum returns no value (i.e., sum is non-terminating) if called with an argument x that satisfies x < 0.
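Non-termination cannot be observed directly, so the following sketch runs sum with an explicit fuel bound (the bound and the encoding are our own): for x < 0 the fuel is always exhausted, while x ≥ 0 yields a value.

```python
def sum_fuel(x, fuel):
    """sum from Section 1, instrumented with a step budget."""
    if fuel == 0:
        return None                  # fuel exhausted: no value produced
    if x == 0:
        return 0
    r = sum_fuel(x - 1, fuel - 1)
    return None if r is None else x + r
```

With any fixed fuel, negative arguments never produce a value, matching the inferred type (x : {x | x < 0}) → {y | ⊥}.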

Example 3 (Non-Deterministic Function). For an example of a non-deterministic function, let us consider a problem of disproving termination of the following.

let rec f x = let n = read_int () in if n < 0 then x else f x

Here, read_int () is a function to get an integer value from the user and is modeled as ∗∃ in our language L. Note that the termination of f does not depend on the argument x but on the user inputs n. Our type optimization method successfully disproves termination of f by inferring a refinement type (x : int) → {y | ⊥} for f and {n | n ≥ 0} for the user inputs n. These types mean that f never terminates if the user always inputs some non-negative integer. □

4.3 Proving Termination

Refinement type optimization can also be applied to bounds analysis for inferring upper bounds of the number of recursive calls. Our bounds analysis for functional programs is inspired by a program transformation approach to bounds analysis for imperative programs [6, 7]. Let us consider sum in Section 1. By inserting additional parameters i and c to the definition of sum, we obtain

let rec sum_t x i c = if x = 0 then 0 else x + sum_t (x-1) i (c+1)

Here, i and c respectively represent the initial value of the argument x and the number of recursive calls so far. For proving termination of sum, we infer sum_t's refinement type of the form

(x : {x | P(x)}) → (i : int) → (c : {c | Inv(x, i, c)}) → int

under optimization constraints maximize(P), minimize(Bnd), P @ Bnd, and additional constraints on the predicate variables P, Bnd, Inv:

∀x, i, c. (Inv(x, i, c) ⇐ c = 0 ∧ i = x)   (4)
∀x, i, c. (Bnd(i, c) ⇐ P(x) ∧ Inv(x, i, c))   (5)

Here, Bnd(i, c) is intended to represent a bound on the number c of recursive calls of sum with respect to the initial value i of the argument x. We therefore assume that Bnd(i, c) is of the form 0 ≤ c ≤ k0 + k1 · i, where k0, k1 represent unknown coefficients to be inferred. The constraint (4) is necessary to express the meaning of the inserted parameters i and c. The constraint (5) is also necessary to ensure that the bound Bnd(i, c) is implied by a precondition P(x) and an invariant Inv(x, i, c) of sum. Our type optimization method then infers

(x : {x | x ≥ 0}) → (i : int) → (c : {c | x ≤ i ∧ i = x + c}) → int

and Bnd(i, c) ≡ 0 ≤ c ≤ i. Thus, we can conclude that sum is terminating for any input x ≥ 0 because the number c of recursive calls is bounded from above by the initial value i of the argument x. Interestingly, we can infer a precondition for minimizing the number of recursive calls of sum by replacing the priority constraint P @ Bnd with Bnd @ P, assuming that Bnd(i, c) is of the form 0 ≤ c ≤ k0 for some unknown constant k0, and adding an additional constraint ∃x.P(x) (to avoid the meaningless solution P(x) ≡ ⊥). In fact, our type optimization method obtains

(x : {x | x = 0}) → (i : int) → (c : {c | c = 0}) → int

and Bnd(i, c) ≡ c = 0. Therefore, we can conclude that the minimum number of recursive calls is 0, attained when the actual argument x is 0. We expect that our bounds analysis for functional programs can further be extended to infer non-linear upper bounds by adopting ideas from an elaborate transformation for bounds analysis of imperative programs [6].
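The inferred bound can be exercised directly with an instrumented sketch of sum_t that asserts Bnd(i, c) ≡ 0 ≤ c ≤ i at every call (the test range is our choice).

```python
def sum_t(x, i, c):
    """sum_t from the text; i is the initial argument, c counts calls."""
    assert 0 <= c <= i, "bound Bnd(i, c) violated"
    return 0 if x == 0 else x + sum_t(x - 1, i, c + 1)

# run sum_t on x >= 0 with i = x and c = 0, per constraint (4)
totals = [sum_t(x, x, 0) for x in range(0, 25)]
```

No assertion fires for any x ≥ 0 in the range, illustrating that the recursion depth is indeed bounded by the initial value i.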

4.4 Disproving Safety

We can use the same technique as in Section 4.3 to infer a maximally-weak precondition for a given program to violate a given postcondition. For example, let us consider again the function sum. A problem to infer a maximally-weak precondition on the argument x for violating a postcondition sum x ≥ 2 can be reduced to a problem to infer sum_t's refinement type of the form

(x : {x | P(x)}) → (i : int) → (c : {c | Inv(x, i, c)}) → {y | ¬(y ≥ 2)}

under the same constraints as for the bounds analysis in Section 4.3. The refinement type optimization method then obtains

(x : {x | 0 ≤ x ≤ 1}) → (i : int) → (c : {c | 0 ≤ x ∧ i = x + c}) → {y | ¬(y ≥ 2)}

and Bnd(i, c) ≡ 0 ≤ c ≤ i. This result says that if the actual argument x is 0 or 1, then sum terminates and returns some integer y that violates y ≥ 2. In other words, x = 0, 1 are counterexamples to the postcondition sum x ≥ 2. We can instead find a minimal-length counterexample path³ violating the postcondition sum x ≥ 2 by replacing the priority constraint P @ Bnd with Bnd @ P, assuming that Bnd(i, c) is of the form 0 ≤ c ≤ k0 for some unknown constant k0, and adding an additional constraint ∃x.P(x). Our type optimization method then infers

(x : {x | x = 0}) → (i : int) → (c : {c | 0 ≤ x ∧ i = x + c}) → {y | ¬(y ≥ 2)}

and Bnd(i, c) ≡ c = 0. From the result, we can conclude that a minimal-length counterexample path is obtained when the actual argument x is 0.
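The disproving-safety result can be checked directly over a finite range of terminating inputs (the range is ours): among x ≥ 0, the postcondition sum x ≥ 2 is violated exactly for x = 0 and x = 1.

```python
def sum_(x):
    # sum from Section 1; terminates for x >= 0
    return 0 if x == 0 else x + sum_(x - 1)

# terminating inputs in the range that violate the postcondition sum x >= 2
violations = [x for x in range(0, 20) if not (sum_(x) >= 2)]
```

`violations` equals `[0, 1]`, agreeing with the inferred precondition 0 ≤ x ≤ 1, and x = 0 is the counterexample with the fewest recursive calls.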

5 Horn Constraint Optimization and Reduction from Refinement Type Optimization

We reduce refinement type optimization problems to constraint optimization problems subject to existentially quantified Horn clauses [1, 11, 19]. We first formalize Horn constraint optimization problems (in Section 5.1) and then explain the reduction (in Section 5.2).

5.1 Horn Constraint Optimization Problems

Existentially-Quantified Horn Clause Constraint Sets (∃HCCSs) over QFLIA are defined as follows.

(∃HCCSs) H ::= {hc1, . . . , hcm}
(Horn clauses) hc ::= h ⇐ b
(heads) h ::= P(t̃) | φ | ∃x̃. (P(t̃) ∧ φ)
(bodies) b ::= P1(t̃1) ∧ . . . ∧ Pm(t̃m) ∧ φ

We write pvs(H) for the set of predicate variables that occur in H. A predicate substitution θ for an ∃HCCS H is a map from each P ∈ pvs(H) to a closed predicate λx1, . . . , xar(P).φ. We write ΘH for the set of predicate substitutions for H. We call a substitution θ a solution of H if |= θhc for each hc ∈ H. For a subset Θ ⊆ ΘH, we call a substitution θ ∈ Θ a Θ-restricted solution if θ is a solution of H. Our constraint optimization method described in Section 6 is designed to find a Θ-restricted solution for some Θ consisting of

substitutions that map each predicate variable to a predicate with a bounded number of conjunctions and disjunctions. In particular, we often use

Θatom = { P ↦ λx1, . . . , xar(P). n0 + Σ_{i=1}^{ar(P)} ni · xi ≥ 0 | P ∈ pvs(H) }

consisting of atomic predicate substitutions.

³ Here, minimality is with respect to the number of recursive calls within the path.

Example 4. Recall the function sum and the predicate substitutions θ1, θ2, θ3 in Example 1. Our method reduces a type optimization problem for sum to a constraint optimization problem for the following HCCS Hsum (the explanation of the reduction is deferred to Section 5.2).

{ Q(x, 0) ⇐ P(x) ∧ x = 0,
  P(x − 1) ⇐ P(x) ∧ x ≠ 0,
  Q(x, x + y) ⇐ P(x) ∧ Q(x − 1, y) ∧ x ≠ 0 }

Here, θ1 is a solution of Hsum, and θ2 and θ3 are Θatom-restricted solutions of Hsum. If we fix Q(x, y) ≡ ⊥ (i.e., infer sum's type of the form (x : {x | P(x)}) → {y | ⊥}) for disproving termination of sum as in Section 4.2, we obtain the following HCCS H⊥sum.

{ ⊥ ⇐ P(x) ∧ x = 0,  P(x − 1) ⇐ P(x) ∧ x ≠ 0 }

H⊥sum has, for example, the Θatom-restricted solutions {P ↦ λx. x < 0} and {P ↦ λx. x < −100}. □

We now define Horn constraint optimization problems for ∃HCCSs.

Definition 4. Let H be an ∃HCCS and ≺ be a strict partial order on predicate substitutions. A solution θ of H is called Pareto optimal with respect to ≺ if there is no solution θ′ of H such that θ′ ≺ θ. A Horn constraint optimization problem (H, ≺) is a problem to find a Pareto optimal solution θ with respect to ≺. A Θ-restricted Horn constraint optimization problem is a Horn constraint optimization problem with the notion of solutions replaced by Θ-restricted solutions.
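Whether a substitution solves the clauses of Hsum from Example 4 can be sketched by brute force, with QFLIA validity replaced by enumeration over a bounded domain (the domain bound is our own illustrative assumption).

```python
DOM = range(-30, 31)  # finite stand-in for the integers (our assumption)

def solves_h_sum(P, Q):
    """Check every clause of H_sum over DOM."""
    return all(
        (not (P(x) and x == 0) or Q(x, 0))                 # Q(x,0) <= P(x) /\ x = 0
        and (not (P(x) and x != 0) or P(x - 1))            # P(x-1) <= P(x) /\ x <> 0
        and all(not (P(x) and Q(x - 1, y) and x != 0) or Q(x, x + y)
                for y in DOM)                              # third clause
        for x in DOM)

# theta1 and theta3 from Example 1
theta1 = (lambda x: x == 0, lambda x, y: y == x)
theta3 = (lambda x: x < 0,  lambda x, y: False)
```

Both substitutions pass every clause on this domain, while, e.g., {P ↦ λx. ⊤, Q ↦ λx, y. ⊥} fails the first clause at x = 0.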

Example 5. Recall Hsum and its solutions θ1, θ2, θ3 in Example 1. Let us consider a Horn constraint optimization problem (Hsum, ≺(ρ,@)) where ρ(P) = ↑, ρ(Q) = ↓, and Q @ P. We have θ3 ≺(ρ,@) θ1 and θ3 ≺(ρ,@) θ2. In fact, θ3 is a Pareto optimal solution of Hsum with respect to ≺(ρ,@). □

In general, an ∃HCCS H may not have a Pareto optimal solution with respect to ≺(ρ,@) even though H has a solution. For example, consider a Horn constraint optimization problem (Hsum, ≺(ρ,@)) where ρ(P) = ↑, ρ(Q) = ↓, and P @ Q. Because the semantically optimal solution Q(x, y) ≡ y = x(x+1)/2 is not expressible in QFLIA, it must be approximated, for example, as Q(x, y) ≡ y ≥ 0 ∧ y ≥ x ∧ y ≥ 2x − 1. The approximated solution, however, is not Pareto optimal because we can always get a better approximation like Q(x, y) ≡ y ≥ 0 ∧ y ≥ x ∧ y ≥ 2x − 1 ∧ y ≥ 3x − 3 if we use more conjunctions. We can, however, show that an ∃HCCS has a Θatom-restricted Pareto optimal solution with respect to ≺(ρ,@) if it has any Θatom-restricted solution. Interested readers are referred to the extended version [8]. For the above example, θ2 in Example 1 is a Θatom-restricted Pareto optimal solution.

1: procedure Optimize(H, ≺)
2:   match Solve(H) with
3:     Unknown → return Unknown
4:   | NoSol → return NoSol
5:   | Sol(θ0) →
6:     θ := θ0;
7:     while true do
8:       let H′ = Improve≺(θ, H) in
9:       match Solve(H′) with
10:        Unknown → return Sol(θ)
11:      | NoSol → return OptSol(θ)
12:      | Sol(θ′) → θ := θ′
13:    end

Fig. 3. Pseudo-code of the constraint optimization method for ∃HCCSs

5.2 Reduction from Refinement Type Optimization

Our method reduces a refinement type optimization problem to a Horn constraint optimization problem in a similar manner to the previous refinement type inference method [18]. Given a program D, our method first prepares a refinement type template ΓD of D as well as, for each expression of the form let x = ∗∃ in e, a refinement type template {x | P(ỹ, x)} of x, where P is a fresh predicate variable and ỹ is the sequence of all the integer variables in the scope. Our method then generates an ∃HCCS by type-checking D against ΓD and collecting the proof obligations of the forms ⟦Γ⟧ ∧ φ1 ⇒ φ2 and ⟦Γ⟧ ⇒ ∃ν.φ respectively from each application of the rules ISub and Rand∃. We write Gen(D, ΓD) to denote the ∃HCCS thus generated from D and ΓD. We can show the soundness of our reduction in the same way as in [18].

Theorem 3 (Soundness of Reduction). Let (D, ≺) be a refinement type optimization problem and ΓD be a refinement type template of D. If θ is a Pareto optimal solution of Gen(D, ΓD) with respect to ≺, then θ is a solution of (D, ≺).

6 Horn Constraint Optimization Method

In this section, we describe our Horn constraint optimization method for ∃HCCSs. The method repeatedly improves a current solution until convergence. The pseudo-code of the method is shown in Figure 3. The procedure Optimize for Horn constraint optimization takes a (Θ-restricted) ∃HCCS optimization problem (H, ≺) and returns any of the following: Unknown (which means the existence of a solution is unknown), NoSol (which means no solution exists), Sol(θ) (which means θ is a possibly non-Pareto optimal solution), or OptSol(θ) (which means θ is a Pareto optimal solution). The sub-procedure Solve for Horn constraint solving takes an ∃HCCS H and returns any of Unknown, NoSol, or Sol(θ). The detailed description of Solve is deferred to Section 6.1. Optimize first calls Solve to find an initial solution θ0 of H (line 2). Optimize returns Unknown if Solve returns Unknown (line 3) and NoSol if Solve returns NoSol (line 4). Otherwise (line 5), Optimize repeatedly improves a current solution θ starting from θ0 until convergence (lines 6–13). To improve θ, we call a sub-procedure Improve≺(θ, H) that generates an ∃HCCS H′ from H by adding constraints requiring that any solution θ′ of H′ satisfy θ′ ≺ θ (line 8). Optimize then calls Solve to find a solution of H′. If Solve returns Unknown, Optimize returns Sol(θ) as a (possibly non-Pareto optimal) solution (line 10). If Solve returns NoSol, it is the case that no improvement is possible, and hence the current solution θ is Pareto optimal. Thus, Optimize returns OptSol(θ) (line 11). Otherwise, we obtain an improved solution θ′ ≺ θ (line 12). Optimize then updates the current solution θ with θ′ and repeats the improvement process.

Example 6. Recall H⊥sum in Example 4 and consider an optimization problem (H⊥sum, ≺(ρ,@)) where ρ(P) = ↑. We below explain how Optimize(H⊥sum, ≺(ρ,@)) proceeds. First, Optimize calls Solve and obtains an initial solution of H⊥sum, e.g., θ0 = {P ↦ λx. ⊥}.
Optimize then calls Improve≺(ρ,@)(θ0, H⊥sum) and obtains an ∃HCCS H′ = H⊥sum ∪ {P(x) ⇐ ⊥, ∃x. ¬(P(x) ⇒ ⊥)} that requires any solution θ of H′ to satisfy θ(P) ≺ρ(P) θ0(P) = λx. ⊥. Optimize then calls Solve(H′) and obtains an improved solution, e.g., θ1 = {P ↦ λx. x < 0}. In the next iteration, Optimize returns θ1 as a Pareto optimal solution because Improve≺(θ1, H⊥sum) has no solution. □

We now discuss properties of the procedure Optimize under the assumption of the correctness of the sub-procedure Solve (i.e., θ is a Θ-restricted solution of H if Solve(H) returns Sol(θ), and H has no Θ-restricted solution if Solve(H) returns NoSol). The following theorem states the correctness of Optimize.

Theorem 4 (Correctness of the Procedure Optimize). Let (H, ≺) be a Θ-restricted Horn constraint optimization problem. If Optimize(H, ≺) returns OptSol(θ) (resp. Sol(θ)), θ is a Pareto optimal (resp. possibly non-Pareto optimal) Θ-restricted solution of H with respect to ≺.

The following theorem states the termination of Optimize for Θatom-restricted Horn constraint optimization problems.

Theorem 5 (Termination of the Procedure Optimize). Let (H, ≺_(⊑,ρ)) be a Θatom-restricted Horn constraint optimization problem. Suppose that (a) for any P such that ρ(P) = ↓, P is not existentially quantified in H, and (b) if Solve returns θ, then for any P, θ^P defined as θ{P ↦ λx̃.φ} (where φ ≡ ⊤ if ρ(P) = ↑ and φ ≡ ⊥ if ρ(P) = ↓) is either not a solution or satisfies θ^P ⊀ θ. It then follows that Optimize(H, ≺_(⊑,ρ)) always terminates.

6.1 Sub-Procedure Solve for Solving ∃HCCSs

The pseudo-code of the sub-procedure Solve for solving ∃HCCSs is presented in Figure 4. Here, Solve uses existing template-based invariant generation techniques based on Farkas' lemma [3, 7] and ∃HCCS solving techniques based on

1: procedure Solve(H)
2:   let θ = { P ↦ λx̃. c0 + Σ_{i=1}^{ar(P)} ci · xi ≥ 0 | P ∈ pvs(H) } in
3:   let ∃c̃.∀x̃.∃ỹ.φ = ∃c̃. ⋀_{hc∈H} ∀fvs(hc). θ(hc) in
4:   let ∃c̃, z̃.∀x̃.φ′ = apply Skolemization to ∃c̃.∀x̃.∃ỹ.φ in
5:   let ∃c̃, z̃, w̃.φ″ = apply Farkas' lemma to ∃c̃, z̃.∀x̃.φ′ in
6:   match SMT(φ″) with
7:     Sat(σ) → return Sol(σ(θ))
8:   | Unknown
9:   | Unsat → match SMT(∀x̃.∃ỹ.φ) with
10:      Unsat → return NoSol
11:    | Unknown → return Unknown
12:    | Sat(σ) → return Sol(σ(θ))

Fig. 4. Pseudo-code of the constraint solving method for ∃HCCSs based on template-based invariant generation

Skolemization [1, 11, 19]. Solve first generates a template substitution θ that maps each predicate variable P in pvs(H) to a template atomic predicate with unknown coefficients c0, …, c_ar(P) (line 2).⁴ Solve then applies θ to H and obtains a verification condition of the form ∃c̃.∀x̃.∃ỹ.φ without predicate variables (line 3). Solve applies Skolemization [1, 11, 19] to the condition and obtains a simplified condition of the form ∃c̃, z̃.∀x̃.φ′ (line 4). Solve further applies Farkas' lemma [3, 7] to eliminate the universal quantifiers and obtains a condition of the form ∃c̃, z̃, w̃.φ″ (line 5). Solve then uses an SMT solver that supports the quantifier-free theory of non-linear integer arithmetic (QFNIA) to find a satisfying assignment of φ″ (line 6). If such an assignment σ is found, Solve returns σ(θ) as a solution (line 7). Otherwise (no assignment is found),⁵ Solve uses an SMT solver that supports the quantified theory of non-linear integer arithmetic (NIA) to check the absence of a Θatom-restricted solution by checking the unsatisfiability of ∀x̃.∃ỹ.φ (line 9). Solve returns NoSol if Unsat is returned (line 10) and Unknown if Unknown is returned (line 11). Otherwise (a satisfying assignment σ is found), Solve returns σ(θ) as a solution (line 12).

Example 7. We explain how Solve proceeds for H′ in Example 6. Solve first generates a template substitution θ = {P ↦ λx. c0 + c1 · x ≥ 0} with unknown coefficients c0, c1 and applies θ to H′. As a result, we get a verification condition

∃c0, c1. ( ∀x. ( (⊥ ⇐ c0 + c1 · x ≥ 0 ∧ x = 0) ∧
                 (c0 + c1 · (x − 1) ≥ 0 ⇐ c0 + c1 · x ≥ 0 ∧ x ≠ 0) ) ∧
           ∃x. c0 + c1 · x ≥ 0 )
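To make the verification condition above concrete, the following Python sketch searches for coefficients (c0, c1) by brute force over a small range, checking the two Horn clauses for all x in a bounded interval and the existential conjunct for some x in it. This finite-domain search only illustrates what the SMT query asks; it is not how Solve works (Solve delegates to an SMT solver after Skolemization and Farkas' lemma), and bounding x makes the universal check approximate.

```python
def p(c0, c1, x):
    """Template atom P(x) = (c0 + c1*x >= 0)."""
    return c0 + c1 * x >= 0

def satisfies(c0, c1, xs):
    """Check the verification condition of Example 7 over the finite domain xs."""
    clause1 = all(not (p(c0, c1, x) and x == 0) for x in xs)      # bot <= P(x) and x = 0
    clause2 = all(p(c0, c1, x - 1) or not p(c0, c1, x) or x == 0  # P(x-1) <= P(x) and x <> 0
                  for x in xs)
    witness = any(p(c0, c1, x) for x in xs)                       # exists x. P(x)
    return clause1 and clause2 and witness

xs = range(-20, 21)
solutions = [(c0, c1) for c0 in range(-3, 4) for c1 in range(-3, 4)
             if satisfies(c0, c1, xs)]
```

The assignment c0 = c1 = −1 found later in Example 7, i.e., P = λx. −1 − x ≥ 0, is among the solutions; note also that every solution must have c0 < 0, since the first clause at x = 0 forbids P(0).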

⁴ The presented code here is thus specialized to solve Θatom-restricted Horn constraint optimization problems. To solve Θ-restricted optimization problems for other Θ, we instead need to generate templates that conform to the shape of the substitutions in Θ. Our implementation in Section 7 iteratively increases the template size.
⁵ Note that even though no assignment is found, H may have a Θatom-restricted solution, because Farkas' lemma is not complete for QFLIA formulas [3, 7] and the Skolemization of ∃c̃.∀x̃.∃ỹ.φ into ∃c̃, z̃.∀x̃.φ′ here assumes that ỹ are expressed as linear expressions over x̃ [1, 11, 19].

By applying Farkas' lemma, we obtain

∃c0, c1. ( ∃w1, w2, w3 ≥ 0. (c0 · w1 ≤ −1 ∧ c1 · w1 + w2 − w3 = 0) ∧
           ∃w4, w5, w6 ≥ 0. ((−1 − c0 + c1) · w4 + c0 · w5 − w6 ≤ −1 ∧
                             c1 · (−w4 + w5) + w6 = 0) ∧
           ∃w7, w8, w9 ≥ 0. ((−1 − c0 + c1) · w7 + c0 · w8 − w9 ≤ −1 ∧
                             c1 · (−w7 + w8) − w9 = 0) ∧
           ∃x. c0 + c1 · x ≥ 0 )

By using an SMT solver, we obtain, for example, a satisfying assignment σ such that σ(c0) = σ(c1) = −1, σ(w1) = σ(w2) = σ(w5) = σ(w6) = σ(w7) = σ(w9) = 1, and σ(w3) = σ(w4) = σ(w8) = 0. Thus, Solve returns σ(θ) = {P ↦ λx. −1 − x ≥ 0} ≡ θ1 in Example 6. ⊓⊔
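As a sanity check, the assignment σ above can be substituted back into the Farkas-transformed condition and verified mechanically. The Python sketch below is for illustration only: it simply re-evaluates each conjunct with σ's values, using x = −1 as a witness for the final existential.

```python
# sigma from Example 7
c0, c1 = -1, -1
w1, w2, w3 = 1, 1, 0
w4, w5, w6 = 0, 1, 1
w7, w8, w9 = 1, 0, 1

checks = [
    # multipliers must be non-negative
    all(w >= 0 for w in (w1, w2, w3, w4, w5, w6, w7, w8, w9)),
    # first conjunct
    c0 * w1 <= -1 and c1 * w1 + w2 - w3 == 0,
    # second conjunct
    (-1 - c0 + c1) * w4 + c0 * w5 - w6 <= -1 and c1 * (-w4 + w5) + w6 == 0,
    # third conjunct
    (-1 - c0 + c1) * w7 + c0 * w8 - w9 <= -1 and c1 * (-w7 + w8) - w9 == 0,
    # exists x. c0 + c1 * x >= 0, with witness x = -1
    c0 + c1 * (-1) >= 0,
]
assert all(checks)
```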

The following lemma states the correctness of the sub-procedure Solve.

Lemma 1 (Correctness of the Sub-Procedure Solve). Let H be an ∃HCCS. θ is a Θatom-restricted solution of H if Solve(H) returns Sol(θ), and H has no Θatom-restricted solution if Solve(H) returns NoSol.

7 Implementation and Experiments

We have implemented a prototype refinement type optimization tool for OCaml based on the method presented in this paper. Our tool uses Z3 (https://z3.codeplex.com/) as its underlying SMT solver. We conducted preliminary experiments on a machine with an Intel Core i7-3770 (3.40 GHz) and 16 GB of RAM. The experimental results are summarized in Tables 1 and 2.

Table 1 shows the results on an existing first-order non-termination verification benchmark set used in [2, 11, 13]. Because the original benchmark set was written in the input language of T2 (http://mmjb.github.io/T2/), we used an OCaml translation of the benchmark set provided by [11]. The results for CppInv, T2-TACAS, and TNT are according to Larraz et al. [13]. The result for MoCHi is according to [11]. Our tool was able to successfully disprove termination of 41 programs (out of 81) within the time limit of 100 seconds. Our prototype tool was not the best but performed reasonably well compared to the state-of-the-art tools dedicated to non-termination verification.

Table 1. The results of a non-termination verification benchmark set used in [2, 11, 13].

                 Verified  TimeOut  Other
  Our tool           41       27      13
  CppInv [13]        70        6       5
  T2-TACAS [2]       51        0      30
  MoCHi [11]         48       26       7
  TNT [4]            19        3      59

Table 2 shows the results of maximally-weak precondition inference for proving safety (P/S) and termination (P/T), and for disproving safety (D/S) and termination (D/T). We used non-termination (resp. termination) verification benchmarks for higher-order functional programs from [11] (resp. [12]). The column "#I" shows the number of optimization iterations, and the column "Time" shows the running time in seconds. The column "Op." shows whether Pareto optimal refinement types are inferred: ✓ (resp. ?) indicates that the Pareto optimality of the inferred types was checked automatically by our tool (resp. manually by us).

Table 2. The results of maximally-weak precondition inference.

  Program             App.            #I   Time     Op.
  fixpoint [11]       D/T              1    0.300    ✓
  fib_CPS [11]        D/T              1    7.083    ?
  indirect_e [11]     D/T              1    0.344    ✓
  indirectHO_e [11]   D/T              1    0.312    ✓
  loopHO [11]         D/T              1   16.094    ?
  foldr [11]          D/T              3    8.048    ✓
  sum_geq3            P/S              4    2.654    ✓
  append              P/S              5    3.608    ✓
  fib                 D/T              1    0.039    ✓
  append [12]         P/T             10   10.664    ?
  zip [12]            P/T              3   12.236    ✓
  repeat (Sec. 4.1)   P/S              4    0.948    ?
  sum0 (Sec. 4.1)     P/S              2    0.036    ✓
  sum (Sec. 4.2)      D/T              1    0.174    ✓
  sum_t (Sec. 4.3)    P/T (P ⊑ Bnd)    5   12.028    ?
  sum_t (Sec. 4.3)    P/T (Bnd ⊑ P)    2   15.504    ?
  sum_t (Sec. 4.4)    D/S (P ⊑ Bnd)    4   12.020    ?
  sum_t (Sec. 4.4)    D/S (Bnd ⊑ P)    1   20.780    ?

The results show that our prototype tool is reasonably efficient for proving safety (P/S) and disproving termination (D/T) of higher-order functions. Further engineering work, however, is required to make the tool more efficient for proving termination (P/T) and disproving safety (D/S).

8 Related Work

Type inference problems for refinement type systems [5, 20] have been intensively studied [9, 10, 15–19]. To our knowledge, this paper is the first to address type optimization problems, which generalize ordinary type inference problems. As we saw in Sections 4 and 7, this generalization enables significantly wider applications in the verification of higher-order functional programs.

For imperative programs, Gulwani et al. have proposed a template-based method to infer maximally-weak pre and maximally-strong post conditions [7]. Their method, however, cannot directly handle higher-order functional programs, (angelic and demonic) non-determinism in programs, or prioritized multi-objective optimization, all of which are handled by our new method.

Internally, our method reduces a type optimization problem to a constraint optimization problem over an existentially quantified Horn clause constraint set (∃HCCS). Constraint solving for ∃HCCSs has been studied in recent work [1, 11, 19], which, however, does not address constraint optimization problems.
The goal of our constraint optimization is to maximize/minimize the set of models of each predicate variable occurring in the given ∃HCCS. Thus, our constraint optimization problems differ from Max-SMT [14] problems, whose goal is to minimize the sum of the penalties of unsatisfied clauses.

9 Conclusion

We have generalized refinement type inference problems to type optimization problems, and presented interesting applications enabled by type optimization, such as inferring the most-general characterization of inputs for which a given functional program satisfies (or violates) a given safety (or termination) property. We have also proposed a refinement type optimization method based on template-based invariant generation. We have implemented our method and confirmed by experiments that the proposed method is promising for these applications.

References

1. T. A. Beyene, C. Popeea, and A. Rybalchenko. Solving existentially quantified Horn clauses. In CAV '13, volume 8044 of LNCS, pages 869–882. Springer, 2013.
2. H. Y. Chen, B. Cook, C. Fuhs, K. Nimkar, and P. W. O'Hearn. Proving nontermination via safety. In TACAS '14, volume 8413 of LNCS, pages 156–171. Springer, 2014.
3. M. A. Colón, S. Sankaranarayanan, and H. B. Sipma. Linear invariant generation using non-linear constraint solving. In CAV '03, volume 2725 of LNCS, pages 420–432. Springer, 2003.
4. F. Emmes, T. Enger, and J. Giesl. Proving non-looping non-termination automatically. In IJCAR '12, volume 7364 of LNCS, pages 225–240. Springer, 2012.
5. T. Freeman and F. Pfenning. Refinement types for ML. In PLDI '91, pages 268–277. ACM, 1991.
6. S. Gulwani, K. K. Mehra, and T. Chilimbi. SPEED: Precise and efficient static estimation of program computational complexity. In POPL '09, pages 127–139. ACM, 2009.
7. S. Gulwani, S. Srivastava, and R. Venkatesan. Program analysis as constraint solving. In PLDI '08, pages 281–292. ACM, 2008.
8. K. Hashimoto and H. Unno. Refinement type inference via Horn constraint optimization. An extended version, available from http://www.cs.tsukuba.ac.jp/~uhiro/, 2015.
9. R. Jhala, R. Majumdar, and A. Rybalchenko. HMC: verifying functional programs using abstract interpreters. In CAV '11, volume 6806 of LNCS, pages 470–485. Springer, 2011.
10. N. Kobayashi, R. Sato, and H. Unno. Predicate abstraction and CEGAR for higher-order model checking. In PLDI '11, pages 222–233. ACM, 2011.
11. T. Kuwahara, R. Sato, H. Unno, and N. Kobayashi. Predicate abstraction and CEGAR for disproving termination of higher-order functional programs. In CAV '15, LNCS. Springer, 2015.
12. T. Kuwahara, T. Terauchi, H. Unno, and N. Kobayashi. Automatic termination verification for higher-order functional programs. In ESOP '14, volume 8410 of LNCS, pages 392–411. Springer, 2014.
13. D. Larraz, K. Nimkar, A. Oliveras, E. Rodríguez-Carbonell, and A. Rubio. Proving non-termination using Max-SMT. In CAV '14, volume 8559 of LNCS, pages 779–796. Springer, 2014.
14. R. Nieuwenhuis and A. Oliveras. On SAT modulo theories and optimization problems. In SAT '06, volume 4121 of LNCS, pages 156–169. Springer, 2006.
15. P. Rondon, M. Kawaguchi, and R. Jhala. Liquid types. In PLDI '08, pages 159–169. ACM, 2008.
16. T. Terauchi. Dependent types from counterexamples. In POPL '10, pages 119–130. ACM, 2010.
17. H. Unno and N. Kobayashi. On-demand refinement of dependent types. In FLOPS '08, volume 4989 of LNCS, pages 81–96. Springer, 2008.
18. H. Unno and N. Kobayashi. Dependent type inference with interpolants. In PPDP '09, pages 277–288. ACM, 2009.
19. H. Unno, T. Terauchi, and N. Kobayashi. Automating relatively complete verification of higher-order functional programs. In POPL '13, pages 75–86. ACM, 2013.
20. H. Xi and F. Pfenning. Dependent types in practical programming. In POPL '99, pages 214–227. ACM, 1999.