EFFICIENT GROBNER¨ BASIS REDUCTIONS FOR FORMAL VERIFICATION OF GALOIS MULTIPLIERS

Jinpeng Lv and Priyank Kalla Florian Enescu Department of Electrical and Computer Eng. Department of and Statistics University of Utah, Salt Lake City, UT-84112 Georgia State University, Atlanta, GA 30302-4038

Abstract - Galois field arithmetic finds application in Problem Statement: We are given the following: many areas, such as cryptography, error correction codes, Given a finite field F k , i.e. given k (circuit input size), • 2 signal processing, etc. Multiplication lies at the core of most along with the corresponding irreducible Galois field computations. This paper addresses the problem of P(x). Let P(α)= 0, i.e. α be the root of P(x). formal verification of hardware implementations of (modulo) Given a word-level multiplier specification polynomial • multipliers over Galois fields of the F k , using a computer- F 2 Z = A B (mod P(x)), where A,B,Z 2k (k-bit vectors). algebra/algebraic-geometry based approach. The multiplier Given· a gate-level combinational∈ circuit C. F • circuit is modeled as a polynomial system in 2k [x1,x2, ,xd] The bit-level primary inputs of the circuit are and the verification problem is formulated as a membership··· a0,...,ak 1, b0,...,bk 1 , and z0,...,zk 1 are test in a corresponding (radical) . This requires the { − − } { − } the primary outputs; here ai,bi,gi F2,i = 0,...,k 1. computation of a Grobner¨ basis, which can be computationally ∈ k 1 − Therefore, A = a0 + a1α + + ak 1α − , B = b0 + intensive. To overcome this limitation, we analyze the circuit k 1 ··· − k 1 b1α + + bk 1α − and Z = z0 + z1α + + zk 1α − . topology and derive a term order to represent the . ··· − ··· − Our objective is to formally prove that A,B F k , the circuit Subsequently, using the theory of Grobner¨ bases over Galois 2 C correctly computes the multiplication∀Z = A∈ B (mod P(x)) fields, we prove that this term order renders the set of polyno- · over F k . Otherwise, we have to produce a counter-example mials itself a Grobner¨ basis of this ideal – thus significantly 2 that excites the bug in the design. improving verification. Using our approach, we can verify the correctness of, and detect bugs in, upto 163-bit circuits in We formulate and solve the verification problem using a computer-algebra and algebraic-geometry based approach. F 163 ; whereas contemporary approaches are infeasible. 2 In particular, we apply the results of Strong Nullstellensatz I. INTRODUCTION over Galois fields, along with Grobner¨ basis techniques, to Finite (Galois) field theory is extensively applied in Elliptic efficiently perform the equivalence verification between the Curve Cryptography (ECC), error correction codes, digital specification polynomial and the multiplier circuit implemen- signal processing, etc. Therefore, dedicated hardware imple- tation. Our approach and contributions can be outlined as mentations of Galois field arithmetic abound. Multiplication follows: lies at the core of most Galois field computations and is a high- We model the given circuit C as a polynomial system • complexity operation, particularly in cryptosystems where the f1,..., fs in F k [x1,x2,...,xd], and denote the generated { } 2 datapath size can be very large. For example, k = 163 for ideal as J = f1,..., fs . The specification Z = A B a Galois field recommended for ECC by the U.S. National (mod P(x)) ish also modeledi as a polynomial f : Z +A· B Institute for Standards and Technology (NIST). Therefore, F · in 2k [x1,x2,...,xd]. this has lead researchers to derive efficient and sophisticated Using the results of Strong Nullstellensatz over finite • hardware implementations of multiplier circuits over Galois fields [6], we show that the verification test can be for- Fields [1] [2] [3]. mulated as membership testing of f in the ideal (J +J0), The complicated and specialized nature of these arithmetic 2k 2k where J0 = x1 x1,...,xd xd . architectures requires custom design, raising the potential for For this idealh membership− test,− iti is required to compute errors and bugs in the implementation. Such errors not only • a Grobner¨ basis of (J + J0). Buchberger’s [7], cause unintended operation, but they also manifest them- employed for Grobner¨ basis computation, can be compu- selves as security vulnerabilities which can be maliciously tationally intensive, and its efficiency is highly susceptible exploited [4]. Arithmetic bugs are especially catastrophic for to the term orderings used to represent and manipulate the cryptosystems, as incorrect (buggy) hardware multiplication polynomials. To overcome this complexity, we draw in- can lead to full leakage of the secret key [5]. Therefore, it spirations from [8], and derive a term order by analyzing is of utmost importance to verify the correctness of hardware the circuit topology to represent the polynomials. implementations of finite field multipliers residing at the core The work of [8] shows that this term order makes the of such systems. This paper addresses formal verification of • set f1,..., fs a Grobner¨ basis of J. However, using F { } multiplier circuits over (binary) Galois fields of the type 2k the theoretical concepts of Grobner¨ bases over Galois using computer algebra techniques. fields, we further prove that this term order also makes 2k 2k the set f1,..., fs, x x1,...,x xd a Grobner¨ basis This work is sponsored in part by a grant from NSF #CCF-546859. { 1 − d − } 978-3-9810801-8-6/DATE12/ c 2012 EDAA of J + J0. This significantly enhances the capacity of our

verification framework, enabling verification of custom However, the result is yet to be reduced modulo the primitive implementations of 163-bit circuits – corresponding to polynomial P(x)= x4 + x3 + 1. This is shown below: the NIST recommended 163-bit elliptic curve. s3 s2 s1 s0 α4 α α3 Not only can we verify the correctness of multiplier s4 0 0 s4 s4 (mod P( )) = s4 ( + 1) • ⇐ · α5 α · α3 α circuits, but we can also detect the presence of bugs using s5 0 s5 s5 s5 (mod P( )) = s5 ( + + 1) ⇐ · α6 α · α3 α2 α our approach. Moreover, the time required to prove cor- s6 s6 s6 s6 s6 (mod P( )) = s6 ( + + + 1) z z z z ⇐ · · rectness or to detect bugs is comparable. Our experiments 3 2 1 0 The final result (output) of the circuit is: Z = z0 + z1α + show that none of the contemporary techniques based 2 3 z2α + z3α ; where z0 = s0 + s4 + s5 + s6; z1 = s1 + s5 + on BDDs, SAT, SMT-solvers etc. can prove correctness s6; z2 = s2 + s6; z3 = s3 + s4 + s5 + s6. beyond 16-bit circuits. The above multiplier design is called the Mastrovito multi- Our approach is generic enough to verify the implemen- plier [1]. In cryptosystems, multiplication is often performed tation of any Galois field arithmetic circuit against its given repeatedly – e.g., for exponentiation. For such applications, polynomial specification. However, for this paper, we concen- Montgomery multiplier architectures [2] [3] over Galois fields F trate only on multiplier circuits over Galois fields 2k . are employed for faster computation (at the expense of area). In our experiments, we verify custom implementations of both II. GALOIS FIELDS & MULTIPLICATION Mastrovito and Montgomery multipliers, where the circuits are given as gate-level (flattened) netlists. We briefly describe the relevant finite field concepts (details can be found in the textbook [9]) and modular multiplier III. RELATED PREVIOUS WORK design over such fields [1] [2] [3]. Contemporary graph-based canonical DAG representations A finite field, also called a Galois field, is a field with a of Boolean functions such as BDDs [10], OKFDDs [11], finite number of elements. The number of elements q of the BMDs [12] and MODDs [13], etc. are ill-suited for such finite field is a power of a prime – i.e. q = pk, where p modulo-multiplication applications, particularly over large fi- is prime integer, and k 1. Galois fields are denoted as Fq and nite fields. The work of [14] presents a DAG representation ≥ also GF(q = pk). We are interested in fields where p = 2 and for synthesis and verification of multi-output polynomials over k > 1; i.e. binary Galois extension fields F k , as these are often Z 2 finite integer rings 2k ,k > 1. Since their canonical reduction employed in elliptic curve cryptography implementations. rules employ the theory of polynomial functions over finite F F To construct 2k , we take the polynomial 2[x] and an integer rings [15], this approach is not directly applicable F F irreducible polynomial P(x) 2[x] of degree k, and construct over finite fields 2k . Similarly, the work of [16] is also only F F ∈ F F 3 2k as 2[x] (mod P(x)). For example, 8 = 2[x] (mod x + applicable for verification of integer-modulo-arithmetic over Z x + 1). All the field operations are performed modulo the 2k at word-level/RTL, and not over Galois field circuits. irreducible polynomial P(x) and the coefficients are reduced This verification problem is also very hard for SAT solvers, F modulo p = 2. Any element A 2k can be represented in due to the large circuit size, and the presence of AND- ∈ k 1 polynomial form as A = a0 + a1α + + ak 1α − , where XOR computations. Contemporary Satisfiability Modulo The- ··· − q ai F2 = 0,1 and P(α)= 0. For all elements A Fq,A = A, ory (SMT) solvers employ a mixture of theories for reasoning ∈ { q } ∈ and hence A A = 0, A Fq. – however, none of them employ polynomial equation solving − ∀ ∈ F Multiplier circuit design over 2k is illustrated below. over Galois fields (which is itself a hard problem). As shown F Example 2.1: Consider the field 24 . We take as inputs: A = in out experiments, BDDs, SAT and SMT solvers cannot prove 2 3 2 3 a0 +a1 α+a2 α +a3 α and B = b0 +b1 α+b2 α +b3 α , design correctness beyond 16-bit multipliers. along with· the· irreducible· polynomial P(x·)= x4 +· x3 + 1.· We The theorem-proving approach of [17] also verifies a Ga- have to perform the multiplication Z = A B (mod P(x)). The lois Field F k implementation against a given polynomial × 2 coefficients of A = a0,...,a3 ,B = b0,...,b3 are in F2 = specification. They devise a decision procedure based on 0,1 . Multiplication{ can be performed} { as shown} below: polynomial division, variable elimination, term re-writing, etc., { } a3 a2 a1 a0 and demonstrate a correctness proof of a sub-block of a Reed- b3 b2 b1 b0 × Solomon decoder. The work of [18] solves similar problems a3 b0 a2 b0 a1 b0 a0 b0 · · · · as those of [17]. However, they make use of OKFDDs [11] to a3 b1 a2 b1 a1 b1 a0 b1 · · · · canonically represent the circuit constraints. Moreover, instead a3 b2 a2 b2 a1 b2 a0 b2 · · · · F a3 b3 a2 b3 a1 b3 a0 b3 of verifying circuits over 2k directly, [18] verifies the circuits · · · · m n s6 s5 s4 s3 s2 s1 s0 over its equivalent composite field GF((2 ) ) representation, 2 3 4 The result Sum = s0 + s1 α + s2 α + s3 α + s4 α + s5 where a non-prime k = m n. Their approach has no benefit if k 5 6 · · · · · · α + s6 α , where, s0 = a0 b0, s1 = a0 b1 + a1 b0, s2 = is prime – say, when k = 163 for elliptic curves. Moreover, the · · · · F a0 b2 + a1 b1 + a2 b0, and so on. Here the multiply “ ” and size-explosion of FDDs limits their approach to 16-bit ( 216 ) add· “+” operations· · are performed modulo 2, so they can· be circuits, as shown in their experiments. implemented in a circuit using AND and XOR gates. Note The work of [19] verifies a given composite field GF((2m)n) that unlike integer multipliers, there are no carry-chains in the multiplier against its specification using a Grobner¨ basis ap- design, as the coefficients are always reduced modulo p = 2. proach based on the Weak Nullstellensatz over Fq – by proving that the “specification = implementation” (miter) is infeasible. correspondence between radical ideals and varieties over Fq. However, their approach6 has the limitation in that it requires Theorem 4.1: (Strong Nullstellensatz [22]) Let F be an the circuit decomposition hierarchy be made available to the arbitrary field and J be an ideal in F[x1,...,xd]. Let F denote F F verification engine. Moreover, when k in 2k is prime, this the algebraic closure of , and let VF(J) denote the variety of approach is inapplicable. Recent work of [20] formulates the J over F. Then I(VF(J)) = √J. verification problem in the similar style as [19]. They heuristi- The Nullstellensatz admits a special form over Galois fields. cally derive a variable order to represent the terms. We state the following results of Nullstellensatz in Galois Using their heuristic, they are able to speed-up the Grobner¨ fields, proofs of which can be found in [6]. basis computation. However, in the presence of bugs, their Lemma 4.1: [6] For any ideal J Fq[x1,...,xd], J + J0 = q q ⊆ heuristic is inefficient in that the Grobner¨ basis computation J + x x1,..., x xd is radical. In other words, √J + J0 = h 1 − d − i runs into memory explosion beyond 16-bit circuits. J + J0. The paper [8] addresses verification of finite precision Theorem 4.2: [6] For any Galois field Fq, let J F ⊆ integer datapath circuits using the concepts of Grobner¨ bases q[x1,...,xd] be an ideal. Let VFq (J) denote the variety of J Z F q q over the ring 2k . They model the circuit constraints by way of over q. Then, I(VFq (J))= J +J0 = J + x1 x1,..., xd xd . arithmetic-bit-level (ABL) polynomials ( G ), and formulate h − − i the verification test as an equivalent variety{ } subset problem. V. VERIFICATION PROBLEM FORMULATION To solve this, first they derive a term order that already Notation: In the sequel, we denote the Galois field as k makes G a Grobner¨ basis. Then they compute a normal Fq, and its algebraic closure as Fq, where q = 2 for our { } form f of the specification g w.r.t. G . If f is a vanishing application. Moreover, VFq (J) denotes the variety of J over Z { } the field F , and V J denotes the variety over it algebraic polynomial over 2k [16], circuit correctness is established. In q Fq ( ) [21], the authors further show that the vanishing polynomial closure. test can be omitted by formulating the problem directly over We are given a specification polynomial f : Z + A B where Z 2 α αk 1 α · αk 1 Q := 2k [X]/ x x : x X . A = a0 + a1 + + ak 1 − ,B = b0 + b1 + + bk 1 − h − ∈ i ··· − k 1 ··· − Our overall problem formulation is similar to those of [8] and Z = z0 + z1α + + zk 1α − . We are also given a gate- ··· − [21]. However, our application and computational domains level circuit C with a0,...,ak 1, b0,...,bk 1 as primary { − − } are different: we have to verify the circuits over the Galois inputs and z0,...,zk 1 as the primary outputs. We have to F Z { − } field 2k , as opposed to 2k in [8] [21]. Therefore, working check if the circuit C implements the function f over the given over Galois fields, using the Strong Nullstellensatz over Fq, we field Fq. show how to directly formulate the problem as an ideal mem- We analyze the circuit and model the Boolean gate-level bership test over the (radical) ideal J +J ; where J corresponds operators as polynomials over F2 ( Fq), as shown in Example 0 ⊂ to the ideal generated by the circuit constraints and J0 denotes 2.1. Besides, we append the following three polynomials: A+ F α αk 1 α αk 1 the ideal of all vanishing polynomials over 2k . We apply a0 + a1 + + ak 1 − , B + b0 + b1 + ...bk 1 − , Z + ··· − k 1 − the same term order from Proposition 2 in [8]; this of course z0 + z1α + + zk 1α − ; these polynomials specify the cor- ··· − makes the set of polynomials f1,..., fs extracted from the respondence between the bit-level (F2) and the word-level { } circuit a Grobner¨ basis of J. However, we further prove (Fq) variables in the system. We denote these constraints q q that it also renders the set f1,..., fs, x x1,...,x xd a as polynomials f1,..., fs over the ring Fq[x1,...,xd], and { 1 − d − } { } Grobner¨ basis of J +J over F – which is a new contribution. denote the generated ideal as J = f1,..., fs . Similarly, the 0 q h i Subsequently, the verification test can be carried out directly specification polynomial f Fq[x1,...,xd]. As opposed to in- Z ∈ α α2 via Grobner¨ basis reduction. teger coefficients in [8] ( 2k ), we have coefficients ( , ,...) from the field Fq. IV. IDEALS, VARIETIES & NULLSTELLENSATZ To prove that the specification ( f ) matches the implemen- Let F be any field and F[x1,...,xd] the tation (J) at all points in the design space, we need to check F over with indeterminates x1,...,xd. An ideal generated by whether or not f vanishes on the variety VFq (J). This is F ∑s f1,..., fs [x1,...,xd] is f1,..., fs = h = i=1 gi fi : gi because for all p VFq (J), if f (p)= 0, then Z +A B = 0 = ∈ d h i { · ∈ ∈ · ⇒ F[x1,...,xd] . Let a F be a point, and f F[x1,...,xd] be Z = A B. On the other hand, if f (p) = 0 for some point a polynomial.} We say∈ that f vanishes on a if∈ f (a)= 0. p, then· p corresponds to the bug in the6 design. Now if f F For any subset J of [x1,...,xd], the affine variety of J over vanishes on VFq (J), we know that f should be a member of F Fd F is V(J)= a : f J, f (a)= 0 . I(VFq (J)). Strong Nullstellensatz over q (Theorem 4.2) tells { ∈ ∀ ∈ }Fd q q Definition 4.1: For any subset V of , the ideal of poly- us that I(VFq (J)) = J + J0 = f1,..., fs, x1 x1,...,xd xd . nomials that vanish on V, called the vanishing ideal of V, is Therefore, we need to test whetherh or not f is− a member− of thei defined as I(V)= f F[x1,...,xd] : a V, f (a)= 0 . ideal J +J0. If f (J +J0), correctness of the circuit is proved. { ∈ ∀ ∈ } ∈ If a polynomial f vanishes on a variety V, then f I(V). Otherwise, there is a bug in the design. To test if f (J +J0), ∈ ∈ Definition 4.2: Let J F[x1,...,xd] be an ideal. The radical it is required to compute a Grobner¨ basis of J + J0. ⊂ m of J is defined as √J = f F[x1,...,xd] : m N, f J . Grobner¨ Basis of Ideals: Buchberger’s algorithm [7], shown When J = √J, J is said{ ∈ to be a radical∃ ideal∈ and∈I(}V) in Algorithm 1, is used to compute a Grobner¨ basis over a field. is a radical ideal. The strong Nullstellensatz establishes the Given polynomials F = f1,..., fs , the algorithm computes { } G another set of polynomials G = g1,...,gt such that ideal , then Spoly( f ,g) 0 for all pairs f ,g. Conse- { } −→+ I = f1,..., fs = g1,...,gt . In the algorithm, quently, the polynomials f1,..., fs extracted from the circuit h i h i L L { } Spoly( f ,g)= f g (corresponding ideal J) and represented using such a term lt( f ) · − lt(g) · order would itself constitute a Grobner¨ basis of J. In [8], the where L = LCM(lm( f ),lm(g)), where lm( f ) is the leading authors derive exactly such a term order, and a similar concept monomial of f , and lt( f ) denotes the leading term of f . can be applied in our case. The set G has nice properties that allow to solve many Note that in our case: i) since the circuit constraints f1,..., fs are modeled as polynomials in F2, they contain polynomial decision questions. For our context, Grobner¨ basis { } provides a decision procedure to test for membership in an only multi-linear monomial terms; ii) the output of a gate is ideal. The set G is a Grobner¨ basis of ideal I, if and only if for uniquely computed, and it always appears as a “single variable all f I, dividing f by polynomials of G gives 0 remainder: term” in the polynomials; iii) the circuit is acyclic. Let xi be the ∈ G output variable of any gate Hi in the circuit, and let xp1,...,xpj i.e. G = GB(I) f I, f + 0. ⇐⇒ ∀ ∈ −→ denote variables that are the inputs to the gate Hi. If we can represent the polynomials fi such that xi > every monomial in Input: : F = f1,..., fs { } the variables xp1,...,xpj, then all ( fi, f j),i = j have relatively Output: : G = g1,...,gt 6 prime leading monomials and f1,..., fs is a Grobner¨ basis. G := F; { } { } Example 6.1: Consider a 2-bit multiplier over F 2 given in REPEAT 2 Fig. 1. Variables a0,a1,b0,b1 are primary inputs, z0,z1 are G := G ′ primary outputs, and s0,s1,s2,s3,r0 are intermediate variables. For each pair f ,g , f = g in G′ DO {G′ } 6 Spoly( f ,g) + r IF r = 0 THEN−→ G := G r 6 ∪ { } UNTIL G = G′ Algorithm 1: Buchberger’s Algorithm We now have a complete solution to our problem: We q q can take the polynomials f1,..., fs, x x1,...,x xd and { 1 − d − } compute a Grobner¨ basis G for the ideal J + J0. Then, we G reduce the specification polynomial f w.r.t. G. If f + 0, then the circuit is correct, otherwise there is a bug. −→ Fig. 1: A 2-bit Multiplier over F(22). The gate corresponds ⊗ Complexity of Grobner¨ basis: For our specific problem of to AND-gate, i.e. bit-level multiplication modulo 2. The gate corresponds to XOR-gate, i.e. addition modulo 2. ⊕ computing a Grobner¨ basis for J + J0 over Fq, the following result is known [6]: q q We perform a “reverse topological traversal” of the circuit. Theorem 5.1: Let I = f1,..., fs, x1 x1,...,xd xd Starting from the primary outputs, traverse the circuit to the F h − − i ⊂ q[x1,...,xd] be an ideal. The time and space complexity of primary inputs, and order the gates according to the their Buchberger’s algorithm to compute a Grobner¨ basis of I is (reverse) topological levels. The primary outputs z0,z1 are both O(d) bounded by q assuming that the length of input f1,..., fs at level-0, variables r0,s0,s3 are at level-1, s1,s2 are at level-2, is dominated by qO(d). and the primary inputs a0,a1,b0,b1 are at level-3. We order the k variables z0 > z1 > r0 > s0 > s3 > s1 > s2 > a0 > a1 > b0 > b1. In our case q = 2 , and when k and d are large, this Using this variable order, we impose a lex term order on the complexity may make verification infeasible. In what follows, monomials. Then the polynomials extracted from the circuit we show that a term order can be derived which makes all have relatively prime leading terms, as shown below: q q f1,..., fs,x1 x1,...,xd xd a Grobner¨ basis – obviating { − − } s0 + a0 b0, lm = s0; s1 + a0 b1, lm = s1 the need to apply Buchberger’s algorithm. · · s2 + a1 b0, lm = s2; s3 + a1 b1, lm = s3 · · VI. OBVIATING BUCHBERGER’S ALGORITHM r0 + s1 + s2, lm = r0; z0 + s0 + s3, lm = z0

To improve Buchberger’s algorithm, variations of the chain z1 + r0 + s3, lm = z1 and product criteria are applied. In our overall problem formulation, we also have variables F Lemma 6.1: [Product Criterion [23]] Let f ,g [x1, ,xd] A,B,S Fq. They can also be accommodated in this term order ∈ ··· ∈ be polynomials. If the equality lm( f ) lm(g) = by imposing Z > A > B > z0 > z1 > r0 > s0 > s3 > s1 > s2 > G · LCM(lm( f ),lm(g)) holds, then Spoly( f ,g) + 0. a0 > a1 > b0 > b1. The above result states that when the leading−→ monomials Thus, using the result of Proposition 2 [8], the set of of f ,g are relatively prime, then Spoly( f ,g) always reduces polynomials G1 = f1,..., fs is a Grobner¨ basis for J. Note q { q } to 0 modulo G. Thus Spoly( f ,g) need not be considered that G0 = x x1,...,x xd is a Grobner¨ basis for J0. { 1 − d − } in Buchberger’s algorithm. If we could analyze the given However, we have to compute a Grobner¨ basis of J + J0 = q q circuit and derive a term order such that every polynomial f1,..., fs, x x1,...,x xd . Not all polynomials pairs in h 1 − d − i pair ( f ,g) in the generating set has relatively prime leading G1 G0 have relatively prime leading monomials. ∪ Consider an arbitrary polynomial fi G1. Using our term a counter-example for debugging – and a SAT or SMT- ∈ order, we have fi = xi +tail( fi); i.e. the leading monomial of fi solver can find such an assignment in no time. Our results q is a single variable term xi. Clearly, the pairs (xi +tail( fi), xi therefore obviate the need to construct a Grobner¨ basis, and q − G1,G0 xi), fi G1, xi xi G0 do not have relatively prime leading the verification can be performed only by reduction: f r. ∈ − ∈ q −→ + monomials. In fact, the pairs (xi +tail( fi),xi xi) are the only ones to be considered for Grobner¨ basis computation,− as all VII. EXPERIMENTAL RESULTS other pairs have relatively prime leading terms. This motivated We have conducted experiments to verify custom imple- us to investigate further the question: “what is the result of the mentations of Mastrovito and Montgomery Multipliers using q G1,G0 SAT/SMT/BDDs and our proposed approach. We use the reduction Spoly(xi + tail( fi),xi xi) + r?” We state and prove the following: − −→ computer algebra tool SINGULAR [v. 3-1-3] [24] to conduct Theorem 6.1: Let R = Fq[x1,...,xd] on which we have a multivariate polynomial division for our approach. Our ex- monomial order . Let I be a subset of 1,...,d . For all i I, periments are conducted on a desktop with 2.40GHz Intel ≤ { } ∈ let fi = xi +Pi (where Pi = tail( fi)) such that all indeterminates Core(TM)2 Quad CPU and 8GB memory running 64-bit x j that appear in Pi satisfy xi > x j. Then the set G1 G0 = fi : Linux. The initial designs are given in EQN format and q q ∪ { then translated to different formats: CNF, SMTLIB, BLIF, i I x1 x1,...,xd xd is a Grobner¨ basis. ∈ }Proof: ∪ { −According− to Buchberger’s} Theorem (Theorem Polynomials, as used by SAT, SMT, BDD and the SINGULAR 1.7.4 in [22]), we need to show that for all f ,g G1 G0, tool, respectively. G1,G0 ∈ ∪ Evaluation of SAT/SMT/BDD: Using the conventional Spoly( f ,g) + 0. Lemma 6.1 shows that if f ,g have rela- −→ G ,G equivalence checking approaches, we create a “miter” with the tively prime leading terms, then Spoly( f ,g) 1 0 0. So the −→ + specification and the implementation, and use SAT/SMT/BDD only case where Lemma 6.1 does not apply is when f = xi +Pi q q 1 q 1 methods to check if it is unsatisfiable. While the implemen- and g = xi xi. Then Spoly( f ,g)= xi − f g = Pixi − + xi. tation is given as a circuit, the specification is given as a In what follows,− it is important to note that the− indeterminates word-level polynomial Z = A B (mod P(x)). For SAT and appearing in Pi are all less than xi. · q 1 q 2 2 q 2 BDDs, we use a Mastrovito-style gate-level circuit as the First of all, Pixi − +xi Pixi − (xi +Pi)= Pi xi − +xi, which golden model. For SMT experiments, the designs are modeled q 1 xi+−Pi 2 q 2 shows that Pixi − + xi Pi xi − + xi. at bit-vector level using QF-BV theories, maintaining a BV- 2 q 2 −→2 q 3 3 q 3 Next, Pi xi − + xi Pi xi − (xi + Pi)= Pi xi − + xi. Continu- level abstraction whenever possible. Table I shows that none − q 1 q 1 q ing in this fashion, we get Pi − xi +xi Pi − (xi +Pi)= xi +Pi , of BDDs, SAT or SMT solvers can verify the correctness of q q − and finally xi + P (xi + Pi)= P Pi. Hence, circuits beyond 16-bits. i − i − F q 1 xi+Pi 2 q 2 xi+Pi 3 q 3 xi+Pi TABLE I: Runtime for correct multiplier circuits over 2k for BDDs, Pix − + xi P x − + xi P x − + xi i −→ i i −→ i i −→ ··· SAT, SMT-solver based methods. TO = timeout of 10hrs. Time is givn xi+Pi q xi+Pi q in seconds. Pi + xi Pi Pi. ··· −→ −→ − Word size of the operands k-bits F q q Solver 8 12 16 Over the Galois field q, Pi Pi I(V(0)) = x1 q − ∈q q h − MiniSAT 22.55 TO TO x1,...,xd xd . By Lemma 6.1, G0 = x1 x1,...,xd xd − i q G0 { − − } CryptoMiniSAT 7.17 16082.40 TO is Grobner¨ basis. Therefore Pi Pi + 0. PicoSAT 14.85 TO TO − → G1,G0 In conclusion, f ,g G1 G0, Spoly( f ,g) + 0 and Yices 10.48 TO TO ∀ ∈ ∪ −→ Beaver 6.31 TO TO hence G1 G0 is a Grobner¨ basis. ∪ F CVC TO TO TO We setup the verification problem in q[x1,...,xd], on which Z3 85.46 TO TO we have the monomial order as specified above. We extract Boolector 5.03 TO TO ≤ SimplifyingSTP 14.66 TO TO the set of polynomials G1 = f1,..., fs from the circuit. q { q } ABC 242.78 TO TO We append the set G0 = x1 x1,...,xd xd . Then the set { − − } q BDD 0.10 14.14 1899.69 G1 G2 is a Grobner¨ basis of the ideal J +J0 = f1,..., fs,x1 ∪ q h − Conceptually, our approach requires to check whether the x1,...,xd xd . We take our specification polynomial f and −G1,iG0 specification f J + J . Without deducing the result of The- compute f r. If r = 0, then f J + J and the circuit 0 + 0 orem 6.1, if we∈ use Singular to compute a Grobner¨ basis for is correct; otherwise,−→ if r = 0, then∈ we have a bug in the G G using the proposed term order of [8], we can only design. Moreover, if r = 0,6 then the monomial order ensures 1 0 verify∪ the correctness upto 48-bit multipliers. Beyond that, that r contains only the6 primary input variables. To show this, the Grobner¨ basis engine runs into memory explosion. This assume that r = 0 and r contains either an intermediate or a is shown in Table II. primary output6 variable x . As there always exists a polynomial j Evaluation of Our Approach: Our approach only requires f in G G with lm( f ) = x , r can be further reduced j 1 0 j j the Grobner¨ basis reduction (division) for the verification test: by f . Continuing∪ in this fashion, all the terms with non- j G1,G0 primary-input (intermediate or primary output) variables can f + r and to check if r = 0? Results for verification of be eliminated. Finally, in the presence of a bug, any assignment Mastrovito−−→ multipliers using only this Grobner¨ basis reduction to the (primary-input) variables that makes r = 0, provides (REDUCE command in SINGULAR) are shown in Table III. 6 TABLE II: Verification of correct circuits by computing GB(J +J0). verification can be formulated by simply carrying out the MO=out of 8G memory. Time is givn in seconds. G1,G0 reduction f + r. Using our approach, we are able to verify Size 16 32 64 96 128 160 163 −→ the correctness of upto 163-bit multipliers over F 163 , whereas #variables 323 1155 4355 9603 16899 26243 27224 2 #polynomials 609 2241 8577 19009 33537 52161 54117 conventional techniques based on SAT/SMT/BDD solvers are #terms 2415 9439 37311 83615 148351 231519 240261 infeasible. Time 0.94 93.80 MO MO MO MO MO REFERENCES [1] E. D. Mastrovito, “VLSI Designs for Multiplication Over Finite Fields With our approach, we can verify the correctness of upto 163- GF(2m),” Lecture Notes in Computer Science, vol. 357, pp. 297–309, bit Mastrovito multipliers. We also experimented with bug- 1989. [2] C. Koc and T. Acar, “Montgomery Multiplication in GF(2k),” Designs, catching in incorrect designs; the bugs were introduced by Codes and Cryptography, vol. 14, no. 1, pp. 57–69, Apr. 1998. arbitrarily swapping the wires (variables) xi with x j, for some [3] H. Wu, “Montgomery Multiplier and Squarer for a Class of Finite i = j. In such cases, we obtained a non-zero r. We used a Fields,” IEEE Transactions On Computers, vol. 51, no. 5, May 2002. 6 [4] D. Babic and M. Musuvathi, “Modular Arithmetic Decision Procedure,” SAT-solver to find a SAT assignment to r = 0, and the counter- Microsoft Research, Tech. Rep. TR-2005-114, 2005. example was generated in no time. These6 run times are shown [5] E. Biham, Y. Carmeli, and A. Shamir, “Bug Attacks,” in Proc. Annual in Table III. Conf. on Cryptology: Advances in Cryptology, ser. CRYPTO 2008, 2008, pp. 221–240. Results of the verification of Montgomery multipliers is [6] S. Gao, “Counting Zeros over Finite Fields with Grobner¨ Bases,” shown in Table IV. Montgomery multipliers are significantly Master’s thesis, Carnegie Mellon University, 2009. larger than Mastrovito multipliers. If we represent a poly- [7] B. Buchberger, “Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach einem nulldimensionalen Polynomideal,” Ph.D. nomial for every gate in the design, then we create too dissertation, University of Innsbruck, 1965. many variables (d) in the system, exceeding SINGULAR’S [8] O. Wienand, M. Wedler, D. Stoffel, W. Kunz, and G. Gruel, “An Alge- capacity (d 32767). For this reason, we have to partition braic Approach to Proving Data Correctness in Arithmetic Datapaths,” ≤ in Intl. Conf. on Computer-Aided Verification, 2008. the circuit, and construct the polynomials for each circuit [9] R. J. McEliece, Finite Fields for Computer Scientists and Engineers. partition – and we ensure that our term ordering constraint Kluwer Academic Publishers, 1987. is not violated. With such efforts, we are able to verify [10] R. E. Bryant, “Graph Based for Boolean Function Manip- ulation,” IEEE Trans. on Computers, vol. C-35, pp. 677–691, August Montgomery multipliers upto 128-bits, beyond which we still 1986. exceed SINGULAR’S capacity. [11] R. Drechsler and et al, “Efficient Representation and Manipulation of Switching Functions based on Ordered Kronecker Functional Decision Diagrams,” in DAC, 1994, pp. 415–419. TABLE III: Runtime for verifying bug-free and buggy Mastrovito [12] R. E. Bryant and Y.-A. Chen, “Verification of Arithmetic Functions with multipliers using our approach. Time is given in seconds. Binary Moment Diagrams,” in DAC, 95. [13] T. Rajaprabhu and et al., “Modd for CF: A Compact Representation for Size 16 32 64 96 128 160 163 Multiple Output Function,” in IEEE International High Level Design #variables 323 1155 4355 9603 16899 26243 27224 Validation and Test Works hop, 2004. #polynomials 291 1091 4227 9411 16643 25923 26989 [14] B. Alizadeh and M. Fujita, “Modular Datapath Optimization and Ver- #terms 1793 7169 28673 64513 114689 179201 185984 ification based on Modular-HED,” IEEE Trans. CAD, pp. 1422–1435, Bug-free 0.04 1.41 112.13 758.82 3054 9361 16170 Sept. 2010. Bugs 0.04 1.43 114.86 788.65 3061 9384 16368 [15] N. Shekhar and et al., “Equivalence Verification of Polynomial Datap- aths with Fixed-Size Bit-Vectors using Finite Ring Algebra,” in ICCAD, 2005. TABLE IV: Runtime for verifying bug-free and buggy Montgomery [16] ——, “Equivalence Verification of Polynomial Datapaths using Ideal multipliers using our approach. Time is given in seconds. Membership Testing,” IEEE Trans. CAD, pp. 1320–1330, July 2007. [17] S. Morioka, Y. Katayama, and T. Yamane, “Towards efficient verification Size 16 32 48 64 96 128 of arithmetic algorithms over Galois fields GF(2m),” Proc. Computer- #variables 319 1194 2280 4395 6562 14122 Aided Verification, vol. 2102, pp. 465–477, 2001. #polynomials 287 1130 2184 4267 6370 13866 [18] D. Mukhopadhyaya, G. Sengar, and D. Chowdhury, “Hierarchical Ver- #terms 2262 10741 18199 40021 55512 134887 ification of Galois Field Circuits,” IEEE Trans. on CAD, 2007. Bug-free 0.03 1.50 11.03 27.70 1802.75 10919.35 [19] J. Lv, P. Kalla, and F. Enescu, “Verification of Composite Galois Field Bugs 0.03 1.52 11.10 28.18 1812.15 11047.10 Multipliers over GF((2m)n) using Computer Algebra Techniques,” in IEEE High-Level Design Validation and Test Workshop (HLDVT 2011), 2011. VIII. CONCLUSIONS [20] ——, “Formal Verification of Galois Field Multipliers using Computer This paper has presented a formal approach to model Algebra,” in 25th IEEE International Conference on VLSI Design F (VLSID 2012), 2012. and verify multiplier circuits over Galois fields 2k using a [21] E. Pavlenko and et al., “STABLE: A new QBF-BV SMT Solver for hard computer-algebra based approach. We show how the verifi- Verification Problems combining Boolean Reasoning with Computer cation test can be formulated as membership testing of the Algebra,” in Proc. Design Auto. & Test in Europe (DATE), 2011. [22] W. W. Adams and P. Loustaunau, An Introduction to Grobner Bases. specification polynomial f in a (radical) ideal G1,G0 = American Mathematical Society, 1994. q q h i [23] B. Buchberger, “A criterion for detecting unnecessary reductions in the f1,..., fs,x1 x1,...,xd xd , where G1 = f1,..., fs corre- spondsh to the− ideal generated− i by polynomials{ extracted} from construction of a groebner bases,” in EUROSAM, 1979. q [24] W. Decker, G.-M. Greuel, G. Pfister, and H. Schonemann,¨ “SINGULAR the circuit, and G0 = xi xi corresponds to the ideal of 3-1-3 — A for polynomial computations,” vanishing polynomials{ of the− field.} By analyzing the circuit 2011, http://www.singular.uni-kl.de. topology, we derive a monomial order that makes the set G1 G0 itself a Grobner¨ basis of J + J0. Subsequently, the ∪