Journal of Electronic Testing (2020) 36:643–663 https://doi.org/10.1007/s10836-020-05904-2

Formal Verification of ECCs for Memories Using ACL2

Mahum Naseer1 · Waqar Ahmad1 · Osman Hasan1

Received: 12 April 2020 / Accepted: 2 September 2020 / Published online: 26 September 2020 © Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract Due to the ever-increasing toll of soft errors in memories, Error Correction Codes (ECCs) like Hamming and Reed-Solomon codes have been used to protect data in memories, in applications ranging from space to terrestrial workstations. In the past seven decades, most of the research has focused on providing better ECC strategies for data integrity in memories, but research efforts to develop better verification methodologies for the newer ECCs have not kept the same pace. As memory sizes keep increasing, exhaustive simulation-based testing of ECCs is no longer practical. Hence, formal verification, particularly theorem proving, provides an efficient, yet scarcely explored, alternative for ECC verification. We propose a framework, with extensible libraries, for the formal verification of ECCs using the ACL2 theorem prover. The framework is easy to use and particularly targets the needs of formally verified ECCs in memories. We also demonstrate the usefulness of the proposed framework by verifying two of the most commonly used ECCs, i.e., Hamming and convolutional codes. To illustrate that the ECCs verified using our formal framework are practically reliable, we utilized a formal record-based memory model to formally verify that the inherent properties of the ECCs, like Hamming distance, codeword decoding, and error detection/correction, remain consistent even when the ECC is implemented on the memory.

Keywords Error Correction Codes (ECCs) · Memory soft errors · Hamming codes · Convolutional codes · Formal verification · Theorem proving · ACL2

1 Introduction

Soft errors are a type of error that does not cause permanent damage to semiconductor devices [56], yet leads to temporary faults in them. In particular, radiation-induced soft errors have been a major concern in semiconductor devices since the 1970s [12, 60]. In a long chain of events, both the high-speed protons in cosmic rays and the alpha particles emitted during the decay of radioactive impurities in IC packaging material induce the silicon-based semiconductor memories to change their logic states, hence resulting in soft errors [10, 47].

Recent advancements in technology, including circuit miniaturization, voltage reduction, and increased circuit clock frequencies, have aggravated the problem of soft errors in memories [10, 48]. The most obvious drawbacks of memory errors include the loss of correct data and the addition of faulty data into the memory. However, depending on the application/system using the memory, the severity of these memory errors can vary. This is summarized in Fig. 1. In a LEON3 processor, a memory error may simply cause a result error, i.e., an erroneous output from an algorithm running on the system, or a system timeout, i.e., the termination of an application without any result [39]. Similarly, in a Xilinx FPGA, such errors may cause the system to halt [33].

Error Correction Codes (ECCs) [44] are used to cater for memory errors by adding extra bits, often called parity or check bits, to the data bits in the memory. The parity bits are calculated from the available data bits, and in case of an error, the lost data is retrieved using these parity bits.

Responsible Editor: V. D. Agrawal

Mahum Naseer [email protected]
Waqar Ahmad [email protected]
Osman Hasan [email protected]
Hence, ECCs are considered to be the most effective solution for memory errors [10], and since the introduction of Hamming

1 School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan

Fig. 1 Impact of a) Technology Miniaturization, b) Voltage Scaling, and c) Increased Clock Frequency on memories, and their consequences

codes [27] in 1950, ECCs have remained an active domain of research.

1.1 Motivation

Simulation-based testing is the most commonly used technique for ensuring the correctness of ECCs in memories [7, 20, 28, 51, 54]. Initially, errors are injected at the input of the memory model, in a process known as fault injection. The performance of the ECC, i.e., how well the ECC corrects the errors, is then evaluated at the output of the model. This approach of testing is quite effective for smaller memories, where exhaustive simulation can be somewhat achievable. However, as the memory size grows, it becomes increasingly difficult to employ exhaustive simulation [20]. So, a common practice is to pick random combinations of input errors, and observe the response of the ECC in the presence of those error combinations [7, 20, 28]. This undermines the reliability of simulation results in determining the resilience of ECCs against the errors.

Formal methods [29] have been extensively used to provide an efficient alternative to simulation-based testing. The main idea here is to first construct a mathematical model of the given system using a state machine or an appropriate logic, and then use logical reasoning and deduction methods to formally verify that this model exhibits the desired characteristics of the system. The desired behavior of the system is also specified mathematically using appropriate logic. This overcomes the need to apply combinations of errors at the input, and shifts the focus of the verification task to formal reasoning instead.

Generally, there are two major categories of formal methods to ensure system resilience: model checking and theorem proving. While the former was explored in earlier research for ensuring circuit reliability, its use was limited to smaller systems [43] due to the large number of states formed in larger systems and the resulting state-space explosion [18]. The latter resolves the limitation posed by large/infinite state spaces in model checking by using induction [26]. However, this is achieved at the cost of increased complexity of implementation [19].

In this paper, we developed a framework for the formal verification of ECCs used in memories using a semi-automated theorem prover, A Computational Logic for Applicative Common Lisp (ACL2) [37]. ACL2 is a powerful Lisp-based tool in the sense that it not only provides automatic proof execution, but also enables its users to direct the proof procedures in a meaningful way using the hints facility. In addition, it is an efficient tool that augments the speed of proof procedures by the use of previously proved lemmas to verify new theorems; this means that a theorem which may require a huge amount of time for verification if proved using only the basic logic axioms can be proved in a significantly shorter duration using auxiliary lemmas.

1.2 Challenges

The major challenges while using ACL2 for a framework for the verification of ECCs used in memories are:

• Formal Modelling:
  The ECCs are generally represented either as a system of encoder and decoder equations/logic, or as

a hardware circuit implementation [32, 51, 54]. Hence, the first challenge is to create a system model that fulfills all the specifications of the ECC equations/logic/hardware implementation.

• Formal Reasoning:
  In formal methods, having a logical explanation for the behavior of an ECC is not sufficient to ensure the correctness of the model. So, the second challenge is to use mathematical reasoning to verify the model and its associated properties. There are two major classes of properties that we dealt with: (i) properties that verify the working of the ECC in the absence of any errors, and (ii) properties that verify the ECC performance in the presence of an error.

• Translating the formal ECC model and its properties into the Lisp (ACL2) language:
  Hand-written mathematical models and proofs are unsound due to the possibility of human errors involved in the modeling/proof procedure. Hence, we need to translate our model and the associated mathematical proof procedure to ACL2. This requires some degree of expertise in understanding the Lisp language and handling the tool.

• Model Reusability:
  To minimize the formalization efforts during the verification of new ECCs, such as Turbo codes, the existing formalization framework needs to be compositional in nature in order to provide reusable definitions and theorems. This ensures wider applicability of the framework for the verification of ECCs with different encoder/decoder designs, varied codeword lengths, and diverse error correction capabilities.

1.3 Our Contributions

Earlier works [1, 26] in the domain of theorem proving aimed at building coding theory libraries using the SSReflect and Lean theorem provers. The primary objective of these works was not to analyze memory reliability in the presence of soft errors, but rather to ensure error-free communication. Hence, their implementation strategies focused on noisy channels, rather than specific memory models.

In this paper, we propose a formal framework consisting of generic ECC properties using ACL2. This formal ECC framework allows us to verify the correctness of the ECCs generally used in memories. Our novel contributions in this paper are as follows:

• Providing extensible libraries, with functions and properties, like the error injection function, which are common to multiple ECCs.
• Formalization and verification of Hamming codes, which are among the most widely used ECCs in memories and form the basis of several multiple-error-correcting codes like two-dimensional codes.
• Formalization of convolutional codes, which provide the foundation for the formal verification of more sophisticated ECCs, such as turbo codes. To the best of our knowledge, this is the first endeavor towards the formal verification of convolutional codes.
• Utilizing the ACL2 theorem prover for developing libraries for the formal verification of ECCs used in memories. ACL2 is a semi-automatic tool, which provides the necessary automation to facilitate the verification process. This signifies the ease of use of our libraries.
• Implementation of the formally verified ECCs on a formal record-based memory model to demonstrate that the verified properties of the ECCs pertaining to encoding/decoding, error detection, and error correction are generic enough to easily comply with any given memory model.

The rest of the paper is organized as follows: Section 2 provides an overview of the current state-of-the-art in the domain of formal verification for memory/system error resilience. Section 3 describes the preliminaries of the ACL2 theorem prover. Section 4 presents our proposed methodology. Sections 5 and 6 define the formalization of our two libraries, i.e., the Standard Library for ECCs and the ECCs Library. Section 7 demonstrates a case study using a record-based memory model. Finally, Section 9 concludes our work with guidelines to extend our approach for the verification of advanced ECCs using our framework.

2 Related Work

In this section, we give a brief overview of the available literature on the formal analysis of system/memory reliability, using different formal methods.

A Binary Decision Diagram (BDD) [16] is simply a directed acyclic graph that provides an alternate, and often more compact, representation of Boolean relations. Some earlier works [15, 41] utilized BDD-based verification to ensure the resilience of circuit models in the presence of errors. Equivalence checking was employed, using BDDs, to compare the performance of two identical models, i.e., the reference/golden model and the model under fault injection [41]. However, the error protection approaches under study included simple error detection codes and Triple Modular Redundancy (TMR) circuits. For the error detection codes, the approaches also remained inconclusive in ensuring the functional correctness of the given circuit in the absence of an error. Moreover, the size of a BDD grows as the number of input variables/bits increases. In such cases, obtaining a compact BDD representation, for efficient equivalence checking, depends on an optimum variable ordering, which has been found to be an NP-hard problem [13]. Hence, in practice, BDD-based verification may not be an appropriate approach for large memories.

Model checking [18] utilizes the system's state space to explore the specifications satisfied by the system as input transitions through the system. Model checking-based approaches have been proposed in numerous works [8, 43] to ensure the resilience of circuit components in the presence of soft errors. The main idea is to ensure that the components return to their correct state after the occurrence of an error. In a similar approach, the knowledge of the vulnerable circuit components was shown to be useful in reducing the power required for error protection of the circuit [55], by securing only the components most vulnerable to such errors. Other model checking-based efforts are mainly targeted towards verifying the performance of simple error correction approaches like TMR [31] and elementary parity encoding [5, 43] in memory elements. However, all these works lacked the inclusion of sophisticated error correction codes, such as Hamming codes, which are commonly used in state-of-the-art systems for error resilience. A dedicated tool, BLUEVERI [46], was proposed to exhaustively explore the state space of IBM's ECC circuits to identify their design bugs. However, BLUEVERI requires an in-depth knowledge of the ECC's hardware implementation, hence limiting its portability to other memories. Also, due to the exhaustive search of the state space, the computational (timing and memory) requirements of the tool increase as the size of the ECC circuits increases; this makes the tool infeasible for large memories. Despite model checking being a highly automated verification technique, state-space explosion, i.e., an exponential growth in the size of the formal model with the increase in the number of state variables, is a common problem in model checking-based approaches, which is likely to overshadow their performance if used for the verification of sophisticated ECCs in larger systems/memories [18].

To overcome the issue of scalability in model checking, Boolean Satisfiability (SAT) solvers are commonly used [35, 59]. SAT solvers use propositional expressions for the system and its properties to deduce whether the properties hold true for the system. In case a property does not hold true for the system, the prover provides a counterexample. In the bound-based SAT approaches [23, 24], the circuit components were categorized as either robust, non-robust (i.e., giving incorrect output in the presence of an error), or non-classified (i.e., not causing an erroneous output but causing Silent Data Corruption). The idea proposed was to use upper and lower bounds on the time/number of cycles to identify the robust and non-robust components in the circuits, instead of doing a thorough search to find a satisfying solution. However, these approaches still verified robustness for small circuits only.

Theorem proving [19] uses a deductive reasoning approach to verify the specifications of a system. Although this approach requires an in-depth understanding of the system under study and a high level of expertise in formal reasoning, it can be effectively applied to larger systems as well.

The earliest work [9] related to ECC verification in theorem proving demonstrated a case study of the project on properties of computer algebra. The formal proofs for some properties of Hamming and Bose-Chaudhuri-Hocquenghem (BCH) codes were presented using the Isabelle theorem prover [34]. An earlier attempt at using the ACL2 theorem prover was for register dependability analysis in the presence of an error [49]. TMR was employed on a single register and thus the register was triplicated. The correction of a fault injected on one of the registers, via majority voting, was then formally verified. This was the first study focusing on the use of theorem proving for the hardware verification of a memory component in the presence of an error.

The formal verification of Hamming codes and of the decoding algorithm of Low Density Parity Check (LDPC) codes, using the SSReflect extension of the Coq proof assistant [1, 2, 4], was the first elaborate work entirely focusing on the formalization of any ECCs using theorem proving. A systematic approach was taken to initially formalize the Hamming encoder and decoder. The approach was gradually extended to the formalization and verification of the decoding algorithm for LDPC codes. The work was later expanded to include the formalization and verification of Reed-Solomon codes [3] and BCH codes [4]. Like [1], these were elaborate studies to verify the theoretical models of important classes of ECCs. All the indicated works were aimed at ensuring error-free communication.

Among the most recent work published in the field of formal verification of ECCs are the verification efforts [26, 40] using the coding theory library Cotoleta, based on the Lean theorem prover [42]. The usefulness of the established library was demonstrated by the formal verification of Hamming (7, 4) codes. However, the work was not extended to indicate the practicality of the library for the formal verification of other commonly used ECCs.

To the best of our knowledge, despite soft errors being a growing concern in memories, no extensive research has focused on using theorem proving for the formal verification of ECCs, which is the scope of the current paper.

3 ACL2 Preliminaries

ACL2 [37] is a first-order-logic theorem prover featuring several powerful verification algorithms for proving theorems automatically or by using user-guided intelligent hints. It is also equipped with a mechanical theorem prover allowing users to construct proofs interactively by using a proof builder. ACL2's logic is directly executable, which implies that a system model can be tested by using concrete executions besides symbolic verification. Since ACL2 is developed in Common Lisp, it also offers the high execution efficiency provided by the underlying Lisp compiler.

A user interacts with ACL2 using a REPL (read-eval-print loop), and a new function can be defined by using the keyword defun:

(defun Function-Name (input1 input2 ...)
  *function body*)

Similarly, a proof attempt can be invoked by using the defthm event, which takes a proof goal and then attempts to verify it automatically by utilizing several clause-processors [38], which are ACL2's automatic verification algorithms:

(defthm Theorem-Name
  *proof goal*
  *:hints / :instructions (optional)*)

The keyword :hints provides user guidance to a defthm event to direct the proof attempt. As mentioned earlier, ACL2 also offers a proof builder facility, which allows the user to control the prover like an interactive theorem prover. Similar to :hints, the proof builder commands can also be supplied to the defthm event using the keyword :instructions. The keyword :instructions initiates the series of commands that the prover must carry out in the specified sequence in order to verify the proof goal.

An important concept in ACL2 is the encapsulation principle [38], which allows the introduction of constrained functions, and then specifies constraints over these functions by using the keyword encapsulation. A closely related concept to encapsulation is the derived rule of inference called functional instantiation. This rule states that a theorem may be derived by replacing the function symbols of any theorem by new function symbols, provided that the new symbols satisfy all the constraints on the replaced symbols [14].

The ACL2 functions car and cdr are used to return the first and second members of a pair, respectively. Similarly, the constants t and nil represent true and false, respectively. booleanp, true-listp, and alistp are ACL2 recognizers for boolean constants/variables, true-lists (i.e., a list whose last element is nil), and association lists (i.e., lists of lists), respectively. Other commonly used ACL2 keywords include equal, which checks whether the given constants/variables/lists are equal, cons, which constructs a pair from the supplied arguments, and append, which concatenates the available lists.

4 Proposed Methodology

Despite having a unique logic for encoding/decoding, all ECCs have two requirements in common:

1. In the absence of any error, decoding the codeword must provide the data in the correct form:

   ∀data. ∃code. ((Encode(data) = code) =⇒ (Decode(code) = data))

2. Depending on their error-correcting capability, all ECCs must be able to correct error(s):

   ∀data. ∃code. ∃bad_code. ((Encode(data) = code) ∧ (Fault_inject(code) = bad_code) =⇒ (Decode(bad_code) = data))

Similarly, while the encoder and decoder functions for every ECC are unique, there are a few essential functions required for the verification of all ECCs. For instance, to verify the reliability of an ECC in the presence of an error, the fault/error injection function is needed. We propose a framework, developed using the ACL2 theorem prover, with extensible libraries of functions which are essential for the verification of ECCs. The proposed methodology is shown in Fig. 2.

The first step in our methodology is to formally represent the system (in our case, ECC) specifications. The whole operation of an ECC is captured in its encoding and decoding algorithms, which are hence inputs to the proposed framework. Due to the diversity of memory sizes to which each ECC is applicable, and the varying degree of error protection required by memories used in different applications, code-specific information, like the data word size and code rate of the ECC, is also taken as input to the framework.

As indicated earlier, there are certain functions, like fault injection, which exhibit the same behavior for many ECCs. We developed a library of formalized standard functions, as described in Section 5, with the most commonly used functions in ECCs. The next step in our proposed procedure is to use the ECC specifications and the code-specific information indicated above, along with the functions from our library of formalized standard functions, to formulate formal ECC encoder and decoder models/functions. The encoder and decoder functions together provide the formal model of the ECC.

To ensure the functional and behavioral correctness of the ECC model, ECC properties/theorems are formally

Fig. 2 Proposed Methodology

expressed using the ECC model and the library of formalized standard functions. In the final step of the proposed methodology, the ACL2 theorem prover identifies whether the indicated properties hold true for the given ECC model. If a given property fails, it gives valuable feedback, which helps to correct the formalization of the ECC. In case ACL2 verifies all the properties, the ECC is deemed verified. The verified ECC is finally stored into the ECCs library, which will be discussed in detail in Section 6.

5 Library of Formalized Standard Functions

In this section, we formally define some standard functions that are used in ECC formalization and verify their key properties using ACL2.

5.1 Fault Injection

The functions bit_flip_tlist and bit_flip_pair are formally defined to inject errors at an arbitrary location in the codeword. bit_flip_tlist is applicable to codewords from block codes, described in Section 6, while bit_flip_pair is defined to work with convolutional codes. Since both functions are behaviorally similar, we will only discuss bit_flip_tlist here for brevity.

Definition 1

(defun bit_flip_tlist (n lst)
  (if (zp n)
      (cons (not (car lst)) (cdr lst))
    (cons (car lst)
          (bit_flip_tlist (- n 1) (cdr lst)))))

where lst represents the codeword and n is the location where the error is to be inserted. zp is an ACL2 recognizer for zero. The function recursively calls itself, with the top-most element removed and n decremented by 1 in each iteration, i.e., (bit_flip_tlist (- n 1) (cdr lst)), until (zp n) is true, i.e., n becomes zero. Once this condition is true, the top-most element of lst, (car lst), is flipped using the not keyword, and concatenated to the remaining codeword, (cdr lst).

The behavior of the fault injection model is formally described in the following properties:
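To make the recursion in Definition 1 concrete, it can be mirrored by a small executable analogue. The following Python sketch is our own illustration (it is not part of the paper's ACL2 libraries); the comments map each branch back to the ACL2 definition:

```python
def bit_flip_tlist(n, lst):
    """Python analogue (our sketch) of Definition 1: flip the bit at location n."""
    if n == 0:  # corresponds to (zp n): the error location has been reached
        return [not lst[0]] + lst[1:]  # flip (car lst), keep (cdr lst)
    # otherwise keep (car lst) and recurse on (cdr lst) with n decremented
    return [lst[0]] + bit_flip_tlist(n - 1, lst[1:])

codeword = [True, False, True, True]
assert bit_flip_tlist(2, codeword) == [True, False, False, True]  # bit 2 flipped
assert len(bit_flip_tlist(2, codeword)) == len(codeword)          # length preserved
```

The ACL2 theorems that follow state the corresponding length-preservation and single-flip facts formally, for all inputs rather than for sampled ones.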

Theorem 1.1

(defthm len-bit-flip-tlist
  (implies (and (>= n 0)
                (< n (len lst))
                (true-listp lst))
           (equal (len lst)
                  (len (bit_flip_tlist n lst)))))

The above theorem ensures that fault injection does not add or remove bits from a codeword. The theorem is structured as an implication, denoted by the keyword implies. The first argument of the implication gives the necessary premises/conditions which must be satisfied for the property to hold true, while the second argument indicates the conclusion, or the property itself. The above theorem states that if the error location n is a non-negative number, (>= n 0), less than the length of lst, (< n (len lst)), and lst is a true-list, then the lengths of the codeword before and after fault injection, i.e., (len lst) and (len (bit_flip_tlist n lst)), are equal.

Theorem 1.2

(defthm list-change-after-flip-tlist
  (implies (and (>= n 0)
                (< n (len lst))
                (true-listp lst))
           (not (equal lst (bit_flip_tlist n lst)))))

Fault injection always changes a codeword bit. This is described in the above theorem by indicating that the original codeword lst and the codeword after fault injection, (bit_flip_tlist n lst), are different.

Theorem 1.3

(defthm single-bit-flips-tlist
  (implies (and (>= n 0)
                (< n (len lst))
                (true-listp lst)
                (not (endp lst)))
           (equal (count_element_mismatch_tlist
                    lst (bit_flip_tlist n lst))
                  1)))

The above property states that the codeword before and after fault injection differs at exactly one codeword location. The function count_element_mismatch_tlist is another function from our library of formalized standard functions, which gives the number of places at which the codewords lst and (bit_flip_tlist n lst) differ.

5.2 Comparator

Once faults are injected into the codeword, the error-correcting capability of the ECC is typically determined on the basis of the number of faults that the ECC can detect/correct. The BDD-based simulation approach in [41] proposed the use of a golden device, i.e., an uncorrupted device, to compare against and evaluate the performance of the device in which the fault is injected.

We use a similar strategy in our work. We formally define two functions, i.e., count_element_mismatch_tlist and count_element_mismatch_pair, that can count the number of mismatches between the original codeword and the codeword after fault injection. Again, we only discuss count_element_mismatch_tlist, which works for block codes. The approach used for count_element_mismatch_pair is almost similar and is applicable to the convolutional codes.

Definition 2

(defun count_element_mismatch_tlist (A B)
  (if (endp A)
      0
    (if (equal (car A) (car B))
        (count_element_mismatch_tlist (cdr A) (cdr B))
      (+ 1 (count_element_mismatch_tlist (cdr A) (cdr B))))))

where the ACL2 function endp returns true if the given list is empty. The above function recursively counts the number of mismatched elements in the corresponding locations of the codewords A and B. For instance, if A = (1 0 1) and B = (1 0 0), then count_element_mismatch_tlist returns 1.

We proved some essential properties about the above comparator function in ACL2 as follows:
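Before the ACL2 properties, Definition 2 can be cross-checked with an executable analogue. The Python sketch below is our own illustration (not the paper's code), exercised on the (1 0 1)/(1 0 0) example from the text:

```python
def count_element_mismatch_tlist(A, B):
    """Python analogue (our sketch) of Definition 2: count disagreeing positions."""
    if not A:  # corresponds to (endp A): an empty codeword has no mismatches
        return 0
    rest = count_element_mismatch_tlist(A[1:], B[1:])  # recurse on the cdrs
    return rest if A[0] == B[0] else 1 + rest          # add 1 when the cars differ

# The example from the text: A = (1 0 1) and B = (1 0 0) differ at one location.
assert count_element_mismatch_tlist([1, 0, 1], [1, 0, 0]) == 1
# Identical codewords differ at zero locations (cf. Theorem 2.1).
assert count_element_mismatch_tlist([1, 0, 1], [1, 0, 1]) == 0
```

The theorems that follow establish such facts for all true-lists, not just these samples.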

Theorem 2.1

(defthm count-mismatch-in-equal-lists-tlist
  (implies (true-listp lst)
           (equal (count_element_mismatch_tlist lst lst)
                  0)))

The above property states that identical codewords differ at zero locations. In other words, when comparing identical codewords, the comparator function should return 0.

Theorem 2.2

(defthm count-mismatch-append-tlist
  (implies (and (true-listp A)
                (true-listp B)
                (true-listp C))
           (equal (count_element_mismatch_tlist
                    (append A B) (append A C))
                  (count_element_mismatch_tlist B C))))

Adding identical bits at the corresponding locations of the codewords does not affect the number of mismatches. The above theorem states this property formally by indicating that the number of element mismatches in two codewords B and C is the same as that in the codewords (append A B) and (append A C), where A is another codeword appended to the top of the codewords B and C.

5.3 Binary-to-Decimal Convertor

During verification, it is often simpler to compare the data words as decimal numbers, rather than comparing them as binary sequences. Hence, we define bin_to_dec, a 4-bit binary-to-decimal convertor. This is particularly helpful in ECCs where the error location is available in binary, and the error correction mechanism makes use of the error locations indicated in the decimal number system.

Definition 3

(defun bin_to_dec (a b c d)
  (+ (if d 1 0)
     (if c 2 0)
     (if b 4 0)
     (if a 8 0)))

We verified some properties related to bin_to_dec in ACL2 as follows:

Theorem 3.2

(defthm even-binary-number
  (implies (equal d nil)
           (evenp (bin_to_dec a b c d))))

Having the last bit of the binary number as nil implies an even number. evenp is ACL2's built-in recognizer for even integers.

Theorem 3.3

(defthm odd-binary-number
  (implies (equal d t)
           (oddp (bin_to_dec a b c d))))

In case the last bit of the binary number is t, this implies that the number is odd. oddp is ACL2's built-in recognizer for odd integers.

6 Formal ECCs Library

ECCs can be divided into two broad categories: the block codes and the convolutional codes [45]. The block codes are the branch of codes where, depending on the type of memory, data blocks of constant length are used to generate a fixed number of parity bits. The convolutional codes, on the other hand, have a fixed code rate m/n, where n output (codeword) bits are generated for m input (data) bits. This section describes the formalization of two ECCs, one from each category: the Hamming (7,4) codes and the 1/2-rate convolutional codes. Both are extensively used in memories, and are common to several advanced error correction codes. A brief overview of the encoding and decoding algorithms of both ECCs is first highlighted. This is followed by the formalization of these ECCs. Finally, the details of the formal verification of their associated properties are provided. The ACL2 script for the two ECCs, and their associated properties, comprises approximately 1000 lines of code, and can be downloaded from [57].

6.1 Hamming Codes

Hamming codes [27] are among the first ECCs introduced. They are systematic Single Error Correction-Double Error Detection (SEC-DED) codes, using (k + 1) parity bits to protect d data bits, generating an n-bit codeword. The initial k parity bits are allocated the power-of-two bit positions (i.e., bit locations 1, 2, 4, 8, ...) of the codeword. The last parity bit takes up the last bit of the codeword. The remaining bits of the codeword represent the data.

The calculation of the parity and the syndrome bits involves modulo-2 additions. For the initial k bits, the

Table 1 Parity and Syndrome Bits involved in the calculation of parity/syndrome bits in Hamming Code

Table 1 Parity/syndrome positions and the bits on which modulo-2 addition is applied

Parity/Syndrome position   For parity bits        For syndrome bits        Selection pattern
1                          3,5,7,9,11,13,...      1,3,5,7,9,11,13,...      Pick 1 bit, skip 1 bit (starting from 1)
2                          3,6,7,10,11,...        2,3,6,7,10,11,...        Pick 2 bits, skip 2 bits (starting from 2)
4                          5,6,7,12,13,...        4,5,6,7,12,13,...        Pick 4 bits, skip 4 bits (starting from 4)

The sequence of this calculation involves the use of the parity bits shown in Table 1, whereas, for the last parity/syndrome bit, all the preceding bits are used for the calculation. The initial k parity bits are used for single error correction; the last parity bit provides the double error detection capability.

Although the error-correction capability of Hamming codes is very limited, they form the basis of several recent multiple error detection/correction codes [21, 52, 53]. They are among the most widely used error resilient codes in memories due to their smaller area overhead, latency, and power requirement, and their simpler combinatorial circuit architectures. They are often employed in space and military applications [30].

We have formalized the Hamming (7,4) code. It protects a 4-bit data word, using 3 parity bits for single-error correction and an additional parity bit for double-error detection. The gate-level structure of the Hamming (7,4) encoder is shown in Fig. 3. The parity bits are determined as:

p1 = d3 ⊕ d5 ⊕ d7
p2 = d3 ⊕ d6 ⊕ d7
p4 = d5 ⊕ d6 ⊕ d7
p8 = p1 ⊕ p2 ⊕ d3 ⊕ p4 ⊕ d5 ⊕ d6 ⊕ d7    (1)

The formal model of the (7,4) Hamming encoder, shown in Fig. 3, is described in ACL2 as follows:

Definition 4

(defun hamming7-4-encode (x3 x5 x6 x7)
  (cons (xor (xor x3 x5) x7)              ; p1
    (cons (xor (xor x3 x6) x7)            ; p2
      (cons x3
        (cons (xor (xor x5 x6) x7)        ; p4
          (cons x5
            (cons x6
              (cons x7
                (cons (xor (xor (xor (xor (xor (xor    ; p8
                              (xor (xor x3 x5) x7)
                              (xor (xor x3 x6) x7))
                              x3)
                              (xor (xor x5 x6) x7))
                              x5)
                              x6)
                              x7)
                      nil)))))))))

The decoder has a similar ACL2 structure to that of the encoder. At the decoder, the syndrome bits corresponding to the parity bits are calculated as:

s1 = p1 ⊕ d3 ⊕ d5 ⊕ d7
s2 = p2 ⊕ d3 ⊕ d6 ⊕ d7
s4 = p4 ⊕ d5 ⊕ d6 ⊕ d7
s8 = p1 ⊕ p2 ⊕ d3 ⊕ p4 ⊕ d5 ⊕ d6 ⊕ d7 ⊕ p8    (2)

In case of no error, all the syndrome bits are zero. Non-zero syndrome bits indicate the presence of error(s). In case of a single error, the syndrome bit s8 is non-zero, and the remaining syndromes (s1 s2 s4) give the binary location of the error; for instance, the syndromes (101) represent an error at the fifth codeword location. The correction of the error involves flipping the erroneous bit. In case of a double error, the syndrome bit s8 is zero while the remaining syndrome bits (s1 s2 s4) are non-zero, which indicates the presence of a double error.

Some important formally verified behavioral properties of Hamming codes, as verified in ACL2, are as follows:

Fig. 3 Encoder for Hamming (7,4) Code
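The parity computation of Eq. (1) can also be mirrored in a few lines of executable code. The following Python sketch is only an illustrative simulation of the encoder's behavior, not the ACL2 model; the function name and the codeword ordering [p1, p2, d3, p4, d5, d6, d7, p8] are our own choices:

```python
def hamming74_encode(d3, d5, d6, d7):
    """Build the 8-bit SEC-DED codeword [p1,p2,d3,p4,d5,d6,d7,p8].

    Positions 1, 2 and 4 hold the parity bits, positions 3, 5, 6, 7
    hold the data bits, and position 8 holds the overall parity."""
    p1 = d3 ^ d5 ^ d7
    p2 = d3 ^ d6 ^ d7
    p4 = d5 ^ d6 ^ d7
    p8 = p1 ^ p2 ^ d3 ^ p4 ^ d5 ^ d6 ^ d7   # Eq. (1), overall parity
    return [p1, p2, d3, p4, d5, d6, d7, p8]
```

Since p8 is the XOR of the other seven bits, every codeword produced this way has even overall parity, which is what makes double-error detection possible.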

Theorem 4.1

(defthm hamm-distance4
  (implies (not (equal (bin-to-dec a b c d)
                       (bin-to-dec e f g h)))
           (>= (count-element-mismatch-tlist
                 (hamming7-4-encode a b c d)
                 (hamming7-4-encode e f g h))
               4)))

The hamming distance for an SEC-DED Hamming (7,4) code is greater than or equal to 4, as stated in the above theorem. In other words, if the data words (a b c d) and (e f g h) are different, even at a single binary location, the codewords (hamming7-4-encode a b c d) and (hamming7-4-encode e f g h) must differ at least at 4 binary codeword locations.

Theorem 4.2

(defthm hamm-NO-ERROR2
  (implies (equal (hamming7-4-encode x3 x5 x6 x7)
                  (list x1 x2 x3 x4 x5 x6 x7 x8))
           (equal (hamming7-4-decode x1 x2 x3 x4 x5 x6 x7 x8)
                  (list x3 x5 x6 x7))))

In case of no error, the codeword must be correctly decoded into the data word, as described by the above theorem. So, assuming (list x1 x2 x3 x4 x5 x6 x7 x8) to be a codeword generated by encoding (x3 x5 x6 x7), decoding (x1 x2 x3 x4 x5 x6 x7 x8) must return the original data bits, i.e., (x3 x5 x6 x7).

Theorem 4.3

(defthm hamm-SEC1
  (implies (equal 1
                  (count-element-mismatch-tlist
                    (hamming7-4-encode x3 x5 x6 x7)
                    (list x1 x2 x3 x4 x5 x6 x7 x8)))
           (equal (hamming7-4-decode x1 x2 x3 x4 x5 x6 x7 x8)
                  (list x3 x5 x6 x7))))

Theorem 4.4

(defthm hamm-SEC2
  (implies (and (booleanp a)
                (booleanp b)
                (booleanp c)
                (booleanp d)
                (equal (hamming7-4-encode a b c d)
                       (list e f g h i j k l)))
           (and (equal (hamming7-4-decode e f (not g) h i j k l)
                       (list a b c d))
                (equal (hamming7-4-decode e f g h (not i) j k l)
                       (list a b c d))
                (equal (hamming7-4-decode e f g h i (not j) k l)
                       (list a b c d))
                (equal (hamming7-4-decode e f g h i j (not k) l)
                       (list a b c d)))))
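The syndrome logic of Eq. (2) can be exercised in a short simulation. The Python sketch below is our own illustrative rendering (function name and the bit ordering [p1, p2, d3, p4, d5, d6, d7, p8] are assumptions, not the ACL2 decoder): it corrects any single-bit error and flags double errors.

```python
def hamming74_decode(c):
    """Decode an 8-bit codeword ordered [p1,p2,d3,p4,d5,d6,d7,p8].

    Corrects any single-bit error via the syndrome; returns None
    when a double error is detected (overall parity even, but the
    syndrome is non-zero)."""
    p1, p2, d3, p4, d5, d6, d7, p8 = c
    s1 = p1 ^ d3 ^ d5 ^ d7
    s2 = p2 ^ d3 ^ d6 ^ d7
    s4 = p4 ^ d5 ^ d6 ^ d7
    s8 = p1 ^ p2 ^ d3 ^ p4 ^ d5 ^ d6 ^ d7 ^ p8   # overall parity check
    pos = 4 * s4 + 2 * s2 + s1                    # binary error location
    if s8 == 1 and pos != 0:                      # single error: flip it
        c = c.copy()
        c[pos - 1] ^= 1
    elif s8 == 0 and pos != 0:                    # double error: detect only
        return None
    return [c[2], c[4], c[5], c[6]]               # data bits d3,d5,d6,d7
```

Flipping any one of the 8 codeword bits still yields the original data word, mirroring Theorems 4.3 and 4.4; flipping two bits drives the decoder into the detection-only branch, mirroring the double-error detection property.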

Now, we formally verify the single error correction property of Hamming codes. We split the single error correcting property of the Hamming codes into two theorems: Theorem 4.3 states that if an error occurs in any of the parity bits of the codeword, it should be removed during decoding, whereas Theorem 4.4 describes that if an error occurs in any of the data bits of the codeword, it would be corrected by the decoder.

Theorem 4.5

(defthm hamm-DED
  (implies (and (booleanp a)
                ...
                (booleanp l)
                (equal 2
                       (count-element-mismatch-tlist
                         (hamming7-4-encode a b c d)
                         (list e f g h i j k l))))
           (equal (len (hamming7-4-decode e f g h i j k l))
                  1)))

Hamming codes are double-error detecting as well. A single-bit value is sufficient to indicate the presence of a double error. The double-error detection property of the Hamming codes is hence verified by the above theorem, which states that if a codeword (e f g h i j k l) has two errors, the presence of the errors is indicated by the decoder with a single-bit output, i.e., (equal (len (hamming7-4-decode e f g h i j k l)) 1).

6.2 Convolutional Codes

Initially acknowledged for error correction in communication networks [58], convolutional codes are now also being investigated for ensuring memory chip reliability [25, 51]. In convolutional codes, each input (data) bit is used to generate multiple output (codeword) bits during encoding, which in turn are used to retrieve the data bits again via decoding. As expected of most error-correcting strategies used in communication networks, the coding and decoding procedures of convolutional codes generally happen sequentially over multiple clock cycles. However, combinatorial counterparts of the convolutional encoder and decoder are also available for use in memories [25]. Here, instead of considering the input as a stream of bits, it is considered as blocks of fixed length (i.e., the word length). The blocks act like overlapping sliding windows to perform the encoding and decoding operations. Output bits may be interleaved [50] to minimize data corruption in case of soft errors.

The convolutional codewords, or more specifically the code, since the length of the code is arbitrary, comprise only the output bits. The number of output bits determined using each input bit defines the Code Rate of the convolutional code. In this work, we have considered 1/2 rate convolutional codes, which produce two output bits for each input bit.

The encoding process involves determining the output bits using modulo-2 addition on the current and previous input bits, as dictated by the generator polynomial. The convolutional code we have chosen in this work uses the generator polynomials G1 = (111) and G2 = (101). Hence, the convolutional encoder, shown in Fig. 4, translates to a finite-state-machine (FSM), shown in Fig. 5, and generates the following output bit equations:

xn = dn−2 ⊕ dn−1 ⊕ dn
yn = dn−2 ⊕ dn    (3)

Definition 5a

(defun encode-xor (old data)
  (if (endp old)
      nil
    (let* ((xn (xor (xor (first old)
                         (second old))
                    (first data)))
           (yn (xor (first old)
                    (first data))))
      (cons (cons xn yn)
            (encode-xor (cdr old) (cdr data))))))

The (encode-xor) function implements the convolutional encoder equations described above, applying xor recursively until the entire data list old has been encoded, i.e., until (endp old) becomes true. The codeword bits xn and yn, generated at each recursion, are simultaneously concatenated to form a pair using (cons xn yn). Hence, the final codeword is an association list, where each element of the association list (pair list) is a pair of xn and yn corresponding to the data bit dn.

Definition 5b

(defun encoder-conv-xor (lst)
  (encode-xor (append '(nil nil) lst) lst))

The definition (encoder-conv-xor) appends two nils at the start of the data sequence lst, and sends it to the encoder definition (encode-xor).

For the decoder to start decoding at a known state, the encoding is generally initiated at State 00, i.e., with both dn−1 and dn−2 as nil, as implemented by (encoder-conv-xor). Likewise, the encoding terminates at State 00 as well, which is practically implemented in the encoder by appending two nil bits at the end of the data. In ACL2, we implemented this condition by ending our recursion when the sequence old, which has two more elements than the data list, ends. This process of starting and ending the encoding process at the same state is called tail-biting [25].

We followed the combinatorial decoder design described in [25]. For a 1/2 rate convolutional code, decoding the current data bit involves the use of 2 predictions of the data bit:

dna = yn−1 ⊕ yn ⊕ xn−1
dnb = xn+1 ⊕ yn+1    (4)

Definition 5c

(defun decode-xor (code)
  (if (endp (cdddr code))
      nil
    (cons (or (and (xor (xor (cdr (first code))
                             (cdr (second code)))
                        (car (first code)))
                   (xor (xor (xor (cdr (third code))
                                  (cdr (fourth code)))
                             (car (third code)))
                        (xor (car (fifth code))
                             (cdr (fifth code)))))
              (and (xor (car (third code))
                        (cdr (third code)))
                   (xor (xor (xor (cdr (third code))
                                  (cdr (fourth code)))
                             (car (third code)))
                        (not (xor (car (fifth code))
                                  (cdr (fifth code)))))))
          (decode-xor (cdr code)))))

Fig. 4 1/2 Rate Convolutional Encoder, with Generator Polynomials G1 = (111) and G2 = (101)
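The encoder definitions above can be paraphrased imperatively. The Python sketch below is a behavioral simulation under our own naming (booleans rendered as 0/1): the state is seeded with two leading zeros and flushed by two implicit trailing zeros, so the output has len(data) + 2 (x, y) pairs, matching the recursive encode-xor.

```python
def conv_encode(data):
    """1/2-rate convolutional encoder, G1 = (111), G2 = (101).

    Mirrors encode-xor: old = [0, 0] + data holds d(n-2) at
    position n, and indices past the end of a list read as 0."""
    old = [0, 0] + list(data)
    pairs = []
    for i in range(len(old)):
        d      = data[i] if i < len(data) else 0       # flush zeros
        d_prev = old[i + 1] if i + 1 < len(old) else 0
        x = old[i] ^ d_prev ^ d                        # G1 = (111)
        y = old[i] ^ d                                 # G2 = (101)
        pairs.append((x, y))
    return pairs
```

For example, a single impulse bit encodes to three pairs, the two extra pairs coming from the implicit state flush.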

In case of no error, the predictions dna and dnb must match, i.e., dna = dnb. A mismatch indicates the presence of an error. The erroneous bit is correctly restored by the use of the future predictions dn+2a and dn+2b:

dn+2a = yn+1 ⊕ yn+2 ⊕ xn+1
dn+2b = xn+3 ⊕ yn+3    (5)

Considering a single error condition, if dna ≠ dnb then either yn is corrupted or one of xn+1 or yn+1 is corrupted. If dn+2a = dn+2b, we can conclude that yn was the corrupted bit. However, if dn+2a ≠ dn+2b, the erroneous bit is identified in the next decoder cycle.

Definition 5d

(defun decoder-conv-xor (code)
  (decode-xor (append '((nil . nil)) code)))

Our decoder definitions in ACL2 are similar to the encoder definitions. The function (decoder-conv-xor) initializes the decoding operation at State 00, while (decode-xor) continues decoding as a sequence of boolean operations.
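The prediction-based selection of Eqs. (4) and (5) amounts to a small combinational function per decoded bit. The following Python rendering is an illustrative sketch under our own naming; the pair indexing mirrors decode-xor, with a (0, 0) pair prepended so decoding starts from state 00.

```python
def conv_decode(pairs):
    """Combinational decoder for the 1/2-rate code (cf. decode-xor).

    pairs is a list of (x, y) output pairs; indices past the end of
    the codeword read as zero."""
    code = [(0, 0)] + list(pairs)

    def at(i, j):                       # x (j=0) / y (j=1) of pair i
        return code[i][j] if i < len(code) else 0

    out = []
    for k in range(len(code) - 3):      # stop when < 4 pairs remain
        dna  = at(k, 1) ^ at(k + 1, 1) ^ at(k, 0)          # Eq. (4)
        dnb  = at(k + 2, 0) ^ at(k + 2, 1)
        d2a  = at(k + 2, 1) ^ at(k + 3, 1) ^ at(k + 2, 0)  # Eq. (5)
        d2b  = at(k + 4, 0) ^ at(k + 4, 1)
        diff = d2a ^ d2b                # future predictions disagree?
        out.append((dna & diff) | (dnb & (1 - diff)))
    return out
```

Flipping the x bit of the first codeword pair, for instance, leaves the decoded sequence unchanged: the corrupted prediction is voted out by the future predictions, which is the behavior the single error correction theorems below capture.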

The important behavioral properties of Convolutional codes that we verified are:

Theorem 5.1

(defthm len-encoder-xor
  (implies (true-listp lst)
           (equal (len (encoder-conv-xor lst))
                  (+ 2 (len lst)))))

The encoding process in 1/2 rate convolutional codes adds two extra (nil) bits to the end of the data bits, i.e., the terminating nils. Since this addition of the terminating nils was not explicit in our definition, the above property verifies that two extra bits have been added to our data. We already know that any number of extra bits at the end of the data sequence lst will be nil because we considered lst to be a true-list, i.e., (true-listp lst). The above theorem verifies that the length of the codeword (encoder-conv-xor lst) increases by 2, when compared to the length of the original list of data bits lst, due to the 2 pairs of output bits (xn+1 yn+1) and (xn+2 yn+2).

Theorem 5.2

(defthm min-hamm-dist-conv
  (implies (and (true-listp A)
                (true-listp B)
                (alistp (encoder-conv-FSM A))
                (alistp (encoder-conv-FSM (cons t B)))
                (alistp (encoder-conv-FSM (cons nil B))))
           (equal (count-element-mismatch
                    (append (encoder-conv-FSM A)
                            (encoder-conv-FSM (cons t B)))
                    (append (encoder-conv-FSM A)
                            (encoder-conv-FSM (cons nil B))))
                  5)))

As indicated in [17], the minimum hamming distance between codewords in a 1/2 rate convolutional code is 5. For any data bits (append A '(x) B), where x is a boolean variable, the codewords generated by (append (encoder-conv-FSM A) (encoder-conv-FSM (cons t B))) and (append (encoder-conv-FSM A) (encoder-conv-FSM (cons nil B))) will differ at 5 binary locations; this corresponds to a hamming distance of 5.

Theorem 5.3

(defthm decode-decoder
  (implies (alistp code)
           (equal (decode-xor code)
                  (cdr (decoder-conv-xor code)))))

Like encoding, the decoding is generally initiated at State 00. However, if decoding is not initiated at State 00, the first data bit will be lost during decoding. This loss of the data bit is indicated in the above theorem by (cdr (decoder-conv-xor code)), which simply represents that without the initial nil bits, the decoder will return all data bits except the first, i.e., (car (decoder-conv-xor code)).

Fig. 5 FSM corresponding to 1/2 Rate Convolutional Code, with Generator Polynomials G1 = (111) and G2 = (101)
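Because the code is linear over GF(2), the minimum hamming distance of Theorem 5.2 equals the smallest weight of a nonzero codeword, which can be checked by brute force for short data lengths. The sketch below is our own sanity check, assuming the same G1 = (111), G2 = (101) encoder with two flush zeros; it is a finite enumeration, not a proof like the ACL2 theorem.

```python
from itertools import product

def conv_encode_bits(data):
    """Flattened output bits of the 1/2-rate encoder, G1=(111), G2=(101)."""
    d = list(data) + [0, 0]                      # two flush zeros
    bits = []
    for n in range(len(d)):
        dn2 = d[n - 2] if n >= 2 else 0
        dn1 = d[n - 1] if n >= 1 else 0
        bits += [dn2 ^ dn1 ^ d[n], dn2 ^ d[n]]   # x_n, y_n
    return bits

# Linearity: the minimum distance of the code equals the smallest
# Hamming weight over all nonzero codewords.
dmin = min(sum(conv_encode_bits(d))
           for d in product([0, 1], repeat=6) if any(d))
```

The minimum is attained by the single-impulse input, whose codeword has weight 5, in line with the distance stated in Theorem 5.2.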

Theorem 5.4

(defthm no-error-conv1
  (implies (and (true-listp data)
                (boolean-listp data)
                (> (len data) 3))
           (equal (decode-xor (encode-xor data (cddr data)))
                  (cdddr data)))
  :instructions (:induct (:change-goal nil t) :prove :demote
                 (:dv 1 2 2 1 1) (:rewrite encode-cdr)
                 :up (:rewrite decode-cdr2)
                 :top (:dv 2 2 1)
                 (:rewrite decode-first-element-code2)
                 (:dv 1) (:rewrite lemma-decode-encode-once)
                 :top :prove :prove))

In case of no error, the data bits are correctly decoded from the codeword. This is represented formally by (equal (decode-xor (encode-xor data (cddr data))) (cdddr data)), where (cddr data) is the input data bits, excluding the initial nil bits, while (encode-xor data (cddr data)) is the codeword input to the decoder function decode-xor. We prove that the decoder output is (cdddr data) instead of simply data to eliminate the impact of dn−2 and dn−1, added to the input before encoding, and xn−1 and yn−1, supplied to the codeword before decoding.

The above proof goal makes use of ACL2's proof builder facility along with several previously proved lemmas, i.e., small, intermediate properties, to complete the proof procedure. As mentioned in Section 3, the proof builder commands are supplied to the defthm event using :instructions. :induct applies a suitable induction scheme to the proof goal, :dv dives to a clause/term inside the proof goal, :rewrite replaces the expression with an equivalent form using the indicated (proven) lemma, and :prove calls the ACL2 automatic prover. The lemmas used in the above theorem initially rewrite (encode-xor (cdr data) (cdddr data)), which is generated after the induction step, into a simpler form (cdr (decode-xor (encode-xor data (cddr data)))). Next, the lemma decode-first-element-code2 expands the simplified expression to an equivalent cons-based expression (cons BOOL (cdr (decode-xor (encode-xor data (cddr data))))), where BOOL represents the boolean relation of codeword bits given as follows:

((y0 ⊕ y1 ⊕ x0) · ((y2 ⊕ y3 ⊕ x2) ⊕ (x4 ⊕ y4))) + ((x2 ⊕ y2) · ¬((y2 ⊕ y3 ⊕ x2) ⊕ (x4 ⊕ y4)))    (6)

Finally, BOOL is rewritten into its correct data-based form (cadddr data) using lemma-decode-encode-once.

Theorem 5.5

(defthm decode-bitflip-encode-flg
  (implies (and (true-listp data)
                (boolean-listp data)
                (booleanp flg)
                (> (len data) 3)
                (zp n)
                flg)
           (equal (decode-xor
                    (bit-flip-pair n flg
                      (encode-xor data (cddr data))))
                  (cons CODE_ERR_X0
                        (cdr (decode-xor
                               (bit-flip-pair n flg
                                 (encode-xor data (cddr data))))))))
  :instructions ((:use lemma-decode-bitflip-encode-flg)
                 :demote (:dv 1 2)
                 (:rewrite equal-cons-car-cdr)
                 :top :prove :prove))

As a single codeword of a 1/2 rate convolutional code is composed of two bits, xn and yn, an error can occur in either of these two bit locations. The above theorem represents an equivalent form of the change in the codeword due to an error injected via (bit-flip-pair n flg (encode-xor data (cddr data))) at xn of the 0th location of the codeword. The error in bit x0 is ensured by the premises (zp n) and flg. Error injection rewrites the data bit decoded from the erroneous codeword bit as the boolean expression CODE_ERR_X0, and concatenates it with the remaining data bits using the ACL2 keyword cons. The expression CODE_ERR_X0 is mathematically represented as:

((y0 ⊕ y1 ⊕ x0) · ((y2 ⊕ y3 ⊕ x2) ⊕ (x4 ⊕ y4))) + ((x2 ⊕ y2) · ¬((y2 ⊕ y3 ⊕ x2) ⊕ (x4 ⊕ y4)))    (7)

Here again, the proof goal makes use of the proof-builder 7.1 Record-Based Memory Model commands and the lemma lemma-decode-bitflip- encode-flg, which expands car of (decode xor Records [22] are data storing structures that can be accessed (bit flip pair n flg (encode xor data by the user for reading as well as writing data in memories. (cddr data)))) into the expression CODE ERR X0, A record can hence be considered as a simple abstraction and equal-cons- car-cdr, which coalesces the car for the memory. Such memory models have two basic and cdr of the stated cons pair. Similar to the previous operations: load, which reads data from a specific memory theorem, the premise (> (len data) 3) ensures that location, and store, which writes new data into the memory. the number of data bits is greater than 3, discounting the Both of these operations are formalized in ACL2 [36]as additional bits dn−2, dn−1 and (xn−1 . yn−1) added for the follows: encoding and decoding procedures. 7.1.1 Load: Retrieving data from Memory Theorem 5.6 The load byte function accepts the memory model and a (def thm SEC − flg memory location, and returns the value at the given location (implies (and (true − listp data) in the specified memory. The memory is modeled as an (boolean − listp data)) association list with byte-sized values at each list location. (equal CODE ERR X0 Definition 6a (f ourth data)))) (def un load byte (n mem) The above theorem formally verifies the error-correction property when an error is injected at the codeword location (if (zp n) x0. The expression CODE ERR X0 models the error injected (car mem) at the location x0 of the codeword. The correct data bit, i.e., (load byte (− n 1) (cdr mem)))) (fourth data) is successfully retrieved in the presence of a single error at x0. A similar procedure (i.e., Theorem 5.5 where n represents the memory location and mem specifies and 5.6) can be easily applied to demonstrate single-error the memory model. 
The load byte recursively calls itself correction while the error is injected at the bit y0. until the memory location n reduces to zero, i.e., (zp n) becomes true. The required value from the memory is then As described, in the decoder Equations 4 and 5,the returned at this point, using (car mem). decoding of a single data bit requires five codeword bits. The procedure used for the verification of Theorem 5.6 can 7.1.2 Store: Entering data into the Memory be further extended to correct an error occurrence in any of these five codeword bits in the decoding process. Hence, The store byte function accepts the memory model, the this demonstrates the ACL2 formalization of the single error value that needs to be stored in the memory, and the memory correction property of the 1/2 rate convolutional codes. location where this value must be entered. The updated memory is then returned at the output.

7 Case Study: Record-Based Memory Definition 6b (def un store byte (n mem byte) In order to demonstrate the effectiveness of our proposed (if (zp n) formalization of ECCs, described in Section 4,wepresent a formal error analysis on a record-based byte addressable (cons byte (cdr mem)) memory, presented in [36]. We first describe the formal (cons (car mem) memory model and then utilize our ECCs formalization (store byte along the memory model, to detect and correct errors that (− n 1) (cdr mem) byte)))) may occur during read and write processes in the memory. The complete formalization framework [57], including its where n represents the memory location, mem specifies implementation on the memory model, consists of more the memory, and byte is the value that needs to be than 1600 lines of Common Lisp code, including both stored in the memory at the designated location. Just like defun and defthm events, which take approximately load byte, the recursion is performed on the variable n. 1300 seconds of verification time on a MacBook (1.6 GHz Once (zp n) becomes true, the ACL2 built-in function Intel Dual-Core i5 CPU). cons concatenates byte at the designated memory location. 658 J Electron Test (2020) 36:643–663
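The two memory operations can be paraphrased as purely functional list updates. The Python sketch below mirrors the semantics of the recursive ACL2 definitions using list slicing instead of recursion; the names and the choice of a plain list-of-bytes representation are our own.

```python
def load_byte(n, mem):
    """Read the value at location n (cf. load-byte)."""
    return mem[n]

def store_byte(n, mem, byte):
    """Functional update: return a new memory with `byte` at
    location n; the input memory is left untouched (cf. store-byte)."""
    return mem[:n] + [byte] + mem[n + 1:]
```

Because store_byte builds a fresh list, the original memory value stays available, which matches the applicative (side-effect-free) style of the ACL2 model.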

7.2 Byte-Addressable Memory

The load and store functions defined above can work with data of any size. A byte-addressable memory is the type of memory where the smallest element that is stored in or retrieved from the memory is 8-bits long. To ensure that our memory model is byte-addressable, a recognizer byte-alistp is defined, which ensures that each value in our memory is byte-sized only.

Definition 6c

(defun byte-alistp (mem)
  (cond ((atom mem) (equal mem nil))
        (t (and (consp (car mem))
                (equal (len (car mem)) 8)
                (byte-alistp (cdr mem))))))

This recursive function returns t if the memory model mem is entirely empty (i.e., (equal mem nil)) or each element of mem has a length of 8, i.e., each memory location holds one byte of value, as indicated by (equal (len (car mem)) 8).

7.3 Memory Properties

To ensure that the functions load-byte, store-byte, and byte-alistp fulfill the criteria of a memory model, the following properties are verified:

Theorem 6.1

(defthm load-store
  (implies (and (< n (len mem))
                (byte-alistp mem)
                (equal (len byte) 8))
           (equal (load-byte n (store-byte n mem byte))
                  byte)))

The above theorem formally verifies that data remains unchanged after storing it in and then retrieving it from the memory. It states that given a valid memory location (< n (len mem)) in a byte-addressable memory (byte-alistp mem), a byte of data remains unchanged after consecutive store and load operations, (load-byte n (store-byte n mem byte)).

Theorem 6.2

(defthm overwrite
  (implies (and (< n (len mem))
                (byte-alistp mem)
                (equal (len byte1) 8)
                (equal (len byte2) 8))
           (equal (store-byte n (store-byte n mem byte1) byte2)
                  (store-byte n mem byte2))))

Overwriting places new data in place of the old one. Hence, storing byte2 at the location where byte1 was already stored, i.e., (store-byte n (store-byte n mem byte1) byte2), is the same as storing byte2 at the memory location directly, i.e., (store-byte n mem byte2). For overwriting in a byte-addressable memory, both byte1 and byte2 must be of equal lengths, i.e., 8-bits long, as indicated by (equal (len byte1) 8) and (equal (len byte2) 8).

Theorem 6.3

(defthm store-changes-mem
  (implies (and (< n (len mem))
                (byte-alistp mem)
                (equal (len byte) 8)
                (not (equal byte (load-byte n mem))))
           (not (equal mem (store-byte n mem byte)))))

The given property indicates that if the memory location n did not already hold the value byte, storing byte at the memory location n, as shown by (store-byte n mem byte), will change the contents of the memory.
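For intuition, the laws of Theorems 6.1 and 6.2 can be checked exhaustively on a toy memory. Unlike the ACL2 theorems, the Python sketch below is only a finite sanity check over 2-bit "bytes" (a domain small enough to enumerate completely); all names are our own.

```python
from itertools import product

def load_byte(n, mem):
    return mem[n]

def store_byte(n, mem, b):
    return mem[:n] + [b] + mem[n + 1:]

# Exhaustively check the load/store laws on a 2-location memory of
# 2-bit "bytes".
bytes2 = [list(b) for b in product([0, 1], repeat=2)]
for cells in product(bytes2, repeat=2):
    mem = list(cells)
    for n in range(2):
        for b1 in bytes2:
            # Theorem 6.1 analogue: load after store returns the byte
            assert load_byte(n, store_byte(n, mem, b1)) == b1
            for b2 in bytes2:
                # Theorem 6.2 analogue: the second store wins
                assert (store_byte(n, store_byte(n, mem, b1), b2)
                        == store_byte(n, mem, b2))
```

The theorem prover establishes these laws for memories and bytes of every size at once, which is exactly what such an enumeration cannot do.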

Theorem 6.4

(defthm copy-paste-in-mem
  (implies (and (< n1 (len mem))
                (< n2 (len mem))
                (not (equal n1 n2))
                (byte-alistp mem)
                (equal (len (load-byte n1 mem)) 8))
           (equal (load-byte n2
                             (store-byte n2 mem
                                         (load-byte n1 mem)))
                  (load-byte n1 mem))))

Copying data from the memory location n1 and pasting it at the memory location n2 produces a replica of the information at the two memory locations. This is represented by the above theorem. Both n1 and n2 must be valid memory locations, i.e., (< n1 (len mem)) and (< n2 (len mem)), for the property to hold true. Since store-byte writes one byte of data at a time, an additional constraint that ensures only a single byte of data is copied from the location n1 is included in the premises of the theorem as (equal (len (load-byte n1 mem)) 8).

7.4 Implementation of Hamming Codes on the Memory Model

The Hamming (7,4) codes, discussed in the previous section, generate an 8-bit codeword for 4 data bits. Hence, they are well-suited for a byte-addressable memory that stores and retrieves byte-sized codewords. We implemented the Hamming encoder and decoder definitions along with the memory load and store definitions to demonstrate that the operation of Hamming codes is unchanged even when implemented on a memory model. This is indicated by the following properties:

Theorem 7.1

(defthm mem-noerror-hamm
  (implies (and (< n (len mem))
                (byte-alistp mem)
                (equal (len (list e f g h i j k l)) 8)
                (booleanp a)
                (booleanp b)
                (booleanp c)
                (booleanp d)
                (equal (hamming7-4-encode a b c d)
                       (list e f g h i j k l)))
           (equal (hamming7-4-decode ...
                    (load-byte n
                      (store-byte n mem
                        (list e f g h i j k l))))
                  (list a b c d))))

The above theorem indicates that given a byte-addressable memory mem, a valid memory location n, and a boolean codeword (list e f g h i j k l) generated from the data bits (a b c d) using the hamming encoder (hamming7-4-encode a b c d), the codeword stored in the memory is retrieved in its correct form in case of no error. The retrieved codeword decodes to the correct data bits (list a b c d). The proof procedure makes use of Theorem 6.1 (omitted in the text above for the sake of simplicity) to rewrite all instances of (load-byte n (store-byte n mem (list e f g h i j k l))) by (list e f g h i j k l).

Theorem 7.2

(defthm mem-single-error-hamm
  (implies (and (< n (len mem))
                (byte-alistp mem)
                (booleanp a)
                (booleanp b)
                (booleanp c)
                (booleanp d)
                (equal (hamming7-4-encode a b c d)
                       (list e f g h i j k l)))
           (... (implies
                  (equal (len (list e f g h i j k (not l))) 8)
                  (equal (hamming7-4-decode ...
                           (load-byte n
                             (store-byte n mem
                               (list e f g h i j k (not l)))))
                         (list a b c d)))
                ...)))

In case of a single-bit error, the codeword retrieved from the memory contains the error, i.e., if (list e f g h i j k l) is the correct codeword, an error may cause (list e f g h i j k (not l)) to be stored in the memory instead. However, the hamming decoder can extract the correct data bits from a codeword containing a single-bit error.

7.5 Implementation of Convolutional Codes on the Memory Model

The 1/2 rate convolutional codes, discussed in Section 6.2, can work with chunks of data of an arbitrary number of bits. To use our model of convolutional codes with the byte-addressable memory model, we assume the data to be 4-bits long, hence generating an 8-bit codeword.

Theorem 7.3

(defthm mem-noerror-conv
  (implies (and (< n (len mem))
                (byte-alistp mem)
                (equal (len (cdddr data)) 8)
                (true-listp data)
                (boolean-listp data)
                (> (len data) 3))
           (equal (decode-xor
                    (load-byte n
                      (store-byte n mem
                        (encode-xor data (cddr data)))))
                  (cdddr data))))

In the absence of any error, a codeword formed by the convolutional encoder encode-xor is correctly retrieved from the memory, using Theorem 6.1. Also, the retrieved codeword is decoded into the correct data bit sequence (cdddr data), as dictated by Theorem 5.4.

Our case study verifies that the operation of any ECC is memory/technology independent: the memory model does not affect the inherent behavior of the ECC. The error detection and correction properties of an ECC remain consistent when the ECC is used on a byte-addressable memory model. Moreover, unlike the traditional simulation-based ECC analysis, our proposed approach can be easily extended to the verification of ECCs formalized for much larger memories.

8 Comparison to Existing Works

As briefly highlighted in Section 3, an extension of the SSReflect library based on the Coq theorem prover was proposed to provide a formalization of ECCs, for ensuring error-free communication [1-4]. While the approach provided substantial efforts to ensure the "theoretical" reliability of ECCs deployed in noisy communication channels, the framework is difficult to visualize in the hardware implementation of ECCs. Hence, it remains inadequate to provide "practical" reliability guarantees for ECCs.

Similarly, the Cotoleta library, formalized in the Lean theorem prover for coding theory, provides a formalization of the Hamming code [26]. However, the scope of this work is focused on providing a purely mathematical implementation, as opposed to the hardware implementation of the block codes.

In contrast, our framework provides a more hardware-realizable approach for ECC formalization. Our libraries formalize ECC encoders and decoders using boolean operators, which can be visualized as gate-level hardware semantics. Hence, it provides a meaningful verification approach for ECCs in practical applications. Unlike the existing works [2, 4], where errors were considered as a probabilistic noise model, our framework formalizes errors in terms of a fault injection model, where an error can occur at "any" bit location with a 100% probability. This captures the behavior of soft errors in memories more closely, and aligns with our motivation of verifying ECCs deployed in memories. Moreover, the previous works focused on the formalization of solely the block codes [1-4, 26]. Our framework, on the other hand, provides an ECC library catering to both block and convolutional codes.

9 Conclusion and Future Directions

Simulation has been the most widely opted testing methodology for ensuring the reliability of any given ECC used with a memory. Similarly, model checking has also been experimented with for analyzing the reliability of the data in memories. However, both of them show limitations when analyzing large memory models. On the other hand, our proposed theorem proving-based verification approach provides formally verified ECC properties on information bits/codewords of fixed and arbitrary lengths. We believe that our formal framework provides an essential stepping stone to the verification of ECCs used for memories.

In this research, we utilized ACL2 to formally analyze the encoding, decoding, error detection, and error correction properties of the Hamming and Convolutional codes. We also used a record-based, byte-addressable memory abstraction to establish that our ECC models ensure data correctness even when implemented on a memory.

Hamming or convolution codes also form the basis of several newer ECCs. For instance, the two-dimensional matrix codes, in simple terms, are only the implementation of Hamming codes to the rows of the data-bits matrix, while its columns use simple parities [6]. Similarly, the widely acknowledged Turbo codes are an extension of concatenated Convolution codes [11]. This provides an edge for the improvement and enhancement of our framework. Our existing libraries can be used as the basis for the verification of more advanced ECCs. This will ultimately lead to a more extensive framework for the verification of ECCs not only for memories, but also for the communication systems that widely employ ECCs. Furthermore, the framework can also act as a cog in the wheel for an all-rounded memory verification for real-world applications.

References

1. Affeldt R, Garrigue J (2015) Formalization of Error-Correcting Codes: from Hamming to modern coding theory. In: Urban C, Zhang X (eds) Proceedings of International Conference on Interactive Theorem Proving. Springer, LNCS, vol 9236, pp 17-33
2. Affeldt R, Garrigue J (2015) Formalization of Error-Correcting Codes using SSReflect. MI Lect Note Ser 61:76-78
3. Affeldt R, Garrigue J, Saikawa T (2016) Formalization of Reed-Solomon Codes and progress report on formalization of LDPC Codes. In: Proceedings of International Symposium on Information Theory and Its Applications. IEEE, pp 532-536
4. Affeldt R, Garrigue J, Saikawa T (2020) A library for formalization of linear Error-Correcting Codes. Journal of Automated Reasoning, pp 1-42
5. Arbel E, Koyfman S, Kudva P, Moran S (2014) Automated detection and verification of parity-protected memory elements. In: Proceedings of International Conference on Computer-Aided Design. IEEE, pp 1-8
6. Argyrides C, Pradhan DK, Kocak T (2011) Matrix codes for reliable and cost efficient memory chips. Trans VLSI Syst 19(3):420-428
7. Argyrides CA, Reviriego P, Pradhan DK, Maestro JA (2010) Matrix-Based Codes for adjacent error correction. Trans Nuclear Sci 57(4):2106-2111
8. Baarir S, Braunstein C, Encrenaz E, Ilié JM, Mounier I, Poitrenaud D, Younes S (2011) Feasibility analysis for robustness quantification by symbolic model checking. Formal Methods Syst Des 39(2):165-184
9. Ballarin C, Paulson LC (1999) A Pragmatic Approach to Extending Provers by Computer Algebra, with Applications to Coding Theory. Fund Inf 39(1,2):1-20
10. Baumann R (2005) Soft errors in advanced computer systems. Des Test Comput 22(3):258-266. https://doi.org/10.1109/MDT.2005.69
11. Berrou C, Glavieux A, Thitimajshima P (1993) Near Shannon limit error-correcting coding and decoding: Turbo-codes. In: Proceedings of International Conference on Communications, vol 2. IEEE, pp 1064-1070
12. Binder D, Smith EC, Holman AB (1975) Satellite anomalies from galactic cosmic rays. Trans Nuclear Sci 22(6):2675-2680. https://doi.org/10.1109/TNS.1975.4328188
13. Bollig B, Wegener I (1996) Improving the variable ordering of OBDDs is NP-complete. Trans Comput 45(9):993-1002
14. Boyer RS, Goldschlag DM, Kaufmann M, Moore JS (1991) Functional instantiation in First-Order logic. Academic Press, pp 7-26
15. Burlyaev D, Fradet P, Girault A (2014) Verification-guided voter minimization in triple-modular redundant circuits. In: Proceedings of Design, Automation & Test in Europe Conference & Exhibition. IEEE, pp 1-6
16. Cabodi G, Murciano M (2006) BDD-based Hardware Verification. In: Proceedings of International Conference on Formal Methods for the Design of Computer, Communication, and Software Systems. Springer, pp 78-107
17. Can B, Yomo H, De Carvalho E (2006) Hybrid Forwarding Scheme for Cooperative Relaying in OFDM based Networks. In: Proceedings of International Conference on Communications, vol 10. IEEE, pp 4520-4525
18. Clarke EM (1997) Model checking. In: Proceedings of International Conference on Foundations of Software Technology and Theoretical Computer Science. Springer, pp 54-56
19. Clarke EM, Wing JM (1996) Formal methods: State of the art and future directions. Comput Surv 28(4):626-643
20. Cota E, Lima F, Rezgui S, Carro L, Velazco R, Lubaszewski M, Reis R (2001) Synthesis of an 8051-Like Micro-Controller tolerant to transient faults. J Electron Test 17(2):149-161
21. Das A, Touba NA (2018) Low Complexity Burst Error Correcting Codes to Correct MBUs in SRAMs. In: Proceedings of Great Lakes Symposium on VLSI. ACM, pp 219-224
22. Davis J (2006) Memories: Array-like records for ACL2. In: Proceedings of International Workshop on the ACL2 Theorem Prover and Its Applications. ACM, pp 57-60
23. Fey G, Sulflow A, Frehse S, Drechsler R (2011) Effective robustness analysis using bounded model checking techniques. Trans Comput-Aided Des Integr Circ Syst 30(8):1239-1252
24. Frehse S, Fey G, Suflow A, Drechsler R (2009) Robustness Check for Multiple Faults using Formal Techniques. In: Proceedings of Euromicro Conference on Digital System Design, Architectures, Methods and Tools. IEEE, pp 85-90
25. Frigerio L, Radaelli MA, Salice F (2008) Convolutional Coding for SEU mitigation. In: Proceedings of European Test Symposium. IEEE, pp 191-196
26. Hagiwara M, Nakano K, Kong J (2016) Formalization of Coding Theory using Lean. In: Proceedings of International Symposium on Information Theory and Its Applications. IEEE, pp 522-526
27. Hamming RW (1950) Error detecting and error correcting codes. Bell Syst Techn J 29(2):147-160
28. Han H, Touba NA, Yang JS (2017) Exploiting unused spare columns and replaced columns to enhance memory ECC. Trans Comput-Aided Des Integr Circ Syst 36(9):1580-1591
29. Hasan O, Tahar S (2015) Formal verification methods. In: Encyclopedia of Information Science and Technology, 3rd edn. IGI Global, pp 7162-7170
30. Hentschke R, Marques F, Lima F, Carro L, Susin A, Reis R (2002) Analyzing area and performance penalty of protecting different digital modules with hamming code and triple modular redundancy. In: Proceedings of Integrated Circuits and Systems Design. IEEE, pp 95-100
31. Höller A, Kajtazovic N, Preschern C, Kreiner C (2014) Formal fault tolerance analysis of algorithms for redundant systems in early design stages. In: Proceedings of Software Engineering for Resilient Systems. Springer, pp 71-85
32. Hsiao MY (1970) A class of optimal minimum Odd-Weight-Column SEC-DED codes. IBM J Res Dev 14(4):395-401
33. Hussein J, Swift G (2015) Mitigating Single-Event Upsets. Xilinx White Paper WP395 (v1.1). Available at: https://www.xilinx.com/support/documentation/white_papers/wp395-Mitigating-SEUs.pdf
34. Isabelle Theorem Prover (2020) Available at: https://isabelle.in.tum.de/
35. Jiang JHR, Lee CC, Mishchenko A, Huang CY (2010) To SAT or not to SAT: Scalable exploration of functional dependency. Trans Comput 59(4):457-467
36. Kaufmann M, Sumners R (2002) Efficient Rewriting of Operations on Finite Structures in ACL2
37. Kaufmann M, Moore JS, Manolios P (2000) Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers
38. Kaufmann M, Moore JS, Ray S, Reeber E (2009) Integrating external deduction tools with ACL2. J Appl Log 7(1):3-25

39. Kchaou A, Youssef WEH, Tourki R, Bouesse F, Ramos P, Velazco R (2016) A deep analysis of SEU consequences in the internal memory of LEON3 processor. In: Proceedings of Latin-American Test Symposium. IEEE, pp 178–178. https://doi.org/10.1109/LATW.2016.7483358
40. Kong J, Webb DJ, Hagiwara M (2018) Formalization of insertion/deletion codes and the Levenshtein metric in Lean. In: Proceedings of Information Theory and Its Applications. IEEE, pp 11–15
41. Krautz U, Pflanz M, Jacobi C, Tast HW, Weber K, Vierhaus HT (2006) Evaluating Coverage of Error Detection Logic for Soft Errors using Formal Methods. In: Proceedings of Design, Automation & Test in Europe Conference, vol 1. IEEE, pp 1–6. https://doi.org/10.1109/DATE.2006.244062
42. Lean Theorem Prover (2020) Available at: https://leanprover.github.io/
43. Leveugle R (2005) A new approach for early dependability evaluation based on formal property checking and controlled mutations. In: Proceedings of International On-Line Testing Symposium. IEEE, pp 260–265
44. Lin S, Costello DJ (1983) Coding for reliable digital transmission and storage. Prentice-Hall, chap 1, pp 1–14
45. Lin S, Costello DJ (1983b) Error Control Coding: Fundamentals and Applications. Pearson-Prentice Hall
46. Lvov A, Lastras-Montano LA, Paruthi V, Shadowen R, El-Zein A (2012) Formal verification of error correcting circuits using computational algebraic geometry. In: Proceedings of Formal Methods in Computer-Aided Design. IEEE, pp 141–148
47. May TC, Woods MH (1979) Alpha-Particle-Induced Soft Errors in dynamic memories. Trans Electron Dev 26(1):2–9. https://doi.org/10.1109/T-ED.1979.19370
48. Nicolaidis M (1999) Time redundancy based soft-error tolerance to rescue nanometer technologies. In: Proceedings of VLSI Test Symposium. IEEE, pp 86–94
49. Pierre L, Clavel R, Leveugle R (2009) ACL2 for the Verification of Fault-tolerance Properties: First Results. In: Proceedings of International Workshop on the ACL2 Theorem Prover and Its Applications. ACM, pp 90–99. https://doi.org/10.1145/1637837.1637852
50. Radke WH (2011) Fault-tolerant non-volatile integrated circuit memory. US Patent 8,046,542
51. Rastogi A, Agarawal M, Gupta B (2009) SEU Mitigation using 1/3 Rate Convolution Coding. In: Proceedings of International Conference on Computer Science and Information Technology. IEEE, pp 180–183
52. Sánchez-Macián A, Reviriego P, Maestro JA (2012) Enhanced detection of double and triple adjacent errors in hamming codes through selective bit placement. Trans Device Mater Reliab 12(2):357–362
53. Sánchez-Macián A, Reviriego P, Maestro JA (2014) Hamming SEC-DAED and extended hamming SEC-DED-TAED codes through selective shortening and bit placement. Trans Device Mater Reliab 14(1):574–576
54. Sánchez-Macián A, Reviriego P, Maestro JA (2016) Combined SEU and SEFI Protection for Memories using Orthogonal Latin Square Codes. Trans Circ Syst I: Reg Papers 63(11):1933–1943
55. Seshia SA, Li W, Mitra S (2007) Verification-guided Soft Error Resilience. In: Proceedings of Design, Automation & Test in Europe Conference & Exhibition. IEEE, pp 1–6
56. Slayman CW (2005) Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations. Trans Device Mater Reliab 5(3):397–404. https://doi.org/10.1109/TDMR.2005.856487
57. Verifi-ECC (2020) Available at: https://github.com/Mahum123/Verifi-ECC.git
58. Viterbi A (1971) Convolutional Codes and their Performance in Communication Systems. Trans Commun Technol 19(5):751–772
59. Zhang P, Muccini H, Li B (2010) A classification and comparison of model checking software architecture techniques. J Syst Softw 83(5):723–744
60. Ziegler JF, Curtis HW, Muhlfeld HP, Montrose CJ, Chin B, Nicewicz M, Russell CA, Wang WY, Freeman LB, Hosier P, LaFave LE, Walsh JL, Orro JM, Unger GJ, Ross JM, O'Gorman TJ, Messina B, Sullivan TD, Sykes AJ, Yourke H, Enger TA, Tolat V, Scott TS, Taber AH, Sussman RJ, Klein WA, Wahaus CW (1996) IBM Experiments in soft fails in computer electronics (1978–1994). IBM J Res Dev 40(1):3–18. https://doi.org/10.1147/rd.401.0003

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mahum Naseer received her B.E. degree in Electronics Engineering from NED University of Engineering and Technology and her M.S. degree in Electrical Engineering from the National University of Sciences and Technology (NUST), Pakistan, in 2016 and 2018, respectively. Her current research interests include reliability analysis of systems, error control coding, resilient systems, and formal methods for system verification.

Waqar Ahmad received his M.Phil. degree from Quaid-i-Azam University, Islamabad, Pakistan, in 2012 and his Ph.D. degree from the National University of Sciences and Technology (NUST), Islamabad, Pakistan, in 2017. He worked as a postdoctoral fellow at the Hardware Verification Group (HVG) of Concordia University, Montreal, Canada, for two years, from 2018 to 2019. He also volunteers part-time as a research associate with the HVG at Concordia University and the System Analysis and Verification (SAVe) Lab at NUST. His areas of interest include formal reasoning and the dependability analysis of safety-critical systems. He has published more than 20 research papers. He won the Young Researcher Award from the Heidelberg Laureate Forum (HLF-18), Germany, the Best Researcher Awards from the SAVe Lab in 2015 and 2016, and the Best Paper Award at WCE-11, London, UK. He is also a member of IEEE Young Professionals.

Osman Hasan received his BEng (Hons) degree from the University of Engineering and Technology, Peshawar, Pakistan, in 1997, and the MEng and PhD degrees from Concordia University, Montreal, Quebec, Canada, in 2001 and 2008, respectively. Before his PhD, he worked as an ASIC Design Engineer at LSI Logic from 2001 to 2004. He then worked as a postdoctoral fellow at the Hardware Verification Group (HVG) of Concordia University for one year, until August 2009. Currently, he is an Associate Professor and the Head of the Department of Electrical Engineering at the School of Electrical Engineering and Computer Science of the National University of Sciences and Technology (NUST), Islamabad, Pakistan. He is the founder and director of the System Analysis and Verification (SAVe) Lab at NUST, which mainly focuses on the design and formal verification of energy, embedded, and e-health related systems. He has received several awards and distinctions, including the Pakistan Higher Education Commission's Best University Teacher Award (2010) and Best Young Researcher Award (2011), and the President's Gold Medal for the best teacher of the university from NUST in 2015. Dr. Hasan is a senior member of the IEEE and a member of the ACM, the Association for Automated Reasoning (AAR), and the Pakistan Engineering Council.