Efficient Checker Processor Design
Saugata Chatterjee, Chris Weaver, and Todd Austin
Electrical Engineering and Computer Science Department
University of Michigan
{saugatac,chriswea,austin}@eecs.umich.edu

Abstract

The design and implementation of a modern microprocessor creates many reliability challenges. Designers must verify the correctness of large complex systems and construct implementations that work reliably in varied (and occasionally adverse) operating conditions. In our previous work, we proposed a solution to these problems by adding a simple, easily verifiable checker processor at pipeline retirement. Performance analyses of our initial design were promising; overall slowdowns due to checker processor hazards were less than 3%. However, slowdowns for some outlier programs were larger.

In this paper, we examine closely the operation of the checker processor. We identify the specific reasons why the initial design works well for some programs but slows others. Our analyses suggest a variety of improvements to the checker processor storage system. Through the addition of a 4k checker cache and an eight-entry store queue, our optimized design eliminates virtually all core processor slowdowns. Moreover, we develop insights into why the optimized checker processor performs well, insights that suggest it should perform well for any program.

1. Introduction

The enormous complexity of modern microprocessor design presents significant challenges in the verification of these systems. Architects and designers must ensure that designs operate correctly for all possible programs, and they must ensure that this correct functionality is maintained for all (including adverse) operating conditions. If they fail to meet these challenges, the repercussions can destroy profit margins, products, and even companies. In the worst case, failure to meet these challenges could even result in loss of life.

To avoid releasing faulty parts, designers spend considerable effort on functional and electrical verification. Unfortunately, the complexity of modern microprocessors makes this verification process both incomplete and imprecise. The test spaces of stateful designs are simply too immense to fully test, necessitating the development of ad hoc test generation and coverage analysis tools to point to where to look for bugs, and when to stop looking. Moreover, the lack of any formality in system definition often leaves verification teams with a hazy definition of correctness. Formal verification [8] of a system works to increase test space coverage by proving a design is correct, either through model equivalence or assertion. The approach is significantly more efficient than simulation-based testing, as a single proof can verify correctness over large portions of a design's state space. However, complex modern pipelines with imprecise state management, out-of-order execution, and aggressive speculation are too stateful or incomprehensible to permit complete formal verification.

To further complicate verification, new reliability challenges are materializing in deep submicron fabrication technologies (i.e., process technologies with minimum feature sizes below 0.25um). Finer feature sizes are generally characterized by increased complexity, more exposure to noise-related faults, and interference from single event radiation (SER). It appears the current advances in verification (e.g., formal verification, model-based test generation) are not keeping pace with these challenges. As a result, design teams are growing larger, development costs are increasing, and time-to-market lengthens. Without significant advances in the quality and speed of verification, the burden of verification will likely slow the rate at which designers can create higher-performance computing devices, a significant source of value in our industry.

1.1. Dynamic Verification

Recently we proposed the use of dynamic verification to reduce the burden of verification in complex microprocessor designs [1,2]. Dynamic verification is an online instruction checking technique that stems from the simple observation that speculative execution is fault tolerant. Consider, for example, a branch predictor that contains a design error, e.g., the predictor array is indexed with the most significant bits of the PC (instead of the least significant PC bits). The resulting design, even though the branch predictor contained a design error, would operate correctly. The only effect on the system would be significantly reduced branch predictor accuracy (many more branch mispredictions) and accordingly reduced system performance. From the point of view of a correctly designed branch predictor check mechanism, a bad prediction from a broken predictor is indistinguishable from a bad prediction from a correct predictor design. Moreover, predictors are not only tolerant of permanent errors (e.g., design errors), but also transient errors (e.g., noise-related faults or natural radiation particle strikes).

Given this observation, the burden of verification in a complex design can be decreased by simply increasing the degree of speculation. Dynamic verification does this by pushing speculation into all aspects of core program execution, making the architecture fully speculative. In a fully speculative architecture, all processor communication, computation, control and forward progress is speculative. Accordingly, any permanent (e.g., design error, defect, or failure) and transient (e.g., noise-related) faults in this speculation do not impact correctness of the program. Figure 1 illustrates the approach.

[Figure 1. Dynamic Verification Architecture: the core processor pipeline (IF, ID, REN, REG, SCHEDULER, EX/MEM) delivers a prediction stream — insts, inputs, addresses, results — to the checker processor stages (CHK, CT), which commit non-speculative inputs to the architected register/memory state.]

To implement dynamic verification, a microprocessor is constructed using two heterogeneous internal processors that execute the same program. The core processor is responsible for pre-executing the program to create the prediction stream. The prediction stream consists of all executed instructions (delivered in program order) with their input values and any memory addresses referenced. In a baseline design, the core processor is identical in every way to the traditional complex microprocessor core up to (but not including) the retirement stage. In this baseline design, the complex core processor is "predicting" values because it may contain latent bugs that could render these values incorrect.

The checker processor follows the core processor, verifying the activities of the core processor by re-executing all program computation in its wake. The checker processor is assumed to be correct since its simple design lends itself to easy verification (including formal verification). The high-quality stream of predictions from the core processor serves to simplify the design of the checker processor and speed its processing. Pre-execution of the program on the complex core processor eliminates all the processing hazards (e.g., branch mispredictions, cache misses, and data dependencies) that slow simple processors and necessitate complex microarchitectures. In the event the core produces a bad prediction value (e.g., due to a design error), the checker processor will detect the bad value and flush all internal state from the core processor and restart it after the errant instruction. Once restarted, the core processor will resynchronize

computation, reducing the verification of the core to simply the process of locating and fixing commonly occurring design errors that could adversely impact system performance. Moreover, the simplicity of the checker processor design (which must be completely correct) lends itself to high-quality functional and electrical verification. In addition, dynamic verification may render other benefits and opportunities in the design of complex microprocessors. A number of promising directions that we are currently exploring (additional details are available in [1,2]) include: reduced time-to-market and design cost, SER and transient fault tolerance, aggressive core circuitry implementations, and reduced core processor complexity.

1.2. Contributions of this Paper

In this paper, we examine in detail the performance of our initial checker processor design. We found that for many programs, slowdowns from the checking process are minimal, but for others (especially floating point codes) slowdowns were non-trivial. We attribute the primary source of these slowdowns to a) core processor decoder stalls because of checker processor backpressure at retirement, b) storage hazards created as the core and checker processor compete for storage access ports, and c) cache misses experienced by the checker pipeline. Our analyses suggest the addition of a dedicated checker processor register file and store queue to relieve any retirement backpressure. Remaining storage hazards and checker processor cache misses are eliminated with the addition of a dedicated checker processor cache. Using the core processor as a (near) oracle prefetcher, a dual ported 4k checker processor cache has virtually no misses and provides sufficient bandwidth for checker processor accesses. The resulting design demonstrates that online instruction checking can be implemented with no slowdowns, and moreover, our results suggest that the checker should perform well for any program.

In Section 2 we give a brief description of the base-
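The checking discipline described above — the checker re-executes each retired instruction from the core's prediction stream and, on a mismatch, flushes and restarts the core after the errant instruction — can be illustrated with a minimal sketch. This is not the paper's implementation; the names (`PredictionEntry`, `check_stream`) and the toy three-operation ALU are hypothetical stand-ins for a full ISA.

```python
# Hypothetical sketch of the dynamic verification check loop; names and the
# toy ALU are illustrative assumptions, not taken from the paper's design.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Minimal ALU the checker re-executes; a real checker covers the whole ISA.
ALU: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & 0xFFFFFFFF,
    "sub": lambda a, b: (a - b) & 0xFFFFFFFF,
    "xor": lambda a, b: a ^ b,
}

@dataclass
class PredictionEntry:
    """One prediction-stream record: instruction, its input values (captured
    by the core in program order), and the core's possibly buggy result."""
    op: str
    inputs: Tuple[int, int]
    result: int

def check_stream(stream: List[PredictionEntry]) -> int:
    """Re-execute each retired instruction and compare against the core's
    predicted result. Returns the index of the first errant instruction,
    or -1 if the whole stream checks out. On a mismatch, a real design
    would flush the core's internal state and restart it after this
    instruction, committing the checker's correct result instead."""
    for i, entry in enumerate(stream):
        correct = ALU[entry.op](*entry.inputs)
        if correct != entry.result:
            return i  # bad core prediction detected: flush and restart here
    return -1
```

Because the core has already resolved branches and fetched operands, the checker's loop never stalls on mispredictions or dependencies; it only compares. For example, if a buggy core reported `add(2, 2) = 5`, `check_stream` would flag that entry and leave earlier, correct entries committed.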