The Bi-Mode Branch Predictor

The B&Mode Branch Predictor Chih-Chieh Lee, I-Cheng K. Chen, and Trevor N. Mudge EECS Department, University of Michigan 1301 Beal Ave., Ann Arbor, Michigan 48109-2122 {leecc, icheng, tnm} @eecs.umich.edu Abstract been pushed above 90%. As a result, two-level dynamic branch predictors have been incorporatedin several rcccnt Dynamic branch predictors are popular because they high-performance microprocessors. Perhaps the best can deliver accurate branch prediction without changes to known examples,at the time of writing, are the Pentium Pro the instruction set architecture or pre-existing binaries. [Gwennap and Alpha 21264 [Gwennap96]. However, to achieve the desiredprediction accuracy, exist- Among two-level predictors, those using global history ing dynamic branch predictors require considerable schemeshave beenshown to yield the best performancefor amounts of hardware to minimize the integerence effects integer benchmarks[YehPatt93]. However, to achieve high due to aliasing in the prediction tables. We propose a new levels of accuracy, current dynamic branch predictors rc- dynamic predictor, the bi-mode predictor, which divides the quire considerableamounts of hardwarebecause their most prediction tables into two halves and, by dynamically deter- significant weakness,the destructive aliasing problem, is mining the current “mode” of the program, selects the ap- most easily solved by increasing the size of the predictors propriate half of the table for prediction. This approach is [SechrestLeeMudge96].This paper proposesa new tech- shown to preserve the merits of global history basedpredic- nique, the bi-mode branchpredictor, that is economical and tion while reducing destructive aliasing and, as a result, im- simple enough to avoid critical timing paths. Furthermore, proving prediction accuracy. Moreover, it is simple enough we demonstratethat on the IBS and SPECCINT95 bench- that it does not impact a processor’s cycle time. We con- marks the bi-mode predictor performs on average better clude by conducting a comprehensive study into the mech- than gshare,one of the best global history basedpredictors, anism underlying two-level dynamic predictors and for the same cost. Finally, we conduct a comprehcnsivc investigate the criteria for their optimal designs. The anal- study into the mechanism underlying two-level dynamic ysis presented provides a general framework for studying predictors and investigate the criteria for their optimal dc- branch predictors. signs. The study explains why our proposed schemepcr- 1. Introduction forms well and provides a general framework for studying branch predictors. The ability to minimize stalls or pipeline bubbles that The report is organized into five sections. In section 2, may result from branchesis becoming increasingly critical we summarizethe aliasing problem, and then introduce our as microprocessordesigns implement greaterdegrees of in- solution for de-aliasing. Section 3 describesour simulation struction level parallelism. There are several techniquesfor methodology and presentsthe simulation results. In section reducing branch penaltiesincluding guardedexecution, ba- 4 we presentan analysis of aliasing in dynamic branch pre- sic block enlargement,and static and dynamic branch predictors that explains the source of the improved perfor- diction [PnevmatikatosSohi94, Hwu93, Smith81, mancefor the bi-mode predictor. Finally, in the conclusion FisherFreudenberger92, YehPatt91, PanSoRahmeh921. we proposefuture directions for this work. Among these, dynamic branch prediction is perhaps the 2. Aliasing and De-aliasing most popular, becauseit yields good results and can be im- plemented without changesto the instruction set architec- 2.1 The aliasing problem ture or pre-existing binaries. The strength of dynamic branch prediction is that it can Branch outcomesare not usually the result of random ac- track branch behavior closely at run-time, providing a de- tivities; most of the time they are correlated with past bc- gree of adaptivity that other approachesare lacking. This havior and the behavior of neighboring branches. By adaptivity is especially critical when behavior of branches keeping track of the history of branch outcomes,it is possi- can be affectedby the input data of different program runs. ble to anticipatewith a high degreeof certainty which direc- With the introduction of two-level schemes[YehPattgl], tion future brancheswill take. the prediction accuracy of dynamic branch predictors has However, current dynamic branchpredictors still exhibit 4 1072-445l/97$10.00@1997IEEE performancelimits. These are due in part to the restricted availability of information upon which to basepredictions, but more importantly due to shortcomingsof design, especially the way that branch outcome history is exploited. In current designs,dynamic predictors spendlarge amountsof hardware to memorize this branch outcome history. Each static (per-address)branch often has a biased behavior so that it is either usually taken or usually not-taken. This can be exploited by the conventional two-bit counter schemeto ‘l-l- , ChoicelPredictor predict future outcomesof a particular static branch. How- DirectioniTi +,A ever, two-bit counter schemesare limited becausebranches , may behavedifferently from their biasesunder somespecial conditions. These conditions are not difficult to recognize, t but recognition requires memory space. Therefore, to Final prediction for the branch achieve very high prediction accuracy,both the per-address bias and the special conditions need to be identified and Figure 1: Proposed branch prediction scheme memorizedby dynamic predictors. diagram Global history-the outcomes of neighboring branches-is a common way to identify special branch conditions. 2.2 Proposed branch prediction scheme Previous studieshave shown that the global history indexed schemesachieve good performanceby storing the outcomes The bi-mode branch predictor is aimedat the elimination of global history patternsin two-bit counters, e.g., the GAg of destructive aliasing in global history indexed schemes. and GAS schemes[PanSoRahmeh92, YehPatt921. Another This scheme, shown in Figure 1, splits the second-level way to identify special branch conditions is to use per-ad- two-bit counter table into two halves. Given a history pat- dresshistory-the past outcomesof a branch itself, such as tern, two counters, one from each half, are selected.We re- PAg and PASschemes [YehPatt91]. The per-addresshistory fer to these as the direction predictors. Meanwhile, another schemeis also shown to be effective, especially for loop-in- two-bit counter table, indexed by the branch addressesonly, tensive floating-point programs.However, as we noted ear- is used to provide a final selection for these two counters. lier, [YehPatt93] shows that, for integer programs, global The counter table providing selection will be referred to as history schemestend to perform better than per-addresshis- the choice predictor. The final prediction is then made by tory schemesbecause global schemescan make better pre- the state of the counter selectedfrom the direction predic- dictions for if-then-else branchesdue to their ability to track tors and, importantly, only the selectedcounter will be up- correlation with neighboring branches. dated with the branch outcome; the statusof the unselected Nevertheless,the global history schemeis still limited by one, will not be altered. The choice predictor is always up- destructive aliasing that occurs when two brancheshave the datedwith the branch outcome,except that when the choice same global history pattern, but opposite biases is opposite to the branch outcome but the selectedcounter [TalcottNemirovskyWood95,YoungGloySmithB.?i]. This is of the direction predictors makes a correct final prediction. not due to the limited availability of information, but to the This partial updatepolicy is particularly effective when the indexing method which does not discriminate between total hardwarebudget is small. brancheswith the sameglobal history patterns. Our proposed scheme can improve global history in- One proposal to overcome the destructive aliasing, dexed schemesbecause although global history patternsare gshare,randomizes the index by xor-ing the global history still kept in the second level tabIe, they are dynamically with the branch address [McFarling93]. It provides only classified before being stored. They are classified by a pre- limited improvement [SechrestLeeMudge96]. Recently, liminary prediction from the choice predictor which is sim- there have been several new proposals to reduce aliasing ply a conventional two-bit counter scheme,and, as such, problems [ChangEversPatt96, Sprangle97, typically can provide 80% or better prediction accuracy (MichaudSeznecUhlig971. The best of these with relatively modest cost. Thus, the bi-mode schemedi- lIvlichaudSeznecUhlig971 employs a hardware hashing vides branchesinto two groups accordingto the per-address scheme. A comparative study of these and the bi-mode bias of the choice predictor, and then usesthe global history schemecan be found in [Lee97]. The study showsthat hard- patterns to identify the special conditions for each of two ware hashing is useful for small low cost systems.For large groups separately.The effect of the choice predictor is to systems the bi-mode scheme is the best cost-effective separatethe destructive aliaseswhile keeping the harmless schemeto date. aliasestogether. 5 I ---- -- ___- 3. Experimental Results Benchmarks Input data

Load more