View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Texas A&M Repository

IMPACTOFDYNAMICVOLTAGESCALING(DVS)ONCIRCUIT

OPTIMIZATION

AThesis

by

CARLOSALBERTOESQUITHERNANDEZ

SubmittedtotheOfficeofGraduateStudiesof TexasA&MUniversity inpartialfulfillmentoftherequirementsforthedegreeof

MASTEROFSCIENCE

May2009

MajorSubject:Engineering

IMPACTOFDYNAMICVOLTAGESCALING(DVS)ONCIRCUIT

OPTIMIZATION

AThesis

by

CARLOSALBERTOESQUITHERNANDEZ

SubmittedtotheOfficeofGraduateStudiesof TexasA&MUniversity inpartialfulfillmentoftherequirementsforthedegreeof

MASTEROFSCIENCE

Approvedby:

ChairofCommittee, JiangHu CommitteeMembers, SunilKhatri DuncanM.Walker HeadofDepartment, CostasGeorghiades

May2009

MajorSubject:ComputerEngineering iii

ABSTRACT

ImpactofDynamicVoltageScaling(DVS)onCircuitOptimization.(May2009)

CarlosAlbertoEsquitHernandez,B.S.,UniversityoftheValleyofGuatemala

ChairofAdvisoryCommittee:Dr.JiangHu

Circuit designers perform optimization procedures targeting speed and power duringthedesignofacircuit.Gatesizingcanbeappliedtooptimizeforspeed,while

DualVT and Dynamic Voltage Scaling (DVS) can be applied tooptimizeforleakage and dynamic power, respectively. Both gate sizing and DualVT are designtime techniques,whichareappliedtothecircuitatafixedvoltage.Ontheotherhand,DVS isaruntimetechniqueandimpliesthatthecircuitwillbeoperatingatadifferentvoltage thanthatusedduringtheoptimizationphaseatdesigntime. After some analysis, the risk of noncritical paths becoming critical paths at runtime is detected under these circumstances. The following questions arise: 1) should we take DVS into account duringtheoptimizationphase?2)DoesDVSimposeanyrestrictionswhileperforming designtime circuit optimizations?. This thesis is a case study of applying DVS to a circuitthathasbeenoptimizedforspeedandpower,andaimsatansweringtheprevious twoquestions.

Weuseda45nmCMOSdesignkitandflow.Synthesis,placementandrouting, and timing analysis were applied to the benchmark circuit ISCAS’85 c432. Logical

EffortandDualVTalgorithmswereimplementedandappliedtothecircuittooptimize iv

for speed and leakage power, respectively. Optimizations were run for the circuit operatingatdifferentvoltages.Finally,theimpactofDVSoncircuitoptimizationwas studied based on HSPICE simulations sweeping the supply voltage for each optimization.

TheresultsshowedthatDVShadnoimpactongatesizingoptimizations,butit didonDualVToptimizations.Itisshownthatweshouldnotoptimizeatanarbitrary voltage.Moreover,simulationsshowedthatDualVToptimizationsshouldbeperformed atthelowestvoltagethatDVSisintendedtooperate,otherwisenoncriticalpathswill becomecriticalpathsatruntime.

v

Tomyparents,AngelEsquitandMariaHortensiaHernandez(†Dec.2000),whohave

beenthegreatestparentsIcouldimagineandhavesupportedmeineverywaypossible.

ToAngel,Ivan,Lorena,Sandra,andthewholefamily,whohavehelpedmeinsomeor

otherwaythroughmylifeandhavecontributedtomysuccess.

vi

ACKNOWLEDGEMENTS

Firstofall,Iwouldliketothankmyadvisorandcommitteechair,Dr.JiangHu, forhissupportandhelpthroughoutthecourseofthisresearch.Ialsowanttothankmy committee members, Dr. Sunil Khatri and Dr. Hank M. Walker for the priceless knowledgeacquiredthroughtheirinterestinglectureswhilebeingtheirstudenthereat

TexasA&MUniversity.

ThankstoallthestaffmembersattheDepartmentof Electrical and Computer

Engineering for helping me with all the procedures and paper work throughout my graduateprogram.

To my friends, inside and outside the Department of Electrical and Computer

Engineeringforhelpingmeinavarietyofwaysacademicallyandatthepersonallevel duringmytwoyearslivinghereinCollegeStation.

SpecialthankstomyfriendMSc.VictorCorderoforallthehelpwiththeVLSI tools, encouragement and even waking me up early in the mornings to push me into work!,thankstohimalsoforthebrainstormingswhichshapedtheexperimentalsetupof thisresearchproject.

Lastbutnotleast,Iwanttothankallthestaffmembersattheofficeofsponsored studentsatTexasA&MUniversity,fortheirinvaluableandprompthelpwithawide rangeofissues. vii

NOMENCLATURE

DVS DynamicVoltageScaling

VT Thresholdvoltage

VDD Supplyvoltage

LE LogicalEffort

HDL HardwareDescriptionLanguage

PDP PowerDelayProduct

RDV Relativedelayvariation viii

TABLEOFCONTENTS

Page

ABSTRACT ...... iii

DEDICATION...... v

ACKNOWLEDGEMENTS...... vi

NOMENCLATURE ...... vii

TABLEOFCONTENTS ...... viii

LISTOFFIGURES ...... x

1. INTRODUCTION...... 1

2. BACKGROUND...... 3

2.1 DualVTTechniques ...... 3 2.2 DynamicVoltageScaling...... 5 2.3 LogicalEffortandGateSizing...... 8

3. PROBLEMFORMULATION ...... 9 4. EXPERIMENTALSETUP ...... 13 4.1 FromHDLtoLayout...... 14 4.2 InitialCircuitandDVSRange...... 15 4.3 AutomationToolforHSPICESimulations...... 16 4.4 SpeedOptimization...... 17 4.5 LeakageOptimization...... 18 5. RESULTS...... 21 5.1SpeedOptimizationResults...... 21 5.2LeakageOptimizationResults...... 25 5.3DynamicVoltageScalingImpactonCircuitOptimizations...... 26 ix

Page

6. CONCLUSIONSANDFUTUREWORK...... 36

REFERENCES ...... 39

VITA...... 43 x

LISTOFFIGURES

FIGUREPage

1 Pathdelaydistributionofacircuitaftersynthesis...... 4 2 PathdistributionofdualV TandsingleV TCMOS...... 5 3 Performancescaling ...... 6 4 FrequencytovoltagefeedbackcontrolloopforDVS...... 7 5 Pathdelaydistributionofacircuitbeforeandaftersizeoptimization...... 8 6 RelativeCMOScircuitdelayvariation(simulated) ...... 10

7 Designflowusedforthecasestudy...... 13

8 PseudocodefortheDualVToptimizerimplementedinC++...... 19

9 Delayofthefourmostcriticalpathsafterplace&route ...... 22

10 Delaysafterspeedoptimization,normalizedtothedelaysafter place&route ...... 23 11 CircuitspeedupforthewholeDVSrangeafterLEoptimization ...... 23 12 Delayvs.powerbeforeandafterspeedoptimizations...... 24 13 PowerDelayProductbeforeandafterspeedoptimizations...... 24 14 LeakagepowersavingsafterapplyingDualVToptimizations...... 26 15 DelayofallpathsforDVSrangebeforespeedoptimizations ...... 27 16 DelayofallpathsforDVSrangeafterspeedoptimizations...... 28 17 DelayofallpathsforDVSrangebeforeandafterspeedoptimizations.... 28 18 DelayofallpathsforDVSrangeafterDualVT optimizations atV DD =0.7V ...... 30 xi

FIGUREPage

19 DelayofallpathsforDVSrangeafterDualVToptimizations atV DD =0.9V ...... 31 20 DelayofallpathsforDVSrangeafterDualVToptimizations atV DD =1.1V ...... 32 21 CircuitslowdownforDVSrangeafterspeedandpoweroptimizations.... 33 22 ImpactofDVSafterDualVToptimizationsatthelowestand highestsupplyvoltagesintheDVSrange...... 35 23 ImpactofDVSafterDualVToptimizationsatthelowestand middlesupplyvoltagesintheDVSrange ...... 35

1

1.INTRODUCTION

Powerisoneofthemajorconcernsintoday’s VLSI circuit design. Leakage power has become an important source of power consumption, and nowadays its magnitudeiscomparabletothatofdynamicpowerforlargedesignsandkeepsgrowing exponentiallywithdevicescaling[1].Differenttechniqueshavebeendevelopedtocope withbothdynamicandleakagepowerconsumption.DynamicVoltageScalingisawell know technique for reducing dynamic power, while DualVTisoneofthetechniques usedforreducingleakagepower.Traditionally,circuitdesignershavetargetedspeedas the variable to optimize,but current VLSI design requires optimizations not only for speed, but for power as well. Optimization for power targets not only to high performance powerhungry designs, but also to lowerend circuits for lowpower consumer electronics so that battery life is extended. Gate sizing is a common technique to optimize for speed, and is often used in combination with lowpower techniquessuchasDualVT.Optimizationtechniquesforspeedandpowerhavebeen extensivelyresearched.Thedesignspaceformanyofthesetechniquescomprisesanyof thefollowingdimensions:supplyvoltage(V DD ),thresholdvoltage(V T),andgatesize.

Sometechniquesoptimizeusingonlyoneofthosedimensions,whileothersuseeither twoorthreedimensions,targetingsolutionsclosertotheoptimumpowerconsumption subjecttodeterminedspeedrequirements.

______ This thesis follows the style and format of IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems. 2

Techniquesin[2,4]usethethresholdvoltageastheonlydimensionforpower optimization.Thedesignspacefor[7,8]comprisessupplyvoltageV DD andthreshold voltage V T. In [5, 6] both threshold voltage and gate sizing are used to perform optimizations.Moreelaboratealgorithmsusethesupplyvoltage,thresholdvoltageand gatesizingastheirdesignspaceforoptimizationsasin[3].

In current VLSI design, a sequence of tasks for speed improvement, leakage power reduction, and dynamic power reduction is a typical scenario for circuit optimization.InthisthesiswepresentacasestudyoftheimpactofapplyingDynamic

VoltageScalingtoacircuitthathasbeenoptimizedforspeedandleakagepowerusinga gate sizing technique and a DualVT algorithm, respectively. The remainder of this thesisisorganizedasfollows.Section2presentsthebackgroundconceptsthataremost relevant to the problem formulation in section 3. In section 4 we present the experimentalsetupand methodologyusedtoperform thecasestudy.Theresultsare presented and discussed in section 5. Finally, section 6 presents the conclusions and futurework.

3

2.BACKGROUND

2.1.DUALVTTECHNIQUES

TounderstandtheadvantageofusingDualVTtechniques,wefirstneedtolook at the leakage current of a MOSFET device. (1) expresses the leakage current of a nMOSFET,whereI ds0isthecurrentatthresholdandisdependentonanddevice geometry, nisaprocessdependenttermaffectedbythedepletionregioncharacteristics and vT isthethermalvoltage(26mVatroomtemperature)[9]. Vgs –V T Vds nv T vT Ids= Ids0 e 1– e (1) (1)showsthattheleakagecurrentcanbereducedexponentiallybyincreasingthe thresholdvoltageV T.Theleakagepowerexpressedin(2)isalsoreducedexponentially.

P Leakage =I dsVDD (2) Nextweneedtolookatthesimpleα–powermode· lforthedelayofaCMOS gate[8]:

C L·VDD (3) Td≈ α K (V –V ) DD T

WhereC Listheloadcapacitance, Kisafactorthatdependsontheprocessand thegatesize,andαtakesanyvaluebetweenone,forfullvelocitysaturation,andtwo, fornovelocitysaturationasinlongchanneldevices[8].(3)showsthatincreasingV T producesanincreaseinthedelayT dofthegate.DualVTalgorithmsworkbasedon(1) and(3).Aftersynthesis,circuitshaveapathdistributionsimilartotheoneshowninFig.

1. The traditional approach to V Tselectionsreliesontheobservationthatacircuit’s 4

overall performance if often limited by a few critical paths [10]. Many paths are unnecessaryfasterthan thecriticalpath(s).The concept of DualVT algorithms is to slow down those paths by increasing the threshold voltage V T in as many gates as possible,aslongasthepathsdon’tbecomeslowerthantheoriginalcriticalpath(s).All gatesincriticalpath(s)arekeptwithlowvTsothatthemaximumfrequencyofoperation isnotaltered.Allgateswithincreasedv T willhavelessleakagecurrent,accordingto(1), thusreducingthetotalleakagepowerofthecircuitwithoutanyperformancepenalty.

# paths of

Fig.1.Pathdelaydistributionofacircuitaftersynthesis[10]

Assumingthatwehaveacircuitthathasalreadybeenoptimizedforspeed,the idea of applying a DualVT algorithmistoequalizethedelaysofallthepaths in the circuitasmuchaspossiblebyassigninghighvTgatesinallnoncriticalpaths.Ifall gatesinthecircuitareassignedtohighvT(singlehighvT)thenthecriticalpathofthe circuit would become larger than the original before highvT assignments. One importantcharacteristicisthatDualVTCMOShasthesamecriticaldelayasthesingle 5

lowvTcircuit[11],butconsumeslessleakagepowerduetothegatesreassignedtohigh vtinnoncriticalpaths.Fig.2showsanexampleofthischaracteristicfora32bit.

Fig.2.PathdistributionofdualV TandsingleV TCMOS[11]

2.2.DYNAMICVOLTAGESCALING

Inadditiontoleakagereduction,DynamicVoltageScaling(DVS)iscommonly usedindesigntoachievesubstantial dynamic power reduction. DVS allows devices to dynamically change their speed and voltage, increasing the energy efficiencyoftheiroperation[12].(4)expressesthedynamicpowerconsumptionfora

CMOSgate,whereαisanactivityfactor,Cisthecapacitancebeingswitchedand fis the operating frequency. Reducing the supply voltage V DD will offer a quadratic reductionindynamicpower.Atthesametime,from(3),reducingthesupplyvoltage willalsoincreasethedelayofthegate,thisimpliesthatthefrequencyofoperationhasto besloweddowninordertoavoidsetuptimeviolations.

2 (4) Pdynamic =αCVDD f 6

Slowing down the frequency without voltage scaling is not useful from an energyefficient viewpoint, since the power savings are offset by an equal increase in executiontime[14].TheideaofDVSinamicroprocessoristovarythesupplyvoltage under software control to meet dynamically varying performance requirements [13].

ThegoalofDVSistosavepowerbyreducingtheperformanceofthewithout causinganapplicationtomissitsdeadlines,thisconceptisshowninFig.3,inAthe workload runs at full speed and finishes well in advance of its deadline. In B, the executionoftheworkloadisstretchedtoitsdeadline,whichallowsforenergysavings onprocessorsthatimplementvoltagescaling.Completingataskbeforeitsdeadlineand thenidlingislessenergyefficientthanrunningthetaskmoreslowlytobeginwith,and meetingitsdeadlineexactly[14].

Fig.3.Performancescaling[14].

TheimplementationofDVSrequiresalgorithms,termed voltage schedulers to determinetheoperatingspeedoftheprocessoratruntime.Differentvoltageschedulers havebeencomparedin[13].Basically,voltageschedulersworkattheoperatingsystem levelanddeterminehowmuchtheoperatingfrequencycanbesloweddownwhilestill 7

meetingperformancerequirements.Voltageschedulerscontroltheprocessorspeedby writingthedesiredclockfrequencytoacontrolregister.Theregister’svalueisusedby acontrollooponchiptoadjusttheCPUclockfrequencyandregulatedvoltage[15].A voltagecontrolled oscillator (VCO) in the control loop generates a clock signal fCLK whichisproportionaltothecriticalpathdelayofthecoreoverprocessandtemperature changes[16].Fig.4showsanexampleofacontrolloopusedbyDVS.Somethingvery importanttomentionisthatmostDVScontrolloopdesignsuseaninverterbaseddelay chainorringoscillatortomodelthecriticalpathofacircuit.Ithasbeenshownthatthe delayofaninverterbasedringoscillatorshouldaccuratelytrackthedelayofthecritical pathasshownin[17].Thedelaysdonotneedtobeexactlythesame,aslongasthey track each other. A gain factor in the control loop can adjust the delay of the ring oscillatortomatchthedelayofthecriticalpath.

Fig.4.FrequencytovoltagefeedbackcontrolloopforDVS[15]

8

2.3. LOGICALEFFORTANDGATESIZING

Logicaleffortisadesignprocedureforachievingtheleastdelayalongapathofa logicnetwork[18].Themethodoflogicaleffortisfoundedonasimplemodelofthe delay through a single MOS logic gate. The model describes delays caused by the capacitiveloadthatthe logic gatedrives andbythe topology of the logic gate. All optimizations with this method are performed by gate sizing. Logical effort is a powerful tool for speed optimization due to its effectiveness and simplicity; these characteristicsmakeitagoodoptionforuseinverydifferentscenarios.In[19],logical effortisusedtooptimizeforspeedinasynthesisalgorithm.Themethodhasalsobeen usedinsoftwaretoolsasin[24].Asectionaboutlogicaleffortandtransistorsizingis presentedin[9].Allthedetailsaboutthealgorithmanditstheoryarepresentedin[18].

Inadesignflow,amethodlikelogicaleffortwouldtypically beappliedafter synthesistooptimizeforspeed,alsoreferredtoas size optimization .Bydoingso,the pathdistributionwouldchangefromtheoneshowninFig.1,toatallerandnarrower distribution,bothdistributionsareshowninFig.5foreaseofcomparison.

# paths of # paths of

Fig.5.Pathdelaydistributionofacircuitbeforeandaftersizeoptimization[10] 9

3.PROBLEMFORMULATION

LetusstudyaVLSIdesignscenariowhereacircuitisfirstoptimizedforspeed usingthemethodoflogicaleffortforgatesizing.Thespeedoptimizedcircuitisthen optimized for leakage power using a DualVT algorithm. Finally, dynamic voltage scaling(DVS)isusedfordynamicpowerreductionatruntime.Bothlogicaleffortand

DualVT aredesigntimetechniques,whichareappliedtothecircuitatafixedvoltage.

On the other hand, DVS is a runtime technique and implies that the circuit will be operatingatadifferentvoltagethanthatusedduringtheoptimizationswithlogicaleffort andDualVT.From(3)weseethatthedelayofaCMOSgateisafunctionofthesizeof thegate,thesupplyvoltage,andthethresholdvoltage.Asmentionedinsection2,the delay measurement in feedback loops used in DVS circuits works based on the assumptionthatcircuitdelaystrackwellovervoltageasshownin[16,17,20].The delaymeasurementofthecriticalpathisbasedonthatassumptionnomatterwhetherthe feedbackloopusesaringoscillatororanactualreplicaofthecriticalpath.Butthework in[15]showsthatsmalldelaytrackingvariationsexistovervoltagewhichimpactcircuit timing.In[15]threechainsofinvertersweresimulatedwhoseloadsweredominatedby gate,interconnect,anddiffusioncapacitancerespectively.Tomodelpathsdominatedby stackeddevices,afourthchainwassimulatedconsistingoffourpMOSandfournMOS transistorsinseries.Thebaselinereferenceisan inverter chain with a balanced load capacitance similar to the ring oscillator. With this setup, 4 structures modeling 4 differenttypesofpathsweresimulated.AsFig.6shows,thereisrelativedelayvariation 10

(RDV)fordifferentpathstructuresovervoltage,especiallyatlowvoltageswhenVDD is close to V T. The range of V DD between V T and 2V T is particularly interesting to our studysincewearegoingtobeoperatinginthatrangeasitwillbeexplainedandjustified insection4.

Fig.6.RelativeCMOScircuitdelayvariation(simulated)[15]

Fig.6showsapotentialriskforcircuitsusingDVSwithvoltagesoperatingclose to the threshold voltage V T.Theriskisthatnoncriticalpathscouldbecome critical paths due to the relative delay variation during voltage scaling at runtime. This situation presents a problem to DVS itself, since the control loop for regulating the supply voltage is based on delay measurements of the critical path, which would be undeterminedatruntime.Theonchipreplicaofthecriticalpathwouldjustnotreplicate the ‘new’ critical path. The risk is even higher after size optimization has been performedbecausethepathdelaydistributionisnarrowerasshowninFig.5. 11

AlthoughFig.6isshowingthatrelativedelayvariationsbetweenpathsdoexist, wedon’texpectthosetobelargeinthiscasestudysincethepathsinourcircuit(ISCAS

85c432)aremostlyallgatedominant(notmanydifferentpathstructuresinthecircuit thatcouldscaleatconsiderabledifferentrates).

Inadditiontotherelativedelayvariationriskjustdiscussed,optimizingusinga

DualVTalgorithmpresentsanotherrisk,withthesamefinalconsequences.A circuit will have a set of highvT gates and a set of lowvT gates after DualVT optimization.

Weneedtocalculatethepartialderivativeof(3)toshowtherisk.

α1 ∂ ∂ CL·VDD αC· L·V DD (V DD –V T ) (5) (T d)= = α 2α ∂V T ∂V K (V –V ) K (V –V ) T DD T DD T

(5)isexpressingtherateatwhichthedelayofaCMOSgatescaleswithrespect tothethresholdvoltageV T.Letusexpress(5)relativetothedelayofthegateT ditself sothatthedependenceoftherateofscalingonV T ismoreclear:

∂ (T ) d α ∂V T (6) = Td (V DD –V T ) From(6)wecanseethatthepartialderivativeofthedelayofagatewithrespect tothethresholdvoltage(andnormalizedtothedelayofthegateT d)isstillafunctionof thethresholdvoltage,whichmeansthatwhileapplyingvoltagescalingduetoDVS,the delayofgatesassignedtohighvtwillscaledifferentlythanthoseassignedtolowvt.In 12

fact,(6)statesthathighvTgateswillscalefasterthanlowvTgates.ApplyingaDualVT algorithmwiththepurposeofminimizingleakagepowerresultsinadistributionwith manypathsveryclosetothecriticalpath.Again,theriskisthatnoncriticalpathscould becomecriticalpathsduringvoltagescalingatruntime.

Thisthesisisacasestudytodeterminewhetherthisriskexistsinareal,practical deepsubmicronVLSI implementation. Iftheriskexists, DVS should be taken into accountduringdesigntimeoptimizationsandwouldprobablyrestrictthose.

13

4.EXPERIMENTALSETUP

ThecasestudyusedthebenchmarkcircuitISCAS85c432.Fromtheviewpoint oftheproblemformulatedinsection3,threefundamentaltasksneedtobeperformed:

Optimizeforspeed,optimizeforleakagepower,andperformDVStostudythebehavior ofthecircuit.ThewholedesignflowusedtoperformthosetasksisshowninFig.7.

1. ISCAS85 Applyspeed c432HDL 8. optimization algorithm (LogicalEffort) 2. Synthesis(DC) HSPICE 9. simulationfor 3. TimingAnalysis NO DVS (DC) YES Fastenough? Applyleakage 4. P&R(Encounter) 10. optimization algorithm (DualVT) 5. TimingAnalysis (PrimeTime) HSPICE 11. simulationfor DVS 6. Generate HSPICEdeck 12. Changecircuit voltage YES HSPICE 7. NO Optimizeforanother simulationfor voltage? DVS Finished

Fig.7.Designflowusedforthecasestudy 14

4.1.FROMHDLTOLAYOUT

ThestartingpointintheflowofFig.7isaHardware Description Language

(HDL)versionofthecircuit.WeusedtheVerilogfiledownloadedfrom[25].Synthesis and place & route are performed based on the FreePDK45nm design flow jointly developedatOklahomaStateUniversityandNorthCarolinaStateUniversity.Theflow offersallthelibrariesandscriptsnecessarytoperformsynthesisusing Synopsys Design

Compiler and place and route using Cadence Encounter . Libraries for static timing analysisusing Synopsys PrimeTime arealsoincluded.Thedesignflowandlibrariescan bedownloadedfrom[26].Weperformedseveraliterationsofsteps2and3fromFig.7 inordertogetasynthesizedcircuitfastenoughforthistechnology.Sincethiswasthe firsttimewewereusinga45nmflow,thefirstsynthesiswasperformedwitharbitrarily relaxedtimingconstraints,sothatsynthesissuccesswasguaranteedandwecouldgeta startingpointforresynthesizingwithmorerealistictimingconstraints.Afewiterations wereperformeduntilthesoftwaretoolwasnotabletogetasynthesizeddesign.The maximum frequency achieved was 700 MHz. After accepting 700 MHz as our base frequency,weproceededwithstep4inFig.7,thatisplace&routeusing Encounter .

Synopsys PrimeTime wasusedinnextstep,for statictiminganalysis. PrimeTime is morepowerfulandprecisethan Design Compiler intheareaofstatictiminganalysis, andafterthefirstiterationofplace&routewefoundoutthatthefrequencyofoperation ofthecircuitwas800MHzafterparasiticsbackannotation,sothatnoadditionalP&R iterationswereperformed.SynthesisandP&Rproceduresareautomatedbasedonthe scripts provided by [26]. A timing report was generated using PrimeTime , which 15

contains information of 15 distinctive paths. That is the set of paths used for all simulationsandmeasurementsinthiscasestudy.AHSPICEnetlistwasalsogenerated from the final placed & routed design using Encounter . The timing report and the

HSPICEnetlistarethebaseforallthefollowingstepsintheflowofFig.7(steps7to

12).

4.2.INITIALCIRCUITANDDVSRANGE

AfirstsetofHSPICEsimulationsisrunafterplaceandroutetodeterminethe circuit’s performance. This set consists of simulations for a supply voltage sweep through the range of DVS. The results from these simulations are the baseline for comparisonsafterspeedandpoweroptimizations.

Thecelllibrarydownloadedfrom[26]usesavoltageof1.1V.Allsupporting files,likethelibraryfileforusewith Design Compiler andtiminglibraryfor PrimeTime , are based on cell characterizations at that voltage. According to [21], the maximum allowablevoltagethatcanbeappliedtothegatethicknessusedinthisthesis(1.1nm), suchthatnomorethan100ppmoxidebreakdownratein10yearsoccurs,isbetween

1.0V and 1.2V (extrapolated), therefore the 1.1V was chosen as the upper bound for

DVSoperation.ThelowerboundforDVSislesscritical,sinceattheenditisabout tradingoffspeedforpower.ThezerobiasthresholdvoltagefornMOStransistorsinthe modelfilesfrom[26]is0.42V,andevenahighervalueisusedwhenapplyingDualVT optimizations (as presented in section 4.5). Under these circumstances, an arbitrary lowerboundof0.7VwaschosenforDVS. 16

4.3.AUTOMATIONTOOLFORHSPICESIMULATIONS

A tool written in C++ was implemented to automate some tasks related to

HSPICEsimulationstobeusedfromthispoint(step7inthewholeflow)totheendof theflowofFig.7.TheinputstothistoolaretheoriginalHSPICEdeckgeneratedby

Encounter andthetimingreportgeneratedby PrimeTime .Thetoolreadsallinputsand outputsfromtheHSPICEdeckofthecircuit,thenreadsallpathsfromthetimingreport, and generates decks for power measurement and delay measurement for all paths includedinthereport.AmasterHSPICEdeckisusedasatemplatetoputalldecks togetherbyusing‘.include’statementssothatsimulationscanberunusingthismaster deck.

Inthecaseofpowermeasurements,thetoolprovidesparametersforchoosingthe numberofpatternswantedtouseforpowermeasurementandthefrequencyatwhich thosepatternswanttobeapplied.Thetoolthengenerates a HSPICE deck with the randomvectorsandconnectsthemtothemaincircuit.Thetemplatedeckcontainsthe

‘.measurement’statementforpowersothatitisreadytobesimulatedandgetpower measurementresults.

Inthecaseofpathdelaymeasurements,thetoolreads all the paths from the

PrimeTime reportandgeneratesaHSPICEdeckwithallthesignalsrequiredtosensitize allpaths,italsogeneratesallthecorresponding‘.measure’statements.Thefrequency parameterisusedtodeterminethetimingforapplyingpatternstomeasurethedelayof eachpath.Byusingthetoolallpowermeasurementpatterns,andallinputsignalsto sensitize paths are automatically generated and connected at the deck level. All 17

‘.measurement’statementsareautomaticallygeneratedaswell.Thenallwhatisneeded to do is to run one simulation for the master power measurement deck and one simulationforthepathdelaymasterdeck.APerlscriptisusedtoreadtheinformation fromthe.mt0filegeneratedbyHSPICE,andgeneratesacommaseparatedvalue(.csv) filereadytobeimportedbyanystandardspreadsheettool.

4.4.SPEEDOPTIMIZATION

Speedoptimizationswereperformedusingthemethodoflogicaleffortpresented in[18].Themethodwasnotappliedmanually,butratheritwasautomatedbymeansof anothertoolwritteninC++.TheinputstothetoolaretheoriginalHSPICEdeckand two PrimeTime reports;thesametimingreportasintheprevioustool,plusacomplete nets report generated using the ‘report_net’ tcl command in a PrimeTime script.The outputfromthetoolisanewHSPICEdeckwhichconsistsoftheolddeckwithresized gatesthatoptimizeforspeedaccordingto[18].Thetoolworksonsinglepaths,sothat theuserselectsboththestartandendpointsofthepath.Thetoollooksforthespecified pathinthetimingreportandgeneratestheoptimizedHSPICEdeck.Anoutputtextfile isalso generated,whichcontainsinformationabout all the resized gates and detailed informationaboutfinalandintermediateresultsfromthemethodoflogicaleffort.

Noadditionaldetailsorpseudocodeabouttheoptimization algorithm will be includedhere,sincethealgorithmusedforspeedoptimizationinthisthesisisaprecise andstraightforwardimplementationofthemethodpresentedin[18].Allthedetailsand theoryaboutthealgorithmarepresentedin[18]. 18

HSPICEsimulationssweepingthesupplyvoltage are run after logical effort optimizationstoevaluatetheimpactofDVSonthecircuit.

4.5.LEAKAGEOPTIMIZATION

A DualVT optimization tool was implemented based on [2]. The tool was writteninC++.TheinputstotheDualVT optimizerarethe PrimeTime timingreport andasingleVTHSPICEdeck,whichforourflowissimplytheoutputfromthespeed optimizer.TheoutputisaHSPICEdeckusinggatesmappedtoaDualVTversionofthe libraryprovidedby[26].ThisDualVTversionwasimplementedbycreatingahighvT versionoftheMOSFETmodelcardsandaDualVTversionofallthecells,whichmap to the highvT MOSFET models. It is important to mention that this thesis is not proposing a leakage optimization algorithm. The tool was implemented with the objectivetohelpautomateandintegratetheleakageoptimizationtaskstoourflow.The toolwasdesignedtoperformDualVTleakageoptimizationtoallpathsincludedinthe timingreport,nottothewholecircuit.Thisisvalidandsufficientforthepurposesof thisstudysinceallmeasurementsarebasedonthesamesetofpaths.

ThepseudocodefortheDualVT algorithm used for leakage optimizations is showninFig.8.FortheselectionofthehighvTvalue,theruleofthumbin[22]was used.TheruleofthumbstatesthatthedifferencebetweenhighvTandlowvTshouldbe approximately0.1VsothattheleakagereductionfromDualVTisoptimal.Thekey concept behind the rule is that if the difference between the high and low threshold voltagesistoolarge,theleakagereductionachievedbyusingeachhighvTgateishigh, 19

butjustafewgatescanbeassignedtohightvT.Ontheotherhand,ifthedifference betweenthetwothresholdvoltagesistoosmall,manygatescanbereassignedtohigh vT,buttheleakagereductionachievedbyeachgatewillbesmall.

HighVTAssignment { CreateacopyoftheinputHSPICEdeck(tempdeck) Foreachpath i { Foreachgate j { AssignhighvTto j intempHSPICEdeck LaunchtempdeckHSPICEsimulationandmeasuretheslacks δ1…δ natalloutputs1…n If(δ m>0,for1<m<n) keep jathighvTintempdeck Else Reassign j tolowvTintempdeck } } WritetempHSPICEdecktooutputHSPICEdeck }

Fig.8.PseudocodefortheDualVToptimizerimplementedinC++

ThecircuitwassettooperateatthelowestvoltageintheDVSrange(0.7V).The firstDualVToptimizationwasrunatthisvoltage,correspondingtostep10inourflow.

ThenHSPICEsimulationsarerunsweepingthesupplyvoltagethroughthewholeDVS range.ThemasterHSPICEdeckfiledescribedinsection4.3isusedtoautomatically sensitizeallthepathsandmeasureallthecorrespondingdelays.Afterthis,thevoltageis settoadifferentvalueinstep12,andaloopisperformedbygoingbacktostep10in 20

ourflow.EachiterationatadifferentvoltageisusedtostudytheimpactofDVSon differentcircuitoptimizations(DualVToptimizationsatdifferentvoltages).Theloop wasiterated3times,forgatheringdataofthecircuitoperatingat0.7V,0.9Vand1.1V.

Those voltages correspond to the lowest, middle, and highest values within the DVS range.

21

5.RESULTS

ThecasestudyflowofFig.7wasappliedtothebenchmark circuit ISCAS85 c432 using the 45nm design kit provided by Oklahoma State University and North

Carolina State University [26]. The PrimeTime timing library in [26] needed to be recompiledduetocompatibilityissueswithsoftwareversionscurrentlyinstalledonthe electricalengineeringserversatTexasA&MUniversity.Subsections5.1and5.2will firstshowtheresultsvalidatingtheoptimizationalgorithmsusedforspeedandleakage power.TheresultsfortheimpactofapplyingDVSwillbepresentedinsection5.3.

5.1.SPEEDOPTIMIZATIONRESULTS

Thealgorithmoflogicaleffortwasappliedtothefourmostcriticalpathsoutof thesetof15pathsusedinthiscasestudy.Optimizationforallotherpathswasnot necessary since all of them were shorter than the most critical path even after speed optimization.Acircuitspeedupbetween14%and17%wasachievedthoughthewhole

DVS range. Fig. 9 shows the delay of the four most critical paths before any optimization. Fig. 10 shows the delays after speed optimization, normalized to the originaldelaysafterplaceandroute.Fig.11showsthecircuitspeedupachievedforthe

DVSrange.Itisshownthatcircuitspeedupdecreasesforincreasingsupplyvoltage;this ismostlyduetogatedelayscalingwithoutcorrespondingwiredelayscaling.Thetool implemented according to the method of logical effort optimizes for speed by gate sizing,butdoesnotperformanyinterconnectdelayoptimizations.Thelargerthesupply 22

voltage,thesmallerthegatedelay,butthewiredelay remains constant.Theoverall circuitspeedupisthensmallerforlargersupplyvoltages.

Themethodoflogicaleffortexplainshowtodesignapathformaximumspeed, butdoesnoteasilyshowhowtodesignapathforminimumareaorpowerunderafixed delayconstraint[18].Wearenottargetingtheminimumenergydesign,sincetoachieve thatweshouldfollowathreedimensiondesignspaceprocedure,optimizingforsupply voltage,gatesizingandthresholdvoltageasin[3].Aprocedurelikethatdoesn’ttake into account DVS. This case study uses logical effort for speed optimization only, withoutanypowerconstraints.Fig.12andFig.13showthespeedpowertradeoffdue tologicaleffort.

Delay of Most Critical Paths for DVS Range after Place & Route 3.00 2.50

Path #1 (most critical) 2.00 Path #2 Path #3 1.50 Delay(ns) Path #4 1.00 0.50 0.6 0.7 0.8 0.9 1.0 1.1 1.2 Vdd (V)

Fig.9.Delayofthefourmostcriticalpathsafterplace&route

23

Delays After Speed Optimization (normalized to the delays after Place & Route) 0.90 0.88 Path #1 (most critical) 0.86 Path #2 Path #3 0.84 Path #4 Normalized Delay 0.82 0.80 0.6 0.7 0.8 0.9 1.0 1.1 1.2 Vdd (V)

Fig.10.Delaysafterspeedoptimization,normalizedtothedelaysafterplace&route

Circuit Speedup for DVS Range After Logical Effort Optimizations

17

16

15 % Speedup

14

13 0.7 0.8 0.9 1.0 1.1 Vdd (V)

Fig.11.CircuitspeedupforthewholeDVSrangeafterLEoptimization 24

Delay vs. Power Consumption Before and After Logical Effort (LE) Optimizations (5 points per curve, for Vdd = 0.7, 0.8, 0.9, 1.0 and 1.1 V )

3.00

2.50

2.00 Before LE After LE 1.50 Delay (ns) Delay

1.00

0.50 0 100 200 300 400 Power (uW) Fig.12.Delayvs.powerbeforeandafterspeedoptimizations

Power-Delay Product vs. Vdd Before and After Logical Effort (LE) Optimizations

351 301 251 201 Before LE 151 After LE

PDP (fJ) 101 51

1 0.6 0.7 0.8 0.9 1.0 1.1 1.2 Vdd (V)

Fig.13.PowerDelayProductbeforeandafterspeedoptimizations 25

5.2.LEAKAGEOPTIMIZATIONRESULTS

DualVToptimizationswererunforthecircuitoperatingatthelowest,middle, andhighestvoltageswithintheDVSrange(0.7Vto1.1V).Fig.14showstheleakage powersavingsafterapplyingtheDualVToptimizationtoolimplementedinC++.Power savings can be as large as 52% when operating at a supply voltage of 0.7V and optimizingwithDualVTatV DD =1.1V.OptimizingwithDualVTwhenthecircuitis operatingatvoltageshigherthan0.7Vmaynotbeadvisableasitwillbeshowninthe nextsection.Practicalpowersavingsarebetween17%and36%fortheDVSrange.

DualVT optimization is expected to achieve larger power savings for larger circuits,asshownin[2].Theresultsshowthatthe larger the operating voltage, the smallertheleakagepowersavings,thisisduetothe combination of two effects; the drain-induced barrier lowering (DIBL),whichreducesV Twithlargersupplyvoltages, andthesensitivityofleakagepowertoV T,whichisproportionaltoleakagepoweritself

[2].IncreasingthesupplyvoltagewillreduceV Tandincreasetheleakagecurrentdue toDIBLeffectasshownin[23].Thenthefixeddifferenceof0.1Vforthezerobias thresholdvoltagebetweenhighvTandlowvTgateswillofferdifferentpowersavings dependingonthefinalvalueofV TduetoDIBLeffect.

The Figure also shows that optimizing at higher voltages offers larger power savings. (2)showsthat thelargerthesupply voltage the smaller the impact of using highvTgatesongatedelay.Thisimpliesthatthehigherthevoltage,themoretheroom forassigninggatestohighvTwithoutproducingtimingviolations.Thenmoreleakage powersavingscanbeachieved. 26

Leakage Power Savings for DVS Range after Dual-Vt optimizations

55 50 45 40 Dual-Vt optimization at 35 Vdd = 1.1 V 30 Dual-Vt optimization at 25 Vdd = 0.9 V 20 Dual-Vt optimization at 15 Vdd = 0.7 V 10 % Power Savings Leakage 5 0 0.7 0.8 0.9 1.0 1.1 Vdd

Fig.14.LeakagepowersavingsafterapplyingDualVToptimizations

5.3.DYNAMICVOLTAGESCALINGIMPACTONCIRCUITOPTIMIZATIONS

TostudytheimpactofDVSoncircuitoptimization,severalHSPICEsimulations were run sweeping the supply voltage (V DD ) through the DVS range for the original circuitafterplaceandroute,andafterspeedandleakagepoweroptimizationsatdifferent operating voltages. The objective was to evaluate the risk of noncritical paths becomingcriticalpathsatruntime,aspresentedinsection3,andtodetermineifthereis anypatternthattellsushowtakeDVSintoconsiderationduringcircuitoptimizations,in casethatisnecessary.Figs.15,16and17showtheresponseofthecircuittoDVSafter speedoptimizationsusinglogicaleffort.TheFiguresshowthedelayofall15pathsin thesetusedforthecasestudy.Thesetofpathswassortedwiththecircuitoperatingat thehighestvoltage,startingwithnumberone,assignedtothelongestpath,tonumber 27

15,assignedtotheshortestpath.Fig.15showsthat different paths scale differently whileapplyingDVS.Thisrelativescalingisconsistentwiththebehaviorpredictedby

Fig.6intheproblemformulationsection(section3).Fig.15showsthatinitially(at1.1

V)thesetofpathsismonotonicallydecreasing,butpathsscaledifferently alongwith

DVSandthesetisnot monotonicallydecreasing anymore. Fig. 16 shows that after speed optimizations, the relative delay variation (RDV) between paths is reduced in somecasesandslightlyincreasedinothers.Figs.15and16showthatalthoughrelative delayvariationsexistbetweenpaths,thoseRDV’saresmallforlongpaths(<7%)anddo notrepresentanyriskfornoncriticalpathsbecomingcriticalpathswhileapplyingDVS.

Fig.17summarizestheresultsbyshowingonlythehighestandlowestvoltagesinthe

DVSrangeandthemostrelevantRDV’s.

Delay of all Paths for DVS Range Before Optimizations

3.00

2.50 6% RDV

2.00 Vdd = 0.7 V Vdd = 0.8 V 1.50 Vdd = 0.9 V Vdd = 1.0 V Delay (ns) Delay 1.00 Vdd = 1.1 V 0.50 0.00 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Path

Fig.15.DelayofallpathsforDVSrangebeforespeedoptimizations 28

Delay of all Paths for DVS Range After Speed Optimizations

2.50 2% RDV 2.00 Vdd = 0.7 V 1.50 Vdd = 0.8 V Vdd = 0.9 V 1.00 Vdd = 1.0 V

(ns) Delay Vdd = 1.1 V 0.50 0.00 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Path

Fig.16.DelayofallpathsforDVSrangeafterspeedoptimizations

Delay of all Paths Before and After Speed Optimizations using Logical Effort (LE) 3.00 2.50 6 % RDV 2.00 Before LE, Vdd = 0.7 V Before LE, Vdd = 1.1 V 1.50 After LE, Vdd = 0.7 V 2 % RDV 11 % RDV After LE, Vdd = 1.1 V

Delay (ns) Delay 1.00 16 % RDV 0.50

2 % RDV 14 % RDV 0.00 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Path

Fig.17.DelayofallpathsforDVSrangebeforeandafterspeedoptimizations 29

Figs.18to23showtheresultsofapplyingDVStothecircuitafterbothspeed and leakage power optimizations performed to the circuit operating at the lowest, middle, and highest voltages in the DVS range. Figs. 18 to 20 present the same informationasFigs.15to17didforspeedoptimizations,thatis,theypresentthedelay ofallpathsforthewholeDVSrange.Thepathsaresortedexactlythesameasbefore, path#1isthemostcriticalpathafterplace&route,andpath#15istheshortestpath afterplace&route.Fig.18showsresultsafterDualVToptimizationsforthecircuit operatingat0.7V.ThedelayequalizationduetotheDualVToptimizationisclearinthe curveforthecircuitoperatingat0.7V(curveinblue).Thereare7pathsverycloseto themostcriticalpathaftertheoptimization.Allotherpaths(path8to15)aremuch shortereven afterhighvTassignments.From(3)and(6)weexpectthatthedelayof highvTgateswillscaleup(becomeslower)atahigherratewithdecreasingV DD than thosewithlowvT.Similarly,thehighvTgatedelayswillscaledown(becomefaster)at ahigherratewithincreasingV DD thanthosewithlowvT.Thisbehaviorisobservedin

Fig. 18, where half of the curve is very flat (paths one to seven) due to the delay equalization at V DD =0.7V(bluecurve).Increasingthesupplyvoltage from 0.7V to

1.1VmakesthedelayofpathswithhighvT gatesscaledownfasterthanthosepaths withouthighvTgates(orwithfewerhighvTgates).Thisisobservedastheflatsection ofthecurveoperatingat0.7Vgoesscalingdowntothepurplecurveoperatingat1.1V, where paths two to seven have scaled down apart from the most critical path, which doesn’thaveanyhighvTgate(duetothenatureofDualVTalgorithms).Allpathsother thanthemostcriticalpathhavezeroormoregatesassignedtohighvT,whichmeansthat 30

allpathsscaledownatahigherratethanthemostcriticalpathwhilesweepingupthe supply voltage through the DVS range. This implies that the most critical path will alwaysbethelongestpathinthecircuitasshowninFig.18.

Delay of all Paths for DVS Range After Dual-Vt Optimizations at Vdd = 0.7 V

2.40 2.20 2.00 1.80 Vdd = 0.7 V 1.60 Vdd = 0.8 V 1.40 1.20 Vdd = 0.9 V 1.00 Vdd = 1.0 V Delay (ns) Delay 0.80 Vdd = 1.1 V 0.60 0.40 0.20 0.00 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Path

Fig.18.DelayofallpathsforDVSrangeafterDualVToptimizationsatV DD =0.7V

Fig.19showstheresultsforDualVToptimizationswhenthecircuitisoperating at 0.9V. Again, the delay equalization is clear for the green curve representing the circuitoperatingatV DD =0.9V.PathswithhighvTgatesscaledownatahigherrate thanthemostcriticalpathasinthepreviouscase,thatisforsupplyvoltagesbetween 31

0.9Vand1.1V.Ontheotherhand,highvTpathswillscaleupatahigherrateforsupply voltageslowerthan0.9V.Fig.19showsthatthereisaproblemwiththat,sincenon criticalpaths2,3and4havebecomecriticalpathswhilescalingthesupplyvoltage,all ofthemhavebecomelongerpathsthantheoriginalcriticalpathatdesigntime(path1).

Fig.20showsthatpath1isnotthemostcriticalpathanymoreforoperatingvoltages lowerthan1.1VwhenoptimizingwithDualVTatV DD =1.1V.

Delay of all Paths for DVS Range After Dual-Vt Optimizations at Vdd = 0.9 V

2.80 2.60 2.40 2.20 2.00 Vdd = 0.7 V 1.80 1.60 Vdd = 0.8 V 1.40 Vdd = 0.9 V 1.20 Vdd = 1.0 V

Delay (ns) 1.00 Vdd = 1.1 V 0.80 0.60 0.40 0.20 0.00 1 2 3 4 5 6 7 8 9 101112131415 Path

Fig.19.DelayofallpathsforDVSrangeafterDualVToptimizationsatV DD =0.9V

32

Delay of all Paths for DVS Range After Dual-Vt Optimizations at Vdd = 1.1 V 3.00

2.50

2.00 Vdd = 0.7 V Vdd = 0.8 V 1.50 Vdd = 0.9 V Vdd = 1.0 V

Delay (ns) 1.00 Vdd = 1.1 V 0.50 0.00 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Path

Fig.20.DelayofallpathsforDVSrangeafterDualVToptimizationsatV DD =1.1V

The DualVT approach as applied in this case study has the only purpose of reducingleakagepower.OptimizingwithDualVTshouldhavenoimpactoncircuit delay;thatis,themostcriticalpathshouldbeexactlythesamebeforeandafterDualVT optimizations.Moreover,thecircuitspeedupduetoDualVToptimizationsshouldbe zero at any operating supply voltage. That is a key concept for the algorithm as presented in [2]. Results presented in Figs. 18, 19, and 20 show that the circuit is actuallyexperiencingaslowdownforsomeoptimizationcasesasshowninFig.21. 33

Circuit Slowdown for DVS Range After LE and Dual-Vt Optimizations at Different Voltages (baseline: Circuit After LE)

25

20

15 Dual-Vt at Vdd = 0.7 V 10 Dual-Vt at Vdd = 0.9 V

% Slowdown Dual-Vt at Vdd = 1.1 V 5 0 0.7 0.8 0.9 1.0 1.1 Vdd (V)

Fig.21.CircuitslowdownforDVSrangeafterspeedandpoweroptimizations

Fig.21showsthatoptimizingwithDualVTatV DD =0.7Vistheonlycasethat doesn’tproducecircuitslowdown.Intheothertwocases(optimizingwithDualVTat supplyvoltagesof0.9Vand1.1V)thecircuitissloweddownasmuchas25%.The slowdownisactuallyduetononcriticalpathsatdesigntimethatturnintocriticalpaths whilesweepingthesupplyvoltagetosimulateDVS.BycomparingFig.11withFig.21 itisrevealedthatoptimizingwithDualVTatsupplyvoltageshigherthan0.7Vresultsin a circuit that is slower than the original circuit after place & route, before speed optimizations.Forexample,the17%speedupachievedafterlogicaleffortoptimization isoverriddenbythe25%slowdownwhenthecircuitisoperatingatVDD =0.7Vandhas beenoptimizedwithDualVTatVDD =1.1V.

34

Figs.22and23showtheimpactofDVSonDualVToptimizationsatdifferent voltages.TheFiguresshowthemostrelevantpathsregardingtorelativedelayscaling.

Agoodinsightofthecircuit’sbehaviorthroughtheDVSrangeisobtainedbyshowing thedelayofsomepathsafterbothspeedandpower optimizations, normalized to the delayofthecriticalpathafterspeedoptimizationsonly.TheFiguresshowhowdifferent

DualVT optimizations present different scaling behavior while sweeping the supply voltagethroughtheDVSrange.TheFiguresclearlyshowthatoptimizingatavoltage higherthanthelowestvoltageintheDVSrangeresultsinnoncriticalpathsatdesign timeturningintocriticalpathsatruntime.ThisisduetothedelayofhighvT paths scalingupatahigherratethanthemostcriticalpath(withoutanyhighvTgates)during supplyvoltagereduction.TheFiguresalsoshow,ontheotherhand,thatwhenDVSis adjustingthevoltageup,thedelayofhighvTpathsscalesdownatahigherratethanthe mostcriticalpath.Thislastscenariodoesnotrepresentanyproblemsincethecritical pathatdesigntime(path1)remainsasthelongestpaththroughthewholeDVSrange.

Importantandsummarizinginformationprovidedby Fig. 22 is that noncritical paths become critical when the circuit is operating right below 1.1V and DualVT optimizationshavebeenperformedatV DD =1.1V.Similarly,Fig.23showsthatnon critical paths become critical when the circuit is operating below 0.9V and DualVT optimizationshavebeenperformedatV DD =0.9V.Relativedelayvariations(RDV)as largeas32%areobserved.

35

Delay of paths after gate sizing and Dual-Vt optimizations, normalized to the delay of the critical path after gate sizing only. 1.30 25 % RDV 1.25 Path #1 after dual-vt at 1.20 Vdd=0.7 V Path #2 after dual-vt at 1.15 32 % RDV Vdd=0.7 V 1.10 Path #4 after dual-vt at Vdd=0.7 V 1.05 Path #1 after dual-vt at 1.00 Vdd=1.1 V 0.95 Path #2 after dual-vt at Normalized Delay Vdd=1.1 V 0.90 Path #4 after dual-vt at 0.85 Vdd=1.1 V 0.80 0.6 0.7 0.8 0.9 1.0 1.1 1.2 Vdd

Fig.22.ImpactofDVSafterDualVToptimizationsatthelowestandhighestsupply voltagesintheDVSrange

Delay of paths after gate sizing and Dual-Vt optimizations, normalized to the delay of the critical path after gate sizing only. 1.30 1.25 Path #1 after dual-vt at 24 % RDV Vdd=0.7 V 1.20 Path #2 after dual-vt at 1.15 Vdd=0.7 V 1.10 Path #4 after dual-vt at 28 % RDV Vdd=0.7 V 1.05 Path #1 after dual-vt at 1.00 Vdd=0.9 V 0.95 Path #2 after dual-vt at Normalized Delay Vdd=0.9 V 0.90 Path #4 after dual-vt at 0.85 Vdd=0.9 V 0.80 0.6 0.8 1.0 1.2 Vdd

Fig.23.ImpactofDVSafterDualVToptimizationsatthelowestandmiddlesupply voltagesintheDVSrange 36

6.CONCLUSIONSANDFUTUREWORK

Westudiedtheimpactofdynamicvoltagescalingon circuit optimizations for speed and leakage power. This case study was performed to the benchmark circuit

ISCAS85c432usinga45nmtechnology.Speedoptimizationswereperformedusing themethodoflogicaleffortaspresentedin[18].Leakageoptimizationswereperformed usingaDualVTalgorithmbasedon[2].ToolswereimplementedinC++toautomate andintegratecircuitoptimizationsintothewholedesignflow,fromsynthesistoHSPICE simulations for DVS as shown in Fig. 7. The optimization tools were validated by

HSPICEsimulations.Thespeedoptimizationtoolofferedamaximumspeedupof17%.

Thepoweroptimizationtoolofferedasmuchas35%leakagepowersavings.

TheresultsshowedthatDVShadnoimpactonspeedoptimizationssincepaths closetothecriticalpathpresentedsmallrelativedelayvariations(RDV’s<3%)which doesnotrepresentariskfornoncriticalpathstobecomecriticalpathsatruntimewhile applying DVS. Larger relative delay variations were observed for the shortest paths, withRDV’saslargeas16%,butthosevariationsarenotlargeenoughastoturnthe shortestpathsintocriticalpaths.

TheresultsshowedthatDVShasimpactonDualVToptimizations,sincenon critical paths at designtime became critical paths at runtime for optimizations performedatasupplyvoltagehigherthanthelowestvoltageintheDVSrange.This suggeststhatDualVToptimizationsshouldbeperformedatthelowestvoltage in the

DVSrange,otherwisenoncriticalpathswillbecomecriticalatruntime.Noncritical 37

pathsbecamelongerthanthecriticalpathduringruntimebyasmuchas25%.One shorter path showed as much as 32% RDV while applying DVS and came close to becoming the most critical path at runtime. Over designing could be considered to overcomethesituationofnoncriticalpathsbecomingcriticalpathsatruntime.A25% overdesignwouldbeenoughforthiscasestudy,butthefactthatRDV’saslargeas32% wereobserved,plusthepossibilityofhavingpathswithmuchmorecomplexstructures scalingwithmuchlargerRDV’ssuggeststhattherequiredoverdesigncouldbemuch largerthan25%forlargerandmorecomplexcircuitswithmorecomplexpathstructures.

Asetoftoolswereimplementedtoautomateseveraltasks andintegratethem intotheflowofFig.7sothatitcanbeusefulforfutureresearch.Thecircuitc432used inthiscasestudyisasmalldesignwithintheISCAS85familyofbenchmarks.Hence, anextensiontothisworkcouldbetheuseofthewholeflowimplementedinthisthesis tostudytheimpactofDVSoncircuitoptimizationsforlargerdesigns.Inordertodo that,someworkhastobedoneregardingtheDualVToptimizationswhichwerevery timeconsumingsincetheyarebasedonfullHSPICEsimulations.Theoptionswouldbe toeithercharacterizethecellsfordualvtanddifferentsupplyvoltages,andprograma statictiminganalyzerusinglookuptablessothatfullHSPICEsimulationsareavoided, or migrate the C++ DualVT optimization tool to UNIX, so that the full HSPICE simulationscanberunonthesupercomputersatthesupercomputingfacilitiesatTexas

A&M University. Characterizing the cells for dualvtanddifferentvoltagescouldbe moretimeconsumingthanrunningfullHSPICEsimulationsforthecircuitusedinthis casestudy,butcanbeagoodoptionforfutureresearch. 38

Wiring capacitances were not treated separately while performing speed optimizations with the method of logical effort. Those capacitances were taken into account as if they were part of the gate load being driven. This implies that the optimizationsforspeedarenotoptimal.Again,thecircuitusedinthiscasestudyis small and the wiring capacitances didn’t dominate the total capacitance of the nets.

Includingwiringdelayoptimizationsintothelogicalefforttoolisanotheropportunity forfuturework.

39

REFERENCES

[1] M. Anis and M. Elmasry, Multi-Threshold CMOS Digital Circuits: Managing

Leakage Power .1 st ed.Norwell,MA:KluwerAcademicPublishers,2003.

[2] L.Wei,Z.Chen,K.Roy,M.Johnson,andV.De,“DesignandOptimizationof

DualThreshold Circuits for LowVoltage LowPower Applications,” IEEE

Transactions on VLSI Systems, vol.7,no.1,pp.1624,March1999.

[3] P. Pant, V. De, and A. Chatterjee, “Simultaneous Power Supply, Threshold

Voltage, and Transistor Size Optimization for LowPower Operation of CMOS

Circuits,” IEEE Transactions on VLSI Systems , vol. 6, no. 4, pp. 538545,

December1998.

[4] V.SundararajanandK.Parhi,“LowPowerSynthesisofDualThresholdVoltage

CMOS VLSI Circuits,” in Proceedings of the IEEE International Symposium on

Low Power Electronics and Design ,1999,pp.139144.

[5] L. Wei, K. Roy, and C. Koh, “Power Minimization by Simulatenous DualVth

Assignment and Gatesizing,” in Proceedings of the IEEE Custom Integrated

Circuits Conference ,2000,pp.413416.

[6] S. Sirichotiyakul, T. Edwards, O. Chanhee, Z. Jingyan, A. Dharchoudhury, R.

Panda, and D. Blaauw, “Standby Power Minimization though Simultaneous

ThresholdVoltageSelectionandCircuitSizing,”in Proceedings of the 36 th Design

Automation Conference ,1999,pp.436441. 40

[7] K.Roy,L.Wei,andZ.Chen,“MultipleVddMultipleVth CMOS(MVCMOS)for

LowPower Applications,” in Proceedings of the IEEE International Symposium

on Circuits and Systems ,vol.1,1999,pp.366370.

[8] M. Khellah and M. Elmasry, “Power Minimization of HighPerformance

Submicron CMOS Circuits Using a DualVDD DualVth (DVDV) Approach,” in

Proceedings of the IEEE International Symposium on Low Power Electronics and

Design ,1999,pp.106108.

[9] N.WesteandD.Harris, CMOS VLSI Design: A Circuits and Systems Perspective,

3rd ed.Boston,MA:AddisonWesley,2005.

[10] S. Sirichotiyakul, T. Edwards, C. Oh, R. Panda, and D. Blaauw, “Duet: An

AccurateLeakageEstimationandOptimizationToolforDualVTCircuits,”IEEE

Transactions on VLSI Systems – Special Issue on Low Power Electronics and

Design ,vol.10no.2,pp.7990,2002.

[11] K.S. Yeo and K. Roy, Low-Voltage, Low-Power VLSI Subsystems , 1 st ed. New

York,NY:McGrawHill,2005.

[12] T.BurdandR.Brodersen,“EnergyEfficientCMOSMicroprocessorDesign,”in

Proceedings of the 28 th Hawaii International Conference on System Sciences ,vol.

1,1995,pp.288297.

[13] T.Pering,T.Burd,andR.Brodersen,“TheSimulationandEvaluationofDynamic

VoltageScalingAlgorithms,”in Proceedings of the IEEE International Symposium

on Low Power Electronics and Design , vol.1,1998,pp.7681. 41

[14] K. Flautner, S. Reinhardt, and T. Mudge, “Automatic Performance Setting for

DynamicVoltageScaling,”Wireless Networks ,vol.8,no.5,pp.507520,2002.

[15] T.Burd,T.Pering,A.Stratakos,andR.Brodersen,“ADynamicVoltageScaled

Microprocessor System,” IEEE Journal of Solid-State Circuits , vol. 35, no. 11,

pp.15711580,2000.

[16] W.T.NgandO.Trescases,“PowerManagementforModernVLSILoadsUsing

DynamicVoltageScaling,”in Proceedings of the 7 th International Conference on

Solid-State and Integrated Circuits Technology , vol. 2, no. 18, pp. 14121415,

2004.

[17] G.YeonandM.Horowitz,“Afully Digital,EnergyEfficient, Adaptive Power

Supply Regulator,” IEEE Journal of Solid-State Circuits , vol. 34, pp. 520528,

1999.

[18]I. Sutherland, B. Sproul, and D. Harris, Logical Effort: Designing Fast CMOS

Circuits ,1 st ed.SanFrancisco,CA:MorganKaufmann,1999.

[19] S.KarandikarandS.Sapatnekar,“TechnologyMappingUsingLogicalEffortfor

Solving the LoadDistribution Problem,” in IEEE Transactions on Computer-

Aided Design of Integrated Circuits and Systems ,vol.27,no.1,2008,pp.4558.

[20] R.Gonzalez,B.Gordon,M.Horowitz,“SupplyandThresholdVoltageScalingfor

LowPowerCMOS,”IEEE Journal of Solid-State Circuits ,vol.32,no.8,pp.1210

1216,1997.

[21] J. H. Stathis, “Reliability Limits for the Gate Insulator in CMOS Technology,”

IBM Journal of Research and Development ,vol.46,no.2,pp.265286,2002. 42

[22] M.Hirabayashi,K.Nose,andT.Sakurai,“DesignMethodologyandOptimization

Strategy for DualVth Scheme using Commercially Available Tools,” in

Proceedings of the International Symposium on Low Power Electronics and

Design ,2001,pp.283286.

[23] S.Martin,K.Flautner,T.Mudge,andD.Blaauw,“CombinedDynamicVoltage

Scaling and Adaptive Body Biasing for Lower Power under

DynamicWorkloads,”in Proceedings of the IEEE/ACM International Conference

on Computer-Aided Design ,2002,pp.721725.

[24] Electric’sVLSIDesignSystemManual.Staticfreesoft,2008.[Online].Available:

http://www.staticfreesoft.com/manual/text/chap0912.html

[25] ISCAS85HighLevelModels.DepartmentofElectricalandComputerScience,

UniversityofMichigan,2008[Online].Available: http://www.eecs.umich.edu/

~jhayes/iscas.restore/benchmark.html

[26] 45nm FreePDK kit. VLSI Group, Oklahoma State University, 2008 [Online].

Available:http: //vcag.ecen.okstate.edu/projects/scells/

43

VITA

Carlos Alberto Esquit Hernandez received his Bachelor of Science degree in electronicsengineeringfromUniversityoftheValleyofGuatemala,GuatemalaCity,in

June2003.

He worked for Siemens Electrotecnica S.A. as an Automation Engineer from

March2004toJuly2006.Heimplementedindustrialautomationsolutionsforindustries inGuatemalaandDominicanRepublicwhileworkinginthatposition.

He got a Fulbright grant sponsored by the US Department of State to pursue graduate studies in the United States. He used this grant to join the Department of

Electrical and Computer Engineering at Texas A&M Universityin August2006.He received his M.S. degree in Computer Engineering in May 2009. He will join the

DepartmentofElectronicsatUniversityoftheValleyofGuatemalastartinginJanuary

2009, where he will be teaching undergraduate courses and working at the Research

Institute of that University. Carlos Esquit may be reached at the Department of

Electronics Engineering office J305 at University of the Valley of Guatemala, 18 Av.

1195 Z. 15, V.H. III, Guatemala City, Guatemala. His email address is [email protected].