CSEE 3827: Fundamentals of Systems

Lecture 18, 19, & 20

April 2009

Martha Kim [email protected] Outline

• We will examine two MIPS implementations

• A single-cycle version

• A pipelined version

• Simple subset of MIPS, showing most aspects

• Memory reference: lw, sw

• Arithmetic/logical: add, sub, and, or, slt

• Control transfer: beq, j

• CPU performance factors

• Instruction count (determined by ISA and compiler)

and cycle time (determined by CPU hardware)

CSEE 3827, Spring 2009 Martha Kim 2 Instruction Execution

• PC → instruction memory, fetch instruction

• Register numbers → register file, read registers

• Depending on instruction class:

• Use ALU to calculate:

• Arithmetic or logical result

• Memory address for load/store

• Branch target address

• Access data for load/store

• PC ← target address or PC + 4

CSEE 3827, Spring 2009 Martha Kim 3 CPU Overview



!DD !DD

$ATA

2EGISTER 0# !DDRESS )NSTRUCTION 2EGISTERS !,5 !DDRESS 2EGISTER $ATA )NSTRUCTION MEMORY MEMORY 2EGISTER

$ATA

1, Ê {°£Ê ˜Ê >LÃÌÀ>VÌÊ ÛˆiÜÊ œvÊ Ì iÊ ˆ“«i“i˜Ì>̈œ˜Ê œvÊ Ì iÊ *-Ê ÃÕLÃiÌÊ Ã œÜˆ˜}Ê Ì iÊ “>œÀÊv՘V̈œ˜>Ê՘ˆÌÃÊ>˜`ÊÌ iʓ>œÀÊVœ˜˜iV̈œ˜ÃÊLiÌÜii˜ÊÌ i“°Ê!LLINSTRUCTIONSSTARTBYUSING THEPROGRAMCOUNTERTOSUPPLYTHEINSTRUCTIONADDRESSTOTHEINSTRUCTIONMEMORY!FTERTHEINSTRUCTIONIS FETCHED THEREGISTEROPERANDSUSEDBYANINSTRUCTIONARESPECIlEDBYlELDSOFTHATINSTRUCTION/NCETHE REGISTEROPERANDSHAVEBEENFETCHED THEYCANBEOPERATEDONTOCOMPUTEAMEMORYADDRESSFORALOADOR STORE TOCOMPUTEANARITHMETICRESULTFORANINTEGERARITHMETIC LOGICALINSTRUCTION ORACOMPAREFORA BRANCH )FTHEINSTRUCTIONISANARITHMETIC LOGICALINSTRUCTION THERESULTFROMTHE!,5MUSTBEWRITTENTO AREGISTER)FTHEOPERATIONISALOADORSTORE THE!,5RESULTISUSEDASANADDRESSTOEITHERSTOREAVALUEFROM THEREGISTERSORLOADAVALUEFROMMEMORYINTOTHEREGISTERS4HERESULTFROMTHE!,5ORMEMORYISWRITTEN BACKINTOTHEREGISTERlLE"RANCHESREQUIRETHEUSEOFTHE!,5OUTPUTTODETERMINETHENEXTINSTRUCTION ADDRESS WHICHCOMESEITHERFROMTHE!,5WHERETHE0#ANDBRANCHOFFSETARESUMMED ORFROMAN THATINCREMENTSTHECURRENT0#BY4HETHICKLINESINTERCONNECTINGTHEFUNCTIONALUNITSREPRESENTBUSES WHICHCONSISTOFMULTIPLESIGNALS4HEARROWSAREUSEDTOGUIDETHEREADERINKNOWINGHOWINFORMATIONmOWS 3INCESIGNALLINESMAYCROSS WEEXPLICITLYSHOWWHENCROSSINGLINESARECONNECTEDBYTHEPRESENCEOFADOT WHERETHELINESCROSS#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 4 Can’t just join wires together, use muxes



!DD !DD

$ATA

2EGISTER 0# !DDRESS )NSTRUCTION 2EGISTERS !,5 !DDRESS 2EGISTER $ATA )NSTRUCTION MEMORY MEMORY 2EGISTER

$ATA

1, Ê {°£Ê ˜Ê >LÃÌÀ>VÌÊ ÛˆiÜÊ œvÊ Ì iÊ ˆ“«i“i˜Ì>̈œ˜Ê œvÊ Ì iÊ *-Ê ÃÕLÃiÌÊ Ã œÜˆ˜}Ê Ì iÊ “>œÀÊv՘V̈œ˜>Ê՘ˆÌÃÊ>˜`ÊÌ iʓ>œÀÊVœ˜˜iV̈œ˜ÃÊLiÌÜii˜ÊÌ i“°Ê!LLINSTRUCTIONSSTARTBYUSING THEPROGRAMCOUNTERTOSUPPLYTHEINSTRUCTIONADDRESSTOTHEINSTRUCTIONMEMORY!FTERTHEINSTRUCTIONIS FETCHED THEREGISTEROPERANDSUSEDBYANINSTRUCTIONARESPECIlEDBYlELDSOFTHATINSTRUCTION/NCETHE REGISTEROPERANDSHAVEBEENFETCHED THEYCANBEOPERATEDONTOCOMPUTEAMEMORYADDRESSFORALOADOR STORE TOCOMPUTEANARITHMETICRESULTFORANINTEGERARITHMETIC LOGICALINSTRUCTION ORACOMPAREFORA BRANCH )FTHEINSTRUCTIONISANARITHMETIC LOGICALINSTRUCTION THERESULTFROMTHE!,5MUSTBEWRITTENTO AREGISTER)FTHEOPERATIONISALOADORSTORE THE!,5RESULTISUSEDASANADDRESSTOEITHERSTOREAVALUEFROM THEREGISTERSORLOADAVALUEFROMMEMORYINTOTHEREGISTERS4HERESULTFROMTHE!,5ORMEMORYISWRITTEN BACKINTOTHEREGISTERlLE"RANCHESREQUIRETHEUSEOFTHE!,5OUTPUTTODETERMINETHENEXTINSTRUCTION ADDRESS WHICHCOMESEITHERFROMTHE!,5WHERETHE0#ANDBRANCHOFFSETARESUMMED ORFROMANADDER THATINCREMENTSTHECURRENT0#BY4HETHICKLINESINTERCONNECTINGTHEFUNCTIONALUNITSREPRESENTBUSES WHICHCONSISTOFMULTIPLESIGNALS4HEARROWSAREUSEDTOGUIDETHEREADERINKNOWINGHOWINFORMATIONmOWS 3INCESIGNALLINESMAYCROSS WEEXPLICITLYSHOWWHENCROSSINGLINESARECONNECTEDBYTHEPRESENCEOFADOT WHERETHELINESCROSS#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 5 Control

"RANCH

- U X



- !DD !DD U X

!,5OPERATION $ATA

-EM7RITE 2EGISTER 0# !DDRESS )NSTRUCTION 2EGISTERS !,5 !DDRESS 2EGISTER - :ERO $ATA )NSTRUCTION U X MEMORY MEMORY 2EGISTER 2EG7RITE $ATA -EM2EAD

#ONTROL

1, Ê{°ÓÊ / iÊL>ÈVʈ“«i“i˜Ì>̈œ˜ÊœvÊÌ iÊ *-ÊÃÕLÃiÌ]ʈ˜VÕ`ˆ˜}ÊÌ iʘiViÃÃ>ÀÞʓՏ̈«iݜÀÃÊ>˜`ÊVœ˜ÌÀœÊˆ˜iÃ°Ê 4HETOPMULTIPLEXORh-UXv CONTROLSWHATVALUEREPLACESTHE0#0# ORTHEBRANCHDESTINATIONADDRESS THEMULTIPLEXORISCONTROLLED BYTHEGATETHATh!.$SvTOGETHERTHE:EROOUTPUTOFTHE!,5ANDACONTROLSIGNALTHATINDICATESTHATTHEINSTRUCTIONISABRANCH4HEMIDDLE MULTIPLEXOR WHOSEOUTPUTRETURNSTOTHEREGISTERlLE ISUSEDTOSTEERTHEOUTPUTOFTHE!,5INTHECASEOFANARITHMETIC LOGICALINSTRUCTION ORTHEOUTPUTOFTHEDATAMEMORYINTHECASEOFALOAD FORWRITINGINTOTHEREGISTERlLE&INALLY THEBOTTOMMOSTMULTIPLEXORISUSEDTO DETERMINEWHETHERTHESECOND!,5INPUTISFROMTHEREGISTERSFORANARITHMETIC LOGICALINSTRUCTION/2ABRANCH ORFROMTHEOFFSETlELDOF THEINSTRUCTIONFORALOADORSTORE 4HEADDEDCONTROLLINESARESTRAIGHTFORWARDANDDETERMINETHEOPERATIONPERFORMEDATTHE!,5 WHETHER THEDATAMEMORYSHOULDREADORWRITE ANDWHETHERTHEREGISTERSSHOULDPERFORMAWRITEOPERATION4HECONTROLLINESARESHOWNINCOLORTO MAKETHEMEASIERTOSEE#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 6 MIPS Combinational Elements

• AND gate (Y = A & B) • Adder (Y = A + B)

A A Y B + Y B

(Y = S ? A : B) • Arithmetic/Logic Unit (ALU) (Y = F(A,B)) A Y B A ALU Y S B

F

CSEE 3827, Spring 2009 Martha Kim 8 Clocking Methodology

Combinational logic transforms data during clock cycles. Longest combinational delay determines clock period.

3TATE 3TATE ELEMENT #OMBINATIONALLOGIC ELEMENT  

#LOCKCYCLE

1, Ê{°ÎÊ œ“Lˆ˜>̈œ˜>Êœ}ˆV]ÊÃÌ>ÌiÊii“i˜ÌÃ]Ê>˜`ÊÌ iÊVœVŽÊ>ÀiÊVœÃiÞÊÀi>Ìi`°Ê)NA SYNCHRONOUSDIGITALSYSTEM THECLOCKDETERMINESWHENELEMENTSWITHSTATEWILLWRITEVALUESINTOINTERNAL STORAGE!NYINPUTSTOASTATEELEMENTMUSTREACHASTABLEVALUETHATIS HAVEREACHEDAVALUEFROMWHICHTHEY WILLNOTCHANGEUNTILAFTERTHECLOCKEDGE BEFORETHEACTIVECLOCKEDGECAUSESTHESTATETOBEUPDATED!LLSTATE ELEMENTSINTHISCHAPTER INCLUDINGMEMORY AREASSUMEDTOBEEDGE TRIGGERED#OPYRIGHTÚ%LSEVIER )NC !LLRIGHTSRESERVED

3TATE #OMBINATIONALLOGIC ELEMENT

1, Ê{°{Ê ˜Êi`}i‡ÌÀˆ}}iÀi`ʓiÌ œ`œœ}ÞÊ>œÜÃÊ>ÊÃÌ>ÌiÊii“i˜ÌÊ̜ÊLiÊÀi>`Ê>˜`ÊÜÀˆÌ‡ Ìi˜Êˆ˜ÊÌ iÊÃ>“iÊVœVŽÊVÞViÊÜˆÌ œÕÌÊVÀi>̈˜}Ê>ÊÀ>ViÊÌ >ÌÊVœÕ`ʏi>`Ê̜ʈ˜`iÌiÀ“ˆ˜>ÌiÊ`>Ì>Ê Û>Õið/FCOURSE THECLOCKCYCLESTILLMUSTBELONGENOUGHSOTHATTHEINPUTVALUESARESTABLEWHENTHE ACTIVECLOCKEDGEOCCURS&EEDBACKCANNOTOCCURWITHINONECLOCKCYCLEBECAUSEOFTHEEDGE TRIGGEREDUPDATE OFTHESTATEELEMENT)FFEEDBACKWERE POSSIBLE THISDESIGNCOULDNOTWORKPROPERLY /URDESIGNSINTHIS CHAPTERANDTHENEXTRELYONTHEEDGE TRIGGEREDTIMINGMETHODOLOGYANDONSTRUCTURESLIKETHEONESHOWN INTHISlGURE#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 9 Building a datapath incrementally

• Datapath: elements that process data and addresses in the CPU

• Datapath will execute one instruction in one clock cycle

• Each datapath element can only do one function at a time

• Hence, we need separate instruction and data memories

• Use where alternate data sources are used for different instructions

CSEE 3827, Spring 2009 Martha Kim 10 Instruction Fetch

• Fetch Instruction contained in PC register from memory

• Compute PC + 4 for next instruction

)NSTRUCTION ADDRESS

)NSTRUCTION 0# !DD 3UM

)NSTRUCTION MEMORY

A)NSTRUCTIONMEMORY B0ROGRAMCOUNTER C!DDER

1, Ê {°xÊ /ÜœÊ ÃÌ>ÌiÊ ii“i˜ÌÃÊ >ÀiÊ ˜ii`i`Ê ÌœÊ Ã̜ÀiÊ >˜`Ê >VViÃÃÊ ˆ˜ÃÌÀÕV̈œ˜Ã]Ê >˜`Ê >˜Ê >``iÀʈÃʘii`i`Ê̜ÊVœ“«ÕÌiÊÌ iʘiÝÌʈ˜ÃÌÀÕV̈œ˜Ê>``ÀiÃðÊ4HESTATEELEMENTSARETHEINSTRUCTION MEMORY AND THE PROGRAM COUNTER 4HE INSTRUCTION MEMORYNEED ONLY PROVIDE READ ACCESS BECAUSE THE DATAPATHDOESNOTWRITEINSTRUCTIONS3INCETHEINSTRUCTIONMEMORYONLYREADS WETREATITASCOMBINATIONAL LOGICTHEOUTPUTATANYTIMEREmECTSTHECONTENTSOFTHELOCATIONSPECIlEDBYTHEADDRESSINPUT ANDNOREAD CONTROLSIGNALISNEEDED7EWILLNEEDTOWRITETHEINSTRUCTIONMEMORYWHENWELOADTHEPROGRAMTHISIS NOTHARDTOADD ANDWEIGNOREITFORSIMPLICITY 4HEPROGRAMCOUNTERISA BITREGISTERTHATISWRITTENATTHE ENDOFEVERYCLOCKCYCLEANDTHUSDOESNOTNEEDAWRITECONTROLSIGNAL4HEADDERISAN!,5WIREDTOALWAYS ADDITSTWO BITINPUTSANDPLACETHESUMONITSOUTPUT#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 11 Part 1: Instruction Fetch

!DD



2EAD 0# ADDRESS

)NSTRUCTION

)NSTRUCTION MEMORY

1, Ê{°ÈÊ Ê«œÀ̈œ˜ÊœvÊÌ iÊ`>Ì>«>Ì ÊÕÃi`ÊvœÀÊviÌV ˆ˜}ʈ˜ÃÌÀÕV̈œ˜ÃÊ>˜`ʈ˜VÀi“i˜Ìˆ˜}Ê Ì iÊ«Àœ}À>“ÊVœÕ˜ÌiÀ°Ê4HEFETCHEDINSTRUCTIONISUSEDBYOTHERPARTSOFTHEDATAPATH#OPYRIGHTÚ %LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 12 R-Format Instructions

• Read two register operands • Perform arithmetic/logical operation • Write register result

 2EAD !,5OPERATION  REGISTER 2EAD 2EGISTER  2EAD DATA NUMBERS REGISTER :ERO $ATA !,5  2EGISTERS !,5 7RITE RESULT REGISTER 2EAD 7RITE DATA $ATA $ATA 2EG7RITE

A2EGISTERS B!,5

1, Ê {°ÇÊ / iÊ ÌÜœÊ ii“i˜ÌÃÊ ˜ii`i`Ê ÌœÊ ˆ“«i“i˜ÌÊ ,‡vœÀ“>ÌÊ 1Ê œ«iÀ>̈œ˜ÃÊ >ÀiÊ Ì iÊ Ài}ˆÃÌiÀÊwʏiÊ>˜`ÊÌ iÊ1°Ê4HEREGISTERlLECONTAINSALLTHEREGISTERSANDHASTWOREADPORTSANDONEWRITE PORT4HEDESIGNOFMULTIPORTEDREGISTERlLESISDISCUSSEDIN3ECTION#OF !PPENDIX#4HEREGISTERlLE ALWAYSOUTPUTSTHECONTENTSOFTHEREGISTERSCORRESPONDINGTOTHE2EADREGISTERINPUTSONTHEOUTPUTSNO OTHERCONTROLINPUTSARENEEDED)NCONTRAST AREGISTERWRITEMUSTBEEXPLICITLYINDICATEDBYASSERTINGTHE WRITECONTROLSIGNAL2EMEMBERTHATWRITESAREEDGE TRIGGERED SOTHATALLTHEWRITEINPUTSIE THEVALUETO BEWRITTEN THEREGISTERNUMBER ANDTHEWRITECONTROLSIGNAL MUSTBEVALIDATTHECLOCKEDGE3INCEWRITESTO THEREGISTERlLEAREEDGE TRIGGERED OURDESIGNCANLEGALLYREADANDWRITETHESAMEREGISTERWITHINACLOCKCYCLE THEREADWILLGETTHEVALUEWRITTENINANEARLIERCLOCKCYCLE WHILETHEVALUEWRITTENWILLBEAVAILABLETOAREADIN ASUBSEQUENTCLOCKCYCLE4HEINPUTSCARRYINGTHEREGISTERNUMBERTOTHEREGISTERlLEAREALLBITSWIDE WHEREAS THELINESCARRYINGDATAVALUESAREBITSWIDE4HEOPERATIONTOBEPERFORMEDBYTHE!,5ISCONTROLLEDWITH THE!,5OPERATIONSIGNAL WHICHWILLBEBITSWIDE USINGTHE!,5DESIGNEDIN !PPENDIX#7EWILL USETHE:ERODETECTIONOUTPUTOFTHE!,5SHORTLYTOIMPLEMENTBRANCHES4HEOVERmOWOUTPUTWILLNOTBE NEEDEDUNTIL3ECTION WHENWEDISCUSSEXCEPTIONSWEOMITITUNTILTHEN#OPYRIGHTÚ%LSEVIER )NC !LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 13 Load/Store Instructions

• Read register operands • Calculate address using 16-bit offset (use ALU but sign-extend offset) • Load: read memory and update register • Store: write register value to memory

-EM7RITE

2EAD !DDRESS DATA   3IGN $ATA EXTEND 7RITE MEMORY DATA

-EM2EAD

A$ATAMEMORYUNIT B3IGNEXTENSIONUNIT

1, Ê {°nÊ / iÊ ÌÜœÊ Õ˜ˆÌÃÊ ˜ii`i`Ê ÌœÊ ˆ“«i“i˜ÌÊ œ>`ÃÊ >˜`Ê Ã̜ÀiÃ]Ê ˆ˜Ê >``ˆÌˆœ˜Ê ÌœÊ Ì iÊ Ài}ˆÃÌiÀÊwʏiÊ>˜`Ê1ʜvʈ}ÕÀiÊ{°Ç]Ê>ÀiÊÌ iÊ`>Ì>ʓi“œÀÞÊ՘ˆÌÊ>˜`ÊÌ iÊÈ}˜ÊiÝÌi˜Ãˆœ˜Ê՘ˆÌ° 4HEMEMORYUNITISASTATEELEMENTWITHINPUTSFORTHEADDRESSANDTHEWRITEDATA ANDASINGLEOUTPUTFOR THEREADRESULT4HEREARESEPARATEREADANDWRITECONTROLS ALTHOUGHONLYONEOFTHESEMAYBEASSERTEDON ANYGIVENCLOCK4HEMEMORYUNITNEEDSAREADSIGNAL SINCE UNLIKETHEREGISTERlLE READINGTHEVALUEOFAN INVALIDADDRESSCANCAUSEPROBLEMS ASWEWILLSEEIN#HAPTER4HESIGNEXTENSIONUNITHASA BITINPUTTHAT ISSIGN EXTENDEDINTOA BITRESULTAPPEARINGONTHEOUTPUTSEE#HAPTER 7EASSUMETHEDATAMEMORYIS EDGE TRIGGEREDFORWRITES3TANDARDMEMORYCHIPSACTUALLYHAVEAWRITEENABLESIGNALTHATISUSEDFORWRITES !LTHOUGHTHEWRITEENABLEISNOTEDGE TRIGGERED OUREDGE TRIGGEREDDESIGNCOULDEASILYBEADAPTEDTOWORK WITHREALMEMORYCHIPS3EE3ECTION#OF !PPENDIX#FORFURTHERDISCUSSIONOFHOWREALMEMORY CHIPSWORK#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 14 Part 2: R-Type/Load/Store Datapath

2EAD !,5OPERATION REGISTER  2EAD -EM7RITE DATA 2EAD :ERO -EMTO2EG )NSTRUCTION REGISTER !,53RC !,5 2EGISTERS 2EAD !,5 2EAD 7RITE  !DDRESS  DATA RESULT DATA REGISTER - - U U X X 7RITE   DATA $ATA 7RITE MEMORY 2EG7RITE DATA

  3IGN -EM2EAD EXTEND

1, Ê{°£äÊ / iÊ`>Ì>«>Ì ÊvœÀÊÌ iʓi“œÀÞʈ˜ÃÌÀÕV̈œ˜ÃÊ>˜`ÊÌ iÊ,‡ÌÞ«iʈ˜ÃÌÀÕV̈œ˜Ã°Ê4HISEXAMPLESHOWSHOW ASINGLEDATAPATHCANBEASSEMBLEDFROMTHEPIECESIN&IGURESANDBYADDINGMULTIPLEXORS4WOMULTIPLEXORSARENEEDED ASDESCRIBEDINTHEEXAMPLE#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 15 Branch Instructions

• Read register operands

• Compare operands (use ALU: subtract and check zero output)

• Calculate target address

• Sign-extend displacement

• Shift left two places (word displacement)

• Add to PC+4 (already calculated by instruction fetch)

CSEE 3827, Spring 2009 Martha Kim 16 Part 3: Instruction Fetch w. Branch

0# FROMINSTRUCTIONDATAPATH

"RANCH 3UM !DD TARGET 3HIFT LEFT

2EAD !,5OPERATION  )NSTRUCTION REGISTER 2EAD 2EAD DATA REGISTER 4OBRANCH :ERO 2EGISTERS !,5 CONTROLLOGIC 7RITE REGISTER 2EAD DATA 7RITE DATA 2EG7RITE

  3IGN EXTEND

1, Ê{°™Ê / iÊ`>Ì>«>Ì ÊvœÀÊ>ÊLÀ>˜V ÊÕÃiÃÊÌ iÊ1Ê̜ÊiÛ>Õ>ÌiÊÌ iÊLÀ>˜V ÊVœ˜`ˆÌˆœ˜Ê >˜`Ê>ÊÃi«>À>ÌiÊ>``iÀÊ̜ÊVœ“«ÕÌiÊÌ iÊLÀ>˜V ÊÌ>À}iÌÊ>ÃÊÌ iÊÃՓʜvÊÌ iʈ˜VÀi“i˜Ìi`Ê* Ê >˜`ÊÌ iÊÈ}˜‡iÝÌi˜`i`]ʏœÜiÀÊ£ÈÊLˆÌÃʜvÊÌ iʈ˜ÃÌÀÕV̈œ˜Ê­Ì iÊLÀ>˜V Ê`ˆÃ«>Vi“i˜Ì®]Êà ˆvÌi`Ê ivÌÊÓÊLˆÌðÊ4HEUNITLABELED3HIFTLEFTISSIMPLYAROUTINGOFTHESIGNALSBETWEENINPUTANDOUTPUTTHAT ADDSTWOTOTHELOW ORDERENDOFTHESIGN EXTENDEDOFFSETlELDNOACTUALSHIFTHARDWAREISNEEDED SINCE THEAMOUNTOFTHEhSHIFTvISCONSTANT3INCEWEKNOWTHATTHEOFFSETWASSIGN EXTENDEDFROMBITS THESHIFT WILLTHROWAWAYONLYhSIGNBITSv#ONTROLLOGICISUSEDTODECIDEWHETHERTHEINCREMENTED0#ORBRANCHTARGET SHOULD REPLACE THE 0# BASED ON THE :ERO OUTPUT OFTHE !,5 #OPYRIGHT Ú  %LSEVIER )NC!LL RIGHTS RESERVED

CSEE 3827, Spring 2009 Martha Kim 17 Full Datapath

0#3RC

- !DD U X !,5  !DD RESULT 3HIFT LEFT

2EAD 2EAD !,53RC !,5OPERATION 0# REGISTER  ADDRESS 2EAD -EM7RITE DATA 2EAD -EMTO2EG REGISTER :ERO )NSTRUCTION 2EGISTERS !,5 !,5 2EAD 2EAD !DDRESS 7RITE RESULT DATA )NSTRUCTION REGISTER DATA - - U U MEMORY X 7RITE X DATA 7RITE $ATA 2EG7RITE DATA MEMORY

  -EM2EAD 3IGN EXTEND

1, Ê{°££Ê / iÊȓ«iÊ`>Ì>«>Ì ÊvœÀÊÌ iÊ *-Ê>ÀV ˆÌiVÌÕÀiÊVœ“Lˆ˜iÃÊÌ iÊii“i˜ÌÃÊÀiµÕˆÀi`ÊLÞÊ`ˆvviÀi˜Ìʈ˜ÃÌÀÕV̈œ˜Ê V>ÃÃiðÊ4HECOMPONENTSCOMEFROM&IGURES  ANDÊ4HISDATAPATHCANEXECUTETHEBASICINSTRUCTIONSLOAD STOREWORD !,5 OPERATIONS ANDBRANCHES INASINGLECLOCKCYCLE!NADDITIONALMULTIPLEXORISNEEDEDTOINTEGRATEBRANCHES4HESUPPORTFORJUMPSWILLBE ADDEDLATER#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 18 MIPS Datapath Control Datapath Control Scheme

 Main control controls whole - !DD U X datapath based on opcode !,5   !DDRESULT 3HIFT 2EG$ST LEFT "RANCH -EM2EAD )NSTRUCTION;n= -EMTO2EG #ONTROL !,5/P -EM7RITE !,53RC 2EG7RITE

)NSTRUCTION;n= 2EAD ALU control controls ALU 2EAD REGISTER 0# ADDRESS 2EAD based on opcode )NSTRUCTION;n= 2EAD DATA REGISTER :ERO )NSTRUCTION  (ALUOp) and function ;n= !,5 !,5 2EAD - 7RITE 2EAD  !DDRESS  U RESULT DATA )NSTRUCTION )NSTRUCTION;n= REGISTER DATA - - field (funct) X U U MEMORY  X 7RITE X   DATA 2EGISTERS 7RITE $ATA DATA MEMORY

)NSTRUCTION;n=  3IGN  !,5 EXTEND CONTROL

)NSTRUCTION;n=

1, Ê{°£ÇÊ / iÊȓ«iÊ`>Ì>«>Ì ÊÜˆÌ ÊÌ iÊVœ˜ÌÀœÊ՘ˆÌ°Ê4HEINPUTTOTHECONTROLUNITISTHE BITOPCODElELDFROMTHEINSTRUCTION 4HEOUTPUTSOFTHECONTROLUNITCONSISTOFTHREE BITSIGNALSTHATAREUSEDTOCONTROLMULTIPLEXORS2EG$ST !,53RC AND-EMTO2EG THREE SIGNALSFORCONTROLLINGREADSANDWRITESINTHEREGISTERlLEANDDATAMEMORY2EG7RITE -EM2EAD AND-EM7RITE A BITSIGNALUSEDIN DETERMININGWHETHERTOPOSSIBLYBRANCH"RANCH ANDA BITCONTROLSIGNALFORTHE!,5!,5/P !N!.$GATEISUSEDTOCOMBINETHE BRANCHCONTROLSIGNALANDTHE:EROOUTPUTFROMTHE!,5THE!.$GATEOUTPUTCONTROLSTHESELECTIONOFTHENEXT0#.OTICETHAT0#3RCISNOW ADERIVEDSIGNAL RATHERTHANONECOMINGDIRECTLYFROMTHECONTROLUNIT4HUS WEDROPTHESIGNALNAMEINSUBSEQUENTlGURES#OPYRIGHTÚ %LSEVIER )NC!LLRIGHTSRESERVED CSEE 3827, Spring 2009 Martha Kim 20 ALU Control Inputs/Outputs

Main Control R-type → 10 ALUOp lw → 00 2 ALU 0000 → AND sw → 00 Operation 0001 → OR beq → 01 ALU control 4 0010 → add

Instruction[5:0] 0110 → subtract

0111 → set on less than

(See Appendix C of text for implementation of corresponding ALU.)

CSEE 3827, Spring 2009 Martha Kim 21 ALU Control Implementation

opcode ALUOp from main control Instruction[5:0] Operation lw → 00 xxxxxx → load word → add → 0010

sw → 00 xxxxxx → store word → add → 0010

beq → 01 xxxxxx → branch equal → subtract → 0110

R-type → 10 100000 → add → add → 0010

R-type → 10 100010 → subtract → subtract → 0110

R-type → 10 100100 → AND → AND → 0000

R-type → 10 100101 → OR → OR → 0001

R-type → 10 101010 → set on less than → set on less than → 0111

CSEE 3827, Spring 2009 Martha Kim 22 ALU Control Truth Table

ALUOp from main control Instruction[5:0] Operation

00 xxxxxx 0010

00 xxxxxx 0010

01 xxxxxx 0110

10 100000 0010

10 100010 0110

10 100100 0000

10 100101 0001

10 101010 0111

CSEE 3827, Spring 2009 Martha Kim 23 ALU Control Truth Table 2

!,5/P &UNCTlELD !,5/P !,5/P & & & & & & /PERATION   888888    888888    88   8 88    88    88   8 88 

&)'52% 4HETRUTHTABLEFORTHE!,5CONTROLBITSCALLED/PERATION 4HEINPUTSARE THE!,5/PANDFUNCTIONCODElELD/NLYTHEENTRIESFORWHICHTHE!,5CONTROLISASSERTEDARESHOWN3OME DONT CAREENTRIESHAVEBEENADDED&OREXAMPLE THE!,5/PDOESNOTUSETHEENCODING SOTHETRUTHTABLE CANCONTAINENTRIES8AND8 RATHERTHANAND.OTETHATWHENTHEFUNCTIONlELDISUSED THElRST BITS&AND& OFTHESEINSTRUCTIONSAREALWAYS SOTHEYAREDONT CARETERMSANDAREREPLACEDWITH88 INTHETRUTHTABLE#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 24 Datapath Control Scheme

 - !DD U X !,5   !DDRESULT 3HIFT 2EG$ST LEFT "RANCH -EM2EAD )NSTRUCTION;n= -EMTO2EG #ONTROL !,5/P -EM7RITE !,53RC 2EG7RITE

)NSTRUCTION;n= 2EAD 2EAD REGISTER 0# ADDRESS 2EAD )NSTRUCTION;n= 2EAD DATA REGISTER :ERO )NSTRUCTION  ;n= !,5 !,5 2EAD - 7RITE 2EAD  !DDRESS  U RESULT DATA )NSTRUCTION )NSTRUCTION;n= REGISTER DATA - - X U U MEMORY  X 7RITE X   DATA 2EGISTERS 7RITE $ATA DATA MEMORY

)NSTRUCTION;n=  3IGN  !,5 EXTEND CONTROL

)NSTRUCTION;n=

1, Ê{°£ÇÊ / iÊȓ«iÊ`>Ì>«>Ì ÊÜˆÌ ÊÌ iÊVœ˜ÌÀœÊ՘ˆÌ°Ê4HEINPUTTOTHECONTROLUNITISTHE BITOPCODElELDFROMTHEINSTRUCTION 4HEOUTPUTSOFTHECONTROLUNITCONSISTOFTHREE BITSIGNALSTHATAREUSEDTOCONTROLMULTIPLEXORS2EG$ST !,53RC AND-EMTO2EG THREE SIGNALSFORCONTROLLINGREADSANDWRITESINTHEREGISTERlLEANDDATAMEMORY2EG7RITE -EM2EAD AND-EM7RITE A BITSIGNALUSEDIN DETERMININGWHETHERTOPOSSIBLYBRANCH"RANCH ANDA BITCONTROLSIGNALFORTHE!,5!,5/P !N!.$GATEISUSEDTOCOMBINETHE BRANCHCONTROLSIGNALANDTHE:EROOUTPUTFROMTHE!,5THE!.$GATEOUTPUTCONTROLSTHESELECTIONOFTHENEXT0#.OTICETHAT0#3RCISNOW ADERIVEDSIGNAL RATHERTHANONECOMINGDIRECTLYFROMTHECONTROLUNIT4HUS WEDROPTHESIGNALNAMEINSUBSEQUENTlGURES#OPYRIGHTÚ %LSEVIER )NC!LLRIGHTSRESERVED CSEE 3827, Spring 2009 Martha Kim 25 Main control signals derive from instruction types

R-type: 0 rs rt rd shamt funct 31:26 25:21 20:16 15:11 10:6 5:0

Load/Store: 35 or 43 rs rt constant 31:26 25:21 20:16 15:0

Branch: 4 rs rt constant 31:26 25:21 20:16 15:0 sign-extend always read, write for and add read except R-type for load and load CSEE 3827, Spring 2009 Martha Kim 26 R-Type Control Signals

0

1

0 0 0

1

0

10 (Alt. illustration: Fig. 4.19) CSEE 3827, Spring 2009 Martha Kim27 lw Control Signals

0

1

0 1 1

0

1

00 (Alt. illustration: Fig. 4.20) CSEE 3827, Spring 2009 Martha 28Kim sw Control Signals

0

0

1 1 x

x

0

00

CSEE 3827, Spring 2009 Martha Kim 29 beq Control Signals

1

0

0 0 x

x

0

01 (Alt. illustration: Fig. 4.21) CSEE 3827, Spring 2009 Martha Kim 30 Main Control Truth Table

i“Ìœ‡ ,i}‡ i“‡ i“‡ Instruction[31:26]˜ÃÌÀÕV̈œ˜ ,i} ÃÌ 1-ÀV ,i} 7ÀˆÌi ,i>` 7ÀˆÌi À>˜V 1"«£ 1"«ä 000000 ,‡vœÀ“>Ì £ ä ä£ää ä £ ä 100011 LW ä £ £££ä ä ä ä 101011 SW 8 £ 8ää£ ä ä ä 000100 BEQ 8 ä 8äää £ ä £ 1, Ê{°£nÊ / iÊÃiÌ̈˜}ʜvÊÌ iÊVœ˜ÌÀœÊˆ˜iÃʈÃÊVœ“«iÌiÞÊ`iÌiÀ“ˆ˜i`ÊLÞÊÌ iʜ«Vœ`iÊwÊi`ÃʜvÊÌ iʈ˜ÃÌÀÕV̈œ˜°Ê4HElRST ROWOFTHETABLECORRESPONDSTOTHE2 FORMATINSTRUCTIONSADD SUB !.$ /2 ANDSLT &ORALLTHESEINSTRUCTIONS THESOURCEREGISTERlELDS ARERSANDRT ANDTHEDESTINATIONREGISTERlELDISRDTHISDElNESHOWTHESIGNALS!,53RCAND2EG$STARESET&URTHERMORE AN2 TYPEINSTRUCTION WRITESAREGISTER2EG7RITE BUTNEITHERREADSNORWRITESDATAMEMORY7HENTHE"RANCHCONTROLSIGNALIS THE0#ISUNCONDITIONALLY REPLACEDWITH0# OTHERWISE THE0#ISREPLACEDBYTHEBRANCHTARGETIFTHE:EROOUTPUTOFTHE!,5ISALSOHIGH4HE!,5/PlELDFOR2 TYPE INSTRUCTIONSISSETTOTOINDICATETHATTHE!,5CONTROLSHOULDBEGENERATEDFROMTHEFUNCTlELD4HESECONDANDTHIRDROWSOFTHISTABLE GIVETHECONTROLSIGNALSETTINGSFORLWANDSW4HESE!,53RCAND!,5/PlELDSARESETTOPERFORMTHEADDRESSCALCULATION4HE-EM2EADAND -EM7RITEARESETTOPERFORMTHEMEMORYACCESS&INALLY 2EG$STAND2EG7RITEARESETFORALOADTOCAUSETHERESULTTOBESTOREDINTOTHERT REGISTER4HEBRANCHINSTRUCTIONISSIMILARTOAN2 FORMATOPERATION SINCEITSENDSTHERSANDRTREGISTERSTOTHE!,54HE!,5/PlELDFORBRANCH ISSETFORASUBTRACT!,5CONTROL WHICHISUSEDTOTESTFOREQUALITY.OTICETHATTHE-EMTO2EGlELDISIRRELEVANTWHENTHE2EG7RITESIGNAL ISSINCETHEREGISTERISNOTBEINGWRITTEN THEVALUEOFTHEDATAONTHEREGISTERDATAWRITEPORTISNOTUSED4HUS THEENTRY-EMTO2EGINTHELAST TWOROWSOFTHETABLEISREPLACEDWITH8FORDONTCARE$ONTCARESCANALSOBEADDEDTO2EG$STWHEN2EG7RITEIS4HISTYPEOFDONTCAREMUST BEADDEDBYTHEDESIGNER SINCEITDEPENDSONKNOWLEDGEOFHOWTHEDATAPATHWORKS#OPYRIGHTÚ%LSEVIER )NC!LLRIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 31 Implementing Jumps The j instruction

• Unconditional jump to instruction at label

j label

• Instruction encoded in J-type format

2 address 31:26 25:0 • Jump uses word addresses

• Update PC with concatenation of: • Top 4 bits of old PC • 26-bit jump address • 00

CSEE 3827, Spring 2009 Martha Kim 33 Implementing the jump instruction

CSEE 3827, Spring 2009 Martha Kim 34 Implementing the jump instruction -- in class soln

CSEE 3827, Spring 2009 Martha Kim 35 CPU Performance Understanding Performance

• Algorithm → number of operations executed

• Programming language, compiler, architecture → determine number of machine instructions executed per operation

and memory system → determines how fast instructions are executed

• I/O system (including OS) → determines how fast I/O operations are executed

CSEE 3827, Spring 2009 Martha Kim 37 Defining Performance

• Which airplane has the best performance?

Boeing 777 Boeing 777

Boeing 747 Boeing 747

BAC/Sud BAC/Sud Concorde Concorde Douglas Douglas DC- DC-8-50 8-50

0 100 200 300 400 500 0 2000 4000 6000 8000 10000

Passenger Capacity Cruising Range (miles)

Boeing 777 Boeing 777

Boeing 747 Boeing 747

BAC/Sud BAC/Sud Concorde Concorde Douglas Douglas DC- DC-8-50 8-50

0 500 1000 1500 0 100000 200000 300000 400000

Cruising Speed (mph) Passengers x mph

CSEE 3827, Spring 2009 Martha Kim 38 Response Time and Throughput

Response time: how long it takes to do a task, sometimes also called latency [time/work]

Throughput: total work done per unit time [work/time]

How are response time and throughput affected by. . . Replacing the processor with a faster version?

Adding more processors?

For now, we’ll focus on response time CSEE 3827, Spring 2009 Martha Kim 39 Relative Performance

Define: Performance = 1 / Execution Time

“X is n times faster than Y” → Performance X / Performance Y = Execution Time Y / Execution Time X = n

Example:

Program takes 10 s to run on machine A, 15 s on machine B Execution Time B / Execution Time A = 15 / 10 = 1.5 “A is 1.5 times faster than B”

CSEE 3827, Spring 2009 Martha Kim 40 Measuring Execution Time

Define: Elapsed Time

Total response time including all aspects (Processing, I/O, overhead, idle time)

Define: CPU Time

Time spent processing a given job (discounts I/O time, other jobs shares)

Elapsed Time > CPU Time

CSEE 3827, Spring 2009 Martha Kim 41 CPU Clocking

Operation of digital hardware governed by a constant-rate clock

Clock period

Clock Data transfer and computation

Update state Time

Clock period: duration of a clock cycle e.g., 250ps = 0.25ns

Clock frequency (rate): cycles per second e.g., 4.0GHz = 4000MHz

CSEE 3827, Spring 2009 Martha Kim 42 CPU Time

CPU Time = CPU Clock Cycles * Clock Cycle Time = CPU Clock Cycles /

Performance improved by: 1. Reducing number of clock cycles 2. Increasing clock rate (reducing clock period)

Hardware designer must often trade off clock rate against cycle count.

CSEE 3827, Spring 2009 Martha Kim 43 CPU Time Example

Computer A: 2GHz clock, 10s CPU time

Designing Computer B: - Aim for 6s CPU Time - Clock rate increase requires 1.2x the number of cycles

How fast must Computer B’s clock be?

Clock CyclesB 1.2×Clock CyclesA Clock RateB = = CPU TimeB 6s

Clock CyclesA = CPU Time A ×Clock RateA = 10s×2GHz = 20×109 1.2×20×109 24×109 Clock Rate = = = 4GHz B 6s 6s

CSEE 3827, Spring 2009 Martha Kim 44 Instruction Count and CPI

Instruction count Determined by program, ISA, and compiler

Average cycles per instruction (CPI) - Determined by CPU hardware - If different instructions have different CPI, can compute a weighted average based on instruction mix

Clock Cycles = Instruction Count * Cycles per Instruction

CPU Time = Instruction Count * CPI * Clock Cycle Time

= (Instruction Count * CPI) / Clock Rate

CSEE 3827, Spring 2009 Martha Kim 45 CPI Example

Computer A: cycle time = 250ps, CPI=2.0 Computer B: cycle time = 500ps, CPI=1.2

Same ISA

Which is faster, and by how much?

CPU Time Instruction Count CPI Cycle Time A = × A × A = I× 2.0× 250ps = I×500ps A is faster... CPU Time Instruction Count CPI Cycle Time B = × B × B = I×1.2×500ps = I× 600ps CPU Time I× 600ps B = = 1.2 … by this much CPU Time I 500ps A ×

CSEE 3827, Spring 2009 Martha Kim 46 Amdahl’s Law

Be aware when optimizing. . .

Taffected T improved = + T unaffected improvement factor

Example: On machine A, multiplication accounts for 80s out of 100s total CPU time.

How much improvement in multiplication performance to get 5x speedup overall?

Corollary: make the common case fast

CSEE 3827, Spring 2009 Martha Kim 47 Performance Summary

Instructions Clock cycles Seconds CPU Time = x x Program Instruction Clock cycle

Algorithm, programming language and compiler compiler affect these terms.

ISA affects all three.

Performance depends on all of these things.

CSEE 3827, Spring 2009 Martha Kim 48 Single-Cycle CPU Performance Issues

• Longest delay determines clock period

• Critical path: load instruction

• instruction memory → register file → ALU → data memory → register file

• Not feasible to vary clock period for different instructions

• We will improve performance by pipelining

CSEE 3827, Spring 2009 Martha Kim 49 Pipelining Preview: Laundry Analogy

0- !- 4IME 4ASK ORDER !

"

#

$

0- !- 4IME

4ASK ORDER !

"

#

$

1, Ê{°ÓxÊ / iʏ>՘`ÀÞÊ>˜>œ}ÞÊvœÀÊ«ˆ«iˆ˜ˆ˜}°!NN "RIAN #ATHY AND$ONEACHHAVEDIRTYCLOTHES TOBEWASHED DRIED FOLDED ANDPUTAWAY4HEWASHER DRYER hFOLDER vANDhSTORERvEACHTAKEMINUTESFOR THEIRTASK3EQUENTIALLAUNDRYTAKESHOURSFORLOADSOFWASH WHILEPIPELINEDLAUNDRYTAKESJUSTHOURS 7ESHOWTHEPIPELINESTAGEOFDIFFERENTLOADSOVERTIMEBYSHOWINGCOPIESOFTHEFOURRESOURCESONTHIS TWO DIMENSIONALTIMELINE BUTWEREALLYHAVEJUSTONEOFEACHRESOURCE#OPYRIGHTÚ%LSEVIER )NC!LL RIGHTSRESERVED

CSEE 3827, Spring 2009 Martha Kim 50