ENEE 646: Digital Computer Design — Project 3 (15%)

Project 3: Precise (15%) ENEE 646: Digital Computer Design, Fall 2017 Assigned: Thursday, Oct 5; Due: Thursday, Nov 9

Purpose This project is intended to help you understand in detail how a modern microprocessor operates in concert with an operating system. You will build a precise- facility into your pipeline, you will add support for memory management via a translation lookaside buffer (TLB), and, using RiSC-16 assembly code, you will write a software TLB-miss handler—the heart of a typical system, which happens to be one of the most fundamental services that a modern operating system provides. Therefore, you will see the interaction between OS-level software and specialized control hardware (e.g. control registers and TLBs, as opposed to simple instruction-execution hardware), and you will see how the OS uses and responds to interrupts—arguably the fundamental building block of today’s multitasking systems. Precise Interrupts in Pipelined Computers The new & improved RiSC-16 pipeline is shown in Fig. 1 on the next page. In the figure, shaded boxes represent clocked registers; thick lines represent 16-bit buses; thin lines represent smaller data paths; and dotted lines represent control paths. The pipeline is slightly different from the one illustrated and described in the previous project, reflecting the following changes: 1. The TLB is added; it has two ports: one for instruction fetch, and one for data-memory access. 2. Support for detecting and handling precise interrupts has been added by the creation of a 7-bit exception register (labeled EXC in the figure) in pipeline registers IF/ID through MEM/WB. Also, the instruction’s PC is maintained all the way to the MEM/WB register. If a stage’s EXC register is non-zero, the corresponding instruction is interpreted as having raised an exception. The pipeline uses these values to ensure that all instructions following an exceptional instruction become NOPs: if there is an exception in the writeback stage, all other instructions in the pipe should be squashed.

3. CTL1 now represents control logic for handling exceptions and interrupts in the writeback stage. When a non-zero value is present in the MEMWB_exc register, all pipeline registers except for the program counter are reset, and the program counter latches the value coming from memory port 2, which corresponds to the contents of the corresponding entry in the interrupt-vector table.

4. CTL8 is new in the memory-access stage; it handles the interaction of exceptional instructions and memory access. For instance, MEMWB_exc should get a non-zero value if either EXMEM_exc is non-zero or the data access (assuming LW or SW) raises an exception. In addition, it should set the new pipeline register MEMWB_ifx (which stands for instruction-fetch exception), indicating that the TLB-miss exception, if present, was generated in the fetch stage and not the memory stage.

5. CTL9 is new in the memory-access stage; it handles TLB_WRITE events.

6. CTL0 is new in the fetch stage; it handles TLB misses by inserting the appropriate exception code into IFID_exc when a TLB miss occurs. As mentioned in class, interrupts must be handled in the writeback stage, otherwise it might be possible for interrupts to be handled out-of-order if back-to-back instructions cause exceptions but do so in different stages of the pipeline. If an exceptional instruction is flagged as such at the moment the exception is detected, it is safe to handle that exceptional condition during writeback because all previous

1 ENEE 646: Digital Computer Design — Project 3 (15%)  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;?

$( &F5.

!74A7,24:3907

&FU %45R-=98 $( '!  %B %6=88 9 ;5479? !  2=88 <= T4 49942R-=98 %O $%H

#

 #L;5479? %6 % &%

( ! 7 7 7 0 ! (

(  %5% 6 $<=19  $=A3 O9 7 9 %6 &F8 7 $# #H:$%#:B $#

!89,,, &% &%   %6 $%H 

%6  &F &F45 45 &F(22 %6 !89425 ( ! 7% 0 ! !#  !#  !#  0

&F710 &F,,: F&% &F,,: $%H & ,,: $# $# %6

0 ! 7% 0 ! $% #% 6& &%  $( =98CR 49942R-=98V7C W %45R-=98 49942R-=98 &F,8(/ %6 $( ! '!  <= X !  &F;53 M9,- 2=88 %B;5479? T4 49942R-=98

0O903/  #L 7-=98 %6=88 &F &F,//7 0V. $%H %6 # %( M/202 % &%  #L;5479?

%6R

&F4:9  7% 0 ! #:#(%% : (0 9 M#:%Q $%H &F71*(3 %6 =A C#=$  89,A05=50

 2 ENEE 646: Digital Computer Design — Project 3 (15%) instructions by that time have finished execution and committed their state to the machine. For this to work, the following things must happen in the pipeline: 1. Once an exceptional condition is detected in the writeback stage, all instructions in the pipeline behind the exceptional instruction are turned into NOPs, and a NOP is inserted into IFID instead of whatever was fetched during that cycle. 2. Each pipeline stage must suspend normal operation if the instruction has caused an exception and the pipeline stage modifies machine state: for example, do not write to memory if the instruction caused a privilege violation in a previous stage—this is indicated by CTL2 taking into account the value in MEMWB_exc. Each stage also forwards the exception code on to the following pipeline stage. If the instruction has not already caused an exception but does so during the stage in question, the EXC field in the following pipeline register must be set appropriately. Otherwise, pipeline operation is as normal. In the simplest form of an exceptional condition, when an exceptional instruction reaches the writeback stage, the following steps are performed by the hardware: 1. The PC of the exceptional instruction, or perhaps the instruction after the exceptional instruction (PC+1) is saved in the EPC control register (exceptional PC). 2. The exception type is used as an index into the interrupt vector table (IVT), located at physical address 80, and the vector corresponding to the exception type is loaded into the program counter. This is known as vectoring to the exception/interrupt handler. Some exceptions cause the hardware to perform additional steps before vectoring to the handler. For instance, you will implement a TLB-miss exception facility and handler routine. In addition to the steps listed above, before vectoring to the handler, the hardware will create an address for the handler to use. The reason hardware might store PC+1 and not PC in EPC is that some exception-raising instructions should be retried at the end of a handler’s execution (e.g., an instruction that causes a TLB miss), while others should be jumped over (e.g., TRAP instructions that invoke the operating system—jumping back to a TRAP instruction will simply re-invoke the trap and so cause an endless loop). If the exceptional instruction should be retried, the handler jumps to PC; if the exceptional instruction should not be re- executed or retried, the handler jumps to PC+1. Thus, EPC must have the correct value. The general form of a RiSC-16 exception/interrupt handler looks like the following: 1. Save the EPC in a safe location. This is done in case another exception or interrupt occurs before the handler has completed execution. 2. Handle the exception/interrupt. 3. Reload the EPC. 4. Return the processor to user/unprivileged mode and jump to the (modified) EPC. Most architectures have a facility (‘return-from-exception’) that performs step 4 in an atomic manner. Extended RiSC-16 ISA We must extend the RiSC-16 instruction-set architecture with system-level mechanisms. The extensions are the usual facilities found in any microarchitecture that supports operating systems and privileged mode. We need a way to protect the operating system from user processes; we need a way to distinguish between processes; we need a way to translate virtual addresses; we need an exception-handling facility; the operating system needs some control registers and would do well to have a set of general-purpose registers that it can use without disturbing user processes (otherwise, it would have to save/restore the entire register state on every exception, interrupt, or trap); etc. Briefly, the extensions include the following:

3  &ENEE()(9,, 646:425 :Digital90708 Computer()3!!7470 Design.9&!7 0—.( 8Project0:39077 :35 (15%)98;?

#04589075;0M 0247Y,5M 7 $70,/8,8)074+ P 7 7

'"$"%$%% 7 PQ " &$#  P2 &$#$! 7 7 7

72 P #04589078,//70880/ 75983472,;53897:.95438

#04589075;08M 0247Y,5M 7 $70,/8,8)074+ .7 $70,/8,8)074+ P 7 .7 ?!# &807!,40%,-;08 !!;5,%C ,3/!74.088$97:.98 7 .7 ?!#

'"$"%$%% 7 .7 ?!# %C PQ P2 " 0# %  7 .7 !$#

7 .7 F$# $WX,3/;078W R ( R2( F 0;5.08 7 .7 F#

72 .72 ! P #04589078,..0880/ #04589078,//70880/ ;5,850.5,;53897:.95438 75983472,;53897:.95438

!74.08847$9,9:8#0458907$!$#+M

Z Q Z 2 Z  Z  Z  Z  Z  Z  Z $F Q$53595,;5)0/94 +   

54 M#04589078,3/20247Y2,58;585-;053575;5;040/,3/343 575;5;040/24/08

 1.A9Addition03/0/ of# an($ exception/interrupt-handling :$ facility, as well as a mechanism that allows software to directly enter the machine into an exceptional state—for example, traps. Some of the !02:89‘exceptions’0(903/9+0 #that-$ this  mechanism-3897:.9-43 supports809,7.+ -are90. 9actually:707-9 +privileged888902 9instructions0;0920.+, 3that-82 the8 % machine+00(903 8-438 ,70 9+0 :8handles:,9 1,. -at9- 9the-08 time14:3 of/ -instruction3,38 2-. execution,74,7.+-90. 9instead:70 9+, of98 vectoring:5547984 to50 a7 ,software9-3? 888 handler9028,3 routine./57-;-9 0?0/ 24/0 !0This300 /includes,7,8 TLB9457 4handling90.99+0 4routines,507,9-3? the88 8HALT90217 4instruction,2:807574 .etc.088 08@70300/,7,894/-89-3?:-8+ -097003574.08808@70300/,7,89497,389,90;-79:,9,//708808@70300/,30(.059-43 +,3/9-3?1,.-9-98@ 9+0452.07,9Addition-3?88890 2of 3a 0privileged0/88420 kernel.43974 mode970?- 8that9078 is,3 activated/74:9/ upon/470 handling9994+,;0 an, 8exception0941?03 0or7, 9interrupt 5:75480 or70 ? -890789+,9upon-9.,3 handling:807-9+ a4 TRAP:9/-89 :instruction,7--3?:807 5which74.08 8raises08B4 an9+0 exception.77-80C-97 4:9/+,;0948,;0 70894709+0039-70 70?-8903.789Addition,90430; of07 8a 0set(. of05 98- 4control3C-390 7registers7:59C47 that97,5 Eare@09 used. 7 -while0198C in9+ 0privileged0(9038-4 3mode.8-3.9 :These/09+ 0are14 9shown947-3 ?inG  //Figure-9-434 2.1, GPR30(. refers059-4 3to - 3a9 general-purpose077:59 +,3/9-3? register.1,.-9-98 CThe,87 processor099,8, 2status0.+ ,register3-829 +(PSR),9,99 4contains788419 7mode,709 4 /-70.bits998 that0390 directly79+02 ,influence.+-30-39 4processor,30(.0 5operation.9-43,989, 9The0I 1interrupt470(,2 5service90C97, register58 $4 2(ISR)041 indicates9+0J0(. 0what59-43 8K 9+,9interrupts9+-820.+ have,3-82 been8:5 received54798,70 by,. the9:, 9processor.9857-;-90 ?The0/ -interrupt3897:.9-4 mask389+ ,register99+02 (IMR),.+-3 0allows+,3/ 9software08,99+0  9-20to4 1ignore-3897: selected.9-430( interrupts.0.:9-43C-3 The890, /exceptional41;0.947 -program3?94, 8counter4197,7 0(EPC)+,3/ 9register0774:9 -is3 0filled %+- 8by-3 hardware.9:/08% L when vectoring to a handler and indicates the PC of the exceptional instruction (or, rather, what +,3/9-3?74:9-308C9+0ML%-3897:.9-43C09. PC to return to after handling the exceptional condition). Access to the general-purpose register  //file-9-4 is3 still41, possible57-;-90? while0/O0 7in30 kernel924/ 0mode9+,9 -through8,.9-;, 9special0/:54 instructions3+,3/9-3? ,(which30(.0 5are9-4 not34 7necessary-39077:5 9for47  :543this+, assignment).3/9-3?,%# !-3897:.9-43C7+-.+7,-808,30(.059-43 4. Addition of a translation lookaside buffer (TLB) that performs address translation. For this project,  //-9-4341,80941R.43974970?-890789+,9,70:80/7+-90-357-;-90?0/24/0 %+080,708+473-3 the TLB will have two (2) entries and be fully associative, with a random replacement policy. On -?:every70 TLBJT! #write,K701 0consult7894, ?the03 0counter7,9 5:7 5bit48 to07 choose0?-8907 which%+05 TLB74.0 8entry8478 9to,9 :replace.870+,8 9 07-!$#1.439,-38 24/0--989+,9/-70.998-319:03.0574.088474507,9-43 %+0,39077:59807;,.070+,8907-4$#1-3/-.,908 75.+,The9-39 0definition77:598+, ;of0 -a 0memory0370.0 -map;0/ that-89 +delineates0574.088 portions47 %+0, 3of9 0the77: virtual592,8 6space70+, 8as90 mapped7-4#1 through,994788 419 7,70the94 TLBs,-?3470 accessible8090.90/ -in39 0kernel77:598 mode %+0 0only,8.05 etc.9,43 This,957 is4+ illustrated7,2.4:3 9in07 the- !figure170 ?above.-8907- 8In1 -user990/ mode,-8+,7 / 7,70all7 of+0 the3; 0address.947-3? space94, +is, mapped3/907,3 /through-3/-., 9the089 TLB+0! (all41 virtual9+00( .addresses059-43,9 are-38 9first7:. 9translated-43 ..0 8by8 9the49+ 0 TLB before being used to reference memory locations; note this implies that all virtual addresses

 4 ENEE 646: Digital Computer Design — Project 3 (15%)

are valid in user mode). In kernel mode, the top half of the address space is mapped through the TLB, and the bottom half is not: this means that addresses in this region, while the computer is in privileged kernel mode, will be sent directly to the memory system without first being translated. 6. Addition of the concept of an address-space identifier (ASID). While this is not strictly necessary, it does simplify the virtual memory mechanisms and organization. The ASID distinguishes between different process that might run on the machine, and its use allows state from many different processes to reside in the TLB and cache at the same time (otherwise the TLB and, at least potentially, the cache as well would have to be flushed on context switches). For purposes of TLB access, ASID 0 is interpreted by the hardware to indicate the kernel, executing in privileged mode. When the processor is in kernel mode, instruction fetch will always use ASID 0, and data- memory access will use whatever ASID is in the processor . This last mechanism allows the operating system to read and write locations within different user address spaces (i.e., masquerade as different processes), but it prevents the operating system from executing instructions from unprivileged processes (which might otherwise constitute a security hole). 7. Definition of some memory-management constructs, including the user organization. Having hardware define this structure is beneficial in that the hardware can quickly generate the address that the TLB-miss handler needs to locate the PTE. In many systems, this limits flexibility because the hardware dictates a page-table format to the operating system. However, if desired, software can always ignore this address (treat it as a ‘hint’ that need not be followed) and implement whatever page table it wants. Note that, in such a scenario, the operating system would then have to generate its own PTE addresses without support from hardware. Terminology Following Motorola’s terminology, we will distinguish two classes of exceptional conditions: those stemming from internal actions (exceptions) and those stemming from external actions (interrupts). For instance, the following interrupt types are defined:

INT_CLOCK INT_TIMER INT_IO Though this is not a particularly exhaustive list, it is more than enough for the purposes of this project, in which we will not even bother with interrupts, save possibly the timer or clock interrupt, both of which could be used to debug the system. In addition, several exception types are defined, including:

EXC_TLBUMISS EXC_TLBKMISS EXC_INVALIDOPCODE EXC_PRIVILEGES Corresponding interrupt/exception numbers are listed in the following section. Each interrupt or exception type corresponds to a single vector point and thus a single handler routine. Another type of exception (another class of internally generated exceptional condition) is a TRAP. For TRAP instructions, there is a whole class of trap types that can implement various operating-system routines, because the trap type is interpreted by the hardware to indicate a particular vector, just like each exception and interrupt type has a separate vector. Thus one can think of trap as equal in stature to interrupt or exception. Each of the trap vectors is OS-defined (e.g., TRAP 1 can mean read or write or open or close ...); the hardware simply vectors to the corresponding handler, so the operating system can attach arbitrary semantics to each of the trap handlers. The most noticeable effect of using this style of mechanism is to reduce the register-file pressure in handling system calls. Note that this is unlike most architectures, in which there is a single TRAP exception and all system calls are vectored through the same exception, and the user code first places the trap type into a user-visible register for the operating system

5 ENEE 646: Digital Computer Design — Project 3 (15%) to read once the handler runs. Our implementation is different not for any specific reason, but rather to explore the OS design space. New Instructions As mentioned, the instruction set has changed, now that software can insert exceptions directly into the pipeline and can execute a number of new privileged instructions. Additional instructions are given in the table below. Pseudo-instructions are generated by the compiler; privileged instructions are a new class.

Assembly-Code Format Meaning

PSEUDO INSTRUCTIONS:

nop do nothing (add r0, r0, r0)

trap type trap to the operating system with vector type

halt or trap TRAP_HALT ask operating system to stop machine & print state

lli regA, immed R[regA] <- R[regA] + (immed & 0x3f)

movi regA, immed R[regA] <- immed, implemented as lui/lli combo

.fill immed initialized data with value immed

.space immed zero-filled data array of size immed

PRIVILEGED INSTRUCTIONS:

tlbw regA, regB write TLB entry (held in regA and regB) to the TLB: bottom 8 bits of regA contain the PFN; bits 9–14 of regB contain ASID; bottom 8 bits of regB contain the VPN; all other bits in regA and regB are ignored

rfe regB return from exception: waits until writeback stage to jump through regB; simultaneously returns processor to previously stored mode (does not save the link pointer)

sys class cause exceptional condition of specified class (inserts the value of ìclassî directly into IDEX.exc; good for testing & debugging)

sys MODE_HALT an example of the previous instruction: stop machine & print state  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;? These and related modifications/extensions are described in more detail in the following sections.

Privileged  !7 (Instructions;(,0)0/:38 (and97:.9 TRAP)(438;,3/%#!? As shown88$4& in3 the(3 9figure$01(, below,:70-0 0the4& JALR19$02 instruction3#(3897 :is. overlapped9(43(84;0 with70,55 a0 number/&(9$, of3 :other2-0 7instructions.4149$07(3 897:.9(438

%97       > ? ;       -%98 -%98 -%98 ;-%98 (1*%220/%,90 1235#7  70A 70A

-%98 -%98 -%98 -%98 -%98 9C07D%802F% 7  70A 70A F%* ! F%*%

If  9the$07 bottommost&(8019$0-4 99422489 7 @bits-( 9of8 4the19 $instruction0(3897:.9 (are43 treated,70970 as,9 0an/ ,opcode8,34 5and.4 /associated0,3/,88 data4.(, for90/ a/ system-level,9,147,8> instruction8902 00;00 other(3897 :.9(4349$07 than9$, a3 JALR.,23 In# this<3 9case,$(8. the,80 1extended9$00B90 opcode3/0/4 5EXT_OP.4/0D takes%* !precedence9,H08570 over.0/0 the3.0 JALR4;07 opcode—i.e.,9$023#45 .4/0I( 0 1 the9$ instruction0(3897:.9( 4does3/4 not083 perform4950714 a7 2jump,J: and25 link,3/ but0(3H rather-:97 ,performs9$07507 1the472 operation89$045 0specified7,9(438 5by0. (1(0/->D%* ! EXT_OP.479$08 0343 23#(3897:.9(4381&0$,;09&4.0,8808L57(;(00,0/8>8902 00;004507,9(4381,3/9$07,(8(3, 410B.059(43,0.43/(9(4389$74:,$:807 00;00%#!(3897:.9(438M10,.$41&$(.$$,89$014004&(3,9(2(3,N

 ".059'438+'39077:598+,3/97,58,8&000,8,00 .$,3,08L1470B,25001 *R 63%,3/ #1&$(.$.$,3,089$0H07300 :80724/0-(9M,708$,/0/(39$014004&(3,9,-00,3/$,3/00/(39$0 07'90-,.289,3094570;039:3&,390/(3907,.9(43&(9$0,70(07(3897:.9(438(39$05(500(30  !7';'6030/4507,9'4380B.059147 .$,3,08,700019:38$,/0/(39$014004&(3,9,-00,3/,70 $,3/00/&$070;07(82489,557457(,90N147(389,3.019$0%3 2,3,,02039(3897:.9(438,70$,3/00/ (39$020247989,30-0.,:80/4(3,84,:,7,390089$,92:09(500/,9, %3,..08808&(00349$,5503 L( 0 19$0%3&(00349-0574-0/147,/,9,,//7088,99$08,209(209$,99$0%3 V#<%(3897:. 9(43(8&7(9(3,,30&0397>(3949$0%3M %$0800B903/0/45.4/08,70:80/94/(70.90>,110.9.$,3,0(39$04507,9(43419$0$,7/&,70W2,3>419$02 (380790B.059(43,0.43/(9(438(3949$05(500(301-:98420,70:80/9424;0/,9,,74:3/9$08>8902 00,70 57(;(00,0/4507,9(438L( 0 9$0>706:(709$,99$0574.08847-0(3H0730024/0M0B.0591479$0%#!(3897:. 9(4381&$(.$8(250>(3;4H09$04507,9(3,8>8902,3/9$:803,-00H0730024/080.:700>  4909$,99$(8 -7(3,8:5,3(3907089(3,54(39N9$0R3%(3897:.9(43(834&/01(30/,8,80(,$90>2470.42500B574.088 9$,3-01470 R3%(834&/01(30/,8430419$080;07,024/08410B0.:9(43L(3.0:/(3,#& ,3/$3!M1 ,3/5:99(3,9$0574.08847(3,850.(1(.24/0(8,57(;(00,0/,.9(43I:807.4/0.,3349R3%9$0574.08 847 %$:81:807.4/02:89,8H9$04507,9(3,8>8902945071472,R3%L&$(.$,004&89$0,7,.01:08$:9 /4&3419$02,.$(301&0709$070,1(008>89024784209$(3,8(2(0,7,99,.$0/M %$:81R3%(834&,9&4 89,,0574.088N1(7891:807.4/0.,008,%#!(3897:.9(43&(9$R3%,89$0,7,:2039 %$(8.,:808,30B.05 9(43,0.43/(9(431,3/9$02,.$(30;0.9478949$04507,9(3,8>8902\8.47708543/(3,%#!$,3/0071&$(.$ (3 9:73 .00,38 :5 ,3> 8>8902 89,90 30.088,7> L349 300/0/ (3 9$(8 (25002039,9(43M ,3/ 9$03 .,008 9$0  *R3%(3897:.9(43  4909$,9L1479$00B9039419$(8574J0.9M(3,00.,808-:9%3*V#<%,3/$]$*#19$07,3/71(00/8 419$023#(3897:.9(43,70(,3470/ %$0809&4(3897:.9(4381$4&0;071/4:809$07,3/ 4771(00/841 9$0(3897:.9(431850.(1>(3,,70,(89079470,/,3/ 47&7(90 #:8089$071(00/94574;(/0,J:25,//7088 9,H0317429$070,(89071(00 %3*V#<%.43897:.98,%30397>17429&4/(1107039 -(9&47/870,/ 17429$070,(89071(00L( 0 1(9:808-49$7,3/7M %$09,-00439$014004&(3,5,,0/08.7(-089$0;,7(4:80B903/0/45.4/08,3/9$0(7,884.(,90/5488(-00/,9, ;,0:08 <90289$,9,708$,/0/(39$09,-00705708039.43/(9(4389$,9.,:80$,7/&,7094;0.94794,8419&,70

@ ENEE 646: Digital Computer Design — Project 3 (15%)

For these non-JALR instructions, we have two classes (privileged system-level operations, and the raising of exceptional conditions through user-level TRAP instructions), each of which has the following timing: 1. Exceptions, interrupts, and traps as well as all MODE changes (for example, MODE_HALT and RFE, which changes the kernel/user mode bit) are shaded in the following table and handled in the writeback stage to prevent unwanted interaction with earlier instructions in the pipeline. 2. Privileged operations except for MODE changes are left unshaded in the following table and are handled wherever is most appropriate: for instance, the TLB-management instructions are handled in the memory stage because doing so guarantees that multiple data-TLB accesses will not happen (i.e., the TLB will not be probed for a data address at the same time that the TLB- WRITE instruction is writing a new entry into the TLB). These extended opcodes are used to directly affect change in the operation of the hardware; many of them insert exceptional conditions into the pipeline, but some are used to move data around the system. All are privileged operations (i.e. they require that the processor be in kernel mode) except for the TRAP instructions, which simply invoke the operating system and thus enable kernel mode securely. Note that this brings up an interesting point: the HALT instruction is now defined as a slightly more complex process than before. HALT is now defined as one of the several modes of execution (including RUN and SLEEP), and putting the processor in a specific mode is a privileged action—user code cannot HALT the processor. Thus, user code must ask the operating system to perform a HALT (which allows the graceful shut-down of the machine, were there a file system or something similar attached). Thus, HALT is now a two-stage process: first, user code calls a TRAP instruction with HALT as the argument. This causes an exceptional condition, and the machine vectors to the operating system’s corresponding TRAP handler, which in turn cleans up any system state necessary (not needed in this implementation) and then calls the MODE_HALT instruction. Note that (for the extent of this project) in all cases but TLB_WRITE and SYS_RFE, the rA and rB fields of the JALR instruction are ignored. These two instructions, however, do use the rB and/or rA fields of the instruction, specifying a register to read and/or write. RFE uses the rB field to provide a jump address taken from the register file. TLB_WRITE constructs a TLB entry from two different 16-bit words read from the register file (i.e., it uses both rA and rB). The table on the following page describes the various extended opcodes and their associated possible data values. Items that are shaded in the table represent conditions that cause hardware to vector to a software routine; all other items are essentially instructions that the hardware executes, just like ADD, ADDI, NAND, etc. Those mechanisms that your Project 3 implementation must support are in bold. Any of these exceptional conditions or extended opcodes can be invoked through the assembler, using the EXTEND opcode (looks like JALR with a non-zero immediate field). In most cases, there is no need to specify both EXT_OP and EXT_DATA because the EXT_DATA name uniquely identifies the exceptional condition or extended operation. The new assembler (can be found on the course website) supports this facility via several mechanisms: 1. First, the sys opcode, which takes a single value as an operand (rA and rB are both zero):

sys MODE_HALT # halts the machine sys INT_CLOCK # vectors to the CLOCK interrupt handler sys EXC_TLBUMISS # vectors to the TLBUMISS exception handler sys TRAP_HALT # vectors to the HALT trap handler (which executes a MODE_HALT) 2. Second, the ext (EXTEND) opcode, which functions like JALR in that two registers are specified, but an additional argument is also given to be used as the immediate value. This is how the TLB_WRITE and TLB_READ functions are specified. Examples of its use:

ext rA, rB, TLB_READ # VPN to search for is in rB, match is written to rA ext rA, rB, TLB_WRITE # reads entry from rB, rA is ignored

7 ENEE 646: Digital Computer Design — Project 3 (15%)

ext rA, rB, MODE_HALT # identical to “sys MODE_HALT” ... rA and rB are ignored ext rA, rB, SYS_RFE # this is identical to “rfe rB” 3. Last, several of the operations are common enough to warrant their own opcodes:

rfe rB # identical to: ext r0, rB, SYS_RFE trap type # identical to: sys type halt # identical to: trap HALT or sys TRAP_HALT tlbw rA, rB # identical to: ext rA, rB, TLB_WRITE

Opcode Extension (EXT_OP) Data Extension (EXT_DATA) Semantics

SYS_MODE (000) MODE_RUN (0) Normal mode—ignore (equivalent to JALR) MODE_SLEEP (1) Low-power doze mode, awakened by interrupt MODE_HALT (2) Halt machine MODE_RFU3 No definition yet .. MODE_RFU7 (3 .. 7) MODE_PANIC8 Halt and output panic value (8..15) (meaning is software- .. MODE_PANIC15 (8 .. 15) defined)

SYS_TLB (001) TLB_READ (0) Probe TLB for PTE matching VPN in rB TLB_WRITE (1) Write contents of rB to TLB (random) TLB_CLEAR (2) Clear contents of TLB

SYS_CRMOVE (010) Top bit specifies to/from Moves a value to/from the control registers from/to the Bottom 3 bits identify CR# general-purpose registers

SYS_RFE (011) Data value ignored Return From Exception: JUMP (without link) to address held in rB (a control register), and right-shift (zero-fill) the kmode history vector

SYS_RESERVED (100) Has no definition yet

SYS_EXCEPTION (101) EXC_GENERAL (0) General exception vector EXC_TLBUMISS (1) User address caused TLB miss EXC_TLBKMISS (2) Kernel address caused TLB miss EXC_INVALIDOPCODE (3) Opcode the execute stage does not recognize EXC_INVALIDADDR (4) Memory address is out of valid range EXC_PRIVILEGES (5) Decoded privileged instruction in user mode

SYS_INTERRUPT (110) INT_IO (0) General I/O interrupt INT_CLOCK (1) Used to synchronize with external real-time clock INT_TIMER (2) Raised by a watchdog timer

SYS_TRAP (111) TRAP_GENERAL (0) General operating system TRAP vector TRAP_HALT (1) Ask operating system to perform HALT

Processor Status Register The processor-status register is often considered the heart of a machine, as it contains some of the most important state information in a processor. See the earlier figure for the RiSC-16’s PSR; it includes three pieces of information: (1) the 6-bit ASID of the executing process, (2) the kernel-mode bit that enables the privileged kernel mode, and (3) an 8-bit wide bit-array that represents a history of previous kernel-mode bit values. 1. The address-space identifier (ASID) is a number that identifies the process currently active on the CPU. This is used to extend the virtual address when accessing the TLB. Note that the operating system gets special treatment in this regard: first, ASID 0 is considered to be synonymous with kernel mode (this allows the TLB to translate kernel addresses appropriately without having to know whether the machine is in privileged mode or not). Second, the kernel can operate in kernel mode with an ASID other than 0 in the processor status register—this would allow the operating system to perform I/O operations to and from a user-level process address space. However, to

8 ENEE 646: Digital Computer Design — Project 3 (15%)

prevent potential security holes, and to clearly delineate handler code and operating-system code from user-level code, instruction fetch should never proceed with a non-zero ASID if the processor is in kernel mode. For instance, when the processor vectors to a handler, the ASID of the previously running process is still in the PSR, but the handler instructions will be fetched from kernel space. 2. The K (kernel-mode) bit indicates whether the processor is in privileged mode or user mode; privileged instructions are only allowed to execute while the processor is in privileged mode; they cause an exception otherwise (EXC_PRIVILEGES). The memory map also changes depending on the mode: as mentioned previously, user space is mapped through the TLB; kernel space is divided into mapped and unmapped regions.

3. The K1 through K8 bits make up a shift register that maintains the previous eight modes of operation; every time an exception or interrupt is handled, the kernel-mode bit is left-shifted into this array (which is itself shifted to the left to accommodate the incoming bit), and the kernel most is set appropriately (usually turned on). Every invocation of the return-from-exception instruction right-shifts the K1 through K8 bit-array (while zero-filling from the left) and places the rightmost bit of the array into the kernel-mode bit. Note that if the architecture is defined such that user mode cannot be entered from kernel mode except by a return-from-exception, then the shift-register implementation can be replaced with a simple counter. This implementation of maintaining previous history bits allows the hardware to handle nested interrupts, a facility that is extremely important in modern processors and operating systems. A nested interrupt is a situation where the hardware handles an exception or interrupt while in the middle of handling a completely different exception or interrupt. In our implementation of virtual memory, the ability to handle nested interrupts will be of crucial importance. Control Registers, Generally The control registers are those extra 8 registers that are visible only in kernel mode:

cr0 - reads as 0, read-only cr1 - For general-purpose use cr2 - For general-purpose use cr3 - For general-purpose use and TLB interface cr4 - Processor Status Register cr5 - Interrupt Status Register cr6 - Interrupt Mask Register cr7 - EPC Register They behave as follows:

cr0 Like rf0, this is always zero.

cr1 (gpr1) This register is for general-purpose use. However, if an interrupt handler is going to use the register, it should save the register’s contents before writing to it and restore the contents prior to exiting, just in case the handler happened to preempt another handler using the register.

cr2 (gpr2) This register is for general-purpose use, just like cr1.

9 ENEE 646: Digital Computer Design — Project 3 (15%)

cr3 (gpr3) Another general-purpose register, with the addition that on TLBMISS exceptions the hardware places into this register the address of the mapping page-table entry. Details are presented in a later section. Because the contents of this register may be overwritten at any moment due to a TLB miss in either user or kernel mode, the kernel should use this only as a scratch register. In particular, the UMISS handler should first move the VPN into a different register before attempting to load the user PTE.

cr4 (psr) The processor-status register described earlier. To simplify your handler implementation, when reading and writing the register via normal instructions (ADD, NAND, LW, SW, etc.) only the ASID portion of the register is read/written. The K-bit and shift register are printed out for debugging, but they are not part of normal reading/writing of the register file.

cr5 (isr) The interrupt status register, which contains a single bit for every interrupt type. Whenever an interrupt occurs, its corresponding bit in this register is set high. Thus, a simple poll of the interrupt status is possible by comparing its contents to r0; if they are equal, no interrupts have occurred. Interrupts cause the hardware to vector to a handler routine, provided that the interrupt mask register (described below) is not disabling the interrupt type. Not used in this project.

cr6 (imr) The interrupt mask register, which is set by the operating system. It defines the interrupts that the processor is allowed to handle, indicated by a ‘1’ in the appropriate bit position. On every cycle, a bit-wise AND is performed by the hardware between the ISR and the IMR; if the result is non-zero, the hardware places the appropriate interrupt class into the IFID_exc register. Not used in this project.

cr7 (epc) The exceptional PC, representing the return address for the exception/interrupt handler. Hardware loads this right before vectoring to an exception, interrupt, or trap handler. The first thing that a handler should do is save this value in a safe place, just in case another handler is invoked in a preemptive manner. These eight registers perform two separate functions: first, they provide the operating system access to mode-control registers such as the PSR and ISR; facilities similar to these are found in nearly every processor architecture in existence and are necessary for most system-level software. The second function provided to the operating system is a small set of shadow registers not visible to user-level processes. While not necessary for implementing most system-level software, shadow registers—such as those found in processor architectures as diverse as Digital’s Alpha, Sun’s SPARC, Hewlett-Packard’s PA-RISC, ’s Xscale, and Motorola’s M-CORE—provide the operating system a safe haven in which to operate. These registers do not need to be saved when moving to and from privileged mode, as would be necessary if the operating system shared the same register file as user-level code. For instance, the MIPS architecture has only one space of registers, and the kernel, to avoid having to save and restore user-level registers, claims two of the registers as its own: the assembler and compiler assure that user-level code does not access these registers, and they are not saved or restored on interrupts or system calls and traps. The disadvantage of having shadow registers is either ensuring that data can be moved between the two sets of registers (e.g. by providing a larger register specifier in some or all privileged-mode instructions, or by providing a special data-move operation whose sole function is to move data between register namespaces), or ensuring that such a scenario is never needed. As mentioned previously, these are the default registers when kernel mode is active, i.e. when the K-mode bit in the processor status register contains a ‘1’ value. Thus, when the operating system performs instructions like the following:

10  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;?

43$39077:598478+8902.,//8,3/97,58 %30/$8,/;,39,50413,;$3583,/47705$89078$80$9307038:7$35 93,9/,9,.,3-024;0/-097003930974809841705$8907890 5 -+574;$/$35,/,7507705$8907850.$1$07$3 842047,//57$;$/050/ 24/0$3897:.9$438;47-+574;$/$35,850.$,//,9, 24;04507,9$437348084/01:3. 9$43$89424;0/,9,-097003705$89073,2085,.08<;47038:7$3593,98:.3,8.03,7$4$830;07300/0/  82039$430/570;$4:8/+;93080,70930/01,:/9705$890787303>0730/24/0$8,.9$;0;$ 0 7303930? 24/0 -$9$3930574.0884789,9:8705$8907.439,$38,@B;,/:0 %3:8;73039304507,9$358+890250714728$3897:. 9$438/$>093014//47$35C

#0730'24/0+843 ,// 7/7 /7 ENEE 646: Digital Computer Design — Project 3 (15%) 9304507,3/;,/:08,7070,/1742.43974/705$89078;,3/930708:/9$877$990394, .43974/705$8907 %3$8 0D,25/024;08930.439039841930574.0884789,9:8705$8907$394.7 %30#7 0kernel$8,34 -mode;$4: 8is9 7,on/0 411-097003$25/02039$359308$D9003705$8907890$539:807705$890785/:80$539 .43add974/705$8 r1,907 8r0,<,8 ,r4:3$1$0/705$89071$/0:8$35930>0730/ 24/0-$9,893014:793705$8907 850.$1$07-$9;47 the operand,89748 0values5,7,90 are705 read$8907 1from$/088 0control/0.90/- registers,+,&G and.43 9the74// 0result/-+ 9is30 written>0730/ 2to4 /a0 control-$9 0 5register.03/$35 4This3$2 5/0 example203 moves9,9$43; 9the30 :readable3$1$0// 0contents8$53.,3 of3, ;the0, processor1,8907,.. status0889$2 register0I930 (just&G0 the// 0ASID)8$53., into33, ;cr1.0/4 70754707 .438:259$43 J07$///4,:3$1$0//08$53C930705$89071$/0$8347, 0397+,77,+41705$89078;,8455480/ There94 is, 3anL obvious0397+,77 trade-off,+ J303 between70,/$354 implementing777$9$3593070 5the$890 sixteen71$/0;93 registers0945-$9 4(eight1930 7user05$8 9registers07$/039$1 $plus073 0eight0/894 control.4 2registers)0174293 as0 2a 4unified/0-$9$ 3register93057 4file.08 8using89,9: 8the70 5kernel-mode$8907M$ 0 ;93 0bit-4 as994 the2L fourth705$890 register-specifier78,707010703.0/ $bit,3: 8or07 as two2 4separate/0;,3/ 9register30945L files705$ 8selected9078,70 7by01 0a7 0MUX3.0/$ 3controlled>0730/24 by/0 theN4 :kernel-mode300/94$25/ 0bit.20 3Depending993$8.4330 .on9$4 3  implementation, the unified design can have a faster access time; the MUXed design can have lower power consumption. //7088% We7,3 will8-,9 /do43 a, unified3/%1 design:8 the register file is now a 16-entry array of registers, as opposed%"0 to2 0an24 8-entry7( 2,3, array.,020 3When9.250 0reading2039,9. 4or3 /writing01.308 5the,,0 8register94-0 file,8 the47/ top8.3 bit003 ,of9" the %" register0701470: identifier, -.9 ;.79:,0,//7088.8.425480/41,3? -.9'! ,3/,3? -.95,,0411809:,8.00:897,90/.39"01.,:70-0048C

'%79:,(//7088. -"98 1-"98 1-"98 //7088$5,.0;/039"1"07/$;0 '"79:,'!,*0 :2-07/'! 0 !,*0 11809

%3

!"#8%.,(//7088. !,*07,20 :2-07/! 0 !,*0 11809 %"01.,:70,084.00:897,9089"020.",3.8241,//708897,380,9.43C;.79:,0,//708808,7097,380,90/-(9"0 needs% toD come.3945 from"(8.. ,the0,/ mode/70880 bit8 % in7,3 the80,9 .process43.438 .status89841 3register—i.e.,49".3,24709 "the,3 7bottom050,..3, 89 "registers0;.79:,0 are5,, referenced03:2-07 in user8. mode,9"9"0. 4and7708 the543 /top.3, 85 ,registers,017,20 are3: 2referenced-07 %"05 in,, 0kernel411809 mode..8./03 9You..,0 .need3-49 "to, /implement/708808F,, .this;03 847/.8 connection.,99"08, 2004.,9.438.9".3,5,,0:8"09"079"05,,0.8;.79:,0475"(8..,0G  %"0%D.8,.,."0::8:,00(.250020390/,8,.439039,//7088,-0020247(FG:471:00(,884..,9.;0 Address.,." 0Translation K39".8.250 0and203 9TLBs,9.43:9"0.,."0L89,,1.00/.89"0.43.,903,9.4341, -.9$K,3/,3? -.9;.7 The memory-management9:,05,,03:2-07 %"0 .implementation47708543/.3,/,9 ,defines1.00/4 pages19"0 .to,. "be0 0256397( words.89"0 in5, ,length.017,2 0Therefore,3:2-078 "a0 16-bit709"0 virtual.3 /address..,90/; is.7 9composed:,05,,0., of3- an01 48-bit:3/ VPN and an 8-bit page offset, as illustrated in the figure below: The figureK39".8 also.25 0illustrates02039,9.4 3the:,0 0mechanism:807,//708 8of08 address1742 P translation: 94 P virtual,70 addresses2,550/9" are74: ,translated"9"0%D by % the"09 45 ",01419"0R07300L8;.79:,085,.0.8,0842,550/9"74:,"9"0%D %"0-49942",01419"0R07300L8,//7088 TLB 8into5,.0 physical:7,3,.3, addresses.1742,//7 0Translation88 P  9consists4 PS of .nothing82,550/ more/.70. than90(4 3replacing949"0-4 9the942 virtual",0141 page2,.3 number202 with 4the7(T corresponding9"05"(8..,0, /page/708 8frame06:,0 8number.9"0;.79: The,0,/ page/7088 offset %":8 :is7 0identical10703.08 in94 both9".88 5addresses,.0.,33 4(a9 .given,:80 ,word%D is at the2 same.88 % location".8.8,3 .within25479, 3a9 page,.438. /whether07,9.439 4the70 2page02- is07 virtual8"03/ or08 .physical).,3.3,(4: 7 %D 2.88",3/007 The TLB%"0 #is. $a cache, ", usually8,8419 %implemented,70 2,3,,0/ as%0 a content %$%88% 2addressable5)*20,38 memory9$,99$0 4(CAM),507,9%31 or8* 8fully9022 associative,3/3499$0 cache.$ ,In7/ 4this,70 2implementation,%87085438%-)0147 the$,3 cache’s/)%31% 0tag 7field012332 is9$ 0the,. 9concatenation414,)8%319$0 of5, a1 06-bit9,-) 0ASID43, %and9 an2% 88-bit8941 %3/ virtual page number. The corresponding data field of the cache entry is the page frame number where the indicated virtual page can be found.  In this implementation, all user addresses from 0x0000 to 0xFFFF are mapped through the TLB. The top half of the kernel’s virtual space is also mapped through the TLB. The bottom half of the kernel’s address space, ranging from address 0x0000 to 0x7FFF is mapped directly onto the bottom half of main memory —the physical address equals the virtual address. Thus, references to this space cannot cause a TLB miss. This is an important consideration to remember when designing your TLB-miss handler. The RiSC-16 has a software-managed TLB. This simply means that the operating system, and not the hardware, is responsible for handling TLB refill, the act of walking the page table on a TLB miss to find the appropriate page-table entry and inserting it into the TLB. When the TLB fails to find a given VPN for a mappable virtual address, the TLB raises an exception, invoking the operating system. If the address being translated is a user address (i.e., if the CPU was in user mode at the time of the exception), then the exception raised is a user miss, EXC_TLBUMISS. If the CPU is in kernel mode, and the top bit of the virtual address to be translated is a ‘1,’ then a kernel miss, or EXC_TLBKMISS, is raised. If in kernel

11  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;?  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;?

9"0,557457),905,*0 9,--00397/,3/)38079)3*)9)3949"0%3 6"039"0%31,)-8941)3/,*);03'! 147 ,9"20,,5555,7-4-507);,)9709:5,,-*,0/ 9/,7-0-8080>39"970/%,3/)378,0)8709)83*,3)90)?3.94059"9)043%>3)3; 46A")30*39"0%45307,19,)3)-*8984/18)930/2, *B1);90"30',/!/ 7018487 -,0)23*,5975,,3-8--0,9;0)/79):8,,-,:/80/770,8/8/>790"808%C)3 0 >)71,9)"808,!3&0?F.,085)93)4:38>0)73;244A/)30*,99"9"00495)2070,94)31*9"80/08?90.205 9)B4139G">09",0/3/790"808 0-?0.)035*9)947,3378,-),890/))8,H::808077,2//)78088>I8C)K 0 >*)1%93"0&!&B$F$, 8B1)39":0807!2&4)/80),39A9"070390)2-2044/109>",030/?.9"00599)4453G->)99"401399""00 ;0)?79.:0,5-9),4/3/770,8)809/4)-80,97H,:3880-7,920/)88)8>I,OK>Q*9"%033,&HA0B7$3$0- B21)98"80>I4!7&K)8)3*%A3073R0-2B4$/$0>>),837/,)98"00/ 94B15)-3)9A4071390"-0 2;)47/9:0,>-,,3///97"008894945-0)9947,1398"-0,9,0///)7808,8O)8>QO 9">Q093"0,,H/A/0730808-2.,)3838>4I94.7,:8K0,*3%03?.0R59)4B3$$->0.)8,:7,8)080)9/) 8B134)399A70,73380 - -2,904//09">7,43:/*9""09"9045%3-)94-1:99")308,90/,//70)8882)8,5O 5>0Q/9"/0)7,0/./9-7/084839.4,353"4/98)..,,:-8200,23407?/. 059)43-0.,:80)9)834997,38 -,90/9"74:*"9"0%3-:9)3890,/)82,550//)70.9-/43945"/8).,-20247/  # 7&,3)*,9)43419/0!,&0%,-40  # 7&,3)*,9)43419/0!,&0%,-40 %"05,*09,--01472,9)8;07/8)2)-,7949",9:80/)39"0B!$,7.")90.9:70T)9)8,9F4 9)070/9,--0>F"070 9%"0"09455,2*04899,-0-;001-4)782)3,95)"8/;80)7./,-8)825,).-,07>9F4)97"0,/9/:48F0/3)F3"9"0039"B0!,$55,7-)..",)99)04.39:)7800T?)90.)8:9,)39F*>4, 39)/0790"/09-,4-F-00>7F-0";007-0 )98"50,9*405,2--408,93-0/;,0/-/)870)83805/"/;8)7).9:,,--8-5/, .60>0FF))7-0-/.,/-4-F9"30F94"50-30;9"00-9,"505H-)7.4,499)I4314)7840-?;0).4::98)37*0>,,8433/89",30/-49"F00-74-F0;007- -)08;50,-*9"00,-H-:08,037/5,,/*/0709,8-80-0/I;-)709.:,,:--8/0 )6920,F5)8--9.",0--:98"0079,4/5/-700ENEE;808-895" 646:,0.H07 4Digital%49"I0145 Computer7,*40-;9,)4-:-08 Design7407,*8,433 )—U8, ,Project93)4/39")80 3)- -4(15%)-F:80 7 9-70,;900/-9-"00-4HF:8C0374590,*9"009,/-)1-100I70-30..0,:)3808.)9,-20,-5089F9"0003:89"007,:/80/7058,8*8059,,.-0- 0%,"30/59",0*0:890,7--,0//470*8,838)U5,,9.)40G3T)8)--:8 97,90/-0-4FC34909"0/)110703.0)38.,-0-09F0039"0:8075,*09,--0,3/9"0:807,//708885,.0GT #449N9,-O0PC!%M>47/ ,.W5,H09,-O00397XW,89W0 -591472,98W4>3-0O4>C #449N9,-O0PC!%M>47/ ,.W5,H09,-O00397XW,89W0 -591472,98W4>3-0O4>C # ' :3:80/ !,H07,20 :2-07 # ' :3:80/ !,H07,20 :2-07 !WX85.,O85,.0

7/ !'W5X798:5,.,O8O585,,.0.0 04 30 4 7/ '579:,O85,.0 04    &$#!K%3*!%8M5,'0+ 10 11 430    &$#!K%3*!%8M5,'0+ 10 11 '0 5, 430 '0 5, 430    &$##$$$!*5,'08+ 10 11    &$##$$$!*5,'08+ 10 11  !3%#5$  -59';#%&3#$$$!*=>47/8CE >47/5,H08+  !3%#5$  -59';#%&3#$$$!*=>47/8CE >47/5,H08+ %mode,"01:- -and,// the708 8top 85 ,bit.0 of.4 the39, )address38  is5 ,‘0,’*0 8the> F address")."70 6cannot:)708 cause! %an 8exception942,5) 9because ,." !it% is not)8 ,8)3*-0 F%translated4"70/>1:,3--/, /through/70!8%88 58the,..,0 3TLB.9"4:38 9but,1))398 )instead3 ,8)35* ,is-*0 0mapped58,>*F0" %)." "directly070164:7)07 >0onto8,8)3 physical*-!0%5,*80 9memory.412!%,58)9 . ,3,2.",5!9%"00)389),708):38*0-70 8F5,4.70/ >%,3"/0A073!0-%A080.5,83,9"80:98411)95),3*,08))33*)-9085;,)7*90: ,%-"8507,0.104>700,>.,"84)31*F-0"5).,"*0"4-1/!8%4308.:,830725,,5*09"90,-0-309 )7%0":08707 ,8Organization750,.0 4%1"90"0A800739 ,0of--A 0the080C59 8"Page0,7800,9 Table74015,*: 038)6)3:0)98;$)B79:8,T-98"50,.0$>B0,.)"841-F)9"8).F")/"04G->/,834/309":08.047757,0*850493,-/-)03 *%:"800770 9,7-0-08,7401"9"00-/80)39,9-"-00894C59"070;,)7790:,-5:,3*)068:4019"$0BA087T390"-0Q8,$//B708)885-,).908 F%)"/008G0>,370/)93"09:.7347270,855403//)-3/*7:484097 !9The,%--08 8page9",,709 ,table"700-/" 0)format3-/9")30994 "is50 very945; )similar79:F,-457 /,to8* 04that81451, used9*"001A7 in,027 3the00- Q 8 MIPS%,/"/:780 >architecture:898"805;,).709: ,%-"5 0it,8 *0is0, a73 0two-tiered:)23-9:0734219 ,"table,505:08/0 where-7/5,74*40 9 9!,the-%-0 topmost8)89"0,69:,7- 0level94"90"- /0is5 )in3"/ 9physical"8)0.,94-5,/ /space,70F88474 /wired189"40157 4down,4*90!1%7 ,when290", 9the 2%," application5:88>)99 "0;)79 :is, -executing,5,*03:2 and-07 the419 lower"0:8 0level75, *is0 9pageable,--0)80 6and:,- addressed949"05"/ 8virtually.).,-,// 7We088 will419 "call074 the49! top% level9",9 2the,5 “root”8)9 for obvious reasons and the lower 82039)430/>,--:807,//708808,7097,38-,90/9"74:*"9"0%3>,3/A0730-85,.0)89/5).,--//);)/0/ level82 the039 )“user430/ >page,--: table”807,/ because/708808 it,7 maps097,3 the8-,9 0user/9" address74:*"9 "space.0%3 The>, 3page/A0 7table30-8 organization5,.0)89/5). is,- -illustrated//);)/0/ )3below9470* (note)4389 the",9 difference,7097,38-, in90/ scale9"74 between:*"9"0 %the3 user,3/ page49"0 table770* )and438 the9",9 user2,5 address/)70.9- /space):4394 5 "/8).,-202 4)73/9 4%70"*0)4A3078390"-,Q89,977,03987-,,3908-/,9700/*)94"3784:9/*5").9",0--/%3"4-/,3//,94,99""0,797)08*8)04-3/8429",:9820/,>514/7)70?.,9-2/54-30949"50";/,87)).4,:-82/0,29, 849The77:/. 9 %:full7"008 addressAC)037.3-0:-/Q 8)space397*,5378 contains4-,.9008/8750,* *256)0439,8 -pages,9-/058)G.9," -which,-/9,"740- :/requires8/0,/9,949"A 256,0905)8 9PTEs87,0.-/A4 42to1 9:map"800/7:> it.3134 )7Each30*?5, 72PTE45.-008 89is0" 80a  ;Bsingle1,,7)45:7 4word,8./0,898, )8and937:4. 92569.::707 78PTEs0C3)93-./- :7can:/3)33 thus*)35*7>4 9fit."00 8in383 5a4, single3*049,1- 9page.-"0088G09" 8Therefore,9,79:,.790:7:0880,/ 7a90 4single)3A0:085 0page>97,,3./A of94" 1PTEs09/"30070 :can/33) 43map9*45.7. the4:.50/ 8entire850"8/ 8B )1user.,,-527 40.208 8 4)space.78/3 4%9". The::87>70) 93kernel29-/,A7:0 83keeps380)33*8> 0a9 9"set4035 of:394 9pages3"00241 )in93"90 4its80; )virtual87997::,.-9:8 5space,70,8.0, 70 each)3: 8of0> which,3/9" 0holds/300 one/34 user94.. page:5/ 5table."/8). There,-202 47/ %":8>)92,A0880380945:99"02)394;)79:,-85,.0  %are"0 64/)1 1of07 0these39;) 0tablesF84 1(there9"0 are -) 964,/ unique/70888 5ASIDs:,.0,70 the8"4 ASIDF3-0 -is4 F6 Tbits wide), and the corresponding user %"0/)1107039;)0F8419"0 -)9,//708885,.0,708"4F3-0-4FT &8074/06 3073044/06 E &8074/06 3073044/06 E E E ,.W5,H09,-O0 E E &807!,H0%,-O08 58O,4..W,950,/H-0X9,598-O0 &8075,H09,-O08 &8*07!5,,HH00%8+,-O08 .54877O40.8,59403//-53XH598$; !!;5,%3 ,&3/8057745.,0H8089,8-97O:0.898 *5,H08+ .E47 70 8 5^43*/5$3H;_$_;Z+ ,3/574.088897:.98 E ^*$;__Z+ !!;5,%3 E EZ &$#$! EZ E E  !!;5,%3 E YEZ   &$#$! E YEZ  E !!;5,%3 EY EY !74.088$97:.9:708M ,58/570.9OX4394 X3,25.,9,M09. 5,H08 2[ ( 2[Y( !74.088$97:.9:708M 5WX,855.8,/O52700.29OX447X394 X3,25.,9,M09. 5,H08 2[ ( 2[Y( 5WX85.,O20247X EZ E E EZ =0730O'579:,O$5,.0 E E =0730O'579:,O$5,.0

 tables are held in the top 64 virtual pages of the kernel’s address space. These are in turn mapped by root  PTEs that are held in the top 64 words of page frame 0. Thus, the virtual page number of the user page table is equal to the physical address of the root PTE that maps it. As mentioned, all user addresses are translated through the TLB, and kernel space is typically divided into regions that are translated through the TLB and other regions that map directly onto physical memory. The kernel’s translated regions typically hold data that is seldom used, for example the various data structures (including process page tables) that are used to keep track of the running processes. If a process is not currently running, then none of these structures are in use, and they need not occupy physical memory. Thus, it makes sense to put them into virtual space. The different views of the 16-bit address space are shown below: As mentioned, all user addresses are translated through the TLB, and kernel space is typically divided into regions that are translated through the TLB and other regions that map directly onto physical memory. The kernel’s translated regions typically hold data that is seldom used, for example the various data structures (including process page tables) that are used to keep track of the running processes. If a process

12  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;?

82039(430/+,--:807,//708808,7097,38-,90/9074:10900%3+,3/50730-85,.0(8985(.,--8/(;(/0/ (394701(43890,9,7097,38-,90/9074:10900%3,3/49007701(43890,92,5/(70.9-843945088(.,-202 478 %0050730-<897,38-,90/701(438985(.,--804-//,9,90,9(880-/42:80/+1470>,25-0900;,7(4:8/,9, 897:.9:708?(3.-:/(31574.0885,109,--08A90,9,70:80/94500597,.5419007:33(31574.08808 B1,574.088 (8349.:77039-87:33(31+900334304190080897:.9:708,70(3:80+,3/9008300/3494..:585088(.,-202 478 %0:8+(92,50880380945:99002(394;(79:,-85,.0 

 # 0890/%+ /881.059/438 ,::9/3;!%//708808 C003900%31,(-8941(3/,1(;03'! +(97,(808,30>.ENEE059(4 3646: B1 Digital900,/ /Computer7088-0(3 1Design97,38 -—,9 0Project/(8, :38 (15%)07 ,//7088?( 0 +900!&(8(3:80724/0A+90039000>.059(437,(80/(8J*%3&B$$ B1900!&(8(3 50730- 24/0 ,3/ 900 945 -(9 41 900 ,//7088 94 -0 97,38-,90/ (8 , N< 9003 900 0>.059(43 7,(80/ (8 is notJ currently*%3P running,B$$ B19 0then0945 none-(94 1of90 these0,// 7structures088(8N < 9are00 ,in// use,7088 and.,33 they49., :need80, 3not0> .occupy059(43 -physical0.,:80( 9(8 memory.34997, 3Thus,8-,90/ it90 makes74:10 9sense00%3 to -put:9( 3them890,/ into(82 virtual,550// space.(70.9-8 43945088(.,-202478 490 0>.059(438 -00,;0 ,8 3472,- ,3/ 5071472 ,//(9(43,- 1:3.9(438 -01470 ;0.947(31 94 900 0,3/-07 NestedC003 TLB-Miss900%3& ExceptionsB$$0>.059(4 3&0 ,Faulting3/-077:3 8PTE+(98R Addresses4-(8941(3/900:8075,109,--003978.47708543/(3194 When900 5the,10 TLB90,9 fails2(88 0to/ find900% a3 given %0 0VPN,:807 5it, raises109,- an-0 (exception.8-4.,90/(3 If9 0the09 4address56:,79 0being7419 0translated0,//7088 is85 a, .user0+,8 804T3,-4;0 %00?;(79:,-A-4.,9(4341900!%+1(;03900$B41900.:77039:807574.088,3/900'! address41900 (i.e.,,// 7the088 9CPU0,9., is: 8in0/ user900% mode),32(8 then8+(8. the42 5exception:90/,..4 7raised/(319 4is9 0EXC_TLBUMISS.014--4T(3106:,9(4 3IfV the CPU is in kernel mode and the top bit of the address to be translated is a ‘1’ then the exception raised is EXC_TLBKMISS.#590 If *the top , bit-$ of/ the00 address12,'! is ‘0’ the address cannot cause an exception because it is not% translated4,%/%39)0 through),3/+%3 ,the41 TLB9)00. but.05 9instead%4319)0 is.4 mapped3897:.9%4 directly3419)%8 onto,//70 physical88%85071 memory.4720/-7 ),7/8,70 %)%8%8 8%2%+,7949)0202477 2,3,,020391,.%+%9%08411070/-7

&8074/0) &4$$) 94$$)

.7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7  $4 ,/'! .7 ,/'! .7 .7 .7 .7 .7 .7 .7 .7 .7 .73 .73 ! .73 !

%#0,//7088+855,.0/+394.7!.43974(70*+8907 190791+8!9101,7/4,70;0.947894910&8$$1,3/(07 ADDRpte = 0xC000 + (ASID << 8) + VPN :103 910 1,3/(07 7:38!+94+(( :80 910 ;+79:,( ,//7088 +3 .7 94 7010703.0 910 !%  :103 910!% +8 To (aid4,/ 0in/ !the910 handling1,3/(074 -of9, the+38 9exception,10! B80 the01+ *construction:70-0(4414 7of85 this0.+1 +address.84391 0is !performed%1472,9 Eby % 1hardware.01,3/(07 This9103 is similar50714 to72 the8, 9memory-management%-'B%F47+90E+3897: .facilities9+43942 offered4;0910 by(4, MIPS/0/2 ,processors55+3*+394 and910 %UltraSPARCF processors. As soon 4 as90 a91 TLBUMISS,99101,3/(0 7exception(4,/8910 is!% detected,+394910 the574 hardware.08847:8+ 3takes*,; the+79: ,VPN(,// 7of08 8the % 1faulting:8!+9+8 address5488+-(0 and147 9the10 ASID1,3 /currently(07+980(1 9stored4.,:8 0in, the%F process2+88  %status1+8+8 register41,9+3 ;(PSR)4H089 1and0% FperformsI8$$ this1,3 computation./(07  Because all of the: values1031 ,in3 /question(+3*,H0 (number730( (0;0( of%F unique2+88 ASIDBK *values,%FI number8$$E!9 1of0 9uniqueN5041 %VPNs)F2 +are889 1all,9 powers1,5503 8of4 two,1+(0 the9 1additions0H0730(+ 8in0 Othe0.: equation9+3*!910 5above,*09, simply-(0300 /to0/ ordinary+8910H0 7concatenation30(P84435,* 0of9, bit-fields.-(091,92 ,This5891 is0 9illustrated451,(141 910 below,,//7 which08885, .shows0B910 theH07 3locations0(P8;+79: ,in( 8the5,. register0E %1+8 file5,* 0and9,- formats(0+8+((: 8of97 ,the90/ various+39103 0dataO980 that.9+43 areQ+9 placed+8(4., 9into0/,9 the, register//7088 fileT +by35 hardware1N8+.,(2 when0247N vectoring,3/0O903 to/8 a9 4TLB-miss,//7088 handler: %10 9 451,(14191+85,*09,-(0B,//708808 The address is placed into cr3, control register 3. After this, the hardware vectors to the UMISS handler. When the handler runs, it will use the virtual address in cr3 to reference the PTE. When the PTE is  loaded, the handler obtains the PFN (see figure below for specifics on the PTE format). The handler then performs a tlbw (TLB write) instruction to move the loaded mapping into the TLB. Note that the handler loads the PTE into the processor using a virtual address. Thus, it is possible for the handler itself to cause a TLB miss. This is what invokes the TLBKMISS handler. When handling a kernel-level TLB miss (EXC_TLBKMISS), the type of TLB miss that happens while the kernel is executing, the page table needed is the kernel’s own page table that maps the top half of the address space (the kernel’s virtual space). This page table is illustrated in the next section; it is located at address 128 in physical memory and extends to address 255. The top half of this page table (addresses 192–255) maps the user page tables referenced by the user TLB-miss handler. By construct, because the user page tables begin at virtual address 0xC000, their VPNs range from 0xC0 to 0xFF—in decimal, the range is 192–255. Therefore, by construction, the VPN of the virtual address for the user PTE that the UMISS handler loads equals the physical address of the kernel PTE that maps the user page table. Before

13  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;?  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;?

"$&2,589-0:8075,109,-3087010703.0/-89-0:807%: 2=88-,3/307 8.43897:.9@-0.,:809-0 "$&2,589-0:8075,109,-3087010703.0/-89-0:807%: 2=88-,3/307 8.43897:.9@-0.,:809-0 :8075,109,-308-01=3,9;=79:,3,//7088 C @9-0=7'! 87,3101742 C 94 CI=3/0.=2,3@9-0 :8075,109,-308-01=3,9;=79:,3,//7088 C @9-0=7'! 87,3101742 C 94 CI=3/0.=2,3@9-0 7,310=8"$ %-0701470@-8.43897:.9=43@9-0'! 419-0;=79:,3,//70881479-0:807!%9-,99-0 7,310=8"$ %-0701470@-8.43897:.9=43@9-0'! 419-0;=79:,3,//70881479-0:807!%9-,99-0 &M$$-,3/30734,/806:,389-05-88=.,3,//7088419-0P07303!%9-,92,589-0:8075,109,-30 01470 &M$$-,3/30734,/806:,389-05-88=.,3,//7088419-0P07303!%9-,92,589-0:8075,109,-30 01470 ;0.947=31949-0QM$$-,3/307@9-0-,7/R,7053,.089-=8'! SR-=.-@,8/08.7=-0/@=806:,3949-05-88 ;0.947=31949-0QM$$-,3/307@9-0-,7/R,7053,.089-=8'! SR-=.-@,8/08.7=-0/@=806:,3949-05-88 =.,3,//7088706:=70/-89-0QM$$-,3/307&=394.7@.439743701=8907  =.,3,//7088706:=70/-89-0QM$$-,3/307&=394.7@.439743701=8907  %-=8=8,.9:,338,1,=738=397=.,90574.088@,3/=9/02,3/8.,701:3,99039=434384:75,79=39-0/0;03452039 %-=8=8,.9:,338,1,=738=397=.,90574.088@,3/=9/02,3/8.,701:3,99039=434384:75,79=39-0/0;03452039 41 84:7 %: 2=88 -,3/3078@ 49-07R=80 /,9, .,3 109 890550/ 43 R=9-4:9 9-0 8419R,70 70,3=U=31 =9 S147 41 84:7 %: 2=88 -,3/3078@ 49-07R=80 /,9, .,3 109 890550/ 43 R=9-4:9 9-0 8419R,70 70,3=U=31 =9 S147 =389,3.0@R-039-0 :2#88 ",3/&07 .,:808 ,3 0,.059/43 1"03 /9 50714728 9"0 !% &4,/ :8/37,;/79:,& =389,3.0@R-039-0 :2#88 ",3/&07 .,:808 ,3 0,.059/43 1"03 /9 50714728 9"0 !% &4,/ :8/37,;/79:,& ,//70889  ,//70889  ;1</:7/37:80724/0<,3,..088.,:808,%=2/88<9"0:2ENEE#88" ,646:3/& 0Digital7/8/3; Computer4?0/ %"0 Design",7/1 —,7 0Project5&,.08 39 "(15%)0 ;1</:7/37:80724/0<,3,..088.,:808,%=2/88<9"0:2#88",3/&07/8/3;4?0/ %"0",7/1,705&,.089"0 !%@8;/79:,&,//7088/394.7 %"/8/8/430/3,///9/43948,;/379"0709:73!/3.7C<,3//9/8,&&/430 !%@8;/79:,&,//7088/394.7 %"/8/8/430/3,///9/43948,;/379"0709:73!/3.7C<,3//9/8,&&/430 -01470;0.947/37949"0:2#88",3/&07 -01470;0.947/37949"0:2#88",3/&07 vectoring;1</:7 /to37 the9"0 KMISS0,0.:9/4 handler,3419"0 :the2#8 hardware8",3/&07 Eplaces47</3 1this,.9< VPN/:7/37 (which,,3F?07 3as0 &described,.4/0,9,&& is9< ,equal202 to47 Fthe,. .088 ;1</:7/379"00,0.:9/43419"0:2#88",3/&07E47</31,.9</:7/37,3F?0730&.4/0,9,&&9<,20247F,..088 physical.,:80 address8,%= required2/88<9"0 by%2 the#88 "KMISS,3/&07/ 8handler)/3;4?0/ into%"0 cr3",7,/ control1,705& ,register.089"0 13.,: &9/37G -/9'! /394.7</3 .,:808,%=2/88<9"0%2#88",3/&07/8/3;4?0/ %"0",7/1,705&,.089"01,:&9/37G -/9'! /394.7</3 9"0&4107G-/98419"0707/8907 %"/8",5503894-09"05"F8/.,&,//7088419"07449!%E9"074499,-&0/8/3 This9" is0 &actually4107G- a/9 8fairly419" 0intricate707/8907 process,%"/8", 5and503 8it9 4demands-09"05" carefulF8/.,&, attention//708841 9on"0 7your449! part%E 9in"0 the744 99,-&0/8/3 5"F8/.,&5,70 L800&,907//,77,29 %"0",7/1,705&,.089"0709:73!E1"/."</32489.,8081/&&-09"0 development5"F8/.,&5 ,of70 your L8 TLB-miss00&,907//, handlers,77,29 %" 0otherwise",7/1,70 data5&,. 0can89" get070 9stepped:73! Eon1" /without."</32 4the89. software,8081/&& realizing-09"0 !419"0=M/3897:.9/43/39"0:2#88",3/&079/394.7C %"03",7/1,70;0.9478949"0%2#88",3/&07 it (for! instance,419"0=M when/3897 :the.9/ 4umiss3/39 "handler0:2#88 causes",3/&0 7an9/ 3exception94.7C %" 0when3",7 /it1 performs,70;0.94 7the894 PTE9"0% 2load#88 "using,3/&0 7a virtual 3.0,",3/&07",8&4,/0/,!%<1",9",5503830,9%"0!%1472,9&44?8&/?09"014&&41/37P address). 3.0 , ",3/&07",8&4,/0/,!%<1",9",5503830,9%"0!%1472,9&44?8&/?09"014&&41/37P

%94       < = "       %94       < = "       -%9 "-%98 =-%98 -%9 "-%98 =-%98 !,*0%,--039712!%34 ' !,*07,20 :2-072! 3 !,*0%,--039712!%34 ' !,*07,20 :2-072! 3

If, during%"0-4 user9942 mode,0/7"9 -an/98 access705708 causes0399"0 a5 TLB"F8/. miss,,&&4. the,9/4 3umiss 419 "handler0 5,70 Eis9" invoked.0! 9R 9The"0 9 4hardware5-/9 /8 , places;,&// -the/9 %"0-499420/7"9 -/987057080399"05"F8/.,&&4.,9/43 419"0 5,70 E9"0! 9R9"0 945-/9 /8 ,;,&// -/9 PTE’sE  virtual/3;,&// Raddress;,&// into9 & &cr3.9/2, 9This0&F< 9is"/ 8done1/&& -in0 addition:80/94.7 to0, 9saving0,301 the03 9return7F/39" PC0%= in 4 in9&0 r20 3and/84 the19& PTE0&,3 is/. 0in78 r1.,7 02(88(368(39&(.&706(8907.4390398,707089470/174220247?>4,3/709:73 1742 03.059(43(82(88(36 4707747 .&0.C(364?4:.,38(25.?DF%9&02,.&(305702,9:70.?47:80,850.(,.8?8.,..80 6 4! G 24/0>4lui-0.,:80 r3,9&0 0x8000!%8?4:7010703.0 #( 3will?4: 7be5 ,used609, -to.0 8test,70 5top70/ 0bit1(3 0/,3/8&4:./30;07-0(3;,.(/  nand r3, r3, r1 # r3=0111111111111111 => val; r3=1111111111111111 => inv nand r3, r3, r3 # r3=0x8000 => val; r3=0x0000=>inv  # !bne%&8(., r3,+ 0r0,24 valid7&, 5 # error-handling code !70;(4:81(6:708&,;0(..:897,90/9&0.,?4:941;(79:,.85,.0 %&014..49(361(6:70(..:897,9089&0.,?4:941 5"#8%.,(20247&'(3.+:/(3.900,++ (25479,391(7895,.06

   !,#08'4)/,3,9'.3#/,55)..,9.43.4/0 /,9,340730).4/0 /,9,3',3/)07.4/0309. 10 11

!,#07,20 / A0730)8,;0,70, '0 '. '9 A!% #!% G47/ ,//7088/  ? @  ? @ 

%00)0730(8,;0,70,(8:80/1478,;(3.89,90/:7(3.0,3/+07090.:9(43'09. %00'0''%',3/'970.(438.43 14 9,(3;0.947,//708808 147 01.059%438' %39077:598' ,3/97,58' 70850.9(;0+& %00)0730(5,40 9,-(0 67!%: 2,5890070.(434120247&(3;0(.0574.088897:.9:708,3/,884.(,90//,9,,7000+/ <4:;(++349:80900 =!%70.(43(390(8574?0.9 %0074495,409,-(06#!%:.439,(389002,55(3.8147900;,7(4:8:8075,.0 9,-+0890,94..:5&9009456:,790741900;(79:,+,//708885,.0 B9(834,..(/03990,9900#!%(85+,.0/(3 9009456:,7907415,.017,20 E,82039(430/0,7+(07'900708:+9(890,9900'! 41,3&H0730+;(79:,+ ,//7088.,3-0:80//(70.9+&,8,50&8(.,+,//7088944-9,(3900,557457(,907449 +0;0+!% 

47 90(8 574?0.9' &4: 300/ 94 5:9 /,9, (394 5,.0  K809 :5 '0 ,3/ '9 Z

70.(438',8;0++,8430!%(390074495,.09,-+0147$B3:2-07OP

49090,9900!$#804:+/-0(3(9(,+(Q0/,557457(,90+&94.439,(3$BO,8

;0++' 84 90,9 900 0,7/;,70 .,3 .70,90 900 .4770.9 ,//7088 ,8 5,79 41 %#! '% #$ 708543/(3.94,%R&B$$09.059(43 <4:;(++,+84300/945:90,3/+07

.4/08420;0070(350&8(.,+20247&';(90900;0.947,//708808(3(9(,+(Q0/

9454(39949000,3/+078:8(3.50&8(.,+,//708808 4709,25+0'&4:.4:+/

;07&;0++5:99000,3/+07.4/0(35,.0,3/54(39900,//708808(3900'0  %#!*YO%  %#!*P #O ,3/'970.(4389490080+4.,9(438 R,89+&'&4:2:89.70,90,5,.09,-+0147 

,3,55+(.,9(43 4709,25+0'&4:.4:+/5:990(85,.09,-+0(39450&8(.,+

5,.0,3/54(399007449!%.47708543/(3.94$BO945,.0 

N %##&!%   %39077:59'0.947%,-30 '% #$

%"0%39077:59;0.9479,-00",8,8%25001472,941470;07506.059%43,0.43 /%9%43 9",9 9"0 ",7/9,70 70.4:3%;08 <%3.0:/%3: 06.059%438= %39077:598= @? N %*%N# ,3/ 4797,58?=9"0702:89-0,3,//7088%39"09,-009",954%39894,",3/007 @Z N %*O A @ N %*N 74:9%30 %"09,-00%804.,90/,95"58%.,0,//7088A %3202475,3/",8A @

0397%08<06.059%4395508=%39077:5995508=,3/97,595508? %"%8%8

%00:897,90/%39"01%::70949"07%:"9

K!%N %"480;0.94789",92:89-0%250020390/,708",/0/F;0.94789",9,70349 '% #$ 8",/0/300/349-0%250020390/%39"%8574G0.9 ? K*!#N'NOP$ ? K*N 'ON# ? K*%O!#N' ? K*%OAN$$ ? K*%O&N$$ ? K*P #O

  &()(9,,425:90708()3!!7470.9&!70.(80:39077:598;?

4909&,99&(8(8349,.425.090&,3/.07114703,25.0407747 &,3/.(36(82(88(3649&0-06(33(368419&0 &,3/.078,702(88(368(39&(.&706(8907.4390398,708,;0/,3/9&0!%.4,/0/>49&003/8419&0&,3/.078 ,702(88(368(39&(.&706(8907.4390398,707089470/174220247?>4,3/709:73 1742 03.059(43(82(88(36 4707747 .&0.C(364?4:.,38(25.?DF%9&02,.&(305702,9:70.?47:80,850.(,.8?8.,..80 6 4! G 24/0>4-0.,:809&0!%8?4:7010703.0(3?4:75,609,-.ENEE08,70 646:570 /Digital01(30/ Computer,3/8&4: Design./30; 0—7 Project-0(3;, 3.( (15%)/ 

 # !%&8(.,+0247&,5 valid: !70;(4:tlbw81(6:70 r1,8& ,r2;0(..:897,90/9&0.,? #4 :writes941;( 7contents9:,.85,. 0of % r1+r2&014. .to49 TLB,(361 (sets6:70 (‘v’..:8 9bit7,90 8in9 &TLB0., ?entry4:941 The5"# steps8%.,( that202 the47& 'UMISS(3.+:/( 3and.90 KMISS0,++ (2 5handlers479,391( 7go89 5through,.06 are very similar.

Note that  this is not a complete!,#08 handler:'4)/,3,9'.3# for/,55 )example,..,9.43.4/0 /error-handling,9,340730).4/0 /,9 ,is3' ,missing,3/)07.4/030 the9. beginnings of 1the0 11 handlers are missing (in which register contents are saved and the PTE loaded), the ends of the handlers are missing! ,(in#0 7which,20 / register contents are restored from memory), and return-from-exception is missing. A0730)8,;0,70, '0 '. '9 A!% #!% For error-checking,G47/ you can simply HALT the machine prematurely or use a special syscall (e.g., PANIC mode),,//70 because88/ the PTEs you ? reference@  in? your page tables@ are predefined and should never be invalid. Physical%00)073 Memory0(8,;0,70 Map,(8:80/1478,;(3.89,90/:7(3.0,3/+07090.:9(43'09. %00'0''%',3/'970.(438.43 9,(3;0.947,//708808 147 01.059%438' %39077:598' ,3/97,58' 70850.9(;0+& %00)0730(5,40 9,-(0 67!%: Previous2,5890 0figures70.(4 3have412 illustrated0247&(3 ;the0( .layout0574. 0of8 8virtual897:.9: space.708,3 /The,88 following4.(,90//, figure9,,70 0illustrates0+/ <4: ;the(++ layout349:8 0of9 00 physical=!%7 0memory,.(43(39 0including(8574?0.9 the %0 all-important074495,409 ,first-(0 6page:#!%: .439,(389002,55(3.8147900;,7(4:8:8075,.0 The9,- +kernel0890, 9save4.. area:5& 9is0 0used945 for6:, saving790741 state900; (during79:,+, /handler/708885 execution,,.0 B9(83 etc.4,. The.(/0 3Ve,990 Vi,,99 0and0# Vt!% regions(85+,.0 /(3 9009456:,7907415,.017,20 E,82039(430/0,7+(07'900708:+9(890,9900'! 41,3&H0730+;(79:,+ contain vector addresses for exceptions, interrupts, and traps, respectively. ,//7088.,3-0:80//(70.9+&,8,50&8(.,+,//7088944-9,(3900,557457(,907449 +0;0+!%  The kernel page table (KPT) maps the region of memory in which process 47 90(8 574?0.9' &4: 300/ 94 5:9 /,9, (394 5,.0  K809 :5 '0 ,3/ '9 Z structures and associated data are held. You will not use the KPT region in 70.(438',8;0++,8430!%(390074495,.09,-+0147$B3:2-07OP this project. The root page table (RPT) contains the mappings for the 49090,9900!$#804:+/-0(3(9(,+(Q0/,557457(,90+&94.439,(3$BO,8 various user page tables that occupy the top quarter of the virtual address

;0++' 84 90,9 900 0,7/;,70 .,3 .70,90 900 .4770.9 ,//7088 ,8 5,79 41 %#! space. It is no accident that the RPT is placed in the top quarter of page '% #$ 708543/(3.94,%R&B$$09.059(43 <4:;(++,+84300/945:90,3/+07 frame 0—as mentioned earlier, the result is that the VPN of any kernel .4/08420;0070(350&8(.,+20247&';(90900;0.947,//708808(3(9(,+(Q0/ virtual address can be used directly as a physical address to obtain the 9454(39949000,3/+078:8(3.50&8(.,+,//708808 4709,25+0'&4:.4:+/ appropriate root-level PTE. ;07&;0++5:99000,3/+07.4/0(35,.0,3/54(39900,//708808(3900'0  %#!*YO%  %#!*P #O For,3 /this'9 project,70.(438 you9490 need080+4 to., 9put(43 8data R, 8into9+&' &page4:2 0: (set89. 7up0,9 0Ve, and5,. 0Vt9, -regions,+0147  as, 3well,5 5as+( .one,9(4 PTE3 4 in70 the9,2 root5+0' page&4: .table4:+/ for5: 9ASID90(85 number,.09,-+ 09).(3 Note9450 &that8(., + 5,.0,3/54(399007449!%.47708543/(3.94$BO945,.0  the PSR should be initialized appropriately to contain ASID 9 as well, so that the hardware can create the correct address as part of responding to a N %##&!% '% #$ TLBUMISS  %39077 exception.:59'0.94 You7%, will-30 also need to put handler code somewhere in physical memory, with the vector addresses initialized to point to the %"0%39077:59;0.9479,-00",8,8%25001472,941470;07506.059%43,0.43 handlers/%9%43 9 using",9 9" physical0 ",7/9 addresses.,70 70.4: 3For%;0 8example, <%3.0:/ %you3: 0could6.059 %very438= well%390 put77:5 the98= @? N %*%N# handler,3/ 47 code97,58 ?in=9 "page0702 1: and89- 0point,3,/ the/70 8addresses8%39"09 ,in-0 0the9" ,Ve95 and4%39 8Vt94 regions,",3/0 to07 @Z N %*O A @ N %*N these74:9 %locations.30 %"09, Lastly,-00%80 4you.,9 0must/,95 create"58%., a0 page,//70 table88A for%3 2an0 2application.475,3/", 8ForA @ example,0397%08< you0 6could.059% 4put395 this508 =page% 3table9077: 5into995 5physical08=,3/ page97 ,25 and9550 point8? %" the%8% 8 root%00: 8PTE97,90 /corresponding%39"01%::70 9to49 "ASID07%:" 99 to page 2.

K!%N %"480;0.94789",92:89-0%250020390/,708",/0/F;0.94789",9,70349 '% #$ Interrupt Vector Table 8",/0/300/349-0%250020390/%39"%8574G0.9 ? K*!#N'NOP$ The interrupt vector table has a simple format: for every exceptional ? K*N 'ON# ? K*%O!#N' condition that the hardware recognizes (including exceptions, interrupts, ? K*%OAN$$ ? K*%O&N$$ and/or traps), there must be an address in the table that points to a handler ? K*P #O routine. The table is located at physical address 80 in memory and has 48 entries (16 exception types, 16 interrupt types, and 16 trap types). This is illustrated in the figure to the right. Those vectors that must be implemented by you are shaded in the  illustrated table; vectors that are not shaded need not be implemented in this project. The placement of the table in physical memory and the arrangement of the various vectors and other 7- bit exception codes are not by accident. Out of the eight different types of exceptional codes (MODE, TLB, CRMOVE, etc.), only three require handlers in the operating system and thus require entries in a vector table. All three are contiguous in the naming range (EXC types are binary 101xxxx, INT types are 110xxxx, and TRAP types are 111xxxx). This corresponds to their entries in the vector table in the following way: note that the address 80 has the following hexadecimal representation:

15 ENEE 646: Digital Computer Design — Project 3 (15%)

0x50 (equals decimal 80) The first exceptional type that must be handled via an interrupt vector is EXC_GENERAL, which has the following hexadecimal representation:

0x50 (equals binary 1010000) It turns out that, at least in the RiSC-16 architecture (note that this is not true in general for other architectures), the exceptional code can be used directly as a physical address to retrieve the vector (the handler address). In general, the hardware is given a physical address for the start of the table, and the exception code is used to index into that table, requiring a small amount of computation. This mechanism is just slightly simpler and arises because we have created more exception codes than just EXC, INT, and TRAP types. Your Task Your task in this project is to build the memory-management scheme described in this document. You have been given my solution for Project 2, extended with a larger register file; you can start there or start with your own project 2 code. You are to build an exception facility that handles interrupts precisely. You are to implement the TLB (2-entry and fully associative). You are to implement two exception types, one trap type, one TLB-management instruction, and one mode instruction:

EXC_TLBUMISS EXC_TLBKMISS TRAP_HALT TLB_WRITE MODE_HALT You are to write handlers for each of the exceptions and trap types in RiSC-16 assembly code. You are to build a memory image containing your OS code (at this point, comprised of only handlers), kernel data, kernel save area, interrupt vector table, and an initialized kernel page table mapping all of the kernel’s virtual code & data as well as the user page table for a given process (for example, ASID 9). Follow the example given in the previous section: 1. Create your handlers and load them in physical memory: page frame 1 (starting at physical memory address 256). 2. Initialize your IVT so that the entries point to the appropriate handler locations. 3. Put the user page table into page frame 2 (starting at physical address 512). You do not need to initialize the user page table entries—my test code will choose a physical location for the application and initialize the page table for you (therefore, this step requires no work). However, you will need to initialize this page table when testing your own code. 4. Lastly, you need to put the user page table’s physical location into the root page table (RPT): at the RPT entry corresponding to ASID 9, you must put a valid page table entry (see the format above) and set the page frame number to ‘2’ (where the user page table has been placed). Your processor should start running with user mode enabled (K bit is PSR set to ‘0’), nothing but zeroes in the kernel-mode history vector (i.e. the top eight bits of the PSR), the ASID set to the value ‘9’, and the program counter set to 0. What I want from you: 1. Your RiSC.v pipeline. 2. A file called “init.sys” that represents the contents of the first 3 pages of physical memory (the initial page frame, a page of handler code, and the user page table for ASID 9).

16 ENEE 646: Digital Computer Design — Project 3 (15%)

I will test your code by loading a random program (probably laplace.s because it is large) into the memory space at a randomly chosen physical location and initializing the user page table for ASID 9 to point to the appropriate physical locations. The grade will break down along the following lines: • Correctness of hardware exception recognition and TRAP handler invocation, 5 points • Correctness of memory-management/address-translation hardware, 5 points • Correctness of TLB-miss handler implementations (umiss and kmiss handlers), 5 points

17