Teaching Computer Architecture/Organisation Using Simulators

Herbert Grünbacher
Vienna University of Technology
Treitlstrasse 3/182-2, A-1040 Vienna / Austria
E-mail [email protected]

Abstract

Teaching the dynamics of pipelines and caches is rather difficult if done on a paper-and-pencil basis. In our experience students find it difficult to understand the principles and complications of pipelines and, to a lesser extent, of caches. To support teaching and give students an environment in which to experiment, we developed several pipeline simulators and a cache simulator.

My experience is that students appreciate using simulators and, by using them, are easily introduced to the subject. Based on the knowledge gained from using the simulators, they are motivated to study the subject further using books.

Almost all of our students have their own PCs, and most of them run Windows 95/NT. This was the main reason why we developed the simulators to run under MS Windows. It turned out that students particularly like to work at home, and they are usually well prepared to ask questions in class.

Introduction

Experience shows that many students, especially those with little hardware background, encounter difficulties in understanding the consequences and even the concepts of conventional instruction pipelining; superscalar instruction processing is even more complicated and harder to understand. It is particularly difficult to teach the concept of a pipeline statically. Therefore we developed software to simulate and dynamically visualize the processing of instructions by pipelined/superscalar processors. Three simulators have been developed:

• WinDLX is based on Hennessy/Patterson's DLX architecture and is modeled at the architecture level; therefore very little processor-internal information is given.
• MIPSim is based on the MIPS processor of the Patterson/Hennessy book and is modeled at the computer organization level; functional units like register file, pipeline registers and multiplexers are visible, and MIPSim displays the content and dynamic behavior of such units.
• MkSim is based on the MIPS R10000 architecture and models the instruction decode and dispatch unit, the branch unit, the instruction queues and the functional units (address calculation, both ALUs, floating-point adder, floating-point multiply/divide/square-root unit). Concepts like register renaming, branch history table, branch resume buffer and out-of-order execution can be explained easily using the simulator.

Teaching cache organization is an easier task; nevertheless, visualising cache activities helps in understanding the dynamics of a cache memory. Xcache is a simulator which displays the interactions between instruction memory and instruction cache, and between data memory and data cache, respectively.

The simulators are available for free download from http://www.vlsivie.tuwien.ac.at/CompArch

WinDLX

WinDLX is an MS-Windows (16 bit) based pipeline simulator for the DLX processor as described in [1]. DLX is modeled at the architecture level; very little about the underlying computer organization is known at that level.

After loading symbolic DLX assembler code, most of the information relevant to the CPU (pipeline, registers, I/O, memory, …) can be viewed and modified while executing the code step by step or continuously. WinDLX offers statistics about pipeline behavior over time.

WinDLX works with several configurations: the structure (number of floating-point functional units) and the latency of the floating-point units can be changed. Forwarding can be enabled/disabled and the memory size can be modified. Extensive online help is available to explain the simulator and the internals of DLX.

The "Register", "Code", "Pipeline", "Clock Cycle Diagram", "Statistics" and "Breakpoints" windows show internals of the pipeline. Further explanation is given below.

Figure 1: Main Window with open Code Window

Code Window

The code window displays a three-column representation of the memory: the address (symbolic or in hex), the machine code in hex and the assembler command. Figure 1 shows the main simulation window with a code segment in the open Code Window. Color coding in the different simulation windows is consistent, e.g. WB (Write Back) is colored in blue. Double-clicking on instructions in any of the simulation windows displays pipeline status information in text form, giving details about internal registers, operations, stalling and forwarding status.

Pipeline Window

The pipeline window shows the inner structure of the DLX processor: the five pipeline stages and the floating-point units (addition / subtraction, multiplication and division).

Clock Cycle Diagram Window

Figure 2, the cycle diagram window, shows the timing behavior of the pipeline. The simulation shown is in the 4th cycle; the first command is in the MEM stage, the second in intEX and the fourth in IF. The third command, however, is denoted as "aborted". This is because the second command, jal, is an unconditional branch. This is known only after the 3rd cycle, when jal has been decoded. During this cycle the command movi2fp (following jal) has already been fetched, but the next executed command will be at another address. Therefore the execution of movi2fp must be aborted, leaving a "bubble" in the pipeline. The branch address of jal is named "InputUnsigned". Clicking Memory/Symbols in the main window shows the correspondence between the symbols used and the actual addresses.

Figure 2: Clock Cycle Diagram

Breakpoint, Register and Statistics Window

Setting breakpoints stops the simulation at user-defined points.

The register window shows all registers, not just the register file, and their content in hex.

The statistics window provides information about general aspects (e.g. number of simulation cycles), the hardware configuration used in the simulation, stalls and their causes, conditional branches, load/store instructions, floating-point stage instructions and traps. Usually, the absolute count of events and the percentage are given, e.g. "RAW stalls: 17 (7.91 % of all cycles)". The statistics window is very useful for comparing the effects of changes in the pipeline configuration.

MIPSim

MIPSim is a pipeline simulator for the MIPS processor as described in [2]. MIPS is modeled at the computer organization level. Functional units like register files, pipeline registers, ALU and multiplexers, as well as data and control flow, are visible.

The user can write small programs (currently only a subset of the MIPS instruction set is implemented) and watch the pipeline doing its work, modify the program and the content of data memory and register file 'on the fly', and go on simulating to see the effects.

At present MIPSim models a rather simple pipeline without hazard detection and forwarding units.

Assembler Program / Instruction Memory Content

The leftmost window in Figure 3 shows the program code. The program can be executed in single-step or running mode. By setting the pointer (in essence the program counter) to a particular address, manual jumps in the program can be accomplished. Double-clicking on the Instr. box opens a window in which the instruction memory content (the program) can be modified.

Data Memory Content

Double-clicking on the Data box opens a window in which the data memory content can be modified (overwritten) interactively. Modifying the content of instruction/data memory is very valuable for experimenting with the pipeline, e.g. to show data hazards.

Control/Data Flow Signals

After executing the program code, data path and control signals can be displayed by clicking on them. The instruction content of the different pipeline stages is displayed on top of each stage.

Extensive help as well as an introductory tutorial is available online.

MKSim

The R10000 is a dynamic superscalar microprocessor which implements the 64-bit MIPS Instruction Set Architecture [3], [4]. It fetches and decodes four instructions per cycle and dynamically issues them to five fully pipelined low-latency execution units. Instructions can be fetched and executed speculatively beyond branches. Instructions graduate in order upon completion. Although execution is aggressively out-of-order, the processor still provides sequential memory consistency and precise exception handling.

Model of the R10000

Our R10000 model concentrates on the most important issues of a superscalar architecture, and we wanted to have an easy-to-learn, not too complex user interface. The following parts of the processor are modelled:

The instruction decode and dispatch unit is responsible for instruction fetching, instruction decoding, register renaming and finally dispatching the instruction to the appropriate queues. The dispatcher works together with the branch unit when predicting the outcome of conditional branches. During this process they need to access the branch history table and the branch resume buffer, which therefore are also simulated. As soon as instructions are dispatched to the queues they are also given an entry in the active list, which is also part of our simulation.

All of the R10000's instruction queues, namely an address queue, an integer queue and a floating-point queue, are included in the simulation. To be able to determine which operand results are ready, they access the busy table, which is also simulated.

The remaining parts of the simulation are the five functional execution units: the address calculation unit, both ALUs, the floating-point adder unit and the floating-point multiply/divide/square-root unit.

Data is read from and written to memory, which can be viewed and modified during the simulation. The memory is simplified and is assumed to be accessible without any delay.
Exception handling is not implemented.
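
The interplay of register renaming, busy table and active list named in the model description can be illustrated with a small sketch. The following Python code is not taken from the simulator; the class name RenameUnit, its method names and the register counts are hypothetical, and the scheme (free list, one busy bit per physical register, release of the previous mapping at graduation) is a deliberately simplified assumption.

```python
from collections import deque


class RenameUnit:
    """Toy register renaming with free list, busy table and active list.

    Illustrative only: sizes and names are assumptions, not the simulator's.
    """

    def __init__(self, num_logical=4, num_physical=8):
        # Initially logical register i is mapped to physical register i.
        self.map_table = {i: i for i in range(num_logical)}
        self.free_list = deque(range(num_logical, num_physical))
        # busy[p] is True while the result for physical register p is pending.
        self.busy = [False] * num_physical
        # Entries (logical dest, new physical, old physical) in program order.
        self.active_list = deque()

    def dispatch(self, dest, srcs):
        """Rename one instruction; return (physical dest, physical sources)."""
        if not self.free_list:
            raise RuntimeError("no free physical register: dispatch must stall")
        # Sources are looked up before the destination is remapped.
        phys_srcs = [self.map_table[s] for s in srcs]
        new_phys = self.free_list.popleft()
        old_phys = self.map_table[dest]
        self.map_table[dest] = new_phys
        self.busy[new_phys] = True
        self.active_list.append((dest, new_phys, old_phys))
        return new_phys, phys_srcs

    def is_ready(self, phys_srcs):
        """An instruction may issue once none of its sources is busy."""
        return not any(self.busy[p] for p in phys_srcs)

    def complete(self, phys):
        """A functional unit has produced the result for this register."""
        self.busy[phys] = False

    def graduate(self):
        """Retire the oldest instruction, in order, if its result is available."""
        if not self.active_list:
            return False
        dest, new_phys, old_phys = self.active_list[0]
        if self.busy[new_phys]:
            return False
        self.active_list.popleft()
        self.free_list.append(old_phys)  # previous mapping is no longer needed
        return True


ru = RenameUnit()
p1, s1 = ru.dispatch(dest=1, srcs=[2, 3])  # r1 = r2 op r3
p2, s2 = ru.dispatch(dest=1, srcs=[1, 2])  # r1 = r1 op r2, reads the renamed r1
print(p1, p2, s2, ru.is_ready(s2))
ru.complete(p1)
print(ru.is_ready(s2), ru.graduate(), ru.graduate())
```

In this sketch the second instruction waits until the first has completed, because its source is the physical register allocated to the first instruction's result. Graduation happens strictly in program order even if results were to arrive out of order.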
