US 20030172256A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2003/0172256 A1 Soltis, JR. et al. (43) Pub. Date: Sep. 11, 2003

(54) USE SENSE URGENCY TO CONTINUE WITH OTHER HEURISTICS TO DETERMINE SWITCH EVENTS IN A TEMPORAL MULTITHREADED CPU

(76) Inventors: Donald C. Soltis JR., Fort Collins, CO (US); Rohit Bhatia, Fort Collins, CO (US)

Correspondence Address: HEWLETT-PACKARD COMPANY, Intellectual Property Administration, P.O. Box 272400, Fort Collins, CO 80527-2400 (US)

(21) Appl. No.: 10/092,670

(22) Filed: Mar. 6, 2002

Publication Classification

(51) Int. Cl.7 ...... G06F 9/00

(52) U.S. Cl. ...... 712/228

(57) ABSTRACT

A processing unit of the invention has multiple instruction pipelines for processing multi-threaded instructions. Each thread may have an urgency associated with its program instructions. The processing unit has a thread switch controller to monitor processing of instructions through the various pipelines. The thread controller also controls switch events to move from one thread to another within the pipelines. The controller may modify the urgency of any thread such as by issuing an additional instruction. The thread controller preferably utilizes certain heuristics in making switch event decisions. A time slice expiration unit may also monitor expiration of threads for a given time slice.

[Drawings not reproduced. The front page shows flowchart 100, which is also FIG. 3.]

[Sheet 1 of 2. FIG. 1: block diagram of the processing unit, showing the time slice expiration unit, bypass logic 20, pipelines 12, fetch unit 24, issue unit 26, cache 13, the thread controller, and register files 16(1), 16(2) . . . 16(N). FIG. 2: timeline of thread executions EX1 and EX2 in a pipeline 12 around switch event 70; horizontal axis: time.]

[Sheet 2 of 2. FIG. 3: flowchart 100. Start; monitor program thread progression as instructions are processed in multiple pipelines (102); did a time slice expiration occur? (103); set active thread urgency high, deactivate active thread and activate inactive thread (105); did a switch event occur? (104); did the event occur for the active thread? (106); modify active urgency and assess against inactive urgency to determine whether a switch should occur (110); did the event occur for the inactive thread? (108); modify inactive urgency and assess against active urgency to determine whether a switch should occur (114); switch thread by deactivating the active thread and activating the inactive thread (112).]

USE SENSE URGENCY TO CONTINUE WITH OTHER HEURISTICS TO DETERMINE SWITCH EVENTS IN A TEMPORAL MULTITHREADED CPU

BACKGROUND OF THE INVENTION

[0001] Temporal multithreading is known in the art as a technique that uses one set of execution resources to execute multiple “programs,” or “threads.” These execution resources often include an array of pipeline execution units. Instructions for a program thread are processed through the pipeline until it stalls; in a “switch” event, those stalled instructions are then removed and instructions from another thread are injected to the same pipeline so as to efficiently utilize execution resources.

[0002] Temporal multithreading thus gives the appearance of multiple central processing units (“CPU”). Each thread processes through the execution units as if the program had the entire control of the execution units; activation and deactivation of various threads occurs in hardware control logic based on multiple switching events in an attempt to maximally utilize the execution units.

[0003] There is a penalty associated with the above-mentioned switch events. Accordingly, the prior art has developed certain objective criteria for a switch event. In one example, a cache miss triggers a switch event because the processor needs to acquire data from main memory. In another example, a time out counts the cycles of a thread’s execution and promotes an automatic switch for an out-of-bounds thread execution duration.

[0004] There is the need to further reduce the negative effects of switching events in high performing processors. By reducing or improving processing of switch events, a processor will have increased performance, by improving instruction processing efficiency across multiple threads. One feature of the invention is therefore to provide a processor with intelligent logic for efficiently processing and switching multi-threaded programs through the processor. Several objects and other features of the invention are apparent within the description that follows.

SUMMARY OF THE INVENTION

[0005] The following patents provide useful background to the invention and are incorporated herein by reference: U.S. Pat. No. 6,188,633; U.S. Pat. No. 6,105,123; U.S. Pat. No. 5,857,104; U.S. Pat. No. 5,809,275; U.S. Pat. No. 5,778,219; U.S. Pat. No. 5,761,490; U.S. Pat. No. 5,721,865; and U.S. Pat. No. 5,513,363.

[0006] In one aspect, a processing unit of the invention has multiple instruction pipelines for processing instructions from multiple threads. Though not required, each thread may include an urgency identifier within its program instructions. The processing unit has a thread switch controller to monitor processing of instructions through the various pipelines. The thread controller also controls switch events to move from one thread to another within the pipelines. The controller may modify the urgency of any thread such as through modification of the urgency identifier.

[0007] “Urgency,” as used herein, generally means an abstraction of how well a thread is progressing (or will be progressing) within a pipeline; it is also an abstraction as to how urgent the program or logic believes the thread should be. By way of example, a thread’s urgency may be “low,” “medium” or “high,” or further quantized with a 3-bit identifier, thereby providing different switch event solutions to treatment of the thread in stalled pipelines. If for example a thread instruction misses the cache, the controller can lower the thread’s urgency, e.g., from high to medium, or from 5 to 4. Additional cache misses can lower the thread’s urgency further, such as from medium to low, or from 4 to 3. The thread controller preferably utilizes certain heuristics in making switch event decisions. Other heuristics may be used by the controller to prescribe the switch events according to certain guidelines. By way of example, processor interrupts may also modify the switch event heuristics.

[0008] The controller also preferably monitors the timing of a thread, such as to incorporate time out features with the switch event heuristics.

[0009] The invention thus provides certain advantages. Unlike the prior art, switch events are no longer “black and white” decisions that may cause negative processing effects within the pipeline. By way of example, in accord with the invention, a switch event may not automatically occur after a certain number of cycles associated with time slice expiration. That is, an assessment is also made of a thread’s execution relative to time slice expiration: if that thread is making good forward progress, it is not stalled or switched out; rather its urgency may actually be increased to amplify the thread’s activity. If however there is a stall, the next inactive thread might be activated. If on the other hand the active thread has a time slice expiration but has a higher urgency than the inactive thread, the inactive thread may remain dormant while the pipeline waits for the active, more-urgent thread. Those skilled in the art should also appreciate that switching too late may also create processing penalties. The invention provides advantages over the prior art by adding urgency to the thread in order to encourage, or not, a switch event under appropriate heuristics.

[0010] Moreover, too many switch events may underutilize the pipeline. Once again, the invention has advantages over the prior art by reducing the underutilization of execution units, due to switch events, by making appropriate switch decisions according to preferred instruction processing policies; and these policies may be changed, dynamically or otherwise, to further reduce the underutilization. By way of example, a program thread with “low” urgency may switch immediately in the event of a stall (predicted or actual).

[0011] The invention is next described further in connection with preferred embodiments, and it will become apparent that various additions, subtractions, and modifications can be made by those skilled in the art without departing from the scope of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] A more complete understanding of the invention may be obtained by reference to the drawings, in which:

[0013] FIG. 1 schematically illustrates a processing unit of the invention for processing instructions through pipeline execution units;

[0014] FIG. 2 shows an exemplary switch event within a pipeline of the invention; and

[0015] FIG. 3 shows a flowchart illustrating the use of urgency with multithreading, in accord with the invention.

DETAILED DESCRIPTION OF THE DRAWINGS

[0016] FIG. 1 shows a processing unit 10 of the invention. Unit 10 is for example part of an EPIC processor to process multiple program threads through multiple pipelines 12. Pipelines 12 include an array of pipeline execution stages, known to those skilled in the art, to process instructions incrementally, such as in the fetch stage F, the register read stage R, the execute stage E, the detect exception stage D, and the write-back stage W. Unit 10 preferably has multiple instruction pointers 14(1), 14(2) . . . 14(M) to accommodate processing multiple threads 15(1), 15(2) . . . 15(M) through units 12. A cache 13 buffers data from associated pipelines 12 to register files 16. The plurality of register files 16(1), 16(2) . . . 16(N) provide per-cycle storage of data for unit 10, via bus 18; various architected states may be stored in register files 16(1), 16(2) . . . 16(N), including write-back data from the W stage. Bypass logic 20 may be used to accommodate bypass and speculative data transfers to and between pipelines 12 and register file 16.

[0017] Instructions are dispatched to pipelines 12 by a fetch unit 24 in communication with an instruction pointer 14. An issue unit 26 may be used to couple instructions into pipelines 12 for execution therethrough.

[0018] Unit 10 also includes a thread controller 30. Controller 30 monitors processing of threads within pipelines 12; it also defines the switch events that deactivate and activate multiple threads within pipelines 12 to perform multithreading. In the event of a switch event, instructions from current threads are routed through controller 30 via bus 17.

[0019] A time slice expiration unit 19 may also couple to unit 10 to monitor time slice expiration of any thread 15 within pipelines 12. Unit 19 couples with thread controller 30, via bus 21, to indicate an expiration event. In such an event, and as described below, thread controller 30 may switch out the current thread, or not, based on the urgency for that thread.

[0020] Thread controller 30 utilizes the urgency of various threads processed within unit 10 to define the switch events. By way of example, controller 30 may monitor a time slice expiration of a thread within pipelines 12 and elect to switch that thread out, or not, based on the urgency of the thread. Controller 30 may also modify the urgency of a thread, such as by injecting an instruction into the pipeline. For example, if a thread repeatedly has cache misses, controller 30 may repeatedly lower the urgency of the thread, for example lowering the urgency bits of the thread’s instructions from 3, to 2, to 1. In still another example, thread controller 30 monitors processor interrupts for a given thread; it may again adjust the thread’s urgency based on the interrupts.

[0021] For purposes of illustration, FIG. 2 illustrates a switch event 70 defined by controller 30 within a pipeline 12. A first thread, illustratively with a “low” urgency, is executed (EX1) in a stage 60 of pipeline 12. That thread stalls at time 62 due to a cache miss. Controller 30 switches the first thread out, at switch event 70, and activates a second thread, illustratively with a “high” urgency.

[0022] The invention thus provides for executing instructions from other threads in the event a current thread stalls in the pipeline (e.g., EX1), such as through a cache miss that results in a long latency memory operation. More particularly, switch event 70 causes only a minor delay in pipeline 12. Stall 62 is covered by continued execution of another thread through the pipeline; specifically, the second thread continues execution within pipeline 12 at stage 64. The second thread may also stall at time 66; however no switch event occurs, in this example, because the second thread has higher urgency than the first thread. Accordingly, stall 66 is covered by continued execution of the second thread within pipeline 12 at stage 68.

[0023] FIG. 3 shows a flowchart 100 illustrating certain non-limiting thread controller operations of the invention. After start, in step 102, the thread controller monitors processing of instructions within an array of pipelines. In step 103, the time slice expiration unit assesses whether a time slice expiration occurs. If no, processing continues to step 104. In the event a time slice expiration occurs, the active thread’s urgency is set to high, the active thread is deactivated, and the inactive thread is activated, at step 105. Monitoring of another thread then starts anew, as shown.

[0024] In the event of a possible switch event, step 104, the thread controller assesses whether the event is for an active thread (step 106) or an inactive thread (step 108). If the event is associated with an active thread, the active thread urgency is modified and assessed against the inactive thread urgency to determine whether to switch, or not, at step 110. If the urgencies do not warrant a switch, monitoring continues at step 102. If the urgencies do warrant a switch, then a switch occurs at step 112. At step 112, the active thread is deactivated and the inactive thread is activated.

[0025] If the event from step 104 is for an inactive thread, step 108, then the inactive thread urgency is modified and assessed against the active thread urgency to determine whether to switch, or not, at step 114. If the urgencies do not warrant a switch, monitoring continues at step 102. As above, if the urgencies do warrant a switch, then a switch occurs at step 112. At step 112, the active thread is deactivated and the inactive thread is activated.

[0026] Accordingly, the invention assesses a thread’s urgency to decide whether the current thread should be switched out of the pipeline or not (steps 110, 114). A modification of the thread’s urgency may also occur by operation of thread controller 30, such as by issuing an instruction to “hint” of a new thread urgency. If switched out, the next thread is activated within the pipeline. The controller then continues to monitor instruction progress of the new thread in the pipeline (step 102). Thread switching may occur by flushing the pipeline and switching to another architected state (e.g., via pointers to the register file and stored in the thread switch controller); the fetching of instructions from the new thread then commences. If the current thread is not switched out, for example because its urgency is very high, the continued monitoring (step 102) occurs for the same thread.

[0027] The invention thus attains the objects set forth above, among those apparent from the preceding description. Since certain changes may be made in the above methods and systems without departing from the scope of the invention, it is intended that all matter contained in the above description or shown in the accompanying drawing be interpreted as illustrative and not in a limiting sense. It is also to be understood that the following claims are to cover all generic and specific features of the invention described herein, and all statements of the scope of the invention which, as a matter of language, might be said to fall there between.

What is claimed is:

1. A method for determining thread switch points within pipeline execution units of a processor, comprising the steps of:
monitoring instruction processing of a first thread within the pipeline execution units;
in the event of a possible switch point within the pipeline execution units, deactivate the first thread, or not, based upon a first urgency indicator for the first thread.

2. A method of claim 1, further comprising deactivating the first thread and activating a second thread based upon a second urgency indicator for the second thread.

3. A method of claim 2, further comprising deactivating the second thread, or not, based upon the second urgency indicator for the second thread and in the event of a possible switch point event of the second thread.

4. A method of claim 3, further comprising activating another thread within the pipeline if the second thread is switched out.

5. A method of claim 1, the step of deactivating the first thread comprising deactivating the first thread, or not, based upon the first urgency indicator and upon a second urgency indicator of a second thread.

6. A method of claim 1, the step of monitoring comprising utilizing a thread controller coupled with the execution units.

7. A method of claim 1, further comprising modifying the first urgency indicator to increase or alternatively decrease urgency of the first thread based upon characteristics associated with the switch event.

8. A method of claim 7, further comprising determining whether a time slice expiration occurred.

9. A method of claim 8, further comprising utilizing a time slice expiration unit.

10. A method of claim 7, further comprising determining whether a cache miss occurred.

11. A method of claim 7, further comprising inserting an instruction to the pipeline to change urgency of the thread.

12. A method of claim 1, further comprising the steps of deactivating the first thread and activating a second thread, and modifying urgency of the second thread.

13. A method of claim 1, further comprising the steps of monitoring possible switch points of an inactive thread having a second urgency and deactivating the first thread, or not, based upon a first and second urgencies.

14. A processor for processing multi-threaded program instructions, comprising:
an array of pipeline execution units and associated heuristics affecting how the instructions are processed within the units; and
a thread controller for monitoring processing of the instructions within the units and for switching between multiple program threads based upon (a) the heuristics and (b) urgencies of the program threads.

15. A system of claim 14, the heuristics comprising one or more of time slice expiration heuristics, cache miss heuristics and processor interrupt heuristics.

16. A system of claim 14, the program threads comprising one or more instructions, one of the instructions changing urgency for at least one thread of the processor.

17. A system of claim 14, the controller modifying an urgency of any of the threads to modify future treatment of the threads in switch out events.

18. A system of claim 17, the controller either decreasing or increasing urgency for the program threads by injecting an instruction to the pipeline execution units.

19. A system of claim 12, further comprising a time slice expiration unit for monitoring expiration of threads within the processor.
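
Editorial illustration (not part of the application as filed). The decision flow of flowchart 100 in FIG. 3 (steps 102 to 114) and the FIG. 2 example can be sketched in software. The following is a minimal sketch under stated assumptions, not the hardware logic the application describes: the names Thread, Event, ThreadController and handle are hypothetical, urgency is assumed to be a 3-bit value (0 to 7), each event is assumed to change urgency by one step, and "the urgencies warrant a switch" is assumed to mean that the inactive thread's urgency is strictly higher than the active thread's.

```python
from dataclasses import dataclass
from enum import Enum, auto

MAX_URGENCY = 7  # assumption: urgency held in a 3-bit identifier (0..7)


class Event(Enum):
    TIME_SLICE_EXPIRED = auto()  # step 103 answers "yes"
    ACTIVE_EVENT = auto()        # step 106, e.g. a cache miss on the active thread
    INACTIVE_EVENT = auto()      # step 108, hypothetical event raising inactive urgency


@dataclass
class Thread:
    name: str
    urgency: int


class ThreadController:
    """Hypothetical software model of the two-thread flow in FIG. 3."""

    def __init__(self, active: Thread, inactive: Thread) -> None:
        self.active = active
        self.inactive = inactive

    def _switch(self) -> None:
        # Step 112: deactivate the active thread, activate the inactive one.
        self.active, self.inactive = self.inactive, self.active

    def handle(self, event: Event) -> bool:
        """Process one event; return True if a thread switch occurred."""
        if event is Event.TIME_SLICE_EXPIRED:
            # Step 105: set the active thread's urgency high, then switch.
            self.active.urgency = MAX_URGENCY
            self._switch()
            return True
        if event is Event.ACTIVE_EVENT:
            # Step 110: lower the active urgency by one (assumed step size).
            self.active.urgency = max(0, self.active.urgency - 1)
        elif event is Event.INACTIVE_EVENT:
            # Step 114: raise the inactive urgency by one (assumed step size).
            self.inactive.urgency = min(MAX_URGENCY, self.inactive.urgency + 1)
        # Assumed comparison rule: switch only if the inactive thread is
        # strictly more urgent than the active thread.
        if self.inactive.urgency > self.active.urgency:
            self._switch()
            return True
        return False  # no switch; monitoring continues at step 102


if __name__ == "__main__":
    # FIG. 2 scenario: a low-urgency thread stalls, then a high-urgency one stalls.
    ctrl = ThreadController(active=Thread("T1", 2), inactive=Thread("T2", 6))
    print(ctrl.handle(Event.ACTIVE_EVENT), ctrl.active.name)
    print(ctrl.handle(Event.ACTIVE_EVENT), ctrl.active.name)
```

In this sketch the first stall switches from the low-urgency thread to the high-urgency thread, and the second stall causes no switch because the stalled thread remains the more urgent one, mirroring stalls 62 and 66 of FIG. 2. A different comparison rule or step size would be equally consistent with the text, which leaves those details open.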