Introduction to Multithreading, Superthreading and Hyperthreading

Introduction to Multithreading, Superthreading and Hyperthreading by Jon "Hannibal" Stokes Back in the dual-Celeron days, when symmetric program at the same time. This might sound odd, so multiprocessing (SMP) first became cheap enough in order to understand how it works this article will to come within reach of the average PC user, many first look at how the current crop of CPUs handles hardware enthusiasts eager to get in on the SMP multitasking. Then, we'll discuss a technique called craze were asking what exactly (besides winning superthreading before finally moving on to explain them the admiration and envy of their peers) a dual- hyper-threading in the last section. So if you're processing rig could do for them. It was in this looking to understand more about multithreading, context that the PC crowd started seriously talking symmetric multiprocessing systems, and hyper- about the advantages of multithreading. Years later threading then this article is for you. when Apple brought dual-processing to its PowerMac line, SMP was officially mainstream, As always, if you've read some of my previous tech and with it multithreading became a concern for the articles you'll be well equipped to understand the mainstream user as the ensuing round of discussion that follows. From here on out, I'll benchmarks brought out the fact you really needed assume you know the basics of pipelined execution multithreaded applications to get the full benefits of and are familiar with the general architectural two processors. division between a processor's front end and its execution core. If these terms are mysterious to you, Even though the PC enthusiast SMP craze has long then you might want to reach way back and check since died down and, in an odd twist of fate, Mac out my "Into the K7" article, as well as some of my users are now many times more likely to be sporting other work on the P4 and G4e. an SMP rig than their x86-using peers, multithreading is once again about to increase in Conventional multithreading importance for PC users. Intel's next major IA-32 processor release, codenamed Prescott, will include Quite a bit of what a CPU does is illusion. For a feature called simultaneous multithreading instance, modern out-of-order processor (SMT), also known as hyper-threading. To take architectures don't actually execute code full advantage of SMT, applications will need to be sequentially in the order in which it was written. I've multithreaded; and just like with SMP, the higher covered the topic of out-of-order execution (OOE) the degree of multithreading the more performance in previous articles, so I won't rehash all that here. an application can wring out of Prescott's hardware. I'll just note that an OOE architecture takes code that was written and compiled to be executed in a Intel actually already uses SMT in a shipping specific order, reschedules the sequence of design: the Pentium 4 Xeon. Near the end of this instructions (if possible) so that they make article we'll take a look at the way the Xeon maximum use of the processor resources, executes implements hyper-threading; this analysis should them, and then arranges them back in their original give us a pretty good idea of what's in store for order so that the results can be written out to Prescott. Also, it's rumored that the current crop of memory. To the programmer and the user, it looks Pentium 4's actually has SMT hardware built-in, it's as if an ordered, sequential stream of instructions just disabled. (If you add this to the rumor about went into the CPU and identically ordered, x86-64 support being present but disabled as well, sequential stream of computational results emerged. then you can get some idea of just how cautious Only the CPU knows in what order the program's Intel is when it comes to introducing new features. instructions were actually executed, and in that I'd kill to get my hands on a 2.8 GHz P4 with both respect the processor is like a black box to both the SMT and x86-64 support turned on.) programmer and the user. SMT, in a nutshell, allows the CPU to do what most The same kind of sleight-of-hand happens when you users think it's doing anyway: run more than one run multiple programs at once, except this time the http://www.arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.html operating system is also involved in the scam. To the front end, with the "back end"/"execution core" the end user, it appears as if the processor is containing only the execution units themselves and "running" more than one program at the same time, the retire logic. So in this article, the front end is the and indeed, there actually are multiple programs place where instructions are fetched, decoded, and loaded into memory. But the CPU can execute only re-ordered, and the execution core is where they're one of these programs at a time. The OS maintains actually executed and retired. the illusion of concurrency by rapidly switching between running programs at a fixed interval, called Preemptive multitasking vs. Cooperative a time slice. The time slice has to be small enough multitasking that the user doesn't notice any degradation in the usability and performance of the running programs, While I'm on this topic, I'll go ahead and take a brief and it has to be large enough that each program has moment to explain preemptive multitasking versus a sufficient amount of CPU time in which to get cooperative multitasking. Back in the bad old days, useful work done. Most modern operating systems which wasn't so long ago for Mac users, the OS include a way to change the size of an individual relied on each program to give up voluntarily the program's time slice. So a program with a larger CPU after its time slice was up. This scheme was time slice gets more actual execution time on the called "cooperative multitasking" because it relied CPU relative to its lower priority peers, and hence it on the running programs to cooperate with each runs faster. (On a related note, this brings to mind other and with the OS in order to share the CPU one of my favorite .sig file quotes: "A message from among themselves in a fair and equitable manner. the system administrator: 'I've upped my priority. Sure, there was a designated time slice in which Now up yours.'") each program was supposed to execute, and but the rules weren't strictly enforced by the OS. In the end, Clarification of terms: "running" vs. we all know what happens when you rely on people "executing," and "front end" vs. "execution and industries to regulate themselves--you wind up core." with a small number of ill-behaved parties who don't play by the rules and who make things miserable for For our purposes in this article, "running" does not everyone else. In cooperative multitasking systems, equal "executing." I want to set up this some programs would monopolize the CPU and not terminological distinction near the outset of the let it go, with the result that the whole system would article for clarity's sake. So for the remainder of this grind to a halt. article, we'll say that a program has been launched and is "running" when its code (or some portion of Preemptive multi-tasking, in contrast, strictly its code) is loaded into main memory, but it isn't enforces the rules and kicks each program off the actually executing until that code has been loaded CPU once its time slice is up. Coupled with into the processor. Another way to think of this preemptive multi-tasking is memory protection, would be to say that the OS runs programs, and the which means that the OS also makes sure that each processor executes them. program uses the memory space allocated to it and it alone. In a modern, preemptively multi-tasked and The other thing that I should clarify before protected memory OS each program is walled off proceeding is that the way that I divide up the from the others so that it believes it's the only processor in this and other articles differs from the program on the system. way that Intel's literature divides it. Intel will describe its processors as having an "in-order front Each program has a mind of its own end" and an "out-of-order execution engine." This is because for Intel, the front-end consists mainly of The OS and system hardware not only cooperate to the instruction fetcher and decoder, while all of the fool the user about the true mechanics of multi- register rename logic, out-of-order scheduling logic, tasking, but they cooperate to fool each running and so on is considered to be part of the "back end" program as well. While the user thinks that all of the or "execution core." The way that I and many others currently running programs are being executed draw the line between front-end and back-end places simultaneously, each of those programs thinks that it all of the out-of-order and register rename logic in has a monopoly on the CPU and memory. As far as http://www.arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.html a running program is concerned, it's the only four-instruction limit.

Load more