POWER OPTIMIZATION METHODS IN HETEROGENEOUS MULTICORE PROCESSORS

PREPARED FOR: SHARON AHLERS ENGINEERING COMMUNICATIONS 350 COLLEGE OF ELECTRICAL AND COMPUTER ENGINEERING CORNELL UNIVERSITY

PREPARED BY: ALEXANDER VITKALOV COLLEGE OF ELECTRICAL AND COMPUTER ENGINEERING CORNELL UNIVERSITY

DECEMBER 12, 2005 VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 2

ABSTRACT

This report evaluates the benefits of using heterogeneous processor cores as a means of reducing microprocessor power consumption while increasing performance. The project focuses on the hardware implementation of heterogeneous processors rather than on software. The advantages of multicore architectures are evaluated across five main categories: performance, efficiency, compatibility, functionality and cost. Gains in the speed and efficiency of multicore processors are estimated by extrapolating from comparisons between single core processors and their dual core counterparts. Compatibility and functionality advantages are discussed in terms of backwards compatibility, design flexibility and power consumption. The report concludes with a feasibility study outlining the technological and financial conditions required for the profitable development of multicore processors.


TABLE OF CONTENTS

LIST OF FIGURES
1. INTRODUCTION
2. PERFORMANCE
    2.1 CHOICE OF PROCESSORS
    2.2 OVERALL PERFORMANCE
    2.2 PERFORMANCE EXTRAPOLATION
3. EFFICIENCY
    3.1 PERFORMANCE PER WATT
    3.2 EFFECTS OF CORE HETEROGENEITY
    3.3 CHALLENGES
4. COMPATIBILITY
    4.1 BACKWARDS COMPATIBILITY
    4.2 CORE COMPATIBILITY
5. FUNCTIONALITY
    5.1 PROGRAMMABLE PROCESSORS
    5.2 CHALLENGES
6. FEASIBILITY
    6.1 CURRENT TECHNOLOGIES
    6.2 FUTURE TECHNOLOGIES
7 CONCLUSION
8 RECOMMENDATIONS
REFERENCES
GLOSSARY


LIST OF FIGURES

FIGURE 1: SELECTED PROCESSORS
FIGURE 2: PROCESSOR POWER CONSUMPTION
FIGURE 3: OVERALL PROCESSOR PERFORMANCE [DHRYSTONE]
FIGURE 4: OVERALL PROCESSOR PERFORMANCE [WHETSTONE]
FIGURE 5: OVERALL PROCESSOR PERFORMANCE EXTRAPOLATION
FIGURE 6: PERFORMANCE PER WATT COMPARISON
FIGURE 7: PERFORMANCE PER WATT EXTRAPOLATION
FIGURE 8: BACKWARDS COMPATIBILITY
FIGURE 9: PROGRAMMABLE COMMUNICATIONS BUS AS CORE INTERCONNECT
FIGURE 10: VIDEO DECODER SCENARIO
FIGURE 11: RELATIVE CORE SIZING


1. INTRODUCTION

Over the past twenty years, processor frequency has been considered the fundamental measure of performance: higher frequency generally meant faster performance. This notion has changed as processor power consumption became increasingly important. Power consumption depends on operating frequency and on the number of transistors used in a processor. Today's processors use as many as 250 million transistors, so even a small increase in the switching frequency of each can cause a dramatic increase in overall power consumption. The enormous heat generated as a result causes thermal breakdown of silicon crystals. Although improving the manufacturing process and decreasing transistor sizes lowers power consumption, this approach is becoming increasingly costly. We have therefore reached a point where true processor performance is no longer determined solely by frequency or transistor size but depends on the elegance and efficiency of the architecture.

Advances such as pipelining, branch prediction and hyperthreading have increased the performance and efficiency of processors. However, even the most efficient single core architectures cannot provide effective solutions to the demands of consumers. Unacceptable levels of power consumption and the increasing cost of developing complex single core chips forced manufacturers to improve the efficiency and performance of their processors through the use of dual core solutions. Although using two identical cores is a step in the right direction, the efficiency and performance of microprocessors can be further improved by using multiple heterogeneous cores. The advantages provided by this method enable the fusion of high-performance and mobile processor architectures and will provide effective solutions for years to come.

To understand why exploiting parallelism through multiple heterogeneous cores is the most effective method of improvement, it is helpful to start from a simple performance and efficiency comparison of identical single and dual core processors and then gradually advance to more complicated issues.

2. PERFORMANCE

The performance of a processor depends on a multitude of factors determined by its architecture and manufacturing process. In general, frequency has always emerged as the leading factor. However, architectural features such as cache size, the efficiency of a [...] and pipeline depth, among others, are becoming increasingly important in determining the performance of a single core processor. Intuitively, the effects of designing a superior architecture are magnified when more than one core is used.

2.1 CHOICE OF PROCESSORS

To demonstrate the importance of processor architecture, the performance of several cores from different application segments needs to be compared (Figure 1). In the high performance segment, the Intel Pentium IV 670 [1] and AMD Athlon FX-55 [2] were chosen, since they represent the fastest single core processors available today. The Intel Pentium IV 840D [3] and AMD Athlon 64 X2 [4] are their dual core counterparts. In addition, the Intel Pentium M 780 [5] and Transmeta Efficeon 8800 [6] were selected from two opposite ends of the mobile spectrum. The disparity in their operating frequency and power consumption illustrates the flexibility that is required for a truly mobile solution. The Intel PXA270 [7] is the sole example of a true system on chip (SoC) processor of the kind typically used in personal digital assistants.

Figure 2, which illustrates processor power consumption, shows that the power drawn by modern desktop processors based on the Pentium IV or AMD Athlon is nearly four times that of a typical laptop processor based on the Pentium M. One of the objectives of this report is to investigate how this figure can be reduced through an efficient combination of multiple heterogeneous cores.

FIGURE 1: SELECTED PROCESSORS

PROCESSOR                 FREQUENCY   TRANSISTORS   SIZE      PRICE

PERFORMANCE
Intel Pentium IV 670      3800 MHz    169M          112 mm²   $625
AMD Athlon FX-55          2600 MHz    114M          115 mm²   $824

MOBILE
Intel Pentium M 780       2260 MHz    140M          87 mm²    $638
Transmeta Efficeon        1300 MHz    40M           29 mm²    n/a
Intel PXA 270             624 MHz     2.5M          50 mm²    n/a

DUAL CORE
Intel Pentium IV 840D     3200 MHz    230M          237 mm²   $667
AMD Athlon X2 4800+       2400 MHz    233M          199 mm²   $790

FIGURE 2: PROCESSOR POWER CONSUMPTION

[Bar chart: power consumption in watts (axis 0 to 140) for the Intel Pentium IV 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium IV 840D and AMD Athlon X2 4800+.]

Clearly, the highest power consumers are the performance-oriented Intel Pentium IV 670 and 840D along with the AMD Athlon FX-55 and the dual core Athlon X2. The overall power consumption of the dual core solutions is greater. However, the consumption per core of a dual core solution is nearly half that of a single core. Therefore, if each core provides performance equal to its single core counterpart at half the power, it is twice as efficient. The subsequent sections, focused on processor performance, examine how far this statement holds.

2.2 OVERALL PERFORMANCE

The overall performance of the selected processors can be compared using various benchmarks, specifically SiSoft Sandra 2004 Dhrystone and Whetstone [8,9]. The Dhrystone benchmark compares the speed of processors by counting largely numerical operations, reported in MIPS (millions of instructions per second), associated with common application instructions such as the ones received from Windows (Figure 3). The Whetstone benchmark evaluates the floating point performance of a processor in MFLOPS (millions of floating point operations per second), typically associated with scientific or multimedia applications. This second aspect of the benchmark is becoming increasingly important as computers are used to watch videos, listen to music and play 3D games (Figure 4).

FIGURE 3: OVERALL PROCESSOR PERFORMANCE: SISOFT SANDRA (DHRYSTONE)

[Bar chart: Dhrystone score in MIPS (axis 0 to 25000) for the Intel Pentium IV 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium IV 840D and AMD Athlon X2 4800+.]

FIGURE 4: OVERALL PROCESSOR PERFORMANCE: SISOFT SANDRA (WHETSTONE)

[Bar chart: Whetstone score in million floating point operations per second (axis 0 to 12000) for the Intel Pentium IV 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium IV 840D and AMD Athlon X2 4800+.]

In the overall performance comparison, dual core processors remain the definite favorites, especially in a raw performance benchmark such as SiSoft Sandra Dhrystone. For both Intel and AMD, the dual core nearly doubles the performance. The Whetstone benchmark shows a similar picture. Although the performance of the high-end desktop chips is superior to that of the mobile architectures, the difference is marginal. The Dhrystone performance of the Intel Pentium M, a mobile processor, is only 15% lower than that of the single core Intel Pentium IV 670 and AMD Athlon FX-55. In the multimedia-oriented SiSoft Sandra Whetstone benchmark, the Intel Pentium M also held up well against its desktop counterparts.

On the other hand, the Transmeta Efficeon and the handheld Intel PXA-270 show a dramatic disadvantage in the overall performance figures. The practically non-existent Dhrystone and Whetstone results of the Intel PXA-270, one of the top handheld processors in use today, show how far ultra portable solutions underperform. This significant difference in performance also explains why PDAs are not as widespread as desktops and laptops. Slow processors seriously limit the functionality of the units, giving consumers less incentive to buy them. The performance of ultra portable processors such as the Intel PXA-270 is capped by the stringent limits on power supply that are required to keep the units portable. Although battery capacities are slowly increasing, other methods, such as the use of multiple cores, are required to make ultra portable electronics more practical.

2.2 PERFORMANCE EXTRAPOLATION

During the Intel Developer Forum in spring 2005, Intel Corporation predicted a ten-fold increase in processor performance due to the introduction of multicore designs. Considering that the performance of desktop processors has increased 68 times since the introduction of the 8086 in 1978, this prediction is plausible in the long run. As Intel readies dual and quad core processors to reach the market in 2006-07, a significant short term performance increase is also likely. Considering that these processors will be manufactured on a smaller 0.065μm process, the power and performance advantages will also be extended. By the most conservative estimates, quad core configurations should improve overall processor performance by a factor of at least 3 (Figure 5).

FIGURE 5. PERFORMANCE EXTRAPOLATION (BASED ON SANDRA DHRYSTONE BENCHMARK)

[Bar chart: performance in percent (axis 0% to 450%) for the Intel Pentium IV 670, Intel Pentium IV 840D, Intel Pentium (Quad), AMD Athlon FX-55, AMD Athlon X2 (Dual), AMD Athlon (Quad), Intel Pentium M 780, Intel Pentium M (Dual) and Intel Pentium M (Quad).]

If dual and quad core versions are built from a portable processor such as the Intel Pentium M, laptops will experience an even more significant increase in performance. Laptop processor designs are more energy efficient and therefore produce less heat. Considering that power dissipation is shifting to the forefront of the technological limitations of processor designs, mobile architectures are likely to benefit more from the increased core count. In fact, Intel is already making plans to introduce dual core mobile processors based on the Pentium M, currently codenamed Yonah. In addition, a quad core desktop processor, codenamed Kentsfield, is planned to arrive in mid 2007 [10].

3. EFFICIENCY

Although performance has always been the most important measure of processor superiority, issues with energy consumption are rapidly adding new meaning to this concept. Manufacturers like Intel and AMD are beginning to evaluate their products on the basis of performance per watt, which reflects not only the speed of the processor but also how efficiently it uses energy.

FIGURE 6: PERFORMANCE PER WATT COMPARISON.

[Bar chart: performance in MIPS per watt (axis 0 to 400) for the Intel Pentium IV 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium IV 840D and AMD Athlon X2 4800+.]


3.1 PERFORMANCE PER WATT

Since the introduction of the performance per watt design approach, the industry has been achieving higher performance at lower power consumption. Although mobile processors have always been more efficient, the answer for desktop systems has often been dual core architectures. Figure 6, derived from Figures 2 and 3, compares the performance per watt ratings of the selected processors. For brevity, only the Dhrystone benchmark was used for the performance component of this comparison. The Intel PXA-270 was not included because of its significantly different CPU architecture: since it is optimized for extremely low power consumption, its performance per watt figure varies significantly from one application to the next.

It is important to note that the Pentium M has a significantly higher performance per watt ratio than any other processor in the comparison. Another important trend is that the dual cores improve the performance per watt ratio by roughly 30%. Combining these results with Figure 5, we reach the conclusion that mobile cores benefit the most from having multiple cores (Figure 7). To obtain these results, the performance percentages of Figure 5 were divided by the performance per watt ratio between the Pentium M and the given processor. For instance, for the AMD Athlon FX-55 the performance per watt ratio is 350/140 = 2.5.
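The normalization used for Figure 7 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tooling used for the report: the function and parameter names are hypothetical, the 350 and 140 MIPS per watt values and the 119% Dhrystone figure are the report's own Athlon FX-55 example, and the single and dual core inputs to the quad core extrapolation (100 and 130) are illustrative placeholders chosen only to match the roughly 30% dual core improvement.

```python
def relative_perf_per_watt(perf_pct_of_ref, ref_mips_per_watt, proc_mips_per_watt):
    """Performance per watt of a processor as a percentage of the reference.

    perf_pct_of_ref is the processor's raw performance in percent of the
    reference processor (the Pentium M in the report).
    """
    # How many times more efficient the reference is (350 / 140 = 2.5 for the FX-55).
    efficiency_ratio = ref_mips_per_watt / proc_mips_per_watt
    return perf_pct_of_ref / efficiency_ratio


def extrapolate_quad(single_pct, dual_pct):
    """Assume the step from dual to quad repeats the single-to-dual factor."""
    return dual_pct * (dual_pct / single_pct)


# Report's example: Athlon FX-55 at 119% of Pentium M Dhrystone performance.
fx55 = relative_perf_per_watt(119.0, ref_mips_per_watt=350.0, proc_mips_per_watt=140.0)
print(f"Athlon FX-55: {fx55:.1f}% of Pentium M")  # 47.6%

# Illustrative inputs only: a dual core 30% better than its single core version.
quad = extrapolate_quad(single_pct=100.0, dual_pct=130.0)
print(f"Extrapolated quad core: {quad:.0f}%")
```

Dividing by the efficiency ratio penalizes a processor in proportion to how much less work it does per watt than the reference, which is why a chip that is faster in absolute terms can still land below 100% here.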

FIGURE 7: PERFORMANCE PER WATT EXTRAPOLATION

[Bar chart: performance per watt in percent relative to the Pentium M (axis 0% to 450%) for the Intel Pentium IV 670, Intel Pentium IV 840D, Intel Pentium (Quad), AMD Athlon FX-55, AMD Athlon X2 (Dual), AMD Athlon (Quad), Intel Pentium M 780, Intel Pentium M (Dual) and Intel Pentium M (Quad).]


Its performance based on Sandra Dhrystone is 119% relative to the Pentium M. Therefore, its total performance per watt is only 119%/2.5 = 47.6% relative to the Pentium M. Since performance and power consumption data were unavailable for the quad cores, they were extrapolated: the ratio between the performance per watt of the corresponding dual core and single core solutions was computed, and this factor was then multiplied by the performance per watt percentage of the dual core solution. The data clearly show that, in terms of efficiency, mobile solutions are hard to beat.

3.2 EFFECTS OF CORE HETEROGENEITY

By using heterogeneous cores we can further increase both the performance and the efficiency of a processor [11,12]. In a likely scenario, several cores with different performance, efficiency and complexity levels are combined in one multicore design. Large variations in the core architectures make the entire design more flexible and adaptive to a specific application. Kumar et al. found in their study that, on average, heterogeneous cores provide a significant 39% reduction in power while incurring only a negligible 3% reduction in performance. Since their experiment was based on the Alpha processor, which is rarely used in consumer products, it can only suggest the potential improvement that could be made if processor architectures were specifically designed to take advantage of multicore heterogeneity. In this case, powerful cores can be combined with more efficient ones to generate a significant improvement in performance per watt.

Since modern processors remain underutilized most of the time, this approach would yield significant idle power reductions. More powerful cores would simply be shut off and used only when their performance counts. For desktops, the reduced power load would decrease the demand for the custom water cooled solutions that are beginning to appear to resolve heat dissipation problems. In addition, the increased processing capabilities would greatly reduce bottlenecks in calculation intensive applications, such as file archiving and conversion.

In ultra mobile applications, the speed of a desktop processor is rarely necessary; however, if the need does arise, the batteries may provide enough power for short spurts of time through the use of capacitors. In addition, less complex cores can be used for specific applications, further reducing power consumption, as discussed later.
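The two ideas in this section, the efficiency gain implied by the figures reported for Kumar et al. and the policy of waking a powerful core only when it is needed, can be sketched as follows. This is an illustrative sketch, not the method of [11]: it assumes the 39% power reduction and 3% performance reduction apply to the same workload, and the `Core` records, their MIPS and watt values, and the `choose_core` policy are hypothetical.

```python
from dataclasses import dataclass


def perf_per_watt_gain(power_reduction, perf_reduction):
    """Factor by which performance per watt changes when power and
    performance drop by the given fractions."""
    return (1.0 - perf_reduction) / (1.0 - power_reduction)


@dataclass(frozen=True)
class Core:
    name: str
    mips: float   # hypothetical peak throughput
    watts: float  # hypothetical power draw when active


def choose_core(cores, demand_mips):
    """Pick the lowest-power core that meets the demand; if none can,
    fall back to the fastest core available."""
    capable = [c for c in cores if c.mips >= demand_mips]
    if not capable:
        return max(cores, key=lambda c: c.mips)
    return min(capable, key=lambda c: c.watts)


# Figures quoted in the text: 39% less power for 3% less performance.
gain = perf_per_watt_gain(power_reduction=0.39, perf_reduction=0.03)
print(f"Performance per watt improves by a factor of {gain:.2f}")  # 1.59

# Hypothetical two-core design: the large core stays off for light work.
cores = [Core("efficient", mips=2000.0, watts=5.0),
         Core("performance", mips=10000.0, watts=80.0)]
print(choose_core(cores, demand_mips=1500.0).name)  # efficient
print(choose_core(cores, demand_mips=8000.0).name)  # performance
```

Under these assumptions a 3% performance loss buys roughly a 59% improvement in performance per watt, which is the trade the section argues for.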

3.3 CHALLENGES Corporation over the years is backwards compatibility. Backwards compatibility One of the primary challenges of means that newer processors produced by introducing heterogeneous cores is the Intel are compatible to the ones twenty increasing complexity of the communication years ago. The purpose of this practice is bus that is required for such a complicated so the software does not have to be network. When it comes to the modern dual rewritten for every new generation of core processors, the communication between processors. As a result, many of the the cores is still in its infant stages of newest processors have a number of old development. Although these processors artifacts that serve no purpose in any of provide nearly a two fold increase in some the modern applications. This section of applications, they may provide none in the report focuses not only on describing others. Creating an effective communications the method of ensuring backwards bus is a difficult challenge, which can be compatibility in multicore processors but magnified by introduction of heterogeneous also on making sure that heterogeneous cores that may work on different or even cores are compatible with each other. variable frequencies. In addition, performance bottlenecks and power 4.1 BACKWARDS COMPATIBILITY distribution issues have to be evaluated largely on the hardware level. Although, Multicore processors offer the best Kumar et.al [11] used software to determine possible solution in terms of backwards the processor assignment for a specific compatibility. Compared to the modern instruction, hardware implementations of an counterparts, the processors from twenty effective algorithm would have significant years ago were much slower. Therefore, performance advantages. A separate co- highly efficient processor cores, working at processing unit may be necessary just to deal low frequencies, are more than sufficient to with power and performance optimization. 
emulate the operation of their ancestors (Figure 8). The backwards compatibility, in 4. COMPATIBILITY faster high performance cores can therefore be neglected. As a result, the unnecessary One of the most important factors in redundancy would be eliminated from the the success of experienced by Intel entire design. In addition, the performance VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 14

and efficiency should increase because common instruction set potentially excludes Since heterogeneous multicore practically useless operations. processors may use significantly different cores to diversify their performance index 4.2 CORE COMPATIBILITY across a range of applications, efficient operation has to be ensured through core FIGURE 8: BACKWARDS COMPATIBILITY compatibility [13]. Most modern designs use a reduced INSTRUCTION TYPE A in instruction set count (RISC) architectures. MOBILE CORE INSTRUCTION TYPE B This approach provides advantages in both performance and power consumption over other processor instruction types. On the other hand, the instruction sets vary from processor to processor. For instance, Pentium IV includes an additional set of multimedia PERFORMANCE INSTRUCTION TYPE A CORE instructions to increase its performance across a wide range of multimedia

FIGURE 9: PROGRAMMABLE COMMUNICATION BUS AS CORE INTERCONNECT.

VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 15

applications, while earlier processors such as al [11]. Since the communication bus will be 80486 do not. The situation becomes more programmed by a local coprocessor, rather complicated as processors with different by indirect software methods used in [11], RISC are combined together. For instance, the efficiency and performance of the overall the RISC in Pentium IV is optimized for the design is likely to have a significant increase. high performance applications. On the other hand, the RISC in the Arm processor, 5. FUNCTIONALITY commonly used in handheld applications, is optimized for power consumption [14]. As a Since heterogeneous multicore result, the instruction sets are not compatible processors are likely to include a number of even though they are quite similar in their programmable components besides a purpose. communications bus, the functionality of the The most efficient solution is to design is likely to increase significantly. The combine the heterogeneous cores using purpose of this section is to discuss the translation layer [15]. Nava et. al proposed an advantages in functionality that are approach resembling a network topology to associated with multicore processors and resolve the communication issues in between their programmable components. the heterogeneous cores. Having network based communication bus act as a translation 5.1 PROGRAMMABLE PROCESSORS layer between the heterogeneous cores will eliminate most of the compatibility issues The key element of a heterogeneous between the cores. In addition, since the multicore design of tomorrow will be an proposed communication bus can be increased number of programmable programmable. The power consumption can processors. The advantages of programmable therefore be further decreased by using smart processors include custom execution units, routing techniques optimized to increase the variable instruction sets as wells as registers performance per watt rating of the processor. 
and register files [16]. Conventional fixed Programmable components, such as instruction set processors simply cannot communication bus connecting compete with flexibility and performance heterogeneous cores, can have a significant advantages offered by the programmable impact on the increased performance processors when it comes to specific compared to the approach used by Kumar et. applications. For instance, the emergence of VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 16

digital signal processing in internet and more complex and have wide range of multimedia applications allowed system functionality. For instance, the same architects to design processors for their programmable processor can a physics specific algorithms and subsequently update processor for scientific and entertainment them as their algorithms improve over time. applications or act as a GPS processor for For instance, a configurable processor used navigation applications. to decode older MPEG 2 video files, can be Considering that modern processors reprogrammed to decode newer MPEG-4 are swaying away from the traditional videos. handcrafted design approach, the complexity Today’s system on chips design of the processors is likely to increase. In the contains hundreds of custom programmable long run, the goal is to enable computers to processors [16]. This is achieved by keeping design themselves with as few human inputs the complexity of the programmable as possible. Today, computers go only as far processor cores at relatively low levels. In as aiding the designer in optimizing a given future, the programmable cores may become processor architecture. Special software

FIGURE 10: VIDEO DECODER SCENARIO USING HETEROGENEOUS MULTICORE PROCESSOR

VIDEO APPLICATION VIDEO APPLICATION (RESOLUTION 1280X720 @30FPS) (RESOLUTION 800X480 @ 25 FPS)

OPERATING SYSTEM OPERATING SYSTEM [POWER SAVE]

PROGRAMMABLE COMMUNICATIONS BUS PROGRAMMABLE COMMUNICATIONS BUS

CORE CORE PROG PROG CORE CORE PROG PROG A B CORE CORE A B CORE CORE

PERFORMANCE MODE MOBILE MODE

ON ON

OFF OFF

VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 17

packages can be used to reprogram In addition, as the complexity of the processors, while optimizing its performance heterogeneous components is increasing so and power consumption to a given scenario. will the complexity of the tools that are It is likely, that one of the first applications required to design them. A very high level of of this approach will be configurable abstraction is required to create a system with processors that would reprogram themselves so many variable parameters. In addition, a based on the amount of available power. For point may be reached when the complexity of instance, a video stream decoder could the instructions and design will start adding a provide an HDTV quality resolution while a burden on the overall system performance. laptop is plugged into a wall outlet, while giving a lower resolution and saving power 6. FEASIBILITY when the user is traveling (Figure 10). The purpose of the feasibility study in 5.2 CHALLENGES this report is to evaluate technological and market conditions required for making The primary challenge as the number heterogeneous core processors a viable and complexity of programmable alternative to current technologies. The study components increases is to ensure that evaluates current technological conditions heterogeneous cores are communicating and investigates near future trends. efficiently [17]. The efficient operation means that the power consumption has to be 6.1 CURRENT TECHNOLOGIES minimized according to the performance demand. An increasing variability in For the past four years the instruction sets makes this a challenging task. manufacturing standard was the .09μ As instructions become significantly technology. Along with improved processor different from each other, it is harder to architectures, it allowed the increase in determine the best suited component. 64 bit processor frequencies from roughly 1.4 to 3.8 instructions, already present in some CPU’s Ghz. 
This seemingly disproportional may provide the answer to this problem since frequency increase had a serous negative they accommodate twice the amount of impact on power requirements of a typical information compared to traditional 32 bit processor. Since larger number of transistors instruction. was needed to achieve high frequency VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 18

designs and each transistor required higher processors. However, in order to make a operating voltage for faster switching, the significant leap into the future the core overall power increased quadratically. manufacturing technology needs to decrease Modern designs have reached a point where from .09μm down to 0.065μm. This 38% performance is strictly limited by the ability decrease in length would cause nearly a 50% of the processor to withstand the enormous decrease in total area (Figure 11). heat it generates. In addition, the increased Considering that manufacturers are also complexity of the designs is making modern shifting towards using larger wafers1, the processors have lower production yields, decrease in production costs per unit will at which drive the unit prices up. In some least half. cases, such as Intel Pentium Extreme Edition, the unit prices have reached a staggering FIGURE 11. RELATIVE CORE SIZING 1000 dollars. To resolve the problem with power CACHE CORE CORE dissipation and low yields manufacturers CACHE resorted to the use of dual cores and 64 bit CORE CORE CORE processors. By increasing parallelism in the architectures, the processor designers were SINGLE CORE 0.09um QUAD CORE 0.065um able to decrease the frequency and power consumption of each core while still POWER

increasing the performance. Less POWER PERFORMANCE PERFORMANCE complicated cores that are used in dual core solutions have higher yields, decreasing costs of the overall design.

The decreased core area will also 6.2 FUTURE TECHNOLOGIES allow more cores to be combined into one

processor. The current .09μ technology does For the next several years the trend of not allow practical integration of more than exploiting parallelism in processor designs two cores, since cooling large cores becomes will gain momentum. Improved

manufacturing processes will continue to 1 Wafers are used to grow silicon crystals, which are drive down the costs of developing dual core subsequently sliced and divided up into processor cores. VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 19

problematic. A larger area can also potentially allocate more space to the power optimization circuitry. This is one of the reasons why 0.065 μm technology would allow for development of quad-core and multicore processors.

Unfortunately, many fundamental physical phenomena, such as leakage currents, are imposing serious constraints on sub-0.065 μm technologies. Although Intel is currently looking at a potential reduction down to 0.045 μm within the next two years [10], this projection may be as realistic as the 5 GHz Pentiums that were rumored for 2005 and never delivered. Sub-0.045 μm technologies will eventually become a reality once extreme forms of lithography are implemented. At this point heterogeneous multicore processors will likely become the dominant trend in the markets due to a number of advantages discussed earlier in this report.

7 CONCLUSION

Although true high performance heterogeneous multicore processors are still held back by unresolved technological limitations, the future of this technology seems promising. The advantages introduced by multicore designs overshadow the benefits of current single and dual core solutions. In terms of performance and efficiency, heterogeneous multicore processors offer unprecedented increases due to power consumption flexibility and a high level of configurability. Although this report at times focused solely on performance advantages of multicore architectures, one has to keep in mind that in the future the terms performance and efficiency will become interchangeable. Due to the increased parallelism, the performance of future multicore processors will be limited mostly by the amount of power supplied and not by the frequency at which they operate. Therefore, the foremost issue in heterogeneous multicore processor design is optimizing power consumption by carefully selecting the cores and engaging programmable components based on the demands of applications.

8 RECOMMENDATIONS

Based on their performance and efficiency, multicore processors are clearly the future of computing. Advantages in functionality and power consumption associated with multicore designs will increase the number of possible applications while adding new ways we will use computers in our lives. For instance, heterogeneous multicores and breakthroughs in memory technologies, such as perpendicular recording and magnetic memory, will allow current desktops to be squeezed down to the size of a cellphone. Personalized programmable processors in these phones may enable the use of cellphones as credit cards, which will be orders of magnitude more secure compared to traditional methods. Ultimately, heterogeneous cores will bring an increasingly interactive experience to electronics across the board.

This is one of a multitude of examples reflecting the importance of technologies which accelerate the development of heterogeneous multicore processors. Although alternative research directions, such as quantum computing, promise lucrative opportunities, they do not have solid theoretical and practical foundations. Advances in multicore processor architectures are based on decades of research in the field of silicon based semiconductors and not on a small number of theoretical speculations. This is why increasing research and financing in the field of heterogeneous multicore processors is an undeniably solid investment that will bring significant returns in the long run.
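The conclusion's central design point, selecting cores according to the demands of the application so that power rather than clock frequency is the quantity being optimized, can be illustrated with a minimal sketch. The core names, throughput figures and power figures below are hypothetical and chosen only for illustration; they are not measurements from this report, and the selection rule is one simple policy among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    mips: float   # peak throughput in MIPS (hypothetical figure)
    watts: float  # power drawn while active (hypothetical figure)


def select_core(cores, demand_mips):
    """Return the lowest-power core that meets the demanded throughput.

    If no core is fast enough, fall back to the fastest core available.
    """
    capable = [c for c in cores if c.mips >= demand_mips]
    if capable:
        return min(capable, key=lambda c: c.watts)
    return max(cores, key=lambda c: c.mips)


def mips_per_watt(core):
    """Performance per watt, the efficiency metric used in this report."""
    return core.mips / core.watts


# Hypothetical heterogeneous chip with three core types.
CORES = [
    Core("small", mips=500.0, watts=1.0),
    Core("medium", mips=2000.0, watts=8.0),
    Core("large", mips=6000.0, watts=60.0),
]

if __name__ == "__main__":
    for demand in (300.0, 1500.0, 10000.0):
        chosen = select_core(CORES, demand)
        print(demand, chosen.name, mips_per_watt(chosen))
```

Under these assumed figures a light workload runs on the small core and a heavy one on the large core, so the chip pays the large core's power cost only when the application actually demands it.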

REFERENCES

[1] Intel Pentium 4 Processor 670: Datasheet. Intel Corporation. [Online]. Available from: www.intel.com
[2] AMD Athlon 64 4800+ Processor: Datasheet. AMD Corporation. [Online]. Available from: www.amd.com
[3] Intel Pentium D Processor 840, 830, and 820: Datasheet. Intel Corporation. [Online]. Available from: www.intel.com
[4] AMD Athlon 64 X2 Processor: Datasheet. AMD Corporation. [Online]. Available from: www.amd.com
[5] Intel Pentium M 770 Processor: Datasheet. Intel Corporation. [Online]. Available from: www.intel.com
[6] Transmeta Efficeon TM 8800 Processor. Transmeta Corporation. [Online]. Available from: www.transmeta.com
[7] Intel PXA270 Processor: Datasheet. Intel Corporation. [Online]. Available from: www.intel.com
[8] SiSoft – The Diagnostic Tool. SiSoft Corporation. [Online]. Available from: http://www.sisoftware.net/index.html?dir=&location=qa&langx=en&a=
[9] Tom’s Hardware Guide Processors. Tom’s Guide Publishing, 2005. [Online]. Available from: http://www23.tomshardware.com/index.html
[10] SCHMID P. Top Secret Intel Processor Plans Uncovered. Tom’s Guide Publishing. [Online]. Available from: http://www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered/index.html
[11] KUMAR R, FARKAS K, JOUPPI N, RANGANATHAN P, TULLSEN D. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. Proceedings of the 36th International Symposium on Microarchitecture (MICRO-36’03). IEEE. 2003.
[12] BALAKRISHNAN S, RAJWAR R, UPTON M, LAI K. The Impact of Performance Asymmetry in Emerging Multicore Architectures. Proceedings of the 32nd International Symposium on Computer Architecture (ISCA’05). IEEE. 2005.
[13] JERRAYA A, TENHUNEN H, WOLF W. Introduction to Microprocessor Systems On Chips. Computer, v 38, n 7, July 2005. pp. 36-40.
[14] GOODACRE J, SLOSS A. Parallelism and the ARM Instruction Set Architecture. Computer, v 38, n 7, July 2005. pp. 42-50.
[15] NAVA M.D, BLOUET P, TENINGE P, COPPOLA M, BEN-ISMAIL T, PICCHIOTTINO S, WILSON R. An Open Platform for Developing Multiprocessor SOCs. Computer, v 38, n 7, July 2005. pp. 60-67.
[16] LEIBSON S, KIM J. Configurable Processors: A New Era in Chip Design. Computer, v 38, n 7, July 2005. pp. 51-59.
[17] JERRAYA A, BAGHDADI A, CESARIO W, GAUTHIER L, LYONNARD D, NICOLESCU G, PAVIOT Y, YOO S. Application Specific Multiprocessor Systems-on-Chip. SASIMI.


GLOSSARY

CORE – Silicon device that contains transistor logic for the processor.
CPU – Central Processing Unit, or a processor.
DHRYSTONE – Benchmark used to measure integer (MIPS) performance of a processor.
GPS – Global Positioning System.
HETEROGENEOUS – Made of processor cores of different architectures.
KENTSFIELD – First quad-core desktop processor, due to appear in 2007.
MIPS – Million Instructions Per Second.
MFLOPS – Million Floating-Point Operations Per Second.
PDA – Personal Digital Assistant.
PROCESSOR – Component that is responsible for evaluation of instructions.
RISC – Reduced Instruction Set Computer. This approach is used in most modern microprocessors. It allows for faster and more efficient hardware designs.
WATT – Unit of power (work per unit time); one watt equals one joule per second.
WHETSTONE – Benchmark used to measure floating point (MFLOPS) performance of a processor.
YONAH – Dual core processor based on the 0.065 μm process, due to replace the current Pentium M.
