Sharing best practices with the information technology community Winter 2008

MAGAZINE

Also inside

Quad-Core: Faster by Design. EDA Among First to Reap Benefits of Multi-Core Architecture

Comparing Multi-Core Processors for Server Virtualization

Solaris Soars on Intel Architecture

Article reprint

Transistors Reinvented: 45-nm Manufacturing

Creating the Next Wave of Quad-Core Processors
45-nm breakthrough enables high performance and energy efficiency in the next generation of multi-core processors.

Like this article? Get the full magazine at http://ipip.intel.com. Join the Intel Premier IT Professional program, which shares IT best practices and Intel product information through events, webinars, podcasts, and publications.

Check out the digital version of this magazine online at www.intelpremierIT-digital.com

premier it | winter 2008 | www.intel.com/info/ipip

Creating the Next Wave of Quad-Core Processors

45-nanometer breakthrough enables high performance and energy efficiency

Executive Summary
New, tiny 45-nanometer transistors will be inside the next-generation Intel® Core™2 Duo, Intel Core 2 Quad, and Xeon® families of multi-core processors.

Intel's recent introduction of 45-nanometer technology with hafnium-based high-k metal gate transistor design represents the most significant change in transistor technology in the past 40 years. This technology, which will debut this year in Intel's new processor family, code-named Penryn, will have a profound impact on mobile, desktop, and server devices designed to provide high performance and energy efficiency. Our achievement in bringing this technology to fruition reflects our dedication to innovation and continues our storied history in the development of silicon architecture.

We believe that the conversion to high-k metal gate is revolutionary. Though it has been scaled and made thinner and smaller, Intel has used the same basic transistor structure since the 1960s. This new technology improves on the 65-nanometer technology we introduced in 2005 in several ways, including:

• Doubling the transistor density, enabling a smaller chip size, increased transistor count, or both
• A 30 percent reduction in transistor switching power
• A 20 percent improvement in transistor switching speed, or a reduction of source-drain leakage to one-fifth of the previous 65-nanometer technology
• A reduction in gate oxide leakage power to one-tenth of the previous process technology

In addition, we ensured that Penryn processors maintained backwards compatibility while still providing meaningful performance and feature enhancements.

Abiding by Moore's Law

Intel is a longtime adherent to Moore's Law, which was formulated by Intel cofounder Gordon Moore and refers to the continual scaling of transistor dimensions: reducing feature size, reducing the cost per transistor, and increasing the number of transistors per chip. As a result, we have consistently, every two years, reduced feature size to about seven-tenths of its previous value, decreased the cost per transistor by half, and doubled the transistor count.

Consider this: In 1970, Intel chips contained approximately 2,000 transistors each, whereas the latest Intel® processors hold more than 1 billion transistors. That means that we have been doubling the number of transistors per chip every two years for more than 35 years. As for cost, the price of a Penryn transistor is about one-millionth what a transistor cost in 1968.

As a result of the decreased transistor size and increased transistor density, we dramatically increased the availability of processors based on Intel® quad-core technology. Now, this new technology effectively doubles the computing capacity of products available to IT decision makers, just as moving from single-core to dual-core products did.

Bridging the Gap with High-K Metal Gate Transistors

High-k metal gate transistors are the driving force behind the 45-nanometer products, which have overcome the barrier between previous generations of transistor technology and future generations.

One of the advantages of high-k metal gate transistors involves the tradeoff between increased switching speed and reduced power. In performance-oriented environments, the switching speed can be increased by 20 percent over previous processors, while devices that require lower power can have leakage power reduced by a factor of five. It is also possible to adjust this tradeoff so that the switching and power settings fall somewhere between the two extremes.

A tenfold reduction in gate leakage relative to the previous technology is another benefit of high-k technology. The entire semiconductor industry is struggling to deal with the heat generated by chips, which grows exponentially as the number of transistors increases. Leakage control using high-k materials is one way to help transistors run cooler.

This is a boon to data center managers, because Penryn processors can offer significantly higher clock speeds on a richer […] while staying within existing platform thermal constraints. As a result, they can install more computing power while utilizing the same amount of electricity and maintaining the same thermal footprint in the data center.

High-k gate dielectrics can achieve such dramatic gate leakage reductions because they are thicker than the silicon dioxide dielectrics that they replace, yet still provide the higher capacitance needed for improved transistor performance. As a result, devices based on them run cooler.

The high-k materials require a new manufacturing process that deposits the material one molecular layer at a time.
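The Moore's Law figures quoted earlier in this article (feature size scaled to seven-tenths and transistor count doubled every two years, starting from roughly 2,000 transistors in 1970) can be sanity-checked with a little arithmetic. The Python sketch below is illustrative only: the function names are invented for this example, and the year spans are assumptions chosen to bracket "more than 35 years."

```python
# Rough check of the Moore's Law figures quoted in the article.
# Function names and the chosen year spans are illustrative assumptions.

def area_scale(linear_scale: float) -> float:
    """Area shrinks with the square of the linear feature-size scale."""
    return linear_scale ** 2


def projected_transistors(start_count: float, years: float,
                          doubling_period_years: float = 2.0) -> float:
    """Transistor count after doubling once per doubling period."""
    return start_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Feature size scaled to 0.7x per generation gives about 0.49x the area
    # per transistor, which is why density roughly doubles.
    print(f"area per transistor: {area_scale(0.7):.2f}x")
    print(f"density gain: {1 / area_scale(0.7):.2f}x")

    # About 2,000 transistors in 1970, doubling every two years.
    for years in (36, 38, 40):
        print(years, f"{projected_transistors(2000, years):,.0f}")
```

Under these assumptions, 19 doublings (38 years) take 2,000 transistors to 1,048,576,000, which is in line with the "more than 1 billion" figure cited above.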

In our never-ending quest to obey Moore's Law, we're working to identify the second generation of high-k metal gate transistors, which will extend scaling even further.

Innovative Conversion Process

We were able to decrease gate leakage by converting the older silicon dioxide gate insulator to a high-k insulator. This was a major challenge because we wanted to make the insulator as thin as possible, both to improve transistor performance and to provide greater dimensional scaling.

We knew as early as 1995 that the silicon dioxide insulator was reaching its end, but we had time to resolve the problem because we were still a couple of generations away from hitting the wall, which arrived when we could no longer scale the insulator between the 90- and 65-nanometer generations. At that point, the insulator had been scaled to a thickness of 1.2 nanometers, which is about five silicon atomic layers.

Faced with that roadblock, we started thinking about how to get past the gate insulator scaling limit. Fortunately, our development team created a solution that changed the gate insulator from silicon dioxide to the hafnium-based high-k material and transitioned the gate electrode from a poly-silicon material to metal. We started putting together the integrated 45-nanometer high-k metal gate process flow in 2005, demonstrated working 45-nanometer memory chips in January 2006, and showed working 45-nanometer microprocessors a year later.

Penryn: From Technology to Products

Intel's success is largely due to our ability to create industry-leading products from superior silicon architecture. It is also the direct result of our ability to control variables in the development and manufacturing processes, which enables us to produce reliable products that can be quickly rolled out to customers.

The latest example of this formula is Penryn, the standard-bearer of 45-nanometer technology with high-k metal gate transistor design. Six Penryn family processors, including dual- and quad-core desktop processors and a dual-core mobile processor, are members of the Intel® Core™2 processor family, and the new dual- and quad-core server processors are members of the Intel® Xeon® processor family. A processor for higher-end multiprocessing systems is under development. The Penryn family includes several enhancements to Intel's microarchitecture, including a Radix 16 divider, a new "super shuffler," and enhanced Deep Power Down Technology.

Penryn-based processors provide faster divide performance, roughly doubling the divider speed over previous generations for integer and floating-point divide instructions, as well as square root computations. This is achieved through the inclusion of a new, faster hardware Radix 16 divider, which requires significantly more transistors to implement than the previous Radix 4 divider did. The Radix 16 divider greatly benefits scientific computing applications, such as physics, mechanical engineering, and material science, that include high numbers of divide and square root instructions.

By implementing a full-width, single-pass super shuffler unit that is 128 bits wide, Penryn processors can perform full-width shuffles in a single cycle, once again leveraging the transistor density available in the 45-nanometer process technology. This significantly improves performance for many instructions that have shuffle-like operations, such as pack, unpack, and wider packed shifts. These are part of Intel's various Streaming SIMD Extensions (SSE) instruction sets, which include SSE, SSE2, SSE3, and the new-for-Penryn SSE4. These instructions rapidly rearrange data into SIMD registers for additional vector manipulations, thereby increasing performance for content creation, imaging, video, and high-performance computing.

Still driven to develop more power-efficient products, we weren't satisfied by the reduced leakage the new technology offered. Even at a 20 percent relative leakage level, the number of transistors in Penryn processors would consume a significant amount of power due to the aggregate transistor leakage. We focused on driving that leakage power down to zero. We realized that the transistor density would enable the implementation of small, state-retaining memory blocks, one of the cornerstones of our Deep Power Down Technology, in which processor context is saved to RAM prior to disabling the processor's core power supply. This advanced power management state significantly reduces the power of the processor during idle periods, such that the internal transistor power leakage is no longer a factor. We feel this is a major enhancement over previous generations of Intel mobile processors.

Other Penryn features include:

• Enhanced Intel® Dynamic Acceleration Technology, which improves single-core processor performance by dynamically increasing the performance of active cores when not all cores are utilized.
• Higher overall clock frequencies within existing power and thermal envelopes to further increase performance. Desktop and server products will be introduced at clock frequencies of greater than 3GHz.
• An L2 cache that is up to 50 percent larger, at 6MB, than the 4MB L2 cache in the Conroe/Woodcrest/Merom line of processors.

Streaming Loads

Streaming loads boost performance by providing a higher-performance mechanism to transfer nontemporal (use-once) data into the processor. Thus we can offload work from graphics engines and video-capturing devices, which are good at some tasks, and pass it back to CPUs, which are good at others. This balances the workload across platform computing elements. In some cases, it can also eliminate the need for custom external computing engines, which, in turn, reduces overall platform costs.

Traditionally, data had been easy to move from CPUs to graphics engines but not the other way around, implying that once the data went to the graphics engine, it had to stay there. Streaming loads now make it possible to partition the graphics engine workload into CPU-optimized components and graphics engine-optimized components, and to transfer data between the two domains in real time.

The Importance of Virtualization

Penryn improves virtualization support with its enhanced Intel® Virtualization Technology (Intel® VT).¹ Microarchitecture improvements that require no virtual machine software changes can speed up virtual machine transition entry and exit times. When a virtual machine is run on a Penryn-class chip, the hardware hides the entry/exit virtualization commands from the software, accelerating these transitions by an average of 25 to 75 percent.

Penryn also improves virtualization support by reducing the overhead associated with virtualization. In this enhanced environment, IT decision makers will be able to realize more throughput by implementing Penryn-based servers because the ratio of virtual servers to physical servers will improve; that is, there will be more virtual servers for a constant number of physical servers. In addition, virtual servers can migrate from one physical server to another, further enhancing overall performance.

With virtualization top-of-mind for IT decision makers across industries, these performance gains will help guide virtualization strategies. The bottom line is that IT shops will be able to provide "free" resources through improved utilization of the baseline hardware, which is definitely cost-effective.

Conclusion

Intel's move to 45-nanometer technology with hafnium-based high-k metal gate transistors is a revolutionary change in the development of transistor technology. Accomplishing it was a pressure-packed challenge that had to be overcome, because there was no going back to the silicon-dioxide insulator days of the past.

The impact of this change will resonate positively and productively throughout the computer industry, as IT decision makers reap the benefits of the technology and the new Penryn processors in the form of enhanced servers, desktop PCs, and mobile devices that will demonstrate gains in performance and energy efficiency.

The ability to eliminate obstacles, rapidly develop new technology, and quickly bring reliable products to market is a hallmark of Intel, and an indication that we will continue to obey Moore's Law as we have for the past 35 years.

Mark Bohr is a senior fellow in the Technology Manufacturing Group at Intel. Rob Milstrey is a senior computer architect in the Mobile Platform Group at Intel.

Illustrations by Wes Duvall.

Download: To learn more about the 45nm breakthrough, listen to the podcast at www.intelpremierit-digital.com/intelpremierit/include/45nm.html

¹ Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM), and, for some uses, certain computer system software enabled for it. Functionality, performance, or other benefits will vary depending on hardware and software configurations and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.
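To make the Radix 16 divider discussion above more concrete: a radix-r digit-recurrence divider produces log2(r) quotient bits per step, so moving from radix 4 to radix 16 halves the number of steps for the same operand width, which fits the "roughly doubling" figure. The Python sketch below is a simplified restoring-division model, not Intel's hardware algorithm; the function name `radix_divide` and the 32-bit operand width are assumptions made purely for illustration.

```python
def radix_divide(dividend: int, divisor: int, radix_bits: int, width: int = 32):
    """Restoring digit-recurrence division of unsigned integers.

    Consumes `radix_bits` bits of the dividend per step (2 for radix 4,
    4 for radix 16). Returns (quotient, remainder, steps).
    Simplified model for illustration only, not a hardware design.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend out of range for width")
    if width % radix_bits != 0:
        raise ValueError("width must be a multiple of radix_bits")

    mask = (1 << radix_bits) - 1
    quotient = 0
    remainder = 0
    steps = 0
    # Walk the dividend from its most significant digit to its least.
    for shift in range(width - radix_bits, -1, -radix_bits):
        digit = (dividend >> shift) & mask
        remainder = (remainder << radix_bits) | digit
        q_digit = remainder // divisor  # one quotient digit, 0..radix-1
        remainder -= q_digit * divisor
        quotient = (quotient << radix_bits) | q_digit
        steps += 1
    return quotient, remainder, steps


if __name__ == "__main__":
    print(radix_divide(1_000_000, 7, radix_bits=2))  # radix 4:  (142857, 1, 16)
    print(radix_divide(1_000_000, 7, radix_bits=4))  # radix 16: (142857, 1, 8)
```

Both calls return the same quotient and remainder, but the radix 16 version needs 8 steps instead of 16. Real dividers use more elaborate quotient-digit selection, which is one reason the wider radix costs more transistors.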