AppliedMicro’s X-Gene: Minimizing Power in Data-Center Servers

By Linley Gwennap, Principal Analyst, The Linley Group

July 2012

www.linleygroup.com

In cloud data centers, power-hungry server processors drive up operating costs both directly and indirectly. As these data centers grow, cloud server providers are seeking to reduce costs by using lower-power processors. ARM technology is one approach that can provide significant power savings. AppliedMicro is developing an ARM-compatible 64-bit server processor called X-Gene that will deliver a leap forward in power efficiency.

Data Center Workloads

Cloud computing is changing the way servers work. In the past, the focus has been on large workloads such as databases and financial services. These types of programs work best on beefy CPUs that can power through complex code. As a result, Intel focused on maximizing the performance of the CPU used in its Xeon products, somewhat reluctantly scaling to multiple CPUs per chip when it approached the limits of CPU speed.

A cloud data center, in contrast, provides web services such as accessing email or photo libraries or editing documents. In these cases, the server performs a modest amount of work per user. Each user’s workload is independent from others, increasing the opportunities for parallel processing. For this type of workload, the performance of an individual CPU is less important. If more performance is required, the IT manager can shovel in more processors and more servers. The important metrics are performance per unit of rack space, performance per dollar of equipment cost, and performance per watt.

Changing workloads also affect the type of software that is used. Rather than Windows or proprietary operating systems, many web servers run a variant of Linux. These servers may also use open-source web software such as Apache, MySQL, and PHP. As a result, they are not dependent on a single CPU instruction-set architecture (ISA). Alternative ISA providers can serve this market simply by porting these open-source software stacks to their architectures. These providers cannot, however, address servers that run Windows or other Microsoft services.

Reducing Power to Improve TCO

In large data centers, a growing concern is energy use. The power used by Intel’s high-end processor chips has increased over the past several years, and the rising cost of electricity per kilowatt-hour exacerbates this problem. Keep in mind that data centers must pay twice for a processor’s electrical consumption: once to power the processor and again to power the air-conditioning units needed to remove the heat generated.

According to Microsoft, a large data-center operator, equipment cost can be less than half of a basic server’s total cost of ownership (TCO). Electrical power (including power for cooling) can contribute about 20% to the TCO, as Figure 1 shows. Not all of this power comes from processors, but processor chips typically consume half of a server’s power, and servers (and their associated cooling) consume the vast majority of the power in a data center. Thus, processor power is the biggest single factor in data-center power usage.

Higher processor power creates other, hidden costs. Each server contains a power supply that must be sized for the maximum power consumed by the system. The more power that is required, the more expensive this power supply will be. Internal cooling fans and other thermal-management devices also add to the server cost.

Much of the cost of building a data center, which is amortized as part of the TCO, is proportional to the power requirements of the servers. In fact, about 80% of the build cost is for air-conditioning equipment, power distribution, and backup power supplies (e.g., generators). Each of these components must be scaled for the system power required. As Figure 1 shows, these power-driven costs add up to another 20% of server TCO.

Figure 1. Total cost of ownership (TCO) for a basic 1U server. The purchase cost of the server contributes less than half of the TCO, whereas electricity consumes about 20%, a portion that is rising over time. *Equipment related to power and cooling requirements. (Source: Microsoft)
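
The TCO arithmetic in this section can be sketched in a few lines of Python. The shares come from the text (about 20% of TCO for electricity, another 20% for power-driven facility costs, and processors at roughly half of server power). The function name and the assumption that both cost categories scale linearly with server power are illustrative simplifications, not part of Microsoft's data.

```python
# Rough sketch of how processor power savings might flow through to TCO.
# Shares are taken from the text; the linear-scaling model is an assumption.

ELECTRICITY_SHARE = 0.20          # electricity (incl. cooling) as a share of TCO
POWER_INFRA_SHARE = 0.20          # power/cooling equipment as a share of TCO
CPU_SHARE_OF_SERVER_POWER = 0.5   # processors use about half of server power


def tco_savings(cpu_power_reduction: float) -> float:
    """Estimate the fraction of TCO saved for a fractional cut in CPU power.

    Assumes (hypothetically) that electricity and power-driven facility
    costs both scale linearly with server power.
    """
    if not 0.0 <= cpu_power_reduction <= 1.0:
        raise ValueError("cpu_power_reduction must be between 0 and 1")
    server_power_reduction = cpu_power_reduction * CPU_SHARE_OF_SERVER_POWER
    power_driven_share = ELECTRICITY_SHARE + POWER_INFRA_SHARE
    return server_power_reduction * power_driven_share


if __name__ == "__main__":
    for cut in (0.25, 0.50, 0.75):
        print(f"{cut:.0%} less CPU power -> ~{tco_savings(cut):.1%} lower TCO")
```

Under these assumptions, halving processor power would trim roughly 10% from a server's TCO. The real figure depends on how much of the facility cost can actually be scaled down.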

Advanced System Design

Microservers provide a new approach to improving power efficiency. Microservers are similar to blade servers but use even smaller processor boards to reduce the size of the system. In this way, a microserver can pack more processors into the same rack space as a blade server. Microservers combine low-power processors with shared system resources such as a power supply, networking, storage, and a chassis. Sharing resources enables consolidation and reduces system cost while improving power efficiency.

Although blade servers primarily use two-socket (2P) processors, the smaller size of microservers drives demand for single-socket (1P) processors. These 1P processors don’t require the high-speed coherent interface (e.g., QPI) that connects the processors in a 2P system, reducing their cost.

Further reductions in size and power can be achieved by increasing the integration level of server processors. Today’s Xeon-based products combine a processor chip with a system-logic (south-bridge) chip and memory to form a complete 1P server. The system will also require additional Ethernet switch chips to connect the server boards. SeaMicro, a server vendor recently acquired by AMD, pioneered the integration of the Ethernet fabric with the system logic, eliminating one chip per server. To minimize cost and power, future server processors will adopt a system-on-a-chip (SoC) model that combines the entire server (except DRAM) in a single chip.

The ARM Alternative

Another technology for reducing power in servers is the use of ARM processors. Whereas Intel has only recently begun to retrofit its Xeon processors for lower power, ARM processors are designed from the ground up for power efficiency. This focus comes from ARM’s heritage in mobile phones and other battery-powered devices, but more recently, the company and its partners have expanded into the digital home (e.g., set-top boxes), microcontrollers, and even wireless infrastructure. The same technology can be applied to PCs and servers.

To better address the needs of the server market, ARM recently introduced a 64-bit version of the architecture called ARMv8. This version extends the size of the ARM registers to 64 bits, matching the capabilities of current Intel processors. Earlier ARM designs can address only 4GB of memory at a time, not enough for server software that manipulates large databases.
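
The 4GB limit follows from the pointer width: a 32-bit address can name 2^32 distinct bytes. A minimal Python sketch makes the arithmetic explicit. The helper name is hypothetical, and real chips typically implement fewer physical address bits than the architectural maximum.

```python
# Address-space size as a function of pointer width.

GIB = 2**30  # bytes per gibibyte (the "GB" used for memory sizes)


def addressable_gib(address_bits: int) -> int:
    """Return the number of GiB reachable with a flat address of this width."""
    if address_bits < 30:
        raise ValueError("address_bits must be at least 30 for a whole-GiB result")
    return 2**address_bits // GIB


if __name__ == "__main__":
    print(addressable_gib(32))  # 32-bit pointers
    print(addressable_gib(64))  # 64-bit pointers (architectural upper bound)
```

A 32-bit pointer reaches 4 GiB, whereas a 64-bit pointer could in principle reach 2^34 GiB, far more than any current server installs.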

A challenge for ARM in the general server market is that most server software is designed to run on Intel processors. Most cloud software, however, is either open source or proprietary code developed by the service provider. In either case, the service provider can port its code to ARM as needed. Porting from a 64-bit Intel processor to a 32-bit ARM processor requires checking for long pointers and possibly splitting up large objects into chunks that a 32-bit chip can handle. Porting to a 64-bit ARMv8 processor eliminates these problems and simplifies the transition.

While matching the key features of Intel’s x86 instruction set, ARM provides one key technical advantage. Intel processors bear the burden of compatibility with decades of legacy CISC code. To support this code, x86 processors include a complex instruction decoder that converts these CISC instructions into internal RISC instructions (which Intel calls micro-ops). Because ARM software already uses the more efficient RISC encoding, it does not require this conversion step. The complex x86 decoding logic takes up extra die area and, more importantly, burns extra power for each instruction that the processor executes.

Current ARM processors do not come close to the performance of Intel’s Xeon products. Even so, these smaller CPUs can efficiently handle workloads that are limited by memory bandwidth or I/O throughput. Web caching and storage applications often fall into this category. As the performance of ARM processors increases over time, they will be able to take on more CPU-intensive workloads.

Opportunities for ARM

We estimate the merchant market for server processors exceeded $7 billion in 2011. This figure excludes processors sold into workstation and embedded designs, although those chips may use the same brand (e.g., Xeon) as server processors. The merchant market also excludes server processors consumed internally by server OEMs such as IBM’s POWER chips and Oracle/Fujitsu SPARC chips. Thus, Intel and AMD almost exclusively served the 2011 merchant market for server processors, with Intel’s Xeon family dominating the market. In units, we estimate Intel and AMD shipped 17.4 million server processors into about half as many x86 servers.

Our forecast for server processors shows the merchant market growing to $9.4 billion in 2016, representing a five-year compound annual growth rate (CAGR) of nearly 6%. This top-level forecast, however, hides the diverging growth rates of underlying server segments. In particular, servers sold to public-cloud providers will grow at about three times the rate of the overall market. By contrast, servers for small-to-medium businesses (SMB) will lag the overall market as SMBs adopt cloud computing as an alternative to internal infrastructure. We expect enterprise data centers, which may include private clouds, to grow at about the same rate as the overall market. The remaining segment, high-performance computing (HPC), should grow at nearly twice the rate of the total market and should consume as many processors as servers for SMBs by 2015.
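
The growth rate quoted above can be checked with the standard CAGR formula. This sketch uses $7.0 billion as the 2011 base, which is an assumption: the text says only that the market "exceeded $7 billion," so the true rate is slightly lower than the value computed here.

```python
# Compound annual growth rate: (end / start) ** (1 / years) - 1


def cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.06 == 6%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Assumed 2011 base of $7.0B (a lower bound) growing to $9.4B in 2016.
    rate = cagr(7.0, 9.4, 5)
    print(f"Merchant server-processor CAGR, 2011-2016: {rate:.1%}")
```

With the $7.0 billion base the result is about 6.1%, so a base somewhat above $7 billion is consistent with the "nearly 6%" figure.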

For developers of server processors based on alternative instruction-set architectures, predominantly ARM, the question is how much of the server market is available to them. To examine the serviceable market for alternative architectures, we made several assumptions. First, we assumed no commercial-software support for alternative architectures would be available during the forecast period. This implies end customers must use open-source or internally developed operating systems and applications. Second, as a simplifying assumption, we excluded HPC as a target segment, because alternative architectures have not emphasized floating-point performance. This could change, however, and the first assumption does not exclude all HPC applications as potential future targets.

In the near term, these assumptions limit the serviceable market almost exclusively to public-cloud providers. Within the public-cloud segment, software as a service (SaaS) satisfies our first assumption, whereas infrastructure as a service (IaaS) does not. That is, IaaS requires binary compatibility with operating systems and applications; therefore, we exclude IaaS from the serviceable market. Throughout our forecast period, however, SaaS should represent the vast majority of the public-cloud segment. In the longer term, we expect some large enterprise data-center customers to follow the lead of cloud providers in using so-called “big data” applications such as Hadoop. This trend should open a small portion of the enterprise segment to alternative architectures by 2016.

Figure 2 shows our resulting forecast for the serviceable market (SAM) for alternative architectures versus the total available market (TAM) for merchant server processors. Although suitable products were not yet available, we estimate the market addressable by alternative architectures at more than $1 billion in 2011. Given the high growth rate of the public-cloud segment, we expect the SAM to climb to $2.8 billion in 2016, representing 30% of the total merchant server-processor market. This does not mean we expect ARM-based server processors to have 30% market share in 2016, but it provides a sizable opportunity for growth.

Figure 2. Market opportunity for non-x86 server processors. While the total market for merchant server processors is more than $10 billion, the portion of the market that can be served using non-x86 processors will grow to $2.8 billion in 2016.
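
The SAM figures can be cross-checked against the forecast numbers given in the text. In the sketch below, the dollar values are the paper's own, but the 2011 SAM is stated only as "more than $1 billion," so the $1.0 billion base (and therefore the implied growth rate) is an assumption that yields an upper bound. The function names are illustrative.

```python
# Cross-check of the serviceable-market (SAM) share and its implied growth.

TAM_2016 = 9.4   # $B, forecast merchant server-processor market in 2016
SAM_2016 = 2.8   # $B, forecast serviceable market in 2016
SAM_2011 = 1.0   # $B, assumed lower bound ("more than $1 billion")


def share(part: float, whole: float) -> float:
    """Return part/whole as a fraction."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


def implied_growth(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate implied by two endpoints."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    print(f"SAM share of 2016 TAM: {share(SAM_2016, TAM_2016):.1%}")
    print(f"Implied SAM CAGR (upper bound): "
          f"{implied_growth(SAM_2011, SAM_2016, 5):.1%}")
```

The share works out to just under 30%, matching the text. The implied SAM growth rate is at most about 23% per year, several times the overall market's rate.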

Case Study: The X-Gene Processor

One of the first processors to take advantage of the capabilities of ARMv8 will be AppliedMicro’s X-Gene. This processor takes ARM performance to the next level with a custom-designed CPU that is fully compliant with the 64-bit ARM instruction set. This CPU is designed to approach the instructions per cycle (IPC) of Intel’s current Xeon products. To do so, it will use a four-issue superscalar architecture with out-of-order execution. This approach goes beyond even ARM’s high-end Cortex-A15, a three-issue out-of-order architecture.

The company expects its X-Gene CPU to operate at 2.4GHz in 40nm and 3.0GHz in 28nm. This clock rate is faster than that of Cortex-A15, which is rated at 2.5GHz in 28nm. The ARM design is synthesizable, however; AppliedMicro has an advantage in being able to “harden” its physical design, optimizing it for greater speed. Instead of aiming for pure performance, X-Gene is optimized for performance per watt. A complete 40nm eight-core 2.0GHz X-Gene processor with system logic is rated at just 25W TDP.

Each CPU has its own instruction and data caches; each pair of CPUs shares a level-two cache of unspecified size. These pairs of cores are connected to each other and to a level-three cache through a nonblocking cache-coherent crossbar with more than 1,000Gbps (1.0Tbps) of peak bandwidth. This approach should be adequate for up to eight cores.

Unlike Intel’s Xeon processors, X-Gene is a complete SoC with memory controllers, integrated accelerators, networking, and other system I/O, as Figure 3 shows. The processor connects directly to SATA drives and provides 10G Ethernet (10GbE) and Gigabit Ethernet (GbE) ports. It supports multiple DRAM channels with ECC protection. To reduce system power, the integrated memory controllers support 1.35V low-power DDR3L DRAM as well as standard DDR3 DRAM.

Figure 3. High-level block diagram of AppliedMicro’s X-Gene processor. X-Gene supports up to 32 ARMv8 CPU cores in a complete system-on-a-chip design. The company has not disclosed the specific I/O and memory configurations.

AppliedMicro has already demonstrated a working model of its ARMv8 CPU running in an FPGA-based simulator. The first X-Gene processor is due to sample in 40nm by the end of this year. If everything stays on track, 40nm X-Gene processors should enter production in 2H13, followed by 28nm versions in 2014.

Competitive Landscape

Intel is currently addressing microservers and other low-power servers with a hodgepodge of products including the Xeon E3-1220Lv2 (17W TDP), which uses the 22nm Ivy Bridge microarchitecture. In 2H12, the company will introduce its first server processor based on the Atom microarchitecture, a dual-core design code-named Centerton that has a TDP rating of just 6W.

Whereas the E3-1220Lv2 has only two CPUs running at 2.3GHz, AppliedMicro expects X-Gene to provide four CPUs at 2.4GHz within the same power envelope. If the X-Gene CPU can generate the same performance per megahertz as Ivy Bridge, this chip would have about 2.5x the performance of Xeon for the same power. A similar comparison matching an eight-core X-Gene against a six-core Intel E5-2640 gives X-Gene a 2.7x advantage in performance per watt. AppliedMicro has not actually benchmarked the unfinished X-Gene CPU, however, and matching Intel’s performance per megahertz will not be easy.

Furthermore, by the time X-Gene enters production, Intel should be shipping server processors based on its next-generation Haswell microarchitecture. Although Haswell will use the same 22nm technology as Ivy Bridge, Intel claims it will offer a sizable improvement in performance per watt; the company has not disclosed the size of this improvement or how it will achieve this goal, however. We expect this new microarchitecture will narrow the gap with X-Gene on power efficiency.

In addition to its ARM CPU, another advantage X-Gene offers over Intel is its integrated SoC architecture. To complete a server design, Intel’s current Xeon processors require a separate system-logic chip and one or more Ethernet chips. X-Gene, in contrast, integrates all of these functions onto a single chip, reducing both cost and power. We expect most of Intel’s server processors to retain this power-hungry multichip design, even after Haswell debuts. X-Gene also offers a wider array of hardware accelerators and offload engines than Ivy Bridge or Haswell.

X-Gene will compete against other ARM-compatible server processors. One other company to announce such a product is a startup, Calxeda (Cal-ZAY-dah). Calxeda’s first processor combines four Cortex-A9 CPUs running at 1.1GHz each. At this speed, the simpler Cortex-A9 cores will generate about a quarter of the performance of a 3.0GHz X-Gene CPU. Furthermore, Cortex-A9 uses the older 32-bit ARM architecture, making it more difficult to port 64-bit x86 code. With a target TDP of just 5W, Calxeda’s chip is targeting low-end microservers but won’t come close to the performance of AppliedMicro’s processors.

Calxeda is also working on a second-generation processor that is likely to use ARM’s forthcoming Atlas CPU, the first licensable ARMv8 core. We believe the Atlas project started at least a year later than X-Gene, however, giving AppliedMicro a potential time-to-market advantage over any vendors relying on ARM’s CPU designs. Furthermore, ARM has not specified whether Atlas will be able to match X-Gene’s performance.

Other vendors rumored to be designing their own 64-bit ARM processors include Marvell, Nvidia, and Qualcomm. None of these companies has announced any such products or even a roadmap. None has much experience in the cloud-server or networking markets. It remains to be seen whether they can put together a complete server SoC that could compete with X-Gene.

Conclusions

Cloud data centers are placing a greater emphasis on reducing electricity costs by improving the power efficiency of their servers. New approaches such as microservers reduce the size and power of servers, but these approaches can only go so far. To fully address the power problem, server designers must address the largest consumer of power in the system: the processor chip.

One method of reducing processor power is to use an alternative architecture such as ARM. The ARM instruction set is simpler and more modern than x86, allowing it to be implemented more efficiently. ARM has recently introduced a 64-bit version that is better suited to the large memory requirements of servers. The challenge for ARM is that most server software is designed to run on x86 processors only. But most cloud data centers use in-house or open-source software that can be ported to an alternative architecture, if the payback is sufficient.

One of the first processors to take advantage of the 64-bit ARM architecture will be AppliedMicro’s X-Gene chip. Unlike most ARM processors, X-Gene is designed specifically to meet the needs of servers, offering high performance and integrated networking functions on a single, power-efficient chip. The first X-Gene processors are scheduled to sample by the end of this year. These chips could offer up to 3x the performance per watt of today’s Xeon processors, although this gap will narrow by the time X-Gene is actually shipping. AppliedMicro will be one of the few non-x86 vendors vying for what we forecast will grow to a $2.8 billion market by 2016. ♦

Linley Gwennap is founder and principal analyst of The Linley Group and publisher of “A Guide to Server Processors.” The Linley Group offers the most comprehensive analysis of the networking-silicon industry. Our in-depth reports cover topics including embedded processors, network processors, server processors, security processors, and communications processors. For more information, see our web site at www.linleygroup.com.

Trademark names are used throughout this paper in an editorial fashion and are not denoted with a trademark symbol. These trademarks are the property of their respective owners.

This paper is sponsored by AppliedMicro, but all opinions and analysis are those of the author.

©2012 The Linley Group, Inc.