
The Untold Story of Marvell’s Processor Development

By Linley Gwennap, Principal Analyst, The Linley Group

August 2008

www.linleygroup.com


This paper discloses the eight-year effort that preceded the recent launch of Marvell’s Sheeva processors, explaining how the company became a leading CPU supplier without announcing a single processor product. We also examine these new processors and their applicability to communications, printer, storage, consumer, and mobile applications, and provide a peek at some next-generation CPUs. This paper is sponsored by Marvell, but all opinions and analysis are those of the author.

Introduction

Marvell is a leading vendor in several markets, including hard-drive controllers, Ethernet chips, and mobile Wi-Fi chips. Despite this success, few people think of Marvell as a processor company, much less a leader in that field. Yet Marvell shipped more than 300 million CPUs last year, most of its own design. We are aware of no company that shipped more chips based on 32-bit CPUs of its own design.

Because Marvell’s CPUs are embedded in many of the products that the company ships, the company has not publicized its design efforts, but these CPUs are vital in enabling the feature set and power efficiency of its products. For this reason, the company has quietly maintained its own CPU design team since 2003.

Recently, Marvell has expanded its product line to include general-purpose processor chips. Although it appears to most outsiders that Marvell is a new entrant in the embedded-processor market, in fact these products are based on the same CPU technology that the company has been shipping for years. By combining its proven CPU technology with a set of common system peripherals, Marvell’s new Sheeva products offer a compelling alternative to traditional embedded RISC processors. With these products in hand, Marvell is now ready to disclose its CPU history.

The Road to Sheeva

Since 2000, Marvell has included CPUs in its chips, for example, to control the flow of data in its storage and Ethernet controllers. These early products used licensed CPU cores, but as Marvell began to ship greater numbers of CPU-based products, company founder and CEO Sehat Sutardja decided that Marvell needed to design its own CPU cores in order to create innovative and differentiated products.

To implement this vision, Marvell in 2003 acquired a small company called ASICA that was designing ARM-compatible CPUs. Much of the ASICA team had previously worked at Picoturbo, an earlier startup that had also designed ARM-compatible CPUs, providing extensive experience with the instruction set. After the ASICA acquisition, Marvell negotiated an architecture license from ARM Ltd., making it one of the few companies in the world legally able to design and sell ARM-compliant CPUs.

©2008 The Linley Group - 1 - The Untold Story of Marvell’s Processor Development

Figure 1. Timeline of Marvell CPU development. (Source: Marvell)

As Figure 1 shows, the first Marvell products using its in-house CPU design entered production in 2004. The company quickly began converting its other CPU-based products to use its own CPU. To meet the needs of these products, the company designed several CPUs, each with different cost and performance characteristics.

In May 2005, Sutardja gave a presentation at Forum disclosing that Marvell had developed a CPU known as Feroceon that would operate at 600MHz in 150nm technology. This CPU would later be used in Orion, Marvell’s first customer-programmable processors. Although the company neither announced the Orion products nor officially disclosed details about them, they quickly became successful in SOHO storage (NAS) products.

In December 2006, Marvell acquired Intel’s XScale processors and CPU design team. The XScale chips are also ARM-compatible but are designed for mobile applications. Since then, Marvell has merged the XScale and Feroceon design teams into a single group, led by veteran CPU designer Hongyi Chen, that will produce CPU cores for both mobile and embedded applications. Before joining Marvell, Chen cofounded Picoturbo and ASICA, following stints in CPU design at AMD and Sun.

In June 2008, the company introduced its first processors based on its new Sheeva CPUs. Sheeva is a family of Marvell-designed CPU cores that span a range of price/performance points but share a focus on maximizing performance per watt. These CPUs operate as fast as 2.0GHz in 65nm CMOS while keeping power dissipation for the entire processor chip below 2.0W (typical).


Marvell’s CPUs are fully compatible with the ARM architecture and thus support all ARM development tools and software. Marvell also offers a complete tool chain that is optimized for its CPU designs. By offering the fastest available ARM-compatible processors, Marvell is expanding the ARM ecosystem into new applications.

Marvell has extended its license to cover ARM v6 and ARM v7, the latter being the most recent version of the architecture. The company expects to sample its first ARM v7 CPU in late 2008.

Communications Applications

The Sheeva-based processors can be used in a variety of networking and communications applications. For example, the MV78000 family uses an advanced Sheeva CPU that is superscalar (executing up to two instructions per cycle) and can reorder instructions to avoid pipeline stalls. This CPU operates at up to 1.2GHz in 65nm CMOS. The MV78000 is available in single-CPU and dual-CPU models, providing enough performance for enterprise-class control-plane designs or for SMB-class equipment that combines the control and data planes on a single processor chip.

To further boost performance, the CPU includes optimizations such as fetching the critical word of a missed cache line first, reading from the cache while a miss is being processed (hit-under-miss), and reading from the cache while a store is being processed (nonblocking store). Important code can be locked into the cache on a per-way (but not per-line) basis. To speed context switches, each cache line has two dirty bits, one for each half of the line, so only a dirty half needs to be written back when the cache is flushed.
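The benefit of half-line dirty bits can be illustrated with a toy model. This is a minimal Python sketch, not Marvell’s design: the 32-byte line size and the `CacheLine` class with its `store` and `write_back` methods are hypothetical, chosen only to show why tracking dirtiness per half lets a write-back move 16 bytes instead of 32 when only one half was modified.

```python
LINE_SIZE = 32           # bytes per cache line (assumed; the paper gives no line size)
HALF = LINE_SIZE // 2    # each dirty bit covers one half-line


class CacheLine:
    """Hypothetical cache line with one dirty bit per half-line."""

    def __init__(self, base_addr: int) -> None:
        self.base_addr = base_addr
        self.data = bytearray(LINE_SIZE)
        self.dirty = [False, False]   # dirty[0]: bytes 0-15, dirty[1]: bytes 16-31

    def store(self, offset: int, value: int) -> None:
        """Write one byte and mark only the half that contains it as dirty."""
        self.data[offset] = value & 0xFF
        self.dirty[offset // HALF] = True

    def write_back(self) -> list:
        """Return (address, bytes) pairs for the dirty halves, then mark the line clean."""
        chunks = []
        for half, is_dirty in enumerate(self.dirty):
            if is_dirty:
                start = half * HALF
                chunks.append((self.base_addr + start,
                               bytes(self.data[start:start + HALF])))
        self.dirty = [False, False]
        return chunks


line = CacheLine(base_addr=0x1000)
line.store(3, 0xAB)                          # modifies only the first half
chunks = line.write_back()
bytes_written = sum(len(data) for _, data in chunks)
print(bytes_written)                         # 16 rather than a full 32-byte line
```

If both halves are dirty, the sketch writes back the whole line, which costs the same as a line with a single dirty bit; the saving appears only when a line was partially modified.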

Reliability is critical in enterprise and infrastructure applications. For these applications, Marvell’s CPUs implement ECC protection on the level-two cache, guarding the data it holds against bit errors. The MV78000 processors also implement ECC on the memory controller. The small level-one caches are not protected.
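The paper does not say which ECC code Marvell uses. As a minimal illustration of the principle only, the Python sketch below implements the textbook Hamming(7,4) code, which stores three check bits alongside four data bits and can locate and correct any single flipped bit. The function names are hypothetical; real cache ECC protects much wider words and usually adds double-error detection.

```python
def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7, parity at 1, 2, 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    c = [0] * 8                                   # c[0] unused so indices match positions
    c[3], c[5], c[6], c[7] = d
    c[1] = c[3] ^ c[5] ^ c[7]                     # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]                     # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]                     # parity over positions with bit 2 set
    return c[1:]


def hamming74_decode(codeword: list) -> tuple:
    """Return (data nibble, syndrome); a nonzero syndrome is the position of the flipped bit."""
    c = [0] + list(codeword)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        c[syndrome] ^= 1                          # correct the single-bit error in place
    nibble = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return nibble, syndrome


stored = hamming74_encode(0b1011)
stored[4] ^= 1                                    # simulate a bit error at position 5
data, syndrome = hamming74_decode(stored)
print(data == 0b1011, syndrome)                   # True 5
```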

The MV78000 processors also include common networking functions, such as Gigabit Ethernet MACs and PCI Express ports, to reduce system cost. Most of these functions have already been proven in Marvell’s popular Discovery system-logic chips. Yet the processors dissipate less than 5W (typical), even with two 1.2GHz CPUs and a complete set of peripherals. This efficiency enables the Sheeva processors to fit into systems that have tight power budgets.

Mobile Devices

Because of their low power consumption, Marvell’s CPUs are a leading choice for mobile devices such as smartphones and PDAs. The company currently offers standalone application processors, which can be used in cell phones and other mobile devices, as well as products that combine an application processor and a 3G cellular baseband on a single chip.

These processors are compatible with all leading mobile software, which is developed for the ARM instruction set. In addition, they implement the WMMX2 multimedia extensions, which accelerate audio and video functions. WMMX2 was developed by Intel as a counterpart to the MMX and SSE extensions implemented in its PC processors. This similarity simplifies the task of developers moving software applications from the PC to a mobile Internet device (MID), for example. Marvell’s software tools support the WMMX2 extensions, and the company offers a suite of multimedia subroutines that use these extensions, so customers need only call these subroutines to accelerate their software.

Marvell uses many techniques to reduce CPU power. The CPU die area is kept small, so fewer transistors consume power, either when switching or through leakage. The design uses fine-grained clock gating to turn off, on a cycle-by-cycle basis, portions of the CPU that are not needed, reducing operating power. When the CPU enters standby mode, the supply voltage is removed from most of the circuitry, eliminating its leakage power. The caches, however, remain powered at a reduced voltage, so they retain their state in this mode even though the control circuitry is turned off.

Marvell’s efficient CPU pipeline also reduces power. CPUs with long pipelines often waste power because of pipeline stalls and mispredicted-branch penalties. Marvell’s next-generation mobile CPU uses a variable-length pipeline that is 7 stages for basic integer instructions and up to 10 stages for load instructions. To minimize time- and power-wasting branch penalties, the CPU combines several prediction mechanisms: a Gshare-based branch history table (BHT), a branch target buffer (BTB), a branch return stack, and, when all else fails, static prediction. Branches that hit in the BTB execute immediately; other correctly predicted branches incur a one-cycle delay to fetch the target instructions.
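Gshare indexes a table of saturating counters with the XOR of the branch address and a global history of recent branch outcomes, so the same branch can receive different predictions depending on the path that led to it. The paper gives no table size, counter width, or history length for Marvell’s predictor, so the Python sketch below assumes textbook values (a 1,024-entry table of 2-bit counters and 10 bits of history) and omits the BTB, return stack, and static fallback entirely. The `GsharePredictor` class and the branch address are hypothetical.

```python
class GsharePredictor:
    """Textbook gshare predictor: (PC XOR global history) indexes 2-bit counters."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)   # 2-bit counters, start weakly not-taken
        self.history = 0                          # global history, newest outcome in bit 0

    def _index(self, pc: int) -> int:
        # Drop the two low PC bits (assumes 4-byte ARM instructions), then XOR with history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2   # counter values 2 and 3 mean "taken"

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = GsharePredictor(index_bits=10)
loop_branch_pc = 0x8000                   # hypothetical address of a loop-closing branch
outcomes = ([True] * 7 + [False]) * 100   # taken 7 times, then falls through; repeated
hits = []
for taken in outcomes:
    hits.append(predictor.predict(loop_branch_pc) == taken)
    predictor.update(loop_branch_pc, taken)
print(f"accuracy over the last 80 branches: {sum(hits[-80:]) / 80:.0%}")   # 100%
```

Because the 10-bit history distinguishes every position in the eight-iteration loop, the predictor learns even the loop-exit branch, which a single per-branch 2-bit counter would mispredict once per loop.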

Printers

Marvell is a leading supplier of ASICs for laser printers, due to its acquisition of Avago’s printer-ASIC business in April 2006. These printers have historically included a fast general-purpose processor, which performs the image processing, and an ASIC that controls the print engine. More recently, the trend is to combine the processor and the print controller into a single chip.

In this configuration, the demands on the CPU can be quite taxing. The CPU must create a bitmap image of the page from the page description language (e.g., PostScript or HP’s PCL). At 1,200 dots per inch, for example, the bitmap for an 8.5- x 11-inch image contains about 135 million pixels. At 10 images per minute, the CPU must compute about 22 million pixels per second. This task can easily consume a 1GHz integer CPU. Color laser printers require four times as much processing power.
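The throughput figures follow from straightforward arithmetic, reproduced below in Python. The exact product is about 134.6 million pixels per page (about 128 million if counted in binary units of 2^20 pixels); at 10 pages per minute, that is roughly 22 million pixels per second for a monochrome page. Treating the color multiplier as one bitmap per toner color is an assumption used only to reproduce the paper’s fourfold figure.

```python
DPI = 1200                      # dots per inch, from the text
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
PAGES_PER_MINUTE = 10           # "10 images per minute"
COLOR_PLANES = 4                # assumed: one plane per toner color (CMYK)

pixels_per_page = round(WIDTH_IN * DPI) * round(HEIGHT_IN * DPI)   # 10,200 x 13,200
pixels_per_second = pixels_per_page * PAGES_PER_MINUTE // 60
color_pixels_per_second = COLOR_PLANES * pixels_per_second

print(f"{pixels_per_page:,} pixels per page")                     # 134,640,000
print(f"{pixels_per_page / 2**20:.1f} Mi pixels per page")        # 128.4
print(f"{pixels_per_second:,} pixels per second")                 # 22,440,000
print(f"{color_pixels_per_second:,} pixels per second in color")  # 89,760,000
```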

Marvell’s CPU designs include a floating-point unit (FPU). This unit greatly accelerates image processing, which is typically performed using floating-point math. Marvell’s FPU is pipelined, so it can sustain a rate of one floating-point operation per cycle for any operation except divide. To improve efficiency, the FPU is tightly integrated into the CPU pipeline. The FPU is optional, so the company can omit it from processors targeting low-cost or low-power applications.


Marvell’s processors include a memory (DRAM) controller that is directly connected to the CPU through a high-speed on-chip interconnect. This design reduces latency to DRAM, a critical factor when processing large amounts of data in a single image.

Using its own CPU designs, Marvell has created custom printer ASICs that pack enough image-processing performance to support most laser-printer configurations. These single-chip solutions reduce system cost compared with the traditional two-chip design.

Storage Equipment

Marvell has long been a leader in supplying read-channel devices, the chips that convert the electrical signals from a hard-drive read head into raw data. As in printers, the read-channel device has traditionally been paired with a separate processor that controls the drive and connects with the external system. Using its in-house CPUs, Marvell builds single-chip devices that combine these functions, reducing the cost and size of the hard-drive electronics.

For these applications, the CPU must be powerful enough to handle data rates of up to 1.8Gbps, yet cost must be minimized. To meet these needs, Marvell has developed CPUs that consume very little die area. To simplify the design, these scalar CPUs use a shorter pipeline, only six stages, and simpler branch prediction than Marvell’s more powerful designs.

Marvell often integrates a block of SRAM into its processors; ARM calls this feature tightly coupled memory (TCM). Because it responds much more quickly than main memory, the TCM acts like a cache, but software explicitly controls what data is kept in the TCM. This approach is more efficient than a traditional cache for applications that chew through a lot of data but need to retain only a small amount of it.
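A toy model shows why software control helps when a program streams through large buffers but repeatedly consults a small table. In this Python sketch, every size is hypothetical (an 8-line LRU cache, a 4-line lookup table, and 4 newly streamed lines per table lookup), and the "TCM" is simply a set of table lines that software has pinned. With the LRU cache, the streamed data evicts the table before it is reused; the pinned copy never misses.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that tracks lines, not bytes."""

    def __init__(self, capacity_lines: int) -> None:
        self.capacity = capacity_lines
        self.lines = OrderedDict()            # keys are line addresses, oldest first
        self.misses = 0

    def access(self, line: int) -> bool:
        """Touch a line; return True on a hit, False on a miss."""
        if line in self.lines:
            self.lines.move_to_end(line)      # now the most recently used line
            return True
        self.misses += 1
        self.lines[line] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used line
        return False


def hot_table_misses(use_tcm: bool, steps: int = 1000, hot_lines: int = 4,
                     burst: int = 4, cache_lines: int = 8) -> int:
    """Count misses on a small lookup table while streaming through fresh data."""
    cache = LRUCache(cache_lines)
    tcm = set(range(hot_lines)) if use_tcm else set()   # software pins the table in TCM
    misses = 0
    next_stream_line = 1_000_000                          # streamed lines are never reused
    for step in range(steps):
        for _ in range(burst):
            cache.access(next_stream_line)
            next_stream_line += 1
        hot = step % hot_lines
        if hot in tcm:
            continue                                      # TCM access: no miss possible
        if not cache.access(hot):
            misses += 1
    return misses


print(hot_table_misses(use_tcm=False), hot_table_misses(use_tcm=True))   # 1000 0
```

Between two uses of the same table line, the cache sees 19 other distinct lines, more than its 8-line capacity, so LRU evicts the table every time; the scratchpad avoids this because software, not the replacement policy, decides what stays.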

Consumer Products

Consumer products such as set-top boxes and digital-entertainment equipment can place extreme demands on a CPU. Most new equipment supports high-definition (HD) video, which requires six times the processing power of standard-definition video. The growing popularity of Internet video is forcing devices to support a large and growing number of video codecs, making it impractical to implement every codec in hardware. Devices such as Apple TV have increased the emphasis on attractive user interfaces, putting an additional burden on the CPU to deliver the necessary graphics functions.
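One way to see where the sixfold figure comes from is to compare pixels per frame. This Python sketch assumes 1080-line HD against 480-line SD at the same frame rate, and assumes decode work scales roughly with pixel count; both are simplifications, since the paper does not state the basis for its estimate.

```python
HD_FRAME = (1920, 1080)   # 1080-line HD (assumed format)
SD_FRAME = (720, 480)     # 480-line SD, e.g. NTSC DVD resolution (assumed format)


def pixels_per_frame(frame: tuple) -> int:
    width, height = frame
    return width * height


ratio = pixels_per_frame(HD_FRAME) / pixels_per_frame(SD_FRAME)
print(f"{pixels_per_frame(HD_FRAME):,} vs {pixels_per_frame(SD_FRAME):,} pixels: {ratio:.1f}x")
# 2,073,600 vs 345,600 pixels: 6.0x
```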

Marvell’s recent 88F6000 processors achieve CPU speeds of up to 2.0GHz. This clock speed is not only far faster than that of any other ARM-compatible processor available today, it is faster than any MIPS or PowerPC processor as well. Yet the 88F6000, including the CPU and all peripherals, uses only 2.0W (typical) at this speed, well within the power budget for fanless consumer equipment. At lower speeds, the chip consumes only 1.0W.


Figure 2. Block diagram of 88F6281-based residential gateway.

To minimize cost and power, the 88F6000 processors use a simpler CPU design than other Marvell products. This CPU is a scalar design, executing only one instruction per cycle, and it has no reordering capability. These parameters help the processor maintain its low power consumption. But with its high clock speed, the CPU still packs enough performance for the most demanding consumer equipment.

Consumer equipment must also reach low price points. To reduce system cost, the 88F6000 integrates common system functions, including an encryption engine, memory controller, two Gigabit Ethernet MACs, two SATA ports (for DVRs and other equipment with hard drives), stereo audio output, MPEG-TS (for video output), and a high-speed USB port. Using its expertise in analog design, Marvell has also integrated the USB and SATA PHYs on the chip. As Figure 2 shows, the processor can connect to an external Wi-Fi chip (from Marvell or other vendors) through its x1 PCI Express port. No competing processor offers as complete a set of system interfaces.

Conclusions

Marvell’s role as a processor designer and vendor has rarely been discussed until now, but the company has developed several generations of CPU designs and shipped hundreds of millions of CPUs based on these designs. Until recently, these CPUs had mainly been deeply embedded in various types of controllers, but the company is now delivering several customer-programmable processors. These processors are all fully compatible with the ARM instruction set, development tools, and software base. Available at various price and power levels, they address a wide range of applications.

Marvell’s processors offer several advantages over competing products. They match or exceed the performance of single- and dual-CPU RISC processors from other vendors, but at impressively low power. They also reduce system cost, not only with their low prices but also by including a full set of common system functions; in many cases, competitors must add external chips to match the feature set that Marvell includes in its products.

Although the ARM architecture helps improve power efficiency, it is not the most common choice in either networking or consumer equipment. The former market favors PowerPC, while the latter favors MIPS. Thus, to use Marvell’s processors, most OEMs in these markets will have to port their software to ARM. Furthermore, the company must still qualify its new chips for production and demonstrate that it can adequately support a large number of customers. The many advantages of Marvell’s processors, however, should motivate customers to consider switching architectures.

Linley Gwennap is founder and principal analyst of The Linley Group and coauthor of “A Guide to High-Speed Embedded Processors” and “A Guide to Mobile Processors.” The Linley Group offers the most comprehensive analysis of the networking-silicon industry. We analyze not only the business strategy but also the technology inside all the announced products. Our in-depth reports cover topics including network processors, security processors, general-purpose processors, handset processors, and Ethernet chips. For more information, see our web site at www.linleygroup.com.
