The Untold Story of Marvell’s Processor Development

By Linley Gwennap, Principal Analyst, The Linley Group
August 2008
www.linleygroup.com

This paper discloses the eight-year effort that preceded the recent launch of Marvell’s Sheeva processors, explaining how the company became a leading CPU supplier without announcing a single processor product. We also examine these new processors and their applicability to communications, printers, storage, consumer, and mobile applications, and provide a peek at some next-generation CPUs. This paper is sponsored by Marvell, but all opinions and analysis are those of the author.

Introduction

Marvell is a leading vendor in several markets, including hard-drive controllers, Ethernet chips, and mobile Wi-Fi chips. Despite this success, few people think of Marvell as a processor company, much less a leader in that field. Yet Marvell shipped more than 300 million CPUs last year, most of its own design. We are aware of no company that shipped more chips based on 32-bit CPUs of its own design.

Because Marvell’s CPUs are embedded in many of the products that the company ships, the company has not publicized its design efforts, but these CPUs are vital in enabling the feature set and power efficiency of its products. For this reason, the company has quietly maintained its own CPU design team since 2003.

Recently, Marvell has expanded its product line to include general-purpose processor chips. Although it appears to most outsiders that Marvell is a new entrant in the embedded-processor market, in fact these products are based on the same CPU technology that the company has been shipping for years. By combining its proven CPU technology with a set of common system peripherals, Marvell’s new Sheeva products offer a compelling alternative to traditional embedded RISC processors.
With these products in hand, Marvell is now ready to disclose its CPU history.

The Road to Sheeva

Since 2000, Marvell has included CPUs in its chips, for example, to control the flow of data in its storage and Ethernet controllers. These early products used licensed CPU cores, but as Marvell began to ship greater numbers of CPU-based products, company founder and CEO Sehat Sutardja decided that Marvell needed to design its own CPU cores in order to create innovative and differentiated products.

To implement this vision, Marvell in 2003 acquired a small company called ASICA that was designing ARM-compatible CPUs. Much of the ASICA team had previously worked at Picoturbo, an earlier startup that had also designed ARM-compatible CPUs, providing extensive experience with the instruction set. After the ASICA acquisition, Marvell negotiated an architecture license from ARM Ltd., making it one of the few companies in the world legally able to design and sell ARM-compliant CPUs.

©2008 The Linley Group

Figure 1. Timeline of Marvell CPU development. (Source: Marvell)

As Figure 1 shows, the first Marvell products using its in-house CPU design entered production in 2004. The company quickly began converting its other CPU-based products to use its own CPU. To meet the needs of these products, the company designed several CPUs, each with different cost and performance characteristics.

In May 2005, Sutardja gave a presentation at Microprocessor Forum disclosing that Marvell had developed a CPU known as Feroceon that would operate at 600MHz in 150nm technology. This CPU would later be used in Orion, Marvell’s first customer-programmable processors. Although the company never announced the Orion products nor officially disclosed details about them, they quickly became successful in SOHO storage (NAS) products.

In December 2006, Marvell acquired Intel’s XScale processors and CPU design team.
The XScale chips are also ARM-compatible but are designed for mobile applications. Since that time, Marvell has integrated the XScale and Feroceon design teams into a single group, led by veteran CPU designer Hongyi Chen, that will produce CPU cores for both mobile and embedded applications. Before joining Marvell, Chen was a cofounder of Picoturbo and ASICA after stints in CPU design at AMD and Sun.

In June 2008, the company introduced its first processors based on its new Sheeva CPUs. Sheeva is a family of Marvell-designed CPU cores that span a range of price/performance points but share a focus on maximizing performance per watt. Even while keeping power dissipation for the entire processor chip below 2.0W (typical), these CPUs operate as fast as 2.0GHz in 65nm CMOS.

Marvell’s CPUs are fully compatible with the ARM architecture and thus support all ARM development tools and software. Marvell also offers a complete tool chain that is optimized for its CPU designs. By offering the fastest available ARM-compatible processors, Marvell is expanding the ARM ecosystem into new applications. Marvell has extended its license to cover ARM v6 and v7, the most recent version of the architecture. The company expects to sample its first ARM v7 CPU in late 2008.

Communications Applications

The Sheeva-based processors can be used in a variety of networking and communications applications. For example, the MV78000 family uses an advanced Sheeva CPU that is superscalar (executing up to two instructions per cycle) and can reorder instructions to avoid pipeline stalls. This CPU operates at up to 1.2GHz in 65nm CMOS. The MV78000 is available in single-CPU and dual-CPU models, providing enough performance for enterprise-class control-plane designs or SMB-class equipment that combines the control and data planes on a single processor chip.
To further boost performance, the CPU includes cache optimizations such as fetching the critical word first, reading from the cache while a miss is being processed (hit-under-miss), and reading from the cache while a store is being processed (nonblocking store). Important code can be locked into the cache on a per-way (but not per-line) basis. To speed context switches, each cache line has two dirty bits, so only the dirty half of the line needs to be flushed.

Reliability is critical in enterprise and infrastructure applications. For these applications, Marvell’s CPUs implement ECC protection on the level-two cache, protecting against errors in this data structure. The MV78000 processors also implement ECC on the memory controller. The small level-one caches are not protected.

The MV78000 processors also include common networking functions, such as Gigabit Ethernet MACs and PCI Express ports, to reduce system cost. Most of these functions have already been proven in Marvell’s popular Discovery system-logic chips. Yet the processors dissipate less than 5W (typical), even with two 1.2GHz CPUs and a complete set of peripherals. This efficiency enables the Sheeva processors to fit into systems that have tight power budgets.

Mobile Devices

Because of their low power consumption, Marvell’s CPUs are a leading choice for mobile devices such as smartphones and PDAs. The company currently offers standalone application processors, which can be used in cell phones and other mobile devices, as well as products that combine an application processor and a 3G cellular baseband on a single chip.

These processors are compatible with all leading mobile software, which is developed for the ARM instruction set. In addition, they implement the WMMX2 multimedia extensions, which accelerate audio and video functions. WMMX2 was developed by Intel as a corollary to the MMX and SSE extensions implemented in its PC processors. This compatibility simplifies the task of developers moving software applications from the PC to a mobile Internet device (MID), for example. Marvell’s software tools support the WMMX2 extensions, and the company offers a suite of multimedia subroutines that make use of these extensions, so customers need only access these subroutines to accelerate their software.

Marvell uses many techniques to reduce CPU power. As noted above, the CPU die size is minimized, resulting in fewer transistors to consume power, either when switching or through leakage. The design uses fine-grained clock gating to turn off portions of the CPU that are not needed on a cycle-by-cycle basis, reducing operating power. When the CPU goes into standby mode, the supply voltage is removed from most of the circuitry, eliminating leakage power. Operating at a reduced voltage, the caches retain their state in this mode, even though the control circuitry is turned off.

Marvell’s efficient CPU pipeline also reduces power. CPUs with long pipelines often waste power due to pipeline stalls and mispredicted branch penalties. Marvell’s next-generation mobile CPU uses a variable-length pipeline that is 7 stages for basic integer instructions and up to 10 stages for load instructions. To minimize time- and power-wasting branch penalties, the CPU implements a complex prediction methodology, including a Gshare-based branch history table (BHT), a branch target buffer (BTB), a branch return stack, and when all else fails, static prediction. Branches that hit in the BTB execute immediately; other correctly predicted branches require one cycle to load the target instructions.

Printers

Marvell is a leading supplier of ASICs for laser printers, due to its acquisition of Avago’s printer-ASIC business in April 2006.
These printers have historically included a fast general-purpose processor, which performs the image processing, and an ASIC that controls the print engine. More recently, the trend is to combine the processor and the print controller into a single chip.
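
As an aside on the branch-prediction scheme described under Mobile Devices, the following Python sketch illustrates the general idea behind a Gshare-based branch history table: the branch address is XORed with a global history of recent branch outcomes, and the result selects a small saturating counter. This is a textbook model only, not Marvell’s implementation. The table size, the 2-bit counter width, the history length, and all names (GsharePredictor, count_mispredicts) are assumptions made for illustration, and the BTB, return stack, and static fallback mentioned in the text are omitted.

```python
class GsharePredictor:
    """Toy gshare predictor: 2-bit counters indexed by (PC XOR global history)."""

    def __init__(self, index_bits=10):
        # index_bits is a hypothetical size chosen for illustration only
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # 1 = weakly not-taken
        self.history = 0                         # global branch history register

    def _index(self, pc):
        # Drop the two low address bits (assumes 4-byte instructions),
        # then XOR with the global history to pick a counter.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken"
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the history register
        self.history = ((self.history << 1) | int(taken)) & self.mask


def count_mispredicts(predictor, pc, outcomes):
    """Feed a sequence of actual outcomes for one branch; return the miss count."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

For example, calling count_mispredicts(GsharePredictor(), 0x1000, [True, False] * 50) exercises a strictly alternating branch. Because the history register distinguishes "last outcome taken" from "last outcome not taken", the model settles on correct predictions once the history has filled, which a single per-branch counter could not do for this pattern.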
