A MULTICORE COMPUTING PLATFORM FOR BENCHMARKING

DYNAMIC PARTIAL RECONFIGURATION BASED DESIGNS

by

DAVID A. THORNDIKE

Submitted in partial fulfillment of the requirements

For the degree of Master of Science

Thesis Advisor: Dr. Christos A. Papachristou

Department of Electrical Engineering and Computer Science

CASE WESTERN RESERVE UNIVERSITY

August, 2012

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of

David A. Thorndike candidate for the Master of Science degree *.

(signed) Christos A. Papachristou (chair of the committee)

Francis L. Merat

Francis G. Wolff

(date) June 1, 2012

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Figures

List of Tables

Abstract

1. Introduction

1.1 Motivation

1.2 Contributions

1.3 Thesis Outline

2. Background

2.1 Multicore Computing

2.2 Reconfigurable Computing

2.3 Dynamic Partial Reconfigurability

2.4 Related Work

3. Platform Development Process

3.1 OpenSPARC

3.2 LEON3

3.3 Virtex-5

3.4 GNU Tools / GRLIB

3.5 Xilinx ISE

3.6 SnapGear (for SMP)

3.7 Benchmarking Applications

4. General Implementation Flow

4.1 Hardware and Tools

4.2 Board Functional Verification

4.3 Hardware Development

4.4 Software Development

5. Results

6. Summary

6.1 Conclusion

6.2 Future Work

Appendix A

Bibliography


List of Figures

Figure 1. IBM's Cell Processor

Figure 2. DPR for Functional Modification and Size Reduction

Figure 3. Leon3 SMP with reconfigurable coprocessors

Figure 4. Xilinx XUPV5-LX110T development board (ML509)

Figure 5. Virtex-5 FPGA ML50x Evaluation Platform Block Diagram

Figure 6. Block diagram of configurable LEON3 processor

Figure 7. Example LEON3 multicore SoC with other GRLIB IP cores

Figure 8. Linear speedup of OpenMPBench's BasicMath

Figure 9. LEON3 Xilinx ML509 Template Design from leon3mp.vhd

Figure 10. LEON3 processor core configurability


List of Tables

Table 1. Soft-core CPUs suitable for FPGA implementation

Table 2. Xilinx Virtex-5 LX50T and LX110T Comparison

Table 3. Single Core Synthesis Results

Table 4. Multicore Synthesis Results

Table 5. Performance Results


A Multicore Computing Platform for Benchmarking Dynamic Partial Reconfiguration Based Designs

Abstract by

DAVID ANDREW THORNDIKE

With the increasing use of multiple processor cores (multicores) within applications, as well as the pervasive use of the field-programmable gate array (FPGA), the embedded system development community has been exploring the advantages of the dynamically reconfigurable nature of FPGAs. Given size and power limitations, a primary motivation for this interest is to enable dynamic customization of hardware to optimize system performance for the various algorithms that a system encounters. This work presents a hardware-based platform for studying dynamic reconfiguration of FPGAs in the context of multicore embedded systems. It also presents a methodology for developing the hardware and software for these systems. An important aspect of this work was to maximize the use of open source hardware and software intellectual property (IP). An example of the basic implementation flow is also provided, along with some benchmarking results.


1. Introduction

1.1 Motivation

Over the past few decades, since the commercial introduction of the Field Programmable Gate Array (FPGA) by Xilinx, the configurable nature of FPGAs has brought them to the center of attention for many in the field of computing. Though recent advances have introduced the first commercially available FPGAs with configurable analog blocks (Morris, 2005), FPGAs have primarily occupied the realm of digital logic and functionality. When the density of these devices became sufficiently large, designers were able to implement multiple microprocessor cores within a single FPGA in order to achieve the performance and power advantages that brought multicore architectures to general-purpose CPUs and ASIC platforms. With the advancement of dynamic partial reconfiguration capabilities provided by Xilinx tools and devices, new opportunities have unfolded to explore dynamic reconfigurability of multicore systems. The intent of this work is to develop a hardware platform on which the capabilities of this powerful and flexible technology can be explored.

Since reconfigurable computing has found wide ranging applicability throughout the field of computing, the primary focus of this work will be on its application to embedded systems. These are microprocessor based systems with customized software for the sole purpose of performing or controlling a set of specific functions, often involving real-time operations. The end-user may be provided options or choices in the operation of the system, but unlike that of a personal computer, the user is generally not provided the ability to program or change the software of an embedded system (Heath, 2003).


1.2 Contributions

In this work, a reconfigurable multicore computing platform is presented. A method is demonstrated to create such a platform through the application of an open source soft-core processor and open source development tools along with Xilinx tools and a Virtex-5 evaluation board, also referred to as XUPV5 or ML509.

1.3 Thesis Outline

This thesis is organized as follows:

• Chapter 2: Presents background information on multicore computing and reconfigurable computing, including dynamic partial reconfiguration (DPR). Related work in the area of DPR design on the ML509 is also described.

• Chapter 3: Describes components and considerations that were employed in the development of the reconfigurable multicore system.

• Chapter 4: Discusses the hardware and software development tools that are used in the general implementation flow of this platform.

• Chapter 5: Reviews synthesis results and corresponding performance results for various single and multicore hardware design configurations.

• Chapter 6: Summarizes the results and includes consideration for future work.


2. Background

“When the conventional processor (core) cannot meet the needs of a target application, it becomes necessary to evaluate alternative solutions such as multiple cores and/or configurable cores” (Maxfield, 2006). Maxfield describes what may have been the best-known configurable multicore system, developed by Tensilica, which built a custom multicore system-on-chip (SoC) targeted to the customer’s application and based on a 25K-gate, 32-bit post-RISC processing engine called Xtensa. Tensilica’s tools analyzed the customer’s C/C++ application and evaluated millions of possible processor extensions based on techniques like single-instruction multiple-data (SIMD) and parallel execution to determine the best configuration for the particular application, which would typically produce an SoC with five or six heterogeneous Tensilica cores.

Though the Tensilica example does not reflect the dynamic reconfigurable nature of the SRAM-based Xilinx FPGAs, it does demonstrate the advantages of being able to customize a standard configurable framework to meet the particular needs of a variety of applications. The following sections review the background of key areas of this topic, namely multicore computing and reconfigurable computing.

2.1 Multicore Computing

Over the past several years the maximum operating frequency of current processors

has been reaching a plateau. Performance improvements have been sought through better

organization of the computation by utilizing computational parallelism in multithreaded

processor cores and multiple processor core devices (Danel et al., 2010).


In (Ganssle, 2008), Ganssle provides a little background on multicore systems and highlights some key strengths and weaknesses in their application to embedded systems.

As in personal computer systems, when CPU performance outpaced memory speeds, the complexity of the CPU increased to compensate for the memory bottleneck, through pipelined architectures, hierarchical caches, speculative branching algorithms, and snooping. Performance gains were still attainable as these complexities were accommodated by increases in CPU speeds. But as CPU clock speeds became limited by the increasing problem of power dissipation, additional performance improvements through these means alone were reaching an impasse. To avert this impasse, personal computer CPU vendors began adopting the model of parallelizing the hardware with multicore devices. However, this could only help to the extent that the problems could be parallelized. Where parallelization of the problem is possible, the same workload of a higher speed single processor can be distributed across multiple lower speed, and consequently lower power, processors.

One such architecture is Symmetric Multiprocessing (SMP), which consists of two or more homogeneous cores that are tightly coupled with a common memory subsystem (Kleidermacher, 2008). This requires an operating system to provide load-balancing services to distribute the workload across the multiple processing elements. When the workload can be partitioned in a manner that would benefit from using possibly unique processor cores that are individually dedicated to specific tasks, like one core performing real-time operations while one or more other cores support user operations, it is referred to as Asymmetric Multiprocessing (AMP). This often involves employing independent operating systems on the individual processing elements. Though it can provide some improvement, SMP performance is generally still hampered by the memory bottleneck and the extent to which a problem or computation can be parallelized. Amdahl’s Law expresses the benefit of multiple processors as:

$$\text{Speedup} = \frac{1}{f + (1 - f)/n}$$

where f is the part of the problem or computation that cannot be parallelized, and n is the number of processors. In addition to the diminishing returns expressed by this equation, a memory bus bottleneck is present when multiple processors need to access a shared memory. Insofar as each SMP core’s program can be contained within its dedicated L1 cache, this memory bus bottleneck can be avoided. Unfortunately, this can be impractical for many applications like those in personal computing. Still, the modest gains in computing power delivered through a few SMP cores in a multicore processor continue to provide value to the PC industry.
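The diminishing returns expressed by Amdahl's Law can be made concrete with a short calculation. The following Python sketch evaluates the formula for a few processor counts; the function name `amdahl_speedup` and the serial fraction of 0.10 are illustrative assumptions, not values measured in this work.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's Law: 1 / (f + (1 - f) / n)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    # Hypothetical serial fraction, chosen only for illustration.
    f = 0.10
    for n in (1, 2, 4, 8, 16):
        print(f"n = {n:2d}  speedup = {amdahl_speedup(f, n):.2f}")
```

As n grows, the predicted speedup approaches 1/f, so under this assumed serial fraction no number of cores could deliver more than a tenfold improvement.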

Embedded designs have more control over the program and execution environment than does the realm of personal computers (Ganssle, 2008). And multicore processor vendors continue to find market opportunities in embedded systems. One notable example is the Cell processor, developed by IBM in collaboration with Sony and Toshiba to satisfy the performance demands of video gaming consoles and advanced network routers and servers (Turley, 2009). Cell was planned to handle data-intensive broadband media, including network packets, video streams, and massive floating-point calculations. It includes one 3.2-GHz PowerPC processor core with 32K L1 and 512K L2 cache and eight identical 128-bit single-instruction, multiple-data (SIMD) vector processing Synergistic Processor Elements (SPE), each with its own 256K block of RAM for code instructions or local data and executing their own unique SPE instruction set.

Figure 1. IBM's Cell Processor (Turley, 2009)

As another example, Freescale has developed a family of QorIQ communications platforms based on their leading PowerQUICC communications processors, which target numerous AMP and SMP applications in systems like branch office routers, enterprise/WLAN access points, line card control planes, and VPN/IP service routers (Freescale, Inc., 2008).

2.2 Reconfigurable Computing

The design and architecture of computer systems is a primary focus for the computer science and computer engineering communities, which constantly juggle and try to improve the inter-related and overlapping key aspects of computer systems: functionality, performance, power, and cost. The importance of each of these aspects has led to the development of varied architectures, each attempting to achieve an optimized balance for a given area of computing. These architectures can be categorized into three main groups of processors by their degree of flexibility: general purpose processors, designed for utility in the widest variety of applications; domain-specific processors, somewhat customized for a broad class of applications; and application-specific processors, customized for only one application (Bobda, 2007).

Given that a greater relative degree of performance is achievable when an architecture is focused on a specific application, we can recognize two characteristics that generally appear to have an inverse relationship: flexibility and performance. As a processor’s design is customized for improved performance with a specific application, it loses the flexibility of being able to execute other classes of applications. Many processors have been developed to address the options that exist between the extremes of the flexibility-centric general purpose processor and the performance-centric application-specific processor. An ideal processor would have the flexibility to adapt in order to provide superior performance for a variety of specific applications. This would be called a reconfigurable device or reconfigurable hardware. (Bobda, 2007) defines Reconfigurable Computing (RC) as the study of computation using reconfigurable devices and defines Configuration and Reconfiguration as the process of changing the structure of a reconfigurable device at start-up-time and at run-time, respectively.

In (Papachristou, Wolff, & Ewing, 2005), a reconfigurable system is defined as having the ability to modify its structure, behavior or function during the course of its operation, with this modification being achieved either on command or autonomously. Papachristou et al. identify four different classes of reconfiguration:


1. Static Reconfiguration

2. Dynamic Reconfiguration

3. Self Reconfiguration

4. Evolvable Reconfiguration

As it relates to FPGAs, static reconfiguration is accomplished by downloading a new configuration data bitstream into the device while it is offline or in a non-operational mode. Dynamic reconfiguration performs the download of a new configuration data bitstream while the FPGA retains at least a portion of its functional capacity to operate normally. This is also referred to as dynamic partial reconfiguration (DPR). When dynamic reconfiguration is executed on an FPGA’s own initiative, using autonomously generated internal or configuration signals rather than external commands, self reconfiguration is being demonstrated. Lastly, evolvable reconfiguration follows the bio-inspired approaches for evolvable hardware, characterized by aspects of self-growth and replication of the reconfigurable hardware (Papachristou, Wolff, & Ewing, 2005).
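The distinctions among the four classes can be read as a simple decision procedure. The Python sketch below is one possible interpretation of the descriptions above, not a definition taken from Papachristou et al.; the function `classify` and its three yes/no parameters are hypothetical names introduced only for illustration, and treating the classes as mutually exclusive is an assumed simplification.

```python
from enum import Enum


class Reconfiguration(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"
    SELF = "self"
    EVOLVABLE = "evolvable"


def classify(stays_operational: bool,
             self_initiated: bool = False,
             self_growing: bool = False) -> Reconfiguration:
    """Map three yes/no properties onto the four classes (assumed simplification)."""
    if not stays_operational:
        # Bitstream is loaded while the device is offline or non-operational.
        return Reconfiguration.STATIC
    if self_growing:
        # Bio-inspired self-growth and replication of the hardware.
        return Reconfiguration.EVOLVABLE
    if self_initiated:
        # Triggered by internally generated signals, not external commands.
        return Reconfiguration.SELF
    # Part of the device keeps operating while the new bitstream is loaded.
    return Reconfiguration.DYNAMIC


if __name__ == "__main__":
    print(classify(stays_operational=True, self_initiated=True).value)
```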

The acceptance of the FPGA as the most widely used reconfigurable device has contributed significantly to the progress of RC over the last two decades. FPGAs and RC have found utilization in a number of fields of application. Some include rapid prototyping, which permits real hardware level testing and updating before final production as an ASIC; in-system customization, which permits accommodation of unforeseen modifications from a remote location like that of the FPGAs within the Mars rover vehicle; multi-model computation, which enables time multiplexed implementations of various functions like that of mobile devices providing computer, video, and communications functionality; and adaptive computing, which would enable an adaptability of structure and behavior to accommodate changes in operating or environmental conditions or changes in protocols and standards (Bobda, 2007).

NASA’s plans for the development of a lunar outpost would provide numerous

opportunities for application of RC within the overall architecture of vehicles,

communication systems, and infrastructure being developed as part of the Constellation

Program. Given the long term deployment of such outposts, the architecture as a whole

must be maintainable and evolvable. Evolvability arises from the need to address changes

in priority, advances in technology, and experience as the program progresses. To address

these characteristics, the architecture is being developed from common, interoperable

components based on open industry standards. Reconfigurable Computing offers the

reuse of hardware for dissimilar applications, affording the flexibility in functionality

(evolvability). It also offers the reduction of unique spare parts for dedicated hardware

that can be replaced with the reconfigurable hardware (maintainability) (Somervill,

2008).

Some earlier work with RC in space applications enabled advances in onboard

instrumentation including Natural Feature Image Recognition (NFIR) previously

implemented on the SEAKR Reconfigurable Computer Card (RCC) for autonomous

docking with the Hubble Space Telescope. Another includes Autonomous Landing and

Hazard Avoidance Technology (ALHAT) that provides the capability to react, in real-

time, to surface hazards at the planned landing site in spite of poor lighting conditions

(Somervill, 2008).


“The last decade in digital systems development has demonstrated that

Reconfigurable Computing will significantly expand our capabilities and enable us to re-

evaluate how we design systems” (Somervill, 2008).

2.3 Dynamic Partial Reconfigurability

Since some reconfigurable systems include a significant amount of commonality

between the alternate configurations, the reconfiguration time can be significantly

reduced if the downloaded data bitstream is limited to those portions of the FPGA that

actually differ between the two configurations. If this partial reconfiguration can be

achieved while the remainder of the FPGA maintains its functionality and continues to

operate normally, this process is referred to as Dynamic Partial Reconfiguration (DPR).

DPR allows the FPGA programmable fabric to change its mode of operation during run

time, effectively allowing for time division multiplexing of portions of the FPGA fabric,

while the system is operating (Hoffman & Pattichis, 2011).

“Dynamically reconfigurable computing platforms provide promising methods for

dynamic management of hardware resources, power, and performance” (Hoffman &

Pattichis, 2011). For example, many designs need to maximize performance but often for

only a small percentage of the time. To save power during the larger percentage of time

when lower performance would be sufficient, designers can use dynamic partial

reconfiguration to swap out a high-performance design with a low power version of the

same design. The design can enable the system to switch back to the high-performance

design when the system requires it.


DPR also enables these designers to reduce the amount of idle logic and thereby the size of their designs by dynamically time-multiplexing portions of the available hardware resources and only loading functions on an as-needed basis.

An example of this strategy is the use of DPR within an FPGA based software defined radio (SDR) system, in which a user can establish a new channel of communication by uploading a new waveform on demand without disrupting other active communication channels. Each waveform only requires a unique partial bitstream, enabling any number of waveforms to be supported by a single hardware platform that permits dynamic partial reconfiguration of time-multiplexed portions of the FPGA (Dye,

2011).

Figure 2. DPR for Functional Modification and Size Reduction (Dye, 2011)

Xilinx has been the primary supplier of DPR capable FPGAs. Though the capability had always existed in all the Xilinx SRAM-based parts, the design flow was limited to static circuit design tools, making it very difficult, if not impossible, to support reconfiguration. However, their 1998 release of the JBits Bitstream Interface provided software support for a new set of capabilities with the Virtex families of FPGAs that had previously been unrealized in Xilinx devices (Guccione, Levi, & Sundararajan, 1999). Unfortunately, this interface still required significant manual effort and a high degree of familiarity with the device architectures. These challenges were simplified when Xilinx introduced their “Early Access Partial Reconfiguration” flow in 2006 and were further simplified with the current software approach in their May 2010 release of the Integrated Software Environment (ISE) Design Suite 12.1, which provided what they referred to as an intuitive design flow with fourth-generation partial reconfiguration capabilities for their Virtex and Spartan families of FPGAs.

Altera has recently begun to provide DPR capable FPGAs with their Stratix V, Cyclone V, and Arria V families of devices (Altera Corporation, 2012). Altera supports DPR in these devices with two design flows based on design partitions or engineering change orders, both of which leverage incremental compilation and Altera’s Chip Planner tool, which is part of their Quartus II software. However, there has not yet been sufficient time for the field to generate literature based on their use in DPR applications.

2.4 Related Work

A review of related work revealed that several groups have recently begun to utilize

the Virtex-5 FPGA and even the LEON3 core in their study of dynamic partial

reconfiguration.

Though generally more efficient than static reconfiguration, Dynamic Partial Reconfiguration (DPR) retains the reconfiguration overhead time as one of its fundamentally limiting factors. A primary contributor to the duration of this overhead time is the DPR controller’s use of the Processor Local Bus (PLB), which is unavailable to the FPGA’s operating processor during DPR (Hoffman & Pattichis, 2011). Recent work by Hoffman and Pattichis at The University of New Mexico has demonstrated significant performance improvements over current partial reconfiguration subsystems through the use of a Multi-Port Memory Controller (MPMC) that frees the PLB for use by the processor during DPR. They also introduced 33% over-clocking to improve reconfiguration performance, while utilizing active feedback from the Virtex-5’s System Monitor to ensure that the device voltages and temperatures remain within nominal operating conditions.
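To give a feel for why over-clocking the configuration path matters, the following Python sketch estimates an idealized bitstream transfer time. It is a back-of-the-envelope model, not the method of Hoffman and Pattichis: it assumes one configuration word is written per clock cycle with no bus or memory stalls. The 32-bit port width, the 100 MHz nominal clock, and the 1,000,000-byte partial bitstream are assumptions chosen for illustration; only the 33% over-clocking factor comes from the text above.

```python
def reconfiguration_time_ms(bitstream_bytes: int,
                            port_width_bits: int,
                            clock_hz: float) -> float:
    """Ideal transfer time in ms, assuming one port-width word per clock cycle."""
    if bitstream_bytes < 0 or port_width_bits <= 0 or clock_hz <= 0:
        raise ValueError("sizes and clock rate must be positive")
    bytes_per_second = (port_width_bits / 8) * clock_hz
    return 1000.0 * bitstream_bytes / bytes_per_second


if __name__ == "__main__":
    BITSTREAM_BYTES = 1_000_000   # hypothetical partial bitstream size
    PORT_WIDTH_BITS = 32          # assumed configuration port width
    NOMINAL_CLOCK_HZ = 100e6      # assumed nominal configuration clock
    OVERCLOCK_FACTOR = 1.33       # the 33% over-clocking cited in the text

    nominal = reconfiguration_time_ms(
        BITSTREAM_BYTES, PORT_WIDTH_BITS, NOMINAL_CLOCK_HZ)
    boosted = reconfiguration_time_ms(
        BITSTREAM_BYTES, PORT_WIDTH_BITS, NOMINAL_CLOCK_HZ * OVERCLOCK_FACTOR)
    print(f"nominal: {nominal:.2f} ms, over-clocked: {boosted:.2f} ms")
```

Under these assumptions, a 33% faster clock shortens the ideal transfer time by roughly a quarter, since 1/1.33 is about 0.75. Real reconfiguration times would be longer wherever the data path cannot sustain one word per cycle.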

For testing their DPR system, Hoffman and Pattichis applied a cryptography

application which provided the added benefit of allowing them to validate their DPR

system at the single bit level with the standard set of bit test vectors provided by the

National Institute of Standards and Technology (NIST). They also employed a

performance DMA core to measure and document the speed of reconfiguration (Hoffman

& Pattichis, 2011).

Figure 3. Leon3 SMP with reconfigurable coprocessors (Serres et al., 2011)

Serres et al. did related work on a reconfigurable multicore platform based on the same platform used in this work, the Xilinx ML509 board and the Leon3 processor (Serres et al., 2011). As in (Hoffman & Pattichis, 2011), Serres et al. chose cryptographic applications for testing and benchmarking their reconfigurable system. However, unlike this work and that of Hoffman and Pattichis, Serres et al. approached their partial reconfigurability study from a static basis. As illustrated in Figure 3, their system was based on two Leon3 cores, each interfaced with its own cryptographic co-processor. This co-processor could implement one of two cores, an encryption core or a decryption core, based on the Basic DES Block Cipher (McQueen, 2003). This platform ran the cryptographic application on an SMP-configured Linux operating system.

This collection of related work demonstrates the applicability of a Virtex-5 platform

for the study of DPR. It also identifies areas in which further study of DPR can be

pursued with the Virtex-5 and multiple LEON3 core based system.


3. Platform Development Process

The primary objective of this work is to prepare a platform on which to study dynamic partial reconfiguration applications and techniques as they apply to a multicore processing system. A secondary objective is to limit the cost and learning curve by referencing and collaborating with open source online communities in as many areas of the system as possible. “Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in” (OSI Mission, n.d.).

Figure 4. Xilinx XUPV5-LX110T development board (ML509) (Virtex-5 OpenSPARC Evaluation Platform, n.d.)


Beginning with the Xilinx Virtex-5 OpenSPARC Evaluation Platform, XUPV5-

LX110T board (ML509) developed by Digilent, Inc., as an available asset within the

EECS department, the development of this platform required the process of studying,

understanding, and often experimenting with the following areas:

1. The OpenSPARC Evaluation Platform

2. The OpenSPARC T1 and T2 microarchitectures

3. The smaller SPARC based LEON3 microarchitecture

4. Xilinx Virtex-5 FPGA

5. Gaisler Research development tools for LEON3

6. Xilinx development tools as utilized by Gaisler Research tools

7. Linux distribution suitable for use on a multicore LEON3 SMP system

8. Benchmarking applications, like OpenMPBench

3.1 OpenSPARC

Made available in September 2008 through a partnership between Xilinx, Inc. and Sun Microsystems, Inc., the Virtex-5 OpenSPARC Evaluation Platform, also identified as the XUPV5 and ML509, provides academic researchers and hardware developers with a feature rich platform based on the combined powers of an open sourced 32-thread UltraSPARC T1 processor core and the high-performance programmable Xilinx 65nm Virtex-5 FPGA technology (Sun, Xilinx Partner on OpenSPARC Evaluation Platform, 2008). An open sourced core is a pre-designed and generally pre-tested intellectual property (IP) component for which the source code of the design is made freely available to the academic and commercial development communities through a public licensing agreement, like the GNU General Public License (GPL) (OpenCores: Mission, n.d.).


OpenSPARC is a hardware oriented open source project established by Sun

Microsystems in March 2006, through which their RTL (Register Transfer Level)

language source code for UltraSPARC T1 processor was made freely available to the

public under the GNU GPL (About OpenSPARC, 2012). In 2007, Sun also made

available their UltraSPARC T2 processor through OpenSPARC. The source code made

available for both processors is comprehensive and includes full-system simulators. They

also included scripts for compiling that source code, prepackaged operating systems,

source code to the Hypervisor software layer, a large suite of verification software, and

thousands of pages of documents.

The ML509 Evaluation Platform is designed to enable designers to investigate and experiment with features of the Virtex-5 LXT FPGA. Developed around this FPGA, the XC5VLX110T-1FFG1136, the board offers many features (Xilinx Inc., 2011), including:

• Two Xilinx XCF32P Flash PROMs (32 Mb each) for storing device configurations

• Xilinx System ACE™ CompactFlash configuration controller with Type I connector

• 64-bit wide, 256-MB DDR2 small outline DIMM (SODIMM)

• ZBT synchronous SRAM, 9 Mb on 32-bit data bus with four parity bits

• P30 StrataFlash linear flash chip (32 MB) for storing bootable software

• Serial Peripheral Interface (SPI) flash (2 MB)

• One 8-Kb IIC EEPROM and other IIC capable devices

• Stereo AC97 audio codec with supporting off-board audio connectors

• Video input (VGA) / Video output DVI connector (VGA supported with adapter)

• 10/100/1000 PHY transceiver and RJ-45

• USB interface chip with host and peripheral ports


• RS-232 serial port, DB9 and header for second serial port

• 16-character x 2-line LCD display

• General purpose DIP switches, LEDs, pushbuttons, and rotary encoder

• Expansion header with single-ended, LVDS differential pairs, and spare I/Os

• Several GTP/GTX ports, including SATA and PCI Express® (x1 Endpoint)

Figure 5. Virtex-5 FPGA ML50x Evaluation Platform Block Diagram (Xilinx Inc., 2011)

Unfortunately, synthesizing the smallest of the default configurations of the UltraSPARC T1, a single-thread core, demonstrated that the programmable logic density of the Virtex-5 XC5VLX110T is not sufficient to load a system with more than one UltraSPARC T1 core. Though the platform documentation explains how to implement a multicore system by connecting two or more of the ML509 boards together through the SATA interface, this would increase system cost and complexity, as well as deviate from the interest in studying dynamic partial reconfiguration of a multicore system as it can be implemented on a single programmable device. This led to settling on a different SPARC based core, the LEON3 developed by Gaisler Research, which was also made available as open source under the GNU GPL for evaluation, research, and educational purposes (Aeroflex Gaisler, 2008).

3.2 LEON3

Since the open sourced version of the custom designed UltraSPARC T1 microprocessor, the OpenSPARC T1 core, was too large to fit more than one such core onto the Virtex-5 FPGA of the OpenSPARC platform, smaller soft-core processors were studied as alternatives. Soft-core processors are microprocessor designs for which the architecture and functionality are fully described within synthesizable source code written in a hardware description language (HDL) like VHDL or Verilog. (Tong et al., 2006), (Serres et al., 2011), and (Soft CPU Cores for FPGA, 2009) each conducted a survey of soft-core processors that would be suitable for FPGA implementation. Some highlights are provided in Table 1.


Table 1. Soft-core CPUs suitable for FPGA implementation (Soft CPU Cores for FPGA, 2009)

For this work, a few key requirements of the soft-core included an open source license, an MMU in order to support the Linux operating system with its SMP capabilities, and a size small enough to permit more than one core to fit on the Virtex-5 LX110T. The two leading candidates included the LEON3, based on the SPARC-V8 architecture, and the OpenRISC 1200, based on the OpenRISC 1000 architecture. Though the OpenRISC 1200 supports SMP, the Linux port does not yet support it. Furthermore, the development community around the LEON3 with its forum at [email protected] reflects significant activity, including several forum members working specifically with the ML509 board. So the LEON3 proved to be the strongest choice for this work.

The LEON3 core is an open source synthesizable VHDL model of a 32-bit processor based on the SPARC-V8 architecture. This core is highly configurable and uses the AMBA 2.0 AHB bus to interface with other IP cores (Aeroflex Gaisler, 2008). A fault-tolerant version of the core is also available for space applications, providing immunity to single event upsets that can be prevalent in a high radiation environment.

Figure 6. Block diagram of configurable LEON3 processor

The LEON3 core can be implemented in AMP or SMP configurations, with up to 16 processor cores in a system. A typical four core system is capable of delivering up to 1600 Dhrystone MIPS of performance (Aeroflex Gaisler, 2008). A non-intrusive hardware debugging interface provides access to all on-chip registers and memory for both single and multicore systems, and includes trace buffers for both CPU instructions and AMBA bus traffic. Multicore hardware support is provided for cache coherency, processor enumeration, and SMP interrupt steering, as well as a round-robin AMBA bus arbiter for fair bus utilization between the processors.

Figure 7. Example LEON3 multicore SoC with other GRLIB IP cores (Aeroflex Gaisler, 2008)

3.3 Xilinx Virtex-5

At the heart of the OpenSPARC Evaluation Platform, the Virtex-5 XC5VLX110T FPGA is built on 65-nanometer triple-oxide technology and delivered the highest level of performance, density, and feature integration to the FPGA marketplace when it was announced in May 2006 (Snowden, 2006). The Virtex-5 family introduced the industry’s first six-input look-up table (LUT), which contributed to an average 30 percent performance increase over the previous Virtex-4 with its 4-input LUTs. The Virtex-5 LXT Platform targeted the market for high-performance logic-intensive applications with advanced serial connectivity. The XC5VLX110T provides 17,280 Virtex-5 Slices (each contains four 6-input LUTs with flip-flops), 64 DSP48E Slices (each contains a 25 x 18 multiplier, an adder, and an accumulator), 148 36Kb RAM Blocks (5,328 Kb total), 32 global clock trees with device-wide distribution, and 6 Clock Management Tiles (each contains two Digital Clock Managers and one Phase-Locked Loop), along with one PCI Express endpoint block, four tri-mode Ethernet MACs, 16 RocketIO GTP Transceivers with 3.75 Gb/s capability, and 680 User I/Os (Xilinx, Inc., 2009). It also includes a System Monitor for on-chip temperature and power supply measurements and a user accessible 10-bit 200kSPS ADC with up to 17 external analog input channels. This device also carries forward from previous Xilinx devices the capability of dynamic partial reconfiguration, supported by Xilinx PlanAhead, part of the Xilinx ISE Design Suite.
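The LUT and block RAM totals implied by the figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is an illustration only and is not part of the tool flow used in this work; the constant and function names are chosen here for convenience.

```python
# Device figures for the XC5VLX110T as quoted in the text above.
SLICES = 17_280        # Virtex-5 slices
LUTS_PER_SLICE = 4     # four 6-input LUTs per slice
BRAM_BLOCKS = 148      # 36 Kb block RAMs
KBITS_PER_BRAM = 36


def total_luts(slices: int = SLICES, per_slice: int = LUTS_PER_SLICE) -> int:
    """Total number of slice LUTs on the device."""
    return slices * per_slice


def total_bram_kbits(blocks: int = BRAM_BLOCKS, kbits: int = KBITS_PER_BRAM) -> int:
    """Total block RAM capacity in Kb."""
    return blocks * kbits


if __name__ == "__main__":
    print("Slice LUTs:", total_luts())
    print("Block RAM (Kb):", total_bram_kbits())
```

The resulting LUT total of 69,120 matches the number of available Slice LUTs reported by the Xilinx Map tool in Section 5, and the block RAM total agrees with the 5,328 Kb stated above.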

One of the principal features of the Virtex-5, as it applies to this work, is its configuration capabilities, mainly that of dynamic partial reconfiguration. Like all Xilinx FPGAs, the Virtex-5 device’s configuration memory is volatile and must be configured each time it is powered up. This configuration is accomplished by loading application-specific configuration data (the bitstream) into the device’s internal configuration memory through special pins on the device package. Several different configuration modes can be used for this operation, employing either a serial or a parallel interface with the FPGA in the role of master or slave of the operation. The specific configuration mode is determined on power-up by the voltage level presented on dedicated Mode input pins (Xilinx, Inc., 2010). This method is referred to as static reconfiguration and is used to specify a variety of static conditions in functional blocks, such as configurable logic blocks (CLBs), input/output blocks (SelectIO), or clock management tiles (CMTs). However, some applications may require a change in these conditions within a functional block while the system is operational. This can be accomplished by dynamic partial reconfiguration using the JTAG, ICAP, or SelectMAP ports. There are two kinds of dynamic reconfiguration within the Virtex-5. One kind involves the Digital Clock Managers (DCMs) within the CMTs, in which a Dynamic Reconfiguration Port (DRP), specific to the DCMs, allows dynamic adjustments of the clock’s configuration in order to produce a new frequency. The second kind is more pertinent to this work and involves reconfiguring a specific set or sets of the functional blocks, also referred to as the reconfigurable partitions or reconfigurable modules (RMs), and performing this reconfiguration while the static partition of the design continues to operate.

3.4 GNU Tools / GRLIB

Along with making the source code for the LEON3 design publicly available, Gaisler Research, now Aeroflex Gaisler, has made freely available a collection of open source IP cores and a design configuration environment for developing LEON3 based FPGA designs, collectively referred to as the GRLIB IP library. The full software development environment is based on a range of popular open source and commercial tools and embedded operating systems, including VxWorks, ThreadX, and Nucleus. Linux support for MMU based LEON3 systems is provided through a special version of the SnapGear Embedded Linux distribution, which is provided as a full source package containing kernel, libraries and application code for rapid development of embedded SPARC based systems (Aeroflex Gaisler, 2008). Many of the operating systems and kernels are based on a common open source GNU-based cross-compilation system which includes the GNU C/C++ cross-compiler, assembler, linker, Newlib embedded C library, boot-PROM builder, Eclipse based IDE, and more.

An extensive debug monitor, GRMON, provides quick hardware and software validation. It provides a non-intrusive debug environment on real target hardware through a variety of interfaces. It can operate attached to the GNU debugger (gdb) or in stand-alone mode, through which LEON3 applications can be loaded and debugged using a command line interface or via a graphical user interface. Numerous commands are available to examine data, insert breakpoints, and advance execution.

3.5 Xilinx ISE

Xilinx ISE is a suite of software tools used for the design entry, synthesis, analysis, and implementation of a hardware description language (HDL) based FPGA design. A limited version of the tool set, the ISE WebPACK Tool, is described by Xilinx as the industry’s only free, fully featured front-to-back FPGA design solution for Linux or Windows systems (Xilinx, Inc., 2009). However, the WebPACK version of ISE supports only a limited set of devices, so it cannot be used to place and route a design for the XC5VLX110T FPGA which is on the ML509 board. Therefore, a license for one of the more comprehensive versions of the design suite is required in order to use ISE’s Place&Route for the ML509. Such a license can often be obtained by an academic institution as a donation through the Xilinx University Program. It is worth noting that the ISE WebPACK could be used for the lower density FPGA, XC5VLX50T, which is on the Xilinx ML505 board.

3.6 SnapGear Linux (for SMP)

Linux support for LEON3 is provided through a special version of the SnapGear Embedded Linux distribution, which is a full source package containing kernel, libraries and application code (Hellstrom, 2009). This distribution includes two kernels, a modified 2.0 version for non-MMU LEON3 systems and version 2.6.21.1 for LEON3 systems containing an MMU. It supports the optional floating-point unit (FPU) and SPARC V8 mul/div instructions, as well as multicore LEON3 systems with symmetric multi-processing (SMP), which are enabled through the Linux kernel configuration. The graphical configuration utility, similar to that of the Linux 2.4 kernel, can be used to remove drivers and features to minimize the kernel image size.

The SnapGear Linux distribution has incorporated a small boot loader that is designed specifically for the LEON3 processors, both multicore SMP and uniprocessor systems (Hellstrom, 2009). The boot loader’s main purpose is to perform initialization of the low level basic hardware like the debug console interface and the memory controller, prior to launching Linux.

Depending on the version of Linux being used, a choice of cross-compiler toolchain binary packages is made available with SnapGear Linux, each comprising several utilities used in the compilation process, the most important of which are the GNU GCC compiler and linker. For this work, the Linux 2.6 GNU LibC toolchain, sparc-linux-3.4.4, was used.

3.7 Benchmarking Applications

Used for assessing software performance on hardware platforms, benchmarks are generally a set of applications whose execution results provide a metric to quantify a platform’s performance. To test various architectural configurations within this work, some benchmarking standards were studied, with a bias toward those used for embedded systems and multicore systems.

• Dhrystone is one of the most often referenced benchmarks in computing literature and system specifications, though it was not designed for multiple cores.

• CoreMark was developed by The Embedded Microprocessor Benchmark Consortium (EEMBC) as a benchmark specifically for testing the functionality of a processor core (EEMBC, 2012).

• MultiBench is a suite of embedded benchmarks that was developed by EEMBC to analyze multicore architectures and platforms (EEMBC, 2012). This has the drawback of being a commercial application that involves a cost for its license.

• OpenMPBench, also referred to as ParMiBench, is an open source benchmark consisting of the parallel implementation of a suite of compute intensive algorithms from the uniprocessor benchmark MiBench (Iqbal, 2010).

In consideration of its open source nature, OpenMPBench was studied for use on this multicore LEON3 platform. A survey of industry literature revealed that OpenMPBench has not yet seen widespread utilization. As with the MiBench benchmark on which it is based, OpenMPBench was proposed for benchmarking embedded systems. Focused on multicore systems, it includes a parallel implementation of 7 of the 35 embedded applications that comprise MiBench (Iqbal, 2010). The original 35 applications attempt to capture the diversity of the embedded systems industry and are grouped into 6 domains:

• Automotive and Industrial Control

• Consumer Devices

• Office Automation

• Networking

• Security

• Telecommunications

The 7 OpenMPBench applications are drawn from 4 of these domains: automotive/industrial control, office, network, and security. The parallelization of these applications was done using the POSIX Pthread API and standard C libraries. For this work only one of these applications, Basicmath from the Automotive/Industrial Control domain, was implemented within the SnapGear Linux that was used for the benchmarking environment. Basicmath uses a fixed set of constants as the input data set to a variety of simple mathematical calculations, like integer square root and cubic function solving. The parallelization of Basicmath was accomplished by data partitioning through a master-worker strategy, which achieves a good linear speedup and scales well for large data set sizes (Iqbal, 2010).
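The master-worker data partitioning described above can be sketched as follows. This is a minimal illustration only: OpenMPBench itself is written in C with the POSIX Pthread API, whereas this sketch uses Python with a thread pool standing in for the Pthreads workers, and the function names (partition, worker, master) are hypothetical. Because Python threads do not run CPU-bound work in parallel, the sketch shows the structure of the partitioning rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt


def partition(data, n_workers):
    """Split data into n_workers contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def worker(chunk):
    """Worker task: integer square root of every value in its chunk."""
    return [isqrt(x) for x in chunk]


def master(data, n_workers=4):
    """Master: partition the input, dispatch the chunks, merge results in order."""
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(worker, chunks)  # map() preserves chunk order
    return [y for part in results for y in part]


if __name__ == "__main__":
    print(master(list(range(0, 50, 7)), n_workers=2))
```

Each worker operates on its own slice of the input and shares no state with the others, which is the property that allows this style of partitioning to scale with the number of cores.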

Figure 8. Linear speedup of OpenMPBench's BasicMath (Iqbal, 2010)

Though not designed for multicore platforms, Dhrystone was also included here as a benchmark for relative comparison to industry literature. For example, Aeroflex Gaisler reports a Dhrystone 2.1 benchmark performance of 1.4 DMIPS/MHz for a particular configuration and compilation of the LEON3 core (Aeroflex Gaisler, 2008). It also serves here as a relative gauge of single core performance for various LEON3 core configurations.

Dhrystone is a synthetic benchmark developed in the 1980s to evaluate the performance of a computer/compiler combination. Because the Whetstone benchmark was already available to measure floating-point performance, Dhrystone specifically excluded floating-point arithmetic. This benchmark implements a loop of a representative mix of C-language computations and includes a timing and reporting framework which provides the number of loops executed per second. The industry has established a comparative result based on the performance of the Digital Equipment Corporation VAX 11/780 and DEC-supplied compiler, which was reported to have run at one million instructions per second. This VAX could execute 1757 Dhrystone loops per second. This comparative result, referred to as DMIPS, is calculated by dividing the benchmark’s reported loops per second by 1757. This has been further adapted to remove the variable of the processor clock speed by dividing the DMIPS value by the clock frequency of the processor in MHz. This provides the value of DMIPS/MHz that is generally reported in industry literature (ECROS Technology, 2005).
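The DMIPS and DMIPS/MHz calculations described above amount to two divisions. The following Python sketch illustrates them; the function names are chosen here for illustration, and the example input (81355.9 loops per second on an 80 MHz core) is taken from the results presented in Section 5.

```python
VAX_11_780_LOOPS_PER_SEC = 1757.0  # Dhrystone loops/s of the reference VAX (1 DMIPS)


def dmips(loops_per_sec: float) -> float:
    """Dhrystone loops per second normalized to the VAX 11/780 reference."""
    return loops_per_sec / VAX_11_780_LOOPS_PER_SEC


def dmips_per_mhz(loops_per_sec: float, clock_mhz: float) -> float:
    """DMIPS normalized by the processor clock frequency in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return dmips(loops_per_sec) / clock_mhz


if __name__ == "__main__":
    # Example: 81355.9 Dhrystone loops/s measured on an 80 MHz core.
    print(f"{dmips(81355.9):.1f} DMIPS")
    print(f"{dmips_per_mhz(81355.9, 80.0):.3f} DMIPS/MHz")
```

For this example the result is roughly 46.3 DMIPS, or about 0.58 DMIPS/MHz.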

4. General Implementation Flow

For this work, the general implementation flow for system development on the ML509 board includes the following steps:

1. Acquire the hardware and software.

2. Verify board functionality with diagnostics and testing.

3. Build a processor based hardware design for the FPGA.

4. Using a cross-compiler environment, build compatible OS and/or applications.

5. Load and test the design on the FPGA platform.

4.1 Hardware and Software Tools

The following hardware is required:

1. Xilinx ML509 board (included in XUPV5 kit)

2. 5V (6.0A) AC power adapter (included in XUPV5 kit)

3. XUPV5 1GB CompactFlash Card (included in XUPV5 kit)

4. VGA to DVI adapter (included in XUPV5 kit)

5. Xilinx USB/JTAG Programming Cable

6. Null modem serial cable (DB-9)

7. Ethernet crossover cable (RJ-45)

8. Workstation running a Linux distribution such as Red Hat or Ubuntu, with a standard serial port, USB port, and Ethernet port

The following software is required (version used for this work):

1. Xilinx ISE (v12.3)

2. ModelSim (SE-64 6.2i - Rev. 2007.07)

3. GRLIB IP library (v1.1.0-b4108)

4. The following host software is required for the GRLIB configuration scripts:

a. Bash shell

b. GNU make

c. GCC

d. Tcl/tk-8.4

e. patch utility

5. GRMON LEON debug monitor (v1.1.52 evaluation version)

6. SnapGear (v42) for Linux (kernel 2.6.21.1)

a. Linux 2.6 GNU LibC toolchain (sparc-linux-3.4.4)

7. Bare-C Cross-Compiler for LEON (v1.0.36b)

For this work, the development environment was established on an HP xw4400 Workstation running Red Hat Enterprise Linux 4.4 with kernel 2.6.9-42.

4.2 Board Functional Verification

Prior to any attempt to install and run any designs or applications, one should first become familiar with the hardware platform and verify the hardware functionality by following the platform’s Getting Started Tutorial, which includes running the factory installed demonstration software. It is important to note that Xilinx refers users of the ML509 board to the documentation for the Xilinx ML505 Evaluation Platform, provided at http://www.xilinx.com/products/boards/ml505/docs.htm. The only significant difference between these two platforms is that the ML505 uses the Virtex-5 LX50T FPGA rather than the LX110T which is on the ML509. On these platforms, both devices share the same 35mm x 35mm fine-pitch 1136 ball grid array package, but their logic density and number of User I/Os differ, as shown in Table 2.

Table 2. Xilinx Virtex-5 LX50T and LX110T Comparison (Xilinx, Inc., 2009)

Referring to the Xilinx document UG348, ML505/ML506/ML507 Getting Started Tutorial, guidance is provided in the section Board Setup for connecting the ML509 board to power, to a computer through a null modem serial cable, and to a VGA or DVI display, and for setting the SW3 configuration DIP switches. Subsequent sections explain how to boot and run demonstrations from various board level resources, like the System ACE CompactFlash (CF), Linear Flash, Platform Flash, and SPI Flash. The demos also include an XROM application for performing board diagnostics and testing (Xilinx, Inc., 2009). This application will test the following board-level resources:

1. Test DDR SDRAM

2. Test ZBT SRAM

3. Test LEDs

4. Test Pushbuttons

5. Test Dip Switches

6. Test Character LCD

7. Test PS/2 Keyboard

8. Test SMA Connectors

9. Test VGA Output

10. Test Flash Memory

11. Print IIC EEPROM Contents

12. Test Piezo

4.3 Hardware Development

The LEON3 processor is distributed by Aeroflex Gaisler as part of their GRLIB IP library, which provides LEON3 template designs for many FPGA boards including the Xilinx ML509. However, it must be noted that the ML509 templates are principally ML505 templates with modifications to address the understood differences with the ML509. New releases of this library are generally not verified on the ML509 by Gaisler and require the user community to provide forum postings for any errors or issues that are discovered with the application of the GRLIB IP library to the ML509 board.

The LEON3/GRLIB source code and documentation are available for download from the Aeroflex Gaisler web site. GRLIB is primarily developed on Linux hosts, and Linux is the preferred platform. The release used for this work was GRLIB Version 1.1.0 Build 4108.

The Aeroflex Gaisler document GRLIB IP Library User’s Manual provides a LEON3 quick-start guide on how to implement a LEON3 system using GRLIB. This is typically done using one of the template designs; the guide uses the template for the GR-XC3S-1500 board. However, it is the ML509 template design that is used in this work, and it is located in {GRLIB}/designs/leon3-xilinx-ml509, where {GRLIB} represents the directory into which the GRLIB IP library was installed. Each template design is based on two files, config.vhd and leon3mp.vhd, and is accompanied by a test bench file, testbench.vhd. The VHDL package containing the configuration parameters is provided in config.vhd, which is automatically generated from choices made in the xconfig GUI tool. The file leon3mp.vhd contains the top level entity and instantiates all on-chip IP cores, while using config.vhd to configure the instantiated IP cores (Aeroflex Gaisler, 2011).

Figure 9. LEON3 Xilinx ML509 Template Design from leon3mp.vhd

Implementation of a template design typically follows four basic steps that utilize GNU Make and the GRLIB Makefiles for the target board:

1. Configuration of design using xconfig GUI tool

# make xconfig

2. Simulation of design and test bench

# make vsim

# vsim testbench

3. Synthesis and place&route of design for target FPGA

# make ise

4. Download design to the FPGA

# make ise-prog-fpga

VHDL generics are used to configure each core in the template design and are assigned the values of constants declared in config.vhd. Other configuration variables that provide board-level requirements for the FPGA, like FPGA I/O assignments and board-level timing, are defined in files located in {GRLIB}/boards/xilinx-ml509-xc5vlx110t. In addition to the top level settings, the xconfig GUI tool offers several configurable options of the LEON3 core, including whether to use an FPU, an MMU, the MUL/DIV instructions, the SMAC/UMAC instructions, an instruction cache or data cache, and several others, as well as configurable options for each of these units. Figure 10 provides a block diagram that generally reflects the configurable units within the LEON3 core.

Figure 10. LEON3 processor core configurability (Aeroflex Gaisler, 2008)

Mentor Graphics ModelSim can be used to simulate the template design with a test bench that emulates the evaluation board, including external PROM and SDRAM which are pre-loaded with a test program. This test program will execute on the LEON3 processor, test functionality in the design, and print diagnostics on the simulator console during the execution.

The template design can be synthesized, placed, and routed with Xilinx ISE/XST. However, the GRLIB scripts also provide hooks for other synthesis tools like Synplify and Precision. Regardless of which tool is used, the final programming file is ‘leon3mp.bit’. This is the Xilinx FPGA configuration file, which can be downloaded to the FPGA with the Xilinx iMPACT software. An additional step can use the Xilinx PromGen application to generate the PROM files needed to program the FPGA’s configuration PROM.

4.4 Software Development

Software can be loaded and run on the LEON3 processors as a discrete application or as part of an operating system. In order to study a multicore design running within an SMP system, one can use the Linux 2.6 kernel, which provides support for SMP.

The primary means of accessing the LEON3 processors for software development is with Gaisler’s general debug monitor, GRMON (Aeroflex Gaisler, 2011). GRMON includes the following functions:

• Read/write access to all system registers and memory

• Built-in disassembler and trace buffer management

• Downloading and execution of LEON3 applications

• Breakpoint and watchpoint management

• Remote connection to GNU debugger (GDB)

• Support for USB, JTAG, RS232, PCI, Ethernet and SpaceWire debug links

Using GRMON, software can be loaded directly to memory and executed, or loaded into the Flash PROM, where it will be retained between board power cycles.

To build software programs and applications for the LEON3 processor, Aeroflex Gaisler provides a cross-compiler called Bare-C Cross Compiler (BCC), which is based on the GNU compiler tools and the Newlib standalone C-library (Gaisler, 2011). BCC consists of the following packages:

• GNU GCC C/C++ compiler v3.4.4, v4.4.2

• Newlib C-library v 1.13.1

• Low-level I/O routines for LEON2 and LEON3/4, including interrupt support

• uIP light-weight TCP/IP stack

• GDB debugger v6.4 with DDD and Insight Graphical front-end

• Mkprom prom-builder for LEON2/3

• Linux and Windows (MingW) hosts

The command line usage for calling this cross compiler is:

# sparc-elf-gcc [options] file …

Though not investigated for this work, Aeroflex Gaisler also provides a plugin for the Eclipse framework, enabling LEON3 application development with the Eclipse C/C++ Development Tooling.

A special version of the SnapGear Embedded Linux distribution is utilized in this work to support the multicore LEON3 SMP. This is based on the Linux kernel 2.6.21.1. The cross compiler within SnapGear is somewhat different from the Bare-C Cross Compiler and, if called independently from the standard SnapGear build, has the following command line usage:

# sparc-linux-gcc [options] file …

GRMON, BCC, and SnapGear Linux are freely available for download from the Aeroflex Gaisler Web site.

Implementation of SnapGear Linux on the LEON3 based ML509 platform typically follows three basic steps that utilize GNU Make and configuration files that are customized by the SnapGear and Linux configuration GUIs:

1. Configuration of boot loader, Linux kernel, and user applications

# make xconfig

2. Compilation of boot loader, kernel, libraries, applications and make image

# make

3. Download and run the RAM image with GRMON

# grmon-eval -xilusb -nb

grlib> load image.dsu

grlib> run

The configuration operation consists of using the xconfig GUI to customize three groups of settings:

• Vendor hardware (boot loader for Leon3 on the ML509)

• Linux kernel & library (version 2.6.21.1 & glibc)

• Vendor/user applications (filesystems, network, busybox, and misc.)

The GUI based configuration utility is used to update four primary configuration files that are used to build the SnapGear image:

1. {snapgear}/.config: Referred to as the config.vendor file, it configures the boot loader, which must satisfy the requirements of the board level hardware.

2. {snapgear}/linux-2.6.21.1/.config: Referred to as the config.linux file, it configures the Linux kernel.

3. {snapgear}/config/.config: Referred to as the config.apps file, it specifies which core applications, application libraries, and tools will be included in the loaded file system.

4. {snapgear}/user/busybox/busybox-1.8.2/.config: Referred to as the config.busybox file, it configures BusyBox, a compact executable that incorporates several stripped-down Unix tools frequently used by developers.

Here, {snapgear} represents the directory into which the SnapGear package was installed.
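Because SMP support is selected through the Linux kernel configuration file, it can be useful to confirm the setting before building the image. The following Python sketch shows one way this might be done. It is not part of the SnapGear tooling; the helper names are hypothetical, and the sample text is illustrative rather than copied from the configuration used in this work. CONFIG_SMP and CONFIG_NR_CPUS are the standard Linux kernel option names.

```python
def parse_kconfig(text: str) -> dict:
    """Parse Linux-style .config text into {option: value}, skipping comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # also skips '# CONFIG_FOO is not set' lines
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value.strip('"')
    return options


def smp_enabled(text: str) -> bool:
    """True when the kernel configuration enables SMP (CONFIG_SMP=y)."""
    return parse_kconfig(text).get("CONFIG_SMP") == "y"


if __name__ == "__main__":
    # Illustrative sample; a real check would read the kernel .config file.
    sample = "CONFIG_SMP=y\nCONFIG_NR_CPUS=4\n# CONFIG_PREEMPT is not set\n"
    print("SMP enabled:", smp_enabled(sample))
```

In practice the text would be read from the config.linux file listed above, for example with open(path).read(), where path points into the local SnapGear installation.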

Configuration templates are available for several boards to provide the correct settings for the vendor hardware. However, none was available for the ML509, so one was created as part of this work and incorporated into the SnapGear installation as the leon3_xilinx_ml509 template configuration.

5. Results

One of the primary goals of this work was to establish a hardware based platform on which one could implement various design configurations relating to a multicore and reconfigurable architecture and evaluate their relative performance. The following results demonstrate the configurability of the system, utilizing the OpenMPBench and Dhrystone benchmarks to demonstrate and quantify the relative differences in performance.

Hardware implementation statistics were found in the Xilinx Map application log file, leon3mp.map. For simplicity, not all statistics are reported; the focus is principally on the FPGA’s logic and memory utilization.
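The utilization figures in the Map log follow a regular "N out of M P%" pattern, so they can be extracted mechanically when comparing many configurations. The following Python sketch is an illustration under that assumption and is not a tool that was used in this work; the regular expression and function names are chosen here. The integer percentages in the log appear to be truncated rather than rounded (for example, 17,279 out of 69,120 is 24.998% but is reported as 24%), and the helper truncated_percent reproduces that behavior.

```python
import re

# Matches entries such as "Number of Slice LUTs: 12,932 out of 69,120 18%".
UTIL_RE = re.compile(
    r"(?P<name>Number of [\w/ ]+?):\s*(?P<used>[\d,]+) out of\s+"
    r"(?P<avail>[\d,]+)\s+(?P<pct>\d+)%"
)


def parse_utilization(report: str) -> dict:
    """Return {resource name: (used, available, reported percent)}."""
    result = {}
    for m in UTIL_RE.finditer(report):
        used = int(m.group("used").replace(",", ""))
        avail = int(m.group("avail").replace(",", ""))
        result[m.group("name")] = (used, avail, int(m.group("pct")))
    return result


def truncated_percent(used: int, avail: int) -> int:
    """Integer percentage, truncated toward zero as in the reported figures."""
    return used * 100 // avail


if __name__ == "__main__":
    line = "Number of Slice LUTs: 17,279 out of 69,120 24%"
    for name, (used, avail, pct) in parse_utilization(line).items():
        print(name, used, avail, pct, truncated_percent(used, avail))
```

Only top-level "Number of ..." entries are matched; sub-entries such as "Number used as logic" would need an additional pattern.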

5.1 Single Core Configurations

#1 Leon3x1_nFPU_nMulDiv_nCache

#2 Leon3x1_nFPU_nMulDiv (4way icache-8KB & dcache-4KB LRU)

#3 Leon3x1_nFPU_wMulDiv (5-cycle latency)

#4 Leon3x1_wFPU_wMulDiv (GRFPU-Lite)

                   #1     #2     #3     #4     Total Avail.
Slice Registers    10%    16%    18%    19%    69120
Slice LUTs         18%    24%    28%    33%    69120
BlockRAM/FIFO      5%     19%    22%    21%    148

Table 3. Single Core Synthesis Results

Leon3x1_nFPU_nMulDiv_nCache:

Slice Logic Utilization:
  Number of Slice Registers: 7,104 out of 69,120 10%
    Number used as Flip Flops: 7,103
    Number used as Latch-thrus: 1
  Number of Slice LUTs: 12,932 out of 69,120 18%
    Number used as logic: 12,644 out of 69,120 18%
    Number used as Memory: 264 out of 17,920 1%
      Number used as Dual Port RAM: 204
      Number used as Shift Register: 60
    Number used as exclusive route-thru: 24
  Number of route-thrus: 282

Specific Feature Utilization:
  Number of BlockRAM/FIFO: 8 out of 148 5%
    Number using BlockRAM only: 8
    Total primitives used:
      Number of 36k BlockRAM used: 4
      Number of 18k BlockRAM used: 8
    Total Memory used (KB): 288 out of 5,328 5%
  Number of BUFG/BUFGCTRLs: 17 out of 32 53%
    Number used as BUFGs: 17
  Number of IDELAYCTRLs: 3 out of 22 13%
  Number of BSCANs: 2 out of 4 50%
  Number of DCM_ADVs: 7 out of 12 58%

Leon3x1_nFPU_nMulDiv:

Slice Logic Utilization:
  Number of Slice Registers: 11,261 out of 69,120 16%
    Number used as Flip Flops: 11,260
    Number used as Latch-thrus: 1
  Number of Slice LUTs: 17,279 out of 69,120 24%
    Number used as logic: 16,989 out of 69,120 24%
    Number used as Memory: 265 out of 17,920 1%
      Number used as Dual Port RAM: 204
      Number used as Shift Register: 61
    Number used as exclusive route-thru: 25
  Number of route-thrus: 287

Specific Feature Utilization:
  Number of BlockRAM/FIFO: 29 out of 148 19%
    Number using BlockRAM only: 29
    Total primitives used:
      Number of 36k BlockRAM used: 12
      Number of 18k BlockRAM used: 32
    Total Memory used (KB): 1,008 out of 5,328 18%
  Number of BUFG/BUFGCTRLs: 17 out of 32 53%
    Number used as BUFGs: 17
  Number of IDELAYCTRLs: 3 out of 22 13%
  Number of BSCANs: 2 out of 4 50%
  Number of DCM_ADVs: 7 out of 12 58%

Leon3x1_nFPU_wMulDiv:

Slice Logic Utilization:
  Number of Slice Registers: 12,509 out of 69,120 18%
    Number used as Flip Flops: 12,508
    Number used as Latch-thrus: 1
  Number of Slice LUTs: 19,937 out of 69,120 28%
    Number used as logic: 19,648 out of 69,120 28%
    Number used as Memory: 264 out of 17,920 1%
      Number used as Dual Port RAM: 204
      Number used as Shift Register: 60
    Number used as exclusive route-thru: 25
  Number of route-thrus: 292

Specific Feature Utilization:
  Number of BlockRAM/FIFO: 34 out of 148 22%
    Number using BlockRAM only: 34
    Total primitives used:
      Number of 36k BlockRAM used: 12
      Number of 18k BlockRAM used: 36
    Total Memory used (KB): 1,080 out of 5,328 20%
  Number of BUFG/BUFGCTRLs: 17 out of 32 53%
    Number used as BUFGs: 17
  Number of IDELAYCTRLs: 3 out of 22 13%
  Number of BSCANs: 2 out of 4 50%
  Number of DCM_ADVs: 7 out of 12 58%
  Number of DSP48Es: 1 out of 64 1%

Leon3x1_wFPU_wMulDiv:

Slice Logic Utilization:
  Number of Slice Registers: 13,685 out of 69,120 19%
    Number used as Flip Flops: 13,684
    Number used as Latch-thrus: 1
  Number of Slice LUTs: 23,196 out of 69,120 33%
    Number used as logic: 22,841 out of 69,120 33%
    Number used as Memory: 326 out of 17,920 1%
      Number used as Dual Port RAM: 292
      Number used as Shift Register: 34
    Number used as exclusive route-thru: 29
  Number of route-thrus: 292

Specific Feature Utilization:
  Number of BlockRAM/FIFO: 32 out of 148 21%
    Number using BlockRAM only: 32
    Total primitives used:
      Number of 36k BlockRAM used: 12
      Number of 18k BlockRAM used: 36
    Total Memory used (KB): 1,080 out of 5,328 20%
  Number of BUFG/BUFGCTRLs: 17 out of 32 53%
    Number used as BUFGs: 17
  Number of IDELAYCTRLs: 3 out of 22 13%
  Number of BSCANs: 2 out of 4 50%
  Number of DCM_ADVs: 7 out of 12 58%
  Number of DSP48Es: 1 out of 64 1%

5.2 Multicore Configurations

#5 Leon3x2_nFPU_nMulDiv (4way icache-8KB & dcache-4KB LRU)

#6 Leon3x2_nFPU_wMulDiv (5-cycle latency)

#7 Leon3x2_wFPU_wMulDiv (GRFPU-Lite)

#8 Leon3x4_nFPU_wMulDiv

                   #5     #6     #7     #8     Total Avail.
Slice Registers    28%    29%    32%    51%    69120
Slice LUTs         44%    46%    55%    81%    69120
BlockRAM/FIFO      38%    39%    38%    72%    148

Table 4. Multicore Synthesis Results

Leon3x2_nFPU_nMulDiv:

Slice Logic Utilization:
  Number of Slice Registers: 19,811 out of 69,120 28%
    Number used as Flip Flops: 19,808
    Number used as Latch-thrus: 3
  Number of Slice LUTs: 30,718 out of 69,120 44%
    Number used as logic: 30,350 out of 69,120 43%
    Number used as Memory: 317 out of 17,920 1%
      Number used as Dual Port RAM: 224
      Number used as Shift Register: 93
    Number used as exclusive route-thru: 51
  Number of route-thrus: 380

Specific Feature Utilization:
  Number of BlockRAM/FIFO: 57 out of 148 38%
    Number using BlockRAM only: 57
    Total primitives used:
      Number of 36k BlockRAM used: 22
      Number of 18k BlockRAM used: 66
    Total Memory used (KB): 1,980 out of 5,328 37%
  Number of BUFG/BUFGCTRLs: 17 out of 32 53%
    Number used as BUFGs: 17
  Number of IDELAYCTRLs: 3 out of 22 13%
  Number of BSCANs: 2 out of 4 50%
  Number of DCM_ADVs: 7 out of 12 58%

Leon3x2_nFPU_wMulDiv:

Slice Logic Utilization:
  Number of Slice Registers: 20,137 out of 69,120 29%
    Number used as Flip Flops: 20,128
    Number used as Latch-thrus: 9
  Number of Slice LUTs: 32,144 out of 69,120 46%
    Number used as logic: 31,790 out of 69,120 45%
    Number used as Memory: 317 out of 17,920 1%
      Number used as Dual Port RAM: 224
      Number used as Shift Register: 93
    Number used as exclusive route-thru: 37
  Number of route-thrus: 398

Specific Feature Utilization:
  Number of BlockRAM/FIFO: 59 out of 148 39%
    Number using BlockRAM only: 59
    Total primitives used:
      Number of 36k BlockRAM used: 22
      Number of 18k BlockRAM used: 66
    Total Memory used (KB): 1,980 out of 5,328 37%
  Number of BUFG/BUFGCTRLs: 17 out of 32 53%
    Number used as BUFGs: 17
  Number of IDELAYCTRLs: 3 out of 22 13%
  Number of BSCANs: 2 out of 4 50%
  Number of DCM_ADVs: 7 out of 12 58%
  Number of DSP48Es: 2 out of 64 3%

Leon3x2_wFPU_wMulDiv:

Slice Logic Utilization:
  Number of Slice Registers: 22,483 out of 69,120 32%
    Number used as Flip Flops: 22,480
    Number used as Latch-thrus: 3
  Number of Slice LUTs: 38,193 out of 69,120 55%
    Number used as logic: 37,703 out of 69,120 54%
    Number used as Memory: 441 out of 17,920 2%
      Number used as Dual Port RAM: 400
      Number used as Shift Register: 41
    Number used as exclusive route-thru: 49
  Number of route-thrus: 420

Specific Feature Utilization:
  Number of BlockRAM/FIFO: 57 out of 148 38%
    Number using BlockRAM only: 57
    Total primitives used:
      Number of 36k BlockRAM used: 22
      Number of 18k BlockRAM used: 66
    Total Memory used (KB): 1,980 out of 5,328 37%
  Number of BUFG/BUFGCTRLs: 17 out of 32 53%
    Number used as BUFGs: 17
  Number of IDELAYCTRLs: 3 out of 22 13%
  Number of BSCANs: 2 out of 4 50%
  Number of DCM_ADVs: 7 out of 12 58%
  Number of DSP48Es: 2 out of 64 3%

Leon3x4_nFPU_wMulDiv:

Slice Logic Utilization:
  Number of Slice Registers: 35,338 out of 69,120 51%
    Number used as Flip Flops: 35,337
    Number used as Latch-thrus: 1
  Number of Slice LUTs: 56,106 out of 69,120 81%
    Number used as logic: 55,597 out of 69,120 80%
    Number used as Memory: 423 out of 17,920 2%
      Number used as Dual Port RAM: 264
      Number used as Shift Register: 159
    Number used as exclusive route-thru: 86
  Number of route-thrus: 628

Specific Feature Utilization:
  Number of BlockRAM/FIFO: 108 out of 148 72%
    Number using BlockRAM only: 108
    Total primitives used:
      Number of 36k BlockRAM used: 42
      Number of 18k BlockRAM used: 126
    Total Memory used (KB): 3,780 out of 5,328 70%
  Number of BUFG/BUFGCTRLs: 17 out of 32 53%
    Number used as BUFGs: 17
  Number of IDELAYCTRLs: 3 out of 22 13%
  Number of BSCANs: 2 out of 4 50%
  Number of DCM_ADVs: 7 out of 12 58%
  Number of DSP48Es: 4 out of 64 6%
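The percentages in these reports follow directly from the "used out of available" counts. The short Python sketch below recomputes them for the Leon3x4_nFPU_wMulDiv build and shows the remaining headroom per resource. The function and dictionary names are illustrative only and are not part of any Xilinx tool. The use of truncation instead of rounding is an inference from the reports themselves (for example, 59 out of 148 is listed as 39%), not something documented here.

```python
# Counts copied from the Leon3x4_nFPU_wMulDiv report: (used, available).
LEON3X4_RESOURCES = {
    "Slice Registers": (35_338, 69_120),
    "Slice LUTs": (56_106, 69_120),
    "BlockRAM/FIFO": (108, 148),
    "DSP48Es": (4, 64),
}


def truncated_percent(used: int, available: int) -> int:
    """Return utilization as a whole percentage, truncated toward zero.

    Truncation is an assumption inferred from the reported values.
    """
    return used * 100 // available


if __name__ == "__main__":
    for name, (used, available) in LEON3X4_RESOURCES.items():
        pct = truncated_percent(used, available)
        headroom = available - used
        print(f"{name:16s} {used:>7,} of {available:>7,}  {pct:3d}%  unused: {headroom:,}")
```

Under these assumptions, the slice LUTs (81%) and the block RAM (72%) are the resources closest to exhaustion in the four core build.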

5.3 Benchmark Performance

To demonstrate the performance gained from multiple cores, the Linux OS was used to support a symmetric multiprocessing configuration. The primary performance comparison across the various configurations is therefore that of running OpenMPBench's Basicmath and Dhrystone from within Linux. The Dhrystone benchmark was also compiled to run on the LEON3 as an independent application, in addition to being built into the SnapGear Linux environment. A significant performance difference was found between these two implementations, so both sets of results are presented. On the single core systems, the focus was to demonstrate the effect on performance of some of the principal configuration options, such as the floating point unit, the hardware multiplier, and the cache configuration. For these single core systems, the Dhrystone performance is also provided for the application running independently of Linux, loaded and executed through the GRMON debug interface.

For each of the single core configurations, the Dhrystone benchmark executes 400,000 loops and reports the average number of Dhrystone loops completed per second. Since all configurations have the processors running at 80 MHz, the standard DMIPS/MHz value is simply the Dhrystone loops per second divided by the VAX factor of 1757 and by the clock speed of the subject processor, 80 MHz.
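As a worked illustration of this calculation, the following Python sketch converts a Dhrystones-per-second figure into DMIPS and DMIPS/MHz. The constant and function names are chosen here for illustration. The input values are the standalone results reported in this chapter.

```python
VAX_DHRYSTONES_PER_SECOND = 1757.0  # VAX factor used in the text (1 DMIPS)
CLOCK_MHZ = 80.0                    # processor clock for all configurations


def dmips(dhrystones_per_second: float) -> float:
    """Dhrystone MIPS: loops per second relative to the VAX reference."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """Normalize DMIPS by the clock frequency in MHz."""
    return dmips(dhrystones_per_second) / clock_mhz


if __name__ == "__main__":
    for build, rate in [("Leon3x1_wMulDiv", 132890.4), ("Leon3x4_wMulDiv", 136054.4)]:
        print(f"{build}: {dmips(rate):.1f} DMIPS, {dmips_per_mhz(rate):.4f} DMIPS/MHz")
```

For 132890.4 Dhrystones per second this gives 75.6 DMIPS and 0.9454 DMIPS/MHz, which agrees with the GRMON output and the Table 5 entry for the single core build with the MUL/DIV unit.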

An example of the Linux command line entry and results is shown below for a single core system that includes the MUL/DIV unit:

    / # time dhrystone
    Execution starts, 400000 runs through Dhrystone
    Microseconds for one run through Dhrystone: 12.3
    Dhrystones per Second: 81355.9
    Dhrystones MIPS : 46.3
    real 0m 3.01s
    user 0m 2.97s
    sys 0m 0.03s
    / #

An example of the GRMON command line entry and results is shown below for the same single core system that includes the MUL/DIV unit:

    grlib> load Leon3x1_nFPU_wMulDiv/Benchmarks/dhry.exe
    section: .text at 0x40000000, size 48592 bytes
    downloading: ...
    section: .data at 0x4000bdd0, size 2768 bytes
    downloading: ...
    total size: 51360 bytes (613.3 kbit/s)
    read 261 symbols
    entry point: 0x40000000

    grlib> run
    Execution starts, 400000 runs through Dhrystone
    Microseconds for one run through Dhrystone: 7.5
    Dhrystones per Second: 132890.4
    Dhrystones MIPS : 75.6
    Program exited normally.
    grlib>

For all configurations the OpenMPBench’s Basicmath was run with the small data set and performed the integer square root calculations. Since this benchmark does not calculate its own execution time, the Linux command “time” was used to provide that information. An example of the Linux command line entry and resulting information from the time command is shown below for a four core system:

    / # parallel_basicmath
    |------|
    Error: Insufficient Parameters. Maximum Workers are 8!
    AVAILABLE MATHEMATICAL OPERATIONS
    1 : SOLVE CUBIC EQUATIONS
    2 : CALCULATE INTEGER SQR ROOTS
    3 : CALCULATE LONG SQR ROOTS
    4 : PERFORM DEGREE TO RADIAN ANGLE CONVERSION
    5 : PERFORM RADIAN TO DEGREE ANGLE CONVERSION
    AVAILABLE DATASETS
    1 : LARGE DATA SET
    2 : SMALL DATA SET
    Commands to run!
    Command Format: OjbectFileName Workers DataSetSize !
    Example: parallel basic math : ' ./parallel_basicmath 1 2 1'!
    |------|
    / #
    / # time parallel_basicmath 2 4 2
    ********* CALCULATE INTEGER SQR ROOTS : Worker 3 ***********
    ********* CALCULATE INTEGER SQR ROOTS : Worker 1 ***********
    ********* CALCULATE INTEGER SQR ROOTS : Worker 4 ***********
    ********* CALCULATE INTEGER SQR ROOTS : Worker 2 ***********
    Finish Working
    real 12m 54.54s
    user 51m 19.26s
    sys 0m 0.04s
    / #

Build             DMIPS/MHz   Dhry./sec.   Dhry./sec    Basicmath on Linux
                                           on Linux     real         user         sys
Leon3x1_nMulDiv   0.9121      128205.1     76677.3      51m 19.32s   51m 19.25s   0m 0.02s
Leon3x1_wMulDiv   0.9454      132890.4     80267.6      51m 19.39s   51m 19.31s   0m 0.01s
Leon3x2_nMulDiv   0.8949      125786.2     76190.5      25m 42.13s   51m 18.01s   0m 0.02s
Leon3x2_wMulDiv   0.9423      132450.3     79734.2      25m 40.05s   51m 18.04s   0m 0.01s
Leon3x4_wMulDiv   0.9679      136054.4     81355.9      12m 54.54s   51m 19.26s   0m 0.04s

Table 5. Benchmark Performance Results

Table 5 summarizes the benchmark performance for several Leon3 configurations, including ones with single, dual, and quad cores, as well as with and without the hardware Multiply/Divide unit. The Dhrystone results demonstrate a general performance improvement when the application was compiled and run with the hardware multiply and divide functionality, as opposed to that functionality being implemented in software for the configurations that did not have the MulDiv unit. Similarly, the Basicmath results demonstrate a significant speedup of the real execution time as the number of cores increased from one to four, from about 51.3 minutes on one core to about 12.9 minutes on four. However, the user time values show that the same total amount of work was being done in every case, about 51.3 minutes of CPU time, distributed among the two or four cores.
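The scaling described above can be quantified from the Basicmath times in Table 5. The Python sketch below converts the reported durations to seconds and computes the speedup and parallel efficiency relative to the single core build, using the builds that include the MulDiv unit. The helper names are illustrative, and the duration format handled is only the "Mm S.SSs" form that appears in the transcripts.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a duration such as '12m 54.54s' to seconds."""
    match = re.fullmatch(r"\s*(\d+)m\s*([\d.]+)s\s*", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# build: (cores, real time, user time), copied from Table 5
BASICMATH_TIMES = {
    "Leon3x1_wMulDiv": (1, "51m 19.39s", "51m 19.31s"),
    "Leon3x2_wMulDiv": (2, "25m 40.05s", "51m 18.04s"),
    "Leon3x4_wMulDiv": (4, "12m 54.54s", "51m 19.26s"),
}


def speedups(times: dict, baseline: str) -> dict:
    """Return {build: (speedup, efficiency)} relative to the baseline build."""
    base_real = to_seconds(times[baseline][1])
    result = {}
    for build, (cores, real, _user) in times.items():
        speedup = base_real / to_seconds(real)
        result[build] = (speedup, speedup / cores)
    return result


if __name__ == "__main__":
    for build, (s, e) in speedups(BASICMATH_TIMES, "Leon3x1_wMulDiv").items():
        user_min = to_seconds(BASICMATH_TIMES[build][2]) / 60
        print(f"{build}: speedup {s:.2f}, efficiency {e:.1%}, user time {user_min:.1f} min")
```

With these figures the dual core build runs about 2.00 times faster and the quad core build about 3.98 times faster than the single core build, while the user time stays near 51.3 minutes in all three cases.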

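Another interesting result was the significant reduction in apparent performance when Dhrystone was run in the Linux environment rather than as an independent application. In every configuration in Table 5, the Dhrystones per second measured under Linux were roughly 40% lower than the standalone figure (for example, 80267.6 versus 132890.4 for Leon3x1_wMulDiv).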
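The size of this reduction can be computed for each build directly from the two Dhrystone columns of Table 5. The following Python sketch does so. The names are illustrative, and the sketch only restates the measured values without attempting to explain the cause of the difference.

```python
# build: (standalone Dhrystones/sec via GRMON, Dhrystones/sec under Linux), from Table 5
DHRYSTONES_PER_SECOND = {
    "Leon3x1_nMulDiv": (128205.1, 76677.3),
    "Leon3x1_wMulDiv": (132890.4, 80267.6),
    "Leon3x2_nMulDiv": (125786.2, 76190.5),
    "Leon3x2_wMulDiv": (132450.3, 79734.2),
    "Leon3x4_wMulDiv": (136054.4, 81355.9),
}


def linux_reduction_percent(standalone: float, under_linux: float) -> float:
    """Percentage by which the Linux result falls below the standalone result."""
    return 100.0 * (1.0 - under_linux / standalone)


if __name__ == "__main__":
    for build, (standalone, under_linux) in DHRYSTONES_PER_SECOND.items():
        drop = linux_reduction_percent(standalone, under_linux)
        print(f"{build}: {drop:.1f}% lower under Linux")
```

For all five builds the reduction falls between 39% and 41%, so the effect is essentially independent of the core count and of the MulDiv option.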

6. Summary

6.1 Conclusion

The results of this work demonstrated the reconfigurability of the platform along with that of its processor core, the LEON3. It was further demonstrated that at least four LEON3 CPU cores fit into the board's FPGA. With the hardware and software implementation flows established, configurability demonstrated, and the trail blazed for development on this platform, it can be used for future study and benchmarking of DPR designs.

6.2 Future Work

The following list provides ideas for consideration of future work.

• Tune the values in the GRLIB boot loader options to maximize performance

• Test the remainder of the OpenMPBench benchmark suite

• Develop an environment to study Asymmetric Multiprocessing (AMP)

• Develop an implementation flow for a DPR design

Appendix A

This appendix provides console transcripts of the following operations:

• GRMON help output, providing a listing of commands and their descriptions

• GRMON console transcript of connecting to the target and running a benchmark

• Console transcript of Linux boot and benchmark execution

• Series of GRLIB make xconfig windows

o Config.vhd file that results from GRLIB make xconfig

• Series of SnapGear make xconfig windows

o The set of .config files that result from a SnapGear make xconfig

GRMON Commands

Builtin Commands:
  batch         execute a batch file of grmon commands
  break         print or add breakpoint
  cont          continue execution
  dcache        show data cache
  debug         change or show debug level
  delete        delete breakpoint(s)
  disassemble   disassemble memory
  echo          echo string in monitor window
  exit          see 'quit'
  float         display FPU registers
  gdb           connect to gdb debugger
  go            start execution without initialisation
  hbreak        print breakpoints or add hardware breakpoint (if available)
  help          show available commands or usage for specific command
  icache        show instruction cache
  leon          show leon registers
  load          load a file
  mem           see 'x'(examine memory)
  profile       Enable or show profiling
  register      show/set integer registers
  reset         reset active backend
  run           reset and start execution at last load address
  shell         execute a shell command
  step          single step one or [n] times
  symbols       show symbols or load symbols from file
  target        change backend
  quit          exit grmon
  verify        verify downloaded image
  version       show version
  watch         print or add watchpoint
  wmem          write word to memory
  x             examine memory
  hasp          hasp
  vmem          examine virtual memory

Backend specific commands:

  ahb [trace_length]    show AHB trace
  baud                  change DSU baud rate
  perf [en|dis]         Enable/disable/show performance statistics
  hist [trace_length]   show trace history
  info drv              Show all debug drivers
  info libs             Show all debug libraries
  info reg              Show system registers
  info sys              Show system configuration
  init                  re-initialise processor
  inst [trace_length]   show traced instructions
  mmu                   print mmu registers
  stack                 set stack pointer for next run
  tm [ahb|cpu|both]     select trace mode
  verify                verify memory contents
  va                    performs a virtual-to-physical translation of address

  flash                       print the detected flash memory configuration
  flash blank [range]|all     blank check flash memory
  flash erase [range]|all     erase flash memory blocks
  flash load                  program flash memory from ELF or srecord file
  flash lock [addr]|all       lock flash memory blocks
  flash lockdown [addr]|all   lockdown flash memory blocks
  flash query                 print the flash memory query register contents
  flash status                print the flash memory block lock status
  flash unlock [addr]|all     unlock flash memory blocks
  flash write [addr] [data]   write single data value to flash address

Press Ctrl-C to interrupt execution.

GRMON Console Transcript – Connecting to target and benchmark

!SESSION Mon May 28 18:48:31 2012
!GRMON version: v1.1.52 evaluation version
!Command line: grmon-eval -xilusb -u -log XUPV5_Grmon_052812f.log
This evaluation version will expire on 10/10/2012
Try to open libusb filter driver (install from http://libusb-win32.sourceforge.net)
Xilinx cable: Cable type/rev : 0x3
JTAG chain: xc5vlx110t xccace xc95144xl xcf32p xcf32p

Device ID: : 0x509
GRLIB build version: 4108

initialising
detected frequency: 80 MHz
SRAM waitstates: 2

Component                           Vendor
LEON3 SPARC V8 Processor            Gaisler Research
LEON3 SPARC V8 Processor            Gaisler Research
LEON3 SPARC V8 Processor            Gaisler Research
LEON3 SPARC V8 Processor            Gaisler Research
AHB Debug UART                      Gaisler Research
AHB Debug JTAG TAP                  Gaisler Research
SVGA Controller                     Gaisler Research
GR Ethernet MAC                     Gaisler Research
DDR2 Controller                     Gaisler Research
AHB/APB Bridge                      Gaisler Research
LEON3 Debug Support Unit            Gaisler Research
LEON2 Memory Controller             European Space Agency
System ACE I/F Controller           Gaisler Research
Generic APB UART                    Gaisler Research
Multi-processor Interrupt Ctrl      Gaisler Research
Modular Timer Unit                  Gaisler Research
PS/2 interface                      Gaisler Research
PS/2 interface                      Gaisler Research
General purpose I/O port            Gaisler Research
AMBA Wrapper for OC I2C-master      Gaisler Research
AMBA Wrapper for OC I2C-master      Gaisler Research
AHB status register                 Gaisler Research

Use command 'info sys' to print a detailed report of attached cores

00.01:003 Gaisler Research LEON3 SPARC V8 Processor (ver 0x0)
    ahb master 0
01.01:003 Gaisler Research LEON3 SPARC V8 Processor (ver 0x0)
    ahb master 1
02.01:003 Gaisler Research LEON3 SPARC V8 Processor (ver 0x0)
    ahb master 2
03.01:003 Gaisler Research LEON3 SPARC V8 Processor (ver 0x0)
    ahb master 3
04.01:007 Gaisler Research AHB Debug UART (ver 0x0)
    ahb master 4
    apb: 80000700 - 80000800
    baud rate 115200, ahb frequency 80.00
05.01:01c Gaisler Research AHB Debug JTAG TAP (ver 0x1)
    ahb master 5
06.01:063 Gaisler Research SVGA Controller (ver 0x0)
    ahb master 6
    apb: 80000600 - 80000700
    clk0: 25.00 MHz clk1: 25.00 MHz clk2: 40.00 MHz clk3: 65.00 MHz
07.01:01d Gaisler Research GR Ethernet MAC (ver 0x0)
    ahb master 7, irq 12
    apb: 80000b00 - 80000c00
    Device index: dev0
    edcl ip 192.168.0.52, buffer 2 kbyte
00.01:02e Gaisler Research DDR2 Controller (ver 0x0)
    ahb: 40000000 - 60000000
    ahb: fff00100 - fff00200
    64-bit DDR2 : 1 * 256 Mbyte @ 0x40000000, 4 internal banks
    190 MHz, col 10, ref 7.8 us, trfc 131 ns
01.01:006 Gaisler Research AHB/APB Bridge (ver 0x0)
    ahb: 80000000 - 80100000
02.01:004 Gaisler Research LEON3 Debug Support Unit (ver 0x1)
    ahb: 90000000 - a0000000
    AHB trace 128 lines, 32-bit bus, stack pointer 0x4ffffff0
    CPU#0 win 8, hwbp 2, itrace 128, V8 mul/div, srmmu, lddel 1
        icache 4 * 8 kbyte, 16 byte/line lru
        dcache 4 * 4 kbyte, 16 byte/line lru
    CPU#1 win 8, hwbp 2, itrace 128, V8 mul/div, srmmu, lddel 1
        icache 4 * 8 kbyte, 16 byte/line lru
        dcache 4 * 4 kbyte, 16 byte/line lru
    CPU#2 win 8, hwbp 2, itrace 128, V8 mul/div, srmmu, lddel 1
        icache 4 * 8 kbyte, 16 byte/line lru
        dcache 4 * 4 kbyte, 16 byte/line lru
    CPU#3 win 8, hwbp 2, itrace 128, V8 mul/div, srmmu, lddel 1
        icache 4 * 8 kbyte, 16 byte/line lru
        dcache 4 * 4 kbyte, 16 byte/line lru
03.04:00f European Space Agency LEON2 Memory Controller (ver 0x1)
    ahb: 00000000 - 20000000
    ahb: 20000000 - 40000000
    ahb: c0000000 - c2000000
    apb: 80000000 - 80000100
    16-bit prom @ 0x00000000
    32-bit static ram: 1 * 1024 kbyte @ 0xc0000000
04.01:067 Gaisler Research System ACE I/F Controller (ver 0x0)
    irq 13
    ahb: fff00200 - fff00300
01.01:00c Gaisler Research Generic APB UART (ver 0x1)
    irq 2
    apb: 80000100 - 80000200
    baud rate 38461, DSU mode (FIFO debug)
02.01:00d Gaisler Research Multi-processor Interrupt Ctrl (ver 0x3)
    apb: 80000200 - 80000300
03.01:011 Gaisler Research Modular Timer Unit (ver 0x0)
    irq 8
    apb: 80000300 - 80000400
    8-bit scaler, 2 * 32-bit timers, divisor 80
04.01:060 Gaisler Research PS/2 interface (ver 0x2)
    irq 4
    apb: 80000400 - 80000500
05.01:060 Gaisler Research PS/2 interface (ver 0x2)
    irq 5
    apb: 80000500 - 80000600
08.01:01a Gaisler Research General purpose I/O port (ver 0x1)
    apb: 80000800 - 80000900
09.01:028 Gaisler Research AMBA Wrapper for OC I2C-master (ver 0x2)
    irq 14
    apb: 80000900 - 80000a00
    Controller index for use in GRMON: 1
0c.01:028 Gaisler Research AMBA Wrapper for OC I2C-master (ver 0x2)
    irq 11
    apb: 80000c00 - 80000d00
    Controller index for use in GRMON: 2
0f.01:052 Gaisler Research AHB status register (ver 0x0)
    irq 7
    apb: 80000f00 - 80001000
grlib>

grlib> lo Leon3x4_nFPU_wMulDiv/Benchmarks/dhry.exe
section: .text at 0x40000000, size 48592 bytes
downloading: 0
downloading: 8192
downloading: 16384
downloading: 24576
downloading: 32768
downloading: 40960
downloading: 48592
section: .data at 0x4000bdd0, size 2768 bytes
downloading: 0
downloading: 2768
total size: 51360 bytes (553.0 kbit/s)
read 261 symbols
entry point: 0x40000000
grlib> run
Execution starts, 400000 runs through Dhrystone
Microseconds for one run through Dhrystone: 7.3
Dhrystones per Second: 136054.4
Dhrystones MIPS : 77.4
Program exited normally.
grlib>
grlib> exit
Closing Xilinx cable

Console transcript of Linux boot and benchmarks execution

Booting Linux
Booting Linux...
PROMLIB: Sun Boot Prom Version 0 Revision 0
Linux version 2.6.21.1 (dat20@hp12) (gcc version 3.4.4) #10 SMP Mon May 28 18:15:41 EDT 2012
ARCH: LEON
TYPE: Leon2/3 System-on-a-Chip
Ethernet address: 0:0:0:0:0:0
CACHE: 4-way associative cache, set size 4k
Boot time fixup v1.6. 4/Mar/98 Jakub Jelinek (jj@ultra.linux.cz). Patching kernel for srmmu[Leon2]/iommu
64MB HIGHMEM available.
Nocache: 0xfc000000-0xfc400000, 1024 pages [128-1280]
node 2: /cpu00 (type:cpu) (props:.node device_type mid mmu-nctx clock-frequency uart1_baud uart2_baud )
node 3: /a: (type:serial) (props:.node device_type name )
node 4: /ambapp0 (type:ambapp) (props:.node device_type name )
node 5: /cpu01 (type:cpu) (props:.node device_type mid clock-frequency )
node 6: /cpu02 (type:cpu) (props:.node device_type mid clock-frequency )
node 7: /cpu03 (type:cpu) (props:.node device_type mid clock-frequency )
PROM: Built device tree from rootnode 1 with 2483 bytes of memory.
DEBUG: psr.impl = 0xf fsr.vers = 0x7
Built 1 zonelists. Total pages: 63909
Kernel command line: console=ttyS0,38400 rdinit=/sbin/init
PID hash table entries: 1024 (order: 10, 4096 bytes)
Todo: init master_l10_counter
Attaching grlib apbuart serial drivers (clk:80hz):
Console: colour dummy device 80x25
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
pkbase: 0xfc800000 pkend: 0xfcc00000 fixstart 0xfce5e000
Memory: 251200k/262144k available (1824k kernel code, 10844k reserved, 228k data, 2248k init, 65536k highmem)
Mount-cache hash table entries: 512
Entering SMP Mode...
0:(4:32) cpus mpirq at 0x80000210
Starting CPU 1 : (irqmp: 0x80000210)
DEBUG: psr.impl = 0xf fsr.vers = 0x7
Started CPU 1
Starting CPU 2 : (irqmp: 0x80000210)
DEBUG: psr.impl = 0xf fsr.vers = 0x7
Started CPU 2
Starting CPU 3 : (irqmp: 0x80000210)
DEBUG: psr.impl = 0xf fsr.vers = 0x7
Started CPU 3
Brought up 4 CPUs
Total of 4 processors activated (319.48 BogoMIPS).
migration_cost=10000
NET: Registered protocol family 16
NET: Registered protocol family 2
IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
TCP established hash table entries: 8192 (order: 4, 98304 bytes)
TCP bind hash table entries: 8192 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 8192 bind 8192)
TCP reno registered
leon: power management initialized
highmem bounce pool size: 64 pages
io scheduler noop registered
io scheduler cfq registered (default)
grlib apbuart: 1 serial driver(s) at [0x80000100(irq 2)]
grlib apbuart: system frequency: 80000 khz, baud rates: 38400 38400
ttyS0 at MMIO 0x80000100 (irq = 2) is a Leon
Testing fifo size for UART port 0: got 4 bytes.
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
Probing GRETH Ethernet Core at 0x80000b00
Detected MARVELL 88EE1111 Revision 2
10/100 GRETH Ethermac at [0x80000b00] irq 12. Running 100 Mbps full duplex
TCP cubic registered
NET: Registered protocol family 1
NET: Registered protocol family 10
IPv6 over IPv4 tunneling driver
Freeing unused kernel memory: 2248k freed
init started: BusyBox v1.8.2 (2012-05-28 03:39:37 EDT)
starting pid 33, tty '': '/etc/init.d/rcS'
mount: mounting tmpfs on /var/tmp failed: Invalid argument
starting pid 44, tty '': '/bin/sh'
/ #
/ # time dhrystone
Execution starts, 400000 runs through Dhrystone
Microseconds for one run through Dhrystone: 12.3
Dhrystones per Second: 81355.9
Dhrystones MIPS : 46.3
real 0m 3.01s
user 0m 2.97s
sys 0m 0.03s
/ #
/ # time parallel_basicmath 2 2 4 2
********* CALCULATE INTEGER SQR ROOTS : Worker 3 ***********
********* CALCULATE INTEGER SQR ROOTS : Worker 1 ***********
********* CALCULATE INTEGER SQR ROOTS : Worker 2 ***********
********* CALCULATE INTEGER SQR ROOTS : Worker 4 ***********
Finish Working
real 12m 54.53s
user 51m 19.16s
sys 0m 0.02s
/ #

GRLIB hardware configuration: # make xconfig

config.vhd file for GRLIB build of single core Leon3 w/ FPU & Mul/Div instructions

-----------------------------------------------------------------------------
-- LEON3 Demonstration design test bench configuration
-- Copyright (C) 2009 Aeroflex Gaisler
-----------------------------------------------------------------------------

library techmap;
use techmap.gencomp.all;

package config is

-- Technology and synthesis options
  constant CFG_FABTECH : integer := virtex5;
  constant CFG_MEMTECH : integer := virtex5;
  constant CFG_PADTECH : integer := virtex5;
  constant CFG_NOASYNC : integer := 0;
  constant CFG_SCAN : integer := 0;

-- Clock generator
  constant CFG_CLKTECH : integer := virtex5;
  constant CFG_CLKMUL : integer := (8);
  constant CFG_CLKDIV : integer := (10);
  constant CFG_OCLKDIV : integer := 2;
  constant CFG_PCIDLL : integer := 0;
  constant CFG_PCISYSCLK: integer := 0;
  constant CFG_CLK_NOFB : integer := 0;

-- LEON3 processor core
  constant CFG_LEON3 : integer := 1;
  constant CFG_NCPU : integer := (2);
  constant CFG_NWIN : integer := (8);
  constant CFG_V8 : integer := 2;
  constant CFG_MAC : integer := 0;
  constant CFG_SVT : integer := 0;
  constant CFG_RSTADDR : integer := 16#00000#;
  constant CFG_LDDEL : integer := (1);
  constant CFG_NWP : integer := (2);
  constant CFG_PWD : integer := 1*2;
  constant CFG_FPU : integer := (8+0) + 16*1;
  constant CFG_GRFPUSH : integer := 0;
  constant CFG_ICEN : integer := 1;
  constant CFG_ISETS : integer := 4;
  constant CFG_ISETSZ : integer := 8;
  constant CFG_ILINE : integer := 4;
  constant CFG_IREPL : integer := 0;
  constant CFG_ILOCK : integer := 0;
  constant CFG_ILRAMEN : integer := 0;
  constant CFG_ILRAMADDR: integer := 16#8E#;
  constant CFG_ILRAMSZ : integer := 1;
  constant CFG_DCEN : integer := 1;
  constant CFG_DSETS : integer := 4;
  constant CFG_DSETSZ : integer := 4;
  constant CFG_DLINE : integer := 4;
  constant CFG_DREPL : integer := 0;
  constant CFG_DLOCK : integer := 0;
  constant CFG_DSNOOP : integer := 1 + 1 + 4*1;
  constant CFG_DFIXED : integer := 16#0#;
  constant CFG_DLRAMEN : integer := 0;
  constant CFG_DLRAMADDR: integer := 16#8F#;
  constant CFG_DLRAMSZ : integer := 1;
  constant CFG_MMUEN : integer := 1;
  constant CFG_ITLBNUM : integer := 8;
  constant CFG_DTLBNUM : integer := 2;
  constant CFG_TLB_TYPE : integer := 1 + 0*2;
  constant CFG_TLB_REP : integer := 1;
  constant CFG_DSU : integer := 1;
  constant CFG_ITBSZ : integer := 2;
  constant CFG_ATBSZ : integer := 2;
  constant CFG_LEON3FT_EN : integer := 0;
  constant CFG_IUFT_EN : integer := 0;
  constant CFG_FPUFT_EN : integer := 0;
  constant CFG_RF_ERRINJ : integer := 0;
  constant CFG_CACHE_FT_EN : integer := 0;
  constant CFG_CACHE_ERRINJ : integer := 0;
  constant CFG_LEON3_NETLIST: integer := 0;
  constant CFG_DISAS : integer := 0 + 0;
  constant CFG_PCLOW : integer := 2;

-- AMBA settings
  constant CFG_DEFMST : integer := (0);
  constant CFG_RROBIN : integer := 1;
  constant CFG_SPLIT : integer := 1;
  constant CFG_AHBIO : integer := 16#FFF#;
  constant CFG_APBADDR : integer := 16#800#;
  constant CFG_AHB_MON : integer := 0;
  constant CFG_AHB_MONERR : integer := 0;
  constant CFG_AHB_MONWAR : integer := 0;

-- DSU UART
  constant CFG_AHB_UART : integer := 1;

-- JTAG based DSU interface
  constant CFG_AHB_JTAG : integer := 1;

-- Ethernet DSU
  constant CFG_DSU_ETH : integer := 1 + 0;
  constant CFG_ETH_BUF : integer := 2;
  constant CFG_ETH_IPM : integer := 16#C0A8#;
  constant CFG_ETH_IPL : integer := 16#0034#;
  constant CFG_ETH_ENM : integer := 16#020000#;
  constant CFG_ETH_ENL : integer := 16#000034#;

-- LEON2 memory controller
  constant CFG_MCTRL_LEON2 : integer := 1;
  constant CFG_MCTRL_RAM8BIT : integer := 0;
  constant CFG_MCTRL_RAM16BIT : integer := 1;
  constant CFG_MCTRL_5CS : integer := 0;
  constant CFG_MCTRL_SDEN : integer := 0;
  constant CFG_MCTRL_SEPBUS : integer := 0;
  constant CFG_MCTRL_INVCLK : integer := 0;
  constant CFG_MCTRL_SD64 : integer := 0;
  constant CFG_MCTRL_PAGE : integer := 0 + 0;

-- DDR controller
  constant CFG_DDR2SP : integer := 1;
  constant CFG_DDR2SP_INIT : integer := 1;
  constant CFG_DDR2SP_FREQ : integer := (190);
  constant CFG_DDR2SP_TRFC : integer := (130);
  constant CFG_DDR2SP_DATAWIDTH : integer := (64);
  constant CFG_DDR2SP_COL : integer := (10);
  constant CFG_DDR2SP_SIZE : integer := (256);
  constant CFG_DDR2SP_DELAY0 : integer := (0);
  constant CFG_DDR2SP_DELAY1 : integer := (0);
  constant CFG_DDR2SP_DELAY2 : integer := (0);
  constant CFG_DDR2SP_DELAY3 : integer := (0);
  constant CFG_DDR2SP_DELAY4 : integer := (0);
  constant CFG_DDR2SP_DELAY5 : integer := (0);
  constant CFG_DDR2SP_DELAY6 : integer := (0);
  constant CFG_DDR2SP_DELAY7 : integer := (0);

-- AHB status register
  constant CFG_AHBSTAT : integer := 1;
  constant CFG_AHBSTATN : integer := (1);

-- AHB ROM
  constant CFG_AHBROMEN : integer := 0;
  constant CFG_AHBROPIP : integer := 0;
  constant CFG_AHBRODDR : integer := 16#000#;
  constant CFG_ROMADDR : integer := 16#000#;
  constant CFG_ROMMASK : integer := 16#E00# + 16#000#;

-- AHB RAM
  constant CFG_AHBRAMEN : integer := 0;
  constant CFG_AHBRSZ : integer := 1;
  constant CFG_AHBRADDR : integer := 16#A00#;

-- Gaisler Ethernet core
  constant CFG_GRETH : integer := 1;
  constant CFG_GRETH1G : integer := 0;
  constant CFG_ETH_FIFO : integer := 32;

-- UART 1
  constant CFG_UART1_ENABLE : integer := 1;
  constant CFG_UART1_FIFO : integer := 4;

-- LEON3 interrupt controller
  constant CFG_IRQ3_ENABLE : integer := 1;
  constant CFG_IRQ3_NSEC : integer := 0;

-- Modular timer
  constant CFG_GPT_ENABLE : integer := 1;
  constant CFG_GPT_NTIM : integer := (2);
  constant CFG_GPT_SW : integer := (8);
  constant CFG_GPT_TW : integer := (32);
  constant CFG_GPT_IRQ : integer := (8);
  constant CFG_GPT_SEPIRQ : integer := 1;
  constant CFG_GPT_WDOGEN : integer := 0;
  constant CFG_GPT_WDOG : integer := 16#0#;

-- GPIO port
  constant CFG_GRGPIO_ENABLE : integer := 1;
  constant CFG_GRGPIO_IMASK : integer := 16#0FFFE#;
  constant CFG_GRGPIO_WIDTH : integer := (32);

-- I2C master
  constant CFG_I2C_ENABLE : integer := 1;

-- VGA and PS2/ interface
  constant CFG_KBD_ENABLE : integer := 1;
  constant CFG_VGA_ENABLE : integer := 0;
  constant CFG_SVGA_ENABLE : integer := 1;

-- AMBA System ACE Interface Controller
  constant CFG_GRACECTRL : integer := 1;

-- GRLIB debugging
  constant CFG_DUART : integer := 0;

end;
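Several constants in this listing can be cross-checked against the GRMON report earlier in this appendix. The Python sketch below decodes three of them: the system clock from the multiplier and divider, the cache sizes from the set count and set size, and the EDCL IP address from CFG_ETH_IPM and CFG_ETH_IPL. The 100 MHz board input clock is an assumption that is not stated in the listing, and the function names are illustrative only.

```python
# Values copied from the config.vhd listing.
CFG_CLKMUL, CFG_CLKDIV = 8, 10
CFG_ISETS, CFG_ISETSZ = 4, 8   # instruction cache: sets (ways), kbyte per set
CFG_DSETS, CFG_DSETSZ = 4, 4   # data cache: sets (ways), kbyte per set
CFG_ETH_IPM, CFG_ETH_IPL = 0xC0A8, 0x0034

# ASSUMPTION: board input clock in MHz; not given in the listing itself.
BOARD_CLOCK_MHZ = 100.0


def system_clock_mhz(board_mhz: float, mul: int, div: int) -> float:
    """System clock produced by the clock generator: input * mul / div."""
    return board_mhz * mul / div


def cache_kbyte(sets: int, set_size_kbyte: int) -> int:
    """Total cache size as number of sets times the size of each set."""
    return sets * set_size_kbyte


def edcl_ip(ip_msb: int, ip_lsb: int) -> str:
    """Join the two 16-bit halves into a dotted-quad IPv4 address."""
    octets = [(ip_msb >> 8) & 0xFF, ip_msb & 0xFF, (ip_lsb >> 8) & 0xFF, ip_lsb & 0xFF]
    return ".".join(str(octet) for octet in octets)


if __name__ == "__main__":
    print("system clock:", system_clock_mhz(BOARD_CLOCK_MHZ, CFG_CLKMUL, CFG_CLKDIV), "MHz")
    print("icache:", cache_kbyte(CFG_ISETS, CFG_ISETSZ), "kbyte")
    print("dcache:", cache_kbyte(CFG_DSETS, CFG_DSETSZ), "kbyte")
    print("EDCL IP:", edcl_ip(CFG_ETH_IPM, CFG_ETH_IPL))
```

Under the stated clock assumption the result is 80 MHz, and the decoded address is 192.168.0.52. Both agree with the "detected frequency" and "edcl ip" lines in the GRMON transcript, as do the 4 * 8 kbyte instruction cache and 4 * 4 kbyte data cache.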

SnapGear software configuration: # make xconfig

config.vendor file for SnapGear build of multicore Leon3 w/ Mul/Div instructions

#
# Automatically generated make config: don't edit
#

#
# Vendor/Product Selection
#

#
# Select the Vendor you wish to target
#
CONFIG_DEFAULTS_GAISLER=y

#
# Select the Product you wish to target
#
# CONFIG_DEFAULTS_GAISLER_LEON2MMU is not set
CONFIG_DEFAULTS_GAISLER_LEON3MMU=y

#
# Select the options for a selected Product
#

#
# Gaisler/Leon2/3 MMU options
#
CONFIG_DEFAULTS_GAISLER_LEON2_MV8=y
# CONFIG_DEFAULTS_GAISLER_LEON2_FPU is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_FREQ_20000 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_FREQ_25000 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_FREQ_30000 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_FREQ_40000 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_FREQ_50000 is not set
CONFIG_DEFAULTS_GAISLER_LEON2_FREQ_80000=y
# CONFIG_DEFAULTS_GAISLER_LEON2_FREQ_100000 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_FREQ_101000 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_BAUDRATE_9600 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_BAUDRATE_19200 is not set
CONFIG_DEFAULTS_GAISLER_LEON2_BAUDRATE_38400=y
# CONFIG_DEFAULTS_GAISLER_LEON2_LOOPBACK is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_UARTFLOWCTRL is not set
# CONFIG_KERNEL_ROOTMEM_ROMFS is not set
CONFIG_KERNEL_ROOTMEM_INITRAMFS=y
# CONFIG_KERNEL_ROOTMEM_NONE is not set
CONFIG_KERNEL_INITRAMFS_SOURCE=""
# CONFIG_KERNEL_INITRAMFS_SOURCE_EXCLUSIVE is not set
CONFIG_KERNEL_INIT_PATH="/sbin/init"
CONFIG_KERNEL_COMMAND_LINE="console=ttyS0,38400"
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_8k is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_16k is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_32k is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_64k is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_128k is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_256k is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_512k is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_1mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_2mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_4mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_8mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_16mb is not set
CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_32mb=y
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_64mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_128mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROMSIZE_256mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_0 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_1 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_2 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_3 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_4 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_5 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_6 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_7 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_8 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_9 is not set
CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_10=y
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_11 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_12 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_13 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_14 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_rws_15 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_0 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_1 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_2 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_3 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_4 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_5 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_6 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_7 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_8 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_9 is not set
CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_10=y
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_11 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_12 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_13 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_14 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_wws_15 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_ROM_WE is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_USE_SRAM is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_USE_SDRAM is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_USE_DDRSDRAM is not set
CONFIG_DEFAULTS_GAISLER_LEON2_USE_DDR2SDRAM=y

#
# SDRam options
#
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_4mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_8mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_16mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_32mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_64mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_128mb is not set
CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_256mb=y
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_512mb is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMSIZE_1024mb is not set
CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMBANKS_1=y
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMBANKS_2 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMBANKS_3 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMBANKS_4 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMCOL_256 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMCOL_512 is not set
CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMCOL_1024=y
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMCOL_2048 is not set
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAMCOL_4096 is not set
CONFIG_DEFAULTS_GAISLER_LEON2_SDRAM_refresh="7.8"
CONFIG_DEFAULTS_GAISLER_LEON2_DDR2SDRAM_FREQ=190
# CONFIG_DEFAULTS_GAISLER_LEON2_SDRAM_SRAMKEEP is not set

#
# Kernel image position
#
# CONFIG_KERNEL_PHYSICAL_ADDR_SET is not set

#
# Kernel/Library/Defaults Selection
#
CONFIG_DEFAULTS_KERNEL_2_6_21_1=y
# CONFIG_DEFAULTS_KERNEL_2_6_29 is not set
CONFIG_DEFAULTS_LIBC_GLIBC_FROM_COMPILER=y
# CONFIG_DEFAULTS_LIBC_UCLIBC_FROM_COMPILER is not set
# CONFIG_DEFAULTS_LIBC_NONE is not set
# CONFIG_DEFAULTS_OVERRIDE is not set
# CONFIG_DEFAULTS_KERNEL is not set
# CONFIG_DEFAULTS_VENDOR is not set
# CONFIG_DEFAULTS_VENDOR_UPDATE is not set

#
# Template Configurations
#
CONFIG_TEMPLATE_LEON3MMU_NONE=y
# CONFIG_TEMPLATE_LEON3MMU_GR_L4ITX_BUSYBOX is not set
# CONFIG_TEMPLATE_LEON3MMU_GR_L4ITX_SERIAL is not set
# CONFIG_TEMPLATE_LEON3MMU_GR_L4ITX_VIDEO is not set
# CONFIG_TEMPLATE_LEON3MMU_GR_XC3S_1500 is not set
# CONFIG_TEMPLATE_LEON3MMU_HAPS_51 is not set
# CONFIG_TEMPLATE_LEON3MMU_LEON3_ALTERA_EP2S60_DDR is not set
# CONFIG_TEMPLATE_LEON3MMU_LEON3_XILINX_ML509 is not set
# CONFIG_TEMPLATE_LEON3MMU_NETFILTER is not set
# CONFIG_TEMPLATE_LEON3MMU_NFS_ROOT is not set
# CONFIG_TEMPLATE_LEON3MMU_VGA_PS2 is not set
# CONFIG_TEMPLATES_UPDATE is not set
CONFIG_VENDOR=gaisler
CONFIG_PRODUCT=leon3mmu
CONFIG_LINUXDIR=linux-2.6.21.1
CONFIG_LIBCDIR=glibc-from-compiler/build
CONFIG_LANGUAGE=

config.linux file for SnapGear build of multicore Leon3 w/ Mul/Div instructions

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21.1
# Thu May 24 19:32:10 2012
#
CONFIG_MMU=y
CONFIG_HIGHMEM=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
# CONFIG_POSIX_MQUEUE is not set
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
# CONFIG_IKCONFIG is not set
# CONFIG_CPUSETS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE="initramfs-root.txt"
CONFIG_INITRAMFS_ROOT_UID=0
CONFIG_INITRAMFS_ROOT_GID=0
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
# CONFIG_KALLSYMS is not set
# CONFIG_HOTPLUG is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
# CONFIG_EPOLL is not set
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

# # Loadable module support # # CONFIG_MODULES is not set

# # Block layer # CONFIG_BLOCK=y # CONFIG_LBD is not set # CONFIG_BLK_DEV_IO_TRACE is not set # CONFIG_LSF is not set

# # IO Schedulers # CONFIG_IOSCHED_NOOP=y # CONFIG_IOSCHED_AS is not set # CONFIG_IOSCHED_DEADLINE is not set CONFIG_IOSCHED_CFQ=y # CONFIG_DEFAULT_AS is not set # CONFIG_DEFAULT_DEADLINE is not set CONFIG_DEFAULT_CFQ=y # CONFIG_DEFAULT_NOOP is not set CONFIG_DEFAULT_IOSCHED="cfq"

#
# General machine setup
#
CONFIG_SMP=y
CONFIG_NR_CPUS=32
CONFIG_SPARC=y
CONFIG_SPARC32=y
CONFIG_SBUS=y
CONFIG_SBUSCHAR=y
CONFIG_SERIAL_CONSOLE=y
CONFIG_SUN_AUXIO=y
CONFIG_SUN_IO=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_SUN_PM=y
CONFIG_LEON=y
CONFIG_PAGE_SIZE_LEON_4K=y
# CONFIG_PAGE_SIZE_LEON_8K is not set
# CONFIG_PAGE_SIZE_LEON_16K is not set
CONFIG_LEON_3=y

#
# Grlib: Amba device driver configuration
#
CONFIG_AMBA_PROC=y
# CONFIG_AMBA_PNP_PRINT is not set

#
# Vendor Gaisler
#
# CONFIG_GRLIB_GAISLER_GPIO is not set
CONFIG_GRLIB_GAISLER_APBUART=y
CONFIG_GRLIB_GAISLER_APBUART_CONSOLE=y
CONFIG_GRLIB_GAISLER_GRETH=y
CONFIG_GRLIB_GAISLER_GRETH_MACMSB=00007A
CONFIG_GRLIB_GAISLER_GRETH_MACLSB=CC0012

#
# Vendor Opencores
#
# CONFIG_GRLIB_OPENCORES_ETHERMAC is not set
# CONFIG_PCI is not set
# CONFIG_SUN_OPENPROMFS is not set
# CONFIG_SPARC_LED is not set
CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_AOUT is not set
# CONFIG_BINFMT_MISC is not set
# CONFIG_SUNOS_EMUL is not set
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_RESOURCES_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
# CONFIG_NETDEBUG is not set
# CONFIG_PACKET is not set
CONFIG_UNIX=y
CONFIG_XFRM=y
# CONFIG_XFRM_USER is not set
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_PNP=y
# CONFIG_IP_PNP_DHCP is not set
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=y
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
CONFIG_INET6_XFRM_MODE_TRANSPORT=y
CONFIG_INET6_XFRM_MODE_TUNNEL=y
CONFIG_INET6_XFRM_MODE_BEET=y
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=y
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set

#
# DCCP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set

#
# TIPC Configuration (EXPERIMENTAL)
#
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_IEEE80211 is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set

#
# Connector - unified userspace <-> kernelspace linker
#
# CONFIG_CONNECTOR is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
# CONFIG_PARPORT is not set

#
# Plug and Play support
#
# CONFIG_PNPACPI is not set

#
# Block devices
#
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_RAM_BLOCKSIZE=1024
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_XILINX_SYSACE is not set
# CONFIG_ATA_OVER_ETH is not set

#
# Misc devices
#

#
# ATA/ATAPI/MFM/RLL support
#
# CONFIG_IDE is not set

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
# CONFIG_SCSI is not set
# CONFIG_SCSI_NETLINK is not set

#
# Serial ATA (prod) and Parallel ATA (experimental) drivers
#
# CONFIG_ATA is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set

#
# Fusion MPT device support
#
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#

#
# I2O device support
#

#
# Network device support
#
CONFIG_NETDEVICES=y
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set

#
# PHY device support
#
# CONFIG_PHYLIB is not set

#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_SUNLANCE is not set
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNBMAC is not set
# CONFIG_SUNQE is not set
# CONFIG_SMC91X is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_MYRI_SBUS is not set

#
# Ethernet (10000 Mbit)
#

#
# Token Ring devices
#

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#

#
# Non-8250 serial port support
#
CONFIG_SERIAL_SUNCORE=y
# CONFIG_SERIAL_SUNZILOG is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256

#
# IPMI
#
# CONFIG_IPMI_HANDLER is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_RAW_DRIVER is not set

#
# TPM devices
#
# CONFIG_TCG_TPM is not set

#
# I2C support
#
# CONFIG_I2C is not set

#
# SPI support
#
# CONFIG_SPI is not set
# CONFIG_SPI_MASTER is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Hardware Monitoring support
#
# CONFIG_HWMON is not set
# CONFIG_HWMON_VID is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_SM501 is not set

#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set

#
# Graphics support
#
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set
# CONFIG_FB is not set

#
# Console display driver support
#
# CONFIG_VGA_CONSOLE is not set
# CONFIG_PROM_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y

#
# Sound
#
# CONFIG_SOUND is not set

#
# HID Devices
#
CONFIG_HID=y
# CONFIG_HID_DEBUG is not set


#
# USB support
#
CONFIG_USB_ARCH_HAS_HCD=y
# CONFIG_USB_ARCH_HAS_OHCI is not set
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB_ARCH_HAS_UHCI=y
# CONFIG_USB is not set

#
# NOTE: USB_STORAGE enables SCSI, and 'SCSI disk support'
#

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# MMC/SD Card support
#
# CONFIG_MMC is not set

#
# LED devices
#
# CONFIG_NEW_LEDS is not set

#
# LED drivers
#

#
# LED Triggers
#

#
# InfiniBand support
#

#
# EDAC - error detection and reporting (RAS) (EXPERIMENTAL)
#

#
# Real Time Clock
#
# CONFIG_RTC_CLASS is not set

#
# DMA Engine support
#
# CONFIG_DMA_ENGINE is not set

#
# DMA Clients
#

#
# DMA Devices
#

#
# Auxiliary Display support
#

#
# Virtualization
#


#
# Misc Linux/SPARC drivers
#
# CONFIG_SUN_OPENPROMIO is not set
# CONFIG_SUN_MOSTEK_RTC is not set
# CONFIG_SUN_BPP is not set
# CONFIG_SUN_VIDEOPIX is not set
# CONFIG_TADPOLE_TS102_UCTRL is not set
# CONFIG_SUN_JSFLASH is not set

#
# Unix98 PTY support
#
CONFIG_UNIX98_PTY_COUNT=256

#
# File systems
#
# CONFIG_EXT2_FS is not set
# CONFIG_EXT3_FS is not set
# CONFIG_EXT4DEV_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_MINIX_FS is not set
CONFIG_ROMFS_FS=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_DNOTIFY is not set
CONFIG_AUTOFS_FS=y
CONFIG_AUTOFS4_FS=y
# CONFIG_FUSE_FS is not set

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
# CONFIG_TMPFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y
# CONFIG_CONFIGFS_FS is not set

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
# CONFIG_NFS_DIRECTIO is not set
# CONFIG_NFSD is not set
# CONFIG_ROOT_NFS is not set
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_9P_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_SUN_PARTITION=y

#
# Native Language Support
#
# CONFIG_NLS is not set

#
# Distributed Lock Manager
#
# CONFIG_DLM is not set

#
# Instrumentation Support
#
# CONFIG_PROFILING is not set

#
# Kernel hacking
#
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_MUST_CHECK=y
# CONFIG_MAGIC_SYSRQ is not set
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_DEBUG_FS is not set
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_LOG_BUF_SHIFT=14
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_DEBUG_KOBJECT is not set
# CONFIG_DEBUG_HIGHMEM is not set
# CONFIG_DEBUG_BUGVERBOSE is not set
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_LIST is not set
CONFIG_FORCED_INLINING=y
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_DEBUG_STACK_USAGE is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set

#
# Cryptographic options
#
# CONFIG_CRYPTO is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC32=y
# CONFIG_LIBCRC32C is not set
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
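
When reproducing this build, it can be useful to confirm that a regenerated config.linux still carries the settings the multicore platform depends on, such as CONFIG_SMP and CONFIG_NR_CPUS from the listing above. The following is a minimal sketch of such a check, not part of the original tool flow. It assumes the file is in the standard one-option-per-line layout; the function name parse_kconfig and the SAMPLE text are hypothetical and used only for illustration.

```python
import re


def parse_kconfig(text):
    """Map each CONFIG_ option to its value; 'is not set' options map to None.

    parse_kconfig is a hypothetical helper, not part of SnapGear or the kernel.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set"
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = None
            continue
        # Enabled options appear as assignments: CONFIG_FOO=value
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2).strip('"')
    return options


# A short excerpt in the same layout as the listing above (illustrative only).
SAMPLE = """\
#
# General machine setup
#
CONFIG_SMP=y
CONFIG_NR_CPUS=32
CONFIG_LEON=y
# CONFIG_PAGE_SIZE_LEON_8K is not set
CONFIG_DEFAULT_IOSCHED="cfq"
"""

opts = parse_kconfig(SAMPLE)
print("SMP enabled:", opts["CONFIG_SMP"] == "y")
print("Max CPUs:", opts["CONFIG_NR_CPUS"])
```

In practice the SAMPLE string would be replaced by the contents of the generated file, and the same dictionary lookup could be applied to any other option in the listing.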


Bibliography

About OpenSPARC. (2012, April 8). Retrieved from OpenSPARC: http://www.opensparc.net/about.html

Aeroflex Gaisler. (2008, May 14). LEON3/GRLIB Product Brief. Retrieved from Aeroflex Gaisler Web site: http://www.gaisler.com/doc/Leon3%20Grlib%20folder.pdf

Aeroflex Gaisler. (2011, June). GRLIB IP Core User’s Manual, Version 1.1.0 - B4108. Retrieved from Aeroflex Gaisler Web site: http://www.gaisler.com/products/grlib/grip.pdf

Altera Corporation. (2011, May). Cyclone V Device Family Advance Information Brief. Retrieved from Altera Web site: http://www.altera.com/literature/hb/cyclone-v/cyv_51001.pdf?GSA_pos=9&WT.oss_r=1&WT.oss=partial%20reconfig

Altera Corporation. (2012). Stratix V FPGAs: Ultimate Flexibility Through Partial and Dynamic Reconfiguration. Retrieved March 30, 2012, from www.altera.com: http://www.altera.com/devices/fpga/stratix-fpgas/stratix-v/overview/partial-reconfiguration/stxv-part-reconfig.html

Bobda, C. (2007). Introduction to Reconfigurable Computing - Architectures, Algorithms, and Applications. Dordrecht, The Netherlands: Springer.

Craven, S., Patterson, C., & Athanas, P. (2005, September 7-9). Configurable Soft Processor Arrays Using the OpenFire Processor. Unpublished paper presented at 2005 MAPLD International Conference. Washington, D.C.

Dye, D. (2011, July 6). Partial Reconfiguration of Xilinx FPGAs Using ISE Design Suite. Retrieved from Xilinx Web site: http://www.xilinx.com/support/documentation/white_papers/wp374_Partial_Reconfig_Xilinx_FPGAs.pdf

Freescale, Inc. (2008, June). End-use Applications for Multicore Processors - Freescale QorIQ Communications Platform P1, P2 and P4 Series. Retrieved April 2, 2012, from Freescale Web site: http://cache.freescale.com/files/32bit/doc/white_paper/QORIQ_ENDUSE_APPLICATIONS.pdf

Ganssle, J. (2008, February). Is multicore hype or reality? Embedded Systems Design, pp. 47-52.

Guccione, S., Levi, D., & Sundararajan, P. (1999). JBits: Java based interface for reconfigurable computing. Proceedings of 2nd Annual Military and Aerospace Applications for Programmable Devices and Technologies. Laurel, Maryland.


Hauck, S., & DeHon, A. (2008). Reconfigurable Computing: The Theory and Practice of FPGA-Based Computing. Burlington, MA: Morgan Kaufmann Publishers.

Hoffman, J. C., & Pattichis, M. S. (2011). A High Speed Dynamic Partial Reconfiguration Controller Using Direct Memory Access Through A Multi-Port Memory Controller and Over-clocking with Active Feedback. International Journal of Reconfigurable Computing, pp. 1-10.

Khalaf, M., & Jagtiani, A. (2011, June). Making Hardware more like Software. Embedded Systems Design, pp. 22-27B.

Kleidermacher, D. (2008, January). Is symmetric multiprocessing for you? Embedded Systems Design, pp. 28-32.

Kwok, T. T.-O., & Kwok, Y.-K. (2008). On the Design, Control, and Use of a Reconfigurable Heterogeneous Multi-Core. IEEE International Symposium on Parallel and Distributed Processing, IPDPS (pp. 1-11). Miami, Florida: IEEE.

Maxfield, C. (2006, February 21). The state-of-play in multi-processor and reconfigurable computing. Retrieved February 23, 2006, from Programmable Logic DesignLine: http://eetimes.com/design/programmable-logic/4014810

McQueen, S. R. (2003). Basic DES block cypher IP core. Retrieved from OpenCores.org: http://www.opencores.org/

Morris, K. (2005, July 19). Actel Adds Analog. Retrieved March 13, 2012, from Electronic Engineering Journal: http://www.eejournal.com/archives/articles/20050719_actel/

National Instruments Corp. (2008, June 12). Advantages of the Xilinx Virtex-5 FPGA. Retrieved from National Instruments Web site: http://zone.ni.com/devzone/cda/tut/p/id/7440#toc1

OpenCores: Mission. (n.d.). Retrieved from OpenCores Web site: http://opencores.org/opencores,mission

Oracle Corp. (n.d.). About OpenSPARC. Retrieved April 8, 2012, from OpenSPARC Web site: http://www.opensparc.net/about.html

OSI Mission. (n.d.). Retrieved from Open Source Initiative Web site: http://www.opensource.org/

Papachristou, C., Wolff, F., & Ewing, R. (2005, September 7-9). Reconfigurable and Evolvable Hardware Fabric. Unpublished paper presented at 2005 MAPLD International Conference. Washington, D.C.

Parulkar, I., Wood, A., Hoe, J. C., Falsafi, B., Adve, S. V., Torrellas, J., & Mitra, S. (2008, April). OpenSPARC: An Open Platform for Hardware Reliability Experimentation. Published in the Fourth Workshop on Silicon Errors in Logic - System Effects (SELSE). Austin.

Serres, O., Narayana, V. K., & El-Ghazawi, T. (2011). An Architecture for Reconfigurable Multi-core Explorations. 2011 International Conference on Reconfigurable Computing and FPGAs (pp. 105-110). Cancun, Mexico: IEEE.

Snowden, T. (2006, May 15). Xilinx Unveils 65nm Virtex-5 Family. Retrieved from Xilinx Press Releases: http://www.xilinx.com/prs_rls/2006/silicon_vir/0657v5family.htm

Soft CPU Cores for FPGA. (2009, January 24). Retrieved March 13, 2012, from CORE Technologies: http://www.1-core.com/library/digital/soft-cpu-cores/

Somervill, K. (2008, September 15-18). Lunar Applications in Reconfigurable Computing. Unpublished paper presented at 2008 MAPLD Conference, 14. Annapolis, MD.

Sun, Xilinx Partner on OpenSPARC Evaluation Platform. (2008, September 8). Retrieved from HPCwire Web site: http://www.hpcwire.com/hpcwire/2008-09-08/sun_xilinx_partner_on_opensparc_evaluation_platform.html

Tong, J. G., Anderson, I. D., & Khalid, M. A. (2006). Soft-Core Processors for Embedded Systems. The 18th International Conference on Microelectronics (ICM) (pp. 170-173). Dhahran, Saudi Arabia: IEEE.

Turley, J. (2009, September). Gaming the system -- high-end networking on the Cell processor. Retrieved March 31, 2012, from Embedded Systems Design: http://www.eetimes.com/design/other/4027510/

Virtex-5 OpenSPARC Evaluation Platform. (n.d.). Retrieved from Digilent Web site: http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,795&Prod=XUPV5

Xilinx, Inc. (2009, February 6). Virtex-5 Family Overview. Retrieved from Xilinx Web site: http://www.xilinx.com/support/documentation/data_sheets/ds100.pdf

Xilinx, Inc. (2010, August 20). Virtex-5 FPGA Configuration User Guide. Retrieved from Xilinx Web site: http://www.xilinx.com/support/documentation/user_guides/ug191.pdf
