
LECTURE-5 Features of Zynq

MicroBlaze

MicroBlaze is the principal soft processor type and is supported within both the ISE and Vivado design flows, including the most recent releases. Multiple MicroBlaze instances can be deployed on a single device, if desired. There is no implied licensing cost of using MicroBlaze processors in system designs, commercially or otherwise.

One of the benefits of using a soft-processor is that its configuration is flexible. The MicroBlaze has a number of different architectural options which can be included or excluded from the processor implementation depending on the requirements of the target application.

For example, the FPU can be excluded if the system does not call for floating point computation, thus reducing the footprint of the processor implementation on the FPGA (i.e. the amount of resources it requires). In a more general sense, the configuration of the MicroBlaze can be customised to optimise for operating frequency, performance, or area; alternatively, it can be specified such that a suitable balance between these three metrics is achieved. This is done very easily in Vivado using a configuration wizard.

MicroBlaze resource utilisation varies with configuration, starting at approximately 900 LUTs, 700 FFs, and 2 Block RAMs for the ‘minimum area’ option, and rising to about 3800 LUTs, 3200 FFs, 6 DSP48E1s and 21 Block RAMs for the ‘maximum performance’ configuration.
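To put these figures in context, the short sketch below expresses each footprint as a percentage of a device's programmable logic. It is only illustrative: the device totals are an assumption (the commonly quoted PL resources of a Zynq-7020, with Block RAM counted as 36 Kb blocks), and the DSP count of zero for the ‘minimum area’ option is also assumed, since the text does not quote one.

```python
# Approximate MicroBlaze footprints quoted in the text above.
# ASSUMPTION: the 'minimum area' option uses no DSP48E1 slices (not stated in the text).
MICROBLAZE = {
    "minimum area": {"LUT": 900, "FF": 700, "BRAM": 2, "DSP": 0},
    "maximum performance": {"LUT": 3800, "FF": 3200, "BRAM": 21, "DSP": 6},
}

# ASSUMPTION: PL resources of a Zynq-7020 (Block RAM counted as 36 Kb blocks).
# Substitute the totals for the actual target device.
ZYNQ_7020 = {"LUT": 53200, "FF": 106400, "BRAM": 140, "DSP": 220}


def utilisation(footprint, device):
    """Return the percentage of each device resource used by the footprint."""
    return {name: 100.0 * footprint[name] / device[name] for name in device}


for config, footprint in MICROBLAZE.items():
    shares = utilisation(footprint, ZYNQ_7020)
    summary = ", ".join(f"{name} {pct:.1f}%" for name, pct in shares.items())
    print(f"{config}: {summary}")
```

Under these assumptions even the ‘maximum performance’ configuration occupies a modest share of the device, which is why deploying several MicroBlaze instances is feasible.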


Figure 5-1 Floorplans of ‘minimum area’ (top) and ‘maximum performance’ (bottom) MicroBlaze soft processor implementation

The closest MicroBlaze equivalent to the (dual-core) ARM Cortex-A9 processor would comprise two MicroBlaze instances.

Processing Performance

The maximum frequency attainable by MicroBlaze is dependent on its configuration, which is customizable, and also on other factors such as placement and routing on the PL. To provide a rough indication, a typical MicroBlaze configuration might achieve about 70% of the maximum frequency of the PL, which equates to, at most, two or three hundred MHz. This compares to the ARM processor’s maximum operating frequency of 800MHz to 1GHz.
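The rule of thumb above can be written out as a small calculation. This is a rough sketch only: the PL frequencies in the loop are illustrative assumptions rather than measured values, and the function name is hypothetical.

```python
def estimate_microblaze_fmax(pl_fmax_mhz, fraction=0.70):
    """Rough MicroBlaze clock estimate as a fraction of the PL maximum frequency."""
    return fraction * pl_fmax_mhz


# ASSUMPTION: illustrative PL maximum frequencies, not measured values.
for pl_fmax in (250.0, 350.0, 450.0):
    mb = estimate_microblaze_fmax(pl_fmax)
    # Compare against the 800 MHz to 1 GHz ARM clock range quoted in the text.
    print(f"PL {pl_fmax:.0f} MHz -> MicroBlaze about {mb:.0f} MHz; "
          f"ARM clock is {800 / mb:.1f}x to {1000 / mb:.1f}x higher")
```

Clock frequency alone understates the gap, because the two processors also differ in the work completed per clock cycle, which is what the benchmarks discussed next attempt to capture.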


In order to quantify the performance of the ARM Cortex-A9 and MicroBlaze, and thus compare them, two widely used benchmarks are available:

• DMIPs (Dhrystone Millions of Instructions Per Second): the quantity DMIPs expresses the number of operations per second achieved by the processor when running the Dhrystone standard test application. Dhrystone is a synthetic application (i.e. it is not representative of real work), specifically designed to exercise the processor with a representative set of processor operations.

• CoreMark score: CoreMark establishes a simple numerical ‘score’ for processor performance, which can be directly compared with the scores of other processors. The CoreMark application serves the same purpose as Dhrystone, but its content is tailored to execute a set of operations truer to typical embedded processor usage.
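As a minimal sketch of how a DMIPs figure is derived: by convention, the measured number of Dhrystone iterations per second is divided by 1757, the score of the VAX 11/780 reference machine, and the result is often normalised by clock frequency (DMIPs/MHz) to compare architectures. The measurement used below is hypothetical and does not describe any particular processor.

```python
# By convention, 1 DMIPS corresponds to the VAX 11/780 score of 1757 Dhrystones/s.
VAX_DHRYSTONES_PER_SECOND = 1757.0


def dmips(dhrystones_per_second):
    """Convert a raw Dhrystone rate into DMIPS."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second, clock_mhz):
    """Normalise DMIPS by clock frequency for architecture comparisons."""
    return dmips(dhrystones_per_second) / clock_mhz


# HYPOTHETICAL measurement, chosen only to make the arithmetic easy to follow.
run = {"dhrystones_per_second": 351_400.0, "clock_mhz": 100.0}

print(f"DMIPS:     {dmips(run['dhrystones_per_second']):.1f}")
print(f"DMIPS/MHz: {dmips_per_mhz(**run):.2f}")
```

Normalising by clock frequency separates the architectural contribution from the clock-rate contribution when two processors are compared.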

For a range of reasons, CoreMark is generally considered to be a more robust and realistic benchmark than the older Dhrystone method, and indeed ARM recommends the use of CoreMark.

The ARM processor offers approximately 20 times the processing performance of a single MicroBlaze core.


CoreMark figures can also be obtained to compare the Zynq ARM processor with MicroBlaze.

The figures show a large difference between the capability of the ARM Cortex-A9 processor on Zynq, and that of MicroBlaze.

Other Features and Factors

There are several important differences between the MicroBlaze and ARM Cortex-A9 processors. Among them: the MicroBlaze is a single core processor compared to ARM’s dual-core; the ARM has a richer instruction set than the MicroBlaze; the MicroBlaze FPU implements only single precision floating point, whereas the ARM also supports double precision; and the configuration of the MicroBlaze provides a single level cache, whereas the ARM has a two-level cache with greater capacity. These architectural and functional differences help to account for the difference in performance between the two processor types.

MicroBlaze Micro Controller System

In 2012, a lightweight version of the MicroBlaze was introduced: the MicroBlaze Micro Controller System (MCS). It was designed for controller applications, and has a fixed architecture including an area-optimised MicroBlaze processor, combined with data and program memory and a standard set of peripherals.

A rough cost is 550 - 700 LUTs and 300 - 600 FFs, or more if debug features are included.


PicoBlaze

The PicoBlaze is a microcontroller rather than a processor (i.e. it comprises other facilities besides the processing element, and supports a limited but useful set of operations).

However, it is worth including PicoBlaze here for completeness, and to establish how it differs from the similarly named MicroBlaze. This 8-bit soft microcontroller IP has a very small footprint (a few tens of slices plus program memory) and is capable of implementing finite state machines and other simple control functionality.

The designs for PicoBlaze can be obtained as a download directly from the Xilinx website, and the fileset includes VHDL for the core PicoBlaze controller, together with optional functionality such as UART and SPI interfacing.

Given that it is an 8-bit controller, PicoBlaze functionality is limited, and not comparable with that of a Zynq ARM processor. However, a PicoBlaze instance can run at over 200MHz in Kintex-7 logic fabric, in most cases as fast as the logic it may be controlling.

ARM Cortex-M1

ARM offers a ‘soft-core’ microcontroller, the ARM Cortex-M1, which is optimized for FPGA implementation. In Zynq, this core would therefore be implemented in the PL section of the device to complement the processing undertaken by the ARM Cortex-A9. Like the MicroBlaze, the configuration of the Cortex-M1 can be specified according to user requirements, meaning that the logic resources required to implement the core may be minimized.

Other Processor Types

There are some other FPGA embedded processors which it is useful to be aware of, and these can be categorized as soft and hard processors.

Soft Processors

MicroBlaze is the most prevalent soft processor in Xilinx FPGA and SoC designs, due to both the integrated and extensive support provided for it, and its excellent implementation and performance characteristics. However, it is not the only soft processor available, and third party processor IP is available as an alternative, or to cater for niche applications.

Example third party processors include LEON4, OpenSPARC and OpenRISC. OpenRISC is a collaborative open source project hosted by OpenCores. OpenSPARC is an open-source, 64-bit Reduced Instruction Set Computer (RISC) processor developed by Sun Microsystems.

Hard Processors

The IBM PowerPC® was included as a hard processor in the Virtex-II Pro (released in 2002) and subsequently in a subset of Virtex-4 and Virtex-5 FPGAs. Each of these FPGAs includes either one or two PowerPC (PPC) units.

Figure 5-2 Performance comparison of hard and soft processor options.


Figure 5-3 Performance comparison of hard and soft processor options.

Exploiting the Zynq Architecture and Design Flow

One particularly powerful aspect of the Vivado design flow is its tool for high-level synthesis, Vivado HLS, which allows hardware (destined for implementation on the PL) to be generated from a C-based software description. Changing the realization of system elements represents a different hardware/software partitioning, which may achieve performance or implementation benefits. For example, in Figure 5-4, the functional element F4 has been moved from software to hardware implementation (potentially via the use of HLS), and the element F1 has been shifted from a hardware implementation to a software routine. The adapted system architecture may be found to facilitate increased data throughput, for example.

Figure 5-4


Dynamic Partial Reconfiguration (DPR)

The technique of DPR involves designating a region (or regions) of the PL for reconfiguration during run time. These areas are referred to as Reconfigurable Partitions (RPs), and their functionality can be completely altered while the rest of the PL continues to operate. Importantly, reconfiguration of an RP is achieved without affecting any other part of the PL. There may be multiple RPs on a Zynq or FPGA device, and each RP has a set of Reconfigurable Modules (RMs) associated with it. Here we will focus on a single RP for the benefit of clarity.

An RP may have any arbitrary number of corresponding RMs available, but only one of them occupies the RP at any given time, and hence, functionality is time-multiplexed onto a specific part of the PL. This concept is demonstrated in Figure 5-5.
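The time-multiplexing idea can be illustrated with a purely conceptual software model. This sketch does not use any Xilinx API and does not load real partial bitstreams; the class and the module names are hypothetical, and it only captures the rule that one RM at a time occupies the RP.

```python
class ReconfigurablePartition:
    """Conceptual model of an RP: many candidate RMs, exactly one active at a time."""

    def __init__(self, name, modules):
        self.name = name
        self.modules = set(modules)  # RMs implemented for this RP
        self.active = None           # RM currently occupying the RP

    def load(self, module):
        """Swap in a new RM and return the one it replaced (None if the RP was empty)."""
        if module not in self.modules:
            raise ValueError(f"{module} is not available for {self.name}")
        previous = self.active
        self.active = module
        return previous


# HYPOTHETICAL module names and schedule, for illustration only.
rp = ReconfigurablePartition("RP0", ["RM_A", "RM_B", "RM_C"])
for step, module in enumerate(["RM_A", "RM_C", "RM_B"]):
    replaced = rp.load(module)
    print(f"t{step}: {rp.name} now holds {rp.active} (replaced {replaced})")
```

In a real design the `load` step would correspond to writing a partial bitstream for the chosen RM into the RP while the remainder of the PL keeps running.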

Figure 5-5

DPR Application Example

It is assumed that the software defined radio (SDR) comprises four functional blocks: coding, modulation, a transform, and a digital upconverter. We assume that the last of these components is common to all variations on the architecture, while there are different RMs for each of the others. The appropriate coding RM, modulation RM, and transform RM are chosen at run time and coordinated by the PS.
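One way the PS-side coordination could be organised is sketched below: each radio profile names the RM wanted in each of the three RPs, and only the RPs whose RM actually changes are reconfigured. The profile and RM names are hypothetical placeholders, and the sketch models the selection logic only, not the bitstream loading itself.

```python
# HYPOTHETICAL radio profiles: the RM wanted in each reconfigurable partition.
PROFILES = {
    "profile_a": {"coding": "coding_rm_1",
                  "modulation": "modulation_rm_1",
                  "transform": "transform_rm_1"},
    "profile_b": {"coding": "coding_rm_1",
                  "modulation": "modulation_rm_2",
                  "transform": "transform_rm_2"},
}

# The digital upconverter is common to all profiles, so it is never reconfigured.
STATIC_BLOCKS = ("digital_upconverter",)


def plan_reconfiguration(loaded, target):
    """Return {rp: rm} for every RP whose currently loaded RM differs from the target."""
    return {rp: rm for rp, rm in target.items() if loaded.get(rp) != rm}


loaded = {}  # nothing loaded at start-up
for name in ("profile_a", "profile_b"):
    changes = plan_reconfiguration(loaded, PROFILES[name])
    loaded.update(changes)
    print(f"{name}: reconfigure {sorted(changes)}; static: {list(STATIC_BLOCKS)}")
```

Switching from the first profile to the second leaves the coding RP untouched, since both profiles use the same coding RM.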


Figure 5-6
