Chapter 7

Addressing Application Bottlenecks: Microarchitecture

Microarchitectural performance tuning is one of the most difficult parts of the performance tuning process. In contrast to other tuning activities, it is not immediately clear what the bottlenecks are. Usually, discovering them requires studying processor manuals, which provide the details of the execution flow. Furthermore, a certain understanding of assembly language is needed to reflect the findings back onto the original source code. Each processor model will also have its own microarchitectural characteristics that have to be considered when writing efficient code. In this chapter, we outline some of the general design principles of modern processors that will allow you to understand the do's and don'ts of diagnosing bottlenecks, and to exploit tools that extract the information required to tune software on the microarchitectural level.

Overview of a Modern Processor Pipeline

Let's discuss the most important features of a current CPU core in these beginning few pages. Features that you will find in most cores are pipelining, out-of-order and superscalar execution, data-parallel (SIMD) execution, and branch prediction. We will associate these features with the Intel Xeon E5-2600 series processor (code name Sandy Bridge). To begin, let's briefly define some important terms that we will use extensively in this chapter:

• Register: A small but very fast storage directly connected to the processor core. Since registers must be able to hold memory addresses, they need to be as big as the address space. A 64-bit architecture has 64-bit registers. And there might be additional, larger registers that we will deal with later.


• Assembly language: The one-to-one translation of the machine instructions that the CPU is able to understand. On a Linux system, you can see the assembly language for an object file with the objdump tool. For example,

$ objdump -M intel-mnemonic -d <object file>

The output will look like

48 89 c1 mov rcx,rax

where the numbers on the left are the actual numerical codes of the machine language and the right-hand side is the translation into an assembly language instruction, consisting of a human-readable mnemonic and its arguments. In this example, the translation into English would be "move the contents of register rax to the register rcx." Machine language and assembly language can be thought of as bijective. A compiler is not required; instead, a "translator" is used, called an assembler. The system-supplied assembler on a Linux system is called "as." (If you are interested in the details of assembly language programming, there are good books available, such as Sivarama P. Dandamudi's.1 We also recommend the Intel processor manuals.2)

• Instruction: A machine language command having none, one, or more arguments. One line of assembly language as described above corresponds to a single machine command. The instructions valid for a given processor are defined in the instruction set architecture (ISA). The ISA that 64-bit Intel processors understand is often dubbed x86_64 with various downward-compatible extensions; that is, a processor able to execute Advanced Vector Extensions (AVX) can also execute Streaming SIMD Extensions (SSE), but not vice versa.

• Micro-operation (uop): Instructions are translated into micro-operations in the decoder. This is done in order to formally keep the ISA as is and still be able to change the underlying functionality of a processor. It allows programs written for one ISA to execute on two fundamentally different processor architectures. In the case of Intel processors, this is the Intel64 ISA. The micro-operations are not disclosed, nor are they directly programmable.


Pipelined Execution

Pipelines are the computer-science version of an industrial assembly line, and they are the overarching design principle of a modern CPU core. Thus, all techniques introduced here need to be seen in light of this concept. In a pipeline, throughput performance is gained by exploiting parallelism in the execution of a stream of instructions. Pipeline stages each execute a specialized task. A classic pipeline model used by Tanenbaum3 looks like this:

• Instruction fetch (IF): Loads the instruction indicated by the instruction pointer. The instruction pointer is a CPU register containing the memory address of the next instruction to be executed.
• Instruction decode (ID): The processor parses the instruction and associates it with the functionality to be executed.
• Load operands (LO): Data from the argument(s) are loaded. In general this is data contained in a CPU register, but it might also be a memory address.
• Execution (EX): The instruction is executed. In the case of an arithmetic instruction, the actual computation will be done.
• Write-back results (WB): The computed result is committed to the specified destination, usually a register.

In a non-pipelined approach, each of these steps needs to be completed before another instruction can be fetched. While one of these stages is active, the other ones are idle, which renders most of the processor's capacity unutilized. In a pipelined approach, however, each pipeline stage is active at the same time, leading to a throughput of one instruction per clock cycle (assuming that completion of a stage requires one clock cycle). If you assume that there is no dependence between individual instructions, you can fetch the third instruction while the second one is being decoded; you can load the operands for the first one at the same time, and so on. In this way, no pipeline stage is ever idle, as shown in Figure 7-1.


Figure 7-1. Comparison of pipelined vs. non-pipelined execution of instructions

In a pipeline, although the latency of the individual instruction (the time during which the instruction is individually executed) does not change, the overall throughput increases dramatically.

Let's estimate the time it takes to execute a number of instructions Ninstructions in Nstages stages, where we assume each stage takes the same time Tstage to complete:

Tno-pipeline = Ninstructions × Nstages × Tstage

Tpipeline = Nstages × Tstage + (Ninstructions − 1) × Tstage

The ideal speedup is

S = Tno-pipeline / Tpipeline = Nstages,

assuming an infinite number of instructions. This estimation is highly idealized. If, for some reason, one stage of the pipeline fails to complete in time, the whole pipeline will come to a halt—that's what we call a pipeline stall. Without claiming completeness, some common reasons for pipeline stalls are as follows.

Data Conflicts

Data conflicts arise from the parallel execution of instructions within a pipeline when the results of one instruction are an input argument to another instruction. Data conflicts play an important role when we speak about vectorization, and we will come back to them when we deal with this topic later in the chapter.


We can distinguish three important types of data conflict:

1. Read after write (RAW) or flow dependence: A variable must be read after it was written previously. The use of the updated value cannot be started until the update is completed. Example in C:

var1=5; var2=var1+2;

Clearly, the first line needs to complete before the second can be executed, since otherwise the variable var1 might contain an arbitrary value.

2. Write after read (WAR) or anti-dependence: This is just the opposite of RAW: a variable must be written after it has been read previously. Example in C:

var2=var1+2; var1=5;

The second line must be prohibited from committing the value 5 to var1 before the first completes.

3. Write after write (WAW): This is a situation where the same variable is written with two different values in very close proximity. Example in C:

var1=1; var1=2;

The WAW case is not a problem in a simple in-order pipeline. However, in the context of out-of-order and superscalar execution discussed later, this conflict is possible when the order of instructions can be changed and instructions might even be executed concurrently.

Control Conflicts

The instruction flow often reaches a point where a decision needs to be made about which further instructions will be executed. Such points are unmistakably indicated by explicit conditional constructs (e.g., if-statements) in the source code. Other language constructs require decisions to be made, as well. A common example is loops, where a counter variable is compared with the upper limit of the loop. If the upper limit is not reached, the instruction pointer will be set to the beginning of the loop body and the next iteration is executed. If the upper limit is reached, the execution will continue with the next instruction after the loop body.


Consider a loop summing up all numbers below 100:

for(i=0;i<100;i++) s=s+i;

In assembly language, this is translated into a comparison and a conditional jump:

        mov eax, 0x0    # set the counter variable to zero
loop1:                  # a marker, translated into a position by the assembler
        add ebx, eax    # this is the loop body
        inc eax         # increment the loop counter
        cmp eax, 0x64   # compare if we have reached the upper limit
        jle loop1       # if the comparison yields less or equal, jump to loop1
        ...

With the result of the comparison not yet available, the pipeline cannot reliably execute the jump, and it will stall until the comparison has written back its result. A control conflict can partly be resolved by branch prediction, as discussed later.

Structural Conflicts

A structural conflict appears when more hardware functionality is required than is available in a section of the instruction flow. If you have only, say, four registers, and two instructions that each use two registers to copy data are executing in the pipeline, then a further instruction that requires one of those registers will have to wait until the resources are freed up.

Out-of-order vs. In-order Execution

In the previous section we considered pipelines and how they improve performance when issued instructions are independent and do not show any conflicts. One problem a pipeline does not solve is the following. Consider an instruction, inst1, which shows a true dependence and will have to wait until some result of a number of previous calculations becomes available. The next instruction, inst2, might have all its dependences satisfied but cannot execute because the order of instructions needs to be guaranteed for the instruction flow to deliver consistent results. We call this in-order execution. However, if we could keep track of the order of instructions and reorder them again before the results become apparent when we commit them to the register in the WB (write-back) stage, we could allow inst2 to bypass inst1 and be executed while inst1 is waiting. This idea is implemented in out-of-order execution pipelines. We do not go into detail on how this is exactly implemented, as those details are of no importance here, but there is a considerable amount of literature covering this subject.4,5 What is important, though, are the additional two stages this strategy introduces to the pipeline. First, we need one stage to check the instruction flow for data dependences, record the order, and issue the instruction into execution. We call this stage the dispatch or reservation station. Then, we need a pipeline stage that reorders the instructions after execution and commits the results to registers in the original execution order. We call this stage retirement. On Intel architectures, the out-of-order engine is the part that works on micro-operations (uops).


Superscalar Pipelines

With out-of-order execution in place, there is a straightforward way to increase performance: instead of executing out-of-order in a single pipeline, you could execute the micro-instructions in parallel, in two or more independent pipelines, because their execution is independent from the beginning (see Figure 7-2). The retirement buffer will then take care of the proper ordering across all pipelines after the execution. The level of parallelism that can be achieved in this approach is, of course, limited by the inherent parallelism in the flow of instructions.

Figure 7-2. Parallel execution in a superscalar out-of-order pipeline

SIMD Execution

In practice, we often apply the same instructions to each element of a large dataset. This gives rise to an additional level of parallelism to be exploited. The potential speedup is proportional to the number of data elements we can process in parallel. In hardware, this is implemented by making registers available that are as wide as the number of elements that we want to treat in parallel, which is a significant hardware investment limiting the vector length. Current Intel CPUs support three types of vector extensions: Multimedia Extensions (MMX, 64 bit), various versions of Streaming SIMD Extensions (SSE, 128 bit), and Advanced Vector Extensions (AVX, 256 bit). Chapter 2 discussed the benefits of SIMD execution in detail; see especially Figures 2-2 through 2-8 for AVX.

Speculative Execution: Branch Prediction

A limiting factor for the performance of a pipeline is the presence of branches. Branches appear when the execution can continue at another position in the instruction flow. There are two types of branches: conditional and unconditional. Conditional branches implement decisions taken at runtime, such as "if" and "while." Unconditional branches are created for gotos, functions, and function pointer calls.


When a conditional branch appears in the instruction flow, the pipeline stalls at this position until the condition has been calculated and the next instruction can be fetched, which can mean a big hit to performance. To alleviate this problem, the processor can predict the target of the branch and continue feeding the pipeline with instructions from this point in the code. For this purpose, the processor has a branch target buffer that stores the last branch target taken from this point in the code, along with information about the success of the last predictions. A very simple implementation of branch prediction would be to predict the last observed behavior; that is, we predict "branch taken" if we took it the last time and "not taken" if we didn't take it. This is easy to store in a single bit associated with the position of the branch in the code. Figure 7-3 shows a more advanced branch predictor using two bits. If a branch is taken, the predictor will enter the state 11, "predict taken." While the prediction is true, it will stay in this state. If the prediction is wrong once, it will not immediately predict "not taken" but, rather, go to state 10, which still predicts "taken." Only if the prediction is wrong a second time will the state change to "00" and "predict not taken."

Figure 7-3. A 2-bit branch predictor (see description in text)

These branch prediction schemes are called dynamic predictions. The branch predictor compares the current position in the code with information it already has stored for this particular branch. At the first encounter of a particular branch, this solution will not work because there is no information available in the branch target buffer. For this instance, the branch predictor has a default behavior, called the static prediction. The rules for static branch prediction are as follows:

• A forward conditional branch (an if-statement) is predicted not to be taken
• A backward conditional branch (a loop) is predicted to be taken
• An unconditional branch (a call to or return from a function) is predicted to be taken

Branch predictors can have a very high hit rate, but at some point a prediction will fail. If you consider a loop, the prediction will be wrong when the loop counter has reached the limit. In this case, the processor speculatively executes the wrong code path, and the pipeline will be cleared (called a pipeline flush).

The computation is then restarted at the point before the prediction, using the now-known correct branch target. A pipeline flush can have a serious performance impact. The minimum impact is the time it takes to refill the pipeline—which is, at best, the number of pipeline stages.
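To make the state diagram in Figure 7-3 concrete, the following is a minimal functional model of such a 2-bit predictor in C, implemented here as the common saturating-counter variant. It only illustrates the prediction logic; the names and the software representation are ours, and the real hardware realizes this as a tiny state machine per branch target buffer entry:

/* 2-bit saturating counter: states 0,1 predict "not taken"; states 2,3 predict "taken" */
typedef struct { unsigned state; } predictor_t;

int predict_taken(const predictor_t* p){
    return p->state >= 2;
}

void update(predictor_t* p, int was_taken){
    if (was_taken  && p->state < 3) p->state++;   /* move toward "predict taken"     */
    if (!was_taken && p->state > 0) p->state--;   /* move toward "predict not taken" */
}

A single mispredicted taken branch thus only weakens the prediction; it takes two mispredictions in a row before the predictor switches to "not taken," exactly as described above.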

Memory Subsystem

A problem in the last decade of CPU design was the growing divergence in performance of memory and CPU. While the CPU follows the principle of Moore's Law and doubles the number of components (translating directly into performance) every 18 months, memory performance (that is, the number of bytes delivered per second) grows much more slowly. To have the data readily available when needed by the execution units, fast but small and expensive storage is built directly into the CPU, called a cache. The idea of a cache was inspired by the temporal principle of locality: data that you have used once you will likely use again in the near future. The cache memory, then, stores intermediate copies of data that actually reside in the main memory. Often, more than one cache is present, which is then called a cache hierarchy. Three different cache implementations are used:

• Direct-mapped cache: Each memory address can be stored only in a specific cache line. If a memory address is loaded in a cache line that is already occupied, the previous content of this line is evicted. This approach allows for a much leaner logic to determine a hit and has relatively low access latency. The hit ratios are lower than with the fully associative cache, but this technique is cheaper.
• Fully associative cache: The memory address of the data within the cache is stored alongside. To find where a memory location has been stored in the cache requires a compare operation across the memory addresses stored for each line. Fully associative caches have high hit rates but relatively long access latencies. Also, they require a larger chip space to incorporate the extensive hit logic and are therefore more expensive.
• Set associative cache: This is a compromise between the two aforementioned alternatives. The set associative cache divides the cache into a number of sets (say eight). A cache line is placed into a given set, based on its memory address. Searching within a set, then, is internally fully associative. While cost and chip space stay reasonable, this technique offers a high hit ratio.

Current product lines of Intel processor cores feature 32 Kbyte instruction and data Level 1 (L1) caches and a 256 Kbyte unified Level 2 cache (L2), both eight-way set associative. (Cache latencies and bandwidth have been discussed in Chapter 2.) Even if we cache data entries for fast access by the execution units, many programs stream data through the processor, which exceeds the cache capacity. The cache then becomes useless because you will not find the entry you loaded in the past, as it was already evicted. The cache also adds latency to the loading of such streaming data into the processor.

In modern CPU design, this problem is attacked by preloading data into the caches that is likely to be used next, so that it is readily available. This technique is called prefetching and is extensively used in the Sandy Bridge architecture, which has four different hardware prefetchers.6
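To illustrate how a set associative cache places data, consider the 32 Kbyte, eight-way L1 data cache mentioned above. Assuming the usual 64-byte cache line (the line size is an assumption here, not stated in the text), it has 32768 / (8 × 64) = 64 sets, and the set an address maps to can be computed from the address bits as in the following sketch (the hardware does this implicitly):

#include <stdint.h>

/* 32 KB, 8-way set associative, 64-byte lines: 64 sets */
unsigned l1_set_index(uintptr_t address){
    return (address / 64) % 64;   /* drop the 6 line-offset bits, keep the next 6 bits */
}

Two addresses that are a multiple of 64 × 64 = 4096 bytes apart therefore compete for the same set, which is one source of the cache conflicts mentioned later in this chapter.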

Putting It All Together: A Final Look at the Sandy Bridge Pipeline

Let's now relate the architecture design principles discussed to the block diagram of the Sandy Bridge microarchitecture, as shown in Figure 7-4, and follow an instruction from fetching to retirement. Instructions stored in main memory come in through the 32 KB L1 instruction cache (ICache) at a rate of 16 bytes per cycle and are decoded (there is actually a pre-decode stage) into microinstructions. Up to four instructions can be decoded in one cycle. Instructions that have already been decoded are stored into a uop cache that can hold up to 1536 uops. Then, resources (for example, registers) are allocated, whereby unnecessary structural conflicts are resolved through register renaming. The scheduler can queue up to 56 uops and distribute up to six uops to the execution ports, depending on the functionality requested by the instruction. Instructions are dispatched out-of-order, and a reorder buffer is used to keep track of the original sequence. Upon completion of an instruction, execution results are committed in the right sequence and the instructions retire, up to four per cycle.

Figure 7-4. Block diagram of the Sandy Bridge core


A Top-down Method for Categorizing the Pipeline Performance

The Sandy Bridge pipeline is designed to deliver maximum performance. The target of all software microarchitectural optimization is to keep the pipeline running and avoid pipeline stalls. But how do you actually know what good performance is and what is not? How can you determine where in the pipeline the problems appear? In this section, we introduce an easy scheme to categorize optimization opportunities and determine where the pipeline stalls. The Sandy Bridge core pipeline, as shown in Figure 7-4, is certainly very complicated, with a lot of interacting units. We could look at each unit individually, which gets confusing and is hard to remember; instead, we will consider the broader picture.

Let us start with the most basic performance metric first. The CPI rate (cycles per instruction) is a measure of how well the code has been executing in the pipeline. Because of the superscalar properties of the core architecture, multiple instructions can be executed at the same time. For a current Intel Core or Intel Xeon processor, the limit is four instructions, which corresponds to a CPI rate of 0.25; on average, only a quarter of a cycle is spent on an instruction if four of them are executed and retired. Applications with a CPI rate of 0.25 to 0.7 are considered to be efficient. A CPI rate above the threshold of 1.0 usually indicates that a performance problem might exist and that further analysis should be performed. The CPI rate only tells you how many cycles are spent per executed instruction (it is not a good metric to discuss the usefulness of these instructions!). If you don't reach a good CPI value, the pipeline is not retiring as many instructions as possible and is stalling at some point; see the diagram shown in Figure 7-4.

It is useful to subdivide the Sandy Bridge pipeline into two parts: front end and back end. We define these two parts as follows:

• Front end: fetching of the instruction, the decoding, the branch prediction, and the uops queue
• Back end: renaming, scheduling, and execution (the out-of-order engine)

The front end is responsible for decoding instructions and delivering micro-operations; the back end has to execute and retire them. If the front end can't deliver, the back end will starve. If the back end can't take more uops, the front end will stall. In either of these two cases, no uops will be issued from front to back end. If no uops can be issued, we again have two cases: either we can allocate resources in the back end (registers, ports) or we cannot. If resources are free but no uops are issued, we call such a code front-end bound. If no resources are free (the execution units are busy), we call it back-end bound. This might happen because the code is either core bound (waiting for computation to complete) or memory bound (waiting for memory operations to complete). If uops can be issued into the pipeline, the next question we need to ask is: Do the uops actually retire? If the uops do retire, we have the desired outcome. If they do not, they must have vanished from the pipeline although they were issued; they must have been cleared due to bad speculation. Figure 7-5 outlines the decision tree for this categorization.
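As a quick way to obtain the CPI rate on a Linux system, you can read the cycle and instruction counts with the perf utility, assuming it is installed (Intel VTune Amplifier XE, mentioned later in this chapter, reports the same metrics and much more); the application name below is a placeholder:

$ perf stat -e cycles,instructions ./your_application

Dividing the reported cycles by the reported instructions gives the CPI rate; recent versions of perf also print the inverse, instructions per cycle, directly.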


Figure 7-5. Hierarchical top-down analysis method (Source: Intel 64 and IA-32 Architectures Optimization Reference Manual)

Ideally we would like to see all compute cycles spent in the retired category, although this doesn't mean there is no room for improvement. As for the other categories, let's discuss some common reasons they will appear:

• Front-end bound: Misses in the instruction cache (ICache) or the instruction translation lookaside buffer (ITLB), owing to a large code size, excessive inlining, or loop unrolling. Also, inefficiencies in the decoder, such as length-changing prefixes,7 can be the reason. (See inlining in the later sections "Dealing with Branching" and "Basic Usage and Optimization.")
• Back-end memory bound: Cache misses at all levels—irregular data access, streaming data access, large datasets, cache conflicts, and unaligned data. (See the later section "Optimizing for Vectorization.")
• Back-end core bound: Long latency instructions (divide), chains of dependent instructions, and code that is not vectorizing. (See the later section "Optimizing for Vectorization.")
• Bad speculation: Wrong prediction of branches with resulting pipeline flushes, and short loops. (See the later section "Dealing with Branching.")

Intel Composer XE Usage for Microarchitecture Optimizations

Before we go into details about particular optimization problems, let's review some basic usage of the compiler.


Basic Compiler Usage and Optimization

The first important choice to make is to specify the architecture and ISA extension you want to compile for. If you do not specify anything, the default will be SSE3, which is the one available on all 64-bit platforms. In most cases, you will have a more feature-rich and higher performing set of instructions available on more recent CPUs. The compiler switch -x<arch> will compile only for the architecture specified by <arch> and for downward-compatible, succeeding Intel platforms. We use -xAVX in this chapter because we want to address this instruction set. Please refer to the compiler manual for all options. If you compile on the platform you will be running on, it is easiest to use -xHOST, which will detect and apply the correct architecture automatically. If you need a binary that also runs on alternative architectures, there is a workaround. By specifying -ax instead of -x, you tell the compiler to create a binary for multiple architectures that is auto-dispatching—that is, using the code path that is best for the architecture that the software is currently running on. For example, the compiler command

$ icc -axAVX,SSE4.1

will create a binary that can execute on all Intel CPUs supporting SSE4.1, but it will still run the highest performing code path on the Sandy Bridge and Ivy Bridge processors.

The next basic choice is the optimization level. There are four optimization levels, which can be controlled with the -On switch, where n is between 0 and 3. The -O0 switch turns off optimization completely. The -O1 switch optimizes for speed, but doesn't increase the code size. The default optimization level is -O2, which optimizes for speed, but increases code size through unrolling and inlining. The -O3 switch performs similar optimizations as -O2, but is more aggressive. When using -O3, you'll find that increased inlining and loop unrolling can sometimes lead to slower code because of front-end stalls. It is worth playing with this switch and measuring performance before deciding on the final level to use.
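For example, a typical compile line for a Sandy Bridge system combining the switches discussed above might look as follows (the source and binary names are placeholders):

$ icc -xAVX -O3 -o myapp myapp.c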

Using Optimization and Vectorization Reports to Read the Compiler's Mind

Compiler reports are an essential tool for understanding whether a particular optimization has been done by the compiler. This information is difficult to obtain otherwise, and even advanced performance-monitoring tools like VTune Amplifier XE will not provide it; hence, it is worth spending some time on it here. Note that the reporting features of the compiler changed with Intel Composer XE 2015—some of the functionality described here might not be valid for older versions. Intel Composer XE 2015 can provide extensive optimization reports in six levels, from 0 (no optimization report) to 5 (maximum). Although the level 5 report provides the most insight, it generates a lot of information for large codes.


The reports contain information for a total of nine phases, if applicable:

• LOOP: High-level loop optimization
• PAR: Auto-parallelization
• VEC: Vectorization
• OPENMP: OpenMP thread parallelization
• OFFLOAD: Offloading to the Intel Xeon Phi co-processor
• IPO: Inter-procedural optimization
• PGO: Profile-guided optimization
• CG: Code generation
• TCOLLECT: Trace collection in MPI parallelized programs

For the scope of this chapter, three of these phases are the most interesting: LOOP, VEC, and IPO. For this first example, consider a program implementing a square matrix-matrix multiplication and inspect the optimization report produced by the compiler. The function for doing the computation is shown in Listing 7-1.

Listing 7-1. Square Matrix Multiplication in C/C++

void squaregemm(int size, double* a, double* b, double* c){
    for(int i=0;i<size;i++){
        for(int j=0;j<size;j++){
            for(int k=0;k<size;k++){
                c[i*size+j]+=a[i*size+k]*b[k*size+j];
            }
        }
    }
}

Let’s look at the optimization report of the compiler generated with -opt-report5. First, there is a report header summarizing the setting for IPO, inlining, and the inlined functions. After this header information, the optimization report for the code starts, usually with the main routine:

Begin optimization report for: main

Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (main) [1/2=50.0%]
  -> printf(EXTERN)
  -> rand(EXTERN)
  -> INLINE: squaregemm(int, double*, double*, double*)() (isz = 57) (sz = 68 (33+35))


  -> operator new[](unsigned long)(EXTERN)
  -> operator new[](unsigned long)(EXTERN)
  -> operator new[](unsigned long)(EXTERN)

The function squaregemm that we have defined is inlined; all other functions (not shown in Listing 7-1) for which no code could be found are marked extern, such as rand() or printf(). The numbers behind the inlined function summarize the increase of the code size. In this case, the size of the calling function plus the called function is 68 = 33 + 35, whereas the size of the inlined function is only 57, owing to further optimizations. The next interesting point is the optimization report for the squaregemm function:

... Begin optimization report for: squaregemm(int, double*, double*, double*)

Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (squaregemm(int, double*, double*, double*)) [2/2=100.0%] ...

The function squaregemm was inlined.

... Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at main1.cpp(5,3)
   remark #25448: Loopnest Interchanged : ( 1 2 3 ) --> ( 1 3 2 )
...

The compiler has changed the order of the loops from i,j,k to i,k,j to provide better conditions for vectorization.

... remark #15145: vectorization support: unroll factor set to 4 ...

The last line indicates the compiler has unrolled the loop by four iterations. Checking the assembly output of objdump -d, we indeed find a vectorized version of the loop that is fourfold unrolled (four AVX vector multiplies and four AVX vector adds):

401070: c5 fd 59 da             vmulpd %ymm2,%ymm0,%ymm3
401074: c5 fd 59 fe             vmulpd %ymm6,%ymm0,%ymm7
401078: c4 41 7d 59 da          vmulpd %ymm10,%ymm0,%ymm11
40107d: c4 41 7d 59 fe          vmulpd %ymm14,%ymm0,%ymm15
401082: c4 c1 65 58 24 d4       vaddpd (%r12,%rdx,8),%ymm3,%ymm4
401088: c4 41 45 58 44 d4 20    vaddpd 0x20(%r12,%rdx,8),%ymm7,%ymm8
40108f: c4 41 25 58 64 d4 40    vaddpd 0x40(%r12,%rdx,8),%ymm11,%ymm12
401096: c4 c1 05 58 4c d4 60    vaddpd 0x60(%r12,%rdx,8),%ymm15,%ymm1


So you should now have an idea of what type of information the report creates. We have deliberately left out quite a number of lines so as to keep this readable. The original report for this very short program with a single function call is about 200 lines at report level 5. Very often this is too much detail, as you might be interested in only one function in a file or in a particular phase of the report. In this case, you can specify a filter—for instance, for a function:

-opt-report-filter="squaregemm"

Or you can indicate a phase—for instance, vectorization:

-opt-report-phase=vec

The phase must be one of CG, IPO, LOOP, OFFLOAD, OPENMP, PAR, PGO, TCOLLECT, VEC, or all, as described earlier.
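Putting it together, a combined invocation that produces only the vectorization report for our example might look like the following; the option spellings follow the text above, and main1.cpp is the source file from the report excerpt:

$ icc -c -xAVX -O2 -opt-report5 -opt-report-phase=vec main1.cpp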

Optimizing for Vectorization

SIMD vectorization is one of the main sources of performance for Intel CPUs. There are many ways to support vectorization:

• Auto-vectorization: Expressing code in a way the compiler can easily recognize vectorizable code
• User-assisted vectorization: Indicating vectorization opportunities to the compiler, or even forcing the compiler into vectorization via annotation by compiler pragmas
• Language extensions: Expressing vectorization explicitly in a high-level language

We will put a strong focus on loop vectorization, since this is the most common, but it's not the only source of vectorization.

■■Note this section addresses the front-end/back-end categories of the top-down method for the pipeline performance.

The AVX Instruction Set

The Sandy Bridge architecture introduced a set of new instructions operating on 256-bit vector registers (see also the section "Process More Data with SIMD Parallelism" in Chapter 2), called Advanced Vector Extensions (AVX). AVX supersedes the 128-bit SSE instruction set extension introduced in the Pentium processor line and doubles the floating-point performance of the CPU.


AVX instructions can operate on sixteen 256-bit vector registers ymm0-ymm15. In contrast to its predecessor SSE, which only allowed two arguments for each instruction, AVX introduces a three-argument instruction format:

<destination>, <source1>, <source2>

This allows for non-destructive operations (none of the sources are altered) and avoids frequent save operations necessary to preserve the contents of a vector register, as well as reduces register pressure (the shortage of registers). Examples of AVX functionality for 256-bit vectors include the following (see illustrations of some vector functions in Figure 7-6):

• Loading and storing of aligned and unaligned data. The operations may be masked, so that a load ranging into an unallocated memory range does not cause a fault.
• Broadcasting of a memory element into all elements of a vector.
• Elementary arithmetic operations of addition, subtraction, multiplication, and division, as well as addsub (alternating addition/subtraction), (inverse) square root, and reciprocal.
• Comparison, minimum, maximum, and rounding.
• Permutation of elements within a lane and permutation of lanes.

Figure 7-6. Examples of AVX functionality: simple vector addition (top left), in-lane permutation (top right), broadcasting (bottom left), and mask loading (bottom right)


We will consider the direct programming of AVX later, in the section “Understanding AVX: Intrinsic Programming.”

Why Doesn't My Code Vectorize in the First Place?

Before going into actual optimizations for vectorization, let's briefly look at reasons why the compiler can't vectorize your code. The root cause is actual or assumed data dependences that are not resolvable at compile time.

Data Dependences

In regard to pipeline conflicts, we covered data dependences that prevent instructions from being executed in parallel in a pipelined, out-of-order, or superscalar fashion. What applies to the pipeline, where arbitrary instructions might act on the same data at different times, applies even more so to vectors. Here, only a single instruction is executed on multiple, possibly dependent data elements at exactly the same time. In this sense, vector dependences are more controllable and more easily solved. Recall the data conflicts discussed earlier: flow dependence (read after write, or RAW), anti-dependence (write after read, or WAR), and output dependence (write after write, or WAW). It is important to realize how dependences affect vectorization. Let's look at a simple example. When a variable is written in one iteration and read in a subsequent one, we have a RAW dependence, as we can see within the loop code:

for(int i=0;i<n-1;i++)
    a[i+1]=a[i];

If you unroll this loop, you get:

a[1]=a[0];
a[2]=a[1];
a[3]=a[2];
...

After correct execution of the loop, all the elements should be set to the value in a[0] (see Figure 7-7, left panel). Now, consider a two-element vectorized version of the loop. At one time, two successive values will be loaded from the array (the parentheses indicate the vector):

(a[1],a[2])=(a[0],a[1]);


Figure 7-7. Flow (RAW) data dependence-analysis of a shift-copy loop executed sequentially and with a two-element vector

The second value is already wrong, according to the original algorithm. In the next iteration, you get:

(a[3],a[4])=(a[2],a[3]);

a[2] has already been changed to a[1] in the previous iteration, and the corresponding values are loaded. Carrying this on, you get, as the final result:

a[0],a[1],a[1],a[3],a[3],a[5] ...

This is obviously wrong according to the original algorithm (see Figure 7-7, right panel). Clearly, the compiler must prevent this loop from being vectorized. Very often, however, the compiler assumes an unproven vector dependence, although you know better and are sure that it can never occur; we will treat this case extensively later.


EXERCISE 7-1

Try the example discussed in the preceding text:

for(int i=0;i<n-1;i++)
    a[i+1]=a[i];

You can enforce vectorization by placing a pragma (to be explained below) before the loop.

#pragma simd
for(int i=0;i<n-1;i++)
    a[i+1]=a[i];

Can you confirm the results? For which shifts i+1, i+2,... do you get correct results? For which do you get wrong results?

Data Aliasing

Another, related reason why code does not vectorize is aliasing. Under aliasing, we understand the fact that two variables (pointers or references) are associated with the same memory region. Consider a simple copy function:

void mycopy(double* a, double* b, int length){
    for(int i=0;i<length;i++){
        a[i]=b[i];
    }
}

In principle, the compiler should be able to vectorize this easily. But wait—can the compiler be sure that the arrays a and b do not overlap? It cannot. And C/C++ explicitly allows for this situation! Call the above function from the main function like this:

int main(void){
    int length=100;
    int copylength=50;
    double* a;
    double* b;
    double* data = (double*) malloc(sizeof(double)*length);
    a=&data[1];
    b=&data[0];
    mycopy(a,b,copylength);
}


You will get the same situation as with the earlier code showing an explicit vector dependence. Consequently, the compiler must assume that there is a dependence.

Array Notations

The array notation (AN) introduced with Intel Cilk Plus is an Intel-specific language extension of C/C++ that allows for direct expression of data-level parallelism (in contrast to loops, which have the abovementioned problems). AN relieves the compiler of the dependence and aliasing analysis to a degree and provides an easy way to correct, well-performing code. AN introduces an array section notation that allows the specification of particular elements, compact or regularly strided:

<array>[<start>:<length>[:<stride>]]

The syntax resembles the Fortran array section syntax, but Fortran programmers beware: the semantics require start:length and not start:end! Examples of the array section notation are:

a[:]      // the whole array
a[0:10]   // elements 0 through 9
a[0:5:2]  // elements 0,2,4,6,8

Based on this notation, operators will now act element-wise:

c[0:10]=a[0:10]*b[0:10]; // element-wise multiplication of 10 elements
a[0:10]++;               // increments all elements
m[0:10]=a[0:10]<b[0:10]; // element-wise comparison

It is also possible to use AN with arrays of higher dimension:

a[0:10][0:10]=b[10:10][10:10];

Or even with totally different ranks: a[0:10][0:10]=b[10:10][2][10:10];

The only requirement is that the number of ranks and rank sizes must match. AN provides reducer intrinsics to exercise all-element reductions. The following expression,

__sec_reduce_add(a[:]);

will return the sum of all elements. Of course, this can also be used with more complex expressions as arguments, so that

__sec_reduce_add(a[:]*b[:]);

will return the inner vector product.
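As a complete, if minimal, illustration, an inner product over two arrays of length n could be written as follows (the function name is ours, and the code requires a compiler with Cilk Plus array notation support, such as Intel Composer XE):

double dot_product_an(double* a, double* b, int n){
    // a[0:n]*b[0:n] builds the element-wise products; the reducer sums them up
    return __sec_reduce_add(a[0:n]*b[0:n]);
}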


Let's look at an example using AN. A problem often encountered in scientific codes is partial differential equations. Consider a 1D acoustic wave equation,

∂²φ(x,t)/∂t² = c² ∂²φ(x,t)/∂x²

where x and t are continuous variables. This translates into a second-order finite difference equation as

(φ(x,t+1) − 2φ(x,t) + φ(x,t−1)) / (Δt)² = c² (φ(x+1,t) − 2φ(x,t) + φ(x−1,t)) / (Δx)²

where x and t are now discrete space and time indices with distances Δx and Δt. We want to know the strength of the field at the time t+1 at position x. Solving the above equation for the field element φ(x,t+1) yields:

φ(x,t+1) = c² (Δt)²/(Δx)² (φ(x+1,t) − 2φ(x,t) + φ(x−1,t)) + 2φ(x,t) − φ(x,t−1)

In C/C++, this may be expressed as:

for(int i=0;i<size-2;i++){
    // c2dtdx2 = c*c*(dt*dt)/(dx*dx); phi_old, phi, phi_new hold time steps t-1, t, t+1
    phi_new[i+1] = c2dtdx2*(phi[i+2] - 2.0*phi[i+1] + phi[i]) + 2.0*phi[i+1] - phi_old[i+1];
}

The same example in AN would look like this:

phi_new[1:size-2] = c2dtdx2*(phi[2:size-2] - 2.0*phi[1:size-2] + phi[0:size-2])
                  + 2.0*phi[1:size-2] - phi_old[1:size-2];


Although the compiler might vectorize this simple example even in straight C/C++, more complex problems—for example, a three-dimensional wave equation or a finite difference equation solved to a higher order—might not vectorize, or not vectorize fully. With AN, the vector code becomes explicit.

Vectorization Directives

Pragmas are an annotation technique you already learned about in the context of OpenMP. They allow you to hint information to the compiler for which other means of expression in C/C++ or Fortran are not available. A pragma is treated by the compiler like a comment or an unknown preprocessor directive if the compiler doesn't know it; in the end, it simply ignores it. Consequently, the resulting code maintains its portability for compilers that don't support a certain feature, but it has the desired effect if the compiler does understand the meaning of the pragma.

ivdep

The #pragma ivdep tells the compiler that assumed vector dependences in the following loop body are to be ignored. Note that proven vector dependences are not affected. The pragma has no further arguments. The #pragma ivdep is available in most compilers, though its implementation might differ.

vector

The #pragma vector is similar in its effect to #pragma ivdep in the sense that it will ignore assumed dependences but not proven ones, but it has additional optional clauses:

• always: This overrides the heuristics on efficiency, alignment, and stride.
• aligned/unaligned: This tells the compiler to use aligned or unaligned data movement for memory references.
• temporal/nontemporal: This tells the compiler to use streaming stores in case of nontemporal, or to avoid those in case of temporal. Streaming stores write directly into memory, bypassing the cache, which saves the read for ownership (RFO) that would otherwise be required to modify the data in the cache. The nontemporal clause can take a comma-separated list of variables that should be stored nontemporally.
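To illustrate how #pragma ivdep is typically used, consider the following sketch of a loop with an offset k that is unknown at compile time. The compiler must assume a possible flow dependence, but if the caller guarantees that k is non-negative, the pragma lets the compiler vectorize anyway (the function and variable names are ours):

void shift_add(double* a, double* b, int k, int n){
    #pragma ivdep
    for(int i=0;i<n;i++){
        a[i] = a[i+k] + b[i];   /* safe only because the caller guarantees k >= 0 */
    }
}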

simd

The #pragma simd is the most powerful of the three vectorization pragmas. So, #pragma simd tells the compiler to ignore any heuristics and dependences, proven or not; you will be fully responsible for making sure the result is correct. #pragma simd is a powerful construct that can be extended by more arguments/subclauses, some similar in spirit to the OpenMP parallel for pragma discussed earlier. We discuss them briefly here:

• vectorlength(arg1): Tells the compiler to use the specified vector length. arg1 must be a power of 2. Ideally, the vector length is the maximum vector length supported by the underlying hardware, such as 2 for SSE2 double vectors or 4 for AVX double vectors. For example:

#pragma simd vectorlength(2)
for(int i=0;i<n;i++)
    a[i]=a[i]+b[i];

• vectorlengthfor(datatype): Tells the compiler to choose the appropriate vector length for the architecture and data type compiled for—for example, vectorlengthfor(double) will result in a vector length of 2 for SSE2 and 4 for AVX. The benefit is that you will get the optimal vector length independent of the architecture chosen by the -x compiler switch. For example:

#pragma simd vectorlengthfor(double)
for(int i=0;i<n;i++)
    a[i]=a[i]+b[i];

The following clauses of pragma simd resemble the interface used in the OpenMP data-sharing clauses and maybe thought of in the same manner—just that a vector lane would correspond to a thread: • private(var1[,var2,...]): With this clause the compiler will assume that the scalar variable var1 can take different values in each loop iteration (see also the OpenMP private clause). The initial (at start of the loop) and final (after the loop) values are undefined, so make sure you set the value in the loop and after the loop completion. For example:

#pragma simd private(c)
for(int i=0;i<n;i++){
    c=b[i]+1.0;
    a[i]=2.0*c;
}


• firstprivate(var1[,var2...]): The variable is considered to be private, and on entry the compiler will set the value of the private variable var1 to its value outside the loop.
• lastprivate(var1[,var2...]): The variable is considered to be private, and on exit the compiler will maintain the value of the variable var1 achieved in the last iteration of the loop.
• reduction(op:var): This performs a reduction operation of the variable var with the operator op. The value of this clause becomes immediately obvious by looking at an example:

#pragma simd reduction(+:c)
for(int i=0;i<n;i++)
    c+=a[i];

Here, each SIMD element performs a + reduction (accumulates all the values) in the variable c, but in the end all values of all SIMD elements are summed up to give the correct result of the sum of all elements of the array a.

• (no)assert: This causes the compiler to generate an error if the vectorization fails. The default is noassert and will generate a warning.

You will notice the close similarity between #pragma simd and the OpenMP construct #pragma omp for, which was discussed in Chapter 6.
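Combining these clauses, a complete reduction loop could look like the following sketch (the function and variable names are ours):

double sum_of_squares(double* a, int n){
    double sum = 0.0;
    #pragma simd reduction(+:sum)
    for(int i=0;i<n;i++){
        sum += a[i]*a[i];   /* each vector lane accumulates a partial sum; the lanes are combined at the end */
    }
    return sum;
}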

Understanding AVX: Intrinsic Programming

We have touched on the AVX instructions a couple of times in this chapter. AVX is the most important performance improvement in the Sandy Bridge architecture, but we have described it only briefly up to this point. Here, we want to go into somewhat more detail by programming with AVX explicitly by using intrinsics. This is more for educational purposes than for practical use. Intrinsics are supposed to be the last resort when vectorization cannot be facilitated otherwise.

What Are Intrinsics?

Intrinsics are functions that the compiler recognizes and treats in a special way. In our case, they are functions and types that can directly deal with vectors of a given length; hence, they also directly address a particular architecture, independent of what is used with the -x switch. In most cases, intrinsics directly translate into machine instructions. In some cases, they are more complex and involve a couple of instructions.


Intrinsics mostly operate on and return vector types. For AVX, those are 256-bit vectors and the types are as follows:

• Eight 32-bit integer elements or four 64-bit integer elements: __m256i
• Eight single precision elements: __m256
• Four double precision elements: __m256d

We will focus here on the double-precision types for the sake of brevity; everything we present applies to single-precision types in a similar fashion. A listing of all intrinsics can be found in the "Intel Intrinsics Guide."8 The 256-bit floating-point intrinsic function for AVX starts with _mm256_, then a meaningful description of the functionality—say, add—and then two letters, encoding packed (p) or scalar (s), as well as single (s) or double (d) precision. Packed and scalar in this context means to execute the operation on all elements of the vector, or only on the first element (see Figure 7-8).

Figure 7-8. AVX intrinsics encoding scheme

An example of an intrinsic function is: c=_mm256_add_pd(a,b);

This will add the elements of the vectors a and b and write the results into the elements of c. The a, b, and c are of type __m256d. See also Figure 7-6. Intrinsics are only available for C/C++ and can be used in the source code freely. All AVX intrinsics are listed in the file immintrin.h that you will find in the include directory of Intel Composer XE 2015. There are hundreds of intrinsics; we will restrict discussion to the most important ones.


First Steps: Loading and Storing

The first thing we want to do is get data into a vector and back into main memory. There are two ways of loading, aligned and unaligned:

• __m256d a = _mm256_load_pd(double* memptr): Loads the four packed double precision numbers contained in the 256 bits starting at memptr. memptr must be 32-byte aligned.
• __m256d a = _mm256_loadu_pd(double* memptr): Loads the four packed double precision numbers contained in the 256 bits starting at memptr. memptr does not need to be 32-byte aligned.

Notice that we now have the information in a vector register exclusively; there is no association with the memory anymore. We may now freely modify the data without having to deal with memory transactions, unless we use more registers than the architecture can supply; in that case, the information will spill over to the cache. When we are done with the data modification in the registers, we want to write them back into memory. That's as easy as loading:

• void _mm256_store_pd(double* memptr, __m256d a): Stores the four packed double precision numbers contained in the register a into the 256 bits starting at memptr. memptr must be 32-byte aligned.
• void _mm256_storeu_pd(double* memptr, __m256d a): Stores the four packed double precision numbers contained in the register a into the 256 bits starting at memptr. memptr doesn't need to be 32-byte aligned.

A simple example program summarizing this would be:

void foo(double* d){
    __m256d a;
    a=_mm256_loadu_pd(d);
    ... // do something meaningful ...
    _mm256_storeu_pd(d,a);
}

Sometimes you want to have one value in all of the vector elements. This is called broadcasting:

• __m256d a = _mm256_broadcast_sd(double* memptr): Copies the double precision value contained in the 64 bits at memptr into all four elements of a vector register. No alignment is required.


Arithmetic

Now that we can load data into registers, we can start computing something. The four basic arithmetic instructions are as follows:

• __m256d c = _mm256_add_pd(__m256d a, __m256d b): Adds the four elements in the registers a and b element-wise and puts the result into c.
• __m256d c = _mm256_sub_pd(__m256d a, __m256d b): Subtracts the four elements in the register b from a element-wise and puts the result into c.
• __m256d c = _mm256_mul_pd(__m256d a, __m256d b): Multiplies the four elements in the registers a and b element-wise and puts the result into c.
• __m256d c = _mm256_div_pd(__m256d a, __m256d b): Divides the four elements in the register a by b element-wise and puts the result into c.

These four are already sufficient to do some important computation. Very often multiplication of very small matrices is required—for instance, of 4x4 matrices in a scenario covering three-dimensional space and time. (We will revisit this example in this chapter.) There are highly optimized libraries providing the BLAS9 (basic linear algebra subroutines) functionality, such as matrix-matrix multiplication. Those libraries, such as Intel MKL, are powerful and feature rich. We might require less functionality. Say, we don't need the multiplicative factors in the DGEMM (double general matrix-matrix subroutine), just straight multiplication of the matrices. In this case, a special matrix-matrix multiplication like the following would do the trick:

#include <immintrin.h>

void dmm_4_4_4(double* a, double* b, double* c){
    int i;
    __m256d xa0;
    __m256d xa1;
    __m256d xa2;
    __m256d xa3;
    __m256d xb0;
    __m256d xb1;
    __m256d xb2;
    __m256d xb3;
    __m256d xc0;
    xb0 = _mm256_loadu_pd(&b[0]);
    xb1 = _mm256_loadu_pd(&b[4]);
    xb2 = _mm256_loadu_pd(&b[8]);
    xb3 = _mm256_loadu_pd(&b[12]);


    for(i=0;i<4;i+=1){
        xc0 = _mm256_loadu_pd(&c[i*4]);
        xa0 = _mm256_broadcast_sd(&a[i*4]);
        xa1 = _mm256_broadcast_sd(&a[i*4+1]);
        xa2 = _mm256_broadcast_sd(&a[i*4+2]);
        xa3 = _mm256_broadcast_sd(&a[i*4+3]);
        xc0 = _mm256_add_pd(_mm256_mul_pd(xa0,xb0),xc0);
        xc0 = _mm256_add_pd(_mm256_mul_pd(xa1,xb1),xc0);
        xc0 = _mm256_add_pd(_mm256_mul_pd(xa2,xb2),xc0);
        xc0 = _mm256_add_pd(_mm256_mul_pd(xa3,xb3),xc0);
        _mm256_storeu_pd(&c[i*4],xc0);
    }
}

EXERCISE 7-2

Measure the performance of the preceding routine versus MKL’s DGEMM or a manual C implementation, such as:

void dmm_4_4_4_c(double* a, double* b, double* c){
    for(int i=0;i<4;i++){
        for(int j=0;j<4;j++){
            for(int l=0;l<4;l++){
                c[i*4+j]+=a[i*4+l]*b[l*4+j];
            }
        }
    }
}

The number of floating-point operations in a matrix-matrix multiplication is 2×M×N×K = 2×4×4×4 = 128 in this case. You will have to run many million matrix multiplications to get a reasonable runtime. How does the performance compare? How far can you tune the performance of the C version by using pragmas and loop unrolling?
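One possible measurement harness for this exercise is sketched below; it repeats the multiplication many times and derives GFLOPS from the 128 floating-point operations per call. The names and the repetition count are ours, and a real measurement should make sure the compiler does not optimize the calls away, for example by consuming the result as done here:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void dmm_4_4_4(double* a, double* b, double* c);  /* the AVX version shown above */

int main(void){
    double a[16], b[16], c[16];
    long reps = 10*1000*1000;
    for(int i=0;i<16;i++){ a[i]=rand()/(double)RAND_MAX; b[i]=a[i]; c[i]=0.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for(long r=0;r<reps;r++) dmm_4_4_4(a,b,c);   /* repeat to get a measurable runtime */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double seconds = (t1.tv_sec-t0.tv_sec) + 1e-9*(t1.tv_nsec-t0.tv_nsec);
    printf("%.2f GFLOPS (checksum %f)\n", 2.0*4*4*4*reps/seconds/1e9, c[0]);
    return 0;
}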

Data Rearrangement

Of course, those few intrinsics are not all there are; only in a few cases do we get data presented as readily usable as with a matrix multiplication. More frequently, data needs to be rearranged in one vector or between different vectors. The intrinsic functions specialized in data rearrangement are often difficult to configure, as you will see. Still, these intrinsics are highly important, because this is exactly what the compiler has the biggest problems with.

We provide some examples of configurations that are useful, but we don't claim completeness:

• __m256d b = _mm256_permute_pd(__m256d a, int m): Permutes the elements within each lane according to the bits in m, from left to right. If the bit is 0, take the first element of the lane; if the bit is 1, take the second. Let's look at the functionality with an example: Consider a vector containing the values a0-a3: (a3,a2,a1,a0). Then, _mm256_permute_pd allows you to move the elements of the vector within each lane (remember—a lane is half the vector); see Table 7-1 for examples.

Table 7-1. Control Integer m and Result Vector for _mm256_permute_pd

int m       Result vector b   Description
1010b=10    (a3,a2,a1,a0)     Identity
0000b=0     (a2,a2,a0,a0)     Copy the first element of each lane into both elements of a lane
1111b=15    (a3,a3,a1,a1)     Copy the second element of each lane into both elements of a lane
0101b=5     (a2,a3,a0,a1)     Swap the elements of each lane

You can form all 16 variations, of course. Next, we want to exchange data between whole lanes:

• __m256d c = _mm256_permute2f128_pd(__m256d a, __m256d b, int m): The integer value controlling this operation is somewhat more complicated, and we refer to the documentation for the exact functionality. In Table 7-2 we show some functionality for different control integers.

Table 7-2. Control Integer m and Result Vector for _mm256_permute2f128_pd

int m           Result vector c   Description
00010000b=16    (a3,a2,a1,a0)     Identity c=a
00110010b=50    (b3,b2,b1,b0)     Identity c=b
00000001b=1     (a1,a0,a3,a2)     Swap the two lanes of the first source a
00100011b=35    (b1,b0,b3,b2)     Swap the two lanes of the second source b
00000011b=3     (a1,a0,b3,b2)     The first lane of a and the second lane of b


Last, we want to look at blending:

• __m256d c = _mm256_blend_pd(__m256d a, __m256d b, const int m): Copies the elements of a and b into c according to the bits in m. If the bit at position n is 0, take the element from the first source (a); if it is 1, take it from the second source (b). Examples for blending can be found in Table 7-3.

Table 7-3. Control Integer m and Result Vector for _mm256_blend_pd

  int m        Result vector c    Description
  0101b = 5    (a3,b2,a1,b0)      Copy the first and third elements from the second source, the second and fourth from the first source
  1010b = 10   (b3,a2,b1,a0)      Copy the first and third elements from the first source, the second and fourth from the second source
  0000b = 0    (a3,a2,a1,a0)      Copy all elements from the first source
  1111b = 15   (b3,b2,b1,b0)      Copy all elements from the second source

Let’s see what we can do with all this. Consider a cyclic rotation of a vector by one element (see also Figure 7-9):

(a3,a2,a1,a0) → (a0,a3,a2,a1)

Figure 7-9. Right panel: The lane concept of AVX. Left panel: The construction of a cyclic rotate of a double vector with in-lane and cross-lane permutes


Here is the recipe: 1. Swap the elements of each lane of a into a new vector:

b=_mm256_permute_pd(a,5);

2. Swap the two lanes of the vector b:

c=_mm256_permute2f128_pd(b,b,1);

3. Blend the vectors b and c, taking the first and third elements from the second source (b) and the second and fourth elements from the first source (c):

d=_mm256_blend_pd(c,b,5);

4. The vector d now contains the cyclically rotated elements of a. This concludes our short introduction to intrinsic programming. You are encouraged to have a look at the “Intel Intrinsics Guide,”10 which contains a description of all intrinsics; the entries can be filtered by architecture and scope.

EXERCISE 7-3

Create a version of the cyclic rotate that shifts by two and three elements.
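As a starting point for this exercise, here is a minimal sketch that assembles the three-step recipe above into a rotate-by-one routine; the function name, the test scaffold, and the use of unaligned loads and stores are our own choices:

#include <stdio.h>
#include <immintrin.h>

// Cyclically rotate a vector of four doubles by one element:
// (a3,a2,a1,a0) -> (a0,a3,a2,a1)
static __m256d rotate1_pd(__m256d a){
  __m256d b = _mm256_permute_pd(a, 5);           // step 1: swap the elements of each lane
  __m256d c = _mm256_permute2f128_pd(b, b, 1);   // step 2: swap the two lanes
  return _mm256_blend_pd(c, b, 5);               // step 3: blend b and c
}

int main(void){
  double in[4] = {0.0, 1.0, 2.0, 3.0};           // a0..a3
  double out[4];
  _mm256_storeu_pd(out, rotate1_pd(_mm256_loadu_pd(in)));
  printf("(%g,%g,%g,%g)\n", out[3], out[2], out[1], out[0]);   // expect (0,3,2,1)
  return 0;
}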

Dealing with Disambiguation

Aliasing in computer science refers to the existence of more than one reference to a single memory address. This is easy to see when it comes to scalars; for example:

double value = 5;
double* ref1 = &value;
double* ref2 = &value;

For arrays, it becomes more complex:

double* array = new double[200];
double* ref1 = &array[0];
double* ref2 = &array[50];

Of course, ref1[50] and ref2[0] are referring to the same memory address:

for(int i=0;i<50;i++){
  ref1[i+m]=2*ref2[i];
}


If we assume m to be known at compile time, we can easily observe what the compiler is doing. For 0<=m<=50, the above code is vectorizable. For m=51, we obviously have a RAW dependence and the vectorization fails. If m is not known at compile time, the compiler will assume both a RAW and a WAR dependence. In practice, you will often encounter exactly these situations, in which the compiler has to make conservative assumptions to guarantee the correct execution of the program. In most cases, this is related to a function signature, like:

void foo(double* a, double* b){
  ...
}

The compiler has to assume that a and b may reference the same memory locations. It will therefore suspect that there might be a dependence. In most cases, we know that the assumed vector dependence can never happen, so we need ways to tell the compiler that it should not be overly conservative. The following methods can be used to allow the compiler to vectorize our code.

• Compiler switches: The switch -ansi-alias tells the compiler that the code adheres to the ANSI aliasing rules, which allows it to optimize more aggressively; its opposite, -no-ansi-alias, disables this assumption and is the default. A more aggressive option still is -fno-alias, with which no aliasing is assumed at all. These compiler switches are effective for the whole file that is currently compiled, so be careful applying them when the file contains more than the function under consideration.

• The restrict keyword: A more comfortable and precise way to instruct the compiler to ignore assumed dependences is the restrict keyword defined in the C99 standard. Placed right in front of the variable name, it indicates that this pointer does not reference data referenced by other pointers. For example:

void foo(double* restrict a, double* restrict b, int length){
  for(int i=0;i<length;i++){
    a[i] = 2*b[i];   // illustrative loop body
  }
}

The restrict keyword is preferred over the compiler switches because it only affects the pointers explicitly declared this way.

• Directives: If you want to be even more specific, you can indicate where a particular assumed dependence should be ignored. As discussed earlier, you can force the compiler to ignore assumed dependences with #pragma ivdep, and even more strongly with #pragma simd; a short sketch follows.
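A minimal sketch of the directive approach, reusing the copy-style loop from the aliasing example; the loop body is our own illustration:

void foo_ivdep(double* a, double* b, int length){
  // Assert that the stores through a do not overlap the loads through b,
  // so the compiler may ignore the assumed vector dependence.
  #pragma ivdep
  for(int i=0;i<length;i++){
    a[i] = 2*b[i];
  }
}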


Dealing with Branches

Branches are points in the instruction flow where the instruction pointer is set to something other than the next instruction, either statically or dynamically, based on a previously evaluated comparison. Branches are a necessary evil in structured programming: function calls lead to branches if not inlined, loops check their limits, and we have true conditional branches from if-else statements in the code. As outlined earlier in the discussion of the pipeline, branches pose a considerable problem for the CPU because the condition on which a branch is taken first needs to be evaluated before the instruction flow can continue at another point. CPUs deal with this by predicting the target of the branch based on previous encounters with this address in the code. Frequent wrong predictions can have a severe impact on performance.

■■Note  This section addresses the bad speculation category of the top-down method for analyzing pipeline performance.

__builtin_expect

If you observe bad speculation at a particular conditional branch, and you have a good estimate of the expected outcome, it is quite easy to let the compiler know about it explicitly by using the built-in function __builtin_expect. The syntax is straightforward: instead of writing the condition in the argument of the if-statement, you write if(__builtin_expect(condition,expectation)), where expectation is either 0 for false or 1 for true. For example:

if(__builtin_expect(x<0,1)){
  somefunction(x);
} else {
  someotherfunction(x);
}

Profile-Guided Optimization

If you don't have a good clue as to which conditions to put first into if-statements, then profile-guided optimization (PGO) might help. The virtue of this technique is that it gives you a way to check exactly for cases such as wrongly predicted conditions and to correct them without touching the source code. PGO is a three-step process:

1. Create an instrumented binary with the compiler option -prof-gen.
2. Run this binary with one or more representative workloads. This will create profile files containing the desired information.
3. Compile once more with the compiler option -prof-use.
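On the command line, the workflow looks roughly as follows; the source and executable names are our own placeholders:

$ icc -prof-gen -O2 -o myapp.inst myapp.c     # step 1: build an instrumented binary
$ ./myapp.inst representative_input.dat       # step 2: run a representative workload
$ icc -prof-use -O2 -o myapp myapp.c          # step 3: recompile using the collected profile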


Profile-guided optimization can produce considerable performance improvements for code that does not match the default assumptions of the compiler regarding, for example, branch behavior or loop iteration counts.

Pragmas for Unrolling Loops and Inlining

Loops and calls to subroutines and functions can be another source of frequent branching. One way to reduce the number of branches caused by loops and calls is to use pragmas to unroll the loops and to inline the subroutine calls.

unroll/nounroll

The #pragma unroll allows you to control the unroll factor of loops. Unrolling a loop can speed up the loop execution considerably, because the loop condition does not need to be checked as often and additional optimizations might become possible in the unrolled code. Loop unrolling does, however, increase the code size and the pressure on the instruction cache, the decode unit, and the registers. The #pragma unroll can take an additional argument indicating the unroll factor. For example:

#pragma unroll(2)
for(int i=0;i<n;i++){
  a[i] = b[i] + c[i];   // illustrative loop body
}

This will give you a loop transformation similar to this:

// unrolled loop
for(int i=0;i<n-1;i+=2){
  a[i]   = b[i]   + c[i];
  a[i+1] = b[i+1] + c[i+1];
}
// remainder iteration if n is odd
for(int i=(n/2)*2;i<n;i++){
  a[i] = b[i] + c[i];
}

The #pragma nounroll will prohibit the unrolling of a particular loop.

EXERCISE 7-4

Write a program with a simple loop, such as the one above, for different loop lengths. Compile it with -xAVX -opt-report5. Do you get unrolling? Try placing a #pragma unroll in front of the loop; can you change the unrolling behavior of the compiler for this loop? Have a look at the earlier discussion of optimization reports.

unroll_and_jam/nounroll_and_jam

The #pragma unroll_and_jam performs a nested loop transformation, where the outer loop is unrolled and the resulting copies of the inner loop are then fused ("jammed"). Consider our squaregemm function used in the earlier optimization reports example. If you put a #pragma unroll_and_jam in front of the middle loop,

void squaregemm(int size, double* a, double* b, double* c){
  for(int i=0;i<size;i++){
    #pragma unroll_and_jam
    for(int j=0;j<size;j++){
      for(int l=0;l<size;l++){
        c[i*size+j] += a[i*size+l]*b[l*size+j];   // same indexing scheme as the 4x4 example
      }
    }
  }
}

the compiler unrolls the j loop and fuses the resulting copies of the innermost l loop into a single loop body.

Plus, you have additional remainder loops from the unrolling.

inline, noinline, forceinline

The Intel compiler has a large set of settings to influence inlining behavior. If there are particular functions you want to see inlined, you can, of course, change the general settings for the compilation of a source file, but this is a shotgun approach that will try to inline all the functions in the file. A more surgical approach is to specify exactly where you want inlining and where you want to avoid it. The #pragma inline instructs the compiler to inline the function call following the pragma if its heuristics permit. You can override the heuristics by using #pragma forceinline, which will inline whenever it is possible at all. If you want to inline functions recursively, that is, inline the functions called within the inlined functions and so on, you add the recursive clause, as in #pragma forceinline recursive; a short sketch follows.
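A minimal sketch of how these pragmas are placed at a call site; the function names are our own placeholders:

double kernel(double x);              // some small, frequently called function

double sum_kernels(double* v, int n){
  double s = 0.0;
  for(int i=0;i<n;i++){
    // Inline this call (and the calls inside it) regardless of the
    // compiler's inlining heuristics.
    #pragma forceinline recursive
    s += kernel(v[i]);
  }
  return s;
}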


Specialized Routines: How to Exploit the Branch Prediction for Maximal Performance

Many programs are capable of dealing with a lot of different circumstances and boundary conditions. For instance, your program might be able to handle polynomial interpolation up to the nth order. If you write such versatile software, you generally leave the parameter controlling the order of the polynomial free, but in a single run, or over a longer stretch of the execution, only a single case is actually used, say, eighth order. In such a circumstance, the branch prediction might do magic for you! If a branch is mispredicted, it might be mispredicted once or maybe twice, but not more often. If the condition stays constant, no further mispredictions will occur until the program finishes or until another polynomial order becomes valid for a longer period of time. Consider a case where you have highly specialized routines for, say, 4th-, 8th-, and 16th-order polynomials, since these are what you mostly use. The general case can still be treated, but it performs much worse. In this case, a simple switch will do the trick for you:

void myswitchedpolynomial(int order, ...){
  switch(order){
    case 4:  polynomial_4(...);  break;
    case 8:  polynomial_8(...);  break;
    case 16: polynomial_16(...); break;
    default: polynomial_n(order, ...);
  }
}

Although this seems like a brute-force method, it can be a powerful technique, especially if you have a limited set of choices. The number of specialized routines can still be high, and can go into the thousands. In each individual routine, you can now program a particular case explicitly, which will help the compiler produce better code.

When Optimization Leads to Wrong Results

Aggressive optimization is a necessity if you want to achieve the highest level of performance. Under special circumstances, however, aggressive optimization can lead to problematic or even erratic behavior. In this case, you can decrease the optimization level with a command-line switch, which will affect all the methods in the file being compiled. Again, pragmas allow you to control the behavior of the compiler more specifically. The #pragma optimize off disables optimization for the code after the pragma until its counterpart, #pragma optimize on, is found. This is a huge step, as optimization is switched off entirely. More often, it will suffice to lower the optimization level, say to the basic -O1, to guarantee correct behavior.


The #pragma intel optimization_level 1 applies -O1 to the function that follows it. Notice the intel clause; this distinguishes the usage from the GNU compiler's use of the same pragma, which switches the optimization level for all code after the pragma.
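A short sketch of the placement, with a hypothetical function that is sensitive to aggressive optimization:

// Compile only this function at -O1; the rest of the file keeps the
// optimization level given on the command line.
#pragma intel optimization_level 1
void sensitive_kernel(double* x, int n){
  for(int i=0;i<n;i++){
    x[i] = x[i]*x[i];
  }
}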

Analyzing Pipeline Performance with Intel VTune Amplifier XE

We have discussed the Sandy Bridge pipeline and have given potential reasons why it can stall in the front end or the back end, or owing to bad speculation. We have also discussed potential remedies for each of these problems by means of the compiler. In this section, we analyze the pipeline using VTune Amplifier XE to obtain information about pipeline performance, and we apply the solutions learned to overcome stalls in particular sections of the pipeline. We covered VTune Amplifier XE earlier in this book when we discussed shared memory parallelization problems. Here, we look at one analysis type in detail: general exploration. General exploration gives us exactly the front-end/back-end analysis discussed earlier. To have a simple example demonstrating a VTune Amplifier XE analysis, we use the

4×4 matrix multiplication introduced earlier in this chapter. But here, for all N matrices C_i we want to perform M matrix-matrix multiplications,

C_i = C_i + \sum_{m=1}^{M} A_{i,m} B_{i,m},

where the indices indicate full matrices, not elements. The example is loosely based on the problem of small matrix-matrix multiplication in the DBCSR method of CP2K.11 We will only consider square matrices:

int main(void){
  int msize=4;              // the matrix side length
  int msize2=msize*msize;   // the number of elements in each matrix
  int nmatrices=100000;     // how many c-matrices there are
  int nab=100;              // how many a and b matrix multiplications per c-matrix

  double** b = (double**) _mm_malloc(sizeof(double*)*nmatrices,32);
  double** a = (double**) _mm_malloc(sizeof(double*)*nmatrices,32);
  double*  c = (double*)  _mm_malloc(sizeof(double)*nmatrices*msize2,32);

  // allocate matrices
  for(int i=0;i<nmatrices;i++){
    a[i] = (double*) _mm_malloc(sizeof(double)*msize2*nab,32);   // nab A matrices per C matrix
    b[i] = (double*) _mm_malloc(sizeof(double)*msize2*nab,32);   // nab B matrices per C matrix
  }


  // init matrices
  for(int i=0;i<nmatrices;i++){
    for(int j=0;j<msize2*nab;j++){
      a[i][j] = ((double) rand())/((double) RAND_MAX);
      b[i][j] = ((double) rand())/((double) RAND_MAX);
    }
  }
  for(int i=0;i<nmatrices*msize2;i++){
    c[i] = ((double) rand())/((double) RAND_MAX);
  }

  // do a couple of iterations in order to have a reasonable runtime
  for(int l=0;l<10;l++){
    // for all matrices C perform the nab multiplications
    for(int i=0;i<nmatrices;i++){
      for(int j=0;j<nab;j++){
        mymatrixmethod(msize, &a[i][j*msize2], &b[i][j*msize2], &c[i*msize2]);
      }
    }
  }

Using a Standard Library Method

The Basic Linear Algebra Subprograms (BLAS)12 is a standard library specializing in linear algebra operations, such as matrix multiplications. The subroutine applicable here is the double precision general matrix-matrix multiplication DGEMM, which computes

C = \alpha A B + \beta C.

We will first use Intel MKL's DGEMM method to perform the matrix-matrix multiplication:

void mymatrixmethod(int m, double* a, double* b, double* c){
  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              m, m, m, 1.0, a, m, b, m, 1.0, c, m);
}

DGEMM is a far more powerful method than we need here, since we ask for a lot fewer features than it offers (it can also transpose the matrices and multiply with scalar factors, none of which we need here). Figure 7-10 shows the output of a basic hotspot analysis done with VTune Amplifier XE. As expected, the majority of the time is spent in the DGEMM method, with some fraction coming from the initialization (functions main and rand).


Figure 7-10. Summary output of basic hotspot analysis

If we do the math on the number of floating-point operations (Flops) performed, we get 100 A-B matrix multiplications for 1,000,000 C matrices with 2×4×4×4 Flops each, for a total of 12.80 GFlops. We need 11.32 s for this; hence we get 12.80/11.32 = 1.13 GFlop/s. This is not too impressive, because a Sandy Bridge core at 2.7 GHz can deliver 21.6 GFlop/s! Let's look at the next level of detail for the same run. Figure 7-11 shows the summary page of a general exploration analysis performed with VTune Amplifier XE for the same binary. Actually, the results do not seem that bad; the pipeline execution is quite good, with a CPI of 0.503 (about 2 instructions retired per cycle). Bad speculation is at a low ratio. There is some divider activity, which VTune points out. A division is a very expensive operation. This is caused by the initialization of the matrices, where we normalize our random numbers to be between 0.0 and 1.0; we can easily get rid of this by setting

double randnorm = 1.0/((double) RAND_MAX);

and replacing division by multiplication; for instance:

for(int i=0;i<nmatrices*msize2;i++){
  c[i] = ((double) rand())*randnorm;
}


Figure 7-11. Summary output of a general exploration analysis in VTune

We do the same for the initialization of a and b. More severe seems to be the low floating-point utilization of 0.034. This goes hand in hand with our observation of the low GFlop rate. In the next section, we make a first attempt to tackle this.


Using a Manual Implementation in C

Our real problem here seems to be that the DGEMM routine is total overkill for a 4×4 matrix multiplication. We could make it more lightweight and spell out the mathematical operations explicitly:

void mymatrixmethod(int m, double* a, double* b, double* c){
  if(m==4){
    for(int i=0;i<4;i++){
      for(int j=0;j<4;j++){
        for(int l=0;l<4;l++){
          c[i*4+j] += a[i*4+l]*b[l*4+j];
        }
      }
    }
  } else {
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, m, m, 1.0, a, m, b, m, 1.0, c, m);
  }
}

Since we exclusively use 4×4 matrix multiplications, the additional if statement will not cause pipeline flushes through the branch predictor; still, the method is valid for all square matrix sizes. Let's now create a new binary with the code changes outlined and rerun the general exploration analysis of VTune. Figure 7-12 shows the output of VTune. What a big leap: we improved from 17.474 s to 6.166 s in execution time. The CPI got worse, though, but this reinforces the point that it is important which instructions we retire, not how many. The FP arithmetic ratio took a big step from 0.034 to 0.213 (see output). This result is going in the right direction. Now let's estimate the floating-point performance. A basic hotspot analysis shows that we spend 4.04 s in main; we assume, for the time being, that this is all compute. Following the above considerations, we do 12.80 GFlops/4.04 s = 3.17 GFlop/s. That's a lot better, but it is still not enough! If you look at Figure 7-12 again, you will see that all the FP arithmetic is spent in "FP scalar." The code doesn't vectorize properly. Now, let's use compiler directives to help the compiler vectorize the code.


Figure 7-12. Summary page of the general exploration analysis after changing to an explicit C implementation

Vectorization with Directives

We found that we increased our performance significantly with a highly specialized method, but there was no vectorization present, and hence the pressure on the back end was increasing. Let's now try to ease the pressure by mandating vectorization:

void mymatrixmethod(int m, double* a, double* b, double* c){
  if(m==4){
    for(int i=0;i<4;i++){
      #pragma simd
      for(int j=0;j<4;j++){
        #pragma unroll(4)
        for(int l=0;l<4;l++){
          c[i*4+j] += a[i*4+l]*b[l*4+j];
        }
      }
    }


  } else {
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, m, m, 1.0, a, m, b, m, 1.0, c, m);
  }
}

Here, we unrolled the innermost loop four times and vectorized the loop over j. Figure 7-13 shows the result of a general exploration analysis. We got even better results, but, no surprise, the "FP arithmetic" category has practically vanished, because now everything is executed in vectors. Compared to the memory operations, there is comparatively little time spent in computing. The pressure is now almost fully on the memory subsystem, so the execution itself functions perfectly.

Figure 7-13. General exploration summary after enforcing vectorization

EXERCISE 7-5

Try to insert the intrinsics method for a 4×4 matrix multiplication developed earlier in this chapter into our sample problem. Can you do better than the result achieved with the compiler alone?

Summary

We presented a brief account of the microarchitecture of modern Intel processors and discussed how to detect microarchitectural issues and how to solve them, ranging from hinting the compiler via directives to programming brute-force solutions using intrinsics. As for microarchitectural design, the scope of this chapter is certainly too tight to look much deeper. Many good textbooks are available, such as the standard works by Hennessy and Patterson13 or Tanenbaum.14 Hager and Wellein15 focus particularly on tuning and performance aspects. Intel's software developer manual16 and its optimization reference manual17 are always good, if extensive, reads. Innovative usage of the techniques presented here can be found in the open-source space, particularly in software that has confined hotspots, such as quantum chromodynamics, molecular dynamics, or quantum chemistry applications, for example CP2K18 or Gromacs.19

References

1. S. P. Dandamudi, Introduction to Assembly Language Programming (Springer, 2005).
2. Intel, "Intel 64 and IA-32 Architectures Software Developer Manuals," 2014, www.intel.com/products/processor/manuals.
3. A. S. Tanenbaum, Structured Computer Organization, 5th ed. (Pearson, 2006).
4. Ibid.
5. J. L. Hennessy and D. A. Patterson, Computer Architecture (Morgan Kaufmann, 2007).
6. "Intel 64 and IA-32 Architectures Software Developer Manual."
7. Intel, "Intel 64 and IA-32 Architectures Optimization Reference Manual," 2014, www.intel.com/products/processor/manuals.
8. "Intel Intrinsics Guide," https://software.intel.com/sites/landingpage/IntrinsicsGuide.
9. "BLAS (Basic Linear Algebra Subprograms)," http://www.netlib.org/blas.
10. "Intel Intrinsics Guide."
11. J. Hutter, M. Krack, T. Laino, and J. VandeVondele, "CP2K Open Source Molecular Dynamics," www.cp2k.org.
12. "BLAS (Basic Linear Algebra Subprograms)."
13. Hennessy and Patterson, Computer Architecture.


14. Tanenbaum, Structured Computer Organization.
15. G. Hager and G. Wellein, Introduction to High Performance Computing for Scientists and Engineers (CRC Press, 2011).
16. "Intel 64 and IA-32 Architectures Software Developer Manual."
17. "Intel 64 and IA-32 Architectures Optimization Reference Manual."
18. Hutter et al., "CP2K Open Source Molecular Dynamics."
19. E. Lindahl, D. van der Spoel, and B. Hess, "Gromacs," www.gromacs.org.
