Technical White Paper www.novell.com DATA CENTER

Optimizing Linux* for Dual-Core AMD Opteron™ Processors

Table of Contents:

SUSE Linux Enterprise and the AMD Opteron Platform
Dual-processor, Dual-core and Multi-core Definitions
Dual-core Background
AMD Specific
NUMA Background
NUMA Memory Access and Allocation in Linux Basics
What Are Characteristics of Workloads that Would Be Helped by Using Dual-core CPUs?
What Class of Parallel Workloads Can Specifically Take Advantage of Dual-core Systems?
How Can I Tell If My Workload or Application Would Benefit?
What Tools Exist to Tell Me Whether My Application Would Benefit?
How Can Applications Be Tuned for Dual-core?
What Tuning Is Possible to Control Performance on a System Using Dual-core CPUs?
How Does the Linux 2.6 CPU Scheduler Handle Dual-core CPUs?
What User Mechanisms, Commands or System Calls (e.g., taskset, numactl, etc.) Are Available to Allow Application Control on a System with Dual-core CPUs?
How Does Dual-core Interact with AMD PowerNow! Technology?
How Does Dual-core Interact with I/O and Async I/O?
Should Hyper-threading Be Disabled on Dual-core?
Conclusion
Glossary

SUSE® Linux Enterprise and the AMD Opteron™ Platform

Dual-core technologies are the latest evolution in processor design. Novell® enterprise products are ready to take advantage of state-of-the-art AMD64 dual-core technology. Read this white paper to learn how to optimize your Linux applications with SUSE® Linux on the AMD Opteron™ platform.

Dual-processor, Dual-core and Multi-core Definitions

Dual-processor (DP) systems are those that contain two separate physical computer processors in the same chassis. In dual-processor systems, the two processors can either be located on the same motherboard or on separate boards.

Native dual-core technology refers to a CPU (central processing unit) that includes two complete execution cores per physical processor. It combines two processors and their caches and cache controllers onto a single integrated circuit (silicon chip). It is basically two processors residing side by side on the same die.

A dual-core processor has many advantages, especially for those looking to boost their system's multitasking computing power. Since each core has its own cache, the operating system has sufficient resources to handle intensive tasks in parallel, which noticeably improves multitasking. Dual-core technology is quite often referred to as multi-core technology, yet multi-core can also mean more than two separate processors (e.g., quad-core).

Dual-core Background

As said before, a dual-core processor is comprised of two independent execution cores. From a software perspective, a dual-core processor appears much the same as a traditional multiprocessor system with two single-core chips (symmetric multiprocessing [SMP]), and all the basic multiprocessor optimization techniques apply. For the most part, single-core SMP-optimized applications will also be optimized for dual-core.

Dual-core differs from single-core SMP because memory access occurs per chip. This means that dual-core systems will have lower memory latency but also lower memory bandwidth. Each core on a dual-core AMD Opteron processor will have a 128 KB L1 cache (256 KB/chip) and a 1 MB L2 cache (2 MB/chip), but future versions may also provide a shared L3 cache, further improving performance. As the technology matures, operating systems might adopt scheduling algorithms that can take advantage of these differences to improve performance for each multiple-processor model.

Complete optimization for the dual-core processor requires both the operating system and the applications running on the computer to support a technology called thread-level parallelism (TLP). TLP is the part of the OS or application that runs multiple threads simultaneously, where threads refer to the parts of a program that can execute independently of other parts.

AMD Specific

Things are never quite as simple as they seem. As a first-order approximation, a system with two cores on one die will behave similarly to a system with two separate processors. However, to get the most performance out of the hardware, software might have to be aware of the location of each core.


AMD Opteron processors with Direct Connect Architecture, being a Non-Uniform Memory Access (NUMA) system architecture, have the concept of a node, and of local and remote memory. Each AMD Opteron processor is a node, and has local memory that it can access with the greatest speed. The local memory of one node is termed remote memory when accessed from another node. NUMA architectures allow software to transparently access local and remote memory; however, performance will be greatest when remote memory accesses are minimized.

The Direct Connect Architecture of the AMD Opteron processor implements an independent memory controller on each node (chip), whether it be single- or dual-core. This means that both cores of a dual-core chip share memory resources and have the same local memory, while two single-core AMD Opteron processors each have their own memory resources.

NUMA Background

NUMA technology is a scalable memory architecture for multiprocessor systems that has long been used by the most powerful servers and supercomputers. A NUMA API for Linux (www.novell.com/collateral/4621437/4621437.pdf) has a good introduction to NUMA technology and terminology. That paper also details the interfaces and methods used to fine-tune NUMA memory control on Linux, and will be referenced in regard to this subject later in this document.

NUMA Memory Access and Allocation in Linux Basics

Memory management of NUMA systems on Linux comprises a number of policies. The default policy for program heap memory (also called anonymous memory: e.g., malloc() memory) is so-called first touch. First touch means memory is allocated from the local memory of the CPU that first touched that memory.

A thread which calls malloc() to allocate memory will not actually cause physical memory pages to be allocated, only reserved. Physical memory is only allocated when the program reads from or writes to that newly malloc()-ed data. This may occur from a different thread or CPU than the one that called malloc()!

Once memory is allocated to a node, it will not move, even if the program moves to another CPU and all subsequent accesses to that memory are "remote". For this reason it can be important for performance that thread and process migration between nodes be minimized, and that memory placement be carefully considered and tuned.

What Are Characteristics of Workloads that Would Be Helped by Using Dual-core CPUs?

For a workload to benefit from using dual-core CPUs, it must first and foremost be able to run in parallel (be parallelizable). This is a fundamental requirement of any multiprocessor system.

Any workload can basically be divided into a serial component and a parallel component, such that the parallel component may be executed and run by any number of processors while the serial component can only be run on a single processor. A workload is said to be able to run in parallel (parallelizable) if the parallel component dominates. A workload that can only run on one processor has a zero-sized parallel component and is called serial.

A workload for a given system might be parallelized in one of two ways: either it is running an application that is specifically written to take advantage of a multiprocessor system, or it is running multiple independent or semi-independent applications that may or may not be individually parallelized.

What Class of Parallel Workloads Can Specifically Take Advantage of Dual-core Systems?

Dual-core systems increase some specific types of computing resources in a system versus single-core chips. Workloads where the computing resource causing a bottleneck is one of those increased by dual-core chips will benefit from dual-core systems.

Remember that both cores of a Dual-Core AMD Opteron processor share the same memory controller. Dual-core chips will double the basic CPU processing capability of the system, and also double the cache size. They do not introduce more memory subsystem resources (memory bandwidth). Thus, the main applications that will benefit from dual-core processors are those that are compute bound and have spare memory bandwidth available for the second core.

How Can I Tell If My Workload or Application Would Benefit?

As outlined above, a good first-order estimation of the performance characteristics of moving to dual-core chips is simply doubling the number of single-core chips. If your workload is already taking good advantage of multiple processors, it will likely benefit from dual-core chips. However, if the workload is very memory intensive, this test won't be very indicative.

If your application or workload has inherent limits in its parallelization (for example, if only four processes are run at once or if there is a hot lock hindering scalability), then adding more cores may not help even if your application is not at all memory intensive.

What Tools Exist to Tell Me Whether My Application Would Benefit?

On Linux, some basic monitoring programs like vmstat and top can be used to see whether your system is compute bound. If idle plus iowait combined (or just idle, if there is no iowait field) is approaching 0 for some or most of the time, then the system could be compute bound.

These programs can also indicate the number of runnable (executable) processes in the system. For multiple cores to be of use, there need to be processes available to run on them. The caveat here is that some algorithms and applications tune themselves automatically to utilize the available CPUs. This is common in, but not limited to, high-performance computing (HPC) applications.


In the vmstat output for such a test, the first field (the run queue) indicates there were an average of four processes running, and the system is completely compute bound. This gives a preliminary indication that such a workload could utilize up to four cores (e.g., two dual-core processors), but will not benefit from more cores.

Using OProfile (http://oprofile.sourceforge.net/news/) to monitor memory controller activity can give an indication of whether or not an application is memory bound. The performance counters CPU_CLK_UNHALTED and MEM_PAGE_ACCESS can be used together to assess the number of memory accesses per clock cycle that the system is not idle.

OProfile yields the following output when profiling a kernel compile on a two-way AMD Opteron processor-based system with single-core CPUs:

Note that MEM_PAGE_ACCESS events are in units of 1,000, while CPU_CLK_UNHALTED events are in units of 10,000, and the numbers are cumulative for all cores.

After doing some calculations, we can derive that this workload is causing one memory access per 119 work cycles. We know that compilation with gcc tends not to be memory bound, with lots of work being done from the cache, so we estimate that 119 is indicative of a workload that is not memory bound on this platform.

Next, a memory-intensive workload, STREAM (www.cs.virginia.edu/stream/), is profiled:

STREAM is memory bound by design, and we can see it causing one memory access per 14 cycles. That is almost 10x more memory intensive than the kernel compile. We can estimate that 14 is indicative of a memory-bound application on this platform.

How well does dual-core performance correlate with the numbers we have?

Figure 1. Relative Dual-core Performance

So we can see that moving to two dual-core CPUs has improved kernel compilation times by a factor of 1.91, which is as good as moving to four single-core CPUs. On the other hand, STREAM performance was basically unchanged when moving to dual-core CPUs, in contrast to a very significant gain with the move to four single-core CPUs (due to their greater memory resources).

The numbers '119' and '14' will vary from system to system, so a good strategy is to get two baseline results from known workloads such as STREAM and gcc, against which you can compare your workload.


How Can Applications Be Tuned for Dual-core?

Many SMP-capable applications should see a performance improvement when moving from single- to dual-core processors, without any specific tuning.

The most basic thing to keep in mind is to ensure good data access patterns and thereby reduce the memory controller load. This kind of optimization is usually applicable to all software.

Specifically, tuning an application for dual-core systems requires an understanding of the performance differences involved with running tasks on the same core, the other core of the same chip, or a different physical chip.

Tasks that are independent of one another in terms of what memory they operate on, and in their level of synchronization or communication, are ideal candidates to place on different physical chips. This will maximize utilization of memory resources.

Tasks that are closely coupled, operating on the same memory or doing a lot of synchronization or communication, should be placed together, either on each core of the same chip or both on the same core.

Putting two or more closely coupled tasks on the same core should be carefully weighed against the possible negative impact of allowing other cores to go idle. If there are two pairs of such tasks running on a four-core system, the best performance will usually be with each of the four tasks running on its own core. In the case of four pairs of tasks on the same four-core system, having each pair of tasks together on its own core will usually be best.

Sub-problems which weren't worth parallelizing on multiprocessor AMD Opteron processor-based systems, due to the cost of synchronization or remote memory accesses, may be worth parallelizing if threads are bound to the cores of a single physical chip, due to the lower latency of local memory access from both cores on a chip.

Explicitly binding processes or threads to CPUs and memory to nodes might be useful in getting the full potential from a dual-core system.

What Tuning Is Possible to Control Performance on a System Using Dual-core CPUs?

In Linux, the CPU scheduler behavior itself is not tunable at runtime. The philosophy is that performance should be good enough in all cases that specific tuning isn't needed. That said, there are cases where the scheduler won't make optimal choices. These can be handled by explicit binding of tasks to CPUs.

Explicit CPU or memory binding may help, although it reduces the effectiveness of the CPU scheduler's multiprocessor balancer. It would be easy to reduce performance unless the system behavior is very well understood.

How Does the Linux 2.6 CPU Scheduler Handle Dual-core CPUs?

Exact behavior differs depending on the version of Linux used. Generally, behavior is better in more recent kernels.

There are a number of "behavioral goals" shared by all multiprocessor systems, including those with dual-core chips (even a system with one dual-core chip is a multiprocessor):

Utilize the Maximum Amount of CPU Resources Available

In a dual-core system it is possible for one core to be completely idle while the other core has two processes running on it. (Actually, only one is running at any given time, with the other "runnable", i.e., executable.)

Linux's scheduler attempts to keep all CPUs busy by moving runnable processes to idle CPUs and keeping the number of running tasks on each core in reasonable balance.

Minimize the Amount of Process Movement Between CPUs

When a process runs on a CPU, it builds up a footprint in cache of its most frequently used memory. If a task has to be moved to another CPU, it loses the benefit of this cache and must refill from memory.

On NUMA systems like AMD Opteron processor-based systems with more than one physical chip, moving a process to another node means that all the local memory it had allocated while running on the original node is now effectively remote memory.

So, moving processes between CPUs is always costly, but the cost ranges from hugely expensive when moving between NUMA nodes to fairly small when moving between the cores of a dual-core chip.

Minimizing process movement directly conflicts with the goal of maximizing CPU utilization, so a balance must be struck. The Linux scheduler is able to recognize the topology of a system (cores, chips, nodes, etc.) and allows tasks to move more freely between the cores of a chip than it does between nodes, thus maximizing CPU utilization without moving tasks away from their local memory.

What User Mechanisms, Commands or System Calls (e.g., taskset, numactl, etc.) Are Available to Allow Application Control on a System with Dual-core CPUs?

CPU Binding

Processes can be bound to any subset of the CPUs in the system by modifying their CPU affinity mask. A child process inherits the CPU affinity of its parent. In this way, one application (all its processes and threads) can be made to run on a specific set of CPUs, and another application in the same system can run on another (possibly intersecting) set of CPUs.

The command 'taskset' controls which CPUs an application may run on. taskset can modify the CPU affinity of a currently running process, or run a new command with a given CPU affinity. For full details, view the man page.

taskset example:

'taskset -c 0-3,7 startserver'

This invocation will run the command 'startserver' bound to the given set of CPUs (0, 1, 2, 3, 7).

The 'sched_setaffinity' and 'sched_getaffinity' system calls are the programmatic interface to set task CPU affinities. They are the interfaces taskset uses, and can perform any of the same functions. The man page has full API details.

Problem with CPU Affinity

The main problem of CPU binding with affinities is that it can cause Linux's multiprocessor CPU scheduler to become inefficient and possibly stop working completely.

This problem will become noticeable if there are many processes bound to a small set of CPUs, causing those CPUs to be much more highly loaded than other CPUs in the system. This situation will cause load balancing between the less loaded CPUs to stop functioning or function less efficiently.

This problem is not an issue if all processes in the system are carefully bound; however, CPU affinity is not a good tool for more ad hoc administration. For example, for assigning all low-priority batch tasks to a single CPU, 'cpusets' can be a better option.


cpusets

cpusets allow a machine to be partitioned into subsets. These sets may overlap, and in that case they suffer from the same problem as CPU affinities. However, when the cpusets do not overlap, they can be switched to "exclusive" mode, which effectively causes the scheduler to behave as though they are two entirely different systems. As such, scheduling behavior is very good; however, an idle cpuset will remain idle even while others are very loaded.

cpusets can also be used to bind groups of tasks to specific memory nodes. cpuset documentation can be found in Documentation/cpusets.txt in the Linux kernel source.

Memory Binding

Linux supports explicit control and binding of tasks to NUMA memory nodes in a similar manner to CPU binding. A NUMA API for Linux (www.novell.com/collateral/4621437/4621437.pdf), referenced earlier, provides a good introduction to memory binding and control facilities in Linux.

How Does Dual-core Interact with AMD PowerNow! Technology?

In a dual-core system, both cores of each chip will always run at the same frequency. This means the frequency will only be lowered when both cores are idle. If power saving is a priority on a multiprocessor system, then both cores of a chip should be highly utilized before any load is moved to another chip.

The Linux kernel scheduler currently will not schedule tasks with power saving as a goal. In fact, Linux tries to make use of the AMD Opteron processor's memory resources where possible, which can be in direct conflict with a power-saving scheduling strategy. For example, when a new process is forked and calls exec(), Linux will try to move the process to an idle chip even if the other core of the current chip is idle. Such a strategy is usually the right one for best performance, but not for power saving.

Note: AMD Opteron processor-based systems with one dual-core chip are not affected by this trade-off.

Explicit CPU affinities may be required in order to achieve the best power usage for a given workload. Generally, this won't be a concern for servers and workstations; however, for specialized applications such as embedded systems or clusters, power usage can be a factor.

As described above, it is difficult for applications to control CPU affinities in the presence of changing demand or CPU usage patterns. Careful analysis of the workload and CPU usage patterns will be required, and such a task may simply be infeasible.

How Does Dual-core Interact with I/O and Async I/O?

I/O and async I/O are typically very kernel-intensive activities. All recent versions of the Linux kernel (including SUSE Linux Enterprise Server 9 from Novell, and the upcoming SUSE Linux Enterprise Server 10) are generally very well parallelized and NUMA aware. As such, the I/O capability of a system is likely to be improved with dual-core systems, though exact behavior is always workload dependent.

Use of NUMA and multiprocessor tuning techniques in the application, as described above, will often assist the kernel in taking advantage of dual-core chips (and multiprocessor and NUMA systems in general).

Should Hyper-threading Be Disabled on Dual-core?

The question of whether hyper-threading should be disabled on dual-core is difficult to answer.

p. 9 It will be very dependent on the workload and ownership (TCO), and maximum processor the CPU and system architectures in ques- performance. The AMD Opteron processor tion. The POWER5 chip is dual-core and has with Direct Connect Architecture enables one multi-threading (similar to hyperthreading). platform to meet the needs of multi-tasking IBM often submits the commercial perform- environments, providing platform longevity. ance results of its POWER5-based systems with their version of multi-threading turned on. Dual or multi-core processors can immedi- ately benefit businesses and general con- On chips with hyperthreading, perform- sumers by providing the capability to run ance gains are often reported on workloads multimedia and security applications with where there is a significant floating-point increased performance. With multi-core element. processors, a new era of true multitasking emerges. AMD ™ 64 X2 Dual-Core Similarly, for single-core systems, the best processors will take computing to a new advice is to test both hyperthreading enabled level by enabling people to simultaneously and disabled to see which performs better burn a CD, check e-mail, edit digital photos for your specific workload. and run virus protection software—all with increased performance. Multi-core proces- Conclusion sors also will enable the growth of the digital home: a centralized PC will be able to serve The introduction of dual- or multi-core multiple rooms and people in the home. technology will change commercial and consumer computing while offering new Software professionals regularly push the opportunities for software developers. limits of current processor capacity. 
Multi-core The evolution to multi-core processors is processors will solve many of the challenges an exciting technological advancement that currently facing software designers by deliv- will play a central role in driving relevant ering significant performance increases at advancements, providing greater security, a time when they need it most. Multi-core resource utilization and value for businesses processors, in combination with new compiler and consumers. The client and consumer optimizers, will reduce compiling times by markets will have access to superior perform- as much as 50 percent, giving developers a ance and efficiency compared to single-core critical advantage in meeting time-to-market processors, as the next generation of software demands. Software vendors also can use applications are developed to take advantage more multi-threaded design methods for of multi-core processor technology. delivering enhanced features. With the advent of multi-core processors and adoption of Corporate IT systems currently optimized for multi-core computer platforms by businesses SMP multi-threaded applications should see and consumers, software vendors will have a significant performance increases by using much larger marketplace in which to distrib- dual- or multi-core processors. This logical ute new and improved applications. performance boost will take place within cur- rent, available hardware and socket designs, Novell has optimized its SUSE Linux enter- enabling corporate IT managers to add more prise products for AMD64 dual-core technol- sophisticated system layers, like virtualization ogy. Your customers will be able to extract and security, without significant disruption the most in performance and reliability when to legacy systems. Other key benefits are they run your applications on SUSE Linux simplified manageability, lower total cost of and the AMD Opteron platform.


Glossary

Key Terms to Understanding Dual-core:

Cache
A cache is a pool of entries or collection of data duplicating original values stored elsewhere or computed earlier.

Chip
The package that contains one or more cores.

Compute bound
If the performance of an application is limited by a system's CPU resources, it is called compute bound.

Core
The basic unit of execution.

CPU
CPU is the abbreviation of "central processing unit," and is pronounced as separate letters. The CPU is the brains of the computer. Sometimes referred to simply as the processor or central processor, the CPU is where most calculations take place.

Dual-core
Dual-core refers to a CPU that includes two complete execution cores per physical processor.

Integrated circuit
Another name for a chip, an integrated circuit (IC) is a small electronic device made out of a semiconductor material.

Memory bound
If performance is limited by a system's memory bandwidth, it is called memory bound. (Memory bound often refers to applications with a working set too big to fit in physical memory, but that is not the case here.)

Node
A node is a basic unit used to build data structures, such as linked lists and trees. In this paper, however, "node" refers to a NUMA node: a processor chip together with its local memory.

NUMA
Non-Uniform Memory Access (NUMA) is a computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, i.e., memory that is local to another processor or shared between processors.

Processor
"Processor" is the shortened term for microprocessor or central processing unit (CPU). The term can reference a chip or a core (e.g., Linux calls each core a CPU).

SMP
Symmetric multiprocessing (SMP) is the processing of program code by multiple processors that share a common operating system and memory subsystem.

SUSE Linux Enterprise Server 9 from Novell features the most advanced Linux technology available and can support the services, applications and databases that drive your business.

www.novell.com

Contact your local Novell office, or call Novell at:

1 888 321 4272 U.S./Canada
1 801 861 4272 Worldwide
1 801 861 8473 Facsimile

Novell, Inc. 404 Wyman Street Waltham, MA 02451 USA

462-002016-001 | 05/06 | © 2006 Novell, Inc. All rights reserved. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0, 1999 or later. Novell, the Novell logo, the N logo and SUSE are registered trademarks of Novell, Inc. in the United States and other countries. AMD, Athlon and Opteron are trademarks of Advanced Micro Devices, Inc.

*Linux is a registered trademark of Linus Torvalds. All other third-party trademarks are the property of their respective owners.

Novell would like to thank AMD for their contributions to and their support of the creation of this document.