Pipeline Behavior Prediction for Superscalar Processors by Abstract Interpretation

Jörn Schneider*                          Christian Ferdinand

Universität des Saarlandes               AbsInt Angewandte Informatik GmbH
Fachbereich Informatik                   Starterzentrum - Gebäude 45
Postfach 15 11 50                        Universität des Saarlandes
D-66041 Saarbrücken, Germany             D-66123 Saarbrücken, Germany
Phone: +49 681 302 3434                  Phone: +49 681 8318317
Fax: +49 681 302 3065                    Fax: +49 681 8318320
[email protected]                         [email protected]
http://www.cs.uni-sb.de/~js              http://www.AbsInt.de

Abstract

For real time systems not only the logical function is important but also the timing behavior; e.g. hard real time systems must react within their deadlines. To guarantee this it is necessary to know upper bounds for the worst case execution times (WCETs). The accuracy of the prediction of WCETs depends strongly on the ability to model the features of the target processor.

Cache memories, pipelines and parallel functional units are architectural components which are responsible for the speed gain of modern processors. It is not trivial to determine their influence when predicting the worst case execution time of programs.

This paper describes a method to predict the behavior of pipelined superscalar processors and reports initial results of a prototypical implementation for the SuperSPARC I processor.

1 Introduction

The correctness of a hard real-time system depends not only on the logical functionality; the programs must also satisfy the temporal requirements dictated by the physical environment. To analyze the schedulability of a given set of tasks, upper bounds for the worst case execution time (WCET) of the tasks are required. In general it is not possible to determine these upper bounds by measurements or simulation, because all possible combinations of input values must be considered. We use static program analysis to compute the WCET of programs. Our method is based on abstract interpretation.

To obtain sharp bounds for the WCET it is necessary to consider the features of the target processors. Modern processors gain speed through the use of cache memories and pipelines, and the parallel execution of instructions.

In this paper the prediction of the behavior of pipelines combined with parallel execution on a superscalar processor is presented. Initial results of a prototypical implementation for the SuperSPARC I processor [18] are presented.

*Partly supported by DFG (German Research Foundation), Transferbereich 14

LCTES '99 5/99 Atlanta, GA, USA
© 1999 ACM 1-58113-136-4/99/0006...$5.00

2 Superscalar Pipelines

2.1 Principle of Pipelines

The idea of pipelining is to overlap the execution of instructions. This is accomplished by dividing the execution of instructions into several steps (pipeline stages). The ideal case of fully overlapped instructions can not always be reached. Pipeline bubbles can occur in real pipelines, when the pipeline stalls. Situations which can lead to pipeline stalls are called pipeline hazards. There are three types of hazards: structural hazards are caused by resource conflicts, data hazards are caused by data dependences, and control hazards are caused by control flow changes (e.g. branches).

2.2 Superscalar Execution

A measure for the throughput of a processor is the CPI value (Cycles Per Instruction). With the concept of pipelining it is possible to reach at best a CPI value of 1.
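As a back-of-the-envelope sketch (our illustration, not from the paper): on an ideal, never-stalling pipeline with k stages, the first instruction needs k cycles to drain through all stages and each further instruction completes one cycle later, so the CPI approaches 1 for long instruction sequences.

```python
# Hypothetical sketch: cycle count and CPI of an ideal k-stage pipeline
# executing n independent instructions (no stalls, full overlap).

def ideal_pipeline_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles for n instructions: k to fill the pipeline, then 1 each."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)

def cpi(n_instructions: int, n_stages: int) -> float:
    """Cycles per instruction; converges to 1 as n grows."""
    return ideal_pipeline_cycles(n_instructions, n_stages) / n_instructions

# Without pipelining each instruction would take n_stages cycles (CPI = k).
print(cpi(5, 5))     # 9 cycles for 5 instructions -> 1.8
print(cpi(1000, 5))  # -> 1.004
```

Any stall (hazard) inserts bubbles and pushes the CPI back above this ideal value.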

To increase the throughput, i.e. decrease the CPI value further, it is necessary to introduce parallel execution. The parallelization can be done statically or dynamically. The static approach is used e.g. in VLIW (Very Large Instruction Word) processors [8], where the compiler is responsible for selecting the instructions to execute simultaneously. In superscalar processors the dynamic approach is used, i.e. the processor selects the instructions to be processed concurrently during run time.

Stage  Tasks
F0     - Fetch of up to 4 instr. in case of cache hit, or up to 8 instr. from memory
F1     - Filling of the prefetch queue
D0     - Selection of instructions from prefetch queue to form a group of concurrently executable instructions
D1     - Assignment of functional units and register ports to instructions
       - Computation of branch destination addr.
       - Reading of address registers for memory accesses
D2     - Reading of operand registers
       - Computation of virtual addresses for memory references
E0     - Execution of computations in Arithmetic Logic Unit 1 (ALU1)
       - Start of data cache accesses
       - Transfer of floating point instructions to the FPU
E1     - Computations that depend on instructions in the same group are executed in the cascaded ALU2
       - Completion of data cache accesses
WB     - Write back of integer results

Table 1: Stages of the integer pipeline.

Stage  Tasks
FD     Decode of floating point operations
FRD    Read of FP registers
FE     Execution of FP instructions
FL     Rounding and normalization of results
FWB    Write back of floating point results

Table 2: Stages of the floating point pipeline.

2.3 Example: SuperSPARC I

The SuperSPARC I is a superscalar RISC processor, compatible with the SPARC version 8 architecture [19]. It is capable of grouping up to three instructions to execute them concurrently. The grouping of instructions is a dynamic process. The decision which instructions are grouped together depends on the availability of instructions in the prefetch queue or the cache, and on data dependences, resource conflicts, and control flow changes. The data dependences and the resource conflicts can occur between the available instructions and between these and already grouped instructions.

The SuperSPARC I has a main or integer pipeline and a floating point pipeline. The pipeline stages of the main and floating point pipeline are displayed in Table 1 and Table 2.

The processing of a floating point instruction starts in the first part of the main pipeline and continues in the floating point pipeline. The transfer from the integer pipeline to the floating point pipeline is done in E0 (FD).

The execution time of some floating point instructions depends on their operands. If the operands or the result are subnormal values the computation takes longer (see [8]).

3 Pipeline Analysis by Abstract Interpretation

Abstract interpretation is a well developed theory of static program analysis [3]. It is semantics based, thus supporting correctness proofs of program analyses. Abstract interpretation amounts to performing a program's computations using value descriptions or abstract values in place of concrete values.

One reason for using abstract values instead of concrete ones is computability: to ensure that analysis results are obtained in finite time. Another is to obtain results that describe the result of computations on a set of possible (e.g., all) inputs.

The behavior of a program (including its pipeline behavior) is given by the (formal) semantics of the program. To predict the pipeline behavior of a program, its "collecting semantics"¹ is approximated. The collecting semantics gives the set of all program (pipeline) states for a given program point. Such information combined with a path analysis can be used to derive WCET bounds of programs. This approach has successfully been applied to predict the cache behavior of programs [4, 6, 20].

¹In [3], the term "static semantics" is used for this.
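As a toy illustration of this idea (ours, not the paper's): a "collecting" evaluation runs an operation on a whole set of possible inputs at once, yielding the set of all results that could occur at a program point.

```python
# Illustrative sketch: abstract interpretation replaces computations on
# single concrete values by computations on descriptions of value sets.
# Here the "description" is simply the set itself.

def collecting_eval(op, input_set):
    """Apply a concrete operation to every value in a set of inputs."""
    return {op(x) for x in input_set}

square = lambda x: x * x
# All results the program point could produce for inputs {-2, ..., 2}:
print(collecting_eval(square, {-2, -1, 0, 1, 2}))  # {0, 1, 4}
```

For pipeline analysis the elements of these sets are not numbers but concrete pipeline states, as defined in Section 4.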

The approach works as follows: in a first step, the "concrete pipeline semantics" of programs is defined. The concrete pipeline semantics is a simplified semantics that describes only the interesting aspects of the pipeline behavior but ignores other details of execution, e.g. register values, results of computations, etc. In this way each real pipeline state is represented by a concrete pipeline state. In the next step, we define the "abstract pipeline semantics" that "collects" all occurring concrete pipeline states for each program point.

From the construction of the abstract semantics follows that all concrete pipeline states that are included in the collecting semantics for a given program point are also included in the abstract semantics [4]. An abstract pipeline state at a program point may contain concrete pipeline states that cannot occur at this point, due to infeasible paths. This can reduce the precision of the analysis but doesn't affect the correctness (see [4]).

The computation of the abstract semantics has been implemented with the help of the program analyzer generator PAG [14], which allows to generate a program analyzer from a description of the abstract domain (here, sets of concrete pipeline states) and of the abstract semantic functions.

4 Pipeline Semantics

Before the pipeline semantics is presented, which allows to analyze the behavior of superscalar pipelines, we try to motivate its design. The pipeline semantics must at least allow us to detect stalls and to model the selection of instructions for concurrent execution. Therefore, the detection of hazards must be possible and the information which instructions are available must be provided.

To detect structural hazards the resource usage of instructions has to be known. For most modern processors, memory access is too slow to cause data hazards. Dependences over cache accesses can be treated within a data cache/store buffer analysis [6]. It is assumed that data hazards occur only in case of dependences between data registers. They can be detected by modeling read and write ports of registers as resources. Control hazards can also be detected with information about resource usage. An instruction that changes the control flow must write to a special resource, e.g. the next program counter register.

To know which instructions are available the cache behavior and the state of the prefetch queue should be known. The cache behavior is predicted by a separate cache analysis [5]. To model the prefetch queue the pipeline semantics must allow the description of resources with their own state.

Our approach is based on the pipeline analysis framework in [4]. We consider an in-order superscalar processor that can execute a group of up to N instructions concurrently. For superscalar processors it is not sufficient to build a kind of static reservation table, since the assignment of resources to pipeline stages of instructions can change dynamically during the grouping process. Additionally the state of some resources must be provided.

4.1 Concrete Pipeline Semantics

Definition 4.1 (resource association)
Let R = {r_1, ..., r_n} be the set of resource types and resources of the processor. Let PS be the set of pipeline stages. A pair (s, {r_j1, ..., r_jm}) with s ∈ PS and r_j1, ..., r_jm ∈ R is a resource association. RA = (PS × 2^R) denotes the set of all resource associations. □

Definition 4.2 (resource association sequence)
A sequence r̄ ∈ RA* is a resource association sequence. Let "." be the concatenation operator for resource association sequences. □

Definition 4.3
A resource demand sequence is a resource association sequence describing the statically given resource demand of an instruction (type).
A resource allocation sequence is a resource association sequence describing an actual assignment of resources to an instruction. It depends on the current state of the pipeline. □

Resource allocation sequences always start with the resource allocation for the current pipeline stage, i.e. resource allocations of previous pipeline stages are removed when the instruction advances through the pipeline.

Example 4.1
Consider an instruction with the following resource demand sequence:

(s1, {r_x1, ..., r_xb}) . (s2, {r_y1, ..., r_yc}) . (s3, {}) . (s4, {r_z1, ..., r_zd}) . (s4, {r_z1, ..., r_zd}) . ...

This instruction needs the resource types or resources {r_x1, ..., r_xb} in pipeline stage s1, before {r_y1, ..., r_yc} are needed in stage s2. The instruction requires no resources in stage s3. It stays two cycles in s4 and needs {r_z1, ..., r_zd} both times before it continues through the remaining stages. □

The resource demand sequence of an instruction depends only on the instruction type (e.g. ADD or DIV) and the operand types (e.g. register or immediate value). How the resource demand of an instruction can be satisfied depends on the actual situation in the pipeline.
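As a hedged illustration (our own representation, not the paper's data structures), the demand sequence of Example 4.1 can be encoded as a list of (stage, resource set) pairs, with advancement through the pipeline dropping the leading association:

```python
# Sketch: a resource demand sequence as a list of resource associations,
# i.e. (pipeline stage, set of demanded resources) pairs. Stage names
# s1..s4 and resource names are the placeholders from Example 4.1.

demand_sequence = [
    ("s1", frozenset({"rx1", "rx2"})),   # resources needed in s1
    ("s2", frozenset({"ry1"})),          # resources needed in s2
    ("s3", frozenset()),                 # no resources demanded in s3
    ("s4", frozenset({"rz1"})),          # first cycle in s4
    ("s4", frozenset({"rz1"})),          # second cycle in s4
]

def advance(allocation_sequence):
    """Resource allocation sequences start at the current stage: drop the
    association of the stage the instruction has just left."""
    return allocation_sequence[1:]

current = advance(demand_sequence)
print(current[0])   # ('s2', frozenset({'ry1'}))
```

A repeated stage entry, as for s4 above, directly models an instruction staying several cycles in one stage.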

The initial resource allocation sequence of an instruction can differ from the resource demand sequence of the type of this instruction. Where multiple resources of a resource type are available a particular instance is chosen.

A concrete pipeline state describes the occupancy of the pipeline stages by instructions, the current and future resource allocations for these instructions, and the state of some special resources, e.g. the prefetch queue.

Definition 4.4 (concrete pipeline state)
A concrete pipeline state p consists of the resource allocation sequences of the up to N * |PS| (up to N instructions per pipeline stage) instructions which are currently in the pipeline, R# = ((RA*)^N)^|PS|, and the state s_R of some resources, i.e. p = (r#, s_R). P denotes the set of all possible concrete pipeline states¹. □

The concrete pipeline state changes when a new instruction enters the pipeline. The resulting new pipeline state depends on the previous pipeline state, on the (resource demand sequence of the) new instruction, and on the states of other processor parts (e.g. the state of the cache memory).

Definition 4.5 (update function)
Let IS be the instruction set of the processor. The (concrete) update function U : P × IS → P models the effect on the concrete pipeline state caused by the entrance of a new instruction into the pipeline. □

Definition 4.6 (cycles function)
The cycles function C : P × IS → ℕ0 computes the number of cycles needed by a new instruction to enter the pipeline, i.e. the number of cycles needed to reach the pipeline state U(p, i). □

Definition 4.7 (empty function)
The pipeline empty function E : P → ℕ0 computes the number of cycles which are needed to flush the pipeline, i.e. the number of cycles needed to reach the empty pipeline state p_ε. □

4.2 Control Flow Representation

Programs are represented by control flow graphs consisting of nodes and typed edges. The nodes represent instructions. Each instruction is statically assigned a resource demand sequence, i.e. there exists a mapping from control flow nodes to resource demand sequences: res : V → RA*.

The update function U is extended to sequences of instructions:

U(p, (i1, ..., ik)) = U(...U(U(p, i1), i2), ..., ik)

The pipeline behavior of a path (i1, ..., il) in the control flow graph is given by applying U to the empty pipeline state p_ε and the concatenation of all instructions paired with the appropriate processor state information along the path:

U(p_ε, (i1, ..., il))

4.3 Abstract Semantics

There are only finitely many concrete pipeline states and their representation is usually small. Therefore, sets of concrete pipeline states² can be used as the domain for our abstract interpretation, and we do not need space efficient descriptions of sets of concrete pipeline states.

Definition 4.8 (abstract pipeline state)
An abstract pipeline state p̂ ∈ P̂ is a set of concrete pipeline states. P̂ = 2^P denotes the set of all abstract pipeline states. □

The abstract version of the concrete pipeline update function is a canonical extension of the concrete pipeline update function to sets: Û(p̂, i) = {U(p, i) | p ∈ p̂}

Definition 4.9 (pipeline join function)
A join function combines two abstract pipeline states. The join function is given by the least upper bound of the abstract domain. The pipeline join function Ĵ : P̂ × P̂ → P̂ is set union: Ĵ(p̂1, p̂2) = p̂1 ∪ p̂2 □

The join function is used to union the abstract pipeline states of two or more merging paths. To combine more than two values the join function is extended:

Ĵ(p̂1, ..., p̂k) = p̂1 ∪ ... ∪ p̂k

4.4 Pipeline Analysis

In order to solve the pipeline analysis for a program, one can construct a system of recursive equations from its control flow graph. In the program analyzer generator PAG this is only done implicitly. The variables in the equation system stand for abstract pipeline states for program points. For every control flow node k, representing instruction i_k, there is an

equation p̂_k = Û(pred(k), i_k). If k has only one direct predecessor k', then pred(k) = p̂_k'. If k has more than one direct predecessor, pred(k) is the join of the abstract pipeline states of all direct predecessors.

¹This set is finite. The length of the resource allocation sequences is limited, because the number of pipeline stages is finite and the number of repetitions of pipeline stages is also limited. The sets of possible states of resources are finite.

²The domains of the abstract semantics and the collecting semantics are equal. But an abstract pipeline state may contain concrete pipeline states that do not occur in the respective collecting semantics, due to infeasible paths.
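Definitions 4.8 and 4.9 together with the equation system suggest a small executable model. The following sketch is our illustration, not the paper's implementation: a toy concrete update that merely counts cycles, the abstract update as the pointwise extension of U to sets, the join as set union, and the equation system solved by fixpoint iteration over the control flow graph.

```python
# Illustrative sketch: abstract pipeline states are plain Python sets of
# concrete states; the toy concrete update U below is just a cycle counter.

def abstract_update(abstract_state, instruction, U):
    """U^(p^, i) = { U(p, i) | p in p^ }  (Section 4.3)."""
    return {U(p, instruction) for p in abstract_state}

def join(*abstract_states):
    """J^(p^1, ..., p^k) = p^1 u ... u p^k  (Definition 4.9)."""
    return set().union(*abstract_states)

def solve(nodes, preds, instr, U, entry_state):
    """Fixpoint iteration for the equations p^_k = U^(pred(k), i_k)."""
    state = {k: set() for k in nodes}
    changed = True
    while changed:
        changed = False
        for k in nodes:
            inp = entry_state if not preds[k] else join(
                *(state[q] for q in preds[k]))
            new = abstract_update(inp, instr[k], U)
            if new != state[k]:
                state[k], changed = new, True
    return state

# Toy example: a diamond-shaped control flow graph a -> b, a -> c, b/c -> d.
U = lambda p, i: p + (2 if i == "LD" else 1)   # LD "costs" an extra cycle
nodes = ["a", "b", "c", "d"]
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
instr = {"a": "ADD", "b": "LD", "c": "ADD", "d": "ADD"}
print(solve(nodes, preds, instr, U, {0}))
# {'a': {1}, 'b': {3}, 'c': {2}, 'd': {3, 4}}
```

The join at node d keeps both candidate states instead of picking one; this is exactly how infeasible-path imprecision can enter the abstract semantics without hurting correctness.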


    newgrp_stat := closed;
  fi
fi
grp_stat := newgrp_stat;
return(p);

The grouping process is modeled by a set of rules. These rules are based upon the resource demand sequences of available instructions and the resource allocation sequences of instructions which are already further in the pipeline.

Example 4.2
One of the rules that are applied in res_confl() (position (A)) in the update function example says that two instructions that access the data cache cannot be grouped together.
A rule applied in data_dep() (position (B)) says that instructions which access a read port of a data register in stage D1 (i.e. load or store instructions) cannot be grouped with a preceding instruction that uses the write port of this particular data register.
The rule applied at position (C) says that an instruction which uses the write port of the next program counter register (e.g. branches) is always the last in a group.
These rules prevent problems arising from resource conflicts, data dependences, and control flow changes respectively. □

While the application of all these rules can be triggered by the resource allocations of instructions, it is not always necessary to exhaustively search the resource allocation or resource demand sequences for a triggering resource. From the resource demand sequences of the instruction types some necessary preconditions can be precomputed.

Example 4.3
Consider the first rule of Example 4.2. Only the resource allocation sequences of the various types of load and store instructions contain the data cache. Therefore, this rule can be triggered just by looking at the instruction type. □

The stall behavior of the SuperSPARC I is also modeled by rules.

Example 4.4
One of the rules applied in data_hazard() (position (D)) in the update function example says that a pipeline bubble has to be inserted between a group wherein the write port of a data register R_x and the ALU in stage E1 is used and a group which uses the read port of R_x in stage D1. □

In some special cases the documentation of the SuperSPARC I was insufficient for our purposes. Therefore pessimism is introduced in the analyzer. This pessimism results in pessimistic concrete pipeline states. A pessimistic concrete pipeline state contains more resources than the instruction actually uses or allocates resources for more pipeline stages than are actually occupied by the instruction.

Example 4.5
Consider the following SuperSPARC I instruction sequence:

A  8000: ADD R1,R2,R3
B  8004: ADD R3,R3,R4
C  8008: LD  [R4+4],R3
D  8012: SUB R6,R3,R7

The resource demand sequences for these instructions are shown in Table 3. R_x^r and R_x^w are the read and write ports of data register x. DC stands for the data cache and ALU for the resource type arithmetic logic unit.

[Table 3: Resource demand sequences of Example 4.5.]

Figure 1 shows the dynamic change of the resource allocation sequences and the prefetch queue state for this example. The prefetch queue state is modeled by a sequence of instruction addresses (shown in the box under the appropriate call to the update function). The first address is on top of the queue.

In pipeline state p1 a new group is created, which contains just A. The prefetch queue is filled from the cache and A is dequeued. In p2, B joins the group of A. A dynamic change of the resource allocation against the statically assigned resource demand for instruction B occurs: since the result of B depends on the result of A, B must use the cascaded ALU (ALU2) and not ALU1. B is dequeued from the prefetch queue. In p3 a pipeline bubble is inserted and a new group is started for C. The bubble is necessary, since the LD instruction depends on a result of the cascaded ALU in E1, which otherwise could not be forwarded. A and B advance four pipeline stages (two cycles). C is dequeued from the prefetch queue. In p4, D starts a new group since it depends on C, which forwards its result from E1. D is dequeued from the prefetch queue, which is empty then. □

[Figure 1: Application of the concrete update function on Example 4.5. The figure shows, for each call to the update function, the resource allocation sequences over the stages F0, F1, D0, D1, D2, E0, E1 and the prefetch queue contents: p1 = U(p_ε, A, hit) (A starts new group, queue [8004, 8008, 8012]), p2 = U(p1, B, hit) (grouping A and B, queue [8008, 8012]), p3 = U(p2, C, hit) (C causes stall), p4 = U(p3, D, hit) (D starts new group).]

5 Out of Order Execution

Processors with out of order execution are more flexible in choosing instructions for parallel execution. If the foremost instruction in the row doesn't fit, they will try the next one. Of course, out of order pipelines too have to check for hazards.

As stated above, our semantics is suitable for superscalar pipelines with in-order execution. As a matter of fact, the above semantics is also suitable for pipelines with out of order execution. The decision of an out of order execution machine to choose a subset from the available instructions is based on the resource requirements of the instructions. Since our semantics is based on resources, no changes are needed to support out of order execution. Nevertheless the concrete semantic functions, and the cycles and empty functions, must be adapted for a new target processor.

6 Practical Experiments

In this section the results of a first implementation of the pipeline analysis are presented. We have chosen the SuperSPARC I processor as target. The implementation has been done with the program analyzer generator PAG, i.e. the pipeline analyzer is generated from a description of the semantic functions. For the sake of space, this description isn't shown here (for further information see [18]).

The analyzer takes as input the control flow graph of a program and the results of the cache analysis [5]. We conservatively assume that memory references that are not classified as hits by the cache analysis are cache misses¹. The output of the analyzer is a mapping map of instruction/context pairs to pairs of clock cycles. The first element of a clock cycle pair is the result of the cycles function applied to the abstract pipeline state as shown in Section 4.4. The second element is either the result of the empty function applied to the abstract pipeline state for exit instructions² or zero for all other instructions.

A context represents the execution stack, i.e. the trace of function calls and loops along the corresponding path in the control flow graph to the instruction. Let IC be the set of all instruction/context pairs.

map : IC → ℕ0 × ℕ0

In general it is impractical to regard all possible contexts (paths) of a program. The cache behavior (and thereby the pipeline behavior) often exhibits significant differences caused by initialization effects. Therefore, first iterations of each loop are distinguished from further iterations, and first calls to each (recursive) function from recursive calls. Instead of distinguishing all paths, path classes are considered for which similar behavior is expected. This is realized by a generalization of well known interprocedural analysis schemes [15]. The approach is called VIVU (Virtual Inlining of non-recursive functions and Virtual Unrolling of loops and recursive functions).

The frontend of the analyzer reads a Sun SPARC executable in a.out format. The implementation is based on the EEL library of the Wisconsin Architectural Research Tool Set (WARTS).

The worst case execution profile of a program determines how often each instruction/context pair is maximally encountered during the execution of a program. By combination with the results of our pipeline analysis the worst case number of clock cycles needed to execute the input program can be estimated.

¹For the SuperSPARC I this assumption is safe, since cache misses don't lead to accelerations.
²An exit instruction is a last instruction of the program.
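The combination of the analyzer's output with the worst case execution profile can be sketched as follows (our notation; the context names are illustrative): map assigns each instruction/context pair a pair of clock cycles, the profile assigns each pair a maximal execution count, and the WCET estimate is the weighted sum.

```python
# Sketch of the combination step: cycle_map plays the role of
# map : IC -> N0 x N0 (cycles to enter the pipeline, cycles to flush it
# for exit instructions), profile the role of profile : IC -> N0.

def wcet_cycles(cycle_map, profile):
    """Sum over all instruction/context pairs: count * (enter + flush)."""
    return sum(profile.get(ic, 0) * (enter + flush)
               for ic, (enter, flush) in cycle_map.items())

cycle_map = {
    ("i1", "loop.first"): (2, 0),   # first loop iteration (VIVU context)
    ("i1", "loop.rest"):  (1, 0),   # cheaper in later iterations
    ("i2", "exit"):       (1, 4),   # exit instruction: empty() = 4
}
profile = {("i1", "loop.first"): 1, ("i1", "loop.rest"): 9, ("i2", "exit"): 1}
print(wcet_cycles(cycle_map, profile))   # 1*2 + 9*1 + 1*(1+4) = 16
```

Splitting the loop into first/rest contexts is what lets the initialization effects (cold caches, empty pipeline) be charged only once instead of on every iteration.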

For our experiments "exact" execution profiles are used instead of deriving them via a path analysis. This allows us to assess the effectiveness of the pipeline analysis without the influence of possibly pessimistic path analyses. The profilers used to create the profiles are produced with the help of qpt2 (Quick program Profiler and Tracer) [1] that is part of the WARTS distribution. An execution profile maps instruction/context pairs to execution counts:

profile : IC → ℕ0

We have chosen four small programs (see Table 4) to test our implementation.

Program   Comment
lsimple   simple for loop
matmult   matrix multiplication
fibr      recursive computation of the 23. Fibonacci number
pi        approximative computation of Pi

Table 4: List of test programs.

6.1 Improvements by the pipeline analysis

To show the effectiveness of our pipeline analysis we compare the results of the combined instruction cache/pipeline analysis¹ with a (virtual) analysis without cache and pipeline behavior prediction, and with our cache analysis. The CPI (Cycles Per Instruction) values of the different analyses are compared². For the SuperSPARC I the best CPI value that an analysis without cache and pipeline behavior prediction can reach is 13³ (assuming no overlap of instructions and 100% instruction cache miss rate). The best CPI value that can be reached by a cache analysis alone is 4.

Table 5 displays the improvement by the pipeline analysis. In the second column the CPI value according to the combined cache/pipeline analysis is shown. The improvement factor of the cache analysis against the best possible results of an analysis without cache/pipeline behavior prediction is shown in the third column. The additional improvement factor of the combined instruction cache/pipeline analysis against an optimal instruction cache analysis can be found in the fourth column. The last column shows the improvement of our combined cache/pipeline analysis against the best possible results of an analysis without cache/pipeline behavior prediction.

Program   CPI     Cache Analysis   Additional improv.   Combined
                  improv. factor   by Pipeline A.       improv. factor
lsimple   0.556   3.24             7.19                 23.29
matmult   3.436   3.24             1.16                 3.75
fibr      1.800   3.24             2.22                 7.19
pi        3.138   3.24             1.27                 4.11

Table 5: Improvements by the pipeline analysis.

¹100% data cache miss rate for load instructions and 0% for stores are assumed because of the SuperSPARC store buffer.
²We also did some measurements on a Sun SPARCstation 10 with a SuperSPARC I under NetBSD. However, these measurements are only statistical results, since we couldn't yet measure without the influence of a non real time multiuser/multitasking operating system. But we believe that the statistically approximated run time values give a fairly good impression of the capabilities of our pipeline analysis. The ratios of predicted run time and statistically evaluated measurements are 1.01 (lsimple), 1.10 (matmult), 1.03 (fibr) and 2.40 (pi with normal operands).
³Cache miss penalty of 9 cycles plus a minimum of 4 cycles (8 half cycle pipeline stages) for an integer instruction.

7 Related Work

Healy, Whalley and Harmon developed an approach [7] to predict worst case execution times in the presence of instruction caches and simple pipelines. Their analyzer predicts the WCET of a user specified program part. The results of a preceding cache analysis are used. Their target processor is a MicroSPARC I. Since the pipeline of this processor is pretty simple, they can limit the used resources to registers and pipeline stages.

For each instruction type several pieces of information must be presented to the analyzer. These are the first and the last pipeline stage from or to which forwarding is possible and the maximum number of clock cycles per pipeline stage. Each instruction is assigned the registers it uses and the result of the preceding cache analysis.

The analysis of a program path is done by repeated concatenation of instructions. Healy et al. use a bottom up algorithm to apply their approach to programs with loops. For the analysis of the innermost loop all paths through it are merged. The results of this analysis are used in the next higher loop level as if the inner loop was a single instruction. The distinction between first and other loop iterations is not done explicitly but by the cache classifications (first miss, first hit, always miss, always hit). For the step from an inner to an outer loop it can be necessary to apply adjustments to the result of the analysis. To avoid underestimations in the presence of pipelines they have to use a trick which involves adding miss penalties and subtracting them later at outer loop levels. This trick doesn't work with

superscalar processors, since for superscalar pipelines a cache hit can result in a speed gain that is higher than the miss penalty, due to grouping effects [18].

Another approach to predict the WCET of real time programs is presented in [10] by Li, Malik and Wolfe. This is an integrated solution, where both program path analysis and cache and pipeline behavior prediction are based on integer linear programs. The target processor is the i960KB from Intel, which has a pretty simple pipeline. Thus Li, Malik and Wolfe can limit their efforts to the detection of structural hazards. While analyzing simple pipelines and direct-mapped caches is fast, increasing levels of cache associativity sometimes lead to prohibitively high analysis times.

Widely based on [10], Ottoson and Sjödin [17] have developed a framework to estimate WCETs for architectures with pipelines, instruction and data caches. To predict the pipeline behavior they also use a kind of path concatenation like Healy et al., where the maximal overlapping of two instructions is computed. Ottoson and Sjödin restrict themselves to pipeline stages as resources. In an experiment to predict the cache behavior of a very small program they report analysis times of several hours.

In [11] Lim et al. describe a general framework for the computation of WCETs of programs in the presence of pipelines and caches. To model the pipeline behavior they construct a reservation table of resources for each instruction. Registers and pipeline stages are regarded as resources. Lim et al. also use a kind of path concatenation. They use a bottom up algorithm that starts with isolated program constructs. A new reservation table is computed each time an instruction (or a path) is appended to a path. The reservation tables are shortened if possible, by keeping only information from the beginning and the end of the path. Lim et al. focus on the R3000 processor from MIPS, which has a simple five level integer pipeline.

The more recent work of Lim, Han, Kim and Min [12] replaces the reservation tables by instruction dependence graphs. In this work they focus on a virtual processor with an idealized multiple issue pipeline. Like in [11], concatenation and pruning of paths is done during the execution of the bottom up algorithm. Lim, Han, Kim and Min don't consider caches or prefetch queues.

Lundqvist and Stenström describe a simulation based approach in [13]. The theoretical advantage of a simulation is that the values of all operands are known and infeasible paths can thereby be eliminated. Usually this is also the drawback of a simulation approach, since the input is generally unknown. To circumvent this problem Lundqvist and Stenström introduce unknown values, i.e. their simulation is capable of handling programs even if the input values are not known. The introduction of unknown values leads to several problems. For instance, if the target address of a store instruction happens to be an unknown value, the whole main memory becomes unknown. Lundqvist and Stenström try to shrink this problem by reducing the amount of affected memory through relinking of programs in case of statically linked routines. To circumvent the simulation of each path in a loop iteration a path merging strategy is used. The merging of paths leads to loss of information, i.e. to unknown values. The authors report that this loss of information frequently leads to non-termination of the simulation, even if the simulated program terminates. The detection of infeasible paths is also affected by the information loss. The target processor is a PowerPC.

We are aware of two retargetable pipeline analysis approaches that are based on Maril (Marion's machine description language) of the Marion [2] system of David G. Bradlee. Hur et al. [9] have developed a retargetable timing analyzer that has been used to generate analyzers for the MIPS R3000/R3100 and the Motorola 88000. Narasimhan and Nilsen describe in [16] a retargetable tool called pipeline simulator compiler that determines the number of cycles necessary to execute a given instruction sequence assuming 100% cache hits. For their tool there are processor descriptions modeling the pipeline behavior of the MIPS R2000, the PowerPC 601, and a SPARC computer architecture. For a more detailed discussion of these approaches see [4].

8 Conclusion and Future Work

We have shown that the semantics based approach can be used to predict the behavior of modern pipelines. The presented semantics was designed for superscalar processors, but is also suitable to model out of order execution processors.

The results of our first implementation for the SuperSPARC I processor show a clear improvement against the naive approach.

Our implementation has shown that with the VIVU approach it is possible to realize the instruction cache and pipeline behavior prediction independently without significant loss of accuracy. The advantages of this approach are that there is no need to bother about worst case paths during our pipeline analysis, since our results reflect all paths, and that a subsequent path analysis can access context specific information about the behavior of instructions.

Future work includes the development of pipeline analyses for other processors, especially for processors with out of order execution. Work on the integration of the pipeline analysis with the data cache analysis is also in progress. For target systems with out of order execution of data memory accesses it is not sufficient to

treat data cache and pipeline behavior prediction independently.

Additionally, it is planned to integrate the pipeline analysis with the path analysis, as has been done for the cache analysis [20].

A further step can be to incorporate the results of an analysis of floating point operands. This can be important for processors like the SuperSPARC I, which show significantly different execution times for floating point operations depending on their operand values.

Our goal is to develop a set of tools that allow the creation of a pipeline analysis for a new processor from a concise description of this processor.

Acknowledgements

Many members of the compiler design group at the Universität des Saarlandes, especially the members of the USES (Universität des Saarlandes Embedded Systems) group, deserve acknowledgement. Reinhard Wilhelm provided many valuable hints and suggestions, and has carefully read draft versions of this work. Florian Martin provided his vast knowledge about PAG and adjusted this tool for the special needs of the pipeline analysis. We would like to thank Mark D. Hill, James R. Larus, Alvin R. Lebeck, Madhusudhan Talluri, and David A. Wood for making available the Wisconsin architectural research tool set (WARTS), and the anonymous reviewers for their helpful comments.

References

[1] T. Ball and J. Larus. Optimally Profiling and Tracing Programs. In Proceedings of the 19th ACM Symposium on Principles of Programming Languages, pages 59-70, Jan. 1992.
[2] D. G. Bradlee. Retargetable Instruction Scheduling for Pipelined Processors. PhD Thesis, Technical Report 91-08-07, University of Washington, 1991.
[3] P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Proceedings of the 4th ACM Symposium on Principles of Programming Languages, pages 238-252, Los Angeles, CA, 1977.
[4] C. Ferdinand. Cache Behavior Prediction for Real-Time Systems. Dissertation, Universität des Saarlandes, Sept. 1997.
[5] C. Ferdinand, F. Martin, and R. Wilhelm. Applying Compiler Techniques to Cache Behavior Prediction. In Proceedings of the ACM SIGPLAN Workshop on Language, Compiler and Tool Support for Real-Time Systems, pages 37-46, June 1997.
[6] C. Ferdinand and R. Wilhelm. On predicting data cache behavior for real-time systems. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems, volume 1474 of Lecture Notes in Computer Science, pages 16-30. Springer, 1998.
[7] C. A. Healy, D. B. Whalley, and M. G. Harmon. Integrating the Timing Analysis of Pipelining and Instruction Caching. In Proceedings of the IEEE Real-Time Systems Symposium, pages 288-297, Dec. 1995.
[8] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., 2nd edition, 1996.
[9] Y. Hur, Y. H. Bea, S.-S. Lim, B.-D. Rhee, S. L. Min, Y. C. Park, M. Lee, H. Shin, and C. S. Kim. Worst Case Timing Analysis of RISC Processors: R3000/R3010 Case Study. In Proceedings of the IEEE Real-Time Systems Symposium, pages 308-319, Dec. 1995.
[10] Y.-T. S. Li, S. Malik, and A. Wolfe. Efficient Microarchitecture Modeling and Path Analysis for Real-Time Software. In Proceedings of the IEEE Real-Time Systems Symposium, pages 298-307, Dec. 1995.
[11] S.-S. Lim, Y. H. Bae, G. T. Jang, B.-D. Rhee, S. L. Min, C. Y. Park, H. Shin, K. Park, S.-M. Moon, and C. S. Kim. An Accurate Worst Case Timing Analysis for RISC Processors. IEEE Transactions on Software Engineering, 21(7):593-604, July 1995.
[12] S.-S. Lim, J. H. Han, J. Kim, and S. L. Min. A worst case timing analysis technique for multiple-issue machines. In Proceedings of the IEEE Real-Time Systems Symposium '98, 1998.
[13] T. Lundqvist and P. Stenström. Integrating path and timing analysis using instruction-level simulation techniques. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems, volume 1474 of Lecture Notes in Computer Science, pages 1-15. Springer, 1998.
[14] F. Martin. PAG - an efficient program analyzer generator. International Journal on Software Tools for Technology Transfer, 2(1):46-67, 1998.
[15] F. Martin, M. Alt, R. Wilhelm, and C. Ferdinand. Analysis of Loops. In Proceedings of the 7th International Conference on Compiler Construction, volume 1383 of Lecture Notes in Computer Science. Springer, March/April 1998.
[16] K. Narasimham and K. D. Nilsen. Portable Execution Time Analysis for RISC Processors. In Proceedings of the ACM SIGPLAN Workshop on Language, Compiler and Tool Support for Real-Time Systems, 1994.
[17] G. Ottoson and M. Sjödin. Worst-Case Execution Time Analysis for Modern Hardware Architectures. In Proceedings of the ACM SIGPLAN Workshop on Language, Compiler and Tool Support for Real-Time Systems, pages 47-55, June 1997.
[18] J. Schneider. Statische Pipeline-Analyse für Echtzeitsysteme. Diplomarbeit, Universität des Saarlandes, Oct. 1998.
[19] SPARC International, Inc., Menlo Park, California, U.S.A. The SPARC Architecture Manual, Version 8, 1992.
[20] H. Theiling and C. Ferdinand. Combining Abstract Interpretation and ILP for Microarchitecture Modelling and Program Path Analysis. In Proceedings of the IEEE Real-Time Systems Symposium '98, pages 144-153, Dec. 1998.
