Pipeline and Vector Processing
Computer Organization and Architecture
Chapter 4: Pipeline and Vector Processing
Compiled By: Er. Hari Aryal [[email protected]] | Reference: W. Stallings

4.1 Pipelining

Pipelining is a technique of decomposing a sequential process into suboperations, with each subprocess executed in a special dedicated segment that operates concurrently with all other segments. The overlapping of computation is made possible by associating a register with each segment in the pipeline. The registers provide isolation between segments so that each can operate on distinct data simultaneously.

Perhaps the simplest way of viewing the pipeline structure is to imagine that each segment consists of an input register followed by a combinational circuit.
o The register holds the data.
o The combinational circuit performs the suboperation in the particular segment.
A clock is applied to all registers after enough time has elapsed to perform all segment activity.

The pipeline organization will be demonstrated by means of a simple example: perform the combined multiply and add operation with a stream of numbers

    Ai * Bi + Ci    for i = 1, 2, 3, ..., 7

Each suboperation is implemented in a segment within the pipeline:

    R1 ← Ai,  R2 ← Bi           Input Ai and Bi
    R3 ← R1 * R2,  R4 ← Ci      Multiply and input Ci
    R5 ← R3 + R4                Add Ci to the product

Each segment has one or two registers and a combinational circuit, as shown in Fig. 4-1. The five registers are loaded with new data every clock pulse. The effect of each clock is shown in Table 4-1.

Fig 4-1: Example of pipeline processing
Table 4-1: Contents of Registers in Pipeline Example

General Considerations

Any operation that can be decomposed into a sequence of suboperations of about the same complexity can be implemented by a pipeline processor. The general structure of a four-segment pipeline is illustrated in Fig. 4-2. We define a task as the total operation performed in going through all the segments of the pipeline. The behavior of a pipeline can be illustrated with a space-time diagram, which shows the segment utilization as a function of time.

Fig 4-2: Four-segment pipeline

The space-time diagram of a four-segment pipeline is shown in Fig. 4-3. Suppose a k-segment pipeline with a clock cycle time tp is used to execute n tasks.
o The first task T1 requires a time equal to k*tp to complete its operation.
o The remaining n-1 tasks emerge from the pipeline at the rate of one per clock cycle and are completed after an additional time of (n-1)*tp.
o Therefore, to complete n tasks using a k-segment pipeline requires k + (n-1) clock cycles.
Consider a non-pipeline unit that performs the same operation and takes a time equal to tn to complete each task. The total time required for n tasks is n*tn.

Fig 4-3: Space-time diagram for pipeline

The speedup of pipeline processing over an equivalent non-pipeline processing is defined by the ratio

    S = n*tn / ((k + n - 1)*tp)

As n becomes much larger than k - 1, the speedup approaches

    S = tn / tp

If we assume that the time it takes to process a task is the same in the pipeline and non-pipeline circuits, i.e. tn = k*tp, the speedup reduces to

    S = k*tp / tp = k

This shows that the theoretical maximum speedup a pipeline can provide is k, where k is the number of segments in the pipeline.
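The clock-by-clock behavior summarized in Table 4-1 and the speedup ratio above can be checked with a short program. The following is a minimal Python sketch (an illustration, not part of the original text); the operand values and the timing figures tp = 20 ns, tn = 60 ns are chosen only for demonstration.

    # Illustrative sketch: clock the three-segment Ai*Bi + Ci pipeline and
    # evaluate the speedup ratio S = n*tn / ((k + n - 1)*tp).

    def simulate_multiply_add_pipeline(A, B, C):
        """Print the register contents after every clock pulse (cf. Table 4-1)."""
        n = len(A)
        R1 = R2 = R3 = R4 = R5 = None              # the five pipeline registers
        print("clock    R1    R2    R3    R4    R5")
        for clk in range(n + 2):                   # n input loads + 2 pulses to drain
            # All registers are clocked together, so each segment must use the
            # values latched by the previous segment on the preceding pulse.
            R5 = R3 + R4 if R3 is not None else None        # segment 3: add Ci
            if R1 is not None:                              # segment 2: multiply, load Ci
                R3, R4 = R1 * R2, C[clk - 1]
            else:
                R3, R4 = None, None
            if clk < n:                                     # segment 1: input Ai, Bi
                R1, R2 = A[clk], B[clk]
            else:
                R1, R2 = None, None
            print(f"{clk + 1:>5} {R1!s:>5} {R2!s:>5} {R3!s:>5} {R4!s:>5} {R5!s:>5}")

    def pipeline_speedup(n, k, tp, tn):
        """S = n*tn / ((k + n - 1)*tp)."""
        return (n * tn) / ((k + n - 1) * tp)

    if __name__ == "__main__":
        A = [1, 2, 3, 4, 5, 6, 7]
        B = [2, 2, 2, 2, 2, 2, 2]
        C = [5, 5, 5, 5, 5, 5, 5]
        simulate_multiply_add_pipeline(A, B, C)
        # Assumed timings for demonstration: tp = 20 ns per segment, tn = k*tp = 60 ns.
        print("speedup:", pipeline_speedup(n=7, k=3, tp=20, tn=60))

For k = 3 segments and n = 7 tasks the simulation uses k + (n - 1) = 9 clock pulses, and with tn = k*tp the speedup works out to 7*60 / (9*20) ≈ 2.33, approaching the maximum of k = 3 only as n grows large.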
To duplicate the theoretical speed advantage of a pipeline process by means of multiple functional units, it is necessary to construct k identical units that operate in parallel. This is illustrated in Fig. 4-4, where four identical circuits are connected in parallel. Instead of operating on the input data in sequence, as in a pipeline, the parallel circuits accept four input data items simultaneously and perform four tasks at the same time.

Fig 4-4: Multiple functional units in parallel

There are various reasons why a pipeline cannot operate at its maximum theoretical rate.
o Different segments may take different times to complete their suboperation.
o It is not always correct to assume that a non-pipeline circuit has the same time delay as an equivalent pipeline circuit.
There are two areas of computer design where the pipeline organization is applicable:
o Arithmetic pipeline
o Instruction pipeline

4.2 Parallel Processing

Parallel processing is a term used to denote a large class of techniques that provide simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer system. The purpose of parallel processing is to speed up the computer's processing capability and increase its throughput, that is, the amount of processing that can be accomplished during a given interval of time. The amount of hardware increases with parallel processing, and with it the cost of the system.

Parallel processing can be viewed at various levels of complexity.
o At the lowest level, we distinguish between parallel and serial operations by the type of registers used, e.g. shift registers versus registers with parallel load.
o At a higher level, parallelism is achieved by having a multiplicity of functional units that perform identical or different operations simultaneously.
Fig. 4-5 shows one possible way of separating the execution unit into eight functional units operating in parallel. A multifunctional organization is usually associated with a complex control unit that coordinates all the activities among the various components.

Fig 4-5: Processor with multiple functional units

Parallel processing can be classified in a variety of ways:
o Internal organization of the processors
o Interconnection structure between processors
o The flow of information through the system
M. J. Flynn classifies the organization of a computer system by the number of instruction streams and data streams that are manipulated simultaneously:
o Single instruction stream, single data stream (SISD)
o Single instruction stream, multiple data stream (SIMD)
o Multiple instruction stream, single data stream (MISD)
o Multiple instruction stream, multiple data stream (MIMD)

SISD
Represents the organization of a single computer containing a control unit, a processor unit, and a memory unit. Instructions are executed sequentially, and the system may or may not have internal parallel processing capabilities; parallel processing may be achieved by means of multiple functional units or by pipeline processing.

SIMD
Represents an organization that includes many processing units under the supervision of a common control unit. All processors receive the same instruction from the control unit but operate on different items of data. The shared memory unit must contain multiple modules so that it can communicate with all the processors simultaneously.
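The SISD/SIMD distinction can be caricatured in a few lines of code. The following Python sketch is purely illustrative and not part of the text; a thread pool stands in, very loosely, for the lock-step processing units of a real SIMD machine.

    # Illustrative sketch: one "instruction" (add_ten) executed SISD-style, one
    # datum at a time, versus SIMD-style, broadcast to several processing units.
    from concurrent.futures import ThreadPoolExecutor

    def add_ten(x):
        # The single instruction issued by the control unit.
        return x + 10

    def sisd(data):
        # One control unit, one processor unit: the data stream is consumed
        # sequentially, one item per instruction execution.
        return [add_ten(x) for x in data]

    def simd(data, num_units=4):
        # One control unit broadcasts the same instruction to num_units
        # processing units, each working on a different item of the data stream.
        with ThreadPoolExecutor(max_workers=num_units) as units:
            return list(units.map(add_ten, data))

    if __name__ == "__main__":
        stream = [1, 2, 3, 4, 5, 6, 7, 8]
        assert sisd(stream) == simd(stream)   # same result, different organization
        print(simd(stream))                   # [11, 12, 13, 14, 15, 16, 17, 18]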
MISD & MIMD

The MISD structure is only of theoretical interest, since no practical system has been constructed using this organization. The MIMD organization refers to a computer system capable of processing several programs at the same time, e.g. multiprocessor and multicomputer systems.

Flynn's classification depends on the distinction between the performance of the control unit and the data-processing unit. It emphasizes the behavioral characteristics of the computer system rather than its operational and structural interconnections. One type of parallel processing that does not fit Flynn's classification is pipelining.

We consider parallel processing under the following main topics:
o Pipeline processing: an implementation technique in which arithmetic suboperations or the phases of a computer instruction cycle overlap in execution.
o Vector processing: deals with computations involving large vectors and matrices.
o Array processing: performs computations on large arrays of data.

4.3 Arithmetic Pipeline

Pipeline arithmetic units are usually found in very high speed computers, where they implement floating-point operations, multiplication of fixed-point numbers, and similar computations encountered in scientific problems. Floating-point operations are easily decomposed into suboperations, and an example of a pipeline unit for floating-point addition and subtraction follows.

The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers

    X = A × 2^a
    Y = B × 2^b

where A and B are fractions that represent the mantissas, and a and b are the exponents.

The floating-point addition and subtraction can be performed in four segments, as shown in Fig. 4-6. The suboperations performed in the four segments are:
o Compare the exponents. The larger exponent is chosen as the exponent of the result.
o Align the mantissas. The exponent difference determines how many times the mantissa associated with the smaller exponent must be shifted to the right.
o Add or subtract the mantissas.
o Normalize the result. When an overflow occurs, the mantissa of the sum or difference is shifted right and the exponent is incremented by one. When an underflow occurs, the number of leading zeros in the mantissa determines the number of left shifts of the mantissa and the number that must be subtracted from the exponent.
The following numerical example may clarify the suboperations performed in each segment.
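Alongside that numerical example, the four suboperations can also be traced in software. The following minimal Python sketch is an illustration under simplifying assumptions (decimal mantissa/exponent pairs and example values chosen only for demonstration), not the hardware pipeline of Fig. 4-6.

    # Illustrative sketch of the four suboperations of floating-point addition.
    # Operands are (mantissa, exponent) pairs X = A*base^a, Y = B*base^b with
    # normalized fractional mantissas, e.g. 0.9504 * 10^3.

    def fp_add(A, a, B, b, base=10):
        # Segment 1: compare the exponents; the larger one becomes the result exponent.
        diff = a - b
        exponent = max(a, b)
        # Segment 2: align the mantissa of the smaller exponent by shifting it right
        # (dividing by the base) once for every unit of exponent difference.
        if diff > 0:
            B = B / (base ** diff)
        elif diff < 0:
            A = A / (base ** -diff)
        # Segment 3: add (or subtract) the aligned mantissas.
        mantissa = A + B
        # Segment 4: normalize the result.
        if abs(mantissa) >= 1:                # overflow: shift right, increment exponent
            mantissa /= base
            exponent += 1
        while mantissa != 0 and abs(mantissa) < 1 / base:   # underflow: shift left
            mantissa *= base
            exponent -= 1
        return mantissa, exponent

    if __name__ == "__main__":
        # X = 0.9504 * 10^3, Y = 0.8200 * 10^2  ->  roughly 0.10324 * 10^4
        print(fp_add(0.9504, 3, 0.8200, 2))

In the pipelined hardware each of these four steps is a separate segment with its own registers, so four different operand pairs can be in flight at once; the sketch above shows only the data flow for a single pair.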