Architectural Design of a Reconfigurable Computational Platform

Prepared for: David Buehler

Prepared by: Team Morphing Architecture:

John Geidl, Paul Jaszkowiak, William Walker

University of Idaho

November 27, 2004 Senior Design Proposal 2

Table of Contents

1.0 Executive Summary

2.0 Background

2.1 Situational Background

2.2 Problem

2.3 Previous Research

2.4 Limits

3.0 Problem Definition

3.1 Project Goals

3.2 Project Requirements

3.3 Project Constraints

4.0 Concepts Considered

4.1 Basic Considerations

4.2 RMM Concepts

4.3 Reconfigurable Interconnect Concepts

4.4 Control Unit Concepts

5.0 Concept Selection

5.1 Hardware Selection

5.2 RMM Concept Selection

5.3 Reconfigurable Interconnect Concept Selection

5.4 Control Unit Concept Selection

6.0 System Architecture

6.1 System Components

6.2 Data Lines

6.2.1 Memory Data Lines

6.2.2 RPN Data Lines

6.2.3 Off-chip Data Lines

6.3 Control Lines

6.3.1 Off-chip Instruction Lines

6.3.2 Instruction Configuration Lines

6.3.3 Off-chip Assert Line

6.3.4 Start/Stop Line

6.3.5 Push/Pop/Read/Write Lines

6.3.6 Memory Empty Lines

6.4 Overall System

7.0 Control Unit

7.1 Control Unit Components

7.1.1 Instruction Decoder

7.1.2 Memory Counter and Comparator

8.0 Reconfigurable Memory Module

8.1 RMM Components

8.1.1 RAM Memory

8.1.2 Address Generator

9.0 Reconfigurable Processing Node

9.1 FPGA Simulation

10.0 Reconfigurable Interconnect

10.1 Routing Multiplexers

11.0 Economic Analysis

11.1 Current Estimates

11.2 Purchasing Alternatives

12.0 Future Work

12.1 Phase II

12.2 Phase III

13.0 Appendix

List of Figures

Figure 1 – RCP General Architecture

Figure 2 – RCP Detailed Architecture

Figure 3 – RMM Architecture

Figure 4 – RPN Architecture

Figure 5 – Reconfigurable Interconnect Architecture

1.0 Executive Summary

This report details the architectural design of a Reconfigurable Computational Platform (RCP). The RCP will become part of a larger project to improve the computing speed of hardware while maintaining the tolerances required for use in space. This design is meant to be a prototype, with an emphasis on overall design, for an extended project. The architecture presented in this report centers on the flow of data between components of the RCP. This design improves the speed of a computational element by eliminating memory-access times from instructions and minimizing instruction decoding.

The Reconfigurable Memory Modules (RMMs) implement instructions without the need for outside control signals during run-time. The RMMs implement software-style storage (stack, queue, and sequential access) for faster, easier accesses. Specifying the RMM configuration allows the RMM to perform its own addressing in parallel with the Reconfigurable Processing Nodes (RPNs). The separation of duties between the RMMs and RPNs allows for much shorter instruction cycle times than would otherwise be possible. The RMMs communicate with the RPNs during run-time, but only when the RPNs indicate that a new set of data is needed.

This architecture increases the speed of the RCP but does not rely on special hardware. Since no special hardware is needed, the architecture presents a cost-effective solution to increasing the speed of Space-safe systems. The design will be implemented in VHDL and later in hardware in the final phases of this project, with future implementations following the design presented in this report.

2.0 Background

2.1 Situational Background

The continued exploration of space requires ever increasing computational power (bits per second) from spacecraft. Detailed data retrieval and the sheer amount of data needed for scientific purposes demand faster, larger resources. New discoveries can only be made with more information, and current architectures are reaching their limitations. A fresh approach is needed.

2.2 Problem

Increasing the computational power of systems requires that the speed be increased. Increasing the speed of a microprocessor usually requires larger, heavier components or components that have less radiation hardness (ability to resist radiation damage to hardware components). Although these components increase computational speed, the size, weight, and lack of radiation hardness create problems when used in spacecraft, where these areas are constrained.

A Field Programmable Processor Array (FPPA) can increase the computational speed by using circuitry dedicated to specific operations. The FPPA reconfigures its parts to be optimized for specific operations, increasing the speed. Reconfiguring can increase speed due to specialty circuits, but would only use standard hardware (such as FPGAs and CPLDs). This design is for one element of the FPPA, the Reconfigurable Computational Platform (RCP).

2.3 Previous Research

Previous research by David Buehler of the University of Idaho, for the NASA Space Grant, determined the overall design of an FPPA. This design will consist of numerous RCPs cascaded together. The RCPs will work in parallel on specific tasks to complete an operation. Each RCP will be configured to work on one section of a problem, with the partial results combined to form the result of the operation. Buehler’s previous research has developed the overall design of the FPPA, and work has begun on one component of an RCP, the Reconfigurable Processing Nodes (RPNs).

2.4 Limits

The project will be completed in three phases:

• design,

• VHDL implementation,

• hardware implementation.

The design phase covers the overall design of an RCP and the design of its subcomponents, with the exception of the RPNs. Buehler’s research covers the design of the RPNs, so their design is beyond the scope of this project. Instead, the function of the RPNs will be simulated to help verify the design of the other components.

3.0 Problem Definition

3.1 Project Goals

The main goal of this project is to provide a working prototype of the architecture for a Reconfigurable Computational Platform. To this end, the project is divided into three phases. Phase I will design the RCP architecture in schematics. Phase II will implement these schematics using VHDL. Phase III will construct the RCP architecture using hardware components. The project will deliver working hardware that demonstrates Apodization and Pixel Replacement. This report concludes Phase I of the project, the schematic design.

3.2 Project Requirements

The two main functional requirements are to perform Apodization and Pixel Replacement on a provided data set. Apodization is the process by which interferograms with invalid data near the edges are corrected. This process can be done by multiplying the values in an interferogram by a predetermined cosine function and then passing the result through a high-pass filter. Pixel Replacement corrects satellite pictures by replacing “hot” pixels (white pixels caused by radiation) with the spatial average of their neighbors. Pixel Replacement requires detecting the hot pixels and averaging the values of several neighboring pixels.

In order to accomplish these tasks, the design of the RCP must include:

• Two Reconfigurable Memory Modules (RMMs)

• Two RPNs

• One Reconfigurable Interconnect (RIC)

• One Control Unit

• 16-bit data paths

• Support for at least two modes: loading and run-time

• Support of Stack, Queue, and Sequential Memory accessing

• Support of Read and Write modes in both RMMs

• Throughput of at least 10 million 16-bit samples per second
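As a quick check on the throughput requirement, the raw data rate and per-sample time budget implied by 10 million 16-bit samples per second can be worked out as follows. The function name is illustrative only and is not part of the design.

```python
def throughput_budget(samples_per_second, bits_per_sample):
    """Return (bits per second, bytes per second, nanoseconds per sample)."""
    bits_per_second = samples_per_second * bits_per_sample
    bytes_per_second = bits_per_second // 8
    ns_per_sample = 1_000_000_000 // samples_per_second
    return bits_per_second, bytes_per_second, ns_per_sample


bits, byts, ns = throughput_budget(10_000_000, 16)
print(bits)  # 160000000 bits per second
print(byts)  # 20000000 bytes per second
print(ns)    # 100 ns available per sample
```

In other words, each 16-bit sample must move through the data path in roughly 100 ns on average to meet the requirement.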

3.3 Project Constraints

The customer, David Buehler, constrained the project in the following ways:

• All data must pass through the RIC

• No control signals may pass through the RIC

• RMMs must signal when an operation has completed

• RMM addressing must be handled internally

• Run-time mode must be independent of the Control Unit

• The design must be realizable in Space-safe hardware

• Memory must support both Read and Write operations for every instruction

4.0 Concepts Considered

4.1 Basic Considerations

The team was given specifications that the system would consist of two Reconfigurable Memory Modules, two Reconfigurable Processing Nodes, one Reconfigurable Interconnect, and one Control Unit. However, it was up to the team to determine the hardware, communication, and implementation of the system, given these specifications. The choices available for implementing the components in hardware are FPGAs, CPLDs, and microcontrollers.

4.2 RMM Concepts

The RMM requires an instruction signal in order to determine its configuration, and so it was determined that a 2-bit configuration line would be the minimum required. However, adding one bit to the configuration line would allow the instructions to be encoded using one-hot encoding, in which each configuration has its own dedicated bit.
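The trade-off between the 2-bit and 3-bit configuration lines can be illustrated with a small sketch. It assumes the three configurations are the stack, queue, and sequential types named elsewhere in this report; the bit assignments and function names are hypothetical.

```python
# Assumption: the three RMM configurations are the access types in this report.
RMM_CONFIGS = ["stack", "queue", "sequential"]


def binary_width(n_configs):
    """Minimum number of bits needed to number n_configs configurations."""
    return max(1, (n_configs - 1).bit_length())


def one_hot_codes(configs):
    """Assign each configuration its own bit (hypothetical bit order)."""
    return {name: 1 << i for i, name in enumerate(configs)}


def decode_one_hot(word, configs):
    """Decode a one-hot word; exactly one valid bit must be set."""
    if word == 0 or word & (word - 1) or word >= 1 << len(configs):
        raise ValueError("not a valid one-hot word")
    return configs[word.bit_length() - 1]


print(binary_width(len(RMM_CONFIGS)))        # 2 bits when binary encoded
print(len(RMM_CONFIGS))                      # 3 bits when one-hot encoded
print(decode_one_hot(0b010, RMM_CONFIGS))    # queue
```

With one-hot encoding, each configuration can be enabled directly from a single line, so the RMM needs no decoding logic beyond checking one bit.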

4.3 Reconfigurable Interconnect Concepts

The RIC serves as a configurable data path for the entire system. One concept considered was whether the RIC would also pass control signals to the different components.

4.4 Control Unit Concepts

The system needed a way to set operational modes and switch between them in a reliable manner. The Control Unit would do this, but would need to issue multiple commands to the various components. The first design issue was how many modes of operation the system would require. The mode issue was based on the different operations and how the system would switch modes. Switching modes could be signaled by the Control Unit with a Run/Stop bit or with a Data Valid bit. Another option was static timing of the system. If the Run/Stop bit were used, the other components would be configured using configuration lines, and the operation would begin when the Run/Stop bit changed. With the Data Valid bit, the configuration lines might change while the components were still operating. The Data Valid bit would go high when valid data propagated through the system. Static timing would require any off-system device communicating with the RCP to know the internal timing of the system.

Memory loading could be controlled by an external source or by the Control Unit. However, memory loading also raised the question of whether the data would be passed through the RIC or whether there would be data lines specifically dedicated to loading. If there were dedicated data lines for loading the memory, the Control Unit might not need to be involved at all. Once the system has been configured and the memory loaded, the system will enter a run state.

5.0 Concept Selection

5.1 Hardware Selection

CPLDs will be used for the RMMs, the RIC, and the Control Unit because the CPLDs simulate the non-volatile components of the actual system. FPGAs will be used in the RPNs because the RPNs will be dynamic components in the actual system.

5.2 RMM Concept Selection

The configuration lines for the RMMs will use one-hot encoding to cut down on the complexity of the decoding system in the RMMs and increase the total speed.

5.3 Reconfigurable Interconnect Concept Selection

The RIC will not pass control signals to the different components. It is designed for data only, which simplifies the architecture and keeps the RIC to the single task of data routing.

5.4 Control Unit Concept Selection

The Control Unit will use a Run/Stop bit to change modes. The Run/Stop bit provides the easiest means of synchronizing the various components while ensuring that enough time is given to flush the system. This will also allow further modes to be included at a later time with little effort.

6.0 System Architecture

6.1 System Components

The architecture of the RCP is composed of eight components. Six of these components are mentioned in the requirements and consist of a Control Unit, two RPNs, two RMMs, and an RIC. The other two components are the connections between the subsystems in the form of data and control lines. The Control Unit only needs to operate with the control signals and has no need for data lines. Hence, the data lines pass data between the different components (with the exception of the Control Unit), and on and off the chip. Specifically, due to the restriction that all data pass through the RIC, the data lines connect the two RMMs, the two RPNs, and the outside world through the RIC. These data lines can be seen as the red lines in Figure 1 in the appendix.

The control lines determine how the data is passed in the architecture. The flow of data depends on what operation is being performed. The Control Unit receives a control signal from outside the RCP to determine what operation is needed. The Control Unit then sets the controls for all the other components. The control lines can be seen as the blue and green lines in Figure 1 in the appendix.

6.2 Data Lines

Both the data lines and control lines consist of more than just one signal to or from components. The data lines must be specified as to the number of bits per line, which is 16, and the number of lines for both input and output. The number of lines is determined from the operations to be performed and what each component needs in order to perform that operation.

6.2.1 Memory Data Lines

Each RMM uses two data lines. One line is dedicated for input and the other for output. The reasoning for two data lines is that each instruction may choose to read from memory, write to memory, or do both. By having the option available for both operations to occur in the same instruction cycle, the time delay for addressing can be greatly reduced.

6.2.2 RPN Data Lines

Each RPN uses two input data lines and one output data line. The two input lines will be used for operations involving multiple values, as most instructions will require two operands. The input data lines may be routed to either RMM, or outside the chip, to provide the input for the RPN operation. The output is on a single line, because a single output can be routed to multiple destinations.

6.2.3 Off-chip Data Lines

Two data lines connect the RCP to the outside world. One is dedicated input and the other dedicated output. Only one input and one output will be used because, even though multiple sets of information may need to be brought onto the chip, it is faster to load each set of data into an RMM using loading mode and then operate on those values from the memory. This also simplifies handshaking between the RCP and the off-chip entity by allowing the Control Unit to initiate all data transfers.

6.3 Control Lines

The control lines determine data routing information (for the RIC) and timings. Most control signals are generated by the Control Unit for configuration purposes, but some feedback control signals help with timing issues.

6.3.1 Off-chip Instruction Lines

The Off-chip Instruction Lines determine what mode the RCP should be operating in and what operations should be run. These lines would be generated by a scheduling device not associated with the RCP.

6.3.2 Instruction Configuration Lines

The Instruction Configuration Lines are generated by the Control Unit and are sent to all the other components. These lines set the configurations of all the other devices. The configurations convey mode type and operation type. The number of bits in the signal is kept to the minimum needed to realize all the modes and instructions.

6.3.3 Off-chip Assert Line

The Off-chip Assert Line is used when loading data from off-chip into an RMM. This signal is generated by the Control Unit and controls the timing of data transmissions into the RMMs. The signal is asserted after setup and hold times for the RMMs have been met. The signal consists of a single bit.

6.3.4 Start/Stop Line

The Start/Stop Signal is a single bit with 1 being start and 0 being stop. This bit allows the components of the RCP to finish configuring themselves before beginning operations. This bit prevents timing errors associated with the propagation of the Instruction Configuration Lines.
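The gating role of the Start/Stop bit can be sketched with a small behavioral model. It assumes that a component accepts configuration only while the bit is 0 and freezes it once the bit is 1; that latching behavior, and the class and attribute names, are assumptions made for illustration rather than details taken from the design.

```python
class GatedComponent:
    """Behavioral model of one RCP component gated by the Start/Stop bit."""

    def __init__(self):
        self.config = None    # last configuration latched while stopped
        self.cycles_run = 0   # number of clock cycles spent operating

    def clock(self, config_lines, start):
        """Advance one clock; return True if the component operated."""
        if start == 0:
            # Stopped: accept whatever is on the configuration lines.
            self.config = config_lines
            return False
        # Started: configuration is frozen (assumption) and work proceeds.
        self.cycles_run += 1
        return True


rmm = GatedComponent()
rmm.clock(0b001, start=0)   # configuration lines still settling
rmm.clock(0b010, start=0)   # final configuration latched
rmm.clock(0b100, start=1)   # a late change on the lines is ignored
print(rmm.config, rmm.cycles_run)  # 2 1
```

Because no work happens until the bit goes to 1, a configuration value that is still propagating cannot cause an operation to start in the wrong mode.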

6.3.5 Push/Pop/Read/Write Lines

These signals originate from the RPNs and connect to the RMMs. Each RPN sends two bits to each RMM. These bits signal the RMM when the RPN is ready to receive the next value or write the next value. The Instruction Configuration Lines configure the RPNs and RMMs to enable only the signals needed for the specified operation. These lines change their function depending on the configuration type. The naming convention only helps designers understand what is happening within the RMMs: Push and Write both signify a transfer of information into the RMMs, while Pop and Read signify data transferring out of the RMMs.

6.3.6 Memory Empty Lines

The Memory Empty Lines denote when an RMM no longer contains any data. Each RMM generates a signal if this occurs. The Memory Empty Lines connect to the Control Unit. These lines provide feedback that lets the Control Unit know when it is safe to change operating modes. Used with a suitable delay, these lines ensure that no data is lost when switching between operating modes.

6.4 Overall System

The overall architecture of the RCP can be seen in Figure 2 in the appendix. This figure shows the interconnections of data and control lines between the six components of the RCP and off-chip communications. All inputs arrive from the left, and outputs exit from the right. The signals containing more than one bit are denoted by the bold blue lines, while single-bit signals are shown as normal green lines. This architecture increases the speed of the RCP without increasing the clock speed. The architecture supports each specified operation and leaves room to expand the operation set at a later time.

7.0 Control Unit

7.1 Control Unit Components

The Control Unit consists of three subcomponents. These consist of an instruction decoder, a memory counter, and a comparator. These three components allow the Control Unit to generate the needed control signals, effectively switch between operations, and communicate with the outside world.

7.1.1 Instruction Decoder

The instruction decoder simply takes the instruction from off-chip and sets the configuration lines for each of the other components in the RCP. These signals set the data routing for the RCP as well as determine the operations to be performed. The decoder will change as new operations are added to the RCP functionality. The design of the decoder will be derived from truth tables for the needed instructions and simplified using standard Boolean logic minimization. The results will be included in the VHDL simulation of the component.

7.1.2 Memory Counter and Comparator

The memory counter and comparator work in conjunction with each other to handle loading data into the memory from off-chip. The memory counter keeps track of the current position to load data into memory. The comparator compares that value with the expected final value for the given operation. When the comparator signals that the final value is reached, the Control Unit knows that data loading is finished and another operation can begin.
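The interaction between the counter and the comparator during loading can be sketched as below. This is a behavioral model only; the class name, the list standing in for RMM memory, and the 16-bit masking are assumptions made for illustration.

```python
class LoadController:
    """Counter plus comparator that track an off-chip load into memory."""

    def __init__(self, expected_count):
        self.expected_count = expected_count  # final value for this operation
        self.counter = 0                      # current load position

    @property
    def done(self):
        """Comparator output: True once the counter reaches the final value."""
        return self.counter == self.expected_count

    def load_word(self, memory, word):
        """Store one word at the current position and advance the counter."""
        if self.done:
            raise RuntimeError("load already complete")
        memory[self.counter] = word & 0xFFFF  # data paths are 16 bits wide
        self.counter += 1
        return self.done


memory = [0] * 8
ctl = LoadController(expected_count=3)
flags = [ctl.load_word(memory, w) for w in (10, 20, 30)]
print(flags)        # [False, False, True]
print(memory[:3])   # [10, 20, 30]
```

The final True is the point at which the Control Unit would treat loading as finished and allow the next operation to begin.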

8.0 Reconfigurable Memory Module

8.1 RMM Components

The RMMs consist of two components, a RAM memory and a Memory Controller. The RAM memory functions as a normal RAM memory, taking in an address and a set of data for that address or returning the data from a given address. The Memory Controller provides the addresses for the RAM. The architecture for the RMM is shown in Figure 3 in the appendix. The green lines represent the control signals while the blue lines represent the data signals.

8.1.1 RAM Memory

The RAM memory will be a separate SRAM chip. The read, write, address, and data signals are routed through the memory controller. The SRAM will operate at its fastest operating speed to provide two accesses during each instruction cycle.

8.1.2 Address Generator

The Address Generator generates the addresses for the RAM memory based on the configuration of the RMM. The configuration types are stack, queue, and sequential. The stack and queue configurations work in the same way as the normal software implementations, where the last value read is removed from memory. Sequential accesses do not remove the value from memory. The Address Generator generates two addresses in each instruction cycle, one for a read and one for a write. A pointer to the current position in memory determines the next address, adding to or subtracting from the current position for a write or a read.
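The three addressing configurations can be sketched with a behavioral model. It assumes separate read and write pointers, wrap-around at the memory depth, and a count of stored values; none of these details are specified in this report, and the names are hypothetical.

```python
class AddressGenerator:
    """Behavioral model of RMM addressing for stack, queue, sequential modes."""

    def __init__(self, mode, depth):
        if mode not in ("stack", "queue", "sequential"):
            raise ValueError("unknown mode: " + mode)
        self.mode = mode
        self.depth = depth      # number of addressable words (assumed)
        self.write_ptr = 0      # next address to write
        self.read_ptr = 0       # next address to read (queue and sequential)
        self.count = 0          # values currently held in memory

    def next_write(self):
        """Return the address for a write and advance the write pointer."""
        addr = self.write_ptr
        self.write_ptr = (self.write_ptr + 1) % self.depth
        self.count += 1
        return addr

    def next_read(self):
        """Return the address for a read, removing the value if required."""
        if self.mode == "sequential":
            addr = self.read_ptr          # value stays in memory
            self.read_ptr = (self.read_ptr + 1) % self.depth
            return addr
        if self.count == 0:
            raise IndexError("memory empty")
        self.count -= 1                   # stack and queue reads remove data
        if self.mode == "stack":
            self.write_ptr = (self.write_ptr - 1) % self.depth
            return self.write_ptr         # most recently written address
        addr = self.read_ptr              # queue: oldest address first
        self.read_ptr = (self.read_ptr + 1) % self.depth
        return addr


for mode in ("stack", "queue", "sequential"):
    gen = AddressGenerator(mode, depth=8)
    writes = [gen.next_write() for _ in range(3)]
    reads = [gen.next_read() for _ in range(3)]
    print(mode, writes, reads, gen.count)
# stack [0, 1, 2] [2, 1, 0] 0
# queue [0, 1, 2] [0, 1, 2] 0
# sequential [0, 1, 2] [0, 1, 2] 3
```

In this sketch the count reaching zero corresponds to the condition an RMM would report on its Memory Empty Line.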

9.0 Reconfigurable Processing Node

9.1 FPGA Simulation

The RPNs need only provide enough function to complete the Pixel Replacement and Apodization problems and test the architecture of the system. To accomplish these tasks, the RPNs will consist only of standard VHDL library functions simulated on an FPGA. These standard libraries allow the function of the RPNs to be verified easily with test vectors. In this way, the RPNs can include the minimum processing power needed and ease the need for complicated testing procedures in other components.

To complete the Pixel Replacement problem, the RPNs need to include a detection element to designate the pixels needing replacement, an averaging capability to compute the spatial average of the surrounding pixels, and a delay circuit to accumulate the values needed for the spatial average.
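The detection and averaging steps can be sketched in software as follows. The threshold value, the use of the eight surrounding pixels, and the exclusion of other hot pixels from the average are all assumptions made for illustration; this report does not specify them.

```python
def replace_hot_pixels(image, threshold=0xFFF0):
    """Replace pixels at or above threshold with the mean of their neighbors.

    image is a list of rows of 16-bit integer pixel values. The threshold
    and the 3x3 neighborhood are assumptions, not values from the design.
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold:
                continue  # detection element: pixel is not hot
            neighbors = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c) and image[rr][cc] < threshold
            ]
            if neighbors:
                # averaging element: integer mean of the valid neighbors
                out[r][c] = sum(neighbors) // len(neighbors)
    return out


picture = [
    [100, 200, 300],
    [100, 0xFFFF, 300],
    [100, 200, 300],
]
print(replace_hot_pixels(picture)[1][1])  # 200
```

In hardware, the delay circuit plays the role that the two-dimensional list plays here: it holds the neighboring values until the average can be formed.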

The Apodization problem requires that the RPNs multiply the data values by a preset cosine function and then pass the result through a high-pass filter. A simple multiplier and filter found in the standard libraries, cascaded together, would complete the problem.
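The multiply-then-filter cascade can be sketched as below. The particular cosine taper and the first-difference high-pass filter are stand-ins chosen for illustration; this report does not state which cosine function or filter the final design will use.

```python
import math


def cosine_window(n_samples):
    """Assumed taper: half-cycle cosine, largest at the center of the data."""
    center = (n_samples - 1) / 2
    return [math.cos(math.pi * (i - center) / n_samples)
            for i in range(n_samples)]


def high_pass(samples):
    """Assumed filter: first difference y[n] = x[n] - x[n-1], with x[-1] = 0."""
    out, previous = [], 0.0
    for x in samples:
        out.append(x - previous)
        previous = x
    return out


def apodize(interferogram):
    """Multiply by the cosine window, then apply the high-pass filter."""
    window = cosine_window(len(interferogram))
    tapered = [x * w for x, w in zip(interferogram, window)]
    return high_pass(tapered)


window = cosine_window(5)
print(window[2])                 # 1.0 at the center sample
print(high_pass([3.0, 3.0, 3.0]))  # [3.0, 0.0, 0.0]
```

The window weights fall off toward both ends of the data, which is what suppresses the invalid values near the edges of the interferogram before filtering.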

10.0 Reconfigurable Interconnect

10.1 Routing Multiplexers

The RIC routes the data to its proper destinations in the RCP. To accomplish this task, the RIC uses a series of multiplexers to switch between the appropriate outputs. The configuration signals from the Control Unit determine which input each multiplexer selects, and therefore which signal is routed to each component. The inputs to the RIC can be turned on or off, and the outputs are selected from the several possibilities. Not all inputs or outputs are used in each configuration. The selection of possible outputs may be changed if more operations are added to the RCP. The schematic for the RIC can be seen in Figure 5 in the appendix. The green lines signify the control signals while the blue lines show the data lines.
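The multiplexer-based routing can be sketched as one selector per RIC output. The port names below are hypothetical labels for the data lines described in Section 6.2 (one input and one output per RMM, two inputs and one output per RPN, and one off-chip line in each direction); the configuration format is likewise an assumption.

```python
# Hypothetical names for the data lines entering and leaving the RIC.
SOURCES = ("rmm0_out", "rmm1_out", "rpn0_out", "rpn1_out", "offchip_in")
DESTINATIONS = ("rmm0_in", "rmm1_in", "rpn0_a", "rpn0_b",
                "rpn1_a", "rpn1_b", "offchip_out")


def route(config, source_values):
    """Model one multiplexer per destination.

    config maps a destination name to the source it selects; destinations
    missing from config are left unused (None).
    """
    outputs = {}
    for dest in DESTINATIONS:
        src = config.get(dest)
        if src is None:
            outputs[dest] = None
            continue
        if src not in SOURCES:
            raise ValueError("unknown source: " + src)
        outputs[dest] = source_values[src] & 0xFFFF  # 16-bit data path
    return outputs


# Example: RPN 0 reads one operand from each RMM, and its result is sent
# both back to RMM 0 and off-chip.
config = {"rpn0_a": "rmm0_out", "rpn0_b": "rmm1_out",
          "rmm0_in": "rpn0_out", "offchip_out": "rpn0_out"}
values = {"rmm0_out": 7, "rmm1_out": 9, "rpn0_out": 16,
          "rpn1_out": 0, "offchip_in": 0}
result = route(config, values)
print(result["rpn0_a"], result["rpn0_b"])         # 7 9
print(result["rmm0_in"], result["offchip_out"])   # 16 16
print(result["rpn1_a"])                           # None
```

The example shows a single RPN output feeding two destinations, which is why one output line per RPN is sufficient.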

11.0 Economic Analysis

11.1 Current Estimates

There have been no costs incurred thus far in the conceptual design phase of the Morphing Architecture project. In the detail design phase, parts will be purchased in the form of n FPGAs (for the processing units), p CPLDs (for the Control Unit, RIC, and RMMs), and two SRAM chips. The FPGAs used will be Xilinx Spartan-IIE chips mounted on Digilent D2SB development boards. The cost per board (including the Spartan-IIE chip and programming cables) is $90. The CPLDs used will be Xilinx XPLA3 CoolRunner (XCR3064XL, BGA/PLCC/QFP package, 64 macrocells), each costing $20. In order to program the CPLDs, a HW-130 adapter is needed. The cost of the HW-130 is $795. The HW-130 also requires an adapter to program the CoolRunners, which costs $240. The SRAM chips will be Cypress CY7C198s (DIP package), each costing $9.40.

11.2 Purchasing Alternatives

In the fabrication phase the CPLDs must have sockets created for them. Also, an I/O board (of sorts) must be created to house and route the CPLDs to each other, the SRAM, and to the FPGA development boards. The cost for these two items is unknown. Alternatively, we may decide to implement the design using only FPGAs, in which case the parts cost would total $540 (6 FPGA boards), plus the cost of an I/O board to house and connect the SRAM to the FPGA development boards.
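The two purchasing options can be compared with a short calculation using the unit prices quoted in Section 11.1. The part counts passed to the mixed-design function in the example (2 FPGA boards and 4 CPLDs) are hypothetical, since n and p are not fixed in this report; socket and I/O board costs are excluded because they are unknown.

```python
# Unit prices from Section 11.1, held in cents to avoid rounding error.
FPGA_BOARD_CENTS = 9_000        # Digilent D2SB with Spartan-IIE
CPLD_CENTS = 2_000              # XCR3064XL CoolRunner
HW130_CENTS = 79_500            # HW-130 programmer
HW130_ADAPTER_CENTS = 24_000    # CoolRunner adapter for the HW-130
SRAM_CENTS = 940                # Cypress CY7C198


def mixed_design_cost_cents(n_fpga_boards, n_cplds, n_sram=2):
    """Parts cost of the FPGA plus CPLD design, including the programmer."""
    parts = (n_fpga_boards * FPGA_BOARD_CENTS
             + n_cplds * CPLD_CENTS
             + n_sram * SRAM_CENTS)
    return parts + HW130_CENTS + HW130_ADAPTER_CENTS


def fpga_only_cost_cents(n_fpga_boards=6):
    """Parts cost of the FPGA-only alternative (boards only)."""
    return n_fpga_boards * FPGA_BOARD_CENTS


print(fpga_only_cost_cents() / 100)         # 540.0
print(mixed_design_cost_cents(2, 4) / 100)  # 1313.8 (hypothetical counts)
```

Under these assumed counts, the CPLD programming hardware accounts for most of the difference between the two options.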

12.0 Future Work

12.1 Phase II

Phase II of the project will require that the architectural designs for the components of the RCP be translated into VHDL simulation code. For each module, the inputs and outputs are specified in the design figures. Any changes to the design as a result of the VHDL coding or testing should be documented and the design figures updated. VHDL simulation should begin with the RMMs at the start of the spring semester. The RMMs will determine the standards used for communicating between components. Emphasis will be given to developing a working model for the RMMs and then for the overall architecture. RPN VHDL will conform to the designs of the other components and will be the first to be changed if problems occur. Work on the different modules will begin at the start of the spring semester, with a different group member in charge of each. Paul will handle the RMMs, John will work with the RPNs, and Bill will be in charge of the Control Unit. The RIC will be assigned to the first member to finish their section. VHDL code for the components will be tested individually and then as a whole using comprehensive test vectors.

12.2 Phase III

Phase III of the project will be the hardware implementation of the VHDL code. For this section of the project, all hardware will need to be purchased, assembled, and prepared as needed to facilitate using the VHDL code. The VHDL code for all of the components will need to be completed and tested. All hardware components will be tested initially for defects and later for proper function of the VHDL code. The final phase will mostly cover the communications between the differing devices used for the components, as well as timing and handshaking issues. The final project will then be tested with sample inputs for both the Pixel Replacement and Apodization problems, with output going off-chip. Phase III will begin once parts arrive.

13.0 Appendix

Figure 1 – RCP General Architecture

Figure 2 – RCP Detailed Architecture

Figure 3 – RMM Architecture

Figure 4 – RPN Architecture

Figure 5 – Reconfigurable Interconnect Architecture