Design of a Neural Network Simulator on a Transputer Array
Gary McIntire, Advanced Systems Engineering Dept., Ford Aerospace, Houston, TX
James Villarreal, Paul Baffes, and Monica Rua, Artificial Intelligence Section, NASA/Johnson Space Center, Houston, TX

Abstract

A high-performance simulator is being built to support research with neural networks. All of our previous simulators have been special purpose and would work with only one or two types of neural networks. The primary design goal of this simulator is versatility: it should be able to simulate all known types of neural networks. Secondary goals, in order of importance, are high speed, large capacity, and ease of use. A brief summary of neural networks is presented herein, concentrating on the design constraints they impose. Major design issues are discussed together with analysis methods and the chosen solutions. Although the system will be capable of running on most transputer architectures, it is currently being implemented on a 40-transputer system connected in a toroidal architecture. Predictions show a performance level nearly equivalent to that of a highly optimized simulator running on the SX-2 supercomputer.

Introduction

There are several ways to simulate large neural networks. Computationally speaking, some of the fastest are optical computers and neural net integrated circuits (hardwired VLSI). However, both methods have basic problems that make them unsuitable for our research in neural networks. Optical computers and hardwired VLSI are still under development, and it will be a few years before they are commercially available. Even if they were available today, they would be generally unsuitable for our work because they are very difficult (usually impossible) to reconfigure programmably. A non-hardwired VLSI neural network chip does not exist today but probably will exist within a year or two; done correctly, it would be ideal for our simulations. For now, the state of the art in reconfigurable simulators is supercomputers and parallel processors. We have a very fast simulator running on the SX-2 supercomputer (200 times faster than our VAX 11/780 simulator), but supercomputer CPU time is very costly.

For the purposes of our research, several parallel processors were investigated, including the Connection Machine, the BBN Butterfly, and the Ncube and Intel Hypercubes. The best performance for our needs was exhibited by the INMOS Transputer System.

Our previous neural network simulators have been specific to one particular type of algorithm and consequently would not work for other types of networks. With this simulator, our primary goal is to be able to implement all types of networks. This will be more complicated but is deemed well worth the effort. When we are finished, it should be possible to implement a different kind of network in less than a day, and the performance reduction for this general-purpose capability will be less than 10 percent.

Example networks

To have examples to work with, let us consider two very typical neural networks. The first is a three-layer feedforward network (fig. 1) that is to be trained with the generalized delta learning algorithm (also called back propagation). Rumelhart et al. [5] describe this algorithm in detail. We will assume that every node in one layer is connected to every node in its adjacent layer. The network will be trained with a number of I/O pairs, which are an encoding of the associations to be learned.

Figure 1. - Feedforward network.

The sequence of events in the algorithm can be described as follows. First, an input vector is placed into the input nodes. The weight values of the connections between the input layer and the hidden layer are then multiplied by the output values of the corresponding input nodes, and the results are propagated forward to the hidden layer. These products are collected and summed at each node of the hidden layer. Once all the nodes in the hidden layer have output values, this process is repeated to propagate signals from the hidden layer to the output layer. When the output layer values are computed, an error can be calculated by comparing the output vector with the desired output value of the I/O pair. With this error computed, the weights at each connection between the hidden and the output layer can be adjusted. Next, the error values are backpropagated from the output layer to the hidden layer. Finally, each weight can be adjusted for the connections between the input and hidden layers.

This entire sequence is repeated for each successive I/O pair. Note that this algorithm has a sequence of events to it. Other neural net algorithms do not; instead, every node and connection updates itself continuously [2,4]. Such algorithms can be viewed as a sequence of length one.
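To make the sequence of events concrete, the following C sketch performs one training pass for a single I/O pair of the fully connected three-layer network described above. The layer sizes, the sigmoid activation, and the learning rate eta are illustrative assumptions, not details taken from this design.

#include <math.h>

#define N_IN  4                       /* layer sizes chosen arbitrarily */
#define N_HID 8
#define N_OUT 2

static double w_ih[N_IN][N_HID];      /* input-to-hidden weights  */
static double w_ho[N_HID][N_OUT];     /* hidden-to-output weights */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* One pass of the sequence described above for a single I/O pair. */
void train_pair(const double in[N_IN], const double target[N_OUT], double eta)
{
    double hid[N_HID], out[N_OUT], d_out[N_OUT], d_hid[N_HID];
    int i, j, k;

    /* Forward: multiply weights by input-node outputs, collect and sum
       the products at each hidden node. */
    for (j = 0; j < N_HID; j++) {
        double sum = 0.0;
        for (i = 0; i < N_IN; i++)
            sum += w_ih[i][j] * in[i];
        hid[j] = sigmoid(sum);
    }
    /* Forward: repeat the process from the hidden layer to the output layer. */
    for (k = 0; k < N_OUT; k++) {
        double sum = 0.0;
        for (j = 0; j < N_HID; j++)
            sum += w_ho[j][k] * hid[j];
        out[k] = sigmoid(sum);
    }
    /* Error: compare the output vector with the desired output of the pair. */
    for (k = 0; k < N_OUT; k++)
        d_out[k] = (target[k] - out[k]) * out[k] * (1.0 - out[k]);
    /* Backpropagate the error values from the output layer to the hidden layer. */
    for (j = 0; j < N_HID; j++) {
        double sum = 0.0;
        for (k = 0; k < N_OUT; k++)
            sum += w_ho[j][k] * d_out[k];
        d_hid[j] = sum * hid[j] * (1.0 - hid[j]);
    }
    /* Adjust the hidden-to-output weights, then the input-to-hidden weights. */
    for (j = 0; j < N_HID; j++)
        for (k = 0; k < N_OUT; k++)
            w_ho[j][k] += eta * d_out[k] * hid[j];
    for (i = 0; i < N_IN; i++)
        for (j = 0; j < N_HID; j++)
            w_ih[i][j] += eta * d_hid[j] * in[i];
}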
The second network to be considered is the Hopfield network [1] shown in figure 2. It is an auto-associative memory in which every node is equivalent; i.e., all nodes are used as both inputs and outputs. Every node is connected to every other node but is not connected to itself. The connections are bidirectional and symmetric, which means that the weight on the connection from node i to node j is the same as the weight from node j to node i.

Figure 2. - Hopfield net.

Other neural networks impose different constraints. For example, the connection scheme may be totally random, or it may be that every ninth node should be connected. The equations used are very often different. The order of sequencing forward propagation, back propagation, weight adjusting, loading inputs, etc., also may be different, and stochastic networks update their values probabilistically. Yet some generalizations can be made for almost all types of networks, and these are what we have used as the basis of our simulator design.

The first generalization is that all network computations are local computations. In other words, the computations involve only a node and the nodes to which it is connected. This is a generally accepted principle in the literature, and some authors even use it in the definition of a neural network.

The second generalization is that all neural computations can be performed by applying functions to the state variables of a node and its neighbors. While this is a trivial restatement of Turing equivalence, the emphasis here is that this can be done in a computationally efficient manner.

The third generalization is that any computation performed by a node on its input signals can be decomposed into parallel computations that reduce multiple incoming signals to a single signal. This single signal is then sent to a node to be combined with the single signals sent from the other parallel computations. This allows a node computation to be broken apart and the partial results forwarded to the node which computes the final result. For example, a computation that is common to almost all networks is the dot product, which is a sum of products. This can be decomposed into multiple sums of products whose results are forwarded and summed, as the sketch below illustrates.
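The sketch below is a minimal, sequential stand-in for this decomposition: each call to partial_dot reduces one slice of a node's incoming signals to a single partial sum, and the destination node combines the forwarded partial results. On a transputer array each slice could run on a different processor, with only the single partial sum sent over a link; the slicing scheme and names here are our own illustration, not the simulator's actual partitioning.

#define N_INPUTS  1000
#define N_WORKERS 4

/* Reduce one slice of the incoming signals to a single signal. */
static double partial_dot(const double *w, const double *o, int lo, int hi)
{
    double sum = 0.0;
    int i;
    for (i = lo; i < hi; i++)
        sum += w[i] * o[i];                 /* multiply-accumulate */
    return sum;
}

/* Combine the single signals forwarded by each parallel computation. */
double node_input(const double w[N_INPUTS], const double o[N_INPUTS])
{
    double total = 0.0;
    int p, chunk = N_INPUTS / N_WORKERS;
    for (p = 0; p < N_WORKERS; p++) {
        int lo = p * chunk;
        int hi = (p == N_WORKERS - 1) ? N_INPUTS : lo + chunk;
        total += partial_dot(w, o, lo, hi); /* forwarded partial result */
    }
    return total;
}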
CPU Utilization

Previous simulations have shown that a neural net simulator spends almost all of its time processing connections (typically, there are many more connections than there are nodes), so an examination of execution speed must focus on the calculations done for each connection. The operation common to almost all neural nets is some function of the dot product:

    Oj = f( Σi Wij Oi )

where Oi is the output of the ith node, Oj is the output of the jth node, Wij is the connecting weight, and f is some function applied to the dot product (note that the time spent executing the function f would be relatively small in any sizable network, since there would be few nodes compared to the number of connections). Notice that the above computation can be thought of as a loop of multiply-accumulate operations. For each operation the computer must calculate two addresses, fetch the two referenced variables, multiply them together, and add that product to a local register. If the addresses were sequential, calculating a new address could be done by incrementing a register. Otherwise, a randomly connected network would require that the computer fetch a pointer which would be used as the address of Oi. Without the extra pointer, we could imagine that the weight, Wij, and Oi could be fetched in parallel from two separate memories.

Memory Utilization

For maximum memory efficiency, the target design should have one memory cell of the minimum possible size for each state variable which the network must keep. These state variables consist of the values kept for the connections and nodes. For a connection in the two networks above, one memory cell per connection is needed, since the only information associated with a connection is its weight. If the connectivity is like that of the above two networks (where everything in one layer is connected to everything in another layer), the connection information can be implicit. At the other extreme, where the connectivity is totally random, an additional pointer between nodes would have to be kept for each connection.

For nodes, a network like the Hopfield network has to keep only one variable: its output value. Other networks keep more state for the nodes and connections, and almost all networks include added representations to ease debugging tasks. Because of these differences in size, we have allowed the user to specify node variables and their types by defining structures in the C language to hold the state variable information. This allows all the flexibility of C (i.e., integers, floats, doubles, bytes, bit fields, etc.).
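Tying these two sections together, the fragment below sketches a user-defined node structure of the kind the C-structure facility allows, along with the two connection-processing cases discussed above: implicit sequential connectivity versus an explicit per-connection pointer. The structure layouts and function names are hypothetical examples, not the simulator's actual definitions.

/* Hypothetical user-supplied node structure: a Hopfield node needs only
   its output value; other networks would add fields (error terms,
   debugging representations, etc.) as the C-structure facility allows. */
struct node {
    float output;
};

/* Fully connected layers: connectivity is implicit, so the weight array
   is walked sequentially and each new address is just an increment. */
float dot_implicit(const float *w, const struct node *src, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        sum += w[i] * src[i].output;   /* two fetches, multiply, accumulate */
    return sum;
}

/* Random connectivity: each connection also stores the index of its source
   node, costing one extra fetch (the pointer) per multiply-accumulate. */
struct conn {
    float weight;
    int   src;                         /* explicit pointer to the source node */
};

float dot_random(const struct conn *c, const struct node *nodes, int n)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < n; i++)
        sum += c[i].weight * nodes[c[i].src].output;
    return sum;
}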