F06 Physics Processor
Total Page:16
File Type:pdf, Size:1020Kb
Nicholas Bannister, Vinay Chaudhary, Aaron Hoy, Charles Norman, Jeremy Weagley Team VerchuCom PPU + Demo Final Report What We Built Game Description Overall Implementation Our original plan of action was to implement the game of Worms that utilized a customized Physics Processing Unit (PPU) built on the FPGA. We had planned to run the game on the Workstation, send objects back and forth to a shared bank of memory on the board, and update the objects’ state using the PPU. However, we came to realize that Worms would not fully show off our main feature of the project, which is the PPU. Therefore, we decided to create a “demo” which basically consisted of a bunch of squares interacting with each other, wind resistance and gravity within a confined area. We then took it to the next level and implemented a simple 2-player shooter game. The object of the game is for each player to destroy all of their squares, either blue or square, before their opponent. Each person has unlimited regular ammo along with 2 “nuclear bombs,” which destroy all of the bombs within a certain radius. Physics Processing Unit (PPU) We designed a specialized CPU (A PPU) that operates on structs representing 2 dimensional objects (position, velocity, acceleration, orientation, etc.). It modeled various particle effects, such as • Model collisions between particles and elastic/inelastic collisions • Effects of Gravity • Wind Resistance The accelerator sits in a basic iteration loop that applies each of the various effects on all the particles in memory, updating their positions, velocities, etc. based on physics. The CPU creates and destroys objects in the PPU’s loop, and reads the positions and orientations of the objects for use in the game. All updates are to the shared memory (BRAMs). Hardware Description PPU Pipeline / Overall Architecture We recently realized that there are some duplicate signals going through the pipeline. Namely, the pending values and the operational values are both originally taken from the pending data out of memory, and they are the same for several stages. I have not changed it yet, but those will be run down the pipeline only once until the point where they divide. Also, Charles is working on the overlap amount unit and the acceleration calculator (its continuation) as one superpipelined unit, so there will be one fewer major stages, and the stage resulting from the merge will have more minor stages in it. Also in 2 the current verilog code, the next address generator is not finished. Since the object array is null terminated, we know to loop A and B around to the base address when they get a null in their Object ID field. We want to know about this as soon as possible, and give the memory the correct wrapped around address in time, so the address generator must be a mealy machine. Its current implementation will be changed shortly. The rest of the pipeline still works like in the diagram. In the first major stage, we generate an address to put on the memory read port and on the next cycle we get the entire object out and run it through the breakdown module to separate it into the various values associated with the object. Having this module makes it very easy to change our word size or the memory ordering without affecting the rest of the PPU. Here we also generate a NOP signal based on whether or not the memory data is ready, and A or B address is wrapping around, etc. The second and third major stages are now being combined into one larger stage. Here we have a superpipelined acceleration calculator which takes in the positions and orientations of A and B along with other properties and outputs the resulting accelerations on A (a fixed number of cycles later). In parallel with this, we have the global acceleration calculator, which calculates accelerations resulting from gravity, wind, and wind viscosity. On the final iteration of A, once it has been compared to all over objects, the global accelerations will be applied instead. In the next stage, we have out accumulation units. This consists of the acceleration accumulator and the overstress detector. The acceleration accumulator keeps a running sum of all the accelerations on A as a result of any collisions it has with any of the other objects. The overstress detector keeps track of whether or not any of the collisions have overstressed the object, causing it to eventually be destroyed. These units are tricky in stalls and changes in A because they are state full (using pipeline registers), so we had to run some additional signals to make sure they stay correct in these situations. In the next stage we compute all the time dependent updates. Basically, we update the position based on the velocity and elapsed time, and update the velocity based on the acceleration and elapsed time. In order to obtain the elapsed time, we include a timestamp (measured in cycles) with each objects data. In this stages, we get the new cycles count and do a subtraction to get the elapsed number of cycles from the last time A was updated. We also keep this time to writeback as A's new timestamp. In the final stage we put all the properties of A back into one word and write it back to the memory. Although it is not shown on the diagram, the pipeline must actually stall when there is a write, because there is normally a read on every cycle, and the PPU is only getting one port of the dual ported memory controller. This is ok, however, because writes will happen an insignificant percentage of the time (once about every n cycles, where n is the number of objects in the array). This is the issue that the memory can only have one port that has the ability to write. In the instance that the PPU is writing and the PPC (ethernet code) is trying to write at the same time, the PPU gets priority and 3 a write failure signal is sent to the PPC. This is done because it is easier to handle a failed write in software than in the PPU architecture. Overlap Amount Calculator In this module of the pipeline, 2 objects will be tested for a collision. We have decided to treat all objects as squares. The objects have x and y coordinates along with the length of their sides (all are equal since they are squares). The overall goal of this module will be to output the total area of overlap between the two objects, horizontal/vertical distances between the centers of the two objects, and the projected direction of horizontal/vertical acceleration. This is done explicitly through several subtractors and adders with some extra compare and absolute value logic based on custom bit manipulation. Collision Acceleration Calculator The collision acceleration calculator is a bigger module that relies on the overlap amount calculator. This module takes in the outputs from the overlap amount calculator along with the object’s mass (this is inverted to eliminate the need for a division) and positional/size properties. There are three outputs that this module creates: acceleration of the object in the x direction, acceleration of the object in the y direction, and angular or theta acceleration of the object. We are using Newton’s F=m*a formula to calculate acceleration. Therefore, in calculating the 3 different accelerations, we use a=F*m-1. We therefore need to determine the “force” for each component. The force for the x and y components are going to be modeled by Fx=a*b*dx*[elas_A+elas_B] and Fy=a*b*dy*[elas_A+elas_B], where a and b are dimensions of the overlap rectangle, elas_A and elas_B are constant properties of the objects that are given, and dx and dy are the horizontal and vertical distances between the center of the two objects. The angular force will be based on the same principles. However, now we will take Ftheta to be the greater of Fx and Fy. The only difference will be the direction. The angular acceleration must be negated due to convention. One last issue is saturation. Due to the multiplication of several numbers which are up to 16- bits long, the final values must be saturated. Therefore, in the end, the minimum acceleration will be -32,768 units and the maximum will be 32,767. These values will be interpreted and scaled appropriately by the game. At this point, the final outputs of this module should be ready to be passed on to the next stage of the pipeline. Acceleration Accumulator This module determines the new overall acceleration of the object based on it's "old" acceleration and the new acceleration based on a particular object/force. This is done through 3 parallel 16-bit adders and 3 parallel 16-bit muxes. Essentially the muxes are there so that junk data is not written into the accumulator when the next stage does not contain valid data. Time-Dependent Update The time independent update module takes the computations computed previously in the pipeline and actually applies it to the object. The “next state” of the object, most 4 specifically the next position and velocity, will be updated. A simple subtraction is used to compute the total amount of cycles elapsed. This value is then multiplied by the velocity and acceleration components to achieve the next state position and velocity components, respectively. Once again, there is some custom bit manipulation that is done to assure the proper values are computed.