Lecture 8: More Pipelining

Lecture 8: More Pipelining

Overview n Getting Started with Lab 2 Lecture 8: n Just get a single pixel calculating at one time n Then look into filling your pipeline n Multipliers More Pipelining n Different options for pipelining: what do you need? n 3 Multipliers or put x*x, y*y, and x*y through sequentially? David Black-Schaffer n Pipelining [email protected] n If it won’t fit in one clock cycle you have to divide it EE183 Spring 2003 up so each stage will fit n The control logic must be designed with this in mind n Make sure you need it EE183 Lecture 8 - Slide 2 Public Service Announcement Logistics n n Xilinx Programmable World Lab 2 Prelab due Friday by 5pm n Tuesday, May 6th n http://www.xilinx.com/events/pw2003/index.htm n Guest lecture next Monday n Guest Lectures Synchronization and Metastability n Monday, April 28th These are critical for high-speed systems and Ryan Donohue on Metastability and anything where you’ll be connecting across Synchronization clock domains. n Wednesday, May 7th Gary Spivey on ASIC & FPGA Design for Speed n The content of these lectures will be on the Quiz SHOW UP! (please) EE183 Lecture 8 - Slide 3 EE183 Lecture 8 - Slide 4 1 Easier FSMs Data Path always @(button or current_state) Do this if nothing else is begin Xn Yn write_en <= 0; specified. Mandel X Julia X Mandel X Julia Y output <= 1; next_state <= `START_STATE; Mux * * * Mux case(current_state) Note that the else is not `START_STATE: specified. begin write_en <= 1; - <<1 + if (button) next_state <= `WAIT_STATE; What is the next state? end What do we+ do if the+ wholeIf>4 dataoutput 1path `WAIT_STATE: doesn’t fit in one clock cycle? begin X Y output <= 0; Be careful! n+1 n+1 end end Easy way to Feedback for next iteration EE183 Lecture 8 - Slide 5 infer latches! EE183 Lecture 8 - Slide 6 Pipelining Example 1 Pipelining Example 2 Mandel X Julia X Xn Yn Mandel X Julia Y Mandel X Julia X Xn Yn Mandel X Julia Y Critical path 2ns Mux * 12ns * 12ns * 12nsMux 2ns 12ns2ns Mux * 12ns * 12ns * 12nsMux 2ns Pipeline Registers - 6ns <<1 1ns + 6ns - 6ns <<1 1ns + 6ns Critical path + 6ns + If>4 output 1 3ns Critical path + 6ns + If>4 output 1 3ns 24ns 12ns12ns+??ns X n+1 Y n+1 X n+1 Y n+1 How long does this wire take? How long does this wire take? Feedback for next iteration Feedback for next iteration EE183 Lecture 8 - Slide 7 EE183 Lecture 8 - Slide 8 2 Latency and Throughput Pipelining Example 3 Not Pipelined Xn Yn Time Input Output Mandel X Julia X Mandel X Julia Y t=0 A1 Input Output t=35ns A2 A1 Critical path Comb. Logic t=70ns A3 A2 12ns2ns Mux * 12ns * 12ns * 12nsMux 2ns t=105ns A3 Pipeline First output takes longer because 35ns critical path Registers it has to go through all the stages Critical path 1 stage but subsequent results can come 6ns - 6ns <<1 1ns + 6ns out every clock cycle. Pipeline Pipelined Registers S1 Time Input S1 Output Critical path + 6ns + If>4 output 1 3ns Input Output t=0 A1 6ns + ??ns A1A2A3x A1A2A3x A1A2A3 t=20ns A2 A1 t=40ns A3 A2 A1 X n+1 Y n+1 t=60ns A3 A2 <20ns critical path t=80ns A3 Feedback for next iteration 3 stages EE183 Lecture 8 - Slide 9 EE183 Lecture 8 - Slide 10 Key points on Pipelining Multipliers n Increases utilization for operators n CoreGen gives you several pipelining n You can do multiple calculations at once so you can use everything maximally (ideally) options n This is the point! Store the results from smaller calculations to n Which is best? make the overall calculation faster. n Insert the next data item into the datapath before the n Depends on your design previous one has finished n Data size will determine speed and pipelining n The pipe registers keep the computation separate n You will have a lot of pipe registers if you’re doing a lot of calculations (I.e., Lab 3!) n Design is an iterative process so you won’t n What is the effect of the algorithm feeding back on itself? be able to choose the best approach at first n Do all points have the same number of iterations? control (i.e., get started early!) n Is the data dependent between pipeline stages? hazards EE183 Lecture 8 - Slide 11 EE183 Lecture 8 - Slide 12 3 Xn Yn Multiplier Issues Mandel X Julia X Mandel X Julia Y Mux * * * Mux n Multipliers are BIG Pipeline Registers n * * * How can we get away with fewer multipliers? Pipeline Registers * * * n Multipliers may be SLOW Pipeline Registers n How can we utilize them maximally? - <<1 + Pipeline Registers + + If>4 output 1 X n+1 Y n+1 EE183 Lecture 8 - Slide 13 EE183 Lecture 8 - Slide 14 Xn Yn Now we have… Mandel X Julia X Mandel X Julia Y Mux M1a* M2a* *M3a Mux n With a 3-stage multiplier you’ve now got 5 Pipeline Registers pipeline stages M1b* M2b* *M3b Pipeline n How can you keep the pipeline full? Registers M1c* M2c* *M3c n How many things do you need to Pipeline Registers calculate at once? Add1 - <<1 + Pipeline Registers + + If>4 output 1 n What is full? Will you ever get 100% Add2 utilization? What is good enough? X n+1 Y n+1 EE183 Lecture 8 - Slide 15 EE183 Lecture 8 - Slide 16 4 Pipeline Performance Analysis Pipeline Performance Analysis n With the bad data path (3, 3 stage multipliers and 2 n With the bad data path (3, 3 stage multipliers and 2 stages after that; one pixel at a time) stages after that; multiple pixels at a time) Clk M1a M2a M3a M1b M2b M3b M1c M2c M3c Add1 Add2 Clk M1a M2a M3a M1b M2b M3b M1c M2c M3c Add1 Add2 1 • • • 1 • • • 2 • • • 2 s s s • • • 3 • • • 3 • • • s s s • • • 4 • 4 v v v • • • s s s • 5 • 5 • • • v v v • • • s • n In 5 cycles we used 11 units out of 55 available: n We approach 100% utilization if there are no stalls or 20% average utilization dependencies and we can keep getting new data EE183 Lecture 8 - Slide 17 EE183 Lecture 8 - Slide 18 What performance is required? What do we expect n Replication and Pipelining are not trivial n The previous data path is terribly inefficient if to implement—make sure you need them you only put one pixel through at a time, but doing multiple pixels at once is very n Is either needed for Lab #2? complicated n How would you tell? n Hint: each Julia image takes at most n As an alternative you can use one multiplier and (64*64*64*7*1/50e6) = 0.036s to create. put your x*x, y*y, and x*y through it in a n Is this “real-time” enough for an animation? pipelined manner. n Other issues? Need to meet timing for the n What’s the efficiency? Is it a good tradeoff for VGA. area/speed? This analysis is critical! EE183 Lecture 8 - Slide 19 EE183 Lecture 8 - Slide 20 5 Pipeline Performance Pipeline Performance Analysis Conclusions n Single multiplier, put x2 (•) through first, then n You need to know your algorithm and y2 (s), then x•y(•). what tradeoffs you are making Clk Ma Mb Mc Add1 Add2 n In 7 cycles we use 11/35 1 • functional units = 31% n What do you care about? 2 s n But we only have 1 • n Speed? 3 s multiplier • • n Area? n How much space do we 4 s • save? What is most n Both. 5 • important? (Power is a function of speed and number of 6 • transistors, i.e., area.) 7 • EE183 Lecture 8 - Slide 21 EE183 Lecture 8 - Slide 22 Zooming n An arbitrary portion of the screen can be described in many ways. Here are two: n Xmin, Xmax, Ymin, Ymax n Requires dividing by the number of pixels n Xorigin, Yorigin, scale n Requires a fixed number of pixels n Hints for Zooming: n Have registers with the X, Y origins and increment/decrement them with the up/down, left/right buttons n Have a scale register which goes up/down with the zooming in/out n When converting form 0..63 to -2.00 to 2.00 use the scale and origin to calculate the new value EE183 Lecture 8 - Slide 23 EE183 Lecture 8 - Slide 24 6 Lecture 6 Key Points n Pipelining increases the clock speed but decreases the amount of work per clock n Parallelism is easy except for resource conflicts n Logistics n Lab 2 Prelab due Friday by 5pm email URL to Joel n Visiting lecturer next Monday – contents will be on the quiz EE183 Lecture 8 - Slide 25 7.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    7 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us