POWER IMPLICATIONS OF IMPLEMENTING LOGIC USING

FIELD-PROGRAMMABLE GATE ARRAY EMBEDDED MEMORY

BLOCKS

by

SCOTT YIN LUNN CHIN

B.Eng., University of Victoria, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

August 2006

© Scott Yin Lunn Chin, 2006

ABSTRACT

POWER IMPLICATIONS OF IMPLEMENTING LOGIC USING FIELD-PROGRAMMABLE GATE ARRAY EMBEDDED MEMORY BLOCKS

Modern field-programmable gate arrays (FPGAs) are used to implement entire systems, and these systems often require storage. FPGA vendors have responded by incorporating two types of embedded memory resources into their architectures: dedicated and non-dedicated. The dedicated embedded memory blocks lead to much denser memory implementations and are therefore very efficient for implementing large systems that require storage. However, for logic intensive circuits that do not require storage, the chip area devoted to the embedded FPGA memory is wasted. This need not be the case if the FPGA memories are configured as ROMs to implement logic. Previous work has presented algorithms that automatically map logic circuits to FPGAs with both large ROMs and small lookup tables. These previous studies, however, did not consider the impact on power. Power has become a first-class concern among FPGA vendors.

In this thesis, we develop a power model for FPGAs that contain embedded memories, and apply it to investigate the impact of various embedded memory architectural parameters on power dissipation when using memories to implement logic. From this study, we find that mapping logic to memories incurs a significant power penalty due to the power consumed in the embedded memories. We then investigate two possible ways to reduce this power penalty at the CAD level, one of which we found to be effective.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS

1 INTRODUCTION
  1.1 Motivation
  1.2 Research Goals and Contributions
  1.3 Thesis Organization

2 BACKGROUND AND PREVIOUS WORK
  2.1 FPGA Architecture
    2.1.1 Logic Element Architecture
    2.1.2 Clusters
    2.1.3 Embedded Block Memory Architecture
    2.1.4 Routing Architecture
      2.1.4.1 Wire Segments
      2.1.4.2 Connection Blocks and Switch Blocks
      2.1.4.3 Programmable Connections
  2.2 FPGA CAD
    2.2.1 Technology Mapping
    2.2.2 Clustering
    2.2.3 Placement
    2.2.4 Routing
  2.3 Heterogeneous Technology Mapping
    2.3.1 Terminology
    2.3.2 Existing Algorithms
    2.3.3 SMAP
  2.4 Power Estimation
    2.4.1 Switching Activity
      2.4.1.1 Transition Density Model
      2.4.1.2 Lag One Model
      2.4.1.3 ACE2.0
    2.4.2 Power Estimation for FPGAs
    2.4.3 Power Estimation for Memories and Caches
  2.5 Focus and Contribution of Thesis

3 POWER MODEL FOR FPGAS CONTAINING EMBEDDED MEMORIES
  3.1 Activity Estimation
    3.1.1 Read Only Memory
    3.1.2 Random Access Memory
    3.1.3 Framework and Integration into ACE2.0
  3.2 Power Estimation
    3.2.1 Fixed-Size Memory
    3.2.2 Programmable Column Decoder
    3.2.3 Framework and Implementation of the Power Estimator
  3.3 Summary

4 POWER IMPLICATIONS OF MAPPING LOGIC TO MEMORIES
  4.1 Experimental Methodology
    4.1.1 VPR Based Flow
    4.1.2 Board Measurement Flow
  4.2 Experimental Results
    4.2.1 Energy vs. Number of Memories
    4.2.2 Energy vs. Memory Array Size
    4.2.3 Energy vs. Memory Flexibility
  4.3 Sensitivity of Results
  4.4 Summary

5 POWER AWARE METHODS FOR MAPPING LOGIC TO MEMORIES
  5.1 Activity Aware Cost Function
    5.1.1 Power Aware Homogeneous Technology Mapping
    5.1.2 Activity-Aware SMAP
    5.1.3 Experimental Methodology
    5.1.4 Experimental Results
    5.1.5 Summary for Activity-Aware Cost Function
  5.2 Power Efficient Super-Arrays
    5.2.1 Experimental Methodology
    5.2.2 Experimental Results
      5.2.2.1 Packing Efficiency
      5.2.2.2 Power Efficiency
    5.2.3 Summary for Power-Efficient Super-Arrays
  5.3 Summary

6 CONCLUSIONS
  6.1 Summary of Contributions
  6.2 Future Work
    6.2.1 Power Model
    6.2.2 Heterogeneous Technology Mapping

REFERENCES

LIST OF TABLES

Table 2-1. Aspect Ratios of Memories in Commercial FPGAs [2, 3, 19, 20]
Table 3-1. VPR Memory Power Parameters
Table 4-1. Benchmark Characteristics
Table 4-2. Parameters Under Investigation
Table 5-1. Percentage Change in Routing Energy When Using the Activity Aware Cost Function and Memories with B = 512 bits
Table 5-2. Percentage Change in Logic Energy When Using the Activity Aware Cost Function and Memories with B = 512 bits
Table 5-3. Percentage Change in Overall Energy When Using the Activity Aware Cost Function and Memories with B = 512 bits
Table 5-4. Number of LUTs Needed for Power Efficient Logical Memories
Table 5-5. Summary of Experiments (left: B = 512 bits; right: B = 4096 bits)
Table 5-6. LUTs Removed After Mapping (B = 512)
Table 5-7. LUTs Removed After Mapping (B = 4096)
Table 5-8. Average Percent Change in Energy When Using BF = 2

LIST OF FIGURES

Figure 2-1. Conceptual FPGAs. Left: Traditional. Right: Heterogeneous
Figure 2-2. 2-LUT Configured as an AND Gate
Figure 2-3. LUT Paired with Flip-Flop
Figure 2-4. Cluster Architecture
Figure 2-5. High Level Embedded Memory Block Architecture
Figure 2-6. Programmable Column Decoder Architecture
Figure 2-7. Island-Style FPGA Routing Architecture
Figure 2-8. Connection Types: a) unbuffered, b) buffered uni-directional, c) buffered bi-directional
Figure 2-9. CAD Flow
Figure 2-10. Technology Mapping Example
Figure 2-11. Example of Mapping Logic to a Memory Array
Figure 2-12. Glitch Filtration
Figure 2-13. Typical SRAM Memory Architecture
Figure 3-1. Replacing a ROM with Equivalent Nodes and Registers
Figure 3-2. Integration of Memory Activity Estimation into ACE2.0
Figure 3-3. Pseudo-Code for RAM Simulator
Figure 3-4. Transistor Level Modelling of the Programmable Column Decoder
Figure 3-5. Modelling of LUTs in the Poon Power Model (from [57])
Figure 4-1. Flow for VPR-Based Experiments
Figure 4-2. Test Harness for Board Measurements
Figure 4-3. Impact on Energy When Increasing the Number of 512bit Memory Arrays (VPR Flow)
Figure 4-4. Impact on Energy When Increasing the Number of 4kBit Memory Arrays (VPR Flow)
Figure 4-5. Number of Packed 4LUTs When Increasing the Number of Memories
Figure 4-6. Impact on Energy When Increasing the Number of 512bit Memory Arrays (Measured Flow)
Figure 4-7. Impact on Energy When Increasing the Number of 4kBit Memory Arrays (Measured Flow)
Figure 4-8. Impact on Memory Energy When Increasing Memory Array Size
Figure 4-9. Impact on Logic Energy When Increasing Memory Size
Figure 4-10. Impact on Amount of Packable LUTs When Increasing Memory Size
Figure 4-11. Impact on Routing Energy When Increasing Memory Size
Figure 4-12. Impact on Overall Energy When Increasing Memory Size
Figure 4-13. Impact on Logic Energy When Increasing Memory Flexibility
Figure 4-14. Impact on Routing Energy When Increasing Memory Flexibility
Figure 4-15. Impact on Overall Energy When Increasing Memory Flexibility
Figure 5-1. Node Replication in SMAP
Figure 5-2. Reducing Cut-Set Fanout
Figure 5-3. Number of Packed LUTs Using the Activity Aware Cost Function
Figure 5-4. Forming Logical Memories: a) Area Efficient, b) Power Efficient
Figure 5-5. Methodology for Power-Efficient Super-Arrays
Figure 5-6. Distribution of How the Number of LUTs That Can Be Removed Are Affected for 512Bit Memories When BF = 2
Figure 5-7. Distribution of How the Number of LUTs That Can Be Removed Are Affected for 4096Bit Memories When BF = 2
Figure 5-8. Impact on Energy When Increasing the Number of 512bit Memories
Figure 5-9. Impact on Energy When Increasing the Number of 4096bit Memories

ACKNOWLEDGEMENTS

The first person that I'd like to thank is my supervisor Dr. Steve Wilton. Although there were many candidates more qualified than me, Dr. Wilton took a chance by giving me the opportunity to be a part of his research group. Through his dedication to his students, I gained more than just a technical education in my master's program; Dr. Wilton has exposed me to every aspect of a career in research. Without his guidance, encouragement, and humor, this thesis would not have been possible.

To all the members of the System on Chip research group, I would like to thank you for all the insightful conversations and, above all else, the friendship and company. Special mention to the FPGA research group - Brad, Cary, David Grant, David Yeager, Eddy, Eric, Jason, Julien, Mark, Martin, Marvin, Nathalie, and Usman; the boys from MCLD315 - Amit, Karim, Reza, and XiongFei; and other SoC students who talked to me - Derek, David Chiu, Dipanjan, Melody, Neda, Rod, and Shirley. I especially thank all the professors who have taught me at UBC, and my committee members for taking the time to read my thesis and giving me thoughtful feedback.

I greatly appreciate the financial support provided by the Altera Corporation and the Natural Sciences and Engineering Research Council of Canada. Without their support, this work would not have been possible.

For their unending love and support, I dedicate this thesis to my wonderful parents Yvonne and Philip Chin. Finally, I want to thank Marie O'Connor for always being there to love, support, encourage, and inspire.

1 INTRODUCTION

1.1 Motivation

On-chip user memory has become an essential and common component of modern field-programmable gate arrays (FPGAs). Modern FPGAs are used to implement entire systems, and these systems often require storage. FPGA vendors have responded to this by incorporating two types of memory resources into their architectures: non-dedicated memories and dedicated memories. The Distributed SelectRAM [1], in which lookup-tables can be configured as small RAMs, is an example of a non-dedicated memory architecture. The Altera TriMatrix Memory [2] and Xilinx Block SelectRAM [3], which are dedicated memory arrays embedded into the FPGA fabric, are examples of dedicated memory architectures.

The dedicated embedded memory arrays lead to much denser memory implementations and are therefore very efficient for implementing large systems that require storage [4].

However, for logic intensive circuits that do not require storage, the chip area devoted to the dedicated embedded FPGA memory is wasted. This need not be the case if the FPGA memories are configured as ROMs to implement logic. Previous work has presented algorithms that automatically map logic circuits to heterogeneous FPGAs with both large ROMs and small lookup tables [5-7]. Given a logic circuit, these algorithms attempt to pack as much of the logic into the available ROMs as possible and implement the rest of the logic using lookup-tables. These studies have shown that significant density improvements can be obtained by implementing logic in these unused memory arrays compared to implementing all of the logic in lookup-tables.

These previous studies, however, did not consider the impact on power. Power has become a first-class concern among FPGA vendors, and is often the limiting factor in handheld battery-powered applications. FPGAs are power-hungry for two reasons. First, the logic is typically implemented in small lookup-tables, which have not been optimized for a specific application. Second, the prevalence of programmable switches in the interconnect leads to high interconnect capacitances, and hence, high switching energy.

1.2 Research Goals and Contributions

There are three objectives to this research:

1. To design a flexible power model for FPGA embedded memory blocks that can be integrated into the Poon Power Model [8] and the commonly used academic Versatile Place and Route (VPR) CAD suite [9]. This model must be flexible enough to target different memory architectures.

2. To use the power model to investigate how the FPGA embedded memory architecture impacts overall power consumption when the memories are used to implement logic.

3. To apply power-aware techniques to existing algorithms that map logic to memories, and to investigate their impact on power using the power model.

1.3 Thesis Organization

This thesis is organized as follows. Chapter 2 provides the background material for this research: an overview of FPGA architecture and CAD, algorithms for mapping logic to memories (heterogeneous technology mapping), and power estimation techniques.

Chapter 3 proposes the new flexible power model for FPGAs that contain embedded memory blocks. Chapter 4 presents a study on how power is affected by the architecture of the embedded memory blocks when used to implement logic. Chapter 5 presents two possible power-aware modifications to the existing heterogeneous technology mapping algorithm SMAP, and investigates how these algorithms perform in terms of density and power. Finally, the thesis concludes with a brief summary and possible future research directions. Parts of this thesis have been published in [10].

2 BACKGROUND AND PREVIOUS WORK

This chapter begins with an overview of field-programmable gate array (FPGA) architecture and the computer aided design (CAD) algorithms used to map circuits to an FPGA. This chapter also discusses power estimation techniques for digital circuits and memories in particular. After presenting the background material and previous work, the contributions and focus of this thesis will be stated.

2.1 FPGA Architecture

FPGAs are prefabricated integrated circuits (ICs) that can be programmed after fabrication to implement any digital circuit or system. This post-fabrication programmability is provided by three fundamental components: configurable logic resources, I/O resources, and a configurable interconnect. The functionality of the logic resources and the interconnect connections are programmed via a number of configuration memory bits. Although the most popular technology for the configuration memory is SRAM, FPGAs that use other technologies such as antifuse and flash are also commercially available [11]. This thesis focuses on SRAM-based FPGAs.

The most basic logic element in an FPGA, traditionally referred to as a logic element (LE), typically consists of a lookup-table and a flip-flop [14]. As the density of FPGAs increased, and their uses expanded to larger digital systems, vendors introduced additional resources such as embedded memory blocks, embedded arithmetic logic units, and embedded processors to efficiently implement commonly used functions [11-13]. FPGAs that include these more dedicated resources are sometimes called Platform FPGAs or Heterogeneous FPGAs. Figure 2-1 illustrates the difference between traditional island-style FPGAs and modern heterogeneous FPGAs. The following sections will review the architecture of the LE, the embedded memory blocks, and the programmable interconnect.

Figure 2-1. Conceptual FPGAs. Left: Traditional. Right: Heterogeneous

2.1.1 Logic Element Architecture

Most FPGAs use LUTs in their basic logic element. A K-input LUT (K-LUT) works like a memory with 2^K configuration bits, K address lines, and a single output line. Each K-LUT can be configured to implement any function of K inputs by storing the truth table of the desired function in the 2^K configuration bits [14]. Figure 2-2 shows an example of a 2-input LUT configured as a 2-input AND gate.
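As a concrete illustration, the LUT-as-memory view can be captured in a few lines of Python (a minimal sketch of our own; the class and names are illustrative, not part of any FPGA tool):

from typing import List

# Minimal model of a K-input LUT: 2^K configuration bits indexed by the inputs.
class LUT:
    def __init__(self, k: int, config_bits: List[int]):
        assert len(config_bits) == 2 ** k
        self.k = k
        self.config_bits = config_bits  # truth table, one bit per input combination

    def evaluate(self, inputs: List[int]) -> int:
        # Treat the K input values as a K-bit address into the configuration bits.
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.config_bits[address]

# A 2-LUT configured as a 2-input AND gate (cf. Figure 2-2):
# truth table for input combinations 00, 01, 10, 11 -> 0, 0, 0, 1.
and_gate = LUT(2, [0, 0, 0, 1])
assert and_gate.evaluate([1, 1]) == 1 and and_gate.evaluate([0, 1]) == 0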

Figure 2-2. 2-LUT Configured as an AND Gate

The value of K plays an important role in the efficiency of the FPGA architecture. Large values of K reduce the number of LUTs required to implement a user circuit, and subsequently the demand on the programmable interconnect [15]. However, the number of configuration bits grows exponentially with K, and hence so does the area overhead. Studies have shown that a value of K=4 is good for area [16], while a value of K=7 is good for speed [17]. Many commercial FPGAs use 4-input LUTs. However, the latest 90nm and 65nm FPGAs use 6-input LUTs [12, 13].

To implement sequential circuits, LUTs are typically paired with flip-flops as shown in Figure 2-3. In this structure, a configuration bit is used to control the state of the output multiplexer. Depending on the value of this configuration bit, the output signal of the LUT can be either sequential or combinational.

Figure 2-3. LUT Paired with Flip-Flop

2.1.2 Clusters

To increase the speed of the FPGA, LEs are typically grouped together into clusters. The use of clusters has a similar benefit as using larger LUTs in the LEs, but it incurs a smaller area penalty [18]. An example of a cluster architecture is shown in Figure 2-4. The interconnect within a cluster (called intra-cluster routing) is faster than the general-purpose routing (called inter-cluster routing), because the intra-cluster wires have a smaller parasitic capacitance due to their shorter lengths. The fast intra-cluster interconnect is used to connect cluster inputs to LEs and to implement fast connections between LEs. The cluster inputs and connections between LEs are steered to the appropriate LE inputs through a multiplexer-based crossbar. Clusters in high-density, high-performance FPGAs often contain specialized connections such as carry and arithmetic chains. Studies have found that cluster sizes of 3 to 10 provide the best speed and area [17].

Figure 2-4. Cluster Architecture

2.1.3 Embedded Block Memory Architecture

Today, FPGAs are often used to implement entire systems. These systems often require storage, and vendors have responded by including memory resources on the FPGA.

There are two ways that FPGA vendors provide these memory resources: embedded block memory and distributed memory. Embedded memory solutions offer a number of relatively large dedicated memory blocks on the FPGA. Distributed memory, on the other hand, provides small memories spread across the entire FPGA chip by allowing users to access the configuration bits of the LUTs. In this thesis, we focus only on the dedicated embedded memory blocks.

Embedded FPGA memory block architectures can be described by three parameters: N, B, and W_eff. The number of available arrays is denoted by N, and the number of bits available in each array is denoted by B. Typically, each memory array can be configured to implement one of several aspect ratios; Table 2-1 shows the various aspect ratios available on a number of embedded memory blocks found in commercial FPGAs. We will refer to the set of allowable widths of each memory block as W_eff and the maximum allowable width as w_max; W_eff contains all powers of two up to and including w_max.

Stratix/StratixII   Stratix/StratixII   Stratix/StratixII   Virtex4      Virtex5
(576 bits)          (4608 bits)         (576 kbits)         (18 kbits)   (36 kbits)
512 x 1             4k x 1              64k x 8             16k x 1      32k x 1
256 x 2             2k x 2              64k x 9             8k x 2       16k x 2
128 x 4             1k x 4              32k x 16            4k x 4       8k x 4
64 x 8              512 x 8             16k x 32            2k x 8       4k x 8
64 x 9              512 x 9             16k x 36            2k x 9       4k x 9
32 x 16             256 x 16            8k x 64             1k x 16      2k x 16
32 x 18             256 x 18            8k x 72             1k x 18      2k x 18
-                   128 x 32            4k x 128            512 x 32     1k x 32
-                   128 x 36            4k x 144            512 x 36     1k x 36
-                   -                   -                   -            512 x 72

Table 2-1. Aspect Ratios of Memories in Commercial FPGAs [2, 3, 19, 20]
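The relationship between B, w_max, and the power-of-two entries of W_eff can be expressed compactly. The following Python sketch (our own illustration; it ignores the parity widths such as x9 and x18 shown in Table 2-1) enumerates the configurable aspect ratios of a block:

# Enumerate the power-of-two aspect ratios of an embedded memory block with
# B data bits and maximum word width w_max.
def aspect_ratios(B, w_max):
    ratios = []
    w = 1
    while w <= w_max:               # W_eff contains all powers of two up to w_max
        ratios.append((B // w, w))  # (depth, width); depth * width = B
        w *= 2
    return ratios

# Example: the power-of-two subset of a 4608-bit M4K-style block
# (B = 4096 data bits, w_max = 32):
print(aspect_ratios(4096, 32))      # [(4096, 1), (2048, 2), ..., (128, 32)]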

It is important to note that the shape of the physical array does not change as the chip is configured (it is fixed when the FPGA is manufactured). The appearance of a configurable aspect ratio is obtained by using a programmable column decoder to read and update only certain columns in each word [4]. Figure 2-5 shows how an embedded memory block is made up of two components: a fixed-size memory component with B bits and a word width of w_max, and a programmable column decoder network connected to the data inputs and the data outputs.

Figure 2-5. High Level Embedded Memory Block Architecture

Figure 2-6 shows the architecture of the programmable column decoder connected to the data outputs of the memory. An equivalent programmable column decoder is connected to the data inputs of the memory. By appropriately setting the configuration bits of the programmable column decoder, outputs of the memory can be multiplexed together to form deeper columns. Once multiplexed together, the select signals of the multiplexers are controlled by the address lines. These deeper columns give the appearance of a deeper memory and a narrower word width.

Figure 2-6. Programmable Column Decoder Architecture

Embedded memory blocks on many SRAM-based FPGAs can be configured to act as a single- or dual-port RAM, or as a single-port ROM.

2.1.4 Routing Architecture

The configurable routing fabric connects the programmable logic resources and I/O resources via prefabricated metal wire segments and programmable switches. Several components make up the configurable routing fabric: wire segments, connection and switch blocks, and the programmable switches. In modern FPGAs, the interconnect accounts for a majority of the area, delay, and power consumption [15, 21].

2.1.4.1 Wire Segments

Wire segments run along tracks, and tracks are grouped together into channels. In an island-style FPGA, these channels run vertically and horizontally in a grid-like fashion, as shown in Figure 2-7 [21].

Figure 2-7. Island-Style FPGA Routing Architecture

Wire segments may span one or multiple logic blocks [22]. Short wire segments provide routing flexibility. However, if a signal must traverse multiple wire segments, it must also traverse multiple programmable switches. This leads to larger signal propagation delays due to parasitic capacitances on the wire segments and switches. Therefore, most vendors include wire segments of different lengths to provide a balance between routing flexibility and speed.

2.1.4.2 Connection Blocks and Switch Blocks

Switch Blocks are used to connect wire segments to other wire segments, and Connection Blocks are used to connect wire segments to logic resources. Switch Blocks lie at the intersection of channels and can connect incoming wire segments to a certain number of outgoing wire segments. Although maximum flexibility would be achieved by having a fully connected Switch Block (any incident wire segment may connect to any other incident wire segment), the area overhead makes this impractical. Therefore, Switch Blocks are typically sparsely populated (each incident wire segment may only connect to a subset of all other incident wire segments). A number of sparsely populated switch block topologies have been proposed [23-25]. Similar to the Switch Block, Connection Blocks are also typically sparsely populated [21].

2.1.4.3 Programmable Connections

The programmable connections in the Switch Blocks and Connection Blocks are implemented using pass transistors that are controlled by configuration memory bits. These connections may be buffered or unbuffered, as shown in Figure 2-8. The unbuffered connections are more area and power efficient, but the buffered connections speed up propagation of signals that need to traverse multiple wire segments. Therefore, FPGA vendors employ a combination of buffered and unbuffered connections. The buffered connections can be uni-directional or bi-directional. Uni-directional connections are more area efficient and have been shown to improve performance when compared to bi-directional connections [26]. The connections in commercial FPGAs are typically uni-directional.

Figure 2-8. Connection Types: a) unbuffered, b) buffered uni-directional, c) buffered bi-directional

2.2 FPGA CAD

The configuration bits on an FPGA need to be programmed in a specific way for the FPGA to function as the desired user circuit. Computer Aided Design (CAD) tools are used to determine the states of these configuration bits. There are several steps in the CAD flow, as shown in Figure 2-9.

Figure 2-9. CAD Flow (Circuit Description → High-Level Synthesis → Technology Mapping → Clustering → Placement → Routing → Bitstream)

The user typically describes the circuit that they want to implement on the FPGA using a schematic or a hardware description language. The first step of the CAD flow, called High-Level Synthesis, transforms this input into a netlist of basic logic gates and flip flops. This netlist is then given to the FPGA CAD flow to be transformed into a bitstream that is used to program the FPGA configuration memory bits. The FPGA CAD flow consists of four steps: Technology Mapping, Clustering, Placement, and Routing.

2.2.1 Technology Mapping

The goal of technology mapping is to transform the netlist of basic gates into a netlist of components that are available on the FPGA, such as K-input LUTs and flip flops. Several optimizations can be performed at this stage. Area optimization can be performed by minimizing the number of LUTs in the resulting netlist [27]. Speed can be optimized by minimizing circuit depth, in other words the number of nodes traversed by the longest path between any primary input and any primary output [28]. Power can be minimized by encapsulating high-activity nets of the original netlist inside an element within the solution netlist [29, 30]. Figure 2-10 shows an example of how a part of the original netlist can be mapped to a LUT.

Figure 2-10. Technology Mapping Example

2.2.2 Clustering

Clustering (also known as packing) produces a netlist of clusters and determines which LUTs and flip flops to group together within each LE [15]. One of the main goals in this step is to group together closely connected LUTs and flip flops to take advantage of the fast intra-cluster interconnect. Since the parasitic capacitance in the intra-cluster interconnect is smaller than in the inter-cluster interconnect, using the intra-cluster interconnect for closely connected elements reduces both the delay and the power consumption of the circuit.

2.2.3 Placement

Placement takes the netlist of clusters produced by the clustering algorithm and assigns each cluster a physical location on the FPGA. The placement algorithm tries to simultaneously minimize the predicted routing demand and the critical path delay of the user circuit. Routing demand is not actually known until the routing step, but it can typically be reduced by placing closely connected clusters near each other. High routing demand in a localized region is called congestion and increases the difficulty of the routing problem. Similarly, placing clusters on the critical path close together can reduce the critical path delay. FPGA CAD tools use analytical methods [31] or Simulated Annealing [15] to solve the placement problem.

Analytical placement algorithms specify the location of each cluster as a variable in a system of equations. These equations express the tradeoffs between various optimization objectives, such as delay, congestion, and power, as a function of the relative location of each cluster. After solving this system of equations, the placement solution goes through a final legalizing step. The purpose of the legalizing step is to resolve contention, since solving the system of equations may produce a solution in which some clusters occupy the same location on the FPGA.

The VPR tool used in this project uses a Simulated Annealing placement algorithm. The algorithm starts by placing the clusters randomly onto the FPGA. Two clusters are randomly chosen and their locations are swapped. The swap is evaluated based on an optimization function. Good swaps are kept, and a percentage of bad swaps are also kept to prevent the placement solution from getting trapped in a local minimum. The quality of the placement solution is gradually improved with successive swaps until no more good swaps can be made. A sketch of this loop is shown below.
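The following Python sketch illustrates the swap-and-evaluate loop (a simplified illustration of our own, assuming a wirelength-only cost; VPR's actual cost function and annealing schedule are more sophisticated):

import math, random

# Minimal simulated-annealing placement sketch. 'nets' is a list of tuples of
# cluster names; 'pos' maps each cluster name to an (x, y) grid location.
def anneal(nets, pos, temp=10.0, cooling=0.95, moves_per_temp=100):
    def cost():
        # Half-perimeter wirelength summed over all nets.
        total = 0.0
        for net in nets:
            xs = [pos[c][0] for c in net]
            ys = [pos[c][1] for c in net]
            total += (max(xs) - min(xs)) + (max(ys) - min(ys))
        return total

    current = cost()
    clusters = list(pos)
    while temp > 0.01:
        for _ in range(moves_per_temp):
            a, b = random.sample(clusters, 2)
            pos[a], pos[b] = pos[b], pos[a]      # swap two cluster locations
            new = cost()
            # Keep good swaps; keep bad swaps with probability e^(-delta/T)
            # so the solution can escape local minima.
            if new <= current or random.random() < math.exp((current - new) / temp):
                current = new
            else:
                pos[a], pos[b] = pos[b], pos[a]  # undo the swap
        temp *= cooling                          # gradually cool the anneal
    return pos, current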

2.2.4 Routing

The final step in the FPGA CAD flow is routing. This step determines which routing resources to use for each connection in the netlist. FPGA routing is different from ASIC routing. The difference arises from the fact that the FPGA interconnect is prefabricated with a fixed number of wires and connections, whereas the interconnect of an ASIC can contain as many wires and connections as necessary. In ASIC routing, a two-step global-detailed routing method is often employed [32, 33]. The first step, global routing, determines which channels to use for each net. The second step, detailed routing, determines where to add wire segments. Using this method for FPGAs, the detailed routing step becomes very difficult due to the limited routing resources. Therefore, single-step routing algorithms are typically used for FPGAs [34, 35].

The VPR routing algorithm is based on the PathFinder routing algorithm [35]. PathFinder is an iterative negotiation-based algorithm. Each routing resource is assigned a cost for its usage. In the first iteration of the algorithm, each net is routed using the shortest path possible, regardless of whether the routing resource has already been assigned to another net. At the end of each iteration, the cost of routing resources that have been assigned to multiple nets increases. All nets are then ripped up and re-routed. As the cost of routing resources with high demand increases, only the most critical nets will be assigned to those resources. This process is repeated until there is no more contention, as sketched below.
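The negotiation loop can be sketched as follows (an illustration of the idea only, not VPR's implementation; route_net is an assumed helper that returns the set of resources on the cheapest path for a net under the given cost function, and the cost constants are arbitrary):

# Schematic of PathFinder-style negotiated congestion.
def pathfinder(nets, route_net, max_iters=50):
    history = {}                        # accumulated congestion cost per resource
    for _ in range(max_iters):
        usage = {}                      # resource -> number of nets using it
        # Cost of a resource grows with present sharing and past congestion.
        cost = lambda r: 1.0 + history.get(r, 0.0) + 2.0 * usage.get(r, 0)
        routes = []
        for net in nets:                # rip up and re-route every net
            path = route_net(net, cost)
            routes.append(path)
            for r in path:
                usage[r] = usage.get(r, 0) + 1
        overused = [r for r, n in usage.items() if n > 1]
        if not overused:
            return routes               # no shared resources: routing is legal
        for r in overused:              # raise the price of congested resources
            history[r] = history.get(r, 0.0) + 1.0
    raise RuntimeError("no legal routing found within the iteration limit")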

At the end of the FPGA CAD flow, the states of all the components in the FPGA are known and the states for all the configuration memory bits can be determined.

2.3 Heterogeneous Technology Mapping

Traditionally, the term homogeneous technology mapping has been reserved for FPGAs that contain only LUTs and flip flops. In this thesis, we will use the term heterogeneous technology mapping to refer to algorithms that map logic to FPGAs that contain LUTs, flip flops, and memories [5-7]. For this kind of heterogeneous technology mapping, we will also interchangeably use the term logic to refer to the LUTs produced by the traditional technology mapping step.

2.3.1 Terminology

We will review some terminology on directed acyclic graphs (DAGs) before discussing the heterogeneous technology mapping algorithms. In this thesis, we will use the terminology defined primarily in [5, 36]. The combinational part of a logic circuit can be represented with a Boolean network, which is a directed acyclic graph (DAG). For the DAG G(V, E), the set of vertices V represents combinational nodes, and the set of edges E represents directed connections between the nodes.

The set of nodes that drive a node x will be denoted as fanins(x), and the nodes that are driven by x will be referred to as fanouts(x). A node x is a predecessor of node z if there exists a directed path from x to z. A cone rooted at z is a subgraph containing z and some of its predecessors. A fanout-free cone is a cone in which no node (except the root node) fans out to a node that is not inside the cone. A maximum fanout-free cone rooted at z, which we denote as MFFC(z), is the fanout-free cone rooted at z that contains the largest number of nodes. Given a cone H rooted at u, a cut C_u = (X, X') is a partitioning of H such that u ∈ X', where X ⊆ H, X' ⊆ H, and X ∪ X' = H. The cut-set of a cut C_u is the set of nodes in X that drive a node in X', and is denoted by cutset(C_u). C_u is said to be k-feasible if the number of nodes in cutset(C_u) is less than or equal to k. A maximum-volume k-feasible cut is a k-feasible cut C_u = (X, X') with the largest number of nodes in X'. The MFFC definition is made concrete by the sketch below.
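The following Python sketch (our own illustration; fanins and fanouts are assumed adjacency dictionaries mapping each node to the lists of nodes that drive it and that it drives) computes MFFC(root) by growing the cone to a fixed point:

def mffc(root, fanins, fanouts):
    """Return the maximum fanout-free cone rooted at 'root'."""
    # Collect all predecessors of root (nodes with a directed path to root).
    preds = set()
    stack = [root]
    while stack:
        n = stack.pop()
        for p in fanins.get(n, []):
            if p not in preds:
                preds.add(p)
                stack.append(p)
    # A predecessor joins the cone only if every one of its fanouts is
    # already inside the cone; repeat until no more nodes can be added.
    cone = {root}
    changed = True
    while changed:
        changed = False
        for v in preds:
            if v not in cone and all(f in cone for f in fanouts[v]):
                cone.add(v)
                changed = True
    return cone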

2.3.2 Existing Algorithms

Three heterogeneous technology mapping algorithms have been published: MemMap [7], EMBPack [6], and SMAP [5]. The SMAP and EMBPack algorithms operate similarly and are based on using FlowMap [28] and the Max-Flow Min-Cut theorem [37] to identify regions for mapping into memories. MemMap determines regions for mapping into memories by expanding around nodes with reconvergent fanout. All of these algorithms produce similar results; in this thesis, we use SMAP. We will now briefly review the SMAP algorithm.

2.3.3 SMAP

The SMAP algorithm takes place between the Technology Mapping step and the Clustering step in the traditional FPGA CAD flow, but it can also be considered a part of technology mapping. Given a logic circuit and a set of memory arrays, SMAP tries to pack as much logic into each memory array as possible, and implements the rest of the logic in lookup-tables. It does this one memory array at a time. For each memory array, the algorithm chooses a seed node (as described below). Given a seed node, it determines the maximum-volume k-feasible cut of the cone rooted at the seed node. The k nodes in the cut-set become the inputs to the memory array.

Given a seed node s and the cut nodes, SMAP then determines which nodes become the memory array outputs. Any node that can be expressed as a function of the cut nodes is a potential memory output. The set of all potential nodes is denoted as potentials(s). SMAP chooses the output nodes by ranking all potential outputs by the number of nodes in their MFFCs. In other words, the cost function for choosing node p, where p ∈ potentials(s), as a memory output is:

$Cost(p) = |MFFC(p)|$    (2.1)

The w highest-scoring nodes are selected as the memory outputs, where w is the width of the memory. The nodes in the chosen output nodes' MFFCs are packed into the memory array. For memories with a configurable width, SMAP iterates through all aspect ratios available for the memory, and the solution that results in the highest number of packed nodes is selected. Figure 2-11 shows an example of a six-input cut where nodes I, J, and H are chosen as memory outputs, and the resulting circuit implementation.

Figure 2-11. Example of Mapping Logic to a Memory Array

For seed selection, each node in the network is visited as a potential seed node, and the above algorithm for output selection is performed using the deepest memory aspect ratio (largest number of memory inputs). The node that leads to the largest number of nodes that can be packed is chosen as the seed node.

When there is more than one memory array, the algorithm is repeated iteratively for each array. In this way, the algorithm is greedy; the decisions made when packing nodes into the first array do not take into account future arrays. However, experiments have shown that this works well. For a more complete description of SMAP, see [5].
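To make the output-selection step concrete, the following Python sketch ranks potential outputs by Equation 2.1 and keeps the w best (an illustration only; 'mffc' is a function returning the maximum fanout-free cone of a node, such as the sketch in Section 2.3.1, and unlike SMAP proper this version scores overlapping MFFCs independently):

# Sketch of SMAP-style output selection. 'potentials' is the set of nodes
# expressible as functions of the cut nodes; 'w' is the memory width.
def select_outputs(potentials, w, mffc):
    # Rank candidates by Equation 2.1: Cost(p) = |MFFC(p)|.
    ranked = sorted(potentials, key=lambda p: len(mffc(p)), reverse=True)
    chosen = ranked[:w]                 # the w highest-scoring nodes
    packed = set()
    for p in chosen:
        packed |= mffc(p)               # nodes absorbed into the memory array
    return chosen, packed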

2.4 Power Estimation

There are many ways to perform power estimation, with varying degrees of accuracy and computational complexity. The accuracy of power estimation depends on two things: the level of abstraction of the power model, and the stage of the design flow at which it is performed [38]. Power estimates made at low levels of abstraction, later in the design flow when most of the physical implementation detail is known, are the most accurate. The computational requirement and accuracy of a power estimation method both depend on the level of abstraction of the power model. For example, power estimation at the transistor level will be more accurate than at the architectural level, but will likely require more computation.

In digital circuits, Equation 2.2 is commonly used to calculate the dynamic power dissipated by a net:

$P_{Dynamic} = \frac{1}{2} \cdot \alpha \cdot C \cdot V_{Supply} \cdot V_{Swing} \cdot f_{Clock}$    (2.2)

where α is the switching activity of the net, C is the effective capacitance of the net, V_Supply is the supply voltage, V_Swing is the voltage swing when switching, and f_Clock is the clock frequency. The parasitic capacitances of the transistors that make up the gates driving the net are lumped with the wire capacitances and expressed through the effective capacitance C.
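As a worked example of Equation 2.2 (with illustrative values, not measured data):

# Worked example of Equation 2.2.
def dynamic_power(activity, C, v_supply, v_swing, f_clock):
    return 0.5 * activity * C * v_supply * v_swing * f_clock

# A net with activity 0.2 toggles/cycle, 100 fF effective capacitance,
# a 1.8 V supply with full-rail swing, clocked at 100 MHz:
p = dynamic_power(0.2, 100e-15, 1.8, 1.8, 100e6)
print(f"{p * 1e6:.2f} uW")   # ~3.24 uW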

2.4.1 Switching Activity

The switching activity of a net describes the average number of times that the net toggles in each clock cycle. The switching activities of every net are required before the power dissipated by an entire circuit can be estimated. Activity estimation techniques can be divided into simulation-based techniques and probabilistic techniques [39]. Simulation-based methods are more accurate, but they require input vectors to stimulate the circuit and are more computationally intensive. By simulating the circuit with input vectors, the switching of each net can be observed and tracked; at the end of the simulation, the average number of times that each net toggled can be calculated. Full simulations can make use of detailed delay models and account for dependencies among signals (such as spatial correlation and temporal correlation) to produce high-quality estimates.

Probabilistic methods eliminate the need for input vectors and are more computationally efficient. Once the switching activities at the primary inputs are specified, the activities of successive nodes can be calculated using an activity propagation model. The most commonly used models are the Transition Density Model [40] and the Lag-One Model [41].

2.4.1.1 Transition Density Model

The Transition Density Model uses two numbers to represent the activities of each net: the static probability and the switching activity (or transition density). Static probability is defined as the average fraction of clock cycles in which the steady state value of the output of the node is logic high [40]. The static probability of an output of a node depends on the static probabilities of the node's inputs and the function of the node.

The switching activity at the output of a combinational node y can be calculated with the following equations [40].

$Activity(y) = \sum_{i=1}^{all\ inputs} P\left[\frac{\partial f(x)}{\partial x_i}\right] \cdot Activity(x_i)$    (2.3)

$\frac{\partial f(x)}{\partial x_i} = f(x)|_{x_i=1} \oplus f(x)|_{x_i=0}$    (2.4)

where f(x) is the logic function of the node, x is the set of inputs, and P[∂f(x)/∂x_i] is the probability that a change in input i will cause a change at the output y. One drawback of this model is that it does not account for simultaneous transitions of the inputs of the gate.

This can lead to overestimation of the activity and hence overestimation of power.
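For a small gate, Equations 2.3 and 2.4 can be evaluated by enumeration. The following Python sketch (our own illustration, assuming independent inputs) propagates activity through a gate using the Boolean difference:

from itertools import product

# 'f' is the gate function over a tuple of input bits; 'p1' and 'act' give
# each input's static probability and switching activity.
def transition_density(f, p1, act):
    n = len(p1)
    out_act = 0.0
    for i in range(n):
        # P[df/dx_i]: probability that toggling input i toggles the output,
        # i.e. f(..x_i=1..) XOR f(..x_i=0..), weighted over the other inputs.
        p_sens = 0.0
        for bits in product((0, 1), repeat=n - 1):
            x = list(bits[:i]) + [0] + list(bits[i:])
            prob = 1.0
            for j in range(n):
                if j != i:
                    prob *= p1[j] if x[j] else (1.0 - p1[j])
            x[i] = 0; f0 = f(tuple(x))
            x[i] = 1; f1 = f(tuple(x))
            if f0 != f1:
                p_sens += prob
        out_act += p_sens * act[i]      # Equation 2.3
    return out_act

# 2-input AND gate: the output follows input a only when b is high.
and2 = lambda x: x[0] & x[1]
print(transition_density(and2, p1=[0.5, 0.5], act=[0.1, 0.1]))   # 0.1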

2.4.1.2 Lag One Model

The Lag-One Model calculates static probability and switching probability. Switching probability is defined as the probability of a signal switching in each clock cycle and is a lower bound on switching activity. In other words, switching probability represents switching activity without glitching. The following equation is used to calculate the switching probability in the Lag One Model [41].

$P_{switch}(y) = \sum_{s_i \in S_1} P(s_i) \sum_{s_j \in S_0} P_{switch}(s_i \rightarrow s_j) + \sum_{s_i \in S_0} P(s_i) \sum_{s_j \in S_1} P_{switch}(s_i \rightarrow s_j)$    (2.5)

where S_1 is the set of input states for which f = 1 and S_0 is the set of input states for which f = 0.

The first term represents the probability of the output of node y switching from logic 1 to logic 0. It can be read as the probability of the inputs being in a state s_i, where f(s_i) = 1, multiplied by the probability of transitioning to any state s_j where f(s_j) = 0; the summation is performed over all states where f(s_i) = 1. Similarly, the second term represents the probability of switching from logic 0 to logic 1. Since this model uses state transitions, rather than input transitions, for its calculations, simultaneous input transitions are accounted for.
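The following Python sketch evaluates Equation 2.5 for a small gate by enumerating input states (an illustration only; it assumes independent inputs, each modeled by its static probability p1 and switching probability ps, so that, e.g., the per-cycle probability of a rise is ps/2):

from itertools import product

def lag_one_switch_prob(f, p1, ps):
    n = len(p1)
    states = list(product((0, 1), repeat=n))

    def p_state(s):                 # P(s): joint probability of input state s
        prob = 1.0
        for b, p in zip(s, p1):
            prob *= p if b else (1.0 - p)
        return prob

    def p_trans(si, sj):            # P(si -> sj): per-input lag-one model
        prob = 1.0
        for bi, bj, p, a in zip(si, sj, p1, ps):
            leave = (a / 2) / (p if bi else (1.0 - p))
            prob *= leave if bi != bj else (1.0 - leave)
        return prob

    # Equation 2.5: output-1 states transitioning to output-0 states,
    # plus output-0 states transitioning to output-1 states.
    total = 0.0
    for si in states:
        for sj in states:
            if f(si) != f(sj):
                total += p_state(si) * p_trans(si, sj)
    return total

and2 = lambda x: x[0] & x[1]
print(lag_one_switch_prob(and2, p1=[0.5, 0.5], ps=[0.2, 0.2]))   # 0.18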

2.4.1.3 ACE2.0

Our power model, which will be described later, is based on an existing academic activity estimation tool called ACE2.0 [42]. ACE2.0 is based on the Transition Density model and the Lag-One model, and supports three kinds of elements: primary inputs and outputs, combinational nodes, and registers. It uses three steps to estimate switching activity. The first step uses simulation to estimate the static and switching probabilities for logic within sequential feedback loops. The second step uses the Lag-One model to calculate static and switching probabilities for the remaining nodes; nodes are visited from the primary inputs to the primary outputs. The third step calculates switching activity using a method that accounts for different input signal arrival times and the glitch-filtering effect of logic gates. Glitches with very small pulse widths do not actually occur, because the gate does not have enough time to fully charge the output, as shown in Figure 2-12. This effect is not captured in the Transition Density model, since it assumes single input transitions, and it is not captured in the Lag-One model, because it assumes equal arrival times. In ACE2.0, the arrival times are assumed to be normally distributed, and the concept of a minimum pulse width is introduced through the variable τ, which represents the minimum pulse width that can be propagated by a gate and is process dependent.

Figure 2-12. Glitch Filtration

The following equations show how τ is used to calculate switching activity from the switching probability.

$Activity(y) = \frac{T}{\tau} \cdot P_{switch}(y)$    (2.6)

$P_{0 \rightarrow 1} = \frac{P_{switch}(f)}{2 \cdot (1 - P_1(f))} \cdot \frac{\tau}{T}, \qquad P_{1 \rightarrow 0} = \frac{P_{switch}(f)}{2 \cdot P_1(f)} \cdot \frac{\tau}{T}$    (2.7)

where T is the maximum delay from the primary inputs to the primary outputs and P_1(f) is the static probability of the signal.

2.4.2 Power Estimation for FPGAs

Both commercial and academic power estimation tools for FPGAs exist [8, 43-47].

Vendors typically provide power estimation spreadsheets for the early design stages [46, 47]; however, as mentioned earlier, power estimates at early design stages are not very accurate. For power estimates in later design phases, vendors provide tools in their CAD suites. Altera's PowerPlay Power Analyzer [45] and Xilinx's XPower [44] tools provide simulation-based activity estimation at the post-technology-mapping and post-place-and-route stages. Altera's PowerPlay Power Analyzer also performs vectorless (probabilistic) activity estimation for StratixII devices.

The Poon Power Model (PPM) [8] is a flexible academic FPGA power model built on top of the popular Versatile Place and Route (VPR) CAD suite [9]. This model provides power estimates for homogeneous island-style FPGAs. Power estimates are broken down into three components: dynamic power, static power, and short-circuit power. Dynamic power is calculated after placement and routing using Equation 2.2. Process technology information from the user and the architectural description provided to VPR are used to calculate the capacitances of nets. A transistor capacitance model is used along with the transition density model to estimate the power dissipated within a cluster and the clock network. For the clock network, an H-tree clock distribution network is assumed.

2.4.3 Power Estimation for Memories and Caches

Techniques for estimating the power of fixed memories vary in complexity and accuracy. Understanding the operation of the memory is important to understanding how to estimate its power consumption. For the following discussion, we refer to Figure 2-13. An SRAM memory array has several components: the address decoder (which includes the row and column decoders), the array of SRAM cells (the core), sense amplifiers, write drivers, precharge circuitry, and control logic. The core consists of an array of SRAM cells, each of which can store one bit. Each SRAM cell is made up of six transistors: four form a cross-coupled inverter circuit that stores the data, and two are used to access the cell. Word-lines run horizontally into the core and connect to the gates of the access transistors. Complementary bit-line pairs run vertically along each column and connect to the access transistors to transfer data into and out of the core. Since the word-lines and bit-lines are long metal tracks that are connected to many transistors, they accumulate a significant amount of parasitic capacitance and thus dominate the access time and power dissipation of the memory. The purpose of the other components will become clear when we discuss the memory's operation.

Figure 2-13. Typical SRAM Memory Architecture

At the beginning of each access cycle, the bit-lines are precharged to a high value regardless of what operation will be performed. For a read operation, the row decoder decodes the address and drives the word-line of the desired row. This connects the cells of that row to the bit-lines. Depending on whether a 0 or a 1 is stored in the cell, one of the two bit-lines in each column is pulled down. At this point, the column decoder selects the desired columns to pass to the sense amplifiers. Sense amplifiers are needed to detect the small drop in voltage across the discharging bit-line as they do not discharge completely. Depending on which bit-line in the column is being discharged, the sense amplifier will drive the dataout signals accordingly.

The operation of a write cycle is similar. After precharging the bit-lines, the write driver pulls down one of the bit-lines in each column depending on whether a 0 or a 1 is being written to the column. The address decoder then drives the appropriate word-line and connects the cells to the bit-lines for updating. The charge on the bit-lines is strong enough to force a change in the states of the cross-coupled inverters.

Dynamic and leakage power estimation for SRAM memories and caches is a well studied topic [48-56]. Methods are often analytical in nature and based on theoretical calculations of internally switched capacitances. Most techniques operate at the transistor level or the micro-architectural level. At the transistor level, power estimation can be performed using SPICE. Although very accurate, transistor level modeling of memories is very computationally intensive. The authors of [56] alleviate this problem somewhat by generating analytical models through characterization of memory implementations using transistor-level simulations.

Many tools exist at the micro-architectural level, but they often target specific implementations of the memory or cache. Models such as those in [50] model only the power dissipated in the core array. Power models for caches such as CACTI [53], PRACTICS [54], and WATTCH [49] employ detailed capacitance models and account for both the tag and data arrays and all components of the memory. Most of these tools have limited absolute accuracy and are useful only for relative comparisons. Recently, a more generic model called IDAP [51] has been shown to work for a variety of implementations of the components in the memory, and was shown to provide estimates that are within 22.2% of SPICE simulations.

2.5 Focus and Contribution of Thesis

The goal of this research is two-fold: first, we want to build a flexible power model for FPGAs that contain embedded memories, to enable future FPGA architectural and CAD tool research; second, we want to use this tool to investigate the power implications of using memories to implement logic. Our approach to implementing the power model is to take an existing and validated FPGA power model for homogeneous FPGAs, the Poon Power Model, and extend it to model the power dissipation of embedded memory blocks. To enable architectural studies, our memory power model requires the following attributes. First, it must be simple enough that estimates can be calculated quickly. Second, it must have high fidelity, meaning that absolute accuracy is less important than the accuracy of relative comparisons. Third, the power model should not be implementation- or layout-specific.

In Chapter 3, we describe the details of our power model. In Chapter 4, we perform an architectural study by applying this power model to investigate the power implication of mapping logic to memories. In Chapter 5, we perform a CAD study with the tool to see whether these heterogeneous technology mapping algorithms can be made power-aware.

The contributions of this thesis are summarized as follows:

1. A novel power model for heterogeneous FPGAs that contain embedded memories.

2. An experimental evaluation of the impact that mapping logic to memories has on power and energy.

3. An exploration of power-aware heterogeneous technology mapping algorithms, enhancing an existing algorithm and measuring its performance in terms of power and energy.

3 POWER MODEL FOR FPGAS CONTAINING EMBEDDED MEMORIES

This chapter describes the power model developed for estimating the power consumption of FPGAs that contain embedded memories. The power model is made up of two separate parts: an activity estimation tool, and a power estimation tool. We implement these tools by extending existing tools for homogeneous FPGAs to support embedded memory blocks. In the following sections, we will describe how the existing tools were enhanced.

3.1 Activity Estimation

To develop our activity estimation tool, we extended ACE2.0 (described in Section 2.4.1.3) to support embedded memories. Although the activities from ACE2.0 are very accurate and fast to compute, the model cannot be directly applied to memories because of the transistor-level nature of the memory circuitry. Therefore, we require an alternative technique for memories. Activity estimation for the outputs of an embedded memory block is carried out differently depending on whether the memory block is configured as a RAM or as a ROM. The following sections discuss methods for both.

3.1.1 Read Only Memory

The overall strategy for estimating the activities of ROMs is to decompose the ROM into registers and logic nodes that ACE2.0 can understand. Since the contents of a ROM are fixed, each column, or output, of a ROM can be represented by a multi-input, single-output combinational node. The values stored in the column act as the truth table describing the functional behavior of the combinational node. The inputs of the node are the address lines, and the output of the node is the memory output. One node is required for each memory output to completely represent the ROM. For synchronous memories, the address lines are connected to registers before entering the node. If a memory enable signal is available, an additional AND gate is inserted to gate the memory output with the memory enable signal. Although gating the output in this manner is not exactly functionally correct, it is acceptable for the purpose of activity estimation. By replacing the ROM with registers and nodes in this manner, the existing probabilistic methods within ACE2.0 can be used. Figure 3-1 shows an example of replacing a 16x2 synchronous ROM with combinational nodes and registers.

Figure 3-1. Replacing a ROM with Equivalent Nodes and Registers
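The column-by-column decomposition can be sketched in a few lines of Python (illustrative names only; this is not ACE2.0's actual data structure):

# Each output column of a ROM becomes one multi-input, single-output node
# whose truth table is the column's contents, indexed by the address lines.
def rom_to_nodes(contents):
    """contents[row][col] holds the bit stored at address 'row', output 'col'.
    Returns one truth-table list per output column."""
    depth, width = len(contents), len(contents[0])
    return [[contents[addr][col] for addr in range(depth)]
            for col in range(width)]

# A 4x2 ROM yields two 2-input nodes (address width = log2(4) = 2):
nodes = rom_to_nodes([[0, 1], [1, 1], [0, 0], [1, 0]])
assert nodes[0] == [0, 1, 0, 1] and nodes[1] == [1, 1, 0, 0]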

3.1.2 Random Access Memory

Activity estimation for an embedded memory block configured as a RAM must be performed differently, because the contents of the memory are not static. Intuitively, the activities on the memory's dataout pins will depend on the activities of the memory inputs: the write enable signal, the address signals, and the datain signals. The activity estimation of each output pin can be simplified by observing that the activity of each dataout pin is independent of all datain pins except one.

At least three different approaches to estimating the activities of the output pins of a RAM are possible. One approach is to determine a closed-form expression and use this expression to compute the activities. A second approach is to generate a profile of the output pin activities through simulation and store this profile in a table for fast lookup.

This technique requires a characterization phase prior to usage. In the characterization phase, numerous simulations would be performed using vectors with different input switching activities. The simulated output activities would then be stored in a table and indexed with the input switching activities that were used for the simulation. This table would be generated only once for each memory architecture. At run-time, the output activities would be retrieved from the table using the input activities as the index. If the input activities do not fall exactly on an index position, linear interpolation can be used.

A third option is to actually embed a simple memory simulator into ACE2.0. The simulator would use randomly generated vectors that match the calculated static and switching probabilities of the memory inputs to perform simulation-based activity estimation on the memory output nodes.

Each of the three methods described above has its advantages and drawbacks. The first method is the most elegant, but finding a closed-form expression can be quite difficult. The second method is fast at run time, but it requires significant storage for the profiled values and significant time to characterize each memory architecture that may need to be explored in an architectural investigation. The third method extends the execution time of the activity estimation tool, but it is accurate. Moreover, a RAM simulator is required in all three methods, because ACE2.0 uses simulation to estimate the activities of nodes in feedback loops. For these reasons, we chose the third option of embedding a RAM simulator into ACE2.0.

3.1.3 Framework and Integration into ACE2.0

Figure 3-2 shows the pseudo-code of the new flow in ACE2.0. The components added for the activity estimation of the memories are marked with /* new */ comments.

ACE2.0(network, vectors, activities) {
    Replace_ROMs()                                                    /* new */

    /* Phase 1 */
    feedback_latches_and_memories = find_feedback_latches_and_memories(network)  /* new: includes memories */
    feedback_logic = find_feedback_logic(feedback_latches_and_memories)
    simulate_probabilities(feedback_logic)

    /* Phase 2 */
    foreach node n in network {
        if (Status(n) != SIMULATED) {
            if (is_memory_output(n)) {                                /* new */
                Static_Prob(n) = simulate_memory_static_prob(n)
                Switch_Act(n)  = simulate_memory_switch_act(n)
                Switch_Prob(n) = Switch_Act(n)
            } else {
                Static_Prob(n) = calc_static_prob(n)
                Switch_Prob(n) = calc_switch_prob(n)
            }
        }
    }

    /* Phase 3 */
    foreach node n in network {
        if (!is_memory_output(n)) {                                   /* new */
            Switch_Act(n) = calc_switch_act(Static_Prob(n), Switch_Prob(n))
        }
    }
}

Figure 3-2. Integration of Memory Activity Estimation into ACE2.0

The flow first replaces all ROMs with equivalent nodes, as described in Section 3.1.1. As discussed in Section 2.4.1.3, ACE2.0 has three phases. The first phase identifies and simulates nodes, flip-flops, and memories that are in sequential feedback loops. The second phase iterates through each remaining node and flip-flop and calculates their static and switching probabilities using the Lag-One model. If the node is a memory output, the activities are instead simulated using the memory simulator. As described in the previous section, the simulator generates the switching activity rather than the switching probability, which is required for the calculation of downstream nodes. To address this, we note that since our memory simulator assumes synchronous memories, the switching activity is the same as the switching probability. In the final phase, the switching activities for each node are calculated from the static and switching probabilities. Since the switching activities for the memory output nodes have already been determined, these nodes do not need to be visited in this phase.

Figure 3-3 shows the pseudo-code for the RAM simulator.

Inputs:  static probabilities and switching activities for the ME, WE, address, and datain signals
Output:  static probabilities and switching activities for the dataout signals

generate vectors with the given average activities for the ME, WE, address, and datain signals
for (each simulation cycle) {
    /* get the input states for the current cycle */
    ME_current      = ME_vector[cycle]
    WE_current      = WE_vector[cycle]
    Address_current = Address_vector[cycle]
    Datain_current  = Datain_vector[cycle]

    if (ME_current == 0) {
        /* idle cycle: do nothing */
    } else if (WE_current == 1) {
        /* write cycle */
        S[Address_current] = Datain_current      /* write to memory */
        Dataout = Datain_current                 /* write-through */
    } else {
        /* read cycle */
        Dataout = S[Address_current]             /* read memory */
    }

    foreach bit {
        if (Dataout[bit] made a transition) toggled[bit]++   /* count toggles */
        if (Dataout[bit] == 1) high[bit]++                   /* count logic-1 cycles */
    }
}

/* calculate static probabilities and switching activities */
foreach bit i {
    static_probability(Dout[i]) = high[i] / num_cycles
    activity(Dout[i]) = toggled[i] / num_cycles
}

Figure 3-3. Pseudo-Code for RAM Simulator

When not simulating feedback loops, the inputs to the simulator are the static probabilities and switching activities of the memory input pins. First, the simulator generates vectors for each input signal that match the given statistics. The next step is the actual simulation. In each iteration, the values of the inputs are retrieved for the given cycle. There are three possible events: an idle cycle, a write cycle, and a read cycle. If the state of the memory enable signal (ME) is a logic 0, an idle cycle occurs and no changes to the memory contents or outputs take place. Otherwise, a write cycle or a read cycle is determined based on the state of the write enable signal (WE). In the pseudo-code, S is a matrix used to store the contents of the memory. In a write access, the contents of S are updated with the current datain values. The simulator assumes a write-through from the datain to the dataout pins when a write access is performed, as in some commercial devices, and therefore also updates the dataout values with the current datain values. In a read access, the dataout values are updated with the appropriate row of S. At the end of each cycle, the simulator counts the dataout bits that toggled and those that are at logic 1; these counts yield the switching activities and static probabilities of the dataout signals once the simulation completes.
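To make the simulator's operation concrete, the following is a minimal, runnable Python sketch of the same loop. It is our own illustration, not the ACE2.0 source: vector generation is simplified to independent random draws per signal (the actual tool generates vectors matching both the given static probabilities and switching activities), and the function name simulate_ram is hypothetical.

import random

def simulate_ram(depth, width, p_me, p_we, p_addr, p_din, cycles=10000):
    """Sketch of the RAM output-activity simulator of Figure 3-3.

    p_me / p_we: probability that ME / WE is high in a cycle.
    p_addr / p_din: per-bit probability that an address / datain bit is high.
    Returns (static probability, switching activity) per dataout bit.
    """
    n_addr = (depth - 1).bit_length()         # address width (depth assumed a power of two)
    S = [[0] * width for _ in range(depth)]   # memory contents
    dataout = [0] * width
    high = [0] * width                        # cycles in which each output bit is 1
    toggled = [0] * width                     # transitions observed on each output bit

    for _ in range(cycles):
        addr = sum((random.random() < p_addr) << i for i in range(n_addr))
        din = [int(random.random() < p_din) for _ in range(width)]
        prev = dataout[:]

        if random.random() >= p_me:
            pass                              # idle cycle: outputs hold their value
        elif random.random() < p_we:
            S[addr] = din[:]                  # write cycle: update contents...
            dataout = din[:]                  # ...with write-through to the outputs
        else:
            dataout = S[addr][:]              # read cycle

        for b in range(width):
            toggled[b] += dataout[b] != prev[b]
            high[b] += dataout[b]

    return ([h / cycles for h in high], [t / cycles for t in toggled])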

3.2 Power Estimation

As discussed in Section 2.1.3, the embedded memory blocks are made up of a fixed-size memory array and a programmable column decoder network. The following sections describe how power estimation is performed on these two components and conclude with a discussion on the implementation of the tool. We also show that the power consumption of the programmable column decoder is negligible compared to the fixed-size memory array, and hence do not include it in our power model.

3.2.1 Fixed-Size Memory

In our power model, the power dissipated by the fixed-size memory component is modeled as the sum of two components: the dynamic cycle power and the leakage power. We assume that the fixed-size memory component in the embedded memory block is a typical SRAM memory using 6-T SRAM cells with complementary bit-lines, as shown in Figure 2-13. Due to the symmetry of this architecture, the same amount of dynamic power is dissipated within each write access. Although the power dissipated within the row decoder is dependent on the address line activities, it is very small compared to the other components. Because of this, a single power value can be used to estimate the power dissipated in a write access. Similarly, a single power value can be used to estimate the power dissipated in a read access. The dynamic cycle power of the memory can then be defined as a weighted average of these two power values, weighted by how often the memory performs read and write accesses. Since leakage power is independent of whether the memory is performing a read or a write, the leakage power dissipated by the memory can be represented by a single value.

3.2.2 Programmable Column Decoder

In this section, we model the programmable column decoder and show that it dissipates a negligible amount of power compared to the fixed-size memory, thus allowing us to ignore this component in our power model.

We first assume that the programmable column decoder network consists of a two-input multiplexer tree and a number of two-input multiplexers that are controlled by SRAM configuration bits. We further assume that these multiplexers can be modeled with NMOS pass transistor networks, as shown in Figure 3-4. By modeling the programmable column decoder in this manner, we can draw on the methods used to model LUTs and multiplexers in PPM [8].

Figure 3-4. Transistor Level Modelling of the Programmable Column Decoder

In PPM, LUTs are also modeled as pass transistor networks, as shown in Figure 3-5. The authors of PPM used transistor capacitance models from CACTI and the transition density model to predict the power dissipated in the multiplexer network. The power dissipated at each internal node of the network is estimated by summing the parasitic capacitances at each node, finding the activity of the node, and applying Equation 2.2. The total power dissipated by the network is the sum of the power dissipated at each internal node. The authors of PPM showed that this method was accurate to within 14.5% of the maximum predicted value of HSPICE. A similar method was used to predict the power dissipated by SRAM-controlled multiplexers; for these circuits they found that their model was within 5.3% of HSPICE. By using the same method, we found that the power dissipated by the programmable column decoder was less than two percent of that dissipated by a 64x64 (4096-bit) fixed-size memory array. Therefore, we omit the detailed modeling of this component from our power model.

[Figure: a 2-LUT and its equivalent pass-transistor model, with inputs input0 and input1 driving the pass transistor network and a single output.]

Figure 3-5. Modelling of LUTs in the Poon Power Model (from [57])

3.2.3 Framework and Implementation of the Power Estimator

To implement our power estimation tool, PPM was modified to support embedded memories. As described earlier, we model the embedded memory block power as two components, dynamic cycle power and leakage power, and ignore the power dissipated in the programmable decoder.

In VPR (and PPM), technology parameters are specified in an architecture file. In the existing PPM, these technology parameters include the leakage power per SRAM cell as well as low-level capacitance and resistance information that allow PPM to calculate dynamic and short-circuit power of each element in the FPGA. We have extended this by adding the three new technology parameters indicated in Table 3-1.

Parameter        Description
C_readcycle      Equivalent read cycle capacitance
C_writecycle     Equivalent write cycle capacitance
P_leak           Memory leakage power

Table 3-1. VPR Memory Power Parameters

The parameter P_leak is the leakage power of a memory block, and can be found either through careful SPICE simulations or read directly from memory generator datasheets (we use the latter approach in our experiments; our technology information is taken from Virage Memory Compiler output reports). The leakage power is constant for all memory blocks of the same size and organization; memory blocks with different sizes or organizations will have different values of P_leak.

The amount of dynamic energy dissipated during each read and write cycle can also be read directly from memory generator datasheets or found using SPICE. However, rather than creating architecture file parameters for these quantities, the read and write cycle power is specified indirectly, through the C_writecycle and C_readcycle parameters. These parameters are defined as the effective capacitance that is charged or discharged during each write and read cycle, respectively. Although C_writecycle and C_readcycle cannot be read directly from memory generator datasheets, they can be calculated using the following equations:

$$C_{writecycle} = \frac{2 \cdot Power_{writecycle}}{V_{Swing} \cdot V_{Supply} \cdot f_{clock}} \quad (3.1)$$

$$C_{readcycle} = \frac{2 \cdot Power_{readcycle}}{V_{Swing} \cdot V_{Supply} \cdot f_{clock}} \quad (3.2)$$

where V_Swing, V_Supply, f_clock, Power_writecycle, and Power_readcycle can be found in memory generator datasheets.

The advantage of using C_writecycle and C_readcycle as architecture file parameters instead of Power_writecycle and Power_readcycle is that they are independent of clock frequency. In addition, the use of capacitance numbers is more familiar to users of VPR and PPM, since that is how most other technology numbers are expressed.

Within our enhanced PPM, the dynamic cycle power is calculated by first computing Power_write, assuming one write access is performed per cycle, using the following equation:

$$Power_{write} = \frac{1}{2} \cdot C_{writecycle} \cdot V_{Swing} \cdot V_{Supply} \cdot f_{clock} \quad (3.3)$$

In this expression, f_clock is the actual post-place-and-route clock frequency found by VPR. Similarly, the quantity Power_read is calculated assuming one read access is performed per cycle, using the following equation:

$$Power_{read} = \frac{1}{2} \cdot C_{readcycle} \cdot V_{Swing} \cdot V_{Supply} \cdot f_{clock} \quad (3.4)$$

In a real system, the memory is not accessed every cycle, and some accesses are reads while others are writes. Thus, the overall cycle power is:

$$Power_{cycle} = P_{ME} \cdot \left[ P_{WE} \cdot Power_{write} + (1 - P_{WE}) \cdot Power_{read} \right] \quad (3.5)$$

where P_ME is the static probability that the memory enable signal is high, and P_WE is the static probability that the write enable signal is high (this assumes active-high memory enable and write enable signals, consistent with the simulator of Section 3.1.3). It is important to note that Equation 3.5 assumes that the write enable and memory enable signals are uncorrelated.

Leakage power is independent of the clock frequency and hence can be represented directly by a power value, which we denote P_leak. We do not model short-circuit power because there is very little short-circuit power in the 6-T SRAM cells, owing to the fast transitions of the cross-coupled inverters. The contribution of short-circuit power in the cell array is further reduced by the fact that no more than one row of cells is being written to at any time.
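Putting Equations 3.1 through 3.5 together, the memory power model reduces to a few lines of arithmetic. The sketch below is our own illustration; the function names and parameter spellings are ours, not the actual VPR architecture-file syntax.

def cycle_capacitance(power_cycle, v_swing, v_supply, f_clock):
    """Equations 3.1 / 3.2: effective capacitance switched per access cycle,
    derived from a datasheet cycle power at the datasheet clock frequency."""
    return 2.0 * power_cycle / (v_swing * v_supply * f_clock)

def memory_block_power(c_write, c_read, p_leak, v_swing, v_supply, f_clock, p_me, p_we):
    """Total power of one embedded memory block (Equations 3.3-3.5 plus leakage).

    p_me / p_we: static probabilities that the (active-high) memory enable and
    write enable signals are high; assumed uncorrelated, as in Equation 3.5.
    """
    power_write = 0.5 * c_write * v_swing * v_supply * f_clock            # Eq. 3.3
    power_read = 0.5 * c_read * v_swing * v_supply * f_clock              # Eq. 3.4
    power_cycle = p_me * (p_we * power_write + (1.0 - p_we) * power_read)  # Eq. 3.5
    return power_cycle + p_leak

Note that the factor of two in Equations 3.1 and 3.2 cancels the factor of one-half in Equations 3.3 and 3.4: converting a datasheet cycle power to a capacitance and back at the same clock frequency recovers the original value, while f_clock in Equations 3.3 and 3.4 is free to take the post-place-and-route frequency.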

3.3 Summary

A power model for FPGAs that contain embedded memories was developed in two parts. The activity estimation part is implemented as an extension of ACE2.0. Activity estimation for ROMs is performed by replacing the memory with an equivalent network of combinational nodes and registers and then applying the standard ACE2.0 flow. Activity estimation for RAMs is performed by using a simple simulator to determine the activities of the memory outputs. The power estimation step is implemented as an extension of the Poon Power Model and integrated into the VPR CAD suite. Since we found that the power dissipated by the programmable column decoder contributes less than two percent of the power of the embedded FPGA memory, we chose not to include this component in our power model. The power dissipated by the fixed-size memory is modeled with two components: dynamic cycle power and leakage power.

4 POWER IMPLICATIONS OF MAPPING LOGIC TO MEMORIES

In the previous chapter we described a method for activity and power estimation for the embedded memory blocks in FPGAs. In this chapter, we apply this model to investigate the power implications of configuring memories as large ROMs to implement user logic.

Intuitively, implementing logic in memory arrays will impact the overall power dissipated by the circuit in two ways. If large amounts of logic can be implemented in a memory block, not only are fewer lookup-tables required (which would save a small amount of power), but the interconnect between these lookup-tables is also no longer required (which would save a significant amount of power). On the other hand, memory arrays contain long word-lines, bit-lines, sense amplifiers, and decoders, all of which consume power [58].

In this chapter, we investigate this intuition, and determine whether implementing logic in memory arrays leads to a net reduction or increase in power. In particular, we consider a range of memory architectures, and answer the following questions:

1. How does the number of FPGA memories used for logic affect power?

2. How does the size of the FPGA memories used for logic affect power?

3. How does the flexibility, or the maximum configurable width, of the FPGA memories used for logic affect power?

4.1 Experimental Methodology

To investigate the power and energy implications of using memories to implement logic, we employ an experimental methodology. The results are gathered in two ways: we use an enhanced version of VPR that supports the placement and routing of embedded memory blocks, together with the power model described in Section 3.2.3, and we also perform current measurements on a 0.13um CMOS FPGA (Altera EP1S40). Although the second technique provides the most accurate results, it cannot be used to investigate alternative memory architectures; for those experiments, we need to use the VPR flow. In this section, we describe both methodologies.

4.1.1 VPR Based Flow

Figure 4-1 shows the flow for the VPR-based experiments. First, the twenty largest MCNC circuits (ten combinational and ten sequential) are technology mapped to 4-LUTs using Altera's Quartus Integrated Synthesis (QIS) [59], and the resulting netlist is exported using the Quartus University Interface Program (QUIP) [60]. Table 4-1 summarizes the characteristics of the QIS-mapped MCNC benchmark circuits. Although SIS and FlowMap [28] could also be used to perform technology mapping, we chose QIS for two reasons. First, for all our circuits, QIS was able to find implementations requiring far fewer LUTs than SIS/FlowMap. This is partially because FlowMap achieves depth optimality at the expense of area by aggressively using node replication, whereas QIS uses a balanced area- and timing-driven algorithm. The second reason we chose QIS is that we eventually feed the same circuit through the Quartus flow to perform measurements on an actual device (see Section 4.1.2), and we want to ensure that the measured circuit is identical to the one used in the VPR flow.

[Figure: VPR-based experimental flow. Benchmark circuits are synthesized to a netlist of LUTs (QIS); heterogeneous technology mapping (enhanced SMAP) produces a netlist of LUTs and ROMs; activity estimation (ACE 2.0) produces estimated net activities; place and route (enhanced VPR) uses TSMC 0.18um technology information and Virage Memory Compiler data; power estimation (enhanced Poon Power Model) produces the final power estimate.]

Figure 4-1. Flow for VPR-Based Experiments

Circuit Name   Primary Inputs   Primary Outputs   4-LUTs   Flip Flops
alu4                 14                 8            989         0
apex2                38                 3           1023         0
apex4                 9                19            844         0
bigkey              229               198           1032       224
clma                 62                83           4682        33
des                 256               245           1242         0
diffeq               64                40            924       314
dsip                229               198            924       224
elliptic            131               115           2021       886
ex5p                  8                63            210         0
ex1010               10                10            876         0
frisc                20               117           2167       821
misex3               14                14            939         0
pdc                  16                40           2218         0
s298                  4                 7            738         8
S38417               29               107           3998      1391
S38584               38               305           4501      1232
seq                  41                35           1118         0
spla                 16                46           1916         0
tseng                52               123            626       225

Table 4-1. Benchmark Characteristics

To map logic to memories, we use a modified version of SMAP. The embedded memories found in commercial FPGAs are typically synchronous, whereas the memories generated by SMAP are asynchronous. Therefore, we only allow SMAP to make memory input cuts at register inputs. If the fan-outs of these registers become completely packed into the memories, then these registers are removed from the circuit. For circuits that are purely combinational, we add registers to all primary inputs. We also modified SMAP to generate memory initialization files (MIF) that describe the contents of each ROM. The MIF files are used later for activity estimation. To verify that the circuit generated by SMAP is functionally equivalent to the original circuit, bounded sequential equivalence checking was performed using an academic tool from Berkeley called ABC [61].

Prior to performing activity estimation, the circuits generated by SMAP are placed in a test harness. This test harness consists of a linear feedback shift register and a wide XOR gate, and is described in detail in the subsequent section. The harness is required in the Board Measurement flow, and hence we include it in the VPR flow to make the circuits in both flows as similar as possible.

To estimate the activities on the nets, the activity estimator described in Section 3.1.3 was used. To place and route the circuits, a modified version of TVPACK and VPR that supports memories was used. The clustering, placement, and routing algorithms were timing-driven. We assume an architecture where memories are arranged in columns; the ratio of memory columns to logic columns was fixed at 1:6, which is similar to the Altera Stratix device used in the Board Measurement flow. It was further assumed that each logic cluster contains ten four-input lookup tables, as in the Altera Stratix device.

A power estimate of the resulting implementation was obtained using the power model described in Section 3.2.3. Estimating the memory cycle power and the leakage power with this power model requires technology information; we obtained this information by creating memories of different sizes using the Virage Logic Memory Compiler [62], and using the average performance read cycle, write cycle, and leakage power values. The memory compiler also provided timing information which was used by the timing-driven place and route algorithms within VPR. A 0.18um TSMC CMOS process was assumed throughout.

Finally, in order to reduce the placement noise injected by the random nature of the simulated annealing placement algorithm, placement and routing was performed five times for each case using a different placement seed value. The power estimates reported in the following sections are arithmetically averaged over the five iterations.

4.1.2 Board Measurement Flow

In order to validate the trends of the VPR-based flow, we implement some of these circuits on an Altera Nios Development Kit (Stratix Professional Edition), which contains a 0.13um Stratix EP1S40F780C5 device. For each implementation, we measured the current entering the board, subtracted the quiescent current when the chip is idle, and multiplied the result by the voltage to get an estimate of the power dissipation.

[Figure: test harness consisting of an LFSR driving the circuit under test, whose outputs feed a registered XOR gate; the only external connections are the clock input and the registered output.]

Figure 4-2. Test Harness for Board Measurements

For these experiments, we created a test harness for the benchmark circuit, as shown in Figure 4-2. Driving the external input and output pins of the FPGA can consume significant power; we want to minimize this effect so that it does not obscure the trends that we want to measure. The strategy is to reduce the number of I/O connections to the FPGA, and is achieved as follows. The harness consists of a Linear Feedback Shift Register (LFSR) connected to the primary inputs of the benchmark circuit; this allows the circuit to be stimulated by vectors that are generated on-chip. The harness also contains a multi-input exclusive-or gate connected to all of the primary outputs. The harnessed circuit itself has only a single input, the clock, and a single output, the exclusive-or gate output, which is registered to prevent glitching of the output pin. The harnessed circuit was then replicated several times to fill the FPGA.

It is important to note that the VPR-based flow is based on a 0.18um technology process, whereas the Board Measurement flow is based on a 0.13um technology process. The goal of the Board Measurement flow is only to validate the trends found through the VPR-based flow; a direct comparison of the absolute power and energy values between the two flows is less meaningful.

4.2 Experimental Results

This section investigates the impact of the three architectural parameters shown in Table 4-2 on power and energy dissipation. Results are presented in terms of energy per cycle (for combinational circuits, it is assumed that the cycle time is the maximum combinational delay through the circuit).

Parameter            Symbol   Description
Number of Memories   N        The number of memory blocks used to implement logic
Memory Size          B        The total number of bits in each memory block
Flexibility          w_max    The maximum configurable width of the memory block

Table 4-2. Parameters Under Investigation

4.2.1 Energy vs. Number of Memories

We first determine the impact of the number of memory arrays used to implement logic on the power dissipation of a circuit. Intuitively, as the number of memory arrays goes up, more logic can be packed into the memories. Whether this provides an overall reduction in power depends on how much logic can be packed into each array. As described earlier, SMAP is a greedy algorithm, meaning we would expect to pack more logic into the first few arrays than into later arrays. This suggests that the power reduction (increase) will be smaller (larger) as more memory arrays are used.

[Plot: normalized energy per cycle vs. number of memory blocks (0 to 8).]

Figure 4-3. Impact on Energy When Increasing the Number of 512bit Memory Arrays (VPR Flow)

Figure 4-3 shows the results using the VPR flow, averaged across all twenty circuits. The array size was fixed at B=512 bits, and the flexibility was fixed at w_max=16, as in the Altera Stratix device. The number of memory arrays is varied from 0 (all logic implemented in lookup-tables) to 8. Since the agreement with our technology provider prohibits us from publishing the absolute power characteristics of the Virage Memory Compiler output (the memory power), the vertical axis in the graph has been normalized to the case where no memories are used (the results are geometrically averaged before normalization). The bottom line corresponds to the energy dissipated in the logic blocks (the logic energy). The logic energy decreases by 14.5% when eight memory blocks are used. As expected, the logic energy goes down as the number of arrays increases: more memory arrays means more logic can be packed into the memory, and hence fewer LUTs in the final circuit. The second line indicates the sum of the logic power and the power dissipated in the routing and the clock (so the area between the lower two lines represents the routing and clock power). The routing energy decreases by 3.6% when eight memory arrays are used. Again, more memory arrays means fewer LUTs, leading to fewer connections, and hence, lower routing energy. Finally, the top line is the overall power; the difference between the top line and the middle line represents the power dissipated in the memories. As the graph shows, mapping logic to memory arrays does not reduce overall power. In fact, the power increases significantly as more memory arrays are used. This suggests that the extra power dissipated during a read access of the memory is larger than the power dissipated if the corresponding circuitry is implemented using lookup-tables and the programmable interconnect.

The experiment was repeated using a memory with B=4096 bits and w_max=32; the results are shown in Figure 4-4. This figure shows that although the reduction in logic and routing energy (50.0% and 33.8%, respectively, when using eight memories) is more significant than when using 512-bit memories, the energy consumed per memory is also larger. As mentioned earlier, the rate of power increase (the slope of the overall power plot) is steeper when more memories are used. This is because, as shown in Figure 4-5, the number of LUTs that are packed decreases as more memories are used. This effect is more pronounced with the 4-kbit memories. The prominence of this effect depends on the number of choices SMAP has to make good cuts, which in turn depends on the user circuit size and the memory block size.

[Plot: normalized energy per cycle vs. number of 4096-bit memory arrays (0 to 8).]

Figure 4-4. Impact on Energy When Increasing the Number of 4kBit Memory Arrays (VPR Flow)

Figure 4-5. Number of Packed 4LUTs When Increasing the Number of Memories

To verify that energy consumption increases when the number of memories increases, we implemented a number of the circuits on an Altera Nios development kit containing a 0.13um Stratix EP1S40F780C5 device. The Stratix device contains two types of memory arrays: 512-bit blocks and 4-kbit blocks. Figure 4-6 shows the measured results when only 512-bit blocks are used for a representative circuit (misex3). The bottom line represents the power dissipated in the memories and clock network. This was obtained by disabling the LFSR, keeping the inputs constant but toggling the clock, which forces each memory to perform a read access each cycle. The upper line presents the total power dissipated in the FPGA. In both cases, the static power (both the static power of the FPGA and the power dissipated by the board) was subtracted.

In this circuit, there is an 8.2% increase in overall power when seven memories are used; this matches the trend found in the VPR results. The experiment was repeated using only 4-kbit blocks; the results are presented in Figure 4-7 and show a 16.7% increase in overall power when seven memories are used.

[Plot: measured power vs. number of 512-bit memory arrays; upper line: total dynamic power; lower line: memory and clock power.]

Figure 4-6. Impact on Energy When Increasing the Number of 512bit Memory Arrays (Measured Flow)

[Plot: measured power vs. number of 4-kbit memory arrays (0 to 7).]

Figure 4-7. Impact on Energy When Increasing the Number of 4kBit Memory Arrays (Measured Flow)

4.2.2 Energy vs. Memory Array Size

Although the results in the previous section show that the power increases when implementing logic in memories, there are still situations in which we may wish to do this. In particular, significant density improvements are reported in [5]. Therefore, it is important that we measure the impact of the other architectural parameters, in an effort to reduce the power penalty imposed by this technique.

In this section, we investigate the impact of the array size on the power dissipated by the circuits. Intuitively, a larger array means more logic can be packed into each array; however, the word-lines and bit-lines are longer, meaning more energy is dissipated with each memory access.

To investigate this, we fix N=1 and vary B from 256 to 8192 in powers of two. We repeat the experiment for various values of w_max. Clearly, as B increases, the power dissipated in each memory per access increases. This is shown graphically in Figure 4-8. Note that in our power model the memory energy is the same for all circuits when the memory size and number of memories used are fixed. Again, our agreement with our technology provider does not allow us to present absolute numbers, so we normalize all energy values in this section (Figure 4-8 to Figure 4-12) to the overall energy consumed when mapping to a single 256x1 memory. For each memory, we assume that the number of columns in the array is the same as the number of rows. For values of B which do not have an integral square root, it is assumed that the number of rows is twice the number of columns. The power dissipated by the memories produced by the Virage Logic Memory Compiler depends more on the number of columns than on the number of rows. This is because in the memory core only a single word-line toggles upon each access, whereas one of the two bit-lines from each column must toggle. Therefore, increasing the number of rows (and hence the number of word-lines) does not impact the power consumption of the memory as much as increasing the number of columns (and the number of bit-lines). Because of this, the power dissipation of memory blocks in which B does not have an integral square root is not significantly larger than that of the next smaller value of B (which would have an integral square root). This explains the "bumpy" appearance of the line in Figure 4-8.

[Plot: normalized memory energy vs. memory size in bits (0 to 9000).]

Figure 4-8. Impact on Memory Energy When Increasing Memory Array Size
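The row/column organization rule described above can be made concrete with a small sketch. This is our own illustration (the function name and the power-of-two assumption on B are ours): it shows that a non-square array shares its column count with the next smaller square array, which is why its access energy, dominated by the bit-line columns, is only slightly higher.

import math

def array_organization(bits):
    """Rows x columns assumed in this chapter: square when B has an integral
    square root, otherwise twice as many rows as columns (B a power of two)."""
    root = math.isqrt(bits)
    if root * root == bits:
        return root, root                      # square array
    cols = 2 ** (int(math.log2(bits)) // 2)    # non-square: rows = 2 * cols
    return bits // cols, cols

for b in [256, 512, 1024, 2048, 4096, 8192]:
    rows, cols = array_organization(b)
    print(f"B={b:4d}: {rows:3d} rows x {cols:2d} columns")
# B=8192 has the same 64 columns as B=4096, so its per-access energy is similar.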

Now consider the impact on the logic energy. The results are shown in Figure 4-9. As expected, a larger array means that more logic can be packed into the array, which reduces the logic energy. However, for w_max={1} and w_max={1,2}, the trend is relatively flat. This is because, as shown in Figure 4-10, when the flexibility is low, increasing the array size does not lead to any significant increase in the amount of packed logic.

[Plot: normalized logic energy vs. memory size in bits, one curve per value of w_max (including w_max=16, 32, 64).]

Figure 4-9. Impact on Logic Energy When Increasing Memory Size

[Plot: number of packable LUTs vs. memory array size in bits (0 to 10000).]

Figure 4-10. Impact on Amount of Packable LUTs When Increasing Memory Size

In Figure 4-9, there are several cases where increasing the memory size increases the logic energy. This is counterintuitive; it is caused by noise in the placement, which affects the critical path delay and, in turn, the logic leakage energy. We do not believe that this is a trend that is inherent in our results.

Figure 4-11 shows the impact on routing energy when increasing the memory array size. The significant reduction in the number of lookup tables directly translates to a reduction in the number of nets, and hence a reduction in the routing energy. For reasons similar to those discussed earlier, the trends for w_max={1} and w_max={1,2} are also relatively flat.

[Plot: normalized routing energy vs. memory size in bits (0 to 10000).]

Figure 4-11. Impact on Routing Energy When Increasing Memory Size

Figure 4-12 shows the results for the overall energy. Despite the significant reduction in logic and routing energy, the memory power still dominates. Hence, the overall energy increases as the memory size increases. However, an important observation is that non-square memories always perform better than the next smaller size (assuming a reasonable flexibility). This is due to the "bumpy" nature of the memory energy, as discussed earlier, and the ability to pack significantly more logic when using larger memory sizes, as shown in Figure 4-10.

[Plot: normalized overall energy vs. memory size in bits (0 to 10000).]

Figure 4-12. Impact on Overall Energy When Increasing Memory Size

Since we are not able to vary B on the Altera device, we did not perform this experiment on the actual commercial device. However, by comparing the results summarized in Figure 4-6 and Figure 4-7, we found that using a 4-kbit block consumes approximately 8% more power on average than the 512-bit block when N=1; this matches the conclusion drawn from the VPR experiments.

4.2.3 Energy vs. Memory Flexibility

This section investigates the power implications of changing the flexibility of each memory array. As described earlier, FPGA memory arrays typically have a configurable output width; the set of output widths in which an array can be configured is denoted w_eff. In this section, we vary the maximum value in w_eff, which we denote w_max, and measure the impact on energy dissipation. All energy values in this section (Figure 4-13 to Figure 4-15) have been normalized to the overall energy value when mapping to a single 8192x1 memory.

Intuitively, changing w_max will have little effect on the power dissipated in the configurable memory array itself. Changing w_max does not affect the shape of the fixed-size array in the embedded memory block; it only affects the multiplexers in the programmable column decoder. Since these multiplexers are very small, they would cause a negligible increase in memory power as w_max increases. Since the power dissipated in the programmable column decoder is ignored in our power model, it is also ignored in the experiments reported here. Although changing w_max would change the number of sense amplifiers needed in the fixed-size memory component, which would change the memory energy, this effect is also ignored. As the flexibility increases, the amount of logic that can be packed into each array might increase; in that case, we would expect the overall power dissipation to drop as the flexibility is increased. To investigate this, we fix the number of memory arrays at one and vary the maximum output width, w_max, of the array. SMAP is then free to choose a configuration for each memory ranging from B x 1 to (B / w_max) x w_max, where the widths are restricted to powers of two. It is important to remember that although a wider configuration may be allowed, SMAP may not choose to use it.
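As a concrete illustration of this configuration space, the short sketch below (our own, with a hypothetical function name) enumerates the depth x width aspect ratios available to SMAP for a given B and w_max.

def memory_configurations(bits, w_max):
    """Enumerate the depth x width configurations from B x 1 up to
    (B / w_max) x w_max, with widths restricted to powers of two."""
    configs = []
    width = 1
    while width <= w_max:
        configs.append((bits // width, width))   # (depth, width)
        width *= 2
    return configs

print(memory_configurations(4096, 32))
# [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16), (128, 32)]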

First consider the energy dissipated in the logic blocks. Figure 4-13 shows the impact of increasing w_max on the logic energy. For larger arrays, as w_max increases, the logic energy decreases. This is because the amount of logic implemented using lookup tables is reduced. The decrease in logic energy tapers off when w_max becomes large, because SMAP rarely produces solutions that use a configuration where the number of memory outputs is larger than the number of memory address inputs (which is a function of the configured depth). This is because the shape of circuits implemented with LUTs is typically triangular [63]; in other words, the number of LUTs at each combinational delay level decreases from the primary inputs. For small arrays, the number of memory outputs exceeds the number of memory inputs at very small widths. Thus there is little or no change in the logic energy for small arrays when the allowable width is increased.

[Plot: normalized logic energy vs. maximum configurable memory width (0 to 70 bits).]

Figure 4-13. Impact on Logic Energy When Increasing Memory Flexibility

In Figure 4-13, there are several cases where increasing flexibility increases the logic energy. As in the experiment of Section 4.2.2, this is caused by noise in the placement, which affects the critical path delay and, in turn, the logic leakage power.

Figure 4-14 shows the routing energy as w_max is increased. In general, the routing energy follows the same trend as the logic energy, because the reduction in the number of LUTs directly translates into a reduction in the number of nets that need to be routed. However, there is a competing factor that may not be immediately obvious. Increasing the width of the memory increases the number of pins on the memory block that need to be accessible from the FPGA routing fabric. This means that additional programmable connections need to be added in the connection blocks that connect the memory to the routing fabric, which in turn adds parasitic capacitance to the routing tracks that surround the memory. Therefore, even if the additional width is not used by SMAP, the routing energy may increase if the routing tracks surrounding the memory are used. This explains the convex shape of the curves in Figure 4-14.

Figure 4-15 shows the impact on the total energy per cycle. Since the memory energy is constant, the overall impact corresponds to the change in logic and routing energy. For large arrays, this results in a significant initial decrease in energy. For large values of w_max, this decrease tapers off and begins increasing slightly again due to the routing energy. For smaller arrays, there is a slight increase in energy, caused by a negligible decrease in logic energy combined with an increase in routing energy. Again, since we are not able to vary w_max on our Altera part, we did not perform this experiment on the commercial device.

[Plot: normalized routing energy vs. maximum configurable memory width (0 to 70 bits), one curve per array size B = 256, 512, 1024, 2048, 4096, 8192.]

Figure 4-14. Impact on Routing Energy When Increasing Memory Flexibility

[Plot: normalized overall energy vs. maximum configurable memory width (0 to 70 bits), one curve per array size, including B = 2048, 4096, 8192.]

Figure 4-15. Impact on Overall Energy When Increasing Memory Flexibility

4.3 Sensitivity of Results

We have identified three aspects of this study to which our conclusions are sensitive.

First, the results of the SMAP algorithm have been shown to be very sensitive to the technology mapped circuit that it is given [64]. For the most accurate results, the CAD tools should closely match the ones used in the final production software. In this case, we were able to use Altera's commercial technology-mapper QIS.

The same study also showed that the use of different memory-to-logic mappers can significantly affect architectural conclusions. In our study we used SMAP, which outperforms all other memory-to-logic mappers in terms of density and packable LUTs.

Since total circuit energy is directly related to the amount of logic, we feel that using SMAP is the most appropriate choice.

Most importantly, the values used to describe the memory architecture power characteristics to VPR can drastically affect the conclusions. To address this, we used real values from memory architectures provided by TSMC, and used memory implementations provided by a commercial memory compiler from Virage.

4.4 Summary

This chapter has shown that implementing logic in FPGA embedded memory arrays leads to an increase in power dissipation of the resulting circuit. This is an important result.

Previous studies have reported significant density increases when embedded memory is used in this way, and suggested that there is no reason not to do this. The results of this study show that, if power is a concern, this may be a bad idea. If designers (or CAD tools) wish to implement logic in memory arrays, it is important to carefully trade off the power penalties against the potential increase in density. Even if a memory array is not required for storage, these results suggest that, from a power perspective, it is better to leave the array unused rather than use it to implement logic.

That being said, there are times when the density improvement may motivate the mapping of logic to embedded memory blocks. In that case, optimizing the size and flexibility of the memory blocks to reduce this power penalty is important. We have shown that for most array sizes, the arrays should be as flexible as possible, although consideration should be given to the routing overhead when many memory I/Os are involved. We have also shown that smaller memory arrays are more power efficient than large arrays. However, when larger arrays are desired to increase density, non-square memories (sizes without an integral square root), which have a larger depth-to-width ratio, should be used.

5 POWER AWARE METHODS FOR MAPPING LOGIC TO MEMORIES

The previous chapter investigated the power implications of mapping logic to memories using an area-driven algorithm called SMAP and showed that this technique results in a severe power penalty. This chapter investigates two possible modifications to SMAP to make it more power aware. The first modification changes the SMAP cost function from being area-aware to activity-aware. The second approach uses larger logical memory arrays and a power efficient logical-to-physical memory-mapping technique. The following sections will describe both techniques and compare them to the experimental results from Chapter 4.

5.1 Activity Aware Cost Function

This section begins by discussing activity-aware technology mapping techniques for homogeneous FPGAs. We apply these techniques, along with some new ones, to SMAP and present our new activity-aware cost function. The performance of the activity-aware algorithm is compared against the results of the experiment performed in Section 4.2.1.

5.1.1 Power Aware Homogeneous Technology Mapping

Existing power-aware technology mapping techniques for LUTs seek to minimize power in two ways. The first method aims to minimize the routing power by choosing mappings that encapsulate as many high activity nets as possible [29, 30]. Since the tracks in an FPGA have large parasitic capacitance, reducing the number of nets that need to use the interconnect, especially the ones that switch frequently, can reduce the overall circuit power consumption. The second method aims to minimize node replication. Node replication occurs when a LUT is used to implement a cone of logic that contains a node with a fanout outside of the cone. Node replication is required to produce depth-optimal solutions [28], but has been shown to be undesirable when power is a concern [29]. After replication, the signals that drove the original node must drive both the original node and its replica, which means that more segments need to be routed through the FPGA interconnect.

5.1.2 Activity-Aware SMAP

The overall strategy for our power-aware algorithm is to change the way that SMAP ranks nodes when determining which nodes to select as memory outputs. As described in Section 2.3.3, SMAP counts the number of nodes in the potential output's MFFC (the nodes that can be packed) and uses this count to rank the desirability of selecting it as an output. In our power-aware version, we use the following cost function for ranking the desirability of selecting a potential node n as a memory output.

$$Cost(n) = k_1 \cdot ECost(n) - k_2 \cdot RCost(n) + k_3 \cdot FCost(n) + k_4 \cdot GCost(n) \quad (5.1)$$

The cost function has four components, each of which focuses on a different way to reduce the number of high activity connections. The Edge Cost (ECost) is a modified version of the equation from [29], and is shown in Equation 5.2. The purpose of the Edge Cost is to favour mappings that encapsulate high activity edges inside the memory.

$$ECost(n) = \sum_{v \in MFFC(n)} activity(v) \cdot \left| fanout(v) \cap MFFC(n) \right| \quad (5.2)$$

This equation is a summation over all nodes in the MFFC of n. Any fanout of a node in the MFFC that is also in the MFFC is encapsulated inside the memory. We weight these edges with their switching activity and add them to the ECost value.

The second term, the Replication Cost (RCost), penalizes the duplication of high activity nets due to node replication. Since we do not know which nodes (if any) will be replicated until after all the outputs are actually selected, we use a heuristic which assumes that n is the only output of the memory, and that the nodes in MFFC(n) are therefore the only nodes that are packed. This implies that all nodes in a cone rooted at a potential node that drives a node in the MFFC of n will need to be replicated. This is illustrated in Figure 5-1.

[Figure: a cut with inputs I1-I6; the left half shows the cut, and the right half shows two possible single-output solutions (n2 chosen, causing no replication, and n1 chosen, forcing the cone rooted at n2 to be replicated).]

Figure 5-1. Node Replication in SMAP

Given the cut shown in the left half of Figure 5-1, nodes {a, b, c, n1, n2} are all potential outputs. The right half of Figure 5-1 shows the two potential solutions. In the top solution, n2 is chosen as the only output, while in the lower solution, n1 is chosen as the only output. When n2 is chosen as the only output, no replication occurs. When n1 is chosen, the nodes in the cone rooted at n2, which include {a, b, n2}, are replicated because n2 drives n1. Due to this replication, the fanouts of I3, I4, I5, and I6 each increase by one. RCost sums the switching activities of these new fanouts and is expressed in Equation 5.3, where RNodes is the set of replicated nodes and s is the seed node.

$$RCost(n) = \sum_{\substack{v \in RNodes(n,s) \\ u \in fanin(v) \\ u \in cutset(s) \\ u \notin fanin(MFFC(n))}} activity(u) \quad (5.3)$$

The third term in the cost function is the Fanin Cost (FCost). This term favours mappings that reduce the fanout of the cut-set and is expressed by Equation 5.4. It only applies when mapping to memories with widths greater than one. The motivation behind this term is illustrated in Figure 5-2, which shows three possible mapping solutions for a 4-input, 2-output memory.

$$FCost(n) = \begin{cases} \displaystyle\sum_{\substack{v \in MFFC(n) \\ u \in fanin(v) \\ u \in cutset(s)}} activity(u) & \text{if } w > 1 \\ 0 & \text{otherwise} \end{cases} \quad (5.4)$$

[Figure: three possible mapping solutions for a 4-input, 2-output memory. The accompanying table lists the fanout of each cut-set node I1-I4 and the total change: original 2, 3, 2, 2 (change 0); solution 1: 1, 2, 2, 2 (change -2); solution 2: 2, 2, 1, 1 (change -3); solution 3: 2, 2, 2, 2 (change -1).]

Figure 5-2. Reducing Cut-Set Fanout

The table in Figure 5-2 summarizes the number of fanouts for each node in the cut-set. In each mapping solution, the number of fanouts from the cut-set is reduced by a different amount. The goal of the FCost term is to favour mappings that reduce the cut-set fanouts while also taking into account their switching activities. Once again, we do not know which nodes will be chosen as outputs when the cost function is evaluated, and hence do not know which cut-set nodes will actually receive a reduction in their number of fanouts. Therefore, we again use a heuristic which assumes that each cut-set node that fans into the MFFC of the potential output node being evaluated will be shared by one of the other memory outputs' MFFCs.

The final term is the Glitch Cost (GCost), calculated using Equation 5.5. The purpose of this term is to favour memory outputs that have high predicted glitching. Since we are using synchronous memories, there is no glitching at the memory outputs. Choosing a node with high glitching as a memory output therefore reduces not only the switching activity of the net driven by the node, but potentially also the switching activity of downstream combinational nodes.

$$GCost(n) = activity(n) - switch\_prob(n) \quad (5.5)$$

The cost function in Equation 5.1 is a summation of four terms. In general, adding terms within a cost function is not advisable: if the magnitudes of the terms are different, or the units are different, changes in one term can easily overpower changes in another. Because of this, researchers often multiply (rather than add) individual terms, or normalize each term so that the terms can be added [15]. In Equation 5.1, each term is a measure of activity, and thus is of the same magnitude and has the same units; the addition of terms is therefore acceptable in this instance. Equation 5.1 contains four constants which can be used to weight some terms more than others; however, in the results in this chapter, k_1 = k_2 = k_3 = k_4 = 1. We have not investigated other values of k_1 to k_4.
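The following sketch shows how the four terms might be combined in code. It is our own paraphrase of Equations 5.1 through 5.5, not SMAP source code: the net object and its helpers (mffc, fanout, fanin, cutset, rnodes, activity, switch_prob) are hypothetical stand-ins for SMAP's internal data structures.

def cost(net, n, seed, width=1, k1=1.0, k2=1.0, k3=1.0, k4=1.0):
    """Equation 5.1: activity-aware desirability of selecting node n
    as a memory output, given the cut rooted at the seed node."""
    mffc = net.mffc(n)                       # set of nodes packable with n

    # Eq. 5.2: reward edges fully encapsulated inside the MFFC
    ecost = sum(net.activity(v) * len(net.fanout(v) & mffc) for v in mffc)

    # Eq. 5.3: penalize cut-set edges duplicated by node replication
    mffc_fanin = {u for v in mffc for u in net.fanin(v)}
    rcost = sum(net.activity(u)
                for v in net.rnodes(n, seed)
                for u in net.fanin(v)
                if u in net.cutset(seed) and u not in mffc_fanin)

    # Eq. 5.4: reward reduced cut-set fanout (multi-output memories only)
    fcost = 0.0
    if width > 1:
        fcost = sum(net.activity(u)
                    for v in mffc
                    for u in net.fanin(v) if u in net.cutset(seed))

    # Eq. 5.5: reward absorbing predicted glitching at a synchronous output
    gcost = net.activity(n) - net.switch_prob(n)

    return k1 * ecost - k2 * rcost + k3 * fcost + k4 * gcost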

5.1.3 Experimental Methodology

To evaluate the power dissipation of the circuits produced by our power-aware SMAP algorithm, we used the VPR flow described in Section 4.1.1. We repeated the experiment from Section 4.2.1, but replaced SMAP with our activity-aware version. In the discussion of the results, we will refer to the results from Section 4.2.1 as the baseline.

5.1.4 Experimental Results

Intuitively, the new cost function will reduce the average activity of all nets in the circuit. However, since the mapping algorithm employing our new cost function does not explicitly optimize for the number of nodes packed into each memory, we would expect the number of nodes packed to be slightly less than in the original SMAP algorithm.

Figure 5-3 shows the impact that the activity-aware SMAP has on the number of packed LUTs when using memories with B=512 and with B=4096. The number of packed LUTs is reduced on average by 12.58% and 16.06% when using eight 512-bit and eight 4096-bit memories, respectively.

[Plot: number of packed LUTs vs. number of memory blocks, for B = 512 and B = 4096.]

Figure 5-3. Number of Packed LUTs Using the Activity Aware Cost Function

Table 5-1 summarizes the percentage change in routing energy when using our activity-aware version of SMAP with between one and eight 512-bit memories. As the table shows, the routing energy decreases in some cases and increases in others. This is due to two factors. First, the actual changes that we expect as a result of our new cost function are small. Unlike techniques for power-aware homogeneous technology mapping [30], in which every node is impacted by the change in cost function, here only a small fraction of all nodes in the circuit are actually mapped to memory, and hence only a small amount of the routing power is affected. The second factor is that the VPR place and route solutions are sensitive to even slight perturbations in the netlist (this is far more significant than perturbations in the placement seed). Thus, as the circuit netlist is changed, the "random" noise from the VPR place and route results overwhelms any changes caused by the cost function. The net result is that our activity-aware cost function is not effective in decreasing the overall routing energy of the circuits.

Circuit     N=1     N=2     N=3     N=4     N=5     N=6     N=7     N=8
alu4       -2.00   -3.32   -0.65   -0.26   -1.36    4.11   -0.89   -2.32
apex2      -4.27    2.16    3.58   -3.46   -1.24   -3.32    0.55    2.82
apex4      -1.82   -0.14   -1.64   -2.57    0.66   -3.19   -0.89   -2.45
bigkey     -6.86   -5.49   -1.50   -3.28    1.79   -3.74    8.02    9.55
clma       -2.97    4.63    3.24    3.91    2.11   -0.68   -0.62   -4.26
des         0.36   -0.97    0.25    1.20    7.23    5.47    4.41    3.88
diffeq      3.79   -1.11   -2.45   -1.15   -3.12   -7.57  -11.32   -0.89
dsip       -2.87    7.28   -0.10   -6.33   -0.67    8.01    2.40    3.28
elliptic    0.74   13.73   -1.19   -2.23   -2.83   -6.24   -2.28   -4.19
ex5p       -1.06    4.28   11.24    5.54    2.89   15.46    1.03    2.36
ex1010     -0.84   -2.18    0.06   -1.20    0.08    0.81   -0.52   -0.45
frisc      -1.05   -2.36   -1.93   -0.75    3.98   -0.27   -0.27   -0.28
misex3     -3.68   -0.02    5.06    2.72    6.40    2.57    2.55    2.67
pdc        -1.00    3.06    1.56    8.09    5.47    3.07    3.24    4.53
s298        1.34   -2.86   -5.53    5.50    2.53   10.23   17.56   15.45
S38417      2.68   -5.07   -4.55    2.95   -2.75    0.37    0.96   -0.77
S38584     -1.86   -5.90   -3.88   -1.17   -3.72   -1.84   -1.35   -1.25
seq        -1.11    3.59    0.83    6.00    3.62    3.05    3.33    2.26
spla       -2.45    1.56    2.94    9.11   30.49   36.81   37.38   47.62
tseng      -0.77   -0.08    5.78    2.00    2.27    2.35    4.85    3.27
avg        -1.29    0.54    0.56    1.23    2.69    3.27    3.41    4.04

Table 5-1. Percentage Change in Routing Energy When Using the Activity Aware Cost Function and Memories with B = 512 bits

For completeness, Table 5-2 and Table 5-3 show the change in logic energy and the overall energy caused by the change in cost function. Intuitively, we would expect the logic energy to rise slightly, but again, the results are inconclusive due to noise in the place and route data. In this case, small changes in the place and route solution impact the critical path of the circuit, which affects the leakage energy dissipated by the circuit. The overall results are shown in Table 5-3. From the results in this table, we can confirm that the new cost function is not effective in decreasing the overall energy dissipated in these circuits. The experiment was repeated assuming B=4096, and the conclusion was the same.

Circuit     N=1     N=2     N=3     N=4     N=5     N=6     N=7     N=8
alu4       -2.12   -1.75   -2.44   -0.53   -0.84    1.75   -4.33   -3.16
apex2       0.80    3.20   -0.72   -0.77    1.83   -2.68    0.80    2.69
apex4       0.50   -5.04   -0.56   -2.22   -2.48   -3.85  -16.04  -19.94
bigkey     -3.67   -5.90   -1.90   15.52  -17.21  -16.32  -16.48   12.07
clma       -2.51    0.49   -2.23    2.35   -3.09   -0.93   -3.10   -7.58
des        -0.76    0.85    0.67   -0.14    1.12   -1.51    0.82    2.59
diffeq     -2.41  -12.81   -0.39    1.54    9.01   -4.61   -7.84   -2.42
dsip       -2.01    0.01    4.47   -6.06   -0.71   -4.67    1.51   -0.75
elliptic    2.37   14.28    0.22    2.27    0.32   -6.09   -5.69   -0.87
ex5p       -1.72   -7.30   -5.54   -1.30  -10.67    7.33  -11.95  -12.93
ex1010      0.20    4.32    0.13    0.75   -8.58    2.61    0.57   -3.12
frisc      -0.52    3.32    1.74   -4.37   -8.04   -4.24   -7.54  -22.09
misex3     -1.91   -1.53    1.48   -0.87   -2.33   -4.82   -4.63    0.00
pdc         0.00    2.06   -2.23    9.70    3.33   -1.07   -4.00   -0.59
s298       -1.13   -5.06   -5.29   -1.68    3.57   30.12   35.93   12.87
S38417      2.39   -5.83   -2.92   -0.25   -2.92   -3.37    2.78   -2.34
S38584      2.84   -0.99    0.59    0.19   -2.21    0.31    1.73    0.96
seq        -3.52   -0.89    1.94   -1.76  -10.16   -3.64    2.68   -0.54
spla       -0.94   -0.43   -0.27    1.80   20.83   18.86   19.56   30.05
tseng      -0.50   -6.17    2.10   -3.34    3.75    4.70    4.05   -3.19
avg        -0.73   -1.26   -0.56    0.54   -1.27    0.39   -0.56   -0.91

Table 5-2. Percentage Change in Logic Energy When Using the Activity Aware Cost Function and Memories with B = 512

Circuit     N=1     N=2     N=3     N=4     N=5     N=6     N=7     N=8
alu4       -2.29   -2.67   -1.60   -0.83   -1.34    1.31   -1.92   -2.08
apex2      -2.52    4.94    1.02   -2.34   -0.67   -2.54   -0.22    0.96
apex4      -1.00   -2.97   -1.36   -2.05   -1.30   -2.26   -6.18   -6.47
bigkey     -5.12   -5.47   -2.02    4.86   -7.75   -8.61   -5.45    6.29
clma       -2.83    2.05    0.46    2.13   -0.38   -1.04   -1.60   -4.35
des        -0.71   -0.61   -3.49   -0.12    2.75   -2.12    1.39    1.71
diffeq     -0.55   -5.82   -1.41   -0.58    0.71   -7.22   -7.93   -4.98
dsip       -2.73    1.85    1.35   -5.44   -1.19   -0.42    0.53   -0.13
elliptic    0.66   10.39   -1.01   -0.81   -1.63   -4.83   -3.06   -2.37
ex5p       -1.57   -1.84   -0.39   -0.42   -1.90    1.10   -1.89   -1.80
ex1010     -0.86    0.76   -0.72   -0.76   -3.96    0.07   -0.78   -1.65
frisc      -1.18   -0.68   -0.91   -1.94   -1.09   -0.93   -0.88   -0.84
misex3     -3.00   -1.02    2.03    0.28    1.15   -0.84   -0.80    0.19
pdc        -1.00    1.80   -0.32    6.23    2.99    0.65   -0.09    1.28
s298       -0.86   -3.39   -3.60   -0.16    0.46    5.40    6.35    2.71
S38417      1.47   -5.09   -3.52    0.27   -2.79   -1.89    0.71   -1.81
S38584     -0.07   -3.51   -1.94   -0.98   -3.02   -1.20   -0.43   -0.71
seq        -2.36    6.28    0.49    1.28   -2.42   -0.50    1.29    0.01
spla       -2.09    0.21    0.94    8.62   23.32   20.20   19.60   24.90
tseng      -1.12   -2.80    1.15   -1.21    0.44    0.49    0.62   -0.92
avg        -1.48   -0.38   -0.74    0.30    0.12   -0.26   -0.04    0.50

Table 5-3. Percentage Change in Overall Energy When Using the Activity Aware Cost Function and Memories with B = 512

5.1.5 Summary for Activity-Aware Cost Function

In this section, we described our new activity-aware SMAP cost function. The goal of the cost function is to minimize the number of high activity connections in the mapped solution. We compared the results from the activity-aware SMAP against those from Section 4.2.1. In general, we found that the improvements (if any) were smaller than the noise produced by the place and route tool, and therefore could not be measured reliably. We conclude that this cost function is not effective at reducing the average power dissipated by the circuits.

As shown in the experiments from Chapter 4, the memory energy is the dominant component of the overall power dissipation. This means that in order to achieve significant energy savings when mapping logic to memories, the power dissipated by the memories needs to be reduced. In the next section, we describe a technique that targets the memory energy directly.

5.2 Power Efficient Super-Arrays

In this section, we describe a different approach to improving the energy efficiency of logic implemented in memory arrays. The key idea in this technique is to combine two or more physical memory arrays into larger logical memories (also called super-arrays in [5]), and use SMAP to map logic to these larger logical memories. By doing this in a power-efficient way, we can achieve significant energy savings compared to the original SMAP algorithm.

The idea of combining physical memories into larger logical memories was first presented in [5] as an attempt to reduce the run-time of SMAP. The original SMAP algorithm maps to the memory arrays sequentially, which can lead to long run-times if there are a large number of memory arrays. By combining physical memories to create fewer, larger arrays, fewer iterations of SMAP are required, leading to significantly improved run-times. An example of this is shown in Figure 5-4a. In this example, two physical arrays with B=512 and w_eff = {1,2,4,8,16} are combined to implement a single logical memory with B=1024 and w_eff = {2,4,8,16,32}; each physical array supplies one half of the bits in each word. This larger logical array is then treated as a single entity in SMAP, meaning only one iteration of the SMAP algorithm is required.

Figure 5-4. Forming Logical Memories: a) Area Efficient, b) Power Efficient

Figure 5-4b shows another way in which the two memory arrays can be combined to create a single larger logical memory. In this case, the resulting logical memory has B=1024 and w_eff = {1,2,4,8,16}. In this organization, all bits of each word are stored in the same physical array. Nine of the address lines are provided to each physical array, and the tenth address line is used to select the output bits from one of the two arrays. This latter organization has the potential for lower power, since the array that is not currently being accessed can be turned off (using the memory enable signal). This is the key to the enhancement described in this section: we combine memory arrays into larger logical arrays such that all but one of the arrays can be turned off during each access. Note that this is similar to the technique described in [65]; however, that work did not evaluate the idea in the context of heterogeneous technology mapping.

In general, more than two arrays can be combined into a larger logical memory. In [5], the number of physical memories used to form a logical memory is termed the Blocking Factor, BF. In the example of Figure 5-4b, BF=2.
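The behaviour of such a power-efficient logical memory can be sketched in a few lines (our own illustration; real hardware would decode the high-order address bits into per-array memory enable signals rather than index a Python list):

def super_array_read(arrays, address):
    """Read one word from a power-efficient logical memory built from
    BF physical arrays of equal depth. The low-order address bits index
    a row within an array; the high-order bits enable exactly one array,
    so the remaining BF-1 arrays see a deasserted ME and stay idle."""
    depth = len(arrays[0])
    select = address // depth          # high-order bits: which array to enable
    offset = address % depth           # low-order bits: row within that array
    return arrays[select][offset]      # only this array performs a read access

# Two 512-deep physical arrays form a 1024-deep logical memory (BF = 2).
mem = [[("array", b, "row", i) for i in range(512)] for b in range(2)]
assert super_array_read(mem, 700) == ("array", 1, "row", 188)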

Although this technique will reduce the memory power, it has two potential drawbacks:

1. Extra LUTs are needed to implement the ME control and output multiplexers. Table 5-4 shows the number of extra logic elements required for several values of BF. For example, using BF=4 requires four LUTs for the memory enable control logic (one for each memory), and two LUTs for each output of the logical memory to perform 4:1 multiplexing. These extra logic elements consume power, and also reduce the overall packing efficiency of the technique.

2. As shown in [5], increasing BF tends to reduce the amount of logic that can be packed into a set of physical memory arrays. Again, this will tend to increase the power dissipation and reduce the packing efficiency of our technique.

Blocking Factor   ME Control LUTs   LUTs per Output MUX
      2                  1                   1
      4                  4                   2
      8                  8                   5

Table 5-4. Number of LUTs Needed For Power Efficient Logical Memories
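Using Table 5-4, the total LUT overhead for one logical memory follows directly; the small sketch below (our own, assuming 4-input LUTs as elsewhere in this thesis) computes it for a given blocking factor and output width.

# LUT overhead per Table 5-4, assuming 4-input LUTs
ME_CONTROL_LUTS = {2: 1, 4: 4, 8: 8}   # memory enable decode logic
MUX_LUTS = {2: 1, 4: 2, 8: 5}          # one BF:1 multiplexer per logical output

def support_luts(bf, num_outputs):
    """Extra LUTs needed to build one logical memory from BF physical arrays."""
    return ME_CONTROL_LUTS[bf] + num_outputs * MUX_LUTS[bf]

print(support_luts(4, 16))   # BF=4 with 16 outputs: 4 + 16*2 = 36 LUTs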

Thus, in this section, we determine whether the proposed technique actually reduces the power dissipation of implementations generated using SMAP, or whether the extra logic and reduced packing efficiency result in an overall power increase. In addition, we monitor the anticipated decrease in packing efficiency.

5.2.1 Experimental Methodology

Figure 5-5 shows our experimental methodology. Each benchmark circuit is mapped in two ways. In the first way, we use the original SMAP algorithm with BF=1, meaning that we do not combine physical memories in any way. This is the same method used in the experiments in Section 4.2.1. The second way is our proposed method and consists of two steps. In the first step, we use SMAP with a value of BF>1 (that is, N physical memories are grouped into N/BF logical memories). This version of SMAP is also aware of the LUTs that need to be introduced for output multiplexing. Because of this awareness, SMAP will only choose wider aspect ratios when the number of packed LUTs can overcome the overhead required for the output multiplexers. In the second step of this method, the support logic for combining the physical memories is added into the netlist.

Both implementations are then fed through VPR and the power model described in Chapter 3, and the power estimates are compared. In addition, we compare the number of logic elements packed using each algorithm.

[Figure 5-5 is a flow diagram: QIS-mapped benchmark circuits feed two flows. The power-aware flow runs the new SMAP (BF>1, output-MUX aware) and then adds the memory support logic to the netlist; the baseline flow runs the original SMAP (BF=1). Each netlist is placed and routed with the enhanced VPR and evaluated with the power model to produce power estimates.]

Figure 5-5. Methodology for Power-Efficient Super-Arrays

Table 5-5 summarizes the values of N and BF that we explored. The top half of the table shows the experiments when using 512-bit physical memories, and the bottom half of the table shows the experiments when using 4096-bit physical memories.

Entries give the number of logical memories formed when the FPGA contains N physical arrays.

B=512-bit physical memories:
Experiment   Logical Size   N=2   N=4   N=6   N=8
Baseline          512        2     4     6     8
BF=2             1024        1     2     3     4
BF=4             2048        -     1     -     2
BF=8             4096        -     -     -     1

B=4096-bit physical memories:
Experiment   Logical Size   N=2   N=4   N=6   N=8
Baseline         4096        2     4     6     8
BF=2             8192        1     2     3     4
BF=4            16384        -     1     -     2
BF=8            32768        -     -     -     1

Table 5-5. Summary of Experiments (top: B=512 bits; bottom: B=4096 bits)

5.2.2 Experimental Results

As discussed earlier, the technique proposed in this section will reduce the number of LUTs that can be removed (which we call the packing efficiency), but it will also reduce the overall power consumption of the circuit. When discussing the experimental results, we will look at these two impacts separately.

5.2.2.1 Packing Efficiency

We first consider the packing efficiency of our new mapping technique. As previously explained, we would expect a decrease in the amount of logic that can be mapped to each memory array. The number of LUTs that can be packed into the memory arrays for each benchmark circuit is shown in Table 5-6 (for B=512) and Table 5-7 (for B=4096). The columns labeled BF=1 correspond to the original SMAP algorithm. The columns labeled BF=2, BF=4, and BF=8 correspond to the power-aware technique described in this section. For BF>1, the number of LUTs required to implement the memory enable control and output multiplexers has been subtracted from the number of packed LUTs; if the result is negative, a "-" is shown in the table (this means that our version of SMAP was not able to find a solution that reduced the number of LUTs).

            N=2           N=4                 N=6           N=8
Circuit    BF=1  BF=2    BF=1  BF=2  BF=4    BF=1  BF=2    BF=1  BF=2  BF=4  BF=8
alu4         34    42      68    50    39     100    56     132    62    57    96
apex2        32     2      63     3     -      85     4     103     5     -     -
apex4       106   103     212   206   198     318   309     421   404   386   354
bigkey       15     3      21     5     -      26     7      28     9     -     -
clma         34     6      68    13     4     101    18     133    23    15     9
des          18     3      34     5     -      50     7      66     9     -     -
diffeq       15     3      23     6     -      31     9      39    12     -     -
dsip         19     8      23     9     -      26    10      28    12     -     -
elliptic     13     1      23     2     -      32     3      40     4     -     -
ex5p         46    37      79    61    62     104    80     125    94    94    51
ex1010       95    93     187   182   159     277   271     365   358   312   304
frisc        15     2      27     3     -      37     4      45     6     -     -
misex3       34     9      68    17    10     101    24     133    30    13     8
pdc          63    54     108    90    61     147   118     181   144    89    16
s298        106   104     212   207   181     316   310     358   331   294   234
s38417       34     5      68    10     8      89    14     107    18    15     -
s38584       39    12      77    24     9     114    36     150    46    21     -
seq          33    13      65    17    11      97    21     129    24    10     -
spla         60    51     100    82    55     135   107     168   127    67     7
tseng        16     2      27     3     -      37     3      47     3     3     -

Table 5-6. LUTs Removed After Mapping (B=512)

            N=2           N=4                 N=6           N=8
Circuit    BF=1  BF=2    BF=1  BF=2  BF=4    BF=1  BF=2    BF=1  BF=2  BF=4  BF=8
alu4        163   152     239   209   263     311   261     379   288   552   545
apex2        64     5     110    10     -     143    13     165    16     -     -
apex4       780   763     843   820   786     845   824     847   828     -   692
bigkey       25     7      31     9     -      33    11      35    13     -     -
clma         72    18     140    35    22     207    50     272    64    33    36
des          36    22      69    56    44     101    85     133   107    90    62
diffeq       19     3      35     6     -      47     9      56    12     -     -
dsip         26     8      30    10     -      32    12      34    14     -     -
elliptic     15     1      29     2     -      41     3      53     4     -     -
ex5p        159   122     175   134    82     177   136     179   138     -     -
ex1010      714   705     877   864   826     879   869     880   871     -   878
frisc        22     3      40     5     -      53     7      62     8     -     -
misex3       76    29     143    54   187     209    80     273    96   242   234
pdc         297   141     504   231    82     652   308     764   373   120    84
s298        477   437     697   657   640     735   677     737   686   651   610
s38417       75    15     104    29    12     132    38     156    47    18     -
s38584       52    20     102    38    17     152    53     198    68    25    21
seq          80    21     145    29    15     193    34     229    38    31     -
spla        243   124     369   177    62     442   211     508   239   116    78
tseng        23     2      37     4     -      48     5      58     6     -     -

Table 5-7. LUTs Removed After Mapping (B=4096)

As the tables show, the number of LUTs packed into the memory arrays decreases as BF is increased. For BF=4 or BF=8, there are many circuits for which our new version of SMAP could not find a mapping solution that could overcome the overhead of the memory support logic. Thus, in the remainder of this section, we do not consider BF>2.

Although the packing efficiency is worse for BF=2 than for BF=1 for all circuits, the impact on packing efficiency is less severe for some circuits than for others. Figure 5-6 and Figure 5-7 show the results graphically. The vertical axis in these graphs is:

(LUTs removed when BF>1) / (LUTs removed when BF=1) x 100%

Therefore, 100% means that the packing efficiency of our new technique is just as good as that of the original SMAP. As the graphs show, for some circuits, the impact on packing efficiency is small.
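As a worked example of this metric, consider apex4 in Table 5-6 with N=8: 404 LUTs are removed with BF=2 versus 421 with BF=1, so the power-aware mapping retains about 96% of the original packing efficiency. A minimal sketch (the function name is ours):

    def packing_efficiency(luts_removed_bf, luts_removed_base):
        """Percent of the BF=1 LUT savings retained when BF>1 (Figures 5-6, 5-7)."""
        return 100.0 * luts_removed_bf / luts_removed_base

    print(round(packing_efficiency(404, 421), 1))   # apex4, B=512, N=8 -> 96.0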

Figure 5-6. Distribution of How the Number of LUTs That Can Be Removed Is Affected for 512-Bit Memories When BF=2

Figure 5-7. Distribution of How the Number of LUTs That Can Be Removed Is Affected for 4096-Bit Memories When BF=2

5.2.2.2 Power Efficiency

Figure 5-8 and Figure 5-9 show the impact on energy averaged across all twenty benchmarks for B=512 and B=4096 respectively. The horizontal axis is the number of physical memory arrays, and the vertical axis is the overall energy, normalized to the case when no memories are used. The upper line in each graph corresponds to the original SMAP, while the lower line corresponds to the power-aware version described in this section, with BF=2. As the graphs show, the enhancements described in this section reduce the energy required to implement the benchmark circuits by an average of 19.79% and 32.93% for eight 512-bit and 4096-bit memories respectively, when compared to the original SMAP algorithm.

[Figure: normalized overall energy vs. number of memory blocks (0 to 9); vertical axis range approximately 0.75 to 1.75.]

Figure 5-8. Impact on Energy When Increasing the Number of 512-Bit Memories

[Figure: normalized overall energy vs. number of memory blocks (0 to 9); vertical axis range approximately 0.50 to 3.50.]

Figure 5-9. Impact on Energy When Increasing the Number of 4096-Bit Memories

To further understand these results, Table 5-8 breaks the overall energy improvement into logic energy (which is increased, since more LUTs are required to implement each circuit), routing energy (which is increased for the same reason), and memory energy (which is reduced by 50%, since BF=2 means one of the two physical arrays in each logical memory can be turned off on each cycle). The numbers in the table are the average percent change when using the power-aware technique described in this section, compared to the original SMAP algorithm (in other words, the results from Section 4.2.1). These values are calculated by finding the percent change for each circuit, and then taking the average.
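Note that this is an average of per-circuit ratios, not a ratio of averages; the sketch below makes the distinction explicit. The energy values are placeholders chosen for illustration, not thesis data.

    def average_percent_change(baseline, power_aware):
        """Mean per-circuit percent change of the power-aware energy vs. baseline."""
        changes = [100.0 * (pa - base) / base
                   for base, pa in zip(baseline, power_aware)]
        return sum(changes) / len(changes)

    # Hypothetical per-circuit overall energies (arbitrary units), BF=1 vs. BF=2:
    baseline    = [10.0, 8.0, 12.5]
    power_aware = [ 8.5, 7.0, 10.0]
    print(round(average_percent_change(baseline, power_aware), 2))   # -> -15.83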

                        B=512, BF=2                      B=4096, BF=2
                  N=1     N=2     N=3     N=4      N=1     N=2     N=3     N=4
Logic Energy      1.34    2.80    4.32    6.05     4.05   10.65    9.88   13.26
Routing Energy    0.70    0.65    1.94    2.62     5.12   10.14   14.47   17.77
Memory Energy   -50.00  -50.00  -50.00  -50.00   -50.00  -50.00  -50.00  -50.00
Overall Energy   -7.75  -13.00  -16.86  -19.79   -18.75  -26.53  -30.32  -32.93

Table 5-8. Average Percent Change in Energy When Using BF=2

5.2.3 Summary for Power-Efficient Super-Arrays

In this section, we combined multiple physical memories into larger logical memories and mapped logic to the logical memories. To form the logical memories from the physical memories, we used a power-efficient arrangement that allows one or more of the physical memories to be disabled in each cycle. However, using this technique requires additional support logic implemented in LUTs. We modified SMAP to account for the overhead of the support logic so that it chose mapping solutions that maximized the number of packed LUTs while minimizing the amount of support logic required. We found that when using a blocking factor greater than two, a mapping solution that reduced the number of LUTs could not be found for many of the benchmark circuits.

When using eight memories and BF=2, we found an average reduction in overall energy of 19.79% and 32.93% for 512-bit and 4096-bit memories respectively, at the cost of a 55.47% average reduction in the number of LUTs that could be removed. We also found that for some circuits this penalty was as small as 5%-20%, meaning that the technique is very well suited to those circuits.

5.3 Summary

In this chapter, we explored two modifications to SMAP that attempt to reduce the power penalty found in Chapter 4. In the first method, we changed the SMAP cost function to a new activity-aware cost function. Our experimental results showed that the impact of using this cost function was smaller than the noise generated by the place and route tools; we therefore conclude that modifying the cost function is not an effective approach. This result is important because it tells us that the only way to significantly reduce the power penalty is to reduce the power dissipated by the memories themselves. The second method targets the memory power directly by mapping logic to larger logical memories in a way that allows one or more physical memories to be disabled in each clock cycle. The results from this approach showed that although overall energy could be significantly reduced, the number of LUTs that could be removed from the circuit was also reduced. For some circuits, this reduction in packing efficiency is too severe; for others, the packing efficiency was reduced by only 5-20%, meaning that for some circuits this technique is a very effective approach for reducing the power penalty when mapping logic to memories.

6 CONCLUSIONS

6.1 Summary of Contributions

In this thesis, we have described a power model for heterogeneous FPGAs that contain embedded memories. This model was implemented in two parts: an activity estimation part and a power estimation part. The activity estimation part was implemented as an enhancement to an existing FPGA activity estimation tool called ACE2.0. The power estimation part was implemented as an enhancement to the Poon Power Model, which is incorporated into the VPR framework. This new tool provides the research community with the ability to explore the impact of architectural and CAD modifications on power dissipation in heterogeneous FPGAs containing embedded memory blocks.

A second contribution was an investigation into the power implications of using memories configured as ROMs to implement logic. In this study, we showed that implementing logic in FPGA embedded memory arrays leads to an increase in the power dissipation of the resulting circuit. Previous studies reported significant density increases when embedded memory is used in this way, but the results of this thesis show that this technique may be undesirable when power is a concern. In situations where the density improvements of this technique warrant its use, we found that it is important to optimize the size and flexibility of the memory blocks to reduce this power penalty. We showed that for most array sizes the arrays should be flexible, but that increasing the pin count of the memory increases routing power regardless of whether the memories are used. We also showed that smaller memory arrays are more power efficient than large arrays, but when larger arrays are desired to increase density, memory arrays whose sizes have non-integral square roots and a larger depth-to-width ratio should be used.

To see whether we could reduce the power penalties of mapping logic to memories at the CAD level, we employed two techniques. The first technique was to make our heterogeneous technology mapping tool power-aware by changing its objective function. The new objective function attempts to minimize the number of high-activity connections in the mapped solution. In the second method, we used a power-efficient technique to combine physical memories into larger logical memories, and mapped logic to these larger logical memories. In this method, some of the physical memories could be turned off to save power. The first method did not show any significant power savings. Any expected power savings would have been small, since only a very small portion of the circuit is mapped to memories. These experiments suggested that the only way to make mapping logic to memories more power efficient is to reduce the power dissipated by the memories. In the second method, we aimed to do just that by using a power-efficient logical memory partitioning arrangement that allowed us to disable one or more of the memories on each access. The results showed that combining more than two memories together incurred too much support-logic overhead, but when we combined two memories together, we found significant overall energy reductions. Although the packing efficiency was still reduced, we found that for some circuits this penalty was not very severe, meaning that this technique can be quite effective at reducing the power dissipation of some circuits implemented in LUTs and ROMs.

6.2 Future Work

This thesis focuses on two areas: the power modeling of embedded memories in FPGAs, and the power implications of mapping logic to memories using heterogeneous technology mapping algorithms. Suggestions for future research in both areas are discussed below.

6.2.1 Power Model

In modern FPGAs, the embedded memory blocks often have additional features that are not included in our power model. Signals such as byte-enables and asynchronous clears will affect the activities estimated at the outputs of the memories. Dedicated tracks and programmable connections are usually available to facilitate efficient implementation of larger logical memories and FIFOs; if these connections are required, our model assumes that they are routed through the general-purpose interconnect. Many commercial FPGA embedded memory blocks also have dual-port RAM modes, which are not currently supported by our power model. Although these components were not used in our studies, they need to be correctly modeled to further enable architectural and CAD studies.

6.2.2 Heterogeneous Technology Mapping

As the application domain of FPGAs grows, the memory resources on chip will expand to meet the needs of user circuits with larger storage requirements. The trend among vendors has been to increase the size of their embedded memory blocks. The techniques employed by existing heterogeneous technology mapping algorithms were developed for relatively small memory arrays. These techniques may lose their efficiency when applied to large memory arrays, and new techniques may need to be developed.

It is also unclear how the power consumption of logic implemented in traditional logic resources will scale relative to the power consumption of memories in the future. As we showed for a theoretical 0.18um FPGA and a real 0.13um FPGA, the power consumed by the memories is currently larger. In future FPGAs this may change: more power-efficient memory designs, more exotic LE structures (such as the Altera Stratix II ALUT), and increasing interconnect power are all factors that may tip the scale.

In our technique that employed a power-efficient logical-to-physical memory mapping, we found that although the power consumption of the circuit was reduced, the packing efficiency was severely degraded by the support logic needed to combine the physical memories into the larger logical memories. Since this method of combining memories has been shown to be beneficial for RAM applications [65] and for our ROM applications, it would be interesting to explore a memory architecture in which dedicated connections for combining memories in this fashion are available.

REFERENCES

[1] Xilinx, "Using Look-Up Tables as Distributed RAM in Spartan-3 Generation FPGAs," Xilinx Application Note 464, 2005.
[2] Altera, "TriMatrix Embedded Memory Blocks in Stratix II and Stratix II GX Devices," in Stratix II Device Handbook, vol. 2, 4.2 ed., 2006.
[3] Xilinx, "Block RAM," in Virtex-4 User Guide, 2006, pp. 109-161.
[4] T. Ngai, J. Rose, and S. J. E. Wilton, "An SRAM-Programmable Field Configurable Memory," in IEEE Custom Integrated Circuits Conference, Santa Clara, CA, USA, 1995, pp. 499-502.
[5] S. J. E. Wilton, "Heterogeneous Technology Mapping for Area Reduction in FPGAs with Embedded Memory Arrays," IEEE Transactions on Very Large Scale Integration Systems, vol. 19, pp. 56-68, 1998.
[6] J. Cong and S. Xu, "Technology Mapping for FPGAs with Embedded Memory Blocks," in Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on Field Programmable Gate Arrays, Monterey, California, 1998, pp. 179-188.
[7] M. A. Kumar, J. Bobba, and V. Kamakoti, "MemMap: Technology Mapping Algorithm for Area Reduction in FPGAs with Embedded Memory Arrays Using Reconvergence Analysis," in Proceedings of the Conference on Design, Automation and Test in Europe, vol. 2, IEEE Computer Society, 2004.
[8] K. K. W. Poon, S. J. E. Wilton, and A. Yan, "A Detailed Power Model for Field-Programmable Gate Arrays," ACM Transactions on Design Automation of Electronic Systems, vol. 10, pp. 279-302, 2005.
[9] V. Betz and J. Rose, "VPR: A New Packing, Placement and Routing Tool for FPGA Research," in Proceedings of the 7th International Workshop on Field-Programmable Logic and Applications, 1997.
[10] S. Y. L. Chin, C. S. P. Lee, and S. J. E. Wilton, "Power Implications of Implementing Logic Using FPGA Embedded Memory Arrays," presented at the International Conference on Field Programmable Logic and Applications, Madrid, Spain, 2006.
[11] Actel, "ProASIC3 Flash Family FPGAs Datasheet," 2006.
[12] Altera, Stratix II Device Handbook, vol. 2, December 2005.
[13] Xilinx, "Virtex-5 User Guide," May 2006.
[14] S. D. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic, Field-Programmable Gate Arrays. Kluwer Academic Publishers, 1992.
[15] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Kluwer Academic Publishers, 1999.
[16] J. S. Rose, R. J. Francis, D. Lewis, and P. Chow, "Architecture of Field-Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency," IEEE Journal of Solid-State Circuits, vol. 25, pp. 1217-1225, 1990.
[17] E. Ahmed and J. Rose, "The Effect of LUT and Cluster Size on Deep-Submicron FPGA Performance and Density," IEEE Transactions on Very Large Scale Integration Systems, vol. 12, pp. 288-298, 2004.
[18] A. Marquardt, V. Betz, and J. Rose, "Speed and Area Tradeoffs in Cluster-Based FPGA Architectures," IEEE Transactions on Very Large Scale Integration Systems, vol. 8, pp. 84-93, 2000.
[19] Altera, "TriMatrix Embedded Memory Blocks in Stratix and Stratix GX Devices," in Stratix Device Handbook, vol. 2, 3.3 ed., 2006.
[20] Xilinx, Virtex-4 User Guide, September 2005.
[21] G. Lemieux and D. Lewis, Design of Interconnection Networks for Programmable Logic. Springer, 2004.
[22] S. Brown, M. Khellah, and G. Lemieux, "Segmented Routing for Speed-Performance and Routability in Field-Programmable Gate Arrays," Journal of VLSI Design, vol. 4, pp. 275-291, 1996.
[23] Y. W. Chang, D. Wong, and C. Wong, "Universal Switch Modules for FPGA Design," ACM Transactions on Design Automation of Electronic Systems, vol. 1, pp. 80-101, 1996.
[24] I. Masud and S. J. E. Wilton, "A New Switch Block for Segmented FPGAs," presented at the International Conference on Field Programmable Logic and its Applications, 1999, pp. 274-281.
[25] S. J. E. Wilton, "Architecture and Algorithms for Field-Programmable Gate Arrays with Embedded Memory," Ph.D. thesis, University of Toronto, 1997.
[26] G. Lemieux, E. Lee, M. Tom, and A. Yu, "Directional and Single-Driver Wires in FPGA Interconnect," presented at the IEEE International Conference on Field Programmable Technology, Brisbane, Australia, 2004, pp. 41-46.
[27] J. Cong and Y. Z. Ding, "On Area/Depth Trade-Off in LUT-Based FPGA Technology Mapping," IEEE Transactions on Very Large Scale Integration Systems, vol. 2, pp. 137-148, 1994.
[28] J. Cong and Y. Ding, "FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 13, pp. 1-12, 1994.
[29] J. Anderson and F. N. Najm, "Power-Aware Technology Mapping for LUT-Based FPGAs," presented at the 2002 IEEE International Conference on Field Programmable Technology, 2002, pp. 211-218.
[30] J. Lamoureux and S. J. E. Wilton, "On the Interaction Between Power-Aware FPGA CAD Algorithms," in Proceedings of the 2003 IEEE/ACM International Conference on Computer-Aided Design, 2003.
[31] Y. H. Xu and M. A. S. Khalid, "QPF: Efficient Quadratic Placement for FPGAs," presented at the International Conference on Field Programmable Logic and Applications, 2005, pp. 555-558.
[32] G. Lemieux, S. Brown, and Z. Vranesic, "On Two-Step Routing for FPGAs," presented at the ACM Symposium on Physical Design, 1997, pp. 60-66.
[33] J. S. Rose, "Parallel Global Routing for Standard Cells," IEEE Transactions on Computer-Aided Design, vol. 9, pp. 1085-1095, 1990.
[34] M. Palczewski, "Plane Parallel A* Maze Router and Its Application to FPGAs," presented at the ACM Symposium on Physical Design, 1990, pp. 60-66.
[35] L. McMurchie and C. Ebeling, "PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs," presented at the International ACM Symposium on Field-Programmable Gate Arrays, 1995, pp. 111-117.
[36] J. Cong and Y. Hwang, "Simultaneous Depth and Area Minimization in LUT-Based FPGA Mapping," presented at the ACM International Symposium on Field-Programmable Gate Arrays, Monterey, CA, 1995, pp. 68-74.
[37] L. R. Ford and D. R. Fulkerson, Flows in Networks. Princeton, NJ: Princeton University Press, 1962.
[38] G. K. Yeap, Practical Low Power Digital VLSI Design. Kluwer Academic Publishers, 1998.
[39] F. N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits," IEEE Transactions on Very Large Scale Integration Systems, vol. 2, pp. 446-455, 1994.
[40] F. N. Najm, "Transition Density: A New Measure of Activity in Digital Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, pp. 310-323, 1993.
[41] R. Marculescu, D. Marculescu, and M. Pedram, "Switching Activity Analysis Considering Spatiotemporal Correlations," presented at the International Conference on Computer Aided Design, 1994, pp. 294-299.
[42] J. Lamoureux and S. J. E. Wilton, "Activity Estimation for Field-Programmable Gate Arrays," in Proceedings of the 2006 International Conference on Field Programmable Logic and its Applications, Madrid, Spain, 2006.
[43] Actel, "SmartPower User's Guide," http://www.actel.com/documents/smartpower_ug.pdf, 2006.
[44] Xilinx, "XPower Analyzer," in Xilinx ISE 8.2i Software Manual, 2006.
[45] Altera, "PowerPlay Power Analysis," in Quartus II 6.0 Handbook, vol. 3, 2006.
[46] Xilinx, "XPower Estimator (Spreadsheet)," http://www.xilinx.com/products/design_resources/power_central/index.htm, 2006.
[47] Altera, PowerPlay Early Power User Guide, 2006.
[48] B. S. Amrutur and M. A. Horowitz, "Speed and Power Scaling of SRAM's," IEEE Journal of Solid-State Circuits, vol. 35, pp. 175-185, 2000.
[49] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A Framework for Architectural-Level Power Analysis and Optimizations," presented at the 27th International Symposium on Computer Architecture, 2000, pp. 83-94.
[50] M. B. Kamble and K. Ghose, "Analytical Energy Dissipation Models for Low Power Caches," presented at the 1997 International Symposium on Low Power Electronics and Design, 1997, pp. 143-148.
[51] M. Mamidipaka, K. Khouri, N. Dutt, and M. Abadir, "IDAP: A Tool for High-Level Power Estimation of Custom Array Structures," presented at the International Conference on Computer Aided Design, 2003, pp. 113-119.
[52] M. Mamidipaka, K. Khouri, N. Dutt, and M. Abadir, "Analytical Models for Leakage Power Estimation of Memory Array Structures," in Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, Stockholm, Sweden, 2004.
[53] S. J. E. Wilton and N. P. Jouppi, "CACTI: An Enhanced Cache Access and Cycle Time Model," IEEE Journal of Solid-State Circuits, vol. 31, pp. 677-688, 1996.
[54] A. Y. Zeng, K. Rose, and R. J. Gutmann, "Cache Array Architecture Optimization at Deep Submicron Technologies," presented at the IEEE International Conference on VLSI in Computers and Processors, 2004, pp. 320-325.
[55] R. J. Evans and P. D. Franzon, "Energy Consumption Modeling and Optimization for SRAMs," IEEE Journal of Solid-State Circuits, vol. 30, pp. 571-579, 1995.
[56] M. N. Mamidipaka, N. D. Dutt, and K. S. Khouri, "A Methodology for Accurate Modeling of Energy Dissipation in Array Structures," presented at the International Conference on VLSI Design, 2003, pp. 320-325.
[57] K. K. W. Poon, "Power Estimation for Field-Programmable Gate Arrays," M.A.Sc. thesis, Department of Electrical and Computer Engineering, University of British Columbia, 2002.
[58] D. Hodges, H. Jackson, and R. Saleh, Analysis and Design of Digital Integrated Circuits: In Deep Submicron Technology, 3rd ed. McGraw-Hill, 2004.
[59] Altera, "Design Implementation & Optimization," in Quartus II Handbook, vol. 2, 2006.
[60] Altera, "Synthesis Design Flows Using the Quartus University Interface Program (QUIP)," 2005.
[61] Berkeley Logic Synthesis and Verification Group, "ABC: A System for Sequential Synthesis and Verification," http://www.eecs.berkeley.edu/~alanmi/abc/.
[62] Virage Logic, "Memory Logic Compiler," http://www.viragelogic.com.
[63] M. Hutton, J. P. Grossman, J. Rose, and D. Corneil, "Characterization and Parameterized Random Generation of Digital Circuits," in Proceedings of the 33rd Annual Conference on Design Automation, Las Vegas, Nevada, USA, 1996, pp. 94-99.
[64] A. Yan, R. Cheng, and S. J. E. Wilton, "On the Sensitivity of FPGA Architectural Conclusions to Experimental Assumptions, Tools, and Techniques," in Proceedings of the 2002 ACM/SIGDA Tenth International Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, 2002, pp. 147-156.
[65] R. Tessier, V. Betz, D. Neto, and T. Gopalsamy, "Power-Aware RAM Mapping for FPGA Embedded Memory Blocks," in Proceedings of the International Symposium on Field Programmable Gate Arrays, Monterey, California, USA, 2006, pp. 189-198.
