FPGA to ASIC Comparison Details
Appendix A
FPGA to ASIC Comparison Details

This appendix provides information on the benchmarks used for the FPGA to ASIC comparisons in Chap. 3. Some of the absolute data from that comparison is also provided; however, area results are not included, as that would disclose confidential information.

A.1 Benchmark Information

Information about each of the benchmarks used in the FPGA to ASIC comparisons is listed in Table A.1. For each benchmark, a brief description of its function is given along with information about its source. Most of the benchmarks were obtained from OpenCores (http://www.opencores.org/), while the remainder came from either internal University of Toronto projects [29, 71, 165, 166] or external benchmark projects at http://www.humanistic.org/~hendrik/reed-solomon/index.html or http://www.engr.scu.edu/mourad/benchmark/RTL-Bench.html. As noted in the table, in some cases the benchmarks were not obtained directly from these sources but were instead modified as part of the work performed in [79]. The modifications included the removal of FPGA vendor-specific constructs and the correction of any compilation issues in the designs.

A.2 FPGA to ASIC Comparison Data

The results in Chap. 3 were given only in relative terms. This section provides the raw data underlying these relative comparisons. Tables A.2 and A.3 list the maximum operating frequency and dynamic power, respectively, for each design for both the FPGA and ASIC. Tables A.4 and A.5 report the FPGA and ASIC absolute static power measurements for each benchmark at typical- and worst-case conditions, respectively. The static power measurements for the FPGAs include the adjustments to account for the partial utilization of each device as described in Sect. 3.4.3.2.
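As a rough illustration of how relative figures can be recovered from the raw data, the sketch below computes the ASIC-to-FPGA operating frequency ratio for three rows copied from Table A.2. The geometric mean used to aggregate the ratios is an assumption made here for illustration; this appendix does not state how the results in Chap. 3 were averaged. The names `FMAX_MHZ`, `asic_to_fpga_ratios`, and `geometric_mean` are hypothetical.

```python
import math

# (FPGA MHz, ASIC MHz) for three benchmarks, copied from Table A.2.
FMAX_MHZ = {
    "booth": (188.71, 934.58),
    "rs encoder": (288.52, 1098.90),
    "cordic18": (260.08, 961.54),
}


def asic_to_fpga_ratios(table):
    """Return the ASIC/FPGA frequency ratio for each benchmark."""
    return {name: asic / fpga for name, (fpga, asic) in table.items()}


def geometric_mean(values):
    """Geometric mean of positive values (assumed aggregate, see lead-in)."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = asic_to_fpga_ratios(FMAX_MHZ)
for name, ratio in ratios.items():
    print(f"{name}: ASIC is {ratio:.2f}x faster")
print(f"geometric mean over these rows: {geometric_mean(ratios.values()):.2f}x")
```

The same pattern applies to the power tables by swapping in the values from Tables A.3 to A.5 and inverting the ratio, since there the FPGA figure is the larger one.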
Finally, Table A.6 summarizes the results when retiming was used with the FPGA CAD flow as described in Sect. 3.5.2. The benchmark size (in ALUTs), the operating frequency increase, and the total register increase are listed for each of the benchmarks.

Table A.1 Benchmark descriptions

Benchmark      Description
booth          32-bit serial Booth-encoded multiplier created by the author
rs encoder     (255,239) Reed Solomon encoder from OpenCores
cordic18       18-bit CORDIC algorithm implementation from OpenCores
cordic8        8-bit CORDIC algorithm implementation from OpenCores
des area       DES Encryption/Decryption designed for area from OpenCores with modifications from [79]
des perf       DES Encryption/Decryption designed for performance from OpenCores with modifications from [79]
fir restruct   8-bit 17-tap finite impulse response filter with fixed coefficients from http://www.engr.scu.edu/mourad/benchmark/RTL-Bench.html with modifications from [79]
mac1           Ethernet Media Access Control (MAC) block from OpenCores with modifications from [79]
aes192         AES Encryption/Decryption with 192-bit keys from OpenCores
fir3           8-bit 3-tap finite impulse response filter from OpenCores with modifications from [79]
diffeq         Differential equation solver from OpenCores with modifications from [79]
diffeq2        Differential equation solver from OpenCores with modifications from [79]
molecular      Molecular dynamics simulator [29]
rs decoder1    (31,19) Reed Solomon decoder from http://www.humanistic.org/~hendrik/reed-solomon/index.html with modifications from [79]
rs decoder2    (511,503) Reed Solomon decoder from http://www.humanistic.org/~hendrik/reed-solomon/index.html with modifications from [79]
atm            High speed 32 × 32 ATM packet switch based on the architecture from [50]
aes            AES Encryption with 128-bit keys from OpenCores
aes inv        AES Decryption with 128-bit keys from OpenCores
ethernet       Ethernet Media Access Control (MAC) block from OpenCores
serialproc     32-bit RISC processor with serial ALU [165, 166]
fir24          16-bit 24-tap finite impulse response filter from OpenCores with modifications from [79]
pipe5proc      32-bit RISC processor with 5 pipeline stages [165, 166]
raytracer      Image rendering engine [71]

Table A.2 FPGA and ASIC operating frequencies

Benchmark      Maximum operating frequency (MHz)
               FPGA       ASIC
booth          188.71     934.58
rs encoder     288.52     1098.90
cordic18       260.08     961.54
cordic8        376.08     699.30
des area       360.49     729.93
des perf       321.34     1000.00
fir restruct   194.55     775.19
mac1           153.21     584.80
aes192         125.75     549.45
fir3           278.40     961.54
diffeq         78.23      318.47
diffeq2        70.58      281.69
molecular      89.01      414.94
rs decoder1    125.27     358.42
rs decoder2    101.24     239.23
atm            319.28     917.43
aes            213.22     800.00
aes inv        152.28     649.35
ethernet       168.58     704.23
serialproc     142.27     393.70
fir24          249.44     645.16
pipe5proc      131.03     378.79
raytracer      120.35     416.67

Table A.3 FPGA and ASIC dynamic power consumption

Benchmark      Dynamic power consumption (W)
               FPGA           ASIC
booth          5.10×10^−03    1.71×10^−04
rs encoder     4.63×10^−02    1.88×10^−03
cordic18       6.75×10^−02    1.08×10^−02
cordic8        1.39×10^−02    2.44×10^−03
des area       3.50×10^−02    1.32×10^−03
des perf       1.22×10^−01    1.31×10^−02
fir restruct   2.47×10^−02    2.56×10^−03
mac1           8.94×10^−02    4.63×10^−03
aes192         1.04×10^−01    3.50×10^−03
fir3           7.91×10^−03    1.06×10^−03
diffeq         4.53×10^−02    3.86×10^−03
diffeq2        5.18×10^−02    4.16×10^−03
molecular      4.55×10^−01    2.76×10^−02
rs decoder1    3.48×10^−02    2.20×10^−03
rs decoder2    4.74×10^−02    4.29×10^−03
atm            5.59×10^−01    3.71×10^−02
aes            6.32×10^−02    6.71×10^−03
aes inv        7.65×10^−02    1.13×10^−02
ethernet       9.17×10^−02    5.91×10^−03
serialproc     3.42×10^−02    2.16×10^−03
fir24          1.18×10^−01    2.22×10^−02
pipe5proc      5.11×10^−02    6.23×10^−03
raytracer      8.99×10^−01    1.08×10^−01

Table A.4 FPGA and ASIC static power consumption – typical

Benchmark      Static power consumption (W)
               FPGA           ASIC
rs encoder     1.31×10^−02    2.61×10^−04
cordic18       4.43×10^−02    5.73×10^−04
des area       1.14×10^−02    1.25×10^−04
des perf       5.52×10^−02    1.08×10^−03
fir restruct   1.40×10^−02    2.03×10^−04
mac1           3.52×10^−02    4.08×10^−04
aes192         1.61×10^−02    1.90×10^−04
diffeq2        1.15×10^−02    3.63×10^−04
molecular      1.27×10^−01    1.83×10^−03
rs decoder1    1.74×10^−02    7.47×10^−05
rs decoder2    2.31×10^−02    1.91×10^−04
atm            2.46×10^−01    1.08×10^−03
aes            1.67×10^−02    5.06×10^−04
aes inv        2.06×10^−02    6.68×10^−04
ethernet       5.11×10^−02    2.94×10^−04
fir24          2.18×10^−02    1.66×10^−03
pipe5proc      2.06×10^−02    1.27×10^−04
raytracer      1.69×10^−01    1.74×10^−03

Table A.5 FPGA and ASIC static power consumption – worst case

Benchmark      Static power consumption (W)
               FPGA           ASIC
rs encoder     3.46×10^−02    1.00×10^−02
cordic18       1.17×10^−01    2.27×10^−02
des perf       1.45×10^−01    4.16×10^−02
fir restruct   3.70×10^−02    7.86×10^−03
mac1           9.28×10^−02    1.56×10^−02
aes192         5.00×10^−02    7.51×10^−03
diffeq         2.45×10^−02    1.44×10^−02
diffeq2        3.04×10^−02    1.40×10^−02
molecular      3.95×10^−01    7.19×10^−02
rs decoder1    4.60×10^−02    3.02×10^−03
rs decoder2    6.10×10^−02    7.46×10^−03
atm            7.70×10^−01    4.61×10^−02
aes            5.21×10^−02    1.93×10^−02
aes inv        6.42×10^−02    2.58×10^−02
ethernet       1.35×10^−01    1.07×10^−02
fir24          6.80×10^−02    6.52×10^−02
pipe5proc      5.44×10^−02    9.20×10^−03
raytracer      7.14×10^−01    N/A

Table A.6 Impact of retiming on FPGA performance

Benchmark                Benchmark    ALUTs     Operating frequency   Register count
                         category               increase (%)          increase (%)
des area                 Logic        469       1.2                   0.0
booth                    Logic        34        0.0                   0.0
rs encoder               Logic        683       0.0                   0.0
fir scu rtl              Logic        615       14                    89
fir restruct1            Logic        619       11                    64
fir restruct             Logic        621       15                    76
mac1                     Logic        1,852     0.0                   0.0
cordic8                  Logic        251       0.0                   0.0
mac2                     Logic        6,776     0.0                   0.0
md5 1                    Logic        2,227     23                    21
aes no mem               Logic        1,389     0.0                   0.0
raytracer framebuf v1    Logic        301       3.0                   0.0
raytracer bound          Logic        886       0.0                   0.0
raytracer bound v1       Logic        889       0.0                   0.0
cordic                   Logic        907       0.0                   0.0
aes192                   Logic        1,090     9.7                   30
md5 2                    Logic        858       10                    13
cordic                   Logic        1,278     0.0                   0.0
des perf                 Logic        1,840     −0.5                  1.0
cordic18                 Logic        1,169     0.0                   0.0
aes inv no mem           Logic        1,962     0.0                   0.0
fir3                     DSP          52        −14                   −40
diffeq                   DSP          219       0.0                   0.0
iir                      DSP          284       0.0                   0.0
iir1                     DSP          218       0.0                   0.0
diffeq2                  DSP          222       0.0                   0.0
rs decoder1              DSP          418       5.4                   7.5
rs decoder2              DSP          535       −0.3                  11
raytracer gen v1         DSP          1,625     0.0                   0.0
raytracer gen            DSP          1,706     0.0                   0.0
molecular                DSP          6,289     1.3                   14
molecular2               DSP          6,557     24                    71
stereovision1            DSP          2,934     36                    19
stereovision3            Memory       82        10                    9.3
serialproc               Memory       671       −2.0                  16
raytracer framebuf       Memory       457       12                    0.0
aes                      Memory       675       0.0                   0.0
aes inv                  Memory       813       0.0                   0.0
ethernet                 Memory       1,650     −0.6                  4.1
faraday dma              Memory       1,987     0.5                   0.9
faraday risc             Memory       2,596     −1.0                  1.3
faraday dsp              Memory       7,218     −2.9                  −0.1
stereovision0 v1         Memory       2,919     −1.6                  0.2
atm                      Memory       10,514    4.7                   1.1
stereovision0            Memory       19,969    3.7                   0.4
oc54 cpu                 DSP & Mem    1,543     0.0                   0.0
pipe5proc                DSP & Mem    746       5.5                   49
fir24                    DSP & Mem    821       −7.4                  −3.3
fft256 nomem             DSP & Mem    966       0.0                   0.0
raytracer top            DSP & Mem    11,438    14                    0.0
raytracer top v1         DSP & Mem    11,424    11                    −0.3
raytracer                DSP & Mem    13,021    3.0                   −0.6
fft256                   DSP & Mem    27,479    0.0                   0.0
stereovision2 v1         DSP & Mem    27,097    117                   131
stereovision2            DSP & Mem    27,691    97                    124

Appendix B
Representative Delay Weighting

The programmability of FPGAs means that the eventual critical paths are not known at design time.