ATHENa – Automated Tool for Hardware EvaluatioN: Toward Fair and Comprehensive Benchmarking of Cryptographic Hardware using FPGAs

Kris Gaj, Jens-Peter Kaps, Venkata Amirineni, Marcin Rogawski, Ekawat Homsirikamol, Benjamin Y. Brewster

Supported in part by the National Institute of Standards & Technology (NIST)

ATHENa Team

[Team slide: Venkata "Vinny" Amirineni, Ekawat "Ice" Homsirikamol, Marcin Rogawski, John, Rajesh, and Michal – MS and PhD students in CpE and ECE, including an exchange student from Slovakia]

Outline

• Motivation & Goals

• Previous Work

• ATHENa

• Two Case Studies

• Future Work & Conclusions

Common Benchmarking Pitfalls

• taking credit for improvements in technology

• choosing a convenient (but not necessarily fair) performance measure

• comparing designs with different functionality

• comparing designs optimized using different optimization targets

• comparing clock frequency after synthesis vs. clock frequency after placing and routing

Objective Benchmarking Difficulties

• lack of standard interfaces

• influence of tools and their options

• stand-alone performance vs. performance as a part of a bigger system

• time & effort spent on optimization

Goal of Our Project

• spread knowledge and awareness needed to eliminate benchmarking pitfalls

• develop methodology and tools required to overcome objective difficulties

Why Cryptographic Algorithms?

• well-documented speed-ups (10x-10,000x)

• security gains (e.g., generation & storage)

• constantly evolving standards

• cryptographic standard contests

Cryptographic Standard Contests

• AES (IX.1997 – X.2000): 15 block ciphers → 1 winner
• NESSIE (I.2000 – XII.2002)
• CRYPTREC
• eSTREAM (XI.2004 – V.2008): 34 stream ciphers → 4 SW + 4 HW winners
• SHA-3 (X.2007 – XII.2012): 51 hash functions → 1 winner

Criteria Used to Evaluate Cryptographic Algorithms

Security

Software Efficiency

Hardware Efficiency

Flexibility

Advanced Encryption Standard (AES) Contest

Round 2 of the AES Contest, 2000

[Charts: speed in FPGAs vs. votes at the AES 3 conference]

Benchmarking in

Software: eBACS (D. Bernstein, T. Lange)
FPGAs: ?
ASICs: ?

eBACS: ECRYPT Benchmarking of Cryptographic Systems

SUPERCOP - toolkit developed by D. Bernstein and T. Lange for measuring performance of cryptographic software

• measurements on multiple machines (currently over 70)

• each implementation is recompiled multiple times (currently over 1200 times) with various compiler options

• time measured in clock cycles/byte for multiple input/output sizes

• median, lower quartile (25th percentile), and upper quartile (75th percentile) reported

• standardized function arguments (common API)
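The statistics that eBACS reports can be illustrated with a short sketch. This is not SUPERCOP code, and the cycles/byte numbers are hypothetical; it only shows how repeated measurements reduce to the published median and quartiles:

```python
# Reduce repeated cycles/byte measurements of one primitive to the
# median, lower quartile, and upper quartile, as eBACS reports them.
from statistics import quantiles

# hypothetical cycles/byte numbers from repeated runs on one machine;
# the outlier (30.5) shows why quartiles are preferred over the mean
cycles_per_byte = [12.1, 11.8, 12.4, 30.5, 11.9, 12.0, 12.2]

lo, med, hi = quantiles(cycles_per_byte, n=4)  # 25th, 50th, 75th percentiles
print(f"median={med:.1f} c/b, quartiles=[{lo:.1f}, {hi:.1f}]")
```

Reporting the median and interquartile range keeps one slow run (a cache miss, an interrupt) from distorting the published figure, which an average would not.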

ATHENa – Automated Tool for Hardware EvaluatioN
http://cryptography.gmu.edu/athena

An open-source benchmarking tool, written in Perl, aimed at AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms

Currently under development at George Mason University.

Why Athena?

"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess of Wisdom was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest.”

from "Athena, Greek Goddess of Wisdom and Craftsmanship"

Basic Dataflow of ATHENa

[Diagram: the user downloads scripts and configuration files from the ATHENa server, submits HDL together with scripts and configuration files to ATHENa, which drives the FPGA synthesis and implementation tools, ranks the designs, and returns a result summary plus database entries; designers contribute database entries, interfaces, and testbenches]

Inputs: synthesizable source files, testbench, configuration files, constraint files
Outputs: result summary (user-friendly), database result entries (machine-friendly)

ATHENa Major Features (1)

• synthesis, implementation, and timing analysis in batch mode

• support for devices and tools of multiple FPGA vendors

• generation of results for multiple families of FPGAs of a given vendor

• automated choice of a best-matching device within a given family

ATHENa Major Features (2)

• automated verification of designs through simulation in batch mode

• support for multi-core processing

• automated extraction and tabulation of results

• several optimization strategies aimed at finding
  – optimum options of tools
  – best target clock frequency
  – best starting point of placement

Multi-Pass Place-and-Route Analysis

GMU SHA-256, Xilinx Spartan 3: 100 runs for different placement starting points

[Histogram: achieved clock frequency spreads by ~15% between the worst and the best placement starting point; the bigger the better]
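A multi-pass sweep of this kind can be sketched as below. Here `run_par()` is a hypothetical stand-in for one real place-and-route invocation; a seeded random draw merely mimics the ~15% spread seen on the slide:

```python
# Rerun the same design from many placement starting points (COST_TABLE
# seeds) and keep the one that achieves the best clock frequency.
import random

def run_par(cost_table: int) -> float:
    """Placeholder for one place-and-route run; returns achieved MHz."""
    random.seed(cost_table)               # deterministic stand-in result
    return 85.0 + random.uniform(-7, 7)   # mimic a ~15% best-to-worst spread

results = {ct: run_par(ct) for ct in range(1, 101)}   # 100 starting points
best_ct, best_mhz = max(results.items(), key=lambda kv: kv[1])
worst_mhz = min(results.values())
print(f"best seed {best_ct}: {best_mhz:.1f} MHz "
      f"(spread {best_mhz - worst_mhz:.1f} MHz)")
```

The point of the experiment: placement seeds are free performance, but 100 full tool runs are expensive, which is why the strategies below sample only a handful of preselected seeds.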

Results for target clock frequencies: default, 80 MHz, 85 MHz, 90 MHz, 95 MHz

[Plots at each target frequency comparing runs with the default COST_TABLE (1) against runs with the preselected COST_TABLEs (21, 41, 61, 81)]

Optimization Strategy Used for Xilinx Devices

1. Frequency search
   Search for the highest requested clock frequency that is met with a single run of the tools.
2. Exhaustive search
   Search for the best combination of the following options:
   • optimization target for synthesis: area, speed
   • optimization target for mapping: area, speed
   • optimization effort level for placing and routing: medium, high
3. Placement search
   Search for the best starting point for placement, using 4 additional values of the COST_TABLE {21, 41, 61, 81}.

Total number of runs = 15-20

Optimization Strategy Used for Altera Devices

1. Exhaustive search
   Search for the best combination of the following options:
   • synthesis optimization: speed, area, balanced
   • optimization effort: auto, fast
   • implementation effort: standard, auto
2. Placement search
   Search for the best starting point of placement, using 4 additional values of SEED {2001, 4001, 6001, 8001}.

Total number of runs = 16
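The stated total of 16 runs follows directly from the option space; a quick sketch (option names copied from the slide) checks the arithmetic:

```python
# 3 synthesis settings x 2 effort settings x 2 implementation settings
# for the exhaustive phase, plus 4 extra placement seeds.
from itertools import product

exhaustive = list(product(["speed", "area", "balanced"],   # synthesis opt
                          ["auto", "fast"],                # optimization effort
                          ["standard", "auto"]))           # implementation effort
seeds = [2001, 4001, 6001, 8001]
print(len(exhaustive) + len(seeds))  # → 16
```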

Benchmarking Goals Facilitated by ATHENa

1. comparing multiple cryptographic algorithms
2. comparing multiple hardware architectures or implementations of the same cryptographic algorithm
3. comparing various hardware platforms from the point of view of their suitability for the implementation of a given algorithm, such as the choice of an FPGA device or FPGA board for implementing a particular cryptographic system
4. comparing various tools and languages in terms of the quality of results they generate (e.g., Verilog vs. VHDL, Synplicity Synplify Pro vs. Xilinx XST, ISE v. 10.2 vs. ISE v. 9.1)

Algorithm Comparison: SHA-256 vs. Fugue on Xilinx Spartan 3

[Bar charts, single run vs. ATHENa-optimized:
SHA-256: throughput 625.9 → 694.9 Mbit/s (+11%), area 1020 → 883 CLB slices (-13%)
Fugue-256: throughput 1100.2 → 1283.2 Mbit/s (+17%), area 3987 → 3873 CLB slices (-3%)]

Architecture Comparison: SHA-256 Basic Loop vs. Rescheduling on Altera Cyclone II

[Bar charts, single run vs. ATHENa-optimized:
Basic Loop: throughput 838.7 → 854.6 Mbit/s (+2%), area 2291 → 2216 LE (-3%)
Rescheduling: throughput 831 → 871.8 Mbit/s (+5%), area 2019 → 2015 LE (-0.1%)]

Platform Comparison: SHA-256 on Xilinx Spartan 3 vs. Altera Cyclone II

[Bar charts, single run vs. ATHENa-optimized:
Spartan 3: throughput 625.9 → 694.9 Mbit/s (+11%), area 2040 → 1776 LC (-13%)
Cyclone II: throughput 831 → 871.8 Mbit/s (+5%), area 2019 → 2015 LE (-0.1%)]

Tool Comparison: SHA-256 Rescheduling with ISE 9.1 vs. ISE 11.1

[Bar charts, single run vs. ATHENa-optimized:
ISE 9.1: throughput 625.9 → 694.9 Mbit/s (+11%), area 1020 → 883 CLB slices (-13%)
ISE 11.1: throughput 613.4 → 729.2 Mbit/s (+19%), area 1020 → 873 CLB slices (-14%)]

Benchmarking of 14 Round-2 SHA-3 Candidates

• Timeline

Round 1: 51 candidates (Oct. 2008) → Round 2: 14 candidates (July 2009) → Round 3: 5-6 candidates (end of 2010) → 1-2 winners (mid-2012)

• High-speed implementations of all 14 Round 2 SHA-3 candidates and the current standard SHA-2 developed and evaluated using ATHENa • Results reported at CHES 2010 and at the SHA-3 conference organized by NIST

Generation of Results Facilitated by ATHENa

• batch mode of FPGA tools

• ease of extraction and tabulation of results: Excel, CSV (available), LaTeX (coming soon)

• optimized choice of tool options: GMU_Xilinx_optimization_1 strategy
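The extraction-and-tabulation step can be sketched as follows. The report line and column names here are hypothetical stand-ins, not ATHENa's actual report or CSV formats:

```python
# Scrape an achieved clock frequency out of a (made-up) timing-report
# line and tabulate it as one CSV row per run.
import csv
import io
import re

report = "Design: sha256  Minimum period: 8.912ns (Maximum Frequency: 112.208MHz)"
match = re.search(r"Maximum Frequency:\s*([\d.]+)MHz", report)
freq = float(match.group(1))  # 112.208

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["design", "freq_mhz"])   # header row
writer.writerow(["sha256", freq])         # one row per tool run
print(buf.getvalue())
```

Automating this per run is what makes batch-mode sweeps over dozens of option combinations practical to compare.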

Relative Improvement of Results from Using ATHENa: Virtex 5, 256-bit Variants of Hash Functions

[Bar chart: per-candidate ratios of Area, Throughput (Thr), and Thr/Area obtained using ATHENa-suggested options vs. default options of the FPGA tools]

Relative Improvement of Results from Using ATHENa: Virtex 5, 512-bit Variants of Hash Functions

[Bar chart: the same ratios for the 512-bit variants]

Most Important Features of ATHENa

• comprehensive
• automated
• collaborative
• practical
• distributed
• optimized
• with a single point of contact

Our Environment Will Serve

• Researchers – fair, automated, and comprehensive comparison of new algorithms, architectures, and implementations with previously reported work

• Designers – informed choice of technology (FPGA, ASIC, microprocessor) and a specific device within a given technology

• Developers and Users of Tools – comprehensive comparison across various tools; optimization methodologies developed and comprehensively tested as a part of this project

• Standardization Organizations (such as NIST) – evaluation of existing and emerging standards; support of contests for new standards; comprehensive and easy-to-locate database of results

Invitation to Use ATHENa: http://cryptography.gmu.edu/athena


Future Work

• Additional FPGA Vendors

• More Efficient and Effective Heuristic Optimization Algorithms

• Support for Linux

• Graphical User Interface

• Application to Comparison and Optimization of Other Cryptographic Primitives (e.g., public-key cryptography)

• Adapting ATHENa to Other Application Domains (Digital Signal Processing, communications, etc.)

Thank you!

Questions?

ATHENa: http://cryptography.gmu.edu/athena
