Usuba, Optimizing Bitslicing Compiler Darius Mercadier

Total Page:16

File Type:pdf, Size:1020Kb

Usuba, Optimizing Bitslicing Compiler Darius Mercadier Usuba, Optimizing Bitslicing Compiler Darius Mercadier To cite this version: Darius Mercadier. Usuba, Optimizing Bitslicing Compiler. Programming Languages [cs.PL]. Sorbonne Université (France), 2020. English. tel-03133456 HAL Id: tel-03133456 https://tel.archives-ouvertes.fr/tel-03133456 Submitted on 6 Feb 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESE` DE DOCTORAT DE SORBONNE UNIVERSITE´ Specialit´ e´ Informatique Ecole´ doctorale Informatique, Tel´ ecommunications´ et Electronique´ (Paris) Present´ ee´ par Darius MERCADIER Pour obtenir le grade de DOCTEUR de SORBONNE UNIVERSITE´ Sujet de la these` : Usuba, Optimizing Bitslicing Compiler soutenue le 20 novembre 2020 devant le jury compose´ de : M. Gilles MULLER Directeur de these` M. Pierre-Evariste´ DAGAND Encadrant de these` M. Karthik BHARGAVAN Rapporteur Mme. Sandrine BLAZY Rapporteur Mme. Caroline COLLANGE Examinateur M. Xavier LEROY Examinateur M. Thomas PORNIN Examinateur M. Damien VERGNAUD Examinateur Abstract Bitslicing is a technique commonly used in cryptography to implement high-throughput parallel and constant-time symmetric primitives. However, writing, optimizing and pro- tecting bitsliced implementations by hand are tedious tasks, requiring knowledge in cryptography, CPU microarchitectures and side-channel attacks. The resulting programs tend to be hard to maintain due to their high complexity. To overcome those issues, we propose Usuba, a high-level domain-specific language to write symmetric cryptographic primitives. Usuba allows developers to write high-level specifications of ciphers with- out worrying about the actual parallelization: an Usuba program is a scalar description of a cipher, from which the Usuba compiler, Usubac, automatically produces vectorized bitsliced code. When targeting high-end Intel CPUs, the Usubac applies several domain-specific op- timizations, such as interleaving and custom instruction-scheduling algorithms. We are thus able to match the throughputs of hand-tuned assembly and C implementations of several widely used ciphers. Futhermore, in order to protect cryptographic implementations on embedded devices against side-channel attacks, we extend our compiler in two ways. First, we integrate into Usubac state-of-the-art techniques in higher-order masking to generate implemen- tations that are provably secure against power-analysis attacks. Second, we implement a backend for SKIVA, a custom 32-bit CPU enabling the combination of countermeasures against power-based and timing-based leakage, as well as fault injection. 3 R´esum´e Le bitslicing est une technique utilisee´ pour implementer´ des primitives cryptographiques efficaces et s’executant´ en temps constant. Cependant, ecrire,´ optimiser, et securiser´ manuellement des programmes bitslices´ est une tacheˆ fastidieuse, necessitant´ des con- naissances en cryptographie, en microarchitecture des processeurs et en attaques par canaux caches.´ Afin de remedier´ a` ces difficultes,´ nous proposons Usuba, un langage dedi´ e´ permettant d’implementer´ des algorithmes de cryptographie symetrique.´ Usuba permet aux developpeurs´ d’ecrire´ des specifications´ de haut niveau sans se soucier de leur parallelisation:´ un programme Usuba est une description scalaire d’une primitive, a` partir de laquelle le compilateur Usuba, Usubac, produit automatiquement un code bitslice´ et vectorise.´ Afin de produire du code efficace pour les processeurs haut de gamme, Usubac ap- plique plusieurs optimisations specialement´ conc¸ues pour les primitives cryptographiques, telles que l’entrelacement et l’ordonnancement d’instructions. Ainsi, le code produit par notre compilateur offre des performances comparables a` du code assembleur ou C opti- mise´ a` la main. De plus, afin de gen´ erer´ des implementations´ securis´ ees´ contre des attaques par canaux caches,´ nous proposons deux extensions de Usubac. Lorsque les attaques par analyse de courant sont un risque a` considerer,´ Usubac est capable de proteger´ les implementations´ qu’il produit a` l’aide de masquage booleen.´ Si, additionellement, des attaques par injec- tion de fautes doivent etreˆ prevenues,´ alors Usubac peut gen´ erer´ du code pour SKIVA, un processeur 32-bit offrant des instructions permettant de combiner des contre-mesures pour du code bitslice.´ 5 Acknowledgments Note: the names mentioned in this section are ordered using a non-disclosed algorithm. I recom- mend that you do not try to find a logic in the ordering; and especially that you do not wrongly assume that first is better than last. First, I would like to thank Sandrine Blazy and Karthik Bhargavan for reviewing my manuscript. Having spent months trying to produce a good manuscript, I was very pleased that you enjoyed reading it, and your positive reviews really warmed my heart. I would also like to thank Damien Vergnaud, Xavier Leroy, Caroline Collange and Thomas Pornin for being part of the jury. Of course, I am immensely grateful to Pierre-Evariste´ Dagand and Gilles Muller for supervising and guiding me throughout these 3 years. Pierre-Evariste,´ you always pushed me to strive for perfection, to leave no unturned stones, to try every possibilities, and to do the best work possible. Even though I probably wasn’t the most docile PhD student, I enjoyed working with you, and I learned a lot during these 3 years, in large part thanks to you. Or, to put it otherwise (just for you), I won’t say that I am unhappy to not have done this Ph.D. in another way. A lot of work in this thesis would not have been possible without collaborations with great researchers. Patrick Schaumont, Pantea Kiaei, Raphael¨ Wintersdorff, Sonia Bela¨ıd, Matthieu Rivain, Lionel Lacassagne, Karine Heydemann, I’m glad to have worked with you, and I hope this is neither the end of our collaborations, nor of our friendship. It was in particular really interesting to work with people from different sub-fields of cryptog- raphy: on one side, Patrick and Pantea, with your hardware expertise and low-level un- derstanding of side-channel attacks, and on the other side, Raphael,¨ Sonia and Matthieu, with your theoretical approach to the problem. While not an official collaborator, I’m infinitely grateful to my brother Hector who helped me visualize and interpret quite a lot of data from my benchmarks. I’d also like to thank some of my professors who inspired me to pursue this PhD and become a researcher: Jer´ emie´ Salvucci (although your last advice regarding the PhD was “run away”), Fred´ eric´ Peschanski, Antoine Genitrini, Emmanuel Chailloux, Julien Sopena, Quentin Meunier. Three years in the same office working on the same topic would certainly have been too much for me, if it weren’t for the amazing people surrounding me. In particular, Redha, Lucas, Antoine, Cedric´ and Pierre, whose eternal commitment to not being too serious made it a pleasure to work around them. And more generally, other PhD stu- dents, interns and post-doc from the Whisper, DELYS, MoVe and APR teams. There is more to life than just research (hope I’m not offending too many people with that sentence). And one of the major reasons that those last 3 years were great is the time I spent with my friends (not really sorted): Pierre B, Julien P, Juliette, Camille, Theo,´ Guillaume, Ninon, Jordi, Remy,´ Astrid and all the ones I’m forgetting. Thank you also to my cycling partners (Yacine, Jerome, Vicky and Antoine), and my Counter-Strike mates (Xili, Mando, John, Pasjordi, Arta, Thaur). I’d like to thank my family for always being there and for being so great. While I might not show it, I’m glad to have you in my life. 7 8 Finally, I would like to thank the most important person to me, Marjorie, without whom life would be so grim. Beyond being the most amazing girlfriend ever, you are the nicest and most caring person I know, and living with you is an unending pleasure. Thank you for helping me get through the finish line of this PhD (I would probably have taken a least a few more months without you pushing me to work), and thank you for being in my life ~ Contents 1 Introduction 13 1.1 Contributions . 21 1.2 Background . 22 1.3 Conclusion . 47 2 Usuba, Informally 49 2.1 Data Layout . 50 2.2 Syntax & Semantics, Informally . 51 2.3 Types . 54 2.4 Applications . 60 2.5 Conclusion . 68 3 Usuba, Greek Edition 73 3.1 Syntax . 74 3.2 Type System . 76 3.3 Monomorphization . 78 3.4 Semantics . 81 3.5 Usuba0 ....................................... 83 3.6 Translation to PseudoC .............................. 84 3.7 Conclusion . 90 4 Usubac 91 4.1 Frontend . 91 4.2 Backend . 96 4.3 Conclusion . 123 5 Performance Evaluation 125 5.1 Usuba vs. Reference Implementations . 125 5.2 Scalability . 130 5.3 Polymorphic Cryptographic Implementations . 132 5.4 Conclusion . 134 6 Protecting Against Power-based Side-channel Attacks 135 6.1 Masking . 136 6.2 Tornado ....................................... 136 6.3 Evaluation . 140 6.4 Conclusion . 147 7 Protecting Against Fault Attacks and Power-based Side-channels Attacks 151 7.1 Fault Attacks . 151 7.2 SKIVA . 152 7.3 Evaluation . 158 9 7.4 Conclusion . 161 8 Conclusion 163 8.1 Future Work . 164 Bibliography 175 Appendix A Backend optimizations 195 List of Figures 1.1 The RECTANGLE cipher . 19 1.2 Skylake’s pipeline . 23 1.3 Some SIMD instructions and intrinsics . 25 1.4 Our benchmark code . 27 1.5 Bitslicing example with 3-bit data and 4-bit registers . 28 1.6 The AVX-512 vpandd instruction .
Recommended publications
  • Maldet: How to Detect the Malware?
    International Journal of Computer Applications (0975 – 8887) Volume 123 – No.6, August 2015 MalDet: How to Detect the Malware? Samridhi Sharma Shabnam Parveen Department of CSE, Assistant Professor, Department of CSE, Seth Jai Parkash Mukand Lal Institute of Seth Jai Parkash Mukand Lal Institute of Engineering and Technology, Engineering and Technology, Harayana, India. Harayana, India. ABSTRACT to violate the privacy and security of a system. According to Malware is malicious software. This software used to the Symantec Internet Threat Report 499,811 new malware interrupt computer functionality. Protecting the internet is samples were received in the second half of 2007 detection. probably a enormous task that the contemporary epoch of So it becomes necessary to detect the malware. computers have seen. Day by day the threat levels large thus This paper is organized as follows: Section two has covered making the network susceptible to attacks. Many novel the recent state of the malware security. Section three strategies are brought into the field of cyber security to guard discusses about the malware classification, section four websites from attacks. But still malware has remained a grave presents the malware dectector. Section five studies reason of anxiety to web developers and server administrators. mechanism of malware detection, and finally section six With this war takes place amid the security community and explains malware normalization process. malicious software developers, the security specialists use all possible techniques, methods and strategies to discontinue and 2. RELATED WORK eliminate the threats while the malware developers utilize new Technology has turn out to be an building block in recent types of malwares that avoid implemented security features.
    [Show full text]
  • System Design for a Computational-RAM Logic-In-Memory Parailel-Processing Machine
    System Design for a Computational-RAM Logic-In-Memory ParaIlel-Processing Machine Peter M. Nyasulu, B .Sc., M.Eng. A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy Ottaw a-Carleton Ins titute for Eleceical and Computer Engineering, Department of Electronics, Faculty of Engineering, Carleton University, Ottawa, Ontario, Canada May, 1999 O Peter M. Nyasulu, 1999 National Library Biôiiothkque nationale du Canada Acquisitions and Acquisitions et Bibliographie Services services bibliographiques 39S Weiiington Street 395. nie WeUingtm OnawaON KlAW Ottawa ON K1A ON4 Canada Canada The author has granted a non- L'auteur a accordé une licence non exclusive licence allowing the exclusive permettant à la National Library of Canada to Bibliothèque nationale du Canada de reproduce, ban, distribute or seU reproduire, prêter, distribuer ou copies of this thesis in microform, vendre des copies de cette thèse sous paper or electronic formats. la forme de microficbe/nlm, de reproduction sur papier ou sur format électronique. The author retains ownership of the L'auteur conserve la propriété du copyright in this thesis. Neither the droit d'auteur qui protège cette thèse. thesis nor substantial extracts fkom it Ni la thèse ni des extraits substantiels may be printed or otherwise de celle-ci ne doivent être imprimés reproduced without the author's ou autrement reproduits sans son permission. autorisation. Abstract Integrating several 1-bit processing elements at the sense amplifiers of a standard RAM improves the performance of massively-paralle1 applications because of the inherent parallelism and high data bandwidth inside the memory chip.
    [Show full text]
  • Talking Book Topics November-December 2017
    Talking Book Topics November–December 2017 Volume 83, Number 6 Need help? Your local cooperating library is always the place to start. For general information and to order books, call 1-888-NLS-READ (1-888-657-7323) to be connected to your local cooperating library. To find your library, visit www.loc.gov/nls and select “Find Your Library.” To change your Talking Book Topics subscription, contact your local cooperating library. Get books fast from BARD Most books and magazines listed in Talking Book Topics are available to eligible readers for download on the NLS Braille and Audio Reading Download (BARD) site. To use BARD, contact your local cooperating library or visit nlsbard.loc.gov for more information. The free BARD Mobile app is available from the App Store, Google Play, and Amazon’s Appstore. About Talking Book Topics Talking Book Topics, published in audio, large print, and online, is distributed free to people unable to read regular print and is available in an abridged form in braille. Talking Book Topics lists titles recently added to the NLS collection. The entire collection, with hundreds of thousands of titles, is available at www.loc.gov/nls. Select “Catalog Search” to view the collection. Talking Book Topics is also online at www.loc.gov/nls/tbt and in downloadable audio files from BARD. Overseas Service American citizens living abroad may enroll and request delivery to foreign addresses by contacting the NLS Overseas Librarian by phone at (202) 707-9261 or by email at [email protected]. Page 1 of 128 Music scores and instructional materials NLS music patrons can receive braille and large-print music scores and instructional recordings through the NLS Music Section.
    [Show full text]
  • 9781412929998.Pdf
    Duck-3494-Prelims.qxd 1/16/2007 10:39 AM Page i Human Relationships Duck-3494-Prelims.qxd 1/16/2007 10:39 AM Page ii Duck-3494-Prelims.qxd 1/16/2007 10:39 AM Page iii Human Relationships 4th Edition Steve Duck Duck-3494-Prelims.qxd 1/16/2007 10:39 AM Page iv © Steve Duck 2007 First published 2007 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. SAGE Publications Ltd 1 Oliver’s Yard 55 City Road London EC1Y 1SP SAGE Publications Inc. 2455 Teller Road Thousand Oaks, California 91320 SAGE Publications India Pvt Ltd B 1/I 1 Mohan Cooperative Industrial Area Mathura Road, New Delhi 110 044 India SAGE Publications Asia-Pacific Pte Ltd 33 Pekin Street #02-01 Far East Square Singapore 048763 British Library Cataloguing in Publication data A catalogue record for this book is available from the British Library ISBN 978-1-4129-2998-1 ISBN 978-1-4129-2999-8 (pbk) Library of Congress Control Number: 2006929060 Typeset by C&M Digitals (P) Ltd, Chennai, India Printed in Great Britain by The Alden Press, Witney Printed on paper from sustainable resources Duck-3494-Prelims.qxd 1/16/2007
    [Show full text]
  • X86-64 Calling Conventions
    CSE 351: The Hardware/Software Interface Section 4 Procedure calls Procedure calls In x86 assembly, values are passed to function calls on the stack Perks: Concise, easy to remember Drawbacks: Always requires memory accesses In x86-64 assembly, values are passed to function calls in registers Perks: Less wasted space, faster Drawbacks: Potentially requires a lot of register manipulation 2/23/2014 2 x86 calling conventions Simply push arguments onto the stack in order, then “call” the function! Suppose we define the following function: int sum(int a, int b) { return a + b; } (See also sum.c from the provided code) 2/23/2014 3 x86 calling conventions int sum(int a, int b) { return a + b; } In assembly, we have something like this: sum: pushl %ebp # Save base pointer movl %esp, %ebp # Save stack pointer movl 12(%ebp), %eax # Load b movl 8(%ebp), %edx # Load a addl %edx, %eax # Compute a + b popl %ebp # Restore base pointer ret # Return 2/23/2014 4 x86 calling conventions What is happening with %ebp and %esp? pushl %ebp The base pointer %ebp is the address of the caller, which is the location to which “ret” returns. The function pushes it into the stack so that it won’t be overwritten movl %esp, %ebp Functions often shift the stack pointer to allocate temporary stack space, so this instruction makes a backup of the original location. In the body of the function, %ebp is now the original start of the stack ret When sum() returns, execution picks up at the stored base pointer address.
    [Show full text]
  • For Those Who Won't Fail Themselves
    ~For Those Who Won’t Fail Themselves~ Black Sun Imperium 2 Mine is a Heart of Carnelian, Crimson as Murder on a Holy Day. Mine is a Heart of Corneal, the Gnarled Roots of a Dogwood and the Bursting of Flowers. I am the Broken Wax Seal on my Lover's Letters. I am the Phoenix, the Fiery Sun, Consuming and Resuming Myself. I Will what I Will. Mine is a Heart of Carnelian, Blood Red as the Crest of a Phoenix. ~To My Beloved Master Enlil Hiro-kura Hoshigaoka~ 3 Phagos Angmar Hoshigaoka Black Sun Imperium Bringing Forth the Noesis of the Ancient Darklords 4 Black Sun Imperium Index Foreword ………………………………….. 7 Preface …………………………………... 9 Imperium of the Black Sun …………………………………On 16 The Deathtrap of Mental Inertia: The Yoke of the Herd ………………………………. Two 26 The Black Sun ……………………………… . Thr 35 Darklords in Disguise ……………………………… Four 55 The 5 Second Rule ……………………………… Fi 91 The Abysmal Bridge ………………………………… Six 100 The Bridge of Manifestation ………………………………… S n 122 Prelude to the Styx ………………………………… ight 130 The Styx ov Power ………………………………… Nin 144 The Styx ov Wealth ………………………………… Ten 199 The Styx ov Eminence ………………………………… le n 220 The Styx ov Strife ………………………………… Twl 252 The Nine Pointed Star Black Sun Working ………………………………… . 13 266 Glossary ………………………………….. 274 List of Resources …………………………………… 277 5 6 Foreword Foreword from the Editor When first I was asked to proof read and correct this book, I had no idea what to expect; I myself was an open book as far as this subject was concerned. Always a fan of Darkside characters but never having done too much research into the subject, I had no formed opinions and absolutely no idea there was a culture dedicated to living life in this manner.
    [Show full text]
  • Block Ciphers: Fast Implementations on X86-64 Architecture
    Block Ciphers: Fast Implementations on x86-64 Architecture University of Oulu Department of Information Processing Science Master’s Thesis Jussi Kivilinna May 20, 2013 Abstract Encryption is being used more than ever before. It is used to prevent eavesdropping on our communications over cell phone calls and Internet, securing network connections, making e-commerce and e-banking possible and generally hiding information from unwanted eyes. The performance of encryption functions is therefore important as slow working implementation increases costs. At server side faster implementation can reduce the required capacity and on client side it can lower the power usage. Block ciphers are a class of encryption functions that are typically used to encrypt large bulk data, and thus make them a subject of many studies when endeavoring greater performance. The x86-64 architecture is the most dominant processor architecture in server and desktop computers; it has numerous different instruction set extensions, which make the architecture a target of constant new research on fast software implementations. The examined block ciphers – Blowfish, AES, Camellia, Serpent and Twofish – are widely used in various applications and their different designs make them interesting objects of investigation. Several optimization techniques to speed up implementations have been reported in previous research; such as the use of table look-ups, bit-slicing, byte-slicing and the utilization of “out-of-order” scheduling capabilities. We examine these different techniques and utilize them to construct new implementations of the selected block ciphers. Focus with these new implementations is in modes of operation which allow multiple blocks to be processed in parallel; such as the counter mode.
    [Show full text]
  • Works by Tom Stoppard
    Works by Tom Stoppard ‘A Play In Three Acts’ 56n, 463n, 581n, 582n Dogg’s Hamlet, Cahoot’s Macbeth xxiv, A Separate Peace 499, 528n xxvi, 55, 176, 354n, 393, 423, 557, 582 A Walk on the Water 516 After Magritte xxxi, 5, 8–11, 18, 39, 55–56, Empire of the Sun 7, 414, 440n 437, 470, 491, 504, 504n, 535n, 581 Enigma 77 Albert’s Bridge 155, 551 Enter a Free Man 254, 399, 454, 516n, 524, Anna Karenina 257 560, 560n, 583 Another Moon Called Earth 37n, 121, 156, 239, Every Good Boy Deserves Favour xxv–xxvi, 239n, 320, 323, 441, 443, 455, 582 xxix, 50–52, 55, 56, 59n, 90, 121, 136, Arcadia xxii, xxiv–xvii, 15, 17, 23, 37, 182–184, 235–237, 286–289, 294, 298, 38–41, 55, 74–79, 88, 88n, 118n, 126, 327n, 381–384, 437, 447, 473, 479, 136–152, 179, 185, 212, 222, 231, 234, 238, 490, 496n, 582 248, 260, 309, 313, 315, 322, 337, 347, 350–352, 369, 377n, 398n, 399, 407, ‘First Person’ 244n 412, 418, 424, 436n, 441, 451, 470, 473, ‘Freedom but thousands are still 488–489, 505, 514, 521, 523, captive’ 277n 531, 552, 566, 568–572, 575, 578–579, Funny Man 458, 559 581, 583 Article on James Thurber 5 Galileo 144, 258–259, 260, 471 Artist Descending a Staircase xxiii, xxvi, 11–14, 26, 62, 63, 64, 64n, 66, 118, 120–121, Hapgood xxiii–xxvi, xxvii–xxix, 15, 40, 153n, 157–159, 236n, 238, 326, 329, 352, 52, 55, 56, 127, 129, 135–137, 143, 145, 363, 370–372, 423, 533, 551, 560–565, 222n, 238, 258, 259n, 260, 309, 343, 347, 579, 582 349, 351, 362, 365–369, 398n, 400–407, 400n, 423–424, 426, 428, 471, 478, 488, ‘But For The Middle Classes’ 251n, 261n, 492–495, 502–503,
    [Show full text]
  • Sample Applications User Guides Release 20.05.0
    Sample Applications User Guides Release 20.05.0 May 26, 2020 CONTENTS 1 Introduction to the DPDK Sample Applications1 1.1 Running Sample Applications...............................1 1.2 The DPDK Sample Applications..............................1 2 Compiling the Sample Applications3 2.1 To compile all the sample applications...........................3 2.2 To compile a single application..............................3 2.3 To cross compile the sample application(s)........................4 3 Command Line Sample Application5 3.1 Overview..........................................5 3.2 Compiling the Application.................................5 3.3 Running the Application..................................6 3.4 Explanation.........................................6 4 Ethtool Sample Application8 4.1 Compiling the Application.................................8 4.2 Running the Application..................................8 4.3 Using the application....................................8 4.4 Explanation.........................................9 4.5 Ethtool interface......................................9 5 Hello World Sample Application 11 5.1 Compiling the Application................................. 11 5.2 Running the Application.................................. 11 5.3 Explanation......................................... 11 6 Basic Forwarding Sample Application 13 6.1 Compiling the Application................................. 13 6.2 Running the Application.................................. 13 6.3 Explanation........................................
    [Show full text]
  • An FPGA-Accelerated Embedded Convolutional Neural Network
    Master Thesis Report ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network (edit) (edit) 1000ch 1000ch FPGA 1000ch Network Analysis Network Analysis 2x512 > 1024 2x512 > 1024 David Gschwend [email protected] SqueezeNet v1.1 b2a ext7 conv10 2x416 > SqueezeNet SqueezeNet v1.1 b2a ext7 conv10 2x416 > SqueezeNet arXiv:2005.06892v1 [cs.CV] 14 May 2020 Supervisors: Emanuel Schmid Felix Eberli Professor: Prof. Dr. Anton Gunzinger August 2016, ETH Zürich, Department of Information Technology and Electrical Engineering Abstract Image Understanding is becoming a vital feature in ever more applications ranging from medical diagnostics to autonomous vehicles. Many applications demand for embedded solutions that integrate into existing systems with tight real-time and power constraints. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity. Embedded CNNs thus call for small and efficient, yet very powerful computing platforms. This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip. The ZynqNet Embedded CNN is designed for image classification on ImageNet and consists of ZynqNet CNN, an optimized and customized CNN topology, and the ZynqNet FPGA Accelerator, an FPGA-based architecture for its evaluation. ZynqNet CNN is a highly efficient CNN topology. Detailed analysis and optimization of prior topologies using the custom-designed Netscope CNN Analyzer have enabled a CNN with 84.5 % top-5 accuracy at a computational complexity of only 530 million multiply- accumulate operations. The topology is highly regular and consists exclusively of convolu- tional layers, ReLU nonlinearities and one global pooling layer.
    [Show full text]
  • Advanced Synthesis Cookbook
    Advanced Synthesis Cookbook Advanced Synthesis Cookbook 101 Innovation Drive San Jose, CA 95134 www.altera.com MNL-01017-6.0 Document last updated for Altera Complete Design Suite version: 11.0 Document publication date: July 2011 © 2011 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off. and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective holders as described at www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera’s standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services. Advanced Synthesis Cookbook July 2011 Altera Corporation Contents Chapter 1. Introduction Blocks and Techniques . 1–1 Simulating the Examples . 1–1 Using a C Compiler . 1–2 Chapter 2. Arithmetic Introduction . 2–1 Basic Addition . 2–2 Ternary Addition . 2–2 Grouping Ternary Adders . 2–3 Combinational Adders . 2–3 Double Addsub/ Basic Addsub . 2–3 Two’s Complement Arithmetic Review . 2–4 Traditional ADDSUB Unit . 2–4 Compressors (Carry Save Adders) . 2–5 Compressor Width 6:3 .
    [Show full text]
  • X86 Assembly Language Reference Manual
    x86 Assembly Language Reference Manual Part No: 817–5477–11 March 2010 Copyright ©2010 Oracle and/or its affiliates. All rights reserved. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related software documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is applicable: U.S. GOVERNMENT RIGHTS Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are “commercial computer software” or “commercial technical data” pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms setforth in the applicable Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial Computer Software License (December 2007). Oracle USA, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
    [Show full text]