Fast Convolutional Neural Networks on FPGAs with hls4ml

Thea Aarrestad, Vladimir Loncar, Nicolò Ghielmetti, Maurizio Pierini, Sioni Summers
European Organization for Nuclear Research (CERN), CH-1211 Geneva 23, Switzerland

Jennifer Ngadiuba
California Institute of Technology, Pasadena, CA 91125, USA

Christoffer Petersson, Hampus Linander
Zenseact, Gothenburg, 41756, Sweden

Yutaro Iiyama
ICEPP, University of Tokyo, Tokyo, Japan

Giuseppe Di Guglielmo
Columbia University, New York, NY 10027, USA

Javier Duarte
University of California San Diego, La Jolla, CA 92093, USA

Philip Harris, Dylan Rankin
Massachusetts Institute of Technology, Cambridge, MA 02139, USA

Sergo Jindariani, Kevin Pedro, Nhan Tran
Fermi National Accelerator Laboratory, Batavia, IL 60510, USA

Mia Liu
Purdue University, West Lafayette, IN 47907, USA

Edward Kreinar
HawkEye360, Herndon, VA 20170, USA

Zhenbin Wu
University of Illinois at Chicago, Chicago, IL 60607, USA

Duc Hoang
Rhodes College, Memphis, TN 38112, USA

April 30, 2021
arXiv:2101.05108v2 [cs.LG] 29 Apr 2021

ABSTRACT

We introduce an automated tool for deploying ultra-low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond-latency applications like those at the CERN Large Hadron Collider. Using benchmark models trained on the Street View House Numbers dataset, we demonstrate various methods of model compression to fit the computational constraints of a typical FPGA device used in the trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy.
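The two compression techniques named in the abstract can be illustrated with a minimal NumPy sketch. This is not the hls4ml or QKeras implementation; `magnitude_prune` and `quantize_fixed_point` are hypothetical helpers showing, respectively, magnitude-based weight pruning to a target sparsity and rounding weights to a signed fixed-point format in the style of HLS `ap_fixed<W,I>` (W total bits, I integer bits including sign).

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries so that a fraction
    `sparsity` of the weights become exactly zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_fixed_point(weights, total_bits=8, int_bits=1):
    """Round weights to a signed fixed-point grid, ap_fixed<W,I>-style:
    `int_bits` integer bits (including sign), the rest fractional."""
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    max_val = 2.0 ** (int_bits - 1) - 1.0 / scale   # largest representable value
    min_val = -2.0 ** (int_bits - 1)                # most negative representable value
    return np.clip(np.round(weights * scale) / scale, min_val, max_val)
```

For example, pruning `[0.5, -0.1, 0.9, 0.05]` at 50% sparsity zeroes the two smallest-magnitude weights, and quantizing with the default `ap_fixed<8,1>`-style grid snaps every weight onto multiples of 2^-7 clipped to [-1, 127/128]. In training-aware variants of both techniques (as studied in the paper), the mask and the quantization are applied during training so the network adapts to them, rather than as a one-shot post-processing step like this sketch.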