Automatically Adapting Programs for Mixed-Precision Floating-Point Computation Michael O. Lam and Bronis R. de Supinski and Jeffrey K. Hollingsworth Matthew P. LeGendre Dept. of Computer Science Center for Applied Scientific Computing University of Maryland Lawrence Livermore National Laboratory College Park, MD, USA Livermore, CA, USA
[email protected],
[email protected] [email protected],
[email protected] ABSTRACT IEEE standard provides for different levels of precision by As scientific computation continues to scale, efficient use of varying the field width, with the most common widths being floating-point arithmetic processors is critical. Lower preci- 32 bits (\single" precision) and 64 bits (\double" precision). sion allows streaming architectures to perform more opera- Figure 1 graphically represents these formats. tions per second and can reduce memory bandwidth pressure Double-precision arithmetic generally results in more ac- on all architectures. However, using a precision that is too curate computations, but with several costs. The main cost low for a given algorithm and data set leads to inaccurate re- is the higher memory bandwidth and storage requirement, sults. In this paper, we present a framework that uses binary which are twice that of single precision. Another cost is instrumentation and modification to build mixed-precision the reduced opportunity for parallelization, such as on the configurations of existing binaries that were originally devel- x86 architecture, where packed 128-bit XMM registers can oped to use only double-precision. This framework allows only hold and operate on two double-precision numbers si- developers to explore mixed-precision configurations with- multaneously compared to four numbers with single preci- out modifying their source code, and it enables automatic sion.