HP MLIB User's Guide

Seventh Edition
HP Part No. B6061-96027 and B6061-96028
HP MLIB
December 2004
Printed in USA

Edition: Seventh
Document Numbers: B6061-96027 and B6061-96028
Remarks: Released December 2004 with HP MLIB software version 9.0. User’s Guide split into two volumes with this release.

Edition: Sixth
Document Number: B6061-96023
Remarks: Released September 2003 with HP MLIB software version 8.50.

Edition: Fifth
Document Number: B6061-96020
Remarks: Released September 2002 with HP MLIB software version 8.30.

Edition: Fourth
Document Number: B6061-96017
Remarks: Released June 2002 with HP MLIB software version 8.20.

Edition: Third
Document Number: B6061-96015
Remarks: Released September 2001 with HP MLIB software version 8.10.

Edition: Second
Document Number: B6061-96012
Remarks: Released June 2001 with HP MLIB software version 8.00.

Edition: First
Document Number: B6061-96010
Remarks: Released December 1999 with HP MLIB software version B.07.00.

This HP MLIB User’s Guide contains both HP VECLIB and HP LAPACK information. This document supersedes both the HP MLIB VECLIB User’s Guide, fifth edition (B6061-96006) and the HP MLIB LAPACK User’s Guide, sixth edition (B6061-96005).

Notice

Copyright 1979-2004 Hewlett-Packard Development Company. All Rights Reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.

The information contained in this document is subject to change without notice.

Hewlett-Packard makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for incidental or consequential damages in connection with the furnishing, performance or use of this material.

Table of Contents

Preface
    VECLIB
    LAPACK
    ScaLAPACK
    SuperLU
    SOLVERS
    VMATH
    Purpose and audience
    Organization
    Notational conventions
    Documentation resources

Part 1

1 Introduction to VECLIB
    Overview
    Chapter objectives
    Standardization
    Accessing VECLIB
    Compiling and linking (VECLIB)
    Problem with +ppu compatibility and duplicated symbols
    Optimization
    Parallel processing
    Linking for parallel or non parallel processing
    Controlling VECLIB parallelism at runtime
    Performance benefits
    OpenMP-based nested parallelism
    Message passing-based nested parallelism
    Default CPS library stack is too small for MLIB
    Default Pthread library stack is too small for MLIB
    Roundoff effects
    Data types and precision
    VECLIB naming convention
    Data type and byte length
    Operator arguments
    Error handling
    Low-level subprograms
    High-level subprograms
    Troubleshooting
    HP MLIB man pages

2 Basic Vector Operations
    Overview
    Chapter objectives
    Associated documentation
    What you need to know to use vector subprograms
    BLAS storage conventions
    BLAS indexing conventions
    Operator arguments in the BLAS Standard
    Representation of a permutation matrix
    Representation of a Householder matrix
    Subprograms for basic vector operations
    ISAMAX/IDAMAX/IIAMAX/ICAMAX/IZAMAX Index of maximum of magnitudes
    ISAMIN/IDAMIN/IIAMIN/ICAMIN/IZAMIN Index of minimum of magnitudes
    ISCTxx/IDCTxx/IICTxx/ICCTxx/IZCTxx Count selected vector elements
    ISMAX/IDMAX/IIMAX Index of maximum element of vector
    ISMIN/IDMIN/IIMIN Index of minimum element of vector
    ISSVxx/IDSVxx/IISVxx/ICSVxx/IZSVxx Search vector for element
    SAMAX/DAMAX/IAMAX/SCAMAX/DZAMAX Maximum of magnitudes
    SAMIN/DAMIN/IAMIN/SCAMIN/DZAMIN Minimum of magnitudes
    SASUM/DASUM/IASUM/SCASUM/DZASUM Sum of magnitudes
    SAXPY/DAXPY/CAXPY/CAXPYC/ZAXPY/ZAXPYC Elementary vector operation
    SAXPYI/DAXPYI/CAXPYI/ZAXPYI Sparse elementary vector operation
    SCLIP/DCLIP/ICLIP Two sided vector clip
    SCLIPL/DCLIPL/ICLIPL Left sided vector clip
    SCLIPR/DCLIPR/ICLIPR Right sided vector clip
    SCOPY/DCOPY/ICOPY/CCOPY/CCOPYC/ZCOPY/ZCOPYC Copy vector
    SDOT/DDOT/CDOTC/CDOTU/ZDOTC/ZDOTU Dot product
    SDOTI/DDOTI/CDOTCI/CDOTUI/ZDOTCI/ZDOTUI Sparse dot product
    SFRAC/DFRAC Extract fractional parts
    SGTHR/DGTHR/IGTHR/CGTHR/ZGTHR Gather sparse vector
    SGTHRZ/DGTHRZ/IGTHRZ/CGTHRZ/ZGTHRZ Gather and zero sparse vector
    SLSTxx/DLSTxx/ILSTxx/CLSTxx/ZLSTxx List selected vector elements
    SMAX/DMAX/IMAX Maximum of vector
    SMIN/DMIN/IMIN Minimum of vector
    SNRM2/DNRM2/SCNRM2/DZNRM2 Euclidean norm
    SNRSQ/DNRSQ/SCNRSQ/DZNRSQ Euclidean norm squared
    SRAMP/DRAMP/IRAMP Generate linear ramp
    SROT/DROT/CROT/CSROT/ZROT/ZDROT Apply Givens rotation
    SROTG/DROTG/CROTG/ZROTG Construct Givens rotation
    SROTI/DROTI Apply sparse Givens rotation
    SROTM/DROTM Apply modified Givens rotation
    SROTMG/DROTMG Construct modified Givens rotation
    SRSCL/DRSCL/CRSCL/CSRSCL/ZRSCL/ZDRSCL Scale vector
    SSCAL/DSCAL/CSCAL/CSSCAL/CSCALC/ZSCAL/ZDSCAL/ZSCALC Scale vector
    SSCTR/DSCTR/ISCTR/CSCTR/ZSCTR Scatter sparse vector
    SSUM/DSUM/ISUM/CSUM/ZSUM Vector sum
    SSWAP/DSWAP/ISWAP/CSWAP/ZSWAP Swap two vectors
    SWDOT/DWDOT/CWDOTC/CWDOTU/ZWDOTC/ZWDOTU Weighted dot product
    SZERO/DZERO/IZERO/CZERO/ZZERO Clear vector
    F_SAMAX_VAL/F_DAMAX_VAL/F_CAMAX_VAL/F_ZAMAX_VAL Maximum absolute value and location
    F_SAMIN_VAL/F_DAMIN_VAL/F_CAMIN_VAL/F_ZAMIN_VAL Minimum absolute value
