A Scalable, Numerically Stable, High-performance Tridiagonal Solver using GPUs

Li-Wen Chang, John A. Stratton, Hee-Seok Kim, and Wen-Mei W. Hwu
Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Urbana, Illinois, USA
flchang20, stratton, kim868, [email protected]

of a single system at a time. Assuming that the systems are all large enough to require partitioning, the amount of communication and transfer overhead will grow with the number of times each system is partitioned. The number of partitions will grow directly in proportion to the number of unrelated systems being solved simultaneously on the same device. All else being equal, we should then prefer to send large integrated systems to the GPU for minimal transfer overhead. Internally, the GPU solver is then free to perform further partitioning as necessary to facilitate parallel execution, which does not require external communication.

Abstract—In this paper, we present a scalable, numerically stable, high-performance tridiagonal solver. The solver is based on the SPIKE algorithm for partitioning a large matrix into small independent matrices, which can be solved in parallel. For each small matrix, our solver applies a general 1-by-1 or 2-by-2 diagonal pivoting algorithm, which is also known to be numerically stable. Our paper makes two major contributions. First, our solver is the first numerically stable tridiagonal solver for GPUs. Our solver provides comparable quality of stable solutions to Intel MKL and Matlab, at speed comparable to the GPU tridiagonal solvers in existing packages like CUSPARSE. It is also scalable to multiple GPUs and CPUs. Second, we present and analyze two key optimization strategies for our solver: a high-