Received December 21, 2018, accepted January 14, 2019, date of publication January 31, 2019, date of current version February 14, 2019. Digital Object Identifier 10.1109/ACCESS.2019.2895541 A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting SANDRA CATALÁN1, JOSÉ R. HERRERO 2, ENRIQUE S. QUINTANA-ORTÍ1, RAFAEL RODRÍGUEZ-SÁNCHEZ3, AND ROBERT VAN DE GEIJN4 1Departarmento Ingeniería y Ciencia de Computadores, Universidad Jaume I, 12071 Castellón de la Plana, Spain 2Departamento d'Arquitectura de Computadors, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain 3Departamento de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, 28040 Madrid, Spain 4Department of Computer Science, Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX 78712, USA Corresponding author: José R. Herrero (
[email protected]) This work was supported in part by the Spanish Ministerio de Economía y Competitividad under Project TIN2014-53495-R, Project TIN2015-65316-P, and Project TIN2017-82972-R, in part by the H2020 EU FETHPC ``INTERTWinE'' under Project 671602, in part by the Generalitat de Catalunya under Project 2017-SGR-1414, and in part by the NSF under Grant ACI-1550493. ABSTRACT We propose two novel techniques for overcoming load-imbalance encountered when imple- menting so-called look-ahead mechanisms in relevant dense matrix factorizations for the solution of linear systems. Both techniques target the scenario where two thread teams are created/activated during the factorization, with each team in charge of performing an independent task/branch of execution. The first technique promotes worker sharing (WS) between the two tasks, allowing the threads of the task that completes first to be reallocated for use by the costlier task.