RICE UNIVERSITY
Automatic and Interactive Parallelization
by
c
Kathryn S M Kinley
A Thesis Submitted
in Partial Fulfillment of the
Requirements for the Degree
Do ctor of Philosophy
Approved Thesis Committee
Ken Kennedy
Noah Harding Professor chair
Computer Science
Keith D Co op er Asso ciate Professor
Computer Science
Don H Johnson Professor
Electrical and Computer Engineering
Danny C Sorensen Professor
Mathematical Sciences
Houston Texas
March
Automatic and Interactive Parallelization
c
Kathryn S M Kinley
Abstract
The goal of this dissertation is to give programmers the ability to achieve high p er
formance by fo cusing on developing parallel algorithms rather than on architecture
sp eci c details The advantages of this approach also include program p ortability and
legibility To achieve high p erformance we provide automatic compilation techniques
that tailor parallel algorithms to shared memory multipro cessors with lo cal caches
and a common bus In particular the compiler maps complete applications onto the
sp eci cs of a machine exploiting b oth parallelism and memory
To optimize complete applications we develop novel general algorithms to trans
form lo ops that contain arbitrary conditional control ow In addition we provide new
interpro cedural transformations which enable optimization across pro cedure b ound
aries These techniques provide the basis for a robust automatic parallelizing algo
rithm that is applicable to complete programs
The algorithm for automatic parallel co de generation takes into consideration the
interaction of parallelism and data lo cality as well as the overhead of parallelism The
algorithm is based on a simple cost mo del that accurately predicts cache line reuse
from multiple accesses to the same memory lo cation and from consecutive accesses
The optimizer uses this mo del to improve data lo cality It also uses the mo del to
discover and intro duce e ective parallelism that complements the b ene ts of data
lo cality The optimizer further improves the e ectiveness of parallelism by seeking to
increase its granularity Parallelism is intro duced only when granularity is su cient
to overcome its asso ciated costs
The algorithm for parallel co de generation is shown to b e e cient and several of its
comp onent algorithms are proven optimal The e cacy of the optimizer is illustrated
with exp erimental results In most cases it is very e ective and either achieves or
improves the p erformance of hand crafted parallel programs When p erformance is
not satisfactory we provide an interactive parallel programming to ol which combines
compiler analysis and algorithms with human exp ertise
Acknowledgments
Ken Kennedy provided me with the three most imp ortant elements of supp ort in
graduate scho ol intellectual p olitical and nancial In addition Ken Keith Co op er
and Linda Torczon fostered a research atmosphere and working environment whose
b ene ts are untold I would also like to recognize the other memb ers of my committee
Keith Co op er Don Johnson and Danny Sorensen Keith has b een an endless source
of encouragement and wisdom throughout my graduate career Don Johnson gave
me my rst taste of research and ho oked me for life
I am fortunate that many of my fellow graduate students and friends supp orted my
research intellectually emotionally and with implementations I would esp ecially like
to thank Chau Wen Tseng Marina Kalem Mary Hall Paul Havlak Nat McIntosh
Preston Briggs Ben Chase and the entire compiler group
I am extremely gratefully to my entire family As always my parents were a
constant source of love and encouragement To my husband Scotty Strahan I hop e
I am the ro ck for you that you have b een for me
A little Madness in the Spring
Is wholesome even for the King Emily Dickinson
Contents
Abstract ii
Acknowledgments iii
List of Illustrations ix