CompilerCompiler OptimizationsOptimizations forfor PerformancePerformance

WeiWei LiLi

Stanford University CS243 Winter 2006 1 ExampleExample I:I: ComputerComputer GamesGames

„ Pocket Quake* II - Compiler Benchmark The demo compares two compiled versions of the same benchmark. The better optimized version provides about 2X performance boost on Pocket Quake II.

Click object )in Powerpoint presentation mode to start film Less optimizations More optimizations

Stanford University CS243 Winter 2006 2 ExampleExample II:II: WeatherWeather ForecastForecast

Adding OpenMP, Itanium® 2 4 CPUs Weather forecast application 10x

Compiler directives, minor source changes, Itanium® 2 4 CPU /O3 – Enable advanced optimizations 4x Legacy system Itanium® 2 4 CPU 8 CPUs 2.5x Compiler Original source, default options, Itanium® 2 4 CPU Compiler 1x 1.2x Compiler Compiler

Optimizations may improve performance significantly.

Stanford University CS243 Winter 2006 3 CompilerCompiler FeaturesFeatures (example)(example)

„ O2:O2: scalarscalar optimizationsoptimizations andand basicbasic schedulingscheduling „ O3:O3: optimizationsoptimizations forfor technicaltechnical computingcomputing applicationsapplications (loopy(loopy codes)codes) „ O1:O1: optimizationsoptimizations forfor serverserver applicationsapplications (straight(straight--lineline andand branchybranchy codescodes withwith flatflat profile)profile)

Stanford University CS243 Winter 2006 4 CompilerCompiler FeaturesFeatures (example)(example)

„ PGOPGO (profile(profile--guidedguided optimizations):optimizations): usingusing profileprofile toto guideguide optimizations.optimizations. „ IPOIPO (inter(inter--proceduralprocedural optimizations):optimizations): multimulti--filefile inlininginlining,, interproceduralinterprocedural optimizations.optimizations. „ Parallel:Parallel: automaticautomatic parallelizationparallelization „ OpenMPOpenMP:: exploitingexploiting threadthread--LevelLevel parallelismparallelism „ OptimizationOptimization report:report: compilercompiler optimizationsoptimizations performedperformed oror notnot donedone

Stanford University CS243 Winter 2006 5 CompilerCompiler OrganizationOrganization (example)(example) C++ FORTRAN Front End Front End

Profiler

Interprocedural analysis and optimizations: inlining, constant prop, whole program detect, mod/ref, points-to

Loop optimizations: data deps, prefetch, vectorizer, Disambiguation: unroll/interchange/fusion/dist, auto-parallel/OpenMP types, array, pointer, structure, directives Global scalar optimizations: partial redundancy elim, dead store elim, , dead code elim

Code generation: predication, , global scheduling, , code emit

Stanford University CS243 Winter 2006 6 Example:Example: SoftwareSoftware PipeliningPipelining

L1: ld4 r4 = [r5], 4; // 0 add r7 = r4, r9; // 2 st4 [r6] = r7, 4 // 3 br.cloop L1;; 4 cycles per iteration

Stanford University CS243 Winter 2006 7 Example:Example: SoftwareSoftware PipeliningPipelining • Exploit parallelism across loop iterations.

ld ld L1: ld4 r4 = [r5], 4; // 0 add ld add r7 = r4, r9; // 2 st add ld st4 [r6] = r7, 4 // 3 st add ld br.cloop L1;; st add st add 4 cycles per iteration st

Stanford University CS243 Winter 2006 8 Example:Example: SoftwareSoftware PipeliningPipelining • Exploit parallelism across loop iterations.

ld ld L1: ld4 r4 = [r5], 4; // 0 add ld add r7 = r4, r9; // 2 st add ld st4 [r6] = r7, 4 // 3 st add ld br.cloop L1;; st add st add 4 cycles per iteration st L1: (p16) ld4 r32 = [r5], 4 // cycle 0 (p18) add r35 = r34, r9 // cycle 0 (p19) st4 [r6] = r36, 4 // cycle 0 br.ctop L1;; // cycle 0 1 cycle per iteration (with architecture support)

Stanford University CS243 Winter 2006 9 CourseCourse EmphasisEmphasis

„ CompilerCompiler foundationfoundation „ TheoreticalTheoretical frameworksframeworks „ AlgorithmsAlgorithms „ ExperimentationExperimentation „ HandsHands--onon experienceexperience „ NonNon--goal:goal: howhow toto buildbuild aa completecomplete optimizingoptimizing compilercompiler „ ExposureExposure toto realreal worldworld impactimpact „ HowHow theythey workedworked inin practicepractice

Stanford University CS243 Winter 2006 10