Compiler Optimizations for Performance

Compiler Optimizations for Performance

CompilerCompiler OptimizationsOptimizations forfor PerformancePerformance WeiWei LiLi Stanford University CS243 Winter 2006 1 ExampleExample I:I: ComputerComputer GamesGames Pocket Quake* II - Compiler Benchmark The demo compares two compiled versions of the same benchmark. The better optimized version provides about 2X performance boost on Pocket Quake II. Click object )in Powerpoint presentation mode to start film Less optimizations More optimizations Stanford University CS243 Winter 2006 2 ExampleExample II:II: WeatherWeather ForecastForecast Adding OpenMP, Itanium® 2 4 CPUs Weather forecast application 10x Compiler directives, minor source changes, Itanium® 2 4 CPU /O3 – Enable advanced optimizations 4x Legacy system Itanium® 2 4 CPU 8 CPUs 2.5x Compiler Original source, default options, Itanium® 2 4 CPU Compiler 1x 1.2x Compiler Compiler Optimizations may improve performance significantly. Stanford University CS243 Winter 2006 3 CompilerCompiler FeaturesFeatures (example)(example) O2:O2: scalarscalar optimizationsoptimizations andand basicbasic schedulingscheduling O3:O3: optimizationsoptimizations forfor technicaltechnical computingcomputing applicationsapplications (loopy(loopy codes)codes) O1:O1: optimizationsoptimizations forfor serverserver applicationsapplications (straight(straight--lineline andand branchybranchy codescodes withwith flatflat profile)profile) Stanford University CS243 Winter 2006 4 CompilerCompiler FeaturesFeatures (example)(example) PGOPGO (profile(profile--guidedguided optimizations):optimizations): usingusing profileprofile toto guideguide optimizations.optimizations. IPOIPO (inter(inter--proceduralprocedural optimizations):optimizations): multimulti--filefile inlininginlining,, interproceduralinterprocedural optimizations.optimizations. Parallel:Parallel: automaticautomatic parallelizationparallelization OpenMPOpenMP:: exploitingexploiting threadthread--LevelLevel parallelismparallelism OptimizationOptimization report:report: compilercompiler optimizationsoptimizations performedperformed oror notnot donedone Stanford University CS243 Winter 2006 5 CompilerCompiler OrganizationOrganization (example)(example) C++ FORTRAN Front End Front End Profiler Interprocedural analysis and optimizations: inlining, constant prop, whole program detect, mod/ref, points-to Loop optimizations: data deps, prefetch, vectorizer, Disambiguation: unroll/interchange/fusion/dist, auto-parallel/OpenMP types, array, pointer, structure, directives Global scalar optimizations: partial redundancy elim, dead store elim, strength reduction, dead code elim Code generation: predication, software pipelining, global scheduling, register allocation, code emit Stanford University CS243 Winter 2006 6 Example:Example: SoftwareSoftware PipeliningPipelining L1: ld4 r4 = [r5], 4; // 0 add r7 = r4, r9; // 2 st4 [r6] = r7, 4 // 3 br.cloop L1;; 4 cycles per iteration Stanford University CS243 Winter 2006 7 Example:Example: SoftwareSoftware PipeliningPipelining • Exploit parallelism across loop iterations. ld ld L1: ld4 r4 = [r5], 4; // 0 add ld add r7 = r4, r9; // 2 st add ld st4 [r6] = r7, 4 // 3 st add ld br.cloop L1;; st add st add 4 cycles per iteration st Stanford University CS243 Winter 2006 8 Example:Example: SoftwareSoftware PipeliningPipelining • Exploit parallelism across loop iterations. ld ld L1: ld4 r4 = [r5], 4; // 0 add ld add r7 = r4, r9; // 2 st add ld st4 [r6] = r7, 4 // 3 st add ld br.cloop L1;; st add st add 4 cycles per iteration st L1: (p16) ld4 r32 = [r5], 4 // cycle 0 (p18) add r35 = r34, r9 // cycle 0 (p19) st4 [r6] = r36, 4 // cycle 0 br.ctop L1;; // cycle 0 1 cycle per iteration (with architecture support) Stanford University CS243 Winter 2006 9 CourseCourse EmphasisEmphasis CompilerCompiler foundationfoundation TheoreticalTheoretical frameworksframeworks AlgorithmsAlgorithms ExperimentationExperimentation HandsHands--onon experienceexperience NonNon--goal:goal: howhow toto buildbuild aa completecomplete optimizingoptimizing compilercompiler ExposureExposure toto realreal worldworld impactimpact HowHow theythey workedworked inin practicepractice Stanford University CS243 Winter 2006 10.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    10 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us