Cetus: a Source-To-Source Compiler Infrastructure
Total Page:16
File Type:pdf, Size:1020Kb
!"#$%&'('%)$*!"+#)+%)$*!"'!),-./"*' .01*(%#*$!#$*"'1)*',$/#.!)*"%' ! ! ! "#$%#!"&'()%!*+,!"#-.&+)!/#.0&%%&1+%2! 3,$4*)&1+*5!1.!-#.%1+*5!$%#!16!)(&%!0*)#.&*5!&%!-#.0&))#,!7&)(1$)!6##8!-.19&,#,!%$4(! $%#2!:;!&%!+1)!0*,#!61.!-.16&)<!=;!&+45$,#%!)(&%!+1)&4#!*+,!*!6$55!4&)*)&1+!)1!)(#!1.&'&+*5! 71.>!1+!)(#!6&.%)!-*'#!16!)(#!41-?<!*+,!@;!,1#%!+1)!&0-5?!A333!#+,1.%#0#+)!16!*+?! )(&.,B-*.)?!-.1,$4)%!1.!%#.9&4#%C!D$)(1.%!*+,!)(#&.!410-*+&#%!*.#!-#.0&))#,!)1!-1%)! )(#&.!A333B41-?.&'()#,!0*)#.&*5!1+!)(#&.!17+!E#F!%#.9#.%!7&)(1$)!-#.0&%%&1+8! -.19&,#,!)(*)!)(#!A333!41-?.&'()!+1)&4#!*+,!*!6$55!4&)*)&1+!)1!)(#!1.&'&+*5!71.>! *--#*.!1+!)(#!6&.%)!%4.##+!16!)(#!-1%)#,!41-?C! ! /#.0&%%&1+!)1!.#-.&+)G.#-$F5&%(!)(&%!0*)#.&*5!61.!4100#.4&*58!*,9#.)&%&+'8!1.! -.101)&1+*5!-$.-1%#%!1.!61.!4.#*)&+'!+#7!4155#4)&9#!71.>%!61.!.#%*5#!1.! .#,&%).&F$)&1+!0$%)!F#!1F)*&+#,!6.10!A333!F?!7.&)&+'!)1!)(#!A333!A+)#55#4)$*5! /.1-#.)?!"&'()%!H66&4#8!IIJ!K1#%!L*+#8!/&%4*)*7*?8!MN!OPPJIBI:I:!1.!-$F%B -#.0&%%&1+%Q&###C1.'C!R1-?.&'()!S!=OOT!A333C!D55!.&'()%!.#%#.9#,C! ! DF%).*4)&+'!*+,!L&F.*.?!U%#2! DF%).*4)&+'!&%!-#.0&))#,!7&)(!4.#,&)!)1!)(#!%1$.4#C!L&F.*.&#%!*.#!-#.0&))#,!)1! -(1)141-?!61.!-.&9*)#!$%#!16!-*).1+%8!-.19&,#,!)(#!-#.B41-?!6##!&+,&4*)#,!&+!)(#! 41,#!*)!)(#!F1))10!16!)(#!6&.%)!-*'#!&%!-*&,!)(.1$'(!)(#!R1-?.&'()!R5#*.*+4#!R#+)#.8! ===!"1%#711,!V.&9#8!V*+9#.%8!WD!O:T=@C! ! ! R&)*)&1+2! R(&.*'!V*9#8!K*+%*+'!X*#8!Y#$+'BN*&!W&+8!Y#?1+'!L##8!"$,156!3&'#+0*++8!Y*0$#5! W&,>&668!ZR#)$%2!D!Y1$.4#B)1BY1$.4#!R10-&5#.!A+6.*%).$4)$.#!61.!W$5)&41.#%8Z! !"#$%&'(8!915C!I=8!+1C!:=8!--C!@[BI=8!V#4C!=OOT8!,1&2:OC::OTGWRC=OOTC@PJ! ! ! \1.!01.#!&+61.0*)&1+8!-5#*%#!9&%&)2! ())-2GG4#)$%C#4+C-$.,$#C#,$! ! ! R1+)*4)!$%!*)2! 4#)$%Q#4+C-$.,$#C#,$! ! ](#!R#)$%!]#*0! COVER FEATURE CETUS: A SOURCE- TO-SOURCE COMPILER INFRASTRUCTURE FOR MULTICORES Chirag Dave, Hansang Bae, Seung-Jai Min, Seyong Lee, Rudolf Eigenmann, and Samuel Midkiff, Purdue University The Cetus tool provides an infrastruc- This state-of-the-art snapshot of automatic paral- ture for research on multicore compiler lelization for multicores uses the Cetus tool. Cetus is an optimizations that emphasizes automatic infrastructure for research on multicore compiler optimi- parallelization. The compiler infrastruc- zations, with an emphasis on automatic parallelization. ture, which targets C programs, supports We have created a compiler infrastructure that supports source-to-source transformations, is user-oriented and source-to-source transformations, is user- easy to handle, and provides the most important par- oriented and easy to handle, and provides allelization passes as well as the underlying enabling the most important parallelization passes as techniques. The infrastructure project follows Polaris,1,2 well as the underlying enabling techniques. which was arguably the most advanced research infra- structure for optimizing compilers on parallel machines. While Polaris translated Fortran, Cetus targets C programs. ith the advent of multicore architec- Cetus arose initially from the work of several enthusiastic tures, automatic parallelization has, graduate students, who continued what they began in a for several reasons, re-emerged as an class project. Recently, we have obtained funding from the important tool technology. While classi- US National Science Foundation to evolve the project into cal parallel machines served a relatively a community resource. Wsmall user community, multicores aim to capture a mass In our work, we have measured both Cetus and Cetus- market, which demands user-oriented, high-productivity parallelized program characteristics. These results show programming tools. Further, multicores are replacing a high-quality parallelization infrastructure equally pow- complex superscalar processors, the parallelism of which erful but easier to use than other choices. Cetus can be was unquestionably exploited by the compiler and under- compared to Intel’s ICC compiler and the COINS research lying architecture. compiler. Initially we had also considered infrastructures This same model is desirable for the new generation such as SUIF (suif.stanford.edu), Open64 (www.open64. of CPUs: Automatic parallelization had its successes on net), Rose (rosecompiler.org), the Gnu C compiler, Pluto shared-address-space architectures exhibiting small num- (pluto-compiler.sourceforge.net), and the Portland Group bers of processors, which is the very structure of today’s (PGI) C Compiler. However, parallelization results for these multicores. tools were either unavailable or lacked explanation. 36 COMPUTER Published by the IEEE Computer Society 0018-9162/09/$26.00 © 2009 IEEE class Instrumenter { ... Developing a dependable public void instrumentLoops(Program p) community support system and { reaching out to Cetus users is one DepthFirstIterator iter = new DepthFirstIterator(p); of our goals. Cetus maintains a int loop_number = 0; while ( iter.hasNext() ) { community portal at cetus.ecn. Object obj = iter.next(); purdue.edu, where the compiler if ( obj instanceof ForLoop ) can be downloaded under an ar- insertTimingCalls((ForLoop)obj, loop_number++); } tistic license. The portal offers a } utility for submitting bug reports and feature requests, and a Cetus private void insertTimingCalls(ForLoop loop, int number) users’ mailing list to discuss ideas, { FunctionCall tic, toc; new functionality, research topics, tic = new FunctionCall(new Identifier(“cetus_tic”)); and user concerns. The Cetus toc = new FunctionCall(new Identifier(“cetus_toc”)); portal further provides documen- tic.addArgument(new IntegerLiteral(number)); tation for installing and running toc.addArgument(new IntegerLiteral(number)); CompoundStatement parent = (CompoundStatement)loop.getParent(); the compiler and for writing new parent.addStatementBefore(loop, new ExpressionStatement(tic)); analysis and transformation parent.addStatementAfter(loop, new ExpressionStatement(toc)); passes using the internal program representation (IR) interface. } ... Several US and worldwide } research groups already use Cetus.3-5 In our ongoing work, Figure 1. Implementation example for basic loop instrumentation. The instrumenter we apply the infrastructure for inserts timing calls around each for loop in the given input program. creating translators that convert shared-memory programs written in OpenMP to other models, such as message-passing6 and t Symbol table. Cetus’s symbol table functionality pro- CUDA (for graphics processing units).7 vides information about identifiers and data types. Its implementation makes direct use of the information CETUS ORGANIZATION AND INTERNAL stored in declaration statements in the IR. There is no REPRESENTATION separate and redundant symbol table storage. Cetus’s IR is implemented in the form of a Java class hier- t Annotations. Comments, pragmas, directives, and archy. A high-level representation provides a syntactic view other types of auxiliary information about IR ob- of the source program to the pass writer, making it easy jects can be stored in annotation objects, which to understand, access, and transform the input program. take the form of declarations. An annotation can For example, the Program class type represents the entire be associated with a statement (such as informa- program that may consist of multiple source files. Each tion about an OpenMP directive belonging to a for source file is represented as a Translation Unit. Other base statement) or could stand independently (such as IR object types are Statement, Declaration, and Expres- a comment line). sion. Specific source constructs in the IR are represented t Printing. The printing functions have been extended by classes derived from these base classes. For example, to allow for flexible rendering of the IR classes. ExpressionStatement represents a Statement that contains an Expression, and an AssignmentExpression represents an CETUS ANALYSES AND TRANSFORMATIONS Expression that assigns the value of the right-hand side to Automatically instrumenting source programs is a com- the left-hand side. There is complete data abstraction, and monly used compiler capability. Figure 1 shows a basic pass writers only manipulate the IR through access func- loop instrumenter that inserts timing calls around each tions. Important features of the IR include the following: for loop in the given input program. Here we assume that the function call cetus_tic(id) stores the current time t Traversable objects. All Cetus IR objects are derived (get time of day, for example) with the given integer tag, from a base class “Traversable.” This class provides the and cetus_toc(id) computes and stores the time differ- functionality to iterate over lists of objects generically. ence since the last call of cetus_tic(id). t Iterators. BreadthFirst, DepthFirst, and Flat iterators At the end of an application run, basic runtime statis- are built into the functionality to provide easy tra- tics on the instrumented code sections can be generated versal and search over the program IR. from this information. The Cetus code in Figure 1 assigns a DECEMBER 2009 37 COVER FEATURE int foo(void) constructs such as “adding argu- { ments” to a FunctionCall or “adding int i; double t, s, a[100]; an additional Statement” to an exist- for ( i = 0; i < 50; ++i ) ing CompoundStatement in the IR. { t = a[i]; Analysis passes a[i+50] = t + (a[i] + a[i+50])/2.0; s = s + 2*a[i]; Advanced program analysis capa- } bilities are essential to Cetus; they will return 0; grow both through our own efforts } (a) and the Cetus user community’s. Here we describe some basic analyses. int foo() Symbolic manipulation. Like its { predecessor, Polaris,1,2 Cetus provides int i; double t, s, a[100]; a key feature in its ability to analyze cetus_tic(0); the represented program in symbolic for (i = 0; i < 50; ++i) terms. The ability to manipulate sym- { bolic expressions is essential when t = a[i]; a[(i+50)] = (t + ((a[i] + a [(i+50)])/2.0)); designing analysis and transforma- s = (s + (2*a[i])); tion algorithms that deal with real } programs. For example, a data depen- cetus_toc(0); dence test can easily extract the return 0; } necessary information if array sub- (b) scripts are normalized with respect to the relevant loop indices.