EMBEDDED AND C++ COMPILER EVALUATION METHODOLOGY Class 443, Embedded Systems Conference, September 26-30, 1999 Chuck Tribolet and John Palmer

The Authors
Chuck Tribolet is a Senior Programmer at IBM Research Division’s Almaden Research Center in San Jose, California. Chuck graduated from Stanford in 1973 with a BS in Industrial Engineering and an MS in Computer Engineering. He joined IBM at Los Gatos in April 1973 and holds five patents on various aspects of software. He is currently responsible for evaluating C and C++ compilers for an assortment of embedded microprocessors for use in disk drives.

e-mail: [email protected] Voice: 408-927-2707 Fax: 408-927-4166

John Palmer is a Research Staff Member at the IBM Almaden Research Center. He works on disk drive performance, both at the level of the overall drive and at the level of the processors that run the disk drives. For variety, he also studies disk error patterns to improve error recovery behavior. John graduated from Clemson University in 1969 with a BS in Applied Mathematics.

E-mail: [email protected] Voice: 408-927-1724 Fax: 408-927-4010

Abstract
Making a processor architecture change is a frightening prospect to anyone in the embedded systems world. Not only are there questions about code porting and new development tools, but there are likely to be large unknowns about the performance of the new architecture. Nevertheless, advances in architecture knowledge and decreasing chip costs make it attractive to consider the potential gain from a more modern architecture. In evaluating new architectures for our embedded application, we organized our efforts in two directions: code compilation and performance modeling. In the natural way, the two are related: generally, the larger the code size, the longer it will take to execute and the more you will have to pay for memory space. The compiler and microprocessor vendors will all supply data showing that their product is best. Which one is right?

Asking them about the relative merits of their product versus the competition isn’t productive, because they all think they have the greatest compiler in the world and can show you data to prove it. Unfortunately, they spend more time turning the knobs on their own compiler than they do on their competitor’s, so they may be comparing a well-tuned Mustang to a poorly tuned Camaro. Furthermore, their benchmark is different from your application. These in-house benchmarks have made them all honestly believe they have the best processor and compiler. Obviously, they can’t ALL have them.

This paper presents a methodology for evaluating embedded C and C++ compilers and their associated microprocessors. This methodology was used in the selection of the components to be used on a disk drive. It uncovered a number of potential pitfalls early in the design cycle; had these been uncovered later, it would have been difficult to correct them. We used our compiler observations in selecting the microprocessor. It was obvious that in some cases a good compiler was enhancing the performance of the microprocessor. Additionally, we inferred information on the performance characteristics of the microprocessor from analysis of the generated code.

The other tool we created to aid us in evaluating choices, and in correctly tuning the final choice, was a simulator that would predict the behavior of processors of different architectures. As input to the simulator, we had traces from two different processors running the different parts of our function that we hoped to combine on a single processor in a new architecture. With this simulator, we were able to predict with sufficient precision the behavior of our application on several different architectures, and to use those predictions to determine the clock rates and memory systems that would be required with each of the processor architectures. We did it all long before we could have created running code for any of the proposed systems.

The paper describes the structure of the simulator itself, particularly how it was made flexible enough to model widely differing architectures. We discuss the collection of traces and their adaptation both to our expected future workload and to each of the proposed architectures. Finally, we discuss the type of information obtained from the simulator and the way in which it was used in making an architecture decision, fine-tuning the system that we picked, and planning for future evolution.

Compiler Selection

The Initial Approach
A naive approach is to select some code, compile it for various engines (or have the vendors compile it), and tabulate the results. We started out this way, but the actual process turned out to be much more interesting.

Defining the Benchmark

The best benchmark is your entire application. Size does matter; the law of large numbers applies. Small benchmarks can produce misleading results if the modules selected happen to match what one combination of compiler and microprocessor handles well or poorly. Any vendor worth her salt can make a small benchmark look good. Furthermore, a small benchmark can be compiled with a smaller memory model than the entire application, will fit in a smaller Small Data Area than the whole, and needs fewer registers. Early in our evaluation, we created a single-module benchmark. Its results were entirely different from those of a much larger benchmark.

Using your entire application as a benchmark will require some degree of modification, both to make it more ANSI compliant in the cases where it gratuitously diverges from the standard, and to isolate with preprocessor logic the noncompliant code in the cases that exploit some required nonstandard feature. A couple of person-days of effort made 75% of our files compile. Three quarters of your application is a much better benchmark than any small benchmark or any vendor benchmark. Another couple of person-weeks over the course of the benchmarking got the remaining 25% to compile. In our case, the cost of using the entire application as the benchmark paid off: it produced more accurate results, we now have a very nearly ANSI-compliant code base for our product, and it wasn’t nearly as much work as predicted.

It is important to use your own application, or something very similar. Our application consists primarily of long strings of if/then/else statements, so loop optimization is not as important as the optimization of linear code and branches, exploitation of the Small Data Area, optimization of register usage, and ROM size. On the other hand, a laser printer manufacturer would care about how fast the code can loop through a four hundred megabit image of a page at 600 dpi. Your benchmark must match the characteristics of your own application. Since we did not have hardware with any of the alternative microprocessors, we chose not to actually execute the code generated by the various compilers. If you are evaluating compilers for existing hardware, executing the code would be a useful validation of the results.

Procuring Compilers
An old recipe for rabbit stew starts off: “First, catch a rabbit.” Well, first identify and acquire the interesting compilers. Ask your microprocessor vendors which compiler vendor they recommend, but don’t limit yourself to just one compiler per microprocessor. Review the trade press, and cruise the floor at the Embedded Systems Conference. Most compiler vendors will give you a 2-8 week evaluation license for free, and ours were very cooperative about extending these licenses when the selection process went longer than expected. You will find that some compilers are “badge-engineered,” that is, the same compiler will appear under several names. In general, you will only need to benchmark one of them in depth.

Measurement Environment
Once you have your first compiler in hand, it’s time to start setting up an efficient workbench for compiler measurement. You will be doing many, many runs, so it is very important that each run take as little of your time as possible. We ended up benchmarking 22 compilers on 13 processors. One processor had six different compilers. We ran over 150 combinations of options.

There are variations in C between compilers and microprocessors. For example, the size of an int might be 16 bits in one implementation and 32 bits in another, or one microprocessor might have a small data area sufficient to hold all static variables, while another’s SDA might be small and only hold a few key variables. I set up a single master source directory. When a given compiler required minor source code changes, I made them using preprocessor logic (#if statements) in that master source directory. These allowed minor changes (usually in pragmas and typedefs) to be tailored for each combination of compiler and microprocessor. For each run on a given compiler there is an OBJ directory that contains the output files (object and listing files) and the variable portions of the input (the batch file used to invoke the compiler and the parameter file, if any). I keep these object directories long term, which allows me to track what the environment was and what the results were. Going a bit overboard on record keeping will pay dividends down the road (but bear in mind that I do sell disk drives for a living).
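As an illustration only (the port macro names and the vendor-specific qualifier shown are hypothetical, not the ones from our study), a master header can tailor typedefs and pragma-like qualifiers per compiler/microprocessor combination like this:

    /* compiler_port.h -- hypothetical sketch of isolating per-compiler
     * differences in one place with preprocessor logic.  The macro names
     * (PORT_186_MSC6, PORT_RISC_X) are illustrative, not from the study. */

    #ifndef COMPILER_PORT_H
    #define COMPILER_PORT_H

    #if defined(PORT_186_MSC6)           /* 16-bit int implementation        */
    typedef long          fw_int32;      /* need long to get 32 bits         */
    typedef unsigned long fw_uint32;
    #define FW_SDA                       /* no SDA hint; expands to nothing  */
    #elif defined(PORT_RISC_X)           /* 32-bit int implementation        */
    typedef int           fw_int32;
    typedef unsigned int  fw_uint32;
    #define FW_SDA  /* vendor-specific "place in small data area" qualifier */
    #else
    #error "No port selected: define PORT_186_MSC6 or PORT_RISC_X"
    #endif

    #endif /* COMPILER_PORT_H */

Each OBJ directory’s batch file then only needs to define the right port macro when it invokes the compiler.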

Running the Benchmark
Be aggressive about trying compiler options. The compilers each have many options, and it is important to arrive at the best set of options for each compiler. To arrive at a starting set, examine the options and determine which ones seem as if they might influence the resulting code. Then ask the compiler vendor what they suggest, select a starting set of options, and start running tests. Here’s where an efficient workbench and careful record keeping become critical. If an option helps, keep it. If it does not, discard it for now. Repeat this process until you have tried all the interesting options. Now go do it AGAIN, because it’s possible that while option A didn’t help on the first pass, it will now that option B has been turned on. On the second pass, also try turning off options you turned on during the first pass; it’s possible that some other change has now made option C a poor choice. Continue until you can’t make it better. (A sketch of this iterative search appears at the end of this section.)

Take the vendor-suggested options with a large grain of salt. Almost every vendor has given me bad advice at one time or another. In particular, they all seem to believe that their compiler’s option to minimize code size was the best. In every case, I was able to beat this option by tweaking the individual compiler parameters. A thorough tweaking of compiler options will frequently generate an improvement on the order of 30% over an initial decent set of options. If the initial set is truly abysmal, the improvement could be in excess of 100%.

I remember one Pascal compiler from a prior life that you could degrade by 10,000% with the wrong options. Pay attention to the memory model used and to what is placed in the small data area. A large application won’t fit in a tiny memory model or a tiny small data area, and you may not find out until you try linking the code. You may get a rude surprise (I did, in one case) at link time.
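The bookkeeping for the iterative option search described above can be captured in a small driver. The sketch below is only an illustration: the option names are placeholders (not real compiler flags), and buildAndMeasure() is a stand-in you would replace with a call to your own build scripts that compiles the whole benchmark and returns its code size.

    // Greedy, multi-pass search over compiler options, keeping a toggle only
    // if it shrinks the code.  Option names and the size model are fake.
    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    using OptionSet = std::set<std::string>;

    // Stand-in for "build the whole benchmark with these options and return
    // the .text size in bytes".  Wire this to your real build instead.
    long buildAndMeasure(const OptionSet& opts) {
        long size = 100000;                                  // fake baseline
        for (const auto& o : opts) size -= long(o.size()) * 500;  // fake effect
        return size;
    }

    int main() {
        const std::vector<std::string> candidates = {
            "-opt_size", "-opt_sda", "-opt_regalloc", "-opt_inline_small"};
        OptionSet current;                       // start from the bare compile
        long best = buildAndMeasure(current);

        bool changed = true;
        while (changed) {                        // keep making passes until stable
            changed = false;
            for (const auto& opt : candidates) {
                OptionSet trial = current;       // toggle one option at a time
                if (trial.count(opt)) trial.erase(opt); else trial.insert(opt);
                long size = buildAndMeasure(trial);
                if (size < best) {               // keep the toggle only if it helps
                    best = size;
                    current = trial;
                    changed = true;
                    std::cout << "kept toggle of " << opt
                              << ", code size now " << best << " bytes\n";
                }
            }
        }
        std::cout << "final code size: " << best << " bytes\n";
    }

The repeated outer pass is what catches the "option A only helps once option B is on" interactions described above.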

Comparing Compilers
As I have stated previously, the performance of our code is primarily influenced by the performance of linear code, and linear performance is largely determined by code size. Code size is also important due to ROM size constraints. At the beginning, the code was written in a dialect of C that was specific to a particular compiler for a particular microprocessor (specifically Microsoft C6 for the AMD 186). After a small amount of work, each of the new compilers would compile one half to two thirds of the modules, but different compilers failed on different modules. I developed a scheme in which we would compare the size of the modules that DID compile to the size of the corresponding modules for the 186. For example, let us assume that we have three compilers, A (the current base), B, and C, and that we also have five modules (V, W, X, Y, and Z). If we compile each module with each compiler and extract the module sizes, we get:

                                  Compiler A     Compiler B     Compiler C
    Module V size                         50             60   syntax error
    Module W size                        300   syntax error            250
    Module X size                        100            110            100
    Module Y size                        200            210            180
    Module Z size                         70             90             90
    Compiled size                                        470            620
    Corresponding compiler A size                        420            670
    Ratio (relative code size)                          1.12           0.93

In this example, had we simply compared the sizes of what did compile (470 bytes versus 620 bytes), compiler C would have been unfairly penalized for successfully compiling the large module W, and we would have selected compiler B. However, when we compared the relative sizes of the successfully compiled modules, Compiler C is the winner. Why not just compare the modules that compiled on both B and C? That would have had a smaller sample size (three modules) than the technique shown above (four modules).
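A minimal sketch of this bookkeeping, using the module names and sizes from the example table above (a missing entry stands for a module that failed to compile):

    // Relative code size: sum only the modules a compiler managed to build,
    // and divide by the size of the *same* modules under the baseline compiler.
    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        std::map<std::string, long> baseline = {        // compiler A sizes
            {"V", 50}, {"W", 300}, {"X", 100}, {"Y", 200}, {"Z", 70}};
        std::map<std::string, long> candidateC = {      // compiler C sizes
            {"W", 250}, {"X", 100}, {"Y", 180}, {"Z", 90}};  // V: syntax error

        long candTotal = 0, correspondingBase = 0;
        for (const auto& [module, size] : candidateC) {
            candTotal += size;
            correspondingBase += baseline.at(module);   // same modules only
        }
        std::cout << "compiled size: " << candTotal
                  << "  corresponding A size: " << correspondingBase
                  << "  ratio: " << double(candTotal) / correspondingBase << "\n";
        // Prints a ratio of 0.925..., matching the 0.93 in the table.
    }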

Compiler Influences on Overall Performance
Some of the effects of the compiler on overall performance include:

1. Code size. This can be measured directly, usually by measuring the size of the .text section. Most compilers include a tool that will do this by reading the object file. Code size has a secondary effect of influencing cache performance.
2. Instruction count. This can be measured directly, usually with a simple program that reads the assembler listing files, or by dividing the code size by the average instruction size.
3. Cache behavior. This is more difficult to evaluate analytically. You can either run actual code on a target, or emulate it.
4. SDA use. This is also difficult to evaluate analytically in a direct manner, but its effects on performance are measurable in other ways. Poor SDA usage will result in additional memory accesses (which can be measured or simulated), poor locality of reference (whose influence on cache behavior can also be measured or simulated), and larger code size (which can be measured).
5. Register use. This again is difficult to evaluate analytically, but its effects on performance are similar to those of SDA use and can be measured in the same ways. You can get a feel for it by reading the generated assembler code.
6. Linking conventions. These affect performance by requiring more or fewer instructions to perform a function call, which influences code size, and by requiring more space on the stack, which can be determined by inspection of the generated code.

How does one assess performance? A basic view is that the amount of time (Ttotal) required to execute a program is the average time required to execute one instruction (Tinst) times the number of instructions required to do the job on that microprocessor:

Ttotal = Tinst*Ninst

How does one find the right values for Tinst and Ninst? A compiler study like the one we have been discussing does most of the work for you. Assuming that you can run your current code either on real hardware or a simulator and take traces of it, count the instructions in that trace. Next, determine the average instruction size for your benchmark. Do not accept the vendor’s average instruction size. The product of these two numbers is the number of bytes of code executed. Now, you can use the relative code size across different compilers and microprocessors to generate a new “bytes of code” number that can be divided by the number of bytes per instruction for that environment to give Ninst.

A good value for Tinst is harder to generate. The time for an instruction is the cycle time of the machine times the number of cycles required by that instruction (cycles per instruction, CPI).

If your environment is very simple (no caches or other unpredictable causes of processor stalls), then you can get CPI numbers from the architecture book. If you have a complex memory system, you will probably need a simulator to determine CPI.
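To make the arithmetic concrete, here is a small worked sketch of the Ttotal = Tinst*Ninst estimate. Every number in it is invented for illustration; none is a measurement from our study.

    // Estimating Ninst and Ttotal for a candidate architecture from a trace
    // of the current one.  All numbers below are made up for the example.
    #include <iostream>

    int main() {
        // Measured on the current (baseline) processor:
        const double baseInstructions  = 4.0e6;  // instructions counted in trace
        const double baseAvgInstrBytes = 2.6;    // average instruction size, bytes
        const double baseCodeBytes = baseInstructions * baseAvgInstrBytes;

        // From the compiler study: candidate code is 0.93x the baseline size,
        // and its average instruction size is 4 bytes.
        const double relativeCodeSize  = 0.93;
        const double candAvgInstrBytes = 4.0;
        const double candNinst =
            baseCodeBytes * relativeCodeSize / candAvgInstrBytes;

        // Tinst = cycle time * CPI (CPI from the architecture book or a simulator).
        const double cycleTime = 1.0 / 60.0e6;   // assume a 60 MHz clock
        const double cpi       = 1.4;
        const double tInst     = cycleTime * cpi;

        std::cout << "Ninst  ~ " << candNinst         << " instructions\n"
                  << "Ttotal ~ " << candNinst * tInst << " seconds\n";
    }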

Compilers Can Get Better
“Greed is good,” and the compiler vendors are greedy. They want ALL the business, and they recognize that to win they have to be the best, or at least very close. Several of them, at various times, worked with us on joint studies of the generated code. In one case, this study resulted in a decrease in the size of the generated code on the order of 20%. Other studies produced smaller, but still significant, gains. A significant lever here is being able to tell vendor A that their code was behind the competition by X%. Warning: some evaluation licenses have nondisclosure clauses that prevent you from telling another vendor anything about the compiler being evaluated.

Greed also made the vendors listen and respond in weeks, not years. The embedded compiler vendors have a relatively small number of customers, and this seems to have made them sensitive to satisfying the needs of each. Some were much better about this than others. The comparison of multiple compilers on a single processor proved to be a valuable tool in pushing compiler vendors to improve their product. It’s difficult to generalize about processors that do not have this competition among compiler vendors. We saw one case where the only available compiler appeared to do very well, and another where we think the compiler was a problem. If you are fighting to stay in a certain standard ROM size, working with your compiler vendor can be a very useful tool.

[Figure: “Code Size Improvements” — code size as a percentage of the 186 baseline (vertical axis, roughly 100% to 220%) versus calendar time in weeks (horizontal axis, 0 to 40) for compilers A through D on the same microprocessor, annotated with events such as “Best guess at options”, “ELF format”, “Remove call vector table”, “New version”, and “Final position”.]

Results
The chart titled “Code Size Improvements” shows data points for several compilers, all on the same microprocessor. The vertical axis is code size relative to the existing 186 microprocessor. The horizontal axis is calendar time, approximately in weeks. Some significant things to note:
- All of the compilers improved greatly.
- Some of the improvements were from changing options.
- Some of the improvements were from improved versions of the compilers.
- Some of the option experiments were a BIG mistake. Note compiler D at about week 24.
- Except for one, they all ended up very close. That vendor claims they have the problem solved, but I haven’t seen the code yet. Having compilers cluster in this fashion suggests that they may be reaching the optimal result for that processor architecture.
- Results on other microprocessors yielded similar curves. The processors themselves appeared to have more influence on code size than the choice of well-tuned compilers within a single processor family.

Performance Simulation

Overview
Our challenge was to provide performance predictions for half a dozen different processor architectures. Because of the rapidly changing costs of various types of on-chip memory, we needed to be able to assess a number of different memory structures attached to each processor. Finally, because of the cost of conversion, we needed to have at least rough plans for the new architecture to be extendable through several generations of drives, requiring predictions of code size and MIPS requirements through several years.

Our embedded application is disk drives, and disk drives run two relatively independent types of applications. The servo application (moving the disk heads to the right place and keeping them there) is smaller in code size, but larger in number of instructions executed. This application is characterized by a high interrupt rate. Basically, the servo application is an interrupt handler that gets a position error signal 5 to 10 thousand times per second and responds by changing the power applied to the disk arm. It is critically important that this interrupt handler run without delay, because errors in execution timing can translate to positioning errors on the disk or even total confusion about head positioning. The other application is the command application, which interprets read and write commands to the disk, instructs the servo to move the arm, and sets up all the other control hardware required to make a data transfer. The command application is generally much larger (in code size) than the servo application, but the vast majority of this code is rarely used (error recovery, format, mode settings). The code which is run routinely as part of the basic disk read and write operations is generally relatively tolerant of changes in timing, because most of its execution time is masked by the mechanical motions required of the arm.

In our current environment, the two applications run on separate processors. The servo code runs on a DSP and the command code runs on an Intel 186 architecture processor. It was our intention to provide a single engine that would be capable of running both of these applications and would be less expensive than our current multiprocessor design. We intended to do this by putting little-used code in inexpensive off-chip memory and locating high-usage code in a combination of on-chip memory and caches. We expected to be increasing the use of on-chip memory over the next few years, so we wanted to plan an architecture that would be adaptable to this faster memory as it became available. A quick query to vendors showed that basically all of them had a clock rate range that would be likely to meet our needs. Each could also talk about several possibilities for cache sizes, speeds of multiply instructions, and various memory options. We needed a way to decide just how much money to invest in each of these in order to end up with a satisfactory system.

We briefly considered the possibility of using prototype boards, but rejected that approach because we didn’t have the manpower to get enough code running in that environment to be meaningful. That approach also limits you to the memory structure that exists on the prototype board and would not allow evaluation of on-chip features that were critical to our designs. We also rejected using simulators provided by the vendors. While quite useful for certain questions, all of these were of a type that did a very precise simulation of the behavior of the processor and its pipeline, but, like the prototype board, they required code that would actually run correctly on that processor.

Realizing that we did not have the means to exploit the precise processor models, we chose an approach that provided minimum precision in the processor core, but did a relatively thorough job of modeling the memory system that feeds the processor. While we couldn’t provide a sequence of actual instructions, we did think we could construct a plausibly representative sequence of instruction and data addresses that would suffice to exercise the memory system. With an address trace, you can track accesses to all of the various regions of memory, track cache contents, and make at least an approximation of bus contention. All of the architectures we were considering were more akin to RISC than CISC, and they tended to execute most instructions in one cycle barring delays in instruction fetch or data access. Consequently, we could make a basic assumption that each instruction would take one cycle, and we would attempt to track a couple of exceptional multi-cycle instruction cases. The construction and use of this “address-based simulator” is the subject of the remainder of this section.

Trace Collection
For address-based simulation you need addresses, and these we generated from traces of our existing microcode. We collected traces for both our command application and our servo application. Both were simple in concept, but had “take care” aspects that bear mentioning. What one would like to do is just hook up a logic analyzer and have it trace all instruction fetches and all data references. In the case of our command application, we were able to follow that plan exactly, because our processor is uncached and the memory bus comes out of the processor chip. We traced a collection of the common flavors of disk read and write operations and concatenated these traces to make a single trace. (We later did some work on infrequently used code like the format operation.) The only adaptation to the trace we made at this stage was to remove trace entries for idle time (we have an idle loop). This is important because including these instructions will tell you that the cache works very well when you have nothing to do, but that’s not when you care, so simulate the things you care about.

Our servo application was not so easy to trace, because it was already built with on-chip memory and the processor memory bus was not visible from off-chip. We were fortunate that the servo programmers had built a simulated environment that they could run outside of the disk drive to do early debugging.

We were able to use existing instrumentation in that simulation environment to extract an instruction trace, and could infer a data trace from it. After doing a trace like this, a key step is to find a way to compare your trace to actual operation. In our case, we were able to compare the simulator’s total time with the real-world total time for processing an interrupt. We discovered that the simulator had stubbed out about 20% of the total path length, so we would need to add that back in. (More about this later.) We did not merge the two traces, because we needed to preserve the separation in order to understand the interleaving of function under varying system conditions.
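A minimal sketch of the idle-time filtering mentioned above, assuming a hypothetical trace record layout (address plus fetch/read/write flag) and an idle-loop address range that you would supply:

    // Drop trace records generated by the idle loop so the cache simulation is
    // not flattered by "work" that doesn't matter.  Record layout and the
    // idle-loop address range are hypothetical.
    #include <cstdint>
    #include <vector>

    enum class Access { IFetch, DataRead, DataWrite };

    struct TraceRecord {
        std::uint32_t address;
        Access        type;
    };

    std::vector<TraceRecord> removeIdleTime(const std::vector<TraceRecord>& in,
                                            std::uint32_t idleLo,
                                            std::uint32_t idleHi) {
        std::vector<TraceRecord> out;
        bool inIdle = false;
        for (const TraceRecord& r : in) {
            if (r.type == Access::IFetch)        // instruction fetches tell us
                inIdle = r.address >= idleLo &&  // whether we are in the idle loop
                         r.address <= idleHi;
            if (!inIdle)                         // keep everything outside it,
                out.push_back(r);                // including its data references
        }
        return out;
    }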

Trace Adaptation
We now had two traces, one coming from an Intel 186 architecture and one coming from a TI DSP. We needed to convert them into traces that could have come from each of the new architectures. We also needed to extend the traces to account for future function additions, or for those functions that were for various reasons missing from the traces. The architectural conversion of a trace is done independently on instruction and data references. To convert the instruction references to a new architecture, we followed a four-step process. First, scan the trace to locate areas of contiguous I-fetches. These correspond to the “basic blocks” within the microcode (i.e., sections of code between jumps or branches). Second, expand the size of the block by an architecture-related code size ratio. The code size ratios for each architecture were determined by the compiler study already described. Third, assign a new starting address to the block, also based on the code size ratio. This maintains the same relative positioning for the new and old versions of the block. Fourth, divide the newly resized block of code into instructions of length(s) appropriate to the target architecture and insert each of these instruction addresses into the new trace. During this process, data references are carried through without changing the data address, leaving the references as much as possible in the same position relative to the instruction sequence. (A sketch of this conversion appears after the figure below.)

[Figure: Instruction trace conversion — contiguous I-fetch sets in the original trace are expanded or contracted by the code size ratio (which varies with architecture and code type, adapting for memory references), then re-partitioned into instructions of the length appropriate to the target architecture, adapting the instruction count.]
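The sketch below illustrates the four-step instruction-side conversion under simplifying assumptions (fixed-length target instructions, trace record types of my own invention); it is not the tool we actually used.

    // Convert the instruction side of a trace to a target architecture:
    // find contiguous I-fetch runs (basic blocks), scale each block by the
    // code size ratio, re-base it, and cut it into fixed-length instructions.
    // TraceRecord/Access are hypothetical; data references pass through.
    #include <cstdint>
    #include <vector>

    enum class Access { IFetch, DataRead, DataWrite };
    struct TraceRecord { std::uint32_t address; Access type; };

    std::vector<TraceRecord> convertInstructionSide(
            const std::vector<TraceRecord>& in,
            double codeSizeRatio,
            unsigned targetInstrBytes) {
        std::vector<TraceRecord> out;
        std::size_t i = 0;
        while (i < in.size()) {
            if (in[i].type != Access::IFetch) {      // data references carried
                out.push_back(in[i++]);              // through unchanged
                continue;
            }
            // Step 1: gather a contiguous run of I-fetches (a basic block).
            std::size_t start = i;
            while (i < in.size() && in[i].type == Access::IFetch) ++i;
            std::uint32_t firstAddr = in[start].address;
            std::uint32_t lastAddr  = in[i - 1].address;
            double oldBytes = double(lastAddr - firstAddr) + 1.0;

            // Steps 2 and 3: scale the block size and its starting address.
            std::uint32_t newStart = std::uint32_t(firstAddr * codeSizeRatio);
            std::uint32_t newBytes = std::uint32_t(oldBytes * codeSizeRatio + 0.5);

            // Step 4: emit fixed-length instruction fetches covering the block.
            for (std::uint32_t off = 0; off < newBytes; off += targetInstrBytes)
                out.push_back({newStart + off, Access::IFetch});
        }
        return out;
    }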

Data reference conversion is a little more complex because it requires adaptation for the number of registers on the target architecture.

Both of our source traces were taken from accumulator-based machines, so those traces exhibit a lot of memory loads and stores that would not appear if more registers were available. At first, we ignored the effect of the extra memory references, assuming that single-cycle caches and processor pipelines would make these disappear. As we learned more about our target machines, however, we discovered that several had extra cycle penalties for load or store operations or both. At that point, we went back and built a simple LRU model for moving data into and out of registers (a sketch appears at the end of this section). In this model, data was put into a register when first referenced and would not be re-fetched from memory as long as it was in that register. Data in a register would be discarded, or written back to memory, when it was the oldest value in a register and a register was required for new data. Such a model, if given access to all the registers in the machine, would give far too good a picture of register effectiveness. To calibrate the model, we went to the code generated in the compilation exercises. In a subset of the modules, we manually tabulated the density of memory references in the generated code. Armed with this “memory reference density”, we went back to the register model and decreased the number of registers available to it until it yielded a set of load and store operations equal to that in the compiled code. What we found was that “clairvoyant” use of 4-5 registers generated a density of data references equivalent to the compiled code, even on machines with many more registers.

The last piece of trace adaptation was artificial trace expansion to cover function additions or, in the case of the servo trace, existing functions that were not captured in the trace. To do this, we simply took the converted trace and replayed it for an appropriate distance, changing the base addresses of both code and data to a new region to simulate new code and variables.
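A minimal sketch of such an LRU register model, with types of my own choosing; the calibration step (shrinking the number of registers until the load/store density matches the compiled code) happens outside this class.

    // Simple LRU register model: a data address "lives" in a register from its
    // first reference until it is the least recently used value and a register
    // is needed for new data.  Hits generate no memory traffic; misses generate
    // a load, and evictions may imply a write-back.  Details are illustrative.
    #include <cstdint>
    #include <list>
    #include <unordered_map>

    class RegisterModel {
    public:
        explicit RegisterModel(unsigned numRegisters) : capacity_(numRegisters) {}

        // Returns true if the reference must go to memory (a load), false if
        // the value was already held in a register.
        bool reference(std::uint32_t dataAddress) {
            auto it = where_.find(dataAddress);
            if (it != where_.end()) {                 // hit: refresh LRU order
                lru_.splice(lru_.begin(), lru_, it->second);
                return false;
            }
            if (where_.size() == capacity_) {         // full: evict oldest value
                std::uint32_t victim = lru_.back();
                lru_.pop_back();
                where_.erase(victim);
                ++evictions_;                         // may imply a write-back
            }
            lru_.push_front(dataAddress);             // miss: load into a register
            where_[dataAddress] = lru_.begin();
            ++loads_;
            return true;
        }

        unsigned long loads() const { return loads_; }
        unsigned long evictions() const { return evictions_; }

    private:
        unsigned capacity_;
        std::list<std::uint32_t> lru_;                // front = most recently used
        std::unordered_map<std::uint32_t,
                           std::list<std::uint32_t>::iterator> where_;
        unsigned long loads_ = 0, evictions_ = 0;
    };

Feeding the converted trace's data references through reference() and counting the loads gives the memory reference density for a chosen register count.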

Simulator Structure
With the traces in hand, it was time to build the simulator. As shown below, at the center of the simulator is the “Instruction Simulator” that runs the master clock and handles the small number of exceptions to the one-cycle-per-instruction assumption. The Instruction Simulator is fed trace records by an interleave routine that takes input alternately from the command trace and the servo trace. The Instruction Simulator then indicates an instruction fetch by sending the address of the instruction out on the “I-fetch” path, and a data access by sending a read or write out the “Data access” path.

Each of these memory paths consists of multiple elements that can be any combination of busses and caches, leading ultimately to a memory of some type.

[Figure: The Simulator — the servo trace and the command trace (each tailored by architecture) feed a trace interleave routine, which feeds the instruction simulator (master instruction clock, with special handling for branch and multiply). The instruction simulator drives an I-fetch path into the instruction memory hierarchy and a data access path into the data memory hierarchy, and statistics are collected from the elements.]

The precise definition of the instruction and data memory hierarchies is the real key to this simulator, because it is in these places that we expected to see the largest performance effects, and where we would also see the largest number of choices that needed evaluating. To allow the dynamic configuration of a large number of different architectural arrangements, we created each element of the memory system as a subclass of a generic memory class. The parent class had roughly the interface shown below, in which lots of device-specific parameters are passed when the object is created. Following creation, elements receive a set of calls that identify the memory elements below them and the address ranges supported by each. When actual simulation starts, each element receives only generic memory requests to read or write data. Each element of the hierarchy is designed to accept requests from above and pass them on to other elements below. An element may add time both before passing the request on and after receiving a response from lower memory.

    Generic Memory Element interface:
      Create (unique, device-specific parameters)
      Connect (address ranges for lower elements)
      Reference (address, length, time) -- may issue Reference requests to
        lower elements; returns first-word time and last-word time
      Report Statistics

Within this general structure, we built elements to simulate the timing and arbitration priorities of busses, cache elements that worked as instruction or data caches, and memory elements that would exhibit the behavior of flash, SRAM, and SDRAM. The bus elements we built were more often coded uniquely rather than being specified by parameters. The cache element accepts parameters like total size, line size, associativity, and replacement algorithm. We built memory elements for each of the basic types of memory, but then parameterized them with respect to address range, width, and various speed characteristics.

As an example, consider the operation of the cache element. At creation time, the initialization routine gets the total size, line size, associativity, and replacement algorithm. It uses the size and associativity arguments to allocate the arrays that will track the contents of each cache line. It then reads from a disk file the contents of the cache of that type and shape at the end of the previous simulation run (i.e., the last instruction cache of 16K bytes with line size 16 and associativity 2). We found it useful to save and restore the cache state rather than have the early part of the simulation distorted by startup effects. With everything in place, the first request arrives. If it is a hit in the cache, the response is that the data is immediately available. If the data is not in the cache, then the cache will send a request for the appropriate cache line addresses to the next memory element (probably the memory bus). When that request is returned, it will carry the time at which the first word in the line becomes available and the time at which the last word is available. If the first word is the requested word (an architecture question), that word is returned without further delay, but the cache element remembers that the current line is still in the process of being loaded. If requests for other lines arrive immediately, they can be handled. If, as is often the case, the request is for another word in the line being fetched, then the cache must decide how much delay to add in its response. In this fashion, an instruction fetch request made by the Instruction Simulator will travel some distance into the instruction memory hierarchy before being returned, and at the end it will return a time at which that instruction will be available.
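A minimal C++ sketch of what such a generic memory element hierarchy might look like; the class and method names are my own, chosen to mirror the Create / Connect / Reference / Report Statistics interface described above, and are not the authors' actual code.

    // Sketch of a generic memory element hierarchy for an address-based
    // simulator.  Names and details are illustrative, not the original code.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct Reference {                     // a generic read/write/fetch request
        std::uint32_t address;
        unsigned      lengthBytes;
        double        timeNs;              // time at which the request is issued
    };

    struct Response {                      // what a lower element reports back
        double firstWordTimeNs;
        double lastWordTimeNs;
    };

    class MemoryElement {                  // "Create" = a subclass constructor
    public:
        virtual ~MemoryElement() = default;

        // "Connect": register a lower element and the address range it serves.
        void connect(MemoryElement* lower, std::uint32_t lo, std::uint32_t hi) {
            lower_.push_back({lower, lo, hi});
        }

        virtual Response reference(const Reference& req) = 0;   // "Reference"
        virtual void reportStatistics(std::ostream& os) const = 0;

    protected:
        MemoryElement* route(std::uint32_t address) const {     // pick element below
            for (const auto& link : lower_)
                if (address >= link.lo && address <= link.hi) return link.element;
            return nullptr;
        }

    private:
        struct Link { MemoryElement* element; std::uint32_t lo, hi; };
        std::vector<Link> lower_;
    };

    // A flat memory with a fixed time per word; stands in for flash/SRAM/DRAM.
    class SimpleMemory : public MemoryElement {
    public:
        SimpleMemory(double nsPerWord, unsigned wordBytes)
            : nsPerWord_(nsPerWord), wordBytes_(wordBytes) {}

        Response reference(const Reference& req) override {
            ++accesses_;
            unsigned words = (req.lengthBytes + wordBytes_ - 1) / wordBytes_;
            return {req.timeNs + nsPerWord_,               // first word ready
                    req.timeNs + nsPerWord_ * words};      // last word ready
        }
        void reportStatistics(std::ostream& os) const override {
            os << "SimpleMemory accesses: " << accesses_ << "\n";
        }

    private:
        double nsPerWord_;
        unsigned wordBytes_;
        unsigned long accesses_ = 0;
    };

A cache element would be another subclass, holding its line-tracking arrays and forwarding misses through route() to whatever is connected below it; a bus element would mostly add arbitration delay before doing the same.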

A typical instruction hierarchy might consist of (1) the processor prefetch queue, (2) an instruction cache, (3) the memory bus, and (4) a flash memory. The illustration below is a typical configuration for our application.

[Figure: A typical configuration — an MCU with an I-cache (16K) and a d-cache (4K), plus on-chip SRAM for servo code (16K) and servo data (4K), connected by the memory bus to off-chip DRAM (50 ns, 32-bit path, holding IP data), flash (16-bit path, holding IP code), and the HDC.]

In it, you can see that we have multiple paths for instruction fetch, one to on-chip memory and one to off-chip flash. We actually have four paths to data memory: one to on-chip SRAM, one to the flash, one to off-chip DRAM, and one to registers in the Hard Disk Controller (HDC). ASIC registers we generally defined as an SRAM with appropriate timing.
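One way to sketch such a configuration is as a small table of elements, each with the address range it serves and its timing; routing a reference is then just a range lookup, matching the Connect step above. The addresses, sizes, and timings below are placeholders, not our real memory map.

    // Hypothetical data-path configuration table for the figure above.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct ElementConfig {
        const char*   name;
        std::uint32_t lo, hi;        // address range served
        unsigned      busWidthBits;
        double        nsPerWord;
    };

    int main() {
        std::vector<ElementConfig> dataPaths = {
            {"servo SRAM (on chip)", 0x00000000, 0x00000FFF, 32, 10.0},
            {"HDC registers",        0x00100000, 0x001000FF, 16, 40.0},
            {"DRAM (IP data)",       0x40000000, 0x403FFFFF, 32, 50.0},
            {"flash (IP code)",      0x80000000, 0x801FFFFF, 16, 90.0},
        };
        // Routing a data reference is a range lookup, as in MemoryElement::route().
        std::uint32_t addr = 0x40001234;
        for (const auto& e : dataPaths)
            if (addr >= e.lo && addr <= e.hi)
                std::cout << "0x" << std::hex << addr << " -> " << e.name << "\n";
    }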

Simulator Output
Running the simulator produced for us, as “management output”, an indicator of whether performance was sufficient. Of much greater import was further information about the performance of the various components of the specific configuration. Among these, it was particularly useful to have overview information about sources of delay, and we typically presented this as a breakout of CPI (Cycles Per Instruction). In these bar charts, the bottom 1 is our basic one-cycle-per-instruction assumption.

[Figure: “Cycles Per Instruction” — stacked bars for the servo, IP (command), and overall workloads, with CPI from 0 to 4.5 broken out into base cycles, I-fetch, D-fetch, load/store, multiply, branch, and HDC delays.]

The chart shows that the servo code is executing with little delay, but the command code is running about three times slower because of instruction fetch and data access delays. This code is clearly running from cache and is obviously getting enough cache misses to cause a significant slowdown. Without knowing more about the environment, you can’t say whether it needs to be fixed and, if so, whether the best way to improve things would be to increase the cache size or to change the memory backing the cache. In this particular case, actually, it was fast enough for our purposes, so more cache space or memory improvements would have been an unnecessary expense. A couple more simulator runs told us how much cache would be required to make the large I-fetch and data access delays go away if we needed the extra speed. This type of analysis was typical of our use of the simulator: postulate a configuration, run the simulator to define performance, identify opportunity areas if better performance is needed, and then try several alternatives to find the most attractive one. In addition to the overview of delay sources, we always had the simulator generate output on the specific performance of each component. For a cache, that might look like the following:

    I-cache results
    Total: 383034   Hits: 368179   Misses: 14855   Hit Ratio: 96.1%
    Hit times:  18953 usec.  Average:  51.5 nsec
    Miss times:  6355 usec.  Average: 427.8 nsec
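A minimal sketch of a cache element that could produce per-component output of this kind; the organization (set-associative, LRU within a set) and the fixed hit/miss times are illustrative assumptions, not the authors' code.

    // Cache element that tracks hits, misses, and their times so it can emit
    // a per-component report like the one above.  Organization and timings
    // are illustrative; a real element would receive them at creation.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    class CacheElement {
    public:
        CacheElement(unsigned totalBytes, unsigned lineBytes, unsigned assoc,
                     double hitNs, double missNs)
            : lineBytes_(lineBytes), assoc_(assoc),
              sets_(totalBytes / (lineBytes * assoc)),
              tags_(sets_ * assoc, kInvalid),
              hitNs_(hitNs), missNs_(missNs) {}

        // Returns the time (ns) at which the requested word is available.
        double reference(std::uint32_t address) {
            std::uint32_t line = address / lineBytes_;
            std::uint32_t set  = line % sets_;
            std::uint64_t* way = &tags_[set * assoc_];
            for (unsigned w = 0; w < assoc_; ++w) {
                if (way[w] == line) {                    // hit: move to MRU slot
                    for (unsigned k = w; k > 0; --k) way[k] = way[k - 1];
                    way[0] = line;
                    ++hits_; hitTimeNs_ += hitNs_;
                    return hitNs_;
                }
            }
            for (unsigned k = assoc_ - 1; k > 0; --k)    // miss: evict LRU way,
                way[k] = way[k - 1];                     // install the new line
            way[0] = line;
            ++misses_; missTimeNs_ += missNs_;
            return missNs_;
        }

        void reportStatistics(std::ostream& os) const {
            unsigned long total = hits_ + misses_;
            os << "Total: " << total << "  Hits: " << hits_
               << "  Misses: " << misses_ << "  Hit Ratio: "
               << (total ? 100.0 * hits_ / total : 0.0) << "%\n"
               << "Hit times: " << hitTimeNs_ / 1000.0 << " usec.  Average: "
               << (hits_ ? hitTimeNs_ / hits_ : 0.0) << " nsec\n"
               << "Miss times: " << missTimeNs_ / 1000.0 << " usec.  Average: "
               << (misses_ ? missTimeNs_ / misses_ : 0.0) << " nsec\n";
        }

    private:
        static constexpr std::uint64_t kInvalid = ~0ull;
        unsigned lineBytes_, assoc_, sets_;
        std::vector<std::uint64_t> tags_;                // per set: MRU..LRU order
        double hitNs_, missNs_;
        unsigned long hits_ = 0, misses_ = 0;
        double hitTimeNs_ = 0, missTimeNs_ = 0;
    };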

What You Can Use It For
We used the simulator for LOTS of things. It helped us identify functional configurations for every vendor. Given a successful configuration, it then helped us identify the lowest-cost successful configuration. It allowed us to play “what if” games with possible future memory technologies (or prices). It gave us a way to answer the processor vendors when they asked for advice on cache sizes and shapes. It gave us ammunition in a couple of cases to sway their architects to our way of thinking on the importance of certain memory hierarchy approaches. (We discovered that the vendors generally get only relatively confused direction from the users of their processors, so if you walk in with this kind of detailed understanding, they tend to listen.) Most of the issues we addressed were specific to our application and environment, but there were a number of observations that seem to be broadly applicable:
- If you will be using a cache, making sure it works well enough should be your number one priority. Cache success comes from a low miss rate, a fast line fetch time, or both. Fix performance with whichever is less expensive.
- Watch the clock speed issue; it can be misleading. Processors with smaller instruction sizes often need higher clock rates than those with larger instruction sizes. Code size often (but not always) goes down with smaller instructions, but seldom as much as the decrease in instruction size. Consequently, you end up needing more of the little instructions, and hence a higher clock rate, to do the job in the same time.
- Don’t assume your silicon vendor understands your requirements and potential uses for on-chip memory. Some were better than others in this area.
- The numbers you get from the vendor on cycles per instruction (or instructions per cycle) are achievable on some workload, but almost certainly not on yours.

A warning to the unwary is appropriate. Simulators require large amounts of skepticism, particularly when you are predicting the future. Every time you change anything, even an input parameter, you need to assume that the simulator has gone haywire and is generating spurious results, particularly if those results look interesting.

Any time you get an “interesting” result, you must have enough intermediate or component timing output that you can identify the precise features causing the interesting behavior and, from that, decide whether or not to accept the simulator as being realistic. In my experience, even the most unexpected results are easy to explain once you get them and spend a little time looking at component performance. If you don’t get a simple explanation, it is HIGHLY likely that the simulator is lying to you. Even with the risks of being misled, a good simulator is often the best chance for making the right first guess at what silicon to build. Particularly across major shifts like a change of architecture, it can provide insights that will save much valuable time in rebuilds avoided. So, good luck in your endeavors, and may the simulator be with you.
