GCC for Embedded Systems
Ayal Zaks
IBM Haifa Research Lab
http://www.haifa.il.ibm.com/dept/svt/code_compiler.html

Talk Goals
- GCC for embedded systems: why GCC?
- What R&D are we doing in GCC?
- Where can you learn more?

Talk Layout
1. GCC – development: open community product of the FSF
2. GCC – recent contributions from IBM Haifa
   - Vectorization (GCC 4.0)
   - Modulo scheduling (GCC 4.0)
   - Inter-procedural constant propagation (GCC 4.1)
   - Data-layout optimizations (GCC 4.3)
3. GCC – performance for embedded: EEMBC
4. GCC – research: HiPEAC

GCC Development – Mission Statement (from http://gcc.gnu.org/gccmission.html)
"GCC development is a part of the GNU Project, aiming to improve the compiler used in the GNU system including the GNU/Linux® variant. The GCC development effort uses an open development environment and supports many other platforms in order to foster a world-class optimizing compiler, to attract a larger team of developers, to ensure that GCC and the GNU system work on multiple architectures and diverse environments, and to more thoroughly test and extend the features of GCC."
GNU Compiler Collection Advantages
- Consistent across a large number of targets (a target is an architecture plus an operating system)
- Very good warnings aid porting to new targets
- Free software project:
  - licenses protect GCC as free software while allowing users to build proprietary software
  - does not depend on any one organization
  - developers have a deep commitment to GCC
  - many choices for commercial support
- Continuous improvement

Computer Architectures
Aspects of support:
- instructions, registers, memory
- CPU tuning parameters
Architectures GCC supports:
68HC11, 68HC12, Alpha, AMD64, ARC, ARM, AVR, Blackfin, CRIS, CRX, C4x, D30V, FR-30, FR-V, H8/300, IA-32, IPF (IA-64), IQ2000, M32C, M32R, M68000, M88000, MCORE, MIPS, MMIX, MN10300, PA-RISC, PowerPC®, PowerPC64, ROMP, S/390®, SH, SPARC, SPARC-V9, Stormy16, V850, VAX, XScale, Xtensa, z/Architecture™, Cell

GCC – GNU Compiler Collection
- Free Software Foundation (FSF)
- 2.3 million lines of code, 20 years of development (since 1987)
- Multiple programming languages: C, C++, Objective-C, Ada, Java, Fortran
- Multiplatform: multiple target processors and operating systems
  - IBM eServers, RISC, embedded, ... (over 30 architectures!)
  - Linux, Unix, Windows, OS/X, embedded, ...
- GCC is the de facto standard in the Linux ecosystem

GCC – Development Process
Legal issues:
- Copyright held by the FSF (assignments required!)
- Licensed under the GPL
Who's involved:
- Volunteers
- Academia
- Linux distros: Red Hat, Novell/SUSE
- IBM: IBM LTC, Haifa Lab, IBM Research
- Vendors: CodeSourcery, ARM, AdaCore, more

2. GCC – recent contributions from IBM Haifa
GCC Developers' Summit 2004 (http://www.gccsummit.org/2004/speakers.php)

Programming for Vector Machines
- Proliferation of the SIMD (Single Instruction Multiple Data) model: MMX/SSE, AltiVec
- Used in communications, video, gaming
- Fortran90 array notation:
    a[0:N] = b[0:N] + c[0:N];
- Intrinsics (AltiVec):
    vector float vb = vec_ld (0, ptr_b);
    vector float vc = vec_ld (0, ptr_c);
    vector float va = vec_add (vb, vc);
    vec_st (va, 0, ptr_a);
- Autovectorization: the compiler automatically transforms serial code into vector code.

Vectorization
Original serial loop:
    for (i = 0; i < N; i++) {
        a[i] = a[i] + b[i];
    }
Loop in vector notation:
    for (i = 0; i < N; i += VF) {
        a[i:i+VF] = a[i:i+VF] + b[i:i+VF];
    }
Vectorized loop (VF is the vectorization factor, i.e. the number of elements per vector):
    for (i = 0; i < (N - N%VF); i += VF) {
        a[i:i+VF] = a[i:i+VF] + b[i:i+VF];
    }
Epilog loop (handles the remaining N%VF iterations):
    for ( ; i < N; i++) {
        a[i] = a[i] + b[i];
    }

Vectorization Examples
Pixel blending application:
- small dataset: 16x improvement
- tiled large dataset: 7x improvement
- large dataset with display: 3x improvement
    for (i = 0; i < sampleCount; i++) {
        output[i] = ((input1[i] * α) >> 8) + ((input2[i] * (α-1)) >> 8);
    }
SPEC gzip: 9% improvement
Source loop:
    for (n = 0; n < SIZE; n++) {
        m = head[n];
        head[n] = (m >= WSIZE ? m-WSIZE : 0);
    }
Vectorized AltiVec loop (vsubuhs is a saturating unsigned-halfword subtract, so it computes m-WSIZE clamped at 0 for eight elements at once):
    lvx     v0,r3,r2
    vsubuhs v0,v0,v1
    stvx    v0,r3,r2
    addi    r2,r2,16
    bdnz    L2

GCC Developers' Summit 2005 (http://www.gccsummit.org/2005/speakers.php)

GCC Developers' Summit 2006 (http://www.gccsummit.org/2006/speakers.php)

3. GCC – performance for embedded: EEMBC
EEMBC Reports Record-Breaking Telecom Benchmark Scores for IBM PowerPC 970FX 2-GHz Processor
"EL DORADO HILLS, Calif., Aug. 21, 2006: The Embedded Microprocessor Benchmark Consortium today announced certified EEMBC benchmark scores for the IBM PowerPC 970FX which set a new record for embedded processors tested against the consortium's TeleBench™ telecommunications benchmark suite. The IBM PowerPC 970FX, clocked at 2 GHz and using the GCC auto-vectorization compiler, with full-fury optimizations for Vector/SIMD Multimedia Extension Technology, achieved a score of 1058.7 Telemarks. In out-of-the-box tests, the PowerPC 970FX achieved a score of 141.8 Telemarks, a 182% improvement over a previous record-setting score for the same device obtained with a different compiler. ...

'High-end processors are being used in more embedded products than ever, and OEMs are looking for benchmark scores that offer a more application-specific indication of performance than traditional server benchmarks,' said Markus Levy, EEMBC president. 'By publishing these TeleBench scores for its PowerPC 970FX, IBM is showing its commitment to providing customers with valuable, trustworthy information on processor performance. Beyond delivering a realistic picture of how its device will perform in communications applications, the scores also represent a convincing indication of the performance advantages provided by vector processing in general.'

For this new round of PowerPC 970FX benchmark tests, and in accordance with EEMBC rules for optimization, IBM Research improved the GCC auto-vectorization compiler and IBM's Power Architecture Performance Team modified the EEMBC benchmark source code to use the VMX instructions documented in the PowerPC Microprocessor Family: Vector/SIMD Multimedia Extension Technology Programming Environments Manual.
… Detailed score reports for the PowerPC 970FX are available free on the EEMBC web site at www.eembc.org."
(taken from: http://www.eembc.org/Press/PressRelease/060821.htm)

EEMBC: The Embedded Microprocessor Benchmark Consortium (http://www.eembc.org)

4. GCC – research: HiPEAC

HiPEAC (www.hipeac.net)

References
- GCC on the Free Software Foundation website: gcc.gnu.org (plus wiki)
- www.gccsummit.org (including paper proceedings)
- HiPEAC GCC tutorials: http://www.hipeac.net/gcc-tutorial
Recent academic papers:
- "Auto-Vectorization of Interleaved Data for SIMD", Dorit Nuzman, Ira Rosen, and Ayal Zaks, PLDI, June 12-14, 2006, Ottawa, Canada
- "Multi-platform Auto-vectorization", Dorit Nuzman and Richard Henderson, CGO-4, March 26-29, 2006, Manhattan, New York
- Thread-level speculation: PPoPP'06 [Liu et al.] and ICS'05 [Renau et al.]
- Induction variable recognition: HiPEAC'05 [Pop et al.]
- LLVM, life-long adaptive optimization: CGO'04 [Lattner and Adve]
- Generic programming (template metaprogramming and concepts): POPL'06 [Dos Reis and Stroustrup]
- "Contributions to the GNU Compiler Collection GCC", IBM Systems Journal, issue on Open Source, volume 44, number 2, 2005

Thanks!