info16

Compilation appears quarterly | October 2008 Architecture and Embedded High Performance and Network of Excellence on

2 Message from the HiPEAC coordinator

3 Message from the project officer

2 Community News - Multicore days in Stockholm, Sweden - Awards and new books

4 HiPEAC Activity - HiPEAC 2009 Conference - HiPEAC Fall Computing Systems Week - 6th HiPEAC Industrial Workshop

5 In the Spotlight - Good bye and Thanks Sylvie! Hello and Welcome Klaas! - EC FP7 STREP Apple-Core Project - EC FP7 STREP eMuCo Project Welcome to THALES – HiPEAC Fall Computing Systems Week 8 Guest Columns - Norbert Wehn, University of Kaiserslautern - Gerd Ascheid, RWTH Aachen University

10 HiPEAC Startups

13 HiPEAC Students and Trip Reports

18 PhD News

20 Upcoming Events www.HiPEAC.net

HiPEAC 2009 Conference: Paphos, Cyprus, January 25-28 Intro

Message from the HiPEAC coordinator

Dear friends,

This summer we could all enjoy the Olympic Games in Beijing. Unmistakably, there is a parallel between the world of sports and networks of excellence. The athletes meet each other regu- larly at specialized events, they are competitors for the same scarce resources, but at the same time many of them are friends. They belong to the very best in their field, they try to promote their field, and they are constantly looking for new talent. Koen De Bosschere

Then there is the Olympic motto: there is the HiPEAC Conference in new innovative products, how to citius, altius, fortius – faster, higher, January 2009, in Paphos, Cyprus. All convince policy makers that comput- stronger. Don’t we have similar goals events are ideal occasions for find- ing systems research is at least as – to be ever faster, more power effi- ing the perfect project partners for important for the future of mankind cient, and more reliable? The upcom- the upcoming FP7 call on computing as research in particle physics. ing FP7 call is exactly about this. We systems. hope that our community will submit As a coordinator of HiPEAC2, I am many excellent research proposals in The end of this year also means the very committed to making this hap- this call. end of the HiPEAC1 project. In ret- pen. However, as we all know, in our rospect, we believe that HiPEAC did community the credit should go to the This summer, 200 of us enjoyed the much groundbreaking work for our inventor and not to the implementer! yearly ACACES summer school in community in Europe. We started Therefore I would like to sincerely L’Aquila. From the evaluation forms, with a small network and we worked thank Mateo Valero who initially had we learnt that the participants were very hard to create a strong HiPEAC a dream about a vibrant HiPEAC com- very pleased with the program. We brand. Once we got all the processes munity Europe, and steered HiPEAC are currently working on the 2009 in place, we were then able to open through its early years. The current event, which will be announced in the network up to a wider commu- generation can only be thankful for the next HiPEACinfo. nity. Nobody could ever dream then, such an inheritance which we will when we started, that we would have protect and further develop in the There are three more important what we have today. But we do still coming years. Unfortunately, we will upcoming events for our community: realize today that this is only the first never be able to make more than there is the launch of the next FP7 step. incremental contributions to such a call on November 25, there is the fall great concept. HiPEAC Computing Systems Week in We believe that the next step will be the same week at Thales Paris, and about consolidation - how to attract Take care, the necessary resources to undertake cutting-edge research, how to create Koen De Bosschere n

Community News HPCA Group: NVIDIA Professor Partner Award

NVIDIA, the world’s largest manufac- The research conducted by the members research is to improve the programma- turer of graphics processors, has distin- of the HPCA group under the leader- bility of parallel solutions to dense linear guished the HPCA (High Performance ship of Profs. Enrique S. Quintana-Orti, algebraic problems by approaching the Computing and Architectures) group Rafael Mayo and Gregorio Quintana- design of the libraries with a high level (http://www.hpca.uji.es) from the Orti, pursues the development of faster of abstraction. Important applications Universidad Jaime I de Castellon with and more reliable solvers for dense lin- that lead to large-scale problems of this one of the 2008 NVIDIA Professor ear systems of equations using NVIDIA class arise, e.g., in boundary elements Partner Awards. hardware. A second major goal of this methods for Aeronautics, Statistics,

2 info16 Panos Tsarchopoulos [email protected]

Message from the project officer

ICT 2008 - “I”s to the future: communication technologies. ICT 2008 in the devel- Invention, Innovation, Impact will set the agenda for ICT research and opment of With more than 4000 visitors expect- innovation in Europe during this crucial next genera- ed, the biennial ICT Event is the most decade. The event hosts leading vision- tion ICT and to ensure the expansion important forum for discussing research aries from academia and industry and of existing businesses. In the spotlight: and public policy in Information and addresses topics as diverse as Europe’s SMEs, the future internet, new business Communication Technologies at role in shaping the future internet, ICT’s sectors and creative industries. The rela- European level. The ICT Event brings contribution to advancing the sustain- tionship between research and success- together researchers and innovators, pol- ability agenda and alternative research ful innovation will be critically examined. icy and business decision makers work- paths for future ICT components and Mobilising and inspiring today’s young ing in the field of digital technologies. systems. These and many other cutting- people as potential researchers and inno- edge themes will be explored in depth vators of the future will also receive spe- The ICT 2008 event takes place from at ICT 2008. cial attention at ICT 2008. 25th to 27th November in Lyon, France. It is organised by the European Inventing the Future: ICT technologies Impact through Policy. Focus on Commission’s Directorate General for for the future. The event’s “Inventing effective public policies to stimulate the Information Society and Media and the Future” theme covers major research ICT research and innovation for growth is hosted by the French Presidency of the trends in information and communica- and sustainable development. This will European Union. tion technologies (ICT) such as new include issues such as community and computing paradigms, ICT-bio and nano, public research spending, the creation Over e2 Billion for ICT research, photonics, cognition robotics and the of conditions favourable to innovation 2009−2010. This year’s ICT event - the use of ICT for science. The 2009−2010 and better coordination of the European largest research event in Europe in 2008 Work Programme and Calls for Proposals research effort in ICT. – will examine: for ICT research in the EU’s Seventh Framework Programme (FP7) will be pre- Exhibition and Networking. The event • European Union priorities in ICT sented in detail - one dedicated session is accompanied by an exhibition with research for over e2 billion of funding will present the next Call in Computing more than 200 stands and numerous available in 2009−2010 Systems. Other sources of EU research networking sessions designed to facili- • The major current technological trends funding for ICT will also be examined, tate contacts between researchers, inno- which impact upon strategic research including the new Joint Technology vators and engineers from all ICT fields. planning Initiatives (JTIs) and the Competitiveness More information at: • Public research policies to stimulate and Innovation Programme (CIP). These http://ec.europa.eu/information_society/ research and innovation initiatives together represent over e2 bil- events/ict/2008/index_en.htm lion in EU support for ICT research over Setting the ICT research agenda for the next two years. Panos Tsarchopoulos the next decade. The next ten years Project Officer will see major transformations in the Innovative Europe: new markets, technological, industrial and business new sectors, new players. This year’s landscapes surrounding information and ICT event will seek to involve new players n

Computational Chemistry, and Earth Sciences. While the solution to these problems would require months using thousands of more traditional (gener- al-purpose) multi-core processors, this research may yield a reduction in the number of resources or the solution time by one order of magnitude, and thus it is expected to have a significant impact in these application domains. n HPCA Group, Universidad Jaime I de Castellon info16 3 HiPEAC Activity

HiPEAC 2009 Conference

The three previous editions of the reconfigurable computing, interconnec- HiPEAC conference in Barcelona, Ghent, tion networks, reliability and simulation/ and Goteborg have continued to con- evaluation. Also, a tutorial on reliability tribute greatly towards establishing the will be offered by experts from Intel. HiPEAC conference as a premier forum for dissemination and networking in the The conference technical program is also area of high performance and embed- rich and diverse. The program chairs, In conclusion, the last week of January ded architecture and compilers. A clear Mike O’Boyle and Margaret Martonosi, 2009 will be full of various events that illustration of this is HiPEAC 2008’s together with their program committee, are central to HiPEAC activities, and record participation of more than 250 have selected 27 out of 97 submitted therefore provide many reasons for you attendees. It is a great honour and privi- papers for presentation at the confer- to attend. Please check the conference lege to be the general co-chairs of the ence. The conference will also feature web page (http://www.hipeac.net/con- fourth HiPEAC conference that takes two keynotes from Tilak Agerwala (IBM ference) for conference information and place in Paphos, Cyprus on January Research VP, IBM Systems) and from news, and to register for these great 25-28, 2009. The local organization Francois Bodin (CTO, CAPS Entreprise). events in time. has been arranged by Yanos Sazeides, There will be plenty of opportunities for University of Cyprus. interaction during the conference, which Looking forward to seeing you in Paphos takes place at Amathus, a 5-star seaside in January next year! HiPEAC 2009 is shaping up to be an hotel. The conference will include an exciting event. Stefanos Kaxiras, the excursion to various archeological sites. André Seznec workshop and tutorial chair, is organis- After the main conference, HiPEAC IRISA/INRIA ing a set of seven workshops on topics research cluster meetings will take place central to HiPEAC: multi-cores, compiler on 29th and 30th January. Joel Emer optimizations, compilation framework, INTEL/MIT n

HiPEAC Activity HiPEAC Fall Computing Systems Week November 24-28 2008, Thales Research and Technology, France

We are glad to announce that the The HiPEAC Cluster Meetings will take ancient 17th century typical domains to “HiPEAC Computing Systems Week” place on November 27th and 28th. allow further opportunities for discus- will be hosted by Thales Research and Extra meeting rooms will also be made sions and networking. Technology, Palaiseau, France from available on Monday 23rd and Tuesday We are looking forward to seeing November 24th to November 28th, 24th (before the industrial workshop) you and your collaborators during the 2008. The facility is located in the sub- for extra HiPEAC-related meetings, such HiPEAC Computing Systems Week! urbs of Paris and close to INRIA-Saclay as research-project meetings. Please take note of the following sched- and Paris XI University. Besides these events, we also plan ule; all but the social event are taking The week will include the following co- an exciting social event at one of the place at Thales Facilities: located events: The 6th HiPEAC Industrial Workshop on November 26th. The goal of this Date/Time Event workshop is to bring together research- Monday 24th November, all day Extra meetings ers from academia and industry to Tuesday 25th November, all day Extra meetings investigate the challenges, techniques, Wednesday 26th November, all day 6th HiPEAC Industrial Workshop tools and requirements of multi-core Thursday 27th November, all day HiPEAC Cluster Meetings architectures in embedded systems, especially in the automotive, aerospace, Thursday 27th November, PM Social Event and security industry. Friday 28th November, all day HiPEAC Cluster Meetings

4 info16 HiPEAC Activity

6th HiPEAC Industrial Workshop –Domain specific, real time and safety critical embedded systems in the multi-core era November 26th, 2008 Organized by Embedded System Lab, Thales Research & Technology http://www.hipeac.net/industry_workshop6

The ubiquity of embedded electronics Furthermore, a lot of these systems, (including performance predictability, in all aspects of human life as well as especially in the automotive and the reliability constraints, …). the emerging applications in automo- aerospace domains, are subject to hard • Low power techniques for multi- tive, aerospace and security industry real time and safety critical requirements cores suggest multi-core architectures as a that reduce the efficiency and potential • Reconfigurable and adaptive architec- natural path to scalable performance. of current multi-cores and high perform- tures Still, such applications have challenging ance architectures. • Heterogeneous multi-core architec- requirements beyond those existing in tures and virtualization techniques high performance computing and high The goal of this workshop is to bring • Compilers, tools and methods for volume embedded systems. together researchers from academia and embedded, high performance and As well as the increasing need for per- industry to investigate the challenges, heterogeneous parallel architectures formance, low power consumption, and techniques, tools and requirements of • Mapping parallel applications to het- the challenges the industry is facing multi-core architectures in embedded erogeneous multi-cores to effectively program parallel archi- systems, especially in the automotive, • Simulation tools, design space explo- tectures, such specific systems are usu- aerospace, and security industry. ration and performance modeling of ally subject to conflicting requirements multi-core and/or application specific such as flexibility (or “genericity”) to The topics of interest include, but are architectures. compensate for low volume production not limited to: • Iterative compilation techniques for needs, and customizability to address • Processor customization and applica- embedded and high performance sys- each application domain needs. tion-specific architectures tems • Parallel and multi-core systems for • Domain/application specific accelera- real time and safety critical systems tors n

In the Spotlight

Good bye and Thanks Sylvie! Hello and Welcome Klaas!

As of September 1st, Sylvie Detournay, who was the HiPEAC technical staff member, left the network. Over the last year, Sylvie enhanced the HiPEAC website with many new services, and she helped with the logistics for sev- eral HiPEAC tasks, such as the cluster meetings, the conference, and the summer school. She will start a PhD in Mathematics at INRIA (the network works in many ways). We wish her a very successful career.

Sylvie is replaced by Klaas Millet. As well as possessing a Master in History, Klaas also has a Master in Applied Informatics. He has built several web- sites using the Drupal-system (e.g. sition to go smoothly. You may contact www.halestra.be). We expect the tran- Klaas at [email protected]. n

info16 5 In the Spotlight

EC FP7 STREP Apple-Core Project: Architecture Paradigms and Programming Languages for Efficient Programming of Multiple CORES Apple-CORE will develop compilers, operating systems and execution platforms to support and evaluate a novel architecture paradigm that can exploit many-core computer systems to the end of silicon.

• SVP cores can execute legacy binary systems are given below and show the Coordinator & Technical Manager code unchanged. expected performance of a single clus- Name: Chris Jesshope The project has produced a full soft- ter of SVP cores executing a 256-point Institution: University of Amsterdam ware emulation of a cluster of SVP FFT. This code generates 8 stages of Email: [email protected] cores. This emulation is being used to 128 threads and is a relatively small Project website: evaluate memory systems, to evaluate problem. The performance assumes that http://www.apple-core.info code-generation schemes for high-level, the FFT was delegated to a set of bare Partners: University of Amsterdam parallelising compilers and to investi- cores using an SVP create instruction (Netherlands), University of gate the model’s generality. Within the and also assumes all local caches were Hertfordshire (UK), Institute of project we will also be implementing cold. A number of on-chip memories Information Theory and Automation a hardware prototype of an SVP core are evaluated. The dotted line shows (Czech Republic), University of Ionnina based on the LEON III and we will port the maximum theoretical performance. (Greece), Associated Compiler Experts a Linux operating system on top of the All results show very good scaling. Note (Netherlands), Gaisler Research SVP kernel. that at 32 cores there are only 4 threads (Sweden) The core compiler for this project is per core and very little opportunity to based on GCC. We have defined a lan- tolerate memory latency. It is not surpris- guage, µTC, that adds the SVP kernel ing therefore that performance begins Objectives actions to the C language and we have to saturate at this point. As expected, The Apple-CORE project will evaluate modified GCC to compile this language the ideal memory saturates latest and a programming model developed by to SVP extended ISA. Currently the com- the COMA earliest, as there is signifi- the University of Amsterdam in the FP6 piler produces Alphas code but will also cant latency induced by the coherence AETHER project. This model, SVP (the be modified to support the SVP version protocol. Self-adaptive Virtual Processor) provides of Leon III. for concurrency creation, control and The project will promote the μTC lan- Expected Impact reflection. Its actions can be considered guage as a standard front-end to the With the infrastructure developed in this as an operating system kernel. The SVP GCC compiler and will use it as a target project, it will be possible to evaluate model provides the following benefits: for all user-level compiler development. silicon products with many cores for a • Safe composition of concurrent pro- These include: variety of applications, including embed- grams without inducing deadlock; • a parallelising C compiler based on ded, commodity and high-performance • It captures program concurrency the ACE CoSy framework systems. The project will also impact pro- abstractly without any mapping or • a data-parallel functional language – grammability aspects of this migration scheduling - even at the ISA level; Single-assignment C (SaC) and support the software engineering • It captures resources as first-class of highly concurrent systems. This results objects and supports the dynamic Some Results from the natural separation of concerns allocation of resources. Some results from the Alpha multi- between application and concurrency In Apple-CORE this model is imple- core emulator using different memory issues in the SVP model. The languages mented in the ISA of a conventional supported are: μTC for low-level system RISC processor, which allows many-core code, C for legacy applications and SaC chips to be designed with the following for the rapid development of efficient advantages: code for high performance applications. • Provides a high degree of latency tol- Overall, this will allow the concept of erance to asynchronous operations, massively threaded architectures to be such as memory references; explored by the European computer • Given the abstract nature of the bina- industry and the outputs of the project ry code, it has position independence could have a large impact on all the and is independent of the number of above markets. cores used to execute it; n

6 info16 Community News

Multicore days in Stockholm, Sweden

All speakers at Multicore days, 2008 in Stockholm

For two days (11th and 12th September, intensive systems industry in Sweden. tions to current problems. 2008) multi-core technology was the The Initiative brings together all par- • Swedish Multi-core Workshop: An focus at the Multi-core days held in ties interested in taking this technol- annual workshop for academia and Stockholm, Sweden. The event followed ogy forward. The founding members industry to present and discuss recent up the success from last year with more are from six academic institutions and research results. The first workshop than 300 registered visitors who listened three enterprises under the umbrella of will be organized by Blekinge tekni- to world renowned experts in the area, Swedsoft (www.swedsoft.se). ska högskola in November 2008. such as Anant Agarwal of MIT and The vision of the Initiative is to increase • Best practices workshop: An annual Tilera corporation, Kunle Olukotun of competitiveness of the Swedish software workshop for industry and academia Stanford Pervasive Parallelism labora- intensive industry, leading to an ability to present and discuss best practices tory, David Padua of University of Illinois to leverage multi/manycore technology. in multi-core software development. and HiPEAC members Per Stenström of Objectives of the initiative include: First BP workshop will take place in Chalmers and Erik Hagersten of Uppsala • To ensure that graduates from February 2009. University, among others. Swedish universities have the high- • Networking activities between est competence in multi-core-related research groups One of the more exciting sessions of technology • Working groups will be formed the event was the panel discussion • To help Swedish multi-core research around a number of important top- where the future direction of multi-core remains competitive worldwide ics: architectures and programming models • To help leading researchers seek col- • A technology roadmap from a were debated. It was clear that the laborations with members of the net- Swedish industrial perspective panel members had strongly differing work • Curriculum development views in various areas, e.g. homogene- The Initiative will help foster ties • Coordination and marketing of ity vs heterogeneity; many thin cores between Swedish and international Swedish multi-core competence or fewer fat cores; domain specific lan- industrial/academic partners, such as guages vs general purpose languages the ones at UC Berkeley/University of The Swedish multi-core initiative is supporting parallelism. There is still a lot Illinois UC (Microsoft, Intel US $20m) coordinated by a steering committee of interesting work to be done. and at (AMD, HP, consisting of representatives of the Intel, NVidia and Sun US $6m). founding members (Blekinge Institute Multi-core days 2008, co-organized of Technology, Chalmers, Mälardalen with Ericsson SW Research, is one of Activities organized by the Swedish University, Swedish Institute of Computer the activities in the newly established Multi-core Initiative include: Science, Royal Institute of Technology Swedish Multi-core Initiative. This is a • Multi-core day: A state-of-the-art and Uppsala University). concerted effort to address the tactical annual seminar for industry and For more information about the Swedish and strategic issues related to multi-core academia highlighting technology Multi-core Initiative and its activities see processor technology for the software advances, research results and solu- www.sics.se/multicore. n

info16 7 Guest Column

Professor Norbert Wehn, University of Kaiserslautern

As a relatively new member to the the energy consumed by other system introduced by dynamic power manage- HiPEAC network, I am very proud to be components, such as the memory sub- ment. XEEMU is still under active devel- asked to write a guest column for the system, for example, which is known to opment and has recently been extended HiPEAC newsletter, and thus be given contribute significantly to system energy with accurate models for the simulation the opportunity to present some of consumption. of SDRAM power management schemes the research conducted at my research and support for OS simulation. We feel group. My name is Prof. Norbert Wehn, Thus, a meaningful evaluation of power that XEEMU is a very interesting tool and I currently head the Microelectronic management schemes and other energy for researchers in the field of power- Systems Design Research Group at the related design decisions can only be efficient software design and power Department of done either by power measurements or management. and Information Technology at the by simulation. While power measure- University of Kaiserslautern, Germany. ments often are complex, error prone, XEEMU is freely available in source and Today, I want to introduce to you our or simply infeasible in early design stag- binary form for Linux operating systems XScale Energy EMUlator XEEMU, a simu- es, simulators often lack capabilities under GPL license from our webpage at lation tool which we developed in coop- of memory subsystem simulation, and www.inf.u-szeged.hu/xeemu. For fur- eration with the research group from do not model the time- and energy ther information please also visit www. Prof. Tibor Gyimóthy at the University of overheads introduced by voltage and eit.uni-kl.de/wehn or feel free to write Szeged, Hungary. frequency scaling. an e-mail to [email protected]. n

Energy efficiency is key in embedded-sys- In order to overcome the above men- tem design. Understanding the complex tioned problems we have developed issue of software power consumption in XEEMU, a cycle accurate simulator for early design phases is of extreme impor- the wide spread XScale architecture. tance for making the right design deci- XEEMU was first presented to the sions. Power management schemes for research community at PATMOS 2007. CPUs have been thoroughly researched, In contrast to other well known simula- dynamic voltage and frequency scal- tors, such as the SimpleScalar tool set, ing (DVFS) perhaps being the most it aims at the most accurate simulation important one. A large amount of work possible for an existing platform. It com- is based on models, which assume a bines cycle accurate behavioural simula- quadratic dependency of power con- tion models for the CPU, the system sumption on supply voltage and a linear busses, and a configurable SDRAM sub- dependency on operational frequen- system. All its power models are based cy. In reality, however, this assump- on measurements on an evaluation Prof. Dr.-Ing. Norbert Wehn. tion does not hold for today’s complex board and include accurate simulation Microelectronic Systems Design Research CPUs and it also completely neglects for the timing and energy overheads Group. University of Kaiserslautern

Community News The Honorary for 2008 awards to Prof. Mateo Valero

On September 12th, the University of • His research results have had an ed this scientific area up to the mid Belgrade awarded Prof. Mateo Valero, extremely strong impact on the devel- 1950s). Director of the National Supercomputing opment of world science in the field of • His contribution to the creation of the Center in Spain, an Honorary Doctorate computer architecture. Furthermore, computer architecture department at of the University of Belgrade. The rea- Prof. Mateo Valero has once again the Technical University of Catalonia sons for this important recognition are placed the whole of Europe on the (UPC) as follows: map for research in this field (the • His contributions since 1984 to the United States and Japan dominat- creation of several parallel computing

8 info16 Guest Column

Professor Gerd Ascheid, RWTH Aachen University

As coordinator of the UMIC research which for mobile access should be as cluster, I feel honoured to contribute a close as possible to wired access. guest column for the HiPEAC Newsletter. About two years ago, Germany started An example of a key research topics at an initiative that involved the funding UMIC is the interfaces between applica- of “clusters of excellence” at German tions and the mobile network, enabling a universities, granted in a country-wide more efficient use of the mobile network competition. The only research cluster in and providing a significantly improved the mobile communication domain was quality of service. Moreover, suitable awarded to RWTH Aachen University; its network architectures to improve ultra focus was on Ultra high-speed Mobile high-speed data rate coverage and Information and Communication - abbre- an intelligent use of mobile network viated as “UMIC”. resources. Mobile devices must have much higher compute performance in While the dominating application in the future to support such a strongly Prof. Dr.-Ing Gerd Ascheid mobile networks in the past and still improved quality, much higher data rates Institute for Integrated Signal Processing today is voice communication, traffic and demanding applications, e.g. in the Systems (ISS) RWTH Aachen University figures show strong growths in data multi-media realm. Unfortunately, future communication. It is expected that future silicon technologies will not allow much mobile network traffic will be dominated higher clock speeds. Therefore, mas- The UMIC research cluster involves more by data services. A key driver for this sively parallel processing will be required. than 20 chairs of the Electrical Engineering traffic is the mobile internet. In addi- The processing for wireless communica- and Information Technology Faculty and tion, many new “mobile applications” tion includes hard real time constraints. the Computer Science Department of are under development or envisaged. Because mobile devices are battery oper- RWTH Aachen University. UMIC operates 3G technology and its evolution (LTE) - ated, low power is another key require- a research centre, where the interdisci- which will be deployed in the coming ment. As a result, wireless devices will plinary research teams, four new UMIC years - support increasing data rates. be based on heterogeneous rather than professorships and a joint research lab However, significant progress is still homogeneous multi-processor systems are located. In December the centre required to meet future needs. Key issues on a single chip (MPSoC). Key UMIC will move from its temporary site to the are not just higher data rates, but high research topics in this field, which has UMIC research centre building which is data rates for many users in dense traffic close links to the HiPEAC network of under construction. areas (densely populated areas, business excellence, are power efficient proces- areas, etc.) and an improved country- sor architectures, ultra fast simulation For further information please also visit wide coverage of high data rate services. of MPSoC to enable hardware/software www.umic.rwth-aachen.de or contact Even more important is an improvement co-design and programming of embed- us via email: [email protected] of the quality, as perceived by the user, ded MPSoC. aachen.de n

centers made the construction of the • His activities have had a strong impact National Supercomputing Center in in Serbia. 2004 possible, where he is the director. This public consortium and the compu- The Rector of the University of Belgrade, ter architecture department at UPC Branko Kovačević presented the award to are strongly linked to the computer Prof. Mateo Valero and thanked him for architecture departments of the most his contributions and continuous dedi- prestigious international universities, cation to science and especially to the such as the University of Belgrade. relevant progress made in the field of • Prof. Mateo Valero has built strong computer architecture. bridges with the IT industry, especially Microsoft, Intel and IBM. n

info16 9 HiPEAC Start-ups

Promotion of HiPEAC Start-ups Marco Cornero

In this issue of the start-up promotion tronics industry today: at one extreme ming multi-cores, offering interactive section we focus on Nanochronous Nanochronous Logic offers innovative automatic software threading tools. Logic and Nema Labs. Although very EDA tools for the design and fabrication We are pleased to see that HiPEAC different in nature, they still share the challenges of deep submicron circuits, offers visibility and networking oppor- same ambition to address some of while at the other end NEMA Labs deals tunities in such a wide range of funda- the fundamental problems of the elec- with the issue of effectively program- mental topics. n

Nanochronous Logic

Nanochronous Logic is a dynamic start- The Technology 45nm and below. Utilizing the novel up company providing breakthrough Nanochronous Logic provides a novel concept of creating timing partitions for EDA technology for ASIC and SoC devel- timing methodology creating designs a design, an IC’s global timing can be opers to address design challenges at that are variation-aware; addressing the optimally tuned at the presence of post- 65nm, 45nm and below. The company hurdles presented by processes at 65nm, fabrication variations. is a co-located entity with corporate activities in San Jose, CA and R&D facili- ties located in Heraklion, Crete, Greece. NanoSync™, Nanochronous’ first product is an EDA tool for Nanochronous Logic’s “De-Synchro- applying de-synchronization to any conventional ASIC or SoC nization” technology enables for the design. It integrates smoothly into existing front-to-back-end first time in EDA the automatic transfor- EDA tool flows. Based on an analysis of the clocking, the data- mation of any conventional ASIC or SoC path structure and the timing constraints of an IC, NanoSync is circuit to a functionally equivalent circuit capable of dividing an IC into local regions of timing. Each region with embedded variation-sensitive and is driven by an adaptive, variation-sensitive clock and it can bor- timing-adaptive structures. A de-syn- row time from neighboring timing regions. Therefore, global chronized device improves IC implemen- timing is not directly impacted from every local problem. tation in many ways: Power-saving techniques, such as MSMV and PSO, can be incor- • Reduces power by enabling effective, porated as timing region properties. DFVS is inherently supported continuous DFVS (Dynamic Frequency by the methodology. The timing capabilities of the tool include Voltage Scaling), effective MSMV rationally related clocks, asynchronous clocks and Globally (Multi-supply Multi-voltage), and PSO Asynchronous Locally Synchronous (GALS) systems. Back-end (Power Shut-Off) Operation integration is another key strength of the tool. Post P&R informa- • Enables effective Yield Correction for tion, such as placement locations and back-annotated parasitics, Power or Performance by allowing can be fed back into NanoSync in order to fine tune and re- fabricated die to individually self-diag- calibrate the desynchronized clocking network parameters prior nose, report and even self-tune their to tapeout. speed based on operating conditions NanoVerify™ is a formal verification EDA tool that works in con- and applied voltage junction with industry standard formal verification engines. It is • Increases achievable performance by able to formally verify the correctness of the de-synchronization (i) adapting a circuit’s timing to the conversion process by comparing the synchronous and desyn- actual, not the worst-case predicted chronized netlists for equivalence. NanoVerify is a total solution conditions, and (ii) exploiting a time- for desynchronized designs. It is capable of verifying the logic, borrowing mechanism which can the timing partitions and the clocking structures of the desyn- compensate for inter/intra-die varia- chronized design with respect to the reference (original design) tions through its own equivalence and assurance checks. • Enables EME (Electromagnetic Emissions) constrained design through clock-control mechanisms Nanochronous Logic has filed a number released versions of its software tools, The Nanochronous team is composed of of U.S.P.T.O. patent applications for its NanoSync™ and NanoVerify™ to Brian Hamel, Co-founder/CEO, Christos technology. tier one IDMs. The company is looking P. Sotiriou, Co-founder/CTO, and Spyros to expand its initial customer relation- Lyberis, VP Engineering/lead software Company status ships throughout 2008 and plans to developer. Nanochronous Logic has already deliver additional tools to complement

10 info16 HiPEAC Start-ups

NanoSync and NanoVerify throughout number of industrial complexity designs ing round and is now targeting a Series A 2008 and beyond. have been de-synchronized in 90nm and round in the following months. This will Nanochronous Logic has passed multiple 65nm processes, where the voltage scal- allow Nanochronous to ramp up busi- designs through the NanoSync de-syn- ing ability and the low electromagnetic ness development and technical support chronization tool and has already dem- emissions have been successfully demon- resources, as well as support the contin- onstrated working silicon on an older, strated through simulation. ued development of additional products 250nm technology node. A significant The company has closed on a seed fund- derived from the core technology. n

Nema Labs – Fast and Reliable Threading on Multicores

Near the turn of the century, processor that allow software developers to thread code. A press of the next architects began to use the extra logic legacy C/C++ code fast and reliably by button determines whether to place multiple cores per die because using a sequential abstraction with the the threading process pro- extra investment in single-core proces- help of automatic threading in an interac- duced code that is free of deadlock and sors generated extra heat, not greater tive fashion. race conditions, the two most notorious performance. All major silicon processor problems faced by programmers attempt- manufacturers were providing commodity At Chalmers University, Dr. Per Stenstrom ing to parallelize code. Pushing the final processors with at least two cores per die, has contributed to architectural advances button will report the effective perform- and system manufacturers rapidly brought for parallel systems for well over two ance speedup obtained by the threading to market new machines that were, in decades. With the emergence of multi- process. fact, not optimum for most software on cores on the horizon, Dr. Stenstrom lev- the market and in development. This mis- eraged both his knowledge of parallel This seemingly simple process conceals match between the hardware capability systems and his mission to educate new a wealth of innovations underlying the of commodity systems and the software generations of computer scientists into a entire method. With unique and pow- architecture of decades of products has new company, Nema Labs, that is offer- erful algorithms, Nema Labs is able to been termed a ‘software crisis,’ an expres- ing a roadmap of threading tools that analyze and safely thread loops in seconds sion that truly understates the magnitude innovates on top of automated C/C++ that might otherwise take many weeks of the problem facing the trillion dollar source code threading products to accel- or months by a programmer trained in information technology industry. erate reliable threading of legacy code. parallel programming. And this is accom- The first product, called FASThread (Fast plished for both novice and experienced It will take many years of parallel pro- Automated Software Threader), will be programmers, regardless of their parallel gramming education and innovation for released in early 2009 as a Linux plug-in programming experience. The combina- software vendors to create products that for the Eclipse Integrated Development tion of these two features—threading will blaze when executed on multi-core Environment (IDE), and will provide devel- difficult code, and doing it fast and auto- systems. In the meantime, nearly 90% opers a familiar setting for threading their matically—makes the Nema Labs’ product of today’s C/C++ programmers can’t pro- C programs with a few simple, automated release of high interest to customers and gram for multi-cores, and at least 70% of steps. Shortly thereafter, plug-ins for users. legacy C/C++ software won’t run better other IDEs and operating systems will on multi-cores. Multi-core systems, with be released followed by a C++ version Nema Labs has released an alpha prod- the capability to execute multiple threads, as well. uct for targeted work with customers in are in fact mostly running single-threaded Sweden to help them discover how they code. FASThread is straightforward to use and might significantly improve their prod- does not require any prior experience with ucts with minimal new investment. The Nema Labs’ mission is to offer tools that parallelization. The user loads a project company has also launched a new US allow software developers to concentrate to be parallelized into Eclipse, sets the operation based in Silicon Valley, and has on delivering functionality and to leave number of threads desired, builds, and initiated projects with several equipment the task of reliably threading the code runs the program. A push of a button manufacturers and software development to leverage on the performance of multi- then profiles the code and produces a groups. The company plans to release its cores to intelligent tools. While tools and call graph from which the user can see first product in the spring of 2009. methods to parallelize C/C++ code are where the program is spending most of available, they either reduce software its time. The user can then zoom into The innovations and discoveries in the development productivity by offering too the function of interest and right-click on Nema Labs’ products are made possible low a level of abstraction (e.g., by using the loop to be threaded, selecting it for by an exceptionally talented and focused OpenMP) or by only occasionally lev- threading analysis. A push of another group of software engineers and devel- eraging the performance potential of button threads the loop and informs the opers. Nema Labs is currently expanding multi-cores (auto-parallelizers). By con- user of actions to take to extract parallel- rapidly and seeking highly creative and trast, Nema Labs roadmap realizes tools ism from what was otherwise intractable skilled people to join the team. n

info16 11 In the Spotlight

EC FP7 STREP eMuCo Project: Embedded Multi-Core Processing for Mobile Communication Systems ICT-eMuCo project addresses the system platform of future mobile devices based on multi-core architecture

• Flexible system platform by a modular System-/house 2.5/3G Multi- 4G architecture. keeping Access APP 3 APP 2 Media LTE • Seamless system platform allowing functions stack the coexistence of difference radio Scalability Scalability access technologies and software environments on the same platform. Multitasking Application Multi-threaded “Modem” • Improvement of computational Subsystem subsystem processing capacity of applications • Reduction of power consumption • Scalability OS 3 OS 2 OS 1 OS: Operating System VM: Virtual Machine • Real time handling of the system VM VM VM control signals and real time applica- tions. Load Balancer Dynamic Task Mapping Run Time Environment It is expected that eMuCo will have an User Mode impact on the future of mobile devices due to the revolutionary approaches L4 Micro Kernel System Mode in the system architecture. Two major European companies are active in this Core1 Core3 Multicore Hardware Platform market: ARM as the world’s leading Core2 Core4 supplier of cores for mobile devices and Flexible System Architecture for Future Mobile Devices Infineon as the world market leader for communication RF integrated circuits. ICT-eMuCo (www.emuco.eu) is a the best quality of service in the current Both will participate in this project and European project with a total cost environment of the user. In addition, on will ensure the technology transfer. of 4.6m€ which is supported by the one hand, it is expected that there will Many other suppliers in Europe depend European Union under the Seventh be exponential growth in the usage of on this market either directly or as a Framework Programme (FP7) for multimedia applications, such as video supplier for the major market players. research and technological develop- streaming, video conferencing, com- n ment with 2.9m€. This project is coor- plex graphics etc., thereby increasing dinated by Ruhr-Universität Bochum, the demand for computational power, which is known as one of the big- which can not be pursued further by gest universities in Germany. There accelerating the processor clock. On are also many strong academic, as the other hand, the coexistence of well as industrial, partners participat- multiple software environments will be ing, such as Technische Universität necessary for delivering all demanded Dresden (Germany), University of York services to the user. (United Kingdom), “Politechnica” Today’s mobile communication systems University of Timisoara (Romania), usually contain multiple processing Infineon (Germany), Telelogic (Sweden), units, but individual units are typi- ARM (United Kingdom) and GWT-TUD cally dedicated to specific tasks rather (Germany). than being general purpose, e.g. spe- Mobile communication has become the cifically for protocol stack handling. dominant branch in the telecommuni- Other design approaches allocate spe- cation’s sector over the last decade and cial tasks to dedicated systems resulting is still rapidly growing in the market. in unnecessary hardware for situations Mobile devices for standards like the other than high load. Attila Bilgic Universal Mobile Telecommunication The aim of eMuCo is to address the Institute for Integrated Systems System (UMTS) and the future Long platform for future mobile convergent Ruhr Universität Bochum Term Evolution (LTE) incorporate multi- devices based on multi-core architec- D-44780 Bochum, Germany ple radio access technologies to enable ture offering: [email protected]

12 info16 HiPEAC Students

Trip Report: ACACES 2008

This summer, I was able to attend It was clear the first day that the lectur- After two more days of increasingly ACACES-2008, the fourth edition of the ers were not going to limit themselves interesting classes, coffee breaks during annual HiPEAC summer school held in to a mere introduction to the topic they which people got to know each other L’Aquila, Italy. What follows is a brief were lecturing on. Most made it clear better, dinner with people from coun- trip report of a week full of learning that they would also touch on hot ongo- tries all around the world and evenings new stuff from field experts, meeting ing research and invite the audience to filled with card games, beer and tons of interesting people and crappy wireless actively participate in the classes. Several fun, people got ready for the gala din- internet. people grasped the opportunity given ner and closing party on Friday night. to them by answering questions, giving Everyone enjoyed the free ice cream, After getting up early in the morning insightful comments and questioning wine, beer and sangria, and the live on Sunday July 13th to catch my 10am the lecturers themselves. band playing funky music; some brave flight to Rome Fiumicino airport and souls even got up to dance (after a few catching the bus in Tiburtina station Another invited talk was given on drinks!) We also exchanged contact info near downtown Rome, we arrived at the Tuesday evening by Peter S. Magnusson, to keep in touch. campus in L’Aquila just in time for the founder of VirtuTech. He entertained small opening reception. the audience by talking about entrepre- All and all, the HiPEAC summer school is Although most students were a bit neurship, going over his top 10 reasons an experience every PhD student should apprehensive at first, I soon noticed to do a start-up in IT (most of which try and enjoy at least once in his/her people having interesting discussions were money-related). I was thoroughly career. over a glass of bubbles. I was able to amused by the people asking him ques- have a nice informal talk with Mary tions like “Are you a millionaire?” and More information, including an exten- Lou Soffa, the lecturer of the compilers “Once you have all that money, what sive set of pictures can be found at course I was going to take the rest of comes next?” http://www.hipeac.net/summerschool/. the week. And don’t forget: next year is an anni- In the evening, Koen De Bosschere, On Wednesday, the afternoon course versary edition, i.e. the fifth annual the main organizer, held his opening slots were replaced by a student poster HiPEAC Summer School. So, don’t for- speech, which was immediately fol- session. Each participating student was get to apply in time, because only a lim- lowed by a keynote talk by Yale Patt, invited to present his/her poster on ited number of attendees are allowed! professor at UT Austin. He explained ongoing or future work. I actively par- n why we should not give up on uniproc- ticipated and was glad to find multiple essor performance in the multi-core era people interested in my work, including we are in. fellow PhD students, people from indus- try and lecturers. The first classes got started early Monday After classes, the annual soccer tour- morning. Each day, students took class- nament was held, traditionally against es in four time slots, choosing one of the campus personnel, but this year Kenneth Hoste three courses in each slot. Thus, 12 also against a team of young Italians. (kenneth.hoste@elis. field experts gave lectures on a variety Although the ACACES crowd did ugent.be) of interesting topics, including (but not the best they could to cheer for their Ghent University limited to) compilers, networks on chip, team(s), we still lost the trophy to the http://www.elis.ugent. transactional memory, etc. young Italian team. be/~kehoste

info16 13 HiPEAC Students

Trip Report: MPSoC 2008 – an exciting week in St. Gerlach

which are tremendously helpful and valuable for the university research- ers. Being “confined” in the glamorous Château also allowed me to approach and talk to many senior executives and famous professors for in-depth details about their presentations and other interesting topics face-to-face, a privi- lege that you probably won’t find in many other conferences.

Besides the dense technical activities, MPSoC 2008 also provided plenty social situations for the attendees. The coffee breaks in the sunny garden and the wonderful dinners in the Château were MPSoC 2008 attendees and Speakers (slides available: mpsoc-forum.org) perfect for networking and bringing Maybe I would never have come across multi-processor SoC design. This year people together easily. A city tour of the Château St. Gerlach in my life if it were the event featured more than 50 world- town of Aachen was also arranged dur- not for the venue of the MPSoC 2008 class speakers from both industry and ing the week – the nice European sum- event. Fortunately enough, in a charm- academia, giving an excellent coverage mer weather and a fun meet-up in the ing June week, I not only enjoyed the of the challenges and opportunities in guest house of Aachen University simply irresistible beauty of this 800-year-old the MPSoC. The 4-day, info-packed made this great event even greater. luxury Château near Maastricht/Aachen, presentations delivered a reliable and but also had great fun in learning clear message on where we are now in It has been a fruitful and memora- about state-of-the-art MPSoC technolo- the MP-infant age and where we are ble week for me in the marvelous St. gies from presentations and face-to-face heading to on the road to maturity. I Gerlach. I hope the MPSoC 2009 edition meetings with many top experts around particularly like the breadth of the top- finds its next superior spot well. It’d be the globe. It has been an unbelievably ics – to give a few examples, the new great if I can be one of the crowd again rewarding experience for a PhD student technology fronts like 3D chips, effi- next year at the MPSoC, and hopefully like me. cient hardware implementations, e.g. in meet you there. the streaming/SDR areas, the exploding MPSoC 2008 is the eighth event of the MPSoC software programming chal- Weihua Sheng forum series, which focuses on the fun- lenges, and many proven good practices ([email protected]) damental and strategic issues to master from the leading industrial companies, ISS/SSS, RWTH Aachen University n

HiPEAC Students

Polychronis Xekalakis, Institute for Computing Systems Architecture, University of Edinburgh

I started my research while still being an University of Edinburgh! develop my ideas in a really nice enviro- undergraduate at the University of Patras ment where people do a lot of interesting in Greece. There, under the supervision of My name is Polychronis Xekalakis and world-class research. Assistant Professor Stefanos Kaxiras, I was I am currently a fourth-year PhD can- working on thermally-aware leakage con- didate at the School of Informatics in With the advent of multicore systems, trol policies and way-selection techniques the University of Edinburgh, under the one of the problems our group is trying for cache memories. Finishing my under- supervision of Assistant Professor Marcelo to solve is how to utilize them as much graduate studies I wanted to do my PhD Cintra. Being a member of the Compiler as possible so as to be able to gain some in a well established research group, while and Architecture Design (CArD) group, performance benefits even from sequen- still living in a very nice city so I picked the I have had the opportunity to work and tial applications. In an effort to relieve

14 info16 Community News

New Book João M. P. Cardoso and Pedro C. Diniz: Compilation Techniques for Reconfigurable Architectures, Springer, 2008, ISBN 13: 978-0-387-09670-4.

The extreme flexibility of reconfigurable programming languages, most notably architectures and their high-performance imperative languages, to reconfigura- potential have made them a vehicle of ble architectures. The book presents a choice in a wide range of computing brief but broad description of reconfig- domains, from rapid circuit prototyping urable architectures and contemporary to high-performance computing. The synthesis flows, followed by a detailed increasing availability of transistors on a description of important code transfor- die has allowed the emergence of pow- mations and mapping techniques when erful reconfigurable architectures with targeting reconfigurable architecture. In a large number of hardware resources, this description, the authors distinguish flexible interconnection topologies, and between the applications of mapping storage resources, making thus possi- techniques to fine- and coarse-grained ble to accommodate complex computing reconfigurable architectures. The book structures. also presents a comprehensive survey urable computing. Through this book, of significant research prototype com- the authors hope to motivate further To exploit the potential of these reconfig- pilation efforts and concludes with the discussions on the challenging topic of urable architectures with efficient recon- authors’ perspective on possible avenues compilation for reconfigurable architec- figurable computing implementations, of research for reconfigurable computing tures. It is also hoped that this book can programmers must currently navigate a to truly become a mainstream computing be a reference for researchers, educators maze of program transformations, map- paradigm. and students to learn more about the ping, and synthesis steps. The richness most prominent efforts on this subject in and sophistication of any of these steps This book is primarily intended for prac- recent years. make the mapping of computations to titioners, researchers, and graduate stu- reconfigurable architectures an increas- dents in the areas of study of hardware Despite the tremendous progress in ingly daunting process. It is thus widely compilation and advanced computing recent years, compilation techniques for believed that automatic compilation from architectures in the fields of Electrical and reconfigurable computing platforms still high-level programming languages is the and Computer offer many exciting research and devel- key to the success of reconfigurable Science. Since it focuses on the spe- opment opportunities. The authors hope computing. cific topic of compilation from high-level this book will motivate further research program descriptions to reconfigurable efforts in this domain and serve as a base This book describes a wide range of code architectures, this book can easily sup- for a deeper understanding of the overall transformations and mapping techniques port advanced compiler and computer compilation and synthesis problems, cur- for programs described in high-level architecture courses related to reconfig- rent solutions, and open issues. n

the programmer from the parallelization a deep understanding of how the new meet and exchange ideas with many burden, we are trying to design systems execution paradigm affects and is affect- researchers across Europe. I was also able perform automatic hardware paral- ed by them and finally employ techniques able to work in a company in a Sweden lelization, with some minimal compila- that will minimize their power consump- for three months under a HiPEAC grant, tion support. Under this effort, we are tion while increasing their performance. which for me was an invaluable experi- performing research in architecture and One of the things I am currently looking ence. Of course this was all due to the compilation using both conventional and at, is whether we can distinguish cases financial support and to the organization machine learning based techniques. where parallelism does not exist and try of events like the summer school by the to utilize the underlying threading mecha- HiPEAC community. My current work aims at designing micro- nism to increase the ILP of the application architectural support for efficient Thread by converting the TLS threads to helper Level Speculative (TLS) systems. Increasing threads. contact: the efficiency of these systems involves [email protected] identifying which are the critical microar- During the course of these years, I have http://homepages.inf.ed.ac.uk/s0565349/ chitectural mechanisms for them, gaining been lucky enough to travel a lot and n

info16 15 Community News

New Book Stefanos Kaxiras, Margaret Martonosi: Computer Architecture Techniques for Power- Efficiency, Morgan & Claypool Publishers, 2008, ISBN: 13 978 1 598 292084

Morgan & Claypool Publishers) processors and memory hierarchies. “In the last few years, power dissipa- The second part (dynamic power) is tion has become an important design structured after the terms a, C, V, and constraint, on par with performance, in f in the dynamic power equation: P the design of new computer systems. = a C V2 f. Chapter 3 discusses the Whereas in the past, the primary job of techniques that affect V and f, namely the computer architect was to translate dynamic frequency/voltage scaling from improvements in operating frequency the system level to the flip-flop level. and transistor count into performance, Chapter 4 presents techniques that now power efficiency must be taken affect a and C aiming to reduce (at run- into account at every step of the design time) the effective switched capacity. process. While for some time, archi- The third part discuses the increasingly tects have been successful in delivering important problem of leakage power. It 40−50% annual improvement in proc- presents techniques invented to reduce essor performance, costs that were pre- various forms of leakage power by viously brushed aside eventually caught manipulating (at run-time) physical val- up. The most critical of these costs ues such as the supply voltage or the A new book in the Synthesis Lectures on is the inexorable increase in power threshold voltage. The chapter concen- Computer Architecture Series (Morgan dissipation and power density in proc- trates on the architectural-level policies & Claypool Publishers, Mark Hill Ed.) essors. Power dissipation issues have for applying such techniques. was presented this June at the ISCA catalyzed new topic areas in computer 2008 (Beijing) conference. The book architecture, resulting in a substantial In a fast-changing field the book aims to entitled “Computer Architecture body of work on more power-efficient synthesize some of the most important Techniques for Power Efficiency” architectures. recent work. The target readers of this is co-authored by Stefanos Kaxiras Power dissipation coupled with dimin- book are engineers, researchers, com- (University of Patras) and Margaret ishing performance gains, was also puter architecture graduate students, Martonosi (Princeton University). Both the main cause for the switch from or advanced undergraduates who are are affiliated with the HiPEAC Network single-core to multi-core architectures fairly fluent in computer architecture of Excellence: Kaxiras has been an active and a slowdown in frequency increase. concepts, but who want to build their member in HiPEAC-1 and is currently This book aims to document some of understanding of how power-aware the leader of the Task Force on Low the most important architectural tech- design influences architectures without Power in HiPEAC-2, while Martonosi, a niques that were invented, proposed, the need to delve deep into transistor HiPEAC-2 associate member, has been and applied to reduce both dynamic or circuits details, beyond the basics of participating in many HiPEAC activities power and static power dissipation in CMOS gate structures. from across the Atlantic: collabora- processors and memory hierarchies. tion in many HiPEAC funded European A significant number of techniques groups, ACACES 2006 teacher and have been proposed for a wide range n HiPEAC2009 PC Chair among others. of situations and this book synthesizes those techniques by focusing on their The book is the first that deals with common characteristics.” power efficiency (and low-power) tech- niques at the architectural level. While The book is divided into three parts: there are many great books at the the first part (comprising chapters 1 circuit or VLSI level discussing power and 2) presents introductory material, efficiency, this book intends to cover the second part (comprising chapters 3 an emerging need: the synthesis of a and 4) deals with dynamic power con- significant body of recent architecture sumption and the third part (chapter work on power efficiency. The abstract 5) deals with static (leakage) power. below, quoted from the book, gives a The book ends with a short summary flavour of the book’s style: chapter (chapter 6). Material in the first part includes introduction to modeling, Abstract from the book (copyright measuring, and simulating power in

16 info16 HiPEAC Students

Trip Report: DAC 2008 – A sunny week in the city of Anaheim

Anaheim, the sunny south Californian nerd, you’ll never want to miss the close is the place where posters meet reviews, city, is a famous place thanks to contact with a lot of cool stuff, e.g., and food meets people. Anyway, it’s a Disneyland. After years of dream- iPhone 3G or Android. There were sev- lovely place to go and I would really like ing, I found a way to go there for a eral pavilions and panel sections to hear to go back and meet some HiPEAC folks good reason -- the 45th annual Design some crazy thoughts, and the discus- over there. Automation Conference (DAC), which sions were really enjoyable. n is something amazing that I’d really love to recommend to you. DAC 46th will be held in San Francisco in July of next year. But before you plan As one of the premier EDA events, DAC the trip, allow me a minute to give you consists of a conference, an exhibition, some extra recommendations. DAC is and several other activities, such as a a perfect place for getting feedback number of collocated workshops and on your work (both praise and critique) tutorials, among others. The technical and, who knows, even good offers. program featured 138 novel papers, However, since it’s large, and in case not only on electronic design and veri- you don’t know where to go, maybe fication, but also covering architectures, try two events which are extremely tools, applications and a broad spectrum attractive to students: the University of topics. Apart from that, more than Booth and the PhD Student Forum. The 200 exhibitors showed their state-of- first one, organized by ACM SIGDA the-art techniques and products. New and located in the expo, is exclusively techniques were unveiled, new products reserved for students and is a perfect were showcased, and new ideas were platform to show you work. You should The author presenting his work at ACM discussed. The only problem for me was apply in advance to get a place and trav- SIGDA University Booth how to survive such a huge amount of el funding. The second one, scheduled Lei Gao ([email protected]), information. If you are a technology after the conference talks have finished, ISS/SSS, RWTH-Aachen University

Community News Best Paper Award: International Conference on High Performance Computing and Simulation HPCS’08

Samir Ammenouche (PhD student), Sid eral, they have a negative impact on Touati (Assistant Professor) and William the predictability of program perform- Jalby (Professor) from the University ances. This lack of performance stabil- of Versailles have been awarded the ity is a non-desirable characteristic for Best Paper Award for their contribution embedded computing. We present entitled “Practical Precise Evaluation of the progress of our experimental study Cache Effects on Low Level Embedded about the influence of cache effects on VLIW Computing”. The article was pre- embedded VLIW processors (ST2xx proc- sented and published in the Proceedings essors). We are trying to understand of the International Conference of High qualitatively and quantitatively the inter- Performance Computing and Simulation actions between cache effects (Data (HPCS’08, Nicosia, Cyprus). cache) and instruction level parallelism sonable, we mean looking at opportuni- at different granularities: applications ties at two levels: 1) program execution Abstract: and functions (coarse grain), program acceleration with tolerable performance The introduction of caches inside high regions (medium grain) and instructions predictability, and 2) active interactions performance processors provides techni- (fine grain). Our aim is to come up with with compiler optimization techniques. cal ways to reduce the memory gap by experimental arguments to help decide Our study is based on many months tolerating long memory access delays. whether non-blocking caches would be of full-time simulations on numerous While such intermediate fast caches a reasonable architectural design choice workstations producing many terabytes accelerate program execution in gen- for embedded VLIW processors. By rea- of data to analyze. n

info16 17 PhD News

Mechanisms for out of order memory accesses management

By Fernando Castro mechanism is needed in order to detect writes a very small fraction of all stores. ([email protected]) potential memory ordering violations in The scheme works in a decoupled way, Prof. Daniel Chaver, Prof. Luis the banked queue. To do this, we employ so that in the first step an age-based Pinuel and Prof. Manuel Prieto a simple and quick mechanism: a Bloom filtering is performed. As a result of Complutense of Madrid , Spain filter. this process, store instructions potentially June 2008 dependent with some in-flight load, are The two other designs (IMDC and DMDC) marked. Furthermore, using a straightfor- This work is included in the research area are included in the global definition of ward age register, an instruction window concerning new designs for microproces- age-based mechanisms. Both remove containing potential offending loads is sors. Three new implementations for the the associative LQ and replace it with delimited. In the second phase (previous hardware responsible for load and store much simpler structures. IMDC (Issue- to commit time), simple address compari- instructions management, i.e., the load- time Memory Dependence Checking) sons are established between both kind store queue (LSQ) are made. As a result uses a hash table instead of LQ to record of instructions. of the proposed designs, we obtain sig- explicitly the age of load instructions. nificant reductions in hardware complex- Therefore, stores only need to check this All proposed designs achieve the com- ity and energy consumption. structure and compare their own age mon goal of reducing LSQ hardware with the value recorded in the table to complexity, managing to very significant The first implementation, named split detect memory dependence violations. energy savings in this structure and the LQ, splits conventional and associative whole processor, with negligible per- LQ into two new structures: an associa- The third design, DMDC (Delayed Memory formance losses. tive one and another banked and non- Dependence Checking) replaces the LQ associative one. This way, a back-up with an address-based table, which only

Reconfigurable Optical Interconnection Networks for Shared-Memory Multiprocessor Architectures

By Wim Heirman cation networks is very irregular, making connections are one part of the solution ([email protected]) some parts of the network highly satu- to the communication problem, since Prof. Jan Van Campenhout rated while other parts are barely used. they allow for much higher data rates. and Prof. Dirk Stroobandt The precise patterns also depend on the Moreover, optics allows the network Ghent University, Belgium application, and can even change during to be reconfigured, during the course July 2008 the run time of a single application. This of a program, in a data-transparent makes it very difficult to build an efficient way, to the current traffic patterns. This Current multiprocessor communication communication network. way, all network connections are utilized technologies, using electrical signaling, more efficiently, which can both increase are reaching the end of their capabilities. This thesis explores the possibilities of the network’s performance and reduce Moreover, the traffic on such communi- optical, reconfigurable networks. Optical power usage.

Efficient Mechanisms to Provide Fault Tolerance in Interconnection Networks for PC Clusters

By Jose Miguel Montañana increases. However, most of the fault- alternative paths by joining fragments ([email protected]) tolerant routing strategies proposed for of existing paths in a time-efficient man- Prof. Jose Flich and Prof. Antonio massively parallel computers cannot be ner. Also, a simple and fast method for Robles applied to PC clusters. The thesis propos- dynamic network reconfiguration in the Technical University of Valencia es two fault tolerant routing approaches presence of faults is proposed. These Valencia, Spain to select alternative paths at the source mechanisms dynamically tolerate a rea- July 2008 end nodes when a fault appears. The sonable number of faults and can be first one provides in advance several dis- implemented on current commercial net- Fault tolerance in interconnection net- joint paths between every source-desti- work technologies for PC clusters. works for clusters is becoming a key nation pair of nodes, whereas the second issue as far as the number of end nodes one is able to dynamically provide new

18 info16 PhD News

Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

By: Asadollah Shahbahrami, again before they can be written back using the sim-outorder simulator of the First promoter: Prof. Stamatis to memory incurs conversion overhead. SimpleScalar toolset. Additionally, three Vassiliadis* issues related to the efficient imple- Second promoter: Prof. Kees The matrix register file technique allows mentation of the 2D Discrete Wavelet Goossens data that is stored consecutively in Transform (DWT) on general-purpose Copromotor: Dr. Ben Juurlink memory to be loaded into a column processors, in particular the Pentium 4, TU Delft, The Netherlands of the register file (where a column are discussed. These are 64K aliasing, September 2008 corresponds to the corresponding sub- cache conflict misses, and SIMD vectori- words of different registers). In other zation. 64K aliasing is a phenomenon In this dissertation, a novel SIMD exten- words, this technique provides both that happens on the Pentium 4, which sion called Modified MMX (MMMX) for row-wise as well as column-wise access can degrade performance by an order multimedia computing is presented. to the media register file. It is a useful of magnitude. It occurs if two or more Specifically, the MMX architecture is approach for matrix operations that data items whose addresses differ by enhanced with the extended subwords are common in multimedia process- a multiple of 64K need to be cached and the matrix register file techniques. ing. In addition, in this work, new simultaneously. There are also many The extended subwords technique uses and general SIMD instructions address- cache conflict misses in the implemen- SIMD registers that are wider than the ing the multimedia application domain tation of vertical filtering of the DWT, packed format used to store the data. It are investigated. It does not consider if the filter length exceeds the number uses 32 bits extra for each 64-bit regis- an ISA that is application-specific. For of cache ways. In this dissertation, ter. The extended subwords technique example, special-purpose instructions techniques are proposed to avoid 64K avoids data type conversion overhead are synthesized using a few general- aliasing and to mitigate cache conflict and increases parallelism in SIMD archi- purpose SIMD instructions. The per- misses. Furthermore, the performance tectures. This is because promoting the formance of the MMMX architecture of the 2D DWT is improved by exploit- subwords of the source SIMD registers is compared to the performance of ing the data-level parallelism using the to larger subwords before they can be the MMX/SSE architecture for different SIMD instructions supported by most processed and demoting the results multimedia applications and kernels general-purpose processors.

A Cache-based Hardware Accelerator for Memory Data Movements

By Filipa Duarte This means that the presence of the of cache line and word granularity, and Prof. Stephan Wong caches can be utilized to significantly can be connected to a direct-mapped TU Delft, The Netherlands reduce the latencies associated with or a set-associative cache, and can October 2008 memory copies, when a “smarter” way efficiently reduce the memory copy to perform the memory copy operation bottleneck in both single core or multi- This dissertation presents a hardware is used. core processors, executing a message accelerator that is able to accelerate passing communication model. large (including non-parallel) memory Therefore, the proposed accelerator data movements, in particular memory for memory copies takes advantage of The proposed solutions have been copies, performed traditionally by the the presence of these caches and intro- implemented in a FPGA as a proof of processors. As today’s processors are duces a redirection mechanism that concept and incorporated in a simulator tied with or have integrated caches links the original data (in the cache) to running several benchmarks to deter- with varying sizes (from several kilo- the copied addresses (in a newly added mine the performance gains of the pro- bytes in hand-held devices to many indexing table). The proposed solutions posal. In particular, for the receiver side megabytes in desktop devices or large avoid cache pollution and duplication of the TCP/IP stack, the proposed solu- servers), it is only logical to assume that of data, and efficiently schedule the tions can reach speed-ups from 2.96 to data to-be-copied by a memory copy is access to the main memory, thus effec- 4.61 times and reduce the number of already present within the cache. This tively reducing the latency associated instructions executed by 26% to 44%. is especially true when considering that with memory copies. Moreover, the such data often must be processed first. proposed accelerator supports copies

info16 19 Upcoming Events

International Symposium on System-on-Chip 2008. Call for Participation Tampere, Finland, November 5-6, 2008. http://soc.cs.tut.fi/

41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2008) Lake Como, Italy, November 8-12, 2008. http://www.microarch.org/micro41/

International Conference on Computer Aided Design (ICCAD 2008) San Jose, USA, November 10-13, 2008

International Conference for High Performance Computing, Networking, Storage and Analysis (SC08) Austin, Texas, November 15-21, 2008. http://sc08.supercomputing.org/

HiPEAC Fall Computing Systems Week Paris, France, November 24-28, 2008, http://www.hipeac.net/computing_systems_week_paris

HiPEAC 6th Industrial Workshop Paris, France, November 26, 2008

Information and Communication Technologies (ICT 2008) Lyon, France, November 25-27, http://ec.europa.eu/information_society/events/ict/2008/index_en.htm

20th International Conference on Parallel and Distributed Computing and Systems (PDCS 2008) Orlando, USA, November 25-27, 2008. http://www.iasted.org/conferences/home-631.html

8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 08) San Diego, USA, December 8-10, 2008. http://www.usenix.org/event/osdi08/

International Symposium on Parallel and Distributed Processing with Applications (ISPA-2008) Sydney, Australia, December 10-12, 2008. http://www.cs.usyd.edu.au/~ispa2008/workshops.html

36th Symposium on Principles of Programming Languages (POPL 2009) Savannah, USA, January 21-23, 2008. http://www.cs.ucsd.edu/popl/09/

International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2008) Dunedin, New Zealand, December 1-4, 2008, http://www.cs.otago.ac.nz/pdcat08/

HiPEAC 2009 Conference, Paphos, Cyprus, January 25-28, 2009 3rd Workshop on Statistical and Machine learning approaches applied to ARchitectures and compilation (SMART’09). Co-located with HiPEAC’09. Paphos, Cyprus, January 25,2009, http://www.hipeac.net/smart-workshop.html

Contributions If you are a HiPEAC member and would like to contribute to future HiPEAC newsletters, please contact Rainer Leupers at [email protected]

HiPEAC Info is a quarterly newsletter published by the HiPEAC Network of Excellence, 16 funded by the 7th European Framework Programme (FP7) under contract no. IST-217068. 20 info Website: http://www.HiPEAC.net Subscriptions: http://www.HiPEAC.net/newsletter

HiPEAC2 Cluster Meetings 27th of November, Paris