MAHMUT TAYLAN KANDEMIR
Assoc. Prof., Computer Science and Engineering Department
The Pennsylvania State University
354C IST Building, University Park, PA 16802, USA
Phone: (814) 863-4888
Fax: (814) 865-3176
Email: [email protected]
Web: http://www.cse.psu.edu/~kandemir

Education and Training

Istanbul Technical University, Turkey, Computer Engineering, B.S., 1988
Istanbul Technical University, Turkey, Computer Engineering, M.S., 1992
Syracuse University, NY, USA, Computer Science, Ph.D., 1999

Research and Professional Experience

9/04 - present: Associate Professor, Computer Science and Engineering, Pennsylvania State University
9/99 - 9/04: Assistant Professor, Computer Science and Engineering, Pennsylvania State University
9/97 - 9/99: Visiting Ph.D. Student, ECE Department, Northwestern University
9/95 - 9/97: Teaching Assistant, EECS Department, Syracuse University

Research Interests

Optimizing Compilers, Parallel I/O and Storage Systems, High-Performance Computing, Energy-Aware System Design, and Reliable Computing.

Teaching Interests

Compilers, Programming Languages, High-Level Synthesis, and Multicore Architectures.

Awards and Honors

Outstanding Research Award, Penn State Engineering Society (PSES), 2004
Top Download from ACM’s Digital Library (October 2006)
Best Paper Award in IPDPS, 2008
Best Paper Award in ICPADS, 2006
Best Paper Award in Compiler Construction Conference, 2002
Best Paper Nominations in DAC 2004, DAC 2005, DAC 2006

Editorship

2006-present Associate Editor, International Journal of Distributed Sensor Networks

2004-present Associate Editor, ACM Transactions on Design Automation of Electronic Systems (TODAES)

2003 Co-Editor (with L. Benini and J. Ramanujam), Compilers and Operating Systems for Low Power, Kluwer Academic

Graduate Advising

Graduated Students (Ph.D.) – 11 students [4 co-advised] Murali Vilayannur (VMware, USA), Victor De La Luz (Bank of Mexico, Mexico), Wei Zhang (Southern Illinois University, USA), Ismail Kadayif (Canakkale University, Turkey), Hendra Saputra (Institute for Infocomm Research, Singapore), Suleyman Tosun (Ankara University, Turkey), Guangyu Chen (Microsoft, USA), Guilin Chen (Google, USA), Aman Gayasen (Cadence, USA), Feihui Li (NVIDIA, USA), Ozcan Ozturk (Bilkent University, Turkey), Seung Woo Son (Argonne National Laboratory, USA).

Graduated Students (MSc.) -- 8 students Samarjeet S. Tomar, Amisha P. Parikh, Priya Unnikrishnan, Avanti Nadgir, Serdar Erkan, Adaeze Ibeneche, Shengyan Hong, Piyou Song.

Current Students (Ph.D.) -- 15 students [4 co-advised] Sri Hari Krishna Narayanan, Hakduran Koc, John Sustersic, Taylan Yemliha, Gary Giger, Yang Ding, Shekhar Srikantaiah, Rajat Garg, Sai Prashanth Muralidhara, Christina Patrick, Yuanrui Zhang, Ramya Prabhakar, Emre Kultursay, Praveen Yedlapalli, Akbar Sharifi.

Current Students (MSc.) – 1 student Ryan Prins.

Selected Publications (Year 2006+; a complete publication list is available at http://www.cse.psu.edu/~kandemir)

Recent Journal Papers (out of total 80+)

Compilers

O. Ozturk, M. Kandemir, G. Chen. Access Pattern-Based Code Compression for Memory-Constrained Systems. To appear in the ACM Transactions on Design Automation of Electronic Systems (TODAES).

G. Chen, M. Kandemir. September 2008. Compiler-Directed Code Restructuring for Improving Performance of MPSoCs. IEEE Transactions on Parallel and Distributed Systems (TPDS), 19(9):1201-1214.

G. Chen, M. Kandemir. July 2007. An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors. Transactions on High-Performance Embedded Architectures and Compilers, Special Issue on Future Directions in Embedded Systems Compilation. Springer-Verlag LNCS 4050:214-233.

I. Kadayif, P. Nath, M. Kandemir, A. Sivasubramaniam. February 2007. Reducing Data TLB Power via Compiler-directed Address Generation. IEEE Transactions on CAD (TCAD), 26(2):312-324.

J. Ramanujam, J. Hong, M. Kandemir, A. Narayan. January 2006. Estimating and Reducing the Memory Requirements of Signal Processing Codes for Embedded Processor Systems. IEEE Transactions on Signal Processing (TSP) 54(1):286-294.

Embedded Systems

O. Ozturk, M. Kandemir. July 2008. ILP-Based Energy Minimization Techniques for Banked Memories. ACM Transactions on Design Automation of Electronic Systems (TODAES) 13(3):50:1-50:40.

B. Demiroz, H. Topcuoglu, M. Kandemir. October 2007. Solving the Register Allocation Problem for Embedded Systems Using a Hybrid Evolutionary Algorithm. IEEE Transactions on Evolutionary Computation 11(5):620-634.

V. De La Luz, M. Kandemir, I. Kolcu. September 2006. Reducing Memory Energy Consumption of Embedded Applications that Process Dynamically-allocated Data. IEEE Transactions on CAD (TCAD), 25(9):1855-1860.

M. Kandemir. April 2006. Reducing Energy Consumption of Multiprocessor SoC Architectures by Exploiting Memory Bank Locality. ACM Transactions on Design Automation of Electronic Systems (TODAES) 11(2):410-441.

G. Chen, M. Kandemir, M. J. Irwin, J. Ramanujam. February 2006. Reducing Code Size through Address Register Assignment. ACM Transactions on Embedded Computing (TECS) 5(1):225-258.

I/O, Storage and High Performance Computing

S. W. Son, K. Malkowski, G. Chen, M. Kandemir, P. Raghavan. September 2007. Reducing Energy Consumption of Parallel Sparse Matrix Applications through Integrated Link/CPU Voltage Scaling. Journal of Supercomputing 41(3):179-213.

S. W. Son, G. Chen, O. Ozturk, M. Kandemir, A. Choudhary. September 2007. Compiler-directed Energy Optimization for Parallel Disk Based Systems. IEEE Transactions on Parallel and Distributed Systems (TPDS) 18(9):1241-1257.

S. W. Son, M. Kandemir. July 2007. A Prefetching Algorithm for Multi-speed Disks. Transactions on High-Performance Embedded Architectures and Compilers, Special Issue on Future Directions in Embedded Systems Compilation. Springer-Verlag LNCS 4050:317-340.

G. Memik, M. Kandemir, W.-K. Liao, A. Choudhary. August 2006. Multi-collective I/O: A Technique for Exploiting Inter-file Access Patterns. ACM Transactions on Storage (TOS) 2(3):349-369.

M. Vilayannur, A. Sivasubramaniam, M. Kandemir, R. Thakur, R. Ross. January 2006. Discretionary Caching for I/O on Clusters. Journal on Cluster Computing: Special Issue on Parallel I/O in Computational Grids and Cluster Computing Systems 9(1):29-44.

Computer Design

A. Gayasen, S. Srinivasan, N. Vijaykrishnan, M. Kandemir. 2007. Design of Power-Aware FPGA Fabrics. International Journal of Embedded Systems (IJES) 3(1/2):52-64.

W. Zhang, Y-F. Tsai, D. Duarte, N. Vijaykrishnan, M. Kandemir, M. J. Irwin. February 2006. Reducing Dynamic and Leakage Energy in VLIW Architectures. ACM Transactions on Embedded Computing Systems (TECS), Special Issue on Power-Aware Embedded Computing 5(1):1-28.

C. Liu, A. Sivasubramaniam, M. Kandemir. February 2006. Optimizing Bus Energy Consumption of On-Chip Multiprocessors Using Frequent Values. Journal of Systems Architecture, Special Issue on Best Papers of Euromicro Conference on Parallel and Distributed Processing 52(2):129-142.

E. J. Kim, G. Link, K. H. Yum, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, C. R. Das. June 2005. A Holistic Approach to Designing Energy-Efficient Cluster Interconnects. IEEE Transactions on Computers (TC), 54(6):660-671.

I. Kadayif, A. Sivasubramaniam, M. Kandemir, G. Kandiraju, G. Chen, G. Chen. April 2005. Optimizing Instruction TLB Energy Using Software and Hardware Techniques. ACM Transactions on Design Automation of Electronic Systems 10(2):229-257.

Recent Conference/Workshop Papers (out of total 340+ papers)

Compilers

M. Kandemir, O. Ozturk. June 2008. Software-Directed Combined CPU/Link Voltage Scaling for NoC-Based CMPs. Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2008). pp. 359-370. Annapolis, MD.

M. Kandemir. November 2007. Data locality enhancement for CMPs. Proceedings of the ACM/IEEE 2007 International Conference on Computer-Aided Design (ICCAD 2007). San Jose, CA. pp. 155-159.

F. Li, G. Chen, M. Kandemir, I. Kolcu. June 2007. Profile-Driven Energy Reduction in Network-on-Chips. Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007). pp. 394-404. San Diego, CA.

O. Ozturk, G. Chen, M. Kandemir. March 2007. Compiler-directed Variable Latency Aware SPM Management to Cope with Timing Problems. Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization (CGO'07).

M. Kandemir, S.-W. Son. October 2006. Reducing Power Through Compiler-directed Barrier Synchronization Elimination. Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED 2006). pp. 354-357. Tegernsee, Germany.

G. Chen, F. Li, M. Kandemir, M. J. Irwin. June 2006. Reducing NoC Energy Consumption through Compiler-Directed Channel Voltage Scaling. Proceedings ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation (PLDI'06). pp. 193-203. Ottawa, Canada.

M. Mutyam, F. Li, N. Vijaykrishnan, M. Kandemir, M. J. Irwin. June 2006. Compiler-directed thermal management for VLIW functional units. Proceedings of the ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2006), pp. 163-172, Ottawa, Canada.

G. Chen, F. Li, M. Kandemir. January 2006. Compiler-Directed Channel Allocation for Saving Power in On-Chip Networks. Proceedings of the Thirty-Third Annual ACM-SIGACT Symposium on Principles of Programming Languages (POPL 2006). pp. 194-205. Charleston, SC.

Embedded Systems

T. Yemliha, S. Srikantaiah, M. Kandemir, M. Karakoy, M. J. Irwin. November 2008. Integrated Code and Data Placement in Two-Dimensional Mesh Based Chip Multiprocessors. To appear in Proceedings of the ACM/IEEE 2008 International Conference on Computer-Aided Design (ICCAD 2008). San Jose, CA.

T. Yemliha, S. Srikantaiah, M. Kandemir, O. Ozturk. November 2008. SPM Management Using Markov Chain Based Data Access Prediction. To appear in Proceedings of the ACM/IEEE 2008 International Conference on Computer-Aided Design (ICCAD 2008). San Jose, CA.

G. Chen, F. Li, S. W. Son, M. Kandemir. Application Mapping for Chip Multiprocessors. Proceedings of the Forty-Fifth ACM/IEEE Design Automation Conference (DAC 2008). pp. 620-625. Anaheim, CA.

A. Marongiu, L. Benini, M. Kandemir. Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms. Proceedings of International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES 2007), pp. 145-149, Salzburg, Austria.

L. Xue, O. Ozturk, M. Kandemir. June 2007. A Memory-Conscious Code Parallelization Scheme. Proceedings of the Forty-Fourth Design Automation Conference (DAC 2007). pp. 230-233. San Diego, CA.

H. Koc, M. Kandemir, E. Ercanli, O. Ozturk. June 2007. Reducing Off-Chip Memory Access Costs Using Data Recomputation in Embedded Chip Multi-processors. Proceedings of the Forty-Fourth Design Automation Conference (DAC 2007).

G. Chen, O. Ozturk, M. Kandemir, M. Karakoy. March 2006. Dynamic Scratch-Pad Memory Management for Irregular Array Access Patterns. Proceedings of the Conference on Design, Automation, and Test in Europe (DATE’06). [Top Download from ACM’s Digital Library in October 2006].

O. Ozturk, M. Kandemir, M. J. Irwin, S. Tosun. July 2006. Multi-level On-chip Memory Hierarchy Design for Embedded Chip Multiprocessors. Proceedings of the Twelfth International Conference on Parallel and Distributed Systems (ICPADS'06). [Best Paper Award].

I/O, Storage and High Performance Computing

O. Ozturk, S. W. Son, M. Kandemir, M. Karakoy. November 2008. Prefetch Throttling and Data Pinning for Improving Performance of Shared Storage Caches. To appear in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’08). Austin, TX.

M. Kandemir, F. Li, M. J. Irwin, S. W. Son. November 2008. A Novel Migration-based NUCA Design for Chip Multiprocessors. To appear in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC’08). Austin, TX.

S. W. Son, S. P. Muralidhara, O. Ozturk, M. Kandemir, I. Kolcu, M. Karakoy. October 25-29, 2008. Profiler and Compiler Assisted Adaptive I/O Prefetching for Shared Storage Caches. To appear in Proceedings of the Seventeenth International Conference on Parallel Architectures and Compilation Techniques (PACT’08). Toronto, Canada.

A. Yanamandra, B. Cover, P. Raghavan, M. J. Irwin, M. Kandemir. April 2008. Evaluating the Role of Scratchpad Memories in Chip Multiprocessors for Sparse Matrix Computations. Proceedings of the Twenty-Second IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008).

M. Kandemir, S. W. Son. February 2008. Improving I/O Performance of Applications through Compiler-Directed Code Restructuring. Proceedings of the Sixth USENIX Conference on File and Storage Technologies (FAST’08). pp. 159-174. San Jose, CA.

R. Prabhakar, S. W. Son, C. Patrick, S. H. K. Narayanan, M. Kandemir. September 2007. Securing Disk-Resident Data Through Application Level Encryption. Proceedings of the 4th International IEEE Security in Storage Workshop (SISW'07), pp. 46-57, San Diego, CA.

S. W. Son, M. Kandemir. May 2007. Integrated Data Reorganization and Disk Mapping for Reducing Disk Energy Consumption. Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007). pp. 557-564. Rio de Janeiro, Brazil.

S. W. Son, G. Chen, M. Kandemir. March 2006. A Compiler-guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality. Proceedings of the Fourth Annual Symposium on Code Generation and Optimization (CGO'06). pp. 256-268. Manhattan, NY.

Computer Design

Y. Ding, M. Kandemir, M. J. Irwin, P. Raghavan. Adapting Application Mapping to Systematic Within-Die Process Variations on Chip Multiprocessors. To appear in Proceedings of the 4th International Conference on High Performance and Embedded Architectures and Compilers, Cyprus.

Y. Ding, M. Kandemir, P. Raghavan, M. J. Irwin. April 2008. A Helper Thread Based EDP Reduction Scheme for Adapting Application Execution in CMPs. Proceedings of the Twenty-Second IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, FL. [Best Paper Award].

S. Srikantaiah, M. Kandemir, M. J. Irwin. March 2008. Adaptive Set-Pinning: Managing Shared Caches in Chip Multiprocessors. Proceedings of the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2008). pp. 135-144. Seattle, WA.

K. Malkowski, P. Raghavan, M. Kandemir, M. J. Irwin. August 2007. Phase-aware Adaptive Hardware Selection for Power-Efficient Scientific Computations. Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED 2007).

I. Kadayif, M. Kandemir. June 2007. Modeling and Improving Data Cache Reliability. Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). pp. 1-12. San Diego, CA.

F. Li, C. Nicopoulos, T. Richardson, Y. Xie, N. Vijaykrishnan, M. Kandemir. June 2006. Design and Management of 3D Chip Multiprocessors using Network-in-memory. Proceedings of the Thirty-Third Annual International Symposium on Computer Architecture (ISCA'06). pp. 130-141. Boston, MA.

G. Chen, M. Kandemir, I. Kolcu. June 2006. Memory-conscious Reliable Execution on Embedded Chip Multiprocessors. Proceedings of the International Conference on Dependable Systems and Networks (DSN-2006).

C. Liu, A. Sivasubramaniam, M. Kandemir, M. J. Irwin. April 2006. Enhancing L2 Organization for CMPs with a Center Cell. Proceedings of the Twentieth IEEE International Parallel and Distributed Processing Symposium (IPDPS'06). pp. 1-10. Rhodes Island, Greece.

Research Projects

Completed Research Projects (total 13 projects, including the NSF CAREER award)

9/1/2004-2/28/2006. Power Aware Systems (PAS) (with Irwin, Narayanan). Sponsor: Gigascale Systems Research Center. Amount: $445,000. Role: Co-PI.

7/1/2004-6/30/2006. Transaction Level Power Modeling Methodology (with Narayanan, Xie). Sponsor: Pittsburgh Digital Greenhouse. Amount: $149,903. Role: Co-PI.

8/15/2004-8/14/2007. Energy-Efficient On-Chip Communication and Storage for Multiprocessor SoCs (with Irwin, Narayanan, Benini, Bogliolo). Sponsor: Semiconductor Research Corp. Amount: $299,997. Role: Co-PI.

8/1/2004-7/31/2008. Collaborative Research: Dynamic Runtime and Compilation Support for I/O-Intensive Applications (with A. Choudhary). Sponsor: NSF/ACI. Amount: $306,000. Role: PI.

7/1/2002-6/30/2007. (I^3)C: An Infrastructure for Innovation in Information Computing (with Das, Acharya, Giles, Irwin, Plassmann) (NSF/$1,795,729; PSU Match/$765,307). Sponsor: NSF/CISE/EIA. Amount: $1,795,729. Role: Participant.

7/1/2002-5/31/2003. Reliable Energy-Efficient System Design (with Irwin, Narayanan). Sponsor: Pittsburgh Digital Greenhouse. Amount: $101,571. Role: Co-PI.

3/15/2002-2/28/2005. From High Performance to Low Power: Infrastructure for Ubiquitous Computing (with Irwin, Narayanan, Sivasubramaniam) (Plus PSU Match/$49,556). Sponsor: NSF/CISE/EIA. Amount: $49,556. Role: Co-PI.

9/15/2001-8/30/2005. NGS: POWERful Software for Power Constrained Systems (with Irwin, Narayanan, Sivasubramaniam). Sponsor: NSF/CISE/NGS. Amount: $600,667. Role: Co-PI.

8/15/2001-8/14/2007. CAREER: EOC: An Energy-Aware Optimizing Compiler Framework. Sponsor: NSF/CAREER/CCR. Amount: $250,000. Role: PI.

8/1/2001-7/31/2005. Systems Support for High Performance I/O on Shared Storage Clusters (with Sivasubramaniam). Sponsor: NSF/CISE/C-CR. Amount: $239,527. Role: PI.

8/1/2001-7/31/2004. Scalable I/O Management and Optimizations for Scientific Applications (A Subcontract to Northwestern University). Sponsor: NSF/CISE/NGS. Amount: $6,010. Role: PI.

5/1/2001-4/30/2004. CIP URI: Mobile Ubiquitous Security Environment (MUSE) (with R. Brooks, Narayanan). Sponsor: Office of Naval Research. Amount: $90,000. Role: Co-PI.

1/1/2000-12/31/2001. Architecture and Compiler Power Issues in SoCs (with Irwin, Narayanan). Sponsor: Pittsburgh Digital Greenhouse. Amount: $226,042. Role: Co-PI.

Active Research Projects (total 11 projects)

9/1/2008-8/31/2011. Collaborative Research: Advanced Compiler Optimizations and Programming Language Enhancements for Petascale I/O and Storage (with ANL). Sponsor: NSF/CCF/ITR-HECURA. Amount: $308,479. Role: PI.

8/1/2008-7/31/2011. CPA-CPL-T: Collaborative Research: REEact-II: A Robust Execution Environment for Fragile Multicore Systems (with Irwin, Univ of Virginia, and Univ of Pittsburgh). Sponsor: NSF/CCF. Amount: $599,999. Role: Co-PI.

7/1/2008-6/30/2011. MRI Acquisition of A Scalable Instrument for Discovery through Computing (with Raghavan, Chen, Hudson, Smith). Sponsor: NSF/CNS/OIA/DBI/DMR/OCI/BCS. Amount: $1,995,000. Role: Co-PI.

9/1/2007-8/31/2010. Collaborative Research: SDCI HPC: Improvement: Parallel I/O Infrastructure for Petascale Systems (with Northwestern University). Sponsor: NSF/OCI. Amount: $1,746,144. Role: Co-PI.

8/23/2007-8/22/2009. Collaborative Research: CSR-AES: REEact: A Robust Execution Environment for Fragile Multicore Systems (with Irwin, Univ of Virginia, and Univ of Pittsburgh). Sponsor: NSF/CNS. Amount: $50,000. Role: PI.

8/15/2007-8/14/2010. CSR-SMA: Toward Model-Driven Multilevel Analysis and Optimization of Multicomponent Computer Systems (with Raghavan, Irwin, Shontz, Li). Sponsor: NSF/CNS. Amount: $1,500,000. Role: Co-PI.

8/1/2007-7/31/2010. A Self-regulating Compiler Framework for NoC Based CMPs (with Irwin). Sponsor: NSF/CCF. Amount: $425,000. Role: PI.

5/1/2007- . Dynamic Compilation Support for Distributed Adaptation for NoC Based CMPs. Sponsor: Microsoft. Amount: $75,000. Role: PI.

8/31/2006-8/1/2009. Collaborative Research: Scalable I/O Middleware and File System Optimizations for High-Performance Computing (with Northwestern University). Sponsor: NSF/CCF. Amount: $149,459. Role: PI.

8/1/2007-8/1/2009. Fantom II: Algorithm-Architecture Co-design for High Performance Application-on-Chips in SAR Systems (with Narayanan). Sponsor: DARPA. Amount: $557,333. Role: Co-PI.

10/1/2004-3/31/2009. Collaborative Research: Ultra-Scalable System Software and Tools for Data-Intensive Computing (with Loyola University and Northwestern University). Sponsor: NSF. Amount: $108,000. Role: PI.

Service

Department Level

Graduate Coordinator (2007- ), currently 255 graduate students in the program
Colloquium Chairman (2006-07)
Graduate Committee (1999-00, 2001-02, 2002-03, 2003-04)
Faculty Recruiting Committee (2005-06, 2006-07, 2007-08, 2008-09)
Teaching Issues Committee (2006-07, 2007-08, 2008-09)
Curriculum Committee (2001-02, 2002-03, 2003-04, 2004-05)
Awards Committee (2005-06)
Publications Committee (2007-08, 2008-09)
Laboratory Committee (2001-02)

College Level

CSE Advisor, Engineering Advising Center (2000-01)
Graduate Council, Alternate Member (2006)

Service to the Profession

ASPLOS Program Committee Member (2009)

CASES Program Committee Member (2002)

CGO Student Poster Chair (2008)

COLP (in PACT) Program Committee Member (2000, 2002, 2003), Program Co-Chair (2000), Co-Organizer (2001)

CPC Session Chair (1996)

CTCES Program Committee Member (2004)

DAC Program Committee Member (2005, 2006), Session Chair (2006)

EMSOFT Program Committee Member (2005)

EUC Program Committee Member (2005, 2006)

FPL Program Committee Member (2005, 2006)

HCW Program Committee Member (2004)

HICA Program Committee Member (2007)

HiPC Program Committee Member (1999, 2003, 2004, 2006)

HiPEAC Program Committee Member (2007, 2008)

HPCA Session Chair (2002, 2003)

ICCD Program Committee Member (2003, 2004, 2006)

ICESS Program Committee Member (2004, 2008)

ICPADS Program Committee Member (2001, 2006)

ICPP Session Chair (1997), Program Committee Member (2003)

INTERACT (in HPCA) Program Committee Member (2007)

IRADSN Program Committee Member (2008)

ISLPED Program Committee Member (2001, 2002, 2003, 2004), Publicity Chair (2005)

LCTES Program Committee Member (2001, 2005, 2006, 2007), Program Chair (2009)

MCW Program Committee Member (2001)

MICRO Program Committee Member (2007)

MUE International Advisory Board (2007), Program Committee Member (2008, 2009)

PACS Program Committee Member (2003)

PACT Program Committee Member (2009)

PARC Program Committee Member (2005)

RTAS Program Committee Member (2006)

SEC Program Committee Member (2008)

SIGMETRICS Program Committee Member (2009)

UNESST Program Committee Member (2008)

WASP (in CODES+ISSS) Program Committee Member (2005)

WHSO (in ICCD) Program Committee Member (2000)

WMPP (in IPDPS) Program Committee Member (2001, 2002, 2003, 2005), Publicity Chair (2004)

Sample Research Projects

Compiler and Runtime System Support for Chip Multiprocessors [supported by NSF and Microsoft]

With the emergence of chip multiprocessor (CMP) architectures comes the promise of integrating enormous computing power on a single chip, thereby enabling parallel computing on all types of platforms, including handheld computers and desktop machines. Providing proper software support for applications is critical to harnessing the true power of these new architectures. This research has two parts.

(1) Novel software-based (compiler and runtime system) approaches that reduce power consumption and mitigate the associated thermal issues while still maintaining high performance. Kandemir and his students focus on runtime adaptation techniques for NoC (network-on-chip) based CMP systems. The developed techniques are being embedded in a dynamic compiler that observes both application and machine status at runtime and adapts execution automatically to dynamically changing conditions using a set of helper threads. These helper threads and the application threads share the same CMP/NoC resources, and they cooperate and adapt at runtime to ensure that power consumption is under control, no thermal limits are exceeded, and the required level of performance is maintained.

(2) Model-driven optimizations spanning multiple levels of a CMP-based system, including the architecture, compiler, algorithm and application layers, for multiple objectives such as performance, power and productivity. A primary goal is to develop a comprehensive framework for model-driven, multilevel, multiobjective optimization of large-scale, sparse engineering and scientific applications. The models are parameterized and integrate characterizations of the application, the architecture and the compiler transformations. The research also includes an optimization framework that determines multiobjective optimal or Pareto-optimal designs while capturing uncertainties.
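To make the notion of Pareto-optimal designs mentioned above concrete, here is a minimal sketch in Python. It is an illustration only, not the project's framework: the design names and the (execution time, power) numbers are hypothetical, both objectives are assumed to be minimized, and uncertainty is not modeled.

```python
def dominates(a, b):
    """Return True if design a is at least as good as b in every objective
    and strictly better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return {
        name: objectives
        for name, objectives in designs.items()
        if not any(dominates(other, objectives) for other in designs.values())
    }


# Hypothetical design points: (execution time, power), lower is better.
designs = {
    "A": (10.0, 5.0),
    "B": (8.0, 7.0),
    "C": (12.0, 6.0),  # dominated by A (slower and more power-hungry)
    "D": (9.0, 9.0),   # dominated by B
}

print(sorted(pareto_front(designs)))
```

Designs A and B survive because neither is better than the other in both objectives at once; choosing between them is a trade-off left to the designer.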

Ultra-Scalable System Software and Tools for High-Performance Computing [supported by NSF]

This project entails research and development to address the software and tools problems of ultra-scale parallel machines, with a particular focus on scalable memory and storage hierarchies. The fundamental premise is that extreme scalability cannot be achieved through incremental changes to, or adaptations of, traditional interfaces and techniques for scaling data accesses (which are extensions of sequential ones), because they are based on pessimistic and conservative assumptions about parallelism, synchronization, and data sharing patterns. Kandemir’s research group develops innovative techniques that optimize data access by capturing high-level access patterns ("intent") and using that information across the runtime layers to enable optimizations and to reduce or eliminate locking and synchronization at different levels. The proposed mechanisms allow different software layers to interact and cooperate with each other. Specifically, the upper layers in the software stack extract high-level access pattern information and pass it to the lower layers in the stack, which in turn exploit it to achieve ultra-scalability.

Language, Middleware and File System Support for Petascale I/O [supported by NSF]

As system sizes and capabilities approach the petascale range, opportunities now exist to solve problems that were unimaginable as recently as the last decade. However, an important prerequisite is the ability to program petascale systems in a way that easily exploits these raw capabilities. One aspect that requires immediate attention from the computer science community is management support for large-scale I/O (input-output) and storage systems (disks, flash devices, tapes, and combinations of these). Kandemir’s research addresses the I/O problem at the language, compiler and runtime layers to ensure that I/O will not be a bottleneck in emerging petascale systems. Specifically, the research in this project involves: (1) designing programming-language enhancements for enabling I/O in petascale systems; (2) designing performance/power-oriented I/O optimizations that use novel compiler analyses and optimizations; (3) developing a compilation framework that accepts user directives and translates them to optimizations passed to the I/O runtime stack; (4) designing and implementing a novel hint-handling mechanism within the I/O stack, which has the capability of both hint translation and hint-to-optimization conversion; and (5) evaluating the developed infrastructure under realistic, petascale I/O-intensive workloads from national labs.

Algorithm-Architecture Coadaptation for Embedded Computing [supported by DARPA]

Classical hardware-software codesign for embedded systems starts from a fixed algorithmic description and partitions the input algorithm between hardware (a customized ASIC, or a reconfigurable fabric such as an FPGA) and software (mapped to a CPU). However, the initial algorithmic description limits the potential search space for optimum solutions. Kandemir and his students study a more powerful code partitioning scheme that also considers algorithmic alternatives during the search for solutions. This approach, which opens new alternatives in codesign research that targets embedded computing platforms, employs two loops during the search process: the outer loop iterates over alternate algorithmic options, while the inner loop invokes a hardware-software codesign partitioning module that returns the best mapping under the algorithmic option specified by the outer loop. The example application on which Kandemir’s research group focuses is a SAR specification from Raytheon, and the target architecture is a hybrid system that contains both multiple FPGAs and multiple CPUs.
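The two-loop search described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the group's actual tool: the inner partitioner is a brute-force enumeration standing in for a real codesign module, the cost model (sum of task times under a hardware area budget) is a simplification, and all task names and numbers are hypothetical.

```python
from itertools import product


def best_partition(tasks, hw_area_budget):
    """Inner loop: try every hardware/software assignment for one
    algorithmic option and return (total_time, mapping) for the fastest
    assignment that fits the area budget.
    Each task is (name, sw_time, hw_time, hw_area); values are hypothetical."""
    best = None
    for assignment in product(("sw", "hw"), repeat=len(tasks)):
        area = sum(t[3] for t, a in zip(tasks, assignment) if a == "hw")
        if area > hw_area_budget:
            continue  # this mapping does not fit on the hardware fabric
        time = sum(t[2] if a == "hw" else t[1]
                   for t, a in zip(tasks, assignment))
        if best is None or time < best[0]:
            best = (time, {t[0]: a for t, a in zip(tasks, assignment)})
    return best


def coadapt(options, hw_area_budget):
    """Outer loop: iterate over algorithmic options and keep the option
    whose best partition is fastest. Returns (option, time, mapping)."""
    best = None
    for name, tasks in options.items():
        result = best_partition(tasks, hw_area_budget)
        if result is not None and (best is None or result[0] < best[1]):
            best = (name, result[0], result[1])
    return best


# Two hypothetical algorithmic alternatives for the same computation.
options = {
    "time_domain": [("filter", 10, 4, 3), ("fft", 8, 2, 4)],
    "freq_domain": [("fft", 8, 2, 4), ("multiply", 3, 1, 2), ("ifft", 8, 2, 4)],
}

for budget in (5, 10):
    print(budget, coadapt(options, budget))
```

With these made-up numbers the preferred algorithm changes with the area budget, which is the point of searching over algorithmic options rather than fixing one up front.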