To View Abstracts in Advance (Pdf)
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings of 23rd IEEE International Parallel and Distributed Processing Symposium IPDPS 2009 Advance Program Abstracts Abstracts for both contributed papers and all workshops have been compiled to allow authors to check accuracy and so that visitors to this website may preview the papers to be presented at the conference. Full proceedings of the conference will be published on a cdrom to be dis- tributed to registrants at the conference. Contents Session 1: Algorithms - Scheduling I 2 On Scheduling Dags to Maximize Area . 3 Efficient Scheduling of Task Graph Collections on Heterogeneous Resources . 3 Static Strategies for Worksharing with Unrecoverable Interruptions . 4 On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms . 4 Session 2: Applications - Biological Applications 5 Sequence Alignment with GPU: Performance and Design Challenges . 6 Evaluating the use of GPUs in Liver Image Segmentation and HMMER Database Searches . 6 Improving MPI-HMMER’s Scalability with Parallel I/O . 7 Accelerating Leukocyte Tracking Using CUDA: A Case Study in Leveraging Manycore Coprocessors . 7 Session 3: Architecture - Memory Hierarchy and Transactional Memory 8 Efficient Shared Cache Management through Sharing-Aware Replacement and Streaming-Aware Insertion Policy . 9 Core-aware Memory Access Scheduling Schemes . 9 Using Hardware Transactional Memory for Data Race Detection . 10 Speculation-Based Conflict Resolution in Hardware Transactional Memory . 10 Session 4: Software - Fault Tolerance and Runtime Systems 11 Compiler-Enhanced Incremental Checkpointing for OpenMP Applications . 12 DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop . 12 Elastic Scaling of Data Parallel Operators in Stream Processing . 13 Scalable RDMA performance in PGAS languages . 13 Session 5: Algorithms - Resource Management 14 Singular Value Decomposition on GPU using CUDA . 15 Coupled Placement in Modern Data Centers . 15 An Upload Bandwidth Threshold for Peer-to-Peer Video-on-Demand Scalability . 16 Competitive Buffer Management with Packet Dependencies . 16 Session 6: Applications - System Software and Applications 17 Annotation-Based Empirical Performance Tuning Using Orio . 18 Automatic Detection of Parallel Applications Computation Phases . 18 Handling OS Jitter on Multicore Multithreaded Systems . 19 Building a Parallel Pipelined External Memory Algorithm Library . 19 Session 7: Architecture - Power Efficiency and Process Variability 20 On Reducing Misspeculations in a Pipelined Scheduler . 21 Efficient Microarchitecture Policies for Accurately Adapting to Power Constraints . 21 An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters . 22 A new mechanism to deal with process variability in NoC links . 22 Session 8: Software - Data Parallel Programming Frameworks 23 A framework for efficient and scalable execution of domain-specific templates on GPUs . 24 A Cross-Input Adaptive Framework for GPU Program Optimizations . 24 CellMR: A Framework for Supporting MapReduce on Asymmetric Cell-Based Clusters . 25 Message Passing on Data-Parallel Architectures . 25 i Session 9: Algorithms - Scheduling II 26 Online time constrained scheduling with penalties . 27 Minimizing Total Busy Time in Parallel Scheduling with Application to Optical Networks . 27 Energy Minimization for Periodic Real-Time Tasks on Heterogeneous Processing Units . 28 Multi-Users Scheduling in Parallel Systems . 28 Session 10: Applications - Graph and String Applications 29 Input-independent, Scalable and Fast String Matching on the Cray XMT . 30 Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis . 30 Transitive Closure on the Cell Broadband Engine: A study on Self-Scheduling in a Multicore Processor . 31 Parallel Short Sequence Mapping for High Throughput Genome Sequencing . 31 Session 11: Architecture - Networks and Interconnects 32 TupleQ: Fully-Asynchronous and Zero-Copy MPI over InfiniBand . 33 Disjoint-Path Routing: Efficient Communication for Streaming Applications . 33 Performance Analysis of Optical Packet Switches Enhanced with Electronic Buffering . 34 An Approach for Matching Communication Patterns in Parallel Applications . 34 Session 12: Software - I/O and File Systems 35 Adaptable, Metadata Rich IO Methods for Portable High Performance IO . 36 Small-File Access in Parallel File Systems . 37 Making Resonance a Common Case: A High-Performance Implementation of Collective I/O on Parallel File Systems 38 Design, Implementation, and Evaluation of Transparent pNFS on Lustre . 38 Plenary Session: Best Papers 39 Crash Fault Detection in Celerating Environments . 40 HPCC RandomAccess Benchmark for Next Generation Supercomputers . 40 Exploring the Multiple-GPU Design Space . 41 Accommodating Bursts in Distributed Stream Processing Systems . 41 Session 13: Algorithms - General Theory 42 Combinatorial Properties for Efficient Communication in Distributed Networks with Local Interactions . 43 Remote-Spanners: What to Know beyond Neighbors . 43 A Fusion-based Approach for Tolerating Faults in Finite State Machines . 44 The Weak Mutual Exclusion Problem . 44 Session 14: Applications - Data Intensive Applications 45 Best-Effort Parallel Execution Framework for Recognition and Mining Applications . 46 Multi-Dimensional Characterization of Temporal Data Mining on Graphics Processors . 46 A Partition-based Approach to Support Streaming Updates over Persistent Data in an Active Data Warehouse . 47 Architectural Implications for Spatial Object Association Algorithms . 47 Session 15: Architecture - Emerging Architectures and Performance Modeling 48 vCUDA: GPU Accelerated High Performance Computing in Virtual Machines . 49 Understanding the Design Trade-offs among Current Multicore Systems for Numerical Computations . 49 Parallel Data-Locality Aware Stencil Computations on Modern Micro-Architectures . 50 Performance Projection of HPC Applications Using SPEC CFP2006 Benchmarks . 50 Session 16: Software - Distributed Systems, Scheduling and Memory Management 51 Work-First and Help-First Scheduling Policies for Async-Finish Task Parallelism . 52 Autonomic management of non-functional concerns in distributed & parallel application programming . 52 Scheduling Resizable Parallel Applications . 53 Helgrind+: An Efficient Dynamic Race Detector . 53 ii Session 17: Algorithms - Wireless Networks 54 Sensor Network Connectivity with Multiple Directional Antennae of a Given Angular Sum . 55 Unit Disk Graph and Physical Interference Model: Putting Pieces Together . 55 Path-Robust Multi-Channel Wireless Networks . 56 Information Spreading in Stationary Markovian Evolving Graphs . 56 Session 18: Applications I - Cluster/Grid/P2P Computing 57 Multiple Priority Customer Service Guarantees in Cluster Computing . 58 Treat-Before-Trick : Free-riding Prevention for BitTorrent-like Peer-to-Peer Networks . 58 A Resource Allocation Approach for Supporting Time-Critical Applications in Grid Environments . 59 Session 19: Applications II - Multicore 60 High-Order Stencil Computations on Multicore Clusters . 61 Dynamic Iterations for the Solution of Ordinary Differential Equations on Multicore Processors . 62 Efficient Large-Scale Model Checking . 62 Session 20: Software - Parallel Compilers and Languages 63 A Scalable Auto-tuning Framework for Compiler Optimization . 64 Taking the Heat off Transactions: Dynamic Selection of Pessimistic Concurrency Control . 64 Packer: an Innovative Space-Time-Efficient Parallel Garbage Collection Algorithm Based on Virtual Spaces . 65 Concurrent SSA for General Barrier-Synchronized Parallel Programs . 65 Session 21: Algorithms - Self-Stabilization 66 Optimal Deterministic Self-stabilizing Vertex Coloring in Unidirectional Anonymous Networks . 67 Self-stabilizing minimum-degree spanning tree within one from the optimal degree . 67 A snap-stabilizing point-to-point communication protocol in message-switched networks . 68 An Asynchronous Leader Election Algorithm for Dynamic Networks . 68 Session 22: Applications - Scientific Applications 69 A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations . 70 Scalability Challenges for Massively Parallel AMR Applications . 70 Parallel Accelerated Cartesian Expansions for Particle Dynamics Simulations . 71 Parallel Implementation of Irregular Terrain Model on IBM Cell Broadband Engine . 71 Session 23: Software - Communications Systems 72 Phaser Accumulators: a New Reduction Construct for Dynamic Parallelism . 73 NewMadeleine: An Efficient Support for High-Performance Networks in MPICH2 . 73 Scaling Communication-Intensive Applications on BlueGene/P Using One-Sided Communication and Overlap . 74 Dynamic High-Level Scripting in Parallel Applications . 74 Session 24: Algorithms - Network Algorithms 75 Map Construction and Exploration by Mobile Agents Scattered in a Dangerous Network . 76 A General Approach to Toroidal Mesh Decontamination with Local Immunity . 76 On the Tradeoff Between Playback Delay and Buffer Space in Streaming . 77 Session 25: Applications - Sorting and FFTs 78 A Performance Model for Fast Fourier Transform . 79 Designing Efficient Sorting Algorithms for Manycore GPUs . 80 Minimizing Startup Costs for Performance-Critical Threading . 80 iii Workshop 1: Heterogeneity in Computing Workshop 81 Offer-based Scheduling of Deadline-Constrained Bag-of-Tasks Applications for Utility Computing Systems . 82 Resource-aware allocation strategies for divisible loads on large-scale systems . 82 Robust Sequential Resource Allocation in Heterogeneous Distributed Systems with Random Compute Node Failures 83 Revisiting communication performance models for computational clusters . 83 Cost-Benefit Analysis of Cloud Computing versus Desktop Grids . 84 Robust Data Placement in Urgent Computing Environments . ..