Proceedings

International Conference on Computer Design: VLSI IN COMPUTERS & PROCESSORS

October 2-4, 1995 Austin, Texas

Sponsored by IEEE Computer Society Technical Committee on Design Automation IEEE Circuits and Systems Society

In cooperation with IEEE Electron Devices Society

IEEE Computer Society Press Los Alamitos, California

Washington • Brussels • Tokyo

Contents

International Conference on Computer-Aided Design - ICCD’95

Welcome to ICCD’95 ...... xiv

ICCD’95 Conference Committee ...... xv

ICCD’95 Track Chairs and Committee Members ...... xvi

Hardware Support for Emerging Software Technologies L. Loucks

Session 1.2.1: VLSI & Technology Plenary Chair: Larry Pileggi, University of Texas at Austin

Advances in Semiconductor Packaging and their Impact on System Design N. Naclerio

Session 1.2.2: Architecture/Algorithms Plenary Chair: Bing Sheu, University of Southern California, Los Angeles

Statistical Generalization: Theory and Applications ...... 4 B. Wah, A. Ieumwananonthachai, S. Yao, and T. Yu

Session 1.3.2: System Level Interconnect Chair: Larry Pileggi, University of Texas at Austin

Signal Propagation In High-speed MCM Circuits ...... 12 C. Truzzi, E. Beyne, E. Ringoot, and I. Peeters

Transient Analysis of Coupled Transmission Lines Characterized with the Frequency-Dependent Losses Using Scattering-Parameter Based Macromodel ...... 18 J.S.-H. Wang and W.W.-M. Dai

A CMOS Gate Array with Dynamic-Termination GTL I/O Circuits ...... 25 J. Kudoh, T. Takahashi, Y. Umada, M. Kimura, S. Yamamoto, and Y. Ito

Session 1.3.3: Asynchronous Systems Chair: Steve Nowick, Columbia University

Precise Exception Handling for a Self-Timed Processor ...... 32 W.F. Richardson and E. Brunvand

Implementing a STARI Chip ...... 38 M.R. Greenstreet

A High-Performance Asynchronous SCSI Controller ...... 44 K.Y. Yun and D.L. Dill

Session 1.3.4: Analysis Chair: Sharon Hu, Western Michigan University

Performance Assessment of Embedded Hw/Sw Systems ...... 52 J.-P. Calvez and O. Pasquier

A Simulation Environment for Hardware-Software Codesign ...... 58 S.L. Coumeri and D.E. Thomas

Performance Estimation for Real-Time Distributed Embedded Systems ...... 64 T.-Y. Yen and W. Wolf

Session 1.4.1: Formal Verification Meets the Real World Chairs: Miriam Leeser, Cornell University and P.A. Subrahmanyam, AT&T Bell Laboratories

Verifying the Performance of the PCI Local Bus using Symbolic Techniques ...... 72 S. Campos, E.M. Clarke, W. Marrero, and M. Minea

Formal Verification of a PowerPC™ Microprocessor ...... 79 D.P. Appenzeller and A. Kuehlmann

Extending VLSI Design with Higher-Order Logic ...... 85 A. Chavan, S.-K. Chin, S. Ikram, J.D. Kim, and J.-Y. Lu

Session 1.4.2: Issues in Superscalar Processors Chair: Bob Colwell, Intel Corp.

Design and Implementation of a 100 MHz Centralized Instruction Window for a Superscalar Microprocessor ...... 96 S. Wallace, N. Dagli, and N. Bagherzadeh

A Superscalar RISC Processor with Pseudo Vector Processing Feature ...... 102 K. Shimamura, S. Tanaka, T. Shimomura, T. Hotta, E. Kamada, H. Sawamoto, T. Shimizu, and K. Nakazawa

The Resource Conflict Methodology for Early-Stage Design Space Exploration of Superscalar RISC Processors ...... 110 J.-D. Wellman and E.S. Davidson

Session 1.4.3: SPARC Design Methodologies Chair: Chin-Long Wey, Michigan State University

Design of an Efficient Power Distribution Network for the UltraSPARC-I™ Microprocessor ...... 118 A. Dalal, L. Lev, and S. Mitra

Clock Controller Design in SuperSPARC II™ Microprocessor ...... 124 H. Hao and K. Bhabuthmal

Incas: A Cycle Accurate Model of UltraSPARC™ ...... 130 G. Maturana, J.L. Ball, J. Gee, A. Iyer, and J.M. O’Connor

Session 1.4.4: Simulation Chair: Derek Beatty,

Accurate Device Modeling Techniques for Efficient Timing Simulation of Integrated Circuits ...... 138 A. Devgan

Execution-Time Profiling for Multiple-Process Behavioral Synthesis ...... 144 J.K. Adams, J.A. Miller, and D.E. Thomas

Emulation Verification of the Motorola 68060 ...... 150 J. Kumar, N. Strader, J. Freeman, and M. Miller

Session 2.1.1: Embedded Systems Plenary Chair: Rolf Ernst, University of Braunschweig

Technical Challenges of PDA Design B. Mangione-Smith

Session 2.2.1: Design for Testability Chair: Sumit Dasgupta, Sematech/IBM Corp.

Testability Analysis and Insertion for RTL Circuits Based on Pseudorandom BIST ...... 162 J.E. Carletta and C.A. Papachristou

Efficient Testability Enhancement for Combinational Circuit ...... 168 Y. Fang and A. Albicki

Design for Hierarchical Testability of RTL Circuits Obtained by Behavioral Synthesis ...... 173 I. Ghosh, A. Raghunathan, and N.K. Jha

Synthesis for Testability of Large Complexity Controllers ...... 180 F. Fummi, D. Sciuto, and M. Serra

Session 2.2.2: PowerPC™ Chairs: Tim Brodnax, IBM Corp. and Nasr Ullah, Motorola

Multiprocessor Design Verification for the PowerPC 620™ Microprocessor ...... 188 C. Montemayor, J.-T. Yen, M. Sullivan, P. Wilson, and R. Evers

The PowerPC 603e™ Microprocessor: An Enhanced, Low-Power, Superscalar Microprocessor ...... 196 J. Slaton, S.P. Licht, M. Alexander, K.R. Kishore, R. Jessani, and S. Reeve

A High Performance Bus and Cache Controller for PowerPC™ Multiprocessing Systems ...... 204 M.S. Allen, W.K. Lewchuk, and J.D. Coddington

Performance Monitoring on the PowerPC™ 604 Microprocessor ...... 212 F.E. Levine, C.P. Roth, and E.H. Welbon

Session 2.2.3: Floor Planning & Placement Chair: Carl Sechen, University of Washington

Thermal Placement for High-Performance Multichip Modules ...... 218 K.-Y. Chao and D.F. Wong

EPNR: An Energy-Efficient Automated Layout Synthesis Package ...... 224 G. Holt and A. Tyagi

PEPPER - A Timing Driven Early Floorplanner ...... 230 G. Vijayan, V. Narayananan, D. LaPotin, and R. Gupta

Connection-Oriented Net Model and Fuzzy Clustering Techniques for K-Way Circuit Partitioning ...... 236 J.-T. Yan

Session 2.2.4: Combinational and Sequential Logic Optimization Chair: Masahiro Fujita, Fujitsu Labs of America

An Enhanced Algorithm for the Minimization of Exclusive-OR Sum-of-Products for Incompletely Specified Functions ...... 244 T. Kozlowski, E.L. Dagless, and J.M. Saul

Implicit State Minimization of Non-Deterministic FSMs ...... 250 T. Kam, T. Villa, R.K. Brayton, and A.L. Sangiovanni-Vincentelli

Extending Equivalence Class Computation to Large FSMs ...... 258 G. Cabodi, S. Quer, and P. Camurati

Efficient State Assignment Framework for Asynchronous State Graphs ...... (late paper can be found on page 692) C. Ykman-Couvreur and B. Lin

Session 2.3.1: Massively Parallel Processing Interconnects Chair: Joydeep Ghosh, University of Texas at Austin

Adaptive Routing in Clos Networks ...... 266 P. Franaszek, C.J. Georgiou, and C.-S. Li

Rational Clocking ...... 271 L.F.G. Sarmenta, G.A. Pratt, and S.A. Ward

A Prototype Router for the Massively Parallel Computer RWC-1 ...... 279 T. Yokota, H. Matsuoka, K. Okamoto, H. Hirono, A. Hori, and S. Sakai

Session 2.3.2: Test Pattern Generation Chair: R. Molyneaux

Distributed Automatic Test Pattern Generation with a Parallel FAN Algorithm ...... (late paper can be found on page 698) S. Radtke, W. Anheier, and J. Bargfrede

Concurrent Automatic Test Pattern Generation Algorithm for Combinational Circuits ...... 286 A.-F.S. Yousif and J. Gu

Test Generation for Multiple State-Table Faults in Finite-State Machines ...... 292 I. Pomeranz and S.M. Reddy

Session 2.3.3: Caching Strategies Chair: Jim Bondi, Texas Instruments

Pollution Control Caching ...... 300 S.J. Walsh and J.A. Board

Caching Processor General Registers ...... 307 R. Yung and N.C. Wilhelm

A Dynamic Cache Sub-block Design to Reduce False Sharing ...... 313 M. Kadiyala and L.N. Bhuyan

Session 2.3.4: Embedded System Architecture & Case Studies Chair: Jim Browne, University of Texas at Austin

A Programmable Routing Controller for Flexible Communications in Point-to-Point Networks ...... 320 S.W. Daniel, J.L. Rexford, J.W. Dolter, and K.G. Shin

POM: A Processor Model for Image Processing ...... 326 J.-P. Theis and L. Thiele

A Case Study in Low-Power System-Level Design ...... 332 A. Wolfe

Session 2.4.1: ATM and High-speed Networking Alternatives Chair: Bob Horst, Tandem Computer

A Novel Architecture for an ATM Switch ...... 340 J. Li and C.-L. Wu

Designing Fibre Channel Fabrics ...... 346 L. Cherkasova, V. Kotov, and T. Rokicki

Architecture and Design of a 40 Gigabit per second ATM Switch ...... 352 S.E. Butner and D.A. Skirmont

Session 2.4.2: Routing & Extraction Chair: Lukas van Ginneken, Synopsys, Inc.

Accurate and Efficient Layout-to-Circuit Extraction for High-speed MOS and Bipolar/BiCMOS Integrated Circuits ...... 360 F. Beeftink, A.J. van Genderen, and N.P. van der Meijs

An Efficient Cut-Based Algorithm on Minimizing the Number of L-Shaped Channels for Safe Routing Ordering ...... 366 J.-T. Yan

FPGA Global Routing Based on a New Congestion Metric ...... 372 Y.-W. Chang, D.F. Wong, and C.K. Wong

Session 2.4.3: Asynchronous Datapaths Chair: Erik Brunvand, University of Utah

Asynchronous 2-D Discrete Cosine Transform Core Processor ...... 380 B. Stott, D. Johnson, and V. Akella

A Self-timed Redundant-Binary Number to Binary Number Convertor for Digital Arithmetic Processors ...... 386 C.-L. Wey, H. Wang, and C.-P. Wang

A Self-Timed FPGA System for Functional Simulation and Logic Emulation ...... (paper not received at press time) D. How

Session 2.4.4: FPGA - Synthesis Chair: Steve Trimberger, Xilinx

Design and Analysis of FPGA/FPIC Switch Modules ...... 394 Y.-W. Chang, D.F. Wong, and C.K. Wong

Simultaneous Area and Delay Minimum K-LUT Mapping for K-Exact Networks ...... 402 S. Thakur and D.F. Wong

DART: Delay and Routability Driven Technology Mapping for LUT Based FPGAs ...... 409 A. Lu, E. Dagless, and J. Saul

Logic Synthesis for a Single Large Look-up Table ...... 415 R. Murgai, F. Hirose, and M. Fujita

Session 3.1.1: Design & Test Plenary Chair: Alexander Albicki, University of Rochester

Testing - What’s Missing? An Incomplete List of Challenges ...... 426 S. Reddy

Session 3.1.2: CAD Plenary Chair: Luc Claesen, IMEC

Toward Integrated System Design: A Global Perspective ...... 428 B. Hosticka

Session 3.2.1: Topics in High-Level Synthesis Chair: Ahmed Jerraya, TIMA/INPG

Analysis of Conditional Resource Sharing using a Guard-based Control Representation ...... 434 I. Radivojevic and F. Brewer

Multi-Dimensional Interleaving for Time-and-Memory Design Optimization ...... 440 N.L. Passos, E.H.-M. Sha, and L.-F. Chao

High Level Profiling Based Low Power Synthesis Technique ...... 446 S. Katkoori, N. Kumar, and R. Vemuri

Session 3.2.2: Low Power and High-Performance Circuits Chair: Kit Cham, Hewlett-Packard

Control Unit Synthesis Targeting Low-Power Processors ...... 454 C.-Y. Wang and K. Roy

Low Power Data Format Converter Design Using Semi-static Register Allocation ...... 460 L. Lucke, C. Chakrabarti, and K. Srivatsan

A 13.3ns Double-precision Floating-point ALU and Multiplier ...... 466 H. Yamada, T. Hotta, T. Nishiyama, F. Murabayashi, T. Yamauchi, and H. Sawamoto

Session 3.2.3: Arithmetic Modules Chair: N. Ranganathan, University of South Florida

A Floating Point Radix 2 Shared Division/Square Root Chip ...... 472 H.R. Srinivas and K.K. Parhi

High-Radix SRT Division with Speculation of Quotient Digits ...... 479 H.-S. Kay, T.-H. Pan, Y. Chun, and C.-L. Wey

A for Accurate and Reliable Numerical Computations ...... (late paper can be found on page 686) M.J. Schulte and E.E. Swartzlander, Jr.

Session 3.2.4: Architectures for Signal Processors Chair: Kaushik Roy, Purdue University

Special Purpose FPGA for High-speed Digital Telecommunication Systems ...... 486 A. Tsutsui, K. Yamada, T. Miyazaki, and N. Ohta

VLSI Design of Densely-Connected Array Processors ...... 492 B.J. Sheu, R.C. Chang, E.Y. Chou, and T.H. Wu

VLSI Issues in Memory-System Design for Video Signal Processors ...... 498 S. Dutta, W. Wolf, and A. Wolfe

Session 3.3.1: Memory System Performance Chair: Pradip Bose, IBM T. J. Watson Research Center

Write Buffer Design for Cache-Coherent Shared-Memory Multiprocessors ...... 506 F. Mounes-Toussi and D.J. Lilja

Reducing Data Access Penalty Using Intelligent Opcode-Driven Cache Prefetching ...... 512 C.-H. Chi and S.-C. Lau

Interrupt Based Hardware Support for Profiling Memory System Performance ...... 518 A. Goldberg and J. Trotter

Session 3.3.2: Emerging Technologies for Processor Verification Chair: Warren Hunt, Computational Logic Inc.

Verification of a Subtractive Radix-2 Square Root Algorithm and Implementation ...... 526 M. Leeser and J. O’Leary

Automatic Extraction of the Control Flow Machine and Application to Evaluating Coverage of Verification Vectors ...... 532 Y.V. Hoskote, D. Moundanos, and J.A. Abraham

Theorem Proving: Not an Esoteric Diversion, but the Unifying Framework for Industrial Verification ...... 538 D. Cyrluk and M.K. Srivas

Session 3.3.3: Memory Architectures for Signal Processing Chair: Bryan Ackland, AT&T Bell Laboratories

An Empirical Study of Datapath, Memory Hierarchy, and Network in SIMD Array Architectures ...... 546 M.C. Herbordt and C.C. Weems

Memory Organization for Video Algorithms on Programmable Signal Processors ...... 552 E. De Greef, F. Catthoor, and H. De Man

SSM-MP: More Scalability in Shared-Memory Multi-Processor ...... 558 S. Iwasa, S.H. Shing, H. Mogi, H. Nozuwe, H. Hayashi, O. Wakamori, T. Ohmizo, K. Tanaka, H. Sakai, and M. Saito

Session 3.3.4: Novel Design Concepts Chair: Christos Papachristou, Case Western Reserve University

Low Power and High Speed Multiplication Design Through Mixed Number Representations ...... 566 M. Zheng and A. Albicki

Minimal Self-correcting Shift Counters ...... 571 A.M. Tokarnia and A.M. Peterson

Estimation of Sequential Circuit Activity Considering Spatial and Temporal Correlations ...... 577 T.-L. Chou and K. Roy

Session 3.4.1: FSM Verification Chair: Gabriel Bischoff, Digital Equipment Corporation

A Symbolic-Simulation Approach to the Timing Verification of Interacting FSMs ...... 584 A.J. Daga and W. Birmingham

Incremental Methods for FSM Traversal ...... 590 G. Swamy, V. Singhal, and R.K. Brayton

Extraction of Finite State Machines from Transistor Netlists by Symbolic Simulation ...... 596 M. Pandey, G. York, D. Beatty, A. Jain, S. Jain, and R.E. Bryant

Dynamic Minimization of OKFDDs ...... 602 R. Drechsler and B. Becker

Session 3.4.2: Fault Simulation Chair: Srimat Chakradhar, NEC

Data Parallel Fault Simulation ...... 610 M.B. Amin and B. Vinnakota

A Parallel Algorithm for Fault Simulation Based on PROOFS ...... 616 S. Parkes, P. Banerjee, and J. Patel

Statistics on Concurrent Fault and Design Error Simulation ...... 622 B. Grayson, S.A. Shaikh, and S.A. Szygenda

A New Architectural-level Fault Simulation using Propagation Prediction of Grouped Fault-Effects ...... 628 M.S. Hsiao and J.H. Patel

Session 3.4.3: Application-Specific Processors Chair: Ashwini Nanda, Texas Instruments

A CMOS Wave-pipelined Image Processor for Real-time Morphology ...... 638 R.K. Krishnamurthy and R. Sridhar

An Efficient Systolic Array for the Discrete Cosine Transform Based on Prime-Factor Decomposition ...... 644 H. Lim and E.E. Swartzlander, Jr.

Systolic Algorithms for Tree Pattern Matching ...... 650 A. Ejnioui and N. Ranganathan

Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications ...... (late paper can be found on page 703) W.-C. Fang, B.J. Sheu, H. Venus, and R. Sandau

Session 3.4.4: Performance Driven Synthesis Chair: Andreas Kuehlmann, IBM T. J. Watson Research Center

Logic Extraction Based on Normalized Netlengths ...... 658 H. Vaishnav and M. Pedram

Transformation of Min-Max Optimization to Least-Square Estimation and Application to Interconnect Design Optimization ...... 664 J.S.-H. Wang and W.W.-M. Dai

Simple Tree-Construction Heuristics for the Fanout Problem ...... 671 R.J. Carragher, C.-K. Cheng, and M. Fujita

Concurrent Timing Optimization of Latch-Based Digital Systems ...... 680 H.-Y. Hsieh, W. Liu, C.T. Gray, and R.K. Cavin

Index of Authors ...... 709
