Proceedings
International Conference on Computer Design: VLSI IN COMPUTERS & PROCESSORS
October 2-4, 1995 Austin, Texas
Sponsored by IEEE Computer Society Technical Committee on Design Automation IEEE Circuits and Systems Society
In cooperation with IEEE Electron Devices Society
IEEE Computer Society Press Los Alamitos, California
Washington • Brussels • Tokyo
Contents
International Conference on Computer Design - ICCD’95
Welcome to ICCD’95 ...... xiv
ICCD’95 Conference Committee ...... xv
ICCD’95 Track Chairs and Committee Members ...... xvi
Hardware Support for Emerging Software Technologies L. Loucks
Session 1.2.1: VLSI & Technology Plenary Chair: Larry Pileggi, University of Texas at Austin
Advances in Semiconductor Packaging and their Impact on System Design N. Naclerio
Session 1.2.2: Architecture/Algorithms Plenary Chair: Bing Sheu, University of Southern California, Los Angeles
Statistical Generalization: Theory and Applications ...... 4 B. Wah, A. Ieumwananonthachai, S. Yao, and T. Yu
Session 1.3.2: System Level Interconnect Chair: Larry Pileggi, University of Texas at Austin
Signal Propagation In High-speed MCM Circuits ...... 12 C. Truzzi, E. Beyne, E. Ringoot, and I. Peeters
Transient Analysis of Coupled Transmission Lines Characterized with the Frequency-Dependent Losses Using Scattering-Parameter Based Macromodel ...... 18 J.S.-H. Wang and W.W.-M. Dai
A CMOS Gate Array with Dynamic-Termination GTL I/O Circuits ...... 25 J. Kudoh, T. Takahashi, Y. Umada, M. Kimura, S. Yamamoto, and Y. Ito
Session 1.3.3: Asynchronous Systems Chair: Steve Nowick, Columbia University
Precise Exception Handling for a Self-Timed Processor ...... 32 W.F. Richardson and E. Brunvand
Implementing a STARI Chip ...... 38 M.R. Greenstreet
A High-Performance Asynchronous SCSI Controller ...... 44 K.Y. Yun and D.L. Dill
Session 1.3.4: Embedded System Analysis Chair: Sharon Hu, Western Michigan University
Performance Assessment of Embedded Hw/Sw Systems ...... 52 J.-P. Calvez and O. Pasquier
A Simulation Environment for Hardware-Software Codesign ...... 58 S.L. Coumeri and D.E. Thomas
Performance Estimation for Real-Time Distributed Embedded Systems ...... 64 T.-Y. Yen and W. Wolf
Session 1.4.1: Formal Verification Meets the Real World Chairs: Miriam Leeser, Cornell University and P.A. Subrahmanyam, AT&T Bell Laboratories
Verifying the Performance of the PCI Local Bus using Symbolic Techniques ...... 72 S. Campos, E.M. Clarke, W. Marrero, and M. Minea
Formal Verification of a PowerPC™ Microprocessor ...... 79 D.P. Appenzeller and A. Kuehlmann
Extending VLSI Design with Higher-Order Logic ...... 85 A. Chavan, S.-K. Chin, S. Ikram, J.D. Kim, and J.-Y. Lu
Session 1.4.2: Issues in Superscalar Processors Chair: Bob Colwell, Intel Corp.
Design and Implementation of a 100 MHz Centralized Instruction Window for a Superscalar Microprocessor ...... 96 S. Wallace, N. Dagli, and N. Bagherzadeh
A Superscalar RISC Processor with Pseudo Vector Processing Feature ...... 102 K. Shimamura, S. Tanaka, T. Shimomura, T. Hotta, E. Kamada, H. Sawamoto, T. Shimizu, and K. Nakazawa
The Resource Conflict Methodology for Early-Stage Design Space Exploration of Superscalar RISC Processors...... 110 J.-D. Wellman and E.S. Davidson
Session 1.4.3: SPARC Design Methodologies Chair: Chin-Long Wey, Michigan State University
Design of an Efficient Power Distribution Network for the UltraSPARC-I™ Microprocessor ...... 118 A. Dalal, L. Lev, and S. Mitra
Clock Controller Design in SuperSPARC II™ Microprocessor ...... 124 H. Hao and K. Bhabuthmal
Incas: A Cycle Accurate Model of UltraSPARC™ ...... 130 G. Maturana, J.L. Ball, J. Gee, A. Iyer, and J.M. O’Connor
Session 1.4.4: Simulation Chair: Derek Beatty, Motorola
Accurate Device Modeling Techniques for Efficient Timing Simulation of Integrated Circuits ...... 138 A. Devgan
Execution-Time Profiling for Multiple-Process Behavioral Synthesis ...... 144 J.K. Adams, J.A. Miller, and D.E. Thomas
Emulation Verification of the Motorola 68060 ...... 150 J. Kumar, N. Strader, J. Freeman, and M. Miller
Session 2.1.1: Embedded Systems Plenary Chair: Rolf Ernst, University of Braunschweig
Technical Challenges of PDA Design B. Mangione-Smith
Session 2.2.1: Design for Testability Chair: Sumit Dasgupta, Sematech/IBM Corp.
Testability Analysis and Insertion for RTL Circuits Based on Pseudorandom BIST ...... 162 J.E. Carletta and C.A. Papachristou
Efficient Testability Enhancement for Combinational Circuit ...... 168 Y. Fang and A. Albicki
Design for Hierarchical Testability of RTL Circuits Obtained by Behavioral Synthesis ...... 173 I. Ghosh, A. Raghunathan, and N.K. Jha
Synthesis for Testability of Large Complexity Controllers ...... 180 F. Fummi, D. Sciuto, and M. Serra
Session 2.2.2: PowerPC™ Chairs: Tim Brodnax, IBM Corp. and Nasr Ullah, Motorola
Multiprocessor Design Verification for the PowerPC 620™ Microprocessor ...... 188 C. Montemayor, J.-T. Yen, M. Sullivan, P. Wilson, and R. Evers
The PowerPC 603e™ Microprocessor: An Enhanced, Low-Power, Superscalar Microprocessor ...... 196 J. Slaton, S.P. Licht, M. Alexander, K.R. Kishore, R. Jessani, and S. Reeve
A High Performance Bus and Cache Controller for PowerPC™ Multiprocessing Systems ...... 204 M.S. Allen, W.K. Lewchuk, and J.D. Coddington
Performance Monitoring on the PowerPC™ 604 Microprocessor ...... 212 F.E. Levine, C.P. Roth, and E.H. Welbon
Session 2.2.3: Floor Planning & Placement Chair: Carl Sechen, University of Washington
Thermal Placement for High-Performance Multichip Modules ...... 218 K.-Y. Chao and D.F. Wong
EPNR: An Energy-Efficient Automated Layout Synthesis Package ...... 224 G. Holt and A. Tyagi
PEPPER - A Timing Driven Early Floorplanner ...... 230 G. Vijayan, V. Narayananan, D. LaPotin, and R. Gupta
Connection-Oriented Net Model and Fuzzy Clustering Techniques for K-Way Circuit Partitioning ...... 236 J.-T. Yan
Session 2.2.4: Combinational and Sequential Logic Optimization Chair: Masahiro Fujita, Fujitsu Labs of America
An Enhanced Algorithm for the Minimization of Exclusive-OR Sum-of-Products for Incompletely Specified Functions ...... 244 T. Kozlowski, E.L. Dagless, and J.M. Saul
Implicit State Minimization of Non-Deterministic FSMs ...... 250 T. Kam, T. Villa, R.K. Brayton, and A.L. Sangiovanni-Vincentelli
Extending Equivalence Class Computation to Large FSMs ...... 258 G. Cabodi, S. Quer, and P. Camurati
Efficient State Assignment Framework for Asynchronous State Graphs ...... (late paper can be found on page 692) C. Ykman-Couvreur and B. Lin
Session 2.3.1: Massively Parallel Processing Interconnects Chair: Joydeep Ghosh, University of Texas at Austin
Adaptive Routing in Clos Networks ...... 266 P. Franaszek, C.J. Georgiou, and C.-S. Li
Rational Clocking ...... 271 L.F.G. Sarmenta, G.A. Pratt, and S.A. Ward
A Prototype Router for the Massively Parallel Computer RWC-1 ...... 279 T. Yokota, H. Matsuoka, K. Okamoto, H. Hirono, A. Hori, and S. Sakai
Session 2.3.2: Test Pattern Generation Chair: R. Molyneaux
Distributed Automatic Test Pattern Generation with a Parallel FAN Algorithm ...... (late paper can be found on page 698) S. Radtke, W. Anheier, and J. Bargfrede
Concurrent Automatic Test Pattern Generation Algorithm for Combinational Circuits ...... 286 A.-F.S. Yousif and J. Gu
Test Generation for Multiple State-Table Faults in Finite-State Machines ...... 292 I. Pomeranz and S.M. Reddy
Session 2.3.3: Caching Strategies Chair: Jim Bondi, Texas Instruments
Pollution Control Caching ...... 300 S.J. Walsh and J.A. Board
Caching Processor General Registers ...... 307 R. Yung and N.C. Wilhelm
A Dynamic Cache Sub-block Design to Reduce False Sharing ...... 313 M. Kadiyala and L.N. Bhuyan
Session 2.3.4: Embedded System Architecture & Case Studies Chair: Jim Browne, University of Texas at Austin
A Programmable Routing Controller for Flexible Communications in Point-to-Point Networks ...... 320 S.W. Daniel, J.L. Rexford, J.W. Dolter, and K.G. Shin
POM: A Processor Model for Image Processing ...... 326 J.-P. Theis and L. Thiele
A Case Study in Low-Power System-Level Design ...... 332 A. Wolfe
Session 2.4.1: ATM and High-speed Networking Alternatives Chair: Bob Horst, Tandem Computer
A Novel Architecture for an ATM Switch ...... 340 J. Li and C.-L. Wu
Designing Fibre Channel Fabrics ...... 346 L. Cherkasova, V. Kotov, and T. Rokicki
Architecture and Design of a 40 Gigabit per second ATM Switch ...... 352 S.E. Butner and D.A. Skirmont
Session 2.4.2: Routing & Extraction Chair: Lukas van Ginneken, Synopsys, Inc.
Accurate and Efficient Layout-to-Circuit Extraction for High-speed MOS and Bipolar/BiCMOS Integrated Circuits ...... 360 F. Beeftink, A.J. van Genderen, and N.P. van der Meijs
An Efficient Cut-Based Algorithm on Minimizing the Number of L-Shaped Channels for Safe Routing Ordering ...... 366 J.-T. Yan
FPGA Global Routing Based on a New Congestion Metric ...... 372 Y.-W. Chang, D.F. Wong, and C.K. Wong
Session 2.4.3: Asynchronous Datapaths Chair: Erik Brunvand, University of Utah
Asynchronous 2-D Discrete Cosine Transform Core Processor ...... 380 B. Stott, D. Johnson, and V. Akella
A Self-timed Redundant-Binary Number to Binary Number Convertor for Digital Arithmetic Processors ...... 386 C.-L. Wey, H. Wang, and C.-P. Wang
A Self-Timed FPGA System for Functional Simulation and Logic Emulation ...... (paper not received at press time) D. How
Session 2.4.4: FPGA - Synthesis Chair: Steve Trimberger, Xilinx
Design and Analysis of FPGA/FPIC Switch Modules ...... 394 Y.-W. Chang, D.F. Wong, and C.K. Wong
Simultaneous Area and Delay Minimum K-LUT Mapping for K-Exact Networks ...... 402 S. Thakur and D.F. Wong
DART: Delay and Routability Driven Technology Mapping for LUT Based FPGAs ...... 409 A. Lu, E. Dagless, and J. Saul
Logic Synthesis for a Single Large Look-up Table ...... 415 R. Murgai, F. Hirose, and M. Fujita
Session 3.1.1: Design & Test Plenary Chair: Alexander Albicki, University of Rochester
Testing - What’s Missing? An Incomplete List of Challenges ...... 426 S. Reddy
Session 3.1.2: CAD Plenary Chair: Luc Claesen, IMEC
Toward Integrated System Design: A Global Perspective ...... 428 B. Hosticka
Session 3.2.1: Topics in High-Level Synthesis Chair: Ahmed Jerraya, TIMA/INPG
Analysis of Conditional Resource Sharing using a Guard-based Control Representation ...... 434 I. Radivojevic and F. Brewer
Multi-Dimensional Interleaving for Time-and-Memory Design Optimization ...... 440 N.L. Passos, E.H.-M. Sha, and L.-F. Chao
High Level Profiling Based Low Power Synthesis Technique ...... 446 S. Katkoori, N. Kumar, and R. Vemuri
Session 3.2.2: Low Power and High-Performance Circuits Chair: Kit Cham, Hewlett-Packard
Control Unit Synthesis Targeting Low-Power Processors ...... 454 C.-Y. Wang and K. Roy
Low Power Data Format Converter Design Using Semi-static Register Allocation ...... 460 L. Lucke, C. Chakrabarti, and K. Srivatsan
A 13.3ns Double-precision Floating-point ALU and Multiplier ...... 466 H. Yamada, T. Hotta, T. Nishiyama, F. Murabayashi, T. Yamauchi, and H. Sawamoto
Session 3.2.3: Arithmetic Modules Chair: N. Ranganathan, University of South Florida
A Floating Point Radix 2 Shared Division/Square Root Chip ...... 472 H.R. Srinivas and K.K. Parhi
High-Radix SRT Division with Speculation of Quotient Digits ...... 479 H.-S. Kay, T.-H. Pan, Y. Chun, and C.-L. Wey
A Coprocessor for Accurate and Reliable Numerical Computations ...... (late paper can be found on page 686) M.J. Schulte and E.E. Swartzlander, Jr.
Session 3.2.4: Architectures for Signal Processors Chair: Kaushik Roy, Purdue University
Special Purpose FPGA for High-speed Digital Telecommunication Systems ...... 486 A. Tsutsui, K. Yamada, T. Miyazaki, and N. Ohta
VLSI Design of Densely-Connected Array Processors ...... 492 B.J. Sheu, R.C. Chang, E.Y. Chou, and T.H. Wu
VLSI Issues in Memory-System Design for Video Signal Processors ...... 498 S. Dutta, W. Wolf, and A. Wolfe
Session 3.3.1: Memory System Performance Chair: Pradip Bose, IBM T. J. Watson Research Center
Write Buffer Design for Cache-Coherent Shared-Memory Multiprocessors ...... 506 F. Mounes-Toussi and D.J. Lilja
Reducing Data Access Penalty Using Intelligent Opcode-Driven Cache Prefetching ...... 512 C.-H. Chi and S.-C. Lau
Interrupt Based Hardware Support for Profiling Memory System Performance ...... 518 A. Goldberg and J. Trotter
Session 3.3.2: Emerging Technologies for Processor Verification Chair: Warren Hunt, Computational Logic Inc.
Verification of a Subtractive Radix-2 Square Root Algorithm and Implementation ...... 526 M. Leeser and J. O’Leary
Automatic Extraction of the Control Flow Machine and Application to Evaluating Coverage of Verification Vectors ...... 532 Y.V. Hoskote, D. Moundanos, and J.A. Abraham
Theorem Proving: Not an Esoteric Diversion, but the Unifying Framework for Industrial Verification ...... 538 D. Cyrluk and M.K. Srivas
Session 3.3.3: Memory Architectures for Signal Processing Chair: Bryan Ackland, AT&T Bell Laboratories
An Empirical Study of Datapath, Memory Hierarchy, and Network in SIMD Array Architectures ...... 546 M.C. Herbordt and C.C. Weems
Memory Organization for Video Algorithms on Programmable Signal Processors ...... 552 E. De Greef, F. Catthoor, and H. De Man
SSM-MP: More Scalability in Shared-Memory Multi-Processor ...... 558 S. Iwasa, S.H. Shing, H. Mogi, H. Nozuwe, H. Hayashi, O. Wakamori, T. Ohmizo, K. Tanaka, H. Sakai, and M. Saito
Session 3.3.4: Novel Design Concepts Chair: Christos Papachristou, Case Western Reserve University
Low Power and High Speed Multiplication Design Through Mixed Number Representations ...... 566 M. Zheng and A. Albicki
Minimal Self-correcting Shift Counters ...... 571 A.M. Tokarnia and A.M. Peterson
Estimation of Sequential Circuit Activity Considering Spatial and Temporal Correlations ...... 577 T.-L. Chou and K. Roy
Session 3.4.1: FSM Verification Chair: Gabriel Bischoff, Digital Equipment Corporation
A Symbolic-Simulation Approach to the Timing Verification of Interacting FSMs ...... 584 A.J. Daga and W. Birmingham
Incremental Methods for FSM Traversal ...... 590 G. Swamy, V. Singhal, and R.K. Brayton
Extraction of Finite State Machines from Transistor Netlists by Symbolic Simulation ...... 596 M. Pandey, G. York, D. Beatty, A. Jain, S. Jain, and R.E. Bryant
Dynamic Minimization of OKFDDs ...... 602 R. Drechsler and B. Becker
Session 3.4.2: Fault Simulation Chair: Srimat Chakradhar, NEC
Data Parallel Fault Simulation ...... 610 M.B. Amin and B. Vinnakota
A Parallel Algorithm for Fault Simulation Based on PROOFS ...... 616 S. Parkes, P. Banerjee, and J. Patel
Statistics on Concurrent Fault and Design Error Simulation ...... 622 B. Grayson, S.A. Shaikh, and S.A. Szygenda
A New Architectural-level Fault Simulation using Propagation Prediction of Grouped Fault-Effects ...... 628 M.S. Hsiao and J.H. Patel
Session 3.4.3: Application-Specific Processors Chair: Ashwini Nanda, Texas Instruments
A CMOS Wave-pipelined Image Processor for Real-time Morphology ...... 638 R.K. Krishnamurthy and R. Sridhar
An Efficient Systolic Array for the Discrete Cosine Transform Based on Prime-Factor Decomposition ...... 644 H. Lim and E.E. Swartzlander, Jr.
Systolic Algorithms for Tree Pattern Matching ...... 650 A. Ejnioui and N. Ranganathan
Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications ...... (late paper can be found on page 703) W.-C. Fang, B.J. Sheu, H. Venus, and R. Sandau
Session 3.4.4: Performance Driven Synthesis Chair: Andreas Kuehlmann, IBM T. J. Watson Research Center
Logic Extraction Based on Normalized Netlengths ...... 658 H. Vaishnav and M. Pedram
Transformation of Min-Max Optimization to Least-Square Estimation and Application to Interconnect Design Optimization ...... 664 J.S.-H. Wang and W.W.-M. Dai
Simple Tree-Construction Heuristics for the Fanout Problem ...... 671 R.J. Carragher, C.-K. Cheng, and M. Fujita
Concurrent Timing Optimization of Latch-Based Digital Systems ...... 680 H.-Y. Hsieh, W. Liu, C.T. Gray, and R.K. Cavin
Index of Authors ...... 709