Automatically Tuning Collective Communication for One-Sided Programming Models

by Rajesh Nishtala

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor Katherine A. Yelick, Chair
Professor James W. Demmel
Professor Panayiotis Papadopoulos

Fall 2009

Copyright 2009 by Rajesh Nishtala

Abstract

Technology trends suggest that future machines will rely on parallelism to meet increasing performance requirements. To aid in programmer productivity and application performance, many parallel programming models provide communication building blocks called collective communication. These operations, such as Broadcast, Scatter, Gather, and Reduce, abstract common global data movement patterns behind a simple library interface allowing the hardware and runtime system to optimize them for performance and scalability.

We consider the problem of optimizing collective communication in Partitioned Global Address Space (PGAS) languages. Rooted in traditional shared memory programming models, they deliver the benefits of sophisticated distributed data structures using language extensions and one-sided communication. One-sided communication allows one processor to directly read and write memory associated with another. Many popular PGAS language implementations share a common runtime system called GASNet for implementing such communication.
To provide a highly scalable platform for our work, we present a new implementation of GASNet for the IBM BlueGene/P, allowing GASNet to scale to tens of thousands of processors.

We demonstrate that PGAS languages are highly scalable and that the one-sided communication within them is an efficient and convenient platform for collective communication. We show how to use one-sided communication to achieve 3× improvements in the latency and throughput of the collectives over standard message passing implementations. Using a 3D FFT as a representative communication-bound benchmark, for example, we see a 17% increase in performance on 32,768 cores of the BlueGene/P and a 1.5× improvement on 1024 cores of the Cray XT4. We also show how the automatically tuned collectives can deliver more than an order of magnitude improvement in performance over existing implementations on shared memory platforms.

There is no obvious best algorithm that serves all machines and usage patterns, demonstrating the need for tuning, and we thus build an automatic tuning system in GASNet that optimizes the collectives for a variety of large scale supercomputers and novel multicore architectures. To understand the large search space, we construct analytic performance models and use them to minimize the overhead of autotuning. We demonstrate that autotuning is an effective approach to addressing performance optimizations on complex parallel systems.

Dedicated to Rakhee, Amma, and Nanna for all their love and encouragement

Contents

List of Figures
1 Introduction
  1.1 Related Work
    1.1.1 Automatically Tuning MPI Collective Communication
  1.2 Contributions
  1.3 Outline
2 Experimental Platforms
  2.1 Processor Cores
  2.2 Nodes
    2.2.1 Node Architectures
    2.2.2 Remote Direct Memory Access
  2.3 Interconnection Networks
    2.3.1 CLOS Networks
    2.3.2 Torus Networks
  2.4 Summary
3 One-Sided Communication Models
  3.1 Partitioned Global Address Space Languages
    3.1.1 UPC
    3.1.2 One-sided Communication
  3.2 GASNet
    3.2.1 GASNet on top of the BlueGene/P
    3.2.2 Active Messages
4 Collective Communication
  4.1 The Operations
    4.1.1 Why Are They Important?
  4.2 Implications of One-Sided communication for Collectives
    4.2.1 Global Address Space and Synchronization
    4.2.2 Current Set of Synchronization Flags
    4.2.3 Synchronization Flags: Arguments for and Against
    4.2.4 Optimizing the Synchronization and Collective Together
  4.3 Collectives Used in Applications
5 Rooted Collectives for Distributed Memory
  5.1 Broadcast
    5.1.1 Leveraging Shared Memory
    5.1.2 Trees
    5.1.3 Address Modes
    5.1.4 Data Transfer
    5.1.5 Nonblocking Collectives
    5.1.6 Hardware Collectives
    5.1.7 Comparison with MPI
  5.2 Other Rooted Collectives
    5.2.1 Scratch Space
    5.2.2 Scatter
    5.2.3 Gather
    5.2.4 Reduce
  5.3 Performance Models
    5.3.1 Scatter
    5.3.2 Gather
    5.3.3 Broadcast
    5.3.4 Reduce
  5.4 Application Examples
    5.4.1 Dense Matrix Multiplication
    5.4.2 Dense Cholesky Factorization
  5.5 Summary
6 Non-Rooted Collectives for Distributed Memory
  6.1 Exchange
    6.1.1 Performance Model
    6.1.2 Nonblocking Collective Performance
  6.2 Gather-to-All
    6.2.1 Performance Model
    6.2.2 Nonblocking Collective Performance
  6.3 Application Example: 3D FFT
    6.3.1 Packed Slabs
    6.3.2 Slabs
    6.3.3 Summary
    6.3.4 Performance Results
  6.4 Summary
7 Collectives for Shared Memory Systems
  7.1 Non-rooted Collective: Barrier
  7.2 Rooted Collectives
    7.2.1 Reduce
    7.2.2 Other Rooted Collectives
  7.3 Application Example: Sparse Conjugate Gradient
  7.4 Summary
8 Software Architecture of the Automatic Tuner
  8.1 Related Work
  8.2 Software Architecture
    8.2.1 Algorithm Index
    8.2.2 Phases of the Automatic Tuner
  8.3 Collective Tuning
    8.3.1 Factors that Influence Performance
    8.3.2 Offline Tuning
    8.3.3 Online Tuning
    8.3.4 Performance Models
  8.4 Summary
9 Teams
  9.1 Thread-Centric Collectives
    9.1.1 Similarities and Differences with MPI
  9.2 Data-Centric Collectives
    9.2.1 Proposed Collective Model
    9.2.2 An Example Interface
    9.2.3 Application Examples
  9.3 Automatic Tuning with Teams
    9.3.1 Current Status
10 Conclusion
Bibliography

List of Figures

2.1 Sun Constellation Node Architecture
2.2 Cray XT4 Node Architecture
2.3 Cray XT5 Node Architecture
2.4 IBM BlueGene/P Node Architecture
2.5 Example Hierarchies with 4 port switches
2.6 Example 16-node 5-stage CLOS Network
2.7 Sun Constellation System Architecture
2.8 Example 8x8 2D torus Network
3.1 UPC Pointer Example
3.2 Roundtrip Latency Comparison on the IBM BlueGene/P
3.3 Flood Bandwidth Comparison on the IBM BlueGene/P
4.1 Comparison of Loose and Strict