PCI & PCI-X Ordering Rules
Course Introduction

Purpose
• The intent of this course is to explain the major differences between the PCI and PCI-X blocks of Freescale’s PowerQUICC III Processor and to explain the programming model for the PCI and PCI-X blocks.

Objectives
• Identify the major differences between PCI and PCI-X blocks.
• Describe the major features of PCI blocks.
• Describe the major features of PCI-X blocks.
• Identify the programming models of the PCI and PCI-X blocks.

Contents
• 30 pages
• 5 questions

Learning Time
• 45 minutes

Welcome to this course on the PCI and PCI-X blocks of Freescale’s PowerQUICC III Processor. This course assumes that you are already familiar with PCI, so it does not spend much time describing the protocol. Instead, it covers the major differences between PCI and PCI-X from a top level and how they are implemented in the PowerQUICC III. Some of the major features of the PCI and PCI-X blocks are also discussed. Finally, at the end of the course, the programming models of the PCI and PCI-X blocks are described.

PCI Bus Architecture
• Hierarchical arbitrated multimaster 32- or 64-bit muxed address/data bus
• Up to 66 MHz operation
  – Peak bandwidth 528 MB/s
  – Significantly less in practice (50%?)
• 32- and 64-bit addressing
• Transactions may be retried or deferred
  – Retried transactions repeated by the master
  – Deferred transactions accepted and started by target while master retries transaction

[Figure: a typical PCI-based system. The CPU and host memory sit on the host bus; a host bridge connects the host bus to PCI 0; PCI devices and PCI-to-PCI bridges on PCI 0 lead to PCI 1 and PCI 2, each with further PCI devices.]

PCI is an input/output (I/O) interconnect architecture that has been around for a very long time. This figure shows a typical PCI-based system. As you can see, many PCI devices can be added using the host-to-PCI bridge and PCI-to-PCI bridges.
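The peak bandwidth quoted on the slide follows from the bus width multiplied by the clock rate. The short sketch below reproduces that arithmetic; the function name is illustrative, the 66 MHz value is the nominal figure used on the slide, and the 50% efficiency factor is only the rough guess shown on the slide, not a measured value.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak rate in MB/s, assuming one data beat per clock and no overhead."""
    bytes_per_beat = bus_width_bits / 8
    return bytes_per_beat * clock_mhz


# Rough guess taken from the slide's "(50%?)" remark; an assumption, not a measurement.
ASSUMED_EFFICIENCY = 0.5

for width_bits in (32, 64):
    for clock_mhz in (33, 66):
        peak = peak_bandwidth_mb_s(width_bits, clock_mhz)
        practical = peak * ASSUMED_EFFICIENCY
        print(f"{width_bits}-bit @ {clock_mhz} MHz: "
              f"peak {peak:.0f} MB/s, ~{practical:.0f} MB/s at 50% efficiency")
```

The 64-bit, 66 MHz case gives 8 bytes × 66 MHz = 528 MB/s, matching the slide.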
A large number of devices can be connected to a PCI bus. According to the PCI specification, the bus width is either 32 or 64 bits, and the PowerQUICC III supports both. The common PCI frequency is 33 MHz or 66 MHz, or sometimes slower. With a 64-bit bus at 66 MHz, the peak bandwidth is 528 megabytes per second. PCI supports both 32-bit and 64-bit addressing.

PCI also defines retry and deferral mechanisms for transactions that cannot complete immediately. Retried transactions are repeated by the master, and deferred transactions are accepted by the target. For example, suppose a master initiates a transaction addressed to a target on the PCI bus. If the target is busy or does not have enough resources to process the transaction, it signals a retry to the initiator. The master then repeats this transaction at a later time. On the other hand, a target that has deferred a transaction should be ready to complete it when the master retries. This is a very top-level description of the PCI bus architecture.

PCI-X Differences from PCI
133 MHz Operation
• Latch-to-latch protocol
• More margin for prop delay
• More margin for receiver logic
• In practice, only point-to-point is achievable at 133 MHz

[Figure: signal timing for PCI and PCI-X. PCI: the transmitter asserts the signal, the signal propagates, and the receiver logic decodes it in the same cycle; the receiver then responds. PCI-X: the transmitter asserts the signal and the signal propagates; the receiver samples it; the receiver logic decodes it; the receiver then responds.]

Let’s take a look at the differences between PCI-X and PCI. Conventional PCI uses an immediate protocol. Look at the PCI flow in this diagram. The transmitter asserts a signal, and the signal then propagates across the bus, which introduces a propagation delay. Then, in the same cycle, the receiver logic decodes the transaction to find out whether it should respond.
On the following cycle, the receiver responds if it has to. Notice that in the first clock cycle for PCI, time is needed for the transmitter to assert the signal, then for the propagation delay, and finally for the receiver logic.

PCI-X uses a latch-to-latch, or register-to-register, protocol. As you can see from the PCI-X flow in the diagram, only the transmitter signal assertion and the propagation delay occur in the first cycle. The receiver logic that decodes the transaction is not used in the first cycle. Instead, the transaction is latched at the end of the first cycle, and the PCI-X receiver logic decodes the latched transaction in the second clock cycle. Finally, in the third clock cycle, the receiver responds if required.

From this simple diagram, you can see that PCI requires two clock cycles and PCI-X requires three. The key point is that PCI-X has less work to do in each clock cycle because it uses the latch-to-latch protocol. As a consequence, the clock period for PCI-X can be shorter, which is why PCI-X can operate at a much higher frequency than conventional PCI. Note that although PCI-X can reach higher frequencies, at the maximum frequency of 133 MHz it becomes a point-to-point protocol, and you cannot have more than two devices on the bus (one host and one agent). As you increase the number of devices in a PCI-X system, the operating frequency is reduced.

Question
Consider this question regarding the differences between PCI and PCI-X. Is the following statement true or false?
“PCI requires two cycles, and PCI-X requires three clock cycles.”
• True
• False
Correct. PCI requires two cycles, and PCI-X requires three clock cycles.
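To see why an extra clock cycle does not have to mean a slower response, the cycle counts above can be combined with the clock rates mentioned in this course. The sketch below is only illustrative arithmetic: it assumes the nominal 66 MHz and 133 MHz figures, uses the two-cycle and three-cycle counts from the diagram, and ignores arbitration and every other source of latency. The function names are made up for this example.

```python
def clock_period_ns(clock_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def response_time_ns(cycles: int, clock_mhz: float) -> float:
    """Time from signal assertion to receiver response, in nanoseconds."""
    return cycles * clock_period_ns(clock_mhz)


PCI_CYCLES = 2    # assert + propagate + decode, then respond
PCIX_CYCLES = 3   # assert + propagate, then decode, then respond

pci_ns = response_time_ns(PCI_CYCLES, 66)
pcix_ns = response_time_ns(PCIX_CYCLES, 133)

print(f"PCI   @  66 MHz: {clock_period_ns(66):.2f} ns/cycle, {pci_ns:.1f} ns to respond")
print(f"PCI-X @ 133 MHz: {clock_period_ns(133):.2f} ns/cycle, {pcix_ns:.1f} ns to respond")
```

Under these assumptions, the three PCI-X cycles at 133 MHz take roughly 23 ns, compared with roughly 30 ns for the two PCI cycles at 66 MHz.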
PCI-X Differences from PCI
• Split Transactions
  – Separate transaction for request and response
  – Request and response are separately arbitrated
  – Latency and bus utilization are improved

[Figure: split transaction. The master (initiator) arbitrates for the bus (REQ/GNT) and issues a read to the target (completer). After other bus activity, the completer arbitrates for the bus (REQ/GNT) and returns the read data to the initiator as a completion.]

The conventional PCI protocol supports delayed transactions. With a delayed transaction, the device requesting data must poll the target to determine when the request has been completed and whether its data is available. This polling time is pure overhead: during it, the bus is held by a requester that does nothing but wait for data from the target, and other masters that could have used the bus cannot.

PCI-X implements split transactions to improve bus utilization. Let’s look at the diagram. The master arbitrates for the PCI-X bus and makes a read request. The target might not be able to process the request immediately. In that case, the completer logic of the target sends only an acknowledgment to the initiator, and the transaction completes without the initiator receiving the data. After this, the target can continue doing other things. When the target has enough resources available to process the request, it does so. The target then arbitrates for the bus, provides the data that the master requested, and terminates the transaction. With the help of these split transactions, the PCI-X bus can be used in an efficient way.

In sum, the requester sends a request to the target. The target device informs the requester that it has accepted the request. The requester is free to process other information until the target device initiates a new transaction and sends the data to the requester.
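The difference in bus occupancy between polling and split completion can be illustrated with a toy model. This is a sketch under stated assumptions, not a model of real bus timing: every number is hypothetical, each attempt on the bus is assumed to cost a fixed overhead in cycles, and the target is assumed to need some time before its data is ready.

```python
import math


def delayed_bus_cycles(target_latency: int, poll_interval: int,
                       overhead: int, data_cycles: int) -> int:
    """Bus cycles used when the master repeats the read until data is ready.

    The master tries at t = 0, then every poll_interval cycles. The first
    attempt made at or after target_latency succeeds and transfers the data.
    """
    if target_latency <= 0 or poll_interval <= 0:
        raise ValueError("target_latency and poll_interval must be positive")
    repeats = math.ceil(target_latency / poll_interval)
    attempts = repeats + 1  # include the initial attempt at t = 0
    return attempts * overhead + data_cycles


def split_bus_cycles(overhead: int, data_cycles: int) -> int:
    """Bus cycles used by one request plus one completion carrying the data."""
    return 2 * overhead + data_cycles


# Hypothetical values chosen only for illustration.
delayed = delayed_bus_cycles(target_latency=40, poll_interval=8,
                             overhead=3, data_cycles=8)
split = split_bus_cycles(overhead=3, data_cycles=8)
print(f"delayed transaction: {delayed} bus cycles")
print(f"split transaction:   {split} bus cycles")
```

In this model the split transaction always costs two bus tenures regardless of how slow the target is, while the cost of the polled read grows with the target latency.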
Thus, split transactions enable more efficient use of the bus.

PCI-X Differences from PCI
• Insertion of wait states restricted
  – Initiators cannot insert wait states.
  – Targets cannot insert wait states after the first data beat.
  – Both initiators and targets may end a burst only on naturally aligned 128-byte boundaries.
  – This improves bus utilization.
• Additional state carried with transactions
  – Each transaction in a sequence carries a byte count.
  – Each transaction carries the identity of the initiator.
  – This improves buffer management.
• Maximum transaction size limited to 4K bytes
  – This improves worst case latency.

Let’s take a look at another major difference between PCI and PCI-X: wait states. Conventional PCI devices often add extra clock cycles, or wait states, to their transactions. The wait states are added to “stall” the bus if the PCI device is not ready to proceed with the transaction. This can slow bus throughput dramatically. PCI-X eliminates the use of wait states, except for initial target latency. In other words, initiators are not allowed to add wait states, and a target can add wait states only for the first data beat. When a PCI-X device does not have data to transfer, it removes itself from the bus so that another device can use the bus bandwidth. This provides more efficient use of bus and memory resources.

With PCI-X, adapters and bridges (host-to-PCI-X and PCI-X-to-PCI-X) are permitted to disconnect transactions only on naturally aligned 128-byte boundaries.
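The 128-byte disconnect rule and the 4K-byte size limit can be made concrete with a little address arithmetic. The sketch below is an illustration only: the function and constant names are made up for this example, and it shows the worst case in which a burst is disconnected at every permitted boundary, which a real device would not necessarily do.

```python
BOUNDARY_BYTES = 128         # naturally aligned disconnect boundary
MAX_TRANSACTION_BYTES = 4096  # maximum transaction size (4K bytes)


def next_boundary(addr: int) -> int:
    """First naturally aligned 128-byte boundary strictly above addr."""
    return (addr // BOUNDARY_BYTES + 1) * BOUNDARY_BYTES


def split_at_boundaries(start: int, byte_count: int):
    """Yield (address, length) pieces of a burst cut at every 128-byte boundary."""
    if not 0 < byte_count <= MAX_TRANSACTION_BYTES:
        raise ValueError("byte count must be between 1 and 4096")
    end = start + byte_count
    addr = start
    while addr < end:
        piece_end = min(next_boundary(addr), end)
        yield addr, piece_end - addr
        addr = piece_end


# A 300-byte burst starting at an unaligned (hypothetical) address.
for addr, length in split_at_boundaries(0x1040, 300):
    print(f"0x{addr:04X}: {length} bytes")
```

Only the first and last pieces can be shorter than 128 bytes; every interior piece starts and ends on a boundary.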