(12) United States Patent (10) Patent No.: US 7.412,554 B2 Danilak (45) Date of Patent: Aug

USOO7412554B2 (12) United States Patent (10) Patent No.: US 7.412,554 B2 Danilak (45) Date of Patent: Aug. 12, 2008 (54) BUS INTERFACE CONTROLLER FOR 2005, 0140682 A1 6/2005 Sumanaweera et al. COSTEFFECTIVE HIGH PERFORMANCE 2006/0005000 A1* 1/2006 King et al. ..................... T13/1 GRAPHICS SYSTEM WITH TWO ORMORE 2006, OO59451 A1 3/2006 Koehler et al. GRAPHICS PROCESSING UNITS 2006/0098020 A1* 5/2006 Shen et al. .................. 345,520 2006/0294.279 A1* 12/2006 McKee et al. ............... T10,301 (75) Inventor: Radoslav Danilak, Cupertino, CA (US) 2007. O139423 A1 6/2007 Kong et al. (73) Assignee: NVIDIA Corporation, Santa Clara, CA 2007/0214323 A1 9, 2007 Moll et al. (US) (*) Notice: Subject to any disclaimer, the term of this OTHER PUBLICATIONS patent is extended or adjusted under 35 Danilak, Radoslav, entitled “Graphics Processing for Cost Effective U.S.C. 154(b) by 0 days. High Performance Graphics System With Two or More Graphics Processing Units.” U.S. Appl. No. 1 1/454991, filed Jun. 15, 2006. (21) Appl. No.: 11/454,992 Danilak, Radoslav, entitled “Motherboard For Cost-Effective High Performance Graphics System With Two or More Graphics Process (22) Filed: Jun. 15, 2006 ing Units.” U.S. Appl. No. 1 1/455,072, filed Jun. 15, 2006. (65) Prior Publication Data * cited by examiner US 2007/O294458A1 Dec. 20, 2007 Primary Examiner Mark Rinehart Assistant Examiner Kim T Huynh (51) Int. Cl. (74) Attorney, Agent, or Firm—Cooley Godward Kronish G06F 3/4 (2006.01) LLP (52) U.S. Cl. ....................................... 710/305: 710/306 (58) Field of Classification Search ......... 710/301-306, (57) ABSTRACT 710/100, 240 244 See application file for complete search history. A bus interface controller manages a set of serial data lanes. (56) References Cited The bus interface controller supports operating a subset of the U.S. PATENT DOCUMENTS serial data lanes as a private bus. 7,007,025 B1 2/2006 Nason et al. 2004/0233230 A1 11/2004 Hancock 12 Claims, 13 Drawing Sheets 805 -o BIOS GPU Configuration PCI-E Configuration P CI-E Physical Interface Logical Logical Channel Channel Module Module 885 - 840 U.S. Patent Aug. 12, 2008 Sheet 1 of 13 US 7412,554 B2 FIG. I. (Prior Art) U.S. Patent Aug. 12, 2008 Sheet 2 of 13 US 7412,554 B2 180 - 130-B 30-A FIG. 2 (Prior Art) U.S. Patent Aug. 12, 2008 Sheet 3 of 13 US 7412,554 B2 302 CPU x 16 350 320-B -/ FIG. 3 (Prior Art) U.S. Patent Aug. 12, 2008 Sheet 4 of 13 US 7412,554 B2 3O2 CPU 402 404 404 Bridge Bridge x 16 x 16 Xl6 x 16 420 420 420 420 GPU GPU GPU FIG. 4 (Prior Art) U.S. Patent Aug. 12, 2008 Sheet 5 of 13 US 7412,554 B2 U.S. Patent Aug. 12, 2008 Sheet 6 of 13 US 7412,554 B2 U.S. Patent Aug. 12, 2008 Sheet 7 of 13 US 7412,554 B2 U.S. Patent Aug. 12, 2008 Sheet 8 of 13 US 7.412,554 B2 BIOS CPU GPU Configuration PCI-E Configuration 540 545 PCI-E Interface Logical Channel Module U.S. Patent Aug. 12, 2008 Sheet 9 of 13 US 7.412,554 B2 550 CPU 905 920 920 X 16 PCI-E Interface X 16 PCI-E Interface X8 X8 X8 X8 930 930 930 930 x8 x8 GPU GPU GPU GPU FIG. 9 U.S. Patent Aug. 12, 2008 Sheet 10 of 13 US 7.412,554 B2 550 CPU 1020 X32 PCI-E Interface X8 x8 X8 X8 930 930 930 930 x8 x8 GPU GPU GPU GPU 950 950 U.S. Patent Aug. 12, 2008 Sheet 11 of 13 US 7412,554 B2 U.S. Patent Aug. 12, 2008 Sheet 12 of 13 US 7.412,554 B2 550 CPU 905 920 920 X 16 PCI-E Interface X 16 PCI-E Interface x8 X8 X8 x8 930 I 105 930 I 105 x8 x8 GPU GPU PCB 950 950 FIG. 12 U.S. Patent Aug. 12, 2008 Sheet 13 of 13 US 7.412,554 B2 X 32 PCI-E Interface US 7,412,554 B2 1. 2 BUSINTERFACE CONTROLLER FOR 140-A) and a second Switch position in which eight lanes are COSTEFFECTIVE HIGH PERFORMANCE routed from chip 110 to PCI-E connector 140-A and the other GRAPHICS SYSTEM WITH TWO ORMORE eight lanes from chip 110 are routed to PCI-E connector GRAPHCS PROCESSING UNITS 140-B. Thus, in an SLI mode each PCI-E connector has half of its serial data lanes coupled to a chipset, while the other half FIELD OF THE INVENTION are unused. This results in an inherent compromise in that graphics processing power in increased (because of the two The present invention is generally related to graphics sys GPUs operating in parallel) but at the cost that each graphics tems capable of Supporting different numbers of graphics card has half of the PCI-E bandwidth that would be the case cards for improved performance. More particularly, the 10 present invention is directed towards a private bus to Support if it was used alone. a cost-effective high performance graphics system. SLI is typically implemented in a master/slave arrange ment in which work is divided up between graphics proces BACKGROUND OF THE INVENTION sors. Software drivers distribute the work of processing 15 graphics data between the two graphics cards. For example, in Graphics systems are typically implemented as a three split frame rendering (SFR), the graphics processing is orga dimensional assembly of different cards (also sometimes nized such that an individual frame is split into two different called “boards') that are plugged into a motherboard. The portions, which are processed by the different graphics pro motherboard is the main circuit board of the system and cessors in parallel. In alternate frame rendering (AFR), one typically includes a central processing unit and other chips graphics card processes the current frame while the other that are known as a “chipset.” Additionally, a motherboard graphics card works on the next frame. In one version, an includes connectors, ports, and other features for attaching external SLI connector 180 provides a link between the other electronic components. graphics cards to transmit synchronization and pixel data Referring to FIG. 1, in a conventional graphics system a between the graphics cards. motherboard 100 includes a chipset that includes, for 25 example, a bridge unit 110 and a central processing unit Recently, quad SLI Systems that include four graphics (CPU) 120. For the purposes of illustration, a graphics card cards have been released by the Nvidia Corporation. A quad 130 is illustrated in position for assembly. Graphics card 130 SLI system is an extension of SLI in which four graphics typically includes a graphics processing unit (GPU) (not cards process graphics data. For example, the work may be shown). The graphics card 130 typically includes connector 30 split into a combination of AFR and SFR in which groups of surfaces 135. For the purposes of illustration, a single con two graphics cards work on alternate frames, with each group nector surface 135 is illustrated that is designed to mate with of two graphics cards in turn performing split frame render a Peripheral Component Interface (PCI) Express (often ing. referred to as “PCI-E” or “PCIe) connector 140. PCI-E is a One problem with conventional SLI is that it is more high speed bus interface standard that utilizes high speed 35 expensive than desired. In particular, extra components. Such serial data lanes. The PCI-SIG organization publishes the as Switch cards and SLI connectors, are typically required, PCI-E standard. An individual data lane 150 comprises two increasing the cost. Another issue is related to performance simplex connections, one for receiving data and the other for caused by splitting the PCI-E bandwidth of chip 110 between transmitting data. two graphics cards. The bandwidth from the chipset to the The PCI-E standard specifies a protocol for bus interfaces 40 to configure a set of data lanes into a link between two entities. GPU is reduced by half compared to a single graphics card The bandwidth of the link scales with the number of data architecture. This also has the result of limiting the available lanes operated in parallel. The size of a PCI-E bus is com bandwidth for GPU-to-GPU traffic that flows through the monly referred to as a multiple of one data lane, e.g., “xN” or chipset. “Nx' to indicate that the link has N times the bandwidth of a 45 As illustrated in FIG.3, one alternative to conventional SLI single data lane. PCI-E Supports bus sizes of x1, x2, x4, x8. would be to use a more expensive set of chips 305,310 in the x 16, and x32 lanes. Conventionally, a variety of standard chipset to increase the PCI-E bandwidth such that each GPU connector sizes are utilized, with a x16 connector size being 320-A and 320-B has a dedicated x 16 bandwidth to the commonly used for graphics cards. chipset. However, in addition to the more expensive chipset FIG. 2 illustrates a scalable link interface (SLI) graphics 50 that is required, the architecture illustrated in FIG. 3 does not system similar to that developed by the Nvidia Corporation of have symmetric data paths 350 and 360 from the CPU 302 to Santa Clara, Calif. A SLI graphics system utilizes two or more the GPUs. Command streams from the GPU may thus arrive graphics cards 130-A and 130-B operating together to pro at each GPU at slightly different times.

(12) United States Patent (10) Patent No.: US 7.412,554 B2 Danilak (45) Date of Patent: Aug

Supermicro GPU Solutions Optimized for NVIDIA Nvlink

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

An Emerging Architecture in Smart Phones

It's Meant to Be Played

Data Sheet FUJITSU Tablet STYLISTIC M702

Technical Brief

Dell EMC Poweredge C4140 Technical Guide

Arm-Based Computing Platform Solutions Accelerating Your Arm Project Development

Mellanox Technologies Delivers 10 Gb/Sec Infiniband Server Blade Reference Design

ACACES a Decoupled Access/Execute Architecture for Mobile Gpus

Precision 5820 Tower Spec Sheet

3Dexplorer® 3000 USER's MANUAL