Nvidia Hardware Documentation Release Git

Marcelina Kościelnicka
May 24, 2021

Contents

1 Notational conventions
  1.1 Introduction
  1.2 Bit operations
  1.3 Sign extension
  1.4 Bitfield extraction
2 nVidia hardware documentation
  2.1 nVidia GPU introduction
  2.2 GPU chips
  2.3 nVidia PCI id database
  2.4 PCI/PCIE/AGP bus interface and card management logic
  2.5 Power, thermal, and clock management
  2.6 GPU external device I/O units
  2.7 Memory access and structure
  2.8 PFIFO: command submission to execution engines
  2.9 PGRAPH: 2d/3d graphics and compute engine
  2.10 falcon microprocessor
  2.11 Video decoding, encoding, and processing
  2.12 Performance counters
  2.13 Display subsystem
3 nVidia Resource Manager documentation
  3.1 PMU
4 envydis and envyas documentation
  4.1 Using envydis and envyas
5 TODO list
6 Indices and tables
Index

CHAPTER 1 Notational conventions

Contents
• Notational conventions
  – Introduction
  – Bit operations
  – Sign extension
  – Bitfield extraction

1.1 Introduction

Semantics of many operations are described in pseudocode.
Here are some often used primitives.

1.2 Bit operations

In many places, the GPUs allow specifying arbitrary X-input boolean or bitwise operations, where X is 2, 3, or 4. They are described by a 2**X-bit mask selecting the bit combinations for which the output should be true. For example, 2-input operation 0x4 (0b0100) is ~v1 & v2: only bit 2 (0b10) is set, so the only input combination that results in a true output is (v1, v2) = (0, 1). Likewise, 3-input operation 0xaa (0b10101010) is simply a passthrough of the first input: the bits set in the mask are 1, 3, 5, 7 (0b001, 0b011, 0b101, 0b111), which correspond exactly to the input combinations that have the first input equal to 1.

The exact semantics of such operations are:

    # single-bit version
    def bitop_single(op, *inputs):
        # first, construct mask bit index from the inputs
        bitidx = 0
        for idx, input in enumerate(inputs):
            if input:
                bitidx |= 1 << idx
        # second, the result is the given bit of the mask
        return op >> bitidx & 1

    def bitop(op, *inputs):
        max_len = max(input.bit_length() for input in inputs)
        res = 0
        # perform bitop_single operation on each bit (+ 1 for sign bit)
        for x in range(max_len + 1):
            res |= bitop_single(op, *(input >> x & 1 for input in inputs)) << x
        # all bits starting from max_len will be identical - just what sext does
        return sext(res, max_len)

As a further example, the 2-input operations on a, b are:

• 0x0: always 0
• 0x1: ~a & ~b
• 0x2: a & ~b
• 0x3: ~b
• 0x4: ~a & b
• 0x5: ~a
• 0x6: a ^ b
• 0x7: ~a | ~b
• 0x8: a & b
• 0x9: ~a ^ b
• 0xa: a
• 0xb: a | ~b
• 0xc: b
• 0xd: ~a | b
• 0xe: a | b
• 0xf: always 1

For further enlightenment, you can search for GDI raster operations, which correspond to 3-input bit operations.

1.3 Sign extension

An often used primitive is sign extension from a given bit. This operation is known as sext, after the Xtensa instruction of the same name, and is formally defined as follows:

    def sext(val, bit):
        # mask with all bits up from #bit set
        mask = -1 << bit
        if val & 1 << bit:
            # sign bit set, negative, set all upper bits
            return val | mask
        else:
            # sign bit not set, positive, clear all upper bits
            return val & ~mask

1.4 Bitfield extraction

Another often used primitive is bitfield extraction. Extracting an unsigned bitfield of length l starting at position s in val is denoted by extr(val, s, l), and a signed one by extrs(val, s, l):

    def extr(val, s, l):
        return val >> s & ((1 << l) - 1)

    def extrs(val, s, l):
        # extract the unsigned field, then sign-extend from its top bit
        return sext(extr(val, s, l), l - 1)

CHAPTER 2 nVidia hardware documentation

Contents:

2.1 nVidia GPU introduction

Contents
• nVidia GPU introduction
  – Introduction
  – Card schematic
  – GPU schematic - NV3:G80
  – GPU schematic - G80:GF100
  – GPU schematic - GF100-

2.1.1 Introduction

This file is a short introduction to nvidia GPUs and graphics cards. Note that the schematics shown here are simplified and do not take all details into account - consult specific unit documentation when needed.

2.1.2 Card schematic

An nvidia-based graphics card is made of a main GPU chip and many supporting chips. Note that the following schematic attempts to show as many chips as possible - not all of them are included on all cards.
         +------+ memory bus +---------+ analog video    +-------+
         | VRAM |------------|         |-----------------|       |
         +------+            |         | I2C bus         |  VGA  |
                             |         |-----------------|       |
    +--------------+         |         |                 +-------+
    | PCI/AGP/PCIE |---------|         |
    +--------------+         |         | TMDS video      +-------+
                             |         |-----------------|       |
       +----------+ parallel |         | analog video    |       |
       | BIOS ROM |----------|         |-----------------| DVI-I |
       +----------+ or SPI   |         | I2C bus + GPIO  |       |
                             |         |-----------------|       |
       +----------+ I2C bus  |         |                 +-------+
       | HDCP ROM |----------|         |
       +----------+          |         | videolink out   +------------+
                             |         |-----------------|  external  |  +----+
      +-----------+ VID GPIO |         | I2C bus         |     TV     |--| TV |
      |  voltage  |----------|         |-----------------|  encoder   |  +----+
      | regulator |          |   GPU   |                 +------------+
      +-----------+          |         |
            | I2C bus        |         |
            +----------------|         | videolink in+out +-----+
            |                |         |------------------| SLI |
     +--------------+        |         | GPIOs            +-----+
     |   thermal    | ALERT  |         |
     |  monitoring  |--------|         | ITU-R-656 +------------+
     | +fan control | GPIO   |         |-----------|            |  +-------+
     +--------------+        |         | I2C bus   | TV decoder |--| TV in |
            |                |         |-----------|            |  +-------+
            |                |         |           +------------+
         +-----+ FAN GPIO    |         |
         | fan |-------------|         | media port +--------------+
         +-----+             |         |------------| MPEG decoder |
                             |         |            +--------------+
      +-------+ HDMI bypass  |         |
      | SPDIF |--------------|         |     +----------------------+
      +-------+ audio input  |         |-----| configuration straps |
                             |         |     +----------------------+
                             +---------+

Note: while this schematic shows a TV output using an external encoder chip, newer cards have an internal TV encoder and can connect the output directly to the GPU. Also, external encoders are not limited to TV outputs - they're also used for TMDS, DisplayPort and LVDS outputs on some cards.

Note: in many cases, I2C buses can be shared between various devices even when not shown by the above schematic.
In summary, a card contains:

• a GPU chip [see GPU chips for a list]
• a PCI, AGP, or PCI-Express host interface
• on-board GPU memory [aka VRAM] - depending on the GPU, various memory types can be supported: VRAM, EDO, SGRAM, SDR, DDR, DDR2, GDDR3, DDR3, GDDR5.
• a parallel or SPI-connected flash ROM containing the video BIOS. The BIOS image, in addition to standard VGA BIOS code, contains information about the devices and connectors present on the card and scripts to boot up and manage devices on the card.
• configuration straps - a set of resistors used to configure various functions of the card that need to be up before the card is POSTed.
• a small I2C EEPROM with encrypted HDCP keys [optional, some G84:GT215, now discontinued in favor of storing the keys in fuses on the GPU]
• a voltage regulator [starting with NV10 [?] family] - starting with roughly the NV30 family, the target voltage can be set via GPIO pins on the GPU. The voltage regulator may also have "power good" and "emergency shutdown" signals connected to the GPU via GPIOs. In some rare cases, particularly on high-end cards, the voltage regulator may also be accessible via I2C.
• optionally [usually on high-end cards], a thermal monitoring chip accessible via I2C, to supplement or replace the built-in thermal sensor of the GPU. May or may not include autonomous fan control and fan speed measurement capability. Usually has a "thermal alert" pin connected to a GPIO.
• a fan - control and speed measurement are done either by the thermal monitoring chip, or by the GPU via GPIOs.
• SPDIF input [rare, some G84:GT215] - used for audio bypass to HDMI-capable TMDS outputs; newer GPUs include a built-in audio codec instead.
• on-chip video outputs - video output connectors connected directly to the GPU. Supported output types depend on the GPU and include VGA, TV [composite, S-Video, or component], TMDS [i.e. the protocol used in DVI digital and HDMI], FPD-Link [aka LVDS], DisplayPort.
• external output encoders - usually found with older GPUs which don't support TV, TMDS or FPD-Link outputs directly. The encoder is connected to the GPU via a parallel data bus ["videolink"] and a controlling I2C bus.
• SLI connectors [optional, newer high-end cards only] - video links used to transmit video to display from slave cards in an SLI configuration to the master. Uses the same circuitry as outputs to external output encoders.
• TV decoder
