PERMEDIA and GLINT Delta

3Dlabs - Hot Chips - Stanford August 1996

New Generation Silicon for 3D Graphics

Neil Trevett, Vice President Marketing
www.3dlabs.com
(408) 4363455

3Dlabs New Generation Silicon
[Roadmap diagram. Market segments: Games, Pervasive, Professional. Time axis: Q1 95, Q3 95, Q1 96, Q3 96. Generations: 1st Generation, 2nd Generation. Devices shown: GLINT 300SX, 3D Blaster Chip, GLINT 500TX, GLINT Delta, PERMEDIA.]
• GLINT Delta - the first single-chip geometry pipeline processor
• Pervasive 3D - making 2D chips obsolete
• Professional 3D - strong competition to workstations

Alternative Approaches... to low-cost 3D silicon
[Chart: Performance versus Cost. Region labels: Arcade 3D Games, Pervasive 3D, and a third label that is garbled in the source (possibly "Free 3D"). Vendors plotted: 3Dlabs; Matrox, ATI, S3; Rendition, Nvidia; VideoLogic, 3Dfx. Annotations: "No Compromises", "3D Performance?", "Performance? 2D and 3D", "No 2D?".]

Pervasive 3D - the Challenge
No compromises!
• 3D performance must be much greater than software only
  • Software = 5 million texture-mapped pixels per second
  • Hardware should deliver > 25 million bilinear-filtered texture-mapped pixels per second
[Diagram: Pervasive 3D surrounded by Video Acceleration, Fast Windows Acceleration, 3D for games, 3D for authoring, 3D for browsing, Fast VGA for DOS games.]

PERMEDIA Design Targets
• Low cost
  • Selling on boards costing < $200 (2 MBytes)
• Fast 3D performance
  • 30 million bilinear-filtered texture-mapped pixels/second
  • 600,000 textured polygons/second
  • Balanced performance for both textures and polygons
• No 2D compromises
  • > 30 million Winmarks
• Robust 100% pixel functionality of all key 3D APIs
  • Direct 3D, OpenGL, Heidi, QuickDraw 3D RAVE

PERMEDIA Architecture
[Block diagram: PCI Interface, Graphics Core, VGA, Bypass, Memory Interface, Video Interface, RAMDAC.]
• PCI Interface: high performance bus interface
• Graphics Core: unified core for accelerated 3D and 2D
• Bypass: for direct rendering and control registers
• VGA: backward compatibility for DOS games and boot
• Memory Interface: high bandwidth integrated memory interface
• Video Interface: drives RAMDAC

PERMEDIA Host Interface
[Block diagram: PCI Bus Rev 2.1, 32-bit Master, 32-bit Slave, Disconnect, Bi-Endian Support, DMA Controller, Interrupt Controller, 41x32 FIFO, Memory Bypass, Delta Interface.]
Callouts on the diagram:
• Glueless PCI interface
• Avoids byte-swapping
• Avoids polling the FIFO
• Provides setup-fetch overlap
• Provides fetch-draw overlap
• Allows software drawing
• Provides upgrade path

PERMEDIA Memory Interface
• SGRAM for next generation graphics
  • Good random access speed - vital for texture mapping
  • Block fills - very important for clearing buffers
  • Write-per-bit mask, needed for per-window double buffering
  • Upgrade path to 100 MHz and beyond
  • Up to 4 pages open at any time
  • E.g. front color buffer, back, depth and texture buffers
• 2 to 8 MBytes
[Diagram: the core units (Texture, RGBA, Stencil, Z-Buffer) and the Bypass connect through the Memory Interface Unit over a 64-bit bus to banks of 256x32 SGRAMs: Bank 0 (2 MBytes), Bank 1 (4 MBytes), Bank 2 (6 MBytes), etc.]

Consolidated Memory
All buffers in same physical memory
• Efficient and flexible
  • Dynamically allocate color and depth buffers
  • Any spare memory available for textures
  • Trade resolution for depth buffer, color, texture space etc.
• All data in same memory, scope for optimization, e.g.
  • Clear depth buffer with framebuffer block fills
• Use texture operations on any image
  • Full scene anti-aliasing
  • Video texture-mapping
  • 3D sprite processing
[Diagram: memory layout with Depth and stencil, Front buffer, Back buffer, Texture.]

PERMEDIA Pixel Core
• Hyper-pipelined function units
• Message passing protocol between units
[Pipeline diagram. Units shown (order not recoverable from the source): Rasterizer, Scissor/Stipple, Localbuffer Read, Stencil/Depth, Localbuffer Write, Texture Address, Texture Read, YUV, Framebuffer Read, Color DDA, Texture/Fog, Blend, Dither, LogicOp, Framebuffer Write, Host Out.]

Pipeline Principles
• Each unit in the pipeline is independent
  • Can be designed, tested and synthesized separately
• Unit State Machine
  • Wait for message in input FIFO
  • If message is not relevant, pass to next unit
  • Else process message and pass on any messages as required
  • Return to waiting
• Some units know their place
  • Some units are completely self contained
    • E.g. scissor/stipple unit
  • Some units know where they are in the pipeline
    • E.g. YUV can absorb localbuffer data if the chroma test fails

Unit Pipeline Stage
• The pipeline uses a message passing paradigm
• A message is made up of a tag field and data
  • Tag = 9 bits
  • Data = 32 bits
• The tag identifies the message type
[Diagram: Input Stage with a two-stage FIFO carrying Data and Tag, feeding a Core Unit with Control, Processing, Register Storage, and pipelining as required.]
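
The unit state machine and the tag-plus-data message format described above can be sketched in software. This is a minimal illustration, not 3Dlabs' hardware design: only the 9-bit tag, the 32-bit data field and the wait / pass / process loop come from the slides. The tag values, the `OffsetUnit` and `Sink` classes and the unbounded Python queue standing in for the two-stage FIFO are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass

TAG_BITS = 9    # tag width stated on the slide
DATA_BITS = 32  # data width stated on the slide

# Hypothetical tag values, chosen only for this illustration.
TAG_SET_OFFSET = 0x001  # programs a control register in OffsetUnit
TAG_VALUE = 0x002       # carries a value for OffsetUnit to transform
TAG_SYNC = 0x003        # not relevant to OffsetUnit, so it passes through


@dataclass(frozen=True)
class Message:
    tag: int
    data: int

    def __post_init__(self):
        if not 0 <= self.tag < (1 << TAG_BITS):
            raise ValueError("tag does not fit in 9 bits")
        if not 0 <= self.data < (1 << DATA_BITS):
            raise ValueError("data does not fit in 32 bits")


class Unit:
    """One pipeline unit: an input FIFO plus the wait/pass/process loop."""

    def __init__(self, relevant_tags):
        self.relevant_tags = frozenset(relevant_tags)
        self.fifo = deque()    # unbounded here; the slide shows a two-stage FIFO
        self.next_unit = None

    def send(self, msg):
        if self.next_unit is not None:
            self.next_unit.fifo.append(msg)

    def step(self):
        """Handle at most one message; return False if the FIFO was empty."""
        if not self.fifo:
            return False           # still waiting for a message
        msg = self.fifo.popleft()
        if msg.tag not in self.relevant_tags:
            self.send(msg)         # not relevant: pass to next unit
        else:
            self.process(msg)      # relevant: process, forward as required
        return True

    def process(self, msg):
        raise NotImplementedError


class OffsetUnit(Unit):
    """Hypothetical unit: adds a programmable offset to TAG_VALUE data."""

    def __init__(self):
        super().__init__({TAG_SET_OFFSET, TAG_VALUE})
        self.offset = 0

    def process(self, msg):
        if msg.tag == TAG_SET_OFFSET:
            self.offset = msg.data     # absorbed: nothing is forwarded
        else:
            mask = (1 << DATA_BITS) - 1
            self.send(Message(TAG_VALUE, (msg.data + self.offset) & mask))


class Sink(Unit):
    """Hypothetical last unit: records every message that reaches it."""

    def __init__(self):
        super().__init__(range(1 << TAG_BITS))
        self.received = []

    def process(self, msg):
        self.received.append(msg)


def run_pipeline(units, messages):
    """Chain the units, feed the first one, and step until all are idle."""
    for upstream, downstream in zip(units, units[1:]):
        upstream.next_unit = downstream
    units[0].fifo.extend(messages)
    while any([unit.step() for unit in units]):
        pass
    return units[-1]
```

Feeding `[Message(TAG_SET_OFFSET, 10), Message(TAG_VALUE, 5), Message(TAG_SYNC, 0)]` through `run_pipeline([OffsetUnit(), Sink()], ...)` shows the three behaviours. The register write is absorbed, the value is transformed and forwarded, and the sync message passes through untouched because `OffsetUnit` does not recognise its tag.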

[Unit Pipeline Stage diagram, output side: Output Stage with a two-stage FIFO carrying Data and Tag.]

Message Passing
• Everything that moves through the pipeline is a message
  • Messages are used to program control registers - e.g. enable texture
  • Messages are used to carry transient information - e.g. texture color for current pixel
  • Messages are used as commands - e.g. start new primitive
  • Messages are used for synchronization - e.g. Sync message
• 'Step' messages drive the units
  • A Step message for each pixel to be plotted
  • Passive steps are pixels not to be plotted
  • If a pixel fails a test it is converted from active to passive
  • Passive steps cannot be deleted because they advance DDA units
  • Step messages hold the pixel X,Y coordinate in the data field

Unified 2D/3D Pixel Engine
• 3D is a superset of 2D
  • Don't separate them
• 2D operations use 3D pipeline and special features
  • Texture units used for tiled blits
  • Chroma key test used for transparent blits
  • Bilinear filter used for stretch blits
• Using the 3D units is gate efficient
  • No duplication of functions
  • No compromise on performance

PERMEDIA Performance
• 30 Mpixels/sec, 600K polygons/sec, textured, bilinear, no Z
  • With full per-pixel perspective correction, 16-bit framebuffer, 4-bit palettized textures, 50 displayed pixels per polygon, meshed, 640x480 at 75 Hz
• 640x480 full screen bilinear textured, x2.5 depth complexity = 40 Hz frame rate
• 2 GBytes/sec fill rate using SGRAM block fill
  • 2 GBytes/sec color expansion
• > 30 million Winmarks
• Video playback performance - 30 fps
  • 320x200 YUV source zoomed and filtered to 640x480x16-bit RGB

PERMEDIA Physical Characteristics
• Packaging
  • 256 pin BGA
  • Wire-bonded into a plastic BGA package
• Process
  • 0.35 µ, 4 layer metal
  • 3W at 3.3V
  • 60 MHz
• Shipping now

Board Design
Low component count
• Single PERMEDIA chip
  • plus SGRAM, RAMDAC, ROM
• External interfaces
  • Glueless PCI interface
  • High performance 64-bit SGRAM interface
  • High speed pixel port to RAMDAC
[Board diagram: PERMEDIA, RAMDAC, OSC, ROM and four 256x32 SGRAMs. Typically 2 MBytes, with optional upgrade to 4 MBytes.]

But where's the bottleneck? Geometry!
• The fastest Pentium Pro cannot keep PERMEDIA saturated if running the geometry in software
[Diagram: pipeline stages 3D API, Lighting, Transforms, Delta Calcs, Rasterization.]
Callouts on the diagram:
• 1K polygons/MHz on a Pentium class machine (90K polygons on a P5/90)
• 70% of the CPU cycles spent in setup!
• Rasterization: 100% of silicon in PERMEDIA

GLINT Delta: Breaking the Geometry Bottleneck
• Hardwired 3D pipeline processing
  • 1M vertex/sec vertex setup processor
  • Performs all delta calculations and floating point conversions
  • 100 MFlop floating point processor
• Reduces PCI bandwidth - just passing vertices, no slopes
[Diagram: without Delta, the CPU runs 3D API, Lighting, Transforms and Delta Calcs and sends 110 bytes/polygon over PCI to PERMEDIA. With GLINT Delta, the CPU runs 3D API, Lighting and Transforms and sends 33 bytes/polygon over PCI to GLINT Delta and PERMEDIA.]
• Triples CPU geometry performance

GLINT Delta: Setup Processing in a PCI Bridge
[Block diagram: Primary PCI Bus; PCI 0 Master & Slave Interface; DMA Controller; Function 0 Decode with VGA/8514; Function 1 Decode; Input FIFO; Delta Setup Engine; Bypass Path; Output FIFO; PCI 1 Master Interface and DMA Control; Secondary PCI Bus to GLINT or PERMEDIA.]
Callouts on the diagram:
• Full bus master provides setup-fetch overlap
• Allows transparent use of VGA and 8514 behind bridge
• Slope and setup calculations for GLINT and PERMEDIA
• DMA to GLINT or PERMEDIA
• 176 pin PQFP, 3.3V power, 5V I/O

GLINT Delta: Setup Engine Functionality
• Follows the message passing architecture of PERMEDIA
  • Delta is just another unit in front of the rasterizer
• API neutral - low-level functionality
• Triangle primitive setup (AA and non-AA)
• Line primitive setup (AA and non-AA)
• Interpolation parameters - XYZ, RGBA, F, STQ, Ks, Kd
• Accepts floating point (IEEE SP) or fixed point inputs
• Texture coordinate auto normalization
• Optional input value clamping
• High precision sub-pixel correction
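
To make "delta calculations" concrete, the sketch below computes per-parameter gradients for one triangle with the textbook plane-equation method, then converts them to fixed point with clamping. It is a generic illustration under stated assumptions, not GLINT Delta's actual algorithm: the `Vertex` type, the function names and the 16.16 fixed-point format are hypothetical, and the code makes no attempt to match the chip's operation counts or its custom floating point format.

```python
from dataclasses import dataclass

FRAC_BITS = 16              # hypothetical 16.16 fixed-point output format
FIXED_MIN = -(1 << 31)      # clamp limits for a signed 32-bit result
FIXED_MAX = (1 << 31) - 1


@dataclass(frozen=True)
class Vertex:
    x: float
    y: float
    params: tuple   # values to interpolate, e.g. (R, G, B, A, Z)


def to_fixed(value):
    """Convert a float to 16.16 fixed point, clamping to the signed 32-bit range."""
    scaled = round(value * (1 << FRAC_BITS))
    return max(FIXED_MIN, min(FIXED_MAX, scaled))


def triangle_deltas(v0, v1, v2):
    """Return [(dP/dx, dP/dy), ...] in fixed point, or None for a degenerate triangle.

    Each parameter P is treated as a plane P = a*x + b*y + c over the
    triangle; a and b are the per-pixel deltas a rasterizer would step by.
    """
    dx1, dy1 = v1.x - v0.x, v1.y - v0.y
    dx2, dy2 = v2.x - v0.x, v2.y - v0.y
    area2 = dx1 * dy2 - dx2 * dy1    # twice the signed triangle area
    if area2 == 0.0:
        return None                  # zero area: no gradients to compute
    inv_area2 = 1.0 / area2          # one divide shared by every parameter
    deltas = []
    for p0, p1, p2 in zip(v0.params, v1.params, v2.params):
        dp1, dp2 = p1 - p0, p2 - p0
        dpdx = (dp1 * dy2 - dp2 * dy1) * inv_area2
        dpdy = (dp2 * dx1 - dp1 * dx2) * inv_area2
        deltas.append((to_fixed(dpdx), to_fixed(dpdy)))
    return deltas
```

With this split, the host passes only the three vertices and the setup step derives the slopes, which is the division of labour the slides describe. For a parameter that follows P = 2x + 3y + 1 over the triangle (0,0), (4,0), (0,4), `triangle_deltas` returns the fixed-point encodings of 2.0 and 3.0.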

GLINT Delta Setup Engine
Hardwired processing
[Block diagram: Input; Vertex Store 0, 1 and 2 (vertex information, 16x32); Data Routing (VHDL coded, inferred routing); floating point operation units FMUL, FADD, FDIV, FDIV and FConvert; Working Store (temp storage, 8x32); Output FIFO.]

GLINT Delta Calculations
Floating point improves robustness and visual quality
• Input parameter score-boarding
• All internal calculations in custom floating point format
  • Less dynamic range, but more precision than IEEE
• RGBAZ triangle set-up involves:
  • 41 floating point add or subtract
  • 27 floating point multiplies
  • 5 floating point divides
  • plus compares, clamping, fixed point/floating point conversions
• Main floating point operators are:
  • One multiplier (one pipeline stage, single cycle)
  • One adder/subtracter (single cycle)
  • Two dividers (5 cycle iterative, autonomous)
  • Four comparators
  • Float to fixed point conversion with clamping

Hard-Wired Processing
Cost-effective floating point performance
• No software maintenance
• 35 cents/MFlop
  • No RAM or ROM for program storage (fewer gates)
  • No program sequencer or instruction set (fewer gates)
  • No program fetch (less memory bandwidth)
  • No general purpose routing costs
• Control is a VHDL state machine
• Data paths are inferred directly from VHDL

GLINT Delta Physical Characteristics
• Low cost device - 176 pin PQFP
• 0.45 µ, 40 MHz, 3 layer metal
• Shipping now
• Performance
  • 1M meshed, shaded, Z-buffered triangles/sec
  • 2M 2D polylines/sec

Delta and PERMEDIA Combined Board Design
• Matched geometry and rasterization performance
• High performance arcade machines, VR/simulation engines, entry-level desktop OpenGL acceleration
• Sub $350 street price
[Board diagram: GLINT Delta, PERMEDIA, RAMDAC, OSC, ROM and eight 256x32 SGRAMs.]

GLINT Delta Measured Performance Increases
Tspeed3 V3.0 OpenGL, meshed triangles per second

Test                               No Delta   With Delta   X faster with Delta
No Z, flat, single pixel            271,214      586,527   2.16
No Z, shaded, single pixel          200,079      646,607   3.23
No Z, flat, small                   271,068      586,527   2.16
No Z, shaded, small                 200,159      646,607   3.23
No Z, flat, 25 pixel                272,531      585,847   2.15
No Z, shaded, 25 pixel              199,290      514,781   2.58
No Z, flat, 50 pixel                223,155      365,412   1.64
No Z, shaded, 50 pixel              182,048      277,016   1.52
Z, flat, single pixel               249,629      586,527   2.35
Z, shaded, single pixel             187,454      600,476   3.20
Z, flat, small                      249,629      586,527   2.35
Z, shaded, small                    187,454      599,762   3.20
Z, flat, 25 pixel                   232,398      573,212   2.47
Z, shaded, 25 pixel                 180,744      427,242   2.36
Z, flat, 50 pixel                   205,870      321,247   1.56
Z, shaded, 50 pixel                 155,146      238,997   1.54

Future Directions
• More geometry pipeline in hardwired logic
  • CPUs just aren't fast enough
  • Hardwired logic is more cost-effective
• Unified memory
  • Using system memory for texture
  • Intel's AGP - Accelerated Graphics Port
• 3D graphics on the motherboard
  • High integration - RAMDACs and geometry included on-chip
• Aggressive performance increases
  • Next generation silicon - single chip million polygon devices
• Major silicon vendors entering graphics chips market