US008310487B2

(12) United States Patent                (10) Patent No.: US 8,310,487 B2
     Howson                              (45) Date of Patent: Nov. 13, 2012

(54) MULTI-CORE GEOMETRY PROCESSING IN A TILE BASED RENDERING SYSTEM

(75) Inventor: John W. Howson, St. Albans (GB)

(73) Assignee: Imagination Technologies Limited, Kings Langley, Herts (GB)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 629 days.

(21) Appl. No.: 12/315,263

(22) Filed: Dec. 1, 2008

(65) Prior Publication Data
     US 2009/0174706 A1    Jul. 9, 2009

(51) Int. Cl.
     G06F 15/16 (2006.01)

(52) U.S. Cl. ...... 345/502; 345/503; 345/505; 345/506; 345/419; 382/303; 382/304; 712/28

(58) Field of Classification Search ...... 345/502, 505, 506; 382/303, 304; 712/28
     See application file for complete search history.

OTHER PUBLICATIONS

Search Report from United Kingdom Patent Office dated Mar. 27, 2009.
International Search Report dated May 8, 2009 (4 pages).
Written Opinion of International Search Authority (6 pages).
Montrym, J S et al. "InfiniteReality: A Real-Time Graphics System", Proceedings, Annual Conference Series, 1997, pp. 293-302.
Igehy, H et al. "The Design of a Parallel Graphics Interface", Computer Graphics. Conference Proceedings, Orlando, FL Jul. 19-24, 1998, pp. 141-150.
Eldridge, Matthew "Designing Graphics Architectures Around Scalability and Communication" Jun. 1, 2001, retrieved from Internet: URL: http://graphics.stanford.edu/papers/eldridge_thesis/eldridge_phd.pdf.
Holten-Lind, Hans "Design for Scalability in Architectures" Jun. 1, 2001, retrieved from Internet: URL: http://www2.imm.dtu.dk/pubdb/views/edoc_download.php/888/pdf/imm888.pdf.
Coppen, Derek et al. "A Distributed Frame Buffer for Rapid Dynamic Changes to 3D Scenes" Computers and Graphics, Mar. 1, 1995, pp. 247-250.

* cited by examiner

Primary Examiner - Xiao M. Wu
Assistant Examiner - Todd Bumam
(74) Attorney, Agent, or Firm - Flynn, Thiel, Boutell & Tanis, P.C.

(56) References Cited

U.S. PATENT DOCUMENTS

5,485,559 A *       1/1996    Sakaibara et al. ...... 345/505
5,745,125 A *       4/1998    Deering et al. ........ 345/503
5,821,950 A *      10/1998    Rentschler et al. ..... 345/505
6,344,852 B1 *      2/2002    Zhu et al. ............ 345/418
6,798,410 B1 *      9/2004    Redshaw et al. ........ 345/427
7,002,586 B2        2/2006    Chiu et al. ........... 345/505
7,898,545 B1 *      3/2011    Alben et al. .......... 345/519
2007/0146378 A1 *   6/2007    Sorgard et al. ........ 345/581

FOREIGN PATENT DOCUMENTS

GB 2 430 513 A      3/2007

(57) ABSTRACT

A method and an apparatus are provided for combining multiple independent tile based graphic cores. An incoming geometry stream is split into a plurality of streams and sent to respective tile based graphics processing cores. Each one generates separate tiled geometry lists. These may be combined into a master tiling unit or, alternatively, markers may be inserted into the tiled geometry lists which are used in the rasterization phase to switch between tiling lists from different geometry processing cores.

26 Claims, 7 Drawing Sheets

U.S. Patent    Nov. 13, 2012    Sheet 1 of 7    US 8,310,487 B2

FIGURE 1 - EXAMPLE TILE BASED RENDERING SYSTEM
(Block diagram: primitive/command fetch 101; geometry processing 102; tiling 103; tiled screen space geometry lists 104; tiled parameter fetch 105; hidden surface removal 106; texturing and shading 108; alpha test/fogging/alpha blending 110; pixel processing 114; rendered scene buffer 116.)

FIGURE 2 - EXAMPLE OF TILE OVERLAP AND TRIANGLE ORDERING WITHIN TILES
(Tiles 240, 250, 260 and 270.)

FIG. 3 - SPLITTING OF CONTROL STREAM ACROSS MULTIPLE CORES
(Block diagram: stream splitter 310; geometry processing cores GPC0 340 and GPC1 350; tiled geometry lists 0 and 1, 360 and 370; reference lists 0 and 1, 365 and 375; master tiling unit 380; master tile list 390.)

FIG. 4 - TILE REFERENCE LIST DATA STRUCTURE
(GPC0 TRL 400 with tiled geometry list 410; GPC1 TRL 420 with tiled geometry list 430.)
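As an illustration of the tile reference list (TRL) structure of FIG. 4, the following Python sketch models two geometry processing cores that each build their own tiled geometry lists and TRL, and a master tiling unit that links the per core lists in submission order. It is a simplified model under stated assumptions, not the hardware of the embodiments: list offsets stand in for pointers, the names `run_core`, `master_tiling_unit` and `walk` are hypothetical, triangles are distributed one at a time in round-robin order, and the tile assigned to T4 is assumed because the example table in the description does not make it clear.

```python
from collections import defaultdict

# Tiles overlapped by each triangle. T1 to T3 follow the FIG. 2 example;
# the tile chosen for T4 is an assumption made for this sketch.
COVERAGE = {"T1": [0], "T2": [0, 1], "T3": [0, 1, 2, 3], "T4": [2]}
SUBMITTED = ["T1", "T2", "T3", "T4"]
NUM_CORES = 2

def run_core(triangles):
    """One GPC: build its tiled geometry lists and its tile reference list."""
    tiled_lists = defaultdict(list)  # tile -> entries, "End" after each triangle
    trl = []                         # one group of (tile, offset) refs per triangle
    for tri in triangles:
        refs = []
        for tile in COVERAGE[tri]:
            refs.append((tile, len(tiled_lists[tile])))  # start of this triangle
            tiled_lists[tile] += [tri, "End"]
        trl.append(refs)
    return tiled_lists, trl

# Stream splitter: simple round-robin, one triangle at a time.
per_core_input = [SUBMITTED[c::NUM_CORES] for c in range(NUM_CORES)]
cores = [run_core(tris) for tris in per_core_input]

def master_tiling_unit(cores, num_triangles):
    """Read the TRLs in distribution order and link the per core lists."""
    master = defaultdict(list)       # tile -> [(core, offset), ...]
    cursor = [0] * len(cores)        # next unread TRL group for each core
    for n in range(num_triangles):
        core = n % len(cores)        # same order the splitter used
        _, trl = cores[core]
        for tile, offset in trl[cursor[core]]:
            master[tile].append((core, offset))
        cursor[core] += 1
    for tile in master:
        master[tile].append("Terminate")
    return master

def walk(master, cores, tile):
    """Follow the master list for one tile and return triangles in order."""
    out = []
    for entry in master[tile]:
        if entry == "Terminate":
            break
        core, offset = entry
        tiled_lists, _ = cores[core]
        while tiled_lists[tile][offset] != "End":  # "End" returns to the master list
            out.append(tiled_lists[tile][offset])
            offset += 1
    return out

master = master_tiling_unit(cores, len(SUBMITTED))
for tile in range(4):
    print(tile, walk(master, cores, tile))
```

Under these assumptions every tile is walked in the original submission order even though neither core ever sees the other core's triangles, which is the property the master tile list is intended to preserve.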

FIG. 5 - HIERARCHICAL TILE LIST STRUCTURE
(Region headers 500; top level tile lists 510.)

FIG. 6 - BLOCK BASED PRIMITIVE SPLITTING
(Incoming stream of State 1, Prim 1 (4 triangles), Prim 2 (50 triangles), Prim 3 (2000 triangles), State 2 and Prim 4 (1500 triangles); stream splitter; blocks of triangles sent to geometry processing cores GPC0 and GPC1.)
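The block based splitting of FIG. 6 can be sketched in a few lines of Python. This is a minimal illustration rather than the stream splitter of the embodiments: the names `split_into_blocks` and `assign_round_robin` are hypothetical, triangle indices are counted from zero within each primitive, and a fixed block size of 1000 triangles with round-robin assignment of blocks to two cores is assumed to match the example in the description.

```python
from typing import List, Tuple

# (primitive id, first triangle, last triangle), inclusive ranges
Span = Tuple[int, int, int]

def split_into_blocks(tri_counts: List[int], block_size: int) -> List[List[Span]]:
    """Cut a stream of primitives into blocks of at most block_size triangles."""
    blocks: List[List[Span]] = []
    current: List[Span] = []
    room = block_size                      # triangles still free in this block
    for prim_id, count in enumerate(tri_counts, start=1):
        start = 0
        while start < count:
            take = min(room, count - start)
            current.append((prim_id, start, start + take - 1))
            start += take
            room -= take
            if room == 0:                  # block full: close it, open a new one
                blocks.append(current)
                current, room = [], block_size
    if current:                            # final, partially filled block
        blocks.append(current)
    return blocks

def assign_round_robin(blocks: List[List[Span]], num_cores: int) -> List[List[List[Span]]]:
    """Hand block i to core i modulo num_cores."""
    per_core: List[List[List[Span]]] = [[] for _ in range(num_cores)]
    for i, block in enumerate(blocks):
        per_core[i % num_cores].append(block)
    return per_core

blocks = split_into_blocks([4, 50, 2000, 1500], block_size=1000)
per_core = assign_round_robin(blocks, num_cores=2)
for core, core_blocks in enumerate(per_core):
    print(f"GPC{core}: {core_blocks}")
```

With these inputs the first block holds Prim 1, Prim 2 and triangles 0 to 945 of Prim 3, so Prim 3 and Prim 4 are each shared between the two cores, as in the FIG. 6 example.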

FIG. 7 - EXAMPLE SYSTEM
(Block diagram: primitive/command fetch 700; stream splitter 705; for GPC0 and GPC1 respectively, FIFO 712 and 714, primitive fetch 715 and 716, geometry processing 720 and 721, tiling 725 and 726, tiled screen space geometry lists 730 and 731, tile reference FIFO 740 and 741; master tiling unit 750; master region list 760.)

FIG. 8 - ALTERNATIVE EXAMPLE SYSTEM
(Block diagram as FIG. 7 with reference numerals 800 to 860, plus a service order FIFO 870.)
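The role of the service order FIFO of FIG. 8 can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the names `distribute` and `merge_in_order` are hypothetical, the load of a core is modelled as the sum of hypothetical per group costs already sent to it (nothing is ever drained), and each core's TRL FIFO is assumed to deliver one entry per group in the order the groups were received.

```python
from collections import deque

def distribute(groups, costs, num_cores=2):
    """Send each group to the least loaded core and record the order used."""
    load = [0] * num_cores
    core_inputs = [deque() for _ in range(num_cores)]
    service_order = deque()               # read later by the master tiling unit
    for group, cost in zip(groups, costs):
        core = load.index(min(load))      # least loaded core, lowest index on ties
        core_inputs[core].append(group)
        service_order.append(core)
        load[core] += cost
    return core_inputs, service_order

def merge_in_order(trl_fifos, service_order):
    """Pull one entry from the TRL FIFO named by each service order entry."""
    return [trl_fifos[core].popleft() for core in service_order]

groups = ["A", "B", "C", "D", "E"]
costs = [5, 1, 1, 1, 4]                   # hypothetical processing cost per group
core_inputs, service_order = distribute(groups, costs)

# For the sketch, each core's TRL FIFO simply echoes its input groups in order.
trl_fifos = [deque(q) for q in core_inputs]
merged = merge_in_order(trl_fifos, service_order)
print(list(service_order), merged)
```

Because the distribution is no longer a fixed alternation, the recorded sequence of core numbers is what allows the consumer to recover the original order of the groups.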

FIG. 9 - SPLITTING OF GEOMETRY PROCESSING ACROSS MULTIPLE CORES USING "PIPELINE INTERLEAVE MARKERS"
(Block diagram: stream splitter 910; geometry processing cores GPC0 940 and GPC1 950; tiled geometry lists 0 and 1, 960 and 970.)

FIG. 10 - TILE LIST DATA STRUCTURE USING "PIPELINE INTERLEAVE MARKERS"
(Region headers GPC0 1000 and tile lists GPC0 1010; region headers GPC1 1020 and tile lists GPC1 1030.)

FIG. 11 - PROCESSING OF CONTROL LISTS AND "PIPELINE INTERLEAVE MARKERS"
(Flow chart)
1100  Fetch list pointers from region header for each core into core list pointer array
1105  Set core list select to '0'
1110  Is the core list select entry in the core pointer array zero? If yes, go to 1115; if no, go to 1120
1115  Increment core list select and return to 1110
1120  Fetch next tiled geometry list entry
1125  Is the entry a PIM? If yes, go to 1130; if no, go to 1140
1130  Update list pointer in core pointer array
1135  Set core list select to PIM value and return to 1110
1140  Is the entry End? If yes, the tile is complete; if no, go to 1145
1145  Process tile geometry list entry
1150  Update list pointer in core pointer array and return to 1120
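The flow chart of FIG. 11 can be followed step by step in the Python sketch below, which walks the per core tile lists of one tile by following Pipe Interleave Markers. It is an illustrative model, not the rasterization hardware: the flat `memory` list with address 0 reserved, the tuple encoding of a PIM, the wrap-around of the core select at step 1115 and the early return when every list is empty are all assumptions of this sketch. A zero pointer here models only an empty list, and the example data assumes that each block ends with a PIM naming the core that received the next block and that the list holding the final block ends with an End marker.

```python
END = "End"

def pim(next_core):
    """A Pipe Interleave Marker naming the core list to switch to."""
    return ("PIM", next_core)

def traverse_tile(region_headers, memory):
    """Return the entries of one tile in submission order.

    region_headers[c] is the start address of core c's list for this tile,
    or 0 if that core produced nothing for the tile.
    """
    core_ptr = list(region_headers)                 # 1100: core list pointer array
    select = 0                                      # 1105: start with GPC0
    rendered = []
    if not any(core_ptr):                           # guard added for this sketch
        return rendered
    while True:
        while core_ptr[select] == 0:                # 1110: empty list?
            select = (select + 1) % len(core_ptr)   # 1115: try the next core
        entry = memory[core_ptr[select]]            # 1120: fetch next entry
        if isinstance(entry, tuple) and entry[0] == "PIM":   # 1125
            core_ptr[select] += 1                   # 1130: step past the marker
            select = entry[1]                       # 1135: switch core list
        elif entry == END:                          # 1140: tile complete
            return rendered
        else:
            rendered.append(entry)                  # 1145: process the entry
            core_ptr[select] += 1                   # 1150: advance the pointer

# Two cores, blocks alternating GPC0, GPC1, GPC0, GPC1 (hypothetical contents).
memory = [None,
          "T1", "T2", pim(1), "T5", pim(1),         # GPC0 list, starts at address 1
          "T3", "T4", pim(0), "T6", END]            # GPC1 list, starts at address 6
print(traverse_tile([1, 6], memory))

# GPC0 has nothing in this tile, so its region header pointer is zero.
print(traverse_tile([0, 1], [None, "A", pim(0), "B", END]))
```

After step 1150 the sketch passes through the zero test again before fetching, whereas the flow chart returns directly to 1120; the extra test is harmless because the selected pointer is non-zero at that point.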

FIG. 12 - EXAMPLE GEOMETRY PROCESSING SYSTEM USING PIPELINE INTERLEAVE MARKERS
(Block diagram: primitive/command fetch 1200; stream splitter 1205; for GPC0 and GPC1 respectively, FIFO 1212 and 1214, primitive fetch 1215 and 1216, geometry processing 1220 and 1221, tiling 1225 and 1226, tiled screen space geometry lists 1230 and 1231.)

(FIG. 13 block diagram: GPC0 and GPC1 tiled screen space geometry lists 1300; region header fetch 1310; core list pointer array 1320 holding list PTR 0 and list PTR 1; tiled geometry list fetch 1330; hidden surface removal 1340.)

FIG. 13 - EXAMPLE OF THE FRONT END OF A TILE BASED RASTERISATION SYSTEM CAPABLE OF PROCESSING TILE LISTS GENERATED BY A PIM BASED GEOMETRY SYSTEM

MULTI-CORE GEOMETRY PROCESSING IN A TILE BASED RENDERING SYSTEM

This invention relates to a three-dimensional computer graphics rendering system and in particular to a method and an apparatus associated with combining multiple independent tile based graphics cores for the purpose of increasing geometry processing performance.

BACKGROUND OF THE INVENTION

It is desirable to offer computer graphics processing cores at many different performance points, e.g. from basic handheld applications through to sophisticated dedicated graphic computers. However, the complexity of modern computer graphics makes it difficult to do this in either a timely or cost effective manner. As such, it is desirable to have a method of combining multiple independent cores such that performance may be increased without developing a whole new core.

Tile based rendering systems are well-known. These subdivide an image into a plurality of rectangular blocks or tiles. FIG. 1 illustrates an example of a tile based rendering system. A primitive/command fetch unit 101 retrieves command and primitive data from memory and passes the command and the primitive data to a geometry processing unit 102. The geometry processing unit 102 transforms the primitive and command data into screen space using well-known methods. This data is then supplied to a tiling unit 103 which inserts object data from the screen space geometry into object lists for each of a set of defined rectangular regions or tiles. An object list for each tile contains primitives that exist wholly or partially in that tile. The list exists for every tile on the screen, although some object lists may have no data in them. These object lists are fetched by a tile parameter fetch unit 105 which supplies the object lists tile by tile to a hidden surface removal unit (HSR) 106. The hidden surface removal unit (HSR) 106 removes surfaces which will not contribute to the final scene (usually because they are obscured by another surface). The HSR unit 106 processes each primitive in the tile and passes only data for visible pixels to a texturing and shading unit (TSU) 108. The TSU takes the data from the HSR and uses the data to fetch textures and apply shading to each pixel within a visible object using well-known techniques. The TSU then supplies the textured and shaded data to an alpha test/fogging/alpha blending unit 110. The alpha test/fogging/alpha blending unit 110 can apply degrees of transparency/opacity to the surfaces again using well-known techniques. Alpha blending is performed using an on chip tile buffer 112 thereby eliminating the requirement to access external memory for this operation. Once each tile has been completed, the pixel processing unit 114 performs any necessary backend processing such as packing and anti-alias filtering before writing the result data to a rendered scene buffer 116, ready for display.

In British Patent No. GB2343598 there is described a process of scaling rasterization performance within a tile based rendering environment by separating geometry processing and tiling operations into a separate processor that supplies multiple rasterization cores. This method does not take into account the issues of scaling geometry processing and in particular tiling throughput across multiple parallel tile based cores.

It is commonly known that 3D hardware devices must preferably preserve the ordering of primitives with respect to the order in which they were submitted by a supplying application. For example FIG. 2 illustrates 4 triangles T1 (200), T2 (210), T3 (220) and T4 (230) that are presented by the application in the order T1, T2, T3, T4 and overlap the four tiles Tile 0 (240), Tile 1 (250), Tile 2 (260) and Tile 3 (270) as shown. In order to preserve the original order of the triangles in the tile lists the triangles would be referenced in each tile list as follows.

Tile 0    Tile 1    Tile 2    Tile 3
T1        T2        T3        T3
T2        T3        T4
T3

In order to evenly distribute load across geometry and tiling processors, the input data needs to be split across the processors either on a round-robin basis or based on the load on individual processors. However, as each processor is generating object tile lists locally, the preservation of the order in which objects are inserted into tiles requires that the order in which the processors write to the per tile object lists be controlled. This control would normally require communication between each of the GPCs (Graphics Processing Cores) present, meaning that their design would need to be changed when scaling the number of cores present.

SUMMARY OF THE INVENTION

Preferred embodiments of the present invention provide a method and an apparatus that allow a tile based rendering system to scale geometry processing and tiling performance in a linear fashion. This is accomplished by the use of a hierarchical list structure that allows chunks of incoming geometry to be processed and tiled locally within a core and for resulting region lists from each core to be linked efficiently together in an order that corresponds to the original input geometry order. Further, the mechanism employed allows multiple cores to be used in parallel with little or no required modification to each of those cores.

Preferably, embodiments of the invention provide a method and an apparatus for combining multiple independent tile based graphic cores in which an incoming geometry stream is split into a plurality of geometry streams, one per tile based graphics processing core. Each core generates separate tiled geometry lists for each triangle it processes. These lists are then combined using either a master tiling unit, which takes data from the geometry processing cores to generate a master tile list for each tile preserving the input geometry order, or markers within the tiled geometry lists, which cause the rasterization core to switch between lists during rasterization processing.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention will now be described in detail by way of examples with reference to the accompanying drawings in which:

FIG. 1 illustrates a prior art tile based rendering system as discussed above;
FIG. 2 illustrates an example set of four triangles overlapping four tiles as described above;
FIG. 3 illustrates the splitting of a control stream across multiple cores;
FIG. 4 shows the data structure proposed for the tile reference lists in an embodiment of the invention;
FIG. 5 shows a proposed hierarchical tile list data structure in an embodiment of the invention;
FIG. 6 illustrates the splitting of a control stream across multiple cores at a coarser granularity in an embodiment of the invention;
FIG. 7 illustrates an example implementation of a system embodying the invention;
FIG. 8 illustrates a modification to the example system of FIG. 7;
FIG. 9 illustrates the splitting of a control stream across multiple cores using Pipe Interleave Markers;
FIG. 10 shows a proposed tile list data structure in an embodiment of the invention that uses Pipe Interleave Markers;
FIG. 11 illustrates the process used by a rasterization core to process multiple tile lists that are linked together using Pipe Interleave Markers;
FIG. 12 illustrates an example geometry processing system using Pipeline Interleave Markers; and
FIG. 13 illustrates an example of the front end of a tile based rasterization system capable of processing the tile lists generated by a PIM based geometry system.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

FIG. 3 illustrates a simplified version of the proposed system using a master tiling unit to combine the geometry lists. In order to process incoming geometry stream 300 across multiple cores, the geometry stream 300 must first be split into separate streams, one per core, by a stream splitter 310 which divides the data stream for processing across (in this example) the two geometry processing cores (GPC) 340 and 350. In this example, the incoming stream is distributed across the two cores on a simple round-robin basis. However, it is also possible to split the stream across the processing cores based on the processing load on each core. It is important to note that the stream splitter distributes control state as well as primitives to each geometry processing core. The control state contains information that instructs the hardware how to process the geometry data. For example, a matrix may be used to transform the geometry in a well known manner or details may be included regarding how texturing will be applied to the geometry. For example, the control stream for GPC0 320 contains control state 1 associated with triangles T1 through T3, and the control stream for GPC1 330 contains state 1 for T1 through T3 and state 2 for T4.

Each of the GPCs 340 and 350 generates a separate tiled geometry list 360 and 370 for each triangle that it processes. For each tile that is updated for each triangle, the GPCs insert a reference into a respective one of the tile reference lists (TRL) 365 and 375. The TRLs and the per GPC tile geometry lists form the data structures illustrated in FIG. 4, and are used by a master tiling unit 380 to produce a master tile list 390.

The TRL 400 for GPC0 contains references for triangles T1 and T3 that are processed through that core. For example, T1 is present in tile 0,0 only, and thus a reference and a pointer to the corresponding tiled geometry list 410 are included in the TRL, followed by references for T3 in all four tiles. Similarly the TRL for GPC1 420 contains references for each tile overlapped by T2 and T4 in the corresponding tiled geometry list for GPC1 430. It should be noted that the tiled triangle lists include an "End" marker after each triangle, as indicated at 430.

The master tiling unit (MTU) 380 in FIG. 3 reads the TRLs in the same round-robin order that the primitives are distributed across the GPCs (in this example), taking the tile references for one triangle from each TRL before moving to the next. The MTU 380 takes the tile references and generates a master tile list 390, and this list has the data structure illustrated in FIG. 5. With a tile based rendering system, each tile in region headers 500 points to a corresponding tile list within the top level tile lists 510. It should be noted that these lists preserve the original presentation order of the triangles and each list has a "Terminate" marker. The top level tile lists contain links to the referenced triangle lists within each tile 520 and 530 as generated by the GPCs and discussed above.

As mentioned above, each triangle in each GPC tiled list is followed by an "End" marker. These markers are used by rasterization hardware in order to instruct it to move from the GPC tile lists back to the high level tile list. The marker is used so that groups of triangles can be processed on each GPC instead of single triangles. This is important as it minimizes the amount of memory associated with the high level tile lists and allows greater decoupling of GPCs in case the vertex processing on some triangles takes more time than others.

FIG. 6 illustrates the splitting of the incoming primitive stream from an application across multiple GPCs where blocks of 1000 triangles are pushed down to each GPC. The incoming data stream 600 contains four primitives, prim 1, 2, 3 and 4. The primitives contain 4, 50, 2000 and 1500 triangles respectively. The stream splitter 610 splits the stream into four blocks for processing across the two GPCs (650 and 660) as illustrated. Blocks 620 and 630 are passed to GPC0 and blocks 640 and 650 are passed to GPC1. Prim 1 and Prim 2 are both sent to GPC0 along with first portions of Prim 3 and Prim 4. The remaining portions of Prim 3 and Prim 4 are sent to GPC1. The purpose of the split is to attempt to balance the load between the two GPCs. In blocks 620 and 630, data from Prim 3 is also split between the two blocks, both of which are processed by GPC0. This process produces similar block sizes. The TRL and top level data structures are unchanged with the exception that instead of pointing to a single triangle, the per tile references point to groups of triangles from each block within each tile.

FIG. 7 illustrates an example of implementing a system that uses two geometry processing and tiling cores. The primitive and command fetch unit 700 reads the incoming control stream and passes it to the stream splitter unit 705 which splits the stream for processing across the two (or more) cores as described above. The splitter passes pointers to the primitives to be fetched for the separate cores, specifically to the FIFOs 712 and 714 at the input to the "local" primitive fetch units 715 and 716. The FIFOs are required to help decouple the stream splitting processing from the time taken by each core in order to process each batch of the primitives. The local primitive fetch units read pointers from the FIFOs 712 and 714. The local primitive fetch units then read the actual geometry data from memory and pass it to the geometry processing units 720 and 721, which process the geometry and pass it to the local tiling units 725 and 726. The tiling units tile the processed geometry, generating local tiled lists 730 and 731, and pass TRLs for these lists into tile reference FIFOs 740 and 741 which buffer the previously described TRLs while waiting to be consumed by the master tiling unit (MTU) 750. It should be noted that these FIFOs can be contained either in external memory or on chip, allowing significant flexibility in the amount of buffering between the GPCs and the master tiling unit. The use of a FIFO/buffer allows the GPCs to be decoupled from the operation of the MTU, and this minimizes stalls in case the MTU spends a significant amount of time generating the master tile lists. The MTU uses the TRL data from the FIFOs to generate the master region lists 760, which form the data structure together with the local tile lists as described above.

Using a simple round-robin scheme means that if either of the split stream FIFOs 712 or 714 fills up because one GPC takes significantly longer than the other GPC, the stream splitter will stall. As such, in the case of significant imbalance in processing time, these FIFOs may need to be significantly larger in order to prevent any of the GPC pipelines from being idle. FIG. 8 depicts an alternative embodiment in which the splitter sends groups of primitives to each core based on how busy that core is. The processing load of each core is monitored. The operation of the system is identical to the one described above with the exception that the stream splitter 805 is fed with information from the geometry processing units 820 and 821 which indicates how busy they are, for example the fullness of input buffering. The stream splitter uses this information to direct groups of primitives to the GPC which is least heavily loaded. As the order in which primitives will be submitted to the cores is now non-deterministic, the stream splitter must generate a core reference sequence for the MTU so that it can pull the TRLs from the TRL FIFOs in the correct order. The reference sequence is written into a Service Order FIFO 870 by the stream splitter, which the MTU reads in order to determine which TRL FIFO is to be read next.

FIG. 9 illustrates a system that allows the geometry to be processed across multiple cores using "Pipe Interleave Markers" instead of a master tiling unit. Like the master tiling unit based system, the incoming geometry stream data is split by the stream splitter 910 and distributed to the GPCs 940, 950 as described above. Each GPC generates its own tiled geometry lists 960 and 970. FIG. 10 illustrates the structure of the tile geometry lists. Each GPC generates its own region headers 1000 and 1020 which point to the tiled geometry lists 1010 and 1030. Like the normal tile based rendering system, the geometry passes through each core. At the end of each geometry block, a GPC inserts a "Pipe Interleave Marker" (PIM) 1040 which is used during the rasterization process to enable traversal of the lists in the correct order by a single core.

The flow chart in FIG. 11 illustrates how the rasterization uses the PIM markers to traverse the lists. At the start of processing each tile, the contents of the region headers generated by each core are loaded into a core list pointer array at 1100. This results in each entry within the array containing a pointer to the region list generated for each core of the region being processed. Processing of the region lists starts assuming that the first block of primitive data was processed by the first GPC, i.e. GPC0, by setting an index into the array to 0 at 1105. The pointer value is then tested at 1110 to see if it is zero. If it is zero, it means the list is either empty or has already been processed to completion. The array index is incremented at 1115 and the test performed at 1110 is repeated. This process is repeated until a list that contains geometry is found, at which point data is fetched at 1120 from the tiled geometry list using the pointer indexed by the array index. At 1125, the fetched data is tested to determine if it is a PIM. If it is a PIM, then the current list pointer is updated to point to the next data in the tiled geometry list and written back into the core list pointer array at 1130. The array index is then set to the value specified within the PIM at 1135, and the processing jumps back to 1110. If the test at 1125 does not detect a PIM, the fetched data is tested to see if it is an End marker at 1140. If an end is detected, then processing of the current tile has completed and the hardware will move onto processing the next tile. If the data is not an end marker, then it is a geometry or state reference and is processed at 1145 as necessary. The list pointer is then updated at 1150 and processing returns to 1120 to fetch the next entries within the tiled geometry list.

FIG. 12 illustrates an example of an implementation of a PIM based system that uses two geometry processing and tiling cores. The primitive and command fetch unit 1200 reads the incoming control stream and passes it to the stream splitter unit 1205 which splits the stream for processing across the two (or more) cores as described above. The splitter passes pointers to the primitives to be fetched to the separate cores, specifically to the FIFOs 1212 and 1214 at the input to the "local" primitive fetch units 1215 and 1216. The FIFOs are required to help decouple the stream splitting processing from the time taken by each core to process each batch of primitives. The local primitive fetch units read the actual geometry data from memory and pass it to the geometry processing units 1220 and 1221, which process the geometry and pass it to the tiling units 1225 and 1226. The tiling units tile the processed geometry, generating the per core tile lists 1230 and 1231.

FIG. 13 illustrates the front end of the rasterization core capable of traversing tile lists generated by multiple GPCs using PIMs. The region header fetch unit 1310 reads the region headers from the screen space tiled geometry lists 1300 generated by each GPC and writes the resulting list pointers into the core list pointer array 1320 as described in FIG. 11. The tiled geometry list fetch unit 1330 then fetches and processes the per tile control lists as described in FIG. 11, and passes resulting geometry to the hidden surface removal unit 1340. All of the processing at the hidden surface removal unit 1340 is the same as described for a normal tile based rendering system.

The invention claimed is:

1. A method for performing geometry processing and tiling in a three dimensional graphics rendering system comprising the steps of:
  providing a stream of graphics primitive data for a scene to be rendered, each primitive comprising data defining a plurality of triangles making up that primitive;
  splitting primitive data between a plurality of geometry processing units;
  for each triangle processed by each geometry processing unit, inserting a triangle into a respective tile list in a set of tiled geometry lists associated with that geometry processing unit;
  for each tile in which a triangle is inserted, inserting a reference to that tile into a tile reference list associated with that geometry processing unit; and
  generating a master tile list for the scene to be rendered from the tile reference lists and the tile geometry lists associated with the geometry processing units, including reading data from the tile reference lists associated with each geometry processing unit in the order in which data was distributed to the geometry processing units to generate a master tile list comprising pointers to each respective tile list in the set of tile geometry lists associated with each geometry processing unit.

2. The method according to claim 1, wherein the step of splitting primitive data comprises the step of splitting on a round robin basis.

3. The method according to claim 1, wherein the step of splitting primitive data is arranged to distribute substantially similar amounts of primitive data to each geometry processing unit.

4. The method according to claim 1, wherein the step of splitting primitive data comprises the steps of monitoring the processing load on each graphics processing unit and splitting data between graphic processing units in dependence on their processing loads.

5. The method according to claim 1, including the step of buffering primitive data between the step of splitting the primitive data and sending it to the graphics processing units.

6. The method according to claim 1, including the step of buffering data from tile reference lists and the tiled geometry lists before the step of generating the tiling data.

7. A system for performing geometry processing and tiling in a three dimensional graphics rendering system comprising:
  means for providing a stream of graphics primitive data for the scene to be rendered, each primitive comprising data defining a plurality of triangles making up that primitive;
  means for splitting primitive data between a plurality of geometry processing units;
  a geometry processing unit arranged to insert a triangle into a tiled geometry list in a set of tiled geometry lists associated with that geometry processing unit for each triangle processed;
  means for inserting a reference to that tile into a tile reference list associated with a geometry processing unit for each tile in which a triangle in inserted; and
  means for generating a master tile list for the scene to be rendered from the tile reference lists and the tiled geometry and
  means for generating a master tile list for the scene to be rendered from the tile reference lists and the tile geometry lists associated with the geometry processing units, including reading data from the tile reference lists associated with each geometry processing unit in the order in which data was distributed to the geometry processing units to generate a master tile list comprising pointers to each respective tile list in the set of tile geometry lists associated with each geometry processing unit.

8. The system according to claim 7, wherein the means for splitting primitive data does this on a round robin basis.

9. The system according to claim 7, wherein the means for splitting primitive data is arranged to distribute substantially similar amounts of primitive data to each geometry processing unit.

10. The system according to claim 7, wherein the means for splitting primitive data comprises means for monitoring the processing load on each graphics processing unit and means for splitting data between graphics processing units in dependence on their processing loads.

11. The system according to claim 10, wherein the means for using markers in the tiled geometry lists comprises means for reading region headers from each of the tiled geometry lists, means for writing tiling list pointers into a pointer array which points to the start of the tiled geometry lists generated by each graphics processing unit, and means for fetching the tiled geometry list for each tile in turn for use in rendering the scene.

12. The system according to claim 7 including means for buffering primitive data between the means for splitting the primitive data and the graphics processing unit.

13. The system according to claim 7 including means for buffering data from tile reference lists and the tile geometry before providing it to the means for generating a master tile list.

14.
  headers and its own tiled geometry lists with each region header pointing to the tiled geometry lists;
  for each triangle processed by each geometry processing unit, inserting a triangle into a respective tile list in a set of tiled geometry lists associated with that geometry processing unit;
  for each tile, a geometry processing unit inserts a pipe interleave marker into the tiled geometry lists associated with that geometry processing unit for each block of geometry processed by each graphics processing unit; which is used during rasterization to enable traversal of the lists in the correct order by a single processing core;
  generating tiling data for the scene to be rendered from the tiled geometry lists associated with each geometry processing unit;
  and using the markers in the tiled geometry lists to indicate when to switch between tiled geometry lists from different graphics processing units.

15. The method according to claim 14, wherein the step of splitting primitive data comprises the step of splitting on a round robin basis.

16. The method according to claim 14, wherein the step of splitting primitive data is arranged to distribute substantially similar amounts of primitive data to each geometry processing unit.

17. The method according to claim 14, wherein the step of splitting primitive data comprises the steps of monitoring the processing load on each graphics processing unit and splitting data between graphic processing units in dependence on their processing loads.

18. The method according to claim 14 including the step of buffering primitive data between the step of splitting the primitive data and sending it to the graphics processing units.

19. The method according to claim 14 including the step of buffering data from the tiled geometry lists before the step of generating the tiling data.

20. The method according to claim 14, wherein the step of using the markers in the tiled geometry lists comprises the steps of reading region headers from each of the tiled geometry lists, and writing tiling list pointers into a pointer array which points to the start of the tiled geometry lists generated by each graphics processing unit lists, for each tile in turn, for use in rendering the scene.

21. A system for performing geometry processing and tiling in a three dimensional graphics rendering system comprising:
  means for providing a stream of graphics primitive data for the scene to be rendered, each primitive comprising data defining a plurality of triangles making up that primitive;
  means for splitting primitive data between a plurality of geometry processing units, each unit generates its own region headers and its own tiled geometry lists with each region header pointing to the tiled geometry lists;
  each geometry processing unit arranged to insert a triangle into a tiled geometry list in a set of tiled geometry lists associated with that geometry processing unit for each triangle processed;
  means for inserting, by a geometry processing unit, a pipe interleave marker for each tile into the tiled geometry
A method for performing geometry processing and 60 lists associated With that geometry processing unit, for tiling in a three dimensional graphics rendering system com each block of geometry processed by that graphics pro prising the steps of: cessing unit; Which is used during rasteriZation to enable providing a stream of graphics primitive data for the scene traversal of the lists in the correct order by a single to be rendered, each primitive comprising data de?ning processing unit; a plurality of triangles making up that primitive; 65 means for generating tiling data for the scene to be ren splitting primitive data betWeen a plurality of geometry dered from the tiled geometry associated With each processing units, each unit generates its oWn region geometry processing unit; and US 8,310,487 B2 10 means for indicating With the markers in the tiled geometry the processing load on each graphics processing unit and lists When to switch betWeen tiled geometry lists from means for splitting data betWeen graphics processing units in different graphics processing cores. dependence on their processing loads. 22. The system according to claim 21, Wherein the means 25. The system according to claim 21 including means for for splitting primitive data does this on a round robin basis. buffering primitive data betWeen the means for splitting the 23. The system according to claim 21, Wherein the means primitive data and the graphics processing unit. for splitting primitive data is arranged to distribute substan 26. The system according to claim 21 including means for tially similar amounts of primitive data to each geometry buffering data from tile reference lists and the tile geometry processing unit. before providing it to the means for generating tiling data. 24. The system according to claim 21 Wherein the means for splitting primitive data comprises means for monitoring * * * * *
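Illustrative sketch (not part of the patent text): the marker-based traversal recited in claims 14 and 21, together with the fetch, test and pointer-update loop described at steps 1120 to 1150, might be modelled roughly as follows for a single tile. This is a minimal sketch under stated assumptions: blocks of geometry are distributed round robin, every block contributes one pipe interleave marker to the tile's list on its core, and each marker carries the index of the core whose list should be read next. The names `PIM`, `next_core`, `build_tile_lists` and `traverse_tile` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass

END = "END"  # end marker that terminates a tile's list on each core


@dataclass(frozen=True)
class PIM:
    """Pipe interleave marker. The `next_core` payload is an assumption."""
    next_core: int


def build_tile_lists(blocks, num_cores):
    """Build one tiled geometry list per core for a single tile.

    `blocks` is a list of blocks; each block is the list of geometry
    references from that block that fall inside this tile (possibly empty).
    Blocks are handed to cores on a round robin basis (assumed policy).
    """
    lists = [[] for _ in range(num_cores)]
    for i, block in enumerate(blocks):
        core = i % num_cores
        lists[core].extend(block)                        # geometry references
        lists[core].append(PIM((core + 1) % num_cores))  # one marker per block
    for tile_list in lists:
        tile_list.append(END)
    return lists


def traverse_tile(lists, start_core=0):
    """Walk the per-core lists for one tile as a single rasterizing core would."""
    pointers = [0] * len(lists)  # one list pointer per core
    core = start_core
    order = []
    while True:
        entry = lists[core][pointers[core]]  # fetch the next entry
        pointers[core] += 1                  # advance this list's pointer
        if isinstance(entry, PIM):
            core = entry.next_core           # marker: switch to another list
        elif entry == END:
            return order                     # end marker: tile is complete
        else:
            order.append(entry)              # geometry or state reference


blocks = [["a", "b"], [], ["c"], ["d", "e"], ["f"]]
tile_lists = build_tile_lists(blocks, num_cores=2)
submission_order = traverse_tile(tile_lists)
```

In this model the single rasterizing core recovers the original submission order of the geometry even though two cores built their lists independently, which is the property the markers are claimed to provide.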