glTF 2.0 Quick Reference Guide

glTF - what the ?

An overview of the basics of the GL Transmission Format. For glTF 2.0!

Rev. 0119, www.khronos.org/gltf

glTF was designed and specified by the Khronos® Group for the efficient transfer of 3D content over networks.

The core of glTF is a JSON file that describes the structure and composition of a scene containing 3D models. The top-level elements of this file are:

  scenes, nodes: Basic structure of the scene
  cameras: View configurations for the scene
  meshes: Geometry of 3D objects
  buffers, bufferViews, accessors: Data references and data layout descriptions
  materials: Definitions of how objects should be rendered
  textures, images, samplers: Surface appearance of objects
  skins: Information for vertex skinning
  animations: Changes of properties over time

These elements are contained in arrays. References between the objects are established by using their indices to look up the objects in the arrays.

It is also possible to store the whole asset in a single binary glTF file. In this case, the JSON data is stored as a string, followed by the binary data of buffers or images.

Concepts

The conceptual relationships between the top-level elements of a glTF asset are shown here:

[Diagram relating the top-level elements: scene, node, camera, mesh, skin, material, accessor, animation, texture, bufferView, sampler, image, buffer]

Binary data references

The images and buffers of a glTF asset may refer to external files that contain the data required for rendering the 3D content:

    "buffers": [
      {
        "uri": "buffer01.bin",
        "byteLength": 102040
      }
    ],
    "images": [
      {
        "uri": "image01.png"
      }
    ],

The buffers refer to binary files (.BIN) that contain geometry or animation data. The images refer to image files (PNG, JPG...) that contain texture data for the models.

The data is referred to via URIs, but can also be included directly in the JSON using data URIs. The data URI defines the MIME type and contains the data as a base64-encoded string:

  Buffer data: "data:application/gltf-buffer;base64,AAABAAIAAgA..."
  Image data (PNG): "..."
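As a rough illustration of the data URI form described above, the following Python sketch splits a base64 data URI into its MIME type and its decoded bytes. The function name decode_data_uri and the six sample bytes are assumptions made for this example; they are not taken from the glTF format itself.

```python
import base64


def decode_data_uri(uri):
    """Split a base64 data URI into (mime_type, decoded_bytes)."""
    header, separator, payload = uri.partition(",")
    if not separator or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Hypothetical sample: six bytes that could hold three 16-bit indices (0, 1, 2).
raw = bytes([0, 0, 1, 0, 2, 0])
sample_uri = "data:application/gltf-buffer;base64," + base64.b64encode(raw).decode("ascii")

mime_type, data = decode_data_uri(sample_uri)
```

A loader would typically check whether a "uri" starts with "data:" and only fall back to fetching an external file when it does not.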

Further resources

  The Khronos glTF landing page: https://www.khronos.org/gltf
  The Khronos glTF GitHub repository: https://github.com/KhronosGroup/glTF

Version 2.0b (glTF version 2.0). This overview is non-normative!

Feedback: [email protected]

glTF and the glTF logo are trademarks of the Khronos Group Inc. ©2016-2019 Marco Hutter, www.marco-hutter.de

scenes, nodes

The glTF JSON may contain scenes (with an optional default scene). Each scene can contain an array of indices of nodes. Each of the nodes can contain an array of indices of its children. This allows modeling a simple scene hierarchy:

    "scene": 0,
    "scenes": [
      {
        "nodes": [ 0, 1, 2 ]
      }
    ],
    "nodes": [
      {
        "children": [ 3, 4 ],
        ...
      },
      { ... },
      { ... },
      { ... },
      { ... },
      ...
    ],

    scene 0
      node 0
        node 3
        node 4
      node 1
      node 2

A node may contain a local transform. This can be given as a column-major matrix array, or with separate translation, rotation and scale properties, where the rotation is given as a quaternion. The local transform matrix is then computed as

    M = T * R * S

where T, R and S are the matrices that are created from the translation, rotation and scale. The global transform of a node is given by the product of all local transforms on the path from the root to the respective node.

    "nodes": [
      {
        "matrix": [
          1,0,0,0,
          0,1,0,0,
          0,0,1,0,
          5,6,7,1
        ],
        ...
      },
      {
        "translation": [ 0,0,0 ],
        "rotation": [ 0,0,0,1 ],
        "scale": [ 1,1,1 ]
        ...
      }
    ]

Each node may refer to a mesh or a camera, using indices that point into the meshes and cameras arrays. These elements are then attached to these nodes. During rendering, instances of these elements are created and transformed with the global transform of the node.

    "nodes": [
      {
        "mesh": 4,
        ...
      },
      {
        "camera": 2,
        ...
      }
    ]

The translation, rotation and scale properties of a node may also be the target of an animation: The animation then describes how one property changes over time. The attached objects will move accordingly, which allows modeling moving objects or camera flights.

Nodes are also used in vertex skinning: A node hierarchy can define the skeleton of an animated character. The node then refers to a mesh and to a skin. The skin contains further information about how the mesh is deformed based on the current skeleton pose.

meshes

The meshes may contain multiple mesh primitives. These refer to the geometry data that is required for rendering the mesh.

    "meshes": [
      {
        "primitives": [
          {
            "mode": 4,
            "indices": 0,
            "attributes": {
              "POSITION": 1,
              "NORMAL": 2
            },
            "material": 2
          }
        ]
      }
    ],

Each mesh primitive has a rendering mode, which is a constant indicating whether it should be rendered as POINTS, LINES, or TRIANGLES. The primitive also refers to indices and the attributes of the vertices, using the indices of the accessors for this data. The material that should be used for rendering is also given, by the index of the material.

Each attribute is defined by mapping the attribute name to the index of the accessor that contains the attribute data. This data will be used as the vertex attributes when rendering the mesh. The attributes may, for example, define the POSITION and the NORMAL of the vertices:

    POSITION   1.2  -2.6  4.3    2.7  -1.8  6.2   ...
    NORMAL     0.0   1.0  0.0    0.71  0.71 0.0   ...

    Position: (1.2, -2.6, 4.3)   Normal: (0.0, 1.0, 0.0)
    Position: (2.7, -1.8, 6.2)   Normal: (0.71, 0.71, 0.0)
    (...)                        (...)

A mesh may define multiple morph targets. Such a morph target describes a deformation of the original mesh.

    {
      "primitives": [
        {
          ...
          "targets": [
            {
              "POSITION": 11,
              "NORMAL": 13
            },
            {
              "POSITION": 21,
              "NORMAL": 23
            }
          ]
        }
      ],
      "weights": [0, 0.5]
    }

To define a mesh with morph targets, each mesh primitive can contain an array of targets. These are dictionaries that map names of attributes to the indices of accessors that contain the displacements of the geometry for the target. The mesh may also contain an array of weights that define the contribution of each morph target to the final, rendered state of the mesh.

Combining multiple morph targets with different weights allows, for example, modeling different facial expressions of a character: The weights can be modified with an animation, to interpolate between different states of the geometry.
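The contribution of morph targets described above can be sketched as a weighted sum of displacements added to the original attribute data. The following Python example is a minimal illustration under that reading; the function name apply_morph_targets and the sample vertex data are hypothetical, and only the weights [0, 0.5] are taken from the example in this section.

```python
def apply_morph_targets(base, targets, weights):
    """Return base + sum(weights[i] * targets[i]), element by element.

    base:    flat list of attribute values (e.g. POSITION x, y, z, x, y, z, ...)
    targets: one flat list of displacements per morph target, same length as base
    weights: one weight per morph target
    """
    if len(targets) != len(weights):
        raise ValueError("need exactly one weight per morph target")
    result = list(base)
    for displacement, weight in zip(targets, weights):
        if len(displacement) != len(base):
            raise ValueError("displacement and base data must have the same length")
        for i, delta in enumerate(displacement):
            result[i] += weight * delta
    return result


# Hypothetical data: a single vertex position and two morph targets.
base_position = [1.0, 2.0, 3.0]
target_displacements = [
    [4.0, 0.0, 0.0],  # displacement from morph target 0
    [0.0, 2.0, 0.0],  # displacement from morph target 1
]
morphed = apply_morph_targets(base_position, target_displacements, [0, 0.5])
```

With a weight of 0 the first target has no effect, so only half of the second displacement is added to the base position.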

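The local transform M = T * R * S and the global transform of a node, as described in the "scenes, nodes" section, can be sketched in plain Python as follows. This is an illustration under stated assumptions: matrices are kept as row-major nested lists internally, the helper names are invented for this example, and the child node values are hypothetical. Only the parent matrix is the one shown in the section.

```python
def trs_to_matrix(translation, rotation, scale):
    """Build M = T * R * S as a 4x4 row-major nested list.

    rotation is a unit quaternion in the order (x, y, z, w).
    """
    tx, ty, tz = translation
    x, y, z, w = rotation
    sx, sy, sz = scale
    return [
        [(1 - 2 * (y * y + z * z)) * sx, (2 * (x * y - z * w)) * sy, (2 * (x * z + y * w)) * sz, tx],
        [(2 * (x * y + z * w)) * sx, (1 - 2 * (x * x + z * z)) * sy, (2 * (y * z - x * w)) * sz, ty],
        [(2 * (x * z - y * w)) * sx, (2 * (y * z + x * w)) * sy, (1 - 2 * (x * x + y * y)) * sz, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def from_column_major(values):
    """Convert a 16-element column-major "matrix" array to row-major nested lists."""
    return [[float(values[column * 4 + row]) for column in range(4)] for row in range(4)]


def multiply(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def global_transform(local_transforms_root_to_node):
    """Multiply all local transforms on the path from the root to the node."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for local in local_transforms_root_to_node:
        result = multiply(result, local)
    return result


# Parent: the "matrix" example from the text (a translation by 5, 6, 7).
parent_local = from_column_major([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 5, 6, 7, 1])
# Child: hypothetical translation, identity rotation and uniform scale of 2.
child_local = trs_to_matrix([1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0], [2.0, 2.0, 2.0])
child_global = global_transform([parent_local, child_local])
```

The child ends up scaled by 2 and translated by (6, 6, 7), because the parent translation is applied on top of the child's own local translation.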
buffers, bufferViews, accessors

The buffers contain the data that is used for the geometry of 3D models, animations, and skinning. The bufferViews add structural information to this data. The accessors define the exact type and layout of the data.

    "buffers": [
      {
        "byteLength": 35,
        "uri": "buffer01.bin"
      }
    ],

Each of the buffers refers to a binary data file, using a URI. The buffer data is read from this file; it is the source of one block of raw data with the given byteLength.

    "bufferViews": [
      {
        "buffer": 0,
        "byteOffset": 4,
        "byteLength": 28,
        "byteStride": 12,
        "target": 34963
      }
    ],

Each of the bufferViews refers to one buffer. It has a byteOffset and a byteLength, defining the part of the buffer that belongs to the bufferView, and an optional OpenGL buffer target.

    "accessors": [
      {
        "bufferView": 0,
        "byteOffset": 4,
        "type": "VEC2",
        "componentType": 5126,
        "count": 2,
        "min" : [0.1, 0.2],
        "max" : [0.9, 0.8]
      }
    ]

The accessors define how the data of a bufferView is interpreted. They may define an additional byteOffset referring to the start of the bufferView, and contain information about the type and layout of the bufferView data.

The data may, for example, be defined as 2D vectors of floating point values when the type is "VEC2" and the componentType is GL_FLOAT (5126). The range of all values is stored in the min and max property.

The data of multiple accessors may be interleaved inside a bufferView. In this case, the bufferView will have a byteStride property that says how many bytes are between the start of one element of an accessor, and the start of the next.

Sparse accessors

When only a few elements of an accessor differ from a default value (which is often the case for morph targets), then the data can be given in a very compact form using a sparse data description:

    "accessors": [
      {
        "type": "SCALAR",
        "componentType": 5126,
        "count": 10,
        "sparse": {
          "count": 4,
          "values": {
            "bufferView": 2
          },
          "indices": {
            "bufferView": 1,
            "componentType": 5123
          }
        }
      }
    ]

The accessor defines the type of the data (here, scalar float values), and the total element count. The sparse data block contains the count of sparse data elements. The values refer to the bufferView that contains the sparse data values. The target indices for the sparse data values are defined with a reference to a bufferView and the componentType.

The values are written into the final accessor data, at the positions that are given by the indices:

    sparse (count=4)
      values:   4.3  9.1  5.2  2.7
      indices:  1    4    5    7

    Final accessor data with 10 float values
      index:    0    1    2    3    4    5    6    7    8    9
      value:    0.0  4.3  0.0  0.0  9.1  5.2  0.0  2.7  0.0  0.0

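Two of the rules from the "buffers, bufferViews, accessors" and "Sparse accessors" material lend themselves to a short sketch: where the elements of a strided accessor start inside the buffer, and how sparse values replace default values. The Python functions below are illustrative only. Their names are invented for this example, the default value of 0.0 is an assumption that matches the figure, and the input numbers are the ones used in the JSON examples of this section.

```python
def element_offsets(buffer_view, accessor, element_size):
    """Byte offsets, relative to the buffer start, at which each element begins."""
    start = buffer_view.get("byteOffset", 0) + accessor.get("byteOffset", 0)
    # Without a byteStride the elements are assumed to be tightly packed.
    stride = buffer_view.get("byteStride", element_size)
    return [start + i * stride for i in range(accessor["count"])]


def expand_sparse(count, indices, values, default=0.0):
    """Write the sparse values into an array of default values at the given indices."""
    data = [default] * count
    for index, value in zip(indices, values):
        data[index] = value
    return data


# bufferView and accessor from the example: VEC2 of 32-bit floats = 8 bytes per element.
offsets = element_offsets(
    {"byteOffset": 4, "byteLength": 28, "byteStride": 12},
    {"byteOffset": 4, "count": 2},
    element_size=8,
)

# Sparse accessor from the example: 10 elements, 4 of them given explicitly.
final_data = expand_sparse(10, [1, 4, 5, 7], [4.3, 9.1, 5.2, 2.7])
```

The two elements of the example accessor start at bytes 8 and 20 of the buffer, and the expanded sparse data matches the ten values shown for the final accessor data.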
[Figure: byte layout of the example buffer, bufferView and accessor]

  buffer: bytes 0 to 35 (byteLength = 35)
  The bufferView defines a segment of the buffer data: bytes 4 to 32 (byteOffset = 4, byteLength = 28)
  The accessor defines an additional offset: the data starts at byte 8 (byteOffset = 4)
  The bufferView defines a stride between the elements: the elements start at bytes 8 and 20 (byteStride = 12)
  The accessor defines that the elements are 2D float vectors: x0 y0 at bytes 8 to 16, x1 y1 at bytes 20 to 28 (type = "VEC2", componentType = GL_FLOAT)

This data may, for example, be used by a mesh primitive, to access 2D texture coordinates: The data of the bufferView may be bound as an OpenGL buffer, using glBindBuffer. Then, the properties of the accessor may be used to define this buffer as vertex attribute data, by passing them to glVertexAttribPointer when the buffer is bound.

materials

Each mesh primitive may refer to one of the materials that are contained in a glTF asset. The materials describe how an object should be rendered, based on physical material properties. This allows applying Physically Based Rendering (PBR) techniques, to make sure that the appearance of the rendered object is consistent among all renderers.

The default material model is the Metallic-Roughness-Model.

Values between 0.0 and 1.0 are used to describe how much the material characteristics resemble those of a metal, and how rough the surface of the object is. These properties may either be given as individual values that apply to the whole object, or be read from textures.

[Figure: grid of rendered spheres, with Metal (horizontal axis) and Roughness (vertical axis) values of 0.0, 0.25, 0.5, 0.75 and 1.0]

    "materials": [
      {
        "pbrMetallicRoughness": {
          "baseColorTexture": {
            "index": 1,
            "texCoord": 1
          },
          "baseColorFactor": [ 1.0, 0.75, 0.35, 1.0 ],
          "metallicRoughnessTexture": {
            "index": 5,
            "texCoord": 1
          },
          "metallicFactor": 1.0,
          "roughnessFactor": 0.0
        },
        "normalTexture": {
          "scale": 0.8,
          "index": 2,
          "texCoord": 1
        },
        "occlusionTexture": {
          "strength": 0.9,
          "index": 4,
          "texCoord": 1
        },
        "emissiveTexture": {
          "index": 3,
          "texCoord": 1
        },
        "emissiveFactor": [0.4, 0.8, 0.6]
      }
    ],

The properties that define a material in the Metallic-Roughness-Model are summarized in the pbrMetallicRoughness object:

The baseColorTexture is the main texture that will be applied to the object. The baseColorFactor contains scaling factors for the red, green, blue and alpha component of the color. If no texture is used, these values will define the color of the whole object.

The metallicRoughnessTexture contains the metalness value in the "blue" color channel, and the roughness value in the "green" color channel. The metallicFactor and roughnessFactor are multiplied with these values. If no texture is given, then these factors define the reflection characteristics for the whole object.

In addition to the properties that are defined via the Metallic-Roughness-Model, the material may contain other properties that affect the object appearance:

The normalTexture refers to a texture that contains tangent-space normal information, and a scale factor that will be applied to these normals.

The occlusionTexture refers to a texture that defines areas of the surface that are occluded from light, and thus rendered darker. This information is contained in the "red" channel of the texture. The occlusion strength is a scaling factor to be applied to these values.

The emissiveTexture refers to a texture that may be used to illuminate parts of the object surface: It defines the color of the light that is emitted from the surface. The emissiveFactor contains scaling factors for the red, green and blue components of this texture.
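How the factors and texture channels of the pbrMetallicRoughness object combine can be sketched as follows. This is a simplified illustration: the function name and the texel values are hypothetical, texels are assumed to be RGBA tuples already normalized to the range 0.0 to 1.0, and a missing texture or factor is treated as 1.0.

```python
def resolve_metallic_roughness(pbr,
                               base_color_texel=(1.0, 1.0, 1.0, 1.0),
                               metallic_roughness_texel=(1.0, 1.0, 1.0, 1.0)):
    """Combine material factors with sampled texels (RGBA, each in 0.0..1.0)."""
    base_color_factor = pbr.get("baseColorFactor", [1.0, 1.0, 1.0, 1.0])
    # The base color factor scales red, green, blue and alpha of the texture color.
    base_color = [f * t for f, t in zip(base_color_factor, base_color_texel)]
    # Metalness is read from the blue channel, roughness from the green channel.
    metallic = pbr.get("metallicFactor", 1.0) * metallic_roughness_texel[2]
    roughness = pbr.get("roughnessFactor", 1.0) * metallic_roughness_texel[1]
    return base_color, metallic, roughness


# Factors from the example material; the sampled texel is a made-up value.
pbr = {
    "baseColorFactor": [1.0, 0.75, 0.35, 1.0],
    "metallicFactor": 1.0,
    "roughnessFactor": 0.0,
}
base_color, metallic, roughness = resolve_metallic_roughness(
    pbr, metallic_roughness_texel=(0.0, 0.5, 0.25, 1.0)
)
```

Because no base color texel is passed, the base color is just the factor itself, which corresponds to the case where no texture is used.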

Material properties in textures

The texture references in a material always contain the index of the texture. They may also contain the texCoord set index. This is the number that determines the TEXCOORD_<n> attribute of the rendered mesh primitive that contains the texture coordinates for this texture, with 0 being the default.

    "meshes": [
      {
        "primitives": [
          {
            "material": 2,
            "attributes": {
              "NORMAL": 3,
              "POSITION": 1,
              "TEXCOORD_0": 2,
              "TEXCOORD_1": 5
            }
          }
        ]
      }
    ],

    "materials": [
      ...
      {
        "name": "brushed gold",
        "pbrMetallicRoughness": {
          "baseColorFactor": [1,1,1,1],
          "baseColorTexture": {
            "index" : 1,
            "texCoord" : 1
          },
          "metallicFactor": 1.0,
          "roughnessFactor": 1.0
        }
      }
    ],

    "textures": [
      ...
      {
        "source": 4,
        "sampler": 2
      }
    ],

cameras

Each of the nodes may refer to one of the cameras that are defined in the glTF asset.

    "cameras": [
      {
        "type": "perspective",
        "perspective": {
          "aspectRatio": 1.5,
          "yfov": 0.65,
          "zfar": 100,
          "znear": 0.01
        }
      },
      {
        "type": "orthographic",
        "orthographic": {
          "xmag": 1.0,
          "ymag": 1.0,
          "zfar": 100,
          "znear": 0.01
        }
      }
    ]

There are two types of cameras: perspective and orthographic ones, and they define the projection matrix.

The value for the far clipping plane distance of a perspective camera, zfar, is optional. When it is omitted, the camera uses a special projection matrix for infinite projections.

When one of the nodes refers to a camera, then an instance of this camera is created. The camera matrix of this instance is given by the global transform matrix of the node.

textures, images, samplers

The textures contain information about textures that may be applied to rendered objects: Textures are referred to by materials to define the basic color of the objects, as well as physical properties that affect the object appearance.

    "textures": [
      {
        "source": 4,
        "sampler": 2
      }
      ...
    ],

The texture consists of a reference to the source of the texture, which is one of the images of the asset, and a reference to a sampler.

    "images": [
      ...
      {
        "uri": "file01.png"
      },
      {
        "bufferView": 3,
        "mimeType" : "image/jpeg"
      }
    ],

The images define the image data used for the texture. This data can be given via a URI that is the location of an image file, or by a reference to a bufferView and a MIME type that defines the type of the image data that is stored in the buffer view.

    "samplers": [
      ...
      {
        "magFilter": 9729,
        "minFilter": 9987,
        "wrapS": 10497,
        "wrapT": 10497
      }
    ],

The samplers describe the wrapping and scaling of textures. (The constant values correspond to OpenGL constants that can directly be passed to glTexParameter).

skins

A glTF asset may contain the information that is necessary to perform vertex skinning. With vertex skinning, it is possible to let the vertices of a mesh be influenced by the bones of a skeleton, based on its current pose.

    "nodes": [
      {
        "name" : "Skinned mesh node",
        "mesh": 0,
        "skin": 0
      },
      ...
      {
        "name": "Torso",
        "children": [ 2, 3, 4, 5, 6 ],
        "rotation": [...],
        "scale": [ ...],
        "translation": [...]
      },
      ...
      {
        "name": "LegL",
        "children": [ 7 ],
        ...
      },
      ...
      {
        "name": "FootL",
        ...
      },
      ...
    ],

A node that refers to a mesh may also refer to a skin.

    "skins": [
      {
        "inverseBindMatrices": 12,
        "joints": [ 1, 2, 3 ... ]
      }
    ],

The skins contain an array of joints, which are the indices of nodes that define the skeleton hierarchy, and the inverseBindMatrices, which is a reference to an accessor that contains one matrix for each joint.

The skeleton hierarchy is modeled with nodes, just like the scene structure: Each joint node may have a local transform and an array of children, and the "bones" of the skeleton are given implicitly, as the connections between the joints.

[Figure: skeleton with the joints Head, ArmR, ArmL, Torso, LegR, LegL, FootR and FootL]

The mesh primitives of a skinned mesh contain the POSITION attribute that refers to the accessor for the vertex positions, and two special attributes that are required for skinning: A JOINTS_0 and a WEIGHTS_0 attribute, each referring to an accessor.

    "meshes": [
      {
        "primitives": [
          {
            "attributes": {
              "POSITION": 0,
              "JOINTS_0": 1,
              "WEIGHTS_0": 2
              ...
            }
          }
        ]
      }
    ],

The JOINTS_0 attribute data contains the indices of the joints that should affect the vertex. The WEIGHTS_0 attribute data defines the weights indicating how strongly the joint should influence the vertex.

From this information, the skinning matrix can be computed. This is explained in detail in "Computing the skinning matrix".
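The statement that the "bones" are given implicitly, as the connections between the joints, can be illustrated by deriving them from the node hierarchy. The sketch below is an assumption-laden example: the function name skeleton_bones and the small asset with five nodes are hypothetical and only loosely follow the Torso, LegL and FootL naming of the example above.

```python
def skeleton_bones(gltf, skin_index):
    """List the implicit bones of a skin as (parent joint, child joint) index pairs."""
    skin = gltf["skins"][skin_index]
    joints = set(skin["joints"])
    bones = []
    for parent in skin["joints"]:
        for child in gltf["nodes"][parent].get("children", []):
            # Only connections between two joints of this skin count as bones.
            if child in joints:
                bones.append((parent, child))
    return bones


# Hypothetical asset: node 0 carries the mesh and the skin, nodes 1 to 4 are joints.
asset = {
    "nodes": [
        {"name": "Skinned mesh node", "mesh": 0, "skin": 0},
        {"name": "Torso", "children": [2, 3]},
        {"name": "LegL", "children": [4]},
        {"name": "LegR"},
        {"name": "FootL"},
    ],
    "skins": [{"inverseBindMatrices": 12, "joints": [1, 2, 3, 4]}],
}
bones = skeleton_bones(asset, 0)
```

Each resulting pair connects two joint nodes, for example Torso to LegL and LegL to FootL in this made-up asset.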

Computing the skinning matrix

The skinning matrix describes how the vertices of a mesh are transformed based on the current pose of a skeleton. The skinning matrix is a weighted combination of joint matrices.

Computing the joint matrices

The skin refers to the inverseBindMatrices. This is an accessor which contains one inverse bind matrix for each joint. Each of these matrices transforms the mesh into the local space of the joint.

[Figure: inverseBindMatrix[1] for a skeleton with joint0, joint1 and joint2]

For each node whose index appears in the joints of the skin, a global transform matrix can be computed. It transforms the mesh from the local space of the joint, based on the current global transform of the joint, and is called globalJointTransform.

[Figure: globalJointTransform[1] for a skeleton with joint0, joint1 and joint2]

From these matrices, a jointMatrix may be computed for each joint:

    jointMatrix[j] =
      inverse(globalTransform) *
      globalJointTransform[j] *
      inverseBindMatrix[j];

Any global transform of the node that contains the mesh and the skin is cancelled out by pre-multiplying the joint matrix with the inverse of this transform.

For implementations based on OpenGL or WebGL, the jointMatrix array will be passed to the vertex shader as a uniform.
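The jointMatrix formula above can be tried out with a few lines of plain Python. This is a sketch under stated assumptions: matrices are row-major nested lists, the general-purpose Gauss-Jordan inverse is just one possible way to invert the mesh node transform, and the translations used as sample data are hypothetical.

```python
def multiply(a, b):
    """Multiply two 4x4 matrices given as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert(m):
    """Invert a 4x4 matrix with Gauss-Jordan elimination and partial pivoting."""
    n = 4
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def translation(x, y, z):
    """Helper for the sample data: a pure translation matrix."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def joint_matrices(mesh_node_global, global_joint_transforms, inverse_bind_matrices):
    """jointMatrix[j] = inverse(globalTransform) * globalJointTransform[j] * inverseBindMatrix[j]"""
    inverse_global = invert(mesh_node_global)
    return [multiply(multiply(inverse_global, joint_global), inverse_bind)
            for joint_global, inverse_bind in zip(global_joint_transforms, inverse_bind_matrices)]


# Hypothetical skin with one joint that sits at (1, 2, 3) in the bind pose.
mesh_node_global = translation(0.0, 0.0, 0.0)
inverse_bind = [translation(-1.0, -2.0, -3.0)]
bind_pose = joint_matrices(mesh_node_global, [translation(1.0, 2.0, 3.0)], inverse_bind)
moved_pose = joint_matrices(mesh_node_global, [translation(2.0, 2.0, 3.0)], inverse_bind)
```

In the bind pose the joint matrix is the identity, so the mesh is not deformed. After moving the joint by one unit along x, the joint matrix is a translation by that same unit.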

Combining the joint matrices to create the skinning matrix

The primitives of a skinned mesh contain the POSITION, JOINTS_0 and WEIGHTS_0 attributes, referring to accessors. These accessors contain one element for each vertex:

                POSITION       JOINTS_0       WEIGHTS_0
    vertex 0:   px py pz pw    j0 j1 j2 j3    w0 w1 w2 w3
    ...
    vertex n:   px py pz pw    j0 j1 j2 j3    w0 w1 w2 w3

The data of these accessors is passed as attributes to the vertex shader, together with the jointMatrix array.

Vertex Shader

    uniform mat4 u_jointMatrix[12];
    attribute vec4 a_position;
    attribute vec4 a_joint;
    attribute vec4 a_weight;
    ...
    void main(void) {
      ...
      mat4 skinMatrix =
        a_weight.x * u_jointMatrix[int(a_joint.x)] +
        a_weight.y * u_jointMatrix[int(a_joint.y)] +
        a_weight.z * u_jointMatrix[int(a_joint.z)] +
        a_weight.w * u_jointMatrix[int(a_joint.w)];
      gl_Position =
        modelViewProjection * skinMatrix * a_position;
    }

In the vertex shader, the skinMatrix is computed. It is a linear combination of the joint matrices whose indices are contained in the JOINTS_0 attribute, weighted with the WEIGHTS_0 values. In the following examples, a_weight.x and a_weight.y are the factors, and a_joint.x and a_joint.y are the matrix indices:

    skinMatrix = 1.0  * jointMatrix[1] + 0.0  * jointMatrix[0] +...
    skinMatrix = 0.75 * jointMatrix[1] + 0.25 * jointMatrix[0] +...
    skinMatrix = 0.5  * jointMatrix[1] + 0.5  * jointMatrix[0] +...
    skinMatrix = 0.25 * jointMatrix[1] + 0.75 * jointMatrix[0] +...
    skinMatrix = 0.0  * jointMatrix[1] + 1.0  * jointMatrix[0] +...

[Figure: vertices between joint0, joint1 and joint2 blended with the weights listed above]

The skinMatrix transforms the vertices based on the skeleton pose, before they are transformed with the model-view-projection matrix.
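For checking values outside of a shader, the same linear combination can be written on the CPU. The following Python sketch mirrors the skinMatrix computation of the vertex shader above; the function names and the two sample joint matrices are hypothetical.

```python
def skin_matrix(joint_matrices, joints, weights):
    """Weighted sum of the joint matrices selected by the JOINTS_0 values of one vertex."""
    result = [[0.0] * 4 for _ in range(4)]
    for joint, weight in zip(joints, weights):
        matrix = joint_matrices[int(joint)]
        for i in range(4):
            for j in range(4):
                result[i][j] += weight * matrix[i][j]
    return result


def transform(matrix, vector):
    """Multiply a 4x4 row-major matrix with a 4-component column vector."""
    return [sum(matrix[i][k] * vector[k] for k in range(4)) for i in range(4)]


# Hypothetical joint matrices: joint 0 stays in place, joint 1 moved by 4 units along x.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
shifted = [row[:] for row in identity]
shifted[0][3] = 4.0

# One vertex with JOINTS_0 = (1, 0, 0, 0) and WEIGHTS_0 = (0.75, 0.25, 0, 0).
blended = skin_matrix([identity, shifted], joints=(1, 0, 0, 0), weights=(0.75, 0.25, 0.0, 0.0))
skinned_position = transform(blended, [1.0, 0.0, 0.0, 1.0])
```

The vertex follows joint 1 for three quarters of its movement, so it is shifted by 3 of the 4 units.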

(The vertex skinning in COLLADA is similar to that in glTF)

  Wiki page about skinning in COLLADA: https://www.khronos.org/collada/wiki/Skinning
  Section 4-7 in the COLLADA specification: https://www.khronos.org/files/collada_spec_1_5.pdf

animations

A glTF asset can contain animations. An animation can be applied to the properties of a node that define the local transform of the node, or to the weights for the morph targets.

    "animations": [
      {
        "channels": [
          {
            "target": {
              "node": 1,
              "path": "translation"
            },
            "sampler": 0
          }
        ],
        "samplers": [
          {
            "input": 4,
            "interpolation": "LINEAR",
            "output": 5
          }
        ]
      }
    ]

Each animation consists of two elements: An array of channels and an array of samplers.

Each channel defines the target of the animation. This target usually refers to a node, using the index of this node, and to a path, which is the name of the animated property. The path may be "translation", "rotation" or "scale", affecting the local transform of the node, or "weights", in order to animate the weights of the morph targets of the meshes that are referred to by the node. The channel also refers to a sampler, which summarizes the actual animation data.

A sampler refers to the input and output data, using the indices of accessors that provide the data. The input refers to an accessor with scalar floating-point values, which are the times of the key frames of the animation. The output refers to an accessor that contains the values for the animated property at the respective key frames. The sampler also defines an interpolation mode for the animation, which may be "LINEAR", "STEP", or "CUBICSPLINE".

Animation samplers

During the animation, a "global" animation time (in seconds) is advanced.

[Figure: global time axis with ticks from 0 to 6]

Current time: 1.2

The sampler looks up the key frames for the current time, in the input data.

    The data of the input accessor of the animation sampler, containing the key frame times:

       0.0    0.8    1.6    2.4    3.2

The corresponding values of the output data are read, and interpolated based on the interpolation mode of the sampler.

    The data of the output accessor of the animation sampler, containing the key frame values for the animated property:

      10.0   14.0   18.0   24.0   31.0
       5.0    3.0    1.0   -1.0   -3.0
      -5.0   -2.0    1.0    4.0    7.0

The interpolated value is forwarded to the animation channel target:

      16.0
       2.0
      -0.5

Animation channel targets

The interpolated value that is provided by an animation sampler may be applied to different animation channel targets.

Animating the translation of a node:

    translation=[2, 0, 0]    translation=[3, 2, 0]

Animating the rotation of a skeleton node of a skin:

    rotation=[0.0, 0.0, 0.0, 1.0]    rotation=[0.0, 0.0, 0.38, 0.92]

Animating the weights for the morph targets that are defined for the primitives of a mesh that is attached to a node:

    [Figure: original mesh primitive attribute "POSITION", displacement for "POSITION" from morph target 0, displacement for "POSITION" from morph target 1, and the resulting rendering]

    weights=[0.5,0.0]    weights=[0.0,0.5]
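The key frame lookup and interpolation described under "Animation samplers" can be sketched in Python as follows. The key frame times and values are the ones from the figure; the function name, the clamping outside the key frame range, and the restriction to "STEP" and component-wise "LINEAR" interpolation are assumptions of this example. Rotations would need a spherical interpolation of the quaternions instead, which is not shown here.

```python
import bisect


def sample_animation(times, values, current_time, interpolation="LINEAR"):
    """Look up the key frames around current_time and interpolate the output values."""
    # Assumption: before the first and after the last key frame, the value is clamped.
    if current_time <= times[0]:
        return list(values[0])
    if current_time >= times[-1]:
        return list(values[-1])
    # Index of the last key frame whose time is not after current_time.
    k = bisect.bisect_right(times, current_time) - 1
    if interpolation == "STEP":
        return list(values[k])
    if interpolation != "LINEAR":
        raise NotImplementedError("only STEP and LINEAR are sketched here")
    u = (current_time - times[k]) / (times[k + 1] - times[k])
    return [a + u * (b - a) for a, b in zip(values[k], values[k + 1])]


# Key frame times (input accessor) and three-component values (output accessor).
key_times = [0.0, 0.8, 1.6, 2.4, 3.2]
key_values = [
    [10.0, 5.0, -5.0],
    [14.0, 3.0, -2.0],
    [18.0, 1.0, 1.0],
    [24.0, -1.0, 4.0],
    [31.0, -3.0, 7.0],
]
value = sample_animation(key_times, key_values, 1.2)
stepped = sample_animation(key_times, key_values, 1.2, interpolation="STEP")
```

At the current time of 1.2, the sampler is halfway between the key frames at 0.8 and 1.6, so the linear result is approximately (16.0, 2.0, -0.5), the value shown in the figure.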

Binary glTF files

In the standard glTF format, there are two options for including external binary resources like buffer data and textures: They may be referenced via URIs, or embedded in the JSON part of the glTF using data URIs. When they are referenced via URIs, then each external resource implies a new download request. When they are embedded as data URIs, the base64 encoding of the binary data will increase the file size considerably.

To overcome these drawbacks, there is the option to combine the glTF JSON and the binary data into a single binary glTF file. This is a little-endian file, with the extension ".glb". It contains a header, which gives basic information about the version and structure of the data, and one or more chunks that contain the actual data. The first chunk always contains the JSON data. The remaining chunks contain the binary data.

    12-byte header
      magic         uint32
      version       uint32
      length        uint32
    chunk 0 (JSON)
      chunkLength   uint32
      chunkType     uint32
      chunkData     uchar[]
    chunk 1 (Binary Buffer)
      chunkLength   uint32
      chunkType     uint32
      chunkData     uchar[]
    ...
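The header and chunk layout above can be read with Python's struct module. The following is a minimal sketch rather than a complete loader: the function names are invented for this example, the sample file is assembled in memory from made-up content, and padding the JSON chunk with spaces to a multiple of four bytes reflects an assumption about chunk alignment. The numeric constants are the magic and chunkType values of the binary glTF format.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF"
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON"
CHUNK_BIN = 0x004E4942   # ASCII "BIN"


def parse_glb(data):
    """Return (version, parsed JSON, binary chunk or None) from binary glTF bytes."""
    magic, version, length = struct.unpack_from("<3I", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a binary glTF file")
    if length != len(data):
        raise ValueError("length field does not match the data size")
    chunks = []
    offset = 12
    while offset < length:
        chunk_length, chunk_type = struct.unpack_from("<2I", data, offset)
        offset += 8
        chunks.append((chunk_type, data[offset:offset + chunk_length]))
        offset += chunk_length
    if not chunks or chunks[0][0] != CHUNK_JSON:
        raise ValueError("the first chunk must contain the JSON data")
    gltf = json.loads(chunks[0][1].decode("utf-8"))
    binary = next((payload for kind, payload in chunks[1:] if kind == CHUNK_BIN), None)
    return version, gltf, binary


def build_sample_glb():
    """Assemble a tiny, hypothetical binary glTF file in memory."""
    json_bytes = json.dumps({"asset": {"version": "2.0"}}).encode("utf-8")
    json_bytes += b" " * (-len(json_bytes) % 4)  # pad to a multiple of 4 bytes
    bin_bytes = bytes(range(8))
    length = 12 + 8 + len(json_bytes) + 8 + len(bin_bytes)
    return (struct.pack("<3I", GLB_MAGIC, 2, length)
            + struct.pack("<2I", len(json_bytes), CHUNK_JSON) + json_bytes
            + struct.pack("<2I", len(bin_bytes), CHUNK_BIN) + bin_bytes)


version, gltf, binary = parse_glb(build_sample_glb())
```

The "<" in the format strings selects little-endian byte order, which matches the description of the ".glb" file as a little-endian file.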

The magic entry has the value 0x46546C67, which is the ASCII string "glTF". This is used to identify the data as a binary glTF.

The version defines the file format version. The version described here is version 2.

The length is the total length of the file, in bytes.

The chunkLength is the length of the chunkData, in bytes.

The chunkType value defines what type of data is contained in the chunkData. It may be 0x4E4F534A, which is the ASCII string "JSON", for JSON data, or 0x004E4942, which is the ASCII string "BIN", for binary data.

The chunkData contains the actual data of the chunk. This may be the ASCII representation of the JSON data, or binary buffer data.

Extensions

The glTF format allows extensions to add new functionality, or to simplify the definition of commonly used properties.

    "extensionsUsed" : [
      "KHR_lights_common",
      "CUSTOM_EXTENSION"
    ],
    "extensionsRequired" : [
      "KHR_lights_common"
    ]

When an extension is used in a glTF asset, it has to be listed in the top-level extensionsUsed property. The extensionsRequired property lists the extensions that are strictly required to properly load the asset.

    "textures" : [
      {
        ...
        "extensions" : {
          "KHR_lights_common" : {
            "lightSource" : true
          },
          "CUSTOM_EXTENSION" : {
            "customProperty" : "customValue"
          }
        }
      }
    ]

Extensions allow adding arbitrary objects in the extensions property of other objects. The name of such an object is the same as the name of the extension, and it may contain further, extension-specific properties.

Existing extensions

The following extensions are developed and maintained on the Khronos GitHub repository:

Specular-Glossiness Materials
https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_materials_pbrSpecularGlossiness
This extension is an alternative to the default Metallic-Roughness material model: It allows defining the material properties based on specular and glossiness values.

Unlit Materials
https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_materials_unlit
This extension allows the definition of materials for which no physically based lighting computations should be performed.

Punctual Lights
https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_lights_punctual
This extension allows adding different types of lights to the scene hierarchy. This refers to point lights, spot lights and directional lights. The lights can be attached to the nodes of the scene hierarchy.

WebGL Rendering Techniques
https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_techniques_webgl
With this extension, it is possible to define GLSL shaders that should be used for rendering the glTF asset in OpenGL or WebGL.

Texture transforms
https://github.com/KhronosGroup/glTF/tree/master/extensions/2.0/Khronos/KHR_texture_transform
This extension allows defining offset, rotation, and scaling for textures, so that multiple textures can be combined in order to create a texture atlas.
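A loader can use the extensionsUsed and extensionsRequired properties to decide whether an asset can be loaded at all. The following Python sketch shows one possible check; the function name, the return convention and the set of supported extensions are assumptions made for this example.

```python
def check_extensions(gltf, supported):
    """Return (missing required extensions, used but unsupported optional extensions)."""
    used = gltf.get("extensionsUsed", [])
    required = gltf.get("extensionsRequired", [])
    missing_required = [name for name in required if name not in supported]
    ignorable = [name for name in used
                 if name not in supported and name not in required]
    return missing_required, ignorable


# Top-level properties as in the example; the supported set is hypothetical.
asset = {
    "extensionsUsed": ["KHR_lights_common", "CUSTOM_EXTENSION"],
    "extensionsRequired": ["KHR_lights_common"],
}
missing, ignorable = check_extensions(asset, supported={"KHR_materials_unlit"})
```

An asset with a non-empty list of missing required extensions cannot be loaded properly, while the merely used extensions could be skipped by such a loader.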
