Deep Shading
Team Cucumbers: Joey Chiu, Subash Chebolu, Mirkamil Mijit, Bharat Suri

Object Representation in Computer Graphics
http://www.cmap.polytechnique.fr/~peyre/geodesic_computations/

Graphics Pipeline: Terminology
1. Local space
2. Object space (World space)
3. View space
4. Clip space
5. Screen space

https://tayfunkayhan.wordpress.com/2018/12/16/rasterization-in-one-weekend-part-ii/

Phong Lighting

Graphics Pipeline
http://web.cse.ohio-state.edu/~shen.94/5542/Site/Slides_files/hardware5542.pdf

Forward Shading vs. Deferred Shading

More about Deferred Shading

Deferred Shading
1. Lighter calculation
2. Heavy memory usage
3. No transparent objects

Ray Tracing
● Completely different from the GPU graphics pipeline described previously
● Can capture global illumination effects
● Computationally expensive

Features

Effects
● Certain effects make rendered images appear more realistic
  ○ Ambient Occlusion (AO)
  ○ Directional Occlusion (DO)
  ○ Indirect Light (GI)
  ○ Subsurface Scattering (SSS)
  ○ Depth-of-Field (DOF)
  ○ Motion Blur (MB)
  ○ Image-based lighting (IBL)
  ○ Anti-aliasing (AA)
● These effects are generally produced during rendering and require significant computational resources

Directional Occlusion (Indirect Light)
● Generalization of Ambient Occlusion
● Accounts for the direction of light when approximating global illumination
● Screen Space Directional Occlusion (SSDO, RGS09)
  ○ Direction of incoming light
  ○ One bounce of indirect illumination
  ○ Minor additional computation time (compared to SSAO)
  ○ Avoids ray tracing and uses only normals and geometry

Ambient Occlusion
● Approximates global illumination by calculating how much ambient light is “occluded” from a point by surrounding geometry
● Two approaches to computation
  ○ Screen Space AO (SSAO)
    ■ Developed by Crytek
    ■ Uses pixel depth from the Z-buffer
  ○ Ray Tracing
    ■ Casts rays to determine whether geometry is in the way
    ■ Very slow (until recently)

Sub-Surface Scattering (SSS)
● Phenomenon where light enters a translucent material, scatters inside it, and exits at a different point
● Visible on leaves, skin, wax, etc.
● Screen Space SSS (JSG09)
  ○ Done in screen space rather than texture space
  ○ Does not require a diffusion profile or irradiance map

Depth of Field
● Physical property specifying the range of distances at which objects are in focus
● A realistic and often desired feature in rendered images
● Screen-Space DOF
  ○ Post-processing with specific filters

Motion Blur
● Physical phenomenon that occurs when an object moves faster than the camera's shutter speed can freeze
● Like DOF, this feature is often desirable in rendered images because it makes them look more realistic
● Implementation in graphics
  ○ Filtering in post-processing

Anti-Aliasing
● Aliasing occurs when the sampling rate is not high enough (Nyquist frequency)
  ○ Fourier transform
  ○ Texture magnification/minification during pixel lookup
● Depth information can be used to blur discontinuities and reduce aliasing

Introduction to Paper

Data Structure
● 61,000 pairs with a train-validation-test split of 54,000-6,000-1,000
● Data generation
  ○ 1,000 base images from each of 10 scenes
  ○ Each base image is flipped horizontally and vertically, and rotated in steps of 90 degrees
  ○ 170 hours of computation on one GPU
● The base images are from a perspective camera with a 50 degree FOV
● 512 x 512 px for AO
● 256 x 256 px for other effects

Attributes

Attribute Name                                          Space          Notation
Position                                                Screen         Ps
Normals                                                 Camera/World   Ns/Nw
Depth                                                   Screen         Ds
Distance to the focal plane                             Screen         Dfocal
Radius of the circle of confusion of the lens system    //             B
Normalized direction to the camera                      World          Cw
Material parameters                                     //             R
RGB diffuse and specular colors                         //             Rdiff/Rspec
Scalar glossiness                                       //             Rgloss
Scatter                                                 //             Rscatt
Direct light                                            //             L
Direct light for diffuse                                //             Ldiff

Appearance
AO and DO do not compute the final RGB appearance
● Their output is RGB radiance, which is then multiplied by albedo
RGB vs. Mono networks
● RGB networks are trained on input for all 3 channels at the same time
● Mono networks are trained on a single channel at a time

Network Architecture
The network architecture is based on the U-Net architecture. A new network was created for each type of effect.
● Up to 6 “levels” of downsampling and upsampling
● Resolution ranges from 512x512 px down to 16x16 px through the network
● Downsampling is 2x2 mean-pooling
● Upsampling is bilinear
● Every level effectively doubles the number of feature maps
● Every level effectively halves the dimensions of the image
● Activation function is always LeakyReLU

Network Architecture
The architecture uses grouped convolutions, which parallelize well.
(Figure: Regular Convolution vs. Grouped Convolution)

Training
The Caffe API was used to train these networks
● Input has 3 to 18 channels
● ADADELTA optimizer with momentum of 0.9
● The loss function is SSIM (Structural Similarity Index)
  ○ Ranges from -1 to 1
  ○ Images are tiled into 8x8 px patches and the per-tile SSIM values are combined

Analysis
This section discusses the following:
● Shortcomings of the method, illustrated by typical artifacts
● Structural choices and trade-offs in training

Visual Analysis
Their method produces some artifacts, which are used to discuss its shortcomings.
They use visual analysis of the following artifacts:
● Typical Artifacts
● Range of Values
● Effect Radius
● Internal Camera Parameters

Visual Analysis (Typical Artifacts)
● Light transport can be highly complex, and mapping attributes to shading becomes ambiguous
● Patterns resemble correct shading but are inconsistent with the laws of optics
● Capturing high frequencies is a challenge, and the network needs enough capacity and training
● Results might over-blur, as seen in the figure, but this is preferable to the ringing and Monte Carlo noise seen in man-made shaders
Source: http://deep-shading-datasets.mpi-inf.mpg.de/deep-shading.pdf

Visual Analysis (Effect Radius)
● Screen space shading is faded out based on a distance term
● Training is done at one resolution but later applied to different resolutions
● As the resolution changes, the effect radius should change accordingly
● Effect radius is not an input to the network but is fixed in the training data
● It can still be adjusted at test time by scaling the attributes that determine the spatial scale of the effect
Source: http://deep-shading-datasets.mpi-inf.mpg.de/deep-shading.pdf

Visual Analysis (Internal Camera Parameters)
● It is unclear how the network performs on framebuffers rendered with a FOV different from the training data
● The figure investigates the influence of a FOV mismatch on image quality
● To keep the image content as similar as possible while changing FOV, they performed a dolly zoom
● The network is robust to FOV mismatches, as seen from the minimal fluctuation of error in the figure
Source: http://deep-shading-datasets.mpi-inf.mpg.de/deep-shading.pdf

Network Structure
This section investigates how the structural parameters of the CNN architecture affect its expressiveness and computational demand.
They studied two modes of variation:
● Varying the spatial extent of the kernels
● Varying the number of kernels on the first level
The goal was to find the smallest network with adequate learning capacity that generalizes well to unseen data.

Spatial Kernel Size and Initial Number of Kernels
There was no noticeable difference in performance between kernel sizes. 3x3 and 5x5 kernels had the same capacity, but the smaller kernel was faster to train, so it was the one finally used.
The network seems to lose some capacity when 4 kernels are used instead of 8.

Choice of Loss Function
● The loss function has a significant impact on how Deep Shading performs
● The same network structure was trained using the common L1 and L2 losses as well as the perceptual SSIM metric
● Combinations of SSIM with L1 and with L2 were also used
● L1 and L2 alone are prone to halo effects
● L1+SSIM and SSIM alone produce the best results overall

Training Data Trade-offs
Ideally, a training set consists of a vast collection of images with no imperfections from Monte Carlo noise.
However, producing such training data is typically impractical, so trade-offs must be considered when the network is trained.
These are the trade-offs that were considered:
● Amount of Noise vs. Image Set Size
● Scene Diversity

Noise vs. Image Set Size
● Time spent generating a training set is roughly linear in the number of images times the samples per pixel
● Twice as many images can be rendered if each uses half as many samples per pixel
● Generally, a larger number of views per scene is more desirable than noiseless images
● Excessive noise also hinders network training

Scene Diversity
● Diversity of scenes also has a major influence on the quality of trained Deep Shaders
● Directional Occlusion was sensitive to scene diversity and was used to investigate this trade-off
● Increasing the number of scenes from 1 to 5 results in a 5% increase in performance
● Going from 5 scenes to 10 yields only a 1% increase in performance
● For DO, the difference in loss translates visually to a more correct placement of darkening

Results
● SSIM: Metric to evaluate similarity between images
  ○ -1 to 1 range, 1 is best
  ○ Part of the perceptual loss

Results
● Ambient