Daala: Building a Next-Generation Video Codec from Unconventional Technology
Jean-Marc Valin, Timothy B. Terriberry, Nathan E. Egge, Thomas J. Daede, Yushin Cho, Christopher Montgomery, Michael Bebenita

The Daala Project
Project goals:
● Royalty-free video codec
● Replacing traditional tools with new/uncommon ones
● Exploring new ideas without constraints
Effort is now part of the Alliance for Open Media's (AOM) AV1 codec

Techniques
Main Daala techniques:
● Lapped transforms
● Overlapped-block motion compensation (OBMC)
● Perceptual vector quantization (PVQ)
● Chroma from luma (CfL) prediction
● Haar DC
● Multi-symbol entropy coding
● Deringing filter
Legend: In AV1 / Considered for AV1 / Not considered for AV1

Deringing Filter
Overview:
● Conditional replacement filter
● Directional 35-tap (5x7) separable filter
● Decoder-side direction estimation
● Computed on 8x8 blocks
● Strength signaled on superblocks (64x64)
● Completely vectorizable (SIMD)

Conditional replacement filter
                                                      Filtered
Input                         26  8 22 25 24 23 80    Average = 29
Replacement mask for T = 5     1  0  1  1  1  1  0
Input after replacement       26 25 22 25 24 23 25    Average = 24

Directional filtering:
● 7-tap filter along direction, weights: [1 2 3 (4) 3 2 1]
● 5-tap filter across lines (lower threshold), weights: [1 1 (1) 1 1]

OBMC
● Blending based on 4-8 mesh
● Only double MVs for each level

Chroma from Luma (CfL)
Luma and chroma are highly correlated, so we can predict chroma from luma
● Chroma shape is predicted from luma
● Code gain and sign
● Luma optionally down-sampled (4:2:0)

Multi-Symbol Entropy Coder
Entropy decoding is a (serial) bottleneck in video decoding. We can reduce the cost by increasing the alphabet size and coding fewer symbols. Daala uses alphabet sizes up to 16.

Perceptual Vector Quantization (PVQ)
Gain-shape vector quantization based on a spherical projection of the pyramid vector quantizer in N dimensions
● Gain = contrast
● Shape = details
● Number of pulses K based on gain
[Figure panels: K=3 (5.25 bits), K=4 (6.04 bits), K=6 (7.19 bits), K=8 (8.01 bits)]
Using prediction:
● Subtracting prediction from input
● Transform input using prediction
● Use Householder reflection to align prediction with one axis
● Introduce angle θ between input and prediction
● Code sphere in N-1 dimensions
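
The bit counts quoted for K = 3, 4, 6 and 8 can be checked by counting the codewords of a pyramid vector quantizer, that is, the integer vectors whose absolute values sum to K. The sketch below uses the standard counting recurrence. The poster does not state the dimension used in its figure; N = 3 is an inference from the fact that it reproduces the quoted numbers. The function names are illustrative and are not Daala APIs.

```python
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def pvq_codewords(n: int, k: int) -> int:
    """Count integer vectors of dimension n whose absolute values sum to k."""
    if k == 0:
        return 1  # only the all-zero vector
    if n == 0:
        return 0  # no dimensions left to hold the remaining pulses
    # Standard pyramid-codebook recurrence:
    # V(n, k) = V(n-1, k) + V(n, k-1) + V(n-1, k-1)
    return (pvq_codewords(n - 1, k)
            + pvq_codewords(n, k - 1)
            + pvq_codewords(n - 1, k - 1))


def pvq_bits(n: int, k: int) -> float:
    """Bits needed to index one codeword, assuming all are equally likely."""
    return log2(pvq_codewords(n, k))


if __name__ == "__main__":
    # N = 3 is an assumption inferred from the poster's bit counts.
    for k in (3, 4, 6, 8):
        print(f"K={k}: {pvq_codewords(3, k)} codewords, {pvq_bits(3, k):.2f} bits")
```

With N = 3 this gives 38, 66, 146 and 258 codewords, matching the 5.25, 6.04, 7.19 and 8.01 bits on the poster. Each extra pulse buys finer resolution of the shape at a cost that grows only logarithmically in the codebook size.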

Lapped Transforms
4x4 to 64x64 DCTs, with 4-point lapping
Advantages:
● No blocking artefacts
● Better energy compaction
● Perfect reconstruction
Disadvantages:
● Cannot use traditional intra prediction
● Have to use fixed size lapping (search)

Haar DC
Hierarchically code DC coefficients using Haar transform
Compensate for lack of intra predictor

AOM's new AV1 codec is based on:
● All of VP9 (Google)
● Parts of Daala (Xiph.Org/Mozilla)
● Parts of Thor (Cisco)
● New contributions

Results
Progress over the past 3 years
[Figure: plot labels HQ YouTube, LQ Video Conference, H.265; month markers Jan, Feb, Apr, May, Jun, Nov; up and left is better]

Perceptual Vector Quantization (PVQ), continued
[Figure labels: Input, Prediction, θ]
● Optional no reference coding
PVQ can take advantage of contrast masking
● Quantize companded gain
● Better resolution for small gain
● Can be done with no signaling
Split coefficients in bands (4x4, 8x8, 16x16)
● DC coded separately
● Recursive subdivision
● Split per octave
● Split per direction

Deringing Filter, continued
Direction estimation:
● Minimize error compared to directional line averages
● Fast, vectorizable algebraic simplifications
[Figure labels: Second filter, Filtered pixel, Effective spatial support]
Threshold based on signaled global threshold, signaled superblock adjustment, non-signaled block variance measurement
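
The conditional replacement example from the deringing filter (input 26 8 22 25 24 23 80, T = 5) can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not Daala's implementation: the pixel being filtered is taken to be the middle one (25), a neighbour is kept when it differs from that pixel by less than T, the taps are unweighted as in the poster's example rather than using the [1 2 3 (4) 3 2 1] weights, and the average is truncated to an integer because that matches the poster's 29 and 24. The function name is hypothetical.

```python
def conditional_replacement(pixels, center, threshold):
    """Replace outlier taps by the center pixel, then average.

    Returns (mask, replaced, average). A mask entry of 1 means the tap
    is kept; 0 means it is replaced by the center pixel value.
    """
    c = pixels[center]
    # Assumption: keep a tap when |tap - center| < threshold.
    mask = [1 if abs(p - c) < threshold else 0 for p in pixels]
    replaced = [p if keep else c for p, keep in zip(pixels, mask)]
    # Assumption: flat weights and truncating integer division.
    average = sum(replaced) // len(replaced)
    return mask, replaced, average


if __name__ == "__main__":
    taps = [26, 8, 22, 25, 24, 23, 80]
    print("plain average:", sum(taps) // len(taps))
    mask, replaced, avg = conditional_replacement(taps, center=3, threshold=5)
    print("mask:", mask)
    print("after replacement:", replaced)
    print("filtered average:", avg)
```

The plain average (29) is pulled up by the outlier 80, which lies across an edge. After replacement the average is 24, close to the center pixel's neighbourhood, which is how the filter smooths ringing without blurring edges.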