CELT Overview

CELT Overview

Overview of CELT Jean-Marc Valin Timothy B. Terriberry Gregory Maxwell CELT: Constrained Energy Lapped Transform ● Transform codec (MDCT, like MP3, Vorbis) – Short windows (5-22 ms), poor frequency resolution ● Explicitly code energy of each band of the signal – Coarse shape of sound preserved no matter what ● Code remaining details using vector quantization ● Variable time-frequency resolution for transients ● Now uses pitch post-filtering 2 Block Diagram Encoder Decoder Band Q1 energy band energy Pre- Post- Input Window MDCT / Q x -1 WOLA Output filter 3 residual MDCT filter Side information (period and gain) 3 "Lapped Transform" Modified DCT ● The normal DCT causes coding artifacts (sharp discontinuities) between blocks, easily audible ● The "Modified" DCT (MDCT) uses a decaying window to overlap multiple blocks – Same transform used in MP3, Vorbis, AAC, etc. – But with much smaller blocks, less overlap 4 "Constrained Energy" Critical Bands ● Group MDCT coefficients into bands approximating the critical bands (Bark scale) Bark Scale vs. CELT @ 48kHz, Frame Size=256 Bark CELT 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 20000 Frequency (Hz) 5 "Constrained Energy" Coding Band Energy ● Most important psychoacoustic lesson learned from Vorbis: Preserve the energy in each band ● Vorbis does this implicitly with its "floor curve" ● CELT codes the energy explicitly – Coarse energy (6 dB resolution), predicted from previous frame and from previous band – Fine energy, improves resolution where we have available bits, not predicted 6 Coding Band Shape ● After normalizing, each band is represented by an N-dimensional unit vector – Point on an N-dimensional sphere – Describes "shape" of energy within the band ● Code this shape using with vector quantization – Pyramid Vector Quantizer (Fischer, 1986) ● Vectors with integer coordinates whose magnitudes sum to K 7 Psychoacoustic Tricks ● Avoiding "birdie" artifacts – K may be small, giving a sparse spectrum > 8 kHz – Use N-D rotation to “spread” pulses ● Makes the signal sound more “noisy” (less tonal) ● Inverse rotation applied in the encoder ● Avoiding "pre-echo" artifacts – When a transient is detected, split the frame and do a smaller MDCT on each piece – Interleave the results and continue as normal 8 Time-Frequency Adjustment ● Finer transient control: TF resolution switching – Apply Hadamard transform in one band over multiple blocks – Forward transform: increase frequency resolution – Inverse transform: increase time resolution ● Two TF resolutions per frame – Decision coded per band – Handles mixed tonal/transient content 9 Post-Filter ● Proposed by Raymond Chen (Broadcom) ● Noise shaping for highly harmonic contents Prefilter Postfilter 10 Questions? 11 .

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    11 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us