100x Evolution of Video Codec Chips
Tribute to Prof. Goto
Jinjia Zhou1, Dajiang Zhou2, Satoshi Goto2
1Hosei University, Tokyo, Japan
2Waseda University, Kitakyushu, Japan
Prof. Goto was my supervisor from 2008 to 2015 (M.S. -> Ph.D. -> postdoctoral fellow).
Prof. S. Goto’s Video Coding Research Group
►One of the first Full-HD H.264 encoders, first to use SiS DRAM (VLSI’07 and JSSC’09)
►First 4Kx2K@60fps H.264 decoder (VLSI’10)
►First 8Kx4K H.264 decoder (ISSCC’12)
►First 8Kx4K H.264 (intra-frame) encoder (VLSI’12)
►First 8Kx4K H.264 ME encoder (VLSI’13 and JSSC’14)
►First 8Kx4K HEVC decoder (ISSCC’16 and JSSC’16)
Mass media
Video codec: encoder and decoder
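Encoder side: video camera -> source video data stream (100%) -> encoder (compression) -> compressed video stream (~1%) -> transmitter -> channel/storage (~1%)
Decoder side: channel/storage -> receiver -> compressed video stream (~1%) -> decoder (decompression) -> restored video data stream (100%) -> display device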
Applications of video codec chips
Applications of the Enc./Dec. chip: TV conference, surveillance, automotive, home entertainment, mobile/portable, ...
Requirements: small frame delay, ultra-low power, high compression, high video quality, free-viewpoint video, ...
Source of the images:
http://www.artesanosdecastillalamancha.org/wp-content/uploads/2015/06/28.png
http://www.caradvice.com.au/67890/2011-brakes-camera-action-pedestrian-detection-automated-platooning/photos/
8K UHDTV and free-viewpoint TV
[Figure: 8K UHDTV frame, 7680 pixels wide; frame rate ≥120fps versus 25~30fps]
Video coding standards
Compression ratio: ~50:1 (H.262/MPEG-2), ~100:1 (H.264/MPEG-4 AVC), ~200:1 (H.265/HEVC)
ITU-T standards: H.261, H.263 (+/++)
Joint ITU-T & MPEG standards: H.262/MPEG-2, H.264/MPEG-4 AVC, H.265/HEVC
MPEG standards: MPEG-1, MPEG-4
Timeline axis: 1990, 1995, 2000, 2005, 2010, 2013
High compression at high complexity
RAW: 48000Mbps; H.264 (2003): ~480Mbps; HEVC (2013): ~240Mbps
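The three bit rates on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: 7680x4320 frames at 120fps (the slides give the 7680-pixel width and ≥120fps), 8-bit samples with 4:2:0 chroma subsampling (12 bits per pixel, not stated on the slide), and the ~100:1 and ~200:1 compression ratios quoted for H.264 and HEVC. The function names are hypothetical.

```python
# Assumptions (not all stated on the slide): 8K UHDTV at 7680x4320, 120 fps,
# 8-bit samples with 4:2:0 chroma subsampling (12 bits per pixel on average).
WIDTH, HEIGHT = 7680, 4320
FPS = 120
BITS_PER_PIXEL = 12  # 8-bit luma plus two quarter-resolution 8-bit chroma planes


def raw_bitrate_mbps(width, height, fps, bits_per_pixel):
    """Uncompressed bit rate in Mbps (1 Mbps = 10**6 bits per second)."""
    return width * height * fps * bits_per_pixel / 1e6


def compressed_bitrate_mbps(raw_mbps, ratio):
    """Bit rate after compression at the given ratio (e.g. 100 for 100:1)."""
    return raw_mbps / ratio


raw = raw_bitrate_mbps(WIDTH, HEIGHT, FPS, BITS_PER_PIXEL)
print(f"raw:            {raw:8.0f} Mbps")
print(f"H.264 (~100:1): {compressed_bitrate_mbps(raw, 100):8.0f} Mbps")
print(f"HEVC  (~200:1): {compressed_bitrate_mbps(raw, 200):8.0f} Mbps")
```

Under these assumptions the raw rate comes out just under 48000Mbps, and the compressed rates land close to the ~480Mbps and ~240Mbps shown on the slide.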
►Recent powerful codecs address the huge video throughput in the communication channel
►Their high compression ratio, however, comes at the expense of high complexity
Complexity of video codecs (norm.)
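[Figure: log-scale bar chart of complexity/pixel, throughput, and overall complexity for 1080p/MPEG-2, 4K/H.264, and 8K/HEVC. Axis ticks and bar labels recovered from the slide: 1000, 307.2, 100, 20, 10, 1, 1, 0.1]
Real-time 8K UHDTV codec systems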
NHK 8K codec (2007)
NHK 8K encoder (2013)
Our target: Single chip/chipset
Memory bandwidth issue
►Performance bottleneck
 . >50GBps BW required for decoding 8K UHDTV
 . >100GBps BW required for encoding 8K UHDTV
►Power consumption
 . Majority of power consumed by DRAM traffic
►Fabrication cost
 . BW determines chip pin count
[Figure: memory traffic between the codec and DRAM]
Data dependency issue
►Video codecs exploit all kinds of data dependencies to strengthen compression
 . Inter-frame prediction
 . Intra-frame prediction
 . Context-adaptive entropy coding (CABAC)
►Data dependencies restrict the degree of efficient parallelism/pipelining
 . Power and area issues
 . Performance issue
Challenges summarized
[Figure: video encoder block diagram. Blocks: Source Frm., Transform & Quant., Entropy Coding, Decoder, Inv. Trans. & Inv. Quant., Deblocking Filter, Intra Prediction, Frame Output, Motion Compensation, Reference Frames, Motion Estimation. Challenges marked on the diagram: data dependencies, computational complexity, memory bandwidth requirements]
Our efforts to address the challenges
►Reduce memory access / Increase memory bandwidth
 . Bus/interface optimization, 3DLSI
 . Processing order optimization
 . Embedded compression
 . 2-D cache
►Reduce complexity
 . Trade-off b/w time & quality
 . Hardware friendliness, …
►Alleviate data dependencies
 . Processing order optimization
 . Predictive execution, …
Design levels shown alongside: System, Algorithm, Architecture, Circuits, Device, Evaluation
System integration
►FIFO vs RAM
 . FIFO: simple interface and flexibility
 . RAM: random accessibility for data reordering
►Proposed BIBO (Block-in-block-out) queues: combine the benefits of the two
Word write & block push
[Figure: words written into a block of the BIBO queue]
►Words in a block can be written in a random order
►A block can be pushed after all words are written
►Blocks follow first-in-first-out order
►Blocks can be of variable size
Word read & block pull
[Figure: merged blocks read from the BIBO queue]
►Words in a block can be read in a random order
►A block can be pulled after all words are read
►Blocks can be pulled in sizes different from those in which they were pushed
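The push and pull rules listed above can be modelled in software. The following is a minimal behavioural sketch, not the chip's actual design: the class name `BiboQueue`, its method names, and the flat word buffer are all hypothetical, and `None` is used as an "unwritten" marker, so `None` itself cannot be stored as a word.

```python
class BiboQueue:
    """Toy software model of a block-in-block-out (BIBO) queue.

    Hypothetical API: names and the flat word buffer are illustrative
    assumptions, not the hardware interface.
    """

    def __init__(self):
        self._words = []         # committed words, oldest first
        self._write_blk = None   # block currently being filled, if any
        self._read_size = 0      # size of the block currently open for reading
        self._read_seen = set()  # offsets already read in that block

    # ---- producer side ----
    def open_write(self, size):
        """Start a new block of `size` words (blocks may vary in size)."""
        if self._write_blk is not None:
            raise RuntimeError("previous block has not been pushed yet")
        self._write_blk = [None] * size

    def write_word(self, offset, value):
        """Write one word at any offset inside the open block."""
        if not 0 <= offset < len(self._write_blk):
            raise IndexError("offset outside the open block")
        self._write_blk[offset] = value

    def push_block(self):
        """Commit the block; allowed only once every word has been written."""
        if any(w is None for w in self._write_blk):
            raise RuntimeError("block has unwritten words")
        self._words.extend(self._write_blk)  # blocks leave in FIFO order
        self._write_blk = None

    # ---- consumer side ----
    def open_read(self, size):
        """Open the oldest `size` committed words as one block.

        `size` need not match any pushed block size (merge or split).
        """
        if size > len(self._words):
            raise RuntimeError("not enough pushed data for this block size")
        self._read_size = size
        self._read_seen = set()

    def read_word(self, offset):
        """Read one word at any offset inside the open block."""
        if not 0 <= offset < self._read_size:
            raise IndexError("offset outside the open block")
        self._read_seen.add(offset)
        return self._words[offset]

    def pull_block(self):
        """Release the block; allowed only once every word has been read."""
        if len(self._read_seen) != self._read_size:
            raise RuntimeError("block has unread words")
        del self._words[:self._read_size]
        self._read_size = 0


q = BiboQueue()
for base in (0, 4):            # push two 4-word blocks
    q.open_write(4)
    for off in (2, 0, 3, 1):   # random write order within the block
        q.write_word(off, base + off)
    q.push_block()

q.open_read(8)                 # pull both as one merged 8-word block
merged = [q.read_word(off) for off in reversed(range(8))]
q.pull_block()
print(merged)  # [7, 6, 5, 4, 3, 2, 1, 0]
```

In this model the consumer chooses its own block size, which is how two pushed 4-word blocks are read back as one 8-word block; a hardware queue would additionally need a capacity limit and back-pressure, which the sketch omits.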
Merge and split
[Figure: BIBO queue merging and splitting blocks]
►Block merging/splitting can be automated by BIBO, provided that both word addressing and block scan follow a Z-scan order
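Why a Z-scan order makes merging and splitting automatic can be seen with a small example. The sketch below assumes that "Z-scan order" means the usual bit-interleaved (Morton) order used for block scanning in H.264/HEVC, and that blocks are square, power-of-two sized, and aligned; the helper names are hypothetical. Under those assumptions every aligned block occupies one contiguous range of word indices, so adjacent small blocks concatenate into a larger block with no reordering.

```python
def zscan_index(x, y, bits=16):
    """Interleave the bits of (x, y) into a Z-scan (Morton) index."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)      # x bits go to even positions
        idx |= ((y >> b) & 1) << (2 * b + 1)  # y bits go to odd positions
    return idx


def block_indices(x0, y0, size):
    """Sorted Z-scan indices of the words in a size x size block at (x0, y0)."""
    return sorted(
        zscan_index(x, y)
        for y in range(y0, y0 + size)
        for x in range(x0, x0 + size)
    )


# Four 2x2 blocks visited in Z-scan block order:
# top-left, top-right, bottom-left, bottom-right.
for bx, by in [(0, 0), (2, 0), (0, 2), (2, 2)]:
    print((bx, by), block_indices(bx, by, 2))

# The same 16 words viewed as a single 4x4 block.
print(block_indices(0, 0, 4))
```

The four 2x2 blocks cover indices 0-3, 4-7, 8-11, and 12-15, which together are exactly the index range of the enclosing 4x4 block. A queue that stores words by Z-scan index can therefore hand out the same data as four small blocks or one large block.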
[Figure: animation frames showing blocks pushed into and pulled from the BIBO queue with merging and splitting]
Implemented video codec chips from Goto’s Lab.
Source: http://www.f.waseda.jp/goto/html/chip.html
Video decoder demo: 4K@FPGA, 8K@chip
Performance of codec VLSI chips
[Figure: bar chart of throughput in Mpixel/s (axis 0 to 4000) for H.264 decoder, HEVC decoder, H.264 encoder, and HEVC encoder chips: MIT (ASSCC'08), Ours (ISSCC'12), Ours (VLSIC'12), NTU (VLSIC'13), MIT (ISSCC'13), NTT (VLSIC'15), Ours (ISSCC'16). Labelled values: 27.6, 249, 1990, 3981; annotated improvement: 144x]
Shen Li
Thanks to all members who have contributed to the video codec chip design
Thank you! [email protected]