100X Evolution of Video Codec Chips

Tribute to Prof. Goto
Jinjia Zhou1, Dajiang Zhou2, Satoshi Goto2
1Hosei University, Tokyo, Japan
2Waseda University, Kitakyushu, Japan

Prof. Goto was my supervisor from 2008 to 2015 (M.S. -> Ph.D. -> PDF).

Prof. S. Goto’s Video Coding Research Group
►One of the first Full-HD H.264 encoders, first to use SiS DRAM (VLSI’07 and JSSC’09)
►First 4Kx2K@60fps H.264 decoder (VLSI’10)
►First 8Kx4K H.264 decoder (ISSCC’12)
►First 8Kx4K H.264 (intra-frame) encoder (VLSI’12)
►First 8Kx4K H.264 ME encoder (VLSI’13 and JSSC’14)
►First 8Kx4K HEVC decoder (ISSCC’16 and JSSC’16)

Mass media

Video codec: encoder and decoder
[Diagram: Video camera -> source video data (100%) -> Encoder (compression) -> compressed video stream (~1%) -> Transmit. -> Channel/Storage -> Receiver -> compressed video stream (~1%) -> Decoder (decompression) -> restored video data (100%) -> Display device]

Applications of video codec chips
[Diagram: an Enc./Dec. chip surrounded by application areas (TV conference, surveillance, automotive, home entertainment, mobile/portable, ...) and their requirements (small frame delay, ultra-low power, high compression, high video quality, free-viewpoint, ...)]
Source of the images:
http://www.artesanosdecastillalamancha.org/wp-content/uploads/2015/06/28.png
http://www.caradvice.com.au/67890/2011-brakes-camera-action-pedestrian-detection-automated-platooning/photos/

8K UHDTV and free-viewpoint TV
[Figure labels: 7680 pixels; ≥120fps; 25~30fps]

Video coding standards
[Timeline, 1990 to 2013, tick marks at 1990, 1995, 2000, 2005, 2010, 2013]
ITU-T standards: H.261, H.263 (+/++)
Joint ITU-T & MPEG standards: H.262/MPEG-2 (compression ratio ~50:1), H.264/MPEG-4 AVC (~100:1), H.265/HEVC (~200:1)
MPEG standards: MPEG-1, MPEG-4

High compression at high complexity
RAW: 48000 Mbps
H.264 (2003): ~480 Mbps
HEVC (2013): ~240 Mbps
►Recent powerful codecs address the huge video throughput in the communication channel
►Their high compression ratio, however, comes at the expense of high complexity

Complexity of video codecs (norm.)
[Chart, logarithmic axis from 0.1 to 1000: complexity/pixel, throughput, and overall complexity for 1080p/MPEG-2, 4K/H.264, and 8K/HEVC; labeled values: 1, 1, 20, 307.2]

Real-time 8K UHDTV codec systems
NHK 8K codec (2007)
NHK 8K encoder (2013)
Our target: Single chip/chipset

Memory bandwidth issue
►Performance bottleneck
  . >50 GBps BW required for decoding 8K UHDTV
  . >100 GBps BW required for encoding 8K UHDTV
►Power consumption
  . Majority of power consumed by DRAM traffic
  [Diagram: Codec <-> memory traffic <-> DRAM]
►Fabrication cost
  . BW determines chip pin count

Data dependency issue
►Video codecs exploit all kinds of data dependencies to strengthen compression
  . Inter-frame prediction
  . Intra-frame prediction
  . Context-adaptive entropy coding (CABAC)
►Data dependencies restrict the degree of efficient parallelism/pipelining
  . Power and area issues
  . Performance issue

Challenges summarized
[Block diagram of a video encoder: Source Frm., Transform & Quant., Entropy Coding, and an embedded Decoder (Inv. Trans. & Inv. Quant., Deblocking Filter, Intra Prediction, Motion Compensation, Frame Output), plus Reference Frames and Motion Estimation. Three challenges are marked on it: data dependencies, computational complexity, memory bandwidth requirements]

Our efforts to address the challenges
[Diagram: efforts arranged across design levels: System, Algorithm, Architecture, Circuits, Device, Evaluation]
►Reduce memory access / Increase memory bandwidth
  . Bus/interface optimization, 3DLSI
  . Processing order optimization
  . Embedded compression
  . 2-D cache
►Reduce complexity
  . Trade-off b/w time & quality
  . Hardware friendliness, …
►Alleviate data dependencies
  . Processing order optimization
  . Predictive execution, …

System integration
►FIFO vs RAM
  . FIFO: simple interface and flexibility
  . RAM: random accessibility for data reordering
►Proposed BIBO (block-in-block-out) queues: combine the benefits of the two

Word write & block push
[Diagram labels: Block, Word, BIBO queue]
►Words in a block can be written in a random order
►A block can be pushed after all its words are written
►Blocks follow first-in-first-out order
►Blocks can have variable sizes

Word read & block pull
[Diagram labels: BIBO queue, Merged]
►Words in a block can be read in a random order
►A block can be pulled after all its words are read
►Blocks can be pulled in sizes different from those in which they were pushed

Merge and split
►Block merging/splitting can be automated by BIBO, given that both word addressing and block scan follow a Z-scan order
[Diagram sequence: Push -> BIBO queue -> Pull; BIBO queue -> BIBO queue]

Implemented video codec chips from Goto’s Lab.
Source: http://www.f.waseda.jp/goto/html/chip.html

Video decoder demo: 4K@FPGA, 8K@chip

Performance of codec VLSI chips
[Bar chart, throughput in Mpixel/s, axis from 0 to 4000. Chips shown: MIT (ASSCC'08), Ours (ISSCC'12), Ours (VLSIC'12), NTU (VLSIC'13), MIT (ISSCC'13), NTT (VLSIC'15), Ours (ISSCC'16). Legend: H.264 decoder, HEVC decoder, H.264 encoder, HEVC encoder. Labeled values: 27.6, 249, 1990, 3981, with a 144x annotation (3981 / 27.6 ≈ 144)]

Shen Li
Thanks to all members who have contributed to the video codec chip design

Thank you!
[email protected]
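
The BIBO queue behaviour listed on the "Word write & block push", "Word read & block pull", and "Merge and split" slides can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the talk: the class name `BIBOQueue`, its methods, and the helper `z_index` are hypothetical, and the rule that a block is pulled only after all its words are read is left to the caller rather than enforced.

```python
from collections import deque


def z_index(x, y, bits=4):
    """Z-scan (Morton) index of (x, y): x bits go to even positions, y bits to odd."""
    idx = 0
    for b in range(bits):
        idx |= ((x >> b) & 1) << (2 * b)
        idx |= ((y >> b) & 1) << (2 * b + 1)
    return idx


class BIBOQueue:
    """Toy model of a block-in-block-out queue (hypothetical API, illustration only)."""

    def __init__(self):
        self._committed = deque()  # words of pushed blocks, in FIFO order
        self._staging = {}         # offset -> word for the block being written

    def __len__(self):
        return len(self._committed)

    def write_word(self, offset, word):
        # Words of the current block may arrive in any order.
        self._staging[offset] = word

    def push_block(self, size):
        # A block can be pushed only once all of its words are written.
        if sorted(self._staging) != list(range(size)):
            raise ValueError("block is incomplete or has out-of-range words")
        self._committed.extend(self._staging[i] for i in range(size))
        self._staging.clear()

    def read_word(self, offset):
        # Random access relative to the oldest word still in the queue.
        if not 0 <= offset < len(self._committed):
            raise IndexError("word not available yet")
        return self._committed[offset]

    def pull_block(self, size):
        # Release `size` words; the size need not match any pushed block.
        if size > len(self._committed):
            raise ValueError("not enough words in the queue")
        for _ in range(size):
            self._committed.popleft()


def demo():
    """Push four 2x2 blocks, then read them back as one merged 4x4 block."""
    q = BIBOQueue()
    # Blocks are pushed in Z-scan order: top-left, top-right, bottom-left, bottom-right.
    for bx, by in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        # Words inside a block are written in an arbitrary (here reversed) order.
        for x, y in [(1, 1), (0, 1), (1, 0), (0, 0)]:
            q.write_word(z_index(x, y), (2 * bx + x, 2 * by + y))
        q.push_block(4)
    # The reader addresses the 16 words as a single 4x4 block in Z-scan order.
    merged = [[q.read_word(z_index(x, y)) for x in range(4)] for y in range(4)]
    q.pull_block(16)
    return merged
```

In this model, merging needs no extra logic: because both the block order and the word addresses follow the Z-scan, the four 2x2 blocks land in the queue exactly where a 4x4 reader expects them, so `demo()[y][x]` holds the pixel coordinate `(x, y)`. Splitting works the same way in reverse, by pushing one large block and pulling several smaller ones.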
