
100x Evolution of Chips

Tribute to Prof. Goto

Jinjia Zhou¹, Dajiang Zhou², Satoshi Goto²
¹Hosei University, Tokyo; ²Waseda University, Kitakyushu, Japan

Prof. Goto was my supervisor from 2008 to 2015 (M.S. -> Ph.D. -> PDF).

Prof. S. Goto’s Video Coding Research Group

►One of the first Full-HD H.264 encoders; first to use SiS DRAM (VLSI’07 and JSSC’09)
►First 4Kx2K@60fps H.264 decoder (VLSI’10)
►First 8Kx4K H.264 decoder (ISSCC’12)
►First 8Kx4K H.264 (intra-frame) encoder (VLSI’12)
►First 8Kx4K H.264 encoder with motion estimation (ME) (VLSI’13 and JSSC’14)
►First 8Kx4K HEVC decoder (ISSCC’16 and JSSC’16)

Mass

Video codec: encoder and decoder

[Figure: video camera -> source video (100%) -> encoder (compression) -> compressed data stream (~1%) -> transmission channel / storage -> receiver -> decoder (decompression) -> restored video (100%) -> display device]

Applications of video codec chips

[Figure: an Enc./Dec. chip serving TV conferencing, surveillance, automotive, home entertainment, mobile/portable and free-viewpoint applications, with requirements such as small frame delay, ultra-low power and high compression]

Source of the images: http://www.artesanosdecastillalamancha.org/wp-content/uploads/2015/06/28.png, http://www.caradvice.com.au/67890/2011-brakes-camera-action-pedestrian-detection-automated-platooning/photos/

8K UHDTV and free-viewpoint TV

[Figure: 8K UHDTV frame, 7680 pixels wide; frame rates of 25~30 fps and ≥120 fps]

Video coding standards

Compression ratio: ~50:1 (H.262/MPEG-2), ~100:1 (H.264/MPEG-4 AVC), ~200:1 (H.265/HEVC)
ITU-T standards: H.261, H.263 (+/++)
Joint ITU-T & MPEG standards: H.262/MPEG-2, H.264/MPEG-4 AVC, H.265/HEVC
MPEG standards: MPEG-1, MPEG-4
Timeline: 1990 to 2013

High compression at high complexity

RAW: 48000 Mbps; H.264 (2003): ~480 Mbps; HEVC (2013): ~240 Mbps
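The RAW rate is consistent with 8K UHDTV (7680x4320) at 120 fps in 8-bit 4:2:0 format, i.e. 12 bits per pixel. Those format parameters are our assumption, not stated on the slide; the short Python sketch below reproduces the three figures under that assumption, using the ~100:1 and ~200:1 ratios implied by 48000/480 and 48000/240.

```python
# Reproduce the RAW / H.264 / HEVC bitrates, assuming (not from the slide)
# 8K UHDTV = 7680x4320 at 120 fps, 8-bit 4:2:0 sampling (12 bits per pixel).
WIDTH, HEIGHT, FPS = 7680, 4320, 120
BITS_PER_PIXEL = 12  # 8 bits luma + two quarter-size 8-bit chroma planes


def raw_bitrate_mbps(width: int, height: int, fps: int, bpp: int) -> float:
    """Uncompressed video bitrate in Mbit/s."""
    return width * height * fps * bpp / 1e6


raw = raw_bitrate_mbps(WIDTH, HEIGHT, FPS, BITS_PER_PIXEL)
h264 = raw / 100  # ~100:1 compression
hevc = raw / 200  # ~200:1 compression

print(f"pixel rate: {WIDTH * HEIGHT * FPS / 1e6:.0f} Mpixel/s")
print(f"RAW: {raw:.0f} Mbps, H.264: ~{h264:.0f} Mbps, HEVC: ~{hevc:.0f} Mbps")
```

This prints a RAW rate of about 47776 Mbps (~48000), ~478 Mbps for H.264 and ~239 Mbps for HEVC. The pixel rate of about 3981 Mpixel/s is the throughput a real-time 8K@120fps codec has to sustain.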

►Recent powerful video coding standards address the huge video throughput in the communication channel
►Their high compression ratio, however, comes at the expense of high complexity

Complexity of video codecs (normalized)

[Figure: normalized complexity/throughput and overall complexity (log scale, 0.1 to 1000) for 1080p/MPEG-2, 4K/H.264 and 8K/HEVC, with 1080p/MPEG-2 = 1; labeled values include 1, 20 and 307.2]

Real-time 8K UHDTV codec systems

NHK 8K codec (2007)

NHK 8K encoder (2013)

Our target: a single chip/chipset

Memory bandwidth issue

►Performance bottleneck
  . >50 GBps BW required for decoding 8K UHDTV
  . >100 GBps BW required for encoding 8K UHDTV

►Power consumption
  . Majority of power consumed by DRAM

►Fabrication cost
  . BW determines chip pin count

[Figure: memory traffic between the codec and DRAM]
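A rough calculation helps put the >50 GBps decoding figure in perspective. Assuming (our assumption, not stated on the slide) 8K at 120 fps in 8-bit 4:2:0 format, merely writing each decoded frame to DRAM once costs about 6 GB/s, so the quoted bandwidth corresponds to more than eight frame-equivalents of DRAM traffic per output frame. Attributing that excess to repeated reference-frame fetches is our interpretation.

```python
# Compare the ">50 GBps" decoding figure with the raw frame-data rate.
# Assumptions (not from the slides): 7680x4320, 120 fps, 4:2:0, 8-bit samples.
WIDTH, HEIGHT, FPS = 7680, 4320, 120
BYTES_PER_PIXEL = 1.5  # 1 luma byte + 0.5 byte of chroma per pixel in 4:2:0

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL  # bytes per decoded frame
frame_rate_gbps = frame_bytes * FPS / 1e9       # GB/s to write each frame once

decode_bw_gbps = 50.0                           # lower bound quoted on the slide
traffic_multiple = decode_bw_gbps / frame_rate_gbps

print(f"one pass over the frames: {frame_rate_gbps:.2f} GB/s")
print(f">50 GBps is about {traffic_multiple:.1f}x that rate")
```

Under these assumptions the script reports about 5.97 GB/s and a multiple of about 8.4.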

Data dependency issue

►Video codecs exploit all kinds of data dependencies to strengthen compression
  . Inter-frame prediction
  . Intra-frame prediction
  . Context-adaptive entropy coding (CABAC)

►Data dependencies restrict the degree of efficient parallelism/pipelining
  . Power and area issues
  . Performance issue
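To see how such dependencies cap parallelism, consider an assumed, illustrative rule in which a block can start only after its left, top-left, top and top-right neighbours finish, a pattern typical of intra and motion-vector prediction. The Python sketch below computes each block's earliest start step. The function names and the 16x16 block size in the usage note are our assumptions.

```python
def wavefront_schedule(cols: int, rows: int) -> list[list[int]]:
    """Earliest start step of each block, given the assumed dependency on the
    left, top-left, top and top-right neighbours (raster order is a valid
    evaluation order because all four neighbours precede the block)."""
    t = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            deps = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            ready = [t[dy][dx] + 1 for dx, dy in deps
                     if 0 <= dx < cols and 0 <= dy < rows]
            t[y][x] = max(ready, default=0)
    return t


def summary(cols: int, rows: int) -> tuple[int, int]:
    """Return (number of dependent steps, peak number of blocks ready at once)."""
    counts: dict[int, int] = {}
    for row in wavefront_schedule(cols, rows):
        for step in row:
            counts[step] = counts.get(step, 0) + 1
    return max(counts) + 1, max(counts.values())
```

For an 8K frame split into 480x270 blocks of 16x16 pixels, summary(480, 270) returns (1018, 240). The 129600 blocks still need 1018 dependent steps, and no more than 240 blocks are ever ready at the same time, however much hardware is available.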

Challenges summarized

[Figure: video encoder block diagram (transform & quant., entropy coding, inv. trans. & inv. quant., intra prediction, motion estimation, reference frames in frame memory) annotated with the three challenges: data dependencies, computational complexity and memory bandwidth requirements]

Our efforts to address the challenges
►Reduce memory access / increase memory bandwidth: bus/interface optimization, 3D LSI, processing order optimization, embedded compression, 2-D cache
►Reduce complexity: trade-off between time & quality, hardware friendliness, …
►Alleviate data dependencies: processing order optimization, predictive execution, …
(Design levels: system, algorithm, architecture, circuits, device, evaluation)

System integration

►FIFO vs RAM
  . FIFO: simple interface and flexibility
  . RAM: random accessibility for data reordering

►Proposed BIBO (block-in-block-out) queues: combine the benefits of both

Word write & block push

[Figure: words written into a block; blocks pushed into the BIBO queue]

►Words in a block can be written in a random order
►A block can be pushed after all its words are written
►Blocks follow first-in-first-out order
►Blocks can vary in size

Word read & block pull

[Figure: blocks pulled from the BIBO queue, including a merged block]

►Words in a block can be read in a random order
►A block can be pulled after all its words are read
►Blocks can be pulled in sizes different from those in which they were pushed
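The write/push and read/pull rules above can be captured in a small software model. The Python class below is our own illustrative sketch, not the chip's hardware interface: the names BIBOQueue, write, push, read and pull are hypothetical. It treats the queue as a FIFO of words, so blocks keep their first-in-first-out order while the reader may pull in sizes that differ from the pushed ones.

```python
from collections import deque


class BIBOQueue:
    """Toy software model of a block-in-block-out queue (illustrative only).

    Writer: words of the current block may be written at any address, in any
    order; push(size) succeeds only once addresses 0..size-1 are all written.
    Reader: words at the head of the queue may be read in any order;
    pull(size) succeeds only once addresses 0..size-1 have all been read.
    Blocks leave in FIFO order, and a pull size need not match a push size.
    """

    def __init__(self) -> None:
        self._fifo: deque = deque()    # words of pushed blocks, in order
        self._pending: dict = {}       # block being written: addr -> word
        self._read_addrs: set = set()  # addresses read in the head block

    def __len__(self) -> int:
        return len(self._fifo)

    # ---- writer side: word write & block push ----
    def write(self, addr: int, word) -> None:
        self._pending[addr] = word

    def push(self, size: int) -> None:
        if set(self._pending) != set(range(size)):
            raise RuntimeError("block not fully written")
        self._fifo.extend(self._pending[a] for a in range(size))
        self._pending = {}

    # ---- reader side: word read & block pull ----
    def read(self, addr: int):
        if not 0 <= addr < len(self._fifo):
            raise RuntimeError("word not pushed yet")
        self._read_addrs.add(addr)
        return self._fifo[addr]

    def pull(self, size: int) -> None:
        if size > len(self._fifo) or self._read_addrs != set(range(size)):
            raise RuntimeError("block not fully read")
        for _ in range(size):
            self._fifo.popleft()
        self._read_addrs = set()


# Example: push a 4-word and a 2-word block, then pull 2 + 4 words.
q = BIBOQueue()
for a in (3, 0, 2, 1):                    # random-order writes
    q.write(a, f"A{a}")
q.push(4)
for a in (1, 0):
    q.write(a, f"B{a}")
q.push(2)
print([q.read(a) for a in (1, 0)])        # ['A1', 'A0']
q.pull(2)                                 # split: first half of block A
print([q.read(a) for a in (3, 0, 2, 1)])  # ['B1', 'A2', 'B0', 'A3']
q.pull(4)                                 # merged: rest of A plus all of B
```

Modelling the queue as a word stream is a design choice for brevity; it makes the "pull in different sizes" rule fall out naturally, at the cost of ignoring the hardware's actual memory organization.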

Merge and split

►Block merging/splitting can be automated by BIBO, provided that both word addressing and block scan follow a Z-scan order

[Figure: merge and split in a BIBO queue, shown in push and pull steps]
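Why Z-scan ordering makes merging and splitting automatic can be checked in a few lines. In Z-scan order, the words of a 2Nx2N block are exactly the Z-scans of its four NxN sub-blocks concatenated in Z order, so four pushed NxN blocks can be pulled as one 2Nx2N block, and one pushed 2Nx2N block can be pulled as four NxN blocks, with no reordering. The sketch assumes power-of-two square blocks and puts x in the even Morton bits; the function names are illustrative.

```python
def morton(x: int, y: int, bits: int) -> int:
    """Z-scan (Morton) index: x bits go to even positions, y bits to odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_scan(size: int) -> list[tuple[int, int]]:
    """(x, y) coordinates of a size x size block (size = 2**k) in Z-scan order."""
    bits = size.bit_length() - 1
    coords = [(x, y) for y in range(size) for x in range(size)]
    return sorted(coords, key=lambda p: morton(p[0], p[1], bits))


def merged_from_quadrants(size: int) -> list[tuple[int, int]]:
    """Concatenate the Z-scans of the four size/2 sub-blocks, visited in Z order."""
    half = size // 2
    out: list[tuple[int, int]] = []
    for ox, oy in [(0, 0), (half, 0), (0, half), (half, half)]:
        out += [(ox + x, oy + y) for x, y in z_scan(half)]
    return out


# Four 4x4 blocks in Z order give exactly the word order of one 8x8 block.
print(merged_from_quadrants(8) == z_scan(8))  # True
```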

Implemented video codec chips from Goto’s Lab.

Source: http://www.f.waseda.jp/goto/html/chip.html
Demo: 4K on FPGA, 8K on chip

Performance of codec VLSI chips

[Figure: throughput (Mpixel/s) of H.264/HEVC decoder and encoder chips: MIT (ASSCC'08), Ours (ISSCC'12), Ours (VLSIC'12), NTU (VLSIC'13), MIT (ISSCC'13), NTT (VLSIC'15), Ours (ISSCC'16); labeled throughputs of 27.6, 249, 1990 and 3981 Mpixel/s, a 144x increase]

Shen Li

Thanks to all members who have contributed to the video codec chip design

Thank you!
