
Multimedia-Systems: Compression

Prof. Dr.-Ing. Ralf Steinmetz Prof. Dr. Max Mühlhäuser

© R. Steinmetz, M. Mühlhäuser

MM: TU Darmstadt - Darmstadt University of Technology, Dept. of Computer Science, TK - Telecooperation, Alexanderstr. 6, D-64283 Darmstadt, Germany, Tel. +49 6151 16-3709, Fax +49 6151 16-3052, [email protected], http://www.tk.informatik.tu-darmstadt.de

RS: TU Darmstadt - Darmstadt University of Technology, Dept. of Electrical Engineering and Information Technology, Dept. of Computer Science, KOM - Industrial Process and System Communications, Merckstr. 25, D-64283 Darmstadt, Germany, Tel. +49 6151 166151, Fax +49 6151 166152, [email protected], http://www.kom.e-technik.tu-darmstadt.de; GMD - German National Research Center for Information Technology; httc - Hessian Telemedia Technology Competence-Center e.V.

05A-compression.fm, 15.March.01

Scope

[Figure: layer model of multimedia systems]
• Usage: Applications, Learning & Teaching, Design, User Interfaces
• Services: Content Processing, Documents, Security, ..., Synchronization, Group Communications
• Systems: Databases, Programming, Media-Server, Operating Systems, Communications, Opt. Memories, Quality of Service, Networks
• Basics: Compression, Computer Architectures, Image & Graphics, Animation, Audio

Contents

1. Motivation
2. Requirements - General
3. Fundamentals - Categories
4. Source Coding
5. Entropy Coding
6. Hybrid Coding: Basic Encoding Steps
7. JPEG
8. H.261 and related ITU Standards
9. MPEG-1
10. MPEG-2
11. MPEG-4
12. Wavelets
13. Fractal
14. Basic Audio and Speech Coding Schemes
15. Conclusion

1. Motivation

Uncompressed digital media in computing require, for example:
• Text:
  • 1 page with 80 char/line, 64 lines/page and 2 Byte/char
  • 80 x 64 x 2 x 8 = 80 kBit/page
• Image:
  • 24 Bit/Pixel, 512 x 512 Pixel/image
  • 512 x 512 x 24 = 6 MBit/image
• Audio:
  • CD quality, sample rate 44.1 kHz, 16 Bit/sample
  • Mono: 44.1 x 16 = 706 kBit/s
  • Stereo: 1.412 MBit/s
• Video:
  • full frames with 1024 x 1024 Pixel/frame, 24 Bit/Pixel, 30 frames/s
  • 1024 x 1024 x 24 x 30 = 720 MBit/s
  • more realistic: 360 x 240 Pixel/frame = 60 MBit/s

Hence compression is NECESSARY.
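The figures on this slide can be re-derived with a few lines of arithmetic. The sketch below is only an illustration; the function names are hypothetical. Note that the rounded values "80 kBit", "6 MBit" and "720 MBit/s" come out exactly only with binary prefixes (1 k = 1024, 1 M = 1024 x 1024), which is an assumption about how the slide was computed, while the audio figures use decimal prefixes.

```python
# Raw (uncompressed) data sizes and rates for the examples on the slide.
KI = 1024            # binary "kilo" (assumed for the text figure)
MI = 1024 * 1024     # binary "mega" (assumed for the image and video figures)


def text_page_bits(chars_per_line=80, lines_per_page=64, bytes_per_char=2):
    """Bits needed for one page of text."""
    return chars_per_line * lines_per_page * bytes_per_char * 8


def image_bits(width=512, height=512, bits_per_pixel=24):
    """Bits needed for one still image."""
    return width * height * bits_per_pixel


def audio_bits_per_second(sample_rate_hz=44_100, bits_per_sample=16, channels=1):
    """Bit rate of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def video_bits_per_second(width, height, bits_per_pixel=24, frames_per_second=30):
    """Bit rate of uncompressed video."""
    return width * height * bits_per_pixel * frames_per_second


if __name__ == "__main__":
    print("text page :", text_page_bits() / KI, "kBit")
    print("image     :", image_bits() / MI, "MBit")
    print("audio mono:", audio_bits_per_second() / 1000, "kBit/s")
    print("audio 2ch :", audio_bits_per_second(channels=2) / 1e6, "MBit/s")
    print("video full:", video_bits_per_second(1024, 1024) / MI, "MBit/s")
    print("video small:", video_bits_per_second(360, 240) / 1e6, "MBit/s")
```

The smaller video format gives 62.2 x 10^6 bit/s, which the slide rounds to 60 MBit/s.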

2. Requirements - General

Requirements on compression:
• low delay
• intrinsic scalability
• high quality
• low complexity (e.g., ease of decoding)
• efficient implementation (e.g., memory req.)

Requirements

DIALOGUE AND RETRIEVAL mode requirements:
• Independence of frame size and video frame rate
• Synchronization of audio, video, and other media

DIALOGUE mode requirements:
• Compression and decompression in real-time (e.g. 25 frames/s)
• End-to-end delay < 150 ms

RETRIEVAL mode requirements:
• Fast forward and backward data retrieval
• Random access within 1/2 s

Software and/or hardware-assisted implementation requirements

3. Fundamentals - Categories

• entropy coding
  - ignoring semantics of the data
  - lossless
• source coding
  - based on semantics of the data
  - often lossy
• channel coding
  - adaptation to communication channel
  - introduction of redundancy
• hybrid coding
  - entropy and source coding
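The three basic categories can be contrasted with very small examples. The sketch below is an illustration under assumptions, not the schemes treated later in the lecture: run-length coding stands in for a lossless scheme that ignores what the data means, uniform quantization stands in for a lossy step that exploits knowledge about the signal, and a single even-parity bit stands in for redundancy added by channel coding. All function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(data):
    """Lossless: replace runs of equal byte values by (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(data)]


def run_length_decode(pairs):
    """Inverse of run_length_encode; restores the input exactly."""
    return bytes(value for value, count in pairs for _ in range(count))


def quantize(samples, step):
    """Lossy: map each sample to the nearest multiple of `step`."""
    return [round(s / step) * step for s in samples]


def add_even_parity(bits):
    """Channel coding: append one redundant bit so the number of ones is even."""
    return bits + [sum(bits) % 2]


def parity_ok(bits_with_parity):
    """Detects any single bit error introduced on the channel."""
    return sum(bits_with_parity) % 2 == 0


if __name__ == "__main__":
    raw = b"aaaabbbcc"
    packed = run_length_encode(raw)
    print(packed, run_length_decode(packed) == raw)
    print(quantize([3, 13, 18], step=8))
    word = add_even_parity([1, 0, 1, 1])
    print(word, parity_ok(word))
```

A hybrid coder in the sense of the slide would chain the first two ideas: a lossy source coding step followed by lossless entropy coding of its output.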

H.263

Differences of H.263 compared to H.261
• motion vector (mv) may point forward in time (future interframe), cf. MPEG, for video
• optional PB-frames (2 combined pictures: 1 B- & 1 P-frame)
• optional overlapped block motion compensation
• optional motion vector pointing outside image
• half pel motion compensation (instead of full pel)
• JPEG is the still picture mode
• no included error detection and correction
• unlimited search space for motion vector --> fast encoder can do better
• ...
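Half pel motion compensation means that a motion vector may address positions halfway between two pixels, so the predicted sample has to be interpolated. The sketch below shows this with bilinear averaging. The rounding rule (add 1 before halving, add 2 before dividing by 4) and the function name are assumptions of this example; the normative rule should be taken from the recommendation itself. Handling of positions at the image border is omitted.

```python
def half_pel_sample(frame, x2, y2):
    """Return the sample at position (x2 / 2, y2 / 2).

    `frame` is a list of rows of integer luminance values; x2 and y2 are
    coordinates in half-pel units, so even values hit a real pixel.
    """
    x, y = x2 // 2, y2 // 2
    frac_x, frac_y = x2 % 2, y2 % 2
    a = frame[y][x]
    if not frac_x and not frac_y:
        return a                                   # full-pel position
    if frac_x and not frac_y:
        return (a + frame[y][x + 1] + 1) // 2      # between two horizontal neighbours
    if frac_y and not frac_x:
        return (a + frame[y + 1][x] + 1) // 2      # between two vertical neighbours
    # centre of a 2 x 2 neighbourhood
    return (a + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4


if __name__ == "__main__":
    frame = [[10, 20],
             [30, 40]]
    for pos in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(pos, half_pel_sample(frame, *pos))
```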

H.320, H.32x Family

H.320
• specifies (as overview) videophone for ISDN
H.310
• adapt MPEG-2 for communication over B-ISDN (ATM)
H.321
• define videoconferencing terminal for B-ISDN (instead of N-ISDN)
H.322
• adapt H.320 for guaranteed QoS LANs (like ISO-Ethernet)
H.323
• videoconferencing over non-guaranteed LANs
H.324
• terminal for low bit rate communication (over V.34 modems)

9. MPEG-1

Moving Picture Experts Group (MPEG)
• ISO/IEC working group(s)
• ISO/IEC JTC1/SC29/WG11
• ISO IS 11172 since 3/93
Starting point: MPEG-1
• Audio/video at about 1.5 Mbit/s
• Based on experiences with JPEG and H.261
Follow-up standards
• MPEG-2
• MPEG-4
• MPEG-7
• MPEG-21

MPEG - Features

MPEG consists of three parts:
• audio: coding of the audio data stream
• video: coding of the video data stream
• system: combined stream, common buffer management
Consideration of other standards:
• JPEG
• H.261
Symmetric and asymmetric compression
Constant data rate, should be < 1856 kbit/s
Original target rate ~ 1.2 Mbit/s including audio (= 1x CD-ROM: 150 kByte/s)

MPEG - Video: Preparation Step

Fixed image format
Color subsampling:
• Y, Cr, Cb
• 4:2:0
Resolution:
• Should be at most 768 x 576 pixel
• 8 bit/pixel in each layer (i.e., for Y, Cr, Cb)
• 14 pixel aspect ratios
• 8 frame rates
No user-defined MCU (unlike JPEG)
No progressive mode (unlike JPEG)
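With 4:2:0 subsampling each 2 x 2 group of pixels keeps its four Y samples but shares one Cr and one Cb sample, so the average cost drops from 24 to 12 bit/pixel at 8 bit per sample. The sketch below illustrates this on one chroma plane. Averaging the four values is only one possible downsampling filter and is an assumption here, as are the function names; even plane dimensions are assumed as well.

```python
def subsample_420(plane):
    """Reduce a chroma plane by 2 in both directions by averaging 2 x 2 blocks.

    `plane` is a list of equally long rows of integers with even width and height.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def bits_per_pixel_420(bits_per_sample=8):
    """Average bits per pixel: one Y per pixel plus one Cr and one Cb per 4 pixels."""
    return bits_per_sample + 2 * bits_per_sample / 4


if __name__ == "__main__":
    cr = [[10, 20, 30, 40],
          [10, 20, 30, 40]]
    print(subsample_420(cr))
    print(bits_per_pixel_420(), "bit/pixel")
```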

MPEG - Video: Processing Step

4 types of frames:
I-frames (intra-coded frames):
• Like JPEG
• Real-time decoding demands
P-frames (predictive coded frames):
• Reference to previous I- or P-frames
• Motion vector
• MPEG does not define how to determine the motion vector
• difference of similar macro blocks is DCT coded
• DC and AC coefficients are runlength coded
B-frames (bi-directional predictive coded frames):
• Reference to previous and subsequent (I or P) frames
• Interpolation between macro blocks
D-frames (DC-coded frames):
• Only the DC coefficients of the DCT are coded
• For fast forward and rewind
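Since MPEG leaves the search for the motion vector to the encoder, any matching strategy may be used. The sketch below shows one common choice, an exhaustive block search that minimises the sum of absolute differences (SAD). It is not prescribed by the standard; the function names, the tiny block size and the search range are assumptions chosen to keep the example readable. The returned vector points from the block in the current frame to the best matching block in the reference frame.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def find_motion_vector(cur, ref, bx, by, n=2, search=1):
    """Try every displacement within +/- `search` pixels; return (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue  # candidate block would leave the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


if __name__ == "__main__":
    ref = [[0, 0, 0, 0],
           [0, 9, 8, 0],
           [0, 7, 6, 0],
           [0, 0, 0, 0]]
    # Same content, moved one pixel to the right.
    cur = [[0, 0, 0, 0],
           [0, 0, 9, 8],
           [0, 0, 7, 6],
           [0, 0, 0, 0]]
    print(find_motion_vector(cur, ref, bx=2, by=1))
```

A cost of zero means the prediction is exact; otherwise the remaining difference block is what gets DCT coded.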

Further Improvements

Quadtree partitioning:
• Problem:
  • fixed 8*8 blocks do not reflect image properties
• Solution:
  • flexible partition of image into larger or smaller squares
  • driven by image structure
Partitioning into rectangles and triangles
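Quadtree partitioning can be sketched as a short recursion: a square is kept if it is "simple enough", otherwise it is split into four quadrants. The slide does not say which criterion drives the split, so the test used below (the spread between the darkest and brightest pixel of a block) is purely an assumption for illustration, as are the function and parameter names. The image side is assumed to be a power of two.

```python
def quadtree_partition(image, x, y, size, min_size, max_range):
    """Return a list of (x, y, size) squares covering the block at (x, y).

    A block is split into four quadrants while it is larger than `min_size`
    and its pixel values span more than `max_range`.
    """
    pixels = [image[y + j][x + i] for j in range(size) for i in range(size)]
    if size <= min_size or max(pixels) - min(pixels) <= max_range:
        return [(x, y, size)]
    half = size // 2
    squares = []
    for offset_y in (0, half):
        for offset_x in (0, half):
            squares += quadtree_partition(
                image, x + offset_x, y + offset_y, half, min_size, max_range
            )
    return squares


if __name__ == "__main__":
    image = [[0, 9, 5, 5],
             [9, 0, 5, 5],
             [5, 5, 5, 5],
             [5, 5, 5, 5]]
    for square in quadtree_partition(image, 0, 0, 4, min_size=1, max_range=2):
        print(square)
```

In the example only the busy top-left quadrant is refined down to single pixels, while the three flat quadrants stay as 2 x 2 squares.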

Advantages & Drawbacks

+ High quality at high compression rates
  • At least for images with self-similarities
  • Here: better than JPEG ("cross-over point" at about 1:10 to 1:30)
+ Zooming into image supported
  • detailed view possible, interpolation instead of "pixelization"
+ Scalability
  • decompression steps yield iteratively improving image
- Long compression times
  • asymmetric mechanisms
  • improving search techniques for range & domain block pairs
- Blockwise artifacts with information losses
  • Wimg is only an approximation
- Not well applicable to images of non-fractal nature
  • E.g. texts, sharp lines & no quality guarantee possible
- Lower quality than JPEG at low compression rates
- Error propagation

14. Basic Audio and Speech Coding Schemes

Voice encoder/decoder: "vocoder"
Background
• ITU driven activities
G.711: PCM
• with 64 kbps
G.722: differential PCM (sub-band ADPCM)
• 48, 56, 64 kbps
G.723
• Multipulse Maximum Likelihood Quantizer (MP-MLQ): 6.3 kbps
• Algebraic Code Excited Linear Prediction (ACELP): 5.3 kbps
• application: speech
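The idea behind differential PCM is to transmit the quantized difference between each sample and a prediction instead of the sample itself. The sketch below uses the simplest possible predictor (the previously reconstructed sample) and a fixed step size. It illustrates the principle only and is not the algorithm of any ITU recommendation; the function names and the step size are assumptions.

```python
def dpcm_encode(samples, step):
    """Encode samples as quantized differences to the previous reconstruction."""
    codes = []
    prediction = 0
    for sample in samples:
        code = round((sample - prediction) / step)
        codes.append(code)
        # Track the value the decoder will reconstruct, so that
        # quantization errors do not accumulate.
        prediction += code * step
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by summing up the dequantized differences."""
    samples = []
    value = 0
    for code in codes:
        value += code * step
        samples.append(value)
    return samples


if __name__ == "__main__":
    original = [0, 3, 9, 15, 13]
    codes = dpcm_encode(original, step=4)
    print(codes)
    print(dpcm_decode(codes, step=4))
```

Because the encoder predicts from the reconstructed signal, the error per sample stays within half a step; an adaptive scheme would additionally vary the step size with the signal.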

Schemes for Speech Coding

G.728: Low Delay Code Excited Linear Prediction (LD-CELP)
• used in audio/video conferencing
• 16 kbps
• one-way end-to-end delay less than 2 ms (due to algorithm)
• complex algorithm
• 16-18 MIPS in floating point required
• appr. 40 MIPS for whole encoding and decoding
AV.253
• still "under consideration" at ITU
• 32 kbps
IS-54
• VSELP
• good for voice
• bad for music
• 13 kbps (appr. 8 kbps voice + 5.05 kbps forward error correction FEC)
• driving force: Motorola (similar developments in Japan)

Speech Coding in Mobile Telephone Networks

RPE-LTP (GSM)
• Regular Pulse Excitation - Long-Term Predictor
• used in European GSM: speech
• 13 kbps
GSM Half-Rate Coders
• 5.6 - 6.25 kbps
• quality and characteristics similar to RPE-LTP

Vocoder: e.g. Inmarsat IMBE Coder

Improved Multiband Excitation Coder (IMBE)
• application: maritime satellite communications
• 4.15 kbps for voice (plus 2.25 kbps for channel coding)
Principle: Vocoder
• (IMBE: voiced and unvoiced individually for each frequency band)

[Figure: vocoder principle. The speech input is split into frequency bands (200 - 300 Hz, 300 - 450 Hz, ..., 2,800 - 3,400 Hz); each band has a DC + lowpass stage and a modulator, replicated for each frequency band. Further blocks: pitch analysis, switch, pulse generator, noise generator. Output: encoded speech.]

15. Conclusion

JPEG:
• Very general format with high compression ratio
• SW and HW for baseline mode available
H.261 / H.263:
• Established standard by telecom world
• Preferable hardware realization
MPEG family of standards:
• Video and audio compression for different data rates
• Asymmetric (focus) and symmetric
Proprietary systems: e.g. Quicktime product
• Migration to the use of standards
Next steps: wavelets, fractals, models of objects