Formatting and Source Coding

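Pusan National University, School of Computer Science and Engineering

Kim Jong-Deok ([email protected])

Lecture Objectives

 Understand the main techniques for formatting information such as text, speech, and images into digital data (Information → Digital Data).

. Code, Encoding/Decoding, CODEC

. Hangul character codes / PCM (Pulse Code Modulation)

. Understand the relationship between bandwidth and information representation.

 Source coding for representing data efficiently

. Compressor/Decompressor, CODEC

. Size of multimedia data

. Digital audio / digital video compression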

2 Digital Info. → Digital Data

 Coding Schemes

. Encoding/Decoding, CODEC

 Alphabet, digits, and other characters…

. ASCII, EBCDIC, …

 MIDI

. Musical Instrument Digital Interface

. It carries music-related information, but is it digital info?

 Hangul codes?

. Precomposed (Wansung) or combinational (Johab)?

. KS C 5601, Unicode

3 Hangul Codes (ANSI Code? / Unicode)

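>> % Read the ANSI-encoded text (system default code page) as raw bytes.
>> fid = fopen('song.txt', 'r');
>> ansi_string = fread(fid);
>> fclose(fid);
>> % Convert the bytes to Unicode characters.
>> uni_string = native2unicode(ansi_string);
>> % Write the UTF-16 little-endian byte order mark (FF FE), then the text as 16-bit code units.
>> % This assumes a little-endian machine, because fwrite uses the native byte order here.
>> unicode_start_code = [255, 254];
>> fid2 = fopen('uni_song.txt', 'w');
>> fwrite(fid2, unicode_start_code, 'uint8');
>> fwrite(fid2, uni_string, 'uint16');
>> fclose(fid2);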

4 Data Acquisition System

 Data Acquisition H/W

. At the heart of any data acquisition system lies the data acquisition hardware. The main function of this hardware is to convert analog signals to digital signals, and to convert digital signals to analog signals. (ADC / DAC)

 Sensors and Actuators (Transducers)

. Sensors and actuators can both be transducers. A transducer is a device that converts input energy of one form into output energy of another form. For example, a microphone is a sensor that converts sound energy (in the form of pressure) into electrical energy, while a loudspeaker is an actuator that converts electrical energy into sound energy.

 Signal Conditioning H/W

. Sensor signals are often incompatible with data acquisition hardware. To overcome this incompatibility, the signal must be conditioned. For example, you might need to condition an input signal by amplifying it or by removing unwanted frequency components. Output signals might need conditioning as well.

[Figure: data acquisition system. Blocks: Physical Phenomena, Sensor, Actuator, Signal Conditioning, Data Acquisition H/W, Computer]

5 Analog Info. → Digital Data

 Sound

. PCM (Pulse Code Modulation)

. Sampling rate, bits/sample, channels

 Image

. Pixel, RGB, bits/pixel

. VGA (640×480), QVGA, CIF (352×288), QCIF

 Video

. Frame, 24/30 FPS

. 480p, 720p, 1080i, 1080p, …

6 Pulse Code Modulation (PCM)

 Nyquist Sampling Theorem

. If a signal is sampled at regular intervals at a rate higher than twice the highest signal frequency, the samples contain all the information of the original signal

. Ex) Voice data limited to below 4000 Hz → requires 8000 samples per second

 Analog samples: Pulse Amplitude Modulation (PAM)

 Each sample is assigned a digital value: Quantization

. Quantizing error or noise

. Approximations mean it is impossible to recover the original exactly

. Ex) An 8-bit sample gives 256 levels

• 8000 samples per second of 8 bits each gives 64 kbps
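The sampling and quantization steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`pcm_encode`, `pcm_decode`, `bit_rate`), the 1 kHz test tone, and the choice of a uniform quantizer over the range [-1, 1] are assumptions made for the example, not part of the lecture material.

```python
import math

def pcm_encode(signal, sample_rate, num_samples, bits):
    """Sample signal(t) at sample_rate and quantize each sample to an integer code."""
    levels = 2 ** bits
    step = 2.0 / levels                      # width of one quantization interval on [-1, 1]
    codes = []
    for n in range(num_samples):
        x = max(-1.0, min(1.0, signal(n / sample_rate)))   # PAM sample, clipped to range
        codes.append(min(levels - 1, int((x + 1.0) / step)))
    return codes

def pcm_decode(codes, bits):
    """Map each code back to the midpoint of its quantization interval."""
    step = 2.0 / (2 ** bits)
    return [(c + 0.5) * step - 1.0 for c in codes]

def bit_rate(sample_rate, bits, channels=1):
    """Bits per second produced by the PCM stream."""
    return sample_rate * bits * channels

# Hypothetical 1 kHz tone, sampled at 8000 Hz with 8 bits per sample.
tone = lambda t: 0.8 * math.sin(2 * math.pi * 1000 * t)
codes = pcm_encode(tone, 8000, 80, 8)
decoded = pcm_decode(codes, 8)
max_error = max(abs(tone(n / 8000) - y) for n, y in enumerate(decoded))

print(bit_rate(8000, 8))   # 64000 bits per second, i.e. 64 kbps
print(max_error)           # quantization error, at most half a step (1/256)
```

The decoded samples never match the original exactly; the residual is the quantizing error mentioned above, bounded here by half of one quantization step.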

7 PCM Example

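8 Size of Multimedia Data

 How much data is generated when 48 kHz, 16 bits/sample, stereo (2-channel) digital audio is recorded for one hour?

. How much data can a CD-ROM hold?

 How much data per second does uncompressed VGA, 16 bits/pixel, 30 FPS video generate?

. How much data is generated when it is recorded for one hour?

 High-quality multimedia broadcasting?

. 5.1 channels, 720p, 1080i, 1080p

. DVD, Blu-ray, HD DVD

 Compression is essential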
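The questions on this slide reduce to simple multiplications, worked through in the sketch below. The helper names are hypothetical, and the results are raw payload sizes that ignore any container or file-system overhead.

```python
def audio_bits_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM audio rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def video_bits_per_second(width, height, bits_per_pixel, fps):
    """Raw uncompressed video rate in bits per second."""
    return width * height * bits_per_pixel * fps

SECONDS_PER_HOUR = 3600

audio_bps = audio_bits_per_second(48_000, 16, 2)
audio_hour_bytes = audio_bps * SECONDS_PER_HOUR // 8

video_bps = video_bits_per_second(640, 480, 16, 30)
video_hour_bytes = video_bps * SECONDS_PER_HOUR // 8

print(audio_bps)          # 1536000 bits per second
print(audio_hour_bytes)   # 691200000 bytes, about 691 MB per hour
print(video_bps)          # 147456000 bits per second
print(video_hour_bytes)   # 66355200000 bytes, about 66 GB per hour
```

One hour of this audio is therefore on the order of a single CD-ROM (commonly quoted at roughly 650 to 700 MB), while one hour of uncompressed VGA video is about a hundred times larger, which is why compression is treated as essential.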

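9 Multimedia Compression

Audio Compression Techniques

 Digital Speech Coding

. The main goal is to consume few transmission resources, so a high compression ratio is important

. Exploits the characteristics of the human vocal system: vocoder

. Core algorithms and techniques – LPC (Linear Predictive Coding) & CELP (Code-Excited Linear Prediction)

. Major compression standards – AMR (Adaptive Multi-Rate), G.722, G.723.1, G.726, G.728, G.729, …

 Digital Audio Coding

. A high compression ratio is desirable, but the ability to reproduce good sound quality matters more

. Exploits the characteristics of the human auditory system – psychoacoustic model

. Key elements – hearing sensitivity, frequency masking, temporal masking

. Major compression standards – MPEG-1 Audio Layers (1, 2, 3), Dolby AC3, MPEG-2 (AAC), MPEG-4 AAC (HE-AAC), …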

11 Multimedia compression and container formats (wiki)

 Video

. ISO/IEC: MJPEG · Motion JPEG 2000 · MPEG-1 · MPEG-2 (Part 2) · MPEG-4 (Part 2/ASP · Part 10/AVC) · HEVC

. ITU-T: H.120 · H.261 · H.262 · H.263 · H.264 · HEVC

. Others: AVS · Bink · CineForm · Dirac · DV · OMS Video · RealVideo · RTVideo · SheerVideo · Smacker · Sorenson Video & Sorenson Spark · VC-1 · VC-2 · VC-3 · VP3 · VP6 · VP7 · VP8 · WMV

 Audio

. ISO/IEC: MPEG-1 Layer III (MP3) · MPEG-1 Layer II (Multichannel) · MPEG-1 Layer I · AAC · HE-AAC · MPEG Surround · MPEG-4 ALS · MPEG-4 SLS · MPEG-4 DST · MPEG-4 HVXC · MPEG-4 CELP

. ITU-T: G.711 · G.718 · G.719 · G.722 · G.722.1 · G.722.2 · G.723 · G.723.1 · G.726 · G.728 · G.729 · G.729.1

. Others: AC-3 · AMR · AMR-WB · AMR-WB+ · ATRAC · CELT · DRA · DTS · EVRC · EVRC-B · FLAC · GSM-HR · GSM-FR · GSM-EFR · iLBC · iSAC · Monkey's Audio · TTA (True Audio) · MT9 · A-law · μ-law · Nellymoser · OptimFROG · OSQ · QCELP · RealAudio · RTAudio · SD2 · SHN · SILK · SMV · SVOPC · TwinVQ · VMR-WB · WavPack · WMA

 Image

. ISO/IEC/ITU-T: JPEG · JPEG 2000 · JPEG XR · lossless JPEG · JBIG · JBIG2 · PNG · TIFF/EP · TIFF/IT

. Others: APNG · BMP · DjVu · EXR · GIF · ICER · ILBM · MNG · PCX · PGF · TGA · QTVR · TIFF · WBMP · WebP

 Container

. ISO/IEC: MPEG-PS · MPEG-TS · ISO base media file format · MPEG-4 Part 14 · Motion JPEG 2000 · MPEG-21 Part 9

. ITU-T: H.222.0 · T.802

. Others: 3GP and 3G2 · AMV · ASF · AIFF · AVI · AU · Bink · DivX Media Format · DPX · EVO · Flash Video · GXF · M2TS · MXF · QuickTime File Format · RealMedia · REDCODE RAW · RIFF · Smacker · MOD and TOD · VOB · WAV · WebM

12 Digital Speech Coding

The Human Speech Production System

In relation to the opening and closing vibrations of the vocal cords as air blows over them, speech signals can be roughly categorized into two types of signals: voiced speech and unvoiced speech.

13 Voiced vs. Unvoiced Speech

 Voiced Speech

 Unvoiced Speech

14 Linear Predictive Coding

 A speech signal $s(n)$ can be approximated by an auto-regressive (AR) formulation of order $p$:

$$s(n) = e(n) + \sum_{k=1}^{p} a_k \, s(n-k)$$

. The coefficients $\{a_k\}$ are derived on the basis of a 20~30 ms block of data (frame)
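The slide does not say how the coefficients are computed. One common choice, assumed here purely for illustration, is the autocorrelation method solved with the Levinson-Durbin recursion. In the sketch below the function names and the synthetic order-2 test signal are hypothetical; a real coder would run this on each 20~30 ms frame of speech.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n + lag], for lag = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_p in s(n) ~ sum_k a_k * s(n - k)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]                     # prediction error energy so far
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error

def synthesize(coeffs, excitation):
    """Run the AR model forward: s(n) = e(n) + sum_k a_k * s(n - k)."""
    out = []
    for n, e in enumerate(excitation):
        past = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + past)
    return out

# Synthetic frame: a known AR(2) model excited by a single impulse.
true_coeffs = [1.5, -0.7]
frame = synthesize(true_coeffs, [1.0] + [0.0] * 399)
estimated, residual_energy = levinson_durbin(autocorrelation(frame, 2), 2)
print(estimated)   # close to [1.5, -0.7]
```

Because the frame was generated by the same AR model, the recursion recovers the original coefficients almost exactly; on real speech the fit is only approximate and the residual $e(n)$ carries what the predictor cannot explain.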

15 Digital Audio Coding – Auditory System

1) The outer ear directs sounds through the ear canal towards the eardrum.

2) The middle ear transforms sound pressure waves into mechanical movement of three small bones called “ossicles” (the hammer, anvil, and stirrup).

3) The inner ear houses the cochlea, a spiral-shaped structure for human hearing that contains an extremely sensitive membrane called the basilar membrane. The cochlea converts the middle ear’s mechanical movements into basilar membrane movement and eventually into the firing of auditory neurons, which in turn send electrical signals to the brain.

16 Hearing Sensitivity

[Figure: hearing sensitivity plotted against frequency (Hz)]

17 Applying Hearing Sensitivity

 If we uniformly quantize each audio sample with 12 bits, the resulting quantization noise can be as low as -26 dB, which is far below the threshold of hearing.

 We can divide the audible frequency range (20 Hz to 20 kHz) into several bands, and the audio samples in different bands can be quantized with different numbers of bits to accommodate different tolerances of quantization noise.
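To get a feel for how the number of bits controls quantization noise, the sketch below measures the noise of a uniform quantizer at several bit depths. It is an assumption-laden illustration: the quantizer, the test tone, and the helper names are hypothetical, and the noise is reported relative to a full-scale sine rather than against the reference level behind the -26 dB figure, which is not stated here.

```python
import math

def quantize(x, bits):
    """Uniform quantizer on [-1, 1): return the midpoint of the interval containing x."""
    step = 2.0 / (2 ** bits)
    code = min(2 ** bits - 1, max(0, int((x + 1.0) / step)))
    return (code + 0.5) * step - 1.0

def noise_db(samples, bits):
    """Quantization noise power in dB relative to the power of a full-scale sine."""
    noise = sum((s - quantize(s, bits)) ** 2 for s in samples) / len(samples)
    full_scale_sine_power = 0.5
    return 10 * math.log10(noise / full_scale_sine_power)

# Hypothetical full-scale test tone.
samples = [math.sin(2 * math.pi * 0.01237 * n) for n in range(5000)]
for bits in (8, 12, 16):
    print(bits, round(noise_db(samples, bits), 1))
```

Each additional bit lowers the noise by roughly 6 dB, so a band in which the ear tolerates more noise can be given fewer bits than a band in which it tolerates less.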

18 Frequency Masking

19 Frequency Masking

20 Temporal Masking

 A weak sound emitted soon after the end of a louder sound is masked by the louder sound. (Post-masking)

 Even a weak sound just before a louder sound can be masked by the louder sound. (Pre-masking)

The combined frequency and temporal masking effect

21 Frequency Domain Analysis?

 Applying the digital speech/audio coding techniques covered so far requires spectral (frequency) analysis of the audio signal

 Fourier Analysis
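As a minimal sketch of what such a spectral analysis looks like, the code below computes a direct discrete Fourier transform of a short block of samples and reports the strongest frequency. The function names and the 1 kHz test tone are assumptions for the example; practical coders use fast transforms (FFT or related filter banks) rather than this O(N²) form.

```python
import cmath
import math

def dft(samples):
    """Direct discrete Fourier transform: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples))
            for k in range(n)]

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the largest spectral peak below half the sample rate."""
    spectrum = dft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)

sample_rate = 8000
samples = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(64)]
print(dominant_frequency(samples, sample_rate))   # 1000.0
```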

22 Digital Audio Standards

 MPEG-1 Audio Layer I, II, III

. Layer I : MP1

• one of three audio formats included in the MPEG-1 standard. While supported by most media players, the codec is considered largely outdated, and replaced by MP2 or MP3.

. Layer II : MP2, (sometimes incorrectly called MUSICAM)

• While MP3 is much more popular for PC and internet applications, MP2 remains a dominant standard for audio broadcasting.

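• The default audio codec of DAB (Digital Audio Broadcasting), also known as Eureka-147, which can be regarded as the forerunner of Korea's terrestrial DMB (T-DMB)

• The default audio codec of DVB (Digital Video Broadcasting), the European DTV standard

• Supports multi-channel audio through the MPEG-2 Audio Layer II extension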

. Layer III : MP3

• A patented digital audio encoding format using a form of lossy data compression. It is a common audio format for consumer audio storage, as well as a de facto standard of digital audio compression for the transfer and playback of music on digital audio players.

23 Digital Audio Standards

 Dolby AC3 Audio Codec

. Multi-Channel Support

. http://en.wikipedia.org/wiki/Dolby_AC3

24 Digital Audio Standards

 Advanced Audio Coding (AAC)

. Designed to be the successor of the MP3 format, AAC generally achieves better sound quality than MP3 at similar bit rates

• AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications. The part of AAC known as High-Efficiency Advanced Audio Coding (HE-AAC), which belongs to MPEG-4 Audio, has also been adopted into digital radio standards such as DAB+ and Digital Radio Mondiale, as well as the mobile television standards DVB-H and ATSC-M/H.

• AAC supports inclusion of 48 full-bandwidth (up to 96 kHz) audio channels in one stream plus 16 low frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams. The quality for stereo is satisfactory to modest requirements at 96 kbit/s in joint stereo mode; however, hi-fi transparency demands data rates of at least 128 kbit/s (VBR). The MPEG-2 audio tests showed that AAC meets the requirements referred to as "transparent" for the ITU at 128 kbit/s for stereo, and 320 kbit/s for 5.1 audio.

• AAC is also the default or standard audio format for iPhone, iPod, iPad, Nintendo DSi, iTunes, DivX Plus Web Player and PlayStation 3. It is supported on PlayStation Portable, Wii (with the Photo Channel 1.1 update installed for Wii consoles purchased before late 2007), Sony Walkman MP3 series and later, mobile phones made by Sony Ericsson and Nokia and Android-based mobile phones.

25 VIDEO COMPRESSION

The Video Processing Pipeline

 Analog-to-Digital Conversion

 RGB-to-YUV Conversion

 Subsampling

 Encoding (H.264 / MPEG-4 AVC)

27 RGB & YUV

 YUV

. The YUV model defines a color space in terms of one luma (Y) and two chrominance (UV) components. The YUV color model is used in the PAL, NTSC, and SECAM composite color video standards. Previous black-and-white systems used only luma (Y) information, and color information (U and V) was added so that a black-and-white receiver would still be able to display a color picture as a normal black-and-white picture.

. YUV models human perception of color in a different way from the standard RGB model used in computer graphics hardware.

. Y stands for the luma component (the brightness), and U and V are the chrominance (color) components. The YPbPr color model used in analog component video and its digital version YCbCr used in digital video are more or less derived from it (Cb/Pb and Cr/Pr are deviations from grey on blue-yellow and red-cyan axes, whereas U and V are blue-luminance and red-luminance differences), and are sometimes inaccurately called "YUV". The YIQ color space used in the analog NTSC television broadcasting system is related to it, although in a more complex way.

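28 Color Space

 YUV

. A color system expressed in terms of brightness (luminance) and chrominance

. Y: brightness (luminance)

. U: chrominance (blue axis: B – Y)

. V: chrominance (red axis: R – Y)

. RGB ↔ YUV conversion is possible

 Why use it

. To place more weight on brightness

• The human eye is more sensitive to brightness than to color

. Subsampling

• Keep the brightness (Y) intact and reduce the amount of color information (U, V)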
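The RGB ↔ YUV conversion mentioned above can be sketched as follows. The slides do not give the coefficients, so this example assumes the widely used BT.601 luma weights (0.299, 0.587, 0.114) and the classic analog scale factors 0.492 and 0.877 for the blue-minus-luma and red-minus-luma differences; the function names are hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB to YUV using BT.601 luma weights (assumed, not from the slides)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # brightness
    u = 0.492 * (b - y)                     # scaled blue difference
    v = 0.877 * (r - y)                     # scaled red difference
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv by undoing the two differences, then solving for green."""
    b = y + u / 0.492
    r = y + v / 0.877
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

print(rgb_to_yuv(255, 255, 255))            # white: Y near 255, U and V near 0
print(yuv_to_rgb(*rgb_to_yuv(200, 30, 120)))  # round trip back to about (200, 30, 120)
```

Any grey pixel (R = G = B) has U = V = 0, which is why a black-and-white receiver can simply ignore the two chrominance components.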

29 Subsampling

 Subsampling

. A method of sampling Y, U, and V at different rates

. 4:4:4 sampling is lossless (no chroma samples are discarded)

. 4:2:2 (cameras), 4:2:0 (various compression technologies)

. 4:2:0 reduces the data by 50% compared with 4:4:4

[Figure: sample layouts over the same pixel block. 4:4:4: 24 samples; 4:2:2: 16 samples; 4:2:0: 12 samples]
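The sample counts in the figure can be reproduced with a small calculation. The sketch assumes a block of 4×2 pixels (8 luma samples), which is consistent with the 24/16/12 totals; the function name and the dictionary of chroma divisors are illustrative only.

```python
def sample_count(width, height, scheme):
    """Total number of Y, U and V samples for a width x height pixel block."""
    # How many pixels share one U sample (and one V sample) in each scheme.
    chroma_divisor = {"4:4:4": 1, "4:2:2": 2, "4:2:0": 4}[scheme]
    luma = width * height
    chroma = luma // chroma_divisor
    return luma + 2 * chroma

for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, sample_count(4, 2, scheme))   # 24, 16, 12

saving = 1 - sample_count(4, 2, "4:2:0") / sample_count(4, 2, "4:4:4")
print(saving)   # 0.5, the 50% reduction quoted for 4:2:0
```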

30 Methods of Video Compression

 Spatial compression (Spatial Model)

 Temporal compression (Temporal Model)

 Statistical compression (Entropy Model)

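31 Spatial Compression (Spatial Model)

 Spatial compression (Spatial model)

. Spatial frequency

• How color or structure changes across space

. DCT (Discrete Cosine Transform)

• Pixel values → spatial frequencies

• A transform similar to the Fourier transform

• For typical images, the DCT values tend to concentrate at the low-frequency end

[Figure: examples of low and high spatial frequency]

[Figure: Image Block → DCT → DCT Coefficient Matrix]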

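32 Discrete Cosine Transform

 For the reduction of spatial redundancy

 Converts the spatial representation of an 8×8 image block to the frequency domain

. Similar to FFT

$$\mathrm{DCT}(i, j) = \frac{1}{4}\, C(i)\, C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} \mathrm{pixel}(x, y) \cos\left[\frac{(2x+1)\, i\, \pi}{16}\right] \cos\left[\frac{(2y+1)\, j\, \pi}{16}\right]$$

$$\mathrm{pixel}(x, y) = \frac{1}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} C(i)\, C(j)\, \mathrm{DCT}(i, j) \cos\left[\frac{(2x+1)\, i\, \pi}{16}\right] \cos\left[\frac{(2y+1)\, j\, \pi}{16}\right]$$

where

$$C(x) = \begin{cases} \dfrac{1}{\sqrt{2}} & x = 0 \\ 1 & \text{otherwise} \end{cases}$$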
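The forward and inverse transforms above translate almost literally into code. The sketch below is a direct, unoptimized evaluation of the two sums (real encoders use fast factorizations); the function names and the flat test block are assumptions for the example.

```python
import math

N = 8

def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(x, i):
    """cos[(2x + 1) * i * pi / 16], the 1-D cosine basis used in both directions."""
    return math.cos((2 * x + 1) * i * math.pi / 16)

def dct_8x8(pixel):
    """Forward 8x8 DCT: spatial block -> coefficient matrix indexed [i][j]."""
    return [[0.25 * c(i) * c(j) * sum(pixel[x][y] * basis(x, i) * basis(y, j)
                                      for x in range(N) for y in range(N))
             for j in range(N)] for i in range(N)]

def idct_8x8(coeff):
    """Inverse 8x8 DCT: coefficient matrix -> spatial block indexed [x][y]."""
    return [[0.25 * sum(c(i) * c(j) * coeff[i][j] * basis(x, i) * basis(y, j)
                        for i in range(N) for j in range(N))
             for y in range(N)] for x in range(N)]

# A completely flat block has no spatial variation, so only DCT(0, 0) is non-zero.
flat = [[100] * N for _ in range(N)]
coeff = dct_8x8(flat)
print(round(coeff[0][0], 6))   # 800.0
```

This is the extreme case of the observation that, for typical images, the energy collects in the low-frequency coefficients near the top-left corner of the matrix.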

33 Example

[Figure: 8×8 Source Image Block → DCT → DCT Coefficient Matrix → Quantization (using a Quantization Table) → Quantized Coefficient Matrix → ZigZag Scanning & RLE]
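The last three stages of this pipeline (quantization, zigzag scanning, and run-length encoding) can be sketched as below. Two simplifications are assumptions of the example: a single uniform step size stands in for the per-coefficient quantization table, and runs are written as (value, count) pairs. The coefficient values and function names are hypothetical.

```python
def quantize(coeff, step):
    """Divide every coefficient by one step size and round (a table would vary the step)."""
    return [[round(value / step) for value in row] for row in coeff]

def zigzag_order(n=8):
    """Visit (row, col) positions diagonal by diagonal, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def zigzag_scan(block):
    """Flatten a square block into a list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

# Hypothetical coefficient matrix with energy only in the low-frequency corner.
coeff = [[0.0] * 8 for _ in range(8)]
coeff[0][0], coeff[0][1], coeff[1][0] = 800.0, -48.0, 7.0

quantized = quantize(coeff, step=16)
print(run_length_encode(zigzag_scan(quantized)))   # [(50, 1), (-3, 1), (0, 62)]
```

Quantization turns the small coefficient into zero, and the zigzag scan places all the zeros at the end of the sequence, where a single run covers them.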

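34 Temporal Compression (Temporal Model)

 Temporal compression (Temporal model)

. Temporal redundancy

• Television: about 30 fps (frames per second); film: about 24 fps

• The duration of one frame is very short compared with the movement of objects

• As a result, video contains a great deal of temporal redundancy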

35 Temporal Compression (Temporal Model)

 Temporal compression (Temporal model)

. Motion estimation

• The process of finding the current block in a past frame

. Motion compensation http://en.wikipedia.org/wiki/Motion_compensation

• The process of obtaining the motion vector

[Figure: past frame and current frame]

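36 Temporal Compression (Temporal Model)

 Temporal compression (Temporal model)

. Reconstruct the current frame using the motion vectors and the reference frame

[Figure: past frame and current frame]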
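The search for a block in the past frame and the reconstruction from a motion vector can be illustrated with a small full-search block matcher. Everything specific in this sketch is an assumption: the sum of absolute differences (SAD) as the matching cost, the 4×4 block, the ±3 pixel search range, and the synthetic frames in which the whole picture moves one pixel down and two pixels right.

```python
import random

def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between a current block and a reference block."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(size) for j in range(size))

def estimate_motion(cur, ref, cy, cx, size=4, search=3):
    """Full search: return ((dy, dx), cost) of the best match in the past frame."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if 0 <= ry <= len(ref) - size and 0 <= rx <= len(ref[0]) - size:
                cost = sad(cur, ref, cy, cx, ry, rx, size)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1], best[0]

def compensate(ref, cy, cx, vector, size=4):
    """Predict the current block by copying the block the motion vector points to."""
    dy, dx = vector
    return [[ref[cy + dy + i][cx + dx + j] for j in range(size)] for i in range(size)]

# Synthetic frames: the current frame is the past frame shifted 1 down and 2 right.
rng = random.Random(0)
past = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
current = [[past[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(16)]
           for y in range(16)]

vector, cost = estimate_motion(current, past, 6, 6)
predicted = compensate(past, 6, 6, vector)
print(vector, cost)   # the vector points back to where the block came from
```

Only the motion vector, plus any remaining difference between the predicted and the actual block, needs to be transmitted for this block.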

37 Group of Pictures

 I frame

. transformed without using prediction

. restarting point for prediction

. random access point

 P frame

. unidirectional prediction

 B frame

. bidirectional prediction

. not used for predicting other frames

38 Group of Pictures

[Figure: a Group of Pictures made up of I, B, and P frames]

. Display order: 1 2 3 4 5 6 7 8 9 10 11 12 13

. Coding and transmission order: 1 5 2 3 4 9 6 7 8 13 10 11 12
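The reordering shown here follows from the fact that a B frame needs both of its reference frames before it can be coded. The sketch below assumes the frame-type pattern I B B B P B B B P B B B P, which is inferred from the two orderings rather than stated on the slide; the function name is hypothetical.

```python
def coding_order(frame_types):
    """Reorder display-order frames so every I/P frame precedes the B frames before it."""
    order, waiting_b = [], []
    for number, kind in enumerate(frame_types, start=1):
        if kind == "B":
            waiting_b.append(number)      # must wait for the next I or P frame
        else:
            order.append(number)          # the reference frame is coded first
            order.extend(waiting_b)       # then the B frames that depend on it
            waiting_b = []
    return order + waiting_b

display_types = "IBBBPBBBPBBBP"           # assumed pattern, display order
print(coding_order(display_types))
# [1, 5, 2, 3, 4, 9, 6, 7, 8, 13, 10, 11, 12]
```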

39 Representation - Entropy

 The Concept of Entropy from Information Theory

. For a given set of symbols, $A = \{a_1, a_2, \ldots, a_N\}$

. Each symbol $a_n$ is associated with an event or an observation that has its own occurrence probability $p_n$, with $p_n \in \{p_1, p_2, \ldots, p_N\}$

. The information measure $I(a_n)$ of the symbol $a_n$ is defined as

$$I(a_n) = \log_b \frac{1}{p_n} = -\log_b p_n$$

. The average amount (expected value) of information we can get from each symbol emitted in the stream from the source is defined as the entropy $H(A)$ for the discrete set of probabilities $P = \{p_1, p_2, \ldots, p_N\}$:

$$H(A) = \overline{I(A)} = \sum_{n=1}^{N} p_n \log_b \frac{1}{p_n}$$

. http://en.wikipedia.org/wiki/Information_entropy
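A small numerical sketch of these definitions, taking $b = 2$ so that the results are in bits. The function names are hypothetical. The second distribution and the codeword lengths are the ones that appear in the codeword table on the Entropy Coding slide, used here only to compare an actual average code length with the entropy.

```python
import math

def information(p, base=2):
    """Information I = log_b(1 / p) carried by a symbol of probability p."""
    return math.log(1 / p, base)

def entropy(probabilities, base=2):
    """Entropy H = sum of p_n * log_b(1 / p_n) over symbols with non-zero probability."""
    return sum(p * information(p, base) for p in probabilities if p > 0)

def average_length(probabilities, code_lengths):
    """Expected number of code symbols per source symbol for a given code."""
    return sum(p * n for p, n in zip(probabilities, code_lengths))

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.6, 0.15, 0.2, 0.05]
code_lengths = [1, 3, 2, 3]               # lengths of the codewords 0, 100, 11, 101

print(entropy(uniform))                                  # 2.0 bits per symbol
print(round(entropy(skewed), 4))                         # about 1.5332 bits per symbol
print(round(average_length(skewed, code_lengths), 4))    # 1.6 bits per symbol
```

Four equally likely symbols need the full 2 bits each, whereas the skewed source carries only about 1.53 bits per symbol, which is the lower bound that a variable-length code with an average of 1.6 bits approaches.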

40 Entropy Coding

 Entropy Coding

. A coding scheme that assigns codes to symbols so as to match code lengths with the probabilities of the symbol.

. The more frequent a symbol, the shorter its codeword

• According to Shannon’s source coding theorem, the optimal code length for a symbol is $\log_b \frac{1}{p}$, where $p$ is the probability of the input symbol

. Example: Lempel-Ziv Coding

. ex) 2 bits per sample -> 1.6 bits per sample

Input Codeword | Frequency (Prob.) | Output Codeword
00             | 0.6               | 0
01             | 0.15              | 100
10             | 0.2               | 11
11             | 0.05              | 101

 Run-Length Encoding

. ex) 000000001122222 ==> (0;8)(1;2)(2;5)

41 SOFTWARE DEFINED RADIO, VISIBLE LIGHT COMMUNICATION, ACOUSTIC COMMUNICATION

 The GNU Software Radio

. http://www.gnuradio.org

. GNU Radio is a free and open-source software development toolkit that provides signal processing blocks to implement software radios. It can be used with readily available, low-cost external RF hardware to create software-defined radios, or without hardware in a simulation-like environment. It is widely used in hobbyist, academic, and commercial environments to support both wireless communications research and real-world radio systems.

 Hardware - USRP

. The Universal Software Radio Peripheral is the recommended device for interfacing GNU Radio with the real world. The USRP has been developed especially for GNU Radio, and is available from Ettus Research.

43 Exploring GNU Radio

 http://www.gnu.org/software/gnuradio/doc/exploring-gnuradio.html

44 USRP

45 Listening to FM Radio using GNU Radio

 http://www.linuxjournal.com/article/7505

 Daughter Board (TVRX, 50 MHz to 870 MHz receiver)

. Bandpass signal with 6 MHz bandwidth at an IF (Intermediate Frequency) of 5.75 MHz

 ADC

. Up to 64 M samples per second, 12 bits/sample

 FPGA

. Digital Down Converter

46 FFT of FM Bands

[Figure: FFT (1024) of samples from the ADC, 10 MHz]

[Figure: FFT of demodulated FM signal, with the 19 kHz stereo pilot tone and 38 kHz marked]

[Figure: FFT at output of DDC]

47 Listening to FM Radio using GNU Radio

 Angle Modulation (Phase Modulation, Frequency Modulation)

$$s(t) = A_c \cdot \cos[2\pi f_c t + \phi(t)]$$

 PM: $\phi(t) = k \cdot m(t)$

 FM: $\phi'(t) = k \cdot m(t)$

. Instantaneous frequency

48 Listening to FM Radio using GNU Radio

 Extracting $\phi'(t) = k \cdot m(t)$ from $s(t) = A_c \cdot \cos[2\pi f_c t + \phi(t)]$

 Digital Down Converter

. IF (Intermediate Frequency) → baseband: remove $2\pi f_c t$

. Performed in the FPGA

$$\cos[2\pi f_c t + \phi(t)] \cdot \cos(2\pi f_c t) = \frac{1}{2} \cdot \left(\cos[4\pi f_c t + \phi(t)] + \cos\phi(t)\right)$$

 Quadrature Demodulator

. Differential, Difference?

$e^{i\phi(t_2)}$, $e^{i\phi(t_1)}$

$$e^{i(\phi(t_2) - \phi(t_1))} = e^{i\phi(t_2)} \cdot e^{-i\phi(t_1)}$$
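The phase-difference idea behind the quadrature demodulator can be tried out on synthetic complex baseband samples. This is a sketch in plain Python, not GNU Radio code: the function names, the modulation constant, and the test message are assumptions, and the discrete phase step `k * m[n]` per sample stands in for the derivative $\phi'(t) = k \cdot m(t)$.

```python
import cmath
import math

def fm_modulate_baseband(message, k):
    """Complex baseband FM: the phase advances by k * m[n] at every sample."""
    phase = 0.0
    samples = []
    for m in message:
        phase += k * m
        samples.append(cmath.exp(1j * phase))
    return samples

def quadrature_demodulate(samples, k):
    """Recover m[n] from the angle of each sample times the conjugate of the previous one."""
    return [cmath.phase(cur * prev.conjugate()) / k
            for prev, cur in zip(samples, samples[1:])]

# Hypothetical message: a slow sine wave; k keeps each phase step well below pi.
message = [math.sin(2 * math.pi * n / 50) for n in range(200)]
k = 0.5
baseband = fm_modulate_baseband(message, k)
recovered = quadrature_demodulate(baseband, k)

max_error = max(abs(r - m) for r, m in zip(recovered, message[1:]))
print(max_error)   # tiny: the message is recovered up to floating-point rounding
```

Multiplying by the conjugate subtracts the phases, so the angle of the product is the phase change between two samples, which for FM is proportional to the message.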

49 GNU Radio Applications

 In addition to the examples discussed above, GNU Radio comes with a complete HDTV transmitter and receiver, a spectrum analyzer, an oscilloscope, a concurrent multichannel receiver, and an ever-growing collection of modulators and demodulators.

 Projects under investigation or in progress include:

. A TiVo equivalent for radio, capable of recording multiple stations simultaneously.

. Time Division Multiple Access (TDMA) waveforms.

. A passive radar system that takes advantage of broadcast TV for its signal source

. TETRA transceiver.

. Digital Radio Mondiale (DRM).

. Software GPS.

. Distributed sensor networks.

. Distributed measurement of spectrum utilization.

. Amateur radio transceivers.

. Ad hoc mesh networks.

. RFID detector/reader.

. Multiple input multiple output (MIMO) processing.

50 Visible Light Communication

 Visible light communication

. http://blog.skbroadband.com/938

. http://www.disneyresearch.com/project/visible-light-communication/

51 Communication over Screen-Camera Links?

 2D barcodes are everywhere !!!

[Figure: Transmitter and Receiver]

 “Transmitting” information (vs linking)

[Figure: Original frame, Single frame, 2-frame mix; the mixing pattern varies by line]

52 Acoustic Communication / Soundcode

 Acoustic Communication ?

. A traditional means of communication that is common in nature

. Technical value? Underwater communication

 Integration with smartphones? - http://digxtal.egloos.com/v/2654784

 2 Approaches

. Sonic Notify inserts ultra-high-frequency sounds into the carrier audio. These frequencies are beyond the hearing range of most people, so listeners perceive the audio as unaltered. https://sonicnotify.com/

. Intrasonics modifies the carrier audio by adding artificial echoes to it. The human brain perceives these as natural echoes and simply ignores them, as if a few insignificant objects were reflecting the original sound.
