Fundamentals Series Defining Quality

© Polycom, Inc. All rights reserved.

The Fundamentals Series modules are: Signals, H.323, Analog vs. Digital, SIP, Network, Defining Quality, Communication I, Network Standards and Communication II.

Welcome to Defining Quality, the third module in the Polycom Fundamentals series. This module is approximately 10 minutes long.

Introduction

In order to understand how videoconferencing works, it’s important to understand the underlying technologies at work behind the scenes. In this short module we will talk some more about digital video, how we compress it and the effect that has on the quality of the video we see. We won’t go too in depth here; the goal is just to give you an understanding of how these basic technologies work.

Remember that to make a digital signal, we sample the analog waveform, meaning that we take a measurement of the waveform a number of times per second (the sample rate). Each sample is saved using a number of bits; the exact number is known as the bit depth.

Compression

So, once we have a digital representation of our analog waveform, how does it get across the network quickly enough to give us a real-time conference? Well, a concept related to sampling is compression. Digital compression comes into play when we want to sample at a particular bit depth (say 16 bits) but then send that information using fewer bits than we originally sampled with (say 8 bits). We “compress” the information by using algorithms (mathematical calculations) to manipulate the data, and then recreate that information in an acceptable way at the other end. We use fewer bits to represent the larger numbers, with minimal loss of quality. It’s tricky, but compression becomes really important when you’re trying to send a lot of information over a limited amount of bandwidth and need it all to arrive quickly.

μ-law and A-law

Two examples of compression used in digital audio are the μ-law (mu-law or u-law) and A-law algorithms. These are what are known as “companding” algorithms: they handle both the compressing and the expanding of digital information (giving us comp-anding). By segmenting the signal’s amplitude range and treating each segment a little differently (using more, finer steps for low signal levels than for high ones), we can spend fewer values on loud sounds and more on quiet ones. This makes quiet sounds more accurate than loud ones, but that’s okay: because our hearing responds to loudness roughly logarithmically, a small error on a loud sound is much harder to notice than the same error on a quiet one, and speech samples spend most of their time at low levels.

So, by using μ-law or A-law we can take samples at a 16-bit bit depth (65,536 possible values for each sample, making it very accurate) and transmit that information using only 8 bits (256 possible codes). This is great for voice communications, because the accuracy we keep is spent where speech needs it. So μ-law and A-law concentrate their steps on the lower signal levels and accept coarser steps for the loudest parts of the signal.

Audio Transmission

Now let’s apply that to a digitized audio signal. Suppose I have an original voice signal covering roughly 300 to 3,400 Hz (the traditional telephone band). I sample it at 8,000 samples per second, which is enough to capture frequencies up to 4,000 Hz, since the sample rate must be at least twice the highest frequency we want to keep. I use 16 bits per sample to get the best audio quality, but then use the μ-law algorithm to compress that down to 8 bits for transmission. I end up with a 64,000 bits per second audio data stream, which gives me voice-quality audio in a 64 kbps stream:

8,000 samples per second × 8 bits per sample = 64,000 bits per second

That happens to be the G.711 audio codec standard. How about that. A codec (coder-decoder) is just a word for a technique for coding and decoding a signal.
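To make the arithmetic above concrete, here is a minimal Python sketch of the same pipeline: sample a tone at 8,000 samples per second, store each sample as a signed 16-bit integer, compand it to an 8-bit code and back, and compute the resulting bit rate. It uses the textbook continuous μ-law curve with μ = 255; G.711 itself uses a segmented, piecewise-linear approximation of that curve and its own 8-bit code layout, so the codes produced here will not match G.711’s bit patterns. The function names are hypothetical, chosen for illustration.

```python
import math

MU = 255              # mu-law parameter (the value used with mu-law telephony)
SAMPLE_RATE = 8000    # samples per second
INPUT_BITS = 16       # bit depth at capture
OUTPUT_BITS = 8       # bit depth after companding
FULL_SCALE = 2 ** (INPUT_BITS - 1)  # 32768: magnitude of the most negative 16-bit value


def sample_tone(freq_hz, n_samples, amplitude=0.5):
    """Sample a sine wave at SAMPLE_RATE and quantize each sample to signed 16-bit."""
    peak = amplitude * (FULL_SCALE - 1)
    return [round(peak * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(n_samples)]


def mulaw_compress(sample):
    """Map a signed 16-bit sample to a signed 8-bit code (-127..127) on the mu-law curve."""
    x = sample / FULL_SCALE                                     # normalize to [-1, 1)
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(y * 127)


def mulaw_expand(code):
    """Invert mulaw_compress: map an 8-bit code back to an approximate 16-bit sample."""
    y = code / 127
    x = math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)
    return max(-FULL_SCALE, min(FULL_SCALE - 1, round(x * FULL_SCALE)))


def bitrate(sample_rate, bits_per_sample):
    """Bits per second for a stream of fixed-size samples."""
    return sample_rate * bits_per_sample


if __name__ == "__main__":
    tone = sample_tone(1000, 8)                 # 1 kHz test tone, 8 samples = 1 ms
    codes = [mulaw_compress(s) for s in tone]
    restored = [mulaw_expand(c) for c in codes]
    print(list(zip(tone, codes, restored)))

    # Step size between neighbouring codes: small for quiet signals, large for loud ones.
    print("quiet step:", mulaw_expand(14) - mulaw_expand(13))
    print("loud step: ", mulaw_expand(125) - mulaw_expand(124))

    print(bitrate(SAMPLE_RATE, INPUT_BITS))     # 128000 bits/s before companding
    print(bitrate(SAMPLE_RATE, OUTPUT_BITS))    # 64000 bits/s, the G.711 rate
```

The two “step” lines show the companding idea directly: neighbouring codes near zero differ by only a few sample values, while neighbouring codes near full scale differ by more than a thousand.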
Audio Codecs

There are other audio codecs that are used in digital telephony and videoconferencing. Some of the most common are listed here. They are all just built from different combinations of sample rates, bit depths and compression algorithms. When we transmit them in our videoconferencing call, they are all sent across the network as voltage levels representing on/off values (bits) in sequenced groups (bytes). There, simple, isn’t it?

Pixels

Moving on to pictures: a digital picture is made up of pixels. Pixel is a shortened form of ‘picture element’, the smallest element of a display. The more pixels on the screen, the higher we say the resolution is. Here is a great image showing roughly how this works: when you look close up, as with a low-resolution image, you can see the pixels quite clearly, but when you view the whole image, as with a higher-resolution image, it looks much smoother.

Resolutions

The resolution of an image is defined as the number of pixels across the screen by the number of lines on the screen. There are many standard resolutions, several of which you will be familiar with, such as VGA, which is 640 pixels by 480 lines. Each resolution has what we call an aspect ratio; the two most common are 4:3 (like an old television) and 16:9 (widescreen). Looking at the 640 x 480 example just given, you can see that it has a 4:3 aspect ratio; that is to say, if you divide 640 by 4, which gives 160, and then multiply that by 3, you get 480.

This ties into video codecs, because not all codecs support all resolutions. For example, the resolutions we describe as ‘high definition’ require a codec that supports them. This means that even if you receive an HD signal, if you can’t decode it, you won’t get HD.

Television Standards

We will divert here momentarily to talk about television standards. Television wasn’t created in just one place and distributed around the world; as a result, several competing systems for standard-definition (4:3) TV developed around the same time. The primary standards we see today are NTSC (National Television System Committee), PAL (Phase Alternating Line) and SECAM (Séquentiel couleur à mémoire, “Sequential Color with Memory”). You can see on the map where each is still used today.

Each of these has a slightly different way of displaying video images on a monitor. There are two primary differences between them. The first is the number of lines that make up the picture, which, as we know, helps give us the resolution of the image. The second is the number of times per second the image refreshes on the screen, known as the frame rate, which affects how smoothly the moving picture, well, moves. We don’t need to go into this any further, but it is worth knowing, because when we move on to specific resolutions you will find that some have been specified to NTSC standards and some to PAL/SECAM standards. PAL and SECAM share the same resolution and frame rate, so they do not require separate resolutions, although they are not compatible with each other because of the way they handle color signaling.

Video Codecs

There are several digital video compression codecs in use today. In our videoconferencing systems we commonly see three of them: H.261, H.263 and H.264. The primary things that differentiate them from each other fall into three categories: what resolutions each supports, what frame rates each supports, and how much each compresses the video sequence while keeping quality high and required bandwidth low. Each codec does this a little differently. We will discuss this further, but for now we are going to talk about what these three parameters mean to us, starting with resolution.
We will concentrate here on resolutions commonly used in videoconferencing.

SD Resolutions (CIF, SIF, 4CIF, 4SIF)

The Standard Definition, or SD, resolutions we will look at here all use the 4:3 aspect ratio. One of the first videoconferencing resolutions was CIF, or Common Intermediate Format. It was first proposed in the H.261 standard back in 1988 as an easy way to convert between PAL and NTSC. This is not to be confused with Source Input Format (SIF), which is a similar low-resolution format but only for NTSC. Both of these also define further resolutions by multiplying or dividing the base resolution. The most commonly seen of these are 4CIF and 4SIF, which double both the width and the height of the base resolution for higher quality. Doubling both dimensions gives four times as many pixels. If this doesn’t seem to make sense, think of it this way: if I have a square and I double the length of both sides, how many of the originally sized squares can I fit inside the one which is twice the size? Yup, that would be four.

HD Resolutions

In comparison, HD resolutions are all widescreen, or 16:9 aspect ratio, formats. The most common HD resolutions are shown here: 1280 x 720 and 1920 x 1080.
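The resolution arithmetic in this module can be checked with a few lines of Python. This minimal sketch reduces width:height to an aspect ratio (generalizing the “divide 640 by 4, multiply by 3” check), doubles both dimensions to get 4CIF and 4SIF, and estimates uncompressed bit rates to show why the amount of compression a codec achieves matters. The CIF (352 x 288) and SIF (352 x 240) pixel dimensions are the commonly cited values rather than figures given in this module; 30 frames per second and 12 bits per pixel (8-bit samples with 4:2:0 chroma subsampling) are assumptions chosen for illustration, and all function names are hypothetical.

```python
from math import gcd

CIF = (352, 288)  # Common Intermediate Format: width x height in pixels
SIF = (352, 240)  # Source Input Format, 525-line (NTSC) variant


def aspect_ratio(resolution):
    """Reduce width:height to lowest terms, e.g. (640, 480) -> (4, 3)."""
    width, height = resolution
    d = gcd(width, height)
    return width // d, height // d


def pixels(resolution):
    """Pixels in one frame: pixels per line times number of lines."""
    width, height = resolution
    return width * height


def scale(resolution, factor):
    """Multiply both width and height by factor; factor 2 turns CIF into 4CIF."""
    width, height = resolution
    return width * factor, height * factor


def raw_bitrate(resolution, fps=30, bits_per_pixel=12):
    """Uncompressed bits per second. The defaults are illustrative assumptions:
    30 frames/s and 12 bits/pixel (8-bit samples with 4:2:0 chroma subsampling)."""
    return pixels(resolution) * fps * bits_per_pixel


if __name__ == "__main__":
    # Aspect ratios: VGA is 4:3, both HD sizes are 16:9.
    for name, res in [("VGA", (640, 480)), ("1280 x 720", (1280, 720)), ("1920 x 1080", (1920, 1080))]:
        print(name, "aspect ratio %d:%d" % aspect_ratio(res))

    # Doubling both sides gives four times the pixels.
    four_cif, four_sif = scale(CIF, 2), scale(SIF, 2)
    print("4CIF:", four_cif, "4SIF:", four_sif)   # 4CIF: (704, 576) 4SIF: (704, 480)
    print(pixels(four_cif) / pixels(CIF))          # 4.0

    # Uncompressed bit rates, before any codec gets involved.
    for name, res in [("CIF", CIF), ("4CIF", four_cif), ("1280 x 720", (1280, 720)), ("1920 x 1080", (1920, 1080))]:
        print(f"{name}: {raw_bitrate(res) / 1e6:.1f} Mbit/s uncompressed")
```

Under these assumptions even CIF needs about 36.5 Mbit/s uncompressed, and 1920 x 1080 about 746.5 Mbit/s, which is why how much a codec compresses the video matters as much as which resolutions and frame rates it supports.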
