Ultraspectral Data Compression

Total Pages: 16

File Type: pdf, Size: 1020 KB

A Dissertation Presented by Rolando Herrero to the Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the field of Electrical Engineering. Northeastern University, Boston, Massachusetts, October 2012.

Abstract

Ultraspectral images capture 2D data tuned at different wavelengths across the mid-infrared electromagnetic spectrum. Atmospheric Infrared Sounder (AIRS) remote sensors are ultraspectral sensors that generate images with thousands of highly correlated bands and are considered the future of spectroscopy. The major application of AIRS is the acquisition of atmospheric parameters such as gases, temperature and other quantities used for climate and weather forecasting. Because typical AIRS images are well over 40 MB in size, and because they are captured in remote locations, data compression before transmission becomes a critical issue. Existing compression studies of AIRS data adapt generic multispectral image compression techniques (not necessarily ultraspectral ones) and do not take into account the particular nature of ultraspectral images. Most of them do not consider correlation beyond one band, use fixed linear prediction (leading to significant distortion), and are not optimized to overcome time and space complexity constraints. Moreover, existing studies do not provide analytical rate-distortion models. This dissertation presents a new architecture for the compression of ultraspectral data, together with sound mathematical models that describe a set of algorithms and their practical implementation. Specifically, we (1) present a new reversible preprocessing stage that rearranges the data so that the subsequent compression stage is more efficient; (2) present a new linear-prediction-based compression stage that improves the compression rate at any given distortion when compared with ultraspectral data compression techniques from the literature, which involves defining a distortion measure and studying its effect on real applications; (3) define a mathematical model that approximates the rate-distortion behavior of the compression stage and compare it against the measured performance of the proposed algorithm; and (4) compare the performance of the overall architecture against that of other state-of-the-art compression schemes. In summary, the results of this dissertation contribute to the understanding of ultraspectral compression and introduce a novel ultraspectral codec.

Contents

1 Introduction
  1.1 Multispectral Imaging
  1.2 Multispectral Data
    1.2.1 AVIRIS
    1.2.2 AIRS
    1.2.3 AIRS Processing
  1.3 Ultraspectral Data Compression
    1.3.1 Lossless vs Lossy Compression
    1.3.2 Distortion Measure
    1.3.3 Architecture for Ultraspectral Data Compression
  1.4 Objectives and Contribution
2 Review of Literature
  2.1 Basic Techniques
    2.1.1 Entropy Coding
    2.1.2 Vector Quantization
    2.1.3 Transform Coding
      2.1.3.1 PCA
      2.1.3.2 DCT
      2.1.3.3 DWT
    2.1.4 Linear Mixture Analysis
    2.1.5 Linear Prediction
  2.2 Preprocessing Techniques
    2.2.1 Band Ordering
    2.2.2 Bias Adjusted Reordering
  2.3 Compression Techniques
    2.3.1 Entropy Coding
    2.3.2 Vector Quantization
    2.3.3 Transform Coding
      2.3.3.1 PCA
      2.3.3.2 DCT
      2.3.3.3 DWT
    2.3.4 Fractal Based Coding
    2.3.5 Linear Prediction
      2.3.5.1 3D ADPCM
      2.3.5.2 JPEG-7
      2.3.5.3 CALIC
      2.3.5.4 JPEG-LS
      2.3.5.5 CELP
      2.3.5.6 SLSQ
      2.3.5.7 PLT
  2.4 Proposed Approach
3 Preprocessing Stage
  3.1 AIRS Images
  3.2 The Speech Analogy
  3.3 Preprocessing Steps
    3.3.1 Band Preprocessing
      3.3.1.1 Band Normalization
      3.3.1.2 Subband Decomposition
    3.3.2 Band Ordering
    3.3.3 Image Scanning
  3.4 Summary
4 Compression Stage
  4.1 Prediction Gain
  4.2 Long Term Prediction
  4.3 Input Frame Size
  4.4 Prediction Coefficients
  4.5 Prediction Error
  4.6 Summary
5 Rate Distortion Analysis
  5.1 Introduction
  5.2 Mathematical Model
    5.2.1 Prediction Error
    5.2.2 Rate
    5.2.3 Distortion
  5.3 Theoretical Rate Distortion
  5.4 Simplified Model
  5.5 Summary
6 Performance Comparison
  6.1 Introduction
  6.2 New AIRS LPC CELP Compression
  6.3 Other Compression Techniques
    6.3.1 3D ADPCM
    6.3.2 JPEG-7
    6.3.3 CALIC
    6.3.4 DWT
    6.3.5 3D SPIHT
    6.3.6 3D SPECK
  6.4 Compression Results
7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work
A Rate-Distortion Computation
  A.1 Prediction Error Probability
  A.2 Rate
  A.3 Distortion
B Algorithms
  B.1 LBG
    B.1.1 Lloyd Algorithm
    B.1.2 Generalized Lloyd Algorithm
    B.1.3 Stochastic Relaxation
  B.2 Durbin-Levinson
    B.2.1 Predictor of Order Zero
    B.2.2 Predictor of Order One
    B.2.3 Predictor of Order Two
    B.2.4 Predictor of Order Three
    B.2.5 Summary
    B.2.6 Conversion of Reflection Coefficients to Linear Prediction Coefficients
    B.2.7 Conversion of Linear Prediction Coefficients to Reflection Coefficients
Bibliography

List of Figures

1.1 Hyperspectral Data Structure
1.2 Reflectance Curve
1.3 Hyperspectral Pixel Vector
1.4 Spectral Solar Irradiance
1.5 Atmospheric Absorption
1.6 AVIRIS
1.7 AIRS
1.8 Encoder
1.9 Decoder
2.1 PCA axis rotation: (a) before, (b) after
2.2 DWT
2.3 LP
2.4 System Identification
2.5 Barnwell window
2.6 AR encoder
2.7 AR decoder
2.8 MA encoder
2.9 MA decoder
2.10 AR encoder
2.11 AR decoder
2.12 MA encoder
2.13 MA decoder
2.14 FB encoding
2.15 3D ADPCM
2.16 JPEG-7
2.17 JPEG-LS
2.18 SLSQ Contexts
3.1 March 3, 2004 Granule 16
3.2 March 3, 2004 Granule 82
3.3 March 3, 2004 Granule 126
3.4 March 3, 2004 Granule 151
3.5 March 3, 2004 Granule 193
3.6 Voiced frame
3.7 Voiced frame (zoom)
3.8 Two-Level Signal Decomposition/Reconstruction
3.9 Row-Column DWT Computation
3.10 Second Level Decomposition
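The compression stage outlined in the abstract is built around linear prediction: each sample is predicted from previously decoded samples and only the low-variance prediction residual is quantized and entropy coded. The sketch below is a minimal, generic illustration of that idea, not the dissertation's actual codec; the predictor order, the autocorrelation method used for the coefficients, and the quantization step are illustrative assumptions.

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Autocorrelation (Yule-Walker) solution for an order-p linear predictor;
    a simple stand-in for the Durbin-Levinson recursion covered in Appendix B."""
    r = np.array([np.dot(signal[:len(signal) - k], signal[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def encode_residual(signal, coeffs, step=1.0):
    """Closed-loop prediction: predict each sample from the reconstructed past
    and keep only the quantized residual, which is cheap to entropy code."""
    p = len(coeffs)
    residual = np.zeros(len(signal), dtype=np.int64)
    recon = np.zeros(len(signal))
    for n in range(len(signal)):
        pred = np.dot(coeffs, recon[n - p:n][::-1]) if n >= p else 0.0
        residual[n] = int(round((signal[n] - pred) / step))
        recon[n] = pred + residual[n] * step
    return residual

rng = np.random.default_rng(0)
x = 1000.0 * np.sin(0.05 * np.arange(512)) + rng.normal(0.0, 10.0, 512)  # toy band profile
a = lpc_coefficients(x, order=4)
res = encode_residual(x, a)
print("signal variance %.1f, residual variance %.1f" % (x.var(), res.var()))
```

The ratio of the two printed variances is the prediction gain analyzed in Chapter 4: the larger it is, the fewer bits the residual costs at a given distortion.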
Recommended publications
  • Relating Optical Speech to Speech Acoustics and Visual Speech Perception
    UNIVERSITY OF CALIFORNIA, Los Angeles. Relating Optical Speech to Speech Acoustics and Visual Speech Perception. A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering by Jintao Jiang, 2003. © Copyright by Jintao Jiang 2003. The dissertation of Jintao Jiang is approved: Kung Yao, Lieven Vandenberghe, Patricia A. Keating, Lynne E. Bernstein, Abeer Alwan (Committee Chair). University of California, Los Angeles, 2003. Table of Contents: Chapter 1. Introduction (1.1 Audio-Visual Speech Processing; 1.2 How to Examine the Relationship between Data Sets; 1.3 The Relationship between Articulatory Movements and Speech Acoustics; 1.4 The Relationship between Visual Speech Perception and Physical Measures; 1.5 Outline of This Dissertation). Chapter 2. Data Collection and Pre-Processing (2.1 Introduction; 2.2 Background; 2.3 Recording a Database for the Correlation Analysis …)
  • Error Correction Capacity of Unary Coding
    Error Correction Capacity of Unary Coding. Pushpa Sree Potluri. Abstract: Unary coding has found applications in data compression, neural network training, and in explaining the production mechanism of birdsong. Unary coding is redundant; therefore it should have inherent error correction capacity. An expression for the error correction capability of unary coding for the correction of single errors has been derived in this paper. 1. Introduction. The unary number system is the base-1 system. It is the simplest number system to represent natural numbers. The unary code of a number n is represented by n ones followed by a zero, or by n zero bits followed by a 1 bit [1]. Unary codes have found applications in data compression [2],[3], neural network training [4]-[11], and biology in the study of avian birdsong production [12]-[14]. One can also claim that the additivity of physics is somewhat like the tallying of unary coding [15],[16]. Unary coding has also been seen as the precursor to the development of number systems [17]. Some representations of the unary number system use n-1 ones followed by a zero, or the corresponding number of zeroes followed by a one. Here we use the mapping of the left column of Table 1.

    Table 1. An example of the unary code

    N    Unary code     Alternative code
    0    0              0
    1    10             01
    2    110            001
    3    1110           0001
    4    11110          00001
    5    111110         000001
    6    1111110        0000001
    7    11111110       00000001
    8    111111110      000000001
    9    1111111110     0000000001
    10   11111111110    00000000001

    The unary number system may also be seen as a space coding of numerical information where the location determines the value of the number.
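To make the mapping concrete, here is a minimal encoder/decoder pair for the two codes in Table 1 (my own sketch, not code from the paper):

```python
def unary_encode(n: int) -> str:
    """Unary code of n: n ones followed by a terminating zero (Table 1, middle column)."""
    return "1" * n + "0"

def unary_encode_alt(n: int) -> str:
    """Alternative form: n zeros followed by a terminating one (Table 1, right column)."""
    return "0" * n + "1"

def unary_decode(code: str) -> int:
    """The value is simply the number of ones before the terminating zero."""
    return code.index("0")

for n in range(5):
    print(n, unary_encode(n), unary_encode_alt(n))
```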
  • Improving Opus Low Bit Rate Quality with Neural Speech Synthesis
    Improving Opus Low Bit Rate Quality with Neural Speech Synthesis. Jan Skoglund (Google, San Francisco, CA, USA), Jean-Marc Valin (Amazon, Palo Alto, CA, USA). [email protected], [email protected]. Abstract: The voice mode of the Opus audio coder can compress wideband speech at bit rates ranging from 6 kb/s to 40 kb/s. However, Opus is at its core a waveform matching coder, and as the rate drops below 10 kb/s, quality degrades quickly. As the rate reduces even further, parametric coders tend to perform better than waveform coders. In this paper we propose a backward-compatible way of improving low bit rate Opus quality by re-synthesizing speech from the decoded parameters. We compare two different neural generative models, WaveNet and LPCNet. WaveNet is a powerful, high-complexity, and high-latency architecture that is not feasible for a practical system, yet provides a best known achievable quality with generative models. … learned representation set [11]. A typical WaveNet configuration requires a very high algorithmic complexity, in the order of hundreds of GFLOPS, along with a high memory usage to hold the millions of model parameters. Combined with the high latency, in the hundreds of milliseconds, this renders WaveNet impractical for a real-time implementation. Replacing the dilated convolutional networks with recurrent networks improved memory efficiency in SampleRNN [12], which was shown to be useful for speech coding in [13]. WaveRNN [14] also demonstrated possibilities for synthesizing at lower complexities compared to WaveNet. Even lower complexity and real-time operation was recently reported using LPCNet [15].
  • Lecture 16: Linear Prediction-Based Representations
    LECTURE 16: LINEAR PREDICTION-BASED REPRESENTATIONS. Objectives: introduce analysis/synthesis systems; discuss error analysis and gain-matching; relate linear prediction coefficients to other spectral representations; introduce reflection and prediction coefficient recursions; lattice/ladder filter implementations. There is a classic textbook on this subject: J.D. Markel and A.H. Gray, Linear Prediction of Speech, Springer-Verlag, New York, New York, USA, ISBN: 0-13-007444-6, 1976. This lecture also includes material from two other textbooks: J. Deller et al., Discrete-Time Processing of Speech Signals, MacMillan Publishing Co., ISBN: 0-7803-5386-2, 2000, and L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Upper Saddle River, New Jersey, USA, ISBN: 0-13-015157-2, 1993. ECE 8463: FUNDAMENTALS OF SPEECH RECOGNITION, Professor Joseph Picone, Department of Electrical and Computer Engineering, Mississippi State University; email: [email protected]; phone/fax: 601-325-3149; office: 413 Simrall; URL: http://www.isip.msstate.edu/resources/courses/ece_8463. Modern speech understanding systems merge interdisciplinary technologies from Signal Processing, Pattern Recognition, Natural Language, and Linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in Digital Signal Processing (DSP).
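The Levinson-Durbin recursion referenced in the lecture converts an autocorrelation sequence into prediction and reflection coefficients in O(p^2) operations. Below is a generic, textbook-style sketch, not code from the course materials:

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations: return prediction coefficients a[1..p],
    reflection coefficients k[1..p], and the final prediction error power."""
    a = np.zeros(order + 1)
    k = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])   # correlation of the current residual
        k[i] = acc / err                             # reflection (PARCOR) coefficient
        a_next = a.copy()
        a_next[i] = k[i]
        a_next[1:i] = a[1:i] - k[i] * a[i - 1:0:-1]  # order update of the predictor
        a = a_next
        err *= 1.0 - k[i] ** 2                       # error power shrinks at every order
    return a[1:], k[1:], err

r = np.array([1.0, 0.9, 0.75, 0.6])                  # toy autocorrelation sequence
a, k, e = levinson_durbin(r, order=3)
print("prediction:", a, "reflection:", k, "error power:", e)
```

The reflection coefficients returned here are the quantities used directly in the lattice/ladder filter implementations the lecture mentions.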
  • Source-Channel Coding for CELP Speech Coders
    Source-Channel Coding for CELP Speech Coders. J.A. Asenstorfer, B.Ma.Sc. (Hons), B.Sc. (Hons). University of Adelaide, Department of Electrical and Electronic Engineering. Thesis submitted for the Degree of Doctor of Philosophy, September 26, 1994. Contents: 1 Speech Coders for Noisy Channels (1.1 Introduction; 1.2 Thesis Goals and Historical Perspective). 2 Linear Predictive Coders Background (2.1 Introduction; 2.2 Linear Predictive Models; 2.3 Autocorrelation of Speech; 2.4 Generic Code Excited Linear Prediction). 3 Pitch Estimation Based on Chaos Theory (3.1 Introduction; 3.1.1 A Novel Approach to Pitch Estimation; 3.2 The Non-Linear Dynamical Model; 3.3 Determining the Delay; 3.4 Speech Database; 3.5 Initial Findings; 3.5.1 Poincaré Sections; 3.5.2 Poincaré Section in Time; 3.6 Speech Classification; 3.6.1 The Zero Set; 3.6.2 The Devil's Staircase; 3.6.3 Orbit Direction Change Indicator; 3.7 Voicing Estimator; 3.8 Pitch Tracking; 3.9 The Pitch Estimation Algorithm; 3.10 Results from the Pitch Estimator; 3.11 Waveform Substitution; 3.11.1 Replication of the Last Pitch Period; 3.11.2 Unvoiced Substitution; 3.11.3 Splicing; 3.11.4 Memory Considerations for CELP; 3.12 Conclusion). 4 Vector Quantiser and the Missing Data Problem (4.1 Introduction; 4.2 Why Use Vector Quantisation?; 4.2.1 Generation of a Vector Quantiser Using a Training Set; 4.2.2 Some Useful Optimal Vector Quantisation Results; 4.3 Heuristic Argument) …
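Chapter 4 of this thesis concerns building a vector quantiser from a training set. As generic background only (my own sketch, not the thesis's method), an LBG/k-means style quantiser can be trained and applied as follows:

```python
import numpy as np

def train_codebook(training, k, iters=20, seed=0):
    """LBG/k-means style training: alternate nearest-neighbour assignment
    and centroid update over a training set of vectors."""
    rng = np.random.default_rng(seed)
    codebook = training[rng.choice(len(training), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(training[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = training[labels == j]
            if len(members):
                codebook[j] = members.mean(axis=0)   # move codeword to cell centroid
    return codebook

def quantize(vectors, codebook):
    """Replace each vector by the index of its nearest codeword."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

train = np.random.default_rng(1).normal(size=(1000, 2))
cb = train_codebook(train, k=4)
print(quantize(train[:5], cb))
```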
  • Soft Compression for Lossless Image Coding
    Soft Compression for Lossless Image Coding. Gangtao Xin and Pingyi Fan, Senior Member, IEEE. Abstract: Soft compression is a lossless image compression method, which is committed to eliminating coding redundancy and spatial redundancy at the same time by adopting locations and shapes of codebook to encode an image from the perspective of information theory and statistical distribution. In this paper, we propose a new concept, the compressible indicator function with regard to an image, which gives a threshold on the average number of bits required to represent a location and can be used for revealing the performance of soft compression. We investigate and analyze soft compression for binary images, gray images and multi-component images by using specific algorithms and the compressible indicator value. It is expected that the bandwidth and storage space needed when transmitting and storing the same kind of images can be greatly reduced by applying soft compression. Index Terms: lossless image compression, information theory, statistical distributions, compressible indicator function, image set compression. 1 Introduction. The purpose of image compression is to reduce the number of bits required to represent an image as much as possible under the condition that the fidelity of the reconstructed image to the original image is higher than a required value. Image compression often includes two processes, encoding and decoding. … where H(X1, X2, ..., Xn) is the joint entropy of the symbol series {Xi, i = 1, 2, ..., n}. It is impossible to reach the entropy rate, and all we can do is make great efforts to get close to it. Soft compression [1] was recently proposed, which uses locations …
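The joint-entropy bound quoted above is what any lossless code must respect. As a rough, generic illustration (my own sketch, not from the paper), the zeroth-order empirical entropy of an image's pixel values already gives a per-pixel lower bound for any memoryless lossless code; exploiting spatial redundancy, as soft compression aims to do, can push the rate lower still.

```python
import numpy as np

def empirical_entropy(symbols):
    """Zeroth-order entropy in bits per symbol, estimated from empirical frequencies."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Toy 8-bit "image" whose gray levels cluster around 128
rng = np.random.default_rng(0)
img = np.clip(rng.normal(128, 10, size=(64, 64)).round(), 0, 255).astype(np.uint8)
print("empirical entropy: %.2f bits/pixel (vs. 8 bits/pixel raw)" % empirical_entropy(img.ravel()))
```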
  • Samia Dawood Shakir
    MODELLING TALKING HUMAN FACES. Thesis submitted to Cardiff University in candidature for the degree of Doctor of Philosophy. Samia Dawood Shakir, School of Engineering, Cardiff University, 2019. DECLARATION: This work has not been submitted in substance for any other degree or award at this or any other university or place of learning, nor is being submitted concurrently in candidature for any degree or other award. STATEMENT 1: This thesis is being submitted in partial fulfillment of the requirements for the degree of PhD. STATEMENT 2: This thesis is the result of my own independent work/investigation, except where otherwise stated, and the thesis has not been edited by a third party beyond what is permitted by Cardiff University's Policy on the Use of Third Party Editors by Research Degree Students. Other sources are acknowledged by explicit references. The views expressed are my own. STATEMENT 3: I hereby give consent for my thesis, if accepted, to be available online in the University's Open Access repository and for inter-library loan, and for the title and summary to be made available to outside organisations. ABSTRACT: This thesis investigates a number of new approaches for visual speech synthesis using data-driven methods to implement a talking face. The main contributions in this thesis are the following. The accuracy of the shared Gaussian process latent variable model (SGPLVM), built using the active appearance model (AAM) and relative spectral transform-perceptual linear prediction (RASTA-PLP) features, is improved by employing a more accurate AAM.
  • Generalized Golomb Codes and Adaptive Coding of Wavelet-Transformed Image Subbands
    IPN Progress Report 42-154 August 15, 2003 Generalized Golomb Codes and Adaptive Coding of Wavelet-Transformed Image Subbands A. Kiely1 and M. Klimesh1 We describe a class of prefix-free codes for the nonnegative integers. We apply a family of codes in this class to the problem of runlength coding, specifically as part of an adaptive algorithm for compressing quantized subbands of wavelet- transformed images. On test images, our adaptive coding algorithm is shown to give compression effectiveness comparable to the best performance achievable by an alternate algorithm that has been previously implemented. I. Generalized Golomb Codes A. Introduction Suppose we wish to use a variable-length binary code to encode integers that can take on any non- negative value. This problem arises, for example, in runlength coding—encoding the lengths of runs of a dominant output symbol from a discrete source. Encoding cannot be accomplished by simply looking up codewords from a table because the table would be infinite, and Huffman’s algorithm could not be used to construct the code anyway. Thus, one would like to use a simply described code that can be easily encoded and decoded. Although in practice there is generally a limit to the size of the integers that need to be encoded, a simple code for the nonnegative integers is often still the best choice. We now describe a class of variable-length binary codes for the nonnegative integers that contains several useful families of codes. Each code in this class is completely specified by an index function f from the nonnegative integers onto the nonnegative integers.
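For context, a Golomb-Rice code (the power-of-two special case of the Golomb family; shown here as a generic sketch, not the paper's generalized construction) encodes a nonnegative integer n with parameter m = 2^k as a unary-coded quotient followed by a k-bit remainder, which is exactly the kind of simply described, table-free code the excerpt describes:

```python
def rice_encode(n: int, k: int) -> str:
    """Golomb-Rice codeword for n >= 0 with parameter m = 2**k:
    unary quotient (ones + terminating zero), then k-bit binary remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, "0{}b".format(k)) if k > 0 else "")

def rice_decode(bits: str, k: int) -> int:
    q = bits.index("0")                       # unary part ends at the first zero
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    return (q << k) | r

for n in (0, 1, 5, 17):
    cw = rice_encode(n, k=2)
    assert rice_decode(cw, k=2) == n
    print(n, "->", cw)
```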
  • The Pillars of Lossless Compression Algorithms a Road Map and Genealogy Tree
    International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 13, Number 6 (2018), pp. 3296-3414. © Research India Publications, http://www.ripublication.com. The Pillars of Lossless Compression Algorithms: a Road Map and Genealogy Tree. Evon Abu-Taieh, PhD, Information System Technology Faculty, The University of Jordan, Aqaba, Jordan. Abstract: This paper presents the pillars of lossless compression algorithms, methods and techniques. The paper counted more than 40 compression algorithms. Although each algorithm is independent in its own right, these algorithms interrelate genealogically and chronologically. The paper then presents the genealogy tree suggested by the researcher. The tree shows the interrelationships between the 40 algorithms, as well as the chronological order in which the algorithms came to life. The time relation shows the cooperation within the scientific community and how the algorithms amended each other's work. The paper presents the 12 pillars researched in this paper, and a comparison table is developed. The tree is presented in the last section of the paper after presenting the 12 main compression algorithms, each with a practical example. The paper first introduces Shannon-Fano code, showing its relation to Shannon (1948), Huffman coding (1952), Fano (1949), Run Length Encoding (1967), Peter's Version (1963), Enumerative Coding (1973), LIFO (1976), FIFO Pasco (1976), Stream (1979), and P-Based FIFO (1981). Two examples are presented, one for Shannon-Fano code and the other for arithmetic coding. Next, Huffman code is presented with a simulation example and algorithm. The third is the Lempel-Ziv-Welch (LZW) algorithm, which hatched more than 24 …
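Since Huffman coding (1952) is one of the pillar algorithms surveyed, a compact generic sketch of building a Huffman code from symbol frequencies may help; this is illustrative only and is not the example used in the paper.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a prefix-free code by repeatedly merging the two least frequent nodes."""
    heap = [[freq, i, {sym: ""}] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    i = len(heap)                                  # unique tie-breaker for merged nodes
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}   # prepend 0 to one subtree
        merged.update({s: "1" + c for s, c in c2.items()})  # and 1 to the other
        heapq.heappush(heap, [f1 + f2, i, merged])
        i += 1
    return heap[0][2]

code = huffman_code("abracadabra")
print(code)                                        # a valid prefix-free code, e.g. 'a' -> '0'
print("".join(code[c] for c in "abracadabra"))     # the encoded bit string
```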
  • New Approaches to Fine-Grain Scalable Audio Coding
    New Approaches to Fine-Grain Scalable Audio Coding. Mahmood Movassagh, Department of Electrical & Computer Engineering, McGill University, Montreal, Canada, December 2015. Research thesis submitted to McGill University in partial fulfillment of the requirements for the degree of PhD. © 2015 Mahmood Movassagh. In memory of my mother, whom I lost in the last year of my PhD studies. To my father, who has been the greatest support for my studies in my life. Abstract: Bit-rate scalability has been a useful feature in multimedia communications. Without the need to re-encode the original signal, it allows for improving/decreasing the quality of a signal as more/less of a total bit stream becomes available. Using scalable coding, there is no need to store multiple versions of a signal encoded at different bit-rates. Scalable coding can also be used to provide users with different quality streaming when they have different constraints or when there is a varying channel; i.e., the receivers with lower channel capacities will be able to receive signals at lower bit-rates. It is especially useful in client-server applications where the network nodes are able to drop some enhancement layer bits to satisfy link capacity constraints. In this dissertation, we provide three contributions to practical scalable audio coding systems. Our first contribution is scalable audio coding using watermarking. The proposed scheme uses watermarking to embed some of the information of each layer into the previous layer. This approach leads to a saving in bitrate, so that it outperforms (in terms of rate-distortion) the common scalable audio coding based on reconstruction error quantization (REQ) as used in MPEG-4 audio.
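To illustrate the layered idea in the abstract (my own minimal sketch, unrelated to the watermarking scheme the thesis actually proposes): if samples are transmitted as bit planes from most to least significant, a decoder can reconstruct a coarser signal from whatever prefix of the layers it manages to receive.

```python
import numpy as np

def encode_bitplanes(samples, nbits=8):
    """Split unsigned integer samples into bit planes, MSB first (base layer first)."""
    return [(samples >> b) & 1 for b in range(nbits - 1, -1, -1)]

def decode_bitplanes(planes, nbits=8):
    """Reconstruct from however many leading planes were received;
    missing low-order planes are treated as zero (coarser quantization)."""
    out = np.zeros_like(planes[0])
    for i, plane in enumerate(planes):
        out |= plane << (nbits - 1 - i)
    return out

rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=16)
planes = encode_bitplanes(x)
for keep in (8, 4, 2):                      # full stream, then truncated streams
    y = decode_bitplanes(planes[:keep])
    print(keep, "planes -> max error", int(np.max(np.abs(x - y))))
```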
  • End-To-End Video-To-Speech Synthesis Using Generative Adversarial Networks
    End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks. Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Member, IEEE, Björn W. Schuller, Fellow, IEEE, Maja Pantic, Fellow, IEEE. Video-to-speech is the process of reconstructing the audio speech from a video of a spoken utterance. Previous approaches to this task have relied on a two-step process where an intermediate representation is inferred from the video, and is then decoded into waveform audio using a vocoder or a waveform reconstruction algorithm. In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm. Our model consists of an encoder-decoder architecture that receives raw video as input and generates speech, which is then fed to a waveform critic and a power critic. The use of an adversarial loss based on these two critics enables the direct synthesis of raw audio waveform and ensures its realism. In addition, the use of our three comparative losses helps establish direct correspondence between the generated audio and the input video. We show that this model is able to reconstruct speech with remarkable realism for constrained datasets such as GRID, and that it is the first end-to-end model to produce intelligible speech for LRW (Lip Reading in the Wild), featuring hundreds of speakers recorded entirely 'in the wild'. We evaluate the generated samples in two different scenarios (seen and unseen speakers) using four objective metrics which measure the quality and intelligibility of artificial speech.
  • Critical Assessment of Advanced Coding Standards for Lossless Audio Compression
    A Critical Assessment of Advanced Coding Standards for Lossless Audio Compression. Tonny Hidayat (Department of Information Technology, Universitas Amikom Yogyakarta, Yogyakarta, Indonesia), Mohd Hafiz Zakaria, Ahmad Naim Che Pee (Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia). [email protected]; [email protected], [email protected]. Abstract: Image data, text, video, and audio data all require compression for storage issues and real-time access via computer networks. Audio data cannot simply reuse compression techniques intended for generic data: such algorithms lead to poor sound quality, small compression ratios, and are not designed for real-time access. Lossless audio compression has gained attention as a research topic and business field because of the need to store data in excellent condition despite larger storage costs. This article discusses and analyzes the various lossless and standardized audio coding algorithms, with particular attention to LPC owing to its reputation and prevalence in audio compression; other approaches are likewise analyzed for comparison. LPC refinements, for example LSP decomposition techniques, are additionally examined in this paper. Keywords: audio; lossless; compression; coding. I. INTRODUCTION. Compression means to shrink or compress the size of data. Data compression is a technique to minimize the data so that files can be obtained with a size smaller than the original file size. Compression is needed to minimize data storage (because the compressed data is smaller than the original), accelerate information transmission, and limit bandwidth requirements.