© Cambridge University Press

Total pages: 16

File type: PDF, size: 1,020 KB

Cambridge University Press 978-0-521-76451-3 - Multimedia Computing, Gerald Friedland and Ramesh Jain: Index (pp. 317-320), www.cambridge.org

Index

Aboud, G.D., 302
Absolute threshold of hearing (ATH), 165–66
Accelerometers, 30–31
Acoustic event detection (AED), 235, 281–82
Acoustic Long-Term Average Spectrum (LTAS), 239
Acoustic pitch, 238
Actuators, 28–29
AdaBoost learning algorithm, 281
Adaptive codebook, 184
Addison, Paul S., 172
ADPCM algorithm, 148–50
ADSR envelope, 47
Advanced perceptual compression
  convolution theorem, 160–61
  Cooley-Tukey algorithm, 157
  Discrete Cosine Transform (DCT), 158–60
  Discrete Fourier Transform (DFT), 156–58
  Fast Cosine Transform, 157
  Fast Fourier Transform, 157
  JPEG format and, 161–64
  MP3 format and, 168–70
  overview, 156
  perceptual video compression, 170–71 (see also Perceptual video compression)
  psychoacoustics and, 164–68
AED (Acoustic event detection), 235, 281–82
Ahmed, N., 171–72
A-law, 145–46
A-weighting scheme, 37
Algorithms
  AdaBoost learning algorithm, 281
  ADPCM algorithm, 148–50
  arithmetic coding, 135–38, 136t.11.5
  Cooley-Tukey algorithm, 157
  Huffman coding, 128–31
  k-means algorithm, 142–43
  Lempel-Ziv algorithms, 131–35
  LZ77 algorithm, 131–33, 132t.11.3
  LZ78 algorithm, 133–36, 133t.11.4
  LZC algorithm, 135
  LZW algorithm, 135
  machine learning algorithms, 235–36
  machine vision algorithms, 235
  MP3 format, 168–70
  MPEG-1 algorithm, 151–53
  Sequential Minimal Optimization (SMO) algorithm, 257
  Viterbi algorithm, 259, 272–73
  x-means algorithm, 143–44
Allen, J.F., 34
American National Standards Institute (ANSI), 53
Anderson, C.A., 121
ANNs (Artificial Neural Networks), 252–55
Anstis, S.M., 302
Apple, 108
Apple iTunes, 202
Application layer QoS control, 100
Approximants, 44
Arithmetic coding, 135–38, 136t.11.5
Artificial intelligence, 236
Artificial Neural Networks (ANNs), 252–55
ATH (Absolute threshold of hearing), 165–66
Attack, 47
Audio editing, 73–74
Audio processing, 235
Audio sensors, 29
Audiovisual media, 7–8
Autostereoptic displays, 59
Baars, B.J., 93
Background, signal processing and, 223–24
Bark scale, 165, 217
Barkhausen, Heinrich, 165
Barrow, H.G., 302
Bayesian Information Criterion (BIC), 252, 262, 279–80
Belongie, S., 302
Berkeley, George, 290, 302
Bickford, A.C., 49
Big Data, 11
Binary Format for Scenes (BIFS), 171
Bing, 297
Blackstock, D.T., 48
Blu-ray format
  compression and, 124
  differential coding and, 151
  lossy compression and, 141
Body transfer illusion, 84
Bohr, Niels, 305
Books, 67–68
Botvinick, M., 84
Bovik, Al, 231
Brandenburg, K., 172
Broughton, S.A., 171–72
Bryan, K., 171–72
Bunke, Horst, 287
Bureau, M., 121
Bushman, B.J., 121
Caitlin, D.E., 25
Calvert, Gemma, 93
Cameras, 54–57
  autostereoptic displays, 59
  digital cameras, 57
  Exif format and, 24–25, 161, 298–99
  image formation, 54–56
  line of sight, 56
  moving camera, moving objects (MCMO), 282–83
  moving camera, stationary objects (MCSO), 282–83
  parallax barrier method, 59
  resolution, 56
  stationary camera, moving objects (SCMO), 282–83
  stationary camera, stationary objects (SCSO), 282–83
  television and, 56–57
  3D video, 57, 59
  video cameras, 56–57
Candelas, 52
Carryer, J. Edward, 34
CBIR. See Content based image retrieval (CBIR)
CELP. See Code Excited Linear Prediction (CELP)
Challenges in multimedia systems
  context versus content, 23–25
  overview, 21
  semantic gap, 22–23, 25
Chroma quantization, 162
CIELAB color space, 62–64
CIEXYZ color space, 63–64
Clark, A.B., 146
Classic information retrieval
  logical representation, 193–94
  Multimedia Information Retrieval (MIR) compared, 192–98
  overview, 192–94
  PageRank, 194–95
  precision, 194
  ranking, 194
  recall, 194
  relevance feedback, 194, 196–98
  term frequency/inverse document frequency (TF/IDF), 195–96
  Web Image Search, 198
Cloud computing, 11, 308
CMYK color space, 61
Codd, Edgar, 189–90
Code Excited Linear Prediction (CELP)
  adaptive codebook, 184
  bitstream, 185t.14.1
  fixed codebook, 184–85
  line spectral frequencies (LSF), 183–84
  line spectral pairs (LSP), 183–84
  overview, 182–83
Cohen, J.D., 84
Color histograms, 243–44
Color spaces, 61–64
  CIELAB color space, 62–64
  CIEXYZ color space, 63–64
  CMYK color space, 61
  RGB color space, 61
  YUV color space, 62
Color vision, 60–61
Communication
  defined, 16
  evolution of technology, 8–10, 8t.2.1
  in human society, 7–8
  overview, 25
Compasses, 30–31
Compression
  advanced perceptual compression (see Advanced perceptual compression)
  algorithms, 128–38
  arithmetic coding, 135–38, 136t.11.5
  DEFLATE method, 133
  dynamic range compression (DRC), 216–17
  entropy and, 127
  Huffman coding, 128–31
  information content and, 125–32, 126t.11.1, 127t.11.2
  Lempel-Ziv algorithms, 131–35
  lossy compression (see Lossy compression)
  LZ77 algorithm, 131–33, 132t.11.3
  LZ78 algorithm, 133–36, 133t.11.4
  LZC algorithm, 135
  LZW algorithm, 135
  overview, 4, 124
  perceptual video compression, 170–71 (see also Perceptual video compression)
  run-length encoding (RLE), 124–25
  source coding theorem, 127
  speech compression (see Speech compression)
  weakness of entropy-based compression methods, 138
Computer graphics, 235
Computing
  evolution of technology, 8–10, 9t.2.2
  expected results of computing, 11–12
  personal computers, 10–11
Connected components, signal processing and, 223
Connectivity, signal processing and, 223
Content analysis. See Multimedia content analysis
Content analysis systems. See Multimedia content analysis systems
Content based image retrieval (CBIR), 199–205
  overview, 199–201
  Query by Humming (QbH), 202
  semantic content-based retrieval, 202–3
  tag-based retrieval, 203–4
  video retrieval, 204–5
Content replication, 101
Content segments, 69
Context versus content
  connecting data and users, 292
  context defined, 293–94
  context in content, 294, 295–96
  context only image search, 297
  data acquisition context, 294–95
  device parameters, 297–99 (see also Device parameters)
  interpretation context, 295
  overview, 23–25, 290–91
  perceivers, 299–302 (see also Perceivers)
  smart phone photo management, 300–2
  types of context, 294–95
  verification vision, 296–97
Continuous media distribution services, 100–1
Contours, 227
Control theory, 25
Conventional video encoding, 170–71
Convolution theorem, 160–61
Cooley-Tukey algorithm, 157
Countermeasures regarding privacy, 118
Crocker, Lee Daniel, 151
CRT technology, 58
Data acquisition and organization in documents, 71–72
Data acquisition context, 294–95
Databases
  logical level of data, 190
  Multimedia Information Retrieval (MIR) and, 189–91
  organization, storage, management, and retrieval (OSMR), 189–90
  physical level of data, 190
  view level of data, 190–91
Daubechies, Ingrid, 172
DCT (Discrete Cosine Transform), 158–60
Decay, 47
Decibel scale, 37
Defining multimedia systems
  audiovisual media and, 7–8
  communication in human society, 7–8
  elements of multimedia computing (see Elements of multimedia computing)
  emerging applications and, 11
  evolution of computing and communication technology, 8–10, 8t.2.1, 9t.2.2
  evolving nature of information and, 11–12
  expected results of computing and, 11–12
  experiential environments and, 12–13, 14
  formal definition, 13
  multimedia aspect, 10–13
  personal computers and, 10–11
  printing press and, 7, 13
  storage and recording technology and, 8
  telegraphy and, 7
  telephony and, 10
DEFLATE method, 133
DER (Diarization Error Rate), 280
Desktop metaphor, 111, 112
Detection error tradeoff (DET) curve, 264
Device parameters, 297–99
  derived meta layer, 298
  human induced meta layer, 298
  meta layer, 297
  optical meta layer, 297
  overview, 294
  pixel/spectral layer, 297
  spatial meta layer, 298
  temporal meta layer, 297
DFT (Discrete Fourier Transform), 156–58
Dialog boxes, 110
Diarization Error Rate (DER), 280
Dictionaries, 16–17
Differential coding, 147–53
  ADPCM algorithm, 148–50
  in audio, 148–50
  in images, 150–51
  MPEG-1 algorithm, 151–53
  Paeth filter, 150–51
  PNG format, 150–51, 151t.12.1
  in video, 151–53
Digital cameras
  Exif format and, 24–25, 161, 298–99
  overview, 57
Digital television
  differential coding and, 151
  lossy compression and, 141
Digital video discs (DVDs)
  differential coding and, 151
  lossy compression and, 141
Digitization, 32–33
  discretization error, 33
  Nyquist frequency, 33
  quantization, 32
  quantization noise, 33
  sampling, 32
Diphthongs, 44
Discrete Cosine Transform (DCT), 158–60
Discrete Fourier Transform (DFT), 156–58
Discretization error, 33
Distraction, 120
Dix, Alan, 114, 121
Documents
  audio editing, 73–74
  audiovisual technology and, 68–69
  books, 67–68
  content segments, 69
  current authoring environments, 78–79
  data acquisition and organization, 71–72
  defined, 67–69
  dynamic documents, 71
  editing, 72
  elements of multimedia authoring
Edges, 226–28
  contours, 227
  detection of, 228
  edge detectors, 227
  edge fragments, 227
  edge linking, 227
  edge orientations, 227
  edge points, 227
  enhancement of, 227
  filtering of, 227
  isotropic operators, 227
  line discontinuities, 226
  ramp edges, 226
  roof edges, 226
  step discontinuities, 226
  vectors, 227
Edison, Thomas
Recommended publications
  • Relating Optical Speech to Speech Acoustics and Visual Speech Perception
    UNIVERSITY OF CALIFORNIA, Los Angeles. Relating Optical Speech to Speech Acoustics and Visual Speech Perception. A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering, by Jintao Jiang, 2003. © Copyright by Jintao Jiang 2003. The dissertation of Jintao Jiang is approved: Kung Yao, Lieven Vandenberghe, Patricia A. Keating, Lynne E. Bernstein, Abeer Alwan (Committee Chair). University of California, Los Angeles, 2003. Table of contents: Chapter 1, Introduction (1.1 Audio-Visual Speech Processing; 1.2 How to Examine the Relationship between Data Sets; 1.3 The Relationship between Articulatory Movements and Speech Acoustics; 1.4 The Relationship between Visual Speech Perception and Physical Measures; 1.5 Outline of This Dissertation); Chapter 2, Data Collection and Pre-Processing (2.1 Introduction; 2.2 Background; 2.3 Recording a Database for the Correlation Analysis).
  • Could Skype Be More Satisfying? A QoE-Centric Study of the FEC
    Could Skype Be More Satisfying? A QoE-Centric Study of the FEC Mechanism in an Internet-Scale VoIP System. Te-Yuan Huang, Stanford University; Polly Huang, National Taiwan University; Kuan-Ta Chen, Academia Sinica; Po-Jung Wang, National Taiwan University. Abstract: The phenomenal growth of Skype in recent years has surpassed all expectations. Much of the application's success is attributed to its FEC mechanism, which adds redundancy to voice streams to sustain audio quality under network impairments. Adopting the quality-of-experience approach (i.e., measuring mean opinion scores), we examine how much redundancy Skype adds to its voice streams and systematically explore the optimal level of redundancy for different network and codec settings. This study reveals that Skype's FEC mechanism, not so surprisingly, falls in the ballpark, but there is surprisingly a significant margin for improvement to ensure consistent user satisfaction. There is no doubt that Skype is the most popular VoIP service. At the end of 2009, there were 500 million users registered with Skype, and the number of concurrent online users regularly exceeds 20 million. According to TeleGeography, in 2008, 8 percent of international long-distance calls were made via Skype, making Skype the largest international voice carrier in the world. ... data, at the error concealment level. Hereafter, we refer to the redundancy-based error concealment function of the system as the forward error correction (FEC) mechanism. There are two components in a general FEC mechanism: a redundancy control algorithm and a redundancy coding scheme. The control algorithm decides how
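The redundancy mechanism the article studies piggybacks copies of recent voice frames onto later packets so a lost packet can be reconstructed at the receiver. A minimal sketch of that general idea follows; it is not Skype's proprietary control algorithm, and the packet layout and redundancy depth are assumptions made for illustration.

```python
# Sketch of redundancy-based error concealment ("FEC") for a voice stream:
# each packet carries the current frame plus copies of the previous `depth`
# frames, so a burst of up to `depth` lost packets can be recovered from a
# later packet. Illustrative only; not Skype's proprietary algorithm.

def packetize(frames, depth=1):
    """Attach copies of up to `depth` preceding frames to each packet."""
    packets = []
    for i, frame in enumerate(frames):
        redundant = frames[max(0, i - depth):i]  # previous frames, oldest first
        packets.append({"seq": i, "frame": frame, "redundant": redundant})
    return packets

def reconstruct(packets, n_frames):
    """Rebuild the frame sequence, using redundant copies to fill losses."""
    frames = [None] * n_frames
    for p in packets:
        frames[p["seq"]] = p["frame"]
        # A packet's redundant payload covers the frames just before it.
        for k, copy in enumerate(p["redundant"]):
            seq = p["seq"] - len(p["redundant"]) + k
            if frames[seq] is None:
                frames[seq] = copy
    return frames

if __name__ == "__main__":
    frames = [f"frame{i}" for i in range(5)]
    packets = packetize(frames, depth=1)
    received = [p for p in packets if p["seq"] != 2]  # packet 2 lost in transit
    print(reconstruct(received, len(frames)))  # frame 2 recovered from packet 3
```

RTP's redundant-audio payload (RFC 2198) follows the same pattern; real systems additionally adapt the redundancy depth to the measured loss rate, which is the control problem the article examines.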
  • Improving Opus Low Bit Rate Quality with Neural Speech Synthesis
    Improving Opus Low Bit Rate Quality with Neural Speech Synthesis. Jan Skoglund (Google, San Francisco, CA, USA); Jean-Marc Valin (Amazon, Palo Alto, CA, USA). [email protected], [email protected]. Abstract: The voice mode of the Opus audio coder can compress wideband speech at bit rates ranging from 6 kb/s to 40 kb/s. However, Opus is at its core a waveform matching coder, and as the rate drops below 10 kb/s, quality degrades quickly. As the rate reduces even further, parametric coders tend to perform better than waveform coders. In this paper we propose a backward-compatible way of improving low bit rate Opus quality by resynthesizing speech from the decoded parameters. We compare two different neural generative models, WaveNet and LPCNet. WaveNet is a powerful, high-complexity, and high-latency architecture that is not feasible for a practical system, yet provides a best known achievable quality with generative models. ... learned representation set [11]. A typical WaveNet configuration requires a very high algorithmic complexity, in the order of hundreds of GFLOPS, along with a high memory usage to hold the millions of model parameters. Combined with the high latency, in the hundreds of milliseconds, this renders WaveNet impractical for a real-time implementation. Replacing the dilated convolutional networks with recurrent networks improved memory efficiency in SampleRNN [12], which was shown to be useful for speech coding in [13]. WaveRNN [14] also demonstrated possibilities for synthesizing at lower complexities compared to WaveNet. Even lower complexity and real-time operation was recently reported using LPCNet [15].
  • Lecture 16: Linear Prediction-Based Representations
    LECTURE 16: LINEAR PREDICTION-BASED REPRESENTATIONS. Objectives: introduce analysis/synthesis systems; discuss error analysis and gain-matching; relate linear prediction coefficients to other spectral representations; introduce reflection and prediction coefficient recursions; lattice/ladder filter implementations. Topics: system view (analysis/synthesis); the error signal; gain matching; transformations (Levinson-Durbin); reflection coefficients; the Burg method; summary (signal modeling, typical front end). There is a classic textbook on this subject: J.D. Markel and A.H. Gray, Linear Prediction of Speech, Springer-Verlag, New York, New York, USA, ISBN: 0-13-007444-6, 1976. This lecture also includes material from two other textbooks: J. Deller et al., Discrete-Time Processing of Speech Signals, MacMillan Publishing Co., ISBN: 0-7803-5386-2, 2000; and L.R. Rabiner and B.W. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Upper Saddle River, New Jersey, USA, ISBN: 0-13-015157-2, 1993. On-line resources: AJR Linear Prediction; LPC10E; MAD LPC; Filter Design. From ECE 8463: Fundamentals of Speech Recognition, Professor Joseph Picone, Department of Electrical and Computer Engineering, Mississippi State University. Email: [email protected]; phone/fax: 601-325-3149; office: 413 Simrall; URL: http://www.isip.msstate.edu/resources/courses/ece_8463. Modern speech understanding systems merge interdisciplinary technologies from signal processing, pattern recognition, natural language, and linguistics into a unified statistical framework. These systems, which have applications in a wide range of signal processing problems, represent a revolution in digital signal processing (DSP).
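The Levinson-Durbin transformation named in the lecture outline solves the linear-prediction normal equations recursively from the autocorrelation sequence, yielding both the prediction coefficients and the reflection (PARCOR) coefficients. A minimal textbook-style sketch, not code from the lecture materials:

```python
# Textbook Levinson-Durbin recursion: given autocorrelation values r[0..p],
# solve the normal equations of linear prediction in O(p^2), returning the
# error-filter coefficients a (with a[0] = 1, so A(z) = 1 + sum a[i] z^-i),
# the reflection coefficients k, and the final prediction error energy.

def levinson_durbin(r, p):
    a = [1.0] + [0.0] * p   # a[0] = 1 by convention
    k = []                  # reflection (PARCOR) coefficients
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        a_new = a[:]
        for j in range(1, i):        # symmetric coefficient update
            a_new[j] = a[j] + ki * a[i - j]
        a_new[i] = ki
        a = a_new
        err *= (1.0 - ki * ki)       # error energy shrinks each order
    return a, k, err
```

For r = [1, 0.5, 0.25] (a first-order autoregressive process with coefficient 0.5) the recursion returns a = [1, -0.5, 0]: the order-2 model correctly finds that one prediction coefficient suffices, and the second reflection coefficient is zero.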
  • Source-Channel Coding for CELP Speech Coders
    Source-Channel Coding for CELP Speech Coders. J.A. Asenstorfer, B.Ma.Sc. (Hons), B.Sc. (Hons). University of Adelaide, Department of Electrical and Electronic Engineering. Thesis submitted for the degree of Doctor of Philosophy, September 26, 1994. Contents: 1. Speech Coders for Noisy Channels (1.1 Introduction; 1.2 Thesis Goals and Historical Perspective). 2. Linear Predictive Coders Background (2.1 Introduction; 2.2 Linear Predictive Models; 2.3 Autocorrelation of Speech; 2.4 Generic Code Excited Linear Prediction). 3. Pitch Estimation Based on Chaos Theory (3.1 Introduction; 3.1.1 A Novel Approach to Pitch Estimation; 3.2 The Non-Linear Dynamical Model; 3.3 Determining the Delay; 3.4 Speech Database; 3.5 Initial Findings; 3.5.1 Poincaré Sections; 3.5.2 Poincaré Section in Time; 3.6 Speech Classification; 3.6.1 The Zero Set; 3.6.2 The Devil's Staircase; 3.6.3 Orbit Direction Change Indicator; 3.7 Voicing Estimator; 3.8 Pitch Tracking; 3.9 The Pitch Estimation Algorithm; 3.10 Results from the Pitch Estimator; 3.11 Waveform Substitution; 3.11.1 Replication of the Last Pitch Period; 3.11.2 Unvoiced Substitution; 3.11.3 Splicing; 3.11.4 Memory Considerations for CELP; 3.12 Conclusion). 4. Vector Quantiser and the Missing Data Problem (4.1 Introduction; 4.2 Why Use Vector Quantisation?; 4.2.1 Generation of a Vector Quantiser Using a Training Set; 4.2.2 Some Useful Optimal Vector Quantisation Results; 4.3 Heuristic Argument).
  • Samia Dawood Shakir
    MODELLING TALKING HUMAN FACES. Thesis submitted to Cardiff University in candidature for the degree of Doctor of Philosophy. Samia Dawood Shakir, School of Engineering, Cardiff University, 2019. DECLARATION: This work has not been submitted in substance for any other degree or award at this or any other university or place of learning, nor is being submitted concurrently in candidature for any degree or other award. STATEMENT 1: This thesis is being submitted in partial fulfillment of the requirements for the degree of PhD. STATEMENT 2: This thesis is the result of my own independent work/investigation, except where otherwise stated, and the thesis has not been edited by a third party beyond what is permitted by Cardiff University's Policy on the Use of Third Party Editors by Research Degree Students. Other sources are acknowledged by explicit references. The views expressed are my own. STATEMENT 3: I hereby give consent for my thesis, if accepted, to be available online in the University's Open Access repository and for inter-library loan, and for the title and summary to be made available to outside organisations. ABSTRACT: This thesis investigates a number of new approaches for visual speech synthesis using data-driven methods to implement a talking face. The main contributions in this thesis are the following. The accuracy of the shared Gaussian process latent variable model (SGPLVM) built using the active appearance model (AAM) and relative spectral transform-perceptual linear prediction (RASTA-PLP) features is improved by employing a more accurate AAM.
  • End-To-End Video-To-Speech Synthesis Using Generative Adversarial Networks
    End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks. Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis (Member, IEEE), Björn W. Schuller (Fellow, IEEE), Maja Pantic (Fellow, IEEE). Video-to-speech is the process of reconstructing the audio speech from a video of a spoken utterance. Previous approaches to this task have relied on a two-step process where an intermediate representation is inferred from the video, and is then decoded into waveform audio using a vocoder or a waveform reconstruction algorithm. In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm. Our model consists of an encoder-decoder architecture that receives raw video as input and generates speech, which is then fed to a waveform critic and a power critic. The use of an adversarial loss based on these two critics enables the direct synthesis of raw audio waveform and ensures its realism. In addition, the use of our three comparative losses helps establish direct correspondence between the generated audio and the input video. We show that this model is able to reconstruct speech with remarkable realism for constrained datasets such as GRID, and that it is the first end-to-end model to produce intelligible speech for LRW (Lip Reading in the Wild), featuring hundreds of speakers recorded entirely 'in the wild'. We evaluate the generated samples in two different scenarios – seen and unseen speakers – using four objective metrics which measure the quality and intelligibility of artificial speech.
  • Linear Predictive Modelling of Speech - Constraints and Line Spectrum Pair Decomposition
    Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo 2004, Report 71. LINEAR PREDICTIVE MODELLING OF SPEECH - CONSTRAINTS AND LINE SPECTRUM PAIR DECOMPOSITION. Tom Bäckström. Dissertation for the degree of Doctor of Science in Technology, to be presented with due permission for public examination and debate in Auditorium S4, Department of Electrical and Communications Engineering, Helsinki University of Technology, Espoo, Finland, on the 5th of March, 2004, at 12 o'clock noon. ISBN 951-22-6946-5, ISSN 1456-6303. Otamedia Oy, Espoo, Finland, 2004. "Let the floor be the limit!" Abstract of doctoral dissertation: Author, Tom Bäckström. Name of the dissertation: Linear predictive modelling for speech - Constraints and Line Spectrum Pair Decomposition. Date of manuscript: 6.2.2004; date of the dissertation: 5.3.2004. Article dissertation (summary + original articles). Department: Electrical and Communications Engineering; laboratory: Laboratory of Acoustics and Audio Signal Processing; field of research: speech processing. Opponent: Associate Professor Peter Händel; supervisor: Professor Paavo Alku. Abstract: In an exploration of the spectral modelling of speech, this thesis presents theory and applications of constrained linear predictive (LP) models.
  • SpecView User Manual Rev 3.04 for Version
    User Manual. Copyright SpecView 1994-2007. Rev 3.04, for SpecView Version 2.5, Build #830/32. Disclaimer: SpecView software communicates with industrial instrumentation and displays and stores the information it receives. It is always possible that the data being displayed, stored, or adjusted is not as expected. ERRORS IN THE DATABASE OR ELSEWHERE MEAN THAT YOU COULD BE READING OR ADJUSTING SOMETHING OTHER THAN THAT WHICH YOU EXPECT! Safety devices must ALWAYS be used so that safe operation of equipment is assured even if incorrect data is read by or sent from SpecView. SpecView itself MUST NOT BE USED IN ANY WAY AS A SAFETY DEVICE! SpecView will not be responsible for any loss or damage caused by incorrect use or operation, even if caused by errors in programs supplied by SpecView Corporation. Warranties & Trademarks: This document is for information only and is subject to change without prior notice. SpecView is a registered trademark of SpecView Corporation. Windows is a trademark of Microsoft Corporation. All other products and brand names are trademarks of their respective companies. Copyright 1995-2007 by SpecView Corporation. All Rights Reserved. This document was produced using HelpAndManual. Table of contents: 1 Installation (1.1 Instrument Installation and Wiring; 1.2 Instrument
  • 1. Voice Processing 2. Codecs
    2nd lecture: Voice and its coding. Lecture outline: 1. Voice processing; 2. Codecs. 1. Voice: The vibration of the vocal folds is characterized by the fundamental tone of the human voice (pitch period) F0, which forms the basis of voiced sounds (i.e., vowels and voiced consonants). The fundamental frequency differs between children, adults, men, and women, and mostly lies in the range of 150 to 4,000 Hz. The message conveyed by a speech signal is discrete, i.e., it can be expressed as a sequence drawn from a finite set of symbols. Each language has its own set of these symbols (phonemes), usually 30 to 50. Speech sounds can further be divided into voiced (n, e, ...), unvoiced (š, č, ...), and their combinations. A voiced sound has a quasi-periodic waveform, while an unvoiced one resembles noise; moreover, the energy of voiced sounds is greater than that of unvoiced ones. A short time segment of a voiced sound can be characterized by its fine and formant structure. A formant is the tone forming the acoustic basis of a speech sound and essentially represents the spectral envelope of the speech signal; the fine harmonic structure reflects the vibration of the vocal folds. Example of the time-domain and frequency-domain waveforms of voiced and unvoiced segments of a speech signal: see http://www.itpoint.cz/ip-telefonie/teorie/princip-zpracovani-hlasu-ip-telefonie.asp. The band needed for telephony. Singing performances lie outside the scope of IP telephony; see the discussion at http://forum.avmania.e15.cz/viewtopic.php?style=2&f=1724&t=1062463&p=6027671&sid=. 2. Codecs: VoIP gateway. Architecture of a VoIP gateway: SLIC (Subscriber Line Interface Circuit); PLC (Packet Loss Concealment, removing the effects of lost frames); echo cancellation (removing the echo). Some properties of coders: Voice Activity Detection (VAD): during a speech pause only a very small number of bits is produced, just enough to generate comfort noise.
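The voiced/unvoiced distinction described above is often approximated in practice with two classic short-time features: energy (voiced frames carry more energy) and zero-crossing rate (noise-like unvoiced frames cross zero far more often). A minimal sketch of such a classifier; the threshold values are illustrative assumptions, not figures from the lecture.

```python
import math

# Classify a short speech frame as voiced or unvoiced from two short-time
# features: average energy and zero-crossing rate (ZCR). Voiced frames are
# quasi-periodic and energetic with a low ZCR; unvoiced frames are weaker
# and noise-like with a high ZCR. Thresholds below are illustrative only.

def frame_features(frame):
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    zcr = crossings / (len(frame) - 1)
    return energy, zcr

def is_voiced(frame, energy_thresh=0.01, zcr_thresh=0.25):
    energy, zcr = frame_features(frame)
    return energy > energy_thresh and zcr < zcr_thresh

if __name__ == "__main__":
    fs = 8000  # telephone-band sampling rate
    # A 200 Hz sine stands in for a voiced (quasi-periodic) 30 ms segment ...
    voiced = [math.sin(2 * math.pi * 200 * n / fs) for n in range(240)]
    # ... and a fast-alternating low-level signal for an unvoiced, noisy one.
    unvoiced = [0.05 * (-1) ** n for n in range(240)]
    print(is_voiced(voiced), is_voiced(unvoiced))  # True False
```

Real VAD and voicing decisions (as in the codecs the lecture covers) combine more features, e.g. autocorrelation or spectral measures, and smooth decisions over time, but they build on exactly this kind of frame-level measurement.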
  • PC-Netzwerke – Das umfassende Handbuch (735 pages, hardcover, with DVD, 7th edition)
    Know-how. Reading sample: We live in a world full of network technologies. This book shows you how to use them sensibly and how to understand their functions. This reading sample familiarizes you with selected aspects of the book; you can also look at its complete table of contents and index. Sample chapters: "Network fundamentals", "Wireless LAN", "DHCP", "Network storage"; table of contents; index; the authors; recommend this sample. Axel Schemberg, Martin Linten, Kai Surendorf: PC-Netzwerke – Das umfassende Handbuch. 735 pages, hardcover, with DVD, 7th edition 2015, 29.90 euros, ISBN 978-3-8362-3680-5, www.rheinwerk-verlag.de/3805. Chapter 3, Fundamentals of Communication: This part of the book is intended to give you a deeper overview of the theoretical framework of networks, and thus a knowledge base for the remaining chapters. Understanding the theory will help you in practical work, especially in fault analysis. Modern networks are structured, and these structures are based on different technological approaches. If you want to build a network and understand its technology and structure, you will quickly hit limits without theory, and you deprive yourself of the possibility of an optimally configured network. In fault situations, theoretical knowledge will help you find an error in the network as quickly as possible and take suitable measures to eliminate it; without theoretical foundations you are ultimately relying on lucky guesses. This book emphasizes the practice-oriented implementation of networks and concentrates on presenting compact networking knowledge. A computer network can generally be described as a communication network. Starting from human communication, I explain the communication of PCs in the network.
  • Audio Processing and Loudness Estimation Algorithms with iOS Simulations
    Audio Processing and Loudness Estimation Algorithms with iOS Simulations, by Girish Kalyanasundaram. A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science. Approved September 2013 by the Graduate Supervisory Committee: Andreas Spanias (Chair), Cihan Tepedelenlioglu, Visar Berisha. Arizona State University, December 2013. ABSTRACT: The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined, and improved algorithms have been proposed to overcome the limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, in this thesis project we also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS app "iJDSP". These functions are described in detail in this thesis. Several enhancements in the architecture of the application have also been introduced to provide the supporting framework for speech/audio processing.
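Auditory-model loudness estimation of the kind the thesis studies (auditory patterns, specific loudness) is considerably more involved than any few-line example. As a much-simplified stand-in, the frame-level measurement such models build on is often summarized as RMS level in dB relative to full scale:

```python
import math

# Much-simplified level estimate: RMS of a signal block expressed in dB
# relative to full scale (dBFS). This is NOT a perceptual loudness model;
# it only illustrates the kind of frame-level measurement that auditory
# models (and loudness equalizers) start from.

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def dbfs(samples, full_scale=1.0):
    level = rms(samples) / full_scale
    return 20.0 * math.log10(level) if level > 0 else float("-inf")

if __name__ == "__main__":
    # A 440 Hz tone at half amplitude over exactly 44 periods (48 kHz rate):
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
    print(round(dbfs(tone), 1))  # a half-amplitude sine sits near -9.0 dBFS
```

Perceptual loudness measures additionally apply frequency weighting and auditory masking models before summing energy, which is exactly where the computational cost the thesis addresses comes from.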