Introduction

• An audio file format is a file format for storing digital audio data on a computer system.

• Can be stored uncompressed, or compressed to reduce the file size.

• Can be a raw bitstream

• It is usually a container format or an audio data format with a defined storage layer.

History

PHONOGRAPH
• Thomas Edison's phonograph (invented in 1877).
• Emile Berliner's gramophone (patented in 1887).

RADIO
• This technology has only been around since the 1920s.
• The theoretical basis of the propagation of electromagnetic waves was first described in 1873 by James Clerk Maxwell in his paper to the Royal Society, "A dynamical theory of the electromagnetic field", which followed his work between 1861 and 1865.

8-TRACK
• Invented in the early 1960s by William Powell Lear, and heavily marketed and used in the '70s.
• The 8-track was the premier portable audio format for almost 15 years.
• The 8-track was designed around a single reel, with the two ends of the plastic recording tape joined by a piece of conductive foil tape to make one continuous loop.

CASSETTE TAPE
• Not popular until the late 1970s.
• Magnetic tape recording came into use in music studios around 1950.
• Musicians could record in longer sessions, then select and combine the best cuts into polished songs.

COMPACT DISC
• Used for storing digital data.
• Originally invented for digital audio, and also used for data storage.
• Frequently included as a component in personal computers.
• Personal computers can generally play audio CDs.

Parameters of a Sound File

1. SAMPLING RATE
• Most digital sound files save information as a long series of sound samples (like frames in moving pictures).

• The quality of a sound file can be increased by taking more of these samples in the same amount of time; that is, increasing the “sampling rate”.

• The trade-off is an increase in file size.

2. BITS PER SAMPLE
• Commonly, either 8 or 16 bits are used to represent each sample.
• Using 16 bits provides much better quality, but files twice as large as 8-bit files.

3. NUMBER OF CHANNELS
• Stereo files use two separate audio channels and are usually about twice the size of mono files.
• It is theoretically possible to record any number of channels.
• For example, a file intended to be played on a surround-sound system may record seven or more channels, and will thus be about seven times the size of a mono version of the same sound.

Types of Sound File
There are three major groups of audio file formats:
1. Uncompressed audio formats • such as WAV, AIFF, AU or raw header-less PCM.
2. Formats with lossless compression • such as FLAC, Monkey's Audio (APE), WavPack (filename extension WV), TTA, Apple Lossless (filename extension m4a), MPEG-4 SLS, MPEG-4 ALS, MPEG-4 DST, Windows Media Audio Lossless (WMA Lossless), and Shorten (SHN).
3. Formats with lossy compression • such as MP3, AAC, ATRAC and Windows Media Audio Lossy (WMA lossy).

Audio File Formats

1. WAV (Waveform Audio File Format)
• Very good sound quality.
• Widely supported in many browsers with no need for a plugin.
• You can record your own .wav files from a CD, tape, microphone, etc.
• The very large file sizes severely limit the length of the sound clips that you can use on your Web pages.
• The Microsoft ADPCM compressed waveform format consists of 4-bit-per-channel compressed data.
• Each 4-bit sample is expanded to 16 bits when loaded.
• The 16-bit data can still be quickly converted to 8-bit during playback on cards that don't support 16-bit.

2. AIFF (Audio Interchange File Format)
• Format for storing digital audio (waveform) data.
• Supports a variety of bit resolutions, sample rates, and channels of audio.
• Very popular on Apple platforms.
• Widely used in professional programs that process digital audio waveforms.
• Uses the Electronic Arts Interchange File Format (IFF) method of storing data in "chunks".
A) The Common Chunk
• Describes fundamental parameters of the waveform data, such as sample rate, bit resolution, and how many channels of digital audio are stored in the FORM AIFF.
B) The Sound Data Chunk
• Contains the actual sample frames (i.e., all channels of waveform data).
C) The Marker Chunk
• Contains markers that point to positions in the waveform data.
D) The Instrument Chunk
• Defines basic parameters that an instrument, such as a MIDI sampler, could use to play the waveform data.
E) The Text Chunks: Name, Author, Copyright, Annotation
• These four optional chunks are included in the definition of every standard IFF file.
[Figure: chunk precedence]

3. VOC (Creative Voice)
• Created by Creative and generally associated with their 8-bit line of cards (Sound Blaster, Sound Blaster Pro).
• Initially limited to unsigned 8-bit PCM (Pulse Code Modulation) and ADPCM (Adaptive Differential Pulse-Code Modulation) data.
• Eventually expanded to handle 16-bit formats with the introduction of Creative's 16-bit cards.
• The format is composed of a file header followed by one or more data blocks.

4. AVI (Audio Video Interleaved)
• Audio Video Interleaved is a multimedia container format.
• Introduced by Microsoft in 1992.
• AVI files can contain both audio and video data in a single file container.
• AVI files support multiple streams of audio and video.
• It is a derivative of the Resource Interchange File Format (RIFF), which divides a file's data into blocks, or "chunks".
• Each "chunk" is identified by a FourCC tag.
• An AVI file takes the form of a single "chunk" in a RIFF-formatted file, which is then subdivided into two mandatory "chunks" and one optional "chunk". A minimal sketch of walking these chunks follows.
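Since WAV and AVI share this RIFF chunk layout, a minimal sketch (Python, standard library only) of walking the top-level chunks of a RIFF file might look like the following; the file name is a placeholder:

```python
import struct

def list_riff_chunks(path):
    """Walk the top-level chunks of a RIFF file (e.g., .wav or .avi)."""
    with open(path, "rb") as f:
        riff, size, form = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF":
            raise ValueError("not a RIFF file")
        print("form type:", form.decode("ascii"))   # e.g. 'WAVE' or 'AVI '
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            # Each chunk: 4-byte FourCC tag + little-endian 32-bit size + data
            fourcc, chunk_size = struct.unpack("<4sI", header)
            print(fourcc.decode("ascii", "replace"), chunk_size, "bytes")
            # Chunk data is word-aligned: skip one pad byte if size is odd
            f.seek(chunk_size + (chunk_size & 1), 1)

list_riff_chunks("example.wav")  # hypothetical file name
```

In an AVI file the top-level chunks printed this way are mostly LIST chunks, which in turn contain the header and movie data sub-chunks.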
5. RMF (Rich Music Format)
• Audio file created in the Rich Music Format.
• Created by Beatnik.
• Used as a container format for MIDI files and other standard audio formats.
• Can be encrypted to protect the original music.
• Used by some older video games.
• The original RMF editor released by Beatnik is no longer available.
• Still supported by Java Sound and the Java Media Framework.

6. WMA (Windows Media Audio)
• WMA is both an audio format and an audio codec.
• A competitor to the MP3 and RealAudio audio formats.
There are 4 versions of the WMA codec:
• WMA is the original codec, initially released in 1999.
• WMA Pro is an improved lossy codec intended for audio professionals.
• WMA Lossless is a lossless codec intended for archival and storage purposes.
• WMA Voice is a lossy codec designed for low-bandwidth voice playback applications.

Psycho-Acoustics

• In harnessing sound for various musical instruments and multimedia applications, the effect of sound on human hearing needs to be analyzed.

• Psycho-acoustics is the branch of acoustics that deals with human auditory perception, from the biological design of the ear to the brain's interpretation.

Factors to be Analyzed

1. Decibel (dB)
• A unit for measuring the loudness of sound.
• It compares the intensity of a sound with the quietest sound audible to the human ear.
• It is expressed as the logarithm of a ratio:

PdB = 10 log10(PA / PB)

where PA is the power (energy content) of the signal, and PB is the corresponding value for the reference signal.
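A quick sanity check of the formula (Python here purely for illustration; the function name is ours):

```python
import math

def power_db(p_signal, p_ref):
    """Power ratio expressed in decibels: PdB = 10 * log10(PA / PB)."""
    return 10 * math.log10(p_signal / p_ref)

print(power_db(2.0, 1.0))    # doubling the power adds ~3.01 dB
print(power_db(100.0, 1.0))  # a 100x power ratio is exactly 20 dB
```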

Factors to be Analyzed

2. Sound Measurement
Two popular approaches to acoustical measurement:
1. Direct method: • Measuring a set of environmental factors such as temperature, humidity, viscosity, echo timing, etc.
2. Comparison method: • Measuring the sound pressure level of a reference sound of known energy level and comparing it with the sound being measured.

3. Hearing Threshold & Masking
Two fundamental phenomena govern human hearing:
1. Threshold:
• Minimum threshold: the quietest sound that a normal human ear can detect and hear.
• Maximum sensitivity occurs at 1 to 5 kHz.

[Figure: an equal-loudness contour. Note the peak sensitivity around 2–4 kHz, the range on which the human voice centers.]

2. Amplitude Masking:
• Originates from a limitation of the human ear.
• This limitation prevents us from hearing two sounds of different loudness that are close in frequency or time.
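The threshold-in-quiet curve is often approximated analytically. The sketch below uses Terhardt's well-known approximation, which is not from the slides but is standard in perceptual-coding texts; note the minimum (maximum sensitivity) near 3-4 kHz:

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for f in (100, 1000, 3300, 10000):
    print(f, "Hz ->", round(threshold_in_quiet_db(f), 1), "dB SPL")
```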

MIDI (Musical Instrument Digital Interface)

• MIDI doesn't directly describe musical sound.
• MIDI is not a language.
• It is a data communications protocol.
• 1900s: electronic synthesizers developed (monophonic).
• 1970s: digital synthesizers developed (monophonic).
• Each manufacturer used a different design scheme, with its own keyboard/panel.
• With a particular input device, each player can only run one or two synthesizers at the same time.

• To use a wide range of synthesized sounds, many players were needed

• In 1983, the full MIDI 1.0 Detailed Specification was released, with backing from manufacturers including Yamaha, Korg & Kawai.

• It standardized the control signals and inter-machine communication between synthesizer devices.

MIDI Interface
MIDI In
• MIDI data enters each item of MIDI equipment through the MIDI In port.
MIDI Out
• All the MIDI data generated by an individual piece of equipment is sent out through its MIDI Out port.
• A common MIDI setup error is an inverted connection of MIDI In/Out.
MIDI Thru
• These ports re-transmit all information received at the MIDI In port through the MIDI Thru connections.
• Often these ports are used to create a chain of connected devices in a single MIDI data path, called a 'daisy chain'.
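To make the "data communications protocol" idea concrete, here is a minimal sketch of the raw bytes a device would send from its MIDI Out port for a Note On / Note Off pair. The byte layout follows the MIDI 1.0 channel-voice message format; the helper functions are our own:

```python
def note_on(channel, note, velocity):
    """Status byte 0x90 | channel, then note number and velocity (0-127)."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note):
    """Status byte 0x80 | channel; here releasing with velocity 0."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])

# Middle C (note 60) on channel 0, struck fairly hard:
print(note_on(0, 60, 100).hex())   # '903c64'
print(note_off(0, 60).hex())       # '803c00'
```

Three bytes per note event is all that travels down the cable; the receiving synthesizer, not the message, decides what the note actually sounds like.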

MIDI Hardware
A. Pure Musical Input Devices
1. Most common: keyboard
i. Note polyphony: nowadays, most keyboards are polyphonic.
ii. Touch response: a keyboard can sense different levels of input pressure.
2. Other possible pure MIDI input devices: guitar, flute, violin, drum set.

B. Other Musical Input Devices
1. Keyboard + synthesizer = keyboard synthesizer
• Has real-time audio output.
• Some keyboard synthesizers support DSP (Digital Signal Processing), which gives more available effects, e.g. phaser, chorus.
2. Keyboard + synthesizer + sequencer/sampler/effects processors… = keyboard workstation
• Compose and make music with just a keyboard.

C. Controllers
1. Numbered controllers, e.g. a volume panel.
2. Continuous controllers
• You can roll the controller to get a particular value, e.g. a modulation wheel.
3. On/Off controllers
• Can send two different values (e.g. 0/127), e.g. a foot pedal (sustain pedal).
4. Universal MIDI controller
• Can control all types of control events.
• In some products the panel can synchronize with the software: the panel will move if you adjust parameters in the software.

D. Synthesizer
• Generates sound from scratch.

MIDI Software
A. Software samplers, e.g. Native Instruments Intakt Software Sampler (Macintosh and Windows).
B. Recording software, e.g. Cakewalk Sonar, Cool Edit Pro, Cubase, Logic, Pro Tools.
C. Score editors, e.g. Finale, Cakewalk, Overture
• You can "listen to the score" via the playback option.
• Neat and tidy.

• Can do transposition, chord identification, etc., more easily than with a handwritten score.
• Can input a score with real instruments, then tidy it up by quantization.

Applications of MIDI
1. Studio Production
• Recording, playback, cut-and-splice editing.
• Creative control/effects can be added.

2. Making Scores
• With score-editing software, MIDI is excellent for producing scores.
• Some MIDI software provides automatic, intelligent chord arrangement.

3. Learning
• You can write a MIDI orchestra for practice.

4. Commercial Products
• Mobile phone ring tones, music box tunes, etc.

5. Musical Analysis
• MIDI has detailed parameters for every input note.
• This is useful for research.
• For example, a pianist can record a performance on a MIDI keyboard, and we can then analyze the performance style from those parameters.

Limitations of MIDI
1. Slow: serial transfer
• Throughput suffers when there is too much continuous data to transfer, e.g. a lot of control data.

2. Slow: synthesis
• MIDI carries only control information (like a Csound score), and time is needed to synthesize the actual sound.

3. Sound quality varies
• It depends on which synthesizer you use.
• Solution: users have to judge by ear which sound is good.

DIGITAL AUDIO

Sound Facts
• Sound is a continuous wave that travels through the air.
• The wave is made up of pressure differences.
• Sound is detected by measuring the pressure level.
• Sound waves have normal wave properties (reflection, refraction, diffraction, etc.).

[Figure: the human ear detecting sound]

Wave Characteristics
• Frequency: the number of periods in a second, measured in hertz (Hz) or cycles per second. Human hearing frequency range: 20 Hz to 20 kHz (audio).
• Amplitude: the measure of displacement of the air pressure wave from its mean. Related to, but not the same as, loudness.

Principle of Digitization
Why digitize?
• Microphones and video cameras produce analog signals (continuous-valued voltages).
• To store audio or video data in a computer, we must digitize it by converting it into a stream of numbers.

Sampling: divide the horizontal axis (time) into discrete pieces.
Quantization: divide the vertical axis (signal strength, i.e. voltage) into pieces. For example, 8-bit quantization divides the vertical axis into 256 levels; 16-bit gives you 65,536 levels. The lower the quantization, the lower the quality of the sound.
• Linear vs. non-linear quantization: if the scale used for the vertical axis is linear, we call it linear quantization; if it is logarithmic, we call it non-linear quantization.

Sampling & Quantization

• Sampling rate: the number of samples per second (measured in Hz). E.g., CD-standard audio uses a sampling rate of 44,100 Hz (44,100 samples per second).
• 3-bit quantization gives 8 possible sample values. E.g., CD-standard audio uses 16-bit quantization, giving 65,536 values.
• Why quantize? To digitize!
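A minimal numpy sketch of both steps, sampling a 1 kHz sine at 8 kHz and applying 3-bit linear quantization (the parameter choices are ours, for illustration only):

```python
import numpy as np

fs = 8000          # sampling rate in Hz (samples per second)
bits = 3           # 3-bit quantization -> 2**3 = 8 levels
f = 1000           # tone frequency in Hz

# Sampling: evaluate the continuous wave at discrete instants n/fs
t = np.arange(0, 0.002, 1 / fs)        # 2 ms of signal
x = np.sin(2 * np.pi * f * t)          # amplitude in [-1, 1]

# Linear quantization: round each sample to the nearest of 8 evenly spaced levels
levels = 2 ** bits
step = 2.0 / (levels - 1)              # full scale spans 2 (from -1 to +1)
codes = np.round((x + 1) / step).astype(int)   # integer codes 0..7
xq = codes * step - 1                  # reconstructed (quantized) samples

print(codes)                           # the digital stream that would be stored
print(np.max(np.abs(x - xq)))          # worst-case quantization error <= step/2
```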

Digital Audio Data
• Digital representation of audio data has many advantages: high noise immunity, stability, and reproducibility.
• Audio in digital form allows efficient implementation of various audio processing functions (e.g., mixing, filtering, and equalization).
• The conversion from analog to digital starts by sampling the audio input at regular discrete intervals of time and quantizing the sampled values into a discrete number of evenly spaced levels.
• The method of representing each sample with an independent code word is called pulse code modulation (PCM).

• Typical sampling rates range from 8 kilohertz (kHz) to 48 kHz.

• The number of quantizer levels is typically a power of 2.

• The typical number of bits per sample used for digital audio ranges from 8 to 16.

Need for Compression
• The data rates associated with uncompressed digital audio are very large.
• For example, the audio data on a compact disc (2 channels of audio sampled at 44.1 kHz with 16 bits per sample) requires a data rate of 44,100 × 16 × 2 ≈ 1.4 megabits per second.
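The 1.4 Mbit/s figure falls straight out of the three parameters; a one-line check:

```python
rate = 44_100 * 16 * 2        # samples/s * bits/sample * channels
print(rate)                   # 1,411,200 bits per second ~ 1.4 Mbit/s
print(rate * 60 / 8 / 1e6)    # ~10.6 MB for one minute of uncompressed audio
```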

• So compression is needed to enable more efficient storage and transmission of this data.

• Audio compression techniques trade off encoder and decoder complexity, compressed audio quality, and the amount of compression.

PCM (Pulse Code Modulation)
• A method used to digitally represent sampled analog signals.
• The standard form of digital audio in computers, Compact Discs, digital telephony and other digital audio applications.
• Steps: ■ Sampling ■ Quantizing ■ Encoding

Sampling
[Figure: sampling]

Quantizing
• The samples are divided into many discrete levels, and each sample is numbered according to its corresponding level.
• Quantization error: the coded signal is an approximation of the actual amplitude value. (Each extra bit of resolution buys roughly 6 dB of signal-to-noise ratio.)
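Non-linear quantization (pictured next) is typically realized by companding: compress the amplitude scale, then quantize uniformly. Here is a sketch of the standard µ-law curve (µ = 255, as in 8-bit telephony); the formula is the standard one, the helper names are ours:

```python
import math

MU = 255.0

def mu_law_compress(x):
    """Map x in [-1, 1] through the mu-law curve: more resolution near zero."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse mapping, as used by the decoder."""
    return math.copysign((math.exp(abs(y) * math.log1p(MU)) - 1) / MU, y)

# Quiet samples get spread across more of the 8-bit range than loud ones:
for x in (0.01, 0.1, 0.5, 1.0):
    print(x, "->", round(mu_law_compress(x), 3))
```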

[Figure: linear quantization vs. non-linear quantization]

Encoding
• After quantizing, the corresponding level is represented in some manner, i.e. in binary format.

Audio Compression Techniques

1. DM (Delta Modulation)
• Delta modulation (DM or Δ-modulation) is an analog-to-digital and digital-to-analog signal conversion technique.
• Used for transmission of voice information where quality is not of primary importance.
• The simplest form of differential pulse-code modulation (DPCM): differences between successive samples are encoded into n-bit data streams.
• In delta modulation, the transmitted data are reduced to a 1-bit data stream.
[Figure: delta modulation]

2. ADPCM (Adaptive Differential Pulse Code Modulation)
• Uses a larger step size to encode differences between high-frequency samples, and a smaller step size for differences between low-frequency samples.
• Uses previous sample values to estimate changes in the signal in the near future.
• This algorithm offers a compression factor (in bits per source sample) of 4:1.
• The ADPCM encoder computes the difference between each audio sample and its predicted value, then outputs the PCM value of the differential.
• An extension of delta modulation.
• 4-bit ADPCM can provide the equivalent of about 12-bit PCM.
• Commonly termed a form of compression: a more efficient way of storing waveforms than 16-bit or 8-bit PCM.
• Stores value differences between adjacent PCM samples and makes some assumptions that allow data reduction. Because of these assumptions, low frequencies are properly reproduced.
• The algorithm is used to map a series of 8-bit µ-law (or a-law) PCM samples into a series of 4-bit ADPCM samples. In this way, the capacity of the line is doubled.
• To ensure the differences are always small:
§ Adaptively change the step size (quanta).
§ (Adaptively) attempt to predict the next sample value.
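A toy delta-modulation encoder/decoder makes the 1-bit scheme concrete; ADPCM refines exactly this loop by adapting the step size and adding prediction. The step size and test signal below are arbitrary choices:

```python
import math

def dm_encode(samples, step=0.2):
    """1-bit delta modulation: emit 1 if the signal is above the running
    estimate (then step up), else 0 (step down)."""
    bits, estimate = [], 0.0
    for s in samples:
        bit = 1 if s > estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def dm_decode(bits, step=0.2):
    """Replay the same up/down steps to rebuild an approximate waveform."""
    out, estimate = [], 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

x = [math.sin(2 * math.pi * n / 40) for n in range(80)]  # slow test sine
bits = dm_encode(x)
print(bits[:20])                                         # the 1-bit stream sent
print(max(abs(a - b) for a, b in zip(x, dm_decode(bits))))  # error ~ step size
```

If the signal rises faster per sample than the step size, the estimate cannot keep up (slope overload), which is exactly the problem the adaptive step size in ADPCM addresses.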

• Applications

Audio Format | Compression Algorithm | Bit Rate | Bits per Sample | Implementation
G.721 | ADPCM | 32 kbit/s | 13 bit | QuickTime, RealPlayer
G.722 | ADPCM | 64 kbit/s | 14 bit | QuickTime, RealPlayer

3. MPEG Audio Compression (Motion Picture Expert Group)
• A lossy compression.
• Can achieve transparent, perceptually lossless compression, without regard to the source of the audio data.
• Removes parts of the signal that are imperceptible to the human ear.
• About a 6-to-1 compression ratio.
• Four channel modes: monophonic, dual monophonic, stereo, and joint stereo.

• Filter Bank
• The input audio stream passes through a filter bank, which divides the input into multiple sub-bands.
• The filters are relatively simple and provide good time resolution.
• The Psychoacoustic Model
• The input audio stream simultaneously passes through a psychoacoustic model.
• This is the key component of the MPEG encoder that enables its high performance.
• It analyses the audio signal and computes the amount of noise masking available as a function of frequency.
• Bit/Noise Allocation, Quantizer & Coding Block
• The bit or noise allocation block uses the signal-to-mask ratios to decide how to assign the total number of code bits available for the quantization of the sub-band signals.
• This minimizes the audibility of the quantization noise.

• Bit Stream Formatting Block
• This last block takes the representation of the quantized audio samples and formats the data into a decodable bit stream.

• MPEG offers three compatible layers: § Layer I § Layer II § Layer III
• Each succeeding layer is able to understand (decode) the lower layers.
• Each succeeding layer offers more complexity and better compression for a given level of audio quality.
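To make the bit/noise allocation idea concrete, here is a toy sketch that gives each sub-band just enough bits to push its quantization noise (roughly 6 dB per bit) below the masking threshold. The band count and signal-to-mask ratios are invented for illustration; this is not the actual MPEG algorithm:

```python
import numpy as np

# Hypothetical per-band signal-to-mask ratios (dB), as a psychoacoustic
# model might report them for 8 sub-bands (made-up numbers).
smr_db = np.array([22.0, 30.0, 14.0, 6.0, -3.0, 9.0, 1.0, -8.0])

# Uniform quantization buys ~6 dB of SNR per bit, so allocate just enough
# bits per band to drive quantization noise below the masking threshold.
bits = np.maximum(0, np.ceil(smr_db / 6.02)).astype(int)

for band, (smr, b) in enumerate(zip(smr_db, bits)):
    print(f"band {band}: SMR {smr:6.1f} dB -> {b} bits")
# Bands with a negative SMR are fully masked and receive 0 bits.
```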

• Applications

Layer | Application
Layer I | Digital Audio Cassette
Layer II | Digital Audio Broadcast
Layer III | CD-quality audio

File Conversion
• Converts an audio file from one format to another.
• Encoding parameters such as sound quality, bitrate, and output file format can be configured.
• The converter analyses the format information of the audio file.
• It then decompresses the file to raw audio using a decompressor.
• The raw audio is provided to the coder executable, which applies the encoding parameters for the target file format.
• The tool is generally called a codec (compressor-decompressor).
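In practice such a conversion is usually driven through an existing codec front end; for example, invoking ffmpeg (assuming it is installed; the file names and bitrate here are placeholders):

```python
import subprocess

# Decode example.wav and re-encode it as a 192 kbit/s, 44.1 kHz stereo MP3.
subprocess.run([
    "ffmpeg",
    "-i", "example.wav",   # input file (any format ffmpeg can decode)
    "-ar", "44100",        # output sampling rate
    "-ac", "2",            # output channel count
    "-b:a", "192k",        # output audio bitrate
    "example.mp3",         # output; format is inferred from the extension
], check=True)
```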

Audio Formats Supported in Android

• AAC LC
Encoder: yes. Decoder: yes.
File types / containers: 3GPP (.3gp); MPEG-4 (.mp4, .m4a); ADTS raw AAC (.aac; decode in Android 3.1+, encode in Android 4.0+, ADIF not supported); MPEG-TS (.ts, not seekable, Android 3.0+).
Details: support for mono/stereo/5.0/5.1 content with standard sampling rates from 8 to 48 kHz.

• AAC ELD (enhanced low delay AAC)
Encoder: Android 4.1+. Decoder: Android 4.1+.
File types / containers: same as AAC LC.
Details: support for mono/stereo content with standard sampling rates from 16 to 48 kHz.

• AMR-NB
Encoder: yes. Decoder: yes.
File types / containers: 3GPP (.3gp).
Details: 4.75 to 12.2 kbps sampled @ 8 kHz.

• FLAC
Encoder: no. Decoder: Android 3.1+.
File types / containers: FLAC (.flac) only.
Details: mono/stereo (no multichannel); sample rates up to 48 kHz; 16-bit recommended.

• MIDI
Encoder: no. Decoder: yes.
File types / containers: MIDI Type 0 and 1 (.mid, .xmf, .mxmf); RTTTL/RTX (.rtttl, .rtx); OTA (.ota); iMelody (.imy).
Details: MIDI Type 0 and 1, DLS Version 1 and 2, XMF and Mobile XMF; support for ringtone formats RTTTL/RTX, OTA, and iMelody.

• MP3
Encoder: no. Decoder: yes.
File types / containers: MP3 (.mp3).
Details: mono/stereo, 8-320 kbps constant (CBR) or variable bit rate (VBR).

• PCM/WAVE
Encoder: Android 4.1+. Decoder: yes.
File types / containers: WAVE (.wav).
Details: 8- and 16-bit linear PCM (rates up to the limit of hardware); sampling rates for raw PCM recordings at 8000, 16000 and 44100 Hz.