<<

ESCOLA TÈCNICA SUPERIOR D’ENGINYERIA ELECTRÒNICA I INFORMÀTICA LA SALLE

FINAL DEGREE PROJECT

TELECOMMUNICATIONS ENGINEERING

Study, Analysis and Implementation of the Vocoder through Digital Signal Processing

STUDENT: Daniel Alonso Laguna

SUPERVISING PROFESSOR: Xavier Sevillano Domínguez

MINUTES OF THE FINAL DEGREE PROJECT EXAMINATION

The examining board having convened on today's date, the student

Mr. Daniel Alonso Laguna presented his Final Degree Project, which dealt with the following subject:

Once the presentation was finished and the student had answered the objections raised by the members of the board, the board graded the aforementioned Project with the mark of

Barcelona,

BOARD MEMBER    BOARD MEMBER

CHAIR OF THE BOARD


DANIEL ALONSO LAGUNA

VOCODER ANALYSIS AND IMPLEMENTATION THROUGH DIGITAL SIGNAL PROCESSING

Canoas, 2009


DANIEL ALONSO LAGUNA

VOCODER ANALYSIS AND IMPLEMENTATION THROUGH DIGITAL SIGNAL PROCESSING

Final project presented to the examining board of the Telecommunications Engineering course of Centro Universitário La Salle – Unilasalle, as a partial requirement for obtaining the degree of Bachelor in Telecommunications Engineering, under the supervision of Professor MSc Alexandre Gaspary Haupt.

CANOAS, 2009


ACKNOWLEDGEMENTS

To my parents, for their unconditional support and for giving me the chance to study everything I ever wanted. To my girlfriend, for the affection, trust and patience she always had with my labours as an eternal student; I hope my days as a student end soon, but that I never stop learning with you. To the professors on both sides of the 'pond' who helped me, influenced me and always gave me some light when everything was dark. To La Salle as an international educational institution, for giving me the opportunity to study in Brazil, and especially to the international relations departments of La Salle in Barcelona and Canoas, who were the ones who really made this possible. To the Generalitat de Catalunya and Banco Santander for giving me the financial support needed to carry out this adventure.


I dedicate this work to my family: especially to my grandparents, who passed on to my parents the values I was raised with, and who sadly left us while I was training as an engineer and, above all, as a person. To my girlfriend Judith, who did not hesitate for a moment to pack up and accompany me to Brazil to carry out this final degree project. It would not have been the same without you; thank you from the bottom of my heart. I love you all madly.


ABSTRACT

Audio effects have been a recurring technique ever since recording studios developed the mixing console in the fifties; nowadays every music producer knows that they are essential. Initially they were developed to simulate natural effects of music, like a choir of singers or the natural reverberation of a church, which is one of the main characteristics of Gregorian chant. In the 70's these effects were simulated with electronic devices, in the 80's the digital revolution arrived, and in the 90's this became an affordable reality for every kind of musician. Nowadays, with computer software, it is really easy to produce a music recording on an extremely low budget.

This work consists of the analysis and implementation of the vocoder special effect (not the speech-compression device) on different platforms, using different techniques studied throughout this project: starting with simulations in Matlab and Simulink, with the final objective of implementing a real-time vocoder on a Texas Instruments TMS320C5416 DSP board.

Keywords: Audio Effects; Digital Signal Processing; Vocoder.


LIST OF ILLUSTRATIONS

FIG. 1 – HOMER DUDLEY ...... 14
FIG. 2 – FORMANT FREQUENCIES FOR DIFFERENT VOWELS ...... 15
FIG. 3 – PRESENTATION OF THE VODER IN 1939 ...... 16
FIG. 4 – VODER'S SCHEMATIC CIRCUIT ...... 17
FIG. 5 – VODER'S BLOCK DIAGRAM ...... 18
FIG. 6 – VODER USED TO PITCHSHIFT A VOICE SIGNAL ...... 19
FIG. 7 – TWO OF THE FIRST RECORDINGS TO USE VOCODER-LIKE EFFECTS: SONOVOX ...... 20
FIG. 8 – PERFORMER USING SONOVOX ...... 20
FIG. 9 – ARTIFICIAL LARYNGES ...... 21
FIG. 10 – PETER FRAMPTON'S TALK BOX ...... 22
FIG. 11 – GHETTO TALK BOX INSIDE A TOILET PLUNGER ...... 23
FIG. 12 – EARLY 1970S VOCODER, CUSTOM BUILT FOR BAND KRAFTWERK ...... 25
FIG. 13 – TMS320VC5416 BOARD COMPONENT DESCRIPTION ...... 28
FIG. 14 – TMS320VC5416 FUNCTIONAL BLOCK DIAGRAM ...... 29
FIG. 15 – PIONEER DM DV5 MICROPHONE ...... 39
FIG. 16 – SONY MDR XD100 HEADPHONES ...... 40
FIG. 17 – CODE COMPOSER STUDIO MAIN WINDOW ...... 41
FIG. 18 – TI DSP THIRD PARTY SUPPORT ...... 47
FIG. 19 – MATLAB MAIN WINDOW AND GRAPHS ...... 49
FIG. 20 – AUDACITY EDITING SOUND WINDOW ...... 52
FIG. 21 – 3XOSC MAIN PARAMETER CONFIGURATION WINDOW ...... 56
FIG. 22 – VOCODER PARAMETER CONFIGURATION WINDOW ...... 58
FIG. 23 – KORG VOCODER VC10 ...... 61
FIG. 24 – SIBILANT SOUND OF AN S ...... 63
FIG. 25 – PLOSIVE SOUND OF A P ...... 64
FIG. 26 – SPECTRUM OF AN S ...... 64
FIG. 27 – SPECTRUM OF A P ...... 65
FIG. 28 – SPECTRUM OF AN O ...... 65
FIG. 29 – ZOOM INTO THE SPECTRUM OF AN O ...... 66
FIG. 30 – BASS DRUM SPECTRUM ...... 67
FIG. 31 – SNARE DRUM SPECTRUM ...... 67
FIG. 32 – LOW TOM DRUM SPECTRUM ...... 67
FIG. 33 – 3XOSC SAWTOOTH CONFIGURATION ...... 69
FIG. 34 – 3XOSC SAWTOOTH DRY PLOTTED SIGNAL ...... 69
FIG. 35 – 3XOSC SAWTOOTH DRY SPECTRUM ...... 70
FIG. 36 – 3XOSC SAWTOOTH WITH FLANGER PLOTTED SIGNAL ...... 70
FIG. 37 – 3XOSC SAWTOOTH WITH FLANGER+DELAY PLOTTED SIGNAL ...... 71
FIG. 38 – 3XOSC SAWTOOTH SPECTRUM WITH FLANGER+DELAY ...... 72
FIG. 39 – 3XOSC STRING SPECTRUM WITH/WITHOUT FLANGER+DELAY ...... 72
FIG. 40 – VOCODER SIMPLE BLOCK DIAGRAM ...... 73
FIG. 41 – PAIA VOCODER BLOCK DIAGRAM ...... 75
FIG. 42 – FILTER BANK INTERPRETATION VS. FOURIER TRANSFORM INTERPRETATION ...... 78
FIG. 43 – HANNING WINDOW ...... 79
FIG. 44 – FFT (BLUE) VS. BANK FILTER (RED) NUMBER OF MULTIPLICATIONS ...... 80
FIG. 45 – SPECTRAL ENVELOPE CORRECTION ...... 82
FIG. 46 – DB MAGNITUDE FREQUENCY RESPONSE OF VARIOUS WINDOWS (A) BARTLETT, (B) HANNING, (C) HAMMING, (D) BLACKMAN ...... 88
FIG. 47 – FFT SEGMENTS OVERLAPPED ...... 89
FIG. 48 – MERGING FFT INTO AVERAGED BIGGER BANDS ...... 90
FIG. 49 – TIME PLOT OF THE 3 SIGNALS INVOLVED IN THE VOCODER ...... 92
FIG. 50 – TIME PLOT OF A POORER VOCODED SIGNAL ...... 93
FIG. 51 – TIME PLOT OF DRUMS VOCODING ...... 94
FIG. 52 – SPECTRUM OF THE ORIGINAL VOICE ...... 95
FIG. 53 – SPECTRUM OF VOCODER'S OUTPUT ...... 96

LIST OF TABLES

TABLE 1 – 9-BAND DESIGN FOR THE BANDPASS FILTER ...... 74
TABLE 2 – SUBJECTIVE PERFORMANCE OF MATLAB VOCODER ...... 93


CONTENTS

1. INTRODUCTION ...... 13

1.1. VOCODER HISTORY ...... 13

2.1.1. TMS320C5416 STARTER KIT DSP BOARD ...... 27

2.1.1.1. BRIEF DESCRIPTION ...... 27

2.1.1.2. FEATURES...... 27

2.1.1.4. MEMORY ...... 31

2.1.1.5. RELOCATABLE INTERRUPT VECTOR TABLE...... 36

2.1.1.6. MULTICHANNEL BUFFERED SERIAL PORTS (MCBSPS) ...... 37

2.1.2. PIONEER DM-DV5 MICROPHONE ...... 39

2.1.3. SONY MDR-XD100 STEREO HEADPHONES ...... 40

2.2. SOFTWARE...... 41

2.2.1. CODE COMPOSER STUDIO ...... 41

2.2.2. MATLAB ...... 49

2.2.2.1. SIMULINK ...... 51

2.2.3. AUDACITY ...... 52

2.2.4. FRUITY LOOPS STUDIO ...... 54

2.2.5. SONY SOUND FORGE ...... 60

3. VOCODER’S DESIGN ASPECTS ...... 61

3.1. MODULATOR ...... 62

3.1.1. MODULATING SIGNAL ANALYSIS ...... 63

3.2. CARRIER ...... 68

3.2.1. CARRIER SIGNAL ANALYSIS ...... 69

3.3. BLOCK DIAGRAM ...... 73

3.3.1. BLOCK IMPROVEMENTS AND OTHER ASPECTS ...... 75

3.4. FILTERING VS. FFT ...... 77

3.5. THE PHASE VOCODER ...... 82

4. RESULTS ...... 84

4.1. MATLAB ...... 84

4.1.1. CODE DEVELOPED ...... 84

4.1.2. TIME ANALYSIS ...... 91

4.1.3. FREQUENCY ANALYSIS ...... 95

5. CONCLUSIONS ...... 97

5.1. FUTURE LINES OF WORK ...... 98

REFERENCES ...... 99


1. INTRODUCTION

A vocoder (pronounced /ˈvoʊkoʊdər/, a blend of the words voice and encoder) is an analysis/synthesis system, mostly used for speech, in which the input is passed through a multiband filter bank, the output of each band is passed through an envelope follower, the control signals from the envelope followers are transmitted, and the decoder applies these (amplitude) control signals to corresponding filters in the (re)synthesizer.
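This analysis/synthesis chain can be sketched in a few lines. The following is a minimal channel vocoder; the band edges, filter orders and the 50 Hz envelope smoother are illustrative assumptions of mine, not parameters taken from this project:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def channel_vocoder(modulator, carrier, fs, n_bands=9, f_lo=100.0, f_hi=3200.0):
    """Impose the modulator's per-band envelopes on the carrier."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)                # log-spaced band edges
    env_sos = butter(2, 50.0, btype="low", fs=fs, output="sos")  # envelope smoother
    out = np.zeros_like(carrier, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        mod_band = sosfilt(sos, modulator)        # analysis filter bank
        car_band = sosfilt(sos, carrier)          # synthesis filter bank
        env = sosfilt(env_sos, np.abs(mod_band))  # envelope follower: rectify + lowpass
        out += car_band * env                     # amplitude control per band
    return out
```

In a transmission setting only the per-band envelopes would cross the channel; here, for the musical effect, analysis and synthesis run side by side.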

1.1. VOCODER HISTORY

It was originally developed as a speech coder for telecommunications applications in the 1930s, the idea being to code speech for transmission. Its primary use in this fashion is for secure radio communication, where voice has to be encrypted and then transmitted. The advantage of this method of "encryption" is that no actual speech signal is sent, only the envelopes of the bandpass filters. The receiving unit needs to be set up with the same channel configuration to resynthesize a version of the original signal spectrum. The vocoder, as both hardware and software, has also been used extensively as an electronic musical instrument. Most analog vocoder systems use a number of frequency channels, all tuned to different frequencies (using band-pass filters). The values of these filters are stored not as raw numbers, which are all based on the original fundamental frequency, but as a series of modifications to that fundamental needed to turn it into the signal seen at the output of each filter. During playback these settings are sent back into the filters and then added together, modified with the knowledge that speech typically varies between these frequencies in a fairly linear way. The result is recognizable, although somewhat "mechanical" sounding, speech. Vocoders also often include a second system for generating unvoiced sounds, using a noise generator instead of the fundamental frequency.


Fig. 1 – Homer Dudley. Source: www.music.psu.edu.

The first experiments with a vocoder were conducted in 1928 by Homer Dudley, a researcher at Bell Labs' Acoustical Research division who was investigating the possibility of synthesizing human speech electronically. The aim was to conserve bandwidth over telephone circuits by sending control signals instead of actual vocal signals. Dudley discovered that vowels could be simulated with an oscillator that produced a wave containing many harmonics and a set of bandpass filters that eliminated all but a specific set of frequencies (a limited band).

The wave is analogous to the sound produced by the larynx. The vocal tract then performs a variety of filtering operations on the sound, depending on the vowel being produced. Vowels are characterized by specific formants that appear in the spectrum of a person's voice. Regardless of the pitch at which a person speaks or sings, these formant ranges are predominant. Recognizable vowel sounds may be produced with as few as three formants, each centered on a given frequency range and at a respective amplitude.

The formant frequencies and amplitudes differ, depending on the vowel sound being produced. Three approximations are shown below:

Fig. 2 – Formant frequencies for different vowels. Source: www.music.psu.edu.

Another characteristic of speech is noise, which the vocal tract filters to produce unvoiced sounds (such as "sss," "shhh," "fff") and plosives ("k," "ch," "p"). Voiced plosives -- such as "buh" and "duh" -- are produced with a combination of filtered noise and vowel sounds.
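The formant mechanism described above can be illustrated with a short sketch: a harmonic-rich source wave is summed through a few bandpass filters, one per formant. The source frequency and the three formant bands below are made-up values for a rough /a/-like vowel, not figures from this document:

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 8000
t = np.arange(fs) / fs
# Glottal-like source: a 110 Hz sawtooth, rich in harmonics
source = 2.0 * (110.0 * t % 1.0) - 1.0
# Three illustrative formant bands (assumed values, roughly an /a/-like vowel)
vowel = np.zeros_like(source)
for lo, hi in [(600, 900), (1000, 1300), (2400, 2700)]:
    sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
    vowel += sosfilt(sos, source)  # sum the formant-filtered copies
```

Changing the band centers while keeping the same source changes the perceived vowel, which is exactly why a vocoder can work from envelope information alone.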


Fig. 3 – Presentation of the Voder in 1939. Source: www.music.psu.edu.

At the 1939 World's Fair in New York and in San Francisco, Bell introduced the Voder (Voice Operating DEmonstratoR), a machine with which a technician could create a facsimile of human speech. The machine produced a sawtooth-like wave that was sent through a series of bandpass filters. The operator manipulated a set of ten switches, each of which controlled the output level of a bandpass filter. Depressing a bar with the wrist controlled the balance of pitched sound and noise, and a foot pedal controlled the pitch, allowing vocal inflections to be produced.

Whereas the vocoder analyzes speech, transforms it into electronically transmitted information, and recreates it, the Voder generated synthesized speech by means of a console with fifteen touch-sensitive keys and a foot pedal; it basically consisted of the "second half" of the vocoder, but with manual filter controls, and needed a highly trained operator. Inside the tall rack of sturdy electronic gear were a pitch-controlled reedy oscillator, a white-noise source, and ten bandpass resonant filters. For the Voder to "speak", a talented, diligently trained operator "performed" at a special console connected to the rack, using the touch-sensitive keys and foot pedal, which controlled the electronic generating components. The results, while far from perfect (it was very difficult to operate), were still entertaining and instructive of the principles involved.

Fig. 4 – Voder's schematic circuit. Source: www.music.psu.edu.

Having shown that intelligible speech could be produced comparatively simply, Dudley's next step was to eliminate the human operator and instead create an analysis unit. In 1940, Dudley introduced the vocoder (VOice CODER). A vocal signal was sent through a bank of bandpass filters. The output levels of each filter were then directed to a corresponding output filter, through which noise and "buzzy" sound was sent. The result was that the signal sent through the output filters would "talk."


Fig. 5 – Voder's block diagram. Source: www.music.psu.edu.

The vocoder, then, employs subtractive synthesis, as did the Trautonium. The two steps, analysis/synthesis, not only allow speech to be reproduced, but also to be manipulated. One way is to vary the pitch of the signal being sent through the output filter bank, thus making the "voice" higher or lower. "Harmonies" may be produced by sending more than one signal to the synthesis filters. Another way is to redirect the output of the analysis filters to synthesis filters that do not output the same frequency band. For example, the analysis of low frequencies may be sent to high output frequencies, thus remapping portions of the input spectrum. Directing low analysis frequencies to higher synthesis frequencies may produce a sound that is nasal. Directing high analysis frequencies to low synthesis frequencies may produce a sound that sounds like the speaker has a bad cold.
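The remapping idea in the last few sentences reduces to a permutation of the analysis-to-synthesis band assignment. A tiny illustration (the helper below is hypothetical, not code from this project): the envelope measured in analysis band i drives synthesis band remap[i].

```python
# Hypothetical illustration of band remapping in a channel vocoder:
# band_envelopes[i] comes from analysis band i; remap[i] names the
# synthesis band that analysis band i should drive.
def remap_envelopes(band_envelopes, remap):
    remapped = [0.0] * len(band_envelopes)
    for src, dst in enumerate(remap):
        remapped[dst] = band_envelopes[src]
    return remapped

envs = [0.9, 0.5, 0.1]  # low, mid, high band envelopes
# Identity mapping reproduces ordinary vocoding
assert remap_envelopes(envs, [0, 1, 2]) == [0.9, 0.5, 0.1]
# Reversal sends low analysis bands to high synthesis bands ("nasal" effect)
assert remap_envelopes(envs, [2, 1, 0]) == [0.1, 0.5, 0.9]
```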


Fig. 6 – Voder used to pitchshift a voice signal. Source: www.music.psu.edu.

The irony of the vocoder was that it was expensive, both in terms of bandwidth and circuitry, to analyze and resynthesize vocal signals effectively. So it largely defeated Dudley's original purpose and was never used in ordinary telephone technology. Despite this, Dudley's vocoder was used in the SIGSALY system, built by Bell Labs engineers in 1943 and used for encrypted high-level communications during World War II.


Fig. 7 – Two of the first recordings to use vocoder-like effects: Sonovox. Source: www.musicofsound.co.nz.

Many recordings in the 1940s and 1950s used vocoder-like effects. Examples include a talking foghorn on Lifebuoy soap commercials, a talking train on Bromo Seltzer commercials, children's recordings such as Sparky's Magic Piano, and the talking train in the Disney film Dumbo. These effects were not produced with a vocoder, but with a simpler device called the Sonovox, invented in January 1939 by Gilbert Wright, an engineer and radio operator. Wright hadn't shaved that particular day and was idly scratching the coarse stubble around his Adam's apple.

Fig. 8 – Performer using Sonovox. Source: www.musicofsound.co.nz.

He noticed that the sound of this action traveled through his neck and emerged from his mouth as a buzzing. Intrigued, he tried silently forming words with his mouth, lips, and tongue... and was surprised and amused to find that the words were intelligible using this odd alternate source of sound. The Sonovox worked via an audio input, but instead of loudspeakers it had two small disks. The device was held to the throat, with the two disks pressing on either side of the larynx, and a performer would silently mouth the words of a speech passage, being careful to add the unvoiced fricatives (f, sh, t, etc.). The audio sent to the disks would be substituted for vocal cord energy, and the result was a "talking signal." Such devices are now used medically for patients who have had their larynxes removed -- a buzzing sound produced by the device allows those who cannot speak to create an audible, speech-like sound.

Fig. 9 – Artificial larynges. Source: www.luminaud.com.

Later devices, such as the "talk box" used by artists such as Peter Frampton, were based on the Sonovox. A talk box is an effects device that allows a musician to modify the sound of a musical instrument. The musician controls the modification by lip syncing, or by changing the shape of their mouth. The effect can be used to shape the frequency content of the sound and to apply speech sounds (in the same way as singing) onto a musical instrument, typically a guitar (its non-guitar use is often confused with the vocoder) or keyboards.

Fig. 10 – Peter Frampton's Talk Box. Source: wikipedia.org/wiki/Talk_box.

A talk box is usually an effects pedal that sits on the floor and contains a speaker attached with an airtight connection to a plastic tube; however, it can come in other forms, such as the 'Ghetto Talkbox' (a homemade version which is usually crude) or higher quality custom-made versions. The speaker is generally in the form of a compression driver, the sound-generating part of a horn loudspeaker with the horn replaced by the tube connection.

The box has connectors for the connection to the speaker output of an instrument amplifier and a connection to a normal instrument speaker. A foot-operated switch on the box directs the sound either to the talkbox speaker or to the normal speaker. The switch is usually a push-on/push-off type. The other end of the tube is taped to the side of a microphone, extending enough to direct the reproduced sound in or near the performer's mouth. When activated, the sound from the amplifier is reproduced

by the speaker in the talkbox and directed through the tube into the performer's mouth. The shape of the mouth filters the sound, with the modified sound being picked up by the microphone. The shape of the mouth changes the harmonic content of the sound in the same way it affects the harmonic content generated by the vocal folds when speaking.

Fig. 11 – Ghetto Talk box inside a Toilet Plunger. Source: www.instructables.com/id/Build-a-Talk-box-inside-a-Toilet-Plunger/

The performer can vary the shape of the mouth and position of the tongue, changing the sound of the instrument being reproduced by the talkbox speaker. The performer can mouth words, with the resulting effect sounding as though the instrument is speaking. This "shaped" sound exits the performer's mouth, and when it enters a microphone, an instrument/voice hybrid is heard. The sound can be that of any musical instrument, but the effect is most commonly associated with the guitar. The rich harmonics of an electric guitar are shaped by the mouth producing a sound very similar to voice, effectively allowing the guitar to appear to "speak".

1.2. MUSICAL AND CINEMA HISTORY

For musical applications, a source of musical sounds is used as the carrier, instead of extracting the fundamental frequency. For instance, one could use the sound of a synthesizer as the input to the filter bank, a technique that became popular in the 1970s.

In 1969, electronic music pioneer Bruce Haack built one of the first truly musical vocoders. He named it 'Farad' after the 19th-century English chemist/physicist Michael Faraday and, unlike its successors and predecessors, 'Farad' was programmed by touch and proximity relays. This invention was first used on Haack's album The Electronic Record for Children (1969), a DIY home pressing found mostly in libraries and elementary schools. In 1970, Wendy Carlos and Robert Moog followed with a 10-band device inspired by the vocoder designs of Homer Dudley. It was originally called a spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier signal came from a Moog modular synthesizer, and the modulator from a microphone input. The output of the 10-band vocoder was fairly intelligible, but relied on specially articulated speech. Later improved vocoders use a high-pass filter to let some sibilance through from the microphone; this ruins the device for its original speech-coding application, but it makes the "talking synthesizer" effect much more intelligible.

Carlos and Moog's vocoder was featured in several recordings, including the soundtrack to Stanley Kubrick's A Clockwork Orange, in which the vocoder sang the vocal part of Beethoven's "Ninth Symphony". Also featured on the soundtrack was a piece called "Timesteps," which featured the vocoder in two sections. "Timesteps" was originally intended as merely an introduction to vocoders for the "timid listener", but Kubrick chose to include the piece on the soundtrack, much to the surprise of Wendy Carlos. Kubrick also used the Eltro device to create the lobotomized, dying voice of HAL in 2001: A Space Odyssey.


Fig. 12 – Early 1970s vocoder, custom built for electronic music band Kraftwerk. Source: www.wikipedia.org/wiki/File:Vocoder.JPG

Bruce Haack's Electric Lucifer (1970) was the first rock album to include the vocoder and was followed several years later by Kraftwerk's Autobahn. Another of the early songs to feature a vocoder was "The Raven" on the 1976 album Tales of Mystery and Imagination by progressive rock band The Alan Parsons Project; the vocoder also was used on later albums such as I Robot. Following Alan Parsons' example, vocoders began to appear in pop music in the late 1970s, for example, on disco recordings. Jeff Lynne of Electric Light Orchestra used the vocoder in several albums such as Time (featuring the Roland VP-330 Plus MkI). ELO songs such as "Mr. Blue Sky" and "Sweet Talking Woman" both from Out of the Blue (1977) use the vocoder extensively. Featured on the album are the EMS Vocoder 2000W MkI, and the EMS Vocoder (-System) 2000 (W or B, MkI or II).

Giorgio Moroder made extensive use of the vocoder on the 1975 album Einzelganger and on the 1977 album From Here to Eternity. Another example is Pink Floyd's album Animals, where the band put the sound of a barking dog through the device. Vocoders are often used to create the sound of a robot talking, as in the Styx song

"Mr. Roboto". It was also used for the introduction to the Main Street Electrical Parade at Disneyland.

Vocoders have appeared on pop recordings from time to time ever since, most often simply as a special effect rather than a featured aspect of the work. However, many experimental electronic artists of the New Age music genre often utilize the vocoder in a more comprehensive manner in specific works, such as Jean Michel Jarre (on Zoolook, 1984) and Mike Oldfield (on Five Miles Out, 1982). There are also some artists who have made vocoders an essential part of their music, overall or during an extended phase. Examples include the German synthpop group Kraftwerk, Stevie Wonder ("Send One Your Love," "A Seed's a Star"), jazz/fusion keyboardist Herbie Hancock during his late 1970s disco period, the synth-funk groups Midnight Star and The Jonzun Crew during the mid 1980s, French jazz organist Emmanuel Bex, Patrick Cowley's later recordings and, more recently, avant-garde pop groups Trans Am, Black Moth Super Rainbow, Daft Punk, ROCKETS, Does It Offend You, Yeah?, The Medic Droid, electronica band The Secret Handshake, the Christian synthpop band Norway, metal bands such as At All Cost, Boots With Spurs and Cynic, electronica/progressive bands I See Stars and Breathe Carolina, and most recently Japanese electronica/dance band m.o.v.e.

The list of artists who use a vocoder nowadays is endless, from Madonna to System of a Down.

2. METHODS AND MATERIALS

2.1. HARDWARE

2.1.1. TMS320C5416 starter kit DSP Board

2.1.1.1. Brief Description

The TMS320C5416 DSP starter kit (DSK) is a low-cost development platform designed to speed the development of power-efficient applications based on Texas Instruments' TMS320C54x DSPs. The kit, which provides performance-enhancing features such as USB communications and true plug-and-play functionality, gives both experienced and novice designers an easy way to get started immediately with innovative product designs. The C5416 DSK offers the ability to detect, diagnose and correct DSK communications issues, to download and step through code faster, and to get higher throughput with Real Time Data Exchange (RTDX™).

2.1.1.2. Features

The TMS320C5416 DSK features the TMS320C5416 DSP - the designer's choice for applications that require an optimized combination of power, performance and area. With 160 MIPS performance, designers can use the 160 MHz device as the foundation for a range of signal processing applications, including speech compression/decompression, speech recognition, text-to-speech conversion, fax/data conversion and echo cancellation. Other hardware features of the TMS320C5416 DSK board include:

Embedded JTAG support via USB

High-quality 16-/20-bit stereo codec

Four 3.5mm audio jacks for microphone, line in, speaker and line out

256K words of Flash and 64K words RAM

Expansion port connector for plug-in modules

On-board standard JTAG interface

+5V universal power supply

160 MHz DSP.

PCM3002 stereo codec.

Sample rate 6 kHz-48 kHz.

3 multichannel buffered serial ports.

DSP/BIOS real-time multitasking kernel.

Fig. 13 – TMS320VC5416 board component description. Source: Prof. Artur Severo's personal files, 2009

The TMS320VC5416 fixed-point, digital signal processor (DSP) (hereafter referred to as the device unless otherwise specified) is based on an advanced modified Harvard architecture that has one program memory bus and three data memory buses. This

processor provides an arithmetic logic unit (ALU) with a high degree of parallelism, application-specific hardware logic, on-chip memory, and additional on-chip peripherals. The basis of the operational flexibility and speed of this DSP is a highly specialized instruction set.

Separate program and data spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. Two read operations and one write operation can be performed in a single cycle. Instructions with parallel store and application-specific instructions can fully utilize this architecture. In addition, data can be transferred between data and program spaces. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle. The device also includes the control mechanisms to manage interrupts, repeated operations, and function calls.

2.1.1.3. Functional Overview

Fig. 14 – TMS320VC5416 functional block diagram. Source: TMS320VC5416 manual


2.1.1.4. Memory

The device provides both on-chip ROM and RAM memories to aid in system performance and integration.

The Data Memory space addresses up to 64K of 16-bit words. The device automatically accesses the on-chip RAM when addressing within its bounds. When an address is generated outside the RAM bounds, the device automatically generates an external access.

The advantages of operating from on-chip memory are as follows:

Higher performance because no wait states are required

Higher performance because of better flow within the pipeline of the central arithmetic logic unit (CALU)

Lower cost than external memory

Lower power than external memory

The advantage of operating from off-chip memory is the ability to access a larger address space.

Concerning program memory, the on-chip memory cells can be configured by software to reside inside or outside of the program address map. When the cells are mapped into program space, the device automatically accesses them when their addresses are within bounds. When the program-address generation logic generates an address outside its bounds, the device automatically generates an external access. The advantages of operating from on-chip memory are as follows:

Higher performance because no wait states are required

Lower cost than external memory

Lower power than external memory

The advantage of operating from off-chip memory is the ability to access a larger address space.

The device uses a paged extended memory scheme in program space to allow access of up to 8192K words of program memory. To implement this scheme, the device includes the following features:

Twenty-three address lines, instead of sixteen

An extra memory-mapped register, the XPC

Six extra instructions for addressing extended program space

Program memory in the device is organized into 128 pages that are each 64K in length.

The value of the XPC register defines the page selection. This register is memory-mapped into data space at address 001Eh. At a hardware reset, the XPC is initialized to 0.
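The paging arithmetic above can be checked numerically: the 23-bit extended program address is the 7-bit page number from XPC concatenated with the 16-bit address within the page. The helper below is an illustration of mine, not part of the TI toolchain:

```python
# Illustrative sketch of the C5416 paged extended program memory scheme.
PAGE_WORDS = 0x10000  # 64K words per page

def extended_address(xpc: int, addr16: int) -> int:
    """Combine the XPC page number with a 16-bit in-page address."""
    assert 0 <= xpc < 128 and 0 <= addr16 < PAGE_WORDS
    return (xpc << 16) | addr16

assert extended_address(0, 0x8000) == 0x008000     # page 0
assert extended_address(127, 0xFFFF) == 0x7FFFFF   # top of the 23-bit space
assert 128 * PAGE_WORDS == 8192 * 1024             # 128 pages x 64K = 8192K words
```

Note how the top address, 0x7FFFFF, needs exactly the twenty-three address lines mentioned above.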

On-Chip ROM With Bootloader

The device features a 16K-word × 16-bit on-chip maskable ROM that can only be mapped into program memory space. Customers can arrange to have the ROM of the device programmed with contents unique to any particular application. A bootloader is available in the standard on-chip ROM.

This bootloader can be used to automatically transfer user code from an external source to anywhere in the program memory at power up. If MP/MC of the device is sampled low during a hardware reset, execution begins at location FF80h of the on-chip ROM. This location contains a branch instruction to the start of the bootloader program.

The standard devices provide different ways to download the code to accommodate various system requirements:

Parallel from 8-bit or 16-bit-wide EPROM

Parallel from I/O space, 8-bit or 16-bit mode

Serial boot from serial ports, 8-bit or 16-bit mode

Host-port interface boot

Warm boot

On-Chip RAM

The device contains 64K-word × 16-bit of on-chip dual-access RAM (DARAM) and 64K-word × 16-bit of on-chip single-access RAM (SARAM).

The DARAM is composed of eight blocks of 8K words each. Each block in the DARAM can support two reads in one cycle, or a read and a write in one cycle. Four blocks of DARAM are located in the address range 0080h-7FFFh in data space, and

can be mapped into program/data space by setting the OVLY bit to one. The other four blocks of DARAM are located in the address range 18000h-1FFFFh in program space. The DARAM located in the address range 18000h-1FFFFh in program space can be mapped into data space by setting the DROM bit to one.

The SARAM is composed of eight blocks of 8K words each. Each of these eight blocks is a single-access memory. For example, an instruction word can be fetched from one SARAM block in the same cycle as a data word is written to another SARAM block. The SARAM is located in the address range 28000h-2FFFFh, and 38000h-3FFFFh in program space.

On-Chip Memory Security

The device has a maskable option to protect the contents of on-chip memories. When the RAM/ROM security option is selected, the following restrictions apply:

· Only instructions originating in the on-chip ROM can read the contents of the on-chip ROM; instructions originating in on-chip RAM or external RAM cannot read data from the ROM: 0FFFFh is read instead. Code can still branch to ROM from on-chip RAM or external program memory.

· The contents of on-chip RAM can be read by all instructions, even by instructions fetched from external memory. To protect the internal RAM, the user must never branch to external memory.

· The security feature completely disables the scan-based emulation capability of the 54x to prevent the use of a debugger utility. This only affects emulation and does not prevent the use of the JTAG boundary scan test capability.

· The device is internally forced into microcomputer mode at reset (MP/MC bit forced to zero), preventing the ROM from being disabled by the external MP/MC pin. The status of the MP/MC bit in the PMST register can be changed after reset by the user application.

· HPI writes have no restriction, but HPI reads are restricted to the 4000h - 5FFFh address range.

If the ROM-only security option is selected, the following restrictions apply:

· Only the on-chip ROM originating instructions can read the contents of the on-chip ROM; on-chip RAM and external RAM originating instructions cannot read data from ROM: instead, 0FFFFh is read. Code can still branch to ROM from on-chip RAM or external program memory.

· The contents of on-chip RAM can be read by all instructions, even by instructions fetched from external memory. To protect the internal RAM, the user must never branch to external memory.

· The security feature completely disables the scan-based emulation capability of the 54x to prevent the use of a debugger utility. This only affects emulation and does not prevent the use of the JTAG boundary scan test capability.

· The device can be started in either microcomputer mode or microprocessor mode at reset (depends on the MP/MC pin).

· HPI read and writes have no restriction.

2.1.1.5. Relocatable Interrupt Vector Table

The reset, interrupt, and trap vectors are addressed in program space. These vectors are soft, meaning that the processor, when taking the trap, loads the program counter (PC) with the trap address and executes the code at the vector location. Four words, either two 1-word instructions or one 2-word instruction, are reserved at each vector location to accommodate a delayed branch instruction, which allows branching to the appropriate interrupt service routine without additional overhead. At device reset, the reset, interrupt, and trap vectors are mapped to address FF80h in program space.

However, these vectors can be remapped to the beginning of any 128-word page in program space after device reset. This is done by loading the interrupt vector pointer (IPTR) bits in the PMST register with the appropriate 128-word page boundary address. After loading IPTR, any user interrupt or trap vector is mapped to the new 128-word page.
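The remapping arithmetic is simple: IPTR supplies the upper 9 bits of the vector address and the vector's offset within its 128-word page supplies the lower 7. A small sketch (a helper written for illustration, not TI code):

```python
def vector_address(iptr, vector_offset):
    """Compute a C54x interrupt vector address from the 9-bit IPTR field.

    IPTR selects a 128-word page in program space; the vector's
    offset within that page occupies the low 7 bits.
    """
    if not 0 <= iptr <= 0x1FF:
        raise ValueError("IPTR is a 9-bit field")
    return (iptr << 7) | (vector_offset & 0x7F)

# At reset IPTR = 1FFh, which places the vector table at FF80h:
print(hex(vector_address(0x1FF, 0)))  # -> 0xff80
```

This reproduces the reset behaviour stated above: with IPTR loaded to its reset value of 1FFh, the vectors sit at FF80h.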

2.1.1.6. Multichannel Buffered Serial Ports (McBSPs)

The device provides three high-speed, full-duplex, multichannel buffered serial ports that allow direct interface to other C54x/LC54x devices, codecs, and other devices in a system. The McBSPs are based on the standard serial-port interface found on other 54x devices. Like their predecessors, the McBSPs provide:

Full-duplex communication

Double-buffer data registers, which allow a continuous data stream

Independent framing and clocking for receive and transmit

In addition, the McBSPs have the following capabilities:

Direct interface to:

o T1/E1 framers

o MVIP switching compatible and ST-BUS compliant devices

o IOM-2 compliant devices

o AC97-compliant devices

o IIS-compliant devices

o Serial peripheral interface

Multichannel transmit and receive of up to 128 channels

A wide selection of data sizes, including 8, 12, 16, 20, 24, or 32 bits

µ-law and A-law companding

Programmable polarity for both frame synchronization and data clocks

Programmable internal clock and frame generation

The McBSP consists of a data path and control path. The six pins, BDX, BDR, BFSX, BFSR, BCLKX, and BCLKR, connect the control and data paths to external devices. The implemented pins can be programmed as general-purpose I/O pins if they are not used for serial communication.

The data is communicated to devices interfacing to the McBSP by way of the data transmit (BDX) pin for transmit and the data receive (BDR) pin for receive. The CPU or DMA reads the received data from the data receive register (DRR) and writes the data to be transmitted to the data transmit register (DXR). Data written to the DXR is shifted out to BDX by way of the transmit shift register (XSR). Similarly, receive data on the BDR pin is shifted into the receive shift register (RSR) and copied into the receive buffer register (RBR). RBR is then copied to DRR, which can be read by the CPU or DMA. This allows internal data movement and external data communications simultaneously.

Control information in the form of clocking and frame synchronization is communicated by way of BCLKX, BCLKR, BFSX, and BFSR. The device communicates to the McBSP by way of 16-bit-wide control registers accessible via the internal peripheral bus.

The control block consists of internal clock generation, frame synchronization signal generation, and their control, and multichannel selection. This control block sends notification of important events to the CPU and DMA by way of two interrupt signals, XINT and RINT, and two event signals, XEVT and REVT.

The on-chip companding hardware allows compression and expansion of data in either µ-law or A-law format. When companding is used, transmitted data is encoded according to the specified companding law and received data is decoded to 2s-complement format. The sample rate generator provides the McBSP with several means of selecting clocking and framing for both the receiver and transmitter. Both the receiver and transmitter can select clocking and framing independently. The McBSP allows multiple channels to be independently selected for the transmitter and receiver. When multiple channels are selected, each frame represents a time-division multiplexed (TDM) data stream. In using time-division multiplexed data streams, the CPU may only need to process a few of them. Thus, to save memory and bus bandwidth, multichannel selection allows independent enabling of particular channels for transmission and reception. All 128 channels in a bit stream consisting of a maximum of 128 channels can be enabled.
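As an illustration of the companding idea, the continuous µ-law characteristic (µ = 255) can be written as follows. Note this shows only the compression curve itself, not the 8-bit G.711 codeword format the hardware actually produces:

```python
import math

MU = 255  # mu-law constant used in North America and Japan

def mu_law_compress(x):
    """Continuous mu-law compressor for x in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress (the expander)."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

Small amplitudes are boosted before quantization and attenuated again on expansion, which is what gives companded speech its improved signal-to-noise ratio at low levels.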

2.1.2. PIONEER DM-DV5 MICROPHONE

Fig. 15 – Pioneer DM-DV5 Microphone. Source: www.pioneer.com

To record voice, or even use it as an input for the DSP, a Pioneer DM-DV5 microphone has been used. Its main specifications are described here:

Unidirectional dynamic type microphone

Frequency response: 80 - 13,000 Hz

Sensitivity: -57 dB (0 dB = 1 V/Pa, at 1 kHz)

Impedance: 600 Ω

3.5 mm 2P mini-plug (gold plated)

Weight: 255 g including cord

Accessories: 6.3 mm 2P plug adapter (gold plated)

Talk switch ON/OFF

39 2.1.3. SONY MDR-XD100 STEREO HEADPHONES

Fig. 16 – Sony MDR XD100 Headphones Font: www.sony.com

To perform good listening while developing effects, these headphones were chosen to obtain crystal-clear sound on a student budget. Specifications:

Type: Dynamic, Closed

Driver Unit: 40 mm

Sensitivity: 100 dB/mW

Power Handling Capacity: 1.5 W

Impedance: 70 ohms

Frequency Response: 12 - 22,000 Hz

2.2. SOFTWARE

2.2.1. Code Composer Studio

Code Composer Studio (CCS) is the integrated development environment for TI's DSPs, microcontrollers and application processors.

Fig. 17 – Code Composer Studio main window. Source: Daniel's personal files, 2009

Code Composer Studio™ IDE is a powerful set of integrated development tools that can be enhanced with TI and third-party plug-ins. With all these tools in one integrated environment, CCStudio boosts effectiveness and productivity. The Code Composer Studio (CCStudio) Platinum Edition series provides developers with a unified DSP platform development environment and a collection of integrated tools designed to reduce common debug frustrations. With powerful features for every stage of the development process, CCStudio Platinum enables developers to meet their application performance and schedule goals with easy-to-use, robust development tools and software.

Code Composer Studio includes a suite of tools used to develop and debug embedded applications. It includes compilers for each of TI's device families, a source code editor, a project build environment, a debugger, a profiler, simulators and many other features. The CCS IDE provides a single user interface taking you through each step of the application development flow. Familiar tools and interfaces allow users to get started faster than ever before and to add functionality to their applications thanks to sophisticated productivity tools.

As of version 4, CCS is based on the Eclipse open source software framework. The Eclipse software framework is used for many different applications, but it was originally developed as an open framework for creating development tools. Texas Instruments have chosen to base CCS on Eclipse as it offers an excellent software framework for building software development environments and is becoming a standard framework used by many embedded software vendors. CCS combines the advantages of the Eclipse software framework with advanced embedded debug capabilities from Texas Instruments, resulting in a compelling, feature-rich development environment for embedded developers.

Code Composer Studio IDE Key Benefits

Quick start with familiar tools and interfaces

Easily manage large multi-user, multi-site and multi-processor projects

Utilize fast code creation, optimization and debugging tools

Maximize reuse and portability for faster code development

Perform real-time analysis enabled by RTDX and DSP/BIOS technologies

CCStudio Platinum Highlights

End Debug Frustration and Stress

Connect/Disconnect makes it easier to connect and disconnect from the target dynamically. This functionality provides a robust and resilient connection to the target board and even allows you to restore the previous debug state when connecting again.

Rewind Debugging saves developers from repeating tedious reload and rerun sequences with a single keystroke backstep option to quickly jump to the previous instruction in the source code.

Unified Breakpoint Manager quickly organizes and manages both software and hardware breakpoints from an easy-to-use interface.

Shorten Learning Curve and Simple Configuration Control

CodeWright: the industry's most familiar and popular editor, now integrated with CCStudio, shortens the learning curve and offers rich editing capabilities.

Component manager: Simple configuration control of CCStudio components allows developers to freeze or evaluate new versions of compilers and DSP/BIOS kernel releases.

CCStudio IDE: One look and feel for three DSP platforms eliminates learning curve for evolving/expanding cross-platform DSP development.

Achieve Programming Goals in Less Time with Maximum Results

Application Code Tuning takes weeks out of the optimization process with a collection of integrated tuning tools. CCStudio enables developers to build applications faster and easier than ever before, utilizing a "tuning approach" that allows full optimization of both the code and the silicon.

Tuning Dashboard provides the user with a single interface for managing the optimization process. A user-defined Goals window allows the user to set up optimization targets and track progress towards the desired goals. The dashboard also contains a proactive Advice Window that provides specific optimization suggestions and advice on which tuning tools to use to achieve development goals. A Profile Setup and Viewer feature manages and displays the data collected during development.

Compiler Consultant analyzes your application and makes recommendations for efficient coding. Each time the application is compiled, Compiler Consultant will examine the code and create suggestions for different optimization techniques to improve code efficiency.

CacheTune makes it easier to identify non-optimal cache usage by graphically representing cache memory accesses. This visual/temporal view of cache accesses enables quick identification of problem areas (such as areas related to conflict, capacity, or compulsory misses) to help you greatly improve an application's overall cache efficiency.

Texas Instruments have developed eXpressDSP Real-Time Target-Side Software that saves valuable time, with the following features:

DSP/BIOS™ Kernel – a proven, scalable, real-time software kernel including chip support libraries that create the foundation for software development work, eliminating many low-level coding tasks and greatly simplifying real-time task scheduling.

TMS320™ DSP Algorithm Standard – an interoperability coding standard that facilitates the reuse of software components from your previous projects, other developers and outside sources.

eXpressDSP Reference Frameworks – a set of open source, C-based starterware templates optimized for multiple application parameters. Production-ready code, which is simple to modify, enables you to be up and running very quickly.

DSP/BIOS Kernel

DSP/BIOS is a scalable, real-time kernel that is designed for applications that require real-time scheduling and synchronization, host-to-target communication, or real-time instrumentation.

The DSP/BIOS kernel is packaged as a set of modules that can be linked into an application. It is integrated with Code Composer Studio™ Integrated Development Environment (IDE), requires no runtime license fees, and is fully supported by Texas Instruments. The kernel is also a key component of TI's eXpressDSP™ technology.

The DSP/BIOS kernel enables you to develop and deploy sophisticated applications and eliminates the need to develop and maintain custom operating systems or control loops. Because multi-threading enables real-time applications to be cleanly partitioned, applications using the DSP/BIOS kernel are easier to maintain, and new functions can be added without disrupting real-time response. The DSP/BIOS kernel provides standardized APIs across the TMS320C2000™, TMS320C5000™ and TMS320C6000™ DSP platforms to support rapid application migration. Additionally, it includes configuration support for EVMs, DSKs, simulators, and some third-party boards. Existing configuration templates are easily adaptable to provide support for custom boards and other third-party boards.

The DSP/BIOS kernel is integrated into the Code Composer Studio IDE. Code Composer Studio's kernel object viewer and real-time analysis provide a powerful set of integrated tools specifically focused on debugging and tuning multitasking applications.

DSP/BIOS Kernel Components

DSP/BIOS Configuration Tool. This tool allows you to create and configure the DSP/BIOS kernel objects used by your program. You can use this tool to configure memory, thread priorities, and interrupt handlers settings.

DSP/BIOS Real-Time Analysis Tools. These tools allow you to view program activity in real time. For example, the Execution Graph shows a diagram of thread activity.

DSP/BIOS Kernel. Your C, C++, and assembly language programs can call over 150 DSP/BIOS functions.


Fig. 18 – TI DSP Third Party Support. Source: www.ti.com

DSP Third Party Network Overview

The TI DSP Third Party Network is a worldwide community of respected companies offering products and services that support TI DSPs. Products and services include a broad range of end-equipment solutions, embedded software, engineering services and development tools that help customers accelerate development efforts and cut time-to-market.


Embedded software – eXpressDSP™ compliant algorithms and libraries for a variety of applications such as voice, audio, video, imaging, telecommunications, speech, biometrics, encryption, motor control, as well as others.

Development Tools – Hardware and software tools including emulators, device programmers, development boards, simulators, debuggers and eXpressDSP- compliant plug-ins for Code Composer Studio integrated development environment

End-Equipment Solutions – Third party developed designs for a variety of end equipment solutions. These designs allow you to get a jump start on your design while leveraging TI's depth and breadth of products.

Engineering Services – Engineering services include turnkey designs, hardware and software integration, training, research and development

2.2.2. MATLAB

Matlab is a tool for doing numerical computations with matrices and vectors. It can also display information graphically. To perform the algorithm tests, Matlab and Simulink have been used as a first development tool to achieve the goal of programming a vocoder.

Fig. 19 – Matlab main window and graphs. Source: www.mathworks.com

MATLAB is a numerical computing environment and fourth-generation programming language. Developed by The MathWorks, MATLAB allows matrix manipulation, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs in other languages. Although it is numeric only, an optional toolbox uses the MuPAD symbolic engine, allowing access to computer algebra capabilities. An additional package, Simulink, adds graphical multidomain simulation and Model-Based Design for dynamic and embedded systems. Matlab is well adapted to numerical experiments since the underlying algorithms for Matlab's built-in functions and supplied m-files are based on the standard libraries LINPACK and EISPACK.

Matlab program and script files always have filenames ending with ".m"; the programming language is exceptionally straightforward since almost every data object is assumed to be an array. Graphical output is available to supplement numerical results.

MATLAB can be used in a wide range of applications, including signal and image processing, communications, control design, test and measurement, financial modeling and analysis, and computational biology. Add-on toolboxes (collections of special-purpose MATLAB functions, available separately) extend the MATLAB environment to solve particular classes of problems in these application areas.

MATLAB provides a number of features for documenting and sharing your work. You can integrate your MATLAB code with other languages and applications, and distribute your MATLAB algorithms and applications.

Key Features

High-level language for technical computing

Development environment for managing code, files, and data

Interactive tools for iterative exploration, design, and problem solving

Mathematical functions for linear algebra, statistics, Fourier analysis, filtering, optimization, and numerical integration

2-D and 3-D graphics functions for visualizing data

Tools for building custom graphical user interfaces

Functions for integrating MATLAB based algorithms with external applications and languages, such as C, C++, Fortran, Java, COM, and Microsoft Excel

2.2.2.1. SIMULINK

Simulink, developed by The MathWorks, is a commercial tool for modeling, simulating and analyzing multidomain dynamic systems. Its primary interface is a graphical block diagramming tool and a customizable set of block libraries. It offers tight integration with the rest of the MATLAB environment and can either drive MATLAB or be scripted from it. Simulink is widely used in control theory and digital signal processing for multidomain simulation and design.

A number of MathWorks and third-party hardware and software products are available for use with Simulink. For example, Stateflow extends Simulink with a design environment for developing state machines and flow charts.

Coupled with Real-Time Workshop, another product from The MathWorks, Simulink can automatically generate C code for real-time implementation of systems. As the efficiency and flexibility of the code improve, this is becoming more widely adopted for production systems, in addition to being a popular tool for embedded system design work because of its flexibility and capacity for quick iteration. Real-Time Workshop Embedded Coder creates code efficient enough for use in embedded systems. xPC Target together with x86-based real-time systems provides an environment to simulate and test Simulink and Stateflow models in real time on the physical system. Other add-ons support specific embedded targets, including Infineon C166, Motorola 68HC12, Motorola MPC 555, TI C2000, TI C6000 and TI C5000, which is the DSP available in our laboratory.

2.2.3. AUDACITY

Audacity is a free, easy-to-use and multilingual audio editor and recorder for Windows, Mac OS X, GNU/Linux and other operating systems.

Audacity is free software, developed by a group of volunteers and distributed under the GNU General Public License (GPL). Free software gives you the freedom to use a program, study how it works, improve it and share it with others. Programs like Audacity are also called open source software, because their source code is available for anyone to study or use.

Fig. 20 – Audacity sound editing window. Source: Daniel's personal files, 2009

Some of Audacity's features include:

Importing and exporting WAV, AIFF, MP3 (via the LAME encoder, downloaded separately), Ogg Vorbis, all file formats supported by libsndfile library

Version 1.3.2 also supports Free Lossless Audio Codec (FLAC)

Recording and playing sounds

Editing via Cut, Copy, Paste (with unlimited Undo)

Multitrack mixing

A large array of digital effects and plug-ins. Additional effects can be written with Nyquist

Amplitude envelope editing

Noise removal

Audio spectrum analysis using the Fourier transformation algorithm

Support for multi-channel modes with sampling rates up to 96 kHz with 24 bits per sample

The ability to make precise adjustments to the audio's speed while maintaining pitch (Audacity calls it changing tempo), in order to synchronize it with video, run for the right length of time, etc.

The ability to change the audio's pitch without changing the speed.

Contains major features of modern multi-track audio software [7] including navigation controls, zoom and single track edit, project pane and XY project navigation, non-destructive and destructive effect processing, audio file manipulation (cut, copy, paste)

Converting cassette tapes or records into digital tracks by automatically splitting one track into multiple tracks based on silences in the track and the export multiple option.

Multi-platform: works on Windows, Mac OS X, and Unix-like systems (including GNU/Linux and BSD) amongst others.

2.2.4. FRUITY LOOPS STUDIO

A synthesizer is needed to generate all the synth sounds, but a hardware synthesizer is very expensive, and there are interesting software solutions to this problem. Nowadays, hundreds of software instruments are available on the market, so the choice is not an easy one.

For this project, FL Studio has been chosen mainly for three reasons:

It is not just a synth, it is a complete Digital Audio Workstation (DAW)

It has many different instruments and effects included to try

It has a built-in Vocoder

2.2.4.1. General Description of FL Studio software

FL Studio, also known as FruityLoops, is a digital audio workstation (DAW) developed by Belgian company Image-Line Software. FL Studio was originally the creation of Didier Dambrin, who is now the lead programmer at Image-Line responsible for its core development.

FL Studio features a fully automatable workflow centered around a pattern-based music sequencer. The environment includes MIDI support and incorporates a number of features for the editing, mixing, recording, and mastering of audio. Completed songs or clips may be exported to Microsoft WAV, MP3, and the OGG Vorbis format using various high-quality sampling interpolation algorithms. FL Studio saves work in a proprietary *.flp format, inadvertently sharing the same extension as a file type created by Adobe Flash.

The program is acclaimed for its professional DAW features at an attractive price point, its fully-functional demonstration versions, and its lifetime free update policy. Scoring to video is possible using the video-player plugin, but there is no support for traditional music notation.

FL Studio processes audio using an internal 32-bit floating-point engine. It can support sampling rates up to 192 kHz using either WDM or ASIO enabled drivers. The audio engine in version 7 introduces limited multi-threading and multi-core support for some generators.

The mixer interface allows for any number of channel configurations. This makes mixing in 2.1, 5.1, 7.1 surround sound possible, granted the output hardware interface has an equal number of outputs. The mixer permits audio-in, enabling FL Studio to act as a multi-track audio recording solution.

FL Studio comes with a variety of plugins and generators (software synthesizers) written in the program's own native plugin architecture. This API has a built-in wrapper for full VST, VST2, DX, and ReWire compatibility.

An included plugin called Dashboard allows users to create full automation-enabled interfaces for their hardware MIDI devices. This allows FL Studio to control hardware from within the program.

FL Studio 8 Producer Edition includes a version of SynthMaker, the popular graphical programming environment for synthesizers. Support for the software is found in the extensive guides that are provided as HTML help documents. Users may also register for the official Image-Line forums, which are commonly recognized as a focal-point for the FL Studio community.

FL Studio also comes with a variety of sound effects, some of which include:

Chorus

Compression

Delay

Flanger

Phaser

Reverb

Equalization

2.2.4.2. 3xOSC

3xOsc is a 3-oscillator, subtractive synthesizer feeding the FL Studio Sampler. Its purpose is to generate a bright sound to be filtered by the channel's instrument tools. It can also produce its own kind of stereo phasing.

To save CPU and memory, the oscillators are anti-aliased during rendering only. This means the rendered tracks may sound better, but rendering will be slower.

Fig. 21 – 3xOSC main parameter configuration window. Source: flstudio.image-line.com

This section covers parameters for all of the oscillators.

Shape Selector - Buttons in each of the oscillators, allowing one of the following choices: sine, triangle, square, saw, rounded saw, noise, custom. Custom uses whatever sample is loaded in the channel Sampler Settings page.

Invert Switch (INV) - Allows you to invert the phase of the oscillator. When you mix two oscillators with equal settings, and one of them is inverted, they cancel each other and no sound is produced. However, if you set the Fine frequency to a slightly different value, this can produce interesting flanging/phasing effects.

Stereo Phase (SP) - Allows you to set a different phase offset for the left and right channels of the generator. The offset results in the oscillator starting at a different point on the oscillator's shape (e.g., starting at the highest value of the sine function instead of at the zero point). Stereo phase offset adds to the richness and stereo panorama of the sound produced.

Stereo Detune (SD) - Allows you to detune the stereo sound of the generator by applying a slightly different frequency to the left and right sound channels. This adds to the stereo panorama of the sound produced, and creates a "stereo flange" effect. When the knob is in the middle, the effect is turned off.

Volume (VOL) - Sets the relative volume of each of the oscillators. Relative means that the overall sound output is kept at the same level, but the relative amount of each of the oscillators is adjustable. The first oscillator lacks this control, as at least one oscillator is always required to produce sound. For example, to mix oscillators 1 and 2 at 50% each, set the volume of oscillator 2 to 100% and oscillator 3 to zero. To mix all 3 oscillators at 33.3% each, set the volume knobs to 100% for oscillators 2 and 3. To mix only oscillator 1 at 100%, set all other volume levels to zero.

Panning (PAN) - Sets the stereo panning of the individual oscillators.

Coarse Tuning (CRS) - Sets the coarse tuning (range -24 to +24 semitones) of the individual oscillators.

Fine Tuning (FINE) - Sets the fine tuning (range -1 to +1 semitone) of the individual oscillators.

Other Parameters

OSC 3 Amplitude Modulation (OSC 3 > AM) - Switch on to use Oscillator 3 as an amplitude modulation of the other 2 oscillators.

Stereo Phase Randomness (PR) - Allows you to add "randomness" to the stereo phase of all oscillators. Low values can make the sound slightly more natural, while higher values can be used as a special effect. Stereo phasing can introduce clicks, since the oscillators start somewhere "inside" the waveform. You can fix this by using a volume envelope with a short attack time.
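The flanging described for the INV, Fine and SD controls comes from mixing two nearly identical oscillators. A small sketch (written for illustration, not Image-Line code) of two equal-amplitude sines detuned by a few hertz shows the slow amplitude beating that results:

```python
import math

def detuned_pair(freq, detune_hz, t):
    """Sum of two equal-amplitude sines, the second detuned by detune_hz."""
    return (math.sin(2 * math.pi * freq * t)
            + math.sin(2 * math.pi * (freq + detune_hz) * t))

def beat_envelope(detune_hz, t):
    """Slowly varying amplitude envelope of the pair: 2*|cos(pi*detune*t)|.

    Follows from sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2).
    """
    return 2 * abs(math.cos(math.pi * detune_hz * t))
```

With a 1 Hz detune the pair's amplitude swells and fades once per second, and passes through an exact cancellation (the inverted-oscillator case) once per cycle of the beat.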

2.2.4.3. FL Studio Vocoder

Fruity Vocoder is an advanced real-time vocoder effect with a wide range of adjustable parameters and zero latency. Vocoding is the process of using the frequency spectrum of one sound to modulate that of another.

Fig. 22 – Vocoder parameter configuration window. Source: flstudio.image-line.com

FREQ section

FORM - Formant slider, changes the pitch relationship between the Modulator and Vocoded bands, up/down for a more feminine/masculine sound.

Min/Max - These knobs set the frequency range processed in the vocoder. Cutting off high- and low-frequency regions that are not present in the modulator sound helps you achieve a higher-quality result with fewer bands.

Scale - Lets you adjust the scale of the frequency (linear or logarithmic). The best value for this property varies with the type of modulator sound (voice, instrument, noise, etc).

Invert (INV) - This switch inverts the modulator frequencies.

Bandwidth (BW) - This knob sets the bandwidth of the bandpass filters used to filter the carrier sound. You can think of wider bands as producing softer sound. The best value differs with the different source sounds.
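The difference between the linear and logarithmic Scale settings can be illustrated by computing band center frequencies both ways. This is a hypothetical helper written for this text, not the plugin's actual code:

```python
def band_centers(f_min, f_max, n_bands, scale="log"):
    """Center frequencies for n_bands between f_min and f_max.

    scale="linear": equal steps in Hz.
    scale="log": equal frequency ratios between neighbours, which
    better matches the ear's perception of pitch.
    """
    if scale == "linear":
        step = (f_max - f_min) / (n_bands - 1)
        return [f_min + i * step for i in range(n_bands)]
    ratio = (f_max / f_min) ** (1.0 / (n_bands - 1))
    return [f_min * ratio ** i for i in range(n_bands)]
```

Logarithmic spacing concentrates bands at low frequencies, where voiced speech carries most of its energy, which is why it tends to suit voice modulators better than a linear scale.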

ENV Section

Sets the envelope follower attack and decay. This is the fade in/out times for the envelopes of the frequency bands tracked in the modulator sound.

Attack - Sets the "fade in" time.

Decay - Sets the "fade out" time.
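The attack/decay behaviour described above is commonly implemented as a one-pole envelope follower. The sketch below illustrates the general technique; it is an assumption about the approach, not FL Studio's actual code:

```python
def envelope_follower(samples, attack, decay):
    """One-pole attack/decay envelope follower.

    attack and decay are smoothing coefficients in (0, 1]; larger
    values react faster. The input is rectified before tracking, so
    the output follows the signal's amplitude, not its waveform.
    """
    env = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Use the attack coefficient while rising, decay while falling.
        coeff = attack if level > env else decay
        env += coeff * (level - env)
        out.append(env)
    return out
```

A fast attack with a slower decay lets the vocoder bands snap onto consonants while smoothing out the gaps between syllables.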

MIX Section

Mix the source and carrier levels using the sliders. To solo the left (modulator) or right (carrier) channel click the LCD switches. To swap the left and right channel assignments and change the channel the modulator and carrier are taken from, click the L and R labels at the top.

Bands Display

Displays the bands and their volume levels as the sound plays. You can adjust the volume of individual bands by dragging the sliders up/down in the view.

HOLD - Press this switch to hold the current band levels (i.e. "pause" the vocoder's band volume detection). You can automate this switch.

Bands - Allows you to select from 4 to 128 bands to be used for vocoding. More bands give higher output quality (unless you are looking for some kind of special effect); however, more bands consume more processor power. Also note that unlike the FFT-based vocoders on the market (which usually use 256, 512 or more bands), Fruity Vocoder uses a more precise detection system, so you can achieve better results with far fewer bands than usual (and no latency, unlike FFT vocoders).

Filter - Allows you to set the filter order. Higher order allows for "steeper" frequency band edges.

2.2.5. SONY SOUND FORGE

To make the time and frequency study of carrier and modulator, this software provides a very useful and fully functional spectrum analyzer.

It is a digital audio editor which includes a powerful set of tools, audio processes, and a variety of effects for the manipulation of audio.

This application is ideal to handle audio recording, audio editing, effects processing and media encoding. Users can combine Sound Forge with any Windows-compatible sound card to create, record, and edit audio files.

The user-friendly interface simplifies editing and creation. It also has built-in support for video and CD burning and can save to a number of audio and video file formats, including WAV, WMA, RM, AVI, and MP3.

Main features

Real-time sample level wave editor

Stereo and Multichannel Recording

High-resolution audio support: 24-bit, 32-bit and 64-bit (IEEE float), up to 192 kHz

Video support including AVI, WMV, and MPEG-1 and MPEG-2 (both PAL and NTSC) for use in frame by frame synchronisation of audio and video

Support for a wide variety of file formats

DirectX and VST plugin support. Version 9 includes a vinyl restoration plug-in and Mastering Effects Bundle, powered by IZotope.

Batch conversion functionality

Spectrum analysis tools

White, pink, brown and filtered noise generators

DTMF/MF tone synthesis

3. VOCODER'S DESIGN ASPECTS

In this chapter, all of the vocoder's design aspects will be explained and analyzed, such as the carrier, the modulator, and the different types of improvements that can be programmed into a vocoder.

A vocoder is an audio effect that morphs sound qualities from one sound into another. To sum up simply, it takes the timbre from one sound and the volume from another and makes a whole new sound.

A vocoder works by imprinting the constantly changing frequency spectrum of one signal onto the sound energy in another signal. Thus a vocoder always has two separate inputs—the speech and carrier inputs. The speech, also called modulator, input receives, not surprisingly, a signal containing spoken words and phrases. The carrier input receives the signal that will be "vocoded" by having the frequency characteristics of the speech signal imprinted on it.

Fig. 23 – KORG vocoder VC10. Source: www.proun.net/gallery/korg_vc10.html

3.1. MODULATOR

The modulator (sometimes called the program) is analyzed for its harmonic content. In the classic example of vocoding — robotic-sounding singing — speech is the modulator and a harmonically rich sound such as strings or a lush pad is the carrier. As you speak, the various microphone filters produce output signals which correspond to the energies present in your voice. Since these signals control the mixer amplifiers, which in turn control the set of equivalent filters connected to the Instrument, you superimpose a replica of the voice's energy patterns onto the sound of the Instrument plugged into the Instrument input.

Different human speech sounds are associated with different parts of the frequency spectrum. One of the characteristics of human speech is that it contains extremely high-frequency sounds caused by sibilants (consonants such as "S") and fricatives (consonants such as "F"). A vocoder design with 9 bands (described in block diagram section 3.3) is not very good at distinguishing among these sounds. In addition, many of the signals you might want to use as carriers don't have enough sound energy in the extreme high-frequency band to produce good sibilants and fricatives.

Plosive sounds, such as "P", "B", on the other hand, contain lots of low frequency energy. One section of the vocoder uses band-pass filters to split the Microphone signal into eight frequency bands, each covering a specific part of the audio spectrum, somewhat like a graphic equalizer. When you speak an "S" into the Microphone, the higher frequency filters fed by the Mic will produce an output but there will be no output from the lower frequency filters. Similarly, speaking a plosive into the Microphone will give an output from the low frequency filters, while little (if any) signal will pass through the higher frequency filters. Vowel sounds produce outputs from the various midrange filters. These outputs go through individual envelope followers to provide eight Control Voltages (CVs) that track the energy in the part of the spectrum covered by the filter.

Of course, you are not limited to just a voice for the modulation input; in fact, percussive instruments and program material can produce very interesting results. Other sound sources can be used for the speech input, but spoken words are the most common. Note also that it is not necessary to sing into a vocoder;

the pitch information in the output comes from the carrier signal, so speaking in a normal tone works fine.

3.1.1. MODULATING SIGNAL ANALYSIS

The aim of this section is to examine how different types of sounds affect the behavior of a vocoder's output, discussing this through the spectra of some example sounds.

Both the temporal and spectral characteristics have been analyzed with the Audacity software, from a piece of speech in English stored in an uncompressed PCM WAV file sampled at 44100 Hz with 16 bits of resolution.

Mainly two types of sounds have been analyzed: the ones that excite the high frequencies, both sibilant and fricative sounds, and the ones that are composed mainly of lower frequencies, the plosive sounds. Finally, a vowel is analyzed to discuss the differences between vowels and consonants.

Fig. 24 – Sibilant sound of an S. Source: Daniel's own archive, 2009.

Figure 24 shows the time plot of the speech signal for the sibilant sound of an 'S', while in figure 25 we can see the same number of samples from the plosive sound of a 'P'. The difference is clearly noticeable in both temporal representations. The content at higher frequencies in the sibilant sound is so obvious compared to the plosive sound that it could be proved without applying any kind of Fourier transform.


Fig. 25 – Plosive sound of a P. Source: Daniel's own archive, 2009.

The main frequency of this 'P' sound is clearly lower than that of the 'S' sound, but a specific frequency analysis will prove this point.

To compute the Fast Fourier Transform of these pieces of sound, the Audacity software has been used, applying the FFT with Hanning windows in blocks of 4096 samples. The region of interest shown is the theoretical audible frequency range of human beings, 20-20000 Hz, on a logarithmic scale. That is one of the reasons why a 44100 Hz sample rate has been chosen: to satisfy the Nyquist theorem with a safety margin of 2050 Hz.
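The same analysis can be sketched outside Audacity. The following Python/NumPy fragment (Python is used here instead of the thesis's MATLAB for testability; the 1 kHz test tone is a stand-in for the thesis recordings, which are on the documentation CD) applies a 4096-sample Hanning window before the FFT and locates the strongest spectral peak:

```python
import numpy as np

fs = 44100          # sample rate used in the thesis recordings
N = 4096            # FFT block size, as in the Audacity analysis

# Synthetic stand-in for a speech excerpt: a 1 kHz tone
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 1000 * t)

# Hanning-windowed FFT, keeping the positive-frequency half
window = np.hanning(N)
spectrum = np.abs(np.fft.rfft(x * window))
freqs = np.fft.rfftfreq(N, d=1/fs)

# Frequency of the strongest bin
peak_freq = freqs[np.argmax(spectrum)]
```

The frequency resolution is fs/N, about 10.8 Hz, so the detected peak can be off by up to half a bin.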

Fig. 26 – Spectrum of an S. Source: Daniel's own archive, 2009.

The content at higher frequencies, above 2000 Hz, is nearly 15 dB higher than that of most of the rest of the audible range.


Fig. 27 – Spectrum of a P. Source: Daniel's own archive, 2009.

Lower frequencies are the main characteristic of this spectrum of one of the most plosive of all sounds: 'P'. The content at frequencies between 100 and 500 Hz is approximately 20 dB higher than the rest of the middle and high frequencies.

Fricative sounds such as 'F' bear such a huge similarity to sibilant sounds, in both their temporal and spectral representations, that they have not been included in this section, in order not to be repetitive.

In figure 28 the spectrum of the letter 'O' is represented. The frequency content of vowels is located along the low-mid range of frequencies, from approximately 100 to 1000 Hz for a male voice, a little higher for a female one.

Fig. 28 – Spectrum of an O. Source: Daniel's own archive, 2009.

All vowels have a more harmonic content, as can be seen in figure 29, a zoom into the low-mid region. The fundamental frequency is located at 108 Hz, while the second, third, and up to the sixth harmonics can also be seen in this representation.

Fig. 29 – Zoom into the spectrum of an O. Source: Daniel's own archive, 2009.

It is important to remark that this pitch content of the modulator is lost in the vocoding process; it is the carrier's pitch that prevails in the output.

If drums are used as the modulator, it is also worth insisting that a modulator signal should excite most of the frequencies of the audible range, independently, along the duration of the sound, to achieve an attractive output from the vocoder. The main frequency regions where the elements of an average drum set are allocated are:

Bass drum: Low frequencies

Toms: Low-Mid frequencies

Snare Drum: Mid-high frequencies

Cymbals: High frequencies

The spectra of one of the classic drum machines, the TR-909, as reproduced by the D16 Drumazon, are shown in the following figures:

Fig. 30 – Bass Drum Spectrum. Source: www.d16.pl/drumazon

Fig. 31 – Snare Drum Spectrum. Source: www.d16.pl/drumazon

Fig. 32 – Low Tom Drum Spectrum. Source: www.d16.pl/drumazon


3.2. CARRIER

The input signal to be vocoded is referred to as the carrier. The carrier is what provides the pitch for the vocoder. It can be recorded or synthesized. To produce speech-like results, the carrier is typically an impulse train for vowel sounds and noise for fricative sounds. The carrier signal will be processed and then routed to the vocoder's output. The speech signal will do its job and then be discarded.

The carrier should be a signal that has significant and constant acoustic energy throughout as much of the sound spectrum as possible. The carrier wave should contain a broad range of frequencies. A synthesizer sawtooth wave works well, and for this application the synth's own filter should be wide open. Noisy signals, such as white noise and sampled wind sounds, are also good choices for the carrier. A waveform such as a triangle wave, which has weak overtones to begin with, or a synth waveform that has already been filtered by the synth's own lowpass filter, tends not to produce good results when vocoded.
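The difference in harmonic richness between a sawtooth and a triangle wave can be quantified directly: their Fourier-series amplitudes fall off as 1/n and 1/n² respectively. A small NumPy check (ideal textbook waveforms with illustrative values, not the 3xOsc output):

```python
import numpy as np

fs, f0, N = 44100, 100, 44100  # one second at a 100 Hz fundamental
t = np.arange(N) / fs

# Ideal waveforms: sawtooth harmonics fall off as 1/n, triangle as 1/n^2
saw = 2 * (t * f0 - np.floor(t * f0 + 0.5))   # sawtooth in [-1, 1)
tri = 2 * np.abs(saw) - 1                     # triangle at the same frequency

def harmonic_level(x, n):
    """Magnitude of the n-th harmonic (1 Hz bins here, so bin n*f0)."""
    return np.abs(np.fft.rfft(x))[n * f0]

# Relative level of the 9th harmonic vs. the fundamental
saw_ratio = harmonic_level(saw, 9) / harmonic_level(saw, 1)
tri_ratio = harmonic_level(tri, 9) / harmonic_level(tri, 1)
```

The sawtooth's 9th harmonic sits at roughly 1/9 of its fundamental, while the triangle's is near 1/81, which is why the triangle makes a poor carrier.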

Good results can be achieved with strings, brasses, flutes or any other sound with nearly constant dynamics. Even chords may be used to give the result more depth.

What is perceived as the particular tone color of a real-world sound is the combination of its partials: their relative loudness, their precise frequencies, and the way they change over time. If a trumpet, a violin, and a soprano play or sing exactly the same note, normal listeners have no trouble distinguishing one from another. We can do this because each sound source produces different partials. Our ears can very rapidly decode the mix of partials in each tone, usually without the slightest conscious effort.

3.2.1. CARRIER SIGNAL ANALYSIS

The quantity of carrier signals that could be analyzed is enormous, even infinite, but as our synthesized source is going to be 3xOsc, a good potential carrier source would be a sawtooth wave, shown in the next figure. This configuration works well because of its rich harmonic content, and this kind of synth waveform is very appropriate for a robotic voice result.

Fig. 33 – 3xOsc Sawtooth Configuration. Source: Daniel's own archive, 2009.

Then some useful tricks, such as extra effects, can be applied to our carrier signal to improve the vocoder performance.

The sawtooth signal plot is shown in figure 34. It should be noticed that it is not a perfect sawtooth signal; it has slight variations due to OSC 2 and 3, but even so it still sounds very crude.

Fig. 34 – 3xOsc Sawtooth DRY plotted signal. Source: Daniel's own archive, 2009.

The next figure shows the spectrum of this signal. The most important thing to know about this kind of signal is that it has energy throughout the whole audible range. But the main problem is that, by itself, this signal has a crude, harsh sound, quite repetitive and even annoying from my subjective point of view. This is represented in the spectrum by a periodically repeated series of peaks.

Fig. 35 – 3xOsc Sawtooth DRY spectrum. Source: Daniel's own archive, 2009.

One solution to this problem could be to add effects to our carrier signal: it will then have an even richer harmonic content, with slight time-varying modifications that will turn our nearly mathematical sound into a living thing.

In order to achieve this goal, two classic effects could be added to our signal. First, the next figure shows how a flanger affects our signal in the time domain.

Fig. 36 – 3xOsc Sawtooth with FLANGER plotted signal. Source: Daniel's own archive, 2009.

Flanging is an audio effect that occurs when two identical signals are mixed together, but with one signal time-delayed by a small and gradually changing amount, usually smaller than 20 milliseconds. This produces a swept comb filter effect: peaks and notches are produced in the resultant frequency spectrum, related to each other in a linear harmonic series. Varying the time delay causes these to sweep up and down the frequency spectrum.
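A minimal flanger along these lines can be sketched as a signal mixed with a copy of itself whose delay is swept by a low-frequency oscillator (all parameter values below are illustrative):

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs                        # one second of audio
dry = np.sin(2 * np.pi * 220 * t)             # illustrative input tone

# LFO sweeping the delay between roughly 1 and 5 ms (well under 20 ms)
lfo_rate = 0.5                                # LFO frequency in Hz
delay = (3 + 2 * np.sin(2 * np.pi * lfo_rate * t)) * 1e-3   # seconds
delay_samples = (delay * fs).astype(int)

# Mix each sample with its time-delayed copy (swept comb filter)
idx = np.clip(np.arange(len(dry)) - delay_samples, 0, None)
wet = 0.5 * (dry + dry[idx])
```

A real implementation would interpolate between samples for a fractional delay; the integer delay here keeps the sketch short.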

If we add another effect, such as delay, our signal gains personality without losing its pitch at any moment, as figure 37 shows. Remember that what we want to achieve is a richer, time-varying spectrum.

Delay is an audio effect which records an input signal to an audio storage medium, and then plays it back after a period of time. The delayed signal may either be played back multiple times, or played back into the recording again, to create the sound of a repeating, decaying echo.
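A basic feedback delay of this kind can be sketched in a few lines (the quarter-second delay time and 0.5 feedback factor are illustrative choices):

```python
import numpy as np

fs = 44100
delay_s = 0.25                    # a quarter-second echo
feedback = 0.5                    # each repeat comes back at half level
D = int(delay_s * fs)

x = np.zeros(fs)                  # one second of silence...
x[0] = 1.0                        # ...with a single impulse at the start

# Feed the delayed output back into itself: repeating, decaying echoes
y = np.copy(x)
for n in range(D, len(y)):
    y[n] += feedback * y[n - D]
```

The impulse reappears at 0.25 s intervals, each echo half as loud as the previous one, which is exactly the repeating, decaying echo described above.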

Fig. 37 – 3xOsc Sawtooth with FLANGER+DELAY plotted signal. Source: Daniel's own archive, 2009.

The resulting spectrum can be seen in figure 38. It is important to notice that a single plot of this processed signal is not as surprisingly full of life as watching it through the spectrum analyzer while it is being played, using a 4096-point FFT.


Fig. 38 – 3xOsc Sawtooth spectrum with FLANGER+DELAY. Source: Daniel's own archive, 2009.

The results of vocoding signals with the FL Studio vocoder are considerably better when the carrier signal is processed in this way.

Another signal has been analyzed for comparison with the sawtooth one; it is 3xOsc again, with a string-like preset. Its spectrum is noticeably poorer in terms of harmonic content, as can be seen in the next figure. Even if flanging and delay are added, the harmonic content is not much broader, as figure 39 shows with a lighter color. But the acoustic results are also interesting from an artistic point of view.

Fig. 39 – 3xOsc STRING spectrum with/without FLANGER+DELAY. Source: Daniel's own archive, 2009.


3.3. BLOCK DIAGRAM

A basic vocoder contains several elements, as shown in figure 40. It has two banks of bandpass filters, a bank of envelope followers, a bank of amplifiers, and a mixer. Let's look at each component in turn, and then see how they work together.

Fig. 40 – Vocoder Simple Block Diagram. Source: digitalmedia.oreilly.com

This basic block diagram of a vocoder shows just six frequency bands; some software vocoders have hundreds.

A bandpass filter is a device that, when fed a signal, allows only the frequencies within a narrow band to pass. The incoming signal may have partials at both higher and lower frequencies, but those partials are filtered out. As diagrammed in figure 40, the vocoder uses two banks of bandpass filters. The first bank estimates frequency parameters from a given modulator input, while the second bank is used to process the carrier input. The derived modulator bandpass settings control the gains applied at the output of the carrier bandpass filters. For practical purposes, we can talk about the bandpass filter as if it were a frequency window with sharp edges, even though the edges are in fact fuzzy.

The bandpass filters in a vocoder split the incoming speech signal into a number of separate signals, each of which contains only the sound energy within the narrow pass-band of that particular filter. For instance, each pass-band might be an octave wide, in which case the filters would have something like the following lower and upper boundaries:

Table 1 – 9-band design for the bandpass filter

Lower Bound: 25 Hz   50 Hz   100 Hz  200 Hz  400 Hz  800 Hz  1.6 kHz  3.2 kHz  6.4 kHz
Upper Bound: 50 Hz   100 Hz  200 Hz  400 Hz  800 Hz  1.6 kHz 3.2 kHz  6.4 kHz  12.8 kHz

That's a 9-band design. If the vocoder has 16 bands, each will be correspondingly narrower. Typical vocoder designs have 8, 16, 24, or 32 bands. With fewer than 8 bands, the speech input won't be detected accurately enough for us to understand the output. Conversely, using too many bands can reduce the personality of a vocoder by glossing over its characteristic distortion.
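The octave-wide band layout of Table 1 can be generated programmatically; a sketch (the helper name `octave_bands` is hypothetical):

```python
# Octave-wide band edges: each band's upper bound doubles its lower bound
def octave_bands(f_low, n_bands):
    edges = [f_low * 2 ** i for i in range(n_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))

# The 9-band design of Table 1, starting at 25 Hz
bands = octave_bands(25, 9)
```

This reproduces the table exactly: the first band is (25, 50) Hz and the last is (6400, 12800) Hz.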

An envelope follower is a device that senses the amplitude of an audio signal and outputs a control signal (also known as an envelope) whose level corresponds to the input's amplitude. In other words, if the incoming signal is loud, the control signal output has a high level. If the incoming audio is soft, the control signal has a low level. When there's no incoming audio, the control signal drops to zero.

Envelope followers are used for various effects. A compressor uses a type of envelope follower, as does an auto-wah effect. Most envelope followers have attack and release controls. The attack parameter governs how quickly the envelope rises when the incoming audio increases in amplitude. The release parameter (sometimes called decay) governs how quickly the envelope falls back toward zero when the amplitude of the incoming audio drops.

In a compressor, a long release time may give better-sounding results. In a vocoder, however, both the attack and the release are normally kept fairly short. That ensures that the vocoder will be able to track the incoming speech signal accurately. In the Reason examples, you can try experimenting with the Vocoder module's Attack and Decay knobs. Increasing the Decay causes the sound to smear, while setting it too short makes the sound rather grainy.
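An envelope follower with separate attack and release behaves as described; a one-pole sketch (the 5 ms attack and 20 ms release times are illustrative, not values from any specific vocoder):

```python
import numpy as np

def envelope_follower(x, fs, attack_ms=5.0, release_ms=20.0):
    """One-pole envelope follower: fast attack, slower release."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = np.zeros(len(x))
    level = 0.0
    for n, sample in enumerate(np.abs(x)):
        # rising input uses the attack coefficient, falling input the release
        coeff = a_att if sample > level else a_rel
        level = coeff * level + (1 - coeff) * sample
        env[n] = level
    return env

fs = 44100
burst = np.concatenate([np.ones(4410), np.zeros(4410)])  # 100 ms on, 100 ms off
env = envelope_follower(burst, fs)
```

The envelope rises quickly toward the burst's level and then decays toward zero once the input goes silent, exactly the behavior the vocoder's control voltages need.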

In a vocoder (again, refer to Figure 40), each bandpass filter in the speech path feeds through a dedicated envelope follower. The result is this: when the speech signal has partials within a given frequency band, the output of that particular envelope follower rises. When there are weak partials or none at all within that band, the output of that particular envelope generator falls.

3.3.1. BLOCK IMPROVEMENTS AND OTHER ASPECTS

Most vocoders also have additional processing to enhance the sound, as shown in figure 41:

Fig. 41 – PAIA Vocoder Block Diagram. Source: www.paia.com/ProdArticles/vocodwrk.htm

Almost all sounds contain energy at a number of different frequencies. If someone bows a note on a cello, for instance, the note itself may have a frequency of 100 Hz. But the tone will also contain vibrations at 200 Hz, 300 Hz, 400 Hz, and so on. These higher-frequency vibrations are called harmonics, overtones or partials. Some commercial vocoders add a distortion circuit on the carrier input to increase the level of overtones, as the fuzz block in the carrier input of figure 41 shows.

A fuzzbox is a type of effects unit comprising an amplifier and a clipping circuit, which generates a distorted version of the input signal. As opposed to other distortion effects, a fuzzbox boosts and clips the signal enough to turn a standard sine wave input into a waveform that is much closer to a square wave. The sound of almost creating a square wave gives a 'rough around the edges' effect that creates the classic fuzz tone. This gives a much more distorted and synthetic sound than a standard distortion or overdrive. Fuzz sounds also tend to have lower mid frequencies than other distortion types.
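Boost-and-clip fuzz of this kind is easy to sketch (the gain and threshold values are illustrative):

```python
import numpy as np

def fuzz(x, gain=20.0, threshold=0.8):
    """Boost then hard-clip: a sine driven this hard approaches a square wave."""
    return np.clip(gain * x, -threshold, threshold)

t = np.arange(44100) / 44100
sine = np.sin(2 * np.pi * 100 * t)
fuzzed = fuzz(sine)

# Fraction of samples sitting at (or essentially at) the clipping rails,
# as a square wave's samples would
at_rails = np.mean(np.abs(fuzzed) > 0.79)
```

With a gain of 20, more than 90% of the sine's samples end up pinned at the rails, which is the near-square waveform, rich in extra harmonics, described above.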

As clipping is a non-linear process, intermodulation will occur, leading to the generation of an output signal rich in extra harmonics of the input signal. Intermodulation distortion also produces frequency components at the various sums and differences of the frequency components of the input signal. In general, these components will not be harmonically related to the input signal, leading to dissonance. To reduce unwanted dissonance, simple power chords (root, fifth, and octave) are often used with fuzzboxes, rather than triads (root, third, and fifth) or four-note chords (root, third, fifth, and seventh).

The output of a vocoder is originally mono, but the PAIA vocoder deinterlaces odd and even bands to simulate a pseudo-stereo output. This vocoder also allows mixing the vocoded signal with the dry signal, whether that is the carrier or the modulator. In addition, the dry signal can be sent out to an external effects processor and then received back at a desirable level, to be mixed with both the dry and vocoded signals.

To make vocoded speech more understandable, some vocoders have a pass-through circuit that sends the portion of the speech signal above 8 kHz or so directly to the output, mixing it with the carrier signal rather than attempting to vocode it. The level of the pass-through is usually adjustable.

Other vocoders give you the option of repatching the envelope followers to arbitrary carrier bands. This will render the words in the speech input unintelligible, but the output can still have an expressive character.

3.4. FILTERING Vs. FFT

The vocoder was originally implemented with a bank of filters, but after a few years technology allowed it to be implemented with the Fast Fourier Transform (FFT), which can be developed more easily and much more cheaply on any kind of digital platform.

The Filter Bank Interpretation

The simplest view of the vocoder analysis is that it consists of a fixed bank of bandpass filters with the output of each filter expressed as a time-varying amplitude and a time-varying frequency. The synthesis is then literally a sum of sine waves with the time-varying amplitude and frequency of each sine wave being obtained directly from the corresponding bandpass filter. If the center frequencies of the individual bandpass filters happen to align with the harmonics of a musical signal, then the outputs of the phase vocoder analysis are essentially the time-varying amplitudes and frequencies of each harmonic.

The filter bank itself has only three constraints. First, the frequency response characteristics of the individual bandpass filters are identical except that each filter has its passband centered at a different frequency. Second, these center frequencies are equally spaced across the entire spectrum from 0 Hz to half the sampling rate. Third, the individual bandpass frequency response is such that the combined frequency response of all the filters in parallel is essentially flat across the entire spectrum. This ensures that no frequency component is given disproportionate weight in the analysis, and that the vocoder is in fact an analysis-synthesis identity. As a consequence of these constraints, the only issues in the design of the filter bank are the number of filters and the individual bandpass frequency response. The number of filters must be sufficiently large so that there is never more than one partial within the passband of any single filter. For harmonic sounds, this amounts to saying that the number of filters must be greater than the sampling rate divided by the pitch. For inharmonic and polyphonic sounds, the number of filters may need to be much greater. If this condition is not satisfied, then the channel vocoder will not function as intended because the partials within a single filter will constructively and destructively interfere with each other, and the information about their individual frequencies will be coded as an unintended temporal variation in a single composite signal.
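The filter-count condition above, that the number of filters must exceed the sampling rate divided by the pitch, can be checked numerically (the helper name `min_filters` is hypothetical; 108 Hz is the fundamental measured for the 'O' vowel in section 3.1.1):

```python
import math

def min_filters(fs, pitch_hz):
    """Smallest filter count that is strictly greater than fs / pitch,
    so that at most one harmonic partial falls in any single band."""
    return math.floor(fs / pitch_hz) + 1

# A 108 Hz voice at the thesis's 44100 Hz sample rate
n = min_filters(44100, 108)
```

For a 108 Hz male voice at 44100 Hz this already demands 409 filters, which shows why inharmonic or polyphonic material may need far more.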

The Fourier Transform Interpretation

A complementary (and equally correct) view of the vocoder analysis is that it consists of a succession of overlapping Fourier transforms taken over finite-duration windows in time. It is interesting to compare this perspective to that of the Filter Bank interpretation. In the latter, the emphasis is on the temporal succession of magnitude values in a single filter band. In contrast, the Fourier Transform interpretation focuses attention on the magnitude and phase values for all of the different filter bands or frequency bins at a single point in time (see figure 42).

Fig. 42 – Filter Bank Interpretation vs. Fourier Transform Interpretation. Source: www.panix.com

These two differing views of the vocoder analysis suggest two equally divergent interpretations of the resynthesis. In the Filter Bank interpretation (as noted above), the resynthesis can be viewed as a classic example of additive synthesis with time-varying amplitude and frequency controls for each oscillator. In the Fourier view, the synthesis is accomplished by converting back to real-and-imaginary form and overlap-adding the successive inverse Fourier transforms.

This is a first indication that the channel vocoder representation may actually be more generally applicable than would be expected of an additive-synthesis technique.

In the Fourier interpretation, the number of filter bands in the vocoder is simply the number of points in the Fourier transform. Similarly, the equal spacing in frequency of the individual filters can be recognized as a fundamental feature of the Fourier transform. On the other hand, the shape of the filter passbands (e.g., the steepness of the cutoff at the band edges) is determined by the shape of the window function which is applied prior to calculating the transform. For a particular characteristic shape (see figure 43, a Hanning window), the steepness of the filter cutoff increases in direct proportion to the duration of the window.

Windowing refers to taking a small frame out of a large signal (you are only looking at your signal through a small window of time). The windowing process alters the spectrum of the signal, and this effect should be minimized. In order to reduce the effect of windowing on the signal's frequency representation, a Hanning window of size N is used. This is because multiplying a signal by a window in the time domain is equivalent to performing a convolution in the frequency domain. A rectangular window has more energy in the side lobes, while a Hanning window focuses most of its energy near DC. This looks more like an impulse, which would lead to a perfect reconstruction of the signal.
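The claim that a rectangular window leaks far more energy into its side lobes than a Hanning window can be verified numerically (the window length and zero-padding are arbitrary choices for illustration):

```python
import numpy as np

N, pad = 64, 8192   # 64-point windows, zero-padded to expose the lobe shapes

rect = np.ones(N)
hann = 0.5 * (1 - np.cos(2 * np.pi * np.arange(N) / N))  # periodic Hanning

def lowfreq_energy_fraction(w, bins):
    """Fraction of spectral energy within the first `bins` FFT bins (near DC)."""
    spec = np.abs(np.fft.rfft(w, pad)) ** 2
    return spec[:bins].sum() / spec.sum()

# Measure energy inside the Hanning main lobe (twice as wide as the
# rectangular one); beyond it, the Hanning leaks almost nothing
mainlobe = 2 * pad // N
rect_frac = lowfreq_energy_fraction(rect, mainlobe)
hann_frac = lowfreq_energy_fraction(hann, mainlobe)
```

The Hanning window concentrates well over 99.5% of its energy near DC, while the rectangular window's sinc-shaped spectrum leaves several percent in the side lobes.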

In real-time systems, the FFT block size N, and consequently the window size, directly affects the latency of the system.

Fig. 43 – Hanning Window. Source: wikipedia.org/wiki/Window_function

Thus, again, we see the fundamental tradeoff between rapid time response and narrow frequency response. It is important to understand that the two different interpretations of the vocoder analysis apply only to the implementation of the bank of bandpass filters.

The operation (described in the previous section) by which the outputs of these filters are expressed as time-varying amplitudes and frequencies is the same for each. However, a particular advantage of the Fourier interpretation is that it leads to the implementation of the filter bank via the more efficient Fast Fourier Transform (FFT) technique. The FFT produces an output value for each of N filters with (on the order of) N·log2(N) multiplications, while the direct implementation of the filter bank requires N^2 multiplications. The next figure shows the number of multiplications needed for an N-filter design.

Fig. 44 – FFT (blue) vs. filter bank (red) number of multiplications. Source: Daniel's own archive, 2009.

It is clearly seen that once the number of frequency bands grows beyond about 30, the FFT is much cheaper in terms of computation.
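The operation counts can be compared directly; a sketch of the two cost models stated above:

```python
import math

def fft_mults(N):
    """Order of multiplications for the FFT implementation."""
    return N * math.log2(N)

def bank_mults(N):
    """Multiplications for the direct filter-bank implementation."""
    return N ** 2

# For large filter counts the FFT wins by orders of magnitude:
# at N = 1024, the direct bank needs over 100x more multiplications
ratio_1024 = bank_mults(1024) / fft_mults(1024)
```

These are asymptotic counts only; the constant factors of a real implementation shift the exact crossover point, which is why the figure places it near 30 bands.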

Thus, the Fourier interpretation can lead to a substantial increase in computational efficiency when the number of filters is large (e.g., N = 1024). The Fourier interpretation has also been the key to much of the recent progress in vocoder-like techniques.

Mathematically, these techniques are described as Short-Time Fourier Transform techniques. In the discrete-time case, the data to be transformed is broken up into chunks or frames (which usually overlap each other, to reduce artifacts at the boundaries). Each chunk is Fourier transformed, and the complex result is added to a matrix which records magnitude and phase for each point in time and frequency. Such algorithms may also be referred to as multirate digital signal processing techniques (for reasons which will be made clear below).

Sample-Rate Considerations

The input and output signals to and from the vocoder are always assumed to be digital signals with a sampling rate of at least twice the highest frequency in the associated analog signal (e.g., a speech signal with a highest frequency of 5 kHz might be digitized, at least in principle, at 10 kHz and fed into the vocoder). But sibilant sounds can be lost at such a low sampling rate.
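The chunked, overlapped Fourier analysis described above can be sketched as an FFT/IFFT overlap-add loop. With a periodic Hanning window and 50% overlap, the shifted windows sum to a constant, so the interior of the signal is reconstructed exactly (the frame size and test signal below are illustrative):

```python
import numpy as np

N, hop = 1024, 512            # frame size and 50% hop, as in a typical STFT
w = 0.5 * (1 - np.cos(2 * np.pi * np.arange(N) / N))   # periodic Hanning

x = np.random.default_rng(0).standard_normal(8 * N)
y = np.zeros_like(x)

# Analysis: windowed FFT of each frame; synthesis: inverse FFT, overlap-added
for start in range(0, len(x) - N + 1, hop):
    frame = np.fft.ifft(np.fft.fft(x[start:start + N] * w)).real
    y[start:start + N] += frame
```

Because w[n] + w[n + N/2] = 1 for the periodic Hanning window, the overlap-added frames reproduce the original signal wherever full overlap exists; any spectral modification would be applied between the FFT and the IFFT.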

3.5. THE PHASE VOCODER

After such a deep study of the channel vocoder, it seems fair to mention its 'younger brother', the phase vocoder, in order to avoid confusion between the two.

The phase vocoder is a digital signal processing technique of potentially great musical significance. It can be used to perform very high fidelity time scaling, pitch transposition, and myriad other modifications of recorded sounds.

Fig. 45 – Spectral Envelope Correction. Source: www.panix.com

Applications: The basic goal of the phase vocoder is to separate (as much as possible) temporal information from spectral information. The operative strategy is to divide the signal into a number of spectral bands, and to characterize the time-varying signal in each band. This strategy succeeds to the extent that this bandpass signal is itself slowly varying. It fails when there is more than a single partial in a given band, or when the time-varying amplitude or frequency of the bandpass signal changes too rapidly. 'Too rapidly' means that the amplitude and frequency are not relatively constant over the duration of the FFT. This is equivalent to saying that the amplitude or frequency changes considerably over durations which are small compared to the inverse of the lowpass filter bandwidth.

The analysis signals (which in the Filter Bank interpretation are thought of as providing the instantaneous amplitude and frequency values for a bank of sine wave oscillators) are no longer at the same sample rate as the desired output signal. Thus, an additional interpolation operation is required to convert the analysis signals back up to the original sample rate. Even so, this is a lot more computationally efficient than avoiding the sample-rate reduction in the first place. In the Fourier Transform interpretation the details of these multiple sample rates within the phase vocoder are less apparent. In the above example, where the internal sample rate is only 2% (200/10000) of the external sample rate, we simply skip 10000/200 = 50 samples between successive FFTs. As a result, the FFT values are computed only 10000/50 = 200 times per second. In this interpretation, the interpolation operation is automatically incorporated in the overlap-addition of the inverse FFTs.

Lastly, it should be noted that we have so far considered the bandwidth of the output of the lowpass filter without any mention of the conversion from rectangular to polar coordinates. This conversion involves highly nonlinear operations which (at least in principle) can significantly increase the bandwidth of the signals to which they are applied. Fortunately, this effect is usually small enough in practice that it can generally be ignored.

4. RESULTS

The results obtained in this project are shown throughout this chapter. Some figures will help to illustrate the concepts, but as this is an audio project, the main results are audio samples processed with these digital signal processing techniques, the raw unprocessed samples also being important. All these audio files can be found on the documentation CD that should accompany this written report.

In this section, mainly two vocoders are analyzed. The first one is an algorithm developed on the MATLAB platform, and the second one is a commercial software-based vocoder. A third implementation, a real-time vocoder on a Texas Instruments DSK 5416 DSP platform, was left along the way.

4.1. MATLAB

4.1.1. CODE DEVELOPED

The following code is taken directly from the MATLAB code editor, but some lines have been removed from the report version of the code due to their triviality in the vocoding process. Some of them were removed because they were about plotting figures, and those images will be explained in the next pages; others were audio file playback commands. Even though they are not in this section, the complete code can be found on the documentation CD.

function out = danivocoder(channels, numbands, overlap)
% Read carrier wav input file
[carrier, fsC] = wavread('3osc');
% Read modulator wav input file
[mod, fsM] = wavread('vocal');
% Compare sample rates
if fsC ~= fsM, error('carrier and modulator must have same sample rate'); end
fs = fsM;

% Shorten the MODULATOR or CARRIER signal so both are the same size
long = min(length(carrier), length(mod));
carrier = carrier(1:long);
mod = mod(1:long);

% Size of both FFT and window
FFTsize = 2*channels;
window = hanning(FFTsize);
% Indexes for the different frequency bands
bands = 1:round(channels/numbands):channels;
bands(end) = channels;

% Initialize output vector
out = zeros(long, 1);
% Create a vector of frequencies, useful for plotting
f = fs/FFTsize : (fs/FFTsize) : fs/2;

% Main loop: travels across the signal, processing it by overlapping blocks
pointer = 0;
while pointer*FFTsize*overlap + FFTsize <= long
    index = round([1+pointer*FFTsize*overlap : pointer*FFTsize*overlap+FFTsize]);
    % FFT of each windowed slice of the input signals
    trama_freqMOD = fft( mod(index,:) .* window );
    trama_freqCAR = fft( carrier(index,:) .* window );
    % Initialize the envelope for all bands
    syn = zeros(channels, 1);
    % For each frequency band, multiply both spectra
    for BarridoBandas = 1:numbands-1
        BandaActual = bands(BarridoBandas):bands(BarridoBandas+1)-1;
        syn(BandaActual,1) = trama_freqCAR(BandaActual) * diag(mean(abs(trama_freqMOD(BandaActual))));
    end
    % Rebuild the whole spectrum, positive and negative frequencies
    midval = trama_freqMOD(1+FFTsize/2,:) .* trama_freqCAR(1+FFTsize/2,:);
    specneg = flipud( conj( syn(2:end,:) ) );
    specfull = [syn; midval; specneg];
    % IFFT returns the signal to the time domain
    timesignal = real( ifft(specfull) );
    % Rebuild the output signal slice by slice
    out(index,:) = out(index,:) + timesignal;
    pointer = pointer + 1;
end

% Normalize output
out = 0.8*out/max(max(abs(out)));

The key commands or sets of instructions that are relevant for the success of this algorithm, and that should be highlighted and explained in depth, are:

Windowing the signal

Overlapping slices of audio

Separating the spectrum into bands wider than the FFT bins

Multiplying carrier and modulator

Windowing

When using FFT analysis to study the frequency spectrum of signals, there are limits on the resolution between different frequencies, and on the detectability of a small signal in the presence of a large one.

There are two basic problems: the fact that we can only measure the signal for a limited time, and the fact that the FFT only calculates results for certain discrete frequency values (the 'FFT bins'). The limit on measurement time is fundamental to any frequency analysis technique; the frequency sampling is peculiar to numerical methods like the FFT.
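To make the bin spacing concrete, here is a minimal sketch in Python/NumPy (illustrative only; the project itself uses Matlab), using the 44100 Hz sample rate of the project's audio files:

```python
fs = 44100          # sample rate used throughout this project (Hz)
N = 1024            # FFT size

# Each FFT bin k corresponds to the analogue frequency k*fs/N,
# so the frequency resolution is the bin spacing fs/N.
resolution = fs / N
print(resolution)   # 43.06640625 Hz per bin

# A tone that falls between two bin centres cannot be resolved exactly:
f_tone = 1000.0
nearest_bin = round(f_tone / resolution)
print(nearest_bin, nearest_bin * resolution)  # bin 23 -> ~990.5 Hz
```

A 1000 Hz tone thus lands between bins, which is exactly the situation that produces the leakage discussed next.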

Most real signals will have discontinuities at the ends of the measured time, and when the FFT assumes the signal repeats it will assume discontinuities that are not really there. Since sharp discontinuities have broad frequency spectra, these will cause the signal's frequency spectrum to be spread out.

Spectral leakage is not related in any way to the fact of having sampled the signal, but only to the finite measurement time. Spectral leakage causes at least two distinct problems. First, any given spectral component will contain not just the signal energy, but also noise from the whole of the rest of the spectrum. This will degrade the signal to noise ratio.

Second, the spectral leakage from a large signal component may be severe enough to mask other smaller signals at different frequencies.

This leads to the idea of multiplying the signal within the measurement time by some function that smoothly reduces the signal to zero at the end points, hence avoiding discontinuities altogether. The process of multiplying the signal data by a function that smoothly approaches zero at both ends is called 'windowing', and the multiplying function is called a 'window' function. It is easy to analyze the effect of a window function: the frequency spectrum of the signal is convolved with the frequency spectrum of the window function.
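The leakage-suppressing effect of a window can be checked numerically. The following is a small NumPy sketch (Python, not the thesis Matlab code): it compares the spectrum of an off-bin 1 kHz tone with and without a Hanning window, summing the magnitude far away from the tone as a rough leakage measure:

```python
import numpy as np

fs, N = 44100, 1024
n = np.arange(N)
# A tone that does NOT fall on an FFT bin centre -> maximal leakage
f = 1000.0
x = np.sin(2 * np.pi * f * n / fs)

rect = np.abs(np.fft.rfft(x))                 # no window (rectangular)
hann = np.abs(np.fft.rfft(x * np.hanning(N))) # Hanning window

# Rough leakage measure: magnitude summed far from the tone (above ~5 kHz),
# normalized by each spectrum's peak
far = int(5000 * N / fs)
leak_rect = rect[far:].sum() / rect.max()
leak_hann = hann[far:].sum() / hann.max()
print(leak_rect > 10 * leak_hann)  # windowing suppresses distant leakage
```

The rectangular window's sidelobes decay slowly, so its distant leakage is orders of magnitude larger than the Hanning window's, matching the masking problem described above.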


Fig. 46 – dB magnitude frequency response of various windows: (a) Bartlett, (b) Hanning, (c) Hamming, (d) Blackman. Font: www.vocw.edu.vn

Several smooth window functions have been proposed and used. Fig. 46 shows that no window is the best in every aspect; there is a tradeoff of features, and the choice of an appropriate window depends on our requirements. The Bartlett window is the transition from the rectangular window to the others. The rectangular window has the smallest main lobe width, while the Blackman has the best side lobe attenuation. In between, the Hanning is a good choice, and it is the one chosen for the vocoder algorithm.

Overlapping

When the length of a data set to be transformed is larger than necessary to provide the desired frequency resolution, a common practice is to subdivide it into smaller sets and window them individually. To mitigate the "loss" at the edges of the window, the individual sets may overlap in time.

In audio signals, this loss at the edges translates into awful 'clipping' and 'crisping' sounds that are intolerable in a digital audio application.

Fig. 47 – FFT segments overlapped. Font: www.virtins.com

Why is it necessary to window and overlap the analysis blocks? There are several issues here. It is theoretically possible to do a huge FFT over the entire input signal, but for practical reasons we usually want to limit the time delay and storage memory of a real time system. Breaking the signal into shorter blocks provides this opportunity. Applying the smooth window function is helpful in reducing the truncation effects that otherwise would be evident in the DFT data, but this may or may not be important depending on the application. Finally, the overlapping windows provide a smooth transition from one block to the next, and this is important if the frequency domain processing varies with time.

The Matlab function provides an input parameter called overlap, ranging from 0 to 1, which sets the desired amount of overlap.
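Why overlapped windowed blocks recombine smoothly can be shown in a few lines. This is a NumPy sketch (Python, purely illustrative, using a hop of half a block rather than the thesis code's overlap parameter): overlap-adding a periodic Hann window at 50% overlap sums to a constant, so block boundaries introduce no amplitude ripple:

```python
import numpy as np

N = 8               # block size (FFT size in the vocoder)
hop = N // 2        # 50% overlap between consecutive blocks

# Periodic Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/N)
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)

# Overlap-add the window itself: in steady state the sum is constant,
# so windowed blocks can be recombined without amplitude ripple.
out = np.zeros(4 * N)
for start in range(0, len(out) - N + 1, hop):
    out[start:start + N] += w

steady = out[N:-N]               # ignore the ramp-up/ramp-down edges
print(np.allclose(steady, 1.0))  # True: constant reconstruction weight
```

With window/hop pairs that do not satisfy this constant-sum property, the output amplitude ripples at the block rate, which is one source of the 'clipping' artifacts mentioned above.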

Multiplying and merging Bands

Clearly, the frequency analysis is performed by the FFT function, and the number of points is set by the input variable channels, which is doubled to give the FFT size, due to Hermitian symmetry, and consequently sets the window size.

But to improve the behavior of this vocoder, the spectra of both carrier and modulator are merged into a smaller number of bands than the FFT output. This number of bands is set in the Matlab function with the input variable numbands; note that this number must be smaller than the FFT window size.

Fig. 48 – Merging FFT into averaged bigger bands. Font: Daniel's own archive, 2009

Figure 48 illustrates this operation quite well; in this example the number of bands is three times smaller than the original FFT size. This leads us to the last part of the algorithm to be explained, the multiplication of bands, which is the key to the whole process.

Bands are multiplied one against the other; remember that the absolute value of each modulator band is averaged before being multiplied with the carrier's corresponding band.
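The band merging and multiplication can be condensed into a toy NumPy sketch (Python, not the thesis Matlab code; the random spectra and band edges are illustrative assumptions): each coarse band of the carrier is scaled by the mean modulator magnitude over that band.

```python
import numpy as np

channels, numbands = 16, 4
rng = np.random.default_rng(0)
# Stand-ins for one FFT frame of carrier and modulator (complex spectra)
car = rng.standard_normal(channels) + 1j * rng.standard_normal(channels)
mod = rng.standard_normal(channels) + 1j * rng.standard_normal(channels)

# Band edges, mirroring `bands = 1:round(channels/numbands):channels`
edges = np.arange(0, channels + 1, channels // numbands)

syn = np.zeros(channels, dtype=complex)
for b in range(numbands):
    lo, hi = edges[b], edges[b + 1]
    env = np.abs(mod[lo:hi]).mean()   # averaged modulator envelope
    syn[lo:hi] = car[lo:hi] * env     # impose it on the carrier band

# Scaling by a positive envelope keeps the carrier's phase intact
print(np.allclose(np.angle(syn), np.angle(car)))  # True
```

This is the essence of the channel vocoder: the carrier keeps its phase and fine spectral structure, while the modulator contributes only a per-band amplitude envelope.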

All the modulator envelope band values are placed in a diagonal matrix to ease the multiplication effort in Matlab. Then the result of this multiplication is conjugated and flipped to create the negative frequencies, in other words to recreate Hermitian symmetry, in order to force the IFFT function to return a real signal.
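The symmetry trick can be verified in a few lines. This NumPy sketch (Python, illustrative; random spectrum values are assumptions) builds the negative-frequency half as the flipped conjugate of the positive half, just like the `[syn; midval; flipud(conj(syn(2:end,:)))]` line in the Matlab code, and checks that the IFFT comes out real:

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)

# Positive-frequency half: DC plus bins 1..N/2-1
half = rng.standard_normal(N // 2) + 1j * rng.standard_normal(N // 2)
half[0] = half[0].real                   # DC bin must be real
nyq = np.array([rng.standard_normal()])  # Nyquist bin must be real too

# Negative frequencies = flipped conjugate of the positive half
specfull = np.concatenate([half, nyq, np.conj(half[1:])[::-1]])

x = np.fft.ifft(specfull)
print(np.allclose(x.imag, 0))            # True: the time signal is real
```

If the conjugate flip were omitted, the IFFT would return a complex signal and the `real(...)` call in the Matlab code would silently discard half of the information.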

4.1.2. TIME ANALYSIS

It is important to note that all the tests with the algorithm have been done with two theoretically good inputs for a vocoder: an English male speaking voice and a sawtooth-based synthesized electronic instrument with delay and flanging effects. These two inputs have not been changed during the implementation. Both sound samples are sampled at 44100 Hz with 16 bits of resolution.

This was done with the intention of experimenting more with the input parameters of the vocoder, which are explained in detail in the previous chapter but are briefly summarized below:

Channels – the number of points that the FFT will obtain from our temporal, real input signal. Note that it will be doubled due to Hermitian symmetry.

Numbands – the number of bands into which all the FFT points will be merged; that is, the number of modulator and carrier bands that will be multiplied.

Overlap – a value from 0 to 1 that sets the degree of overlap of consecutive blocks. Values smaller than 0.25 are suggested.

In the next figure the three signals involved in the vocoder are shown in the time domain: the voice or modulator, the instrument or carrier and, obviously, the output from the vocoder: the vocoded signal.

In the voice signal, the blue one, it is quite easy to see and distinguish between sound waveforms, because each word (or sometimes even each letter) is like an island on the time line, while a carrier sound, the green one, should have a more constant envelope to be modified by the modulator.

Fig. 49 – Time plot of the 3 signals involved in the vocoder. Font: Daniel's own archive, 2009

Figure 49 shows how the vocoded signal has a time response similar to the voice but with the carrier's pitch information. For this plot the input parameters of the vocoder were:

Channels = 512
Numbands = 32
Overlap = 0.1

Then, after a few experiments, another output with less frequency resolution is worth showing in figure 50, obtained with the following input parameters:

Channels = 64
Numbands = 16
Overlap = 0.3

The intelligibility of the speech in the vocoded signal has been seriously damaged; the time-domain aspect of the signal also looks poorer.

Fig. 50 – Time plot of a poorer vocoded signal. Font: Daniel's own archive, 2009

After these first impressions of the vocoder's performance, it seems reasonable to put a few subjective impressions in writing in the next table, showing how the input parameters affect the result of the vocoder. The subjective parameters are intelligibility, sonority from an artistic point of view, and sound artifacts such as chirping. Each is rated from 0 to 10.

Table 2 – Subjective performance of the Matlab vocoder

Channels   Numbands   Overlap   Intelligibility   Sonority   Artifacts
1024       32         0.3       8                 9          2
1024       32         0.1       7                 6          3
1024       8          0.3       5                 5          5
1024       8          0.1       5                 4          6
32         32         0.3       4                 3          7
32         32         0.1       4                 3          8
32         8          0.3       3                 2          9
32         8          0.1       1                 1          10

Apart from this standard vocoder input, the voice, the next figure shows how similar the behavior of the vocoder is to the voice case when drums are used as the modulator.

Fig. 51 – Time plot of drums vocoding. Font: Daniel's own archive, 2009

In a drum signal it is very easy to identify where the sounds are located on the time line. Another noticeable difference from a voice signal is the faster attack that drum signals have. Fruity Loops' vocoder has an envelope follower that allows this parameter to be configured with a knob.

4.1.3. FREQUENCY ANALYSIS

Let's have a look at the spectra of our signals. For these interesting 3D plots of frequency over time, Matlab's waterfall function has been used.

Fig. 52 – Spectrum of the original voice. Font: Daniel's own archive, 2009

The analyzed sentence says: "The expression of men face is…", so the X and S sounds, and even the F sound, excite the high frequencies.

The low-frequency content is located at the beginning of the sentence, mainly in the "The" and in the 'P' of 'expression'.

The carrier spectrum is not interesting, as it occupies the whole spectrum and does not vary along time; due to its triviality it has not been plotted here, although it is included in chapter 3.

For the vocoder output shown in the next figure, the input parameters of the vocoder are good enough to obtain a good output file; they are the following:

Channels = 1024
Numbands = 32
Overlap = 0.3

Fig. 53 – Spectrum of the vocoder's output. Font: Daniel's own archive, 2009

It is important to notice that the maximum values are not at the beginning of the sentence, which means the signals are not completely correlated in time. This is probably caused by the slow attack time of the carrier.

5. CONCLUSIONS

These are the main conclusions after studying and developing the channel vocoder:

The modulator source must have two main characteristics. First, it should have a discontinuous look in time; in other words, the modulator sounds should not have constant amplitude. Second, these independent types of sounds must have different frequency content, with the intention of exciting the different parts of the carrier's spectrum.

The carrier's most important features are a constant time envelope, and it is desirable that its sounds have energy across the whole audible spectrum. Triangular waveform signals are a good choice thanks to their rich harmonic content. It is also interesting to add some extra effects to the carrier signal to achieve this goal: flanging, fuzzing, delays, resonators, saturators… are helpful effects for producing a broader and 'dirtier' spectrum.

An audio effect that cannot be experienced in real time is useless for a musician. So this Matlab implementation is very interesting in an academic context for learning exactly how a vocoder works, but for musical applications it could only be used in a post-production (or even pre-production, if used as a sample source) environment. On the other hand, Matlab is fully operational in terms of plotting signals, and the library of functions the program comes with is huge.

There is no obvious relation implying that the more bands the vocoder uses to analyze the signals, the better it will sound; this is subjective and depends on the objective we are looking for. If we want a very 'robotic' voice, 16 bands are enough. Conversely, the bigger we make the FFT size, the more intelligible the resulting vocoded voice sounds.

In terms of execution time, when a big FFT size is chosen the algorithm runs faster; this happens as a result of slicing the signal into bigger pieces. However, on a real-time DSP platform this would translate into higher latency.
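The latency cost can be estimated back-of-the-envelope: a block-based vocoder cannot output a block before FFTsize samples have been collected. A quick Python sketch (illustrative only), using the project's 44100 Hz sample rate and the FFTsize = 2*channels relation from the Matlab code:

```python
fs = 44100  # Hz, the sample rate used in this project

for channels in (64, 512, 1024):
    fft_size = 2 * channels               # as in the Matlab code
    latency_ms = 1000.0 * fft_size / fs   # one block of input delay
    print(channels, round(latency_ms, 1)) # 64 2.9 / 512 23.2 / 1024 46.4
```

So the settings that sound best in Table 2 (channels = 1024) would already cost roughly 46 ms of block delay on a real-time platform, before any processing time is added.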

5.1. FUTURE LINES OF WORK

After such a deep study of an effect, a few questions still linger in my mind, and many improvements can, and should, be made:

The first thing that can be done with the Matlab algorithm is to adapt the code for the Texas Instruments DSP 5416, with the purpose of having a real-time vocoder. This was my main frustration during and after the development of this project.

It is a fact that there are hundreds of commercial vocoder implementations, whether software or hardware, that have extra configuration parameters which would also be interesting to analyze and eventually add to this vocoder. Software vocoders, nevertheless, are easier and cheaper to obtain.

For electronics lovers it would be attractive to build one in the old-school style, with a bank of analogue filters; component tolerances and temperature-dependent values make an analogue vocoder a living being.

Another interesting future line of work could be experimentation with as many carrier and modulator signals as possible, analyzing the results obtained either from a technical point of view or from an artistic/musical point of view; or why not both?

Experimentation with sounds would grow infinitely if we added extra effects to the carrier, modulator and vocoded output signals. For this purpose, however, a DAW (digital audio workstation) would save a lot of time compared with Matlab.


References

Zölzer, U. (2002). DAFX: Digital Audio Effects. John Wiley & Sons.

Crochiere, R. E. & Rabiner, L. R. (1983). Multirate Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall.

Web resources

Note: The access date is placed after each link between brackets.

http://en.wikipedia.org/wiki/Vocoder (11-10-09)
http://en.wikipedia.org/wiki/Talk_box (11-10-09)
http://en.wikipedia.org/wiki/Phase_vocoder (11-10-09)
http://ptolemy.eecs.berkeley.edu/%7eeal/audio/voder.html (11-10-09)
http://ptolemy.eecs.berkeley.edu/%7eeal/audio/vocoder.html (14-10-09)
http://www.newmusicbox.org/third-person/oct99/links.html (14-10-09)
http://www.paia.com/ProdArticles/vocodwrk.htm (14-10-09)
http://www.obsolete.com/120_years/machines/vocoder/ (19-10-09)
http://www.wendycarlos.com/vocoders.html (24-10-09)
http://www.musicofsound.co.nz/blog/sparky-the-sonovox (24-10-09)
http://www.panix.com/~jens/pvoc-dolson.par (24-10-09)
http://recherche.ircam.fr/equipes/analyse-synthese/roebel/paper/dafx2003.pdf [A NEW APPROACH TO TRANSIENT PROCESSING IN THE PHASE VOCODER, Axel Röbel] (29-10-09)
http://www.sony.com.au/product/mdr-xd100 (29-10-09)
http://www.pioneer.at/eur/products/42/203/17331/DM-DV5/specs.html (4-11-09)
http://www.mathworks.com/ (16-11-09)
http://en.wikipedia.org/wiki/MATLAB (16-11-09)
http://audacity.sourceforge.net (16-11-09)
http://www.matsc.net/vocator%20i.html (16-11-09)
http://digitalmedia.oreilly.com/pub/a/oreilly/digitalmedia/2006/03/29/vocoder-tutorial-and-tips.html (16-11-09)
http://www.proun.net/gallery/korg_vc10.html (16-11-09)
http://www.d16.pl/drumazon (17-11-09)
http://www.dsprelated.com/dspbooks/sasp/Dudley_s_Channel_Vocoder.html (17-11-09)
http://eceserv0.ece.wisc.edu/~sethares/vocoders/channelvocoder.html (17-11-09)
http://www.sirlab.de/linux/descr_vocoder.html (17-11-09)
http://www.infekted.org/virus/showthread.php?p=289469 (19-11-09)
http://www.bores.com/courses/advanced/windows/files/windows.pdf (19-11-09)
http://www.coe.montana.edu/ee/rmaher/ee477/ee477_fftlab_sp07.pdf (19-11-09)
http://en.wikipedia.org/wiki/Window_function (19-11-09)
http://en.wikipedia.org/wiki/Sound_Forge (19-11-09)
http://sony-sound-forge.software.informer.com/ (19-11-09)
