Music Technology Central York High School

Course Text


Table of Contents

I. The Nature of Sound
   A. Sound Processes
      1. Characteristics of Sound - Waveforms
      2. Properties of Sound
      3. Phase

II. MIDI - Musical Instrument Digital Interface
   A. Introduction
      1. MIDI for Beginners
      2. Multi-Timbral and Polyphonic
   B. History of MIDI
   C. MIDI Specification
      1. The Interface
      2. MIDI Language
      3. 128 = The Magic Number
      4. Controllers
      5. MIDI Messages
      6. MIDI Channels
   D. General MIDI
      1. General MIDI Specification
      2. General MIDI Instruments
   E. MIDI and the Computer
   F. Standard MIDI File
   G. MIDI Timing Concepts
      1. Relatively Real
      2. MIDI Timing Clock
      3. MIDI Timing Clock and SMPTE Time Code

III. Synthesizers
   A. The Roots of Synthesis
   B. Pitch = Frequency
   C. Volume = Amplitude
   D. Timbre = Harmonics
   E. Modulation
   F. Putting It All Together

IV. Making Sound Come to Life
   A. Controllers
   B. Other Elements
   C. Voices
   D. Microprocessor Control

V. Digital Audio
   A. Range of Values: Bit-Depth
      1. Signal to Noise Ratio
   B. The Speed: Sampling Rate
   C. Converting It Back
      1. Buffering
   D. Two Basic Rules of Digital Recording
   E. What Exactly is a Decibel?


I. The Nature of Sound

A. SOUND PROCESSES

“If a tree fell in the forest and no one was there to hear it, would it make a sound…?”

According to the complete definition of sound, the answer is actually “no.” Sound must have three essential elements: generation, propagation (or transmission), and reception.

So if a tree fell in the forest and no one was there to hear it, the third essential element, reception, was missing, and therefore no sound occurred. All three elements are important, and an understanding of each element is essential to a complete understanding of music and sound amplification.

In order for sound to be generated, something must set air into motion. This means anything that vibrates can generate sound, whether it is the strings on a guitar, the reed in an oboe, or our own vocal cords.

1. Characteristics of Sound - Waveforms

The movement of any vibrating sound source can be characterized by the following criteria:

1. The motion takes place symmetrically about an equilibrium position. Translation: A pendulum at rest will hang straight. This is equilibrium (Figure 1). As the pendulum is disturbed, it swings to the left and right of the equilibrium point. Another way to say this is the pendulum goes positive and negative in relation to equilibrium (Figure 2).

Figure 1 Figure 2


2. Regardless of how far the source vibrates from equilibrium, the number of completed movements will remain the same. Translation: If two pendulums were the same length and pendulum A was pulled 1 foot from equilibrium and pendulum B was pulled 5 feet from equilibrium, they would both cross equilibrium at the same time if released at the same time.

3. The speed of the vibrating source will vary throughout the movement because of the forces acting upon it and its own inertia. Translation: As the pendulum swings toward its maximum positive or negative position, it slows and momentarily stops before returning in the opposite direction.

Figure 3 Speed varies throughout the cycle

Figure 4 Speed remains constant

2. Properties of Sound

What makes one sound different from another? How can you tell the difference between a tuba and a piccolo? What distinguishes a whisper from a yell? Why does a saxophone sound different from a violin? There are many subtle things that distinguish the sounds in the examples above. The human ear can hear an astounding range of little differences, which is why it’s possible to identify so many distinct sounds. Consider the obvious:

• A tuba note is lower than a piccolo note; this is a difference in pitch.
• A whisper is softer than a yell; this is a difference in volume.
• A saxophone sounds different than a violin, even if they play the same pitches at the same volume. This is a difference in timbre (pronounced “tam-ber”), or tone color.

The three properties mentioned above are the building blocks of all sound.

A. Pitch = Frequency

As any object vibrates, it pushes on the air surrounding it, altering the air pressure as it vibrates from positive to equilibrium to negative and back to equilibrium. As the object pushes to the positive side, the air particles are compressed. When the object draws back to its original position, the air pressure returns to normal (equilibrium). As the object continues to vibrate, it pulls in the opposite direction to negative and creates air that is less than normal pressure. This is called rarefaction. Finally the object returns to equilibrium and the air returns to normal pressure.

One complete movement of the object is known as a cycle. The vibrations of the object through these cycles create waves in the air, which transfer the vibrations from the sound source to the receiver.

EXAMPLE: A guitar string is plucked, which sets it vibrating (generation). The vibrations of the string create waves in the atmosphere that radiate out from the string in all directions (propagation). The waves in the air vibrate at the same frequency as the string. The resulting airwaves are picked up by the eardrum, which begins vibrating in the same manner as the air (reception). The brain then translates these vibrations into what we perceive as the sound of the guitar string.

With all this in mind, we can plot a graph for sound waves and/or sound sources. Suppose we dipped a pendulum in ink and started swinging it against a roll of paper moving perpendicular to the direction of the swing. We would see that the paper would form a wave that looked like this:


Figure 5

Now, turning the paper horizontally and drawing a line through the center of the wave would yield:

Figure 6

This waveform is known as a Sine Wave, named after the trigonometric function. The horizontal line would represent time and the vertical line would represent the positive and negative (forward and backward) extents of the pendulum’s movement. The time line would be referenced in seconds, milliseconds (1/1000 second) or microseconds (1/1,000,000 second), and the extent line could be referenced with any distance measurement (inches, millimeters) or power measurement to achieve that distance (watts, volts, pounds, decibels, RPMs, etc.).

A cycle can be defined as a completed movement of a wave before it repeats itself. We can also plot time points of the cycle on our horizontal axis. In Figure 7, there is one complete cycle in one second, known as one cycle per second or one CPS. Another term for cycle per second is Hertz (abbreviated Hz). Hertz refers to how often the sound source vibrates in a one-second interval. For example, an A440 tuning fork vibrates at 440 Hz, which means it moves back and forth at a rate of 440 times per second.

Figure 7: one cycle per second (1 Hz); the time axis is marked in seconds (.5, 1, 1.5, 2, 2.5, 3, 3.5, 4)

Therefore, this rate measures the frequency of the movement, or how often it repeats in a one-second interval. We use frequency to describe and determine pitch in sound. The greater the frequency (the more cycles per second), the higher the pitch we hear. 440 Hz is a lower frequency note than 880 Hz. Since the strings on a violin vibrate faster than the strings on a bass guitar, the bass guitar plays lower-pitched notes.
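The idea of frequency can be demonstrated with a few lines of arithmetic. The short Python sketch below (illustrative only; the variable names and the number of sample points are arbitrary choices, not part of the text) computes the value of a 440 Hz sine wave at several moments and shows that the pattern repeats after one period, 1/440 of a second.

    import math

    frequency = 440.0          # cycles per second (Hz), the pitch A440
    period = 1.0 / frequency   # time for one complete cycle, in seconds

    # Sample the wave at nine evenly spaced points across one cycle.
    for step in range(9):
        t = step * (period / 8)                        # time in seconds
        value = math.sin(2 * math.pi * frequency * t)  # displacement from equilibrium
        print(f"t = {t:.6f} s   value = {value:+.3f}")

    # The values at t = 0 and t = period are the same: one full cycle has passed.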

B. Volume = Amplitude

The distance or extent of the waveform from the equilibrium position is known as the amplitude of the wave. Here we have two waves with the same frequency (3 Hz). The only difference between them is their amplitude. Wave A has greater amplitude than Wave B. Amplitude refers to the intensity or loudness of the sound.

Figure 8


C. Timbre = Harmonics

A complex tone can be characterized as two or more waves having different frequencies, amplitudes, and phase relationships. Waves that have frequencies related by whole numbers are called harmonics. These harmonics can be added to pure tones by electronic means or can be inherent in the instrument (or sound source) itself.

If the same note were played on a piano and violin, it would be easy to differentiate the two instruments. The reason a piano sounds different from a violin is because of the different harmonics that are naturally generated by the two instruments. To put it another way, harmonics are “ghost” tones that are generated as a result of the structure or playing method of the instrument.

For example, when an A440 tone is played on a piano, the piano string vibrates at 440 Hz, which is called the fundamental. However, other vibrations occur as well because of the structure of the piano. Those notes would be two times the fundamental, three times the fundamental, four times the fundamental, etc. This means that although we perceive the pitch at 440 Hz, other pitches like 880 Hz, 1320 Hz, 1760 Hz, etc. are also present and they affect the overall tone of the piano. The number of harmonics and the intensity of each in relation to the fundamental create a distinctive sound unique to each instrument. This is known as the timbre of the instrument.

Figure 11
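A complex tone like this can be approximated numerically by summing a fundamental with whole-number multiples of its frequency. In the Python sketch below, the choice of four harmonics and their relative strengths is an arbitrary illustration, not a measurement of a real piano.

    import math

    fundamental = 440.0                       # Hz
    # Relative strengths of the fundamental and its harmonics (made-up values).
    harmonics = {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.125}

    def complex_tone(t):
        """Sum the fundamental and its harmonics at time t (in seconds)."""
        return sum(amp * math.sin(2 * math.pi * fundamental * n * t)
                   for n, amp in harmonics.items())

    # Print a few samples; the result is no longer a simple sine wave
    # because the harmonics are mixed in with the fundamental.
    for step in range(8):
        t = step / (fundamental * 8)
        print(f"t = {t:.6f} s   value = {complex_tone(t):+.3f}")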


3. Phase

When two waveforms occur at the same time, they interact with one another and create a new waveform. Phase refers to the effect one wave has on another. Two waves that start simultaneously and have the same amplitude and frequency will produce a new wave with the same frequency but greater amplitude. In fact, the amplitude will be the sum of the amplitudes of each wave. These two waves are said to be in phase. The result would be a louder sound than either wave by itself.

Figure 9

If two waves with the same amplitude and frequency are started with Wave A moving in one direction (positive) and Wave B moving in the opposite direction (negative), the waves would then cancel each other out and no sound would occur. This is known as phase cancellation.

Figure 10

Similarly, two waves having different frequencies and amplitudes will combine to create a new, more complex wave (more complex than a sine wave).

A Sine Wave is a pure tone of only one frequency.
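Both cases can be checked with simple arithmetic. The Python sketch below (the 3 Hz frequency matches the amplitude example above; the sample times are arbitrary) adds two identical waves in phase and then 180 degrees out of phase.

    import math

    freq = 3.0    # Hz
    amp = 1.0

    def wave(t, phase=0.0):
        """Value of a sine wave of the given amplitude and frequency at time t."""
        return amp * math.sin(2 * math.pi * freq * t + phase)

    for step in range(5):
        t = step * 0.05
        in_phase = wave(t) + wave(t)               # identical waves: amplitudes add
        out_of_phase = wave(t) + wave(t, math.pi)  # opposite waves: cancellation
        print(f"t = {t:.2f} s   in phase = {in_phase:+.3f}   out of phase = {out_of_phase:+.3f}")

The in-phase sum has twice the amplitude of a single wave, while the out-of-phase sum is zero (within rounding error), which is phase cancellation.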


II. MIDI - Musical Instrument Digital Interface

A. INTRODUCTION

1. MIDI for Beginners

The acronym MIDI stands for Musical Instrument Digital Interface. A Musical Instrument is a machine that makes sounds which humans have decided to call music. Digital means information that is encoded in numerical form, i.e. numbers, while Interface means facilitating communication between two or more systems.

In practical terms, MIDI is a standard way for all sorts of modern musical equipment to talk to each other. This equipment commonly consists of things like keyboards, computer sequencers, synthesizers, and samplers, but it also includes mixers, tape recorders, effects generators, guitars, drum kits, wind instruments etc.

The MIDI Standard was designed in the early 80's by a partnership between Roland and Sequential Circuits, two of the largest manufacturers of the time. This came about because of pressure from keyboard players, who wanted a universal interface standard for all their synthesizers to comply with. They were fed up with different synthesizer corporations using their own communications standards, which were incompatible with those of other corporations.

After the publication of the MIDI standard in 1984, other musical equipment manufacturers quickly began to implement it in the designs of their products and MIDI became a worldwide standard.

A major advantage of MIDI over old analogue interface standards, such as CV (Control Voltage), is that it is possible to transfer up to sixteen channels of data down one cable, as opposed to CV's one channel per cable.

Another major advantage of MIDI is that it enables computers equipped with MIDI to be used to write music and control musical equipment. This is done with programs called sequencers. They can give a very high degree of control over music, impossible through conventional means.

Another advantage of MIDI is that it is now a worldwide standard, ensuring that practically all professional equipment will be compatible with it.

Having sixteen channels to transfer MIDI data can also be a limitation when you want to use more than sixteen channels. However, using two or more interfaces, each giving sixteen channels, will get around this problem.

Another limitation of MIDI is that you cannot use it to transfer real-time digital audio.


MIDI information is transferred by digital signal sent down a wire from one system to another. This digital data takes the form of binary numbers, physically transferred by sending zero volts for zero or off and plus five volts for one or on.

Certain binary numbers convey certain types of information; for example a certain binary number will tell the device that a note on a keyboard has been pressed. This is called a note on event and the binary numbers sent through MIDI will also tell the receiving system which note has been pressed and how quickly it was pressed.

A common misconception sometimes made by those new to MIDI is that analog or digital audio data is somehow transmitted over MIDI cables. It is only possible to transfer binary data over MIDI. Although this may include sample data, you cannot use MIDI to send audio information in real time.
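The note-on event described above can be written out byte by byte. The Python sketch below follows the published MIDI convention for a note-on message (status byte 0x90 plus channel, then note number and velocity); the particular note, velocity, and channel are invented for the example.

    # A MIDI "note on" message is three bytes: one status byte and two data bytes.
    channel = 0          # channels 1-16 are numbered 0-15 inside the message
    note = 60            # middle C
    velocity = 100       # how quickly the key was pressed (0-127)

    status = 0x90 | channel   # 0x90 = note on; the low four bits carry the channel
    message = [status, note, velocity]

    for byte in message:
        print(f"{byte:3d}  ->  {byte:08b}")
    # The status byte begins with a 1; the two data bytes begin with a 0.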

2. Multi-Timbral and Polyphonic

If a sampler, synthesizer, or other sound-generating device is multi-timbral, then it can generate two or more different instrumental sounds at the same time. These sounds are normally accessed over MIDI; therefore, the multi-timbral capacity of an instrument is normally limited to sixteen instruments to match the sixteen available MIDI channels.

If an instrument is polyphonic it can generate two or more notes at once. For example, a keyboard that is thirty-two voice polyphonic can play thirty-two different notes at once.

The terms multi-timbral and polyphonic are often confused. They do not have to be linked; for example, the Roland Super JX-10 is only two-part multi-timbral, but it is twelve-voice polyphonic.

General MIDI is a standard for unifying the set of instruments used in different manufacturer keyboards, sound modules and computer sound cards. It consists of 128 instruments and several drum kits.

The idea is that if you put together a MIDI file using one manufacturer's General MIDI-compliant device, you can play it back on another device and it will sound, if not exactly the same, at least very close, because all of the instruments will be the same.

B. History of MIDI

The way music is made has been changed forever. MIDI instruments are now the tools of artists from an enormous range of styles and traditions. The quality, and perhaps even the quantity, of music has grown as a result of the MIDI phenomenon. First, a little history:


The 'sixties and 'seventies were an explosive time for the creation of new musical instruments. In addition to the blossoming use of electric guitars and new keyboards such as electric organs and pianos, a whole new breed of musical instrument was beginning to appear on albums and in concerts--the electronic synthesizer. These large, odd looking and odder sounding machines were based on simple analog electronics. They used electric voltages to create and control sounds. Higher voltages would make higher notes and lower voltages made lower notes. Several small companies began to make instruments, all based on the concept of control voltage (CV). Short electrical cables called patch cords would feed the control voltages around these instruments to manipulate the sound's character and shape.

For musicians who wanted to play from a standard organ-like keyboard, special CV keyboards were built to control the rest of the instrument. These early synthesizers could play only a single note (monophonic) at a time. To get more musical lines, you either had to buy more synthesizers, or record parts on tape. These synthesizers were difficult to set up, use and maintain, but they gave those musicians something they could get no other way--fresh new sounds.

The monophonic (single note) Moog and ARP brands of synthesizers were already bending quite a few ears by the mid 'seventies with bands such as Emerson, Lake & Palmer, Genesis and others, when the Oberheim company introduced the first commercial polyphonic (able to play several notes at a time) keyboard synthesizer. Relative to its unwieldy predecessors, it was simple to use, had a built-in keyboard, was able to play four notes at a time (400% improvement!), and had a simple array of knobs and switches you could manipulate to quickly create rich, wonderful new sounds. It was far more portable and easy to program than most of its predecessors.

Soon after, more easy to use, good-sounding, polyphonic synthesizers began to appear; Sequential Circuits, Yamaha, Moog (pronounced like "vogue"), Roland, ARP, and other companies introduced new models of electronic instruments, all able to play multiple notes simultaneously. Just a few years earlier, what was an expensive, unwieldy and difficult to use machine, was becoming a popular instrument with a growing crowd of diverse musicians.

After polyphony, perhaps the next most important advance in early synthesizer technology was the incorporation of programmable memory into instruments. All polyphonic synthesizers have a small built-in computer that "looks" at each key on the keyboard to see if it has been pressed, and then passes those notes on to the available oscillators (which are the special electronic circuits in a synthesizer that make the actual sounds). That small computer could also help store and recall sounds created by the user into the synthesizer's built-in memory (like taking a snapshot of all the knobs and buttons on the instrument). This opened up a whole new world for live performance.

Prior to programmable memory, the reason that people like Keith Emerson and Rick Wakeman had such extravagant keyboard setups on stage was that each of the instruments could only be set up to produce a single sound per show. Hours of preparation were needed to patch together the sounds and the different instruments. When memory came along, it allowed a single synthesizer to be used for several different sounds during a live show or recording session, by simply pressing a single button.

Adding memory to the synthesizer made it many times more useful. But many early synthesizers--like many cars--had personalities of their own. Some got wonderful, thick brass. Others were more adept at woodwinds, or strings, or bells, or sound effects, or pianos, or colorful tropical birds, or the laugh of small friendly aliens. What was needed next was a way to combine the best of each instrument into a single, useful musical system.

A technique that some early synthesizer players adopted to create new sounds was to play the same part on two keyboards at the same time, one hand on each instrument. A keyboardist could then use each instrument to its best advantage: strings from the "synth string," brass from the "brass synth," and so on. This was an awkward technique at best, and one's polyphony was limited to the number of fingers on one hand, typically five.

Rock musicians such as Keith Emerson became famous for the enormous stacks of electronic keyboards they would stand in front of and play. Joe Zawinul, of the 'seventies jazz group Weather Report, developed a unique technique for playing on two keyboards simultaneously. He placed himself between a pair of ARP 2600 synthesizers, one of which had its keyboard electronically reversed, going from high notes on the left to low notes on the right.

All these elaborate measures were designed to accomplish one thing--getting the most from these great new instruments. The layering of sounds upon sounds became an important tool, almost like a trademark sound for some of these and other artists. Then, in 1979, came the next big step: some new keyboards were coming equipped with computer interface plugs on the back. Instruments from the Oberheim, Rhodes, and Roland companies could, for the first time, be connected to another of the same model of synthesizer. For example, an Oberheim OBX synthesizer could be connected to other OBXs. When you played on the keyboard of one, both would play whatever sound was programmed. This was an improvement for performers, since sounds could be layered on top of each other while playing a single keyboard, but it didn't answer the big question of how to connect different instruments from different brands together for unique combinations.

One person who took matters into his own hands was jazz musician Herbie Hancock. Newly enthralled with the technology of synthesizers, he spent a small fortune to have many of his electronic instruments custom modified to connect with each other, allowing him to mix and match sounds any way he wished. For the first time, instruments of different makes were connected with each other by means of a common, though custom, digital connection.


More and more rock and jazz musicians were approaching the instrument makers to try and get their own equipment to interconnect. In addition, the first digital sequencers (devices that record and play back a performance on an electronic musical instrument) were starting to show up. These sequencers, such as the Roland MC-4 MicroComposer and the Oberheim DSX, were yet another reason to want compatibility between products from the different instrument makers. It would be possible for one person to sequence and play back all parts of a song on a group of synthesizers. The Roland MC-4 was a primitive four-track sequencer that produced either control voltages, used extensively for controlling earlier analog synthesizers, or data over a special "Roland only" digital connector for some of the company's newer instruments. Oberheim's sequencer was quite a bit more sophisticated, but was limited to use only with their own OBX model synthesizers.

The time was ripe for a change to occur in the musical instrument industry by the early 'eighties. Synthesizers were no longer a techno-oddity, and sales of instruments to mass-market musicians, as well as to professionals, were growing quickly. There were more companies involved now from Japan, the U.S. and Europe. The diversity of keyboards, drum machines, sequencers, and other musical devices was growing rapidly. To move up another notch in technology and accessibility, the synthesizer industry needed to take a lesson in compatibility from the computer industry. Computer makers have long depended on certain standards to ensure compatibility between computers and other devices. For example, the modem is a device that lets computers exchange information over telephone lines. It makes no difference what the makes, models or cost of the interconnected computers are. Now, millions of people on thousands of different computers all speak the same "language" because their equipment was designed using the same technical standard for modems. Other examples of computer standards are disk drives, printers, cables, memory chips, and many types of software. Compatibility strengthened the new personal computer industry and was a major factor in its amazing success. There are many examples of technical standards that allow devices from different companies to work together. Look at the success of items such as VCRs, video cameras and tapes, cassette machines and tapes, stereo equipment and a host of everyday gizmos.

Twice a year, the members of the National Association of Music Merchants (NAMM) hold a huge convention to show new musical products and find new ways to market musical instruments and accessories. During one of these shows in 1982, a meeting of a small group of synthesizer manufacturers took place at the request of Dave Smith, President of Sequential Circuits, a popular synthesizer company at the time. Engineers from many of the major synthesizer companies were in attendance. They discussed a proposal for the adoption of a universal standard for the transmitting and receiving of musical performance information between all types of electronic instruments. The original proposal was called UMI, for Universal Musical Interface.

The original proposal went through a significant number of revisions before being renamed and becoming the Musical Instrument Digital Interface, or MIDI standard. Several prominent Japanese musical instrument companies became involved in engineering the final version. It was a truly international cooperative venture. Finally, in 1983, Sequential Circuits from the U.S. and Roland from Japan introduced the first keyboards with MIDI, soon followed by virtually every other synthesizer company in the world!

Within three years after MIDI's introduction, almost no electronic instrument was made in the world that didn't have a MIDI plug on it. It became a true universal standard. To this day there is no competition to MIDI for connecting all types of electronic musical instruments together or for creating personal musical systems. Like computers, MIDI is used by millions of people for thousands of applications. It's also being used in fields other than just music, such as theatrical lighting, computer games, and recording studio automation.

From the beginning, the MIDI standard was designed with room for growth and improvement. Since its start, new features have been added, while others have been defined more clearly. A great deal of room was left for expansion without sacrificing the main power of MIDI--simplicity and compatibility with all other existing MIDI instruments.

C. MIDI Specification

1. The Interface

MIDI = Musical Instrument Digital Interface

MIDI is a digital communications protocol. In August of 1983, music manufacturers agreed on a document that is called "MIDI 1.0 Specification". Any device that has MIDI capabilities must adhere to this specific data structure to ensure that all MIDI devices are capable of working together. This protocol is a language that allows inter-working between instruments from different manufacturers by providing a link that is capable of transmitting and receiving digital data. It is important to remember that MIDI transmits commands, but it does not transmit an audio signal.

The MIDI specification includes a common language that provides information about events, such as note on and off, preset changes, sustain pedal, pitch bend, and timing information. The specification has been updated more recently with specific data structures for handling sample dumps, MIDI time code, general MIDI and standard MIDI files.

MIDI information is transmitted through a MIDI cable that has DIN-type male plug connectors with five pins. Two of the pins are used to transfer digital binary information (MIDI Code). One of the pins issues a steady stream of five volts, while the other pin alternates between 5 volts and 0 volts to represent binary information (on and off). The third pin is a ground and the remaining two pins are currently not in use.

The MIDI data is sent down the cable one bit at a time as a stream of information, which is called a serial interface. A parallel interface sends the bits down separate wires so that all of them reach the device at the same time, making it faster than a serial interface. Computer chips communicate via a parallel interface.

The serial interface was chosen by MIDI manufacturers because it is less expensive and more efficient than a parallel interface. The speed of a MIDI serial interface is 31,250 bits per second. Since 10 bits are needed for every MIDI digital word, about 3,125 words can be sent each second. Snap your fingers and think about how many events could be transmitted during that time. Consequently, the serial interface speed is more than adequate for most music applications.


2. MIDI Language

The MIDI language is represented with binary code. Each 0 or 1 is called a bit. Four bits equal a nibble and eight bits equal a byte. With MIDI, each digital word consists of a total of 10 bits: 8 bits (1 byte) plus one start bit and one stop bit. (The MIDI messages in this program will not display the start and stop bits.)

When we look at a digital word, 10010110, the bit at the far left is considered the Most Significant Bit. The remaining seven bits, 0010110, are considered the Least Significant Bits. Most MIDI messages consist of one, two or three bytes. Each byte may be classified as a status or data byte.

A MIDI processor will look at the Most Significant Bit to see if it is a 1 or a 0. Status bytes start with a 1, while data bytes start with a 0. A status byte is the first word in a digital MIDI message, and it is used as an identifier or an instruction.

Channel messages are composed of status bytes that are followed by one or more data bytes. The data bytes are information that is pertinent to the status byte. Because a data byte always starts with a 0 (for example, 01101100), only the remaining seven bits carry the value, which leaves 128 possible values.
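The status/data test a MIDI processor performs can be written as a one-line check of the Most Significant Bit. In the Python sketch below, the three example bytes form a note-on message (the data values are arbitrary, chosen to match the 01101100 example above).

    def classify(byte):
        """Return 'status' if the most significant bit is 1, otherwise 'data'."""
        return "status" if byte & 0b10000000 else "data"

    # A note-on status byte followed by its two data bytes.
    stream = [0b10010000, 0b00111100, 0b01101100]

    for byte in stream:
        print(f"{byte:08b}  {classify(byte)}")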

3. 128 = The Magic Number

It is not important at this stage to understand the difference between status and data bytes. It is important to know that there are 128 possible values in the MIDI binary language.

Almost all parameters in the MIDI spec are built around 128 values. Also important to note is that some applications number these values 1-128 and others number them 0-127. Either way, there are 128 references or levels for every parameter.

128 possible notes
128 levels of dynamics (volume)
128 levels of velocity
128 levels of stereo panning
and so on …

4. Controllers

Controllers are functions in the MIDI specification that control the parameters of the sound. There are theoretically 128 controllers available. We will only be concerned with a few of the controllers. The MIDI controllers are broken down as follows:

VARIABLE CONTROLS (0-63)
   Modulation: #1
   Main volume: #7
   Pan: #10

SWITCH CONTROLS (64-95)
   Sustain pedal: #64

UNASSIGNED CONTROLS (96-120)

CHANNEL MODE MESSAGES (121-127)

The unassigned controllers enable equipment such as automated audio mixers to be set by MIDI control change messages and can be assigned by the user.
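A control change message puts these controller numbers to work: it is status byte 0xB0 plus the channel, followed by the controller number and the new value. The Python sketch below builds a main-volume (#7) message; the channel and value are invented for the example.

    def control_change(channel, controller, value):
        """Build the three bytes of a MIDI control change message (channel 1-16)."""
        status = 0xB0 | (channel - 1)        # 0xB0 = control change status
        return [status, controller & 0x7F, value & 0x7F]

    # Controller #7 is main volume; set it to roughly half level on channel 1.
    print(control_change(1, 7, 64))          # prints [176, 7, 64]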

5. MIDI Messages

MIDI information is transmitted and received one bit at a time. A small current (5 volts) is switched on and off rapidly. Current on represents 1, current off represents 0. Each 0 or 1 in a MIDI message is called a bit. In computer jargon, a group of 8 bits is called a byte. Each MIDI message consists of a number of bytes, typically 1, 2 or 3 bytes. Each byte is preceded with a start bit and succeeded by a stop bit. The first byte in a MIDI message is sometimes called the header byte.

6. MIDI Channels

MIDI has 16 channels, numbered 1 to 16. Channels allow MIDI messages to be directed (channeled) to a particular device. This is made possible by placing a channel number in the header byte of a MIDI message. Channel numbers allow messages to be directed to a particular instrument in a multi-instrument setup. Channel numbers also allow messages to be directed to a particular device within an instrument, e.g. to a particular voice.
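A receiving device decides whether a channel message is meant for it by reading the low four bits of the status byte. A minimal Python sketch (the listening channel and the incoming message are made up for the illustration):

    MY_CHANNEL = 10                    # this device listens on MIDI channel 10

    def message_channel(status_byte):
        """Channel messages carry the channel in the low nibble (0-15 maps to 1-16)."""
        return (status_byte & 0x0F) + 1

    incoming = [0x99, 38, 100]         # note on, channel 10, note 38, velocity 100
    if message_channel(incoming[0]) == MY_CHANNEL:
        print("This message is for me:", incoming)
    else:
        print("Ignore: addressed to channel", message_channel(incoming[0]))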

D. GENERAL MIDI

Not everyone becomes involved with MIDI and electronic instruments because they want to create all kinds of new sounds. Some of us are just musicians who want a palette of good sounds with which to compose and perform. A need arose in the computer world for better sounds in games and multimedia. Those needs were recognized, and a new part of the MIDI standard was born. Welcome to General MIDI.

No, not a high ranking musical military officer, General MIDI (GM) is an unusual part of the MIDI specification called a recommended practice. It isn't a MIDI message or command. Instead, it is a description of a class of musical instruments that all share a consistent set of features and capabilities. This means that music created for playback on a GM instrument will sound musically consistent on any GM instrument.

A typical MIDI system is very personalized. You choose the instruments you want to use and the sounds that are in each instrument. Manufacturers developing new synthesizers are free to put any kind of sounds in the machine they wish, in any order. Sequences you record in your studio with your equipment will not automatically fit into other MIDI systems. You will need to put in some effort redirecting channels and changing patch change messages to fit the new gear you are using.

1. GM Specifications

All of this extra noodling disappears with GM. Any instrument that bears the mighty GM logo must adhere to a predetermined list of features and patch assignments. They include:

* A minimum of 24 voices of polyphony in order to play back very full arrangements
* Respond to all 16 MIDI channels
* Each channel can access any number of voices (dynamic voice allocation)
* Each channel can play a different timbre (multi-timbral)
* A full set of percussion instruments is available on Channel 10
* All percussion instruments are mapped to specific MIDI note numbers
* A minimum of 128 presets available as MIDI Program numbers
* All sounds are available on all MIDI channels (except 10, which is for percussion)
* All GM instruments and percussion respond to Note On velocity (touch sensitive)
* Middle C is always note number 60
* All GM instruments can respond to these MIDI Controllers:
      1 Modulation
      7 Volume
      10 Pan
      11 Expression
      64 Sustain Pedal
      121 Reset All Controllers
      123 All Notes Off
* GM instruments respond to all Registered Parameters:
      0 Pitch Bend Sensitivity
      1 Fine Tuning
      2 Coarse Tuning
* GM instruments all respond to these additional MIDI messages:
      Channel Pressure (Aftertouch)
      Pitch Bend (with a default of two semitones)

2. GM Instruments

Perhaps most importantly, GM doesn't leave the order of the patches in a synthesizer's memory to chance. At the heart of the GM system is the GM Sound Set. It specifies the exact sound, by name, for each of the 128 sounds in the synthesizer. For example, Patch #1 must always be an acoustic grand piano on all channels (except 10, which is reserved for drums and percussion) on all GM instruments. All other patch locations are also organized to conform to the GM Sound Set.

The patches are arranged into 16 "families" of instruments, with each family containing 8 instruments. For example, there is a Reed family. Among the 8 instruments within the Reed family, you will find Saxophone, Oboe, and Clarinet.

PIANO: 1. Acoustic Grand Piano, 2. Bright Acoustic Piano, 3. Electric Grand Piano, 4. Honky-tonk Piano, 5. Electric Piano 1, 6. Electric Piano 2, 7. Harpsichord, 8. Clavi

CHROMATIC PERCUSSION: 9. Celesta, 10. Glockenspiel, 11. Music Box, 12. Vibraphone, 13. Marimba, 14. Xylophone, 15. Tubular Bells, 16. Dulcimer

ORGAN: 17. Drawbar Organ, 18. Percussive Organ, 19. Rock Organ, 20. Church Organ, 21. Reed Organ, 22. Accordion, 23. Harmonica, 24. Tango Accordion

GUITAR: 25. Acoustic Guitar (nylon), 26. Acoustic Guitar (steel), 27. Electric Guitar (jazz), 28. Electric Guitar (clean), 29. Electric Guitar (muted), 30. Overdriven Guitar, 31. Distortion Guitar, 32. Guitar Harmonics

BASS: 33. Acoustic Bass, 34. Electric Bass (finger), 35. Electric Bass (pick), 36. Fretless Bass, 37. Slap Bass 1, 38. Slap Bass 2, 39. Synth Bass 1, 40. Synth Bass 2

STRINGS: 41. Violin, 42. Viola, 43. Cello, 44. Contrabass, 45. Tremolo Strings, 46. Pizzicato Strings, 47. Orchestral Harp, 48. Timpani

ENSEMBLE: 49. String Ensemble 1, 50. String Ensemble 2, 51. SynthStrings 1, 52. SynthStrings 2, 53. Choir Aahs, 54. Voice Oohs, 55. Synth Voice, 56. Orchestra Hit

BRASS: 57. Trumpet, 58. Trombone, 59. Tuba, 60. Muted Trumpet, 61. French Horn, 62. Brass Section, 63. SynthBrass 1, 64. SynthBrass 2

REED: 65. Soprano Sax, 66. Alto Sax, 67. Tenor Sax, 68. Baritone Sax, 69. Oboe, 70. English Horn, 71. Bassoon, 72. Clarinet

PIPE: 73. Piccolo, 74. Flute, 75. Recorder, 76. Pan Flute, 77. Blown Bottle, 78. Shakuhachi, 79. Whistle, 80. Ocarina

SYNTH LEAD: 81. Lead 1 (square), 82. Lead 2 (sawtooth), 83. Lead 3 (calliope), 84. Lead 4 (chiff), 85. Lead 5 (charang), 86. Lead 6 (voice), 87. Lead 7 (fifths), 88. Lead 8 (bass + lead)

SYNTH PAD: 89. Pad 1 (new age), 90. Pad 2 (warm), 91. Pad 3 (polysynth), 92. Pad 4 (choir), 93. Pad 5 (bowed), 94. Pad 6 (metallic), 95. Pad 7 (halo), 96. Pad 8 (sweep)

SYNTH EFFECTS: 97. FX 1 (rain), 98. FX 2 (soundtrack), 99. FX 3 (crystal), 100. FX 4 (atmosphere), 101. FX 5 (brightness), 102. FX 6 (goblins), 103. FX 7 (echoes), 104. FX 8 (sci-fi)

ETHNIC: 105. Sitar, 106. Banjo, 107. Shamisen, 108. Koto, 109. Kalimba, 110. Bag Pipe, 111. Fiddle, 112. Shanai

PERCUSSIVE: 113. Tinkle Bell, 114. Agogo, 115. Steel Drums, 116. Woodblock, 117. Taiko Drum, 118. Melodic Tom, 119. Synth Drum, 120. Reverse Cymbal

SOUND EFFECTS: 121. Guitar Fret Noise, 122. Breath Noise, 123. Seashore, 124. Bird Tweet, 125. Telephone Ring, 126. Helicopter, 127. Applause, 128. Gunshot
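Because the Sound Set fixes which instrument lives at which patch number, selecting a GM sound is just a program change message. One detail to watch: the list above numbers patches 1-128, while the program change data byte runs 0-127. A short Python sketch (the channel and patch choice are arbitrary examples):

    def program_change(channel, gm_patch_number):
        """Select a General MIDI sound. gm_patch_number uses the 1-128 numbering
        of the GM Sound Set; the data byte itself is 0-127."""
        status = 0xC0 | (channel - 1)          # 0xC0 = program change status
        return [status, gm_patch_number - 1]

    # Patch 25 in the GM list is Acoustic Guitar (nylon); select it on channel 2.
    print(program_change(2, 25))               # prints [193, 24]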


GENERAL MIDI PERCUSSION SET

35. Acoustic Bass Drum, 36. Bass Drum 1, 37. Side Stick, 38. Acoustic Snare, 39. Hand Clap, 40. Electric Snare, 41. Low Floor Tom, 42. Closed Hi-Hat,
43. High Floor Tom, 44. Pedal Hi-Hat, 45. Low Tom, 46. Open Hi-Hat, 47. Low-Mid Tom, 48. Hi Mid Tom, 49. Crash Cymbal 1, 50. High Tom,
51. Ride Cymbal 1, 52. Chinese Cymbal, 53. Ride Bell, 54. Tambourine, 55. Splash Cymbal, 56. Cowbell, 57. Crash Cymbal 2, 58. Vibraslap,
59. Ride Cymbal 2, 60. Hi Bongo, 61. Low Bongo, 62. Mute Hi Conga, 63. Open Hi Conga, 64. Low Conga, 65. High Timbale, 66. Low Timbale,
67. High Agogo, 68. Low Agogo, 69. Cabasa, 70. Maracas, 71. Short Whistle, 72. Long Whistle, 73. Short Guiro, 74. Long Guiro,
75. Claves, 76. Hi Wood Block, 77. Low Wood Block, 78. Mute Cuica, 79. Open Cuica, 80. Mute Triangle, 81. Open Triangle


GENERAL MIDI DRUM KIT MAPPING


E. MIDI AND THE COMPUTER

Perhaps the most exciting field where GM has had an impact is with personal computers. The market for computer soundcards (a special electronic card that fits into an internal slot in a computer) has become enormous, primarily for video game sound enhancement. These cards provide sound capabilities far beyond what is built into most standard PCs. In order for game designers to take advantage of better sounds for their scores, they must know exactly what card the game will use. The industry standard in the IBM world has been the SoundBlaster series from Creative Labs for many years. The original was a simple, 11 voice synthesizer with relatively crude sound quality and basic sample playback for sound effects. The newer SoundBlaster versions come with many more voices and much better sound quality. Virtually every video game for the IBM PC supports these cards for enhanced sound capabilities.

With the introduction of GM, computer-game software companies recognized that the far superior sound quality and musical sophistication of these instruments were a much-desired upgrade from the current crop of relatively simple sound cards. GM-compatible synthesizer cards that plug inside a PC are available from several companies. Many games offer the option of either SoundBlaster or General MIDI. Of course, while the music and sound effects for games using the SoundBlaster will work with only that one card, games using GM can use any GM synthesizer. Creative Labs released a GM add-on for the SoundBlaster called the WaveBlaster.

While some multimedia systems rely only on digitally recorded sound from disk, MIDI sequences sent to a GM synthesizer module have become an attractive alternative for some software makers. Recording to hard disks takes up a huge amount of space, while even a very complex MIDI sequence takes up very little space and still sounds great. The use of GM-compatible MIDI sequences and the proper hardware can allow for hours instead of minutes of music on a small disk. MIDI can also be used to supplement audio recordings in a multimedia presentation.

F. STANDARD MIDI FILE (SMF)

The Standard MIDI File is the file format used to transfer MIDI information from one type of program or device to another. A MIDI sequencer file could be transferred to another sequencer or to a notation program.

This standard was added to the MIDI specification in 1988. It is a universal language that saves all MIDI notes, velocities, and controller codes as a generic file that may be interpreted by any program that supports the Standard MIDI File. In music applications that support Standard MIDI Files, the user may access and create Standard MIDI Files with the import and export commands. Standard MIDI files have the extension .mid added to the end of the document name.

There are three types of Standard MIDI Files:


* Type 0 - combines all the tracks or staves into a single track.
* Type 1 - saves the file as separate tracks or staves for a complete score, with the tempo and time signature information included only in the first track.
* Type 2 - saves the file as separate tracks or staves and also includes the tempo and time signatures for each track.

The internet is an excellent source for Standard MIDI Files. Because the files are MIDI information, they are usually fairly compact in size and they may also be compressed before they are sent over the internet.
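The file type is stored in the first chunk of every Standard MIDI File, so it can be read directly. The Python sketch below follows the published SMF header layout (a 14-byte "MThd" chunk holding the type, track count, and timing division); the filename "song.mid" is only a placeholder.

    import struct

    def read_smf_header(path):
        """Read the MThd chunk at the start of a Standard MIDI File."""
        with open(path, "rb") as f:
            chunk_id, length = struct.unpack(">4sI", f.read(8))
            if chunk_id != b"MThd" or length != 6:
                raise ValueError("not a Standard MIDI File")
            smf_type, num_tracks, division = struct.unpack(">HHH", f.read(6))
        return smf_type, num_tracks, division

    # Example use, assuming a file named song.mid exists:
    # smf_type, num_tracks, division = read_smf_header("song.mid")
    # print(smf_type, num_tracks, division)
    # When the top bit of division is 0, it is the number of ticks per quarter note.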

G. MIDI Timing Concepts

1. Relatively Real

The first concept that needs to be understood is the difference between absolute and relative time. MIDI Time Code and SMPTE Time Code are representations of absolute time in that they follow hours, minutes and seconds just like your watch. Absolute time is always the same and you cannot speed it up or slow it down. Relative time is a reference to a musical piece that has an inner tempo. A composition may take three minutes to perform at a tempo of 80 bpm (beats per minute), but would take only a minute and a half if the tempo was increased to 160 bpm. An advanced MIDI sequencer is able to work with both absolute time and relative time and make adjustments when there are changes in the relative time of a composition.

In the chart below the upper line represents absolute time and includes an example of 17 seconds of time. Both MIDI Time Code and SMPTE Time Code may be used to represent absolute time, which is fixed and may not be moved. Following the absolute time line are examples of repeated quarter notes at 3 different tempos (relative time). Notice that 2 measures have passed at 4 seconds with a tempo of 120 bpm, but it would take 8 seconds for the same two measures at a tempo of 60 bpm and 16 seconds for the same two measures at a tempo of 30 bpm.


2. MIDI Timing Clock

MIDI Timing Clock (MIDI Sync) is a status byte that is sent 24 times per quarter note for note resolution. Advanced sequencers will also subdivide each MIDI clock twenty times for a resolution of 480 times per quarter note. Standard - 24 PPQ. Professional - 480 PPQ. If the tempo increases, the MIDI Timing Clock messages will pass at a faster rate, but the number of status bytes per quarter note will stay the same.

A MIDI Sync status byte is a system real time message that is used in a MIDI sequence to keep all the different rhythm values locked to a common pulse. The image above is a simple sequence consisting of quarter notes in the 1st measure, half notes in the 2nd measure, a whole note in the 3rd measure, followed by eighth notes and sixteenth notes in the 4th and 5th measures, and finally a whole note in the last measure.

To the left is a graph depicting the same information in an event listing. Notice that the first number represents the measure, followed by the beat and finally the MIDI Sync timing clock resolution, also referred to as “ticks.” In the first three measures each note starts exactly on the beat, which corresponds to a timing clock reference of 0. Notice at the 4th measure that the timing clock has changed to values of 240. Remember that there are 480 ticks in each quarter note, so a value of 240 represents an eighth note. In measure 5, the value changes to 120, which now represents the sixteenth note.

If the tempo of this sequence was changed to a different number, the numbers and values on these two charts would not change. What would change is the location of the notes in respect to absolute time. The value of the new tempo would map out a new chart for placement along the MIDI Time Code.
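The relationship between ticks (relative time) and seconds (absolute time) is one formula: at a given tempo a quarter note lasts 60/bpm seconds, and each tick is that length divided by the resolution. A small Python sketch, using the 480 PPQ resolution and the tempos from the chart above:

    def ticks_to_seconds(ticks, bpm, ppq=480):
        """Convert a tick count (relative time) into seconds (absolute time)."""
        seconds_per_quarter = 60.0 / bpm
        return ticks * seconds_per_quarter / ppq

    # Two measures of 4/4 time = 8 quarter notes = 8 * 480 ticks.
    for bpm in (120, 60, 30):
        print(f"{bpm:3d} bpm -> {ticks_to_seconds(8 * 480, bpm):5.1f} seconds")
    # 120 bpm -> 4.0 s, 60 bpm -> 8.0 s, 30 bpm -> 16.0 s, matching the chart.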


3. MIDI Time Clock & SMPTE Time Code

Now we will look at what MIDI Time Code and SMPTE Time Code have in common (absolute time) and how it is used in conjunction with MIDI Timing Clock (relative time).

A Time Code is an electrical or digital signal that gives a common timing reference for syncing MIDI devices with other electronic devices. The evolution of timing references has changed dramatically since the beginning of MIDI, from the use of click track, PPQ Clocks, FSK Clocks, to the more sophisticated use of SMPTE Time Code and MIDI Time Code.

SMPTE Time Code (Society of Motion Picture and Television Engineers) was originally developed by NASA to sync computers together. SMPTE is a digitally encoded labeling system that stripes a tape with an exact timing reference. It is a 1200 Hz modulated square wave that uses biphase modulation to encode the signal on a tape. SMPTE is used to sync video with music, audio and sound effects. It may be used with a multi-track tape recorder and a synchronizer to sync a MIDI sequence with the tape or to sync two multi-track tape recorders. It is important to remember that a SMPTE Time Code signal is made up of digital words that contain 80 bits. These bits are used to represent the type of SMPTE signal, the actual location, which includes hours, minutes, seconds, and frames, as well as user bits for dates and reel numbers. This digital number is converted to an audio tone so that it may be recorded onto an analog tape.

There are two different ways to record SMPTE Time Code onto a tape. LTC (Longitudinal Time Code) is SMPTE Time Code encoded on one of the audio tracks or in between the audio tracks of a video tape. This type of time code needs to be running in order to be read, so a window burn is used on a working copy of a tape with LTC in order to read the code at slow speeds or when paused. VITC (Vertical Interval Time Code) is SMPTE Time Code encoded in the video signal in the interval between frames. This allows for the use of SMPTE at very slow speeds without the need of a window burn or the loss of an audio track.

There are four different SMPTE formats in the standard. Each format refers to the number of frames per second. In film there are 24 frames per second, which corresponds to the number of pictures per second. In European film and video the number of frames per second is 25. In the United States the original black and white TV programs ran at exactly 30 frames per second. This changed with the advent of color to run at 29.97 frames per second. In order to compensate for this discrepancy, there is a SMPTE format called "drop frame" at 30 frames per second. There are exactly 30 frames for every second, except that two frame numbers are dropped at the beginning of every minute, except at minutes 0, 10, 20, 30, 40 and 50. A total of 108 frame numbers are dropped each hour to keep the count in step with the actual 29.97 frames per second. The standard that is used for most applications is "30 drop frame," and listed below are some examples.
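The counting described above can be expressed as a short conversion routine. The Python sketch below turns an hours:minutes:seconds:frames address into a running frame count; the drop-frame branch follows the two-frames-per-minute rule just described and is an illustration, not a broadcast-grade implementation.

    def smpte_to_frames(hours, minutes, seconds, frames, fps=30, drop_frame=False):
        """Convert a SMPTE address to a total frame count at the given frame rate."""
        total = ((hours * 60 + minutes) * 60 + seconds) * fps + frames
        if drop_frame:
            # Skip two frame numbers per minute, except every tenth minute.
            total_minutes = hours * 60 + minutes
            total -= 2 * (total_minutes - total_minutes // 10)
        return total

    print(smpte_to_frames(1, 6, 12, 9))                    # non-drop, 30 fps
    print(smpte_to_frames(1, 6, 12, 9, drop_frame=True))   # with the drop-frame adjustment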


The number above represents a specific moment in time: 1 hour, 6 minutes, 12 seconds, 9 frames and 6 bits. There are 80 bits in every frame.

The second number above represents 23 hours, 59 minutes, 59 seconds, 29 frames, and 79 bits. Adding one more bit to this number would roll it over to 24 hours.

When MIDI was invented there was no specific data for synchronizing with SMPTE Time Code. If a musician was trying to use MIDI sequences to synchronize with a video tape, they needed a sync device. This was a very complicated way of synchronizing a MIDI sequence to a video tape, so in 1987 MIDI Time Code was added to the MIDI specification in order to have a direct link between MIDI and SMPTE Time Code.

MIDI Time Code is a digital conversion of the SMPTE Time Code that allows MIDI devices to lock to the SMPTE Time Code in real time.


III. Synthesizers

A. The Roots of Synthesis

The birth of the synthesizer came as musicians and electronics engineers pondered a practical way to create and control sound – specifically, to control the three elements of pitch, volume, and timbre. What arose from these ponderings was a collection of related electronic devices, each of which performed a specific task. Originally, each of these devices was a separate module, so synthesizers were called “modular.” These modules were provided with some degree of control, and could be connected in various ways, giving them a certain amount of flexibility.

Connections between modules were made with “patch cords,” and the results were called “patches” – terms borrowed from radio and telephone communications. The terminology has proven to be durable, for today, even though few synthesizers use patch cords anymore, the control settings for a synth sound are still called a patch. Since about 1970, more and more synthesizers have been manufactured as self-contained units rather than as collections of modules, trading flexibility for convenience. Nevertheless, the idea of separate, specialized functions remains.

Although digital technology has largely taken over the electronic music world, many of the methods for dealing with sound, and the terminology used to describe those methods, have roots in analog synthesis.

Analog synthesis remains the best place to begin learning about the use of electronics in music, for at least three reasons:

• It is important historically as the starting point of new musical technology.
• It divides synthesis into logical areas that are directly related to the three components of sound (oscillator = pitch, filter = timbre, amplifier = volume).
• Because of the first two reasons above, many digital synthesizers are patterned after the classic analog model. And analog synthesizers themselves have refused to die out completely.

B. Pitch = Frequency

Since sound consists of waves, the basic component in the modular synthesizer is a “wave-maker,” called an oscillator. To further define its operation, it bears the full name voltage-controlled oscillator, or VCO.

The abbreviation is just one of many spawned in the early years of synthesis; but the meaning is more important than the letters. Control of this module, and all others, is provided by changes in voltage – electricity. The waves that are produced by the oscillator are nothing more than variations in voltage. They are the electrical analogs – counterparts – of sound waves that travel through the air. Because the synthesizer produces and manipulates these electrical analogs directly, this kind of synthesis is called analog synthesis.


Some kind of controller, usually a keyboard, provides a control voltage to the oscillator, telling it what note to play. The control voltage sets the frequency of the waveform, which in turn determines the pitch. The earliest synthesizers could only play one note at a time. In the language of synthesis, they were monophonic. Polyphonic synthesizers, those capable of playing several notes at a time, were a later development. Most synthesizers now being made are polyphonic, though many have a monophonic mode, which is useful in certain circumstances and styles of music, especially dance genres.

Early polyphonic instruments offered only a few simultaneous voices. The first polyphonic synthesizer was capable of playing four notes simultaneously. More recently, models allowing 64 or even 128 simultaneous voices have become common.
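To make "voltage controls pitch" concrete, here is a small Python sketch using the common analog convention of one volt per octave. That convention is an assumption added for illustration; the text above does not specify a particular scaling.

    def cv_to_frequency(control_voltage, base_freq=55.0):
        """Map a control voltage to an oscillator frequency, assuming the
        one-volt-per-octave convention (each added volt doubles the pitch)."""
        return base_freq * (2 ** control_voltage)

    for volts in (0.0, 1.0, 2.0, 3.0):
        print(f"{volts:.1f} V -> {cv_to_frequency(volts):6.1f} Hz")
    # 0 V -> 55 Hz, 1 V -> 110 Hz, 2 V -> 220 Hz, 3 V -> 440 Hz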

C. Volume = Amplitude

With means to produce electrical waves and control pitch, the next component to consider is the one that controls volume. This module is the voltage-controlled amplifier, or VCA. The “voltage-controlled” part is as important here as in the oscillator. Most musical sounds do not maintain a uniform volume for their entire length; rather, different sounds have different contours, or envelopes. A few examples from the world of acoustic instruments illustrate this:

• An accordion note begins slowly because it takes time for the reeds to start vibrating. It builds to a certain level of volume, where it remains for as long as the key is held down and air is being squeezed through the bellows.
• A xylophone note starts quickly and fades away quickly.
• A piano note starts quickly and fades out gradually as long as the key is held down. When the key is released, the note ends quickly.

The synthesizer needs a means to produce different envelopes; the component that provides this means is the envelope generator. It allows the synthesizer player to establish the contour of the sound as a series of control voltages that regulate the voltage-controlled amplifier.

The most common envelope generator found in analog synthesizers divides the envelope into four different segments that can be controlled by the synthesizer programmer: Attack, Decay, Sustain, and Release. For this reason, this kind of envelope generator is usually called an ADSR.

The attack segment governs the time it takes for the level to rise from 0 to an initial peak. Since many envelopes of acoustic sounds decay somewhat after the initial peak, the envelope generator includes a decay segment, which also controls a length of time. The decay segment ends at the sustain level, which is the level at which the sound remains for as long as the key is held down. After the key is released, the level returns to 0 at a rate determined by the release segment.

The keyboard sends a signal to the envelope generator to start the envelope. This signal, called a trigger, is sent when a key is pressed. Another signal, called a gate, tells the envelope generator to keep running until the key is released. The gate really is the electrical counterpart of a gate in a fence: it stays open (on) while the key is down. When the key is released, the gate closes, signaling the envelope generator to begin the release segment of the envelope.
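An ADSR envelope can be modeled as a function of time since the key was pressed, plus the moment the key is released. In the Python sketch below the segment times and the sustain level are arbitrary example values, not settings taken from any particular instrument.

    def adsr_level(t, key_released_at, attack=0.05, decay=0.2, sustain=0.6, release=0.3):
        """Envelope level (0.0 to 1.0) at time t seconds after the key was pressed."""
        if t >= key_released_at:                      # release: fall back to 0
            held = adsr_level(key_released_at, float("inf"),
                              attack, decay, sustain, release)
            fall = (t - key_released_at) / release
            return max(0.0, held * (1.0 - fall))
        if t < attack:                                # attack: rise from 0 to the peak
            return t / attack
        if t < attack + decay:                        # decay: fall from the peak to sustain
            return 1.0 - (1.0 - sustain) * (t - attack) / decay
        return sustain                                # sustain: hold while the key is down

    # Key pressed at t = 0 and released at t = 1.0 second.
    for t in (0.0, 0.05, 0.15, 0.5, 1.05, 1.3):
        print(f"t = {t:4.2f} s   level = {adsr_level(t, key_released_at=1.0):.2f}")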

D. Timbre = Harmonics

A typical oscillator in an analog synthesizer offers a choice of several different waveforms for the production of different timbres. These waveforms are all simple shapes and thus easy to produce as patterns of changing voltage. Yet they offer a broad variety of tone colors. Modern digital synths often substitute more complex waveforms, like the ones generated by samples, for these basic waves, but the principle is the same.

Further control over timbre is given by the voltage-controlled filter, or VCF. Like filters in the physical world, it screens out some elements while letting others pass. What it screens out are the harmonics. Because of the central role played by the filter in the removal of harmonics, analog synthesis is also known as subtractive synthesis.

The filter can be controlled by an envelope generator, just as the amplifier can. In this way, the timbre of a synthesized sound can change over time as the envelope generator changes the extent to which the harmonics are filtered out. One example of this, so familiar that it has become a cliché, is the “wow” sound that is perhaps the sound most readily associated with a synthesizer.
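Removing harmonics can be sketched with a one-pole low-pass filter, a drastic simplification of a real voltage-controlled filter. In the Python example below the cutoff, sample rate, and the makeup of the "bright" tone are arbitrary values chosen only to show the idea of subtractive synthesis.

    import math

    def low_pass(samples, cutoff_hz, sample_rate=8000):
        """One-pole low-pass filter: partials above the cutoff are attenuated
        more than the fundamental, the essence of subtractive synthesis."""
        rc = 1.0 / (2 * math.pi * cutoff_hz)
        dt = 1.0 / sample_rate
        alpha = dt / (rc + dt)
        filtered, previous = [], 0.0
        for sample in samples:
            previous += alpha * (sample - previous)   # smooth out fast changes
            filtered.append(previous)
        return filtered

    # A bright tone: a 220 Hz fundamental plus strong upper harmonics.
    tone = [sum(math.sin(2 * math.pi * 220 * n * t / 8000) / n for n in range(1, 9))
            for t in range(64)]
    darker = low_pass(tone, cutoff_hz=400)   # most harmonics removed; mellower timbre
    print(f"peak before: {max(tone):.2f}   peak after: {max(darker):.2f}")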

E. Modulation

The typical synthesizer contains other components designed to provide control over the sound. The most important of these is the low-frequency oscillator, or LFO. As its name implies, it produces waves at slow speeds (low frequencies) – usually below the audio range. It is used to change the output of the other components (oscillator, amplifier, filter). This kind of change is called modulation. When the low-frequency oscillator is used to control the main audio-frequency oscillator, it results in vibrato, or frequency modulation, which is a repeating variation in pitch. This is similar to a violinist moving his or her left hand back and forth on a string while playing. Vibrato often makes a synthesized voice sound more natural.

When the low-frequency oscillator is used to control the amplifier, the result is tremolo, or amplitude modulation, which is a repeating variation in volume. The most familiar example of tremolo is the motorized “fans” on a vibraphone.


The low-frequency oscillator can also be used to control the filter, which is simply called filter modulation. This is similar to a harmonica player wiggling his hand across the back of his instrument, creating an alternation between a muted and unmuted sound.

The low-frequency oscillator has two basic controls: speed and depth. They correspond to the frequency and the amplitude of the low-frequency wave. Since the latter control is especially useful in performance to add and remove modulation, it is included on many keyboard synthesizers as a small wheel. It is placed at the left end of the keyboard and can be rotated by the thumb or fingers of the left hand while playing. It’s usually called the modulation wheel or mod wheel.

Other continuous controllers can be used as well. The most popular include channel and polyphonic aftertouch (where adding extra pressure to the key modifies the sound), foot control, and various sliders, strips, and knobs.
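Modulation is simply one slow wave steering a parameter of a faster one. The Python sketch below applies a 5 Hz LFO to the frequency of a 440 Hz oscillator to produce vibrato; the speed, depth, and sample rate are arbitrary example numbers, and routing the same LFO to amplitude instead would give tremolo.

    import math

    def vibrato_samples(duration=0.02, sample_rate=8000,
                        carrier_freq=440.0, lfo_speed=5.0, lfo_depth=6.0):
        """Generate samples of a sine oscillator whose frequency wobbles
        +/- lfo_depth Hz at lfo_speed Hz (a low-frequency oscillator)."""
        samples = []
        phase = 0.0
        for n in range(int(duration * sample_rate)):
            t = n / sample_rate
            lfo = math.sin(2 * math.pi * lfo_speed * t)    # the slow control wave
            freq = carrier_freq + lfo_depth * lfo          # modulated pitch
            phase += 2 * math.pi * freq / sample_rate      # advance the oscillator
            samples.append(math.sin(phase))
        return samples

    print([f"{s:+.3f}" for s in vibrato_samples()[:8]])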

F. Putting It All Together

Here is a block diagram of the main components of the analog-type synthesizer as it has been outlined in this chapter. This actually represents the minimum standard components of a synthesizer. Many synthesizers, for example, have the capability of combining two or more audio-frequency oscillators to produce a sound. But what is covered here is enough for a solid fundamental understanding of synthesis.

IV. Making Sound Come to Life

Before the advent of voltage-controlled synthesizers, an earlier generation of electric and electronic instruments, such as the Hammond organ and the Rhodes electric piano, had been developed with the intent of providing economical and, in some cases, portable alternatives to traditional instruments. The sounds of these instruments were, at best, approximations of their acoustic counterparts. In fact, their most prominent features were their idiosyncrasies and shortcomings: a noticeable click at the beginning of each note in the case of the Hammond, and a thick, bell-like timbre in the Rhodes. Then something odd happened. Over the course of years of being played in concerts and on recordings, the "Hammond sound" and the "Rhodes sound" (to remain with these two examples) became desirable. When the Hammond Organ Company changed the technology in its instruments and did away with the previously unavoidable "key click," many performers were outraged. Public demand forced the company to find a way to "put the click back" into later instruments.

There is a parallel between these earlier instruments and the synthesizer. Although the designers of the modular synthesizers of the 1960s didn't conceive of them for the purpose of imitating acoustic instruments, performers soon began seeking ways to make the synthesizer an imitative instrument. The sounds they developed, though they were poor imitations, became trademarks, so to speak, of the "synthesizer sound." And when newer, digital synthesizers were developed, they were judged on their ability to reproduce sounds such as "analog strings" and "synth brass" - as well as, not surprisingly, the Hammond organ and the Rhodes electric piano.

A. Controllers

Whether synthesizer designers and players agreed on imitative synthesis or not, both groups saw the need to make synthesizers expressive, in order to make them true musical instruments. This has already been seen in the discussion of the envelope generator. So the designers developed a number of different kinds of controllers to make this possible.

Since most synthesizers are played from a keyboard, it was natural that on many of them the keyboard was given a sensitivity to touch similar to that of the most familiar keyboard instrument: the piano. This kind of touch-sensitivity is called velocity sensitivity, for changes in sound are a result of the speed with which the key is pressed. Another kind of touch-sensitivity is after-touch, or pressure sensitivity. This allows the player to control the sound after the key has been pressed, while it is being held down.

Complete control over the pitch is essential to expressiveness on most acoustic instruments. A guitarist or a clarinetist shapes the pitch of a note every bit as carefully as he or she does the volume or the timbre. Consequently, nearly every keyboard synthesizer includes a controller to obtain pitches that the keyboard alone cannot. This is the pitch-bend control. It may take any number of physical forms, including a wheel (similar to the modulation wheel), a lever, a joystick, or a touch-sensitive ribbon. The pitch-bend control is usually near the modulation wheel at the left end of the keyboard; together, these are often referred to as the left-hand controllers.

Another means of controlling pitch is portamento, sometimes called glide. This is usually used in monophonic mode (when the synthesizer can only play one note at a time). It works this way: When the player connects the notes - pressing each successive key down before releasing the previous one - the pitch slides from the former note to the latter one. But when he or she plays the notes separately, they sound separately, with no slide between them. A control on the synth governs the speed of the portamento.

Wind controllers are another class of control devices. The simplest of these are small mouthpieces that connect by wire to the synthesizer and respond to breath pressure. Both hands are left free to play the keyboard and operate other controllers. More elaborate wind controllers are not so much supplements to the keyboard as alternatives to it. The most famous of these is the Lyricon, which looks like - and is played like - a soprano saxophone. There are also trumpet-like wind controllers. In all these cases, the synthesizer itself is separate from the controller.

The idea of separating the control of the synthesizer from the electronics is one that exerted a strong appeal to keyboard players in rock bands during the 1970s. Hidden behind increasingly large stacks of instruments, they envied the freedom of the guitarists to move about the stage and capture the attention of the audience.

Consequently, a few adventuresome players began using portable "strap-on" keyboard controllers. And it should come as little surprise that some of these players, such as Jan Hammer, became known for a style of playing that imitated the guitar, especially in its use of the pitch-bend control. (This serves to make an important point about imitative synthesis: Convincing imitation depends as much on duplicating the playing style of an instrument as it does on reproducing the timbre.)

Once synthesists began pretending they were guitarists, it wasn't long before guitarists wanted to be synthesists. Guitar controllers existed during the 1970s, but it wasn't until MIDI arrived in the 1980s that they became widely accepted. Then the same technology that allowed guitarists to control synthesizers also allowed all other instrumentalists, and even vocalists, to do the same.

Although keyboard-controlled synths dominate the electronic music scene, guitar, drum, and other controllers continue to carve an important niche. Guitar synthesizers fall into two basic but similar categories, both of which rely on polyphonic pickups (also known as hex pickups because they read all six strings) to translate a guitar string's vibration into electronic information. Proprietary systems take the signal generated by the hex pickup and use it to drive an internal sound engine. These generally offer excellent tracking (the ability to accurately trigger a synth sound from the note being played). MIDI systems take that same signal and convert it to MIDI note information, which is then fed to external devices such as synthesizers, samplers, and sequencers. Many of the proprietary devices offer some form of MIDI output, though you'll often notice a slight delay (lag) between the note being struck and the triggering of the MIDI device.

It seems that some players want to take advantage of every available part of the body in controlling the synthesizer. So foot controllers are also made, and they range from simple on-off pedals similar to the sustain pedal on a piano to full control panels for the feet - often used by guitar synthesists. There are also pedal keyboard controllers, similar to the bass pedals on organs.

Electronic drums have also come a long way in recent years. Drum pads are effective triggers for sampled sounds (triggers can even be mounted on conventional acoustic drums). Physical modeling technology has also emerged in electronic drumming as a means of bringing more expression to the drummer's performance.

B. Other Elements

In addition to the voltage-controlled oscillator, which produces simple waveforms, most analog synthesizers include a noise generator. This produces sound without any specific frequency, or, more precisely, sound in which all frequencies are present in equal amounts. It is useful in creating many special effects that would otherwise be impossible to produce. This noise, which sounds something like radio static, is also called "white noise." When it is filtered to emphasize or eliminate certain frequencies, the result is "colored noise."

Some synthesizers use technology to help play themselves. One component used for this purpose is the arpeggiator, which, when more than one note is being played at one time, causes the notes to sound one after the other (an arpeggio) rather than simultaneously.

C. Voices

There is sometimes confusion about the word "voice" as it is applied to synthesizers. Synthesizer designers prefer to use the word to describe the "polyphony" of an instrument: a 16-voice instrument is one that can play 16 notes at one time, for example. On the other hand, players often use it as a synonym for "patch" - a specific sound. To avoid confusion, it might be best to use "patch," or its equivalents, "preset" and "program," when talking about sounds.

Most synthesizers are able to play more than one patch at a time. These instruments are known as polytimbral, or multitimbral. The simplest of these play one patch in one part of the keyboard and another in the other part. These are known as split keyboards. On some, the player can determine where one patch stops and another starts; these instruments are said to have a programmable split point. Some keyboards can be split into more than two parts.

Just as a split keyboard plays different patches on different keys, a layered keyboard plays different patches on the same keys, at the same time. This allows the synthesist to play more than one "instrument" at the same time. Often, two versions of a single patch are layered, with one of them detuned - tuned slightly higher or lower than normal; the result is a fuller sound. The most powerful aspect of a multitimbral instrument is that it can trigger different voices across a number of MIDI channels. For example, you could have one channel play a piano sound, another a drum sound, and another a bass sound, all at the same time. This feature is commonly used in sequencing.

D. Microprocessor Control

As mentioned before, the earliest analog synthesizers were modular and produced different sounds by means of patch cords and the setting of numerous switches and dials. Such instruments are still being made and used, but in increasingly smaller numbers. The primary drawbacks to them are that it takes time to change from one sound to another and it is not always possible to produce the same sound twice in a row, since analog controls incorporate a degree of uncertainty.

These drawbacks were overcome for the first time in the late 1970s, when microprocessors were introduced to the synthesizer world. Microprocessors made synthesizers programmable: capable of making changes to patch settings easily, then storing those settings so that the entire patch could be recalled at the touch of a button. Today, when the synthesist makes a change to a patch, he or she is said to be editing it, or programming the synthesizer. Some people refer to the microprocessor as the "CPU" ("Central Processing Unit"); others call it a "computer brain." However, the microprocessor by itself is not quite a complete computer. A program, which is simply a set of instructions, is necessary to tell the microprocessor what operations to perform. This program is stored in memory - usually a kind of memory that can't be changed, so the program can't accidentally be erased. This kind of memory is known as ROM. That acronym stands for "Read-Only Memory," meaning the microprocessor can only "read" instructions from it; it can't "write" any instructions into it.

Patches in a programmable synthesizer are stored in a kind of memory that can be changed. This kind of memory is properly called "read-write memory," because the microprocessor can both "read" from it and "write" to it. It is more popularly known as RAM ("Random-Access Memory") because RAM and ROM sound good together. RAM requires electrical power to retain what is stored in it, so programmable synthesizers are equipped with a battery that serves as a backup power supply when the synthesizer itself is turned off.

Many synthesizers allow patches to be stored outside the instrument. This storage allows the player to assemble a "library" of sounds, more than could possibly fit in his or her instrument at one time. Depending on the instrument, storage may be on floppy disk, removable hard disk, CD-ROM, or a RAM cartridge that can be plugged into the instrument. Instruments that use RAM cartridges usually also have ROM cartridges available, containing patches that have been developed for commercial sale. Patches can also be stored on a computer via MIDI.

Because microprocessors deal with numbers, the adjective digital is frequently used when speaking of the things they do. The digital world - the world inside the computer - is a world of discrete divisions and separate steps. A digital clock, which contains a simple microprocessor, displays the time as definite numbers. In contrast, the analog world - the world around us - is a world of continuous change. An analog clock - the familiar one with the circular face and the sweeping second hand - shows not only the hours, minutes, and seconds, but all of the infinite times in between them. An analog synthesizer typically uses continuous controls (called "potentiometers," or "pots") to provide settings for the oscillators, filter, and so on. When programmability (microprocessor control) is added to such an instrument, the settings must be converted into numbers for storage in RAM. The result is a loss of the subtlety of continuous control but a gain in precision and the ability to reproduce a patch exactly.

V. Digital Audio

Digital audio is not just recording. Digital audio plays a major, even dominant role in signal processing, broadcasting, and communications of all kinds. But the principles are the same, regardless of what it is used for.

Sound travels through the air as minute, rapid vibrations in pressure, referred to as waveforms. The faster the vibrations, the higher the pitch. The bigger the vibrations, the louder the sound. The variations are cyclical in nature, and if you were able to see them as they pass by a single point, they’d look like a continuously changing wave.

Digital audio is a method of representing the continuous waveform of sound as a series of discrete numbers. In some ways it’s similar in principle to movies or television: in those media, moving images are broken up into individual still pictures, or “frames.” When viewed at the proper speed, somewhere from 24 to 30 frames per second depending on the medium, our brains fuse those images into one continuously changing image.

Digital audio uses “snapshots” of sound, called samples, to represent the waveform. But it’s more complex than film, because of the nature of the way we hear. Sounds are “analog” in nature, continuously varying, with no discrete “jumps” from one pressure level to another, and that’s what we expect to hear. Ears can’t be fooled the way the eyes can: a sound that “flickers” won’t sound right, no matter how fast the flicker is.

FIGURE 1

This means that digital audio isn’t a complete medium for delivering information to our ears: it is an “intermediate” medium, and some form of conversion to or from analog must be done at each end of the chain. A microphone creates analog signals, by turning the variations in air pressure into continuously varying electrical signals. In a digital audio system, this “analog” of the real sound gets converted into digital data. At the listening end, the digital data is converted back to an analog voltage, which can then be sent to an ordinary amplifier and speaker, so we can hear it.

FIGURE 2

The process of converting analog audio to digital numbers (A-to-D conversion) involves measuring the instantaneous volume (amplitude) level of the sound many times each second, and recording that level as a number. This process is called sampling, and each number that is created in the process is called a sample, or a “word.”

The fidelity of the conversion process is determined by two major factors: the range – the maximum and minimum values – of the numbers available for a given sample; and the speed at which the samples are taken.

A. The Range of Values: Word Length

When it comes to representing the volume of a sound, analog audio has infinite resolution – that is, there are an infinite number of values it can take. In other words, the depth of the values for volume is infinite and cannot be pinned down to a specific resolution. Whenever you take a sample of amplitude, therefore, you are making an approximation, or “rounding off” to the nearest value. How close that approximation is to the original analog volume depends on the range of numbers you have available to you in the digital realm. If you can only use the numbers 1, 2, 3, and 4, your approximation is most likely going to be very crude. If your available numbers range from 1 to 65,536, your approximation can be quite good.

The number of values a converter can represent is calculated by raising 2 to the power of the number of bits in the system:

an 8-bit system can have 2^8, or 256, values
a 12-bit system can have 2^12, or 4,096
a 16-bit system can have 2^16, or 65,536
a 24-bit system can have 2^24, or 16,777,216
a 32-bit system can have 2^32, or 4,294,967,296
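
Those figures can be checked with a couple of lines of Python (the print formatting is mine, for illustration):

    # Number of distinct values available at each common bit depth: 2 ** bits.
    for bits in (8, 12, 16, 24, 32):
        print(f"{bits:>2}-bit system: {2 ** bits:,} possible values")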

1. Signal to noise ratio

The difference between the approximation you make of the value of a signal and the actual value of that signal before you converted it is “quantization error.” We hear it as noise. The greater the range of values you have available for samples, the lower the quantization error, and therefore the less noise the system has – the lower its “noise floor.” The difference between the maximum level that can be sampled and the noise produced by quantization error is the signal-to-noise ratio of the analog-to-digital converter.

FIGURE 3

Digital samples are expressed in binary code – strings of ones and zeros – because that’s the way electronic switches and computers work. Each digit in a binary word is known as a “bit.” As it happens, there is a very neat formula that links the number of bits a converter uses to create a digital sample with the maximum theoretical signal-to-noise ratio (S/N) of the converter.

S/N ratio in dB = 1.76 + (number of bits x 6.02)

This can be (and usually is) approximated into:

S/N ratio in dB = 2 + (number of bits x 6)

Therefore, a converter that uses digital words that are 8 bits long has a potential signal-to-noise ratio of about 50 dB, which is about what an AM radio is capable of; a 12-bit converter has a ratio of 74 dB, or about that of a good cassette deck or FM broadcast; and a 16-bit converter has a ratio of 98 dB, which is about the dynamic range of a symphony orchestra.
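
Here is the same arithmetic as a short Python sketch, using the exact formula given above (the helper name and the rounding to whole decibels are mine):

    # Maximum theoretical signal-to-noise ratio for a given word length,
    # using the formula S/N (dB) = 1.76 + 6.02 * bits.
    def max_snr_db(bits):
        return 1.76 + 6.02 * bits

    for bits in (8, 12, 16, 24):
        print(f"{bits}-bit converter: about {max_snr_db(bits):.0f} dB")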

The industry standard for recording digital audio is 16 bits: that’s what CDs use. But that standard is quickly changing. DVD and DVD-Audio use 24 bits as standard.

B. The Speed: Sampling Rate

The other major factor in determining the fidelity of an analog-to-digital converter is how quickly the samples are taken. If a waveform is to be sampled accurately, at least two samples must be taken during each cycle of the waveform. If we turn that around, we see that the sampling rate of a system must be at least twice as high as the highest frequency of the sound that is being sampled. If this rule is not observed, “foldover” or “aliasing” occurs, which consists of unwanted frequencies showing up in the digital signal.

Here’s an example of how aliasing occurs. If we take a converter with a sampling rate of 20 kHz and feed it an analog signal at 15 kHz, the samples that are taken don’t represent different parts of the same cycle; they represent different parts of different cycles. The converter doesn’t know this, however, and assumes the samples are from within the same cycle. The frequency of the signal the converter thinks it’s looking at is equal to the difference between the sampling frequency and the input signal (20 kHz minus 15 kHz), or 5 kHz, and this 5 kHz “alias” is dutifully passed through the recording chain and eventually to our ears. Obviously, this situation is something to avoid.
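
The example can be verified with a few lines of Python: sampling a 15 kHz cosine at 20 kHz produces exactly the same string of numbers as sampling a 5 kHz cosine, which is why the converter cannot tell them apart. (A cosine is used here simply to keep the comparison tidy; the function names are mine.)

    import math

    SAMPLE_RATE = 20_000   # 20 kHz converter, as in the example above

    def sample_cosine(freq_hz, num_samples):
        """Sample a cosine wave of the given frequency at SAMPLE_RATE."""
        return [math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE)
                for n in range(num_samples)]

    original = sample_cosine(15_000, 10)   # the real 15 kHz input
    alias    = sample_cosine(5_000, 10)    # the 5 kHz "alias"

    # The two sets of samples are identical (to floating-point precision).
    print(all(abs(a - b) < 1e-9 for a, b in zip(original, alias)))   # True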

Much professional audio production is done at 48 kHz. However, DVD-audio presents a possibility of 96 kHz sampling. Compact Discs are recorded at 44.1 kHz.

SO:

A 16-bit system, recording at 44.1 kHz:

Takes 44,100 samples per second, or “snapshots” of the audio source, with each sample being 16 bits long, and assigns each sample one of 65,536 possible dynamic values (expressed in binary).

SO:

Each second of digital audio takes up 705,600 bits, or 88,200 bytes, or about 86 KB, or .08 MB.

SO:

60 seconds of audio takes up roughly 5 MB of space (per channel). Stereo digital audio takes about 10 MB of space per minute. Therefore, a standard CD holds 750 MB, or 75 minutes of stereo audio – actually, the limit is 74 minutes, because there are other files (directories, etc.) that must go on the disc.

If you are recording 8 tracks of audio, you will use 40 MB of disc space for every minute you record. That’s 5 MB per track/channel. If you record multiple takes, that would be 5 MB per take per track/channel.
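
The arithmetic above can be collected into a small Python sketch (the helper name and the use of 1,024-based megabytes are mine, for illustration):

    # Uncompressed digital audio storage: sample rate x bytes per sample x channels.
    def audio_megabytes(seconds, sample_rate=44_100, bits=16, channels=1):
        bytes_total = sample_rate * (bits // 8) * channels * seconds
        return bytes_total / (1024 * 1024)

    print(f"1 minute, mono:     {audio_megabytes(60):.1f} MB")               # ~5 MB
    print(f"1 minute, stereo:   {audio_megabytes(60, channels=2):.1f} MB")   # ~10 MB
    print(f"1 minute, 8 tracks: {audio_megabytes(60, channels=8):.1f} MB")   # ~40 MB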

Digital audio is very memory intensive. It is important to have ample hard disc space available or a dedicated drive to record on.

C. Converting it Back

The reverse process – creating analog signals out of digital ones so we can hear them – is called, not surprisingly, “D-to-A conversion.” The numbers are fed into a device which generates a voltage whose level corresponds to the value of the sample. This results in a waveform that looks like a staircase. This waveform is smoothed by a low-pass filter (analog or digital), which takes out all the high harmonics above 20 kHz, and turns the steps into straight lines or smooth curves. This signal is now a nearly perfect reproduction of the original input signal, and it can be sent to an amplifier and speakers, and then to your ears.
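
A very rough Python sketch of the idea follows. It is not a real reconstruction filter: the "hold" step stands in for the D-to-A stage that produces the staircase, and a simple moving average stands in for the low-pass smoothing.

    # Crude illustration of D-to-A conversion: hold each sample value for a
    # fixed duration (the "staircase"), then smooth it to round off the steps.

    def staircase(samples, hold=4):
        """Repeat each sample 'hold' times, like a zero-order-hold D-to-A stage."""
        return [value for value in samples for _ in range(hold)]

    def smooth(signal, width=4):
        """Very simple moving-average 'filter' standing in for the low-pass stage."""
        out = []
        for i in range(len(signal)):
            window = signal[max(0, i - width + 1): i + 1]
            out.append(sum(window) / len(window))
        return out

    steps = staircase([0.0, 0.5, 1.0, 0.5, 0.0])
    print(smooth(steps))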

1. Buffering

In analog tape recording, the speed at which the tape moves controls the speed – and pitch – at which the sound plays back. Minute speed changes caused by mechanical imperfections in the transport, or by stretching of the tape, result in small changes in pitch, which we hear as wow and flutter.

Digital audio recording does things differently. Digital audio signals must have a master clock controlling their speed, so that the beginning of each sample word occurs at exactly the right time. If it doesn’t, bits will get scrambled, producing unpredictable – but predictably nasty – results. This master clock runs at a speed equal to the sampling rate (or some multiple of it), and is generated by an ultra-stable crystal.

But a mechanical transport, like a cassette or reel-to-reel tape deck, cannot be counted on to run at an absolutely steady speed all the time. To compensate for this, digital data coming off of the tape is put into a “buffer” or storage area before it is converted to analog. The buffer may have more or less data coming into it, depending on whether the tape is fast or slow, but its output will always be at exactly the right rate, because it is controlled by the sampling-rate crystal.
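
A toy version of that buffering idea in Python (the chunk sizes are arbitrary; real hardware does this with dedicated memory and a crystal-locked clock):

    from collections import deque

    # Data comes off the tape in irregular chunks, but playback pulls samples
    # out of the buffer at a perfectly steady rate set by the sample clock.
    buffer = deque()

    incoming_chunks = [[1, 2, 3], [4], [5, 6, 7, 8], [9, 10]]   # uneven "tape" reads

    for chunk in incoming_chunks:
        buffer.extend(chunk)          # irregular arrivals fill the buffer
        while len(buffer) >= 2:       # steady clock drains 2 samples per "tick"
            frame = [buffer.popleft(), buffer.popleft()]
            print("play", frame)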

D. Two Basic Rules of Digital Recording

Rule 1: The limit on the loudest signal you can record is absolute.

If you push an A-to-D converter with a signal louder than it can accept, it will literally chop off the peaks of the waveform, and the result will be distortion of a particularly ugly sort. This is very different from analog recording, in which distortion increases gradually as levels increase, and the distortion is not always unpleasant – many engineers think it adds “warmth” or “fatness” to the sound, and often use it deliberately.
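
The difference can be sketched in a few lines of Python: digital conversion simply chops the waveform at the maximum value, while analog saturation bends it gradually (approximated here with a tanh curve, a common stand-in, not a model of any particular tape machine):

    import math

    def digital_clip(x, ceiling=1.0):
        """Hard clipping: anything past the ceiling is flattened outright."""
        return max(-ceiling, min(ceiling, x))

    def analog_saturate(x):
        """Soft saturation: the curve bends gradually as the level rises."""
        return math.tanh(x)

    for level in (0.5, 1.0, 1.5, 2.0):
        print(level, round(digital_clip(level), 3), round(analog_saturate(level), 3))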

Rule 2: The limit on the softest signal you can record is absolute.

Analog tape has an inherent noise floor, which is the hiss level of the tape, but you can hear signals that are actually softer than the noise. Listen to a long fade-out on a master tape: the signal keeps going for a while even after you can hear the tape hiss clearly. But the bottom of a digital dynamic range is not “transparent:” if the signal level is below the quantization level for the least significant bit, you won’t hear it, period.

E. What Exactly is a Decibel?

To understand how and why decibels are used in measuring sound, we need to revisit intensity.

1. Intensity of Sound Waves

A source which vibrates the air is really transferring energy into that medium. This energy propagates through the air in the form of sound. Put another way, the sound source radiates acoustical energy.

Musical instruments radiate this acoustical energy and the rate at which an instrument radiates that energy is that instrument’s “power output.” Our ears are sensitive to this acoustic power. The measurement for this acoustic power is the acoustic watt.

The human ear is quite remarkable. It has the unique ability to detect very quiet and very loud sounds. If a sound source had the ability to generate one acoustic watt of power, it would be perceived as being very loud. In fact, it would hurt our ears. This is known as the “threshold of pain.”

Remarkably, if a sound source produced a sound of only one trillionth of one acoustic watt (.000000000001 acoustic watts), we would still be able to hear it. This sound would be very soft and is called the “threshold of sound.” Therefore, the loudest sound we can tolerate is one trillion times more powerful than the slightest sound we can hear.

Because of this wide range of hearing, using the acoustic watt as a unit of measurement would be difficult because of the large numbers involved. So, instead of using acoustic power and acoustic watts, we will use Sound Pressure Level and measure it in Decibels.

Another important fact about hearing is that it is non-linear. That is, doubling the acoustic power of an instrument will not double the loudness. The decibel scale of measurement takes the non-linear nature of hearing into account, so it is a more descriptive measure of how we actually hear.

Decibels (abbreviated dB) are based on ratios and logarithms. Don’t let these terms frighten you. Logarithms are simply a way to reduce large ranges of numbers. We said that the range of human hearing is from .000000000001 acoustic watts to 1 acoustic watt. Using decibels, we can replace those numbers with zero decibels as the slightest sound we can hear up to 120 dB for the loudest sound. Compare the two scales in FIGURE 4.

FIGURE 4
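
As a check on the two scales, here is a short Python sketch using the standard decibel formula for acoustic power, dB = 10 x log10(P / Pref), with the slightest audible sound (one trillionth of a watt) as the reference. The formula is the standard one; the helper name is mine.

    import math

    THRESHOLD_WATTS = 1e-12   # the slightest sound we can hear (the 0 dB reference)

    def acoustic_power_to_db(watts):
        """Convert acoustic power to decibels, referenced to the quietest audible sound."""
        return 10 * math.log10(watts / THRESHOLD_WATTS)

    print(round(acoustic_power_to_db(1e-12)))   # 0   -> the slightest audible sound
    print(round(acoustic_power_to_db(1.0)))     # 120 -> the threshold of pain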

The decibel system can be used to measure practically anything because it is based on ratios. It is always comparing one level to another. We cannot say a car’s top speed is 50 decibels, but we can say that Car A is 10 dB faster than Car B. (Zero dB has to be referenced to something, such as MPH.)

The most common uses of decibels are comparing sound pressure levels, power levels, and voltage levels. It’s important to remember that there has to be some comparative number. For example, zero dB SPL is the slightest sound that we can hear. To say a concert is 90 dB means that the sound pressure level at the concert is 90 dB greater than the slightest sound we can hear, or 30 dB less than pain.

The important points about decibels involve the relationships between power and sound pressure level. The following table shows the relationship for various values.

Change of dB    Change of Power
0               1
1               1.3
2               1.6
3               2
4               2.5
5               3.2
6               4
7               5
8               6
9               8
10              10
11              12
15              32
18              64
20              100
30              1,000
40              10,000
50              100,000
60              1,000,000
70              10,000,000
80              100,000,000
90              1,000,000,000
100             10,000,000,000
110             100,000,000,000
120             1,000,000,000,000

This chart shows how much more power is required to attain a particular dB increase. For example, if you want to increase SPL (sound pressure level) by 10 dB, you need to increase the power by ten times the existing amount. Let’s say you were using a 100-watt amplifier which produced an SPL of 95 dB. In order to get to 105 dB SPL, you would need a 1000-watt amplifier (10 times 100 watts).
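
The chart and the amplifier example both follow the standard relationship between decibels and power ratio, where the power ratio equals 10 raised to the dB change divided by 10. A brief Python sketch, with illustrative function names:

    import math

    def power_ratio(db_change):
        """How many times more power a given decibel increase requires."""
        return 10 ** (db_change / 10)

    def db_change(power_before, power_after):
        """Decibel change produced by going from one power level to another."""
        return 10 * math.log10(power_after / power_before)

    print(power_ratio(10))          # 10.0 -> ten times the power for +10 dB
    print(power_ratio(3))           # ~2.0 -> doubling the power gives about +3 dB
    print(db_change(100, 1000))     # 10.0 -> 100 watts to 1000 watts is a 10 dB increase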

Decibels are used for describing sound pressure levels because of their similarity to how humans hear. There are three rules to remember:

1. A one-decibel change in sound pressure is impossible for most humans to detect.

2. The average human hears loudness differences at 3 dB increments. This means that in order to generate a slight change in perceived loudness, power must be doubled.

3. In order to double the perceived loudness, sound pressure level must be increased by 10 dB, which requires 10 times the power.
