The Measurement of *

S.S. STEVENS Psycho-AcousticLaboratory, Harvard University,Cambrfigge, Massachusetts (ReceivedJune 17, 1955)

This paperreviews the availableevidence (published and unpublished)on the relationbetween loudness and stimulusintensity. The e.vidence suggests that for the typical listenerthe loudnessL of a 1000-cycle tone can be approximatedby a powerfunction of intensity I, of which the exponentis log•02.The equation is: L--kI ø.a.Intensity here is assumedto be proportionalto the squareof the soundpressure. In terms of sones,where 1 soneis the loudnessproduced by a tone at 40 db above the standardreference level, the equationfor loudnessL as a function of the number of decibelsN becomes'logL=0.03N--1.2. Otherwisesaid, a loudnessratio of 2:1 is producedby a pair of stimuli that differ by 10 db, and this re- lation appearsto hold over the entire range of audibleintensities. At low levelsof intensity,the loudnessof white noisegrows more rapidly than the loudnessof a 1000-cycle tone, but abovethe level of approximately50 db the two loudnessesremain more nearly proportional.The suggestionis madethat for all levelsgreater than 50 db the loudnessof continuousnoises may be calculated from the equation:logL=O.O3N-t--S, where $ is a spectrumparameter to be determinedempirically.

HE purposeof this reviewis to examinethe the nature of the central issue. What we want to know availabledata on the measurementof subjective is how loud variousstimuli soundto people.By this we loudness.It is hoped that by assemblingthe relevant mean, perforce,what do people say when they try to informationin one place we may be able to reach a describeloudness in quantitative terms?In askingthis reasonableconclusion concerning the relation between questionwe are not at the outset trying to solve the loudnessand intensity.The variousresults obtained by problem of how the ear works or how the nervous workersin this fieldmake it plain that the scalerelating systemperforms its integrations.We are not trying to loudnessto intensity is not somethingthat can be countnerve impulses nor provea theory.We are merely determinedwith high precision,but these efforts also looking for the empirical answer to a very empirical make it plain that peopleare able to make quantitative question'How do peopledescribe when we ask estimates of loudness and that it is not unreasonable to them to usea numericallanguage instead of adjectives? try to determinea loudnessscale that will be representa- Our interestin askingthis questionmay be academic tive of the typical listener. or it may be practical.The academicside of the issue Despite its many pitfalls, I think we can probably hasa long historyfull of side-takingand polemics.• The make senseof this problemprovided we are sufficiently practicalside of the problemhad its originin acoustical modestin our demands.Not only mustwe renouncethe engineering.Not long after they had developedthe hopefor highprecision, but alsowe mustkeep in mind conventionaldecibel scale for measuringsound inten- sity, the engineersnoted that equalsteps on the * This work was carried out under Contract N5ori-76 between Harvard Universityand the Officeof Naval Research,U.S. Navy scaledo not soundlike equalsteps, and that a level of (ProjectNR142-201, Report PNR-168). Reproductionfor any purposeof the U.S. Governmentis permitted. • E.G. Boring,Am. J. Psychol.32, 449-471 (1921). 815 Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 816 S. S. STEVENS

50 db does not like half of 100 db. Since the periment.Thus bisectionmight providea test of the engineeroften faces the problemof communicatingwith generalityof a ratio scale,but by itselfbisection cannot a customer, it was soon realized that there was a need generate a ratio scale. for a scale whose numbers would make more sense to In the tabulationsthat follow only the resultsob- the customer than do the numbers on the decibel scale. tainedby magnitudeestimation and ratio determination The generationof a loudnessscale is in principle will be considered. quite simple.All we need to do is producean array of Magnitudeestimation means the following:A stand- soundsand ask a groupof listenersto assignnumbers to ard tone is presentedand an arbitrary number is as- thesesounds in sucha way that the numbersreflect the signedto its loudness,e.g., 1 or 100.Then a comparison perceivedloudness of the sounds.In practice,of course, tone is presented,and the subjectdecides what number it turns out that many alternative techniquesare pos- he thinks shouldbe assignedto the loudnessof the com- sible, and that subtle differencesin experimentalpro- parison stimulus.Under another versionof the method cedure sometimesinfluence what the listener says or of magnitude estimation, the standard is omitted does. When the listener tries to tell us about the relative entirely.The subjecthears a seriesof intensitiespre- loudnessof two sounds,he is subjectto a host of poten- sented in random order, and to these intensities he tially biasing factors. Some of these factors are built assignsnumbers proportioned to theirapparent loudness. into the listener;someare suppliedby the experimenter. Ratio determinationmeans those procedures that aim Someare easyto discover;others are as elusiveas foxes to discoverwhat intensityproduces a loudnessbearing in a forest. The parametersthat affect experimentson a prescribedratio to a givenloudness. The ratio may be loudnessare numerousand sometimesso conflicting that a fractionor a multiple of the standard. it may be impossibleunder any one procedureto opti- We can further divide the method of ratio determina- mize all of them. Consequently,the definitive experi- tion into two classes'(1) The subjectis allowedto ment in this field, if we may hopefor such,will probably adjust the intensity to produce the desired ratio needto involvea multipleattack of the sort that might (methodof adjustment).(2) The experimentersets the balance out the various sources of distortion and bias. intensityand asksthe subjectto say whetherit is too Nevertheless, under a wide variety of conditions highor too low (methodof constantstimuli). Of course, rather similar results continue to be obtained. Provided from a seriesof magnitudeestimations a ratio deter- we are willing to be content with modestprecision, we minationcan be made by a processof interpolation. can predict pretty well what the averagelistener will Consequently,it is possibleto combinethe resultsof say about loudnessin most ordinary situations.It is experimentsthat usethese two differentprocedures. my ownbelief that this predictabilityis sufficientlyhigh Each of these methodshas its advantagesand its to justify the standardizationof a loudnessscale to disadvantages.And, of course,each has many sub- representfor the "standardobserver" the relation be- varieties, someof which are better or worsethan others. tween loudnessand intensity. Merits of the Methods METHODS Let me try to list someof the assetsand liabilities of Three principal methodshave been used to obtain these methods as they seem to be disclosedin the direct estimates of the relation between loudness and studiesof other experimentersand in our work at the intensity:bisection (or equisection),ratio determination Psycho-AcousticLaboratory. (includingfractionation), and direct magnitudeestima- tion. Methods that involve the comparisonof one-rs- Methodof MagnitudeEstimation (ME) two tones and one-rs-two ears are not included here This is the mostdirect and, in someways, the most becausethey rest on assumptionsthat can be tested efficientmethod. Each presentationof the stimulusis only by the direct comparisontechniques that are re- rated numerically,and no information need be thrown quiredto establisha loudnessscale in the first place. The methodof bisection,involving the determination away. Like all procedures,the method of magnitude estimation is susceptibleto various biases, some of of the loudnessthat lies midway between two given whichcan probably be avoidedor counterbalanced.One ,has its limitations, owing principally to the biasarises from the fact that the subject'sestimates are fact that it leadsonly to an interval scale2 (analogous to the temperaturescales, Fahrenheit and Celsius).Its influencedby the order in which the stimuli are pre- sented. Since the subject usually tries to be self-con- resultsdo not allowus to setup a ratio scale(analogous sistent,what he saysabout a givencomparison stimulus to scalesof weight and length). Ratio scalescan, in dependsto someextent on what he has said about the principle,be determinedby the methodsof fractionation precedingones. This particular bias doesnot affect the and magnitudeestimation, and suchratio scalescan be first judgment he makes, however, and it is therefore usedto predictwhat oughtto happenin a bisectionex- instructiveto comparethese first judgmentswith the • S.S. Stevens, editor, Handbookof ExperimentalPsychology later ones.Actually, thesefirst judgmentsare usually (John Wiley and Sons,Inc., New York, 1951),Chap. 1. consistent with the later ones.

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 MEASUREMENT OF LOUDNESS 817

Another bias arisesfrom the fact that, although the potentiometerso that the positionof its knob is approxi- subject tries to judge the comparisonrelative to the matelyproportional to loudness.We find that the results standard,he may be slightlyinfluenced by the absolute obtainedwith sucha "sonepotentiometer" are different level of the comparisontone. He may overestimatethe from, and presumablysuperior to, the resultsobtained relative loudnessof high intensitiesand underestimate with decibel attenuators? the relative loudness of low intensities. Since in estimat- The designof a sonepotentiometer can be achieved, ing subjectivemagnitudes the subjectis completelyun- of course,only after we have somenotion of the form constrainedin the choiceof the numbershe assigns,it of the loudnessfunction. In principlewe might proceed turnsout that an occasionalsubject may giveestimates by successiveapproximations. Beginning with an arbi- that are far out of line with thoseof the group.When trary potentiometer(e.g., logarithmic)we woulddeter- this happens,the distributionof the estimatesis con- mine a first approximationto the loudnessfunction, siderablyskewed. And, sincethe arithmetic mean is not which we would then use to modify the characteristic a good measureto use with skeweddistributions, it is of the potentiometer.With this modifiedpotentiometer usually advisable to compute medians rather than we would redeterminethe loudnessfunction, and again means. modify the potentiometer,and so on. In practice,how- ever, the bias due to a decibelattenuator can probably Methodof ConstantStimuli (CS) be effectivelyeliminated by substitutinga potentiome- This method differs from magnitude estimation in ter whoseangular turn is roughly proportionalto the loudness function first determined with the decibel that the subject merely indicateswhether the com- attenuator. parisonstimulus is greaterthan, lessthan, or equal to, The intensityof the comparisontone when the subject some criterion, such as "half as loud." Some experi- first hears it at the beginningof his adjustment also mentershave changedthe comparisonin the direction indicatedby the subject'sresponse and recordedonly seemsto make a difference.This biascan presumablybe the value the subjectsaid was equal to the criterion. avoided if the experimenter sets the comparisonat random about the criterion point. Othershave limited the subject'sresponses to "greater" Whatever method is used,it probably makesa lot of and "less"only, and presentedcomparisons at random until sufficientjudgments have beenmade to determine differencehow the subjectsare treated. Their attention a "psychometricfunction" from which the "equal" flagsquickly, and whatever biasingforces are at work point couldbe found. In this more rigorousform, the can producemore formidable distortions when the sub- ject's attention has wandered.Judging loudnessis at methodof constantstimuli is relativelytime-consuming and laborious. best a delicate, difficult business. Another sourceof bias arisesfrom the commonprac- The principalsource of biasin the methodof constant stimuliseems to stemfrom the fact that the range,level, tice of combiningresults by averagingdecibels. This would be a proper averageif loudnesswere linearly order, and spacing of the comparisonstimuli may related to ,which it is not. Sinceit is loudnesswe influencethe subject'sjudgments. There is also an order effect called the "time error" want to average,it might be better, beforewe average, to turn all the decibel values into loudness values that seemsto operate when stimuli are presentedin succession.This effect is evidencedby the fact that (sones)by meansof a reasonablyapproximate loudness scale.Then the averagedsones may be convertedback when two successivestimuli are physicallyequal the to decibels via the same loudness scale. 4 When we are secondtends to be judged louder than the first. The concernedwith the decibelreduction needed to produce magnitudeof the effectis usuallynot large,and its role half loudness,the differencebetween averaging decibels in loudnessratio judgmentsis not easy to determine. It couldmean that whena comparisontone followinga and averagingsones may amount to a decibelor more, dependingon the originalvariability. Evidenceof this standardis set to half loudnessthe setting is slightly fact is seen in Table I where the median decibel reduc- biasedin the directionof beingtoo low. tionsreported by Geigerand Firestone5 are smallerthan Methodof Adjustment(adj) the decibel averages.The median is a less efficient statisticthan the mean,but in experimentson fractiona- This method puts the task of finding the criterion tion the median of the decibel values usually comes levelunder the:control of thesubject. He turnsa dialto closer to what we would get if we averaged sones adjust the comparisontone. A potentialsource of bias instead of decibels. in the methodof adjustmentderives from the relation As a matter of fact, experimentersseem generally betweenloudness and the angular turning of the dial. to have paid too little attention to the problem of Most experimentershave usedlogarithmic attenuators on whichthe scale(decibels) is very nonlinearlyrelated a S.S. Stevensand E. C. Poulton, "The estimation of loudness to loudness.The useof thisnonlinear control apparently by unpractisedobservers," J. Exptl. Psychol.(to be published). 4 S.S. Stevens,Science 121, 113-116 (1955). leadsthe subjectto set the dial too low whenadjusting * P. I-I. Geigerand F. A. Firestone,J. Acoust.Soc. Am. 5, 25-30 for half loudness.It is possible,of course,to arrangea (1933).

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 818 S. S. STEVENS

averagingtheir data. It is a difficultproblem and there stimulusjudged relative to the standard,but also to a is no universalsolution to it. There are many potential slight extent it appears to be judged relative to the sourcesof skewnessand no oneaveraging procedure can "comfortablelistening level." It is as though the effec- undo all of them. tive standardstimulus, from the point of view of the For experimentson loudnessa defensiblerule might subject, were systematicallyshifted toward the com- be this' When a conversion of the decibel values to sone fortable level. The resulting constant errors can prob- values eliminates the skewnessin the data, it is advis- ably be ascertainedand eliminatedby a balancedpro- able to averagein sones.When this conversionto sones cedure that makes each stimulus serve once as the does not eliminate the skewnessin the data, medians standard and once as the variable. In ratio determina- shouldbe computed.Medians must, of course,be used tion, this balancingprocedure would call for both frac- with caution when the number of cases is small. tional and multiple loudnessjudgments, and an averag- Another generalsource of distortionin loudnessjudg- ing of the two results. ments, sharedmore or lessby all the methods,arises Many of the foregoingproblems relating to loudness from the preferencethe subjectsshow for listening to judgmentsare further discussedand illustratedin the moderate levels of loudness.They seem to avoid Appendix. listening to soundsthat are too loud or too weak by DATA adjustingthe comparisonstimulus a little too high when it is in a low range and a little too low when it is in a Tables I to IV record the decibel values that corres- high range. Consistentwith this they underrate the pond to a loudnessratio of 2' 1. Other ratioshave not loudnessof a faint sound relative to a faint standard, been recordedin thesetables, but wheneveran experi- and they overestimatethe loudnessof an intensesound ment has determined a ratio of 4' 1 it has been used to relative to an intensestandard. The centerof the range, estimatean additional entry of 2' 1 by taking half the between60 and 80 db above threshold,seems to be the decibelratio correspondingto 4:1. These entries are preferredlevel for listeningø and it is as thoughdepar- italicized.Ratios greater than 4:1 have not been used tures from this regionwere "resisted"by the subjects. for thispurpose, despite the fact that higherratios often The consequencesof this tendencyshow up in almost give resultsquite consistentwith thoseobtained for every experimentthat requiresa subjectto judge one 2' 1 itself.Brief descriptions of the experimentson which stimulusrelative to another. Not only is the variable thesetables are basedare given in the Appendix.

TA•.v.I. Tones--decibelreductions required for half loudness.

Method and Decibels re 0.0002 dyne/cm• Year and investigator Frequency subjects 10--20 20--30 30-40 40--50 50--60 60--70 70--80 80--90 90--100 100-110 110--

1930 Richardson and Ross ME 11 12

1932 Laird, Taylor, and 1024 CS 10 5.0 13.5 15.1 19.0 19.0 19.5 20.5 20.0 22.0 24.0 Wille

1932 Ham and Parkinson 350 ME 53-54 10.0 11.0 10.0 10.0 1000 ME 51 7.2 8.2 7.8 7.8 2500 ME 54-55 7.9 9.0 9.4 9.6 Warble tone 500-1000 ME 18 7.0 9.0 9.4 8.8 11.3 11.3 Warble tone 260•500 CS 35-37 10.8 9.4 9.5 7.6 Warble tone 500-1000 CS 13-16 6.8 9.3 6.7 7.6 Warble tone 260---500 CS 32-39 9.0 8.8 9.1 8.9 Warble tone 500-1000 CS 13-16 9.2 9.0 7.8 8.6

1933 Geiger and Firestone 60 adj 31 2.1 4.1 3.5 5.9 5.8 7.5 1000 adj 44 (mean) 3.6 4.3 7.5 9.4 9.8 11.7 1000 adj 44 (median) 3.1 3.5 7.0 8.8 9.5 10.0 1934 Churcher, King, and 800 adj 34 8.0 8.5 10.0 10.0 12.0 16.0 Davies 800 adj 30 7.0 8.0 15.0

1936 Rschevkin and 1 ooo CS 11 9.8 12.8 12.3 11.4 10.1 10.5 Rabinovitch 1000 CS 11 12.0 13.5 13,2 14.0 11.0

1951 Pollack 1000 adj 7 3.2 2.5 3.0 4.1 6.2 6.4 7.9 8.0 8.1 8.9

1952 Garner 1000 adj 18 6.3 9.4 13.0 t6.4 17.9 18.0 17.3 17.4 16.6 14.6

1953 Robinson 1000 cs 25 5.0 10.3 11.6 12.8 11.6 10.6 10.8 10.3

1954 Garner 1000 adj 18 7.4 10.2 13.2 15.6 16.7 17.8 17.9 17.7 16.3 17.5

P.A.L. Stevens and 1024 CS 20 6.9 10.2 7.5 9.4 9.8 8.5 9.3 Herrnstein Stevens 1000 ME 14-17 9.0 9.5 9.5 9.0 13.0 Stevens and Poulton 1000 adj 22, 33 (mean db) 9.0 9.5 1000 adj 22, 33 (mean sones) 7.8 8.7 1000 adj 22, 33 (median) 7.4 8.1 Poulton 1000 ME 8 (numerical estimates) 10.8 12.0 1000 ME 8 (fractional estimates) 11.1 12.0 Poulton 1000 ME 32 13.0 13.5 11.2 10.0

6I. Pollack,J. Acoust.Soc. Am. 24, 158-162(1952); T. Somerville,BBC Quart.3, 11.-16(1949).

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 MEASUREMENT OF LOUDNESS 819

TABLEII. Tones--decibelincrease required for twice loudness.

Method and Decibels re 0.0002 dyne/cm• Year and investigator Frequency subjects 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100 100--110 110-

1932 Ham and Parkinson Warble tone 500-1000 CS 19 8.5 9.5 9.8 6.1 Warble tone 250•500 CS 40-47 12.8 12.8 7.2

1932 Geiger and Firestone 60 adj 31 5.6 5.3 5.3 5.3 6.6 4.8 1000 adj 44 (mean) 10.6 8.4 7.0 7.1 9.1 7.5 1000 adj 44 (median) 9.5 8.0 6.1 6.5 8.2 7.5

1936 Rschevkin and Rabinovitch 1000 CS 11 18.5 16.3 17.2 15.5 10.8 9.0 10.8 1000 CS 11 16.7 14.0 11.3 9.5 7.3

1951 Pollack 1000 adj 7 3.5 6.3 9.9 9.7 9.3 10.8 7.5 6.5

1953 Robinson 1000 CS 25 17.5 17.2 17.0 17.0 14.6 10.3 10.3

P.A.L. Poulton 1000 ME 8 12.0 11.7

P.A.L. Poulton 1000 ME 32 11.0 12.0 7.5 8.5

In the tables the values recorded are the decibel re- be used to strike somesort of averageand thereby ductionsrequired to producehalf loudnessor double determinethe decibelchange corresponding to a loud- loudness.Since not all experimentersused the same ness ratio of 2: !. If we could decide which data are standard levels, the various columnsin the tables are most trustworthywe couldweight them heaviest.Or we madeto representintervals of 10 db, and if the standard coulduse someother weighting,based perhaps on the used fell anywherewithin a given 10-db interval it is numberof subjectsused. Of course,after workingon recordedin the appropriatecolumn. The justification this problemfor a coupleof years,and after usingthe for this groupingof resultsrests on the fact that a small variousmethods, I find it difficultnot to form strong changein the level of the standardstimulus produces opinionson this questionof which data to trust, but a negligiblechange in the decibeldifference required for opinionsare alwaysliable to be wrong. a loudness ratio of 2' 1. Sincethe data in Figs. ! and 2 are obviouslyskewed, These tables,then, provide an inventory of most of perhapsthe safest objective procedureis to consider what has been done about loudness.The next problem the mediansof the columnsas the most representative is, what do we concludefrom thesedata? values.The medianis usuallya goodstatistic to useon skeweddata, provided the number of casesis not too Treatment of the Data small. An important questionthat must be decidedat the The entries of Tables I and II have been plotted in Figs. ! and 2. Each point representsa tableentry, except outsetis whetherto combinethe data for halving and that the mean values have not been plotted when doubling. At low intensities,half loudnessseems to medianswere available and the data for 60 cps have requirea smallerdecibel difference than twice loudness. been omitted. At high intensitiesthe reverseis true. But as we have Theseplots are encouragingor discouraging,depend- already noted, there are biasing forces that tend to ing on one'slevel of aspirationin thesematters. Never- producethis kind of result. If we assumethat the effects theless,these are the data, and we must now try to of theseforces are roughly equal and opposite,we can decidewhat they mean. Many possibleschemes could argue that data on halving and doubling should be

TABLEIII. Noise--decibel reductionrequired for half loudness.

Method and Decibels re 0.0002 dyne/cm• Year and investigator Kind of noise subjects 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100 100--110 110- 1932 Laird, Taylor, and Wille audiometer buzz CS 10 9.3 13.0 15.5 17.5 19.0 18.5 20.5 21.5 26.0

1932 Ham and Parkinson recorded noise ME 54 6.5 8.5 8.0 10.1

1933 Geiger and Firestone 40 tone complex adj 31 (mean) 4.3 4.9 7.0 10.4 9.3 11.0 (median) 3.9 5.0 7.0 10.0 9.0 12.0

1951 Pollack white noise adj 7 3.2 3.2 4.7 5.5 6.7 8.7 8.7 8.7 7.4 8.3

1953 Robinson white noise CS 25 7.3 9.6 10.0 12.5 12.3

white noise 12.4

P.A.L.J.C. Stevens ME 10 10.0 10.0

P.A.L. Poulton and Stevens white noise adj 16 (mean db) 6.7 7.1 8.6 9.4 10.0 9.1 9.4 10.0 8.8 white noise adj 16 (median) 6.0 6.5 7.7 8.6 8.6 8.9 8.5 10.0 7.9

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 820 S. S. STEVENS

TABLEIV. Noise--decibel increaserequired for twice loudness.

Method and Decibels re 0.0002 dyne/cm •- Year and investigator Kind of noise subjects 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100 100-110 110- 1932 Laird, Taylor, and audiometer buzz CS 10 14.3 16.0 20.0 19.5 17.5 19.0 15.0 14.5 Wille

1933 Geiger and 40 tone complex adj 31 (mean) 6.2 5.8 5.7 6.0 8.5 7.8 Firestone adj 31 (median) 5.5 5.0 5.0 5.5 7.3 7.5

1951 Pollack white noise adj 7 2.7 2.3 4.7 7.7 9.6 8.5 7.9 6.2 4.9 4.8

1953 Robinson white noise CS 25 12.7 13.6 14.5 12.3 11.3 8.1

engine noise CS 25 13.1 12.0 8.0

P.A.L.J.C. Stevens white noise ME 10 (mean) 9.0 9.0 ME 10 (median) 11.0 11.0

P.A.L. Poulton and white noise adj 16 (mean db) 8.4 9.0 8.7 9.5 8.1 7.7 7.0 6.9 Stevens 8.9 adj 16 (median) 7.4 8.9 8.8 9.4 7.9 7.8 6.1 6.5 9.4

combined.7 It then turns out that to a fair approxima- the fact that the arithmetic mean is 10.9 db, whichis tion the decibeldifference corresponding to a loudness higher than the median. The standarddeviation is 3.9 ratio of 2' 1 is constantthroughout the whole intensity db. The rangeon either sideof the medianthat includes range. 50% of the casesis 2.4 db (the semi-interquartilerange). If we acceptthis notion, it followsthat loudnesscan Figure 3 showsa histogramof all the data for the be describedby a power functionof intensity, and we halving and doublingof tones.The valuesare grouped can write a simple formula for it. Furthermore,if we by intervals of 0.5 db. know that loudnessis proportionalto intensityraised to Since in Tables I and II some experimentersare a (fractional) power we can, in principle,use bisection representedby more entries than are other experi- data to determinethe exponent.For if loudnesseshave menters,it might be instructiveto determineone rep- been adjustedso that La--L2=L2--L•, and if L= ki n, resentativevalue for each experimenterby taking the then, cancelingthe k's, Ia•--I2=I2--I• •. median of all his values.When this is done,it turns out Since we assume the three intensities Ix, 12 and 13 that the median of these medians is 10.3 db. If we omit were measuredin the experiment, the value of n is the singleand somewhatquestionable value enteredfor determinable•at least by a processof cut and try. Richardsonand Ross, this median becomes10.0 db. Some of the difficulties that attend bisectionexperi- Many other proceduresfor treating these data can ments will be discussed later. readily be thoughtof, but againstany given procedure Our problemnow is to decidewhat decibeldifference reasonableobjections can probably be urged. There correspondsto a ratio of 2' ! in loudnessfor tonesin the appearsto be no escapefrom the fact that a certain vicinity of 1000 cps.The mediansof the valuesin the degreeof arbitrary judgment must attend our choice tables suggestthat this decibel differenceis in the of a loudnessscale for our typical "standardobserver." vicinity of 10 db. Actually the medianof all the 178 With this arbitrarinessfully in mind, I shouldlike to valuesin Tables I and II taken togetherturns out to be proposethat for the 1000-cycletone we acceptthe value exactly 10.0 db. The skewnessof thesedata is shownby of 10.0 db as the intensity ratio correspondingto a loudnessratio of 2:1. The exponentof the powerfunc-

, , , , , , , , , ß HALF LOUONESS tion then becomeslogx0 2 and the formula for loudness ß : I TONESO MEOIANSOFCOLUMNS ß becomesL= kI ø'3,where I refersto energyflux density g .. •. and is assumedto be proportionalto the squareof the

•g .... :' o ,.--oß .ff..o -':-oß • • O soundpressure. In termsof soundpressure p the formula

e is L-k'p ø.6.If we measureI in unitsequal to the refer- ß ß . ence level of 10-•6 watt per cm2and defineL= ! sone

e e when 1-40 db--then k becomes 0.06 and we have • ß ß e L=0.06I ø-3. Or, if we measure the stimulus as the e number of decibels N above the reference level we can write Fro. 1. The decibel reductionsrequired to producehalf loud- logL = 0.03N-- 1.2. ness.Each point representsan entry from Table I, except that mean values are omitted when medians are available. The data This is a simpleformula to use,but for mostpractical for 60 cps are omitted, and the two rows of entries for Poulton involving eight subjectshave been combined.The open circle to work it will probably suffice to remember that the the right of each column indicates the median of the column. loudnessof a 1000-cycletone goes up by a factorof two for each 10 db-increase in the stimulus. ßE. C. Poulton and S.S. Stevens,J. Acoust.S•. Am. 27, 329- 331 (1955). It will be noted that the chief difference between this

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 MEASUREMENT OF LOUDNESS 821

revised sone scale and those previously constructed TONES ß HALVING AND DOUr. INC concernsthe slopeof the functionat levelsbelow about - MEDIANß IO.Odb __ 40 db. Over this range the old scaleswere steeper,and MEAN ' IO.9 db their steepnessincreased as the thresholdwas ap- proached.This steepnessis consistentwith the results of experimentson the halvingof loudness,but if we confineour attention to halving alone we are viewing 6 7 B 9 I0 II 12 13 14 15 16 I? 18 19 20 21 •.• 93 24 only one sideof the coin.The outcomeof experiments DECIBEL CHANGE REQUIRED FOR 2:1 LOUDNESS RATIO on the doublingof loudnesssuggests that the loudness Fro. 3. Histogram of the resultsfor the halving and doubling functionbecomes less steep near threshold,rather than of the loudnessof tones.The data of Figs. 1 and 2 are combined, more steep.Since halving and doublingappear to be making 178 entries. subjectto a biasthat affectsthem in oppositedirections, it seems reasonable to let the loudness scale be deter- supportto the conclusionthat a loudnessratio of 2:1 minedby a compromisebetween the resultsof thesetwo correspondsto an intensity ratio of 10 db. I used a procedures.The simplepower-function loudness scale is variety of procedures,including the making of direct sucha compromise. loudnessestimates over a range as great as 90 db at a singlesitting. LOUDNESS AT THRESHOLD Figure 4 showssome of the resultsobtained when the From the foregoingequation we seethat at the thresh- estimates were made relative to standard intensities of old of hearing,loudness is not zero.If we assumethat medium level. The median of the estimates of 18 sub- the thresholdis equivalentto the referencelevel, then jectsshows that the loudnessratio correspondingto in- at threshold L--0.06 sone. That the threshold loudness tensitiesseparated by 90 db is approximately1000 to 1. The slopesof the straightlines in Fig. 4 are suchthat , ß , , ß TWICE LOUDNESS TONES 10 db correspondsto a 2:1 loudnessratio. The fact that 0 MEDIANS OF COLUMNS the data in Fig. 4 fall slightly below this loudnessfunc- ß ße ß • ß tion at the low intensitiesand above at the high in- ß ee I O ß eeO * o ; o tensities has been shown to be due to the level of the

0 eø eeO I I ß ß standardchosen. In other words,the slopeof the func- tion obtainedis altered when the standardis shifted, so that slopescan be obtainedthat are both greater and lessthan that of the straight lines in Fig. 4. I i i i i i i i 80-10 •-40 40-50 •10'-•0 JO'"'70 70-80 80-•0 90-100 IO0-11O ,•3UND PRESSURE: LEVEl. IN DECIBELS I00 I000 Fro. 2. The decibel increasesrequired to produce twice loud- ness.Each point representsan entry from Table II, except that mean valuesare omitted when mediansare available, and the data for 60 cpsare omitted. The opencircle to the right of eachcolumn representsthe median of the column. ,Olr Y t 'oo should turn out to be a small fraction of a sone is not entirely unreasonable,provided loudness is funda- mentally quantal in nature--provided, in other words, z loudnessis not infinitely divisible.What this meansis • Lot /o ,7"L...... ],o •< simply that if a sound is heard at all, its loudnessis finite. As Robinson8 suggests,when intensity crossesthe threshold,loudness comes on with a jump. Perhapsit will provepossible to obtaina moredirect •o.• • • / qLo e estimate of the sonevalue of a just audible tone, but until that is done the thresholdloudness given by the formulais probablyas goodas any.

SO ME FURTHER EXPERIMENTS • •u•l• 0.1 After the foregoingsummary of the loudnessexperi- •0 40 • • I O0 120 ments was prepared,I undertooksome additional ex- •UND PRESSURE LEVEL IN DECIBELS perimentson the magnitudeestimation of the loudness Fro. 4. Magnitude estimations relative to a fixed standard of 1000-cycletones? These experiments give additional stimulus.Upper curve: the loudnessof the standard(80 db) was called10. Lower curve: the loudnessof the standard(90 db) was a D. W. Robinson,Acustica 3, 344-358 (1953). called10. The pointsare the mediansof 36 loudnessjudgments- 9 S.S. Stevens, "The direct estimation of sensory magni- two by each of 18 subjects.The vertical lines mark the inter- tudes:loudness," Am. J. Psychol.(to be published). quartile ranges.

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 822 S. S. STEVENS

In another experimentI dispensedwith a standard tensities,and to eachintensity the subjecthas assigned stimulusand merely presentedto each of 26 subjects a numberrepresenting his perceptionof the loudness. a seriesof eight intensitiesspaced 10 db apart from 40 The variability is fairly large,but the medianjudgments to 110 db. The order of the tones was different for each fall close to the loudness function and confirm our con- subjectand eachtone was presented twice. All but one clusionthat loudnessis a powerfunction of intensity. of the subjectshad previously made loudness judgments in other typesof experiments.The instructionswere as NOISE follows: As shownin Tables III and IV, the data for noiseare I am goingto give you a seriesof tonesof different not as numerousas those for tones. Although these intensities.Your task is to tell me how loud they sound tablesinclude results on othertypes of noise,in trying to by assigningnumbers to them.To turn on the toneyou analyzethe data we shall consideronly thosefor white simplypress the key. You may pressit as oftenas you noise. From these values for white noise,plotted in like. Fig. 6, it wouldappear that the decibelchange required When you hear the first tone, give its loudnessa for a loudnessratio of 2' 1 may be somewhatless than number--anynumber you think appropriate.I will then 10 db. The median of all the values for white noise is tell youwhen to turn on the nexttone, to whichyou will 8.5 db and the meanis 8.4 db. I might point out, paren- alsogive a number. thetically, that thesevalues are strikinglysimilar to the Try to make the ratios betweenthe numbersyou decibelchange I have found to be necessaryto produce assignto the differenttones correspond to the ratiosbe- a 2' 1 differencein the apparentbrightness of a white tween the loudnessesof the tones.In other words, try to light. make the number proportionalto the loudness,as you On the basisof the evidencein Fig. 6 we might well hear it. concludethat, with white noise, the decibeldifference requiredfor a 2' 1 ratio in loudnessis relatively invari- In order to combinethe resultsfor the differentsubjects ant with intensity.This is the conclusionwe reachedfor it wasnecessary to bring the estimatesinto coincidence pure tones.As we have seen,it is an attractive conclu- at a givenlevel (80 db) by multiplyingby an appropri- ate factor--a different factor for each subject. The ß HALVING' WHITE: NOISE medianestimates were then computed for eachlevel. 0 DOUBLING :o oo The results are shown in Fig. 5. The straight line in z ,' ß • ,o o ß • Fig. 5 hasthe slopeof theloudness function (10 db- 2:1 • oß ooo =• ß • ß.' o o o o ß e• loudness). o This experimentgoes about as far as it is possibleto o go towardgetting subjects to makedirect quantitative I i i i I i I I I judgmentsof the loudnessof sounds.Here wehave done 80 10-20 •0-•0 •0-40 40-50 50-60 60-70 70-80 80-90 90-100I00-110 nothing more than present a "random" seriesof in- LEVEL IN DECIBELS Fro. 6. The median decibelvalues for the halving (filled circles) ! I I I ! I I and doubling (open circles) of white noise. The data are from Tables III and IV. ioo sion,because it entailsa simplepower-function relation z --NODESIGNATED STANDARD between loudness and intensity. Furthermore, if the loudness of tones and the loudness of noise are both •- •o powerfunctions of intensity, then the ratio of thesetwo loudnessesis itself a powerfunction of intensity. But here we run into trouble, for this last conclusion doesnot agreewith the resultsof experimentsin which subjectshave been asked to equate the loudnessof a tone to the loudnessof a white noise. These experi- - /I •,'•'• *•_ ments•ø are not themselvesin particularly goodagree- ment, mainly becausethe task of equatingthe loudness of a pure tone to the loudnessof a white noiseis difficult. 0.1 ,: , , ,•ou:.,o;o.,-,..,'•-,•ø,' / .)O 40 60 80 IO0 120 The difficulties are not unlike those encountered in the SOUND PRESSURE LEVEL IN DECIBELS frustratingart of heterochromaticphotomerry. Subjects Fro. 5. Magnitude estimates of loudnessmade with no fixed seemto disagreemore when they try to match the loud- standard.The eight differentintensities were presented in irregular nessof a tone to that of a noisethen when they try to order, and the subject estimated the loudnessby numbers of his own choosing.These numberswere multiplied by appropriate •0F. H. Brittain, J. Acoust. Soc. Am. 11, 113-117 (1939); H. factors to make each subject'sestimates at 80 db average 10. Fletcher and W. A. Munson, J. Acoust Soc. Am. 9, 1-10 (1937); The points represent medians of the resulting values and the G. A. Miller, J. Acoust. Soc. Am. 19, 609-619 (1947); D. W. vertical lines representthe interquartile ranges. RobinsonS;I. Pollack?5

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 MEASUREMENT OF LOUDNESS 823

set one tone to half the loudness of another. But all the experimentsrelating tones to noiseseem to concur on one point, namely, that the ratio betweenthe loudness of tonesand noiseis not a powerfunction of intensity. Figure 7 showsthe data as I am able to read them from publishedgraphs. The curves representthe dif- ference in decibelsbetween the level of a 1000-cycle tone and that of a white noise that matches the tone in loudness.Why are thesecurves so far apart? Is it owing to differencesin the methodsused to measurethe noise, or to the large variability among subjects,or to both? Fro. 8. Resultswhen the subject adjuststhe loudnessof a whitenoise to equalthat of a 1000-cycletone. Each point is the Whatever the reasonfor thesediscrepancies, all five of medianof 24 adjustments--twoby eachof twelvesubjects. The the curvesare somewhatsimilar in that they are more verticallines mark the interquartileranges. or less concave downward. Three of the curves show a a general rule that subjectsoverestimate an intense variablerelative to an intensestandard--and that they do the opposite at low levels. Consistent with this generalrule, when the subjectadjusts a stimulusto be equal to an intensestandard he tendsto undershoot, and he tends to overshoot when the standard is weak. Thus, in the extreme case, when the standard borders on painful, the subjectdoes not err in the direction of •,/ FLETCHERANDMUNSON TONE--NOISE_. settingthe variabletoo high. As we shouldexpect, he LOUDNESS MATCHES errs in the directionof settingit too low. ,• ' • ' • ' ,'oo SOUND PRESSURE LEVEL OF NOISE We can predict, therefore, that it makes a difference whether the variable stimulus is the tone or the noise. Fro. 7. The results of five experientsin which,lthe loudness of a tone was matched to the loudness of a noise. All but Fletcher A convincingdemonstration of this differencewas ob- and Munson usedan approximatelywhite noise.In theseexperi- tainedin a preliminaryexperiment on a dozensubjects ments it appearsthat the variable stimuluswas the tone. usingthe methodof adjustment.One time the subject adjustedthe tone, and anothertime he adjustedthe definite maximum. The curvesare not the straight lines noise.His two types of settingswere inconsistentwith they would have to be if, for both tones and noise, eachother at the high intensitylevels, and they were loudnesswere a power functionof intensity. equally inconsistent,in the oppositedirection, at the It is clear from Fig. 5 that, at low levels,a given in- low intensity levels.The discrepanciesat 30 and at 100 creasein the stimulusproduces a greaterchange in the db were of the order of 5 db. loudness of a white noise than in the loudness of a tone. Followingthis encouragingoutcome, I improvedthe This fact seemscertain. Apparently, when we start from apparatusand undertookto testanother dozen subjects, below thresholdand increasethe intensity of a white usingthe noiseas the variable.The subjectused a "sone noise, the amount of energy surpassingthe threshold potentiometer"to adjust the intensity of a 10-kc band increasesvery rapidly for a time. Sinceall the energy of white noisefed through a pair of PDR-10 "extended that exceedsthe thresholdpresumably contributes to range"earphones. The responseof theseearphones was loudness,it is not unreasonablethat the loudnessof approximately"flat" up to 7500 cps and down about white noise should grow rapidly over the first few 10 db at 10 kc. A rotary switchpresented the tone and decibelsabove threshold. This rapid growthof loudness the noisealternately. Each stimuluspresentation lasted is not inconsistent with at least some of the data in about 1.3 sec, with a silent interval of about 0.5 sec. Tables III and IV--those of Pollack and of Poulton The levels of tone and noise were measured with a and Stevens. On the other hand, the fact that someof the curves Ballantine electronicvoltmeter (model 300), and for in Fig. 7 turn down at the higher intensitiessuggests our present purposethe sound-pressurelevel is con- that over the upper range the loudnessof white noise sideredthe samewhen the voltage readingis the same. growsless rapidly with intensitythan doesthe loudness (Actually, when measuringwhite noisethe Ballantine of a 1000-cycletone. But in Tables I to IV there is no read about ! db lower than a thermocouple.)The onset evidencefor a lessrapid growth in the loudnessof noise. and declineof the stimuli weregoverned by an electronic Obviouslywe face a discrepancythat needsexplaining. switch set for a 5-msec rise-time. A possibleexplanation can be found in the fact that The resultsare shownin Fig. 8. We seethat when the in the experimentsunderlying the curvesin Fig. 7 the noise is varied there is no tendency for the curve to variable stimulus seems to have been the tone and not turn down at the high intensities.In this respect,the the noise.As we have previouslynoted, it appearsto be data in Fig. 8 are moreconsistent with the evidencein

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 824 S. S. STEVENS

Tables I to IV than are the data in Fig. 7. This is an usedsone potentiometers instead of decibelattenuators encouragingfact. for this purpose. Nevertheless,the problemof decidingon the exact It is a distressingfact that my resultsdo not agree form of a standard loudness scale for white noise can with thoseobtained by Garner.The intervalsproduced probablynot be settledon the basisof our present by Garner's18 subjectsare more nearly equal in decibel knowledge.At any rate, my ownimpression is that there sizethan are the intervalsproduced by my 49 subjects. are still too many looseends and inconsistencies.It I cannot prove it, but I rather suspectthat Garner's appears,however, that at the low intensitiesthe scale data are somewhatbiased by the fact that the subjects for noisewill prove to be decidedlysteeper than the could judge the time it took for the motor to drive the scalefor the 1000-cycletone, and that at the higher attenuator through the 40-db range (80 seconds)and intensities the loudnessscale for noise will have ap- that their fractional judgmentswere perhapsuncon- proximatelythe sameslope as the loudness scale for the sciouslyinfluenced by the judgmentof fractionalvalues 1000-cycletone. of this time interval. An additional,but slight,source of Plotted against a commonscale of sound-pressurebias lies in the fact that Garner averageddecibels. The level, the two loudnessscales will crosseach other, bisectionexperiment is onein whichaveraging by sones probablyin the vicinity of 30 or 40 db. This crossing has a demonstrableadvantage. 4 point is one of the importantitems that needto be In anotherexperiment, I tried to checkthe validity pinneddown. of the results by having the experimenterreset the The principlementioned above regarding the neces- variableintensities to variouslevels and then requiring sity of usingboth the toneand the noiseas the variable 25 subjectsto estimatethe relative spacingbetween the in a loudnessbalance appears to have wide generality. resultingloudnesses by means of a set of adjustable An experimenton loudnessbalance should itself be markerson a steelbar. The subjectset the spacingof the balanced.This principlesuggests that, in the determina- markers to representhis perceptionof the spacingof tion of equalloudness contours, each of the frequencies the loudnesses.The data obtainedagree quite precisely to be comparedshould be made to serveas both the with the soneaverages of the data obtainedunder the fixed stimulus and the variable stimulus. In measure- method of adjustment.The bar and markersconstitute ments in which this balancedprocedure has not been a rather usefuldevice for this purpose. followed we have reason to expect that systematic One of the first things I discoveredin these studies errors may be present--errorsthat are presumably was the disconcertingfact that the point at which a minimal near the middle of the loudnessrange and that subjectbisects a 40-db interval dependson the orderin increaseas we go to very faint or very loud tones. which he listens to the tones. For example,when he hears them in the ascendingorder from faintest to BISECTION EXPERIMENTS loudesthe may judge the midpoint to be about 32 db Let us now return to the problemof the bisection(or from the bottom of the range. When he hears them in equisection)of loudnessintervals in order to seehow descendingorder he may judgethe midpointto be about well the experimentalresults agree with thoseof experi- 27 db from the bottom. In an appendixto his main ex- ments on ratio determination. Two major studies of periment, Garner verified this order effect,but in what bisection have been made, one by GarnerTM and the orderor ordershis subjectspressed the keysin the main other by me (unpublished).In both experimentsthe experimentis not known. subjectsat beforea setof five keyswhich he pressedin Becauseof this order effect, I found it necessaryto orderto producethe tones.The intensitiesof the tones control the order in which the stimuli were presented. producedby the two end keyswere fixed, say, 40 db For the part of the experimentthat we shall consider apart,and the subjectadjusted the intensitiesproduced here, the subjectwas requiredto useboth the ascending by the intermediatekeys in orderto dividethe 40-dbin- and descendingorder and to try to adjust the loudnesses terval into four equal-appearingsteps in loudness. so that the intervals soundedequal in both directions. Garner'ssubjects adjusted the intermediatetones by TABLE V. Sample results obtained by subjects who set one throwing switchesthat controlledmotor-driven at- tone to the midpoint of the loudnessinterval between two other tenuators(2 sec/db). Before the beginningof a given tones. The predicted values were calculated on the assumption that the decibel changecorresponding to a 2:1 loudnessratio is run, all the variable toneswere set at one or the other 10 db. All values are in decibels re standard reference level. end of the 40-db range.In mostof my experimentsthe

subjectsimply turned attenuators (no visible dials) and Interval Midpoint Midpoint Number of before each run the attenuators were randomly posi- bisected obtained predicted subjects tioned.My three attenuatorshad differentsized steps- 16-56 42.5 46.9 7 1.5 db for the lower quarter point, 1 db for the mid- 36-76 65.3 66.8 12 56-96 86.2 86.8 45 point, and 0.5 db for the upper (threequarter) point. 90-120 109.6 111.7 12 Actually,I nowbelieve it wouldhave been wiser to have 101-130 119.3 121.8 10 111.5-140 130.6 131.8 12 n W. R. Garner, J. Acoust.Soc. Am. 26, 73-88 (1954).

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 MEASUREMENT OF LOUDNESS 825

It is possible,of course,to controlthe orderof presenta- at least 140 db the loudnessof a 1000-cycletone grows tion of the tones,but it is not possible,unfortunately, as a power function of intensity. to controlthe degreeto whichthe subjectpays attention POSSIBLE ENGINEERING RULE FOR MEASURING to the two orders.He may or may not chooseto ignore THE LOUDNESS OF NOISE one of them. The determinationof a scaleto expressthe loudness Table V givessome of the resultsobtained, together judgmentsof a typical listenerconfronted with a 1000- with the valuesthat wouldbe predictedon the assump- tion that a loudnessratio of 2' 1 correspondsto a decibel cycletone is only a first steptoward the developmentof a procedurefor measuring the loudnessof soundsin changeof 10 db. It turnsout that the midpointto which general.In principlewe can usethe 1000-cycletone as a the subjectsadjusted the attenuatoris slightlylower yardstickand evaluatean unknownnoise by finding thanthe predicted midpoint. This same effect was found the level of the 1000-cycletone that matchesthe un- by Garnerwhen he comparedhis bisectiondata with known noise in loudness. But such loudness balances the fractionationdata obtainedfrom the samesubjects. Sincebisection and ratio determination do not quite betweentone and noiseare difficult to perform in the field. For one thing, in order to avoid systematicbiases agree,we must either choosewhich methodwe will we need to make the noise as well as the tone serve as trust,or elsereject them both. Garner decided to accept the variable stimulus. the verdict of bisection and to correct the ratio deter- What we need, therefore, is a reasonable,workable minationsaccordingly. I wouldprefer to do theopposite. procedurefor transformingsound level measurements My mistrustof the bisectiondata is basedon two con- into sones.The procedurest5 now in vogue give fair siderations.One is the rather large differencesobtained resultsbut they are not particularly simple.In addition as a function of the order in which the tones are pre- to assuminga loudnessfunction for the 1000-cycletone sented.The other is the old questiont2 whether the sub- that is significantlysteeper than the present evidence ject isable to bisectan intervalwithout being influenced indicatesit oughtto be, they make the further assump- by the ratiosas well asthe differencesbetween the loud- tion that the total loudness of a wide-band noise can be nesses.Does he set the middle tone b so that a-b-b-c, found by adding together the loudnessesof certain of or doeshe set if so that a/b-b/c? Or, doeshe compro- its componentbands (octave bands or mel bands). misebetween these two criteria and thereby producea This assumedadditivity certainly needsa more thor- settingof b that is closerto the half-waypoint between a and c in decibels than it would otherwise be? ough validation than it has yet received. Perhapsa better way to approachthe problemof the What lookslike a tendencyamong some subjects to loudnessof noiseis first of all to break the probleminto slipover from an intervaltoward a ratiojudgment seems two separateparts. We may then ask (1) how doesthe to be particularlystrong when the lower tone of the interval is within about 20 db of threshold. t3 loudnessof noise depend on the over-all level, and In view of the difficultiesthat attend bisectionjudg- (2) how doesit dependon the frequencycomposition? It appearsnot unlikely that the first of thesequestions ments,the agreementshown in Table V betweenthe can be given a simpleanswer. As moreand more results obtainedand the predictedbisections might be con- accumulate,I becomeincreasingly convinced that all sideredreasonably satisfactory. There is a similarly continuousnoises of engineeringinterest follow to a goodagreement between the bisectionswe wouldpre- goodapproximation the samesimple rule. This rule says dict and thoserecently reported by BlackTM who used that, for all levels above about 50 db, the loudnessL a noisewith a spectrumthat sloped6 db per octave. changesby a factor of two when the over-all level N SinceBlack's subjects listened to the noisesonly in changesby 10 db. In other words, for a constant ascendingorder, the midpointobtained tends to be spectrumand for N> 50 db higherthan the midpointpredicted. This is just what we shouldexpect. logL=O.O3Nq-S The data in Table V also demonstrate another im- where S is the spectrumparameter. portantfact, namely,that loudnesscontinues to grow The value of S will dependupon the make-up of the as a powerfunction of intensityup to at least140 db. spectrum(including its phaserelations) and will there- At this level the stimuli, under binaural listening,are fore needto be evaluatedempirically. As we have seen, truly painful.One of a pair of earphonesburned out at the value of S for a 1000-cycletone is -1.2. In Figs. 7 this level! If, as somehave supposed,loudness reaches and 8, we seethat above 50 db a white noiseis as loud a ceilingabove which it ceasesto grow,this upper limit as a 1000-cycletone when the noiseis roughly 10 db mustbe beyond140 db. It appearsthat overa rangeof lower than the tone. Hence for white noise the value of

•' S.S. Stevens and H. Davis, Hearing, Its Psychologyand the spectrumparameter S is approximately-0.9. We Physiology(John Wiley andSons, Inc., New York, 1938). obtain this result by a formula that gives the value of •aNewman, Volkmann, and Stevens,Am. J. Psychol.49, 134- S from the results of matching the 1000-cycletone to 137 (1937). 14John W. Black, "Control of the soundpressure level of •5 Beranek, Marshall, Cudworth, and Peterson,J. Acoust. Soc. voice,"Joint ReportNM 001 104 500.42,U.S. Naval Schoolof Am. 23, 261-269 (1951); F. Mintz and F. G. Tyzzer, J. Acoust. AviationMedicine, Pensacola, Florida (February15, 1955). Soc.Am. 24, 80-82 (1952).

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 826 S. S. STEVENS

the noise in loudness. If N•000 is the level of the tone in Over the rangeof levelsexplored the resultsfrom 21 decibelsand Ns is the level of the noisein decibels,the subjectsusing the methodof adjustmentfor halving formula becomesS=O.03(N•ooo-N,,)--l.2. The values and doublinggive a loudnessfunction very much like of S andN will dependof courseupon the characteristics the one I have proposed.Above about 50 db the slope of the meter. Provided, however,the meter is properly is approximately10 db for a 2' 1 loudnessratio, and be- calibratedto read the sound-pressurelevel of the 1000- low about 50 db the resultsfor halving and doubling cycletone, the loudnessL computedby theseformulas showthe usual divergence: halving gives a steeperslope; is invariant with meter characteristics(expect for cer- doublinggives a flatter slope. tain nonlinearities). Quietzschalso studied37 widely different types of In principlethe valuesof S for other spectracan be noiseby meansof (1) loudnessbalances against a 1000- obtained by a similar procedure:a loudnessbalance cycletone, (2) soundlevel measurements with twotypes made againsta 1000-cycletone. The optimal level for of meters,and (3) measurementsbased on the summa- making this loudnessbalance is in the vicinity of 70 db tion of the loudnessin separateoctave bands. Among (the comfortablelevel for listening) becausein this other things,Quietzsch proposes certain correctionsto regionthe bias in loudnessmatching is minimal. By make the summationtechnique [procedure (3)• agree followingthis procedurewe couldreadily determine the more closelywith the resultsof direct loudnessbalance value of S for a variety of representativespectra of [procedure(1) •. engineeringinterest. From the resultsof Quietzsch'sprocedures (1) and Alternatively, we could perhapswork out a method (2) we can determine the value of what I have called for evaluatingS from an octave-or a mel-bandanalysis the spectrumparameter S for his variousnoises and for of the noise. But we would first have to determine the his two typesof meters.Unfortunately, in balancingthe properrule for combiningthe loudnessesof the separate tone against the noise he varied only the tone, but bands.This may provemore difficult than an evaluation despitethis possiblesource of bias we learn many in- of S for representativespectra by a direct loudness structive facts from his experiments.Applying our balance. In any case,I believe that the equation here formula for S, we note that when he usesa standard proposedwill lead to a more valid assessmentof the soundlevel meter (DIN-Lautst•rkemesser) the value loudnessof noisethan the proceduresnow in use. of S rangesfrom very near - 1.2 (the value of S for the For many purposeswe do not needto know the value 1000-cycletone) to about --0.5, with an averagevalue of of S. In order to determine by what ratio we have about --0.85. When he usesa particularpeak-reading changedthe loudnessof a noisewe needto measureonly meter (Ger•uschspannungsmesser)with a frequency the sound-pressurelevel beforeand after, providedthe characteristicresembling the human threshold, the spectrumremains constant. For example,a 20-db re- valueof S rangesfrom -- 1.3 to --0.74, with an average duction in the over-all level of a noise will decrease its value of about --1.05. These figuressuggest that the loudnessto 25% of the original value. Only when a peak-readingmeter has an advantagein that it indicates significantchange is made in the spectrumwould we a more nearly invariant spectrumparameter. need to worry about S, and then we would face the Ideally, what we would like to have is a meter whose empiricalproblem of evaluatingS either by meansof a readingswould always lead to a value of S--1.2, loudnessbalance, or by a properrule for the combining whichwould mean that it wouldgive the samereading of loudnessesin separatefrequency bands. for a 1000-cycletone as for a noise,provided the noise This schemedoes not depend,of course,upon the use and the tone wereequally loud. If we were to limit our of the 1000-cycletone as a referencestandard, and it concernto levels above 50 or 60 db, couldwe combine may ultimately prove better to chooseas the reference in a singledevice a set of characteristics(frequency sound a spectrum that can be more easily balanced response,peak indication, time constant, etc.) that againstother noises.In the meantime,the potentiality would allow us to measure the loudness level of noise of the simplifiedprocedure based on the power-function with reasonableaccuracy? This is an old question,I relation betweenloudness and intensity invites serious realize,but one that may needre-examination. If such consideration.The procedurehas the advantagethat it a devicecould be constructedit wouldthen be a simple explicitly separatesthe problem of the dependenceof matter to calibrateits scaleto read directlyin sones. loudnesson intensity from the problem of its depend- In the meantime it would seem advisable to use the enceon frequencycomposition. equationlogL=O.O3N-t-S, and to evaluateS eitherby As this paper goesto. press there appearsan interest- directloudness balance or by an improved"summation ing new study by Quietzsch which is relevant to our method." presentconcern. Quietzsch carried out experimentson the halving and doublingof loudness(1000 cps,30 to APPENDIX. DESCRIPTION OF THE EXPERIMENTS 80 db) and on various proceduresfor measuringthe AlthoughMerkeP 7 (1889) appearsto have been the loudness of different kinds of noise. first to performan experimentin whichthe subjectwas •6 G. Quietzsch,Acustica $, 49-66 (1955). •7j. Merkel, Phil. Stud. $, 499-557 (1889).

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 MEASUREMENT OI• LOUDNESS 827

requiredto estimatethe ratio betweentwo loudnesses, Ham and Parkinson• producedtones by meansof a it was not until the developmentof the modern elec- loudspeakerand had groups of untrained subjects tronic art that systematic studies could be made to (collegestudents) estimate the percentageof the loud- determine the dependenceof loudnesson stimulusin- nessthat remainedafter a tone had beenreduced by a tensity. Merkel dropped balls from various heights given numberof decibels.This is oneform of the method onto a soundingblock in order to produce different we have called magnitudeestimation (ME). From the sound intensities. curves plotted from these percentageestimates it is Richardsonand Ross•s (1930) used electrical equip- possibleto determine the decibelreduction required for ment to producetones for this purpose.The observer half loudness.For this interpolation procedure, the heard tonesof two different intensities,and he was re- logarithm of the estimatewas plotted against decibels. quired to rate the loudnessof one of the tones as a The resulting plot is reasonablyrectilinear, and the fraction or multiple of the loudnessof the other. Al- fitting of a curve to the data is not difficult. This curve though their results were not reported in acoustical was used to estimate the half loudnessand the quarter quantities,from their statedvalues of telephonecurrent loudnessreductions. Some arbitrary judgment is neces- it can be determinedthat the averagereduction in in- sarily involvedin this process,but it probablyprovides tensity neededto reducethe loudnessby a factor of two a fair estimateof the half-loudnesspoint. was about 12 db. This is the value entered in Table I. In the experimentsof Ham and Parkinson some Laird, Taylor, and Wille•9 (1932) were among the degreeof bias wasprobably introduced by the fact that first to make fractional estimates of loudness over a wide the comparisonstimuli were always presentedin de- range of intensities.For their tonal stimuli they useda scendingorder, instead of in a random order. 2-A audiometerwhich providesstimulus intensities in Ham and Parkinson also used another method to 5-db steps.Their procedurewas a simplifiedmethod of determine the decibel difference for ratios of 2:1--a constantstimuli. Unfortunately, their resultsare so far methodof constantstimuli (CS). The subjectwas given out of line with thoseof most other experimentersthat a standard and eight comparisonlevels and was re- their data have beendeliberately ignored by thosewho quired to pick the one of the eight whoseloudness was have tried to constructloudness scales. An attempt was nearest to half (or double) loudness.They also used made by Stevensand Herrnstein to repeat thesemeas- ratios of «, {, 3, and 5 and obtained judgmentsthor- urementswith a 2-A audiometer,using presumably the oughly consistentwith thoseobtained for « and 2. The same procedure.Their results, shownin Table I, are judgments for half and double loudnessare entered quite at variancewith thoseof Laird, Taylor, and Wille. directly in the tables. An interpolation between the Later on, Stevens,Rogers, and Herrnstein2ø tried de- values for « and { and for 3 and 5 has been made to liberately to alter the subjects'judgments in the direc- give a value for -} and for 4, from which additional tion of larger decibelreductions for half loudness.They estimatesof -} and 2 have been obtained. presentedcomparison tones at fainter levels than those Geiger and Firestone5 used a method of adjustment used by Stevensand Herrnstein and they altered the in which the subjectswere allowed to set the levels of scheduleby which the successivecomparison tones were the comparisontones by means of a potentiometer chosen.This procedurewas fairly successfulwith sub- (characteristicnot stated). Their 44 subjectsset the jects who had never judged loudnessbefore, but it comparisontone to variousfractions and multiples' «, made much lessdifference to subjectswho had served -}, •o, 1/100, 2, 4, 10, 100.They computedboth means previouslyin a loudnessexperiment. Apparently, by and medians for their data, so that we have here a presentingfirst a standardand then a batch of properly chanceto comparethese two measures.Except when the chosencomparison stimuli well belowthe standard,the adjustmentsare beingmade to the moreextreme ratios, subjectcan be inducedto chooseone of theselow values the mediansare nearly alwaysless than the means. as being half as loud. Garner2• has also demonstrated Churcher,King, and Davies•a allowedtheir subjects this effect.The comparisontones presented by Stevens to adjust an attenuatorto reducethe comparisontone and Herrnsteinwere limited to those5, 10, 15, 20, and 25 to half loudness.Having foundthe half-loudnesspoint, db below the standard, becausethese reductionswere the subject proceededfrom there to make a second all that turned out to be necessary.Of course,if the fractionation, and so on. Thus a curve was obtained for subjecthad said that a 25-db reductionwas louder than eachsubject, and finally the decibelvalues correspond- half, the comparisonwould have been lowered still ing to the various fractionswere averaged.They re- further. In the later experimentthe rangeof comparison peated the procedurefor judgmentsof -} loudnessand levels was extended to 40 db below that of the standard. obtained quite consistentresults. Rschevkinand Rabinovitch•4 used tones lasting only •8L. F. Richardsonand J. S. Ross,J. Gen. Psychol.3, 288-306 (1930). • L. B. Ham and J. S. Parkinson,J. Acoust. Soc. Am. 3, 511- •9Laird, Taylor, and Wille, Jr., J. Acoust. Soc.Am. 3, 393-401 534 (1932). (1932). •a Churcher, King, and Davies, J. inst. Elec. Engr. (London) •0 Stevens, Rogers, and Herrnstein, J. Acoust. Soc. Am. 27, 75, 401-446 (1934). 326-328 (1955). 34S. N. Rschevkinand A. V. Rabinovitch, Rev. Acoust. 5, • W. R. Garner, J. Exptl. Psychol.48, 218-224 (1954). 183-200 (1936).

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 828 S. S. STEVENS

0.4 secondand determinedthe ratios «, -}, 2, and 4. A possiblesystematic bias may be presentowing to They arguethat the useof longertones fatigues the ear the fact that the spacingof the comparisonstimuli was and leadsto spuriousresults. Otherwise their procedure in decibelsteps instead of in equal-loudnesssteps. And was not unlike that of Laird, Taylor, and Wille. They since it has been demonstrated that the outcome of the say that their resultsagree with thoseof Laird, Taylor, method of constantstimuli is particularly sensitiveto and Wille, but this seemshardly to be the case. the rangeof the comparisonstimuli presented, we must When Rschevkin and Rabinovitch lengthened the considerthe possibilitythat this "context" factor may tones to 4 seconds,the average reduction needed to have played a role in Robinson'sexperiment. producehalf loudnessdecreased from 11.0 db to 9.5 db. The resultsof certain experiments,either finishedor But sincethe authorsprefer the shorterpresentation, in progressat the Psycho-AcousticLaboratory, are in- the values recordedin Table I are the larger onesthey cluded in the tables as P.A.L. entries. These experi- obtained with the shorter tones. ments have been undertakento explore the effectsof Pollack25 used a method of adjustment with a re- various procedureson loudnessjudgments and the re- latively small numberof subjects.His are the smallest sults obtained have been incidental to this main pur- valuesyet reportedfor the decibelreduction required for pose.Nevertheless, these results need to be considered half loudness. Since it is difficult to read values from in any decisionrelating to the choiceof a loudnessscale. the graphspublished in his paper,the valuesentered in The results of Stevens and Herrnstein with the 2-A the tablesare thosekindly suppliedme by Dr. Pollack. audiometerhave already been referredto. Garner•ø used the methodof adjustmentto obtain 20 The experimentby Stevensø usingmagnitude estima- settingsat each level by each of 18 subjects.His sub- tion was an early attempt to seehow far it might be subjectsadjusted a decibelattenuator (1-db steps). His possibleto go toward getting the subjectsto make ab- is the most thoroughstatistical analysis of an experi- solute numerical judgments of loudness.Three stand- ment of this sort. Unfortunately, the data, as he says, ardswere used: 70, 100, and 120 db sound-pressurelevel "disagreemarkedly with data previously reported," (binaural earphonelistening). The standardwas pre- and we are left wonderingwhy. His secondexperiment, n sentedonly once and then 8 to 10 comparisontones employing18 other subjectswho made 10 settingsat spaced5 db apart werepresented in randomorder. The eachlevel, alsoused the methodof adjustment,but here subjectwas asked simply, "If the standardwere called the subject threw a switch to control a motor-driven 100,what wouldyou call eachof the comparisontones?" attenuator which moved at the rate of 1 db every 2 This procedurewas repeated twice at each level. The seconds.In this secondexperiment the decibelreduc- logarithms of the median estimates give fairly recti~ tions required for half loudnessare smaller at the low linear functionswhen plotted againstdecibels and from levels and larger at the high levels than in the first these plots the « and { loudnesspoints have been de- experiment.The data recordedin Table I were kindly termined. suppliedme by Dr. Garner, sincethey were not tabled The experimentof J. C. Stevens(reported in reference in his paper. 9) with white noisewas similar to the foregoingexcept The large decibelreductions reported by Garner are that the standard was presentedbefore each of the probably due to a number of factors. His subjects comparisonstimuli. In part of this experiment the adjusteda decibelattenuator insteadof a sonepotenti- stimuli were spaced5 db apart below the standard;in ometerwhose attenuation, as a functionof angularturn, anotherpart they werespaced more nearly proportional is more nearly proportionalto loudness.And insteadof to equal-loudnessintervals. These different spacings averaging in sones,or computinga median, Garner madeno significantdifference in the resultingmagnitude averagedthe decibelvalues themselves.If he had used estimations(Table III). a sonepotentiometer and averagedin sones,the decibel Another variation involved the use of a faint standard reductionswould probably have turned out smaller. (45 db) which was assignedthe value 1. The subjects Also, the fact that the subjectsworked for fairly long then assignednumbers to a random series of louder periodsat a stretchprobably accentuated the distortion comparisonnoises (Table IV). The functionsdeter- due to whatever biasingfactors were present. minedby this ascendingprocedure are not significantly Robinson'ss experimentwas perhapsthe most thor- different (in the middle of the intensity range)from the ough study yet made with the method of constant functions obtained when the standard is called 100. stimuli. The stimuluswas presentedby a loudspeaker Other evidence indicates that at the extremes of the to a subject seated 1 meter in front of it. Tones and intensity range thesetwo proceduresgive differentre- noiseswere used, and results were obtained for the suits--in the directionone would predict. At high in- ratios «, x--•, 2, and 10. These results are internally quite consistent,except for the usual discrepancybe- tensitiesthe ascendingprocedure (standard called 1) tween the data for halving and doublingat low loudness givesa steeperloudness function than doesthe descend- levels. ing procedure(standard called 100). The reverseis true for low intensities. •5 I. Pollack, J. Acoust. Soc.Am. 23, 654-657 (1951). 20W. R. Garner, J. Acoust.Soc. Am. 24, 153-157 (1952). In the experimentby Stevensand Poultona using33

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44 MEASUREMENT OF LOUDNESS 829

subjects,an attempt was made to seewhat peoplewho called 100. Groupsof 8 subjectseach judged, as their had neverbefore made quantitative loudness judgments first attempt ever made, a different comparisonlevel: would do when askedto adjust one tone to a fraction of 6, 10, 20, or 40 db below the standard.The mediansof the loudnessof a standard (90 db). The controlused was these first estimates were 62.5, 50.0, 29.5, and 12.0, a gangedpotentiometer, which reducedthe level about respectively.From the pooled estimatesof the group 15 db when turned from full-on to half-on and about 12 ratingson all sevencomparison levels ultimately used, db when turned from half-on to quarter-on. In other half-loudnessand quarter-loudnessvalues were deter- words,it wasmade to approximatea sonepotentiometer. mined, as shownin Table I. Also shownare values for The level was so adjusted that when this potenti- similar tests with a standard of 80 db. Since these latter ometerwas full on, the comparisontone was at 94 db, values were out of line with expectations,Poulton i.e., 4 db more than the standard. satisfiedhis curiosity a month later by repeating the On their first attempts, eleven subjectsset to -} test on eight of the subjects.Seven of the eight then loudness,eleven to « loudness,and elevento -• loudness. gave lower estimates,but these have not been entered Thosewho set to r}and -• then went on to set to the other in Table I. What went wrong the first time is still a fractions. These other settingswere quite consistent mystery--as it often remainsin theseexperiments. with the first. On the first judgmentsthe mean reduc- The same 32 subjectsalso made multiple estimates tionswere'-•= 4.7 db; «= 9.3 db; -}= 19.0 db. relative to standards(40 and 60 db) which were called In Table I are entered the decibel means, the sone 1. The results are shown in Table II. means,and the mediansof the half-loudnessjudgments Poulton and Stevensa had a group of 16 previously and of half the values of the quarter-loudnessjudg- unpracticedsubjects adjust the sonepotentiometer so ments. that the loudness of a white noise was half and twice In this same experimenta study was made of the that of various standards. The means and medians of influence of the characteristics of the control device the resultsare shownin Tables III and IV. (Twenty usedby the subjects.One group of elevensubjects made othersubjects were later addedto the groupmaking 36 their first settingswith a decibelattenuator and, a week subjectsin all.) later, switchedto a sonepotentiometer. Another group A word shouldperhaps be saidabout anotherP.A.L. of eleven subjectsdid the reverse. For both groups experimentthat is only about half finished.It beganas combined,the effect of using the sone potentiometer a rather ambitiousproject, employingthe method of was to reduce the decibel differencerequired for half adjustmentfor the ratios «, -•, ½-•,1/100, 2, 4, 10, 100, loudnessby 2.7 db. and also the method of magnitude estimation both as- Using the method of magnitudeestimation, Poulton cendingand descending.Sixteen subjectswere to go did a variety of experiments.The purposeof one of through this array of tasks in different orders, all them (reportedin reference9) was to explorethe effect counterbalancedin accordancewith the precepts of of the particular numbersused by the subjectsto report modernstatistical design. It lookednice on paper, but their judgments.Eight experiencedsubjects sometimes we have run into two difficulties.We are finding in were told that the standard (100 db) was 100 and that practice that a few of the subjects "can't take it." the comparisonswere to be judged accordingly.Some- They are requiredto shift their criterion so drastically timesthey weretold the standardwas 1 and that they and so often that consistencybecomes almost impossi- were to say what fractionalpart of 1 seemedto repre- ble. Some of them complain of losing confidence,and sent the comparisontone. These two proceduresgave their resultsshow it. The seconddifficulty has arisen very similarresults, as can be seenfrom the interpolated from the fact that we unwiselyset someof the levels so estimatesof half loudnessand quarter loudnessentered ßthat the subjecthad to use the extremelow end of the in Table I. These subjects also rated louder tones sonepotentiometer where a very small turn of the knob relative to a standard(60 db) called1, and theseresults makesa large changein the loudnessratio. Apparently, were used to estimate twice and four times loudness if we are to determine how loud different tones sound to (Table II). people,we must be careful to set for them simple,un- In another test with magnitude estimation, 32 in- confusingtasks, and we must stop our experimentsbe- experiencedsubjects heard a 100-dbstandard which was fore the listenersbecome uncritical in their judgments.

Redistribution subject to ASA license or copyright; see Download to IP: On: Tue, 25 Aug 2015 06:45:44