Psychological Review
1971, Vol. 78, No. 1, 58-70

PUNISHMENT: METHOD AND THEORY 1

PHILIP J. DUNHAM 2 Dalhousie University

A methodological framework for the analysis of punishment is outlined. The methodology, which is called a multiple-response base-line procedure, serves two purposes. First, it raises a number of new questions about the properties of punishment. Second, it permits the examination of some untested assumptions found in traditional punishment theory. Initial evidence obtained with the multiple-response methodology questions the validity of traditional theoretical assumptions and suggests two simple rules for predicting the properties of various punishment operations.

1 Research reported in this paper was supported by Project Grant APA-194 from the National Research Council of Canada. Many valuable suggestions have been made by different individuals during the preparation of this manuscript. The author is particularly grateful to N. J. Mackintosh, C. J. Brimer, and B. Moore for their critical comments.

2 Requests for reprints should be sent to Philip J. Dunham, Department of Psychology, Dalhousie University, 1460 Oxford Street, Halifax, Nova Scotia, Canada.

When an aversive stimulus is contingent upon the occurrence of a particular response, a decrement in the probability of the response is usually observed. This procedure is typically called punishment and the decrement in response probability is called punishment suppression. The basic purpose of this paper is to delineate some fundamental problems with existing punishment theory and to suggest an alternative approach to the problem of punishment.

An Overview of Punishment Theory

Historically, there have been two fundamental assumptions used to explain the phenomenon of punishment suppression. The first of these assumptions was the strong version of the negative Law of Effect proposed by Thorndike (1913). Thorndike assumed that any painful or unpleasant event would weaken the response (or assumed S-R bond) which preceded that event. Thorndike (1932) subsequently rejected this notion and it has not enjoyed any serious attention since that time. The second fundamental assumption suggested to account for the punishment suppression phenomenon has been referred to as the alternative-response assumption (cf. Dunham, Mariner, & Adams, 1969). In its simplest form, the assumption states that the decrement in a punished response is caused by an increment in some alternative behavior.

All contemporary explanations of punishment suppression are specific elaborations of this alternative-response assumption. Those specific elaborations which have been most formalized fall into two major categories. These categories are referred to as single-process and two-process theories of punishment (cf. Solomon, 1964). The earmark of the single-process theory is the assumption that only one type of learning mechanism is involved in the development and maintenance of the alternative response during punishment training. Two types of single-process theory have been suggested and are differentiated in terms of suggesting either a classical or an instrumental conditioning mechanism. Estes and Skinner (1941), for example, suggested that emotional responses elicited by the punishing event are classically conditioned to stimuli which precede the punishing event. The classically conditioned behavior is assumed to compete with the punished response and cause the suppression. Miller and Dollard (1941) exemplify the instrumental conditioning version of single-process theory. They suggested that any response which is associated with the termination of the punishing stimulus will be instrumentally conditioned as a response which escapes pain and competes directly with the punished response.

Two-process punishment theories specify two different learning mechanisms which are sequentially involved in the development and maintenance of the assumed alternative response. Dinsmoor (1954, 1955) and Mowrer (1947) are formal examples of the two-process explanation of punishment suppression. Dinsmoor suggested that the proprioceptive stimulus feedback from the punished response acquires secondary aversive properties via classical pairings with the primary aversive event. Any response which is instrumental in disrupting this chain of conditioned aversive stimulation will develop and be maintained as a response which competes with the punished response. Hence, two learning mechanisms, first classical then instrumental, are assumed to operate in the development of the alternative behavior. Mowrer's (1947) version of two-process theory substitutes the notion of conditioned fear for the notion of aversive stimulation conditioned in the Dinsmoor theory.

There are two basic implications of any version of the alternative response theory. First, there is the implication that some alternative behavior will develop and be maintained during punishment training. Second, there is the implication that this alternative response causes the reduction in the punished response. Presumably the former implication could be confirmed independent of the latter. But the latter could not be confirmed if the former were false. In spite of the substantial amount of research on punishment in the last decade, there is no direct evidence to support either of the above implications of the alternative-response assumption. As Azrin and Holz (1966) have stated, the typical procedure in punishment research has been to infer the presence of the alternative response from the absence of the punished response. A minimal requirement for testing the assumption is the measurement of the alternative response independent of the phenomenon which it seeks to explain.

Two reasons can be suggested for the lack of direct evidence bearing on the alternative-response assumption. In the case of single-process theories, contingencies are specified which make the response elicited by the punishing event a prime candidate for conditioning during the suppression of the punished response. It is not a profound observation to note that punishment procedures which have traditionally been employed involve organisms, aversive stimuli, and apparatus which make it difficult to measure those responses which are suspected to participate in the relevant contingency. These "emotional" behaviors have not been recorded on impulse counters and this has prevented the accumulation of any evidence concerning changes in their probability during punishment training. The major problem with single-process versions of the alternative-response assumption at this point would appear to be the lack of an adequate methodology to test what would appear to be very testable implications.

With respect to the two-process theories, the problem is more serious. As Schuster and Rachlin (1968) have suggested:

Because both the reinforcer and the response are unobserved and unobservable, the two factor theory of punishment poses a serious problem for the experimenter who wishes to test it: how can it be disproved? All the critical events are assumed to occur within the organism being punished [p. 784].

In addition to the specification of the critical contingencies "inside" the organism, it should be noted that any increase in some alternative behavior observed during punishment can be taken as support for the operation of the assumed internal contingency. Hence, the measurement of the unobserved behavior referred to by Schuster and Rachlin does not make the positions any more susceptible to disproof.

The picture which emerges from this brief overview of punishment theory and research is that there has been a lack of interaction between punishment theory and punishment data. The lack of any data relevant to the most fundamental theoretical assumptions has permitted the alternative response interpretations to persist as originally formulated.
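
The minimal requirement stated in the overview, that the alternative response be measured independently of the punished response, can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: response probability is taken to be the proportion of session time spent in each response, the response names and the function names `time_proportions` and `changes` are hypothetical, and every duration is invented for the example rather than taken from any experiment. It shows that a decrement in the punished response and an increment in a particular alternative are two separate observations, each of which has to be recorded.

```python
from typing import Dict

# All names and numbers here are hypothetical; none come from the paper's data.

def time_proportions(durations_s: Dict[str, float], session_s: float) -> Dict[str, float]:
    """Convert seconds spent in each measured response to a proportion of session time."""
    if session_s <= 0:
        raise ValueError("session_s must be positive")
    if sum(durations_s.values()) > session_s:
        raise ValueError("measured durations exceed the session length")
    return {name: seconds / session_s for name, seconds in durations_s.items()}


def changes(baseline: Dict[str, float], training: Dict[str, float]) -> Dict[str, float]:
    """Per-response change in proportion from baseline to punishment training."""
    return {name: training[name] - baseline[name] for name in baseline}


SESSION_S = 1800.0  # an assumed half-hour session

# Invented durations (seconds) for one punished response and two measured alternatives.
baseline = time_proportions(
    {"punished_response": 900.0, "alternative_a": 540.0, "alternative_b": 180.0},
    SESSION_S,
)
training = time_proportions(
    {"punished_response": 180.0, "alternative_a": 1080.0, "alternative_b": 180.0},
    SESSION_S,
)

delta = changes(baseline, training)

# Observation 1: the punished response is suppressed.
suppressed = delta["punished_response"] < 0

# Observation 2, recorded separately: which measured alternatives actually increased.
increased = [
    name for name, d in delta.items()
    if name != "punished_response" and d > 0
]

print("punished response suppressed:", suppressed)
print("alternatives that increased:", increased)
```

In this invented case only one of the two alternatives increases, which could not have been inferred from the suppression of the punished response alone.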

In the next section of the discussion, a response probability by dividing the amount methodological approach to punishment will of time spent in a particular response state be described which can serve two functions by the total time possible (cf. Premack, in the context of the existing punishment 1965). Any type of behavior can be mea- literature. First, the methodology permits sured in terms of its duration, including the us to examine some questions about the class of behavior which is labeled "doing properties of punishment which have not nothing." With the appropriate manipu- previously been considered. Second, it landa for a particular organism and control permits us to examine some previously un- of the relevant parameters, a multiple-re- tested implications of existing punishment sponse steady-state base line of behavior, theory. in which several measurable responses are observed to fill experimental time, can be A Methodological Approach to Punishment established. No shaping or active manipula- The methodological approach to be sug- tion of contingencies is assumed to be nec- gested is called a multiple-response base-line essary. Assume, for the purposes of discus- procedure. It can be applied to a variety of sion, that our gerbil cooperated and filled problems in addition to punishment, and the half hour with drinking (p = .1), eating when viewed in the context of traditional (p = .3), and paper shredding (p = .6). methodology, it falls between a typical oper- Once the multiple-response base line has ant procedure, in which a single response is stabilized, we are in the interesting position shaped under constraints, and typical etho- of actively manipulating the organism's en- logical procedures, in which behavior is ob- vironment and assessing the effects of such served without external laboratory con- manipulations on all of the responses in the straints. The most convenient way to de- repertoire. 
Obviously, the manipulations of scribe the essential features of the methodol- most interest in the present context are those ogy is to elaborate on a specific hypothetical which we call punishment operations; how- example which is representative of several ever, it is instructive to digress briefly and obvious variations on the basic approach. consider a simple manipulation like making The hypothetical example should be noted a running wheel available. The most visible with some care since it will be approximated effect of making the running wheel available in reality when experimental evidence is sub- will be introduction of a running response sequently discussed. into a repertoire of behavior which already Consider a small animal chamber with fills experimental time. If running occurs, grid floor and three sources of enjoyment by definition, there will be a decrement in for the small rodent commonly called a Mon- the observed probability of one or more of golian gerbil (Meriones unquiculates). The the existing responses in the repertoire. three items of interest in the chamber are a Curiously, there is little more than intui- food bin with an unlimited supply of stan- tion to tell us how the organism will re- dard Noyes pellets, a drinking tube with organize his response hierachy to accommo- unlimited supply of water, and adding ma- date the running response. Will he sacrifice chine paper which is threaded through a a little bit of each of the existing behaviors ? slot in the wall of the chamber. Will he select one response and sacrifice it The reader familiar with the behavior of for running privileges? If the latter, what this curious rodent will not be surprised to is the rule of response selection? 
find that the gerbil will spend much of a If an aversive stimulus like electric shock half-hour daily session in the chamber shred- is introduced, one is faced with roughly the ding the adding machine paper, eating the same problem as that posed by the introduc- food pellets, and drinking from the tube, tion of the running wheel. Shock, as an un- in that order of preference. It is relatively conditioned stimulus, will define a certain easy to record the duration of each of these probability of unconditioned behavior which three behaviors during the session and con- must be assimilated into a response hier- vert these duration measures to a scale of archy. By definition, some decrement in the PUNISHMENT: METHOD AND THEORY 61 probability of one or more of the existing various temporal schedules which are responses must take place. Again, we are arranged independent of the organism's be- not sure how the organism alters the existing havior. In future discussion, the terms preference structure to accomodate this addi- response contingent and noncontingent will tional behavior, but the rules which describe be used to describe these generic operational such alterations would be of importance in categories. Some measure of the compre- predicting the immediate effects of any va- hensiveness of any theoretical framework is riety of punishment operation on responses provided by its ability to subsume data in in the repertoire. both operational categories. Several of these The point to be made with the two pre- operations, most often employed in single- ceding examples is that some manipulations response operant research, reveal interest- which can be used in the context of a multi- ing new dimensions when considered in the ple-response base line have the property of multiple-response context. adding a response to the existing repertoire Consider the response-contingent punish- of behavior. 
Punishment procedures are one ment operation where one response is se- such manipulation, and it is suggested that lected from the repertoire and the onset of procedures be arranged which permit the that response is followed immediately by a priori specification of the response which an instance of shock. The suppressive will be added and subsequent measurement effects of this operation on the referent be- of that response during punishment training. havior are well known. However, the opera- In this respect, the selection of the gerbil tion defines a version of Sidman avoidance was fortuitous. The response which we have contingency for all other responses in the observed to be associated with the introduc- organism's repertoire, including the response tion of shock is a vigorous biting and chew- elicited by the shock event. In the gerbil ex- ing of the grid floor. As reliable, if not ample, response-contingent shock for eating as desirable, is an aggressive attack on also defines a very effective Sidman avoid- another gerbil if the target animal is pro- ance contingency for paper shredding. The vided. more time spent paper shredding, the longer When one considers the variety of ways the interval between shocks and the fewer in which an aversive event like electric shock shocks received. Does the gerbil adjust his can be introduced into a multiple-response behavior in a manner suggested by the procedure, a number of empirical questions contingency ? are generated which have not received pre- Consider a fixed-interval noncontingent vious experimental attention. A brief con- shock procedure. For a given density of sideration of a few of these questions will shock, the probability of each response in illustrate the heuristic value of the multiple- the organism's repertoire will determine the response base-line procedure. number of shocks which are associated with The traditional literature dealing with that particular response. 
In the gerbil ex- shock provides us with two major classes ample, paper shredding occupies over half of operation which are called punishment the experimental session, hence more shocks and avoidance, with variations on each will arrive during paper shredding than dur- theme. In the context of the multiple-re- ing any other response in the repertoire. sponse base-line procedure, an alternative Of more interest, with shocks delivered at operational dichotomy is suggested. Spe- fixed intervals of time, that response which cifically, the shock event can be delivered most consistently follows the shock event according to a program which makes refer- will most consistently predict the longest ence to one or more responses in the reper- interval of "safety." Again, one must ask if toire, or according to a program without the organism changes his behavior accord- reference to the organism's response reper- ing to these implicit contingencies defined by toire. The former includes traditional re- the fixed-interval noncontingent schedule. sponse-contingent punishment and Sidman A phenomenon reported initially by Morse, avoidance operations; the latter includes Mead, and Kelleher (1967) is perhaps rele- 62 PHILIP J. DUNHAM vant. These investigators reported that initially predicts safety for a longer period monkeys which are exposed to fixed-inter- of time than any other response in the val noncontingent shock will aggressively organism's repertoire. bite a rubber tube made available after each The converse situation can also be sug- shock is delivered. The postshock elicited gested. If we give the elicited response biting behavior would be the response in the the initial advantage of consistently pre- organism's repertoire which most consist- dicting the longest safe interval of time, yet ently predicts the longest interval of safety. 
attempt to train a different response as an As suggested by the preceding analysis, the avoidance behavior, it would not be sur- animal changes its biting behavior accord- prising to observe the animal attempting ini- ingly. The amount of biting behavior in- tially to develop the elicited behavior to the creases during extended punishment train- detriment of the avoidance response. This ing and eventually fills the shock-shock would be particularly true in the case of interval with a "scallop" in the rate of biting constant shock-shock intervals (e.g., the which appears at the end of the interval. Morse, Mead, & Kelleher, 1967, phenome- Consider also a Sidman avoidance pro- non), as opposed to variable shock-shock cedure with a constant shock-shock interval intervals where the predictive properties and constant response-shock interval. Once of the elicited response are less evident. we select a single response from the multi- This also leads to the suggestion that a ple-response base line as an avoidance re- variable shock-shock interval will produce sponse, all other responses in the repertoire faster learning of avoidance behavior when are initially punished according to a non- the avoidance response is other than the contingent shock schedule, with the shortest shock-elicited response (cf. Bolles & Popp, possible intershock interval being the Sid- 1964). man shock-shock interval. Which of the Finally, it is of some interest to consider responses in the organism's repertoire should the variations in temporal schedule which one select for the avoidance response for are possible with noncontingent shock. We efficient learning? It would seem obvious have already discussed the fixed-interval that the response elicited by the shock would case; now consider some implicit contingen- enjoy some advantages not enjoyed by the cies established by variable-interval cases. other responses in the repertoire. 
First, that When programming shocks to occur at vary- response will predict the longest safe interval ing intervals in time, the distribution of between shocks even if the avoidance con- intervals may be an important consideration. tingency were not in effect. Second, it has The typical variable-interval tape program a relatively high probability during initial is a rectangular distribution of intervals with training sessions, hence it will sample an some guaranteed minimum interval. It is implicit avoidance contingency quite often. typically described in terms of its arithmetic Third, if it is explicitly given the avoidance mean and delivers the shocks with a random property of delaying shock, there should be sequence of intershock intervals. Of more few responses which will compete with it for interest is an exponential distribution of rapid learning. This suggestion is in line intervals which will deliver the same num- with Holies' (1970) view that the organism's ber of shocks at varying intervals, but is a innate species-specific defense reactions continuous distribution. The probability of (SSDRs) to an aversive stimulus are most shock at any "moment" in time is a fixed readily learned as avoidance responses. I value and the animal has no programmed would suggest, however, that the rapid learn- interval which is guaranteed to be safe. In ing has nothing to do with their "innate typical punishment studies where a single defensive" properties. It is the property of operant response is shaped, trained, and immediately following the shock event which measured, the effects of shocks delivered optimizes the conditions for avoidance learn- according to fixed intervals, rectangular dis- ing—that is, not only does the response tributions of variable intervals, and exponen- eventually reduce shock frequency, but it tial distributions of variable intervals, may PUNISHMENT: METHOD AND THEORY 63 be the same—the operant is suppressed. tingent shock procedures. 
Basically, the However, in the multiple-response proced- two-process theories suggest that any re- ure, very different effects may be found with sponse which disrupts the chain of dis- the three distributions. As mentioned ear- criminative stimulus conditions which pre- lier, the elicited response enjoys the advant- cede the shock event will increase in prob- age of predicting a safe period most con- ability and be maintained during punishment sistently in the fixed-interval shock schedule. training. In the case of response-contingent This would be true to a lesser degree of punishment, these theories make the general variable-interval schedules with rectangular prediction that any response other than the distributions. The elicited response would punished response might increase in prob- most consistently predict the minimum inter- ability. They do not provide us with a re- val of safety programmed in that distri- sponse-selection rule to tell us if one or all bution. In the exponential distribution, of the unpunished responses increases. In however, no response consistently predicts the case of noncontingent shock operations, a minimum safe interval. According to the the two-process theories fail to make any program, shock is equally probable at any meaningful prediction. When a nonconting- point in time. This removes those implicit ent shock operation is employed, every re- contingencies for certain responses which are sponse in the organism's response repertoire most evident in fixed-interval schedules. will be part of a discriminative chain of stim- Again the question which must be asked is uli which terminates with shock on some whether or not the organism adjusts his be- occasions. Hence, it is impossible to suggest havior according to the presence and absence that any response in the repertoire partici- of these implicit contingencies. 
pates in an instrumental contingency which The preceding examples are intended to disrupts the chain of conditioned aversive illustrate two points: first, the heuristic stimulation. value of the multiple-response analysis in terms of generating testable questions; sec- Some Rules for Prediction ond, the sparsity of evidence relevant to The predictions which the single-process these questions. This sparsity is understood and two-process explanations of punishment when one recognizes the emphasis which has suppression make in the context of the multi- traditionally been placed on single-response ple-response methodology have been dis- measurement in both free operant and dis- cussed in the preceding section. Prior to crete-trial punishment research. considering some evidence, I would like to The discussion of the multiple-response suggest some alternative rules for predicting methodology can be concluded with a brief the effects of a variety of punishment opera- consideration of the predictions which single- tions on the various behaviors measured in process and two-process punishment theories the multiple-response procedure. The rules would make in the context of the hypo- are, at this point, tentative, and a systematic thetical gerbil procedure. With respect to analysis of punishment in the context of a single-process theory, the use of any type multiple-response base line may modify or of punishment operation will introduce the contraindicate these initial suggestions. grid-biting response into the organism's re- Once the immediate change in perform- sponse repertoire. 
This grid-biting response ance caused by the introduction of an uncon- should be a prime candidate for both classical ditioned behavior into the repertoire has conditioning (Estes & Skinner, 1941) and taken place (shock is introduced), the or- instrumental escape conditioning (Miller & ganism has a hierarchy of responses de- Bollard, 1941) since it follows the shock scribed in terms of response probability. The event more often than other responses. The two rules which attempt to predict the fate predictions of the two-process versions of the of each of the responses in the organism's alternative-response assumption must be con- repertoire during subsequent aversive train- sidered separately in terms of predictions ing are the following: (a) That particular about the effects of contingent and noncon- response in the organism's repertoire which PHILIP J. DUNHAM is most frequently associated with shock sists of a series of successive approxima- onset and/or predicts the onset of shock tions to the multiple-response methodology within a shorter time than other responses which has been discussed throughout this will decrease in probability and remain be- paper. Many of the thoughts developed in low its operant base line. (&) That par- the preceding discussion of punishment orig- ticular response in the organism's repertoire inated with a serendipitous observation made which is most frequently associated with the while studying the effects of punishment on absence of shock onset and/or predicts the key pecking in pigeons (Dunham et al., absence of shock onset for a longer period 1969). Dunham et al. observed that pigeons of time than other responses will increase trained to peck a response key for grain on in probability and remain above its operant a variable-interval schedule missed the key base line. and hit the wall area adjacent to the key with Consider the spirit in which these two a certain number of pecks. The introduction rules are formulated. 
I am suggesting that of key-peck response-contingent punishment there are two basic contingencies of import- suppressed key pecking and increased the ance in any operation for delivering the frequency of off-key pecks during punish- shock. First, there is an instrumental ment. These results suggested that an un- punishment contingency. It has two im- punished, response, other than the shock- portant dimensions. A response can be more elicited response, will increase in probability frequently associated with shock onset than and be maintained during punishment train- other responses, or it can predict a given ing. frequency of shock onset within a shorter Subsequently, a more deliberate approxi- period of time, or both. Thus, the two di- mation to the multiple-response methodology mensions of the contingency are frequency was attempted, taking advantage of the phe- and time. Second, there is an instrumental nomenon of schedule-induced polydipsia (cf. avoidance contingency. It also has two Falk, 1966) in order to establish two steady- important dimensions. A response can be state responses which occupied a large por- more frequently associated with the absence tion of experimental time.3 Falk (1966) of shock onset, or can predict the absence of reported that rats trained to lever press for shock onset for a longer period of time, or food pellets on a variable-interval schedule both. will indulge in an excessive amount of drink- In traditional single-response operant me- ing during an experimental session if a thodology, the time not spent on the operant drinking tube is made freely available. Using is usually assigned the label of "nonresponse" Falk's standard procedure, two rats were and is assumed to be a homogeneous mass of trained one hour each day to the point of behaviors, (cf. Rachlin & Herrnstein, 1969). 
polydipsic drinking, and a mild .2-milliamp- In spite of their tentative nature, the two ere, .5-second shock was introduced which rules described above should make it obvious was contingent on the lever press response that it may be of more value to recognize (Subject 1) or on the drinking response the "heterogeniety" of the nonreponse class (Subject 2). With the relatively mild shock —at least in terms of response probability intensity to minimize the probability of the differences. Further empirical analysis may unconditioned (and unmeasured) responses reveal such factors as the sequential depend- elicited by shock, it was assumed that drink- encies between different responses to be an ing would be the most probable unpunished important determinant of the organism's ad- alternative if lever pressing were punished, justment to aversive contingencies. This and that lever pressing would be the most criticism of the single-response operant ap- probable unpunished alternative if drinking proach is equally applicable to the procedures which employ appetitive contingencies. were punished. According to the rules out- lined in earlier discussion, the punished re- Some Evidence sponse was predicted to be suppressed, and Over the past year, the research com- 3 The author thanks Jon Little for his assistance pleted by the author and his associates con- with this experiment. PUNISHMENT: METHOD AND THEORY 65

the most probable unpunished alternative in the response repertoire was predicted to increase in probability following some degree of initial disruption by the addition of shock-elicited behavior to the response repertoire.

The results of the initial polydipsia training phase, the first punishment training phase, a recovery phase, and a second punishment training phase are illustrated in Figure 1. During initial training, the animals developed polydipsic levels of water intake similar to those usually reported by Falk (1966). In the initial punishment phase, Subject 1 was shocked for lever pressing and Subject 2 was shocked for drinking. In both cases, the punished response was observed to be suppressed immediately and the unpunished alternative behavior was observed, following initial disruption, to increase to levels which exceeded the established prepunishment base-line level. The results obtained with Subject 1 were very surprising. In spite of the aberrant (polydipsic) base line of water intake prior to punishment, the response was observed to exceed that base line during punishment sessions. In the third phase of the procedure, the punishment contingencies were removed and recovery was observed for 14 sessions. With the removal of shock, all responses increased to levels even higher than preceding base lines. This overshoot following removal of punishment is typical of the recovery of punished responses (cf. Azrin & Holz, 1966). It is interesting to note that overshoot occurs with both the punished and unpunished alternatives, which were measured in the present procedure. Following the 14 days of recovery, punishment contingencies were again introduced, this time for the response which had not been punished during the first punishment phase. Basically the same phenomena observed during the first punishment phase were observed during the second. The punished responses were suppressed and the unpunished alternatives were disrupted and then increased in probability. Unfortunately the increase in drinking observed in Subject 2 did not consistently exceed the prepunishment base line. This may be the result of the ceiling problem on water intake referred to earlier.

The final line of evidence from the author's laboratory work to date was an attempt to approximate the hypothetical gerbil procedure used as an example throughout this article. There were several reasons to extend experimentation beyond the case of two highly probable responses to three or more measured responses. First, the tentative rules specify that only one of the several unpunished alternative responses will increase in probability during punishment training. In order to test this prediction, more than one alternative must be measured. Second, a test of single-process theories is possible only if the unconditioned response introduced by the shock event is measured. Third, an alternative approach to the data in the polydipsia experiment would be the suggestion that the shock has some general arousal property which increases all unpunished responding. The demonstration of a single-response increase would make such an arousal mechanism less appealing than the type of contingency analysis suggested by the tentative rules.

To answer these questions, Kennedy Muyesu-Kaisha Munavi conducted a burdensome experiment in the author's laboratory which was a very close approximation to the hypothetical gerbil procedure described earlier. Nine gerbils were randomly assigned to one of three groups. Each gerbil was placed in a small response chamber with food bin, drinking tube, and adding machine paper for a daily half-hour session. The total duration of each of the three behaviors was recorded on elapsed-time meters by an observer looking at the animals through a one-way glass window. When a gerbil entered the chamber, it had been permitted access to only 80% of its normal ad lib intake of laboratory chow during the preceding 23½ hours, and the only water permitted was that available during the experimental session. Under these conditions, the gerbils managed to fill approximately 25 of the 30 minutes available each session with paper-shredding, eating, and drinking, in that order of preference. It should be noted that records were kept of the amount of food and water consumed during base-line observations. Subsequently, if eating
was the punished behavior, the animal was permitted to make up any deficit in food intake during the experimental session in its daily ration one hour after the experimental session. Hence, there was no confounding of changes in either the total daily food intake or water intake with the introduction of the shock (cf. Dunham & Kilps, 1969).

[Figure 1 plot not reproduced. Panels: S-1 and S-2; phases: pre-shock, shock, recovery, shock; series: lever, drink; horizontal axis: two-session blocks.]
FIG. 1. Mean number of punished and unpunished responses plotted in two-session blocks for Subject 1 (S-1) and Subject 2 (S-2) in the polydipsia procedure.

After a base line of three behaviors was established, Group E was punished for eating, Group P for paper shredding, and Group D for drinking. The shock was a .2-milliampere, .5-second shock delivered at the onset of the referent behavior and continued at 2-second intervals until the referent response was stopped. At this point in the experimentation with gerbils, it was not known what elicited behavior or behaviors would be introduced with this shock intensity. After running four animals through the first session of punishment it was obvious that all animals were attacking the grid bars through which the shock was delivered. At this point a record was started of the duration of grid biting as a response which consistently occurred with the shock in all nine animals. Hence, we have the probability of grid biting during the first punishment session for only five of the nine subjects in this initial experiment. After the first session, grid biting was measured along with the three other responses during each session.

Prior to examining the results of the experiment, it is of some value to consider the predictions made by the rules when applied to this situation: (a) the rules suggest that the response most frequently and immediately associated with shock will decrease in probability and remain below its operant base line; it is the referent punished response in the response-contingent procedure; (b) there will be an immediate disruption of all responses in the repertoire upon the introduction of the grid-biting response into the three-response repertoire which already nearly fills the available time; (c) after the initial disruption, the response in the repertoire which is most frequently associated with the absence of shock for the longest period of time will increase in probability to levels which exceed the prepunishment base line. This response should be that response which is most probable on the assumption that the most probable behavior will sample the implicit avoidance contingency most often.

Subsidiary predictions are also implicit in the rules. For example, if grid biting is not the most probable unpunished alternative behavior, it will drop out of the repertoire as the shock frequency declines (punished response suppressed). The latter suggestion is contrary to the predictions of the traditional single-process punishment theories. If, however, the grid-biting response is the most probable of the unpunished alternatives, it will be maintained even in the absence of the shock UCS during punishment training.

The results of the gerbil experiment for each subject in each of the three groups are presented in Figure 2.

[Figure 2 plot not reproduced. Series: paper-shredding, eating, grid-biting, drinking; horizontal axis: two-session blocks.]
FIG. 2. Probability of punished and unpunished responses for the three subjects in each of the three groups in the gerbil procedure. (Note that all data points are two-session blocks with the exception of the first punishment session which is plotted as a single-session point.)

Consider first the two results which are common to all three groups and which are of primary interest in terms of the contingency rules. First, in all cases, the instrumental punishment contingency was sufficient to suppress the referent behavior. Second, and of more interest, in all cases, only one of the alternative responses in the repertoire was observed to increase in probability to levels which exceeded the base-line probability. These two results are taken as support for the operation of the contingencies outlined in preceding discussion.

Consider next the results specific to each group. Of particular interest in this context is the response which is selected to increase in probability during the punishment training. The subjects in Group E were punished for eating, and all three of the subjects were observed to increase the probability of paper shredding following an initial period of disruption. The subjects in Group P were punished for paper shredding, and in all three subjects the probability of grid biting increased during punishment training. Group D was punished for drinking and revealed inconsistent results. Subject 1 and Subject 2 increased the probability of paper shredding, while Subject 3 increased the probability of grid biting during punishment.

To what extent are the responses observed to increase in each group predicted by the contingency rules? The rules suggest that the response which most frequently samples the avoidance contingency in early sessions will be the most probable of the safe alternatives. In spite of the fact that the probability of the grid-biting response was not measured during the first session in four of the animals studied, it is safe to assume that two responses were the most probable unpunished behaviors during early punishment sessions. These are the grid-biting and paper-shredding responses. Both would sample the avoidance contingency more frequently, and the grid-biting response which followed the shock event would predict the absence of shock for a longer period of time more consistently than the other responses which were not sequentially dependent on the shock event. For these reasons, we would expect all subjects to reveal an increase in either grid biting or paper shredding. However, there were some instances in which grid biting was initially more probable than paper shredding, yet paper shredding was observed to increase to levels above base line as grid biting dropped out (e.g., Group E, Subject 1). Further research, in which more precise measurements are obtained of the degree to which various responses sample both the frequency and time dimensions of the two contingencies, will provide a more critical test of the suggested rules and perhaps suggest refinements.

In addition to suggesting that the methodology and the contingency analysis are tractable approaches to the punishment problem, the gerbil data question the validity of the single-process versions of the alternative-response assumption. Stated quite simply, those cases in the present situation which revealed a gradual decline in the probability of grid biting over the course of punishment training are exactly the opposite of the changes expected from traditional single-process assumptions. This is consistent with observations of shock-associated behaviors in other situations (cf. Dunham, et al., 1969). The emotional behaviors generally drop out within a very few sessions of response-contingent punishment training. Of course, this decrement in grid-biting behavior would not be expected in the case of noncontingent punishment in the sense that the UCS is maintained at a particular frequency independent of the changes in behavior.

The data also have implications for the two-process interpretations of punishment suppression. Basically, the very general prediction that some alternative behavior will increase is supported. It is questionable, however, to think that the increase in the alternative behaviors observed in the present experiments could account for the suppression in the punished response. The time courses of the two transition processes are very different. The punished behavior is suppressed very rapidly relative to the initial disruption and slow rise of the alternative behavior.

The preceding evidence involves the response-contingent punishment operation. The author has not yet started an empirical analysis of the noncontingent punishment procedures. There are, however, several studies in the literature which are relevant to the predictions made by the rules discussed in this article as applied to noncontingent procedures. As indicated in earlier discussion, the organism cannot, by definition, reduce the shock frequency when a noncontingent operation is employed. This leaves the temporal dimension of the contingency in an important role. The response which consistently follows the shock event will predict the absence of the next shock onset for a longer length of time than any other response in the repertoire. The phenomenon reported by Morse et al. (1967) in which an elicited behavior develops and modulates under fixed-interval noncontingent shock suggests that the organism is sensitive to the temporal dimension of the avoidance contingency. More recent evidence reported by Powell and Creer (1969) can be subjected to a similar interpretation. These investigators delivered noncontingent shock at the rate of 100 shocks per session to pairs of rats and recorded the frequency of aggressive attacks per session. The authors did not specify the intershock interval or the session length in their procedure. Ten successive days of noncontingent shock sessions were conducted and measures of aggressive behavior indicated that an increase in the amount of aggression occurred over the course of the experiment. Although the experiment was designed to determine the effects of maturation (among other things) on aggressive behavior, it is suggested that the 10 days of maturation were confounded with 10 days of avoidance training during the experiment. Specifically, the aggressive behavior of these rats predicted the absence of shock onset for a longer period of time than any of the other (unmeasured) behaviors in the organism's repertoire. Similar to the monkeys in the Morse et al. procedure, the rats in the Powell and Creer experiment are assumed to have recognized this avoidance contingency and adjusted the behavior appropriately.

Assuming for the moment that the avoidance interpretation of the Powell and Creer experiment is correct, the present rules have implications for the development of aggression; namely, that certain stimuli will elicit aggressive behaviors, and under a variety of procedures used to deliver the stimuli, the aggressive behavior will develop as an avoidance response and be maintained in the absence of the primary UCS. Under other conditions, the avoidance contingency is not present and one should observe aggression at the unconditioned level. I know of no theoretical account of aggression which implicates an avoidance contingency in the development and maintenance of aggression in animals. Additional work with temporal schedules of aversive stimulation which eliminate this avoidance contingency should reveal the extent to which it is involved in the conditioning of the aggressive behavior of various species.

Some Concluding Remarks

In the absence of more data, it would be premature to attempt to elaborate on the type of theoretical mechanism which is implied by the empirical rules which have been outlined in the preceding discussion. I would like, however, to conclude by suggesting briefly the general directions which one might take in the development of such a mechanism. First, the data and arguments which have been discussed suggest very strongly that there is little to recommend those traditional theoretical accounts of punishment which start from an alternative-response assumption. In place of this assumption, the rules which have been outlined imply a symmetrical conditioning mechanism in which (a) those responses which predict the aversive event are actively inhibited and (b) those responses which predict the absence of the aversive event are actively excited.

History provides at least two examples of symmetrical excitatory and inhibitory mechanisms which deal with appetitive and aversive events. If one is biased toward Pavlovian conditioning mechanisms, it is possible to conceptualize the multiple-response methodology as a multiple-CS methodology in which different responses are viewed as different CSs which predict, to varying degrees, the presence or the absence of the unconditioned stimulus (aversive or appetitive). Once conceptualized in terms of Pavlovian operations, some version of Pavlov's (1927) concepts of inhibition and excitation can be developed to explain the effects of punishment on performance. Konorski's (1967) inhibitory and excitatory processes and some recent modifications of these ideas (e.g., Maier, Seligman, & Solomon, 1969) should, for example, be considered in the context of the multiple-response punishment procedure.⁴

4 A basic assumption in Konorski's theorizing is the notion that noxious stimuli such as shock elicit a drive state, fear, which has general inhibitory effects on "preservative" drive states such as hunger, thirst, etc. Punishment would be predicted to have a suppressive effect on all responses motivated by "preservative" drive states. In the experiments conducted thus far, the inhibitory effects which we have observed are specific to the punished response. For example, when the gerbils were punished for eating they continued to drink the usual amount each session. Hence, if Konorski's inhibition mechanism were to be considered in the context of punishment, the inhibitory properties
of shock would have to be more "drive" or "response" specific than he has assumed. Estes (1969, p. 69), in his more recent theorizing, makes an assumption very similar to Konorski's. Specifically, he suggests that the activation of negative drive systems (e.g., shock-produced attack or flight) results in the inhibition of positive drive systems, which, in turn, accounts for the decrement in punished responding. Again, it should be noted that the gerbil data suggest that the only positive drive system which appears to be permanently inhibited during punishment is the specifically punished drive system, or its associated response.

Alternatively, if one is biased toward instrumental conditioning mechanisms, it is possible to conceptualize the multiple-response methodology in terms of instrumental contingencies in which each response has either rewarding or punishing consequences. Once conceptualized in terms of instrumental operations, some version of Thorndike's (1913) positive and negative Law of Effect can be elaborated upon as a symmetrical reinforcement mechanism. Rachlin and Herrnstein (1969) have recently made such a suggestion.

In either case, it is hoped that the multiple-response base-line procedure will help to restore the interaction between punishment theory and punishment data which has been lacking in the punishment literature.

REFERENCES

AZRIN, N. H., & HOLZ, W. C. Punishment. In W. K. Honig (Ed.), Operant behavior: Areas of research and applications. New York: Appleton-Century-Crofts, 1966.
BOLLES, R. C. Species-specific defense reactions in avoidance learning. Psychological Review, 1970, 77, 32-48.
BOLLES, R. C., & POPP, R. J., JR. Parameters affecting acquisition of Sidman avoidance. Journal of the Experimental Analysis of Behavior, 1964, 7, 315-321.
DINSMOOR, J. A. Punishment: I. The avoidance hypothesis. Psychological Review, 1954, 61, 34-46.
DINSMOOR, J. A. Punishment: II. An interpretation of empirical findings. Psychological Review, 1955, 62, 96-105.
DUNHAM, P. J., & KILPS, B. Shifts in magnitude of reinforcement: Confounded factors or contrast effects. Journal of Experimental Psychology, 1969, 79, 373-374.
DUNHAM, P. J., MARINER, A., & ADAMS, H. Enhancement of off-key pecking by on-key punishment. Journal of the Experimental Analysis of Behavior, 1969, 1, 156-166.
ESTES, W. K. Outline of a theory of punishment. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior. New York: Appleton-Century-Crofts, 1969.
ESTES, W. K., & SKINNER, B. F. Some quantitative properties of anxiety. Journal of Experimental Psychology, 1941, 29, 390-400.
FALK, J. L. The motivational properties of schedule induced polydipsia. Journal of the Experimental Analysis of Behavior, 1966, 9, 19-25.
KONORSKI, J. Integrative activity of the brain. An interdisciplinary approach. Chicago: University of Chicago Press, 1967.
MAIER, S. F., SELIGMAN, M. E. P., & SOLOMON, R. L. Pavlovian fear conditioning and learned helplessness. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior. New York: Appleton-Century-Crofts, 1969.
MILLER, N. E., & DOLLARD, J. Social learning and imitation. New Haven: Yale University Press, 1941.
MORSE, W. H., MEAD, R. N., & KELLEHER, R. T. Modulation of elicited behavior by a fixed-interval schedule of electric shock presentation. Science, 1967, 157, 215-217.
MOWRER, O. H. On the dual nature of learning: A reinterpretation of "conditioning" and "problem-solving." Harvard Educational Review, 1947, 17, 102-148.
PAVLOV, I. P. Conditioned reflexes. (Trans. by G. V. Anrep) New York: Dover, 1927.
POWELL, D. A., & CREER, T. L. Interaction of developmental and environmental variables in shock-elicited aggression. Journal of Comparative and Physiological Psychology, 1969, 69, 219-226.
PREMACK, D. Reinforcement theory. Nebraska Symposium on Motivation, 1965, 13, 123-188.
RACHLIN, H., & HERRNSTEIN, R. J. Hedonism revisited: On the negative law of effect. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior. New York: Appleton-Century-Crofts, 1969.
SCHUSTER, R., & RACHLIN, H. Indifference between punishment and free shock: Evidence for the negative law of effect. Journal of the Experimental Analysis of Behavior, 1968, 11, 777-786.
SOLOMON, R. L. Punishment. American Psychologist, 1964, 19, 239-253.
THORNDIKE, E. L. Educational psychology. Vol. 2. The psychology of learning. New York: Columbia University, Teacher's College, Bureau of Publications, 1913.
THORNDIKE, E. L. The fundamentals of learning. New York: Columbia University, Teacher's College, Bureau of Publications, 1932.

(Received April 21, 1970)
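
The two tentative rules applied to the gerbil procedure can be illustrated with a minimal sketch. It is not the author's analysis. It assumes, for illustration only, that response probability is the share of the 30-minute session occupied by a behavior (as read from elapsed-time meters), and that the response predicted to increase is simply the most probable unpunished response. The function names and all durations below are hypothetical, and the sketch ignores the temporal dimension of the contingency and any shock-elicited response (such as grid biting) unless it is supplied in the input.

```python
# Illustrative sketch only: durations are hypothetical, not data from the article.

SESSION_MINUTES = 30.0  # daily half-hour session described in the text


def response_probabilities(durations_min, session_min=SESSION_MINUTES):
    """Convert the total duration of each behavior into a share of session time."""
    return {name: minutes / session_min for name, minutes in durations_min.items()}


def predict(probabilities, punished):
    """Apply a simplified reading of the two tentative rules.

    Rule 1: the punished (referent) response is suppressed below its base line.
    Rule 2: only one unpunished alternative, taken here to be the most
            probable one, rises above its base line.
    """
    alternatives = {name: p for name, p in probabilities.items() if name != punished}
    increased = max(alternatives, key=alternatives.get)
    return {"suppressed": punished, "increased": increased}


# Hypothetical base line filling 25 of the 30 minutes, in the reported
# order of preference: paper shredding, eating, drinking.
baseline = response_probabilities(
    {"paper_shredding": 12.0, "eating": 8.0, "drinking": 5.0}
)

# Hypothetical Group E animal (punished for eating).
print(predict(baseline, punished="eating"))
```

With these made-up durations the sketch selects paper shredding as the response that increases when eating is punished, which is the direction reported for Group E. A fuller treatment would add the shock-elicited response to the repertoire once shock is introduced, since the text reports cases in which grid biting, not a base-line behavior, was the response that increased.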