Optimal decision making and matching are tied through diminishing returns

Jan Kubanek(a,1)

(a) Department of Neurobiology, Stanford University School of Medicine, Stanford, CA 94305

Edited by Ranulfo Romo, Universidad Nacional Autónoma de México, Mexico City, D.F., Mexico, and approved July 3, 2017 (received for review March 1, 2017)

PSYCHOLOGICAL AND COGNITIVE SCIENCES

How individuals make decisions has been a matter of long-standing debate among economists and researchers in the life sciences. In economics, subjects are viewed as optimal decision makers who maximize their overall reward income. This framework has been influential but requires a complete knowledge of the reward contingencies associated with a given choice situation. Psychologists and ecologists have observed that individuals tend to use a simpler "matching" strategy, distributing their behavior in proportion to the relative rewards associated with their options. This article demonstrates that the two dominant frameworks of choice behavior are linked through the law of diminishing returns. The relatively simple matching can in fact provide maximal reward when the rewards associated with a decision maker's options saturate with the invested effort. Such saturating relationships between reward and effort are hallmarks of the law of diminishing returns. Given the prevalence of diminishing returns in nature and in social settings, this finding can explain why humans and animals so commonly behave according to the matching law. The article underscores the importance of the law of diminishing returns in choice behavior.

choice behavior | rational choice theory | matching law | economic maximization | neuroeconomics

Significance

Decisions critically affect the well-being of individuals and societies. However, how to suitably model, understand, and predict people's decisions has been a long-standing challenge. Economic models view decision makers as individuals who maximize their overall reward income. Psychologists and ecologists have observed that decision makers instead tend to use a relatively simpler "matching" strategy, distributing their behavior in proportion to the relative worth of their options. This article demonstrates that matching can be an optimal strategy for decision makers when the rewards associated with their options diminish with the invested effort, a relationship known as the law of diminishing returns. Because diminishing returns are prevalent in nature and in social settings, the commonly observed matching behavior is not only simple but also efficient and rational.

People's decisions define the future of individuals and social groups. How to suitably model, understand, and predict individuals' choice behavior has therefore been a matter of intense research efforts involving multiple disciplines.

Economic models provide normative prescriptions of how individuals should make decisions. According to these models, individuals make decisions in order to maximize reward, utility, or their expected income (1–3). For example, in expected utility theory (4), subjects maximize the expected utility of the potential decision outcomes, computed as the sum of the individual outcomes' values weighted by the probabilities that the outcomes will occur. Within such maximization models, Bayesian decision theories can be used to dictate how subjects should optimally compute the individual probabilities (5). Despite the normative appeal of such models, it has been unclear whether organisms are capable of implementing the complex computations prescribed by these models and of acting on them (6–8).

Researchers in psychology, ecology, sociology, and neuroscience have found evidence for a relatively simpler model of decision making. It has been found that decision makers tend to distribute their behavior in proportion to the relative rewards associated with their options (3, 8–16). The match between the behavioral and reward distributions, Bi/(B1 + B2 + ... + Bn) = Ri/(R1 + R2 + ... + Rn), where Bi is the rate of behavior allocated at option i and Ri is the corresponding rate of the obtained reward, has come to be known as the "matching law" (8, 9, 11, 13, 14). The matching law captures decision behavior in a wide variety of conditions while remaining relatively compact (14, 17). However, analogously to economic models, matching has been criticized for lacking a theoretical basis; matching is an empirical phenomenon (17–19).

Which of these models is the more appropriate to capture and predict choice behavior has been a subject of substantial debate (7, 8, 18–22). In some cases, subjects maximize (23, 24), whereas in other cases subjects do not maximize and match even though maximization would be a better strategy (3, 20, 25, 26). Psychologists have compared matching and maximization in tasks that used specific schedules of reinforcement (24, 27, 28), but whether the two models can be linked analytically at a general level has remained elusive. The present article turns matching and maximization face to face at a general level. By doing so, it identifies a connection between the economic and psychological frameworks, and the connection provides an explanation for why humans and animals so often follow the matching strategy.

Author contributions: J.K. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper. The author declares no conflict of interest. This article is a PNAS Direct Submission. This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1703440114/-/DCSupplemental.
1 Email: [email protected]
www.pnas.org/cgi/doi/10.1073/pnas.1703440114 | PNAS | August 8, 2017 | vol. 114 | no. 32 | 8499–8504

Results

Optimal Decision Making. An optimal decision maker distributes her effort across options such as to maximize the total harvested reward. When Ri(Ei), the expected rate of reward, is in this article for simplicity referred to as "reward," the total reward rate the decision maker obtains for a specific distribution of effort Ei allocated at option i amounts to R1(E1) + R2(E2) + ... + Rn(En). The total effort that an individual can invest is limited, Σi Ei = Emax. Given that, a reward optimum is attained (Methods) when equalizing the marginal reward per effort across her options, dR1(E1)/dE1 = dR2(E2)/dE2 = ... = dRn(En)/dEn.

Matching Behavior. Humans and animals often follow a matching strategy (3, 8–16), distributing their behavior Bi in proportion

to the relative rewards Ri associated with their options, Bi/(B1 + B2 + ... + Bn) = Ri/(R1 + R2 + ... + Rn). This "matching" equation (9, 11) can be equivalently written as R1/B1 = R2/B2 = ... = Rn/Bn. Furthermore, noting that behavior B embodies effort E, matching can be stated as R1/E1 = R2/E2 = ... = Rn/En. It becomes apparent that when subjects match, they equalize the average reward per effort R/E across their options. When the value of an option i, Vi, is defined as the ratio of the obtained reward per the invested effort, Vi = Ri/Ei, matching equalizes the values Vi = V across the choice options.
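The chain of identities above is easy to confirm numerically. A minimal sketch (the reward rates and total behavior below are made-up illustrative numbers, not data from the article):

```python
# Check that behavior allocated by the matching law equalizes reward per
# behavior (and, with effort proportional to behavior, reward per effort).
R = [6.0, 3.0, 1.0]      # hypothetical reward rates R_i at the options
B_total = 120.0          # total behavior, e.g., responses per session

# Matching law: B_i / sum(B) = R_i / sum(R)
B = [B_total * r / sum(R) for r in R]

# Equivalent form: R_i / B_i is one and the same constant for all options
ratios = [r / b for r, b in zip(R, B)]
assert all(abs(x - ratios[0]) < 1e-12 for x in ratios)

# With effort E_i proportional to behavior B_i, the value V_i = R_i / E_i
# is likewise equalized across the options.
```

Whatever reward rates are plugged in, the ratios Ri/Bi collapse to the single constant ΣR/ΣB, which is exactly the equalized-value statement of the matching law.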

Maximization–Matching Relation. Given these observations, maximization and matching align when the marginal reward per effort associated with an option is a strictly monotonic function g of the average reward per effort associated with the option, dR/dE = g(R/E) (Methods). The marginal and average quantities for an example relationship between reward and effort are illustrated as dotted and dashed lines in Fig. 1, respectively.

This formulation empowers us to identify the relationships (henceforth referred to as contingencies) between reward and effort, R(E), for which the two approaches provide an equal amount of reward. The contingencies emerge simply by defining a particular form of the strictly monotonic function g. There are only two requirements (Methods) on g to yield contingencies for which matching delivers maximal reward: (i) g(V) must have its derivative greater than zero and (ii) it must be g(V) < V.

Simple forms of g (Methods) yield contingencies including R(E) = aR ln(E) + c with a > 0, R(E) = RE^a with 0 < a < 1, and R(E) = E/(a + cE) with a, c > 0, where R is the reward associated with an option, E is the invested effort, and a and c are constants.

These derived contingencies provide tight fits to the relationship between reward and effort in a dominant task used to study choice behavior (8, 9, 13). In particular, in the variable-interval schedule task (Fig. 2), an increase in a subject's effort generally leads to an increase in the harvested reward. However, at a certain point the reward saturates and shows diminishing returns with additional effort (Fig. 2). The derived R(E) contingencies for which matching is optimal provide excellent fits to the diminishing-returns character of this task, explaining 97.8%, 95.9%, and 99.7% of variance, respectively (Fig. 2). It is apparent that these solutions, just like the reward–effort profile of the variable-interval schedule task, exhibit diminishing returns.

Fig. 2. Relationships between reward and effort for which matching is optimal exhibit diminishing returns. Choice behavior has often been studied using the variable-interval schedule task. In this task, a single reward is scheduled at an option at a specific rate, such as, on average, once per minute here. Once scheduled, the reward is available until a subject chooses the option. A simulation of the reward harvested in this task (black curve) shows that when a subject chooses an option at the same rate as the schedule rate (once per minute), reward is delivered on average in 50% of the cases, as expected. Furthermore, as expected, investing more effort leads to more reward. Nonetheless, at a certain point, there are diminishing returns. The diminishing-returns character of the data can be well captured with the derived reward–effort contingencies, exemplified by three particular functions (colored curves). A logarithmic function (R(E) = aR ln(E) + c) explains 97.8% of variance in the data, a Cobb–Douglas function (R(E) = RE^a) 95.9% of variance, and a hyperbola (R(E) = E/(a + cE)) 99.7% of variance. Similarly tight fits are observed also for other reward scheduling rates.

This, in fact, turns out to be a general finding. For matching to provide a reward maximum, the reward–effort contingencies of the choice options must exhibit diminishing returns (see Methods for proof). This finding ties the matching law, deemed by many a fundamental principle underlying choice behavior (3, 11, 12, 14), with economic maximization through a surprising link: the law of diminishing returns.

This conclusion holds also in environments in which rewards are stochastic and for maximization metrics that explicitly incorporate probabilistic outcomes.
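The variable-interval schedule of Fig. 2 is exactly such a stochastic environment, and its diminishing-returns profile can be reproduced in a few lines. The sketch below is an assumption-laden reimplementation, not the article's original code: scheduling and checking are modeled as Poisson processes as described in Methods, and the rates and durations are illustrative. Under these assumptions, the expected harvest rate works out to rE/(r + E) for schedule rate r and checking rate (effort) E, an instance of the hyperbolic contingency:

```python
import random

def harvested_rate(check_rate, schedule_rate=1.0, minutes=50000, seed=0):
    """Simulate a variable-interval schedule: rewards are scheduled by a
    Poisson process, held until collected, and the subject checks the
    option at Poisson rate check_rate (the invested effort)."""
    rng = random.Random(seed)
    t, harvested = 0.0, 0
    next_reward = rng.expovariate(schedule_rate)
    while t < minutes:
        t += rng.expovariate(check_rate)   # time of the next check
        if next_reward <= t:               # a reward is waiting: collect it
            harvested += 1
            next_reward = t + rng.expovariate(schedule_rate)
    return harvested / minutes

# Checking once per minute against a once-per-minute schedule rewards
# roughly 50% of the checks, as in Fig. 2.
assert abs(harvested_rate(1.0) - 0.5) < 0.02
# More effort yields more reward, but with diminishing returns: the
# harvest rate can never exceed the schedule rate.
assert harvested_rate(4.0) < 1.0
```

Because a collected reward restarts an exponential scheduling clock and the next check arrives after an independent exponential wait, the mean harvest cycle is 1/r + 1/E, which gives the saturating rate rE/(r + E) quoted in the lead-in.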
For instance, when rewards Ri are delivered with probabilities pi, an optimal decision maker distributes her effort to maximize the expected value of reward over the individual options, E[R] = p1R1(E1) + p2R2(E2) + ... + pnRn(En). Diminishing returns turn out to be a critical requirement for matching to be optimal also under this optimization criterion (SI Methods). In addition, the E[R] criterion can be rewritten to accommodate expected utility (4, 5), and the same connection between matching and diminishing returns is observed (SI Methods).

Fig. 1. Relationship between reward maximization and matching. Reward-maximizing behavior and matching align when the marginal (dR/dE) and average (R/E) reward per effort for each option are related through a strictly monotonic function. This function can be distinct from identity, and so the marginal (dotted line) and average (dashed line) quantities can have different slopes.

Discussion and Conclusions

The matching law has been an influential description of the choice behavior of animals and humans (3, 8–15), but how this principle relates to the maximization apparatus of economic theories and why individuals so often adopt matching have been unclear. This article notes that reward depends on the invested effort and identifies the reward–effort contingencies for which matching is an optimal choice strategy. It is found that the contingencies for which matching provides a reward maximum share one striking commonality: They exhibit diminishing returns. In these cases (e.g., Fig. 2), the harvested reward increases as a function of the invested effort up to a point where additional investment confers relatively small marginal benefit (29, 30).

The psychological law of matching turns out to be an intriguing link that connects the law of diminishing returns with economic maximization. This finding was not obviously derivable from the previous literature. The present study arrives at this finding by developing a framework that relates matching and optimization at a general level and by deriving the complete space of optimal solutions, i.e., the requirements for matching to attain reward maxima. Previous studies identified solutions for which matching can be optimal, but only two specific solutions were identified (20, 21, 27), and these solutions can only attain a critical point in the reward landscape, which can be a reward maximum, a minimum, or a saddle point. The present study not only provides the requirements for matching to attain reward maxima at a general level but also shows that framing the problem at a general level proves crucial for uncovering the involvement of diminishing returns in matching (proof in Methods).

The finding that matching is crucially based on diminishing returns brings us to the question of how commonly diminishing returns figure in the nature of human society. The law of diminishing returns (29, 30) traces its history back to Turgot, who discovered that agricultural output progressively decreases with increasing quantities of invested capital and labor (31). The idea has subsequently been elaborated by economists such as Malthus and Ricardo. The law of diminishing returns now lies at the heart of many branches in economics, including theories of investment, production, and economic growth (32–35). For example, in regard to production, the present article identifies the Cobb–Douglas production function as one of the solutions for which matching is optimal. For matching to be optimal, the exponent a of the Cobb–Douglas function with respect to the labor input (R(E) = RE^a) must be 0 < a < 1 (Methods). This just constitutes diminishing returns. Exemplified in production, if two or more production processes can be described by a Cobb–Douglas function with equal exponents a, then matching guarantees an optimal distribution of labor between the processes. In this case, to maximize the total production, the matching strategy allows a manager to simply equalize the average production per labor allocated within each production process.

The findings of this article also apply to ecology. Consider a predator who must decide how to distribute her foraging effort between sources of prey. Foraging commonly involves diminishing returns (36). For example, the predator gradually depletes the prey at the harvested source, or the resource saturates due to factors such as crowding or the regeneration of the prey. This article shows that situations of diminishing returns allow the predator to distribute her effort according to the matching law, effectively equalizing the value (the obtained reward per invested effort) of each source. If she obtains twice the amount of prey in source 2 compared with source 1, she spends twice the amount of foraging effort on source 2 as on source 1. The major benefit of this approach is that it is simple: The predator can determine the value of the sources from both their harvest amounts and her visits (37, 38). This is in contrast to a reward-maximizing agent, who must learn about the outcomes of all possible allocations of her effort across the sources and compute the marginal derivatives of reward per effort across the sources (39).

Diminishing returns also apply to situations that involve temporal, financial, and mental effort. For instance, when effort represents time (E = T), the value terms take the form V = R/T, which represents hyperbolic temporal discounting (40). In this regard, this article reveals that situations in which the harvested reward reaches diminishing returns with the invested time embody an optimal value function for decisions concerned with how to allocate time across choice options. Analogous reasoning applies to monetary investment and mental effort. In situations of diminishing returns, the present article suggests, making a choice according to the matching law constitutes a good strategy; in situations like this, matching represents an effective heuristic to maximizing reward income.

Matching is not a ubiquitous phenomenon. There are situations, such as those modeled by random (variable) ratio schedules, in which the reward rate R(E) increases proportionally with effort E. In such situations with nondiminishing returns, according to this article, subjects should not match (Methods). Indeed, in such cases subjects commonly converge on choosing almost exclusively the richer alternative (23, 24). Therefore, the presence or absence of diminishing returns in a given task can be used as an indicator of whether or not subjects should exhibit matching.

Diminishing returns embody a necessary condition for matching to provide the maximal total reward harvested. At the same time, this is not a sufficient condition; one can find examples of saturating functions for which matching does not find the maximum. Such saturating functions can nonetheless be approximated with functions derived using specific forms of the generator function g. Thus, the extent to which diminishing returns imply reward maximization through matching rests on the sufficient condition for matching to maximize reward and on the variety of reward functions that can be generated using g.

In addition to illuminating the relationship between matching and reward maximization, this article also contributes to the body of research on effort-based decision making. In the present framework, effort is treated in two ways. First, effort is treated as a resource instead of a variable with negative valence. Second, it is shown that computing the value V of an option as reward per effort, V = R/E, and operating on such a value representation can, for contingencies of diminishing returns, approach maximal reward through matching. Intriguingly, the fractional representation of value, V = R/E, circumvents the difficulty of expressing reward R(E) and effort E on the same scale (41). The fractional representation V = R/E has been found to be encoded in specific regions of the brain (42, 43).

Reward maximization represents a behavioral equilibrium, characterized by equalized marginal reward per unit of effort across the choice options. Which optimization strategy may lead to such an equilibrium in stochastic environments has been a matter of debate (16, 18, 44). One of the main candidate frameworks has been the reward maximization of Bayesian decision theory. According to this formalism, subjects make decisions such as to maximize the expected utility, which incorporates probability terms that model the relationships between decisions and uncertain outcomes. The Bayesian framework provides an optimal prescription for how individuals should update their probability estimates given prior experience and recent evidence (5, 45). Although the neuronal computations underlying Bayesian inference appear to be biologically plausible (5, 46), it remains to be seen whether such complex computations can be combined with utility representations to provide the maximization metrics necessary to guide optimal decisions in choice situations (4, 5).

Matching also represents a behavioral equilibrium, characterized by equalized average reward per effort across the options. Several behavioral strategies have been shown to result in matching (16, 47–50). One of the leading candidates, from which matching has been directly derived, has been melioration (47, 51). According to melioration, decision makers assess, over a certain time period, the value (reward per effort) of each option and adjust their effort with a certain frequency to the option with the

PSYCHOLOGICAL AND COGNITIVE SCIENCES highest value. Equilibrium is achieved and subjects match when for a certain λ. This that

the values (reward per effort) are equalized across the options. dR1(E1) dR2(E2) dRn(En) = = ... = . [3] In contrast, optimization, the process that leads to reward max- dE1 dE2 dEn imization, continuously reallocates effort to the option with the Eq. 3 dictates how effort should be allocated to attain a critical point in highest marginal reward per effort. Equilibrium is achieved when the reward landscape. This can be a maximum, a minimum, or a saddle the marginals are equal across the choice options (Optimal Deci- point. To obtain a reward maximum, the leading principal minors of the sion Making). There are two major advantages of melioration bordered Hessian matrix corresponding to Eq. 1, evaluated at critical points, over optimization. First, evaluating reward that has cumulated must alternate in sign, with the first minor (of order 3) showing a positive over a certain time period reduces noise in the reward income sign (34). 2 d Ri(Ei) 00 in stochastic environments. Second, because melioration oper- Let us label = R (Ei) at a certain critical point Ei. The bordered dE2 i i ates on a certain time period, effort can be adjusted with a fre- Hessian for Eq. 1 is quency corresponding to that time period; during optimization,  0 −1 −1 ... −1  effort is adjusted at each point in time. Evaluating cumulative 00  −1 R1 (E1) 0 ... 0  values every so often is a biologically much more plausible strat-  00  B  −1 0 R2 (E2) ... 0  egy compared with computing local derivatives at each point in H (E1, E2, ... ,En) =  .  . . . .   ......  time. This paper shows that making decisions based on a certain  . . . . .  00 time period—inherent to melioration and matching—can con- −10 0 0 Rn (En) stitute a good strategy in choice environments with diminishing For two options, the leading principal minor of order 3 is equal to returns. B 00 00 Deciding between different classes of options, such as whether det H (E1, E2) = −(R1 (E1) + R2 (E2)). 
to cook at home or eat out at a restaurant, involves a comparison This leading principal minor must be positive, so it must be of utilities and efforts associated with the options. In this regard, R00(E ) + R00(E ) < 0. melioration as a simple behavioral strategy can be generalized to 1 1 2 2 represent utilities instead of reward rates (14, 17). In this gener- This result leads to two important arguments that are used below. First, for alized model, subjects choose the option that provides the high- a critical point to embody a reward maximum—so that the above equation holds—at least for one option and at least for some value of Ei the sec- est utility per effort (or per economic cost). In equilibrium, such 00 00 ond derivative Ri (Ei) must be Ri (Ei) < 0. Second, if for any Ei and for both a strategy leads to matching of effort (or financial resources) to 00 options Ri (Ei) < 0, then the above equation always holds and the critical the relative utilities. point is guaranteed to be a maximum. In sum, the present article links two dominant frameworks of The same two arguments also hold for choice situations that feature choice behavior and finds that matching can be an efficient and three and more options. For example, for three options, the leading princi- effective instantiation of economic maximization as long as the pal minor of order 4 is equal to choice environment features diminishing returns. The observa- det HB(E , E , E ) = −(R00(E )R00(E ) + R00(E )R00(E ) tions that humans and animals so often behave according to the 1 2 3 1 1 2 2 1 1 3 3 + 00( ) 00( )) matching law now find footing in the law of diminishing returns. R2 E2 R3 E3 In light of diminishing returns, matching becomes an efficient and it must be negative. Thus, it must be heuristic to optimal decision making. 00 00 00 00 00 00 R1 (E1)R2 (E2) + R1 (E1)R3 (E3) + R2 (E2)R3 (E3) > 0, Methods and the conditions associated with the principal minors of order 3 must also hold. 
Thus, again, for a critical point to represent a reward maximum, at Reward and Effort in Variable-Interval Schedules of Reinforcement. The rela- least for one option and at least for some value of E must R00(E ) < 0. And, tionship between reward and effort in Fig. 2 was obtained using a simula- i i i if for any E and for all three options R00(E ) < 0, then the above equation tion of the variable-interval schedule task. In this task, a reward of a certain i i i evaluates positive, and so the critical point is a maximum. Analogously, for magnitude is delivered at random intervals but at a constant overall rate. four options, it must be For example, the data shown in Fig. 2 use a rate of 1 reward per minute. 00 00 00 00 00 00 The decision whether to schedule a reward or not at a certain time is gov- R1 (E1)R2 (E2)R3 (E3) + R1 (E1)R3 (E3)R4 (E4) erned by a Poisson process; each point in time has an equal probability that 00 00 00 00 00 00 + R1 (E1)R3 (E3)R4 (E4) + R2 (E2)R3 (E3)R4 (E4) < 0, a reward will be scheduled. Once a reward is scheduled, it remains avail- able until a subject (a computer in this case) harvests it. The abscissa of the and the conditions associated with the principal minors of lower orders must plot (i.e., effort) provides the number of times per minute that the subject also hold. Here, the sum of all triplets must be negative; for five options, checked whether there was a reward. The subject’s decision to check is also the sum of all quadruplets must be positive, and so on. Each additional driven by a Poisson process in which the probability of checking is the same option adds an additional term in the multiples; if each such term is negative 00 at any point in time. (i.e., the second derivative Ri (Ei) < 0 for all i), the expression flips sign, as required. Thus, it is apparent that the two arguments we initially made for Optimal Decision Making. 
This section derives how effort should be dis- two options hold for any number of options in this constraint-optimization tributed among choice options to maximize the total reward harvested, problem. R1(E1) + R2(E2) + ... + Rn(En), subject to the constraint that total effort of P Bi a decision maker is limited, E = Emax. The criterion constitutes the total Matching Behavior. Behavior according to the matching law, = i i B1+B2+...+Bn reward to be maximized, independently of effort. Effort serves as a means Ri , where Bi is the rate of behavior allocated at option i and Ri to maximize the total reward. This problem can be solved using the method R1+R2+...+Rn is the corresponding rate of the obtained reward, can be equivalently writ- of Lagrange multipliers. The Lagrangian is in this case formulated as R R ten as 1 = 2 = ... = Rn . Furthermore, noting that behavior B embodies X B1 B2 Bn L(E , E , ... ,E , λ) = R (E ) + R (E ) + ... + R (E ) + λ(E − E ). R R 1 2 n 1 1 2 2 n n max i effort E, the matching law can be stated as 1 = 2 = ... = Rn . i E1 E2 En [1] Maximization–Matching Relation. Matching and optimization align when dR Setting the partial derivatives to 0, we get marginal reward dE associated with an option is a strictly monotonic func- R tion g of the corresponding matching term E : ∂L dRi(Ei) = − λ = 0, [2] dR  R  ∂Ei dEi = g . [4] dE E and so   To see why, incorporating Eq. into Eq. leads to dR1(E1) = R1(E1) = dR (E ) 4 3 dE g E i i = λ 1 1 dR2(E2)  R2(E2)  dRn(En)  Rn(En)  dEi = g = ... = = g . A strictly monotonic dE2 E2 dEn En
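The sign pattern of the leading principal minors of the bordered Hessian can be checked numerically. A small self-contained sketch (the R''(Ei) values are arbitrary negative numbers standing in for options with diminishing returns):

```python
def det(m):
    # Laplace expansion; fine for the small matrices used here.
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += ((-1) ** j) * m[0][j] * det(minor)
    return total

def bordered_hessian(second_derivs):
    # H^B for the Lagrangian of the effort-constrained problem:
    # the border row/column encode the constraint, the diagonal
    # holds the second derivatives R_i''(E_i).
    n = len(second_derivs)
    H = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[0][i] = H[i][0] = -1.0
        H[i][i] = second_derivs[i - 1]
    return H

# Two options with diminishing returns (R'' < 0):
r2 = [-0.5, -1.5]
# Leading principal minor of order 3 = det H^B(E1, E2) = -(R1'' + R2'')
assert abs(det(bordered_hessian(r2)) - (-(r2[0] + r2[1]))) < 1e-12  # positive

# Three options: the minors of orders 3 and 4 must alternate in sign (+, -).
r3 = [-0.5, -1.0, -2.0]
H3 = bordered_hessian(r3)
minor3 = det([row[:3] for row in H3[:3]])
assert minor3 > 0 and det(H3) < 0
```

With every Ri'' negative, the alternating-sign condition holds automatically, which is the "guaranteed maximum" argument of the text in executable form.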

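The maximization–matching alignment can also be confirmed end to end on a concrete family of contingencies. A sketch with two Cobb–Douglas options (all parameter values are made-up for illustration; a grid search stands in for the analytic Lagrange solution):

```python
# Effort-constrained reward maximization vs. matching, checked on two
# options with Cobb-Douglas contingencies R_i(E) = Rbar_i * E**a.
a = 0.5
Rbar = [2.0, 1.0]
E_max = 10.0

def total_reward(E1):
    return Rbar[0] * E1 ** a + Rbar[1] * (E_max - E1) ** a

# Grid search for the optimal split of effort.
E1_opt = max((i * E_max / 100000 for i in range(1, 100000)), key=total_reward)
E2_opt = E_max - E1_opt

# Eq. 3: the marginal rewards dR/dE = a * Rbar * E**(a - 1) are equalized.
m1 = a * Rbar[0] * E1_opt ** (a - 1)
m2 = a * Rbar[1] * E2_opt ** (a - 1)
assert abs(m1 - m2) < 1e-2

# Matching: the average rewards per effort R_i / E_i are equalized as well,
R1, R2 = Rbar[0] * E1_opt ** a, Rbar[1] * E2_opt ** a
assert abs(R1 / E1_opt - R2 / E2_opt) < 1e-2
# ...and effort is allocated in proportion to the obtained rewards.
assert abs(E1_opt / E_max - R1 / (R1 + R2)) < 1e-3
```

With these numbers the optimum lands at E1 = 8 and E2 = 2, and the reward-maximizing allocation simultaneously satisfies the matching law, as the derivation above requires for g(V) = aV with 0 < a < 1.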
is function ceeae,ie,so iiihn eun,adte os o l values Kubanek all for so do they and negatively returns, are effort. diminishing of options all show for i.e., contingencies accelerated, reward–effort the that proves of value expression some above least the at for and terms matching Returns. Reward Estimated for Contingencies Reward–Effort align. models two i s0 is and l te eadefr otnece a egnrtdb enn a defining by generated be can contingencies reward–effort other All for example, another As aigtre h aciglwadotmzto aet ae ecan we face, to face optimization and law matching the turned Having dR R dE < ˆ i a (E i i g(V r rirr constants. arbitrary are stesbetsetmt fterwr soitdwt option with associated reward the of estimate subject’s the is a i ) g < = 0 (V .Teamsil om fti ucinaepoie next. provided are function this of forms admissible The ). seblw.Tu,Eq. Thus, below). (see 1 a ) R > i E (E d i R E 2 i ,adti au seulfralotos hs whether Thus, options. all for equal is value this and 0, utemr,we hsrqieeti ufild tis it fulfilled, is requirement this when Furthermore, g. i i dE ) R ilstesolution the yields , i au function value a i 2 (E E g(V i i R V ) − xre toption at exerted 1 < (R E E (E ) d R E 1 u eas fmthn,the matching, of because But . ˆ R i = o l pin n o n ausof values any for and options all for 0 2 2 dE 1 (E i hwdta o u osritotmzto prob- optimization constraint our for that showed R(E ) R < bandfruiayeffort unitary for obtained aV i 2 pligtecanrl,tedrvtv fEq. of derivative the rule, chain the Applying i ), = (E R ,i utbe must it 0, g(V ) E Eq. , i i (E ) i R R = eobtain we g, ) ˆ R E = i 2 = i ) (E 1 E E 1 ) (E g = 2 ome htrqieet eas in because requirement, that meet To . = = i o becoming now 4, a 0 R 2 ) V  R i aV ) ˆ E = (E i ˆ i R E a = 6 R E ˆ = i 2 ln(E 2 ) 2 R + where ,  moisteCb–oga func- Cobb–Douglas the embodies i Eq. , = R E R E i . . . For . i E i acigpoie rtclpoint critical a provides Matching i i c i nti xrsin h reward the expression, this In . 
(E i a g ) i . . . d E . 0 − + i 2 dE = R i ˆ ( ) R(E . ˆ g(V R E 4 = 2 seulto equal is c E = ) R ˆ R i ) 2 . eeae reward–effort a generates > c R n < E i  ˆ ) R E i (E E n (E n n = o tlatoeoption one least at for 0 twl esonhere shown be will It 4. i . a o tlatoeoption one least at for 0 n For . = i ) aV steata reward actual the is ) dR , dE V R(E i (E e snwconsider now us Let Eq. , E i tflosta it that follows It . i i g ) = ,frwihthese which for ), c 0 = e sasg the assign us Let i R(E ,tesolution the 1, argument o becom- now 4, nepee as interpreted a g(V R E ). ˆ i i ilsthe yields , .Further- ). R E i i This . i as and , ˆ R E [6] [7] [5] R ˆ R is i 4 i . oeas itt h aueo h ouin.Frisac,i Eqs. in there- instance, and For solutions. solutions the of the nature of the dictate parameters also fore the constrain requirements These g eadmxmmwhen maximum reward Examples. Specific effort. of values all for 3. Fig. in charted is span to diagonal the requirements, above value a general a of value some account for least at function a solely is equation the of side term matching right the the of that of Given sign equation. the this that through apparent shines matching of beauty The Eq. to according Here, o matching, be for must it situations, return reward E be must it maximum, reward a sent Again, ihe ngreen. in lighted must function generator (g zero the than maximum, greater reward derivative its a have deliver to matching For 3. Fig. by solely governed and is sign the and options, of all properties for the equally so does it sign, tive options is effort to respect returns reward obtained ally returns. reward Actual i 0 hspoe htalotospoiedmnsigrtrs n hyd so do they and returns, diminishing provide options all that proves This . (V o matching, For g(V ) < pia eiinMaking Decision Optimal and 0 ) i eeaino ouin o hc acigyed eadmaxima. reward yields matching which for solutions of Generation E hs when Thus, . 
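As an illustrative numerical check (a sketch; the parameter values a = 0.5, R̄ = (1, 2), and one unit of total effort are assumptions, not from the article), a brute-force search over two-option effort allocations under the Cobb–Douglas contingency of Eq. 6 confirms that the reward-maximizing allocation is exactly the matching allocation:

```python
# Cobb-Douglas contingencies R_i(E) = Rbar_i * E**a for two options
# (illustrative parameters; see lead-in). We search all splits of one
# unit of effort and check that the maximum coincides with matching.
a = 0.5
Rbar = (1.0, 2.0)  # reward obtained for unitary effort, per option

def total_reward(e1):
    """Total reward when effort e1 goes to option 1, the rest to option 2."""
    return Rbar[0] * e1**a + Rbar[1] * (1.0 - e1)**a

grid = [i / 100000 for i in range(1, 100000)]  # candidate allocations
best = max(grid, key=total_reward)
print(round(best, 5))  # 0.2, i.e., E2/E1 = (Rbar[1]/Rbar[0])**(1/(1-a)) = 4

# Matching holds at the optimum: reward per unit effort is equalized.
v1 = Rbar[0] * best**a / best
v2 = Rbar[1] * (1.0 - best)**a / (1.0 - best)
assert abs(v1 - v2) < 1e-3
```

The same coincidence holds for any number of options, because both the optimality condition and the matching condition reduce to R̄_iE_i^(a−1) being constant across options.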
Reward Maxima and Diminishing Returns. Matching provides a critical point in the reward landscape for every solution of Eq. 4. Whether that critical point is a maximum, a minimum, or a saddle point depends solely on the properties of the generator function g. Furthermore, it will be shown here that for matching to deliver a reward maximum, the reward–effort profiles of all choice options must show diminishing returns. The proof follows from evaluating the second derivative of reward with respect to effort, first for estimated and then for actual reward returns.

For estimated reward returns, taking the derivative of Eq. 4 with V_i = R̂_i/E_i gives

d²R_i(E_i)/dE_i² = g′(V_i) dV_i/dE_i = −g′(V_i) R̂_i/E_i².

Because R̂_i > 0 and E_i > 0, matching yields a reward maximum when g′(V) > 0 and a minimum when g′(V) < 0; whether matching yields a reward maximum or a minimum thus depends solely on this property of g. Furthermore, when the requirement g′(V) > 0 is fulfilled, the second derivative is negative for all options and for any values of effort. This proves that the reward–effort contingencies of all options are negatively accelerated, i.e., show diminishing returns, and they do so for all values of effort.

For actual reward returns, applying the chain rule to Eq. 4 with V_i(E_i) = R_i(E_i)/E_i gives

d²R_i(E_i)/dE_i² = g′(V_i) dV_i/dE_i = g′(V_i) (dR_i(E_i)/dE_i − V_i)/E_i = g′(V_i)(g(V_i) − V_i)/E_i.

The beauty of matching shines through this equation. Given that the right side of the equation is solely a function of the value V_i, and that through matching this value is equal for all options, d²R_i(E_i)/dE_i² assumes a positive or a negative sign equally for all options, and the sign is governed solely by the properties of g. Again, for the critical point to represent a reward maximum, it must be d²R_i(E_i)/dE_i² < 0, and so g′(V)(g(V) − V) < 0. Thus, there are two possibilities: (i) g′(V) > 0 and g(V) < V, and (ii) g′(V) < 0 and g(V) > V. The second possibility is inadmissible because a function g cannot be decreasing while maintaining g(V) > V for all values of V. This leaves us with the first set of requirements, g′(V) > 0 and g(V) < V. Combined with the fact that E_i > 0, it follows that d²R_i(E_i)/dE_i² < 0 for all options and for any value of effort E_i. This proves that all options provide diminishing returns, and they do so for all values of effort.

Fig. 3. Generation of solutions for which matching yields reward maxima. For matching to deliver a reward maximum, the generator function g must have its derivative greater than zero (g′(V) > 0) and, in addition, for actual reward returns, g(V) < V. The resulting space that g(V) is allowed to span is highlighted in green.

PNAS | August 8, 2017 | vol. 114 | no. 32 | 8503
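These relations can be verified numerically. The sketch below (assumed illustrative parameter values a = 0.5, c = 2, R̂ = 3, R̄ = 3; not from the article) checks by finite differences that the solutions of Eqs. 5–7 satisfy the generator relation of Eq. 4 and are everywhere negatively accelerated:

```python
import math

a, c, Rhat, Rbar = 0.5, 2.0, 3.0, 3.0  # assumed illustrative parameters
h = 1e-5  # finite-difference step

# Actual reward returns: families solving dR/dE = g(R/E) (Eq. 4).
families = {
    # Eq. 6: R = Rbar * E**a, generated by g(V) = a*V
    "power": (lambda E: Rbar * E**a, lambda V: a * V),
    # Eq. 7: R = E/(a + c*E), generated by g(V) = a*V**2
    "hyperbolic": (lambda E: E / (a + c * E), lambda V: a * V**2),
}
for name, (R, g) in families.items():
    for E in (0.5, 1.0, 4.0):
        dR = (R(E + h) - R(E - h)) / (2 * h)            # marginal reward
        d2R = (R(E + h) - 2 * R(E) + R(E - h)) / h**2   # curvature
        assert abs(dR - g(R(E) / E)) < 1e-4  # generator relation holds
        assert d2R < 0                       # diminishing returns

# Estimated reward returns: Eq. 5, R = a*Rhat*ln(E) + c_i, whose marginal
# reward equals g(V) = a*V evaluated at the value V = Rhat/E.
R5 = lambda E: a * Rhat * math.log(E) + 1.0  # c_i = 1 chosen arbitrarily
for E in (0.5, 1.0, 4.0):
    dR = (R5(E + h) - R5(E - h)) / (2 * h)
    assert abs(dR - a * Rhat / E) < 1e-4
print("Eqs. 5-7 satisfy Eq. 4 and show diminishing returns")
```

The assertions exercise several effort levels per option, reflecting that the diminishing-returns property holds for all values of effort, not only at the matching point.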

PSYCHOLOGICAL AND COGNITIVE SCIENCES

Specific Examples. The previous section showed that matching yields a reward maximum when g′(V) > 0 and, for actual reward returns, additionally g(V) < V. These requirements constrain the parameters of the solutions and therefore also dictate the nature of the solutions. For instance, in Eq. 5, for which g(V) = aV, g′(V) > 0 requires that a > 0. It then becomes obvious that a reward–effort function aR̂ ln(E) shows diminishing returns with increasing effort E. As another example, in Eq. 6, in addition to a > 0 that again follows from g′(V) > 0, the requirement g(V) < V dictates that aV < V and so a < 1. It is apparent that a reward–effort function R̄E^a with 0 < a < 1 also shows diminishing returns. And, in Eq. 7, the g(V) requirements on a reward maximum impose a > 0 and c_i > 0. It is easy to see that a reward–effort function E/(a + c_iE) with a, c_i > 0 also shows diminishing returns.

The previous section proved that this is a general finding. For matching to deliver a reward maximum, the reward–effort contingencies of all choice options must show diminishing returns.

Increasing Returns. It is worth also considering the complement, i.e., the possibility that matching might be optimal under situations of increasing (and not diminishing) returns. Increasing returns for an option j imply d²R_j(E_j)/dE_j² = R_j″(E_j) > 0. Reward Maxima and Diminishing Returns showed that when Eq. 4 is satisfied and matching holds, d²R_i(E_i)/dE_i² exhibits the same sign for all options i and for all values of effort E_i. Under these conditions, it follows that when d²R_j(E_j)/dE_j² > 0 for an option j, it must be d²R_i(E_i)/dE_i² = R_i″(E_i) > 0 for all options i. In this case, the principal minors of the bordered Hessian (Optimal Decision Making) become all negative. But negative principal minors constitute a sufficient condition for a reward minimum (34). Thus, increasing returns do not allow for matching to be optimal; diminishing returns are indeed required. An additional consequence of the second derivatives exhibiting the same sign for all options is that the leading principal minors (Optimal Decision Making) either alternate in sign, which is a sufficient condition for a reward maximum, or are all negative, which is a sufficient condition for a minimum. A third possible outcome, a saddle point, would require a distinct pattern of signs of the principal minors (34).

ACKNOWLEDGMENTS.
I thank Drs. Leonard Green, Lawrence Snyder, Nina Miolane, and Julian Brown for helpful comments. This work was supported by NIH Grant K99NS100986 and by the McDonnell Center for Systems Neuroscience.
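To make the reward-minimum outcome of the Increasing Returns section concrete, consider a toy two-option example with quadratic contingencies R_i(E) = c_iE² (an assumed illustration; these contingencies and parameter values are not from the article). Matching still equalizes value and marginal reward across options, yet every other allocation earns at least as much:

```python
c = (1.0, 2.0)  # assumed increasing-returns contingencies R_i(E) = c_i * E**2

def total_reward(e1):
    """Total reward when effort e1 goes to option 1, 1 - e1 to option 2."""
    return c[0] * e1**2 + c[1] * (1.0 - e1)**2

match = 2.0 / 3.0  # matching: c1*E1 = c2*E2  =>  E1 = 2/3 of unit effort

# Value (reward per unit effort) is equal across options at matching...
v1 = c[0] * match**2 / match
v2 = c[1] * (1.0 - match)**2 / (1.0 - match)
assert abs(v1 - v2) < 1e-12

# ...but the matching point is a reward MINIMUM: every other allocation
# on a grid yields at least as much total reward, and the extremes win.
grid = [i / 1000 for i in range(1001)]
assert all(total_reward(x) >= total_reward(match) for x in grid)
print(total_reward(match), max(total_reward(x) for x in grid))
```

Putting all effort on option 2 earns a total reward of 2, three times the matching outcome of 2/3, in line with the bordered-Hessian argument of the Increasing Returns section.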

1. Friedman M (1953) Essays in Positive Economics (Univ of Chicago Press, Chicago).
2. Becker G (1978) The Economic Approach to Human Behavior (Univ of Chicago Press, Chicago).
3. Mazur JE (1981) Optimization theory fails to predict performance of pigeons in a two-response situation. Science 214:823–825.
4. Kahneman D, Tversky A (1979) Prospect theory: An analysis of decision under risk. Econometrica 47:263–291.
5. Körding K (2007) Decision theory: What "should" the nervous system do? Science 318:606–610.
6. Hollis M, Nell EJ (1975) Rational Economic Man: A Philosophical Critique of Neo-Classical Economics (Cambridge Univ Press, Cambridge, UK).
7. Herrnstein RJ (1990) Rational choice theory: Necessary but not sufficient. Am Psychol 45:356–367.
8. McDowell J (2013) On the theoretical and empirical status of the matching law and matching theory. Psychol Bull 139:1000–1028.
9. Herrnstein RJ (1961) Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav 4:267–272.
10. Staddon J (1964) Reinforcement as input: Cyclic variable-interval schedule. Science 145:410–412.
11. Herrnstein RJ (1970) On the law of effect. J Exp Anal Behav 13:243–266.
12. Baum WM (1974) Choice in free-ranging wild pigeons. Science 185:78–79.
13. Davison M, McCarthy D (1988) The Matching Law: A Research Review (Lawrence Erlbaum Assoc., Mahwah, NJ).
14. Herrnstein RJ (2000) The Matching Law: Papers in Psychology and Economics (Harvard Univ Press, Cambridge, MA).
15. Sugrue LP, Corrado GS, Newsome WT (2004) Matching behavior and the representation of value in the parietal cortex. Science 304:1782–1787.
16. Loewenstein Y, Seung HS (2006) Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proc Natl Acad Sci USA 103:15224–15229.
17. Rachlin H (1971) On the tautology of the matching law. J Exp Anal Behav 15:249–251.
18. Rachlin H, Battalio R, Kagel J, Green L (1981) Maximization theory in behavioral psychology. Behav Brain Sci 4:371–388.
19. Binmore K (1991) Rational choice theory: Sufficient but not necessary. Am Psychol 46:797–799.
20. Staddon JER (1980) Limits to Action: The Allocation of Individual Behavior (Academic, Cambridge, MA).
21. Rachlin H (1982) Economics of the matching law. Quant Anal Behav 2:347–374.
22. Rachlin H, Green L, Tormey B (1988) Is there a decisive test between matching and maximizing? J Exp Anal Behav 50:113–123.
23. Herrnstein RJ, Loveland DH (1975) Maximizing and matching on concurrent ratio schedules. J Exp Anal Behav 24:107–116.
24. Baum WM (1981) Optimization and the matching law as accounts of instrumental behavior. J Exp Anal Behav 36:387–403.
25. Otto AR, Taylor EG, Markman AB (2011) There are at least two kinds of probability matching: Evidence from a secondary task. Cognition 118:274–279.
26. Kubanek J, Snyder LH (2015) Matching behavior as a tradeoff between reward maximization and demands on neural computation. F1000Res 4:147.
27. Rachlin H (1978) A molar theory of reinforcement schedules. J Exp Anal Behav 30:345–360.
28. Heyman GM (1982) Is time allocation unconditioned behavior? Quant Anal Behav 2:459–490.
29. Shephard RW, Färe R (1974) The Law of Diminishing Returns in Production Theory (Springer, Berlin), pp 287–318.
30. Brue SL (1993) Retrospectives: The law of diminishing returns. J Econ Perspect 7:185–192.
31. Turgot ARJ (1999) The formation and distribution of wealth. Reflections on Capitalism, ed Jupp K (Othila Press, London).
32. Cobb CW, Douglas PH (1928) A theory of production. Am Econ Rev 18:139–165.
33. Knight FH (1944) Diminishing returns from investment. J Polit Econ 52:26–47.
34. Baumol WJ (1977) Economic Theory and Operations Analysis (Prentice-Hall, Englewood Cliffs, NJ).
35. Jones C (2013) Introduction to Economic Growth (WW Norton & Co., New York).
36. Real LA (1980) On uncertainty and the law of diminishing returns in evolution and behavior. Limits to Action: The Allocation of Individual Behavior, ed Staddon JER (Academic, Cambridge, MA), pp 37–64.
37. Fagen R (1987) A generalized habitat matching rule. Evol Ecol 1:5–10.
38. Morris DW (1994) Habitat matching: Alternatives and implications to populations and communities. Evol Ecol 8:387–406.
39. Charnov EL (1976) Optimal foraging, the marginal value theorem. Theor Popul Biol 9:129–136.
40. Myerson J, Green L (1995) Discounting of delayed rewards: Models of individual choice. J Exp Anal Behav 64:263–276.
41. Levy DJ, Glimcher PW (2012) The root of all value: A neural common currency for choice. Curr Opin Neurobiol 22:1027–1038.
42. Croxson PL, Walton ME, O'Reilly JX, Behrens TE, Rushworth MF (2009) Effort-based cost–benefit valuation and the human brain. J Neurosci 29:4531–4541.
43. Prévost C, Pessiglione M, Météreau E, Cléry-Melin ML, Dreher JC (2010) Separate valuation subsystems for delay and effort decision costs. J Neurosci 30:14080–14090.
44. Vaughan W (1981) Melioration, matching, and maximization. J Exp Anal Behav 36:141–149.
45. Körding KP, Wolpert DM (2006) Bayesian decision theory in sensorimotor control. Trends Cognit Sci 10:319–326.
46. Soltani A, Wang XJ (2010) Synaptic computation underlying probabilistic inference. Nat Neurosci 13:112–119.
47. Herrnstein RJ, Prelec D (1991) Melioration: A theory of distributed choice. J Econ Perspect 5:137–156.
48. Lau B, Glimcher PW (2005) Dynamic response-by-response models of matching behavior in rhesus monkeys. J Exp Anal Behav 84:555–579.
49. Corrado GS, Sugrue LP, Seung HS, Newsome WT (2005) Linear-nonlinear-Poisson models of primate choice dynamics. J Exp Anal Behav 84:581–617.
50. Soltani A, Wang XJ (2006) A biophysically based neural model of matching law behavior: Melioration by stochastic synapses. J Neurosci 26:3731–3744.
51. Herrnstein RJ, Prelec D (1992) Melioration (Russell Sage Foundation, New York).
52. Staddon J, Motheral S (1978) On matching and maximizing in operant choice experiments. Psychol Rev 85:436–444.
