The Deterrence Effect of Corporate Leniency Programs

Jeong-Yoo Kim∗ Kyung Hee University

December 8, 2015

Abstract

We examine the deterrence effect of the leniency program. Depending on the relative importance of type I errors and type II errors, we characterize either a separating equilibrium in which only guilty defendants apply for leniency or a pooling equilibrium in which guilty and innocent types of defendants both apply for leniency. The leniency program has two conflicting effects on the incentive to collude. On one hand, a discount in fines reduces the expected cost of collusion and therefore increases the collusion rate. On the other hand, it discourages firms from colluding due to its informational efficiency. Contrary to widespread perceptions, the model predicts that the latter effect predominates while the former effect vanishes in equilibrium, so that the leniency program unambiguously reduces the overall collusion rate. Accordingly, it is socially beneficial, as long as the wrongful conviction probability of innocent firms is very low. The underlying intuition is that the antitrust authority must keep the expected fine at a level no lower than that without the leniency program in order to minimize type II errors, which prevents a diminution in the deterrence effect. This implies that it is never optimal to waive all penalties for a self-reporting firm. Even with more than two co-defendants, the general intuition carries over: the optimal leniency program allows a discount to only one defendant, and the resulting collusion rate is reduced.

∗(Mailing address) Department of Economics, Kyung Hee University, 1 Hoegidong, Dongdaenumku, Seoul 130-701, Korea, (Tel & Fax) +82-2-961-0986, Email: [email protected]

JEL Classification Code: L41

Key Words: leniency programs, multiple defendants, deterrence

Running Head: Corporate Leniency Programs

1 Introduction

The leniency program has been used as a means of law enforcement against illegal antitrust behavior, starting in the US and followed by the EU, Germany, South Korea and Japan. According to the Korean Fair Trade Commission (KFTC), the corporate leniency program played a role in the investigations of 17 out of 25 uncovered and fined cartels in Korea.1 According to the U.S. Justice Department (2010), in the U.S., where the program was first introduced in 1978, firms have been fined more than $5 billion for antitrust crimes since 1996, with over 90 percent of the total attributed to leniency applications.

Despite the efficiency argument in its favor, the leniency program has attracted many criticisms. For example, some critics argue that the leniency program is unfair because it leads to inconsistent punishment for the same illegal corporate behavior, and because a corporate criminal who applies for leniency may receive a penalty that is not commensurate with the crime, or even no penalty at all.2 Also, the leniency program may weaken the deterrent effect because it reduces the expected cost of illegal behavior. Notwithstanding these criticisms, it is now one of the most effective tools in cartel investigation.

Recently, many authors, including Aubert, Kovacic and Rey (2006), Motta and Polo (2003), Spagnolo (2003), Feess and Walzl (2004), Motchenkova (2004) and Harrington (2008), have made economic analyses of the leniency program, focusing especially on the dynamic incentives of firms and the optimal design of the leniency program. However, most of this work assumes that the antitrust authority’s objective is to minimize the occurrence of illegal activities such as price-fixing collusion.
This assumption is problematic, because the antitrust authority must also be concerned about judicial errors, that is, the possibilities that innocent defendants are penalized (type-I error) and that guilty defendants are acquitted (type-II error). In fact, the antitrust authority does care about the possibility of judicial

1 See The Korea Times (2012).

2 This argument is incomplete, however, because it considers only one side: the possibility that an actually guilty firm might be acquitted with the leniency program. The opposite possibility, that an actually innocent firm who is likely to be convicted might be acquitted with the leniency program, clearly exists as well.

errors.3

The leniency program appears to be quite similar to plea bargaining, whereby the prosecutor is allowed to bargain with defendants over sentences in exchange for their promise to testify against other co-defendants. Therefore, earlier work on plea bargaining may help us understand the mechanism through which the leniency program can make society better off. Most of the economic analysis of plea bargaining, however, focuses on the single-defendant setting. There are some exceptions, such as Kobayashi (1992) and Kim (2009), which consider the situation of multiple co-defendants who are known to be connected with the same crime. However, neither paper addresses the dynamic effect of plea bargaining on crime deterrence in a multiple-defendant setting. Reinganum (1993) and Miceli (1996) took up the dynamic issue and examined how the practice of plea bargaining can influence the criminal incentive of a potential defendant, but only in a model of a single defendant.

In a companion paper, Kim (2015), we considered a dynamic model of plea bargaining in which a prosecutor accuses multiple co-defendants. In that model, defendants are not ex ante known to be guilty; their guilt or innocence is endogenously determined as a result of their criminal decisions. That paper shares much intuition with the current paper.

We will characterize a separating equilibrium in which only guilty defendants apply for leniency and a pooling equilibrium in which both types of defendants (guilty and innocent) apply for leniency. In both types of equilibria, the reduced fines must be fair in the sense that the more culpable defendant receives a harsher penalty. This stands in sharp contrast to the result of Kobayashi, who demonstrated the possibility of an unfair equilibrium in which the more culpable defendant receives a less harsh penalty in a plea bargaining setting. The difference comes mainly from the equilibrium concept employed.
We demonstrate that unfairness cannot happen in equilibrium as long as the spirit of is respected by implicitly requiring agents’ beliefs (or predictions) to be consistent with equilibrium strategies. The clear empirical regularities in the present-day US leniency institution motivates the model in which well-reinforced beliefs support veridical matching of beliefs and objective frequency as a forcing principle leading to the equilibrium selection. In a separating equilibrium, the reduced fines must be asymmetric because it is more costly to induce a more culpable defendant to report by accepting its respective discounted

3 The mission of the Federal Trade Commission (FTC) is to prevent anticompetitive business practices and to accomplish this without unduly burdening legitimate business activity. (See http://www.ftc.gov/about-ftc.) This is evidence that the antitrust authority is concerned about minimizing the occurrence of judicial errors as well, not just about minimizing illegal corporate activities or maximizing the number of convictions.

fine and thus only a less culpable defendant is offered a discount. In a pooling equilibrium, both defendants are offered discounts. Intuitively, higher pooling fines increase the probability of a type-I error and decrease the probability of a type-II error. The optimal pooling equilibrium is determined by the reduction in fines that balances these two effects (i.e., that equates the marginal benefit from reducing type-II errors with the marginal cost of increased type-I errors for each defendant). The choice between a separating equilibrium and a pooling equilibrium depends on the relative importance of type-I and type-II errors. If type-I errors are relatively important, then the antitrust authority offers larger aggregate discounts, which are accepted by both types, leading to the pooling equilibrium. In contrast, the separating equilibrium is selected whenever type-II errors are relatively important.

It is worthwhile to compare our result with that of Kobayashi (1992). Kobayashi considers a complete-information game of plea bargaining with two co-defendants. His setup assumes that the defendants are known to be guilty. Therefore, the prosecutor’s objective is specified as maximizing the sum of the expected penalties (or, equivalently, minimizing the sum of expected discounts offered in plea bargains relative to the otherwise expected punishment). Kobayashi obtains the counterintuitive result that a more culpable defendant may receive a more lenient penalty, which is intuitively unfair. Such unfair outcomes may occur, for example, when a more culpable defendant (Defendant 1) believes that the less culpable Defendant 2 is unlikely to accept his respective offer, while Defendant 2 believes that Defendant 1 is highly likely to accept his respective offer.
In this case, Defendant 1 is more likely to reject the offer of the antitrust authority than Defendant 2. Thus, the authority is forced to make a more attractive offer to Defendant 1 to induce him to self-report, implying that the fine offered to Defendant 1 can, in theory, be lower in equilibrium than the fine offered to Defendant 2 (who is less culpable). This unsettling possibility of the less culpable defendant receiving a more severe penalty follows from Kobayashi’s assumption that each defendant’s belief regarding the other’s acceptance decision is exogenously given. However, if the beliefs are required to be consistent with defendants’ equilibrium strategies, then Kobayashi’s unfair plea bargains cannot occur in equilibrium.

Another difference between Kobayashi’s model and ours is the meaning of “more culpable.” In our model, we define one defendant as being more culpable if he deserves a larger penalty than the other. In contrast, Kobayashi specifies the more culpable defendant as the defendant who has a higher probability of conviction and whose acceptance of the plea offer increases the conviction probability of the other by more. In our model, self-reporting of a

more culpable defendant does not necessarily imply a larger increase in the probability of the other defendant’s conviction. Both defendants’ conviction probabilities increase equally, no matter which co-defendant reports. In other words, a more culpable defendant does not have more information than the other but possesses the same information. However, because a more culpable defendant’s fine conditional on going to court is expected to be higher, that defendant benefits more from judicial errors, implying that the antitrust authority must offer a larger discount to a more culpable defendant (who is less willing to report) in order to induce him to accept it. Since this is costly to the antitrust authority, she would do better to offer a discount only to a less culpable defendant. Hence, equilibrium fines must be fair.

Despite the attractive static efficiency of reducing judicial errors through the leniency program, there are nevertheless important concerns about the institution’s potential for dynamic inefficiency, because reduced fines may increase the incentive to collude. Presumably, one of the most reasonable criticisms of the leniency program is that it may soften the deterrent effect of expected penalties, thereby increasing collusion rates. Although it seems fundamental that firms, as potential defendants, face lower expected fines under the leniency program, our analysis leads to the unambiguous conclusion that the benefit of information revelation exceeds this disadvantage and, thus, that the leniency program leads to lower collusion rates. The reason is that the disadvantage vanishes in equilibrium, because the antitrust authority must keep the expected fine at a level no lower than that without the leniency program in order to minimize type II errors, which in turn prevents a diminution in the deterrence effect. This implies that it is never optimal to waive all penalties for a self-reporting firm.
This result contrasts with Harrington (2008), who provides sufficient conditions under which it is optimal to waive all penalties for the first firm to come forward. Harrington’s sufficient conditions suggest that the antitrust authority should not provide amnesty when she has a sufficiently strong case, in the sense that the conviction probability without insider testimony is very high. He also argues that, although other work simply assumes that the first application is always approved, the U.S. Corporate Leniency Policy places some conditions on when leniency is given, and that his result is consistent with this reality. However, the result of this paper does not support his, in that full amnesty should not be allowed for any conviction probability, although it is the case that the reduced fine should be higher if the conviction probability is higher. Therefore, this paper has the stronger policy implication that full amnesty is never optimal, rather than that it may not be optimal.

The main lesson of this paper is that it is optimal to set discounted fines high enough.

It has two advantages. First, it can prevent a diminution in the deterrence effect that may otherwise result from the leniency program. Second, it can prevent innocent defendants from applying for leniency.4 In October 2014, Samsung Life Insurance and Hanwha Life Insurance, which had been accused by the Korean Fair Trade Commission (KFTC), applied for leniency and received fine reductions of 100% and 50%, respectively; later, however, the Korean court ruled that there was no collusion for the purpose of limiting competition, in spite of some agreement for information exchange. It was the judgement of the court that the self-reporting of Samsung and Hanwha was solely to avoid a possibly huge fine. A high discounted fine may address the concern of the antitrust authority raised by such antitrust cases as KFTC vs. Korean insurance companies.5

The deterrent effect of plea bargaining was analyzed earlier by Reinganum (1993) and Miceli (1996), but only for a single defendant. Miceli’s model is closer to ours in the sense that it, like ours, is a screening model. Miceli (1996) considers a two-stage game in which a legislature first determines criminal punishments, and once crimes are committed in response to those punishments, actual punishments are determined by prosecutors and judges in the subsequent plea bargaining or trial stage. He shows that if legislatures raise the magnitude of punishment too high, less crime may be deterred due to the response of prosecutors who believe that the punishment does not fit the crime. Reinganum (1993) also considers a two-stage game, but it is built on Reinganum (1988). Our model extends Miceli’s analysis to the case of two co-defendants, although the model itself is quite different.

It is also important to compare our result with previous work on the optimal leniency program, most of which is modelled as a repeated game among colluding firms.
In particular, Harrington (2008) identified three effects of a leniency policy: the cartel amnesty effect, the deviator amnesty effect and the race to the courthouse effect. The cartel amnesty effect, which is intuitively clear, is that more leniency increases the incentive to collude since it reduces the expected penalty. This effect exists in our model as well, depending on the

4 Some argue that less than full amnesty is not necessary to prevent innocent firms from applying for leniency, because applying for leniency amounts to admitting guilt, thereby triggering civil litigation with potentially enormous damages. Of course, this possibility restricts the incentive of innocent firms to come forward, but under the U.S. leniency program, the firm that first comes forward gets additional benefits, including protection from punitive damages and no obligation to submit information obtained under the leniency program (Micron Technology, Inc., Case No. 09-mc-00609 (D.D.C.2010)), and possibly more benefits in the future. It would then be difficult to prevent innocent firms that anticipate full amnesty from coming forward.

5 Bar-Gill and Gazal-Ayal (2006) provided a similar solution to prevent innocent defendants from pleading guilty. They proposed to restrict the permissible sentence reduction in a plea bargain.

size of reduced fine, but in equilibrium the effect vanishes, since the equilibrium size of a reduced fine is determined so as to maintain the disincentive to collude. The deviator amnesty effect means that the leniency program provides an incentive to betray (cheat and self-report) by reducing the fine. Since our model is not a repeated game but a one-shot game, the effect does not exist insofar as the deviator is interpreted as a cheater. Even if the deviator refers to the self-reporter, the expected fine of each defendant is not lower in equilibrium than without the leniency program. Finally, the race to the courthouse effect, the meaning of which is clear, does not exist in our model either. In fact, we believe that this effect is not due to the leniency program per se, but due to the restriction that amnesty is allowed to only one reporter. In our model, however, the opportunity cost of not self-reporting (the opportunity benefit of self-reporting) is larger when two defendants get discounts in fines than when only one defendant gets a discount, because each defendant gets the benefit of the whole discount in the former case, while each gets the benefit only probabilistically in the latter. Moreover, although collusion is indeed reduced due to higher expected costs if only one defendant gets a discount, the purpose of such a unilateral discount is not to reduce the collusion rate but to reduce type-II errors. The lower collusion rate results directly from a higher expected fine, not from the race to the courthouse effect, at least in our model.

The paper is organized as follows. In Section 2, we set up the model of the leniency program under incomplete information. In Section 3, taking the discounted fines as given, we analyze the defendants’ decisions about whether to self-report. In Sections 4 and 5, we characterize the separating equilibrium fines and pooling equilibrium fines.
In Section 6, we consider an extension that brings into our model defendants’ decisions about whether to collude before they enter the leniency stage. In Section 7, we discuss the effects of two further modifications to our modeling. First, we generalize the model to more than two defendants. We then discuss the possibility that reporting decisions of defendants can reduce the investigation effort of the antitrust authority. Section 8 contains concluding remarks and further caveats. All the proofs are in the Appendix.

2 Model

There are two firms (potential co-defendants) denoted by Di, i = 1, 2. They may or may not have engaged in illegal price-fixing behavior. That is, they are either guilty (G) or innocent (I). Defendants are referred to as type t = G or I, denoted Di(G) or Di(I) respectively, if they are guilty or innocent. Each defendant’s type t is known by both defendants but not by the antitrust authority (AA). AA is assumed to know only that Prob(t = G) = α ∈ (0, 1). AA’s belief probability α is assumed to be common knowledge among all players, including the judge (J).

Suppose that AA is going to investigate the two firms to prosecute them for collusion. Then, in anticipation of this prospect, the firms may apply for leniency.6 Following Harrington (2008), we will refer to this as the prosecution effect. For simplicity, we will assume that the firms are certain that the investigation/prosecution will occur.

Let fi (> 0) be the fine ordered by the judge for defendant Di if convicted of collusion at court. Without loss of generality, the two fines are assumed to satisfy the inequality f1 ≥ f2.7 It is also assumed that f1 and f2 are common knowledge. If firms apply for leniency, they may pay reduced fines. Usually, a fine can be reduced in exchange for the reporting cartel member’s promise to testify in court in support of some fact or piece of evidence. Our model focuses on the leniency program in which the defendant who applies for leniency agrees to testify against the other defendant.8

We consider a simple but fairly general game between AA, D1 and D2 modeled as follows. We call it “simple” because it involves only two defendants, as a benchmark case. We also call it “general” because the reduced fine does not depend on the order of reporting. Thus, there is no preemption effect leading to a race to the courthouse.9

AA chooses an individualized reduced fine ri ∈ [0, ∞) for each defendant Di. Each Di then decides whether to apply for leniency or not given the reduced fine. If Di applies, then its fine is ri with certainty. If Di does not apply, then J decides whether to convict the firm.10 In this case, J may find that t = G or t = I based on the evidence submitted at trial by AA, D1 and D2. The probability that a defendant is convicted depends on t and whether

6 In fact, it is not crucial whether the firms apply for leniency before or after the investigation starts. The 1993 revision of the U.S. Corporate Leniency Program opens the possibility that firms can apply for leniency even after an investigation has been initiated, as long as the investigation has yet to produce evidence against them.

7 We can think of firm 1 as a larger firm whose gain from collusion would be higher.

8 Our model assumes a stochastic type-generating process in which the guilt of the two defendants is perfectly correlated. Therefore, it may appear to some readers that it would not be necessary for a self-reporting defendant to testify against the other defendant. Although the guilt of one defendant does indeed imply guilt of the other given our specification of the true type-generating process, the actual judicial institutions that we are modeling require that each defendant’s guilt be verified individually, which places a real economic burden on the court (to verify the guilt of the other defendant, who has not entered into a leniency program) that is the object of our model to explain.

9 By the preemption effect, Harrington (2008) means the tendency to apply for leniency earlier than the rival.

10 Following Grossman and Katz, we are implicitly assuming that AA can effectively commit to not dismissing the case.

the other defendant has agreed to testify against it. Denote the conviction probability when neither self-reports as q(t).11 We assume (i) that q(t) ∈ (0, 1) for both types t; and (ii) that q(I) < q(G). Assumption (i) implies that there is a strictly positive probability of both type-I and type-II errors: q(I) > 0 implies there are type-I errors and q(G) < 1 implies there are type-II errors. The assumption that the probability distributions for both types of judicial errors are non-degenerate is crucial in driving the main results of our subsequent analysis. Assumption (ii) reflects the intuition (and empirical reality) that innocent defendants can defend themselves better than guilty defendants can.12

If Di applies for leniency and testifies against Dj, then we assume that Dj is convicted with probability one.13 Indeed, J might conceivably interpret a defendant’s refusal to self-report as a signal of that defendant’s innocence; we rule this out, however, with the assumption that such a signal would not affect q(t).14 Moreover, we assume that J has no judicial discretion, implying that once the defendants are found to be guilty at trial, all J can do is order the fine fi for any defendant who did not self-report.15 In order to isolate the informational motive for the leniency program, which is the focus of our model, from the cost-saving motives analyzed by others, our model also abstracts from trial costs by assuming them to equal zero.

Let x̃i denote the fine that Di deserves to pay and xi denote the fine that it actually pays. We assume that Di minimizes its own expected fine xi. AA, based on the benevolent social-welfare objective of minimizing judicial error (i.e., matching fines to

11 We are assuming that either both D1 and D2 are convicted or neither is convicted. In other words, the model’s state space excludes the possibility that only one defendant is convicted. We can show that if one interprets q as the probability that each defendant is convicted and these probabilities are independent, then the subsequent analysis remains valid.

12 Grossman & Katz (1983) and Reinganum (1988) also assume that a guilty defendant is more likely to be convicted than an innocent one.

13 The 100% conviction probability conditional on one defendant testifying against the other is not as restrictive as it may seem. This assumption can be weakened and is maintained only to simplify the analysis while preserving the intuition that one defendant’s self-report (in exchange for testifying against the other) increases the conviction probability of the other defendant. An alternative assumption would be that the conviction probability r(t, α) is strictly less than unity (when the other defendant self-reports) but greater than q(t), which would only change the equilibrium reduction in fines without affecting important qualitative features of the equilibrium.

14 There is a rule (Rule 11) in the Federal Rules of Criminal Procedure against admissibility of plea discussions as evidence in court.

15 If J realizes the possibility of his own judicial errors, then he or she may prefer a lower fine rather than fi; but reduced fines are not permitted under our assumption of no judicial discretion. Therefore, our setup rules out the possibility of intermediate judgements.

appropriately fit the social harm caused), is assumed to minimize a weighted average of type-I errors (which occur when xi > x̃i) and type-II errors (which occur when xi < x̃i).

Thus, the defendant Di’s loss-function objective is:

Wi(xi; x̃i) = xi.

AA’s loss-function objective, denoted LA, measures the loss associated with an accusation against two defendants as the sum of losses Li associated with each defendant (based on the weighted sum of type-I and type-II errors for each):

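$$
L_A(x_1, x_2; \tilde{x}_1, \tilde{x}_2) = \sum_{i=1}^{2} L_i(x_i; \tilde{x}_i),
$$

$$
L_i(x_i; \tilde{x}_i) =
\begin{cases}
\theta \, \|x_i, \tilde{x}_i\| & \text{if } x_i > \tilde{x}_i, \\
(1 - \theta) \, \|x_i, \tilde{x}_i\| & \text{if } x_i < \tilde{x}_i, \\
0 & \text{if } x_i = \tilde{x}_i,
\end{cases}
$$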

where ∥x, y∥ is a metric measuring the distance between x and y,16 and θ ∈ (0, 1) represents the relative importance of type-I errors. For simplicity, we assume that ∥x, y∥ = (x − y)^2. Because J in our model is simply a machine that orders convictions and fines by applying rules (as specified in our model), there is no need to introduce a loss function for J.
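The loss functions above can be sketched in code as follows. This is a minimal illustration assuming the squared distance ∥x, y∥ = (x − y)^2; the function names `loss_i` and `loss_aa` and the numerical values in the note below are hypothetical and are not part of the model.

```python
def loss_i(x, x_deserved, theta):
    """Per-defendant loss L_i: x is the fine paid, x_deserved the fine deserved.

    theta in (0, 1) is the weight on type-I errors (overpunishment);
    1 - theta is the weight on type-II errors (underpunishment).
    The squared distance is an assumption taken from the text.
    """
    dist = (x - x_deserved) ** 2
    if x > x_deserved:        # type-I error: pays more than deserved
        return theta * dist
    if x < x_deserved:        # type-II error: pays less than deserved
        return (1 - theta) * dist
    return 0.0                # fine matches what is deserved


def loss_aa(xs, xs_deserved, theta):
    """AA's loss L_A: the sum of the per-defendant losses."""
    return sum(loss_i(x, xd, theta) for x, xd in zip(xs, xs_deserved))
```

For instance, with θ = 0.25, a guilty defendant who deserves a fine of 60 but pays 30 contributes a type-II loss of 0.75 × 900 = 675, while an innocent defendant who deserves 0 but pays 60 contributes a type-I loss of 0.25 × 3600 = 900.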

We are implicitly assuming that both defendants are informed of both r1 and r2, but they must make their individual decisions to apply for leniency without first observing the other defendant’s decision. We can then express Di’s strategy as the rule σi : S^2 × T → D, where S = [0, ∞), T = {G, I} and D = {0, 1}, and where d ∈ D is an indicator variable representing the decision to self-report: d = 1 if the defendant’s decision is to self-report, and d = 0 if it is not.

3 Defendants’ Decision Rules

As is typical of sequential games, the final stage is considered first. In this section, we develop the second-stage Nash-equilibrium decision rules of the two defendants conditional on the reduced fines (r1, r2) decided earlier by AA in the first stage of the game.

16 A function ∥·, ·∥ is called a metric if it satisfies (i) ∥x, y∥ ≥ 0 with equality if and only if x = y, (ii) ∥x, y∥ = ∥y, x∥ and (iii) ∥x, z∥ ≤ ∥x, y∥ + ∥y, z∥, for all x, y, z. In particular, if ∥x, y∥ = |x − y|^n, AA is risk-neutral if n = 1, risk-averse if n > 1 and risk-loving if n < 1.

For each type t, the two defendants may reach different decisions about self-reporting, so there are four combinations of self-reporting decisions. The two-dimensional space of AA’s choice of reduced fines can be partitioned as follows to correspond to the defendants’ four possible joint decision profiles (denoting the respective decisions as R for “report” and N for “not report”):17

RR(t) = {(r1, r2) | ri ≤ fi, i = 1, 2},

NN(t) = {(r1, r2) | ri > q(t)fi, i = 1, 2},

RN(t) = {(r1, r2) | r1 ≤ q(t)f1, r2 > f2},

NR(t) = {(r1, r2) | r1 > f1, r2 ≤ q(t)f2}.

RR(t) represents the set of reduced fines that induce both defendants to report. NN(t) represents the set of reduced fines that fail to induce either to report. RN(t) and NR(t) are the sets of reduced fines that induce only one defendant to report. These four regions are illustrated in Figure 1.

As shown in Figure 1, the regions of both reporting and neither reporting, RR(t) and NN(t), overlap in the set M = {(r1, r2) | q(t)fi < ri ≤ fi, i = 1, 2}. The implication of this overlap is that there are multiple equilibria: one in which all (r1, r2) ∈ M are accepted by both defendants, and the other in which all (r1, r2) ∈ M are rejected by both defendants. If each defendant believes that the other will accept the reduced fine and report, then the RR-equilibrium will be realized. If both defendants believe that the reduced fine offered to the other will be rejected, then the NN-equilibrium will be realized.

The selection criterion of Pareto dominance can be used to deal with this multiplicity of equilibria. The NN-equilibrium Pareto dominates the RR-equilibrium as long as ri > q(t)fi. We apply this equilibrium selection criterion as needed.18
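The partition and the selection rule above can be sketched as a small classifier. This is an illustration only: the function name `reporting_profile` and all numerical values are assumptions made here, ties are resolved in favor of reporting, and the Pareto-dominance criterion is taken to select NN whenever an offer lies in the overlap M.

```python
def reporting_profile(r1, r2, f1, f2, q):
    """Return the second-stage profile ('RR', 'NN', 'RN' or 'NR') for one type.

    r1, r2: reduced fines offered by AA; f1, f2: court fines;
    q: conviction probability q(t) of this type when neither reports.
    """
    in_rr = r1 <= f1 and r2 <= f2          # both accept if the other reports
    in_nn = r1 > q * f1 and r2 > q * f2    # both reject if the other rejects
    if in_rr and in_nn:
        return "NN"                        # overlap M: Pareto dominance picks NN
    if in_rr:
        return "RR"
    if in_nn:
        return "NN"
    if r1 <= q * f1 and r2 > f2:
        return "RN"                        # only D1 reports
    return "NR"                            # remaining case: r1 > f1, r2 <= q * f2
```

With the hypothetical values f1 = 100, f2 = 60, q(G) = 0.5 and q(I) = 0.2, the offer (r1, r2) = (100, 30) is classified as RR for the guilty type and NN for the innocent type, so only guilty defendants report.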

17 We assume that any ties in payoffs where a defendant might otherwise be indifferent are resolved in favor of reporting.

18 Under the Pareto dominance selection criterion, it is easy to see that, whenever t = G, the complete information version of this game has the unique equilibrium offer point labeled as point E in Figure 1. This result implies that the equilibrium offers of reduced fines must be fair as long as f1 > f2, which is different from Kobayashi (1992). The intuition is that if the co-defendants are known to be guilty, then only type-II errors are possible. Therefore, to minimize type-II errors, AA minimizes the distance between those offers and (f1, f2). Since D1 benefits more from judicial errors at trial than D2 does (because the conviction probability in court is the same for D1 and D2), AA must offer a larger discount to induce D1 to report. Therefore, AA prefers to give D2 a discount whenever a discount is given to only one defendant. Because D1 knows that D2 will report by accepting r2, D1 accepts r1 = f1 and reports.

If ri < q(t)fi for all i = 1, 2, then “reporting” (d = 1) is the dominant strategy for each defendant in the game between the two defendants. However, the game differs from the Prisoner’s Dilemma in the sense that the strategy profile obtained when both defendants “defect” by reporting and revealing information is Pareto superior to the cooperative outcome obtained when both defendants “cooperate” by refusing to report. Hence, the strategic interaction between the two defendants is not a “dilemma.” The defect-defect profile (i.e., both players reporting) is, in our model, collectively as well as individually rational.

Multiplicity of equilibria occurs in Harrington (2008) as well, and he also uses Pareto dominance as an equilibrium selection criterion, just as in this paper. His model differs from ours in the sense that the choice of the firms is whether to collude or apply for amnesty (not simply whether to self-report or not), and his model involves infinitely repeated interaction among firms (not simply a reduced form). In his model, the equilibrium in which no firm self-reports Pareto dominates the equilibrium in which all firms self-report, just as in our model. He also notes that the amnesty game turns into a Prisoners’ Dilemma as leniency gets larger, which distinguishes it from our model, because the game between firms cannot be a Prisoners’ Dilemma for any value of r in our model.

4 Separating Equilibrium

The defendants’ types may be revealed by their decisions (in a separating equilibrium) or remain unrevealed (in a pooling equilibrium). The possible separating and pooling offers are shown in Figure 2, which illustrates defendants’ equilibrium decisions, both when t = G and when t = I. In this section, we characterize the separating equilibrium. Consider a point in the separation region (SE) in Figure 3. Note that all points in SE generate the same type-I error, since they are always rejected by innocent defendants. Thus, the AA’s most preferred point in this region is the one that minimizes type-II errors. The type-II errors associated with a point such as point A in Figure 3 can be depicted by an iso-loss circle that consists of pairs of reduced fines that produce the same type-II error. It is not difficult to see that point E yields the least type-II error among the points in the set K = RR(G) ∩ NN(I) of SE. It remains to compare type-II errors associated with point E and points outside K

(but still in SE), such as point B. Since reduced fines at point B are accepted by D2(G) and rejected by D1(G), the comparison here depends on losses with respect to D1(G). Note that the resulting errors associated with D1(G) at point B will be larger than those at point E due to the possibility of J’s judicial errors at court. Hence, point E is the only possible

pair of fines that could be a separating equilibrium.

Proposition 1 If the pair of reduced fines $(r_1^s, r_2^s)$ is a separating equilibrium, then it must be the case that $(r_1^s, r_2^s) = (f_1, q(G) f_2)$.

The proof is omitted, since it is clear from the above argument. This proposition implies that $r_1^s > r_2^s$ if $f_1 > f_2$, which suggests that the separating equilibrium fines are fair as long as $f_1 > f_2$.19 One interesting feature of the equilibrium pair of fines in Proposition 1 is that those fines must be asymmetric even if $f_1 = f_2$. This observation suggests that separating discounted

fines are unfair if and only if f1 = f2. The intuition for this result is as follows. Only guilty defendants choose to accept the fines in a separating equilibrium. Therefore, it is better from AA’s point of view for those fines to be as close as possible to the penalties that fit the crime, (f, f), where f = f1 = f2. Symmetric fines that are both accepted must be at most q(G)f. Such a pair of fines, far from (f, f), can never be optimal according to AA’s objective: AA could always reduce type-II errors by increasing one of the fines above the other. Why is a less culpable defendant (assuming both defendants are guilty) offered a lower fine whenever f1 ≠ f2? For separation, at least one guilty defendant must accept its respective fine. From AA’s point of view, it is more costly to induce the more culpable defendant to accept the offer, because a larger discount is required. To see this, note that q(G)fi is the maximal fine that can be offered to Di(G) and still be accepted, given that Dj rejects. Then, for the two required discounts, we have the inequality f1 − q(G)f1 > f2 − q(G)f2, since f1 > f2. This implies that AA always prefers to offer the discount to the less culpable of the two defendants so that it will be accepted at minimum cost. This has an important policy implication: in a separating equilibrium, an AA that minimizes the weighted sum of type-I and type-II errors awards amnesty only to one firm that comes forward. Harrington (2008) obtained a similar result, but the economic rationale is quite different. In our model, the purpose is to reduce judicial errors: awarding amnesty to more than one firm would cause too much judicial error. Harrington, on the other hand, argues that awarding amnesty to more than one firm would simply increase the cartel amnesty effect by reducing expected penalties.
However, at least in our model with two

19In fact, this is the unique equilibrium outcome in the game of complete information in which defendants are known to be guilty. This result means that reduced fines are fair even in the case of complete information.

firms, his argument does not hold, because collusion requires two firms; once a firm decides to apply for leniency, the other firm cannot collude alone even with the lower expected penalty. The result also implies that waiving all penalties for a reporting firm cannot be an equilibrium: it is never optimal to award full amnesty, because doing so increases type-II errors too much. This contrasts with Harrington (2008), who provides sufficient conditions under which it is optimal to waive all penalties for the first firm to come forward. The difference again comes from the assumption on AA’s objective. In Harrington (2008), AA is assumed to minimize the frequency of collusion without regard to the social cost of judicial errors (type-I and type-II errors). Similarly, Motta and Polo (2003) derived the contrasting result that the optimal leniency policy

calls for ri = 0. This is also due to their assumption that AA minimizes the occurrence of collusion.
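The comparison behind Proposition 1 can be checked with a few lines of arithmetic. The sketch below assumes, as in the iso-loss circles of Figure 3, that the type-II loss from an accepted pair of fines is the squared distance between (r1, r2) and (f1, f2). The function name `type2_loss`, the candidate labels, and the parameter values are illustrative choices for this sketch, not part of the original analysis.

```python
def type2_loss(r, f):
    """Squared distance between accepted fines r and statutory fines f.

    Assumption: this stands in for AA's type-II loss when both guilty
    defendants accept, up to the constant weight alpha * (1 - theta).
    """
    return sum((fi - ri) ** 2 for ri, fi in zip(r, f))


# Illustrative parameters (hypothetical values for this sketch).
f1, f2, qG = 2.0, 1.0, 0.8
f = (f1, f2)

# Candidate offers under which both guilty defendants report.
candidates = {
    "discount to D2 only": (f1, qG * f2),
    "discount to D1 only": (qG * f1, f2),
    "discount to both": (qG * f1, qG * f2),
}

losses = {name: type2_loss(r, f) for name, r in candidates.items()}
best = min(losses, key=losses.get)

for name, loss in losses.items():
    print(f"{name}: type-II loss = {loss:.4f}")
print("smallest loss:", best)
```

With f1 > f2, the offer that discounts only the less culpable defendant yields the smallest loss, since (1 − q(G))² f2² < (1 − q(G))² f1².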

5 Pooling Equilibrium

In this section, we consider the possibility of a pooling equilibrium in which two defendants of different types (i.e., guilty ones and innocent ones) choose the same action. Although Figure 2 indicates that there are two distinct pooling regions (PL), we now show that pooling equilibrium fines cannot be set in the region labeled NN(G) (in the upper northeast of Figure 2).

Lemma 1 None of the reduced fines in NN(G) can be a pooling equilibrium.

Next we consider a pair of pooling fines such as point A in Figure 4. Such reduced fines

induce Di(t) to report for all i = 1, 2 and for all t = G, I. Thus, type-I errors occur if t = I, and type-II errors occur if t = G. The iso-loss circles associated with type-I and

type-II errors, respectively, are illustrated by circles in Figure 4. Circle C1 is the set of fine

pairs (r1, r2) that yield the same level of type-I errors, and C2 is the set of fine pairs (r1, r2) yielding the same level of type-II errors. It is straightforward to see from Figure 4 that both type-I and type-II errors can be reduced by moving in the direction of the arrow from point A southeast toward the diagonal segment OR in Figure 4. Thus, it directly follows that point A cannot be a pooling equilibrium pair of fines. On this ground, an equilibrium pair of fines must be located either at a tangency of two iso-loss circles associated with type-I and type-II errors, respectively, or along the horizontal boundary of NN(I). Note that any point on the vertical boundary of NN(I) cannot be a pooling equilibrium pair of fines, though.

This asymmetry comes from different culpability (f1 > f2). Intuitively, it can be explained as follows. Any pair of reduced fines (r1, r2) on the vertical boundary of NN(I) means that D1 is given a relatively advantageous discount in the sense that $\frac{f_2 - r_2}{f_1 - r_1} < \frac{f_2}{f_1}$. Then, for any such point, there exists a pair of fines that is symmetric to it around the diagonal line given by $\frac{r_2}{r_1} = \frac{f_2}{f_1}$. This pair gives the relative advantage to D2 in the same way as before, i.e., it gives D1 a discount of f2 − r2 and D2 a discount of f1 − r1. AA is then indifferent between the two pairs of fines. Even if the latter pair is not feasible (in the sense that ri < 0 for some i), there must exist a point between them that induces both innocent defendants to report, and this point must be strictly preferred by AA. In other words, AA can duplicate any pair of fines favorable to the more culpable defendant, or even do better, by giving (almost) the same discount to the less culpable defendant, because it is more costly to induce a more culpable defendant to report.

Proposition 2 Suppose there exists a pooling equilibrium pair of fines, $(r_1^p, r_2^p)$. Then it must be the case that $(r_1^p, r_2^p) \in OR \cup RV$ in Figure 4.

The proposition above suggests that pooling equilibrium fines are always fair. Unlike separating equilibrium fines, pooling equilibrium fines can be symmetric when f1 = f2. The pooling equilibrium fines are determined by AA’s objective as follows:

$$\min_{(r_1, r_2) \in PL} L_A(r_1, r_2) = \alpha(1-\theta)\left[(f_1 - r_1)^2 + (f_2 - r_2)^2\right] + (1-\alpha)\theta\,(r_1^2 + r_2^2).$$

An interior solution must satisfy the two first-order conditions (i = 1,2):

$$\frac{\partial L_A}{\partial r_i} = -2\alpha(1-\theta)(f_i - r_i^p) + 2(1-\alpha)\theta\, r_i^p = 0. \qquad (1)$$

The first term in the left-hand-side expression above represents the marginal benefit of reduced type-II errors associated with a small increase in ri. The second term measures the marginal cost of an increased probability of type-I error. The optimal reduced fines are defined by the requirement of balancing these two effects. From Equation (1), one obtains the optimal pooling fines:

$$r_i^p = \frac{\alpha(1-\theta)}{\alpha(1-\theta) + (1-\alpha)\theta}\, f_i = \frac{1}{1+\Theta}\, f_i, \qquad (2)$$

where $\Theta \equiv \frac{(1-\alpha)\theta}{\alpha(1-\theta)}$. Note that $r_i^p$ can lie strictly between q(I)fi and q(G)fi. This stands in sharp contrast to civil cases for which the payoff function of a plaintiff is monotonic with respect to ri. In civil cases, monotonicity implies that the plaintiff always prefers larger settlements as long as the acceptance probability remains the same.

For corner solutions (offering the more culpable defendant precisely his expected fine without the leniency program, i.e., zero discount), it is convenient to transform the condition

that (r1, r2) ∈ PL into the condition that r1 ≤ f1 and r2 = q(I)f2, which makes use of Proposition 2. Such a corner solution must satisfy the following conditions:

$$r_1^p = \frac{1}{1+\Theta}\, f_1 \quad \text{and} \quad r_2^p = q(I) f_2. \qquad (3)$$

Comparing the formulas presented so far for separating and pooling equilibria, two observations are worth emphasizing. First, the pooling equilibrium fines depend on (and vary with) AA’s view of the relative importance of type-I errors, θ, and her belief about the unconditional probability of guilt, α. In contrast, the separating equilibrium pair of fines presented in the previous section is, if it exists, unique (i.e., constant with respect to θ and α). The formulas for the pooling equilibria given in this section provide several comparative statics that we investigate next. As avoiding type-I errors becomes relatively more important, the pooling equilibrium fines become more lenient (i.e., $r_i^p$ is decreasing with respect to θ). Also, the more likely the defendants’ guilt is, or the more important avoiding type-II errors becomes, the more severe the equilibrium fines will be (i.e., $r_i^p$ is increasing with respect to both α and the importance of type-II errors, 1 − θ).20

It remains to see which of the separating and pooling fines AA will choose. Intuitively, if AA believes type-I errors are more important, then she will choose low fines that induce both defendants to report, which leads to a pooling equilibrium. If instead AA believes that type-II errors are more important, then she will prefer high fines so that innocent defendants refuse to report, which leads to the separating equilibrium. Figure 5 illustrates the equilibrium fines with changes in Θ.
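The interior pooling fine in Equation (2) and its comparative statics are easy to verify numerically. The following sketch assumes the quadratic loss used in AA's objective above; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
def pooling_fine(f, alpha, theta):
    """Interior pooling fine r^p = f / (1 + Theta) from Equation (2).

    Requires 0 < alpha <= 1 and 0 <= theta < 1 so the ratio is defined.
    """
    weight_guilty = alpha * (1.0 - theta)      # weight on type-II errors
    weight_innocent = (1.0 - alpha) * theta    # weight on type-I errors
    return weight_guilty / (weight_guilty + weight_innocent) * f


def foc_residual(r, f, alpha, theta):
    """Derivative of the quadratic loss with respect to r_i (Equation (1))."""
    return -2.0 * alpha * (1.0 - theta) * (f - r) + 2.0 * (1.0 - alpha) * theta * r


f, alpha, theta = 2.0, 0.6, 0.5   # illustrative values only
r = pooling_fine(f, alpha, theta)
print("pooling fine:", r)
print("first-order condition residual:", foc_residual(r, f, alpha, theta))

# Comparative statics: the fine falls with theta and rises with alpha.
print([round(pooling_fine(f, alpha, t), 3) for t in (0.2, 0.5, 0.8)])
print([round(pooling_fine(f, a, theta), 3) for a in (0.2, 0.5, 0.8)])
```

The residual of the first-order condition is zero up to rounding, and the two printed lists move in opposite directions, in line with the comparative statics described in the text.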

As a concrete numerical example, we consider the parameterization for which f1 = 2, f2 = 1, q(G) = .8 and q(I) = .4. With exogenous parameters set to these values, AA prefers the separating fines over the pooling fines if $\Theta < \bar{\Theta} \approx .466$ and prefers the pooling fines otherwise.

20If the defendants coordinate on the RR-equilibrium, then the pooling equilibrium fines are always interior solutions satisfying Equation (2) up to $r_i^p \le f_i$. Therefore, both the implication of fairness and the comparative statics obtained in NN-equilibrium remain valid.

6 Incentive to Join in a Collusion

The analysis above assumes that the defendants’ decision of whether to join in a collusion has already been made. In this section, we consider an extended model in which defendants decide whether to collude, anticipating that the leniency procedure will follow in the event they are prosecuted. Intuitively, the leniency program has two conflicting effects on the incentive to join in a collusion. Insofar as the leniency program improves informational efficiency by increasing the probability that guilt is revealed, it discourages firms from colluding. On the other hand, discounted fines reduce the expected cost of collusion and are therefore frequently criticized as incentivizing collusion. We will refer to the collusion-disincentivizing effect as the high-conviction-rate effect and the collusion-incentivizing effect as the low-penalty effect. Despite the two conflicting effects, we will show that the net effect of the leniency program on the overall collusion rate is signed unambiguously. Moreover, the leniency program affects social welfare by reducing judicial errors. Thus, the overall social efficiency of the leniency program should be evaluated by taking into account both the static effect (of reducing judicial errors given the collusion rate) and the dynamic effect (on the collusion rate). To analyze the dynamic effect, the basic model presented in Section 2 must be appropriately extended.

Let vi measure Di’s private benefit from joining in a collusion. We assume that vi is

distributed according to the distribution function Fi(vi) over [0, ∞). Unlike Motta and

Polo (2003) and Harrington (2008), we simply use a reduced expression for vi which can be interpreted as a discounted sum of profits realized as a consequence of repeated interaction between the firms. This interpretation can be justified if we assume that as soon as AA launches an investigation, firms stop colluding and no more above-normal profits are earned.21 Let p be the probability of future prosecution at the time that the firms decide whether to collude. For the time being, we assume that the probability of prosecution does not depend on whether they collude. Then, a firm decides to join in a collusion if its private benefit

from collusion exceeds the expected cost (i.e., if vi ≥ pE(Wi), where p is the probability of prosecution).22 Since the two co-defendants are assumed to both be involved, the probability

21Moreover, not all collusion is maintained by the threat of Nash-reversion punishments in infinitely repeated games. There are various collusion-facilitating practices which do not rely on infinitely repeated interactions among firms.

22In our model, p is exogenously given just as in Miceli (1996), while it is endogenously determined in Reinganum (1993). Since our objective is to focus purely on the effect of the leniency program on the

that they jointly committed the crime is α(x1, x2) = Prob(v1 ≥ pW1(x1), v2 ≥ pW2(x2)). If

Di’s expected fine is xi (with no uncertainty), then Wi(xi) = xi; and therefore, α(x1, x2) =

[1 − F1(px1)][1 − F2(px2)]. Thus, the probability that the two co-defendants are guilty is no longer fixed in this extended model but rather endogenously determined based on the defendants’ private benefits and costs leading to their respective decisions about whether or

not to join in a collusion. Note that ∂α/∂xi < 0 for i = 1, 2.

AA may be concerned about the social harm caused by collusion as well as by judicial errors. We consider two alternative loss functions for the prosecutor AA. One loss function is identical to our original specification, $L_A$. The other loss function, $\hat{L}_A$, that we introduce in this section is a weighted sum of judicial errors and social harm caused directly by the collusion:

$$\hat{L}_A(x_1, x_2; \lambda) = \lambda \sum_{i=1}^{2} L_i(x_i) + (1-\lambda)\,\alpha(x_1, x_2)\,H, \qquad (4)$$

where H is the social harm from collusion itself and λ is the relative weight placed on judicial errors. Note that the second term (social harms) as well as the first term (judicial errors)

are affected by x1 and x2 because the collusion rate depends on the actual fines x1 and x2.

The overall effect of introducing the leniency program may depend on whether the resulting value of $\Theta \equiv \frac{(1-\alpha)\theta}{\alpha(1-\theta)}$ yields a separating equilibrium or a pooling equilibrium in the subsequent leniency game, but the following proposition dispenses with the possibility of a pooling equilibrium.

Firms decide whether to join in a collusion by comparing the net gains from colluding versus not colluding. Since the pooling fine to Di is $\frac{f_i}{1+\Theta}$ in an interior pooling equilibrium, the defendant’s net gain from colluding is $v_i - \frac{f_i}{1+\Theta}$, and its net gain from not colluding is $-\frac{f_i}{1+\Theta}$, since the firm is induced to report by the same discounted fine $\frac{f_i}{1+\Theta}$ in a pooling equilibrium regardless of joining in a collusion. Thus, they collude if $v_i \ge \frac{f_i}{1+\Theta} - \frac{f_i}{1+\Theta} = 0$. That is, firms always collude in a pooling equilibrium. The reason is that the opportunity cost of joining in a collusion is zero in a pooling equilibrium because the firm’s cost would be the same even if it did not join in a collusion. To put it differently, the leniency program makes the opportunity cost of collusion zero in a pooling equilibrium. Therefore, firms will prefer colluding under the pooling equilibrium (unless the private benefit from collusion is negative). Since firms always collude, α = 1 and thereby $\Theta = 0 \,(< \bar{\Theta})$, which is a

collusion rate, our modeling choice is not to incorporate the law enforcement policy choosing the legal

penalty (fi) or the apprehension probability or the prosecution probability (p) into the model. If we were to choose such an extended model, the effect on the collusion rate would be unclear whether it is due to the leniency program itself or due to its indirect effect through a change in fi or p.

contradiction to the condition for a pooling equilibrium.

The main insight behind the no-pooling result is that firms have a stronger incentive to collude in a pooling equilibrium than in a separating equilibrium, which induces a higher α, whereas AA prefers a pooling equilibrium to a separating equilibrium only if Θ is high or, equivalently, α is low; the two requirements contradict each other. This general

insight is carried over to the more general case that pC > pN (> 0) where pC is the prosecution

probability when a firm joins in a collusion and pN is the probability of wrongful prosecution (when it does not join), as long as ∆p ≡ pC − pN is small.23

Proposition 3 There exists ρ(> 0) such that for any ∆p ≤ ρ, a pooling equilibrium does not exist in the extended game.
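The logic of Proposition 3 can be illustrated numerically. The sketch below assumes an exponential distribution F(v) = 1 − exp(−v) for both firms' private benefits, takes Θ(α) = (1 − α)θ/(α(1 − θ)) from Equation (2), and solves the pooling collusion rate from the fixed-point condition α = [1 − F(∆p f1/(1 + Θ(α)))][1 − F(∆p f2/(1 + Θ(α)))] by bisection. The distribution, the function names, and the parameter values are assumptions made for illustration only.

```python
import math


def big_theta(alpha, theta):
    """Theta(alpha) = (1 - alpha) * theta / (alpha * (1 - theta))."""
    return (1.0 - alpha) * theta / (alpha * (1.0 - theta))


def phi(alpha, delta_p, f1, f2, theta):
    """Right-hand side of the fixed-point condition for the collusion rate.

    Assumption: F(v) = 1 - exp(-v), so 1 - F(v) = exp(-v).
    """
    scale = delta_p / (1.0 + big_theta(alpha, theta))
    return math.exp(-scale * f1) * math.exp(-scale * f2)


def pooling_collusion_rate(delta_p, f1, f2, theta, tol=1e-12):
    """Bisection on phi(alpha) - alpha, which is decreasing in alpha."""
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, delta_p, f1, f2, theta) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


f1, f2, theta = 2.0, 1.0, 0.5   # illustrative values only
for delta_p in (0.5, 0.1, 0.01, 0.001):
    a = pooling_collusion_rate(delta_p, f1, f2, theta)
    print(f"delta_p={delta_p}: alpha={a:.4f}, Theta={big_theta(a, theta):.4f}")
```

As ∆p shrinks, the computed collusion rate approaches one and Θ approaches zero, so Θ eventually falls below any positive threshold required for AA to prefer pooling.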

If ∆p is large, a pooling equilibrium may be viable. Let the collusion rate in an interior pooling equilibrium be $\alpha^P$. Then, it must satisfy

$$\alpha = \left[1 - F\!\left(\Delta p\, \frac{f_1}{1+\Theta(\alpha)}\right)\right]\left[1 - F\!\left(\Delta p\, \frac{f_2}{1+\Theta(\alpha)}\right)\right] \equiv \Phi(\alpha), \qquad (5)$$

where Φ(α) represents the collusion rate function in units of probability. Note that $F\!\left(\Delta p\, \frac{f_i}{1+\Theta(\alpha)}\right)$ increases in α so that the right-hand side of Equation (5) decreases in α. Figure 6 shows how $\alpha^P$ is determined. If $\alpha^P < \underline{\alpha}$, then $\alpha^P$ is a legitimate pooling equilibrium collusion rate (where $\underline{\alpha}$, as defined above, denotes the minimum value of α for which a separating equilibrium is supported). The collusion rate in a non-interior pooling equilibrium can be similarly determined.

We now consider the possibility of a separating equilibrium. The separating equilibrium obtained in Proposition 1 remains unaffected in this extended game. To see this, consider once again point E in Figure 3, whose coordinates provide the unique separating equilibrium fines in the leniency game. If we want to stay in the region labeled SE in Figure

3, the only possible changes in x1 and x2 are to reduce them. However, if we reduce either

x1 or x2 even slightly within the SE region, we can see immediately that this will increase the overall collusion rate. Therefore, the social loss cannot be reduced by introducing such

a change. If we increase x1 or x2 hoping to reduce the collusion rate, then separation is no longer possible. Therefore, the unique separating equilibrium obtained in Proposition 1 remains the unique separating equilibrium in this extended game. Note that this is the case

23To see this, note that $\alpha = [1 - F(b_1 \Delta p)][1 - F(b_2 \Delta p)]$ and $\lim_{\Delta p \to 0} \alpha(\Delta p) = 1$, because the pooling plea offers b1 and b2 are bounded above.

whether AA’s loss function is $L_A$ or $\hat{L}_A$. Also, the loss function $\hat{L}_A$ may raise the issue of time inconsistency. Once firms have colluded, AA only needs to be concerned about judicial errors and can ignore the colluding incentive, since the collusion rate is already determined; that is, her loss function is now reduced to $L_A$. However, since minimizing $L_A$ and minimizing $\hat{L}_A$ are equivalent in this case, no time-inconsistency problem occurs.

As argued above, the leniency program has two conflicting effects on the collusion rate. We will see which effect dominates by comparing the collusion rates with and without the leniency program. First note that the outcome in the case of innocent defendants is the same regardless of whether the leniency program is used, since innocent defendants will not report in the separating equilibrium. In the case of guilty defendants, the less culpable defendant faces exactly the same incentive to collude because its penalty under the leniency program is the certainty equivalent, q(G)f2, of its uncertain penalty when the leniency program is not allowed (i.e., it pays the fine f2 with probability q(G)). In contrast, the incentive of the more culpable defendant changes once the leniency program is introduced. Under the leniency program, the more culpable defendant is offered the fine f1 and chooses to report. Therefore, its expected penalty under the leniency program, f1, is greater than its expected penalty without the leniency program, q(G)f1. This is because, under the leniency program, it is subject to the same penalty but with a higher conviction probability due to the other defendant’s testimony. This is exactly the collusion-deterrent or high-conviction-rate effect. Due to this difference (whether the other defendant testifies or not), the leniency program has the effect of reducing the collusion rate in a separating equilibrium by increasing the probability of conviction.
The reason why the low-penalty effect does not appear in this separating equilibrium is that the fine offered to the less culpable defendant is not simply a reduced penalty but a carefully calculated penalty that is exactly equal to the expected penalty (i.e., certainty equivalent) under the no-plea-bargaining institution. By making such an offer (which is not lower than the expected penalty q(G)f2), taking the possibility of type-II error into account, AA can avoid weakening the deterrent effect.24 We turn now to the case that the leniency game selects a pooling equilibrium. The

24In the presence of positive type-I errors, firms must decide whether to collude by comparing net gains from the collusion (i.e., choosing to be guilty) traded off against net gains when they do not collude (i.e., choosing to remain innocent). Since innocent defendants go to trial whether or not they use the leniency program, however, their costs of remaining innocent are the same in both cases. Therefore, we can ignore the costs of not colluding (occurring due to type-I error) when we consider the potential defendants’ incentive to collude in a separating equilibrium. This is contrasted with the case of pooling equilibrium in which costs of not colluding under the leniency program are crucial.

viability of a pooling equilibrium requires that $\alpha^P < \underline{\alpha}$. Also, in any separating equilibrium, we must have $\underline{\alpha} < \alpha^S < \alpha^N$. Therefore, we can conclude that $\alpha^P < \alpha^N$, which means that the collusion rate is lowered by the leniency program even in a pooling equilibrium. The intuition is rather subtle, because it is not immediately clear how to compare the severity of expected penalties in the no-leniency outcome and in the pooling equilibrium with leniency. From the static model, it seems that pooling offers are much less severe than equilibrium separating offers and also less severe than expected penalties without the leniency program. While it is true that separating-equilibrium plea offers are more severe than pooling-equilibrium plea offers for any given α, separating and pooling offers never arise for the same value of α. Pooling equilibria are chosen only for sufficiently low values of α. In other words, AA does not choose to make pooling offers whenever there are concerns that they will lead to a higher collusion rate. When can AA expect that the resulting collusion rate will be sufficiently low, satisfying the condition $\alpha^P < \underline{\alpha}$? One observes from this condition that AA will prefer a pooling over a separating equilibrium only if she is very unlikely to win the case by going to court (i.e., if q(G) is low, satisfying $q(G) < \frac{1}{1+\Theta}$). Then, the increase in the expected penalties of the firms when joining in a collusion is larger in a pooling equilibrium than under no leniency program, because $\frac{p_G - p_I}{1+\Theta} > p_G q(G) - p_I q(I)$ or, equivalently, $\left[\frac{1}{1+\Theta} - q(G)\right] p_G > \left[\frac{1}{1+\Theta} - q(I)\right] p_I$ if $p_G \gg p_I$. If pG and pI differ by much, then the marginal increase in the penalty in a pooling equilibrium when colluding exceeds the marginal increase in the penalty when not colluding, implying that firms have less incentive to collude under the leniency program even in the case of the pooling equilibrium.
Again, the intuitive reason why a harsher fine (than without leniency program) is feasible with the leniency program in a pooling equilibrium is that AA can exploit the informational advantage that follows from the testimony of the other firm as a threat. The following proposition summarizes the analysis presented in this section.

Proposition 4 The leniency program reduces the collusion rate.
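For the separating case, the comparison behind Proposition 4 can be sketched numerically. The example assumes exponentially distributed private benefits, F(v) = 1 − exp(−v), normalizes the common prosecution probability to one, and uses each firm's net cost of colluding: (q(G) − q(I))fi without leniency, and f1 − q(I)f1 for D1 and (q(G) − q(I))f2 for D2 in the separating equilibrium with leniency (innocent defendants go to trial in both regimes). The distribution, the function name, and the numbers are illustrative assumptions.

```python
import math


def collusion_rate(net_costs):
    """Probability that every firm's benefit v_i is at least its net cost.

    Assumption: the v_i are independent with F(v) = 1 - exp(-v),
    so Prob(v_i >= c) = exp(-c).
    """
    return math.prod(math.exp(-c) for c in net_costs)


f1, f2, qG, qI = 2.0, 1.0, 0.8, 0.4   # illustrative values only

# Net cost of colluding = expected penalty if guilty minus expected penalty if innocent.
no_leniency = [(qG - qI) * f1, (qG - qI) * f2]
separating = [f1 - qI * f1, qG * f2 - qI * f2]

alpha_N = collusion_rate(no_leniency)
alpha_S = collusion_rate(separating)
print(f"alpha_N = {alpha_N:.4f}, alpha_S = {alpha_S:.4f}")
```

The less culpable firm's net cost is the same in both regimes, while the more culpable firm's net cost rises from (q(G) − q(I))f1 to (1 − q(I))f1, so the collusion rate is lower with the leniency program.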

The proposition says that using the leniency program reduces the collusion rate, because the high-conviction-rate effect dominates the low-penalty effect. Interestingly, the deterrence effect of the leniency program remains the same even if AA does not care about the collusion rate at all, i.e., her loss function is $L_A$ instead of $\hat{L}_A$.

Next, we examine whether introducing the leniency program is socially beneficial. We simply assume that the social loss function is identical to $\hat{L}_A = \lambda \sum_{i=1}^{2} L_i(x_i) + (1-\lambda)\,\alpha(x_1, x_2)\,H$. To compare the social losses when the leniency program is allowed versus when it is not allowed, it is convenient first to compare the social losses due to judicial errors

(the first term in $\hat{L}_A$). The equilibrium social losses (due to judicial errors) with no leniency program (N) versus with the leniency program (P) are denoted as $L^N$ and $L^P$, respectively, and computed as follows:

$$L^N = \alpha^N (1-\theta)(1-q(G))(f_1^2 + f_2^2) + (1-\alpha^N)\,\theta\, q(I)(f_1^2 + f_2^2), \qquad (6)$$

$$L^P = \alpha^P (1-\theta)(1-q(G))^2 f_2^2 + (1-\alpha^P)\,\theta\, q(I)(f_1^2 + f_2^2), \qquad (7)$$

where $\alpha^N$ and $\alpha^P$ are collusion rates under the no-leniency and leniency institutions, respectively. The collusion rate $\alpha^N$ can be determined from the following reasoning. When the leniency program is not allowed, firms decide to collude if their net gain from colluding exceeds their

net gain from not colluding, that is, if vi ≥ (q(G) − q(I))fi. Here, we can interpret (q(G) − q(I))fi as the net cost of colluding. If the firm joins in a collusion, it can save the cost

q(I)fi which would be incurred if it did not join. Thus, q(I)fi could be interpreted as the

opportunity benefit from colluding, or −q(I)fi could be interpreted as the opportunity cost of colluding.

As we argued before, the introduction of the leniency program has two advantages for social welfare. First, it lowers the collusion rate ($\alpha^P < \alpha^N$). Second, type-II errors are reduced ($(1-q(G))^2 f_2^2 < (1-q(G))(f_1^2 + f_2^2)$) because of AA’s squared loss function, which reflects risk aversion; more precisely, the actual penalty of the guilty defendants under the leniency program is the certainty equivalent of the uncertain penalty under no leniency program. However, since the relative magnitude of type-I error and type-II error is ambiguous ($(1-\theta)(1-q(G))^2 f_2^2 \gtrless \theta\, q(I)(f_1^2 + f_2^2)$), it is not possible to tell in general which one is greater. Intuitively, if type-I error is larger than type-II error under the leniency program, the type-I error with the larger weight ($1 - \alpha^P > 1 - \alpha^N$) may increase the overall social loss more than under no leniency program. However, if we assume that the wrongful conviction probability of innocent defendants is very low (i.e., q(I) ≈ 0), then we can say that the leniency program is socially beneficial because its first-order effects are to deter collusion and reduce type-II errors.

Now, if we consider the whole social loss function $\hat{L}_A$, then the social efficiency of the leniency institution remains. If we add the term αH to the social loss function (due to judicial errors), then the social efficiency of introducing the leniency program is strengthened because it has the additional effect on social losses of decreasing the collusion rate. This result implies that the social loss is even lower if we take the social harm due to collusion into account.
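The claim for the case q(I) ≈ 0 can be made explicit. The following lines are a sketch that sets q(I) = 0 in Equations (6) and (7) and additionally assumes $\alpha^N > 0$, θ < 1 and q(G) < 1, so that the ratio is well defined.

```latex
\begin{align*}
L^N &= \alpha^N (1-\theta)(1-q(G))(f_1^2 + f_2^2), \\
L^P &= \alpha^P (1-\theta)(1-q(G))^2 f_2^2, \\
\frac{L^P}{L^N} &= \frac{\alpha^P}{\alpha^N}\,(1-q(G))\,\frac{f_2^2}{f_1^2 + f_2^2} \;<\; 1 .
\end{align*}
```

Each of the three factors is at most one: the first by Proposition 4, the second because q(G) is a probability, and the third (strictly) because f1 > 0.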

7 Discussion

In this section, we discuss the effects of further extensions and generalizations intended to strengthen what we learn from the implications of the model.

General Number of Firms

Our analysis can be straightforwardly extended into the case of n firms, Di, for i ∈ N =

{1, 2, ··· , n} where 2 ≤ n < ∞. To simplify the analysis, we assume that f1 = f2 = ··· = fn ≡ f. Just as in Section 3, we have

RR ··· R(t) = {r ≡ (r1, ··· , rn) | ri ≤ f, i = 1, 2, ··· , n},

NN ··· N(t) = {r | ri > q(t)f, i = 1, 2, ··· , n},

for t = G, I.25 Once again, we have multiple equilibria in M = RR ··· R(t) ∩ NN ··· N(t) =

{r | q(t)f < ri ≤ f, i = 1, 2, ··· , n}. For any r ∈ M, Di reports if at least one of the

other firms report (since ri ≤ f) and it does not report if none of them reports (since

ri > q(t)f). Since the NN ··· N-equilibrium Pareto dominates the RR ··· R-equilibrium, we select the NN ··· N-equilibrium.

The insight behind the no-pooling-equilibrium result remains valid. Since both guilty defendants and innocent defendants accept the same plea bargaining offer in a pooling equilibrium, the opportunity cost of committing a crime is zero, which leads to α = 1. This contradicts the condition for the viability of a pooling equilibrium. Therefore, we can focus on separating equilibria.

The separating region can be characterized by $SE = \bigcup_{i \in N} \{ r \mid q(I) f \le r_i \le q(G) f \}$. This can be explained as follows. Due to the Pareto dominance selection criterion, Di(G) reports given ri ∈ [q(I)f, q(G)f] while Di(I) does not. Thus, r separates the actual type t = G, I by Di’s reporting decision in this case. The SE set is drawn below for n = 3.

Since innocent defendants do not report for any separating fines, AA only needs to minimize type-II errors associated with guilty defendants by searching for the point in the SE region that is closest to $\mathbf{f} \equiv (f, f, \cdots, f)$. Let us define the generalized metric by $\lVert x, y \rVert = \sum_{i=1}^{n} (x_i - y_i)^2$ where $x, y \in \mathbb{R}^n$. Then, it is clear that type-II errors are minimized when the iso-cost sphere with center at $\mathbf{f}$ is tangent to one of the hyperplanes Hi = {r | ri = q(G)f}.

25We may consider the possibility that only some (not all) of the firms engage in collusion. But such collusion is hard to sustain, because the colluding firms cannot compete with non-colluding firms that charge much lower prices if the products are highly substitutable. So, we can exclude this possibility.

Therefore, it must be that for some i ∈ N, ri = q(G)f and rj = f for all j ≠ i in a separating equilibrium. That is, only one defendant gets a discount in a separating equilibrium. Since this point is clearly in the RR ··· R(G) region, the fines induce all guilty defendants to report and induce none of the innocent ones to report. The separating equilibrium is not unique, since any guilty defendant i ∈ N can be the beneficiary of the lenient plea discount. Note, however, that any separating equilibrium profile of fines is unfair. There is a sharp tradeoff between fairness and efficiency in this situation: to enhance efficiency (by minimizing judicial errors), unfairness is an unavoidable sacrifice.

The implication for the colluding incentive is also clear. Since the reduced fines are not lower than the expected penalties of the defendants, although one of them is lower than the maximum penalty f, the leniency program does not weaken the deterrent effect; rather, it reduces collusion, because it increases the expected penalty by increasing the conviction probability. Therefore, the intuition that the additional information obtained through the leniency program lowers the collusion rate by increasing the probability of convicting guilty defendants carries over to the general case.
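The tangency argument for n firms can also be written out directly. The sketch below uses the squared-distance loss and the fact that any r in the separating region has at least one coordinate with $r_i \le q(G) f$; it is an illustration of the argument, not a replacement for it.

```latex
\begin{align*}
\sum_{j=1}^{n} (f - r_j)^2 \;\ge\; (f - r_i)^2 \;\ge\; (1 - q(G))^2 f^2 ,
\end{align*}
```

Equality holds if and only if $r_i = q(G) f$ and $r_j = f$ for all $j \ne i$. Discounting $k \ge 1$ firms to $q(G) f$ would give a loss of $k (1 - q(G))^2 f^2$, which is smallest at $k = 1$.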

The Effect of Reducing Investigation Expenditures

The analysis up until now maintains the assumption that the probability of prosecution (or investigation) is exogenously given.26 However, one of the important justifications for the leniency program is that it can save the antitrust authority’s expenditures on investigation by inducing guilty defendants to self-report voluntarily. In this subsection, we analyze the case in which potential defendants can seek amnesty before AA launches an investigation and the investigation probability is determined by the reporting outcome.27 The timing of seeking amnesty is crucial for our analysis. In fact, the European Union’s regulation, which was introduced in 1996, limits fine reductions if self-reporting occurs after a case has been opened.28 The current model can partly explain this, because a firm’s cooperation after the opening of a case does not help save the investigation cost, even though it may help save the prosecution cost.

26For simplicity, we do not distinguish prosecution from investigation by implicitly assuming that prosecution follows investigation with a fairly high probability.
27The leniency program can be regarded as a mixture of plea bargaining and self-reporting in the criminal law. It resembles plea bargaining in the sense that it has the effect of saving the cost of proving guilt after prosecution and it resembles self-reporting in the sense that it has the effect of saving the cost of investigation before prosecution.
28See European Union (1996).

The post-collusion game is modified as follows. First, AA determines reductions in fines contingent on the reporting behavior of the firms. We assume that only one reporting firm is allowed to be exempted (completely or partially) from fines. Then, based on the promised fine schedules, each firm decides whether to self-report or not. So far, we implicitly assumed that the investigation probability p ∈ (0, 1) is given and that there is some fixed investigation cost K(> 0) that leads to p. Now, we assume that if one of the firms self-reports, p = 1 without any investigation cost incurred. For simplicity, we assume that f1 = f2 = f and focus only on the symmetric equilibrium. Suppose AA proposes to reduce the fine to r for the firm or firms that self-report. First, we consider a mechanism whereby AA grants amnesty to all reporting firms on a non-discriminatory basis. Second, we consider a mechanism that allows discrimination, i.e., AA can grant amnesty only to one firm that reports. The optimal mechanism solves the following constrained optimization problem:

$$\min_{r} \; L_A = \alpha \sum_{i=1}^{2} L_i(r, r) + (1 - \alpha) L^I, \qquad (8)$$

subject to incentive compatibility conditions and individual rationality conditions, where $L^I = q \sum_{i=1}^{2} L_i(f, f) = 2qf^2$. In this separating equilibrium, guilty defendants self-report by accepting the fine r and innocent defendants do not. We assume that defendants play a Nash equilibrium in the subgame given r, following the spirit of the analysis in Section 3. Then, unfortunately, the subgame has multiple Nash equilibria, as we saw in Section 4. We will use Pareto dominance as a selection criterion again. Then, the analysis of Section 3 enables us to conclude that the constraints are reduced to q(I)f ≤ r ≤ q(G)f. The inequality r ≤ q(G)f comes from the incentive compatibility condition of the guilty type and the inequality r ≥ q(I)f comes from the individual rationality condition of the innocent type.
Since q(I) < q(G), the optimal mechanism (among symmetric mechanisms that grant amnesty to all the reporting defendants) is for AA to grant r = q(G)f to both defendants who self-report.

Now, consider the second mechanism whereby AA grants amnesty to at most one self-reporter. We assume that if both defendants report, either one can get a discount with equal probability. Note that this mechanism is also ex ante symmetric in the sense that each defendant has the same expected discount, even though it is ex post asymmetric in the sense that only one defendant gets a discount ex post. In this case, the incentive compatibility condition for guilty defendants is

$$\frac{1}{2} r + \frac{1}{2} f \le f,$$

which is reduced to r ≤ f. This is the condition for both guilty defendants reporting to be a Nash equilibrium and this condition coincides exactly with the condition in the discriminatory case. Therefore, if we stick to the Pareto dominance equilibrium selection, we face the same constraints q(I)f ≤ r ≤ q(G)f. Therefore, if AA commits to a discount for only one reporting firm, she will optimally offer r = q(G)f to only one reporting defendant, chosen with equal probability, and make the other firm pay f. Then, we can see that the social cost is lower under this discriminatory mechanism than under the non-discriminatory mechanism.

This argument can be easily generalized to the case of n(> 2) firms. Suppose that AA grants amnesty to k(≤ n) firms that self-report with equal probability. Then, the incentive compatibility condition of the guilty defendants is that

$$\frac{k}{n} r + \frac{n - k}{n} f \le f,$$

which is reduced to r ≤ f. Therefore, the constraints of the optimization problem remain the same regardless of k, and the resulting optimal mechanism is to reduce fines only to one defendant who reports, to minimize type-II errors. Also, note that in this equilibrium, no investigation expenditures are incurred because self-reporting occurs anyway, and that the deterrence effect is optimally maintained in the sense that the reduced fine is exactly the same as the one we derived in Section 5.

This has several important policy implications. First, it is optimal to allow a discount in fines to only one firm that reports, in the sense that type-II errors would be higher if AA granted amnesty to more than one self-reporter due to double discounting. Second, self-reporting occurs with the same likelihood in any separating equilibrium regardless of the magnitude of fine reduction.
Although some may suspect that discounts larger than the equilibrium reduction will be dynamically more efficient in the sense that they increase the tendency to self-report, thereby saving the investigation expenditures, this is not the case, at least in this model, because any separating equilibrium fines induce the same likelihood that some (not all) defendant self-reports. Third, it is not important to make firms rush to self-report insofar as all that matters is whether or not some defendant self-reports, not how fast they report. Finally, the optimal magnitude of reduced fines remains unaffected even after we take into account the effect of self-reporting on saving the investigation cost, contrary to the 1996 EU legislation. It is true that differential fine reductions between ex ante self-reporting and ex post self-reporting can have the effect that firms will rush to the courthouse, leading to saving in the investigation cost at the expense of static inefficiency, but firms will self-report before an investigation begins in our model even if AA sets uniform discounted fines for ex ante self-reporters and ex post self-reporters.
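The comparison between granting the discount to one reporter and to several can be illustrated with a short sketch. It is only an illustration under stated assumptions: the incentive compatibility condition is taken to be (k/n) r + ((n − k)/n) f ≤ f, the type-II loss is assumed to be the squared shortfall (f − r)² summed over discounted firms, and the values f = 1 and q(G) = 3/5 are hypothetical. The function names are not from the paper.

```python
from fractions import Fraction


def expected_fine_if_reporting(r, f, k, n):
    """Expected fine of a reporting guilty firm when all n firms report and
    k of them, drawn with equal probability, pay r while the rest pay f."""
    return Fraction(k, n) * r + Fraction(n - k, n) * f


def reporting_is_incentive_compatible(r, f, k, n):
    """Assumed condition: (k/n) r + ((n - k)/n) f <= f."""
    return expected_fine_if_reporting(r, f, k, n) <= f


def type2_loss(r, f, k):
    """Assumed loss: squared shortfall (f - r)^2 for each of the k discounted firms."""
    return k * (f - r) ** 2


# Hypothetical values for illustration only.
f = Fraction(1)
r = Fraction(3, 5) * f  # r = q(G) f with q(G) = 3/5
n = 2
for k in (1, 2):
    print(k, reporting_is_incentive_compatible(r, f, k, n), type2_loss(r, f, k))
```

With these values the condition holds for both k = 1 and k = 2, while the assumed loss doubles when both reporters receive the discount, which is the sense in which discounting a single reporter is preferred.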

8 Conclusion and Caveats

In this paper, we considered the colluding incentive of firms and some issues surrounding asymmetric information between AA and the firms in a model of the leniency program. We also discussed the issue of fairness in the leniency program. The model predicts that unfair discounted fines are possible only off the equilibrium path so long as beliefs are required to be consistent with equilibrium strategies. We also show that this result is not merely the consequence of asymmetric information but remains valid in a model of complete information. Although fair fines are largely predicted in this paper, it must be acknowledged that leniency outcomes which appear grossly unfair, where the defendant who appears to be most culpable receives the most favourable (i.e., lenient) fine or even goes unindicted, are commonplace. We believe that it is not difficult to explain unfair leniency outcomes as equilibrium behavior under asymmetric conviction probabilities. Another intriguing direction for extending this paper’s model would be to incorporate the possibility that the defendants’ types (whether colluding or not colluding) are not perfectly correlated. Imperfect correlation would be possible if the products they produce are sufficiently differentiated, even if it is hardly feasible if their products are quite homogeneous. In cases in which only a proper subset of the firms join in collusion, the model’s implications, which rest on inferring one defendant’s guilt from the others’ reporting choices, would likely require substantial modifications.

Appendix

Proof of Lemma 1: The proof follows from comparing AA’s losses from the reduced fines in the pooling equilibrium and her losses from the separating equilibrium fines. When ri > fi for i = 1, 2, AA’s losses are:

$$L_A(r_1, r_2) = \alpha(1 - \theta)(1 - q(G))(f_1^2 + f_2^2) + L_0, \qquad (9)$$

where $L_0 = (1 - \alpha)\theta q(I)(f_1^2 + f_2^2)$, since both fines are always rejected. On the other hand, AA’s losses from the separating fines are:

$$L_A(f_1, q(G)f_2) = \alpha(1 - \theta)(f_2 - q(G)f_2)^2 + L_0. \qquad (10)$$

One can easily see that:

$$L_A(f_1, q(G)f_2) < L_A(r_1, r_2) \iff (1 - q)f_2^2 < f_1^2 + f_2^2, \qquad (11)$$

which always holds. Thus, the proof is complete.
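The step from (9) and (10) to (11) can be spelled out as follows. This is a sketch that writes q for q(G) and assumes α > 0, θ < 1 and 0 < q < 1, so that the common factor α(1 − θ)(1 − q) is strictly positive and can be cancelled; the term L0 appears on both sides and drops out.

```latex
\begin{align*}
L_A(f_1, q f_2) < L_A(r_1, r_2)
  &\iff \alpha(1-\theta)(1-q)^2 f_2^2 + L_0
        < \alpha(1-\theta)(1-q)\,(f_1^2 + f_2^2) + L_0 \\
  &\iff (1-q)\, f_2^2 < f_1^2 + f_2^2 .
\end{align*}
% The last inequality holds because (1-q) f_2^2 < f_2^2 \le f_1^2 + f_2^2.
```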

Proof of Proposition 2: The pooling region consists of RR(I), RN(I), NR(I) and NN(G), but we omit NN(G) from consideration due to Lemma 1. Next, consider an interior point in RR(I). Note that both fines at this point induce both types of defendants to report. Then any point except points on the line segment OR is associated with a lens-shaped area (as illustrated in Figure 4) generated by two iso-loss circles passing through the point, so that a deviation to a point in the lens area reduces both type-I and type-II errors. This result implies that all the possible pooling fines in the interior of the RR(I) region are on the line segment OR.

Now consider the vertical boundary of RR(I). Any point except the one point on OR is, as in the previous paragraph, associated with a loss-reducing lens-shaped area overlapping with RR(I), so that it cannot be a pair of pooling equilibrium fines. On the other hand, the lens area associated with a point on the horizontal boundary of RR(I) does not overlap with RR(I), which means that AA cannot choose any better pooling fines by deviating to a point in the loss-reducing lens area.

Finally, we consider any point in RN(I) ∪ NR(I). Since a pair of fines in RN(I) and NR(I) is rejected by D2(I) and D1(I), respectively, the pair of fines that the defendants actually expect to pay will end up on the border between RR(I) and RN(I) (or NR(I), respectively). As before, AA can find a better pair of fines than any point on the border between RR(I) and RN(I) by moving inside a loss-reducing lens area, except for the point U = (q(I)f1, f2). Similarly, AA can profitably deviate from any point on the border between RR(I) and NR(I) except for the point V = (f1, q(I)f2).

Regarding point U, one can see that the intersection of the loss-reducing lens area and the set RR(I) is not empty as long as f1 > f2. This means that AA can profitably deviate from point U, which implies that point U cannot be a pair of pooling equilibrium fines. At point V, there is no intersection of the lens area and the set RR(I). Thus, the result follows.

Proof of Proposition 3: If pC = pN = p, in a pooling equilibrium, a potential defendant joins in a collusion if $v_i \ge p \left( \frac{f_i}{1+\Theta} - \frac{f_i}{1+\Theta} \right) = 0$. Because F(0) = 0, α(r1, r2) = 1 − F(0) = 1 for any pooling offers (r1, r2), implying that Θ = 0. Because P prefers pooling offers to separating offers only if $\Theta \ge \bar{\Theta}\ (> 0)$, this is a contradiction. If ∆p = pC − pN > 0, $\lim_{\Delta p \to 0} \alpha(\Delta p) = \lim_{\Delta p \to 0} [1 - F(r_1 \Delta p)][1 - F(r_2 \Delta p)] = 1$. Therefore, there exists ρ(> 0) such that for any ∆p < ρ, α(∆p) > α, which violates the condition for the existence of a pooling equilibrium.
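The limiting argument in this proof can be illustrated numerically. The sketch below is only an example under stated assumptions: F is taken to be an exponential distribution function with F(0) = 0 (the paper does not specify F), the fines r1 = r2 = 0.5 are hypothetical, and `alpha_bar` stands for whatever upper bound on the collusion rate is required for pooling. The helper `find_rho` is a hypothetical name.

```python
import math


def F(x, rate=1.0):
    """Hypothetical distribution of collusion gains: exponential CDF with F(0) = 0."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def collusion_rate(delta_p, r1, r2):
    """alpha(dp) = [1 - F(r1 * dp)] * [1 - F(r2 * dp)], as in the proof."""
    return (1.0 - F(r1 * delta_p)) * (1.0 - F(r2 * delta_p))


def find_rho(alpha_bar, r1, r2, start=1.0, max_halvings=60):
    """Halve delta_p from `start` until alpha(delta_p) exceeds alpha_bar (< 1).

    Because alpha is decreasing in delta_p for this F, every smaller
    delta_p also exceeds alpha_bar, so the returned value serves as rho.
    """
    dp = start
    for _ in range(max_halvings):
        if collusion_rate(dp, r1, r2) > alpha_bar:
            return dp
        dp /= 2.0
    raise ValueError("alpha_bar was not exceeded; it must be below 1")


rho = find_rho(alpha_bar=0.9, r1=0.5, r2=0.5)
print(rho, collusion_rate(rho, 0.5, 0.5))
```

The search stops at the first halved gap for which the collusion rate exceeds the assumed bound, which mirrors the existence of ρ in the proof.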

Proof of Proposition 4: $\alpha^S = \alpha(f_1, q(G)f_2) < \alpha(q(G)f_1, q(G)f_2) = \alpha^N$ because $\partial\alpha / \partial x_i < 0$. Because $\alpha^P < \alpha < \alpha^S < \alpha^N$, the proof is complete.

References

[1] Aubert, C., P. Rey, and W. Kovacic, 2006, The Impact of Leniency and Whistle-Blowing Programs on Cartels, International Journal of Industrial Organization 24, 1241-1266

[2] Baker, S. and C. Mezzetti, 2001, Prosecutorial Resources, Plea Bargaining, and the Decision to Go to Trial, Journal of Law, Economics and Organization 17, 149-167

[3] Bar-Gill, O. and O. Gazal-Ayal, 2006, Plea Bargains only for the Guilty, Journal of Law and Economics 49, 353-364

[4] European Union, 1996, Notice on the Non-Imposition or Reduction of Fines in Cartel Cases, Official Journal 207, 4

[5] Feess, E. and M. Walzl, 2004, An Analysis of Corporate Leniency Programs and Lessons to Learn for U.S. and E.U. Policies, University of Maastricht

[6] French Ministry of Justice, 2006, Les chiffres-cles de la Justice

[7] Grossman, G. and M. Katz, 1983, Plea Bargaining and Social Welfare, American Economic Review 73, 749-757

[8] Harrington, J., 2008, Optimal Corporate Leniency Programs, Journal of Industrial Economics 56, 215-246

[9] Kim, J.-Y., 2009, Secrecy and Fairness in Plea Bargaining with Multiple Defendants, Journal of Economics 96, 263-276

[10] Kim, J.-Y., 2010, Credible Plea Bargaining, European Journal of Law and Economics 29, 279-293

[11] Kim, J.-Y., 2015, Plea Bargaining with Multiple Defendants and Its Deterrence Effect, Mimeo

[12] Kobayashi, B., 1992, Deterrence with Multiple Defendants: An Explanation for “Unfair” Plea Bargains, RAND Journal of Economics 23, 507-517

[13] Miceli, T., 1996, Plea Bargaining and Deterrence: An Institutional Approach, European Journal of Law and Economics 3, 249-264

[14] Motchenkova, E., 2004, The Effects of Leniency Programs on the Behavior of the Firms Participating in Cartel Agreements, Tilburg University

[15] Motta, M. and M. Polo, 2003, Leniency Programs and Cartel Prosecution, International Journal of Industrial Organization 21, 347-379

[16] Reinganum, J., 1988, Plea Bargaining and Prosecutorial Discretion, American Economic Review 78, 713-728

[17] Reinganum, J., 1993, The Law Enforcement Process and Criminal Choice, International Review of Law and Economics 13, 115-134

[18] Spagnolo, G., 2003, Divide et Impera: Optimal Deterrence Mechanisms Against Cartels and Organized Crime, University of Mannheim

[19] The Korea Times, 2012, Is Korea’s Corporate Leniency Program Too Lenient for Large Firms?, available at http://www.koreatimes.co.kr/www/common/printpreview.asp?categoryCode=335&newsIdx=116152

[20] U.S. Department of Justice, 2010, The Evolution of Criminal Antitrust Enforcement Over the Last Two Decades

Figure 1: Mapping from AA's offer space (r1, r2) into two defendants' “Self-report” (R) or “Not report” (N) decision profiles (NN, NR, RR, RN). [Figure omitted; horizontal axis r1, vertical axis r2.]

Figure 2: Separating and pooling offers given defendants' type, guilty (G) or innocent (I). [Figure omitted; horizontal axis r1, vertical axis r2.]

Figure 3: Separating equilibrium. [Figure omitted; horizontal axis r1, vertical axis r2.]

Figure 4: Pooling equilibrium. [Figure omitted; horizontal axis r1, vertical axis r2.]

Figure 5: Separating and Pooling Equilibrium Offers. [Figure omitted; horizontal axis Θ, vertical axis r1, r2; curves r1 = f1/(1 + Θ) and r2 = f2/(1 + Θ).]

Figure 6: Determination of the pooling collusion rate. [Figure omitted; horizontal axis α, vertical axis Φ; curve Φ(α).]