Statistical Robustness of Voting Rules
Ronald L. Rivest∗ and Emily Shen†
Abstract

We introduce a notion of "statistical robustness" for voting rules. We say that a voting rule is statistically robust if the winner for a profile of ballots is most likely to be the winner of any random sample of the profile, for any positive sample size. We show that some voting rules, such as plurality, veto, and random ballot, are statistically robust, while others, such as approval, score voting, Borda, single transferable vote (STV), Copeland, and Maximin, are not statistically robust. Furthermore, we show that any positional scoring rule whose scoring vector contains at least three different values (i.e., any positional scoring rule other than t-approval for some t) is not statistically robust.

Keywords: social choice, voting rule, sampling, statistical robustness

∗ Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. [email protected]
† Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. [email protected]
Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

1 Introduction

It is well known that polling a sample of voters before an election may yield useful information about the likely outcome of the election, if the sample is large enough and the voters respond honestly.

It is less well known that the effectiveness of a sample in predicting an election outcome also depends on the voting rule (social choice function) used.

We say a voting rule is "statistically robust" if for any profile the winner of any random sample of that profile is most likely to be the same as the (most likely) winner for the complete profile. While the sample result may be "noisy" due to sample variations, if the voting rule is statistically robust the most common winner(s) for a sample will be the same as the winner(s) of the complete profile.

To coin some amusing terminology, we might say that a statistically robust voting rule is "weather resistant"—you expect to get the same election outcome if the election-day weather is sunny (when all voters show up at the polls) as you get on a rainy day (when only some fraction of the voters show up). (We assume here that the chance of a voter showing up on a rainy day is independent of his preferences.)

We show that plurality voting is statistically robust, while—perhaps surprisingly—approval voting, STV, and most other familiar voting rules are not statistically robust. We consider the property of being statistically robust a desirable one for a voting rule, and thus consider lack of such statistical robustness a defect in voting rules. In general, we consider a voting rule to be somewhat defective if applying the voting rule to a sample of the ballots may give misleading guidance regarding the likely winner for the entire profile.

One reason why statistical robustness may be desirable is for "ballot-auditing" (Lindeman and Stark 2012), which attempts to confirm the result of an election by checking that the winner of a sample is the same as the overall winner.

Similarly, in an AI system that combines the recommendations of expert subsystems according to some aggregation rule, it may be of interest to know whether aggregating the recommendations of a sample of the experts is most likely to yield the same result as aggregating the recommendations of all experts. In some situations, some experts may have transient faults or be otherwise temporarily unavailable (in a manner independent of their recommendations), so that only a sample of recommendations is available for aggregation.

Since our definition is new, there is little or no directly related previous work. The closest work may be that of Walsh and Xia (Walsh and Xia 2011), who study various "lot-based" voting rules with respect to their computational resistance to strategic voting. In their terminology, a voting rule of the form "Lottery-Then-X" (a.k.a. "LotThenX") first takes a random sample of the ballots, and then applies voting rule X (where X may be plurality, Borda, etc.) to the sample. Their work is not concerned, as ours is, with the fidelity of the sample winner to the winner for the complete profile.

Amar (Amar 1984) proposes actual use of the "random ballot" method. Procaccia et al. (Procaccia, Rosenschein, and Kaminka 2007) study a related but different notion of "robustness" that models the effect of voter errors.

The rest of this paper is organized as follows. Section 2 introduces notation and the voting rules we consider. We define the notion of statistical robustness for a voting rule in Section 3, determine whether several familiar voting rules are statistically robust in Section 4, and close with some discussion and open questions.

2 Preliminaries

Ballots, Profiles, Alternatives. Assume a profile P = (B1, B2, . . . , Bn) containing n ballots will be used to determine a single winner from a set A = {A1, A2, . . . , Am} of m alternatives. The form of a ballot depends on the voting rule used. We may view a profile as either a sequence or a multiset; it may contain repeated items (identical ballots).

Social choice functions. Assume that a voting rule (social choice function) f maps profiles to a single outcome (one of the alternatives): for any profile P, f(P) produces the winner for the profile P.

We allow f to be randomized, in order for "ties" to be handled reasonably. Our development could alternatively have allowed f to output the set of tied winners; we prefer allowing randomization, so that f always outputs a single alternative. In our analysis, however, we do consider the set ML(f(P)) of most likely winners for a given profile. Thus, we say that A is a "most likely winner" of P if no other alternative is more likely to be f(P). There may be several most likely winners of a profile P. For most profiles and most voting rules, however, we expect f to act deterministically, so there is a single most likely winner.

Often the social choice function f will be neutral—symmetric with respect to the alternatives—so that changing the names of the alternatives won't change the outcome distribution of f on any profile. While there is nothing in our development that requires that f be neutral, we shall restrict attention in this paper to neutral social choice functions. Thus, for example, a tie-breaking rule used by f in this paper will not depend on the names of the alternatives; it will pick one of the tied alternatives uniformly at random.

We do assume that the social choice function f is anonymous—symmetric with respect to voters: reordering the ballots of a profile leaves the outcome unchanged.

We will consider the following voting rules. (For more details on voting rules, see (Brams and Fishburn 2002), for example.)

Many of the voting rules are preferential voting rules; that is, each Bi gives a linear order Ai1 ≻ Ai2 ≻ · · · ≻ Aim. (In the rest of the paper, we will omit the ≻ symbols and just write Ai1 Ai2 . . . Aim, for example.)

• A positional scoring rule is defined by a vector α = ⟨α1, . . . , αm⟩; we assume αi ≥ αj for i ≤ j. Alternative i gets αj points for every ballot that ranks alternative i in the jth position. The winner is the alternative that receives the most points. Some examples of positional scoring rules are:
    Plurality: α = ⟨1, 0, . . . , 0⟩
    Veto: α = ⟨1, . . . , 1, 0⟩
    Borda: α = ⟨m − 1, m − 2, . . . , 0⟩
• Single transferable vote (STV) (also known as instant-runoff voting (IRV)): The election proceeds in m rounds. In each round, the alternative with the fewest votes is eliminated. Each ballot is counted as a vote for its highest-ranked alternative that has not yet been eliminated. The winner of the election is the last alternative remaining.
• Plurality with runoff: The winner is the winner of the pairwise election between the two alternatives that receive the most first-choice votes.
• Copeland: The winner is an alternative that maximizes the number of alternatives it beats in pairwise elections.
• Maximin: The winner is an alternative whose lowest score in any pairwise election against another alternative is the greatest among all the alternatives.

Other (non-preferential) voting rules we consider are:

• Score voting (also known as range voting): Each allowable ballot type is associated with a vector that specifies a score for each alternative. The winner is the alternative that maximizes its total score.
• Approval (Brams and Fishburn 1978; Laslier and Sanver 2010): Each ballot gives a score of 1 or 0 to each alternative. The winner is an alternative whose total score is maximized.
• Random ballot (Gibbard 1977) (also known as random dictator): A single ballot is selected uniformly at random from the profile, and the alternative named on the selected ballot is the winner of the election.

3 Sampling and Statistical Robustness

Sampling. The profile P is the universe from which the sample will be drawn. We define a sampling process to be a randomized function G that takes as input a profile P of size n and an integer parameter k (1 ≤ k ≤ n) and produces as output a sample S of P of expected size k, where S is a subset (or sub-multiset) of P.

We consider three kinds of sampling:

• Sampling without replacement. Here GWOR(P, k) produces a set S of size exactly k chosen uniformly without replacement from P.
• Sampling with replacement. Here GWR(P, k) produces a multiset S of size exactly k chosen uniformly with replacement from P.
• Binomial sampling. Here GBIN(P, k) produces a sample S of expected size k by including each ballot in P in the sample S independently with probability p = k/n.

Thus, the output of the voting rule on a sample might be denoted as f(S), or f(G(P, k)), depending on the situation.

Statistically Robust Voting Rules. We now give our main definitions.

Definition 1 If X is a discrete random variable (or more generally, some function defined on a finite sample space), we let ML(X) denote the set of values that X takes with maximum probability. That is,

    ML(X) = {x | Pr(X = x) is maximum}

denotes the set of "most likely" possibilities for the value of X.

For example, ML(f(P)) contains the "most likely winner(s)" for a (possibly randomized) voting rule f and profile P; typically this will contain just a single alternative. Similarly, ML(f(G(P, k))) contains the most likely winner(s) of a sample of size k. Note that ML(f(P)) involves randomization only within f (if any), whereas ML(f(G(P, k))) also involves the randomization of sampling by G.

Definition 2 We say that a social choice function f is statistically robust for sampling rule G if for any profile P of size n and for any sample size k ∈ {1, 2, . . . , n},

    ML(f(G(P, k))) = ML(f(P)).

That is, an alternative is a most likely winner for a sample of size k if and only if it is a most likely winner for the entire profile P.

Note that this definition works smoothly with ties: if the original profile P was tied (i.e., there is more than one most likely winner of P), then the definition requires that all most likely winners of P have maximum probability of being a winner in a sample (and that no other alternatives have such maximum probability).

Having a statistically robust voting rule is something like having an "unbiased estimator" in classical statistics. However, we are not interested in estimating some linear combination of the individual elements (as with classical statistics), but rather in knowing which alternative is most likely to win (i.e., which is the winner), a computation that may be a highly nonlinear function of the ballots.

A simple plurality example. Suppose we have a plurality election with 10 votes: 6 for A, 3 for B, and 1 for C. We try all three sampling methods and all possible values of k, and see how often each alternative is a winner in 1000 trials; Figure 1 reports the results, illustrating the statistical robustness of plurality voting, a fact we prove in Section 4.3.

For brevity in this paper, we will generally assume that the three kinds of sampling will yield equivalent results; we don't expect differences in the results depending on which sampling process is used. (In a longer paper we would not make this assumption.) Thus, to show a method is not statistically robust, it suffices here to show that it is not statistically robust for one of the three sampling methods. However, we do take care to show that plurality is robust under all three sampling methods.

In fact, statistical robustness under sampling without replacement implies statistical robustness under binomial sampling, as shown in the following theorem.

Theorem 1 If a voting rule f is statistically robust under sampling without replacement, then it is statistically robust under binomial sampling.

Proof: When binomial sampling returns an empty sample, then, with a neutral tie-breaking rule, no alternative gains any advantage. For non-empty samples, note that a binomial sample, conditioned on its size being k, is a uniform random sample of size k drawn without replacement; by the assumption of statistical robustness under sampling without replacement, for any k > 0, f(P) is the most likely winner of such a sample. Therefore, f(P) is the most likely winner of a sample produced by binomial sampling with any positive probability p.

4 Statistical Robustness of Various Voting Rules

In this section, we analyze whether various voting rules are statistically robust.

4.1 Random Ballot

Theorem 2 The random ballot method is statistically robust, with sampling methods GWR, GWOR, and GBIN.

Proof: Each ballot is equally likely to be chosen as the one to name the winner.

4.2 Score Voting

Theorem 3 Score voting is not statistically robust.

Proof: By means of a counterexample. Consider the following profile:

    (1)  A1: 100, A2: 0
    (99) A1: 0,   A2: 1

That is, there is one vote that gives scores of 100 for A1 and 0 for A2, and 99 votes that give scores of 0 for A1 and 1 for A2. Then A1 wins the complete profile.

Under binomial sampling with probability p, A1 wins with probability about p—that is, with about the probability that A1's vote is included in the sample. (The probability is not exactly p because binomial sampling may produce an empty sample, in which case A1 and A2 are equally likely to be selected as the winner.) For p < 1/2, A2 wins more than half the time; thus score voting is not robust under binomial sampling.

4.3 Plurality

Throughout this section, we let ni denote the number of votes alternative Ai receives, with Σi ni = n.

Theorem 4 Plurality voting is statistically robust, with sampling without replacement.

Proof: Assume n1 > n2 ≥ · · · ≥ nm, so A1 is the unique winner of the complete profile. (The proof below can easily be adapted to show that plurality is statistically robust when the complete profile has a tie for the winner.)

Let K = (k1, k2, . . . , km) denote the number of votes for the various alternatives within the sample of size k. Let C(a, b) denote the binomial coefficient "a choose b." Thus, there are C(n, k) ways to choose a sample of size k from the profile of size n. The probability of a given configuration K is equal to

    Pr(K) = [Πi C(ni, ki)] / C(n, k).

Let γ(i) denote the probability that Ai wins the election, and let γ(i, kmax) denote the probability that Ai receives kmax votes and wins the election.
          GWOR                GWR                GBIN
  k     A    B    C       A    B    C       A    B    C
  1   594  303  103     597  299  104     507  315  178
  2   625  258  117     569  325  106     619  277  104
  3   727  217   56     676  260   64     698  235   67
  4   794  206    0     718  256   26     763  212   25
  5   838  162    0     749  219   32     822  161   17
  6   868  132    0     764  212   24     879  117    4
  7   920   80    0     804  181   15     930   70    0
  8  1000    0    0     818  171   11     973   27    0
  9  1000    0    0     842  146   12     993    7    0
 10  1000    0    0     847  150    3    1000    0    0
Figure 1: Plurality voting with three sampling schemes on a profile with ten votes: 6 for A, 3 for B, and 1 for C. 1000 trials were run for each sample size, with sample sizes running from k = 1 to k = 10. Each entry indicates how many trials the given alternative won, with ties broken by uniform random selection. (The perhaps surprisingly large value of 178 for C under GBIN results from the likelihood of an empty sample when k = 1; such ties are broken randomly.) Note that ML(f(G(P, k))) = {A} for all three sampling methods G and all sample sizes k.
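The experiment of Figure 1 is straightforward to replicate. The following Python sketch (our own illustration; the function and variable names are not from the paper) implements plurality with uniform random tie-breaking together with the three sampling processes GWOR, GWR, and GBIN, and tallies how often each alternative wins over repeated trials:

```python
import random
from collections import Counter

def plurality_winner(sample, alternatives):
    # Plurality winner of a list of single-choice ballots; ties
    # (including the empty-sample case) are broken uniformly at random.
    counts = Counter(sample)
    best = max(counts[a] for a in alternatives)
    tied = [a for a in alternatives if counts[a] == best]
    return random.choice(tied)

def g_wor(profile, k):
    # Sampling without replacement: exactly k ballots, uniformly.
    return random.sample(profile, k)

def g_wr(profile, k):
    # Sampling with replacement: a multiset of exactly k ballots.
    return [random.choice(profile) for _ in range(k)]

def g_bin(profile, k):
    # Binomial sampling: each ballot kept independently with p = k/n.
    p = k / len(profile)
    return [b for b in profile if random.random() < p]

profile = ['A'] * 6 + ['B'] * 3 + ['C']     # the Figure 1 profile
alternatives = ['A', 'B', 'C']
for name, g in [('GWOR', g_wor), ('GWR', g_wr), ('GBIN', g_bin)]:
    for k in range(1, len(profile) + 1):
        wins = Counter(plurality_winner(g(profile, k), alternatives)
                       for _ in range(1000))
        print(name, k, wins['A'], wins['B'], wins['C'])
```

With k = n, GWOR returns the whole profile, so A wins every trial, matching the bottom row of the GWOR column block; for small k the counts are noisy, but A remains the most frequent winner under all three schemes, which is exactly the statistical-robustness property.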
Then γ(i) = Σ_{kmax} γ(i, kmax), and γ(i, kmax) = Σ_K Pr(K)/Tied(K), where the sum is over the set of configurations K such that ki = kmax and kj ≤ kmax for all j ≠ i, and Tied(K) is the number of alternatives tied for the maximum score in K. (Note that this equation depends on the tie-breaking rule being neutral.)

For any kmax, consider now a particular configuration K used in computing γ(1, kmax): K = (k1, k2, . . . , km), where k1 = kmax and ki ≤ kmax for i > 1. Now consider the corresponding configuration K′ used in computing γ(2, kmax), where k1 and k2 are switched: K′ = (k2, k1, k3, . . . , km). Each configuration K′ used in computing γ(2, kmax) has such a corresponding configuration K used in computing γ(1, kmax).

Then, by Lemma 1 below, Pr(K) > Pr(K′) when k1 > k2 (when k1 = k2 we have K′ = K, so the probabilities are equal). Thus, γ(1, kmax) > γ(2, kmax).

Since γ(1, kmax) > γ(2, kmax) for any kmax, we have that γ(1) > γ(2); that is, A1 is more likely to be the winner of the sample than A2. By a similar argument, for every i > 1, γ(1, kmax) ≥ γ(i, kmax) for any kmax, so γ(1) > γ(i). Therefore, A1 is most likely to win the sample.

Lemma 1 If n1 > n2, k1 > k2, n1 ≥ k1, and n2 ≥ k2, then C(n1, k1) C(n2, k2) > C(n1, k2) C(n2, k1).

Proof: We wish to show that

    C(n1, k1) C(n2, k2) > C(n1, k2) C(n2, k1).    (1)

Theorem 6 Plurality is statistically robust, under sampling with replacement.

Proof: The proof follows the same structure as for sampling without replacement. Again, assume n1 > n2 ≥ · · · ≥ nm. For each configuration K used in computing γ(1, kmax) and the corresponding configuration K′ used in computing γ(2, kmax), we show that Pr(K) ≥ Pr(K′), with strict inequality when k1 > k2.

Under sampling with replacement, the probability of a configuration K = (k1, . . . , km) is equal to

    Pr(K) = M(k; k1, . . . , km) Π_{i=1..m} (ni/n)^{ki},

where M(k; k1, . . . , km) = k!/(k1! · · · km!) is the multinomial coefficient.

For any configuration K used in computing γ(1, kmax), consider the corresponding configuration K′, obtained by swapping k1 and k2, used in computing γ(2, kmax): K′ = (k2, k1, k3, . . . , km). Then

    Pr(K′) = M(k; k1, . . . , km) (n2/n)^{k1} (n1/n)^{k2} Π_{i=3..m} (ni/n)^{ki}.

So Pr(K) = (n1/n2)^{k1−k2} Pr(K′). If n1 > n2 and k1 > k2, then Pr(K) > Pr(K′). If n1 > n2 and k1 = k2, then Pr(K) = Pr(K′). Thus, γ(1, kmax) > γ(2, kmax) for every kmax, and therefore, A1 is more likely than A2 to win a sample drawn with replacement. By a similar argument, for every i > 1, γ(1, kmax) ≥ γ(i, kmax) for any kmax, so γ(1) > γ(i). Therefore, A1 is most likely to win the sample.
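Lemma 1 can also be checked numerically. The short Python sketch below (our own sanity check; it is not part of the paper's proof) exhaustively verifies the inequality C(n1, k1) C(n2, k2) > C(n1, k2) C(n2, k1) for all small parameters satisfying n1 > n2, k1 > k2, n1 ≥ k1, and n2 ≥ k2:

```python
from math import comb

# Exhaustively check Lemma 1 for all n1 up to 12:
# if n1 > n2, k1 > k2, n1 >= k1, and n2 >= k2, then
# C(n1, k1) * C(n2, k2) > C(n1, k2) * C(n2, k1).
# Note: math.comb(n, k) returns 0 when k > n, which handles the
# configurations where k1 exceeds n2.
for n1 in range(1, 13):
    for n2 in range(n1):                       # n2 < n1
        for k1 in range(1, n1 + 1):            # k1 <= n1
            for k2 in range(min(k1, n2 + 1)):  # k2 < k1 and k2 <= n2
                lhs = comb(n1, k1) * comb(n2, k2)
                rhs = comb(n1, k2) * comb(n2, k1)
                assert lhs > rhs, (n1, n2, k1, k2)
print("Lemma 1 holds for all n1 <= 12")
```

Under sampling without replacement, Pr(K)/Pr(K′) = [C(n1, k1) C(n2, k2)] / [C(n1, k2) C(n2, k1)], so this is exactly the ratio that the proof of Theorem 4 needs to exceed 1.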