This is Chapter 25 of a forthcoming book edited by K. Arrow, A. Sen, and K. Suzumura. It is intended only for participants of my 2007 MAA short course on the “mathematics of ” in New Orleans.

Chapter 25
GEOMETRY OF VOTING¹
Donald G. Saari
Director, Institute for Mathematical Behavioral Sciences
Departments of Mathematics and Economics
University of California, Irvine
Irvine, California 92698

1. Introduction
2. Simple Geometric Representations
   2.1 A geometric profile representation
   2.2 Elementary geometry; surprising results
   2.3 The source of pairwise voting problems
   2.4 Finding other pairwise results
3. Geometry of Axioms
   3.1 Arrow’s Theorem
   3.2 Cyclic voters in Arrow’s framework?
   3.3 Monotonicity, strategic behavior, etc.
4. Plotting all outcomes
   4.1 Finding all positional and AV outcomes
   4.2 The converse; finding election relationships
5. Finding symmetries – and profile decompositions
6. Summary

Abstract. I show how to use simple geometry to analyze pairwise and positional voting rules as well as those many other decision procedures, such as runoffs and , that rely on these methods. The value of using geometry is introduced with three approaches, which depict the profiles along with the election outcomes, that help us find new voting paradoxes, compute the likelihood of disagreement among various election outcomes, and explain problems such as the “paradox of voting.” This geometry even extends McGarvey’s theorem about possible pairwise election rankings to indicate all possible pairwise tallies. After using geometry to provide a benign interpretation for Arrow’s Theorem, an intuitive argument is described to analyze a variety of seemingly disparate topics such as strategic behavior, monotonicity, and the “no-show” paradox. Another geometric approach identifies all possible positional and Approval Voting election outcomes admitted by a given profile: the converse becomes a geometric tool that identifies new election relationships. Finally, a geometric “profile decomposition” is described with which we can identify and explain all possible differences in positional and pairwise voting outcomes and generate illustrating profiles for any possible paradox.

¹ This research was supported by NSF grant DMI-0233798. My thanks to K. Arrow, N. Baigent, H. Nurmi, T. Ratliff, M. Salles, and K. Sieberg among others for comments and corrections on an earlier draft.

1 Introduction

“Geometry of voting” is intended to capture the sense that “a picture is worth a thousand words.” After all, geometry has long served as a powerful tool that provides a global perspective of whatever we happen to be studying while exposing unexpected relationships. This is why we graph functions, plot data, study the Edgeworth box from economics, and use diagrams to enhance lectures. Similarly, the geometry of voting seeks to create appropriate geometric tools to capture global aspects of decision and voting rules while exposing new relationships. Since most, if not all, voting rules in wide use involve pairwise or positional methods, or are based on them (such as runoffs), I emphasize these methods.

Why is social choice so complex? In part, it is due to the “curse of dimensionality”; this is why standard geometric tools fail the challenges offered by social choice. After all, the large dimension of a profile space alone makes it impossible to graph relationships between profiles and their election outcomes. This is displayed already with three alternatives, where the 3! = 6 dimensions of profile space overwhelm any hope of using standard graphs to connect profiles with election outcomes. (In this chapter, a profile is used in the traditional manner of specifying the number of voters whose preferences are given by each ranking.) Because standard approaches will not work, we must develop new geometric tools. In the next section, for instance, three methods are described that geometrically depict profiles along with their associated procedural outcomes.

Section 3 shows how to use geometry to analyze axiomatic issues ranging from Arrow’s Theorem to concerns about strategic behavior, monotonicity, and the “no-show” paradox (where a voter does better by not voting). Geometry even demonstrates why many seemingly dissimilar concerns admit a strikingly similar analysis.

A different theme is motivated by the temptation (one that we all experience whenever we encounter a “nail-biting” close election involving three or more candidates) to explore whether the outcome would have changed had a different election rule been used. (For instance, had a different election rule been used in the 2000 US presidential election, could Gore have beaten Bush?) As published results typically consider only the better known methods, we must wonder what would have happened had any of the infinite number of other rules been used. The geometric approach in Sect. 4 resolves this problem by showing how to depict all possible positional and Approval Voting outcomes for any specified profile. The converse creates an easily used tool to identify new election relationships and to compute probability estimates.

In the final section, natural geometric symmetries within a profile are identified and extracted to create a “profile decomposition.” This decomposition permits us to construct, analyze, and describe all possible election paradoxes that can occur with all election rules that are based on positional and/or pairwise election outcomes.

2 Simple geometric representations

As it is not feasible to use standard graphs with choice theory, the first of three approaches described in this section addresses the problem by listing profiles in a manner that roughly mimics the structure of profile space. Advantages of using what I call the geometric profile representation are that it provides simpler and quicker ways to tally positional and pairwise , it helps us develop intuition as to why the same profile can allow different rules to have conflicting election outcomes, and, later, it leads to natural profile relationships that explain the source of several problems from voting theory.

The second geometric approach exposes surprisingly complex relationships that exist among voting rules. This approach probably can be used to analyze other decision rules because nothing more difficult than elementary geometry and algebra is needed. The third approach, which examines cycles, the paradox of voting, and other intricacies of pairwise voting, identifies all possible profiles that define specified pairwise outcomes; expect surprises in the interpretation of majority votes. Throughout, the geometry unveils a common source for all voting problems: problems arise when a voting rule ignores crucial but available information about the profile.

2.1 Geometric profile representation

A traditional way to describe a profile for the three alternatives A, B, C is to list how many voters prefer each of the six preference rankings; e.g.,

Number  Ranking        Number  Ranking
7       A ≻ B ≻ C      12      C ≻ B ≻ A
15      A ≻ C ≻ B      4       B ≻ C ≻ A        (1)
2       C ≻ A ≻ B      12      B ≻ A ≻ C

The tedium of tallying ballots requires sifting through the data to find how many voters rank each candidate in different ways. This suggests searching for alternative ways to represent profiles that will simplify the tallying process. The following approach was developed in (Saari, 1994, 1995, 2001); for applications, see Nurmi (1999, 2000, 2002) and Tabarrok (2001).

Assign each candidate to a vertex of an equilateral triangle (see Fig. 1a). Assign a ranking to each point in the triangle by its distance to each vertex, where “closer is better.” This binary relationship divides the triangle into the thirteen regions depicted in Fig. 1: the six small open triangles represent strict rankings while the seven remaining ranking regions, which involve at least one tie, are portions of the lines. In Fig. 1a, for instance, the number 15 is in a region closest to the A vertex, next closest to C, and farthest from B, so it corresponds to an A ≻ C ≻ B ranking. Points on the vertical line are equidistant from the A and B vertices, so they represent the tie A ∼ B. This geometry positions the ranking regions in a manner similar to that of profile space in that adjacent “ranking regions” differ only by the ranking of an adjacent pair.

[Figure 1: a. Profile; b. Tallying process. Vertex tallies A: 22 + 14s, B: 16 + 19s, C: 14 + 19s; pairwise tallies A:B 24:28, B:C 23:29, A:C 34:18.]

Fig. 1. Profile representation and tallies

The geometric profile representation places the number of voters with a particular ranking in the associated ranking region; e.g., Fig. 1a represents the Eq. 1 profile. To tally election outcomes, notice that the numbers representing voters preferring A to B are to the left of the vertical A ∼ B line. Thus, A’s tally in an {A, B} pairwise election is the sum of the terms in the three shaded regions of Fig. 1b; B’s tally is the sum in the three unshaded regions. With the Fig. 1a profile, the B ≻ A outcome has the 28 : 24 tally listed under the A–B leg. All other pairwise tallies are similarly computed and listed near the appropriate triangle edge. Notice that the outcome is the cycle B ≻ A, C ≻ B, A ≻ C.

As A’s plurality tally is the number of voters who have her top-ranked, in Fig. 1b this tally is the sum of the numbers in the two heavily shaded regions. In this manner we find that the Fig. 1a plurality ranking is A ≻ B ≻ C with a 22 : 16 : 14 tally: these are the s = 0 values of the expressions found near the appropriate vertices.

A positional election assigns points to candidates based on how each voter positions them on the . It is defined by the voting vector (w_1, w_2, w_3), w_1 ≥ w_2 ≥ w_3, w_1 > w_3, where w_j points are assigned to the candidate that a voter positions in the jth place.² A way to normalize these procedures is to let w_3 = 0 and divide the weights by w_1 to obtain w_s = (w_1/w_1, w_2/w_1, 0) = (1, s, 0) for 0 ≤ s ≤ 1. For instance, a rule assigning 7, 5, and 0 points, respectively, to a voter’s first, second, and third positioned candidate, defined by the voting vector (7, 5, 0), has the normalized form w_{5/7} = (7/7, 5/7, 0) = (1, 5/7, 0).

A’s w_s election tally is her plurality tally plus s times the number of voters who have her second ranked. In Fig. 1b, this is [the sum of the numbers in the two heavily shaded regions] plus s times [the sum of the numbers in the two adjacent regions indicated by the arrow]. With the Fig. 1a data, this [15 + 7] + s[2 + 12] = 22 + 14s value is listed by the A vertex. All similarly computed w_s tallies are posted by the appropriate vertex.
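The tallying rules just described can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis; the function names `pairwise_tally` and `positional_tally` and the dictionary encoding of the Eq. 1 profile are illustrative assumptions.

```python
from fractions import Fraction

# Eq. 1 profile: number of voters for each strict ranking (best to worst).
# The dictionary layout is an illustrative choice, not the chapter's notation.
profile = {
    ("A", "B", "C"): 7,
    ("A", "C", "B"): 15,
    ("C", "A", "B"): 2,
    ("C", "B", "A"): 12,
    ("B", "C", "A"): 4,
    ("B", "A", "C"): 12,
}

def pairwise_tally(profile, x, y):
    """Return (votes ranking x above y, votes ranking y above x)."""
    x_votes = sum(n for r, n in profile.items() if r.index(x) < r.index(y))
    return x_votes, sum(profile.values()) - x_votes

def positional_tally(profile, s):
    """Tally with the normalized voting vector (1, s, 0)."""
    weights = (Fraction(1), Fraction(s), Fraction(0))
    tally = {c: Fraction(0) for c in "ABC"}
    for ranking, n in profile.items():
        for position, candidate in enumerate(ranking):
            tally[candidate] += weights[position] * n
    return tally

print(pairwise_tally(profile, "A", "B"))   # A versus B
print(positional_tally(profile, 0))        # plurality tallies (s = 0)
```

Evaluating `positional_tally` at several values of s reproduces the linear expressions posted by the vertices, such as 22 + 14s for A.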

2.1.1 Different procedures; different information

An allure of voting paradoxes is that they generate delightful mysteries that motivate much of the research in this field. We must anticipate, of course, conflicting election outcomes if different voting rules use different information from a profile. A surprising fact revealed by the geometry of voting is that all differences in election outcomes, all conflicts (including Arrow’s and Sen’s theorems), reflect differences in how voting rules use, or even ignore, information from a profile.³

² Neophytes to this theoretical area often comment that differs from letting w_1 = 1, w_2 = w_3 = 0. This may be true from a psychological or practical perspective, but for a theoretical analysis, which examines the peculiarities and flaws of voting procedures, there is absolutely no difference. Indeed, for decades this has been the standard approach to handle plurality voting.

To illustrate this comment with the geometric profile representation, notice how computing A’s pairwise vote in an {A, B} comparison uses data from the lightly shaded “C ≻ A ≻ B ranking region” in Fig. 1b. But information from this region is totally ignored when computing A’s plurality vote, so, rather than being surprised, we should anticipate differences in A’s plurality and pairwise rankings. Also observe how the w_s tallies place varying influence on the second ranked candidates. Thus, as we now must anticipate, these informational differences can generate different election outcomes.

A pragmatic way to demonstrate the effects of these informational differences is to use them to create “paradoxes.” For instance, notice that all w_s rules have the same A ≻ B ≻ C outcome for the Fig. 1a profile. So let’s modify this profile in a way that retains the plurality tallies while creating dramatically different w_s outcomes.

[Figure 2: a. Modifying profile 1a (vertex tallies A: 22 + 2s, B: 16 + 19s, C: 14 + 31s; pairwise tallies A:B 24:28, B:C 23:29, A:C 22:30); b. Type numbers (1: A ≻ B ≻ C, 2: A ≻ C ≻ B, 3: C ≻ A ≻ B, 4: C ≻ B ≻ A, 5: B ≻ C ≻ A, 6: B ≻ A ≻ C).]

Fig. 2. Profile changes and type numbers

To fix the plurality tallies, move numbers only between ranking regions that share a vertex. Such changes keep the plurality tallies fixed, but they can significantly alter the w_s conclusions. For instance, to improve bottom ranked C’s standing, move numbers into regions where C is middle ranked; e.g., (see the arrow in Fig. 2a) moving the “12” from the B ≻ A ≻ C region of Fig. 1a to the B ≻ C ≻ A region helps C and hurts A. This one profile change breaks the original pairwise cycle to crown C as the Condorcet winner (namely, C beats each of the other candidates in a majority vote) and creates the reversed C ≻ B ≻ A w_s ranking for s > 6/17 while preserving the plurality A ≻ B ≻ C outcome with its original tally.
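As a hedged check of the modified profile, the sketch below recomputes the w_s ranking on both sides of the s = 6/17 threshold and confirms that C is the Condorcet winner. The helper names `ws_ranking` and `is_condorcet_winner` are hypothetical, and ties are not handled specially.

```python
from fractions import Fraction

# Fig. 2a profile: the Eq. 1 profile with the 12 voters moved from
# B > A > C to B > C > A (rankings listed best to worst).
modified = {
    ("A", "B", "C"): 7,
    ("A", "C", "B"): 15,
    ("C", "A", "B"): 2,
    ("C", "B", "A"): 12,
    ("B", "C", "A"): 16,
    ("B", "A", "C"): 0,
}

def ws_ranking(profile, s):
    """Return (candidates sorted by (1, s, 0) tally, the tallies themselves)."""
    s = Fraction(s)
    tally = {c: Fraction(0) for c in "ABC"}
    for ranking, n in profile.items():
        tally[ranking[0]] += n        # one point for a first place
        tally[ranking[1]] += s * n    # s points for a second place
    return sorted(tally, key=tally.get, reverse=True), tally

def is_condorcet_winner(profile, c):
    """True when c beats every other candidate in a majority vote."""
    total = sum(profile.values())
    return all(
        2 * sum(n for r, n in profile.items() if r.index(c) < r.index(d)) > total
        for d in "ABC" if d != c
    )

print(ws_ranking(modified, 0)[0])                # plurality ranking
print(ws_ranking(modified, Fraction(7, 17))[0])  # a value of s above 6/17
print(is_condorcet_winner(modified, "C"))
```

At s = 6/17 exactly, the A and B tallies coincide, which is where the ranking flips.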

³ For Arrow’s and Sen’s results, rather than just counting the number of voters with each ranking, a profile needs to identify each voter’s ranking.

2.1.2 Higher dimensions and “type numbers”

For convenience, I designate the transitive rankings by their “type numbers” as assigned in Fig. 2b; e.g., a “type 4” ranking is “C ≻ B ≻ A.” With this notation, the Fig. 2a profile has the vector representation p = (7, 15, 2, 12, 16, 0). More alternatives require higher dimensions. With four candidates, for instance, assign each candidate to a particular vertex of an equilateral tetrahedron and divide the tetrahedron into 4! = 24 ranking regions in a manner similar to the above. There are easy ways (Saari, 2000a, 2001) to convert the tetrahedron into a comfortable two-dimensional figure.

2.2 Elementary geometry; surprising results

A surprise is how complicated voting structures can be discovered by using only elementary geometry and algebra. The approach (introduced in Saari and Valognes, 1998, 1999) is illustrated here with pairwise and positional methods. These rules enjoy a convenient scaling property where, rather than needing to know the number of voters with each preference ranking, an election outcome can be determined just by knowing the fraction of all voters with each preference. It is this scaling feature that allows us to create a theory for voting rules rather than being forced into using multiple ad hoc analyses with that are separately examined based on the number of voters, say one analysis for Senate bill #100 involving one hundred voters and another one for the election of a senator, which may involve millions of voters. If both elections involve positional rules, the scaling property allows the same general analysis to apply to both elections. As shown next, surprising complexities already arise with only three types of voter preferences!

To illustrate, consider those Fig. 3a profiles where voters have only type 2, 5, or 4 preferences; i.e., preferences A ≻ C ≻ B, B ≻ C ≻ A, or C ≻ B ≻ A. An n-voter profile has n_2, n_5, n_4 voters with these respective preferences. To use the scaling, let x = n_2/n, y = n_5/n, and z = n_4/n, so that x, y, z ≥ 0 while x + y + z = 1. Plot a profile’s (x, y) values in a Cartesian framework; the z value is z = 1 − (x + y). In this manner a profile is identified with a point (x, y) in the Fig. 3b triangle T; the closer the point is to the origin, the larger the z value. Conversely, any (rational) point in T defines a profile; e.g., the point (1/5, 1/10) ∈ T corresponds to x = 1/5, y = 1/10, z = 1 − (1/5 + 1/10) = 7/10, where the common denominator 10 identifies a possible profile for (1/5, 1/10) as being n_2 = 2, n_5 = 1, n_4 = 7. (Thanks to the scaling factor, any positive integer multiple of these n_j’s defines the same T point.) Thus the set of rational points of T geometrically represents all profiles.
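The correspondence between integer profiles and rational points of T is easy to sketch in code. The helpers `point_to_profile` and `profile_to_point` below are hypothetical names introduced only for illustration; exact rational arithmetic is assumed so that no rounding enters.

```python
from fractions import Fraction
from math import gcd

def point_to_profile(x, y):
    """Smallest integer profile (n2, n5, n4) for a rational point (x, y) in T."""
    x, y = Fraction(x), Fraction(y)
    z = 1 - x - y
    if x < 0 or y < 0 or z < 0:
        raise ValueError("(x, y) must lie in the triangle T")
    # Least common denominator of x and y; the denominator of z divides it.
    n = x.denominator * y.denominator // gcd(x.denominator, y.denominator)
    return int(x * n), int(y * n), int(z * n)

def profile_to_point(n2, n5, n4):
    """Scale an integer profile to its (x, y) point in T."""
    n = n2 + n5 + n4
    return Fraction(n2, n), Fraction(n5, n)

print(point_to_profile(Fraction(1, 5), Fraction(1, 10)))
print(profile_to_point(20, 10, 70))  # a multiple of the profile above
```

Both calls describe the same point of T, which is the scaling property in action.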

[Figure 3: a. Admitted types (vertex tallies A: x, B: y + sz, C: 1 + (s − 1)(x + y)); b. Triangle T]

Fig. 3. Plotting profiles and outcomes

Notice that T also has points with irrational components (i.e., points that cannot be expressed as a fraction). As we will see, this does not cause any difficulties. While only points with rational components, which are densely positioned in T, are identified with actual profiles, we can significantly simplify the analysis by treating the full triangle T (both the rational and irrational points) as the “generalized profile space.”⁴ Indeed, it is easy to tally (compute) positional and pairwise outcomes with irrational values.

To find all possible profiles that support different outcomes, plot the election outcomes on the generalized profile space T. To represent the {A, B} pairwise vote, notice that A beats B iff x > 1/2: in Fig. 3b these generalized profiles are to the right of the x = 1/2 vertical dashed line in T. Similarly, B beats C iff y > 1/2: these generalized profiles are above the horizontal dashed line in T. Finally, A beats C iff x > 1/2 as well. Consequently, the pairwise elections divide the generalized profile space T into three major regions (the square and the two smaller triangles), where election outcomes are identified by type numbers.

2.2.1 First results

To illustrate what kinds of results follow immediately from this simple geometry, consider the impartial anonymous culture (IAC) assumption popularized by W. Gehrlein, P. Fishburn, and others. With T, this assumption merely means that each (x, y, z) point (with rational components) is equally likely. With our extension, assume that all points in the generalized profile space are equally likely.

⁴ One of many interpretations of these irrational points comes from voting theory and economics literatures with an infinite number of agents. The T irrational points describe divisions with an infinite number of voters.

Using IAC, probabilities correspond to the areas of appropriate regions. For instance, by using the areas of the square and the two triangles in T, it follows that half of the generalized profiles have a type 4 outcome while a quarter each have type 5 and type 2 outcomes.⁵ Although actual profiles are identified with the rational points in T, everything still holds because, as shown next, the ratio of the numbers of rational points with the same common denominator (that is, with a specified number of voters) in different regions is approximated by the ratio of their areas. Moreover, agreement improves (surprisingly rapidly) as n, the number of voters, increases.

To explain, notice that all possible n-voter profiles are represented by (x, y) = (k/n, j/n), where k, j ≥ 0 are integers satisfying j + k ≤ n. To locate these points, draw the vertical and horizontal lines passing, respectively, through x, y = j/n, j = 0, 1, . . . , n; the vertices of this grid within T are equally distributed and identify all possible n-voter profiles. The equal distribution requires the ratio of the numbers of vertices in two regions to closely match the ratio of the areas of the regions. By adding voters, the refined grid improves agreement.⁶ For any n, then, expect about half of the fractional profiles to have a type 4 outcome while about a fourth each have type 2 and 5 outcomes. It is interesting how this geometry of simple regions replaces the computational complexities that have troubled this research area by offering an intuitive, easily computed, and surprisingly accurate likelihood estimate.

Positional methods are graphed in a similar manner; i.e., to find all outcomes, plot the following equations defining A ∼ B, A ∼ C, and B ∼ C ties for a specified w_s. According to Fig. 3a, they are given, respectively, by

(1 + s)x + (s − 1)y = s,   (2 − s)x + (1 − s)y = 1,   (1 − 2s)x + (2 − 2s)y = 1 − s.   (2)

The Fig. 4a triangle represents the plurality vote (s = 0), where the dashed lines identify generalized profiles with tied plurality outcomes while the type numbers describe the six (strict) plurality rankings; the dotted lines reproduce the pairwise outcomes.

⁵ The likelihoods associated with other choices of profile distributions, such as a normal distribution, are determined by integrating the probability distribution over the different regions.

⁶ In part, agreement occurs because as n increases, the number of profiles causing tie votes becomes negligible. The same argument allows other smooth probability distributions to be used because the grid essentially generates the Riemann sum defining the integrals.
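The grid argument behind the IAC estimate can be tried directly. The sketch below is an illustration under stated assumptions rather than the computation used in the cited papers: it enumerates every n-voter profile of the Fig. 3a type, discards pairwise ties, and reports the share of grid points in each pairwise region. The function name `pairwise_region_shares` is hypothetical.

```python
def pairwise_region_shares(n):
    """Share of n-voter grid points (k/n, j/n) in T by pairwise type number."""
    counts = {2: 0, 4: 0, 5: 0}
    for k in range(n + 1):              # k = n2, so x = k/n
        for j in range(n + 1 - k):      # j = n5, so y = j/n
            if 2 * k == n or 2 * j == n:
                continue                # a pairwise tie; skip it
            if 2 * k > n:
                counts[2] += 1          # x > 1/2: A > C > B
            elif 2 * j > n:
                counts[5] += 1          # y > 1/2: B > C > A
            else:
                counts[4] += 1          # the square: C > B > A
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

for n in (11, 101, 1001):
    print(n, pairwise_region_shares(n))
```

As n grows, the shares should approach the area ratios 1/4, 1/2, and 1/4 for types 2, 4, and 5, in line with the text.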

[Figure 4: a. s = 0; b. s = 1/2; c. s = 1]

Fig. 4. Computing w_s outcomes

New results are now surprisingly easy to find. One approach is to note where different rules have different outcomes. An example is the interesting conflict where a type 4 pairwise ranking C ≻ B ≻ A (given by a generalized profile in the square) can be accompanied by any of the thirteen possible plurality rankings. Creating an illustrating profile for a particular behavior is easy: just select an (x, y) point in an appropriate region; e.g., (2/5, 1/3) is in the square and in the “1” region, so z = 4/15. Multiplying by the common denominator of 15 creates the integer profile n_2 = 6, n_5 = 5, n_4 = 4, which, by construction, has a C ≻ B ≻ A (type 4) pairwise outcome and the conflicting A ≻ B ≻ C (type 1) plurality ranking.

“Likelihood statements” again follow from the geometry. For instance, the areas of the triangle with a type 3 plurality outcome and the square are, respectively, 1/12 and 1/4. This elementary computation shows that with a C ≻ B ≻ A (type 4) pairwise outcome, the (conditional) likelihood of a C ≻ A ≻ B (type 3) plurality outcome is the relative area of these plurality outcomes in the square, or (1/12)/(1/4) = 1/3. Again, using a similar simple area computation, we find that if C is the Condorcet winner, then with probability 1/3 the plurality winner is someone else. However, if either A or B is the Condorcet winner, then, with certainty, that candidate is the plurality winner.

The Borda Count (s = 1/2) and antiplurality (s = 1) rankings are depicted, respectively, in Figs. 4b, c. Notice the dramatic decrease in conflict between the pairwise and positional outcomes; e.g., no longer can all possible positional outcomes occur. Also, if B or C is the Condorcet winner, then that candidate is the Borda winner. But when A is the Condorcet winner, the Borda winner could be A or C. According to the geometry, if A is the Condorcet winner, the likelihood that C is the Borda winner is 1/3. The antiplurality vote always conflicts with the Condorcet winner if it is A or B. A small sample of possible results follows.

Theorem 1 (Saari and Valognes, 1998) With Fig. 3a type profiles, the plurality vote allows all thirteen rankings to accompany the pairwise C ≻ B ≻ A ranking. If all profiles in T are equally likely, then, given that the pairwise rankings define C ≻ B ≻ A, we have that C is plurality bottom ranked with (conditional) probability 1/6. In contrast, the Borda Count allows only four strict rankings and the antiplurality method allows only two.
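The conditional likelihoods quoted above can be approximated by counting grid points. This is a rough numerical sketch under the IAC assumption, not the authors' derivation; the function name `conditional_shares` and the choice n = 601 (odd and not divisible by 3, to limit ties) are illustrative assumptions.

```python
def conditional_shares(n=601):
    """Among n-voter profiles with pairwise C > B > A, return the shares with
    (C plurality bottom-ranked, plurality ranking C > A > B)."""
    square = c_bottom = type3 = 0
    for k in range(n + 1):              # k = n2 voters with A > C > B
        for j in range(n + 1 - k):      # j = n5 voters with B > C > A
            m = n - k - j               # m = n4 voters with C > B > A
            if not (2 * k < n and 2 * j < n):
                continue                # pairwise ranking is not C > B > A
            square += 1
            # Plurality tallies are A: k, B: j, C: m.
            if m < k and m < j:
                c_bottom += 1
            if m > k > j:
                type3 += 1
    return c_bottom / square, type3 / square

bottom_share, type3_share = conditional_shares()
print(bottom_share, type3_share)
```

For large n the two shares should be close to the area ratios 1/6 and 1/3, respectively.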

2.2.2 Same profile, different rankings

A way to discover other fascinating conflicts is to trace how the location of the completely tied w_s election outcome changes with s. To explain the advantage of doing so, notice that with only a slight change in a tied outcome, any other w_s outcome can be created. So by tracing how the w_s completely tied outcome changes with s, we discover how all other w_s election rankings change with s. Consequently, we now can identify how the election rankings for a specified profile vary with s. Changes must occur because (see Fig. 4) the s = 0 completely tied point is at (1/3, 1/3), it moves to the T boundary at (1/2, 1/2) when s = 1/2, and it vanishes at infinity when s = 1. As the completely tied election point is the intersection of the A ∼ B and B ∼ C boundary surfaces, solving these algebraic equations shows that these tied points satisfy

\[
(x, y) = \left( \frac{1 + s}{3},\ \frac{1 - s + s^2}{3(1 - s)} \right), \qquad 0 \le s \le 1. \tag{3}
\]

This curve is plotted in Fig. 5a along with the (s = 0) plurality rankings. The accompanying Fig. 5b magnified version displays the location of the tied point for s = 1/4 while the dashed lines indicate the s = 1/4 ranking regions.
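Equation 3 can be sanity-checked by substituting the tied point into the three w_s tallies A: x, B: y + sz, C: z + s(x + y). The sketch below uses exact fractions; the function names `tied_point` and `ws_tallies` are assumptions made for this illustration.

```python
from fractions import Fraction

def tied_point(s):
    """Completely tied point of Eq. 3 for 0 <= s < 1."""
    s = Fraction(s)
    return (1 + s) / 3, (1 - s + s * s) / (3 * (1 - s))

def ws_tallies(x, y, s):
    """(A, B, C) tallies under (1, s, 0) for a Fig. 3a type profile."""
    s = Fraction(s)
    z = 1 - x - y
    return x, y + s * z, z + s * (x + y)

for s in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    x, y = tied_point(s)
    a, b, c = ws_tallies(x, y, s)
    # All three tallies coincide; each equals (1 + s)/3.
    print(s, (x, y), a == b == c)
```

For s > 1/2 the tied point lies outside T (its z value is negative), but the algebraic identity still holds, which matches the description of the point leaving T after the Borda Count.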

[Figure 5: a. s = 0; b. s = 1/4]

Fig. 5. Locus of the completely tied points

To extract results from Figs. 5a, b, first notice that with the positioning of the dashed lines within the plurality ranking region one, we now know from Fig. 5b that there are many profiles with the plurality A ≻ B ≻ C (type one) outcome, but where the (1, 1/4, 0) outcome can be any of the thirteen ways to rank the three candidates: positional methods can significantly disagree! (Observe from Fig. 5b that this comment holds not just for s = 1/4, but for all positional methods given by 0 < s < 1/2.) To extract more subtle conclusions from Fig. 5a, because the curve approaches infinity as s → 1, the lines representing generalized profiles with tied antiplurality (s = 1) outcomes must be vertical. This geometry does not hold for any other s value, so for all s < 1 at least two different w_s strict rankings accompany the C ≻ B ≻ A pairwise outcomes. As the completely tied point leaves T only after the Borda Count, for s < 1/2 profiles exist where any of the thirteen w_s rankings can accompany the C ≻ B ≻ A pairwise outcome.

To extract other information from the geometry, notice that a profile p located between the curve and the A ∼ B plurality line has the plurality ranking A ≻ B ≻ C. Increasing the s value moves the w_s completely tied point along the curve: this forces different w_s ranking regions to cross p. Thus p’s election outcome must vary with the procedure. For instance, the magnified version of T in Fig. 5b displays the s = 1/4 ranking regions. If p has a type 4 election outcome for s = 1/4, then p already experienced type 1, 6, and 5 election outcomes for earlier s values. In other words, with this fixed profile p, each candidate “can win” by using an appropriate w_s. By including tied election outcomes, the geometry shows that each generalized profile in the region between the curve and the A ∼ B plurality boundary line admits seven different election rankings for different w_s procedures. (A similar argument shows that any profile with a type-one plurality outcome that is below this curve also has seven different positional rankings, but now each candidate is bottom-ranked with some w_s.) Illustrating profiles are easily found by selecting (x, y) points from an appropriate region.

For probabilistic results, since the area (from integration) between the curve and the s = 0 boundary line for A ∼ B is 1/12 − (1/9) ln 2, the limiting probability of this peculiar behavior (where any candidate can “win”) is twice as large (because the area of T is 1/2). Considering only profiles in the square (with area 1/4), the limiting probability is 1/3 − (4/9) ln 2 ≈ 0.0253. A sample of the election behavior attributed to the Fig. 3a class of profiles follows.

Theorem 2 (Saari and Valognes, 1998) With profiles of the Fig. 3a type, the limiting probability that each candidate can win by using an appropriate w_s method is 1/6 − (2/9) ln 2. Assuming the C ≻ B ≻ A pairwise ranking, the (conditional) probability of this behavior where anyone can win is 1/3 − (4/9) ln 2.

The election phenomenon where each candidate is bottom-ranked with some w_s procedure has limiting probability 1/6 − [1/6 − (2/9) ln 2] = (2/9) ln 2 ≈ 0.1540. (When restricted to the profiles with C ≻ B ≻ A pairwise outcomes, the probability is 0.308.)

For the plurality vote (s = 0) the limiting probability for each of the six possible strict plurality outcomes is 1/6. The Borda Count has four possible strict outcomes; the limiting probability of a type 2 outcome and of a type 3 outcome is 1/6 each, of a type 4 outcome is 5/12, and of a type 5 outcome is 1/4. For the antiplurality vote, the limiting probabilities (with an increase in the number of voters) for the type 3 and 4 outcomes are, respectively, 1/4 and 3/4.
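The area behind the first statements of Theorem 2 can be approximated numerically. The sketch below is a hedged illustration: it applies a midpoint rule to the gap between the A ∼ B plurality line y = x and the Eq. 3 curve over 1/3 ≤ x ≤ 1/2, using s = 3x − 1 to parametrize the curve by x. The function names and step count are assumptions.

```python
import math

def curve_y(x):
    """y-coordinate of the Eq. 3 curve at horizontal position x = (1 + s)/3."""
    s = 3 * x - 1
    return (1 - s + s * s) / (3 * (1 - s))

def area_between(n_steps=100_000):
    """Midpoint-rule area between the line y = x and the curve on [1/3, 1/2]."""
    a, b = 1 / 3, 1 / 2
    h = (b - a) / n_steps
    total = 0.0
    for i in range(n_steps):
        x = a + (i + 0.5) * h
        total += (x - curve_y(x)) * h   # the curve lies below the line here
    return total

area = area_between()
print(area, 1 / 12 - math.log(2) / 9)   # numerical value vs. closed form
print(2 * area)                         # share of T (area of T is 1/2)
print(4 * area)                         # share of the square (area 1/4)
```

Dividing the area by 1/2 and by 1/4 gives approximations to 1/6 − (2/9) ln 2 and 1/3 − (4/9) ln 2.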

2.2.3 Other profiles and Black’s condition

These results prove that surprising inconsistencies can arise even should the profiles be restricted to these three specified types. They are not alone; all profile classes with only three rankings allow conflict among the pairwise and positional election rankings. What simplifies the analysis is that while there are C(6, 3) = 20 profile classes of this type, the actual number is sharply reduced by using symmetries; e.g., changing the names of the candidates at each Fig. 3 vertex does not change the theoretical conclusions. All cases are analyzed in (Saari and Valognes, 1998).

When using this approach to study profiles with four specified rankings, we discover, as special cases, surprising consequences of Black’s (1958) single peaked condition and Ward’s (1965) conditions that avoid pairwise cycles. (Triangle T is replaced with a tetrahedron.) Serious conflict among election rules occurs with profiles allowing only three rankings, so we must anticipate even more incompatibility to arise by allowing an extra ranking. This happens; e.g., as developed in (Saari and Valognes, 1999), while Black’s condition avoids pairwise cycles, it permits surprising conflict among election rankings. This goes beyond the surprising differences in election outcomes that occur with Fig. 3a profiles, which satisfy Black’s condition. On the other hand, other kinds of profile restrictions do not admit as severe differences. (See Saari (2005).)

2.3 The source of pairwise voting problems

I now turn to the third geometric approach. Before describing cycles and other pairwise ranking mysteries, consider those pairwise voting issues that involve disjoint pairs. Examples include the many initiatives typically placed on California ballots, or the several bills facing a senator. One pair {A, B} may be a yes-no vote on a proposal for stronger criminal penalties while {C, D} is a vote to elect Charles or Diane as a criminal court judge.

It is easy to find news accounts after an election stating how the outcomes reflect the “will of the people.” Is this true? In what sense? Does it mean that a majority of the voters approve the combined outcomes? The Anscombe (1976) paradox, which shows that a majority of the voters can be frustrated on a majority of the issues, proves that this need not be the case. Even worse, Brams, Kilgour, and Zwicker (1998) analyzed actual California election initiatives and showed that not one person of the millions of voters agreed with all of the actual outcomes. To describe why this can happen, another geometric approach (developed in (Saari and Sieberg, 2001)) is presented that identifies profiles with their pairwise outcomes; the resulting geometry provides new insights into the source of all of these problems.

Suppose A beats B by receiving 60% of the vote. A way to represent this outcome is to use a line interval with A and B at the endpoints and place a point on the interval 60% of the way from B to A. This point represents both the profile and the outcome; it means that with 100% certainty, 60% of these voters prefer A to B.

Now suppose these same voters vote over two pairs. Suppose in {A, B}, {C, D} pairwise elections that A beats B and C beats D by receiving, respectively, 60% and 55% of the vote. To represent this joint outcome, let the horizontal and vertical axes of a square depict, respectively, the A–B (where A is to the right) and C–D (where C is on top) outcomes. In this manner, the bullet labelled q in the first quadrant of the Fig. 6a square represents the specified joint outcome.

[Figure 6: two squares with corners labelled ν(A, C) = 1, ν(A, D) = 1, ν(B, C) = 1, ν(B, D) = 1 and horizontal axis ν(A, C) + ν(A, D). Panel a, “Profile line,” shows q on the line from qL = (0, 1/4) to qR = (1, 3/4). Panel b, “All supporting profiles,” marks the right-edge points (1, 11/12), (1, 5/6), and (1, 1/4).]

Fig. 6. Profile representations

The bullet means that, with certainty, 60% of the voters prefer A to B, and 55% of them prefer C to D. But what percentage of the voters approve of both outcomes? With 100 voters, does the joint outcome ensure a 55% approval because

• 55 of the voters approve of both outcomes,

• 5 prefer A ≻ B but disagree with the C ≻ D outcome,

• while the remaining 40 voters disagree with both rankings?

It need not because the same election outcome occurs should

• only 15 of the voters agree with the joint outcome,

• while 45 of the 100 voters prefer A ≻ B but disagree with the C ≻ D outcome,

• and 40 of the voters prefer C ≻ D but disagree with the A ≻ B outcome.

In other words, even though both elections enjoy a sizeable victory margin, 85% of the voters could disagree with the combined decision.

To geometrically identify all possible scenarios creating this joint election outcome, the following approach (Saari and Sieberg, 2001) shows how to superimpose profiles on their outcomes on the square. To do this, let ν(A, C) be the fraction of all voters preferring A ≻ B, C ≻ D, and use similar notation for the other three choices. Thus, a profile becomes

p = (ν(A, C), ν(A, D), ν(B,C), ν(B,D)). (4)
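The two 100-voter scenarios listed above can be written as Eq. 4 profiles and checked directly. The following is only a minimal sketch; the helper names `joint_outcome` and `pct` are illustrative and not from the source.

```python
from fractions import Fraction

# A profile in the sense of Eq. 4, ordered as
# (nu(A,C), nu(A,D), nu(B,C), nu(B,D)).
def joint_outcome(profile):
    """Return (A's share of the {A,B} vote, C's share of the {C,D} vote)."""
    nu_ac, nu_ad, nu_bc, nu_bd = profile
    assert nu_ac + nu_ad + nu_bc + nu_bd == 1
    return (nu_ac + nu_ad, nu_ac + nu_bc)

def pct(*counts):
    """Turn head counts out of 100 voters into exact fractions."""
    return tuple(Fraction(c, 100) for c in counts)

# First scenario: 55 voters like both outcomes, 5 like only A over B,
# and 40 like neither.
scenario_1 = pct(55, 5, 0, 40)
# Second scenario: 15 like both, 45 like only A over B, 40 like only C over D.
scenario_2 = pct(15, 45, 40, 0)

for p in (scenario_1, scenario_2):
    # Same joint outcome, very different support for the combined decision.
    print(joint_outcome(p), "support for both outcomes:", p[0])
```

Both profiles give A 60% and C 55%, yet the share of voters who approve of both outcomes is 55% in one case and 15% in the other.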

Next, artificially divide the voters into the “rightists,” voters with preferences represented on the right edge as they all prefer A ≻ B, and the “leftists,” voters with their preferences represented on the left edge as they all prefer B ≻ A. Further divide the rightists according to how they rank C and D. As a “one-pair” election is represented by a point on a line segment, this division of the rightists defines a point on the right edge as indicated by qR in Fig. 6a. Similarly, the division of the leftists is indicated by qL on the left edge. For example, the particular Fig. 6a choice of qR is above the midpoint of the right edge, so over half of the rightists support C ≻ D; the lower position of qL shows that most leftists prefer D ≻ C. A computation shows that the general representation of these points is

\[
q_L = \Bigl(0,\ \frac{\nu(B,C)}{\nu(B,C)+\nu(B,D)}\Bigr) = \Bigl(0,\ \frac{\nu(B,C)}{1-(\nu(A,C)+\nu(A,D))}\Bigr), \qquad
q_R = \Bigl(1,\ \frac{\nu(A,C)}{\nu(A,C)+\nu(A,D)}\Bigr). \tag{5}
\]

The straight line connecting these two points, (1 − t)qL + t qR, 0 ≤ t ≤ 1, passes through q. Indeed, the special case where t = ν(A, C) + ν(A, D) (this value is A’s vote in the {A, B} election) defines the joint election outcome

q = (ν(A, C) + ν(A, D), ν(B,C) + ν(A, C)).

In Fig. 6a, q is the intersection of the solid line connecting the endpoints and the dashed vertical line indicating the {A, B} pairwise outcome. Stated in words, a profile can be represented as a line segment with endpoints on the left and right edges of the square and a distinguished point on the line. The distinguished point identifies how the voters split in the {A, B} election, and it designates the joint election outcome. Call the line segment with its distinguished point a profile line.

Crucial to our discussion is that the converse is true. By this I mean that each line segment with these properties defines an Eq. 4 profile: the line segment is a profile line for some profile. To find the profile, notice that the Fig. 6a line is defined by the 60% vote of A over B along with the endpoints qL = (0, 1/4) and qR = (1, 3/4). As qR requires 75% of the rightists to prefer C ≻ D, we have from their definitions that ν(A, C) = (0.6)(0.75) = 0.45 while ν(A, D) = (0.6)(0.25) = 0.15. Similarly, 25% of the leftists prefer C ≻ D, so ν(B, C) = (0.4)(0.25) = 0.1 while ν(B, D) = (0.4)(0.75) = 0.3. Thus, this Fig. 6a profile line represents the profile

(ν(A, C), ν(A, D), ν(B,C), ν(B,D)) = (0.45, 0.15, 0.10, 0.30).
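The correspondence between profiles and profile lines can be sketched in a few lines of code. This is an illustration under stated assumptions rather than anything from the source: the function names are hypothetical, a profile line is stored as the triple (height of qL, height of qR, t), and both the rightist and leftist groups are assumed non-empty so that Eq. 5 is defined.

```python
from fractions import Fraction

def profile_to_line(profile):
    """Map p = (nu(A,C), nu(A,D), nu(B,C), nu(B,D)) to (qL_y, qR_y, t).

    qL_y and qR_y are the endpoint heights from Eq. 5, and
    t = nu(A,C) + nu(A,D) is A's share of the {A,B} vote.
    """
    nu_ac, nu_ad, nu_bc, nu_bd = profile
    t = nu_ac + nu_ad
    return (nu_bc / (nu_bc + nu_bd), nu_ac / t, t)

def line_to_profile(ql_y, qr_y, t):
    """Converse map: recover the profile from a profile line."""
    return (t * qr_y, t * (1 - qr_y), (1 - t) * ql_y, (1 - t) * (1 - ql_y))

def outcome(ql_y, qr_y, t):
    """Distinguished point q = (1 - t) qL + t qR on the profile line."""
    return (t, (1 - t) * ql_y + t * qr_y)

# Fig. 6a line: A receives 60%, qL = (0, 1/4), qR = (1, 3/4).
line = (Fraction(1, 4), Fraction(3, 4), Fraction(3, 5))
p = line_to_profile(*line)
print(p)               # components 9/20, 3/20, 1/10, 3/10
print(outcome(*line))  # joint outcome (3/5, 11/20)
```

Applying `profile_to_line` to the recovered profile returns the original line, which is the converse statement in miniature.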

The important message is that all lines passing through q with endpoints on the side edges represent profiles with outcome q. Consequently, the set of all possible profiles with q as the pairwise outcomes corresponds to all lines in the shaded wedge of Fig. 6b. This set of profile lines, called the profile cone, identifies all possible profiles that support the specified pairwise outcomes.

The cone’s geometry, which encompasses a surprisingly wide array of different profiles, graphically indicates why problems must arise with pairwise elections. Stated simply, with at least two pairs, the set of profiles that support the joint outcomes can be surprisingly varied. An important point is that the pairwise vote ignores these variations (i.e., it cannot distinguish among the different profile lines), so the pairwise vote dismisses the considerable amount of available and useful information that distinguishes among these significant profile differences.

2.3.1 Certainty estimates

Now that we can identify all profile lines (hence, all profiles) that support a specified outcome, we can determine the likelihood of any of many events. A natural question, for instance, is to determine the likelihood that certain percentages of the voters accept both outcomes. Geometrically, this involves comparing the relative abundance of those profile lines that satisfy a specified event relative to the full cone (Saari and Sieberg, 2001).

To illustrate this approach with the Fig. 6 outcome, consider the seemingly modest conjecture that at least 55% of the voters prefer the joint A ≻ B, C ≻ D outcome. A way to analyze this claim is to compute the likelihood that it is correct. To start, notice that only rightists support A ≻ B, so this conjecture holds only if a certain fraction of rightists support C ≻ D. Namely, at least a fraction y of the rightists, given by y(0.60) ≥ 0.55, or y ≥ 11/12, must prefer C ≻ D. According to Fig. 6b, precisely one profile satisfies this constraint. Consequently, with a smooth probability distribution over profiles, a 55% level of support has essentially zero likelihood of occurring.

A more relaxed condition is to claim that at least half of the voters support both outcomes: this requires finding the likelihood that ν(A, C) ≥ 1/2. The same analysis shows that a fraction y of the rightists, where y(0.60) ≥ 0.50, or y ≥ 5/6, must support C ≻ D. In Fig. 6b, the relatively small size of this profile subset (the heavier shaded region in the profile cone) provides a graphical indication that, while the assertion may seem reasonable, in fact it is unlikely to have half of the voters supporting both outcomes.

Actual likelihood values require specifying a probability distribution for the profile lines. A first choice is to assume that each profile line is equally likely. Here, the likelihood (for details, see Saari and Sieberg, 2001) is the length of the heavily shaded portion on the right edge divided by the length of the shaded portion on the right edge; it is only [11/12 − 5/6]/[11/12 − 1/4] = 1/8.⁷ When the more realistic binomial probability distribution is used, however, the likelihood becomes negligible. (To justify this distribution, consider two profile lines that support the same q where one line has 2 out of the 50 rightist voters preferring C to D while the other one has 20 of the 50 rightists preferring C to D. A neutral way to determine which profile line is more likely to occur is to determine the number of ways to select 2 of 50 voters for the first line, and 20 of 50 voters for the second line. This leads to the binomial distribution with a p = 1/2 probability for either selection.) The following samples the available results.
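The right-edge lengths used in this computation follow from requiring that the profile line through q stay inside the square. The sketch below assumes that each admissible height of qR is equally likely, which is the “equally likely profile line” reading used above; the function names are illustrative and not from the source.

```python
from fractions import Fraction

def cone_on_right_edge(m1, m2):
    """Range of qR heights over all profile lines through q = (m1, m2).

    A line through q with right endpoint height qR meets the left edge at
    height (m2 - m1 * qR) / (1 - m1), which must stay within [0, 1].
    Assumes 0 < m1 < 1.
    """
    low = max(Fraction(0), (m2 - (1 - m1)) / m1)   # left endpoint at height 1
    high = min(Fraction(1), m2 / m1)               # left endpoint at height 0
    return low, high

def uniform_likelihood(m1, m2, alpha):
    """Chance that at least alpha of all voters back both outcomes."""
    low, high = cone_on_right_edge(m1, m2)
    needed = alpha / m1            # share of rightists who must prefer C
    favourable = max(Fraction(0), high - max(low, needed))
    return favourable / (high - low)

m1, m2 = Fraction(3, 5), Fraction(11, 20)            # the Fig. 6 outcome
print(cone_on_right_edge(m1, m2))                    # heights 1/4 and 11/12
print(uniform_likelihood(m1, m2, Fraction(1, 2)))    # 1/8
print(uniform_likelihood(m1, m2, Fraction(11, 20)))  # 0
```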

Theorem 3 (Saari and Sieberg, 2001) For two pairs, suppose the winning alternative wins with the majority m_j of the vote, m_j > 1/2, j = 1, 2, where m1 ≥ m2. Assume all profiles, as represented by their endpoints on an edge, are equally likely. The likelihood that at least α of all voters prefer both outcomes is the smaller of unity or

\[
\mathrm{Prob}(\alpha) = \max\Bigl(\frac{m_2 - \alpha}{1 - m_1},\ 0\Bigr). \tag{6}
\]

Similarly, the likelihood that at least β of all voters dislike both outcomes is

\[
\mathrm{Prob}(\beta) = \max\Bigl(\frac{1 - m_1 - \beta}{1 - m_1},\ 0\Bigr). \tag{7}
\]

To use this theorem to indicate the surprising loss of information about profiles that occurs with two pairs, recall that with certainty the winning alternative of a single pair enjoys at least 50% of the vote. But to say with certainty that at least 50% of the voters prefer both outcomes, we must have m1 + m2 ≥ 3/2. (Use Eq. 6 with α = 1/2 and Prob(α) = 1.) But m1 ≥ m2, so the winner of the first election must receive at least 75% of the vote and the winner of the second election must have nearly as strong a victory. In other words, to assert with “certainty” that at least half of the voters support both outcomes, we need overwhelmingly strong pairwise votes. With more common election outcomes such as m1 = 0.53 and m2 = 0.52, Eq. 6 establishes that the likelihood that half of the voters (α = 0.5) approve of both outcomes is the negligible 0.02/0.47 ≈ 0.04. These disturbing results become significantly more discouraging when using the more realistic binomial distribution for profile lines.

By setting Prob(α) = 1 in Eq. 6, we obtain the interesting relationship that, with certainty, only the small proportion of α = m1 + m2 − 1 of all voters approve of both outcomes. Using the Fig. 6a example, we can say with certainty only that 0.60 + 0.55 − 1, or 15% of the voters, prefer both outcomes. For m1 = 0.53, m2 = 0.52, all we can say with certainty is that at least 5% of the voters prefer both outcomes.

Similarly, Eq. 7 determines the fraction of all voters not liking either result. Using m1 = 0.52, m2 = 0.51, Eq. 7 shows that the likelihood that at least β of all voters disapprove of both outcomes is [0.48 − β]/0.48. To illustrate this expression with numbers, there is a 50% chance that at least 24% of the voters disapprove of both outcomes. Other relations, and how this analysis applies to “bundled voting,” are described in (Saari and Sieberg, 2001).

⁷ As with likelihood estimates on T, intuition can be developed by dividing the larger length on the edge into R equal portions where R is the number of “rightists.”
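The numerical claims drawn from Eqs. 6 and 7 are easy to reproduce. The following is a minimal sketch that evaluates the two formulas exactly; the function names are illustrative and not from the source.

```python
from fractions import Fraction as F

def prob_support_both(m1, m2, alpha):
    """Eq. 6, capped at 1: chance that at least alpha like both outcomes."""
    return min(F(1), max((m2 - alpha) / (1 - m1), F(0)))

def prob_dislike_both(m1, beta):
    """Eq. 7: chance that at least beta dislike both outcomes."""
    return max((1 - m1 - beta) / (1 - m1), F(0))

def certain_support(m1, m2):
    """Largest alpha for which Eq. 6 gives probability 1."""
    return max(m1 + m2 - 1, F(0))

print(prob_support_both(F(53, 100), F(52, 100), F(1, 2)))  # 2/47, about 0.04
print(certain_support(F(60, 100), F(55, 100)))             # 3/20, i.e. 15%
print(certain_support(F(53, 100), F(52, 100)))             # 1/20, i.e. 5%
print(prob_dislike_both(F(52, 100), F(24, 100)))           # 1/2
```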

2.3.2 Several pairwise votes

When two pairs are voted upon separately, the above argument proves that only a few of the voters may approve of both outcomes. Thus, any sense that the pairwise votes accurately reflect the views of most voters over all outcomes becomes highly doubtful. New, more accurate interpretations are needed. (Also see Saari, 2004.)

Among the interpretations advanced in (Saari and Sieberg, 2001), a convenient one is repeated here. To describe the idea, notice from the Fig. 6b profile cone that, for most profile lines supporting q, ν(A, C) has the largest value of the four profile components. This suggests that the principal role of q is not to identify which joint outcomes most voters prefer, but to identify which profile entry has the largest value for the largest number of supporting profiles. For one pair, this statement supports our common expectation; for several pairs, this observation directly conflicts with popular interpretations.

Theorem 4 (Saari and Sieberg, 2001) Suppose the probability distribution over profiles is either the uniform or the binomial distribution with p = 1/2. For two or more pairs, the combined pairwise outcome agrees with the component with the largest value that occurs with the largest number of profiles.
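For the Fig. 6 outcome, the observation that ν(A, C) is the largest component for most supporting profile lines can be checked by sampling the profile cone. The sketch below assumes the uniform case, parametrises the cone by the share y of A-supporters who also prefer C, and samples y at evenly spaced midpoints; all names are illustrative and not from the source.

```python
from fractions import Fraction

def profile_on_cone(m1, m2, y):
    """Profile with joint outcome (m1, m2) in which a share y of the
    A-supporters also prefer C."""
    nu_ac = m1 * y
    nu_ad = m1 - nu_ac
    nu_bc = m2 - nu_ac
    nu_bd = 1 - m1 - nu_bc
    return (nu_ac, nu_ad, nu_bc, nu_bd)

def share_where_joint_winner_leads(m1, m2, steps=800):
    """Fraction of sampled profile lines on which nu(A,C) is a largest
    component of the profile."""
    low = max(Fraction(0), (m1 + m2 - 1) / m1)   # lowest admissible y
    high = min(Fraction(1), m2 / m1)             # highest admissible y
    hits = 0
    for k in range(steps):
        # Midpoint of the k-th of `steps` equal subintervals of [low, high].
        y = low + (high - low) * Fraction(2 * k + 1, 2 * steps)
        p = profile_on_cone(m1, m2, y)
        if p[0] == max(p):
            hits += 1
    return Fraction(hits, steps)

share = share_where_joint_winner_leads(Fraction(3, 5), Fraction(11, 20))
print(share)  # 5/8
```

In this example ν(A, C) leads exactly when y ≥ 1/2, which covers 5/8 of the cone, even though only 1/8 of the cone has half of the voters supporting both outcomes.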

A tendency found in the literature, probably inherited from the properties of a single pair, is to assume that the combined pairwise elections reflect most voters’ views. This is true for one pair. With two pairs, a profile has four components, so, according to Thm. 4, the best we know is that the dominating component is greater than 1/4, not 1/2. This means that the largest number of profile lines have more than a fourth, not a half, of the voters agreeing with the combined outcome. (Added support comes from Thm. 3 where with m1 = 0.52, m2 = 0.51, there is about a 54% chance, namely 0.26/0.48, that at least a quarter (α = 1/4) of all voters approve both outcomes.) With N pairs, we only know that the dominating component is larger than (1/2)^N; so, with N = 3 or 4 pairs, we just know, respectively, that the largest number of profile lines have at least 1/8 or 1/16 of the voters approving all outcomes. Returning to the motivating example of California, if there were N = 22 issues on the ballot, all we can expect is that the combined outcome is supported by 1/4,194,304 of the voters; this result is compatible with what was reported earlier. In other words, it is unrealistic to expect the joint outcome of several pairs to enjoy much support.

As the profile cone geometry proves, do not expect pairwise majority vote outcomes to reflect the properties of an actual profile; instead these outcomes reflect a statistical sense of all possible profiles that could give the same result. This interpretation plays a central role in understanding Arrow’s theorem.

[Figure 7: two cubes with numbered vertices. Panel a, “Three separate pairs”; panel b, “Pairs from {A, B, C}.”]

Fig. 7. Three-pair cubes

2.3.3 More alternatives, and the “paradox of voting”

More pairs require larger geometric dimensions. With three pairs, for instance, the square is replaced with a cube. While the profile representation for all higher dimensional settings is described in (Saari and Sieberg, 2001), it suffices for this survey to point out what now should be obvious: the cost of adding pairs for a majority vote is the significant loss of information about the profiles. With one pair, with certainty over half of the voters support the winning outcome. With two pairs, all we can say with certainty is that some voters support both outcomes. But with three or more pairs, the geometry admits profiles where no voter supports the combined election outcome.

To understand this phenomenon, remember (Thm. 4) that the combined election outcome q need not have much to do with the actual supporting profile. This comment can be illustrated with the Fig. 7a cube that represents three pairs. If vertex 8 is the closest to the combined outcomes q, then q’s joint election outcome is the vertex 8 ranking, which in the labelling of Eq. 8 is A ≻ B, D ≻ C, F ≻ E. Again, this ranking just identifies which profile component has the largest value for the largest number of profiles supporting q. Rather than representing the wishes of the voters from the actual profile, the outcome merely identifies a statistical property of an associated set of profiles.

With the geometry of the cube, it is easy to create profiles where no voter agrees with all outcomes. A simple choice is to equally divide the voters’ preferences among the rankings represented by vertices 1, 3, and 5; the resulting q outcome, which is in the center of the triangle defined by these vertices, has the vertex 8 ranking. Namely, a profile where each of A ≻ B, D ≻ C, and F ≻ E wins with a landslide 2/3 vote is

Vertex   Fraction   {A, B}   {C, D}   {E, F}
1        1/3        A ≻ B    D ≻ C    E ≻ F
3        1/3        B ≻ A    D ≻ C    F ≻ E        (8)
5        1/3        A ≻ B    C ≻ D    F ≻ E

yet none of the voters supports all three outcomes. Again, rather than reflecting the specific and actual input, the procedure delivers an outcome that is statistically consistent with most profiles supporting q.
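The Eq. 8 profile can be tallied mechanically. The sketch below rests on one labelled assumption: each entry of the Eq. 8 table is read as listing the preferred alternative first, so a voter type is recorded by the alternative it prefers in each of the three pairs. The function and variable names are illustrative and not from the source.

```python
from fractions import Fraction

# Voter types from the Eq. 8 table, keyed by the preferred alternative in
# {A,B}, {C,D}, {E,F}. Assumption: a table entry "X Y" means X over Y.
profile = {
    ("A", "D", "E"): Fraction(1, 3),   # vertex 1
    ("B", "D", "F"): Fraction(1, 3),   # vertex 3
    ("A", "C", "F"): Fraction(1, 3),   # vertex 5
}

def pairwise_winners(profile):
    """Majority winner and winning share for each of the three pairs."""
    winners = []
    for i in range(3):
        tally = {}
        for choices, weight in profile.items():
            tally[choices[i]] = tally.get(choices[i], 0) + weight
        best = max(tally, key=tally.get)
        winners.append((best, tally[best]))
    return winners

winners = pairwise_winners(profile)
joint = tuple(w for w, _ in winners)
support = profile.get(joint, Fraction(0))
print(winners)   # each pair is won with a 2/3 share
print(support)   # share of voters holding the joint outcome: 0
```

Under this reading each pair is decided by a two-thirds vote, while the combination of the three winning sides is held by no voter type in the profile.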
This geometry identifies the source of all possible three-pair voting difficulties (Saari and Sieberg, 2001); this includes the Anscombe (1976) paradox (Nurmi, 1999) where a majority of the voters can be frustrated on a majority of the issues, Ostrogorski’s (1970) paradox (Nurmi, 1999), which questions the meaning of the “dominant party” because the choice can change depending on whether one emphasizes the party where most voters like most of its stands on issues or the party that wins elections over most of the issues, and even conflicts that arise with propositional logic as developed by List and his coauthors (e.g., see List and Pettit, 2002). An interesting fact is that all examples illustrating any of these behaviors, including the insightful one Nurmi (1999) created to show that the Ostrogorski and Anscombe behaviors differ, must involve the above geometric structure (Saari and Sieberg, 2001, Saari, 2001a).

To illustrate with a List-type example, consider three judges, each of whom believes that for a person to be guilty, he must have committed both of two other events. The outcomes are

Judge              #1    #2    Guilty?
1                  Yes   No    No
2                  No    Yes   No
3                  Yes   Yes   Yes
Majority outcome   Yes   Yes   No

where the majority vote on each event is “Yes,” yet the majority vote on the outcome is “not guilty.” This example is captured by Fig. 7 by identifying an event one “Yes” with A ≻ B, an event two “Yes” with D ≻ C, and a not-guilty verdict with F ≻ E.

Let me now turn to an important application. Instead of three disjoint pairs, suppose that the three pairs are defined by the three alternatives {A, B, C} as indicated in Fig. 7b. Here, vertices 1, 3, 5 represent the transitive rankings given by the type numbers while vertex 8 corresponds to the cyclic ranking A ≻ B, B ≻ C, C ≻ A. In this manner Eq. 8 now becomes the Condorcet triplet of A ≻ B ≻ C, B ≻ C ≻ A, C ≻ A ≻ B. In other words, and as explained next, this geometry identifies the elusive source of the “paradox of voting.”

The point is that even if we require each individual’s pairwise rankings to satisfy a constraint of transitivity, nothing has been added to the voting rule to empower it to distinguish whether the data comes from transitive rankings or from the Fig. 7a setting where the data consists of disjoint pairs. Thus the election rule’s outcomes must be interpreted in the same statistical sense as described above and in Thm. 4. Namely, rather than manifesting the actual input, the pairwise voting cycle indicates that this cyclic ranking is the most frequent data component over all data sets that support q: this interpretation is true whether or not the “other data sets” satisfy desired constraints such as transitivity.
The bothersome cyclic outcome caused by the Condorcet triplet, then, reflects the attempt of the pairwise vote to represent even voters with cyclic preferences who do not exist in voting problems once we assume transitive preferences. The source of the problem is that the voting rule must ignore this assumption. A valuable message is that the explicit restrictions we impose on the data, or on profiles, will be ignored when they are not compatible with how a rule interprets data. In particular, with the Condorcet profile, the pairwise vote dismisses the actual transitive preferences as an outlier, a highly unlikely (from the perspective of the rule) data set because no voter has cyclic preferences.⁸

Stated more strongly, the crucial information dismissed by pairwise voting is that the voters have transitive preferences. As pairwise voting satisfies binary independence, we must suspect that this observation explains Arrow’s Theorem: it does. The issue is to show how Arrow’s binary independence condition inadvertently negates the assumption that the voters have transitive preferences: this is described in Sect. 3. As developed in (Saari 2001a), the same argument removes all mystery about Sen’s (1970) important minimal liberalism theorem. Also see Saari and Petron (2004) where we show how this kind of analysis leads to the surprising counter-intuitive conclusion that rather than analyzing individual liberties, for many settings Sen’s seminal result more accurately models a dysfunctional society!
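The count of five three-voter data sets behind the “paradox of voting,” four of which contain a cyclic voter (footnote 8), can be confirmed by brute force. The sketch below uses an encoding that is an assumption for illustration only: a voter’s binary rankings over {A, B}, {B, C}, {C, A} are stored as a triple of +1/−1 entries, and voters are treated as interchangeable.

```python
from itertools import combinations_with_replacement, product

# +1 in slot 0 means A over B, in slot 1 B over C, in slot 2 C over A.
TYPES = list(product((1, -1), repeat=3))   # all eight binary-ranking triples
POSITIVE_CYCLE = (1, 1, 1)                 # A over B, B over C, C over A
NEGATIVE_CYCLE = (-1, -1, -1)

def is_cyclic(voter):
    """A triple is cyclic exactly when all three entries agree."""
    return voter in (POSITIVE_CYCLE, NEGATIVE_CYCLE)

def supports_cycle_outcome(voters):
    """True when every pair is decided 2-to-1 along the positive cycle."""
    return all(sum(v[i] for v in voters) == 1 for i in range(3))

# Unordered three-voter data sets giving the pairwise cycle outcome.
data_sets = [vs for vs in combinations_with_replacement(TYPES, 3)
             if supports_cycle_outcome(vs)]
with_cyclic_voter = [vs for vs in data_sets if any(is_cyclic(v) for v in vs)]

print(len(data_sets))           # 5
print(len(with_cyclic_voter))   # 4
```

The one data set without a cyclic voter is the Condorcet triplet itself.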

⁸ Three of the other four possibilities for the “paradox of voting” are where two voters have opposing transitive rankings and the third has this cyclic ranking. The last is where two voters have the cyclic ranking, and the last has the opposite cyclic ranking. So, four of the five possibilities, which determine the same outcome for the pairwise vote, have the cycle as a component.

2.4 Finding other pairwise results

The above description samples results that are consequences of connecting pairwise voting outcomes with their associated profiles. Still other kinds of “pairwise voting” conclusions can be obtained by knowing the geometry of all possible pairwise outcomes over N alternatives. To find this geometry for three alternatives, as indicated in Fig. 7b, start with the cube and then cut along the dashed line to excise the tetrahedron with vertex 8. The reason these points are eliminated is that they represent pairwise outcomes that require data where some voter’s binary preferences define a cycle; they cannot be realized with transitive preferences. Similarly, the tetrahedron with the vertex diametrically opposite 8 is also dropped; this requires cutting along the plane through vertices 2, 4, 6. (See (Saari, 1994, 1995) for details.) What remains is the set of all possible pairwise outcomes that can be obtained with transitive preferences. While pictures cannot be drawn for more alternatives, a similar construction holds: start with the orthogonal or natural cube and then truncate appropriate regions to get the representation cube. (The representation cube includes as a special case the cube defined by the convex hull of the n(n − 1) points where a coordinate axis passes through the orthogonal cube.) The representation cube extends McGarvey’s result (1953), asserting that all possible pairwise rankings are possible, by providing precise information about all possible pairwise tallies.

Knowing all possible pairwise normalized tallies (the rational points in the truncated, or representation cubes) for any number of alternatives allows us to find geometric representations for different choice methods. For instance, each representation cube has what I call the “transitivity plane” (Saari, 1999, 2000) defined by the remarkable feature that by knowing the tally for any two pairs {A, B} and {B, D}, we also know the tally for {A, D}; it is determined exactly in the manner how distances on a line between points a and b and between b and d determine the distance between a and d. In Fig. 7b, this plane connects the midpoints of the 1-2, 2-3, 3-4, 4-5, and 5-6 edges: it is parallel to the dashed lines.

For any pairwise tally q in the representation cube, the associated Borda Count outcome (and normalized tally) is the point in the transitivity plane that is closest (with the usual Euclidean distance) to q. The Copeland outcome is determined by first finding the vertex of the orthogonal cube (the uncut cube) closest to q and then finding the point in the transitivity plane that is closest to this vertex. The Kemeny outcome is the ranking of the closest region to q that has a transitive ranking (where “closest” now is defined by the l1 distance, or the sum of the magnitudes of the coordinates), and so forth (Saari, 2000). (For definitions of these voting rules, see the nice descriptions Brams and Fishburn give in Chap. 4 of this series.) This geometry permits a fairly complete description of all possible Copeland and Kemeny (Saari and Merlin, 1996, 2000) outcomes that can occur as candidates are dropped, when comparing these methods with other procedures, etc.
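For three alternatives, the statement that the Borda outcome is the closest point of the transitivity plane can be sketched numerically. The coordinates used here are an assumption made for illustration: q = (x_AB, x_BC, x_CA), where x_XY is the normalized margin of X over Y, so that the transitivity plane is x_AB + x_BC + x_CA = 0 and the Euclidean projection simply subtracts the mean. The function names and the seven-voter profile are hypothetical.

```python
from fractions import Fraction

def pairwise_point(profile):
    """Normalized margins (x_AB, x_BC, x_CA) for a profile given as
    {ranking string, best first: number of voters}."""
    n = sum(profile.values())

    def margin(x, y):
        diff = sum(w if r.index(x) < r.index(y) else -w
                   for r, w in profile.items())
        return Fraction(diff, n)

    return (margin("A", "B"), margin("B", "C"), margin("C", "A"))

def project_to_transitivity_plane(q):
    """Closest point (Euclidean) on the plane x_AB + x_BC + x_CA = 0."""
    mean = sum(q) / 3
    return tuple(x - mean for x in q)

def borda_scores(profile):
    """Standard Borda tally with weights 2, 1, 0."""
    scores = {c: 0 for c in "ABC"}
    for ranking, weight in profile.items():
        for position, candidate in enumerate(ranking):
            scores[candidate] += weight * (2 - position)
    return scores

profile = {"ABC": 3, "BCA": 2, "CAB": 2}   # hypothetical seven-voter profile
q = pairwise_point(profile)                # (3/7, 3/7, 1/7): a pairwise cycle
p = project_to_transitivity_plane(q)       # (2/21, 2/21, -4/21)
b = borda_scores(profile)                  # A: 8, B: 7, C: 6
print(q, p, b)
```

In this example the pairwise outcome is the cycle A ≻ B, B ≻ C, C ≻ A, while the projected point has the transitive ranking A ≻ B ≻ C, matching the Borda scores; the projected coordinates equal the Borda score differences divided by 3n/2.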
As a flavor of newly discovered results, we now know that the Kemeny method always ranks the Borda winner above the Borda loser, and the Borda method always ranks the Kemeny winner above the Kemeny loser. (The paper (Saari and Merlin, 2000) extends in several ways many of the valued results about Kemeny’s method found by Le Breton and Truchon (1997) and by Young (1978, 1988).)

Expect surprises. For instance, the Kemeny method handles pairwise rankings by finding the “closest” transitive ranking, and the Dodgson method finds a “winner” by finding the “closest” ranking with a Condorcet winner, so it is reasonable to expect the Dodgson winner to be Kemeny top-ranked. But this need not be the case: using the geometry of the higher dimensional representation cube, Ratliff (2001) proved that the Dodgson winner can be ranked anywhere within a Kemeny ranking, even last. Ratliff (2002) also showed that no relationship exists between the Dodgson winner and the Borda ranking. By generalizing Dodgson’s method to select a committee (a committee of size k is found from q by finding the nearest ranking where each of the k candidates is preferred to all of the remaining n − k candidates), Ratliff (2002a) found more surprising conclusions. As a sample, different sized committees need not have anyone in common! Thus, the “best” committee of five selected by the Ratliff method need not include the “best” committee of two, or even the Dodgson winner.

As indicated later, the mysteries about Dodgson’s, Kemeny’s, Copeland’s, and Ratliff’s methods, differences between the Condorcet and Borda winners, agenda manipulation, Sen’s Minimal Liberalism Theorem, and all other pairwise voting phenomena are due to the same effect that explains the “paradox of voting”: decision rules can lose so much information that they interpret certain profile combinations as representing the binary preferences of non-existing voters with cyclic preferences. As described next, this is the source of Arrow’s Theorem.

3 Geometry of axioms

Since “axiomatic choice theory” explores the relationships of different properties for choice procedures, and since geometry displays relationships, we must expect that an appropriate use of geometry will help us understand different axiomatic consequences. To indicate how to do this, two different themes are described. The first considers Arrow’s Theorem, the second examines strategic behavior, monotonicity, and so forth.

3.1 Arrow’s Theorem

Arrow’s Theorem is explored in two ways. The first approach uses geometry to identify how to extend this classic result; the second argument uses geometry to show why binary independence has the unintended effect of forcing the decision rule to act as though the voters have cyclic preferences. A flavor is provided; details and precise proofs can be found in the references.

3.1.1 Standard argument

It is interesting how the geometry of the triangle captures Arrow’s requirement that the voters’ preferences and the societal ranking are transitive. In Fig. 8a, for instance, the two large right triangles with a vertical leg correspond to A ≻ B and B ≻ A rankings; the vertical leg represents A ∼ B. The dashed B ∼ C line defines two right triangles representing B ≻ C and C ≻ B; the dotted line describes the three {A, C} binary rankings. The intersections of different right triangles and lines define the appropriate transitive relationships; e.g., the intersection of the A ≻ B and B ≻ C triangles is strictly within the A ≻ C region as it should be. Likewise, the intersection of the A ≻ C triangle with the A ∼ B line lies in the B ≻ C region. This geometry is basic in recapturing Arrow’s seminal result.

[Figure 8: panel a, “Changing preferences,” a triangle with arrows for voter 1 and voter 2; panel b, “Cyclic voters?,” a cube whose vertices carry the six transitive type numbers together with PC and NC.]

Fig. 8. Geometry of Arrow’s Theorem

Arrow’s theorem admits rules that need not respect anonymity, so in this section a profile lists each voter’s transitive ranking of the alternatives; there are no restrictions on the choice of the rankings. For convenience of exposition, I use strict rankings where there is no indifference. The decision rule must satisfy the weak Pareto condition; i.e., when everyone has the same relative ranking of a pair, that is the pair’s societal ranking. The next condition, binary independence (or Independence of Irrelevant Alternatives), extends the Pareto assumption from its unanimity setting to require each pair’s societal ranking to be determined strictly by each voter’s relative ranking of the pair. So if p1 and p2 are any two profiles where each voter’s ranking of some pair, say {A, B}, is the same, then the {A, B} societal ranking is the same for both profiles. The societal outcome is to be transitive: ties are allowed.

By using geometry I indicate why we must expect Arrow’s conclusion that the only rule satisfying these conditions for three (or more) alternatives is a “dictator;” i.e., there is a specified voter whose preferences always agree with the societal ranking independent of the preferences of any other voter.

The main role of the Pareto condition is to ensure there are at least two rankings of each pair. This fact is used to prove there are choices of preferences where, for any specified pair, some voter can change the outcome. To show this, select profiles p1 and p2 with different societal rankings of the pair. Start with p1 and successively change voter preferences of this pair toward defining p2. (When changing the preferences, the rankings of pairs other than the specified one may change. However, binary independence ensures that only the voters’ pairwise rankings of this particular pair affect the pair’s ranking.) At some stage, the outcome has to change. Thus, there is a profile p0 and a voter so that when this voter changes his ranking of the pair, the pair’s societal ranking changes.

Do this for each pair and all possible starting arrangements and ways to determine who moves in what order. This construction defines all profiles and all voters (for each pair) where a change in the specified voter’s preferences changes the pair’s societal outcome. If the same person always changes the ranking for each pair in all settings, then it is easy to show (by using Pareto) that the person is a dictator. (All we really need is that the person is dominant in the sense that this person, and only this person, can change any pair’s societal ranking.) So suppose this is not the case.

Suppose no one person is dominant over all pairs. This means there are two pairs, say {A, B} and {B, C}, and two different agents, say 1 and 2, so that with specified associated arrangements of the other voter preferences, when 1 changes {A, B} preferences, the {A, B} societal outcome changes, and when 2 changes {B, C} preferences, the {B, C} societal outcome changes. As we will see, with binary independence we only need to know how all other voters rank these two pairs. Since all voters, other than 1 and 2, have assigned rankings for each of the two pairs, select for each of them a fixed transitive ranking that is consistent with this assignment.
For voter 1 to affect the {A, B} ranking, voter 2 might need to have a particular {A, B} ranking. Either choice, indicated by the two Fig. 8a dashed arrows, allows voter 2 to change {B, C} rankings while satisfying the specified {A, B} constraint. Similarly, the construction for voter 2 assigns a specific {B, C} ranking for voter 1: voter 1 varies preferences between the two rankings indicated by the appropriate choice of one of the solid Fig. 8a arrows. This construction establishes that each agent is free to vary preferences of his assigned pair while keeping a fixed {A, C} ranking and a fixed ranking of the “other pair” (that is chosen to be the one required to empower the other agent). Thanks to binary independence, each agent can change the societal ranking of the assigned pair independent of the other agent’s actions.

If the societal pairwise outcomes are strict rankings, choose the joint societal outcome defined by two agents to be A ≻ B and B ≻ C; according to Fig. 8a, this forces the A ≻ C conclusion. Similarly, the reversed C ≻ B and B ≻ A setting forces a C ≻ A outcome. But all voters have fixed {A, C} preferences, so changing the {A, C} outcome violates binary independence and proves Arrow’s result. If “indifference” is a societal outcome, a similar geometric argument generates a similar contradiction.

3.1.2 Extensions

The geometry identifies several immediate extensions. (See (Saari, 1991, 1995) for details.) For instance, the principal role of “Pareto” is to ensure there are at least two different societal outcomes for each pair, so it can be weakened. Wilson (1972) introduced a negative Pareto condition, but Pareto can be replaced with any condition that requires at least two societal outcomes for each pair. This much weaker condition can be satisfied, for instance, just by requiring one strict ranking and a complete tie as possible societal outcomes.

As another extension, strict preferences are not needed; the geometry shows that a negative conclusion will follow even with “indifference” in individual preferences as long as those settings where a voter changes a pair’s societal outcome do not require any other voter to be indifferent over the pair. As still another extension, notice that the argument only uses changes in societal pairwise rankings: there is nothing in carrying out the argument that requires the pair assigned to an agent to be the same pair being affected in the societal ranking. Thus, similar results occur when, say, an agent changing {B, C} preferences changes the societal {A, B} outcomes. Moreover, this geometric argument can be applied to utility functions to provide a simple proof of the Kalai, Muller, and Satterthwaite theorem (1979) about public goods.

3.2 Cyclic voters in Arrow’s framework?

A benign interpretation of Arrow’s (1952) and Sen’s (1970) theorems is developed in (Saari, 1998) and described in detail in (Saari, 2001a). As indicated in Sect. 2.3 and in (Saari and Sieberg, 2001), the basic argument is that whenever a decision rule is forced to concentrate on “parts” (e.g., binary rankings), as required by Arrow’s IIA condition, the rule ignores the connecting information that the binary rankings define a transitive one. Namely, the rule can misinterpret certain profile arrangements (i.e., each voter has a strict transitive ranking) as being cyclic data (i.e., each voter

has a strict ranking of each pair, but the pairwise rankings need not be transitive). As with the pairwise vote, it turns out that in a statistical sense, the axiomatic assumptions force the procedures to favor the preferences of the non-existent cyclic data. By understanding the source of the difficulties, resolutions and extensions follow.

Before outlining a geometric argument, a way to intuitively see how binary independence drops information about transitivity is to inform you that among Sibelius, Beethoven, and Mozart, I definitely prefer Sibelius to Beethoven. Are my preferences transitive? It is impossible to say because transitivity is a condition relating the pairwise rankings over all three pairs; the information I provided about one pair is insufficient to answer this question. Now notice that Arrow’s “binary independence” condition requires a procedure to use only information about each particular pair; in ranking a pair, a procedure is not permitted to use information about how the alternatives are related in other pairs, so it is not permitted to use the information that these preferences are transitive.

To provide a flavor of this argument in geometric terms (see Saari 2001a), I use Fig. 8b, which lists all eight rankings of three pairs; the vertex labels refer to the six transitive types while PC represents the “positive cycle” A ≻ B, B ≻ C, C ≻ A and NC represents the opposite cyclic preferences. Start from the earlier argument where voter 2’s change in {B,C} preferences changes the pair’s societal ranking. As voter 2 must have a particular {A, B} ranking (to allow voter 1 to change the {A, B} outcome), his rankings change on either the front or back face of Fig. 8b (to satisfy either A ≻ B or B ≻ A). To keep the same {A, C} ranking (to derive the conclusion) on the front face, voter 2’s preferences must vary either between the two vertices on the bottom edge (this voter has transitive preferences) or the two on the top (this voter has cyclic preferences).
A similar “edge” observation (to have a fixed {A, C} outcome) holds for the back face. Thanks to binary independence, the procedure is indifferent between the choices; i.e., the procedure is indifferent whether the voter has, or does not have, transitive preferences. The same argument applies to voter 1’s preferences on the side faces. The construction shows that there are four different choices (edges) for the two voters to vary their preferences; because of binary independence, the procedure cannot distinguish among them. But for three of the four possible choices, at least one voter has a cyclic ranking. Only one choice of edges defines a profile with transitive preferences for both voters.

As with pairwise voting, the source of Arrow’s disturbing conclusion is that by emphasizing “parts,” as required by binary independence, the decision rule inadvertently dismissed the valuable information that the voters

have rational preferences. The same argument applies to Sen’s important result: here Sen’s minimal liberalism condition drops the assumption of transitivity. (While I refer the reader to (Saari 2001a) for a discussion of Sen’s result, an interesting feature developed there proves that all examples illustrating Sen’s minimal liberalism assertion must be associated with a “unanimity set of cyclic rankings.” By using this observation, it now becomes trivial to construct all possible examples illustrating Sen’s theorem, even those involving as many interlinking societal cycles as desired. Also, see Saari and Petron (2004) for the interpretation that, rather than describing the rights of individuals, Sen’s result can be interpreted as a setting where individuals impose hardship on others. Even stronger, as we show, with each Sen cycle, each agent suffers from the actions of others.)

3.3 Monotonicity, strategic behavior, etc.

The remainder of this chapter returns to those voting rules based on positional rules, so, again, a profile is a listing of the number of voters with each strict, transitive ranking of the alternatives for personal preferences. I now consider those social choice puzzles where surprising behavior can occur when preferences change. This includes strategic behavior, monotonicity, the Brams and Fishburn (1983) “no-show” paradox where a voter does better by not voting, and so forth. Geometry provides considerable insight into these concerns, and it demonstrates that many of these issues can be analyzed and explained in a surprisingly similar manner.

3.3.1 Strategic action

To illustrate the approach (developed in Saari 1994, 1995), consider the Fig. 4a plurality vote setting where voter preferences come from only three types; this figure is reproduced as Fig. 9a where the bullet indicates a sincere election outcome slightly favoring A over B. Does any voter have both a strategic opportunity and the incentive to alter who is the plurality winner? To aid the discussion, the solid arrows in Fig. 9b indicate the directions a vote moves the outcome; the opposing dashed arrows indicate how the outcome changes if a voter of a particular type neglects to vote.

[Fig. 9. Finding strategic behavior. Panel a: the sincere plurality outcome (bullet) relative to the A ∼ B line. Panel b: the directions a vote moves the outcome: vote type 1 or 2, toward the A vertex; vote type 3 or 4, toward the C vertex; vote type 5 or 6, toward the B vertex.]

To be precise, a plurality vote for a candidate moves the outcome directly toward the candidate’s vertex. This geometry coincides with intuition; by receiving more votes, the outcome should move toward the candidate’s vertex. (The actual direction of a change depends on the election point. For instance, if the outcome is on T’s bottom edge, a vote for A moves the new outcome in a horizontal direction, but if the outcome is on T’s top slanted edge, a vote for A moves the outcome in a diagonal direction toward this vertex.)

Recall, the voters are restricted to types 2, 4, and 5. The analysis is simple; just check which options allow a voter to move the bullet in a desired direction. For instance, as a type 2 voter prefers A ≻ B, this voter’s best strategy is to vote sincerely. After all, by not voting as type 2, the outcome moves to the left (directly away from the A vertex) helping B. This voter has no incentive to vote for B, so the only option is to consider voting as though type 3 or 4; as this moves the outcome toward the C vertex, it offers no personal advantage. Similarly, a type 5 voter has no strategic options; not voting sincerely first moves the outcome downwards directly away from the B vertex followed by a strategic vote (given by one of the two other solid arrows); neither choice helps the type 5 voter.

While type 4 voters’ favored C cannot win, these voters can assist their second ranked B. Not voting sincerely moves the bullet to the northeast, somewhat parallel to the dashed slanted arrow.
By voting as though type 5 or 6, the outcome can move across the dashed line to elect B; whether this happens depends on the number of type 4 strategic voters and whether the sincere outcome is sufficiently close to the A ∼ B indifference line. As the geometry suggests, the Gibbard-Satterthwaite Theorem captures an interaction between allowable election outcomes and ways votes can change them: it is a “directional derivative” result. Using nontechnical

terms, two alternatives usually offer a voter only two ways to vote: one is sincere and the other is counterproductive. More alternatives, however, add ways votes can change the outcome: one is sincere, another counterproductive, and some of the remaining options could be strategic, for someone. As the geometry indicates, whether the other options are strategic depends on the interplay between the structure of a rule’s outcomes and the profiles.

To further illustrate this “interaction effect” with the Fig. 9a profile restriction, now consider the pairwise outcomes given by the dotted lines. Again, the arrows indicate how different votes change the outcomes. So by placing the bullet on either side of a dotted line, the geometry identifies all potential pairwise voting strategies. Notice, for instance, that the only way type 5 voters might change a type 2 outcome to a more favored type 4 is to vote as though type 4. Tracing the direction of arrows does not provide a conclusive answer, so the actual changes must be computed. (A simple approach is to make the changes in the Fig. 3a values.) By doing so, we find that the proposed change moves the outcome vertically; it cannot cross the boundary. More generally, it follows from this geometry that when profiles have only these three specified types, there are no successful pairwise voting strategies for any voter and any profile. (Rather than contradicting the Gibbard-Satterthwaite Theorem, this assertion means that profiles allowing a strategic action must involve other voter types.) The conclusion differs from the plurality vote because of a geometric difference in interaction effects: the vertical and horizontal directions of pairwise tied outcomes cannot be crossed with strategic changes.
Beyond providing insight about the important Gibbard (1973) – Satterthwaite (1975) Theorem, asserting that strategic settings exist, the geometry allows us to identify all such settings, to determine who can be strategic, and to determine all strategies. Indeed, it now is a simple exercise to completely characterize all three-alternative (actually, all n-alternative) profiles where a positional method can be manipulated. The formal approach involves nothing more than the dot product between vectors. (A simpler approach, developed after the original draft of this chapter was written, is described in (Saari 2003).)

Theorem 5 (Saari, 1994, 1995) For three candidates A, B, C, suppose the sincere outcome has A beating B. For a type j strategic voter to successfully elect B, the A ≻ B sincere outcome must be sufficiently close to a tie. Moreover, only a type four or six voter has a successful strategy by voting as though a type k voter (as specified next) where the choice depends on which w_s = (1, s, 0) system is being used.

Sincere type    s methods       Strategy    s methods    Strategy
     4          1/2 ≤ s < 1        5         s < 1/2       5, 6        (9)
     6          0 < s ≤ 1/2        5         s > 1/2       4, 5
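The entries of Eq. 9 can be checked with the dot product mentioned above. The following Python sketch is only an illustration under stated assumptions: the type numbering (1: A ≻ B ≻ C, 2: A ≻ C ≻ B, 3: C ≻ A ≻ B, 4: C ≻ B ≻ A, 5: B ≻ C ≻ A, 6: B ≻ A ≻ C) is inferred from the tallies used elsewhere in this chapter, the helper names are hypothetical, and the check looks only at how a single ballot moves the B-minus-A margin, ignoring whether C could win.

```python
from fractions import Fraction

# Assumed type numbering, inferred from the chapter's tallies.
TYPES = {1: "ABC", 2: "ACB", 3: "CAB", 4: "CBA", 5: "BCA", 6: "BAC"}

# Direction that measures "help B against A" in (A, B, C) tally space.
TOWARD_B_OVER_A = (-1, 1, 0)

def ballot_vector(type_id, s):
    """Points a w_s = (1, s, 0) ballot of this type gives to (A, B, C)."""
    pts = dict(zip(TYPES[type_id], (Fraction(1), Fraction(s), Fraction(0))))
    return (pts["A"], pts["B"], pts["C"])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def helpful_strategies(sincere_type, s):
    """Types whose ballot raises the B-minus-A margin more than the sincere one."""
    base = dot(ballot_vector(sincere_type, s), TOWARD_B_OVER_A)
    return sorted(
        t for t in TYPES
        if t != sincere_type
        and dot(ballot_vector(t, s), TOWARD_B_OVER_A) > base
    )

# Sample values of s on each side of s = 1/2 for the two sincere types in Eq. 9.
for sincere in (4, 6):
    for s in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
        print(sincere, s, helpful_strategies(sincere, s))
```

Under these assumptions the printed strategies agree with Eq. 9 at the sampled values of s, and a type 5 voter, whose sincere ballot already maximizes the margin, has no helpful alternative.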

Using this geometric approach, Merlin and Saari (1997) characterize all strategic opportunities for Copeland’s method and (Saari 2001) finds all three-alternative Approval Voting strategies. (This result contradicts assertions claiming that Approval Voting is “more sincere” than other procedures; see Chap. 4 of this book series and Sect. 4 of this chapter.) Saari and Merlin (2000) characterize all Kemeny method strategic opportunities and develop a measure (which applies to all procedures) to determine which strategies are “more effective.” For instance, in Eq. 9, is one of the two offered strategies more “effective” than the other?

The geometric structure of strategic behavior leads to other conclusions. For instance, since all procedures can be manipulated, it is reasonable to question which procedures are least likely to allow a successful manipulation by a small fraction of the voters. This question is answered for positional methods in Saari (1990) for any number of alternatives; the counterintuitive conclusion for three alternatives is reproduced in (Saari, 1994, 1995). While precise definitions are left to the references, think of the “level of susceptibility” as the number of profiles where a small number of voters can successfully manipulate the outcome; i.e., a positional procedure that allows fewer successful strategic opportunities is less susceptible.

Theorem 6 (Saari, 1990, 1995) The positional method w_s = (1, s, 0) that is least susceptible to a small, successful manipulation is the Borda Count. As the value of |s − 1/2| increases, so does the level of susceptibility of the positional method.

3.3.2 Monotonicity, no-show paradox, etc.

It is interesting how the same geometry explaining strategic behavior essentially explains other puzzles from choice theory. This is because any issue that involves two or more profiles requires examining how the directional changes of profiles affect outcomes. “Paradoxes” arise when directional changes create unexpected outcomes. An intuitive presentation is given here; the interested reader can find formal descriptions and results (for any number of alternatives) in (Saari, 1994, 1995). For a recent, insightful survey of some of these issues, see (Nurmi 2001, 2002).

Figure 10a represents the plurality and pairwise outcomes for profiles with x, y, z = 1 − (x + y) voters of, respectively, types 1, 5, and 3. The pairwise outcomes define the four strict triangular regions described by the dotted lines. The top, lower left, right, and central regions have the respective rankings of types 5, 3, 1 and the cyclic “A ≻ B, B ≻ C, C ≻ A.” For a runoff election, the bullet represents an A ≻ B ≻ C plurality election with cyclic pairwise outcome; thus A and B are advanced where A wins the runoff. But if a couple of type-five voters forget to vote (which moves the outcome directly away from the B vertex), the new outcome crosses the B ∼ C line to create the plurality outcome of A ≻ C ≻ B. Here, C beats A in the runoff. So, by not voting, these voters elect their middle, rather than bottom ranked, candidate. A slightly more general geometric argument shows that even after replacing the plurality vote with any w_s, a runoff still has “no-show” paradoxes. According to the geometry, the “no-show” paradox describes settings where the first step of the strategic behavior (not voting sincerely) already forces a desired outcome.

[Fig. 10. Other strange behavior. Panels a and b show the A ∼ B and B ∼ C lines and the directions in which votes of each type move the outcome.]
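A concrete instance of this runoff “no-show” behavior can be sketched in Python. The voter counts below are illustrative assumptions chosen to sit near the B ∼ C line; they are not taken from Fig. 10a. The function name is hypothetical, the types follow the numbering 1: A ≻ B ≻ C, 3: C ≻ A ≻ B, 5: B ≻ C ≻ A, and ties are assumed not to occur.

```python
# Rankings for the three voter types used in Fig. 10a (best candidate first).
RANKINGS = {1: "ABC", 3: "CAB", 5: "BCA"}

def plurality_runoff(profile):
    """Winner of a plurality runoff for a profile {type: number of voters}."""
    first_place = {c: 0 for c in "ABC"}
    for t, n in profile.items():
        first_place[RANKINGS[t][0]] += n
    # The two candidates with the most first-place votes advance.
    x, y = sorted(first_place, key=first_place.get, reverse=True)[:2]
    x_votes = sum(n for t, n in profile.items()
                  if RANKINGS[t].index(x) < RANKINGS[t].index(y))
    y_votes = sum(profile.values()) - x_votes
    return x if x_votes > y_votes else y

# Illustrative (assumed) voter counts.
sincere = {1: 40, 5: 31, 3: 29}   # plurality A > B > C, pairwise cycle
no_show = {1: 40, 5: 28, 3: 29}   # three type-5 voters stay home

print(plurality_runoff(sincere))  # A: advances with B and wins 69 to 31
print(plurality_runoff(no_show))  # C: advances with A and wins 57 to 40
```

With these assumed numbers, the three abstaining voters, whose ranking is B ≻ C ≻ A, replace their bottom-ranked A by their middle-ranked C.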

Earlier when comparing pairwise and plurality strategic actions, I showed that the geometric differences in lines of “tied election outcomes” for different voting rules can mean that they have different properties. Also, different rules can change the direction a vote change moves the outcome. These observations suggest that different rules experience different kinds of “no-show” effects: this is the case. For instance, Nurmi (2001) describes the strong no-show paradox as where, by not voting, voters elect their top-ranked candidate. Described in terms of “positive involvement,” the following result is proved in (Saari 1994, 1995).

Theorem 7 All three-candidate (1, s, 0)-runoff methods admit the “no-show” paradox. With the exception of the plurality vote (s = 0), all other w_s-runoff procedures allow the strong no-show paradox.

Monotonicity and other conditions are examined by checking whether profile changes cross unexpected boundaries. For instance (Saari, 1995), all w_s runoffs violate monotonicity. As a final illustration of how to use geometry for these topics, the shaded area of Fig. 11 depicts all profiles that elect A with a plurality runoff when using the Fig. 10a type preferences. The upper left region consists of the profiles where A wins because the profiles define cyclic pairwise outcomes where A beats B; the region to the right is where A wins because she is the Condorcet winner. The “corner” cut out of this region generates other unexpected election phenomena.

[Fig. 11. Other behavior. The shaded region of profiles electing A, the B ∼ C line, and two bullets marking the subcommittee profiles.]

To explain, suppose the profiles of two subcommittees using a plurality runoff to select a candidate are given by the two bullets in Fig. 11. As both bullets are in the shaded region, both subcommittees elect A. The profile for the full committee, created when the two subcommittees join together, is some point on the line connecting these two bullets; e.g., if the subcommittees are of the same size, the profile for the full committee is the midpoint of this connecting line. According to the graph, this creates a situation where each of the two subcommittees elects A, but when they get together and use the same procedure, the full committee elects C.

A central point in many of these constructions is that procedures that involve more than one set of candidates create additional boundaries. But the extra boundaries allow the method to suffer more sorts of paradoxes. Many of them are characterized in (Saari, 1995). The types of behavior allowed by Copeland’s and Kemeny’s methods are fully described, respectively, in (Merlin and Saari, 1997) and (Saari and Merlin, 2000a).
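The subcommittee phenomenon can be reproduced numerically. In the following Python sketch the two subcommittee profiles are illustrative assumptions, one with cyclic pairwise outcomes and one where A is the Condorcet winner; they are not the bullets of Fig. 11. The function name is hypothetical, the voter types are those of Fig. 10a (1: A ≻ B ≻ C, 5: B ≻ C ≻ A, 3: C ≻ A ≻ B), and ties are assumed not to occur.

```python
# Voter types of Fig. 10a, best candidate first.
RANKINGS = {1: "ABC", 5: "BCA", 3: "CAB"}

def runoff_winner(profile):
    """Plurality-runoff winner for a profile {type: number of voters}."""
    firsts = {c: 0 for c in "ABC"}
    for t, n in profile.items():
        firsts[RANKINGS[t][0]] += n
    finalists = sorted(firsts, key=firsts.get, reverse=True)[:2]
    support = {c: 0 for c in finalists}
    for t, n in profile.items():
        # Each voter backs whichever finalist is ranked higher.
        preferred = min(finalists, key=RANKINGS[t].index)
        support[preferred] += n
    return max(support, key=support.get)

def merge(p, q):
    """Profile of the full committee: add the voters of each type."""
    return {t: p.get(t, 0) + q.get(t, 0) for t in RANKINGS}

# Illustrative (assumed) subcommittee profiles of equal size.
sub1 = {1: 40, 5: 35, 3: 25}   # cyclic pairwise outcomes; A beats B in the runoff
sub2 = {1: 55, 5: 5, 3: 40}    # A is the Condorcet winner

print(runoff_winner(sub1), runoff_winner(sub2), runoff_winner(merge(sub1, sub2)))
```

With these assumed numbers each subcommittee elects A, while in the joint profile B is eliminated and C beats A by 105 to 95.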

4 Plotting all election outcomes

Wouldn’t it be nice to be able to take a profile and then find all possible positional and Approval Voting outcomes? This is possible by using a rather elementary, simple geometric construction. While the construction is simple, the conclusions can be surprising.

Beyond taking a profile and finding all outcomes, a challenging difficulty in social choice is to construct a profile with a desired behavior. For instance, how would one construct a profile with an A ≻ B ≻ C Borda outcome while the plurality ranking is the opposite C ≻ B ≻ A? If we could find all ways to construct such profiles, we would have a valuable tool that could describe variations in election outcomes. This approach, described in this section, turns out to be the converse of the way all election outcomes are found.

a. Procedure line; AV hull b. Finding relationships

Fig. 12. Procedure lines

4.1 Finding all positional and AV outcomes

The Fig. 1a tallies always define a linear equation, or a straight line. To describe this line for a (A, B, C) profile, use the fraction of the total vote received by each candidate. This normalized tally can be plotted on the simplex {(x, y, z) | x + y + z = 1, x, y, z ≥ 0}. To geometrically find all w_s outcomes, just plot the profile’s normalized plurality (s = 0) and antiplurality (s = 1) outcomes and then connect them with a straight line; this is the procedure line (Saari, 1995, 2001). The profile’s w_s normalized tally is 2s/(1 + s) of the way from the plurality to the antiplurality endpoint. To illustrate, profile (6, 20, 0, 10, 17, 7) has the tallies:

plurality tally = (26, 24, 10), antiplurality tally = (33, 40, 47). (10)

Connecting (26/60, 24/60, 10/60) and (33/120, 40/120, 47/120) creates the Fig. 12a procedure

line.⁹ According to the positioning of this line, this profile allows seven different election rankings (four are strict, three involve ties) with different w_s methods; indeed, each candidate “wins” by using an appropriate w_s.

To find all AV outcomes, recall that a voter can vote for one or two candidates, so A can receive the eight different tallies between 26 and 33 from Eq. 10. With the 17 and 38 different tallies allowed for B and C, respectively, this single profile generates (8)(17)(38) = 5168 different AV normalized tallies. The actual AV tally depends on whatever personal forces motivate voters to vote for one or two candidates. Consequently, it is not difficult to create reasonable scenarios to rationalize each of these thousands of points as the “natural outcome.”

To avoid plotting thousands of points, we only need to plot the eight normalized tallies formed by combining the Eq. 10 values. For instance, the (26, 40, 10) tally (where only B receives both first and second place votes) requires plotting (26/76, 40/76, 10/76). The set of AV outcomes is the convex hull of these vertices; it is the Fig. 12a shaded region. Notice: while this profile allows seven positional outcomes (counting ties) by using different w_s, AV admits all 13 ways to rank candidates. (This behavior where the AV outcomes include all possible outcomes is not unusual (Saari and Van Newenhizen, 1988).)

For immediate results from this geometry, since the endpoints of the procedure line are included when defining the AV hull, we have that the admissible AV outcomes for a profile always include all positional outcomes as special cases. Consequently, any problem or questionable outcome with any positional method is a potential AV problem.
Also the positional outcomes form a one-dimensional line segment in the two-dimensional AV hull, so it follows that with any natural probability or statistical assumption on voter preferences, it is highly unlikely for the AV outcome to agree with a positional outcome. The following illustrates another result that follows from this geometry: while described for three alternatives, it holds for any number of them.
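The numbers quoted for the profile (6, 20, 0, 10, 17, 7) can be recomputed with a short Python sketch. It assumes the type numbering 1: A ≻ B ≻ C, 2: A ≻ C ≻ B, 3: C ≻ A ≻ B, 4: C ≻ B ≻ A, 5: B ≻ C ≻ A, 6: B ≻ A ≻ C, which reproduces the Eq. 10 tallies; all function names are hypothetical.

```python
from fractions import Fraction

PROFILE = (6, 20, 0, 10, 17, 7)  # voters of types 1..6 (assumed numbering)

def first_and_second(profile):
    """First-place and second-place vote counts for (A, B, C)."""
    n1, n2, n3, n4, n5, n6 = profile
    return (n1 + n2, n5 + n6, n3 + n4), (n3 + n6, n1 + n4, n2 + n5)

def normalized_tally(profile, s):
    """Normalized w_s = (1, s, 0) tally as exact fractions."""
    s = Fraction(s)
    first, second = first_and_second(profile)
    total = (1 + s) * sum(profile)
    return tuple((f + s * g) / total for f, g in zip(first, second))

def point_on_line(profile, s):
    """Point 2s/(1+s) of the way from the plurality to the antiplurality endpoint."""
    t = 2 * Fraction(s) / (1 + Fraction(s))
    p, a = normalized_tally(profile, 0), normalized_tally(profile, 1)
    return tuple((1 - t) * pi + t * ai for pi, ai in zip(p, a))

def ranking(tally):
    """Ranking such as 'B>A~C', where '~' marks a tie."""
    order = sorted(zip("ABC", tally), key=lambda cv: -cv[1])
    text = order[0][0]
    for (_, prev), (cand, val) in zip(order, order[1:]):
        text += ("~" if val == prev else ">") + cand
    return text

def all_rankings(profile):
    """Every w_s ranking for 0 <= s <= 1, found from the exact tie values of s."""
    first, second = first_and_second(profile)
    cuts = {Fraction(0), Fraction(1)}
    for i in range(3):
        for j in range(i + 1, 3):
            dg = second[i] - second[j]
            if dg != 0:
                s = Fraction(first[j] - first[i], dg)
                if 0 <= s <= 1:
                    cuts.add(s)
    cuts = sorted(cuts)
    samples = list(cuts) + [(lo + hi) / 2 for lo, hi in zip(cuts, cuts[1:])]
    found = []
    for s in sorted(samples):
        r = ranking(normalized_tally(profile, s))
        if r not in found:
            found.append(r)
    return found

def av_tally_count(profile):
    """Number of distinct AV tallies: each second-place vote is given or withheld."""
    count = 1
    for g in first_and_second(profile)[1]:
        count *= g + 1
    return count

print(all_rankings(PROFILE))    # seven rankings, three of them with ties
print(av_tally_count(PROFILE))  # 5168
```

The ties occur at s = 2/9, 8/15, and 2/3, so the rankings run from A ≻ B ≻ C at the plurality endpoint through B-winning rankings to C ≻ B ≻ A at the antiplurality endpoint.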

Theorem 8 If p is a profile where some w_s method allows a voter to be strategic, then p also offers AV strategic opportunities. On the other hand, there are large sets of profiles that admit AV strategic settings for some voter, but no strategic opportunities for any positional method.

The reason AV is more prone to strategic action than any positional method follows from the geometry. As the procedure line always is a subset of the AV hull, if any positional outcome is near a tie allowing a strategic opportunity, then the AV hull also has these outcomes. However, the AV hull can have many strategic settings near a tie even though no positional method does. (See Saari, 1995, 2001.)

For a practical application of procedure lines, since W. Clinton won the 1992 US Presidential election with less than a majority vote, it is reasonable to question whether one of his opponents, Bush or Perot, could have won with a different election method. By using the procedure line, Tabarrok (2001) showed that Clinton’s support was more solid than previously believed: Clinton would have been victorious with any positional method. (Tabarrok shows that with AV, however, any candidate, whether Bush, Clinton, or Perot, could have won depending on whether and which voters voted for one or two candidates.) Tabarrok then argues that the procedure line provides a useful tool to analyze whether a candidate is a solid choice of the voters. Elsewhere, Tabarrok and Spector (1999) used the higher dimensional version of this approach (Saari, 1992, 2001) to analyze the important 1860 US presidential election that precipitated the Civil War.

The 1992 election provides an interesting setting to discuss AV outcomes. Start with the obvious fact that it is strategically irrational for an AV voter to vote for his second place candidate if he believes that this candidate could beat his top choice. For instance, a voter with Clinton ≻ Bush ≻ Perot preferences would not vote for Clinton and Bush as it would cancel any impact on the {Clinton, Bush} outcome. Thus, it is reasonable that only AV voters with Perot second ranked would vote for two candidates. If enough had done so, we would have had President Perot. As this example illustrates, Approval Voting has troubling properties.

⁹ To plot (x, y, z) on a (u, v) plane with a triangle bottom edge [0,1], use (u, v) = ((1/2)(1 + y − x), (√3/2) z).

4.2 The converse; finding election relationships

Rather than using a profile to find the procedure line, the converse approach is to start with a line segment and determine whether it is the procedure line for some profile. The conditions the line segment must satisfy are surprisingly minor (Saari 1995, 2001). Other than having fractional values, each component of the normalized antiplurality vector must be at least half the corresponding component for the normalized plurality tally, and bounded above by 1/2. The last condition is that for any two candidates, the sum over these two candidates of twice the antiplurality value minus the plurality value is at least as large as the third candidate’s normalized plurality value. While this last condition is awkward to describe and understand, it allows just about any line segment reasonably near the indifference point to be

the procedure line for some profile. This is illustrated by the three choices given in Fig. 12b, which show that:

1. The bullet represents a degenerate line where both endpoints are placed at the same position. This positioning means that there exist profiles where all normalized w_s tallies are identical.

2. The line to the upper left passes through seven ranking regions (three involve ties); in contrast with an earlier example, this line proves that a profile can have seven different w_s outcomes where each candidate is bottom-ranked with some method.

3. The line passing through the complete indifference point shows that if one w_s outcome is a complete tie, then there are two other rankings and they reverse each other (because they are in diametrically opposite ranking regions).
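The conditions that a line segment must satisfy to be a procedure line, stated before this list, are easy to test mechanically. The Python sketch below is a direct transcription of those conditions as worded in this section, not the formulation of the cited references; the function name is hypothetical, and the endpoints are given as exact fractions for (A, B, C).

```python
from fractions import Fraction as F

def could_be_procedure_line(plurality, antiplurality):
    """Check the stated endpoint conditions on normalized tallies."""
    p, a = plurality, antiplurality
    # Both endpoints are normalized tallies with nonnegative entries.
    if sum(p) != 1 or sum(a) != 1 or min(p) < 0 or min(a) < 0:
        return False
    # Each antiplurality component is at least half the plurality
    # component and at most 1/2.
    if any(not (pj / 2 <= aj <= F(1, 2)) for pj, aj in zip(p, a)):
        return False
    # For any two candidates, the sum of (twice antiplurality minus
    # plurality) is at least the third candidate's plurality value.
    gap = [2 * aj - pj for pj, aj in zip(p, a)]
    for k in range(3):
        i, j = (m for m in range(3) if m != k)
        if gap[i] + gap[j] < p[k]:
            return False
    return True

# Endpoints of the Fig. 12a procedure line (Eq. 10, normalized).
fig12a = ((F(26, 60), F(24, 60), F(10, 60)),
          (F(33, 120), F(40, 120), F(47, 120)))
# Degenerate line: both endpoints at the complete indifference point.
center = (F(1, 3), F(1, 3), F(1, 3))

print(could_be_procedure_line(*fig12a))           # True
print(could_be_procedure_line(center, center))    # True
print(could_be_procedure_line((F(1), F(0), F(0)),
                              (F(1), F(0), F(0))))  # False: 1 exceeds 1/2
```

The last call fails because an antiplurality component of 1 exceeds the bound of 1/2: a candidate ranked first by every voter receives only half of the antiplurality points.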

Other results are in (Saari, 1995, 2001).

Finding a supporting profile. While drawing procedure lines illustrates different admissible behaviors, we may want an actual profile that illustrates a particular behavior. Let me illustrate how to do this by creating an example where each candidate can win with some positional method. A way to draw an associated procedure line is to have the plurality endpoint, with an A ≻ B ≻ C ranking, near the A ∼ B line where the C value is small, and the antiplurality endpoint, with a C ≻ B ≻ A outcome, near the C ∼ B line where the A value is small. According to the geometry (see Fig. 1), the line will pass through regions where B wins.

An example of such points has (1/2, 2/5, 1/10) as the plurality endpoint and (1/4, 7/20, 2/5) as the antiplurality endpoint. (Notice that the coordinates satisfy the earlier conditions; e.g., 1/4 ≥ (1/2)(1/2), etc.) To create a profile, first find a common denominator n for the plurality endpoint where 2n is a common denominator for the antiplurality endpoint; n will be the total number of voters. In the example, n = 10 suffices. Multiply the plurality and antiplurality points, respectively, by n and 2n to get integer outcomes. Here we have (5, 4, 1) and (5, 7, 8).

Finding the profile involves using the reverse of the computational method described with Fig. 1. (The reader should consult Fig. 1b while reading about this construction.) For example, 5 points must be distributed between regions 1 and 2, 4 points between regions 5 and 6, and 1 point in either region 3 or 4. These points must be distributed in a manner so that each candidate receives the correct antiplurality vote. The difference

between a candidate’s antiplurality and plurality tallies is the number of “second-place” votes; they come from voters who have another candidate top-ranked. Thus, for A, 5 − 5 = 0 points must be distributed between regions 3 and 6, for B, 7 − 4 = 3 points must be distributed between regions 1 and 4, and for C, 8 − 1 = 7 points must be distributed between regions 2 and 5.

This last condition for A means that there are zero points in regions 3 and 6, so all 4 plurality votes for B must be in region 5 and the single plurality vote for C must be in region 4. All that remains is to determine how to divide the 5 plurality votes for A. This also is easy: 3 points need to be distributed between regions 1 and 4, and region 4 already has 1 point, so region 1 must have the other 2. This leaves the remaining 3 points for region 2. Thus the ten-voter supporting profile for this procedure line is (2, 3, 0, 1, 4, 0).

More candidates. The same approach holds for any number of candidates. As a sample of possible results, as the Borda Count vector always is the midpoint of all positional methods, it follows from the geometry that if A is not the Borda winner for a profile, then A is not the top ranked candidate for most positional methods. To provide the flavor of other results, there exist 10-candidate profiles that allow millions of different election rankings; the data (profile) remains the same, the different rankings result by changing the positional method (Saari, 1992).

In a different direction, the procedure line plays a central role in a mathematical technique developed by Saari and Tataru (1999) to find the likelihood of certain events. The idea is that the position of the procedure line’s endpoints determines the number of election outcomes that can occur. Thus, algebraic conditions defined by the location of these endpoints characterize all profiles with a specific number of election outcomes. By knowing the description of the profile subset, the likelihood can be determined.
For instance, using central limit theorem arguments, we showed that the likelihood a profile will have more than one w_s ranking is the surprisingly large 0.69. This Saari-Tataru technique has subsequently been used by Merlin, Tataru, and Valognes (2000, 2001) to determine the likelihood the winner will change with the procedure and the likelihood of Condorcet profiles. Other papers using the Saari-Tataru approach are Lepelley and Merlin (2001) and Tataru and Merlin (1997).
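The supporting-profile construction described earlier in this section can be automated. The following Python sketch enumerates every profile with given integer plurality and antiplurality tallies; it assumes the region numbering used in that construction (A is top-ranked in regions 1 and 2, B in 5 and 6, C in 3 and 4; A is second-ranked in regions 3 and 6, B in 1 and 4, C in 2 and 5), and the function name is hypothetical.

```python
def supporting_profiles(plurality, antiplurality):
    """All profiles (n1, ..., n6) with the given integer tallies for (A, B, C)."""
    p_a, p_b, p_c = plurality
    # Second-place votes: antiplurality minus plurality tally.
    s_a, s_b, s_c = (a - p for a, p in zip(antiplurality, plurality))
    found = []
    for n1 in range(p_a + 1):
        n2 = p_a - n1   # A top-ranked: regions 1 and 2
        n4 = s_b - n1   # B second-ranked: regions 1 and 4
        n3 = p_c - n4   # C top-ranked: regions 3 and 4
        n6 = s_a - n3   # A second-ranked: regions 3 and 6
        n5 = p_b - n6   # B top-ranked: regions 5 and 6
        profile = (n1, n2, n3, n4, n5, n6)
        # C second-ranked (regions 2 and 5) must also come out right.
        if min(profile) >= 0 and n2 + n5 == s_c:
            found.append(profile)
    return found

# The example worked out in the text: tallies (5, 4, 1) and (5, 7, 8).
print(supporting_profiles((5, 4, 1), (5, 7, 8)))   # [(2, 3, 0, 1, 4, 0)]

# The Eq. 10 tallies admit several profiles, including (6, 20, 0, 10, 17, 7).
print(len(supporting_profiles((26, 24, 10), (33, 40, 47))))   # 8
```

The example from the text has a unique supporting profile because A receives no second-place votes; in general one degree of freedom remains, which is why the Eq. 10 tallies are shared by eight different profiles.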

5 Finding symmetries – and profile decompositions

Of particular importance, the geometry of voting suggests how to create a profile decomposition. By this I mean that a profile is divided into the different components that cause differences in pairwise and positional rankings and tallies. The value of this decomposition is that it completely identifies which profile components create all pairwise paradoxes and conflict, which components cause all differences when candidates are dropped, and which components cause all differences in outcomes with changes in the positional election procedure. Consequently, this development allows us to determine all possible single profile paradoxes, create illustrating examples for any paradox, and determine all possible paradoxical behaviors associated with a given profile. Only basic notions are introduced here: the reader is referred to the references (Saari, 1999, 2000, 2000a, 2001a) for more information. To construct a decomposition that emphasizes a particular procedure, see (Saari, 2002).

For expositional reasons, I limit this discussion to three alternatives, but everything extends to any number of alternatives. To start, notice that the two basic symmetries of the triangle used in the geometric profile representation involve a 120° and a 180° rotation about the indifference point. As an example of a 120° rotation, which I call the Condorcet symmetry, start with A ≻ B ≻ C and keep moving 120° in the triangle to add the B ≻ C ≻ A and C ≻ A ≻ B rankings: this defines a Condorcet triplet. Because each candidate is listed in first, second, and third place once, this profile has no effect on positional outcomes: indeed, it is arguable that the outcome should be a complete tie because no candidate has an advantage with their symmetric positioning. Yet, as explored earlier in this chapter (and by others, most notably Sen (1966)), this Condorcet triplet generates pairwise cycles. (Also see Zwicker, 1991.)
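Both claims about the Condorcet triplet, a complete positional tie together with a pairwise cycle, can be confirmed with a few lines of Python. This is only an illustrative sketch: the triplet is taken to be A ≻ B ≻ C, B ≻ C ≻ A, C ≻ A ≻ B with one voter each, and the function names are hypothetical.

```python
from fractions import Fraction

# Condorcet triplet, one voter per ranking (best candidate first).
TRIPLET = ["ABC", "BCA", "CAB"]

def positional_tally(rankings, s):
    """w_s = (1, s, 0) tally for a list of rankings."""
    points = (Fraction(1), Fraction(s), Fraction(0))
    tally = {c: Fraction(0) for c in "ABC"}
    for r in rankings:
        for cand, pts in zip(r, points):
            tally[cand] += pts
    return tally

def pairwise(rankings, x, y):
    """(votes for x, votes for y) in the {x, y} pairwise election."""
    x_votes = sum(1 for r in rankings if r.index(x) < r.index(y))
    return x_votes, len(rankings) - x_votes

for s in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(s, positional_tally(TRIPLET, s))   # every candidate receives 1 + s

print(pairwise(TRIPLET, "A", "B"))  # (2, 1): A beats B
print(pairwise(TRIPLET, "B", "C"))  # (2, 1): B beats C
print(pairwise(TRIPLET, "C", "A"))  # (2, 1): C beats A
```

Adding any multiple of this triplet to a profile therefore leaves every w_s tally difference unchanged while shifting all three pairwise tallies in the cyclic direction.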
As described earlier in this chapter, the unexpected explanation for the cycles is that the pairwise vote treats this Condorcet profile as representing the views of non-existent voters with cyclic binary rankings rather than the actual transitive preferences.

To illustrate a 180° rotation (what I call the reversal symmetry), start with a ranking, say A ≻ B ≻ C, and then add the reversed ranking in the triangle, C ≻ B ≻ A. As with the Condorcet triplet, the reversal nature of this configuration of directly opposing preferences suggests that it should create a complete tie. Indeed, pairwise voting respects this symmetry: it does require a completely tied outcome. With the sole exception of the

41 Borda Count, however, positional methods ignore this symmetry: this turns out to be the cause all of all possible positional differences. Indeed, the ws 1 outcome of (1, 2s, 1) means that the s < 2 procedures favor A and C over 1 B while the s > 2 methods favor B over A and C. These two symmetries may not seem to be particularly interesting, so it is surprising that they completely explain all three-alternative paradoxes with pairwise and positional voting; namely, these symmetries explain all of the difficulties and basic properties of all procedures using these out- comes. Consequently, any axiomatic representations characterizing differ- ences among procedures (e.g., Kemeny, Copeland, different positional meth- ods, etc.) must manifest these symmetries. How to use these symmetries to create profiles illustrating any possible paradox is described in (Saari, 1995); that they are the total explanation for three alternatives is described in (Saari, 1999, 2001a). The n alternatives symmetries are discussed in (Saari, 2000, 2000a). A way to demonstrate these symmetries is to create a “paradoxical” example where the Condorcet winner is A with pairwise ranking A B C, the Borda winner is B with ranking B A C and C wins with the plurality ranking of C B A. 0 + 0s C C z + y C ...... 0 ...... 0 2x ...... z . y ...... x ...... xx ...... 4 ...... 4 ...... x ...... y ...... 1. 3 ...... z ...... AB...... AB...... AB...... 1 + 3s 1 3 3 + s 2x x y + 2zs z + 2ys a. Initial profile b. Condorcet term c. Reversal terms Fig. 13. Constructing paradoxes The starting Fig. 13a profile has the specified B A C as the Borda and pairwise ranking. To change the pairwise rankings without affecting any positional outcomes, add the Fig. 13b Condorcet terms. The desired A B C pairwise rankings require x to satisfy the inequalities (add the Figs. 13a and 13b {A, B} and {A, C} pairwise tallies)

$$1 + 2x > 3 + x, \qquad 4 + x > 0 + 2x;$$
this is satisfied by $x = 3$. Similarly, to change the plurality outcome without affecting pairwise or Borda rankings, add the reversal terms of Fig. 13c; the sum of the plurality tallies ($s = 0$) of Figs. 13a and 13c shows that the desired $C \succ B \succ A$ plurality outcome requires $y$ and $z$ to satisfy
$$z + y > 3 + z, \qquad 3 + z > 1 + y.$$
One solution is $y = 4$, $z = 3$. Adding the three components, profile (8, 0, 6, 4, 3, 6) has the desired properties.

The following is a sample of new conclusions that follow from this characterization.

1. It is well known that the Borda Count ranks the Condorcet winner above the Condorcet loser. A new result is that the converse also is true; transitive pairwise rankings always rank the Borda winner above the Borda loser (Saari, 2000). Similar results using the decomposition show that the Borda Count strictly ranks the Kemeny winner above the Kemeny loser, and the Kemeny method strictly ranks the Borda winner above the Borda loser (Saari and Merlin, 2000a; Saari, 2000).

2. All pairwise ranking abnormalities, including Arrow’s and Sen’s seminal theorems, cycles, agenda manipulation, problems with tournaments, etc., are caused by the Condorcet component (Saari, 2000, 2001a). For instance, Arrow’s and Sen’s negative conclusions do not hold with profiles that do not have any components in Condorcet directions.

3. Any difference between the Condorcet and Borda winners is strictly due to the Condorcet component. Such differences occur because the pairwise vote misinterprets Condorcet collections of voter preferences as representing non-existent voters with cyclic preferences. Notice how this argument casts doubt on the validity of the Condorcet winner and Condorcet principle as standards for election outcomes. (Incidentally, with a natural probability distribution over profiles, when a Condorcet winner exists, it is more likely for the Condorcet winner to agree with the Borda winner than to disagree (Saari, 1999, 2000, 2001b).)

4. The profile decomposition is unique (and defined by a matrix operation). For three alternatives, graphs can be drawn displaying the precise profiles that create all differences in election outcomes (Saari, 1999, 2002).

5. Any difference in a Borda ranking of n candidates caused by adding or dropping a candidate is strictly due to the Condorcet term. Differences in any other positional method are caused by the Condorcet term and by other profile symmetries.

6. As the decomposition describes symmetry aspects of profile information along with all paradoxes, it demonstrates that all of these paradoxes occur because of how different procedures use (or misuse) information from profiles.

7. By characterizing the properties of all profiles, it follows that the various axiomatic characterizations describing differences among these procedures reflect properties of the decomposition.
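
Returning to the Fig. 13 construction, the claimed properties of the profile (8, 0, 6, 4, 3, 6) can be verified numerically. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, and the ordering of the six ranking types in the profile vector (ABC, ACB, CAB, CBA, BCA, BAC) is inferred here from summing the three components rather than quoted from the text.

```python
from fractions import Fraction

# Assumed ordering of the six ranking types in a profile vector
# (inferred from the components of the Fig. 13 example).
ORDER = ("ABC", "ACB", "CAB", "CBA", "BCA", "BAC")


def positional_tally(counts, s):
    """Tally with weights (1, s, 0) for first, second, and third place."""
    weights = (1, s, 0)
    tally = {candidate: 0 for candidate in "ABC"}
    for ranking, voters in zip(ORDER, counts):
        for place, candidate in enumerate(ranking):
            tally[candidate] += voters * weights[place]
    return tally


def positional_ranking(counts, s):
    """Candidates sorted from highest to lowest tally (no ties arise here)."""
    tally = positional_tally(counts, s)
    return "".join(sorted(tally, key=tally.get, reverse=True))


def pairwise_tally(counts, first, second):
    """Return (voters ranking `first` above `second`, voters ranking the reverse)."""
    above = sum(n for r, n in zip(ORDER, counts) if r.index(first) < r.index(second))
    return above, sum(counts) - above


x, y, z = 3, 4, 3
initial = (1, 0, 0, 0, 0, 3)    # Fig. 13a: one ABC voter and three BAC voters
condorcet = (x, 0, x, 0, x, 0)  # Fig. 13b: x voters each of ABC, CAB, BCA
reversal = (y, 0, z, y, 0, z)   # Fig. 13c: y each of ABC, CBA; z each of CAB, BAC
profile = tuple(a + b + c for a, b, c in zip(initial, condorcet, reversal))

print("profile:", profile)
print("plurality ranking:", positional_ranking(profile, 0))
print("Borda ranking:", positional_ranking(profile, Fraction(1, 2)))
for first, second in (("A", "B"), ("A", "C"), ("B", "C")):
    print(first, "vs", second, pairwise_tally(profile, first, second))
```

With these choices the plurality ranking places C first, the Borda ranking places B first, and A wins both of its pairwise contests. The same helper applied to a single reversal pair, `positional_tally((1, 0, 0, 1, 0, 0), s)`, gives the tallies 1, 2s, and 1 for A, B, and C.
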

6 Summary

This brief survey is intended to make the geometry of voting more intuitive by emphasizing approaches that, once described, should be transparent and applicable elsewhere, and to offer a sample of new results and techniques. My hope is that these results will suggest to the reader new ways to analyze other choice rules and problems.

To suggest what else is possible, by combining geometry with other techniques such as concepts from dynamical “chaos” (Saari, 1995a), it becomes possible to list all possible paradoxical positional election outcomes that can occur with any number of candidates, any combination of positional procedures (for the different subsets of candidates), and any profile (Saari, 1989, 1995a). In other words, all possible election paradoxes now are known for all positional and pairwise methods, and, by extension, for all voting rules that use them. These conclusions identify a surprisingly large number of paradoxes.

Conversely, if specified voting methods never admit certain kinds of election rankings, then the missing listings identify properties that these methods satisfy; e.g., the Borda Count never ranks a Condorcet loser over a Condorcet winner. It is easy to convert properties specific to a rule into an “axiomatic characterization.” Because the Borda Count admits, by far, the smallest number and kinds of lists of election rankings over the different subsets of candidates (Saari, 1989, 1990a), and so enjoys the most “properties,” it now is possible to generate all sorts of new “axiomatic characterizations” for the Borda Count (Saari, 1990a, 1995a).

This comment, suggesting how to find new “axiomatic” representations, leads to my concluding point. While the geometry of voting already has provided many new and different insights into several choice theory issues, this approach remains at an early stage of development: it can and should be extended in other directions.

References

[1] Anscombe, G.E.M., 1976, On the frustration of the majority by fulfillment of the majority’s will, Analysis 36, 161-168.

[2] Arrow, K.J. [1952] 1963. Social Choice and Individual Values 2nd. ed., Wiley, New York.

[3] Black, D., 1958, The Theory of Committees and Elections, Cambridge University Press, London, New York.

[4] Brams, S., D. Kilgour and W. Zwicker, 1998, The paradox of multiple elections, Social Choice & Welfare 15, 211-236.

[5] Fishburn, P. and S. Brams, 1983, Paradoxes of preferential voting, Mathematics Magazine 56, 207-214.

[6] Gibbard, A., 1973, Manipulation of voting schemes: a general result, Econometrica 41, 587-601.

[7] Le Breton, M. and M. Truchon, 1997, A Borda measure for social choice functions, Mathematical Social Sciences 34, 249-272.

[8] Kalai, E., E. Muller and M. Satterthwaite, 1979, Social welfare functions when preferences are convex, strictly monotonic, and continuous, Public Choice 34, 87-97.

[9] Lepelley, D. and V. Merlin, 2001, Scoring runoff paradoxes for variable electorates, Economic Theory 17, 53-80.

[10] List, C. and P. Pettit, 2002, Aggregating sets of judgments: an impossibility result, Economics and Philosophy 18, 89-110.

[11] Merlin, V. and D. G. Saari, 1997, Copeland method II: manipulation, monotonicity, and paradoxes, Jour Econ Theory 72, 148-172.

[12] Merlin, V., M. Tataru and F. Valognes, 2000, On the probability that all decision rules select the same winner, Journal of Mathematical Economics 33, 183-208.

[13] Merlin, V., M. Tataru and F. Valognes, 2001, The likelihood of Condorcet’s profiles, Social Choice and Welfare.

[14] McGarvey, D. C., 1953, A theorem on the construction of voting paradoxes, Econometrica 21, 608-610.

[15] Nurmi, H., 1999, Voting Paradoxes and How to Deal with Them, Springer-Verlag, NY.

[16] Nurmi, H., 2001, Monotonicity and its cognates in the theory of choice, Department of Political Science, University of Turku, Turku, Finland.

[17] Nurmi, H., 2002, Voting Procedures Under Uncertainty, Springer-Verlag, NY.

[18] Ostrogorski, M., 1970, Democracy and the Organization of Political Parties, Vol. I, II, Haskell House, New York.

[19] Ratliff, T., 2001, A Comparison of Dodgson’s Method and Kemeny’s Rule, Social Choice and Welfare, 18, 79-89.

[20] Ratliff, T., 2002, A Comparison of Dodgson’s Method and the Borda Count, Economic Theory 20, 357-372.

[21] Ratliff, T., 2002a, Some startling paradoxes when electing committees, Preprint; Dept. of Math., Wheaton College, Norton, MA 02766. To appear in Social Choice & Welfare.

[22] Saari, D. G., 1989, A dictionary for voting paradoxes, Journal of Economic Theory 48, 443-475.

[23] Saari, D. G., 1990, Susceptibility to manipulation, Public Choice 64, 21-41.

[24] Saari, D. G., 1990a, The Borda Dictionary, Social Choice and Welfare 7, 279-317.

[25] Saari, D. G., 1991, Calculus and extensions of Arrow’s Theorem, Journal of Mathematical Economics 20, 271-306.

[26] Saari, D. G., 1992, Millions of election outcomes from a single profile, Social Choice & Welfare 9, 277-306.

[27] Saari, D. G., 1994, Geometry of Voting, Springer-Verlag, Heidelberg.

[28] Saari, D. G., 1995, Basic Geometry of Voting, Springer-Verlag, Heidelberg.

[29] Saari, D. G., 1995a, A chaotic exploration of aggregation paradoxes, SIAM Review 37, 37-52.

[30] Saari, D. G., 1998, Connecting and resolving Sen’s and Arrow’s Theorems, Social Choice & Welfare 15, 239-261.

[31] Saari, D. G., 1999, Explaining all three-alternative voting outcomes, Journal of Economic Theory 87, 313-335.

[32] Saari, D. G., 2000, Mathematical structure of voting paradoxes I: pairwise vote, Economic Theory 15, 1-53.

[33] Saari, D. G., 2000a, Mathematical structure of voting paradoxes II: positional voting, Economic Theory 15, 55-101.

[34] Saari, D. G., 2001, Chaotic Elections! A Mathematician Looks at Voting, American Mathematical Society, Providence, R.I.

[35] Saari, D. G., 2001a, Decisions and Elections; Explaining the Unexpected, Cambridge University Press, NY.

[36] Saari, D. G., 2001b, Analyzing a “nail-biting” election, Social Choice & Welfare 18, 415-430.

[37] Saari, D. G., 2002, Adopting a plurality vote perspective, Math of Operations Research 27, 45-64.

[38] Saari, D. G., 2003, Unsettling aspects of voting theory, Economic Theory 22, 529-556.

[39] Saari, D. G., 2004, Analyzing Pairwise Voting Rules, pp. 318-342 in Reasoned Choices, ed. M. Wiberg, Finnish Political Science Association, Turku, Finland.

[40] Saari, D. G., 2005, The profile structure for Luce’s choice axiom, Journal of Mathematical Psychology 49, 226-253.

[41] Saari, D. G. and V. Merlin, 1996, Copeland method I: Dictionaries and relationships, Econ. Theory 8, 51-76.

[42] Saari, D. G. and V. Merlin, 2000, A geometric examination of Kemeny’s rule, Social Choice & Welfare 17, 403-438.

[43] Saari, D. G. and V. Merlin, 2000a, Changes that cause changes, Social Choice & Welfare 17, 691-705.

[44] Saari, D. G. and A. Petron, 2004, Negative Externalities and Sen’s Liberalism Theorem, IMBS working papers, IMBS, University of California, Irvine. (To appear in Economic Theory, 2006.)

[45] Saari, D. G. and K. K. Sieberg, 2001, The sum of the parts can violate the whole, American Political Science Review, 415-433.

[46] Saari, D. G. and F. Valognes, 1998, Geometry, voting, and paradoxes, Math Magazine 4, 243-259.

[47] Saari, D. G. and F. Valognes, 1999, The geometry of Black’s single peakedness and related conditions, Journal of Mathematical Economics 32, 429-456.

[48] Saari, D. G. and M. Tataru, 1999, The likelihood of dubious election outcomes, Economic Theory 13, 345-363.

[49] Saari, D. G. and J. van Newenhizen, 1988, Is approval voting an “unmitigated evil?” Public Choice 59, 133-147.

[50] Satterthwaite, M., 1975, Strategyproofness and Arrow’s conditions, Journal of Economic Theory 10, 187-217.

[51] Sen, A., 1966, A possibility theorem on majority decisions, Econometrica 34, 491-499.

[52] Sen, A., 1970, The impossibility of a paretian liberal, Jour. of Political Economy 78, 152-157.

[53] Tataru, M. and V. Merlin, 1997, On the relationships of the Condorcet winner and positional voting rules, Mathematical Social Sciences 34, 81-90.

[54] Tabarrok, A., 2001, President Perot or Fundamentals of voting theory illustrated with the 1992 election, or could Perot have won in 1992? Public Choice, 106, 275-297.

[55] Tabarrok, A. and L. Spector, 1999, Would the Borda Count have avoided the civil war? Journal of Theoretical Politics 11, 261-288.

[56] Ward, B., 1965, Majority voting and alternative forms of public enterprises, in The Public Economy of Urban Expenditures, ed. by J. Margolis, Johns Hopkins Press, Baltimore.

[57] Wilson, R., 1972, Social choice theory without the Pareto principle, Journal of Economic Theory 5, 478-486.

[58] Young, H. P., 1988, Condorcet’s theory of voting, American Political Science Review 82, 1231-1244.

[59] Young, H. P. and A. Levenglick, 1978, A consistent extension of Condorcet’s election principle, SIAM Journal of Applied Mathematics 35, 285-300.

[60] Zwicker, W., 1991, The voters’ paradox, spin, and the Borda count, Math. Soc. Sci. 22, 181-227.
