Experiments with Kemeny Ranking: What Works When? Alnur Alia, Marina Meilab a1 Microsoft Way, Microsoft, Redmond, WA, 98052, U.S.A. bUniversity of Washington, Department of Statistics, Seattle, WA, 98195, U.S.A. Abstract This paper performs a comparison of several methods for Kemeny rank aggregation (104 algorithms and combinations thereof in total) originating in social choice theory, machine learning, and theoretical computer science, with the goal of establishing the best trade-offs between search time and performance. We find that, for this theoretically NP-hard task, in practice the problems span three regimes: strong consensus, weak consensus, and no consensus. We make specific recommendations for each, and propose a computationally fast test to distinguish between the regimes. In spite of the great variety of algorithms, there are few classes that are consistently Pareto optimal. In the most interesting regime, the inte- ger program exact formulation, local search algorithms and the approximate version of a theoretically exact branch and bound algorithm arise as strong contenders. Keywords: Kemeny ranking, consensus ranking, branch and bound, sorting, experimental evaluation 1. Introduction Preference aggregation has been extensively studied by economists un- der social choice theory. Arrow discussed certain desirable properties that a ranking rule must have and showed that no rule can simultaneously satisfy them all [2]. Thus, a variety of models of preference aggregation have been Email addresses:
[email protected] (Alnur Ali),
[email protected] (Marina Meila) Preprint submitted to Mathematical Social Sciences July 11, 2011 proposed, each of which satisfy subsets of properties deemed desirable. In theoretical computer science, too, many applications of preference aggrega- tion exist, including merging the results of various search engines [8, 14], collaborative filtering [26], and multiagent planning [16].