Visualizing Spatial Proximity of Search Algorithms

Mingxuan Sun Guy Lebanon Kevyn Collins-Thompson Georgia Institute of Technology Microsoft Research

Overview We introduce a new dissimilarity function for ranked lists, the expected weighted Hoeffding dis- tance, that has several advantages over current dissimilarity measures for partial rankings: it is (i) symmetric, (ii) interpretable with respect to search algorithms retrieving ranked lists of different lengths, (iii) flexible enough to model the increased attention users pay to top ranks over bottom ranks, (iv) computationally efficient, and (v) aggregate information over multiple queries in a mean- ingful way. We then use the measure with multi-dimensional scaling to visualize and explore rela- tionships between the different search engines. Such visualization is highly effective for obtaining insights on which search engines to use, what search strategies users can employ, and how search results evolve over time. Methodology and Experiments

The ranked list output by search algorithms si(q),i = 1,....l given a query q forms ordered lists hi1,...,iki of subset of the websites on the internet {i1,...,ik}⊂{1,...,n}. Different search algorithms may result in lists of different sizes k. The key for comparing ranked lists output by search engines is to define an appropriate dissimilarity measure ρ(si,sj). One problem with ex- isting dissimilarities, for example ones based on Kendall’s tau is that they are not symmetric, not computationally efficient, not interpretable, or do not distinguish between disagreement at top rank- ings and at the bottom rankings. In this paper we propose to define ρ using an expectation over the weighted Hoeffding distance, which provide a clear probabilistic interpretation and ensure that the measure is symmetric, compu- tataionally efficient, interpretable, and flexible enough to distinguish disagreement at the top and at bottom ranks. The measure is

ρ(si,sj )= E q∼Q{ρ(si(q),sj (q))} = E q∼Q E π∼S(si(q))E σ∼S(sj (q)){dw(π, σ)} (1) where dw(π, σ) is a weighted Hoeffding distance between permutations π, σ,

E π∼S(si(q))E σ∼S(sj (q)) is the expectation with respect to permutations π, σ that are sam- pled from the sets of all permutations consistent with the ranked lists output by the two search engines si(q),sj (q) (respectively), and E q∼Q is an expectation with respect to all queries sampled from a representative set of queries Q. In the absence of any evidence to the contrary, we assume a uniform distribution over the set of queries Q and over the sets of permutations consistent with si(q),sj (q). The weighted Hoeffding distance is equivalent to the earth movers distance and is the minimim amount of work needed to bring each item r from its rank in π to its rank in σ where n ′ moving an item from rank i to i + 1 costs wi. That is dw(π, σ) = Pr=1 dw(π(r), σ(r)) with ′ v−1 −q dw(u, v) = Pt=u wt assuming u < v. A monotonic decreasing weight vector, e.g., wt = t , t = 1,...,n − 1, q ≥ 0 correctly captures the fact that disagreements in top ranks should matter more than disagreements in bottom ranks. The expectation in (1) ensures that the measure is inter- pretabile and symmetric. We also show that it is computationally efficient with online complexity 2 O(k + l) (number of items ranked by si and sj ) and offline complexity O(n + max(k,l) ). We apply (1) with multi-dimensional scaling (MDS) for visualization. Figure 1 show the visual- ization over full ranking and partial ranking with n = 5. Figure 2 show visualization of 9 different search engines over 3 different query categories (each with 50 popular queries). Altavista, alltheweb, , and yahoo all form a tightly knit cluster. A similar tight cluster contains google and . The dendrogram in top right of figure 3 output by standard hierarchical clustering confirms the analy- sis above visually. The clusters do in fact mirror the technology relationships that have evolved in the industry. Figure 3 visualizes the sensitivity of search engines to popular query manipulation techniques and to the time in which the query was submitted.

1 12543 21543

12345 21345 21453 12534 12354 21354 12453 21534 12345 21345 12354 21354 12435 21435 12435 21435 12453 21453 12453 21453 12534 21534 12543 21543 12435 21354 21435 12354 12534 21534 12543 21543

12345 21345

15 4 4 14 41 51 45 1 4 14 15 42 41 45 43 54 41 12 13 43 45 1 42 54 12345 5 51 42 14 12 5 52 21 25 52 15 53 12345 24 24 54 1 21 51 25 24 13 52 5 12345 53 12 35 2 25 31 43 53 13 34 23 35 23 31 2 34 21 34 31 3 32 32 35 23 32 3 2 3 Figure 1: MDS embedding of permutations over n = 5 websites of full length (top row) and partial rankings of varying lengths (k varies) in the bottom row . The embeddings were computed using the weighted Hoeffding distance with uniform weight function wt = 1 (left), linear weight function wt = 1/t (middle) and quadratic 2 weight function wt = 1/t (right). Top: The permutations starting with 1 and 2 (colored in red) and the permutations starting with 2 and 1 (colored in blue) become more spatially disparate as the rate of weight decay increases. Bottom: The expected distance (1) separates ranked lists agreeing in their top rankings (denoted by different colors) better as the weights decay faster.

1. questions sports tourism 4 2. alltheweb 4 8 3. ask 8 4. google 6 5. lycos 9 7 6. live 5 7. yahoo 8. aol 6 9. 7 1 3 5 2 2 1 71 25 6

9 3 3 48 9

Figure 2: MDS embedding of search engine results over 3 sets of representative queries: questions, sports, and −1 tourism. The MDS was based on the expected weighted Hoeffding distance with linear weighting wt = t over the top 100 sites. Circle sizes indicate position variance with respect to within category queries.

celebrity

1. altavista 2. alltheweb 3. ask 4. google 5. lycos 1. altavista 6. live 2. alltheweb 7. yahoo 3. google 8. aol 4. lycos 9. dogpile 5. live 6. yahoo altav alweb lycos yahoo ask google aol dog msn 7. aol

0.6

0.4

0.2 1. to Day2

2. to Day6 0 1 2 3 4 5 6 7 Figure 3: Left: MDS embedding of search engine results over the query category celebrity with different query manipulations. Each marker represents a combination of one of the 9 search engines and one of the 5 query manipulation techniques: (a)w1w2, (b)w1 + w2, (c)“w1w2”, (d)w1 and w2, (e)w1 or w2. Middle: MDS embedding over 7 days for a set of temporal event queries. Yahoo cluster (yahoo, altavista, alltheweb, and lycos) shows a high degree of temporal variability, and in particular a sharp spatial shift on the third day. Right Bottom: The dissimilarity of Yahoo results over seven days with respective to a reference day. Right Top: Hierarchical clustering dendrogram for the nine search engines matches recent news in the commercial search engine sector (aol uses google technology, yahoo purchased altavista, alltheweb, and lycos).

2