In Defense of MinHash Over SimHash

Anshumali Shrivastava
Department of Computer Science
Computing and Information Science
Cornell University, Ithaca, NY, USA

Ping Li
Department of Statistics and Biostatistics
Department of Computer Science
Rutgers University, Piscataway, NJ, USA

arXiv:1407.4416v1 [stat.CO] 16 Jul 2014

Abstract

MinHash and SimHash are the two widely adopted Locality Sensitive Hashing (LSH) algorithms for large-scale data processing applications. Deciding which LSH to use for a particular problem at hand is an important question, which has no clear answer in the existing literature. In this study, we provide a theoretical answer (validated by experiments) that MinHash virtually always outperforms SimHash when the data are binary, as is common in practice, e.g., in search. The collision probability of MinHash is a function of resemblance similarity (R), while the collision probability of SimHash is a function of cosine similarity (S). To provide a common basis for comparison, we evaluate retrieval results in terms of S for both MinHash and SimHash. This evaluation is valid because we can prove that MinHash is a valid LSH with respect to S, by using a general inequality S^2 <= R <= S/(2 - S). Our worst-case analysis shows that MinHash significantly outperforms SimHash in the high similarity region. Interestingly, our intensive experiments reveal that MinHash is also substantially better than SimHash even on datasets where most of the data points are not too similar to each other. This is partly because, in practical data, R >= S/(z - S) often holds, where z is only slightly larger than 2 (e.g., z <= 2.1). Our restricted worst-case analysis, assuming S/(z - S) <= R <= S/(2 - S), shows that MinHash indeed significantly outperforms SimHash even in the low similarity region. We believe the results in this paper will provide valuable guidelines for search in practice, especially when the data are sparse.

1 Introduction

The advent of the Internet has led to the generation of massive and inherently high-dimensional data. In many industrial applications, the size of the datasets has long exceeded the memory capacity of a single machine. In web domains, it is not difficult to find datasets with the number of instances and the number of dimensions going into billions [1, 6, 28].

The reality that web data are typically sparse and high-dimensional is due to the wide adoption of "Bag of Words" (BoW) representations for documents and images. In BoW representations, it is known that the word frequency within a document follows a power law. Most words occur rarely in a document, and most of the higher-order shingles in a document occur only once. It is often the case that just the presence or absence information suffices in practice [7, 14, 17, 23]. Leading search companies routinely use sparse binary representations in their large data systems [6].

Locality sensitive hashing (LSH) [16] is a general framework of indexing techniques, devised for efficiently solving the approximate near neighbor search problem [11]. The performance of LSH largely depends on the particular underlying hashing method. Two popular hashing algorithms are MinHash [3] and SimHash (sign normal random projections) [8]. MinHash is an LSH for resemblance similarity, which is defined over binary vectors, while SimHash is an LSH for cosine similarity, which works for general real-valued data. With the abundance of binary data over the web, it has become a practically important question: which LSH should be preferred for binary data? This question has not been adequately answered in the existing literature. There were prior attempts to address this problem from various aspects. For example, the paper on Conditional Random Sampling (CRS) [19] showed that random projections can be very inaccurate, especially on binary data, for the task of inner product estimation (which is not the same as near neighbor search).
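To make the two hash families concrete, the following is a minimal sketch (with toy data and hash functions assumed for illustration, not the paper's implementation). It simulates MinHash and SimHash on two small binary vectors and compares their empirical collision rates to the known collision probabilities: R for MinHash, and 1 - theta/pi with cos(theta) = S for SimHash.

```python
import math
import random

SCALE = 1000003  # larger than any element id below, keeps derived seeds distinct

def _rng(tag, seed, x):
    # An independent PRNG per (hash family, hash index, element).
    return random.Random(2 * (seed * SCALE + x) + tag)

def minhash(s, seed):
    # MinHash of a set of integer ids: the smallest random hash value.
    # Two sets collide exactly when the minimum over their union falls in
    # the intersection, so Pr[collision] = R (resemblance).
    return min(_rng(0, seed, x).random() for x in s)

def simhash_bit(s, seed):
    # SimHash bit of a sparse binary vector (set of nonzero ids): the sign
    # of a random Gaussian projection.  Pr[collision] = 1 - theta/pi,
    # where cos(theta) = S (cosine similarity).
    return sum(_rng(1, seed, x).gauss(0.0, 1.0) for x in s) >= 0

# Toy binary vectors with resemblance R = 0.5 and cosine S = 2/3.
W1, W2 = set(range(0, 60)), set(range(20, 80))
a, f1, f2 = len(W1 & W2), len(W1), len(W2)
R, S = a / (f1 + f2 - a), a / math.sqrt(f1 * f2)

K = 2000  # number of independent hash functions
min_rate = sum(minhash(W1, k) == minhash(W2, k) for k in range(K)) / K
sim_rate = sum(simhash_bit(W1, k) == simhash_bit(W2, k) for k in range(K)) / K

print(f"MinHash collision rate {min_rate:.3f} vs R = {R:.3f}")
print(f"SimHash collision rate {sim_rate:.3f} vs 1 - acos(S)/pi = {1 - math.acos(S)/math.pi:.3f}")
```

With K = 2000 hashes, both empirical rates land within a few percent of their theoretical values, illustrating why the two schemes index different similarities.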
A more recent paper [26] empirically demonstrated that b-bit minwise hashing [22] outperformed SimHash and spectral hashing [30].

Our contribution: Our paper provides an essentially conclusive answer that MinHash should be used for near neighbor search in binary data, both theoretically and empirically. To favor SimHash, our theoretical analysis and experiments evaluate the retrieval results of MinHash in terms of cosine similarity (instead of resemblance). This is possible because we are able to show that MinHash is an LSH for cosine similarity, by establishing an inequality which bounds resemblance purely by functions of cosine.

[Figure 1: Upper bound S/(2 - S) (in red) and lower bound S^2 (in blue) from Theorem 1, which overlap in the high similarity region.]

Because we evaluate MinHash (which was designed for resemblance) in terms of cosine, we will first illustrate the close connection between these two similarities.

2 Cosine Versus Resemblance

We focus on binary data, which can be viewed as sets (locations of nonzeros). Consider two sets W1, W2 ⊆ Ω = {1, 2, ..., D}. The cosine similarity (S) is

    S = a / sqrt(f1 f2), where                                   (1)
    f1 = |W1|, f2 = |W2|, a = |W1 ∩ W2|                          (2)

The resemblance similarity, denoted by R, is

    R = R(W1, W2) = |W1 ∩ W2| / |W1 ∪ W2| = a / (f1 + f2 - a)    (3)

Clearly these two similarities are closely related.

While the high similarity region is often of interest, we must also handle data in the low similarity region, because in a realistic dataset the majority of the pairs are usually not similar. Interestingly, we observe that for the six datasets in Table 1, we often have R = S/(z - S) with z only slightly larger than 2; see Figure 2.

Table 1: Datasets

    Dataset    # Query   # Train       # Dim
    MNIST       10,000    60,000         784
    NEWS20       2,000    18,000   1,355,191
    NYTIMES      5,000   100,000     102,660
    RCV1         5,000   100,000      47,236
    URL          5,000    90,000   3,231,958
    WEBSPAM      5,000   100,000  16,609,143
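Definitions (1)-(3) translate directly into code. The following sketch (over random toy index sets, assumed for illustration) computes S and R for many pairs of binary vectors and checks the general inequality S^2 <= R <= S/(2 - S) stated in the abstract:

```python
import math
import random

def cos_res(W1, W2):
    # S and R from Eqs. (1)-(3), for binary vectors given as index sets.
    a, f1, f2 = len(W1 & W2), len(W1), len(W2)
    return a / math.sqrt(f1 * f2), a / (f1 + f2 - a)

rng = random.Random(0)
for _ in range(10000):
    W1 = set(rng.sample(range(200), rng.randint(1, 100)))
    W2 = set(rng.sample(range(200), rng.randint(1, 100)))
    S, R = cos_res(W1, W2)
    # The general inequality: S^2 <= R <= S/(2 - S).
    assert S**2 <= R + 1e-12
    assert R <= S / (2 - S) + 1e-12
print("S^2 <= R <= S/(2 - S) holds on 10,000 random pairs")
```

The lower bound follows from (f1 - a)(f2 - a) >= 0, and the upper bound from z >= 2 in the rewriting of R below.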
To better illustrate the connection, we rewrite R as

    R = (a / sqrt(f1 f2)) / (sqrt(f1/f2) + sqrt(f2/f1) - a / sqrt(f1 f2)) = S / (z - S)    (4)

    z = z(r) = sqrt(r) + 1/sqrt(r) >= 2                                                    (5)

    r = f2/f1 = f1 f2 / f1^2 <= f1 f2 / a^2 = 1/S^2                                        (6)

where, without loss of generality, f1 <= f2 so that r >= 1 (and a <= f1 gives the inequality in (6)). There are two degrees of freedom, f2/f1 and a/f2, which make the analysis inconvenient. Fortunately, in Theorem 1, we can bound R purely by functions of S.

Theorem 1

    S^2 <= R <= S / (2 - S)    (7)

Tightness: Without making assumptions on the data, neither the lower bound S^2 nor the upper bound S/(2 - S) can be improved in the domain of continuous functions.

Data dependent bound: If the data satisfy z <= z*, where z is defined in (5), then

    S / (z* - S) <= R <= S / (2 - S)    (8)

Proof: See Appendix A.

Figure 1 illustrates that in the high similarity region, the upper and lower bounds essentially overlap. Note that, in order to obtain S ≈ 1, we need f1 ≈ f2 (i.e., z ≈ 2).

[Figure 2: Frequencies of the z values for all six datasets in Table 1, where z is defined in (5). We compute z for every query-train pair of data points.]

For each dataset, we compute both cosine and resemblance for every query-train pair (e.g., 10,000 × 60,000 pairs for the MNIST dataset). For each query point, we rank its similarities to all training points in descending order. We examine the top-1000 locations as in Figure 3.
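The rewriting in Eq. (4) is an exact algebraic identity, and z >= 2 in Eq. (5) follows from the AM-GM inequality. A quick numerical check over random set sizes (toy values, assumed for illustration) confirms both:

```python
import math
import random

rng = random.Random(1)
for _ in range(1000):
    f2 = rng.randint(1, 100)
    f1 = rng.randint(1, f2)                       # WLOG f1 <= f2, so r = f2/f1 >= 1
    a = rng.randint(0, f1)                        # intersection size, a <= min(f1, f2)
    S = a / math.sqrt(f1 * f2)                    # cosine, Eq. (1)
    R = a / (f1 + f2 - a)                         # resemblance, Eq. (3)
    z = math.sqrt(f2 / f1) + math.sqrt(f1 / f2)   # Eq. (5) with r = f2/f1
    assert z >= 2 - 1e-12                         # AM-GM: z >= 2, equality iff f1 = f2
    assert abs(R - S / (z - S)) < 1e-9            # Eq. (4): R = S/(z - S) exactly
print("Eq. (4) and z >= 2 verified on 1,000 random (f1, f2, a) triples")
```

Dividing numerator and denominator of a/(f1 + f2 - a) by sqrt(f1 f2) is all that Eq. (4) does, which the check above confirms term by term.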
In the left panels of Figure 3, for every top location, we plot the median (among all query points) of the similarities, separately for cosine (dashed) and resemblance (solid), together with the lower and upper bounds of R (dot-dashed). We can see that for NEWS20, NYTIMES, and RCV1, the data are not too similar. Interestingly, for all six datasets, R matches the upper bound S/(2 - S) fairly well. In other words, the lower bound can be very conservative even in the low similarity region.

The right panels of Figure 3 present the comparisons of the orderings of similarities in an interesting way.
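A toy version of this ranking experiment (small synthetic index sets, not the paper's datasets) ranks candidate points against a query by cosine and by resemblance, then measures how much the two top-k lists agree:

```python
import math
import random

def cosine_and_resemblance(q, x):
    # S and R per Eqs. (1)-(3), for binary vectors given as index sets.
    a, f1, f2 = len(q & x), len(q), len(x)
    return a / math.sqrt(f1 * f2), a / (f1 + f2 - a)

rng = random.Random(7)
query = set(rng.sample(range(500), 50))
train = [set(rng.sample(range(500), rng.randint(10, 100))) for _ in range(2000)]

sims = [cosine_and_resemblance(query, x) for x in train]
by_cos = sorted(range(len(train)), key=lambda i: -sims[i][0])  # rank by S, descending
by_res = sorted(range(len(train)), key=lambda i: -sims[i][1])  # rank by R, descending

k = 100
overlap = len(set(by_cos[:k]) & set(by_res[:k])) / k
print(f"top-{k} overlap between cosine and resemblance rankings: {overlap:.2f}")
```

Because R = S/(z - S) with z typically close to 2, the two orderings agree heavily, which is the qualitative pattern the paper's Figure 3 examines on real data.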